Getting Started
Set up your first crawling project and start extracting data from websites. Simple setup, reliable results.
This guide walks you through setting up your first project and running your first crawl. Takes about 10 minutes to get results.
The system is built around projects that contain crawl sessions. Each session crawls a website and extracts the content into structured data you can analyze or export.
How It Works
The crawler works in layers:
- Projects - Organize related crawling activities
- Sessions - Individual crawl runs with specific settings
- Pages - Extracted content with metadata and links
- Queue - Manages page discovery and crawling order
Data comes out clean and structured, not as raw HTML. You can export to CSV, JSON, or connect via API.
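Conceptually, each crawled page becomes one structured record containing its content, metadata, and outgoing links. As a rough illustration only (the field names here are assumptions, not the exact schema):
{
  "url": "https://example.com/pricing",
  "title": "Pricing",
  "description": "Plans and pricing overview",
  "wordCount": 842,
  "content": "Clean extracted text...",
  "links": ["https://example.com/signup", "https://example.com/contact"]
}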
Quick Start: First Crawl
Step 1: Create a Project
Projects group related crawls together.
Via Dashboard:
- Go to Dashboard
- Click "New Project"
- Give it a name and description
- Optionally choose a project type (this just sets sensible default settings)
Via API:
curl -X POST https://api.bestaiscraper.com/projects \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Site Analysis",
    "description": "Analyze competitor site structure"
  }'
Use clear names - you'll have multiple projects eventually.
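Creating a project gives you a project ID, which appears as projects/123 in the examples below. If you're scripting the setup, you can capture it with jq - a sketch that assumes the ID comes back in an id field:
PROJECT_ID=$(curl -s -X POST https://api.bestaiscraper.com/projects \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "Site Analysis"}' \
  | jq -r '.id')
echo "Created project $PROJECT_ID"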
Step 2: Crawl Settings
Default settings work for most sites:
{
  "settings": {
    "respectRobots": true,
    "crawlDelay": 2000,
    "maxPages": 50,
    "timeout": 25000,
    "userAgent": "BestAIScraper/1.0"
  }
}
These prevent getting blocked and ensure reliable extraction. You can adjust them later if needed.
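If you do adjust settings per crawl, one convenient pattern is to keep the request body in a file and let curl read it with -d @. A sketch against the crawl endpoint used in Step 3 (whether that endpoint honors every setting above, such as crawlDelay, is an assumption to check against the API reference):
cat > crawl-request.json <<'EOF'
{
  "url": "https://example.com",
  "maxPages": 50,
  "timeout": 25000,
  "crawlDelay": 2000
}
EOF
curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d @crawl-request.json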
Step 3: Start Crawling
Dashboard Method:
- Open your project
- Click "Start Crawl"
- Enter the website URL
- Set max pages (start with 25)
- Click "Start"
API Method:
curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "maxPages": 25,
    "timeout": 25000
  }'
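The same scripting pattern as Step 1 applies here: capture the session ID from the response so you can check progress in Step 4, where it appears as sessions/456. A sketch, again assuming an id field:
SESSION_ID=$(curl -s -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "maxPages": 25}' \
  | jq -r '.id')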
What Happens:
- System discovers pages by following links
- Content gets extracted and cleaned
- Links between pages are tracked
- Data gets organized for export
Step 4: Monitor Progress
Check crawl status:
curl -X GET https://api.bestaiscraper.com/sessions/456/status \
  -H "Authorization: Bearer YOUR_API_KEY"
Response:
{
  "status": "running",
  "progress": {
    "pages_crawled": 12,
    "pages_found": 28,
    "current_url": "https://example.com/page-12",
    "crawl_speed": "2.4 pages/minute"
  }
}
When status shows "completed", your data is ready.
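If you're scripting, a short poll loop can wait for the crawl to finish. A sketch that assumes jq is installed and relies on the top-level status field shown above:
# Poll the session status every 30 seconds until it reports "completed"
while true; do
  STATUS=$(curl -s https://api.bestaiscraper.com/sessions/456/status \
    -H "Authorization: Bearer YOUR_API_KEY" | jq -r '.status')
  echo "Crawl status: $STATUS"
  if [ "$STATUS" = "completed" ]; then
    break
  fi
  sleep 30   # polling interval; add your own handling for error states
done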
Step 5: Get Your Data
View Results: Go to Projects → Your Project → Results
Export Data:
curl -X GET https://api.bestaiscraper.com/projects/123/export \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: text/csv"
What You Get:
- Clean page content (no HTML tags)
- Page titles, descriptions, word counts
- Link relationships between pages
- Export as CSV, JSON, or via API
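To keep local copies in both formats, call the export endpoint twice with different Accept headers and write the output to files (text/csv is documented above; application/json as the JSON content type is an assumption):
# CSV export
curl -s https://api.bestaiscraper.com/projects/123/export \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: text/csv" -o results.csv
# JSON export (Accept header assumed)
curl -s https://api.bestaiscraper.com/projects/123/export \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: application/json" -o results.json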
What You Get
After a successful crawl:
- All page content cleaned and structured
- Page titles, descriptions, metadata
- Link relationships mapped
- Content organized by page hierarchy
- Data ready for analysis or export
This beats manual copy-pasting because it's consistent, complete, and you can repeat it anytime.
Common Uses
- Marketing: Find content gaps, analyze messaging, discover keywords
- E-commerce: Compare prices, track product changes, monitor inventory
- Analysis: Research markets, track trends, gather competitive data
The data format stays consistent, so you can build analysis workflows around it.
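For example, once you have a JSON export (see Step 5), a single jq filter can become a check you rerun after every crawl. A sketch that lists thin pages, assuming the export is an array of page records with url, title, and wordCount fields (field names are assumptions, not the documented schema):
# Print URL and title of pages under 300 words, tab-separated
jq -r '.[] | select(.wordCount < 300) | [.url, .title] | @tsv' results.json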
Next Steps
Multiple Sites
Crawl several sites in the same project:
sites=("site1.com" "site2.com" "site3.com")
for site in "${sites[@]}"; do
  curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"https://$site\", \"maxPages\": 30}"
done
Webhooks
Get notified when crawls complete:
curl -X POST https://api.bestaiscraper.com/projects/123/webhooks \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/webhook",
    "events": ["session.completed"]
  }'
Scheduled Crawls
Planned feature - Run crawls automatically:
curl -X POST https://api.bestaiscraper.com/projects/123/schedule \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "schedule": "0 9 * * 1",
    "maxPages": 50
  }'
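Until scheduling ships, one workaround is to drive the existing crawl endpoint from your own cron job using the same cron expression (every Monday at 09:00). A sketch of a crontab entry:
# m h dom mon dow  command
0 9 * * 1 curl -s -X POST https://api.bestaiscraper.com/projects/123/crawl -H "Authorization: Bearer YOUR_API_KEY" -H "Content-Type: application/json" -d '{"url": "https://example.com", "maxPages": 50}'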
Why This Works Better
vs Manual Research:
- Faster and more consistent
- Repeatable process
- Better data format
- No copy-paste errors
vs Developer Tools:
- No setup or maintenance
- Works immediately
- No technical knowledge needed
- Predictable costs
vs Enterprise Software:
- Quick to implement
- Reasonable pricing
- Built for actual business needs
The main benefit is having reliable, current data instead of guessing or using outdated information.
Learning Path
- Week 1: Run several crawls, understand the data format, try exports
- Week 2: Learn about Actions and Project Management
- Week 3: Set up monitoring, integrate with your existing tools
The core concepts are simple - projects contain sessions, sessions crawl sites, sites become structured data.
Documentation
- Project Management - Organizing crawls
- Crawler Core Concepts - How the system works
- Troubleshooting - Common issues and solutions
Getting Help
Common Questions
"Crawls seem slow" - Normal. 2-3 pages per minute ensures good data quality.
"Can I crawl protected sites?" - Yes, see authentication docs.
"Getting blocked?" - Rare with default settings. See troubleshooting guide.
"Need more pages?" - Increase maxPages or upgrade plan.
Support
- Email: support@bestaiscraper.com
- Chat: Available in dashboard
- Docs: Comprehensive guides and tutorials
We respond quickly and solve problems instead of sending you in circles.
You're Done
You now have a working system for extracting structured data from websites. It's faster and more reliable than manual research.
Next steps:
- Run more crawls to get familiar with the data
- Export results and see how they fit your workflow
- Read other docs to understand advanced features
The system is straightforward - you'll figure out the rest by using it.