Getting Started

Set up your first crawling project and start extracting data from websites. Simple setup, reliable results.

Last updated January 6, 2025
8 min read

This guide walks you through setting up your first project and running your first crawl. It takes about 10 minutes to get your first results.

The system is built around projects that contain crawl sessions. Each session crawls a website and extracts the content into structured data you can analyze or export.

How It Works

The crawler works in layers:

  1. Projects - Organize related crawling activities
  2. Sessions - Individual crawl runs with specific settings
  3. Pages - Extracted content with metadata and links
  4. Queue - Manages page discovery and crawling order

Data comes out clean and structured, not as raw HTML. You can export to CSV, JSON, or connect via API.
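
Each crawled page comes back as a structured record built from the fields described later in this guide: cleaned text, title, description, word count, and outgoing links. A single record might look roughly like this (the field names are illustrative, not the exact export schema):

{
  "url": "https://example.com/pricing",
  "title": "Pricing",
  "description": "Plans and pricing overview",
  "wordCount": 640,
  "links": [
    "https://example.com/",
    "https://example.com/signup"
  ],
  "content": "Clean extracted text without HTML tags..."
}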

Quick Start: First Crawl

Step 1: Create a Project

Projects group related crawls together.

Via Dashboard:

  1. Go to Dashboard
  2. Click "New Project"
  3. Give it a name and description
  4. Optionally choose a project type (it sets sensible defaults)

Via API:

curl -X POST https://api.bestaiscraper.com/projects \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Site Analysis",
    "description": "Analyze competitor site structure"
  }'
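
The response returns the new project's ID, which the rest of this guide references as 123. The exact response fields shown here are an illustrative assumption, not the documented schema:

{
  "id": 123,
  "name": "Site Analysis",
  "description": "Analyze competitor site structure"
}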

Use clear names - you'll have multiple projects eventually.

Step 2: Crawl Settings

Default settings work for most sites:

{
  "settings": {
    "respectRobots": true,
    "crawlDelay": 2000,
    "maxPages": 50,
    "timeout": 25000,
    "userAgent": "BestAIScraper/1.0"
  }
}

These defaults reduce the chance of getting blocked and keep extraction reliable. You can adjust them later if needed.
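
If a site needs different values (for example a slower crawl delay), one likely place to set them is on the project itself. Whether the projects endpoint accepts a settings object like this is an assumption, so treat the request below as a sketch rather than the documented API:

# Sketch only: passing "settings" at project creation is assumed, not confirmed
curl -X POST https://api.bestaiscraper.com/projects \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Slow Site Analysis",
    "settings": {
      "crawlDelay": 5000,
      "maxPages": 100
    }
  }'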

Step 3: Start Crawling

Dashboard Method:

  1. Open your project
  2. Click "Start Crawl"
  3. Enter the website URL
  4. Set max pages (start with 25)
  5. Click "Start"

API Method:

curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "maxPages": 25,
    "timeout": 25000
  }'
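
The response includes the ID of the new crawl session, which later examples reference as 456. The fields shown here are illustrative:

{
  "sessionId": 456,
  "status": "queued"
}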

What Happens:

  • System discovers pages by following links
  • Content gets extracted and cleaned
  • Links between pages are tracked
  • Data gets organized for export

Step 4: Monitor Progress

Check crawl status:

curl -X GET https://api.bestaiscraper.com/sessions/456/status \
  -H "Authorization: Bearer YOUR_API_KEY"

Response:

{
  "status": "running",
  "progress": {
    "pages_crawled": 12,
    "pages_found": 28,
    "current_url": "https://example.com/page-12",
    "crawl_speed": "2.4 pages/minute"
  }
}

When status shows "completed", your data is ready.
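
If you're scripting against the API, a simple polling loop works. This sketch assumes jq is installed and reads the top-level status field shown in the response above:

# Check the session status every 30 seconds until the crawl completes
while true; do
  status=$(curl -s https://api.bestaiscraper.com/sessions/456/status \
    -H "Authorization: Bearer YOUR_API_KEY" | jq -r '.status')
  echo "Session status: $status"
  if [ "$status" = "completed" ]; then
    break
  fi
  sleep 30
done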

Step 5: Get Your Data

View Results: Go to Projects → Your Project → Results

Export Data:

curl -X GET https://api.bestaiscraper.com/projects/123/export \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: text/csv"

What You Get:

  • Clean page content (no HTML tags)
  • Page titles, descriptions, word counts
  • Link relationships between pages
  • Export as CSV, JSON, or via API

What You Get

After a successful crawl:

  • All page content cleaned and structured
  • Page titles, descriptions, metadata
  • Link relationships mapped
  • Content organized by page hierarchy
  • Data ready for analysis or export

This beats manual copy-pasting because it's consistent, complete, and repeatable.

Common Uses

Marketing: Find content gaps, analyze messaging, discover keywords
E-commerce: Compare prices, track product changes, monitor inventory
Analysis: Research markets, track trends, gather competitive data

The data format stays consistent, so you can build analysis workflows around it.

Next Steps

Multiple Sites

Crawl several sites in the same project:

sites=("site1.com" "site2.com" "site3.com")

# Start a crawl for each site in the same project
for site in "${sites[@]}"; do
  curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"url\": \"https://$site\", \"maxPages\": 30}"
done

Webhooks

Get notified when crawls complete:

curl -X POST https://api.bestaiscraper.com/projects/123/webhooks \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-app.com/webhook",
    "events": ["session.completed"]
  }'
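
Before relying on the webhook, you can test your receiver by posting a sample payload to it yourself. The field names below are assumptions for testing, not the documented delivery format:

# Simulate a delivery to your own endpoint; the payload fields
# (event, projectId, sessionId) are assumed, not the documented shape
curl -X POST https://your-app.com/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "event": "session.completed",
    "projectId": 123,
    "sessionId": 456
  }'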

Scheduled Crawls

Planned feature - run crawls automatically on a cron schedule. The example below would run every Monday at 09:00:

curl -X POST https://api.bestaiscraper.com/projects/123/schedule \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "schedule": "0 9 * * 1",
    "maxPages": 50
  }'

Why This Works Better

vs Manual Research:

  • Faster and more consistent
  • Repeatable process
  • Better data format
  • No copy-paste errors

vs Developer Tools:

  • No setup or maintenance
  • Works immediately
  • No technical knowledge needed
  • Predictable costs

vs Enterprise Software:

  • Quick to implement
  • Reasonable pricing
  • Built for actual business needs

The main benefit is having reliable, current data instead of guessing or using outdated information.

Learning Path

Week 1: Run several crawls, understand the data format, try exports
Week 2: Learn about Actions and Project Management
Week 3: Set up monitoring, integrate with your existing tools

The core concepts are simple - projects contain sessions, sessions crawl sites, sites become structured data.

Getting Help

Common Questions

"Crawls seem slow" - Normal. 2-3 pages per minute ensures good data quality.

"Can I crawl protected sites?" - Yes, see authentication docs.

"Getting blocked?" - Rare with default settings. See troubleshooting guide.

"Need more pages?" - Increase maxPages or upgrade plan.

Support

We respond quickly and solve problems instead of sending you in circles.

You're Done

You now have a working system for extracting structured data from websites. It's faster and more reliable than manual research.

Next steps:

  1. Run more crawls to get familiar with the data
  2. Export results and see how they fit your workflow
  3. Read other docs to understand advanced features

The system is straightforward - you'll figure out the rest by using it.
