Troubleshooting Guide

Common issues and solutions for web crawling. Quick fixes for slow crawls, blocked requests, and data quality problems.

Last updated January 6, 2025


Common problems and how to fix them. Most issues have simple solutions.

Normal Behavior

Before troubleshooting, check if what you're seeing is actually normal:

  • Crawl speed: 2-3 pages per minute is normal for quality data
  • Some errors: 5-10% error rate is expected on large sites
  • Processing time: Results take a few minutes to appear
  • Data format: We extract clean content, not raw HTML

If these look normal, your crawl is working fine.
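At 2-3 pages per minute, you can estimate how long a crawl should take before worrying that it is stuck. A small helper for the arithmetic (just a convenience sketch, not part of the API):

```shell
# Estimate crawl duration in minutes, rounding up.
# Usage: crawl_eta <pages> <pages-per-minute>
crawl_eta() {
  pages=$1
  per_min=$2
  echo $(( (pages + per_min - 1) / per_min ))
}

crawl_eta 50 2   # a 50-page crawl at 2 pages/min: about 25 minutes
```

If a crawl has run well past this estimate with no progress, move on to the speed checks below.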

Speed Issues

Slow Crawling

Normal speed: 2-3 pages per minute
Why: we prioritize quality over speed - slower crawling prevents corrupted data and blocks from target sites

When to investigate:

  • Less than 1 page per minute consistently
  • Crawl stuck on same page for 10+ minutes
  • Multiple timeouts in a row

Solutions:

# Reduce timeout for faster failures
curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://target.com",
    "maxPages": 25,
    "timeout": 15000
  }'

# Skip problematic pages
curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://target.com",
    "maxPages": 50,
    "skipErrors": true
  }'

Tips:

  • Start with 10-20 pages first
  • Try different times of day
  • Target specific sections vs entire site

Page Timeouts

Common causes and fixes:

Slow website (60% of cases)

# Increase timeout
curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://target.com",
    "timeout": 45000,
    "maxPages": 20
  }'

Heavy JavaScript (25% of cases)

# Enable JavaScript rendering
curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://target.com",
    "enableJavaScript": true,
    "timeout": 30000
  }'

Server overload (10% of cases)

# Add delays between requests
curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://target.com",
    "crawlDelay": 3000,
    "respectRobots": true
  }'

Geographic restrictions (5% of cases)

  • Contact support for proxy options
  • Try different start URLs on same domain

Access Issues

Getting Blocked (403/429 Errors)

Why: Website thinks you're crawling too aggressively
Frequency: Less than 2% of crawls with default settings

Immediate fix:

{
  "respectRobots": true,
  "crawlDelay": 5000,
  "maxConcurrentPages": 1,
  "userAgent": "BestAIScraper/1.0"
}

Gentle approach:

curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://target.com",
    "crawlDelay": 10000,
    "maxPages": 10,
    "respectRobots": true
  }'

Authentication Required

Basic authentication:

curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://secure-site.com",
    "auth": {
      "type": "basic",
      "username": "your-username",
      "password": "your-password"
    }
  }'

Cookie-based sessions:

curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://member-site.com",
    "cookies": [
      {
        "name": "session_id",
        "value": "abc123xyz",
        "domain": "member-site.com"
      }
    ]
  }'

Custom headers (API keys, tokens):

curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://api-site.com",
    "headers": {
      "X-API-Key": "your-api-key",
      "Authorization": "Bearer your-token"
    }
  }'

Contact support if you need help setting up authentication.

Data Quality Issues

Data Looks Different

This is usually good - we extract clean, structured data instead of raw HTML.

What we transform:

<!-- Raw HTML -->
<div class="product-title">Widget Pro</div>
<span class="price">$299.99</span>

<!-- Clean output -->
Product: Widget Pro
Price: $299.99

Common "issues" that are actually features:

Missing navigation/footer: We extract main content, skip boilerplate

  • Solution: Use includeNavigation: true if needed

Different formatting: We convert HTML to clean markdown

  • Solution: Raw HTML available with includeRawHTML: true

Missing images: We extract text content by default

  • Solution: Enable with extractImages: true

Pages Have No Data

JavaScript-heavy content (70%)

curl -X POST https://api.bestaiscraper.com/projects/123/crawl \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://target.com",
    "enableJavaScript": true,
    "waitForContent": 5000
  }'

Empty/error pages (20%)

  • Normal - we skip pages with no valuable content
  • Check status codes - 404s and redirects are expected

Authentication required (5%)

  • See authentication section above

Content behind forms (3%)

  • Contact support for form automation

Unusual page structure (2%)

  • Single-page apps, unusual CMS systems
  • Contact support for custom extractors

API Issues

API Calls Not Working

Quick diagnostics:

# Test basic API access
curl -X GET https://api.bestaiscraper.com/projects \
  -H "Authorization: Bearer YOUR_API_KEY"

# Check API key format
printf '%s' "$YOUR_API_KEY" | wc -c  # Should be 40+ characters

# Verify project exists
curl -X GET https://api.bestaiscraper.com/projects/123 \
  -H "Authorization: Bearer YOUR_API_KEY"

Common issues:

401 Unauthorized: Wrong or expired API key

  • Generate new key in dashboard settings

404 Not Found: Wrong project ID or endpoint URL

  • Check project ID in dashboard URL

429 Rate Limiting: Too many requests

  • Add 1-second delays between API calls

Timeout: Large crawls or slow networks

  • Use webhooks for async processing
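For 429s specifically, exponential backoff works better than a fixed delay. A generic retry wrapper you could adapt (a sketch; the delay values are illustrative, and curl's -f flag makes HTTP errors exit non-zero so they get retried):

```shell
# Retry a command with exponential backoff: 1s, 2s, 4s, ...
# Usage: retry <max-attempts> <command> [args...]
retry() {
  max=$1; shift
  attempt=1
  delay=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example: retry the projects listing up to 5 times.
# retry 5 curl -sf https://api.bestaiscraper.com/projects \
#   -H "Authorization: Bearer YOUR_API_KEY"
```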

Webhook Issues

Debugging checklist:

# Test endpoint manually
curl -X POST https://your-app.com/webhook \
  -H "Content-Type: application/json" \
  -d '{"test": true}'

# Check webhook config
curl -X GET https://api.bestaiscraper.com/projects/123/webhooks \
  -H "Authorization: Bearer YOUR_API_KEY"

Common problems:

  • SSL certificate: Ensure valid HTTPS
  • Response code: Return 200 OK
  • Timeout: Respond within 10 seconds
  • Content-Type: Accept application/json
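You can capture the status code and response time from the manual test above with curl's -w flag, then check them against this list in one place. A small helper (our sketch, not part of the API):

```shell
# Validate a webhook response against the checklist above.
# Feed it values captured via:
#   curl -s -o /dev/null -w '%{http_code} %{time_total}' https://your-app.com/webhook ...
webhook_ok() {
  code=$1   # HTTP status code
  secs=$2   # total response time in seconds (may be fractional)
  [ "$code" -eq 200 ] || { echo "expected 200 OK, got $code"; return 1; }
  # Compare whole seconds against the 10-second limit
  [ "${secs%.*}" -lt 10 ] || { echo "too slow: ${secs}s"; return 1; }
  echo "ok"
}
```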

Getting Help

When to Contact Support

Contact us for:

  • Billing or account issues
  • Site-specific blocking problems
  • Custom authentication setup
  • Enterprise features

Try self-service first for:

  • Slow crawls (usually normal)
  • Basic timeouts (increase timeout)
  • Standard authentication (use guides above)

How to Get Help

Include this info:

  • Project ID (from dashboard URL)
  • Crawl Session ID (if applicable)
  • Target website
  • Expected vs actual result
  • Steps already tried


Advanced Diagnostics

Check Crawl Health

# Session overview
curl -X GET https://api.bestaiscraper.com/sessions/456 \
  -H "Authorization: Bearer $API_KEY"

# Failed pages analysis
curl -X GET "https://api.bestaiscraper.com/sessions/456/pages?error=true" \
  -H "Authorization: Bearer $API_KEY"

# Queue status
curl -X GET https://api.bestaiscraper.com/sessions/456/queue \
  -H "Authorization: Bearer $API_KEY"

Custom Configurations

High-security sites:

{
  "respectRobots": true,
  "crawlDelay": 10000,
  "userAgent": "Mozilla/5.0 (compatible; BestAIScraper/1.0)",
  "maxRetries": 1,
  "timeout": 60000
}

JavaScript-heavy sites:

{
  "enableJavaScript": true,
  "waitForContent": 10000,
  "scrollPage": true,
  "timeout": 45000
}

Large e-commerce sites:

{
  "maxPages": 100,
  "crawlDelay": 2000,
  "focusAreas": ["products", "categories"],
  "skipPatterns": ["reviews", "user-content"]
}

Prevention

Best Practices

Before every crawl:

  • Start small (10-20 pages) to test
  • Check robots.txt for restrictions
  • Test during off-peak hours
  • Use descriptive project names

For long-term success:

  • Monitor success rates, adjust settings
  • Track site changes that affect crawling
  • Set up alerts for important sites
  • Document what works for different site types

Success Metrics

Excellent performance:

  • 90%+ success rate
  • 2-4 pages per minute
  • Rich content from most pages

Good performance:

  • 80-90% success rate
  • 1-2 pages per minute
  • Clean content from key pages

Investigate if:

  • Below 70% success rate
  • Less than 1 page per minute
  • Mostly empty pages

Focus on quality over quantity - 50 valuable pages beats 200 empty ones.
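To see which band a crawl falls into, divide successful pages by total pages attempted (both are visible in the dashboard). The helper below is plain shell arithmetic, just for convenience:

```shell
# Integer success rate in percent.
# Usage: success_rate <successful-pages> <total-pages>
success_rate() {
  ok=$1
  total=$2
  [ "$total" -gt 0 ] || { echo 0; return; }
  echo $(( ok * 100 / total ))
}

success_rate 46 50   # prints 92, the "excellent" band
```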

You're Set

This covers 95% of issues you'll encounter. The system is reliable - most problems have simple solutions.

Remember: Every "problem" is just a step toward better data extraction. Teams that push through initial challenges build the best competitive intelligence systems.

Contact support if you're still stuck. We solve problems, not create more paperwork.
