Managing Projects
Learn how to organize your scraping activities using projects, sessions, and effective project management strategies.
Projects are the organizational foundation of Best AI Scraper. They help you group related scraping activities, manage configurations, and track results across different websites or campaigns.
Project Structure
Hierarchy Overview
Project
├── Actions (what to extract)
├── Sessions (individual crawl runs)
│   ├── Pages (crawled URLs)
│   └── Results (extracted data)
└── Settings (project configuration)
Project Lifecycle
1. Create Project: Define scope and objectives
2. Configure Actions: Specify what data to extract
3. Run Sessions: Execute crawling with different parameters
4. Analyze Results: Review and export extracted data
5. Iterate: Refine actions and re-run as needed
Creating Projects
Via Dashboard
1. Navigate to your Dashboard
2. Click "New Project"
3. Fill in project details:
   - Name: Descriptive project name
   - Description: Project goals and scope
   - Tags: Organizational labels
   - Website: Primary target website (optional)
Via API
curl -X POST https://api.bestaiscraper.com/projects \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "E-commerce Competitor Analysis",
"description": "Track pricing and product information from competitor sites",
"tags": ["ecommerce", "monitoring", "competitive"],
"settings": {
"respectRobots": true,
"crawlDelay": 2000,
"userAgent": "BestAIScraper/1.0"
}
}'
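A successful request returns the new project record. The response shape below is illustrative rather than guaranteed; the field names are assumptions based on the identifiers used elsewhere in this guide:
{
  "id": "proj_123",
  "name": "E-commerce Competitor Analysis",
  "tags": ["ecommerce", "monitoring", "competitive"],
  "created_at": "2025-01-06T09:00:00Z"
}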
Project Configuration
Basic Settings
Every project has configurable settings that apply to all sessions:
{
"settings": {
"respectRobots": true,
"crawlDelay": 1000,
"maxConcurrentPages": 5,
"userAgent": "BestAIScraper/1.0",
"timeout": 30000,
"retryCount": 3
}
}
Setting Descriptions:
- respectRobots (boolean): Follow robots.txt directives
- crawlDelay (number): Delay between requests in milliseconds
- maxConcurrentPages (number): Maximum concurrent page requests
- userAgent (string): User agent string for requests
- timeout (number): Page load timeout in milliseconds
- retryCount (number): Number of retries for failed pages
Advanced Configuration
{
"settings": {
"authentication": {
"type": "basic",
"username": "user",
"password": "pass"
},
"headers": {
"X-Custom-Header": "value"
},
"cookies": [
{
"name": "session",
"value": "abc123",
"domain": "example.com"
}
]
}
}
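To apply settings like these to an existing project, send them to the project resource. The PATCH route shown here is an assumption modeled on the other endpoints in this guide; check the API reference for the exact route:
curl -X PATCH https://api.bestaiscraper.com/projects/proj_123 \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "authentication": {
        "type": "basic",
        "username": "user",
        "password": "pass"
      }
    }
  }'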
Project Templates
Speed up project creation with pre-configured templates:
E-commerce Analysis
{
"template": "ecommerce-analysis",
"actions": [
{
"type": "ecommerce-data",
"config": {
"extractPrices": true,
"extractReviews": true
}
},
{
"type": "structured-data",
"config": {
"schemas": ["Product", "Offer"]
}
}
]
}
SEO Audit
{
"template": "seo-audit",
"actions": [
{
"type": "internal-links",
"config": {
"followDepth": 3,
"extractAnchorContext": true
}
},
{
"type": "structured-data",
"config": {
"validateSchema": true
}
}
]
}
Content Discovery
{
"template": "content-discovery",
"actions": [
{
"type": "internal-links",
"config": {
"followDepth": 5,
"filterPatterns": ["*/blog/*", "*/articles/*"]
}
}
]
}
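To start a project from one of these templates, pass the template name at creation time. This sketch assumes the POST /projects endpoint accepts the template field shown in the snippets above:
curl -X POST https://api.bestaiscraper.com/projects \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Blog Content Map",
    "template": "content-discovery"
  }'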
Managing Sessions
Session Types
One-time Crawl
Perfect for ad-hoc analysis:
{
"type": "one-time",
"startUrl": "https://example.com",
"maxPages": 100
}
Scheduled Crawl
For regular monitoring; the schedule field uses cron syntax, so the example below runs every Monday at 09:00:
{
"type": "scheduled",
"startUrl": "https://example.com",
"schedule": "0 9 * * 1",
"maxPages": 50
}
Incremental Crawl
Only crawl new/changed pages:
{
"type": "incremental",
"startUrl": "https://example.com",
"lastCrawlDate": "2025-01-01T00:00:00Z"
}
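Any of these session bodies can be submitted to start a crawl. The sessions route under the project is an assumption, consistent with the session IDs used in the monitoring examples below:
curl -X POST https://api.bestaiscraper.com/projects/proj_123/sessions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "one-time",
    "startUrl": "https://example.com",
    "maxPages": 100
  }'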
Session Monitoring
Track session progress in real-time:
curl -X GET https://api.bestaiscraper.com/sessions/sess_123/status \
-H "Authorization: Bearer YOUR_API_KEY"
Response:
{
"id": "sess_123",
"status": "running",
"progress": {
"pages_crawled": 25,
"pages_found": 47,
"pages_remaining": 22,
"current_url": "https://example.com/page-25",
"estimated_completion": "2025-01-06T10:15:00Z"
}
}
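For scripted monitoring, you can poll this endpoint until the session finishes. A minimal shell sketch using curl and jq (status values other than "running" are assumptions):
# Poll every 10 seconds until the session leaves the "running" state
while true; do
  STATUS=$(curl -s https://api.bestaiscraper.com/sessions/sess_123/status \
    -H "Authorization: Bearer YOUR_API_KEY" | jq -r '.status')
  echo "session status: $STATUS"
  [ "$STATUS" != "running" ] && break
  sleep 10
done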
Project Collaboration
Team Access
Invite team members to collaborate on projects:
curl -X POST https://api.bestaiscraper.com/projects/proj_123/members \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"email": "teammate@example.com",
"role": "editor"
}'
Role Permissions:
- Viewer: Read-only access to project and results
- Editor: Can run sessions and modify actions
- Admin: Full project management including member management
Sharing Results
Generate shareable links for results:
curl -X POST https://api.bestaiscraper.com/projects/proj_123/share \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"permissions": "read",
"expireAt": "2025-02-01T00:00:00Z"
}'
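The response contains the shareable link. The fields and URL below are illustrative; the actual payload may differ:
{
  "share_url": "https://app.bestaiscraper.com/shared/abc123",
  "permissions": "read",
  "expireAt": "2025-02-01T00:00:00Z"
}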
Project Analytics
Usage Tracking
Monitor project resource usage:
{
"project_id": "proj_123",
"analytics": {
"total_sessions": 15,
"pages_crawled": 1250,
"data_points_extracted": 5430,
"success_rate": 0.96,
"avg_session_duration": 180,
"monthly_usage": {
"api_calls": 847,
"storage_mb": 23.4
}
}
}
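A report like this can be retrieved programmatically. The analytics path below is an assumption modeled on the other project endpoints in this guide:
curl -X GET https://api.bestaiscraper.com/projects/proj_123/analytics \
  -H "Authorization: Bearer YOUR_API_KEY"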
Performance Insights
Identify optimization opportunities:
{
"insights": [
{
"type": "efficiency",
"message": "Consider reducing followDepth to improve crawl speed",
"action": "internal-links",
"impact": "medium"
},
{
"type": "coverage",
"message": "12% of pages failed to extract data",
"pages": ["https://example.com/js-heavy-page"],
"suggestion": "Enable JavaScript rendering"
}
]
}
Data Export & Integration
Export Formats
Export project results in multiple formats:
- JSON: Programmatic integration
- CSV: Spreadsheet analysis
- XML: Legacy system integration
- Webhook: Real-time streaming
Export API
curl -X GET https://api.bestaiscraper.com/projects/proj_123/export \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Accept: text/csv" \
-G -d "format=csv" -d "action_type=internal-links"
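To save the export locally, add curl's -o flag:
curl -X GET https://api.bestaiscraper.com/projects/proj_123/export \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Accept: text/csv" \
  -G -d "format=csv" -d "action_type=internal-links" \
  -o internal-links.csv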
Webhook Integration
Set up webhooks for automated data processing:
{
"webhook": {
"url": "https://your-app.com/webhook",
"events": ["session.completed"],
"filters": {
"project_id": "proj_123",
"action_types": ["ecommerce-data"]
}
}
}
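To register this configuration, POST it to the project's webhooks endpoint. The route here is an assumption following the pattern of the other project endpoints:
curl -X POST https://api.bestaiscraper.com/projects/proj_123/webhooks \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "webhook": {
      "url": "https://your-app.com/webhook",
      "events": ["session.completed"]
    }
  }'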
Best Practices
Project Organization
- Use descriptive names: Include target website and purpose
- Tag consistently: Develop a tagging taxonomy
- Document objectives: Clear descriptions help team members
- Archive completed projects: Keep workspace clean
Performance Optimization
- Start small: Test with limited pages before scaling
- Monitor resources: Track usage against plan limits
- Optimize actions: Remove unnecessary extractors
- Use filters: Target specific content areas
Data Management
- Regular exports: Don't rely solely on platform storage
- Version control: Track configuration changes
- Data validation: Verify extraction accuracy
- Cleanup old data: Remove outdated sessions
Need help organizing your projects? Check our Getting Started guide or contact support for personalized assistance.