The `crawl` command crawls a website and stores the data without running audit rules. Use it to separate crawling from analysis, for example to crawl a site first and analyze it later.
## Usage
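Invocation follows the usual `squirrel <command>` form (as with `squirrel analyze`, referenced under Output below), taking a single URL argument plus optional flags:

```bash
squirrel crawl <url> [options]
```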
## Arguments
| Argument | Description |
|---|---|
| `url` | The URL to crawl (required) |
## Options
| Option | Alias | Description | Default |
|---|---|---|---|
| `--maxPages` | `-m` | Maximum pages to crawl | `500` |
| `--refresh` | `-r` | Ignore cache, fetch all pages fresh | `false` |
| `--resume` | | Resume an interrupted crawl | `false` |
## Examples
### Basic Crawl
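Crawl a site with the default settings (`https://example.com` stands in for your own site):

```bash
squirrel crawl https://example.com
```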
### Crawl More Pages
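Raise the page limit above the default of 500 with `--maxPages` (or its alias `-m`):

```bash
squirrel crawl https://example.com --maxPages 2000
```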
### Fresh Crawl (Ignore Cache)
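Ignore the local cache and re-fetch every page with `--refresh`:

```bash
squirrel crawl https://example.com --refresh
```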
### Resume Interrupted Crawl
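Pick up a crawl that was cut off (for example by Ctrl+C or a network failure) with `--resume`; the URL is assumed here to identify the crawl being resumed:

```bash
squirrel crawl https://example.com --resume
```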
## Crawl Behavior
The `crawl` command:

- Fetches and stores HTML content for each page
- Extracts and follows internal links
- Respects robots.txt and sitemaps
- Deduplicates URLs automatically
- Caches page content locally
## Output
The crawled pages are cached locally. Use `squirrel analyze` to run audit rules on the stored data.
## Exit Codes
| Code | Meaning |
|---|---|
| `0` | Success |
| `1` | Error (invalid URL, crawl failed, etc.) |
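These codes make the command easy to gate in scripts or CI. A minimal sketch:

```bash
# Abort the pipeline if the crawl exits with a non-zero code
squirrel crawl https://example.com || exit 1
```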
## Configuration
The `crawl` command respects settings from `squirrel.toml`:
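A minimal sketch of such a file, assuming a `[crawl]` table whose keys mirror the CLI option names (the exact keys are not specified on this page and may differ):

```toml
# squirrel.toml — hypothetical keys mirroring the CLI options
[crawl]
maxPages = 500    # same default as --maxPages
refresh  = false  # equivalent to --refresh
```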
## Workflow

Run `crawl` separately from `analyze` (see the example after this list) when:
- You want to crawl once and analyze multiple times
- You are testing different rule configurations
- Crawling is slow and you want to iterate on analysis
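For example, a crawl-once, analyze-many loop (assuming `squirrel analyze` operates on the most recent stored crawl when given no arguments):

```bash
# Crawl once; pages are cached locally
squirrel crawl https://example.com --maxPages 1000

# Re-run analysis as often as needed without re-crawling,
# e.g. after changing rule settings in squirrel.toml
squirrel analyze
```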
## Related
- `analyze` - Analyze a stored crawl
- `audit` - Crawl + analyze in one command
- Configuration - Config file options