The crawl command crawls a website and stores the data without running audit rules. Use it when you want to separate crawling from analysis: crawl first, then analyze the stored data later.

Usage

squirrel crawl <url> [options]

Arguments

Argument   Description
url        The URL to crawl (required)

Options

Option       Alias   Description                           Default
--maxPages   -m      Maximum pages to crawl                500
--refresh    -r      Ignore cache, fetch all pages fresh   false
--resume             Resume interrupted crawl              false

Examples

Basic Crawl

squirrel crawl https://example.com

Crawl More Pages

squirrel crawl https://example.com -m 1000

Fresh Crawl (Ignore Cache)

squirrel crawl https://example.com --refresh

Resume Interrupted Crawl

squirrel crawl https://example.com --resume
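
Combine Options

The options above can also be combined in a single invocation, as is usual for CLI flags. This combined form is a sketch built from the documented options, not an example taken from elsewhere on this page:

squirrel crawl https://example.com --refresh -m 1000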

Crawl Behavior

The crawl command does the following (see the sketch after this list):
  • Fetches and stores HTML content for each page
  • Extracts and follows internal links
  • Respects robots.txt and sitemaps
  • Deduplicates URLs automatically
  • Caches page content locally
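
A rough Python sketch of that loop is shown below. It is illustrative only, not squirrel's implementation; the crawl function, the in-memory cache dict, and the 30-second timeout are assumptions made for this example, and it uses only the Python standard library.

# Illustrative sketch of the crawl loop described above -- not squirrel's actual code.
import urllib.robotparser
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, max_pages=500):
    origin = urlparse(start_url).netloc
    robots = urllib.robotparser.RobotFileParser(urljoin(start_url, "/robots.txt"))
    robots.read()

    queue = deque([start_url])
    seen = {start_url}   # deduplicate URLs
    cache = {}           # stand-in for the local page cache

    while queue and len(cache) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch("*", url):   # respect robots.txt
            continue
        try:
            html = urlopen(url, timeout=30).read().decode("utf-8", "replace")
        except OSError:
            continue
        cache[url] = html                    # store the HTML content

        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            link, _ = urldefrag(urljoin(url, href))
            # follow internal links only, skipping anything already seen
            if urlparse(link).netloc == origin and link not in seen:
                seen.add(link)
                queue.append(link)
    return cache

# Example: pages = crawl("https://example.com", max_pages=50)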

Output

Crawling: https://example.com
Max pages: 500

✓ Crawled 42 pages in 12.3s

Crawl ID: a7b3c2d1

After crawling, use squirrel analyze to run audit rules on the stored data.

Exit Codes

Code   Meaning
0      Success
1      Error (invalid URL, crawl failed, etc.)

Configuration

The crawl command respects settings from squirrel.toml:

[crawler]
max_pages = 100
delay_ms = 200
timeout_ms = 30000
include = ["/blog/*"]
exclude = ["/admin/*"]

See Configuration for all options.

Workflow

# 1. Crawl the site
squirrel crawl https://example.com

# 2. Analyze the crawl
squirrel analyze

# 3. View the report
squirrel report

This workflow is useful when:
  • You want to crawl once and analyze multiple times
  • You are testing different rule configurations
  • Crawling is slow and you want to iterate on the analysis (see the sketch below)
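
For example, to iterate on analysis without re-crawling (this assumes, as the workflow above implies, that squirrel analyze reuses the most recent stored crawl; the comment about editing rule settings in squirrel.toml is illustrative):

# Crawl once
squirrel crawl https://example.com

# First analysis pass over the stored crawl
squirrel analyze

# Adjust rule settings in squirrel.toml, then re-analyze the same data
squirrel analyze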