squirrelscan uses a TOML configuration file to customize crawling, analysis, and output behavior. The config file is optional; squirrelscan works out of the box with sensible defaults.

Config File Location

squirrelscan looks for squirrel.toml starting from your current directory and walking up to your home directory. This means you can:
  • Place a config in your project root for project-specific settings
  • Place a config in ~ for global defaults
To create a config file:
squirrel init
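A project-level squirrel.toml can be as small as a few overrides; for example (the values here are illustrative, and every key is documented in the reference below):
[crawler]
max_pages = 200
per_host_delay_ms = 500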

Zero-Config Mode

If no config file exists, squirrelscan uses defaults that work well for most sites:
  • Crawls up to 50 pages
  • 100ms delay between requests
  • Respects robots.txt
  • Checks external links for broken URLs (cached for 7 days)
  • Runs all rules except AI-powered ones
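Expressed as an explicit config, these defaults correspond roughly to the following sketch (see the reference below for the complete key list and exact default values):
[crawler]
max_pages = 50
delay_ms = 100
respect_robots = true

[external_links]
enabled = true
cache_ttl_days = 7

[rules]
enable = ["*"]
disable = ["ai/ai-content", "ai/llm-parsability", "content/quality"]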

Config Reference

project

Project-level settings for multi-domain support.
[project]
domains = ["example.com", "blog.example.com"]
Key     | Type     | Default | Description
domains | string[] | []      | Allowed domains for crawling. Supports subdomain wildcards: ["example.com"] allows www.example.com, api.example.com, etc. When empty, only the seed URL's host is crawled.

crawler

Controls how squirrelscan discovers and fetches pages.
[crawler]
max_pages = 100
concurrency = 10
per_host_delay_ms = 500
exclude = ["/admin/*", "*.pdf"]
Key                  | Type     | Default                     | Description
max_pages            | number   | 50                          | Maximum pages to crawl per audit
delay_ms             | number   | 100                         | Base delay between requests (ms)
timeout_ms           | number   | 30000                       | Request timeout (ms)
user_agent           | string   | ""                          | User agent (empty = random browser UA per crawl)
follow_redirects     | boolean  | true                        | Follow HTTP 3xx redirects
concurrency          | number   | 5                           | Maximum concurrent requests globally
per_host_concurrency | number   | 2                           | Maximum concurrent requests per host
per_host_delay_ms    | number   | 200                         | Minimum delay between requests to the same host (ms)
include              | string[] | []                          | URL patterns to include. If set, overrides domains
exclude              | string[] | []                          | URL patterns to exclude from crawling
allow_query_params   | string[] | []                          | Query parameters to preserve (others may be stripped for deduplication)
drop_query_prefixes  | string[] | ["utm_", "gclid", "fbclid"] | Query param prefixes to strip (tracking params)
respect_robots       | boolean  | true                        | Obey robots.txt rules and crawl-delay
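A sketch of the URL-filtering keys not used in the example above; the patterns and values here are illustrative:
[crawler]
include = ["/blog/*", "/docs/*"]
allow_query_params = ["page"]
drop_query_prefixes = ["utm_", "gclid", "fbclid", "ref"]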

rules

Configure which SEO rules run during analysis.
[rules]
enable = ["*"]
disable = ["ai/*", "content/quality"]
Key     | Type     | Default                                                     | Description
enable  | string[] | ["*"]                                                       | Patterns of rules to enable. Supports wildcards: *, seo/*, core/meta-title
disable | string[] | ["ai/ai-content", "ai/llm-parsability", "content/quality"] | Patterns of rules to disable. Takes precedence over enable
Rule IDs follow the format domain/rule-name. Common domains:
  • seo/* - Meta tags, canonical, robots
  • content/* - Word count, headings, thin content
  • links/* - Broken links, redirects
  • images/* - Alt text, dimensions
  • schema/* - JSON-LD validation
  • security/* - HTTPS, CSP, HSTS
  • performance/* - Core Web Vitals hints
  • ai/* - LLM-powered analysis (disabled by default)
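For example, to opt back in to the AI-powered rules (disabled by default) while leaving content/quality off, trim the default disable list:
[rules]
enable = ["*"]
disable = ["content/quality"]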

external_links

Configure external link checking during audits.
[external_links]
enabled = true
cache_ttl_days = 7
timeout_ms = 10000
concurrency = 5
Key            | Type    | Default | Description
enabled        | boolean | true    | Enable checking external links for broken URLs (4xx/5xx)
cache_ttl_days | number  | 7       | How long to cache external link check results (days). Results are shared across all site audits.
timeout_ms     | number  | 10000   | Timeout for each external link check (ms)
concurrency    | number  | 5       | Maximum concurrent external link checks
External link results are cached globally in ~/.squirrel/link-cache.db to avoid re-checking the same URLs across different site audits. This dramatically speeds up subsequent audits.

output

Default output settings for reports.
[output]
format = "json"
path = "report.json"
Key    | Type                         | Default   | Description
format | "console", "json", or "html" | "console" | Default output format
path   | string                       | -         | Default output file path
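For example, to default to an HTML report written to a fixed file (the path here is illustrative):
[output]
format = "html"
path = "report.html"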

rule_options

Per-rule configuration for rules that accept options.
[rule_options."content/word-count"]
min_words = 300

[rule_options."seo/meta-title"]
max_length = 70
Each rule documents its available options. This section is a key-value map where keys are rule IDs; quote the keys, since rule IDs contain a slash, which is not allowed in bare TOML keys.

Examples

High-Volume Crawl

For large sites, increase limits and be polite:
[crawler]
max_pages = 500
concurrency = 10
per_host_delay_ms = 500
respect_robots = true

Multi-Domain Project

Audit a main site plus its API docs and blog:
[project]
domains = ["example.com"]

[crawler]
max_pages = 200
Because subdomain wildcards are supported, this crawls example.com, www.example.com, docs.example.com, blog.example.com, etc.

CI Pipeline

Strict settings for automated checks:
[crawler]
max_pages = 100
timeout_ms = 10000

[rules]
enable = ["seo/*", "security/*", "links/*"]
disable = []

[output]
format = "json"

Exclude Admin Areas

Skip paths you don’t want crawled:
[crawler]
exclude = [
  "/admin/*",
  "/api/*",
  "/wp-admin/*",
  "*.pdf",
  "*?preview=*"
]

External Link Checks

If you want faster audits and don’t need external link validation:
[external_links]
enabled = false
For thorough audits with more concurrent checks:
[external_links]
enabled = true
cache_ttl_days = 1
timeout_ms = 5000
concurrency = 10

Validating Config

Check your config file for errors:
squirrel config validate
View the current effective config:
squirrel config show