squirrelscan uses TOML config files and JSON settings to customize crawling, analysis, and output behavior. This is optional - squirrelscan works out of the box with sensible defaults.

Configuration Hierarchy

squirrelscan uses a layered configuration system that merges settings from multiple sources:

1. Project Config (squirrel.toml)

Project-specific settings for crawling, rules, and output, located in your project directory.
Priority: Highest (overrides all other settings)
Location: squirrelscan searches for squirrel.toml starting from the current directory and walking up to your home directory.
Scope: Crawler settings, rule enable/disable, rule options, output format
Create:
squirrel init
Example locations:
/Users/you/projects/mysite/squirrel.toml       ← Project-specific
/Users/you/projects/squirrel.toml              ← Shared across projects in directory
/Users/you/squirrel.toml                       ← Global fallback

2. Local Settings (.squirrel/settings.json)

CLI behavior settings that can be scoped to a specific project directory.
Priority: Medium (overrides user settings)
Location: .squirrel/settings.json in the project directory (created on demand)
Scope: CLI settings such as notifications and channel preferences
Create:
squirrel self settings set notifications false --local
Example:
{
  "notifications": false,
  "channel": "beta"
}

3. User Settings (~/.squirrel/settings.json)

Global CLI settings that apply to all projects.
Priority: Low (overridden by local settings and project config)
Location:
  • Unix/macOS: ~/.squirrel/settings.json
  • Windows: %LOCALAPPDATA%\squirrel\settings.json
Scope: Update channel, auto-update, notifications, feedback email
Manage:
squirrel self settings show
squirrel self settings set channel beta
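For reference, a user settings file set this way might look like the following sketch. Only the notifications and channel keys appear elsewhere in this document; other key names (for auto-update or feedback email) may differ in practice:
{
  "channel": "beta",
  "notifications": true
}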

4. Defaults

Built-in defaults used when no config file is provided.
Priority: Lowest
Scope: All settings

Zero-Config Mode

If no config file exists, squirrelscan uses defaults that work well for most sites:
  • Crawls up to 500 pages
  • 100ms delay between requests
  • Respects robots.txt
  • Checks external links (cached for 7 days)
  • Runs all rules except AI-powered ones
  • Console output format
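The zero-config behavior above corresponds roughly to this explicit squirrel.toml, assembled from the defaults listed on this page (a sketch, not a verbatim dump of the built-in defaults):
[crawler]
max_pages = 500
delay_ms = 100
respect_robots = true

[rules]
enable = ["*"]
disable = ["ai/*"]

[external_links]
enabled = true
cache_ttl_days = 7

[output]
format = "console"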

Using Config Files

Create Project Config

Initialize a squirrel.toml in your current directory:
squirrel init
This creates a config file with all available settings and their defaults:
[project]
name = ""
domains = []

[crawler]
max_pages = 500
delay_ms = 100
timeout_ms = 30000
# ... all crawler options

[rules]
enable = ["*"]
disable = ["ai/*"]

[external_links]
enabled = true
cache_ttl_days = 7

[output]
format = "console"

[rule_options]
# Per-rule configuration

Edit Config

Modify values directly:
nano squirrel.toml
Or use the CLI:
squirrel config set crawler.max_pages 200

Validate Config

Check for errors:
squirrel config validate
View effective config:
squirrel config show

Config File Discovery

squirrelscan walks up from the current directory to find squirrel.toml:
Current directory: /Users/you/projects/mysite/blog
Searches:
  1. /Users/you/projects/mysite/blog/squirrel.toml  ← Check here first
  2. /Users/you/projects/mysite/squirrel.toml       ← Then parent
  3. /Users/you/projects/squirrel.toml              ← Then grandparent
  4. /Users/you/squirrel.toml                       ← Stop at home
This allows you to:
  • Share config across multiple projects in a directory
  • Override parent configs in subdirectories
  • Set global defaults in your home directory
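For example, a shared config in the parent directory can set conservative defaults, while a project-level file takes precedence when you run from within that project (a sketch; paths follow the discovery example above):
# /Users/you/projects/squirrel.toml (shared across projects)
[crawler]
max_pages = 500
respect_robots = true

# /Users/you/projects/mysite/squirrel.toml (found first when run inside mysite)
[crawler]
max_pages = 100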

Full Config Reference

Project Settings

Project-level metadata and multi-domain support.
[project]
name = "mysite"
domains = ["example.com"]
  • name (string; default: current working directory name): Project name for local URLs (localhost, 127.0.0.1)
  • domains (string[]; default: []): Allowed domains for crawling. Supports subdomain wildcards: ["example.com"] allows www.example.com, api.example.com, etc. When empty, only the seed URL's domain is crawled.
Example - Multi-domain:
[project]
domains = ["example.com"]
Crawls: example.com, www.example.com, docs.example.com, blog.example.com

Crawler Settings

Controls how squirrelscan discovers and fetches pages.
[crawler]
max_pages = 200
concurrency = 10
per_host_delay_ms = 500
exclude = ["/admin/*", "*.pdf"]
  • max_pages (number; default: 500): Maximum pages to crawl per audit
  • delay_ms (number; default: 100): Base delay between requests (ms)
  • timeout_ms (number; default: 30000): Request timeout (ms)
  • user_agent (string; default: ""): Custom user agent (empty = random browser UA per crawl)
  • follow_redirects (boolean; default: true): Follow HTTP 3xx redirects
  • concurrency (number; default: 5): Maximum concurrent requests globally
  • per_host_concurrency (number; default: 2): Maximum concurrent requests per host
  • per_host_delay_ms (number; default: 200): Minimum delay between requests to the same host (ms)
  • include (string[]; default: []): URL patterns to include (glob format). When set, only matching URLs are crawled.
  • exclude (string[]; default: []): URL patterns to exclude from crawling (glob format)
  • allow_query_params (string[]; default: []): Query parameters to preserve (others are stripped for deduplication)
  • drop_query_prefixes (string[]; default: ["utm_", "gclid", "fbclid"]): Query param prefixes to strip (tracking params)
  • respect_robots (boolean; default: true): Obey robots.txt rules and crawl-delay
  • breadth_first (boolean; default: true): Use breadth-first crawling for better site coverage
  • max_prefix_budget (number; default: 0.25): Maximum fraction (0-1) of the crawl budget for any single path prefix
URL Pattern Syntax: Patterns use glob syntax:
  • * - Match anything except /
  • ** - Match anything including /
  • ? - Match single character
  • [abc] - Match character set
Examples:
[crawler]
# Only crawl blog
include = ["/blog/**"]

# Exclude admin, API, and PDFs
exclude = ["/admin/*", "/api/*", "*.pdf"]

# Preserve specific query params
allow_query_params = ["page", "id", "category"]

# Strip all UTM and tracking params
drop_query_prefixes = ["utm_", "gclid", "fbclid", "mc_", "_ga"]

Rules Configuration

Configure which audit rules run during analysis.
[rules]
enable = ["*"]
disable = ["ai/*", "content/quality"]
  • enable (string[]; default: ["*"]): Patterns of rules to enable. Supports wildcards.
  • disable (string[]; default: ["ai/ai-content", "ai/llm-parsability", "content/quality"]): Patterns of rules to disable. Takes precedence over enable.
Rule Pattern Syntax: Rule IDs follow the format category/rule-name:
  • * - All rules
  • core/* - All rules in core category
  • core/meta-title - Specific rule
Common Categories:
  • core - Meta tags, canonical, H1, Open Graph
  • content - Word count, headings, duplicates, freshness
  • links - Broken links, redirects, orphan pages
  • images - Alt text, dimensions, formats, lazy loading
  • schema - JSON-LD validation, structured data
  • security - HTTPS, HSTS, CSP, headers
  • a11y - Accessibility (ARIA, contrast, focus)
  • i18n - Internationalization (lang, hreflang)
  • perf - Performance hints (LCP, CLS, INP)
  • social - Social media (Open Graph, Twitter Cards)
  • crawl - Crawlability (robots, sitemaps, indexability)
  • url - URL structure (length, keywords, parameters)
  • mobile - Mobile optimization (viewport, tap targets)
  • legal - Legal compliance (privacy, cookies, terms)
  • local - Local SEO (NAP, geo tags)
  • video - Video optimization (schema, thumbnails)
  • analytics - Analytics tracking (GTM, consent)
  • eeat - E-E-A-T signals (author, expertise, trust)
  • ai - AI-powered analysis (disabled by default, requires LLM)
Examples: Enable only specific categories:
[rules]
enable = ["core/*", "links/*", "images/*"]
disable = []
Disable slow or AI-powered rules:
[rules]
enable = ["*"]
disable = ["ai/*", "content/quality", "perf/*"]
Disable specific rules:
[rules]
enable = ["*"]
disable = ["content/word-count", "images/modern-format"]

External Links

Configure external link checking during audits.
[external_links]
enabled = true
cache_ttl_days = 7
timeout_ms = 10000
concurrency = 5
  • enabled (boolean; default: true): Check external links for broken URLs (4xx/5xx)
  • cache_ttl_days (number; default: 7): How long to cache external link check results (days); results are shared across all audits
  • timeout_ms (number; default: 10000): Timeout for each external link check (ms)
  • concurrency (number; default: 5): Maximum concurrent external link checks
External link results are cached globally in ~/.squirrel/link-cache.db to avoid re-checking the same URLs across different site audits.
Examples:
Disable for faster audits:
[external_links]
enabled = false
Aggressive checking with short cache:
[external_links]
enabled = true
cache_ttl_days = 1
timeout_ms = 5000
concurrency = 10

Output Settings

Default output format and path for reports.
[output]
format = "json"
path = "report.json"
  • format ("console" | "text" | "json" | "html" | "markdown" | "llm"; default: "console"): Default output format
  • path (string; optional): Default output file path
Formats:
  • console - Colored terminal output
  • text - Plain text without colors
  • json - Machine-readable JSON
  • html - Interactive browser report
  • markdown - Markdown for docs
  • llm - Optimized for LLM consumption
Note: CLI flags (--format, --output) override these defaults.
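For instance, to default every audit to an interactive HTML report, you might use the format and path keys above like this (a sketch; the file name is illustrative):
[output]
format = "html"
path = "report.html"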

Rule Options

Per-rule configuration for rules that accept options.
[rule_options."core/meta-title"]
min_length = 30
max_length = 60

[rule_options."content/word-count"]
min_words = 500
Each rule documents its available options on its documentation page. This section uses dotted keys, where each key is a rule ID.
Common Rule Options:
# Title length
[rule_options."core/meta-title"]
min_length = 30
max_length = 60

# Description length
[rule_options."core/meta-description"]
min_length = 120
max_length = 160

# Word count
[rule_options."content/word-count"]
min_words = 300

# Image file size
[rule_options."images/image-file-size"]
max_size_kb = 500
See individual rule documentation for all available options.

Configuration Examples

High-Volume Crawl

For large sites with increased politeness:
[crawler]
max_pages = 1000
concurrency = 10
per_host_delay_ms = 500
respect_robots = true

Multi-Domain Project

Audit main site plus subdomains:
[project]
domains = ["example.com"]

[crawler]
max_pages = 500
Crawls: example.com, www.example.com, docs.example.com, blog.example.com

CI/CD Pipeline

Strict settings for automated checks:
[crawler]
max_pages = 100
timeout_ms = 10000

[rules]
enable = ["core/*", "security/*", "links/*"]
disable = []

[external_links]
enabled = true
cache_ttl_days = 1

[output]
format = "json"
path = "report.json"

Exclude Admin Areas

Skip paths you don’t want crawled:
[crawler]
exclude = [
  "/admin/*",
  "/wp-admin/*",
  "/api/*",
  "*.pdf",
  "*?preview=*"
]

Fast Local Development

Lightweight config for quick local tests:
[crawler]
max_pages = 50
concurrency = 10
per_host_delay_ms = 0

[external_links]
enabled = false

[rules]
enable = ["core/*", "content/*"]
disable = []

SEO-Focused Audit

Focus on SEO rules only:
[rules]
enable = ["core/*", "content/*", "schema/*", "crawl/*", "url/*"]
disable = []

Accessibility Audit

Focus on accessibility:
[rules]
enable = ["a11y/*", "mobile/*"]
disable = []

Performance Audit

Focus on performance:
[rules]
enable = ["perf/*", "images/*"]
disable = []

Managing Settings

View Current Config

Show effective config (merged from all sources):
squirrel config show
Show config file path:
squirrel config path

Modify Config

Edit file directly:
nano squirrel.toml
Or use CLI:
squirrel config set crawler.max_pages 200
Preview change:
squirrel config set crawler.max_pages 200 --dry-run

Validate Config

Check for errors:
squirrel config validate
Output on success:
Config valid: /path/to/squirrel.toml
Output on error:
Invalid config: crawler.max_pages: Expected number, received string

CLI Settings vs Project Config

  • Project config: squirrel.toml; covers crawler, rules, and output; managed with squirrel init and squirrel config
  • User settings: ~/.squirrel/settings.json; covers CLI behavior and updates; managed with squirrel self settings
  • Local settings: .squirrel/settings.json; covers CLI behavior (project-scoped); managed with squirrel self settings --local
Project Config (squirrel.toml):
  • Crawler settings (max pages, delays, concurrency)
  • Rule enable/disable
  • Rule options
  • Output format
  • External link checking
CLI Settings (settings.json):
  • Update channel (stable/beta)
  • Auto-update preferences
  • Notification settings
  • Feedback email
  • Dismissed updates