squirrelscan uses TOML config files and JSON settings to customize crawling, analysis, and output behavior. Configuration is optional - squirrelscan works out of the box with sensible defaults.
Configuration Hierarchy
squirrelscan uses a layered configuration system that merges settings from multiple sources:
1. Project Config (squirrel.toml)
Project-specific settings for crawling, rules, and output. Located in your project directory.
Priority: Highest (overrides all other settings)
Location: squirrelscan searches for squirrel.toml starting from the current directory and walking up to your home directory.
Create:
Scope: Crawler settings, rule enable/disable, rule options, output format
Example locations:
/Users/you/projects/mysite/squirrel.toml ← Project-specific
/Users/you/projects/squirrel.toml ← Shared across projects in directory
/Users/you/squirrel.toml ← Global fallback
2. Local Settings (.squirrel/settings.json)
CLI behavior settings that can be scoped to a specific project directory.
Priority: Medium (overrides user settings)
Location: .squirrel/settings.json in project directory (created on demand)
Scope: CLI settings like notifications, channel preferences
Create:
squirrel self settings set notifications false --local
Example:
{
"notifications": false,
"channel": "beta"
}
3. User Settings (~/.squirrel/settings.json)
Global CLI settings that apply to all projects.
Priority: Low (overridden by local settings; overrides only the built-in defaults)
Location:
- Unix/macOS:
~/.squirrel/settings.json
- Windows:
%LOCALAPPDATA%\squirrel\settings.json
Scope: Update channel, auto-update, notifications, feedback email
Manage:
squirrel self settings show
squirrel self settings set channel beta
4. Defaults
Built-in defaults when no config is provided.
Priority: Lowest
Scope: All settings
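The precedence order above can be pictured as a chain of dictionary merges applied from lowest to highest priority. The following is a minimal sketch, not squirrelscan's actual implementation: the function name `merge_layers` and the sample values are hypothetical, and it models only the flat JSON settings layers (squirrel.toml covers a different scope and is left out).

```python
from typing import Any


def merge_layers(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge settings dicts; later (higher-priority) layers win."""
    merged: dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return merged


# Hypothetical values, ordered from lowest to highest priority.
defaults = {"notifications": True, "channel": "stable"}
user_settings = {"channel": "beta"}        # stands in for ~/.squirrel/settings.json
local_settings = {"notifications": False}  # stands in for .squirrel/settings.json

effective = merge_layers(defaults, user_settings, local_settings)
print(effective)  # {'notifications': False, 'channel': 'beta'}
```

Each key takes its value from the highest-priority layer that defines it, so a local setting only needs to list the keys it wants to change.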
Zero-Config Mode
If no config file exists, squirrelscan uses defaults that work well for most sites:
- Crawls up to 500 pages
- 100ms delay between requests
- Respects robots.txt
- Checks external links (cached for 7 days)
- Runs all rules except AI-powered ones
- Console output format
Using Config Files
Create Project Config
Initialize a squirrel.toml in your current directory:
This creates a config file with all available settings and their defaults:
[project]
name = ""
domains = []
[crawler]
max_pages = 500
delay_ms = 100
timeout_ms = 30000
# ... all crawler options
[rules]
enable = ["*"]
disable = ["ai/*"]
[external_links]
enabled = true
cache_ttl_days = 7
[output]
format = "console"
[rule_options]
# Per-rule configuration
Edit Config
Modify values directly:
Or use the CLI:
squirrel config set crawler.max_pages 200
Validate Config
Check for errors:
View effective config:
Config File Discovery
squirrelscan walks up from the current directory to find squirrel.toml:
Current directory: /Users/you/projects/mysite/blog
Searches:
1. /Users/you/projects/mysite/blog/squirrel.toml ← Check here first
2. /Users/you/projects/mysite/squirrel.toml ← Then parent
3. /Users/you/projects/squirrel.toml ← Then grandparent
4. /Users/you/squirrel.toml ← Stop at home
This allows you to:
- Share config across multiple projects in a directory
- Override parent configs in subdirectories
- Set global defaults in your home directory
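The walk-up search described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than squirrelscan's real code: `find_config` is a hypothetical helper, and the behavior when the starting directory lies outside the home directory (here: keep walking to the filesystem root) is a guess.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_config(start: Path, home: Path, name: str = "squirrel.toml") -> Optional[Path]:
    """Return the nearest config at or above `start`, stopping at `home`."""
    current = start
    while True:
        candidate = current / name
        if candidate.is_file():
            return candidate
        # Stop once the home directory has been checked, or at the filesystem root.
        if current == home or current.parent == current:
            return None
        current = current.parent


# Demo in a throwaway directory tree (paths are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp).resolve()
    blog = home / "projects" / "mysite" / "blog"
    blog.mkdir(parents=True)
    (home / "projects" / "squirrel.toml").write_text("[crawler]\nmax_pages = 200\n")
    found = find_config(blog, home)
    relative = found.relative_to(home).as_posix()
    print(relative)  # projects/squirrel.toml
```

Because the first match wins, a squirrel.toml placed in a subdirectory shadows any config further up the tree.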
Full Config Reference
Project Settings
Project-level metadata and multi-domain support.
[project]
name = "mysite"
domains = ["example.com"]
| Key | Type | Default | Description |
|---|---|---|---|
| name | string | Current directory name | Project name for local URLs (localhost, 127.0.0.1). Defaults to the current working directory name. |
| domains | string[] | [] | Allowed domains for crawling. Supports subdomain wildcards - ["example.com"] allows www.example.com, api.example.com, etc. When empty, only the seed URL’s domain is crawled. |
Example - Multi-domain:
[project]
domains = ["example.com"]
Crawls: example.com, www.example.com, docs.example.com, blog.example.com
Crawler Settings
Controls how squirrelscan discovers and fetches pages.
[crawler]
max_pages = 200
concurrency = 10
per_host_delay_ms = 500
exclude = ["/admin/*", "*.pdf"]
| Key | Type | Default | Description |
|---|---|---|---|
| max_pages | number | 500 | Maximum pages to crawl per audit |
| delay_ms | number | 100 | Base delay between requests (ms) |
| timeout_ms | number | 30000 | Request timeout (ms) |
| user_agent | string | "" | Custom user agent (empty = random browser UA per crawl) |
| follow_redirects | boolean | true | Follow HTTP 3xx redirects |
| concurrency | number | 5 | Maximum concurrent requests globally |
| per_host_concurrency | number | 2 | Maximum concurrent requests per host |
| per_host_delay_ms | number | 200 | Minimum delay between requests to the same host (ms) |
| include | string[] | [] | URL patterns to include (glob format). When set, only matching URLs are crawled. |
| exclude | string[] | [] | URL patterns to exclude from crawling (glob format) |
| allow_query_params | string[] | [] | Query parameters to preserve (others stripped for deduplication) |
| drop_query_prefixes | string[] | ["utm_", "gclid", "fbclid"] | Query param prefixes to strip (tracking params) |
| respect_robots | boolean | true | Obey robots.txt rules and crawl-delay |
| breadth_first | boolean | true | Use breadth-first crawling for better site coverage |
| max_prefix_budget | number | 0.25 | Maximum fraction (0-1) of the crawl budget that any single path prefix may use |
URL Pattern Syntax:
Patterns use glob syntax:
* - Match anything except /
** - Match anything including /
? - Match single character
[abc] - Match character set
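To make the difference between `*` and `**` concrete, here is a small sketch that translates the four wildcards listed above into a regular expression. It is not squirrelscan's matcher: `glob_match` is a hypothetical helper, the pattern is assumed to be matched against the whole URL path, and `?` is assumed to match any character including `/`.

```python
import re


def glob_match(pattern: str, path: str) -> bool:
    """Match `path` against a glob using *, **, ? and [abc] as described above."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if pattern.startswith("**", i):
            out.append(".*")          # ** crosses path separators
            i += 2
            continue
        if ch == "*":
            out.append("[^/]*")       # * stays within one path segment
        elif ch == "?":
            out.append(".")           # assumption: any single character
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                out.append(re.escape(ch))    # unclosed bracket: literal "["
            else:
                out.append(pattern[i:end + 1])  # character set, passed through
                i = end
        else:
            out.append(re.escape(ch))
        i += 1
    return re.match("".join(out) + r"\Z", path) is not None


print(glob_match("/admin/*", "/admin/users"))     # True
print(glob_match("/admin/*", "/admin/users/1"))   # False: * does not cross "/"
print(glob_match("/blog/**", "/blog/2024/post"))  # True: ** does
```

Under these assumptions, `/admin/*` covers only direct children of `/admin/`; use `/admin/**` to cover the whole subtree.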
Examples:
[crawler]
# Only crawl blog
include = ["/blog/**"]
# Exclude admin, API, and PDFs
exclude = ["/admin/*", "/api/*", "*.pdf"]
# Preserve specific query params
allow_query_params = ["page", "id", "category"]
# Strip all UTM and tracking params
drop_query_prefixes = ["utm_", "gclid", "fbclid", "mc_", "_ga"]
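The effect of the two query-parameter settings on deduplication can be sketched as a URL normalizer. The exact semantics are not spelled out here, so this sketch makes two assumptions: parameters whose names start with a dropped prefix are always removed, and a non-empty `allow_query_params` list acts as a whitelist for the remaining ones. `normalize_url` is a hypothetical name, not part of squirrelscan.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str, allow_query_params: list[str], drop_query_prefixes: list[str]) -> str:
    """Strip tracking parameters so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if any(name.startswith(prefix) for prefix in drop_query_prefixes):
            continue  # tracking parameter
        if allow_query_params and name not in allow_query_params:
            continue  # ASSUMPTION: a non-empty allow list acts as a whitelist
        kept.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(kept)))


drop = ["utm_", "gclid", "fbclid"]
a = normalize_url("https://example.com/list?page=2&utm_source=news", ["page"], drop)
b = normalize_url("https://example.com/list?page=2&gclid=abc123", ["page"], drop)
print(a)       # https://example.com/list?page=2
print(a == b)  # True: both variants collapse to one crawl target
```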
Rules Configuration
Configure which audit rules run during analysis.
[rules]
enable = ["*"]
disable = ["ai/*", "content/quality"]
| Key | Type | Default | Description |
|---|---|---|---|
| enable | string[] | ["*"] | Patterns of rules to enable. Supports wildcards. |
| disable | string[] | ["ai/ai-content", "ai/llm-parsability", "content/quality"] | Patterns of rules to disable. Takes precedence over enable. |
Rule Pattern Syntax:
Rule IDs follow the format category/rule-name:
* - All rules
core/* - All rules in core category
core/meta-title - Specific rule
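The interaction between `enable` and `disable` can be sketched with Python's standard `fnmatch` module. This is an illustration of the precedence described above, not the tool's own matcher; `is_rule_enabled` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_rule_enabled(rule_id: str, enable: list[str], disable: list[str]) -> bool:
    """A rule runs if it matches an enable pattern and no disable pattern."""
    if any(fnmatchcase(rule_id, pattern) for pattern in disable):
        return False  # disable takes precedence over enable
    return any(fnmatchcase(rule_id, pattern) for pattern in enable)


enable = ["*"]
disable = ["ai/*", "content/quality"]

print(is_rule_enabled("core/meta-title", enable, disable))  # True
print(is_rule_enabled("ai/ai-content", enable, disable))    # False
print(is_rule_enabled("content/quality", enable, disable))  # False
```

Since a disable match always wins, narrowing an audit to a few categories is usually clearer when `enable` lists those categories and `disable` is left empty.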
Common Categories:
core - Meta tags, canonical, H1, Open Graph
content - Word count, headings, duplicates, freshness
links - Broken links, redirects, orphan pages
images - Alt text, dimensions, formats, lazy loading
schema - JSON-LD validation, structured data
security - HTTPS, HSTS, CSP, headers
a11y - Accessibility (ARIA, contrast, focus)
i18n - Internationalization (lang, hreflang)
perf - Performance hints (LCP, CLS, INP)
social - Social media (Open Graph, Twitter Cards)
crawl - Crawlability (robots, sitemaps, indexability)
url - URL structure (length, keywords, parameters)
mobile - Mobile optimization (viewport, tap targets)
legal - Legal compliance (privacy, cookies, terms)
local - Local SEO (NAP, geo tags)
video - Video optimization (schema, thumbnails)
analytics - Analytics tracking (GTM, consent)
eeat - E-E-A-T signals (author, expertise, trust)
ai - AI-powered analysis (disabled by default, requires LLM)
Examples:
Enable only specific categories:
[rules]
enable = ["core/*", "links/*", "images/*"]
disable = []
Disable slow or AI-powered rules:
[rules]
enable = ["*"]
disable = ["ai/*", "content/quality", "perf/*"]
Disable specific rules:
[rules]
enable = ["*"]
disable = ["content/word-count", "images/modern-format"]
External Links Configuration
Configure external link checking during audits.
[external_links]
enabled = true
cache_ttl_days = 7
timeout_ms = 10000
concurrency = 5
| Key | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true | Enable checking external links for broken URLs (4xx/5xx) |
| cache_ttl_days | number | 7 | How long to cache external link check results (days). Results are shared across all audits. |
| timeout_ms | number | 10000 | Timeout for each external link check (ms) |
| concurrency | number | 5 | Maximum concurrent external link checks |
External link results are cached globally in ~/.squirrel/link-cache.db to avoid re-checking the same URLs across different site audits.
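The role of `cache_ttl_days` can be pictured as a simple freshness check. This sketch says nothing about the actual format of link-cache.db; `is_fresh` is a hypothetical helper, and whether an entry exactly at the TTL boundary counts as fresh is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(checked_at: datetime, now: datetime, cache_ttl_days: int = 7) -> bool:
    """True if a cached result is younger than the TTL and can be reused."""
    return now - checked_at < timedelta(days=cache_ttl_days)


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
print(is_fresh(datetime(2025, 1, 5, tzinfo=timezone.utc), now))  # True: 5 days old
print(is_fresh(datetime(2025, 1, 1, tzinfo=timezone.utc), now))  # False: 9 days old
```

A shorter TTL means more links are re-checked on each audit, which trades speed for fresher results.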
Examples:
Disable for faster audits:
[external_links]
enabled = false
Aggressive checking with short cache:
[external_links]
enabled = true
cache_ttl_days = 1
timeout_ms = 5000
concurrency = 10
Output Settings
Default output format and path for reports.
[output]
format = "json"
path = "report.json"
| Key | Type | Default | Description |
|---|---|---|---|
| format | "console", "text", "json", "html", "markdown", or "llm" | "console" | Default output format |
| path | string | - | Default output file path (optional) |
Formats:
console - Colored terminal output
text - Plain text without colors
json - Machine-readable JSON
html - Interactive browser report
markdown - Markdown for docs
llm - Optimized for LLM consumption
Note: CLI flags (--format, --output) override these defaults.
Rule Options
Per-rule configuration for rules that accept options.
[rule_options."core/meta-title"]
min_length = 30
max_length = 60
[rule_options."content/word-count"]
min_words = 500
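Each rule documents its available options on its documentation page. Each rule gets its own sub-table under rule_options, keyed by the rule ID. The rule ID must be quoted because it contains a slash, which TOML does not allow in bare keys.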
Common Rule Options:
# Title length
[rule_options."core/meta-title"]
min_length = 30
max_length = 60
# Description length
[rule_options."core/meta-description"]
min_length = 120
max_length = 160
# Word count
[rule_options."content/word-count"]
min_words = 300
# Image file size
[rule_options."images/image-file-size"]
max_size_kb = 500
See individual rule documentation for all available options.
Configuration Examples
High-Volume Crawl
For large sites with increased politeness:
[crawler]
max_pages = 1000
concurrency = 10
per_host_delay_ms = 500
respect_robots = true
Multi-Domain Project
Audit main site plus subdomains:
[project]
domains = ["example.com"]
[crawler]
max_pages = 500
Crawls: example.com, www.example.com, docs.example.com, blog.example.com
CI/CD Pipeline
Strict settings for automated checks:
[crawler]
max_pages = 100
timeout_ms = 10000
[rules]
enable = ["core/*", "security/*", "links/*"]
disable = []
[external_links]
enabled = true
cache_ttl_days = 1
[output]
format = "json"
path = "report.json"
Exclude Admin Areas
Skip paths you don’t want crawled:
[crawler]
exclude = [
"/admin/*",
"/wp-admin/*",
"/api/*",
"*.pdf",
"*?preview=*"
]
Fast Local Development
Lightweight config for quick local tests:
[crawler]
max_pages = 50
concurrency = 10
per_host_delay_ms = 0
[external_links]
enabled = false
[rules]
enable = ["core/*", "content/*"]
disable = []
SEO-Focused Audit
Focus on SEO rules only:
[rules]
enable = ["core/*", "content/*", "schema/*", "crawl/*", "url/*"]
disable = []
Accessibility Audit
Focus on accessibility:
[rules]
enable = ["a11y/*", "mobile/*"]
disable = []
Performance Audit
Focus on performance:
[rules]
enable = ["perf/*", "images/*"]
disable = []
Managing Settings
View Current Config
Show effective config (merged from all sources):
Show config file path:
Modify Config
Edit file directly:
Or use CLI:
squirrel config set crawler.max_pages 200
Preview change:
squirrel config set crawler.max_pages 200 --dry-run
Validate Config
Check for errors:
Output on success:
Config valid: /path/to/squirrel.toml
Output on error:
Invalid config: crawler.max_pages: Expected number, received string
CLI Settings vs Project Config
| Setting Type | File | Scope | Managed With |
|---|---|---|---|
| Project config | squirrel.toml | Crawler, rules, output | squirrel init, squirrel config |
| User settings | ~/.squirrel/settings.json | CLI behavior, updates | squirrel self settings |
| Local settings | .squirrel/settings.json | CLI behavior (project-scoped) | squirrel self settings --local |
Project Config (squirrel.toml):
- Crawler settings (max pages, delays, concurrency)
- Rule enable/disable
- Rule options
- Output format
- External link checking
CLI Settings (settings.json):
- Update channel (stable/beta)
- Auto-update preferences
- Notification settings
- Feedback email
- Dismissed updates