The [external_links] section controls how squirrelscan validates outbound links during crawls.
Configuration
Options
enabled
Type: boolean
Default: true
Enable external link checking during crawl.
When enabled, squirrelscan validates all external links found during crawling to detect broken outbound links (404s, timeouts, DNS failures).
Examples:
Enable (default): set enabled to true.
Disable: set enabled to false.
When to disable:
- Local development (localhost URLs)
- Network restrictions (firewalls, VPNs)
- Speed priority over external link validation
- Large sites with many external links
Impact when disabled:
- Faster crawls (no external HTTP requests)
- The links/broken-external-links rule won’t report issues
- Outbound link quality is not validated
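As a minimal sketch of the disabled case, assuming the configuration file uses TOML syntax (suggested by the [external_links] section header, not confirmed here):

```toml
[external_links]
# Skip all external link validation; the default is true.
enabled = false
```

Leaving the key out entirely should have the same effect as setting it to true, since that is the documented default.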
cache_ttl_days
Type: number
Default: 7 (days)
Range: 1 to 365 recommended
How long to cache external link check results in days.
External link checks are cached globally per URL to avoid repeatedly checking the same external resources across multiple crawls.
How caching works:
- First crawl checks https://example.com/article
- Result cached for 7 days (default)
- Next crawl within 7 days reuses cached result
- After 7 days, link is re-checked
| Use Case | Recommended TTL | Reason |
|---|---|---|
| Daily crawls | 1-2 days | Fresh data daily |
| Weekly crawls | 7 days (default) | Balance freshness/speed |
| Monthly crawls | 14-30 days | Reduce external requests |
| CI/CD pipeline | 1 day | Catch issues quickly |
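For instance, the daily-crawl and CI/CD rows of the table could be expressed as follows. This is a sketch that assumes TOML syntax for the configuration file; the value is taken from the table above.

```toml
[external_links]
# Re-check every external link at most one day after its last check.
cache_ttl_days = 1
```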
timeout_ms
Type: number
Default: 10000 (10 seconds)
Range: 1000 to 60000 (1-60 seconds)
Timeout for external link checks in milliseconds.
External link checks use HEAD requests by default (faster, no body download). If a site doesn’t respond within this timeout, it’s marked as failed.
Examples:
Fast timeout (5 seconds): set timeout_ms to 5000.
When a check times out:
- Link marked as “timeout”
- Reported as broken external link
- Counted in the links/broken-external-links rule
| Scenario | Timeout | Reason |
|---|---|---|
| Most sites | 10s (default) | Balance speed/reliability |
| Fast CDN links | 5s | CDNs are fast |
| Slow sites | 20-30s | Allow slow responses |
| CI/CD | 5-10s | Fail fast |
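The value is given in milliseconds, so the seconds in the table need converting. A sketch of the "slow sites" row, again assuming TOML syntax:

```toml
[external_links]
# 30 seconds, expressed in milliseconds (allowed range: 1000 to 60000).
timeout_ms = 30000
```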
concurrency
Type: number
Default: 5
Range: 1 to 20 recommended
Maximum number of concurrent external link checks.
Controls how many external URLs are validated simultaneously during crawling.
Examples:
Sequential (one at a time): set concurrency to 1.
Higher concurrency:
- ✓ Faster external link validation
- ✓ Better for sites with many external links
- ✗ More network connections
- ✗ May trigger rate limits
Lower concurrency:
- ✓ Polite to external sites
- ✓ Less network overhead
- ✗ Slower external link validation
| Use Case | Concurrency | Reason |
|---|---|---|
| Most sites | 5 (default) | Good balance |
| Many external links (100+) | 10-15 | Speed up validation |
| Slow network | 2-3 | Avoid overload |
| Rate-limited | 1-2 | Avoid 429 errors |
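A site with well over 100 external links might raise the limit along these lines. The syntax is assumed to be TOML, and the value is simply one point in the 10-15 range from the table.

```toml
[external_links]
# Validate up to 12 external URLs at the same time (default: 5).
concurrency = 12
```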
How External Link Checking Works
Request Strategy
- HEAD request first
  - Faster (no body download)
  - Checks if URL responds
  - Most efficient
- GET fallback
  - Used if HEAD fails or is not supported
  - Downloads full response
  - Slower but more reliable
- User agent
  - Uses configured crawler user agent
  - Browser impersonation if enabled
  - Respects the request_method setting
Status Detection
| Status | Meaning | Reported As |
|---|---|---|
| 200-299 | Success | Working link |
| 300-399 | Redirect | Working (followed) |
| 400-499 | Client error | Broken link |
| 500-599 | Server error | Broken link |
| Timeout | No response | Broken link |
| DNS failure | Domain not found | Broken link |
Caching Behavior
Cached for TTL period:
- 200-299 (success)
- 404 (not found)
- Redirects (with final destination)
Not cached:
- Timeouts (may be transient)
- Server errors (5xx - may be temporary)
- DNS failures (may recover)
Configuration Examples
Fast Crawl (Disable External Links)
For local development or quick audits, set enabled to false.
Aggressive External Link Checking
For comprehensive link validation, such as:
- Link quality audits
- Outbound link monitoring
- Content freshness validation
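One possible aggressive profile is sketched below. The syntax is assumed to be TOML, and the specific values are illustrative picks from the recommendation tables earlier on this page, not settings taken from the original example.

```toml
[external_links]
enabled = true
# Illustrative values: short cache, generous timeout, high parallelism.
cache_ttl_days = 1
timeout_ms = 30000
concurrency = 15
```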
Polite External Link Checking
Conservative settings for respectful crawling, suited to:
- Many external links
- Avoid rate limits
- Network restrictions
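A conservative profile could look like this. It assumes TOML syntax, and the values are illustrative choices from the "rate-limited" and "monthly crawls" rows of the tables above rather than the original example.

```toml
[external_links]
enabled = true
# Illustrative values: long cache and low parallelism to limit outbound requests.
cache_ttl_days = 14
concurrency = 2
```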
CI/CD Pipeline
Fast feedback with fresh data, for:
- Automated testing
- PR checks
- Daily builds
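For a pipeline, the CI/CD rows of the TTL and timeout tables suggest something like the following. This is a sketch that assumes TOML syntax; the values are illustrative.

```toml
[external_links]
enabled = true
# Illustrative values: re-check daily and fail fast on slow hosts.
cache_ttl_days = 1
timeout_ms = 5000
```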
Performance Impact
External link checking adds overhead to crawls.
Typical impact:
With external links enabled (default):
External Link Rules
External link configuration affects these rules:
links/broken-external-links
What it checks:
- External links returning 4xx/5xx errors
- Timeout failures
- DNS resolution failures
links/https-downgrade
What it checks:
- HTTPS page linking to HTTP external URL
- Security downgrade warnings
Troubleshooting
External links all timeout
Symptoms:
- Many “timeout” failures
- External link checks are slow
False positives (working links marked broken)
Cause: Some sites block HEAD requests or require browser user agents.
Solution: The external link checker automatically falls back to GET requests.
Verify crawler user agent:
Too slow with many external links
Symptoms:
- Crawl takes a long time
- Hundreds of external links
Cache not working
Verify cache location:
Complete Example
- Crawls up to 500 pages
- Validates all external links
- Caches results for 7 days
- Allows 15s per external link
- Checks 10 external links in parallel
- Reports broken external links in audit
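The external link part of the behaviour listed above corresponds to settings like these, assuming TOML syntax. The 500-page limit belongs to the crawler settings, whose key names are not given in this section, so it is left out of the sketch.

```toml
[external_links]
enabled = true        # validate all external links
cache_ttl_days = 7    # cache results for 7 days
timeout_ms = 15000    # allow 15s per external link
concurrency = 10      # check 10 external links in parallel
```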
Related
- Crawler Settings - Request method and user agent
- Rules Configuration - Enable link rules
- Examples - Common configurations