AI Crawler Access

Reports which AI-agent crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, …) the robots.txt at your site root allows or blocks. AI assistants and answer engines reach your content through these named user-agents, so blocking a user-agent keeps your pages out of the tool behind it.


Rule ID	`ax/ai-crawlers`
Category	Agent Experience
Scope	Site-wide
Severity	info
Weight	1/10

What it checks

For each well-known AI crawler, the rule reads your parsed robots.txt and reports whether the user-agent is allowed or fully blocked (Disallow: / with no re-permitting Allow: /). A user-agent with its own group takes precedence over the wildcard * group, matching how real crawlers resolve robots.txt.
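
The resolution described above can be sketched in Python. This is a minimal illustration, not the rule's actual implementation: the function names `parse_groups` and `is_fully_blocked` and the sample robots.txt are assumptions made for the example, and the sketch only distinguishes "fully blocked" from everything else.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    in_rules = False  # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow") and current_agents:
            in_rules = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def is_fully_blocked(robots_txt: str, agent: str) -> bool:
    """True if the agent's effective group has Disallow: / and no Allow: /."""
    groups = parse_groups(robots_txt)
    # An agent-specific group takes precedence over the wildcard group.
    rules = groups.get(agent.lower(), groups.get("*", []))
    return ("disallow", "/") in rules and ("allow", "/") not in rules


# Hypothetical robots.txt used only for this illustration.
EXAMPLE = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /

User-agent: CCBot
Disallow: /private/
"""

print(is_fully_blocked(EXAMPLE, "ClaudeBot"))  # True: falls back to the * group
print(is_fully_blocked(EXAMPLE, "GPTBot"))     # False: its own group allows /
print(is_fully_blocked(EXAMPLE, "CCBot"))      # False: only /private/ is disallowed
```

Note that CCBot is reported as not fully blocked even though the wildcard group disallows everything, because its own group replaces the wildcard group rather than adding to it.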

Crawlers covered include:

OpenAI — GPTBot (training), OAI-SearchBot (search), ChatGPT-User (live fetch)
Anthropic — ClaudeBot (training), Claude-User / Claude-SearchBot, anthropic-ai
Google — Google-Extended (Gemini/Vertex training)
Common Crawl — CCBot
Perplexity — PerplexityBot, Perplexity-User
Others — Applebot-Extended, Bytespider, Amazonbot, Meta-ExternalAgent, cohere-ai, DuckAssistBot, MistralAI-User, AI2Bot, Diffbot, YouBot

Solution

If you want AI visibility, make sure your robots.txt does not apply Disallow: / to these user-agents. To opt out of model training while staying answerable in assistants, block the training bots (GPTBot, ClaudeBot, Google-Extended, CCBot) but keep the live-fetch and search agents (ChatGPT-User, Claude-User, OAI-SearchBot, PerplexityBot) allowed.

robots.txt

# Block training, allow live answer engines
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /
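
One way to sanity-check a file like this before deploying it is Python's standard-library `urllib.robotparser`. The sketch below is an independent check, not part of the rule itself: `ExampleBot` and `https://example.com/` are placeholders, and Python's parser applies the first matching rule within a group, which may differ from how a given crawler or this rule resolves more complex files.

```python
from urllib.robotparser import RobotFileParser

# Copy of the robots.txt example above.
ROBOTS_TXT = """\
# Block training, allow live answer engines
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /
"""


def allowed_agents(robots_txt: str, agents: list[str],
                   url: str = "https://example.com/") -> dict[str, bool]:
    """Return, for each agent, whether it may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from memory, no network access
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    # ExampleBot is a placeholder for an agent with no group in the file.
    ["GPTBot", "Google-Extended", "ChatGPT-User", "PerplexityBot", "ExampleBot"],
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

An agent with no group of its own and no wildcard group to fall back on, such as the placeholder `ExampleBot` here, is treated as allowed, so any crawler you want to block needs its own group or a `*` group.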

Enable / Disable

Disable this rule

squirrel.toml

[rules]
disable = ["ax/ai-crawlers"]

Enable only this rule

squirrel.toml

[rules]
enable = ["ax/ai-crawlers"]
disable = ["*"]
