AI Crawler Access
Reports which AI-agent crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, …) your robots.txt allows or blocks at the site root. AI assistants and answer engines reach your content through these named user-agents, so blocking them keeps your pages out of those tools.
| Rule ID | ax/ai-crawlers |
| Category | Agent Experience |
| Scope | Site-wide |
| Severity | info |
| Weight | 1/10 |
What it checks
For each well-known AI crawler, the rule reads your parsed robots.txt and reports whether the user-agent is allowed or fully blocked (Disallow: / with no re-permitting Allow: /). A user-agent with its own group takes precedence over the wildcard * group, matching how real crawlers resolve robots.txt.
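The resolution described above can be sketched in a few lines of Python. This is a minimal illustration, not the rule's actual implementation: the helper names `parse_groups` and `is_fully_blocked` are hypothetical, user-agent matching is simplified to an exact case-insensitive comparison, and only `Allow` / `Disallow` lines are read.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent to the (directive, path) rules of its group."""
    groups: dict[str, list[tuple[str, str]]] = {}
    agents: list[str] = []  # user-agents of the group currently being read
    in_rules = False        # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((key, value))
    return groups


def is_fully_blocked(groups: dict[str, list[tuple[str, str]]], user_agent: str) -> bool:
    """A crawler's own group wins over the wildcard group (assumed exact-name match)."""
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    # Fully blocked: "Disallow: /" with no re-permitting "Allow: /".
    return ("disallow", "/") in rules and ("allow", "/") not in rules


SAMPLE = """
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /

User-agent: CCBot
Disallow: /
Allow: /
"""

groups = parse_groups(SAMPLE)
print(is_fully_blocked(groups, "GPTBot"))     # False: own group overrides the wildcard
print(is_fully_blocked(groups, "ClaudeBot"))  # True: falls back to the wildcard group
print(is_fully_blocked(groups, "CCBot"))      # False: Allow: / re-permits the root
```

In this sample the wildcard group blocks everything, yet GPTBot is reported as allowed because it has a group of its own.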
Crawlers covered include:
- OpenAI: GPTBot (training), OAI-SearchBot (search), ChatGPT-User (live fetch)
- Anthropic: ClaudeBot (training), Claude-User / Claude-SearchBot, anthropic-ai
- Google: Google-Extended (Gemini/Vertex training)
- Common Crawl: CCBot
- Perplexity: PerplexityBot, Perplexity-User
- Others: Applebot-Extended, Bytespider, Amazonbot, Meta-ExternalAgent, cohere-ai, DuckAssistBot, MistralAI-User, AI2Bot, Diffbot, YouBot
Solution
If you want AI visibility, make sure your robots.txt does not apply Disallow: / to these user-agents. To opt out of model training while staying answerable in assistants, block the training bots (GPTBot, ClaudeBot, Google-Extended, CCBot) but keep the live-fetch and search agents (ChatGPT-User, Claude-User, OAI-SearchBot, PerplexityBot) allowed.
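Before deploying such a policy, it can help to check it locally. The sketch below uses Python's standard `urllib.robotparser` as a stand-in for a real crawler. The `POLICY` string and the `access_report` helper are illustrative assumptions, and the standard-library parser may differ in edge cases from how each vendor's crawler (or this rule) interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block the training bots, allow live-fetch / search agents.
POLICY = """
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
Disallow: /

User-agent: ChatGPT-User
User-agent: Claude-User
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /
"""


def access_report(robots_txt: str, user_agents: list[str]) -> dict[str, str]:
    """Return 'allowed' or 'blocked' for each user-agent at the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        ua: "allowed" if parser.can_fetch(ua, "/") else "blocked"
        for ua in user_agents
    }


report = access_report(
    POLICY,
    ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot",
     "ChatGPT-User", "Claude-User", "OAI-SearchBot", "PerplexityBot"],
)
for ua, status in report.items():
    print(f"{ua}: {status}")
```

With this policy the four training bots should come back as blocked and the four live-fetch and search agents as allowed.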
# Block training, allow live answer engines
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ChatGPT-User
Allow: /
User-agent: PerplexityBot
Allow: /
Enable / Disable
Disable this rule
[rules]
disable = ["ax/ai-crawlers"]
Enable only this rule
[rules]
enable = ["ax/ai-crawlers"]
disable = ["*"]