AI Crawler Access

Reports which AI-agent crawlers robots.txt allows or blocks

Reports which AI-agent crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, …) your robots.txt allows or blocks at the site root. AI assistants and answer engines reach your content through these named user-agents, so blocking them keeps your pages out of those tools.

Rule ID: ax/ai-crawlers
Category: Agent Experience
Scope: Site-wide
Severity: info
Weight: 1/10

What it checks

For each well-known AI crawler, the rule reads your parsed robots.txt and reports whether the user-agent is allowed or fully blocked (Disallow: / with no re-permitting Allow: /). A user-agent with its own group takes precedence over the wildcard * group, matching how real crawlers resolve robots.txt.
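The resolution described above can be sketched in a few lines of Python. This is only an illustration of the stated logic (own group first, wildcard group as fallback, "fully blocked" meaning Disallow: / without an Allow: /), not the rule's actual implementation; the function names parse_groups and crawler_status and the sample file are hypothetical.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    in_rules = False  # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def crawler_status(robots_txt: str, agent: str) -> str:
    """Return "blocked" or "allowed" for one crawler (hypothetical helper)."""
    groups = parse_groups(robots_txt)
    # A group naming the agent wins; otherwise fall back to the * group.
    rules = groups.get(agent.lower(), groups.get("*", []))
    fully_disallowed = ("disallow", "/") in rules
    re_permitted = ("allow", "/") in rules
    return "blocked" if fully_disallowed and not re_permitted else "allowed"


SAMPLE = """
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

for name in ("GPTBot", "ClaudeBot"):
    print(name, crawler_status(SAMPLE, name))
```

In this sample, GPTBot has its own group and is judged by it alone, while ClaudeBot has none and inherits the wildcard group. The sketch deliberately ignores partial-path rules, since the rule only reports full blocks.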

Crawlers covered include:

  • OpenAI — GPTBot (training), OAI-SearchBot (search), ChatGPT-User (live fetch)
  • Anthropic — ClaudeBot (training), Claude-User / Claude-SearchBot, anthropic-ai
  • Google — Google-Extended (Gemini/Vertex training)
  • Common Crawl — CCBot
  • Perplexity — PerplexityBot, Perplexity-User
  • Others — Applebot-Extended, Bytespider, Amazonbot, Meta-ExternalAgent, cohere-ai, DuckAssistBot, MistralAI-User, AI2Bot, Diffbot, YouBot

Solution

If you want AI visibility, make sure your robots.txt does not apply Disallow: / to these user-agents. To opt out of model training while staying answerable in assistants, block the training bots (GPTBot, ClaudeBot, Google-Extended, CCBot) but keep the live-fetch and search agents (ChatGPT-User, Claude-User, OAI-SearchBot, PerplexityBot) allowed.

robots.txt
# Block training, allow live answer engines
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /
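
One way to sanity-check a policy like the one above before deploying it is Python's standard urllib.robotparser module. The sketch below is an assumption-laden illustration: the helper name fetch_permissions and the example.com URL are hypothetical, and the standard-library parser uses its own matching rules (first matching rule wins, user-agents matched by substring), so its answers may differ in edge cases from this rule or from an individual crawler.

```python
from urllib.robotparser import RobotFileParser

# The example policy from above, inlined so the check needs no network access.
ROBOTS_TXT = """\
# Block training, allow live answer engines
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /
"""


def fetch_permissions(robots_txt, agents, url="https://example.com/"):
    """Return {agent: may_fetch} for the given robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


permissions = fetch_permissions(
    ROBOTS_TXT,
    ["GPTBot", "Google-Extended", "ChatGPT-User", "PerplexityBot"],
)
for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With this policy the two training bots come back blocked and the two live agents allowed. An agent with no group of its own is allowed here, because the file has no wildcard group.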

Enable / Disable

Disable this rule

squirrel.toml
[rules]
disable = ["ax/ai-crawlers"]

Enable only this rule

squirrel.toml
[rules]
enable = ["ax/ai-crawlers"]
disable = ["*"]
