Cloaking / UA-Gated Content
Detects pages that serve different content to Googlebot than to a normal visitor
Detects pages that serve materially different content to a Googlebot user-agent than to a normal visitor (UA cloaking), or that change their response when a URL query token is added (token-gating). Both are classic ways an injected SEO-spam or phishing payload hides from the site owner while still ranking — you see a clean page, the search engine (or a tokened link) sees the payload.
| Rule ID | integrity/cloaking |
| Category | Site Integrity |
| Scope | Site-wide |
| Severity | error |
| Weight | 7/10 |
How it works
This rule reads the results of an opt-in differential probe. After the crawl, squirrelscan picks a small, capped set of suspicious paths: pages that are orphaned (no inbound internal links and absent from the sitemap) or recently modified (sitemap lastmod within the last recent_days days). It then re-fetches each one:
- with the default (normal-visitor) user-agent — the baseline,
- with a Googlebot user-agent, and
- optionally once more with an appended query token.
It then compares the responses by HTTP status and visible-text similarity:
- UA cloaking (→ failure): the Googlebot response materially diverges from the baseline, for example the normal visitor gets a 403/404 while Googlebot gets a 200, or the visible text barely overlaps.
- Token-gating (→ warning): the response changes materially when a query token is appended, suggesting content gated behind a URL parameter.
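The comparison above can be sketched in a few lines of Python. This is an illustration only, not squirrelscan's implementation: the `Response` type, the function names, the Jaccard word-overlap metric, and the 0.5 threshold are all assumptions, since the source says only that responses are compared "by HTTP status and visible-text similarity".

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff: the docs say "materially diverges" without giving a number.
SIMILARITY_THRESHOLD = 0.5


@dataclass
class Response:
    status: int
    text: str  # visible text, assumed already stripped of markup


def _tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def text_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two word sets (an assumed metric)."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0  # two empty pages are treated as identical
    return len(ta & tb) / len(ta | tb)


def diverges(baseline: Response, variant: Response) -> bool:
    # A flip between success and non-success (e.g. 404 vs 200) counts as divergence.
    if (200 <= baseline.status < 300) != (200 <= variant.status < 300):
        return True
    return text_similarity(baseline.text, variant.text) < SIMILARITY_THRESHOLD


def classify(baseline: Response, googlebot: Response,
             tokened: Optional[Response] = None) -> Optional[str]:
    if diverges(baseline, googlebot):
        return "failure"  # UA cloaking
    if tokened is not None and diverges(baseline, tokened):
        return "warning"  # token-gating
    return None  # nothing to report for this path
```

For example, `classify(Response(404, "Not found"), Response(200, "cheap pills buy now"))` falls into the UA-cloaking branch because the status flips from an error to a success.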
The probe is off by default and bounded (capped page count) so it never multiplies crawl cost. When it is off, this rule does nothing — it never reports “clean” for paths it never probed.
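As a rough sketch of the selection and cap described above, the following Python function picks orphaned or recently modified paths and stops at the cap. The `Page` type and `select_probe_paths` are hypothetical names, the keyword arguments merely mirror the config keys, and keeping crawl order when truncating is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Page:
    path: str
    inbound_internal_links: int
    in_sitemap: bool
    lastmod: Optional[date] = None  # sitemap lastmod, if the sitemap provides one


def select_probe_paths(pages: List[Page], today: date, enabled: bool = False,
                       max_pages: int = 10, recent_days: int = 14) -> List[str]:
    if not enabled:
        return []  # probe is off: nothing is fetched and nothing is reported
    cutoff = today - timedelta(days=recent_days)
    suspicious = []
    for page in pages:
        orphaned = page.inbound_internal_links == 0 and not page.in_sitemap
        recent = page.lastmod is not None and page.lastmod >= cutoff
        if orphaned or recent:
            suspicious.append(page.path)
    # The hard cap keeps the extra requests bounded regardless of site size.
    return suspicious[:max_pages]
```

Because the function returns an empty list when `enabled` is false, a caller can tell "not probed" apart from "probed and clean" only by checking the setting, which matches the rule staying silent when the probe is off.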
Enable the probe
[integrity.cloaking_probe]
enabled = true # off by default
max_pages = 10 # hard cap on probed paths (1–50)
recent_days = 14 # sitemap lastmod within N days counts as "recent"
query_variation = true # also probe an appended query token (token-gating)
# googlebot_user_agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Solution
A page that shows search engines (or tokened links) different content than it shows you is cloaking — almost always an injected payload or a deceptive doorway, and a likely compromise. Fetch the URL yourself with a Googlebot user-agent (e.g. curl -A 'Googlebot' <url>) to see the hidden content, then audit recently modified files/plugins and any server-side rules that branch on user-agent or query string. Legitimate UA/geo personalization should never change the core indexable content.
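If you would rather script the manual check than run curl twice, a minimal Python sketch using only the standard library might look like this. The `fetch` and `fetch_both` helpers and the browser user-agent string are hypothetical; the Googlebot string is the one shown in the probe config.

```python
import urllib.error
import urllib.request

# Googlebot UA string as shown in the probe config.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Hypothetical stand-in for a normal visitor's browser.
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0 Safari/537.36")


def fetch(url, user_agent, timeout=10.0):
    """Return (status, body) for one GET request sent with the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; keep their status and body too.
        return err.code, err.read().decode("utf-8", errors="replace")


def fetch_both(url, fetcher=fetch):
    """Fetch the same URL as a normal visitor and as Googlebot."""
    return {
        "visitor": fetcher(url, BROWSER_UA),
        "googlebot": fetcher(url, GOOGLEBOT_UA),
    }
```

Comparing the two `(status, body)` pairs by eye is usually enough to spot an injected payload. A payload that verifies Googlebot by IP address rather than by user-agent will not reveal itself to a spoofed request like this one.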
Enable / Disable
Disable this rule
[rules]
disable = ["integrity/cloaking"]
Disable all Site Integrity rules
[rules]
disable = ["integrity/*"]