A Disallow: / for GPTBot, a 403 from your WAF, an overzealous Cloudflare rule — any one of these removes you from the new search surfaces. None of them surface in Google Search Console. You only notice when a competitor starts showing up in answers and you don’t.
What this actually blocks
Each AI surface ingests via a small set of named, IP-published crawlers. The map as of 2026:
| Crawler | Surface | JS rendering |
|---|---|---|
| GPTBot | ChatGPT training and answers | No |
| OAI-SearchBot | ChatGPT search results (live) | No |
| ChatGPT-User | On-demand fetch when a user pastes a URL | No |
| ClaudeBot | Claude training and live web | No |
| Claude-User, Claude-SearchBot | Claude on-demand, search | No |
| PerplexityBot | Perplexity index | No |
| Perplexity-User | Perplexity on-demand fetch | No |
| Google-Extended | Gemini training and grounding (independent of Search indexing) | n/a (token only) |
| CCBot | Common Crawl (feeds many models) | No |
Google-Extended is a robots.txt token, not a real crawler; Googlebot does the fetching. If you Disallow it, Google stops using your content for Gemini training and grounding while keeping it in regular Search. It does not control AI Overviews, which are a Search feature governed by Googlebot access and snippet controls such as nosnippet. Opting out of Gemini is almost always not what you want.
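To see which of these crawlers already hit your site, you can match the User-Agent header in your access logs against the tokens in the table. The following is a minimal sketch, not a vendor-supplied tool: the function name `identify_ai_crawler` and the plain substring match are assumptions made for illustration.

```python
# Crawler tokens taken from the table above. Google-Extended is left out on
# purpose: it is a robots.txt token and never appears in a User-Agent header.
AI_CRAWLER_TOKENS = (
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User", "CCBot",
)


def identify_ai_crawler(user_agent):
    """Return the first crawler token found in user_agent, or None."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


print(identify_ai_crawler(
    "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)"
))  # prints ClaudeBot
```

A substring match is enough for counting hits in logs. It says nothing about whether the request really came from the vendor, since the header is trivial to spoof.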
How to detect it
Two layers. Check both.
GET /robots.txt HTTP/1.1
Host: example.com
Look for:
User-agent: GPTBot
Disallow: /
Or a global block:
User-agent: *
Disallow: /
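Reading the file by eye gets error-prone once it has many groups. As a rough sketch, Python's standard `urllib.robotparser` can evaluate the rules for each bot. The helper name `blocked_bots` and the bot list are assumptions for illustration, and the standard-library parser may differ from each vendor's own parser on edge cases such as wildcards.

```python
from urllib.robotparser import RobotFileParser

# Tokens to test; extend with any other crawler from the table.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
           "PerplexityBot", "Google-Extended", "CCBot"]


def blocked_bots(robots_txt, url="https://example.com/"):
    """Return the bots in AI_BOTS that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


sample = """\
User-agent: GPTBot
Disallow: /
"""
print(blocked_bots(sample))  # prints ['GPTBot']
```

Paste the body of your own /robots.txt into `sample` to run the same check. An empty list means the file itself blocks none of the listed bots.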
Then verify the live HTTP response per bot — a permissive robots.txt is meaningless if the WAF returns 403.
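for ua in \
  "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)" \
  "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)" \
  "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"
do
  echo "== $ua"
  # Use GET rather than HEAD (-I): a HEAD response has no body to measure.
  # Print the status code, content type, and downloaded body size.
  curl -s -o /dev/null -A "$ua" \
    -w 'status=%{http_code} type=%{content_type} bytes=%{size_download}\n' \
    https://example.com/
done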
A 200 with a non-empty body for every bot means you’re actually unblocked. 403, 429, or a text/html challenge page means the WAF is silently overriding your robots.txt.
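The pass/fail rule above can be scripted so it runs on a schedule. What follows is a sketch under stated assumptions: `fetch_as` and `classify_response` are hypothetical helper names, and the challenge-page marker strings are guesses that you should replace with whatever your WAF actually serves.

```python
import urllib.error
import urllib.request


def fetch_as(user_agent, url):
    """GET url with a spoofed User-Agent; return (status, content_type, body)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions but still carry a body.
        body = err.read().decode("utf-8", "replace")
        return err.code, err.headers.get("Content-Type", ""), body


def classify_response(status, content_type, body):
    """Label a response as 'ok', 'blocked', 'challenge', or 'inspect'."""
    # Hypothetical markers for a challenge page; adjust for your WAF.
    challenge_markers = ("just a moment", "captcha", "access denied")
    if status in (403, 429):
        return "blocked"
    text = body.lower()
    if status == 200 and "text/html" in content_type.lower():
        if any(marker in text for marker in challenge_markers):
            return "challenge"
    if status == 200 and body.strip():
        return "ok"
    return "inspect"  # redirects, 5xx, or an empty body need a manual look


print(classify_response(403, "text/html", "Forbidden"))  # prints blocked
```

Calling `classify_response(*fetch_as(ua, "https://example.com/"))` for each user agent gives one label per bot. A spoofed header only exercises rules keyed on the user agent, so a WAF that also checks source IPs may treat the real crawler differently.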
The fix
Universal robots.txt
User-agent: *
Allow: /
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: CCBot
Allow: /
Sitemap: https://example.com/sitemap.xml
Allow: / is the explicit form. Removing the Disallow lines works too — by spec, anything not explicitly disallowed is allowed.
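If you maintain the bot list in one place, the file can be generated rather than hand-edited. Here is a minimal sketch, assuming a hypothetical `build_robots_txt` helper, that also round-trips the output through Python's standard parser to confirm every bot is allowed.

```python
from urllib.robotparser import RobotFileParser

BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
        "PerplexityBot", "Google-Extended", "CCBot"]


def build_robots_txt(bots, sitemap_url):
    """Build an allow-all robots.txt with one explicit group per bot."""
    lines = ["User-agent: *", "Allow: /"]
    for bot in bots:
        lines += ["", f"User-agent: {bot}", "Allow: /"]
    lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(BOTS, "https://example.com/sitemap.xml")

# Sanity check: parse the generated file and confirm nothing is blocked.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
all_allowed = all(parser.can_fetch(bot, "https://example.com/") for bot in BOTS)
print(all_allowed)  # prints True
```

Adding a new crawler then means appending one name to `BOTS` and regenerating the file.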
WordPress (Yoast / Rank Math)
Both plugins auto-generate robots.txt. Override in the dashboard (Yoast: SEO → Tools → File editor; Rank Math: General Settings → Edit robots.txt). Or drop a static robots.txt in the WordPress root — a real file beats the plugin’s virtual one.
Shopify
Shopify generates robots.txt from robots.txt.liquid. Edit it in the theme code editor:
{% for group in robots.default_groups %}
  {{- group.user_agent }}
  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}
  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
Cloudflare
Two things to check, both in Security → Bots:
- AI Scrapers and Crawlers — set to off, or set to “Allow” for the specific bots you want.
- Block AI Bots managed rule — off.
Then audit any custom WAF rules that match on cf.client.bot. That field is true for bots on Cloudflare's verified list, which includes the major search and AI crawlers, so a rule like Block when (cf.client.bot) blocks them all.
Akamai / Imperva
Akamai Bot Manager → AI Bot category → set policy to Allow. Imperva Advanced Bot Protection → AI Bot category → Allow.
Pitfalls
Don’t filter by user-agent string alone for sensitive paths. User agents are trivial to spoof. OpenAI, Anthropic, and Perplexity publish their egress IP ranges — verify by IP if access matters. For public marketing content, allow openly and don’t overthink it.
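Verifying by IP comes down to a CIDR membership test against the vendor's published list. The sketch below uses Python's standard `ipaddress` module. The ranges shown are placeholders from the reserved documentation blocks, not real vendor ranges, and `is_verified` is a hypothetical helper name; in practice you would load the current ranges from each vendor's published file.

```python
import ipaddress

# PLACEHOLDER CIDRs from reserved documentation ranges, NOT real vendor
# ranges. Replace with the lists each vendor publishes.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/24"],
}


def is_verified(bot, remote_ip):
    """Return True if remote_ip falls inside a range published for bot."""
    ip = ipaddress.ip_address(remote_ip)
    return any(
        ip in ipaddress.ip_network(cidr)
        for cidr in PUBLISHED_RANGES.get(bot, [])
    )


print(is_verified("GPTBot", "192.0.2.10"))   # prints True
print(is_verified("GPTBot", "203.0.113.5"))  # prints False
```

A request that claims to be GPTBot in its user agent but fails this check can be treated like any other anonymous client on sensitive paths.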
Don’t block AI bots to save bandwidth. A single AI crawler visit costs a fraction of a cent. The downside — being absent from a fast-growing referral surface — is uncapped.
Don’t forget Google-Extended. Many SEO plugins still don’t ship a default for it. Blocking it is the documented way to opt out of Gemini training and grounding; it does not remove you from AI Overviews, which follow Googlebot access and snippet controls. If you’re not deliberately opting out, allow it.
Fix at the edge with Serpwise
The block usually lives in a layer the SEO team doesn’t own — a WAF rule from years ago, a CMS-generated robots.txt that fights manual edits, a hosted vendor.
Serpwise rewrites robots.txt and overrides WAF responses for known AI user agents at the edge, before traffic reaches your origin. One rule unblocks everything. When OpenAI ships a new bot — and they do, every few months — you update one rule instead of opening a ticket against the WAF team.
See pricing or run a free AI visibility audit.