AI bots are blocked from your site

A Disallow: / for GPTBot, a 403 from your WAF, an overzealous Cloudflare rule: any one of these removes you from AI search and answer surfaces. None of them show up in Google Search Console. You only notice when a competitor starts appearing in AI answers and you don’t.

What this actually blocks

Each AI surface ingests content through a small set of named crawlers, most with published IP ranges. The map as of 2026:

Crawler	Surface	JS rendering
`GPTBot`	ChatGPT training and answers	No
`OAI-SearchBot`	ChatGPT search results (live)	No
`ChatGPT-User`	On-demand fetch when a user pastes a URL	No
`ClaudeBot`	Claude training	No
`Claude-User`, `Claude-SearchBot`	Claude on-demand, search	No
`PerplexityBot`	Perplexity index	No
`Perplexity-User`	Perplexity on-demand fetch	No
`Google-Extended`	Gemini and Vertex AI training and grounding (not Search or AI Overviews)	n/a (token only)
`CCBot`	Common Crawl (feeds many models)	No

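Google-Extended is a robots.txt token, not a separate crawler: Googlebot does the fetching, and the token controls whether Google may use that content for Gemini. Disallow it and your content stays in regular Search (AI Overviews included, since they follow Googlebot’s rules) but drops out of Gemini. That’s almost always not what you want.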
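The table doubles as a lookup key for your access logs. Here is a minimal bash sketch, assuming a hypothetical helper named `identify_ai_crawler` that maps a User-Agent string to the crawler token it claims. Google-Extended is left out on purpose because it never appears in a User-Agent header. A match only tells you what a request claims to be, not who actually sent it.

```bash
#!/usr/bin/env bash
# Map a User-Agent string to the AI crawler token it claims to be, using the
# names from the crawler table. Prints "none" if nothing matches.
# Google-Extended is deliberately absent: it is a robots.txt token only.
# Matching is case-sensitive and says nothing about the request's real origin.
identify_ai_crawler() {
  local ua="$1"
  case "$ua" in
    *OAI-SearchBot*)    echo "OAI-SearchBot" ;;
    *ChatGPT-User*)     echo "ChatGPT-User" ;;
    *GPTBot*)           echo "GPTBot" ;;
    *Claude-SearchBot*) echo "Claude-SearchBot" ;;
    *Claude-User*)      echo "Claude-User" ;;
    *ClaudeBot*)        echo "ClaudeBot" ;;
    *Perplexity-User*)  echo "Perplexity-User" ;;
    *PerplexityBot*)    echo "PerplexityBot" ;;
    *CCBot*)            echo "CCBot" ;;
    *)                  echo "none" ;;
  esac
}
```

For example, assuming an access log in the common combined format (User-Agent is the sixth `"`-separated field): `awk -F'"' '{print $6}' access.log | while IFS= read -r ua; do identify_ai_crawler "$ua"; done | sort | uniq -c`.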

How to detect it

Two layers: robots.txt and the live HTTP response. Check both.

GET /robots.txt HTTP/1.1
Host: example.com

Look for a per-bot block:

User-agent: GPTBot
Disallow: /

Or a global block:

User-agent: *
Disallow: /
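Scanning by eye works for a short file; longer, plugin-generated ones are easier to check mechanically. A minimal sketch, assuming bash and awk, with `blanket_disallows` as a hypothetical helper name. It is a simplified parser (it ignores Allow overrides, wildcards, and partial-path rules), so treat its output as a shortlist to inspect, not a verdict.

```bash
#!/usr/bin/env bash
# List every user agent that robots.txt (read from stdin) blocks with a
# blanket "Disallow: /". A "*" in the output means a global block.
# Simplified: ignores Allow overrides, wildcards and partial-path rules.
blanket_disallows() {
  awk '
    {
      sub(/\r$/, "")                     # tolerate CRLF line endings
      sub(/#.*/, "")                     # strip comments
      i = index($0, ":")
      if (i == 0) next                   # blank or malformed line
      key = tolower(substr($0, 1, i - 1))
      val = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "user-agent") {
        if (!in_agents) n = 0            # first agent line of a new group
        agents[++n] = val
        in_agents = 1
      } else if (key == "allow" || key == "disallow") {
        in_agents = 0                    # a rule line closes the agent list
        if (key == "disallow" && val == "/")
          for (j = 1; j <= n; j++)
            if (!seen[agents[j]]++) print agents[j]
      }
    }
  '
}
```

Usage: `curl -s https://example.com/robots.txt | blanket_disallows`. Consecutive User-agent lines share one group, so a `Disallow: /` under them is reported for each agent.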

Then verify the live HTTP response per bot — a permissive robots.txt is meaningless if the WAF returns 403.

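# Fetch the homepage as each crawler: a full GET (not HEAD), following redirects,
# printing the final status code, body size and content type
for ua in \
  "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.2; +https://openai.com/gptbot" \
  "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)" \
  "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"
do
  echo "== $ua"
  curl -sL -o /dev/null -A "$ua" \
    -w '%{http_code} %{size_download} bytes %{content_type}\n' \
    https://example.com/
done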

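A 200 with a non-empty body for every bot means you’re actually unblocked. A 403, a 429, or an HTML challenge page in place of your content means the WAF is silently overriding your robots.txt. One caveat: these requests carry the bot’s user agent but come from your own IP, so a WAF that verifies crawlers by IP may reject them as spoofed even when the real crawler gets through. Confirm any 403 against your access logs before changing rules.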
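The rule of thumb can be encoded so a scripted check reports a verdict per bot. A minimal sketch, assuming you already have the status code and body size for one bot request (curl’s `%{http_code}` and `%{size_download}` write-out variables report both) plus the body size of the same URL fetched with an ordinary browser user agent. The function name and the 50% threshold for spotting a challenge page served with a 200 are assumptions; tune the threshold for pages whose size varies.

```bash
#!/usr/bin/env bash
# Classify one crawler check (hypothetical helper).
#   $1  HTTP status code seen with the bot user agent
#   $2  body size in bytes seen with the bot user agent
#   $3  body size in bytes for the same URL with a normal browser user agent
# The "less than half the baseline" test is an arbitrary heuristic for a
# challenge page returned with status 200.
classify_bot_response() {
  local code="$1" size="$2" baseline="$3"
  case "$code" in
    403|429)
      echo "blocked" ;;
    200)
      if [ "$size" -eq 0 ]; then
        echo "empty-body"
      elif [ "$size" -lt $(( baseline / 2 )) ]; then
        echo "suspect-challenge"
      else
        echo "ok"
      fi ;;
    *)
      echo "check-manually" ;;
  esac
}
```

Anything other than `ok` deserves a look at the actual response body before you touch WAF rules.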

The fix

Universal robots.txt

User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://example.com/sitemap.xml

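Allow: / is the explicit form. Removing the Disallow lines works too: under the robots.txt standard (RFC 9309), any path that no rule disallows is allowed. The named groups matter most when your User-agent: * group is restrictive, because a crawler obeys only the group that names it and falls back to * only when no group does.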
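Keeping the crawler list in one place makes the next new bot a one-word change. A minimal sketch, assuming bash; `generate_robots_txt` is a hypothetical helper that prints an allow-all `*` group, one explicit group per named crawler, and a Sitemap line.

```bash
#!/usr/bin/env bash
# Print an allow-all robots.txt with one explicit group per named crawler.
#   $1     sitemap URL
#   $2...  crawler tokens to list explicitly, in order
generate_robots_txt() {
  local sitemap="$1"; shift
  printf 'User-agent: *\nAllow: /\n'
  local bot
  for bot in "$@"; do
    printf '\nUser-agent: %s\nAllow: /\n' "$bot"
  done
  printf '\nSitemap: %s\n' "$sitemap"
}
```

Running `generate_robots_txt https://example.com/sitemap.xml GPTBot OAI-SearchBot ChatGPT-User ClaudeBot PerplexityBot Google-Extended CCBot > robots.txt` reproduces the universal file shown here.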

WordPress (Yoast / Rank Math)

Both plugins manage the virtual robots.txt that WordPress generates. Override it in the dashboard (Yoast: SEO → Tools → File editor; Rank Math: General Settings → Edit robots.txt), or drop a static robots.txt in the WordPress root: WordPress serves the virtual file only when no physical one exists, so a real file wins.
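After uploading a static file, it is worth confirming that it is what the site actually serves, since a plugin, page cache, or CDN can still answer /robots.txt. A minimal sketch, assuming bash and `cmp`; `robots_matches_file` is a hypothetical helper that reads the served robots.txt on stdin and compares it byte for byte with your local copy.

```bash
#!/usr/bin/env bash
# Compare the robots.txt the site serves (stdin) with a local file ($1).
# Prints "match" if they are byte-identical, "differs" otherwise.
robots_matches_file() {
  if cmp -s - "$1"; then
    echo "match"
  else
    echo "differs"
  fi
}
```

Usage: `curl -s https://example.com/robots.txt | robots_matches_file ./robots.txt`. A `differs` result means some layer other than your static file is still answering.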

Shopify

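Shopify generates robots.txt automatically. To customize it, add a robots.txt.liquid template in the theme code editor (Templates → Add a new template → robots.txt), keep the default loop so Shopify’s own rules and sitemap still render, and append your groups after it: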

{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

Cloudflare

Two things to check, both in Security → Bots:

AI Scrapers and Crawlers — set to off, or set to “Allow” for the specific bots you want.
Block AI Bots managed rule — off.

Then audit any custom WAF rules that match on cf.client.bot. That field is true for every bot on Cloudflare’s verified-bots list, which includes the major AI crawlers (and Googlebot), so a rule like Block when (cf.client.bot) blocks them all.
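With many custom rules, a text search is the fastest audit. A minimal sketch, assuming you have exported your rule expressions to a text file with one expression per line (how you export them depends on your setup and is not shown). `flag_bot_rules` is a hypothetical helper; besides cf.client.bot it also matches two related Cloudflare fields, cf.verified_bot_category and cf.bot_management.verified_bot.

```bash
#!/usr/bin/env bash
# Print "line:expression" for every rule expression on stdin that references
# one of Cloudflare's bot-identity fields. Exits non-zero if none match.
flag_bot_rules() {
  grep -nE 'cf\.client\.bot|cf\.verified_bot_category|cf\.bot_management\.verified_bot'
}
```

Usage: `flag_bot_rules < waf-expressions.txt` (file name hypothetical). Each hit still needs a human look at its action: a rule that skips or allows verified bots is harmless, one that blocks them is the problem.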

Akamai / Imperva

Akamai Bot Manager → AI Bot category → set policy to Allow. Imperva Advanced Bot Protection → AI Bot category → Allow.

Pitfalls

Don’t filter by user-agent string alone for sensitive paths. User agents are trivial to spoof. OpenAI, Anthropic, and Perplexity publish their egress IP ranges — verify by IP if access matters. For public marketing content, allow openly and don’t overthink it.
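Verifying by IP boils down to a CIDR membership test. A minimal IPv4-only sketch in bash, assuming you have converted a vendor’s published ranges into a plain list with one CIDR block per line (the published files are typically JSON, so that conversion step is assumed, not shown). All three function names are hypothetical, and IPv6 entries are skipped.

```bash
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# 10# forces base 10 so octets like "08" are not parsed as octal.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo "$(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))"
}

# Succeed if IPv4 address $1 is inside CIDR block $2 (e.g. 192.0.2.0/24).
# A bare address without "/bits" is treated as /32.
ip_in_cidr() {
  local ip="$1" net="${2%/*}" bits=32
  [[ "$2" == */* ]] && bits="${2#*/}"
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}

# Succeed if IPv4 address $1 is inside any CIDR block read from stdin,
# one per line. Blank lines and IPv6 blocks (containing ":") are skipped.
ip_in_any_cidr() {
  local ip="$1" cidr
  while IFS= read -r cidr || [ -n "$cidr" ]; do
    cidr="${cidr%$'\r'}"                 # tolerate CRLF line endings
    case "$cidr" in ''|*:*) continue ;; esac
    ip_in_cidr "$ip" "$cidr" && return 0
  done
  return 1
}
```

Usage: `ip_in_any_cidr "$client_ip" < gptbot-ranges.txt && echo verified` (file name hypothetical). Only a request that claims a crawler user agent and passes this check should get any access you would not give the public.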

Don’t block AI bots to save bandwidth. A single AI crawler visit costs a fraction of a cent. The downside — being absent from a fast-growing referral surface — is uncapped.

Don’t forget Google-Extended. Many SEO plugins still don’t ship a default for it. Blocking it is the documented way to keep your content out of Gemini while staying in Google Search. If you’re not deliberately opting out, allow it.

Fix at the edge with Serpwise

The block usually lives in a layer the SEO team doesn’t own — a WAF rule from years ago, a CMS-generated robots.txt that fights manual edits, a hosted vendor.

Serpwise rewrites robots.txt and overrides WAF responses for known AI user agents at the edge, before traffic reaches your origin. One rule unblocks everything. When OpenAI ships a new bot — and they do, every few months — you update one rule instead of opening a ticket against the WAF team.

See pricing or run a free AI visibility audit.

