AI Crawler Access is about permissions: making sure you have not accidentally blocked the AI crawlers that power ChatGPT, Perplexity, Gemini, and Claude from reading your content. Worth 17% of your AEO score, this category is the easiest to fix and the most commonly broken.
Publish a robots.txt file at the root of your domain. It must be accessible at https://yourdomain.com/robots.txt.
Why AI cares: Without robots.txt, AI crawlers fall back to default behavior and may not index everything. An explicit file signals you are in control.
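A minimal, explicitly permissive robots.txt looks like this (the allow-all rule shown is a placeholder; keep any genuine disallows you need):

```
User-agent: *
Allow: /
```

Even this two-line file is better than none: it tells every crawler you have made a deliberate choice.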
Ensure GPTBot is not blocked in robots.txt.
Why AI cares: GPTBot is OpenAI's crawler for ChatGPT training and web browsing. Blocking it means ChatGPT cannot learn from or cite your content.
Ensure ClaudeBot and anthropic-ai are not blocked.
Why AI cares: ClaudeBot is Anthropic's crawler. Blocking it means Claude cannot access or cite your content in responses.
Ensure PerplexityBot is not blocked. Perplexity is one of the highest-citation-rate AI search engines.
Why AI cares: Perplexity is the AI search engine most likely to cite your content directly in answers. Blocking it removes you from a major citation channel.
Ensure Google-Extended (the robots.txt token that controls Google's AI training) is not blocked.
Why AI cares: Google-Extended controls whether Google can use your content to train and ground its Gemini models. Blocking it opts you out of one of the highest-traffic AI surfaces.
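The four checks above can be automated. A minimal sketch using Python's standard-library robots.txt parser (the robots.txt content and domain below are placeholders; substitute your live file and URL):

```python
from urllib.robotparser import RobotFileParser

# Paste your live robots.txt content here (placeholder shown)
ROBOTS_TXT = """\
User-agent: *
Allow: /
"""

AI_BOTS = ["GPTBot", "ClaudeBot", "anthropic-ai", "PerplexityBot", "Google-Extended"]

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# can_fetch() applies the most specific matching User-agent group per bot
blocked = [bot for bot in AI_BOTS if not rp.can_fetch(bot, "https://yourdomain.com/")]
print(blocked)  # an empty list means no AI crawler is blocked
```

If any bot name appears in the printed list, find its User-agent group in robots.txt and remove the Disallow rule.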
Check that your server is not sending X-Robots-Tag: noai or X-Robots-Tag: noimageai HTTP headers.
Why AI cares: The noai header directive explicitly tells AI systems not to use your content, even if robots.txt allows them to crawl it.
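A sketch of the header check in Python (the header dicts below are assumed examples; in practice you would read them from a real HTTP response, e.g. one fetched with curl -I or urllib):

```python
def noai_directives(headers):
    """Return any AI-blocking directives found in an X-Robots-Tag header.

    Assumes header names are already normalized to the exact key "X-Robots-Tag".
    """
    value = headers.get("X-Robots-Tag", "").lower()
    return [d for d in ("noai", "noimageai") if d in value]

# Example response headers (assumed values for illustration)
print(noai_directives({"X-Robots-Tag": "noai, noimageai"}))  # ['noai', 'noimageai']
print(noai_directives({"Content-Type": "text/html"}))        # []
```

An empty list means your server is not sending AI-blocking headers for that page.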
Ensure your homepage, pricing, product, and about pages do not have <meta name="robots" content="noindex">.
Why AI cares: Noindex blocks both search engine and AI crawler indexing. It is commonly accidentally applied to important pages.
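You can scan a page's HTML for an accidental noindex with the standard library alone; a sketch (the HTML string is an assumed example — feed it your real page source):

```python
from html.parser import HTMLParser

class NoindexFinder(HTMLParser):
    """Flags <meta name="robots"> tags whose content includes noindex."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = {k: (v or "") for k, v in attrs}  # attr values can be None
            if a.get("name", "").lower() == "robots" and "noindex" in a.get("content", "").lower():
                self.noindex = True

finder = NoindexFinder()
finder.feed('<head><meta name="robots" content="noindex, nofollow"></head>')
print(finder.noindex)  # True means this page blocks indexing
```

Run it against the homepage, pricing, product, and about pages; any True is a page AI models cannot index.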
Your homepage, /pricing, /about, /blog, and /contact pages should all return a 200 status code.
Why AI cares: AI crawlers skip non-200 pages entirely. A broken pricing page means AI models cannot cite your pricing.
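A sketch of the triage step, using assumed status codes (in a real audit you would fetch each path and record the code the server actually returns):

```python
def non_200_pages(statuses):
    """Return the pages an AI crawler would skip (anything not returning 200)."""
    return {path: code for path, code in statuses.items() if code != 200}

# Status code observed per key path (assumed values for illustration)
observed = {"/": 200, "/pricing": 200, "/about": 404, "/blog": 200, "/contact": 200}
print(non_200_pages(observed))  # {'/about': 404}
```

An empty result means every key page is crawlable; anything else is a page AI models cannot cite.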
Add a Sitemap: directive pointing to your sitemap at the bottom of robots.txt.
Why AI cares: Sitemap discovery via robots.txt is the most reliable mechanism for crawlers to find your complete content inventory.
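With the directive in place, the end of robots.txt looks like this (the URL is a placeholder for your own sitemap location):

```
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```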
Avoid max-snippet:0; set max-snippet:-1 (unlimited) or a large value such as max-snippet:200 instead.
Why AI cares: max-snippet:0 tells AI models they cannot quote any text from your page, effectively preventing citations.
Ensure there are no chains of 3+ redirects before reaching a final page.
Why AI cares: Long redirect chains slow crawlers and increase the chance they abandon before indexing the final URL.
Set max-image-preview:large to allow AI systems to use your images.
Why AI cares: Multimodal AI models may use your images to understand your product. Restricting previews reduces your visual footprint.
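The snippet and image-preview directives can live together in a single robots meta tag; a permissive example:

```html
<meta name="robots" content="index, follow, max-snippet:-1, max-image-preview:large">
```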
The most important are: GPTBot (ChatGPT/OpenAI), ClaudeBot and anthropic-ai (Claude/Anthropic), PerplexityBot (Perplexity), Google-Extended (Google AI Overviews), Amazonbot (Alexa AI), and Meta-ExternalAgent (Meta AI). Allow all of these unless you have a specific legal reason to block them.
For future training data, yes. But AI models already trained on your content (before your block) will still reference it. For real-time web search features (Perplexity, ChatGPT browsing, Gemini), blocking crawlers removes you from live citations going forward.
For most B2B companies, blocking AI crawlers is counterproductive. Being cited by AI models drives awareness and qualified traffic. The exception is if your content is a core IP asset and AI reproduction would directly harm your business model; in that case, consult legal counsel.
ansly audits your site across all 7 AEO categories, including AI Crawler Access. Get your score in under 60 seconds.
Audit my site free →