AEO Category · 17% of overall AEO score

AI Crawler Access Checklist (2026)

AI Crawler Access is about permissions — making sure you have not accidentally blocked the AI crawlers that power ChatGPT, Perplexity, Gemini, and Claude from reading your content. Worth 17% of your AEO score, this category is the easiest to fix and the most commonly broken.

Category weight: 17%
Total checks: 12
High priority: 8
Quick wins: 10

High Priority

8 checks

Create a valid robots.txt

High priority · Quick win

Publish a robots.txt file at the root of your domain. It must be accessible at https://yourdomain.com/robots.txt.

How to implement:
Create a robots.txt file. At minimum:

User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml

Why AI cares: Without robots.txt, AI crawlers fall back to default behavior and may not index everything. An explicit file signals you are in control.
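A minimal sketch of creating and sanity-checking the file locally (yourdomain.com is a placeholder for your real domain):

```shell
# Create a minimal, permissive robots.txt (replace yourdomain.com).
cat > robots.txt <<'EOF'
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
EOF

# Sanity check: the file should contain exactly one Sitemap directive.
grep -c '^Sitemap:' robots.txt
```

Once deployed, confirm the file is actually reachable: curl -s -o /dev/null -w "%{http_code}" https://yourdomain.com/robots.txt should print 200.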

Allow GPTBot (OpenAI)

High priority · Quick win

Ensure GPTBot is not blocked in robots.txt.

How to implement:
Remove any "Disallow:" rules under "User-agent: GPTBot". If you want to allow only certain paths:

User-agent: GPTBot
Allow: /
Disallow: /private/

Why AI cares: GPTBot is OpenAI's crawler for ChatGPT training and web browsing. Blocking it means ChatGPT cannot learn from or cite your content.

Allow ClaudeBot (Anthropic)

High priority · Quick win

Ensure ClaudeBot and anthropic-ai are not blocked.

How to implement:
Remove any "Disallow:" under "User-agent: ClaudeBot" and "User-agent: anthropic-ai".

Why AI cares: ClaudeBot is Anthropic's crawler. Blocking it means Claude cannot access or cite your content in responses.

Allow PerplexityBot

High priority · Quick win

Ensure PerplexityBot is not blocked. Perplexity is one of the highest-citation-rate AI search engines.

How to implement:
Remove any "Disallow:" under "User-agent: PerplexityBot".

Why AI cares: Perplexity is the AI search engine most likely to cite your content directly in answers. Blocking it removes you from a major citation channel.

Allow Google-Extended

High priority · Quick win

Ensure Google-Extended (Google's AI training crawler) is not blocked.

How to implement:
Remove any "Disallow:" under "User-agent: Google-Extended". Note: this token governs whether Google may use your content to train and ground Gemini; AI Overviews inclusion itself is governed by ordinary Googlebot crawling.

Why AI cares: Google-Extended controls whether Google can use your content to train and ground Gemini. Blocking it opts you out of Gemini's answers, one of the highest-traffic AI surfaces, even though AI Overviews is fed by regular Googlebot indexing.
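The four crawler-permission checks above can be verified together with a short awk scan. The sample file below is illustrative (it deliberately blocks one bot); point the script at your real robots.txt:

```shell
# Sample robots.txt with one AI crawler accidentally blocked.
cat > robots.sample <<'EOF'
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Allow: /
EOF

# Print every AI crawler that has a blanket "Disallow: /" rule.
awk '
  tolower($1) == "user-agent:" { agent = $2 }
  tolower($0) ~ /^disallow: \/$/ &&
    agent ~ /^(GPTBot|ClaudeBot|anthropic-ai|PerplexityBot|Google-Extended)$/ {
      print "BLOCKED: " agent
    }
' robots.sample
```

An empty result means none of the listed AI crawlers are fully blocked.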

Remove X-Robots-Tag: noai headers

High priority · Moderate effort

Check that your server is not sending X-Robots-Tag: noai or X-Robots-Tag: noimageai HTTP headers.

How to implement:
Audit your server/CDN response headers. Remove any noai directives from nginx, Apache, or CDN configs.

Why AI cares: The noai header directive explicitly tells AI systems not to use your content — even if robots.txt allows them to crawl it.
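A small sketch of the header audit: the function below reads response headers from stdin and flags noai directives. The demo feeds it inline headers; in practice you would pipe in real ones with curl -sI (yourdomain.com is a placeholder):

```shell
# Check response headers (read from stdin) for noai / noimageai directives.
check_noai() {
  if grep -qiE '^x-robots-tag:.*no(image)?ai'; then
    echo "FAIL: noai directive present"
  else
    echo "OK: no noai directives"
  fi
}

# Demo with inline headers; in practice:
#   curl -sI https://yourdomain.com/ | check_noai
printf 'Content-Type: text/html\nX-Robots-Tag: noai\n' | check_noai
```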

Remove noindex meta tags from key pages

High priority · Quick win

Ensure your homepage, pricing, product, and about pages do not have <meta name="robots" content="noindex">.

How to implement:
Check page source for <meta name="robots"> tags. Remove or change to "index, follow" on all public pages.

Why AI cares: Noindex blocks indexing by both search engines and AI crawlers, and it is often left on important pages by accident.
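A quick way to check: save the page source and grep it for a robots meta tag containing noindex. The sample HTML below is illustrative; in practice you would fetch the real page with curl:

```shell
# Scan saved page source for a robots meta tag containing noindex.
# In practice: curl -s https://yourdomain.com/pricing > page.html
cat > page.html <<'EOF'
<head><meta name="robots" content="noindex, nofollow"></head>
EOF

if grep -i 'name="robots"' page.html | grep -qi 'noindex'; then
  echo "FAIL: noindex found"
else
  echo "OK: page is indexable"
fi
```

Repeat for the homepage, pricing, product, and about pages.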

Ensure key pages return HTTP 200

High priority · Quick win

Homepage, /pricing, /about, /blog, and /contact should all return 200 status codes.

How to implement:
Test with: curl -I https://yourdomain.com/pricing. Fix any 404s, 403s, or redirect loops.

Why AI cares: AI crawlers skip non-200 pages entirely. A broken pricing page means AI models cannot cite your pricing.

Medium Priority

3 checks

Reference sitemap.xml in robots.txt

Medium priority · Quick win

Add a Sitemap: directive pointing to your sitemap at the bottom of robots.txt.

How to implement:
Add "Sitemap: https://yourdomain.com/sitemap.xml" to the end of robots.txt.

Why AI cares: Sitemap discovery via robots.txt is the most reliable mechanism for crawlers to find your complete content inventory.
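One hedged sketch, suitable for a deploy script: append the directive only if it is not already present, so the step is idempotent (the sitemap URL is a placeholder):

```shell
# Append a Sitemap directive only if robots.txt doesn't already have one
# (idempotent, so safe to run on every deploy). Placeholder URL.
printf 'User-agent: *\nAllow: /\n' > robots.txt

grep -q '^Sitemap:' robots.txt ||
  echo 'Sitemap: https://yourdomain.com/sitemap.xml' >> robots.txt

tail -n 1 robots.txt
```

Running it twice still leaves exactly one Sitemap line.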

Set a permissive max-snippet policy

Medium priority · Quick win

Avoid max-snippet:0 — set max-snippet:-1 (unlimited) or a large value like max-snippet:200.

How to implement:
Set it in a meta robots tag or X-Robots-Tag header (not robots.txt), e.g. <meta name="robots" content="max-snippet:-1">. In Next.js metadata: robots: { index: true, "max-snippet": -1 }

Why AI cares: max-snippet:0 tells AI models they cannot quote any text from your page, effectively preventing citations.
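A sketch of the check: grep saved page source for a max-snippet:0 directive (the pattern avoids false hits on max-snippet:-1 or larger values). The sample HTML is illustrative; fetch the real page with curl:

```shell
# Flag a restrictive max-snippet directive in saved page source.
# In practice: curl -s https://yourdomain.com/ > page.html
cat > page.html <<'EOF'
<meta name="robots" content="index, follow, max-snippet:0">
EOF

if grep -qiE 'max-snippet: ?0([^0-9]|$)' page.html; then
  echo "FAIL: max-snippet:0 blocks quoting"
else
  echo "OK"
fi
```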

Fix redirect chains

Medium priority · Moderate effort

Ensure there are no chains of 3+ redirects before reaching a final page.

How to implement:
Test with a redirect checker tool. Simplify to a single 301 redirect wherever possible.

Why AI cares: Long redirect chains slow crawlers and increase the chance they abandon before indexing the final URL.

Lower Priority

1 check

Allow large image previews

Low priority · Quick win

Set max-image-preview: large to allow AI systems to use your images.

How to implement:
Add to meta robots: max-image-preview:large. This is the default for most sites unless explicitly restricted.

Why AI cares: Multimodal AI models may use your images to understand your product. Restricting previews reduces your visual footprint.

Frequently Asked Questions

Which AI crawlers should I allow in robots.txt?

The most important are: GPTBot (ChatGPT/OpenAI), ClaudeBot and anthropic-ai (Claude/Anthropic), PerplexityBot (Perplexity), Google-Extended (Google AI Overviews), Amazonbot (Alexa AI), and Meta-ExternalAgent (Meta AI). Allow all of these unless you have a specific legal reason to block them.

Will blocking AI crawlers prevent AI models from using my content?

For future training data, yes. But AI models already trained on your content (before your block) will still reference it. For real-time web search features (Perplexity, ChatGPT browsing, Gemini), blocking crawlers removes you from live citations going forward.

Should I block AI crawlers to protect my content?

For most B2B companies, blocking AI crawlers is counterproductive. Being cited by AI models drives awareness and qualified traffic. The exception is if your content is a core IP asset and AI reproduction would directly harm your business model — in that case, consult legal counsel.

Find out your AI Crawler Access score — free

ansly audits your site across all 7 AEO categories including AI Crawler Access. Get your score in under 60 seconds.

Audit my site free →

Other AEO Categories

Structured Data
23% of AEO score · 9 checks
AI Agent Readiness
23% of AEO score · 9 checks
Content Extractability
16% of AEO score · 8 checks
Semantic HTML
13% of AEO score · 10 checks
LLMs.txt & LLM Content
8% of AEO score · 6 checks
SEO Fundamentals
6 checks
ansly

AI-readiness platform for websites. Check your visibility in ChatGPT, Claude, and Perplexity.

@tryansly

Product

  • Audit
  • Pricing
  • Blog

Company

  • About
  • Privacy Policy
  • Terms of Service
  • Contact Us
© 2026 ansly. All rights reserved.
Privacy · Terms · Contact