Most technical SEO conversations center on schema markup, canonical tags, and Core Web Vitals. But there is a single file that carries more weight in your AEO score than all of your schema combined. That file is llms.txt, and it accounts for 23% of your total AEO score, the highest-weighted category in our audit framework.
If you have not created one yet, or if yours is malformed, you are leaving the largest single optimization lever untouched.
What Is llms.txt (And Why It Has More Weight Than Your Schema)
llms.txt is a plain-text, Markdown-formatted file hosted at the root of your domain, either at /llms.txt or /llms-full.txt, that tells AI agents and large language models how to interact with your site. Think of it as a structured briefing document written specifically for machines that read and reason about content rather than crawl it for link signals.
The format was proposed by Answer.AI in 2024 and has been gaining rapid adoption among developer-focused and B2B SaaS sites. It is not yet an RFC or W3C standard, but it is already being referenced by AI agent frameworks, retrieval-augmented generation (RAG) pipelines, and emerging AI search engines as a discovery mechanism.
Here is the critical distinction: llms.txt is fundamentally different from robots.txt. robots.txt is a crawl directive: it tells bots which URLs they are allowed or not allowed to fetch. llms.txt is guidance: it tells AI agents what your site is for, who it serves, and which URLs contain the most authoritative content. One controls access; the other shapes understanding.
AI engines like Perplexity, ChatGPT with browsing, and Claude Citations do not just crawl your site; they build a semantic model of it. llms.txt lets you influence that model directly. Without it, the AI has to infer your site's purpose from page content alone, which introduces noise and ambiguity. With a well-structured llms.txt, you are handing the AI a curated map of your content hierarchy and primary use cases.
That is why it outweighs schema in our scoring model. Schema annotates individual pages. llms.txt frames your entire site.
The Required Structure of a Valid llms.txt File
The llms.txt specification defines a specific Markdown structure. Deviations from this structure will cause AI agents to either ignore the file or parse it incorrectly. The four required elements are:
H1 - Site Name. The first line of the file must be a single H1 heading containing your site or company name. This is the primary identifier AI agents use to associate the file with your brand.
Blockquote - Purpose Statement. Immediately following the H1, you must include a Markdown blockquote (a line beginning with >) that contains a single sentence describing what your company does and who it serves. This is the most important sentence in the file. It is the text most likely to be used verbatim by AI agents when summarizing your site.
H2 Sections - Content Areas. The body of the file is organized into H2 sections, each representing a major content area of your site. Common sections include About, Products, Documentation, Blog, Pricing, and Contact. Each section should be named semantically; avoid internal jargon that an AI agent would not recognize.
Markdown Links - Key URLs. Within each section, you list URLs as Markdown links followed by a brief plain-text description. The link text should be descriptive, and the description should explain the page's purpose in a single clause. These links serve as the AI agent's entry points into your content hierarchy.
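To make the four required elements concrete, here is a minimal parsing sketch. It is not an official llms.txt parser — the function name, the `Acme Analytics` sample, and the `acme.example` URLs are all hypothetical — but it shows how an agent could pull the H1, blockquote, and H2 sections out of a well-formed file:

```python
def parse_llms_txt(text: str) -> dict:
    """Extract the four required llms.txt elements from Markdown text.
    Illustrative sketch only, not an official or complete parser."""
    lines = [line.rstrip() for line in text.strip().splitlines()]
    result = {"h1": None, "blockquote": None, "sections": {}}
    current = None
    for line in lines:
        if line.startswith("# ") and result["h1"] is None:
            result["h1"] = line[2:].strip()          # site name
        elif line.startswith("> ") and result["blockquote"] is None:
            result["blockquote"] = line[2:].strip()  # purpose statement
        elif line.startswith("## "):
            current = line[3:].strip()               # content area
            result["sections"][current] = []
        elif current and line.startswith("- "):
            result["sections"][current].append(line[2:].strip())  # key URL
    return result

sample = """# Acme Analytics
> B2B SaaS platform that automates invoice reconciliation for mid-market finance teams.

## About
- [https://acme.example/about]: Company background and team

## Pricing
- [https://acme.example/pricing]: Plans and pricing tiers
"""

parsed = parse_llms_txt(sample)
```

A file that parses cleanly into these four slots — one H1, one blockquote, two or more H2 sections each holding links — is exactly what the audit checks below are looking for.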
A Copy-Paste llms.txt Template
Host this file at https://yourdomain.com/llms.txt. Replace every bracketed placeholder with your actual content before publishing.
# [Company Name]
> [One-sentence description of what your company does and who it serves]
## About
- [https://yourdomain.com/about]: Company background, mission, and team
## Products
- [https://yourdomain.com/products]: Overview of all products and services
- [https://yourdomain.com/products/[product-name]]: [What this specific product does]
## Documentation
- [https://yourdomain.com/docs]: Technical documentation and integration guides
- [https://yourdomain.com/docs/getting-started]: Quickstart guide for new users
## Blog
- [https://yourdomain.com/blog]: Latest articles, guides, and industry insights
## Pricing
- [https://yourdomain.com/pricing]: Plans, pricing tiers, and feature comparison
## Contact
- [https://yourdomain.com/contact]: Sales inquiries, support, and partnership requests
A few notes on this template. Keep the blockquote to a single sentence; do not write a paragraph. AI agents treat the blockquote as a summary snippet, and longer text dilutes its signal value. Keep URLs canonical and avoid query strings or UTM parameters in this file. The links should point to the most authoritative, stable version of each page.
If your site has substantial documentation, consider also creating /llms-full.txt with expanded content (see FAQ below).
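If you generate llms.txt from an existing URL list, it is easy to let tracking parameters slip in. A small sketch of the cleanup step, using only the standard library (the function name is ours, not part of any spec):

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url: str) -> str:
    """Drop the query string (including UTM parameters) and fragment,
    keeping only scheme, host, and path for use in llms.txt."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

canonicalize("https://yourdomain.com/pricing?utm_source=newsletter#plans")
```

Note this deliberately does not resolve redirects or normalize trailing slashes; point the links at whatever your canonical tags already declare.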
The 7 Checks Our Tool Runs on Your llms.txt
When you run an AEO audit on your site, our tool performs seven discrete checks against your llms.txt file. Each check is binary: it either passes or fails, and the aggregate score rolls up into the 23% category weight.
1. **File exists at `/llms.txt`.** The file must be publicly accessible at the root of your domain over HTTPS. Redirects are followed, but a 404 or 403 response fails this check immediately and zeros out the entire category.
2. **Valid H1 present.** The first content element in the file must be an H1 heading. Files that begin with a comment, a blank line followed by an H2, or plain text without heading syntax fail this check.
3. **Blockquote present.** The file must contain at least one Markdown blockquote. Our tool checks that the blockquote appears near the top of the file (within the first 500 characters) and contains a non-trivial string (more than 20 characters).
4. **Multiple H2 sections present.** A valid `llms.txt` must contain at least two H2 sections. Single-section files suggest incomplete implementation and score poorly with AI agents that expect a content hierarchy.
5. **Contains valid Markdown links.** Each H2 section must contain at least one Markdown-formatted link, in the format `[text](url)` or `- [url]: description`. Files that list bare URLs without Markdown formatting are not parsed correctly by most AI agent frameworks.
6. **Links resolve without errors.** Our tool performs a HEAD request against each URL listed in the file. Links that return 4xx or 5xx status codes fail this check. This is one of the most commonly failed checks: `llms.txt` files that were created once and never updated accumulate dead links as site structure changes.
7. **File size under 100KB.** The `llms.txt` file should be a concise index, not a content dump. Files over 100KB are treated as malformed by several AI agent implementations. If you need to expose full page content to AI agents, use `/llms-full.txt` for that purpose and keep `/llms.txt` as a lean navigation document.
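The seven checks can be approximated offline. This sketch is not our tool's actual implementation — the function name, check keys, and `acme.example` sample are ours — and the `head` parameter is a stand-in for a real HTTP HEAD request, which checks 1 and 6 need in practice:

```python
import re

SAMPLE = """# Acme Analytics
> B2B SaaS platform that automates invoice reconciliation for mid-market finance teams.

## About
- [https://acme.example/about]: Company background and team

## Pricing
- [https://acme.example/pricing]: Plans and pricing tiers
"""

def run_llms_checks(text: str, head=lambda url: 200) -> dict:
    """Offline approximation of the seven llms.txt audit checks.
    `head(url)` should return an HTTP status code; the default always
    returns 200, so wire in a real HEAD request for checks 1 and 6."""
    stripped = text.strip()
    lines = stripped.splitlines()
    # Accept both [text](url) links and the "- [url]: description" style.
    matches = re.findall(r"\[([^\]]+)\](?:\(([^)]+)\))?", stripped)
    urls = [(m[1] or m[0]) for m in matches if (m[1] or m[0]).startswith("http")]
    bq = next((line for line in lines if line.startswith("> ")), None)
    return {
        "1_file_exists": head("/llms.txt") == 200,
        "2_h1_first": bool(lines) and lines[0].startswith("# "),
        "3_blockquote": bq is not None
            and stripped.index(bq) < 500 and len(bq[2:].strip()) > 20,
        "4_two_h2_sections": sum(l.startswith("## ") for l in lines) >= 2,
        "5_markdown_links": len(urls) >= 1,
        "6_links_resolve": bool(urls) and all(head(u) < 400 for u in urls),
        "7_under_100kb": len(text.encode("utf-8")) < 100_000,
    }
```

Running `run_llms_checks(SAMPLE)` against the sample file passes all seven keys; an empty or heading-less file fails checks 2 through 5 immediately.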
Common llms.txt Mistakes That Kill Your Score
Using HTML instead of Markdown. Some teams auto-generate llms.txt from their sitemap pipeline and output HTML tags. AI agents that parse llms.txt expect Markdown. An H1 in HTML (<h1>) is not equivalent to a Markdown H1 (#).
Copying robots.txt logic into llms.txt. The files serve different purposes. Do not include Disallow directives, User-agent blocks, or Crawl-delay settings. These are meaningless in the llms.txt context and signal that the file was created by someone who conflated the two formats.
Writing a marketing headline in the blockquote. The purpose statement should be descriptive and factual, not a tagline. "We help companies grow faster" fails because it tells the AI nothing specific about what you actually do. "B2B SaaS platform that automates invoice reconciliation for mid-market finance teams" is precise and usable.
Listing only your homepage. The value of llms.txt comes from its content hierarchy. A file that lists only https://yourdomain.com in every section provides no navigation signal and scores near zero on our link quality check.
Forgetting to update it. llms.txt is a living document. If you rename a product, retire a docs section, or restructure your URL hierarchy, the file must be updated in tandem. Stale links are the single most common cause of audit failures in this category.
Your llms.txt is the highest-leverage file you can add to your site for AI engine optimization. At 23% of your total AEO score, getting it right (correct structure, valid links, a precise purpose statement) moves the needle more than any other single technical change.
Check your llms.txt score now: run a free audit at tryansly.com.
llms.txt vs robots.txt vs sitemap.xml: When to Use Each
These three files are often confused because they all live at the root of your domain and all relate to how crawlers interact with your site. They serve completely different functions.
| File | Purpose | Who reads it | Controls |
|---|---|---|---|
| robots.txt | Crawl access control | All web crawlers | Which URLs are allowed/disallowed |
| sitemap.xml | URL discovery | Search crawlers (Googlebot, Bingbot) | Which pages exist and their priority |
| llms.txt | Semantic guidance | AI agents and LLMs | What your site is for and which content is authoritative |
robots.txt is an access control document. It tells crawlers which pages they are permitted or prohibited from fetching. It is not optional — well-behaved crawlers check it before making any request. For AEO purposes, your robots.txt must explicitly allow or not block AI crawlers (GPTBot, ClaudeBot, PerplexityBot). If it blocks them, nothing in your llms.txt matters because the AI agent will never read your content.
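As a reference point, a minimal robots.txt that leaves the major AI crawlers unblocked might look like the sketch below. The user-agent tokens match the names the vendors publish; the `/admin/` path is a hypothetical example of a section you might still restrict:

```txt
# Explicitly allow AI crawlers (an empty Disallow grants full access)
User-agent: GPTBot
Disallow:

User-agent: ClaudeBot
Disallow:

User-agent: PerplexityBot
Disallow:

# Default rule for all other crawlers
User-agent: *
Disallow: /admin/
```

If a blanket `User-agent: *` / `Disallow: /` rule already exists, the per-bot groups above are what carve AI crawlers back in, since crawlers obey the most specific matching group.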
sitemap.xml helps search engines discover pages they might not find through link crawling. It tells them which URLs exist, when they were last modified, and their relative priority. Traditional SEO tools (Google Search Console, Bing Webmaster Tools) rely heavily on sitemaps. AI agents use sitemaps to some extent, but they prioritize llms.txt as the semantic entry point.
llms.txt does something neither of the other files does: it explains your site's purpose and content hierarchy to an AI that has already been granted access. Where robots.txt answers "can I crawl this?", llms.txt answers "what is this site for and what should I prioritize reading?" It is guidance, not gating.
The practical rule: All three files should be present. robots.txt ensures AI crawlers have access. sitemap.xml ensures your URLs are discoverable. llms.txt ensures AI agents understand which content to attribute to your brand.
How Different AI Engines Read llms.txt
AI engines don't all parse llms.txt identically. Understanding the differences helps you write a file that performs across the platforms your buyers use.
ChatGPT (with browsing / SearchGPT)
ChatGPT's web browsing mechanism retrieves content from Bing's index supplemented by direct URL fetching. When ChatGPT's browsing system accesses your domain, llms.txt is one of the first files it checks alongside robots.txt. The purpose statement in your llms.txt blockquote is particularly important here — GPT-family models use it to build an initial semantic frame for your site before reading individual pages. A precise, factual blockquote (see the "Common Mistakes" section above) increases the accuracy of how ChatGPT describes your brand and products in generated answers.
Perplexity
PerplexityBot runs one of the most frequent crawl cycles of any AI agent — re-crawling high-authority pages every 2–3 days. Perplexity's retrieval system reads llms.txt as part of site discovery and uses the H2 sections and linked URLs to prioritize which pages to index most frequently. Sites with well-structured llms.txt files that link their most authoritative content tend to see those pages cached and cited more reliably. The "Products" or "Documentation" sections are particularly valuable for Perplexity because they help the crawler identify your primary content vs. supporting or archival content.
Claude (Anthropic)
ClaudeBot is Anthropic's web crawler, used for Claude's Citations and web retrieval features. Claude's citation model is authority-driven — it prioritizes primary sources and official documentation. Your llms.txt signals authority by clearly identifying your canonical content areas. The "About" section with a precise description of your company's expertise and the "Products" section linking to official product pages both contribute to how Claude classifies your site's authority on specific topics. For Claude specifically, the H1 heading (your company name) and blockquote (your purpose statement) carry the most weight in establishing brand identity.
Google AI Overviews
Google's approach to llms.txt is the most conservative of the four major AI engines. Google has not officially confirmed that llms.txt influences AI Overview selection, and Googlebot continues to rely primarily on structured data (schema markup), PageRank, and traditional crawl signals. That said, the structured content hierarchy implied by a well-formed llms.txt reinforces the same signals Google's systems already read. Writing an llms.txt optimized for Perplexity and Claude will not hurt your Google AI Overview eligibility and may indirectly help by improving content organization.
Testing Whether Your llms.txt Is Being Read
Once you've published your llms.txt, you need to verify that AI crawlers are actually fetching it. Three ways to confirm:
1. Server access logs. The most reliable method. Filter your access logs for requests to /llms.txt. You should see requests from GPTBot, ClaudeBot, PerplexityBot, and other AI user-agents within days to weeks of publishing the file. The absence of these requests after 2–4 weeks suggests a technical access issue.
2. tryansly.com AEO audit. Running an audit on your domain checks all 7 llms.txt validation points and confirms whether the file is publicly accessible, correctly structured, and contains valid links. The llms.txt category score in the audit tells you which of the 7 checks are failing without having to interpret access logs manually.
3. Direct URL check. Visit https://yourdomain.com/llms.txt in a browser. If you see the raw Markdown content, the file is publicly accessible. If you get a 404, the file isn't deployed correctly. If you get redirected, check that the final URL is exactly /llms.txt and not a redirect chain.
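The direct URL check can be scripted. A hedged sketch using only the standard library — the function names and the `llms-txt-check/0.1` user-agent string are ours, and `check_llms_txt` needs live network access, so the response evaluation is split out as a pure function:

```python
from urllib.request import Request, urlopen

def evaluate(status: int, final_url: str, content_type: str) -> list[str]:
    """Return a list of problems found in a /llms.txt fetch result."""
    problems = []
    if status != 200:
        problems.append(f"expected 200, got {status}")
    if not final_url.endswith("/llms.txt"):
        problems.append(f"redirected away from /llms.txt: {final_url}")
    if not content_type.startswith(("text/plain", "text/markdown")):
        problems.append(f"unexpected content type: {content_type}")
    return problems

def check_llms_txt(domain: str) -> list[str]:
    """Fetch https://<domain>/llms.txt and evaluate the response.
    Requires network access; `domain` is your own hostname."""
    req = Request(f"https://{domain}/llms.txt",
                  headers={"User-Agent": "llms-txt-check/0.1"})
    with urlopen(req, timeout=10) as resp:
        return evaluate(resp.status, resp.url,
                        resp.headers.get("Content-Type", ""))
```

An empty list means the file is deployed correctly; each string in the list maps to one of the failure modes described above (wrong status, redirect chain, or HTML content type).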
What to do if AI crawlers aren't fetching your llms.txt:
- Check `robots.txt` for any `Disallow: /` or `Disallow: /llms.txt` rules that apply to all user-agents or to the specific AI crawlers
- Verify the file is served with content type `text/plain` or `text/markdown`; some CMS configurations serve all root-level files as HTML, which may cause parsing issues
- Check that the file isn't behind authentication or behind Cloudflare's bot protection rules
- Resubmit the URL through Bing Webmaster Tools' IndexNow API to trigger faster recrawling
Related Reading
- Free AEO Audit: What It Checks and How to Read Your Score - llms.txt is Category 1, the highest-weighted category in the audit.
- AEO Monitoring: How to Track Your AI Search Visibility Over Time - How to monitor whether your llms.txt improvements are producing citation gains.
- Best AEO Checker in 2026 (Compared) - tryansly.com is the only tool that validates all 7 llms.txt checks automatically.