AEO · 6 min read

llms.txt Complete Guide: Template, Format Rules & 7-Point Validator (2026)

llms.txt is the single highest-weighted file in your AEO score, accounting for 23% of your total rating. Here's how to write one that actually works.

ansly Team·Published March 6, 2026·Updated April 18, 2026

Most technical SEO conversations center on schema markup, canonical tags, and Core Web Vitals. But there is a single file that carries more weight in your AEO score than all of your schema combined. That file is llms.txt, and it accounts for 23% of your total AEO score, the highest-weighted category in our audit framework.

If you have not created one yet, or if yours is malformed, you are leaving the largest single optimization lever untouched.


What Is llms.txt (And Why It Has More Weight Than Your Schema)

llms.txt is a plain-text, Markdown-formatted file hosted at the root of your domain, either at /llms.txt or /llms-full.txt, that tells AI agents and large language models how to interact with your site. Think of it as a structured briefing document written specifically for machines that read and reason about content rather than crawl it for link signals.

The format was proposed by Answer.AI in 2024 and has been gaining rapid adoption among developer-focused and B2B SaaS sites. It is not yet an RFC or W3C standard, but it is already being referenced by AI agent frameworks, retrieval-augmented generation (RAG) pipelines, and emerging AI search engines as a discovery mechanism.

Here is the critical distinction: llms.txt is fundamentally different from robots.txt. robots.txt is a crawl directive: it tells bots which URLs they are allowed or not allowed to fetch. llms.txt is guidance: it tells AI agents what your site is for, who it serves, and which URLs contain the most authoritative content. One controls access; the other shapes understanding.

AI engines like Perplexity, ChatGPT with browsing, and Claude Citations do not just crawl your site; they build a semantic model of it. llms.txt lets you influence that model directly. Without it, the AI has to infer your site's purpose from page content alone, which introduces noise and ambiguity. With a well-structured llms.txt, you are handing the AI a curated map of your content hierarchy and primary use cases.

That is why it outweighs schema in our scoring model. Schema annotates individual pages. llms.txt frames your entire site.


The Required Structure of a Valid llms.txt File

The llms.txt specification defines a specific Markdown structure. Deviations from this structure will cause AI agents to either ignore the file or parse it incorrectly. The four required elements are:

H1 - Site Name. The first line of the file must be a single H1 heading containing your site or company name. This is the primary identifier AI agents use to associate the file with your brand.

Blockquote - Purpose Statement. Immediately following the H1, you must include a Markdown blockquote (a line beginning with >) that contains a single sentence describing what your company does and who it serves. This is the most important sentence in the file. It is the text most likely to be used verbatim by AI agents when summarizing your site.

H2 Sections - Content Areas. The body of the file is organized into H2 sections, each representing a major content area of your site. Common sections include About, Products, Documentation, Blog, Pricing, and Contact. Each section should be named semantically; avoid internal jargon that an AI agent would not recognize.

Markdown Links - Key URLs. Within each section, you list URLs as Markdown links followed by a brief plain-text description. The link text should be descriptive, and the description should explain the page's purpose in a single clause. These links serve as the AI agent's entry points into your content hierarchy.


A Copy-Paste llms.txt Template

Host this file at https://yourdomain.com/llms.txt. Replace every bracketed placeholder with your actual content before publishing.

# [Company Name]

> [One-sentence description of what your company does and who it serves]

## About
- [About](https://yourdomain.com/about): Company background, mission, and team

## Products
- [Products](https://yourdomain.com/products): Overview of all products and services
- [Product name](https://yourdomain.com/products/[product-name]): [What this specific product does]

## Documentation
- [Documentation](https://yourdomain.com/docs): Technical documentation and integration guides
- [Getting started](https://yourdomain.com/docs/getting-started): Quickstart guide for new users

## Blog
- [Blog](https://yourdomain.com/blog): Latest articles, guides, and industry insights

## Pricing
- [Pricing](https://yourdomain.com/pricing): Plans, pricing tiers, and feature comparison

## Contact
- [Contact](https://yourdomain.com/contact): Sales inquiries, support, and partnership requests

A few notes on this template. Keep the blockquote to a single sentence; do not write a paragraph. AI agents treat the blockquote as a summary snippet, and longer text dilutes its signal value. Keep URLs canonical: avoid query strings or UTM parameters in this file. The links should point to the most authoritative, stable version of each page.
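To keep every listed URL canonical, it can help to run each link through a small normalizer before it goes into the file. The sketch below is illustrative only (the function name is ours, not part of any specification); it strips query strings, UTM parameters, and fragments while leaving scheme, host, and path intact:

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url: str) -> str:
    """Strip query strings and fragments so llms.txt links stay canonical."""
    parts = urlsplit(url)
    # Drop the query and fragment; normalize a trailing slash on non-root paths
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

For example, `canonicalize("https://yourdomain.com/pricing?utm_source=news#plans")` yields the bare `/pricing` URL, which is the stable form you want AI agents to retrieve and cite.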

If your site has substantial documentation, consider also creating /llms-full.txt with expanded content (see FAQ below).


The 7 Checks Our Tool Runs on Your llms.txt

When you run an AEO audit on your site, our tool performs seven discrete checks against your llms.txt file. Each check is binary (it either passes or fails), and the aggregate score rolls up into the 23% category weight.

  1. File exists at /llms.txt. The file must be publicly accessible at the root of your domain over HTTPS. Redirects are followed, but a 404 or 403 response fails this check immediately and zeros out the entire category.

  2. Valid H1 present. The first content element in the file must be an H1 heading. Files that begin with a comment, a blank line followed by an H2, or plain text without heading syntax fail this check.

  3. Blockquote present. The file must contain at least one Markdown blockquote. Our tool checks that the blockquote appears near the top of the file (within the first 500 characters) and contains a non-trivial string (more than 20 characters).

  4. Multiple H2 sections present. A valid llms.txt must contain at least two H2 sections. Single-section files suggest incomplete implementation and score poorly with AI agents that expect a content hierarchy.

  5. Contains valid Markdown links. Each H2 section must contain at least one Markdown-formatted link in the format [text](url) or - [url]: description. Files that list bare URLs without Markdown formatting are not parsed correctly by most AI agent frameworks.

  6. Links resolve without errors. Our tool performs a HEAD request against each URL listed in the file. Links that return 4xx or 5xx status codes fail this check. This is one of the most commonly failed checks: llms.txt files that were created once and never updated accumulate dead links as site structure changes.

  7. File size under 100KB. The llms.txt file should be a concise index, not a content dump. Files over 100KB are treated as malformed by several AI agent implementations. If you need to expose full page content to AI agents, use /llms-full.txt for that purpose and keep /llms.txt as a lean navigation document.
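The structural subset of these checks (everything except the two network checks, which require HTTP requests) can be sketched in a short validator. This is an illustrative sketch, not our audit tool's actual implementation; the function name and return shape are invented for the example:

```python
import re

def validate_llms_txt(text: str) -> dict:
    """Run the non-network subset of the 7 llms.txt checks.

    Returns a dict mapping check name -> pass/fail. The "file exists"
    and "links resolve" checks need HTTP requests and are omitted here.
    """
    checks = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Check 2: the first content element must be a Markdown H1
    checks["h1_first"] = bool(lines) and lines[0].startswith("# ")
    # Check 3: a blockquote within the first 500 chars, longer than 20 chars
    m = re.search(r"^>\s*(.+)$", text[:500], re.MULTILINE)
    checks["blockquote"] = bool(m) and len(m.group(1).strip()) > 20
    # Check 4: at least two H2 sections
    checks["h2_sections"] = len(re.findall(r"^## ", text, re.MULTILINE)) >= 2
    # Check 5: at least one link, either [text](url) or "- [url]: description"
    link_pat = r"\[[^\]]+\]\(https?://[^)]+\)|-\s*\[https?://[^\]]+\]:"
    checks["markdown_links"] = bool(re.search(link_pat, text))
    # Check 7: file size under 100 KB
    checks["size_under_100kb"] = len(text.encode("utf-8")) < 100 * 1024
    return checks
```

Running this against the template above (with placeholders filled in) should pass every structural check; a file that opens with plain text instead of an H1 fails immediately.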


Common llms.txt Mistakes That Kill Your Score

Using HTML instead of Markdown. Some teams auto-generate llms.txt from their sitemap pipeline and output HTML tags. AI agents that parse llms.txt expect Markdown. An H1 in HTML (<h1>) is not equivalent to a Markdown H1 (#).

Copying robots.txt logic into llms.txt. The files serve different purposes. Do not include Disallow directives, User-agent blocks, or Crawl-delay settings. These are meaningless in the llms.txt context and signal that the file was created by someone who conflated the two formats.

Writing a marketing headline in the blockquote. The purpose statement should be descriptive and factual, not a tagline. "We help companies grow faster" fails because it tells the AI nothing specific about what you actually do. "B2B SaaS platform that automates invoice reconciliation for mid-market finance teams" is precise and usable.

Listing only your homepage. The value of llms.txt comes from its content hierarchy. A file that lists only https://yourdomain.com in every section provides no navigation signal and scores near zero on our link quality check.

Forgetting to update it. llms.txt is a living document. If you rename a product, retire a docs section, or restructure your URL hierarchy, the file must be updated in tandem. Stale links are the single most common cause of audit failures in this category.


Your llms.txt is the highest-leverage file you can add to your site for AI engine optimization. At 23% of your total AEO score, getting it right (correct structure, valid links, a precise purpose statement) moves the needle more than any other single technical change.

Check your llms.txt score now. It is 23% of your total AEO score. Run a free audit at tryansly.com.


llms.txt vs robots.txt vs sitemap.xml: When to Use Each

These three files are often confused because they all live at the root of your domain and all relate to how crawlers interact with your site. They serve completely different functions.

| File | Purpose | Who reads it | Controls |
| --- | --- | --- | --- |
| robots.txt | Crawl access control | All web crawlers | Which URLs are allowed/disallowed |
| sitemap.xml | URL discovery | Search crawlers (Googlebot, Bingbot) | Which pages exist and their priority |
| llms.txt | Semantic guidance | AI agents and LLMs | What your site is for and which content is authoritative |

robots.txt is an access control document. It tells crawlers which pages they are permitted or prohibited from fetching. It is not optional — well-behaved crawlers check it before making any request. For AEO purposes, your robots.txt must explicitly allow or not block AI crawlers (GPTBot, ClaudeBot, PerplexityBot). If it blocks them, nothing in your llms.txt matters because the AI agent will never read your content.
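As a reference point, a minimal robots.txt that explicitly allows the three AI crawlers named above could look like the fragment below. This is a sketch, not a recommended universal policy; adjust it to your own crawl rules, and keep any existing directives for other user-agents:

```text
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```

Note that robots.txt groups are matched per user-agent: a blanket `User-agent: * / Disallow: /` block elsewhere in the file would not apply to these crawlers once they have their own explicit groups.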

sitemap.xml helps search engines discover pages they might not find through link crawling. It tells them which URLs exist, when they were last modified, and their relative priority. Traditional SEO tools (Google Search Console, Bing Webmaster Tools) rely heavily on sitemaps. AI agents use sitemaps to some extent, but they prioritize llms.txt as the semantic entry point.

llms.txt does something neither of the other files does: it explains your site's purpose and content hierarchy to an AI that has already been granted access. Where robots.txt answers "can I crawl this?", llms.txt answers "what is this site for and what should I prioritize reading?" It is guidance, not gating.

The practical rule: All three files should be present. robots.txt ensures AI crawlers have access. sitemap.xml ensures your URLs are discoverable. llms.txt ensures AI agents understand which content to attribute to your brand.


How Different AI Engines Read llms.txt

AI engines don't all parse llms.txt identically. Understanding the differences helps you write a file that performs across the platforms your buyers use.

ChatGPT (with browsing / SearchGPT)

ChatGPT's web browsing mechanism retrieves content from Bing's index supplemented by direct URL fetching. When ChatGPT's browsing system accesses your domain, llms.txt is one of the first files it checks alongside robots.txt. The purpose statement in your llms.txt blockquote is particularly important here — GPT-family models use it to build an initial semantic frame for your site before reading individual pages. A precise, factual blockquote (see the "Common Mistakes" section above) increases the accuracy of how ChatGPT describes your brand and products in generated answers.

Perplexity

PerplexityBot runs one of the most frequent crawl cycles of any AI agent — re-crawling high-authority pages every 2–3 days. Perplexity's retrieval system reads llms.txt as part of site discovery and uses the H2 sections and linked URLs to prioritize which pages to index most frequently. Sites with well-structured llms.txt files that link their most authoritative content tend to see those pages cached and cited more reliably. The "Products" or "Documentation" sections are particularly valuable for Perplexity because they help the crawler identify your primary content vs. supporting or archival content.

Claude (Anthropic)

ClaudeBot is Anthropic's web crawler, used for Claude's Citations and web retrieval features. Claude's citation model is authority-driven — it prioritizes primary sources and official documentation. Your llms.txt signals authority by clearly identifying your canonical content areas. The "About" section with a precise description of your company's expertise and the "Products" section linking to official product pages both contribute to how Claude classifies your site's authority on specific topics. For Claude specifically, the H1 heading (your company name) and blockquote (your purpose statement) carry the most weight in establishing brand identity.

Google AI Overviews

Google's approach to llms.txt is the most conservative of the four major AI engines. Google has not officially confirmed that llms.txt influences AI Overview selection, and Googlebot continues to rely primarily on structured data (schema markup), PageRank, and traditional crawl signals. That said, the structured content hierarchy implied by a well-formed llms.txt reinforces the same signals Google's systems already read. Writing an llms.txt optimized for Perplexity and Claude will not hurt your Google AI Overview eligibility and may indirectly help by improving content organization.


Testing Whether Your llms.txt Is Being Read

Once you've published your llms.txt, you need to verify that AI crawlers are actually fetching it. Three ways to confirm:

1. Server access logs. The most reliable method. Filter your access logs for requests to /llms.txt. You should see requests from GPTBot, ClaudeBot, PerplexityBot, and other AI user-agents within days to weeks of publishing the file. The absence of these requests after 2–4 weeks suggests a technical access issue.

2. tryansly.com AEO audit. Running an audit on your domain checks all 7 llms.txt validation points and confirms whether the file is publicly accessible, correctly structured, and contains valid links. The llms.txt category score in the audit tells you which of the 7 checks are failing without having to interpret access logs manually.

3. Direct URL check. Visit https://yourdomain.com/llms.txt in a browser. If you see the raw Markdown content, the file is publicly accessible. If you get a 404, the file isn't deployed correctly. If you get redirected, check that the final URL is exactly /llms.txt and not a redirect chain.

What to do if AI crawlers aren't fetching your llms.txt:

  • Check robots.txt for any Disallow: / or Disallow: /llms.txt rules that apply to all user-agents or the specific AI crawlers
  • Verify the file is served with content-type text/plain or text/markdown — some CMS configurations serve all root-level files as HTML, which may cause parsing issues
  • Check that the file isn't behind authentication or behind Cloudflare's bot protection rules
  • Resubmit the URL through Bing Webmaster Tools' IndexNow API to trigger faster recrawling
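The direct URL check, content-type verification, and redirect check above can be automated in a few lines. This is a rough sketch (the function names and message strings are invented for illustration, and the network call is separated from the diagnosis logic so the latter is easy to test):

```python
from urllib import request

def diagnose(status: int, content_type: str, final_url: str) -> list[str]:
    """Map an HTTP response for /llms.txt to the failure modes listed above."""
    problems = []
    if status == 404:
        problems.append("file not deployed at /llms.txt")
    elif status in (401, 403):
        problems.append("file behind authentication or bot protection")
    if not content_type.startswith(("text/plain", "text/markdown")):
        problems.append(f"served as {content_type}; expected text/plain or text/markdown")
    if not final_url.endswith("/llms.txt"):
        problems.append("redirected away from /llms.txt")
    return problems

def fetch_and_diagnose(domain: str) -> list[str]:
    """Fetch https://<domain>/llms.txt and report any access problems."""
    url = f"https://{domain}/llms.txt"
    try:
        with request.urlopen(request.Request(url)) as resp:
            return diagnose(resp.status,
                            resp.headers.get("Content-Type", ""),
                            resp.geturl())  # geturl() reflects any redirects
    except Exception as exc:
        return [f"request failed: {exc}"]
```

An empty list from `fetch_and_diagnose("yourdomain.com")` means the file is publicly accessible at the expected URL with a text content type; anything else points at one of the troubleshooting items above.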

Related Reading

  • Free AEO Audit: What It Checks and How to Read Your Score - llms.txt is Category 1, the highest-weighted category in the audit.
  • AEO Monitoring: How to Track Your AI Search Visibility Over Time - How to monitor whether your llms.txt improvements are producing citation gains.
  • Best AEO Checker in 2026 (Compared) - tryansly.com is the only tool that validates all 7 llms.txt checks automatically.


Frequently Asked Questions

Is llms.txt an official standard?

It was proposed by Answer.AI in 2024 and is not yet a formal RFC or W3C standard. However, it has seen significant adoption among developer tools, AI-native SaaS products, and technical documentation sites. Several major AI agent frameworks explicitly check for llms.txt during site discovery. Treating it as a best practice rather than a formal standard is accurate, but given its weight in AI engine behavior, that undersells its importance.

What is the difference between llms.txt and robots.txt?

robots.txt is an access control document — it tells crawlers which URLs they are permitted or prohibited from fetching. llms.txt is a semantic guidance document. It does not block or allow access; it explains your site's purpose, structure, and primary content areas to AI agents that have already accessed your content. A bot that respects robots.txt may still read your llms.txt to understand your site before deciding which content to retrieve.

Does llms.txt affect Google rankings?

Not directly. Google's core ranking algorithm does not use llms.txt as a ranking signal. However, it does affect how AI-powered surfaces that source from the web — including Google's AI Overviews, Perplexity, ChatGPT with browsing, and Claude Citations — select and attribute content. If your goal is to appear in AI-generated answers rather than traditional blue-link results, llms.txt is directly relevant to that objective.

How often should I update llms.txt?

Update it whenever your site structure changes in ways that affect the URLs listed in the file. This means reviewing it during any sprint that involves URL restructuring, product renaming, documentation reorganization, or the retirement of major content sections. Many teams add llms.txt review to their launch checklist for significant site changes. A quarterly review cadence is a reasonable floor if you do not have a more granular trigger.

Do I need llms-full.txt as well?

llms-full.txt serves a different function than llms.txt. While llms.txt is a concise index file, llms-full.txt contains the complete text content of your key pages, inlined into a single document that AI agents can retrieve in one request. It is most useful for smaller sites (under 50 pages) or documentation-heavy sites where AI agents benefit from having full technical content without following individual links. For most B2B sites, start with a well-structured llms.txt and add llms-full.txt only if you have evidence that AI agents are struggling to retrieve your full content.
