anslyansly
AuditPricingBlog
Sign In
anslyansly

AI-readiness platform for websites. Check your visibility in ChatGPT, Claude, and Perplexity.

@tryansly

Product

  • Audit
  • Pricing
  • Blog

Company

  • About
  • Privacy Policy
  • Terms of Service
  • Contact Us
© 2026 ansly. All rights reserved.
PrivacyTermsContact
[Cover image: timeline of technology evolution with glowing nodes representing web standards across decades]
AEO · 5 min read

From robots.txt to entities.txt: The Evolution of How Websites Talk to Machines

Every generation of machine visitors to the web demanded a new file. In 1994, crawlers needed permissions. In 2005, they needed an inventory. In 2024, agents needed a summary. In 2026, they need a knowledge graph.

ansly Team·Published March 11, 2026

The history of web infrastructure is a history of websites learning to talk to machines.

Every time a new type of machine visitor appeared, with new capabilities, new needs, new constraints, the web had to develop new vocabulary. A new file format. A new convention. A new layer of the stack.

We are in the middle of the fourth such transition. And understanding what drove each of the previous three clarifies exactly why entities.txt is necessary now.


1994: The Crawler Arrives, robots.txt

The first web crawlers appeared in 1993 and 1994. Their job was simple: follow links, index content, repeat. They were fast, thorough, and indiscriminate. They would crawl your login forms, your private admin pages, your session URLs, your PDFs, anything reachable via a link.

Website owners needed a way to say: "You can go here, but not there."

robots.txt was the answer. Proposed in 1994 and adopted as a de facto standard, it gave websites a simple protocol to communicate access permissions to automated agents. A few lines of plain text at the root of the domain, readable by any crawler that chose to check.

User-agent: *
Disallow: /admin/
Disallow: /private/

It was entirely about permissions. Nothing about meaning. Nothing about what the allowed pages contained or what they were for. The crawler's job was to crawl; figuring out meaning was a problem for the search engine's indexing algorithms.

What it gave machines: Permission maps. "Go here. Not there."
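The permission check robots.txt enables is simple enough to demonstrate with Python's standard-library parser. A minimal sketch, feeding the rules above to `urllib.robotparser` (the `example.com` URLs are placeholders):

```python
from urllib import robotparser

# The robots.txt rules from above; parse() accepts an iterable of lines,
# so we can test locally without fetching anything over HTTP.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# "Go here. Not there."
print(rp.can_fetch("*", "https://example.com/blog/post"))    # allowed
print(rp.can_fetch("*", "https://example.com/admin/users"))  # disallowed
```

Any well-behaved crawler runs exactly this kind of check before requesting a URL.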


2005: The Indexer Scales, sitemap.xml

By 2004, the web had grown from millions of pages to billions. Search engines were good at finding the most-linked content, but they were systematically missing pages with few inbound links: valuable pages buried deep in large sites or orphaned by poor internal linking.

Website owners needed a way to say: "These pages exist. Please index them."

Google introduced the Sitemap Protocol in 2005 (codified as sitemap.xml). It was a structured XML inventory: a list of URLs with optional metadata about when they were last modified and how frequently they changed. Not a crawl directive, an inventory.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>

It told crawlers nothing about what those pages meant, what entities they discussed, or how they related to each other. It was a catalog of addresses, not a map of meaning.

What it gave machines: Page inventory. "These pages exist; here's when they last changed."
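Because sitemap.xml is plain XML in a fixed namespace, extracting that inventory takes only a few lines. A sketch using Python's standard-library `xml.etree.ElementTree`, parsing the example above (embedded inline here rather than fetched over HTTP):

```python
import xml.etree.ElementTree as ET

# The sitemap example from above, as a string.
xml_doc = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>"""

# Sitemap elements live in the sitemaps.org namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(xml_doc)

# Collect (url, last-modified) pairs: a catalog of addresses, nothing more.
entries = [
    (u.findtext("sm:loc", namespaces=NS), u.findtext("sm:lastmod", namespaces=NS))
    for u in root.findall("sm:url", NS)
]
print(entries)  # [('https://example.com/about', '2024-01-15')]
```

Note how little the parser recovers: URLs and dates, no meaning.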


2024: The Agent Emerges, llms.txt

For two decades, the crawler/sitemap combination was sufficient because the consumers of web content were fundamentally the same type of entity: search engine indexers that ranked pages by link signals and keyword patterns.

Then large language models changed everything.

AI agents don't rank pages, they synthesize answers. They don't care about PageRank, they care about meaning. And they operate under a constraint that traditional crawlers don't: context windows. A crawler can index a million pages. An LLM agent generating an answer has a finite budget for how much text it can process.

The Answer.AI team recognized this and proposed llms.txt in 2024: a plain Markdown file at /llms.txt that gives AI agents a curated narrative summary of your site, what you do, who you serve, and where your most important content lives.

# Acme Corp

> Enterprise project management for distributed teams

## Docs
- [Getting Started](/docs/getting-started.md): How to set up your workspace
- [API Reference](/docs/api.md): Full API documentation

## Optional
- [Full site content](/llms-full.txt)

This was a genuine leap. For the first time, a web standard was designed specifically for AI reasoning rather than link-following or keyword indexing. llms.txt gives agents a starting point, a curated map of your content hierarchy.

But it has a ceiling. It's prose and links. An AI agent reading your llms.txt still has to infer what entities you have, how they relate to each other, and what you're authoritative on. The information is there, buried in narrative, but it's not structured data that a machine can traverse efficiently.

What it gave machines: Narrative summary. "Here's what this site is about and where to find key content."
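Since llms.txt is plain Markdown, an agent (or a validator) can at least pull out the curated links mechanically. A rough sketch, assuming the `- [title](url): description` link style shown in the example above:

```python
import re

# The llms.txt example from above.
llms_txt = """\
# Acme Corp

> Enterprise project management for distributed teams

## Docs
- [Getting Started](/docs/getting-started.md): How to set up your workspace
- [API Reference](/docs/api.md): Full API documentation

## Optional
- [Full site content](/llms-full.txt)
"""

# Match "- [title](url)" with an optional ": description" tail, per line.
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?:: (?P<desc>.*))?$", re.M)

links = [(m["title"], m["url"]) for m in LINK.finditer(llms_txt)]
print(links)
```

The extraction stops at titles and URLs; everything else (what the entities are, how they relate) stays locked in prose, which is exactly the ceiling described above.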


2026: The Agent Era, entities.txt

The progression is clear in retrospect:

Year | File         | Audience   | Gave machines
-----|--------------|------------|--------------------------
1994 | robots.txt   | Crawlers   | Permission maps
2005 | sitemap.xml  | Indexers   | Page inventory
2024 | llms.txt     | LLM agents | Narrative summary
2026 | entities.txt | AI agents  | Semantic knowledge graph

Each layer added more semantic richness because each generation of machine visitor needed more semantic context.

robots.txt told crawlers where they could go. sitemap.xml told indexers what pages existed. llms.txt told LLMs what the site was about. entities.txt tells agents what the site IS, its entities, their relationships, topic authority, and page-entity mappings.

The jump from llms.txt to entities.txt is the jump from narrative to graph. A narrative summary tells a story. A knowledge graph expresses a structure. For machines that reason over structured data, the difference is profound.

An agent reading your entities.txt can answer: "What entities does this site have?" "How does product A relate to product B?" "Is this site authoritative on topic X?" "Which page should I read to understand entity Y?", without crawling a single page, and without inferring from prose.
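This post doesn't show what entities.txt looks like on the wire, so the following is purely illustrative: a made-up, line-oriented triple syntax (every name and relation below is hypothetical) demonstrating how an agent could answer those questions from a parsed graph, without crawling:

```python
# Hypothetical entities.txt content in an assumed "subject -> predicate -> object"
# triple format, invented here for illustration only.
entities_txt = """\
AcmePM -> isA -> Product
AcmeAPI -> isA -> Product
AcmeAPI -> extends -> AcmePM
AcmePM -> describedBy -> /products/acme-pm
Acme -> authoritativeOn -> project-management
"""

# Build an adjacency map: entity -> list of (relation, target).
graph = {}
for line in entities_txt.splitlines():
    subject, predicate, obj = (part.strip() for part in line.split("->"))
    graph.setdefault(subject, []).append((predicate, obj))

# "How does AcmeAPI relate to AcmePM?" — a direct lookup, no inference from prose.
relations = [pred for pred, target in graph["AcmeAPI"] if target == "AcmePM"]
print(relations)  # ['extends']
```

Whatever concrete syntax entities.txt settles on, this is the shape of the payoff: relationship and authority questions become lookups over structured data rather than inferences over narrative.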

What it gives machines: Semantic knowledge graph. "Here are the entities, how they relate, and what we're authoritative on."


The Pattern: Machines Evolve, Infrastructure Follows

There's a consistent dynamic across all four transitions:

  1. A new type of machine visitor appears with new capabilities
  2. Existing web infrastructure doesn't give that machine what it needs
  3. A new convention emerges to fill the gap
  4. Early adopters gain an advantage; late adopters scramble to catch up

In 1994, sites that implemented robots.txt avoided crawl problems that punished sites without it. In 2005, sites with sitemaps got better index coverage. In 2024, sites with an llms.txt file got better representation in AI-generated summaries. In 2026, sites with entities.txt will give AI agents the structured identity layer they need to represent those sites accurately.

The infrastructure follows the machine. The machine is now an agent. The infrastructure it needs is a knowledge graph.


For a practical guide to building your entities.txt, see entities.txt: Why Every Website Needs a Semantic Identity File for AI Agents. To understand how entities.txt reduces AI hallucination about your brand, see How entities.txt Reduces AI Hallucination.

For the technical foundation of making your site visible to AI crawlers in general, see GPTBot, ClaudeBot, PerplexityBot: The Complete AI Crawler Access Guide and The Complete Guide to llms.txt. For a comparison of AEO checking tools that audit AI crawler access and technical readiness, see the best AEO checkers guide for 2026.

