ansly

AI-readiness scanner for websites. Check your visibility in ChatGPT, Claude, and Perplexity.

@tryansly

Product

  • Audit
  • Pricing
  • Blog

Company

  • About
  • Privacy Policy
  • Terms of Service
  • Contact Us
© 2026 ansly. All rights reserved.

From robots.txt to entities.txt: The Evolution of How Websites Talk to Machines

Every generation of machine visitors to the web demanded a new file. In 1994, crawlers needed permissions. In 2005, indexers needed an inventory. In 2024, agents needed a summary. In 2026, they need a knowledge graph.

ansly Team·March 11, 2026

The history of web infrastructure is a history of websites learning to talk to machines.

Every time a new type of machine visitor appeared, with new capabilities, new needs, new constraints, the web had to develop new vocabulary. A new file format. A new convention. A new layer of the stack.

We are in the middle of the fourth such transition. And understanding what drove each of the previous three clarifies exactly why entities.txt is necessary now.


1994: The Crawler Arrives, robots.txt

The first web crawlers appeared in 1993 and 1994. Their job was simple: follow links, index content, repeat. They were fast, thorough, and indiscriminate. They would crawl your login forms, your private admin pages, your session URLs, your PDFs, anything reachable via a link.

Website owners needed a way to say: "You can go here, but not there."

robots.txt was the answer. Proposed in 1994 and adopted as a de facto standard, it gave websites a simple protocol to communicate access permissions to automated agents. A few lines of plain text at the root of the domain, readable by any crawler that chose to check.

User-agent: *
Disallow: /admin/
Disallow: /private/

It was entirely about permissions. Nothing about meaning. Nothing about what the allowed pages contained or what they were for. The crawler's job was to crawl; figuring out meaning was a problem for the search engine's indexing algorithms.
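A compliant crawler's permission check can be sketched with Python's standard-library robots.txt parser, applied to the rules above:

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, as a list of lines.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A well-behaved crawler checks every URL before fetching it.
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True (allowed)
print(rp.can_fetch("*", "https://example.com/admin/users"))  # False (disallowed)
```

Note what the parser returns: a yes/no answer per URL, and nothing else. That is the entire vocabulary of the protocol.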

What it gave machines: Permission maps. "Go here. Not there."


2005: The Indexer Scales, sitemap.xml

By 2004, the web had grown from millions of pages to billions. Search engines were good at finding the most-linked content, but they were systematically missing pages with few inbound links, pages buried deep in large sites, and pages orphaned by poor internal linking.

Website owners needed a way to say: "These pages exist. Please index them."

Google introduced the Sitemap Protocol in 2005 (codified as sitemap.xml). It was a structured XML inventory: a list of URLs with optional metadata about when they were last modified and how frequently they changed. Not a crawl directive but an inventory.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>

It told crawlers nothing about what those pages meant, what entities they discussed, or how they related to each other. It was a catalog of addresses, not a map of meaning.
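An indexer's read of that inventory can be sketched in a few lines of Python, using the example sitemap above. The only wrinkle is the XML namespace, which every element lives under:

```python
import xml.etree.ElementTree as ET

SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(SITEMAP)

# Each <url> entry is an address plus optional freshness metadata --
# nothing about meaning, just "this page exists, last touched here".
for url in root.findall("sm:url", NS):
    loc = url.findtext("sm:loc", namespaces=NS)
    lastmod = url.findtext("sm:lastmod", namespaces=NS)
    print(loc, lastmod)  # https://example.com/about 2024-01-15
```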

What it gave machines: Page inventory. "These pages exist; here's when they last changed."


2024: The Agent Emerges, llms.txt

For two decades, the crawler/sitemap combination was sufficient because the consumers of web content were fundamentally the same type of entity: search engine indexers that ranked pages by link signals and keyword patterns.

Then large language models changed everything.

AI agents don't rank pages, they synthesize answers. They don't care about PageRank, they care about meaning. And they operate under a constraint that traditional crawlers don't: context windows. A crawler can index a million pages. An LLM agent generating an answer has a finite budget for how much text it can process.

The Answer.AI team recognized this and proposed llms.txt in 2024: a plain Markdown file at /llms.txt that gives AI agents a curated narrative summary of your site, what you do, who you serve, and where your most important content lives.

# Acme Corp

> Enterprise project management for distributed teams

## Docs
- [Getting Started](/docs/getting-started.md): How to set up your workspace
- [API Reference](/docs/api.md): Full API documentation

## Optional
- [Full site content](/llms-full.txt)

This was a genuine leap. For the first time, a web standard was designed specifically for AI reasoning rather than link-following or keyword indexing. llms.txt gives agents a starting point, a curated map of your content hierarchy.

But it has a ceiling. It's prose and links. An AI agent reading your llms.txt still has to infer what entities you have, how they relate to each other, and what you're authoritative on. The information is there, buried in narrative, but it's not structured data that a machine can traverse efficiently.

What it gave machines: Narrative summary. "Here's what this site is about and where to find key content."


2026: The Agent Era, entities.txt

The progression is clear in retrospect:

Year   File          Audience    Gave machines
1994   robots.txt    Crawlers    Permission maps
2005   sitemap.xml   Indexers    Page inventory
2024   llms.txt      LLM agents  Narrative summary
2026   entities.txt  AI agents   Semantic knowledge graph

Each layer added more semantic richness because each generation of machine visitor needed more semantic context.

robots.txt told crawlers where they could go. sitemap.xml told indexers what pages existed. llms.txt told LLMs what the site was about. entities.txt tells agents what the site IS: its entities, their relationships, topic authority, and page-entity mappings.

The jump from llms.txt to entities.txt is the jump from narrative to graph. A narrative summary tells a story. A knowledge graph expresses a structure. For machines that reason over structured data, the difference is profound.
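There is no single canonical syntax to show here, but a minimal sketch of what an entities.txt could declare, written in a YAML-like layout with purely illustrative field names and entities, might look like:

# entities.txt (illustrative sketch; entities and fields are hypothetical)
entity: Acme Tasks
  type: SoftwareApplication
  publisher: Acme Corp
  relatesTo: Acme Calendar (integrates_with)
  authoritativeOn: project management
  page: /products/tasks

The point is not the syntax but the shape: named entities, typed relationships, and explicit claims of authority, declared once at a known location.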

An agent reading your entities.txt can answer questions like "What entities does this site have?", "How does product A relate to product B?", "Is this site authoritative on topic X?", and "Which page should I read to understand entity Y?", all without crawling a single page and without inferring from prose.
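Once relationships are explicit, those questions become simple lookups rather than inference. A toy sketch, with an in-memory graph standing in for a parsed entities.txt (all entity names and relation labels are invented for illustration):

```python
# A toy knowledge graph of subject-relation-object triples, of the kind
# an entities.txt could declare. Names here are hypothetical.
GRAPH = {
    ("Acme Tasks", "integrates_with", "Acme Calendar"),
    ("Acme Corp", "publishes", "Acme Tasks"),
    ("Acme Corp", "authoritative_on", "project management"),
}

def relations_between(a, b):
    """Answer 'how does entity a relate to entity b?' without crawling."""
    return [rel for (s, rel, o) in GRAPH if s == a and o == b]

def is_authoritative(site, topic):
    """Answer 'is this site authoritative on this topic?' by direct lookup."""
    return (site, "authoritative_on", topic) in GRAPH

print(relations_between("Acme Tasks", "Acme Calendar"))   # ['integrates_with']
print(is_authoritative("Acme Corp", "project management"))  # True
```

Contrast this with the llms.txt case: there, the agent ends up with links and prose to interpret; here, every answer is a set-membership check over declared structure.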

What it gives machines: Semantic knowledge graph. "Here are the entities, how they relate, and what we're authoritative on."


The Pattern: Machines Evolve, Infrastructure Follows

There's a consistent dynamic across all four transitions:

  1. A new type of machine visitor appears with new capabilities
  2. Existing web infrastructure doesn't give that machine what it needs
  3. A new convention emerges to fill the gap
  4. Early adopters gain an advantage; late adopters scramble to catch up

In 1994, sites that implemented robots.txt avoided the crawl problems that plagued sites without it. In 2005, sites with sitemaps got better index coverage. In 2024, sites with an llms.txt file got better representation in AI-generated summaries. In 2026, sites with entities.txt will give AI agents the structured identity layer they need to represent those sites accurately.

The infrastructure follows the machine. The machine is now an agent. The infrastructure it needs is a knowledge graph.


For a practical guide to building your entities.txt, see entities.txt: Why Every Website Needs a Semantic Identity File for AI Agents. To understand how entities.txt reduces AI hallucination about your brand, see How entities.txt Reduces AI Hallucination.

For the technical foundation of making your site visible to AI crawlers in general, see GPTBot, ClaudeBot, PerplexityBot: The Complete AI Crawler Access Guide and The Complete Guide to llms.txt.

