ansly
Audit · Pricing · Blog
ansly

AI-readiness scanner for websites. Check your visibility in ChatGPT, Claude, and Perplexity.

@tryansly

Product

  • Audit
  • Pricing
  • Blog

Company

  • About
  • Privacy Policy
  • Terms of Service
  • Contact Us
© 2026 ansly. All rights reserved.
[Cover image: illuminated graph of interconnected nodes on a dark background, representing a semantic knowledge network]
AEO · 7 min read

entities.txt: Why Every Website Needs a Semantic Identity File for AI Agents

The web was built for humans. Now agents are reading it. Here's why every website needs an entities.txt, a semantic knowledge graph that tells AI agents exactly what you are, what you offer, and how everything relates.

ansly Team·March 10, 2026

The web was built for humans. Every design decision, from hyperlinks to visual layouts to page titles, was made for a person sitting in front of a screen, making judgments with their eyes and brain.

That era is ending.

In 2026, a growing share of the entities reading your website are not human. They are AI agents: automated systems that browse, parse, and synthesize information on behalf of users. B2B buyers ask ChatGPT "What's the best tool for X?" instead of Googling and clicking through tabs. Coding agents scan documentation on behalf of developers. Enterprise procurement agents evaluate vendor sites to build comparison matrices. Research agents pull information from dozens of sources to answer complex questions.

These agents don't read the way humans do. They don't browse. They don't absorb visual hierarchy or get persuaded by hero copy. They need to parse, understand, and cross-reference structured information efficiently, and they face a constraint that humans don't: context windows.

A human can spend 20 minutes exploring your website. An AI agent working within a bounded context window cannot afford to ingest your entire site. It needs a way to understand your semantic identity (what entities exist, how they relate, what you're authoritative on) in a single, efficient read.

This is the gap that entities.txt fills.


The Problem With Existing Web Infrastructure

Before explaining what entities.txt is, it's worth being precise about what the current tools do and don't provide to AI agents.

robots.txt gives agents permission signals: which URLs they are allowed to fetch. It says nothing about what those URLs mean.

sitemap.xml gives agents an inventory: a list of URLs with optional last-modified dates. It says nothing about what concepts or entities those pages represent.

llms.txt (introduced in 2024) gives agents a narrative summary: a Markdown-formatted table of contents pointing to your key pages. It's a major improvement over nothing, but it's still prose, not structured data. An LLM reading your llms.txt still has to infer what entities you have and how they relate.

Per-page JSON-LD gives agents entity declarations on individual pages: "This page has a SoftwareApplication entity named X." But these declarations are scattered across dozens of pages with no site-wide view, and the standard schema.org vocabulary has no properties for the relationships that matter here: you can't say "Product A integrates with Product B" or "we compete with Company X."

The result: AI agents visiting your site have to crawl dozens of pages, piece together an understanding from scattered signals, and fill in gaps with inference, which is where hallucinations come from.


What entities.txt Is

entities.txt is a JSON file at /entities.txt that gives AI agents a semantic knowledge graph of your website: the entities you represent, how they relate to each other, what topics you're authoritative on, and which pages cover which concepts.

It follows the .txt convention (robots.txt, llms.txt, security.txt, ads.txt), a familiar, well-understood pattern for machine-readable files at well-known paths.

Here's what a minimal entities.txt looks like for a SaaS company:

{
  "version": "1.0",
  "domain": "acme.com",
  "name": "Acme Corp",
  "description": "Enterprise project management for distributed engineering teams",

  "entities": [
    {
      "id": "acme-corp",
      "name": "Acme Corp",
      "type": "organization",
      "description": "AI-powered project management platform founded in 2019",
      "url": "/about",
      "identifiers": {
        "linkedin": "https://linkedin.com/company/acme",
        "crunchbase": "https://crunchbase.com/organization/acme"
      }
    },
    {
      "id": "acme-projects",
      "name": "Acme Projects",
      "type": "product",
      "description": "AI-powered project planning and timeline management",
      "url": "/products/projects",
      "capabilities": ["task management", "gantt charts", "resource allocation", "AI planning"],
      "pricing": { "from": "$0/mo", "to": "$49/mo", "url": "/pricing" }
    }
  ],

  "relationships": [
    { "from": "acme-projects", "type": "made_by", "to": "acme-corp" },
    { "from": "acme-corp", "type": "alternative_to", "to": "Jira" },
    { "from": "acme-corp", "type": "serves", "to": "distributed engineering teams" }
  ],

  "topics": [
    { "name": "project management", "authority": "primary" },
    { "name": "AI productivity", "authority": "primary" },
    { "name": "team collaboration", "authority": "secondary" }
  ],

  "pages": {
    "/": { "purpose": "homepage", "entities": ["acme-corp", "acme-projects"] },
    "/pricing": { "purpose": "pricing", "entities": ["acme-projects"] },
    "/about": { "purpose": "about", "entities": ["acme-corp"] }
  }
}

Three things stand out that no other web infrastructure provides:

1. Relationships. The relationships array lets you explicitly state how entities relate: made_by, integrates_with, alternative_to, serves, powered_by, specializes_in. These relationship types are free-form; you define them, because you know your domain best. This is the core innovation: a machine-readable graph of how things connect, not just what things exist.

2. Topic authority. The topics array lets you declare what you're authoritative on (primary) versus tangentially related to (secondary). An AI agent asking "who should I trust as a source on topic X?" can check this directly.

3. Page-entity index. The pages map links every key URL to the entities it covers. An agent looking for information about a specific entity can find the right page without crawling the whole site.
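The payoff of the relationships array and the pages map is easiest to see in code. Here is a minimal Python sketch of how an agent might query a parsed entities.txt; the helper names are our own illustration, not part of any spec.

```python
def pages_for_entity(doc: dict, entity_id: str) -> list[str]:
    """URLs whose entry in the pages map lists the given entity id."""
    return [path for path, page in doc.get("pages", {}).items()
            if entity_id in page.get("entities", [])]

def related(doc: dict, entity_id: str, rel_type: str) -> list[str]:
    """Targets of a given relationship type originating from an entity."""
    return [rel["to"] for rel in doc.get("relationships", [])
            if rel["from"] == entity_id and rel["type"] == rel_type]
```

With the Acme example above, pages_for_entity(doc, "acme-projects") returns ["/", "/pricing"] and related(doc, "acme-corp", "alternative_to") returns ["Jira"], with no crawling involved.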


Why Not JSON-LD?

A common question: isn't this just what JSON-LD and schema.org already do?

No, and the distinction matters. JSON-LD declares entities per page. Each page says "I have an Organization entity named X" or "I have a Product named Y." That's valuable, but it has three fundamental limitations:

  1. No entity-to-entity relationships. schema.org has no standard way to say "Product A integrates with Product B" or "our company is an alternative to Competitor Z."
  2. No site-wide graph. The entities declared across your 50 pages are not connected. There's no single file an agent can read to understand the full entity landscape.
  3. No topic authority. JSON-LD has no mechanism to express "we are primary authorities on this topic."

entities.txt doesn't replace your JSON-LD; it complements it. JSON-LD annotates individual pages for search engines and AI models reading those pages. entities.txt gives the site-wide semantic identity that no per-page declaration can.


How to Make Your entities.txt Discoverable

Two signals help AI agents find your entities.txt:

In your HTML <head>:

<link rel="entities" type="application/json" href="/entities.txt">

In your llms.txt Optional section:

## Optional
- [Entity Map](/entities.txt): Semantic knowledge graph of entities and relationships

Both signals are optional: any agent that knows to look for entities.txt will find it directly at the well-known path. But adding them means any page fetch surfaces the entity map to agents that don't know to look for it.
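The fallback logic is simple enough to sketch with the Python standard library alone. This is a hypothetical illustration (the EntitiesLinkFinder class and entities_url function are our own naming): read the href from a <link rel="entities"> tag if one is present, otherwise assume the well-known path.

```python
from html.parser import HTMLParser

class EntitiesLinkFinder(HTMLParser):
    """Records the href of the first <link rel="entities"> tag seen."""
    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "entities" and self.href is None:
            self.href = a.get("href")

def entities_url(html: str, default: str = "/entities.txt") -> str:
    """Resolve where to fetch the entity map for a page's HTML."""
    finder = EntitiesLinkFinder()
    finder.feed(html)
    return finder.href or default
```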


How to Create Your entities.txt

The entities.txt spec is intentionally flexible. Entity types and relationship types are free-form; you define the vocabulary that fits your domain. A SaaS company uses organization, product, integration. A restaurant uses restaurant, dish, chef. A healthcare practice uses practice, provider, specialty. The schema works for any type of website.

The fastest path: run an ansly audit on your site. ansly automatically generates a starter entities.txt from your crawled content, extracting entities from your pages, inferring relationships from your site structure, and classifying your topic authority. The generated file is a starting point; you review and refine it, then publish it at your root.

The manual path: start with the schema above, add your main organization entity, add your key products or services, add the relationships that matter most (especially alternative_to for competitive positioning and serves for your target audience), and publish.

Either way, keep it accurate. The whole value of entities.txt is that it's the authoritative, first-party truth about your site. If it's wrong, agents will use it to confidently state wrong things about you.
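Accuracy can be partly machine-checked before you publish. A hypothetical validation sketch (our own helper, not official tooling): verify that relationship sources and page-entity references resolve to declared entity ids, while letting "to" values stay free-form, since they may name external entities like "Jira".

```python
import json

def validate_entities(raw: str) -> list[str]:
    """Return a list of warnings for an entities.txt document."""
    doc = json.loads(raw)
    ids = {e["id"] for e in doc.get("entities", [])}
    warnings = []
    for rel in doc.get("relationships", []):
        # 'to' may be a free-form external name (e.g. "Jira"),
        # so only the 'from' side must resolve to a declared id.
        if rel["from"] not in ids:
            warnings.append(f"relationship 'from' id not declared: {rel['from']}")
    for path, page in doc.get("pages", {}).items():
        for eid in page.get("entities", []):
            if eid not in ids:
                warnings.append(f"page {path} references unknown entity id: {eid}")
    return warnings
```

Running this on the Acme example above produces no warnings; a dangling id (say, a page still referencing a deleted product) produces one warning per broken reference.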


The Anti-Hallucination Contract

The deepest value of entities.txt is what it does for accuracy. When AI agents generate answers about your brand, they pull from what they've learned and inferred. Without authoritative first-party data, they fill in gaps, and those gaps are where hallucinations live.

entities.txt creates a verifiable contract: this is what I am, what I offer, and how things relate. AI agents that check entities.txt before synthesizing answers about your brand can self-correct: "Does this company actually offer feature X? Check entities.txt. It's not listed. Don't assert it."

The principle is: missing = don't assume. If an entity, capability, or relationship isn't in the file, a well-behaved agent shouldn't assert it exists.
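That rule is mechanical enough to sketch in code. A hypothetical helper (our naming, not part of the spec): before asserting that a company offers a capability, check whether any declared entity actually lists it.

```python
def capability_declared(doc: dict, capability: str) -> bool:
    """True only if some entity explicitly lists the capability.

    Implements "missing = don't assume": absence means the agent
    should not assert the capability, not that it doesn't exist.
    """
    needle = capability.lower()
    return any(needle in (c.lower() for c in entity.get("capabilities", []))
               for entity in doc.get("entities", []))
```

With the Acme example above, capability_declared(doc, "gantt charts") is True, while capability_declared(doc, "time tracking") is False, so a well-behaved agent would decline to claim Acme offers time tracking.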

This is exactly how CLAUDE.md works for codebases. Coding agents that read CLAUDE.md before exploring a repository don't hallucinate about architecture. They know what the project does, how it's structured, and where things live. entities.txt brings that same principle to the open web.


For a practical walkthrough of entities.txt across different site types (e-commerce, restaurant, healthcare, and more), see entities.txt for Every Website. To understand the evolution of how websites communicate with machines that led here, see From robots.txt to entities.txt.

