Imagine a potential customer asks ChatGPT: "Does Acme Corp have an API?"
ChatGPT says: "Yes, Acme Corp has a REST API with OAuth 2.0 authentication and rate limiting."
Acme Corp has no API.
This is not a hypothetical. It happens constantly, across categories and company sizes. AI systems confidently assert product features that don't exist, pricing that's years out of date, competitive positions that were never accurate, and capabilities the company explicitly doesn't offer.
The root cause is not a flaw in the models; it's a gap in the web's infrastructure. AI agents lack a reliable, authoritative, machine-readable source of ground truth about what a website actually is, what it offers, and how things relate. So they infer. And inference at scale means errors at scale.
entities.txt addresses this directly.
Why AI Hallucination About Brands Happens
When an AI system answers a question about your brand, it draws on a combination of:
- Training data: what was written about you across the web during pre-training
- Live retrieval (if available): recent pages fetched from your site or the web
- Inference: filling in gaps where data is absent or ambiguous
The hallucination problem is concentrated in layer 3. When training data is sparse, outdated, or contradictory, and when live retrieval returns pages that bury key facts in marketing prose, the model fills gaps with plausible-sounding inferences.
For small or mid-sized companies, the training data is thin. For companies with recent product changes, the training data is stale. For companies with complex product portfolios, the relationships between products are ambiguous in prose. For companies competing in crowded categories, the model has learned patterns from competitors and applies them to you.
The outcome: wrong features, wrong pricing, wrong competitive positioning, wrong integrations.
The Hallucination Taxonomy
Based on the patterns that emerge in AI answers about brands, there are five common hallucination types:
Feature hallucination: "Company X supports feature Y" when Y doesn't exist or was removed. The model learned that products in this category typically have Y and applied the pattern.
Capability overstatement: "Company X can handle enterprise workloads at any scale": vague marketing language that the model inflated into a specific technical claim.
Stale pricing: "Company X costs $29/month": the price from two years ago, before a pricing change that wasn't widely covered.
Competitor confusion: "Company X is similar to Company Z and offers the same [feature set]": the model conflated two companies in the same category and inherited features from the competitor.
Integration fabrication: "Company X integrates with [tool Y]": the model knows Y is popular in the category, inferred the integration exists, and stated it as fact.
All five have the same underlying cause: the model lacked authoritative first-party data and filled the gap.
How entities.txt Creates a Verifiable Contract
entities.txt gives AI agents a single, authoritative, machine-readable file that they can check before making claims about your brand.
The structure is explicit:
```json
{
  "entities": [
    {
      "id": "acme-projects",
      "name": "Acme Projects",
      "type": "product",
      "capabilities": ["task management", "gantt charts", "resource allocation"],
      "pricing": { "from": "$0/mo", "to": "$49/mo", "url": "/pricing" }
    }
  ],
  "relationships": [
    { "from": "acme-corp", "type": "alternative_to", "to": "Jira" },
    { "from": "acme-corp", "type": "alternative_to", "to": "Asana" }
  ]
}
```
An AI agent that checks this file before answering "Does Acme Corp have an API?" can see: the product entity lists task management, gantt charts, and resource allocation as capabilities. No API. If the product doesn't list API capabilities and there's no api-reference page in the pages map, a well-behaved agent should not assert one exists.
The principle: missing = don't assume.
If an entity, capability, or relationship isn't in the file, it should not be asserted. The entities.txt file is the website's explicit declaration of what it is, and what it isn't.
This is not technically enforced; AI agents that choose to consult entities.txt can use it as a ground-truth check. But as the standard spreads, agent developers will increasingly build this check in: "before asserting a capability, verify it's in the site's entities.txt."
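A minimal sketch of what that check could look like in Python, assuming the schema shown above (the entity id, field names, and sample capabilities come from the example; the standard does not mandate any particular client logic):

```python
import json

def declares_capability(data: dict, entity_id: str, capability: str) -> bool:
    """Return True only if the capability is explicitly listed on the entity.
    Anything missing from the file is treated as not asserted."""
    for entity in data.get("entities", []):
        if entity.get("id") == entity_id:
            caps = [c.lower() for c in entity.get("capabilities", [])]
            return capability.lower() in caps
    return False  # unknown entity: don't assume

# Sample file content mirroring the example above
entities_txt = json.loads("""{
  "entities": [
    {"id": "acme-projects", "name": "Acme Projects", "type": "product",
     "capabilities": ["task management", "gantt charts", "resource allocation"]}
  ]
}""")

print(declares_capability(entities_txt, "acme-projects", "gantt charts"))  # True
print(declares_capability(entities_txt, "acme-projects", "REST API"))      # False
```

The default of `False` for an unknown entity or unlisted capability is the whole point: absence of evidence is treated as a reason to stay silent, not to infer.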
Each Hallucination Type, Solved
Feature hallucination → The capabilities array on each entity is an explicit list of what the product does. A capability not on the list should not be asserted.
Capability overstatement → The explicit vocabulary forces precision. "Handles enterprise workloads at any scale" doesn't fit neatly into a capabilities array; it forces the website owner to be specific or omit the claim.
Stale pricing → The pricing field with a direct link to the pricing page gives agents a live source to check rather than relying on training data.
Competitor confusion → The relationships array with explicit alternative_to entries tells the agent exactly which competitors you acknowledge, and by omission, which comparisons are not valid.
Integration fabrication → integrates_with relationships in the file are explicit. Integrations not listed are not asserted.
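The relationship-based checks (competitor confusion, integration fabrication) can be sketched the same way, assuming the relationships array shown earlier (the `from`/`type`/`to` field names follow that example):

```python
def relationship_exists(data: dict, source: str, rel_type: str, target: str) -> bool:
    """Return True only if the relationship is explicitly declared in entities.txt."""
    for rel in data.get("relationships", []):
        if (rel.get("from") == source
                and rel.get("type") == rel_type
                and rel.get("to", "").lower() == target.lower()):
            return True
    return False  # not declared: the agent should not assert it

# Sample relationships mirroring the example earlier in the article
site_data = {"relationships": [
    {"from": "acme-corp", "type": "alternative_to", "to": "Jira"},
    {"from": "acme-corp", "type": "alternative_to", "to": "Asana"},
]}

print(relationship_exists(site_data, "acme-corp", "alternative_to", "Jira"))    # True
print(relationship_exists(site_data, "acme-corp", "integrates_with", "Slack"))  # False
```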
The External Verification Layer
entities.txt also enables external verification through identifiers. When your organization entity includes links to authoritative external sources:
"identifiers": {
"linkedin": "https://linkedin.com/company/acme",
"crunchbase": "https://crunchbase.com/organization/acme",
"wikipedia": "https://en.wikipedia.org/wiki/Acme_Corp"
}
An AI agent can cross-reference your self-declared entity data against independent sources. If your entities.txt says "founded 2019" and your Crunchbase profile corroborates this, the claim has higher confidence. If they conflict, the agent has a signal that something needs verification.
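One way an agent might fold corroboration into confidence, sketched in Python. This is a hypothetical scoring scheme, not part of the entities.txt standard, and fetching each external profile is stubbed out; a real agent would retrieve and parse each identifier URL:

```python
def corroboration_score(self_declared: str, external_values: dict) -> float:
    """Fraction of external sources whose value agrees with the self-declared one.
    external_values maps a source name (e.g. "crunchbase") to the value found
    there; in a real agent these would be fetched from the identifier URLs."""
    if not external_values:
        return 0.0  # no corroboration available
    agree = sum(1 for v in external_values.values() if v == self_declared)
    return agree / len(external_values)

# Hypothetical: a "founded 2019" claim checked against two external profiles
print(corroboration_score("2019", {"crunchbase": "2019", "linkedin": "2019"}))  # 1.0
print(corroboration_score("2019", {"crunchbase": "2018", "linkedin": "2019"}))  # 0.5
```

A conflicting source drops the score rather than zeroing it, which matches the article's framing: conflict is a signal to verify, not an automatic rejection.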
This is the same principle behind entity authority in knowledge graphs: facts corroborated by multiple authoritative sources are weighted more heavily than uncorroborated self-declarations. For more on building that external entity authority, see Entity Authority for AI: How to Build Knowledge Graph Presence AI Systems Trust.
The First-Mover Advantage
There's a compounding benefit to publishing entities.txt early: training data.
As AI models are retrained on web content, sites with well-structured entities.txt files will have their entity relationships and capabilities represented accurately in the training corpus. The entities.txt becomes part of the ground truth that future model versions learn from.
Sites without it will continue to have their entity data inferred from prose, which means continued exposure to all five hallucination types outlined above.
The window to shape how AI models represent your brand is widest for those who act first. The time to establish accurate entity data is before the next major training run, not after.
Run an ansly audit on your site to get a generated starter entities.txt based on your crawled content. The generated file is a starting point: review it, refine the capabilities and relationships, and publish it at your site root. The more accurate and complete it is, the more reliable a ground-truth check it becomes for AI agents reasoning about your brand.
For a deeper look at why this is needed now, see entities.txt: Why Every Website Needs a Semantic Identity File for AI Agents. For the broader entity authority strategy, see Entity Authority for AI and B2B AI Search Visibility Guide.