In September 2024, Jeremy Howard of Answer.AI proposed a small, sensible-looking file. Put a markdown document called llms.txt at your site root, the spec said: an H1 with your project name, a blockquote summary, then tidy sections of links so a language model can skip your nav bars and ad slots and read a clean map of what you offer. A companion file, llms-full.txt, would carry the full text of the site's docs in one place. It is a genuinely good idea about a real problem: HTML is a lossy way to feed a model, and context windows are finite.
Almost two years later, we can stop theorizing about whether it works, because someone counted. Ahrefs looked at 137,000 sites and found that 97% of their llms.txt files received zero requests in the month studied. Of the bots that did fetch one, 77% weren't AI tools at all — they were SEO auditors and generic crawlers. The actual answer-engine bots, the ones you wrote the file for, made a few hundred fetches across thousands of sites. The file is being published into a room with no one in it.
This is not a temporary gap that adoption will close. It's structural, and the clearest way to see why is to notice what llms.txt is: a document in which you describe yourself to a machine and ask it to believe you.
The meta keywords problem, again
We have run this experiment before. The <meta name="keywords"> tag let a page tell search engines what it was about, in the page's own words. It died because the incentive is fatal: every page claims to be the most relevant page for everything, so a self-reported signal carries no information a ranking system can use. Google's John Mueller made the comparison directly, arguing these systems "can't trust what is here as a way of differentiating between different websites." His colleague Gary Illyes was blunter: Google doesn't support llms.txt and isn't planning to.
A self-description file fails for the same reason meta keywords did: the web's trust machinery is built specifically to never take your word for it.
That is the one idea worth carrying out of this. An answer engine's whole job is to decide what is credible, and credibility is the one thing a source cannot assert about itself. llms.txt asks to be trusted by the exact systems engineered to discount self-assertion.
Then why do Anthropic, Stripe, and Vercel publish one?
Because there's a real use case hiding under the SEO hype, and it isn't search. The companies that maintain a good llms.txt (Anthropic's lives at docs.claude.com/llms.txt) publish it so that coding agents can load their API docs. When you point Cursor or Claude Code at a library, an llms.txt or llms-full.txt is a clean, single-fetch way to pull the reference into context. That's a documentation-delivery convenience for in-product AI, not a lever on how ChatGPT decides whom to cite. Conflating the two is most of why the file got oversold.
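To make the coding-agent use case concrete, here is a minimal sketch of how a tool might turn an llms.txt into a link map, following the layout described at the top of this piece (an H1, a blockquote summary, then sections of links). The sample file, the project name, the URLs, and the names parse_llms_txt and LlmsTxt are all invented for illustration. Real agents may parse the file quite differently, or simply paste it into context whole.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sample file: the project, summary, and URLs are made up.
SAMPLE = """\
# ExampleLib

> ExampleLib is a small HTTP client for Python.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first request
- [API reference](https://example.com/docs/api.md)

## Examples

- [Retries](https://example.com/examples/retries.md): backoff patterns
"""

# A link line: "- [title](url)" with an optional ": note" suffix.
LINK = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # section name -> list of (title, url, note) tuples
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, the first blockquote line, and the H2 link sections."""
    doc = LlmsTxt()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and not doc.summary:
            # Simplification: only the first blockquote line is kept.
            doc.summary = line[1:].strip()
        elif current is not None:
            match = LINK.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return doc


doc = parse_llms_txt(SAMPLE)
print(doc.title, list(doc.sections))
```

The point of the sketch is how little there is to do: one fetch and a few string checks yield a structured map of the docs, which is why the format suits documentation delivery regardless of what search engines do with it.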
What actually earns the citation
The mechanism is unglamorous and well documented. Most answer engines retrieve against an index before they generate — ChatGPT Search leans on Bing's index — so being crawlable and present in that index is the price of entry, full stop. From there, the Princeton "Generative Engine Optimization" paper (KDD 2024) measured what changes whether your passage gets pulled into an answer: adding citations, statistics, and direct quotations lifted visibility by up to 40%, while keyword stuffing did nothing. The engine rewards content that reads like something already credible.
And the strongest signal isn't on your page at all. Ahrefs' analysis of what AI assistants cite found that brand mentions across the web correlate with AI visibility roughly 3x more strongly than backlinks do. Reddit, YouTube, and news coverage move the needle. If you want to be quoted by a machine, the work is the same work that earns a human's trust: get other people to talk about you, in places the index already trusts, in language that's easy to lift. That index is also the retrieval substrate the agentic crawlers read, so you are optimizing for the index, not for a file.
The lever that does exist
Here's the irony. The one new, enforceable power publishers actually gained over AI in the last year is the opposite of llms.txt: not a file that invites the machine in and describes the buffet, but a wall with a turnstile. On July 1, 2025, Cloudflare began blocking AI crawlers by default for new domains and launched Pay Per Crawl in private beta: a crawler that requests a priced URL gets an HTTP 402 Payment Required response with the price attached. The publisher can allow, charge, or block each crawler, and the choice is enforced at the edge.
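The allow, charge, or block decision can be sketched in a few lines. This is a toy model under stated assumptions, not Cloudflare's implementation: the crawler names, the price, the decide function, the 403 for blocked crawlers, and the header names crawler-price and crawler-charged are illustrative guesses and should be checked against Cloudflare's own documentation before anything depends on them.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical per-crawler policy table; names and price are invented.
POLICY = {
    "ExampleSearchBot": ("allow", None),
    "ExampleTrainingBot": ("block", None),
    "ExampleAnswerBot": ("charge", 0.01),  # assumed price in USD per request
}


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)


def decide(crawler: str, offered_max_price: Optional[float] = None) -> Response:
    """Return the response an edge gate might send to a given crawler."""
    # Assumption: crawlers missing from the table are blocked by default.
    action, price = POLICY.get(crawler, ("block", None))
    if action == "allow":
        return Response(200)
    if action == "block":
        return Response(403)  # assumed status for a blocked crawler
    # action == "charge": serve only if the crawler offered enough to pay.
    if offered_max_price is not None and offered_max_price >= price:
        return Response(200, {"crawler-charged": f"USD {price:.2f}"})
    # Otherwise answer 402 Payment Required with the price attached.
    return Response(402, {"crawler-price": f"USD {price:.2f}"})


for bot in ("ExampleSearchBot", "ExampleTrainingBot", "ExampleAnswerBot"):
    print(bot, decide(bot).status)
```

What matters is where the decision runs. The gate sits in front of the content, so a crawler that ignores the policy gets a 402 or a 403 rather than the page.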
So the honest summary is a reversal of the hype. The file you publish to be read is ignored. The mechanisms that control access (robots.txt, and now the 402) are respected because they can be enforced, not because they're polite. And the thing that earns citations was never a file: it's being in the index, being quotable, and being talked about. Spend the hour you'd put into a perfect llms.txt on a study worth citing instead. The machines won't read your self-description, but they're very good at repeating what everyone else says about you.



