3. LLM-readable feeds — llms.txt, RSS, JSON APIs
Part 3 of 6 (technical) · ← Structured data · Index · Next → Syndication pipeline
A growing ecosystem of machine-readable surfaces sits adjacent to your HTML pages. Some matter today, some will matter in 2027, some are aspirational. This page sorts them by current operational impact.
llms.txt
A proposed convention (introduced 2024) for a single text file at the root of a domain that summarises what an LLM should know about the site, with curated links to canonical sources. Format is compact markdown.
The shape:
```markdown
# Acme Insurance

> US property and casualty insurer specialising in personal lines.

## Products
- Homeowners HO-3: standard policy, 47 states.
- Auto liability: minimum-required coverage, all 50 states.

## Editorial

## Carrier card
- About Acme: history, AM Best rating, NAIC code.
```
Status today (2026): limited evidence that LLMs actively use llms.txt files in their grounding pipelines. Phidea's multi-LLM citation probes do NOT show llms.txt URLs among citation hosts. Anthropic and OpenAI have not committed to an llms.txt-specific crawl path.
Why ship it anyway: cheap to maintain (one markdown file), zero downside, and as the convention matures the carriers with a curated llms.txt will be the ones LLMs read first. The cost of being wrong about llms.txt is one hour of work; the cost of skipping it if it becomes canonical in 2027 is missing a year of leverage. Ship it.
What to put in it: a curated map of your site, NOT a dump. The point is to summarise what's worth reading, not to inventory every page.
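A curated map like this can be assembled from a hand-picked page list rather than maintained by hand. A minimal sketch, assuming the compact-markdown shape shown above (the section names and entries here are illustrative, not a real site inventory):

```python
# Sketch: assemble a curated llms.txt from a hand-picked page map.
# Sections and entries are illustrative placeholders.

def build_llms_txt(site_name: str, summary: str,
                   sections: dict[str, list[tuple[str, str]]]) -> str:
    """Render the compact-markdown llms.txt shape: an H1 title,
    a one-line summary, then an H2 per curated section."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, entries in sections.items():
        lines.append(f"## {heading}")
        for title, note in entries:
            lines.append(f"- {title}: {note}")
        lines.append("")
    return "\n".join(lines)

print(build_llms_txt(
    "Acme Insurance",
    "US property and casualty insurer specialising in personal lines.",
    {"Products": [("Homeowners HO-3", "standard policy, 47 states.")]},
))
```

The point of the structure is that curation lives in one small data literal, which keeps the file honest about being a summary rather than an inventory.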
RSS / Atom
The original machine-readable feed. Now relevant for two LLM-specific reasons:
- Freshness signal. RSS feeds are how new articles get into syndication aggregators (Feedly, Substack, etc.) and from there into the citation graph LLMs read. A page that's only crawled by web crawlers indexes slowly; a page also surfaced via RSS hits broader aggregator surfaces faster.
- Curated trade-press distribution. Insurance trade publications (Coverager, Carrier Management, Insurance Journal) syndicate from RSS feeds. Getting picked up by them adds a citation surface LLMs already read.
What to ship: an RSS feed at `/rss.xml` or `/feed.xml` for your editorial content. Categorised feeds (`/rss/auto.xml`, `/rss/cyber.xml`) if you publish across distinct verticals. Standard RSS 2.0 or Atom 1.0; both work.
What NOT to ship: full-content feeds for ongoing analysis pieces. Headline + summary + canonical link is enough; full-content RSS dilutes the citation graph because aggregators republish without driving traffic to your domain.
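The headline + summary + canonical-link shape can be generated with the standard library alone. A minimal RSS 2.0 sketch (the articles and URLs below are placeholders, not real pages):

```python
import xml.etree.ElementTree as ET

# Sketch: a minimal RSS 2.0 feed for editorial content --
# headline + summary + canonical link, deliberately no full content.
# Article titles and URLs are placeholders.

def build_rss(channel_title: str, site_url: str, articles: list[dict]) -> str:
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = channel_title
    for a in articles:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = a["title"]
        ET.SubElement(item, "link").text = a["url"]  # canonical URL
        ET.SubElement(item, "description").text = a["summary"]
        ET.SubElement(item, "guid").text = a["url"]
    return ET.tostring(rss, encoding="unicode")

feed = build_rss("Acme Editorial", "https://acme-insurance.com", [
    {"title": "HO-3 vs HO-5", "summary": "What the two policy forms cover.",
     "url": "https://acme-insurance.com/guides/ho3-vs-ho5"},
])
print(feed)
```

Keeping `description` to a one-line summary is what preserves the canonical link as the destination; aggregators republish the summary, not the piece.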
JSON APIs
A JSON endpoint that returns structured data about your products, carriers, or content. Different from schema.org JSON-LD (which lives in the HTML page) — these are dedicated endpoints tools and aggregators can call directly.
Phidea exposes `/api/observations.json` for its own observation registry and `/api/tools.json` for its tool catalog. The pattern:
```http
GET /api/observations.json
Content-Type: application/json
Cache-Control: public, max-age=3600

{
  "generatedAt": "2026-05-04",
  "count": 27,
  "observations": [
    { "id": "...", "intent": "home", "lever": "...", ... },
    ...
  ]
}
```
Why it matters: LLMs and aggregator tools are increasingly capable of reading JSON. A well-structured API endpoint can be cited or queried directly by LLM agents and by analytics tools.
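Building that response is mostly a matter of getting the envelope fields and cache headers right. A sketch of one way to do it, using only the standard library (the observation record is a placeholder):

```python
import json
from datetime import date

# Sketch: build the headers and body for a cacheable JSON endpoint
# in the shape shown above. The record fields are placeholders.

def observations_response(observations: list[dict]) -> tuple[dict, str]:
    """Return (headers, body) for the endpoint -- cacheable for an hour."""
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": "public, max-age=3600",
    }
    body = json.dumps({
        "generatedAt": date.today().isoformat(),
        "count": len(observations),
        "observations": observations,
    }, indent=2)
    return headers, body

headers, body = observations_response([
    {"id": "obs-001", "intent": "home", "lever": "freshness"},  # placeholder
])
print(body)
```

The `count` field is redundant with the array length, but it gives an agent a cheap consistency check before parsing the whole payload.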
Insurance-specific shape:
- \
/api/products.json\— your product catalog with pricing, state availability, coverage tags. - \
/api/carriers.json\— if you're a comparison site or aggregator, the carriers you cover. - \
/api/citations.json\— the comparison-site citations and trade-press mentions you've earned, structured. Useful for proving your media authority.
What NOT to ship:
- Don't expose raw rate-filing data unless you're explicitly OK with competitors querying it.
- Don't expose customer-claim data, even aggregated, unless explicitly anonymised and DOI-cleared.
- Don't expose anything you wouldn't want a competitor to scrape and republish.
sitemap.xml
Boring but mandatory. Your site's URL inventory in XML format, served at `/sitemap.xml`. Standard SEO infrastructure; LLMs and search-engine crawlers both read it.
For insurance sites, three categories of URLs need to be in your sitemap:
- Product pages (per state).
- Editorial pages (guides, comparisons, FAQs).
- About / regulatory pages (carrier card, AM Best info, complaint contact).
`changefreq` and `priority` are weak signals; don't over-engineer. A correct sitemap.xml that lists your real URLs is more valuable than a perfectly tuned one that misses pages.
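Generating the file from your real URL inventory is a few lines with the standard library. A sketch covering the three categories above (the URLs are illustrative):

```python
import xml.etree.ElementTree as ET

# Sketch: emit a sitemap.xml from a plain URL list.
# URLs below are illustrative, one per category.

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls: list[str]) -> str:
    urlset = ET.Element("urlset", xmlns=NS)
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u
        # changefreq / priority omitted deliberately -- weak signals.
    return ET.tostring(urlset, encoding="unicode")

sitemap = build_sitemap([
    "https://acme-insurance.com/homeowners/tx",      # product page (per state)
    "https://acme-insurance.com/guides/ho3-vs-ho5",  # editorial
    "https://acme-insurance.com/about",              # carrier card / regulatory
])
print(sitemap)
```

Driving this from the same source of truth as your routing table is what keeps the sitemap from silently missing pages.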
robots.txt
The other mandatory file. Tells crawlers what they can and can't index. Insurance-specific consideration: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and CCBot are the LLM crawlers worth allowing or excluding explicitly.
Default posture for a US insurance carrier: allow all five LLM crawlers. The marginal value of being indexed by them is high; the marginal cost is essentially zero.
```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: *
Allow: /

Sitemap: https://acme-insurance.com/sitemap.xml
```
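It's worth verifying the posture mechanically rather than by eyeball; Python's stdlib robots parser can confirm each crawler is actually allowed:

```python
import urllib.robotparser

# Sketch: check the allow-all posture for the five LLM crawlers
# against an in-memory copy of the robots.txt rules.

ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: *
Allow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]:
    assert parser.can_fetch(bot, "https://acme-insurance.com/homeowners/tx")
print("all five LLM crawlers allowed")
```

The same check, pointed at your production `/robots.txt`, makes a cheap CI guard against someone accidentally tightening the rules.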
When to block: if you're a publisher whose business model depends on direct traffic and you've explicitly decided LLM training shouldn't use your content, then block GPTBot / ClaudeBot. For insurance carriers, this almost never applies — your business is selling policies, not serving ads, so being trained on is a benefit, not a cost.
Open Graph / Twitter Cards
Standard social-share meta tags. Not LLM-specific, but they affect how your URLs render when shared into social platforms that LLMs sometimes read (LinkedIn, X). Worth setting once and forgetting.
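"Set once and forget" is easiest when the tags are rendered from one helper. A sketch of the standard tag set (the page values are placeholders; a real implementation should also HTML-escape the inputs):

```python
# Sketch: render standard Open Graph / Twitter Card meta tags.
# Title, description, and URLs below are placeholders.

def social_meta(title: str, description: str, url: str, image: str) -> str:
    tags = [
        f'<meta property="og:title" content="{title}">',
        f'<meta property="og:description" content="{description}">',
        f'<meta property="og:url" content="{url}">',
        f'<meta property="og:image" content="{image}">',
        '<meta name="twitter:card" content="summary_large_image">',
        f'<meta name="twitter:title" content="{title}">',
    ]
    return "\n".join(tags)

print(social_meta(
    "Homeowners HO-3 | Acme Insurance",
    "Standard HO-3 policy, available in 47 states.",
    "https://acme-insurance.com/homeowners",
    "https://acme-insurance.com/og/homeowners.png",
))
```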
Less-relevant feeds
Three formats often suggested but currently low-impact for LLM citation:
- JSON Feed: a JSON alternative to RSS. Aggregator support is sparse; ship RSS instead.
- WebSub / PubSubHubbub: real-time push notifications for feed updates. Used by some aggregators; not used by LLM grounding.
- DBpedia / Wikidata exports: the LLM training corpus uses them, but you can't directly publish to them. Improving your Wikipedia entry is the indirect path.
What to ship and when
Sequenced rollout:
- Week 1: sitemap.xml + robots.txt. Mandatory baseline.
- Week 2: RSS feed for editorial content. Aggregator-pickup leverage.
- Week 3: llms.txt with curated site map. Cheap insurance against the convention maturing.
- Week 4: Open Graph / Twitter Card meta tags. Social-platform consistency.
- Quarter 2: JSON API endpoints. Higher-effort, higher-leverage; only after the basics are clean.
Most insurance carriers in 2026 don't have llms.txt or comprehensive JSON APIs. Shipping them now is a small lead that compounds across crawls.
The honest framing for executives: LLM-readable feeds are infrastructure, not strategy. Get the table-stakes right (sitemap, robots, RSS), be slightly ahead of the curve on llms.txt, defer JSON APIs until you have a hypothesis for what you'd publish through them. The carriers that have invested heavily in this layer aren't winning more LLM citations than the carriers that haven't — yet. The leverage shows up later.