phidea
Reference · page 3 / 6

3. LLM-readable feeds — llms.txt, RSS, JSON APIs


A growing ecosystem of machine-readable surfaces sits adjacent to your HTML pages. Some matter today, some will matter in 2027, some are aspirational. This page sorts them by current operational impact.

llms.txt

A proposed convention (introduced 2024) for a single text file at the root of a domain that summarises what an LLM should know about the site, with curated links to canonical sources. Format is compact markdown.

The shape:

```
# Acme Insurance

US property and casualty insurer specialising in personal lines.

## Products

## Editorial

## Carrier card

- About Acme: history, AM Best rating, NAIC code.
```

Status today (2026): limited evidence that LLMs actively use llms.txt files in their grounding pipelines. Phidea's multi-LLM citation probes do NOT show llms.txt URLs in citation hosts. Anthropic and OpenAI have not committed to an llms.txt-specific crawl path.

Why ship it anyway: cheap to maintain (one markdown file), zero downside, and as the convention matures the carriers with a curated llms.txt will be the ones LLMs read first. The cost of being wrong about llms.txt is one hour of work; the cost of skipping it if it becomes canonical in 2027 is missing a year of leverage. Ship it.

What to put in it: a curated map of your site, NOT a dump. The point is to summarise what's worth reading, not to inventory every page.

RSS / Atom

The original machine-readable feed. Now relevant for two LLM-specific reasons:

  1. Freshness signal. RSS feeds are how new articles get into syndication aggregators (Feedly, Substack, etc.) and from there into the citation graph LLMs read. A page discovered only by web crawlers gets indexed slowly; a page also surfaced via RSS reaches broader aggregator surfaces faster.
  2. Curated trade-press distribution. Insurance trade publications (Coverager, Carrier Management, Insurance Journal) syndicate from RSS feeds. Getting picked up by them adds a citation surface LLMs already read.

What to ship: an RSS feed at `/rss.xml` or `/feed.xml` for your editorial content. Categorised feeds (`/rss/auto.xml`, `/rss/cyber.xml`) if you publish across distinct verticals. Standard RSS 2.0 or Atom 1.0; both work.

What NOT to ship: full-content feeds for ongoing analysis pieces. Headline + summary + canonical link is enough; full-content RSS dilutes the citation graph because aggregators republish without driving traffic to your domain.
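A minimal RSS 2.0 sketch of the headline + summary + canonical-link posture described above. The feed title, article, and URLs are all hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Acme Insurance — Editorial</title>
    <link>https://acme-insurance.com/editorial</link>
    <description>Guides and analysis from Acme Insurance.</description>
    <item>
      <title>How cyber coverage limits work</title>
      <link>https://acme-insurance.com/guides/cyber-limits</link>
      <!-- summary only: the canonical full text stays on your domain -->
      <description>A plain-English guide to cyber policy limits.</description>
      <pubDate>Mon, 04 May 2026 09:00:00 GMT</pubDate>
      <guid>https://acme-insurance.com/guides/cyber-limits</guid>
    </item>
  </channel>
</rss>
```

The `<description>` carries the summary, not the body; the `<link>` and `<guid>` both point at the canonical URL so aggregator republication still resolves back to your domain.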

JSON APIs

A JSON endpoint that returns structured data about your products, carriers, or content. Different from schema.org JSON-LD (which lives in the HTML page) — these are dedicated endpoints tools and aggregators can call directly.

Phidea exposes `/api/observations.json` for its own observation registry and `/api/tools.json` for its tool catalog. The pattern:

```http
GET /api/observations.json
Content-Type: application/json
Cache-Control: public, max-age=3600

{
  "generatedAt": "2026-05-04",
  "count": 27,
  "observations": [
    { "id": "...", "intent": "home", "lever": "...", ... },
    ...
  ]
}
```

Why it matters: LLMs and aggregator tools are increasingly capable of reading JSON. A well-structured API endpoint can be cited or queried directly by LLM agents and by analytics tools.

Insurance-specific shape:

  • `/api/products.json` — your product catalog with pricing, state availability, coverage tags.
  • `/api/carriers.json` — if you're a comparison site or aggregator, the carriers you cover.
  • `/api/citations.json` — the comparison-site citations and trade-press mentions you've earned, structured. Useful for proving your media authority.
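One possible shape for a `/api/products.json` response. Every field name and value here is an assumption for illustration, not a standard:

```json
{
  "generatedAt": "2026-05-04",
  "count": 1,
  "products": [
    {
      "id": "acme-auto",
      "name": "Acme Auto",
      "line": "personal-auto",
      "states": ["CA", "TX", "NY"],
      "coverageTags": ["liability", "collision", "comprehensive"],
      "canonicalUrl": "https://acme-insurance.com/products/auto"
    }
  ]
}
```

Including a `canonicalUrl` per record matters: it gives any agent that queries the endpoint a citable page on your domain, not just a data blob.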

What NOT to ship:

  • Don't expose raw rate-filing data unless you're explicitly OK with competitors querying it.
  • Don't expose customer-claim data, even aggregated, unless explicitly anonymised and DOI-cleared.
  • Don't expose anything you wouldn't want a competitor to scrape and republish.

sitemap.xml

Boring but mandatory. Your site's URL inventory in XML format, served at `/sitemap.xml`. Standard SEO infrastructure; LLMs and search-engine crawlers both read it.

For insurance sites, three categories of URLs need to be in your sitemap:

  1. Product pages (per state).
  2. Editorial pages (guides, comparisons, FAQs).
  3. About / regulatory pages (carrier card, AM Best info, complaint contact).

`changefreq` and `priority` are weak signals; don't over-engineer. A correct sitemap.xml that lists your real URLs is more valuable than a perfectly tuned one that misses pages.
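The three URL categories above can be assembled into a minimal sitemap with the standard library alone. A sketch; the URLs are hypothetical:

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Build a minimal sitemap.xml string from a list of absolute URLs."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = loc
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            + ET.tostring(urlset, encoding="unicode"))

sitemap = build_sitemap([
    "https://acme-insurance.com/products/auto/ca",    # product page (per state)
    "https://acme-insurance.com/guides/auto-basics",  # editorial
    "https://acme-insurance.com/about/carrier-card",  # about / regulatory
])
```

Deliberately no `changefreq` or `priority`: per the note above, a complete and correct `<loc>` list is the part that matters.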

robots.txt

The other mandatory file. Tells crawlers what they can and can't index. Insurance-specific consideration: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and CCBot are the LLM crawlers worth allowing or excluding explicitly.

Default posture for a US insurance carrier: allow all five LLM crawlers. The marginal value of being indexed by them is high; the marginal cost is essentially zero.

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: *
Allow: /

Sitemap: https://acme-insurance.com/sitemap.xml
```

When to block: if you're a publisher whose business model depends on direct traffic and you've explicitly decided LLM training shouldn't use your content, then block GPTBot / ClaudeBot. For insurance carriers, this almost never applies — your business is selling policies, not serving ads, so being trained on is a benefit, not a cost.
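Whichever posture you choose, it's worth sanity-checking the file before deploying. Python's stdlib `urllib.robotparser` answers the same allow/deny question a crawler would; the URL here is hypothetical:

```python
from urllib import robotparser

def allows(robots_txt: str, agent: str, url: str) -> bool:
    """Parse a robots.txt body and ask whether `agent` may fetch `url`."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)

# Two postures for the same crawler: allow-all vs. explicit block.
OPEN = "User-agent: GPTBot\nAllow: /\n\nUser-agent: *\nAllow: /\n"
BLOCKED = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"

URL = "https://acme-insurance.com/products/auto"  # hypothetical page

print(allows(OPEN, "GPTBot", URL))     # → True
print(allows(BLOCKED, "GPTBot", URL))  # → False
```

A check like this catches the classic failure mode where an overly broad `Disallow` silently excludes the crawlers you meant to allow.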

Open Graph / Twitter Cards

Standard social-share meta tags. Not LLM-specific, but they affect how your URLs render when shared into social platforms that LLMs sometimes read (LinkedIn, X). Worth setting once and forgetting.
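The set-and-forget baseline is a handful of `<meta>` tags in each page's `<head>`. The article title, description, and URLs below are hypothetical:

```html
<meta property="og:title" content="How cyber coverage limits work" />
<meta property="og:description" content="A plain-English guide to cyber policy limits." />
<meta property="og:url" content="https://acme-insurance.com/guides/cyber-limits" />
<meta property="og:image" content="https://acme-insurance.com/og/cyber-limits.png" />
<meta property="og:type" content="article" />
<meta name="twitter:card" content="summary_large_image" />
```

`og:url` should match the page's canonical URL so shares consolidate onto one citable address.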

Less-relevant feeds

Three formats often suggested but currently low-impact for LLM citation:

  • JSON Feed: a JSON alternative to RSS. Aggregator support is sparse; ship RSS instead.
  • WebSub / PubSubHubbub: real-time push notifications for feed updates. Used by some aggregators; not used by LLM grounding.
  • DBpedia / Wikidata exports: the LLM training corpus uses them, but you can't directly publish to them. Improving your Wikipedia entry is the indirect path.

What to ship and when

Sequenced rollout:

  1. Week 1: sitemap.xml + robots.txt. Mandatory baseline.
  2. Week 2: RSS feed for editorial content. Aggregator-pickup leverage.
  3. Week 3: llms.txt with curated site map. Cheap insurance against the convention maturing.
  4. Week 4: Open Graph / Twitter Card meta tags. Social-platform consistency.
  5. Quarter 2: JSON API endpoints. Higher-effort, higher-leverage; only after the basics are clean.

Most insurance carriers in 2026 don't have llms.txt or comprehensive JSON APIs. Shipping them now is a small lead that compounds across crawls.


The honest framing for executives: LLM-readable feeds are infrastructure, not strategy. Get the table-stakes right (sitemap, robots, RSS), be slightly ahead of the curve on llms.txt, defer JSON APIs until you have a hypothesis for what you'd publish through them. The carriers that have invested heavily in this layer aren't winning more LLM citations than the carriers that haven't — yet. The leverage shows up later.