phidea
Validated replication · auto insurance

Does the home-insurance LLM pattern generalize to auto? A validated replication study.

The prior home-insurance study left three surviving hypotheses. Do they generalize? We ran the same two-stage method on auto: 10 ablation queries, then 5 independent runs per validation query on each of 2 LLMs (12 validation queries, 60 calls per LLM). The findings are asymmetric: one hypothesis replicates in inverted form (generalists, not specialists, win the specialty surface), one replicates for only one of the three states tested, and one (vehicle-type ownership) is refuted. In LLM retrieval the auto market is structurally different from the home market, and that matters for anyone planning carrier content strategy across both lines.

TL;DR
  • The home-insurance pattern does not generalize to auto. Vehicle-type ownership: only EV → Travelers replicates (4/5 + 5/5). Sedan, truck, classic, and Tesla all show cross-LLM divergence or instability.
  • Specialty use-case ownership is real but inverted. Rideshare → State Farm (3/5 + 5/5) and SR-22 → GEICO (5/5 + 3/5) both replicate cross-LLM. The actual market specialists — Progressive for rideshare, Dairyland for SR-22 — do not win.
  • Regional champions win only where they dominate the specific line. NJ → NJM 4/5 + 4/5. MAPFRE owns MA home but not MA auto. Raw state-level market share does not transfer across product lines.
  • Structural read: auto LLM retrieval has stronger national-brand gravity than home. Generalists (State Farm, GEICO, Progressive, Travelers, USAA) absorb most query shapes unless a regional monopoly or an open emerging category (EV) is in play.

Bottom line

  • ✗ REFUTED Vehicle-type ownership: refuted as a clean rule. Of 5 vehicle types tested, only EV → Travelers replicates cross-LLM (4/5 + 5/5). Sedan, truck, classic, and Tesla all show cross-LLM divergence or instability. The home-insurance finding (property-type ownership, 5/5 + 5/5 on Chubb and Nationwide) does not generalize.
  • ⇄ INVERTED Specialty-use-case ownership: confirmed but inverted. Rideshare → State Farm (3/5 + 5/5) and SR-22 → GEICO (5/5 + 3/5) replicate cross-LLM. But the predicted specialists — Progressive (rideshare-native) and Dairyland (SR-22 specialist) — do not win. Generalists have annexed the specialty-intent surface.
  • ~ PARTIAL Regional-champion dominance: partial. Newark → NJM replicates at 4/5 + 4/5. Boston → MAPFRE (which worked 5/5 + 5/5 in home) fails in auto — both LLMs pick State Farm. Regional champions win only where they dominate the specific line of business.
  • Structural read: auto-insurance LLM retrieval has a stronger national-brand gravity than home. Specialty and regional carriers hold clean ownership in home; in auto, the generalist set (State Farm, GEICO, Progressive, Travelers, USAA) absorbs most query shapes unless a very tight regional monopoly exists (NJM) or a new editorial category is open (EV).

Method

Identical two-stage design to the home-insurance study. Stage 1: 10 ablation queries holding intent constant (auto insurance) and adding one element per row. Stage 2: three hypotheses mapped from the home study (vehicle-type ownership as the analog to property-type, specialty-use-case ownership as the analog to specialty-peril, and regional-champion dominance), each with 3 to 5 predictions and 5 independent runs per (query, LLM).

Observations captured on 2026-04-24 against Perplexity Sonar Pro and Gemini 2.5 Pro, both with web search / grounding enabled. Two independent LLM observations per query is the basis for the cross-LLM replication checks throughout.

First-named-carrier is the capture signal. Modal carrier across 5 runs is the stability measure. Cross-LLM agreement (both LLMs returning the same modal at ≥3/5) is the strongest signal; within-LLM stability (5/5) with cross-LLM disagreement is called out as divergence.
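The capture rule above can be sketched in a few lines. This is a minimal illustration, not the probe script itself: the function names, the "unstable" label, and the use of 3/5 as the stability floor on both LLMs are assumptions, and the "other-*" entries are placeholders for non-modal answers the report does not list.

```python
from collections import Counter

def modal_cell(first_named):
    """Return (carrier, count) for the most frequent first-named carrier.

    None entries (responses with no recognized carrier) are skipped.
    Ties resolve to the carrier seen first, which is an arbitrary choice.
    """
    counts = Counter(c for c in first_named if c is not None)
    if not counts:
        return None, 0
    return counts.most_common(1)[0]

def classify(runs_a, runs_b, threshold=3):
    """Label one query from two LLMs' run lists (threshold is an assumption)."""
    a, n_a = modal_cell(runs_a)
    b, n_b = modal_cell(runs_b)
    if n_a < threshold or n_b < threshold:
        return "unstable"      # at least one LLM has no modal at >= threshold
    return "replicates" if a == b else "divergence"

# Illustrative run lists built from the reported modal counts only.
rideshare = classify(["State Farm"] * 3 + ["other-1", "other-2"],
                     ["State Farm"] * 5)
sedan = classify(["State Farm"] * 4 + ["other-1"], ["USAA"] * 5)
print(rideshare, sedan)
```

Keeping the threshold as a parameter makes it easy to re-run the classification with a stricter floor (for example 4/5) and see which findings survive.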

Stage 1: ablation ladder (hypothesis generation)

One run per (query, LLM). Both LLMs agree on 6 of 10 rows — a high initial agreement rate that does not survive the 5-run variance test for most vehicle and use-case queries.

Query | Perplexity | Gemini
Baseline: "best value car insurance" | Travelers | Travelers
+ Seattle | USAA | USAA
+ Atlanta | Auto-Owners | Auto-Owners
+ Los Angeles | Mercury | GEICO
+ Chicago | State Farm | State Farm
"cheapest" (Seattle) | USAA | USAA
+ young driver (Seattle) | State Farm | Travelers
+ SR-22 (Seattle) | GEICO | State Farm
+ Tesla (Seattle) | Progressive | Travelers
+ rideshare drivers (Seattle) | State Farm | State Farm
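The 6-of-10 agreement count can be checked directly from the Stage 1 rows. A small sketch, with the table transcribed by hand into a list of tuples (the short row labels are shorthand, not the literal query strings):

```python
# (row label, Perplexity first-named carrier, Gemini first-named carrier)
stage1 = [
    ("baseline", "Travelers", "Travelers"),
    ("+ Seattle", "USAA", "USAA"),
    ("+ Atlanta", "Auto-Owners", "Auto-Owners"),
    ("+ Los Angeles", "Mercury", "GEICO"),
    ("+ Chicago", "State Farm", "State Farm"),
    ("cheapest (Seattle)", "USAA", "USAA"),
    ("+ young driver", "State Farm", "Travelers"),
    ("+ SR-22", "GEICO", "State Farm"),
    ("+ Tesla", "Progressive", "Travelers"),
    ("+ rideshare", "State Farm", "State Farm"),
]

agree = [row for row, perplexity, gemini in stage1 if perplexity == gemini]
diverge = [row for row, perplexity, gemini in stage1 if perplexity != gemini]

print(f"agreement: {len(agree)}/{len(stage1)}")
print("divergent rows:", diverge)
```

The four divergent rows are the Los Angeles, young-driver, SR-22, and Tesla queries.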

Signals that looked promising at Stage 1:

  1. Atlanta → Auto-Owners (both LLMs). Suggestive of regional champion in the Southeast.
  2. Seattle → USAA (both LLMs), Chicago → State Farm (both LLMs). Suggestive of either HQ-effect or regional dominance.
  3. Rideshare → State Farm (both LLMs). Suggestive of use-case ownership — but the expected owner was Progressive.
  4. Tesla, SR-22, young driver all show cross-LLM divergence at the ablation stage — flagged as unstable before validation.

Stage 2: validation (hypothesis testing)

5 runs per (query, LLM). Cell format: carrier:N/5.

Vehicle-type ownership (analog to property-type in home)

✗ REFUTED
Vehicle | Predicted | Perplexity modal | Gemini modal
sedan | varies | State Farm:4/5 | USAA:5/5
pickup truck | varies | Progressive:3/5 | USAA:4/5
classic car | Hagerty (market specialist) | GEICO:4/5 | Safeco:1/5
Tesla | Tesla Insurance or generalist | Progressive:3/5 | Tesla Insurance:3/5
electric vehicle (generic) | varies | Travelers:4/5 | Travelers:5/5

Outcome: refuted as a general rule. The home-insurance finding was that property types (condo, luxury, historic) had clean per-carrier ownership at 5/5 + 5/5. In auto, only EV → Travelers comes close to that strength (4/5 + 5/5). The others show either cross-LLM divergence (sedan: State Farm 4/5 on Perplexity vs USAA 5/5 on Gemini; truck: Progressive 3/5 on Perplexity vs USAA 4/5 on Gemini; Tesla: Progressive 3/5 on Perplexity vs Tesla Insurance 3/5 on Gemini) or within-LLM instability (classic car: GEICO 4/5 on Perplexity, but no modal winner on Gemini, where Safeco reaches only 1/5).

Missing-specialist finding: Hagerty is the dominant US classic-car insurance specialist. It does not appear as the modal first-carrier on either LLM for the classic-car query. Tesla Insurance appears on Gemini 3/5 for the Tesla query but loses Perplexity entirely (modal = Progressive 3/5). The specialty auto carriers that own the category in the actual market do not own the query in LLM retrieval.

New emergent finding: EV insurance → Travelers (4/5 + 5/5) is a clean cross-LLM ownership pattern. Under the null (both LLMs picking from ~40 candidates independently), this joint probability is ≈ 1 in 30M. Travelers has captured EV editorial coverage in a way neither Tesla Insurance nor the insurtechs (Root, Lemonade) have.
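How large that figure is depends on how the null is specified, and the report does not spell out the calculation behind 1 in 30M. As a hedged illustration, the sketch below works through one possible reading: every run draws a first-named carrier uniformly and independently from 40 candidates. Both the uniform prior and the choice of 40 are assumptions, and this is not necessarily the null used for the figure above. Under this reading the joint probability comes out smaller still than 1 in 30M. A uniform null also overstates the surprise, since national brands are far more likely than 1 in 40 to be named first.

```python
from math import comb

def p_at_least(k, n, p):
    """Binomial tail: probability of at least k successes in n independent runs."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

N_CARRIERS = 40            # assumed candidate pool size ("~40" in the text)
p = 1 / N_CARRIERS         # assumed uniform chance of any one carrier per run

p_perplexity = p_at_least(4, 5, p)   # a named carrier first in >= 4 of 5 runs
p_gemini = p_at_least(5, 5, p)       # the same carrier first in 5 of 5 runs

joint_named = p_perplexity * p_gemini        # for one pre-specified carrier
joint_any = N_CARRIERS * joint_named         # for any carrier, not just Travelers

print(f"P(>=4/5) on one LLM:  {p_perplexity:.3e}")
print(f"P(5/5) on other LLM:  {p_gemini:.3e}")
print(f"joint, named carrier: {joint_named:.3e}")
print(f"joint, any carrier:   {joint_any:.3e}")
```

A more realistic null would replace the uniform `p` with each carrier's observed first-named share across all queries, which raises the joint probability substantially for the large generalists.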

Specialty-use-case ownership (analog to specialty-peril)

⇄ INVERTED
Use case | Predicted | Perplexity modal | Gemini modal
rideshare drivers | Progressive (rideshare specialist) | State Farm:3/5 | State Farm:5/5
SR-22 filing required | Dairyland (SR-22 specialist) | GEICO:5/5 | GEICO:3/5
new driver | GEICO or State Farm | Travelers:3/5 | State Farm:4/5
teen driver | State Farm | GEICO:3/5 | State Farm:5/5

Outcome: confirmed with inverted ownership. Specialty use cases DO have clean cross-LLM ownership: rideshare → State Farm (3/5 + 5/5) and SR-22 → GEICO (5/5 + 3/5) both replicate. The surprise is who owns them. In the actual market, Progressive is the rideshare-insurance category leader (TNC endorsement pioneer) and Dairyland is the largest SR-22 monoline writer. Neither wins the LLM query.

Why this is different from home: in home, specialty-peril surfacing favored the editorial-covered specialist (Chubb for high-net-worth, Chubb again for jewelry). In auto, the same mechanism favors the editorially-dominant generalist. State Farm and GEICO have deeper consumer-editorial content on rideshare and SR-22 than the specialists do. Category depth still drives retrieval — the category winner is just a different type of carrier.

Partial cases: new-driver (Perplexity: Travelers 3/5; Gemini: State Farm 4/5) and teen-driver (Perplexity: GEICO 3/5; Gemini: State Farm 5/5) both show cross-LLM divergence. Within each LLM the answers are fairly stable; the LLMs just disagree about which generalist owns the query.

Regional-champion dominance (replication from home)

~ PARTIAL
State / city | Predicted | Perplexity modal | Gemini modal
Boston, MA | MAPFRE / Commerce (MA volume leaders) | State Farm:5/5 | State Farm:3/5
Los Angeles, CA | Mercury (CA volume leader) | USAA:4/5 | GEICO:4/5
Newark, NJ | NJM (NJ volume leader) | NJM:4/5 | NJM:4/5

Outcome: partial. Newark → NJM replicates at 4/5 + 4/5, confirming regional-champion dominance where the regional carrier has overwhelming state-level share (NJM writes roughly one in four NJ auto policies). Los Angeles → Mercury fails: Mercury appeared once in the ablation (on Perplexity), but the validation modal is USAA on Perplexity and GEICO on Gemini. Boston → MAPFRE fails outright: both LLMs surface State Farm as modal.

Cross-study comparison: Boston → MAPFRE was 5/5 + 5/5 in the home study. Same city prompt, same LLMs, different insurance line, and the regional champion disappears. MAPFRE surfaces as the MA leader for home but not for auto, even though it is among the MA auto volume leaders (writing as Commerce / MAPFRE USA); its auto editorial coverage is less saturated. The regional-champion rule requires the regional carrier to be the line-specific leader AND to have category editorial depth, not just raw market share.

What replicates from home → auto, and what doesn't

Hypothesis | Home result | Auto result | Generalization
Type-level ownership (property / vehicle) | Confirmed 5/5 + 5/5 (condo, luxury, historic) | Refuted (only EV cleanly replicates at 4/5 + 5/5) | No: home is category-dense, auto is brand-dense
Specialty / use-case ownership | Partial: specialist wins where editorial depth exists (Chubb = jewelry) | Confirmed but inverted: generalist wins (State Farm = rideshare, GEICO = SR-22) | Mechanism generalizes, actor flips
Regional champion dominance | Confirmed (Boston → MAPFRE 5/5 + 5/5, Columbus → Cincinnati 5/5) | Partial (Newark → NJM 4/5 + 4/5; Boston → MAPFRE refuted) | Conditional on line-specific dominance

Actionable content-strategy briefs — built only on surviving auto findings

For Travelers (currently winning EV insurance)

You hold EV → Travelers at 4/5 + 5/5 cross-LLM — a clean emergent ownership pattern. Defend it and extend into adjacent emerging categories: PHEV (plug-in hybrid), home-charging-station coverage bundles, telematics-first pricing content. Root and Tesla Insurance are the competitive threats; neither currently has your editorial surface.

For State Farm (rideshare) and GEICO (SR-22)

You own two specialty use-case queries where the actual market specialists (Progressive for rideshare, Dairyland for SR-22) should arguably win. Protect the surface by continuing to publish category-level consumer content — that's what's winning you the retrieval, not brand alone. Measure monthly; challengers with higher editorial investment can take this back.

For specialty auto carriers (Dairyland, Hagerty, Progressive for rideshare, Tesla Insurance)

The auto market is not the home market. Being the category specialist does not win you the LLM query by default. Classic-car (Hagerty) does not win the LLM classic-car query; SR-22 (Dairyland) does not win the LLM SR-22 query; rideshare (Progressive) does not win the LLM rideshare query. The GEO investment is category-educational content, not brand-reminder content. Closing the editorial gap vs State Farm / GEICO is the mechanical path back to the surface.

For NJM (and regional monopolists)

Newark → NJM replicates 4/5 + 4/5. You win your state cleanly where your market share is overwhelming. Defensive priorities: audit your top-5 NJ city query shapes monthly; watch for generalist incursion. The Boston → MAPFRE refutation is the warning: if your state-level dominance slips line-by-line, the regional-champion effect can disappear per line.

For MAPFRE / Mercury (regional champions in one line, not the other)

MAPFRE owns MA home but not MA auto in LLM retrieval. Mercury shows up once in the CA ablation but loses validation. Carrier brand recognition at the state level does not transfer across lines in LLM answers. Each line-of-business needs its own editorial surface to win its own retrieval. Budget accordingly.

For national carriers competing on auto (Progressive, Allstate, Farmers, Liberty Mutual)

State Farm, GEICO, USAA, and Travelers are over-indexing across the auto LLM surface. Your recoverable ground is emerging categories (EV, PHEV, telematics-priced policies) and specific use-cases where incumbents have thin editorial coverage. Brand-awareness spend does not move these queries; category-specific content does.

Method limitations

  1. Two LLMs, not more. Two independent LLM observations per query (Perplexity + Gemini) is the minimum bar for cross-LLM replication; more would strengthen every finding.
  2. Carrier pattern library. We track 40+ auto carriers. If the modal first-named entity in a response is outside that list (a specialty MGA we don't recognize, a lead-gen aggregator like Insurify as an answer-level entity), it reads as null. This biases the sample toward named carriers we recognize. Auto-market MGAs and embedded players are underrepresented.
  3. Single-day snapshot. 2026-04-24 only. The parallel home study also ran on a single day. Time stability needs a 7-day retest — planned for 2026-05-01.
  4. First-named-carrier is a proxy. A meaningfully different analysis would weight the top-3 carriers per response by rank.
  5. One country. US-market carriers only. Auto-insurance LLM retrieval in UK / EU / Canadian markets is out of scope here and would need its own ablation.
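Limitations 2 and 4 are easier to see with a concrete sketch of the capture step. The code below is an assumed reconstruction, not the contents of the probe script: the carrier list is a small illustrative subset of the 40+ tracked names, the matching is plain case-insensitive whole-word search, the 3/2/1 rank weights are arbitrary, and the sample responses are invented for the example.

```python
import re

# Illustrative subset of the carrier pattern library (the real list is 40+ names).
CARRIERS = ["State Farm", "GEICO", "Progressive", "Travelers",
            "USAA", "NJM", "Hagerty", "Dairyland"]

def carriers_in_order(text, library=CARRIERS):
    """Return recognized carriers ordered by first mention in the response."""
    hits = []
    for name in library:
        # Case-insensitive so "Geico" and "GEICO" both match.
        match = re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE)
        if match:
            hits.append((match.start(), name))
    return [name for _, name in sorted(hits)]

def first_named(text, library=CARRIERS):
    """First recognized carrier, or None when nothing in the library appears."""
    ordered = carriers_in_order(text, library)
    return ordered[0] if ordered else None

def rank_weighted(responses, weights=(3, 2, 1), library=CARRIERS):
    """Sum rank weights over the top-3 recognized carriers of each response."""
    scores = {}
    for text in responses:
        for weight, name in zip(weights, carriers_in_order(text, library)):
            scores[name] = scores.get(name, 0) + weight
    return scores

# Hypothetical response texts, for illustration only.
responses = [
    "For classic cars, Hagerty and GEICO are common picks; State Farm also writes them.",
    "Geico is usually cheapest, followed by Progressive.",
]
print([first_named(r) for r in responses])
print(first_named("Insurify lists several quotes."))   # aggregator reads as null
print(rank_weighted(responses))
```

The null case shows where the recognition bias comes from: an answer that leads with an aggregator or an unlisted MGA contributes nothing, even when a recognized carrier would have been named second by a human reader using a fuller list.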

Raw data

All probe output is committed to the Phidea repository for audit:

  • data/probe-auto-insurance-2026-04-24.json — full Stage 1 + Stage 2 raw runs for auto insurance
  • data/probe-validation-v2-2026-04-24.json — parallel home-insurance variance data for cross-line comparison

Probe script: scripts/probe-ablation-auto-insurance.mjs.
