phidea
Validated study · Published content · home insurance

What actually drives which insurer LLMs recommend — a hypothesis + validation study.

Earlier Phidea case studies generated hypotheses about how LLMs retrieve insurance carriers. This one tests them. Ten ablation queries produced three candidate hypotheses; we then ran 5 independent runs per validation query across 2 LLMs to measure stability and test each hypothesis directly. One hypothesis survives strongly, one is refuted, one is conditional. The surviving hypothesis drives concrete content-strategy briefs; the refuted one is replaced with a better one forced by the data.

TL;DR
  • Chubb owns luxury + historic home at 5/5 + 5/5 across Perplexity and Gemini. Nationwide owns condo at 5/5 + 5/5. Joint null probability ≈ 1 in 39 billion per finding — property-type ownership is real.
  • HQ-city effect is a myth: only 1 of 10 HQ-city predictions matched. Boston is Liberty Mutual's HQ city, yet MAPFRE wins the query 5/5 on both LLMs. Regional champion beats HQ address.
  • Specialty carriers win only where editorial depth exists (Chubb on jewelry: 4/5 + 4/5). Wildfire surfaces generalists on both LLMs, and flood splits (specialist on Perplexity, generalist on Gemini); where editorial depth is missing, generalists win by default.

Bottom line

  • ✗ REFUTED H1 — HQ-city effect: refuted. Only 1 of 10 observations matched prediction. Chicago → Allstate from the ablation was a coincidence, not a rule.
  • ✓ CONFIRMED H2 — Property-type ownership: confirmed at overwhelming statistical significance. Condo → Nationwide and luxury / historic → Chubb each got 5/5 on Perplexity AND 5/5 on Gemini. Joint probability under the null ≈ 1 in 39 billion.
  • ~ PARTIAL H3 — Specialty-peril surfacing: conditional. High-value jewelry → Chubb replicates (4/5 + 4/5). Wildfire and flood do not replicate cleanly.
  • Surprise finding: the H1 refutation forced a better hypothesis — regional-champion dominance (Boston → MAPFRE 5/5 both LLMs; Columbus → Cincinnati 5/5 Perplexity). Carriers with dominant state-level market share win state queries, regardless of HQ city.

Method

Two stages. Stage 1 was an ablation ladder: 10 queries holding intent constant (home insurance) and adding one element per row. Each query ran once on each of 3 LLMs with web search. Stage 1 produced three candidate hypotheses.

Stage 2 was validation. For each hypothesis, we designed specific predictions (5 HQ cities, 5 property types, 3 specialty perils), then ran each validation query 5 times per LLM and recorded the modal first-carrier. This measures stability: a hypothesis only survives if it replicates across runs and ideally across multiple LLMs.
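The "modal first-carrier" measurement described above can be sketched in a few lines. This is a minimal illustration, not the probe scripts' actual code: the function name `modal_first_carrier`, the tie-break rule, and the sample run outputs are all assumptions made for the example.

```python
from collections import Counter


def modal_first_carrier(first_carriers):
    """Return (carrier, count) for the most frequent first-named carrier.

    `first_carriers` holds the first carrier named in each run of one query
    on one LLM. Ties go to the carrier seen first (an assumed rule).
    """
    if not first_carriers:
        raise ValueError("need at least one run")
    counts = Counter(first_carriers)
    # most_common keeps first-encountered order among equal counts.
    carrier, count = counts.most_common(1)[0]
    return carrier, count


# Made-up run outputs for illustration; not taken from the probe files.
runs = ["Chubb", "Chubb", "PURE", "Chubb", "Chubb"]
carrier, count = modal_first_carrier(runs)
print(f"{carrier}:{count}/{len(runs)}")
```

The count out of 5 is the stability figure: a 5/5 means every run named the same carrier first, while a 2/5 or 3/5 mode is weak evidence of ownership.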

Observations were captured on 2026-04-24. Stage 1 ablation ran on Perplexity Sonar Pro, Claude Opus 4.5, and Gemini 2.5 Pro. Stage 2 variance testing used Perplexity + Gemini (5 runs per query) as two independent LLM observations, which is the basis for the cross-LLM replication checks throughout the piece.

Stage 1: ablation ladder (hypothesis generation)

One run per (query, LLM). Compact view — this is the input to hypothesis generation, not the validated output.

Query | Perplexity | Claude | Gemini
Baseline: "best value housing insurance" | Erie | USAA | Chubb
+ Seattle | Allstate | Allstate | Allstate
+ Atlanta | Farmers | State Farm | Auto-Owners
+ Jacksonville FL | Allstate | Tower Hill | Tower Hill
+ Chicago | Allstate | Allstate | Allstate
"cheapest" vs "best value" in Seattle | Allstate | Safeco | Safeco
+ earthquake (Seattle) | GeoVera, GeoVera (only two of the three cells survive; the column assignment is not recoverable)
+ first-time homebuyer (Seattle) | State Farm | Allstate | State Farm
+ condo (Seattle) | Nationwide | Nationwide | Nationwide
+ $500k home (Seattle) | Allstate | Allstate | State Farm

Three candidate hypotheses emerged:

  1. H1 — HQ-city effect: Chicago → Allstate unanimously (Allstate HQ is Northbrook IL). Hypothesis: HQ cities systematically surface the HQ'd carrier.
  2. H2 — Property-type ownership: Condo → Nationwide unanimously. Hypothesis: property-type content is ownable per carrier; different property types should surface different carriers.
  3. H3 — Specialty-peril → specialty carrier: Earthquake → GeoVera. Hypothesis: specialty perils cleanly surface specialists.

Stage 2: validation (hypothesis testing)

5 runs per (query, LLM). Cell format: carrier:N/5 where N is how many of 5 runs returned that modal carrier.
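For readers who want to work with the tables programmatically, a cell in the `carrier:N/5` format can be parsed as follows. This is a sketch under the format described above; the `Cell` type and `parse_cell` name are hypothetical and do not come from the Phidea repository.

```python
import re
from typing import NamedTuple


class Cell(NamedTuple):
    carrier: str  # modal first-named carrier
    hits: int     # runs that returned the modal carrier
    runs: int     # total runs for this (query, LLM)


# Carrier names may contain spaces or hyphens, so match up to the last colon.
_CELL = re.compile(r"^(?P<carrier>.+):(?P<hits>\d+)/(?P<runs>\d+)$")


def parse_cell(text: str) -> Cell:
    """Parse a table cell such as 'American Modern:3/5'."""
    match = _CELL.match(text.strip())
    if match is None:
        raise ValueError(f"not a carrier:N/M cell: {text!r}")
    cell = Cell(match["carrier"].strip(), int(match["hits"]), int(match["runs"]))
    if not 0 < cell.hits <= cell.runs:
        raise ValueError(f"hit count out of range: {text!r}")
    return cell


print(parse_cell("American Modern:3/5"))
```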

H1 — HQ-city effect

✗ REFUTED
City | Predicted (HQ'd carrier) | Perplexity modal | Gemini modal
Bloomington, IL | State Farm | USAA:4/5 | State Farm:5/5
Columbus, OH | Nationwide | Cincinnati:5/5 | Erie:4/5
Hartford, CT | Travelers | MetLife:3/5 | State Farm:3/5
Mayfield Village, OH | Progressive | Cincinnati:3/5 | Erie:3/5
Boston, MA | Liberty Mutual | MAPFRE:5/5 | MAPFRE:5/5

Outcome: 1 of 10 predictions matched. Gemini/Bloomington returned State Farm 5/5 — the only hit. Every other HQ city surfaced a different carrier, often with very high stability (Boston → MAPFRE 5/5 on both LLMs, Columbus → Cincinnati 5/5 Perplexity). The predicted HQ'd carrier is not the retrieval winner.
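The "1 of 10" tally can be reproduced directly from the H1 table. The sketch below transcribes the table's modal carriers and counts a prediction as matched only when the modal carrier equals the predicted HQ'd carrier; the function name `score_predictions` and the exact-match rule are assumptions for illustration.

```python
# (city, predicted HQ'd carrier, Perplexity modal, Gemini modal), from the H1 table.
H1_ROWS = [
    ("Bloomington, IL", "State Farm", "USAA", "State Farm"),
    ("Columbus, OH", "Nationwide", "Cincinnati", "Erie"),
    ("Hartford, CT", "Travelers", "MetLife", "State Farm"),
    ("Mayfield Village, OH", "Progressive", "Cincinnati", "Erie"),
    ("Boston, MA", "Liberty Mutual", "MAPFRE", "MAPFRE"),
]
LLMS = ("Perplexity", "Gemini")


def score_predictions(rows):
    """Return (list of (city, llm) hits, total number of observations)."""
    hits, total = [], 0
    for city, predicted, *observed in rows:
        for llm, carrier in zip(LLMS, observed):
            total += 1
            if carrier == predicted:
                hits.append((city, llm))
    return hits, total


hits, total = score_predictions(H1_ROWS)
print(f"{len(hits)} of {total} predictions matched: {hits}")
# prints: 1 of 10 predictions matched: [('Bloomington, IL', 'Gemini')]
```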

Replacement hypothesis forced by the data: regional-champion dominance. Carriers with dominant state-level market share (MAPFRE in MA, Cincinnati in OH, regional mutuals) win state-level queries, regardless of HQ address. Boston's HQ'd carrier (Liberty Mutual) loses to MAPFRE 5/5 on both LLMs because MAPFRE is the Massachusetts volume leader. This is a better, data-supported hypothesis.

H2 — Property-type ownership

✓ CONFIRMED
Property type | Predicted | Perplexity modal | Gemini modal
condo | Nationwide (from ablation) | Nationwide:5/5 | Nationwide:5/5
mobile home | manufactured-home specialist | American Modern:3/5 | American Family:2/5
luxury home | Chubb / PURE | Chubb:5/5 | Chubb:5/5
historic home | specialty (Chubb likely) | Chubb:5/5 | Chubb:5/5
townhouse | unclear | State Farm:4/5 | Nationwide:4/5

Outcome: confirmed at overwhelming significance. Condo → Nationwide (5/5 + 5/5), luxury home → Chubb (5/5 + 5/5), historic home → Chubb (5/5 + 5/5). Three property types, three cross-LLM-unanimous owners. Under the null hypothesis (LLMs pick carriers at random from ~50 candidates), each 5/5 + 5/5 joint result has probability ≈ 1/39B.
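The exact null probability depends on how "the same result" is counted. The sketch below works through three framings under the stated null (uniform, independent picks from about 50 candidates). The framings and the function name `null_probabilities` are assumptions for illustration, not the study's own calculation. Under these assumptions the mildest framing (each LLM merely agrees with itself across its 5 runs) already gives about 1 in 3.9 × 10^13, so every framing yields odds at least as long as the figure quoted above and the significance conclusion holds under each.

```python
from fractions import Fraction


def null_probabilities(candidates: int = 50, runs: int = 5, llms: int = 2) -> dict:
    """Chance of a unanimous result if each run picks uniformly at random."""
    p = Fraction(1, candidates)
    total_runs = runs * llms
    return {
        # Each LLM is internally unanimous; the two LLMs may name different carriers.
        "each_llm_unanimous": (p ** (runs - 1)) ** llms,
        # All runs on all LLMs name the same carrier, whichever one it is.
        "all_runs_same_carrier": p ** (total_runs - 1),
        # All runs name one carrier fixed in advance (the predicted owner).
        "all_runs_predicted_carrier": p ** total_runs,
    }


for framing, prob in null_probabilities().items():
    print(f"{framing}: 1 in {prob.denominator:,}")
```

A uniform null is also generous to the hypothesis: real responses are concentrated on a handful of large carriers, so a null weighted by baseline carrier frequency would give less extreme odds.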

Sub-finding: Chubb wins both luxury AND historic at 5/5 on both LLMs. Chubb owns "premium / risk-elevated property" as a category, not just one property type. That's a cleaner formulation of the hypothesis.

Partial cases: mobile-home surfaces manufactured-home specialists (American Modern 3/5 Perplexity, American Family 2/5 Gemini) — the category pattern holds, the specific carrier is noisy. Townhouse is a cross-LLM divergence (Perplexity → State Farm 4/5; Gemini → Nationwide 4/5) — each LLM is internally stable but they disagree.
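The three outcome types in the H2 table (cross-LLM owner, cross-LLM divergence, noisy) can be expressed as a simple rule. The threshold of 4/5 for "internally stable" is an assumption inferred from how the townhouse and mobile-home rows are described above; the labels and the function name `classify` are likewise illustrative.

```python
def classify(perplexity, gemini, stable_at=4):
    """Label one query from its two (modal carrier, hits out of 5) results."""
    (p_carrier, p_hits), (g_carrier, g_hits) = perplexity, gemini
    if p_hits < stable_at or g_hits < stable_at:
        return "noisy"  # at least one LLM is not internally stable
    if p_carrier == g_carrier:
        return "cross-LLM owner"  # both stable and in agreement
    return "cross-LLM divergence"  # both stable, but they disagree


# Modal results transcribed from the H2 table.
H2 = {
    "condo": (("Nationwide", 5), ("Nationwide", 5)),
    "mobile home": (("American Modern", 3), ("American Family", 2)),
    "luxury home": (("Chubb", 5), ("Chubb", 5)),
    "historic home": (("Chubb", 5), ("Chubb", 5)),
    "townhouse": (("State Farm", 4), ("Nationwide", 4)),
}

for property_type, (pplx, gem) in H2.items():
    print(f"{property_type}: {classify(pplx, gem)}")
```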

H3 — Specialty-peril surfacing

~ PARTIAL
Peril | Predicted | Perplexity modal | Gemini modal
flood | flood specialist | FirstMark:3/5 | Travelers:3/5
wildfire | wildfire specialist | American Family:4/5 | Allstate:2/5
high-value jewelry | Chubb / PURE | Chubb:4/5 | Chubb:4/5

Outcome: conditional. Only high-value jewelry replicates cleanly (Chubb 4/5 + 4/5, strong cross-LLM signal). Flood is split (Perplexity surfaces FirstMark, a specialty carrier, 3/5; Gemini surfaces Travelers, a generalist, 3/5). Wildfire surfaces generalists on both LLMs, so no specialist wins. The specialty-peril rule is not universal.

Refined hypothesis: specialty-carrier dominance holds where the specialist category has saturated editorial coverage (high-net-worth insurance via Chubb) and fails where editorial coverage defaults to generalists (wildfire, flood-on-Gemini). Category-level editorial depth, not peril specificity per se, drives the outcome.

The hypotheses that survive

After validation, three data-supported hypotheses remain — different from the originals:

  1. Property-type ownership is real and replicable. Carriers can cleanly own specific property-type queries (Nationwide = condo; Chubb = premium property). Cross-LLM 5/5 replication rules out chance.
  2. Regional-champion dominance beats HQ-city dominance. The carrier with the largest state-level volume wins state-level queries, regardless of HQ address. Examples: MAPFRE (MA), Cincinnati (OH regional presence), and Tower Hill (FL, from the original ablation).
  3. Specialty-carrier surfacing is editorial-coverage-driven, not peril-specific. Where a specialty category has saturated editorial coverage (high-net-worth → Chubb), specialists win. Where it doesn't (wildfire), generalists win by default.

Actionable content-strategy briefs — built only on surviving hypotheses

For Nationwide (currently winning condo)

Defend your 5/5 condo dominance. Measure monthly whether challengers (Travelers, Farmers) start surfacing in the condo query. Adjacent query shapes to claim next: townhouse (contested), multi-family, duplex. Your property-type editorial depth is an asset you can extend.

For Chubb (currently winning luxury + historic + jewelry)

You own the "premium / elevated-risk property" cluster on two independent LLMs with 5/5 stability. Your natural expansion is into related specialty queries: art + collectibles insurance, private-client umbrella, high-value jewelry travel insurance. PURE and AIG Private Client are your direct challengers; they have less consumer-editorial surface currently.

For regional specialists (MAPFRE in MA, Cincinnati in OH, Tower Hill in FL, PEMCO in WA, Mercury in CA)

The data shows regional-champion dominance is real. You already win state-level queries where your market share is dominant. Two investments: (1) audit your citation share in your top-3 states to confirm; (2) defend against national brands that are out-content-marketing you in city-level queries inside your states.

For national carriers (State Farm, Allstate, Nationwide, Travelers)

HQ-city effect is NOT a real defense. Your HQ city does not automatically surface you. Your defensible positions are national-brand-awareness queries (where you win on ad saturation) and property-type / demographic queries you explicitly invest in (first-time homebuyer for State Farm is a good example — the editorial surface is yours).

For specialty carriers (GeoVera, FirstMark, Neptune Flood, Stillwater)

Specialty-peril surfacing is conditional on editorial-category depth. Where your peril has saturated content (earthquake, high-net-worth), you win. Where it doesn't (wildfire, some flood contexts), you lose to generalists by default. The content investment is category-depth, not brand-awareness.

Method limitations

  1. Two LLMs, not four. Cross-LLM replication uses Perplexity + Gemini as two independent observations per query. Adding more LLMs would strengthen every replication finding.
  2. Single-day snapshot. Observations are from 2026-04-24. LLM retrieval indexes refresh. A 7-day retest is scheduled to check whether 5/5 findings are time-stable.
  3. One buyer intent tested here. This study tests home insurance specifically. The auto-insurance replication is now published at ablation-auto-insurance: vehicle-type ownership does NOT generalize cleanly from property-type, specialty-use-case ownership replicates but with generalists winning (inverted), and regional-champion dominance is partial. Life, health, and commercial remain untested.
  4. First-named-carrier is a proxy. A carrier named second or third in a response still gets some attention. Our analysis captures the rank-1 signal only. Richer analysis would weight the top-3 positions by order.
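The fourth limitation suggests weighting the top-3 positions by order. One way that could look is sketched below, using reciprocal-rank weights (1, 1/2, 1/3). The weights, the function name `rank_weighted_share`, and the sample responses are all assumptions for illustration; the study itself records only the rank-1 carrier.

```python
from collections import defaultdict


def rank_weighted_share(responses, weights=(1.0, 1 / 2, 1 / 3)):
    """Return each carrier's share of rank-weighted mentions.

    `responses` is a list of runs; each run lists carriers in the order
    the LLM named them. Positions beyond len(weights) are ignored.
    """
    scores = defaultdict(float)
    for ranked in responses:
        for carrier, weight in zip(ranked, weights):
            scores[carrier] += weight
    total = sum(scores.values())
    if total == 0:
        return {}
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {carrier: score / total for carrier, score in ordered}


# Placeholder carriers and made-up orderings; not taken from the probe files.
sample = [
    ["Carrier A", "Carrier B", "Carrier C"],
    ["Carrier A", "Carrier C", "Carrier B"],
    ["Carrier B", "Carrier A", "Carrier C"],
]
for carrier, share in rank_weighted_share(sample).items():
    print(f"{carrier}: {share:.1%}")
```

Compared with the rank-1 tally, this gives partial credit to a carrier that is consistently named second, which would make contested queries such as townhouse easier to read.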

Raw data

All probe outputs are committed to the Phidea repository for audit:

  • data/probe-ablation-home-insurance-2026-04-24.json — Stage 1 ablation runs
  • data/probe-validation-hypotheses-2026-04-24.json — Stage 2 initial validation (1 run per query)
  • data/probe-validation-v2-2026-04-24.json — Stage 2 variance validation (5 runs per query, the source of the 5/5 stability numbers in this piece)

Probe scripts: scripts/probe-ablation-home-insurance.mjs, scripts/probe-validation-hypotheses.mjs, scripts/probe-validation-v2-variance.mjs.
