Validated study · Published content · home insurance

What actually drives which insurer LLMs recommend — a hypothesis + validation study.

Earlier Phidea case studies generated hypotheses about how LLMs retrieve insurance carriers. This one tests them. Ten ablation queries produced three candidate hypotheses; we then ran 5 independent runs per validation query across 2 LLMs to measure stability and test each hypothesis directly. One hypothesis survives strongly, one is refuted, and one is conditional. The refuted hypothesis is replaced with a better one forced by the data, and the hypotheses that survive validation drive concrete content-strategy briefs.

TL;DR

Chubb owns luxury + historic home at 5/5 + 5/5 across Perplexity and Gemini. Nationwide owns condo at 5/5 + 5/5. Joint null probability ≈ 1 in 39 billion per finding — property-type ownership is real.
HQ-city effect is a myth: only 1 of 10 HQ-city predictions matched. Boston is Liberty Mutual's HQ city, yet MAPFRE wins the Boston query 5/5 on both LLMs. The regional champion beats the HQ address.
Specialty carriers win only where editorial depth exists (Chubb on high-value jewelry: 4/5 + 4/5). Wildfire surfaces no specialist on either LLM, and flood surfaces one only on Perplexity (FirstMark, 3/5); otherwise generalists win by default.

Bottom line

✗ REFUTED H1 — HQ-city effect: refuted. Only 1 of 10 observations matched prediction. Chicago → Allstate from the ablation was a coincidence, not a rule.
✓ CONFIRMED H2 — Property-type ownership: confirmed at overwhelming statistical significance. Condo → Nationwide and luxury / historic → Chubb each got 5/5 on Perplexity AND 5/5 on Gemini. Joint probability under the null ≈ 1 in 39 billion.
~ PARTIAL H3 — Specialty-peril surfacing: conditional. High-value jewelry → Chubb replicates (4/5 + 4/5). Wildfire and flood do not replicate cleanly.
Surprise finding: the H1 refutation forced a better hypothesis — regional-champion dominance (Boston → MAPFRE 5/5 both LLMs; Columbus → Cincinnati 5/5 Perplexity). Carriers with dominant state-level market share win state queries, regardless of HQ city.

Method

Two stages. Stage 1 was an ablation ladder: 10 queries holding intent constant (home insurance) and adding one element per row. Each query ran once across 4 LLMs with web search. Stage 1 output three candidate hypotheses.

Stage 2 was validation. For each hypothesis, we designed specific predictions (5 HQ cities, 5 property types, 3 specialty perils), then ran each validation query 5 times per LLM and recorded the modal first-named carrier. This measures stability: a hypothesis survives only if it replicates across runs and, ideally, across multiple LLMs.
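The piece does not publish its tallying logic, so here is a minimal sketch of how the modal first-named carrier and the carrier:N/5 cells could be computed, assuming each run has already been reduced to the carrier it named first. The helper names modal_carrier and format_cell are hypothetical, and the tie-break (earliest run wins) is an assumption; ties are possible when the modal count is as low as 2/5, and the study does not say how they were resolved.

```python
from collections import Counter


def modal_carrier(first_carriers: list[str]) -> tuple[str, int]:
    """Return the most frequent first-named carrier and how many runs named it.

    Ties are broken in favor of the carrier seen in the earliest run
    (an assumption; the study does not state a tie-break rule).
    """
    if not first_carriers:
        raise ValueError("need at least one run")
    counts = Counter(first_carriers)
    top = max(counts.values())
    # Scan runs in order so that, among tied carriers, the earliest one wins.
    for carrier in first_carriers:
        if counts[carrier] == top:
            return carrier, top
    raise AssertionError("unreachable: some carrier must reach the top count")


def format_cell(first_carriers: list[str]) -> str:
    """Render one (query, LLM) result in the tables' 'Carrier:N/R' style."""
    carrier, n = modal_carrier(first_carriers)
    return f"{carrier}:{n}/{len(first_carriers)}"


# Hypothetical run-level data; the piece reports only the modal cells.
runs = ["Carrier A", "Carrier A", "Carrier B", "Carrier A", "Carrier A"]
print(format_cell(runs))  # Carrier A:4/5
```

Reporting the count next to the carrier, rather than the carrier alone, is what lets the tables distinguish a 5/5 owner from a 2/5 plurality.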

Observations captured on 2026-04-24. Stage 1 ablation ran Perplexity Sonar Pro, Claude Opus 4.5, and Gemini 2.5 Pro. Stage 2 variance testing uses Perplexity + Gemini (5 runs per query) as two independent LLM observations, which is the basis for cross-LLM replication checks throughout the piece.

Stage 1: ablation ladder (hypothesis generation)

One run per (query, LLM). Compact view — this is the input to hypothesis generation, not the validated output.

Query	Perplexity	Claude	Gemini
Baseline — "best value housing insurance"	Erie	USAA	Chubb
+ Seattle	Allstate	Allstate	Allstate
+ Atlanta	Farmers	State Farm	Auto-Owners
+ Jacksonville FL	Allstate	Tower Hill	Tower Hill
+ Chicago	Allstate	Allstate	Allstate
"cheapest" vs "best value" in Seattle	Allstate	Safeco	Safeco
+ earthquake (Seattle)	GeoVera	GeoVera	—
+ first-time homebuyer (Seattle)	State Farm	Allstate	State Farm
+ condo (Seattle)	Nationwide	Nationwide	Nationwide
+ $500k home (Seattle)	Allstate	Allstate	State Farm

Three candidate hypotheses emerged:

H1 — HQ-city effect: Chicago → Allstate unanimously (Allstate HQ is Northbrook IL). Hypothesis: HQ cities systematically surface the HQ'd carrier.
H2 — Property-type ownership: Condo → Nationwide unanimously. Hypothesis: property-type content is ownable per carrier; different property types should surface different carriers.
H3 — Specialty-peril → specialty carrier: Earthquake → GeoVera. Hypothesis: specialty perils cleanly surface specialists.

Stage 2: validation (hypothesis testing)

5 runs per (query, LLM). Cell format: carrier:N/5, where N is the number of the 5 runs that named that modal carrier first.

H1 — HQ-city effect

✗ REFUTED

City	Predicted (HQ'd carrier)	Perplexity modal	Gemini modal
Bloomington, IL	State Farm	USAA:4/5	State Farm:5/5
Columbus, OH	Nationwide	Cincinnati:5/5	Erie:4/5
Hartford, CT	Travelers	MetLife:3/5	State Farm:3/5
Mayfield Village, OH	Progressive	Cincinnati:3/5	Erie:3/5
Boston, MA	Liberty Mutual	MAPFRE:5/5	MAPFRE:5/5

Outcome: 1 of 10 predictions matched. Gemini/Bloomington returned State Farm 5/5, the only hit. Every other (city, LLM) observation surfaced a different carrier, often with very high stability (Boston → MAPFRE 5/5 on both LLMs, Columbus → Cincinnati 5/5 on Perplexity). The predicted HQ'd carrier is not the retrieval winner.
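The 1-of-10 tally can be reproduced directly from the H1 table. This sketch hard-codes the modal carriers shown above and ignores the N/5 stability counts, because the prediction test only asks whether the modal carrier equals the HQ'd carrier. score_predictions is a hypothetical helper, not one of the study's probe scripts.

```python
# Stage 2 H1 table: city -> (predicted HQ'd carrier, Perplexity modal, Gemini modal).
H1_RESULTS = {
    "Bloomington, IL": ("State Farm", "USAA", "State Farm"),
    "Columbus, OH": ("Nationwide", "Cincinnati", "Erie"),
    "Hartford, CT": ("Travelers", "MetLife", "State Farm"),
    "Mayfield Village, OH": ("Progressive", "Cincinnati", "Erie"),
    "Boston, MA": ("Liberty Mutual", "MAPFRE", "MAPFRE"),
}


def score_predictions(results):
    """Count (city, LLM) observations whose modal carrier equals the prediction."""
    hits = []
    total = 0
    for city, (predicted, perplexity, gemini) in results.items():
        for llm, carrier in (("Perplexity", perplexity), ("Gemini", gemini)):
            total += 1
            if carrier == predicted:
                hits.append((city, llm))
    return hits, total


hits, total = score_predictions(H1_RESULTS)
print(f"{len(hits)} of {total} predictions matched: {hits}")
# 1 of 10 predictions matched: [('Bloomington, IL', 'Gemini')]
```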

Replacement hypothesis forced by the data: regional-champion dominance. Carriers with dominant state-level market share (MAPFRE in MA, Cincinnati in OH, regional mutuals) win state-level queries, regardless of HQ address. Boston's HQ'd carrier (Liberty Mutual) loses to MAPFRE 5/5 on both LLMs because MAPFRE is the Massachusetts volume leader. This is a better, data-supported hypothesis.

H2 — Property-type ownership

✓ CONFIRMED

Property type	Predicted	Perplexity modal	Gemini modal
condo	Nationwide (from ablation)	Nationwide:5/5	Nationwide:5/5
mobile home	manufactured-home specialist	American Modern:3/5	American Family:2/5
luxury home	Chubb / PURE	Chubb:5/5	Chubb:5/5
historic home	specialty (Chubb likely)	Chubb:5/5	Chubb:5/5
townhouse	unclear	State Farm:4/5	Nationwide:4/5

Outcome: confirmed at overwhelming significance. Condo → Nationwide (5/5 + 5/5), luxury home → Chubb (5/5 + 5/5), historic home → Chubb (5/5 + 5/5). Three property types, three cross-LLM-unanimous owners. Under the null hypothesis (LLMs pick carriers at random from ~50 candidates), each 5/5 + 5/5 joint result has probability ≈ 1/39B.
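The piece does not show how the ≈1/39B figure was derived. Here is a minimal sketch of the simplest reading of the stated null, under two loud assumptions: every run independently names its first carrier uniformly at random from 50 candidates, and the 10 runs (5 per LLM) are independent. Under that model, a pre-specified carrier winning all 10 runs has probability (1/50)^10, about 1 in 9.8 × 10^16. If any carrier is allowed to be the unanimous winner, the probability is (1/50)^9, about 1 in 2.0 × 10^15. Both are far smaller than 1 in 39 billion, so under this particular model the quoted figure is conservative. Two caveats cut the other way. Repeated runs against the same retrieval index are unlikely to be fully independent. Real outputs also concentrate on a few large carriers, and a non-uniform null would typically make chance agreement more likely. null_probability is a hypothetical helper.

```python
from fractions import Fraction


def null_probability(num_candidates: int, runs_per_llm: int, num_llms: int,
                     prespecified: bool = True) -> Fraction:
    """Chance that one carrier is named first in every run under a uniform null.

    Model (an assumption): each run independently names its first carrier
    uniformly at random from `num_candidates` carriers.
    If `prespecified` is False, any carrier may be the unanimous winner,
    so the single-carrier probability is multiplied by `num_candidates`.
    """
    total_runs = runs_per_llm * num_llms
    p_one_carrier = Fraction(1, num_candidates) ** total_runs
    return p_one_carrier if prespecified else num_candidates * p_one_carrier


for prespecified in (True, False):
    p = null_probability(50, runs_per_llm=5, num_llms=2, prespecified=prespecified)
    # Convert to float before formatting; older Fractions reject format specs.
    print(f"prespecified={prespecified}: 1 in {float(1 / p):.3g}")
```

Exact Fraction arithmetic avoids any question of floating-point underflow at these magnitudes.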

Sub-finding: Chubb wins both luxury AND historic at 5/5 on both LLMs. Chubb owns "premium / risk-elevated property" as a category, not just one property type. That's a cleaner formulation of the hypothesis.

Partial cases: on mobile home, Perplexity surfaces a manufactured-home specialist (American Modern, 3/5), while Gemini's modal answer is American Family, a generalist, at only 2/5. The category pattern holds on one LLM, and the specific carrier is noisy. Townhouse is a cross-LLM divergence (Perplexity → State Farm 4/5; Gemini → Nationwide 4/5): each LLM is internally stable, but they disagree.
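The three readings used above (cross-LLM replication, internally stable but divergent, noisy) can be made mechanical. This is a minimal sketch, assuming a hypothetical cut-off of 4 of 5 runs for "stable". That cut-off matches how the piece treats townhouse (4/5 on each LLM) and high-value jewelry (4/5 + 4/5). Cell and classify are illustrative names, not part of the probe scripts.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """Modal result for one (query, LLM): the carrier and its count out of 5 runs."""
    carrier: str
    count: int


# Hypothetical cut-off: 4/5 or better counts as "internally stable".
STABLE_MIN_RUNS = 4


def classify(perplexity: Cell, gemini: Cell) -> str:
    """Label a query by whether two stable modal carriers agree across LLMs."""
    both_stable = (perplexity.count >= STABLE_MIN_RUNS
                   and gemini.count >= STABLE_MIN_RUNS)
    if not both_stable:
        return "noisy"
    if perplexity.carrier == gemini.carrier:
        return "cross-LLM replication"
    return "cross-LLM divergence"


# H2 table, transcribed from above.
H2_ROWS = {
    "condo": (Cell("Nationwide", 5), Cell("Nationwide", 5)),
    "mobile home": (Cell("American Modern", 3), Cell("American Family", 2)),
    "luxury home": (Cell("Chubb", 5), Cell("Chubb", 5)),
    "historic home": (Cell("Chubb", 5), Cell("Chubb", 5)),
    "townhouse": (Cell("State Farm", 4), Cell("Nationwide", 4)),
}

for property_type, (pplx, gem) in H2_ROWS.items():
    print(f"{property_type}: {classify(pplx, gem)}")
```

Applied to the H3 table, the same rule labels only high-value jewelry as a cross-LLM replication, with flood and wildfire as noisy. Note that the cut-off does not separate 5/5 from 4/5; a stricter analysis might report those separately.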

H3 — Specialty-peril surfacing

~ PARTIAL

Peril	Predicted	Perplexity modal	Gemini modal
flood	flood specialist	FirstMark:3/5	Travelers:3/5
wildfire	wildfire specialist	American Family:4/5	Allstate:2/5
high-value jewelry	Chubb / PURE	Chubb:4/5	Chubb:4/5

Outcome: conditional. Only high-value jewelry replicates cleanly (Chubb 4/5 + 4/5, a strong cross-LLM signal). Flood is split (Perplexity surfaces FirstMark, a specialty carrier, 3/5; Gemini surfaces Travelers, a generalist, 3/5). Wildfire surfaces generalists on both LLMs; no specialist wins. The specialty-peril rule is not universal.

Refined hypothesis: specialty-carrier dominance holds where the specialist category has saturated editorial coverage (high-net-worth insurance via Chubb) and fails where editorial coverage defaults to generalists (wildfire, flood-on-Gemini). Category-level editorial depth, not peril specificity per se, drives the outcome.

The hypotheses that survive

After validation, three data-supported hypotheses remain — different from the originals:

Property-type ownership is real and replicable. Carriers can cleanly own specific property-type queries (Nationwide = condo; Chubb = premium property). Cross-LLM 5/5 replication rules out chance.
Regional-champion dominance beats HQ-city dominance. The carrier with the largest state-level volume wins state-level queries, regardless of HQ address. Examples: MAPFRE (MA), Cincinnati (OH regional presence), and Tower Hill (FL, from the original ablation).
Specialty-carrier surfacing is editorial-coverage-driven, not peril-specific. Where a specialty category has saturated editorial coverage (high-net-worth → Chubb), specialists win. Where it doesn't (wildfire), generalists win by default.

Actionable content-strategy briefs — built only on surviving hypotheses

For Nationwide (currently winning condo)

Defend your 5/5 condo dominance. Measure monthly whether challengers (Travelers, Farmers) start surfacing in the condo query. Adjacent query shapes to claim next: townhouse (contested), multi-family, duplex. Your property-type editorial depth is an asset you can extend.
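One way to run this monthly check is to re-run the condo query on the same 5-runs-per-LLM schedule and diff the new cells against the 2026-04-24 baseline. This is a minimal sketch, assuming results are stored as carrier:N/5 strings keyed by (query, LLM). parse_cell and retest_diff are hypothetical helpers. The "next month" cells in the example are placeholders, not observed data. The same comparison would also fit the scheduled 7-day retest.

```python
def parse_cell(cell: str) -> tuple[str, int]:
    """Split a table cell like 'Chubb:5/5' into ('Chubb', 5)."""
    carrier, fraction = cell.rsplit(":", 1)
    count, _total = fraction.split("/")
    return carrier, int(count)


def retest_diff(baseline: dict, retest: dict) -> dict:
    """Report (query, LLM) keys whose modal carrier changed or whose count fell."""
    report = {}
    for key, before in baseline.items():
        if key not in retest:
            report[key] = "not re-run"
            continue
        old_carrier, old_n = parse_cell(before)
        new_carrier, new_n = parse_cell(retest[key])
        if new_carrier != old_carrier:
            report[key] = f"modal carrier changed: {old_carrier} -> {new_carrier}"
        elif new_n < old_n:
            report[key] = f"stability fell: {old_n} -> {new_n} runs"
    return report


baseline = {  # observed 2026-04-24 (H2 table)
    ("condo", "Perplexity"): "Nationwide:5/5",
    ("condo", "Gemini"): "Nationwide:5/5",
}
next_month = {  # placeholder values, not observed data
    ("condo", "Perplexity"): "Nationwide:5/5",
    ("condo", "Gemini"): "Carrier X:3/5",
}
print(retest_diff(baseline, next_month))
```

Flagging a drop from 5/5 to 4/5 separately from a change of modal carrier gives an early warning before a challenger actually takes the query.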

For Chubb (currently winning luxury + historic + jewelry)

You own the "premium / elevated-risk property" cluster on two independent LLMs with 5/5 stability. Your natural expansion is into related specialty queries: art + collectibles insurance, private-client umbrella, high-value jewelry travel insurance. PURE and AIG Private Client are your direct challengers; they have less consumer-editorial surface currently.

For regional specialists (MAPFRE in MA, Cincinnati in OH, Tower Hill in FL, PEMCO in WA, Mercury in CA)

The data shows regional-champion dominance is real. You already win state-level queries where your market share is dominant. Two investments: (1) audit your citation share in your top-3 states to confirm; (2) defend against national brands that are out-content-marketing you in city-level queries inside your states.

For national carriers (State Farm, Allstate, Nationwide, Travelers)

HQ-city effect is NOT a real defense. Your HQ city does not automatically surface you. Your defensible positions are national-brand-awareness queries (where you win on ad saturation) and property-type / demographic queries you explicitly invest in. First-time homebuyer is a good example for State Farm, which was named first by 2 of 3 LLMs on that query in the single-run Seattle ablation: the editorial surface is yours.

For specialty carriers (GeoVera, FirstMark, Neptune Flood, Stillwater)

Specialty-peril surfacing is conditional on editorial-category depth. Where your peril has saturated content (earthquake, high-net-worth), you win. Where it doesn't (wildfire, some flood contexts), you lose to generalists by default. The content investment is category depth, not brand awareness.

Method limitations

Two LLMs, not four. Cross-LLM replication uses Perplexity + Gemini as two independent observations per query. Adding more LLMs would strengthen every replication finding.
Single-day snapshot. Observations are from 2026-04-24. LLM retrieval indexes refresh. A 7-day retest is scheduled to check whether 5/5 findings are time-stable.
One buyer intent tested here. This study tests home insurance specifically. The auto-insurance replication is now published at ablation-auto-insurance. Its findings: vehicle-type ownership does NOT generalize cleanly from property-type ownership; specialty-use-case ownership replicates, but with generalists winning (inverted); and regional-champion dominance is partial. Life, health, and commercial remain untested.
First-named-carrier is a proxy. A carrier named second or third in a response still gets some attention. Our analysis captures the rank-1 signal only. Richer analysis would weight the top-3 positions by order.
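The richer analysis suggested in the limitation above can be sketched as a position-weighted share, assuming each response has been parsed into an ordered list of the carriers it names. The 1/rank weights and the weighted_share helper are hypothetical choices. A logarithmic discount, as in DCG, would be an equally defensible weighting. The carrier names in the example are placeholders.

```python
from collections import defaultdict

# Hypothetical rank weights: 1/rank for the first three carriers named.
RANK_WEIGHTS = (1.0, 1 / 2, 1 / 3)


def weighted_share(responses: list[list[str]]) -> dict[str, float]:
    """Position-weighted share of top-3 mentions, normalized to sum to 1.

    Each inner list is one response's carriers in the order they were named.
    """
    scores: dict[str, float] = defaultdict(float)
    for named in responses:
        # Keep only each carrier's first mention, preserving order.
        unique = list(dict.fromkeys(named))
        for weight, carrier in zip(RANK_WEIGHTS, unique):
            scores[carrier] += weight
    total = sum(scores.values())
    if total == 0:
        return {}
    return {carrier: score / total for carrier, score in scores.items()}


# Placeholder responses, not observed data.
share = weighted_share([
    ["Carrier A", "Carrier B", "Carrier C"],
    ["Carrier A", "Carrier C"],
])
print({carrier: round(s, 2) for carrier, s in share.items()})
```

Unlike the rank-1 proxy, this credits a carrier that is consistently named second.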

Raw data

All probe outputs are committed to the Phidea repository for audit:

data/probe-ablation-home-insurance-2026-04-24.json — Stage 1 ablation runs
data/probe-validation-hypotheses-2026-04-24.json — Stage 2 initial validation (1 run per query)
data/probe-validation-v2-2026-04-24.json — Stage 2 variance validation (5 runs per query, the source of the 5/5 stability numbers in this piece)

Probe scripts: scripts/probe-ablation-home-insurance.mjs, scripts/probe-validation-hypotheses.mjs, scripts/probe-validation-v2-variance.mjs.
