What actually drives which insurer LLMs recommend — a hypothesis + validation study.
Earlier Phidea case studies generated hypotheses about how LLMs retrieve insurance carriers. This one tests them. Ten ablation queries produced three candidate hypotheses; we then ran 5 independent runs per validation query across 2 LLMs to measure stability and test each hypothesis directly. One hypothesis survives strongly, one is refuted, one is conditional. The surviving hypothesis drives concrete content-strategy briefs; the refuted one is replaced with a better one forced by the data.
- Chubb owns luxury + historic home at 5/5 + 5/5 across Perplexity and Gemini. Nationwide owns condo at 5/5 + 5/5. Under a uniform-random null over ~50 carriers, the joint probability is ≈ 1 in 39 trillion per finding, so property-type ownership is real.
- HQ-city effect is a myth: only 1 of 10 HQ-city predictions matched. Boston is Liberty Mutual's HQ city, yet MAPFRE wins the Boston query 5/5 on both LLMs. Regional champion beats HQ address.
- Specialty carriers win only where editorial depth exists (Chubb on jewelry: 4/5 + 4/5). Wildfire surfaces generalists on both LLMs and flood splits between a specialist and a generalist, so no specialist wins those queries consistently.
Bottom line
- ✗ REFUTED H1 (HQ-city effect): only 1 of 10 observations matched the prediction. Chicago → Allstate in the ablation was a coincidence, not a rule.
- ✓ CONFIRMED H2 (property-type ownership): condo → Nationwide and luxury / historic → Chubb each scored 5/5 on Perplexity and 5/5 on Gemini. Joint probability under a uniform-random null over ~50 carriers ≈ 1 in 39 trillion.
- ~ PARTIAL H3 (specialty-peril surfacing): conditional. High-value jewelry → Chubb replicates (4/5 + 4/5). Wildfire and flood do not replicate cleanly.
- Surprise finding: the H1 refutation forced a better hypothesis — regional-champion dominance (Boston → MAPFRE 5/5 both LLMs; Columbus → Cincinnati 5/5 Perplexity). Carriers with dominant state-level market share win state queries, regardless of HQ city.
Method
Two stages. Stage 1 was an ablation ladder: 10 queries holding intent constant (home insurance) and adding one element per row. Each query ran once on each of 3 LLMs with web search. Stage 1 produced three candidate hypotheses.
Stage 2 was validation. For each hypothesis, we designed specific predictions (5 HQ cities, 5 property types, 3 specialty perils), then ran each validation query 5 times per LLM and recorded the modal first-carrier. This measures stability: a hypothesis only survives if it replicates across runs and ideally across multiple LLMs.
Observations captured on 2026-04-24. Stage 1 ablation ran Perplexity Sonar Pro, Claude Opus 4.5, and Gemini 2.5 Pro. Stage 2 variance testing uses Perplexity + Gemini (5 runs per query) as two independent LLM observations, which is the basis for cross-LLM replication checks throughout the piece.
Stage 1: ablation ladder (hypothesis generation)
One run per (query, LLM). Compact view — this is the input to hypothesis generation, not the validated output.
| Query | Perplexity | Claude | Gemini |
|---|---|---|---|
| Baseline — "best value housing insurance" | Erie | USAA | Chubb |
| + Seattle | Allstate | Allstate | Allstate |
| + Atlanta | Farmers | State Farm | Auto-Owners |
| + Jacksonville FL | Allstate | Tower Hill | Tower Hill |
| + Chicago | Allstate | Allstate | Allstate |
| "cheapest" vs "best value" in Seattle | Allstate | Safeco | Safeco |
| + earthquake (Seattle) | GeoVera | GeoVera | — |
| + first-time homebuyer (Seattle) | State Farm | Allstate | State Farm |
| + condo (Seattle) | Nationwide | Nationwide | Nationwide |
| + $500k home (Seattle) | Allstate | Allstate | State Farm |
Three candidate hypotheses emerged:
- H1 (HQ-city effect): Chicago → Allstate unanimously (Allstate is headquartered in Northbrook, IL, a Chicago suburb). Hypothesis: HQ cities systematically surface the HQ'd carrier.
- H2 (property-type ownership): Condo → Nationwide unanimously. Hypothesis: property-type content is ownable per carrier; different property types should surface different carriers.
- H3 (specialty peril → specialty carrier): Earthquake → GeoVera on two of the three LLMs. Hypothesis: specialty perils cleanly surface specialists.
Stage 2: validation (hypothesis testing)
5 runs per (query, LLM). Cell format: carrier:N/5 where N is how many of 5 runs returned that modal carrier.
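A minimal sketch of how a `carrier:N/5` cell could be derived from five runs. The function name `modal_cell` and the sample run log are illustrative assumptions, not the logic of the Phidea probe scripts.

```python
from collections import Counter

def modal_cell(first_carriers):
    """Return 'carrier:N/M' for the most frequent first-named carrier."""
    if not first_carriers:
        raise ValueError("need at least one run")
    counts = Counter(first_carriers)
    # most_common sorts by count; equal counts keep first-seen order
    carrier, n = counts.most_common(1)[0]
    return f"{carrier}:{n}/{len(first_carriers)}"

# Hypothetical run log for one (query, LLM) pair, not taken from the raw data
runs = ["USAA", "USAA", "State Farm", "USAA", "USAA"]
print(modal_cell(runs))  # USAA:4/5
```

Ties resolve to whichever carrier appeared first, so a 2/5 cell can hide a second carrier with the same count; low-N cells are best read as "no stable winner".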
H1 — HQ-city effect
✗ REFUTED

| City | Predicted (HQ'd carrier) | Perplexity modal | Gemini modal |
|---|---|---|---|
| Bloomington, IL | State Farm | USAA:4/5 | State Farm:5/5 |
| Columbus, OH | Nationwide | Cincinnati:5/5 | Erie:4/5 |
| Hartford, CT | Travelers | MetLife:3/5 | State Farm:3/5 |
| Mayfield Village, OH | Progressive | Cincinnati:3/5 | Erie:3/5 |
| Boston, MA | Liberty Mutual | MAPFRE:5/5 | MAPFRE:5/5 |
Outcome: 1 of 10 predictions matched. Gemini/Bloomington returned State Farm 5/5 — the only hit. Every other HQ city surfaced a different carrier, often with very high stability (Boston → MAPFRE 5/5 on both LLMs, Columbus → Cincinnati 5/5 Perplexity). The predicted HQ'd carrier is not the retrieval winner.
Replacement hypothesis forced by the data: regional-champion dominance. Carriers with dominant state-level market share (MAPFRE in MA, Cincinnati in OH, regional mutuals) win state-level queries, regardless of HQ address. Boston's HQ'd carrier (Liberty Mutual) loses to MAPFRE 5/5 on both LLMs because MAPFRE is the Massachusetts volume leader. This is a better, data-supported hypothesis.
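The 1-of-10 tally for H1 can be reproduced from the table above. This is a small sketch with the rows copied by hand; `score_predictions` is a hypothetical helper, and each (city, LLM) pair counts as one observation.

```python
# (city, predicted HQ carrier, Perplexity modal, Gemini modal), copied from the H1 table
H1_ROWS = [
    ("Bloomington, IL", "State Farm", "USAA", "State Farm"),
    ("Columbus, OH", "Nationwide", "Cincinnati", "Erie"),
    ("Hartford, CT", "Travelers", "MetLife", "State Farm"),
    ("Mayfield Village, OH", "Progressive", "Cincinnati", "Erie"),
    ("Boston, MA", "Liberty Mutual", "MAPFRE", "MAPFRE"),
]

def score_predictions(rows):
    """Return the (city, LLM) pairs whose modal carrier matched the prediction, plus the total."""
    hits, total = [], 0
    for city, predicted, perplexity, gemini in rows:
        for llm, observed in (("Perplexity", perplexity), ("Gemini", gemini)):
            total += 1
            if observed == predicted:
                hits.append((city, llm))
    return hits, total

hits, total = score_predictions(H1_ROWS)
print(f"{len(hits)} of {total} matched: {hits}")
# 1 of 10 matched: [('Bloomington, IL', 'Gemini')]
```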
H2 — Property-type ownership
✓ CONFIRMED

| Property type | Predicted | Perplexity modal | Gemini modal |
|---|---|---|---|
| condo | Nationwide (from ablation) | Nationwide:5/5 | Nationwide:5/5 |
| mobile home | manufactured-home specialist | American Modern:3/5 | American Family:2/5 |
| luxury home | Chubb / PURE | Chubb:5/5 | Chubb:5/5 |
| historic home | specialty (Chubb likely) | Chubb:5/5 | Chubb:5/5 |
| townhouse | unclear | State Farm:4/5 | Nationwide:4/5 |
Outcome: confirmed at overwhelming significance. Condo → Nationwide (5/5 + 5/5), luxury home → Chubb (5/5 + 5/5), historic home → Chubb (5/5 + 5/5). Three property types, each with a cross-LLM-unanimous owner. Under the null hypothesis (each run picks a carrier uniformly at random from ~50 candidates), the chance that one LLM names the same carrier on all 5 runs is (1/50)^4, so each 5/5 + 5/5 joint result has probability (1/50)^8 ≈ 1 in 39 trillion.
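One way to make the null model concrete is sketched below. It assumes exactly 50 equally likely carriers and fully independent runs; both are simplifications, since real LLM runs are neither uniform nor independent. The helper `p_unanimous` is illustrative.

```python
from fractions import Fraction

def p_unanimous(n_candidates, runs):
    """P(all `runs` uniform, independent picks name the same carrier, whichever it is)."""
    # The first pick is free; each later pick must match it.
    return Fraction(1, n_candidates) ** (runs - 1)

N = 50    # candidate pool size: the "~50" taken literally (assumption)
RUNS = 5

per_llm = p_unanimous(N, RUNS)              # one LLM is internally unanimous
both_llms = per_llm ** 2                    # both LLMs unanimous, carriers may differ
same_carrier = both_llms * Fraction(1, N)   # and the two LLMs name the same carrier

print(f"both unanimous:       1 in {both_llms.denominator:,}")     # 1 in 39,062,500,000,000
print(f"and on same carrier:  1 in {same_carrier.denominator:,}")  # 1 in 1,953,125,000,000,000
```

The result moves by orders of magnitude with the pool size and with how the event is defined, so it is best read as "far below any chance threshold" rather than as a precise figure.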
Sub-finding: Chubb wins both luxury AND historic at 5/5 on both LLMs. Chubb owns "premium / risk-elevated property" as a category, not just one property type. That's a cleaner formulation of the hypothesis.
Partial cases: mobile home surfaces a manufactured-home specialist on Perplexity (American Modern 3/5), but Gemini's mode is a weak American Family 2/5, and American Family counts as a generalist in the wildfire results. The category pattern is suggestive at best, and the specific carrier is noisy. Townhouse is a cross-LLM divergence (Perplexity → State Farm 4/5; Gemini → Nationwide 4/5): each LLM is internally stable, but they disagree.
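The three outcomes described for H2 (replicated, divergent, noisy) can be expressed as a simple rule. The sketch below assumes a stability threshold of 4 out of 5 runs; that cutoff and the `classify` helper are illustrative choices, not part of the study's method.

```python
def classify(perplexity, gemini, stable_at=4):
    """Each argument is (modal carrier, hits out of 5) for one LLM."""
    (p_name, p_hits), (g_name, g_hits) = perplexity, gemini
    both_stable = p_hits >= stable_at and g_hits >= stable_at
    if both_stable and p_name == g_name:
        return "cross-LLM replicated"
    if both_stable:
        return "cross-LLM divergence"
    return "noisy"

# Modal cells copied from the H2 table
H2_ROWS = {
    "condo": (("Nationwide", 5), ("Nationwide", 5)),
    "mobile home": (("American Modern", 3), ("American Family", 2)),
    "luxury home": (("Chubb", 5), ("Chubb", 5)),
    "historic home": (("Chubb", 5), ("Chubb", 5)),
    "townhouse": (("State Farm", 4), ("Nationwide", 4)),
}

labels = {prop: classify(p, g) for prop, (p, g) in H2_ROWS.items()}
for prop, label in labels.items():
    print(f"{prop}: {label}")
```

With this threshold, condo, luxury home, and historic home come out replicated, townhouse divergent, and mobile home noisy, which matches the reading in the text.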
H3 — Specialty-peril surfacing
~ PARTIAL

| Peril | Predicted | Perplexity modal | Gemini modal |
|---|---|---|---|
| flood | flood specialist | FirstMark:3/5 | Travelers:3/5 |
| wildfire | wildfire specialist | American Family:4/5 | Allstate:2/5 |
| high-value jewelry | Chubb / PURE | Chubb:4/5 | Chubb:4/5 |
Outcome: conditional. Only high-value jewelry replicates cleanly (Chubb 4/5 + 4/5, strong cross-LLM signal). Flood is split (Perplexity surfaces FirstMark, a specialty carrier, 3/5; Gemini surfaces Travelers, a generalist, 3/5). Wildfire surfaces generalists on both LLMs, so no specialist wins. The specialty-peril rule is not universal.
Refined hypothesis: specialty-carrier dominance holds where the specialist category has saturated editorial coverage (high-net-worth insurance via Chubb) and fails where editorial coverage defaults to generalists (wildfire, flood-on-Gemini). Category-level editorial depth, not peril specificity per se, drives the outcome.
The hypotheses that survive
After validation, three data-supported hypotheses remain — different from the originals:
- Property-type ownership is real and replicable. Carriers can cleanly own specific property-type queries (Nationwide = condo; Chubb = premium property). Cross-LLM 5/5 replication rules out chance.
- Regional-champion dominance beats HQ-city dominance. The carrier with the largest state-level volume wins state-level queries, regardless of HQ address. Examples: MAPFRE (MA), Cincinnati (OH regional presence), Tower Hill (FL, from the original ablation).
- Specialty-carrier surfacing is editorial-coverage-driven, not peril-specific. Where a specialty category has saturated editorial coverage (high-net-worth → Chubb), specialists win. Where it doesn't (wildfire), generalists win by default.
Actionable content-strategy briefs — built only on surviving hypotheses
For Nationwide: defend your 5/5 condo dominance. Measure monthly whether challengers (Travelers, Farmers) start surfacing in the condo query. Adjacent query shapes to claim next: townhouse (contested), multi-family, duplex. Your property-type editorial depth is an asset you can extend.
For Chubb: you own the "premium / elevated-risk property" cluster on two independent LLMs with 5/5 stability. Your natural expansion is into related specialty queries: art + collectibles insurance, private-client umbrella, high-value jewelry travel insurance. PURE and AIG Private Client are your direct challengers; they have less consumer-editorial surface currently.
For regional champions: the data shows regional-champion dominance is real. You already win state-level queries where your market share is dominant. Two investments: (1) audit your citation share in your top-3 states to confirm; (2) defend against national brands who are out-content-marketing you in city-level queries inside your states.
For national brands: the HQ-city effect is NOT a real defense. Your HQ city does not automatically surface you. Your defensible positions are national-brand-awareness queries (where you win on ad saturation) and property-type / demographic queries you explicitly invest in (first-time homebuyer for State Farm is a good example: the editorial surface is yours).
For specialty carriers: specialty-peril surfacing is conditional on editorial-category depth. Where your peril has saturated content (earthquake, high-net-worth), you win. Where it doesn't (wildfire, some flood contexts), you lose to generalists by default. The content investment is category-depth, not brand-awareness.
Method limitations
- Two LLMs in Stage 2, not the three used in Stage 1. Cross-LLM replication uses Perplexity + Gemini as two independent observations per query. Adding more LLMs would strengthen every replication finding.
- Single-day snapshot. Observations are from 2026-04-24. LLM retrieval indexes refresh. A 7-day retest is scheduled to check whether 5/5 findings are time-stable.
- One buyer intent tested here. This study tests home insurance specifically. The auto-insurance replication is now published at ablation-auto-insurance: vehicle-type ownership does NOT generalize cleanly from property-type, specialty-use-case ownership replicates but with generalists winning (inverted), and regional-champion dominance is partial. Life, health, and commercial remain untested.
- First-named-carrier is a proxy. A carrier named second or third in a response still gets some attention. Our analysis captures the rank-1 signal only. Richer analysis would weight the top-3 positions by order.
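The rank-weighted analysis mentioned in the last limitation could look like the sketch below. It assumes reciprocal-rank weights (1, 1/2, 1/3) for the top three positions, which is one common choice and not something the study specifies; the function name and the sample responses are hypothetical.

```python
from collections import defaultdict

def rank_weighted_share(responses, top_k=3):
    """Score carriers by reciprocal rank over the first top_k names in each response."""
    scores = defaultdict(float)
    for ranked in responses:
        for position, carrier in enumerate(ranked[:top_k], start=1):
            scores[carrier] += 1.0 / position
    total = sum(scores.values())
    if total == 0:
        return {}
    # Normalize to shares and order from highest to lowest
    ordered = sorted(scores.items(), key=lambda kv: -kv[1])
    return {carrier: score / total for carrier, score in ordered}

# Hypothetical ordered carrier lists from 5 runs of one query (not real probe output)
responses = [
    ["Chubb", "PURE", "AIG"],
    ["Chubb", "AIG", "PURE"],
    ["PURE", "Chubb", "AIG"],
    ["Chubb", "PURE"],
    ["Chubb", "PURE", "AIG"],
]
shares = rank_weighted_share(responses)
for carrier, share in shares.items():
    print(f"{carrier}: {share:.1%}")
```

In this made-up sample the rank-1 view would report the first carrier at 4/5 and say nothing else, while the weighted view also shows how much attention the second and third names collect.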
Raw data
All probe outputs are committed to the Phidea repository for audit:
- data/probe-ablation-home-insurance-2026-04-24.json: Stage 1 ablation runs
- data/probe-validation-hypotheses-2026-04-24.json: Stage 2 initial validation (1 run per query)
- data/probe-validation-v2-2026-04-24.json: Stage 2 variance validation (5 runs per query, the source of the 5/5 stability numbers in this piece)
Probe scripts: scripts/probe-ablation-home-insurance.mjs, scripts/probe-validation-hypotheses.mjs, scripts/probe-validation-v2-variance.mjs.
Related
- Auto insurance — does the home pattern generalize? — the replication study on auto: vehicle-type ownership refuted, specialty-use-case ownership confirmed but inverted.
- Distribution through LLMs — the umbrella