Most observations held. Three flipped. Gemini coverage was degraded — read the Perplexity column.
Re-ran the three multi-query probes (price-anchor wording, bundled policies, commercial cyber) on 2026-04-29, three days after the 2026-04-26 baseline. Same probe configs, same 5-runs-per-LLM cadence. The point of the retest: confirm whether the observations are time-stable enough to recommend acting on.
- Perplexity, where coverage was reliable: 12 of 15 queries kept the same modal carrier (some shifted only in count); 3 of 15 flipped carriers.
- The flips were specific: Seattle “cheapest” (State Farm → Kemper), Seattle “most reliable” (USAA → State Farm), and fintech-startup cyber (Chubb → Coalition). Each is a full carrier replacement at high stability (4/5 or 5/5 on both days).
- Gemini coverage was materially degraded today, returning 0-2 valid runs out of 5 across most queries (vs typical 3-5/5 in the baseline). Likely a temporary API / grounding issue, not a real instability signal. Flagged as method limitation; do not interpret Gemini drops as findings.
- Bundled policies were the most stable group on Perplexity: 5 of 5 queries held the same modal carrier cross-day (Newark NJ → Farmers strengthened from 2/5 to 4/5).
- Commercial cyber surfaced one new pattern: Coalition (a cyber-native insurtech) appears to be ascending in the fintech-startup cyber surface, displacing Chubb. The round-2 finding that “Chubb wins commercial cyber across verticals” needs softening for fintech specifically.
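The verdict columns compare each day's modal carrier: the carrier most often named first across the five runs, shown as `carrier:count/5`. The exact classification rule isn't stated here, so this is a minimal Python sketch under stated assumptions: each run is recorded as its first-named carrier, or `None` if the run was invalid, and a query counts as degraded coverage when either day has fewer than `MIN_VALID_RUNS = 3` valid runs. The threshold and the function names are hypothetical. A valid-run threshold (rather than a modal-count threshold) is assumed because Gemini's fintech-startup row is marked stable at only Coalition:2/5.

```python
from collections import Counter
from typing import Optional, Sequence

# Hypothetical threshold (not taken from the probe configs): a day with fewer
# valid runs than this is treated as degraded coverage, not as drift.
MIN_VALID_RUNS = 3


def modal_carrier(runs: Sequence[Optional[str]]) -> tuple[Optional[str], int]:
    """Return (most frequently first-named carrier, its count).

    Invalid runs are passed as None and ignored. Ties go to the carrier seen
    first, because Counter.most_common orders equal counts by first insertion.
    """
    counts = Counter(r for r in runs if r is not None)
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


def verdict(baseline: Sequence[Optional[str]], retest: Sequence[Optional[str]]) -> str:
    """Classify one query as 'stable', 'drifted', or 'degraded coverage'."""
    # Count valid (non-None) runs on each day before comparing carriers.
    valid = [sum(r is not None for r in day) for day in (baseline, retest)]
    if min(valid) < MIN_VALID_RUNS:
        return "degraded coverage"
    before, _ = modal_carrier(baseline)
    after, _ = modal_carrier(retest)
    return "stable" if before == after else "drifted"
```

For example, `verdict(["State Farm"] * 5, ["Kemper"] * 5)` returns `"drifted"`, matching the cheapest-seattle Perplexity row.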
Price anchor (auto, Seattle)
| Query | Perp 2026-04-26 | Perp 2026-04-29 | Perp verdict | Gem 2026-04-26 | Gem 2026-04-29 | Gem verdict |
|---|---|---|---|---|---|---|
| best-value-seattle: What's the best value car insurance in Seattle? | USAA:4/5 | USAA:4/5 | stable | State Farm:4/5 | —:0/5 | degraded coverage |
| cheapest-seattle: What's the cheapest car insurance in Seattle? | State Farm:5/5 | Kemper:5/5 | drifted | State Farm:5/5 | USAA:1/5 | degraded coverage |
| premium-seattle: What's the premium car insurance in Seattle? | USAA:4/5 | USAA:4/5 | stable | Chubb:4/5 | Chubb:1/5 | degraded coverage |
| top-rated-seattle: What's the top-rated car insurance in Seattle? | USAA:4/5 | USAA:5/5 | stable | USAA:3/5 | State Farm:2/5 | degraded coverage |
| most-reliable-seattle: What's the most reliable car insurance in Seattle? | USAA:5/5 | State Farm:4/5 | drifted | USAA:4/5 | State Farm:1/5 | degraded coverage |
cheapest-seattle. Full carrier replacement on Perplexity: State Farm 5/5 → Kemper 5/5, unanimous on both days. A strong signal that 'cheapest' is volatile across short windows.
most-reliable-seattle. Full carrier replacement on Perplexity. USAA was 5/5 originally; today State Farm 4/5. The 'most reliable' framing is retrieval-volatile.
Bundled policies
| Query | Perp 2026-04-26 | Perp 2026-04-29 | Perp verdict | Gem 2026-04-26 | Gem 2026-04-29 | Gem verdict |
|---|---|---|---|---|---|---|
| bundle-seattle: What's the best home and auto insurance bundle in Seattle? | State Farm:5/5 | State Farm:5/5 | stable | PEMCO:3/5 | State Farm:2/5 | degraded coverage |
| bundle-newark-nj: What's the best home and auto insurance bundle in Newark, New Jersey? | Farmers:2/5 | Farmers:4/5 | stable | NJM:3/5 | Farmers:1/5 | degraded coverage |
| bundle-boston: What's the best home and auto insurance bundle in Boston, Massachusetts? | Farmers:5/5 | Farmers:5/5 | stable | Farmers:5/5 | Farmers:2/5 | degraded coverage |
| bundle-atlanta: What's the best home and auto insurance bundle in Atlanta, Georgia? | Progressive:4/5 | Progressive:5/5 | stable | State Farm:2/5 | State Farm:1/5 | degraded coverage |
| bundle-la: What's the best home and auto insurance bundle in Los Angeles, California? | AAA:5/5 | AAA:5/5 | stable | AAA:4/5 | AAA:1/5 | degraded coverage |
bundle-newark-nj. Perplexity confirms Farmers wins NJ bundles, this time at 4/5 instead of 2/5, which is stronger evidence than the original. NJM does not appear to extend to bundles on Perplexity, although Gemini's 2026-04-26 baseline had NJM at 3/5.
bundle-boston. Boston bundle → Farmers replicates at 5/5 on Perplexity. Strong cross-day stability.
Commercial cyber
| Query | Perp 2026-04-26 | Perp 2026-04-29 | Perp verdict | Gem 2026-04-26 | Gem 2026-04-29 | Gem verdict |
|---|---|---|---|---|---|---|
| saas-50: What's the best cyber insurance for a 50-person SaaS startup? | Chubb:5/5 | Chubb:5/5 | stable | Hiscox:2/5 | —:0/5 | degraded coverage |
| law-firm-mid: What's the best cyber insurance for a mid-market law firm? | Chubb:5/5 | Chubb:5/5 | stable | Chubb:5/5 | Chubb:3/5 | stable |
| healthcare-smb: What's the best cyber insurance for a healthcare SMB? | Chubb:5/5 | Chubb:5/5 | stable | Chubb:3/5 | —:0/5 | degraded coverage |
| fintech-startup: What's the best cyber insurance for a fintech startup? | Chubb:4/5 | Coalition:4/5 | drifted | Coalition:3/5 | Coalition:2/5 | stable |
| manufacturer-200: What's the best cyber insurance for a manufacturer with 200 employees? | Chubb:5/5 | Chubb:4/5 | stable | Chubb:4/5 | Chubb:1/5 | degraded coverage |
law-firm-mid. Cleanest cross-LLM, cross-day stability in the dataset. Chubb owns law-firm cyber unambiguously.
fintech-startup. Material drift. Chubb owned fintech cyber on Perplexity three days ago (4/5); today Coalition wins at 4/5. Gemini already favored Coalition in the baseline (3/5), so Perplexity has moved toward Gemini's answer. Coalition (cyber-native insurtech) appears to be ascending in the fintech-vertical surface. The 'specialists lose to generalists' rule from round 2 may NOT hold for fintech specifically.
What this means for the registry
Three observations need their stability claims softened.
- 2026-04-auto-price-quality-anchor-wording-cheapest-seattle: was “State Farm 5/5 + 5/5” (Perplexity + Gemini); today Kemper 5/5. The “cheapest” lever is real, but the carrier assignment is volatile.
- 2026-04-auto-price-quality-anchor-wording-most-reliable-seattle: was “USAA 5/5 + 4/5” (Perplexity + Gemini); today State Farm. Same pattern.
- 2026-04-commercial-commercial-cyber-by-vertical-fintech-startup: was Chubb on Perplexity; today Coalition on both LLMs. The cross-vertical commercial-cyber claim doesn't hold for fintech specifically.
Twelve observations strengthened or held on Perplexity. Boston bundle → Farmers, LA bundle → AAA, law-firm cyber → Chubb, healthcare-SMB cyber → Chubb, manufacturer cyber → Chubb, and Seattle bundle → State Farm all kept the same modal carrier cross-day, at the same stability except manufacturer cyber, which slipped from 5/5 to 4/5. The first three should be considered high-confidence findings.
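The twelve held observations did not all hold at the same strength. A short Python tally over the Perplexity modal counts copied from the three tables separates them; the `strength_change` helper and its labels are illustrative, not part of the probe tooling.

```python
from collections import Counter

# Perplexity modal counts (out of 5) for the 12 queries whose modal carrier
# held, copied from the tables: (2026-04-26, 2026-04-29).
HELD = {
    "best-value-seattle": (4, 4),
    "premium-seattle": (4, 4),
    "top-rated-seattle": (4, 5),
    "bundle-seattle": (5, 5),
    "bundle-newark-nj": (2, 4),
    "bundle-boston": (5, 5),
    "bundle-atlanta": (4, 5),
    "bundle-la": (5, 5),
    "saas-50": (5, 5),
    "law-firm-mid": (5, 5),
    "healthcare-smb": (5, 5),
    "manufacturer-200": (5, 4),
}


def strength_change(before: int, after: int) -> str:
    """Label how a held observation's modal count moved between the two days."""
    if after > before:
        return "strengthened"
    if after < before:
        return "weakened"
    return "unchanged"


tally = Counter(strength_change(b, a) for b, a in HELD.values())
weakened = [q for q, (b, a) in HELD.items() if a < b]
print(dict(tally))  # {'unchanged': 8, 'strengthened': 3, 'weakened': 1}
print(weakened)     # ['manufacturer-200']
```

By this count, 11 of the 12 kept or improved their modal count, and manufacturer-200 is the only one that slipped.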
Method limitation: Gemini coverage today was unreliable. Across most of the 15 queries, Gemini returned 0-2 valid first-named-carrier results out of 5 attempts (vs. 3-5/5 in the baseline); only law-firm-mid and fintech-startup received a non-degraded Gemini verdict. The most likely explanation is an issue with Gemini's grounding tool or rate limiting today, not a real change in retrieval. We’re not interpreting the Gemini drops as findings; future retests with full Gemini coverage will give a cleaner cross-LLM picture.
Raw data
- data/probe-auto-price-quality-anchor-wording-2026-04-29.json
- data/probe-home-bundled-policy-prompt-2026-04-29.json
- data/probe-commercial-commercial-cyber-by-vertical-2026-04-29.json
- Compared against the 2026-04-26 versions of the same files in the same directory.
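The comparison against the 2026-04-26 files can be scripted. The JSON layout isn't documented in this post, so the Python sketch below assumes a hypothetical one: a top-level object keyed by LLM name (`"perplexity"`, `"gemini"`), each mapping query IDs to the list of first-named carriers for that day's runs, with `null` for invalid runs. It also assumes registry IDs follow the `2026-04-<probe>-<query-id>` pattern seen in the registry section, with `<probe>` taken from the file name. `registry_prefix`, `drifted_queries`, and `drifted_registry_ids` are hypothetical helpers.

```python
import json
import re
from collections import Counter
from pathlib import Path
from typing import Optional

# Matches file names like "probe-<probe>-2026-04-26.json"; the greedy <probe>
# group backtracks so the trailing date is captured separately.
FILE_RE = re.compile(r"probe-(?P<probe>.+)-(?P<date>\d{4}-\d{2}-\d{2})\.json$")


def modal(runs: list[Optional[str]]) -> Optional[str]:
    """Most frequently first-named carrier, ignoring invalid (None) runs."""
    counts = Counter(r for r in runs if r is not None)
    return counts.most_common(1)[0][0] if counts else None


def registry_prefix(file_name: str) -> str:
    """'probe-<probe>-2026-04-26.json' -> '2026-04-<probe>' (inferred convention)."""
    match = FILE_RE.search(file_name)
    if match is None:
        raise ValueError(f"unexpected probe file name: {file_name}")
    return f"{match.group('date')[:7]}-{match.group('probe')}"


def drifted_queries(old: dict, new: dict) -> list[str]:
    """Query IDs present on both days whose modal carrier changed."""
    return sorted(q for q in old.keys() & new.keys() if modal(old[q]) != modal(new[q]))


def drifted_registry_ids(old_path: Path, new_path: Path, llm: str = "perplexity") -> list[str]:
    """Registry IDs to soften; no coverage check is applied here."""
    old = json.loads(old_path.read_text())[llm]
    new = json.loads(new_path.read_text())[llm]
    # Registry IDs are assumed to carry the baseline file's year-month.
    prefix = registry_prefix(old_path.name)
    return [f"{prefix}-{q}" for q in drifted_queries(old, new)]
```

If the files match the assumed layout, calling `drifted_registry_ids` on the 2026-04-26 and 2026-04-29 commercial-cyber files should list the fintech-startup entry. Because the sketch compares modal carriers only, it is best pointed at Perplexity, as in this retest.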
Next time-stability retest: planned for 7 days out (2026-05-06) to confirm whether today’s drifts are persistent or ephemeral, assuming Gemini coverage has recovered by then.