phidea
Time-stability check · 2026-04-26 → 2026-04-29 · 5-day diff

Most observations held. Three flipped. Gemini coverage was degraded — read the Perplexity column.

Re-ran the three multi-query probes (price-anchor wording, bundled policies, commercial cyber) five days after the original observations. Same probe configs, same 5-runs-per-LLM cadence. The point of the retest: confirm whether the observations are time-stable enough to recommend acting on.
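For readers who want to reproduce the verdict columns, here is a minimal sketch of how a modal carrier and a cross-day verdict could be derived from five runs per day. The function names, the use of None for an invalid run, and the MIN_VALID_RUNS threshold are assumptions for illustration; the report does not state the exact cutoff used for "degraded coverage".

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# Assumed threshold: fewer valid runs than this counts as degraded coverage.
# The report does not state the actual cutoff.
MIN_VALID_RUNS = 3


def modal_carrier(runs: Sequence[Optional[str]]) -> Tuple[Optional[str], int, int]:
    """Return (modal carrier, its count, number of valid runs).

    Each entry is the first-named carrier for one run, or None if the run
    produced no valid result.
    """
    valid = [r for r in runs if r is not None]
    if not valid:
        return None, 0, 0
    carrier, count = Counter(valid).most_common(1)[0]
    return carrier, count, len(valid)


def verdict(baseline: Sequence[Optional[str]],
            retest: Sequence[Optional[str]],
            min_valid: int = MIN_VALID_RUNS) -> str:
    """Classify a query as 'stable', 'drifted', or 'degraded coverage'."""
    base_carrier, _, _ = modal_carrier(baseline)
    new_carrier, _, new_valid = modal_carrier(retest)
    if new_valid < min_valid:
        return "degraded coverage"
    return "stable" if new_carrier == base_carrier else "drifted"


# cheapest-seattle on Perplexity: State Farm 5/5, then Kemper 5/5.
print(verdict(["State Farm"] * 5, ["Kemper"] * 5))  # drifted
```

Checking coverage before comparing carriers keeps a low-coverage day from being read as a carrier flip, which matches how the Gemini column is treated in this report.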

TL;DR
  • Perplexity, where coverage was reliable: 12 of 15 queries returned the same modal carrier or shifted only in count. 3 of 15 flipped carriers.
  • The flips were specific. Seattle “cheapest” (State Farm → Kemper). Seattle “most reliable” (USAA → State Farm). Fintech-startup cyber (Chubb → Coalition). Each is a full carrier replacement at high stability.
  • Gemini coverage was materially degraded today, returning 0-2 valid runs out of 5 across most queries (vs typical 3-5/5 in the baseline). Likely a temporary API / grounding issue, not a real instability signal. Flagged as method limitation; do not interpret Gemini drops as findings.
  • Bundled policies were the most stable group on Perplexity: 5 of 5 queries held the same modal carrier cross-day (Newark NJ → Farmers strengthened from 2/5 to 4/5).
  • Commercial cyber surfaced one new pattern: Coalition (cyber-native insurtech) is ascending in the fintech-startup cyber surface, displacing Chubb. The round-2 finding that “Chubb wins commercial cyber across verticals” needs softening for fintech specifically.
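The 12-of-15 tally can be checked directly against the Perplexity columns of the three tables in this report. The snippet below is only an arithmetic check: the carrier pairs are copied from those tables, while the PERPLEXITY dictionary and variable names are introduced here for illustration.

```python
# (baseline modal carrier, retest modal carrier) on Perplexity,
# copied from the price-anchor, bundled-policy, and commercial-cyber tables.
PERPLEXITY = {
    "best-value-seattle": ("USAA", "USAA"),
    "cheapest-seattle": ("State Farm", "Kemper"),
    "premium-seattle": ("USAA", "USAA"),
    "top-rated-seattle": ("USAA", "USAA"),
    "most-reliable-seattle": ("USAA", "State Farm"),
    "bundle-seattle": ("State Farm", "State Farm"),
    "bundle-newark-nj": ("Farmers", "Farmers"),
    "bundle-boston": ("Farmers", "Farmers"),
    "bundle-atlanta": ("Progressive", "Progressive"),
    "bundle-la": ("AAA", "AAA"),
    "saas-50": ("Chubb", "Chubb"),
    "law-firm-mid": ("Chubb", "Chubb"),
    "healthcare-smb": ("Chubb", "Chubb"),
    "fintech-startup": ("Chubb", "Coalition"),
    "manufacturer-200": ("Chubb", "Chubb"),
}

# A query "flipped" when the modal carrier changed between the two days.
flipped = sorted(q for q, (before, after) in PERPLEXITY.items() if before != after)
held = len(PERPLEXITY) - len(flipped)

print(f"{held} of {len(PERPLEXITY)} held, {len(flipped)} flipped: {flipped}")
# 12 of 15 held, 3 flipped: ['cheapest-seattle', 'fintech-startup', 'most-reliable-seattle']
```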

Price anchor (auto, Seattle)

| Query | Prompt | Perp 2026-04-26 | Perp 2026-04-29 | Perp verdict | Gem 2026-04-26 | Gem 2026-04-29 | Gem verdict |
|---|---|---|---|---|---|---|---|
| best-value-seattle | What's the best value car insurance in Seattle? | USAA 4/5 | USAA 4/5 | stable | State Farm 4/5 | (none) 0/5 | degraded coverage |
| cheapest-seattle | What's the cheapest car insurance in Seattle? | State Farm 5/5 | Kemper 5/5 | drifted | State Farm 5/5 | USAA 1/5 | degraded coverage |
| premium-seattle | What's the premium car insurance in Seattle? | USAA 4/5 | USAA 4/5 | stable | Chubb 4/5 | Chubb 1/5 | degraded coverage |
| top-rated-seattle | What's the top-rated car insurance in Seattle? | USAA 4/5 | USAA 5/5 | stable | USAA 3/5 | State Farm 2/5 | degraded coverage |
| most-reliable-seattle | What's the most reliable car insurance in Seattle? | USAA 5/5 | State Farm 4/5 | drifted | USAA 4/5 | State Farm 1/5 | degraded coverage |

cheapest-seattle. Full carrier replacement on Perplexity. State Farm → Kemper at maximum stability on both ends. Strong signal that 'cheapest' is volatile across short windows.

most-reliable-seattle. Full carrier replacement on Perplexity. USAA was 5/5 originally; today State Farm 4/5. The 'most reliable' framing is retrieval-volatile.

Bundled policies

| Query | Prompt | Perp 2026-04-26 | Perp 2026-04-29 | Perp verdict | Gem 2026-04-26 | Gem 2026-04-29 | Gem verdict |
|---|---|---|---|---|---|---|---|
| bundle-seattle | What's the best home and auto insurance bundle in Seattle? | State Farm 5/5 | State Farm 5/5 | stable | PEMCO 3/5 | State Farm 2/5 | degraded coverage |
| bundle-newark-nj | What's the best home and auto insurance bundle in Newark, New Jersey? | Farmers 2/5 | Farmers 4/5 | stable | NJM 3/5 | Farmers 1/5 | degraded coverage |
| bundle-boston | What's the best home and auto insurance bundle in Boston, Massachusetts? | Farmers 5/5 | Farmers 5/5 | stable | Farmers 5/5 | Farmers 2/5 | degraded coverage |
| bundle-atlanta | What's the best home and auto insurance bundle in Atlanta, Georgia? | Progressive 4/5 | Progressive 5/5 | stable | State Farm 2/5 | State Farm 1/5 | degraded coverage |
| bundle-la | What's the best home and auto insurance bundle in Los Angeles, California? | AAA 5/5 | AAA 5/5 | stable | AAA 4/5 | AAA 1/5 | degraded coverage |

bundle-newark-nj. Perplexity confirms Farmers wins NJ bundles, this time at 4/5 instead of 2/5 — stronger evidence than the original. NJM does NOT extend to bundles.

bundle-boston. Boston bundle → Farmers replicates at 5/5 on Perplexity. Strong cross-day stability.

Commercial cyber

| Query | Prompt | Perp 2026-04-26 | Perp 2026-04-29 | Perp verdict | Gem 2026-04-26 | Gem 2026-04-29 | Gem verdict |
|---|---|---|---|---|---|---|---|
| saas-50 | What's the best cyber insurance for a 50-person SaaS startup? | Chubb 5/5 | Chubb 5/5 | stable | Hiscox 2/5 | (none) 0/5 | degraded coverage |
| law-firm-mid | What's the best cyber insurance for a mid-market law firm? | Chubb 5/5 | Chubb 5/5 | stable | Chubb 5/5 | Chubb 3/5 | stable |
| healthcare-smb | What's the best cyber insurance for a healthcare SMB? | Chubb 5/5 | Chubb 5/5 | stable | Chubb 3/5 | (none) 0/5 | degraded coverage |
| fintech-startup | What's the best cyber insurance for a fintech startup? | Chubb 4/5 | Coalition 4/5 | drifted | Coalition 3/5 | Coalition 2/5 | stable |
| manufacturer-200 | What's the best cyber insurance for a manufacturer with 200 employees? | Chubb 5/5 | Chubb 4/5 | stable | Chubb 4/5 | Chubb 1/5 | degraded coverage |

law-firm-mid. Cleanest cross-LLM, cross-day stability in the dataset. Chubb owns law-firm cyber unambiguously.

fintech-startup. Material drift. Chubb owned fintech cyber on Perplexity 5 days ago; today Coalition wins. Coalition (cyber-native insurtech) appears to be ascending in the fintech-vertical surface. The 'specialists lose to generalists' rule from round 2 may NOT hold for fintech specifically.

What this means for the registry

Three observations need their stability claims softened.

  • 2026-04-auto-price-quality-anchor-wording-cheapest-seattle — was “State Farm 5/5 + 5/5”; today Kemper 5/5. The “cheapest” lever is real but the carrier assignment is volatile.
  • 2026-04-auto-price-quality-anchor-wording-most-reliable-seattle — was “USAA 5/5 + 4/5”; today State Farm. Same pattern.
  • 2026-04-commercial-commercial-cyber-by-vertical-fintech-startup — was Chubb on Perplexity; today Coalition on both LLMs. The cross-vertical commercial-cyber claim doesn't hold for fintech specifically.

Twelve observations strengthened or held. Boston bundle → Farmers, LA bundle → AAA, law-firm cyber → Chubb, healthcare-SMB cyber → Chubb, manufacturer cyber → Chubb, Seattle bundle → State Farm: all kept the same modal carrier on Perplexity cross-day, at the same or stronger stability except manufacturer cyber, which slipped from 5/5 to 4/5. The first three should be considered high-confidence findings.

Method limitation: Gemini coverage today was unreliable. Across most of the 15 queries, Gemini returned 0-2 valid first-named-carrier results out of 5 attempts (vs 3-5/5 in the baseline); law-firm cyber, at Chubb 3/5, is a visible exception. The most likely explanation is an issue with Gemini's grounding tool or rate limits today, not a real change in retrieval. We’re not interpreting the Gemini drops as findings; future retests with full Gemini coverage will give a cleaner cross-LLM picture.

Raw data

  • data/probe-auto-price-quality-anchor-wording-2026-04-29.json
  • data/probe-home-bundled-policy-prompt-2026-04-29.json
  • data/probe-commercial-commercial-cyber-by-vertical-2026-04-29.json
  • Compared against the 2026-04-26 versions of the same files in the same directory.
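
A minimal sketch of how a retest file could be paired with its baseline and diffed, assuming the two files differ only by the date in the file name. The JSON layout used here (one object keyed by query id) is hypothetical, since the schema of the probe files is not described in this report, and the function names are illustrative.

```python
import json
from pathlib import Path

RETEST_DATE = "2026-04-29"
BASELINE_DATE = "2026-04-26"


def baseline_path(retest_path: Path) -> Path:
    """Swap the retest date for the baseline date in the file name."""
    name = retest_path.name.replace(RETEST_DATE, BASELINE_DATE)
    return retest_path.with_name(name)


def changed_queries(retest_path: Path) -> dict:
    """Return {query_id: (baseline_value, retest_value)} for entries that differ.

    Assumes each file is a JSON object keyed by query id (hypothetical schema).
    """
    retest = json.loads(retest_path.read_text(encoding="utf-8"))
    baseline = json.loads(baseline_path(retest_path).read_text(encoding="utf-8"))
    return {
        qid: (baseline.get(qid), value)
        for qid, value in retest.items()
        if baseline.get(qid) != value
    }
```

If the real files nest results per LLM or per run, the comparison inside changed_queries would need to be adapted to that structure.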

Next time-stability retest: planned for 7 days out (2026-05-06) to confirm whether today’s drifts are persistent or ephemeral, with full Gemini coverage assumed.