91. Blog Stale Stats Audit (V12/V13.x pollution)

Date: 2026-05-20
Scope: BlogArticle table, status='published' (104 rows, 52 EN + 52 FR)
Tool: scripts/_audit-blog-stale.ts (read-only)
Raw dump: /tmp/blog-stale-audit.json

TL;DR

37 of 104 published articles still cite stale V12/V13.x backtest literals as truth.
Split by locale: 30 EN + 7 FR. The FR locale is almost untouched because the FR copies of these articles paraphrase without the numbers.
Dominant offender: the triplet (mean 90-day return -1.70 percent, win rate 41.82 percent, Sharpe -0.08) lifted from the V12 aggregate cohort. It appears verbatim in 28 EN articles, often twice.
34 of 37 cite the numbers with zero disclaimer (category A). 0 carry an "outdated/superseded by V13.5" wrapper (B). 3 are methodological and the citation could survive as historical context if framed (C).
No FR article was misclassified into C. The 7 FR hits are 1 or 2 literal numbers, not paragraph-level stat dumps.

Pattern frequency

pattern	hits
`Sharpe -0.08` (with "Sharpe" within 60 chars on same line)	39
literal `-1.70` (not part of a longer number)	39
literal `-0.08` (not part of a longer number)	39
literal `14.1` (the AUM/CAGR mismatch flagged previously)	18
Sharpe 1.15 / 1.08 / 0.70, CAGR 25.4 / 43.2, DSR 0.33 / 0.53, hit rate 48-62 percent	0
`maxDD -7.x`, `deflated negative` co-located with Sharpe	0

Note: the V13.4 grid bake-off literals (Sharpe 1.15, CAGR 25.4, DSR 0.33) are not present in published blog copy. The infection vector is narrower than feared: it is the V12 universe cohort triplet plus one stray 14.1 literal in 18 articles.
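
The two pattern rules in the table ("not part of a longer number" and "Sharpe within 60 chars on same line") can be sketched as below. This is a hypothetical reconstruction, not the contents of scripts/_audit-blog-stale.ts: the function names are invented for illustration, and "within 60 chars" is assumed to mean 60 characters on either side of the literal.

```typescript
// Hypothetical reconstruction of the audit patterns; the real script may differ.

// Build a regex that matches `literal` only as a standalone number:
// not preceded by a digit or dot, not followed by a digit or ".digit".
function standaloneLiteral(literal: string): RegExp {
  const escaped = literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`(?<![\\d.])${escaped}(?!\\d|\\.\\d)`, "g");
}

// Count standalone occurrences of a literal anywhere in the text.
function countStandalone(text: string, literal: string): number {
  const found = text.match(standaloneLiteral(literal));
  return found ? found.length : 0;
}

// Count occurrences of a literal that have `keyword` within `window`
// characters on the same line (assumed: window applies on both sides).
function countNearKeyword(
  text: string,
  literal: string,
  keyword: string,
  window = 60,
): number {
  const needle = keyword.toLowerCase();
  let hits = 0;
  for (const line of text.split("\n")) {
    const re = standaloneLiteral(literal);
    let m: RegExpExecArray | null;
    while ((m = re.exec(line)) !== null) {
      const start = Math.max(0, m.index - window);
      const end = Math.min(line.length, m.index + literal.length + window);
      if (line.slice(start, end).toLowerCase().indexOf(needle) !== -1) hits++;
    }
  }
  return hits;
}
```

The lookbehind and lookahead keep `-1.70` from matching inside `-1.705`, and `14.1` from matching inside `114.1` or `14.12`, while still allowing a sentence-ending period after the literal.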

Counts

Scanned: 104 articles (52 EN + 52 FR, status='published')
Hit at least once: 37 (35.6 percent)
Category A (cited as fact, no disclaimer): 34
Category B (cited with disclaimer / "superseded by V13.5"): 0
Category C (methodological / version-history piece where citation is contextually OK): 3
Hit by locale: EN 30 / 52 (57.7 percent), FR 7 / 52 (13.5 percent)

Top 20 worst offenders

(ranking by number of stale-literal matches across intro + descriptionMeta + bodyMdx)

hits	cat	locale	slug
7	A	en	survivorship-bias-in-insider-studies-how-we-corrected-ours
7	A	en	walk-forward-validation-2022-2024-train-2025-2026-test
7	A	en	ceo-vs-cfo-buys-which-signal-wins-on-t90
6	A	en	liquidity-adjusted-scoring-why-we-cap-trades-greatereuro1b
6	A	en	do-women-insiders-outperform-men-cross-market-gender-t90-analysis
6	A	en	multiple-testing-inflation-583k-grid-search-trials-how-to-deflate
6	C	en	the-composite-score-audit-why-we-use-30352510-weights
6	A	en	cluster-trades-revisiting-cohen-malloy-pomorski-on-17-markets
6	A	en	earnings-proximity-premium-buys-14-days-before-earnings
6	A	en	sharpe-vs-dsr-how-to-read-backtest-claims-with-our-published-numbers
5	C	en	currency-normalized-comparisons-why-we-convert-everything-to-eur
5	A	en	from-filing-to-price-impact-t0-to-t30-daily-microstructure
4	A	en	insider-holding-period-reality-when-do-insiders-themselves-exit
4	A	en	beta-neutral-insider-portfolios-long-signal-short-index
3	A	en	insider-trades-and-etf-flows-do-they-precede-sector-etf-rebalances
3	A	en	why-swiss-insiders-are-anonymous-art-56-sesta-privacy-clash
3	A	en	amf-bdif-a-10-year-openness-experiment-what-we-learned
3	A	en	tax-loss-patterns-month-end-year-end-clustering-by-jurisdiction
3	A	en	sec-section-16-vs-mar-structural-differences-and-arbitrage-windows
3	C	en	the-3-business-day-rule-enforcement-asymmetry-across-regulators

The remaining 17 flagged articles (EN tail + FR) are listed in /tmp/blog-stale-audit.json.
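
The ranking rule (number of stale-literal matches across intro + descriptionMeta + bodyMdx) can be sketched as follows. The `ArticleFields` shape and the function names are assumptions for illustration; only the three field names come from this audit.

```typescript
// Hypothetical shape: only intro, descriptionMeta and bodyMdx are named in the audit.
interface ArticleFields {
  slug: string;
  locale: "en" | "fr";
  intro: string;
  descriptionMeta: string;
  bodyMdx: string;
}

interface Offender {
  slug: string;
  locale: "en" | "fr";
  hits: number;
}

// Sum matches of every pattern in the text. Each pattern is re-created with
// the "g" flag so that all occurrences are counted and no lastIndex leaks.
function countMatches(text: string, patterns: RegExp[]): number {
  let total = 0;
  for (const p of patterns) {
    const flags = p.flags.indexOf("g") === -1 ? p.flags + "g" : p.flags;
    const found = text.match(new RegExp(p.source, flags));
    total += found ? found.length : 0;
  }
  return total;
}

// Rank articles by total matches across the three scanned fields,
// dropping clean articles and keeping the topN worst.
function rankOffenders(
  articles: ArticleFields[],
  patterns: RegExp[],
  topN = 20,
): Offender[] {
  return articles
    .map((a) => ({
      slug: a.slug,
      locale: a.locale,
      hits: countMatches(
        [a.intro, a.descriptionMeta, a.bodyMdx].join("\n"),
        patterns,
      ),
    }))
    .filter((r) => r.hits > 0)
    .sort((x, y) => y.hits - x.hits)
    .slice(0, topN);
}
```

Ties keep their input order here, so the order of equal-hit rows in the table above should not be read as a severity ranking under this sketch.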

Sample contexts (top 5)

All five samples are paragraph-level copies of the same V12 aggregate snippet:

[survivorship-bias] ...mean 90-day return of -1.70 percent and a win rate of 41.82 percent.
                    The annualised Sharpe ratio stands at -0.08, mirroring the aggregate
                    findings from the broader universe of 227,594 BUY filings...

[walk-forward]      ...mean 90-day return stands at -1.70 percent, accompanied by a win
                    rate of 41.82 percent. The annualized Sharpe ratio is -0.08...

[ceo-vs-cfo]        ...the mean 90-day return stands at -1.70 percent, with a win rate of
                    only 41.82 percent. The annualized Sharpe ratio is notably low at -0.08...

[liquidity-adj]     ...mean 90-day return stands at -1.70 percent, accompanied by a win
                    rate of 41.82 percent and an annualised Sharpe ratio of -0.08...

[do-women]          ...mean 90-day return of -1.70 percent with a win rate of 41.82 percent
                    and an annualised Sharpe ratio of -0.08. This performance is consistent
                    with the global universe context, which shows 227,594 BUY filings...

The phrasing varies cosmetically, but the triplet -1.70 / 41.82 / -0.08 is identical across all 28 EN articles that carry it. This points to a single shared LLM-generated paragraph that was templated into the generation pipeline.

Root cause

The blog generation prompt (V12-era) was seeded with the universe-aggregate cohort summary as a "factual anchor" and the model dutifully recycled it into every long-form article regardless of the actual subject. Once V13.x replaced V12 universe stats, the prompt was not re-grounded, so every fresh batch inherited the same dead triplet.

Remediation options

Option A. Sed-style mass substitute (4 hours, low risk)

Replace the dead literals with V13.5-current values inside bodyMdx for the 34 category-A articles. Pros: fast, deterministic, preserves SEO URLs + indexed copy. Cons: keeps the same paragraph structure and does not fix the deeper "every article cites universe aggregate regardless of topic" anti-pattern. The result will not pass a careful human read, because the substituted universe-level numbers still will not match each article's claimed subject.

Required mappings (need confirmation from V13.5 backfill):

-1.70 percent  -> <V13.5 universe mean 90d return>
41.82 percent  -> <V13.5 universe win rate>
-0.08 (Sharpe) -> <V13.5 universe Sharpe>
14.1 (mentions in 18 articles) -> <intended figure>

Action: produce a scripts/_replace-blog-literals.ts that does the substitution in one transaction with a dry-run flag.
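
A minimal sketch of the substitution core with a dry-run switch is shown below. It is an assumption-laden outline, not the proposed scripts/_replace-blog-literals.ts: the type and function names are hypothetical, the database transaction is represented only by a `write` callback, and the replacement values are left to the caller because the V13.5 mappings above are still unconfirmed.

```typescript
// Hypothetical types; replacement values must come from the confirmed V13.5 mappings.
interface LiteralMapping {
  stale: string;
  replacement: string;
}

interface ReplacementPlan {
  slug: string;
  count: number;
  after: string;
}

// Match a literal only when it is not part of a longer number.
function standaloneLiteral(literal: string): RegExp {
  const escaped = literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`(?<![\\d.])${escaped}(?!\\d|\\.\\d)`, "g");
}

// Compute the rewritten body and the number of substitutions, without side effects.
function planReplacement(
  slug: string,
  bodyMdx: string,
  mappings: LiteralMapping[],
): ReplacementPlan {
  let after = bodyMdx;
  let count = 0;
  for (const m of mappings) {
    // A replacer function avoids "$" being treated as a special pattern.
    after = after.replace(standaloneLiteral(m.stale), () => {
      count++;
      return m.replacement;
    });
  }
  return { slug, count, after };
}

// Report every planned change; call `write` only when dryRun is false.
// In a real script `write` would run inside a single DB transaction (assumption).
function applyPlans(
  plans: ReplacementPlan[],
  dryRun: boolean,
  write: (slug: string, body: string) => void,
): string[] {
  const log: string[] = [];
  for (const p of plans) {
    if (p.count === 0) continue;
    log.push(`${dryRun ? "[dry-run] " : ""}${p.slug}: ${p.count} replacement(s)`);
    if (!dryRun) write(p.slug, p.after);
  }
  return log;
}
```

Two caveats on this sketch: mappings are applied sequentially, so a replacement value that equals another stale literal would be rewritten again, and the `-0.08` mapping is applied to any standalone `-0.08`, not only to those near "Sharpe".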

Option B. Regen 34 category-A articles with new prompt referencing V13.5 (2-3 days)

Update the article-generation prompt to (1) inject the V13.5 cohort that actually matches the article's filter (small-cap, CEO, post-earnings, etc.) instead of one global aggregate, and (2) explicitly forbid the V12 literals. Regenerate the 34 category-A articles (27 EN plus the 7 FR ones, for parity). Keep slugs and publishedAt to preserve SEO.

Pros: real fix, articles become topic-coherent. Cons: cost (LLM tokens + review time), risk of new hallucinations, requires human pass.

Option C. Archive + republish (1 week)

Move 34 articles to status='archived' (404 with redirect to topic hub), draft fresh pieces with V13.5 stats from scratch, publish under new slugs. Pros: cleanest editorial reset. Cons: loses inbound links + indexed positions, longest path.

Recommendation

Hybrid: Option B as the real fix, with Option A as a Day 1 stopgap:

Day 1: pull V13.5 universe + per-subset cohort stats into a JSON registry (data/v13_5_cohorts.json). Source of truth for any blog citation.
Day 1: ship Option A as a stopgap on the top-10 worst offenders only, with a caveat banner in the MDX: "Stats refreshed 2026-05-20, V13.5 universe." This buys time and removes the worst stale claims from prod.
Day 2-4: regenerate the remaining 27 flagged articles (37 minus the Day 1 top 10) with the new prompt, which pulls from v13_5_cohorts.json keyed by article subject, except where the notes below assign a different treatment (category C, FR). Human review pass on each. Republish in batches of 5.
Day 5: re-run scripts/_audit-blog-stale.ts and confirm 0 hits remaining. Add the script to CI as a regression guard.
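
The registry-keyed grounding described in the plan above could look roughly like this. The JSON schema of data/v13_5_cohorts.json is not defined in this audit, so the `CohortStats` fields, the subject keys, and the prompt wording are all assumptions.

```typescript
// Hypothetical schema for one entry of data/v13_5_cohorts.json.
interface CohortStats {
  n: number;
  mean90dReturnPct: number;
  winRatePct: number;
  sharpe: number;
}

// Registry keyed by article subject (e.g. a cohort filter name).
type CohortRegistry = { [subject: string]: CohortStats };

// Build the factual-anchor block for one article. Throws instead of falling
// back to a universe aggregate, so a missing cohort cannot silently recreate
// the "one global paragraph in every article" problem.
function buildGrounding(
  registry: CohortRegistry,
  subject: string,
  bannedLiterals: string[],
): string {
  if (!Object.prototype.hasOwnProperty.call(registry, subject)) {
    throw new Error(`No V13.5 cohort registered for subject "${subject}"`);
  }
  const c = registry[subject];
  return [
    `Cohort: ${subject} (n=${c.n})`,
    `Mean 90-day return: ${c.mean90dReturnPct} percent`,
    `Win rate: ${c.winRatePct} percent`,
    `Annualised Sharpe: ${c.sharpe}`,
    `Cite only the figures above. Never cite these superseded literals: ${bannedLiterals.join(", ")}`,
  ].join("\n");
}
```

Failing loudly on an unknown subject is a deliberate choice in this sketch: it forces each regenerated article to be mapped to a cohort before the prompt is built.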

Also recommended:

Add the same audit logic to a pre-publish hook on BlogArticle so a future regen cannot reintroduce these literals.
Strip the 3 category-C articles' V12 numbers and replace them with explicit "V12 vs V13.5" comparison tables (the version-history framing is fine, the silent citation is not).
For the 7 FR hits, mostly low-literal-count, prefer Option A (1 or 2 sed substitutions per article) over full regen.
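
A pre-publish guard along the lines recommended above might be sketched as follows. The `Draft` shape, the function names, and the way the hook is wired into BlogArticle are assumptions; the default banned list is the V12 triplet from this audit, with `14.1` left out of the default because a bare 14.1 is more likely to occur legitimately.

```typescript
// Hypothetical draft shape, limited to the three fields the audit scans.
interface Draft {
  slug: string;
  intro: string;
  descriptionMeta: string;
  bodyMdx: string;
}

// V12 triplet from this audit; extend as needed.
const DEFAULT_BANNED = ["-1.70", "41.82", "-0.08"];

// Match a literal only when it is not part of a longer number.
function standaloneLiteral(literal: string): RegExp {
  const escaped = literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`(?<![\\d.])${escaped}(?!\\d|\\.\\d)`);
}

// Return one "field: literal" entry per banned literal found in each field.
function findStaleLiterals(
  draft: Draft,
  banned: string[] = DEFAULT_BANNED,
): string[] {
  const fields: Array<[string, string]> = [
    ["intro", draft.intro],
    ["descriptionMeta", draft.descriptionMeta],
    ["bodyMdx", draft.bodyMdx],
  ];
  const violations: string[] = [];
  for (const [name, text] of fields) {
    for (const literal of banned) {
      if (standaloneLiteral(literal).test(text)) {
        violations.push(`${name}: ${literal}`);
      }
    }
  }
  return violations;
}

// Intended to be called by the publish path; throws to block publication.
function assertPublishable(draft: Draft, banned: string[] = DEFAULT_BANNED): void {
  const violations = findStaleLiterals(draft, banned);
  if (violations.length > 0) {
    throw new Error(
      `Refusing to publish ${draft.slug}: stale literals found (${violations.join("; ")})`,
    );
  }
}
```

This sketch blocks every occurrence, so the category-C articles would need either an allowlist or their "V12 vs V13.5" tables excluded from the scan.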

Files

scripts/_audit-blog-stale.ts: audit script (read-only)
/tmp/blog-stale-audit.json: full result dump (per-article matches, contexts, classification)
docs/method-review/91-blog-stale-stats-audit-2026-05-20.md: this doc

No DB modifications were made

The audit is read-only: no UPDATE or DELETE was issued. Remediation requires an explicit go-ahead.