91. Blog Stale Stats Audit (V12/V13.x pollution)
Date: 2026-05-20
Scope: BlogArticle table, status='published' (104 rows, 52 EN + 52 FR).
Tool: scripts/_audit-blog-stale.ts (read-only).
Raw dump: /tmp/blog-stale-audit.json.
TL;DR
- 37 of 104 published articles still cite stale V12/V13.x backtest literals as truth.
- 30 EN + 7 FR. FR locale almost untouched (the FR copy of these articles paraphrases without the numbers).
- Dominant offender: the triplet "mean 90-day return -1.70 percent, win rate 41.82 percent, Sharpe -0.08", lifted from the V12 aggregate cohort. It appears verbatim in 28 EN articles, often twice.
- 34 of 37 cite the numbers with zero disclaimer (category A). 0 carry an "outdated/superseded by V13.5" wrapper (B). 3 are methodological and the citation could survive as historical context if framed (C).
- No FR article falls into category C. The 7 FR hits each contain only 1 or 2 literal numbers, not paragraph-level stat dumps.
Pattern frequency
| pattern | hits |
|---|---|
| Sharpe -0.08 (with "Sharpe" within 60 chars on same line) | 39 |
| literal -1.70 (not part of a longer number) | 39 |
| literal -0.08 (not part of a longer number) | 39 |
| literal 14.1 (the AUM/CAGR mismatch flagged previously) | 18 |
| Sharpe 1.15 / 1.08 / 0.70, CAGR 25.4 / 43.2, DSR 0.33 / 0.53, hit rate 48-62 percent | 0 |
| maxDD -7.x, deflated negative co-located with Sharpe | 0 |
Note: the V13.4 grid bake-off literals (Sharpe 1.15, CAGR 25.4, DSR 0.33) are not present in published blog copy. The infection vector is narrower than feared: it is the V12 universe cohort triplet plus one stray 14.1 literal in 18 articles.
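The two matching rules in the table ("not part of a longer number" and "Sharpe within 60 chars on same line") can be sketched as follows. This is a minimal illustration and not the contents of scripts/_audit-blog-stale.ts; the function names `standaloneLiteral`, `countStandalone` and `countNearKeyword` are hypothetical, and the sketch assumes the articles use an ASCII hyphen for the minus sign.

```typescript
// Hypothetical matchers; the actual audit script may differ.

// Build a regex that matches `literal` only when no digit or decimal point
// precedes it and no digit follows it, so "-1.70" matches but "-1.705" does not.
function standaloneLiteral(literal: string): RegExp {
  const escaped = literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`(?<![\\d.])${escaped}(?!\\d)`, "g");
}

// Count standalone occurrences of `literal` anywhere in `text`.
function countStandalone(text: string, literal: string): number {
  return (text.match(standaloneLiteral(literal)) ?? []).length;
}

// Count occurrences of `literal` that have `keyword` within `windowChars`
// characters on either side, on the same line (case-insensitive).
function countNearKeyword(
  text: string,
  literal: string,
  keyword: string,
  windowChars = 60,
): number {
  const needle = keyword.toLowerCase();
  let hits = 0;
  for (const line of text.split("\n")) {
    const pattern = standaloneLiteral(literal);
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(line)) !== null) {
      const start = Math.max(0, match.index - windowChars);
      const end = match.index + match[0].length + windowChars;
      if (line.slice(start, end).toLowerCase().includes(needle)) hits += 1;
    }
  }
  return hits;
}
```

Splitting on newlines before matching is what enforces the "same line" condition; a single multi-line regex would need extra care to avoid crossing line breaks.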
Counts
- Scanned: 104 articles (52 EN + 52 FR, status='published')
- Hit at least once: 37 (35.6 percent)
- Category A (cited as fact, no disclaimer): 34
- Category B (cited with disclaimer / "superseded by V13.5"): 0
- Category C (methodological / version-history piece where citation is contextually OK): 3
- Hit by locale: EN 30 / 52 (57.7 percent), FR 7 / 52 (13.5 percent)
Top 20 worst offenders
(ranked by number of stale-literal matches across intro + descriptionMeta + bodyMdx)
| hits | cat | locale | slug |
|---|---|---|---|
| 7 | A | en | survivorship-bias-in-insider-studies-how-we-corrected-ours |
| 7 | A | en | walk-forward-validation-2022-2024-train-2025-2026-test |
| 7 | A | en | ceo-vs-cfo-buys-which-signal-wins-on-t90 |
| 6 | A | en | liquidity-adjusted-scoring-why-we-cap-trades-greatereuro1b |
| 6 | A | en | do-women-insiders-outperform-men-cross-market-gender-t90-analysis |
| 6 | A | en | multiple-testing-inflation-583k-grid-search-trials-how-to-deflate |
| 6 | C | en | the-composite-score-audit-why-we-use-30352510-weights |
| 6 | A | en | cluster-trades-revisiting-cohen-malloy-pomorski-on-17-markets |
| 6 | A | en | earnings-proximity-premium-buys-14-days-before-earnings |
| 6 | A | en | sharpe-vs-dsr-how-to-read-backtest-claims-with-our-published-numbers |
| 5 | C | en | currency-normalized-comparisons-why-we-convert-everything-to-eur |
| 5 | A | en | from-filing-to-price-impact-t0-to-t30-daily-microstructure |
| 4 | A | en | insider-holding-period-reality-when-do-insiders-themselves-exit |
| 4 | A | en | beta-neutral-insider-portfolios-long-signal-short-index |
| 3 | A | en | insider-trades-and-etf-flows-do-they-precede-sector-etf-rebalances |
| 3 | A | en | why-swiss-insiders-are-anonymous-art-56-sesta-privacy-clash |
| 3 | A | en | amf-bdif-a-10-year-openness-experiment-what-we-learned |
| 3 | A | en | tax-loss-patterns-month-end-year-end-clustering-by-jurisdiction |
| 3 | A | en | sec-section-16-vs-mar-structural-differences-and-arbitrage-windows |
| 3 | C | en | the-3-business-day-rule-enforcement-asymmetry-across-regulators |
Remaining 17 hits (EN tail + FR): see /tmp/blog-stale-audit.json for the full list.
Sample contexts (top 5)
All five samples are paragraph-level copies of the same V12 aggregate snippet:
[survivorship-bias] ...mean 90-day return of -1.70 percent and a win rate of 41.82 percent.
The annualised Sharpe ratio stands at -0.08, mirroring the aggregate
findings from the broader universe of 227,594 BUY filings...
[walk-forward] ...mean 90-day return stands at -1.70 percent, accompanied by a win
rate of 41.82 percent. The annualized Sharpe ratio is -0.08...
[ceo-vs-cfo] ...the mean 90-day return stands at -1.70 percent, with a win rate of
only 41.82 percent. The annualized Sharpe ratio is notably low at -0.08...
[liquidity-adj] ...mean 90-day return stands at -1.70 percent, accompanied by a win
rate of 41.82 percent and an annualised Sharpe ratio of -0.08...
[do-women] ...mean 90-day return of -1.70 percent with a win rate of 41.82 percent
and an annualised Sharpe ratio of -0.08. This performance is consistent
with the global universe context, which shows 227,594 BUY filings...
The phrasing varies cosmetically but the triplet -1.70 / 41.82 / -0.08 is identical across all 28 EN articles. This is a single shared LLM-generated paragraph that got templated into the generation pipeline.
Root cause
The blog generation prompt (V12-era) was seeded with the universe-aggregate cohort summary as a "factual anchor" and the model dutifully recycled it into every long-form article regardless of the actual subject. Once V13.x replaced V12 universe stats, the prompt was not re-grounded, so every fresh batch inherited the same dead triplet.
Remediation options
Option A. Sed-style mass substitute (4 hours, low risk)
Replace the dead literals with V13.5-current values inside bodyMdx for the 34 category-A articles. Pros: fast, deterministic, preserves SEO URLs + indexed copy. Cons: keeps the same paragraph structure, does not fix the deeper "every article cites universe aggregate regardless of topic" anti-pattern. Will not pass a careful human read because the new numbers still will not match the article's claimed subject.
Required mappings (need confirmation from V13.5 backfill):
-1.70 percent -> <V13.5 universe mean 90d return>
41.82 percent -> <V13.5 universe win rate>
-0.08 (Sharpe) -> <V13.5 universe Sharpe>
14.1 (mentions in 18 articles) -> <intended figure>
Action: produce a scripts/_replace-blog-literals.ts that does the substitution in one transaction with a dry-run flag.
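A minimal sketch of the substitution logic such a script could use, assuming articles are loaded into memory first. The `Article`, `Rule` and `PlannedChange` types and both function names are hypothetical. The database read and the single-transaction write are deliberately left out, and the replacement values must come from the confirmed V13.5 mappings above, which this sketch does not supply.

```typescript
interface Article { slug: string; bodyMdx: string }
interface Rule { pattern: RegExp; replacement: string }
interface PlannedChange { slug: string; substitutions: number; bodyMdx: string }

// Compute the new body for every article that contains at least one stale
// literal. Nothing is written; the plan can be printed for review.
function planReplacements(articles: Article[], rules: Rule[]): PlannedChange[] {
  const plan: PlannedChange[] = [];
  for (const article of articles) {
    let body = article.bodyMdx;
    let substitutions = 0;
    for (const rule of rules) {
      // Force the global flag so every occurrence is replaced.
      const flags = rule.pattern.flags.includes("g")
        ? rule.pattern.flags
        : rule.pattern.flags + "g";
      // A replacer function avoids "$" being interpreted in the replacement.
      body = body.replace(new RegExp(rule.pattern.source, flags), () => {
        substitutions += 1;
        return rule.replacement;
      });
    }
    if (substitutions > 0) {
      plan.push({ slug: article.slug, substitutions, bodyMdx: body });
    }
  }
  return plan;
}

// With dryRun=true the input is returned untouched; otherwise the planned
// bodies are applied. A real script would persist them in one transaction.
function applyPlan(articles: Article[], plan: PlannedChange[], dryRun: boolean): Article[] {
  if (dryRun) return articles;
  const updated = new Map<string, string>();
  for (const change of plan) updated.set(change.slug, change.bodyMdx);
  return articles.map((article) => {
    const body = updated.get(article.slug);
    return body === undefined ? article : { ...article, bodyMdx: body };
  });
}
```

Separating planning from applying keeps the dry run and the real run on the same code path, so the reviewed diff is exactly what gets written.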
Option B. Regen 34 category-A articles with new prompt referencing V13.5 (2-3 days)
Update the article-generation prompt to (1) inject the V13.5 cohort that actually matches the article's filter (small-cap, CEO, post-earnings, etc.) instead of one global aggregate, and (2) explicitly forbid the V12 literals. Regenerate the 34 category-A articles (27 EN plus the 7 FR ones for parity). Keep slugs and publishedAt to preserve SEO.
Pros: real fix, articles become topic-coherent. Cons: cost (LLM tokens + review time), risk of new hallucinations, requires human pass.
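One way the two prompt changes could look in code is sketched below. The registry shape (`CohortStats`, `CohortRegistry`), the subject keys and the function name `buildGroundingBlock` are all assumptions made for illustration; the real prompt template and registry format are not described in this audit.

```typescript
// Hypothetical registry entry; field names are assumptions.
interface CohortStats {
  label: string;
  meanReturn90dPct: number;
  winRatePct: number;
  sharpe: number;
}
type CohortRegistry = Record<string, CohortStats>;

// The V12 triplet identified by this audit.
const FORBIDDEN_V12_LITERALS = ["-1.70", "41.82", "-0.08"];

// Build the factual-anchor section of the prompt from the cohort that matches
// the article's subject. Throws instead of falling back to a global aggregate.
function buildGroundingBlock(subjectKey: string, registry: CohortRegistry): string {
  const cohort: CohortStats | undefined = registry[subjectKey];
  if (cohort === undefined) {
    throw new Error(`No V13.5 cohort registered for subject "${subjectKey}"`);
  }
  return [
    `Cohort for this article: ${cohort.label} (V13.5).`,
    `Mean 90-day return: ${cohort.meanReturn90dPct.toFixed(2)} percent.`,
    `Win rate: ${cohort.winRatePct.toFixed(2)} percent.`,
    `Annualised Sharpe ratio: ${cohort.sharpe.toFixed(2)}.`,
    "Cite only the figures above. Do not cite universe-wide aggregates.",
    `Never use these superseded V12 literals: ${FORBIDDEN_V12_LITERALS.join(", ")}.`,
  ].join("\n");
}
```

Failing hard on a missing subject key is a deliberate choice in this sketch: a silent fallback to the universe aggregate is the behaviour that produced the current pollution.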
Option C. Archive + republish (1 week)
Move 34 articles to status='archived' (404 with redirect to topic hub), draft fresh pieces with V13.5 stats from scratch, publish under new slugs. Pros: cleanest editorial reset. Cons: loses inbound links + indexed positions, longest path.
Recommendation
Hybrid: Option B as the primary fix, with Option A as a stopgap:
- Day 1: pull V13.5 universe + per-subset cohort stats into a JSON registry (data/v13_5_cohorts.json). Source of truth for any blog citation.
- Day 1: ship Option A as a stopgap on top-10 worst offenders only. Caveat-banner in MDX: "Stats refreshed 2026-05-20, V13.5 universe." Buys time and removes the worst stale claims from prod.
- Day 2-4: regenerate the remaining 27 articles with new prompt that pulls from v13_5_cohorts.json keyed by article subject. Human review pass on each. Republish in batches of 5.
- Day 5: re-run scripts/_audit-blog-stale.ts and confirm 0 hits remaining. Add the script to CI as a regression guard.
Also recommended:
- Add the same audit logic to a pre-publish hook on BlogArticle so a future regen cannot reintroduce these literals.
- Strip the 3 category-C articles' V12 numbers and replace them with explicit "V12 vs V13.5" comparison tables (the version-history framing is fine, the silent citation is not).
- For the 7 FR hits, mostly low-literal-count, prefer Option A (1 or 2 sed substitutions per article) over full regen.
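The pre-publish hook and the CI regression guard could share one check, sketched here under assumptions. The `BlogArticleDraft` type and both function names are hypothetical, the pattern list covers only the -1.70 and -0.08 literals from the frequency table, and how the check is wired into the ORM or the CI job is left open.

```typescript
// Hypothetical draft shape; the three text fields are the ones this audit scanned.
interface BlogArticleDraft {
  slug: string;
  intro: string;
  descriptionMeta: string;
  bodyMdx: string;
}

// Illustrative subset of stale literals, matched only as standalone numbers.
const STALE_PATTERNS: RegExp[] = [
  /(?<![\d.])-1\.70(?!\d)/,
  /(?<![\d.])-0\.08(?!\d)/,
];

// Return the names of the text fields that contain at least one stale literal.
function findStaleFields(draft: BlogArticleDraft): string[] {
  const fields: Array<[string, string]> = [
    ["intro", draft.intro],
    ["descriptionMeta", draft.descriptionMeta],
    ["bodyMdx", draft.bodyMdx],
  ];
  return fields
    .filter(([, text]) => STALE_PATTERNS.some((pattern) => pattern.test(text)))
    .map(([name]) => name);
}

// Throw before publication if any field still carries a stale literal.
function assertPublishable(draft: BlogArticleDraft): void {
  const stale = findStaleFields(draft);
  if (stale.length > 0) {
    throw new Error(`Stale V12 literal in ${draft.slug}: ${stale.join(", ")}`);
  }
}
```

A category-C article that keeps V12 numbers inside an explicit comparison table would trip this check, so a real hook would need an allow-list or a marker for intentionally historical citations.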
Files
- scripts/_audit-blog-stale.ts: audit script (read-only)
- /tmp/blog-stale-audit.json: full result dump (per-article matches, contexts, classification)
- docs/method-review/91-blog-stale-stats-audit-2026-05-20.md: this doc
No DB modifications were made
The audit is read-only. No UPDATE or DELETE statement was issued. Remediation requires explicit go-ahead.