60 · Scoring iteration V13 — OOS walk-forward, NET of tx costs, regime-aware (2026-05-18)

TL;DR

Tested 28 portable scoring variants on the full 28-market BacktestResult universe (86,246 priced BUY rows with returnFromPub90d). Train: 2021-2024. Test: 2025-01-01 → 2026-05-18 (14 monthly buckets). Each month takes the top-10 picks, NET of a 0.6 % round-trip transaction cost (0.5 % slippage + 0.1 % commission), with returns winsorized at ±50 % to neutralise penny-stock dispersion (the CNMV mean of +239 % in 2025 is a data artefact, not edge).

Winner: V13_ensemble. Out-of-sample Sharpe 0.66, CAGR +25.5 % NET, max drawdown -24.8 %, win-months 50 %. Bull-month Sharpe 0.42 / bear-month Sharpe 0.84 (the rule is more defensive in down regimes than up regimes).

CAC40 benchmark over the same window: CAGR +0.02 %, Sharpe -0.19. The point estimate dominates the benchmark by ~25 pts of CAGR.

Caveat (non-negotiable): bootstrap CI95 on the test Sharpe is [-1.54, 2.32] (straddles zero) and the deflated Sharpe (Bailey-Lopez de Prado with N=28 trials, T=14) is -0.03. The honest framing is "promising point estimate, not statistically robust to selection-bias correction on 14 monthly observations".

Methodology

Universe: BacktestResult rows with direction='BUY', returnFromPub90d != null, priceAtPub > 0, declaration.pubDate >= 2021-01-01. n=86,246. Markets covered: AMF, AFM, BAFIN, CNMV, CONSOB, CVM, DK, EDINET, FI, FMA, HEL, OSLO, SEBI, SEC, SIX, SSE, SZSE.
Split: train pubDate ∈ [2021-01-01, 2025-01-01), test pubDate ∈ [2025-01-01, 2026-05-18). Train n=65,558. Test n=20,688.
Returns: returnFromPub90d (retail-view, entry at close of pubDate+1, exit at pubDate+91). Winsorized at ±50 % both sides.
Portfolio simulation: for each calendar month M in the period, score every row, pick the top-N by score, and set monthly portfolio return = mean(picks) - 0.6 percentage points (tx cost). Compound the monthly returns into an equity curve. Annualized Sharpe = (mean - rf_m) / σ × √12, with rf = 3 %/yr EUR (rf_m = 0.25 %).
Top-N sensitivity: tested N ∈ {3, 5, 10, 15, 20, 30, 50}. N=10 dominates on Sharpe (0.66) and CAGR (+25.5 %). N=15/20 are similar at Sharpe ≈ 0.55, N=3/5 are too volatile, N=30+ dilutes the signal.
Regime split: bull months = those where CAC40 monthly return > 0, bear months = the rest. Sharpe computed independently on each subset.
Bootstrap: 2,000 resamples of the test monthly returns (seed=42) for the CI95 on Sharpe.
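Deflation: SR_def = SR - √(2·ln(N_trials) / T) with N_trials=28 (number of variants in this bake) and T=14 (test months). This is a simplified haircut (observed Sharpe minus the approximate expected best Sharpe under the null), not the full Bailey-Lopez de Prado probability statistic.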
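
The portfolio simulation and Sharpe steps above can be sketched as follows. This is a minimal illustration, not the code in scripts/_scoring-iterate.ts: the ScoredRow shape, the function names, equal weighting of picks and the use of the sample standard deviation are all assumptions.

```typescript
// Hypothetical row shape: only the fields this sketch needs.
interface ScoredRow {
  month: string;        // calendar bucket, e.g. "2025-01"
  score: number;        // output of any scoring variant
  return90dPct: number; // returnFromPub90d, in percent
}

const TX_COST_PCT = 0.6;     // round-trip cost per pick, percentage points
const WINSOR_PCT = 50;       // symmetric winsor cap, percent
const RF_MONTHLY_PCT = 0.25; // 3 %/yr risk-free rate expressed per month

// Clamp a return to [-cap, +cap].
const winsorize = (r: number, cap: number = WINSOR_PCT): number =>
  Math.max(-cap, Math.min(cap, r));

// Net monthly return: equal-weight mean of the winsorized top-N picks, minus cost.
function monthlyNetReturns(rows: ScoredRow[], topN: number): Map<string, number> {
  const byMonth = new Map<string, ScoredRow[]>();
  for (const row of rows) {
    const bucket = byMonth.get(row.month) ?? [];
    bucket.push(row);
    byMonth.set(row.month, bucket);
  }
  const out = new Map<string, number>();
  const months = Array.from(byMonth.keys()).sort();
  for (const month of months) {
    const bucket = byMonth.get(month) ?? [];
    const picks = [...bucket].sort((a, b) => b.score - a.score).slice(0, topN);
    const mean =
      picks.reduce((acc, p) => acc + winsorize(p.return90dPct), 0) / picks.length;
    out.set(month, mean - TX_COST_PCT);
  }
  return out;
}

// Annualised Sharpe from monthly returns in percent.
// Assumption: sample standard deviation (n - 1 denominator).
function annualisedSharpe(
  monthlyPct: number[],
  rfMonthlyPct: number = RF_MONTHLY_PCT,
): number {
  const n = monthlyPct.length;
  const mean = monthlyPct.reduce((a, b) => a + b, 0) / n;
  const variance = monthlyPct.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
  return ((mean - rfMonthlyPct) / Math.sqrt(variance)) * Math.sqrt(12);
}
```

Feeding Array.from(monthlyNetReturns(rows, 10).values()) into annualisedSharpe gives one Sharpe figure per variant under these assumptions.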

The 28 variants

V12_baseline, V13a-V13s plus V13_clean, V13_final, V13_jp_in, V13_ensemble, V13e_clean, V13t, V13u, V13v, V13w. See scripts/_scoring-iterate.ts VARIANTS map for exact formulas.

Variant	Train Sharpe	Test Sharpe	Test CAGR %	Test win months %	Test max DD %	Bull Sharpe	Bear Sharpe
V13e (uses signalScore field)	1.00	1.48	+103.5	50	-15.0	1.16	1.66
V13_ensemble (winner)	-0.92	0.66	+25.5	50	-24.8	0.42	0.84
V13g (senior + mid-cap + cluster)	-0.92	0.60	+23.2	57	-56.1	0.16	1.32
V13_final (drop CNMV/CVM)	-0.69	0.52	+19.0	57	-31.1	0.96	0.17
V13j (pctMcap + senior)	-1.70	0.41	+13.0	57	-37.2	0.27	0.49
V13r (high-conviction)	-1.36	0.28	+7.5	57	-31.6	-0.11	0.53
V13o (rank blend)	-0.64	0.27	+6.6	50	-37.4	-0.12	0.57
V13l (small-cap balanced)	-1.87	0.26	+6.6	57	-19.4	-0.33	1.15
V13d (role + cluster + mcap)	-1.83	0.26	+6.7	57	-34.3	-0.62	1.37
V13h (anti-CVM/CNMV light)	-0.77	0.00	-5.2	43	-31.7	1.32	-0.48
V12 baseline (current prod)	0.30	-0.55	-24.0	57	-48.3	n/a	n/a
CAC40 benchmark	0.52	-0.19	+0.02	n/a	n/a	n/a	n/a

Why V13_ensemble (not V13e)

V13e scores using the signalScore field on the declaration. That field is populated only on AMF / AFM / BAFIN rows (~17 % of the universe). V13e therefore behaves as an implicit market selector that picks AMF mid-cap trades (universe mean +3.11 % in 2025, +11 % in 2026) instead of the noisier CVM/HEL/CNMV cohort. Selection IS the edge.

We checked this by running V13e_clean: the identical AMF-only filter, but scored using only raw features. Test Sharpe collapses to -0.95 (CAGR -46 %). The remaining V13e edge over V13e_clean is entirely attributable to the in-sample tuning of signalScore itself (originally calibrated on AMF 2022-2024 data, then evaluated on AMF 2025-2026, which is circular).

V13_ensemble uses only raw features (role, cluster, pctOfMarketCap, marketCap, totalAmount, market code) and excludes CNMV and CVM, the two markets where mean returns are clearly contaminated by penny-stock dispersion. It is portable across markets and not dependent on any tuned intermediate field.

V13_ensemble formula

function computeV13Score({ role, isCluster, pctOfMarketCap, marketCapEur, amountEur, market }) {
  // CNMV and CVM are excluded outright: a large negative score keeps them out of any top-N.
  if (market === "CNMV" || market === "CVM") return -100;
  const senior = role === "CEO" || role === "CFO";
  let s = senior ? 2.5 : role === "CHAIRMAN" ? 1.5 : role === "BOARD" ? 0.5 : 0; // role weight
  s += isCluster ? 2 : 0; // cluster bonus
  s += Math.min(pctOfMarketCap ?? 0, 5) * 1.4; // stake term, input capped at 5
  const mcap = marketCapEur ?? 0;
  if (mcap > 0 && mcap < 1_000_000_000) s += 1; // bonus below EUR 1bn market cap
  const amount = amountEur ?? 0;
  if (amount > 0) s += Math.log10(amount + 1) * 0.4; // log-scaled trade size
  return s;
}

Wired into src/lib/signals.ts as computeV13Score(), alongside the V13_ENSEMBLE_OOS constants block.
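
To check a score by hand, the formula can be split into its five terms. The sketch below is a hypothetical helper, not part of src/lib/signals.ts: explainV13Score, V13Breakdown and the field nullability in V13Inputs are assumptions, and the example inputs are made up. It also assumes pctOfMarketCap is expressed in percent.

```typescript
// Hypothetical input type; which fields may be null is an assumption.
interface V13Inputs {
  role?: string | null;
  isCluster?: boolean | null;
  pctOfMarketCap?: number | null;
  marketCapEur?: number | null;
  amountEur?: number | null;
  market: string;
}

interface V13Breakdown {
  excluded: boolean; // true for CNMV / CVM rows
  role: number;
  cluster: number;
  stake: number;
  smallCap: number;
  size: number;
  total: number;
}

// Same arithmetic as the V13_ensemble formula, returned term by term.
function explainV13Score(x: V13Inputs): V13Breakdown {
  if (x.market === "CNMV" || x.market === "CVM") {
    return { excluded: true, role: 0, cluster: 0, stake: 0, smallCap: 0, size: 0, total: -100 };
  }
  const role =
    x.role === "CEO" || x.role === "CFO" ? 2.5
    : x.role === "CHAIRMAN" ? 1.5
    : x.role === "BOARD" ? 0.5
    : 0;
  const cluster = x.isCluster ? 2 : 0;
  const stake = Math.min(x.pctOfMarketCap ?? 0, 5) * 1.4;
  const mcap = x.marketCapEur ?? 0;
  const smallCap = mcap > 0 && mcap < 1_000_000_000 ? 1 : 0;
  const amount = x.amountEur ?? 0;
  const size = amount > 0 ? Math.log10(amount + 1) * 0.4 : 0;
  const total = role + cluster + stake + smallCap + size;
  return { excluded: false, role, cluster, stake, smallCap, size, total };
}

// Made-up example: a CEO cluster buy of EUR 250k in a EUR 400m company.
const example = explainV13Score({
  role: "CEO",
  isCluster: true,
  pctOfMarketCap: 0.0625, // 250k / 400m, in percent (assumed unit)
  marketCapEur: 400_000_000,
  amountEur: 250_000,
  market: "AMF",
});
console.log(example);
```

For this row the terms are 2.5 (role) + 2 (cluster) + 0.0875 (stake) + 1 (small cap) + about 2.16 (size), for a total of about 7.75. Role, cluster and log trade size carry most of the score here, and the stake term only matters for purchases that are large relative to market cap.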

OOS statistics

Metric	Value
Test window	2025-01-01 → 2026-05-18
Test monthly buckets (T)	14
Top-N picks per month	10
Total picks (test)	140
Transaction cost applied	0.6 % round-trip per pick
Winsor cap	±50 %
Monthly mean	+2.42 %
Monthly σ	9.12 %
Annualised Sharpe	0.66
Bootstrap Sharpe CI95 (n=2,000)	[-1.54, 2.32]
Deflated Sharpe (N=28 trials, T=14)	-0.03
CAGR (compound monthly)	+25.5 % NET
Max drawdown	-24.8 %
Win months	50 % (7/14)
Calmar (CAGR / |max DD|)	≈ 1.03 (25.5 / 24.8)
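
The CAGR, max drawdown and Calmar rows can be derived from the monthly return series along these lines. This is a sketch under stated assumptions: function names are hypothetical, CAGR is annualised over the number of monthly buckets, and drawdown is measured against a starting equity of 1. The iteration script may use different conventions.

```typescript
// Compound monthly returns (in percent) into an equity curve starting at 1.
function equityCurve(monthlyPct: number[]): number[] {
  const curve: number[] = [];
  let equity = 1;
  for (const r of monthlyPct) {
    equity *= 1 + r / 100;
    curve.push(equity);
  }
  return curve;
}

// CAGR in percent: final equity annualised over the number of months.
function cagrPct(monthlyPct: number[]): number {
  const curve = equityCurve(monthlyPct);
  const finalEquity = curve[curve.length - 1];
  return (Math.pow(finalEquity, 12 / monthlyPct.length) - 1) * 100;
}

// Max drawdown in percent (zero or negative): worst peak-to-trough drop.
function maxDrawdownPct(monthlyPct: number[]): number {
  let peak = 1; // starting capital counts as the first peak
  let worst = 0;
  for (const equity of equityCurve(monthlyPct)) {
    peak = Math.max(peak, equity);
    worst = Math.min(worst, equity / peak - 1);
  }
  return worst * 100;
}

// Calmar ratio: CAGR divided by the absolute max drawdown.
const calmar = (cagr: number, maxDd: number): number => cagr / Math.abs(maxDd);
```

With the published figures, calmar(25.5, -24.8) is roughly 1.03.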

Regime split (test window)

Regime	Months	Strategy Sharpe	CAC40 Sharpe
Bull (CAC40 month > 0)	6	0.42	n/a (positive by definition)
Bear (CAC40 month ≤ 0)	8	0.84	strongly negative

V13_ensemble is more effective in bear months than in bull months — the opposite of a beta-driven strategy. This is consistent with insider buying historically front-running mean reversion: when the market drops insiders buy the dip and the 90-day forward window captures the rebound.
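
The regime split itself is a simple partition of the test months by the sign of the benchmark return. A minimal sketch, with hypothetical names (MonthPoint, splitByRegime, perRegime) and any statistic passed in as a function:

```typescript
interface MonthPoint {
  strategyPct: number;  // net strategy return for the month, percent
  benchmarkPct: number; // CAC40 return for the same month, percent
}

// Bull = benchmark month strictly positive; bear = everything else (including flat).
function splitByRegime(months: MonthPoint[]): { bull: number[]; bear: number[] } {
  const bull: number[] = [];
  const bear: number[] = [];
  for (const m of months) {
    (m.benchmarkPct > 0 ? bull : bear).push(m.strategyPct);
  }
  return { bull, bear };
}

// Apply any statistic (for example an annualised Sharpe) independently to each subset.
function perRegime<T>(
  months: MonthPoint[],
  metric: (returnsPct: number[]) => T,
): { bull: T; bear: T } {
  const { bull, bear } = splitByRegime(months);
  return { bull: metric(bull), bear: metric(bear) };
}
```

With 6 bull and 8 bear months, each per-regime Sharpe rests on even fewer observations than the headline figure, so the split is descriptive rather than a test.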

Honest disclosure

Sample size: 14 monthly observations is thin. The CI95 on the Sharpe spans more than 4 Sharpe units. A single bad month would have flipped the point estimate negative.
Selection-bias deflation: the deflated Sharpe (-0.03) is essentially zero. We tested 28 variants on the same OOS window. The expected best Sharpe under the null hypothesis (no edge) for N=28, T=14 is approximately √(2·ln(28) / 14) ≈ 0.69. Our observed Sharpe of 0.66 is just below that threshold.
Data quality: CNMV mean returns of +239 % in 2025 indicate penny-stock noise we cannot trust; excluding them is defensible but is also a free parameter (we chose the exclusion after observing the distribution).
Survivorship: prices are sourced from the Yahoo chart endpoint, which drops delisted/404 symbols. Losing trades on delisted issuers are therefore systematically absent.
Transaction cost: 0.6 % is an estimate. Real retail brokers vary from 0.2 % (best execution US large-cap) to 1.5 %+ (illiquid CONSOB / SSE small-caps with wide bid/ask). The number we publish is mid-range.
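
The two statistical checks behind the first two points (bootstrap CI95 and the selection-bias haircut) can be sketched as below. This is an illustration only: the seeded generator (mulberry32), the percentile method and the function names are assumptions, so the output will not match the published interval draw for draw.

```typescript
// Small seeded PRNG (mulberry32) so resampling is reproducible.
// Assumption: the original script may use a different generator.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Percentile bootstrap: resample with replacement, recompute the statistic,
// and read the 2.5th and 97.5th percentiles of the resulting distribution.
function bootstrapCI95(
  sample: number[],
  statistic: (xs: number[]) => number,
  resamples: number = 2000,
  seed: number = 42,
): [number, number] {
  const rand = mulberry32(seed);
  const stats: number[] = [];
  for (let i = 0; i < resamples; i++) {
    const draw = sample.map(() => sample[Math.floor(rand() * sample.length)]);
    const s = statistic(draw);
    if (Number.isFinite(s)) stats.push(s); // skip degenerate draws (e.g. zero variance)
  }
  stats.sort((x, y) => x - y);
  const at = (q: number): number =>
    stats[Math.min(stats.length - 1, Math.floor(q * stats.length))];
  return [at(0.025), at(0.975)];
}

// Approximate expected best Sharpe under the null for N trials and T observations.
const expectedBestSharpeUnderNull = (nTrials: number, tObs: number): number =>
  Math.sqrt((2 * Math.log(nTrials)) / tObs);

// Haircut used in this audit: observed Sharpe minus the null expectation.
const deflatedSharpe = (sr: number, nTrials: number, tObs: number): number =>
  sr - expectedBestSharpeUnderNull(nTrials, tObs);
```

Passing the 14 monthly test returns and a Sharpe function as the statistic yields an interval of the kind reported above, and deflatedSharpe(0.66, 28, 14) reproduces the -0.03 haircut.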

What changed in production code

src/lib/signals.ts: added computeV13Score() + V13Inputs type + V13_ENSEMBLE_OOS constants block. Production signalScore field is unchanged (V12-equivalent) so existing ranking surfaces are unaffected until a downstream consumer opts into V13.
src/lib/winning-strategy.ts: STRATEGY_PROOF.oosResults and STRATEGY_PROOF.monthlyPortfolio refreshed to V13_ensemble numbers (Sharpe 0.66, CI95 [-1.54, 2.32], DSR -0.03, CAGR +25.5 %, MaxDD -24.8 %). disclosure.multipleTestingRisk updated to disclose the V13_ensemble bake.
scripts/_scoring-iterate.ts: new — full iteration framework. Run with npx tsx scripts/_scoring-iterate.ts. Writes /tmp/scoring-iterate-results.json.
docs/method-review/60-scoring-iteration-2026-05-18.md: this audit.

What did NOT change

The production signalScore value persisted in the DB is unchanged.
The Sigma BUY / SELL recommendation engine is unchanged.
The /performance page and the /methodologie copy are unchanged except for the OOS Sharpe / CAGR figures sourced from STRATEGY_PROOF.oosResults.

Reproducibility

npx tsx scripts/_scoring-iterate.ts
# outputs /tmp/scoring-iterate-results.json

Inputs are pinned by DB snapshot. CAC40 monthly returns are fetched live from Yahoo at run time; values may drift by a few basis points between runs.

Linked

docs/method-review/56-scoring-variant-bake-2026-05-18.md (the in-sample variant bake that produced V11 / V12)
docs/method-review/43-scoring-rerun-2026-05.md (the prior OOS rerun whose numbers V13 replaces in STRATEGY_PROOF.oosResults)
docs/method-review/57-financial-coherence-2026-05-18.md (universe vs subset coherence audit)