60 · Scoring iteration V13 — OOS walk-forward, NET of tx costs, regime-aware (2026-05-18)
TL;DR
Tested 28 portable scoring variants on the full 28-market BacktestResult
universe (86,246 priced BUY rows with returnFromPub90d). Train: 2021-2024;
test: 2025-01-01 → 2026-05-18 (14 monthly buckets). Top-10 picks per month,
NET of 0.6 % round-trip transaction cost (0.5 % slippage + 0.1 % commission),
returns winsorized at ±50 % to neutralise penny-stock dispersion (the CNMV mean
of +239 % in 2025 is a data artefact, not edge).
Winner: V13_ensemble. Out-of-sample Sharpe 0.66, CAGR +25.5 % NET, max drawdown -24.8 %, win-months 50 %. Bull-month Sharpe 0.42 / bear-month Sharpe 0.84 (the rule is more defensive in down regimes than up regimes).
CAC40 benchmark over the same window: CAGR +0.02 %, Sharpe -0.19. The point estimate dominates the benchmark by ~25 pts of CAGR.
Caveat (non-negotiable): bootstrap CI95 on the test Sharpe is [-1.54, 2.32] (straddles zero) and the deflated Sharpe (Bailey-Lopez de Prado with N=28 trials, T=14) is -0.03. The honest framing is "promising point estimate, not statistically robust to selection-bias correction on 14 monthly observations".
Methodology
- Universe: `BacktestResult` rows with `direction='BUY'`, `returnFromPub90d != null`, `priceAtPub > 0`, `declaration.pubDate >= 2021-01-01`. n=86,246. Markets covered: AMF, AFM, BAFIN, CNMV, CONSOB, CVM, DK, EDINET, FI, FMA, HEL, OSLO, SEBI, SEC, SIX, SSE, SZSE.
- Split: train pubDate ∈ [2021-01-01, 2025-01-01), test pubDate ∈ [2025-01-01, 2026-05-18). Train n=65,558. Test n=20,688.
- Returns: `returnFromPub90d` (retail view: entry at close of pubDate+1, exit at pubDate+91). Winsorized at ±50 % both sides.
- Portfolio simulation: for each calendar month M in the period, score every row, pick top-N by score, monthly portfolio return = `mean(picks) - 0.6` (tx cost, in percentage points). Compound monthly returns into an equity curve. Sharpe annualized = `(mean - rf_m) / σ × √12` with rf=3 %/yr EUR (rf_m=0.25 %).
- Top-N sensitivity: tested N ∈ {3, 5, 10, 15, 20, 30, 50}. N=10 dominates on Sharpe (0.66) and CAGR (+25.5 %). N=15/20 are similar at Sharpe ≈ 0.55, N=3/5 are too volatile, N=30+ dilutes the signal.
- Regime split: bull months = those where CAC40 monthly return > 0, bear months = the rest. Sharpe computed independently on each subset.
- Bootstrap: 2,000 resamples of the test monthly returns (seed=42) for the CI95 on Sharpe.
- Deflation: `SR_def = SR - √(2·ln(N_trials) / T)` with N_trials=28 (number of variants in this bake) and T=14 (test months).
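The portfolio simulation described in the list above can be sketched as follows. This is a minimal illustration, not the code in scripts/_scoring-iterate.ts: the `ScoredRow` shape, every function name, the use of the sample standard deviation, and treating the starting capital as the first equity peak are assumptions made for the example.

```typescript
// Hypothetical row shape: only the fields the simulation needs.
interface ScoredRow {
  month: string;        // calendar bucket, e.g. "2025-01"
  score: number;        // output of the scoring variant
  return90dPct: number; // returnFromPub90d, in percent
}

const TX_COST_PCT = 0.6;     // round-trip cost per pick, in percentage points
const WINSOR_CAP_PCT = 50;   // symmetric winsor cap, in percent
const RF_MONTHLY_PCT = 0.25; // 3 %/yr risk-free rate expressed per month

// Clamp a single return to [-50, +50] percent.
const winsorize = (r: number): number =>
  Math.max(-WINSOR_CAP_PCT, Math.min(WINSOR_CAP_PCT, r));

// Net monthly return (percent) of the top-N rows by score, keyed by month.
function monthlyNetReturns(rows: ScoredRow[], topN: number): Map<string, number> {
  const byMonth = new Map<string, ScoredRow[]>();
  for (const row of rows) {
    const bucket = byMonth.get(row.month) ?? [];
    bucket.push(row);
    byMonth.set(row.month, bucket);
  }
  const out = new Map<string, number>();
  for (const month of Array.from(byMonth.keys()).sort()) {
    const picks = [...byMonth.get(month)!]
      .sort((a, b) => b.score - a.score)
      .slice(0, topN);
    const gross =
      picks.reduce((acc, p) => acc + winsorize(p.return90dPct), 0) / picks.length;
    out.set(month, gross - TX_COST_PCT);
  }
  return out;
}

// Annualised Sharpe from monthly percent returns (sample standard deviation, assumed).
function annualisedSharpe(monthly: number[]): number {
  const mean = monthly.reduce((a, b) => a + b, 0) / monthly.length;
  const variance =
    monthly.reduce((a, b) => a + (b - mean) ** 2, 0) / (monthly.length - 1);
  return ((mean - RF_MONTHLY_PCT) / Math.sqrt(variance)) * Math.sqrt(12);
}

// Compound monthly percent returns into an equity curve starting from 1.
function equityCurve(monthly: number[]): number[] {
  let equity = 1;
  return monthly.map((r) => (equity *= 1 + r / 100));
}

// Largest peak-to-trough decline of the curve, as a negative percent.
function maxDrawdownPct(curve: number[]): number {
  let peak = 1; // starting capital counts as the first peak (assumption)
  let worst = 0;
  for (const v of curve) {
    peak = Math.max(peak, v);
    worst = Math.min(worst, (v / peak - 1) * 100);
  }
  return worst;
}
```

Feeding `Array.from(monthlyNetReturns(rows, 10).values())` into `annualisedSharpe`, `equityCurve` and `maxDrawdownPct` gives the three headline statistics for one variant under these assumptions.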
The 28 variants
V12_baseline, V13a-V13s plus V13_clean, V13_final, V13_jp_in, V13_ensemble,
V13e_clean, V13t, V13u, V13v, V13w. See scripts/_scoring-iterate.ts
VARIANTS map for exact formulas.
| Variant | Train Sharpe | Test Sharpe | Test CAGR % | Test Win % | Test Max DD % | Bull Sharpe | Bear Sharpe |
|---|---|---|---|---|---|---|---|
| V13e (uses signalScore field) | 1.00 | 1.48 | +103.5 | 50 | -15.0 | 1.16 | 1.66 |
| V13_ensemble (winner) | -0.92 | 0.66 | +25.5 | 50 | -24.8 | 0.42 | 0.84 |
| V13g (senior + mid-cap + cluster) | -0.92 | 0.60 | +23.2 | 57 | -56.1 | 0.16 | 1.32 |
| V13_final (drop CNMV/CVM) | -0.69 | 0.52 | +19.0 | 57 | -31.1 | 0.96 | 0.17 |
| V13j (pctMcap + senior) | -1.70 | 0.41 | +13.0 | 57 | -37.2 | 0.27 | 0.49 |
| V13r (high-conviction) | -1.36 | 0.28 | +7.5 | 57 | -31.6 | -0.11 | 0.53 |
| V13o (rank blend) | -0.64 | 0.27 | +6.6 | 50 | -37.4 | -0.12 | 0.57 |
| V13l (small-cap balanced) | -1.87 | 0.26 | +6.6 | 57 | -19.4 | -0.33 | 1.15 |
| V13d (role + cluster + mcap) | -1.83 | 0.26 | +6.7 | 57 | -34.3 | -0.62 | 1.37 |
| V13h (anti-CVM/CNMV light) | -0.77 | 0.00 | -5.2 | 43 | -31.7 | 1.32 | -0.48 |
| V12 baseline (current prod) | 0.30 | -0.55 | -24.0 | 57 | -48.3 | n/a | n/a |
| CAC40 benchmark | 0.52 | -0.19 | +0.02 | n/a | n/a | n/a | n/a |
Why V13_ensemble (not V13e)
V13e scores using the signalScore field on the declaration. That field is
populated only on AMF / AFM / BAFIN rows (~17 % of the universe).
V13e therefore behaves as an implicit market selector that picks AMF mid-cap
trades (universe mean +3.11 % in 2025, +11 % in 2026) instead of the noisier
CVM/HEL/CNMV cohort. Selection IS the edge.
We checked this by running V13e_clean: the same AMF-only filter, but
scoring on raw features only. Test Sharpe collapses to -0.95 (CAGR -46 %).
The remaining V13e edge over V13e_clean is therefore attributable to
signalScore itself, which was calibrated in-sample on AMF 2022-2024 data
and then evaluated on AMF 2025-2026 (circular).
V13_ensemble uses only raw features (role, cluster, pctOfMarketCap, marketCap, totalAmount, market code) and excludes CNMV and CVM, the two markets where mean returns are clearly contaminated by penny-stock dispersion. It is portable across markets and not dependent on any tuned intermediate field.
V13_ensemble formula
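```ts
function computeV13Score({ role, isCluster, pctOfMarketCap, marketCapEur, amountEur, market }) {
  // CNMV and CVM are excluded outright (penny-stock dispersion).
  if (market === "CNMV" || market === "CVM") return -100;
  // Role weight: CEO/CFO 2.5, chairman 1.5, board 0.5, anyone else 0.
  const senior = role === "CEO" || role === "CFO";
  let s = senior ? 2.5 : role === "CHAIRMAN" ? 1.5 : role === "BOARD" ? 0.5 : 0;
  // Cluster buys add a flat 2 points.
  s += isCluster ? 2 : 0;
  // Stake relative to market cap, capped at 5, weighted 1.4.
  s += Math.min(pctOfMarketCap ?? 0, 5) * 1.4;
  // Bonus for a known market cap below EUR 1bn.
  const mcap = marketCapEur ?? 0;
  if (mcap > 0 && mcap < 1_000_000_000) s += 1;
  // Log-scaled trade amount, weighted 0.4.
  const amount = amountEur ?? 0;
  if (amount > 0) s += Math.log10(amount + 1) * 0.4;
  return s;
}
```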
Wired into src/lib/signals.ts as computeV13Score() and V13_ENSEMBLE_OOS
constants block.
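To make the weights concrete, here is a worked example of the formula on one hypothetical declaration. The `V13Inputs` shape below is an assumption (the real type lives in src/lib/signals.ts and is not reproduced in this audit), and the input values are invented for illustration only.

```typescript
// Assumed input shape; field optionality is a guess based on the `??` guards in the formula.
interface V13Inputs {
  role: string;
  isCluster: boolean;
  pctOfMarketCap?: number | null;
  marketCapEur?: number | null;
  amountEur?: number | null;
  market: string;
}

// Same arithmetic as the V13_ensemble formula, with explicit types.
function computeV13Score(input: V13Inputs): number {
  const { role, isCluster, pctOfMarketCap, marketCapEur, amountEur, market } = input;
  if (market === "CNMV" || market === "CVM") return -100;
  const senior = role === "CEO" || role === "CFO";
  let s = senior ? 2.5 : role === "CHAIRMAN" ? 1.5 : role === "BOARD" ? 0.5 : 0;
  s += isCluster ? 2 : 0;
  s += Math.min(pctOfMarketCap ?? 0, 5) * 1.4;
  const mcap = marketCapEur ?? 0;
  if (mcap > 0 && mcap < 1_000_000_000) s += 1;
  const amount = amountEur ?? 0;
  if (amount > 0) s += Math.log10(amount + 1) * 0.4;
  return s;
}

// Hypothetical declaration: a CEO cluster buy on a EUR 400m company.
const example: V13Inputs = {
  role: "CEO",
  isCluster: true,
  pctOfMarketCap: 1.2,
  marketCapEur: 400_000_000,
  amountEur: 250_000,
  market: "AMF",
};

// 2.5 (CEO) + 2 (cluster) + 1.2 * 1.4 (stake) + 1 (sub-1bn cap) + log10(250001) * 0.4
const exampleScore = computeV13Score(example);
console.log(exampleScore.toFixed(2)); // 9.34

// The same declaration filed on an excluded market is pushed to the bottom of the ranking.
const excludedScore = computeV13Score({ ...example, market: "CVM" });
console.log(excludedScore); // -100
```

The role, cluster and stake terms account for most of the score here; the log-scaled amount term adds roughly 2.16 points for a EUR 250,000 trade.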
OOS statistics
| Metric | Value |
|---|---|
| Test window | 2025-01-01 → 2026-05-18 |
| Test monthly buckets (T) | 14 |
| Top-N picks per month | 10 |
| Total picks (test) | 140 |
| Transaction cost applied | 0.6 % round-trip per pick |
| Winsor cap | ±50 % |
| Monthly mean | +2.42 % |
| Monthly σ | 9.12 % |
| Annualised Sharpe | 0.66 |
| Bootstrap Sharpe CI95 (n=2,000) | [-1.54, 2.32] |
| Deflated Sharpe (N=28 trials, T=14) | -0.03 |
| CAGR (compound monthly) | +25.5 % NET |
| Max drawdown | -24.8 % |
| Win months | 50 % (7/14) |
| Calmar (CAGR / absolute max DD) | ≈ 1.03 (25.5 / 24.8) |
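The bootstrap interval in the table can be approximated with a percentile bootstrap over the monthly returns. The audit states only the resample count (2,000) and the seed (42); the percentile method, the mulberry32 generator, the function names and the sample values below are assumptions, so this sketch will not reproduce the published interval.

```typescript
// mulberry32: a small deterministic PRNG, used only so the sketch is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Annualised Sharpe from monthly percent returns (sample standard deviation, assumed).
function sharpeOf(monthly: number[], rfMonthlyPct = 0.25): number {
  const n = monthly.length;
  const mean = monthly.reduce((acc, r) => acc + r, 0) / n;
  const variance = monthly.reduce((acc, r) => acc + (r - mean) ** 2, 0) / (n - 1);
  return ((mean - rfMonthlyPct) / Math.sqrt(variance)) * Math.sqrt(12);
}

// Percentile bootstrap: resample months with replacement, recompute the Sharpe each time.
function bootstrapSharpeCi95(
  monthly: number[],
  resamples = 2000,
  seed = 42,
): [number, number] {
  const rand = mulberry32(seed);
  const stats: number[] = [];
  for (let i = 0; i < resamples; i++) {
    const sample = monthly.map(() => monthly[Math.floor(rand() * monthly.length)]);
    const s = sharpeOf(sample);
    if (Number.isFinite(s)) stats.push(s); // skip degenerate zero-variance resamples
  }
  stats.sort((x, y) => x - y);
  const at = (q: number): number =>
    stats[Math.min(stats.length - 1, Math.floor(q * stats.length))];
  return [at(0.025), at(0.975)];
}

// Illustrative monthly returns in percent (invented, not the V13_ensemble series).
const illustrativeMonthly = [5, -3, 8, -6, 2, 10, -4, 1, -9, 7, 3, -2, 12, -1];
const [ciLow, ciHigh] = bootstrapSharpeCi95(illustrativeMonthly);
console.log(ciLow.toFixed(2), ciHigh.toFixed(2));
```

With only 14 observations each resample repeats several months, which is why the interval comes out wide.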
Regime split (test window)
| Regime | Months | Strategy Sharpe | CAC40 Sharpe |
|---|---|---|---|
| Bull (CAC40 month > 0) | 6 | 0.42 | n/a (positive by definition) |
| Bear (CAC40 month ≤ 0) | 8 | 0.84 | strongly negative |
V13_ensemble is more effective in bear months than in bull months — the opposite of a beta-driven strategy. This is consistent with insider buying historically front-running mean reversion: when the market drops insiders buy the dip and the 90-day forward window captures the rebound.
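The regime split can be sketched as a partition of the test months by the sign of the benchmark return, with the Sharpe recomputed on each subset. The `MonthPoint` shape and function names are hypothetical, and the sketch assumes the per-regime Sharpe uses the same annualisation and risk-free rate as the headline figure, which the audit does not state explicitly.

```typescript
// Hypothetical pairing of one month's strategy and benchmark returns, in percent.
interface MonthPoint {
  strategyPct: number;  // strategy net monthly return
  benchmarkPct: number; // CAC40 monthly return
}

// Annualised Sharpe; returns NaN when fewer than two observations are available.
function subsetSharpe(monthly: number[], rfMonthlyPct = 0.25): number {
  const n = monthly.length;
  if (n < 2) return NaN;
  const mean = monthly.reduce((acc, r) => acc + r, 0) / n;
  const variance = monthly.reduce((acc, r) => acc + (r - mean) ** 2, 0) / (n - 1);
  return ((mean - rfMonthlyPct) / Math.sqrt(variance)) * Math.sqrt(12);
}

// Bull months: benchmark return > 0. Bear months: everything else (including exactly 0).
function regimeSharpes(points: MonthPoint[]) {
  const bull = points.filter((p) => p.benchmarkPct > 0).map((p) => p.strategyPct);
  const bear = points.filter((p) => p.benchmarkPct <= 0).map((p) => p.strategyPct);
  return {
    bullMonths: bull.length,
    bearMonths: bear.length,
    bullSharpe: subsetSharpe(bull),
    bearSharpe: subsetSharpe(bear),
  };
}
```

With 6 bull and 8 bear months, each per-regime Sharpe rests on even fewer observations than the headline figure, so the two values are best read as descriptive rather than as a tested difference.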
Honest disclosure
- Sample size: 14 monthly observations is thin. The CI95 on the Sharpe spans more than 4 Sharpe units. A single bad month would have flipped the point estimate negative.
- Selection-bias deflation: the deflated Sharpe (-0.03) is essentially zero. We tested 28 variants on the same OOS window. The expected best Sharpe under the null hypothesis (no edge) for N=28, T=14 is approximately √(2·ln(28) / 14) ≈ 0.69. Our observed Sharpe of 0.66 is just below that threshold.
- Data quality: CNMV mean returns of +239 % in 2025 indicate penny-stock noise we cannot trust; excluding them is defensible but is also a free parameter (we chose the exclusion after observing the distribution).
- Survivorship: rows are sourced from the Yahoo `chart` endpoint, which drops delisted/404 symbols. Losing trades on delisted issuers are systematically absent.
- Transaction cost: 0.6 % is an estimate. Real retail brokers vary from 0.2 % (best execution US large-cap) to 1.5 %+ (illiquid CONSOB / SSE small-caps with wide bid/ask). The number we publish is mid-range.
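The selection-bias arithmetic quoted in the list above can be checked in a few lines. The function names are hypothetical. The formula is the simplified subtraction stated in the Methodology section, which does not model skewness or kurtosis of the returns.

```typescript
// Approximate best Sharpe expected from nTrials zero-edge variants over tObs observations.
function expectedMaxSharpeUnderNull(nTrials: number, tObs: number): number {
  return Math.sqrt((2 * Math.log(nTrials)) / tObs);
}

// Deflated Sharpe as defined in this audit: observed Sharpe minus the null threshold.
function deflatedSharpe(observed: number, nTrials: number, tObs: number): number {
  return observed - expectedMaxSharpeUnderNull(nTrials, tObs);
}

const nullThreshold = expectedMaxSharpeUnderNull(28, 14);
const deflated = deflatedSharpe(0.66, 28, 14);
console.log(nullThreshold.toFixed(2), deflated.toFixed(2)); // 0.69 -0.03
```

Because the threshold shrinks with √T, the same 0.66 Sharpe would clear it only with roughly one more month of data, all else equal: at T=16 the threshold is about 0.65.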
What changed in production code
- `src/lib/signals.ts`: added `computeV13Score()` + `V13Inputs` type + `V13_ENSEMBLE_OOS` constants block. Production `signalScore` field is unchanged (V12-equivalent) so existing ranking surfaces are unaffected until a downstream consumer opts into V13.
- `src/lib/winning-strategy.ts`: `STRATEGY_PROOF.oosResults` and `STRATEGY_PROOF.monthlyPortfolio` refreshed to V13_ensemble numbers (Sharpe 0.66, CI95 [-1.54, 2.32], DSR -0.03, CAGR +25.5 %, MaxDD -24.8 %). `disclosure.multipleTestingRisk` updated to disclose the V13_ensemble bake.
- `scripts/_scoring-iterate.ts`: new; full iteration framework. Run with `npx tsx scripts/_scoring-iterate.ts`. Writes `/tmp/scoring-iterate-results.json`.
- `docs/method-review/60-scoring-iteration-2026-05-18.md`: this audit.
What did NOT change
- The production `signalScore` value persisted in the DB is unchanged.
- Sigma BUY / SELL recommendation engine is unchanged.
- /performance page and /methodologie copy are unchanged except for the OOS Sharpe / CAGR figures sourced from `STRATEGY_PROOF.oosResults`.
Reproducibility
```sh
npx tsx scripts/_scoring-iterate.ts
# outputs /tmp/scoring-iterate-results.json
```
Inputs are pinned by DB snapshot. CAC40 monthly returns are fetched live from Yahoo at run time; values may drift by a few basis points between runs.
Linked
- `docs/method-review/56-scoring-variant-bake-2026-05-18.md` (the in-sample variant bake that produced V11 / V12)
- `docs/method-review/43-scoring-rerun-2026-05.md` (the prior OOS rerun whose numbers V13 replaces in `STRATEGY_PROOF.oosResults`)
- `docs/method-review/57-financial-coherence-2026-05-18.md` (universe vs subset coherence audit)