80 · Geographic Regime Multiplier · V13.3 Bake-Off
Date: 2026-05-19
Status: REJECTED (overfit, kept V13.1g baseline)
Author: bake-off (scripts/_v13_bakeoff/geo-regime-stats.ts + bake-v13_3.ts)
Hypothesis
Certain markets appear to deliver systematically higher alpha than others (JP top picks 1Y +258 %, FR +30 %, US lower). Codify this explicitly with a per-market scoring multiplier in [0.95, 1.15].
Phase 1 · Per-market baseline (BUY, priced T+90 / T+365)
In-sample (IS) window: pub 2021-01-01 → 2024-12-31. Source: BacktestResult ∩ Declaration ∩ Company. n_is / n_oos are sample counts; winR, med90, med365 and mean365 are in %.
| market | n_is | n_oos | winR_is % | med90_is | med365_is | mean365_is | med90_oos | med365_oos |
|---|---|---|---|---|---|---|---|---|
| XSTO | 6 | 3464 | 100.0 | 21.55 | 61.55 | 61.12 | -0.61 | 1.17 |
| XKRX | 116 | 0 | 25.9 | -0.61 | 14.48 | 6.86 | 0.00 | 0.00 |
| XTKS | 8535 | 0 | 53.5 | 1.36 | 13.13 | 17.88 | 0.00 | 0.00 |
| XMIL | 787 | 583 | 48.5 | -0.49 | 10.02 | 10.76 | -1.69 | -5.35 |
| XMAD | 1607 | 579 | 56.7 | 1.66 | 8.41 | 540.77 | 3.64 | 38.95 |
| XSHE | 1636 | 180 | 54.3 | 1.99 | 2.60 | 12.37 | 2.23 | 2.29 |
| XSHG | 1631 | 249 | 45.6 | -1.83 | 2.28 | 11.27 | 5.05 | 51.84 |
| BVMF | 31038 | 11296 | 44.0 | -1.48 | -2.10 | -2.56 | 1.68 | 37.44 |
| XPAR | 11581 | 3590 | 42.8 | -2.13 | -3.55 | -1.84 | 0.00 | -4.83 |
| XSWX | 494 | 0 | 49.4 | 0.00 | -3.74 | 1.28 | 0.00 | 0.00 |
| XHEL | 1455 | 482 | 41.0 | -2.08 | -4.67 | -0.86 | 0.46 | -3.38 |
| XBOM | 5203 | 0 | 66.0 | 6.15 | -6.73 | 4.89 | 0.00 | 0.00 |
| XWBO | 617 | 61 | 40.0 | -2.38 | -7.14 | -3.57 | 0.86 | 30.95 |
| XNAS | 1194 | 25 | 26.9 | -9.52 | -10.69 | 3.07 | 9.40 | 43.68 |
| XCSE | 361 | 82 | 43.2 | -3.58 | -11.82 | -8.89 | 2.35 | 9.49 |
| XAMS | 100 | 80 | 38.0 | -3.61 | -20.52 | -14.09 | 4.27 | 27.65 |
(small-n markets XBRU/XASX/XOSL/XLON/XETR/XSES/XHKG/XTSE/XNZE/XSAU/XJSE = no IS sample, default 1.00x; OOS columns read 0.00 where n_oos = 0, meaning no OOS sample rather than a zero return)
Phase 2 · Calibrated multipliers
Rules (conservative, band [0.95, 1.15]):
- winR ≥ 55 % AND med90 > 5 % → 1.10
- winR ≥ 50 % AND med90 > 0 → 1.05
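- n_is < 200 → 1.00 (no info; takes precedence over the two rules above, so XSTO, XKRX and XAMS stay at 1.00 and are omitted from the table below)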
- else → 0.95
| market | mult | reason |
|---|---|---|
| XBOM | 1.10 | winR 66.0 med90 6.15 |
| XTKS | 1.05 | winR 53.5 med90 1.36 |
| XMAD | 1.05 | winR 56.7 med90 1.66 |
| XSHE | 1.05 | winR 54.3 med90 1.99 |
| XMIL | 0.95 | winR 48.5 med90 -0.49 |
| XSHG | 0.95 | winR 45.6 med90 -1.83 |
| BVMF | 0.95 | winR 44.0 med90 -1.48 |
| XPAR | 0.95 | winR 42.8 med90 -2.13 |
| XSWX | 0.95 | winR 49.4 med90 0.00 |
| XHEL | 0.95 | winR 41.0 med90 -2.08 |
| XWBO | 0.95 | winR 40.0 med90 -2.38 |
| XNAS | 0.95 | winR 26.9 med90 -9.52 |
| XCSE | 0.95 | winR 43.2 med90 -3.58 |
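The rules above can be sketched as a small function. This is a minimal illustration, not the code in geo-regime-stats.ts: the interface, field names and `regimeMultiplier` are hypothetical, and the sketch assumes the small-sample rule is checked first, which is the only ordering consistent with XSTO (n_is = 6, winR 100 %) not receiving 1.10.

```typescript
// Hypothetical shape for one row of the Phase 1 table (IS columns only).
interface MarketStats {
  market: string;
  nIs: number;        // n_is, in-sample count
  winRatePct: number; // winR_is, in %
  med90Pct: number;   // med90_is, in %
}

const MIN_SAMPLE = 200;

function regimeMultiplier(s: MarketStats): number {
  // Assumption: the small-sample guard runs before the performance rules.
  if (s.nIs < MIN_SAMPLE) return 1.0;
  if (s.winRatePct >= 55 && s.med90Pct > 5) return 1.1;
  if (s.winRatePct >= 50 && s.med90Pct > 0) return 1.05;
  return 0.95;
}

// Four rows copied from the Phase 1 table.
const sampleMarkets: MarketStats[] = [
  { market: "XBOM", nIs: 5203, winRatePct: 66.0, med90Pct: 6.15 },
  { market: "XMAD", nIs: 1607, winRatePct: 56.7, med90Pct: 1.66 },
  { market: "XSWX", nIs: 494, winRatePct: 49.4, med90Pct: 0.0 },
  { market: "XSTO", nIs: 6, winRatePct: 100.0, med90Pct: 21.55 },
];

for (const s of sampleMarkets) {
  console.log(s.market, regimeMultiplier(s).toFixed(2));
}
```

XMAD clears the 55 % win-rate bar but not the med90 > 5 % bar, so it falls through to 1.05. XSWX misses both thresholds (med90 is exactly 0, not above it) and lands on 0.95.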
Note · med365 flips sign between IS and OOS on several markets: BVMF -2.10 → +37.44, XNAS -10.69 → +43.68, XAMS -20.52 → +27.65. XPAR, by contrast, is stable (-3.55 → -4.83). This is the first red flag.
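A sign-flip check of the kind described in the note could look like the sketch below. The type and function names are hypothetical, and the sketch assumes a market with n_oos = 0 should be reported as "not evaluable" rather than compared against its 0.00 placeholder.

```typescript
// Hypothetical row: med365 in % for both windows, plus the OOS sample count.
interface Med365Row {
  market: string;
  nOos: number;
  isPct: number;
  oosPct: number;
}

// Returns true/false when both windows have data, null when OOS is empty.
function signFlipped(r: Med365Row): boolean | null {
  if (r.nOos === 0) return null; // assumption: 0.00 with n_oos = 0 means "no data"
  return Math.sign(r.isPct) * Math.sign(r.oosPct) < 0;
}

// Values copied from the Phase 1 table.
const med365Rows: Med365Row[] = [
  { market: "XPAR", nOos: 3590, isPct: -3.55, oosPct: -4.83 },
  { market: "BVMF", nOos: 11296, isPct: -2.1, oosPct: 37.44 },
  { market: "XNAS", nOos: 25, isPct: -10.69, oosPct: 43.68 },
  { market: "XAMS", nOos: 80, isPct: -20.52, oosPct: 27.65 },
  { market: "XTKS", nOos: 0, isPct: 13.13, oosPct: 0.0 },
];

for (const r of med365Rows) {
  console.log(r.market, signFlipped(r));
}
```

Treating empty OOS samples as null keeps markets such as XTKS from being counted as either stable or flipped.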
Phase 3 · V13.3 OOS Bake-Off
Pipeline: identical to the V13.1 bake (top-10 picks/month, T+90 hold, net of 0.6 % round-trip cost, returns winsorised at ±50 %). OOS window: pub 2025-01-01 → 2026-05-19. T = 14 monthly buckets. DSR uses an N=14 trial cohort (11 V13.1 variants + 3 new V13.3 variants).
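The per-pick treatment in the pipeline line above can be sketched as follows. This is an illustration under stated assumptions, not the logic in bake-v13_3.ts: it assumes the gross return is clipped first and the cost deducted afterwards, and that the picks in a bucket are equal-weighted. All names and the input values are hypothetical.

```typescript
const ROUND_TRIP_COST_PCT = 0.6; // from the pipeline description
const WINSOR_LIMIT_PCT = 50;     // from the pipeline description

// Clip a return (in %) to the symmetric band [-limit, +limit].
function winsorize(pct: number, limit: number): number {
  return Math.max(-limit, Math.min(limit, pct));
}

// Assumption: winsorise the gross return, then subtract the round-trip cost.
function netPickReturn(grossPct: number): number {
  return winsorize(grossPct, WINSOR_LIMIT_PCT) - ROUND_TRIP_COST_PCT;
}

// Assumption: a monthly bucket is the equal-weighted mean of its picks.
function bucketReturn(grossPcts: number[]): number {
  if (grossPcts.length === 0) throw new Error("empty bucket");
  const total = grossPcts.reduce((acc, g) => acc + netPickReturn(g), 0);
  return total / grossPcts.length;
}

// Hypothetical three-pick bucket: the +120 % outlier is clipped to +50 %.
console.log(bucketReturn([120, -5, 10]).toFixed(2));
```

With 14 buckets of 10 picks each, this layout yields the 140 picks reported per config.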
| config | T | picks | Sharpe | CI95 (boot) | CAGR % | MaxDD % | Win % | DSR |
|---|---|---|---|---|---|---|---|---|
| V13.1g_baseline | 14 | 140 | 0.70 | [-0.96, 3.51] | 25.4 | -29.7 | 55.7 | 0.28 |
| V13.3_geo | 14 | 140 | 0.48 | [-1.24, 3.03] | 15.2 | -32.7 | 54.3 | 0.06 |
| V13.3_geo_conservative | 14 | 140 | 0.70 | [-1.05, 3.54] | 25.2 | -29.8 | 55.7 | 0.27 |
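The width of the CI95 (boot) column is easier to read with the mechanics in view. Below is a minimal percentile-bootstrap sketch over monthly bucket returns. It is not the bake-off implementation: the resampling scheme, the zero risk-free rate, the sqrt(12) annualisation, the seed and the sample returns are all assumptions made for illustration.

```typescript
// Small seeded PRNG (mulberry32) so resampling is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Sharpe = mean / sample std, scaled by sqrt(periodsPerYear).
// Assumption: zero risk-free rate.
function sharpe(returns: number[], periodsPerYear = 12): number {
  const n = returns.length;
  const mean = returns.reduce((a, b) => a + b, 0) / n;
  const variance = returns.reduce((a, b) => a + (b - mean) ** 2, 0) / (n - 1);
  return (mean / Math.sqrt(variance)) * Math.sqrt(periodsPerYear);
}

// Percentile bootstrap: resample buckets with replacement, recompute Sharpe,
// and take the 2.5th and 97.5th percentiles of the resampled statistics.
function bootstrapSharpeCi(returns: number[], draws = 2000, seed = 42): [number, number] {
  const rand = mulberry32(seed);
  const stats: number[] = [];
  for (let d = 0; d < draws; d++) {
    const resample = returns.map(() => returns[Math.floor(rand() * returns.length)]);
    const s = sharpe(resample);
    if (Number.isFinite(s)) stats.push(s); // skip degenerate zero-variance draws
  }
  stats.sort((x, y) => x - y);
  const lo = stats[Math.floor(0.025 * (stats.length - 1))];
  const hi = stats[Math.ceil(0.975 * (stats.length - 1))];
  return [lo, hi];
}

// Hypothetical 14 monthly bucket returns in %, not the bake-off data.
const monthly = [4.1, -6.3, 2.2, 9.8, -3.5, 1.0, 7.4, -12.1, 5.6, 3.3, -1.9, 8.7, -4.4, 6.0];
const [ciLo, ciHi] = bootstrapSharpeCi(monthly);
console.log(sharpe(monthly).toFixed(2), ciLo.toFixed(2), ciHi.toFixed(2));
```

With only T = 14 observations, each resample can differ substantially from the original series, which is consistent with every interval in the table spanning zero.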
Decision · REJECTED
- V13.3_geo: Sharpe drops 0.22 (0.70 → 0.48), DSR drops 0.22 (0.28 → 0.06), MaxDD deepens by 3.0 pp (-29.7 → -32.7), Win % drops 1.4 pp (55.7 → 54.3). The anti-overfit guard rejects on a DSR drop > 0.3; at 0.22 the guard does not trip on its own, so the result is borderline. However, med365 also flips sign IS→OOS on BVMF, XNAS, XAMS and XWBO (the Phase 1 table shows the same for XMIL and XCSE), which makes the IS calibration unreliable. Reject.
- V13.3_geo_conservative (band narrowed to [0.97, 1.05]): Sharpe flat (0.70 → 0.70), DSR -0.01 (0.28 → 0.27). Effectively a no-op once the multiplier is squeezed below the noise floor, so there is no reason to add the complexity.
Kept baseline: V13.1g (Sharpe 0.70 / DSR 0.28).
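The comparison behind the decision can be reproduced from the Phase 3 table. The sketch below is hypothetical (interface, function and constant names are illustrative) and implements only the stated guard, a DSR drop greater than 0.3; the sign-flip evidence is a separate, manual input to the decision.

```typescript
// Hypothetical row shape for the Phase 3 table.
interface BakeRow {
  config: string;
  sharpe: number;
  dsr: number;
}

const DSR_DROP_REJECT = 0.3; // guard threshold quoted in the decision

function compareToBaseline(baseline: BakeRow, candidate: BakeRow) {
  const sharpeDelta = candidate.sharpe - baseline.sharpe;
  const dsrDrop = baseline.dsr - candidate.dsr;
  return { config: candidate.config, sharpeDelta, dsrDrop, guardTripped: dsrDrop > DSR_DROP_REJECT };
}

// Values copied from the Phase 3 table.
const baselineRow: BakeRow = { config: "V13.1g_baseline", sharpe: 0.7, dsr: 0.28 };
const geoRow: BakeRow = { config: "V13.3_geo", sharpe: 0.48, dsr: 0.06 };
const conservativeRow: BakeRow = { config: "V13.3_geo_conservative", sharpe: 0.7, dsr: 0.27 };

for (const row of [geoRow, conservativeRow]) {
  const c = compareToBaseline(baselineRow, row);
  console.log(c.config, c.sharpeDelta.toFixed(2), c.dsrDrop.toFixed(2), c.guardTripped);
}
```

Neither variant trips the guard mechanically. The rejection of V13.3_geo rests on the combination of a 0.22 DSR drop and the IS→OOS sign instability.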
Root cause of overfit
- Per-market baselines are computed on the same population of insider trades that the score then re-ranks. The multiplier therefore compresses or stretches scores that are already conditioned on the same data.
- IS→OOS regime instability: med365 flips sign between IS and OOS on BVMF (the largest sample, n_is = 31k), XNAS, XAMS and XWBO. XPAR (n_is = 11.5k) stays negative in both windows. Where the sign flips, a multiplier calibrated on IS is anti-predictive.
- The 1Y top-pick spread cited in the hypothesis (JP +258 %, FR +30 %) is driven by heavy right tails that inflate means but not medians (compare XMAD: mean365 540.77 vs med365 8.41). The top of the distribution does not generalise to the behaviour of a top-10 monthly portfolio.
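The mean-versus-median point in the last bullet is easy to see on a toy sample. The numbers below are hypothetical, chosen only to mimic a distribution with one extreme winner.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function median(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b); // copy so the input is not mutated
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical 1Y returns in %: four ordinary trades and one extreme outlier.
const oneYearReturns = [-5, 2, 8, 10, 2500];

console.log(mean(oneYearReturns));   // 503
console.log(median(oneYearReturns)); // 8
```

A single outlier moves the mean by two orders of magnitude while leaving the median untouched, which is why a mean-based spread is a poor guide to what a top-10 monthly portfolio typically earns.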
What is kept
- Per-market stats table (this doc) for reference and editorial copy.
- Scripts scripts/_v13_bakeoff/geo-regime-stats.ts and bake-v13_3.ts retained for future re-tests when the sample grows.
- src/lib/signals.ts unchanged. No new MARKET_REGIME_MULT map shipped.
Cross-refs
- docs/method-review/78-quant-per-market-sector-2026-05-19.md (V13.1g winner)
- scripts/_v13_bakeoff/bake.ts (V13.1 ensemble bake)