V15 pattern-boost stack · audit (2026-05-20)

TL;DR

Ship verdict: KEEP V14e_tuned_light. V15 produces zero improvement over V14e across all 5 variants tested. Pattern boost does not move the needle.

Why V15 fails (mechanically)

The pattern miner (see 96-top-patterns-2026-05-20.md) surfaced 11 stable patterns. 10 of the 11 are signatures dominated by KIND_direct + EP_0 + PMC_<0.01, i.e. small direct buys with no earnings proximity. These rows have very low V13.1g base scores (no role bonus, no cluster bonus, no pctMcap bonus) and fall systematically outside the top-10 monthly selection.

Multiplying a near-zero base score by 1.05-1.40 still produces a near-zero post-boost score, so the boost never displaces a real top-10 pick. Result: V15 picks exactly the same 140 declarations as V14e (top-10 per month), and Sharpe is identical to 4 decimals.

Match rate: 4 168 of 60 735 rows (6.9%) match a stable pattern. Of those, almost none survive the top-10 cut.
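
The mechanism above can be illustrated with a toy month. This is a minimal sketch, not the bake-off code: the Row shape, the field names and the scores are hypothetical, and top-3 stands in for top-10 to keep the example short. It applies the largest multiplier in the 1.05-1.40 range to pattern-matched rows with low base scores and shows that the selection does not change.

```typescript
// Hypothetical row shape; field names are illustrative, not the real schema.
interface Row {
  id: string;
  baseScore: number;
  matchesPattern: boolean;
}

// Rank rows by score (descending) and return the ids of the top n.
function topN(rows: Row[], score: (r: Row) => number, n: number): string[] {
  return rows
    .slice()
    .sort((a, b) => score(b) - score(a))
    .slice(0, n)
    .map((r) => r.id);
}

// Toy month: three strong rows plus two pattern-matched rows with low base scores.
const month: Row[] = [
  { id: "A", baseScore: 5.2, matchesPattern: false },
  { id: "B", baseScore: 4.1, matchesPattern: false },
  { id: "C", baseScore: 3.3, matchesPattern: false },
  { id: "P1", baseScore: 0.4, matchesPattern: true },
  { id: "P2", baseScore: 0.2, matchesPattern: true },
];

const baseline = topN(month, (r) => r.baseScore, 3);

// Multiplicative boost at the top of the 1.05-1.40 range: 0.4 * 1.4 = 0.56,
// still far below the weakest selected row (3.3).
const boosted = topN(month, (r) => r.baseScore * (r.matchesPattern ? 1.4 : 1), 3);

console.log(baseline, boosted); // both are ["A", "B", "C"]
```

A multiplicative boost can only change the selection when a matched row already sits within a factor of 1.4 of the cut-off score, which is not the case for the low-base-score rows described above.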

OOS results (2025+, Top-10/mo, T+90, NET 0.6% RT, winsor +/-50%)

config	Sharpe	CI95	CAGR%	DSR
V14e_baseline	1.39	[-0.45, 4.83]	50.3	0.29
V15a_full +0.05	1.39	[-0.45, 4.83]	50.3	0.29
V15b_strict +0.05	1.39	[-0.45, 4.83]	50.3	0.29
V15c_top3 +0.05	1.39	[-0.45, 4.83]	50.3	0.29
V15d_full +0.03	1.39	[-0.45, 4.83]	50.3	0.29
V15e_full +0.10	1.39	[-0.45, 4.83]	50.3	0.29

Note: the V14e Sharpe of 1.39 here, versus 1.31 in the V14 bake, is reproducibility variance from a slightly different row-count snapshot (60 735 vs 59 952 rows) and the additional sig-computation step. Both values pass the gate.

Ship gate

check	threshold	actual	pass
Sharpe	>= 1.31	1.39	y (but identical to baseline)
Sharpe > V14e baseline	strict	1.39 == 1.39	no
DSR drop	<= 0.30	0	y
CI95Lo	>= -2.0	-0.45	y

Verdict: KEEP V14e_tuned_light LIVE. V15 brings no incremental value.
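
The gate table can be read as four boolean checks that must all pass. The sketch below is one possible encoding, not the actual gate script: the Stats shape and function names are hypothetical, and "DSR drop" is assumed to mean the absolute difference baseline DSR minus candidate DSR.

```typescript
// Hypothetical summary stats for one config; names are illustrative.
interface Stats {
  sharpe: number;
  dsr: number;
  ci95Lo: number;
}

interface GateCheck {
  name: string;
  pass: boolean;
}

// Evaluate the four ship-gate checks for a candidate against the live baseline.
function shipGate(candidate: Stats, baseline: Stats): GateCheck[] {
  return [
    { name: "Sharpe >= 1.31", pass: candidate.sharpe >= 1.31 },
    { name: "Sharpe > baseline (strict)", pass: candidate.sharpe > baseline.sharpe },
    // Assumption: the drop is measured as an absolute DSR difference.
    { name: "DSR drop <= 0.30", pass: baseline.dsr - candidate.dsr <= 0.3 },
    { name: "CI95Lo >= -2.0", pass: candidate.ci95Lo >= -2.0 },
  ];
}

// Values taken from the OOS results table above.
const v14e: Stats = { sharpe: 1.39, dsr: 0.29, ci95Lo: -0.45 };
const v15a: Stats = { sharpe: 1.39, dsr: 0.29, ci95Lo: -0.45 };

const checks = shipGate(v15a, v14e);
const ship = checks.every((c) => c.pass);

console.log(checks, ship); // ship is false: the strict improvement check fails
```

Because every V15 variant reproduces the baseline numbers, the strict comparison is the only failing check, and it is sufficient to block the ship.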

Why the pattern miner produced unusable boosts

Two structural issues:

Look-ahead bias. The miner was fed the full 2023-now cohort and asked which signatures had high mean returns. By construction, the discovered patterns are best-case in-sample. We did not split mining IS / validation OOS. Even if V15 had moved Sharpe, we could not have trusted the lift.
Bucket-mean optimization is orthogonal to top-10 selection. The scoring stack picks the top-10 ranked rows per month, not the 10 rows with the highest expected return. A pattern with mean 47% but n=31 spread over 24 months contributes ~1 row/month, and that row needs to outrank the existing top-10 to matter. Boosting a mid-rank row by 5% rarely changes its rank when score variance is high.
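
The missing IS/OOS split from the first issue can be sketched as follows. This is an illustration under stated assumptions, not the behavior of mine-patterns.ts: the Decl shape, the function names, the toy data and the 5% mean-return threshold are all hypothetical. Patterns are mined on rows published before the cutoff and kept only if the bucket mean also holds on rows published on or after it.

```typescript
// Hypothetical declaration row; field names are illustrative.
interface Decl {
  pub: string; // publication date, ISO YYYY-MM-DD
  signature: string; // pattern signature
  ret: number; // fractional T+90 return
}

interface BucketStats {
  n: number;
  mean: number;
}

// Mean return per signature.
function bucketMeans(rows: Decl[]): Record<string, BucketStats> {
  const sums: Record<string, { n: number; sum: number }> = {};
  for (const r of rows) {
    const s = sums[r.signature] || (sums[r.signature] = { n: 0, sum: 0 });
    s.n += 1;
    s.sum += r.ret;
  }
  const out: Record<string, BucketStats> = {};
  for (const sig of Object.keys(sums)) {
    out[sig] = { n: sums[sig].n, mean: sums[sig].sum / sums[sig].n };
  }
  return out;
}

// Mine on IS (pub < cutoff), then keep only signatures whose mean also holds OOS.
// ISO dates compare correctly as plain strings.
function patternsThatHoldOos(rows: Decl[], cutoff: string, minMean: number): string[] {
  const isStats = bucketMeans(rows.filter((r) => r.pub < cutoff));
  const oosStats = bucketMeans(rows.filter((r) => r.pub >= cutoff));
  return Object.keys(isStats).filter(
    (sig) =>
      isStats[sig].mean >= minMean &&
      oosStats[sig] !== undefined &&
      oosStats[sig].mean >= minMean,
  );
}

// Toy data: X and Y both look good in-sample, only X holds out-of-sample.
const decls: Decl[] = [
  { pub: "2024-03-01", signature: "X", ret: 0.3 },
  { pub: "2024-06-01", signature: "X", ret: 0.2 },
  { pub: "2024-09-01", signature: "Y", ret: 0.4 },
  { pub: "2025-02-01", signature: "X", ret: 0.1 },
  { pub: "2025-03-01", signature: "Y", ret: -0.15 },
];

const survivors = patternsThatHoldOos(decls, "2025-01-01", 0.05);
console.log(survivors); // ["X"]
```

Signature Y has the highest in-sample mean but a negative OOS mean, which is the failure mode that mining on the full cohort cannot detect.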

What would actually work (V16+ proposals)

1. IS-only pattern mining: rerun mine-patterns.ts filtering pub < 2025-01-01, then validate the surfaced patterns OOS. Ship boosts only for patterns that hold OOS.
2. Ranking-aware boost: instead of a multiplicative boost on the base score, add a fixed score delta sized to actually move rankings (e.g. +1.0 per matched pattern, which for a typical V13.5 score in [2, 6] is a meaningful bump).
3. Pattern-conditioned alternative selection: bypass top-10 ranking entirely for rows matching ultra-high-confidence patterns (n>=100, OOS mean>=15%, OOS win-rate>=60%); add 1-2 "guaranteed" picks per month from these.
4. Cross-validate with role / kind dimensions that the V14 backfill will refresh. The spouse/child/parent cohort will grow; rerun mining then.
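
The difference between the current multiplicative boost and the ranking-aware delta (the second proposal) can be sketched on a toy month. The ScoredRow shape and the scores are hypothetical, chosen inside the [2, 6] range quoted above, and top-3 again stands in for top-10.

```typescript
// Hypothetical scored row; field names are illustrative.
interface ScoredRow {
  id: string;
  score: number;
  matched: boolean; // true if the row matches a stable pattern
}

// Return the ids of the top n rows under a given boosted-score function.
function top(rows: ScoredRow[], n: number, boosted: (r: ScoredRow) => number): string[] {
  return rows
    .slice()
    .sort((a, b) => boosted(b) - boosted(a))
    .slice(0, n)
    .map((r) => r.id);
}

const rows: ScoredRow[] = [
  { id: "A", score: 5.2, matched: false },
  { id: "B", score: 4.1, matched: false },
  { id: "C", score: 3.3, matched: false },
  { id: "P", score: 2.6, matched: true },
];

// Multiplicative +5%: P goes from 2.6 to 2.73 and stays below C (3.3).
const multiplicative = top(rows, 3, (r) => r.score * (r.matched ? 1.05 : 1));

// Additive +1.0: P goes from 2.6 to 3.6 and displaces C.
const additive = top(rows, 3, (r) => r.score + (r.matched ? 1.0 : 0));

console.log(multiplicative); // ["A", "B", "C"]
console.log(additive); // ["A", "B", "P"]
```

The size of the delta is a tuning choice: it has to be large relative to the score gap at the cut-off, which is why it should only be applied to patterns that have been validated OOS.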

Action

Do not ship V15. V14e remains live.
Re-run pattern mining on IS-only after the data backfill completes.
Prototype V16 with score-delta boost (proposal 2 above) before next sprint.

Files

scripts/_v13_bakeoff/bake-v15_patterns.ts
scripts/_v13_bakeoff/stats-v15.json
scripts/_v13_bakeoff/patterns-stable.json
/tmp/bake-v15.log
