13 · Feature Engineering: Unexploited Transaction Features
Generated: 2026-05-15
TL;DR
7 candidate features · 6 tested in-sample, 1 skipped (data unavailable) · 3 carried into walk-forward · 0 passed walk-forward OOS validation.
The recommendation engine (src/lib/recommendation-engine.ts) is unchanged.
No bucket dimensions added, no multiplicative bonuses applied. Cache key
reco-bucket-stats-v7 is retained.
Honest result: in-sample bucket spreads on 1,000 random declarations looked promising for three features (drawdown context, block-vs-scattered, sector momentum), but none of them survived strict walk-forward (24m train / 12m test, 2 OOS folds available within the 2021-12 → 2026-01 dataset).
Features tested
| # | Name | Definition | Status |
|---|---|---|---|
| 1 | drawdownCtx | priceAtTrade position in [0,1] vs trailing 365d company priceAtTrade min/max (from prior BacktestResult rows of the same company; strictly PIT) | tested, rejected |
| 2 | earningsProximity | days to next earnings | SKIPPED: no Company.nextEarnings field in prisma/schema.prisma |
| 3 | priorCount | # prior decls by same (insiderName, companyId) before pubDate | tested in diagnostic, dropped before WF (non-monotonic) |
| 4 | blockCluster | # other filings by same (insiderName, companyId) within ±7d | tested, rejected |
| 5 | netStepPp | totalAmount / marketCap * 100 (size of incremental position step) | tested in diagnostic, dropped before WF (large step → worse in-sample; hypothesis falsified) |
| 6 | daysSinceLast | days since prior filing on companyId (any insider) | tested in diagnostic, dropped before WF (non-monotonic, sparse buckets) |
| 7 | sectorMom | mean r90 of PRIOR decls in same sectorTag within trailing 90d window; in WF, restricted to decls with pubDate ≤ t − 90d so r90 is realized before scoring time | tested, rejected |
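Two of the features carried into walk-forward use company-level lookbacks: drawdownCtx and blockCluster. The sketch below shows one way to compute them, under stated assumptions. The row shapes (`PriceRow`, `FilingRow`), the epoch-ms `pubMs` field, the `id` field and the choice to return `null` when the 365d band is empty or flat are all hypothetical. None of them are taken from `scripts/extract-transaction-features.mjs`. As defined in the table (±7d), blockCluster also counts filings published *after* the anchor, so it is not strictly point-in-time as written.

```typescript
// Hypothetical row shapes; field names follow the table where possible.
interface PriceRow {
  companyId: string;
  pubMs: number; // publication time in epoch ms (assumed representation)
  priceAtTrade: number;
}

interface FilingRow {
  id: string; // hypothetical unique id, used to exclude the anchor itself
  insiderName: string;
  companyId: string;
  pubMs: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// drawdownCtx: position of the anchor's priceAtTrade inside the [min, max]
// band of the same company's priceAtTrade over the trailing 365 days, using
// only rows published strictly before the anchor (point-in-time).
// Returns null when the band is undefined (no prior rows, or min === max).
function drawdownCtx(anchor: PriceRow, history: readonly PriceRow[]): number | null {
  const windowStart = anchor.pubMs - 365 * DAY_MS;
  const prices = history
    .filter(
      (r) =>
        r.companyId === anchor.companyId &&
        r.pubMs < anchor.pubMs &&
        r.pubMs >= windowStart,
    )
    .map((r) => r.priceAtTrade);
  if (prices.length === 0) return null;
  const lo = Math.min(...prices);
  const hi = Math.max(...prices);
  if (hi === lo) return null;
  // Clamp to [0, 1]: the current price can sit outside the prior band.
  return Math.min(1, Math.max(0, (anchor.priceAtTrade - lo) / (hi - lo)));
}

// blockCluster: count of OTHER filings by the same (insiderName, companyId)
// published within ±7 days of the anchor. The window is symmetric, as in the
// table, so it also counts filings published after the anchor.
function blockCluster(anchor: FilingRow, filings: readonly FilingRow[]): number {
  return filings.filter(
    (f) =>
      f.id !== anchor.id &&
      f.insiderName === anchor.insiderName &&
      f.companyId === anchor.companyId &&
      Math.abs(f.pubMs - anchor.pubMs) <= 7 * DAY_MS,
  ).length;
}
```

Returning `null` keeps decls without usable history out of every bucket. That is one plausible reason the drawdownCtx buckets below cover fewer than 1,000 decls.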
In-sample diagnostic (1,000 random BUY decls, baseline r90 = −1.52%, winRate 41.6%)
Bucket means (r90 % / winRate % / n):
drawdownCtx — spread 4.91pp (monotonic in-sample)
- bottom (0-0.33): n=346, mean=+1.85%, wr=46.8%
- middle (0.33-0.67): n=150, mean=−1.63%, wr=46.0%
- top (0.67-1): n=173, mean=−3.06%, wr=34.1%
priorCount — spread 2.84pp (non-monotonic)
- firstTime (0): mean=−2.68%
- repeat (1-3): mean=−3.33%
- habitual (4+): mean=−0.49% ← unexpected direction
blockCluster — spread 2.92pp
- single (0): mean=+0.28%, wr=43.1%, n=385
- scattered (1+): mean=−2.64%, wr=40.7%, n=615
netStepPp — spread 6.97pp (small vs large; sparse medium bucket excluded). Large step underperforms, so the hypothesis is falsified.
- small <0.5pp: mean=−2.24%, n=773
- medium 0.5-2pp: mean=+4.81%, n=20 (sparse)
- large >2pp: mean=−9.21%, n=35
daysSinceLast — spread 10.09pp (non-monotonic, sparse)
- active <7d: mean=−2.64%, n=731
- warming 7-30d: mean=+7.45%, n=95
- stale 30-180d: mean=−2.44%, n=106
- first 180+: mean=+4.44%, n=36
sectorMom — spread 21.27pp (large but uses look-ahead through prior r90)
- down <−5: mean=−10.15%, wr=23.1%, n=373
- flat −5..5: mean=+0.32%, wr=48.8%, n=424
- up 5+: mean=+11.12%, wr=61.4%, n=197
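The bucket means and spreads above can be reproduced with a sketch like the following. It rests on several assumptions:

- buckets are half-open `[min, max)`;
- a "win" is `r90 > 0`;
- decls whose feature is undefined (`null`) are dropped;
- spread is the max minus the min bucket mean over buckets with at least `minN` decls.

The report does not define spread. The `minN` filter (30 here, a hypothetical cutoff) is inferred because it makes the netStepPp figure consistent: small vs large gives 6.97pp, while including the n=20 medium bucket would give 14.02pp.

```typescript
interface Decl {
  value: number | null; // feature value; null when the feature is undefined
  r90: number; // 90-day return in %
}

interface Bucket {
  label: string;
  min: number; // inclusive
  max: number; // exclusive; use Infinity for the top bucket
}

interface BucketStat {
  label: string;
  n: number;
  mean: number; // mean r90 in %
  winRate: number; // % of decls with r90 > 0 (assumed win definition)
}

function bucketStats(decls: readonly Decl[], buckets: readonly Bucket[]): BucketStat[] {
  return buckets.map((b) => {
    const rs = decls
      .filter((d) => d.value !== null && d.value >= b.min && d.value < b.max)
      .map((d) => d.r90);
    const n = rs.length;
    const mean = n === 0 ? NaN : rs.reduce((s, r) => s + r, 0) / n;
    const winRate = n === 0 ? NaN : (100 * rs.filter((r) => r > 0).length) / n;
    return { label: b.label, n, mean, winRate };
  });
}

// Spread in pp: max minus min bucket mean, skipping buckets with n < minN
// (minN = 30 is a hypothetical cutoff, not a documented one).
function spread(stats: readonly BucketStat[], minN = 30): number {
  const means = stats.filter((s) => s.n >= minN).map((s) => s.mean);
  if (means.length < 2) return NaN;
  return Math.max(...means) - Math.min(...means);
}
```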
The sectorMom spread is dramatic, but the diagnostic is contaminated by lookahead. It averages the realized r90 of prior decls from the trailing 90d window. Any decl published fewer than 90 days before the anchor's pubDate has not finished its 90-day return window, so its r90 is not yet observable at that pubDate. The walk-forward harness corrects this by requiring prior pubMs ≤ t − 90d.
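The gap between the diagnostic and the walk-forward sectorMom can be made concrete. This is a minimal sketch, assuming decls carry an epoch-ms `pubMs` and a realized `r90` in %. The `lookbackDays` parameter of the strict version is hypothetical: the report only states the `pubMs ≤ t − 90d` cutoff, not how far back the walk-forward window reaches.

```typescript
interface SectorDecl {
  sectorTag: string;
  pubMs: number; // epoch ms
  r90: number; // realized 90-day return in %, only known at pubMs + 90d
}

const DAY_MS = 86_400_000;
const HOLD_MS = 90 * DAY_MS;

function meanOrNull(xs: readonly number[]): number | null {
  return xs.length === 0 ? null : xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Diagnostic (leaky) version: prior same-sector decls in the trailing 90d
// window. Their r90 values are not yet realized at scoring time t.
function sectorMomLeaky(sectorTag: string, t: number, decls: readonly SectorDecl[]): number | null {
  return meanOrNull(
    decls
      .filter((d) => d.sectorTag === sectorTag && d.pubMs < t && d.pubMs >= t - HOLD_MS)
      .map((d) => d.r90),
  );
}

// Strict point-in-time version: only decls with pubMs <= t - 90d, so every r90
// used is realized by t. lookbackDays is a hypothetical window length.
function sectorMomStrict(
  sectorTag: string,
  t: number,
  decls: readonly SectorDecl[],
  lookbackDays = 90,
): number | null {
  const cutoff = t - HOLD_MS;
  const start = cutoff - lookbackDays * DAY_MS;
  return meanOrNull(
    decls
      .filter((d) => d.sectorTag === sectorTag && d.pubMs <= cutoff && d.pubMs >= start)
      .map((d) => d.r90),
  );
}
```

The two functions can disagree completely on the same data. With a decl at t − 10d whose r90 later turns out to be +20%, the leaky version reports +20 even though nothing about that return was knowable at t.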
Walk-forward OOS results (24m train / 12m test, 2 folds, top-10/week, hold 90d)
Strategy = baseline A1 weights (35 signal / 25 winRate / 20 return / 20 recency, 100 pts total). Each variant scales the recency slice down proportionally from 20 to 15 pts and adds a 5-pt feature bonus mapped from the candidate feature, so the total stays at 100.
| Strategy | n picks | mean r90 (%) | winRate (%) | Sharpe_ann | Δ winRate (pp) | Δ Sharpe_ann | Δ mean r90 (pp) | p (perm, 800 iter) | Decision |
|---|---|---|---|---|---|---|---|---|---|
| baseline (A1) | 1,060 | 4.14 | 50.9 | 0.858 | — | — | — | — | — |
| + drawdownInv (5·(1−ctx)) | 1,060 | 2.24 | 48.5 | −0.527 | −2.4 | −1.385 | −1.90 | 0.056 | REJECT |
| + blockSingle (5 if single) | 1,060 | 2.90 | 49.2 | −0.060 | −1.7 | −0.918 | −1.24 | 0.217 | REJECT |
| + sectorMom (logistic→5) | 1,060 | 4.12 | 51.6 | 0.701 | +0.7 | −0.157 | −0.02 | 0.983 | REJECT |
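The table does not spell out how baseline and variant scores relate. This sketch assumes each component score is pre-normalized to [0, 1]. It reads "remove 5pts proportionally from the recency slice" as scaling the recency weight from 20 to 15, so the total stays at 100. That reading, the field names and the logistic scale `scalePp` are assumptions. This is not the code in `src/lib/recommendation-engine.ts`.

```typescript
// Component scores, each assumed pre-normalized to [0, 1].
interface Components {
  signal: number;
  winRate: number;
  ret: number;
  recency: number;
}

// Baseline A1 slice weights from the report (points out of 100).
const A1 = { signal: 35, winRate: 25, ret: 20, recency: 20 };

function baselineScore(c: Components): number {
  return A1.signal * c.signal + A1.winRate * c.winRate + A1.ret * c.ret + A1.recency * c.recency;
}

// Variant: recency slice scaled from 20 to 15 pts, plus a feature bonus clamped to [0, 5].
function variantScore(c: Components, bonus: number): number {
  const clamped = Math.min(5, Math.max(0, bonus));
  return A1.signal * c.signal + A1.winRate * c.winRate + A1.ret * c.ret + 15 * c.recency + clamped;
}

// Bonus mappings named in the Strategy column.
const drawdownInvBonus = (ctx: number): number => 5 * (1 - ctx);
const blockSingleBonus = (otherFilings: number): number => (otherFilings === 0 ? 5 : 0);
// Logistic squashing of sector momentum (in %) into (0, 5); scalePp is hypothetical.
const sectorMomBonus = (momPct: number, scalePp = 5): number => 5 / (1 + Math.exp(-momPct / scalePp));
```

Under this reading, a pick with maximal components scores 100 in both schemes when the feature bonus is maxed. A pick with a zero bonus loses 5 pts relative to baseline.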
Validation threshold (a variant must meet both conditions):
- ΔwinRate ≥ +2pp OR ΔSharpe_ann ≥ +0.15
- permutation p-value < 0.10
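The threshold is a two-part rule. The sketch below encodes it as a predicate and replays the three variants from the walk-forward table. The interface and field names are hypothetical; the numbers are copied from the table.

```typescript
interface VariantResult {
  name: string;
  dWinRatePp: number; // Δ winRate vs baseline, percentage points
  dSharpeAnn: number; // Δ annualized Sharpe vs baseline
  pPerm: number; // permutation p-value
}

// Pass only if the variant improves enough on at least one metric
// AND the improvement is significant under the permutation test.
function passesValidation(v: VariantResult): boolean {
  const improves = v.dWinRatePp >= 2 || v.dSharpeAnn >= 0.15;
  return improves && v.pPerm < 0.1;
}

const variants: VariantResult[] = [
  { name: "drawdownInv", dWinRatePp: -2.4, dSharpeAnn: -1.385, pPerm: 0.056 },
  { name: "blockSingle", dWinRatePp: -1.7, dSharpeAnn: -0.918, pPerm: 0.217 },
  { name: "sectorMom", dWinRatePp: 0.7, dSharpeAnn: -0.157, pPerm: 0.983 },
];

for (const v of variants) {
  console.log(`${v.name}: ${passesValidation(v) ? "ACCEPT" : "REJECT"}`);
}
```

drawdownInv has p = 0.056, which is below 0.10, but both deltas are negative. It fails the first clause, so a significant result in the wrong direction is still rejected.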
No variant clears the bar. Two clearly hurt the strategy (drawdownInv, blockSingle). sectorMom is statistically indistinguishable from baseline once strict PIT is enforced (Δmean r90 = −0.02pp, p = 0.983).
Why the diagnostic / walk-forward gap?
- Sector momentum collapses under strict PIT. The 21.27pp in-sample spread shrinks to noise when the proxy can only use prior decls whose r90 was already realized at scoring time. That's the textbook signature of an in-sample artifact masquerading as alpha.
- Drawdown context flips sign in OOS. The "buy near 52w low" pattern (the only monotonic feature in-sample apart from the lookahead-contaminated sectorMom) loses badly in walk-forward (ΔSharpe_ann = −1.385). Likely cause: regime-dependence (the 2022 bear made "bottoming" prices keep falling), and the proxy itself is noisy when based on sparse same-company priceAtTrade history.
- Limited fold count (2) keeps Sharpe_ann estimates unstable. With 4 years of data and 24m train / 12m test, we can only roll twice. More history is the right unlock.
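The fold count follows from simple month arithmetic. This sketch assumes month-granular, half-open windows that step forward by the test length. Under those assumptions the 2021-12 → 2026-01 range yields exactly two folds, testing 2023-12 to 2024-12 and 2024-12 to 2025-12. Extending the end by one year yields a third fold. The helper names are hypothetical.

```typescript
interface Fold {
  trainStart: string; // "YYYY-MM", inclusive
  testStart: string; // first test month (= end of the training window)
  testEnd: string; // exclusive
}

// "YYYY-MM" <-> absolute month index.
function toMonthIdx(ym: string): number {
  const [y, m] = ym.split("-").map(Number);
  return y * 12 + (m - 1);
}

function fromMonthIdx(i: number): string {
  const y = Math.floor(i / 12);
  const m = (i % 12) + 1;
  return `${y}-${String(m).padStart(2, "0")}`;
}

// Roll a (train, test) window forward until the test window would run past `end`.
function walkForwardFolds(
  start: string,
  end: string,
  trainMonths = 24,
  testMonths = 12,
  stepMonths = 12,
): Fold[] {
  const s0 = toMonthIdx(start);
  const e = toMonthIdx(end);
  const folds: Fold[] = [];
  for (let s = s0; s + trainMonths + testMonths <= e; s += stepMonths) {
    folds.push({
      trainStart: fromMonthIdx(s),
      testStart: fromMonthIdx(s + trainMonths),
      testEnd: fromMonthIdx(s + trainMonths + testMonths),
    });
  }
  return folds;
}
```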
Engine changes
None. Per the honesty constraint, when no feature passes the validation
threshold we do not touch the engine. src/lib/recommendation-engine.ts,
src/lib/transaction-features.ts (not created), and the bucket cache key
reco-bucket-stats-v7 are all unchanged.
Reproducibility
# Diagnostic — 1,000 random BUY decls, in-sample bucket means
node --env-file=.env.local scripts/extract-transaction-features.mjs
# → /tmp/feature-extraction.json (regenerate as needed)
# Walk-forward backtest — strict OOS, baseline + 3 variants
node --env-file=.env.local scripts/backtest-feature-additions.mjs
# → /tmp/feature-backtest.json (regenerate as needed)
Seeds: 7 (sample shuffle), 42 (bootstrap CI), 11 (permutation p-values).
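The report gives 800 permutation iterations and seed 11, but not the permutation scheme. This is a minimal sketch under stated assumptions:

- the test is a two-sample label permutation on per-pick r90 values;
- it is two-sided on the difference in means;
- the PRNG is mulberry32;
- the p-value estimator is (k + 1) / (N + 1).

These are our choices, not necessarily the script's. A fixed seed only makes runs reproducible if the script uses its own seeded generator, because `Math.random` cannot be seeded.

```typescript
// mulberry32: tiny deterministic 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function mean(xs: readonly number[]): number {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Two-sided permutation test on the difference in mean r90 between two sets
// of picks. k counts shuffles whose |mean difference| is at least the observed one.
function permutationPValue(
  a: readonly number[],
  b: readonly number[],
  iterations = 800,
  seed = 11,
): number {
  const rand = mulberry32(seed);
  const observed = Math.abs(mean(a) - mean(b));
  const pooled = [...a, ...b];
  let k = 0;
  for (let it = 0; it < iterations; it++) {
    // In-place Fisher-Yates shuffle of the pooled sample.
    for (let i = pooled.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      [pooled[i], pooled[j]] = [pooled[j], pooled[i]];
    }
    const diff = Math.abs(mean(pooled.slice(0, a.length)) - mean(pooled.slice(a.length)));
    if (diff >= observed) k++;
  }
  return (k + 1) / (iterations + 1);
}
```

With 800 iterations, the smallest attainable p-value under this estimator is 1/801 ≈ 0.00125. That is comfortably below the 0.10 threshold, so the iteration count does not limit the decision.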
Recommended next steps
- Add Company.nextEarnings (Yahoo Finance has it) and retest feature #2.
- Wait for ≥1 more year of data to get a 3rd walk-forward fold and tighter Sharpe_ann error bars.
- Re-explore drawdown context using a true historical 52w-band proxy, e.g. backfill BacktestResult.fiftyTwoWeekHighAtPub / …LowAtPub from the Yahoo historical timeseries that already enriches BacktestResult.price*. The same-company prior-priceAtTrade window is too sparse for many decls.
- Re-test sector momentum with an external sector index series (CAC sectorial indices), avoiding the lookahead inherent to using same-cohort r90 as proxy.