13 · Feature Engineering: Unexploited Transaction Features

Generated: 2026-05-15

TL;DR

7 candidate features · 6 tested · 1 skipped (data unavailable) · 0 passed walk-forward OOS validation.

The recommendation engine (src/lib/recommendation-engine.ts) is unchanged. No bucket dimensions added, no multiplicative bonuses applied. Cache key reco-bucket-stats-v7 is retained.

Honest result: in-sample bucket spreads on 1,000 random declarations looked promising for three features (drawdown context, single-vs-scattered filings, sector momentum), but none of them survived strict walk-forward validation (24m train / 12m test, 2 OOS folds available within the 2021-12 → 2026-01 dataset).

Features tested

| # | Name | Definition | Status |
|---|------|------------|--------|
| 1 | `drawdownCtx` | `priceAtTrade` position in [0,1] vs trailing 365d company `priceAtTrade` min/max (from prior `BacktestResult` rows of the same company; strictly PIT) | tested, rejected |
| 2 | `earningsProximity` | days to next earnings | SKIPPED: no `Company.nextEarnings` field in `prisma/schema.prisma` |
| 3 | `priorCount` | # prior decls by same `(insiderName, companyId)` before `pubDate` | tested in diagnostic, dropped before WF (non-monotonic) |
| 4 | `blockCluster` | # other filings by same `(insiderName, companyId)` within ±7d | tested, rejected |
| 5 | `netStepPp` | `totalAmount / marketCap * 100` (size of incremental position step) | tested in diagnostic, dropped before WF (large step → worse in-sample, hypothesis falsified) |
| 6 | `daysSinceLast` | days since prior filing on `companyId` (any insider) | tested in diagnostic, dropped before WF (non-monotonic, sparse buckets) |
| 7 | `sectorMom` | mean r90 of PRIOR decls in same `sectorTag` within trailing 90d window; in WF, restricted to decls with `pubDate ≤ t − 90d` so r90 is realized before scoring time | tested, rejected |
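As an illustration of feature #1, here is a minimal sketch of a point-in-time `drawdownCtx` computation. The `PriorTrade` shape, the `pubMs` field, and the choice to return `null` for an empty or flat window are assumptions made for the example, not the actual `BacktestResult` model or script logic.

```typescript
// Hypothetical row shape; the real BacktestResult model may differ.
interface PriorTrade {
  pubMs: number;        // publication time, epoch milliseconds
  priceAtTrade: number; // price recorded for that declaration
}

const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Position of `price` within the min/max of same-company prices published
 * strictly before `pubMs` and at most 365 days earlier.
 * Returns null when the window is empty or flat (assumed convention).
 */
function drawdownCtx(price: number, pubMs: number, prior: PriorTrade[]): number | null {
  const recent = prior.filter(
    (p) => p.pubMs < pubMs && p.pubMs >= pubMs - 365 * DAY_MS,
  );
  if (recent.length === 0) return null;
  let lo = Infinity;
  let hi = -Infinity;
  for (const p of recent) {
    lo = Math.min(lo, p.priceAtTrade);
    hi = Math.max(hi, p.priceAtTrade);
  }
  if (hi === lo) return null;
  // Clamp: the current price can sit outside the band seen in prior rows.
  return Math.min(1, Math.max(0, (price - lo) / (hi - lo)));
}
```

Filtering on `p.pubMs < pubMs` is what keeps the feature strictly PIT: rows published at or after the anchor never enter the band.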

In-sample diagnostic (1,000 random BUY decls, baseline r90 = −1.52%, winRate 41.6%)

Bucket means (r90 % / winRate % / n):

`drawdownCtx` — spread 4.91pp (monotonic in-sample)

bottom (0-0.33): n=346, mean=+1.85%, wr=46.8%
middle (0.33-0.67): n=150, mean=−1.63%, wr=46.0%
top (0.67-1): n=173, mean=−3.06%, wr=34.1%
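The spread and monotonicity labels can be checked with a few lines. This sketch assumes the spread is the highest bucket mean minus the lowest, which reproduces the 4.91pp figure for `drawdownCtx`; whether the extraction script computes it exactly this way is not confirmed by this report.

```typescript
interface Bucket {
  label: string;
  mean: number; // mean r90 in %
}

// Spread: highest bucket mean minus lowest bucket mean, in percentage points.
function spread(buckets: Bucket[]): number {
  const means = buckets.map((b) => b.mean);
  return Math.max(...means) - Math.min(...means);
}

// True when means only rise or only fall across the ordered buckets.
function isMonotonic(buckets: Bucket[]): boolean {
  let rising = true;
  let falling = true;
  let prev: number | undefined;
  for (const b of buckets) {
    if (prev !== undefined) {
      if (b.mean < prev) rising = false;
      if (b.mean > prev) falling = false;
    }
    prev = b.mean;
  }
  return rising || falling;
}

const drawdownBuckets: Bucket[] = [
  { label: "bottom", mean: 1.85 },
  { label: "middle", mean: -1.63 },
  { label: "top", mean: -3.06 },
];

console.log(spread(drawdownBuckets).toFixed(2), isMonotonic(drawdownBuckets));
```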

`priorCount` — spread 2.84pp (non-monotonic)

firstTime (0): mean=−2.68%
repeat (1-3): mean=−3.33%
habitual (4+): mean=−0.49% ← unexpected direction

`blockCluster` — spread 2.92pp

single (0): mean=+0.28%, wr=43.1%, n=385
scattered (1+): mean=−2.64%, wr=40.7%, n=615

`netStepPp` — spread 6.97pp (large step underperforms — hypothesis falsified)

small <0.5pp: mean=−2.24%, n=773
medium 0.5-2pp: mean=+4.81%, n=20 (sparse; the 6.97pp spread is small minus large)
large >2pp: mean=−9.21%, n=35

`daysSinceLast` — spread 10.09pp (non-monotonic, sparse)

active <7d: mean=−2.64%, n=731
warming 7-30d: mean=+7.45%, n=95
stale 30-180d: mean=−2.44%, n=106
first 180+: mean=+4.44%, n=36

`sectorMom` — spread 21.27pp (large but uses look-ahead through prior r90)

down <-5: mean=−10.15%, wr=23.1%, n=373
flat -5..5: mean=+0.32%, wr=48.8%, n=424
up 5+: mean=+11.12%, wr=61.4%, n=197

The sectorMom spread is dramatic but contaminated by lookahead in the diagnostic (it uses the realized r90 of prior decls, which for very recent prior decls would not yet be observable at the anchor's pubDate). The walk-forward harness corrects this by requiring prior pubMs ≤ t − 90d.
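The PIT restriction described above can be sketched as a filter on publication time. The `PriorDecl` shape and the `lookbackDays` parameter are hypothetical; in particular, how far back the walk-forward window reaches beyond `t − 90d` is an assumption here.

```typescript
// Hypothetical shape for a prior declaration; field names are illustrative.
interface PriorDecl {
  sectorTag: string;
  pubMs: number; // publication time, epoch milliseconds
  r90: number;   // realized 90-day return in %
}

const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Mean r90 of prior same-sector declarations whose 90-day return was already
 * realized at scoring time tMs (pubMs <= tMs - 90d).
 * lookbackDays bounds the window on the old side (assumed value).
 * Returns null when no declaration qualifies.
 */
function sectorMomPit(
  sectorTag: string,
  tMs: number,
  prior: PriorDecl[],
  lookbackDays = 180,
): number | null {
  const latest = tMs - 90 * DAY_MS;
  const earliest = tMs - lookbackDays * DAY_MS;
  let sum = 0;
  let n = 0;
  for (const d of prior) {
    if (d.sectorTag === sectorTag && d.pubMs <= latest && d.pubMs >= earliest) {
      sum += d.r90;
      n += 1;
    }
  }
  return n === 0 ? null : sum / n;
}
```

A declaration published 30 days before `tMs` is excluded even though it is "prior", because its r90 would not be observable yet.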

Walk-forward OOS results (24m train / 12m test, 2 folds, top-10/week, hold 90d)

Strategy = baseline A1 weights (35 signal / 25 winRate / 20 return / 20 recency). Variant strategies remove 5pts proportionally from the recency slice and add a 5pt feature bonus mapped from the candidate feature.
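One way to read the variant construction is sketched below. It assumes component scores normalized to [0, 1], interprets "remove 5pts from the recency slice" as lowering the recency weight from 20 to 15, and uses a placeholder logistic slope `k` for `sectorMom`; none of these details are confirmed by the engine source.

```typescript
// Component scores are assumed to lie in [0, 1]; this is an assumption.
interface Components {
  signal: number;
  winRate: number;
  ret: number;
  recency: number;
}

// Baseline A1 weights from the report: 35 / 25 / 20 / 20.
const A1 = { signal: 35, winRate: 25, ret: 20, recency: 20 };

function baselineScore(c: Components): number {
  return (
    A1.signal * c.signal +
    A1.winRate * c.winRate +
    A1.ret * c.ret +
    A1.recency * c.recency
  );
}

// Variant: recency weight drops from 20 to 15 and a bonus in [0, 5] is added.
function variantScore(c: Components, bonus: number): number {
  const clamped = Math.min(5, Math.max(0, bonus));
  return baselineScore(c) - 5 * c.recency + clamped;
}

// Bonus mappings named in the results table.
const drawdownInvBonus = (ctx: number): number => 5 * (1 - ctx);
const blockSingleBonus = (otherFilings: number): number => (otherFilings === 0 ? 5 : 0);
// Logistic squash of sector momentum (in %); the slope k is a placeholder.
const sectorMomBonus = (mom: number, k = 0.2): number => 5 / (1 + Math.exp(-k * mom));
```

Under this reading the maximum score stays at 100, so variants compete with the baseline on ranking rather than on scale.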

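| Strategy | n picks | mean r90 % | winRate % | Sharpe_ann | Δ winRate | Δ Sharpe_ann | Δ mean | p (perm, 800 iter) | Decision |
|----------|---------|------------|-----------|------------|-----------|--------------|--------|--------------------|----------|
| baseline (A1) | 1,060 | 4.14 | 50.9 | 0.858 | n/a | n/a | n/a | n/a | n/a |
| + drawdownInv (5·(1−ctx)) | 1,060 | 2.24 | 48.5 | −0.527 | −2.4 | −1.385 | −1.90 | 0.056 | REJECT |
| + blockSingle (5 if single) | 1,060 | 2.90 | 49.2 | −0.060 | −1.7 | −0.918 | −1.24 | 0.217 | REJECT |
| + sectorMom (logistic→5) | 1,060 | 4.12 | 51.6 | 0.701 | +0.7 | −0.157 | −0.02 | 0.983 | REJECT |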

Validation threshold:

(ΔWinRate ≥ +2pp OR ΔSharpe_ann ≥ +0.15)
AND permutation p-value < 0.10

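No variant clears the bar. drawdownInv and blockSingle lower every metric, although only the drawdownInv degradation is significant at the 10% level (p = 0.056 vs p = 0.217 for blockSingle). sectorMom is statistically indistinguishable from baseline (Δmean = −0.02pp, p = 0.983) once strict PIT is enforced.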
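The decision rule can be written as a small predicate and applied to the three rows of the results table. The sketch assumes the effect-size conditions are grouped before the p-value condition, that is (ΔWinRate OR ΔSharpe_ann) AND p; the type and function names are illustrative.

```typescript
interface VariantResult {
  name: string;
  dWinRatePp: number; // Δ winRate in percentage points
  dSharpeAnn: number; // Δ annualized Sharpe
  pValue: number;     // permutation p-value
}

// Assumed precedence: (ΔWinRate OR ΔSharpe_ann) AND p-value.
function passesValidation(r: VariantResult): boolean {
  const effect = r.dWinRatePp >= 2 || r.dSharpeAnn >= 0.15;
  return effect && r.pValue < 0.1;
}

// Deltas copied from the walk-forward results table.
const results: VariantResult[] = [
  { name: "drawdownInv", dWinRatePp: -2.4, dSharpeAnn: -1.385, pValue: 0.056 },
  { name: "blockSingle", dWinRatePp: -1.7, dSharpeAnn: -0.918, pValue: 0.217 },
  { name: "sectorMom", dWinRatePp: 0.7, dSharpeAnn: -0.157, pValue: 0.983 },
];

const passing = results.filter(passesValidation).map((r) => r.name);
console.log(passing); // []
```

Note that drawdownInv satisfies the p-value condition on its own; it fails because its effect is in the wrong direction.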

Why the diagnostic / walk-forward gap?

Sector momentum collapses under strict PIT. The 21.27pp in-sample spread shrinks to noise when the proxy can only use prior decls whose r90 was already realized at scoring time. That's the textbook signature of an in-sample artifact masquerading as alpha.
Drawdown context flips sign in OOS. The "buy near 52w low" pattern (which was the only monotonic feature in-sample) loses badly in walk-forward (ΔSharpe_ann = −1.385). Likely cause: regime-dependence (the 2022 bear made "bottoming" prices keep falling), and the proxy itself is noisy when based on sparse same-company priceAtTrade history.
Limited fold count (2) keeps Sharpe_ann estimates unstable. With about 4 years of data and 24m train / 12m test, the window can only roll twice; more history is needed before these estimates can be trusted.
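The fold count follows from simple month arithmetic. This sketch assumes the window advances by 12 months per fold (non-overlapping test periods) and treats the dataset end as exclusive; it ignores the extra 90 days needed to realize r90 on the last picks.

```typescript
// Months are counted as year * 12 + (month - 1) so fold arithmetic stays integer.
const toMonthIndex = (year: number, month: number): number => year * 12 + (month - 1);

interface Fold {
  trainStart: number;
  testStart: number;
  testEnd: number; // exclusive
}

function walkForwardFolds(
  start: number,
  end: number,
  trainMonths = 24,
  testMonths = 12,
  stepMonths = 12, // assumed roll step
): Fold[] {
  const folds: Fold[] = [];
  for (let s = start; s + trainMonths + testMonths <= end; s += stepMonths) {
    folds.push({
      trainStart: s,
      testStart: s + trainMonths,
      testEnd: s + trainMonths + testMonths,
    });
  }
  return folds;
}

// Dataset span from the report: 2021-12 to 2026-01.
const current = walkForwardFolds(toMonthIndex(2021, 12), toMonthIndex(2026, 1));
// Same start with one more year of history.
const oneMoreYear = walkForwardFolds(toMonthIndex(2021, 12), toMonthIndex(2027, 1));

console.log(current.length, oneMoreYear.length); // 2 3
```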

Engine changes

None. Per the honesty constraint, when no feature passes the validation threshold we do not touch the engine. src/lib/recommendation-engine.ts and the bucket cache key reco-bucket-stats-v7 are unchanged, and src/lib/transaction-features.ts was not created.

Reproducibility

# Diagnostic — 1,000 random BUY decls, in-sample bucket means
node --env-file=.env.local scripts/extract-transaction-features.mjs
# → /tmp/feature-extraction.json (regenerate as needed)

# Walk-forward backtest — strict OOS, baseline + 3 variants
node --env-file=.env.local scripts/backtest-feature-additions.mjs
# → /tmp/feature-backtest.json (regenerate as needed)

Seeds: 7 (sample shuffle), 42 (bootstrap CI), 11 (permutation p-values).
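For readers unfamiliar with seeded permutation tests, here is a generic sketch: a two-sample label-permutation test on the difference in means, driven by a small deterministic PRNG. The report does not describe the exact scheme used by `backtest-feature-additions.mjs`, so the function names, the PRNG, and the two-sided statistic are assumptions.

```typescript
// mulberry32-style PRNG: deterministic output in [0, 1) for a given seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const mean = (xs: number[]): number => xs.reduce((s, x) => s + x, 0) / xs.length;

// Two-sided permutation p-value for the difference in means of a and b.
function permutationPValue(a: number[], b: number[], iterations = 800, seed = 11): number {
  const rand = mulberry32(seed);
  const observed = Math.abs(mean(a) - mean(b));
  const pooled = [...a, ...b];
  let extreme = 0;
  for (let it = 0; it < iterations; it++) {
    // Fisher-Yates shuffle of the pooled returns.
    for (let i = pooled.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      const tmp = pooled[i] as number;
      pooled[i] = pooled[j] as number;
      pooled[j] = tmp;
    }
    const diff = Math.abs(mean(pooled.slice(0, a.length)) - mean(pooled.slice(a.length)));
    if (diff >= observed) extreme += 1;
  }
  // Add-one correction keeps the estimate strictly above zero.
  return (extreme + 1) / (iterations + 1);
}
```

Fixing the seed makes the p-value repeatable run to run, which is the point of listing seeds above.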

Recommended next steps

Add Company.nextEarnings (Yahoo Finance has it) and retest feature #2.
Wait for ≥1 more year of data to get a 3rd walk-forward fold and tighter Sharpe_ann error bars.
Re-explore drawdown context using a true historical 52w-band proxy — e.g. backfill BacktestResult.fiftyTwoWeekHighAtPub / …LowAtPub from the Yahoo historical timeseries that already enriches BacktestResult.price* — the same-company prior-priceAtTrade window is too sparse for many decls.
Re-test sector momentum with an external sector index series (CAC sector indices), avoiding the lookahead inherent to using same-cohort r90 as proxy.
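For the first step, feature #2 itself is a one-liner once the date exists. The sketch below assumes a nullable `nextEarnings` date (the field does not exist in the schema yet) and returns `null` when the date is missing or already past at publication time.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Whole days from publication to the next earnings date.
 * Returns null when the date is missing or earlier than pubDate (assumed convention).
 */
function earningsProximity(pubDate: Date, nextEarnings: Date | null): number | null {
  if (nextEarnings === null) return null;
  const diffMs = nextEarnings.getTime() - pubDate.getTime();
  return diffMs < 0 ? null : Math.floor(diffMs / DAY_MS);
}
```

One caveat for the retest: a current snapshot of the next earnings date is not point-in-time for past declarations, so a backtest would need the earnings date as it was known at each `pubDate`.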
