44 · Alpha discovery R3 (2026-05-17)
PIT-strict feature scan on the enriched 162 k dataset. 23,788 BUY rows with returnFromPub90d. 36 candidate features tested. DSR-style penalty = sqrt(2 ln 36) ≈ 2.677 subtracted from |t|.
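The deflation step can be sketched in a few lines. This is an illustrative sketch, not the code in scripts/alpha-discovery-r3.ts: the function names `dsrPenalty` and `deflateT` are hypothetical, and flooring at zero is an assumption inferred from the zero t_def entries at the bottom of the results table.

```typescript
// Sketch only: names are hypothetical, not taken from the actual script.

/** DSR-style penalty for a scan over nFeatures candidates: sqrt(2 ln N). */
function dsrPenalty(nFeatures: number): number {
  return Math.sqrt(2 * Math.log(nFeatures));
}

/**
 * Subtract the penalty from |t| and restore the sign of t.
 * Assumption: values whose |t| falls below the penalty are floored at 0.
 */
function deflateT(t: number, nFeatures: number): number {
  const shrunk = Math.max(Math.abs(t) - dsrPenalty(nFeatures), 0);
  return shrunk === 0 ? 0 : Math.sign(t) * shrunk;
}

console.log(dsrPenalty(36).toFixed(3)); // 2.677
console.log(deflateT(9.211, 36).toFixed(3)); // 6.534
```

Because the penalty depends only on the number of candidates tested, adding rows to the dataset changes t but not the amount subtracted from it.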
Methodology
- Target: returnFromPub90d (retail-view 90-day return from the publication+1bd close).
- Direction filter: BUY only.
- Long-short construction:
- Binary features: long = value 1, short = value 0.
- Real features: top decile minus bottom decile (cross-section over the full pool).
- Test: Welch two-sample t on the means; DSR-deflated by sqrt(2 ln N).
- PIT discipline: every feature is a function of (filing fields, company snapshot) at filing time. No future-window leak. The market-cap snapshot is the current Company row (mild PIT bias, identical to v5.1 baseline — flagged in disclosure).
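The long-short construction and the Welch test described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Row` shape and every function name are hypothetical, deciles are taken as floor(n / 10) rows at each end of the sorted pool, and rows with a missing feature value are dropped before sorting.

```typescript
// Sketch only: the Row shape and all function names are hypothetical.
type Row = { feature: number | null; returnFromPub90d: number };
type Legs = { long: number[]; short: number[] };

const mean = (xs: number[]): number =>
  xs.reduce((s, x) => s + x, 0) / xs.length;

// Unbiased sample variance (divides by n - 1).
function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
}

/** Welch two-sample t statistic on the difference of means (long minus short). */
function welchT(long: number[], short: number[]): number {
  const se = Math.sqrt(
    sampleVariance(long) / long.length + sampleVariance(short) / short.length,
  );
  return (mean(long) - mean(short)) / se;
}

/** Binary feature: long leg = rows with value 1, short leg = rows with value 0. */
function binaryLegs(rows: Row[]): Legs {
  return {
    long: rows.filter((r) => r.feature === 1).map((r) => r.returnFromPub90d),
    short: rows.filter((r) => r.feature === 0).map((r) => r.returnFromPub90d),
  };
}

/** Real-valued feature: long leg = top decile, short leg = bottom decile. */
function decileLegs(rows: Row[]): Legs {
  const sorted = rows
    .filter((r) => r.feature !== null)
    .sort((a, b) => (a.feature as number) - (b.feature as number));
  const k = Math.floor(sorted.length / 10); // rows per decile
  return {
    long: sorted.slice(sorted.length - k).map((r) => r.returnFromPub90d),
    short: sorted.slice(0, k).map((r) => r.returnFromPub90d),
  };
}
```

With 23,788 rows this decile rule gives 2,378 rows per leg, or 4,756 in total, which is consistent with the n reported for pubDelayDays and signalScore; the smaller n for log_mcap and pctOfFlow would then reflect rows with missing values.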
Top features by |t_deflated|
| Rank | Feature | n | topMean % | botMean % | diff pp | t-stat | t_def |
|---|---|---|---|---|---|---|---|
| 1 | highscore_60 | 23788 | 27.435 | 0.696 | 26.739 | 9.211 | 6.534 |
| 2 | pubDelay_le14 | 23788 | 1.34 | -2.914 | 4.253 | 8.568 | 5.891 |
| 3 | market_FR | 23788 | -0.247 | 2.932 | -3.179 | -8.478 | -5.8 |
| 4 | pubDelayDays | 4756 | -4.297 | 1.301 | -5.598 | -8.13 | -5.453 |
| 5 | pubDelay_le7 | 23788 | 1.419 | -2.2 | 3.619 | 7.386 | 4.709 |
| 6 | size_Micro | 23788 | -3.783 | 1.29 | -5.073 | -6.94 | -4.263 |
| 7 | log_mcap | 4680 | 1.143 | -3.743 | 4.886 | 6.361 | 3.684 |
| 8 | small_cluster | 23788 | 4.2 | 0.5 | 3.7 | 6.246 | 3.569 |
| 9 | market_NL | 23788 | 2.605 | 0.375 | 2.231 | 5.816 | 3.138 |
| 10 | month_summer | 23788 | -1.074 | 1.25 | -2.324 | -5.151 | -2.474 |
| 11 | signalScore | 4756 | 6.18 | 2.411 | 3.769 | 4.93 | 2.252 |
| 12 | market_US | 23788 | 3.47 | 0.48 | 2.99 | 4.667 | 1.99 |
| 13 | pctOfFlow | 4252 | 0.265 | -2.718 | 2.982 | 4.031 | 1.354 |
| 14 | role_Board | 23788 | 1.596 | 0.092 | 1.503 | 3.934 | 1.257 |
| 15 | size_Mega | 23788 | 1.645 | 0.498 | 1.148 | 3.708 | 1.031 |
| 16 | cumNet_positive | 23788 | -0.135 | 1.328 | -1.463 | -3.693 | -1.016 |
| 17 | pubDelay_le3 | 23788 | 1.388 | 0.114 | 1.274 | 3.329 | 0.652 |
| 18 | size_Large | 23788 | 2.078 | 0.42 | 1.657 | 3.272 | 0.595 |
| 19 | cfo_and_cluster | 23788 | 8.216 | 0.755 | 7.461 | 3.17 | 0.493 |
| 20 | fresh_ceo | 23788 | 2.468 | 0.52 | 1.948 | 2.984 | 0.307 |
| 21 | role_Director | 23788 | -1.757 | 0.863 | -2.621 | -2.898 | -0.221 |
| 22 | ceo_and_cluster | 23788 | 2.594 | 0.681 | 1.913 | 2.711 | 0.034 |
| 23 | isCluster | 23788 | 0.968 | 0.723 | 0.245 | 0.689 | 0 |
| 24 | role_CEO | 23788 | 0.698 | 0.799 | -0.101 | -0.191 | 0 |
| 25 | role_CFO | 23788 | 1.915 | 0.763 | 1.152 | 0.893 | 0 |
Comparison to alpha-r1 / r2
The previous rounds (docs 32 and 34) flagged role_CEO, isCluster, pubDelay_le7, signalScore, size_Sweet as the strongest PIT-safe single signals. Round 3 on the enriched dataset:
- signalScore remains significant after deflation at rank 11 (t_def 2.252), outside the top 5; the bigger sample tightens t, but the DSR penalty is unchanged, so the net effect is moderate.
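- isCluster does not hold rank: the long-short spread is 0.245 pp (t = 0.689, t_def = 0), so a standalone cluster premium is not distinguishable from noise in this scan. Among the cluster features, only small_cluster (rank 8, t_def 3.569) clears the threshold.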
- pubDelay_le7 stays positive; the freshness gate generalises to SEC volume.
- market_NL (AFM) emerges as a notable positive (consistent with doc 30: AFM mean +2.32 pct T+90).
- market_FR (AMF) emerges as a notable negative — coherent with the AMF -0.87 pct T+90 mean observed in doc 30.
- highscore_60 is sharper than the continuous signalScore, which confirms the non-linearity (decision threshold).
Reading
- Any feature with |t_def| > 2 clears Bonferroni at p < 0.05 across the 36-feature scan.
- Features below that threshold are inside the multiple-testing noise floor and should not drive scoring weights.
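As a rough check on the |t_def| > 2 rule, the arithmetic below compares it with the Bonferroni cutoff. It assumes a two-sided test and a normal approximation to the t distribution (reasonable at these sample sizes); neither assumption is stated in the scan itself.

$$
|t_{\text{def}}| > 2 \iff |t| > 2 + \sqrt{2\ln 36} \approx 4.677,
\qquad
z_{\text{Bonf}} = \Phi^{-1}\!\left(1 - \frac{0.05}{2 \cdot 36}\right) \approx 3.20
$$

Since 4.677 exceeds 3.20, the |t_def| > 2 rule is stricter than Bonferroni at the 0.05 level, so every feature that passes it also passes the Bonferroni cutoff.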
Reproducibility
node --env-file=.env.local --import tsx scripts/alpha-discovery-r3.ts
Generated 2026-05-17T18:32:39.398Z.