44 · Alpha discovery R3 (2026-05-17)

PIT-strict feature scan on the enriched 162k dataset: 23,788 BUY rows with a returnFromPub90d value, 36 candidate features tested. A DSR-style multiple-testing penalty of sqrt(2 ln 36) ≈ 2.677 is subtracted from each |t|.

Methodology

Target: returnFromPub90d, the retail-view 90-day return measured from the close one business day after publication (publication + 1bd).
Direction filter: BUY only.
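A minimal sketch of the target and the direction filter for a single filing. Everything beyond the definition above is an assumption: the `Filing` shape and the function names are hypothetical, business days are taken as Monday to Friday with no exchange holidays, and "90-day" is read as 90 calendar days after the entry date, exiting at the last available close on or before that date.

```typescript
// Hypothetical shapes: the real schema behind returnFromPub90d is not shown in this note.
type Direction = "BUY" | "SELL";

interface Filing {
  direction: Direction;
  publishedOn: string; // ISO date (YYYY-MM-DD) on which the filing became public
}

// Assumption: business days are Mon-Fri only; exchange holidays are ignored.
function nextBusinessDay(isoDate: string): string {
  const d = new Date(`${isoDate}T00:00:00Z`);
  do {
    d.setUTCDate(d.getUTCDate() + 1);
  } while (d.getUTCDay() === 0 || d.getUTCDay() === 6);
  return d.toISOString().slice(0, 10);
}

function addCalendarDays(isoDate: string, days: number): string {
  const d = new Date(`${isoDate}T00:00:00Z`);
  d.setUTCDate(d.getUTCDate() + days);
  return d.toISOString().slice(0, 10);
}

// Last available close on or before `isoDate` (closes keyed by ISO date).
function closeOnOrBefore(closes: Map<string, number>, isoDate: string): number | undefined {
  let best: string | undefined;
  for (const day of closes.keys()) {
    if (day <= isoDate && (best === undefined || day > best)) best = day;
  }
  return best === undefined ? undefined : closes.get(best);
}

// Retail-view return in percent: enter at the close one business day after publication,
// exit at the last close on or before entry + 90 calendar days (assumed horizon).
function returnFromPub90d(filing: Filing, closes: Map<string, number>): number | undefined {
  const entryDate = nextBusinessDay(filing.publishedOn);
  const entry = closes.get(entryDate);
  const exit = closeOnOrBefore(closes, addCalendarDays(entryDate, 90));
  if (entry === undefined || exit === undefined) return undefined;
  return (exit / entry - 1) * 100;
}

// Direction filter: only BUY filings enter the scan.
function buyOnly<T extends { direction: Direction }>(rows: T[]): T[] {
  return rows.filter((r) => r.direction === "BUY");
}
```

Returning `undefined` when either close is missing mirrors the note's restriction to rows that actually have a returnFromPub90d value.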
Long-short construction:
- Binary features: long = value 1, short = value 0.
- Real-valued features: long = top decile, short = bottom decile (deciles cut over the full pool).
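The two leg constructions can be sketched as follows. The `Row` shape and function names are hypothetical, and dropping rows with a missing (non-finite) feature value before the decile cut is an assumption about how the pool is formed.

```typescript
// Hypothetical row shape: one BUY filing with its target and one feature value.
interface Row {
  ret: number; // returnFromPub90d, in percent
  x: number; // feature value (0/1 for binary features)
}

interface Legs {
  long: number[];
  short: number[];
}

// Binary feature: long leg = rows with value 1, short leg = rows with value 0.
function binaryLegs(rows: Row[]): Legs {
  return {
    long: rows.filter((r) => r.x === 1).map((r) => r.ret),
    short: rows.filter((r) => r.x === 0).map((r) => r.ret),
  };
}

// Real-valued feature: top decile vs bottom decile of the pool.
// Assumption: non-finite feature values are dropped first; ties at the cut are
// broken by sort order, so each leg holds exactly floor(n / 10) rows.
function decileLegs(rows: Row[]): Legs {
  const valid = rows.filter((r) => Number.isFinite(r.x)).sort((a, b) => a.x - b.x);
  const k = Math.floor(valid.length / 10);
  return {
    long: valid.slice(valid.length - k).map((r) => r.ret),
    short: valid.slice(0, k).map((r) => r.ret),
  };
}

function mean(xs: number[]): number {
  return xs.reduce((s, v) => s + v, 0) / xs.length;
}

// Long-short spread in percentage points (the "diff pp" column).
function spread(legs: Legs): number {
  return mean(legs.long) - mean(legs.short);
}
```

With 23,788 rows each decile leg holds floor(23,788 / 10) = 2,378 rows, so n = 4,756, which matches the n reported for signalScore and pubDelayDays in the results table. The smaller n for log_mcap and pctOfFlow is consistent with those features having missing values that are dropped before the cut.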
Test: Welch two-sample t on the difference of the leg means, then DSR-style deflation: t_def = sign(t) * max(|t| - sqrt(2 ln N), 0), with N = 36.
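A sketch of the test statistic and the deflation step. The function names are illustrative; the clip at zero is inferred from the results table, where features whose |t| is below the penalty (isCluster, role_CEO, role_CFO) show t_def = 0.

```typescript
function mean(xs: number[]): number {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Unbiased sample variance (n - 1 denominator).
function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((s, x) => s + (x - m) ** 2, 0) / (xs.length - 1);
}

// Welch two-sample t for mean(long) - mean(short); variances are not assumed equal.
function welchT(long: number[], short: number[]): number {
  const se = Math.sqrt(sampleVariance(long) / long.length + sampleVariance(short) / short.length);
  return (mean(long) - mean(short)) / se;
}

// DSR-style deflation: shrink |t| by sqrt(2 ln N), keep the sign, floor at 0.
function deflateT(t: number, nFeatures: number): number {
  const penalty = Math.sqrt(2 * Math.log(nFeatures));
  return Math.sign(t) * Math.max(Math.abs(t) - penalty, 0);
}
```

For example, `deflateT(9.211, 36)` returns about 6.534 (highscore_60) and `deflateT(0.689, 36)` returns 0 (isCluster), matching the t_def column. Welch degrees of freedom are omitted because, with thousands of rows per leg, the t distribution is effectively normal.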
PIT discipline: every feature is a function of (filing fields, company snapshot) at filing time, so there is no future-window leak. The one exception is market cap: the snapshot is taken from the current Company row rather than at filing time. This is a mild PIT bias, identical to the v5.1 baseline, and is flagged in the disclosure.
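A hypothetical guard that makes the PIT rule checkable: each input a feature reads carries the time it became known, and anything known after the filing is rejected. The `TimedInput` type, the `assertPointInTime` name and the exemption mechanism are illustrative assumptions, not part of the scan script; the exemption set is where a current-row market-cap snapshot would have to be whitelisted, which is exactly the disclosed bias.

```typescript
// Hypothetical: every input a feature reads carries the time it became known.
interface TimedInput<T> {
  value: T;
  knownAt: Date;
}

// Throws if any non-exempt input was not yet known at filing time (a look-ahead leak).
function assertPointInTime(
  filedAt: Date,
  inputs: Record<string, TimedInput<unknown>>,
  exempt: ReadonlySet<string> = new Set(),
): void {
  for (const [name, input] of Object.entries(inputs)) {
    if (exempt.has(name)) continue;
    if (input.knownAt.getTime() > filedAt.getTime()) {
      throw new Error(`PIT violation: ${name} known at ${input.knownAt.toISOString()}, after filing`);
    }
  }
}
```

Making the exemption explicit keeps the known bias visible in code instead of letting it pass silently.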

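Top 25 of 36 features by |t_def|

Columns: n = rows entering the test, both legs combined; topMean % and botMean % = mean returnFromPub90d of the long and short legs; diff pp = topMean minus botMean, in percentage points; t-stat = Welch t; t_def = deflated t.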

Rank	Feature	n	topMean %	botMean %	diff pp	t-stat	t_def
1	`highscore_60`	23788	27.435	0.696	26.739	9.211	6.534
2	`pubDelay_le14`	23788	1.34	-2.914	4.253	8.568	5.891
3	`market_FR`	23788	-0.247	2.932	-3.179	-8.478	-5.8
4	`pubDelayDays`	4756	-4.297	1.301	-5.598	-8.13	-5.453
5	`pubDelay_le7`	23788	1.419	-2.2	3.619	7.386	4.709
6	`size_Micro`	23788	-3.783	1.29	-5.073	-6.94	-4.263
7	`log_mcap`	4680	1.143	-3.743	4.886	6.361	3.684
8	`small_cluster`	23788	4.2	0.5	3.7	6.246	3.569
9	`market_NL`	23788	2.605	0.375	2.231	5.816	3.138
10	`month_summer`	23788	-1.074	1.25	-2.324	-5.151	-2.474
11	`signalScore`	4756	6.18	2.411	3.769	4.93	2.252
12	`market_US`	23788	3.47	0.48	2.99	4.667	1.99
13	`pctOfFlow`	4252	0.265	-2.718	2.982	4.031	1.354
14	`role_Board`	23788	1.596	0.092	1.503	3.934	1.257
15	`size_Mega`	23788	1.645	0.498	1.148	3.708	1.031
16	`cumNet_positive`	23788	-0.135	1.328	-1.463	-3.693	-1.016
17	`pubDelay_le3`	23788	1.388	0.114	1.274	3.329	0.652
18	`size_Large`	23788	2.078	0.42	1.657	3.272	0.595
19	`cfo_and_cluster`	23788	8.216	0.755	7.461	3.17	0.493
20	`fresh_ceo`	23788	2.468	0.52	1.948	2.984	0.307
21	`role_Director`	23788	-1.757	0.863	-2.621	-2.898	-0.221
22	`ceo_and_cluster`	23788	2.594	0.681	1.913	2.711	0.034
23	`isCluster`	23788	0.968	0.723	0.245	0.689	0
24	`role_CEO`	23788	0.698	0.799	-0.101	-0.191	0
25	`role_CFO`	23788	1.915	0.763	1.152	0.893	0

Comparison to alpha-r1 / r2

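The previous rounds (docs 32 and 34) flagged role_CEO, isCluster, pubDelay_le7, signalScore and size_Sweet as the strongest PIT-safe single signals. Round 3 on the enriched dataset:

- signalScore still clears the deflated bar but drops to rank 11 (t_def 2.252). The larger sample tightens its t, but the DSR penalty does not shrink with sample size, so the net effect is moderate.
- isCluster does not hold its rank: it falls to rank 23 (t = 0.689, t_def = 0). Among the cluster features, only small_cluster clears the deflated bar (rank 8, t_def 3.569); cfo_and_cluster and ceo_and_cluster do not.
- role_CEO does not replicate either (rank 24, t = -0.191, t_def = 0), and size_Sweet is not among the 25 features in the table.
- pubDelay_le7 stays positive (rank 5, t_def 4.709), and the looser pubDelay_le14 is stronger still (rank 2); the freshness gate generalises to SEC volume.
- market_NL (AFM) emerges as a notable positive (rank 9, t_def 3.138), consistent with doc 30 (AFM mean +2.32% at T+90).
- market_FR (AMF) emerges as a notable negative (rank 3, t_def -5.8), coherent with the AMF mean of -0.87% at T+90 observed in doc 30.
- highscore_60 (rank 1, t_def 6.534) is much sharper than the continuous signalScore (t_def 2.252), consistent with a non-linear, threshold-like response.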

Reading

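Any feature with |t_def| > 2 (equivalently |t| > 4.68) clears a Bonferroni correction at p < 0.05 across the 36-feature scan with room to spare: at these sample sizes the two-sided Bonferroni critical value for 36 tests is about |t| = 3.20.
Features at or below |t_def| = 2 are too close to the multiple-testing noise floor to drive scoring weights. Some of them (e.g. role_Board, t = 3.934) would pass a plain Bonferroni test, so the |t_def| > 2 bar is the stricter of the two screens.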
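The gap between the two screens can be checked by computing the two-sided Bonferroni critical value for 36 tests at α = 0.05 and the |t| implied by |t_def| > 2. This sketch treats the Welch t as standard normal (reasonable with thousands of rows per leg) and uses the Abramowitz and Stegun 7.1.26 erf approximation in place of a statistics library; the function names are illustrative.

```typescript
// Standard normal CDF via the Abramowitz and Stegun 7.1.26 erf approximation
// (absolute error below 1.5e-7).
function normalCdf(z: number): number {
  const x = Math.abs(z) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * x);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-x * x);
  return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Two-sided Bonferroni critical value: the |z| whose two-sided p-value equals
// alpha / nTests, found by bisection on [0, 10].
function bonferroniCriticalZ(alpha: number, nTests: number): number {
  const target = alpha / nTests;
  let lo = 0;
  let hi = 10;
  for (let i = 0; i < 100; i++) {
    const mid = (lo + hi) / 2;
    const pTwoSided = 2 * (1 - normalCdf(mid));
    if (pTwoSided > target) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

// |t| implied by the rule |t_def| > bar, i.e. sqrt(2 ln N) + bar.
function deflatedBarAsT(nTests: number, bar: number): number {
  return Math.sqrt(2 * Math.log(nTests)) + bar;
}

const zBonferroni = bonferroniCriticalZ(0.05, 36); // roughly 3.2
const tHouseBar = deflatedBarAsT(36, 2); // roughly 4.68
console.log(zBonferroni.toFixed(2), tHouseBar.toFixed(2));
```

Any |t| between the two values passes Bonferroni but not the deflated bar, which is the band where role_Board, size_Mega and similar features sit.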

Reproducibility

Generated 2026-05-17T18:32:39.398Z with:

`node --env-file=.env.local --import tsx scripts/alpha-discovery-r3.ts`
