32 — Alpha Discovery (round 1)

Date : 2026-05-17
Script : scripts/alpha-discovery.mjs
Universe : 15,171 BUY declarations with realised returnFromPub90d across 17 markets (AMF + 16 multi-market staging tables that have BacktestResult coverage).
Returns : winsorised at p1/p99 ([-50.0%, +60.6%]). The raw range was [-98%, +1386%], and two outliers dominated the decile spreads before winsorisation.
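
To make the p1/p99 winsorisation concrete, here is a minimal TypeScript sketch. The function names (`quantile`, `winsorise`) and the linear-interpolation quantile rule are assumptions made for illustration; scripts/alpha-discovery.mjs may compute its percentiles differently.

```typescript
// Linear-interpolation quantile on an ascending-sorted array (q in [0, 1]).
// The interpolation rule is an assumption, not taken from the discovery script.
function quantile(sorted: number[], q: number): number {
  if (sorted.length === 0) throw new Error("quantile of empty array");
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Clamp every value to the [p1, p99] band estimated from the sample itself.
// The input array is left untouched; a new array is returned.
function winsorise(values: number[], lower = 0.01, upper = 0.99): number[] {
  const sorted = [...values].sort((a, b) => a - b);
  const lo = quantile(sorted, lower);
  const hi = quantile(sorted, upper);
  return values.map((v) => Math.min(hi, Math.max(lo, v)));
}
```

Clamping rather than dropping the tails keeps n unchanged, so decile sizes stay the same before and after the treatment.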

Methodology

For each candidate factor f:

Compute f per declaration with strict point-in-time inputs (only events with pubDate < t enter the rolling windows; sector mean-reversion uses pubDate ≤ t − 90d so each input's r90 is fully realised before t).
Spearman rank IC between f and r90.
Decile split on f; top-decile mean and bottom-decile mean → spread + Welch t-stat.
Bonferroni adjustment on the top factor (K = 12 tests).
Bailey–López de Prado DSR penalty √(2·ln K / T) with T = 10 yearly buckets.
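
The IC, decile-spread, and Welch steps above can be sketched as follows. This is an illustrative TypeScript version under stated assumptions, not the code in scripts/alpha-discovery.mjs: every function name is hypothetical, ties receive average ranks, and each decile is taken as floor(n / 10) rows from either end of the sorted sample.

```typescript
// Average ranks (1-based); tied values share the mean of the positions they occupy.
function rank(xs: number[]): number[] {
  const order = xs.map((x, i) => ({ x, i })).sort((a, b) => a.x - b.x);
  const ranks = new Array<number>(xs.length);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1].x === order[start].x) end++;
    const avg = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) ranks[order[k].i] = avg;
    start = end + 1;
  }
  return ranks;
}

function mean(xs: number[]): number {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Unbiased sample variance (n - 1 denominator).
function variance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((s, x) => s + (x - m) * (x - m), 0) / (xs.length - 1);
}

function pearson(a: number[], b: number[]): number {
  const ma = mean(a);
  const mb = mean(b);
  let num = 0;
  let da = 0;
  let db = 0;
  for (let i = 0; i < a.length; i++) {
    num += (a[i] - ma) * (b[i] - mb);
    da += (a[i] - ma) * (a[i] - ma);
    db += (b[i] - mb) * (b[i] - mb);
  }
  return num / Math.sqrt(da * db);
}

// Spearman rank IC = Pearson correlation of the two rank vectors.
function spearmanIC(f: number[], r: number[]): number {
  return pearson(rank(f), rank(r));
}

// Welch t-statistic for mean(top) - mean(bottom), allowing unequal variances.
function welchT(top: number[], bottom: number[]): number {
  const se = Math.sqrt(variance(top) / top.length + variance(bottom) / bottom.length);
  return (mean(top) - mean(bottom)) / se;
}

// Sort rows by factor value, then compare the top and bottom tenths.
function decileSpread(f: number[], r: number[]): { spread: number; t: number } {
  const idx = f.map((_, i) => i).sort((a, b) => f[a] - f[b]);
  const size = Math.floor(idx.length / 10);
  const bottom = idx.slice(0, size).map((i) => r[i]);
  const top = idx.slice(idx.length - size).map((i) => r[i]);
  return { spread: mean(top) - mean(bottom), t: welchT(top, bottom) };
}
```

With a binary factor the sort leaves thousands of tied rows, so the membership of the "top" and "bottom" tenth in a sketch like this depends on input order rather than on the factor. A two-bucket comparison is the safer reading in that case.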

1. Ranked results (winsorised T+90, %)

Rank	Factor	n	IC	t-stat	ΔD10−D1 (% pts)	TopWR	BotWR	Comment
1	`earningsProximityDays`	1,462	−0.178	−0.59	−0.96	48.6%	39.7%	Small n (only rows where `Company.nextEarningsDate` is populated). Non-significant t. Drop / refine PIT first.
2	`firstTimeBuyer`	15,171	−0.062	+6.58	+4.19	49.3%	38.1%	Binary factor → decile structure degenerate. Sign of IC contradicts CMP 2012 prior. Top decile = first-time buyers, underperform.
3	`clusterStrength30d`	15,171	+0.039	+6.30	+3.72	54.8%	42.6%	Clear D10 vs D1 contrast, noisy in the middle deciles. Already partially captured by `isCluster` boolean; this is the continuous, amount-weighted refinement.
4	`sectorBuyCount60d`	15,166	−0.039	−1.90	−1.20	46.4%	47.0%	Crowded-sector buys do marginally worse. Marginal significance.
5	`daysFromMonthEnd`	15,171	+0.038	−0.36	−0.24	44.6%	46.2%	Calendar effect — t-stat does not survive multiple-testing. Reject.
6	`betaAdjBaseline` (lagged sector mean rev.)	12,223	+0.022	+2.96	+2.60	49.1%	50.6%	Buying after a 90-day sector dip → mean-reversion. Small but stable.
7	`logTotalAmount`	15,171	+0.017	−1.34	−0.82	45.3%	44.4%	Already strongly correlated with `signalScore`. Redundant.
8	`pctMarketCap`	13,267	−0.008	−3.46	−2.77	41.0%	44.0%	Surprising: bigger trades (relative to mcap) underperform. Possibly distressed-issuer effect.
9	`roleRank`	15,171	+0.007	+2.67	+1.74	47.4%	44.9%	Already in v5 composite. Confirms weighting.
10	`crossMarketCluster7d`	15,171	−0.007	−0.11	−0.06	50.2%	44.4%	No signal.
11	`netInsiderDelta90d`	15,171	+0.006	−1.89	−1.15	44.1%	47.6%	Approximated as cumulative buys only (SELLs not in BacktestResult). Inconclusive.
12	`dayOfWeek`	15,171	+0.001	+0.78	+0.51	48.0%	48.1%	Noise.

Bonferroni-adjusted p-value on top factor : 1.00e+0, i.e. H0 cannot be rejected on earningsProximityDays after correction. The top factor is killed by Bonferroni because its t-stat is low to begin with (−0.59); its IC magnitude is driven by rank coverage on a small subset (n = 1,462).
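
The adjustment itself is one line of arithmetic. The sketch below, in TypeScript, reproduces the capped value for t = −0.59 and K = 12. It assumes a standard-normal approximation for the p-value and uses the Abramowitz-Stegun 7.1.26 polynomial for erf; the function names are hypothetical and the discovery script may derive its p-values differently.

```typescript
// erf via the Abramowitz-Stegun 7.1.26 polynomial (absolute error around 1.5e-7).
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) *
    t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Two-sided p-value, treating the t-statistic as standard normal.
// This is an assumption that is only reasonable for large decile sizes.
function twoSidedP(tStat: number): number {
  return 1 - erf(Math.abs(tStat) / Math.SQRT2);
}

// Bonferroni: multiply the raw p-value by the number of tests, cap at 1.
function bonferroni(p: number, k: number): number {
  return Math.min(1, p * k);
}

const pRaw = twoSidedP(-0.59); // earningsProximityDays, roughly 0.555
const pAdj = bonferroni(pRaw, 12); // K = 12 candidate factors, capped at 1
```

Because the raw p-value is already above 1/12, the product exceeds 1 and the cap applies, which is why the adjusted value prints as exactly 1.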

DSR penalty : √(2·ln(12)/10) ≈ 0.705. After deflation, |IC| − penalty/√n drops every factor to ≤ 0 in IC units. Cross-sectional Sharpe-equivalents per factor are < 0.10 in absolute value, so DSR negative on every candidate. Honest take: no single factor here is "discovered alpha" in a strict Bailey-LdP sense. The two factors that are still worth promoting are robust on t-stat and on decile monotonicity.
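
As a quick check of the penalty arithmetic, the term √(2·ln K / T) can be evaluated directly. The function name `dsrPenalty` is hypothetical; only the formula and the inputs K = 12, T = 10 come from this audit.

```typescript
// Penalty term as stated in this audit: sqrt(2 * ln(K) / T).
function dsrPenalty(kTests: number, tBuckets: number): number {
  return Math.sqrt((2 * Math.log(kTests)) / tBuckets);
}

const penalty = dsrPenalty(12, 10); // about 0.705
```

The penalty grows with ln K, so adding more searched factors to a later round raises the bar slowly, while shortening T raises it faster.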

2. Per-factor decile tables (top 4)

2.1 `clusterStrength30d` = n_distinct_insiders × log(1 + Σ totalAmount_eur) [30d rolling]

Decile	n	mean r90 (%)	WR	f-range
D1	1,517	−2.04	42.6%	[0, 0]
D2	1,517	−2.27	42.1%	[0, 0]
D3	1,517	−1.62	40.7%	[0, 9.3]
D4	1,517	+1.00	48.9%	[9.3, 11.3]
D5	1,517	−4.33	38.0%	[11.3, 13.5]
D6	1,517	−2.07	40.2%	[13.5, 18.7]
D7	1,517	−0.83	45.8%	[18.7, 29.4]
D8	1,517	−1.54	40.6%	[29.4, 46.0]
D9	1,517	+0.16	49.2%	[46.0, 97.1]
D10	1,518	+1.68	54.7%	[97.1, 633.8]

Monotonicity is noisy in the middle but the top vs bottom contrast is unambiguous (+3.72 pp mean, +12.1 pp WR, t = 6.30, n = 1,518 vs 1,517).
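
For reference, the factor definition in the 2.1 heading can be sketched as a point-in-time function. This is a minimal TypeScript illustration, not the harness code: the `BuyEvent` shape and function name are hypothetical, the logarithm is assumed to be natural, and the window is assumed to be [t − 30d, t) with the scored declaration itself excluded.

```typescript
// Hypothetical row shape for one prior BUY on the same company.
interface BuyEvent {
  insiderId: string;
  pubDate: Date;
  totalAmountEur: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// n_distinct_insiders * ln(1 + sum of totalAmountEur) over events published
// in [t - windowDays, t). Events at or after t never enter the window.
function clusterStrength30d(history: BuyEvent[], t: Date, windowDays = 30): number {
  const from = t.getTime() - windowDays * DAY_MS;
  const inWindow = history.filter(
    (e) => e.pubDate.getTime() >= from && e.pubDate.getTime() < t.getTime(),
  );
  if (inWindow.length === 0) return 0; // no prior buys in the window
  const insiders = new Set(inWindow.map((e) => e.insiderId));
  const total = inWindow.reduce((s, e) => s + e.totalAmountEur, 0);
  return insiders.size * Math.log(1 + total);
}
```

Under these assumptions a declaration with no prior buy in the window scores exactly 0, which is consistent with the [0, 0] f-range shown for D1 and D2.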

2.2 `firstTimeBuyer` (binary, 1 = insider's first ever BUY on this company)

Bucket	f	n	mean r90 (%)	WR
f = 0 (repeat buyer)	0	12,121	−0.96	44.5%
f = 1 (first-time)	1	3,050	−2.78	42.6%

Inverse of CMP 2012 : on this multi-market sample, first-time buyers underperform repeat insiders. Plausible explanation: board newcomers and qualifying-shares routine purchases dominate the f=1 bucket. CMP-2012-style "opportunistic vs routine" is better proxied by Insider.tradingPattern (already classified, already disclosed in winning-strategy.ts as excludeRoutineInsiders opt-in flag, also empirically inverse on FR data per audit 05).

2.3 `betaAdjBaseline` (−prior 90d sector mean of realised r90)

Decile	n	mean r90 (%)	WR	f-range (= − % sector mean)
D1 (hot sector)	1,222	−2.55	50.6%	[−13.9, −2.4]
D5	1,222	−1.49	46.5%	[+0.3, +0.9]
D10 (cold sector)	1,223	+0.05	49.1%	[+2.6, +9.0]

Buying after a sector's realised r90 has averaged roughly −2.6% to −9.0% over the prior 90 days (D10) adds about +2.6 pp versus buying into a hot sector (D1). t = +2.96. Modest, robust to PIT, dataset-internal (no external index needed).
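
A point-in-time version of this factor could look like the sketch below. It is written in TypeScript under explicit assumptions: the `SectorObs` shape and function name are hypothetical, and the input window is assumed to be [t − 180d, t − 90d], which is one reading of "prior 90d" combined with the pubDate ≤ t − 90d rule from the methodology.

```typescript
// Hypothetical row shape for one same-universe declaration with a realised return.
interface SectorObs {
  sector: string;
  pubDate: Date;
  r90: number; // realised 90-day return, in %
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Negated mean r90 of same-sector declarations published in [t - 180d, t - 90d].
// Returns null when the sector has no fully realised observation in that window.
function betaAdjBaseline(peers: SectorObs[], sector: string, t: Date): number | null {
  const hi = t.getTime() - 90 * MS_PER_DAY;
  const lo = t.getTime() - 180 * MS_PER_DAY;
  const rs = peers
    .filter((p) => p.sector === sector)
    .filter((p) => p.pubDate.getTime() >= lo && p.pubDate.getTime() <= hi)
    .map((p) => p.r90);
  if (rs.length === 0) return null;
  return -rs.reduce((s, r) => s + r, 0) / rs.length;
}
```

Returning null rather than 0 for empty windows keeps rows without sector history out of the decile split instead of parking them in the middle of the distribution.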

2.4 `pctMarketCap` (= totalAmount_eur / marketCap)

Decile	n	mean r90 (%)	WR	f-range
D1 (tiny trade)	1,326	−1.38	44.0%	≈ 0
D5	1,327	−0.12	46.5%	≈ 0
D10 (large trade)	1,327	−4.16	40.9%	[10%, 96.9%]

Counter-intuitive negative signal, t = −3.46. The largest insider buys (in % of market cap) underperform. Likely driver: heavy %MC trades are concentrated in distressed micro-caps where conviction is unreliable.

3. Recommendation — v6 composite

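Two factors are worth promoting : clear top-vs-bottom decile contrast, IC sign survives the multi-market sample, point-in-time clean. Only clusterStrength30d clears |t| > 5 unweighted (t = 6.30); pctMarketCap reaches |t| = 3.46 and is proposed as a defensive penalty rather than as a primary signal.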

New factor	Proposed weight	Rationale
`clusterStrengthScore` (continuous version of `isCluster`)	+5 pts	Supplements the binary `isCluster` boolean with a smooth log-amount-weighted score. The existing isCluster stays as a hard gate (still required); the decile-rank bonus is added on top.
`pctMarketCap_neg` (penalty for top-decile %MC)	−3 pts	Subtract from score when `pctMarketCap > p90(rolling)`. Defensive; protects against distressed-micro-cap traps.
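
The `pctMarketCap > p90(rolling)` condition in the table can be evaluated point-in-time as in the following TypeScript sketch. The function name, the `minObs` guard, and the interpolation rule are assumptions for illustration; the caller is expected to pass only observations published strictly before the declaration being scored.

```typescript
// Returns 1 when `current` exceeds the 90th percentile of earlier pctMarketCap
// observations, else 0. `prior` must contain only rows with pubDate < t so the
// threshold stays point-in-time. `minObs` is a hypothetical guard value.
function pctMarketCapTopDecileFlag(
  prior: number[],
  current: number,
  minObs = 30,
): 0 | 1 {
  if (prior.length < minObs) return 0; // too little history to estimate p90
  const sorted = [...prior].sort((a, b) => a - b);
  const pos = (sorted.length - 1) * 0.9;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  const p90 = sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
  return current > p90 ? 1 : 0;
}
```

A rolling threshold adapts if the distribution of trade sizes shifts across markets or years, whereas a fixed cut-off only matches the top decile of the sample it was fitted on.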

Total composite reshuffle :

v6 composite = 0.32·signal + 0.33·winRate + 0.22·return + 0.08·recency
             + 0.05·clusterStrength30d_rank
             − 0.03·pctMarketCap_top10_flag

(positive weights renormalised to sum to 1.0, with the pctMarketCap penalty subtracted on top ; baseline v5 was 35 / 35 / 25 / 5).

betaAdjBaseline (sector mean-reversion) has the right sign but its uplift is too thin to publish in v6. Park it for a v7 round once a proper PIT sector-index history (SectorIndexHistory) is wired into the recompute pipeline.

Drop on this run : earningsProximityDays (n too small, does not survive Bonferroni), dayOfWeek, daysFromMonthEnd, netInsiderDelta90d (needs SELLs to be meaningful), crossMarketCluster7d (already subsumed by isCluster).

4. Overfitting & honesty caveats

K = 12 tests searched in this round; combined with the 583,200-trial grid behind the headline Sigma filter, the effective N_trials for any DSR computation on the final composite remains in the 10⁵+ range. Headline composite alpha should still be quoted with sharpeDeflated = null per current disclosure.
Survivorship bias : delisted companies are missing from Company, so factor IC values are biased upward on names that survived 2015→2026. clusterStrength30d is less sensitive than absolute-return factors because its decile spread is driven by cluster geometry, not by realised outcome of dead names.
Multi-market regime drift : 17-market mix is dominated by AMF (~63%) and AFM (~17%). The same factor on US Form 4 only could rank differently; cross-validation per market is the natural next step.
pctMarketCap negative-sign caution : sign is robust within sample but driver story (distressed micro-caps) is hypothesis, not validated. Before shipping the −3pt penalty, run a sub-sample test on marketCap > 1B€ to confirm the penalty does not destroy alpha on large caps.

5. Concrete code patch proposal for `winning-strategy.ts`

The composite weights live in src/app/api/v1/scoring/... (computed offline by the scoring cron and persisted in Declaration.signalScore). The patch below is for the SCORING SIDE — winning-strategy.ts itself remains a filter layer, not a scorer. The proposal:

5.1 Add new field to Declaration (optional, nullable)

// prisma/schema.prisma  — model Declaration
clusterStrength30d Float?   // log-amount-weighted directional cluster (rolling 30d)
pctMcapBucket      Int?     // 0..9 decile of pctOfMarketCap at scoring time (PIT)

5.2 Scoring cron — drop-in helpers

// src/app/api/v1/scoring/factors-v6.ts
export function clusterStrength30dRank(
  distinctInsiders: number,
  totalAmountEur: number,
  rankPctile: number,           // 0..1, this row's rank within its 17-market peer group
): number {
  if (distinctInsiders < 1 || totalAmountEur <= 0) return 0;
  // bonus capped at +5 pts, linear in rank percentile
  return Math.round(5 * Math.min(1, Math.max(0, rankPctile)));
}

export function pctMcapPenalty(pctMcap: number | null): number {
  if (pctMcap == null) return 0;
  // top-decile threshold ~ 10% of mcap (empirical, audit 32)
  return pctMcap > 0.10 ? -3 : 0;
}

5.3 Composite formula update

// src/app/api/v1/scoring/composite.ts (illustrative)
const v6 =
    0.32 * signalSub
  + 0.33 * winRateSub
  + 0.22 * returnSub
  + 0.08 * recencySub
  + clusterStrength30dRank(nInsiders30d, totalAmountEur30d, peerRankPct)
  + pctMcapPenalty(decl.pctOfMarketCap);

5.4 Filter side — `winning-strategy.ts` only needs the cluster threshold lift

WINNING_STRATEGY.minScore should stay at 40 in v5→v6 transition; the new +5/−3 dynamic range slightly widens the score distribution but the +40 anchor still gates the top quartile. Re-run scripts/grid-search-v2.mjs AFTER v6 lands on a fresh nightly recompute, then move minScore if a new optimum emerges. Do not raise minScore blindly in the same release.

6. Files

/Users/azoulay/DEV/insiders-trades/scripts/alpha-discovery.mjs — discovery harness
/tmp/alpha-discovery.json — structured per-factor output (regenerate any time)
/Users/azoulay/DEV/insiders-trades/docs/method-review/32-alpha-discovery.md — this doc

7. Suggested next steps

Cross-validate clusterStrength30d per-market (FR / SEC / AFM / DAX) to confirm sign stability.
Reload Company.nextEarningsDate for full 17-market coverage (currently 1,462 / 15,171 ≈ 10 %), then re-run factor #1 (earningsProximityDays) with proper n.
Run pctMarketCap decile analysis stratified by market-cap bucket to validate the "distressed micro-cap" hypothesis before shipping the −3 pt penalty.
Build a factor_correlations matrix between v5 sub-scores and the two v6 candidates — guards against redundancy.
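
The last step could start from something as small as the sketch below, which builds a pairwise correlation matrix from named columns. It is a TypeScript illustration only: the function names and the column layout are hypothetical, and it uses Pearson correlation on raw values (ranking each column first would give the Spearman variant).

```typescript
// Pearson correlation of two equally long series.
function corr(a: number[], b: number[]): number {
  const n = a.length;
  const ma = a.reduce((s, x) => s + x, 0) / n;
  const mb = b.reduce((s, x) => s + x, 0) / n;
  let num = 0;
  let da = 0;
  let db = 0;
  for (let i = 0; i < n; i++) {
    num += (a[i] - ma) * (b[i] - mb);
    da += (a[i] - ma) * (a[i] - ma);
    db += (b[i] - mb) * (b[i] - mb);
  }
  return num / Math.sqrt(da * db);
}

// Symmetric matrix of pairwise correlations, keyed by column name.
// All columns are assumed to be aligned row by row and free of missing values.
function factorCorrelations(
  columns: Record<string, number[]>,
): Record<string, Record<string, number>> {
  const names = Object.keys(columns);
  const out: Record<string, Record<string, number>> = {};
  for (const a of names) {
    out[a] = {};
    for (const b of names) {
      out[a][b] = a === b ? 1 : corr(columns[a], columns[b]);
    }
  }
  return out;
}
```

A high off-diagonal value between a v6 candidate and an existing v5 sub-score would suggest the candidate adds little independent information.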

