32 — Alpha Discovery (round 1)
Date : 2026-05-17
Script : scripts/alpha-discovery.mjs
Universe : 15,171 BUY declarations with realised returnFromPub90d across 17 markets (AMF + 16 multi-market staging tables that have BacktestResult coverage). Returns are winsorised at p1/p99 ([-50.0%, +60.6%]); the raw min/max was [-98%, +1386%], and two outliers dominated the decile spreads before winsorisation.
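The p1/p99 winsorisation described above can be sketched as follows. This is a minimal illustration, not the code in scripts/alpha-discovery.mjs: the helper names `percentile` and `winsorise` are hypothetical, and the linear-interpolation percentile is an assumption (the harness may use a different quantile rule).

```typescript
// Linear-interpolation percentile on an ascending-sorted array, q in [0, 1].
// Assumption: the real harness may use another quantile convention.
function percentile(sorted: number[], q: number): number {
  if (sorted.length === 0) throw new Error("percentile of empty sample");
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Clip every value to the [p1, p99] range of the sample itself.
function winsorise(values: number[], lower = 0.01, upper = 0.99): number[] {
  const sorted = [...values].sort((a, b) => a - b);
  const lo = percentile(sorted, lower);
  const hi = percentile(sorted, upper);
  return values.map((v) => Math.min(hi, Math.max(lo, v)));
}
```

Clipping rather than dropping the tails keeps n unchanged, which matters here because the decile buckets are defined on the full 15,171 rows.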
Methodology
For each candidate factor f:
- Compute f per declaration with strict point-in-time inputs (only events with pubDate < t enter the rolling windows; sector mean-reversion uses pubDate ≤ t − 90d so each input's r90 is fully realised before t).
- Spearman rank IC between f and r90.
- Decile split on f; top-decile mean and bottom-decile mean → spread + Welch t-stat.
- Bonferroni adjustment on the top factor (K = 12 tests).
- Bailey–López de Prado DSR penalty √(2·ln K / T) with T = 10 yearly buckets.
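The per-factor statistics listed above (Spearman rank IC, D10 minus D1 spread, Welch t-stat) can be sketched like this. All function names are hypothetical, ties get average ranks, and deciles are cut as equal-size slices of the sorted sample, which is an assumption about how the harness buckets rows.

```typescript
// Average ranks (1-based); tied values share the mean of the ranks they span.
function rank(values: number[]): number[] {
  const order = values.map((v, i) => ({ v, i })).sort((a, b) => a.v - b.v);
  const ranks = new Array<number>(values.length);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1].v === order[start].v) end++;
    const avg = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) ranks[order[k].i] = avg;
    start = end + 1;
  }
  return ranks;
}

function mean(xs: number[]): number {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

function pearson(x: number[], y: number[]): number {
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < x.length; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) ** 2;
    syy += (y[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Spearman rank IC = Pearson correlation computed on the ranks.
function spearmanIC(f: number[], r90: number[]): number {
  return pearson(rank(f), rank(r90));
}

// Welch t-statistic for a difference in means with unequal variances.
function welchT(a: number[], b: number[]): number {
  const ma = mean(a);
  const mb = mean(b);
  const va = a.reduce((s, x) => s + (x - ma) ** 2, 0) / (a.length - 1);
  const vb = b.reduce((s, x) => s + (x - mb) ** 2, 0) / (b.length - 1);
  return (ma - mb) / Math.sqrt(va / a.length + vb / b.length);
}

// Top-decile minus bottom-decile mean of r90, with its Welch t-stat.
function decileSpread(f: number[], r90: number[]): { spread: number; t: number } {
  const idx = f.map((_, i) => i).sort((i, j) => f[i] - f[j]);
  const size = Math.floor(idx.length / 10);
  const bottom = idx.slice(0, size).map((i) => r90[i]);
  const top = idx.slice(idx.length - size).map((i) => r90[i]);
  return { spread: mean(top) - mean(bottom), t: welchT(top, bottom) };
}
```

For a binary factor such as firstTimeBuyer, sorting produces long runs of ties, so the decile cut is arbitrary within a run; a two-bucket comparison is the more natural reading there.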
1. Ranked results (winsorised T+90, %)
| Rank | Factor | n | IC | t-stat | ΔD10−D1 (% pts) | TopWR | BotWR | Comment |
|---|---|---|---|---|---|---|---|---|
| 1 | earningsProximityDays | 1,462 | −0.178 | −0.59 | −0.96 | 48.6% | 39.7% | Small n (only rows where Company.nextEarningsDate is populated). Non-significant t. Drop / refine PIT first. |
| 2 | firstTimeBuyer | 15,171 | −0.062 | +6.58 | +4.19 | 49.3% | 38.1% | Binary factor → decile structure degenerate. Sign of IC contradicts CMP 2012 prior. Top decile = first-time buyers, underperform. |
| 3 | clusterStrength30d | 15,171 | +0.039 | +6.30 | +3.72 | 54.8% | 42.6% | Clear D1 vs D10 contrast, though the middle deciles are noisy (see 2.1). Already partially captured by isCluster boolean; this is the continuous, amount-weighted refinement. |
| 4 | sectorBuyCount60d | 15,166 | −0.039 | −1.90 | −1.20 | 46.4% | 47.0% | Crowded-sector buys do marginally worse. Marginal significance. |
| 5 | daysFromMonthEnd | 15,171 | +0.038 | −0.36 | −0.24 | 44.6% | 46.2% | Calendar effect: t-stat does not survive multiple-testing. Reject. |
| 6 | betaAdjBaseline (lagged sector mean rev.) | 12,223 | +0.022 | +2.96 | +2.60 | 49.1% | 50.6% | Buying after a 90-day sector dip → mean-reversion. Small but stable. |
| 7 | logTotalAmount | 15,171 | +0.017 | −1.34 | −0.82 | 45.3% | 44.4% | Already strongly correlated with signalScore. Redundant. |
| 8 | pctMarketCap | 13,267 | −0.008 | −3.46 | −2.77 | 41.0% | 44.0% | Surprising: bigger trades (relative to mcap) underperform. Possibly distressed-issuer effect. |
| 9 | roleRank | 15,171 | +0.007 | +2.67 | +1.74 | 47.4% | 44.9% | Already in v5 composite. Confirms weighting. |
| 10 | crossMarketCluster7d | 15,171 | −0.007 | −0.11 | −0.06 | 50.2% | 44.4% | No signal. |
| 11 | netInsiderDelta90d | 15,171 | +0.006 | −1.89 | −1.15 | 44.1% | 47.6% | Approximated as cumulative buys only (SELLs not in BacktestResult). Inconclusive. |
| 12 | dayOfWeek | 15,171 | +0.001 | +0.78 | +0.51 | 48.0% | 48.1% | Noise. |
Bonferroni-adjusted p-value on top factor : 1.00e+0 (i.e. cannot reject H0 on earningsProximityDays after correction; the top factor is killed by Bonferroni because it has a low t-stat to begin with — IC magnitude is driven by rank coverage on a small subset).
DSR penalty : √(2·ln(12)/10) ≈ 0.705. The penalty exceeds every |IC| in the table (the largest is 0.178), so |IC| − penalty is negative for every factor. Cross-sectional Sharpe-equivalents per factor are < 0.10 in absolute value, so the DSR is negative on every candidate. Honest take: no single factor here is "discovered alpha" in a strict Bailey-LdP sense. The two factors that are still worth promoting are robust on t-stat and on the top-vs-bottom decile contrast.
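The two corrections quoted above are simple enough to check by hand; the sketch below reproduces them. The function names are hypothetical, and the two-sided p-value uses a normal approximation to the t distribution (an assumption that is reasonable at n = 1,462 but is not necessarily what the script does). The error function uses the Abramowitz and Stegun 7.1.26 polynomial, accurate to about 1.5e-7, so very small p-values should not be read off it.

```typescript
// erf via Abramowitz & Stegun 7.1.26 (max abs error ~1.5e-7).
function erf(x: number): number {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Two-sided p-value for a t-stat, normal approximation (assumption: large n).
function twoSidedP(tStat: number): number {
  return 1 - erf(Math.abs(tStat) / Math.SQRT2);
}

// Bonferroni: multiply by the number of tests, cap at 1.
function bonferroni(p: number, k: number): number {
  return Math.min(1, p * k);
}

// Penalty term sqrt(2 * ln K / T) as written in the methodology.
function dsrPenalty(k: number, tBuckets: number): number {
  return Math.sqrt((2 * Math.log(k)) / tBuckets);
}

const adjustedTopFactor = bonferroni(twoSidedP(-0.59), 12); // capped at 1
const penalty = dsrPenalty(12, 10); // ≈ 0.705
```

With t = −0.59 the unadjusted p-value is already above 0.5, so the K = 12 multiplication saturates at 1, which matches the 1.00e+0 reported above.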
2. Per-factor decile tables (top 4)
2.1 clusterStrength30d = n_distinct_insiders × log(1 + Σ totalAmount_eur) [30d rolling]
| Decile | n | mean r90 (%) | WR | f-range |
|---|---|---|---|---|
| D1 | 1,517 | −2.04 | 42.6% | [0, 0] |
| D2 | 1,517 | −2.27 | 42.1% | [0, 0] |
| D3 | 1,517 | −1.62 | 40.7% | [0, 9.3] |
| D4 | 1,517 | +1.00 | 48.9% | [9.3, 11.3] |
| D5 | 1,517 | −4.33 | 38.0% | [11.3, 13.5] |
| D6 | 1,517 | −2.07 | 40.2% | [13.5, 18.7] |
| D7 | 1,517 | −0.83 | 45.8% | [18.7, 29.4] |
| D8 | 1,517 | −1.54 | 40.6% | [29.4, 46.0] |
| D9 | 1,517 | +0.16 | 49.2% | [46.0, 97.1] |
| D10 | 1,518 | +1.68 | 54.7% | [97.1, 633.8] |
Monotonicity is noisy in the middle but the top vs bottom contrast is unambiguous (+3.72 pp mean, +12.1 pp WR, t = 6.30, n = 1,518 vs 1,517).
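A point-in-time computation of the 2.1 formula could look like the sketch below. The `BuyEvent` shape and the function name are hypothetical; the natural logarithm and the exact window boundaries (30 days back inclusive, strictly before t) are assumptions consistent with the pubDate < t rule in the methodology.

```typescript
// Hypothetical event shape; field names are illustrative only.
interface BuyEvent {
  companyId: string;
  insiderId: string;
  pubDate: Date;
  totalAmountEur: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// n_distinct_insiders × log(1 + Σ totalAmount_eur) over the 30 days before t.
// Only events with pubDate < t are used, so the declaration being scored
// never contributes to its own factor value.
function clusterStrength30d(events: BuyEvent[], companyId: string, t: Date): number {
  const end = t.getTime();
  const start = end - 30 * DAY_MS;
  const insiders = new Set<string>();
  let total = 0;
  for (const e of events) {
    const ts = e.pubDate.getTime();
    if (e.companyId !== companyId || ts < start || ts >= end) continue;
    insiders.add(e.insiderId);
    total += e.totalAmountEur;
  }
  return insiders.size * Math.log(1 + total);
}
```

A declaration with no prior BUY on the same company in the window scores exactly 0, which is consistent with the [0, 0] f-range of D1 and D2 in the table above.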
2.2 firstTimeBuyer (binary, 1 = insider's first ever BUY on this company)
| Bucket | f | n | mean r90 (%) | WR |
|---|---|---|---|---|
| f = 0 (repeat buyer) | 0 | 12,121 | −0.96 | 44.5% |
| f = 1 (first-time) | 1 | 3,050 | −2.78 | 42.6% |
Inverse of CMP 2012 : on this multi-market sample, first-time buyers underperform repeat insiders. Plausible explanation: board newcomers and qualifying-shares routine purchases dominate the f=1 bucket. CMP-2012-style "opportunistic vs routine" is better proxied by Insider.tradingPattern (already classified, already disclosed in winning-strategy.ts as excludeRoutineInsiders opt-in flag, also empirically inverse on FR data per audit 05).
2.3 betaAdjBaseline (−prior 90d sector mean of realised r90)
| Decile | n | mean r90 (%) | WR | f-range (= − % sector mean) |
|---|---|---|---|---|
| D1 (hot sector) | 1,222 | −2.55 | 50.6% | [−13.9, −2.4] |
| D5 | 1,222 | −1.49 | 46.5% | [+0.3, +0.9] |
| D10 (cold sector) | 1,223 | +0.05 | 49.1% | [+2.6, +9.0] |
Buying after a sector's realised 90-day mean return has fallen by roughly 2.6% or more (D10 f-range +2.6 to +9.0) adds about +2.6 pp vs buying into a hot sector (D1). t = +2.96. Modest, robust to PIT, dataset-internal (no external index needed).
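The lag that keeps this factor point-in-time clean can be made explicit. In the sketch below the inputs are restricted to declarations published at least 90 days before t, so their r90 is already realised. The `RealisedBuy` shape and function name are hypothetical, and the choice of a 90-day window ending at t − 90d is an assumption about what "prior 90d" means here.

```typescript
// Hypothetical shape for an already-realised BUY outcome.
interface RealisedBuy {
  sector: string;
  pubDate: Date;
  r90: number; // realised 90-day return, in %
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Negative of the mean realised r90 for the same sector, using only inputs
// with t − 180d < pubDate ≤ t − 90d. Returns null when the window is empty,
// which would explain a factor n below the full universe.
function betaAdjBaseline(history: RealisedBuy[], sector: string, t: Date): number | null {
  const end = t.getTime() - 90 * DAY_MS;
  const start = end - 90 * DAY_MS;
  let sum = 0;
  let n = 0;
  for (const h of history) {
    const ts = h.pubDate.getTime();
    if (h.sector !== sector || ts <= start || ts > end) continue;
    sum += h.r90;
    n++;
  }
  return n === 0 ? null : -(sum / n);
}
```

A positive value means the sector has recently been cold, which is the D10 end of the table above.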
2.4 pctMarketCap (= totalAmount_eur / marketCap)
| Decile | n | mean r90 (%) | WR | f-range |
|---|---|---|---|---|
| D1 (tiny trade) | 1,326 | −1.38 | 44.0% | ≈ 0 |
| D5 | 1,327 | −0.12 | 46.5% | ≈ 0 |
| D10 (large trade) | 1,327 | −4.16 | 40.9% | [10%, 96.9%] |
Counter-intuitive negative signal, t = −3.46. The largest insider buys (in % of market cap) underperform. Likely driver: heavy %MC trades are concentrated in distressed micro-caps where conviction is unreliable.
3. Recommendation — v6 composite
Two factors clear the bar : clear D1 vs D10 contrast, |t| > 3 unweighted (6.30 and 3.46 respectively), IC sign survives the multi-market sample, point-in-time clean.
| New factor | Proposed weight | Rationale |
|---|---|---|
| clusterStrengthScore (continuous version of isCluster) | +5 pts | Replaces the binary isCluster boolean entry-point with a smooth log-amount-weighted score. Use the existing isCluster as a hard gate (still required) but add decile-rank bonus. |
| pctMarketCap_neg (penalty for top-decile %MC) | −3 pts | Subtract from score when pctMarketCap > p90(rolling). Defensive; protects against distressed-micro-cap traps. |
Total composite reshuffle :
v6 composite = 0.32·signal + 0.33·winRate + 0.22·return + 0.08·recency
+ 0.05·clusterStrength30d_rank
− 0.03·pctMarketCap_top10_flag
(the five positive weights sum to 1.0 ; the −0.03 term is a penalty applied on top ; baseline v5 was 35 / 35 / 25 / 5).
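The formula above can be written out with the rolling p90 flag made explicit. This is a sketch under stated assumptions: every sub-score and the cluster rank are taken to be on a 0..1 scale, `trailingPctMcap` is a hypothetical array of point-in-time pctMarketCap values from the rolling window, and the interface and function names are illustrative.

```typescript
// Hypothetical inputs; all sub-scores and clusterRank assumed in [0, 1].
interface V6Inputs {
  signal: number;
  winRate: number;
  ret: number;
  recency: number;
  clusterRank: number;
  pctMcap: number | null;
  trailingPctMcap: number[]; // PIT sample used for the rolling p90
}

// Linear-interpolation percentile on an ascending-sorted, non-empty array.
function percentile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// 1 when the trade sits above the rolling p90 of pctMarketCap, else 0.
function pctMcapTop10Flag(pctMcap: number | null, trailing: number[]): 0 | 1 {
  if (pctMcap == null || trailing.length === 0) return 0;
  const sorted = [...trailing].sort((a, b) => a - b);
  return pctMcap > percentile(sorted, 0.9) ? 1 : 0;
}

function v6Composite(x: V6Inputs): number {
  return (
    0.32 * x.signal +
    0.33 * x.winRate +
    0.22 * x.ret +
    0.08 * x.recency +
    0.05 * x.clusterRank -
    0.03 * pctMcapTop10Flag(x.pctMcap, x.trailingPctMcap)
  );
}
```

With every input at its maximum and no flag, the composite tops out at 1.0; a flagged row tops out at 0.97.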
betaAdjBaseline (sector mean-reversion) has the right sign but its uplift is too thin to publish in v6. Park it for a v7 round once a proper PIT sector-index history (SectorIndexHistory) is wired into the recompute pipeline.
Drop on this run : earningsProximity (n too small, p > Bonferroni), dayOfWeek, daysFromMonthEnd, netInsiderDelta90d (needs SELLs to be meaningful), crossMarketCluster7d (already subsumed by isCluster).
4. Overfitting & honesty caveats
- K = 12 tests searched in this round; combined with the 583,200-trial grid behind the headline Sigma filter, the effective N_trials for any DSR computation on the final composite remains in the 10⁵+ range. Headline composite alpha should still be quoted with sharpeDeflated = null per current disclosure.
- Survivorship bias : delisted companies are missing from Company, so factor IC values are biased upward on names that survived 2015→2026. clusterStrength30d is less sensitive than absolute-return factors because its decile spread is driven by cluster geometry, not by realised outcome of dead names.
- Multi-market regime drift : 17-market mix is dominated by AMF (63%) and AFM (17%). The same factor on US Form 4 only could rank differently; cross-validation per market is the natural next step.
- pctMarketCap negative-sign caution : sign is robust within sample but driver story (distressed micro-caps) is hypothesis, not validated. Before shipping the −3pt penalty, run a sub-sample test on marketCap > 1B€ to confirm the penalty does not destroy alpha on large caps.
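The large-cap sub-sample test suggested in the last caveat could start from something as simple as the sketch below. The `Row` shape, the function name and the fixed 10% threshold are assumptions for illustration; a fuller check would also attach a Welch t-stat to the difference.

```typescript
// Hypothetical row shape; pctMarketCap is a fraction (0.10 = 10% of mcap).
interface Row {
  marketCapEur: number | null;
  pctMarketCap: number | null;
  r90: number; // winsorised 90-day return, in %
}

interface SubSampleResult {
  nFlagged: number;
  nRest: number;
  meanFlagged: number | null;
  meanRest: number | null;
}

function meanOrNull(xs: number[]): number | null {
  return xs.length === 0 ? null : xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Restrict to marketCap > minCapEur, then compare rows the penalty would hit
// against the rest. Rows with missing mcap inputs are skipped.
function largeCapPenaltyCheck(
  rows: Row[],
  pctThreshold = 0.1,
  minCapEur = 1e9,
): SubSampleResult {
  const flagged: number[] = [];
  const rest: number[] = [];
  for (const r of rows) {
    if (r.marketCapEur == null || r.pctMarketCap == null) continue;
    if (r.marketCapEur <= minCapEur) continue;
    (r.pctMarketCap > pctThreshold ? flagged : rest).push(r.r90);
  }
  return {
    nFlagged: flagged.length,
    nRest: rest.length,
    meanFlagged: meanOrNull(flagged),
    meanRest: meanOrNull(rest),
  };
}
```

If meanFlagged is not below meanRest on large caps, or nFlagged is too small to say, the −3 pt penalty would be better restricted to small caps.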
5. Concrete code patch proposal for winning-strategy.ts
The composite weights live in src/app/api/v1/scoring/... (computed offline by the scoring cron and persisted in Declaration.signalScore). The patch below is for the SCORING SIDE — winning-strategy.ts itself remains a filter layer, not a scorer. The proposal:
5.1 Add new fields to Declaration (optional, nullable)
// prisma/schema.prisma — model Declaration
clusterStrength30d Float? // log-amount-weighted directional cluster (rolling 30d)
pctMcapBucket Int? // 0..9 decile of pctOfMarketCap at scoring time (PIT)
5.2 Scoring cron — drop-in helpers
// src/app/api/v1/scoring/factors-v6.ts
export function clusterStrength30dRank(
  distinctInsiders: number,
  totalAmountEur: number,
  rankPctile: number, // 0..1, this row's rank within its 17-market peer group
): number {
  // no cluster activity in the window, or an unusable rank: no bonus
  if (distinctInsiders < 1 || totalAmountEur <= 0 || !Number.isFinite(rankPctile)) return 0;
  // bonus capped at +5 pts, linear in rank percentile (clamped to 0..1)
  return Math.round(5 * Math.min(1, Math.max(0, rankPctile)));
}

export function pctMcapPenalty(pctMcap: number | null): number {
  // missing or non-finite market-cap ratio: no penalty
  if (pctMcap == null || !Number.isFinite(pctMcap)) return 0;
  // top-decile threshold ~ 10% of mcap (empirical, audit 32); pctMcap is a fraction
  return pctMcap > 0.10 ? -3 : 0;
}
5.3 Composite formula update
// src/app/api/v1/scoring/composite.ts (illustrative)
const v6 =
  0.32 * signalSub
  + 0.33 * winRateSub
  + 0.22 * returnSub
  + 0.08 * recencySub
  + clusterStrength30dRank(nInsiders30d, totalAmountEur30d, peerRankPct)
  + pctMcapPenalty(decl.pctOfMarketCap);
5.4 Filter side — winning-strategy.ts only needs the cluster threshold lift
WINNING_STRATEGY.minScore should stay at 40 in v5→v6 transition; the new +5/−3 dynamic range slightly widens the score distribution but the +40 anchor still gates the top quartile. Re-run scripts/grid-search-v2.mjs AFTER v6 lands on a fresh nightly recompute, then move minScore if a new optimum emerges. Do not raise minScore blindly in the same release.
6. Files
- /Users/azoulay/DEV/insiders-trades/scripts/alpha-discovery.mjs : discovery harness
- /tmp/alpha-discovery.json : structured per-factor output (regenerate any time)
- /Users/azoulay/DEV/insiders-trades/docs/method-review/32-alpha-discovery.md : this doc
7. Suggested next steps
- Cross-validate clusterStrength30d per-market (FR / SEC / AFM / DAX) to confirm sign stability.
- Reload Company.nextEarningsDate for full 17-market coverage (currently 1,462 / 15,171 ≈ 10 %), then re-run earningsProximityDays (rank 1 in the table above) with proper n.
- Run pctMarketCap decile analysis stratified by market-cap bucket to validate the "distressed micro-cap" hypothesis before shipping the −3 pt penalty.
- Build a factor_correlations matrix between v5 sub-scores and the two v6 candidates to guard against redundancy.
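The redundancy check in the last step amounts to a pairwise correlation matrix over per-declaration columns. A minimal sketch follows; the function names and column keys are hypothetical, and plain Pearson correlation is used here as an assumption (a rank correlation may be preferable for heavy-tailed factors).

```typescript
// Pearson correlation of two equal-length columns.
function pearson(x: number[], y: number[]): number {
  if (x.length !== y.length || x.length < 2) {
    throw new Error("columns must have equal length >= 2");
  }
  const mx = x.reduce((s, v) => s + v, 0) / x.length;
  const my = y.reduce((s, v) => s + v, 0) / y.length;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < x.length; i++) {
    sxy += (x[i] - mx) * (y[i] - my);
    sxx += (x[i] - mx) ** 2;
    syy += (y[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy); // NaN if either column is constant
}

// Symmetric matrix keyed by column name, e.g. m["signalSub"]["clusterStrength30d"].
function correlationMatrix(
  columns: Record<string, number[]>,
): Record<string, Record<string, number>> {
  const names = Object.keys(columns);
  const out: Record<string, Record<string, number>> = {};
  for (const a of names) {
    out[a] = {};
    for (const b of names) {
      out[a][b] = pearson(columns[a], columns[b]);
    }
  }
  return out;
}
```

A high absolute correlation between a v6 candidate and an existing v5 sub-score would suggest the new term mostly re-weights information already in the composite.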