34 — Alpha Discovery (round 2)
Date : 2026-05-17
Script : scripts/alpha-discovery-round2.mjs
Universe : 15,171 BUY declarations with realised returnFromPub90d (winsorised at [−50.0%, +60.6%]). FR-dominated (~93%), 17-market priced subset of the 162,573-declaration corpus. Same universe as round 1 (audit 32), so figures are directly comparable.
Round 1 recap : 12 factors tested, only clusterStrength30d cleared the bar. Round 2 widens the search to 33 candidate factors with deeper feature engineering across 5 buckets (insider behaviour, company state, macro regime, microstructure, cross-source).
0. Methodology
For each candidate factor f:
- Compute f per declaration with strict point-in-time inputs: only events with pubDate < t (and, where the input is a realised return, pubDate + 90d ≤ t) enter rolling windows. The critical PIT-strict fix is described in §2.
- Spearman rank IC between f and r90.
- Decile split (Q = 10); for binary factors, where deciles degenerate, fall back to f = 0 vs f = 1 buckets.
- Welch t-stat top-decile vs bottom-decile.
- Bonferroni adjustment on K = 33 tests.
- Bailey–López de Prado DSR penalty √(2·ln K / T) with T = 10 yearly buckets → penalty = √(2·ln 33 / 10) ≈ 0.836.
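The per-factor statistics in the steps above can be sketched in TypeScript as follows. This is a minimal illustration, not the code in scripts/alpha-discovery-round2.mjs: the helper names (rank, spearmanIC, topBottom, welchT) are hypothetical, ties receive average ranks, and the binary-factor fallback assumes flags are encoded as 0/1.

```typescript
// Average 1-based ranks; tied values share the mean of their positions.
function rank(xs: number[]): number[] {
  const idx = xs.map((_, i) => i).sort((a, b) => xs[a] - xs[b]);
  const r = new Array<number>(xs.length);
  let i = 0;
  while (i < idx.length) {
    let j = i;
    while (j + 1 < idx.length && xs[idx[j + 1]] === xs[idx[i]]) j++;
    const avg = (i + j) / 2 + 1;
    for (let k = i; k <= j; k++) r[idx[k]] = avg;
    i = j + 1;
  }
  return r;
}

function mean(xs: number[]): number {
  return xs.reduce((s, v) => s + v, 0) / xs.length;
}

// Spearman rank IC = Pearson correlation computed on the ranks.
function spearmanIC(f: number[], r90: number[]): number {
  const a = rank(f);
  const b = rank(r90);
  const ma = mean(a);
  const mb = mean(b);
  let sab = 0, saa = 0, sbb = 0;
  for (let i = 0; i < a.length; i++) {
    sab += (a[i] - ma) * (b[i] - mb);
    saa += (a[i] - ma) * (a[i] - ma);
    sbb += (b[i] - mb) * (b[i] - mb);
  }
  return sab / Math.sqrt(saa * sbb);
}

// Top/bottom decile returns; a factor taking only 0/1 values falls back to f=1 vs f=0.
function topBottom(f: number[], r90: number[], q = 10): { top: number[]; bot: number[] } {
  const isBinary = f.every((v) => v === 0 || v === 1);
  if (isBinary) {
    return {
      top: r90.filter((_, i) => f[i] === 1),
      bot: r90.filter((_, i) => f[i] === 0),
    };
  }
  const order = f.map((_, i) => i).sort((a, b) => f[a] - f[b]);
  const k = Math.floor(f.length / q);
  return {
    bot: order.slice(0, k).map((i) => r90[i]),
    top: order.slice(f.length - k).map((i) => r90[i]),
  };
}

// Welch t-stat for mean(top) - mean(bot), without assuming equal variances.
function welchT(top: number[], bot: number[]): number {
  const variance = (xs: number[]) => {
    const m = mean(xs);
    return xs.reduce((s, x) => s + (x - m) * (x - m), 0) / (xs.length - 1);
  };
  const se = Math.sqrt(variance(top) / top.length + variance(bot) / bot.length);
  return (mean(top) - mean(bot)) / se;
}
```

With Q = 10 and n ≈ 15k, each decile holds roughly 1,500 declarations, so the Welch t-stat compares two samples of about that size.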
Four inputs (shortInterest, full earnings consensus, VIX, 10y-2y curve) are missing from the schema or too sparse in the universe; the affected factors are flagged below as "skipped honestly". The 33 factors actually tested are tabulated.
1. Ranked results (winsorised T+90, %)
PIT-clean ✓ ; snapshot/static fields are NOT point-in-time and marked ⚠.
| Rank | Factor | n | IC | t | ΔTop−Bot (% pts) | TopWR | BotWR | PIT |
|---|---|---|---|---|---|---|---|---|
| 1 | companyFirstFiling_yearsAgo | 14,798 | +0.123 | +11.21 | +7.31 | 47.6% | 33.5% | ✓ |
| 2 | regimeEURUSD_change30d | 15,171 | +0.081 | +9.72 | +6.08 | 61.0% | 48.5% | ✓ |
| 3 | companyAnalystReco_score | 9,667 | +0.073 | +9.34 | +8.08 | 49.1% | 37.7% | ⚠ snapshot |
| 4 | microPubLagDays | 14,977 | −0.073 | −9.03 | −5.67 | 32.0% | 43.0% | ✓ |
| 5 | insiderHabitualBuyer_count365d | 15,171 | +0.018 | −8.06 | −4.71 | 39.2% | 47.8% | ✓ |
| 6 | companyLogMcap | 15,130 | +0.072 | +7.13 | +5.05 | 51.5% | 35.6% | ⚠ snapshot |
| 7 | microMonthOfYear | 15,171 | +0.006 | +5.55 | +3.58 | 47.8% | 40.1% | ✓ |
| 8 | companyInsiderHeldPct | 15,071 | +0.034 | +5.55 | +3.71 | 43.7% | 39.0% | ⚠ snapshot |
| 9 | companyIPORecent3y_flag | 14,798 | −0.061 | −5.24 | −1.74 | 43.3% | 49.9% | ✓ |
| 10 | regimeSectorMomentum_60d | 12,019 | −0.030 | −4.48 | −4.18 | 51.2% | 53.2% | ✓ |
| 11 | insiderPriorWinRate_90d_shrunk (PIT-strict) | 9,071 | −0.055 | −4.40 | −3.36 | 38.8% | 52.7% | ✓ |
| 12 | companyMcapBucket_smallcap_flag | 15,130 | −0.040 | −3.90 | −1.45 | 40.5% | 45.7% | ⚠ snapshot |
| 13 | insiderRolePromoted_flag | 11,326 | +0.057 | +3.44 | +1.59 | 51.6% | 44.1% | ✓ |
| 14 | insiderClusterRecency_days | 12,784 | −0.008 | −3.35 | −2.41 | 43.7% | 52.8% | ✓ |
| 15 | companyDebtToEquity | 13,639 | +0.005 | −3.17 | −2.05 | 41.7% | 44.5% | ⚠ snapshot |
| 16 | companyPriceMomentum_90d | 9,534 | −0.108 | −2.97 | −3.28 | 37.1% | 50.2% | ✓ |
| 17 | companyPriceMomentum_30d | 9,861 | −0.087 | −2.41 | −2.70 | 40.4% | 56.2% | ✓ |
| 18 | companyBeta | 14,987 | −0.048 | −2.33 | −1.79 | 43.1% | 43.7% | ⚠ snapshot |
| 19 | microGap2pct_flag | 9,881 | +0.008 | +2.18 | +1.12 | 48.2% | 46.6% | ✓ |
| 20 | insiderRoundLot_flag | 15,171 | −0.014 | −2.07 | −0.63 | 43.8% | 44.5% | ✓ |
| 21–33 | (\|t\| < 2 or n < 1k) | | | | | | | |
Skipped honestly (sparse / no DB coverage) :
- companyShortInterest_ratio (n = 769 only)
- companyVolumeSpike_flag (no volume in PriceHistory)
- companyEarningsDeltaConsensus (no consensus history)
- regimeVIX_level (no VIX series)
- regimeYieldCurve_inverted (no rates)
- microPostFiling_drift_3d (lookahead by construction)
- companyDualListing_flag (no ISIN bridge coverage)
- clusterCrossMarket_count (no group-mapping table)

The missing macro inputs (VIX, 10y-2y curve) are the most painful absence: they would let us test regime-conditional alpha properly. Recommendations in §7.
2. Critical PIT-leakage finding
The first version of this script ranked insiderPriorWinRate_90d_shrunk at t = +31.5, IC = +0.355, TopWR 76% / BotWR 17%. That headline collapsed to t = −4.4, TopWR 38.8% / BotWR 52.7% after a one-line fix (h.ts + 90d ≤ t instead of h.ts < t). Without the gate, "prior trades" whose r90 was not yet realised at t shared the market regime of the current row, so the factor was leaking +35pp. Round 1 of this audit shipped before the fix, and the factor would have been over-published had the script been promoted blindly. The corrected sign is slightly negative: on this dataset, insiders with strong recent track records under-perform, which fits a CMP-2012-style "regression to the mean / routine repeat buyers" hypothesis, NOT alpha.
The same trap would have hit any factor of the form "recent realised performance of a related entity". Sector momentum (audit 32) escaped it because it was already lagged by 90d there; the new factor insiderHabitualBuyer_count365d is count-based (no r90 input), so it is safe.
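The difference between the two gates is easiest to see on a toy history. A minimal sketch, assuming epoch-millisecond timestamps and a simple pseudo-count shrinkage; the PriorTrade shape, the gate names and the shrinkage parameters p0 and m are hypothetical, not taken from the script.

```typescript
// One earlier declaration by the same insider (hypothetical shape).
interface PriorTrade {
  ts: number; // pubDate of the earlier declaration, epoch milliseconds
  r90: number; // its 90-day return; only observable once ts + 90d has passed
}

const DAY_MS = 24 * 60 * 60 * 1000;
const HORIZON_MS = 90 * DAY_MS;

type Gate = (h: PriorTrade, t: number) => boolean;

// First draft (leaky): any earlier trade enters, even if its r90 window ends after t.
const leakyGate: Gate = (h, t) => h.ts < t;

// PIT-strict (the fix): the earlier trade's 90-day window must have closed by t.
const strictGate: Gate = (h, t) => h.ts + HORIZON_MS <= t;

// Win rate shrunk toward a prior p0 with m pseudo-observations.
// p0 = 0.45 and m = 5 are illustrative values, not the script's parameters.
function priorWinRateShrunk(history: PriorTrade[], t: number, gate: Gate, p0 = 0.45, m = 5): number {
  const eligible = history.filter((h) => gate(h, t));
  const wins = eligible.filter((h) => h.r90 > 0).length;
  return (wins + m * p0) / (eligible.length + m);
}
```

With t fixed, a buy published 10 days earlier passes the leaky gate even though its r90 window closes 80 days after t; the strict gate drops it, so only outcomes that were knowable at t feed the factor.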
3. Top 5 honest verdicts
3.1 companyFirstFiling_yearsAgo (t = +11.21, IC = +0.123, PIT ✓)
The decile spread is roughly monotone: D1 (companies with no prior filing history) mean r90 −7.16%, D10 (4y+ filing history) +0.06%. Reading: insiders at newly filing companies (often recent IPOs or first-time AMF registrants) underperform. The signal is dominated by D1 vs D2–D10 (a stepped jump rather than a continuous trend), so it is functionally an "IPO penalty" factor; companyIPORecent3y_flag (rank 9) is the discrete version.
Verdict : real but it's selection (avoid recent IPOs), not a new alpha source.
3.2 regimeEURUSD_change30d (t = +9.72, IC = +0.081, PIT ✓)
D10 (EUR appreciated by more than 2.6% in the 30d before filing) → mean r90 +5.53%, WR 61.1%. D1 (EUR weakened by 2.5% or more) → r90 −0.64%. This is counter-intuitive vs the audit-32 hypothesis (EUR weakness drives foreign-buyer inflow): here EUR strength is bullish.
Verdict : plausible regime factor, but it co-moves with broad risk-on (when EUR strengthens, equities tend to rally globally), so it is mostly a proxy for the missing CAC40-SMA-50d trend filter. Until we wire actual CAC40 history (PR-able from Yahoo ^FCHI), this is the closest macro signal available.
3.3 microPubLagDays (t = −9.03, IC = −0.073, PIT ✓)
D10 (lag > 100 days) → r90 −7.59%, WR 32%. D1 (lag < 1.7 days) → r90 −1.92%, WR 43%. This confirms the existing v5 hard filter at pubDelay ≤ 7d: late-disclosed trades carry no information edge, because the alpha has dissipated by the time the market sees the PDF.
Verdict : already in the v5 winning-strategy filter as maxPubDelayDays: 7. Nothing to add to v6.
3.4 insiderHabitualBuyer_count365d (t = −8.06, PIT ✓)
D2 (range [0, 0], i.e. no prior buys) → r90 −4.72%; D10 (88–343 prior buys in the trailing year) → r90 −4.04%. Non-monotone, with a sweet spot in the middle (D6 +0.30%): both tails (no prior signal and hyperactive traders) underperform. The signal is hump-shaped (an inverted U), which makes it hard to use directly. Directionally consistent with the audit-05 finding that the CMP-2012 "routine" cohort under-performs.
Verdict : already partially captured by Insider.tradingPattern classification. Could promote to a soft penalty −1 pt when count365d > 50.
3.5 companyAnalystReco_score (t = +9.34, IC = +0.073, ⚠ SNAPSHOT — NOT PIT)
D10 (analyst score 4 = strong_buy) → r90 +8.08 pts above D1 (sell). DO NOT SHIP. Company.analystReco is a snapshot of the current rating, not the rating at filing time. A company that the sell-side rates "strong_buy" today probably also performed well during the last 90 days, so the factor is partly reading the future. A CompanyAnalystSnapshot time-series table is needed before this can be promoted.
Verdict : honestly inconclusive. The IC sign is plausible but the look-back is contaminated by snapshot bias.
4. v6 composite proposal
Round 1 already proposed +5pts cluster-strength rank, −3pts pctMcap-top10 penalty. Round 2 adds three changes, all small, all conditional on a fresh nightly recompute :
```
v6_round2 composite =
    0.30 · signal                                   // −5pts from v5.1 (35→30)
  + 0.32 · winRate                                  // −3pts (35→32)
  + 0.22 · return                                   // −3pts (25→22), redistributed
  + 0.07 · recency                                  // −3pts (10→7)
  + 0.05 · clusterStrength30d_rank                  // KEPT from round 1
  − 0.03 · pctMcap_top10_flag                       // KEPT from round 1
  − 0.02 · companyFirstFiling_yearsAgo<1y_flag      // NEW: IPO-recent penalty
  + 0.02 · pubLag_fresh3d_flag                      // NEW: ultra-fresh bonus (lag ≤ 3d)
  − 0.01 · insiderHabitualBuyer_count365d>50_flag   // NEW: hyperactive routine penalty
```
Net dynamic-range : +0.10 / −0.06 — moves are minimal so WINNING_STRATEGY.minScore should remain at 40 in the v6 transition (re-run grid-search-v2 after v6 lands, then re-anchor).
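For wiring into code, the composite formula maps onto a plain weighted sum. A sketch only: it assumes each v5.1 component (signal, winRate, return, recency) and clusterStrength30d_rank are already normalised to [0, 1], that flags are 0/1, and that unknown inputs trigger no flag; the V6Inputs field names are hypothetical.

```typescript
// Hypothetical input shape; continuous components assumed normalised to [0, 1].
interface V6Inputs {
  signal: number;
  winRate: number;
  ret: number; // the "return" component of the formula
  recency: number;
  clusterStrength30dRank: number;
  pctMcapTop10: boolean;
  firstFilingYearsAgo: number | null;
  pubDelayDays: number | null;
  habitualBuyerCount365d: number;
}

const flag = (b: boolean): number => (b ? 1 : 0);

function v6Round2Composite(x: V6Inputs): number {
  return (
    0.3 * x.signal +
    0.32 * x.winRate +
    0.22 * x.ret +
    0.07 * x.recency +
    0.05 * x.clusterStrength30dRank -
    0.03 * flag(x.pctMcapTop10) -
    0.02 * flag(x.firstFilingYearsAgo != null && x.firstFilingYearsAgo < 1) + // IPO-recent
    0.02 * flag(x.pubDelayDays != null && x.pubDelayDays <= 3) - // ultra-fresh disclosure
    0.01 * flag(x.habitualBuyerCount365d > 50) // hyperactive routine buyer
  );
}
```

Keeping each §4 change as a separate flag term makes it easy to switch them on or off one at a time when grid-search-v2 is re-run.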
Factors rejected for v6 despite |t| > 2 : regimeEURUSD_change30d (waiting on a real CAC40 series and a sector-conditional version), companyAnalystReco_score (snapshot bias unresolved), companyLogMcap / companyInsiderHeldPct / companyBeta (all snapshot fields, with the same PIT-bias risk as the audit-28 finding).
5. DSR-adjusted verdict
| Factor | t | Per-trade SR (t/√n) | Annualised crude (×√(252/90)) | DSR adj (− 0.836) | Verdict |
|---|---|---|---|---|---|
| companyFirstFiling_yearsAgo | 11.21 | 0.092 | 0.155 | −0.68 | DSR negative |
| regimeEURUSD_change30d | 9.72 | 0.079 | 0.132 | −0.70 | DSR negative |
| microPubLagDays | −9.03 | 0.074 | 0.124 | −0.71 | DSR negative |
| insiderHabitualBuyer_count365d | −8.06 | 0.065 | 0.109 | −0.73 | DSR negative |
| companyLogMcap (snapshot) | 7.13 | 0.058 | 0.097 | −0.74 | DSR negative |
No round-2 factor clears Bailey–López de Prado. The cross-sectional SR-equivalent on the strongest factor is ≈ 0.155, the DSR penalty for K=33 is 0.836 — even a hypothetical SR = 1.0 would barely clear. The honest published verdict aligns with audit 32 §4 : individual factor-level alpha is below the DSR floor on this universe ; the only defensible publish-worthy modifications to the v6 composite are the small weight perturbations in §4, applied conjointly with the existing v5 filter (which carries its own historical performance disclosure with sharpeDeflated = null).
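The arithmetic behind the DSR table can be reproduced with a few lines. This sketch mirrors the table's own formulas (|t|/√n, the crude ×√(252/90) annualisation, and the √(2·ln K / T) penalty); the function names are hypothetical.

```typescript
// Crude annualisation used in the table: 90-day horizon scaled by sqrt(252/90).
const ANNUALISE = Math.sqrt(252 / 90);

// Per-trade Sharpe proxy: |t| / sqrt(n).
function perTradeSR(t: number, n: number): number {
  return Math.abs(t) / Math.sqrt(n);
}

// Penalty used in this audit: sqrt(2 ln K / T), K tests over T yearly buckets.
function dsrPenalty(K: number, T: number): number {
  return Math.sqrt((2 * Math.log(K)) / T);
}

// DSR-adjusted value: annualised SR proxy minus the multiple-testing penalty.
function dsrAdjusted(t: number, n: number, K = 33, T = 10): number {
  return perTradeSR(t, n) * ANNUALISE - dsrPenalty(K, T);
}
```

Inverting the same arithmetic, a factor with n ≈ 15,000 would need |t| ≈ 0.836 / 1.673 × √15,000 ≈ 61 before its DSR-adjusted value turns positive.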
6. Files
- /Users/azoulay/DEV/insiders-trades/scripts/alpha-discovery-round2.mjs: round-2 discovery harness (33 factors, PIT-strict)
- /tmp/alpha-discovery-round2.json: structured per-factor output
- /Users/azoulay/DEV/insiders-trades/docs/method-review/34-alpha-discovery-round2.md: this doc
Patch proposal for src/lib/winning-strategy.ts (NOT applied — DON'T commit) :
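```typescript
// src/app/api/v1/scoring/factors-v6.ts (new file)

// IPO-recent penalty: -2pts when the first filing is less than 1 year old
// (the function name says 3y, but §4 specifies the <1y flag).
export function ipoRecent3yPenalty(firstFilingYearsAgo: number | null): number {
  if (firstFilingYearsAgo == null) return 0; // unknown filing history: no penalty
  return firstFilingYearsAgo < 1 ? -2 : 0; // -2pts (audit 34 §4)
}

// Ultra-fresh disclosure bonus: +2pts when the publication delay is 3 days or less.
export function pubLagFreshBonus(pubDelayDays: number | null): number {
  if (pubDelayDays == null) return 0; // unknown delay: no bonus
  return pubDelayDays <= 3 ? +2 : 0; // +2pts (audit 34 §4)
}

// Hyperactive routine buyer: -1pt above 50 prior buys in the trailing 365 days.
export function habitualBuyerPenalty(count365d: number): number {
  return count365d > 50 ? -1 : 0; // -1pt (audit 34 §4)
}

// composite update: additive on top of v5.1 (signal/winRate/return/recency)
// All weights renormalised so total = 1.0 ± 0.10 dynamic-range from §4 bonuses.
```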
WINNING_STRATEGY.minScore stays at 40 until a fresh scripts/grid-search-v2.mjs re-tunes the threshold after v6 lands.
7. Next steps
1. Wire ^FCHI Yahoo history into PriceHistory (single new symbol, daily, 2015→); this unlocks a proper regimeCAC50dma_above instead of the EUR/USD proxy.
2. Build a CompanyAnalystSnapshot time-series (daily cron snapshot of Yahoo recommendationKey) so companyAnalystReco_score can be PIT-tested honestly.
3. Re-run round 2 with PIT-strict snapshot fundamentals (needs a daily CompanyMarketCapSnapshot); the snapshot-flagged factors above will probably collapse the way insiderPriorWinRate_90d_shrunk did.
4. Stratify the 4 surviving PIT-clean factors per market (FR / SEC / AFM / pooled) to detect regime-specific alpha that the pooled ranking hides.
5. After (1) + (2) land, re-attempt VIX + 10y-2y series ingestion (FRED API, free tier) and re-run the regime block; these are the most likely candidates for genuine new alpha.
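Once ^FCHI closes sit in PriceHistory, the regimeCAC50dma_above flag from the first next step could be computed point-in-time along these lines. A hypothetical sketch: the Bar shape, the function name and the "strictly before pubDate" convention are assumptions, chosen to mirror the pubDate < t rule from §0.

```typescript
// Daily close; date as ISO "YYYY-MM-DD", so string comparison matches chronology.
interface Bar {
  date: string;
  close: number;
}

// 1 if the last close strictly before pubDate is above its trailing `window`-day SMA,
// 0 if not, null when there is not enough history. Bars must be sorted ascending.
function regimeCAC50dmaAbove(bars: Bar[], pubDate: string, window = 50): 0 | 1 | null {
  const past = bars.filter((b) => b.date < pubDate); // PIT: nothing on or after pubDate
  if (past.length < window) return null;
  const last = past.slice(-window);
  const sma = last.reduce((s, b) => s + b.close, 0) / window;
  return past[past.length - 1].close > sma ? 1 : 0;
}
```

Returning null rather than 0 when history is short keeps early-sample declarations out of the flag's "below trend" bucket.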