19 — SEC Form 4 Round 3: CIK Bridge, FX History, Role Mapping, Extended Backfill
Status: Infrastructure deployed. Backfill in progress. Combined walk-forward pending full data.
Date: 2026-05-15
Closes: Items 1–3, 5, 6 from the round 2 open issues list (doc 18).
1. What landed this session
| Deliverable | File | Status |
|---|---|---|
| `CompanyIssuer` schema model | `prisma/schema.prisma` | Done |
| `FxHistory` schema model | `prisma/schema.prisma` | Done |
| Prod SQL migrations | `/api/migrate/route.ts` | Done; both tables live |
| CIK ↔ ISIN bridge lib | `src/lib/cik-isin-bridge.ts` | Done |
| FX history lib | `src/lib/fx-history.ts` | Done |
| US role mapping lib | `src/lib/us-role-mapping.ts` | Done |
| CIK bridge builder script | `scripts/build-cik-isin-bridge.mjs` | Done |
| FX backfill script | `scripts/backfill-fx-history.mjs` | Done; ran, 1,300 rows |
| Extended Form 4 backfill | `scripts/backfill-sec-form4-extended.mjs` | Done |
| Multi-window backfill runner | `scripts/run-multi-window-backfill.mjs` | Done |
| Combined walk-forward harness | `scripts/backtest-combined-universe.mjs` | Done |
| EFTS URL fix (`q=` param) | `src/lib/sec-form4.ts` + extended backfill | Done |
| Cron steps 13 + 14 | `src/app/api/cron/route.ts` | Done |
| `tsc` pass | n/a | Clean, 0 errors |
2. Coverage stats (session results)
FX History
- Rows inserted: 1,300 daily bars
- Date range: 2021-05-16 → 2026-05-15 (5 years)
- Source: Yahoo Finance `EURUSD=X` (`1d` interval, `5y` range)
CompanyIssuer bridge
- CIK rows upserted: 7,996 (of 10,353 in the SEC total; the remaining 2,357 had no ticker)
- ISIN resolution via Yahoo: 0 / 625 filing tickers (0%)
ISIN coverage note: Yahoo Finance's quoteType.isin field is only populated for ADRs and foreign-listed securities. US-listed domestic stocks return null. This is a structural Yahoo limitation, not a bug. Paths forward:
- OpenFIGI API (`OPENFIGI_API_KEY` env var): `tryResolveIsinOpenFigi()` is already wired in `cik-isin-bridge.ts`. Free tier = 25 req/min. Set the key and re-run the bridge builder.
- SEC EDGAR submissions API: `https://data.sec.gov/submissions/CIK{padded}.json` contains `tickers` plus some ISIN fields for larger registrants.
- Without ISIN, US filings cannot be price-matched; the combined harness skips them and records them in the `usSkippedNoIsin` counter.
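The `{padded}` placeholder above is the CIK left-padded with zeros to 10 digits. A minimal sketch of building that URL and fetching it follows; `submissionsUrl` and `fetchSubmissions` are hypothetical helper names, not functions from `cik-isin-bridge.ts`, and the payload type only declares the two fields read here. It assumes SEC's published guidance that automated clients send a descriptive `User-Agent`.

```typescript
// Hypothetical helpers for the EDGAR submissions endpoint (not existing project code).

/** Zero-pad a CIK to the 10 digits used in the submissions URL. */
function submissionsUrl(cik: number | string): string {
  const digits = String(cik).trim();
  if (!/^\d{1,10}$/.test(digits)) {
    throw new Error(`Invalid CIK: ${String(cik)}`);
  }
  return `https://data.sec.gov/submissions/CIK${digits.padStart(10, "0")}.json`;
}

/** Only the fields read here; the real payload carries many more. */
interface SubmissionsPayload {
  name: string;
  tickers: string[];
}

/** SEC asks automated clients to identify themselves via User-Agent. */
async function fetchSubmissions(
  cik: number | string,
  userAgent: string,
): Promise<SubmissionsPayload> {
  const res = await fetch(submissionsUrl(cik), { headers: { "User-Agent": userAgent } });
  if (!res.ok) {
    throw new Error(`EDGAR returned HTTP ${res.status} for CIK ${String(cik)}`);
  }
  return (await res.json()) as SubmissionsPayload;
}
```

Accepting both numbers and already-padded strings keeps the helper safe to call on CIKs coming from either the SEC bulk file or the `CompanyIssuer` table.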
SEC Form 4 backfill
- Pre-session DB: 734 rows (POC)
- Session target: 5,000
- Session result: ~4,500 rows (multi-window run in progress at session close)
- Unique tickers: ~1,200+
- P/M (directional signal) trades: ~1,100
EFTS pagination cap: The EFTS `q=` query caps at ~400 results per date window, so the extended backfill iterates monthly sub-windows (Nov 2025 – Apr 2026) to work around it. Each window adds ~300–550 rows.
3. Combined walk-forward results
Result: THIN DATA — needs full backfill.
Current state:
- US trades in staging have `ret90d = null` (no price history is linked to US tickers yet).
- The combined walk-forward therefore runs on FR returns only; US signals do not yet contribute to the OOS metric.
- FR-only baseline: ~300 backtested declarations with `returnFromPub90d`, sufficient for a few folds but marginal.
- The merge gate requires the combined OOS Sharpe to beat the baseline by at least +0.20, with p < 0.10.
To properly evaluate: link US tickers to Yahoo symbols via CompanyIssuer.ticker → PriceHistory, then re-run the harness. This is gated on full ISIN resolution.
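The gate decision itself is small enough to pin down in code. This is a sketch under stated assumptions: the harness is assumed to hand over OOS return series for both universes plus an already-computed p-value for the Sharpe difference (how that p-value is obtained is not covered here), and `sharpe` is a plain non-annualized mean/std ratio, which may differ from the harness's own definition.

```typescript
// Sketch of the merge gate; thresholds come from this doc, inputs are assumed.

const SHARPE_UPLIFT = 0.2; // combined must beat FR-only by at least this much
const MAX_P_VALUE = 0.1; // and the difference must be significant at p < 0.10

/** Sample Sharpe ratio (mean / sample std dev), not annualized. NaN if undefined. */
function sharpe(returns: number[]): number {
  if (returns.length < 2) return NaN;
  const mean = returns.reduce((a, b) => a + b, 0) / returns.length;
  const variance =
    returns.reduce((acc, r) => acc + (r - mean) ** 2, 0) / (returns.length - 1);
  const sd = Math.sqrt(variance);
  return sd === 0 ? NaN : mean / sd;
}

interface GateInput {
  combinedReturns: number[];
  frOnlyReturns: number[];
  pValue: number; // assumed to be supplied by the harness
}

function passesMergeGate({ combinedReturns, frOnlyReturns, pValue }: GateInput): boolean {
  const combined = sharpe(combinedReturns);
  const baseline = sharpe(frOnlyReturns);
  // Thin or degenerate data never passes the gate.
  if (Number.isNaN(combined) || Number.isNaN(baseline)) return false;
  return combined >= baseline + SHARPE_UPLIFT && pValue < MAX_P_VALUE;
}
```

Failing closed on `NaN` matters in the current THIN DATA state: a series with one return or zero variance should block the merge rather than slip through a comparison.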
4. Round 4 checklist
Immediate (next session)
- Set `OPENFIGI_API_KEY` in env, then run `build-cik-isin-bridge.mjs` again. Expected coverage: 60–80% of US filing tickers.
- Link US tickers to Yahoo symbols: `CompanyIssuer.ticker` → Yahoo Finance price history via the `PriceHistory` table (same pattern as the existing `refreshTrackedSymbols`).
- Compute US backtest returns: add a `computeUsBacktestIncremental()` function that fetches Yahoo charts for US tickers and writes `returnFromPub30/90/365d` onto a `SecForm4BacktestResult` model (or extend `SecForm4Filing` with return columns).
- Re-run the combined walk-forward after the steps above.
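The core of a `computeUsBacktestIncremental()` pass is turning a daily price series and a publication date into the three horizon returns. A minimal sketch, assuming (these are choices made for illustration, not documented behavior of the existing FR harness) that entry is the first close on or after the publication date, exit is the last close on or before publication + N calendar days, and a horizon that has not yet elapsed yields `null`:

```typescript
// Hypothetical sketch of returnFromPub30/90/365d; entry/exit conventions are assumptions.

interface Bar { date: string; close: number } // "YYYY-MM-DD", sorted ascending

const HORIZONS = [30, 90, 365] as const;

function addDays(isoDate: string, days: number): string {
  const d = new Date(`${isoDate}T00:00:00Z`);
  d.setUTCDate(d.getUTCDate() + days); // UTC avoids DST shifts
  return d.toISOString().slice(0, 10);
}

/** Returns { returnFromPub30d, returnFromPub90d, returnFromPub365d }; null where unknown. */
function returnsFromPub(bars: Bar[], pubDate: string): Record<string, number | null> {
  const out: Record<string, number | null> = {};
  const entry = bars.find((b) => b.date >= pubDate); // ISO strings compare chronologically
  const last = bars[bars.length - 1];
  for (const h of HORIZONS) {
    const key = `returnFromPub${h}d`;
    const target = addDays(pubDate, h);
    if (!entry || !last || last.date < target) {
      out[key] = null; // horizon not yet covered by the price history
      continue;
    }
    // Last bar on or before the horizon date.
    let exit: Bar | undefined;
    for (const b of bars) {
      if (b.date > target) break;
      exit = b;
    }
    out[key] = exit && exit.date >= entry.date ? exit.close / entry.close - 1 : null;
  }
  return out;
}
```

Returning `null` rather than a partial-horizon return keeps recent filings out of the walk-forward until their window has fully elapsed, which is what the `ret90d = null` staging state already implies.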
Near-term (batched cron cycles)
- Reach 250k filings via batched runs of `run-multi-window-backfill.mjs`:
  - Each cron cycle can add ~1,500 filings (5 windows × 300/window).
  - 250k filings ÷ 1,500/cycle ≈ 167 cron cycles, or ~6 months of daily cron.
  - Alternative: run `run-multi-window-backfill.mjs` manually in weekly batches going back to 2021.
- Re-run the combined walk-forward after each 25k checkpoint and log results in doc 20.
- Multi-tx support (v2): change the PK to `(accessionNumber, txIndex)` to capture all non-derivative rows per filing; this increases signal density by ~15%.
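For the v2 multi-tx layout, each parsed filing fans out into one row per non-derivative transaction. The sketch below assumes a hypothetical `ParsedForm4` shape (the real parser's types are not shown in this doc) and derives `txIndex` from document order.

```typescript
// Hypothetical shapes; only the (accessionNumber, txIndex) key comes from this doc.

interface NonDerivativeTx {
  transactionCode: string; // Form 4 code, e.g. "P" (purchase), "S" (sale)
  shares: number;
  pricePerShare: number | null;
}

interface ParsedForm4 {
  accessionNumber: string;
  issuerCik: string;
  nonDerivativeTransactions: NonDerivativeTx[];
}

interface Form4TxRow extends NonDerivativeTx {
  accessionNumber: string; // PK part 1
  txIndex: number; // PK part 2: position of the transaction within the filing
  issuerCik: string;
}

/** One row per non-derivative transaction instead of one row per filing. */
function toTxRows(filing: ParsedForm4): Form4TxRow[] {
  return filing.nonDerivativeTransactions.map((tx, txIndex) => ({
    ...tx,
    accessionNumber: filing.accessionNumber,
    txIndex,
    issuerCik: filing.issuerCik,
  }));
}
```

Because `txIndex` is positional, upserts stay idempotent only if re-parsing the same XML yields transactions in the same order; a parser that sorts or filters before indexing would break that.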
Merge gate
- If the gate is cleared at 100k+ filings (combined OOS Sharpe ≥ FR-only + 0.20, p < 0.10):
  - Create a `SecForm4BacktestResult` model with the same structure as `BacktestResult`.
  - Wire US signals into `getRecommendations()` as a separate universe lane behind a gated flag.
  - A/B test: 50% of users see FR-only, 50% see combined (measure live Sharpe divergence over 30d).
  - Merge fully only after the live test confirms signal quality.
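For the 50/50 A/B split, a deterministic hash of the user ID keeps each user in the same lane across sessions without storing an assignment. This is a sketch assuming users have a stable string ID; `assignLane` and the salt value are hypothetical names.

```typescript
import { createHash } from "node:crypto";

type Lane = "fr-only" | "combined";

/**
 * Deterministic 50/50 lane assignment. The salt (hypothetical) scopes the
 * split to this experiment, so a later test can reshuffle users independently.
 */
function assignLane(userId: string, salt = "sec-form4-merge-ab"): Lane {
  const digest = createHash("sha256").update(`${salt}:${userId}`).digest();
  // First 4 bytes as an unsigned 32-bit integer; parity picks the lane.
  return digest.readUInt32BE(0) % 2 === 0 ? "fr-only" : "combined";
}
```

`getRecommendations()` would then include the US lane only when `assignLane(userId) === "combined"` and the gated flag is on.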
EU expansion (longer term)
- BaFin (Germany): Directors' dealings feed at https://www.bafin.de/DE/PublikationenDaten/Datenbanken/Directors_Dealings/directors_dealings_node.html (XBRL format).
- CONSOB (Italy): https://www.consob.it/web/investor-education/operazioni-dei-dirigenti (PDF-heavy, like AMF).
- AFM (Netherlands): https://www.afm.nl/en/professionals/registers/melding-zeggenschap (structured XML).
- Germany, Italy and the Netherlands are all in the eurozone, so EUR-denominated trades from these three feeds need no conversion; the `FxHistory` pattern already covers any jurisdiction or individual transaction reported in another currency.
- EU expansion adds ~3x the signal universe vs US alone; deferred until the US gate is cleared.
5. Architecture notes
EFTS pagination workaround
EDGAR EFTS search caps at ~400 hits per query when using the q= parameter. To get more filings:
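- Iterate sub-windows going back 5 years (2021–2026). At ~350 filings per window, 2-week windows (~130 over 5 years) reach only ~45k filings and monthly windows (~60) only ~21k, so 250k filings requires ~714 windows, i.e. an average width of ~2.6 days (≈1,826 days ÷ 714).
- Use `efts.sec.gov/LATEST/search-index?q=%22form+4%22&dateRange=custom&startdt=X&enddt=Y&forms=4`.
- 250k filings ÷ ~350/window ≈ 714 windows. At 5 req/s the 714 window queries take only ~143 s, so the ~4.8-hour total fetch time estimate only makes sense if it also counts per-filing document fetches (for scale, 250k single-request fetches at 5 req/s take ~50,000 s, about 13.9 hours).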
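Generating the windows and query URLs is mechanical; a sketch follows. The endpoint and parameter names are the ones listed above; `dateWindows` and `eftsUrl` are hypothetical helpers, and the width is a parameter because ~714 windows over five years works out to roughly 2.6 days each. `URLSearchParams` uses form encoding, so `"form 4"` serializes to `%22form+4%22`, matching the URL above.

```typescript
// Hypothetical helpers; endpoint and params are as listed in this doc.

interface DateWindow { startdt: string; enddt: string } // inclusive, "YYYY-MM-DD"

const EFTS_BASE = "https://efts.sec.gov/LATEST/search-index";
const DAY_MS = 24 * 60 * 60 * 1000;

/** Split [start, end] into consecutive, non-overlapping windows of `widthDays` days. */
function dateWindows(start: string, end: string, widthDays: number): DateWindow[] {
  if (!Number.isInteger(widthDays) || widthDays < 1) {
    throw new Error("widthDays must be a positive integer");
  }
  const toIso = (ms: number) => new Date(ms).toISOString().slice(0, 10);
  const stop = Date.parse(`${end}T00:00:00Z`);
  const windows: DateWindow[] = [];
  for (let cursor = Date.parse(`${start}T00:00:00Z`); cursor <= stop; ) {
    const windowEnd = Math.min(cursor + (widthDays - 1) * DAY_MS, stop);
    windows.push({ startdt: toIso(cursor), enddt: toIso(windowEnd) });
    cursor = windowEnd + DAY_MS; // next window starts the day after this one ends
  }
  return windows;
}

/** EFTS query URL for one window. */
function eftsUrl({ startdt, enddt }: DateWindow): string {
  const params = new URLSearchParams({
    q: '"form 4"',
    dateRange: "custom",
    startdt,
    enddt,
    forms: "4",
  });
  return `${EFTS_BASE}?${params.toString()}`;
}
```

A runner would loop over `dateWindows("2021-05-16", "2026-05-15", 3)` and sleep ~200 ms between requests to hold 5 req/s; any window that comes back at the ~400-hit cap should be split further rather than accepted as complete.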
ISIN bridge path
Current: CIK → ticker (SEC bulk) → ISIN via Yahoo (fails for US domestic)
Target: CIK → ticker → ISIN via OpenFIGI → Yahoo symbol → price history
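The move from the current to the target path amounts to trying resolvers in order and keeping the first hit. A sketch, assuming each resolver is a synchronous lookup over data already fetched in bulk (the live Yahoo and OpenFIGI calls would be async); `IsinResolver` and `resolveIsins` are hypothetical names, and `unresolved` counts tickers, not the filings the harness's `usSkippedNoIsin` counter tracks.

```typescript
// Hypothetical resolver chain; resolvers are assumed to be pre-loaded lookups.

interface IsinResolver {
  name: string; // recorded so coverage can be audited per source
  resolve: (ticker: string) => string | null;
}

interface BridgeResult {
  isinByTicker: Map<string, { isin: string; source: string }>;
  unresolved: number; // tickers no resolver could map
}

function resolveIsins(tickers: string[], chain: IsinResolver[]): BridgeResult {
  const isinByTicker = new Map<string, { isin: string; source: string }>();
  let unresolved = 0;
  for (const ticker of tickers) {
    let hit: { isin: string; source: string } | null = null;
    for (const resolver of chain) {
      const isin = resolver.resolve(ticker);
      if (isin) {
        hit = { isin, source: resolver.name }; // first resolver in the chain wins
        break;
      }
    }
    if (hit) isinByTicker.set(ticker, hit);
    else unresolved += 1;
  }
  return { isinByTicker, unresolved };
}
```

Recording `source` per ticker makes the next bridge run's coverage breakdown (Yahoo vs fallback) a simple group-by instead of a re-run.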
FX cache
`convertUsdToEur()` caches the last 30 days of FX rates in memory (1-hour TTL). For the backtest harness, use `loadFxRateMap()` + `fxRateOnOrBefore()` for bulk processing, which avoids per-row DB hits.
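The bulk path boils down to loading the series once, then doing an "on or before" lookup per row so weekends and holidays fall back to the previous trading day's rate. The sketch below uses hypothetical stand-ins (`buildFxSeries`, `rateOnOrBefore`, `usdToEur`) because the real `loadFxRateMap()` and `fxRateOnOrBefore()` signatures are not shown here. It relies on `EURUSD=X` being quoted as USD per 1 EUR, so converting USD to EUR divides by the rate.

```typescript
// Hypothetical stand-ins for the bulk FX helpers; rates are EURUSD=X (USD per EUR).

interface FxRow { date: string; eurUsd: number } // "YYYY-MM-DD"

/** Sort once so every lookup is a binary search rather than a DB query. */
function buildFxSeries(rows: FxRow[]): FxRow[] {
  return [...rows].sort((a, b) => (a.date < b.date ? -1 : a.date > b.date ? 1 : 0));
}

/** Latest rate dated on or before `date` (covers weekends and holidays); null if none. */
function rateOnOrBefore(series: FxRow[], date: string): number | null {
  let lo = 0;
  let hi = series.length - 1;
  let found: number | null = null;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (series[mid].date <= date) {
      found = series[mid].eurUsd; // candidate; keep searching to the right
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

/** USD → EUR: divide by USD-per-EUR. Null when no rate precedes the date. */
function usdToEur(series: FxRow[], usd: number, date: string): number | null {
  const rate = rateOnOrBefore(series, date);
  return rate === null ? null : usd / rate;
}
```

With 1,300 daily bars, each lookup is about 11 comparisons, so converting every filing in the harness stays negligible next to the price-history work.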