18 — SEC Form 4 Ingestion: Mapping Decisions + POC Findings
Status: Staging only. NOT merged into the live recommendation engine.
Date: 2026-05-15
Next step: Round 3 (full backfill + OOS validation; see checklist below)
1. Why a separate table
| Dimension | AMF `Declaration` | SEC `SecForm4Filing` |
|---|---|---|
| Jurisdiction | France | USA |
| Filing delay | 4 business days | 2 business days |
| Currency | EUR | USD |
| Cumulative net position | `insiderCumNet` field | Not in Form 4 (would require accumulation logic) |
| Role taxonomy | PDG / Administrateur / DG | Officer / Director / 10% owner |
| Market universe | Euronext Paris | US exchanges (NYSE, Nasdaq) |
| ISIN presence | Required in AMF report | Not in Form 4; the issuer is identified by CIK (primary key) and trading symbol |
| Signal score | `signalScore` computed | Not computed yet in staging |
Merging into `Declaration` would require: (a) USD→EUR FX conversion for amount parity, (b) a CIK→ISIN bridge, (c) a role taxonomy bridge. This is premature while there is no out-of-sample (OOS) evidence of additive alpha.
2. Transaction code mapping
| Code | EDGAR meaning | Signal interpretation |
|---|---|---|
| P | Open-market purchase | BUY — strongest signal (insider spent own money) |
| S | Open-market sale | SELL — meaningful but noisier (could be diversification) |
| M | Option exercise | BUY context — insider is capturing upside |
| A | Grant/award | Non-market — excluded from directional signal in v1 |
| F | Tax withholding (share surrender) | Non-market — excluded |
| G | Gift | Non-market — excluded |
| D | Disposition to issuer | Non-market — excluded |
| J | Other acquisition or disposition | Ambiguous; excluded in v1 |
| U | Tender offer | Context-dependent — excluded in v1 |
| C | Conversion of derivative | Excluded in v1 |
| X | Exercise of in-the-money or at-the-money derivative | Excluded in v1 |
In v1 only P and S are directional signals. M could be added in v2 if paired with same-day S (option+sale = synthetic open-market sale).
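A minimal sketch of the v1 rule, assuming codes arrive as the single-letter `transactionCode` values from the Form 4 XML. The names `signal_direction` and `is_synthetic_sale` are hypothetical, and the M+S pairing only illustrates the v2 idea above; it is not implemented behaviour.

```python
from typing import List, Optional

# v1: only open-market purchases (P) and sales (S) are directional.
DIRECTIONAL_V1 = {"P": "BUY", "S": "SELL"}


def signal_direction(code: str) -> Optional[str]:
    """Return 'BUY' or 'SELL' for v1 directional codes, None for every other code."""
    return DIRECTIONAL_V1.get(code.strip().upper())


def is_synthetic_sale(same_day_codes: List[str]) -> bool:
    """Hypothetical v2 rule: an option exercise (M) and a sale (S) reported by the
    same insider on the same day are treated as one synthetic open-market sale."""
    codes = {c.strip().upper() for c in same_day_codes}
    return "M" in codes and "S" in codes
```

Everything outside P and S (including M on its own) yields no direction, which keeps the v1 signal restricted to cash-at-risk trades.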
3. Why only non-derivative transactions in v1
Form 4 contains two tables:
- Non-derivative table: shares of common stock — single price × shares = USD amount. Clean.
- Derivative table: options, warrants, convertible notes. Requires grant date, exercise price, expiry, and conversion ratio to compute economic value. Complex multi-leg structure.
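v1 ingests only the first non-derivative transaction per filing. Filings with no non-derivative transaction (derivative-only) return null and are skipped. They account for most of the 167/1000 null-parse filings in the POC (16.7%); the remainder are malformed or SGML-only filings (see section 5).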
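The v1 parse rule can be sketched with the standard-library XML parser. This assumes the EDGAR Form 4 XML layout (`ownershipDocument` → `nonDerivativeTable` → `nonDerivativeTransaction`, with amounts wrapped in `<value>` elements); the function name and the returned dict shape are hypothetical, not the staging code.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def _text(node: ET.Element, path: str) -> Optional[str]:
    """Stripped text of the element at `path`, or None if it is missing or empty."""
    el = node.find(path)
    if el is None or el.text is None or not el.text.strip():
        return None
    return el.text.strip()


def parse_first_non_derivative(xml: str) -> Optional[dict]:
    """v1 rule: keep only the first non-derivative transaction of a filing.

    Returns None when the filing has no nonDerivativeTransaction (e.g. a
    derivative-only filing), i.e. a null parse that the caller skips.
    """
    root = ET.fromstring(xml)  # root element is <ownershipDocument>
    tx = root.find("nonDerivativeTable/nonDerivativeTransaction")
    if tx is None:
        return None
    shares = float(_text(tx, "transactionAmounts/transactionShares/value") or 0)
    # Price can be absent (footnote only); 0 makes such rows fall out of non-zero-price stats.
    price = float(_text(tx, "transactionAmounts/transactionPricePerShare/value") or 0)
    return {
        "issuerCik": _text(root, "issuer/issuerCik"),
        "ticker": _text(root, "issuer/issuerTradingSymbol"),
        "code": _text(tx, "transactionCoding/transactionCode"),
        "shares": shares,
        "price": price,
        "amountUsd": shares * price,  # single price x shares
    }
```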
4. Outstanding integration concerns
4a. CIK ↔ ISIN cross-reference
- SEC CIKs are the primary key on EDGAR. Most US-listed companies have an ISIN.
- EDGAR's company tickers endpoint (https://www.sec.gov/files/company_tickers_exchange.json) maps CIK → ticker → exchange.
- For ISIN, SEC EDGAR does not publish a CIK→ISIN table. An external source is required: the OpenFIGI API (free; maps ticker + exchange → FIGI, but its mapping responses do not include an ISIN field, so this must be verified) or the GLEIF LEI database.
- Without ISIN, US filings cannot be price-matched to Yahoo Finance symbols for backtesting.
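A sketch of the CIK→ticker lookup, assuming the file keeps its `{"fields": [...], "data": [[...], ...]}` layout with `cik`, `ticker` and `exchange` columns. One CIK can list several tickers (share classes), so the index keeps a list. Function names are hypothetical; the download helper sends a User-Agent header because SEC asks automated clients to identify themselves.

```python
import json
import urllib.request
from typing import Dict, List, Optional, Tuple

TICKERS_URL = "https://www.sec.gov/files/company_tickers_exchange.json"
Index = Dict[int, List[Tuple[str, Optional[str]]]]


def fetch_company_tickers(user_agent: str) -> dict:
    """Download the EDGAR ticker file with a descriptive User-Agent (name + contact)."""
    req = urllib.request.Request(TICKERS_URL, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def build_cik_index(payload: dict) -> Index:
    """Map CIK -> [(ticker, exchange), ...]; exchange may be null in the source."""
    fields = payload["fields"]
    i_cik, i_ticker, i_exch = (fields.index(k) for k in ("cik", "ticker", "exchange"))
    index: Index = {}
    for row in payload["data"]:
        index.setdefault(int(row[i_cik]), []).append((row[i_ticker], row[i_exch]))
    return index


def tickers_for(index: Index, issuer_cik: str) -> List[Tuple[str, Optional[str]]]:
    """Form 4 issuerCik is typically zero-padded text ("0000320193"); normalise to int."""
    return index.get(int(issuer_cik), [])
```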
4b. USD→EUR fx conversion
- AMF `Declaration.totalAmount` is in EUR. SEC amounts are in USD.
- For cross-jurisdiction comparison and a combined backtest, we need daily USD/EUR rates.
- Source: ECB daily reference rates (`sdw-wsrest.ecb.europa.eu`) or Yahoo Finance `EURUSD=X`. Both quote USD per 1 EUR, so EUR amount = USD amount ÷ rate.
- A `FxHistory` table (similar to `PriceHistory`) would hold the daily USDEUR close.
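A sketch of the conversion, assuming a daily series quoted as USD per 1 EUR (EUR = USD ÷ rate). Days without a fixing (weekends, holidays) fall back to the most recent earlier rate. `EurUsdSeries` is a hypothetical in-memory stand-in for the proposed `FxHistory` table.

```python
import bisect
from datetime import date
from typing import Dict


class EurUsdSeries:
    """In-memory stand-in for a daily FX table; rates are USD per 1 EUR."""

    def __init__(self, rates: Dict[date, float]) -> None:
        self._days = sorted(rates)  # sorted once so each lookup is O(log n)
        self._rates = dict(rates)

    def rate_on_or_before(self, day: date) -> float:
        """Latest available fixing on or before `day` (covers weekends and holidays)."""
        i = bisect.bisect_right(self._days, day) - 1
        if i < 0:
            raise LookupError(f"no EURUSD rate on or before {day}")
        return self._rates[self._days[i]]

    def usd_to_eur(self, amount_usd: float, day: date) -> float:
        # Rate is USD per EUR, so divide to go from USD to EUR.
        return amount_usd / self.rate_on_or_before(day)
```

Which date to convert on (transaction date or filing date) is left open here; the as-of lookup works the same either way.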
4c. Role taxonomy bridge
- US `officerTitle` is free text: "Chief Executive Officer", "CFO", "Director", "EVP Operations", etc.
- AMF `insiderFunction` uses a controlled vocabulary: PDG, DG, Administrateur, Directeur Financier, etc.
- Bridge needed: a regex/NLP classifier mapping US title → AMF bucket → signal weight bucket.
- PDG equivalent = CEO/President; Administrateur equivalent = Director; Directeur Financier equivalent = CFO.
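A first-pass regex version of the bridge, using only the equivalences listed above. The bucket strings and rule order are assumptions; DG and other AMF functions are not mapped, and anything unmatched falls into a hypothetical `Other` bucket. "Director" is matched only as the whole title so that "Director of Sales" is not read as a board seat, and "Vice President" is excluded from the President rule.

```python
import re

# Ordered rules: first match wins, so "President and CEO" lands in PDG.
_RULES = [
    ("PDG", re.compile(
        r"\b(chief executive officer|ceo)\b|(?<!vice )(?<!vice-)\bpresident\b", re.I)),
    ("Directeur Financier", re.compile(r"\b(chief financial officer|cfo)\b", re.I)),
    # Whole-title match only, so functional titles like "Director of Sales" fall through.
    ("Administrateur", re.compile(
        r"^\s*(independent\s+|non-executive\s+)?director\s*$", re.I)),
]


def classify_us_title(officer_title: str) -> str:
    """Map a free-text Form 4 officerTitle to an AMF-style bucket, else 'Other'."""
    title = officer_title or ""
    for bucket, pattern in _RULES:
        if pattern.search(title):
            return bucket
    return "Other"
```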
4d. Multi-transaction filings (v1 limitation)
- Current: one row per filing, first non-derivative transaction only.
- Impact: some filings report a grant plus a same-day sale (2 rows in the XML). We capture only the first row (the grant).
- Fix in v2: change the PK to `(accessionNumber, txIndex)` and ingest all non-derivative rows.
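The v2 fix can be sketched as: enumerate every `nonDerivativeTransaction` in document order, use that position as `txIndex`, and let the composite primary key make re-ingestion idempotent. SQLite stands in for Neon/Postgres here, and the table name `sec_form4_tx` and its columns are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Tuple

Row = Tuple[str, int, str, float]


def non_derivative_rows(accession_number: str, xml: str) -> List[Row]:
    """One row per nonDerivativeTransaction, in document order; txIndex is 0-based."""
    root = ET.fromstring(xml)
    rows: List[Row] = []
    txs = root.findall("nonDerivativeTable/nonDerivativeTransaction")
    for tx_index, tx in enumerate(txs):
        code = (tx.findtext("transactionCoding/transactionCode") or "").strip()
        shares = (tx.findtext("transactionAmounts/transactionShares/value") or "").strip() or "0"
        rows.append((accession_number, tx_index, code, float(shares)))
    return rows


def store(conn: sqlite3.Connection, rows: List[Row]) -> None:
    """Composite PK makes re-running the same filing a no-op."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sec_form4_tx (
               accessionNumber TEXT NOT NULL,
               txIndex INTEGER NOT NULL,
               code TEXT,
               shares REAL,
               PRIMARY KEY (accessionNumber, txIndex))"""
    )
    # SQLite syntax; the Postgres/Neon equivalent is INSERT ... ON CONFLICT DO NOTHING.
    conn.executemany("INSERT OR IGNORE INTO sec_form4_tx VALUES (?, ?, ?, ?)", rows)
    conn.commit()
```

With this key, the grant + same-day sale example becomes rows `(acc, 0, "A", …)` and `(acc, 1, "S", …)` instead of a single grant row.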
5. POC results (backfill run: 30 days, 1000 refs)
Run date: 2026-05-15
Duration: ~14 minutes (1000 HTTP requests at 5 req/s)
Universe: 30 days back, EFTS full-text search for forms=4, ~10,000 total available in window
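A sketch of a global 5 req/s limiter that concurrent workers can share: each call reserves the next free slot under a lock, then sleeps outside the lock until that slot. The class is hypothetical; the clock and sleep functions are injectable only to make it testable. SEC's fair-access guidance caps automated traffic at 10 requests per second, so 5 req/s leaves headroom.

```python
import threading
import time
from typing import Callable


class RateLimiter:
    """Global limiter shared across threads: successive calls start >= 1/rate s apart."""

    def __init__(self, rate_per_sec: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._interval = 1.0 / rate_per_sec
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = float("-inf")  # no call made yet

    def acquire(self) -> None:
        # Reserve the next free slot under the lock, then wait outside it so
        # other workers can reserve later slots concurrently.
        with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self._interval
        wait = slot - now
        if wait > 0:
            self._sleep(wait)


# Usage: one shared instance; call acquire() before every EDGAR request.
SEC_LIMITER = RateLimiter(5)
```

At 5 req/s, 1,000 requests need at least 200 s of spacing, so the ~14-minute POC runtime is not explained by the rate limit alone. For the ~250k-filing backfill, one request per filing implies at least 50,000 s (about 14 hours) no matter how many workers share the limiter.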
| Metric | Value |
|---|---|
| Filing refs collected | 1,000 (capped) |
| Successfully inserted | 313 |
| Skipped (null-parse) | 167 |
| Skipped (derivative-only) | included in null-parse |
| Errors (Neon idle timeout) | 2 (transient, self-recovered) |
| Unique tickers | 364 |
Null-parse rate: 16.7% — mostly derivative-only filings (no nonDerivativeTable) and a small subset of malformed / SGML-only filings from smaller filers.
Transaction code distribution (all rows in staging table)
| Code | Description | Count |
|---|---|---|
| A | Grant | 275 |
| S | Sale | 142 |
| M | Option exercise | 117 |
| P | Purchase | 87 |
| F | Tax withholding | 40 |
| J | Other | 22 |
| G | Gift | 19 |
| U | Tender offer | 16 |
| D | Disposition | 10 |
| C | Conversion | 5 |
| X | In/at-the-money derivative exercise | 1 |
Directional signal candidates (P + S): 229 out of 734 rows = 31%. Grants (A) dominate at 37%.
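The percentages above can be re-derived from the distribution table; a small check like this keeps the summary line in sync with the counts when the table is regenerated. Counts are copied from the table; the function name is hypothetical.

```python
from collections import Counter

# Counts copied from the staging-table distribution above.
CODE_COUNTS = Counter({"A": 275, "S": 142, "M": 117, "P": 87, "F": 40, "J": 22,
                       "G": 19, "U": 16, "D": 10, "C": 5, "X": 1})


def summary(counts: Counter) -> dict:
    total = sum(counts.values())
    directional = counts["P"] + counts["S"]  # v1 directional codes only
    return {
        "total": total,
        "directional": directional,
        "directional_pct": round(100 * directional / total, 1),
        "grant_pct": round(100 * counts["A"] / total, 1),
    }


print(summary(CODE_COUNTS))
# {'total': 734, 'directional': 229, 'directional_pct': 31.2, 'grant_pct': 37.5}
```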
Mean transaction amount (USD)
$3,750,960 (n=377 filings with a non-zero price). High mean driven by large block sales from executives at large-cap companies.
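Because a handful of large block sales can pull the mean far above a typical filing, reporting the median next to it is one way to check that explanation. A sketch, assuming each row provides shares and price per share and that rows without a positive price are excluded (mirroring the n=377 filter); names are hypothetical.

```python
import statistics
from typing import Iterable, Tuple


def amount_summary(rows: Iterable[Tuple[float, float]]) -> dict:
    """rows are (shares, price_per_share); rows without a positive price are skipped."""
    amounts = [shares * price for shares, price in rows if price > 0]
    if not amounts:
        return {"n": 0, "mean": None, "median": None}
    return {
        "n": len(amounts),
        "mean": statistics.fmean(amounts),
        "median": statistics.median(amounts),
    }
```

A mean many times the median would support the block-sale explanation; similar values would not.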
Top 5 issuers by filing count
| Count | Issuer |
|---|---|
| 16 | Apellis Pharmaceuticals, Inc. |
| 11 | AEVEX Corp. |
| 10 | GDS Holdings Ltd |
| 9 | FIRST HORIZON CORP |
| 9 | GXO Logistics, Inc. |
6. Round 3 checklist (full integration)
- FX history: Add a `FxHistory` table and daily USDEUR backfill (ECB or Yahoo `EURUSD=X`).
- CIK→ticker bridge: Ingest `company_tickers_exchange.json` from EDGAR → CIK→ticker→exchange lookup table.
- ISIN bridge: Call the OpenFIGI API for each unique ticker → get ISIN → join to Yahoo Finance for price history.
- Full backfill: Remove the 1,000-ref cap and backfill the last 5 years (2021–2026), ~250k filings estimated. Use concurrent batching with a 5 req/s global rate limit.
- Multi-tx support: Change the PK to `(accessionNumber, txIndex)` to capture all non-derivative rows per filing.
- Role bridge: Build a regex classifier for US officer titles → AMF role buckets → signal weight tier.
- Signal scoring: Compute a `signalScore` equivalent for US filings using the same quant features (pct of market cap, cluster detection, insider role weight, trading pattern).
- Walk-forward backtest: Re-run walk-forward on the combined FR+US universe using retail-view returns (pubDate+1 entry).
- Merge gate: Only merge into the live engine if combined OOS Sharpe ≥ FR-only OOS Sharpe + 0.20. Otherwise, US data stays staged.
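The merge gate reduces to one comparison once both OOS Sharpe ratios exist. A sketch, assuming per-period strategy returns, a zero risk-free rate and 252 periods per year for annualization; if the walk-forward produces per-trade or monthly returns, `periods_per_year` has to change. Function names are hypothetical.

```python
import math
import statistics
from typing import List


def annualized_sharpe(returns: List[float], periods_per_year: int = 252) -> float:
    """Mean / sample stdev of per-period returns, scaled by sqrt(periods_per_year)."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    sd = statistics.stdev(returns)
    if sd == 0:
        raise ValueError("zero volatility: Sharpe undefined")
    return statistics.fmean(returns) / sd * math.sqrt(periods_per_year)


def merge_gate(combined_oos_sharpe: float, fr_only_oos_sharpe: float,
               margin: float = 0.20) -> bool:
    """Round 3 rule: merge only if combined OOS Sharpe >= FR-only OOS Sharpe + margin."""
    return combined_oos_sharpe >= fr_only_oos_sharpe + margin
```

Both Sharpe values must come from the same OOS windows and the same annualization, otherwise the 0.20 margin is not comparable.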