49. Schema harmony: MIC, country, ISIN, currency
Date: 2026-05-17 Author: schema-harmony pass (refactor session) Status: applied to Neon production (Company.market + countryCode), pending for ISIN uniqueness
1. Context
Audit before this pass found Company.market was a dumpster:
887 null # legacy AMF (FR) imports
181 XOSL # valid MIC
120 XAMS # valid MIC
107 XMIL # valid MIC
79 "NONE" # sentinel
65 "SIX" # legacy SIX-Swiss alias (now XSWX)
61 XSTO # valid MIC
60 XHEL # valid MIC
46 XCSE
35 "N/A" # sentinel
32 XMAD
23 XBRU
19 XETR
18 XWBO
8 "NA" # sentinel
…
6,000+ rows: tickers (ATI, HROW, SPG, J, CRM, GOOGL …)
600+ rows: numeric CVM / DART codes (24112, 034020, 5258, 8451 …)
3 rows: free-form labels (LSE, BIT, BIST …)
In total: 6,729 distinct values across 9,249 companies. 767 were already valid MIC codes; the other ~92% required normalization.
2. Canonical key tables
The three load-bearing tables are:
Company
id— cuid (PK)slug— uniqueamfToken— unique. Primary regulator key. Format<REGULATOR>:<external_id_or_slug>for non-FR, or bareRS00xxxxxfor FR (legacy AMF).isin— nullable. Currently NOT unique. See section 4.market— now an ISO 10383 MIC (XPAR, XAMS, XNAS, XHKG, BVMF, …). Nullable when truly unknown.countryCode— new column. ISO 3166-1 alpha-2 (FR, US, JP, …). Nullable when unknown. Indexed.
Declaration
amfId— uniquecompanyId,insiderIdtotalAmount— EUR-normalized (cumulative scoring uses EUR throughout).nativeAmount+nativeCurrency— original-currency snapshot (preserved for audit and per-country display).currency— ISO 4217 oftotalAmount. Always non-null after this pass.
IngestionRun / per-regulator filing tables
Out of scope for this pass: they hold raw, regulator-shaped data and stage into Company/Declaration via scripts/run-merge.ts.
3. The mapping
Single source of truth: src/lib/markets/mic-mapping.json. Three layers:
regulatorToMic— the canonical mapping fromamfTokenprefix to{ mic, country, currency }. Examples:SEC:→XNAS / US / USD(defaults to NASDAQ; cannot reliably split XNAS / XNYS / XASE without per-ticker enrichment)BAFIN:→XETR / DE / EURRNS:→XLON / GB / GBPCVM:→BVMF / BR / BRLDART:→XKRX / KR / KRWHKEX:→XHKG / HK / HKDEDINET:/JP:→XTKS / JP / JPY- no prefix →
XPAR / FR / EUR(legacy AMF)
dirtyValueMap— covers free-formCompany.marketvalues:- already-MIC values pass through unchanged (
XAMS→XAMS) - Yahoo-style suffixes are translated (
PA→XPAR,L→XLON,DE→XETR,MI→XMIL,AS→XAMS,T→XTKS, …) - sentinels (
NONE,N/A,NA,GTM, empty,null) map tonull
- already-MIC values pass through unchanged (
micToCountry+micToCurrency— full MIC → ISO 3166 / ISO 4217 reference table; covers 20+ markets including the BRICS/PIIGS/ASEAN expansion targets.
Resolver resolveCanonical(amfToken, market) in src/lib/markets/mic.ts:
- try
dirtyValueMap[market]— if it hits, use that MIC - else use the regulator prefix on
amfToken - else (no prefix → legacy AMF) →
XPAR / FR / EUR - derive country + currency from the MIC
This is idempotent: clean rows stay clean, dirty rows converge.
4. ISIN normalization & uniqueness
Helper (this pass)
src/lib/markets/isin-normalize.ts:
normalizeIsin(raw)— strips whitespace / hyphens / dots, uppercases, fixes OCR errors at position 12 (the numeric check digit:O→0,I→1,S→5,B→8,Z→2), validates shape^[A-Z]{2}[A-Z0-9]{9}[0-9]$.isValidIsinCheckDigit(isin)— Luhn-mod-10 over letter-expanded digits.normalizeIsinStrict(raw)— both combined; returnsnullfor anything that won't pass.
Uniqueness plan (NOT applied this pass)
The audit folded 16 duplicate-ISIN companies on commit e23e9f1. Going forward we want a partial unique index rather than a strict unique constraint, because:
- ETFs, dual-listings, and pre-IPO drafts share ISINs across legitimate
Companyrows in some regulators (notably CVM Brazil and HKEX H-shares). - We still want to forbid duplicates within the same
(market, isin)pair.
Proposed migration (Simon to review):
-- Partial unique index: enforce uniqueness only when isin is present.
CREATE UNIQUE INDEX IF NOT EXISTS "Company_market_isin_unique"
ON "Company" ("market", "isin")
WHERE "isin" IS NOT NULL;
Pre-flight check before applying:
SELECT market, isin, COUNT(*)
FROM "Company"
WHERE isin IS NOT NULL
GROUP BY market, isin
HAVING COUNT(*) > 1;
If any rows return, run the dedupe migration first (mirror of the e23e9f1 logic, scoped to (market, isin)).
5. Currency standardization
After this pass:
Declaration.currencyis non-null on 100% of 173,488 rows. The 68 nulls were backfilled from each row'sCompany.market→ MIC → currency.- Per-market expectation:
| MIC | country | currency |
|---|---|---|
| XPAR / XAMS / XETR / XMIL / XMAD / XHEL / XWBO / XBRU / XDUB / XLIS | FR / NL / DE / IT / ES / FI / AT / BE / IE / PT | EUR |
| XLON | GB | GBP |
| XOSL | NO | NOK |
| XCSE | DK | DKK |
| XSTO | SE | SEK |
| XSWX | CH | CHF |
| XNAS / XNYS / XASE / OTCM | US | USD |
| XASX | AU | AUD |
| XTSE / XTSX | CA | CAD |
| XTKS | JP | JPY |
| XKRX / XKOS | KR | KRW |
| XSES | SG | SGD |
| XBOM / XNSE | IN | INR |
| XHKG | HK | HKD |
| BVMF | BR | BRL |
No FX conversion is performed by this pass. totalAmount remains EUR-normalized (existing nativeAmount + nativeCurrency preserved untouched).
6. What changed in the DB
| Surface | Before | After |
|---|---|---|
Company.market distinct values |
6,729 | 22 (all valid MIC) |
Company.market NULL |
887 | 0 |
Company.countryCode |
column did not exist | 100% populated, 22 distinct values |
Declaration.currency NULL |
68 | 0 |
7. Pending migrations (queued for Simon's review)
Company_market_isin_uniquepartial unique index (section 4).- Optional: tighten
Company.markettoNOT NULL(would need a backfill default likeXXXXfor the rare unmapped row). Currently kept nullable to honor "we genuinely don't know". - Optional: a
mic_lookupreference table seeded frommic-mapping.jsonso the DB can enforcemarket IN (...)via FK rather than via app-layer validation.
8. Files
src/lib/markets/mic-mapping.json— canonical mapping tablesrc/lib/markets/mic.ts— typed accessors +resolveCanonicalsrc/lib/markets/isin-normalize.ts— ISIN cleaner + Luhn checkscripts/normalize-market-mic.ts— one-shot DB cleanup (idempotent,--applyto write)prisma/migrations/20260517800000_company_country_code/migration.sql— addscountryCode+ indexesprisma/schema.prisma— Company model updated withcountryCode, indexes onmarket+countryCode