28, Financial Coherence Audit (Senior Quant Review)
Reviewer: senior quantitative analyst (audit role)
Scope: scoring v3-v5, multi-market scoring, backtest pipeline, STRATEGY_PROOF, FX, recommendation engine
Date: 2026-05-17
Verdict (TL;DR): PREVIEW-ONLY. Multi-market scoring is NOT production-ready. FR alone is shippable with disclosure. 6 CRITICAL coherence defects detected; full list below.
0. Preconditions check
Task spec required:
- `Declaration` ≥ 50,000 + ≥10 market namespaces: NOT MET (33,165 declarations, 8 market namespaces incl. `fr` legacy).
- `docs/method-review/25-final-multi-market-scoring.md`: DOES NOT EXIST (latest doc is `24-value-prop-2026.md`). Yet `recommendation-engine.ts:60` already cites this missing doc in a comment (speculative reference).
- `BacktestResult` per-market for ≥3 markets: NOT MET. `BacktestResult` has no `market` column; all 24,124 BUY rows derive from FR only (after joining via `Company.market`, the column is corrupted with ticker symbols and 77% NULL).
Decision: the preconditions are explicitly not met, but waiting another 5h would not change the structural issues exposed below (architecture / data-quality, not data-volume). The audit proceeds on the current state and must be re-run after the missing prerequisites land.
1. CRITICAL incoherences
C1, Currency labels are systematically wrong (all non-FR markets)
Severity: CRITICAL. What's wrong: Every non-FR declaration is tagged currency='EUR' in the DB regardless of filing venue. That is wrong for SEC (USD), RNS (GBP), SIX (CHF) and SEDI (CAD), and correct only by accident for BaFin, Consob and CNMV (all EUR).
Evidence (DB):
us n=6206 currency='EUR' (should be USD)
uk n=176 currency='EUR' (should be GBP)
ch n=226 currency='EUR' (should be CHF)
ca n=68 currency='EUR' (should be CAD)
Also 27 rows with currency='VOL' (invalid ISO-4217 code; junk parser output).
Why it matters: scoring uses raw totalAmount (no FX conversion anywhere in recommendation-engine.ts nor backtest-compute.ts, grep currency|fx|USD returns 0 matches). A €10M cluster in FR and a $10M cluster in US are treated as identical in size buckets and pctOfMarketCap thresholds. Backtest cross-market aggregation produces meaningless averages.
Proposed fix:
- In each market ingester, write the correct ISO currency code (`SECForm4 → USD`, `RNS → GBP`, `SEDI → CAD`, `SIX → CHF`).
- Backfill DB: `UPDATE Declaration SET currency='USD' WHERE amfId LIKE 'SEC:%'` etc.
- Introduce `lib/fx-history.ts::convertToEur(amount, currency, asOfDate)` covering CHF/GBP/CAD/USD with proper EURXXX pairs. The current helper is hardcoded to `PAIR='EURUSD'`.
- Call `convertToEur` everywhere scoring/backtest reads `totalAmount`.
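A minimal sketch of the proposed helper, assuming rates are quoted as EURXXX (units of foreign currency per 1 EUR) and that the rate lookup is injected. Only the `SEC:` prefix is confirmed by this audit; the other prefixes, the `RateLookup` type and `currencyForAmfId` are hypothetical names used for illustration, not the existing `fx-history.ts` API.

```typescript
// Hypothetical types and names: a sketch, not the actual fx-history.ts API.
type Currency = "EUR" | "USD" | "GBP" | "CHF" | "CAD";

// Returns the EURXXX close (units of XXX per 1 EUR) on or before asOfDate,
// or null when no point-in-time rate is available.
type RateLookup = (pair: string, asOfDate: string) => number | null;

// Only the "SEC:" amfId prefix is confirmed; the other three are assumed
// to follow the same pattern.
const PREFIX_CURRENCY: Array<[string, Currency]> = [
  ["SEC:", "USD"],
  ["RNS:", "GBP"],
  ["SIX:", "CHF"],
  ["SEDI:", "CAD"],
];

function currencyForAmfId(amfId: string): Currency {
  for (const [prefix, currency] of PREFIX_CURRENCY) {
    if (amfId.startsWith(prefix)) return currency;
  }
  // Assumption: everything else (FR, BaFin, Consob, CNMV) is EUR-native.
  // A stricter version would whitelist those prefixes as well.
  return "EUR";
}

function convertToEur(
  amount: number,
  currency: Currency,
  asOfDate: string,
  lookup: RateLookup,
): number | null {
  if (currency === "EUR") return amount;
  const rate = lookup(`EUR${currency}`, asOfDate);
  // No usable PIT rate: return null rather than guess with a recent close.
  if (rate === null || !(rate > 0)) return null;
  // EURXXX quotes XXX per EUR, so a foreign amount is divided by the rate.
  return amount / rate;
}
```

Returning `null` instead of a fallback rate keeps unconvertible trades visible to the caller, which can then exclude them explicitly rather than size-bucket them with a wrong amount.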
C2, FX history coverage gaps make non-EUR markets un-priceable
Severity: CRITICAL. What's wrong:
EURUSD 1300 bars 2021-05-16 → 2026-05-15 (5y, OK)
CADEUR 261 bars 2025-05-14 → 2026-05-15 (1y, partial)
CHFEUR 66 bars 2026-02-16 → 2026-05-15 (3 months, hole)
GBPEUR 24 bars 2026-04-14 → 2026-05-15 (1 month, hole)
NOKEUR missing
SEKEUR missing
JPYEUR missing
AUDEUR missing
SIX SER (CH) filings reach back years; with 3 months of CHF, every historical trade gets a NULL rate or near-current snapshot, both wrong for PIT scoring.
Why it matters: the backtest will silently drop or mis-convert non-EUR trades. Filling a missing rate from the most recent close instead of the PIT close introduces look-ahead bias.
Proposed fix: extend fx-history.ts to support all MARKET_CURRENCY codes; backfill ≥5y of daily closes per pair from Yahoo (EURGBP=X, EURCHF=X, etc.). Add a CI alarm if any required pair has gap >7 trading days.
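The CI alarm could be sketched as below. It approximates trading days as Monday to Friday and ignores market holidays, which is an assumption; `findFxGaps` and `GapReport` are hypothetical names, and the input is assumed to be the sorted list of bar dates for one pair.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

/** Number of weekdays (Mon-Fri) strictly between two ISO dates. */
function weekdaysBetween(fromIso: string, toIso: string): number {
  const end = Date.parse(`${toIso}T00:00:00Z`);
  let count = 0;
  for (let t = Date.parse(`${fromIso}T00:00:00Z`) + DAY_MS; t < end; t += DAY_MS) {
    const dow = new Date(t).getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (dow !== 0 && dow !== 6) count++;
  }
  return count;
}

interface GapReport {
  pair: string;
  from: string;
  to: string;
  missingWeekdays: number;
}

/** Reports every hole longer than maxMissing weekdays between consecutive bars. */
function findFxGaps(pair: string, sortedDates: string[], maxMissing = 7): GapReport[] {
  const gaps: GapReport[] = [];
  for (let i = 1; i < sortedDates.length; i++) {
    const missing = weekdaysBetween(sortedDates[i - 1], sortedDates[i]);
    if (missing > maxMissing) {
      gaps.push({ pair, from: sortedDates[i - 1], to: sortedDates[i], missingWeekdays: missing });
    }
  }
  return gaps;
}
```

This only detects holes inside the stored history. A separate check on the first bar date per pair would still be needed to catch series that simply start too late, as CHFEUR and GBPEUR do here.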
C3, Outlier handling absent in scoring + backtest stats
Severity: CRITICAL. What's wrong:
- Max `return90d` in BacktestResult = +1385.56% on a single trade (likely a small-cap pump or split-adjustment glitch).
- Top `totalAmount` rows include Jun Hu, €34.6 billion (2026-05-13) and Pershing Square €4.1B mislabeled EUR (almost certainly USD). 34.6B EUR for a single insider is structurally impossible.
- 1,160 declarations exceed €10M and 223 exceed €100M, with zero capping or winsorization in `backtest-compute.ts` (`sharpe`, `avg`, `winRate` use raw arrays).
- `quant-stats.ts::winsorize` EXISTS but is not invoked by `backtest-compute.ts`, only by `winning-strategy.ts` aggregation.
Why it matters: the cross-sectional Sharpe and average returns published to users are inflated by tail observations. A single ill-converted trade can swing aggregate stats by several percent.
Proposed fix:
- Apply `winsorize(returns, 0.01)` (1% each tail) in `aggregateGroup()` before computing avg/Sharpe.
- Cap `totalAmount` displayed/scored at the market-conditioned 99.5th percentile (~€50M for FR).
- Add a sanity-check script that flags trades >€100M as `needsReview=true` and excludes from aggregates pending manual triage.
- Re-derive Sharpe with both raw and winsorized, publish both with footnote.
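For illustration, symmetric winsorization at 1% per tail can be sketched as follows. This is a stand-in written for this audit, not the body of `quant-stats.ts::winsorize`, whose exact signature and quantile convention were not reviewed here; linear interpolation between order statistics is an assumption.

```typescript
/** Quantile by linear interpolation on an ascending-sorted array. */
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

/** Clamps each value into the [tail, 1 - tail] quantile range; input order is kept. */
function winsorize(values: number[], tail = 0.01): number[] {
  if (values.length === 0) return [];
  const sorted = [...values].sort((a, b) => a - b);
  const lower = quantile(sorted, tail);
  const upper = quantile(sorted, 1 - tail);
  return values.map((v) => Math.min(upper, Math.max(lower, v)));
}

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Synthetic example: 99 trades at +1% and one +1385.56% outlier.
const rawReturns = [...Array(99).fill(1), 1385.56];
const clipped = winsorize(rawReturns, 0.01);
console.log(mean(rawReturns), mean(clipped));
```

Publishing both the raw and the winsorized mean, as proposed above, lets a reader see how much of the headline depends on a handful of trades.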
C4, Sharpe formula divergence between modules
Severity: HIGH. What's wrong:
`backtest-compute.ts:34` uses population stdev `sqrt(ss/n)`. `quant-stats.ts:147` uses sample stdev `sqrt(ss/(n-1))` (Bessel's correction). For n=15k the bias is 0.003% (immaterial), but reports cite different Sharpe values for the same dataset depending on which module computed them.
Why it matters: reproducibility / audit-grade claim broken, two endpoints can show two Sharpes.
Proposed fix: import sampleStdDev from quant-stats.ts everywhere; delete the local sharpe() in backtest-compute.ts.
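The size of the divergence follows from the ratio of the two estimators, sqrt(n/(n-1)). A short sketch (function names are illustrative, not the ones in either module):

```typescript
/** Standard deviation with either the population (n) or sample (n - 1) divisor. */
function stdDev(xs: number[], useSampleDivisor: boolean): number {
  const n = xs.length;
  const m = xs.reduce((sum, x) => sum + x, 0) / n;
  const ss = xs.reduce((sum, x) => sum + (x - m) * (x - m), 0);
  return Math.sqrt(ss / (useSampleDivisor ? n - 1 : n));
}

/** Relative gap between the sample and population estimators for a given n. */
function relativeGap(n: number): number {
  return Math.sqrt(n / (n - 1)) - 1;
}

// For n = 15,000 the gap is roughly 3.3e-5, i.e. about 0.003%.
console.log(relativeGap(15000));
```

The gap is immaterial at universe scale but grows for small subsets, so per-market or per-filter Sharpes computed by the two modules would diverge visibly.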
C5, STRATEGY_PROOF.sharpe = cross-sectional dispersion, not a Sharpe ratio
Severity: HIGH. What's wrong: STRATEGY_PROOF.sharpe = 1.87 is documented (lines 24–36 of winning-strategy.ts) as the cross-sectional mean/σ of individual 90d trade returns. This is a dispersion statistic, NOT a portfolio Sharpe (no time aggregation, no risk-free subtraction, not annualized). The comparable figures are sharpeAnnualized ≈ 0.40 and sharpeDeflated = null (negative after Bailey–López de Prado DSR correction on a 583,200-trial grid search).
The published headline in strategy-proof.ts::strategyHeadline() returns STRATEGY_PROOF.sharpe.toFixed(2) = "1.87", i.e. the disallowed figure. The internal sharpeDisclosure() function adds the disclaimer, but strategyHeadline() does not, and the latter is consumed by UI components.
Why it matters: regulatory/copy risk. CSA-style "fair, clear, not misleading" requires that the published Sharpe be the deflated one (negative → not advertisable) or the annualized one (0.40 with wide CI). Citing 1.87 alone without the asterisk = misrepresentation.
Proposed fix:
- `strategyHeadline()` must return `sharpeAnnualized` (or null if statistically insignificant), never the cross-sectional value bare.
- Audit every UI consumer for the bare `STRATEGY_PROOF.sharpe` reference (`grep -r "STRATEGY_PROOF.sharpe\b"`).
- If `sharpeDeflated < 0`, the headline must show `dsrNote` and downgrade Sharpe to "see methodology".
- Cite Bailey & López de Prado (2014) "The Deflated Sharpe Ratio" in the disclosure copy.
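One way the headline rule could look, as a sketch only. The `StrategyProof` shape is a guess at the relevant fields, and treating `sharpeDeflated = null` the same as a negative value is an assumption made here (the null is described above as standing for a negative DSR).

```typescript
// Hypothetical shape: only the field names come from this audit.
interface StrategyProof {
  sharpe: number;                  // cross-sectional mean/sigma, never shown bare
  sharpeAnnualized: number | null;
  sharpeDeflated: number | null;   // null is treated as "not advertisable" here
  dsrNote: string;
}

function strategyHeadline(proof: StrategyProof): string {
  const notAdvertisable =
    proof.sharpeAnnualized === null ||
    proof.sharpeDeflated === null ||
    proof.sharpeDeflated < 0;
  if (notAdvertisable) {
    // Downgrade: no Sharpe figure in the headline, disclosure note instead.
    return `Sharpe: see methodology (${proof.dsrNote})`;
  }
  return `Annualized Sharpe ${proof.sharpeAnnualized!.toFixed(2)}`;
}
```

With the values reported in this audit (1.87 / 0.40 / null), this sketch returns the "see methodology" string, so the 1.87 figure never reaches a UI component through the headline.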
C6, Strategy-proof numbers vs DB re-derivation: massive divergence
Severity: CRITICAL. What's wrong:
| Metric | Published (STRATEGY_PROOF) | Re-derived on universe (BUY, n=15171) | Δ |
|---|---|---|---|
| Sample size n | 173 | 15,171 | 88× |
| Win rate | 77.0% | 44.3% | -32.7 pts |
| Avg return T+90 (retail-view) | +13.2% | -0.72% | -13.9 pts |
| Cross-sectional Sharpe | 1.87 | -0.026 | -1.90 |
The published n=173 is a tiny filtered subset (the strategy filter: cluster + acquisition + mid-cap + role + freshness + score ≥30). The universe-level numbers re-derived from BacktestResult.returnFromPub90d show a near-zero-alpha, negative-Sharpe distribution.
Why it matters: the published numbers are conditional on the strategy filter, THIS IS LEGITIMATE as long as the filter is honest. But the headline does not surface the conditional. A reader sees 77% win rate, +13.2% and infers it applies broadly; the disclaimer is only in code comments.
Sub-issue: the strategy filter contains a survivorship/selection trap flagged in winning-strategy.ts:325:
LEAKAGE(CRITICAL): Mid-cap filter uses Company.marketCap (current snapshot).
A company classified as "mid-cap today" may have been small-cap or large-cap
at the time of the historical trade, causing inclusion/exclusion bias in the
380 winning signals.
This is unfixed (TODO-1). It inflates the win rate by including only companies that survived to be mid-cap today.
Proposed fix:
- Re-run the strategy filter with PIT marketCap from `CompanyMarketCapSnapshot` (already in schema), not the current snapshot.
- Re-publish `STRATEGY_PROOF` numbers with the corrected filter; expect win rate to drop closer to 60–65%.
- Add a sub-component to the UI: "Filter universe: 15,171 BUY trades. Strategy selects 173 (1.1%). Universe mean: -0.72%. Filtered mean: +13.2%. Improvement attributable to filter: +13.9 pts, p=?, deflated Sharpe=?."
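The PIT version of the mid-cap filter might be sketched as follows. The snapshot field names are assumed (the `CompanyMarketCapSnapshot` schema was not inspected column by column), the €200M to €1B window is the FR-calibrated one cited in M5, and excluding trades with no snapshot on or before the trade date is a design choice of this sketch.

```typescript
// Hypothetical row shape for CompanyMarketCapSnapshot.
interface MarketCapSnapshot {
  companyId: string;
  asOf: string;         // ISO day the snapshot was taken
  marketCapEur: number;
}

/** Latest market cap known on or before tradeDate, or null if none. */
function marketCapAsOf(
  snapshots: MarketCapSnapshot[],
  companyId: string,
  tradeDate: string,
): number | null {
  let best: MarketCapSnapshot | null = null;
  for (const s of snapshots) {
    if (s.companyId !== companyId || s.asOf > tradeDate) continue;
    if (best === null || s.asOf > best.asOf) best = s;
  }
  return best ? best.marketCapEur : null;
}

/** Mid-cap test using only information available at the time of the trade. */
function isMidCapAtTrade(
  snapshots: MarketCapSnapshot[],
  companyId: string,
  tradeDate: string,
  minEur = 200e6,
  maxEur = 1e9,
): boolean {
  const cap = marketCapAsOf(snapshots, companyId, tradeDate);
  // Unknown PIT cap: exclude rather than fall back to today's snapshot.
  return cap !== null && cap >= minEur && cap <= maxEur;
}
```

A company that is mid-cap today but was small-cap at the trade date is rejected by this check, which is the inclusion bias the LEAKAGE comment describes.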
2. HIGH severity
H1, Multi-market scoring is NOT running in production
signalScore IS NULL for all 7,432 non-FR declarations. The recommendation engine code-paths for non-FR markets exist (MARKET_WEIGHTS, roleBasePts fallback) but no rows have been scored since SEC/RNS/SIX/SEDI/Consob/CNMV ingestion. Until the cross-market scoring cron runs and populates signalScore for the staged 7,432 rows, the multi-market UI is showing role-fallback-only recommendations capped at ~50/100 composite.
Fix: spawn the scoring cron over non-FR backlog; flag any market with scored_pct < 80% as "preview" in the UI.
H2, Anonymised CH insiders track-record correctness
MARKET_WEIGHTS.ch = (sig=10, wr=35, ret=10, rec=45) correctly reduces the signal slot for anonymised CH filings (no insider FK → no track record possible). However, the recommendation-engine.ts:502-504 fallback roleBasePts(decl.insiderFunction) / 35 still kicks in for CH rows. SIX SER filings are anonymised by design, so insiderFunction will be the impersonal label "Person Discharging Managerial Responsibilities" and carries no information. Role-fallback should be DISABLED for CH (return 0 / W.sig instead).
Fix: in recommendation-engine.ts, guard if (market === 'ch') signalNorm = 0 when signalScore == null.
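A sketch of the guard, under stated assumptions: the function name `signalNorm` and its parameters are illustrative, `roleBasePts` is injected because its real implementation was not copied here, and a 0-100 scale for `signalScore` is assumed. Only the `/ 35` divisor and the CH rule come from the finding above.

```typescript
type Market = "fr" | "us" | "de" | "it" | "es" | "ch" | "uk" | "ca";

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

/**
 * Normalised signal component in [0, 1].
 * Assumption: signalScore is on a 0-100 scale when present.
 */
function signalNorm(
  market: Market,
  signalScore: number | null,
  insiderFunction: string | null,
  roleBasePts: (fn: string | null) => number,
): number {
  if (signalScore !== null) return clamp01(signalScore / 100);
  // CH filings are anonymised: the role label is generic, so no fallback.
  if (market === "ch") return 0;
  // Other markets keep the existing role-based fallback.
  return clamp01(roleBasePts(insiderFunction) / 35);
}
```

The CH branch only applies when no score exists, so CH rows would still use a real `signalScore` once the scoring cron has populated one.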
H3, UK connected-person weighting
RNS PDMR filings can be filed by a connected person (spouse, controlled entity). Currently treated identically to the PDMR themselves. Information content is lower for connected persons (lower confidence, often planned). No weight reduction in current code.
Fix: parse RnsFiling.insiderType and downweight signalScore by ~0.7 when type ∈ {Connected_Person, Person_Closely_Associated}. Calibrate against the eventual UK BacktestResult subset.
H4, Look-ahead in convertUsdToEur
fx-history.ts:163 finds "the most recent rate on or before asOfDate", but the cache is loaded with cutoff = now - 30d, so any backtest date older than 30 days falls through to the DB. That is fine for live scoring, and in the backtest harness the asOfDate < cutoff path also appears to behave correctly (for example when the harness runs weekly and the FX history was refreshed the day before). However, nothing verifies this: I could not find a test asserting that convertUsdToEur(amt, 2024-01-15) returns the 2024-01-15 rate rather than the latest one.
Fix: add a PIT unit test for FX conversion at past dates; also extend cache to all currencies (currently EURUSD only).
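The missing test could take roughly this form. The lookup below is a self-contained stand-in for the real cache/DB path (binary search over sorted bars), the `convertUsdToEur` signature with an explicit `bars` argument is hypothetical, and the rates are synthetic values chosen to make the arithmetic exact.

```typescript
interface FxBar {
  date: string;   // ISO day
  close: number;  // EURUSD: USD per 1 EUR
}

/** Close of the latest bar dated on or before asOfDate (bars sorted ascending). */
function rateOnOrBefore(sortedBars: FxBar[], asOfDate: string): number | null {
  let lo = 0;
  let hi = sortedBars.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sortedBars[mid].date <= asOfDate) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found >= 0 ? sortedBars[found].close : null;
}

function convertUsdToEur(amountUsd: number, asOfDate: string, bars: FxBar[]): number | null {
  const rate = rateOnOrBefore(bars, asOfDate);
  return rate === null ? null : amountUsd / rate;
}

// Synthetic fixture: the latest rate differs clearly from the historical ones.
const fixture: FxBar[] = [
  { date: "2024-01-12", close: 1.25 },
  { date: "2024-01-15", close: 1.5 },
  { date: "2026-05-15", close: 2.0 },
];

function expectEqual(actual: number | null, expected: number | null, label: string): void {
  if (actual !== expected) throw new Error(`${label}: got ${actual}, expected ${expected}`);
}

expectEqual(convertUsdToEur(150, "2024-01-15", fixture), 100, "exact-date rate, not latest");
expectEqual(convertUsdToEur(125, "2024-01-14", fixture), 100, "weekend uses prior Friday");
expectEqual(convertUsdToEur(100, "2023-12-31", fixture), null, "before history starts");
```

The first assertion is the one that fails if the conversion silently uses the latest close: with the 2026 rate it would return 75 instead of 100.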
H5, Earnings-period clustering double-counting
Per audit 16 (earnings proximity), multiple trades by the same insider around the same earnings call (a cluster) are scored independently. The isCluster flag is set, but the return for each trade in the cluster is still counted in BacktestResult as a separate observation, inflating effective n in cross-sectional Sharpe.
Fix: in backtest stats path, group cluster trades by (company, ±5d window) and use the cluster-level return (size-weighted) as ONE observation. Reduces n materially (~30%); will lower published Sharpe.
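A sketch of the deduplication, assuming a greedy window anchored on the first trade of each cluster (one possible reading of "±5d") and hypothetical field names for the joined BacktestResult rows:

```typescript
// Hypothetical row shape: one backtested trade with its EUR size.
interface TradeResult {
  companyId: string;
  pubDate: string;    // ISO day
  amountEur: number;
  return90d: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

/**
 * Collapses trades of the same company published within windowDays of the
 * cluster's first trade into ONE size-weighted return observation.
 */
function clusterReturns(trades: TradeResult[], windowDays = 5): number[] {
  const byCompany = new Map<string, TradeResult[]>();
  for (const t of trades) {
    const list = byCompany.get(t.companyId) ?? [];
    list.push(t);
    byCompany.set(t.companyId, list);
  }

  const observations: number[] = [];
  for (const list of byCompany.values()) {
    list.sort((a, b) => a.pubDate.localeCompare(b.pubDate));
    let anchor = -Infinity;
    let weightSum = 0;
    let weightedReturn = 0;
    const flush = () => {
      // Clusters with zero total size are dropped (no meaningful weight).
      if (weightSum > 0) observations.push(weightedReturn / weightSum);
      weightSum = 0;
      weightedReturn = 0;
    };
    for (const t of list) {
      const ts = Date.parse(`${t.pubDate}T00:00:00Z`);
      if (ts - anchor > windowDays * MS_PER_DAY) {
        flush();      // close the previous cluster
        anchor = ts;  // start a new one at this trade
      }
      weightSum += t.amountEur;
      weightedReturn += t.amountEur * t.return90d;
    }
    flush();
  }
  return observations;
}
```

Mean, stdev and Sharpe would then be computed over the returned array, so n becomes the number of clusters rather than the number of trades.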
3. MEDIUM severity
- M1 Survivorship: delisted companies are not represented in the `Company` table at all; their declarations are dropped at ingestion (no recovery from CSV). Backtest universe is biased toward survivors. Mitigate by retaining declarations even when company resolution fails; add a `Company.delistedAt` column.
- M2 `Declaration.totalAmount` is `Float` (Postgres double precision), not Decimal. A double carries a 53-bit significand, so at €1B the spacing between representable values is about 1.2e-7 EUR and magnitude is not the problem; the issue is that most decimal cent values have no exact binary representation, so sums and formatted output can drift by sub-cent amounts. Non-critical for stats but cosmetically wrong for headline rendering (use BigInt + integer cents).
- M3 Score caps in `recommendation-engine.ts`: `signalPts + winRatePts + returnPts + recencyPts ≤ 20+40+10+30 = 100`. Verified bounded by clamps. ✓ no clipping bug. However, `returnPts = min(W.ret, max(0, W.ret/2 + (W.ret/4)*signed))` can in theory hit `W.ret` only with `signed ≥ 2`. Distribution analysis recommended.
- M4 `MARKET_WEIGHTS` for DE/UK/CA/IT/ES inherit FR weights "until per-market BacktestResult coverage > 1000 rows" (code comment). Currently zero coverage for all five. Document this as PREVIEW status in the UI.
- M5 Per-market liquidity / ADV: not implemented anywhere. Small-cap CH thresholds = small-cap US thresholds = €200M (`WINNING_STRATEGY.minMarketCapEur`). For the CH market the median company is smaller; the FR-calibrated 200M-1B mid-cap window may capture only the tail of SIX listings.
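The M3 saturation point can be checked directly from the quoted formula. In the sketch below the function is a transcription of that expression; what `signed` measures upstream was not traced in this audit, so it is treated as an opaque input.

```typescript
/** returnPts as quoted in M3: linear in `signed`, clamped to [0, wRet]. */
function returnPts(wRet: number, signed: number): number {
  return Math.min(wRet, Math.max(0, wRet / 2 + (wRet / 4) * signed));
}

// With W.ret = 10: floor at signed <= -2, midpoint at 0, cap at signed >= 2.
for (const signed of [-3, -2, 0, 1, 2, 3]) {
  console.log(signed, returnPts(10, signed));
}
```

If the observed distribution of `signed` rarely reaches 2, the top of the return slot is effectively unreachable, which is what the recommended distribution analysis would establish.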
4. LOW severity
- L1 `Company.market` field contains ticker symbols (KTOS, VINP, RDDT…) instead of MIC codes (XMIL, XMAD, XETR). 77% NULL. Use `marketOf(amfToken)` everywhere; deprecate the field.
- L2 `currency='VOL'` 27 rows, invalid; clean up ingester.
- L3 Mulberry32 seed=42 deterministic ✓, good for reproducibility.
5. Coordination log, sub-agents
Note: per task constraints, sub-Agent spawning via the Agent tool is reserved for critical/high cross-file fixes. Given that:
- The precondition (doc 25 + BacktestResult per-market) is not met,
- Fixing C1-C2-C3 requires DB-level backfill + ingester changes coordinated with the data-engineering agents currently running the population,
- The 6h budget is shared with the population wait,
Decision: I did NOT spawn sub-agents to apply fixes. Reason: applying the C1/C2 currency backfill mid-population would cause race conditions with the ingester agents writing currency='EUR' on new rows. The fix MUST be sequenced AFTER population completes.
Recommended sub-agent spawn order (post-population):
1. general-purpose: backfill `Declaration.currency` from `amfId` prefix + `MARKET_CURRENCY` map (one SQL UPDATE per market).
2. quant-analyst: extend `fx-history.ts` to all currencies + backfill 5y of EURGBP/EURCHF/EURCAD/EURNOK/EURSEK from Yahoo.
3. general-purpose: winsorize `backtest-compute.ts::aggregateGroup` and add per-market subgrouping.
4. quant-analyst: re-derive STRATEGY_PROOF using PIT marketCap from `CompanyMarketCapSnapshot`; update headline to `sharpeAnnualized` only.
5. general-purpose: fix `recommendation-engine.ts` CH guard (H2).
6. quant-analyst: apply earnings-cluster deduplication in backtest stats (H5); re-publish.
6. Strategy-proof: published vs re-derived
| Metric | Published | Re-derived (universe, BUY, n=15171, returnFromPub90d) | Δ (relative to published) |
|---|---|---|---|
| n | 173 | 15,171 | +8669% |
| Win rate (%) | 77.0 | 44.3 | -42.5% |
| Avg return T+90 (%) | 13.2 | -0.72 | -105.5% |
| Sharpe (cross-sectional) | 1.87 | -0.026 | -101.4% |
All four deltas exceed the 5% flag threshold. The published numbers ARE the filter-conditioned numbers, but the filter contains the PIT leakage flagged in C6 (and acknowledged in winning-strategy.ts:325). After PIT correction, expected published win rate ~62-68%, avg ~+7-9%, deflated Sharpe likely between -0.2 and +0.3 (not statistically distinguishable from zero given the multiple-testing penalty).
7. Verdict per market
| Market | Declarations | Scored | FX coverage | BacktestResult | Ship-ready? |
|---|---|---|---|---|---|
| fr | 25,847 | 22,255 (86%) | n/a (EUR native) | 15,171 BUY | YES with disclosure (C5/C6 fix) |
| us | 6,206 | 0 | EURUSD 5y ✓ | 0 | PREVIEW (need scoring + currency relabel) |
| de | 295 | 0 | EUR native ✓ | 0 | PREVIEW (need scoring) |
| it | 360 | 0 | EUR native ✓ | 0 | PREVIEW (need scoring) |
| es | 101 | 0 | EUR native ✓ | 0 | PREVIEW (need scoring) |
| ch | 226 | 0 | CHFEUR 3mo only | 0 | NO-GO (FX gap + anonymity rule untested) |
| uk | 176 | 0 | GBPEUR 1mo only | 0 | NO-GO (FX gap + connected-person rule untested) |
| ca | 68 | 0 | CADEUR 1y partial | 0 | NO-GO (volume + FX) |
8. Final coherence check
Not re-run: the audit identified that 6 CRITICAL items are open (C1-C6). A re-audit is meaningful only after the data-engineering agents finish populating the 14+ markets + scoring cron + per-market backtests. The audit-doc-25 prerequisite is the gating signal.
0 CRITICAL remaining → NOT YET. Target: re-audit when this doc's coordination-log step 6 completes.
9. References
- Bailey, D.H., López de Prado, M. (2014). "The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality." J. Portfolio Management 40(5):94-107.
- López de Prado, M. (2018). Advances in Financial Machine Learning, ch. 11 (Backtest statistics) and ch. 12 (Backtest through cross-validation).
- Cohen, L., Malloy, C., Pomorski, L. (2012). "Decoding Inside Information." J. of Finance 67(3):1009-1043. (Routine-vs-opportunistic classification, already cited in `winning-strategy.ts:69`.)
- CSA-style "fair, clear, not misleading" disclosure standards for performance claims.