47 · Site state audit + 20 improvement leads
Date: 2026-05-17
Author: read-only audit agent (no code changed)
Scope: snapshot the live state of the codebase, data, content, and infrastructure, then propose the top 20 improvement levers ranked by impact-over-effort.
Sources cited inline as file:line where applicable. Anything not
verifiable from this read-only pass is marked "unverified".
A. Data coverage
Per-source row counts come from docs/method-review/41-history-depth-audit.md
(last full snapshot 2026-05-17). Sample sufficient = ≥100 rows.
| Source | Rows | Oldest | Newest | Depth | Sample OK? | Notes |
|---|---|---|---|---|---|---|
| AMF (FR) | 25,733 | 2020-03-17 | 2026-05-14 | 6.2y | yes | gold standard |
| SEC Form 4 (US) | 156,888 | 2021-05-03 | 2026-05-15 | 5.0y | yes | could push to 20y |
| BaFin (DE) | 295 | 2025-05-02 | 2026-05-15 | 1.0y | yes | rolling 12mo cap |
| SIX SER (CH) | 226 | 2026-04-15 | 2026-05-15 | 0.1y | yes | RSS 30d cap |
| RNS (UK) | 192 | 2026-05-12 | 2026-05-16 | 0.0y | yes | backfill 49 mo possible |
| SEDI (CA) | 68 | 2026-04-30 | 2026-05-16 | 0.0y | borderline | ceo.ca 30d cap |
| CONSOB (IT) | 360 | 2026-04-02 | 2026-05-15 | 0.1y | yes | backfill 59 mo possible |
| CNMV (ES) | 101 | 2026-04-16 | 2026-05-15 | 0.1y | yes | backfill 60 mo possible |
| AFM (NL) | 6,392 | 2006-11-01 | 2026-05-13 | 19.5y | yes | bulk XML, exceptional |
| FSMA (BE) | 906 | 2020-03-31 | 2026-05-15 | 6.1y | yes | 50/issuer rolling |
| Oslo Børs (NO) | 601 | 2025-12-23 | 2026-05-15 | 0.4y | yes | backfill 56 mo possible |
| Nasdaq Helsinki (FI) | 114 | 2026-05-13 | 2026-05-16 | 0.0y | yes | backfill 61 mo |
| Nasdaq Copenhagen (DK) | 182 | 2026-02-25 | 2026-05-15 | 0.2y | yes | backfill 58 mo |
| ASX (AU) | 202 | 2026-05-13 | 2026-05-15 | 0.0y | yes | Markit ~3 mo cap |
| FMA (AT) | 50 | 2026-03-31 | 2026-05-15 | 0.1y | no | thin, backfill 59 mo |
| Euronext Dublin (IE) | 2 | 2026-05-12 | 2026-05-16 | 0.0y | no | live but empty |
| SGX (SG) | 1 | 2026-05-17 | 2026-05-17 | 0.0y | no | live but empty |
| CVM (BR) | 29,307 | 2025-02-01 | 2026-05-15 | 1.3y | yes | backfill 45 mo, top ROI |
| DART (KR) | 672 (per task brief) | unverified | unverified | unverified | yes (per brief) | history audit run before fresh ingest |
| EDINET (JP) | 971 (per task brief) | unverified | unverified | unverified | yes (per brief) | key live, backfill in flight |
| SEBI (IN) | 0 | n/a | n/a | 0 | no | first ingest pending |
| HKEX | unverified | unverified | unverified | unverified | unverified | seed mentioned, not in history audit |
Aggregate (verified from audit doc): 221k rows across 18 sources
with measurable history. The task brief claims 162,615 declarations and 8.3k
companies; that number is plausible for the DIRIGEANTS view only, since
SEC Form 4 lives in its own table. Both numbers should be restated as
"~220k filings, ~8.3k companies" on the landing to remove ambiguity.
Sources with <100 rows that are publicly listed as covered markets:
SEDI (68), FMA-AT (50), Euronext Dublin (2), SGX (1), SEBI (0). All of
these appear in homepage copy (src/app/page.tsx:34,
src/components/landing/LandingHero.tsx:147) and on regulator cards on
/ (commit d72aa72 says "all 16 markets LIVE on cards"). For Dublin,
Singapore, and SEBI this is technically premature.
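The aggregate and the under-100 list above can be recomputed from table A rather than restated by hand, which keeps the landing KPI honest as counts move. A minimal sketch, assuming the verified per-source counts are available as a plain array (hard-coded here from table A; SourceCount, SAMPLE_MIN_ROWS, and summarizeCoverage are hypothetical names). DART, EDINET, and HKEX are left out because their counts are unverified, and SEBI because it has no rows yet.

```typescript
// Hypothetical shape: one entry per source with a verified row count (table A).
interface SourceCount {
  source: string;
  rows: number;
}

// Threshold used in table A: a sample is sufficient at >= 100 rows.
const SAMPLE_MIN_ROWS = 100;

// Verified rows only, in table A order.
const verified: SourceCount[] = [
  { source: "AMF (FR)", rows: 25733 },
  { source: "SEC Form 4 (US)", rows: 156888 },
  { source: "BaFin (DE)", rows: 295 },
  { source: "SIX SER (CH)", rows: 226 },
  { source: "RNS (UK)", rows: 192 },
  { source: "SEDI (CA)", rows: 68 },
  { source: "CONSOB (IT)", rows: 360 },
  { source: "CNMV (ES)", rows: 101 },
  { source: "AFM (NL)", rows: 6392 },
  { source: "FSMA (BE)", rows: 906 },
  { source: "Oslo Børs (NO)", rows: 601 },
  { source: "Nasdaq Helsinki (FI)", rows: 114 },
  { source: "Nasdaq Copenhagen (DK)", rows: 182 },
  { source: "ASX (AU)", rows: 202 },
  { source: "FMA (AT)", rows: 50 },
  { source: "Euronext Dublin (IE)", rows: 2 },
  { source: "SGX (SG)", rows: 1 },
  { source: "CVM (BR)", rows: 29307 },
];

// Total rows plus the sources that fall below the sample threshold.
function summarizeCoverage(sources: SourceCount[]) {
  const totalRows = sources.reduce((sum, s) => sum + s.rows, 0);
  const belowSample = sources.filter((s) => s.rows < SAMPLE_MIN_ROWS).map((s) => s.source);
  return { sourceCount: sources.length, totalRows, belowSample };
}

const summary = summarizeCoverage(verified);
```

On the table A values this yields 221,620 rows across 18 sources and flags SEDI, FMA-AT, Euronext Dublin, and SGX as below the sample threshold, matching the paragraphs above.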
B. Ingestion health
Cron topology (from vercel.json):
- Hourly sync-latest (legacy AMF heartbeat).
- Three regional wave-cron triggers, each fired 3x/day (UTC):
  - wave-eu at 06:05 / 14:05 / 22:05
  - wave-americas at 06:25 / 14:25 / 22:25
  - wave-apac at 06:45 / 14:45 / 22:45
- sec-form4: separate 3x/day cron (05:30 / 13:30 / 21:30).
- Individual fallback crons still wired for each regulator (BAFIN, RNS, SEDI, CONSOB, CNMV, FMA-AT, FI-SE, FSMA, ASX, DK, AFM, OSLO-NO, HEL-FI, Euronext-IE, DART-KR, CVM-BR, SEBI-IN). Mild duplication.
- enrich-new (logo, description, gender) at 07:00 / 15:00.
- sources-watchdog daily at 09:30 (alerts on stale sources).
- alerts-realtime every 15 min, plus daily/weekly digest.
- backtest/compute weekly, Sun 05:00.
Last 3 runs per source / 7-day success rate: powered by the
IngestionRun table and exposed at /admin/sources/page.tsx:84 (last
10 runs as sparklines) and /status (public). Live values were not
captured in this read-only pass and are flagged as unverified, but the
infrastructure is correct and per-source success-rate badges render on
the admin source page. Recommended: have a separate agent run a
production curl on /admin/sources and capture the JSON.
Drift risk: the per-source crons that are listed in vercel.json
alongside the wave-* crons can over-ingest if both are kept. Consider
deleting the per-regulator crons once the wave triggers are proven
stable.
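The per-source 7-day success rate and a staleness flag like the one sources-watchdog raises can be derived from a single pass over IngestionRun rows. A sketch, assuming a simplified row shape (source, startedAt, ok); the real IngestionRun columns and the watchdog's actual rule were not inspected, so sourceHealth and its maxAgeHours parameter are hypothetical.

```typescript
// Hypothetical row shape: the real IngestionRun columns were not inspected in this pass.
interface IngestionRunRow {
  source: string;
  startedAt: Date;
  ok: boolean;
}

interface SourceHealth {
  source: string;
  runs7d: number;
  successRate7d: number | null; // null when the source had no runs in the last 7 days
  lastSuccessAt: Date | null;
  stale: boolean; // no successful run within maxAgeHours
}

const HOUR_MS = 60 * 60 * 1000;

function sourceHealth(runs: IngestionRunRow[], now: Date, maxAgeHours: number): SourceHealth[] {
  // Group runs by source.
  const bySource = new Map<string, IngestionRunRow[]>();
  for (const run of runs) {
    const list = bySource.get(run.source) ?? [];
    list.push(run);
    bySource.set(run.source, list);
  }

  const windowStart = now.getTime() - 7 * 24 * HOUR_MS;
  const result: SourceHealth[] = [];
  bySource.forEach((list, source) => {
    const recent = list.filter((r) => r.startedAt.getTime() >= windowStart);
    const okRecent = recent.filter((r) => r.ok).length;
    // Latest successful run across all history, not only the 7-day window.
    const okTimes = list.filter((r) => r.ok).map((r) => r.startedAt.getTime());
    const lastOk = okTimes.length > 0 ? okTimes.reduce((a, b) => Math.max(a, b)) : null;
    result.push({
      source,
      runs7d: recent.length,
      successRate7d: recent.length === 0 ? null : okRecent / recent.length,
      lastSuccessAt: lastOk === null ? null : new Date(lastOk),
      stale: lastOk === null || now.getTime() - lastOk > maxAgeHours * HOUR_MS,
    });
  });
  return result;
}
```

A maxAgeHours a little above one day would tolerate a couple of missed runs for sources on the 3x/day wave crons; the right value per source is a judgment call, not something this audit measured.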
C. Code quality
- npx tsc --noEmit → 19 errors in 6 files. Top offender: missing prisma.jpFiling accessor in src/lib/ingest/jp-edinet.ts:439 (model declared in migrations but Prisma client not regenerated, or schema.prisma not updated). TS2339 ×13, TS7006 ×6.
- npm run lint:emdash → green, "no em-dashes in user-facing copy".
- npm run lint:emoji → red, 26 violations. Visible offender: src/lib/email.ts:833 ("Top X signaux vente", i.e. "Top X sell signals"). Purge agent flagged as in-flight in task brief.
- Prisma migration drift: migration prisma/migrations/20260517200000_blog_articles/migration.sql creates BlogArticle, but prisma/schema.prisma has no BlogArticle model. Other DB tables (*Filing migrations for KR, JP, HK, SG, IN) likewise appear in migrations only; the models missing from schema.prisma are the source of the TS2339 errors. Either run prisma db pull (then prisma generate) to regenerate, or hand-add the models.
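Until the schema is reconciled, a cheap CI guard can catch the next migration that lands without a matching model. A sketch, assuming the migrations were generated by Prisma for PostgreSQL (quoted CREATE TABLE "Name" statements) and that renamed tables use @@map; it ignores ALTER TABLE ... RENAME, DROP TABLE, and multi-schema setups, so treat it as a smoke test next to Prisma's own prisma migrate diff. The function names are hypothetical.

```typescript
// Table names created by a migration, e.g. CREATE TABLE "BlogArticle" (...).
function tablesInMigration(sql: string): string[] {
  const names: string[] = [];
  const re = /CREATE TABLE (?:IF NOT EXISTS )?"([^"]+)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(sql)) !== null) names.push(m[1]);
  return names;
}

// Table names backed by a model in schema.prisma; @@map("x") overrides the model name.
function tablesInSchema(schema: string): Set<string> {
  const tables = new Set<string>();
  // A model block runs from "model Name {" to the next "}" at the start of a line.
  const modelRe = /^model\s+(\w+)\s*\{([\s\S]*?)^\}/gm;
  let m: RegExpExecArray | null;
  while ((m = modelRe.exec(schema)) !== null) {
    const mapped = /@@map\("([^"]+)"\)/.exec(m[2]);
    tables.add(mapped ? mapped[1] : m[1]);
  }
  return tables;
}

// Tables created by any migration but absent from schema.prisma, de-duplicated and sorted.
function missingModels(migrationSqls: string[], schema: string): string[] {
  const known = tablesInSchema(schema);
  const created = migrationSqls.flatMap((sql) => tablesInMigration(sql));
  // Prisma's implicit many-to-many join tables start with "_" and have no model.
  const orphans = created.filter((t) => !t.startsWith("_") && !known.has(t));
  return Array.from(new Set(orphans)).sort();
}
```

In CI this would read every prisma/migrations/*/migration.sql plus prisma/schema.prisma and fail the build when missingModels returns anything, which would presumably have flagged BlogArticle and the *Filing tables.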
D. Feature inventory
57 distinct page.tsx files (find src/app -name page.tsx | wc -l).
Public, indexable
/, /methodologie, /performance, /heatmap, /companies, /companies/by-market/[market], /companies/by-sector/[sector], /company/[slug], /insider/[slug], /insiders, /insiders/by-market/[market], /insiders/by-role/[role], /leaderboard/insiders, /top-movers, /earnings-radar, /clusters/recent, /recommendations, /backtest, /portfolio (mixed), /hubs/cluster-signals, /hubs/insider-buying-this-week, /compare/openinsider, /compare/quiver-quant, /use-cases/quant-fund, /pricing, /blog, /blog/[slug], /blog/category/[key], /docs, /docs/competitive, /docs/mcp, /docs/method-review, /docs/method-review/[slug], /status, /privacy, /terms.
Auth
/auth/login, /auth/register, /auth/magic, /auth/verify, /auth/forgot-password, /auth/reset-password.
Account (logged-in, beta-gated)
/account/alerts, /account/api-keys, /companies/add.
Admin
/admin (overview shell tabs), plus dedicated pages:
/admin/overview, /admin/pipeline, /admin/alerts, /admin/users,
/admin/analytics, /admin/audits, /admin/settings, /admin/sources,
/admin/recos/quality, /admin/tech, /admin/legacy. 12 pages.
API
~40 route handlers under src/app/api/, plus the cron/*, v1/* REST,
mcp/* MCP server, billing/checkout, billing/webhook, weekly-digest,
and openapi.json.
E. Scoring and recos
- Scoring v5.1, with MARKET_WEIGHTS referenced repeatedly in the commit log; live values per recommendation visible at /recommendations.
- RecoSnapshot model in prisma/schema.prisma:239, with a verifiedAt field used by the 3h coherence cron.
- Reco verifier admin panel exists at /admin/recos/quality/page.tsx.
- Quant rerun on enriched dataset noted as in-flight in task brief (separate agent). Latest backtest doc is 30-backtest-final-17markets.md, plus alpha-discovery rounds 1 and 2 (32-, 34-). No 36-, 38-, 39-, 40-, 43-, 44-, 46- in the method-review/ folder (skipped numbers, normal because tasks reserve slots).
F. Marketing surfaces
| Surface | Latest numbers shown | Status |
|---|---|---|
| / hero | "16 regulators" copy at page.tsx:34, LandingHero.tsx:147 | stale: 17 ingested per task brief, 21 declared |
| / JSON-LD Org | "16 regulators" + offers Developer 19 / Pro 99 (page.tsx:404) | conflicts with /pricing, which lists Free / Pro 19 / Quant 99 |
| / regulator cards | "all 16 markets LIVE" per commit d72aa72 | overstates Dublin (2 rows), SGX (1 row), SEBI (0 rows) |
| Hero subtitle | "5 markets · FX-normalised · API + MCP" (LandingHero.tsx:127) | stale, contradicts the headline |
| /methodologie | unverified; text is likely current | unverified |
| /performance | quant rerun pending, may show pre-v5.1 numbers | likely stale |
| /compare/* | flagged in-flight in task brief | in-flight |
| /pricing | Free / Pro €19 / Quant €99 (pricing/page.tsx:71,90) | OK |
Recommended pre-launch: a single sweep over every number on /,
/methodologie, and /performance to converge on one canonical set
("~220k filings, 8.3k companies, 17 live markets, 4 capped, 2 seeding").
G. SEO
- Sitemaps: 11 distinct sitemap routes for EN/FR companies, insiders, landings, docs, static, plus the index /sitemap.xml. Entry counts not captured live, but the architecture is well segmented.
- Hreflang: page-level alternates plus a layout-level fallback at layout.tsx:200-206. Only the EN+FR pair is declared. Good but minimal.
- Internal mesh: 26-internal-linking-audit.md exists and a refresh shipped in 45924c6. The blog → method-review → hubs spine is the weakest link; the homepage barely surfaces the blog.
- Schema.org coverage: Organization and SoftwareApplication with Offers on /. No Article, Dataset, FAQPage, or BreadcrumbList visible from this pass.
- Structured-data validation: never run end-to-end. Recommended.
H. Performance
- .next/static size on local build is ~3.9 MB, which is healthy.
- No Sentry, no Datadog, no Web Vitals collector wired. The only mention is admin/tech/page.tsx:872, which recommends Sentry.
- LCP/CLS field data: missing; nothing in /admin/analytics exposes Core Web Vitals (CWV). Either wire the web-vitals library or pull from CrUX.
- Biggest pages by source size: the /admin shell (AdminDashboard.tsx 46.6 KB raw, AdminShell.tsx 23.1 KB). Public pages are all <50 KB raw.
- globals.css is 128 KB. Worth a Tailwind purge audit (Tailwind 4 is wired through PostCSS in postcss.config.mjs).
I. Security
- Cron routes gated by Authorization: Bearer $CRON_SECRET (wave-americas/route.ts:9, all the /cron/* files).
- /api/migrate is guarded by the ALLOW_MIGRATE_ROUTE env var (404 by default).
- Admin pages call getCurrentUser() (src/lib/auth.ts:92). Not directly verified that every admin page redirects unauthenticated visitors, but the sampled analytics/page.tsx does redirect(...).
- Magic-link auth + password auth + JWT (jose). Reset / forgot flows shipped. Email verification present.
- Public /api/v1/* should be gated by API key (the ApiKey model is declared at schema.prisma:373).
- .env.example documents the UNSUB secret, ALERT webhooks (Slack + Discord), and Stripe stubs.
- No secret scanner in CI; no Dependabot config visible. Worth adding.
- No CSP header set in next.config.ts (unverified line-by-line).
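This pass did not check how wave-americas/route.ts:9 compares the header. If it uses plain string equality, a constant-time comparison is a cheap hardening step that also fails closed when CRON_SECRET is unset. A sketch using Node's crypto.timingSafeEqual, assuming the cron routes run on the Node.js runtime; isAuthorizedCron is a hypothetical helper name.

```typescript
import { timingSafeEqual } from "node:crypto";

// True only when the header is exactly "Bearer <secret>".
function isAuthorizedCron(authorization: string | null, secret: string | undefined): boolean {
  // Fail closed: an unset secret must never match "Bearer undefined".
  if (!secret || authorization === null) return false;
  const expected = Buffer.from(`Bearer ${secret}`, "utf8");
  const received = Buffer.from(authorization, "utf8");
  // timingSafeEqual throws on buffers of different length, so compare lengths first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

A route handler would call it as isAuthorizedCron(request.headers.get("authorization"), process.env.CRON_SECRET) and return 401 on false. The length check leaks only the secret's length, the usual trade-off with timingSafeEqual.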
J. Pricing and billing
- Tiers and copy live at /pricing. Plans: Free €0, Pro €19/mo, Quant €99/mo, with FR + EN dictionaries.
- STRIPE_* env vars documented but not wired. api/billing/checkout/route.ts returns 503 until the env is set. api/billing/webhook/route.ts is a logging stub. CTAs on /pricing render as disabled "Coming soon" when the env is unset (per file header).
- Quota enforcement uses the UserPageView table (src/lib/quota/page-quota.ts) and fails open on infra hiccups. Sandbox 10-free-call API mechanism flagged as in-flight in task brief.
- Inconsistent offers in homepage JSON-LD (Free / Developer 19 / Pro 99) vs pricing page (Free / Pro 19 / Quant 99). Fix before any schema validation run.
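The JSON-LD mismatch disappears by construction if the homepage offers are generated from the same plan list that /pricing renders. A sketch, assuming a shared PLANS constant (hypothetical; the audit did not check how pricing/page.tsx:71,90 stores its values) and the schema.org Offer properties name, price, and priceCurrency.

```typescript
// Hypothetical shared plan list; values copied from /pricing (Free €0, Pro €19/mo, Quant €99/mo).
interface Plan {
  name: string;
  monthlyPriceEur: number;
}

const PLANS: readonly Plan[] = [
  { name: "Free", monthlyPriceEur: 0 },
  { name: "Pro", monthlyPriceEur: 19 },
  { name: "Quant", monthlyPriceEur: 99 },
];

// schema.org Offer objects for the homepage SoftwareApplication JSON-LD.
function toSchemaOffers(plans: readonly Plan[]) {
  return plans.map((plan) => ({
    "@type": "Offer",
    name: plan.name,
    price: plan.monthlyPriceEur.toFixed(2),
    priceCurrency: "EUR",
  }));
}
```

page.tsx:404 would then embed toSchemaOffers(PLANS) instead of hard-coded tiers. Monthly billing could additionally be expressed with a schema.org UnitPriceSpecification; that is left out here.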
K. Observability
- No Sentry, no PostHog, no Logflare wired.
- Server logs default to Vercel function logs.
- Email events tracked in the EmailEvent table.
- /status public page exists (commit b4413ac).
- Admin debug panels: /admin/sources, /admin/tech, /admin/audits, /admin/analytics, /admin/recos/quality. Per-page verification of "do they work" not done in this pass.
- Alert paths: ALERT_SLACK_WEBHOOK_URL and ALERT_DISCORD_WEBHOOK_URL documented in .env.example. The sources-watchdog cron is wired. No PagerDuty / OnCall.
L. Honest gaps (claims vs reality)
| Claim on site | Reality | Action |
|---|---|---|
| "16 regulators live" (hero, JSON-LD) | 17 ingested with >1 row, 4 of those have <250 rows total and 2 have <5 rows | Restate as "17 regulators, ~220k filings, 4 markets in seed phase". |
| "5 markets" (hero subtitle) | Contradicts the headline, leftover copy | Single-pass purge. |
| "All 16 markets LIVE on cards" (commit d72aa72) | Dublin (2), SGX (1), SEBI (0) | Mark these "seeding" with a clear badge, not "live". |
| Schema.org Offers: Free / Developer 19 / Pro 99 | Pricing page sells Free / Pro 19 / Quant 99 | Align JSON-LD with /pricing. |
| "Open backtest" (hero) | /backtest exists; results from quant rerun in flight | Verify /backtest shows v5.1 numbers post-rerun. |
| 50 blog articles FR+EN with categories | BlogArticle table created via migration but no Prisma model in schema.prisma | Either generate the model or document the drift; otherwise tsc fails. |
| "MCP server" | /api/mcp/* and /docs/mcp exist; live registry listing unverified | Confirm the public MCP endpoint is reachable and listed somewhere indexable. |
| "tsc clean" (implicit) | 19 errors in 6 files | Fix before next deploy. |
| lint:emoji (per AGENTS.md no-emoji policy) | 26 emoji violations remain | Purge agent must finish before the policy can be claimed as enforced. |
| 162,615 declarations | Plausible only if SEC Form 4 (~157k) is counted alongside the DIRIGEANTS AMF view; per-table sum is ~221k. | Adopt one canonical KPI; show derivation in /methodologie. |
M. Top 20 improvement leads (ranked impact-over-effort)
Effort: S < ½ day, M 1-3 days, L >3 days. Impact: high / med / low.
1. Fix tsc + Prisma drift before next deploy
Impact high (CI green is a precondition for everything else). Effort S.
First action: prisma db pull + diff schema.prisma, then resolve the
13× TS2339. Owner: backend-engineer.
2. One-pass copy sweep on landing numbers
Impact high (founder credibility; also unblocks press and pitch).
Effort S. First action: produce a single canonical KPI block
(filings, companies, markets-live, markets-seeding, history depth) and
substitute it into /, /methodologie, /performance, JSON-LD, and social
OG tags. Owner: content-strategist + product-marketing.
3. CVM Brazil 5-year backfill
Impact high (quadruples CVM dataset, unlocks LATAM alpha). Effort M.
First action: extend scripts/ingest-cvm-br.ts to walk weekly ZIPs back
to 2021. Owner: data-engineer.
4. CONSOB + Oslo + CNMV 5y backfill bundle
Impact high (3 high-PDMR-discipline markets, same scrape patterns, parallelisable). Effort M. First action: stub three worktrees, one per regulator, run in parallel. Owner: data-engineer.
5. Programmatic SEO insider + company hubs with rich Article schema
Impact high (long-tail organic search is the dominant traffic source for this
vertical; competitor openinsider.com thrives on it). Effort M.
First action: add Article and BreadcrumbList JSON-LD to
/company/[slug] and /insider/[slug], plus per-page meta tuned to
recent trade context. Owner: seo-specialist.
6. Stripe wire-up sequence
Impact high (zero revenue today). Effort M. First action: wire test-mode keys, ship checkout for Pro, observe one real success, then live mode. Owner: billing-engineer.
7. Sentry + web-vitals + CrUX board
Impact high (one ingestion-cron failure or one bad LCP regression
is currently invisible). Effort S. First action: install
@sentry/nextjs, wire DSN, send Web Vitals to a /api/vitals
collector backed by Postgres. Owner: sre-engineer.
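On the collector side, the useful numbers are per-page 75th-percentile values rated against the Core Web Vitals thresholds, which is also how CrUX assesses field data. A sketch of that aggregation, assuming beacons from the web-vitals library's onLCP / onCLS / onINP callbacks have already been stored as simple rows; VitalSample, p75, rate, and summarizeVitals are hypothetical, and the Postgres storage is omitted.

```typescript
type MetricName = "LCP" | "CLS" | "INP";
type Rating = "good" | "needs-improvement" | "poor";

// Core Web Vitals boundaries: [upper bound of "good", upper bound of "needs improvement"].
const THRESHOLDS: Record<MetricName, [number, number]> = {
  LCP: [2500, 4000], // milliseconds
  CLS: [0.1, 0.25], // unitless layout-shift score
  INP: [200, 500], // milliseconds
};

// Hypothetical shape of one beacon stored by the collector.
interface VitalSample {
  name: MetricName;
  value: number;
  path: string;
}

// 75th percentile by nearest rank; callers must pass at least one value.
function p75(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil(0.75 * sorted.length); // 1-based
  return sorted[rank - 1];
}

function rate(metric: MetricName, value: number): Rating {
  const [good, needsImprovement] = THRESHOLDS[metric];
  if (value <= good) return "good";
  if (value <= needsImprovement) return "needs-improvement";
  return "poor";
}

// Per-page summary: p75 value, rating, and sample count for each metric with data.
function summarizeVitals(samples: VitalSample[], path: string) {
  const out: Partial<Record<MetricName, { p75: number; rating: Rating; n: number }>> = {};
  for (const name of Object.keys(THRESHOLDS) as MetricName[]) {
    const values = samples.filter((s) => s.path === path && s.name === name).map((s) => s.value);
    if (values.length === 0) continue; // no field data for this metric on this page
    const value = p75(values);
    out[name] = { p75: value, rating: rate(name, value), n: values.length };
  }
  return out;
}
```

With small samples the p75 is noisy, so requiring a minimum sample count before showing a rating is worth considering.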
8. Cluster-trading detection on multi-market dataset
Impact high (proprietary alpha angle, competitor differentiator).
Effort L. First action: define "cluster" (>=3 PDMRs at same
issuer, same direction, same 5-day window) and run sweep over
Declaration. Owner: quant-analyst.
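The cluster definition above maps onto a sliding window per issuer and direction. A sketch, assuming a trimmed-down Declaration row (issuer, insider, direction, tradeDate at midnight UTC) whose field names are hypothetical, and reading "same 5-day window" as first and last trade less than five days apart.

```typescript
// Hypothetical, trimmed-down Declaration row; the real column names were not checked.
interface Declaration {
  issuer: string;
  insider: string; // PDMR identifier
  direction: "BUY" | "SELL";
  tradeDate: Date; // assumed to be midnight UTC of the trade day
}

interface Cluster {
  issuer: string;
  direction: "BUY" | "SELL";
  start: Date;
  end: Date;
  insiders: string[];
}

const DAY_MS = 86400000;

// Sliding window per (issuer, direction): a cluster is >= minInsiders distinct PDMRs whose
// trade dates span less than windowDays days from first to last.
function findClusters(rows: Declaration[], minInsiders = 3, windowDays = 5): Cluster[] {
  const groups = new Map<string, Declaration[]>();
  for (const r of rows) {
    const key = `${r.issuer}|${r.direction}`;
    const group = groups.get(key) ?? [];
    group.push(r);
    groups.set(key, group);
  }

  const clusters: Cluster[] = [];
  groups.forEach((group) => {
    const g = [...group].sort((a, b) => a.tradeDate.getTime() - b.tradeDate.getTime());
    let left = 0;
    let lastEnd = -Infinity; // end of the last emitted cluster in this group
    for (let right = 0; right < g.length; right++) {
      // Drop trades that are windowDays or more before the current one.
      while (g[right].tradeDate.getTime() - g[left].tradeDate.getTime() >= windowDays * DAY_MS) left++;
      const inWindow = g.slice(left, right + 1);
      const insiders = Array.from(new Set(inWindow.map((d) => d.insider)));
      // Emit the first qualifying window, then skip windows that start on or before its end,
      // so one burst of trading yields one cluster rather than many overlapping ones.
      if (insiders.length >= minInsiders && g[left].tradeDate.getTime() > lastEnd) {
        clusters.push({
          issuer: g[right].issuer,
          direction: g[right].direction,
          start: g[left].tradeDate,
          end: g[right].tradeDate,
          insiders,
        });
        lastEnd = g[right].tradeDate.getTime();
      }
    }
  });
  return clusters;
}
```

Emitting only the first qualifying window per burst keeps the output readable, but later joiners to the same burst are not listed; a production version would probably extend the cluster instead.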
9. Backtest in-browser UI on the /backtest page
Impact high (live demo lets prospects test before they buy).
Effort M. First action: ship a constrained interactive sweep
(market, period, top-N filings) feeding /api/backtest/compute.
Owner: product-engineer + quant-analyst.
10. Onboarding + first-run flow with sandbox key
Impact high (currently signup → empty page; massive activation loss). Effort M. First action: post-verify, drop the user on a guided 60-second tour ending on a one-click sandbox key issuance. Owner: ui-designer + growth-engineer.
11. Tighten admin auth gate audit
Impact med (currently relying on per-page getCurrentUser + redirect;
one missing call leaks the panel). Effort S. First action: middleware
that forces role === ADMIN for /admin/* and /api/admin/*. Owner:
security-engineer.
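The decision logic for such a middleware is small enough to unit-test on its own. A sketch, assuming the middleware can read a role from the session (the audit confirmed getCurrentUser() but not its return shape); adminGate, the USER role, and the ?next= login parameter are hypothetical.

```typescript
type Role = "ADMIN" | "USER";

type GateDecision =
  | { action: "next" }
  | { action: "redirect"; to: string }
  | { action: "deny"; status: 401 | 403 };

// Matches /admin and /api/admin exactly or with a sub-path, but not /administrator.
function isAdminPath(pathname: string): boolean {
  return /^\/(api\/)?admin(\/|$)/.test(pathname);
}

// role is null for anonymous visitors.
function adminGate(pathname: string, role: Role | null): GateDecision {
  if (!isAdminPath(pathname)) return { action: "next" };
  const isApi = pathname.startsWith("/api/");
  if (role === null) {
    // Pages send visitors to login; API calls get a bare 401.
    return isApi
      ? { action: "deny", status: 401 }
      : { action: "redirect", to: `/auth/login?next=${encodeURIComponent(pathname)}` };
  }
  if (role !== "ADMIN") {
    return isApi ? { action: "deny", status: 403 } : { action: "redirect", to: "/" };
  }
  return { action: "next" };
}
```

A Next.js middleware.ts would translate the result into a redirect, a 401/403 response, or a pass-through. Anchoring the pattern at a path-segment boundary keeps a path like /administrator from matching by accident.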
12. Watchlist + custom-alerts builder
Impact high (single highest-asked feature in this category; raises
DAU and converts free users). Effort L. First action: ship
"follow this insider/company" CTA on /insider/[slug] and
/company/[slug], backed by UserAlert. Owner: product-engineer.
13. CSV + parquet export
Impact med (one-click "give me everything" is a Quant-tier hook).
Effort S. First action: signed-URL export from /api/v1/export
gated by Quant tier. Owner: backend-engineer.
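One way to produce the signed URL is an HMAC over the path and an expiry timestamp, checked again when the export is downloaded. A sketch, assuming exports are served by the app itself; if files land in object storage, the provider's presigned URLs would replace this. signExportUrl, verifyExportUrl, and the expires / sig parameter names are hypothetical, and the Quant-tier check would happen before signing (not shown).

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// HMAC over path + expiry; changing either invalidates the signature.
function sign(path: string, expiresAtSec: number, secret: string): string {
  return createHmac("sha256", secret).update(`${path}:${expiresAtSec}`).digest("hex");
}

function signExportUrl(origin: string, path: string, expiresAtSec: number, secret: string): string {
  const url = new URL(path, origin);
  url.searchParams.set("expires", String(expiresAtSec));
  url.searchParams.set("sig", sign(url.pathname, expiresAtSec, secret));
  return url.toString();
}

function verifyExportUrl(rawUrl: string, nowSec: number, secret: string): boolean {
  const url = new URL(rawUrl);
  const expires = Number(url.searchParams.get("expires"));
  const sig = url.searchParams.get("sig") ?? "";
  // Reject missing, malformed, or past expiry timestamps.
  if (!Number.isInteger(expires) || expires < nowSec) return false;
  const expected = Buffer.from(sign(url.pathname, expires, secret), "utf8");
  const received = Buffer.from(sig, "utf8");
  // Constant-time comparison; lengths must match first or timingSafeEqual throws.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Keeping the expiry short (minutes, not days) limits the damage if a link is forwarded outside the paying account.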
14. Discord + Slack webhook deliveries for user alerts
Impact med (sticky integration; competitor quiverquant lacks it).
Effort S. First action: extend UserAlert channel enum to
SLACK | DISCORD. Owner: integration-engineer.
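Slack incoming webhooks and Discord webhooks both accept a JSON POST but name the message field differently (text vs content), and Discord caps content at 2000 characters. A sketch, assuming a hypothetical AlertEvent shape and that the user-supplied webhook URL is stored alongside the UserAlert row.

```typescript
// Hypothetical alert payload; field names are illustrative, not the real UserAlert schema.
interface AlertEvent {
  insider: string;
  company: string;
  direction: "BUY" | "SELL";
  amountEur: number;
  url: string;
}

type WebhookChannel = "SLACK" | "DISCORD";

// Discord rejects message content longer than 2000 characters.
const DISCORD_CONTENT_MAX = 2000;

function alertText(e: AlertEvent): string {
  const verb = e.direction === "BUY" ? "bought" : "sold";
  return `${e.insider} ${verb} EUR ${e.amountEur} of ${e.company}: ${e.url}`;
}

// Slack incoming webhooks take {"text": ...}; Discord webhooks take {"content": ...}.
function webhookBody(channel: WebhookChannel, e: AlertEvent): { text: string } | { content: string } {
  const text = alertText(e);
  return channel === "SLACK" ? { text } : { content: text.slice(0, DISCORD_CONTENT_MAX) };
}

async function deliver(channel: WebhookChannel, webhookUrl: string, e: AlertEvent): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(webhookBody(channel, e)),
  });
  if (!res.ok) throw new Error(`${channel} webhook returned HTTP ${res.status}`);
}
```

Truncation is naive here: a long message can lose its trailing link, so a real formatter would shorten the description rather than the URL.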
15. GDPR + MAR disclosure polish + right-to-erasure UI
Impact med (regulatory risk reduction). Effort S. First action:
/privacy review + add /account/delete self-service. Owner:
compliance-officer + product-engineer.
16. Image + font + ISR perf pass
Impact med (LCP shave 200-500ms, helps SEO). Effort S. First
action: subset JetBrains Mono + main display font; convert hero PNGs to
AVIF; tune revalidate on /company/[slug]. Owner: frontend-engineer.
17. Schema.org enrichment: Article, FAQPage, Dataset, BreadcrumbList
Impact med. Effort S. First action: ship Article on every
blog post and method-review page; Dataset on /api. Owner:
seo-specialist.
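For the blog and method-review pages, the two objects most likely to be picked up are Article and BreadcrumbList. A sketch of builders for both, assuming the production origin is passed in (it is not named in this audit; https://example.com stands in) and using standard schema.org properties; articleJsonLd, breadcrumbJsonLd, and toJsonLdScript are hypothetical helpers.

```typescript
interface Crumb {
  name: string;
  path: string; // site-relative, e.g. "/blog"
}

interface ArticleInput {
  title: string;
  path: string;
  lang: "en" | "fr";
  publishedAt: Date;
  updatedAt: Date;
}

// schema.org BreadcrumbList: ListItem positions are 1-based and item is an absolute URL.
function breadcrumbJsonLd(origin: string, crumbs: Crumb[]) {
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((crumb, i) => ({
      "@type": "ListItem",
      position: i + 1,
      name: crumb.name,
      item: new URL(crumb.path, origin).toString(),
    })),
  };
}

function articleJsonLd(origin: string, a: ArticleInput) {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: a.title,
    inLanguage: a.lang,
    datePublished: a.publishedAt.toISOString(),
    dateModified: a.updatedAt.toISOString(),
    mainEntityOfPage: new URL(a.path, origin).toString(),
  };
}

// Serialize for a <script type="application/ld+json"> tag; "<" is escaped so that
// a value containing "</script>" cannot terminate the tag.
function toJsonLdScript(value: unknown): string {
  return JSON.stringify(value).replace(/</g, "\\u003c");
}
```

The serialized string would go into a script tag of type application/ld+json. Running the output through a structured-data validator also closes the validation gap noted in section G.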
18. Annual discount + team seats + enterprise tier
Impact med (annual lifts ARR per active user, team seats unlock agencies). Effort M. First action: add Stripe price IDs for annual; price page UX for billing toggle. Owner: billing-engineer.
19. Editorial calendar + weekly digest cadence + founder voice channel
Impact med (organic content moat compounds slowly). Effort L.
First action: lock 12-week editorial plan, queue the weekly-digest
cron output as a public newsletter page with sign-up. Owner:
content-strategist.
20. MCP marketplace listing + Zapier connector
Impact med (distribution leverage for the API). Effort M. First action: submit MCP server to the public registry; build a Zapier trigger on "new cluster signal" + "new buy by tracked insider". Owner: partnerships-engineer.
N. Skipped / unverified
- DART (KR), EDINET (JP), HKEX live counts: the task brief gives values, but the per-source history audit (41-) wasn't re-run for these.
- Live BO debug panel checks (browser-based) skipped to keep this read-only.
- Bundle analysis (next build --profile) skipped.
- CSP headers in next.config.ts not fully audited.
- /admin/analytics cohort numbers not pulled.
- Live /status JSON body not curled.
Pick these up in a follow-up agent that is allowed to hit production.