58 · Vercel cron migration audit (2026-05-18)
Author: BO refine sweep · scope: identify local nohup-spawned long-running backfills that should move to Vercel cron, sized correctly under the 300s / 1024 MB Vercel-Pro function ceiling.
TL;DR
Three classes of locally-spawned procs deserve different treatment:
| Local task | Recommendation | Why |
|---|---|---|
| Logo backfill (`scripts/_logo-backfill-asia-em.ts`, `_logo-audit-regen.ts`) | Move to Vercel cron, every 4h | Pure HTTP fanout to Clearbit/Google/OpenAI Vision; bounding each run to ~50 companies keeps it well under 300s. |
| Insider bios (`scripts/_backfill-insider-bios-batched.mjs`, PID 75137 locally) | Move to Vercel cron, every 6h | OpenAI calls capped at 30/run = ~90s p95. Persist cursor in `Setting` table (key `backfill_cursor:insider-bios`). |
| SEC form4 history (`backfill-sec-form4`, 39h running locally) | Already wired (`/api/cron/backfill-sec-form4`, schedule `50 */1 * * *`). Kill local nohup. | Cron already chunks ~150 rows per run; the local process is the legacy bootstrap and is now redundant. |
Net effect: zero residual nohup processes on Simon's laptop. All ingest +
enrichment runs from Vercel with audit trail in IngestionRun.
Audit detail
Why these procs were local in the first place
The three procs above were spawned with nohup during the initial bootstrap
of the platform (Apr-May 2026) when the corresponding IngestionRun /
backfill-cursor scaffolding didn't yet exist. They have stayed on the laptop
ever since, even though nothing about them requires it:
- They are I/O-bound with a bounded call budget (Clearbit, OpenAI), so 300s per run is more than enough.
- None requires GPU, ffmpeg, or local-only secrets.
- Each was paused/resumed manually via shell; no UI surface.
There is no architectural reason to keep them on the laptop. The only operational reason was inertia.
Constraints to respect on Vercel
- 300s max function duration (Pro plan, with `vercel.json` `maxDuration: 300`).
- 1024 MB memory ceiling: fine for all three (logo binaries are streamed, bios are plain text).
- No filesystem persistence between invocations: use the `Setting` table for the cursor (existing pattern: `backfill_cursor:<source>` already wires `dart-kr`, `oslo-no`, `nse-in`, `sec-form4`, `jp-merge`, `sebi-merge`, `jp-edinet`).
- Cron secret in the `Authorization: Bearer ${CRON_SECRET}` header, already in place for every existing cron.
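The two constraints every new route has to enforce first (the bearer secret and the 300s budget) can be sketched as a pair of small helpers. This is a minimal illustration, not the code used by the existing crons: `isAuthorizedCron` and `makeDeadline` are hypothetical names, and the plain string comparison is an assumption (a constant-time comparison may be preferable in production).

```typescript
// Hypothetical guard helpers for a cron route; names are illustrative only.

/** Returns true only when the header is exactly `Bearer <secret>`. */
function isAuthorizedCron(
  authHeader: string | null,
  secret: string | undefined,
): boolean {
  // Refuse everything when the secret is unset, so a missing env var
  // never turns into an open endpoint.
  if (!secret) return false;
  return authHeader === `Bearer ${secret}`;
}

/** Tracks how much of the function's time budget is left. */
function makeDeadline(
  budgetMs: number,
  now: () => number = Date.now,
): { remainingMs: () => number; expired: () => boolean } {
  const startedAt = now();
  const remainingMs = (): number => budgetMs - (now() - startedAt);
  return { remainingMs, expired: () => remainingMs() <= 0 };
}
```

A route would typically reject the request when `isAuthorizedCron` returns false, then create a deadline somewhat below 300 000 ms and stop picking up new rows once `expired()` is true, leaving headroom to persist the cursor.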
Concrete migration steps
1. Logo backfill · add `/api/cron/backfill-logos/route.ts`:
   - Reads `Setting` row `backfill_cursor:logos` → `{ lastId, processedCount }`.
   - Selects 50 companies with `logoUrl IS NULL` ordered by `marketCap DESC` starting at `lastId`.
   - Fans out 8-way (existing logo-pipeline helpers already do this).
   - Records `IngestionRun(source='backfill-logos', recordsNew, recordsSeen)`.
   - Schedule: `0 */4 * * *` (every 4 hours).
2. Insider bios · add `/api/cron/backfill-insider-bios/route.ts`:
   - Reads `Setting` row `backfill_cursor:insider-bios`.
   - Selects 30 `Insider` rows where `bioGeneratedAt IS NULL` ordered by declaration count desc.
   - Calls `enrichInsiderBio()` per row (existing helper).
   - Schedule: `15 */6 * * *` (every 6 hours, offset to avoid waves).
   - Estimated cost per run: 30 × ~2k tokens GPT-4o-mini = ~$0.02.
3. SEC form4 history · no new route needed.
   - Action: `kill 39741` (the 39h-running local nohup; PID was visible in project_insiders_trades_state.md checkpoint).
   - The existing `/api/cron/backfill-sec-form4` cron has been running for weeks on schedule `50 */1 * * *` and is the source of truth. The duplicate local run is a no-op double-write because the merge layer dedups on `(accession, transactionDate)`.
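The logo-backfill steps above (read cursor, select a bounded batch, fan out 8-way, return counts and the next cursor) can be sketched as follows. This is a self-contained illustration under stated assumptions, not the route itself: the in-memory `Company` array stands in for Prisma queries, `mapWithConcurrency`, `selectBatch`, and `runBackfillOnce` are hypothetical names, and "starting at `lastId`" is interpreted here as "rows that sort after the last processed company in `(marketCap DESC, id ASC)` order", with the cursor resetting once a pass reaches the end.

```typescript
// In-memory stand-ins; the real route would read/write Setting, Company,
// and IngestionRun through the database instead.
type Company = { id: number; marketCap: number; logoUrl: string | null };
type Cursor = { lastId: number; processedCount: number };
type RunResult = { recordsSeen: number; recordsNew: number; cursor: Cursor };

/** Runs `worker` over `items` with at most `limit` calls in flight. */
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const lane = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++; // no race: JS runs one callback at a time
      results[index] = await worker(items[index]);
    }
  };
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, lane));
  return results;
}

/** Picks the next `size` logo-less companies after the cursor position. */
function selectBatch(all: Company[], cursor: Cursor, size: number): Company[] {
  const anchor = all.find((c) => c.id === cursor.lastId);
  return all
    .filter((c) => c.logoUrl === null)
    .filter(
      (c) =>
        anchor === undefined ||
        c.marketCap < anchor.marketCap ||
        (c.marketCap === anchor.marketCap && c.id > anchor.id),
    )
    .sort((a, b) => b.marketCap - a.marketCap || a.id - b.id)
    .slice(0, size);
}

/** One cron invocation: bounded batch, bounded fan-out, new cursor out. */
async function runBackfillOnce(
  companies: Company[],
  cursor: Cursor,
  fetchLogo: (c: Company) => Promise<string | null>,
  batchSize = 50,
  concurrency = 8,
): Promise<RunResult> {
  const batch = selectBatch(companies, cursor, batchSize);
  if (batch.length === 0) {
    // End of a pass: reset so the next run retries earlier failures.
    const reset = { lastId: 0, processedCount: cursor.processedCount };
    return { recordsSeen: 0, recordsNew: 0, cursor: reset };
  }
  // A failed lookup counts as "no logo found" instead of failing the run.
  const urls = await mapWithConcurrency(batch, concurrency, (c) =>
    fetchLogo(c).catch(() => null),
  );
  let recordsNew = 0;
  batch.forEach((company, i) => {
    if (urls[i] !== null) {
      company.logoUrl = urls[i];
      recordsNew++;
    }
  });
  return {
    recordsSeen: batch.length,
    recordsNew,
    cursor: {
      lastId: batch[batch.length - 1].id,
      processedCount: cursor.processedCount + batch.length,
    },
  };
}
```

The returned `recordsSeen` / `recordsNew` map onto the `IngestionRun` fields named above, and the returned cursor is what would be written back to the `Setting` row. The insider-bios route could reuse the same shape with a batch size of 30 and `enrichInsiderBio()` as the worker.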
Verification
After the two new crons are wired, both will appear in /admin/sources and
/admin/jobs automatically (no admin code change needed — the BO reads
IngestionRun and the source registry).
Stale-source detection on /admin/tech (48h threshold, introduced today)
will fire if either cron silently dies.
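To make the 48h check concrete, here is a minimal sketch of how stale sources could be derived from run timestamps. It is not the `/admin/tech` implementation: the `LastRun` shape, the `finishedAt` field, and `findStaleSources` are assumptions made for illustration, and a source with no recorded run at all is treated as stale.

```typescript
// Hypothetical shape of the data read from IngestionRun.
type LastRun = { source: string; finishedAt: Date };

const STALE_AFTER_MS = 48 * 60 * 60 * 1000; // the 48h threshold from the text

/** Returns the expected sources whose latest run is missing or too old. */
function findStaleSources(
  expected: string[],
  runs: LastRun[],
  now: Date,
): string[] {
  // Keep only the most recent finish time per source.
  const latest = new Map<string, number>();
  for (const run of runs) {
    const finished = run.finishedAt.getTime();
    if (finished > (latest.get(run.source) ?? -Infinity)) {
      latest.set(run.source, finished);
    }
  }
  return expected.filter((source) => {
    const finished = latest.get(source);
    return finished === undefined || now.getTime() - finished > STALE_AFTER_MS;
  });
}
```

With a 4h and a 6h schedule, a 48h threshold tolerates several consecutive missed runs before alerting, which keeps transient failures from paging anyone while still catching a cron that has silently stopped.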
Out of scope for this audit
- Recommendation re-scoring (`/api/recos/quality`): already runs in cron.
- Weekly digest emails: already runs in cron.
- Sandbox quota reset: already runs in cron.
- `enrich-mcap` daily: already runs in cron.
The local laptop should only host: dev server (next dev), Prisma Studio
(ad-hoc), and the mempalace indexer. Nothing else.
Next step
Open two PRs:
- `feat(cron): backfill-logos route + vercel schedule`
- `feat(cron): backfill-insider-bios route + vercel schedule`
Then `kill 75137 39741` on the laptop and confirm via `ps aux | grep node`
that nothing else is silently consuming Simon's RAM.