Most zero-downtime deployment incidents are not caused by the deploy button. They happen because old code, new code, workers, webhooks, and the database disagree about the shape of the data while they overlap.
A rolling deploy means two application versions are alive at the same time. Background workers may lag behind. Mobile clients may keep sending old payloads. Cached pages may read old fields. The database migration has to tolerate that overlap.
Use expand and contract as separate releases
The safe pattern is usually:
1. Expand schema in a backward-compatible way
2. Deploy code that can read old and new shapes
3. Write both shapes when necessary
4. Backfill existing records
5. Verify all readers use the new shape
6. Contract by removing old columns or code
For example, renaming users.full_name to users.display_name should not be a single migration that renames the column in place or drops the old one. Start by adding the new column alongside the old one:
ALTER TABLE users ADD COLUMN display_name text;
Then deploy code that writes both:
// Dual write: keep the old column populated for versions that still read it.
await db.user.update({
  where: { id: userId },
  data: {
    fullName: input.name,
    displayName: input.name,
  },
});
Only after old readers are gone and records are backfilled should the team remove full_name.
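Readers need a matching fallback during the window, because display_name is null on any row the backfill has not reached yet. A minimal sketch, assuming a hypothetical `UserRow` type whose camelCase fields mirror the two columns (as in the update call above); `resolveDisplayName` is an illustrative name, not part of any library:

```typescript
// Hypothetical row shape during the compatibility window (assumption).
interface UserRow {
  id: string;
  fullName: string | null;    // old column, still written by older versions
  displayName: string | null; // new column, null until written or backfilled
}

// Prefer the new column; fall back to the old one for rows not yet backfilled.
function resolveDisplayName(user: UserRow): string {
  return user.displayName ?? user.fullName ?? "";
}
```

This fallback only covers missing values. A version that writes only full_name can still leave display_name stale, which is one reason to run the backfill after dual writes are fully deployed.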
Treat workers as their own rollout
Web servers are rarely the only readers. Queue workers, cron jobs, webhook handlers, report exporters, and admin scripts may touch the same table with different deployment timing.
Before merging a migration, list the processes:
Table: invoices
Readers: web invoice page, PDF export worker, revenue report cron, support admin
Writers: checkout API, invoice adjustment job
Lag risk: PDF export worker can process jobs created before deploy
Compatibility decision: keep old total_cents field until worker queue drains
If one process can lag, the schema needs to remain compatible until that process has been deployed and its queue of older work has drained.
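The inventory can also be kept as data, so the "who still needs the old shape" question gets a mechanical answer. This is a sketch under stated assumptions: the `TableProcess` fields, the build numbers, and the idea that each process reports a count of pre-deploy jobs are all hypothetical.

```typescript
// Hypothetical inventory entry for one process that touches the table.
interface TableProcess {
  name: string;
  role: "reader" | "writer";
  deployedBuild: number;  // build currently running for this process
  pendingOldJobs: number; // queued work created before the new schema shipped
}

// Returns the names of processes that still depend on the old shape:
// either they run a build older than the first compatible one,
// or they still have pre-deploy work waiting in their queue.
function processesBlockingContract(
  processes: TableProcess[],
  minCompatibleBuild: number,
): string[] {
  return processes
    .filter((p) => p.deployedBuild < minCompatibleBuild || p.pendingOldJobs > 0)
    .map((p) => p.name);
}
```

An empty result is a necessary condition for the contract step, not a sufficient one. It says nothing about whether the backfill has finished.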
Prove the overlap with a small test
The overlap test does not need to model the whole system. It needs to exercise the one risky assumption, for example that a reader works against rows in both the old and the new shape.
it("new invoice reader handles rows before and after tax breakdown backfill", async () => {
  await insertInvoice({ id: "old", totalCents: 1200, taxBreakdown: null });
  await insertInvoice({
    id: "new",
    totalCents: 1200,
    taxBreakdown: [{ type: "vat", cents: 200 }],
  });
  expect(await renderInvoice("old")).toContain("$12.00");
  expect(await renderInvoice("new")).toContain("$12.00");
});
This test documents the compatibility window better than a checklist sentence.
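A reader that passes a test like this has to treat a null tax breakdown as "not backfilled yet" rather than as an error. A minimal sketch, assuming a hypothetical in-memory `Invoice` shape and dollar formatting; `renderInvoiceLines` is illustrative and is not the `renderInvoice` used in the test above:

```typescript
interface TaxLine {
  type: string;
  cents: number;
}

// Hypothetical invoice shape during the compatibility window (assumption).
interface Invoice {
  id: string;
  totalCents: number;
  taxBreakdown: TaxLine[] | null; // null means the row has not been backfilled
}

// Formats integer cents as dollars with two decimals.
function formatCents(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

function renderInvoiceLines(invoice: Invoice): string[] {
  const lines = [`Total: ${formatCents(invoice.totalCents)}`];
  // Old rows have no breakdown yet: skip the tax section instead of failing.
  for (const tax of invoice.taxBreakdown ?? []) {
    lines.push(`${tax.type.toUpperCase()}: ${formatCents(tax.cents)}`);
  }
  return lines;
}
```

The total comes from the old field in both cases, so the page stays correct for rows on either side of the backfill.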
Backfills are production jobs
A backfill is not just a one-time script someone runs from a laptop. It should be restartable, rate-limited, observable, and safe to pause.
Batch size: 500 rows
Progress key: last processed invoice id
Rate limit: sleep 250ms between batches
Stop condition: no rows where tax_breakdown is null
Dashboard: rows remaining, errors, database write latency
Rollback: leave old total_cents reads in place until backfill verification passes
If the table is hot, measure database load during the backfill. If records can be partially backfilled, application code must tolerate that partial state until the backfill is complete.
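The plan above can be sketched as a resumable loop. Everything here is an assumption made for illustration: invoice ids are numeric and increasing, `table` is an in-memory array standing in for the invoices table, and `derive` is a hypothetical function that computes the breakdown for one row. A real job would replace the array operations with batched queries and persist the state between runs.

```typescript
type TaxLine = { type: string; cents: number };
type InvoiceRow = { id: number; totalCents: number; taxBreakdown: TaxLine[] | null };

interface BackfillState {
  lastProcessedId: number; // progress key; persist it so a restart can resume
  done: boolean;           // true once no pending rows remain after the cursor
}

// Processes one batch of rows with id > lastProcessedId that lack a breakdown.
function backfillBatch(
  table: InvoiceRow[],
  state: BackfillState,
  batchSize: number,
  derive: (row: InvoiceRow) => TaxLine[],
): BackfillState {
  const batch = table
    .filter((r) => r.id > state.lastProcessedId && r.taxBreakdown === null)
    .sort((a, b) => a.id - b.id)
    .slice(0, batchSize);
  if (batch.length === 0) return { ...state, done: true };
  for (const row of batch) row.taxBreakdown = derive(row);
  return { lastProcessedId: batch[batch.length - 1].id, done: false };
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Driver loop: rate-limited, pausable, and resumable from the returned state.
async function runBackfill(
  table: InvoiceRow[],
  state: BackfillState,
  opts: { batchSize: number; sleepMs: number; shouldPause: () => boolean },
  derive: (row: InvoiceRow) => TaxLine[],
): Promise<BackfillState> {
  let current = state;
  while (!current.done && !opts.shouldPause()) {
    current = backfillBatch(table, current, opts.batchSize, derive);
    await sleep(opts.sleepMs); // gives the database room between batches
  }
  return current;
}
```

Because the cursor only moves forward, `done` means "nothing pending after the cursor". The final verification should still count null rows across the whole table before any reader drops its fallback.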
Rollback only works while the old shape still works
Redeploying the previous application version is not a rollback if the database no longer matches it. Dropping a column, changing enum values in place, or renaming a field too early can make the old version crash immediately.
Destructive changes belong at the end:
Release A: add new nullable column
Release B: read/write both fields
Release C: backfill
Release D: read only new field, keep old field populated
Release E: remove old writes, then drop old field once no running version references it
That sequence is slower than a single migration. It is also the reason the team can recover when Release C reveals a bad assumption.
Make the compatibility window visible
Temporary compatibility code becomes dangerous when nobody knows whether it is still temporary. Track the window:
Compatibility: invoice total_cents and tax_breakdown coexist
Owner: billing platform
Started: 2026-06-05
Can remove old field when:
- PDF export workers deployed after build 1842
- backfill dashboard shows zero remaining rows
- support admin reads tax_breakdown for seven days without errors
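The same record can live next to the code as data, so a scheduled check can report how long the window has been open and what still blocks removal. A minimal sketch; the `CompatibilityWindow` type and both helper names are hypothetical, and the `met` flags are assumed to be updated by whoever owns the window:

```typescript
interface ExitCondition {
  description: string;
  met: boolean;
}

// Hypothetical record mirroring the tracking note above.
interface CompatibilityWindow {
  name: string;
  owner: string;
  started: string; // ISO date, e.g. "2026-06-05"
  exitConditions: ExitCondition[];
}

// Lists the conditions that still block removing the old field.
function unmetConditions(w: CompatibilityWindow): string[] {
  return w.exitConditions.filter((c) => !c.met).map((c) => c.description);
}

// Whole days the window has been open; date-only ISO strings parse as UTC.
function windowAgeInDays(w: CompatibilityWindow, now: Date): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((now.getTime() - Date.parse(w.started)) / msPerDay);
}
```

A window that keeps aging while its unmet list stays the same is a sign that the temporary code has quietly become permanent.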
Zero downtime is less about a heroic deployment platform and more about respecting the period where two versions of the system are alive. The database has to be boring during that period.