Migrations That Don't Lose You a Weekend (or a Customer)
- modernisation
- architecture
- delivery
Most migration horror stories share a root cause: someone treated it as an event instead of a process. The big-bang cutover on a Saturday night, the rollback that doesn't work, the customers who notice. I've led modernisation on platforms turning over eight figures, and the ones that went smoothly all followed the same unglamorous discipline.
Migrate in parallel, never in place
Run the new path alongside the old one. Dual-write, shadow-read, compare outputs in production with real traffic before anything depends on the new system. You want to discover the discrepancy while the old path is still serving customers, not after you've switched it off.
The data is the hard part, always
Teams underestimate data debt every single time. The schema that doesn't quite map, the records that violate the constraints you're about to add, the "temporary" field everything now depends on. Audit the data early and brutally — it's where the timeline actually lives, not the code.
A cutover you can reverse in minutes
If your rollback plan is "redeploy the old version and hope," you don't have one. Feature flags, a routing layer you can flip, a tested path back. The confidence to cut over comes entirely from the confidence you can cut back.
Infrastructure as code makes it repeatable
Terraform and containers turn "it worked in staging" into "it is identical in production." Environment drift is what turns a clean migration into a 2am incident. Eliminating it is boring and it's the difference between a non-event and a weekend lost.
Slow is smooth, smooth is fast
Phased, reversible, observable. It feels slower than the heroic cutover. It is dramatically faster than the heroic cutover that fails.