The Hidden Cost of 'We'll Just Rewrite It'
Published on August 12, 2024
"We'll just rewrite it" might be the four most expensive words in software development.
I've seen this play out dozens of times. A team inherits a messy legacy system, spends a few weeks trying to understand it, gets frustrated, and declares that rewriting from scratch will be faster than fixing what exists. Six months later, they're halfway done, over budget, and realizing the old system actually did more than they thought.
Why Rewrites Are Tempting
I get it. Looking at a 10-year-old codebase is painful:
- Mix of coding styles from different eras
- Patterns that made sense in 2014 but not today
- Dependencies on outdated frameworks
- Zero documentation and the original developers are long gone
- Mysterious code comments like "DO NOT REMOVE - production breaks"
The temptation is to wipe the slate clean and build it "the right way" with modern tools and patterns. It feels like the engineering answer to a fresh start.
The Hidden Complexity in Legacy Systems
Here's what you don't see when you look at that ugly codebase:
It handles edge cases you've forgotten exist. That weird validation logic in the checkout flow? Turns out it prevents a race condition that caused $50K in chargebacks five years ago. Nobody remembers this anymore, but removing it will reintroduce the problem.
It contains 10 years of bug fixes. Every if (x == null && y > 0) check represents a production incident someone debugged at 2 AM. Your rewrite won't have these fixes until you rediscover each bug the hard way.
It reflects business rules that aren't documented. The accounting team expects invoices to follow a specific numbering pattern. The warehouse needs orders formatted exactly one way for their integration. None of this is written down—it just works.
It has performance optimizations you don't see. That confusing caching layer? It was added because the naive approach couldn't handle Black Friday traffic. Your clean rewrite will have the same scaling problems until you add the same ugly caching.
The Math That Doesn't Work
Here's the typical rewrite pitch:
"The old system took 3 years to build, but we've learned so much since then. With modern tools, we can rebuild it in 6 months."
This is fantasy. Here's reality:
- Months 1-2: Setting up the new architecture, picking frameworks, debating microservices vs monolith
- Months 3-4: Building the happy path—features seem to work great in demos
- Month 5: Realizing the old system did way more than you documented
- Month 6: Panic as the deadline approaches and you're at 60% feature parity
- Months 7-12: Rebuilding all the edge cases and bug fixes from the legacy system
- Months 13-18: Dealing with the new bugs your rewrite introduced
- Months 19-24: Performance tuning because your clean architecture doesn't scale
So your "6-month rewrite" actually takes 2 years. During that time:
- Your competitors shipped new features
- Your existing system still needed maintenance (bugs don't stop for rewrites)
- Your team got burned out from the death march
- You burned through your innovation budget on rebuilding existing functionality
When Rewrites Actually Make Sense
I'm not saying never rewrite. Sometimes it's the right call:
The technology is truly dead. If you're on Classic ASP or Visual Basic 6 or Flash, yes, you need to rewrite. The platform is dead and there's no path forward.
The business has fundamentally changed. If you built a desktop app but now need cloud-based SaaS, or you were B2C but pivoted to B2B enterprise—the requirements are so different that the old code isn't salvageable.
The cost of change exceeds the cost of replacement. Sometimes the accidental complexity is so high that every change takes weeks and introduces multiple bugs. At some point, the math flips.
You have true greenfield opportunity. A new product line, new market, new customer segment—something that doesn't need to maintain compatibility with the existing system.
The Strangler Fig Alternative
Here's what actually works: the strangler fig pattern.
Imagine wrapping a new system around the old one, gradually replacing functionality piece by piece. Each new feature gets built in the new system. Each refactor extracts one component at a time. The old system shrinks while the new one grows around it.
Benefits:
- You can ship incrementally and get feedback
- Users never experience a "big bang" cutover
- You learn what the old system actually did before replacing it
- If you run out of budget, you still have a working system
- Risk is distributed over time instead of all at once
Example:
- Month 1: New checkout flow in modern React, old system handles everything else
- Month 2: Extract product catalog to new API, keep legacy search for now
- Month 3: Migrate user authentication to new system
- Month 6: 50% of functionality is in the new system, but everything still works
- Month 12: Legacy system is down to 20% of functionality
- Month 18: Turn off the last piece of legacy code
At any point, if you need to stop (budget, priorities shift, whatever), you have a working hybrid system. With a full rewrite, stopping halfway leaves you with nothing usable from the new work.
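To make the routing idea concrete, here is a minimal sketch of the piece that sits in front of both systems. Everything in it is hypothetical: the StranglerRouter class, the handler functions, and the path prefixes are illustrative names, not part of any particular framework. In a real setup this job is usually done by a reverse proxy or API gateway rather than application code.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def legacy_handler(path: str) -> str:
    # Hypothetical stand-in for the old system, which still serves
    # everything that has not been migrated yet.
    return f"legacy:{path}"


def new_checkout(path: str) -> str:
    # Hypothetical stand-in for the first slice rebuilt in the new system.
    return f"new-checkout:{path}"


class StranglerRouter:
    """Sends migrated path prefixes to new handlers, the rest to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, prefix: str, handler: Handler) -> None:
        # Moving a slice to the new system is a routing change.
        self._migrated[prefix] = handler

    def rollback(self, prefix: str) -> None:
        # If the new slice misbehaves, traffic falls back to legacy.
        self._migrated.pop(prefix, None)

    def handle(self, path: str) -> str:
        # Longest matching prefix wins; unmatched paths go to legacy.
        matches = [p for p in self._migrated if path.startswith(p)]
        if matches:
            return self._migrated[max(matches, key=len)](path)
        return self._legacy(path)


router = StranglerRouter(legacy_handler)
router.migrate("/checkout", new_checkout)
```

With this in place, router.handle("/checkout/pay") goes to the new code while router.handle("/search") still reaches the old system, and rollback("/checkout") undoes the move without a deploy of either side.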
The Refactor vs Rewrite Decision Tree
Use this framework:
Can you deploy the existing code? If no, start with getting deployment automated. You can't safely change code you can't deploy.
Can you add tests to the existing code? If yes, do that first. Tests give you confidence to refactor. Start with characterization tests that document current behavior.
Is the core domain logic salvageable? The business rules are usually fine; it's the infrastructure and UI that are ugly. If so, extract the domain logic and wrap it in a new shell.
Can you identify clear boundaries? If you can isolate components (like "the reporting system" or "the admin panel"), you can extract and replace them one at a time.
Do you have 2-3x the timeline and budget you think you need? If not, you can't afford a rewrite. The scope always grows.
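If characterization tests are new to you, here is a rough sketch of what one can look like. The legacy_invoice_number function is a made-up stand-in for some undocumented piece of your old system, and the expected values are simply whatever the code returns today. The point is to pin current behavior, not to assert what it should be.

```python
import unittest


def legacy_invoice_number(year: int, sequence: int) -> str:
    # Hypothetical stand-in for legacy code nobody documented.
    # Two-digit year, then a sequence number zero-padded to 5 digits.
    return f"INV-{year % 100:02d}-{sequence:05d}"


class InvoiceNumberCharacterization(unittest.TestCase):
    """Pins what the code does today so a refactor cannot change it silently."""

    def test_pins_current_format(self):
        # Expected value copied from the observed output.
        self.assertEqual(legacy_invoice_number(2024, 42), "INV-24-00042")

    def test_sequence_wider_than_padding(self):
        # Observed quirk: long sequences are not truncated.
        # Someone downstream may depend on this, so it gets pinned too.
        self.assertEqual(legacy_invoice_number(2024, 123456), "INV-24-123456")
```

Run it with python -m unittest. When one of these fails after a refactor, that is not automatically a bug, but it is a behavior change you now have to decide about on purpose.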
What to Do Instead
Here's my standard recommendation for teams stuck with legacy code:
Stop digging. No new features in the old codebase. Everything new gets built in a new layer.
Build a facade. Create a new API layer that wraps the old system. New code calls the facade, not the old system directly.
Extract one vertical slice. Pick one feature end-to-end (like user registration) and rewrite just that. Get it to production. Learn from the experience.
Measure everything. Track how long changes take, how often bugs happen, what parts of the code cause the most pain. This data tells you where to focus refactoring effort.
Set a sunset date. Give yourself a deadline for the old system. This creates urgency but also forces realistic planning.
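The facade step is the one people most often ask me to show, so here is a minimal sketch. LegacyOrderSystem, its GetOrdRec method, and the field names are all invented for illustration; your old system's interface will look different. What matters is the shape: new code sees a clean Order type and never touches the legacy record format.

```python
from dataclasses import dataclass


class LegacyOrderSystem:
    """Hypothetical stand-in for the old system's awkward interface."""

    def GetOrdRec(self, ord_id: str) -> dict:
        # Cryptic keys and stringly-typed amounts, as legacy code often has.
        return {"ORD_ID": ord_id, "STAT_CD": "S", "TOT_AMT_CENTS": "1999"}


@dataclass(frozen=True)
class Order:
    order_id: str
    status: str
    total_cents: int


class OrderFacade:
    """The only place that knows how the legacy record is laid out."""

    _STATUS = {"P": "pending", "S": "shipped", "C": "cancelled"}

    def __init__(self, legacy: LegacyOrderSystem) -> None:
        self._legacy = legacy

    def get_order(self, order_id: str) -> Order:
        rec = self._legacy.GetOrdRec(order_id)
        return Order(
            order_id=rec["ORD_ID"],
            status=self._STATUS.get(rec["STAT_CD"], "unknown"),
            total_cents=int(rec["TOT_AMT_CENTS"]),
        )


facade = OrderFacade(LegacyOrderSystem())
order = facade.get_order("A1")
```

Later, when orders move to the new system, you swap what sits behind get_order and the callers do not change.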
The ROI Conversation
When your team wants to rewrite, make them do the math:
- Old system maintenance costs: $X per month
- Rewrite cost: $Y total
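- Time to break even: Y / X months (and that assumes the new system costs nothing to maintain; if it costs $Z per month, break-even is Y / (X - Z) months)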
- But factor in: opportunity cost of not shipping new features during rewrite
- And factor in: risk that rewrite takes longer than estimated (it will)
Often this math shows that limping along with the old system while gradually modernizing is cheaper than a full rewrite, even though it feels worse.
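Here is a small sketch of that calculation with the extra factors folded in. The function name, its parameters, and every dollar figure below are made up for illustration; plug in your own numbers. It also assumes the overrun stretches both the rewrite cost and the time spent not shipping features, which is a simplification.

```python
from typing import Optional


def break_even_months(
    rewrite_cost: float,
    old_monthly: float,
    new_monthly: float = 0.0,
    opportunity_cost_per_month: float = 0.0,
    rewrite_months: float = 0.0,
    overrun_factor: float = 1.0,
) -> Optional[float]:
    """Months after launch until the rewrite has paid for itself.

    Returns None when the new system is not cheaper to run than the
    old one, because then the rewrite never breaks even on maintenance.
    """
    monthly_savings = old_monthly - new_monthly
    if monthly_savings <= 0:
        return None
    # The overrun inflates the build cost and the months of lost features.
    total_cost = overrun_factor * (
        rewrite_cost + opportunity_cost_per_month * rewrite_months
    )
    return total_cost / monthly_savings


# The pitch: $600K rewrite, $20K/month saved, nothing else counted.
naive = break_even_months(rewrite_cost=600_000, old_monthly=20_000)

# Same rewrite with illustrative values for the factors above.
adjusted = break_even_months(
    rewrite_cost=600_000,
    old_monthly=20_000,
    new_monthly=8_000,
    opportunity_cost_per_month=25_000,
    rewrite_months=6,
    overrun_factor=2.0,
)

print(naive)     # 30.0
print(adjusted)  # 125.0
```

With these invented inputs the break-even moves from 30 months to 125. Your numbers will differ, but the direction of the adjustment rarely does.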
The Bottom Line
Rewrites fail because they assume that rebuilding will be faster than understanding. It rarely is.
Your legacy system is ugly because it solved real problems in real production environments over years. Your rewrite will look equally ugly eventually—it just hasn't had time to accumulate that wisdom yet.
Respect the legacy code. It might be ugly, but it works. Your job isn't to replace it with something prettier—it's to deliver value to the business. Sometimes that means refactoring in place. Sometimes that means gradual migration. Sometimes it even means a rewrite.
But make that decision based on ROI, not frustration.