The Vendor Lock-In Decision Framework

Published on May 15, 2024

Every architecture decision involves tradeoffs. One of the most debated: vendor lock-in. Should you use AWS Lambda or build your own orchestration? Azure SQL or Postgres? Google Cloud Functions or self-hosted containers?

The conventional wisdom is "avoid lock-in at all costs." But after working with dozens of organizations, I've found this absolutist view causes more problems than it solves.

Here's a framework for thinking through vendor decisions rationally.

The Real Cost of Lock-In

Lock-in has two actual costs:

Migration cost: If you need to switch vendors, how much will it cost in time and money?

Pricing leverage: If you're stuck with one vendor, they can raise prices and you have no options.

Notice what's NOT on this list: "using vendor-specific features." That's not a cost—that's an investment decision.

The Hidden Cost of Avoiding Lock-In

Teams scared of lock-in build abstraction layers to stay "portable." Here's what that actually looks like:

Lowest common denominator features. You can't use Aurora's write forwarding or Azure SQL's geo-replication because your abstraction layer needs to work with vanilla Postgres. So you're paying for premium services but getting commodity features.

Maintenance burden. That abstraction layer doesn't maintain itself. When AWS releases new features, you need to update your wrapper. When a bug appears, you're debugging your abstraction instead of using tested vendor code.

Performance overhead. Every abstraction adds latency and complexity. Your "portable" queue has worse throughput than SQS because it's a wrapper around multiple possible backends.

Opportunity cost. Time spent maintaining portability is time not spent building features customers care about.
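
To make the lowest-common-denominator problem concrete, here's a minimal sketch of what such a wrapper often ends up looking like. PortableQueue and SqsQueue are hypothetical names; the point is that a shared interface can only carry what every backend supports, so SQS capabilities like per-message delays, FIFO deduplication, and long polling have nowhere to go.

```python
# Hypothetical "portable" queue wrapper (PortableQueue and SqsQueue are
# illustrative names). The shared interface can only express what every
# backend supports, so SQS features like per-message delays, FIFO
# deduplication, and long polling go unused.

from typing import Optional, Protocol


class PortableQueue(Protocol):
    def send(self, body: str) -> None: ...
    def receive(self) -> Optional[str]: ...


class SqsQueue:
    """Adapter that pays for SQS but uses it like a plain mailbox."""

    def __init__(self, sqs_client, queue_url: str) -> None:
        self._sqs = sqs_client  # a boto3 SQS client
        self._url = queue_url

    def send(self, body: str) -> None:
        # DelaySeconds, MessageGroupId, etc. can't be passed through the
        # shared interface, so they simply aren't used.
        self._sqs.send_message(QueueUrl=self._url, MessageBody=body)

    def receive(self) -> Optional[str]:
        # Acknowledgement/deletion is elided in this sketch.
        resp = self._sqs.receive_message(QueueUrl=self._url, MaxNumberOfMessages=1)
        messages = resp.get("Messages", [])
        return messages[0]["Body"] if messages else None
```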

I've seen teams spend 6 months building a "cloud-agnostic" deployment system that perfectly replicates 30% of what Kubernetes does. They avoided lock-in to AWS. They created lock-in to their own increasingly unmaintained internal tool.

The Matrix: When to Embrace vs Avoid Lock-In

Not all lock-in is equal. Here's how to evaluate each decision:

Fully Embrace (High Value, Low Risk)

Infrastructure primitives: EC2, S3, blob storage, managed databases. Every cloud has these. Migration is straightforward even if annoying.

Why embrace: These are commodities. The managed versions save enormous operational overhead. Migration risk is low because all clouds offer equivalents.

Example: Use RDS or Azure SQL instead of self-hosting Postgres. The operational savings are huge and migration to another managed Postgres is realistic.

Strategic Use (High Value, Medium Risk)

Platform services: Lambda, Azure Functions, Cloud Run. These are powerful but proprietary.

Why strategic use: They provide massive productivity gains. But you need a plan.

The mitigation: Keep business logic separate from platform code. Your Lambda handler should be a thin adapter calling domain logic. Then migration means rewriting adapters, not business logic.

Example: A Lambda function that receives events and calls your OrderProcessingService. If you need to move off Lambda, the OrderProcessingService goes anywhere—you just need a new trigger mechanism.
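
Here's a minimal sketch of that separation, assuming a hypothetical OrderProcessingService and an SQS-triggered Lambda. The handler only translates the event; everything below it is plain Python that could run behind any trigger.

```python
# Sketch of a thin Lambda adapter over portable domain logic.
# Order and OrderProcessingService are hypothetical names; the event shape
# assumes an SQS trigger.

import json
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    amount_cents: int


class OrderProcessingService:
    """Pure domain logic: no AWS imports, runs behind any trigger."""

    def process(self, order: Order) -> None:
        # ...validation, pricing, persistence via injected repositories...
        print(f"processed {order.order_id} ({order.amount_cents} cents)")


service = OrderProcessingService()


def lambda_handler(event, context):
    """Platform adapter: only translates the SQS/Lambda event into domain calls."""
    records = event.get("Records", [])
    for record in records:
        payload = json.loads(record["body"])
        service.process(Order(order_id=payload["orderId"],
                              amount_cents=payload["amountCents"]))
    return {"processed": len(records)}
```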

Carefully Evaluate (Medium Value, High Risk)

Vendor-specific data formats: DynamoDB's data model, Cosmos DB's unique features, proprietary document schemas.

Why careful: Data is the hardest thing to migrate. Once you have millions of records in a vendor-specific format, extraction is painful.

The mitigation: If you use proprietary data stores, have an export strategy from day one. Test it. Regular backups in portable formats. Scripts that can reconstruct data elsewhere.

Example: If using DynamoDB, maintain backups that can be imported into MongoDB or Postgres. Don't let the vendor own your data without an exit path.
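
A rough sketch of what such an export job might look like with boto3, assuming a hypothetical orders table. A production version would use parallel scans, handle throttling, and land the files somewhere durable, but even this much keeps the exit path tested.

```python
# Rough sketch of a scheduled export: dump a DynamoDB table to
# newline-delimited JSON that Postgres, MongoDB, or anything else can ingest.
# Table name and output path are placeholders.

import json

import boto3

dynamodb = boto3.resource("dynamodb")


def export_table(table_name: str, out_path: str) -> int:
    """Scan the full table and write one JSON object per line."""
    table = dynamodb.Table(table_name)
    scan_kwargs = {}
    count = 0
    with open(out_path, "w") as out:
        while True:
            page = table.scan(**scan_kwargs)
            for item in page["Items"]:
                # default=str handles DynamoDB's Decimal values.
                out.write(json.dumps(item, default=str) + "\n")
                count += 1
            last_key = page.get("LastEvaluatedKey")
            if last_key is None:
                break
            scan_kwargs["ExclusiveStartKey"] = last_key
    return count


if __name__ == "__main__":
    print(export_table("orders", "orders_export.jsonl"))
```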

Avoid (Low Value, High Risk)

Proprietary business logic: If AWS offered a "revenue calculation as a service" API, you shouldn't use it. Business logic should live in your codebase.

Why avoid: You can't extract this. Your core business processes shouldn't be opaque vendor services.

Workflow orchestration that's fully proprietary: AWS Step Functions is powerful, but your workflow definitions become vendor-specific. If workflows are critical to your business, use something portable like Temporal or code-based orchestration.

The TCO Calculation Most Teams Skip

Let's be concrete. Two scenarios:

Scenario A: Embrace Vendor Services

  • Team uses Lambda, SQS, DynamoDB, API Gateway
  • Development velocity: High (no infrastructure to manage)
  • Operational overhead: Low (vendor handles scaling, monitoring, patches)
  • Migration risk: Medium (would take 2-3 months if needed)
  • Monthly cost: $15K in AWS services
  • Engineering cost: 0.5 FTE on infrastructure

Total annual cost: $180K (AWS) + $100K (0.5 FTE) = $280K

Scenario B: Cloud-Agnostic Approach

  • Team runs Kubernetes, self-managed Postgres, Redis, custom queuing
  • Development velocity: Medium (infrastructure work competes with features)
  • Operational overhead: High (team owns monitoring, scaling, incidents)
  • Migration risk: Low (could move clouds in 1 month)
  • Monthly cost: $8K in compute resources
  • Engineering cost: 2 FTEs on infrastructure

Total annual cost: $96K (compute) + $400K (2 FTEs) = $496K

The "portable" approach costs $216K more per year and ships features slower. Even if you needed to migrate every 5 years, the economics don't work out.

Now, this calculation changes if:

  • You're at massive scale where managed services become cost-prohibitive
  • You have compliance requirements that rule out certain services
  • You're building infrastructure software where portability is a product feature

But for most businesses, vendor services are the better economic choice.

The Questions to Ask

Here's my decision framework for every vendor service:

1. How Hard Is Migration?

Easy: Infrastructure primitives with clear equivalents (compute, storage, networking)

Medium: Services with similar alternatives (Lambda → Cloud Functions → Cloud Run)

Hard: Proprietary data formats or workflow definitions

Impossible: Vendor owns your business logic

2. How Much Value Does It Provide?

Massive: Eliminates whole categories of undifferentiated work (managed databases, serverless compute)

Significant: Solves complex problems you'd otherwise build poorly (authentication, CDN, DDoS protection)

Marginal: Convenience features you could easily build (simple queues, basic caching)

3. What's Your Realistic Alternative?

Don't compare vendor services to an imaginary perfect alternative. Compare to what you'd actually build.

Bad comparison: "DynamoDB vs a perfectly optimized NoSQL cluster we'd build"

Real comparison: "DynamoDB vs the MongoDB cluster we'd run on EC2 with one person on-call for incidents"

4. How Critical Is This Component?

Core business logic: Avoid proprietary implementations

Infrastructure layer: Embrace managed services

Supporting features: Use whatever ships fastest

5. Do You Have Leverage?

No leverage: Small startup with $2K/month AWS bill. AWS will never negotiate and lock-in doesn't matter because you can't afford the migration anyway.

Some leverage: $50K/month spend. You have options, but switching costs are real. This is where abstraction layers might make sense for negotiating purposes.

Significant leverage: $500K/month spend. You can negotiate with vendors. You can also afford engineers to build abstractions if they're worth it. But you're probably better off playing vendors against each other than building portability.

The Compounding Error

The worst decision is building bad abstractions that provide neither vendor benefits nor real portability.

I've seen teams:

Build a "multi-cloud deployment system" that kind of works on AWS and Azure but uses neither platform's strengths. It's just badly reimplementing Kubernetes.

Create a "database abstraction layer" that works with MySQL, Postgres, and SQL Server. It supports the lowest common denominator of SQL, so they can't use any advanced features. They might as well use SQLite.

Wrap every AWS service in "portable" interfaces that are actually just pass-throughs with extra latency. When they try to migrate, they discover the abstractions leaked vendor details everywhere anyway.

Good abstractions:

Keep business logic separate from platform adapters. You can rewrite adapters; you can't rewrite tangled code.

Abstract at the right level. A repository pattern for data access? Reasonable. An abstraction that hides whether you're using REST vs GraphQL vs gRPC? Probably too leaky.

Test the abstraction's portability. If you've never actually swapped implementations, you don't know if it's portable. You just know it's slower.
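
As one sketch of abstracting at the right level: a small repository port with a vendor adapter and an in-memory adapter. The names are illustrative, but the in-memory implementation doubles as a test double and as standing proof that the port really can be swapped.

```python
# Illustrative repository port with swappable adapters. OrderRepository,
# DynamoOrderRepository, and InMemoryOrderRepository are hypothetical names;
# domain code depends only on the protocol.

from typing import Optional, Protocol


class OrderRepository(Protocol):
    def save(self, order_id: str, data: dict) -> None: ...
    def get(self, order_id: str) -> Optional[dict]: ...


class DynamoOrderRepository:
    """Vendor adapter: the only place DynamoDB details are allowed to live."""

    def __init__(self, table) -> None:
        self._table = table  # a boto3 Table resource

    def save(self, order_id: str, data: dict) -> None:
        self._table.put_item(Item={"order_id": order_id, **data})

    def get(self, order_id: str) -> Optional[dict]:
        return self._table.get_item(Key={"order_id": order_id}).get("Item")


class InMemoryOrderRepository:
    """Test double that also proves the port can actually be swapped."""

    def __init__(self) -> None:
        self._items: dict = {}

    def save(self, order_id: str, data: dict) -> None:
        self._items[order_id] = {"order_id": order_id, **data}

    def get(self, order_id: str) -> Optional[dict]:
        return self._items.get(order_id)
```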

When You Absolutely Need Portability

Some organizations actually do need multi-cloud:

True requirements for portability:

Regulatory requirements. Financial services that need data residency across multiple regions and clouds.

Customer requirements. Selling to enterprises that mandate deployment in their cloud.

Jurisdictional requirements. Operating globally where some countries ban certain cloud providers.

Product requirements. You're selling infrastructure software; your product needs to run anywhere.

Real risk of a single cloud going away. This is mostly theoretical, but if you're building something with a 20-year lifespan, plan accordingly.

If you have these requirements, embrace the complexity. Use Kubernetes. Build real abstraction layers. Test them continuously.

But if you're a typical SaaS company optimizing for "what if we want to switch someday," you're paying real costs today for hypothetical benefits tomorrow.

The Honest Recommendation

Start with vendor services. Get to product-market fit. Build the business. Worry about lock-in when you're successful enough that it matters.

Design for extraction, not portability. Keep business logic separate. Use adapters at boundaries. Don't scatter vendor-specific code throughout your codebase.

Have a migration playbook. Document what you'd need to do if you had to switch. Test exports periodically. Know the cost. Then make an informed decision about whether to pay it.

Negotiate when you have leverage. Once you're spending real money, use migration risk as negotiating leverage. "We're evaluating alternatives" is a powerful statement when you've designed for extraction.

The goal isn't perfect portability. The goal is making an informed tradeoff between velocity today and flexibility tomorrow.

Most companies optimize for the wrong one.


