We Killed Our Cloud Migration—On-Premise Was Better (and 70% Cheaper)

In 2023, our board asked why we weren't "in the cloud." Every startup was in the cloud. Every enterprise was migrating to the cloud. Remaining on bare metal felt archaic, like driving a horse-drawn carriage on a superhighway. So we hired a consultancy, formed a "Cloud Center of Excellence," and began a massive lift-and-shift of our infrastructure to AWS.

Two years and 4.2 million dollars later, we stopped. We didn't just pause; we completely reversed course. We canceled the enterprise support contract, ordered pallets of Dell servers, leased a cage in an Equinix datacenter, and repatriated our data. Our monthly infrastructure bill dropped by 70%. Here is the reality of the cloud that the major providers desperately try to obscure.

The False Promise of "Infinite Scale"

The primary argument for the cloud is elasticity. You only pay for what you use, and you can scale infinitely to meet spikes in demand. This is a brilliant marketing pitch for a problem most companies do not have.

We are a B2B SaaS company. Our traffic is not spiky. We do not experience "Super Bowl spikes." Our traffic curve looks like a gentle sine wave that peaks at 10 AM on weekdays and drops on weekends. Our workloads are highly predictable: we know how much compute, memory, and storage we will need six months in advance.

When you have predictable workloads, paying a massive premium for the option of elasticity is financial malpractice. We were paying AWS an 800% markup on compute purely for the privilege of being able to spin up 1,000 servers in three minutes—a capability we had never needed in seven years of operation.
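One way to sanity-check that trade-off is a break-even calculation. The sketch below is a simplification, not a full cost model: it assumes owned capacity must be sized for peak demand and paid for around the clock, that elastic capacity is billed only for what is actually used, and that an "800% markup" means the elastic unit price is nine times the owned unit cost. The function names are illustrative.

```python
def breakeven_utilization(markup_pct: float) -> float:
    """Average-to-peak utilization at which elastic and owned capacity cost the same.

    Assumes owned capacity is sized for peak and paid for continuously,
    while elastic capacity is billed only for the fraction actually used.
    """
    price_ratio = 1 + markup_pct / 100  # elastic unit price / owned unit cost
    return 1 / price_ratio


def cheaper_option(avg_utilization: float, markup_pct: float) -> str:
    """Return which model is cheaper for a given average-to-peak utilization."""
    if not 0 < avg_utilization <= 1:
        raise ValueError("avg_utilization must be in (0, 1]")
    if avg_utilization < breakeven_utilization(markup_pct):
        return "elastic"
    return "owned"


if __name__ == "__main__":
    # 800 is the markup quoted above; 0.6 is a hypothetical utilization
    # for a steady, weekday-peaking workload, not a measured figure.
    print(f"break-even utilization: {breakeven_utilization(800):.1%}")
    print("cheaper option at 60% utilization:", cheaper_option(0.6, 800))
```

Under these assumptions the break-even sits at about 11% average utilization. A workload that keeps its hardware busier than that is cheaper to own, and elasticity only pays for itself when demand is idle most of the time.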

Furthermore, cloud scaling isn't actually infinite, nor is it effortless. When you hit a sudden spike in AWS, you hit account limits. You hit API rate limits. Your database connections max out before your auto-scaling groups provision new instances. True infinite scale in the cloud requires incredibly complex, custom-engineered architecture. It is not something you get simply by moving your VMs to EC2.

The Egress Extortion

Data has gravity. The cloud providers understand this perfectly. They make it incredibly cheap to move your data into their ecosystem, and punitively expensive to move it out or between zones.

Our application processes immense amounts of telemetry data. In our on-premise datacenter, a terabyte of data moving between servers on a local network costs literally nothing. It is a sunk capital cost in our switches and cabling.

In AWS, moving that same terabyte between availability zones (which the recommended "high availability" architecture requires) costs $0.01 to $0.02 per gigabyte, or roughly $10 to $20 per terabyte. When you are processing hundreds of terabytes per month, these data transfer fees accumulate into a staggering monthly tax.
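The per-gigabyte rates translate into a monthly figure with simple arithmetic. The helper below is a rough sketch: the rates are the ones quoted above, the function name is made up for illustration, and it assumes decimal terabytes (1 TB = 1,000 GB), which may not match how a provider meters traffic.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes; provider metering may differ


def inter_az_cost_range(terabytes: float,
                        low_per_gb: float = 0.01,
                        high_per_gb: float = 0.02) -> tuple[float, float]:
    """Return the (low, high) dollar cost of moving `terabytes` between zones."""
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    gigabytes = terabytes * GB_PER_TB
    return (gigabytes * low_per_gb, gigabytes * high_per_gb)


if __name__ == "__main__":
    low, high = inter_az_cost_range(1)
    print(f"1 TB between zones: ${low:,.2f} to ${high:,.2f}")
```

Passing your own monthly cross-zone volume instead of `1` gives the range to expect on the bill for that traffic alone.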

Our AWS bill was roughly 30% compute, 15% storage, and 55% network and bandwidth fees. We were bleeding cash simply because our microservices were talking to each other. The cloud providers have essentially monetized the speed of light.

The Illusion of Managed Services

Another major selling point of the cloud was reduced operational overhead. "Don't manage databases," they said. "Use managed RDS. Use DynamoDB. Focus on your business logic, not infrastructure."

This is a compelling argument until you realize that managed services do not eliminate operational overhead; they merely shift it. Instead of tuning PostgreSQL configurations in Linux, our engineers were tuning complex IAM policies, navigating impenetrable AWS networking rules, and writing custom Terraform modules to manage the "managed" services.

When an on-premise database fails, a talented database administrator can SSH into the box, read the kernel logs, look at the disk I/O, and fix it. When AWS Aurora experiences localized degradation in a specific availability zone, your dashboard turns orange, and your only recourse is to submit a support ticket, wait for an engineer halfway across the world to investigate, and pray.

We realized that we hadn't outsourced our operations. We had outsourced our visibility and our control. We traded the acute pain of maintaining hardware for the chronic pain of opaque, unresolvable cloud outages.

The Cost of Complexity

Cloud infrastructure is inherently complex because it is designed to be multi-tenant and universally applicable to millions of different businesses. To secure an on-premise network, you set up a perimeter firewall, configure a VPN, and manage internal subnets. It is straightforward network topology.

In the cloud, you are dealing with VPCs, Security Groups, Network ACLs, Route Tables, Transit Gateways, Internet Gateways, NAT Gateways, and a Byzantine IAM system where a single misconfigured JSON policy can expose your entire customer database to the public internet.
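To make the "single misconfigured JSON policy" concrete, here is a minimal sketch of the kind of check teams end up writing. It looks for Allow statements with a wildcard principal in a resource policy. The example policy and bucket name are hypothetical, the function name is illustrative, and the check is far from a complete audit: it ignores ACLs, account-level public access settings, and the details of any Condition block.

```python
import json


def public_allow_statements(policy: dict) -> list[dict]:
    """Return Allow statements whose Principal is the wildcard "*"."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]

    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if isinstance(principal, dict):
            values = principal.get("AWS", [])
            values = [values] if isinstance(values, str) else values
            is_public = "*" in values
        else:
            is_public = principal == "*"
        # A Condition block can narrow a wildcard principal, so this sketch
        # only flags statements without one. That is a simplification.
        if is_public and "Condition" not in stmt:
            flagged.append(stmt)
    return flagged


# Hypothetical bucket policy; the bucket name is made up for illustration.
EXAMPLE_POLICY = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-customer-exports/*"
    }
  ]
}
""")

if __name__ == "__main__":
    for stmt in public_allow_statements(EXAMPLE_POLICY):
        print("open to anyone:", stmt["Resource"])
```

The point is less the checker itself than the fact that one nine-line JSON document is all that separates a private bucket from a public one.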

This complexity requires dedicated, highly paid cloud architects. Every deployment becomes a massive undertaking of infrastructure-as-code planning. We spent more engineering hours arguing about VPC peering strategies and IAM roles than we previously spent racking servers and configuring switches.

The Math of Repatriation

When our AWS bill crossed $400,000 per month, the CFO demanded an audit. We modeled what it would cost to run the exact same workloads on our own hardware.

The Cloud Model:

Monthly Compute (Reserved Instances): $120,000
Monthly Storage (S3, EBS): $60,000
Monthly Bandwidth/Egress: $220,000
Total Monthly: $400,000
Annual Cost: $4,800,000

The On-Premise Model (Amortized over 3 years):

Hardware (Servers, Switches, SANs): $1,200,000 ($33,333/mo)
Colocation Space and Power (2 racks): $8,000/mo
Blended Internet Transit (10Gbps): $4,000/mo
Two Full-Time Infrastructure Engineers: $30,000/mo
Total Equivalent Monthly: $75,333
Annual Cost: $903,996

By this model, we were paying roughly $3.9 million extra every year for the privilege of not owning our hardware. You can hire a lot of incredibly talented systems administrators for that kind of money.
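The model is simple enough to check in a few lines. The sketch below only restates the figures listed above. The variable and function names are illustrative, and the hardware line is rounded down to whole dollars to match the $33,333 monthly figure.

```python
CLOUD_MONTHLY = {
    "compute_reserved_instances": 120_000,
    "storage_s3_ebs": 60_000,
    "bandwidth_egress": 220_000,
}

HARDWARE_CAPEX = 1_200_000
AMORTIZATION_MONTHS = 36  # three-year amortization

ON_PREM_MONTHLY = {
    # Integer division rounds down to whole dollars, as in the list above.
    "hardware_amortized": HARDWARE_CAPEX // AMORTIZATION_MONTHS,
    "colocation_space_and_power": 8_000,
    "internet_transit": 4_000,
    "two_infrastructure_engineers": 30_000,
}


def monthly_total(items: dict[str, int]) -> int:
    """Sum the monthly line items of a cost model."""
    return sum(items.values())


def annual_total(items: dict[str, int]) -> int:
    """Twelve times the monthly total."""
    return monthly_total(items) * 12


if __name__ == "__main__":
    cloud = annual_total(CLOUD_MONTHLY)
    on_prem = annual_total(ON_PREM_MONTHLY)
    print(f"cloud:      ${cloud:,}/yr")
    print(f"on-premise: ${on_prem:,}/yr")
    print(f"difference: ${cloud - on_prem:,}/yr")
```

Keeping the line items in a dictionary makes it easy to add the costs a model like this tends to leave out, such as spare parts or a hardware refresh reserve, and see how far the gap moves.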

The Execution

Repatriating from the cloud is harder than migrating to it. The cloud providers intentionally build "sticky" services—proprietary queues, proprietary serverless functions, proprietary databases—that firmly lock you into their ecosystem. We had to rewrite the components of our system that had adopted AWS Lambda and SQS, replacing them with standard Node.js applications and RabbitMQ.

We bought high-density Dell servers packed with AMD EPYC processors and massive amounts of NVMe storage. We installed Proxmox as our hypervisor, set up a robust Kubernetes cluster, and deployed our applications.

The performance difference was staggering. Because our applications were no longer traversing complex virtualized cloud networks and reading from throttled network-attached storage (EBS), our 99th percentile latency dropped by 40%. Our database queries executed twice as fast. For our workload, hardware we own, with local NVMe storage and a simple network, proved dramatically faster than the cloud instances we had been renting.

Conclusion

The cloud makes perfect sense for two types of companies: two-person startups that need to move fast without capital expenditure, and wildly unpredictable consumer applications (like a viral game) where traffic can spike 10,000% in an hour.

For everyone else—the profitable, predictable, mid-sized B2B companies—the cloud is a luxury tax. It is a wealth transfer from your balance sheet to Amazon, Microsoft, and Google, justified by marketing rhetoric about agility and scale.

We bought our own hardware. We control our own network. We know exactly what our infrastructure will cost next month. We killed our cloud migration, and our business has never been healthier.


Written by XQA Team
