
"Pay only for what you use. Infinite scale. No servers to manage." The serverless pitch was irresistible. We rebuilt everything on AWS Lambda.
For async workloads, it was perfect. For user-facing APIs, it was a disaster.
The Numbers Nobody Warned Us About
| Metric | Value |
|---|---|
| P50 latency | 80ms |
| P95 latency | 450ms |
| P99 latency | 4,200ms |
| Cold start frequency | ~2% of requests |
| Cold start penalty | 3-5 seconds |
Roughly 2% of requests stalled for 4+ seconds. Spread randomly and unpredictably across traffic, that meant every active user eventually hit one. To them, our product looked broken.
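The ugly P99 in the table follows directly from the cold-start math: once cold starts exceed 1% of requests, the cold-start latency *is* your P99, no matter how fast warm invocations are. A minimal sketch of that reasoning (a simplified two-mode model, not a real latency distribution):

```python
def tail_percentile(warm_ms, cold_ms, cold_rate, pct):
    """Latency at percentile `pct` for a bimodal warm/cold distribution.

    The slowest `cold_rate` fraction of requests are cold starts, so any
    percentile that falls inside that top slice sees the cold latency.
    """
    return cold_ms if (1.0 - pct) <= cold_rate else warm_ms

# With 2% cold starts at ~4s, P99 lands in the cold band but P95 does not:
p99 = tail_percentile(warm_ms=80, cold_ms=4000, cold_rate=0.02, pct=0.99)
p95 = tail_percentile(warm_ms=80, cold_ms=4000, cold_rate=0.02, pct=0.95)
```

This is why shaving warm latency never helped our P99: the tail was owned entirely by the cold-start rate.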
The Provisioned Concurrency Trap
"Just use provisioned concurrency!" That's AWS's standard answer to cold starts.
We tried it. Problems:
- Cost exploded: Provisioned concurrency charges whether used or not. Our bill tripled.
- Capacity planning returned: How many to provision? Too few = cold starts return. Too many = paying for idle.
- Traffic spikes: Provisioned concurrency doesn't auto-scale fast enough for spikes.
We'd traded "pay for what you use" for "pay for what you might use plus cold start lottery."
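The cost inversion is easy to see on paper. A rough sketch of the two billing models, using illustrative per-GB-second rates in the ballpark of published us-east-1 Lambda pricing (check current AWS pricing before relying on these numbers):

```python
# Illustrative rates, USD per GB-second (rounded; NOT authoritative pricing)
ON_DEMAND_GB_S = 0.0000166667        # billed only while code runs
PROVISIONED_GB_S = 0.0000041667      # billed for every provisioned GB-second, idle or not
PROVISIONED_EXEC_GB_S = 0.0000097222 # execution rate on provisioned capacity

HOURS_PER_MONTH = 730

def monthly_cost_on_demand(invocations, avg_ms, mem_gb):
    """Pure pay-per-use: cost scales with actual execution time."""
    return invocations * (avg_ms / 1000) * mem_gb * ON_DEMAND_GB_S

def monthly_cost_provisioned(instances, mem_gb, invocations, avg_ms):
    """Provisioned concurrency: a fixed 'always-on' charge plus execution."""
    idle = instances * mem_gb * HOURS_PER_MONTH * 3600 * PROVISIONED_GB_S
    execution = invocations * (avg_ms / 1000) * mem_gb * PROVISIONED_EXEC_GB_S
    return idle + execution
```

For a modest API (1M invocations/month, 100ms, 1 GB), on-demand costs a couple of dollars, while 50 provisioned instances cost hundreds per month before serving a single request. The fixed term dominates, which is exactly the capacity-planning problem serverless was supposed to eliminate.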
The Hidden Complexity
Connection management: Every new Lambda execution environment opened its own fresh database connections, and under load the connection count exploded past what the database could pool. We added RDS Proxy to absorb it. More cost, more latency.
Local development: Lambda locally is different from Lambda in AWS. SAM and LocalStack helped but never matched production behavior exactly.
Debugging: Distributed traces across dozens of Lambdas, with logs scattered per function and no long-lived process to attach a debugger to. X-Ray helped but added latency and cost.
Deployment complexity: Each function deployed separately. Coordinating deployments across 40 functions was its own project.
What We Do Now: Hybrid Architecture
User-facing APIs: ECS Fargate containers. Always warm. Predictable latency. Connection pooling that works.
Async processing: Lambda. Perfect fit. Event-driven, sporadic traffic, cold starts don't matter.
Scheduled jobs: Lambda. Run once, done. Cold start is irrelevant.
Webhooks: Lambda behind API Gateway. Occasional traffic, latency tolerance acceptable.
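The split above boils down to one decision rule: user-facing, latency-sensitive traffic goes to always-warm containers; everything that can tolerate a cold start stays on Lambda. As a (deliberately simplified) sketch:

```python
def choose_platform(user_facing: bool, latency_sensitive: bool) -> str:
    """Simplified placement rule behind our hybrid split.

    Always-warm containers for traffic where tail latency is user-visible;
    Lambda for async, scheduled, and webhook work where cold starts are fine.
    """
    if user_facing and latency_sensitive:
        return "fargate"  # no cold-start lottery, real connection pooling
    return "lambda"       # event-driven, sporadic, scale-to-zero

# Examples from the workloads above:
api = choose_platform(user_facing=True, latency_sensitive=True)    # "fargate"
queue_worker = choose_platform(user_facing=False, latency_sensitive=False)  # "lambda"
```

Real placement decisions weigh more than two booleans (traffic shape, burstiness, cost), but this captures the line we actually drew.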
The New Metrics
| Metric | Before (Lambda) | After (Hybrid) |
|---|---|---|
| P50 latency | 80ms | 45ms |
| P99 latency | 4,200ms | 180ms |
| Monthly cost | $8,000 | $5,500 |
| User complaints about slowness | Weekly | None |
Lower latency AND lower cost. The "serverless premium" was real.
When Serverless Wins
- Truly sporadic workloads (occasional scripts, scheduled reports)
- Event-driven backends (S3 triggers, queue processors)
- Prototype/early stage (before latency matters)
- Massive scale-to-zero requirements (multi-tenant with idle tenants)
Serverless isn't wrong. "Serverless-first" is. Let the workload dictate the architecture, not the hype.
Written by XQA Team