
Continuous Integration and Continuous Deployment became religious doctrine. Every commit must trigger the full pipeline. Every change must be automatically tested, validated, analyzed, and deployed. If something isn't in the pipeline, it isn't real. This was our church.
Three years later, we'd created a Rube Goldberg machine that took 47 minutes to run, failed 30% of the time (mostly thanks to flaky tests and infrastructure hiccups), and required a dedicated "pipeline team" of 3 engineers to maintain. The thing meant to accelerate delivery had become our biggest bottleneck.
This is the story of CI/CD maximalism, why it backfired, and what we do instead. The goal is shipping software, not maintaining automation for its own sake.
How We Got Here
It started reasonably. We had manual deployments that were error-prone. Someone forgot a step and production went down. We automated the deployment. Good decision.
Then we added automated tests. Also good—catch bugs before they reach production. Then we added linting. Then security scanning. Then license compliance checking. Then dependency vulnerability scanning. Then code coverage thresholds. Then performance benchmarks. Then container image scanning. Then infrastructure validation. Then more tests. Then even more tests.
Each addition made sense in isolation. Each was solving a real problem. But the aggregate became monstrous.
Our final pipeline before the intervention:
- Code checkout and dependency installation: 3 minutes
- Linting (ESLint, Prettier, custom rules): 2 minutes
- Unit tests: 8 minutes
- Integration tests: 12 minutes
- End-to-end tests: 15 minutes
- Security scanning (SAST): 4 minutes
- Dependency vulnerability scan: 2 minutes
- License compliance check: 1 minute
- Code coverage analysis: 2 minutes
- Build Docker images: 5 minutes
- Container security scan: 3 minutes
- Infrastructure validation: 2 minutes
- Deploy to staging: 4 minutes
- Smoke tests on staging: 5 minutes
- Deploy to production: 4 minutes
- Production smoke tests: 3 minutes
Total: 75 minutes on a good day. But it was rarely a good day.
The Reality: 47 Minutes Average, When It Worked
We parallelized what we could. The 75-minute sequential pipeline became roughly 47 minutes with parallelization. Still brutally long, but better.
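Concretely, parallelizing meant fanning independent stages out as separate jobs and fanning them back in before deploy. Here's a rough sketch of that shape, assuming GitHub Actions-style syntax (the post's stack suggests npm, but the provider, job layout, and script names are illustrative, not our exact configuration):

```yaml
# Sketch: independent checks fan out as parallel jobs;
# deploy gates on all of them via `needs`.
name: pipeline
on: [push]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run lint        # hypothetical script names
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test
  e2e-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run test:e2e
  deploy-staging:
    needs: [lint, unit-tests, e2e-tests]   # waits for the slowest branch
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh staging   # hypothetical deploy script
```

Fan-out only helps up to the slowest branch, and the deploy, smoke-test, and promote steps still have to run in sequence after it, which is why 75 sequential minutes only shrank to about 47.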
Except it often didn't work. Our pipeline success rate was 70%, and most of those failures had nothing to do with the code under test:
- Flaky tests: E2E tests that failed randomly due to timing issues. "Retry and it'll probably pass."
- Network timeouts: Fetching dependencies from npm or container registries would occasionally time out.
- Resource contention: Parallel jobs competing for CI runner resources. Tests would fail with out-of-memory errors.
- Third-party service issues: Security scanners would be down. License databases would be slow. External dependencies failed.
- Configuration drift: Something in the pipeline configuration was subtly wrong, and it took hours to debug.
When a pipeline failed, developers had two choices: spend 15-30 minutes debugging the failure, or click "retry" and hope it was flaky. Most chose retry. Multiple times. The average time from commit to production deploy was 3-4 hours when you included retries and queue times.
The Feedback Loop Was Destroyed
Fast feedback is the point of CI/CD. The dream: commit code, get feedback in minutes, iterate quickly. Our reality: commit code, context switch to something else, forget what you were doing, get interrupted by a failure notification, try to remember the context, fix the issue, commit again, repeat.
The cognitive overhead was enormous. Developers stopped making small, incremental commits because each commit was a 47-minute tax. Instead, they'd batch changes into large commits, which were harder to review, more likely to contain bugs, and more painful to revert when things went wrong.
We had continuous integration that nobody integrated continuously because the feedback loop was too slow.
The Hidden Cost: The Pipeline Team
Maintaining our CI/CD infrastructure required dedicated engineers. Not because we wanted a dedicated team, but because without one, the rest of engineering would have spent its time firefighting the pipeline instead of shipping.
The "DevOps" team (really the Pipeline Team) consisted of 3 engineers whose primary job was:
- Debugging flaky tests and pipeline failures
- Upgrading CI/CD tools and runners
- Optimizing pipeline performance
- Managing secrets, credentials, and access
- Responding to "the pipeline is broken" emergencies
- Writing and maintaining pipeline configuration (YAML files totaling 3,000+ lines)
3 engineers out of 60. 5% of engineering capacity dedicated to maintaining the machine that was supposed to accelerate the other 95%. When we did the math, the pipeline team cost significantly more than the bugs the pipeline caught would have cost us in production.
And the pipeline team was increasingly burned out. Their on-call rotation was brutal. Nobody wanted to be the person investigating why the E2E tests were failing at 2am.
The Cost-Benefit Analysis That Changed Everything
New engineering leadership asked a simple question: What is this pipeline actually preventing?
We analyzed six months of data:
| Pipeline Stage | Failures/Month | Real Bugs Caught/Month | False Positives/Month |
|---|---|---|---|
| Linting | 45 | 0 | 45 |
| Unit tests | 120 | 85 | 35 |
| Integration tests | 80 | 40 | 40 |
| E2E tests | 200 | 20 | 180 |
| Security scan | 15 | 2 | 13 |
| Dependency scan | 30 | 5 | 25 |
| Other stages | 50 | 3 | 47 |
E2E tests failed 200 times per month. Only 20 were real bugs. 90% were flaky false positives that just needed a retry. Meanwhile, they were adding 15 minutes to every pipeline run.
The security scanner caught two real issues a month. Against that, it produced 13 false positives a month, 78 over the six-month window, each one requiring investigation. Engineers learned to ignore it. There's a term for this: "alert fatigue."
We were maintaining expensive infrastructure whose findings fell into three buckets:
- Real bugs already caught by simpler stages (unit tests caught most of them)
- Real bugs so rare they didn't justify the overhead
- False positives that wasted developer time
The Simplification
We didn't eliminate CI/CD. We made it proportionate. The principles:
Principle 1: Fast Feedback First
The primary goal became fast feedback, not comprehensive validation. If a developer can't get feedback in under 10 minutes, the pipeline is too slow.
We created a two-track system:
Fast track (under 5 minutes): Linting, unit tests, build verification. Runs on every commit. Fast enough that developers can wait for it.
Full track (under 20 minutes): Integration tests, security scans, full validation. Runs on PRs before merge and on main branch after merge. Developers don't wait for it—they move on.
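Here's a minimal sketch of how such a split can be wired up, assuming a GitHub Actions-style setup; the job layout and script names are placeholders rather than our exact configuration:

```yaml
# Sketch: fast checks on every push; the expensive track only on PRs and main.
name: ci
on: [push, pull_request]
jobs:
  fast-track:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run lint      # hypothetical scripts for lint, unit tests, build
      - run: npm test
      - run: npm run build
  full-track:
    # Expensive validation only where it matters: PRs before merge, main after merge.
    if: github.event_name == 'pull_request' || github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run test:integration   # hypothetical integration tests
      - run: npm run scan:security      # hypothetical basic security scan
```

Developers wait for the fast track; the full track reports back asynchronously and gates the merge, not the developer's attention.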
Principle 2: Fix Flaky Tests or Delete Them
We gave teams a mandate: every flaky test must be fixed within one week, or it gets deleted. If a test fails randomly, it provides negative value—it wastes time and erodes trust.
The result: Our E2E test suite shrank from 400 tests to 80 tests. The remaining 80 were reliable and actually caught bugs. Coverage went down on paper; actual bug detection went up in practice.
Some teams replaced E2E tests with contract tests—faster, more reliable, and they caught the same integration bugs. Not every change needs a browser automation test.
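One cheap way to surface candidates for that fix-or-delete call is to rerun the suite on unchanged code and look for inconsistent outcomes. A sketch, again assuming GitHub Actions and a hypothetical test:e2e script:

```yaml
# Sketch: a scheduled sweep that reruns the E2E suite on the same commit
# several times and fails if the results disagree with each other.
name: flaky-test-sweep
on:
  schedule:
    - cron: "0 3 * * 1"   # Mondays, 03:00 UTC
jobs:
  detect-flakes:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - name: Run the suite five times and compare outcomes
        run: |
          results=""
          for i in 1 2 3 4 5; do
            if npm run test:e2e; then results="${results}P"; else results="${results}F"; fi
          done
          echo "Outcomes: $results"
          # Mixed passes and failures on identical code means something is flaky;
          # diffing per-test reports narrows it down to individual tests.
          case "$results" in
            *P*F*|*F*P*) exit 1 ;;
          esac
```

Anything a job like this flags becomes a candidate for the one-week fix-or-delete clock.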
Principle 3: Eliminate Low-Value Stages
We asked of every pipeline stage: Is it catching bugs that matter? And is what it catches worth what it costs?
Stages we kept:
- Unit tests: Fast, reliable, high bug detection
- Integration tests: Moderate speed, catches real issues
- Basic security scanning: Quick, catches obvious problems
- Build and deploy: Obviously necessary
Stages we moved to weekly/manual (see the scheduled-job sketch after these lists):
- Comprehensive security scan: Run weekly, not on every commit
- License compliance: Run weekly
- Dependency vulnerability scan: Run daily, separately from deploy pipeline
Stages we eliminated:
- Code coverage thresholds: Gaming them cost more than they helped
- Multiple redundant container scans: One was enough
- Complex infrastructure validation: Replaced with simpler checks
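Moving a stage to a weekly or daily cadence is mostly a trigger change. A sketch of the weekly job, assuming GitHub Actions cron triggers; the scan commands are placeholders for whatever tools you already run, and a near-identical workflow with a daily cron covers the dependency scan:

```yaml
# Sketch: heavyweight scans run on a schedule instead of blocking every deploy.
name: weekly-scans
on:
  schedule:
    - cron: "0 6 * * 1"   # Mondays, 06:00 UTC
  workflow_dispatch:       # still runnable on demand, e.g. before a big release
jobs:
  security-and-license:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm audit --audit-level=moderate   # stand-in for dependency/security scanning
      - run: npx license-checker --summary      # stand-in for license compliance tooling
```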
Principle 4: Pipeline as Product
The pipeline is a product used by developers. Like any product, it needs user research, performance metrics, and continuous improvement focused on user experience.
We instrumented the pipeline to track:
- Time from commit to production
- Pipeline success rate
- Developer wait time attributable to pipeline
- True positive rate for each stage
These became OKRs. The pipeline team's success was measured by developer productivity, not by how comprehensive the pipeline was.
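Most of those numbers can be pulled from the CI provider's API after the fact, but the simplest version is emitted by the pipeline itself. A sketch of a reporting job appended to the ci workflow sketched earlier; the metrics endpoint is made up:

```yaml
  # Sketch: appended to the jobs: section of the ci workflow above.
  report-metrics:
    if: always()                 # report failed runs too, or the success rate lies
    needs: [fast-track]
    runs-on: ubuntu-latest
    steps:
      - name: Post pipeline duration and outcome
        run: |
          # head_commit.timestamp is present on push events; PR runs would
          # need the PR head commit's timestamp instead.
          started="${{ github.event.head_commit.timestamp }}"
          duration=$(( $(date +%s) - $(date -d "$started" +%s) ))
          curl -sf -X POST "https://metrics.internal.example/pipeline" \
            -H "Content-Type: application/json" \
            -d "{\"commit\": \"${{ github.sha }}\", \"seconds\": ${duration}, \"result\": \"${{ needs.fast-track.result }}\"}"
```

True-positive rate per stage still needs a human in the loop: someone has to label each failure as a real bug or noise, which is exactly the six-month exercise in the table above.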
The Results
| Metric | Before | After |
|---|---|---|
| Pipeline duration (fast track) | 47 min | 4 min |
| Pipeline duration (full track) | 47 min | 18 min |
| Pipeline success rate | 70% | 95% |
| Time from commit to production | 3-4 hours | 25 minutes |
| Deployments per day | 4 | 20+ |
| Pipeline team size | 3 engineers | 1 engineer (part-time) |
| Production bugs (pipeline-escapees) | 5/month | 6/month |
Production bugs increased by one per month. That's the tradeoff. But we now ship 5x faster, with 2.5 fewer engineers dedicated to pipeline maintenance. The marginal bug is worth the velocity.
When Full Pipelines Make Sense
Our approach works for B2B SaaS with moderate blast radius. Other contexts may need more:
- Safety-critical systems: Medical devices, aircraft, nuclear plants. Comprehensive testing is life-or-death. Accept the slowness.
- Financial systems with regulatory requirements: Some stages are mandated by law. Optimize within constraints.
- Open source libraries with many consumers: Thorough testing protects downstream users. Take the time.
But for most web applications? Feature velocity matters more than catching the marginal bug. Users would rather have new capabilities with occasional bugs than a slower-moving product that's incrementally more polished.
The Lesson
CI/CD is a tool for shipping faster, not an end in itself. When the tool takes longer than the work, the tool is broken.
Ask regularly: Is this pipeline helping us ship faster, or is it deployment theater that makes us feel rigorous while actually slowing us down?
The best pipeline is the simplest one that catches the bugs that matter. Everything else is overhead.
Comprehensive pipelines feel responsible. Fast pipelines deliver results. When your 47-minute pipeline becomes a 4-minute pipeline, you'll wonder why you waited so long to simplify.
Written by XQA Team