If your Kubernetes bill keeps growing faster than revenue, it's not "cloud inflation."
It's you running a mansion-sized cluster and only using two rooms.
The wake-up call nobody scheduled
We walked into a client spending $126,400 a month on Kubernetes. After 6 weeks of work, the same workloads ran at $88,300 a month. That's a 30.1% reduction—$38,100 every month—without touching SLAs or rewriting a single app.
$457,200/year. Gone. Because nobody looked at actual utilization metrics.
Here is exactly how that money was being burned, and what we changed.
The Ugly Baseline: 61% Of Their Cluster Was Doing Nothing
Their setup looked like every "growing" SaaS or healthcare platform:
The "Cloud Native" Setup (On Paper)
Infrastructure
→ AWS EKS with 3 node groups: "web", "batch", "misc"
→ Horizontal Pod Autoscaler (HPA) technically "enabled"
→ Cluster Autoscaler installed once and forgotten
Reality Check
→ Average node CPU utilization: 23%
→ Average node memory utilization: 31%
→ 68% of pods requested 3–8× the memory they actually used
→ 40–60% of allocated resources never touched
Monthly K8s-Related Spend
→ Worker Nodes (EC2): $101,000
→ EBS + PVs: $14,600
→ LB + Networking: $7,800
→ Add-ons: $3,000
Total: $126,400/month
Paying for a 300-seat plane and flying with 120 passengers. Every day.
Mistake #1: "Provision for Peak" Instead of "Scale for Reality"
Every deployment used static, fat resource requests:
```yaml
# deployment.yaml
resources:
  requests:
    cpu: "500m"
    memory: "2Gi"
  limits:
    cpu: "1"
    memory: "4Gi"
```
Actual P95 usage for most web workloads? 180m CPU and 650Mi memory.
In other words: 2–3× over-provisioned on CPU and 3–6× on memory, almost exactly the gap large multi-cluster studies keep reporting.
Why This Happens Every Single Time
→ One OOMKilled incident 18 months ago → everyone doubled memory "just to be safe"
→ Old Helm charts copied from tutorials with memory: 2Gi baked in
→ No feedback loop from real usage back into manifests
The result: the autoscalers sized the cluster around inflated requests, keeping "just in case" nodes alive that never actually got used.
Mistake #2: Autoscaling Was Technically On, Functionally Useless
They "had" autoscaling:
The "Autoscaling" That Wasn't
→ HPA on a few services, targeting CPU only
→ Cluster Autoscaler with a minimum of 10 nodes per group
→ Cooldown windows so long that weekend traffic drops changed nothing
The Reality:
→ Nightly traffic dropped 60–70%, yet node count dropped by at most one node because minimums were pinned too high.
→ Batch jobs ran as CronJobs with no scale-down logic, so nodes sat idle for hours afterwards.
This is how you end up in that 30–65% idle resource zone—K8s is allowed to scale up aggressively but almost never allowed to scale back down.
Mistake #3: Zero Use of Cheaper Capacity
Every node group ran 100% on on-demand instances.
No spot, no preemptible, no mixed instance policies. Platform engineering's excuse: "Spot is risky."
The workloads in those groups:
→ Stateless web pods (70% of the fleet): multi-AZ, behind a load balancer. Perfect for spot.
→ Async workers: pulling from a queue, idempotent, retryable. Perfect for spot.
→ Short-lived batch jobs: run and done. Perfect for spot.
Exactly the kind of workloads spot and preemptible instances were designed for. Industry-wide, moving them to cheaper capacity cuts that portion of the bill by 60–90%.
They just never made the time to do it. *(Sound familiar?)*
The 3-Step Fix That Cut 30% Of Costs
We didn't introduce any magic. We fixed three boring things no one wanted to touch. With the right cloud & DevOps engineering, these are table stakes.
Step 1: Aggressive, Data-Driven Right-Sizing (Saved ~14%)
We pulled 30 days of metrics per workload: P50/P95 CPU and memory, request vs actual utilization, spike patterns (boot-time vs sustained load).
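If you don't have a full metrics pipeline handy, one low-effort way to collect exactly those recommendations is a VerticalPodAutoscaler pinned to recommendation-only mode. A minimal sketch, assuming the VPA CRDs are installed; the workload name is hypothetical:

```yaml
# vpa-recommend-only.yaml
# VPA in "Off" mode: it computes request recommendations from observed
# usage but never evicts or mutates pods. Read the numbers with
# `kubectl describe vpa web-service-recommender`.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-service-recommender
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-service        # hypothetical workload name
  updatePolicy:
    updateMode: "Off"        # observe and recommend only; no evictions
```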
Then we rewrote requests/limits based on P95 + 15% buffer, not "what feels safe":
| Workload | Before (Requests) | Actual P95 Usage | After (Right-Sized Requests) |
|---|---|---|---|
| Web Service | 500m / 2Gi | 180m / 650Mi | 250m / 1Gi |
| Heavy Batch Job | 2000m / 8Gi | 1300m / 4.7Gi | 1500m / 6Gi |
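Translated back into the manifest from Mistake #1, the web service row becomes something like this sketch. The new limits are illustrative, since only the requests are fixed by the table above:

```yaml
# deployment.yaml (after right-sizing)
resources:
  requests:
    cpu: "250m"      # P95 of 180m + ~15% buffer, rounded up
    memory: "1Gi"    # P95 of 650Mi + ~15% buffer, rounded up
  limits:
    cpu: "500m"      # illustrative: ~2x the request for burst headroom
    memory: "2Gi"    # illustrative: memory limit kept well above P95
```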
Right-Sizing Impact
→ CPU requests: ↓ 32%
→ Memory requests: ↓ 27%
→ CPU utilization: 23% → 52%
→ Memory utilization: 31% → 58%
Right-sizing alone removed ~$17,600/month from the EC2 line item. No pods started erroring. No SRE pagers started screaming.
Step 2: Intelligent Autoscaling Instead of "Set It and Forget It" (Saved ~9%)
Next, we fixed scaling.
Horizontal Pod Autoscaler (HPA) Fixes
→ Switched several workloads to scale on both CPU and custom metrics (queue depth, RPS)
→ Tuned target utilization so we didn't overreact to micro-spikes
→ Set sane min/max pod counts so weekend traffic could actually drop capacity
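In practice, those fixes compress into an autoscaling/v2 manifest like this hedged sketch. The workload name and the queue_depth metric are assumptions, and an external metric only works if an adapter (prometheus-adapter, KEDA, or similar) is serving it:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: async-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: async-worker            # hypothetical workload
  minReplicas: 2                  # low enough that weekends actually shrink
  maxReplicas: 40
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # tuned so micro-spikes don't trigger scaling
    - type: External
      external:
        metric:
          name: queue_depth       # assumes a metrics adapter exposes this
        target:
          type: AverageValue
          averageValue: "100"     # ~100 queued messages per pod
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5 minutes, not hours-long cooldowns
      policies:
        - type: Percent
          value: 50               # shed at most half the pods per minute
          periodSeconds: 60
```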
Cluster Autoscaler / Karpenter Fixes
→ Lowered node group minimums (from 10 to 3 on non-critical groups)
→ Shortened scale-down timeouts
→ Introduced a separate, cheap node group for spiky workloads
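On the node side, the same intent expressed as Cluster Autoscaler flags looks roughly like this. The values are illustrative, not the client's exact settings, and the node group name is hypothetical; on Karpenter you'd reach for NodePool consolidation settings instead:

```yaml
# cluster-autoscaler Deployment (excerpt of container args)
containers:
  - name: cluster-autoscaler
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --nodes=3:30:web-nodegroup             # min lowered from 10 to 3
      - --scale-down-unneeded-time=5m          # reclaim idle nodes faster (default 10m)
      - --scale-down-delay-after-add=5m        # shorter post-scale-up cooldown
      - --scale-down-utilization-threshold=0.6 # nodes under 60% become candidates
```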
Autoscaling Results
→ Low-traffic nights/weekends: 60% fewer nodes
→ Normal off-peak hours: 40% fewer nodes
→ Net EC2 savings: ~$11,300/mo
P95 latency actually improved, because bursts now triggered real scale-up instead of piling onto CPU-throttled nodes.
Step 3: Moving Stateless Workloads to Spot (Saved ~7%)
Finally, we attacked the big sacred cow: 100% on-demand capacity.
We carved out a new node group:
The Spot Instance Strategy
→ 70% spot, 30% on-demand mix
→ Only stateless web + worker pods scheduled here
→ PodDisruptionBudgets set to tolerate rolling terminations
→ Health checks and retries wired correctly
We backed it with Karpenter-style autoscaling that always picks the cheapest eligible instance types in the right AZs.
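A hedged sketch of that node group using Karpenter's v1 NodePool API, plus the PodDisruptionBudget guarding the web pods. The nodeClassRef, labels, and the 70% threshold are assumptions, and the 70/30 capacity split itself came from keeping a separate on-demand group rather than anything shown here:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: stateless-spot
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                    # assumes an existing EC2NodeClass
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # spot preferred, on-demand as fallback
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m                 # tear down idle capacity quickly
---
# PDB so rolling spot terminations never drain too many web pods at once
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: "70%"                    # hypothetical threshold
  selector:
    matchLabels:
      app: web                           # hypothetical label
```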
Spot Instance Results (Conservative Rollout)
→ Node hours on spot: 42% within 3 weeks
→ Customer downtime: zero
→ Blended rate drop: 55–65% vs on-demand
~$9,200/month shaved off, even after we left certain latency-sensitive services on pure on-demand.
The Before vs After (Numbers, Not Vibes)
Before (Monthly)
→ K8s worker nodes: $101,000
→ Storage + EBS: $14,600
→ LB + network overhead: $7,800
→ Add-ons: $3,000
Total: $126,400
After 6 Weeks of Optimization
→ K8s worker nodes: $69,800
→ Storage + EBS (unchanged, this phase): $14,600
→ LB + network: $7,800
→ Add-ons: $3,000
Total: $95,200 → Reduction: $31,200/month (24.7%)
Three more tweaks over the next month—aggressive cleanup of zombie namespaces, scaling down a legacy "misc" cluster, and moving a few cron workloads to cheaper instance types—brought it to $88,300/month.
Final Reduction vs Baseline
→ 30.1% ($38,100/month, $457,200/year)
Exactly in line with the 30–50% K8s savings FinOps shops deliver when you stop overprovisioning and let autoscaling actually do its job.
The Part No One Likes To Admit
None of this required new programming languages, rewriting microservices, or fancy service mesh rollouts.
It required three things your team has been avoiding:
1. Owning the cluster bill as a first-class SLO, not an afterthought.
2. Letting data—not PTSD from one OOM event—drive requests and limits.
3. Trusting automation to scale down as aggressively as you let it scale up.
Look, K8s didn't make your cloud bill explode. Your "better safe than sorry" defaults did. Studies show overprovisioning alone is responsible for 70% of unnecessary cloud spend in many environments, and in K8s clusters 30–65% waste is now considered "normal."
Normal doesn't mean acceptable.
You can keep paying $126,400 every month for a cluster that idles at 25% utilization. Or you can integrate proper operations governance and fix it.
The Insider Take: Your K8s Bill Is a Configuration Problem, Not a Platform Problem
If you've never optimized, 25–40% cluster-level savings is normal. Severely overprovisioned environments can exceed 50%. The fix isn't more tools—it's letting real utilization data drive your manifests instead of fear.
Stop treating resource requests like insurance premiums. Start treating them like profit margins.
Frequently Asked Questions
How long did the 30% K8s cost reduction actually take?
Six weeks for the initial ~25% drop, then another four weeks of incremental tuning to reach ~30%. The first meaningful savings showed up within 10 days of right-sizing resource requests.
Did right-sizing and autoscaling hurt performance or SLAs?
No—P95 latency improved because nodes stopped thrashing under inflated requests and autoscaling responded to real, not imaginary, load. Right-sizing based on P95 + 15% buffer means you still have headroom for spikes.
How much of the savings came from spot instances vs right-sizing?
Roughly 60% from right-sizing + smarter autoscaling, 40% from moving safe workloads to spot. Right-sizing is always the biggest lever because it unlocks better bin-packing, which directly reduces node count.
Do we need new tools to get similar results?
You can start with built-in metrics (kubectl top, Metrics Server) and native autoscalers (HPA, Cluster Autoscaler). Specialized K8s FinOps tools like Kubecost or PerfectScale make it easier to sustain and deepen the savings over time, but they're not required to get the first 20–30%.
What's the realistic savings range most K8s shops should expect?
If you've never optimized, 25–40% cluster-level savings is normal; severely overprovisioned environments can exceed 50%. The key is measuring actual usage against requested resources—most teams are shocked by the gap. Book a free K8s cost audit and we'll show you yours.

