Cost Optimization for AI Workloads on AWS
Quick answer
Across 18 AWS AI cost audits for US clients, 9 levers consistently cut bills by 40-65%. The biggest wins come from prompt caching on Bedrock (saves 30-50% on inference), eliminating NAT Gateway data fees via VPC endpoints (saves $600-$2,400/month), and model right-sizing: using Claude Haiku for routing decisions and reserving Sonnet/Opus for complex reasoning (saves 40-70% on inference). One enterprise client went from $42,700/month to $14,900/month in 6 weeks with no change to user-facing performance.
The 9 levers, ranked by typical impact
1. Bedrock prompt caching — 30-50% inference savings
Anthropic's prompt caching on Bedrock bills tokens read from the cache at 10% of the normal input rate. For a typical AI agent with a 6,000-token system prompt plus tool definitions and product knowledge, the prefix is identical on every request. Caching the prefix drops per-request cost from $0.11 to $0.02. On a 15K-request/day workload that is $1,650/day → $300/day, a saving of $1,350/day, or about $40,500 over a 30-day month.
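The caching arithmetic can be sketched as a small cost model. This is an illustration under stated assumptions, not a billing calculator: the token counts and per-million-token prices are hypothetical placeholders, the 10% cache-read multiplier is the figure cited above, and the one-time premium for writing the cache is ignored.

```python
def request_cost(prefix_tokens, dynamic_tokens, output_tokens,
                 input_price_per_mtok, output_price_per_mtok,
                 cache_hit=False, cache_read_multiplier=0.10):
    """Return the dollar cost of one request.

    On a cache hit the shared prefix is billed at cache_read_multiplier
    times the normal input rate; all other tokens are billed normally.
    """
    prefix_rate = input_price_per_mtok * (cache_read_multiplier if cache_hit else 1.0)
    cost = (prefix_tokens * prefix_rate
            + dynamic_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok)
    return cost / 1_000_000


# Hypothetical workload: 6,000-token shared prefix, 500 dynamic input tokens,
# 300 output tokens, at ASSUMED prices of $15 / $75 per million tokens.
uncached = request_cost(6000, 500, 300, 15.0, 75.0)
cached = request_cost(6000, 500, 300, 15.0, 75.0, cache_hit=True)
daily_saving = (uncached - cached) * 15_000  # 15K requests per day
```

Swapping in your own token counts and the current list prices for your model shows quickly whether the prefix is large enough, relative to the dynamic part of the request, for caching to matter.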
2. Model right-sizing (Haiku for routing, Sonnet/Opus for reasoning) — 40-70% inference savings
Most AI agent requests do not need Claude Opus. About 70% of requests in a typical support agent are routing decisions ("which tool to call") that Haiku handles fine at 1/15th the cost. Use Haiku as the first model and escalate to Sonnet/Opus only when the request requires multi-step reasoning. A metric worth tracking: requests with an input/output token ratio above 2:1 are Haiku-eligible.
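A rough sketch of the routing idea and its effect on blended cost. The function names, the "small"/"large" tier labels, and the `needs_multi_step_reasoning` flag (assumed to come from an upstream check) are hypothetical. The blended figure also ignores the extra small-model call paid on requests that end up escalated.

```python
def blended_cost_ratio(cheap_share, cheap_cost_ratio):
    """Cost of a two-tier setup relative to sending everything to the large model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_cost_ratio + (1.0 - cheap_share)


def pick_tier(input_tokens, output_tokens, needs_multi_step_reasoning):
    """Hypothetical router: small model first, large model for multi-step reasoning."""
    if needs_multi_step_reasoning:
        return "large"
    # Heuristic from the text: input/output ratio above 2:1 is small-model eligible.
    if output_tokens == 0 or input_tokens / output_tokens > 2.0:
        return "small"
    return "large"


# 70% of traffic on a model that costs 1/15th as much (figures from the text).
ratio = blended_cost_ratio(cheap_share=0.70, cheap_cost_ratio=1 / 15)
savings_pct = round((1 - ratio) * 100, 1)
```

With those two inputs the blended bill is about 35% of the all-large baseline, which lands inside the 40-70% savings range quoted for this lever.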
3. NAT Gateway → VPC endpoints — $600-$2,400/month saved
NAT Gateway charges $0.045 per GB processed. For AWS-internal traffic (Bedrock, S3, DynamoDB, RDS) you can use VPC endpoints instead: gateway endpoints for S3 and DynamoDB carry no data-processing fee, while interface endpoints (the type Bedrock uses) bill an hourly charge plus a per-GB rate well below NAT's. Most legacy architectures route everything through NAT; the migration to VPC endpoints takes 2-4 hours of CloudFormation work and the savings are immediate.
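The text describes doing this in CloudFormation; as an alternative illustration, here is a minimal Python sketch that builds the parameters for the EC2 `create_vpc_endpoint` call in boto3. S3 and DynamoDB use gateway endpoints attached to route tables, while other services such as Bedrock use interface endpoints placed in subnets. The helper names and all resource IDs are placeholders, and boto3 is imported lazily so the planning step runs without AWS credentials.

```python
GATEWAY_SERVICES = {"s3", "dynamodb"}  # gateway endpoints attach to route tables


def endpoint_request(region, vpc_id, service, route_table_ids=(),
                     subnet_ids=(), security_group_ids=()):
    """Build keyword arguments for the EC2 CreateVpcEndpoint call."""
    params = {
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.{service}",
    }
    if service in GATEWAY_SERVICES:
        params["VpcEndpointType"] = "Gateway"
        params["RouteTableIds"] = list(route_table_ids)
    else:
        params["VpcEndpointType"] = "Interface"
        params["SubnetIds"] = list(subnet_ids)
        params["SecurityGroupIds"] = list(security_group_ids)
        params["PrivateDnsEnabled"] = True
    return params


def create_endpoints(requests, region):
    """Apply the requests with boto3 (requires boto3 and AWS credentials)."""
    import boto3
    ec2 = boto3.client("ec2", region_name=region)
    return [ec2.create_vpc_endpoint(**req) for req in requests]


# Placeholder IDs; substitute real ones before calling create_endpoints().
plan = [
    endpoint_request("us-east-1", "vpc-EXAMPLE", "s3",
                     route_table_ids=["rtb-EXAMPLE"]),
    endpoint_request("us-east-1", "vpc-EXAMPLE", "bedrock-runtime",
                     subnet_ids=["subnet-EXAMPLE"],
                     security_group_ids=["sg-EXAMPLE"]),
]
```

Keeping the plan as plain data makes it easy to review in a pull request before anything is created.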
4. Spot fleet for non-production workloads — 70-90% on EC2/SageMaker training
Eval pipelines, model training, batch inference all run fine on Spot. Production user-facing endpoints stay On-Demand. Most clients have 20-40% of their AI compute in eval/training/batch — that whole layer can drop to Spot rates with proper checkpointing.
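"Proper checkpointing" can be as simple as recording progress often enough that an interrupted job resumes instead of restarting. A minimal sketch, assuming the work is an ordered list of items and that `process` is safe to repeat for the few items handled after the last checkpoint. On real Spot capacity the checkpoint file would need to live on durable storage (S3, EFS, or an EBS volume that outlives the instance) rather than local disk.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next unprocessed item (0 if no checkpoint exists)."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write atomically so an interruption cannot leave a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def run_batch(items, process, checkpoint_path, checkpoint_every=10):
    """Process items in order, resuming from the last checkpoint after a restart.

    Returns the number of items processed in this run.
    """
    start = load_checkpoint(checkpoint_path)
    for i in range(start, len(items)):
        process(items[i])
        if (i + 1) % checkpoint_every == 0 or i + 1 == len(items):
            save_checkpoint(checkpoint_path, i + 1)
    return len(items) - start
```

The `checkpoint_every` interval trades checkpoint overhead against the amount of work repeated after an interruption.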
5. CloudWatch Logs retention windowing — $200-$1,800/month
Default retention is "never expire", so every GB you ingest (at $0.50 per GB) keeps accruing storage charges indefinitely. Set 30-day retention for application logs, 6-12 months for audit logs (HIPAA/SOC 2 requirement), and 7 years for the few audit logs that actually need it. Most clients find that 60-80% of ingested log data is debug-level application output that should expire in 7 days.
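One way to apply retention windows consistently is to derive them from log group names. The sketch below assumes a hypothetical naming convention (`/app/debug/`, `/app/`, `/audit/`, `/audit/long-term/`) and uses the CloudWatch Logs `put_retention_policy` call from boto3; adjust the prefixes and windows to your own compliance requirements. The allowed-values set is a subset of what the API accepts.

```python
# Subset of the retention values CloudWatch Logs accepts, in days.
ALLOWED_DAYS = {1, 3, 5, 7, 14, 30, 60, 90, 180, 365, 2192, 2557}

# Hypothetical naming convention; the longest matching prefix wins.
RETENTION_BY_PREFIX = {
    "/app/debug/": 7,
    "/app/": 30,
    "/audit/": 365,
    "/audit/long-term/": 2557,  # roughly 7 years
}


def retention_for(log_group_name, default_days=30):
    """Pick a retention window for a log group from its name."""
    matches = [p for p in RETENTION_BY_PREFIX if log_group_name.startswith(p)]
    days = RETENTION_BY_PREFIX[max(matches, key=len)] if matches else default_days
    if days not in ALLOWED_DAYS:
        raise ValueError(f"{days} is not a retention value CloudWatch Logs accepts")
    return days


def apply_retention(region):
    """Set retention on every log group (requires boto3 and AWS credentials)."""
    import boto3
    logs = boto3.client("logs", region_name=region)
    for page in logs.get_paginator("describe_log_groups").paginate():
        for group in page["logGroups"]:
            name = group["logGroupName"]
            logs.put_retention_policy(
                logGroupName=name, retentionInDays=retention_for(name)
            )
```

Shortening retention deletes older log events permanently, so it is worth dry-running `retention_for` over the full list of log groups and reviewing the result before calling `apply_retention`.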
6. S3 Intelligent-Tiering for model artifacts — $80-$400/month
Model artifacts, training datasets, and eval outputs accumulate in S3. Intelligent-Tiering moves objects to its Infrequent Access or archive tiers automatically based on access patterns. Old model checkpoints from 6 months ago drop to $0.0125 per GB-month instead of $0.023 per GB-month without any code change.
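For objects already sitting in S3 Standard, one option is a lifecycle rule that transitions them into the Intelligent-Tiering storage class. A minimal sketch using boto3's `put_bucket_lifecycle_configuration`; the helper names, the prefix, and the 30-day delay are assumptions. Note that this call replaces the bucket's entire lifecycle configuration, so existing rules need to be included in the list.

```python
def intelligent_tiering_rule(prefix, after_days=30):
    """Lifecycle rule that moves objects under prefix to INTELLIGENT_TIERING."""
    return {
        "ID": f"intelligent-tiering-{prefix.strip('/') or 'all'}",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": after_days, "StorageClass": "INTELLIGENT_TIERING"}
        ],
    }


def apply_rules(bucket, rules):
    """Replace the bucket's lifecycle configuration (requires boto3 and credentials)."""
    import boto3
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration={"Rules": rules}
    )


# Hypothetical prefix for model checkpoints.
rule = intelligent_tiering_rule("checkpoints/")
```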
7. RDS / Aurora Reserved Instances — $200-$1,200/month
For the Postgres eval store + application database, a 1-year Reserved Instance commit saves 30-40% over On-Demand. Most clients run these databases 24/7 anyway, so the commitment carries little utilization risk.
8. Eval-set sampling instead of full-corpus runs — $100-$800/month
Running the full eval set on every prompt change is wasteful. Most prompt changes affect <10% of test cases. Smart sampling (stratified by ticket type) catches 95% of regressions at 15% of the cost.
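Stratified sampling here means drawing the same fraction from each ticket type, so rare categories are not dropped from the sample. A minimal sketch; the `ticket_type` field, the 15% fraction, and the one-case floor per stratum are illustrative assumptions, and this is not necessarily the sampling scheme behind the figures above.

```python
import random
from collections import defaultdict


def stratified_sample(cases, key, fraction, seed=0, min_per_stratum=1):
    """Sample the same fraction from every stratum, keeping at least min_per_stratum."""
    rng = random.Random(seed)  # fixed seed keeps runs comparable across prompt changes
    strata = defaultdict(list)
    for case in cases:
        strata[key(case)].append(case)
    sample = []
    for name in sorted(strata):
        group = strata[name]
        k = min(len(group), max(min_per_stratum, round(len(group) * fraction)))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical eval set: 100 billing, 40 refund, and 3 outage test cases.
cases = [
    {"id": f"{ticket_type}-{i}", "ticket_type": ticket_type}
    for ticket_type, count in (("billing", 100), ("refund", 40), ("outage", 3))
    for i in range(count)
]
subset = stratified_sample(cases, key=lambda c: c["ticket_type"], fraction=0.15)
```

A full-corpus run on a slower cadence (for example before each release) still makes sense as a backstop for the regressions sampling misses.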
9. Snapshot + AMI lifecycle policies — $50-$400/month
EBS snapshots and EC2 AMIs accumulate. Most engineers create them and never delete them. A simple Lambda + EventBridge rule to delete snapshots older than 90 days saves $50-$400/month per client.
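A sketch of what such a Lambda might look like, assuming it runs on an EventBridge schedule with permission to describe and delete snapshots. The `Retain` opt-out tag is a hypothetical convention, not something from the audits. Snapshots still referenced by a registered AMI cannot be deleted until the AMI is deregistered, so a production version would need to handle that error.

```python
from datetime import datetime, timedelta, timezone


def expired_snapshot_ids(snapshots, max_age_days=90, now=None, keep_tag="Retain"):
    """Return IDs of snapshots older than max_age_days that lack the keep tag."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    expired = []
    for snap in snapshots:
        tag_keys = {t["Key"] for t in snap.get("Tags", [])}
        if snap["StartTime"] < cutoff and keep_tag not in tag_keys:
            expired.append(snap["SnapshotId"])
    return expired


def lambda_handler(event, context):
    """Entry point for the scheduled rule (requires boto3 and EC2 permissions)."""
    import boto3
    ec2 = boto3.client("ec2")
    pages = ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"])
    snapshots = [s for page in pages for s in page["Snapshots"]]
    for snapshot_id in expired_snapshot_ids(snapshots):
        ec2.delete_snapshot(SnapshotId=snapshot_id)
```

Separating the selection logic from the AWS calls makes it possible to test the age and tag rules without touching a real account.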
A real audit: $42,700/month → $14,900/month (US healthcare client, Q1 2026)
| Lever | Monthly saving |
|---|---|
| Bedrock prompt caching | $11,400 |
| Model right-sizing (Haiku routing) | $8,200 |
| VPC endpoints (NAT bypass) | $2,800 |
| CloudWatch retention windowing | $1,900 |
| Spot eval pipeline | $1,400 |
| RDS Reserved Instance | $1,200 |
| S3 Intelligent-Tiering | $400 |
| Other (snapshot lifecycle, eval sampling) | $500 |
| Total monthly savings | $27,800 |
User-facing latency stayed within 5% of the previous setup. Audit posture also improved: the CloudWatch retention windowing explicitly aligned the client with the HIPAA 6-year retention requirement. Total remediation cost: 6 weeks of one engineer's time plus our fixed-price audit and remediation engagement of $28,000. The engagement reached positive ROI in week 5.
FAQ
How much does an AWS cost audit cost?
Free for the discovery call + initial 3-invoice review. Full audit + remediation engagement is fixed-price $18,000-$45,000 depending on workload size. Median client savings within 6 weeks: $4,200/month, fully paying back the engagement in 4-8 months.
Can we do these ourselves?
Yes, but the average engineering team spends 80-150 hours doing this work over 3-6 months. We ship the same outcomes in 4-6 weeks because we run this playbook every week.
Will these changes break production?
Prompt caching, the NAT Gateway → VPC endpoint migration, and S3 Intelligent-Tiering are zero-risk and fully reversible. Model right-sizing, Spot, and CloudWatch retention changes require testing in staging first. We always ship changes through staging and a canary, never directly to production.
What about ongoing cost management after the audit?
Monthly cost review + remediation under our SLA tier ($2,000-$6,000/month). Most clients see an additional 30-45% in savings in months 2-12 as we catch new cost leaks early.
Free, 48-hour delivery
Cut your AWS AI bill 40-65% in 6 weeks
Send us your last 3 AWS invoices + Cost Explorer view. 48 hours later you get a written audit identifying your top 3 cost leaks, the projected savings, and a fixed-price remediation quote. No PDF gate.
Methodology
Lever rankings and savings figures from 18 Braincuber AWS AI cost audits shipped between June 2024 and April 2026. The $42,700 → $14,900 case study is a single anonymized US healthcare client (Q1 2026), metrics published with explicit permission. All AWS pricing referenced from public list rates as of April 2026. Median client savings ($4,200/month) is the audit-cohort median, not a maximum. Bedrock prompt caching savings cross-referenced against the Anthropic April 2026 prompt-caching benchmark of 90% input-token cost reduction for cache hits.
About the author
Head of AWS / Cloud Practice
Owns AWS architecture and cost optimization at Braincuber. Saved clients a median of $4,200/month via right-sizing audits. AWS Solutions Architect Professional.

