Can we do AWS cost optimization ourselves?

Yes — but average engineering teams spend 80-150 hours over 3-6 months. We ship the same outcomes in 4-6 weeks because we run this playbook weekly.

AI on AWS Cost Cuts: 9 Levers That Save 40-65%

Q: How much does an AWS cost audit cost?

Free initial discovery + 3-invoice review. Full audit + remediation $18,000-$45,000 fixed depending on workload size. Median client savings within 6 weeks: $4,200/month.

Q: Will these changes break production?

Prompt caching, VPC endpoints, S3 Intelligent-Tiering — zero risk. Model right-sizing, Spot, retention windowing require staging testing. We ship through staging + canary, never direct to production.

Q: Ongoing cost management?

Monthly review + remediation under SLA tier ($2,000-$6,000/month). Most clients hit additional 30-45% savings in months 2-12.

Quick answer

From 18 US client AWS AI cost audits, 9 levers consistently cut bills 40-65%. The biggest wins come from prompt caching on Bedrock (saves 30-50% on inference), eliminating NAT Gateway data fees via VPC endpoints (saves $600-$2,400/month), and model right-sizing — using Claude Haiku for routing decisions and reserving Sonnet/Opus for complex reasoning (saves 40-70% on inference). One enterprise client went from $42,700/month to $14,900/month in 6 weeks with no change to user-facing performance.

40-65%

typical savings

audits shipped

6 wk

median fix time

Book a free AWS cost audit →

The 9 levers, ranked by typical impact

1. Bedrock prompt caching — 30-50% inference savings

Anthropic's prompt caching on Bedrock charges cached tokens at 10% of the normal input rate. For a typical AI agent with a 6,000-token system prompt + tool definitions + product knowledge, the system prefix is identical every request. Caching the prefix drops per-request cost from $0.11 to $0.02. On a 15K-request/day workload that is $1,350/day → $250/day = $33,000/month saved.

2. Model right-sizing (Haiku for routing, Sonnet/Opus for reasoning) — 40-70% inference savings

Most AI agent requests do not need Claude Opus. About 70% of requests in a typical support agent are routing decisions ("which tool to call") that Haiku handles fine at 1/15th the cost. Use Haiku as the first model, escalate to Sonnet/Opus only when the request requires multi-step reasoning. Tracks: input/output token ratio >2:1 = Haiku-eligible.

3. NAT Gateway → VPC endpoints — $600-$2,400/month saved

NAT Gateway charges $0.045 per GB processed. For AWS-internal traffic (Bedrock, S3, DynamoDB, RDS) you can use VPC endpoints which have zero data-processing fee. Most legacy architectures route everything through NAT; the migration to VPC endpoints takes 2-4 hours of CloudFormation work and the savings are immediate.

Free AWS cost audit. Send 3 invoices + Cost Explorer view. 48-hour written report: top 3 cost leaks + projected savings + fixed-price remediation quote.

Get your audit →

4. Spot fleet for non-production workloads — 70-90% on EC2/SageMaker training

Eval pipelines, model training, batch inference all run fine on Spot. Production user-facing endpoints stay On-Demand. Most clients have 20-40% of their AI compute in eval/training/batch — that whole layer can drop to Spot rates with proper checkpointing.

5. CloudWatch Logs retention windowing — $200-$1,800/month

Default retention is "never expire" which costs $0.50 per GB ingested forever. Set 30-day retention for application logs, 6-12 months for audit logs (HIPAA/SOC 2 requirement), and 7 years for the few audit logs that actually need it. Most clients see 60-80% of ingested log data is application-debug-level that should expire in 7 days.

6. S3 Intelligent-Tiering for model artifacts — $80-$400/month

Model artifacts, training datasets, and eval outputs accumulate in S3. Intelligent-Tiering moves objects to Infrequent Access or Glacier automatically based on access patterns. Old model checkpoints from 6 months ago drop to $0.0125/GB instead of $0.023/GB without any code change.

7. RDS / Aurora Reserved Instances — $200-$1,200/month

For the Postgres eval store + application database, a 1-year Reserved Instance commit saves 30-40% over On-Demand. Most clients run these databases 24/7 anyway, so the commitment is risk-free.

8. Eval-set sampling instead of full-corpus runs — $100-$800/month

Running the full eval set on every prompt change is wasteful. Most prompt changes affect <10% of test cases. Smart sampling (stratified by ticket type) catches 95% of regressions at 15% of the cost.

9. Snapshot + AMI lifecycle policies — $50-$400/month

EBS snapshots and EC2 AMIs accumulate. Most engineers create them and never delete. A simple Lambda + EventBridge rule to delete snapshots older than 90 days saves $50-$400/month per client.

A real audit: $42,700/month → $14,900/month (US healthcare client, Q1 2026)

Lever	Monthly saving
Bedrock prompt caching	$11,400
Model right-sizing (Haiku routing)	$8,200
VPC endpoints (NAT bypass)	$2,800
CloudWatch retention windowing	$1,900
Spot eval pipeline	$1,400
RDS Reserved Instance	$1,200
S3 Intelligent-Tiering	$400
Other (snapshot lifecycle, eval sampling)	$500
Total monthly savings	$27,800

User-facing latency stayed within 5% of the previous setup. Audit posture improved (the CloudWatch retention windowing aligned them with HIPAA 6-year requirement explicitly). Total remediation work: 6 weeks of one engineer's time + our fixed-price audit + remediation engagement of $28,000. ROI in week 5.

FAQ

How much does an AWS cost audit cost?

Free for the discovery call + initial 3-invoice review. Full audit + remediation engagement is fixed-price $18,000-$45,000 depending on workload size. Median client savings within 6 weeks: $4,200/month, fully paying back the engagement in 4-8 months.

Can we do these ourselves?

Yes, but the average engineering team spends 80-150 hours doing this work over 3-6 months. We ship the same outcomes in 4-6 weeks because we run this playbook every week.

Will these changes break production?

Prompt caching, NAT Gateway → VPC endpoint, S3 Intelligent-Tiering — zero risk, fully reversible. Model right-sizing, Spot, CloudWatch retention require testing in staging first. We always ship changes through staging + canary, never direct to production.

What about ongoing cost management after the audit?

Monthly cost review + remediation under our SLA tier ($2,000-$6,000/month). Most clients hit a 30-45% additional savings in months 2-12 as we catch new cost leaks early.

Free, 48-hour delivery

Cut your AWS AI bill 40-65% in 6 weeks

Send us your last 3 AWS invoices + Cost Explorer view. 48 hours later you get a written audit identifying your top 3 cost leaks, the projected savings, and a fixed-price remediation quote. No PDF gate.

Book a free 30-min AWS audit → AWS Consulting Services →

Related resources

Methodology

Lever rankings and savings figures from 18 Braincuber AWS AI cost audits shipped between June 2024 and April 2026. The $42,700 → $14,900 case study is a single anonymized US healthcare client (Q1 2026), metrics published with explicit permission. All AWS pricing referenced from public list rates as of April 2026. Median client savings ($4,200/month) is the audit-cohort median, not a maximum. Bedrock prompt caching savings cross-referenced against the Anthropic April 2026 prompt-caching benchmark of 90% input-token cost reduction for cache hits.

Quick answer

40-65%

typical savings

audits shipped

6 wk

median fix time

Book a free AWS cost audit →

The 9 levers, ranked by typical impact

1. Bedrock prompt caching — 30-50% inference savings

2. Model right-sizing (Haiku for routing, Sonnet/Opus for reasoning) — 40-70% inference savings

3. NAT Gateway → VPC endpoints — $600-$2,400/month saved

Free AWS cost audit. Send 3 invoices + Cost Explorer view. 48-hour written report: top 3 cost leaks + projected savings + fixed-price remediation quote.

Get your audit →

4. Spot fleet for non-production workloads — 70-90% on EC2/SageMaker training

5. CloudWatch Logs retention windowing — $200-$1,800/month

6. S3 Intelligent-Tiering for model artifacts — $80-$400/month

7. RDS / Aurora Reserved Instances — $200-$1,200/month

For the Postgres eval store + application database, a 1-year Reserved Instance commit saves 30-40% over On-Demand. Most clients run these databases 24/7 anyway, so the commitment is risk-free.

8. Eval-set sampling instead of full-corpus runs — $100-$800/month

Running the full eval set on every prompt change is wasteful. Most prompt changes affect <10% of test cases. Smart sampling (stratified by ticket type) catches 95% of regressions at 15% of the cost.

9. Snapshot + AMI lifecycle policies — $50-$400/month

EBS snapshots and EC2 AMIs accumulate. Most engineers create them and never delete. A simple Lambda + EventBridge rule to delete snapshots older than 90 days saves $50-$400/month per client.

A real audit: $42,700/month → $14,900/month (US healthcare client, Q1 2026)

Lever	Monthly saving
Bedrock prompt caching	$11,400
Model right-sizing (Haiku routing)	$8,200
VPC endpoints (NAT bypass)	$2,800
CloudWatch retention windowing	$1,900
Spot eval pipeline	$1,400
RDS Reserved Instance	$1,200
S3 Intelligent-Tiering	$400
Other (snapshot lifecycle, eval sampling)	$500
Total monthly savings	$27,800

FAQ

How much does an AWS cost audit cost?

Can we do these ourselves?

Yes, but the average engineering team spends 80-150 hours doing this work over 3-6 months. We ship the same outcomes in 4-6 weeks because we run this playbook every week.

Will these changes break production?

What about ongoing cost management after the audit?

Monthly cost review + remediation under our SLA tier ($2,000-$6,000/month). Most clients hit a 30-45% additional savings in months 2-12 as we catch new cost leaks early.

Free, 48-hour delivery

Cut your AWS AI bill 40-65% in 6 weeks

Book a free 30-min AWS audit → AWS Consulting Services →

Related resources

Cost Optimization for AI Workloads on AWS

The 9 levers, ranked by typical impact

1. Bedrock prompt caching — 30-50% inference savings

2. Model right-sizing (Haiku for routing, Sonnet/Opus for reasoning) — 40-70% inference savings

3. NAT Gateway → VPC endpoints — $600-$2,400/month saved

4. Spot fleet for non-production workloads — 70-90% on EC2/SageMaker training

5. CloudWatch Logs retention windowing — $200-$1,800/month

6. S3 Intelligent-Tiering for model artifacts — $80-$400/month

7. RDS / Aurora Reserved Instances — $200-$1,200/month

8. Eval-set sampling instead of full-corpus runs — $100-$800/month

9. Snapshot + AMI lifecycle policies — $50-$400/month

A real audit: $42,700/month → $14,900/month (US healthcare client, Q1 2026)

FAQ

Cut your AWS AI bill 40-65% in 6 weeks

Methodology

Ready to implement what you just read?

Take the guide with you

Book a 30-min architecture call

Get a free 48-hour written brief

Cost Optimization for AI Workloads on AWS

The 9 levers, ranked by typical impact

1. Bedrock prompt caching — 30-50% inference savings

2. Model right-sizing (Haiku for routing, Sonnet/Opus for reasoning) — 40-70% inference savings

3. NAT Gateway → VPC endpoints — $600-$2,400/month saved

4. Spot fleet for non-production workloads — 70-90% on EC2/SageMaker training

5. CloudWatch Logs retention windowing — $200-$1,800/month

6. S3 Intelligent-Tiering for model artifacts — $80-$400/month

7. RDS / Aurora Reserved Instances — $200-$1,200/month

8. Eval-set sampling instead of full-corpus runs — $100-$800/month

9. Snapshot + AMI lifecycle policies — $50-$400/month

A real audit: $42,700/month → $14,900/month (US healthcare client, Q1 2026)

FAQ

Cut your AWS AI bill 40-65% in 6 weeks

Methodology

Ready to implement what you just read?

Take the guide with you

Book a 30-min architecture call

Get a free 48-hour written brief