Cost Anomaly Detection has always been good at the what: ECS costs are up 28%, attributed to service X in account Y. What it couldn't do was the why — whether that spike came from something in your environment using more resources, or from a shift in how AWS is pricing what you're already running. Those two causes require completely different fixes, and until now, distinguishing them meant manually pulling CloudTrail, IAM, and Cost Explorer data in the right sequence and hoping your ops person knew which sequence that was.
The new AI-powered cost investigation, triggered directly from the anomaly detail page, makes usage-driven vs rate-driven the first classification step — before it looks at anything else. Everything after that split follows from it: which data sources to consult, what the root cause is, which team to alert. For a D2C team that currently handles AWS cost spikes by opening Cost Explorer and clicking dimensions until something looks wrong, this changes the investigation from 85 minutes of guessing to 4 minutes of reading.
What the AI Investigation Actually Returns
When you click "Investigate with Amazon Q" on the anomaly detail page, the investigation runs immediately and returns a structured answer built around five questions:
- What: which service, account, and usage dimension changed, and by how much relative to the 90-day baseline
- When: the precise time window the change began and its duration
- Where: which account and region — with cross-account coverage if org-wide CloudTrail is configured
- Who: the specific IAM role or user whose API calls correlate with the timing of the change (usage-driven only; requires CloudTrail)
- Why: the plain-language explanation grounded in specific evidence, with explicit flags where data gaps exist
The investigation is conversational: you can ask follow-up questions in the same session without starting over. "Show me the CloudTrail events for that IAM role over the same window" or "Was this account flagged in any previous anomalies?" are the kinds of follow-ups that bring out context the initial summary left out.
One thing AWS is honest about: when evidence gaps exist, the investigation says so directly rather than filling them with inference. If CloudTrail isn't wired in the relevant account, the investigation will tell you that and explain what it was unable to determine as a result. That honesty is useful — it tells you exactly where your observability coverage has a hole.
Why Usage-Driven and Rate-Driven Need Different Investigations
The usage-driven vs rate-driven classification isn't just taxonomic. It determines the entire shape of the investigation and the fix.
Usage-driven means your environment consumed more resources or generated more API activity. Common D2C causes: a deployment that scaled ECS task count upward and wasn't rolled back, a load test left running in a dev account, an auto-scaling policy that fired on a traffic spike and didn't scale back, a new feature that added a database write path not accounted for in the budget. The investigation path: CloudTrail API calls → IAM identity → specific action → time window. The fix: find the resource, right-size or terminate it, add a guardrail so it doesn't happen again.
Rate-driven means you're being charged a different price per unit for the same resources. Common D2C causes: a Reserved Instance or Savings Plan expired and the workload fell back to on-demand rates, a Spot capacity interruption forced a switch to on-demand, a Marketplace third-party product changed its hourly rate, or a usage tier threshold was crossed that changed the per-unit price. The investigation path: billing events → pricing composition changes → rate table comparison. The fix: renew the commitment, re-evaluate Spot configuration, or review Marketplace subscriptions.
An ops team that opens CloudTrail looking for a deployment event on a rate-driven spike will find nothing. An ops team that combs through pricing tiers on a usage-driven spike will also find nothing. Getting the classification right first saves the time spent in the wrong direction — which, across our work on 20+ D2C AWS accounts, is where most of the 85-minute investigations go.
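One way to picture the split is as arithmetic: a line item's cost is quantity times unit rate, so a change in cost can be divided into a part explained by extra units and a part explained by a different price. The sketch below is a minimal illustration of that idea, not AWS's actual classification logic; the `DailyCost` type, the `split_cost_change` function, and every figure in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DailyCost:
    quantity: float   # units consumed, e.g. ECS task-hours (hypothetical figures)
    unit_rate: float  # effective price per unit, in USD

    @property
    def total(self) -> float:
        return self.quantity * self.unit_rate


def split_cost_change(baseline: DailyCost, current: DailyCost) -> dict:
    """Split a cost change into a usage effect and a rate effect.

    usage effect = extra units, priced at the baseline rate
    rate effect  = rate change, applied to the current quantity
    The two effects add up to the total change.
    """
    usage_effect = (current.quantity - baseline.quantity) * baseline.unit_rate
    rate_effect = (current.unit_rate - baseline.unit_rate) * current.quantity
    label = "usage-driven" if abs(usage_effect) >= abs(rate_effect) else "rate-driven"
    return {
        "total_change": current.total - baseline.total,
        "usage_effect": usage_effect,
        "rate_effect": rate_effect,
        "classification": label,
    }


# Same price, more task-hours: for example, a load test left running.
usage_spike = split_cost_change(DailyCost(1000, 0.04), DailyCost(1300, 0.04))
# Same task-hours, higher price: for example, a Savings Plan that expired.
rate_spike = split_cost_change(DailyCost(1000, 0.04), DailyCost(1000, 0.06))
print(usage_spike["classification"], rate_spike["classification"])
```

Both spikes could show up as a similar-looking percentage jump in an alert, yet the first points you at running resources and the second at commitments and pricing.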
The CloudTrail Prerequisite Most D2C Brands Have Not Met
The "who" question — which IAM role or user triggered the usage change — requires an organization-wide CloudTrail trail configured to deliver events to CloudWatch Logs. Without it, the AI investigation can tell you "usage-driven, ECS task scaling" but not "IAM role ci-runner in the dev account at 14:32 UTC."
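For a sense of what answering "who" involves by hand, the sketch below builds a CloudWatch Logs Insights query over CloudTrail events, plus the arguments for boto3's `start_query` call. Treat it as an illustration of the manual equivalent, not the query the AI investigation actually runs; the log group name, the time window, and the choice of ECS event names are assumptions.

```python
from datetime import datetime, timezone


def build_who_query(event_source: str, event_names: list[str]) -> str:
    """Logs Insights query listing which identities made the given API calls."""
    names = ", ".join(f'"{name}"' for name in event_names)
    return (
        "fields @timestamp, eventName, userIdentity.arn, recipientAccountId, awsRegion\n"
        f'| filter eventSource = "{event_source}" and eventName in [{names}]\n'
        "| sort @timestamp asc\n"
        "| limit 50"
    )


def build_start_query_args(log_group: str, start: datetime, end: datetime, query: str) -> dict:
    """Keyword arguments for the CloudWatch Logs start_query API (epoch seconds)."""
    if start.tzinfo is None or end.tzinfo is None:
        # Naive datetimes would be read as local time and shift the window.
        raise ValueError("start and end must be timezone-aware")
    return {
        "logGroupName": log_group,
        "startTime": int(start.timestamp()),
        "endTime": int(end.timestamp()),
        "queryString": query,
    }


query = build_who_query("ecs.amazonaws.com", ["RunTask", "UpdateService"])
args = build_start_query_args(
    "org-cloudtrail-logs",  # hypothetical log group name
    datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc),  # hypothetical window
    datetime(2024, 1, 1, 15, 0, tzinfo=timezone.utc),
    query,
)
print(args["queryString"])
```

Passing the result to `boto3.client("logs").start_query(**args)` would submit the query; that call is left out so the sketch runs without AWS credentials.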
Most D2C brands have CloudTrail enabled in their production account because that's where the compliance requirement first appeared. The staging and dev accounts — where load tests live, where engineers experiment with infrastructure changes, where cost spikes most commonly originate — often have no CloudTrail trail at all.
Enabling an organization trail through AWS Organizations fixes this across all member accounts at once. The standing cost for a 3-account D2C setup (management events only, no data events) is typically $9 to $45 per month total, depending on API call frequency. That's CloudWatch Logs ingestion and storage plus the Insights queries the investigation runs — roughly $0.10 or less per investigation query at D2C traffic volumes.
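As a rough sketch of what that setup looks like in code, the snippet below assembles the request for CloudTrail's `create_trail` API with the organization and CloudWatch Logs options set. The trail name, bucket, and ARNs are placeholders, and it assumes the prerequisites already exist: trusted access for CloudTrail in AWS Organizations, an S3 bucket with a suitable policy, a log group, and an IAM role CloudTrail can assume to write to it.

```python
def org_trail_request(trail_name: str, bucket: str, log_group_arn: str, role_arn: str) -> dict:
    """Build create_trail arguments for an org-wide, multi-region trail."""
    return {
        "Name": trail_name,
        "S3BucketName": bucket,
        "IsMultiRegionTrail": True,       # cover every region, not just one
        "IsOrganizationTrail": True,      # apply to all member accounts
        "IncludeGlobalServiceEvents": True,
        "CloudWatchLogsLogGroupArn": log_group_arn,
        "CloudWatchLogsRoleArn": role_arn,
    }


def create_org_trail(request: dict) -> None:
    """Create the trail and start logging. Needs boto3 and management-account credentials."""
    import boto3  # third-party; imported here so the builder above runs without it

    cloudtrail = boto3.client("cloudtrail")
    cloudtrail.create_trail(**request)
    cloudtrail.start_logging(Name=request["Name"])


# All values below are placeholders, not real resources.
request = org_trail_request(
    trail_name="org-management-events",
    bucket="example-org-cloudtrail-bucket",
    log_group_arn="arn:aws:logs:us-east-1:111122223333:log-group:org-cloudtrail-logs:*",
    role_arn="arn:aws:iam::111122223333:role/ExampleCloudTrailToCloudWatchLogs",
)
print(sorted(request))
```

A new trail records management events by default, which matches the "management events only, no data events" cost profile described above.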
Our AWS consulting setup includes org-wide CloudTrail as a baseline configuration step. It's also a prerequisite for the cross-account anomaly investigation in AWS FinOps Agent, so if you're planning to enable both, one CloudTrail trail serves both tools.
The CloudTrail coverage gap is the most common reason cost investigations come back with partial answers. We've audited this across D2C stacks at every GMV tier — if you want to know what your current coverage looks like before enabling the AI investigation, grab 30 minutes. Written brief inside a week.
The Client Story: 85 Minutes vs 4 Minutes
A $5M outdoor apparel brand came to us after a Cost Anomaly Detection alert on a Tuesday afternoon: ECS costs up 31% for the day, $2,800 above the daily baseline. Their ops lead, one person covering all of AWS, opened Cost Explorer and started working through the dimensions. Service: ECS. Account: dev, per the alert. Region: us-east-1. Usage type: ECS task-hours. Forty minutes in, still no root cause. They moved on to searching CloudTrail manually and filtered by ECS API calls, but couldn't find a scaling event in the right time window, because the console showed event timestamps in UTC and they were searching in local time. It took eighty-five minutes in total to trace the spike to a load test started by the ci-runner IAM role, which had been running since 14:23 and was never stopped when the test completed.
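The timezone mix-up in that search is easy to avoid by converting the window to UTC before filtering. The sketch below shows the conversion with Python's standard library; the date and the fixed UTC-5 offset are hypothetical, since the story does not say which timezone the team was working in.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the ops lead works at a fixed UTC-5 offset.
LOCAL_TZ = timezone(timedelta(hours=-5))


def local_window_to_utc(local_start: str, minutes: int, tz: timezone = LOCAL_TZ):
    """Convert a local start time and a duration into a UTC (start, end) pair."""
    start_local = datetime.strptime(local_start, "%Y-%m-%d %H:%M").replace(tzinfo=tz)
    start_utc = start_local.astimezone(timezone.utc)
    return start_utc, start_utc + timedelta(minutes=minutes)


# Hypothetical date; a 60-minute search window starting at 14:23 local time.
start, end = local_window_to_utc("2024-03-05 14:23", 60)
print(start.isoformat(), end.isoformat())
# 2024-03-05T19:23:00+00:00 2024-03-05T20:23:00+00:00
```

Searching 14:23 to 15:23 against UTC timestamps would have missed that window by five hours.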
We walked them through enabling the AI investigation on their next anomaly. Two weeks later — a smaller ECS spike, $740 over baseline — they clicked "Investigate with Amazon Q." Four minutes: usage-driven, ECS task scaling, ci-runner IAM role, dev account, initiated at 09:14 UTC, still running. They terminated the task at minute five.
The only infrastructure change between the two incidents was enabling org-wide CloudTrail delivery to CloudWatch Logs, which brought their dev account under coverage. That cost $11 in the billing period it was enabled. The first investigation cost 85 minutes of ops time. The second cost 4 minutes.
What You Get Without Org-Wide CloudTrail
If you enable the AI investigation before wiring org-wide CloudTrail, it still runs — it just tells you what it can't determine. A typical partial result for a usage-driven spike looks like: "ECS costs increased 31% from baseline. Analysis indicates usage-driven increase in task-hours. Unable to identify the specific IAM identity or initiating action without organization-wide CloudTrail data configured for this account."
That's still useful. "Usage-driven, ECS task-hours" tells your ops lead to look at running tasks, not at pricing or commitment changes. The investigation cuts the problem space in half even without the full CloudTrail picture. It's a meaningful step up from a raw Cost Anomaly Detection (CAD) alert that shows a 31% anomaly with no direction at all.
The partial result also explicitly tells you which CloudTrail gap is blocking the full answer — so you know exactly what to fix to get the complete investigation next time.
How This Fits the Three-Tool Stack
The three AWS cost investigation tools below are complementary, not competing. They handle different moments in the investigation workflow:
- Cost Anomaly Detection + AI investigation: triggered when a spike is detected; returns a plain-language root cause within minutes; conversational follow-up in the same session. Good for in-the-moment investigation when an anomaly fires.
- Amazon Q in Cost Explorer: user-initiated; you go to it when you want to explore a question about your costs that isn't tied to a specific anomaly alert. We covered the D2C workflow for this in our Cost Explorer Q post.
- FinOps Agent: automates the investigation trigger and routes results to Jira or Slack on a schedule or threshold you set. The investigation runs without anyone clicking anything. We walked through the context file setup for D2C teams in our FinOps Agent post.
For a D2C brand at $3–10M GMV with one ops person covering AWS: the AI investigation from CAD is the highest-impact tool to enable first. It requires the least setup (org-wide CloudTrail, plus Amazon Q Developer access on the account), handles the most disruptive moment (an active spike), and gives the most immediate return. FinOps Agent makes sense once you've had enough anomalies to want the routing automated rather than click-by-click.
Frequently Asked Questions
How is the AI cost investigation different from Cost Anomaly Detection's existing root cause analysis?
Cost Anomaly Detection's existing root cause analysis shows dimensional breakdowns — which service, account, region, or usage type drove the change. That answers "what changed." The new AI investigation answers "why it changed" by correlating those dimensional signals with CloudTrail API calls, IAM identities, and billing events. It also classifies the spike as usage-driven or rate-driven first, which the dimensional breakdown doesn't do, and surfaces a plain-language explanation that references specific evidence rather than a table of cost splits.
Does the AI cost investigation require Amazon Q Developer to be separately enabled?
Yes. The "Investigate with Amazon Q" button on the anomaly detail page requires Amazon Q Developer access on your AWS account. If the button is absent on your anomaly detail page, check whether Amazon Q Developer is enabled in your account settings. During public preview this has been available to accounts with Amazon Q Developer enabled; standard Q Developer billing applies for the investigation queries to CloudWatch Logs Insights.
What does it cost to enable organization-wide CloudTrail to CloudWatch Logs?
There are two cost components. First, the CloudTrail trail itself: CloudTrail delivers the first copy of management events to S3 in each region at no charge, so an organization trail that is your only trail adds no CloudTrail fee for management events; data events cost extra if enabled. Second, CloudWatch Logs ingestion and storage: at typical D2C event volumes (management events only, not data events), this runs roughly $3 to $15 per account per month depending on API call frequency. The CloudWatch Logs Insights queries the AI investigation runs are billed at $0.005 per GB scanned, which for most anomaly investigations is under $0.10 per query. For a 3-account D2C setup the standing monthly cost is typically $9 to $45 total.
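The arithmetic behind those figures is simple enough to check yourself. The sketch below reuses the numbers from this answer (three accounts, $3 to $15 per account per month, $0.005 per GB scanned); the function names are illustrative, and your own per-account range will depend on API call volume.

```python
ACCOUNTS = 3
PER_ACCOUNT_MONTHLY_USD = (3.0, 15.0)  # low and high estimates from the answer above
INSIGHTS_USD_PER_GB = 0.005            # Logs Insights price per GB scanned, as quoted above


def standing_monthly_cost(accounts: int, per_account: tuple) -> tuple:
    """Low and high monthly totals for CloudWatch Logs across all accounts."""
    low, high = per_account
    return accounts * low, accounts * high


def query_cost(gb_scanned: float) -> float:
    """Cost in USD of one Logs Insights query that scans gb_scanned gigabytes."""
    return gb_scanned * INSIGHTS_USD_PER_GB


def gb_within_budget(budget_usd: float) -> float:
    """How many GB a single query can scan before exceeding budget_usd."""
    return budget_usd / INSIGHTS_USD_PER_GB


low, high = standing_monthly_cost(ACCOUNTS, PER_ACCOUNT_MONTHLY_USD)
print(f"standing cost: ${low:.0f} to ${high:.0f} per month")
print(f"a 4 GB query costs ${query_cost(4):.3f}")
print(f"$0.10 covers about {gb_within_budget(0.10):.0f} GB scanned")
```

Put differently, a query stays under $0.10 as long as it scans less than roughly 20 GB of logs.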
About the author
Founder & CEO, Braincuber Technologies
Founder and CEO of Braincuber. Has scoped and shipped 500+ Odoo, AI, and cloud projects for US mid-market and global brands. Takes every founder call personally — no SDR layer between buyers and the people building the system.

