A $5M outdoor gear brand spent 18 months building four AI features. Product recommendations on the product detail page, approved because conversion data from comparable brands showed a 12% lift. A visual search tool for Instagram-driven shoppers who arrive with a photo rather than a keyword, approved because Pinterest-to-purchase attribution showed strong intent. A demand forecasting model running nightly to reduce overstock on seasonal SKUs, approved after the ops team costed out last year's markdown losses. A support bot handling order status queries, approved because the customer service team was handling 200 tickets a day of the same four questions.
Each feature had a launch budget conversation. None had an ongoing cost visibility conversation. Eighteen months in, the AWS bill shows one line: Machine Learning — $2,400 per month. The CTO asks which feature is driving the cost spike that appeared in March. The developer who built recommendations has left the company. Nobody knows whether the demand forecasting model is still running on the same SageMaker instance spec it launched on, or whether it scaled up during the holiday season and never scaled back. Nobody knows whether visual search is being called at the rate it was scoped for.
AWS GPU cost attribution research for shared EKS clusters solves the same attribution blindness at enterprise scale: DCGM metrics collecting per-GPU-slice utilization, Kube-State-Metrics adding Kubernetes pod context, Prometheus storing it, Grafana visualizing it. The tooling is specific to Kubernetes GPU workloads. The pattern — tag every resource by consuming feature, collect usage metrics per feature, display cost alongside utilization in a single dashboard — applies directly to D2C AI workloads on Bedrock, SageMaker, and Rekognition without EKS or GPU hardware.
Running multiple AI features and getting one ML line item on your AWS bill? Book a 30-min audit — Dhwani joins every call, we review your current AI resource tagging and map the attribution dashboard to your specific feature set. Written brief inside a week. No SDR layer.
Why the AWS Bill Doesn't Break Down by Feature by Default
AWS billing is organized by service, not by business function. The Machine Learning line item in Cost Explorer covers every dollar spent on Bedrock model invocations, SageMaker endpoint hours, Rekognition API calls, and any other ML-category service — regardless of which feature triggered the spend. A single Bedrock call from the support bot and a single Bedrock call from the product recommendation engine look identical in the bill.
Cost allocation tags are AWS's mechanism for breaking this apart. When a tag — say, ai-feature: demand-forecasting — is activated in the Billing console and applied consistently to every AWS resource that serves that feature, Cost Explorer can group costs by that tag value. The result is a per-feature cost breakdown that the default bill view cannot produce.
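The tag-grouped view described above can also be pulled programmatically. Below is a minimal sketch, assuming the `ai-feature` tag key is already activated and that a Cost Explorer client (for example `boto3.client("ce")`) is passed in. The function names are illustrative, not part of any AWS SDK, and pagination via `NextPageToken` is left out for brevity.

```python
from datetime import date

TAG_KEY = "ai-feature"  # tag key used in this article; must be activated in Billing


def build_feature_cost_query(start: date, end: date) -> dict:
    """Build keyword arguments for Cost Explorer's get_cost_and_usage call."""
    return {
        # The End date is exclusive in Cost Explorer.
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "DAILY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "TAG", "Key": TAG_KEY}],
    }


def total_cost_by_feature(response: dict) -> dict:
    """Sum daily costs per tag value from a get_cost_and_usage response."""
    totals = {}
    for day in response.get("ResultsByTime", []):
        for group in day.get("Groups", []):
            # Tag group keys arrive as "<tag-key>$<tag-value>";
            # an empty value means the charge carried no tag.
            _, _, value = group["Keys"][0].partition("$")
            feature = value or "untagged"
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[feature] = totals.get(feature, 0.0) + amount
    return totals


def fetch_feature_costs(ce_client, start: date, end: date) -> dict:
    """Run the query with a caller-supplied Cost Explorer client."""
    response = ce_client.get_cost_and_usage(**build_feature_cost_query(start, end))
    return total_cost_by_feature(response)
```

A large "untagged" total in the result is a quick signal that some AI-serving resources are still missing the feature tag.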
The challenge: tagging must be applied at the resource level, not the API call level. Bedrock does not support per-invocation tagging. The tag lives on the Lambda function or ECS task making the Bedrock call, so the Lambda compute cost carries the feature tag directly, and the Bedrock token cost can be attributed to that feature by proxy only when the infrastructure is organized one-resource-per-feature. D2C teams that built their AI features quickly, sharing Lambda functions or ECS services across multiple features, have a tagging refactor to do before attribution works cleanly.
The Three-Layer Attribution Stack
The full attribution stack for D2C AI cost visibility has three layers, each answering a different question:
Layer 1: Cost Explorer tag-based grouping. Answers: what did each feature cost this month, this week, and how does it compare to last month? Requires activated cost allocation tags on every AI-serving resource. The output is a Cost Explorer saved view filtered by the ai-feature tag, grouped by tag value, showing a per-feature bar chart of daily ML spend. This is the view the CTO needs for the monthly AI investment review. For context on how this fits into a broader AWS FinOps practice, see our earlier post on the AWS FinOps Agent org context setup — the account and team tagging structure there is the foundation the feature-level tags build on.
Layer 2: CloudWatch custom usage metrics. Answers: how much is each feature being used, and is the cost proportional to usage? A Lambda function serving product recommendations should emit a CloudWatch custom metric for every recommendation request it handles: namespace: D2CAI, metric: RecommendationRequests, dimension: feature=product-recommendations. Over time, this metric shows whether a cost spike in March correlates with a traffic spike (expected) or with the same traffic volume at higher cost (unexpected — model drift, instance scaling event, or new model version with higher token consumption).
Layer 3: Managed Grafana dashboard. Answers: what is the cost-per-request for each feature, and is it trending in the right direction? The dashboard pulls Cost Explorer data via the Cost Explorer API and CloudWatch usage metrics via the CloudWatch data source, displays them side by side, and calculates derived metrics: cost per recommendation served, cost per visual search query, cost per demand forecast run, cost per support ticket deflected. These are the numbers that connect AI spend to business value — and the numbers the original budget conversations should have included.
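The Layer 2 usage metric can be sketched as follows. This is a minimal example, assuming the `D2CAI` namespace and `feature` dimension named above and a CloudWatch client (for example `boto3.client("cloudwatch")`) supplied by the caller. The helper names are hypothetical.

```python
NAMESPACE = "D2CAI"  # custom namespace named in this article


def build_request_metric(feature: str, metric_name: str, count: int = 1) -> dict:
    """Build one MetricData entry for CloudWatch put_metric_data."""
    return {
        "MetricName": metric_name,
        "Dimensions": [{"Name": "feature", "Value": feature}],
        "Value": float(count),
        "Unit": "Count",
    }


def emit_request_metric(cloudwatch_client, feature: str, metric_name: str, count: int = 1) -> None:
    """Publish the request count to the D2CAI namespace."""
    cloudwatch_client.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[build_request_metric(feature, metric_name, count)],
    )
```

Inside the recommendations Lambda handler this would be a single call, such as `emit_request_metric(cw, "product-recommendations", "RecommendationRequests")`. Batching counts per invocation rather than per request keeps the number of API calls down.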
Tagging Every AI Resource by Feature
The tag schema that makes attribution work cleanly across a D2C AI stack:
ai-feature: [product-recommendations | visual-search | demand-forecasting | support-bot]
ai-model: [bedrock-claude | sagemaker-custom | rekognition | bedrock-titan]
ai-env: [production | staging]
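One way to keep the schema consistent is to build tags through a small validator rather than typing them by hand. The sketch below is illustrative and the helper names are hypothetical. It produces the two tag shapes AWS APIs commonly expect: a plain key-to-value mapping (as Lambda's `tag_resource` takes) and a list of `Key`/`Value` pairs (as SageMaker's `add_tags` takes).

```python
# Allowed values, copied from the tag schema above.
ALLOWED_VALUES = {
    "ai-feature": {"product-recommendations", "visual-search",
                   "demand-forecasting", "support-bot"},
    "ai-model": {"bedrock-claude", "sagemaker-custom", "rekognition", "bedrock-titan"},
    "ai-env": {"production", "staging"},
}


def build_tags(feature: str, model: str, env: str) -> dict:
    """Return the three attribution tags, rejecting values outside the schema."""
    tags = {"ai-feature": feature, "ai-model": model, "ai-env": env}
    for key, value in tags.items():
        if value not in ALLOWED_VALUES[key]:
            raise ValueError(f"{value!r} is not an allowed value for tag {key!r}")
    return tags


def as_key_value_list(tags: dict) -> list:
    """Convert a tag mapping to the [{'Key': ..., 'Value': ...}] list form."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]
```

Rejecting unknown values at tagging time prevents near-duplicates such as `visual_search` and `visual-search` from showing up as separate bars in Cost Explorer.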
Apply these tags to every resource involved in the AI feature's serving path:
Lambda functions: tag at creation in the function configuration. Cost Explorer attributes Lambda compute costs (invocation + duration) to the tag. If a shared Lambda calls multiple AI features based on request type, split it into per-feature Lambda functions before implementing attribution — shared functions cannot be cleanly attributed.
SageMaker endpoints: tag the endpoint resource. Cost Explorer attributes endpoint-hour costs to the tag. Auto-scaling events that change instance count, and endpoint configuration updates that change instance type, are reflected in the cost automatically: if demand forecasting moved from ml.t3.medium to ml.m5.large during Black Friday/Cyber Monday (BFCM) and was never moved back, the Cost Explorer tag view shows the per-day cost change.
Rekognition: Rekognition API calls are per-image or per-minute charges with no resource to tag. Attribution for Rekognition requires instrumenting the calling Lambda: the Lambda function serving visual search should be tagged ai-feature: visual-search, and its compute cost is attributed accordingly. The Rekognition API charges themselves aggregate at the service level and cannot be split by tag without a separate cost allocation approach using usage metrics.
Bedrock: same pattern as Rekognition — tag the Lambda or ECS task making the Bedrock API calls. Bedrock token costs appear as a service-level charge; the compute cost of the calling resource is what gets attributed to the feature tag.
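Activate all three tag keys in the AWS Billing console under Cost Allocation Tags. By default, activated tags apply to charges from activation onward. AWS Billing offers a backfill option covering up to 12 months, but it only helps for resources that already carried the tag during that period; spend on resources that were untagged at the time cannot be retroactively attributed.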
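For the Rekognition and Bedrock charges that aggregate at the service level, one possible usage-metric approach is to split the monthly service charge in proportion to each feature's measured usage. The sketch below assumes per-feature usage counts (API calls or tokens) are already collected. The function name and the sample numbers are hypothetical.

```python
def allocate_by_usage(service_cost: float, usage_by_feature: dict) -> dict:
    """Split a service-level charge across features in proportion to usage.

    usage_by_feature maps feature name -> usage units (for example tokens
    or API calls) recorded over the same period as service_cost.
    """
    total_usage = sum(usage_by_feature.values())
    if total_usage == 0:
        # No recorded usage: nothing to allocate against.
        return {feature: 0.0 for feature in usage_by_feature}
    return {
        feature: service_cost * usage / total_usage
        for feature, usage in usage_by_feature.items()
    }


# Illustrative numbers only: a $100 Bedrock charge split across two features
# by token volume.
example = allocate_by_usage(100.0, {"support-bot": 3_000_000,
                                    "product-recommendations": 1_000_000})
```

This is an approximation: it treats every usage unit as equally priced, so features that call different models or mix input and output tokens in different ratios would need per-model weighting.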
Building the Cost and Usage Dashboard in Managed Grafana
Amazon Managed Grafana connects to Cost Explorer via the AWS Cost Explorer data source plugin and to CloudWatch via the native CloudWatch data source. No Prometheus is required for this D2C stack — CloudWatch handles metric storage, unlike the EKS GPU pattern where DCGM outputs Prometheus-format metrics that require a Prometheus store.
The four panels that answer the CTO's question:
Panel 1: Daily AI cost by feature (bar chart). Data source: Cost Explorer. Filter: service = Amazon SageMaker, Amazon Bedrock, Amazon Rekognition, plus AWS Lambda for the tagged calling functions. Group by: ai-feature tag. Time range: last 30 days. Bedrock and Rekognition charges that carry no tag appear in an untagged group until they are allocated by usage metrics. This panel shows which feature spent what each day, and makes the March spike attributable to a specific feature rather than the service aggregate.
Panel 2: Request volume by feature (time series). Data source: CloudWatch. Metrics: custom namespace D2CAI, each feature's request count metric. This panel overlays request volume onto the cost panel — if cost went up and requests went up proportionally, the cost increase is explained by traffic. If cost went up but requests stayed flat, there is an efficiency problem to investigate.
Panel 3: Cost per request by feature (stat panels). Calculated field: daily cost from Panel 1 divided by daily request count from Panel 2. This is the number that enables ROI conversations — if visual search costs $0.04 per query and has a measured conversion lift of $1.20 per assisted session, the economics are clear. If demand forecasting costs $180/month and the last inventory review found no evidence of its output being used, that is also clear.
Panel 4: Budget alert status (table). For each feature: monthly budget set at launch, current month-to-date spend, percentage consumed, days remaining in the month. A budget in AWS Budgets is configured per feature tag and sends an alert if spend exceeds 80% of budget before the 20th of the month. This panel shows the status without requiring the CTO to open the Billing console.
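The derived values behind Panels 3 and 4 are simple arithmetic. A minimal sketch follows, assuming daily cost and request counts have already been fetched. The function names and the example figures are illustrative.

```python
import calendar
from datetime import date


def cost_per_request(daily_cost: float, daily_requests: int):
    """Panel 3: cost divided by request count; None when there was no traffic."""
    if daily_requests <= 0:
        return None
    return daily_cost / daily_requests


def budget_status(monthly_budget: float, month_to_date_spend: float, today: date) -> dict:
    """Panel 4: percentage consumed, days remaining, and the 80%-before-the-20th alert."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    percent_consumed = month_to_date_spend * 100 / monthly_budget
    return {
        "percent_consumed": round(percent_consumed, 1),
        "days_remaining": days_in_month - today.day,
        # Alert rule from the panel description: over 80% before the 20th.
        "alert": percent_consumed > 80 and today.day < 20,
    }
```

For instance, $8.00 of daily spend over 200 visual search queries gives $0.04 per query, and a feature that has spent $510 of a $600 budget by March 15 is 85% consumed with 16 days remaining, which would trigger the alert.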
The Managed Grafana and Managed Prometheus setup that supports this dashboard is part of our AI solutions infrastructure for D2C brands — the same observability layer applies to agent performance monitoring, model latency tracking, and the agentic triage patterns we've covered elsewhere.
The GPU Variant: When D2C Brands Host Their Own Models
Some D2C brands, particularly those with custom visual models trained on proprietary product imagery, run inference on EC2 GPU instances rather than managed services. The g4dn.xlarge (single NVIDIA T4) at approximately $0.526/hour on-demand is a common choice for visual search inference at D2C traffic volumes. If the same instance serves two features, say visual search and an image-quality moderation model, tag-based billing attributes the entire instance cost to whichever single ai-feature value the instance is tagged with, which is inaccurate.
This is exactly the scenario the EKS GPU attribution article solves with DCGM metrics. For D2C brands not running EKS, the equivalent is nvidia-smi output piped to a CloudWatch custom metric via a sidecar process on the EC2 instance, with per-process GPU utilization tagged by the inference process name. The CloudWatch metric then shows what fraction of the GPU each feature is consuming, and cost attribution divides the instance cost proportionally.
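A rough sketch of the sidecar's parsing step is shown below. It assumes the per-process output of `nvidia-smi pmon -c 1`, whose exact column set varies by driver version, so the parser reads column positions from the header line instead of hard-coding them. The sample text and process names are made up for illustration, and each inference process name is assumed to map to exactly one feature.

```python
# Hypothetical sample of `nvidia-smi pmon -c 1` output.
SAMPLE_PMON = """\
# gpu        pid  type    sm   mem   enc   dec   command
# Idx          #   C/G     %     %     %     %   name
    0       4121     C    60    30     -     -   visual_search
    0       4188     C    20    10     -     -   img_moderation
"""


def sm_utilization_by_process(pmon_text: str) -> dict:
    """Return SM utilization (%) summed per process name."""
    columns = None
    usage = {}
    for line in pmon_text.splitlines():
        if line.startswith("#"):
            fields = line.lstrip("#").split()
            if "sm" in fields and "command" in fields:
                columns = fields  # header row that names the columns
            continue
        parts = line.split()
        if columns is None or len(parts) < len(columns):
            continue
        row = dict(zip(columns, parts))
        if row["sm"] == "-":
            continue  # process reported no SM activity in this sample
        usage[row["command"]] = usage.get(row["command"], 0.0) + float(row["sm"])
    return usage


def split_instance_cost(hourly_cost: float, usage: dict) -> dict:
    """Divide the instance's hourly cost in proportion to SM utilization."""
    total = sum(usage.values())
    if total == 0:
        return {}
    return {name: hourly_cost * sm / total for name, sm in usage.items()}
```

Note the design choice: idle GPU time is spread across features in the same ratio as their active use, so the two shares always add up to the full instance cost. A production sidecar would average many samples before publishing the result as a CloudWatch metric.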
The practical recommendation: if two AI features share a GPU instance, separate them onto distinct instances before the attribution complexity compounds. A second g4dn.xlarge adds roughly $384/month at the on-demand rate ($0.526/hour × 730 hours), which is less than the engineering cost of building accurate shared-GPU attribution, and it eliminates the noisy-neighbor problem where one feature's traffic spike affects the other's latency. We covered the full self-hosted GPU economics in the local AI hosting cost post; the attribution layer is a downstream concern only after the hosting decision is made.
Four AI features and one unreadable AWS bill? Book 30 minutes with Dhwani — we audit your current tagging state, identify which resources need to be split or retagged, and design the Cost Explorer + Grafana dashboard for your specific feature set. Written brief inside a week.
Frequently Asked Questions
Does AWS Cost Explorer show Bedrock costs by model or by feature?
By default, Cost Explorer shows Bedrock costs aggregated at the service level — one line for Amazon Bedrock across all models and all features. To break it down by feature, activate custom cost allocation tags in the Billing console and ensure every resource making Bedrock API calls carries the correct feature tag. Bedrock does not support per-request tagging at the API call level. The tag lives on the Lambda function or ECS task making the call — so the Lambda compute cost and (by proxy) the Bedrock token cost flow to the feature tag when each feature has its own dedicated calling resource. D2C teams with shared Lambda functions calling Bedrock for multiple features have a resource split to complete before attribution is accurate.
How accurate is tag-based attribution compared to DCGM GPU metrics?
DCGM-based GPU attribution is granular to the metric collection interval and the GPU slice: actual hardware utilization per pod in near real time. Tag-based Cost Explorer attribution is accurate to the billing day and allocates by resource hours, not by utilization within a shared resource. For D2C AI workloads on managed services, the distinction rarely matters: a SageMaker endpoint serving one feature is not shared at the GPU slice level with another feature, so tag-based billing attribution matches actual cost. The DCGM approach becomes necessary only when multiple AI features share a single GPU instance. In that case, resource-level tags attribute the entire instance cost to one feature, and per-process GPU metrics are needed to split utilization accurately between features.
What does Managed Grafana cost for a D2C AI monitoring dashboard?
Amazon Managed Grafana charges per active user per month: approximately $9 for editors and $5 for viewers as of 2026 pricing. For a D2C brand with two or three people who need access to the AI cost dashboard, total Managed Grafana cost runs $15–30 per month. The CloudWatch data source is included; no separate Managed Prometheus is needed for a CloudWatch-native D2C stack. For a D2C brand with 10–20 custom AI usage metrics in CloudWatch, the metric storage cost is a few dollars per month. The total dashboard infrastructure cost, roughly $20–35/month, is about 1.5% or less of a $2,400 AI bill: a small recurring charge for ongoing per-feature cost visibility rather than a monthly archaeology exercise in the Billing console.
Founder and CEO of Braincuber. Has scoped and shipped 500+ Odoo, AI, and cloud projects for US mid-market and global brands. Takes every founder call personally — no SDR layer between buyers and the people building the system.
