Quick answer
There are three honest AI build timelines: 2-week focused pilots, 8-week production MVPs, and 12-18 month enterprise rollouts. The right tier depends on three things — whether the use case is generic enough for an off-shelf model, how many integrations are involved, and whether you need SOC 2 / HIPAA scope. From 50+ shipped US projects, the median commercial AI agent ships in 8-12 weeks; enterprise AI under audit scope runs 12-18 months end-to-end. Cutting that by skipping shadow mode is how 42% of US AI initiatives died in 2025.
The three timelines, with what fits each
Most agencies quote a single number and call it "the AI timeline." That is wrong. There are three honest timelines depending on what you are actually building. Picking the wrong tier is how projects get killed at year-2 audit.
Tier 1: The 2-week focused pilot
Generic use case, off-shelf model (GPT-4o or Claude direct API), 1 integration max, internal-only deployment, no audit scope. Examples we have shipped: an internal knowledge-base Q&A bot, a marketing copy generator, a single-form classifier.
- Cost: $4,500 - $15,000.
- Best for: Validating that AI even helps with a specific task. "Is this worth a real engagement?"
- Skip if: External user traffic, regulatory scope, or more than one integration.
Tier 2: The 8-week production MVP
Specific workflow, custom prompts + tool calls, 2-4 integrations, eval set required, shadow-mode tested, real user traffic. Examples: a Shopify support agent, an invoice-OCR + GL-coding agent, a sales-call-summary tool for a B2B SaaS.
- Cost: $30,000 - $80,000.
- Best for: First production AI agent. Median payback 8-12 weeks for ROI-positive use cases.
- Phases: Discovery (2 wk) → eval set (1 wk) → build (4 wk) → shadow mode (1 wk) → gated rollout. 8 weeks total.
Tier 3: The 12-18 month enterprise rollout
Multi-agent system, 5-10+ integrations, SOC 2 / HIPAA / SOX scope, audit-evidence pipeline, on-call rotation. Examples: a payer prior-auth + denials + claims AI suite, a SOX-scoped finance AI for AP + reporting, a hospital clinical-notes + discharge-summary stack.
- Cost: $180,000 - $600,000 over the engagement.
- Phases: Same 6 phases as Tier 2 but each runs longer. Scale + audit-prep phase alone is 6-12 months.
- Skip if: Use case is generic enough that an off-shelf product (e.g. Glean, Copilot) covers it. Custom only when the gap is real.
Not sure which tier fits? 30-minute call with a practice lead. Bring your use case + your audit constraints; we tell you the tier honestly, including whether off-shelf is the right call.
Book a free 30-min scoping call →What kills timelines (real data from 23 post-mortems)
- Data not in a usable API. 9 of 23 post-mortems. Order data in Shopify but returns in Notion, salaries in a CSV emailed monthly. Fix the data layer first or accept a 40% timeline penalty.
- No eval set owner. 6 of 23. The internal lead who has to label 200 test cases got pulled to another project. Build slows or quality drops.
- "Perfect day one" expectation. 4 of 23. Leadership refuses to ship until eval-set accuracy hits 95%. Shadow mode never starts. Project quietly dies.
- Skipped shadow mode. 3 of 23. Went straight from build to 100% production. First hallucination hit a real customer; incident response consumed 6 weeks.
- Vendor switched mid-engagement. 1 of 23. Procurement re-negotiated with a different vendor at week 6. Everything reset.
How to compress 40% without losing audit-clean output
The fastest 18-month enterprise build we shipped finished in 11 months. The pattern that worked: (1) the buyer pre-cleaned their data layer before discovery, (2) the eval-set owner was full-time on the project for the first 6 weeks not a side gig, (3) executive sponsor said "ship at 90% of target accuracy, iterate to 95% in production." We did not skip phases. We compressed every phase by removing internal coordination friction. The buyer organization mattered more than our team.
FAQ
Can we ship a real production AI agent in 4 weeks?
A focused pilot, yes. A production agent serving external users, almost never. The honest minimum for a production-grade build is 8 weeks because shadow mode + gated rollout are non-negotiable. If a vendor quotes 4 weeks production they are skipping shadow mode.
What is the cheapest tier that can actually book a meeting / drive a conversion?
The 8-week Tier 2 MVP. Pilot-tier work is internal-only and rarely tied to revenue. The MVP is where AI starts producing measurable business outcomes — support deflection, sales-call summary throughput, document processing time. Median payback 8-12 weeks.
What about regulated industries (healthcare, finance, defense)?
Default to Tier 3 timelines. Healthcare HIPAA adds 4-6 weeks for BAA + boundary scrubbing. Finance SOX adds 8-12 weeks for audit-log pipeline. Defense ITAR adds 12+ weeks for GovCloud architecture. The compliance work is real and visible in the timeline.
Should we hire in-house or use an agency?
For Tier 1 + Tier 2 (under $80K total), an agency is almost always faster. For Tier 3 (enterprise, multi-year), hire in-house but use an agency for the first 6-12 months to bootstrap the pattern. Mixing both is the most common pattern we see in 2026.
Get your tier in writing
Pick the right timeline before you sign anything
30-minute call. You bring the use case + audit constraints. We come back with the tier recommendation, the phased timeline, and a written assessment if off-shelf is the better answer. No sales sequence, no recurring nudges.
Methodology
Tier ranges, median timelines, and failure-mode tallies pulled from 50+ Braincuber US AI engagements shipped 2023-2026 plus 23 documented project post-mortems. The 42% scrap-rate figure references Deloitte's State of AI 2026 report (n=1,800 US enterprises). Median MVP payback (8-12 weeks) cross-validated against the OneReach.ai 2026 benchmark of 5.1-month overall median across all AI agent categories.
Founder and CEO of Braincuber. Has scoped and shipped 500+ Odoo, AI, and cloud projects for US mid-market and global brands. Takes every founder call personally — no SDR layer between buyers and the people building the system.
