What Is Amazon Comprehend? NLP on AWS Explained

Your engineering team is still writing custom regex patterns to extract entities from customer emails.

That is $14,200+ a year in developer hours on a problem AWS solved in 2017. Amazon Comprehend is a fully managed NLP service that uses machine learning to pull structured insights — sentiment, entities, key phrases, language, and more — straight out of raw, unstructured text. No model training. No infrastructure management. No PhD required.

Impact: $14,200+ burned annually on regex that a $0.0001/unit API replaces.

We deploy Comprehend as part of production AI pipelines at Braincuber for clients across the US, UAE, and Singapore — and the gap between what founders think NLP costs to build versus what AWS actually charges is eye-opening. The honest breakdown is below.

What Comprehend Actually Does

Stop picturing a chatbot. Amazon Comprehend is an API-first intelligence layer that sits inside your data pipeline and answers one question: “What does this text actually mean?”

You pass it text. It returns JSON. That JSON contains entities, sentiment scores, key phrases, language codes, and topic classifications — ready to plug into your downstream systems, whether that is a Lambda function, a Redshift warehouse, or an Odoo ERP workflow.
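To make the "text in, JSON out" flow concrete, here is a minimal sketch in Python using boto3. The helper names `make_client` and `analyze`, the default region, and the flattened return shape are assumptions for illustration; the four `detect_*` calls are standard Comprehend operations.

```python
from typing import Any


def make_client(region: str = "us-east-1") -> Any:
    """Build a real Comprehend client (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so analyze() can be exercised offline

    return boto3.client("comprehend", region_name=region)


def analyze(text: str, client: Any, language: str = "en") -> dict:
    """Call four standard Comprehend APIs and flatten the JSON responses."""
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language)
    entities = client.detect_entities(Text=text, LanguageCode=language)
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language)
    languages = client.detect_dominant_language(Text=text)
    return {
        "sentiment": sentiment["Sentiment"],
        "sentiment_scores": sentiment["SentimentScore"],
        "entities": [(e["Text"], e["Type"], e["Score"]) for e in entities["Entities"]],
        "key_phrases": [p["Text"] for p in phrases["KeyPhrases"]],
        "languages": [(lang["LanguageCode"], lang["Score"]) for lang in languages["Languages"]],
    }
```

With credentials configured, `analyze("The delivery was late again.", make_client())` would return a plain dictionary you can hand to a Lambda function or load into a warehouse table.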

The Full Capability Stack (2026)

▸ Entity Recognition: Identifies people, places, organizations, dates, and quantities with confidence scores above 0.91 on standard datasets

▸ Sentiment Analysis: Classifies text as Positive, Negative, Neutral, or Mixed; Schuh uses this to pre-sort 41% of customer emails before the support team even logs in

▸ Targeted Sentiment: Drills down to entity-level sentiment (e.g., “burger → Positive, service → Negative” from a single review)

▸ Custom Entity Recognition: AutoML trains a private model on your domain-specific terms (policy numbers, claim IDs, SKUs) from a small set of labeled examples, no ML experience needed

▸ Custom Classification: Build a text classifier for your own labels (e.g., “Account Question,” “Ticket Refund,” “Flight Complaint”) without writing a single line of model code

▸ PII Identification & Redaction: Detects and redacts names, credit card numbers, bank routing numbers, and dates automatically; critical for GDPR and HIPAA compliance

▸ Keyphrase Extraction: Pulls the actual talking points from unstructured text with confidence scores

▸ Topic Modeling: Groups thousands of S3 documents into topic clusters without manual labeling

▸ Events Detection: Extracts who-what-when-where from large document sets at scale

▸ Language Detection: Identifies the dominant language from 100+ supported languages, with confidence scores

▸ Toxicity Detection & Prompt Safety: Flags harmful content in peer-to-peer platforms and LLM inputs/outputs

▸ Syntax Analysis: Tokenizes text and tags parts of speech (nouns, verbs, adjectives) for downstream NLP pipelines
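As one illustration of the list above, the PII capability returns character offsets rather than redacted text when you call it in real time, so the redaction step is yours to write. The sketch below assumes a boto3 Comprehend client; the helper names `redact` and `detect_and_redact` and the 0.5 score cutoff are hypothetical choices, not AWS defaults.

```python
from typing import Any


def redact(text: str, pii_entities: list[dict], min_score: float = 0.5) -> str:
    """Replace each detected PII span with its type label, e.g. [NAME]."""
    # Work from the end of the string so earlier offsets stay valid.
    for ent in sorted(pii_entities, key=lambda e: e["BeginOffset"], reverse=True):
        if ent["Score"] >= min_score:  # min_score is an assumed cutoff
            label = f"[{ent['Type']}]"
            text = text[: ent["BeginOffset"]] + label + text[ent["EndOffset"]:]
    return text


def detect_and_redact(text: str, client: Any, language: str = "en") -> str:
    """Call DetectPiiEntities on a boto3 Comprehend client, then redact."""
    response = client.detect_pii_entities(Text=text, LanguageCode=language)
    return redact(text, response["Entities"])
```

Replacing spans from right to left is the design choice that matters here: substituting a label changes the string length, which would shift every offset after it.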

The “Just Build It Yourself” Trap

Here is the controversial opinion most AWS consultants will not tell you: most teams that choose to build custom NLP models in-house spend 3–6 months and $80,000–$120,000 before they have anything production-ready. Then they spend another $4,000/month on a data scientist to maintain it.

Comprehend Custom Classification: The Cost Reality

Training cost: $3 per hour of training time, billed by the second. Comprehend’s Custom Classification API trains a private model on your labeled CSV data and, on well-structured data, returns confidence scores above 0.90.

That is not a rounding error. That is a category-level cost advantage over in-house NLP.

LexisNexis: 200 Million Documents, One API

LexisNexis ran its custom entity recognition model across 200 million documents and hit above 92% accuracy on extracting legal-specific entities like judges and attorneys. Their alternative was a multi-year ML project with a team of data scientists. Instead, they called an API.

If you are processing fewer than 500,000 documents per month, the economics of building your own NLP stack are almost impossible to justify.

What It Costs (The Actual Numbers)

Comprehend uses a pay-as-you-go model. Text is measured in units of 100 characters, with a 3-unit (300-character) minimum per request.

Feature	Pricing
Standard NLP APIs (Sentiment, Entities, Keyphrases)	$0.0001–$0.001 per unit
Custom Model Training	$3/hour (billed per second)
Custom Model Management	$0.50/month per model
Free Tier (first 12 months)	50,000 units/month
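The unit arithmetic is easy to get wrong, so here is a small estimator built only from the figures in the table above. It is a rough sketch: it assumes partial units round up, takes the low end of the price range as the default rate, and ignores the free tier and volume tiers. The function names are made up for illustration.

```python
import math

UNIT_CHARS = 100             # one billing unit = 100 characters
MIN_UNITS = 3                # each request is billed for at least 3 units
TRAINING_PER_HOUR = 3.00     # custom model training, USD per hour
MODEL_MGMT_PER_MONTH = 0.50  # custom model management, USD per model per month


def billable_units(char_count: int) -> int:
    """Units billed for one request (assumes partial units round up)."""
    return max(MIN_UNITS, math.ceil(char_count / UNIT_CHARS))


def monthly_api_cost(char_counts, price_per_unit: float = 0.0001) -> float:
    """Cost of one standard API across a month of requests, free tier ignored."""
    return sum(billable_units(n) for n in char_counts) * price_per_unit


def custom_model_cost(training_seconds: int, months: int = 1) -> float:
    """Training billed per second at the hourly rate, plus monthly management."""
    return TRAINING_PER_HOUR * training_seconds / 3600 + MODEL_MGMT_PER_MONTH * months
```

For example, 10,000 support emails of 550 characters each come to 6 units apiece, or 60,000 units, which is about $6 for one API at the $0.0001 rate. Note that a 40-character tweet still bills as 3 units because of the minimum.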

NLP Cost Comparison: AWS vs. Google vs. Azure

Amazon Comprehend

$0.0001–$0.001/unit. Lowest entry cost. Native S3, Lambda, SageMaker, Redshift integration.

Google Cloud NLP

$0.0002/unit. Competitive pricing, but cross-cloud overhead if you run AWS infrastructure.

Azure Cognitive Services

$0.00015/unit. Tightly coupled to Microsoft ecosystem. Best for Teams/365 shops.

Siemens: The ROI Case, Not a Slide Deck

Siemens switched from human analysts to Comprehend for employee survey translation, analysis, and categorization. Their cost per interview dropped from “multiple euros” to less than one euro — and results came back 75% faster.

Real AWS Stacks Where Comprehend Earns Its Keep

We constantly see clients treating Comprehend as a standalone toy rather than a pipeline component. That is the wrong mental model. The service is designed to sit inside a larger AWS architecture:

Customer Support Automation

Pattern: Route incoming tickets by detected sentiment and classified issue type. Schuh deployed this exact pattern — support tickets are now colour-coded and matched to the right agent before the agent opens their queue.

Result: measurably better customer retention on first contact.

41% of emails pre-sorted before a human touches them.
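A routing rule of this kind can be sketched in a few lines once sentiment and issue type are available. Everything below is hypothetical: the queue names, the priority levels, and the 0.8 cutoff are placeholders, and this is not a description of how Schuh implemented it. The sentiment labels match the uppercase values Comprehend returns, and the issue types reuse the example labels from the capability list.

```python
from dataclasses import dataclass

# Hypothetical mapping from classifier label to support queue.
QUEUES = {
    "Ticket Refund": "billing",
    "Account Question": "accounts",
    "Flight Complaint": "escalations",
}


@dataclass
class Route:
    queue: str
    priority: str


def route_ticket(sentiment: str, issue_type: str, negative_score: float = 0.0) -> Route:
    """Pick a queue from the issue type and a priority from the sentiment."""
    queue = QUEUES.get(issue_type, "general")
    if sentiment == "NEGATIVE" and negative_score >= 0.8:  # assumed cutoff
        priority = "high"
    elif sentiment in ("NEGATIVE", "MIXED"):
        priority = "medium"
    else:
        priority = "normal"
    return Route(queue, priority)
```

Keeping the rule as a pure function means the thresholds can be tuned and unit-tested without calling AWS at all.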

Compliance & Legal Document Review

Case: HMLR (HM Land Registry) used Comprehend to compare thousands of legal documents per week — cutting document review time by 50% and doubling review throughput.

The system flags discrepancies early, before they turn into indemnity claims worth £50,000+.

Financial & Regulatory Intelligence

Scale: FINRA processes millions of unstructured compliance documents. Their investigators previously had to read page by page.

With Comprehend, they extract named entities, match them against FINRA records, and flag individuals of interest at scale, in seconds per document.
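The extract-then-match step can be illustrated with a simple lookup. This is a generic sketch, not FINRA's actual method: the watchlist, the 0.9 score floor, and the function names are assumptions, and real record matching would need fuzzier name resolution than exact comparison after normalization.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so 'Jane  Doe' matches 'jane doe'."""
    return " ".join(name.lower().split())


def flag_persons(entities: list[dict], watchlist: set[str], min_score: float = 0.9) -> list[str]:
    """Return PERSON entities (as returned by DetectEntities) found on the watchlist."""
    known = {normalize(n) for n in watchlist}
    flagged: list[str] = []
    for ent in entities:
        if ent["Type"] != "PERSON" or ent["Score"] < min_score:
            continue  # skip other entity types and low-confidence hits
        if normalize(ent["Text"]) in known and ent["Text"] not in flagged:
            flagged.append(ent["Text"])
    return flagged
```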

E-Commerce & Retail

Real deployment: ExxonMobil’s procurement team uses a custom Comprehend classification model to map free-text eProcurement entries to supplier contract agreements — plugged directly into SageMaker.

For D2C brands, the same pattern applies: mapping product reviews to SKU-level insights, automatically.
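One way to picture SKU-level insight: once each review has a sentiment label, rolling the labels up per SKU is plain aggregation. The sketch below assumes you already have (SKU, sentiment label) pairs, for instance from a sentiment call per review; the function names are illustrative.

```python
from collections import Counter, defaultdict


def sentiment_by_sku(scored_reviews) -> dict:
    """Count sentiment labels per SKU from (sku, sentiment_label) pairs."""
    totals = defaultdict(Counter)
    for sku, label in scored_reviews:
        totals[sku][label] += 1
    return {sku: dict(counts) for sku, counts in totals.items()}


def negative_share(counts: dict) -> float:
    """Fraction of a SKU's reviews labeled NEGATIVE (0.0 if there are none)."""
    total = sum(counts.values())
    return counts.get("NEGATIVE", 0) / total if total else 0.0
```

Sorting SKUs by `negative_share` gives a quick shortlist of products whose reviews deserve a closer read.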

Healthcare & Life Sciences

Clinical notes carry billions of dollars of billing and compliance risk buried in free-form text. Comprehend Medical (the HIPAA-eligible variant) extracts medical entities while preserving patient privacy — purpose-built for this workflow.

If you are touching PHI and not using Comprehend Medical, your compliance team should be nervous.

Where Braincuber Plugs Comprehend In

We are not reselling AWS licenses. We build the architecture around Comprehend that makes it actually do something for your business.

Here is the pattern we deploy most often for D2C brands and enterprise clients:

The Braincuber NLP Pipeline

1. Ingest

Raw text (emails, tickets, reviews, contracts) lands in S3 via Kinesis or Lambda triggers.

2. Process

Comprehend APIs run entity extraction, sentiment scoring, and custom classification in real-time or batch.

3. Store

Structured JSON output flows into Redshift or a custom data lake.

4. Act

Odoo ERP or downstream CRM gets enriched records with sentiment tags, entity labels, and risk flags.

5. Monitor

SageMaker tracks model drift on custom classifiers; alerts trigger retraining if accuracy drops.

That pipeline handles 37+ hours of manual review work per week for a mid-size client. One engineer manages the whole thing. (Yes, that is the actual number from a current deployment.)

The brands running this architecture stop asking “what are customers saying?” and start acting on the answer within 24 hours of data ingestion.
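Steps 1 to 3 of the pipeline above could be sketched as a Lambda function triggered by an S3 upload. This is a simplified illustration rather than our production code: the output bucket name, the `enriched/` key prefix, the hard-coded English language code, and the 5,000-byte truncation (a conservative reading of the roughly 5 KB per-document limit on sentiment calls) are all assumptions.

```python
import json
from typing import Any
from urllib.parse import unquote_plus

MAX_BYTES = 5000  # assumed safe size for a single DetectSentiment call


def process_record(record: dict, s3: Any, comprehend: Any, out_bucket: str) -> dict:
    """Read one uploaded object, enrich it with Comprehend, store the JSON."""
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])  # S3 event keys are URL-encoded
    raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    text = raw[:MAX_BYTES].decode("utf-8", errors="ignore")

    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    entities = comprehend.detect_entities(Text=text, LanguageCode="en")
    enriched = {
        "source": f"s3://{bucket}/{key}",
        "sentiment": sentiment["Sentiment"],
        "entities": [
            {"text": e["Text"], "type": e["Type"], "score": e["Score"]}
            for e in entities["Entities"]
        ],
    }
    s3.put_object(
        Bucket=out_bucket,
        Key=f"enriched/{key}.json",
        Body=json.dumps(enriched).encode("utf-8"),
        ContentType="application/json",
    )
    return enriched


def lambda_handler(event, context):
    import boto3  # lazy import keeps process_record testable offline

    s3, comprehend = boto3.client("s3"), boto3.client("comprehend")
    out_bucket = "example-enriched-bucket"  # hypothetical bucket name
    return [process_record(r, s3, comprehend, out_bucket) for r in event["Records"]]
```

From there, the JSON objects under `enriched/` can be loaded into Redshift or read by whatever job feeds the ERP or CRM.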

The Integration Reality

Comprehend connects natively to Amazon S3, Lambda, SageMaker, Redshift, Amazon Translate, and Amazon Augmented AI (A2I). That last one matters — A2I lets you loop humans back into the workflow when confidence scores drop below a defined threshold, so you are not blindly trusting ML on edge cases.
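The confidence-threshold handoff to A2I might look like the sketch below. The 0.85 threshold, the helper names, and the shape of the review payload are assumptions; `start_human_loop` is the A2I runtime call (boto3 client name `sagemaker-a2i-runtime`), and it expects a flow definition you have already created.

```python
import json
from typing import Any


def split_by_confidence(entities: list[dict], threshold: float = 0.85):
    """Separate entities the pipeline can trust from those needing review."""
    accepted = [e for e in entities if e["Score"] >= threshold]
    needs_review = [e for e in entities if e["Score"] < threshold]
    return accepted, needs_review


def send_for_review(needs_review: list[dict], a2i: Any,
                    flow_definition_arn: str, loop_name: str) -> bool:
    """Start an A2I human loop only when something needs review."""
    if not needs_review:
        return False
    a2i.start_human_loop(
        HumanLoopName=loop_name,
        FlowDefinitionArn=flow_definition_arn,
        # InputContent must be a JSON string; this payload shape is illustrative.
        HumanLoopInput={"InputContent": json.dumps({"entities": needs_review})},
    )
    return True
```

Where you set the threshold is a cost decision as much as a quality one: a higher value sends more items to human reviewers.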

Assent Compliance: The Full Stack Pattern

Architecture: Textract pulls text from PDFs, Comprehend extracts business-specific entities, and A2I routes low-confidence extractions to human reviewers.

Hundreds of hours saved per week in manual document review for supply chain compliance teams.

If you are running a Shopify store connected to Odoo and you are not feeding your product reviews and support tickets through a Comprehend pipeline, you are leaving actionable intelligence in an S3 bucket you never open. We build the AI pipelines that turn that dead data into decisions. And we integrate it directly into the cloud infrastructure you already run.

Stop Guessing What Your Customers Are Telling You

Book a free 15-Minute AI Architecture Audit — we will map exactly where NLP fits inside your existing AWS stack and show you what you can automate this quarter.

Frequently Asked Questions

Does Amazon Comprehend require machine learning expertise to set up?

No. The standard APIs (sentiment, entity recognition, keyphrase extraction) work out of the box — you call the API, pass text, and receive JSON. Custom models use AutoML, meaning you supply labeled training data in a CSV and AWS handles the rest. No model architecture decisions, no hyperparameter tuning, no ML background required.

How accurate is Amazon Comprehend’s sentiment analysis?

For standard English text, Comprehend’s sentiment analysis exceeds 90% accuracy on well-structured customer feedback. Vision Critical measured accuracy “at over 90%” on customer feedback classification in production. Accuracy drops on highly domain-specific language, which is exactly where custom models trained on your own data, such as Custom Classification, make the difference.

What is the difference between Amazon Comprehend and Amazon Comprehend Medical?

Standard Comprehend is a general-purpose NLP service for any text type. Comprehend Medical is HIPAA-eligible and trained specifically on clinical notes, lab results, and medical records — it extracts medical entities like medications, dosages, and diagnoses while meeting healthcare privacy requirements. Use Medical when you are processing PHI; use standard Comprehend for everything else.

Can Amazon Comprehend process documents in languages other than English?

Yes. Comprehend supports text analysis in German, English, Spanish, Italian, Portuguese, French, Japanese, Korean, Hindi, Arabic, and Simplified and Traditional Chinese. Its language detection API identifies the dominant language across 100+ languages. For languages outside the NLP-analysis list, pair Comprehend with Amazon Translate first — it handles translation, then Comprehend handles analysis.
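The translate-then-analyze pattern can be sketched as follows. It assumes boto3 clients for Comprehend and Amazon Translate are passed in; the function name is hypothetical, and the `SUPPORTED` set is simply the language list above written as the codes Comprehend uses.

```python
from typing import Any

# Analysis languages named above, as Comprehend language codes.
SUPPORTED = {"de", "en", "es", "it", "pt", "fr", "ja", "ko", "hi", "ar", "zh", "zh-TW"}


def sentiment_any_language(text: str, comprehend: Any, translate: Any) -> dict:
    """Detect the language, translate to English if needed, then score sentiment."""
    langs = comprehend.detect_dominant_language(Text=text)["Languages"]
    # Assumes at least one language is returned; take the highest-scoring one.
    source = max(langs, key=lambda lang: lang["Score"])["LanguageCode"]
    analysis_lang = source
    if source not in SUPPORTED:
        text = translate.translate_text(
            Text=text, SourceLanguageCode=source, TargetLanguageCode="en"
        )["TranslatedText"]
        analysis_lang = "en"
    result = comprehend.detect_sentiment(Text=text, LanguageCode=analysis_lang)
    return {"detected": source, "analyzed_as": analysis_lang, "sentiment": result["Sentiment"]}
```

Keeping both the detected and the analyzed language in the output makes it easy to audit later which results passed through translation.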

How does Amazon Comprehend pricing scale for high-volume workloads?

Comprehend uses tiered pricing — the more text units you process per month, the lower the per-unit rate. Standard APIs run at $0.0001–$0.001 per 100-character unit. The 12-month free tier covers 50,000 units/month. For high-volume workloads, batch processing is more cost-efficient than real-time API calls — consolidating large document jobs into monthly bulk runs can cut per-unit costs by 30–40% compared to running fragmented daily jobs.
