Reliable LLM Structured Output: JSON Mode and Schemas

AI Summary - 20-sec read - Reviewed by experts

The moment an LLM feeds another system - a database, an API, your app - free text stops being clever and starts being a bug. You need output your code can parse the same way every time.
Prompting "return JSON" is not enough. Models still wrap it in prose, add trailing commas, or drop a field on the one request that matters, and your parser throws in production.
The reliable stack is three layers: JSON mode or a strict schema so the model can only emit valid JSON, a validator that rejects anything off-shape, and a bounded retry that re-asks with the error.
Function calling and constrained decoding turn "usually valid" into "structurally guaranteed" - the model fills a schema you define rather than freestyling a string.
Short on time? We will make your LLM output parse reliably, with schemas and validation your pipeline can trust. Book a free call.

Your prototype worked. The model returned a tidy JSON object, your code parsed it, everyone was happy. Then you shipped, traffic arrived, and one request in fifty came back with a chatty preamble, a trailing comma, or a missing field - and the parser that never failed in the demo threw an exception in front of a customer. Free-text output is fine when a human reads it. The instant an LLM feeds another system, unstructured text is a liability, and "just prompt it to return JSON" is not the fix teams think it is.

Why "return JSON" is not enough

A language model predicts the next likely token, not a valid data structure. Ask it for JSON and it will usually oblige, because JSON is common in its training data - but "usually" is exactly the problem. The failures are predictable and they cluster at the worst time: under load, on unusual inputs, on the long-tail request you did not test. You get markdown fences around the object, an explanatory sentence before it, a trailing comma that looks harmless to a human and is fatal to a strict parser, or a field that silently vanishes when the model decides it was not relevant. Each one is rare in isolation and inevitable at volume. If your code assumes clean JSON, every one of these is an unhandled exception.
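To make those failure modes concrete, here is a minimal sketch that runs a naive parser over one clean response and four broken ones. The response strings and the field names (status, total) are invented for illustration; only Python's standard json module is used.

```python
import json

FENCE = "`" * 3  # a markdown code fence, built here to keep the example readable

# One clean response and four hypothetical failures of the kind described above.
responses = {
    "clean": '{"status": "paid", "total": 42.5}',
    "fenced": FENCE + 'json\n{"status": "paid", "total": 42.5}\n' + FENCE,
    "preamble": 'Sure! Here is the JSON:\n{"status": "paid", "total": 42.5}',
    "trailing_comma": '{"status": "paid", "total": 42.5,}',
    "missing_field": '{"status": "paid"}',
}


def try_parse(text):
    """Naive consumer: parse the text, then read a field it assumes is present."""
    try:
        obj = json.loads(text)
        return True, obj["total"]
    except json.JSONDecodeError as exc:
        return False, f"JSONDecodeError: {exc.msg}"
    except KeyError as exc:
        return False, f"KeyError: {exc}"


results = {name: try_parse(text) for name, text in responses.items()}
for name, (ok, detail) in results.items():
    print(f"{name:15} ok={ok} {detail}")
```

Only the clean response survives. The first three failures are syntax errors the parser rejects outright; the last one parses fine and only fails when the code reaches for the missing field, which is the harder kind to spot.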

Layer one: make invalid output impossible to emit

The strongest fix is to stop relying on the prompt and use the structured-output features the model providers now ship. There are two levers, and you should reach for them before you write a single clever instruction.

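JSON mode. Most major APIs offer a mode that constrains generation so the response is syntactically valid JSON - no prose, no fences, no trailing commas. This alone removes the entire class of "it wrapped the object in text" failures. The one caveat is truncation: if generation stops at a token limit, the object can be cut off part-way, so leave headroom in the output limit and still parse defensively.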
Schema-constrained output and function calling. Go further and hand the model a JSON Schema (or a tool/function signature). Now it does not write a string that looks like your object - it fills the fields you defined, with the types you declared. This is the difference between hoping for the right shape and constraining the model so the wrong shape cannot be produced. Define required fields, enums for anything categorical, and types for every value.
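As a rough illustration of "required fields, enums for anything categorical, and types for every value", here is what such a schema might look like for a hypothetical order-status task. The field names and allowed values are assumptions made up for this sketch, and how the schema is attached to a request differs by provider, so that part is deliberately left out.

```python
import json

# Hypothetical JSON Schema for an order-status extraction task.
ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        # Enums pin categorical values to a closed set.
        "status": {"type": "string", "enum": ["pending", "paid", "refunded"]},
        "total": {"type": "number", "minimum": 0},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
    },
    # Every declared field is required, and nothing extra is allowed.
    "required": ["order_id", "status", "total", "currency"],
    "additionalProperties": False,
}

# The schema is plain data, so it serialises to the JSON a provider would receive.
print(json.dumps(ORDER_SCHEMA, indent=2))
```

Providers differ in which JSON Schema keywords their strict modes accept, so a range constraint such as minimum may be ignored or rejected by some of them. Treat the schema as the structural contract and keep value checks in your own validator as well.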

Is your LLM output breaking downstream systems?

We will review where your model feeds code, add JSON mode and a schema, and put validation in front of your parser so a bad response never reaches production. No pitch, reply in 2 hrs, no card needed, NDA on request.

Layer two: validate before you trust

Even with JSON mode on, treat model output the way you treat any external input: never trust it, always validate it. JSON mode guarantees the response parses; it does not guarantee the values make sense. A well-formed object can still carry a price of -1, a status your app has never heard of, or a date in the wrong century. Define the schema once - the fields, their types, the allowed values, what is required - and run every response through a validator before your business logic touches it. When something is off-shape, you want a controlled rejection you can log and retry, not a crash three functions deep where the bad value finally causes damage. This is the same discipline that separates a demo from a system, and it is core to how we build every production AI agent.
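A validator in this sense can be very small. The sketch below uses only the standard library and checks a hypothetical object with a total and a status; the field names, the allowed statuses, and the ValidationError class are all assumptions for illustration. In a real project a schema or validation library would likely replace the hand-written checks.

```python
import json

ALLOWED_STATUSES = {"pending", "paid", "refunded"}  # hypothetical allowed values


class ValidationError(ValueError):
    """Raised when a response parses but does not match the expected shape."""


def validate_order(raw: str) -> dict:
    """Parse a model response and reject anything off-shape before it is used."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(obj, dict):
        raise ValidationError("top-level value must be an object")

    errors = []
    total = obj.get("total")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        errors.append("field total must be a number")
    elif total < 0:
        errors.append("field total must be >= 0")

    status = obj.get("status")
    if not isinstance(status, str) or status not in ALLOWED_STATUSES:
        errors.append(f"field status must be one of {sorted(ALLOWED_STATUSES)}")

    if errors:
        # One controlled rejection that lists every problem found.
        raise ValidationError("; ".join(errors))
    return obj


print(validate_order('{"status": "paid", "total": 42.5}'))
try:
    validate_order('{"status": "shipped", "total": -1}')
except ValidationError as exc:
    print("rejected:", exc)
```

The second call is well-formed JSON and would pass JSON mode, yet it is rejected for both the negative total and the unknown status. Collecting all errors into one message also gives a retry something specific to quote back to the model.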

Layer three: retry with the error, then fall back

When validation fails, do not give up and do not loop forever. Send the model back its own broken output plus the specific validation error - "field total must be a number, you returned a string" - and ask it to fix that one thing. Models are good at correcting a concrete mistake they can see. Bound the retries to two or three attempts so a persistently confused model cannot run up your bill or hang the request. If it still fails after the last retry, fall back cleanly: return a safe default, route to a human, or surface an honest error - never push an unvalidated object downstream because you ran out of patience. Instrument every retry and every fallback; a rising retry rate is an early signal that a prompt, a model version, or an input distribution has drifted, and it is exactly the kind of runtime signal we cover in AI agent observability.
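The retry-then-fall-back loop described above can be sketched as follows. The call_model argument stands in for whatever client function sends a prompt and returns the response text; it is a placeholder, not a real API, and the check function, the wording of the repair prompt, and the make_stub helper are likewise assumptions for illustration.

```python
import json
from typing import Callable, Optional


def check(raw: str) -> Optional[str]:
    """Return a specific error message, or None if the response is acceptable."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"response is not valid JSON: {exc.msg}"
    if not isinstance(obj, dict):
        return "top-level value must be an object"
    total = obj.get("total")
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        return f"field total must be a number, you returned {type(total).__name__}"
    return None


def get_structured(call_model: Callable[[str], str], prompt: str,
                   max_retries: int = 2, fallback=None):
    """Call the model, re-asking with the validation error, at most max_retries times."""
    attempt_errors = []  # one entry per attempt, useful for logging retry rates
    message = prompt
    for _ in range(1 + max_retries):
        raw = call_model(message)
        error = check(raw)
        attempt_errors.append(error)
        if error is None:
            return json.loads(raw), attempt_errors
        # Feed back the broken output plus the concrete error.
        message = (f"{prompt}\n\nYour previous response was:\n{raw}\n\n"
                   f"It failed validation: {error}. Return corrected JSON only.")
    # Out of attempts: return the safe default, never the unvalidated object.
    return fallback, attempt_errors


def make_stub(replies):
    """Hypothetical stand-in for a model: returns canned replies in order."""
    it = iter(replies)
    return lambda message: next(it)


result, errors = get_structured(make_stub(['{"total": "12"}', '{"total": 12}']),
                                "Extract the total as JSON.")
print(result, errors)
```

Here the stub fails once with a string where a number was expected and succeeds on the retry. The returned list of per-attempt errors is the hook for instrumentation: counting its non-empty entries over time gives the retry rate.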

Unparseable output is a production outage waiting to happen.

We will design the schema, wire JSON mode and validation, and add bounded retries so your LLM output is something your pipeline can actually depend on. Reply in 2 hrs, NDA on request.

Takeaways

The moment an LLM feeds another system, free text is a liability - you need output your code can parse identically every time.
Prompting "return JSON" is not enough; failures cluster under load and on long-tail inputs you did not test.
Use JSON mode and a schema (or function calling) so the model fills a defined shape instead of freestyling a string.
Validate every response before your logic touches it - a well-formed object can still carry nonsense values.
On failure, retry with the specific error, bound the attempts, then fall back cleanly - never ship an unvalidated object downstream.

Put it together

The order of work is what makes this durable. Turn on JSON mode and define a schema so most output is valid by construction. Put a validator in front of your parser so nothing off-shape gets through. Add a bounded retry that feeds the model its own error, and a clean fallback for the rare case that still fails. Prompt engineering still helps at the margins - a clear instruction and a worked example reduce the retry rate - and if you are new to that lever, start with our prompt engineering guide. But the reliability comes from the structure around the model, not the wording inside it. That structure is the heart of the AI systems we build for teams putting models into real pipelines.

Frequently asked questions

Is JSON mode the same as function calling?

Not quite. JSON mode guarantees the response is syntactically valid JSON but says nothing about which fields appear. Function calling (or schema-constrained output) goes further: you declare the exact fields and types, and the model fills them. Use function calling when you know the shape you need - which, for anything feeding code, is almost always. Use plain JSON mode only when the shape is genuinely open-ended.

Do I still need validation if I use a schema?

Yes. Schema-constrained generation controls the structure, not the meaning. The model can return a perfectly typed object whose values are wrong - an impossible date, a negative quantity, a status your system does not recognise. Validation catches the semantic errors that structure alone cannot. Treat model output as untrusted external input, always.

How many retries should I allow?

Two or three. A single retry with the validation error fixes most transient failures, because the model can usually correct a mistake it can see. Beyond three attempts you are burning latency and cost on a request that is unlikely to recover, so fall back to a default or a human instead. Always cap it - an unbounded retry loop is how a bad prompt turns into a runaway bill.

Does this add much latency?

Validation itself is negligible - it is a local check that typically takes well under a millisecond. The real cost is the occasional retry, which adds a full extra model call for the small fraction of responses that fail the first time. In practice a well-designed schema keeps the retry rate low enough that average latency barely moves, though the few requests that do retry take roughly twice as long, so watch tail latency as well as the mean. That is a small price for output your system can actually trust.
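A quick back-of-the-envelope calculation shows why the average barely moves. The numbers below (an 800 ms model call, a 2% first-pass failure rate, failures independent between attempts) are assumptions chosen for illustration, not measurements.

```python
def expected_calls(p_fail: float, max_retries: int) -> float:
    """Average model calls per request when each attempt fails with probability p_fail."""
    # Attempt k+1 only happens if the first k attempts all failed: probability p_fail**k.
    return sum(p_fail ** k for k in range(max_retries + 1))


base_latency_ms = 800.0  # assumed latency of one model call
p_fail = 0.02            # assumed validation failure rate per attempt
calls = expected_calls(p_fail, max_retries=2)

print(f"average calls per request: {calls:.4f}")
print(f"average latency: {base_latency_ms * calls:.1f} ms")
```

Under these assumptions the average works out to about 1.02 calls per request, or roughly 16 ms on top of 800 ms. The same function shows how quickly that changes if the failure rate climbs, which is one more reason to track it.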

The short version: an LLM that returns broken JSON is not a model problem, it is a missing-structure problem. Constrain generation with JSON mode and a schema, validate every response as untrusted input, retry with the specific error, and fall back cleanly when all else fails. Do that and the model becomes a dependable part of your pipeline instead of the flaky link everyone is afraid to build on.
