Structured Output from LLMs: How to Get Guaranteed JSON Every Time

Structured output forces LLMs to return valid, schema-compliant JSON instead of free text. This guide covers constrained decoding, provider implementations (OpenAI, Anthropic, Google), Zod and Pydantic schemas, the format tax on reasoning, and practical patterns for coding agents.

April 5, 2026 · 2 min read

What Structured Output Is

Every production application that calls an LLM needs to do something with the response. Parse it into an object. Store it in a database. Pass it to another function. Route it to a downstream service. All of these require the response to have a predictable shape. Free text doesn't have a predictable shape.

Structured output is a contract: you provide a JSON schema, and the model returns a response that conforms to it. Not "usually conforms." Not "conforms if you prompt carefully." The response is mathematically guaranteed to be valid against your schema, because the token generation process itself is constrained to only produce valid output.

100% schema compliance with constrained decoding
0 retries needed for malformed JSON
~50µs CPU overhead per token (llguidance)

Before constrained decoding, developers used three approaches to get JSON from LLMs. Prompt engineering ("Return your answer as JSON with fields name, age, and email") worked 90-98% of the time, but a 2% failure rate at scale is a production incident. JSON mode, introduced by OpenAI in late 2023, guaranteed syntactically valid JSON but not schema compliance. The model might return {"full_name": "John"} when you expected {"name": "John"}. Structured output is the third generation: both syntactically valid and schema-compliant.

Approach | Guarantees | Failure mode
Prompt engineering | None. Best-effort. | Malformed JSON, missing fields, wrong types, markdown wrapping
JSON mode | Valid JSON syntax | Wrong field names, missing required properties, unexpected types
Structured output | Full schema compliance | None. Output matches schema by construction.

How Constrained Decoding Works

LLMs generate text one token at a time. At each step, the model produces a probability distribution over its entire vocabulary (typically 100,000+ tokens). Normally, the model samples from this distribution freely. Constrained decoding intervenes at this step: before sampling, it masks out every token that would make the output violate the target schema.

The process starts when you submit a JSON schema. The provider compiles that schema into a finite state machine (FSM) or context-free grammar (CFG) that represents all valid strings the schema accepts. At each generation step, the system checks which tokens are valid transitions from the current state. Invalid tokens get their probability set to zero. The model can only sample from tokens that keep the output on a valid path.

1. Schema compilation

Your JSON schema is compiled into a grammar or finite state machine. This happens once per unique schema and is cached. OpenAI reports slightly higher latency on the first request for a new schema, then cached performance afterward.

2. Token masking

At each generation step, the system computes which tokens are valid given the current generation state. If the grammar says the next valid tokens are digits (because we're inside an integer field), all non-digit tokens are masked to probability zero.

3. Constrained sampling

The model samples from the reduced token set. Because only valid tokens remain, the output is guaranteed to be schema-compliant. The model still controls the content (which string to put in a field, which number to assign), but the structure is locked.

4. Scaffolding bypass

Advanced implementations skip deterministic tokens entirely. If the grammar dictates the next characters must be a closing brace and comma, the system writes those directly without running the model. This reduces latency and token costs.

Performance impact

Constrained decoding does not slow generation in practice. The token masking computation is on the order of 50 microseconds per token for a 128K-token vocabulary (measured by llguidance). Scaffolding bypass actually speeds up generation by skipping deterministic tokens. SGLang reports an order-of-magnitude speedup for JSON generation compared to unconstrained generation, because structural tokens are written instantly rather than sampled.

What happens at each token position

// Schema: { type: "object", properties: { age: { type: "integer" } } }
// The model is generating: {"age": |
//                                    ^ cursor here

// Without constrained decoding:
// All ~100K tokens are valid. Model might produce:
//   "twenty-five"  (string, not integer)
//   25.5           (float, not integer)
//   null           (null, not integer)

// With constrained decoding:
// Only digit tokens (0-9) and closing tokens are valid.
// Model MUST produce a valid integer.
// Token mask: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] + structural tokens
// Result: 25 (guaranteed integer)
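The masking step above can be sketched in a few lines of TypeScript. This is a toy illustration, not a real decoder: the logit vector is a six-token miniature and `allowedTokens` stands in for the valid transitions a compiled grammar would produce.

```typescript
// Toy illustration of token masking. Real decoders operate on
// ~100K-entry logit vectors and derive the allowed set from a
// compiled grammar; both are hypothetical miniatures here.
type TokenId = number;

function maskLogits(
  logits: number[],
  allowedTokens: Set<TokenId>
): number[] {
  // Invalid tokens get probability zero, i.e. logit -Infinity.
  return logits.map((logit, tokenId) =>
    allowedTokens.has(tokenId) ? logit : -Infinity
  );
}

function greedySample(logits: number[]): TokenId {
  // Pick the highest-scoring token among those still allowed.
  let best = 0;
  for (let i = 1; i < logits.length; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}

// Vocabulary: tokens 0-4 are digits, token 5 is "null".
// The grammar says we are inside an integer field, so only
// digits are valid, even though "null" has the highest logit.
const logits = [0.1, 0.3, 2.1, 0.2, 0.5, 9.9];
const digitTokens = new Set<TokenId>([0, 1, 2, 3, 4]);

const masked = maskLogits(logits, digitTokens);
console.log(greedySample(masked)); // 2: the best *valid* token
```

Without the mask, greedy sampling would pick token 5 ("null"); with it, the model is forced onto a schema-valid path.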

Provider Implementations

Every major LLM provider now supports native structured output through constrained decoding. The APIs differ in parameter names and schema configuration, but the underlying mechanism is the same. Here is the current state of each provider as of April 2026.

OpenAI: response_format with json_schema

OpenAI shipped structured outputs in August 2024. There are two integration points: response_format for shaping the model's direct response, and strict: true on tool definitions for shaping function call arguments. Both use constrained decoding.

OpenAI structured output with response_format

import OpenAI from "openai";

const openai = new OpenAI();

const response = await openai.chat.completions.create({
  model: "gpt-5",
  messages: [
    { role: "user", content: "Extract: John Smith, john@acme.com, VP Engineering" }
  ],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "contact",
      strict: true,
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          email: { type: "string" },
          title: { type: "string" },
          department: { type: ["string", "null"] }
        },
        required: ["name", "email", "title", "department"],
        additionalProperties: false
      }
    }
  }
});

const contact = JSON.parse(response.choices[0].message.content);
// { name: "John Smith", email: "john@acme.com",
//   title: "VP Engineering", department: null }
// Guaranteed to match schema. No try/catch needed.

Schema constraints in strict mode

OpenAI's strict mode requires additionalProperties: false on every object and all properties listed in the required array. Optional fields use type unions with null (e.g., ["string", "null"]). Maximum 100 object properties total with up to 5 levels of nesting. Some JSON Schema features like pattern, minItems, and conditional schemas are not supported.
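These constraints can be mechanized. The hypothetical helper below (not part of any SDK) rewrites "optional field" intent into the strict-mode shape: every property listed in required, with optionality expressed as a type union with null.

```typescript
// Hypothetical helper: convert optional-field intent into
// OpenAI strict-mode shape. Names and types are ours, not SDK API.
interface PropertySchema {
  type: string | string[];
  description?: string;
}

interface ObjectSchema {
  type: "object";
  properties: Record<string, PropertySchema>;
  required: string[];
  additionalProperties: false;
}

function toStrictSchema(
  properties: Record<string, PropertySchema>,
  optionalFields: string[]
): ObjectSchema {
  const strictProps: Record<string, PropertySchema> = {};
  for (const [name, prop] of Object.entries(properties)) {
    strictProps[name] = optionalFields.includes(name)
      ? // Optional fields become type unions with null...
        { ...prop, type: [prop.type as string, "null"] }
      : prop;
  }
  return {
    type: "object",
    properties: strictProps,
    // ...and EVERY property goes into required, optional or not.
    required: Object.keys(properties),
    additionalProperties: false,
  };
}

const schema = toStrictSchema(
  { name: { type: "string" }, department: { type: "string" } },
  ["department"]
);
console.log(schema.required);              // ["name", "department"]
console.log(schema.properties.department); // { type: ["string", "null"] }
```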

Anthropic: output_config.format

Anthropic launched structured outputs in beta (November 2025) and made them generally available in early 2026. The API uses output_config.format with a json_schema type. Supported on Claude Opus 4.6, Sonnet 4.6, Sonnet 4.5, Opus 4.5, and Haiku 4.5.

Anthropic structured output with output_config

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Extract: John Smith, john@acme.com, VP Engineering"
    }
  ],
  output_config: {
    format: {
      type: "json_schema",
      schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          email: { type: "string" },
          title: { type: "string" },
          department: { type: ["string", "null"] }
        },
        required: ["name", "email", "title", "department"],
        additionalProperties: false
      }
    }
  }
});

const textBlock = response.content.find(b => b.type === "text");
const contact = JSON.parse(textBlock.text);
// Schema-compliant. Same guarantee as OpenAI strict mode.

Anthropic also supports structured output through strict tool use. Setting strict: true on a tool definition guarantees the model's tool input conforms to your schema. This is particularly useful for agent workflows where the final output of a multi-turn tool-use conversation needs to match a specific shape.

Google Gemini: controlled generation

Google calls it "controlled generation." The API uses response_mime_type set to application/json and response_json_schema for the schema definition. Supported on Gemini 2.5 Pro, 2.5 Flash, and the Gemini 3 series.

Gemini structured output with controlled generation

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY });

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Extract: John Smith, john@acme.com, VP Engineering",
  config: {
    responseMimeType: "application/json",
    responseJsonSchema: {
      type: "object",
      properties: {
        name: { type: "string" },
        email: { type: "string" },
        title: { type: "string" },
        department: { type: ["string", "null"] }
      },
      required: ["name", "email", "title", "department"]
    }
  }
});

const contact = JSON.parse(response.text);

Feature | OpenAI | Anthropic | Google Gemini
API parameter | response_format.json_schema | output_config.format | responseJsonSchema
Tool strict mode | strict: true per tool | strict: true per tool | Via function declarations
SDK helpers | zodResponseFormat (Zod), .parse() (Pydantic) | zodOutputFormat (Zod), .parse() (Pydantic) | zodToJsonSchema conversion
Streaming | Partial JSON chunks | Partial JSON chunks | Partial JSON chunks
Max schema depth | 5 levels, 100 properties | 5 levels, 100 properties | No documented limit
Schema caching | Automatic. First call compiles. | Automatic. First call compiles. | Automatic.

Zod and Pydantic: Define Once, Validate Everywhere

Writing raw JSON schemas by hand is tedious and error-prone. Zod (TypeScript) and Pydantic (Python) let you define schemas in your native language, then convert to JSON Schema automatically. The provider SDKs have first-class support for both. Define a schema once, get compile-time type safety, runtime validation, and LLM schema enforcement from the same definition.

Zod with OpenAI (TypeScript)

zodResponseFormat: type-safe structured output

import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

// Define the schema once
const CodeReview = z.object({
  file: z.string().describe("The file path reviewed"),
  issues: z.array(z.object({
    line: z.number().int().describe("Line number"),
    severity: z.enum(["error", "warning", "info"]),
    message: z.string().describe("Description of the issue"),
    suggestion: z.string().describe("Suggested fix")
  })),
  summary: z.string().describe("One-line summary of findings"),
  approved: z.boolean()
});

// TypeScript knows the return type
type CodeReviewResult = z.infer<typeof CodeReview>;

const openai = new OpenAI();
const completion = await openai.chat.completions.parse({
  model: "gpt-5",
  messages: [
    { role: "system", content: "Review the following code for bugs and style issues." },
    { role: "user", content: fileContents }
  ],
  response_format: zodResponseFormat(CodeReview, "code_review")
});

// completion.choices[0].message.parsed is typed as CodeReviewResult
const review = completion.choices[0].message.parsed;
console.log(review.issues.length); // TypeScript autocomplete works
console.log(review.approved);      // boolean, guaranteed

Zod with Anthropic (TypeScript)

zodOutputFormat with Anthropic SDK

import Anthropic from "@anthropic-ai/sdk";
import { zodOutputFormat } from "@anthropic-ai/sdk/helpers/zod";
import { z } from "zod";

const ContactSchema = z.object({
  name: z.string(),
  email: z.string(),
  company: z.string(),
  role: z.enum(["engineer", "manager", "executive", "other"])
});

const client = new Anthropic();
const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Extract contact: Jane Doe, CTO at Stripe, jane@stripe.com" }
  ],
  output_config: { format: zodOutputFormat(ContactSchema, "contact") }
});

// The text block is guaranteed to parse as valid JSON
// matching ContactSchema.
const contact = JSON.parse(
  response.content.find(b => b.type === "text")?.text ?? "{}"
);

Zod with Vercel AI SDK

generateObject: the cleanest Zod integration

import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const { object } = await generateObject({
  model: openai("gpt-5"),
  schema: z.object({
    tasks: z.array(z.object({
      title: z.string().describe("Short task title"),
      priority: z.enum(["high", "medium", "low"]),
      estimatedHours: z.number().describe("Estimated hours to complete"),
      dependencies: z.array(z.string()).describe("Task titles this depends on")
    })),
    totalHours: z.number()
  }),
  prompt: "Break down this project into tasks: Build a REST API for user management"
});

// object is fully typed. No JSON.parse. No validation code.
// The AI SDK handles schema conversion, API call, and parsing.
for (const task of object.tasks) {
  console.log(`[${task.priority}] ${task.title} - ${task.estimatedHours}h`);
}

Pydantic with OpenAI (Python)

Pydantic models with OpenAI .parse()

from typing import Literal

from openai import OpenAI
from pydantic import BaseModel

class Issue(BaseModel):
    line: int
    severity: Literal["error", "warning", "info"]
    message: str
    suggestion: str

class CodeReview(BaseModel):
    file: str
    issues: list[Issue]
    summary: str
    approved: bool

client = OpenAI()
completion = client.chat.completions.parse(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "Review the code for bugs."},
        {"role": "user", "content": file_contents}
    ],
    response_format=CodeReview
)

review = completion.choices[0].message.parsed
# review is a CodeReview instance with full type hints
print(f"{len(review.issues)} issues found, approved: {review.approved}")

Zod .describe() matters

When using Zod schemas, call .describe() on fields that need context. The description string is sent to the model as part of the JSON schema and directly influences output quality. z.number().describe("Line number in the source file") gives the model better guidance than a bare z.number(). More descriptive schemas produce more accurate structured output.

The Format Tax: When Structured Output Hurts

Structured output is not free. Research published in April 2026 ("The Format Tax") measured the accuracy cost of requiring LLMs to produce structured formats instead of free text. The findings are nuanced and important for anyone building production systems.

3-9pp accuracy loss on open-weight models
~0pp accuracy loss on frontier closed models
95% of statistically significant effects showed degradation

Open-weight models (Llama, Mistral, Qwen) consistently lose 3-9 percentage points of accuracy on reasoning benchmarks when generating structured output. The worst case was MATH-500, where specific configurations lost up to 17.8 percentage points. Writing quality degraded similarly when LaTeX formatting was required.

Frontier closed-weight models tell a different story. Claude Haiku 4.5, Grok 4.1 Fast, and recent GPT variants showed near-zero or even positive deltas. This suggests the format tax is not inherent to structured generation. It can be trained away, or it correlates with model scale and instruction tuning quality.

The surprising root cause

The researchers found that format-requesting instructions alone cause most of the accuracy loss, before any decoder constraint is applied. Simply telling a model "respond in JSON" degrades reasoning. The constrained decoder adds only minor additional degradation on top of that. This means the problem is cognitive, not mechanical: the model spends capacity on formatting concerns, and that spending competes with its reasoning capacity.

Mitigation strategies

Two-pass generation

Generate a freeform answer first, then reformat into the target schema in a second pass. Recovers approximately 6.8 percentage points of lost accuracy on average. Costs 2x the tokens, but preserves reasoning quality.
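A minimal two-pass sketch, with the LLM call abstracted behind a `complete` callback (our own hypothetical signature, so the pattern stays provider-agnostic; in production it would wrap something like openai.chat.completions.create):

```typescript
// Two-pass generation: reason freely first, then reformat.
// `complete` is a hypothetical abstraction over an LLM call;
// a schema argument means "use structured output for this call".
type Complete = (prompt: string, schema?: object) => Promise<string>;

async function twoPassExtract(
  question: string,
  schema: object,
  complete: Complete
): Promise<unknown> {
  // Pass 1: free-text reasoning, no format instructions at all.
  const reasoning = await complete(question);

  // Pass 2: pure reformatting. The model no longer has to reason,
  // only to transcribe the answer into the target schema.
  const formatted = await complete(
    `Reformat the following answer as JSON matching the schema.\n\n${reasoning}`,
    schema
  );
  return JSON.parse(formatted);
}
```

The second call is cheap relative to the first: it carries no reasoning burden, so a smaller or faster model can handle it.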

Extended thinking

Enable chain-of-thought or extended thinking within a single generation, then constrain only the final output. Recovers approximately 9.2 percentage points on average. Higher variance but single-pass. Claude and OpenAI reasoning models support this natively.

Practical takeaway

If you're using frontier models (Claude Sonnet/Opus, GPT-5, Gemini 2.5 Pro), the format tax is negligible. Use structured output everywhere. If you're using open-weight models, measure your specific task. For classification and extraction, structured output often improves accuracy. For complex reasoning, consider two-pass generation or extended thinking to preserve quality.

Structured Output vs Free Text: When to Use Which

Structured output is not a universal upgrade. It solves a specific problem (reliable parsing for machine consumption) and introduces a specific tradeoff (reduced flexibility, potential reasoning degradation on weaker models). The decision is straightforward.

Scenario | Use | Why
API responses consumed by code | Structured output | Downstream code needs predictable types and fields
Tool call arguments in agents | Structured output (strict) | Invalid arguments crash tool execution
Data extraction from documents | Structured output | Entity types, counts, and classifications need schema enforcement
Subagent communication | Structured output | Agent-to-agent messages must be parseable without ambiguity
Chat responses shown to users | Free text | Users read prose, not JSON
Creative writing, brainstorming | Free text | Structure constrains creative exploration
Complex multi-step reasoning | Free text (then reformat) | Format tax degrades reasoning on open-weight models
Summarization for human readers | Free text | Structured output forces artificial field boundaries on fluid content

The general rule: if the output is consumed by code, use structured output. If it is consumed by humans, use free text. If it is consumed by code but requires complex reasoning to produce, consider a two-pass approach where the model reasons in free text and then reformats.

Patterns for Coding Agents

Coding agents are the most structured-output-intensive applications. Every tool call is structured output. Every code edit is structured output. Every file operation has a schema. An agent that makes 50 tool calls per task needs all 50 to be schema-valid, or the task fails. This is where strict mode and constrained decoding pay for themselves.

Tool call schemas

Every tool your agent exposes is defined by a JSON schema for its parameters. With strict mode enabled, the model's arguments are guaranteed to match. Without it, you need defensive parsing, type coercion, and fallback logic for every tool. Here are the core schemas coding agents use.

File operation tools with strict schemas

const tools = [
  {
    type: "function",
    name: "read_file",
    strict: true,
    description: "Read the contents of a file at the given path",
    parameters: {
      type: "object",
      properties: {
        path: { type: "string", description: "File path relative to project root" },
        start_line: { type: ["integer", "null"], description: "First line to read (1-indexed). Null for start of file." },
        end_line: { type: ["integer", "null"], description: "Last line to read. Null for end of file." }
      },
      required: ["path", "start_line", "end_line"],
      additionalProperties: false
    }
  },
  {
    type: "function",
    name: "write_file",
    strict: true,
    description: "Write content to a file, creating it if it doesn't exist",
    parameters: {
      type: "object",
      properties: {
        path: { type: "string", description: "File path relative to project root" },
        content: { type: "string", description: "The full file content to write" }
      },
      required: ["path", "content"],
      additionalProperties: false
    }
  },
  {
    type: "function",
    name: "search_code",
    strict: true,
    description: "Search the codebase using a regex pattern. Returns matching lines with file paths and line numbers.",
    parameters: {
      type: "object",
      properties: {
        pattern: { type: "string", description: "Regex pattern to search for" },
        file_glob: { type: ["string", "null"], description: "Glob to filter files, e.g. '*.ts'. Null for all files." },
        max_results: { type: "integer", description: "Maximum number of matches to return" }
      },
      required: ["pattern", "file_glob", "max_results"],
      additionalProperties: false
    }
  }
];

Code edit operations

The model needs to express code edits in a structured format that your application can apply reliably. There are three common patterns, each with different tradeoffs.

Full file rewrite

Model returns the entire file with edits applied. Simple to implement. Wasteful on tokens: editing one line in a 500-line file costs 500 lines of output tokens. Works for small files.

Search-and-replace

Model returns the old text and new text. Compact and unambiguous. Fails when the old text appears multiple times. Claude Code and most agents use this pattern because it scales well to large files.

Unified diff

Model returns a standard unified diff with line numbers and context. Compact. Fragile: off-by-one line numbers cause patch failures. Works best with a fuzzy apply step that handles minor misalignment.

Search-and-replace edit schema

{
  type: "function",
  name: "edit_file",
  strict: true,
  description: "Apply a targeted edit to a file by replacing old_text with new_text",
  parameters: {
    type: "object",
    properties: {
      path: {
        type: "string",
        description: "File path relative to project root"
      },
      old_text: {
        type: "string",
        description: "The exact text to find and replace. Must match uniquely."
      },
      new_text: {
        type: "string",
        description: "The replacement text"
      }
    },
    required: ["path", "old_text", "new_text"],
    additionalProperties: false
  }
}
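The schema is only half the contract; the applier on your side must enforce the "must match uniquely" rule rather than guessing. A minimal sketch:

```typescript
// Minimal applier for search-and-replace edits. Rejects edits
// whose old_text is missing or ambiguous instead of guessing.
function applyEdit(
  fileContents: string,
  oldText: string,
  newText: string
): string {
  const first = fileContents.indexOf(oldText);
  if (first === -1) {
    throw new Error("old_text not found; edit cannot be applied");
  }
  // If old_text appears again after the first match, the edit is
  // ambiguous: the agent must resend with more surrounding context.
  const second = fileContents.indexOf(oldText, first + 1);
  if (second !== -1) {
    throw new Error("old_text matches multiple locations");
  }
  return (
    fileContents.slice(0, first) +
    newText +
    fileContents.slice(first + oldText.length)
  );
}

console.log(applyEdit("const x = 1;\n", "x = 1", "x = 2"));
// "const x = 2;\n"
```

Surfacing the ambiguity error back to the model as a tool result lets it retry with a longer, uniquely matching old_text.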

Structured output for agent orchestration

When one agent dispatches work to another, the message format must be structured. The orchestrator needs to know which subagent to route to, what the task is, what context to provide, and what format the result should take. Free text between agents is a recipe for cascading failures.

Orchestrator dispatch schema

const DispatchSchema = z.object({
  subtask: z.string().describe("Clear description of what the subagent should do"),
  agent_type: z.enum(["coder", "reviewer", "tester", "researcher"]),
  context: z.object({
    files: z.array(z.string()).describe("File paths relevant to the subtask"),
    constraints: z.array(z.string()).describe("Rules the subagent must follow"),
    prior_results: z.string().describe("Summary of work done so far")
  }),
  expected_output: z.enum(["code_edit", "review_report", "test_results", "analysis"]),
  timeout_seconds: z.number().int().describe("Max time before the subtask is killed")
});

// The orchestrator model generates this schema,
// your application routes to the correct subagent,
// and the subagent returns its own structured result.
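Because agent_type is an enum in the schema, the routing layer can be an exhaustive lookup with no fallback branch. A sketch (the Dispatch type mirrors the Zod schema above; the handler functions are hypothetical stand-ins for real subagents):

```typescript
// Routing a validated dispatch to a subagent. Handler names are
// hypothetical; in a real agent each would invoke a subagent run.
type AgentType = "coder" | "reviewer" | "tester" | "researcher";

interface Dispatch {
  subtask: string;
  agent_type: AgentType;
  expected_output: string;
  timeout_seconds: number;
}

const handlers: Record<AgentType, (d: Dispatch) => string> = {
  coder: (d) => `coder: ${d.subtask}`,
  reviewer: (d) => `reviewer: ${d.subtask}`,
  tester: (d) => `tester: ${d.subtask}`,
  researcher: (d) => `researcher: ${d.subtask}`,
};

function route(dispatch: Dispatch): string {
  // The enum constraint means this lookup can never miss:
  // the model cannot emit an unknown agent type.
  return handlers[dispatch.agent_type](dispatch);
}

console.log(
  route({
    subtask: "add input validation",
    agent_type: "coder",
    expected_output: "code_edit",
    timeout_seconds: 300,
  })
); // "coder: add input validation"
```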

Morph and Structured Responses

Morph's APIs are built for agent workflows where structured input and output are non-negotiable. The Fast Apply API takes structured input (original code + edit description) and returns structured output (the edited file). No prompt engineering. No JSON parsing. The API contract is the schema.

10,500 tokens/sec (Fast Apply)
98% edit accuracy (morph-v3-fast)
<50ms sandbox cold start

For agent builders, this matters because code editing is the most latency-sensitive structured operation. A 500-line file is roughly 5,000 output tokens; a general-purpose LLM rewriting it in full at 80 tok/s takes over a minute, while Morph's specialized model processes the same edit at 10,500 tok/s in under a second. The response is structured by construction: input schema, output schema, no ambiguity.

Morph Fast Apply as a structured tool

// Define as a tool in your agent
const fastApplyTool = {
  type: "function",
  name: "apply_code_edit",
  strict: true,
  description: "Apply a code edit using Morph Fast Apply. Faster and more accurate than full-file rewrites.",
  parameters: {
    type: "object",
    properties: {
      original_code: { type: "string", description: "Current file contents" },
      edit_snippet: { type: "string", description: "The edit to apply (diff, snippet, or description)" },
      filename: { type: "string", description: "Filename for language detection" }
    },
    required: ["original_code", "edit_snippet", "filename"],
    additionalProperties: false
  }
};

// Execute the tool call
async function executeFastApply(args: {
  original_code: string;
  edit_snippet: string;
  filename: string;
}) {
  const response = await fetch("https://api.morphllm.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.MORPH_API_KEY}`
    },
    body: JSON.stringify({
      model: "morph-v3-fast",
      messages: [
        { role: "user", content: args.edit_snippet }
      ],
      original_code: args.original_code,
      filename: args.filename
    })
  });
  const data = await response.json();
  return data.choices[0].message.content; // The edited file
}

WarpGrep follows the same principle. The search query is structured input (query string, filters, scope). The results are structured output (file paths, line numbers, matched content, relevance scores). Your agent's search_code tool maps directly to WarpGrep's API. No parsing intermediate text. No extracting file paths from prose.

Best Practices

Always use strict mode in production

Best-effort JSON is a testing convenience, not a production strategy. Enable strict mode (or its equivalent) on every structured output call. The first-request schema compilation latency is negligible after caching.

Use .describe() on every Zod field

The description string is sent to the model as part of the schema. It directly impacts output quality. z.string().describe('ISO 8601 date') produces better results than a bare z.string(). Treat descriptions as prompts for individual fields.

Keep schemas flat when possible

Deeply nested schemas are harder for models to follow and hit provider depth limits faster. If your schema has more than 3 levels of nesting, consider flattening. Use arrays of objects at the top level rather than deeply nested hierarchies.
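A quick way to check whether a schema is approaching provider depth limits is to measure its object/array nesting. A small sketch (handles only the properties and items keywords, which covers typical strict-mode schemas):

```typescript
// Measure JSON Schema nesting depth, a rough proxy for the
// 5-level limits providers impose in strict mode. Only
// `properties` and `items` are traversed in this sketch.
function schemaDepth(schema: any): number {
  if (schema === null || typeof schema !== "object") return 0;
  let childMax = 0;
  if (schema.properties) {
    for (const prop of Object.values(schema.properties)) {
      childMax = Math.max(childMax, schemaDepth(prop));
    }
  }
  if (schema.items) {
    childMax = Math.max(childMax, schemaDepth(schema.items));
  }
  // Only objects and arrays add a level; leaf types do not.
  const addsLevel = schema.type === "object" || schema.type === "array";
  return childMax + (addsLevel ? 1 : 0);
}

const nested = {
  type: "object",
  properties: {
    issues: {
      type: "array",
      items: {
        type: "object",
        properties: { line: { type: "integer" } },
      },
    },
  },
};
console.log(schemaDepth(nested)); // 3
```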

Use enums for constrained fields

If a field has a finite set of valid values, use an enum. z.enum(['error', 'warning', 'info']) is better than z.string() with a description. Enums reduce the token space and eliminate invalid values entirely.

Separate reasoning from formatting

For complex tasks on open-weight models, generate freeform reasoning first, then extract structured data in a second pass. This avoids the format tax on reasoning quality. For frontier models, single-pass structured output works fine.

Version your schemas

When your schema changes, downstream consumers break. Use explicit versioning or additive-only changes. Adding a nullable field is safe. Removing a field or changing its type requires coordination with consumers.
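The additive-only rule can be made concrete with plain TypeScript types (the Contact field names here are illustrative): a v2 payload that only adds a nullable field still satisfies every v1 consumer.

```typescript
// Additive-only schema evolution. v2 adds a nullable field, so
// code written against v1 keeps working unchanged.
interface ContactV1 {
  name: string;
  email: string;
}

// Safe change: new field is nullable, existing fields untouched.
interface ContactV2 extends ContactV1 {
  phone: string | null;
}

// A consumer written against v1...
function v1Consumer(contact: ContactV1): string {
  return `${contact.name} <${contact.email}>`;
}

// ...accepts a v2 payload without modification.
const v2Payload: ContactV2 = {
  name: "Jane Doe",
  email: "jane@stripe.com",
  phone: null,
};
console.log(v1Consumer(v2Payload)); // "Jane Doe <jane@stripe.com>"
```

Removing email or changing its type would break v1Consumer at compile time, which is exactly the coordination cost the rule avoids.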

Schema design is prompt engineering

The JSON schema you send to the model is part of the prompt. Field names, descriptions, enum values, and nesting structure all influence output quality. A field named severity with description "How critical this issue is: error (blocks deployment), warning (should fix), info (nice to know)" produces more accurate classifications than a field named level with no description. Invest the same care in schema design that you invest in system prompts.

Frequently Asked Questions

What is structured output in LLMs?

Structured output forces an LLM to return responses conforming to a predefined JSON schema. It uses constrained decoding, which restricts token generation at inference time so only schema-valid tokens can be produced. This guarantees 100% schema compliance with no retries needed for malformed output.

What is the difference between JSON mode and structured output?

JSON mode guarantees syntactically valid JSON but not schema compliance. The model returns valid JSON, but fields might be wrong. Structured output (strict mode) guarantees both valid syntax and full schema conformance: correct field names, types, required properties, and enum values. Use structured output for production. JSON mode is a fallback for cases where you have no schema upfront.

How does constrained decoding work?

The provider compiles your JSON schema into a grammar or finite state machine. At each token generation step, the system computes which tokens are valid given the current state and masks invalid tokens to probability zero. The model samples only from valid tokens. This guarantees the output matches the schema by construction, not by validation.

Which LLM providers support structured output?

All major providers as of 2026. OpenAI uses response_format with type: "json_schema". Anthropic uses output_config.format. Google Gemini uses responseMimeType and responseJsonSchema. Open-source inference engines (vLLM, SGLang, llama.cpp) also support constrained decoding via grammar specifications. Provider-agnostic libraries like Instructor and the Vercel AI SDK abstract these differences.

Does structured output hurt LLM reasoning?

It depends on the model. Frontier closed-weight models (Claude Sonnet/Opus, GPT-5, Gemini 2.5 Pro) show near-zero reasoning degradation. Open-weight models lose 3-9 percentage points on reasoning benchmarks. The primary cause is the format-requesting instruction itself, not the constrained decoder. Two-pass generation (reason freely, then format) and extended thinking modes recover most of the lost accuracy.

Should I use Zod or Pydantic for LLM structured output?

Use Zod for TypeScript and Pydantic for Python. Both convert to JSON Schema, which is what providers consume. OpenAI provides zodResponseFormat and .parse() for Pydantic. Anthropic provides zodOutputFormat and .parse(). The Vercel AI SDK's generateObject accepts Zod schemas directly. The choice is a language decision, not a capability decision.

Related Guides

Structured APIs for agent builders

Morph's Fast Apply and WarpGrep APIs return structured responses by default. Define them as tools, wire them into your agent, and get guaranteed schema-compliant results at 10,500 tok/s.