AI Structured Outputs Guide: JSON Mode, Function Calling & Schemas (2026)
February 2026 guide — get reliable JSON from GPT-5, Claude, and Gemini. Structured outputs, function calling, response schemas compared. Code examples in Python and TypeScript.
Getting reliable, parseable output from large language models is one of the biggest engineering challenges in production AI systems. You prompt the model, it returns beautiful JSON… most of the time. Then at 3 AM, a response comes back with a trailing comma, a markdown code fence wrapper, or a chatty explanation before the JSON blob, and your pipeline breaks.
Every major LLM provider now offers structured output capabilities, but they all work differently. OpenAI has native Pydantic support with 100% guaranteed schema adherence. Anthropic uses tool use as the structured output mechanism. Google Gemini bakes response schemas directly into the generation config. Choosing the right approach — and understanding the tradeoffs — determines whether your LLM integration is rock-solid or held together with regex and prayer.
This guide compares structured output approaches across OpenAI, Anthropic, and Google as of February 2026, with working code examples in Python and TypeScript for every method.
Why Structured Outputs Matter
Before diving into provider-specific implementations, it helps to understand why structured outputs have become essential infrastructure rather than a nice-to-have.
Eliminates JSON parsing failures in production. The most common failure mode in LLM-powered applications is not bad reasoning — it is malformed output. A model that returns {"rating": 4.5, "review": "Great movie"} 99.7% of the time and Here's my review: {"rating": 4.5...} the other 0.3% will crash your parser thousands of times at scale. Structured outputs guarantee the response matches your expected format every time.
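This is why teams without structured outputs end up shipping defensive parsers. A minimal sketch of that defensive code (the kind structured outputs let you delete) looks like this:

```python
import json
import re

def fragile_parse(raw: str) -> dict:
    """Best-effort JSON extraction from a chatty model response.

    Strips markdown code fences and leading prose before parsing --
    exactly the glue code that structured output modes make unnecessary."""
    # Remove markdown code fences if the model wrapped its answer
    raw = re.sub(r"```(?:json)?", "", raw).strip()
    # Find the first JSON object anywhere in the text
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    # raw_decode parses the first complete JSON value and ignores trailing text
    obj, _ = json.JSONDecoder().raw_decode(raw[start:])
    return obj

# Handles both the clean case and the chatty 0.3% case:
fragile_parse('{"rating": 4.5, "review": "Great movie"}')
fragile_parse('Here\'s my review: {"rating": 4.5, "review": "Great movie"}')
```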
Reduces output tokens. When you ask a model to “return a JSON object with these fields,” the model often includes explanatory text, markdown formatting, or unnecessary whitespace. Structured output modes strip all of that away, returning only the data you requested. This directly reduces your API costs.
Enables type-safe LLM integrations. With Pydantic models (OpenAI) or JSON Schema definitions (Claude, Gemini), your IDE provides autocomplete, your type checker catches bugs, and your codebase treats LLM responses like any other typed data structure.
Required for function calling and tool use. Every agentic framework — whether you are building with LangChain, CrewAI, or raw API calls — relies on the model returning structured function call arguments. Understanding how structured outputs work is a prerequisite for building agents.
Provider Comparison (February 2026)
Here is a side-by-side view of structured output capabilities across the three major providers:
| Feature | OpenAI | Anthropic Claude | Google Gemini |
|---|---|---|---|
| JSON Mode | Yes (response_format) | Yes (prefill trick) | Yes (response_mime_type) |
| Strict Schema Enforcement | Yes (Structured Outputs) | No (high reliability) | Yes (response_schema) |
| Function Calling | Yes (tools) | Yes (tools) | Yes (tools) |
| Pydantic Support | Native (SDK method) | Via third-party libs | Via genai SDK |
| Guaranteed Valid JSON | Yes (100%) | No (~99%+ reliability) | Yes (with schema) |
| Streaming Support | Yes | Yes | Yes |
| Nested Object Schemas | Yes | Yes | Yes |
| Enum Constraints | Yes | Yes | Yes |
| Default Values | No (all fields required) | Via schema default | Limited |
| Max Schema Depth | 5 levels | No hard limit | 5 levels |
The fundamental difference: OpenAI and Gemini use constrained decoding to guarantee schema-valid output at the token generation level. Anthropic relies on the model’s instruction-following capability, which is highly reliable but technically not 100% guaranteed.
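To make that difference concrete, here is a toy sketch of what constrained decoding does at each generation step. Real implementations compile the JSON Schema into a grammar over tokens; this only illustrates the masking idea:

```python
def constrained_sample(logits: dict[str, float], allowed: set[str]) -> str:
    """Pick the highest-scoring token among those the grammar allows.

    Tokens that would make the output invalid are masked out entirely,
    so malformed JSON is impossible by construction."""
    valid = {tok: score for tok, score in logits.items() if tok in allowed}
    return max(valid, key=valid.get)

# At a position where the schema grammar expects a string value,
# tokens like "Sure" or "{" are masked even if the raw model prefers them:
choice = constrained_sample(
    {'"': 1.2, "Sure": 3.5, "{": 2.0},
    allowed={'"'},
)
```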
OpenAI Structured Outputs
OpenAI’s Structured Outputs feature, available on GPT-5 and GPT-4.1 family models, is the most developer-friendly implementation. It uses constrained decoding to guarantee that every response matches your Pydantic model exactly.
Python with Pydantic
```python
from pydantic import BaseModel
from openai import OpenAI

class MovieReview(BaseModel):
    title: str
    year: int
    rating: float
    pros: list[str]
    cons: list[str]
    recommendation: str

client = OpenAI()

response = client.beta.chat.completions.parse(
    model="gpt-5",
    response_format=MovieReview,
    messages=[
        {
            "role": "system",
            "content": "You are a movie critic. Provide structured reviews."
        },
        {"role": "user", "content": "Review The Matrix (1999)"}
    ]
)

review = response.choices[0].message.parsed
print(f"{review.title} ({review.year}): {review.rating}/10")
print(f"Pros: {', '.join(review.pros)}")
```
The .parse() method handles schema conversion, API call, and response parsing in one step. The review variable is a fully typed MovieReview instance — your IDE provides autocomplete, and any field access that does not match the schema is caught at development time.
TypeScript with Zod
```typescript
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const MovieReview = z.object({
  title: z.string(),
  year: z.number().int(),
  rating: z.number().min(0).max(10),
  pros: z.array(z.string()),
  cons: z.array(z.string()),
  recommendation: z.enum(["must-watch", "recommended", "skip"]),
});

const client = new OpenAI();

const response = await client.beta.chat.completions.parse({
  model: "gpt-5",
  response_format: zodResponseFormat(MovieReview, "movie_review"),
  messages: [
    { role: "system", content: "You are a movie critic. Provide structured reviews." },
    { role: "user", content: "Review The Matrix (1999)" },
  ],
});

const review = response.choices[0].message.parsed;
// review is fully typed: review.recommendation is "must-watch" | "recommended" | "skip"
```
Complex Nested Schemas
OpenAI supports nested objects, arrays of objects, enums, and optional fields:
```python
from pydantic import BaseModel
from enum import Enum
from typing import Optional

class Severity(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

class CodeLocation(BaseModel):
    file: str
    line: int
    column: Optional[int] = None

class CodeIssue(BaseModel):
    severity: Severity
    message: str
    location: CodeLocation
    suggestion: str

class CodeReview(BaseModel):
    summary: str
    issues: list[CodeIssue]
    overall_quality: float
    approved: bool

response = client.beta.chat.completions.parse(
    model="gpt-5",
    response_format=CodeReview,
    messages=[
        {"role": "system", "content": "Review this code and identify issues."},
        {"role": "user", "content": code_snippet}
    ]
)

for issue in response.choices[0].message.parsed.issues:
    print(f"[{issue.severity.value}] {issue.location.file}:{issue.location.line}")
    print(f"  {issue.message}")
    print(f"  Fix: {issue.suggestion}")
```
OpenAI JSON Mode (Simpler Alternative)
If you do not need strict schema enforcement and just want valid JSON, OpenAI also offers a simpler JSON mode:
```python
import json

response = client.chat.completions.create(
    model="gpt-5",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "Return a JSON object with fields: title, rating, summary."
        },
        {"role": "user", "content": "Review The Matrix"}
    ]
)

data = json.loads(response.choices[0].message.content)
```
This guarantees valid JSON but does not guarantee the JSON matches any specific schema. The model might return {"title": "The Matrix", "score": 9} instead of {"title": "The Matrix", "rating": 9}. For production use, prefer Structured Outputs with a Pydantic model.
Claude Structured Outputs (Anthropic)
Anthropic’s Claude does not have a dedicated “structured output” mode. Instead, you achieve structured output through two mechanisms: tool use (recommended) and prefill (simpler but less reliable).
Method 1: Tool Use (Recommended)
The tool use approach defines a “tool” whose input schema matches your desired output format, then forces the model to call that tool:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250514",
    max_tokens=1024,
    tools=[{
        "name": "movie_review",
        "description": "Generate a structured movie review",
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "Movie title"},
                "year": {"type": "integer", "description": "Release year"},
                "rating": {
                    "type": "number",
                    "minimum": 0,
                    "maximum": 10,
                    "description": "Rating out of 10"
                },
                "pros": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Positive aspects"
                },
                "cons": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Negative aspects"
                },
                "recommendation": {
                    "type": "string",
                    "enum": ["must-watch", "recommended", "skip"]
                }
            },
            "required": ["title", "year", "rating", "pros", "cons", "recommendation"]
        }
    }],
    tool_choice={"type": "tool", "name": "movie_review"},
    messages=[{"role": "user", "content": "Review The Matrix (1999)"}]
)

# Extract the tool call result
tool_use_block = next(
    block for block in response.content if block.type == "tool_use"
)
review = tool_use_block.input
print(f"{review['title']} ({review['year']}): {review['rating']}/10")
```
The key line is tool_choice={"type": "tool", "name": "movie_review"}. This forces Claude to call the specified tool, which means the response will always contain structured data matching your schema. Without tool_choice, the model might decide to respond with plain text instead.
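Writing JSON Schema by hand gets tedious as schemas grow. One common shortcut, sketched here, is to define a Pydantic model and export its schema for the tool's input_schema field via model_json_schema():

```python
from pydantic import BaseModel, Field

class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    year: int = Field(description="Release year")
    rating: float = Field(ge=0, le=10, description="Rating out of 10")
    pros: list[str]
    cons: list[str]

# model_json_schema() emits a plain JSON Schema dict; in practice this
# can be passed as the input_schema of a Claude tool definition
schema = MovieReview.model_json_schema()

tool = {
    "name": "movie_review",
    "description": "Generate a structured movie review",
    "input_schema": schema,
}
```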
TypeScript with Claude
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-5-20250514",
  max_tokens: 1024,
  tools: [
    {
      name: "movie_review",
      description: "Generate a structured movie review",
      input_schema: {
        type: "object" as const,
        properties: {
          title: { type: "string" },
          year: { type: "integer" },
          rating: { type: "number", minimum: 0, maximum: 10 },
          pros: { type: "array", items: { type: "string" } },
          cons: { type: "array", items: { type: "string" } },
          recommendation: {
            type: "string",
            enum: ["must-watch", "recommended", "skip"],
          },
        },
        required: ["title", "year", "rating", "pros", "cons", "recommendation"],
      },
    },
  ],
  tool_choice: { type: "tool", name: "movie_review" },
  messages: [{ role: "user", content: "Review The Matrix (1999)" }],
});

const toolBlock = response.content.find((block) => block.type === "tool_use");
if (toolBlock && toolBlock.type === "tool_use") {
  const review = toolBlock.input as {
    title: string;
    year: number;
    rating: number;
    pros: string[];
    cons: string[];
    recommendation: "must-watch" | "recommended" | "skip";
  };
  console.log(`${review.title}: ${review.rating}/10`);
}
```
Method 2: Prefill (Simpler, Less Reliable)
The prefill trick works by starting the assistant’s response with {, which nudges Claude to continue with valid JSON:
```python
import json

response = client.messages.create(
    model="claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": (
                "Review The Matrix (1999). Return ONLY a JSON object with "
                "fields: title (string), year (integer), rating (number 0-10), "
                "pros (array of strings), cons (array of strings)."
            )
        },
        {
            "role": "assistant",
            "content": "{"
        }
    ]
)

review = json.loads("{" + response.content[0].text)
```
This approach works well for simple schemas but has downsides: the model might occasionally include trailing text after the JSON, it does not enforce enum constraints, and there is no formal schema validation. For production use, tool use is the safer choice.
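The trailing-text problem, at least, is easy to defuse with the standard library: json.JSONDecoder.raw_decode parses the first complete JSON value and reports where it ended. A defensive sketch for the prefill pattern:

```python
import json

def parse_prefill(completion_text: str) -> dict:
    """Parse a prefilled Claude response, tolerating trailing chatter.

    The API response omits the prefilled '{', so we prepend it, then
    decode only the first complete JSON object and ignore the rest."""
    raw = "{" + completion_text
    obj, end = json.JSONDecoder().raw_decode(raw)
    # Anything in raw[end:] is trailing text the model tacked on
    return obj

# Survives commentary after the closing brace:
parse_prefill('"title": "The Matrix", "rating": 9.2}\n\nHope that helps!')
```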
Gemini Structured Outputs (Google)
Google’s Gemini API supports structured output through response_schema in the generation config. Like OpenAI, Gemini uses constrained decoding to guarantee schema-valid output.
Python with Gemini
```python
import google.generativeai as genai
import json

model = genai.GenerativeModel("gemini-2.5-pro")

response = model.generate_content(
    "Review The Matrix (1999)",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "year": {"type": "integer"},
                "rating": {"type": "number"},
                "pros": {
                    "type": "array",
                    "items": {"type": "string"}
                },
                "cons": {
                    "type": "array",
                    "items": {"type": "string"}
                },
                "recommendation": {
                    "type": "string",
                    "enum": ["must-watch", "recommended", "skip"]
                }
            },
            "required": ["title", "year", "rating", "pros", "cons", "recommendation"]
        }
    )
)

review = json.loads(response.text)
print(f"{review['title']} ({review['year']}): {review['rating']}/10")
```
Gemini with Pydantic (via genai SDK)
The Google Gen AI SDK also supports Pydantic-style schema definitions:
```python
import google.generativeai as genai
from google.generativeai.types import GenerationConfig
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    year: int
    rating: float
    pros: list[str]
    cons: list[str]
    recommendation: str

model = genai.GenerativeModel("gemini-2.5-pro")

response = model.generate_content(
    "Review The Matrix (1999)",
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=MovieReview,
    )
)

review = MovieReview.model_validate_json(response.text)
```
Gemini JSON Mode (Without Schema)
Like OpenAI, Gemini supports a simpler JSON mode without schema enforcement:
```python
response = model.generate_content(
    "Review The Matrix (1999). Return JSON with title, rating, and summary fields.",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json"
    )
)
```
This guarantees valid JSON but does not enforce field names or types. Use response_schema for production.
Function Calling vs. Structured Outputs
These two concepts use the same underlying mechanism (JSON Schema) but serve different purposes. Confusing them is one of the most common mistakes developers make when building LLM applications.
Structured Outputs: Data Extraction
Structured outputs are for when you always want the model to return data in a fixed format. The model does not “decide” to return JSON — it is forced to.
Use cases:
- Extracting entities from text (names, dates, amounts)
- Classifying content (sentiment, category, priority)
- Generating structured content (reviews, reports, summaries)
- Transforming data between formats
```python
# Structured output: ALWAYS returns this format
class ExtractedInvoice(BaseModel):
    vendor: str
    amount: float
    currency: str
    date: str
    line_items: list[str]

response = client.beta.chat.completions.parse(
    model="gpt-5",
    response_format=ExtractedInvoice,
    messages=[
        {"role": "system", "content": "Extract invoice details from the text."},
        {"role": "user", "content": invoice_text}
    ]
)
```
Function Calling: Agentic Actions
Function calling is for when the model should decide whether and which action to take. The model chooses to call a function based on the conversation context.
Use cases:
- Searching a database when the user asks a question
- Sending an email when the user requests it
- Looking up weather data when relevant
- Executing multi-step workflows
```python
import json

# Function calling: model DECIDES whether to call these
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search the product database",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "category": {"type": "string", "enum": ["electronics", "clothing", "food"]},
                    "max_results": {"type": "integer", "default": 10}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email to a customer",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string"},
                    "subject": {"type": "string"},
                    "body": {"type": "string"}
                },
                "required": ["to", "subject", "body"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-5",
    tools=tools,
    messages=[{"role": "user", "content": "Find laptops under $1000"}]
)

# Model might call search_database, or might respond with text
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    # Execute the function and return results to the model
```
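The final commented step, executing the function and returning the result, typically goes through a small dispatch table. A sketch (search_database here is a hypothetical local stand-in; the returned JSON string would be appended as a role "tool" message carrying the call's tool_call_id before calling the model again):

```python
import json

# Hypothetical local implementation backing the search_database tool
def search_database(query, category=None, max_results=10):
    return {"results": [f"{query} result {i}" for i in range(max_results)]}

REGISTRY = {"search_database": search_database}

def execute_tool_call(name: str, arguments_json: str) -> str:
    """Run a model-requested tool call against a local function.

    Returns a JSON string, ready to send back to the model as
    {"role": "tool", "tool_call_id": ..., "content": <this string>}."""
    args = json.loads(arguments_json)  # model-supplied arguments
    result = REGISTRY[name](**args)
    return json.dumps(result)

payload = execute_tool_call(
    "search_database", '{"query": "laptops", "max_results": 2}'
)
```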
Decision Framework
| Question | Use Structured Outputs | Use Function Calling |
|---|---|---|
| Should the model always return this format? | Yes | No |
| Does the model need to decide whether to act? | No | Yes |
| Are you extracting data from text? | Yes | No |
| Are you building an agent with tools? | No | Yes |
| Do you need multiple possible actions? | No | Yes |
In practice, many applications use both. An agentic system might use function calling for tool selection and structured outputs for formatting the final response to the user.
Best Practices for Schema Design
The quality of your structured output depends heavily on how you design your schema. A well-designed schema guides the model toward better responses and reduces edge cases.
1. Keep Schemas Flat When Possible
Deeply nested schemas increase the chance of errors and make responses harder to parse. If you can flatten the structure, do it.
```python
# Avoid: deeply nested
class BadSchema(BaseModel):
    metadata: dict  # What goes in here? The model has to guess.
    details: dict

# Prefer: flat and explicit
class GoodSchema(BaseModel):
    title: str
    author: str
    published_date: str
    word_count: int
    category: str
    summary: str
```
2. Use Enums for Categorical Fields
Enums constrain the model to valid values. Without them, you get inconsistent strings like “High”, “high”, “HIGH”, and “h” for the same concept.
```python
from enum import Enum

class Priority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

class Ticket(BaseModel):
    title: str
    priority: Priority  # Guaranteed to be one of the four values
    assignee: str
```
3. Add Descriptions to Properties
Property descriptions act as instructions for the model. They are especially important for ambiguous field names.
```python
# In JSON Schema (Claude, Gemini)
schema = {
    "type": "object",
    "properties": {
        "confidence": {
            "type": "number",
            "minimum": 0,
            "maximum": 1,
            "description": "Confidence score between 0 and 1, where 1 means absolute certainty"
        },
        "reasoning": {
            "type": "string",
            "description": "Step-by-step explanation of how you arrived at the classification"
        }
    }
}
```
For Pydantic models (OpenAI), use Field:
```python
from pydantic import BaseModel, Field

class Classification(BaseModel):
    label: str = Field(description="The category label")
    confidence: float = Field(
        ge=0, le=1,
        description="Confidence score between 0 and 1"
    )
    reasoning: str = Field(
        description="Step-by-step explanation of the classification"
    )
```
4. Validate on Your End
Even with “guaranteed” schema adherence, always validate responses in your application code. Schema guarantees cover structure, not semantics. A model might return {"rating": 0.0, "title": ""} — valid JSON, valid schema, but useless data.
```python
def validate_review(review: MovieReview) -> bool:
    if not review.title.strip():
        return False
    if review.rating < 0 or review.rating > 10:
        return False
    if len(review.pros) == 0 and len(review.cons) == 0:
        return False
    return True
```
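In practice this kind of semantic validator usually sits inside a bounded retry loop that re-requests when the check fails. A self-contained sketch (fetch stands in for the structured-output API call; the dataclass and checks are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    title: str
    rating: float
    pros: list[str] = field(default_factory=list)
    cons: list[str] = field(default_factory=list)

def is_useful(r: Review) -> bool:
    # Semantic checks: schema-valid but empty/nonsense data fails here
    return bool(r.title.strip()) and 0 <= r.rating <= 10 and bool(r.pros or r.cons)

def get_valid_review(fetch, max_attempts: int = 3) -> Review:
    """Call fetch (a stand-in for the API call) until the result passes
    semantic validation, giving up after max_attempts tries."""
    for _ in range(max_attempts):
        review = fetch()
        if is_useful(review):
            return review
    raise ValueError(f"no valid review after {max_attempts} attempts")
```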
5. Design Schemas Visually
For complex schemas with nested objects, arrays, and constraints, use a visual schema builder to design and validate your schema before writing any code. The JSON Schema Builder on DevTk.AI lets you create schemas with drag-and-drop, preview the output, and export the JSON Schema definition directly.
Cost Impact of Structured Outputs
Structured outputs are not just about reliability — they directly reduce your API spend by eliminating unnecessary output tokens.
Unstructured vs. Structured Token Comparison
Consider asking a model to review a movie. Here is what you get without structured output:
```
The Matrix (1999) is a groundbreaking sci-fi film. Here's my review:

**Rating:** 9.2/10

**Pros:**
- Revolutionary visual effects and "bullet time" sequences
- Deep philosophical themes about reality and choice
- Excellent performances from Keanu Reeves and Laurence Fishburne

**Cons:**
- Some dialogue feels stilted
- The sequels diminished the original's impact somewhat

Overall, I highly recommend The Matrix. It's a must-watch for any sci-fi fan.
```
That response uses approximately 85-95 output tokens. With structured output:
```json
{
  "title": "The Matrix",
  "year": 1999,
  "rating": 9.2,
  "pros": [
    "Revolutionary visual effects and bullet time sequences",
    "Deep philosophical themes about reality and choice",
    "Excellent performances from Keanu Reeves and Laurence Fishburne"
  ],
  "cons": [
    "Some dialogue feels stilted",
    "Sequels diminished the original's impact"
  ],
  "recommendation": "must-watch"
}
```
That is approximately 60-70 output tokens. The savings grow with longer responses. For a 500-token unstructured analysis, a structured version might use 200-250 tokens.
Cost Savings at Scale
Using Claude Sonnet 4.5 output pricing ($15.00 per 1M tokens) as a reference:
| Scenario | Unstructured Tokens | Structured Tokens | Monthly Savings (1M requests) |
|---|---|---|---|
| Short extraction | 80 tokens | 35 tokens | $675 |
| Medium analysis | 200 tokens | 100 tokens | $1,500 |
| Long report | 500 tokens | 250 tokens | $3,750 |
At scale, structured outputs can save 40-60% on output token costs. Use the AI Token Counter to measure your actual token counts, and the AI Model Pricing Calculator to project costs across different providers.
Common Pitfalls
Even with structured output support from all major providers, there are several mistakes that trip up developers in production.
Pitfall 1: Trusting the Schema Too Much
Schema enforcement guarantees the structure of the response, not the quality. A model can return {"sentiment": "positive", "confidence": 0.99} for a clearly negative review if the prompt is ambiguous. Always pair structured outputs with clear instructions in the system prompt.
Pitfall 2: Ignoring Refusal Responses
OpenAI’s Structured Outputs can return a refusal instead of parsed data if the model determines the request violates content policies:
```python
response = client.beta.chat.completions.parse(
    model="gpt-5",
    response_format=MySchema,
    messages=[{"role": "user", "content": potentially_harmful_request}]
)

if response.choices[0].message.refusal:
    print(f"Request refused: {response.choices[0].message.refusal}")
else:
    result = response.choices[0].message.parsed
```
Always check for refusals before accessing .parsed.
Pitfall 3: Oversized Schemas
Large schemas with dozens of fields and deep nesting increase latency and token consumption. The schema itself counts toward context tokens. Split large extraction tasks into multiple focused calls rather than one massive schema.
Pitfall 4: Not Handling Streaming Correctly
When streaming structured outputs, you receive partial JSON that is not valid until the stream completes. Do not attempt to parse until the final chunk arrives:
```python
# OpenAI streaming with structured output
stream = client.beta.chat.completions.stream(
    model="gpt-5",
    response_format=MovieReview,
    messages=[{"role": "user", "content": "Review The Matrix"}]
)

with stream as response:
    for event in response:
        # Partial updates available via event.snapshot
        pass
    # Final parsed result
    final = response.get_final_completion()
    review = final.choices[0].message.parsed
```
Choosing the Right Approach
Here is a decision tree for selecting the right structured output method:
Do you need 100% guaranteed schema adherence?
- Yes: use OpenAI Structured Outputs or Gemini with response_schema
- No (high reliability is fine): Claude tool use works well

Are you already using Anthropic Claude for other reasons (quality, context window, prompt caching)?
- Yes: use Claude tool use with tool_choice; it is reliable enough for most production workloads
- No: consider OpenAI for the schema guarantee

Is your schema simple (flat object, few fields)?
- Yes: any provider works; OpenAI JSON mode or Gemini JSON mode is the simplest
- No (nested objects, enums, arrays of objects): use OpenAI Structured Outputs with Pydantic for the best developer experience

Are you building an agentic system where the model picks which tools to call?
- Yes: use function calling (all providers support it)
- No: use structured outputs (fixed format every time)
Bottom Line
Structured outputs have gone from a hack (regex parsing, retry loops) to a first-class feature across all major LLM providers. As of February 2026, OpenAI offers the most developer-friendly implementation with native Pydantic support and 100% schema guarantees. Claude delivers excellent reliability through tool use, with the added benefit of Anthropic’s strong instruction following. Gemini provides schema-enforced output with constrained decoding, similar to OpenAI’s approach.
For most production applications, the choice comes down to which provider you are already using. If you are starting fresh and structured output is a critical requirement, OpenAI’s Structured Outputs with Pydantic models gives you the least friction. If you are already invested in Claude’s ecosystem and value its reasoning quality, tool use with forced tool_choice is production-ready.
Regardless of provider, invest time in schema design. Keep schemas flat, use enums for categorical fields, add descriptions to ambiguous properties, and always validate the semantic content of responses even when the structural format is guaranteed.
Related tools and guides:
- JSON Schema Builder — Design structured output schemas visually with drag-and-drop
- AI Token Counter — Measure token counts before and after switching to structured output
- AI Model Pricing Calculator — Compare costs across 40+ models with your usage pattern
- How to Cut AI API Costs by 80% — Structured outputs are one of eight strategies covered
- AI API Pricing Comparison 2026 — Full pricing table for all providers
- Claude API Pricing Guide 2026 — Claude tool use pricing and prompt caching details
- OpenAI API Pricing Guide 2026 — GPT-5 structured output pricing