AI Structured Outputs Guide: JSON Mode, Function Calling & Schemas (2026)
February 2026 guide — get reliable JSON from GPT-5, Claude, and Gemini. Structured outputs, function calling, response schemas compared. Code examples in Python and TypeScript.
Getting reliable, parseable output from large language models is one of the biggest engineering challenges in production AI systems. You prompt the model, it returns beautiful JSON… most of the time. Then at 3 AM, a response comes back with a trailing comma, a markdown code fence wrapper, or a chatty explanation before the JSON blob, and your pipeline breaks.
Every major LLM provider now offers structured output capabilities, but they all work differently. OpenAI has native Pydantic support with 100% guaranteed schema adherence. Anthropic uses tool use as the structured output mechanism. Google Gemini bakes response schemas directly into the generation config. Choosing the right approach — and understanding the tradeoffs — determines whether your LLM integration is rock-solid or held together with regex and prayer.
This guide compares structured output approaches across OpenAI, Anthropic, and Google as of February 2026, with working code examples in Python and TypeScript for every method.
Why Structured Outputs Matter
Before diving into provider-specific implementations, it helps to understand why structured outputs have become essential infrastructure rather than a nice-to-have.
Eliminates JSON parsing failures in production. The most common failure mode in LLM-powered applications is not bad reasoning — it is malformed output. A model that returns {"rating": 4.5, "review": "Great movie"} 99.7% of the time and Here's my review: {"rating": 4.5...} the other 0.3% will crash your parser thousands of times at scale. Structured outputs guarantee the response matches your expected format every time.
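This is why teams without structured outputs end up shipping defensive parsers. A minimal sketch of that defensive code (the kind structured outputs let you delete) looks like this:

```python
import json
import re

def fragile_parse(raw: str) -> dict:
    """Best-effort JSON extraction from a chatty model response.

    Strips markdown code fences and leading prose before parsing --
    exactly the glue code that structured output modes make unnecessary."""
    # Remove markdown code fences if the model wrapped its answer
    raw = re.sub(r"```(?:json)?", "", raw).strip()
    # Find the first JSON object anywhere in the text
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    # raw_decode parses the first complete JSON value and ignores trailing text
    obj, _ = json.JSONDecoder().raw_decode(raw[start:])
    return obj

# Handles both the clean case and the chatty 0.3% case:
fragile_parse('{"rating": 4.5, "review": "Great movie"}')
fragile_parse('Here\'s my review: {"rating": 4.5, "review": "Great movie"}')
```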
Reduces output tokens. When you ask a model to “return a JSON object with these fields,” the model often includes explanatory text, markdown formatting, or unnecessary whitespace. Structured output modes strip all of that away, returning only the data you requested. This directly reduces your API costs.
Enables type-safe LLM integrations. With Pydantic models (OpenAI) or JSON Schema definitions (Claude, Gemini), your IDE provides autocomplete, your type checker catches bugs, and your codebase treats LLM responses like any other typed data structure.
Required for function calling and tool use. Every agentic framework — whether you are building with LangChain, CrewAI, or raw API calls — relies on the model returning structured function call arguments. Understanding how structured outputs work is a prerequisite for building agents.
Provider Comparison (February 2026)
Here is a side-by-side view of structured output capabilities across the three major providers:
| Feature | OpenAI | Anthropic Claude | Google Gemini |
|---|---|---|---|
| JSON Mode | Yes (response_format) | Yes (prefill trick) | Yes (response_mime_type) |
| Strict Schema Enforcement | Yes (Structured Outputs) | No (high reliability) | Yes (response_schema) |
| Function Calling | Yes (tools) | Yes (tools) | Yes (tools) |
| Pydantic Support | Native (SDK method) | Via third-party libs | Via genai SDK |
| Guaranteed Valid JSON | Yes (100%) | No (~99%+ reliability) | Yes (with schema) |
| Streaming Support | Yes | Yes | Yes |
| Nested Object Schemas | Yes | Yes | Yes |
| Enum Constraints | Yes | Yes | Yes |
| Default Values | No (all fields required) | Via schema default | Limited |
| Max Schema Depth | 5 levels | No hard limit | 5 levels |
The fundamental difference: OpenAI and Gemini use constrained decoding to guarantee schema-valid output at the token generation level. Anthropic relies on the model’s instruction-following capability, which is highly reliable but technically not 100% guaranteed.
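To make that difference concrete, here is a toy sketch of what constrained decoding does at each generation step. Real implementations compile the JSON Schema into a grammar over tokens; this only illustrates the masking idea:

```python
def constrained_sample(logits: dict[str, float], allowed: set[str]) -> str:
    """Pick the highest-scoring token among those the grammar allows.

    Tokens that would make the output invalid are masked out entirely,
    so malformed JSON is impossible by construction."""
    valid = {tok: score for tok, score in logits.items() if tok in allowed}
    return max(valid, key=valid.get)

# At a position where the schema grammar expects a string value,
# tokens like "Sure" or "{" are masked even if the raw model prefers them:
choice = constrained_sample(
    {'"': 1.2, "Sure": 3.5, "{": 2.0},
    allowed={'"'},
)
```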
OpenAI Structured Outputs
OpenAI’s Structured Outputs feature, available on GPT-5 and GPT-4.1 family models, is the most developer-friendly implementation. It uses constrained decoding to guarantee that every response matches your Pydantic model exactly.
Python with Pydantic
```python
from pydantic import BaseModel
from openai import OpenAI

class MovieReview(BaseModel):
    title: str
    year: int
    rating: float
    pros: list[str]
    cons: list[str]
    recommendation: str

client = OpenAI()

response = client.beta.chat.completions.parse(
    model="gpt-5",
    response_format=MovieReview,
    messages=[
        {
            "role": "system",
            "content": "You are a movie critic. Provide structured reviews."
        },
        {"role": "user", "content": "Review The Matrix (1999)"}
    ]
)

review = response.choices[0].message.parsed
print(f"{review.title} ({review.year}): {review.rating}/10")
print(f"Pros: {', '.join(review.pros)}")
```
The .parse() method handles schema conversion, API call, and response parsing in one step. The review variable is a fully typed MovieReview instance — your IDE provides autocomplete, and any field access that does not match the schema is caught at development time.
TypeScript with Zod
```typescript
import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const MovieReview = z.object({
  title: z.string(),
  year: z.number().int(),
  rating: z.number().min(0).max(10),
  pros: z.array(z.string()),
  cons: z.array(z.string()),
  recommendation: z.enum(["must-watch", "recommended", "skip"]),
});

const client = new OpenAI();

const response = await client.beta.chat.completions.parse({
  model: "gpt-5",
  response_format: zodResponseFormat(MovieReview, "movie_review"),
  messages: [
    { role: "system", content: "You are a movie critic. Provide structured reviews." },
    { role: "user", content: "Review The Matrix (1999)" },
  ],
});

const review = response.choices[0].message.parsed;
// review is fully typed: review.recommendation is "must-watch" | "recommended" | "skip"
```
Complex Nested Schemas
OpenAI supports nested objects, arrays of objects, enums, and optional fields:
```python
from pydantic import BaseModel
from enum import Enum
from typing import Optional

class Severity(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

class CodeLocation(BaseModel):
    file: str
    line: int
    column: Optional[int] = None

class CodeIssue(BaseModel):
    severity: Severity
    message: str
    location: CodeLocation
    suggestion: str

class CodeReview(BaseModel):
    summary: str
    issues: list[CodeIssue]
    overall_quality: float
    approved: bool

response = client.beta.chat.completions.parse(
    model="gpt-5",
    response_format=CodeReview,
    messages=[
        {"role": "system", "content": "Review this code and identify issues."},
        {"role": "user", "content": code_snippet}
    ]
)

for issue in response.choices[0].message.parsed.issues:
    print(f"[{issue.severity.value}] {issue.location.file}:{issue.location.line}")
    print(f"  {issue.message}")
    print(f"  Fix: {issue.suggestion}")
```
OpenAI JSON Mode (Simpler Alternative)
If you do not need strict schema enforcement and just want valid JSON, OpenAI also offers a simpler JSON mode:
```python
import json

response = client.chat.completions.create(
    model="gpt-5",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "Return a JSON object with fields: title, rating, summary."
        },
        {"role": "user", "content": "Review The Matrix"}
    ]
)

data = json.loads(response.choices[0].message.content)
```
This guarantees valid JSON but does not guarantee the JSON matches any specific schema. The model might return {"title": "The Matrix", "score": 9} instead of {"title": "The Matrix", "rating": 9}. For production use, prefer Structured Outputs with a Pydantic model.
Claude Structured Outputs (Anthropic)
Anthropic’s Claude does not have a dedicated “structured output” mode. Instead, you achieve structured output through two mechanisms: tool use (recommended) and prefill (simpler but less reliable).
Method 1: Tool Use (Recommended)
The tool use approach defines a “tool” whose input schema matches your desired output format, then forces the model to call that tool:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5-20250514",
    max_tokens=1024,
    tools=[{
        "name": "movie_review",
        "description": "Generate a structured movie review",
        "input_schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "Movie title"},
                "year": {"type": "integer", "description": "Release year"},
                "rating": {
                    "type": "number",
                    "minimum": 0,
                    "maximum": 10,
                    "description": "Rating out of 10"
                },
                "pros": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Positive aspects"
                },
                "cons": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Negative aspects"
                },
                "recommendation": {
                    "type": "string",
                    "enum": ["must-watch", "recommended", "skip"]
                }
            },
            "required": ["title", "year", "rating", "pros", "cons", "recommendation"]
        }
    }],
    tool_choice={"type": "tool", "name": "movie_review"},
    messages=[{"role": "user", "content": "Review The Matrix (1999)"}]
)

# Extract the tool call result
tool_use_block = next(
    block for block in response.content if block.type == "tool_use"
)
review = tool_use_block.input
print(f"{review['title']} ({review['year']}): {review['rating']}/10")
```
The key line is tool_choice={"type": "tool", "name": "movie_review"}. This forces Claude to call the specified tool, which means the response will always contain structured data matching your schema. Without tool_choice, the model might decide to respond with plain text instead.
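Writing JSON Schema by hand gets tedious as schemas grow. One common shortcut, sketched here, is to define a Pydantic model and export its schema for the tool's input_schema field via model_json_schema():

```python
from pydantic import BaseModel, Field

class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    year: int = Field(description="Release year")
    rating: float = Field(ge=0, le=10, description="Rating out of 10")
    pros: list[str]
    cons: list[str]

# model_json_schema() emits a plain JSON Schema dict; in practice this
# can be passed as the input_schema of a Claude tool definition
schema = MovieReview.model_json_schema()

tool = {
    "name": "movie_review",
    "description": "Generate a structured movie review",
    "input_schema": schema,
}
```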
TypeScript with Claude
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-5-20250514",
  max_tokens: 1024,
  tools: [
    {
      name: "movie_review",
      description: "Generate a structured movie review",
      input_schema: {
        type: "object" as const,
        properties: {
          title: { type: "string" },
          year: { type: "integer" },
          rating: { type: "number", minimum: 0, maximum: 10 },
          pros: { type: "array", items: { type: "string" } },
          cons: { type: "array", items: { type: "string" } },
          recommendation: {
            type: "string",
            enum: ["must-watch", "recommended", "skip"],
          },
        },
        required: ["title", "year", "rating", "pros", "cons", "recommendation"],
      },
    },
  ],
  tool_choice: { type: "tool", name: "movie_review" },
  messages: [{ role: "user", content: "Review The Matrix (1999)" }],
});

const toolBlock = response.content.find((block) => block.type === "tool_use");
if (toolBlock && toolBlock.type === "tool_use") {
  const review = toolBlock.input as {
    title: string;
    year: number;
    rating: number;
    pros: string[];
    cons: string[];
    recommendation: "must-watch" | "recommended" | "skip";
  };
  console.log(`${review.title}: ${review.rating}/10`);
}
```
Method 2: Prefill (Simpler, Less Reliable)
The prefill trick works by starting the assistant’s response with {, which nudges Claude to continue with valid JSON:
```python
import json

response = client.messages.create(
    model="claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": (
                "Review The Matrix (1999). Return ONLY a JSON object with "
                "fields: title (string), year (integer), rating (number 0-10), "
                "pros (array of strings), cons (array of strings)."
            )
        },
        {
            "role": "assistant",
            "content": "{"
        }
    ]
)

review = json.loads("{" + response.content[0].text)
```
This approach works well for simple schemas but has downsides: the model might occasionally include trailing text after the JSON, it does not enforce enum constraints, and there is no formal schema validation. For production use, tool use is the safer choice.
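The trailing-text problem, at least, is easy to defuse with the standard library: json.JSONDecoder.raw_decode parses the first complete JSON value and reports where it ended. A defensive sketch for the prefill pattern:

```python
import json

def parse_prefill(completion_text: str) -> dict:
    """Parse a prefilled Claude response, tolerating trailing chatter.

    The API response omits the prefilled '{', so we prepend it, then
    decode only the first complete JSON object and ignore the rest."""
    raw = "{" + completion_text
    obj, end = json.JSONDecoder().raw_decode(raw)
    # Anything in raw[end:] is trailing text the model tacked on
    return obj

# Survives commentary after the closing brace:
parse_prefill('"title": "The Matrix", "rating": 9.2}\n\nHope that helps!')
```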
Gemini Structured Outputs (Google)
Google’s Gemini API supports structured output through response_schema in the generation config. Like OpenAI, Gemini uses constrained decoding to guarantee schema-valid output.
Python with Gemini
```python
import google.generativeai as genai
import json

model = genai.GenerativeModel("gemini-2.5-pro")

response = model.generate_content(
    "Review The Matrix (1999)",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "year": {"type": "integer"},
                "rating": {"type": "number"},
                "pros": {
                    "type": "array",
                    "items": {"type": "string"}
                },
                "cons": {
                    "type": "array",
                    "items": {"type": "string"}
                },
                "recommendation": {
                    "type": "string",
                    "enum": ["must-watch", "recommended", "skip"]
                }
            },
            "required": ["title", "year", "rating", "pros", "cons", "recommendation"]
        }
    )
)

review = json.loads(response.text)
print(f"{review['title']} ({review['year']}): {review['rating']}/10")
```
Gemini with Pydantic (via genai SDK)
The Google Gen AI SDK also supports Pydantic-style schema definitions:
```python
import google.generativeai as genai
from google.generativeai.types import GenerationConfig
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    year: int
    rating: float
    pros: list[str]
    cons: list[str]
    recommendation: str

model = genai.GenerativeModel("gemini-2.5-pro")

response = model.generate_content(
    "Review The Matrix (1999)",
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=MovieReview,
    )
)

review = MovieReview.model_validate_json(response.text)
```
Gemini JSON Mode (Without Schema)
Like OpenAI, Gemini supports a simpler JSON mode without schema enforcement:
```python
response = model.generate_content(
    "Review The Matrix (1999). Return JSON with title, rating, and summary fields.",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json"
    )
)
```
This guarantees valid JSON but does not enforce field names or types. Use response_schema for production.
Function Calling vs. Structured Outputs
These two concepts use the same underlying mechanism (JSON Schema) but serve different purposes. Confusing them is one of the most common mistakes developers make when building LLM applications.
Structured Outputs: Data Extraction
Structured outputs are for when you always want the model to return data in a fixed format. The model does not “decide” to return JSON — it is forced to.
Use cases:
- Extracting entities from text (names, dates, amounts)
- Classifying content (sentiment, category, priority)
- Generating structured content (reviews, reports, summaries)
- Transforming data between formats
```python
# Structured output: ALWAYS returns this format
class ExtractedInvoice(BaseModel):
    vendor: str
    amount: float
    currency: str
    date: str
    line_items: list[str]

response = client.beta.chat.completions.parse(
    model="gpt-5",
    response_format=ExtractedInvoice,
    messages=[
        {"role": "system", "content": "Extract invoice details from the text."},
        {"role": "user", "content": invoice_text}
    ]
)
```
Function Calling: Agentic Actions
Function calling is for when the model should decide whether and which action to take. The model chooses to call a function based on the conversation context.
Use cases:
- Searching a database when the user asks a question
- Sending an email when the user requests it
- Looking up weather data when relevant
- Executing multi-step workflows
```python
import json

# Function calling: model DECIDES whether to call these
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search the product database",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "category": {"type": "string", "enum": ["electronics", "clothing", "food"]},
                    "max_results": {"type": "integer", "default": 10}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email to a customer",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string"},
                    "subject": {"type": "string"},
                    "body": {"type": "string"}
                },
                "required": ["to", "subject", "body"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-5",
    tools=tools,
    messages=[{"role": "user", "content": "Find laptops under $1000"}]
)

# Model might call search_database, or might respond with text
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    # Execute the function and return results to the model
```
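The final commented step, executing the function and returning the result, typically goes through a small dispatch table. A sketch (search_database here is a hypothetical local stand-in; the returned JSON string would be appended as a role "tool" message carrying the call's tool_call_id before calling the model again):

```python
import json

# Hypothetical local implementation backing the search_database tool
def search_database(query, category=None, max_results=10):
    return {"results": [f"{query} result {i}" for i in range(max_results)]}

REGISTRY = {"search_database": search_database}

def execute_tool_call(name: str, arguments_json: str) -> str:
    """Run a model-requested tool call against a local function.

    Returns a JSON string, ready to send back to the model as
    {"role": "tool", "tool_call_id": ..., "content": <this string>}."""
    args = json.loads(arguments_json)  # model-supplied arguments
    result = REGISTRY[name](**args)
    return json.dumps(result)

payload = execute_tool_call(
    "search_database", '{"query": "laptops", "max_results": 2}'
)
```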
Decision Framework
| Question | Use Structured Outputs | Use Function Calling |
|---|---|---|
| Should the model always return this format? | Yes | No |
| Does the model need to decide whether to act? | No | Yes |
| Are you extracting data from text? | Yes | No |
| Are you building an agent with tools? | No | Yes |
| Do you need multiple possible actions? | No | Yes |
In practice, many applications use both. An agentic system might use function calling for tool selection and structured outputs for formatting the final response to the user.
Best Practices for Schema Design
The quality of your structured output depends heavily on how you design your schema. A well-designed schema guides the model toward better responses and reduces edge cases.
1. Keep Schemas Flat When Possible
Deeply nested schemas increase the chance of errors and make responses harder to parse. If you can flatten the structure, do it.
```python
# Avoid: deeply nested
class BadSchema(BaseModel):
    metadata: dict  # What goes in here? The model has to guess.
    details: dict

# Prefer: flat and explicit
class GoodSchema(BaseModel):
    title: str
    author: str
    published_date: str
    word_count: int
    category: str
    summary: str
```
2. Use Enums for Categorical Fields
Enums constrain the model to valid values. Without them, you get inconsistent strings like “High”, “high”, “HIGH”, and “h” for the same concept.
```python
from enum import Enum

class Priority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

class Ticket(BaseModel):
    title: str
    priority: Priority  # Guaranteed to be one of the four values
    assignee: str
```
3. Add Descriptions to Properties
Property descriptions act as instructions for the model. They are especially important for ambiguous field names.
```python
# In JSON Schema (Claude, Gemini)
schema = {
    "type": "object",
    "properties": {
        "confidence": {
            "type": "number",
            "minimum": 0,
            "maximum": 1,
            "description": "Confidence score between 0 and 1, where 1 means absolute certainty"
        },
        "reasoning": {
            "type": "string",
            "description": "Step-by-step explanation of how you arrived at the classification"
        }
    }
}
```
For Pydantic models (OpenAI), use Field:
```python
from pydantic import BaseModel, Field

class Classification(BaseModel):
    label: str = Field(description="The category label")
    confidence: float = Field(
        ge=0, le=1,
        description="Confidence score between 0 and 1"
    )
    reasoning: str = Field(
        description="Step-by-step explanation of the classification"
    )
```
4. Validate on Your End
Even with “guaranteed” schema adherence, always validate responses in your application code. Schema guarantees cover structure, not semantics. A model might return {"rating": 0.0, "title": ""} — valid JSON, valid schema, but useless data.
```python
def validate_review(review: MovieReview) -> bool:
    if not review.title.strip():
        return False
    if review.rating < 0 or review.rating > 10:
        return False
    if len(review.pros) == 0 and len(review.cons) == 0:
        return False
    return True
```
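In practice this kind of semantic validator usually sits inside a bounded retry loop that re-requests when the check fails. A self-contained sketch (fetch stands in for the structured-output API call; the dataclass and checks are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Review:
    title: str
    rating: float
    pros: list[str] = field(default_factory=list)
    cons: list[str] = field(default_factory=list)

def is_useful(r: Review) -> bool:
    # Semantic checks: schema-valid but empty/nonsense data fails here
    return bool(r.title.strip()) and 0 <= r.rating <= 10 and bool(r.pros or r.cons)

def get_valid_review(fetch, max_attempts: int = 3) -> Review:
    """Call fetch (a stand-in for the API call) until the result passes
    semantic validation, giving up after max_attempts tries."""
    for _ in range(max_attempts):
        review = fetch()
        if is_useful(review):
            return review
    raise ValueError(f"no valid review after {max_attempts} attempts")
```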
5. Design Schemas Visually
For complex schemas with nested objects, arrays, and constraints, use a visual schema builder to design and validate your schema before writing any code. The JSON Schema Builder on DevTk.AI lets you create schemas with drag-and-drop, preview the output, and export the JSON Schema definition directly.
Cost Impact of Structured Outputs
Structured outputs are not just about reliability — they directly reduce your API spend by eliminating unnecessary output tokens.
Unstructured vs. Structured Token Comparison
Consider asking a model to review a movie. Here is what you get without structured output:
```
The Matrix (1999) is a groundbreaking sci-fi film. Here's my review:

**Rating:** 9.2/10

**Pros:**
- Revolutionary visual effects and "bullet time" sequences
- Deep philosophical themes about reality and choice
- Excellent performances from Keanu Reeves and Laurence Fishburne

**Cons:**
- Some dialogue feels stilted
- The sequels diminished the original's impact somewhat

Overall, I highly recommend The Matrix. It's a must-watch for any sci-fi fan.
```
That response uses approximately 85-95 output tokens. With structured output:
```json
{
  "title": "The Matrix",
  "year": 1999,
  "rating": 9.2,
  "pros": [
    "Revolutionary visual effects and bullet time sequences",
    "Deep philosophical themes about reality and choice",
    "Excellent performances from Keanu Reeves and Laurence Fishburne"
  ],
  "cons": [
    "Some dialogue feels stilted",
    "Sequels diminished the original's impact"
  ],
  "recommendation": "must-watch"
}
```
That is approximately 60-70 output tokens. The savings grow with longer responses. For a 500-token unstructured analysis, a structured version might use 200-250 tokens.
Cost Savings at Scale
Using Claude Sonnet 4.5 output pricing ($15.00 per 1M tokens) as a reference:
| Scenario | Unstructured Tokens | Structured Tokens | Monthly Savings (1M requests) |
|---|---|---|---|
| Short extraction | 80 tokens | 35 tokens | $675 |
| Medium analysis | 200 tokens | 100 tokens | $1,500 |
| Long report | 500 tokens | 250 tokens | $3,750 |
At scale, structured outputs can save 40-60% on output token costs. Use the AI Token Counter to measure your actual token counts, and the AI Model Pricing Calculator to project costs across different providers.
Common Pitfalls
Even with structured output support from all major providers, there are several mistakes that trip up developers in production.
Pitfall 1: Trusting the Schema Too Much
Schema enforcement guarantees the structure of the response, not the quality. A model can return {"sentiment": "positive", "confidence": 0.99} for a clearly negative review if the prompt is ambiguous. Always pair structured outputs with clear instructions in the system prompt.
Pitfall 2: Ignoring Refusal Responses
OpenAI’s Structured Outputs can return a refusal instead of parsed data if the model determines the request violates content policies:
```python
response = client.beta.chat.completions.parse(
    model="gpt-5",
    response_format=MySchema,
    messages=[{"role": "user", "content": potentially_harmful_request}]
)

if response.choices[0].message.refusal:
    print(f"Request refused: {response.choices[0].message.refusal}")
else:
    result = response.choices[0].message.parsed
```
Always check for refusals before accessing .parsed.
Pitfall 3: Oversized Schemas
Large schemas with dozens of fields and deep nesting increase latency and token consumption. The schema itself counts toward context tokens. Split large extraction tasks into multiple focused calls rather than one massive schema.
Pitfall 4: Not Handling Streaming Correctly
When streaming structured outputs, you receive partial JSON that is not valid until the stream completes. Do not attempt to parse until the final chunk arrives:
```python
# OpenAI streaming with structured output
stream = client.beta.chat.completions.stream(
    model="gpt-5",
    response_format=MovieReview,
    messages=[{"role": "user", "content": "Review The Matrix"}]
)

with stream as response:
    for event in response:
        # Partial updates available via event.snapshot
        pass
    # Final parsed result
    final = response.get_final_completion()
    review = final.choices[0].message.parsed
```
Choosing the Right Approach
Here is a decision tree for selecting the right structured output method:
Do you need 100% guaranteed schema adherence?
- Yes: use OpenAI Structured Outputs or Gemini with response_schema
- No (high reliability is fine): Claude tool use works well

Are you already using Anthropic Claude for other reasons (quality, context window, prompt caching)?
- Yes: use Claude tool use with tool_choice; it is reliable enough for most production workloads
- No: consider OpenAI for the schema guarantee

Is your schema simple (flat object, few fields)?
- Yes: any provider works; OpenAI JSON mode or Gemini JSON mode is the simplest
- No (nested objects, enums, arrays of objects): use OpenAI Structured Outputs with Pydantic for the best developer experience

Are you building an agentic system where the model picks which tools to call?
- Yes: use function calling (all providers support it)
- No: use structured outputs (fixed format every time)
Bottom Line
Structured outputs have gone from a hack (regex parsing, retry loops) to a first-class feature across all major LLM providers. As of February 2026, OpenAI offers the most developer-friendly implementation with native Pydantic support and 100% schema guarantees. Claude delivers excellent reliability through tool use, with the added benefit of Anthropic’s strong instruction following. Gemini provides schema-enforced output with constrained decoding, similar to OpenAI’s approach.
For most production applications, the choice comes down to which provider you are already using. If you are starting fresh and structured output is a critical requirement, OpenAI’s Structured Outputs with Pydantic models gives you the least friction. If you are already invested in Claude’s ecosystem and value its reasoning quality, tool use with forced tool_choice is production-ready.
Regardless of provider, invest time in schema design. Keep schemas flat, use enums for categorical fields, add descriptions to ambiguous properties, and always validate the semantic content of responses even when the structural format is guaranteed.
Related tools and guides:
- JSON Schema Builder — Design structured output schemas visually with drag-and-drop
- AI Token Counter — Measure token counts before and after switching to structured output
- AI Model Pricing Calculator — Compare costs across 40+ models with your usage pattern
- How to Cut AI API Costs by 80% — Structured outputs are one of eight strategies covered
- AI API Pricing Comparison 2026 — Full pricing table for all providers
- Claude API Pricing Guide 2026 — Claude tool use pricing and prompt caching details
- OpenAI API Pricing Guide 2026 — GPT-5 structured output pricing