AI Structured JSON Output: Model Support and Code Examples (2026)

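If you are building production applications on LLMs, you have almost certainly hit this: the model returns text that "looks like JSON", but JSON.parse() throws. Or the structure is roughly right, but one field name is misspelled or one type does not match, and the downstream code falls over.

This is not a rare glitch; it is one of the most common engineering pain points in LLM application development. A large model is fundamentally a text generator. It does not "understand" JSON syntax and is not bound by your schema. However detailed your prompt, it will occasionally return JSON with comments, JSON with a missing comma, or JSON prefixed with a line like "Sure, here is the JSON you asked for:".

The good news is that, as of February 2026, the three major API vendors (OpenAI, Anthropic, and Google) all offer structured output capabilities at different levels. From basic JSON Mode to strict schema validation to Function Calling, developers now have several tools for making model output match expectations.

This article walks through these capabilities systematically: how the vendors' implementations differ, complete code examples, and how to choose among them in different scenarios.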

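Why structured output matters for production applications

One premise first: if you are only experimenting or writing a demo, structured output matters much less. Adding "please return JSON" to the prompt is good enough.

Production is different. Your code has to parse the model's output reliably and hand it to the next stage: writing to a database, calling a third-party API, rendering a front-end UI. Every parse failure means a user-visible error, a retry, or even corrupted data.

Typical problems without structured output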

# You expect the model to return JSON like this:
# {"name": "Zhang San", "age": 28, "skills": ["Python", "TypeScript"]}

# But in practice you may get:
response_1 = '```json\n{"name": "Zhang San", "age": 28, "skills": ["Python", "TypeScript"]}\n```'
# Problem: wrapped in a Markdown code fence

response_2 = '{"name": "Zhang San", "age": "28", "skills": ["Python", "TypeScript"]}'
# Problem: age should be an int, but a string came back

response_3 = '{"name": "Zhang San", "age": 28, "skills": ["Python", "TypeScript"], }'
# Problem: trailing comma, so this is not valid JSON

response_4 = 'Sure, here is the extraction result:\n{"name": "Zhang San", "age": 28}'
# Problem: extra text in front of the JSON

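These problems show up at different rates in production. In our experience, JSON produced from prompt instructions alone parses successfully roughly 85-95% of the time. That sounds acceptable, but at 100,000 calls per day it means 5,000 to 15,000 failures every day.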
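To make the failure modes above concrete, here is a minimal sketch of the kind of "best-effort" parser teams often write when they rely on prompts alone. The helper name `parse_json_loosely` is hypothetical, not part of any SDK. It recovers from code fences and leading chatter, but it cannot repair a trailing comma and it never notices a wrong type.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker, built indirectly for readability


def parse_json_loosely(raw: str):
    """Best-effort recovery of a JSON object from free-form model output."""
    text = raw.strip()
    # Strip a surrounding Markdown code fence, with or without a "json" tag
    fenced = re.match(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Drop any chatter before the first "{" and after the last "}"
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None  # e.g. a trailing comma is still invalid JSON


# The type problem passes through silently: age stays a string
print(parse_json_loosely('{"name": "Zhang San", "age": "28"}'))
```

Helpers like this reduce the failure rate but do not remove it, which is why the rest of the article moves the guarantee into the API itself.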

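What structured output solves

Capability level	Guarantee	Representative implementation
JSON Mode	Output is valid JSON	OpenAI `response_format: {type: "json_object"}`
JSON Schema	Output strictly matches the given schema	OpenAI `response_format: {type: "json_schema"}`
Function Calling (tool_use)	Arguments match the function's declared schema (strictly, where a strict mode exists)	All three vendors
Constrained Decoding	Constraints enforced at the token generation level	Mechanism underlying OpenAI Structured Outputs

From top to bottom, the constraints get progressively stronger. JSON Mode only guarantees syntactically valid JSON; it says nothing about field names or types. JSON Schema adds schema conformance on top of that. Constrained decoding excludes tokens that would violate the schema while the model is generating, which eliminates format errors at the source.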
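The idea behind constrained decoding can be shown with a toy example. This is only an illustrative sketch, not any vendor's implementation: the tiny vocabulary, the two valid outputs, and the scores are all made up. At every step, tokens that cannot lead to a valid output are masked out before the highest-scoring one is chosen.

```python
# Toy "schema": the only valid outputs are these two strings
VALID_OUTPUTS = ['{"approved": true}', '{"approved": false}']
# Toy vocabulary, including tokens a chatty model might prefer
VOCAB = ['{', '}', '"approved"', ': ', 'true', 'false', 'Sure, ', 'maybe']


def allowed_tokens(prefix: str) -> list[str]:
    """Tokens that keep `prefix` a prefix of at least one valid output."""
    return [
        tok for tok in VOCAB
        if any(valid.startswith(prefix + tok) for valid in VALID_OUTPUTS)
    ]


def constrained_decode(scores: dict[str, float]) -> str:
    """Greedy decoding where disallowed tokens are masked before choosing."""
    out = ""
    while out not in VALID_OUTPUTS:
        candidates = allowed_tokens(out)
        # Pick the highest-scoring token among those the "schema" permits
        out += max(candidates, key=lambda tok: scores.get(tok, 0.0))
    return out


# The model "wants" to start with chatter, but that token is never eligible
scores = {"Sure, ": 9.0, "maybe": 8.0, "false": 0.6, "true": 0.4}
print(constrained_decode(scores))
```

Real implementations compile the JSON Schema into a grammar and apply the mask to the full token vocabulary, but the principle is the same: an invalid token is never sampled.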

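Structured output capabilities of the three vendors compared

As of February 2026, the three vendors still differ significantly:

Feature	OpenAI (GPT-5 / GPT-4.1)	Anthropic (Claude 4.5)	Google (Gemini 2.5)
JSON Mode	Supported	Not supported (done via prompt)	Supported (`response_mime_type`)
JSON Schema constraint	Supported (Structured Outputs)	Not supported (use tool_use instead)	Supported (`response_schema`)
Function Calling	Supported (strict mode, `strict: true`)	Supported (`tool_use`)	Supported
Constrained Decoding	Supported	Not supported	Supported
Schema subset restrictions	Requires `additionalProperties: false`; no `minItems`/`maxItems`	N/A (tool input is described with JSON Schema)	Some keywords not supported
100% format guarantee	Yes (Structured Outputs mode)	No (but tool_use success rate >99%)	Yes (`response_schema` mode)
Nested objects	Supported	Supported	Supported
Enums (enum)	Supported	Supported	Supported

Key difference: OpenAI currently has the most complete structured output stack, with all three levels available: JSON Mode, Structured Outputs, and strict-mode Function Calling. Google Gemini follows closely; response_schema combined with response_mime_type also gives strong constraints. Anthropic Claude takes a different route: there is no standalone JSON Mode or schema constraint, and structured output is obtained through the tool_use (Function Calling) mechanism. In practice it is similarly dependable, though the usage is a bit of a "hack".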

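OpenAI Structured Outputs: the most complete option

OpenAI's Structured Outputs is the most mature structured output implementation to date. It applies constrained decoding at the token generation level: when sampling each token, the model consults the JSON Schema you supplied and rules out token sequences that would be invalid. The output therefore conforms to your schema 100% of the time, not 99.9%.

Python example: defining the schema with Pydantic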

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

# Define the expected output structure as Pydantic models
class ExtractedEntity(BaseModel):
    name: str
    entity_type: str  # "person" | "company" | "location"
    confidence: float

class ExtractionResult(BaseModel):
    entities: list[ExtractedEntity]
    summary: str
    language: str

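# Send the request; response_format is bound to the Pydantic model
response = client.beta.chat.completions.parse(
    model="gpt-5",
    messages=[
        {
            "role": "system",
            "content": "Extract all named entities from the user's text. Return the entity list, a summary of the text, and the detected language."
        },
        {
            "role": "user",
            "content": "Alibaba Group announced yesterday that Zhang Yong will step down as CEO next month. Alibaba, headquartered in Hangzhou, also announced a new organizational restructuring plan."
        }
    ],
    response_format=ExtractionResult,
)

# The return value is already a typed Python object
result = response.choices[0].message.parsed

print(f"Detected language: {result.language}")
print(f"Summary: {result.summary}")
for entity in result.entities:
    print(f"  {entity.name} ({entity.entity_type}) - confidence: {entity.confidence}")

Example output:

Detected language: en
Summary: Alibaba Group announces a CEO change and an organizational restructuring
  Alibaba Group (company) - confidence: 0.98
  Zhang Yong (person) - confidence: 0.97
  Hangzhou (location) - confidence: 0.95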

TypeScript example: defining the schema with Zod

import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const client = new OpenAI();

// Define the output structure with Zod
const SentimentResult = z.object({
  sentiment: z.enum(["positive", "negative", "neutral", "mixed"]),
  confidence: z.number(),
  key_phrases: z.array(z.string()),
  aspects: z.array(
    z.object({
      aspect: z.string(),
      sentiment: z.enum(["positive", "negative", "neutral"]),
      reason: z.string(),
    })
  ),
});

async function analyzeSentiment(text: string) {
  const response = await client.beta.chat.completions.parse({
    model: "gpt-5",
    messages: [
      {
        role: "system",
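        content:
          "Analyze the sentiment of the user's text. Return the overall sentiment, a confidence score, key phrases, and a per-aspect sentiment breakdown.",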
      },
      { role: "user", content: text },
    ],
    response_format: zodResponseFormat(SentimentResult, "sentiment_result"),
  });

  // The return value is typed; the TS compiler can infer every field
  const result = response.choices[0].message.parsed!;
  return result;
}

// Usage
const analysis = await analyzeSentiment(
  "The food at this restaurant is delicious, but the service was poor and we waited forty minutes for our dishes."
);
console.log(analysis.sentiment); // "mixed"
console.log(analysis.aspects);
// [
//   { aspect: "food", sentiment: "positive", reason: "delicious" },
//   { aspect: "service", sentiment: "negative", reason: "poor attitude, long wait" }
// ]

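OpenAI schema restrictions

Keep the following constraints in mind when using Structured Outputs:

Every object must set additionalProperties: false
Validation keywords such as minItems, maxItems, minLength, and maxLength are not supported
pattern (regular expression) constraints are not supported
format (for example "format": "email") is not supported
All fields must be required; express an optional field as a union type, "type": ["string", "null"]

These restrictions are the price of constrained decoding: the model has to match the schema in real time while sampling, and overly complex constraints would hurt performance. When you need finer-grained validation, run a second pass over the structured output with a Pydantic validator or Zod refine.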
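As a rough illustration of that second pass, the sketch below checks the constraints the schema could not express. It uses only the standard library so it runs anywhere; in a real project a Pydantic validator would be the more idiomatic home for the same checks. The payload shape (name, email, tags) and the limits are hypothetical.

```python
import re

# Deliberately simple email pattern; an assumption for illustration only
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_contact(data: dict) -> list[str]:
    """Return a list of violations; an empty list means the payload passed."""
    errors = []
    name = data.get("name", "")
    if not 1 <= len(name) <= 50:
        errors.append("name must be 1-50 characters")  # stands in for minLength / maxLength
    if not EMAIL_RE.match(data.get("email", "")):
        errors.append("email is not a valid address")  # stands in for pattern / format
    tags = data.get("tags", [])
    if not 1 <= len(tags) <= 5:
        errors.append("tags must contain 1-5 items")  # stands in for minItems / maxItems
    return errors


print(validate_contact({"name": "Zhang San", "email": "zs@example.com", "tags": ["vip"]}))
print(validate_contact({"name": "", "email": "not-an-email", "tags": []}))
```

The split keeps each layer simple: the API guarantees the shape, and your own code enforces the business rules.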

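Claude structured output: the tool_use approach

Anthropic's Claude does not offer a standalone JSON Mode or Structured Outputs interface. The officially recommended approach is to obtain structured output through the tool_use (Function Calling) mechanism: define a "virtual tool" and have the model return the structured data as a call to that tool.

The approach looks like a hack, but it is very stable in practice. Claude fills in tool arguments according to the tool's JSON Schema, and format reliability is above 99%.

Python example: getting structured output with tool_use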

import anthropic
import json

client = anthropic.Anthropic()

# Define a "virtual tool"; no function is actually called
# The tool_use mechanism is used only to make the model return JSON that follows the schema
extraction_tool = {
    "name": "save_extraction_result",
    "description": "Save the structured information extracted from the text. This tool must be called for every input text.",
    "input_schema": {
        "type": "object",
        "properties": {
            "entities": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string", "description": "Entity name"},
                        "entity_type": {
                            "type": "string",
                            "enum": ["person", "company", "location", "date", "money"]
                        },
                        "confidence": {
                            "type": "number",
                            "description": "Confidence between 0 and 1"
                        }
                    },
                    "required": ["name", "entity_type", "confidence"]
                }
            },
            "summary": {
                "type": "string",
                "description": "One-sentence summary of the source text"
            },
            "language": {
                "type": "string",
                "enum": ["zh", "en", "ja", "ko", "other"]
            }
        },
        "required": ["entities", "summary", "language"]
    }
}

response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=4096,
    tools=[extraction_tool],
    tool_choice={"type": "tool", "name": "save_extraction_result"},  # force a call to this tool
    messages=[
        {
            "role": "user",
            "content": "Extract the entities from the following text: Alibaba Group announced yesterday that Zhang Yong will step down as CEO next month. Alibaba, headquartered in Hangzhou, also announced a new organizational restructuring plan."
        }
    ]
)

# Pull the structured data out of the tool_use response
for block in response.content:
    if block.type == "tool_use":
        result = block.input  # already a Python dict, no json.loads needed
        print(json.dumps(result, ensure_ascii=False, indent=2))

Example output:

{
  "entities": [
    {"name": "Alibaba Group", "entity_type": "company", "confidence": 0.98},
    {"name": "Zhang Yong", "entity_type": "person", "confidence": 0.97},
    {"name": "Hangzhou", "entity_type": "location", "confidence": 0.95}
  ],
  "summary": "Alibaba announces that Zhang Yong will step down as CEO and restructures its organization",
  "language": "en"
}

TypeScript example

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

interface ExtractionResult {
  entities: Array<{
    name: string;
    entity_type: "person" | "company" | "location" | "date" | "money";
    confidence: number;
  }>;
  summary: string;
  language: "zh" | "en" | "ja" | "ko" | "other";
}

async function extractEntities(text: string): Promise<ExtractionResult> {
  const response = await client.messages.create({
    model: "claude-sonnet-4-5-20250929",
    max_tokens: 4096,
    tools: [
      {
        name: "save_extraction_result",
        description: "Save the extraction result. This tool must be called for every text.",
        input_schema: {
          type: "object" as const,
          properties: {
            entities: {
              type: "array",
              items: {
                type: "object",
                properties: {
                  name: { type: "string" },
                  entity_type: {
                    type: "string",
                    enum: ["person", "company", "location", "date", "money"],
                  },
                  confidence: { type: "number" },
                },
                required: ["name", "entity_type", "confidence"],
              },
            },
            summary: { type: "string" },
            language: {
              type: "string",
              enum: ["zh", "en", "ja", "ko", "other"],
            },
          },
          required: ["entities", "summary", "language"],
        },
      },
    ],
    tool_choice: { type: "tool", name: "save_extraction_result" },
    messages: [{ role: "user", content: `Extract the entities from the following text: ${text}` }],
  });

  const toolBlock = response.content.find((block) => block.type === "tool_use");
  if (!toolBlock || toolBlock.type !== "tool_use") {
    throw new Error("The model did not return a tool_use block");
  }
  return toolBlock.input as ExtractionResult;
}

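Pros and cons of the Claude approach

Pros:

tool_choice: {"type": "tool", "name": "xxx"} forces the model to call the named tool, so it almost never skips the call
JSON Schema is fully expressive here, with support for nested objects, arrays, and enums
A single request can define several tools, letting the model pick a different output structure depending on the situation

Cons:

No 100% format guarantee (in rare cases the output may not match the schema)
The semantics are less intuitive: you define a "tool", but the real goal is to get structured data
tool_use consumes extra output tokens (tool name, parameter names, and so on), so it costs slightly more than plain text output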
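Because the guarantee is not 100%, it can be worth checking the returned tool input against the same input_schema before using it. The sketch below is a deliberately small checker covering only type, enum, required, properties, and items; it is an illustration, not a full JSON Schema validator. A library such as jsonschema would be the usual choice in production.

```python
JSON_TYPES = {
    "object": dict, "array": list, "string": str,
    "number": (int, float), "boolean": bool,
}


def check(value, schema: dict, path: str = "$") -> list[str]:
    """Collect mismatches between `value` and a small JSON Schema subset."""
    expected = JSON_TYPES.get(schema.get("type"))
    # bool is a subclass of int in Python, so reject it explicitly for numbers
    if expected and (not isinstance(value, expected)
                     or (schema["type"] == "number" and isinstance(value, bool))):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in enum")
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                errors += check(value[key], sub, f"{path}.{key}")
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors += check(item, schema["items"], f"{path}[{i}]")
    return errors


# A cut-down schema in the same style as the tool definition above
schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "language": {"type": "string", "enum": ["zh", "en"]},
    },
    "required": ["summary", "language"],
}
print(check({"summary": "ok", "language": "fr"}, schema))
```

If the list comes back non-empty, a retry or a fallback path is usually a better response than passing the payload downstream.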

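Gemini structured output: the response_schema approach

Google's Gemini 2.5 series supports structured output natively through two parameters, response_mime_type and response_schema. The approach is similar to OpenAI's Structured Outputs, and both give strong constraints.

Python example: using the google-genai SDK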

from google import genai
from google.genai import types
import json

client = genai.Client(api_key="your-api-key")

# Define the output schema
extraction_schema = types.Schema(
    type=types.Type.OBJECT,
    properties={
        "entities": types.Schema(
            type=types.Type.ARRAY,
            items=types.Schema(
                type=types.Type.OBJECT,
                properties={
                    "name": types.Schema(type=types.Type.STRING),
                    "entity_type": types.Schema(
                        type=types.Type.STRING,
                        enum=["person", "company", "location", "date", "money"]
                    ),
                    "confidence": types.Schema(type=types.Type.NUMBER),
                },
                required=["name", "entity_type", "confidence"]
            )
        ),
        "summary": types.Schema(type=types.Type.STRING),
        "language": types.Schema(
            type=types.Type.STRING,
            enum=["zh", "en", "ja", "ko", "other"]
        ),
    },
    required=["entities", "summary", "language"]
)

response = client.models.generate_content(
    model="gemini-2.5-pro",
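    contents="Extract the entities from the following text: Alibaba Group announced yesterday that Zhang Yong will step down as CEO next month. Alibaba, headquartered in Hangzhou, also announced a new organizational restructuring plan.",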
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=extraction_schema,
    )
)

result = json.loads(response.text)
print(json.dumps(result, ensure_ascii=False, indent=2))

TypeScript example: using the REST API

const API_KEY = process.env.GEMINI_API_KEY;

interface ExtractionResult {
  entities: Array<{
    name: string;
    entity_type: string;
    confidence: number;
  }>;
  summary: string;
  language: string;
}

async function extractWithGemini(text: string): Promise<ExtractionResult> {
  const response = await fetch(
    `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent?key=${API_KEY}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        contents: [
          {
            parts: [{ text: `Extract the entities from the following text: ${text}` }],
          },
        ],
        generationConfig: {
          responseMimeType: "application/json",
          responseSchema: {
            type: "OBJECT",
            properties: {
              entities: {
                type: "ARRAY",
                items: {
                  type: "OBJECT",
                  properties: {
                    name: { type: "STRING" },
                    entity_type: {
                      type: "STRING",
                      enum: ["person", "company", "location", "date", "money"],
                    },
                    confidence: { type: "NUMBER" },
                  },
                  required: ["name", "entity_type", "confidence"],
                },
              },
              summary: { type: "STRING" },
              language: {
                type: "STRING",
                enum: ["zh", "en", "ja", "ko", "other"],
              },
            },
            required: ["entities", "summary", "language"],
          },
        },
      }),
    }
  );

  const data = await response.json();
  return JSON.parse(data.candidates[0].content.parts[0].text);
}

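Gemini schema restrictions

Type names use the uppercase form (STRING, NUMBER, OBJECT, ARRAY)
$ref references and recursive schemas are not supported
The combinators oneOf, anyOf, and allOf are not supported
Enum values must be strings
Nesting depth is limited (generally no more than 5 levels)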
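If you already maintain a standard lowercase JSON Schema, a small converter can apply the restrictions listed above before the schema is sent to Gemini. This is a sketch based only on that list; the function name `to_gemini_schema` is hypothetical, and the 5-level limit is the rule of thumb from this article rather than a documented constant.

```python
UNSUPPORTED = {"$ref", "oneOf", "anyOf", "allOf"}


def to_gemini_schema(schema: dict, depth: int = 1, max_depth: int = 5) -> dict:
    """Convert a lowercase-typed JSON Schema dict into the uppercase-typed form."""
    if depth > max_depth:
        raise ValueError(f"schema nests deeper than {max_depth} levels")
    bad = UNSUPPORTED.intersection(schema)
    if bad:
        raise ValueError(f"unsupported keywords: {sorted(bad)}")
    out = dict(schema)
    # Only plain string types are handled; union types are left untouched
    if isinstance(out.get("type"), str):
        out["type"] = out["type"].upper()
    if "enum" in out and not all(isinstance(v, str) for v in out["enum"]):
        raise ValueError("enum values must be strings")
    if "properties" in out:
        out["properties"] = {
            name: to_gemini_schema(sub, depth + 1, max_depth)
            for name, sub in out["properties"].items()
        }
    if "items" in out:
        out["items"] = to_gemini_schema(out["items"], depth + 1, max_depth)
    return out


print(to_gemini_schema({
    "type": "object",
    "properties": {"language": {"type": "string", "enum": ["zh", "en"]}},
    "required": ["language"],
}))
```

Failing fast on an unsupported keyword is a deliberate choice here: a clear local error is easier to debug than a rejected API request.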

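Function Calling vs structured output: when to use which

Many developers find this confusing. Both produce structured JSON, but they were designed for different purposes and suit different scenarios.

The essential difference

Dimension	Structured output (Structured Outputs)	Function Calling (tool_use)
Design purpose	Get formatted data	Let the model call external tools or functions
Output content	The data you want, directly	Tool name plus arguments (you execute the tool and return the result)
Interaction flow	Single round: request -> structured response	Multiple rounds: request -> tool call -> execution -> result returned -> final reply
Typical use	Information extraction, classification, format conversion	Querying databases, calling APIs, performing actions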
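The multi-round flow in the right-hand column can be sketched without any API at all. In the example below the "model" is a stub that first requests a tool and then answers from the tool's result. Everything here is hypothetical scaffolding (`fake_model`, `get_order_status`, and the message shapes), meant only to show the loop structure.

```python
import json


def get_order_status(order_id: str) -> dict:
    """Hypothetical local tool; a real one would query a database or an API."""
    return {"order_id": order_id, "status": "shipped"}


TOOLS = {"get_order_status": get_order_status}


def fake_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM: asks for a tool first, then answers from its result."""
    last = messages[-1]
    if last["role"] == "tool":
        status = json.loads(last["content"])["status"]
        return {"type": "text", "content": f"Your order has {status}."}
    return {"type": "tool_call", "name": "get_order_status",
            "arguments": {"order_id": "A-1001"}}


def run_conversation(user_text: str, max_rounds: int = 5) -> str:
    messages = [{"role": "user", "content": user_text}]
    for _ in range(max_rounds):
        reply = fake_model(messages)
        if reply["type"] == "text":  # final answer, the loop ends
            return reply["content"]
        # The model asked for a tool: run it and feed the result back
        result = TOOLS[reply["name"]](**reply["arguments"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("model kept calling tools without answering")


print(run_conversation("Where is my order?"))
```

Structured output has no such loop: there is one request and one response, and the response is the data.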

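Decision guide

Use structured output when:

You are extracting structured information from text (NER, sentiment analysis, classification)
You are converting unstructured input into a structured format (resume parsing, invoice recognition)
You need the model to emit directly usable data with no follow-up interaction
The output format is strict and leaves no room for deviation

Use Function Calling when:

The model has to query external data (a database, an API) before it can answer
You are performing concrete actions (sending email, creating tickets, modifying files)
You are building an AI agent in which the model decides which tools to call
The workflow is complex and needs multiple rounds of tool calls

Special case: with Claude the two are the same thing:

As noted earlier, Claude has no standalone structured output API, so tool_use is the only way to get structured output. In the Claude ecosystem, Function Calling therefore plays both roles.

Code comparison: two implementations of the same task

Scenario: extract a rating and key information from a product review.

Option 1: OpenAI Structured Outputs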

from pydantic import BaseModel
from openai import OpenAI

class ReviewAnalysis(BaseModel):
    rating: float
    pros: list[str]
    cons: list[str]
    recommended: bool

client = OpenAI()
response = client.beta.chat.completions.parse(
    model="gpt-5",
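    messages=[
        {"role": "system", "content": "Analyze the product review and extract the rating and key information."},
        {"role": "user", "content": "These headphones sound great and the noise cancellation is first-rate, but the battery only lasts 4 hours, which is too short, and my ears hurt after wearing them for a long time. Overall 7.5 out of 10."}
    ],
    response_format=ReviewAnalysis,
)
result = response.choices[0].message.parsed
# result.rating = 7.5
# result.pros = ["great sound", "first-rate noise cancellation"]
# result.cons = ["short battery life (4 hours)", "uncomfortable over long sessions"]
# result.recommended = True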

Option 2: Claude tool_use

import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    max_tokens=1024,
    tools=[{
        "name": "save_review_analysis",
        "description": "Save the review analysis result",
        "input_schema": {
            "type": "object",
            "properties": {
                "rating": {"type": "number"},
                "pros": {"type": "array", "items": {"type": "string"}},
                "cons": {"type": "array", "items": {"type": "string"}},
                "recommended": {"type": "boolean"}
            },
            "required": ["rating", "pros", "cons", "recommended"]
        }
    }],
    tool_choice={"type": "tool", "name": "save_review_analysis"},
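    messages=[
        {"role": "user", "content": "Analyze this review: These headphones sound great and the noise cancellation is first-rate, but the battery only lasts 4 hours, which is too short, and my ears hurt after wearing them for a long time. Overall 7.5 out of 10."}
    ]
)
result = next(b.input for b in response.content if b.type == "tool_use")
# Same result; only the way you retrieve it differs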

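Both options get the job done. The difference is that the OpenAI version is more concise (.parse() hands you a typed object directly), while the Claude version requires you to pull the data out of the tool_use response yourself.

Best practices

The following structured output best practices come from extensive production use.

1. Choose the right constraint strength

Not every scenario needs the strongest constraint. Stronger constraints mean less flexibility and can sometimes hurt the model's "thinking quality".

Scenario	Recommended approach	Rationale
Data extraction / ETL pipelines	Structured Outputs (strict schema)	100% reliability, zero parse errors
Content classification / sentiment analysis	JSON Mode + simple validation	The schema is simple and does not need strict constraints
Structured output interleaved with multi-turn conversation	Function Calling	More flexible; the model can choose when to call
Complex reasoning + structured result	Let the model reason freely first, then extract the conclusion with structured output	A strict schema may limit reasoning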

2. Schema design principles

# Bad schema: cryptic field names, no descriptions
bad_schema = {
    "type": "object",
    "properties": {
        "d": {"type": "string"},   # the field name says nothing about its meaning
        "v": {"type": "number"},
        "f": {"type": "boolean"},
    }
}

# Good schema: self-explanatory field names, each with a description
good_schema = {
    "type": "object",
    "properties": {
        "diagnosis": {
            "type": "string",
            "description": "Root-cause diagnosis of the problem, in one sentence"
        },
        "severity": {
            "type": "integer",
            "description": "Severity score, an integer from 1 to 10"
        },
        "needs_escalation": {
            "type": "boolean",
            "description": "Whether the issue needs to be escalated"
        }
    },
    "required": ["diagnosis", "severity", "needs_escalation"]
}

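Key principles:

Make field names self-explanatory. The model relies on field names to understand your intent; customer_sentiment is far better than cs.
Make good use of description. Enum meanings, numeric ranges, and similar details are easier for the model to follow when they are written in description.
Limit nesting depth. More than 3 levels of nesting noticeably lowers output quality. If the structure is too complex, consider splitting it across several requests.
Mark optional fields as nullable. Not every field can be extracted from every input; marking optional fields with "type": ["string", "null"] is more sensible than asking the model to fill in empty strings.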
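These principles are mechanical enough to check automatically. Here is a minimal "schema lint" sketch. The thresholds are assumptions taken from the guidelines above: names of two characters or fewer count as cryptic, and object nesting deeper than 3 levels is flagged.

```python
def lint_schema(schema: dict, path: str = "$", depth: int = 1) -> list[str]:
    """Flag cryptic field names, missing descriptions, and deep object nesting."""
    warnings = []
    if depth > 3 and schema.get("type") == "object":
        warnings.append(f"{path}: nested deeper than 3 levels")
    for name, sub in schema.get("properties", {}).items():
        field = f"{path}.{name}"
        if len(name) <= 2:
            warnings.append(f"{field}: field name is too short to be self-explanatory")
        if "description" not in sub:
            warnings.append(f"{field}: missing description")
        warnings += lint_schema(sub, field, depth + 1)
    if "items" in schema:
        # Array items do not add an object level by themselves
        warnings += lint_schema(schema["items"], f"{path}[]", depth)
    return warnings


cryptic = {"type": "object", "properties": {"d": {"type": "string"}}}
for warning in lint_schema(cryptic):
    print(warning)
```

Running a check like this in CI keeps schemas readable as they grow, which matters because the schema is effectively part of the prompt.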

When designing a schema, you can use a JSON Schema builder to create and validate it visually, then copy the result straight into your code.

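3. Error handling and fallback strategy

Even with structured output, production code still needs error handling. Possible failures include the model refusing to respond (safety filtering), network timeouts, rate limits, and the rare schema mismatch.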

from pydantic import BaseModel, ValidationError
from openai import OpenAI
import json

client = OpenAI()

class AnalysisResult(BaseModel):
    category: str
    confidence: float
    explanation: str

def analyze_with_fallback(text: str) -> dict:
    """Structured output with a fallback chain."""
    # Option 1: try Structured Outputs (100% reliable)
    try:
        response = client.beta.chat.completions.parse(
            model="gpt-5",
            messages=[
                {"role": "system", "content": "Analyze and classify the text."},
                {"role": "user", "content": text}
            ],
            response_format=AnalysisResult,
        )
        if response.choices[0].message.parsed:
            return response.choices[0].message.parsed.model_dump()
    except Exception as e:
        print(f"Structured Outputs failed: {e}")

    # Option 2: fall back to JSON Mode
    try:
        response = client.chat.completions.create(
            model="gpt-5",
            messages=[
                {"role": "system", "content": "Analyze and classify the text. Return JSON: {category, confidence, explanation}"},
                {"role": "user", "content": text}
            ],
            response_format={"type": "json_object"},
        )
        data = json.loads(response.choices[0].message.content)
        return AnalysisResult(**data).model_dump()  # validate with Pydantic
    except (json.JSONDecodeError, ValidationError) as e:
        print(f"JSON Mode fallback also failed: {e}")

    # Option 3: last resort, plain text plus regex extraction
    response = client.chat.completions.create(
        model="gpt-5",
        messages=[
            {"role": "system", "content": "Analyze the text and reply in this format: Category: xxx\nConfidence: 0.xx\nExplanation: xxx"},
            {"role": "user", "content": text}
        ],
    )
    # Regex parsing... (omitted)
    return {"category": "unknown", "confidence": 0.0, "explanation": "fallback response"}
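Timeouts and rate limits are usually transient, so it often makes sense to retry the same tier a few times before dropping to a weaker one. The wrapper below is a generic sketch of exponential backoff with jitter. `TransientError` is a hypothetical stand-in for whatever retryable exceptions your SDK raises, and the delays are illustrative.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, rate limits)."""


def call_with_retry(fn, max_attempts: int = 4, base_delay: float = 0.5,
                    sleep=time.sleep):
    """Call `fn`, retrying transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and let the caller fall back
            # 0.5s, 1s, 2s, ... plus up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))


# Demo: a call that fails twice, then succeeds; sleeping is stubbed out
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("rate limited")
    return {"category": "billing"}

print(call_with_retry(flaky_call, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the wrapper easy to test, and the jitter avoids many clients retrying in lockstep after a shared rate-limit event.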

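4. Separate reasoning from output

For tasks that require complex reasoning, do not force the model to "think" under structural constraints. A better approach is to split the work into two steps: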

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class Decision(BaseModel):
    approved: bool
    risk_level: str  # "low" | "medium" | "high"
    reasons: list[str]

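# Placeholder input (hypothetical); replace with the applicant's real data
applicant_data = "Applicant: 35 years old, stable income, no prior defaults."

# Step 1: let the model reason freely (no structural constraints)
reasoning_response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are a loan approval expert. Analyze the following applicant information in detail and assess the risk item by item."},
        {"role": "user", "content": applicant_data}
    ],
)
reasoning = reasoning_response.choices[0].message.content

# Step 2: extract a structured decision from the reasoning
decision_response = client.beta.chat.completions.parse(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "Based on the following analysis, extract the final decision."},
        {"role": "user", "content": reasoning}
    ],
    response_format=Decision,
)
decision = decision_response.choices[0].message.parsed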

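The benefit of the two-step approach: step one is free of schema constraints, so the model can reason fully; step two is pure extraction, where structured output is at its most reliable. The cost is one extra API call, which is worth paying when the task is complex.

Cost impact: how structured output saves 50-70% of output tokens

This is an advantage many people overlook: structured output improves reliability and can also cut token consumption significantly.

Why structured output uses fewer tokens

Compare two kinds of output:

A typical reply without structured constraints (about 120 tokens):

Based on the analysis, the sentiment of this review is mixed. The positives are good sound quality and effective noise cancellation;
the negatives are short battery life and discomfort when worn. The overall rating is 7.5 out of 10.
I think this product is still worth recommending, despite some shortcomings.

Here is the structured result:
{
  "rating": 7.5,
  "pros": ["good sound quality", "effective noise cancellation"],
  "cons": ["short battery life", "uncomfortable to wear"],
  "recommended": true
}

A reply with structured constraints (about 45 tokens):

{"rating":7.5,"pros":["good sound quality","effective noise cancellation"],"cons":["short battery life","uncomfortable to wear"],"recommended":true}

The same information, with token usage down from 120 to 45, a saving of about 62%. The model no longer emits filler and repetition; it returns only the fields the schema requires.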

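Cost savings: a worked estimate

Take an entity extraction task at 100,000 requests per day (3M requests per month) and compare the monthly output cost with and without structured output:

Model	Unstructured (monthly)	Structured (monthly)	Savings
GPT-5 ($10/M output)	150 output tokens avg x 3M requests = 450M tokens x $10/M = $4,500	55 tokens avg x 3M requests = 165M tokens x $10/M = $1,650	$2,850 (63%)
Claude Sonnet 4.5 ($15/M)	$6,750	$2,475	$4,275 (63%)
Gemini 2.5 Pro ($10/M)	$4,500	$1,650	$2,850 (63%)

Output tokens typically cost 3-8 times as much as input tokens, so trimming output tokens pays off more than trimming input tokens. Structured output gives you more reliable data and saves a substantial amount of money along the way: a win on both counts.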
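The arithmetic behind the table is simple enough to script for your own numbers. A small sketch follows; the function names are ours, and a 30-day month is assumed, matching the 3M requests per month used above.

```python
def monthly_output_cost(tokens_per_request: float, requests_per_day: int,
                        usd_per_million_tokens: float, days: int = 30) -> float:
    """Monthly spend on output tokens, in USD."""
    total_tokens = tokens_per_request * requests_per_day * days
    return total_tokens / 1_000_000 * usd_per_million_tokens


def savings(before: float, after: float) -> tuple[float, float]:
    """Absolute saving in USD and relative saving as a fraction of `before`."""
    return before - after, (before - after) / before


# GPT-5 row of the table: 150 vs 55 output tokens at $10 per million
unstructured = monthly_output_cost(150, 100_000, 10)
structured = monthly_output_cost(55, 100_000, 10)
saved, ratio = savings(unstructured, structured)
print(f"${unstructured:,.0f} -> ${structured:,.0f}, saving ${saved:,.0f} ({ratio:.0%})")
```

Swapping in your own average token counts and current prices gives a like-for-like estimate for your workload.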

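Want a precise estimate for your own scenario? Measure the token difference between structured and unstructured output with an AI token calculator, then work out the monthly savings in an AI model pricing calculator.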

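Summary

Structured output has gone from an "optional optimization" to standard equipment for production LLM applications. To recap the three vendors' approaches:

OpenAI offers the most complete capability stack, from JSON Mode to Structured Outputs to strict-mode Function Calling, and constrained decoding guarantees 100% format compliance. If you mainly use GPT-5/GPT-4.1, Structured Outputs + Pydantic is the best choice.

Anthropic Claude achieves structured output through the tool_use mechanism. It is not native support, but reliability in practice is high (>99%). Forcing a tool with tool_choice lets you require that the model return structured data. If your project needs both Function Calling and structured output, Claude's approach is actually the more unified one.

Google Gemini's combination of response_schema and response_mime_type also delivers strong constraints. Together with Gemini 2.5 Flash's very low price (quoted here as $0.15/$0.60 per million input/output tokens), it has a clear cost advantage for high-volume data extraction.

Whichever vendor you choose, the core principles are the same:

Use structured output in production; do not rely on prompt instructions to get JSON
Design schemas clearly: self-explanatory field names, good use of description and enum
Separate reasoning from output: for complex tasks, reason freely first, then extract structure
Plan your fallbacks: step down from Structured Outputs to JSON Mode to plain text

Structured output raises reliability (from 85-95% to 99-100%) and can also save 50-70% of output token costs. It is one of the few optimizations that improves quality and lowers cost at the same time.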