DevTk.AI
Current OpenAI API Pricing 2026: GPT-5.5, GPT-5.4, GPT-4o & Codex Costs

Current OpenAI API pricing per 1M tokens for GPT-5.5, GPT-5.4, GPT-5.2-Codex, GPT-5, GPT-4o, and o3. Includes cached input, Batch/Flex discounts, long-context pricing, and monthly cost examples.

By DevTk.AI. Published 2026-02-24. Updated 2026-05-10.

OpenAI API pricing in 2026 is easiest to reason about per 1 million tokens. If you need the current OpenAI API price quickly: GPT-5.5 is $5/M input and $30/M output, GPT-5.4 is $2.50/M input and $15/M output, and GPT-5.2-Codex is $1.75/M input and $14/M output before caching, Batch, Flex, or long-context modifiers.

For developer workloads, the practical shortlist is now GPT-5.5 for the hardest coding and agent tasks, GPT-5.4 for lower-cost frontier work, GPT-5.2-Codex for dedicated coding-agent API flows, and the older GPT-5, GPT-4o, and o3-family models for compatibility or specialized routing.

This guide focuses on facts that change often: model names, token pricing, context windows, processing modes, and rate-limit behavior. Always confirm availability in your OpenAI dashboard before migrating production traffic.

Quick Answer: OpenAI API Prices Per 1M Tokens

| Model | Input | Cached input | Output | Best fit |
|---|---|---|---|---|
| GPT-5.5 | $5.00 | $0.50 | $30.00 | Hard coding, agents, long-context professional work |
| GPT-5.4 | $2.50 | $0.25 | $15.00 | Frontier quality at lower cost |
| GPT-5.2-Codex | $1.75 | $0.175 | $14.00 | Dedicated Codex API coding-agent workloads |
| GPT-5 | $1.25 | - | $10.00 | Existing GPT-5 integrations |
| GPT-5 Mini | $0.25 | - | $2.00 | Cost-sensitive production routing |
| GPT-4o mini | $0.15 | - | $0.60 | Legacy budget multimodal work |

For a coding agent that uses 2M input tokens + 500K output tokens per month, the rough standard-price bill is about $25 on GPT-5.5, $12.50 on GPT-5.4, $10.50 on GPT-5.2-Codex, or $7.50 on GPT-5 before caching, Batch, or Flex discounts. Use the AI Model Pricing Calculator for your own token mix, or compare provider limits in the AI API rate limits guide.
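The arithmetic behind those figures can be sketched in a few lines of Python. This is a minimal illustration, not an official calculator: the rates are copied from the table above, and the dictionary keys and the `monthly_cost` helper are names chosen for this example only.

```python
# Standard short-context rates in USD per 1M tokens, copied from the
# quick-answer table. The keys are lookup labels for this sketch only.
RATES = {
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.2-codex": (1.75, 14.00),
    "gpt-5": (1.25, 10.00),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the standard-price bill in USD, before caching or discounts."""
    input_rate, output_rate = RATES[model]
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return round(cost, 2)


# The coding-agent workload from the paragraph above: 2M input + 500K output.
for model in RATES:
    print(f"{model}: ${monthly_cost(model, 2_000_000, 500_000):.2f}")
```

Swapping in your own token counts gives a first estimate; the calculator linked above covers more models and modifiers.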

OpenAI API Pricing Table (May 2026)

Prices are USD per 1 million tokens. For the latest GPT-5.5 and GPT-5.4 models, OpenAI lists separate short-context and long-context rates; long context starts above roughly 270K input tokens.

| Model | Standard input | Cached input | Standard output | Long-context input | Long-context output | Context | Max output | Best for |
|---|---|---|---|---|---|---|---|---|
| GPT-5.5 | $5.00 | $0.50 | $30.00 | $10.00 | $45.00 | 1M | 128K | Hard coding, agents, professional work |
| GPT-5.5 Pro | $30.00 | - | $180.00 | $60.00 | $270.00 | 1M | 128K | Highest-accuracy work |
| GPT-5.4 | $2.50 | $0.25 | $15.00 | $5.00 | $22.50 | 1M | 128K | Frontier quality at lower cost |
| GPT-5.4 mini | $0.75 | $0.075 | $4.50 | - | - | 400K | 128K | Lower-latency, lower-cost production |
| GPT-5.4 nano | $0.20 | $0.02 | $1.25 | - | - | See docs | See docs | High-volume simple work |
| GPT-5.2-Codex | $1.75 | $0.175 | $14.00 | - | - | 400K | 128K | Dedicated Codex API agent work |

Source: OpenAI API pricing and OpenAI models.
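To see how the long-context columns change a single request's cost, here is a rough sketch. It rests on two assumptions that are not confirmed in this guide: the threshold is treated as exactly 270,000 input tokens (the text only says "roughly 270K"), and the whole request is billed at long-context rates once the input crosses it. The `request_cost` helper and the dictionary keys are hypothetical names.

```python
# Approximate cutoff taken from the text above; confirm the exact value
# on OpenAI's pricing page before budgeting.
LONG_CONTEXT_THRESHOLD = 270_000

# (standard input, standard output, long input, long output), USD per 1M tokens.
RATES = {
    "gpt-5.5": (5.00, 30.00, 10.00, 45.00),
    "gpt-5.5-pro": (30.00, 180.00, 60.00, 270.00),
    "gpt-5.4": (2.50, 15.00, 5.00, 22.50),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in USD under the assumptions stated above."""
    std_in, std_out, long_in, long_out = RATES[model]
    # Assumption: the entire request moves to long-context rates once the
    # input exceeds the threshold.
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = long_in, long_out
    else:
        in_rate, out_rate = std_in, std_out
    return round((input_tokens * in_rate + output_tokens * out_rate) / 1_000_000, 4)


print(request_cost("gpt-5.5", 200_000, 10_000))  # below the threshold
print(request_cost("gpt-5.5", 400_000, 10_000))  # above the threshold
```

Under these assumptions, doubling the input from 200K to 400K tokens more than triples the cost of the request, which is why long-context traffic deserves its own budget line.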

Compatibility Models In The Site Catalog

The DevTk.AI canonical model table also keeps these OpenAI families for existing integrations and historical comparison:

| Model | Input | Output | Context | Max output | Use when |
|---|---|---|---|---|---|
| GPT-5 | $1.25 | $10.00 | 400K | 128K | You need the older GPT-5 baseline already deployed |
| GPT-5 Mini | $0.25 | $2.00 | 400K | 16K | Cost-sensitive GPT-5-family workloads |
| GPT-5 Nano | $0.05 | $0.40 | 128K | 16K | Very high-volume routing and extraction |
| GPT-4o | $2.50 | $10.00 | 128K | 16K | Legacy multimodal integrations |
| GPT-4o mini | $0.15 | $0.60 | 128K | 16K | Legacy budget multimodal integrations |
| o3-pro | $20.00 | $80.00 | 200K | 100K | Highest-cost reasoning tasks |
| o3 | $2.00 | $8.00 | 200K | 100K | Standard reasoning tasks |
| o3-mini | $1.10 | $4.40 | 200K | 100K | Lower-cost reasoning |

These entries are maintained in src/data/models.ts; check OpenAI’s live pricing page before quoting them in contracts or customer-facing estimates.

Batch, Flex, Priority, And Data Residency

OpenAI now publishes pricing by processing mode for the latest GPT-5.5 and GPT-5.4 families:

| Mode | Pricing behavior | Use it for |
|---|---|---|
| Standard | Baseline published token rates | Interactive production requests |
| Batch | 50% of standard rates | Offline jobs that can wait for asynchronous processing |
| Flex | 50% of standard rates | Cost-sensitive work that can tolerate variable latency |
| Priority | 2.5x standard rates for listed models | Latency-sensitive production spikes |
| Data residency / regional processing | 10% uplift for listed GPT-5.5 and GPT-5.4 models | Workloads with regional processing requirements |

Batch and Flex cut GPT-5.5 standard short-context pricing from $5/$30 to $2.50/$15 per million input/output tokens. Priority raises GPT-5.5 short-context pricing to $12.50/$75.
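The mode multipliers can be applied to any base rate, as in the sketch below. The multipliers come from the table above; the `effective_rates` helper is a hypothetical name, and combining the regional uplift multiplicatively with a processing mode is an assumption of this example rather than documented billing behavior.

```python
# Multipliers relative to standard rates, taken from the table above.
MODE_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.5}
REGIONAL_UPLIFT = 1.10  # the 10% data-residency uplift


def effective_rates(input_rate: float, output_rate: float,
                    mode: str = "standard", regional: bool = False):
    """Return (input, output) USD per 1M tokens for a processing mode."""
    multiplier = MODE_MULTIPLIER[mode]
    if regional:
        # Assumption: the uplift stacks multiplicatively on top of the mode.
        multiplier *= REGIONAL_UPLIFT
    return round(input_rate * multiplier, 4), round(output_rate * multiplier, 4)


# GPT-5.5 short-context standard rates are $5 input and $30 output.
for mode in MODE_MULTIPLIER:
    print(mode, effective_rates(5.00, 30.00, mode))
```

The batch and priority results match the $2.50/$15 and $12.50/$75 figures quoted above.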

Rate Limits And Usage Tiers

Do not hard-code public RPM or TPM tables into your planning. OpenAI states that rate limits are set at the organization and project level, vary by model, may be shared by model family, and can include separate limits for long-context requests.

OpenAI’s docs also distinguish rate limits from monthly usage limits. Your account can automatically graduate to higher usage tiers as API spend increases, but the exact limits for your organization should be read from the OpenAI dashboard.

Key planning rules:

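  • Track requests per minute (RPM), requests per day (RPD), tokens per minute (TPM), tokens per day (TPD), and, where relevant, images per minute (IPM).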
  • Treat long-context workloads separately because they can have separate limits.
  • Check shared-limit groups before assuming two related model IDs provide independent capacity.
  • Monitor Batch queue limits; queued tokens count until the batch completes.

Source: OpenAI rate limits and usage tiers.
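One way to follow the planning rules above is to track your own request and token counts in a sliding window and compare them against the limits shown in your dashboard. The sketch below is a local bookkeeping helper only; `UsageWindow` is a hypothetical class, it does not call any OpenAI API, and the limits passed to it are placeholders you would replace with your organization's real values.

```python
from collections import deque


class UsageWindow:
    """Track request and token counts over a sliding time window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, tokens) pairs, oldest first

    def record(self, timestamp: float, tokens: int) -> None:
        """Log one request and the tokens it consumed."""
        self.events.append((timestamp, tokens))

    def usage(self, now: float) -> tuple[int, int]:
        """Return (requests, tokens) still inside the window at time `now`."""
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()
        return len(self.events), sum(tokens for _, tokens in self.events)

    def would_exceed(self, now: float, tokens: int,
                     request_limit: int, token_limit: int) -> bool:
        """Check whether one more request of `tokens` would break a limit."""
        requests, used = self.usage(now)
        return requests + 1 > request_limit or used + tokens > token_limit


# A 60-second window approximates RPM/TPM; use 86_400 seconds for RPD/TPD.
window = UsageWindow(window_seconds=60)
window.record(timestamp=0, tokens=1_000)
window.record(timestamp=30, tokens=2_000)
print(window.usage(now=45))
```

Keeping a separate window per model family and per long-context route reflects the point that those limits may be shared or tracked independently.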

Monthly Cost Examples

These examples use the current GPT-5.5 and GPT-5.4 family published short-context standard rates, except for the rows marked Batch/Flex, which apply the 50% discount.

Solo Developer

3M input + 1.5M output tokens per month.

| Model | Monthly cost |
|---|---|
| GPT-5.4 nano | $2.48 |
| GPT-5.4 mini | $9.00 |
| GPT-5.4 | $30.00 |
| GPT-5.5 | $60.00 |
| GPT-5.5 Batch/Flex | $30.00 |

Startup Team

30M input + 15M output tokens per month.

| Model | Monthly cost |
|---|---|
| GPT-5.4 nano | $24.75 |
| GPT-5.4 mini | $90.00 |
| GPT-5.4 | $300.00 |
| GPT-5.5 | $600.00 |
| GPT-5.5 Batch/Flex | $300.00 |

Production Scale

300M input + 150M output tokens per month.

| Model | Monthly cost |
|---|---|
| GPT-5.4 nano | $247.50 |
| GPT-5.4 mini | $900.00 |
| GPT-5.4 | $3,000.00 |
| GPT-5.5 | $6,000.00 |
| GPT-5.5 Batch/Flex | $3,000.00 |

Long-context requests above the pricing threshold cost more for GPT-5.5, GPT-5.5 Pro, and GPT-5.4, so run your real prompts through a token counter before budgeting large document or repository workflows.

Want exact numbers for your usage pattern? Try our AI Model Pricing Calculator.

Which OpenAI Model Should You Choose?

GPT-5.5: Best For Hard Production Work

Start with GPT-5.5 when quality matters more than token price: coding agents, tool-heavy workflows, grounded assistants, long-context retrieval, product-spec-to-plan workflows, and customer-facing workflows where polished execution matters.

GPT-5.5 Pro: Highest Accuracy, Highest Cost

Use GPT-5.5 Pro only after evaluations show that the higher price produces enough quality lift. It is priced for the hardest professional tasks, not routine traffic.

GPT-5.4: Frontier Quality At A Lower Price

GPT-5.4 is the practical fallback when GPT-5.5 is too expensive but you still need a 1M context frontier model.

GPT-5.4 Mini And Nano: Default Routing Targets

Use GPT-5.4 mini for ordinary production requests that need good quality at lower latency and lower cost. Use GPT-5.4 nano for simple classification, extraction, tagging, routing, and formatting.

GPT-5, GPT-4o, And o3 Families: Compatibility And Specialized Routing

Keep existing GPT-5 or GPT-4o integrations if migration risk is higher than the expected savings. Route math, logic, and complex multi-step reasoning to the o3 family only when evals show it beats the general GPT-5 family on your task.
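The selection guidance above can be expressed as a small routing table. This is a sketch under stated assumptions: the task categories and the `pick_model` helper are invented for illustration, and while "gpt-5.5" and "gpt-5.4-mini" appear as model IDs elsewhere in this guide, "gpt-5.4" and "gpt-5.4-nano" are assumed IDs that should be checked against your dashboard.

```python
# Hypothetical task categories mapped to the models this guide recommends.
# "gpt-5.4" and "gpt-5.4-nano" are assumed IDs; verify before use.
ROUTES = {
    "classification": "gpt-5.4-nano",
    "extraction": "gpt-5.4-nano",
    "tagging": "gpt-5.4-nano",
    "formatting": "gpt-5.4-nano",
    "long_context": "gpt-5.4",
    "coding_agent": "gpt-5.5",
    "tool_workflow": "gpt-5.5",
}

# Ordinary production requests fall through to the mini model.
DEFAULT_MODEL = "gpt-5.4-mini"


def pick_model(task_type: str) -> str:
    """Return the model ID for a task type, or the default for unknown types."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


print(pick_model("classification"))
print(pick_model("coding_agent"))
print(pick_model("customer_chat"))
```

A static table like this is only a starting point; the routing should follow your own evaluation results, as the sections above recommend.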

Getting Started With The Current API

Prefer the Responses API for new work unless you have an existing Chat Completions integration.

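from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Send a single request through the Responses API.
response = client.responses.create(
    model="gpt-5.5",
    input="Review this API design and identify the highest-risk edge cases.",
)

# output_text holds the text output of the response.
print(response.output_text)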

For lower-cost routing:

# Reuses the client created in the previous example.
response = client.responses.create(
    model="gpt-5.4-mini",
    input="Extract company names, dates, and dollar amounts as JSON.",
)
print(response.output_text)
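For offline jobs, the Batch mode described earlier takes a JSONL file with one request per line. The sketch below only builds those lines and does not upload anything. It assumes the Batch API accepts the `/v1/responses` endpoint and that `max_output_tokens` is the Responses API parameter for capping output, so check both against the current OpenAI documentation. The `build_batch_lines` helper and the `custom_id` scheme are hypothetical.

```python
import json


def build_batch_lines(prompts, model="gpt-5.4-mini", max_output_tokens=500):
    """Return one JSON string per prompt, in Batch input-file format."""
    lines = []
    for index, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"request-{index}",  # used to match results to inputs
            "method": "POST",
            "url": "/v1/responses",  # assumed to be a supported Batch endpoint
            "body": {
                "model": model,
                "input": prompt,
                # Cap output so one runaway generation cannot dominate cost.
                "max_output_tokens": max_output_tokens,
            },
        }
        lines.append(json.dumps(request))
    return lines


prompts = [
    "Extract company names, dates, and dollar amounts as JSON.",
    "Classify this support ticket as billing, bug, or feature request.",
]
print("\n".join(build_batch_lines(prompts)))
```

Writing these lines to a `.jsonl` file gives you the input for a batch job; the upload and job-creation steps are left out because they require a live API key.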

Cost Optimization Checklist

  • Route simple tasks to mini or nano models before using GPT-5.5.
  • Use Batch or Flex for asynchronous workloads to cut token rates in half.
  • Keep reusable instructions and reference material stable so prompt caching can apply.
  • Treat long-context pricing as a separate budget line above the threshold.
  • Set output limits and structured formats so runaway generations do not dominate cost.
  • Read actual rate and usage limits from your OpenAI dashboard, not a static blog table.
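To put the prompt-caching item in numbers, the sketch below blends the standard and cached input rates by cache hit rate. It assumes cached tokens are billed at the cached-input rate and all other input tokens at the standard rate. The 60% hit rate is an invented figure for illustration, and `input_cost_with_cache` is a hypothetical helper.

```python
def input_cost_with_cache(input_tokens: int, cached_fraction: float,
                          input_rate: float, cached_rate: float) -> float:
    """Return input cost in USD when a fraction of input tokens hit the cache."""
    cached_tokens = input_tokens * cached_fraction
    uncached_tokens = input_tokens - cached_tokens
    cost = (uncached_tokens * input_rate + cached_tokens * cached_rate) / 1_000_000
    return round(cost, 2)


# Startup-team input volume (30M tokens) at GPT-5.5 rates:
# $5.00 standard input and $0.50 cached input per 1M tokens.
no_cache = input_cost_with_cache(30_000_000, 0.0, 5.00, 0.50)
with_cache = input_cost_with_cache(30_000_000, 0.6, 5.00, 0.50)
print(f"no caching: ${no_cache:.2f}, 60% cached: ${with_cache:.2f}")
```

Output tokens are unaffected by caching, so the saving applies to the input side of the bill only.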

Bottom Line

The old early-2026 framing is no longer the right starting point. In May 2026, the current OpenAI pricing conversation starts with GPT-5.5 for highest-quality work, GPT-5.4 for a lower-cost frontier option, and GPT-5.4 mini/nano for routed production traffic.

For most teams, the practical architecture is: GPT-5.4 mini or nano for routine work, GPT-5.4 for higher-quality long-context tasks, GPT-5.5 for the workflows where better execution changes the outcome, and Batch/Flex for every job that does not need an immediate response.
