Current OpenAI API Pricing 2026: GPT-5.5, GPT-5.4, GPT-4o & Codex Costs

OpenAI API pricing in 2026 is easiest to reason about per 1 million tokens. If you need the current OpenAI API price quickly: GPT-5.5 is $5/M input and $30/M output, GPT-5.4 is $2.50/M input and $15/M output, and GPT-5.3-Codex is $1.75/M input and $14/M output before caching, Batch, Flex, or long-context modifiers.

For developer workloads, the practical shortlist is now GPT-5.5 for the hardest coding and agent tasks, GPT-5.4 for lower-cost frontier work, GPT-5.3-Codex for dedicated coding-agent API flows, and cheaper GPT-5/GPT-4o/o3-family models for compatibility or specialized routing.

This guide focuses on facts that change often: model names, token pricing, context windows, processing modes, and rate-limit behavior. Always confirm availability in your OpenAI dashboard before migrating production traffic.

Quick Answer: OpenAI API Prices Per 1M Tokens

Model	Input	Cached input	Output	Best fit
GPT-5.5	$5.00	$0.50	$30.00	Hard coding, agents, long-context professional work
GPT-5.4	$2.50	$0.25	$15.00	Frontier quality at lower cost
GPT-5.3-Codex	$1.75	$0.175	$14.00	Dedicated Codex API coding-agent workloads
GPT-5	$1.25	$0.125	$10.00	Existing GPT-5 integrations
GPT-5 Mini	$0.25	$0.025	$2.00	Cost-sensitive production routing
GPT-4o mini	$0.15	$0.075	$0.60	Legacy budget multimodal work

For a coding agent that uses 2M input tokens + 500K output tokens per month, the rough standard-price bill is about $25 on GPT-5.5, $12.50 on GPT-5.4, $10.50 on GPT-5.3-Codex, or $7.50 on GPT-5 before caching, Batch, or Flex discounts. Use the AI Model Pricing Calculator for your own token mix, or compare provider limits in the AI API rate limits guide.
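The arithmetic behind those figures is short enough to check by hand. The sketch below is illustrative only: the `RATES` dictionary and the `monthly_cost` helper are hypothetical names, and the rates are copied from the quick-answer table above.

```python
# Standard short-context rates in USD per 1M tokens (input, output),
# copied from the quick-answer table; not fetched from any API.
RATES = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.3-Codex": (1.75, 14.00),
    "GPT-5": (1.25, 10.00),
}


def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the standard-rate cost in USD, ignoring caching and mode discounts."""
    input_rate, output_rate = RATES[model]
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate


# The coding-agent example: 2M input tokens + 500K output tokens per month.
for name in RATES:
    print(f"{name}: ${monthly_cost(name, 2_000_000, 500_000):.2f}")
```

Swapping in your own token counts gives a first-pass estimate before any caching, Batch, or Flex adjustment.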

OpenAI API Pricing Table (June 2026)

Prices are USD per 1 million tokens. For the latest GPT-5.5 and GPT-5.4 models, OpenAI lists separate short-context and long-context rates; long context starts above roughly 270K input tokens.

Model	Standard input	Cached input	Standard output	Long-context input	Long-context output	Context	Max output	Best for
GPT-5.5	$5.00	$0.50	$30.00	$10.00	$45.00	1M	128K	Hard coding, agents, professional work
GPT-5.5 Pro	$30.00	-	$180.00	$60.00	$270.00	1M	128K	Highest-accuracy work
GPT-5.4	$2.50	$0.25	$15.00	$5.00	$22.50	1M	128K	Frontier quality at lower cost
GPT-5.4 mini	$0.75	$0.075	$4.50	-	-	400K	128K	Lower-latency, lower-cost production
GPT-5.4 nano	$0.20	$0.02	$1.25	-	-	See docs	See docs	High-volume simple work
GPT-5.3-Codex	$1.75	$0.175	$14.00	-	-	400K	128K	Dedicated Codex API agent work

Source: OpenAI API pricing and OpenAI models.
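To see how the long-context columns change a single request's price, here is a rough sketch. It makes two assumptions that should be checked against OpenAI's pricing page: the cutoff is set to 270,000 input tokens (the table note only says "roughly 270K"), and the long-context rate is applied to the whole request once the input crosses that cutoff. `LONG_CONTEXT_THRESHOLD` and `request_cost` are hypothetical names.

```python
# ASSUMPTION: approximate cutoff taken from the "roughly 270K" note above.
LONG_CONTEXT_THRESHOLD = 270_000

# (short input, short output, long input, long output) in USD per 1M tokens,
# copied from the pricing table above.
RATES = {
    "GPT-5.5": (5.00, 30.00, 10.00, 45.00),
    "GPT-5.5 Pro": (30.00, 180.00, 60.00, 270.00),
    "GPT-5.4": (2.50, 15.00, 5.00, 22.50),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Price one request in USD.

    ASSUMPTION: when input exceeds the threshold, the long-context rates
    apply to all input and output tokens of that request.
    """
    short_in, short_out, long_in, long_out = RATES[model]
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = long_in, long_out
    else:
        in_rate, out_rate = short_in, short_out
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


print(f"{request_cost('GPT-5.5', 200_000, 4_000):.2f}")  # below the threshold
print(f"{request_cost('GPT-5.5', 400_000, 4_000):.2f}")  # above the threshold
```

Under these assumptions, doubling the input from 200K to 400K tokens more than triples the request cost, which is why large document workloads deserve their own budget line.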

Compatibility Models In The Site Catalog

The DevTk.AI canonical model table also keeps these OpenAI families for existing integrations and historical comparison:

Model	Input	Output	Context	Max output	Use when
GPT-5	$1.25	$10.00	400K	128K	You need the older GPT-5 baseline already deployed
GPT-5 Mini	$0.25	$2.00	400K	128K	Cost-sensitive GPT-5-family workloads
GPT-5 Nano	$0.05	$0.40	400K	128K	Very high-volume routing and extraction
GPT-4o	$2.50	$10.00	128K	16K	Deprecated multimodal integrations
GPT-4o mini	$0.15	$0.60	128K	16K	Legacy budget multimodal integrations
o3-pro	$20.00	$80.00	200K	100K	Highest-cost reasoning tasks
o3	$2.00	$8.00	200K	100K	Standard reasoning tasks
o3-mini	$1.10	$4.40	200K	100K	Lower-cost reasoning

These entries are maintained in src/data/models.ts; check OpenAI’s live pricing page before quoting them in contracts or customer-facing estimates.

Batch, Flex, Priority, And Data Residency

OpenAI now publishes separate pricing by processing mode for the latest GPT-5.5 and GPT-5.4 families:

Mode	Pricing behavior	Use it for
Standard	Baseline published token rates	Interactive production requests
Batch	50% of standard rates	Offline jobs that can wait for asynchronous processing
Flex	50% of standard rates	Cost-sensitive work that can tolerate variable latency
Priority	2.5x standard rates for listed models	Latency-sensitive production spikes
Data residency / regional processing	10% uplift for listed GPT-5.5 and GPT-5.4 models	Workloads with regional processing requirements

Batch and Flex cut GPT-5.5 standard short-context pricing from $5/$30 to $2.50/$15 per million input/output tokens. Priority raises GPT-5.5 short-context pricing to $12.50/$75.
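The mode table reduces to a set of multipliers on the standard rates. The following is a minimal sketch with hypothetical names (`MODE_MULTIPLIER`, `effective_rates`). It assumes the data-residency uplift multiplies on top of the mode multiplier, which the table above does not confirm.

```python
# Multipliers taken from the processing-mode table above.
MODE_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.5}

# 10% uplift for regional processing on listed models.
RESIDENCY_UPLIFT = 1.10


def effective_rates(input_rate, output_rate, mode="standard", data_residency=False):
    """Return (input, output) USD rates per 1M tokens after mode adjustments.

    ASSUMPTION: the residency uplift stacks multiplicatively with the mode.
    """
    multiplier = MODE_MULTIPLIER[mode]
    if data_residency:
        multiplier *= RESIDENCY_UPLIFT
    return round(input_rate * multiplier, 4), round(output_rate * multiplier, 4)


# GPT-5.5 short-context standard rates are $5 input / $30 output.
print(effective_rates(5.00, 30.00, "batch"))     # (2.5, 15.0)
print(effective_rates(5.00, 30.00, "priority"))  # (12.5, 75.0)
```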

Rate Limits And Usage Tiers

Do not hard-code public RPM or TPM tables into your planning. OpenAI states that rate limits are set at the organization and project level, vary by model, may be shared by model family, and can include separate limits for long-context requests.

OpenAI’s docs also distinguish rate limits from monthly usage limits. Your account can automatically graduate to higher usage tiers as API spend increases, but the exact limits for your organization should be read from the OpenAI dashboard.

Key planning rules:

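Track requests per minute (RPM), requests per day (RPD), tokens per minute (TPM), tokens per day (TPD), and, where relevant, images per minute (IPM).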
Treat long-context workloads separately because they can have separate limits.
Check shared-limit groups before assuming two related model IDs provide independent capacity.
Monitor Batch queue limits; queued tokens count until the batch completes.

Source: OpenAI rate limits and usage tiers.
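One way to act on these planning rules is a small client-side tracker that checks a rolling 60-second window before each request. This is only a sketch: `MinuteWindow` is a hypothetical helper, not part of the OpenAI SDK, and the limits are passed in rather than hard-coded so they can be read from your dashboard. It covers RPM and TPM only.

```python
import time
from collections import deque


class MinuteWindow:
    """Track requests and tokens sent in a rolling 60-second window."""

    def __init__(self, rpm_limit, tpm_limit, clock=time.monotonic):
        self.rpm_limit = rpm_limit  # requests per minute, from your dashboard
        self.tpm_limit = tpm_limit  # tokens per minute, from your dashboard
        self.clock = clock          # injectable so the window can be tested
        self.events = deque()       # (timestamp, tokens) per recorded request

    def _prune(self, now):
        # Drop events that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def can_send(self, tokens):
        """Return True if one more request of `tokens` fits both limits."""
        self._prune(self.clock())
        used_tokens = sum(t for _, t in self.events)
        within_rpm = len(self.events) < self.rpm_limit
        within_tpm = used_tokens + tokens <= self.tpm_limit
        return within_rpm and within_tpm

    def record(self, tokens):
        """Record a request that was actually sent."""
        self.events.append((self.clock(), tokens))


# Placeholder limits for illustration only; use the values for your organization.
window = MinuteWindow(rpm_limit=2, tpm_limit=1_000)
if window.can_send(600):
    window.record(600)
print(window.can_send(500))  # False: 600 + 500 exceeds the 1,000-token window
```

Because limits can be shared across a model family, a single tracker per shared-limit group is likely a safer default than one tracker per model ID.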

Monthly Cost Examples

These examples use the current published short-context standard rates for GPT-5.5 and the GPT-5.4 family. The Batch/Flex rows apply the 50% discount to the GPT-5.5 standard rate.

Solo Developer

3M input + 1.5M output tokens per month.

Model	Monthly cost
GPT-5.4 nano	$2.48
GPT-5.4 mini	$9.00
GPT-5.4	$30.00
GPT-5.5	$60.00
GPT-5.5 Batch/Flex	$30.00

Startup Team

30M input + 15M output tokens per month.

Model	Monthly cost
GPT-5.4 nano	$24.75
GPT-5.4 mini	$90.00
GPT-5.4	$300.00
GPT-5.5	$600.00
GPT-5.5 Batch/Flex	$300.00

Production Scale

300M input + 150M output tokens per month.

Model	Monthly cost
GPT-5.4 nano	$247.50
GPT-5.4 mini	$900.00
GPT-5.4	$3,000.00
GPT-5.5	$6,000.00
GPT-5.5 Batch/Flex	$3,000.00

Long-context requests above the pricing threshold cost more for GPT-5.5, GPT-5.5 Pro, and GPT-5.4, so run your real prompts through a token counter before budgeting large document or repository workflows.
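The three scenario tables above can be regenerated from the rate table with a few lines. The sketch below uses `Decimal` instead of floats so that the GPT-5.4 nano solo figure ($2.475) rounds half-up to $2.48 as shown. `SCENARIOS` and `monthly_cost` are hypothetical names, and the token volumes are in millions.

```python
from decimal import Decimal, ROUND_HALF_UP

# Standard short-context rates in USD per 1M tokens (input, output).
RATES = {
    "GPT-5.4 nano": (Decimal("0.20"), Decimal("1.25")),
    "GPT-5.4 mini": (Decimal("0.75"), Decimal("4.50")),
    "GPT-5.4": (Decimal("2.50"), Decimal("15.00")),
    "GPT-5.5": (Decimal("5.00"), Decimal("30.00")),
}

# Monthly volume in millions of tokens (input, output).
SCENARIOS = {
    "Solo Developer": (Decimal("3"), Decimal("1.5")),
    "Startup Team": (Decimal("30"), Decimal("15")),
    "Production Scale": (Decimal("300"), Decimal("150")),
}


def monthly_cost(model, scenario, discount=Decimal("1")):
    """Return the monthly cost in USD, rounded half-up to cents."""
    in_rate, out_rate = RATES[model]
    in_millions, out_millions = SCENARIOS[scenario]
    total = (in_millions * in_rate + out_millions * out_rate) * discount
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for scenario in SCENARIOS:
    print(scenario)
    for model in RATES:
        print(f"  {model}: ${monthly_cost(model, scenario):,.2f}")
    # Batch and Flex are both 50% of the standard rate.
    half_price = monthly_cost("GPT-5.5", scenario, Decimal("0.5"))
    print(f"  GPT-5.5 Batch/Flex: ${half_price:,.2f}")
```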

Want exact numbers for your usage pattern? Try our AI Model Pricing Calculator.

Which OpenAI Model Should You Choose?

GPT-5.5: Best For Hard Production Work

Start with GPT-5.5 when quality matters more than token price: coding agents, tool-heavy workflows, grounded assistants, long-context retrieval, product-spec-to-plan pipelines, and customer-facing work where polished execution matters.

GPT-5.5 Pro: Highest Accuracy, Highest Cost

Use GPT-5.5 Pro only after evaluations show that the higher price produces enough quality lift. It is priced for the hardest professional tasks, not routine traffic.

GPT-5.4: Frontier Quality At A Lower Price

GPT-5.4 is the practical fallback when GPT-5.5 is too expensive but you still need a frontier model with a 1M-token context window.

GPT-5.4 Mini And Nano: Default Routing Targets

Use GPT-5.4 mini for ordinary production requests that need good quality at lower latency and lower cost. Use GPT-5.4 nano for simple classification, extraction, tagging, routing, and formatting.

GPT-5, GPT-4o, And o3 Families: Compatibility And Specialized Routing

Keep existing GPT-5 or GPT-4o integrations if migration risk is higher than the expected savings. Route math, logic, and complex multi-step reasoning to the o3 family only when evals show it beats the general GPT-5 family on your task.
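The selection guidance in this section can be captured as a simple lookup. Treat the sketch below as a starting point under stated assumptions: the task labels are hypothetical, `choose_model` is an illustrative helper, and the `gpt-5.4` and `gpt-5.4-nano` IDs are assumed to follow the same naming pattern as the `gpt-5.5` and `gpt-5.4-mini` IDs used elsewhere in this guide.

```python
# Hypothetical task labels mapped to model IDs.
# ASSUMPTION: "gpt-5.4" and "gpt-5.4-nano" are guessed from the naming pattern;
# confirm the exact IDs in your OpenAI dashboard.
ROUTES = {
    "classification": "gpt-5.4-nano",
    "extraction": "gpt-5.4-nano",
    "tagging": "gpt-5.4-nano",
    "formatting": "gpt-5.4-nano",
    "long_context": "gpt-5.4",
    "coding_agent": "gpt-5.5",
    "tool_heavy": "gpt-5.5",
}


def choose_model(task_type: str, default: str = "gpt-5.4-mini") -> str:
    """Return the model for a task label, falling back to the mini tier."""
    return ROUTES.get(task_type, default)


print(choose_model("extraction"))    # gpt-5.4-nano
print(choose_model("coding_agent"))  # gpt-5.5
print(choose_model("chat"))          # gpt-5.4-mini (unlisted tasks use the default)
```

Defaulting to the mini tier keeps unclassified traffic cheap. Escalation to GPT-5.5 Pro or the o3 family is left out on purpose, since the text recommends those only after evaluations justify the cost.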

Getting Started With The Current API

Prefer the Responses API for new work unless you have an existing Chat Completions integration.

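from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

# Send a single-turn request through the Responses API.
response = client.responses.create(
    model="gpt-5.5",
    input="Review this API design and identify the highest-risk edge cases.",
)

# output_text is the concatenated text of the model's reply.
print(response.output_text)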

For lower-cost routing:

# Reuses the client created in the previous example.
response = client.responses.create(
    model="gpt-5.4-mini",
    input="Extract company names, dates, and dollar amounts as JSON.",
)

print(response.output_text)
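To tie a response back to the price tables, a per-request cost can be estimated from the token counts the API reports. The sketch below assumes the usage object exposes `input_tokens`, `output_tokens`, and `input_tokens_details.cached_tokens`, and that cached tokens are included in the input total; verify both against your SDK version. `usage_cost` is a hypothetical helper, and a `SimpleNamespace` stands in for a real response so the snippet runs offline.

```python
from types import SimpleNamespace

# (input, cached input, output) in USD per 1M tokens, from the tables above.
RATES = {
    "gpt-5.5": (5.00, 0.50, 30.00),
    "gpt-5.4-mini": (0.75, 0.075, 4.50),
}


def usage_cost(model, usage):
    """Estimate one request's cost in USD from a usage-like object.

    ASSUMPTION: cached tokens are a subset of input_tokens and are billed
    at the cached-input rate instead of the standard input rate.
    """
    input_rate, cached_rate, output_rate = RATES[model]
    details = getattr(usage, "input_tokens_details", None)
    cached = getattr(details, "cached_tokens", 0) if details is not None else 0
    uncached = usage.input_tokens - cached
    total = uncached * input_rate + cached * cached_rate + usage.output_tokens * output_rate
    return total / 1_000_000


# Stand-in for response.usage: 1,200 input tokens (1,000 cached), 300 output.
fake_usage = SimpleNamespace(
    input_tokens=1_200,
    output_tokens=300,
    input_tokens_details=SimpleNamespace(cached_tokens=1_000),
)
print(f"${usage_cost('gpt-5.5', fake_usage):.4f}")
```

With a live client, passing `response.usage` in place of `fake_usage` should work if the field names match.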

Cost Optimization Checklist

Route simple tasks to mini or nano models before using GPT-5.5.
Use Batch or Flex for asynchronous workloads to cut token rates in half.
Keep reusable instructions and reference material stable so prompt caching can apply.
Treat long-context pricing as a separate budget line above the threshold.
Set output limits and structured formats so runaway generations do not dominate cost.
Read actual rate and usage limits from your OpenAI dashboard, not a static blog table.

Bottom Line

The old early-2026 framing is no longer the right starting point. In June 2026, the current OpenAI pricing conversation starts with GPT-5.5 for highest-quality work, GPT-5.4 for a lower-cost frontier option, and GPT-5.4 mini/nano for routed production traffic.

For most teams, the practical architecture is: GPT-5.4 mini or nano for routine work, GPT-5.4 for higher-quality long-context tasks, GPT-5.5 for the workflows where better execution changes the outcome, and Batch/Flex for every job that does not need an immediate response.
