AI API Pricing Comparison (May 2026): 40+ Models Side-by-Side Table

The AI model landscape is evolving at breakneck speed in 2026. In the last refresh cycle, DeepSeek V4 and Xiaomi MiMo-V2.5 changed the low-cost agent market while OpenAI’s GPT-5.5, Google’s Gemini 3.5 Flash / Gemini 3.1 Pro, and Anthropic’s Claude 4.6 family continue to define the high-capability tier. Eight major providers are competing on price, performance, and specialization — and the pricing gaps between them are wider than ever.

Whether you are building a production chatbot, a code assistant, or an agentic workflow, choosing the right model at the right price point can make or break your unit economics.

This guide breaks down the current pricing for every major AI API, helps you estimate monthly costs, and shows you where to save money without sacrificing quality.

The 2026 AI Provider Landscape

Eight providers dominate the API market in May 2026:

OpenAI remains the largest player. GPT-5.5 ($5/$30 per 1M, $0.50 cached input) is now the frontier model for complex coding and professional work, while GPT-5.2-Codex ($1.75/$14) is the dedicated Codex API model for long-horizon agentic coding.
Anthropic is led by the Claude 4.6 family. Opus 4.6 ($5/$25) is the maximum-capability option, Sonnet 4.6 ($3/$15) is the best-value flagship, and Haiku 4.5 ($1/$5) covers fast lower-cost work.
Google now includes Gemini 3.5 Flash ($1.50/$9, $0.15 cached input), Gemini 3.1 Pro ($2.00/$12 in this table), Gemini 3.1 Flash-Lite, and the Gemini 2.5 line. Note that current Gemini 2.5 Flash standard pricing is $0.30/$2.50, not the older $0.15/$0.60 estimate.
xAI now centers its API lineup on Grok 4 and Grok 4 Fast rather than Grok 3 as the current anchor.
Meta keeps Llama open-source, but hosted Llama 3.3 70B is available through third-party API providers at rock-bottom prices.
DeepSeek shifted the price floor with V4 Flash and V4 Pro. V4 Flash is $0.14/M cache-miss input, $0.0028/M cached input, and $0.28/M output, making cache-heavy agent workloads dramatically cheaper than old V3.2 estimates.
Xiaomi MiMo entered the agent API conversation with MiMo-V2.5-Pro and MiMo-V2.5, both offering 1M context, OpenAI/Anthropic-compatible APIs, cache-hit pricing, and MIT-licensed open weights.
Mistral holds its niche in Europe with Large 3 for complex tasks and Small 3.1 as one of the cheapest capable models on the market.

Full Pricing Table (Per 1M Tokens)

Here is the complete pricing breakdown as of May 2026. All prices are in USD per 1 million tokens.

Flagship / High-Capability Models

Model	Provider	Input (per 1M)	Output (per 1M)	Context Window	NEW
GPT-5.5	OpenAI	$5.00	$30.00	1.05M	Apr 2026
Claude Opus 4.6	Anthropic	$5.00	$25.00	1M beta	Feb 2026
Gemini 3.1 Pro	Google	$2.00	$12.00	2M	Feb 2026
GPT-5.5 Pro	OpenAI	$30.00	$180.00	1.05M	Apr 2026
GPT-5.4	OpenAI	$2.50	$15.00	1.05M
GPT-5	OpenAI	$1.25	$10.00	400K
Gemini 2.5 Pro	Google	$1.25	$10.00	2M
Grok 4	xAI	$3.00	$15.00	256K
o3	OpenAI	$2.00	$8.00	200K

Mid-Tier / Best Value Models

Model	Provider	Input (per 1M)	Output (per 1M)	Context Window	NEW
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	1M beta	Feb 2026
GPT-5.2-Codex	OpenAI	$1.75	$14.00	400K	Apr 2026
Gemini 3.5 Flash	Google	$1.50	$9.00	1.05M	May 2026
GPT-4o	OpenAI	$2.50	$10.00	128K
Mistral Large 3	Mistral	$2.00	$6.00	128K
DeepSeek V4 Pro	DeepSeek	$0.435	$0.87	1M	Official 1/4-of-original price after May 31
Xiaomi MiMo-V2.5-Pro	Xiaomi MiMo	$1.00	$3.00	1M
Grok 4 Fast	xAI	$0.20	$0.50	2M

Budget / High-Throughput Models

Model	Provider	Input (per 1M)	Output (per 1M)	Context Window
DeepSeek V4 Flash	DeepSeek	$0.14	$0.28	1M
Xiaomi MiMo-V2.5-Flash	Xiaomi MiMo	$0.10	$0.30	256K
Claude Haiku 4.5	Anthropic	$1.00	$5.00	200K
Gemini 3.1 Flash-Lite	Google	$0.25	$1.50	1M
Gemini 2.5 Flash	Google	$0.30	$2.50	1M
Gemini 2.5 Flash-Lite	Google	$0.10	$0.40	1M
Xiaomi MiMo-V2.5	Xiaomi MiMo	$0.40	$2.00	1M
Llama 3.3 70B	Meta (hosted)	$0.88	$0.88	128K
Mistral Small 3.1	Mistral	$0.20	$0.60	128K

Use our AI Model Pricing Calculator to run custom comparisons with your own usage patterns.

Best Model for Your Budget

Enterprise ($5,000+/month API spend)

At enterprise scale, you want maximum capability and reliability. GPT-5.5, Claude Opus 4.6, and Gemini 3.1 Pro are the current high-capability anchors for complex reasoning, long-form content, and agentic workflows. Gemini 2.5 Pro remains worth considering when you need a 2M-token context window at a lower per-token cost.

Recommended stack: GPT-5.5 or Claude Opus 4.6 for the hardest tasks, Sonnet 4.6 or GPT-5.4 for everyday production traffic, and Haiku/Flash/Small-class models for simple requests.

Startup ($500-$5,000/month)

This is where model routing matters most. Claude Sonnet 4.6, GPT-5.4, GPT-5.2-Codex, Gemini 3.5 Flash, and Mistral Large 3 are stronger current anchors than older GPT-4.1 or Claude 4.5-era recommendations. Pair them with Claude Haiku 4.5, Gemini 2.5 Flash-Lite, DeepSeek V4 Flash, or Grok 4 Fast for classification, summarization, and other high-volume tasks.

For a detailed head-to-head, see our OpenAI vs Anthropic comparison.

Hobby / Side Projects (Under $500/month)

Maximize every dollar. DeepSeek V4 Flash at $0.14/$0.28 and Xiaomi MiMo-V2.5-Flash at $0.10/$0.30 are among the most aggressive text API prices before cache-hit discounts. Gemini 2.5 Flash-Lite and Mistral Small 3.1 are strong mature-platform budget options, while Gemini 2.5 Flash is now priced more like a higher-throughput multimodal model.

For open-source enthusiasts, Llama 3.3 70B via hosted APIs offers flat pricing at $0.88 per million tokens for both input and output — a great option if your workload is output-heavy.

Check out our DeepSeek vs ChatGPT comparison for a deeper look at the cost-performance tradeoff.

Monthly Cost Estimates

To put these prices in perspective, here are estimated monthly costs for three common workloads. Each assumes a 1:1 input-to-output token ratio.

Light Usage (1M input + 1M output tokens/month)

Model	Monthly Cost
DeepSeek V4 Flash	$0.42
Xiaomi MiMo-V2.5-Flash	$0.40
Gemini 2.5 Flash-Lite	$0.50
Mistral Small 3.1	$0.80
Llama 3.3 70B	$1.76
Gemini 2.5 Flash	$2.80
Claude Haiku 4.5	$6.00
Gemini 3.5 Flash	$10.50
GPT-5	$11.25
Claude Sonnet 4.6	$18.00

Moderate Usage (50M input + 50M output tokens/month)

Model	Monthly Cost
DeepSeek V4 Flash	$21
Xiaomi MiMo-V2.5-Flash	$20
Gemini 2.5 Flash-Lite	$25
Mistral Small 3.1	$40
Llama 3.3 70B	$88
Gemini 2.5 Flash	$140
Claude Haiku 4.5	$300
Gemini 3.5 Flash	$525
GPT-5	$563
Claude Sonnet 4.6	$900

Heavy Usage (500M input + 500M output tokens/month)

Model	Monthly Cost
DeepSeek V4 Flash	$210
Xiaomi MiMo-V2.5-Flash	$200
Gemini 2.5 Flash-Lite	$250
Mistral Small 3.1	$400
Llama 3.3 70B	$880
Gemini 2.5 Flash	$1,400
Claude Haiku 4.5	$3,000
Gemini 3.5 Flash	$5,250
GPT-5	$5,625
Claude Sonnet 4.6	$9,000

Want exact numbers for your workload? Use the AI Token Counter to measure your actual prompt sizes, then plug them into the Pricing Calculator.

Hidden Costs to Watch

Raw per-token pricing does not tell the whole story. Keep these factors in mind:

Rate limits. Most providers impose requests-per-minute (RPM) and tokens-per-minute (TPM) limits on lower tiers. OpenAI and Anthropic both require usage history or prepayment to unlock higher rate limits. If your app needs burst capacity, budget for a higher tier or prepaid credits.

Cached input pricing. Anthropic offers prompt caching for Claude, where repeated system prompts are charged at a reduced rate after the first call. OpenAI has cached input prices for GPT-5.5, GPT-5.4, GPT-5.2-Codex, and other current models. DeepSeek V4 and Xiaomi MiMo now publish separate cache-hit prices; DeepSeek V4 Flash cached input is $0.0028/M, which can make repeated agent context far cheaper than a naive table estimate.

Batch API discounts. OpenAI’s Batch API offers 50% off for non-real-time workloads (24-hour turnaround). Anthropic’s Message Batches API provides similar discounts. If your use case allows asynchronous processing — think data labeling, content generation, or bulk analysis — always use batch endpoints.

Reasoning tokens. Models like o3 consume internal “thinking” tokens that you pay for but never see in the output. A single o3 request might use 5-10x more tokens than the visible output suggests. Monitor your actual token usage carefully with reasoning models.

Minimum spend / commitments. Some enterprise plans require monthly minimums. Google’s Gemini API has different pricing tiers depending on whether you use the free tier, pay-as-you-go, or provisioned throughput.

6 Ways to Reduce Your AI API Costs

Route by complexity. Not every request needs your most expensive model. Use a classifier (even a regex or keyword check) to send simple queries to Haiku/Flash/Small and only escalate complex ones to Opus/GPT-5.
Cache aggressively. If you send the same system prompt with every request, enable prompt caching. For Anthropic, cached prompts cost 90% less on subsequent calls. For application-level caching, store responses for identical or near-identical queries.
Use batch APIs for async work. Any task that does not need a real-time response — content moderation queues, document processing, weekly reports — should run through batch endpoints at half price.
Optimize your prompts. Shorter prompts cost less. Remove redundant instructions, compress examples, and use structured formats. A well-engineered prompt can be 30-50% shorter than a first draft while producing better results. Our AI Token Counter helps you measure exactly how many tokens each prompt version uses.
Fine-tune for repetitive tasks. If you are making thousands of similar API calls, a fine-tuned smaller model often outperforms a general-purpose large model at a fraction of the cost. OpenAI and Mistral both offer fine-tuning APIs.
Monitor and set budgets. All major providers offer usage dashboards and spending alerts. Set hard monthly limits to avoid surprise bills, especially during development and testing phases.

The Bottom Line

The 2026 AI API market offers more options at more price points than ever before. The pricing spread is enormous: from $0.10/1M input tokens (Gemini 2.5 Flash-Lite or Xiaomi MiMo-V2.5-Flash) to $5.00/1M (GPT-5.5 or Claude Opus 4.6), before the even higher GPT-5.5 Pro tier. The key to managing costs is not picking one model, but building a routing strategy that matches model capability to task complexity.

Start with the AI Model Pricing Calculator to model your specific use case, and revisit your model choices quarterly as pricing continues to evolve.

New model deep dives:

Gemini 3.5 Flash vs DeepSeek V4 - Price and agent-routing comparison
Gemini 3.1 Pro Pricing Guide — $2.00/M, 77.1% ARC-AGI-2, native video, 2M context
GPT-5.5 in Codex Pricing Guide — GPT-5.5, GPT-5.2-Codex, and DeepSeek routing costs

Provider deep dives:

DeepSeek API Pricing Guide 2026 — V4 Flash, V4 Pro permanent price cut, and cache-hit math
Xiaomi MiMo-V2.5 Agent Model Guide — MiMo pricing, Token Plan, Claude Code and OpenCode setup
Anthropic Claude API Pricing Guide 2026 — Opus vs Sonnet vs Haiku, prompt caching saves 90%
OpenAI API Pricing Guide 2026 — GPT-5.5, GPT-5.4, o3, batch API 50% off
Google Gemini API Pricing Guide 2026 - Gemini 3.5 Flash, 3.1 Pro, 2.5 Flash, Flash-Lite, and free tier details
Grok API Pricing Guide 2026 — Grok 4 and Grok 4 Fast pricing
Mistral API Pricing Guide 2026 — Large 3 at $2/M, Small 3.1 at $0.20/M, EU GDPR compliant
How to Cut AI API Costs by 80% — 8 proven strategies with code examples
Self-Hosting LLMs vs API: Cost Breakdown — GPU costs vs API costs, breakeven analysis
AI API Rate Limits Comparison — throughput limits by provider