AI Model Pricing Directory
Browse pricing for 40+ AI models including GPT-5, Claude, Gemini, DeepSeek, Llama & more. Compare input/output costs, context windows, and capabilities.
Last updated: 2026-06-14 · 47 models from 14 providers
7 provider sources need re-verification: Meta, Meta (via providers), Alibaba Cloud, Cohere, AI21 Labs, AWS Bedrock, Amazon.
47 models · 14 providers · $0.035 cheapest input per 1M tokens · 1.1M max context
Need to estimate costs?
Use our interactive Pricing Calculator to compare models side-by-side and estimate monthly costs.
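
For a quick back-of-the-envelope figure, monthly spend is roughly tokens per request times the per-million rate, times request volume. The sketch below applies that arithmetic to three rows from this directory. The `Price` class, the `monthly_cost` helper, and the workload numbers are illustrative assumptions, not part of the site's calculator, and the estimate ignores caching, batch discounts, and long-context surcharges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """List price in USD per one million tokens."""
    input_per_m: float
    output_per_m: float


# Rates copied from the tables in this directory.
PRICES = {
    "GPT-5 Mini": Price(0.25, 2.00),
    "Claude Haiku 4.5": Price(1.00, 5.00),
    "Gemini 2.5 Flash": Price(0.30, 2.50),
}


def monthly_cost(price: Price, requests_per_day: int,
                 input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """Estimate monthly spend in USD for a fixed per-request token profile."""
    per_request = (input_tokens * price.input_per_m
                   + output_tokens * price.output_per_m) / 1_000_000
    return per_request * requests_per_day * days


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests/day, 2,000 input and 500 output tokens each.
    for name, price in PRICES.items():
        print(f"{name}: ${monthly_cost(price, 10_000, 2_000, 500):,.2f}/month")
```

Because output tokens are several times more expensive than input tokens on every row listed here, the output length per request tends to dominate the estimate.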

OpenAI

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Notes |
|---|---|---|---|---|
| GPT-5.5 | $5.00 | $30.00 | 1.1M | Current OpenAI frontier model for complex reasoning, coding, and professional work. Also available in Codex; prompts above 272K input tokens use higher long-context pricing. |
| GPT-5.5 Pro | $30.00 | $180.00 | 1.1M | Higher-compute GPT-5.5 variant for the hardest professional tasks. No cached-input discount is listed on the official model page. |
| GPT-5.4 | $2.50 | $15.00 | 1.1M | More affordable OpenAI model for coding and professional work. GPT-5.5 is now the frontier option. |
| GPT-5 | $1.25 | $10.00 | 400K | |
| GPT-5 Mini | $0.25 | $2.00 | 400K | |
| GPT-5 Nano | $0.05 | $0.40 | 400K | |
| GPT-4o mini | $0.15 | $0.60 | 128K | |
| o3-pro | $20.00 | $80.00 | 200K | Highest reasoning capability, for the most demanding tasks. |
| o3 | $2.00 | $8.00 | 200K | Standard reasoning model. |
| GPT-5.4 Mini | $0.75 | $4.50 | 400K | Current lower-cost GPT-5.4 production model. |
| GPT-5.4 Nano | $0.20 | $1.25 | 400K | Current high-volume GPT-5.4 routing model. |
| GPT-5.3-Codex | $1.75 | $14.00 | 400K | Current dedicated Codex API model for long-horizon agentic coding. |

Anthropic

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Notes |
|---|---|---|---|---|
| Claude Opus 4.8 | $5.00 | $25.00 | 1.0M | Latest generally available Opus model. Full 1M context at standard pricing. Fast mode costs $10/$50 per million tokens and remains a research preview. |
| Claude Opus 4.6 | $5.00 | $25.00 | 1.0M | Previous-generation Opus model. Full 1M context at standard pricing. |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1.0M | Current balanced Claude production model. Full 1M context at standard pricing. |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | Fastest Claude model. 200K context, 64K max output. |

Google

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Notes |
|---|---|---|---|---|
| Gemini 3.5 Flash | $1.50 | $9.00 | 1.0M | Stable Gemini 3.5 Flash model for agentic loops, coding cycles, long-horizon tasks, search grounding, Batch API, Flex, context caching, and multimodal inputs. |
| Gemini 3.1 Pro Preview | $2.00 | $12.00 | 1.0M | Preview Gemini Pro model. Inputs above 200K tokens use higher long-context pricing. |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | 1.0M | Fastest Gemini model. Optimized for high-throughput, multimodal tasks. |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1.0M | Long context (inputs above 200K tokens): $2.50 input, $15.00 output per 1M tokens. |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1.0M | |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1.0M | |
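
The Gemini 2.5 Pro row lists two price tiers split at 200K input tokens. A minimal sketch of how the tier changes the cost of a single request follows. It assumes that a request whose input exceeds the threshold is billed entirely at the long-context rates, rather than only for the tokens above 200K; that billing rule and the function name `gemini_25_pro_cost` are assumptions to check against the provider's pricing page.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # input tokens, from the table note

# (input, output) rates in USD per 1M tokens, from the Gemini 2.5 Pro row.
STANDARD_RATES = (1.25, 10.00)
LONG_CONTEXT_RATES = (2.50, 15.00)


def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request.

    Assumption: the whole request moves to the long-context rates once
    the input exceeds the threshold.
    """
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = LONG_CONTEXT_RATES
    else:
        in_rate, out_rate = STANDARD_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    for tokens in (100_000, 200_000, 300_000):
        print(f"{tokens:>7} input tokens: ${gemini_25_pro_cost(tokens, 1_000):.3f}")
```

Under this assumption, a prompt just above 200K tokens costs about twice as much as one just below it, so trimming context near the boundary can matter more than the token count alone suggests.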

xAI

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Notes |
|---|---|---|---|---|
| Grok 4.3 | $1.25 | $2.50 | 1.0M | Current general-purpose xAI API model. Older Grok 3 and Grok 4 aliases route to the current model family. |
| Grok Build 0.1 | $1.00 | $2.00 | 256K | Current xAI model optimized for coding and software-building workflows. |

Mistral

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Notes |
|---|---|---|---|---|
| Mistral Large 3 | $0.50 | $1.50 | 128K | |
| Mistral Medium 3.5 | $1.50 | $7.50 | 131K | Current balanced Mistral production model. |
| Mistral Small 4 | $0.10 | $0.30 | 131K | Current low-cost Mistral model for high-volume production routing. |

DeepSeek

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Notes |
|---|---|---|---|---|
| DeepSeek V4 Flash | $0.14 | $0.28 | 1.0M | Current default V4 model. deepseek-chat and deepseek-reasoner are compatibility aliases for the V4 Flash non-thinking and thinking modes, respectively, and are scheduled for deprecation on 2026-07-24. Cache-hit input is $0.0028/M. |
| DeepSeek V4 Pro | $0.435 | $0.87 | 1.0M | The 75%-off API price becomes the official price (one quarter of the original) after 2026-05-31 15:59 UTC. Original reference prices were $0.0145/M cached input, $1.74/M cache-miss input, and $3.48/M output. |
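
Two pieces of arithmetic sit behind the DeepSeek rows: the effective input rate when part of the prompt is served from cache, and the quarter-of-original V4 Pro price. The sketch below works through both using only figures from the table. The helper name `blended_input_rate` and the sample cache-hit ratios are hypothetical.

```python
def blended_input_rate(hit_ratio: float, hit_rate: float, miss_rate: float) -> float:
    """Average USD per 1M input tokens when a share of tokens is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_rate + (1.0 - hit_ratio) * miss_rate


# DeepSeek V4 Flash rates from the table: $0.0028/M cache hit, $0.14/M cache miss.
FLASH_HIT, FLASH_MISS = 0.0028, 0.14

# DeepSeek V4 Pro original reference prices; the listed price is one quarter of these.
PRO_ORIGINAL_INPUT, PRO_ORIGINAL_OUTPUT = 1.74, 3.48

if __name__ == "__main__":
    for ratio in (0.0, 0.5, 0.9):  # hypothetical cache-hit ratios
        rate = blended_input_rate(ratio, FLASH_HIT, FLASH_MISS)
        print(f"hit ratio {ratio:.0%}: ${rate:.4f}/M")
    print(f"V4 Pro input:  ${PRO_ORIGINAL_INPUT / 4:.3f}/M")
    print(f"V4 Pro output: ${PRO_ORIGINAL_OUTPUT / 4:.2f}/M")
```

Dividing the original V4 Pro prices by four gives $0.435 and $0.87, which matches the listed row.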

Xiaomi

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Notes |
|---|---|---|---|---|
| Xiaomi MiMo-V2.5-Pro | $0.435 | $0.87 | 1.0M | Open-sourced under MIT. New pay-as-you-go pricing took effect on 2026-05-27 00:00 Beijing time. Domestic pricing is ¥3.00/M cache-miss input, ¥0.025/M cached input, and ¥6.00/M output. Cache writing is currently free for a limited time. |
| Xiaomi MiMo-V2.5 | $0.14 | $0.28 | 1.0M | Native full-modal model with text, image, video, and audio understanding. New pay-as-you-go pricing took effect on 2026-05-27 00:00 Beijing time. Domestic pricing is ¥1.00/M cache-miss input, ¥0.02/M cached input, and ¥2.00/M output. |

MiniMax

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Notes |
|---|---|---|---|---|
| MiniMax M3 | $0.30 | $1.20 | 1.0M | Latest MiniMax frontier coding and agent model. Supports Adaptive Thinking, tool use, and native image/video input. Inputs above 512K tokens use long-context pricing and currently require limited-access availability. |

Z.AI

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Notes |
|---|---|---|---|---|
| GLM-5.1 | $1.40 | $4.40 | 200K | Latest generally documented GLM flagship API model for long-horizon coding and agent tasks. Z.AI says it can work autonomously on a single task for up to eight hours. |
| GLM-5 Turbo | $1.20 | $4.00 | 200K | Current faster GLM model optimized for tool use and agent workflows. |
| GLM-5V Turbo | $1.20 | $4.00 | 200K | Current multimodal GLM agent model for text, image, and video input. |

Moonshot AI

| Model | Input (¥/1M tokens) | Output (¥/1M tokens) | Context | Notes |
|---|---|---|---|---|
| Kimi K2.5 | ¥4.00 | ¥21.00 | 262K | China API pricing in CNY. Current K2.5 multimodal model with text, image, and video input. |
| Kimi K2.6 | ¥6.50 | ¥27.00 | 262K | Current general Kimi multimodal model. China API pricing in CNY. |
| Kimi K2.7 Code | ¥6.50 | ¥27.00 | 262K | Current Kimi coding model. Thinking-only mode. China API pricing in CNY. |

Alibaba Cloud

| Model | Input (¥/1M tokens) | Output (¥/1M tokens) | Context | Notes |
|---|---|---|---|---|
| Qwen3.7 Plus | ¥2.00 | ¥8.00 | 1.0M | Current balanced Qwen agent model. China API pricing in CNY; inputs above 256K tokens use long-context pricing. |
| Qwen3.7 Max | ¥12.00 | ¥36.00 | 1.0M | Current flagship Qwen API model. China API pricing in CNY. |
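
The Kimi and Qwen rows are priced in CNY, so comparing them with the USD rows needs an exchange rate. A minimal conversion sketch follows. The rate `ASSUMED_CNY_PER_USD` is a placeholder for illustration, not a quoted market rate, and `cny_to_usd` is a hypothetical helper. China API pricing may also differ from any international pricing a provider offers, so the converted figures are only a rough guide.

```python
# Placeholder exchange rate for illustration only; substitute a current rate.
ASSUMED_CNY_PER_USD = 7.0

# (input, output) prices in CNY per 1M tokens, copied from the tables above.
CNY_PRICES = {
    "Kimi K2.6": (6.50, 27.00),
    "Qwen3.7 Plus": (2.00, 8.00),
    "Qwen3.7 Max": (12.00, 36.00),
}


def cny_to_usd(price_cny: float, cny_per_usd: float) -> float:
    """Convert a CNY price to USD at the given exchange rate."""
    if cny_per_usd <= 0:
        raise ValueError("cny_per_usd must be positive")
    return price_cny / cny_per_usd


if __name__ == "__main__":
    for model, (inp, out) in CNY_PRICES.items():
        usd_in = cny_to_usd(inp, ASSUMED_CNY_PER_USD)
        usd_out = cny_to_usd(out, ASSUMED_CNY_PER_USD)
        print(f"{model}: ~${usd_in:.2f} input, ~${usd_out:.2f} output per 1M tokens")
```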

Cohere

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Notes |
|---|---|---|---|---|
| Command R+ | $2.50 | $10.00 | 128K | Enterprise RAG-optimized with citation grounding. |
| Command R | $0.15 | $0.60 | 128K | |

AI21 Labs

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Notes |
|---|---|---|---|---|
| Jamba 1.5 Large | $2.00 | $8.00 | 256K | Hybrid Transformer-Mamba (SSM) architecture. 256K context. |
| Jamba 1.5 Mini | $0.20 | $0.40 | 256K | |

Amazon

| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context | Notes |
|---|---|---|---|---|
| Amazon Nova Pro | $0.80 | $3.20 | 300K | Via AWS Bedrock. 300K context. |
| Amazon Nova Lite | $0.06 | $0.24 | 300K | |
| Amazon Nova Micro | $0.035 | $0.14 | 128K | Text-only. Lowest-cost option on Bedrock. |