Current DeepSeek API Pricing 2026: V4 Flash & Pro Cost per 1M Tokens
Current DeepSeek API pricing for 2026: V4 Flash and V4 Pro token costs, cache-hit pricing, free-credit notes, official model IDs, and coding-agent cost examples.
In 2026, the DeepSeek API is one of the cheapest options for high-volume AI agents. The current V4 lineup is DeepSeek V4 Flash and DeepSeek V4 Pro, both with 1M context, 384K max output, OpenAI-compatible API support, Anthropic-compatible API support, JSON output, and tool calls.
If you are searching for the current DeepSeek API price, the short answer is: V4 Flash costs $0.0028/M cached input, $0.14/M cache-miss input, and $0.28/M output. V4 Pro currently costs $0.003625/M cached input, $0.435/M cache-miss input, and $0.87/M output; DeepSeek says this 75% off price becomes the official 1/4-of-original price after 2026-05-31 15:59 UTC.
The important detail is cache pricing. DeepSeek splits input tokens into cache hit and cache miss buckets, and cache-hit input can be 50x cheaper than cache-miss input on V4 Flash.
Official sources: DeepSeek model pricing, DeepSeek context caching, and DeepSeek Anthropic API compatibility.
Quick Answer: DeepSeek V4 API Pricing Per 1M Tokens
If you only need the current official 2026 price table, use this:
| Model | Cached Input / 1M | Cache-Miss Input / 1M | Output / 1M | Context | Best Fit |
|---|---|---|---|---|---|
| deepseek-v4-flash | $0.0028 | $0.14 | $0.28 | 1M | High-volume chat, extraction, and coding-agent subtasks |
| deepseek-v4-pro | $0.003625 | $0.435 | $0.87 | 1M | Harder coding, reasoning-heavy agents, and long-horizon tasks |
DeepSeek V4 also supports OpenAI-compatible and Anthropic-compatible endpoints, JSON output, tool calls, chat prefix completion, and FIM completion in non-thinking mode.
For the broader model page, see DeepSeek V4 model pricing. For agent setup, see DeepSeek V4 agent setup.
Current DeepSeek V4 Pricing
All prices below are USD per 1 million tokens.
| Model | Cached Input | Cache-Miss Input | Output | Context | Notes |
|---|---|---|---|---|---|
| deepseek-v4-flash | $0.0028 | $0.14 | $0.28 | 1M | Best default for most chat, coding, and agent workloads |
| deepseek-v4-pro | $0.003625 | $0.435 | $0.87 | 1M | Current 75% off price becomes the official 1/4-of-original price after 2026-05-31 15:59 UTC |
DeepSeek says deepseek-chat and deepseek-reasoner are compatibility aliases for V4 Flash non-thinking and thinking modes, and those aliases are scheduled for deprecation on 2026-07-24. For new integrations, use deepseek-v4-flash or deepseek-v4-pro directly.
Why Real Bills Can Be Much Lower Than the Table Price
DeepSeek’s cost formula is:
cost =
    prompt_cache_hit_tokens * cached_input_price / 1,000,000
    + prompt_cache_miss_tokens * cache_miss_input_price / 1,000,000
    + completion_tokens * output_price / 1,000,000
The cache is enabled by default. You do not need to change application code to use it. DeepSeek exposes two usage fields so you can measure the effect:
usage.prompt_cache_hit_tokens
usage.prompt_cache_miss_tokens
Cache hits are best-effort, not guaranteed. They work best when requests reuse a stable prefix, such as the same system prompt, coding rules, repository summary, product spec, or previous conversation prefix.
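As a rough illustration of the formula above, the sketch below computes the cost of one request from the two cache usage fields plus the completion token count. The `Prices` class and the `request_cost` helper are hypothetical names introduced here, not part of any DeepSeek SDK; the V4 Flash prices are copied from the table in this article.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Prices:
    """Prices in USD per 1M tokens (hypothetical helper type)."""
    cached_input: float
    cache_miss_input: float
    output: float


# V4 Flash prices as listed in this article.
V4_FLASH = Prices(cached_input=0.0028, cache_miss_input=0.14, output=0.28)


def request_cost(hit_tokens: int, miss_tokens: int,
                 completion_tokens: int, prices: Prices) -> float:
    """Apply the three-term cost formula to one request's token counts."""
    return (
        hit_tokens * prices.cached_input
        + miss_tokens * prices.cache_miss_input
        + completion_tokens * prices.output
    ) / TOKENS_PER_MILLION
```

In practice you would pass `usage.prompt_cache_hit_tokens`, `usage.prompt_cache_miss_tokens`, and the completion token count from each response, then sum the results across requests.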
How 10M Tokens Can Cost Around RMB 2
The “10 million tokens for a little over RMB 2” story is plausible when most input is cached and output is small. Using DeepSeek’s official domestic V4 Flash prices:
8.5M cached input * ¥0.02/M = ¥0.17
1.0M cache-miss input * ¥1/M = ¥1.00
0.5M output * ¥2/M = ¥1.00
Total = ¥2.17
That is not the cost of 10M random uncached tokens. It is the cost of a cache-heavy workflow where repeated context dominates, which is exactly what long-running coding agents often do.
Cost Examples
| Workload | No Cache, V4 Flash | 85% Input Cache Hit, V4 Flash |
|---|---|---|
| 10M input + 1M output | $1.68 | $0.51 |
| 50M input + 5M output | $8.40 | $2.57 |
| 500M input + 50M output | $84.00 | $25.69 |
For repeated agent work, cache hit rate matters more than the headline input price. Monitor the prompt_cache_hit_tokens ratio before deciding whether V4 Flash, V4 Pro, or another provider is cheaper.
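To compare models before committing, a small estimator can help. The sketch below estimates a workload's cost at a given input cache-hit rate, using the V4 Flash and V4 Pro prices listed in this article. The `PRICES` table and the `workload_cost` function are illustrative names, and the hit rate is an input you would measure from your own traffic, not something the API guarantees.

```python
# USD per 1M tokens: (cached input, cache-miss input, output), from this article.
PRICES = {
    "deepseek-v4-flash": (0.0028, 0.14, 0.28),
    "deepseek-v4-pro": (0.003625, 0.435, 0.87),
}


def workload_cost(input_tokens: float, output_tokens: float,
                  hit_rate: float, model: str) -> float:
    """Estimate USD cost when `hit_rate` of the input tokens are cache hits."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cached_price, miss_price, output_price = PRICES[model]
    hit_tokens = input_tokens * hit_rate
    miss_tokens = input_tokens - hit_tokens
    return (
        hit_tokens * cached_price
        + miss_tokens * miss_price
        + output_tokens * output_price
    ) / 1_000_000


if __name__ == "__main__":
    # Compare both models across a few assumed hit rates.
    for model in PRICES:
        for rate in (0.0, 0.5, 0.85):
            cost = workload_cost(10_000_000, 1_000_000, rate, model)
            print(f"{model} at {rate:.0%} cache hits: ${cost:.2f}")
```

Because output tokens are never cached, workloads with large completions benefit less from a high hit rate than input-heavy agent loops do.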
DeepSeek V4 Pro Permanent Price Cut
V4 Pro currently shows 75% off pricing until May 31, 2026 at 15:59 UTC. DeepSeek’s pricing page now says the API price will be officially adjusted to 1/4 of the original price after that promotion ends. In other words, the current price is not expected to revert to the original reference price.
| Price Type | Current / Official After May 31 | Original Reference Price |
|---|---|---|
| Cached input | $0.003625/M | $0.0145/M |
| Cache-miss input | $0.435/M | $1.74/M |
| Output | $0.87/M | $3.48/M |
Use V4 Pro when you need stronger agentic reasoning, complex coding, or long-horizon task coherence. Use V4 Flash for high-volume production flows where cost and latency dominate.
DeepSeek API Free Tier, Credits, and Balance
DeepSeek’s public API docs do not publish a permanent free-tier table for V4. The billing docs say usage is deducted from your topped-up balance or granted balance, with granted balance used first when both are available.
For planning, treat DeepSeek V4 as pay-as-you-go unless your account dashboard shows active granted credits. If you are checking whether you have free credits, use the platform balance page or the /user/balance API response fields:
total_balance
granted_balance
topped_up_balance
The practical takeaway: DeepSeek may show granted balance on some accounts or promotions, but do not assume ongoing free production capacity unless your own account balance confirms it.
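A balance check can be scripted along these lines. This is a minimal sketch under several assumptions: that the endpoint lives at the same base URL used elsewhere in this article, that it accepts a standard `Authorization: Bearer` header, and that it returns JSON. The envelope around the three balance fields (for example, a per-currency list) is not covered here, so `granted_credit_summary` takes a single entry that holds those fields. Both function names are hypothetical.

```python
import json
import urllib.request
from decimal import Decimal

# Assumed URL: the article's base URL plus the /user/balance path.
BALANCE_URL = "https://api.deepseek.com/user/balance"


def fetch_balance(api_key: str) -> dict:
    """Fetch the raw balance payload (assumes Bearer auth and a JSON body)."""
    request = urllib.request.Request(
        BALANCE_URL,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def granted_credit_summary(entry: dict) -> dict:
    """Summarize one balance entry holding the three fields named above."""
    # Decimal(str(...)) accepts both string and numeric amounts without
    # introducing float rounding into money values.
    total = Decimal(str(entry["total_balance"]))
    granted = Decimal(str(entry["granted_balance"]))
    topped_up = Decimal(str(entry["topped_up_balance"]))
    return {
        "total": total,
        "granted": granted,
        "topped_up": topped_up,
        "has_granted_credit": granted > 0,
    }
```

Inspect the payload returned by `fetch_balance` once by hand to locate the entry for your currency, then pass that entry to `granted_credit_summary`.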
OpenAI-Compatible API Example
from openai import OpenAI

# Point the OpenAI SDK at DeepSeek's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Explain this repository structure."},
    ],
)

print(response.choices[0].message.content)
# DeepSeek-specific usage fields for measuring cache effectiveness.
print(response.usage.prompt_cache_hit_tokens)
print(response.usage.prompt_cache_miss_tokens)
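The two usage fields printed above are most useful when aggregated over many requests. The sketch below assumes you have already collected each response's usage as a plain dict containing those two keys; how you collect them is up to your client code, and `cache_hit_ratio` is a hypothetical helper name.

```python
from typing import Iterable, Mapping


def cache_hit_ratio(usages: Iterable[Mapping[str, int]]) -> float:
    """Return the share of input tokens served from cache across requests."""
    hit_tokens = 0
    miss_tokens = 0
    for usage in usages:
        hit_tokens += usage["prompt_cache_hit_tokens"]
        miss_tokens += usage["prompt_cache_miss_tokens"]
    total = hit_tokens + miss_tokens
    # With no input tokens recorded, report 0.0 rather than dividing by zero.
    return hit_tokens / total if total else 0.0


if __name__ == "__main__":
    # Made-up numbers for illustration, not measured data.
    sample = [
        {"prompt_cache_hit_tokens": 850, "prompt_cache_miss_tokens": 150},
        {"prompt_cache_hit_tokens": 0, "prompt_cache_miss_tokens": 1000},
    ]
    print(f"cache hit ratio: {cache_hit_ratio(sample):.1%}")
```

Weighting by tokens rather than averaging per-request ratios keeps a handful of tiny requests from distorting the figure that actually drives the bill.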
For coding agents that expect Anthropic format, use:
export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
export ANTHROPIC_AUTH_TOKEN=your-deepseek-api-key
When DeepSeek Is the Best Choice
| Use Case | Recommendation |
|---|---|
| High-volume chat, extraction, summaries | Start with V4 Flash |
| Coding agents with repeated repo context | Start with V4 Flash, monitor cache hit rate |
| Harder autonomous agent tasks | Use V4 Pro when quality matters more than the lowest possible token cost |
| Vision or audio input | Use another provider; DeepSeek V4 is text-first |
| Enterprise compliance-sensitive data | Evaluate legal and data-residency requirements first |
Related guides:
- How to Configure DeepSeek V4 in Claude Code — terminal agent setup
- DeepSeek V4 for OpenCode, Codex, Cline, Kilo, and Roo — multi-agent setup notes
- AI API Pricing Comparison 2026 — compare DeepSeek with OpenAI, Claude, Gemini, MiMo, and others
- AI Model Pricing Calculator — estimate your monthly bill