Current DeepSeek API Pricing 2026: V4 Flash & Pro Cost per 1M Tokens

DeepSeek API pricing in 2026 is one of the cheapest options for high-volume AI agents. The current V4 lineup is DeepSeek V4 Flash and DeepSeek V4 Pro, both with 1M context, 384K max output, OpenAI-compatible API support, Anthropic-compatible API support, JSON output, and tool calls.

If you are searching for the current DeepSeek API price, the short answer is: V4 Flash costs $0.0028/M cached input, $0.14/M cache-miss input, and $0.28/M output. V4 Pro currently costs $0.003625/M cached input, $0.435/M cache-miss input, and $0.87/M output; DeepSeek says this 75% off price becomes the official 1/4-of-original price after 2026-05-31 15:59 UTC.

The important detail is cache pricing. DeepSeek splits input tokens into cache hit and cache miss buckets, and cache-hit input can be 50x cheaper than cache-miss input on V4 Flash.

Official sources: DeepSeek model pricing, DeepSeek context caching, and DeepSeek Anthropic API compatibility.

Quick Answer: DeepSeek V4 API Pricing Per 1M Tokens

If you only need the current official 2026 price table, use this:

Model	Cached Input / 1M	Cache-Miss Input / 1M	Output / 1M	Context	Best Fit
deepseek-v4-flash	$0.0028	$0.14	$0.28	1M	High-volume chat, extraction, and coding-agent subtasks
deepseek-v4-pro	$0.003625	$0.435	$0.87	1M	Harder coding, reasoning-heavy agents, and long-horizon tasks

DeepSeek V4 also supports OpenAI-compatible and Anthropic-compatible endpoints, JSON output, tool calls, chat prefix completion, and FIM completion in non-thinking mode.

For the broader model page that ranks for DeepSeek V4 pricing, see DeepSeek V4 model pricing. For agent setup, see DeepSeek V4 agent setup.

Current DeepSeek V4 Pricing

All prices below are USD per 1 million tokens.

Model	Cached Input	Cache-Miss Input	Output	Context	Notes
deepseek-v4-flash	$0.0028	$0.14	$0.28	1M	Best default for most chat, coding, and agent workloads
deepseek-v4-pro	$0.003625	$0.435	$0.87	1M	Current 75% off price becomes the official 1/4-of-original price after 2026-05-31 15:59 UTC

DeepSeek says deepseek-chat and deepseek-reasoner are compatibility aliases for V4 Flash non-thinking and thinking modes, and those aliases are scheduled for deprecation on 2026-07-24. For new integrations, use deepseek-v4-flash or deepseek-v4-pro directly.

Why Real Bills Can Be Much Lower Than the Table Price

DeepSeek’s cost formula is:

cost =
  prompt_cache_hit_tokens * cached_input_price / 1,000,000
+ prompt_cache_miss_tokens * cache_miss_input_price / 1,000,000
+ completion_tokens * output_price / 1,000,000

The cache is enabled by default. You do not need to change application code to use it. DeepSeek exposes two usage fields so you can measure the effect:

usage.prompt_cache_hit_tokens
usage.prompt_cache_miss_tokens

Cache hits are best-effort, not guaranteed. They work best when requests reuse a stable prefix, such as the same system prompt, coding rules, repository summary, product spec, or previous conversation prefix.

How 10M Tokens Can Cost Around RMB 2

The “10 million tokens for a little over RMB 2” story is plausible when most input is cached and output is small. Using DeepSeek’s official domestic V4 Flash prices:

8.5M cached input * ¥0.02/M = ¥0.17
1.0M cache-miss input * ¥1/M = ¥1.00
0.5M output * ¥2/M = ¥1.00
Total = ¥2.17

That is not the cost of 10M random uncached tokens. It is the cost of a cache-heavy workflow where repeated context dominates, which is exactly what long-running coding agents often do.

Cost Examples

Workload	No Cache, V4 Flash	85% Input Cache Hit, V4 Flash
10M input + 1M output	$1.68	$0.49
50M input + 5M output	$8.40	$2.45
500M input + 50M output	$84.00	$24.50

For repeated agent work, cache hit rate matters more than the headline input price. Monitor the prompt_cache_hit_tokens ratio before deciding whether V4 Flash, V4 Pro, or another provider is cheaper.

DeepSeek V4 Pro Permanent Price Cut

V4 Pro currently shows 75% off pricing until May 31, 2026 at 15:59 UTC. DeepSeek’s pricing page now says the API price will be officially adjusted to 1/4 of the original price after that promotion ends. In other words, the current price is not expected to revert to the original reference price.

Price Type	Current / Official After May 31	Original Reference Price
Cached input	$0.003625/M	$0.0145/M
Cache-miss input	$0.435/M	$1.74/M
Output	$0.87/M	$3.48/M

Use V4 Pro when you need stronger agentic reasoning, complex coding, or long-horizon task coherence. Use V4 Flash for high-volume production flows where cost and latency dominate.

DeepSeek API Free Tier, Credits, and Balance

DeepSeek’s public API docs do not publish a permanent free-tier table for V4. The billing docs say usage is deducted from your topped-up balance or granted balance, with granted balance used first when both are available.

For planning, treat DeepSeek V4 as pay-as-you-go unless your account dashboard shows active granted credits. If you are checking whether you have free credits, use the platform balance page or the /user/balance API response fields:

total_balance
granted_balance
topped_up_balance

The practical takeaway: DeepSeek may show granted balance on some accounts or promotions, but do not assume ongoing free production capacity unless your own account balance confirms it.

OpenAI-Compatible API Example

from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Explain this repository structure."}
    ],
)

print(response.choices[0].message.content)
print(response.usage.prompt_cache_hit_tokens)
print(response.usage.prompt_cache_miss_tokens)

For coding agents that expect Anthropic format, use:

export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
export ANTHROPIC_AUTH_TOKEN=your-deepseek-api-key

When DeepSeek Is the Best Choice

Use Case	Recommendation
High-volume chat, extraction, summaries	Start with V4 Flash
Coding agents with repeated repo context	Start with V4 Flash, monitor cache hit rate
Harder autonomous agent tasks	Use V4 Pro when quality matters more than the lowest possible token cost
Vision or audio input	Use another provider; DeepSeek V4 is text-first
Enterprise compliance-sensitive data	Evaluate legal and data-residency requirements first

Related guides:

How to Configure DeepSeek V4 in Claude Code — terminal agent setup
DeepSeek V4 for OpenCode, Codex, Cline, Kilo, and Roo — multi-agent setup notes
AI API Pricing Comparison 2026 — compare DeepSeek with OpenAI, Claude, Gemini, MiMo, and others
AI Model Pricing Calculator — estimate your monthly bill