DevTk.AI
DeepSeek APIAI PricingDeepSeek V4Prompt CachingLLM Pricing

Current DeepSeek API Pricing 2026: V4 Flash & Pro Cost per 1M Tokens

Current DeepSeek API pricing for 2026: V4 Flash and V4 Pro token costs, cache-hit pricing, free-credit notes, official model IDs, and coding-agent cost examples.

DevTk.AI 2026-02-23 Updated 2026-05-24 6 min read

DeepSeek API pricing in 2026 is one of the cheapest options for high-volume AI agents. The current V4 lineup is DeepSeek V4 Flash and DeepSeek V4 Pro, both with 1M context, 384K max output, OpenAI-compatible API support, Anthropic-compatible API support, JSON output, and tool calls.

If you are searching for the current DeepSeek API price, the short answer is: V4 Flash costs $0.0028/M cached input, $0.14/M cache-miss input, and $0.28/M output. V4 Pro currently costs $0.003625/M cached input, $0.435/M cache-miss input, and $0.87/M output; DeepSeek says this 75% off price becomes the official 1/4-of-original price after 2026-05-31 15:59 UTC.

The important detail is cache pricing. DeepSeek splits input tokens into cache hit and cache miss buckets, and cache-hit input can be 50x cheaper than cache-miss input on V4 Flash.

Official sources: DeepSeek model pricing, DeepSeek context caching, and DeepSeek Anthropic API compatibility.

Quick Answer: DeepSeek V4 API Pricing Per 1M Tokens

If you only need the current official 2026 price table, use this:

ModelCached Input / 1MCache-Miss Input / 1MOutput / 1MContextBest Fit
deepseek-v4-flash$0.0028$0.14$0.281MHigh-volume chat, extraction, and coding-agent subtasks
deepseek-v4-pro$0.003625$0.435$0.871MHarder coding, reasoning-heavy agents, and long-horizon tasks

DeepSeek V4 also supports OpenAI-compatible and Anthropic-compatible endpoints, JSON output, tool calls, chat prefix completion, and FIM completion in non-thinking mode.

For the broader model page that ranks for DeepSeek V4 pricing, see DeepSeek V4 model pricing. For agent setup, see DeepSeek V4 agent setup.

Current DeepSeek V4 Pricing

All prices below are USD per 1 million tokens.

ModelCached InputCache-Miss InputOutputContextNotes
deepseek-v4-flash$0.0028$0.14$0.281MBest default for most chat, coding, and agent workloads
deepseek-v4-pro$0.003625$0.435$0.871MCurrent 75% off price becomes the official 1/4-of-original price after 2026-05-31 15:59 UTC

DeepSeek says deepseek-chat and deepseek-reasoner are compatibility aliases for V4 Flash non-thinking and thinking modes, and those aliases are scheduled for deprecation on 2026-07-24. For new integrations, use deepseek-v4-flash or deepseek-v4-pro directly.

Why Real Bills Can Be Much Lower Than the Table Price

DeepSeek’s cost formula is:

cost =
  prompt_cache_hit_tokens * cached_input_price / 1,000,000
+ prompt_cache_miss_tokens * cache_miss_input_price / 1,000,000
+ completion_tokens * output_price / 1,000,000

The cache is enabled by default. You do not need to change application code to use it. DeepSeek exposes two usage fields so you can measure the effect:

usage.prompt_cache_hit_tokens
usage.prompt_cache_miss_tokens

Cache hits are best-effort, not guaranteed. They work best when requests reuse a stable prefix, such as the same system prompt, coding rules, repository summary, product spec, or previous conversation prefix.

How 10M Tokens Can Cost Around RMB 2

The “10 million tokens for a little over RMB 2” story is plausible when most input is cached and output is small. Using DeepSeek’s official domestic V4 Flash prices:

8.5M cached input * ¥0.02/M = ¥0.17
1.0M cache-miss input * ¥1/M = ¥1.00
0.5M output * ¥2/M = ¥1.00
Total = ¥2.17

That is not the cost of 10M random uncached tokens. It is the cost of a cache-heavy workflow where repeated context dominates, which is exactly what long-running coding agents often do.

Cost Examples

WorkloadNo Cache, V4 Flash85% Input Cache Hit, V4 Flash
10M input + 1M output$1.68$0.49
50M input + 5M output$8.40$2.45
500M input + 50M output$84.00$24.50

For repeated agent work, cache hit rate matters more than the headline input price. Monitor the prompt_cache_hit_tokens ratio before deciding whether V4 Flash, V4 Pro, or another provider is cheaper.

DeepSeek V4 Pro Permanent Price Cut

V4 Pro currently shows 75% off pricing until May 31, 2026 at 15:59 UTC. DeepSeek’s pricing page now says the API price will be officially adjusted to 1/4 of the original price after that promotion ends. In other words, the current price is not expected to revert to the original reference price.

Price TypeCurrent / Official After May 31Original Reference Price
Cached input$0.003625/M$0.0145/M
Cache-miss input$0.435/M$1.74/M
Output$0.87/M$3.48/M

Use V4 Pro when you need stronger agentic reasoning, complex coding, or long-horizon task coherence. Use V4 Flash for high-volume production flows where cost and latency dominate.

DeepSeek API Free Tier, Credits, and Balance

DeepSeek’s public API docs do not publish a permanent free-tier table for V4. The billing docs say usage is deducted from your topped-up balance or granted balance, with granted balance used first when both are available.

For planning, treat DeepSeek V4 as pay-as-you-go unless your account dashboard shows active granted credits. If you are checking whether you have free credits, use the platform balance page or the /user/balance API response fields:

total_balance
granted_balance
topped_up_balance

The practical takeaway: DeepSeek may show granted balance on some accounts or promotions, but do not assume ongoing free production capacity unless your own account balance confirms it.

OpenAI-Compatible API Example

from openai import OpenAI

client = OpenAI(
    api_key="your-deepseek-api-key",
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Explain this repository structure."}
    ],
)

print(response.choices[0].message.content)
print(response.usage.prompt_cache_hit_tokens)
print(response.usage.prompt_cache_miss_tokens)

For coding agents that expect Anthropic format, use:

export ANTHROPIC_BASE_URL=https://api.deepseek.com/anthropic
export ANTHROPIC_AUTH_TOKEN=your-deepseek-api-key

When DeepSeek Is the Best Choice

Use CaseRecommendation
High-volume chat, extraction, summariesStart with V4 Flash
Coding agents with repeated repo contextStart with V4 Flash, monitor cache hit rate
Harder autonomous agent tasksUse V4 Pro when quality matters more than the lowest possible token cost
Vision or audio inputUse another provider; DeepSeek V4 is text-first
Enterprise compliance-sensitive dataEvaluate legal and data-residency requirements first

Related guides:

Related Posts