What capabilities does DeepSeek V4 Flash have?

DeepSeek V4 Flash supports: text, function_calling, structured_output. Current default V4 model. deepseek-chat and deepseek-reasoner are compatibility aliases for V4 Flash non-thinking/thinking modes and are scheduled for deprecation on 2026-07-24. Cache-hit input is $0.0028/M.

DeepSeek V4 Flash

Q: What is DeepSeek V4 Flash's context window?

DeepSeek V4 Flash supports a context window of 1.0M tokens with a maximum output of 384K tokens per response.

DeepSeek

Official DeepSeek V4 API pricing for 2026. DeepSeek V4 Flash: $0.14/M cache-miss input, $0.0028/M cached input, $0.28/M output, 1M context, 384K max output.

Input Price

$0.14

cache miss / 1M tokens

Cached Input

$0.0028

per 1M tokens

Output Price

$0.28

per 1M tokens

Context Window

1.0M

tokens

DeepSeek V4 API Pricing 2026

Official DeepSeek V4 prices are per 1M tokens. V4 Pro currently shows 75% off pricing, and DeepSeek says it becomes the official 1/4-of-original price after 2026-05-31 15:59 UTC.

Official pricing

Model	Cached input / 1M	Cache-miss input / 1M	Output / 1M	Best use
DeepSeek V4 Flash	$0.0028	$0.14	$0.28	High-volume agent traffic
DeepSeek V4 Pro	$0.003625	$0.435	$0.87	Harder coding and reasoning

DeepSeek V4 API pricing: use cache-miss input for new prompt tokens and cached input for repeated prefixes.

DeepSeek V4 price: both models support OpenAI-compatible and Anthropic-compatible endpoints.

DeepSeek V4 cost: agent workloads can be far cheaper when repository context and system prompts hit cache.

Specifications

Provider	DeepSeek
Model ID	deepseek-v4-flash
Input Price	$0.14 / 1M cache-miss tokens
Cached Input Price	$0.0028 / 1M tokens
Output Price	$0.28 / 1M tokens
Context Window	1.0M tokens
Max Output	384K tokens
Capabilities	textfunction_callingstructured_output
Release Date	2026-04
Pricing Source	Official DeepSeek pricing
Price Verified	2026-05-24 · V4 Pro current 75% off pricing becomes the official 1/4-of-original price after 2026-05-31 15:59 UTC.
Notes	Current default V4 model. deepseek-chat and deepseek-reasoner are compatibility aliases for V4 Flash non-thinking/thinking modes and are scheduled for deprecation on 2026-07-24. Cache-hit input is $0.0028/M.

Monthly Cost Estimates

Estimated monthly costs based on different daily usage levels (assuming 50% input / 50% output split). Input estimates use cache-miss pricing, so cache-heavy workloads can be lower.

Daily Tokens	Monthly Cost	Annual Cost
10K	$0.06	$0.76
50K	$0.32	$3.78
100K	$0.63	$7.56
500K	$3.15	$37.80
1.0M	$6.30	$75.60

About DeepSeek V4 Flash

DeepSeek V4 Flash is a large language model by DeepSeek. It features a 1.0M token context window with up to 384K tokens of output per request. The model supports 3 capabilities: text, function_calling, structured_output.

At $0.14 per million cache-miss input tokens and $0.28 per million output tokens, DeepSeek V4 Flash is positioned as a cost-effective option in the DeepSeek lineup. Repeated prefix input can be charged at $0.0028 per million cached tokens. Use our Token Counter to estimate how many tokens your prompts use, and our Pricing Calculator to compare costs across all models.

DeepSeek V4 Flash Key Details

Pricing: $0.14/M cache-miss input tokens, $0.0028/M cached input tokens, $0.28/M output tokens
Context window: 1.0M tokens — one of the largest available
Max output: 384K tokens per response
Capabilities: text, function_calling, structured_output
Highlights: Current default V4 model. deepseek-chat and deepseek-reasoner are compatibility aliases for V4 Flash non-thinking/thinking modes and are scheduled for deprecation on 2026-07-24. Cache-hit input is $0.0028/M.
Released: 2026-04

Other DeepSeek Models

DeepSeek V4 Pro

$0.435 / $0.87 per 1M · cached input $0.003625

Similar Price Range

Xiaomi MiMo-V2.5

Xiaomi MiMo

$0.14 / $0.28 per 1M · cached input $0.0028

GPT-4o mini

OpenAI

$0.15 / $0.6 per 1M · cached input $0.075

Command R

Cohere

$0.15 / $0.6 per 1M

Related Tools

AI Token Counter

Count tokens for DeepSeek V4 Flash

Pricing Calculator

Compare all model prices

Throughput Planner

Plan RPM, TPM, and monthly cost for DeepSeek V4 Flash

FAQ

How much does DeepSeek V4 Flash cost?

DeepSeek V4 Flash costs $0.14 per million cache-miss input tokens and $0.28 per million output tokens. Cached input costs $0.0028 per million tokens. For a typical workload of 100K input tokens/day and 50K output tokens/day, expect approximately $0.84/month before cache-hit savings.

What is DeepSeek V4 Flash's context window?

DeepSeek V4 Flash supports a context window of 1.0M tokens. This means your combined input prompt and output response can be up to 1.0M tokens. The maximum output per response is 384K tokens.

Is DeepSeek V4 Flash good for my use case?

DeepSeek V4 Flash supports text, function_calling, structured_output. As a budget-friendly model, it works well for high-volume tasks like classification, summarization, and simple generation. Use our Pricing Calculator to compare with alternatives.

Is this the official DeepSeek V4 API pricing?

The table above is based on DeepSeek's official model pricing page, last checked against the current public docs. DeepSeek says product prices may change, so verify the official pricing page before committing large production spend.

Should I use DeepSeek V4 Flash or DeepSeek V4 Pro?

Use DeepSeek V4 Flash for high-volume chat, extraction, coding-agent subtasks, and cache-heavy repository work. Use DeepSeek V4 Pro for harder coding, reasoning-heavy agent tasks, and stronger long-horizon evaluations.

Does DeepSeek V4 have a free tier?

DeepSeek's public API docs describe billing from topped-up balance or granted balance, but they do not publish a permanent free tier table. Check your platform balance and the official docs for current granted credits before assuming free production usage.