Chinese AI Models in 2026: GLM-5.1, MiniMax M3, Qwen 3.7, DeepSeek V4, and MiMo Pricing

Chinese AI APIs are no longer one budget category. The current market includes ultra-cheap text routes, premium long-horizon coding models, native multimodal agents, and million-token context models.

The most useful question is not which model wins a vendor benchmark. It is which model completes your workload at the lowest total cost once you account for output tokens, cached context, long-context premiums, latency, and retries.

Current Price Snapshot

USD prices below use official international API pricing. RMB prices remain in RMB because converting them would introduce exchange-rate drift.

Model	Input / 1M	Cached Input / 1M	Output / 1M	Context	Best Fit
DeepSeek V4 Flash	$0.14	$0.0028	$0.28	1M	Lowest-cost text routing
Xiaomi MiMo-V2.5	$0.14	$0.0028	$0.28	1M	Affordable multimodal agents
MiniMax M3, up to 512K input	$0.30	$0.06	$1.20	1M	Long-context multimodal agents
GLM-5.1	$1.40	$0.26	$4.40	200K	Long-horizon engineering tasks
Qwen3.7 Plus, up to 256K input	¥2	Not listed	¥8	1M	Alibaba Cloud production workloads
Qwen3.7 Max	¥12	Not listed	¥36	1M	Premium Qwen flagship

Above 512K input tokens, MiniMax M3 costs $0.60 per 1M input tokens, $0.12 per 1M cached input tokens, and $2.40 per 1M output tokens. For Qwen3.7 Plus, input from 256K to 1M tokens costs ¥6 per 1M input tokens and ¥24 per 1M output tokens.
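The tiered MiniMax M3 rates can be turned into a per-request estimate. The Python sketch below is an illustration, not a billing reference. It assumes that the tier is selected by the request's total input tokens, that the whole request is then billed at that tier's rates, and that "512K" means 512,000 tokens. The pricing table confirms none of these assumptions, so check the provider's billing documentation before relying on the result. The names `Tier`, `MINIMAX_M3_TIERS`, and `request_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    max_input_tokens: int  # largest input size billed at this tier
    input_per_m: float     # USD per 1M uncached input tokens
    cached_per_m: float    # USD per 1M cached input tokens
    output_per_m: float    # USD per 1M output tokens


# Rates copied from the price snapshot. Treating 512K as 512,000 and
# 1M as 1,000,000 tokens is an assumption.
MINIMAX_M3_TIERS = [
    Tier(512_000, 0.30, 0.06, 1.20),
    Tier(1_000_000, 0.60, 0.12, 2.40),
]


def request_cost(tiers, input_tokens, cached_tokens, output_tokens):
    """Estimate the USD cost of one request under whole-request tier billing."""
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    for tier in tiers:
        if input_tokens <= tier.max_input_tokens:
            fresh_tokens = input_tokens - cached_tokens
            total = (
                fresh_tokens * tier.input_per_m
                + cached_tokens * tier.cached_per_m
                + output_tokens * tier.output_per_m
            )
            return total / 1_000_000
    raise ValueError("input_tokens exceeds the largest tier")
```

Under these assumptions, a 600,000-token request with 10,000 output tokens comes to about $0.384 with no cache hits and about $0.144 when 500,000 of the input tokens are served from cache, which shows how strongly cache hit rate can dominate long-context cost.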

What Changed

MiniMax M3 makes long context affordable

MiniMax M3 launched on June 1, 2026 with a 1M-token context window, native image and video input, tool use, and a very large maximum output allowance. At $0.30 per 1M input tokens and $1.20 per 1M output tokens on the standard international tier, it is a strong candidate for long codebases and multimodal agents.

GLM-5.1 focuses on sustained execution

Z.AI positions GLM-5.1 around long-horizon coding and agent work rather than short benchmark prompts. Its international API price of $1.40 input and $4.40 output per 1M tokens is several times higher than MiniMax M3 ($0.30/$1.20) and DeepSeek V4 Flash or MiMo-V2.5 ($0.14/$0.28), so teams should test whether fewer retries and better task completion justify the premium.
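One way to frame that test is a break-even calculation. The sketch below assumes a simple retry model: every attempt uses the same number of tokens on both models, failed attempts are retried until one succeeds, and attempts are independent, so the expected cost per completed task is the cost per attempt divided by the success rate. Latency and human repair time are ignored. The function names are hypothetical, and the 0.9 success rate in the usage note is a placeholder, not a measured result.

```python
def attempt_cost(input_tokens, output_tokens, input_per_m, output_per_m):
    """USD cost of a single attempt at the given per-1M-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def cost_per_completed_task(cost_per_attempt, success_rate):
    """Expected USD cost per success when failures are retried independently."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def break_even_success_rate(premium_attempt_cost, cheap_attempt_cost,
                            premium_success_rate):
    """Success rate at which the cheaper model costs the same per completed task.

    If the cheaper model's real success rate is below this value, the premium
    model is cheaper per completed task under the stated retry model.
    """
    return premium_success_rate * cheap_attempt_cost / premium_attempt_cost


# Table prices for an attempt with 50,000 input and 10,000 output tokens.
glm = attempt_cost(50_000, 10_000, 1.40, 4.40)  # GLM-5.1
m3 = attempt_cost(50_000, 10_000, 0.30, 1.20)   # MiniMax M3, standard tier
```

With these inputs an attempt costs about $0.114 on GLM-5.1 and $0.027 on MiniMax M3. If GLM-5.1 hypothetically succeeded on 90% of attempts, `break_even_success_rate(glm, m3, 0.9)` gives roughly 0.21, meaning MiniMax M3 would have to succeed on fewer than about one attempt in five before GLM-5.1 became cheaper on token cost alone.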

Qwen has moved far beyond Qwen 2.5

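Qwen3.7 Plus and Max now offer 1M context through Alibaba Cloud Model Studio. Plus is the more practical price-performance route at ¥2 input and ¥8 output per 1M tokens for up to 256K input. Max costs ¥12 input and ¥36 output per 1M tokens, six times the Plus input price, and should be reserved for tasks where its quality difference is measurable.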

Recommended Routing

Workload	Starting Model
Classification, extraction, simple tool calls	DeepSeek V4 Flash
Cost-sensitive coding agent	Xiaomi MiMo-V2.5
Long codebase, image/video input, long output	MiniMax M3
Multi-hour engineering workflow	GLM-5.1
Existing Alibaba Cloud stack	Qwen3.7 Plus
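The routing table above can be kept as a small lookup in application code. This is a minimal sketch: the `Workload` categories and the `starting_model` helper are hypothetical names, and the values are the display names from the table, not API model identifiers, which would need to be taken from each provider's documentation.

```python
from enum import Enum


class Workload(Enum):
    SIMPLE_TEXT = "classification, extraction, simple tool calls"
    BUDGET_CODING_AGENT = "cost-sensitive coding agent"
    LONG_MULTIMODAL = "long codebase, image/video input, long output"
    MULTI_HOUR_ENGINEERING = "multi-hour engineering workflow"
    ALIBABA_CLOUD_STACK = "existing Alibaba Cloud stack"


# Starting points only; display names, not provider model IDs.
STARTING_MODEL = {
    Workload.SIMPLE_TEXT: "DeepSeek V4 Flash",
    Workload.BUDGET_CODING_AGENT: "Xiaomi MiMo-V2.5",
    Workload.LONG_MULTIMODAL: "MiniMax M3",
    Workload.MULTI_HOUR_ENGINEERING: "GLM-5.1",
    Workload.ALIBABA_CLOUD_STACK: "Qwen3.7 Plus",
}


def starting_model(workload: Workload) -> str:
    """Return the model to try first for a workload category."""
    return STARTING_MODEL[workload]
```

Keeping the mapping in one place makes it easy to change a starting model once measurements on real traffic suggest a better fit.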

Do not route everything to one flagship. Measure task success rate, total generated tokens, retries, latency, and human repair time. A model with a higher token price can still be cheaper per completed task, but only your workload can prove that.
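Those measurements can be aggregated from attempt logs. The sketch below assumes a hypothetical log format, the `Attempt` record, in which every attempt carries a task ID, token counts, latency, and a success flag, and it prices all tokens at flat per-1M rates with no caching or long-context tiers. Human repair time is left out because it needs an hourly rate that only your team can supply.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    task_id: str
    input_tokens: int
    output_tokens: int
    latency_s: float
    succeeded: bool


def summarize(attempts, input_per_m, output_per_m):
    """Aggregate logged attempts into per-task success and cost metrics."""
    if not attempts:
        raise ValueError("no attempts to summarize")
    tasks = {a.task_id for a in attempts}
    completed = {a.task_id for a in attempts if a.succeeded}
    # Every attempt is billed, including the ones that failed.
    total_cost = sum(
        a.input_tokens * input_per_m + a.output_tokens * output_per_m
        for a in attempts
    ) / 1_000_000
    return {
        "task_success_rate": len(completed) / len(tasks),
        "retries_per_task": (len(attempts) - len(tasks)) / len(tasks),
        "total_output_tokens": sum(a.output_tokens for a in attempts),
        "mean_latency_s": sum(a.latency_s for a in attempts) / len(attempts),
        "cost_per_completed_task": (
            total_cost / len(completed) if completed else float("inf")
        ),
    }
```

Running the same task set through two models and comparing `cost_per_completed_task` alongside `task_success_rate` gives a like-for-like view that list prices alone cannot.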

Use the AI Model Pricing Calculator to compare the USD-priced models in the canonical dataset.

Official sources checked: Z.AI pricing, GLM-5.1 docs, MiniMax pay-as-you-go pricing, MiniMax M3 announcement, Alibaba Cloud Model Studio pricing, DeepSeek pricing, and Xiaomi MiMo pricing.
