Chinese AI Models in 2026: GLM-5.1, MiniMax M3, Qwen 3.7, DeepSeek V4, and MiMo Pricing
Compare current Chinese AI model API pricing, context windows, and agent use cases across GLM-5.1, MiniMax M3, Qwen 3.7, DeepSeek V4, and Xiaomi MiMo.
Chinese AI APIs are no longer one budget category. The current market includes ultra-cheap text routes, premium long-horizon coding models, native multimodal agents, and million-token context models.
The most useful question is not which model wins a vendor benchmark, but which model completes your workload at the lowest total cost once output tokens, cached context, long-context premiums, latency, and retries are counted.
Current Price Snapshot
USD prices below use official international API pricing. RMB prices remain in RMB because converting them would introduce exchange-rate drift.
| Model | Input (per 1M tokens) | Cached Input (per 1M tokens) | Output (per 1M tokens) | Context (tokens) | Best Fit |
|---|---|---|---|---|---|
| DeepSeek V4 Flash | $0.14 | $0.0028 | $0.28 | 1M | Lowest-cost text routing |
| Xiaomi MiMo-V2.5 | $0.14 | $0.0028 | $0.28 | 1M | Affordable multimodal agents |
| MiniMax M3, up to 512K input | $0.30 | $0.06 | $1.20 | 1M | Long-context multimodal agents |
| GLM-5.1 | $1.40 | $0.26 | $4.40 | 200K | Long-horizon engineering tasks |
| Qwen3.7 Plus, up to 256K input | ¥2 | Not listed | ¥8 | 1M | Alibaba Cloud production workloads |
| Qwen3.7 Max | ¥12 | Not listed | ¥36 | 1M | Premium Qwen flagship |
For MiniMax M3 requests with more than 512K input tokens, prices rise to $0.60/M input, $0.12/M cached input, and $2.40/M output. For Qwen3.7 Plus requests with 256K to 1M input tokens, prices rise to ¥6/M input and ¥24/M output.
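Tiered pricing is easy to misapply by hand, so here is a minimal sketch of a per-request cost estimate for the two tiered models. It assumes the tier is chosen by the request's total input length and that the whole request is billed at that tier's rates, and it reads "512K", "256K", and "1M" as 512,000, 256,000, and 1,000,000 tokens. Check each vendor's billing documentation before relying on either assumption. The model keys (`"minimax-m3"`, `"qwen3.7-plus"`) are illustrative labels, not API model IDs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    max_input_tokens: int          # inclusive upper bound on request input for this tier
    input_per_m: float             # price per 1M uncached input tokens
    cached_per_m: Optional[float]  # price per 1M cached input tokens (None = not listed)
    output_per_m: float            # price per 1M output tokens


# Prices from the table above, in each model's listing currency.
# Assumption: "512K", "256K", and "1M" mean 512_000, 256_000, and 1_000_000 tokens.
PRICING = {
    "minimax-m3": [  # USD
        Tier(512_000, 0.30, 0.06, 1.20),
        Tier(1_000_000, 0.60, 0.12, 2.40),
    ],
    "qwen3.7-plus": [  # RMB; no cached-input price listed
        Tier(256_000, 2.0, None, 8.0),
        Tier(1_000_000, 6.0, None, 24.0),
    ],
}


def request_cost(model: str, input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimated cost of one request in the model's listing currency.

    Assumes the tier is selected by the request's total input length and that
    every token in the request is billed at that tier's rates.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    tier = next((t for t in PRICING[model] if input_tokens <= t.max_input_tokens), None)
    if tier is None:
        raise ValueError(f"{input_tokens} input tokens exceeds the largest listed tier")
    if cached_tokens and tier.cached_per_m is None:
        raise ValueError(f"no cached-input price listed for {model}")
    uncached = input_tokens - cached_tokens
    total = uncached * tier.input_per_m + output_tokens * tier.output_per_m
    if cached_tokens:
        total += cached_tokens * tier.cached_per_m
    return total / 1_000_000


# A 600K-token MiniMax M3 prompt falls into the higher tier:
# 600K x $0.60/M + 20K x $2.40/M.
print(request_cost("minimax-m3", 600_000, 20_000))
# A 400K-token prompt with 300K tokens served from cache stays in the lower tier.
print(request_cost("minimax-m3", 400_000, 10_000, cached_tokens=300_000))
```

Raising an error for a cached-token count on Qwen3.7 Plus is deliberate: the table lists no cached rate, and silently billing those tokens at the full input price would hide a gap in the data.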
What Changed
MiniMax M3 makes long context affordable
MiniMax M3 launched on June 1, 2026, with a 1M-token context window, native image and video input, tool use, and a very large maximum output allowance. Its standard international price of $0.30 input and $1.20 output per 1M tokens (for requests up to 512K input tokens) makes it a strong candidate for long codebases and multimodal agents.
GLM-5.1 focuses on sustained execution
Z.AI positions GLM-5.1 for long-horizon coding and agent work rather than short benchmark prompts. Its international API price is materially higher than the alternatives: at $1.40 input and $4.40 output per 1M tokens, it costs about 4.7x MiniMax M3's input price and 10x that of DeepSeek V4 Flash or MiMo-V2.5. Teams should test whether fewer retries and better task completion justify that premium.
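One way to frame that test is a break-even calculation: how much more often must the premium model succeed for its cost per completed task to match a cheaper model's? The sketch below assumes failed attempts are simply retried, attempts are independent, and both models use the same token counts per attempt. The 200K/20K token counts and the 20% baseline success rate are hypothetical inputs chosen for illustration, not measurements.

```python
def attempt_cost(input_tokens: int, output_tokens: int, input_per_m: float, output_per_m: float) -> float:
    """List-price cost of one attempt, ignoring caching."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per completed task when failed attempts are simply retried.

    Assumes independent attempts with a fixed success rate, so the expected
    number of attempts is 1 / success_rate (a geometric distribution).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def break_even_success_rate(premium_cost: float, cheap_cost: float, cheap_success: float) -> float:
    """Success rate the premium model needs to match the cheaper one per completed task.

    A value above 1.0 means retries alone cannot recover the price gap; the
    premium model would also need fewer tokens per attempt or less human repair.
    """
    return cheap_success * premium_cost / cheap_cost


# Hypothetical attempt: 200K input and 20K output tokens, identical for both models.
glm = attempt_cost(200_000, 20_000, 1.40, 4.40)      # GLM-5.1 list prices
minimax = attempt_cost(200_000, 20_000, 0.30, 1.20)  # MiniMax M3, up-to-512K tier

# Hypothetical 20% success rate for MiniMax M3 on a hard task.
needed = break_even_success_rate(glm, minimax, 0.20)
print(f"GLM-5.1 must succeed {needed:.0%} of the time to break even")
```

The useful output is the ratio, not any single number: at these list prices a GLM-5.1 attempt costs roughly 4.4x a MiniMax M3 attempt, so the premium only pays off on tasks where the cheaper model's success rate is low or where GLM-5.1 finishes in fewer tokens.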
Qwen has moved far beyond Qwen 2.5
Qwen3.7 Plus and Max now offer 1M-token context through Alibaba Cloud Model Studio. Plus is the more practical price-performance route. Max costs 6x Plus on input and 4.5x on output (against Plus's up-to-256K tier), so reserve it for tasks where its quality difference is measurable.
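Because the input and output multiples differ, the real Max-over-Plus premium depends on a workload's input/output mix. A minimal sketch, limited to requests inside Plus's up-to-256K tier (read as 256,000 tokens) and ignoring caching since no cached rate is listed:

```python
# Qwen3.7 list prices in RMB per 1M tokens, Plus at its up-to-256K tier.
PLUS_INPUT, PLUS_OUTPUT = 2.0, 8.0
MAX_INPUT, MAX_OUTPUT = 12.0, 36.0


def max_to_plus_multiple(input_tokens: int, output_tokens: int) -> float:
    """How many times more one request costs on Qwen3.7 Max than on Qwen3.7 Plus.

    Only covers requests within Plus's first tier (up to 256K input tokens,
    assumed to mean 256_000); caching is ignored because no cached rate is listed.
    """
    if input_tokens > 256_000:
        raise ValueError("outside Plus's up-to-256K pricing tier")
    if input_tokens + output_tokens <= 0:
        raise ValueError("request must contain tokens")
    plus_cost = input_tokens * PLUS_INPUT + output_tokens * PLUS_OUTPUT
    max_cost = input_tokens * MAX_INPUT + output_tokens * MAX_OUTPUT
    return max_cost / plus_cost


# The multiple always lies between 4.5x (output only) and 6x (input only).
print(max_to_plus_multiple(100_000, 10_000))
```

Input-heavy work such as long-document extraction sits near the 6x end, while generation-heavy work sits closer to 4.5x.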
Recommended Routing
| Workload | Starting Model |
|---|---|
| Classification, extraction, simple tool calls | DeepSeek V4 Flash |
| Cost-sensitive coding agent | Xiaomi MiMo-V2.5 |
| Long codebase, image/video input, long output | MiniMax M3 |
| Multi-hour engineering workflow | GLM-5.1 |
| Existing Alibaba Cloud stack | Qwen3.7 Plus |
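The routing table above can be expressed as a small lookup with one guard: GLM-5.1's 200K window is the only one below 1M in the price table, so a long prompt for a multi-hour workflow needs somewhere else to go. This is a sketch under stated assumptions. The model strings are display names, not the exact IDs each provider's API expects; falling back to MiniMax M3 is a hypothetical policy, not a recommendation from the table; and the check ignores output tokens, which many APIs also count against the context window.

```python
from enum import Enum


class Workload(Enum):
    CLASSIFICATION = "classification, extraction, simple tool calls"
    BUDGET_CODING = "cost-sensitive coding agent"
    LONG_CONTEXT_MULTIMODAL = "long codebase, image/video input, long output"
    LONG_HORIZON = "multi-hour engineering workflow"
    ALIBABA_STACK = "existing Alibaba Cloud stack"


# Starting models from the routing table (display names, not API model IDs).
STARTING_MODEL = {
    Workload.CLASSIFICATION: "DeepSeek V4 Flash",
    Workload.BUDGET_CODING: "Xiaomi MiMo-V2.5",
    Workload.LONG_CONTEXT_MULTIMODAL: "MiniMax M3",
    Workload.LONG_HORIZON: "GLM-5.1",
    Workload.ALIBABA_STACK: "Qwen3.7 Plus",
}

# Context windows from the price table, assuming 200K = 200_000 and 1M = 1_000_000.
CONTEXT_WINDOW = {
    "DeepSeek V4 Flash": 1_000_000,
    "Xiaomi MiMo-V2.5": 1_000_000,
    "MiniMax M3": 1_000_000,
    "GLM-5.1": 200_000,
    "Qwen3.7 Plus": 1_000_000,
}

# Hypothetical fallback when a prompt does not fit the starting model.
LONG_CONTEXT_FALLBACK = "MiniMax M3"


def pick_model(workload: Workload, input_tokens: int) -> str:
    """Return the starting model for a workload, falling back if the prompt won't fit."""
    model = STARTING_MODEL[workload]
    if input_tokens <= CONTEXT_WINDOW[model]:
        return model
    if input_tokens <= CONTEXT_WINDOW[LONG_CONTEXT_FALLBACK]:
        return LONG_CONTEXT_FALLBACK
    raise ValueError(f"{input_tokens} input tokens exceeds every listed context window")


print(pick_model(Workload.LONG_HORIZON, 150_000))  # fits GLM-5.1's window
print(pick_model(Workload.LONG_HORIZON, 300_000))  # too long for GLM-5.1, falls back
```

Keeping the mapping in data rather than in branching code makes it cheap to swap a starting model once your own measurements disagree with the table.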
Do not route everything to one flagship. Measure task success rate, total generated tokens, retries, latency, and human repair time. A model with a higher token price can still be cheaper per completed task, but only your workload can prove that.
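A minimal sketch of turning those measurements into a cost per completed task, assuming you log one record per attempt with its API spend, outcome, latency, generated tokens, and any human repair time. The `Attempt` fields, the hourly repair rate, and the sample log values are hypothetical; substitute whatever your own logging captures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Attempt:
    task_id: str
    token_cost: float            # API spend for this attempt, in one currency
    succeeded: bool
    latency_s: float
    output_tokens: int           # tokens generated by the model on this attempt
    repair_minutes: float = 0.0  # human time spent fixing this attempt's output


def summarize(attempts: List[Attempt], repair_rate_per_hour: float) -> Dict[str, float]:
    """Aggregate logged attempts for one model into per-task metrics."""
    if not attempts:
        raise ValueError("no attempts logged")
    by_task: Dict[str, List[Attempt]] = {}
    for a in attempts:
        by_task.setdefault(a.task_id, []).append(a)

    completed = sum(1 for runs in by_task.values() if any(r.succeeded for r in runs))
    token_spend = sum(a.token_cost for a in attempts)
    repair_spend = sum(a.repair_minutes for a in attempts) * repair_rate_per_hour / 60
    return {
        "task_success_rate": completed / len(by_task),
        "retries": len(attempts) - len(by_task),
        "generated_tokens": sum(a.output_tokens for a in attempts),
        "mean_latency_s": sum(a.latency_s for a in attempts) / len(attempts),
        # Failed tasks still cost money, so their spend is spread over completed ones.
        "cost_per_completed_task": (
            (token_spend + repair_spend) / completed if completed else float("inf")
        ),
    }


# Hypothetical log: task t1 needed a retry plus 6 minutes of human repair.
log = [
    Attempt("t1", 0.10, False, 42.0, 3_000),
    Attempt("t1", 0.10, True, 38.0, 2_500, repair_minutes=6),
    Attempt("t2", 0.20, True, 55.0, 4_000),
]
print(summarize(log, repair_rate_per_hour=30.0))
```

Running the same summary for each candidate model on the same task set gives a like-for-like comparison; in the sample log, six minutes of repair costs far more than all three API calls combined.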
Use the AI Model Pricing Calculator to compare the USD-priced models side by side.
Official sources checked: Z.AI pricing, GLM-5.1 docs, MiniMax pay-as-you-go pricing, MiniMax M3 announcement, Alibaba Cloud Model Studio pricing, DeepSeek pricing, and Xiaomi MiMo pricing.