
Google Gemini API Pricing 2026: Gemini 2.5 Pro at $1.25/M — Complete Guide

Full Google Gemini API pricing for 2026. Gemini 2.5 Pro at $1.25/$10, Flash at $0.15/$0.60 per 1M tokens. Free tier included. Compare with GPT-5, Claude, DeepSeek + monthly cost estimates.

DevTk.AI · Published 2026-02-24 · Updated 2026-02-24

Google’s Gemini API has quietly become one of the most compelling options for AI developers in 2026. While OpenAI and Anthropic dominate the headlines, Gemini offers a combination that no other provider matches: flagship-level performance at $1.25/M input tokens, a 1 million token context window, and a genuinely useful free tier.

Whether you are processing massive documents, building agentic workflows, or just prototyping your next idea without spending a dime, Gemini deserves a serious look. The 2.5 generation — released in early 2026 — brought major improvements to reasoning, coding, and multimodal capabilities while keeping prices flat or lower than previous generations.

This guide covers every Gemini model’s pricing, the free tier details, context window cost implications, head-to-head comparisons with GPT-5, Claude, and DeepSeek, plus monthly cost estimates and optimization strategies to help you get the most value from Google’s AI platform.

Gemini 2026 Model Lineup & Pricing

Google currently offers three main API models in the Gemini family, each targeting different use cases and budgets:

Model | Input Price | Output Price | Context | Max Output | Best For
Gemini 2.5 Pro | $1.25/M | $10.00/M | 1M | 64K | Complex reasoning, long documents, coding
Gemini 2.5 Flash | $0.15/M | $0.60/M | 1M | 64K | High-throughput, cost-sensitive tasks
Gemini 2.0 Flash (legacy) | $0.10/M | $0.40/M | 1M | 8K | Legacy apps, lowest-cost option

All prices in USD per 1 million tokens. Source: Google AI pricing. Last updated: February 2026.

What Makes This Lineup Stand Out

The Gemini pricing structure has two key advantages over competitors:

  1. Gemini 2.5 Pro matches GPT-5’s price point exactly ($1.25 input, $10.00 output) while offering 2.5x the context window (1M vs 400K tokens). Dollar for dollar, you get more context capacity with Gemini.

  2. Gemini 2.5 Flash is uniquely positioned at $0.15/$0.60 — it is cheaper than nearly every mid-tier model on the market while still being a 2.5-generation model with strong reasoning capabilities. For comparison, Claude Haiku 4.5 costs $1.00/$5.00 — over 6.7x more expensive on input and 8.3x more on output.

Gemini Free Tier — The Biggest Differentiator

This is where Google separates itself from every other major provider. Gemini offers a genuinely useful free tier through the Google AI Studio API:

Model | Free Rate Limit | Free Token Limit
Gemini 2.5 Pro | 25 requests per minute | Free up to usage limits
Gemini 2.5 Flash | 500 requests per minute | Free up to usage limits
Gemini 2.0 Flash | 500 requests per minute | Free up to usage limits

Why This Matters

Neither OpenAI nor Anthropic offers a free API tier. OpenAI gives new accounts a small credit that expires, and Anthropic requires payment from the start. Google is the only major provider where you can:

  • Prototype an entire application without entering a credit card
  • Run a hobby project in production at zero cost (within rate limits)
  • Test and evaluate Gemini models against your specific use case before committing budget
  • Build student and educational projects without worrying about API costs

For solo developers and students, the Gemini 2.5 Flash free tier at 500 RPM is particularly generous. That is enough throughput to handle a small production app serving real users — completely free.
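As a quick sanity check on that claim, 500 requests per minute works out to a substantial daily budget:

```python
# Back-of-envelope throughput math for the Gemini 2.5 Flash free tier.
rpm = 500                        # free-tier requests per minute (Flash)
requests_per_day = rpm * 60 * 24
print(requests_per_day)          # 720000
```

That is 720,000 requests per day, far more than most hobby projects will ever need.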

Context Window Pricing — The 200K Threshold

Gemini 2.5 Pro has a tiered pricing structure based on how much of the 1M context window you actually use. This is an important detail that many developers miss:

Gemini 2.5 Pro Context-Based Pricing

Context Usage | Input Price | Output Price
Under 200K tokens | $1.25/M | $10.00/M
Over 200K tokens | $2.50/M | $15.00/M

When your total prompt (system prompt + conversation history + user input) exceeds 200,000 tokens, the entire request is billed at the higher rate — not just the tokens above the threshold. This means there is a significant cost jump at the 200K boundary.
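Because the whole request re-prices once the prompt crosses the threshold, the jump is easy to model. Here is a minimal sketch of the billing rule described above, using the prices from the table (the helper name is ours, not part of any SDK):

```python
def gemini_25_pro_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request, applying the 200K threshold rule:
    once input exceeds 200K tokens, the ENTIRE request bills at the higher rate."""
    if input_tokens > 200_000:
        in_rate, out_rate = 2.50, 15.00   # over-200K pricing
    else:
        in_rate, out_rate = 1.25, 10.00   # standard pricing
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# One token over the threshold nearly doubles the bill:
print(round(gemini_25_pro_request_cost(200_000, 1_000), 4))  # 0.26
print(round(gemini_25_pro_request_cost(200_001, 1_000), 4))  # 0.515
```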

Gemini 2.5 Flash Context-Based Pricing

Context Usage | Input Price | Output Price
Under 200K tokens | $0.15/M | $0.60/M
Over 200K tokens | $0.30/M | $1.20/M

Flash follows the same 2x multiplier pattern above 200K tokens.

What This Means in Practice

If you are processing long documents — legal contracts, codebases, research papers — and your inputs regularly exceed 200K tokens, your effective cost for Gemini 2.5 Pro doubles to $2.50/$15.00. At that price point, it is more expensive than GPT-5 ($1.25/$10.00, which supports up to 400K context at flat pricing).

Strategy: If your inputs are between 200K and 400K tokens, GPT-5 may actually be cheaper. If your inputs exceed 400K tokens, Gemini is your only option among closed-source models — no other provider offers context windows that large.

Gemini vs GPT-5 vs Claude vs DeepSeek: Price Comparison

Here is how Gemini 2.5 Pro stacks up against every major competitor at standard pricing (under 200K context):

Model | Provider | Input | Output | vs Gemini 2.5 Pro
Gemini 2.5 Pro | Google | $1.25 | $10.00 | Baseline
GPT-5 | OpenAI | $1.25 | $10.00 | Same price
Claude Sonnet 4.5 | Anthropic | $3.00 | $15.00 | 2.4x / 1.5x more
Claude Opus 4.5 | Anthropic | $5.00 | $25.00 | 4x / 2.5x more
Grok 3 | xAI | $3.00 | $15.00 | 2.4x / 1.5x more
DeepSeek V3.2 | DeepSeek | $0.27 | $1.10 | 4.6x / 9.1x cheaper
Gemini 2.5 Flash | Google | $0.15 | $0.60 | 8.3x / 16.7x cheaper

Key Takeaways

  • Gemini 2.5 Pro and GPT-5 are priced identically on input and output. Your choice between them comes down to capabilities: Gemini wins on context window (1M vs 400K) and free tier; GPT-5 wins on ecosystem maturity and multimodal breadth.
  • Claude Sonnet 4.5 costs 2.4x more on input than Gemini 2.5 Pro. Unless you specifically need Claude’s instruction-following style or prompt caching features, Gemini offers better value.
  • DeepSeek V3.2 remains the budget king at $0.27 input — but with only 128K context and no multimodal support. For simple text tasks, it is hard to beat.
  • Gemini 2.5 Flash is the real standout at $0.15/$0.60. It is 6.7x cheaper than Claude Haiku 4.5 on input and 8.3x cheaper on output — while offering 5x the context window (1M vs 200K). For high-volume applications, Flash is arguably the best value on the market.

Monthly Cost Estimates

Let’s look at real-world costs across three common usage scenarios.

Scenario 1: Solo Developer

100K input + 50K output tokens per day

Model | Monthly Cost
Gemini 2.5 Flash | $1.35
Gemini 2.5 Pro | $18.75
DeepSeek V3.2 | $2.46
GPT-5 | $18.75
Claude Sonnet 4.5 | $31.50
Claude Opus 4.5 | $52.50

Calculation: (100K x 30 / 1M) x input price + (50K x 30 / 1M) x output price

At this usage level, Gemini 2.5 Flash costs well under two dollars a month. And with the free tier, your actual bill could be $0. Even Gemini 2.5 Pro matches GPT-5 exactly at $18.75/month.
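The calculation above generalizes to any model and volume. A small helper (ours, not part of any SDK) makes it easy to re-run with your own numbers:

```python
def monthly_cost(input_per_day: int, output_per_day: int,
                 input_price: float, output_price: float, days: int = 30) -> float:
    """Monthly USD cost from per-day token volumes and per-1M-token prices."""
    return (input_per_day * days / 1e6) * input_price \
         + (output_per_day * days / 1e6) * output_price

# Scenario 1 on Gemini 2.5 Pro: 100K input + 50K output tokens per day
print(monthly_cost(100_000, 50_000, 1.25, 10.00))  # 18.75
```

Swap in the per-model prices from the comparison table to reproduce any row of the scenario tables.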

Scenario 2: Startup

1M input + 500K output tokens per day

Model | Monthly Cost
Gemini 2.5 Flash | $13.50
Gemini 2.5 Pro | $187.50
DeepSeek V3.2 | $24.60
GPT-5 | $187.50
Claude Sonnet 4.5 | $315.00
Claude Opus 4.5 | $525.00

Calculation: (1M x 30 / 1M) x input price + (500K x 30 / 1M) x output price

At startup scale, the Flash advantage becomes dramatic: $13.50/month on Flash vs $315/month on Claude Sonnet, a 23x difference. Even if Flash’s quality is somewhat lower, the cost savings are staggering for tasks like summarization, classification, or extraction.

Scenario 3: Enterprise / Production

10M input + 5M output tokens per day

Model | Monthly Cost
Gemini 2.5 Flash | $135
Gemini 2.5 Pro | $1,875
DeepSeek V3.2 | $246
GPT-5 | $1,875
Claude Sonnet 4.5 | $3,150
Claude Opus 4.5 | $5,250

Calculation: (10M x 30 / 1M) x input price + (5M x 30 / 1M) x output price

At enterprise scale, Gemini 2.5 Flash at $135/month for 450M tokens processed per month is remarkably cost-effective. The same volume on Claude Sonnet 4.5 would run $3,150/month, more than 23x as much.

Want exact numbers for your specific usage? Try our AI Model Pricing Calculator.

When to Choose Gemini Over Other Providers

1. Long Document Processing (1M Context Advantage)

If you are building a system that needs to process entire books, legal contracts, large codebases, or lengthy research papers in a single API call, Gemini is the clear winner. No other major provider offers 1M tokens of context:

  • Gemini 2.5 Pro / Flash: 1,000,000 tokens
  • GPT-5: 400,000 tokens
  • Claude 4.5: 200,000 tokens
  • DeepSeek V3.2: 128,000 tokens

For context, 1M tokens is roughly 750,000 words — equivalent to about 10 full-length novels or a complete software repository.

2. Prototyping and Hobby Projects (Free Tier)

If you are exploring an idea, building a side project, or learning AI development, start with Gemini’s free tier. You can build and test your entire application without any API costs, then decide whether to stay with Gemini or switch providers when you go to production.

3. High-Volume, Cost-Sensitive Applications (Flash)

For applications like:

  • Content moderation at scale
  • Document classification and tagging
  • Data extraction from structured text
  • Chatbot responses for simple queries

Gemini 2.5 Flash at $0.15/$0.60 is extremely competitive. It costs less than DeepSeek V3.2 on input ($0.15 vs $0.27) while offering a far larger context window and multimodal capabilities.

4. Google Cloud Integration (Vertex AI)

If your infrastructure already runs on Google Cloud, using Gemini through Vertex AI gives you:

  • Unified billing and identity management
  • Data residency controls
  • Enterprise SLAs and support
  • Integration with BigQuery, Cloud Storage, and other GCP services

5. Multimodal Applications

Gemini 2.5 Pro supports text, images, audio, and video input natively. If your application needs to process mixed media — for example, analyzing screenshots, transcribing audio, or understanding video content — Gemini handles this within a single model and API call.

Getting Started with Gemini API

Step 1: Get Your API Key

  1. Visit Google AI Studio
  2. Sign in with your Google account
  3. Click “Get API Key” to generate your key
  4. No credit card required for free tier access

Step 2: Install the SDK

pip install google-generativeai

Step 3: Make Your First Request

Python:

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro")

response = model.generate_content("Explain quantum computing in simple terms")
print(response.text)

Using Gemini 2.5 Flash for cost-effective requests:

flash_model = genai.GenerativeModel("gemini-2.5-flash")

response = flash_model.generate_content(
    "Classify this support ticket as billing, technical, or general: "
    "'I can't log into my account after resetting my password'"
)
print(response.text)

JavaScript / TypeScript (using the REST API):

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent?key=${API_KEY}`,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      contents: [{
        parts: [{ text: 'Write a React component for a search bar' }]
      }]
    })
  }
);

const data = await response.json();
console.log(data.candidates[0].content.parts[0].text);

Using the OpenAI-compatible endpoint (easy migration):

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a quicksort function in Python"}
    ],
    temperature=0.7,
    max_tokens=2000
)

print(response.choices[0].message.content)

This OpenAI-compatible endpoint means you can switch from GPT-5 to Gemini by changing only the base_url, the api_key, and the model name, leaving the rest of your OpenAI client code untouched.

Cost Optimization Tips

1. Use Flash for Simple Tasks, Pro for Complex Ones

The single biggest cost optimization: do not use Gemini 2.5 Pro for tasks that Flash can handle. Flash costs 8.3x less on input and 16.7x less on output. Build a routing layer that sends simple tasks (classification, extraction, summarization) to Flash and reserves Pro for complex reasoning, creative writing, and multi-step analysis.

A typical 70/30 split (Flash/Pro) can reduce your overall API costs by 60-70% compared to running everything on Pro.
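A sketch of such a routing layer, where the task buckets and function name are illustrative choices rather than an official API:

```python
# Hypothetical router: send simple task types to Flash, everything else to Pro.
SIMPLE_TASKS = {"classification", "extraction", "summarization"}

def pick_model(task_type: str) -> str:
    """Route simple tasks to Gemini 2.5 Flash; reserve Pro for the rest."""
    return "gemini-2.5-flash" if task_type in SIMPLE_TASKS else "gemini-2.5-pro"

# Blended input price for a 70/30 Flash/Pro split vs all-Pro:
blended = 0.7 * 0.15 + 0.3 * 1.25   # about $0.48/M vs $1.25/M
print(f"savings: {1 - blended / 1.25:.0%}")  # savings: 62%
```

The blended-rate arithmetic is what drives the 60-70% figure: at a 70/30 split, the effective input rate drops from $1.25/M to roughly $0.48/M.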

2. Stay Under 200K Context to Avoid the 2x Price Jump

This is critical for Gemini 2.5 Pro users. Once your prompt exceeds 200K tokens, pricing doubles to $2.50/$15.00. Strategies to stay under:

  • Chunk long documents instead of processing them in a single call
  • Summarize earlier parts of a conversation before appending new context
  • Use retrieval-augmented generation (RAG) to pull only relevant sections instead of feeding entire documents
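The first strategy, chunking, can be as simple as slicing the tokenized document so that no single call crosses the threshold. A minimal sketch (the function name is ours; the 190K default leaves headroom below 200K for a system prompt):

```python
def chunk_tokens(tokens: list, limit: int = 190_000) -> list:
    """Split a token sequence into chunks that each stay safely under the
    200K threshold, so every request bills at the lower pricing tier."""
    return [tokens[i:i + limit] for i in range(0, len(tokens), limit)]

doc = list(range(450_000))          # stand-in for a 450K-token document
chunks = chunk_tokens(doc)
print([len(c) for c in chunks])     # [190000, 190000, 70000]
```

Three cheap calls at $1.25/M input cost less than one call for the whole document at $2.50/M.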

3. Maximize the Free Tier for Development

Do all your development, testing, and prototyping on the free tier. At 25 RPM for Pro and 500 RPM for Flash, you have more than enough throughput for development workflows. Only switch to paid billing when you are ready for production.

4. Use Context Caching for Repeated Prompts

Google offers context caching for Gemini models, which reduces costs when you repeatedly send the same large context (like a system prompt or reference document). Cached tokens are billed at a reduced rate on subsequent requests, similar to Anthropic’s prompt caching.
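Whether caching pays off depends on the discount and how often the context is reused. Google publishes the exact cached-token rates per model; the sketch below assumes a hypothetical 75% discount and ignores cache storage fees, so treat it as a break-even estimator rather than a quote:

```python
def caching_savings(context_tokens: int, reuse_count: int,
                    input_price: float, cache_discount: float = 0.75) -> float:
    """USD saved by caching a context that is sent reuse_count times.
    cache_discount is HYPOTHETICAL; check Google's published cached-token
    rates. Cache storage fees, which also apply, are ignored here."""
    per_send = context_tokens / 1e6 * input_price
    without_cache = per_send * reuse_count
    # First send pays full price; each subsequent reuse pays the discounted rate.
    with_cache = per_send * (1 + (reuse_count - 1) * (1 - cache_discount))
    return without_cache - with_cache

# A 1M-token reference document on Gemini 2.5 Pro, reused 10 times:
print(caching_savings(1_000_000, 10, 1.25))  # 8.4375
```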

5. Count Tokens Before Sending

Use our AI Token Counter to estimate token counts before making API calls. This helps you:

  • Predict costs accurately
  • Stay under the 200K threshold when it matters
  • Avoid surprise bills from oversized inputs
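For a quick pre-flight estimate without an API call, a rough characters-per-token heuristic is often enough. The ~4 characters/token ratio below is a common rule of thumb for English text, not an exact tokenizer:

```python
def rough_token_estimate(text: str) -> int:
    """Very rough estimate: about 4 characters per token for English prose.
    Use the SDK's count_tokens method for exact, billing-grade counts."""
    return max(1, len(text) // 4)

print(rough_token_estimate("Explain quantum computing in simple terms"))  # 10
```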

Gemini Strengths and Limitations

Strengths

  • 1M context window — the largest among major providers, unmatched for document processing
  • Generous free tier — prototype and test without any cost
  • Competitive pricing — Pro matches GPT-5; Flash undercuts nearly everything
  • Native multimodal — text, images, audio, and video in a single model
  • Google ecosystem — deep integration with Vertex AI, GCP, and Google Workspace
  • OpenAI-compatible endpoint — easy migration from OpenAI with minimal code changes

Limitations

  • 200K context price doubling — using the full 1M window gets expensive at $2.50/$15.00
  • Smaller third-party ecosystem — fewer community tools and integrations compared to OpenAI
  • Rate limits on free tier — 25 RPM for Pro limits real production use on free tier
  • Less established for enterprise — Anthropic and OpenAI have stronger enterprise sales and compliance tracks
  • Output quality variance — some developers report less consistent output compared to Claude on creative and nuanced tasks

Bottom Line

Google Gemini’s 2026 pricing makes it one of the most versatile AI API choices available. Gemini 2.5 Pro at $1.25/$10.00 matches GPT-5 dollar for dollar while offering 2.5x the context window and a free tier that neither OpenAI nor Anthropic can match. For high-volume and cost-sensitive workloads, Gemini 2.5 Flash at $0.15/$0.60 is arguably the best value in the entire AI API market — cheaper than DeepSeek on input, with a far larger context window and multimodal support.

The main consideration is the 200K context pricing threshold. If your workloads regularly exceed 200K tokens, factor in the 2x price multiplier when comparing costs. For everything under 200K tokens, Gemini offers flagship performance at mid-tier prices with the added bonus of a generous free tier for development.

For most developers, the winning strategy is: start with Flash for everything, move to Pro for tasks that need it, and take full advantage of the free tier during development. This approach can keep your AI API costs 60-80% lower than comparable setups on OpenAI or Anthropic.

Related tools and guides: