What capabilities does Llama 3.1 405B have?

Llama 3.1 405B supports text and function calling. It is open-source, and pricing varies by hosting provider.

Llama 3.1 405B

Meta (via providers)

Updated May 2026. Llama 3.1 405B by Meta (via providers): $3.50 per 1M cache-miss input tokens, $3.50 per 1M output tokens. 128K context window, 4K max output. Supports function calling.

Input Price: $3.50 per 1M tokens (cache miss)
Output Price: $3.50 per 1M tokens
Context Window: 128K tokens

Specifications

Provider	Meta (via providers)
Model ID	llama-3-1-405b
Input Price	$3.50 / 1M cache-miss tokens
Output Price	$3.50 / 1M tokens
Context Window	128K tokens
Max Output	4K tokens
Capabilities	text, function_calling
Release Date	2024-07
Pricing Source	Official Meta (via providers) pricing
Price Verified	2026-02-26 · Hosted Llama pricing depends on the third-party API provider.
Notes	Open-source. Pricing varies by hosting provider.

Monthly Cost Estimates

Estimated monthly costs at different daily usage levels, assuming a 50% input / 50% output split and a 30-day month; annual figures are 12 times the monthly cost. Input estimates use cache-miss pricing, so cache-heavy workloads can cost less.

Daily Tokens	Monthly Cost	Annual Cost
10K	$1.05	$12.60
50K	$5.25	$63.00
100K	$10.50	$126.00
500K	$52.50	$630.00
1M	$105.00	$1,260.00
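
The table rows can be recomputed with a short script. This is a minimal sketch, not the site's actual calculator: the function names are made up for illustration, and the 30-day month is an assumption inferred from the figures above.

```python
# Prices from this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 3.50   # cache-miss input tokens
OUTPUT_PRICE_PER_M = 3.50  # output tokens
DAYS_PER_MONTH = 30        # assumption: inferred from the table, not stated on the page


def monthly_cost(daily_tokens: int, input_share: float = 0.5) -> float:
    """Estimated monthly cost in USD for a given daily token volume."""
    input_tokens = daily_tokens * input_share
    output_tokens = daily_tokens * (1 - input_share)
    daily_usd = (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    return round(daily_usd * DAYS_PER_MONTH, 2)


def annual_cost(daily_tokens: int, input_share: float = 0.5) -> float:
    """Estimated annual cost in USD: twelve times the monthly estimate."""
    return round(monthly_cost(daily_tokens, input_share) * 12, 2)


if __name__ == "__main__":
    for daily in (10_000, 50_000, 100_000, 500_000, 1_000_000):
        print(f"{daily:>9,}  ${monthly_cost(daily):>8.2f}  ${annual_cost(daily):>9.2f}")
```

Because input and output are priced identically here, changing `input_share` does not change the result; the parameter only matters if a provider charges different rates for the two.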

About Llama 3.1 405B

Llama 3.1 405B is a large language model by Meta (via providers). It features a 128K token context window with up to 4K tokens of output per request. The model supports two capabilities: text and function calling.

At $3.50 per million cache-miss input tokens and $3.50 per million output tokens, Llama 3.1 405B is positioned as a mid-range option in the Meta (via providers) lineup. Use our Token Counter to estimate how many tokens your prompts use, and our Pricing Calculator to compare costs across all models.

Llama 3.1 405B Key Details

Pricing: $3.50/M cache-miss input tokens, $3.50/M output tokens
Context window: 128K tokens, shared between the input prompt and the output response
Max output: 4K tokens per response
Capabilities: text, function_calling
Highlights: Open-source. Pricing varies by hosting provider.
Released: 2024-07

Other Meta (via providers) Models

Llama 3.3 70B	$0.88 / $0.88 per 1M

Similar Price Range

Claude Sonnet 4.6	Anthropic	$3 / $15 per 1M
Grok 3	xAI	$3 / $15 per 1M
Grok 4	xAI	$3 / $15 per 1M

Related Tools

AI Token Counter

Count tokens for Llama 3.1 405B

Pricing Calculator

Compare all model prices

Throughput Planner

Plan RPM, TPM, and monthly cost for Llama 3.1 405B

FAQ

How much does Llama 3.1 405B cost?

Llama 3.1 405B costs $3.50 per million cache-miss input tokens and $3.50 per million output tokens. For a typical workload of 100K input tokens/day and 50K output tokens/day, expect approximately $15.75/month before cache-hit savings.
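
The $15.75 figure can be checked by hand. The worked arithmetic below assumes a 30-day month, which is not stated on the page but is consistent with the monthly estimates listed earlier:

```latex
\[
\text{monthly cost}
  = \frac{(100{,}000 \times 3.50 + 50{,}000 \times 3.50) \times 30}{1{,}000{,}000}
  = \frac{525{,}000 \times 30}{1{,}000{,}000}
  = 15.75 \text{ USD}
\]
```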

What is Llama 3.1 405B's context window?

Llama 3.1 405B supports a context window of 128K tokens. This means your combined input prompt and output response can be up to 128K tokens. The maximum output per response is 4K tokens.
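
A rough way to budget a request against these limits is sketched below. It is illustrative only: the helper names are hypothetical, and "128K" and "4K" are read as 128,000 and 4,000 tokens, which is an assumption because the exact limits depend on the hosting provider.

```python
# Rounded limits from this page; exact values vary by hosting provider (assumption).
CONTEXT_WINDOW = 128_000  # combined prompt + response tokens
MAX_OUTPUT = 4_000        # maximum tokens in a single response


def max_prompt_tokens(requested_output: int) -> int:
    """Largest prompt that still leaves room for the requested output."""
    if not 0 < requested_output <= MAX_OUTPUT:
        raise ValueError(f"requested_output must be between 1 and {MAX_OUTPUT}")
    return CONTEXT_WINDOW - requested_output


def fits(prompt_tokens: int, requested_output: int) -> bool:
    """True if the prompt plus the requested output fits in the context window."""
    return 0 <= prompt_tokens <= max_prompt_tokens(requested_output)
```

For example, reserving the full 4,000-token output leaves 124,000 tokens for the prompt under these assumed limits, so `fits(120_000, 4_000)` is true and `fits(125_000, 4_000)` is false.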

Is Llama 3.1 405B good for my use case?

Llama 3.1 405B supports text and function calling. As a mid-range model, it balances capability and cost for most production use cases. Use our Pricing Calculator to compare with alternatives.