This post is that one place. Every lever that moves your Anthropic invoice, with concrete numbers and worked examples. Current as of April 2026.
The Base Rates
Claude's pricing is per million tokens, split between input and output. Output tokens are always more expensive than input tokens; across the current lineup the ratio is exactly 5×.
As of April 2026:
| Model | Input / M tokens | Output / M tokens |
|---|---|---|
| Claude Opus 4.x | $15.00 | $75.00 |
| Claude Sonnet 4.x | $3.00 | $15.00 |
| Claude Sonnet 4.x (1M context) | $6.00 | $30.00 |
| Claude Haiku 4.x | $0.80 | $4.00 |
These are the "direct" list rates. Everything else on this page is either a discount off these, a multiplier on top, or a conversion rule that turns non-text content (images, PDFs, tool calls) into billable tokens.
Why output costs 5× input. Output tokens are generated one at a time, each requiring its own full forward pass through the model. Input tokens are processed together in a single parallel pass, which is far cheaper per token. The economics of transformer inference favor readers over writers.
Why 1M-context Sonnet costs 2×. Longer context windows require more memory and more expensive infrastructure per call. Anthropic prices the 1M variant at exactly 2× the base Sonnet rate.
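To make the rest of the post concrete, here is the table above as a minimal cost helper. The rates are the April 2026 list prices from the table; the model keys are informal labels for this sketch, not official API model identifiers.

```python
# April 2026 list rates, dollars per million tokens (from the table above).
# Keys are informal labels, not official API model IDs.
RATES = {
    "opus":      {"input": 15.00, "output": 75.00},
    "sonnet":    {"input": 3.00,  "output": 15.00},
    "sonnet-1m": {"input": 6.00,  "output": 30.00},
    "haiku":     {"input": 0.80,  "output": 4.00},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one uncached, non-batch call at list rates."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# A 2,000-token prompt with a 500-token reply on Sonnet:
# 2,000 × $3/M + 500 × $15/M = $0.006 + $0.0075 = $0.0135
print(call_cost("sonnet", 2_000, 500))
```

Note how the 5× output premium shows up immediately: the 500-token reply costs more than the 2,000-token prompt.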
Prompt Caching: The Biggest Discount
Prompt caching can save you up to 90% on cached input tokens. It's the single largest cost lever Anthropic offers, and most teams leave it partly on the table.
How it's billed (April 2026):
- Cache write (first time you cache a prefix): ~1.25× the normal input rate. You pay a small premium to store it.
- Cache read (every subsequent hit within the TTL): ~0.10× the normal input rate. 90% off.
- TTL: 5 minutes default; 1-hour TTL is available at a higher write premium.
Concrete example. You have a 10,000-token system prompt you send with every call.
Without caching, after 1,000 calls:
- 10,000 × 1,000 calls = 10M tokens × $3/M (Sonnet) = $30 just for the system prompt
With caching, after 1,000 calls:
- First call: 10,000 × $3/M × 1.25 = $0.0375 (cache write)
- Next 999 calls: 10,000 × $3/M × 0.10 = $0.003 each = $2.997 total
- Grand total: ~$3.03 — a 90% reduction on that segment.
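The worked example above can be reproduced with a few lines. This assumes the ideal case: every call after the first lands within the cache TTL, so there is exactly one write and 999 reads.

```python
# Reproduces the worked example: a 10,000-token system prompt on Sonnet
# ($3/M input), 1,000 calls, every call after the first hitting the cache.
INPUT_RATE = 3.00 / 1_000_000   # dollars per token
WRITE_MULT, READ_MULT = 1.25, 0.10

def cached_prefix_cost(prefix_tokens: int, calls: int) -> float:
    write = prefix_tokens * INPUT_RATE * WRITE_MULT           # first call
    reads = (calls - 1) * prefix_tokens * INPUT_RATE * READ_MULT
    return write + reads

uncached = 10_000 * 1_000 * INPUT_RATE         # $30.00
cached = cached_prefix_cost(10_000, 1_000)     # ~$3.03
print(f"${uncached:.2f} uncached vs ${cached:.2f} cached")
```

In real traffic the savings are lower, because some calls arrive after the TTL expires and trigger a fresh write.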
Gotchas:
- Caching only works for exact prefix matches. Change the first character of your system prompt and you invalidate the cache.
- Dynamic content (dates, user IDs, "Today is…") must live at the end of the prompt, not the beginning, or you defeat the whole system.
- The cache expires on its TTL regardless of activity. High-frequency calls benefit more than low-frequency.
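The prefix-ordering gotcha is easiest to see in a payload. The sketch below uses the Messages API prompt-caching block shape (`cache_control: {"type": "ephemeral"}`); the instruction text is hypothetical. Everything before the cache marker must be byte-identical across calls; dynamic content comes after it.

```python
from datetime import date

# Hypothetical static instructions -- identical bytes on every call.
STATIC_INSTRUCTIONS = "You are a support agent for ExampleCo. Be concise."

system = [
    # Stable prefix ends at the cache marker, so it cache-hits.
    {
        "type": "text",
        "text": STATIC_INSTRUCTIONS,
        "cache_control": {"type": "ephemeral"},
    },
    # Dynamic content lives AFTER the marker, never before it.
    {"type": "text", "text": f"Today is {date.today().isoformat()}."},
]

# Pass system=system to client.messages.create(...) as usual.
print(system[0]["cache_control"])
```

Put the date in the first block instead and every call becomes a cache miss.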
Batch API: 50% Off For Patient Workloads
If a workload can wait up to 24 hours, the Batch API gives you 50% off both input and output tokens. You submit a batch of requests, Anthropic processes them asynchronously, and results come back to a bucket you pull from. In practice, batches usually complete in minutes, not hours.
When batch wins:
- Nightly summarization or classification jobs
- Large-scale evaluation runs
- Document indexing
- Anything a user is not blocking on
When it doesn't:
- Real-time chat
- Agent loops (each step depends on the last)
- Anything where human latency matters
A reasonable mental model: if you could do the work in a cron job, use batch. If someone is staring at a loading spinner, use the standard API.
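A nightly summarization job, for instance, is just a list of independent requests. The sketch below builds that list in the Message Batches request shape (`custom_id` plus `params` per request); the documents, prompt text, and model ID are made up for illustration.

```python
# Hypothetical nightly job: summarize each document at 50% off.
docs = {
    "doc-1": "First document text...",
    "doc-2": "Second document text...",
}

requests = [
    {
        "custom_id": doc_id,  # used to match results back to inputs
        "params": {
            "model": "claude-sonnet-4-5",  # assumed model ID
            "max_tokens": 512,
            "messages": [
                {"role": "user", "content": f"Summarize:\n\n{text}"}
            ],
        },
    }
    for doc_id, text in docs.items()
]

# Submit asynchronously with the SDK, e.g.:
# batch = client.messages.batches.create(requests=requests)
print(len(requests))
```

Every token in the batch, input and output, is billed at half the list rate.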
Tool Use: The Round-Trip Tax
When Claude uses a tool, you pay for two completions minimum, often more:
- First call: Claude reads the prompt + tool definitions, decides to call a tool, emits the tool call. You pay for input + output.
- Your code runs the tool and returns the result.
- Second call: Claude reads the original prompt + tool definitions + tool call + tool result, and either emits the final answer or calls another tool. You pay for input + output again.
Each round trip re-reads the full context. If your tool definitions are 3,000 tokens and you have a 10-tool agent loop, you're reading those 3,000 tokens 10+ times unless you cache them.
Practical rule. Tool definitions are the single best candidate for caching. They're stable (you're not redefining get_weather per-call) and they're re-read on every round trip. Wrapping tool schemas in a cache marker typically cuts agent-loop costs 40–70%.
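The round-trip tax on schemas alone is easy to quantify. This sketch uses the illustrative numbers above (Sonnet input rate, 3,000-token tool definitions, a 10-round-trip loop) and models only the schema portion of the context, not the growing conversation.

```python
# Cost of re-reading tool definitions across an agent loop, with and
# without caching. Models the schema tokens only, not the rest of context.
INPUT_RATE = 3.00 / 1_000_000   # Sonnet, dollars per token
WRITE_MULT, READ_MULT = 1.25, 0.10

def tool_schema_cost(schema_tokens: int, round_trips: int, cached: bool) -> float:
    if not cached:
        return round_trips * schema_tokens * INPUT_RATE
    write = schema_tokens * INPUT_RATE * WRITE_MULT            # first trip
    reads = (round_trips - 1) * schema_tokens * INPUT_RATE * READ_MULT
    return write + reads

uncached = tool_schema_cost(3_000, 10, cached=False)  # $0.09
cached = tool_schema_cost(3_000, 10, cached=True)     # ~$0.019
print(uncached, cached)
```

The schema portion drops nearly 80% here; whole-loop savings land lower because the conversation history itself keeps growing.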
Vision: Images Are Tokens
Anthropic converts images to tokens at a rate that depends on image dimensions:
- Roughly (width × height) / 750 tokens
- Floor of ~1,600 tokens even for small images
- Very large images (4K+) can hit 4,000–5,000 tokens each
Real-world framing: a typical screenshot pasted into a conversation is ~1,200–2,000 tokens. A full-resolution photo from a phone is ~3,000+ tokens. At Sonnet input rates ($3/M), that's roughly a penny per image. At Opus rates ($15/M), it's five cents per image.
If you're running vision at volume — say, a product that analyzes 10,000 screenshots a day on Sonnet — that's:
10,000 × 1,500 tokens × $3/M = $45/day = ~$1,350/month in vision input alone.
Worth modeling before shipping.
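The volume math above generalizes to a one-liner worth running before launch. The 1,500-token figure is the assumed per-screenshot average from the example, not a fixed rate.

```python
# Reproduces the estimate above: 10,000 screenshots/day at an assumed
# ~1,500 tokens per image, Sonnet input rate ($3/M).
def vision_cost_per_day(images: int, tokens_per_image: int,
                        input_rate_per_m: float) -> float:
    return images * tokens_per_image * input_rate_per_m / 1_000_000

daily = vision_cost_per_day(10_000, 1_500, 3.00)   # $45.00/day
monthly = daily * 30                               # $1,350/month
print(daily, monthly)
```

Swap in the Opus rate ($15/M) and the same workload is $225/day.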
PDF Input: Pages Are Tokens (With A Multiplier)
PDFs are processed as a mix of text extraction + rasterized images for each page. Anthropic's effective rate lands around ~2,000–3,000 tokens per page for typical documents, sometimes higher for scanned PDFs with lots of imagery.
A 40-page research paper is roughly 100,000 tokens. At Sonnet rates that's 30 cents to have Claude read it once. At Opus rates it's $1.50. Summarize it 1,000 times across your user base and you're at $300–$1,500 just for document ingestion.
Takeaway: cache parsed PDFs. If 100 users are going to ask questions about the same document, the document itself should be cached after the first parse, not re-billed each time.
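The same back-of-envelope math applies to documents. This sketch uses a ~2,500 tokens/page midpoint of the range above; real counts vary with layout, and scanned PDFs run higher.

```python
# Rough PDF ingestion cost at an assumed ~2,500 tokens/page midpoint.
def pdf_read_cost(pages: int, input_rate_per_m: float,
                  tokens_per_page: int = 2_500) -> float:
    return pages * tokens_per_page * input_rate_per_m / 1_000_000

sonnet = pdf_read_cost(40, 3.00)     # $0.30 per read
opus = pdf_read_cost(40, 15.00)      # $1.50 per read
print(sonnet, opus, sonnet * 1_000)  # 1,000 Sonnet reads ~= $300
```

With the document cached after the first parse, those 1,000 reads cost roughly a tenth of that.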
The 1M Context Window: Priced Accordingly
Claude Sonnet's 1M-context variant is double the normal Sonnet rate: $6 input / $30 output per million tokens.
It's genuinely useful — whole codebases, entire knowledge bases, giant audit trails — but it's easy to burn through money if you default to it. Most prompts don't need 1M tokens of context. Use the standard context window unless you have a specific reason not to.
A real example: feeding a 400K-token codebase into Claude Sonnet 1M once is 400,000 × $6/M = $2.40. Do that on every call and an active user costs you $100+/day.
The Complete Cost Formula
If you want the full math for a single call:
```
call_cost =
    (cached_input_tokens × rate_cache_read)        // ~0.10× input rate
  + (new_cached_input_tokens × rate_cache_write)   // ~1.25× input rate
  + (uncached_input_tokens × rate_input)           // full input rate
  + (image_token_equivalent × rate_input)          // vision
  + (pdf_token_equivalent × rate_input)            // pdf
  + (output_tokens × rate_output)                  // full output rate
  + (tool_roundtrips − 1) × per_roundtrip_cost     // each extra call re-bills context
```
Apply × 0.5 if the whole thing went through the Batch API.
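As a runnable sketch, the formula looks like this. Rates are dollars per token; the cache multipliers are the ~0.10×/~1.25× figures from earlier. Tool round trips are not a separate term here because each round trip is simply another call through this same function.

```python
# The per-call formula as a function. Rates are dollars per token.
# Each tool round trip is just another call through this function.
def call_cost(*, rate_input: float, rate_output: float,
              cached_input: int = 0, new_cached_input: int = 0,
              uncached_input: int = 0, image_tokens: int = 0,
              pdf_tokens: int = 0, output_tokens: int = 0,
              batch: bool = False) -> float:
    cost = (cached_input * rate_input * 0.10         # cache read
            + new_cached_input * rate_input * 1.25   # cache write
            + (uncached_input + image_tokens + pdf_tokens) * rate_input
            + output_tokens * rate_output)
    return cost * 0.5 if batch else cost             # Batch API: 50% off

# Sonnet call: 10k cached prefix (hit), 2k fresh input, 800 output tokens.
# 0.003 + 0.006 + 0.012 = $0.021
print(call_cost(rate_input=3 / 1e6, rate_output=15 / 1e6,
                cached_input=10_000, uncached_input=2_000,
                output_tokens=800))
```

Run this over a day of representative traffic and you have a forecast instead of a surprise.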
It's a lot. That's why most teams don't actually model it and then get surprised by the invoice.
What Actually Shows Up On Your Invoice
Anthropic itemizes by model and by input/output/cache category. A typical monthly bill for a mid-size team might look like:
- $1,240 — Sonnet input (uncached)
- $180 — Sonnet input (cache read)
- $95 — Sonnet input (cache write)
- $2,900 — Sonnet output
- $420 — Haiku input + output (background jobs)
- $550 — Opus (rare hard tasks)
- $85 — Batch API (nightly summaries)
- Total: ~$5,470
If your invoice is imbalanced — say, output dwarfing everything else — that's a signal to shorten responses or use structured output. If uncached input dwarfs cache-read, you have caching left on the table. If Opus dominates, you're over-spec'd and Sonnet probably handles 80% of it.
The Alternative: A BYOK Savings Layer
If you've read this far, you probably want one of two things:
- Full control. Model your cost end-to-end, tune every knob, squeeze the invoice yourself. Work the levers above — caching, batch, model selection, output discipline.
- Less invoice. Don't want to become a token-optimization specialist. Just want the same Claude output for a fraction of the price.
If it's #2, that's what aiusage does. Same Claude, same Claude Code, same SDK — routed through proprietary infrastructure that delivers the same output while billing Anthropic ~20× less per call. You pay us a flat credit-pack fee ($10/15 runs, $25/50, $50/120), your Anthropic key stays in your account, no subscription, credits never expire. Most of the levers in this post still apply (caching is still worth it, model selection still matters) — we just make the underlying per-call Anthropic cost dramatically smaller.
If you want #1, skip us and work the list. Either way, you're going to get a smaller bill than the default.
Quick Reference (Save This)
- Opus: $15 in / $75 out. Use sparingly.
- Sonnet: $3 in / $15 out. The default.
- Sonnet 1M: $6 in / $30 out. Only when you need it.
- Haiku: $0.80 in / $4 out. Route as much as possible here.
- Prompt cache read: ~10% of input rate. 90% off.
- Prompt cache write: ~125% of input rate. Pay once, save forever.
- Batch API: 50% off. Anything that can wait.
- Tool use: each round trip re-reads context. Cache tool schemas.
- Vision: ~1,500–3,000 tokens per image.
- PDFs: ~2,000–3,000 tokens per page.
Print this. Tape it to your monitor. The next time your bill spikes, the answer is probably on the list.
Drop your Claude bill 20×.
Paste your key at aiusage.ai — takes 60 seconds. BYOK, credit packs from $10, credits never expire.
Get started →

Written by the team at aiusage.ai — the BYOK Claude proxy that makes your existing Anthropic account ~20× cheaper. See the math or grab a $10 pack to try it.