LLM Token Counter & Cost Calculator
Estimate tokens and API cost across GPT, Claude, Gemini, and Llama in real time. Runs entirely in your browser.
About these estimates
Token counts use a calibrated heuristic (chars-per-token derived from each vendor's published tokenizer behavior, with adjustments for code-dense and CJK content). For natural English prose the estimate is typically within ±5% of the true tokenizer count.
For exact counts, use the provider's tokenizer directly: OpenAI's tiktoken, Anthropic's count_tokens API, or Google's countTokens endpoint. Pricing reflects publicly listed rates as of May 2026 and may change; always verify on the vendor's official pricing page before making billing decisions.
How to Use
- Pick the model you're calling (GPT-5, Claude Opus 4.7, etc.).
- Paste your prompt or input text into the textarea.
- Watch token count and context-window usage update in real time.
- Set expected output tokens to estimate end-to-end API cost.
- Click Copy summary to share or paste cost estimates.
About the LLM Token Counter
LLM API calls are priced by the token, not by the character or word, and token counts are the most common source of unexpected bills. This calculator estimates how many tokens your prompt will use on each major model and how much a full call (input + output) will cost. It also shows how much of the model's context window your prompt consumes, so you can see when you're about to overflow it.
When you'd use this
- Cost estimation before shipping: Project monthly spend before turning on a feature that calls an LLM per user request.
- Model comparison: See what dropping from GPT-5 to GPT-4o (or Claude Opus 4.7 to Sonnet 4.6) saves per call. At the listed rates, Opus 4.7 to Sonnet 4.6 is a 5× cut on both input and output, while GPT-5 to GPT-4o is 2× on input and 1.5× on output.
- Context budgeting: RAG pipelines, long-document summarisation, and agentic tool-loop systems regularly hit context limits. The progress bar makes the constraint visible.
- Prompt engineering: Trim system prompts and few-shot examples while watching the live count drop.
How accuracy works here
Vendor tokenizers (OpenAI's tiktoken, Anthropic's tokenizer, Google's SentencePiece) are model-specific BPE or SentencePiece vocabularies and are too large to ship in a no-build static site. This tool uses a calibrated chars-per-token heuristic per model, adjusted for content type: natural language uses the vendor-published ratio, code uses 0.78× that ratio, and CJK text uses 0.4× that ratio. For English prose the estimate is typically within 5% of the exact count. For billing-critical decisions, always run a sample through the official API or tokenizer.
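The heuristic above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code. The base ratios in `BASE_CHARS_PER_TOKEN` and the thresholds in the hypothetical `classify` helper are placeholder assumptions. Only the 0.78× (code) and 0.4× (CJK) multipliers come from the description above.

```python
import math
import re

# Placeholder chars-per-token ratios for English prose (assumed values,
# not the tool's real calibration).
BASE_CHARS_PER_TOKEN = {"gpt-4o": 4.0, "claude-sonnet-4.6": 3.8}

# Multipliers on the ratio, per the description above: code and CJK text
# fit fewer characters into each token.
CONTENT_MULTIPLIER = {"prose": 1.0, "code": 0.78, "cjk": 0.4}

# Hiragana/katakana, CJK ideographs (incl. Extension A), and Hangul syllables.
CJK_RE = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")
CODE_CHARS = set("{}[]();=<>/\\*&|#$")


def classify(text):
    """Hypothetical content classifier; the 0.3 and 0.05 thresholds are guesses."""
    if not text:
        return "prose"
    if len(CJK_RE.findall(text)) / len(text) > 0.3:
        return "cjk"
    if sum(ch in CODE_CHARS for ch in text) / len(text) > 0.05:
        return "code"
    return "prose"


def estimate_tokens(text, model):
    """Characters divided by the adjusted chars-per-token ratio, rounded up."""
    ratio = BASE_CHARS_PER_TOKEN[model] * CONTENT_MULTIPLIER[classify(text)]
    return math.ceil(len(text) / ratio)


# 44 characters / 4.0 chars per token -> 11 with these placeholder ratios.
print(estimate_tokens("The quick brown fox jumps over the lazy dog.", "gpt-4o"))
```

Because the multipliers shrink the chars-per-token ratio, the same number of characters yields more estimated tokens for code and many more for CJK text.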
Pricing snapshot (May 2026, USD per 1M tokens)
- GPT-5: $5.00 in / $15.00 out
- GPT-4o: $2.50 in / $10.00 out
- GPT-4o mini: $0.15 in / $0.60 out
- Claude Opus 4.7: $15.00 in / $75.00 out
- Claude Sonnet 4.6: $3.00 in / $15.00 out
- Claude Haiku 4.5: $0.80 in / $4.00 out
- Gemini 3.1 Pro: $3.50 in / $10.50 out
- Gemini 1.5 Flash: $0.075 in / $0.30 out
- Llama 3.3 70B (hosted): $0.20 in / $0.60 out
Always verify against the vendor's official pricing page before committing to billing decisions.
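The snapshot can also be stored as a lookup table and used to rank models for a given call shape. Below is a minimal Python sketch. The 2,000-input / 500-output call is an arbitrary example, and `call_cost` and `rank_by_cost` are illustrative names. The prices are copied from the list above, so they inherit its May 2026 date.

```python
# USD per 1M tokens as (input, output), copied from the May 2026 snapshot above.
PRICES = {
    "GPT-5": (5.00, 15.00),
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Opus 4.7": (15.00, 75.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (0.80, 4.00),
    "Gemini 3.1 Pro": (3.50, 10.50),
    "Gemini 1.5 Flash": (0.075, 0.30),
    "Llama 3.3 70B (hosted)": (0.20, 0.60),
}


def call_cost(model, input_tokens, output_tokens):
    """USD cost of one call at the listed rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def rank_by_cost(input_tokens, output_tokens):
    """(model, cost) pairs sorted cheapest first."""
    costs = [(m, call_cost(m, input_tokens, output_tokens)) for m in PRICES]
    return sorted(costs, key=lambda pair: pair[1])


for model, cost in rank_by_cost(2_000, 500):
    print(f"{model:<24} ${cost:.4f}")
```

At these rates and this call shape, Gemini 1.5 Flash is cheapest at $0.0003 per call and Claude Opus 4.7 is most expensive at $0.0675.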
Frequently Asked Questions
How accurate is the token count?
The counter uses a calibrated heuristic (characters-per-token) tuned per model and adjusted for code-dense and CJK content. For natural English prose the estimate is typically within ±5% of the official tokenizer. For exact counts, use OpenAI's tiktoken, Anthropic's count_tokens API, or Google's countTokens endpoint.
Which models are supported?
GPT-5, GPT-4o, GPT-4o mini (OpenAI), Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 (Anthropic), Gemini 3.1 Pro, Gemini 1.5 Flash (Google), and Llama 3.3 70B (Meta, hosted-API price). New models are added as they release.
Are the prices up to date?
Pricing reflects publicly listed rates as of May 2026 in USD per million tokens. Vendors adjust prices regularly, so always verify on the official pricing page before billing-critical decisions.
How is API cost calculated?
Cost = (input tokens × input rate per million / 1,000,000) + (output tokens × output rate per million / 1,000,000). Output tokens are billed separately from prompt tokens, usually at a higher rate. Set the expected output tokens field for an end-to-end call estimate.
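The same formula can be written as a small Python function. The sketch below also adds a hypothetical `monthly_cost` helper for projecting spend before launch. That projection assumes steady traffic, fixed token counts per call, and a 30-day month, so real bills will vary.

```python
def api_cost(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """USD cost of one call; rates are USD per 1M tokens, billed separately."""
    input_cost = input_tokens * input_rate_per_m / 1_000_000
    output_cost = output_tokens * output_rate_per_m / 1_000_000
    return input_cost + output_cost


def monthly_cost(per_call_usd, calls_per_day, days=30):
    """Hypothetical projection assuming constant traffic and token counts."""
    return per_call_usd * calls_per_day * days


# Claude Sonnet 4.6 at $3.00 in / $15.00 out, 1,200 input and 400 output tokens:
# 1,200 * 3.00 / 1M = $0.0036 and 400 * 15.00 / 1M = $0.0060.
per_call = api_cost(1_200, 400, 3.00, 15.00)
print(f"per call: ${per_call:.4f}")  # $0.0096
print(f"per month at 10k calls/day: ${monthly_cost(per_call, 10_000):,.2f}")  # $2,880.00
```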
Why are output tokens billed at a different rate?
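Output (completion) tokens are generated autoregressively: the model produces one token at a time, and each new token needs its own forward pass. Input tokens, by contrast, are processed in parallel in a single pass, so each output token takes more compute time than an input token. For most modern models the output rate is 3–5× the input rate.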
Can I use this offline / is my data sent to OpenAI or Anthropic?
No data is sent anywhere. All counting and cost calculation runs entirely in your browser via JavaScript. Once the page is loaded it works offline. Your prompts never touch any server.
How do tokens map to words?
A rough rule of thumb: 1 token ≈ 0.75 English words, or 1 word ≈ 1.3 tokens. So 1,000 tokens ≈ 750 words ≈ 4 paragraphs. Code, JSON, and non-Latin scripts tokenize more densely (more tokens per character).
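The two rules of thumb express one ratio in opposite directions (1 / 0.75 ≈ 1.33). Here is a tiny Python sketch. `WORDS_PER_TOKEN` is the English-prose approximation above, and it does not hold for code, JSON, or non-Latin scripts.

```python
WORDS_PER_TOKEN = 0.75  # English prose only; rough rule of thumb


def tokens_to_words(tokens):
    """Approximate English word count for a token count."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words):
    """Approximate token count for an English word count."""
    return round(words / WORDS_PER_TOKEN)


print(tokens_to_words(1_000))  # 750
print(words_to_tokens(750))    # 1000
```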
What is a context window?
The context window is the maximum number of tokens (input + output combined) the model can process in a single request. GPT-4o is 128k, Claude Opus 4.7 is 1M, Gemini 3.1 Pro is 2M. The progress bar shows what fraction of the window your input alone consumes.
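Input and output share one window, so a request fits only if the prompt plus the maximum output you allow stays within the limit. Below is a minimal Python sketch. `context_usage` and `fits` are hypothetical helper names, and 128,000 stands in for GPT-4o's 128k window from the answer above.

```python
def context_usage(input_tokens, context_window):
    """Fraction of the window used by the input alone (what the progress bar shows)."""
    return input_tokens / context_window


def fits(input_tokens, max_output_tokens, context_window):
    """True if prompt plus requested output fits in one context window."""
    return input_tokens + max_output_tokens <= context_window


GPT_4O_WINDOW = 128_000

# A 120k-token prompt uses 93.75% of the window...
print(f"{context_usage(120_000, GPT_4O_WINDOW):.2%}")
# ...but overflows once 10k tokens of output are requested (prints False).
print(fits(120_000, 10_000, GPT_4O_WINDOW))
```

This is why a progress bar that tracks input alone can look safe while the full request still fails: leave headroom for the output you expect.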
Why do tokens differ between GPT and Claude for the same text?
Each model family uses its own subword tokenizer. OpenAI uses BPE encodings such as cl100k_base (GPT-4) and o200k_base (GPT-4o), Anthropic uses its own tokenizer, and Google uses SentencePiece. The same English sentence might be, for example, 18 tokens on GPT-4o and 16 on Claude Opus.