LLM Token Counter & Cost Calculator

Estimate tokens and API cost across GPT, Claude, Gemini, and Llama in real time. Runs entirely in your browser.

100% client-side · your data never leaves your browser

Model

Prompt / input text

Est. tokens

Characters

Words

Bytes (UTF-8)

Lines

Chars no spaces

Context window usage: 0 / 128,000 (0.00%)

Expected output tokens (for cost estimate)

APIs are billed for both prompt (input) and completion (output) tokens. A typical assistant reply is 200–800 tokens.

Input cost

0 tok × $2.50/M

Output cost

$0.0050

500 tok × $10/M

Total per call

$0.0050

GPT-4o

≈ $5.00 per 1,000 calls · $5,000.00 per 1M calls

About these estimates

Token counts use a calibrated heuristic (chars-per-token derived from each vendor's published tokenizer behavior, with adjustments for code-dense and CJK content). For natural English prose the estimate is typically within ±5% of the true tokenizer.

For exact counts, use the provider's tokenizer directly: OpenAI's tiktoken, Anthropic's count_tokens API, or Google's countTokens endpoint. Pricing reflects publicly listed rates as of May 2026 and may change — always verify on the vendor's official pricing page before billing decisions.

📖 How to Use

Pick the model you're calling (GPT-5, Claude Opus 4.7, etc.).
Paste your prompt or input text into the textarea.
Watch token count and context-window usage update in real time.
Set expected output tokens to estimate end-to-end API cost.
Click Copy summary to share or paste cost estimates.

About the LLM Token Counter

LLM APIs bill by token, not by character or word, and misjudged token counts are a common source of unexpected bills. This calculator estimates how many tokens your prompt uses on each major model and how much a full call (input + output) will cost. It also shows how much of the model's context window your prompt consumes, so you can spot when you're about to overflow.

When you'd use this

Cost estimation before shipping: Project monthly spend before turning on a feature that calls an LLM per user request.
Model comparison: See how much a call costs after dropping from GPT-5 to GPT-4o, or from Claude Opus 4.7 to Sonnet 4.6. At the rates in the pricing snapshot, the Opus-to-Sonnet switch cuts per-call cost by 5×.
Context budgeting: RAG pipelines, long-document summarisation, and agentic tool-loop systems regularly hit context limits. The progress bar makes the constraint visible.
Prompt engineering: Trim system prompts and few-shot examples while watching the live count drop.

How accuracy works here

Vendor tokenizers (OpenAI's tiktoken, Anthropic's tokenizer, Google's SentencePiece) are model-specific BPE/SentencePiece variants, and their vocabularies are too large to ship in a no-build static site. This tool instead uses a calibrated chars-per-token heuristic per model, adjusted for content type: natural language uses the vendor-published ratio, code about 0.78× that ratio, and CJK about 0.4× that ratio (fewer characters per token means more tokens for the same text). For English prose the estimate is typically within ±5% of the exact count. For billing-critical decisions, always run a sample through the official API or tokenizer.
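A minimal sketch of what such a heuristic could look like. Only the 0.78 and 0.4 multipliers come from the text above; the base ratio of 4.0 characters per token, the CJK code-point ranges, and the code-detection rule are illustrative assumptions, not the tool's actual calibration.

```python
import math

# Multipliers from the text above, applied to the chars-per-token ratio.
CODE_FACTOR = 0.78
CJK_FACTOR = 0.4


def is_cjk(ch: str) -> bool:
    """Rough CJK check over a few common ranges (assumed, not exhaustive)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul syllables
    )


def looks_like_code(text: str) -> bool:
    """Hypothetical rule: code if at least 5% of characters are code punctuation."""
    if not text:
        return False
    symbols = sum(text.count(c) for c in "{}[]();=<>")
    return symbols / len(text) >= 0.05


def estimate_tokens(text: str, base_chars_per_token: float = 4.0) -> int:
    """Estimate tokens from character counts; base ratio is an assumed default."""
    if not text:
        return 0
    cjk_chars = sum(1 for ch in text if is_cjk(ch))
    other_chars = len(text) - cjk_chars
    code_adjust = CODE_FACTOR if looks_like_code(text) else 1.0
    other_ratio = base_chars_per_token * code_adjust
    cjk_ratio = base_chars_per_token * CJK_FACTOR
    return math.ceil(other_chars / other_ratio + cjk_chars / cjk_ratio)
```

With these assumptions, code-heavy and CJK text yield more estimated tokens than English prose of the same length, which is the direction the adjustments are meant to capture.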

Pricing snapshot (May 2026, USD per 1M tokens)

GPT-5: $5.00 in / $15.00 out
GPT-4o: $2.50 in / $10.00 out
GPT-4o mini: $0.15 in / $0.60 out
Claude Opus 4.7: $15.00 in / $75.00 out
Claude Sonnet 4.6: $3.00 in / $15.00 out
Claude Haiku 4.5: $0.80 in / $4.00 out
Gemini 3.1 Pro: $3.50 in / $10.50 out
Gemini 1.5 Flash: $0.075 in / $0.30 out
Llama 3.3 70B (hosted): $0.20 in / $0.60 out

Always verify against the vendor's official pricing page before committing to billing decisions.
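As a rough illustration, the snapshot above can be turned into a lookup table to rank models for a fixed workload. The rates are copied from the list above; the function names and the 2,000-in / 500-out workload are hypothetical choices for the example.

```python
# USD per 1M tokens as (input, output), copied from the snapshot above.
PRICING = {
    "GPT-5": (5.00, 15.00),
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Claude Opus 4.7": (15.00, 75.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (0.80, 4.00),
    "Gemini 3.1 Pro": (3.50, 10.50),
    "Gemini 1.5 Flash": (0.075, 0.30),
    "Llama 3.3 70B (hosted)": (0.20, 0.60),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call at the snapshot rates."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def rank_models(input_tokens: int, output_tokens: int) -> list:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {m: call_cost(m, input_tokens, output_tokens) for m in PRICING}
    return sorted(costs.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Example workload (assumed): 2,000 prompt tokens, 500 completion tokens.
    for model, cost in rank_models(2_000, 500):
        print(f"{model:<24} ${cost:.6f}")
```

Because the table is a snapshot, anything built on it should be refreshed from the vendors' pricing pages rather than treated as current.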

❓ Frequently Asked Questions

How accurate is the token count?

The counter uses a calibrated heuristic (characters-per-token) tuned per model and adjusted for code-dense and CJK content. For natural English prose the estimate is typically within ±5% of the official tokenizer. For exact counts, use OpenAI's tiktoken, Anthropic's count_tokens API, or Google's countTokens endpoint.

Which models are supported?

GPT-5, GPT-4o, GPT-4o mini (OpenAI), Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 (Anthropic), Gemini 3.1 Pro, Gemini 1.5 Flash (Google), and Llama 3.3 70B (Meta, hosted-API price). New models are added as they release.

Are the prices up to date?

Pricing reflects publicly listed rates as of May 2026 in USD per million tokens. Vendors adjust prices regularly — always verify on the official pricing page before billing-critical decisions.

How is API cost calculated?

Cost = (input tokens × input rate per million / 1,000,000) + (output tokens × output rate per million / 1,000,000). Output tokens are billed separately from prompt tokens, usually at a higher rate. Set the expected output tokens field for an end-to-end call estimate.
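The formula above, written out as a small function and applied to the calculator's default view (0 input tokens, 500 output tokens, GPT-4o rates from the pricing snapshot). The function name `api_cost` is a hypothetical label for this sketch.

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost in USD of one call; rates are USD per 1M tokens."""
    input_cost = input_tokens * input_rate_per_m / 1_000_000
    output_cost = output_tokens * output_rate_per_m / 1_000_000
    return input_cost + output_cost


# GPT-4o: $2.50 in / $10.00 out per 1M tokens.
per_call = api_cost(0, 500, 2.50, 10.00)
print(f"${per_call:.4f} per call")                 # $0.0050 per call
print(f"${per_call * 1_000:.2f} per 1,000 calls")  # $5.00 per 1,000 calls
```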

Why are output tokens billed at a different rate?

Output (completion) generation is autoregressive and serial — the model produces one token at a time, which costs more compute per token than batched input processing. For most modern models output is 3–5× the input rate.

Can I use this offline / is my data sent to OpenAI or Anthropic?

Yes, it works offline once the page has loaded, and no data is sent anywhere. All counting and cost calculation runs entirely in your browser via JavaScript, so your prompts never touch any server.

How do tokens map to words?

A rough rule of thumb for English: 1 token ≈ 0.75 words, or 1 word ≈ 1.3 tokens. So 1,000 tokens ≈ 750 words, roughly four long paragraphs. Code, JSON, and non-Latin scripts tokenize less efficiently (more tokens per character).
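The rule of thumb translates into two one-line conversions. This is a sketch for English prose only; the 0.75 ratio comes from the text above, and the function names are illustrative.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English prose, from the text above


def tokens_to_words(tokens: int) -> int:
    """Approximate word count for a given token count."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Approximate token count for a given word count."""
    return round(words / WORDS_PER_TOKEN)
```

For example, `tokens_to_words(1000)` gives 750, and `words_to_tokens(750)` gives 1000.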

What is a context window?

The context window is the maximum number of tokens (input + output combined) the model can process in a single request. GPT-4o's window is 128k tokens, Claude Opus 4.7's is 1M, and Gemini 3.1 Pro's is 2M. The progress bar shows what fraction of the window your input alone consumes, so leave headroom for the expected output.
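A short sketch of the context-budget check, assuming the 128,000-token GPT-4o window quoted above. The function names and the returned field names are hypothetical; the key point is that input and expected output together must fit in the window.

```python
def context_usage(input_tokens: int, expected_output_tokens: int,
                  context_window: int) -> dict:
    """Summarise how much of the context window a request would consume."""
    total = input_tokens + expected_output_tokens
    return {
        "input_fraction": input_tokens / context_window,
        "remaining_after_input": context_window - input_tokens,
        "fits": total <= context_window,  # input + output must both fit
    }


def format_usage(input_tokens: int, context_window: int) -> str:
    """Render usage in the 'used / window (percent)' style of a progress label."""
    pct = 100 * input_tokens / context_window
    return f"{input_tokens:,} / {context_window:,} ({pct:.2f}%)"


print(format_usage(96_000, 128_000))  # 96,000 / 128,000 (75.00%)
print(context_usage(127_800, 500, 128_000)["fits"])  # False
```

The second call shows the case the progress bar alone can miss: the prompt fits, but the prompt plus a 500-token reply does not.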

Why do tokens differ between GPT and Claude for the same text?

Each model family uses its own tokenizer, with its own vocabulary. OpenAI uses cl100k_base or o200k_base depending on the model (GPT-4o uses o200k_base), Anthropic uses a custom tokenizer, and Google uses SentencePiece. The same English sentence might be 18 tokens for GPT-4o and 16 for Claude Opus.