AI Token Counter: How to Count Tokens and Estimate API Costs
Learn what tokens are, how to count them for GPT, Claude, and Gemini, and estimate API costs before you hit send.
Every API call to GPT, Claude, or Gemini costs money. The price depends on tokens. Understanding tokens helps you estimate costs, stay within limits, and optimize your prompts.
This guide explains what tokens are, how to count them, and how to reduce costs.
What is a Token?
Tokens are pieces of text that AI models process. A token can be:
- A complete word ("hello" = 1 token)
- Part of a word ("unbelievable" = 3 tokens: "un", "believ", "able")
- Punctuation ("!" = 1 token)
- A space (sometimes included with adjacent words)
Token Estimation Rules
For English text:
- 1 token ≈ 4 characters
- 1 token ≈ 0.75 words
- 100 tokens ≈ 75 words
These are rough estimates. Actual counts depend on the specific tokenizer each model uses.
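These rules of thumb are easy to turn into code. Here is a minimal sketch that averages the two heuristics (the function name and the blending choice are illustrative, not a real tokenizer):

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough token estimate: averages the chars-based and words-based heuristics."""
    by_chars = len(text) / 4             # 1 token ≈ 4 characters
    by_words = len(text.split()) / 0.75  # 1 token ≈ 0.75 words
    return math.ceil((by_chars + by_words) / 2)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # → 12
```

Real tokenizers will differ by a token or two either way; use this only for ballpark budgeting.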
Example Token Counts
| Text | Approximate Tokens |
|---|---|
| "Hello" | 1 |
| "Hello, world!" | 4 |
| "The quick brown fox jumps over the lazy dog." | 10 |
| A 500-word blog post | ~650 tokens |
| A 2000-word article | ~2600 tokens |
Why Token Counts Matter
1. API Costs
AI APIs charge per token. For example, GPT-5 costs $1.25 per million input tokens and $10 per million output tokens.
A simple calculation:
- Your prompt: 500 tokens
- AI response: 300 tokens
- Cost: (500 × $1.25 + 300 × $10.00) / 1,000,000 = $0.0036
That's less than half a cent. But at scale, costs add up: 1,000 requests per day ≈ $3.63/day ≈ $109/month.
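That arithmetic generalizes to a small helper. A sketch, using the GPT-5 rates quoted above (prices drift, so pass them in rather than hardcoding):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one API call, given per-million-token prices."""
    return (input_tokens * input_price_per_m +
            output_tokens * output_price_per_m) / 1_000_000

# GPT-5 rates from above: $1.25 input, $10.00 output per 1M tokens
cost = request_cost(500, 300, 1.25, 10.00)
print(f"${cost:.4f} per call")  # → $0.0036 per call
```

Multiply by your daily request volume to project a monthly bill before you ship.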
2. Context Window Limits
Each model has a maximum context window:
| Model | Context Window |
|---|---|
| GPT-5 | 400,000 tokens |
| Claude Sonnet 4.5 | 1,000,000 tokens |
| Gemini 3 Pro | 1,048,576 tokens |
| Grok 4.1 Fast | 2,000,000 tokens |
| Claude Haiku 4.5 | 200,000 tokens |
Your input + output must fit within this limit. For long documents or conversations, you need to track token usage.
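A simple guard before sending a request can catch overruns early. A sketch using the window sizes from the table above (the dictionary keys and function name are illustrative):

```python
# Context windows from the table above (tokens)
CONTEXT_WINDOWS = {
    "gpt-5": 400_000,
    "claude-sonnet-4.5": 1_000_000,
    "claude-haiku-4.5": 200_000,
}

def fits_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if the prompt plus the planned output fit in the model's window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]

print(fits_context("claude-haiku-4.5", 150_000, 60_000))  # → False (over 200k)
```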
3. Response Quality
Longer context doesn't always mean better responses. Very long prompts can dilute the model's focus. Sometimes a concise, well-structured prompt works better than a verbose one.
Current AI Model Pricing (January 2026)
Pricing changes frequently. Here are current rates for popular models:
OpenAI
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| GPT-5 | $1.25 | $10.00 |
Anthropic
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
Google
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Gemini 3 Pro | $2.00 | $12.00 |
| Gemini 3 Flash | $0.075 | $0.30 |
xAI
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Grok 4.1 Fast | $0.20 | $0.50 |
How to Count Tokens
Option 1: Online Tool
Use our Token Counter to instantly count tokens for any text. Select your model, paste your text, and see the token count, cost estimate, and context usage.
All processing happens in your browser. Your text stays private.
Option 2: OpenAI Tokenizer
OpenAI provides a tokenizer library:
import tiktoken

# "gpt-4" maps to the cl100k_base encoding; newer models
# (gpt-4o and later) use tiktoken.get_encoding("o200k_base")
encoder = tiktoken.encoding_for_model("gpt-4")
tokens = encoder.encode("Your text here")
print(f"Token count: {len(tokens)}")
Option 3: Anthropic API
Claude uses a different tokenizer. You can estimate or use their API to get exact counts:
import anthropic

client = anthropic.Anthropic()
# The SDK's token-counting endpoint takes the same message format as a request
response = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Your text here"}],
)
print(response.input_tokens)
Option 4: Quick Estimation
For rough estimates without tools:
- Count words in your text
- Multiply by 1.3
- Round up
500 words × 1.3 = 650 tokens (approximate)
Reducing Token Usage and Costs
1. Write Concise Prompts
Bad (verbose):
I would really appreciate it if you could help me with the following
task. What I need is for you to write a summary of the article that
I am going to paste below. The summary should be relatively brief,
maybe around 3 paragraphs or so.
Good (concise):
Summarize this article in 3 paragraphs:
Same intent, roughly 90% fewer tokens.
2. Use System Prompts Efficiently
System prompts are sent with every request. Keep them lean:
Bad:
You are a helpful assistant that always provides accurate, detailed,
and well-researched answers. You should be polite, professional, and
thorough in all your responses...
Good:
You are a technical writer. Be concise and accurate.
3. Limit Output Length
Ask for specific formats:
- "Answer in 2 sentences"
- "List 5 items"
- "Respond in 100 words or less"
This reduces output tokens, which are typically more expensive than input tokens.
4. Use Cheaper Models for Simple Tasks
Not every task needs GPT-5 or Claude Sonnet 4.5. For simple classification, summarization, or formatting:
- Use GPT-4o Mini ($0.15/1M input vs $1.25/1M for GPT-5)
- Use Gemini 3 Flash ($0.075/1M input vs $2.00/1M for Gemini 3 Pro)
That's roughly 8-27x cheaper for tasks that don't need top-tier reasoning.
5. Cache Repeated Content
If you're sending the same context repeatedly (like documentation or examples), consider:
- Caching at the application level
- Using "prompt caching" features (available in some APIs)
- Pre-processing to extract only relevant sections
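Application-level caching can be as simple as keying responses by a hash of the full prompt. A minimal in-memory sketch (real deployments would add TTLs and a shared store such as Redis; `call_api` stands in for your actual client call):

```python
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_api) -> str:
    """Return a cached response when this exact prompt was seen before."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(prompt)  # only pay for tokens on a cache miss
    return _cache[key]

# Demo with a stand-in for the real API call
calls = []
def fake_api(prompt):
    calls.append(prompt)
    return prompt.upper()

cached_completion("summarize this", fake_api)
cached_completion("summarize this", fake_api)
print(len(calls))  # → 1: the second request never hit the "API"
```

Exact-match caching only helps when prompts repeat verbatim, which is why keeping system prompts stable pays off twice.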
Tracking Usage at Scale
For production applications:
Set Up Monitoring
Track tokens per request, daily usage, and costs by endpoint. Most API clients provide token counts in responses.
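Most SDKs report token counts in a usage field on each response, and a tiny accumulator can roll these into per-endpoint totals. A sketch (class and field names are illustrative; rates here are the GPT-5 figures from the pricing table):

```python
from collections import defaultdict

class UsageTracker:
    """Accumulates token counts and dollar cost per endpoint."""
    def __init__(self, input_price_per_m: float, output_price_per_m: float):
        self.tokens = defaultdict(int)
        self.cost = 0.0
        self.in_price = input_price_per_m
        self.out_price = output_price_per_m

    def record(self, endpoint: str, input_tokens: int, output_tokens: int):
        self.tokens[endpoint] += input_tokens + output_tokens
        self.cost += (input_tokens * self.in_price +
                      output_tokens * self.out_price) / 1_000_000

tracker = UsageTracker(1.25, 10.00)  # GPT-5 rates from the pricing table
tracker.record("/summarize", 500, 300)
tracker.record("/summarize", 450, 280)
print(tracker.tokens["/summarize"], round(tracker.cost, 4))  # → 1530 0.007
```

Logging these numbers per request is usually enough to spot which endpoint deserves optimization first.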
Set Budgets and Alerts
OpenAI and Anthropic dashboards let you set spending limits. Configure alerts before you hit them.
Optimize High-Volume Endpoints
Identify which API calls use the most tokens. Focus optimization efforts there first.
Try It Now
Use our free Token Counter to count tokens before making API calls. Estimate costs, check context limits, and optimize your prompts.
Building prompts? Try our Prompt Optimizer to improve your prompts for better results. For hands-on techniques, read our Prompt Engineering guide.
Choosing between models? See our Best AI Models for Coding in 2026 for a cost-vs-quality breakdown.