AI Token Counter: How to Count Tokens and Estimate API Costs
Learn what tokens are, how to count them for GPT, Claude, and Gemini, and estimate API costs before you hit send.
Every API call to GPT, Claude, or Gemini costs money. The price depends on tokens. Understanding tokens helps you estimate costs, stay within limits, and optimize your prompts.
This guide explains what tokens are, how to count them, and how to reduce costs.
What is a Token?
Tokens are pieces of text that AI models process. A token can be:
- A complete word ("hello" = 1 token)
- Part of a word ("unbelievable" = 3 tokens: "un", "believ", "able")
- Punctuation ("!" = 1 token)
- A space (sometimes included with adjacent words)
Token Estimation Rules
For English text:
- 1 token ≈ 4 characters
- 1 token ≈ 0.75 words
- 100 tokens ≈ 75 words
These are rough estimates. Actual counts depend on the specific tokenizer each model uses.
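These rules of thumb are easy to turn into code. Here is a minimal sketch that averages the two heuristics (the function name and the blending choice are illustrative, not a real tokenizer):

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough token estimate: averages the chars-based and words-based heuristics."""
    by_chars = len(text) / 4             # 1 token ≈ 4 characters
    by_words = len(text.split()) / 0.75  # 1 token ≈ 0.75 words
    return math.ceil((by_chars + by_words) / 2)

print(estimate_tokens("The quick brown fox jumps over the lazy dog."))  # → 12
```

Real tokenizers will differ by a token or two either way; use this only for ballpark budgeting.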
Example Token Counts
| Text | Approximate Tokens |
|---|---|
| "Hello" | 1 |
| "Hello, world!" | 4 |
| "The quick brown fox jumps over the lazy dog." | 10 |
| A 500-word blog post | ~650 tokens |
| A 2000-word article | ~2600 tokens |
Why Token Counts Matter
1. API Costs
AI APIs charge per token. For example, GPT-5 costs $1.25 per million input tokens and $10 per million output tokens.
A simple calculation:
- Your prompt: 500 tokens
- AI response: 300 tokens
- Cost: (500 × $1.25 + 300 × $10.00) / 1,000,000 = $0.0036
That's less than half a cent. But at scale, costs add up: 1,000 requests per day ≈ $3.63/day ≈ $109/month.
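That arithmetic generalizes to a small helper. A sketch, using the GPT-5 rates quoted above (prices drift, so pass them in rather than hardcoding):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one API call, given per-million-token prices."""
    return (input_tokens * input_price_per_m +
            output_tokens * output_price_per_m) / 1_000_000

# GPT-5 rates from above: $1.25 input, $10.00 output per 1M tokens
cost = request_cost(500, 300, 1.25, 10.00)
print(f"${cost:.4f} per call")  # → $0.0036 per call
```

Multiply by your daily request volume to project a monthly bill before you ship.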
2. Context Window Limits
Each model has a maximum context window:
| Model | Context Window |
|---|---|
| GPT-5 | 400,000 tokens |
| Claude Sonnet 4.5 | 1,000,000 tokens |
| Gemini 3 Pro | 1,048,576 tokens |
| Grok 4.1 Fast | 2,000,000 tokens |
| Claude Haiku 4.5 | 200,000 tokens |
Your input + output must fit within this limit. For long documents or conversations, you need to track token usage.
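A simple guard before sending a request can catch overruns early. A sketch using the window sizes from the table above (the dictionary keys and function name are illustrative):

```python
# Context windows from the table above (tokens)
CONTEXT_WINDOWS = {
    "gpt-5": 400_000,
    "claude-sonnet-4.5": 1_000_000,
    "claude-haiku-4.5": 200_000,
}

def fits_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if the prompt plus the planned output fit in the model's window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]

print(fits_context("claude-haiku-4.5", 150_000, 60_000))  # → False (over 200k)
```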
3. Response Quality
Longer context doesn't always mean better responses. Very long prompts can dilute the model's focus. Sometimes a concise, well-structured prompt works better than a verbose one.
Current AI Model Pricing (January 2026)
Pricing changes frequently. Here are current rates for popular models:
OpenAI
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| GPT-5 | $1.25 | $10.00 |
Anthropic
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
Google
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Gemini 3 Pro | $2.00 | $12.00 |
| Gemini 3 Flash | $0.075 | $0.30 |
xAI
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Grok 4.1 Fast | $0.20 | $0.50 |
How to Count Tokens
Option 1: Online Tool
Use our Token Counter to instantly count tokens for any text. Select your model, paste your text, and see the token count, cost estimate, and context usage.
All processing happens in your browser. Your text stays private.
Option 2: OpenAI Tokenizer
OpenAI provides a tokenizer library:
import tiktoken

# "gpt-4" maps to the cl100k_base encoding; newer models
# (gpt-4o and later) use tiktoken.get_encoding("o200k_base")
encoder = tiktoken.encoding_for_model("gpt-4")
tokens = encoder.encode("Your text here")
print(f"Token count: {len(tokens)}")
Option 3: Anthropic API
Claude uses a different tokenizer. You can estimate or use their API to get exact counts:
import anthropic

client = anthropic.Anthropic()
# The SDK's token-counting endpoint takes the same message format as a request
response = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Your text here"}],
)
print(response.input_tokens)
Option 4: Quick Estimation
For rough estimates without tools:
- Count words in your text
- Multiply by 1.3
- Round up
500 words × 1.3 = 650 tokens (approximate)
Reducing Token Usage and Costs
1. Write Concise Prompts
Bad (verbose):
I would really appreciate it if you could help me with the following
task. What I need is for you to write a summary of the article that
I am going to paste below. The summary should be relatively brief,
maybe around 3 paragraphs or so.
Good (concise):
Summarize this article in 3 paragraphs:
Same intent, roughly 90% fewer tokens.
2. Use System Prompts Efficiently
System prompts are sent with every request. Keep them lean:
Bad:
You are a helpful assistant that always provides accurate, detailed,
and well-researched answers. You should be polite, professional, and
thorough in all your responses...
Good:
You are a technical writer. Be concise and accurate.
3. Limit Output Length
Ask for specific formats:
- "Answer in 2 sentences"
- "List 5 items"
- "Respond in 100 words or less"
This reduces output tokens, which are typically more expensive than input tokens.
4. Use Cheaper Models for Simple Tasks
Not every task needs GPT-5 or Claude Sonnet 4.5. For simple classification, summarization, or formatting:
- Use GPT-4o Mini ($0.15/1M input vs $1.25/1M for GPT-5)
- Use Gemini 3 Flash ($0.075/1M input vs $2.00/1M for Gemini 3 Pro)
That's roughly 8-27x cheaper for tasks that don't need top-tier reasoning.
5. Cache Repeated Content
If you're sending the same context repeatedly (like documentation or examples), consider:
- Caching at the application level
- Using "prompt caching" features (available in some APIs)
- Pre-processing to extract only relevant sections
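Application-level caching can be as simple as keying responses by a hash of the full prompt. A minimal in-memory sketch (real deployments would add TTLs and a shared store such as Redis; `call_api` stands in for your actual client call):

```python
import hashlib

_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_api) -> str:
    """Return a cached response when this exact prompt was seen before."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_api(prompt)  # only pay for tokens on a cache miss
    return _cache[key]

# Demo with a stand-in for the real API call
calls = []
def fake_api(prompt):
    calls.append(prompt)
    return prompt.upper()

cached_completion("summarize this", fake_api)
cached_completion("summarize this", fake_api)
print(len(calls))  # → 1: the second request never hit the "API"
```

Exact-match caching only helps when prompts repeat verbatim, which is why keeping system prompts stable pays off twice.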
Tracking Usage at Scale
For production applications:
Set Up Monitoring
Track tokens per request, daily usage, and costs by endpoint. Most API clients provide token counts in responses.
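Most SDKs report token counts in a usage field on each response, and a tiny accumulator can roll these into per-endpoint totals. A sketch (class and field names are illustrative; rates here are the GPT-5 figures from the pricing table):

```python
from collections import defaultdict

class UsageTracker:
    """Accumulates token counts and dollar cost per endpoint."""
    def __init__(self, input_price_per_m: float, output_price_per_m: float):
        self.tokens = defaultdict(int)
        self.cost = 0.0
        self.in_price = input_price_per_m
        self.out_price = output_price_per_m

    def record(self, endpoint: str, input_tokens: int, output_tokens: int):
        self.tokens[endpoint] += input_tokens + output_tokens
        self.cost += (input_tokens * self.in_price +
                      output_tokens * self.out_price) / 1_000_000

tracker = UsageTracker(1.25, 10.00)  # GPT-5 rates from the pricing table
tracker.record("/summarize", 500, 300)
tracker.record("/summarize", 450, 280)
print(tracker.tokens["/summarize"], round(tracker.cost, 4))  # → 1530 0.007
```

Logging these numbers per request is usually enough to spot which endpoint deserves optimization first.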
Set Budgets and Alerts
OpenAI and Anthropic dashboards let you set spending limits. Configure alerts before you hit them.
Optimize High-Volume Endpoints
Identify which API calls use the most tokens. Focus optimization efforts there first.
Try It Now
Use our free Token Counter to count tokens before making API calls. Estimate costs, check context limits, and optimize your prompts.
Building prompts? Try our Prompt Optimizer to improve your prompts for better results. For hands-on techniques, read our Prompt Engineering guide.
Choosing between models? See our Best AI Models for Coding in 2026 for a cost-vs-quality breakdown.