# What Is a Token?
If you’ve worked with AI APIs, you’ve seen charges based on “tokens” rather than words or characters. But what exactly is a token?
A token is the basic unit of text that a language model processes. Think of it as a piece of a word — not quite a letter, not quite a word, but something in between.
## Token Examples

Let's look at how a few sample strings get broken down:
| Text | Tokens | Token Count |
|---|---|---|
| "Hello" | ["Hello"] | 1 |
| "Hello, world!" | ["Hello", ",", " world", "!"] | 4 |
| "tokenization" | ["token", "ization"] | 2 |
| "AI is amazing" | ["AI", " is", " amazing"] | 3 |
| "🚀" | ["🚀"] | 1 |
| "Привет" (Russian) | ["При", "вет"] | 2 |
## The 4-Character Rule of Thumb
For English text, the most common approximation is:
1 token ≈ 4 characters ≈ 0.75 words
Or equivalently: 750 words ≈ 1,000 tokens
This works well for typical English prose. It breaks down for:
- Code (often more tokens per character due to special symbols)
- Non-English languages (often more tokens per word)
- Technical content with many rare words
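The rule of thumb above translates directly into a quick estimator. This is a rough heuristic only — real token counts come from the model's tokenizer — and the helper names here are illustrative:

```python
def estimate_tokens_from_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (English prose heuristic)."""
    return max(1, round(len(text) / chars_per_token))

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate from word count: ~0.75 words per token."""
    return round(word_count / 0.75)

# 750 words of English prose is roughly 1,000 tokens:
print(estimate_tokens_from_words(750))  # 1000
```

For code or non-English text, expect the character-based estimate to undercount, as noted above.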
## Different Models, Different Tokenizers
Each AI model family uses its own tokenizer:
| Model Family | Tokenizer | Notes |
|---|---|---|
| GPT-4, GPT-3.5 | cl100k_base | 100K vocabulary |
| GPT-4o | o200k_base | 200K vocabulary, more efficient |
| Claude | Custom | Similar to cl100k |
| Gemini | SentencePiece | Different vocabulary |
| Llama | BPE | Open-source, inspectable |
For this reason, the same prompt can have different token counts on different models.
## Why Token Counts Matter for Costs
Every API call costs money based on:
- Input tokens — everything you send (system prompt + conversation history + user message)
- Output tokens — everything the model generates
Output tokens typically cost 3-5x as much as input tokens (4x for GPT-4o), so keeping responses concise matters.
### Example Cost Calculation
Say you’re using GPT-4o ($2.50/1M input, $10.00/1M output):
- System prompt: 200 tokens
- User message: 150 tokens
- Model response: 400 tokens
Cost = (350 input × $0.0000025) + (400 output × $0.00001)
= $0.000875 + $0.004 = $0.004875 per request
At 10,000 requests/day: $48.75/day
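The arithmetic above can be wrapped in a small helper. The default prices are the GPT-4o rates quoted above — treat them as a snapshot that will drift over time:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 2.50,
                 output_price_per_m: float = 10.00) -> float:
    """Cost in dollars for one request, given per-million-token prices."""
    return (input_tokens * input_price_per_m / 1_000_000
            + output_tokens * output_price_per_m / 1_000_000)

# System prompt (200) + user message (150) in, 400 tokens out:
cost = request_cost(input_tokens=200 + 150, output_tokens=400)
print(f"${cost:.6f} per request")        # $0.004875 per request
print(f"${cost * 10_000:.2f} per day")   # $48.75 at 10,000 requests/day
```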
## Context Windows
Every model has a maximum context window — the total number of tokens it can process in a single request (input + output combined):
| Model | Context Window |
|---|---|
| GPT-4o | 128K tokens |
| Claude 3.5 Sonnet | 200K tokens |
| Gemini 1.5 Pro | 2M tokens |
| GPT-3.5 Turbo | 16K tokens |
| Llama 3.1 | 128K tokens |
Exceeding the context window causes errors. For chat applications, you need to truncate or summarize history to stay within limits.
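A common truncation strategy is to drop the oldest messages until the history fits a token budget. Here is a minimal sketch using the 4-characters-per-token estimate from earlier; a real implementation should count with the model's actual tokenizer and usually keeps the system prompt pinned:

```python
def truncate_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the most recent messages whose estimated total fits the budget."""
    def est(msg: dict) -> int:
        # Rough 4-chars-per-token heuristic, not a real tokenizer.
        return max(1, len(msg["content"]) // 4)

    kept: list[dict] = []
    total = 0
    for msg in reversed(messages):          # walk newest-first
        cost = est(msg)
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))             # restore chronological order

history = [
    {"role": "user", "content": "x" * 400},       # ~100 tokens
    {"role": "assistant", "content": "y" * 400},  # ~100 tokens
    {"role": "user", "content": "z" * 40},        # ~10 tokens
]
print(len(truncate_history(history, max_tokens=120)))  # 2: oldest dropped
```

Summarizing dropped messages instead of discarding them preserves more context at the cost of an extra model call.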
## How to Count Tokens Before Sending

### Python (OpenAI models)

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4o")
tokens = enc.encode("Hello, world!")
print(len(tokens))  # 4
```
### Estimate (all models)

```python
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough approximation
```
### Use TokenCalc
The easiest way: just paste your text into our Token Calculator and see the estimate instantly.
## Practical Tips
- Measure your actual prompts with tiktoken before estimating costs
- Track token usage from API responses — major providers return actual input/output counts (e.g., in a `usage` field)
- Set max_tokens to limit output length and prevent runaway costs
- Compress conversation history for chat apps to avoid hitting context limits
- Non-English content uses more tokens — factor in 1.5-3x multiplier for languages like Chinese, Japanese, or Arabic
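The last tip can be folded into the estimator as a per-language multiplier. The multiplier values below are illustrative assumptions drawn from the 1.5-3x range mentioned above, not measured figures:

```python
# Illustrative multipliers (assumed, per the 1.5-3x range above).
LANGUAGE_MULTIPLIER = {"en": 1.0, "zh": 2.5, "ja": 2.5, "ar": 2.0}

def estimate_tokens(text: str, language: str = "en") -> int:
    """Char-based estimate scaled by a rough language multiplier."""
    base = max(1, len(text) // 4)
    return round(base * LANGUAGE_MULTIPLIER.get(language, 1.5))

def fits_context(text: str, context_window: int, max_output: int,
                 language: str = "en") -> bool:
    """Check that the input estimate leaves room for the output budget."""
    return estimate_tokens(text, language) + max_output <= context_window
```

Measuring a sample of real traffic per language gives far better multipliers than these defaults.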
Understanding tokens is fundamental to building efficient, cost-optimized AI applications. Use our calculator to model costs before you build, not after you get the bill.