
Tokens

A token is the smallest unit of text an AI model processes: typically a short word, part of a longer word, or a punctuation mark. Tokens drive both the bill on every API call and the limits of what a model can handle at once.

What is a token?

A token is the smallest unit of text an AI model works with. For the model, language is not made of words or letters but of tokens. A tokeniser first turns your input into a sequence of these units, and the model reasons only at that level. Every answer is generated token by token too.

In English a token averages about four characters, roughly three-quarters of a word. A common word such as cat is a single token, while a rarer word like unreliability breaks into several smaller pieces. Punctuation, spaces, and line breaks count as tokens too. For Chinese, Arabic, or Japanese, a single character often maps to one or more tokens, so the same content can cost noticeably more tokens than in English.

Picture tokens as the puzzle pieces a model uses to chop up language. They are not random: the tokeniser is trained so that frequent words and character sequences stay as one piece and rare words split into smaller building blocks.
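
For a concrete feel, here is a minimal sketch using OpenAI's open-source tiktoken library; the exact pieces and counts depend on which encoding (and therefore which model) you pick.

    import tiktoken  # pip install tiktoken

    # cl100k_base is the encoding used by several recent OpenAI models.
    enc = tiktoken.get_encoding("cl100k_base")

    text = "Unreliability is hard to tokenise."
    token_ids = enc.encode(text)

    print(len(token_ids))                        # number of tokens
    print(token_ids)                             # the integer ids the model actually sees
    print([enc.decode([t]) for t in token_ids])  # the individual text pieces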

Why tokens instead of words?

A word-based approach does not work well for models that must handle many languages, accept new words, and stay memory-efficient. A tokeniser solves that with three properties:

  • Common words stay whole. Everyday words such as house, company, or the are a single token each.

  • Rare or compound words get broken up. That way the model can still understand unknown words by composing them from familiar pieces.

  • Multilinguality is baked in. A well-trained tokeniser handles English, Dutch, code, and emoji through the same vocabulary.

The common methods are Byte Pair Encoding (BPE), WordPiece, and SentencePiece. OpenAI, Anthropic, and Google each use a variant, which is why the same text ends up with slightly different token counts across models.
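
To see those differences in practice, a small sketch with tiktoken; the encoding names below are OpenAI's, and other providers ship their own tokenisers with their own counts.

    import tiktoken

    text = "Tokenisation differs per model, especially for rare words and emoji."

    # Three OpenAI encodings of different generations; the counts will differ slightly.
    for name in ["p50k_base", "cl100k_base", "o200k_base"]:  # o200k_base needs a recent tiktoken
        enc = tiktoken.get_encoding(name)
        print(name, len(enc.encode(text)))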

Rules of thumb for estimating tokens

  • 100 tokens is roughly 75 words of English or Dutch.

  • 1,000 tokens is roughly 750 words, or one to two A4 pages in a normal font.

  • 1 million tokens is roughly 750,000 words: several books or a medium-sized codebase.

For exact counts, use the tokeniser of the model you are calling. OpenAI offers tiktoken, Anthropic has a tokeniser endpoint, and most SDKs expose the token count per response. Always count in advance for large prompts, because surprises on the bill are never fun.
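
For budgeting, a rough character-based estimate is often enough, and you can compare it against an exact count. A minimal sketch: the four-characters-per-token ratio is a heuristic for English-like text, and prompt.txt is a hypothetical file holding your prompt.

    import tiktoken

    def estimate_tokens(text: str) -> int:
        # Heuristic: roughly 4 characters per token for English-like text.
        return max(1, round(len(text) / 4))

    text = open("prompt.txt").read()  # hypothetical prompt file
    exact = len(tiktoken.get_encoding("cl100k_base").encode(text))

    print("estimate:", estimate_tokens(text))
    print("exact:   ", exact)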

Why tokens matter

Billing
Most AI APIs price per token. Input tokens are usually cheaper than output tokens, but both count. A chatbot that sends 50,000 tokens of context on every query can cost anywhere from a fraction of a cent to well over ten euro cents per call, depending on the model. With millions of calls per month, that adds up.
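
To make that concrete, a small cost function; the prices are illustrative placeholders, not any provider's actual rates.

    def call_cost_eur(input_tokens: int, output_tokens: int,
                      price_in_per_million: float, price_out_per_million: float) -> float:
        # Cost of one API call, given per-million-token prices in euros.
        return (input_tokens / 1_000_000) * price_in_per_million \
             + (output_tokens / 1_000_000) * price_out_per_million

    # 50,000 tokens of context plus a 300-token answer, at illustrative prices.
    print(call_cost_eur(50_000, 300, price_in_per_million=3.00, price_out_per_million=15.00))
    # -> roughly 0.15 euro per call, almost all of it input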

Context window
An LLM can only hold a limited number of tokens in its context window at once. Instructions, retrieved documents, and conversation history all need to fit. Managing tokens is as much a quality question as a cost one.
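
A common pattern is to trim conversation history to a token budget before every call; a minimal sketch, assuming a count_tokens helper (for example one built on tiktoken as sketched earlier).

    def trim_history(messages: list[dict], max_tokens: int, count_tokens) -> list[dict]:
        # Keep the most recent messages that still fit within the token budget.
        kept, total = [], 0
        for msg in reversed(messages):
            n = count_tokens(msg["content"])
            if total + n > max_tokens:
                break
            kept.append(msg)
            total += n
        return list(reversed(kept))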

Latency
Output tokens are generated one at a time, so generation time grows roughly linearly with answer length. A model that generates 2,000 tokens of answer takes about twenty times longer than one that answers in 100 tokens. For chat experiences that difference is noticeable to the user.
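
The arithmetic behind that estimate, with an illustrative decode speed; real throughput varies per model and provider.

    tokens_per_second = 50           # illustrative decode speed
    print(2000 / tokens_per_second)  # 40.0 seconds for a 2,000-token answer
    print(100 / tokens_per_second)   #  2.0 seconds for a 100-token answer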

How to spend fewer tokens

  1. Shorter system prompts. Long role descriptions packed with examples eat the same budget on every call. Trim them back and test: often the shorter version performs just as well.

  2. Selective retrieval with RAG. Send only relevant passages instead of full documents.

  3. Prompt caching. Anthropic and OpenAI let you cache large, stable parts of the prompt, so repeated calls pay a much lower rate for the same system prompt or knowledge instead of the full input price (see the sketch after this list).

  4. Smaller model where possible. Route routine tasks to a cheaper model (a classifier or extractor) and reserve the most capable model for complex work. Often cuts total token cost in half.

  5. Force structured output. A model that returns JSON does not need to write an intro sentence or a conclusion. Saves tokens and makes downstream processing easier.
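
As an illustration of point 3, a sketch of prompt caching with the Anthropic Python SDK; the model name and prompt contents are placeholders, and field names can change between SDK versions, so check the current documentation.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    LONG_STABLE_SYSTEM_PROMPT = "..."  # the large, unchanging part of your prompt
    user_question = "..."              # the small part that changes per call

    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder: use a current Claude model id
        max_tokens=512,
        system=[
            {
                "type": "text",
                "text": LONG_STABLE_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # mark this block as cacheable
            }
        ],
        messages=[{"role": "user", "content": user_question}],
    )
    print(response.content[0].text)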

Tokens versus words when comparing prices

When providers quote prices per million tokens, it pays to translate that into your own usage. A customer service chatbot that handles 3,000 conversations per day at 1,500 tokens per conversation burns 4.5 million tokens per day, or 135 million per month. At 3 euros per million input tokens, that is a bit more than 400 euros per month for input alone. Numbers like these turn the token budget from a technical detail into a conscious design choice.
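
The same back-of-the-envelope in a few lines of Python, so you can plug in your own volumes and prices.

    conversations_per_day = 3_000
    tokens_per_conversation = 1_500
    price_per_million_input_eur = 3.00

    tokens_per_day = conversations_per_day * tokens_per_conversation  # 4.5 million
    tokens_per_month = tokens_per_day * 30                            # 135 million
    monthly_cost = tokens_per_month / 1_000_000 * price_per_million_input_eur

    print(f"{monthly_cost:.0f} euro per month for input alone")       # 405 euro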

Last Updated: April 23, 2026
Keywords
tokens tokenisation llm context window ai generative ai gpt claude api cost prompt engineering