
Tokens

A token is the smallest unit of text an AI model processes: typically a short word, part of a longer word, or a punctuation mark. Tokens drive both the bill on every API call and the limits of what a model can handle at once.

What is a token?

A token is the smallest unit of text an AI model works with. For the model, language is not made of words or letters but of tokens. A tokeniser first turns your input into a sequence of these units, and the model reasons only at that level. Every answer is generated token by token too.

In English a token averages about four characters, roughly three-quarters of a word. A common word such as cat is a single token, while a rarer word like unreliability breaks into several smaller pieces. Punctuation, spaces, and line breaks count as tokens too. For Chinese, Arabic, or Japanese, a single character often maps to one or more tokens, so the same content can cost noticeably more tokens than in English.

Picture tokens as the puzzle pieces a model uses to chop up language. They are not random: the tokeniser is trained so that frequent words and character sequences stay as one piece and rare words split into smaller building blocks.
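
For a concrete feel, here is a minimal sketch using OpenAI's open-source tiktoken library; the exact pieces and counts depend on which encoding (and therefore which model) you pick.

    import tiktoken  # pip install tiktoken

    # cl100k_base is the encoding used by several recent OpenAI models.
    enc = tiktoken.get_encoding("cl100k_base")

    text = "Unreliability is hard to tokenise."
    token_ids = enc.encode(text)

    print(len(token_ids))                        # number of tokens
    print(token_ids)                             # the integer ids the model actually sees
    print([enc.decode([t]) for t in token_ids])  # the individual text pieces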

Why tokens instead of words?

A word-based approach does not work well for models that must handle many languages, accept new words, and stay memory-efficient. A tokeniser solves that with three properties:

  • Common words stay whole. Everyday words such as house, company, or the are a single token each.

  • Rare or compound words get broken up. That way the model can still understand unknown words by composing them from familiar pieces.

  • Multilinguality is baked in. A well-trained tokeniser handles English, Dutch, code, and emoji through the same vocabulary.

The common methods are Byte Pair Encoding (BPE), WordPiece, and SentencePiece. OpenAI, Anthropic, and Google each use a variant, which is why the same text ends up with slightly different token counts across models.
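
To see those differences in practice, a small sketch with tiktoken; the encoding names below are OpenAI's, and other providers ship their own tokenisers with their own counts.

    import tiktoken

    text = "Tokenisation differs per model, especially for rare words and emoji."

    # Three OpenAI encodings of different generations; the counts will differ slightly.
    for name in ["p50k_base", "cl100k_base", "o200k_base"]:  # o200k_base needs a recent tiktoken
        enc = tiktoken.get_encoding(name)
        print(name, len(enc.encode(text)))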

Rules of thumb for estimating tokens

  • 100 tokens is roughly 75 words of English or Dutch.

  • 1,000 tokens is roughly 750 words, or one to two A4 pages in a normal font.

  • 1 million tokens is roughly 750,000 words: several books or a medium-sized codebase.

For exact counts, use the tokeniser of the model you are calling. OpenAI offers tiktoken, Anthropic has a tokeniser endpoint, and most SDKs expose the token count per response. Always count in advance for large prompts, because surprises on the bill are never fun.
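
For budgeting, a rough character-based estimate is often enough, and you can compare it against an exact count. A minimal sketch: the four-characters-per-token ratio is a heuristic for English-like text, and prompt.txt is a hypothetical file holding your prompt.

    import tiktoken

    def estimate_tokens(text: str) -> int:
        # Heuristic: roughly 4 characters per token for English-like text.
        return max(1, round(len(text) / 4))

    text = open("prompt.txt").read()  # hypothetical prompt file
    exact = len(tiktoken.get_encoding("cl100k_base").encode(text))

    print("estimate:", estimate_tokens(text))
    print("exact:   ", exact)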

Why tokens matter

Billing
Most AI APIs price per token. Input tokens are usually cheaper than output tokens, but both count. A chatbot that sends 50,000 tokens of context on every query can cost anywhere from a fraction of a cent to well over ten euro cents per call, depending on the model. With millions of calls per month, that adds up.
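
To make that concrete, a small cost function; the prices are illustrative placeholders, not any provider's actual rates.

    def call_cost_eur(input_tokens: int, output_tokens: int,
                      price_in_per_million: float, price_out_per_million: float) -> float:
        # Cost of one API call, given per-million-token prices in euros.
        return (input_tokens / 1_000_000) * price_in_per_million \
             + (output_tokens / 1_000_000) * price_out_per_million

    # 50,000 tokens of context plus a 300-token answer, at illustrative prices.
    print(call_cost_eur(50_000, 300, price_in_per_million=3.00, price_out_per_million=15.00))
    # -> roughly 0.15 euro per call, almost all of it input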

Context window
An LLM can only hold a limited number of tokens in its context window at once. Instructions, retrieved documents, and conversation history all need to fit. Managing tokens is as much a quality question as a cost one.
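
A common pattern is to trim conversation history to a token budget before every call; a minimal sketch, assuming a count_tokens helper (for example one built on tiktoken as sketched earlier).

    def trim_history(messages: list[dict], max_tokens: int, count_tokens) -> list[dict]:
        # Keep the most recent messages that still fit within the token budget.
        kept, total = [], 0
        for msg in reversed(messages):
            n = count_tokens(msg["content"])
            if total + n > max_tokens:
                break
            kept.append(msg)
            total += n
        return list(reversed(kept))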

Latency
Output tokens are generated one at a time, so generation time grows roughly linearly with answer length. A model that generates 2,000 tokens of answer takes about twenty times longer than one that answers in 100 tokens. For chat experiences that difference is noticeable to the user.
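
The arithmetic behind that estimate, with an illustrative decode speed; real throughput varies per model and provider.

    tokens_per_second = 50           # illustrative decode speed
    print(2000 / tokens_per_second)  # 40.0 seconds for a 2,000-token answer
    print(100 / tokens_per_second)   #  2.0 seconds for a 100-token answer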

How to spend fewer tokens

  1. Shorter system prompts. Long role descriptions packed with examples eat the same budget on every call. Trim them back and test: often the shorter version performs just as well.

  2. Selective retrieval with RAG. Send only relevant passages instead of full documents.

  3. Prompt caching. Anthropic and OpenAI let you cache large, stable parts of the prompt, so repeated calls pay a much lower rate for the same system prompt or knowledge instead of the full input price (see the sketch after this list).

  4. Smaller model where possible. Route routine tasks to a cheaper model (a classifier or extractor) and reserve the most capable model for complex work. Often cuts total token cost in half.

  5. Force structured output. A model that returns JSON does not need to write an intro sentence or a conclusion. Saves tokens and makes downstream processing easier.
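
As an illustration of point 3, a sketch of prompt caching with the Anthropic Python SDK; the model name and prompt contents are placeholders, and field names can change between SDK versions, so check the current documentation.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    LONG_STABLE_SYSTEM_PROMPT = "..."  # the large, unchanging part of your prompt
    user_question = "..."              # the small part that changes per call

    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder: use a current Claude model id
        max_tokens=512,
        system=[
            {
                "type": "text",
                "text": LONG_STABLE_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # mark this block as cacheable
            }
        ],
        messages=[{"role": "user", "content": user_question}],
    )
    print(response.content[0].text)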

Tokens versus words when comparing prices

When providers quote prices per million tokens, it pays to translate that into your own usage. A customer service chatbot that handles 3,000 conversations per day at 1,500 tokens per conversation burns 4.5 million tokens per day, or 135 million per month. At 3 euros per million input tokens, that is a bit more than 400 euros per month for input alone. Numbers like these turn the token budget from a technical detail into a conscious design choice.
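
The same back-of-the-envelope in a few lines of Python, so you can plug in your own volumes and prices.

    conversations_per_day = 3_000
    tokens_per_conversation = 1_500
    price_per_million_input_eur = 3.00

    tokens_per_day = conversations_per_day * tokens_per_conversation  # 4.5 million
    tokens_per_month = tokens_per_day * 30                            # 135 million
    monthly_cost = tokens_per_month / 1_000_000 * price_per_million_input_eur

    print(f"{monthly_cost:.0f} euro per month for input alone")       # 405 euro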

Last Updated: April 23, 2026
Keywords
tokens tokenisation llm context window ai generative ai gpt claude api cost prompt engineering