Context window

The context window is the amount of text a language model can see and process in a single call. It sets how many instructions, documents, and chat messages you can include before the model simply forgets the oldest parts.
A language model's context window is the maximum amount of text it can process in one call. Everything you send in (system instruction, user question, retrieved documents, chat history) plus everything the model generates has to fit inside that window. The unit is typically not words but tokens, the building blocks the model works with.
Think of the context window as a model's short-term memory. A human can hold around seven things in mind at once. A modern language model can hold hundreds of thousands, but that is still finite. Anything that falls out of the window is simply invisible to the model.
Context windows have grown quickly. GPT-3 launched in 2020 with a window of about 2,000 tokens. Today, Claude, GPT-4-class models, and Gemini ship with windows from roughly 128,000 up to 2 million tokens. That can sound unlimited, but in practice there are plenty of caveats.
It bounds what you can pass in. An 80-page contract does not fit in a 4,000-token window. A full codebase does not fit in 32,000 tokens. For large documents you either need a big window or a smart way to send only the relevant pieces.
It drives what you pay. Most APIs bill per token, for input and output alike. A 100,000-token prompt on every call quickly becomes expensive. Caching and deliberate context pruning are not luxuries in production systems; they are requirements.
It affects quality. The deeper information sits in the window, the more likely the model misses it or combines it poorly. This is the so-called "lost in the middle" effect: models recall the beginning and end of a long prompt better than the middle.
A token is roughly three-quarters of a word in English or Dutch; languages such as Chinese or Arabic often need more tokens for the same amount of text. What counts as one token depends on the model's tokeniser. A few rules of thumb for English:
100 tokens is roughly 75 words or five short sentences.
1,000 tokens is about one A4 page of text.
100,000 tokens is a short novel, around 300 pages.
1 million tokens covers roughly five books or a medium-sized codebase.
Names, numbers, URLs, and abbreviations often break into more tokens than you expect. Count them with the model's own tokeniser before sizing your prompts.
Retrieval-Augmented Generation
Instead of sending every document, RAG fetches only the relevant passages and passes those in. A knowledge base of gigabytes fits in a few thousand tokens of context this way.
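A minimal sketch of the retrieval step, using word overlap as a stand-in for the embedding similarity real systems use; the pipeline shape (score, rank, keep top-k) is the same. The knowledge base and all names here are illustrative.

```python
# Toy RAG retrieval: score each chunk by word overlap with the query
# and keep only the top-k, so only relevant passages enter the context.
import re

def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the query."""
    q = words(query)
    return sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)[:k]

knowledge_base = [
    "The context window caps how much text a model sees per call.",
    "Tokens, not words, are the billing unit for most APIs.",
    "Leuven is a city in Belgium.",
]
relevant = retrieve("How many tokens fit in the context window?", knowledge_base)
```

Only the passages in `relevant` would then be pasted into the prompt, keeping the context small no matter how large the knowledge base grows.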
Summarising on the fly
In long conversations, periodically have the model summarise what was discussed and drop the oldest messages. The less noise, the better the answer.
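A rough sketch of that rolling summarisation. The `summarise()` here is a placeholder for a model call, and character counts stand in for token counts; both are assumptions for illustration.

```python
# Rolling summarisation for long chats: once the history exceeds the budget,
# the oldest messages are collapsed into a single summary message.

def summarise(messages: list[str]) -> str:
    # placeholder: a real system would ask the model to summarise these
    return "Summary of: " + "; ".join(m[:20] for m in messages)

def prune_history(history: list[str], budget: int) -> list[str]:
    """Keep recent messages verbatim; fold older ones into one summary."""
    if len(history) < 2 or sum(len(m) for m in history) <= budget:
        return history
    keep = max(len(history) // 2, 1)  # keep the most recent half verbatim
    return [summarise(history[:-keep])] + history[-keep:]
```

Calling `prune_history` before each model call keeps the conversation inside budget while the summary preserves the gist of what was dropped.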
Chunking by task
Split large documents and have the model do a sub-task per chunk, then merge the outputs. Works well for summarisation, extraction, and comparison.
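The chunk-and-merge pattern can be sketched like this; `process_chunk` and `merge` are placeholders for your per-chunk model call and the combination step, and the character-based sizes are illustrative.

```python
# Map-reduce over a long document: split into overlapping chunks,
# run a sub-task per chunk, then merge the per-chunk outputs.

def chunk(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` chars overlapping by `overlap`."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def map_reduce(text, process_chunk, merge):
    return merge([process_chunk(c) for c in chunk(text)])
```

The overlap prevents a sentence from being cut exactly at a chunk boundary and lost to both halves, a common failure mode in naive splitting.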
Prompt caching
APIs that offer prompt caching (Anthropic, OpenAI) charge only a fraction of the normal input price when a large system prompt or document is reused across calls, which can cut input costs by up to 10x.
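As an illustration, the shape of an Anthropic Messages API request with a cached system block might look like the dict below. The model name and document text are placeholders, and the dict is never actually sent; check the provider's documentation for current details.

```python
# Sketch of a Messages API payload using prompt caching: the large, stable
# system block is marked with cache_control so repeated calls can reuse the
# cached prefix instead of paying full input price each time.
request = {
    "model": "claude-<version>",  # placeholder model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<full text of the 80-page contract goes here>",
            "cache_control": {"type": "ephemeral"},  # mark this block cacheable
        }
    ],
    "messages": [
        {"role": "user", "content": "Summarise the termination clauses."}
    ],
}
```

The trick is to keep the cached part byte-identical across calls and put the varying question at the end, since caching works on a shared prefix.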
A million-token window sounds impressive, but does not solve every problem.
Quality degrades with length. Research shows models struggle to recall facts from the middle of very long contexts, and performance on needle-in-a-haystack tests measurably drops between 32,000 and 128,000 tokens.
Cost scales linearly. Every token you send in gets billed, relevant or not. A well-designed RAG with 5,000 tokens of context almost always beats a brute-force prompt with 500,000 tokens.
Latency scales too. Large prompts take more compute. For interactive use cases, a 30-second response is often unusable.
In practice you combine a generous context window with a thoughtful pipeline: smart retrieval, smart summarisation, smart caching. That gets you the best of both worlds.