Text-to-speech (TTS) turns written text into spoken audio. It evolved from clunky mechanical machines to modern AI voices that sound almost human. You will find it in accessibility tools, automation, AI voice agents, and content production.
Text-to-speech, or TTS, is technology that turns written text into spoken audio. You hear it in navigation apps, digital assistants, AI voice agents, call centres, e-learning platforms, and audiobooks. Voices range from the basic robotic tones of the early days to highly lifelike AI voices that adjust intonation, emotion, and pace on the fly.
At its core, TTS comes down to three steps:
analyse the text
decide on pronunciation and rhythm
generate the audio
Modern systems use neural models that sound far more natural than the early generations ever managed.
Step 1: text analysis. The model reads the text and identifies sentences, punctuation, numbers, and abbreviations. Smarter systems even pick up on semantics so they can place pauses or emotional cues in the right spots.
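The analysis step above usually includes text normalisation: expanding abbreviations and spelling out digits so the pronunciation stage only sees plain words. A minimal sketch (the abbreviation table and number handling are illustrative assumptions, not any real engine's rules):

```python
import re

# Illustrative subset of an abbreviation table; real systems use large,
# context-aware dictionaries.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}

ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]

def normalize(text):
    """Expand abbreviations and spell out single digits."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    # Replace each lone digit with its word form.
    return re.sub(r"\d", lambda m: ONES[int(m.group())], text)

print(normalize("Dr. Lee lives at 4 Elm St."))
```

A production normaliser also has to disambiguate ("St." as Street vs Saint, "1990" as a year vs a quantity), which is why this stage benefits from the semantic awareness mentioned above.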
Step 2: pronunciation and rhythm. The system works out how each word should sound: stress, intonation, rhythm, and pace.
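One simple way to picture this prosody step is a pass that turns punctuation into pause lengths and an end-of-sentence pitch direction. Modern systems predict these from learned models; the values below are purely illustrative:

```python
# Hypothetical pause lengths in milliseconds per punctuation mark.
PAUSES_MS = {",": 150, ";": 250, ".": 400, "?": 400, "!": 400}

def mark_prosody(tokens):
    """Annotate a token stream with pause and pitch-contour cues."""
    marks = []
    for tok in tokens:
        if tok in PAUSES_MS:
            # Questions typically end with rising pitch, statements falling.
            contour = "rising" if tok == "?" else "falling"
            marks.append(("pause", PAUSES_MS[tok], contour))
        else:
            marks.append(("word", tok, None))
    return marks

marks = mark_prosody(["Ready", "?", "Let's", "go", "."])
```

The real output of this stage is far richer (per-phoneme durations, pitch targets, stress marks), but the principle is the same: the text is enriched with timing and intonation cues before any audio is generated.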
Step 3: audio generation. The engine converts the prepared text into audio. There are three broad approaches:
Formant synthesis: fully synthetic sound built from scratch (older, robotic).
Concatenative synthesis: real recorded speech fragments stitched together (more natural).
Neural TTS: AI models that generate the waveform directly from text (very natural, with flexible emotion and pacing).
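To make the oldest approach concrete, here is a toy formant-style synthesis in pure Python: it builds a vowel-like sound entirely from scratch by summing sinusoids at formant frequencies. Real formant synthesizers filter a glottal pulse train through resonators; this stripped-down sketch (frequencies are rough round numbers for the vowel /a/) only shows why the result sounds synthetic:

```python
import math
import array

SAMPLE_RATE = 8000  # samples per second; low, telephone-like quality

def synth_vowel(formants, duration=0.3, sample_rate=SAMPLE_RATE):
    """Generate 16-bit samples by summing sinusoids at the given
    formant frequencies (a crude stand-in for formant synthesis)."""
    n = int(duration * sample_rate)
    samples = array.array("h")  # signed 16-bit integers
    for i in range(n):
        t = i / sample_rate
        # Equal-weight sum of the formant sinusoids, scaled to 16-bit range.
        value = sum(math.sin(2 * math.pi * f * t) for f in formants) / len(formants)
        samples.append(int(value * 32000))
    return samples

# Approximate first two formants of /a/ (illustrative values).
ah = synth_vowel([730, 1090])
```

Concatenative systems replace this hand-built signal with stitched recordings of real speech, and neural models learn to produce the waveform sample by sample, which is where the naturalness gap comes from.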
In the late 18th century, Wolfgang von Kempelen built a mechanical speaking machine that produced sounds using bellows and reeds. It was not real speech synthesis, but it was a milestone in modelling the human voice.
In 1939, Bell Labs unveiled the Voder. An operator pressed keys to form sounds. It was the first electronic speech system.
In the 1950s and 1960s, researchers modelled the resonance of the human vocal tract. The output was robotic but intelligible. This led to the first computer-driven TTS systems.
In the 1980s, the DECtalk system became iconic. Stephen Hawking famously used a variant of it. The speech was mechanical but useful for accessibility and early call centres.
From the 1990s, TTS shifted to concatenative synthesis, stitching together real audio fragments. The result was much more natural, but harder to adapt. Navigation systems and telephony adopted it at scale.
In 2016, DeepMind introduced WaveNet, followed by models like Tacotron, FastSpeech, Glow-TTS, and VITS. They produce fluid, realistic speech and can shape emotion, style, and context.
One name keeps surfacing in any history of speech technology: Lernout & Hauspie (L&H). The Belgian company grew through the 1990s into a global player in speech recognition and synthesis, building commercial TTS voices at a time when the technology was still mostly an academic curiosity. Their products ended up in call centres, screen readers for the visually impaired, medical dictation systems, and consumer electronics. Around L&H, an ecosystem called Flanders Language Valley formed in West Flanders. After L&H collapsed in 2001, many of its engineers moved to ScanSoft and Nuance, which merged in 2005; the combined company was later acquired by Microsoft. Academic work at KU Leuven and imec on acoustic modelling and neural speech synthesis kept the region active in the field well into the WaveNet era.
From 2020 onward, the sector picked up pace again. TTS rapidly became:
more natural sounding
cheaper to run
usable in real time
capable of voice cloning
practical in telephony, customer service, and media production
Today's startups and scale-ups focus on areas like:
AI voice agents for call centres
digital brand voices for companies that want a consistent audio identity
data annotation and modelling for less-represented languages and dialects
voice interfaces for sectors like healthcare, logistics, and education
translation combined with TTS for international communication
Bigger companies are also experimenting with their own branded voice models for internal processes and customer contact.