About Groq
Inference on a chip built only for inference.
Groq was founded in 2016 in Mountain View, California by Jonathan Ross, who designed the original Tensor Processing Unit at Google before starting the company. The architecture began life as the Tensor Streaming Processor and was rebranded the LPU (Language Processing Unit) once large language models took over the workload. Headquarters sits in Mountain View, with offices in San Jose, Liberty Lake, Toronto and London. Note for clarity: Groq the inference company is not the same thing as Grok the chatbot from xAI; the names sound alike, but the products are unrelated.
The product is GroqCloud, a pay-per-token API that hosts open-weight models on LPU hardware instead of GPUs. The current line-up, with approximate throughput in tokens per second:

- Llama 3.1 8B Instant: ~560
- Llama 3.3 70B Versatile: ~280
- Llama 4 Scout 17B (preview): ~750
- OpenAI GPT-OSS 120B and 20B: ~500 and ~1000
- Qwen3-32B: ~400
- Whisper Large V3 and V3 Turbo: speech-to-text

Groq Compound bundles a model with built-in tools (web search, code execution) at around 450 tokens per second for agent workloads. The LPU is single-architecture, deterministic and built only for inference, which is where the headline tokens-per-second numbers come from. The company licensed its inference technology to Nvidia in late 2025, with Jonathan Ross moving over to lead inference at Nvidia.
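GroqCloud's API follows the OpenAI chat-completions shape, so a request is an HTTP POST with a model name and a message list. A minimal sketch, assuming the endpoint URL, the model identifier `llama-3.3-70b-versatile`, and a `GROQ_API_KEY` environment variable (all of which should be checked against the GroqCloud docs, as none are confirmed by this page):

```python
# Sketch of a GroqCloud chat request. Endpoint URL, model ID, and the
# GROQ_API_KEY env-var name are assumptions -- verify against current docs.
import json
import os
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"  # assumed endpoint

def build_request(prompt: str, model: str = "llama-3.3-70b-versatile") -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload: dict) -> dict:
    """POST the payload; needs GROQ_API_KEY set in the environment."""
    req = urllib.request.Request(
        GROQ_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request("Explain what an LPU is in one sentence.")
print(payload["model"])
```

Because the interface is OpenAI-compatible, existing OpenAI client libraries can usually be pointed at the Groq base URL instead of hand-rolling requests like this.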