Mistral has launched Voxtral, an open family of automatic speech recognition (ASR) models aimed at breaking the ASR market's traditional trade-off between cost and quality.
Why it matters
Using ASR in production has often meant choosing between open-source models with higher error rates and more accurate but expensive proprietary ones. Mistral claims Voxtral bridges this gap by delivering state-of-the-art accuracy and native semantic understanding at less than half the price of leading APIs.
The big picture
OpenAI’s Whisper API costs $0.006 per minute, and GPT-4o-mini-transcribe costs around $0.003 per minute. Voxtral pricing starts at $0.001 per minute and scales up to $0.004, and the model reportedly outperforms these competitors on key benchmarks, including multilingual transcription and short-form English.
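To make the price gap concrete, here is a minimal sketch that turns the per-minute figures quoted above into a cost for a given volume of audio. The tier labels `voxtral-low` and `voxtral-high`, the function names, and the 100,000-minute workload are illustrative assumptions, not official product names or published usage figures.

```python
# Per-minute prices in USD, taken from the figures quoted in this article.
# The "voxtral-low"/"voxtral-high" labels are hypothetical names for the
# $0.001 starting price and the $0.004 upper price, not official tier names.
PRICE_PER_MINUTE = {
    "whisper": 0.006,
    "gpt-4o-mini-transcribe": 0.003,
    "voxtral-low": 0.001,
    "voxtral-high": 0.004,
}


def transcription_cost(minutes: float, price_per_minute: float) -> float:
    """Return the cost in USD, rounded to cents, for a volume of audio."""
    return round(minutes * price_per_minute, 2)


def compare_costs(minutes: float) -> dict[str, float]:
    """Return the cost of transcribing `minutes` of audio at each quoted price."""
    return {
        name: transcription_cost(minutes, price)
        for name, price in PRICE_PER_MINUTE.items()
    }


if __name__ == "__main__":
    # Assumed workload: 100,000 minutes of audio.
    for name, cost in compare_costs(100_000).items():
        print(f"{name}: ${cost:,.2f}")
```

At these quoted prices, the starting Voxtral rate is one sixth of Whisper's and one third of GPT-4o-mini-transcribe's, while the upper Voxtral rate is higher than GPT-4o-mini-transcribe's, so the "less than half the price" claim depends on which rate applies.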
Zoom in
Mistral claims Voxtral beats Whisper large-v3, GPT-4o mini Transcribe, Gemini 2.5 Flash, and ElevenLabs Scribe across all tested tasks. However, unlike for Whisper, hallucination rates, a key quality metric for ASR, have not been disclosed for Voxtral.
Yes, but
The ASR space is competitive, and real-world adoption depends on ease of integration, transparency of benchmark data, and ecosystem support. Voxtral’s open approach and pricing could pressure incumbents, but the model must still prove itself in production environments.