OpenAI launches three new voice models for real-time applications
OpenAI announced three new audio models designed for real-time voice applications. The models include GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, according to a company press release.
GPT-Realtime-2 features what the company describes as GPT-5-class reasoning capabilities for voice interactions. The model's context window has been expanded from 32,000 to 128,000 tokens, and it offers adjustable reasoning levels ranging from minimal to extra-high. On the Big Bench Audio benchmark, GPT-Realtime-2 scored 15.2% higher than its predecessor, GPT-Realtime-1.5.
GPT-Realtime-Translate provides live translation, supporting more than 70 input languages and 13 output languages. The model translates speech in real time while keeping pace with speakers.
GPT-Realtime-Whisper offers streaming speech-to-text transcription, converting spoken words to text as people speak.
Several companies participated in early testing of the models. Zillow reported a 26-percentage-point improvement in call success rates using GPT-Realtime-2, with success rates reaching 95% compared to 69% with previous models. BolnaAI reported 12.5% lower word error rates when testing GPT-Realtime-Translate across Hindi, Tamil, and Telugu.
The models are available through OpenAI's Realtime API. GPT-Realtime-2 costs $32 per million audio input tokens and $64 per million audio output tokens. GPT-Realtime-Translate is priced at $0.034 per minute, while GPT-Realtime-Whisper costs $0.017 per minute.
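To put the quoted prices in concrete terms, the following is a minimal Python sketch of the arithmetic. The rates are taken from the figures above; the constant and function names are illustrative and are not part of any OpenAI SDK. The sketch covers only audio tokens and per-minute billing, and does not account for any other charges that may apply.

```python
# Rates as quoted in the article (USD). All names here are hypothetical,
# chosen for illustration; they do not come from any OpenAI library.
REALTIME_2_INPUT_PER_MILLION = 32.0    # audio input tokens
REALTIME_2_OUTPUT_PER_MILLION = 64.0   # audio output tokens
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def realtime_2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the GPT-Realtime-2 audio cost for the given token counts."""
    total = (input_tokens * REALTIME_2_INPUT_PER_MILLION
             + output_tokens * REALTIME_2_OUTPUT_PER_MILLION)
    return total / 1_000_000


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimate the cost of a per-minute-billed model for a given duration."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # One million input tokens plus half a million output tokens.
    print(f"GPT-Realtime-2: ${realtime_2_cost(1_000_000, 500_000):.2f}")
    # One hour of audio on each per-minute model.
    print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, 60 min: ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

At these rates, GPT-Realtime-Translate costs twice as much per minute as GPT-Realtime-Whisper, and GPT-Realtime-2 audio output tokens cost twice as much as input tokens.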
The company stated that the API includes safety measures, among them active classifiers that halt conversations that violate content guidelines. The service also supports EU data residency requirements.
