OpenAI launches three new voice models for real-time applications

May 7, 2026 1:15 PM

OpenAI announced three new audio models designed for real-time voice applications. The models include GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, according to a company press release.

GPT-Realtime-2 features what the company describes as GPT-5-class reasoning capabilities for voice interactions. The model's context window expands from 32,000 to 128,000 tokens, and it offers adjustable reasoning levels ranging from minimal to extra-high. On the Big Bench Audio benchmark, GPT-Realtime-2 scored 15.2% higher than its predecessor, GPT-Realtime-1.5.

GPT-Realtime-Translate provides live translation, supporting more than 70 input languages and 13 output languages. The model translates speech in real time, keeping pace with the speaker.

GPT-Realtime-Whisper offers streaming speech-to-text transcription, converting speech to text as people talk.

Several companies participated in early testing of the models. Zillow reported a 26-percentage-point improvement in call success rate with GPT-Realtime-2, reaching 95% compared with 69% for previous models. BolnaAI reported 12.5% lower word error rates when testing GPT-Realtime-Translate across Hindi, Tamil, and Telugu.

The models are available through OpenAI's Realtime API. GPT-Realtime-2 costs $32 per million audio input tokens and $64 per million audio output tokens. GPT-Realtime-Translate is priced at $0.034 per minute, while GPT-Realtime-Whisper costs $0.017 per minute.
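To put the reported prices in perspective, the arithmetic can be sketched in a few lines of Python. This is an illustrative calculation only: the function and constant names are hypothetical, the token and minute counts in the usage lines are made-up inputs, and the sketch ignores any other charges (such as text tokens or cached input) that the article does not describe.

```python
# Prices in USD as reported in the article; all names here are hypothetical.
REALTIME_2_INPUT_PER_MILLION = 32.0    # audio input tokens
REALTIME_2_OUTPUT_PER_MILLION = 64.0   # audio output tokens
TRANSLATE_PER_MINUTE = 0.034
WHISPER_PER_MINUTE = 0.017


def realtime_2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate GPT-Realtime-2 audio cost from token counts."""
    return (
        input_tokens * REALTIME_2_INPUT_PER_MILLION
        + output_tokens * REALTIME_2_OUTPUT_PER_MILLION
    ) / 1_000_000


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimate cost for the per-minute models (Translate, Whisper)."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Example inputs are assumptions chosen for illustration.
    print(f"Realtime-2: ${realtime_2_cost(1_000_000, 500_000):.2f}")
    print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
    print(f"Whisper, 60 min: ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

At these rates, an hour of audio costs about $2.04 to translate and about $1.02 to transcribe, so translation is priced at exactly twice the per-minute rate of transcription.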

The company stated that the API includes safety measures, among them active classifiers that halt conversations violating content guidelines. The service also supports EU data residency requirements.