OpenAI unveils three audio models for real-time voice tasks

May 7, 2026 2:09 PM EDT

FILE PHOTO: OpenAI logo is seen in this illustration taken May 20, 2024. REUTERS/Dado Ruvic/Illustration/File Photo

May 7 (Reuters) - OpenAI introduced three audio models for its developer platform on Thursday, aiming to make voice-based software agents more conversational and capable of completing tasks in real time.

The launch of the application programming interface (API) moves the ChatGPT-maker beyond transcription and chat toward agents that can listen, translate and act during live conversations.

The new models are GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper. OpenAI said they are available to test in its developer playground.

GPT-Realtime-2 is designed to manage harder requests, call tools, handle interruptions and maintain context across longer voice sessions.

GPT-Realtime-Translate supports translation from more than 70 languages into 13 output languages, targeting customer support, education and other settings.

GPT-Realtime-Whisper provides live speech-to-text, allowing captions, meeting notes and workflow updates to be generated as a speaker talks.

Customers testing the models include online real estate marketplace Zillow, online travel agency Priceline and European telecommunications firm Deutsche Telekom.

Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens, while GPT-Realtime-Translate costs $0.034 per minute and GPT-Realtime-Whisper costs $0.017 per minute.

(Reporting by Anhata Rooprai in Bengaluru; Editing by Vijay Kishore)


