Alibaba unveils new flagship AI model: Qwen2.5-Omni

Investing.com -- Alibaba Group Holding Ltd ADR (NYSE: BABA) has introduced Qwen2.5-Omni, the new flagship model in its Qwen series. The end-to-end multimodal model is designed for broad multimodal perception, processing inputs such as text, images, audio, and video, and delivering real-time streaming responses through both text generation and natural speech synthesis.
Key features include the Thinker-Talker architecture, which allows the model to generate text and natural speech responses simultaneously, and a novel position embedding, dubbed TMRoPE (Time-aligned Multimodal RoPE), which synchronizes the timestamps of video inputs with audio.
The model is designed for fully real-time interaction, supporting chunked input and immediate output. It surpasses many existing streaming and non-streaming alternatives in the robustness and naturalness of its speech generation. Qwen2.5-Omni shows strong performance across all modalities, outperforming the similarly sized Qwen2-Audio in audio capabilities and matching the performance of Qwen2.5-VL-7B.
Qwen2.5-Omni employs the Thinker-Talker architecture, where the Thinker functions like a brain, processing and understanding inputs from text, audio, and video modalities. It generates high-level representations and corresponding text. The Talker operates like a human mouth, taking in the high-level representations and text produced by the Thinker and outputting discrete tokens of speech fluidly.
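The division of labor described above can be sketched, purely for illustration, as two cooperating components: a Thinker that turns multimodal input into a high-level representation plus text, and a Talker that streams speech tokens from that output. All class and method names below are hypothetical stand-ins, not the actual Qwen2.5-Omni API:

```python
# Illustrative-only sketch of a Thinker-Talker style pipeline.
# None of these names come from the real Qwen2.5-Omni codebase.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThinkerOutput:
    hidden_representation: List[float]  # high-level multimodal representation
    text: str                           # generated text response


class Thinker:
    """'Brain': processes text/audio/video inputs into a representation plus text."""

    def process(self, text: str, audio: Optional[bytes] = None,
                video: Optional[bytes] = None) -> ThinkerOutput:
        # A real model would fuse the modalities; here we just echo the prompt.
        rep = [float(len(text))]
        return ThinkerOutput(hidden_representation=rep, text=f"Answer to: {text}")


class Talker:
    """'Mouth': consumes the Thinker's output and emits discrete speech tokens."""

    def speak(self, thinker_out: ThinkerOutput) -> List[int]:
        # Stand-in for autoregressive speech-token generation from the
        # representation and text the Thinker produced.
        return [ord(c) % 256 for c in thinker_out.text]


thinker, talker = Thinker(), Talker()
out = thinker.process("What is in this video?")
speech_tokens = talker.speak(out)
```

The point of the split is that both outputs are produced from the same high-level representation, so text and speech can be emitted in parallel rather than speech being synthesized from finished text.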
A comprehensive evaluation of Qwen2.5-Omni shows strong performance across all modalities when compared with similarly sized single-modality models such as Qwen2.5-VL-7B and Qwen2-Audio, as well as closed-source models like Gemini-1.5-Pro. In tasks requiring the integration of multiple modalities, such as the OmniBench benchmark, Qwen2.5-Omni achieves state-of-the-art performance.
In the near future, Alibaba plans to enhance the model's ability to follow voice commands and improve audio-visual collaborative understanding. The company also aims to integrate more modalities towards an omni-model.
The Qwen2.5-Omni model is now publicly available on platforms like Hugging Face, ModelScope, DashScope, and GitHub. Users can experience the model's interactive features through a demo or join discussions on Discord.