NVIDIA launches Cosmos 3 open AI model for physical world applications
NVIDIA (NASDAQ: NVDA) announced the launch of Cosmos 3, an open foundation model designed for physical AI applications, at its GTC Taipei conference. The model uses a mixture-of-transformers architecture that combines vision reasoning, world generation and action prediction capabilities.
Cosmos 3 can process and generate text, images, video, ambient sound and actions. The company positions it as the first fully open omnimodel with these multimodal capabilities for physical AI development. NVIDIA says the model is intended to cut training and evaluation cycles for physical AI applications from months to days.
The model was trained on what NVIDIA describes as one of the largest multimodal physical AI datasets, including billions of samples across multiple data types. Developers can use Cosmos 3 as a vision language model, as a world model for simulating physical environments, or as a foundation for training robots to perform specific tasks.
NVIDIA also announced the Cosmos Coalition, a collaboration with AI companies including Agile Robots, Black Forest Labs, Generalist, LTX, Runway and Skild AI. The coalition aims to advance open world model development through shared research and evaluation techniques.
The Cosmos 3 lineup includes three variants: Cosmos 3 Super for applications requiring high physics accuracy, Cosmos 3 Nano for faster video and action reasoning, and Cosmos 3 Edge, which is coming soon, for real-time inference at the edge.
Companies building applications on the Cosmos platform include Agile Robots, Doosan Robotics, LG Electronics and Samsung in robotics, Li Auto in autonomous vehicles, and several others in vision AI.
Cosmos 3 Super and Cosmos 3 Nano are available through build.nvidia.com and Hugging Face, with deployment options through NVIDIA NIM microservices and various cloud infrastructure partners.
