NVIDIA launches Nemotron 3 Nano Omni multimodal AI model

April 28, 2026 12:07 PM

NVIDIA Corp. (NASDAQ: NVDA) announced the release of Nemotron 3 Nano Omni, a multimodal artificial intelligence model that processes text, images, audio, and video within a single system. The model launched April 28 through Hugging Face, OpenRouter, build.nvidia.com, and 25 partner platforms.

The model uses a 30B-A3B hybrid mixture-of-experts architecture, a designation that conventionally indicates roughly 30 billion total parameters with about 3 billion active per token, and integrates vision and audio encoders into one system. NVIDIA states the model achieves 9x higher throughput than other open omni models with similar interactivity.

Companies including Aible, Applied Scientific Intelligence, Eka Care, Foxconn, H Company, and Palantir have adopted the model, while Dell Technologies, DocuSign, Infosys, Oracle, and Zefr are evaluating it for potential use.

"By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings — something that wasn't practical before," said Gautier Cloix, CEO of H Company.

The model handles computer-use applications, document intelligence, and audio-video reasoning tasks. H Company's computer-use agent, built on the model, processes screen input at full HD resolution (1920×1080 pixels).

NVIDIA released the model with open weights, datasets, and training techniques. Organizations can customize the model using NVIDIA NeMo tools and deploy it in environments ranging from local systems to cloud platforms.

The Nemotron 3 family of models has recorded over 50 million downloads in the past year, according to NVIDIA. The company offers the model as an NVIDIA NIM microservice through its cloud partner ecosystem and inference platforms.

StreetInsider
