NVIDIA releases Dynamo 1.0 software for AI inference scaling

March 16, 2026 4:37 PM

NVIDIA (NASDAQ: NVDA) announced the release of Dynamo 1.0, open source software designed for generative and agentic artificial intelligence inference at scale. The company made the announcement at its GTC conference.

The software functions as a distributed operating system for AI data centers, orchestrating GPU and memory resources across clusters to manage AI workloads. According to NVIDIA, Dynamo 1.0 increased inference performance of NVIDIA Blackwell GPUs by up to 7x in recent industry benchmarks.

The software integrates with open source frameworks including LangChain, llm-d, LMCache, SGLang and vLLM. Dynamo 1.0 splits inference work across GPUs and can move data between GPU memory and lower-cost storage to ease memory constraints.
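The idea of moving data between GPUs and lower-cost storage can be illustrated with a toy two-tier cache. This is a minimal sketch of the general technique (least-recently-used eviction from a small fast tier into a larger slow tier), not Dynamo's implementation or API. The class name `TieredCache`, its methods and the `fast_capacity` parameter are hypothetical, and plain Python dictionaries stand in for GPU memory and storage.

```python
from collections import OrderedDict


class TieredCache:
    """Toy two-tier cache: a small fast tier backed by a larger slow tier.

    Hypothetical illustration only; it does not reflect Dynamo's design.
    """

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # stands in for scarce GPU memory
        self.slow = {}             # stands in for lower-cost storage

    def put(self, key, value):
        # New or updated entries always land in the fast tier.
        self.slow.pop(key, None)
        self.fast[key] = value
        self.fast.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.fast:
            # Mark the entry as most recently used.
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:
            # Promote the entry back into the fast tier on access.
            value = self.slow.pop(key)
            self.put(key, value)
            return value
        return None

    def _evict(self):
        # Offload least-recently-used entries once the fast tier is full.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value


cache = TieredCache(fast_capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)        # "a" is offloaded to the slow tier
value = cache.get("a")   # "a" is promoted back; "b" is offloaded
```

In this sketch nothing is ever discarded: entries that no longer fit in the fast tier are kept in the slow tier and brought back when requested, trading access latency for capacity.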

Major cloud service providers have integrated the NVIDIA inference platform, including Amazon Web Services, Microsoft Azure, Google Cloud and Oracle Cloud Infrastructure. NVIDIA cloud partners adopting the technology include Alibaba Cloud, CoreWeave, Together AI and Nebius.

Companies using the software include AI-native firms Cursor and Perplexity, inference endpoint providers Baseten, Deep Infra and Fireworks, and global enterprises ByteDance, Meituan, PayPal and Pinterest.

"Inference is the engine of intelligence, powering every query, every agent and every application," said Jensen Huang, founder and CEO of NVIDIA. "With NVIDIA Dynamo, we've created the first-ever 'operating system' for AI factories."

The software became available to developers worldwide on the day of the announcement. NVIDIA is also contributing TensorRT-LLM CUDA kernels to the FlashInfer project for integration into open source frameworks.

StreetInsider
