DigitalOcean launches Inference Engine for AI production workloads

April 28, 2026 9:01 AM

DigitalOcean (NYSE: DOCN) announced the launch of its Inference Engine, a set of production capabilities designed to help AI developers manage inference workloads. The announcement was made ahead of the company's Deploy conference in San Francisco.

The Inference Engine includes four core capabilities: Inference Router, Batch Inference, Serverless Inference, and Dedicated Inference. The Inference Router uses a mixture-of-experts router model to match each request to an appropriate model based on task complexity and developer-defined preferences.

According to the company, early customers have reported cost reductions. Workato's Research Lab reported 67% lower inference costs, while LawVo reported a reduction of more than 40% in inference costs when using the Inference Router capability.

Hippocratic AI, which operates healthcare agents on the platform, achieved 2x production throughput and 40% lower P99 latency across more than 20 million patient interactions, according to the company statement.

Independent benchmarking platform Artificial Analysis found that DigitalOcean demonstrated 3x faster time-to-first-answer-token and 3x higher output speed than Amazon Bedrock on DeepSeek V3.2 at 10,000 input tokens.

The Serverless Inference feature provides access to multiple models through a single API and includes off-peak pricing. Batch Inference is designed for offline workloads and offers a 24-hour completion window. Dedicated Inference provides reserved capacity for high-scale workloads.

DigitalOcean serves more than 640,000 customers and will make additional product announcements at its Deploy conference. This article is based on a company press release.

StreetInsider
