Microsoft releases three AI models with competitive pricing
Microsoft announced the release of three artificial intelligence models: MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, now available through Microsoft Foundry and MAI Playground.
MAI-Transcribe-1 provides speech-to-text transcription for 25 languages and operates at 2.5 times the speed of Microsoft Azure's existing Fast offering. The model achieved a 3.9% word error rate across reported languages, according to company data. Pricing starts at $0.36 per hour.
MAI-Voice-1 generates audio content and can produce 60 seconds of audio in one second. The service includes custom voice creation using brief audio samples. Pricing begins at $22 per million characters.
MAI-Image-2 generates images at what Microsoft reports as twice the speed of previous versions, with no loss in quality. The model ranks in the top three on the Arena.ai leaderboard. Pricing starts at $5 per million tokens for text input and $33 per million tokens for image output.
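To put the listed rates in perspective, the following is a rough cost sketch based only on the "starting at" prices quoted above. The function names and the sample workloads are hypothetical, and actual billing may involve tiers, minimums, or regional differences not covered here.

```python
# Rough cost estimates from the "starting at" prices quoted in the article.
# Assumption: simple linear pricing with no tiers, minimums, or discounts.

TRANSCRIBE_USD_PER_AUDIO_HOUR = 0.36       # MAI-Transcribe-1
VOICE_USD_PER_MILLION_CHARS = 22.0         # MAI-Voice-1
IMAGE_USD_PER_MILLION_INPUT_TOKENS = 5.0   # MAI-Image-2, text input
IMAGE_USD_PER_MILLION_OUTPUT_TOKENS = 33.0 # MAI-Image-2, image output


def transcribe_cost(audio_hours: float) -> float:
    """Estimated cost in USD of transcribing the given hours of audio."""
    return audio_hours * TRANSCRIBE_USD_PER_AUDIO_HOUR


def voice_cost(characters: int) -> float:
    """Estimated cost in USD of synthesizing speech from the given character count."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD of image generation for the given token counts."""
    return (
        input_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_INPUT_TOKENS
        + output_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_OUTPUT_TOKENS
    )


if __name__ == "__main__":
    # Hypothetical workloads, chosen only for illustration.
    print(f"100 hours of audio transcribed: ${transcribe_cost(100):.2f}")
    print(f"500,000 characters of speech:   ${voice_cost(500_000):.2f}")
    print(f"2M input + 1M output tokens:    ${image_cost(2_000_000, 1_000_000):.2f}")
```

At these rates, 100 hours of transcription would come to about $36, half a million characters of synthesized speech to about $11, and an image workload of 2 million input tokens and 1 million output tokens to about $43.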
WPP, a marketing communications company, is using MAI-Image-2 for commercial applications. "MAI-Image-2 is a genuine game-changer," said Rob Reilly, Global Chief Creative Officer at WPP. "It's a platform that not only responds to the intricate nuance of creative direction, but deeply respects the sheer craft involved in generating real-world, campaign-ready images."
The models are being integrated into Microsoft's consumer and commercial products. Microsoft stated the models underwent safety testing and include built-in guardrails for enterprise deployment. Access is available through Microsoft Foundry, with MAI Playground currently limited to U.S. users.
