Multimodal Models: AI News & Updates
Microsoft Launches Three Multimodal Foundation Models to Compete in AI Market
Microsoft AI announced three new foundation models: MAI-Transcribe-1 for speech-to-text across 25 languages, MAI-Voice-1 for audio generation, and MAI-Image-2 for video generation. Developed by Microsoft's MAI Superintelligence team, led by Mustafa Suleyman, the models are positioned as cost-competitive alternatives to offerings from Google and OpenAI, with transcription pricing starting at $0.36 per hour. The release marks Microsoft's effort to build its own AI model stack while maintaining its partnership with OpenAI.
Skynet Chance (+0.01%): The release of more capable multimodal models raises the general sophistication of AI systems on the market, but these are commercial tools focused on practical use under apparent human oversight, not autonomous or agentic systems that would significantly heighten loss-of-control risk.
Skynet Date (+0 days): The models represent an incremental advance in multimodal AI capability, slightly accelerating the pace at which sophisticated AI is deployed. However, their focus on practical commercial applications rather than autonomous systems limits any acceleration of existential risk timelines.
AGI Progress (+0.02%): Deploying text, voice, and video generation capabilities simultaneously in foundation models demonstrates progress toward integrated multimodal AI systems, one component of AGI. However, these appear to be specialized models for narrow tasks rather than general-purpose reasoning systems.
AGI Date (+0 days): Microsoft's competitive push with cost-effective multimodal models accelerates market adoption and incentivizes faster development cycles across the industry. The formation of a dedicated "Superintelligence team" and the rapid cadence of model releases suggest an accelerated timeline for advanced AI development.