Voice AI News & Updates

OpenAI Releases Advanced Real-Time Voice API with GPT-5-Class Reasoning and Multi-Language Translation

OpenAI announced new voice intelligence features for its API. They include GPT-Realtime-2, which brings GPT-5-class reasoning to complex conversational requests; GPT-Realtime-Translate, which supports more than 70 input languages; and GPT-Realtime-Whisper for live transcription. Together, these features are meant to let developers build voice interfaces that can listen, reason, translate, transcribe, and take action in real time across enterprise applications such as customer service, education, and media.

Google DeepMind Acquires Hume AI Leadership Team to Enhance Voice Emotion Recognition

Google DeepMind has hired the CEO and about seven engineers from voice AI startup Hume AI as part of a licensing agreement, with the aim of adding emotional intelligence to Gemini's voice features. The deal is the latest example of an "acquihire," in which a major AI company brings in a startup's talent without buying the company outright, possibly to avoid regulatory scrutiny. It also underscores how voice AI has become a critical competitive frontier: Hume AI's technology specializes in detecting a user's emotions and mood from their voice.

OpenAI Launches GPT-5 Pro, Sora 2 Video Model, and Cost-Efficient Voice API at DevDay

OpenAI announced major API updates at its DevDay event: GPT-5 Pro for high-accuracy reasoning tasks, Sora 2 for advanced video generation with synchronized audio, and a cheaper voice model called gpt-realtime mini. The releases target developers in finance, legal, healthcare, and the creative industries, and are intended to grow OpenAI's developer ecosystem with more powerful and more cost-effective tools.

Google Launches Real-Time Voice Conversations with AI-Powered Search

Google has introduced Search Live, which enables back-and-forth voice conversations with its AI Mode search feature using a custom version of Gemini. Users can hold free-flowing spoken dialogues with Google Search, hear AI-generated audio responses, and explore related web links as the conversation continues. The feature keeps running in the background, so users can multitask in other apps while talking to it, and Google plans to add real-time, camera-based queries in the future.