Large Language Models AI News & Updates

AI Language Models Demonstrate Breakthrough in Solving Advanced Mathematical Problems

OpenAI's latest model GPT 5.2 and Google's AlphaEvolve have successfully solved multiple open problems from mathematician Paul Erdős's collection of over 1,000 unsolved conjectures. Since Christmas, 15 problems have been moved from "open" to "solved," with 11 solutions crediting AI models, demonstrating unexpected capability in high-level mathematical reasoning. The breakthrough is attributed to improved reasoning abilities in newer models combined with formalization tools like Lean and Harmonic's Aristotle that make mathematical proofs easier to verify.

Apple Partners with Google to Integrate Gemini AI Models into Siri and Apple Intelligence

Apple has officially partnered with Google to use Gemini models and cloud technology to power AI features including an upgraded Siri assistant. The multi-year, non-exclusive deal reportedly worth around $1 billion comes after Apple's AI efforts lagged behind competitors, though the company maintains its focus on privacy with on-device processing. The partnership occurs amid Google's ongoing antitrust battles over exclusive default agreements with Apple.

OpenAI Launches ChatGPT Health for Medical Conversations Despite AI Limitations

OpenAI announced ChatGPT Health, a dedicated space for health-related conversations that keeps medical discussions separate from other chats and can integrate with wellness apps like Apple Health. The company reports 230 million weekly users ask health questions on ChatGPT, though it acknowledges the platform is not intended for medical diagnosis or treatment and that LLMs are prone to hallucinations and don't understand truth. The feature will not use health conversations for model training and is expected to roll out in coming weeks.

Google Releases Gemini 3 Flash as Default Model, Intensifying Competition with OpenAI

Google has launched Gemini 3 Flash, a fast and cost-effective AI model that outperforms its predecessor Gemini 2.5 Flash and matches frontier models like GPT-5.2 on several benchmarks. The model is now the default in Google's Gemini app and features enhanced multimodal capabilities, reasoning, and visual content generation. This release continues the intense competition between Google and OpenAI, with Google processing over 1 trillion tokens daily through its API.

OpenAI Releases GPT-5.2 in Three Variants to Compete with Google's Gemini 3 Leadership

OpenAI launched GPT-5.2 in three variants (Instant, Thinking, and Pro) targeting developers and enterprise users, claiming superior performance in coding, math, and reasoning benchmarks. The release follows internal "code red" concerns about losing market share to Google's Gemini 3, which currently leads most benchmarks, and represents OpenAI's attempt to reclaim competitive advantage. The model focuses on reliability for production workflows and agentic systems, though it comes with higher compute costs and lacks new image generation capabilities.

Anthropic Launches Opus 4.5 with Enhanced Memory and Agent Capabilities

Anthropic released Opus 4.5, completing its 4.5 model series, featuring state-of-the-art performance across coding, tool use, and problem-solving benchmarks, including being the first model to exceed 80% on SWE-Bench verified. The model introduces significant memory improvements for long-context operations, an "endless chat" feature, and new Chrome and Excel integrations designed for agentic use-cases. Opus 4.5 competes directly with OpenAI's GPT 5.1 and Google's Gemini 3 in the frontier model landscape.

Hugging Face CEO Warns of 'LLM Bubble' While Broader AI Remains Strong

Hugging Face CEO Clem Delangue argues that while large language models (LLMs) may be experiencing a bubble that could burst soon, the broader AI field remains healthy and is just beginning. He predicts a shift toward smaller, specialized models tailored for specific use cases rather than universal LLMs, and notes his company maintains a capital-efficient approach with significant cash reserves.

Google Releases Gemini 3 Foundation Model with Record-Breaking Reasoning Capabilities

Google has launched Gemini 3, its most advanced foundation model to date, available immediately through the Gemini app and AI search interface. The model achieved record-breaking benchmark scores, including 37.4 on Humanity's Last Exam and top placement on LMArena, representing a significant advancement in AI reasoning capabilities. Google also released Gemini 3 Deepthink for research and Antigravity, an agentic coding interface for software development.

OpenAI Criticized for Overstating GPT-5 Mathematical Problem-Solving Capabilities

OpenAI researchers initially claimed GPT-5 solved 10 previously unsolved Erdős mathematical problems, prompting criticism from AI leaders including Meta's Yann LeCun and Google DeepMind's Demis Hassabis. Mathematician Thomas Bloom clarified that GPT-5 merely found existing solutions in the literature that were not catalogued on his website, rather than solving truly unsolved problems. OpenAI later acknowledged the accomplishment was limited to literature search rather than novel mathematical problem-solving.

Anthropic Releases Claude Sonnet 4.5 with Advanced Autonomous Coding Capabilities

Anthropic launched Claude Sonnet 4.5, a new AI model claiming state-of-the-art coding performance that can build production-ready applications autonomously. The model has demonstrated the ability to code independently for up to 30 hours, performing complex tasks like setting up databases, purchasing domains, and conducting security audits. Anthropic also claims improved AI alignment with lower rates of sycophancy and deception, along with better resistance to prompt injection attacks.