Research Breakthrough AI News & Updates

OpenAI's Reasoning Model Disproves 80-Year-Old Erdős Conjecture in Geometry

OpenAI claims its new general-purpose reasoning model has autonomously produced an original mathematical proof disproving a famous unsolved conjecture in geometry first posed by Paul Erdős in 1946. The claim follows a mistaken announcement seven months earlier, in which OpenAI said GPT-5 had solved Erdős problems, only to discover that the model had found existing solutions. The current claim is supported by verification from prominent mathematicians including Noga Alon, Melanie Wood, and Thomas Bloom, marking what OpenAI calls the first time AI has autonomously solved a prominent open problem in mathematics.

Google Integrates Street View with Genie World Model for Interactive Environment Simulation

Google DeepMind is connecting Street View's 280 billion images across 110 countries to Project Genie, its world model that generates interactive environments. The integration allows users and AI agents to simulate real-world locations with adjustable conditions like weather, aimed at applications in robotics training, gaming, and educational experiences. While spatially continuous, the current implementation is video-game quality rather than photorealistic and lacks physics awareness, though researchers expect these limitations to be resolved within 6-12 months.

Recursive Superintelligence Startup Emerges with $650M to Build Self-Improving AI Systems

Richard Socher has launched Recursive Superintelligence, a San Francisco-based AI startup that emerged from stealth with $650 million in funding, aiming to create recursively self-improving AI models. The company, staffed by prominent AI researchers including Peter Norvig and Tim Shi, is focused on building systems that can autonomously identify their own weaknesses and redesign themselves without human intervention, using an "open-endedness" approach inspired by biological evolution. Socher indicates that products will be released within quarters rather than years.

Adaption Launches AutoScientist: AI System for Automated Model Training and Self-Improvement

Adaption, a new AI research lab, has released AutoScientist, a tool that automates the fine-tuning process by co-optimizing data and models to help AI systems learn capabilities more efficiently. The system is designed to enable continuous model improvement and could democratize frontier AI training beyond major labs. The company claims AutoScientist has more than doubled win-rates across different models and is offering free access for the first 30 days.

Google Expands Agentic AI Features Enabling Multi-Step Task Completion Across Android Apps

Google introduced enhanced agentic AI capabilities to Android through Gemini Intelligence, allowing the assistant to perform multi-step tasks across applications, such as transferring grocery lists to shopping carts and completing checkouts. New features include autonomous web browsing, AI-powered form filling using personal data, dictation with automatic formatting via Gboard's Rambler, and natural language widget creation ("vibe-coding"). These AI features will initially deploy on Samsung Galaxy and Google Pixel devices this summer before a broader Android rollout.

Anthropic's Mythos AI Model Revolutionizes Firefox Vulnerability Detection

Anthropic's Mythos model has significantly enhanced Firefox's cybersecurity by discovering thousands of high-severity bugs, including some over a decade old, with Mozilla reporting a 13x increase in bug fixes compared to the previous year. The AI system excels at finding complex sandbox vulnerabilities that traditionally commanded $20,000 bounties, though human engineers are still required to write the actual patches. The advancement marks a turning point for AI security tools, which previously suffered from high false positive rates.

Genesis AI Unveils GENE-26.5 Foundation Model with Custom Robotic Hands and Data Collection Gloves

Genesis AI has revealed its first foundational robotics model, GENE-26.5, alongside custom-designed robotic hands that match human hand size and shape. The startup has developed a full-stack approach including sensor-loaded gloves for data collection from human workers, simulation systems for rapid iteration, and plans to release a full-body general-purpose robot soon. The company raised $105 million in seed funding and is expanding across Paris, California, and London with a team of 60 people.

OpenAI's o1 Model Outperforms Emergency Room Physicians in Diagnostic Accuracy Study

A Harvard Medical School study published in Science found that OpenAI's o1 model provided more accurate diagnoses than human emergency room physicians when analyzing 76 real patient cases from Beth Israel Deaconess Medical Center. The AI model achieved exact or close diagnoses in 67% of initial triage cases compared to 50-55% for attending physicians, though researchers emphasized the need for prospective trials before real-world clinical deployment. The study evaluated only text-based information, and its authors acknowledged current AI limitations with non-text inputs and the need for human accountability in medical decision-making.

Anthropic Tests AI Agent Marketplace with Real Transactions Among Employees

Anthropic conducted an experimental marketplace called Project Deal where AI agents autonomously negotiated and completed real purchases on behalf of 69 employees using $100 budgets. The experiment revealed that users represented by more advanced AI models achieved objectively better outcomes, but participants remained unaware of these disparities, raising concerns about "agent quality gaps." The pilot resulted in 186 deals totaling over $4,000 in value across four different marketplace configurations.

Physical Intelligence Unveils Robot AI with Emergent Task Generalization Capability

Physical Intelligence has released research on its π0.7 model, demonstrating that the robot brain can perform tasks it was never explicitly trained on through compositional generalization. The model successfully combined fragmented training data to operate an air fryer and perform other novel tasks, surprising even the researchers who knew the training data intimately. While promising, the system still requires step-by-step verbal coaching for complex tasks and lacks standardized benchmarks for validation.