Research Breakthrough AI News & Updates
OpenAI's Reasoning Model Disproves 80-Year-Old Erdős Conjecture in Geometry
OpenAI claims its new general-purpose reasoning model has autonomously produced an original mathematical proof disproving a famous unsolved conjecture in geometry first posed by Paul Erdős in 1946. The announcement follows a false claim seven months earlier, in which OpenAI mistakenly announced that GPT-5 had solved Erdős problems, only to discover that the model had merely found existing solutions. The current claim is supported by verification from prominent mathematicians including Noga Alon, Melanie Wood, and Thomas Bloom, marking what OpenAI calls the first time AI has autonomously solved a prominent open problem in mathematics.
Skynet Chance (+0.04%): Autonomous complex reasoning and novel problem-solving in mathematics demonstrate that AI systems can now perform sophisticated intellectual tasks independently, potentially increasing their capacity for unexpected behaviors. However, mathematical reasoning is still a narrow domain and doesn't directly relate to goal misalignment or control challenges.
Skynet Date (-1 days): The demonstration of long-chain autonomous reasoning capabilities suggests faster-than-expected progress in AI systems that can independently solve complex problems. This acceleration in reasoning capabilities could shorten timelines to advanced AI systems that might pose control challenges.
AGI Progress (+0.04%): Autonomously solving a prominent 80-year-old mathematical problem with a general-purpose reasoning model represents significant progress toward the abstract reasoning, creativity, and intellectual generalization that AGI requires. The ability to discover novel solutions across fields suggests meaningful advancement in core AGI capabilities beyond narrow pattern matching.
AGI Date (-1 days): The breakthrough demonstrates that general-purpose reasoning models are advancing faster than anticipated, achieving autonomous novel research contributions sooner than expected. This suggests acceleration in the timeline toward AGI as systems demonstrate intellectual capabilities previously thought to require human-level general intelligence.
Google Integrates Street View with Genie World Model for Interactive Environment Simulation
Google DeepMind is connecting Street View's 280 billion images across 110 countries to Project Genie, its world model that generates interactive environments. The integration allows users and AI agents to simulate real-world locations with adjustable conditions like weather, aimed at applications in robotics training, gaming, and educational experiences. While spatially continuous, the current implementation is video-game quality rather than photorealistic and lacks physics awareness, though researchers expect these limitations to be resolved within 6-12 months.
Skynet Chance (+0.04%): The ability to simulate diverse real-world environments with variable conditions creates more robust training grounds for autonomous agents and robots, potentially accelerating their deployment in unpredictable real-world scenarios with less human oversight. However, the current lack of physics awareness and limited quality somewhat mitigates immediate risk escalation.
Skynet Date (-1 days): This development accelerates the timeline for deploying capable autonomous agents in real-world environments by providing rich simulation training data, though the technology's current limitations (6-12 months behind video generation quality) moderate the acceleration effect. The integration with robotics platforms like Waymo suggests faster practical deployment of autonomous systems.
AGI Progress (+0.03%): Genie's ability to generate interactive, spatially continuous simulations from real-world data represents meaningful progress in world modeling and spatial reasoning, key components for general intelligence. The model demonstrates understanding of 3D space and environmental continuity, which are foundational capabilities for AGI.
AGI Date (-1 days): By providing a scalable platform for training AI agents on realistic world simulations derived from massive real-world datasets, this accelerates the development cycle for embodied AI systems. The planned improvements to physics understanding and quality within 6-12 months suggest rapid capability gains in world modeling.
Recursive Superintelligence Startup Emerges with $650M to Build Self-Improving AI Systems
Richard Socher has launched Recursive Superintelligence, a San Francisco-based AI startup that emerged from stealth with $650 million in funding, aiming to create recursively self-improving AI models. The company, staffed by prominent AI researchers including Peter Norvig and Tim Shi, is focused on building systems that can autonomously identify their own weaknesses and redesign themselves without human intervention, using an "open-endedness" approach inspired by biological evolution. Socher indicates that products will be released within quarters rather than years.
Skynet Chance (+0.09%): Autonomous self-improving AI systems that can redesign themselves without human oversight directly increase risks of loss of control and alignment challenges, as the system's evolution may diverge from human values. The explicit goal of removing humans from the improvement loop reduces our ability to monitor and correct problematic developments.
Skynet Date (-1 days): The $650M funding and claim of product release within quarters suggests rapid progress toward systems that autonomously improve themselves, potentially accelerating the timeline to scenarios where AI capabilities exceed human control mechanisms. The focus on removing human bottlenecks from AI development could compress timelines significantly.
AGI Progress (+0.06%): Recursive self-improvement represents a fundamental capability leap toward AGI, as it addresses the core challenge of autonomous research and development. The well-funded team of prominent researchers with a concrete technical approach (open-endedness, co-evolution) suggests meaningful progress toward systems that can independently advance their own capabilities.
AGI Date (-1 days): The substantial funding ($650M), high-caliber team, and near-term product timeline (quarters not years) indicate significant acceleration of efforts toward AGI through recursive self-improvement. If successful, such systems could dramatically compress development timelines by automating AI research itself, potentially achieving what Socher calls "superintelligence at scale."
Adaption Launches AutoScientist: AI System for Automated Model Training and Self-Improvement
Adaption, a new AI research lab, has released AutoScientist, a tool that automates the fine-tuning process by co-optimizing data and models to help AI systems learn capabilities more efficiently. The system is designed to enable continuous model improvement and could democratize frontier AI training beyond major labs. The company claims AutoScientist has more than doubled win rates across different models and is offering free access for the first 30 days.
Skynet Chance (+0.04%): Self-improving AI systems that can optimize themselves with minimal human oversight represent a step toward recursive self-improvement, a key concern in AI safety and loss of control scenarios. However, this system appears focused on task-specific fine-tuning rather than fundamental architectural changes, limiting immediate risk elevation.
Skynet Date (-1 days): By democratizing advanced model training capabilities beyond major labs and accelerating the fine-tuning process, this tool could accelerate the development of increasingly capable systems across more actors. The automation of what was previously human-intensive work speeds the overall pace of AI capability advancement.
AGI Progress (+0.03%): AutoScientist represents meaningful progress toward automated AI development pipelines and self-improving systems, which are important capabilities on the path to AGI. The ability to co-optimize data and models automatically addresses key bottlenecks in scaling AI capabilities and suggests movement toward more autonomous AI research.
AGI Date (-1 days): The tool significantly accelerates model training and fine-tuning processes while democratizing access to frontier-level capabilities, potentially multiplying the effective research capacity working on advanced AI. This automation of previously manual optimization processes could materially speed the timeline toward AGI by reducing iteration cycles and expanding the number of teams capable of frontier research.
Google Expands Agentic AI Features Enabling Multi-Step Task Completion Across Android Apps
Google introduced enhanced agentic AI capabilities to Android through Gemini Intelligence, allowing the assistant to perform multi-step tasks across applications like transferring grocery lists to shopping carts and completing checkouts. New features include autonomous web browsing, AI-powered form filling using personal data, dictation with automatic formatting via Gboard's Rambler, and natural language widget creation ("vibe-coding"). These AI features will initially deploy on Samsung Galaxy and Google Pixel devices this summer before broader Android rollout.
Skynet Chance (+0.03%): Agentic AI capabilities that autonomously browse the web, complete multi-step tasks, and access personal data across applications represent meaningful progress toward goal-directed AI systems with increased autonomy. The ability to act on a user's behalf, with confirmation steps, shows advancing but still-supervised agency that could present alignment challenges if controls fail.
Skynet Date (+0 days): Deployment of autonomous task-completion AI to millions of consumer devices accelerates the timeline for widespread agentic systems and potential emergent behaviors at scale. The rapid commercialization of autonomous web browsing and cross-application task execution pushes agentic AI capabilities into production faster than safety frameworks may mature.
AGI Progress (+0.02%): Multi-step reasoning across applications, autonomous web navigation with goal completion, and contextual understanding from screen content represent significant progress toward general-purpose task automation. These agentic capabilities demonstrate meaningful advancement in AI systems that can understand goals, plan multi-step actions, and execute tasks across diverse digital environments.
AGI Date (+0 days): The deployment of agentic AI with cross-application task completion and autonomous browsing to consumer devices represents acceleration of practical AGI-relevant capabilities. Google's rapid commercialization of these features indicates faster-than-expected progress in translating research advances into deployable systems with general task-handling abilities.
Anthropic's Mythos AI Model Revolutionizes Firefox Vulnerability Detection
Anthropic's Mythos model has significantly enhanced Firefox's cybersecurity by discovering thousands of high-severity bugs, including some over a decade old, with Mozilla reporting a 13x increase in bug fixes compared to the previous year. The AI system excels at finding complex sandbox vulnerabilities that traditionally commanded $20,000 bounties, though human engineers are still required to write the actual patches. The advancement marks a turning point for AI security tools, which previously suffered from high false-positive rates.
Skynet Chance (+0.04%): The capability to autonomously discover complex software vulnerabilities demonstrates advanced agentic reasoning and multi-step planning abilities that could be applied to finding and exploiting security flaws in AI safety mechanisms themselves. However, the model's use under responsible disclosure norms and the fact that patching still requires human oversight provides some mitigation.
Skynet Date (-1 days): The demonstrated agentic capabilities and multi-step reasoning required to find sandbox vulnerabilities suggest faster progress in autonomous AI systems that can navigate complex problem spaces. This gain in practical AI agent capabilities could shorten timelines for more advanced autonomous systems.
AGI Progress (+0.03%): The model's ability to perform complex multi-step reasoning, write code, attack systems creatively, and self-assess its work represents meaningful progress toward AGI-relevant capabilities like autonomous problem-solving and task decomposition. The shift from low-quality AI security tools to highly effective ones in just months indicates rapid capability gains.
AGI Date (-1 days): The rapid improvement in agentic AI capabilities over "a few short months" and the model's ability to outperform human experts in complex vulnerability discovery suggests an accelerating pace of AI capability development. The dramatic improvement from previous AI security tools indicates faster-than-expected progress in practical reasoning systems.
Genesis AI Unveils GENE-26.5 Foundation Model with Custom Robotic Hands and Data Collection Gloves
Genesis AI has revealed its first robotics foundation model, GENE-26.5, alongside custom-designed robotic hands that match human hand size and shape. The startup has developed a full-stack approach including sensor-loaded gloves for data collection from human workers, simulation systems for rapid iteration, and plans to release a full-body general-purpose robot soon. The company raised $105 million in seed funding and is expanding across Paris, California, and London with a team of 60 people.
Skynet Chance (+0.04%): The development of general-purpose robotic systems with human-like manipulation capabilities and autonomous task execution increases the potential attack surface and deployment scale of AI systems that could be misused or develop unintended behaviors. However, the current focus on specific tasks and human supervision mitigates immediate control concerns.
Skynet Date (-1 days): The full-stack approach combining hardware, software, and rapid data collection methods accelerates the deployment timeline for capable robotic systems in real-world environments. The simulation-based rapid iteration and novel data collection through worker gloves could speed up capability development.
AGI Progress (+0.04%): This represents significant progress toward AGI by bridging the embodiment gap through human-scale manipulation, multimodal learning from video and physical interaction data, and demonstrated ability to perform complex sequential tasks. The foundation model approach for robotics parallels the successful trajectory of language models.
AGI Date (-1 days): The combination of scalable data collection methods (gloves worn during normal work, internet videos), rapid simulation-based iteration, and full-stack control significantly accelerates the pace toward general-purpose physical intelligence. The startup's large seed round and aggressive hiring across Paris, California, and London enable parallel development that could compress typical research timelines.
OpenAI's GPT Models Outperform Emergency Room Physicians in Diagnostic Accuracy Study
A Harvard Medical School study published in Science found that OpenAI's o1 model provided more accurate diagnoses than human emergency room physicians when analyzing 76 real patient cases from Beth Israel Deaconess Medical Center. The AI model achieved exact or close diagnoses in 67% of initial triage cases compared to 50-55% for attending physicians, though researchers emphasized the need for prospective trials before real-world clinical deployment. The study only evaluated text-based information and acknowledged current AI limitations with non-text inputs and the need for human accountability in medical decision-making.
Skynet Chance (+0.04%): The study shows AI systems producing more accurate diagnoses than trained professionals in high-stakes emergency scenarios, highlighting potential over-reliance risks and the challenge of maintaining human oversight when AI appears superior. The noted lack of formal accountability frameworks for AI medical decisions represents a concrete example of deployment outpacing safety governance.
Skynet Date (-1 days): The success of AI in high-stakes emergency medical decisions may accelerate deployment of autonomous AI systems in critical domains before adequate safety and accountability frameworks are established. This could compress the timeline for AI systems operating with reduced human supervision in consequential scenarios.
AGI Progress (+0.04%): The study demonstrates that LLMs can outperform expert humans in complex, high-stakes reasoning tasks requiring rapid synthesis of incomplete information under time pressure—a key AGI capability. This represents significant progress in AI reasoning and decision-making in real-world, unstructured scenarios beyond controlled benchmarks.
AGI Date (-1 days): The demonstration that current models already exceed human expert performance in complex diagnostic reasoning suggests AI capabilities are advancing faster than expected in critical cognitive domains. This indicates the gap between current AI and AGI-level reasoning may be narrower than previously estimated, potentially accelerating the timeline.
Anthropic Tests AI Agent Marketplace with Real Transactions Among Employees
Anthropic conducted an experimental marketplace called Project Deal where AI agents autonomously negotiated and completed real purchases on behalf of 69 employees using $100 budgets. The experiment revealed that users represented by more advanced AI models achieved objectively better outcomes, but participants remained unaware of these disparities, raising concerns about "agent quality gaps." The pilot resulted in 186 deals totaling over $4,000 in value across four different marketplace configurations.
Skynet Chance (+0.04%): The demonstration of AI agents autonomously conducting real economic transactions with undetected capability disparities highlights emerging control and transparency challenges. The finding that users couldn't recognize when they were disadvantaged by inferior agents suggests potential risks in delegating decisions to AI systems without adequate oversight mechanisms.
Skynet Date (+0 days): Successful deployment of autonomous AI agents handling real transactions with minimal human intervention demonstrates practical capability advancement that could accelerate the timeline for AI systems operating independently in critical domains. However, the small scale and controlled nature of this experiment limits its acceleration impact.
AGI Progress (+0.03%): This experiment demonstrates meaningful progress in multi-agent coordination, economic reasoning, and autonomous decision-making in real-world scenarios with actual consequences. The ability of AI agents to successfully negotiate and complete complex transactions represents advancement toward more general capabilities beyond narrow task execution.
AGI Date (+0 days): The successful autonomous operation of AI agents in economic transactions with real monetary stakes suggests faster-than-expected progress in practical agentic capabilities, which are critical components of AGI. The finding that model quality directly correlates with outcome quality indicates a clear scaling path that could accelerate development timelines.
Physical Intelligence Unveils Robot AI with Emergent Task Generalization Capability
Physical Intelligence has released research on its π0.7 model, demonstrating that the robot brain can perform tasks it was never explicitly trained on through compositional generalization. The model successfully combined fragmented training data to operate an air fryer and perform other novel tasks, surprising even the researchers who knew the training data intimately. While promising, the system still requires step-by-step verbal coaching for complex tasks and lacks standardized benchmarks for validation.
Skynet Chance (+0.04%): The model's unexpected emergent capabilities—combining skills in unpredictable ways beyond its training data—demonstrate a degree of autonomous problem-solving that marginally increases alignment challenges. However, the system still requires human coaching and operates in constrained physical domains, limiting immediate control risks.
Skynet Date (-1 days): Emergent generalization in robotics accelerates the timeline slightly by demonstrating that physical AI systems may follow capability curves similar to those of language models. The surprise element suggests capabilities are scaling faster than expected, though physical deployment constraints remain significant.
AGI Progress (+0.04%): Compositional generalization in embodied AI represents a meaningful step toward general intelligence, showing that robots can synthesize knowledge across contexts similarly to language models. The researchers' genuine surprise at capabilities exceeding training data suggests a potential inflection point in robotic AI development.
AGI Date (-1 days): The demonstration of emergent capabilities and favorable scaling properties in robotics, previously seen only in language and vision domains, suggests AGI-relevant capabilities may be developing faster than anticipated. Discussions of an $11 billion valuation indicate significant capital acceleration toward embodied general intelligence research.