January 31, 2025 News
OpenAI Tests AI Persuasion Capabilities Using Reddit's r/ChangeMyView
OpenAI has revealed that it uses the Reddit forum r/ChangeMyView to evaluate its AI models' persuasive capabilities by having them generate arguments aimed at changing users' minds on various topics. OpenAI claims its models perform in the 80th to 90th percentile of human persuasiveness, not at superhuman levels. Even so, the company is developing safeguards against AI models becoming overly persuasive, which could allow them to pursue hidden agendas.
Skynet Chance (+0.08%): The development of AI systems with high persuasive capabilities presents a clear risk vector for AI control problems, as highly persuasive systems could manipulate human operators or defenders, potentially allowing such systems to bypass intended restrictions or safeguards through social engineering.
Skynet Date (-3 days): OpenAI's explicit focus on testing persuasive capabilities and its acknowledgment that current models already achieve high-percentile human performance indicate that this capability is advancing rapidly, potentially accelerating the timeline to AI systems that could effectively manipulate humans.
AGI Progress (+0.05%): Advanced persuasive reasoning represents progress toward AGI by demonstrating sophisticated understanding of human psychology, values, and decision-making, allowing AI systems to construct targeted arguments that reflect higher-order reasoning about human cognition and social dynamics.
AGI Date (-2 days): The revelation that current AI models already perform in the 80th to 90th percentile of human persuasiveness suggests this particular cognitive capability is developing faster than might have been expected, potentially accelerating the overall timeline to generally capable systems.
Altman Admits OpenAI Falling Behind, Considers Open-Sourcing Older Models
In a Reddit AMA, OpenAI CEO Sam Altman acknowledged that Chinese competitor DeepSeek has reduced OpenAI's lead in AI and admitted that OpenAI has been "on the wrong side of history" regarding open source. Altman suggested the company might reconsider its closed source strategy, potentially releasing older models, while also revealing his growing belief that AI recursive self-improvement could lead to a "fast takeoff" scenario.
Skynet Chance (+0.09%): Altman's acknowledgment that a "fast takeoff" through recursive self-improvement is more plausible than he previously believed represents a concerning shift in risk assessment from one of the most influential AI developers, suggesting key industry leaders now see rapid uncontrolled advancement as increasingly likely.
Skynet Date (-3 days): The increased competitive pressure from Chinese companies like DeepSeek is accelerating development timelines and potentially reducing safety considerations as OpenAI feels compelled to maintain its market position, while Altman's belief in a possible "fast takeoff" suggests timelines could compress unexpectedly.
AGI Progress (+0.06%): The revelation of intensifying competition between major AI labs and OpenAI's potential shift toward more open source strategies will likely accelerate overall progress by distributing advanced AI research more widely and creating stronger incentives for rapid capability advancement.
AGI Date (-4 days): The combination of heightened international competition, OpenAI's potential open sourcing of models, continued evidence that more compute leads to better models, and Altman's belief in recursive self-improvement suggests AGI timelines are compressing due to both technical and competitive factors.
VC Midha: DeepSeek's Efficiency Won't Slow AI's GPU Demand
Andreessen Horowitz partner and Mistral board member Anjney Midha believes that despite DeepSeek's impressive R1 model demonstrating efficiency gains, AI companies will continue investing heavily in GPU infrastructure. He argues that efficiency breakthroughs will allow companies to produce more output from the same compute rather than reducing overall compute demand.
Skynet Chance (+0.04%): The continued acceleration of AI compute infrastructure investment despite efficiency gains suggests that control mechanisms aren't keeping pace with capability development. This unrestrained scaling approach prioritizes performance over safety considerations, potentially increasing the risk of unintended AI behaviors.
Skynet Date (-2 days): The article indicates AI companies will use efficiency breakthroughs to amplify their compute investments rather than slow down, which accelerates the timeline toward potential control problems. The "insatiable demand" for both training and inference suggests rapid deployment that could outpace safety considerations.
AGI Progress (+0.08%): DeepSeek's engineering breakthroughs demonstrate significant efficiency improvements in AI models, allowing companies to get "10 times more output from the same compute." These efficiency gains represent meaningful progress toward more capable AI systems with the same hardware constraints.
AGI Date (-4 days): The combination of efficiency breakthroughs with undiminished investment in compute infrastructure suggests AGI development will accelerate significantly. Companies can now both improve algorithmic efficiency and continue scaling compute, creating a multiplicative effect that could substantially shorten the timeline to AGI.
OpenAI Launches Affordable Reasoning Model o3-mini for STEM Problems
OpenAI has released o3-mini, a new AI reasoning model specifically fine-tuned for STEM problems including programming, math, and science. The model offers improved performance over previous reasoning models while running faster and costing less, with OpenAI claiming a 39% reduction in major mistakes on tough real-world questions compared to o1-mini.
Skynet Chance (+0.06%): The development of more reliable reasoning models represents significant progress toward AI systems that can autonomously solve complex problems and check their own work. While safety measures are mentioned, the focus on competitive performance suggests capability development is outpacing alignment research.
Skynet Date (-2 days): The accelerating competition in reasoning models with rapidly decreasing costs suggests faster-than-expected progress toward autonomous problem-solving AI. The combination of improved accuracy, reduced costs, and faster performance indicates an acceleration in the timeline for advanced AI reasoning capabilities.
AGI Progress (+0.1%): Self-checking reasoning capabilities represent a significant step toward AGI, as they demonstrate improved reliability in domains requiring precise logical thinking. The model's ability to fact-check itself and perform competitively on math, science, and programming benchmarks shows meaningful progress in key AGI components.
AGI Date (-4 days): The rapid improvement cycle in reasoning models (o1 to o3 series) combined with increasing cost-efficiency suggests an acceleration in the development timeline for AGI. OpenAI's ability to deliver specialized reasoning at lower costs indicates that the economic barriers to AGI development are falling faster than anticipated.
Microsoft Establishes Advanced Planning Unit to Study AI's Societal Impact
Microsoft is creating a new Advanced Planning Unit (APU) within its Microsoft AI division to study the societal, health, and work implications of artificial intelligence. The unit will operate from the office of Microsoft AI's CEO Mustafa Suleyman and will combine research to explore future AI scenarios while making product recommendations and producing reports.
Skynet Chance (-0.13%): The establishment of a dedicated unit to study AI's societal implications demonstrates increased institutional focus on understanding and potentially mitigating AI risks. This structured approach to anticipating problems could help identify control issues before they become critical.
Skynet Date (+2 days): Microsoft's investment in studying AI's impacts suggests a more cautious, deliberate approach that may slow deployment of potentially problematic systems. The APU's role in providing recommendations could introduce additional safety considerations that extend the timeline before high-risk AI capabilities are released.
AGI Progress (+0.03%): While the APU itself doesn't directly advance technical capabilities, Microsoft's massive $22.6 billion quarterly AI investment and its reorganization around AI priorities indicate that substantial resources are being directed toward AI development. The company's strategic focus on "model-forward" applications suggests continued progress toward more capable systems.
AGI Date (-1 day): The combination of record-high capital expenditures and organizational restructuring around AI suggests accelerated development, though the new APU might add some caution to deployment. The net effect is likely a slight acceleration given Microsoft's stated focus on compressing "thirty years of change into three years."
DeepSeek's Reasoning Model Disrupts AI Industry and Raises International Concerns
DeepSeek's release of its R1 reasoning model has created significant industry disruption, displacing ChatGPT as the App Store's top app and prompting reactions from both tech giants and the U.S. government. The Chinese AI lab claims to have built its models more efficiently and at lower cost than competitors, though some remain skeptical of these claims.
Skynet Chance (+0.05%): The emergence of a powerful reasoning model from China intensifies international AI competition, potentially leading to reduced safety oversight as companies and nations race for AI dominance. This geopolitical dimension could prioritize capability development over careful control mechanisms to maintain competitive advantages.
Skynet Date (-3 days): The unexpected rapid advancement of DeepSeek's capabilities suggests AI progress is occurring faster than anticipated in multiple global regions simultaneously. This competitive pressure will likely accelerate development timelines as companies rush to match or exceed these capabilities.
AGI Progress (+0.09%): DeepSeek's R1 model represents significant progress in reasoning capabilities that are fundamental to AGI development. The fact that it has achieved competitive performance through claimed efficiency improvements demonstrates meaningful advancement in the algorithmic approaches needed for AGI.
AGI Date (-4 days): DeepSeek's claimed efficiency breakthroughs, if valid, suggest that AGI development might require significantly fewer computational resources than previously estimated. Such a reduction in resource requirements could dramatically accelerate the timeline for achieving AGI by lowering economic barriers to advanced model development.