March 19, 2025 News
OpenAI Releases Premium o1-pro Model at Record-Breaking Price Point
OpenAI has released o1-pro, an enhanced version of its reasoning-focused o1 model, to select API developers. The model costs $150 per million input tokens and $600 per million output tokens, making it OpenAI's most expensive model to date, with prices far exceeding those of GPT-4.5 and the standard o1 model.
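For a sense of scale, here is a back-of-the-envelope cost estimate at the published rates; this is a minimal sketch, and the token counts in the example are illustrative assumptions, not figures from OpenAI's announcement.

```python
# Rough cost of a single o1-pro API call at the published per-token rates.
# The token counts below are hypothetical, chosen only for illustration.
INPUT_RATE = 150 / 1_000_000   # dollars per input token
OUTPUT_RATE = 600 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at o1-pro pricing."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 10,000-token prompt with a 5,000-token response
print(f"${request_cost(10_000, 5_000):.2f}")  # -> $4.50
```

Note that for reasoning models, hidden reasoning tokens are typically billed as output tokens, so real costs can run higher than the visible response length alone would suggest.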
Skynet Chance (+0.01%): While the extreme pricing suggests somewhat improved reasoning capabilities, early benchmarks and user experiences indicate the model isn't a revolutionary breakthrough in autonomous reasoning that would significantly increase AI risk profiles.
Skynet Date (+0 days): The minor improvements over the base o1 model, despite significantly higher compute usage and extreme pricing, suggest diminishing returns on scaling current approaches, neither accelerating nor decelerating the timeline to potentially risky AI capabilities.
AGI Progress (+0.03%): Despite mixed early reception, o1-pro reflects OpenAI's continued focus on improving reasoning through increased compute, which incrementally advances the field toward more robust problem-solving even if the performance gains are modest.
AGI Date (+1 days): The minimal performance improvements despite significantly increased compute resources point to diminishing returns on current approaches, suggesting that the path to AGI may be longer than some predictions assume.
OpenAI's Noam Brown Claims Reasoning AI Models Could Have Existed Decades Earlier
OpenAI's AI reasoning research lead Noam Brown suggested at Nvidia's GTC conference that certain reasoning AI models could have been developed 20 years earlier if researchers had used the right approach. Brown, who previously worked on game-playing AI including the Pluribus poker AI and helped create OpenAI's reasoning model o1, also addressed the challenges academia faces in competing with AI labs and identified AI benchmarking as an area where academia could make significant contributions despite its compute limitations.
Skynet Chance (+0.05%): Brown's comments suggest that powerful reasoning capabilities were algorithmically feasible much earlier than realized, indicating that the field may be systematically underestimating what existing techniques can achieve. This raises concern that other unexplored approaches might enable rapid capability jumps without corresponding safety preparations.
Skynet Date (-2 days): The realization that reasoning capabilities could have emerged decades earlier suggests we may be underestimating how quickly other advanced capabilities could emerge, potentially accelerating timelines for dangerous AI capabilities through similar algorithmic insights rather than just scaling.
AGI Progress (+0.06%): The revelation that reasoning capabilities were algorithmically possible decades ago suggests that current rapid progress in AI reasoning isn't just about compute scaling but about fundamental algorithmic insights. This indicates that similar conceptual breakthroughs could unlock other AGI components more readily than previously thought.
AGI Date (-3 days): Brown's assertion that powerful reasoning AI could have existed decades earlier with the right approach suggests that AGI development may be more gated by conceptual breakthroughs than computational limitations, potentially shortening timelines if similar insights occur in other AGI-relevant capabilities.
California AI Policy Group Advocates Anticipatory Approach to Frontier AI Safety Regulations
A California policy group co-led by AI pioneer Fei-Fei Li released a 41-page interim report advocating for AI safety laws that anticipate future risks, even those not yet observed. The report recommends increased transparency from frontier AI labs through mandatory safety-test reporting, third-party verification, and enhanced whistleblower protections, acknowledging that evidence for extreme AI threats remains inconclusive while emphasizing the high stakes of inaction.
Skynet Chance (-0.2%): The proposed regulatory framework would significantly enhance transparency, testing, and oversight of frontier AI systems, creating multiple layers of risk detection and prevention. By establishing proactive governance mechanisms for anticipating and addressing potential harmful capabilities before deployment, the chance of uncontrolled AI risks is substantially reduced.
Skynet Date (+1 days): While the regulatory framework would likely slow deployment of potentially risky systems, it focuses on transparency and safety verification rather than development prohibitions. This balanced approach might moderately decelerate risky AI development timelines while allowing continued progress under improved oversight conditions.
AGI Progress (-0.03%): The proposed regulations focus primarily on transparency and safety verification rather than directly limiting AI capabilities development, resulting in only a minor negative impact on AGI progress. The emphasis on third-party verification might marginally slow development by adding compliance requirements without substantially hindering technical advancement.
AGI Date (+2 days): The proposed regulatory requirements for frontier model developers would introduce additional compliance steps including safety testing, reporting, and third-party verification, likely causing modest delays in development cycles. These procedural requirements would somewhat extend AGI timelines without blocking fundamental research progress.
Researchers Propose "Inference-Time Search" as New AI Scaling Method with Mixed Expert Reception
Google and UC Berkeley researchers have proposed "inference-time search" as a potential new AI scaling method that involves generating multiple possible answers to a query and selecting the best one. The researchers claim this approach can elevate the performance of older models like Google's Gemini 1.5 Pro to surpass newer reasoning models like OpenAI's o1-preview on certain benchmarks, though AI experts express skepticism about its broad applicability beyond problems with clear evaluation metrics.
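In outline, the technique samples several candidate answers to the same prompt and keeps the one a scoring function ranks highest. Below is a minimal sketch of that idea, assuming a placeholder model call and a domain-specific verifier; neither is the researchers' actual implementation.

```python
from typing import Callable

def inference_time_search(
    generate: Callable[[str], str],
    score: Callable[[str, str], float],
    prompt: str,
    n_candidates: int = 16,
) -> str:
    """Sample several answers to one prompt and return the highest-scoring candidate.

    `generate` is any sampling call to an existing model; `score` is a
    domain-specific verifier, which is why the approach works best on
    problems with clear evaluation criteria.
    """
    candidates = [generate(prompt) for _ in range(n_candidates)]
    return max(candidates, key=lambda answer: score(prompt, answer))

# Toy usage with stand-in functions (illustrative only):
guesses = iter(["41", "43", "42", "40"])
best = inference_time_search(
    generate=lambda p: next(guesses),        # pretend model sampling
    score=lambda p, a: -abs(int(a) - 42),    # verifier: closer to 42 scores higher
    prompt="What is 6 * 7?",
    n_candidates=4,
)
print(best)  # -> "42"
```

The dependence on a reliable scoring function is precisely the limitation skeptics point to: in open-ended language tasks there is often no clear way to rank candidate answers.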
Skynet Chance (+0.03%): Inference-time search represents a potential optimization technique that could make AI systems more reliable in domains with clear evaluation criteria, potentially improving capability without corresponding improvements in alignment or safety. However, its limited applicability to problems with clear evaluation metrics constrains its impact on overall risk.
Skynet Date (-2 days): The technique allows older models to match newer specialized reasoning models on certain benchmarks with relatively modest computational overhead, potentially accelerating the proliferation of systems with advanced reasoning capabilities. This could compress development timelines for more capable systems even without fundamental architectural breakthroughs.
AGI Progress (+0.05%): Inference-time search demonstrates a way to extract better performance from existing models without architecture changes or expensive retraining, representing an incremental but significant advance in maximizing model capabilities. By implementing a form of self-verification at scale, it addresses a key limitation in current models' ability to consistently produce correct answers.
AGI Date (-1 days): While the technique has limitations in general language tasks without clear evaluation metrics, it represents a compute-efficient approach to improving model performance in mathematical and scientific domains. This efficiency gain could modestly accelerate progress in these domains without requiring the development of entirely new architectures.
AI Researchers Challenge AGI Timelines, Question LLMs' Path to Human-Level Intelligence
Several prominent AI leaders, including Hugging Face's Thomas Wolf, Google DeepMind's Demis Hassabis, Meta's Yann LeCun, and former OpenAI researcher Kenneth Stanley, are expressing skepticism about near-term AGI predictions. They argue that current large language models (LLMs) face fundamental limitations, particularly in creativity and in generating original questions rather than just answers, and suggest that new architectural approaches may be needed to reach true human-level intelligence.
Skynet Chance (-0.13%): The growing skepticism from leading AI researchers about current models' path to AGI suggests the field may have more time to address safety concerns than some have predicted. Their highlighting of fundamental limitations in today's architectures indicates that dangerous capabilities may require additional breakthroughs, providing more opportunity to implement safety measures.
Skynet Date (+4 days): The identification of specific limitations in current LLM architectures, particularly around creativity and original thinking, suggests that truly general AI may require significant new breakthroughs rather than just scaling current approaches. This recognition of deeper challenges likely extends the timeline before potentially dangerous capabilities emerge.
AGI Progress (-0.05%): This growing skepticism from prominent AI leaders indicates that progress toward AGI may face more substantial obstacles than previously acknowledged by optimists. By identifying specific limitations of current architectures, particularly around creativity and original thinking, these researchers highlight gaps that must be bridged before reaching human-level intelligence.
AGI Date (+4 days): The identification of fundamental limitations in current LLM approaches, particularly their difficulty with generating original questions and creative thinking, suggests that AGI development may require entirely new architectures or approaches. This recognition of deeper challenges likely extends AGI timelines significantly beyond the most optimistic near-term predictions.