Model Scaling AI News & Updates

DeepSeek Updates Prover V2 for Advanced Mathematical Reasoning

Chinese AI lab DeepSeek has released an upgraded version of its mathematics-focused AI model, Prover V2, built on its 671-billion-parameter V3 model, which uses a mixture-of-experts architecture. The company, which previously offered Prover for formal theorem proving and mathematical reasoning, is reportedly considering raising outside funding for the first time while continuing to update its model lineup.
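The mixture-of-experts idea mentioned above is what lets a 671-billion-parameter model run affordably: a router activates only a few expert subnetworks per token, so most parameters sit idle on any given forward pass. Below is a minimal illustrative sketch of top-k expert routing in NumPy; it is not DeepSeek's implementation, and the layer sizes, expert count, and single-matrix "experts" are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class ToyMoELayer:
    """Toy mixture-of-experts layer: a learned router scores all experts,
    but only the top-k experts actually process the token, and their
    outputs are combined with renormalized routing weights."""

    def __init__(self, d_model=8, n_experts=4, top_k=2):
        self.top_k = top_k
        self.router = rng.standard_normal((d_model, n_experts))
        # Each "expert" is just one linear map here; in a real model it
        # would be a full feed-forward block.
        self.experts = [rng.standard_normal((d_model, d_model))
                        for _ in range(n_experts)]

    def forward(self, x):
        scores = softmax(x @ self.router)           # routing probabilities
        top = np.argsort(scores)[-self.top_k:]      # indices of top-k experts
        weights = scores[top] / scores[top].sum()   # renormalize over the chosen experts
        # Only top_k of n_experts matrices are touched: sparse activation.
        return sum(w * (x @ self.experts[i]) for w, i in zip(weights, top))

layer = ToyMoELayer()
out = layer.forward(rng.standard_normal(8))
print(out.shape)  # (8,)
```

The design point is that compute per token scales with `top_k`, not with the total number of experts, which is how total parameter count and inference cost are decoupled.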

OpenAI's Reasoning Models Show Increased Hallucination Rates

OpenAI's new reasoning models, o3 and o4-mini, exhibit higher hallucination rates than their predecessors: o3 hallucinates 33% of the time on OpenAI's PersonQA benchmark, and o4-mini 48%. Researchers are puzzled by the increase, since scaling up reasoning models appears to exacerbate hallucination, potentially undermining their usefulness despite gains in areas like coding and math.

OpenAI Expands GPT-4.5 Access Despite High Operational Costs

OpenAI has begun rolling out its largest AI model, GPT-4.5, to ChatGPT Plus subscribers, with the rollout expected to take one to three days. Despite offering deeper world knowledge and higher emotional intelligence, GPT-4.5 is extremely expensive to run, costing 30 times as much for input tokens and 15 times as much for output tokens as GPT-4o, raising questions about its long-term viability in the API.
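The 30x/15x multipliers fall directly out of the per-token API prices. The sketch below uses the launch-time list prices as widely reported (USD per million tokens); treat the exact figures as a snapshot and check OpenAI's pricing page for current values.

```python
# Launch-time API list prices, USD per million tokens (reported figures;
# verify against OpenAI's current pricing page before relying on them).
PRICES = {
    "gpt-4o":  {"input": 2.50,  "output": 10.00},
    "gpt-4.5": {"input": 75.00, "output": 150.00},
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request with the given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The article's multipliers are just the price ratios:
print(PRICES["gpt-4.5"]["input"] / PRICES["gpt-4o"]["input"])    # 30.0
print(PRICES["gpt-4.5"]["output"] / PRICES["gpt-4o"]["output"])  # 15.0

# Example: a single request with 10k input tokens and 1k output tokens.
print(round(request_cost("gpt-4.5", 10_000, 1_000), 3))  # 0.9
print(round(request_cost("gpt-4o", 10_000, 1_000), 3))   # 0.035
```

At these rates a high-volume application would pay roughly 25x more overall for a typical input-heavy workload, which is the core of the viability concern.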

Ai2 Claims New Open-Source Model Outperforms DeepSeek and GPT-4o

Nonprofit AI research institute Ai2 has released Tulu 3 405B, an open-source, 405-billion-parameter AI model that reportedly outperforms DeepSeek V3 and OpenAI's GPT-4o on certain benchmarks. Trained on 256 GPUs, the model uses reinforcement learning with verifiable rewards (RLVR) and posts superior results on specialized knowledge questions and grade-school math problems.
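The "verifiable rewards" in RLVR are the key distinction from standard RLHF: instead of a learned reward model, the training signal comes from a deterministic checker that compares the model's final answer against ground truth (natural for math problems with known solutions). A minimal sketch of such a checker follows; the "Answer: <value>" output convention is a hypothetical assumption for illustration, not Ai2's actual format.

```python
import re

def verifiable_reward(completion: str, gold_answer: str) -> float:
    """RLVR-style reward: 1.0 only if the model's final answer can be
    mechanically verified against ground truth, else 0.0. No learned
    reward model is involved, which is the point of 'verifiable'.

    Assumes (hypothetically) that the model ends its completion with
    a line of the form 'Answer: <number>'."""
    m = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)\s*$", completion.strip())
    if m is None:
        return 0.0  # unparseable output earns no reward
    return 1.0 if m.group(1) == gold_answer else 0.0

print(verifiable_reward("3 + 4 = 7.\nAnswer: 7", "7"))      # 1.0
print(verifiable_reward("I think it's 8.\nAnswer: 8", "7")) # 0.0
```

Because the reward is binary and exact, it cannot be gamed the way a learned reward model can, though it only applies to domains where correctness is mechanically checkable.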