Benchmark Performance AI News & Updates
Ai2 Claims New Open-Source Model Outperforms DeepSeek and GPT-4o
Nonprofit AI research institute Ai2 has released Tulu 3 405B, an open-source AI model containing 405 billion parameters that reportedly outperforms DeepSeek V3 and OpenAI's GPT-4o on certain benchmarks. The model, which required 256 GPUs to train, utilizes reinforcement learning with verifiable rewards (RLVR) and demonstrates superior performance on specialized knowledge questions and grade-school math problems.
Skynet Chance (+0.06%): The release of a fully open-source, state-of-the-art model with 405 billion parameters democratizes access to frontier AI capabilities, reducing barriers that previously limited deployment of powerful models while potentially accelerating proliferation of advanced AI systems without robust safety measures.
Skynet Date (-3 days): The rapid back-and-forth leapfrogging between AI labs (from DeepSeek to Ai2) demonstrates an accelerating competitive dynamic in AI model development, with increasingly capable systems being developed and publicly released at a pace far exceeding previous expectations.
AGI Progress (+0.1%): The significant improvements in specialized knowledge and mathematical reasoning capabilities, combined with the novel reinforcement learning with verifiable rewards technique, represent meaningful progress toward more generally capable AI systems that can reliably solve complex problems across domains.
AGI Date (-4 days): The rapid development of a 405 billion parameter model that outperforms previous state-of-the-art systems indicates that scaling and methodological improvements are delivering faster-than-expected gains, likely compressing the timeline to AGI through accelerated capability improvements.