April 20, 2025 News

OpenAI's Public o3 Model Underperforms Company's Initial Benchmark Claims

Independent testing by Epoch AI found that OpenAI's publicly released o3 model scores 10% on the FrontierMath benchmark, well below the 25% the company initially claimed. OpenAI clarified that the public model is optimized for practical use cases and speed rather than benchmark performance. The gap highlights ongoing issues with transparency and benchmark reliability in the AI industry.