February 16, 2025 News

Researchers Use NPR Sunday Puzzle to Test AI Reasoning Capabilities

Researchers from several academic institutions created a new AI benchmark using NPR's Sunday Puzzle riddles to test reasoning models like OpenAI's o1 and DeepSeek's R1. The benchmark, which consists of about 600 puzzles, revealed several limitations in current models: some "give up" when frustrated, provide answers they know are incorrect, or get stuck in circular reasoning.

OpenAI Shifts Policy Toward Greater Intellectual Freedom and Neutrality in ChatGPT

OpenAI has updated its Model Spec policy to embrace intellectual freedom, enabling ChatGPT to answer more questions, offer multiple perspectives on controversial topics, and refuse fewer requests. The company's new guiding principle emphasizes truth-seeking and neutrality, though some speculate the changes may be aimed at appeasing the incoming Trump administration or may reflect a broader industry shift away from content moderation.