OpenAI's GPT-5 Shows Near-Human Performance Across Professional Tasks in New Economic Benchmark
OpenAI released GDPval, a new benchmark that tests AI models against human professionals across 44 occupations in nine major industries. GPT-5 performed at or above human expert level 40.6% of the time, up from GPT-4o's 13.7% score just 15 months earlier, while Anthropic's Claude Opus 4.1 led with 49%.
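For readers curious how a headline number like "at or above human expert level 40.6% of the time" is typically derived, the sketch below computes a win-or-tie rate from pairwise expert gradings. The record structure, verdict labels, and task names are illustrative assumptions for this example, not OpenAI's actual GDPval schema.

```python
# Minimal sketch of a GDPval-style headline score: the fraction of tasks where
# graders rated the model's deliverable as good as or better than the human
# professional's. Field names and verdict labels are assumed for illustration.

from dataclasses import dataclass

@dataclass
class Comparison:
    task_id: str
    verdict: str  # assumed labels: "model_better", "tie", or "human_better"

def win_or_tie_rate(comparisons: list[Comparison]) -> float:
    """Share of tasks where the model output was judged at or above expert level."""
    if not comparisons:
        return 0.0
    at_or_above = sum(c.verdict in ("model_better", "tie") for c in comparisons)
    return at_or_above / len(comparisons)

# Example: 3 of 5 graded tasks at or above expert level -> 60.0%
sample = [
    Comparison("legal_brief_01", "model_better"),
    Comparison("nursing_plan_02", "human_better"),
    Comparison("retail_forecast_03", "tie"),
    Comparison("engineering_spec_04", "human_better"),
    Comparison("finance_memo_05", "model_better"),
]
print(f"win-or-tie rate: {win_or_tie_rate(sample):.1%}")
```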
Skynet Chance (+0.04%): AI models approaching human-level performance across diverse professional tasks points to rapid capability advancement that could produce unforeseen emergent behaviors. However, the benchmark's limited scope and the acknowledged gaps in model performance provide some reassurance that meaningful oversight remains possible.
Skynet Date (-1 days): The jump from 13.7% to 40.6% expert-level performance in just 15 months indicates an accelerating pace of AI capability development, suggesting that potential risks may emerge sooner than previously expected.
AGI Progress (+0.04%): Near-human performance across diverse professional domains marks significant progress toward the defining requirement of AGI: general intelligence that spans many fields. The benchmark directly measures economically valuable cognitive work, a key component of human-level general intelligence.
AGI Date (-1 days): The rapid improvement trajectory in the GDPval results, with scores nearly tripling in 15 months, suggests AGI development is accelerating faster than anticipated. OpenAI's systematic effort to measure progress across economic sectors also signals focused advancement toward general capabilities.
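As a rough illustration of the "nearly tripling" framing, the snippet below computes the multiplier between the two reported scores and the average compound monthly growth it implies over the 15-month gap; this is descriptive arithmetic on the published numbers, not a forecast.

```python
# Illustrative arithmetic behind "nearly tripling": the multiplier between
# GPT-4o's and GPT-5's GDPval scores, plus the implied average compound
# monthly growth over the 15-month gap.

gpt4o_score = 0.137   # GPT-4o, roughly 15 months earlier
gpt5_score = 0.406    # GPT-5
months = 15

multiplier = gpt5_score / gpt4o_score            # ~2.96x
monthly_growth = multiplier ** (1 / months) - 1  # ~7.5% per month on average

print(f"improvement multiplier: {multiplier:.2f}x")
print(f"implied average monthly growth: {monthly_growth:.1%}")
```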