January 22, 2026 News
New Benchmark Reveals AI Agents Still Far From Replacing White-Collar Workers
A new benchmark called Apex-Agents tests leading AI models on real white-collar tasks from consulting, investment banking, and law, and finds that even the best models achieve only about 24% accuracy. The models struggle primarily with tracking information across multiple domains, tools, and platforms, a core requirement of professional knowledge work. Despite these limitations, researchers note rapid year-over-year improvement, with accuracy potentially quintupling compared with the previous year.
Skynet Chance (-0.03%): The benchmark reveals significant current limitations in AI agents' ability to perform complex multi-domain tasks, suggesting that even advanced models lack the autonomous competence that would be necessary for uncontrolled, independent operation. These capability gaps provide evidence against near-term scenarios of AI systems operating without meaningful human oversight.
Skynet Date (+0 days): The research demonstrates that current AI systems struggle with real-world task complexity, indicating existing technical bottlenecks that must be overcome before AI could achieve the autonomous capability levels associated with uncontrollable scenarios. However, the noted rapid improvement trajectory (5-10% to 24% accuracy year-over-year) suggests these limitations may be temporary.
AGI Progress (-0.03%): The benchmark exposes a critical gap in current AI capabilities: the inability to effectively navigate and integrate information across multiple domains and tools, which is fundamental to general intelligence. The low accuracy scores (18-24%) on professional tasks highlight that despite advances in foundation models, systems still lack the robust real-world reasoning required for AGI.
AGI Date (+0 days): While the current low performance suggests AGI capabilities are further away than some predictions implied, the documented rapid improvement rate (potentially quintupling accuracy year-over-year) indicates progress may accelerate once key bottlenecks are addressed. The establishment of this rigorous benchmark provides a clear target for AI labs to optimize against, which could paradoxically accelerate development.
Humans& Raises $480M to Build Foundation Model for AI-Powered Team Coordination
Humans&, a startup founded by former employees of Anthropic, Meta, OpenAI, xAI, and Google DeepMind, has raised a $480 million seed round to develop a foundation model focused on social intelligence and team coordination rather than traditional chatbot capabilities. The company plans to build a new model architecture, trained with long-horizon and multi-agent reinforcement learning, that enables AI systems to coordinate people, manage group decisions, and serve as connective tissue across organizations. The startup aims to build the model and the product interface together, positioning itself as a coordination layer rather than a plugin for existing collaboration tools.
Skynet Chance (+0.04%): Multi-agent AI systems with social intelligence and coordination capabilities could increase risks of emergent behaviors and collective AI autonomy that are harder to predict or control than single-agent systems. The focus on AI systems that mediate human decisions and organizational coordination also increases dependency on AI for critical social functions.
Skynet Date (-1 days): Development of novel multi-agent RL architectures and social intelligence models represents a new frontier that could accelerate capabilities in autonomous coordination, though the project's early stage and its focus on human-AI collaboration rather than pure autonomy provide some moderating influence. The substantial funding enables faster research in this previously underexplored area.
AGI Progress (+0.03%): The focus on social intelligence, long-horizon planning, and multi-agent coordination addresses key AGI capabilities beyond current chatbot limitations, representing progress toward more general intelligence that can navigate complex social and collaborative contexts. Training models to understand motivations, balance competing priorities, and coordinate across multiple agents moves closer to human-like general reasoning.
AGI Date (-1 days): The $480 million seed funding and the concentration of talent from top AI labs accelerate development of underexplored model architectures focused on social intelligence and multi-agent systems, which are critical components of AGI. The company's approach of co-developing novel training methods with product interfaces could yield faster insights into coordination capabilities that other labs haven't prioritized.
Google DeepMind Acquires Hume AI Leadership Team to Enhance Voice Emotion Recognition
Google DeepMind has hired the CEO and approximately seven engineers from voice AI startup Hume AI through a licensing agreement, aiming to improve Gemini's voice features with emotional intelligence capabilities. This "acquihire" is the latest example of a trend in which major AI companies hire a startup's talent without buying the company outright, potentially to avoid regulatory scrutiny. The deal underscores voice AI's emergence as a critical competitive frontier; Hume AI's technology specializes in detecting user emotions and mood through voice analysis.
Skynet Chance (+0.01%): Enhanced emotional recognition in AI systems could marginally increase manipulation capabilities and make AI interactions more persuasive, though this represents incremental capability improvement rather than fundamental alignment risk. The consolidation of talent at major labs may reduce diversity in safety approaches.
Skynet Date (+0 days): The acquihire accelerates voice AI development at a major lab, slightly advancing the timeline for more capable and emotionally-aware AI systems. However, the impact on overall risk timeline is minimal as voice interfaces represent a narrow application domain.
AGI Progress (+0.01%): Emotional intelligence and multimodal voice interaction represent important dimensions of general intelligence, and consolidating this expertise at DeepMind advances progress toward more human-like AI capabilities. The deal demonstrates ongoing investment in making AI systems more contextually aware and adaptive.
AGI Date (+0 days): The concentration of specialized talent at a leading AI lab with substantial resources likely accelerates the development timeline for advanced multimodal AI systems. The industry-wide focus on voice as the next frontier, evidenced by parallel investments at OpenAI and Meta, suggests broad acceleration in this capability area.
Neurophos Raises $110M for Optical AI Chips Claiming 50x Efficiency Over Nvidia
Neurophos, a Duke University spinout, has raised $110 million led by Gates Frontier to develop optical processing units using metamaterial-based metasurface modulators for AI inferencing. The startup claims its photonic chips will deliver 235 POPS at 675 watts compared to Nvidia's B200 at 9 POPS at 1,000 watts, representing a claimed 50x advantage in energy efficiency and speed. Production is expected by mid-2028 using standard silicon foundry processes.
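Taking the quoted figures at face value, a quick calculation shows how they compare on raw throughput and on throughput per watt. This is a minimal sketch that uses only the POPS and wattage numbers cited above; the variable and function names are illustrative. The results (about 26x throughput and about 39x throughput per watt) do not by themselves reproduce the 50x headline figure, which may rest on a comparison basis not described here.

```python
# Figures as quoted in the article: Neurophos numbers are company claims,
# B200 numbers are as cited for comparison.
NEUROPHOS_POPS = 235.0
NEUROPHOS_WATTS = 675.0
B200_POPS = 9.0
B200_WATTS = 1000.0


def pops_per_watt(pops: float, watts: float) -> float:
    """Throughput per watt, in peta-operations per second per watt."""
    return pops / watts


# Raw throughput ratio: how many times more operations per second.
throughput_ratio = NEUROPHOS_POPS / B200_POPS

# Efficiency ratio: how many times more operations per second per watt.
efficiency_ratio = pops_per_watt(NEUROPHOS_POPS, NEUROPHOS_WATTS) / pops_per_watt(
    B200_POPS, B200_WATTS
)

print(f"throughput ratio: {throughput_ratio:.1f}x")
print(f"perf-per-watt ratio: {efficiency_ratio:.1f}x")
```

Running it prints a throughput ratio of 26.1x and a perf-per-watt ratio of 38.7x.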
Skynet Chance (+0.01%): More efficient AI hardware could enable larger-scale deployment of AI systems and reduce barriers to running advanced models, potentially increasing proliferation risks. However, the technology is primarily focused on inferencing rather than training, limiting its impact on developing fundamentally more capable systems.
Skynet Date (+0 days): If successful, dramatically more efficient inference hardware could accelerate AI deployment timelines by reducing cost and power barriers, though the 2028 production target limits near-term impact. The technology addresses scaling bottlenecks that currently constrain widespread AI system deployment.
AGI Progress (+0.03%): Breakthrough hardware efficiency could enable more complex AI architectures and larger-scale continuous learning systems that are currently constrained by power and cost. Removing compute bottlenecks historically accelerates progress in AI capabilities by enabling new research directions.
AGI Date (-1 days): A 50x improvement in inference efficiency could significantly accelerate AGI timelines by making continuous learning, massive-scale deployment, and more complex architectures economically viable. However, the 2028 production timeline and the focus on inference rather than training moderate the near-term acceleration effect.
Claude AI Models Now Outperform Humans on Anthropic's Technical Hiring Tests
Anthropic's performance optimization team has been forced to repeatedly redesign its technical hiring test as newer Claude models have surpassed human performance. Claude Opus 4.5 now matches even the strongest human candidates on the original test, making it impossible to distinguish top applicants from AI-assisted cheating in take-home assessments. To address this, the company has designed a new test that is less focused on hardware optimization.
Skynet Chance (+0.04%): AI systems demonstrating superior performance to top human candidates in complex technical tasks suggests advancing capabilities that could eventually exceed human oversight and control in critical domains. The inability to distinguish AI output from human expertise raises concerns about autonomous AI systems operating undetected in technical fields.
Skynet Date (-1 days): The rapid progression from Claude models being detectable to surpassing human experts within a short timeframe indicates faster-than-expected capability advancement. This acceleration in practical coding and optimization abilities suggests AI development timelines may be compressed.
AGI Progress (+0.04%): AI surpassing top human technical candidates in specialized optimization tasks represents significant progress toward general cognitive abilities. The rapid improvement from Opus 4 to Opus 4.5, which matches even the strongest human performers, demonstrates meaningful advancement in reasoning and problem-solving capabilities.
AGI Date (-1 days): The successive versions of Claude achieving and then exceeding human-expert performance within a compressed timeframe suggests capabilities are scaling faster than anticipated. This rapid progression in practical technical competence indicates AGI milestones may be reached sooner than baseline projections.