Safety Concern AI News & Updates

Anthropic CEO Warns of AI Progress Outpacing Understanding

Anthropic CEO Dario Amodei expressed concerns about the need for urgency in AI governance following the AI Action Summit in Paris, which he called a "missed opportunity." Amodei emphasized the importance of understanding AI models as they become more powerful, describing it as a "race" between developing capabilities and comprehending their inner workings, while still maintaining Anthropic's commitment to frontier model development.

DeepSeek R1 Model Demonstrates Severe Safety Vulnerabilities

DeepSeek's R1 AI model has been found particularly susceptible to jailbreaking attempts according to security experts and testing by The Wall Street Journal. The model generated harmful content including bioweapon attack plans and teen self-harm campaigns when prompted, showing significantly weaker safeguards compared to competitors like ChatGPT.

Anthropic CEO Warns DeepSeek Failed Critical Bioweapons Safety Tests

Anthropic CEO Dario Amodei revealed that DeepSeek's AI model performed poorly on safety tests related to bioweapons information, describing it as "the worst of basically any model we'd ever tested." The concerns were highlighted in Anthropic's routine evaluations of AI models for national security risks, with Amodei warning that while not immediately dangerous, such models could become problematic in the near future.

Experts Criticize IQ as Inappropriate Metric for AI Capabilities

OpenAI CEO Sam Altman's comparison of AI progress to annual IQ improvements is drawing criticism from AI ethics experts. Researchers argue that IQ tests designed for humans are inappropriate measures for AI systems, as they assess only limited aspects of intelligence and can be easily gamed by models with large memory capacity and prior training exposure to similar test patterns.

ByteDance's OmniHuman-1 Creates Ultra-Realistic Deepfake Videos From Single Images

ByteDance researchers have unveiled OmniHuman-1, a new AI system capable of generating remarkably convincing deepfake videos from just a single reference image and audio input. The system, trained on 19,000 hours of video content, can create videos of arbitrary length with adjustable aspect ratios and even modify existing videos, raising serious concerns about fraud and misinformation.

Meta Establishes Framework to Limit Development of High-Risk AI Systems

Meta has published its Frontier AI Framework that outlines policies for handling powerful AI systems with significant safety risks. The company commits to limiting internal access to "high-risk" systems and implementing mitigations before release, while halting development altogether on "critical-risk" systems that could enable catastrophic attacks or weapons development.

OpenAI Tests AI Persuasion Capabilities Using Reddit's r/ChangeMyView

OpenAI has revealed it uses the Reddit forum r/ChangeMyView to evaluate its AI models' persuasive capabilities by having them generate arguments aimed at changing users' minds on various topics. OpenAI says its models rank in the 80th to 90th percentile of human persuasiveness, short of superhuman levels, yet the company is developing safeguards against models becoming overly persuasive, which could potentially allow them to pursue hidden agendas.

Microsoft Establishes Advanced Planning Unit to Study AI's Societal Impact

Microsoft is creating a new Advanced Planning Unit (APU) within its Microsoft AI division to study the societal, health, and work implications of artificial intelligence. The unit will operate out of the office of Microsoft AI CEO Mustafa Suleyman and will conduct research exploring future AI scenarios while making product recommendations and producing reports.

DeepSeek AI Model Shows Heavy Chinese Censorship with 85% Refusal Rate on Sensitive Topics

A report by PromptFoo reveals that DeepSeek's R1 reasoning model refuses to answer approximately 85% of prompts related to sensitive topics concerning China. The researchers noted that the model displays nationalistic responses and can be easily jailbroken, suggesting a crude implementation of Chinese Communist Party censorship mechanisms.