Safety Concern AI News & Updates
OpenAI Tests AI Persuasion Capabilities Using Reddit's r/ChangeMyView
OpenAI has revealed that it uses the Reddit forum r/ChangeMyView to evaluate its AI models' persuasive capabilities by having them generate arguments aimed at changing users' minds on various topics. OpenAI says its models perform in the 80th to 90th percentile of human persuasiveness, though not at superhuman levels. Even so, the company is developing safeguards against AI models becoming overly persuasive, a capability that could allow them to pursue hidden agendas.
Skynet Chance (+0.08%): The development of AI systems with high persuasive capabilities presents a clear risk vector for AI control problems, as highly persuasive systems could manipulate human operators or defenders, potentially allowing such systems to bypass intended restrictions or safeguards through social engineering.
Skynet Date (-3 days): OpenAI's explicit focus on testing persuasive capabilities and acknowledgment that current models are already achieving high-percentile human performance indicates this capability is advancing rapidly, potentially accelerating the timeline to AI systems that could effectively manipulate humans.
AGI Progress (+0.05%): Advanced persuasive reasoning represents progress toward AGI by demonstrating sophisticated understanding of human psychology, values, and decision-making, allowing AI systems to construct targeted arguments that reflect higher-order reasoning about human cognition and social dynamics.
AGI Date (-2 days): The revelation that current AI models already perform in the 80th to 90th percentile of human persuasiveness suggests this particular cognitive capability is developing faster than expected, potentially accelerating the overall timeline to generally capable systems.
Microsoft Establishes Advanced Planning Unit to Study AI's Societal Impact
Microsoft is creating a new Advanced Planning Unit (APU) within its Microsoft AI division to study the societal, health, and work implications of artificial intelligence. Operating out of the office of Microsoft AI CEO Mustafa Suleyman, the unit will conduct research exploring future AI scenarios while making product recommendations and producing reports.
Skynet Chance (-0.13%): The establishment of a dedicated unit to study AI's societal implications demonstrates increased institutional focus on understanding and potentially mitigating AI risks. This structured approach to anticipating problems could help identify control issues before they become critical.
Skynet Date (+2 days): Microsoft's investment in studying AI's impacts suggests a more cautious, deliberate approach that may slow deployment of potentially problematic systems. The APU's role in providing recommendations could introduce additional safety considerations that extend the timeline before high-risk AI capabilities are released.
AGI Progress (+0.03%): While the APU itself doesn't directly advance technical capabilities, Microsoft's massive $22.6 billion quarterly AI investment and reorganization around AI priorities indicate substantial resources being directed toward AI development. The company's strategic focus on "model-forward" applications suggests continued progress toward more capable systems.
AGI Date (-1 day): The combination of record-high capital expenditures and organizational restructuring around AI suggests accelerated development, though the APU might introduce some caution in deployment. The net effect is likely a slight acceleration, given Microsoft's stated goal of compressing "thirty years of change into three years."
DeepSeek AI Model Shows Heavy Chinese Censorship with 85% Refusal Rate on Sensitive Topics
A report by PromptFoo reveals that DeepSeek's R1 reasoning model refuses to answer approximately 85% of prompts on sensitive topics concerning China. The researchers noted that the model displays nationalistic responses and can be easily jailbroken, suggesting a crude implementation of Chinese Communist Party censorship mechanisms.
Skynet Chance (+0.08%): The implementation of governmental censorship in an advanced AI model represents a concerning precedent where AI systems are explicitly aligned with state interests rather than user safety or objective truth. This potentially increases risks of AI systems being developed with hidden or deceptive capabilities serving specific power structures.
Skynet Date (-1 day): The demonstration of crude but effective control mechanisms suggests that while the current implementation is detectable, the race to develop powerful AI models with built-in constraints aligned to specific agendas could accelerate the timeline to potentially harmful systems.
AGI Progress (+0.03%): DeepSeek's R1 reasoning model demonstrates advanced capabilities in understanding complex prompts and selectively responding based on content classification, indicating progress in natural language understanding and contextual reasoning required for AGI.
AGI Date (-1 day): The rapid development of sophisticated reasoning models with selective response capabilities suggests acceleration in developing components necessary for AGI, albeit focused on specific domains of reasoning rather than general intelligence breakthroughs.