AI Evaluation AI News & Updates

Hugging Face Scientist Challenges AI's Creative Problem-Solving Limitations

Thomas Wolf, Hugging Face's co-founder and chief science officer, has expressed concern that current AI development paradigms are creating "yes-men on servers" rather than systems capable of revolutionary scientific thinking. Wolf argues that AI systems are not designed to question established knowledge or generate truly novel ideas: they primarily fill gaps between pieces of existing human knowledge rather than connecting previously unrelated facts.

Experts Criticize IQ as Inappropriate Metric for AI Capabilities

OpenAI CEO Sam Altman's comparison of AI progress to annual IQ improvements is drawing criticism from AI ethics experts. Researchers argue that IQ tests designed for humans are inappropriate measures for AI systems because they assess only limited aspects of intelligence and can be easily gamed by models with large memory capacity and training exposure to similar test patterns.

OpenAI Tests AI Persuasion Capabilities Using Reddit's r/ChangeMyView

OpenAI has revealed that it uses the Reddit forum r/ChangeMyView to evaluate its AI models' persuasive capabilities by having them generate arguments aimed at changing users' minds on various topics. OpenAI claims its models perform in the 80th to 90th percentile of human persuasiveness, short of superhuman levels. The company is nonetheless developing safeguards against AI models becoming overly persuasive, which could potentially allow them to pursue hidden agendas.