AI Transparency: AI News & Updates

EleutherAI Creates Massive Licensed Dataset to Train Competitive AI Models Without Copyright Issues

EleutherAI has released The Common Pile v0.1, an 8-terabyte dataset of openly licensed and public-domain text developed over two years with multiple partners. The dataset was used to train two AI models that reportedly perform comparably to models trained on copyrighted data, addressing copyright concerns around common AI training practices.

OpenAI Skips Safety Report for GPT-4.1 Release, Raising Transparency Concerns

OpenAI has launched GPT-4.1 without publishing a safety report, breaking with the industry norm of releasing system cards that detail safety testing for new AI models. The company justified the decision by stating that GPT-4.1 is "not a frontier model," even though the model delivers significant efficiency and latency improvements and outperforms existing models on certain tests. The omission comes amid broader concerns that OpenAI may be compromising on safety practices under competitive pressure.

Meta Denies Benchmark Manipulation for Llama 4 AI Models

A Meta executive has denied accusations that the company artificially boosted its Llama 4 AI models' benchmark scores by training on test sets. The controversy emerged from unverified social media claims and observed performance disparities between different implementations of the models; the executive acknowledged that some users are experiencing "mixed quality" across cloud providers.

OpenAI Announces Plans for First 'Open' Language Model Since GPT-2

OpenAI has announced plans to release its first 'open' language model since GPT-2 in the coming months, focused on reasoning capabilities similar to those of o3-mini. The company is actively seeking feedback from developers, researchers, and the broader community through a form on its website and at upcoming developer events in San Francisco, Europe, and the Asia-Pacific region.

DeepSeek Announces Open Sourcing of Production-Tested AI Code Repositories

Chinese AI lab DeepSeek has announced plans to open-source portions of its online services' code as part of an upcoming "open source week" event. The company will release five code repositories that it says have been thoroughly documented and tested in production, continuing its practice of making AI resources openly available under permissive licenses.

Hugging Face Launches Open-R1 Project to Replicate DeepSeek's Reasoning Model in Open Source

Hugging Face researchers have launched Open-R1, a project aimed at replicating DeepSeek's R1 reasoning model with fully open-source components and training data. The initiative, which gained 10,000 GitHub stars within three days, seeks to address the lack of transparency around DeepSeek's model despite its permissive license, using Hugging Face's Science Cluster of 768 Nvidia H100 GPUs to produce comparable datasets and training pipelines.