Inference Optimization AI News & Updates

SGLang Spins Out as RadixArk at $400M Valuation Amid Inference Infrastructure Boom

RadixArk, a commercial startup built around SGLang, the popular open-source tool for optimizing AI model inference, has raised funding at a $400 million valuation in a round led by Accel. The company was founded by former xAI engineer Ying Sheng and originated in the UC Berkeley lab of Databricks co-founder Ion Stoica; it focuses on making AI models run faster and more efficiently. The raise follows a broader trend of inference infrastructure startups attracting significant capital, with competitors such as vLLM pursuing $160M at a $1B valuation and Baseten securing $300M at a $5B valuation.

Nvidia Unveils Rubin Architecture: Next-Generation AI Computing Platform Enters Full Production

Nvidia has officially launched at CES its Rubin computing architecture, which is described as state-of-the-art AI hardware and is now in full production. The new architecture offers 3.5x faster model training and 5x faster inference than the previous Blackwell generation, and major cloud providers and AI labs have already committed to deploying it. The system comprises six integrated chips that address compute, storage, and interconnection bottlenecks, with a particular focus on supporting agentic AI workflows.

DeepSeek Introduces Sparse Attention Model Cutting Inference Costs by Half

DeepSeek released an experimental model, V3.2-Exp, featuring "Sparse Attention" technology that uses a lightning indexer and fine-grained token selection to dramatically reduce inference costs for long-context operations. Preliminary testing shows that API costs can be cut by approximately 50% in long-context scenarios, addressing the critical challenge of server costs in operating pre-trained AI models. The open-weight model is freely available on Hugging Face for independent verification and testing.
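
For readers unfamiliar with the idea, the following is a minimal sketch of indexer-guided token selection in general, not DeepSeek's implementation. The function names, the use of a plain dot product as the cheap indexer score, and the parameter `k` are all assumptions made for illustration. It shows why cost can drop in long contexts: full attention is computed over only the `k` selected tokens rather than the whole context.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(index_scores, k):
    """Return positions of the k highest indexer scores, in original order."""
    k = min(k, len(index_scores))
    ranked = sorted(range(len(index_scores)),
                    key=lambda i: index_scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_attention(query, keys, values, index_query, index_keys, k):
    """Hypothetical single-query sparse attention (illustrative only)."""
    # Stage 1: a cheap indexer scores every context token.
    # (Assumption: a dot product over small index vectors.)
    index_scores = [dot(index_query, ik) for ik in index_keys]

    # Stage 2: keep only the k highest-scoring tokens.
    chosen = select_top_k(index_scores, k)

    # Stage 3: ordinary scaled dot-product attention over the selection.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in chosen])
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, chosen))
              for d in range(dim)]
    return output, chosen
```

In this sketch the expensive attention step touches `k` tokens regardless of context length; only the lightweight scoring pass still scans every token. Setting `k` to the context length or larger makes it equivalent to dense attention.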

Spanish Startup Raises $215M for AI Model Compression Technology Reducing LLM Size by 95%

Spanish startup Multiverse Computing raised €189 million ($215M) in Series B funding for its CompactifAI technology, which uses quantum-computing-inspired compression to reduce LLM sizes by up to 95% without performance loss. The company offers compressed versions of open-source models such as Llama and Mistral that are 4x-12x faster and reduce inference costs by 50%-80%, enabling deployment on devices ranging from PCs to the Raspberry Pi. Founded by quantum physics professor Román Orús and former banking executive Enrique Lizaso Olmos, the company claims 160 patents and serves 100 customers globally.