Google has unveiled TurboQuant, an algorithm that compresses AI's working memory sixfold, which could fundamentally change the approach to neural network infrastructure.
How a small research team turns the theoretical potential of GPUs into real-world performance for AI systems – the story of the Together AI team.
AI: Events
One GPU Failure Shouldn't Bring Down the Entire System
Technical context • Infrastructure
The Mooncake and Volcano Engine teams have integrated an elastic expert parallelism mechanism into the SGLang framework, allowing it to withstand partial failures without a restart.
AI: Events
AMD at MLPerf Inference 6.0: A Million Tokens Per Second and a Debut in Video Generation
Technical context • Infrastructure
AMD has presented its MLPerf Inference 6.0 results on Instinct MI355X GPUs, showcasing new performance records, the first video generation tests, and scaling up to the cluster level.
AI: Events
Red Hat AI Achieves Top Results in MLPerf Inference v6.0 – Here's What's Behind It
Infrastructure
Red Hat AI has secured top spots in the latest round of the MLPerf Inference v6.0 benchmark, testing three models on both NVIDIA and AMD GPUs.
AI: Events
SGLang at NVIDIA GTC 2026: A Behind-the-Scenes Look at a Top AI Conference
Technical context • Infrastructure
SGLang was prominently featured at NVIDIA GTC 2026 in multiple formats, from a mention in the keynote to a 200-person meetup and a hands-on lab.
OpenAI has closed the largest funding round in the history of the tech industry, securing $122 billion for AI infrastructure development and global expansion.
Google has launched Veo 3.1 Lite – a lightweight version of its video generation model that is significantly more affordable to use.
Liquid AI has introduced the compact language model LFM2.5-350M, explaining why even small models deserve serious attention.