Artificial intelligence consumes energy on a scale that is becoming difficult to ignore, calling into question the very logic of infinite growth.
AI: Events
Monarch: How PyTorch Is Simplifying Supercomputer Management
Technical context • Infrastructure
PyTorch has introduced Monarch, a new tool designed to simplify the launching and debugging of distributed training tasks on large GPU clusters.
AI: Events
Higress: An AI Traffic Gateway to Replace the Legacy Nginx Ingress
Technical context • Infrastructure
Alibaba Cloud's Higress project has joined the CNCF Sandbox as a replacement for the retiring Nginx Ingress, offering zero downtime along with support for AI traffic and MCP servers.
Researchers have found that language models can optimize database query execution far more effectively than traditional statistical methods.
Red Hat and NVIDIA have jointly achieved leading results in the independent MLPerf Inference v6.0 benchmark, which covers image recognition, speech, and reasoning tasks.
AI: Events
When One GPU Isn't Enough, and a Second Is Too Costly: A New Approach to Running AI in Production
Infrastructure
Two new open-source projects offer a way to run multiple AI models on a single GPU with dynamic memory management, without sacrificing performance.
How a small research team turns the theoretical potential of GPUs into real-world performance for AI systems – the story of the Together AI team.
AI: Events
One GPU Failure Shouldn't Bring Down the Entire System
Technical context • Infrastructure
The Mooncake and Volcano Engine teams have integrated an elastic expert parallelism mechanism into the SGLang framework, allowing it to withstand partial failures without requiring a restart.
AI: Events
AMD at MLPerf Inference 6.0: A Million Tokens Per Second and a Debut in Video Generation
Technical context • Infrastructure
AMD has presented its MLPerf Inference 6.0 results, showcasing new performance records, its first video generation tests, and cluster-level scaling on the Instinct MI355X GPU.