Alibaba Cloud has open-sourced SysOM MCP – a tool that allows AI agents to independently diagnose problems in server and system operations.
AI: Events
How AI Learns to Improve Its Own Code: An Experiment in Self-Optimization
Technical context • Research
AMD researchers have demonstrated how an AI agent can iteratively optimize high-performance code without human intervention.
Helion, a DSL for writing fast ML kernels, has gained a new autotuning mechanism based on Bayesian optimization that saves developers time.
What if the chaotic behavior of complex systems is merely a matter of perspective? Scientists have discovered a way to 'hide' non-linearity within the very structure of the network.
AI: Events
How to Train Large Language Models Without Constantly Babysitting the Terminal
Technical context • Infrastructure
AMD demonstrates how to set up LLM training on GPU clusters so that failures are handled automatically, eliminating the need for manual intervention.
AI: Events
25x Inference Speedup: What's Happening with AI Performance on New NVIDIA Hardware
Infrastructure
The new NVIDIA GB300 NVL72 rack-scale system, paired with the SGLang framework, has demonstrated a 25x performance boost when running language models.
LightOn has released EDiTh, an open-source benchmark for testing enterprise search on realistic documents without the risk of leaking confidential data.
AI: Events
How AMD Is Teaching Neural Networks to Work Together: Ray and ROCm 7 for Large-Scale ML Tasks
Technical context • Infrastructure
AMD has explained how to run distributed ML tasks on GPUs using Ray and ROCm 7 – from model training to creating agent-based systems.
AI: Events
How to Train an Image Generation Model in 24 Hours: The Photoroom Team's Experience
Development
The Photoroom team shares how they managed to train their own image generation model in just 24 hours and what the results were.