An MIT team has developed a two-stage method that generates 2K video as fast as standard 720p generation.
AI21 Labs researchers have shown that a simple optimization of data packing during LLM training can significantly speed up the process without changing the neural network architecture.
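The blurb does not describe AI21's exact technique. As a hedged illustration of what "data packing" generally means, the sketch below greedily packs variable-length tokenized samples into fixed-length training rows (first-fit decreasing), so fewer positions are wasted on padding. The function names and the 1024-token row length are hypothetical choices for illustration, not AI21's code.

```python
from typing import List


def pack_sequences(lengths: List[int], max_len: int) -> List[List[int]]:
    """Greedy first-fit-decreasing packing (illustrative, not AI21's method).

    Returns rows of sample indices whose total token count fits in max_len.
    """
    # Place the longest samples first; short ones then fill leftover gaps.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    rows: List[List[int]] = []
    free: List[int] = []  # remaining token capacity of each row
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sample {i} has {n} tokens, exceeds max_len={max_len}")
        for r, cap in enumerate(free):
            if n <= cap:  # first row with enough room
                rows[r].append(i)
                free[r] -= n
                break
        else:  # no existing row fits: open a new one
            rows.append([i])
            free.append(max_len - n)
    return rows


def row_boundaries(lengths: List[int], row: List[int]) -> List[int]:
    """Cumulative token offsets of each sample inside one packed row.

    Attention kernels that accept such offsets (often called cu_seqlens)
    can use them to stop tokens attending across sample boundaries.
    """
    offsets = [0]
    for i in row:
        offsets.append(offsets[-1] + lengths[i])
    return offsets


def padding_fraction(lengths: List[int], rows: List[List[int]], max_len: int) -> float:
    """Share of token positions in the packed batch that are padding."""
    return 1 - sum(lengths) / (len(rows) * max_len)
```

With lengths `[900, 100, 500, 500, 300, 700]` and `max_len=1024`, this sketch fits the six samples into three rows instead of six padded ones, cutting padding from roughly 51% to roughly 2% of positions. Packing only pays off if attention is kept within each original sample, which is why the boundary offsets matter.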
AI: Events
Unsloth Speeds Up MoE Model Training 12x and Extends the Context Window
Technical context • Development
Unsloth's new kernels and mathematical optimizations cut memory requirements by 35%, speed up training by 12x, and allow context windows six times longer than before.
Oracle's AI data centers use a closed-loop cooling system: the water circulates without evaporating and is filled only once, with no top-ups.
AMD Shows How to Train Large Models Without the Fear of Losing Progress to a Single Crash
Infrastructure
Pairing TorchFT with TorchTitan lets model training on AMD GPUs continue after cluster node failures, without restarting the whole process.
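TorchFT's real API is not shown here. As a hedged, framework-free sketch of the general idea behind per-step fault tolerance in data-parallel training: each step, gradients are averaged only over the replicas currently alive, a failed replica simply drops out, and when it returns it copies weights from a healthy peer instead of the whole job restarting from a checkpoint. The names `healthy_average` and `run` and the schedule format are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Grads = Dict[int, List[float]]  # replica id -> gradient vector


def healthy_average(grads: Grads, alive: Set[int]) -> List[float]:
    """Average gradients over the replicas that are alive this step (the quorum)."""
    quorum = sorted(alive)
    if not quorum:
        raise RuntimeError("no healthy replicas left")
    dim = len(grads[quorum[0]])
    return [sum(grads[r][k] for r in quorum) / len(quorum) for k in range(dim)]


def run(num_replicas: int, schedule: List[Tuple[Set[int], Grads]],
        init: List[float], lr: float = 0.1) -> Dict[int, List[float]]:
    """Simulate SGD steps where replicas may fail and rejoin (hypothetical model)."""
    weights = {r: list(init) for r in range(num_replicas)}
    synced = set(range(num_replicas))  # replicas holding up-to-date weights
    for alive, grads in schedule:
        # Live recovery: a rejoining replica copies weights from a synced peer.
        donor = next((r for r in sorted(alive) if r in synced), None)
        if donor is None:
            raise RuntimeError("no up-to-date replica to recover from")
        for r in alive - synced:
            weights[r] = list(weights[donor])
        # Training continues with whoever is alive; dead replicas are skipped.
        avg = healthy_average(grads, alive)
        for r in alive:
            weights[r] = [w - lr * g for w, g in zip(weights[r], avg)]
        synced = set(alive)  # replicas that missed this step are now stale
    return weights
```

For example, with three replicas, if replica 2 is down for one step, replicas 0 and 1 keep training on their averaged gradients; on the next step replica 2 copies their weights and rejoins, so no progress is lost and nothing restarts.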
Perplexity Shows How to Train Trillion-Parameter Models on AWS
Technical context • Infrastructure
The Perplexity team adapted a framework for training ultra-large neural networks to Amazon's cloud infrastructure, removing the hard dependency on proprietary NVIDIA hardware and allowing standard networking solutions to be used instead.
RDMA for Language Models: When Servers Learn to Talk Directly to Each Other
Technical context • Infrastructure
The Perplexity AI team has shown how RDMA (remote direct memory access), which moves data directly between servers' memory, helps language models run faster and more efficiently by removing bottlenecks in the network infrastructure.
Zyphra Finds a Way to Make Neural Network Attention Mechanisms Faster and More Efficient
Technical context • Infrastructure
Zyphra's new OVQ-attention layer aims to reduce memory and compute overhead on long contexts while preserving sequence-processing quality.
The AI21 Labs team shared its experience optimizing vLLM, a popular tool for serving language models that often hits critical out-of-memory errors when scaled up.