The SGLang team has released a new system to accelerate video generation, featuring support for long videos and real-world optimizations for high-load environments.
AI: Events
ByteDance Releases Dola-Seed-2.0-Preview: A Long-Context Model with Advanced Reasoning
Products
ByteDance has introduced Dola-Seed-2.0-Preview, a new language model that combines long-context capabilities, advanced analytical features, and multimodality.
AI: Events
Tencent Releases the Most Compact Language Model: 0.3 Billion Parameters in 600 MB
Development
Tencent has open-sourced HY-1.8B-2Bit, a model with 2-bit quantization that takes up less storage than many mobile apps.
AI: Events
Gang Scheduling: Balancing Rigidity and Flexibility in AI Compute Allocation
Technical context • Infrastructure
We explore how gang scheduling helps allocate resources efficiently for training AI models, and why balancing rigidity and flexibility is crucial.
The Higress cloud gateway has been updated to support the Gateway API standard and now includes specialized features for working with artificial intelligence models.
AI: Events
Olmix: Allen AI's Approach to Data Mixing Across All Stages of Language Model Training
Development
Allen AI has introduced Olmix, an open-source framework for data mixing in the language model training process, including pre-training, instruction tuning, and alignment.
AI: Events
AI Agents Write CUDA Kernels: GPT and Claude Learn to Generate GPU Code
Technical context • Development
Two AI agents can create optimized CUDA kernels to speed up operations straight from a task description. Let's dive into what this means for people working with models.
AI: Events
MiniMax Introduces Forge: A Platform for Training AI Agents on Powerful Computing Clusters
Infrastructure
Chinese company MiniMax has released Forge, an open platform designed for training agents using reinforcement learning on large-scale GPU clusters.
AI: Events
How AMD and Qwen Optimized MI300X GPUs for Peak Performance
Technical context • Infrastructure
The Qwen team optimized its models to run efficiently on AMD MI300X GPUs, reaching a response latency as low as 15 ms per token and full image generation in 0.4 seconds.