OpenAI has released GPT-5.3 Instant, a lightweight version of the model designed for speed and convenience in casual conversations and routine tasks.
AI: Events
25x Inference Speedup: What's Happening with AI Performance on New NVIDIA Hardware
Infrastructure
The new NVIDIA GB300 NVL72 server, paired with the SGLang framework, has demonstrated a 25x performance boost when running language models.
Qualcomm has introduced a comprehensive infrastructure for running large AI models, featuring a server rack, expansion cards, and a management system as a single integrated solution.
Offline Tuning in PyTorch: Accelerating Neural Networks Before Their First Run
Technical context • Infrastructure
An exploration of how TunableOp benchmarks candidate kernel implementations ahead of time and records the fastest one for each operation, and why tuning a network before its first production run pays off.
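TunableOp is driven by documented environment variables that must be set before `torch` is imported; a minimal sketch of the offline-then-serve workflow might look like the following (the results filename and the helper function are illustrative, not PyTorch API):

```python
import os

def configure_tunableop(results_file: str = "tunableop.csv", tuning: bool = True) -> None:
    """Set PyTorch's documented TunableOp environment switches.

    Must run before `import torch`. With tuning=True, the first call for each
    new matmul shape benchmarks candidate kernels and records the winner in
    `results_file`; with tuning=False, recorded winners are reused as-is.
    """
    os.environ["PYTORCH_TUNABLEOP_ENABLED"] = "1"                  # turn TunableOp on
    os.environ["PYTORCH_TUNABLEOP_TUNING"] = "1" if tuning else "0"  # benchmark or reuse
    os.environ["PYTORCH_TUNABLEOP_FILENAME"] = results_file        # where results are stored

# Offline pass: tune once against the shapes the model will actually see.
configure_tunableop(tuning=True)
# In the serving process, reuse the recorded results without re-benchmarking:
# configure_tunableop(tuning=False)
```

The point of the offline pass is to pay the benchmarking cost once, in a warm-up run that exercises the model's real matrix shapes, rather than on the first production request.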
Cache as a Resource: How Alibaba Cloud Teaches AI Not to Calculate the Same Thing Twice
Technical context • Infrastructure
Alibaba Cloud has introduced a precise request routing mechanism for language models that significantly boosts caching efficiency in distributed inference.
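Alibaba Cloud's exact mechanism isn't reproduced here, but the general idea behind cache-aware routing can be sketched in a few lines: send requests that share a prompt prefix (for example, the same system prompt) to the same worker, so that worker's KV cache for the prefix is reused instead of recomputed. All names below are hypothetical.

```python
import hashlib

def route_request(prompt: str, workers: list[str], prefix_len: int = 256) -> str:
    """Pick a worker by hashing the prompt's prefix.

    Requests with identical prefixes hash to the same worker, turning a
    cold recomputation of the shared prefix into a KV-cache hit.
    """
    prefix = prompt[:prefix_len]
    digest = int(hashlib.sha256(prefix.encode()).hexdigest(), 16)
    return workers[digest % len(workers)]

workers = ["gpu-0", "gpu-1", "gpu-2", "gpu-3"]
system = "You are a helpful assistant. " * 20  # shared prefix longer than prefix_len

# Both requests share the system prompt, so they land on the same worker.
a = route_request(system + "Summarize this report.", workers)
b = route_request(system + "Translate this sentence.", workers)
assert a == b
```

A production router would also weigh worker load and cache occupancy, but even this naive prefix hashing shows why routing, not just cache size, determines the hit rate in distributed inference.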
How to Safely Update AI Services: Canary Releases Across Multiple Clusters
Infrastructure
We explore how companies update AI services without the risk of widespread outages, and why the canary release approach is becoming an industry standard.
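The core of a canary release is a deterministic traffic split: a small, stable fraction of requests goes to the new version, and a given user consistently sees the same version. A generic illustration (not any particular company's implementation) can be sketched as hash bucketing:

```python
import hashlib

def canary_bucket(request_id: str, canary_percent: int) -> str:
    """Assign a request to 'canary' or 'stable' by hashing its id.

    Hashing makes the split deterministic: the same user/request id always
    gets the same version, while roughly canary_percent of all traffic
    reaches the new release.
    """
    bucket = int(hashlib.md5(request_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"

# Ramp-up is then just raising canary_percent (e.g. 1 -> 5 -> 25 -> 100)
# while watching error rates and latency on the canary fleet.
```

Across multiple clusters, the same bucketing is typically applied per cluster, so a bad release can be rolled back by setting the canary share to zero without touching the stable fleet.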