The Pruna AI team has accelerated image generation in the FLUX.2 [flex] model threefold without compromising quality. We explain how this was achieved and what it means for users.
OpenHands has launched a benchmark demonstrating how models handle real-world GitHub tasks – from bug fixes to implementing new features in open-source projects.
AI: Events
How a Single Token Broke an Entire Model: The Story of a vLLM Bug
Technical context • Infrastructure
Engineers at AI21 Labs discovered a bizarre bug in vLLM that turned the Jamba model's normal responses into gibberish – and it was all down to a single incorrect token.
YouTube creators can now leverage AI avatar technology to produce short videos, thanks to a new platform tool powered by Supertone.
AI: Events
Chunk Size Depends on the Query: How AI21 Labs Proposes Solving a Major RAG System Challenge
Development
AI21 Labs demonstrated that a single "chunk" size in RAG systems is a compromise and proposed a simple way to adapt text segmentation to the user's query type.
AI: Events
Claude Taught to Write CUDA Kernels and Train Open Models
Technical context • Development
Anthropic has enhanced Claude's capabilities in handling low-level code and transferring knowledge to other models through its new "Extended Thinking" feature.
AMD has introduced a tool for automatically identifying the best quantization settings for ONNX models, eliminating the need for developers to manually sift through options.
AMD has demonstrated how to deploy OpenHands – an agent for automating code writing – on its server GPUs using the vLLM engine.
Cursor found a way to speed up the indexing of large codebases by safely reusing indexes created by colleagues, reducing the time from hours to seconds.