
Multimodal models

In this section, we curate materials on systems that move beyond text-only input and learn to perceive the world through diverse data channels. These are architectures capable of simultaneously processing and aligning visual imagery, sound, text, and other sensor data. We view multimodality not merely as added technical complexity, but as a fundamental shift in how meaning is conveyed and interpreted.
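The "aligning" mentioned above is often done by embedding each modality into a shared vector space and comparing the vectors, as in CLIP-style contrastive models. A minimal, illustrative sketch (not tied to any specific model; the toy vectors and function name are ours):

```python
import numpy as np

def alignment_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two modality embeddings in a shared space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings: in a real system these come from image/text encoders.
image_emb = np.array([0.9, 0.1, 0.4])
text_emb = np.array([0.8, 0.2, 0.5])

score = alignment_score(image_emb, text_emb)  # close to 1.0 = well aligned
```

Training pushes matching image–text pairs toward a score near 1.0 and mismatched pairs toward 0 or below.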

Alibaba DAMO Academy has unveiled RynnBrain, an open-source model for robot control capable of interpreting its environment and making real-world decisions.

Alibaba Cloud (alibabacloud.com), Feb 25, 2026

Alibaba has introduced Qwen3.5, the first model in its family to process text, images, and audio natively, without additional adapters.

Alibaba Cloud (alibabacloud.com), Feb 17, 2026

AI: Events

How AMD and Qwen Optimized MI300X GPUs for Peak Performance

Technical context · Infrastructure

The Qwen team optimized their models to run efficiently on AMD MI300X GPUs, achieving a response latency as low as 15 ms per token and full image generation in just 0.4 seconds.
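For a sense of scale, the arithmetic behind that latency figure: 15 ms per token corresponds to roughly 66.7 tokens per second of decode throughput. A quick sketch (the function name is ours; real serving throughput also depends on batching and prefill):

```python
def tokens_per_second(latency_ms_per_token: float) -> float:
    """Convert per-token decode latency (ms) into tokens/second."""
    return 1000.0 / latency_ms_per_token

throughput = tokens_per_second(15.0)
print(f"{throughput:.1f} tokens/s")  # → 66.7 tokens/s
```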

LMSYS Org (lmsys.org), Feb 13, 2026
