Two research papers from the Typhoon team have been accepted to the EACL 2026 conference. They address two open problems in speech AI: evaluating speech models and handling long audio recordings.
AI: Events
A Small Model That Hears Better: Turning a Multimodal AI into an Effective Audio Embedder
Research
Researchers have demonstrated how to transform a large multimodal model into a compact audio tool that surpasses competitors while being trained on 25 times less data.
Yandex AI Studio has updated its file search tool, enabling AI agents to work with tables, audio, and video to find information in corporate knowledge bases.
AI: Events
How AI Learns to 'Hear' What Matters: Extracting Data from Live Speech in Real Time
Development
We explore how modern speech recognition systems have learned to extract specific data – phone numbers, addresses, and emails – from conversations on the fly.
AssemblyAI has released the Universal-3 Pro model, which supports six languages and allows switching between them mid-speech without manual adjustments.
AssemblyAI has unveiled technology that identifies, in real time, which participant is speaking, even in crowded meetings.
Hume AI has open-sourced TADA, a speech model that performs frame-by-frame alignment of text and audio, making speech synthesis fast and predictable.
A new feature in ElevenCreative lets you turn text into a finished audiobook without setting foot in a recording studio or hiring professional narrators.
Indian startup Sarvam AI has unveiled Bulbul V3, a speech synthesis model that supports 15 languages and can clone a voice from a short audio sample.