Published March 4, 2026

MiniMax Music 2.5+: Now You Can Generate Music Without Vocals

MiniMax has released an update to its music model – Music 2.5+ can now create instrumental tracks without vocals based on a text description.

Source: MiniMax | Reading time: 3–4 minutes

AI music generation is evolving rapidly, but for a long time most tools shared one trait: they would readily add vocals even when none were needed. If you wanted an atmospheric background track for a video or a calm instrumental theme, a singer often came bundled in. MiniMax has addressed this gap in its Music 2.5 model by releasing an update called Music 2.5+.

What's New

The primary innovation is the ability to explicitly specify whether vocals are required in a track. Simply put, you can now instruct the model to generate purely instrumental music, and it will do precisely that – without any randomly appearing voices.

Before this update, the model might add a vocal part even if the description didn't mention one. Now, the user has direct control over this parameter: want vocals? Turn them on. Don't want them? Turn them off.
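
To make this concrete, here is a minimal sketch of what such a request might look like from code. The endpoint path, model name, and field names (`prompt`, `instrumental`) below are illustrative assumptions, not MiniMax's documented API – check the official API reference for the real contract.

```python
import requests

# Hypothetical sketch: the endpoint and field names are assumptions,
# not the documented MiniMax API. The point is the shape of the request:
# a text description plus an explicit vocal toggle.
API_URL = "https://api.minimax.io/v1/music_generation"  # placeholder
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "music-2.5",  # assumed model identifier
    "prompt": "a calm piano melody with light strings",
    "instrumental": True,  # the new switch: True = no vocals
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
response.raise_for_status()

# The response schema is also an assumption; the service might return
# a download URL or raw audio bytes.
print(response.json())
```

Whatever the exact parameter is called, the design matters more than the name: vocals become an explicit input rather than a side effect of how the prompt happens to be worded.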

Practical Applications

Instrumental music is a distinct and highly sought-after format, used widely as background music in podcasts, video clips and commercials, games and apps, and educational materials. In all of these cases, a voice can be distracting because it competes with the main content.

For independent creators producing content on a tight budget, the ability to quickly generate a suitable instrumental track from a text description is a real time-saver. Previously, they had to either search for a suitable track in stock music libraries or manually remove the vocals from a generated result.
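
For context, the manual route usually meant running a source-separation model over the finished track. One common open-source option is Demucs, which can split a song into a vocal stem and everything else. The sketch below shows that pre-update workaround, assuming Demucs is installed (pip install demucs) and a generated file named generated_track.mp3 exists locally.

```python
import subprocess

# Pre-update workaround: generate a track that came with vocals,
# then strip them with a source-separation model such as Demucs.
subprocess.run(
    [
        "python", "-m", "demucs",
        "--two-stems=vocals",   # split into vocals / no_vocals stems
        "generated_track.mp3",  # hypothetical input file
    ],
    check=True,
)
# Demucs writes the stems under ./separated/<model_name>/generated_track/,
# including no_vocals.wav – the instrumental version of the track.
```

With Music 2.5+, this extra separation pass becomes unnecessary when you know up front that the track should be instrumental.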

How It Works – In a Nutshell

The model takes a text description of the music you wish to create – for example, “a calm piano melody with light strings” – and generates a track based on it. Explicit control over the presence or absence of vocals has now been added to this process.

This isn't a fundamentally new architecture, but rather a significant expansion of functionality that makes the model suitable for a wider range of tasks.

Music 2.5+ in Context

Music 2.5 is MiniMax's music model, positioned as a versatile tool for generating audio from text. The 2.5+ version doesn't rework the entire model; instead, it introduces a specific mode of operation: instrumental.

The plus in the name signifies a targeted improvement rather than a full-fledged next generation. This is common practice in the development of AI tools: releasing an interim update that addresses a specific user request without rewriting everything from scratch.

What Remains to Be Seen

Control over the presence of vocals is just one of many parameters that influence the quality and usability of the generated music. How accurately the model follows text descriptions, how consistently it maintains the desired mood and tempo, and how it handles unconventional requests – all these factors still vary from case to case and require real-world testing.

Nevertheless, the very existence of explicit vocal control is a step toward a more predictable tool. And in creative tools, predictability is valued no less than the quality of the result.

Original Title: Music 2.5+: Unlock instrumental music
Publication Date: Mar 4, 2026
MiniMax (www.minimax.io) – a Chinese AI company developing large language and multimodal models for dialogue and content generation.

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item was selected as an event important for understanding AI development. Then a processing framework was defined: what needed clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role — analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Analyzing the Original Publication and Writing the Text – Claude Sonnet 4.6 (Anthropic). The neural network studies the original material and generates a coherent text.

2. Translation into English – Gemini 2.5 Pro (Google DeepMind).

3. Text Review and Editing – Gemini 2.5 Flash (Google DeepMind). Correction of errors, inaccuracies, and ambiguous phrasing.

4. Preparing the Illustration Description – DeepSeek-V3.2 (DeepSeek). Generating a textual prompt for the visual model.

5. Creating the Illustration – FLUX.2 Pro (Black Forest Labs). Generating an image based on the prepared prompt.
