Published on April 7, 2026

Alibaba's Wan2.7-Video: One Prompt – and You're the Director

Alibaba has released Wan2.7-Video – a video generation model that gives users control over the entire video creation process, from script to final edit.

Event source: Alibaba Cloud

For a long time, AI video creation looked something like this: enter text, get a short clip. Nice, but a one-off. Controlling details, building scenes, making sure a character in the third shot looked the same as in the first – all of that fell by the wayside. Alibaba has set out to change precisely that.

The company has introduced Wan2.7-Video – a model that, judging by its stated capabilities, moves AI video generation from simply "generating a moving picture" to a full-fledged production tool: not just a draft but a finished video, not a single scene but a cohesive story.

What's Inside and Why It Matters

Wan2.7-Video isn't a single model but a suite of four tools: text-to-video, image-to-video, reference-based generation, and a separate tool for editing existing videos. All four operate within a unified system that accepts text, images, video, and audio as input.

Simply put: you can start with a text description of a scene, add a reference photo of a character, attach an audio clip to set the mood – and the system will combine it all into a finished video. Clips run from 2 to 15 seconds at a resolution of 720p or 1080p.

This is significant because most existing tools are good at either generating or editing – but not both simultaneously, and certainly not within a single, unified workflow.
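
The limits stated above (four tools, clips of 2 to 15 seconds, 720p or 1080p) can be captured in a small pre-flight check. What follows is a minimal sketch in Python, not Alibaba's API: the `VideoRequest` class, the mode names, and the `validate` function are hypothetical and exist only to illustrate the constraints the article lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits quoted in the article. The identifiers themselves are hypothetical.
MODES = {"text-to-video", "image-to-video", "reference-to-video", "video-edit"}
RESOLUTIONS = {"720p", "1080p"}
MIN_SECONDS, MAX_SECONDS = 2, 15


@dataclass
class VideoRequest:
    """Hypothetical container for one generation request (not Alibaba's schema)."""
    mode: str
    prompt: str
    duration_s: int = 5
    resolution: str = "720p"
    image_refs: List[str] = field(default_factory=list)
    audio_ref: Optional[str] = None


def validate(req: VideoRequest) -> List[str]:
    """Return a list of problems; an empty list means the request fits the stated limits."""
    problems = []
    if req.mode not in MODES:
        problems.append(f"unknown mode: {req.mode}")
    if not MIN_SECONDS <= req.duration_s <= MAX_SECONDS:
        problems.append(
            f"duration must be {MIN_SECONDS}-{MAX_SECONDS} s, got {req.duration_s}"
        )
    if req.resolution not in RESOLUTIONS:
        problems.append(f"resolution must be one of {sorted(RESOLUTIONS)}")
    return problems
```

Calling `validate(VideoRequest(mode="text-to-video", prompt="a rainy street", duration_s=30))` would report a single problem, the out-of-range duration, before any request is sent anywhere.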

Director-Level Control Without an Editing Suite

One of the most interesting aspects of Wan2.7-Video is the level of detailed control users have over the output without resorting to specialized software.

Want to change the camera movement? Describe it in text. Need to rewrite a character's dialogue? The system automatically adjusts lip movements and preserves the voice's timbre. It supports several dozen basic and complex camera techniques, including pans, orbiting shots, and first-person views.

The handling of multiple characters is particularly noteworthy. The system maintains the visual and vocal identity of up to five different characters throughout the entire video – meaning the same character will look and sound consistent across different scenes. This is something AI tools have historically struggled with.

Additionally, the model supports over 50 emotional states for characters and thousands of visual style combinations – from realism to animated stylization.

From a Single Phrase to a Storyboard

A feature Alibaba particularly highlights: a single prompt is enough for the system to generate a complete storyboard with scene transitions, established lighting, and cinematographic choices. This isn't just a pretty clip – it's a structured narrative with editing logic.

The video continuation feature allows users to pre-define the final frame, eliminating the typical problem of an abrupt "cutoff" at the end of the generation. Transitions become smoother, and the story more cohesive.

Wan2.7-Image: A Bit Earlier, but Part of the Same Series

A few days before the video model's release, Alibaba introduced Wan2.7-Image – an image generation tool that addresses several chronic problems in AI graphics.

The first is visual blandness. Most AI generators produce images with an averaged aesthetic that's difficult to customize for a specific look. Wan2.7-Image offers detailed control over character features – down to the shape of the skull and eyes.

The second is color accuracy. For branding, this is critical: a corporate blue must be the exact blue specified in the style guide. The model supports the input of precise color codes.
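
To make the idea of a precise color code concrete, here is a minimal Python sketch that validates a six-digit hex code and converts it to RGB before it goes into a prompt. The article does not say which color formats the model accepts, so hex input is an assumption here, and `hex_to_rgb` and `color_clause` are hypothetical helper names rather than anything from Wan2.7-Image.

```python
import re
from typing import Tuple

# Six hex digits with an optional leading "#". Assumed format, see the note above.
HEX_COLOR = re.compile(r"#?([0-9a-fA-F]{6})")


def hex_to_rgb(code: str) -> Tuple[int, int, int]:
    """Convert a code such as '#0057B8' into an (R, G, B) tuple of 0-255 integers."""
    match = HEX_COLOR.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a 6-digit hex color: {code!r}")
    digits = match.group(1)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def color_clause(name: str, code: str) -> str:
    """Build a prompt fragment that states a color both as hex and as RGB."""
    r, g, b = hex_to_rgb(code)
    normalized = code.strip().lstrip("#").upper()
    return f"{name}: #{normalized} (RGB {r}, {g}, {b})"


print(color_clause("corporate blue", "#0057b8"))
# prints: corporate blue: #0057B8 (RGB 0, 87, 184)
```

Rejecting malformed codes up front means a typo in the style guide value surfaces as an error instead of a slightly wrong shade.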

The third is handling text within images. This has been a long-standing weakness of generative models: text would often appear distorted, with non-existent characters. According to Alibaba, Wan2.7-Image supports 12 languages and can generate readable text, tables, and formulas directly within the image.

The model can process up to nine reference images at once and outputs 12 variations in a single batch. Alongside it, Alibaba released a Wan2.7-Image-Pro version with 4K support and improved prompt interpretation.
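
The two batch figures above translate into simple planning arithmetic. The sketch below is illustrative Python only: `plan_batches` is a hypothetical helper, and treating 12 as the maximum number of variations per batch (rather than a fixed count) is an assumption the article does not confirm.

```python
from typing import List

MAX_REFERENCES = 9          # reference images per request, as stated in the article
VARIATIONS_PER_BATCH = 12   # assumed here to be a per-batch maximum


def plan_batches(reference_paths: List[str], wanted_variations: int) -> List[int]:
    """Split a desired number of variations into per-batch counts."""
    if len(reference_paths) > MAX_REFERENCES:
        raise ValueError(
            f"at most {MAX_REFERENCES} reference images, got {len(reference_paths)}"
        )
    if wanted_variations < 1:
        raise ValueError("wanted_variations must be at least 1")
    full, rest = divmod(wanted_variations, VARIATIONS_PER_BATCH)
    return [VARIATIONS_PER_BATCH] * full + ([rest] if rest else [])


print(plan_batches(["front.png", "side.png", "logo.png"], 30))
# prints: [12, 12, 6]
```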

What This Means for Content Creators

Both models reflect the same philosophy: giving individuals and small teams tools that previously required either an expensive production setup or technical expertise in AI.

If it works as advertised, the barrier to entry for professional video and photo production will be significantly lowered. Anyone who can formulate ideas and build a narrative will have the ability to bring them to life without intermediaries.

For now, however, all of this rests on the stated specifications. How reliably the system performs in real-world scenarios – with unconventional prompts, complex references, and long narratives – remains to be seen.

The models are available through Alibaba's cloud platform and the official Wan series website, and are also integrated into the Qwen app.

Original Title: Alibaba Unveils Wan2.7-Video to Elevate Creators from Executors to Directors
Publication Date: Apr 7, 2026
Source: Alibaba Cloud (www.alibabacloud.com), Alibaba's Chinese cloud and AI division, which provides infrastructure and AI services for businesses.

How This Text Was Created

This material is not a direct retelling of the original publication. First, the news item itself was selected as an event important for understanding AI development. Then a processing framework was set: what needs clarification, what context to add, and where to place emphasis. This allowed us to turn a single announcement or update into a coherent and meaningful analysis.

Neural Networks Involved in the Process

We openly show which models were used at different stages of processing. Each performed its own role: analyzing the source, rewriting, fact-checking, and visual interpretation. This approach maintains transparency and clearly demonstrates how technologies participated in creating the material.

1. Claude Sonnet 4.6 (Anthropic): Analyzing the Original Publication and Writing the Text. The neural network studies the original material and generates a coherent text.

2. Gemini 2.5 Pro (Google DeepMind): Translation into English.

3. Gemini 2.5 Flash (Google DeepMind): Text Review and Editing. Correction of errors, inaccuracies, and ambiguous phrasing.

4. DeepSeek-V3.2 (DeepSeek): Preparing the Illustration Description. Generating a textual prompt for the visual model.

5. FLUX.2 Pro (Black Forest Labs): Creating the Illustration. Generating an image based on the prepared prompt.
