For a long time, AI video creation looked something like this: enter text, get a short clip. Nice, but a one-off. Controlling details, building scenes, ensuring a character in the third frame looked the same as in the first – all of that fell by the wayside. Alibaba set out to change precisely that.
The company has introduced Wan2.7-Video – a model that, based on its stated capabilities, elevates AI video generation from simply "generating a moving picture" to a full-fledged production tool: not just a draft, but a finished video; not a single scene, but a cohesive story.
What's Inside and Why It Matters
Wan2.7-Video isn't a single model but a suite of four tools: text-to-video, image-to-video, reference-based generation, and a separate tool for editing existing videos. All four operate within a unified system that accepts text, images, video, and audio as input.
Simply put: you can start with a text description of a scene, add a reference photo of a character, attach an audio clip to set the mood – and the system will combine it all into a finished video. The length ranges from 2 to 15 seconds, with a resolution of 720p or 1080p.
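To make the stated limits concrete, here is a minimal sketch of how a client might check a request before submitting it. The `VideoRequest` class and its field names are hypothetical illustrations and do not reflect Alibaba's actual API; only the 2 to 15 second range and the 720p/1080p options come from the announcement.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits quoted in the article. Everything else here is a hypothetical
# illustration, not Alibaba's real request format.
MIN_SECONDS = 2
MAX_SECONDS = 15
ALLOWED_RESOLUTIONS = ("720p", "1080p")


@dataclass
class VideoRequest:
    """Hypothetical container for the inputs the article describes."""
    prompt: str
    duration_seconds: int = 5
    resolution: str = "720p"
    reference_images: List[str] = field(default_factory=list)
    audio_path: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        if not self.prompt.strip():
            problems.append("prompt is empty")
        if not MIN_SECONDS <= self.duration_seconds <= MAX_SECONDS:
            problems.append(
                f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds"
            )
        if self.resolution not in ALLOWED_RESOLUTIONS:
            problems.append(
                f"resolution must be one of {', '.join(ALLOWED_RESOLUTIONS)}"
            )
        return problems


if __name__ == "__main__":
    request = VideoRequest(
        prompt="A detective walks through a rainy street at night",
        duration_seconds=20,          # deliberately out of range
        resolution="1080p",
        reference_images=["detective.png"],
        audio_path="rain.wav",
    )
    print(request.validate())
```

Catching an out-of-range duration or an unsupported resolution locally is cheaper than discovering it after a generation job has been queued.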
This is significant because most existing tools are good at either generating or editing – but not both simultaneously, and certainly not within a single, unified workflow.
Director-Level Control Without an Editing Suite
One of the most interesting aspects of Wan2.7-Video is the level of detailed control users have over the output without resorting to specialized software.
Want to change the camera movement? Describe it in text. Need to rewrite a character's dialogue? The system automatically adjusts lip movements and preserves the voice's timbre. It supports several dozen basic and complex camera techniques, including pans, orbiting shots, and first-person views.
The handling of multiple characters is particularly noteworthy. The system maintains the visual and vocal identity of up to five different characters throughout the entire video – meaning the same character will look and sound consistent across different scenes. This is something AI tools have historically struggled with.
Additionally, the model supports over 50 emotional states for characters and thousands of visual style combinations – from realism to animated stylization.
From a Single Phrase to a Storyboard
A feature Alibaba particularly highlights: a single prompt is enough for the system to generate a complete storyboard with scene transitions, established lighting, and cinematographic choices. This isn't just a pretty clip – it's a structured narrative with editing logic.
The video continuation feature allows users to pre-define the final frame, eliminating the typical problem of an abrupt "cutoff" at the end of a generation. Transitions become smoother, and the story more cohesive.
Wan2.7-Image: A Bit Earlier, but Part of the Same Series
A few days before the video model's release, Alibaba introduced Wan2.7-Image – an image generation tool that addresses several chronic problems in AI graphics.
The first is visual blandness. Most AI generators produce images with an averaged aesthetic that's difficult to customize for a specific look. Wan2.7-Image offers detailed control over character features – down to the shape of the skull and eyes.
The second is color accuracy. For branding, this is critical: a corporate blue must be the exact blue specified in the style guide. The model supports the input of precise color codes.
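The announcement does not say which color notation the model accepts. Assuming hex codes such as those found in a typical style guide, the sketch below shows how a team might parse a brand color and measure how far a generated color drifts from it. The function names and the example colors are illustrative, not part of any Wan tooling.

```python
import math
import re
from typing import Tuple

RGB = Tuple[int, int, int]

# Accepts "#RRGGBB" or "RRGGBB". The hex format is an assumption; the
# article only says the model takes "precise color codes".
HEX_PATTERN = re.compile(r"#?([0-9a-fA-F]{6})")


def hex_to_rgb(code: str) -> RGB:
    """Convert a six-digit hex color code to an (R, G, B) tuple."""
    match = HEX_PATTERN.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a six-digit hex color: {code!r}")
    digits = match.group(1)
    return (int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16))


def color_distance(a: RGB, b: RGB) -> float:
    """Euclidean distance in RGB space (a rough, non-perceptual measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    brand_blue = hex_to_rgb("#0047AB")   # example style-guide color
    sampled = hex_to_rgb("0050B0")       # example color sampled from an output
    print(brand_blue)
    print(round(color_distance(brand_blue, sampled), 2))
```

Plain RGB distance is only a coarse check. A perceptual metric would be a better fit for strict brand compliance, but it is beyond the scope of this sketch.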
The third is handling text within images. This has been a long-standing weakness of generative models: text would often appear distorted, with non-existent characters. According to Alibaba, Wan2.7-Image supports 12 languages and can generate readable text, tables, and formulas directly within the image.
The model can process up to nine reference images at once and outputs 12 variations in a single batch. At the same time, Alibaba released a Wan2.7-Image-Pro version with 4K support and improved prompt interpretation.
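These batch figures lend themselves to simple planning arithmetic. The sketch below assumes the limits are exactly nine references per request and 12 variations per batch, as stated above. The helper names are hypothetical and nothing here calls a real service.

```python
import math
from typing import List, Sequence

# Figures quoted in the article; the helpers themselves are illustrative.
MAX_REFERENCES = 9
VARIATIONS_PER_BATCH = 12


def chunk_references(paths: Sequence[str], size: int = MAX_REFERENCES) -> List[List[str]]:
    """Split reference image paths into groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(paths[i:i + size]) for i in range(0, len(paths), size)]


def batches_needed(total_variations: int, per_batch: int = VARIATIONS_PER_BATCH) -> int:
    """Number of batches required to obtain `total_variations` images."""
    if total_variations < 0:
        raise ValueError("total_variations must not be negative")
    return math.ceil(total_variations / per_batch)


if __name__ == "__main__":
    references = [f"ref_{n:02d}.png" for n in range(20)]
    groups = chunk_references(references)
    print([len(group) for group in groups])
    print(batches_needed(30))
```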
What This Means for Content Creators
Both models reflect the same philosophy: giving individuals or small teams tools that previously required either expensive production setups or technical expertise in AI.
If it works as advertised, the barrier to entry for professional video and photo production will be significantly lowered. Anyone who can formulate ideas and build a narrative will have the ability to bring them to life without intermediaries.
For now, however, all of this rests on the stated specifications. How reliably the system performs in real-world scenarios – with unconventional prompts, complex references, and long narratives – remains to be seen.
The models are available through Alibaba's cloud platform and the official Wan series website, and are also integrated into the Qwen app.