Experience synchronized sound and visuals, with high-fidelity voices, ASMR, ambient sounds, music, and multilingual support.

Generate cinematic 10-second 1080P 24fps videos with richer spatiotemporal detail, full storytelling capability, and stable output.

Unlock advanced natural language understanding and instruction following to create images or videos from your input.

Wan2.5 is Alibaba’s latest multimodal generation model. The Wan2.5-Preview version supports text-to-video, image-to-video, text-to-image, and image editing, and features the first-ever audio-visual synced video generation capability, producing HD 1080P, 24fps videos with matching voices, sound effects, and music.
Wan2.5 can generate high-definition videos from text or images with synchronized voice and sound effects, enabling creators to quickly produce short videos, animation clips, or film previews, reducing production costs and time.

By generating dynamic scenes and character animations from text or images, Wan2.5 helps game developers efficiently create in-game cinematics, cutscenes, and immersive virtual environments.

Educators can transform course content, diagrams, or key concepts into engaging instructional videos with clear narration and background music, making learning more intuitive and immersive.

Marketing teams can use Wan2.5 to create promotional videos from ad copy and product images, with synchronized voiceovers and music, enabling fast, high-quality content production that drives higher engagement and conversion.

With Wan2.5’s audio-visual synchronization, users can create virtual hosts or AI characters for product demos, live presentations, or interactive content, making virtual communication more natural.

Wan2.5 combines text, image, and video modalities to generate high-quality videos suitable for multilingual teaching, international promotion, and cross-platform content creation, enhancing audience engagement and reach.

Emily R.
Wan2.5 has completely changed how I create content. I can generate high-quality videos from just a text prompt in minutes — it saves me so much time!
David L.
The audio-visual synchronization is incredible. I’ve never seen an AI model match voice, sound effects, and music to the visuals this accurately.
Sophia M.
I use Wan2.5 for marketing videos, and the results are stunning. Even complex animations come out professional without any manual editing.
James K.
As someone who isn’t skilled in video editing, Wan2.5 has made it possible for me to produce cinematic-quality content with ease.
Olivia T.
The multimodal capabilities are fantastic — I can start from text, an image, or even an audio clip, and Wan2.5 understands it all.
Michael B.
I love that it supports 1080P at 24fps. The videos are smooth, realistic, and ready for professional use right out of the AI.
Wan2.5 is a multimodal AI generation model developed by Alibaba, supporting text-to-video, image-to-video, text-to-image, and image editing. It also offers synchronized audio-visual generation.
You can generate videos with human voice, sound effects, and music, as well as high-resolution images or edited images from prompts, existing media, or audio input.
Wan2.5 supports HD 1080P videos at 24 frames per second, delivering smooth and professional-quality video outputs.
No. Wan2.5 is designed to lower the creative barrier. You can create professional-grade content by simply providing prompts, images, or audio.
Currently, video length is limited by system constraints, but Wan2.5 can generate continuous video sequences using frame-to-frame techniques for longer projects.
Wan2.5 can automatically generate matching human speech, sound effects, and background music that align with the video visuals, ensuring seamless audio-visual integration.
The model supports multiple languages for text prompts and speech output, making it suitable for global content creation.
