Enthusiasts Smitten by Image to Video Tool – VASA-1

As the race for AI supremacy continues, Microsoft now wants to turn people’s portrait pictures into talking-face videos with its latest tool, VASA-1.

According to a research paper by the tech giant, Microsoft is taking the AI race to another level with VASA-1, a framework for creating lifelike talking faces of virtual characters with visual affective skills (VAS), all from a single portrait.

From portraits to talking faces

Although it is not yet available to the public, the tool takes a single portrait photo and a speech audio clip and produces a hyper-realistic talking-face video with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements, all generated in real time.

The tool is still at the research-preview stage within Microsoft Research, though the demo videos “look impressive.”

While companies like Nvidia and Runway already offer similar head-movement and lip-sync technology, VASA-1 seems “to be of a much higher quality and realism,” with fewer mouth artifacts, according to Tom’s Guide.

This approach to audio-driven animation is also similar to the recent Vlogger AI model from Google Research.

According to Microsoft, all the images in the demonstration examples are synthetic, created with DALL-E, but VASA-1 can also animate a real photo.

The demo shows different people talking with near-natural head movements, facial expressions, and eye movements, with “no artifacts around top and bottom of the mouth seen in other tools.”

It also does not require a front-facing, portrait-style image to work.

VASA-1 got people talking

AI enthusiasts already seem smitten by the technology, describing it as “wild” and “insane” on X.

“The improvements we’re getting between each release is incredible,” said Linus Ekenstam.

Others believe the world is witnessing a “seismic shift in the way media content is created” and consumed.

“This is mind blowing, the realism is top notch,” said another enthusiast identified as Sam.

Others, while recognizing the tool’s abilities, think it is somewhat irresponsible of Microsoft to unveil a tool that could easily be misused to create election deepfakes.

“Wild to drop this right before the election,” wrote Rowan Cheung on X.

Another user, Evan Kirstel, praised the tool but paired it with a stern warning: “Microsoft Research’s VASA-1 is a game-changer, creating hyper-realistic AI-generated videos from just a photo and audio.”

“The possibilities are endless, from reviving classic cinema legends to personalized media. But let’s stay alert to deepfake risks.”

The world has already seen an influx of election deepfakes in which politicians’ voices or images are manipulated with AI to spread propaganda, and about a third of the global population is heading to the polls this year.

However, Microsoft’s researchers have indicated that the work is only a research demonstration, and there are currently no plans to release it publicly or make it available to developers.

What can VASA-1 do?

According to Tom’s Guide, the researchers themselves were surprised by the model’s ability to “perfectly lip-sync to a song, reflecting the words from the singer without issue despite no music being used in the training dataset.”

VASA-1 also handled different image styles, including historical portraits such as the famous Mona Lisa.

Thanks to its advanced lip-sync abilities, the tool could be used in gaming, which experts say could be a game changer for immersion.

The technology could also be instrumental in creating avatars for social media videos, as firms like Synthesia and HeyGen already do.

AI-based film and music video productions could also use VASA-1 technology to make their videos more realistic.

Given Microsoft’s stake in OpenAI, VASA-1 could also become part of a “future Copilot Sora integration.”

Image credits: Shutterstock, CC images, Midjourney, Unsplash.