Microsoft has introduced VASA-1, an AI tool that turns a still photo and a voice sample into a video that looks and sounds like a real person. From a single portrait-style image and an audio file, VASA-1 generates a short talking-head video with realistic facial expressions and head movements, and it can even make the subject sing along to the uploaded audio.
Microsoft says that VASA-1 is currently only a research project and is not making it available to others, but it has published demo videos of the tool. The company said the tool was designed specifically for animating virtual characters, and all individuals in its samples are synthetic, created using OpenAI’s DALL-E image generation model.
In the demo video, the talking heads look like real people on film, with smooth, natural-looking movements. Their lip-syncing is particularly impressive, and it’s hard to spot any unnatural movement. It’s also impressive that VASA-1 doesn’t require a traditional, forward-facing, passport or portrait-style photo to work. The examples include shots where the heads face slightly different directions, and the model also offers a high level of control, accepting eye gaze direction, head distance, and even emotional expression as inputs, which adds to the realism.
Interesting as Microsoft’s new technology is, it could also be misused, since it is likely to attract the attention of people who create deepfake videos. This may be why the company does not want to make the tool available to others at this time.
Source link: https://webrazzi.com/2024/04/19/microsofttan-fotograflari-konusturan-yapay-zeka-araci-vasa-1/