AI-generated art is popping up everywhere, but that’s just the beginning. Microsoft recently released a new artificial intelligence tool called VALL-E, which is similar to DALL-E but for voices. After just three seconds of audio, VALL-E can replicate any voice.
If that sounds scary, that’s because it is. And that is not all. According to AItopics, Microsoft’s new tool easily matches emotion and tone, which many speech AI tools struggle with. The team trained VALL-E on approximately 60,000 hours of English language data, and it demonstrated in-context learning abilities and was even able to replicate words it had never heard before.
The report states that VALL-E is capable of prompt-based text-to-speech (TTS), follows context, and requires no pre-designed acoustic features or additional structure engineering to deliver a high-quality audio sample. Basically, this new AI tool is pretty impressive. All VALL-E needs is about three seconds of any voice, and it can mimic that voice quickly and easily.
There are several audio samples from the tool on GitHub, and while some sound great, others aren’t as impressive and have a robotic tone. But when it works, it works very well. We are still in the early days of VALL-E, though, and things will likely get better over time. It would also likely be more accurate if the team used larger samples.
It’s important to note that VALL-E isn’t open to the public, at least not yet, so we can all breathe a sigh of relief. If and when it is released, there will no doubt be a slew of safety, social, and ethical concerns, to say the least. While this technology certainly sounds impressive, it’s also pretty wild.
via Windows Central