Baidu’s AI research team has developed a neural network that can mimic a voice from a sample less than a minute long. The software can also alter the voice to reflect a different gender or accent.
Baidu, often called the Google of China, has just released a white paper showing its latest development in artificial intelligence (AI): a neural network-based program that can clone a voice after analyzing a clip only seconds long. Not only can the software mimic an input voice, but it can also change it to reflect another gender or even a different accent.
You can listen to some of the generated examples here, hosted on GitHub.
Previous iterations of this technology needed longer voice samples to clone a voice. In 2017, the Baidu Deep Voice research team introduced technology that could clone voices with 30 minutes of training material. Adobe has a program called VoCo that can mimic a voice with only 20 minutes of audio. One Canadian startup, Lyrebird, can clone a voice with only one minute of audio. Baidu’s innovation has further cut that time to mere seconds.
While at first this may seem like an upgrade to tech that became popular in the 90s, with the help of “Home Alone 2” and the “Scream” franchise, there are actually some noble applications for this technology. For example: imagine your child being read to in your voice when you’re far away, or having a duplicate voice created for a person who has lost the ability to talk. This tech could also be used to create personalized digital assistants and more natural-sounding speech translation services.
However, as with many technologies, voice cloning also comes with the risk of abuse. New Scientist reports that the program was able to produce one voice that fooled voice recognition software with greater than 95 percent accuracy in tests. Human listeners even gave the cloned voice a score of 3.16 out of 4. This could open up the possibility of AI-assisted fraud.
Programs already exist that can use AI to replace, alter, or even generate from scratch the faces of individuals in videos. Right now, this is mostly being used on the internet for laughs, such as inserting Nicolas Cage into the “Lord of the Rings” series. But coupled with tech that can clone voices, we could soon be bombarded with more “fake news” showing politicians acting out of character or saying things they never would.
It’s already very easy to fool swathes of people using just the written word or Photoshop; there could be even more trouble if these technologies fell into the wrong hands.