Filed under AI/Artificial Intelligence, Ethical issues.

Microsoft recently unveiled its cutting-edge text-to-speech AI language model VALL-E, which it claims can mimic any voice — including its emotional tone, vocal timbre and even the background noise — after training on just three seconds of audio.
The researchers believe VALL-E could work as a high-quality text-to-speech synthesizer, as well as a speech editor that could doctor audio recordings to include phrases not originally said. Coupled with generative AI models like OpenAI’s GPT-3, the developers say VALL-E could even be used in original audio content creation.
The development has some experts sounding alarm bells over the technology’s potential for misuse; through VALL-E and other generative AI programs, malicious actors could mass-produce audio-based disinformation at unprecedented scale, sources say.

Source: Toronto Daily Star

Date: January 13th, 2023

Link to research paper with some really cool examples of the input and output:


  1. How could this be used in a very bad way?
  2. Is there anything to be done to stop VALL-E being used in a very bad way?
