OpenAI's speech engine was first developed in late 2022.
OpenAI is making its artificial intelligence (AI) more human, and a little creepier, with a text-to-speech tool that takes a 15-second clip of someone's voice and generates natural-sounding speech that sounds like the original speaker.
But even OpenAI is wary of the potential for misuse of the technology, saying it won't release Voice Engine to the public as it's currently only available to early testers.
The San Francisco-based company said in a statement that it “recognizes the serious risks of producing audio that resembles people's voices, and that is a top priority, especially during an election year.”
Voice cloning AI technology is not new and is already being used in some concerning circumstances.
Ahead of a U.S. primary vote in January, an AI-generated robocall imitating President Joe Biden was sent to thousands of voters, urging them to stay home rather than vote.
As a result, the US Federal Communications Commission (FCC) banned AI-generated robocalls last month.
But it's not just elections that could be affected by voice cloning technology and deepfakes: extortion and fraud scams using AI voice impersonation are also a growing concern.
But the technology can also be used for good. OpenAI demonstrated how it can help patients with sudden or degenerative speech disorders regain their voices, using video and audio material recorded before they lost the ability to speak.
Another use case, according to OpenAI, is providing natural, non-robotic voices to people who cannot speak or have difficulty speaking.
“These small-scale deployments help us think about our approach, safeguards, and how Voice Engine can be used beneficially in a variety of industries,” OpenAI said in a blog post.
Voice Engine is currently available only to a few OpenAI partners, who the company says have agreed to usage policies that prohibit impersonating other people or organizations without their consent.
Companies with access to Voice Engine include education technology company Age of Learning, visual storytelling platform HeyGen, and healthcare system Lifespan.
OpenAI said another safety measure is watermarking, which allows the origin of any audio produced by Voice Engine to be traced. Partners must also obtain “explicit and informed consent” from the original speaker.
OpenAI said it believes any broader deployment of synthetic voice technology should be accompanied by a voice authentication experience that verifies the original speaker knowingly added their voice to the service, as well as a list of banned voices to detect and prevent the creation of voices that are too similar to those of prominent figures.