Artificial Speech Mimics Real Thing

November 02, 2011

At a University of Arizona laboratory, Professor Brad Story analyzes the voices of singers, actors and public speakers both live and on audio tape.

Computers break down speech digitally. Magnetic-resonance equipment freezes images of the throat, tongue, teeth, and lips as they create sound.

This is pure research, but it has practical applications. Story's findings may one day help performers as well as people with Parkinson's disease and other medical conditions. For performers, it may not be long before singers can employ synthesized backup singers.

For patients, this research can help the medical community develop therapies that improve quality of speech.

Story is especially fascinated by synthetic voices, used in everything from video games to machines that translate documents into spoken words for the visually impaired. He is working to make these artificial voices sound more human by re-creating the complicated physics that go into the production of speech.

These include the intake of air into the lungs, the mechanics of the vocal folds in the larynx, and the movement of acoustic waves through the vocal tract.

Typical synthesized speech sounds robotic because it's difficult to duplicate human anatomy. Story says that's because it takes more muscles to say a few words than it does to play the piano.

Story says singing is in some ways easier to simulate, because it tends to have long, sustained vowels, a more precise pitch, and a musical score to mask the imperfections. There is already a singing synthesizer called "Vocaloid," marketed by the Japanese company Yamaha, that allows users to compose and perform new songs.

Once scientists succeed in duplicating whole blocks of speech, Story says, surgeons will be able to predict how patients will sound after throat surgery, and synthesized speech can be made to sound realistic for patients who cannot speak at all.

And some day, he confidently predicts, the artificial speech the general public hears on the telephone, in games, and on buses and subways will sound so natural that we'll have a difficult time telling the difference between real and made-up human voices.