Scientists film throat movements to decode the spoken word
Eavesdroppers soon might have another way to monitor far-off conversations. All they will need is the right camera, pointed at a speaker’s throat.
When you talk, your voice box jiggles and wiggles the skin on your neck. Those vibrations can reveal to an eavesdropper what you’re saying, reports a new study. Its authors include engineer Yasuhiro Oikawa and other researchers from Waseda University in Tokyo, Japan.
The team used a high-speed camera to record thousands of images per second of a speaker’s throat. Those images captured the throat as it vibrated. The team then fed that vibration data into a computer program. The program could then re-create the sound of the person speaking.
Like lipreading, the new technique relies on sight to decipher speech. Using computers for the job isn’t new, either: Lipreading software already exists. But most of those programs monitor the motions of a speaker’s lips, jaws and tongue. As a result, they can identify only what words were said, not how they were said, Oikawa told Science News.
Words are important, he said, but so is the sound of speech. That’s because people communicate thoughts and emotions through volume, pitch and tone too. Pitch is the highness or lowness of a sound. And a person changes the tone of their voice to express a particular feeling or mood.
Oikawa presented the findings in June at a meeting of researchers who study acoustics, or the science of sound. But don’t go looking for the new spying software just yet. Oikawa and his coworkers have only tested the technology. They have not created a final product.
The high-speed camera the scientists used recorded 10,000 frames per second. A frame is a single image. By comparison, most theaters show movies at a rate of 24 frames per second. With such a speedy camera, the scientists could capture even the tiny, rapid movements of a person’s throat.
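Why does the frame rate matter so much? A camera samples motion the way a digital recorder samples sound. A standard rule of signal processing (the Nyquist limit) says a sampler can follow only vibrations slower than half its sampling rate. The short Python sketch below works through that arithmetic for the two frame rates mentioned above. The function names and the 200-hertz example pitch are illustrative assumptions, not details from the study.

```python
def max_capturable_frequency(frames_per_second):
    """Nyquist limit: vibrations must be slower than this (in hertz)
    for a camera with the given frame rate to follow them."""
    return frames_per_second / 2


def frames_per_cycle(frames_per_second, vibration_hz):
    """How many images the camera takes during one back-and-forth vibration."""
    return frames_per_second / vibration_hz


HIGH_SPEED_FPS = 10_000   # frame rate reported for the study's camera
MOVIE_FPS = 24            # typical movie-theater frame rate
EXAMPLE_PITCH_HZ = 200    # assumed voice pitch, chosen only for illustration

for label, fps in (("high-speed camera", HIGH_SPEED_FPS), ("movie camera", MOVIE_FPS)):
    limit = max_capturable_frequency(fps)
    per_cycle = frames_per_cycle(fps, EXAMPLE_PITCH_HZ)
    print(f"{label}: limit {limit:g} Hz, {per_cycle:g} frames per {EXAMPLE_PITCH_HZ} Hz cycle")
```

Under these assumptions, the high-speed camera takes 50 images during each cycle of a 200-hertz vibration. A 24-frames-per-second camera takes far fewer than one image per cycle, so it would miss the vibration entirely.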
Two volunteers agreed to have their throats recorded as they spoke the Japanese word tawara, which means straw bale or bag. The scientists then used a computer program to analyze the movie footage and re-create the sound of the person speaking. The reproduced sound was clear enough to understand the word, Oikawa reported. He hopes to be able to record and play back an entire sentence before the end of the year.
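The researchers’ actual program is not described here, but the basic idea of turning measured vibrations into sound can be sketched in a few lines of Python. The sketch assumes the video has already been reduced to one number per frame: how far the skin has moved. It then treats those numbers as audio samples and saves them as a WAV sound file. Every function name is hypothetical, and the "measurement" is a simulated pure tone rather than real throat data.

```python
import math
import struct
import wave

FRAME_RATE = 10_000  # camera frames per second, as reported for the study


def simulate_throat_displacement(pitch_hz, seconds, frame_rate=FRAME_RATE):
    """Stand-in for real measurements: one skin displacement value per frame.
    A real system would estimate these values from the video images."""
    n_frames = int(seconds * frame_rate)
    return [math.sin(2 * math.pi * pitch_hz * i / frame_rate) for i in range(n_frames)]


def displacement_to_pcm(samples):
    """Center the signal on zero and scale it to 16-bit audio sample values."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    peak = max(abs(s) for s in centered) or 1.0  # avoid dividing by zero
    return [int(32767 * s / peak) for s in centered]


def write_wav(path, pcm, frame_rate=FRAME_RATE):
    """Save the samples as a mono, 16-bit WAV file.
    Each camera frame becomes one audio sample."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)          # 2 bytes = 16 bits per sample
        wav_file.setframerate(frame_rate)
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
```

To try it, pass the output of `simulate_throat_displacement(200, 0.5)` through `displacement_to_pcm` and hand the result to `write_wav` along with a file name. Real throat footage would be far messier than this clean tone, so a working system would also need steps to measure the skin’s motion in each image and to filter out noise.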
Other scientists who did not work on the experiment reacted with excitement, caution and skepticism. Physicist Claire Prada from the National Center for Scientific Research in Paris told Science News the finding is promising but proves only that the technology works in principle.
Meanwhile, Weikang Jiang said he would have liked to hear the computer sounds for himself. Oikawa showed pictures of the sound waves but did not let the audience hear the actual sounds. “He didn’t show us the results,” said Jiang, a mechanical engineer at Shanghai Jiao Tong University in China.
Stay tuned: Oikawa will continue to work on his eavesdropping camera. For his next project, he’s training the camera on the wobbling skin of a person’s cheek. There, small but visible changes in the skin may help him fine-tune the ability to re-create a speaker’s voice.
engineering: The use of math and science to solve practical problems.
acoustics: The science related to sounds and hearing.
physics: The scientific study of the nature and properties of matter and energy.
voice box: The hollow, muscular organ forming an air passage to the lungs and holding the vocal cords in people and other mammals. It’s also known as the larynx.
pitch: The highness or lowness of a sound, determined by how fast the source of the sound vibrates (its frequency).
tone: Changes in a voice that express a particular feeling or mood.
sound wave: A wave that transmits sound. Sound waves have alternating swaths of high and low pressure.