Researchers Can Now Hear Your Conversations Through a Potato Chip Bag—Find Out How!
MIT researchers use video analysis to recover speech and other sounds from the vibrations of everyday objects, opening new possibilities in surveillance and material analysis.
Researchers from MIT, Microsoft, and Adobe have created an innovative algorithm capable of reconstructing audio by analyzing the minute vibrations of objects captured in video footage. This breakthrough was demonstrated by extracting intelligible speech from the subtle movements of a potato-chip bag, filmed through soundproof glass at a distance of 15 feet.
The technology also successfully extracted audio from videos of everyday items like aluminum foil, a glass of water, and plant leaves.
The findings will be presented at the SIGGRAPH conference. The experiments used both high-speed cameras, recording thousands of frames per second, and standard digital cameras, whose rolling-shutter sensors can be exploited to recover vibrations at frequencies above their 60-frames-per-second capture rate.
While the high-speed camera produced more accurate audio reconstructions, the standard camera still recovered useful information, such as identifying characteristics of the speakers in a room.
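The core idea is that a camera samples an object's vibration once per frame, so a sequence of frames can be read as an audio signal sampled at the frame rate. The sketch below illustrates that sampling principle only; it is a toy stand-in, not the researchers' actual phase-based motion-analysis pipeline, and the synthetic "frames," frame rate, and tone frequency are all invented for the demo.

```python
import numpy as np

def recover_signal(frames):
    """Turn a sequence of frames into a crude audio signal.

    Hugely simplified proxy for the real method: each frame's mean
    brightness deviation from the first frame becomes one sample,
    sampled at the camera's frame rate.
    """
    baseline = frames[0].astype(np.float64)
    samples = np.array([(f.astype(np.float64) - baseline).mean() for f in frames])
    samples -= samples.mean()          # remove DC offset
    peak = np.abs(samples).max()
    return samples / peak if peak > 0 else samples

# Synthetic demo: a 2 kHz tone modulates frame brightness, "filmed"
# at 20,000 frames per second (the high-speed-camera regime).
fps, tone_hz, n_frames = 20_000, 2_000, 2_000
t = np.arange(n_frames) / fps
frames = [np.full((8, 8), 128.0) + 5.0 * np.sin(2 * np.pi * tone_hz * ti)
          for ti in t]
signal = recover_signal(frames)

# The recovered signal's dominant frequency matches the tone.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n_frames, d=1 / fps)
print(freqs[spectrum.argmax()])  # → 2000.0
```

This also shows why frame rate matters: a 60 fps camera sampling this way could only capture frequencies up to 30 Hz, which is why the real work needed either high-speed cameras or the rolling-shutter trick to reach into the range of human speech.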
The technique not only has significant implications for law enforcement and forensic investigations but also introduces a novel method for exploring the acoustic and material properties of various objects. This method represents a substantial advancement in how visual information can be used to infer sound, suggesting a myriad of unforeseen applications.
P.S. Guess what? This technology was developed 10 years ago, back in 2014!
Whoa… scary weird.
Why can't they just leave us the "F" alone?