Google engineers are developing a system that can isolate a single voice in a crowd.
According to a team from Google Research, the tool uses deep learning to mimic what is known as the “cocktail party effect,” the human ability to focus on a single voice in a loud environment.
A research paper outlining the project says the system “incorporates both visual and auditory signals” in order to achieve its goal.
“We demonstrate the applicability of our method to classic speech separation tasks, as well as real-world scenarios involving heated interviews, noisy bars, and screaming children, only requiring the user to specify the face of the person in the video whose speech they want to isolate,” the paper says.
A video released by the team shows the technology working in several different scenarios.
“To train our model, we collected 90,000 high-quality lectures, TED talks and how-to videos from YouTube, then automatically extracted from these videos roughly 2000 hours of video clips with visible speakers and clean speech with no interfering sounds,” the team added.
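One reason clean, single-speaker clips are useful for training is that they can be summed into artificial "crowd" recordings for which the correct isolated voice is already known. The sketch below illustrates that idea only; it is not the team's pipeline. The pure tones stand in for speech, and the names `tone` and `mix` are hypothetical helpers invented for this example.

```python
import math

def tone(freq_hz, n_samples=160, rate=16000):
    """Hypothetical stand-in for a clean speech clip: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * t / rate) for t in range(n_samples)]

def mix(clip_a, clip_b, gain_b=1.0):
    """Sum two clips sample by sample, truncating to the shorter one."""
    n = min(len(clip_a), len(clip_b))
    return [clip_a[i] + gain_b * clip_b[i] for i in range(n)]

# Two "clean" clips (assumed to share the same sample rate).
speaker_a = tone(220.0)
speaker_b = tone(330.0)

# A synthetic noisy recording: the input a separation model would see.
mixture = mix(speaker_a, speaker_b)

# Because the mixture was built from known parts, the target output
# (speaker_a alone) is available as ground truth for training.
residual = [m - b for m, b in zip(mixture, speaker_b)]
```

Subtracting `speaker_b` from the mixture recovers `speaker_a` here only because both parts are known in advance; a trained model would have to estimate the target voice from the mixture (and, in Google's system, the speaker's face) alone.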
Google says it expects the new system to benefit numerous other technologies, including its own products.
“We envision a wide range of applications for this technology,” Google said. “We are currently exploring opportunities for incorporating it into various Google products.”
As noted by Catalin Cimpanu, Security News Editor for Bleeping Computer, the tool is also likely to be used for surveillance.
“The system can also be deployed with CCTV systems to aid authorities isolate a single person’s voice inside noisy audio tracks recorded by video surveillance cameras,” Cimpanu says.