
Meta’s latest auditory AIs promise a more immersive AR/VR experience

by Patricia R. Mills

As Meta CEO Mark Zuckerberg envisions it, the Metaverse will be a fully immersive virtual experience that rivals reality, at least from the waist up. But the visuals are only part of the overall Metaverse experience.

“Getting spatial audio right is key to delivering a realistic sense of presence in the metaverse,” Zuckerberg wrote in a Friday blog post. “If you’re at a concert or just talking with friends around a virtual table, a realistic sense of where sound is coming from makes you feel like you’re there.”

The blog post notes that the same concert will sound very different performed in a full-sized concert hall than in a middle school auditorium because of the differences between their physical spaces and acoustics. As such, Meta’s AI and Reality Lab (MAIR, formerly FAIR) is collaborating with researchers from UT Austin to develop a trio of open-source audio “understanding tasks” to help developers build more immersive AR and VR experiences with more lifelike audio.

Want to hear what the NY Philharmonic would sound like inside San Francisco’s Boom Boom Room? Now you can. The first is MAIR’s Visual Acoustic Matching model, which can adapt a sample audio clip to any given environment using a picture of the space. Previous simulation models could recreate a room’s acoustics based on its layout, but only if the precise geometry and material properties were already known, or from audio sampled within the space; neither approach produced particularly accurate results.

MAIR’s solution is the Visual Acoustic Matching model, called AViTAR, which “learns acoustic matching from in-the-wild web videos, despite lacking acoustically mismatched audio and unlabeled data,” according to the post.
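For a rough sense of what acoustic matching means at the signal level, the classical, non-learned version of the idea is to convolve a “dry” recording with a room impulse response that captures how a space reflects sound. The sketch below is illustrative only and is not Meta’s code: the impulse response is synthetic exponentially decaying noise, and the `synthetic_rir` and `match_acoustics` helpers, the sample rate, and the RT60 values are assumptions standing in for whatever transformation AViTAR would infer from a photo of the room.

```python
# Illustrative sketch only: AViTAR infers a room's acoustic transformation from an
# image; the classical equivalent is convolving "dry" audio with a room impulse
# response (RIR). The RIR here is synthetic (exponentially decaying noise), a
# stand-in for whatever the model would predict for a given space.
import numpy as np
from scipy.signal import fftconvolve

SR = 16_000  # sample rate in Hz (assumed for the example)

def synthetic_rir(rt60: float, sr: int = SR) -> np.ndarray:
    """Toy room impulse response: white noise with an exponential decay
    whose length is governed by the reverberation time (RT60)."""
    n = int(rt60 * sr)
    t = np.arange(n) / sr
    decay = np.exp(-6.9 * t / rt60)   # roughly 60 dB of decay over rt60 seconds
    rir = np.random.randn(n) * decay
    return rir / np.max(np.abs(rir))

def match_acoustics(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Place a dry recording 'inside' a room by convolving it with that room's RIR."""
    wet = fftconvolve(dry, rir)[: len(dry)]
    return wet / np.max(np.abs(wet))

# Example: the same one-second dry tone rendered in a small room vs. a concert hall.
dry = np.sin(2 * np.pi * 440 * np.arange(SR) / SR)
small_room = match_acoustics(dry, synthetic_rir(rt60=0.3))
concert_hall = match_acoustics(dry, synthetic_rir(rt60=2.0))
```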

“One future use case we are interested in involves reliving memories,” Zuckerberg wrote, betting on nostalgia. “Imagine being able to put on a pair of AR glasses and see an object with the option to play a memory associated with it, such as picking up a tutu and seeing a hologram of your child’s ballet recital. The audio strips away reverberation and makes the memory sound like when you experienced it, sitting in your exact seat in the audience.”

MAIR’s Visually-Informed Dereverberation model (VIDA), on the other hand, will strip the echoey effect from playing an instrument in a large, open space like a subway station or cathedral. You’ll hear just the violin, not the reverberation of it bouncing off distant surfaces. Specifically, it “learns to remove reverberation based on both the observed sounds and the visual stream, which reveals cues about room geometry, materials, and speaker locations,” the post explained. This technology could be used to isolate vocals and spoken commands more effectively, making them easier for both humans and machines to understand.
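To show what “stripping the reverberation” means in signal terms, here is a deliberately crude, non-learned baseline that suppresses estimated late reverberation in the short-time Fourier domain. It is not VIDA: the `dereverb` helper, the fixed `delay_frames` and `decay` parameters, and the spectral-subtraction approach are assumptions for illustration, whereas VIDA learns the mapping from both the audio and the accompanying video.

```python
# Crude, non-learned illustration of dereverberation: suppress the estimated
# late-reverberation energy in the STFT domain. VIDA learns this mapping from
# audio plus video; this sketch only shows what "stripping the echo" means at
# the signal level, using a fixed decay assumption instead of a trained model.
import numpy as np
from scipy.signal import stft, istft

def dereverb(wet: np.ndarray, sr: int = 16_000,
             delay_frames: int = 6, decay: float = 0.5) -> np.ndarray:
    f, t, Z = stft(wet, fs=sr, nperseg=512)
    mag, phase = np.abs(Z), np.angle(Z)

    # Estimate late reverberation as a delayed, attenuated copy of the magnitude
    # spectrogram, then subtract it (floored at a small fraction of the original).
    late = np.zeros_like(mag)
    late[:, delay_frames:] = decay * mag[:, :-delay_frames]
    clean_mag = np.maximum(mag - late, 0.1 * mag)

    _, clean = istft(clean_mag * np.exp(1j * phase), fs=sr, nperseg=512)
    return clean[: len(wet)]
```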

VisualVoice does the same as VIDA, but for voices. During its self-supervised training sessions, it uses visual and audio cues to learn how to separate voices from background noise. Meta anticipates this model seeing heavy use in machine-understanding applications and in improving accessibility: think more accurate subtitles, Siri understanding your request even when the room isn’t dead silent, or the acoustics of a virtual chat room shifting as speakers move around the digital space. Again, ignore the lack of legs.
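Separation systems of this kind typically work by predicting a time-frequency mask over the mixture’s spectrogram. The sketch below demonstrates that masking principle with an “oracle” mask computed from known sources; the `oracle_separate` helper and its parameters are illustrative assumptions, and the hard part VisualVoice actually tackles, predicting such a mask from audio and visual cues alone, is not shown.

```python
# Separation models such as VisualVoice typically predict a time-frequency mask
# over the mixture's spectrogram. This sketch demonstrates the masking principle
# with an "oracle" ratio mask computed from the known sources; the real model's
# job is to predict such a mask from audio and visual cues alone.
import numpy as np
from scipy.signal import stft, istft

SR, NPERSEG = 16_000, 512

def oracle_separate(voice: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Recover the voice from (voice + noise) using an ideal ratio mask."""
    mixture = voice + noise
    _, _, V = stft(voice, fs=SR, nperseg=NPERSEG)
    _, _, N = stft(noise, fs=SR, nperseg=NPERSEG)
    _, _, M = stft(mixture, fs=SR, nperseg=NPERSEG)

    # Ideal ratio mask: the voice's share of the energy in each time-frequency bin.
    mask = np.abs(V) ** 2 / (np.abs(V) ** 2 + np.abs(N) ** 2 + 1e-10)

    _, estimate = istft(mask * M, fs=SR, nperseg=NPERSEG)
    return estimate[: len(voice)]
```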

“We envision a future where people can put on AR glasses and relive a holographic memory that looks and sounds the exact way they experienced it from their vantage point, or feel immersed by not just the graphics but also the sounds as they play games in a virtual world,” Zuckerberg wrote, noting that AViTAR and VIDA can only apply their tasks to the one picture they were trained for and will need a lot more development before public release. “These models are bringing us even closer to the multimodal, immersive experiences we want to build in the future.”

