Spatial (3D) Audio - VisiSonics & Vuze+ camera

Author: Michel Henein, VP Product at VisiSonics Corporation.

Through millions of years of evolution, sound perception became crucial to the human experience. For survival, our brains collect information about the environment through subtle changes in sound: whether a space is small or large, indoors or outdoors, where threats are located, and so on. We perceive sound from all directions at once, which is quite different from our foveated vision. Our eyes provide only a partial ‘field of view’ of the world around us, so to see something behind you, for example, you have to turn your head. With sound, you perceive something behind you without needing to turn your head.

At CES 2017, Humaneyes and VisiSonics announced a partnership bringing 3D audio capabilities to the VUZE+ camera, based on over a decade of research and development at the University of Maryland.

While attempting to solve perceptual issues associated with sound simulation in virtual environments, researchers discovered (among many things) that the area of the human brain dedicated to aural perception is similar in size to the area dedicated to visual perception, and that both are part of a larger perceptual system. From a simulation perspective, if the aural cues do not match what is happening visually, the brain experiences fatigue, which discourages users from continuing in a virtual experience. For VR/AR/MxR adoption (including stereo 360 video), 3D audio is critical to delivering the level of immersion that the human brain expects; anything less breaks immersion in the virtual experience.

360 Degree Storytelling

When it comes to storytelling, sound in traditional cinema is used to support the visual field in front of the audience. Cinematic sound is mixed in a way that keeps the audience from looking behind them, for example, because all of the relevant aspects of the story are always conveyed in front.

With VR/AR/MxR, the story can be conveyed in any direction, so content creators must approach sound differently than in traditional media. If you want the user to look in a certain direction at a particular point in time, sound can be used as a prompt (e.g., a “look here” voice-over). In the “look here” example, 3D audio is key to rendering the voice spatially so that the user can turn to look in the right place.

3D Audio: Object-based vs. Ambisonics

Real-time rendered VR experiences typically use object-based 3D audio, which uses a graphics-like audio engine to propagate sound through a virtual scene (as a plugin for Unreal or Unity, for example). Several object-based 3D audio renderers exist, including VisiSonics’ own RealSpace 3D Audio (RS3D), which uses physics-based sound propagation for the most accurate and immersive 3D audio.

For 360 video, capturing audio as object data is too computationally intensive and requires a player that can read the scene data (such players are being developed, including by VisiSonics). A commonly used intermediate solution is Ambisonics, which represents the sound field near a listener using mathematical functions called spherical harmonics. Developed in the 1970s under the direction of the British National Research Development Corporation, Ambisonics is a sound-encoding method that stores 3D audio as a full sphere, using a minimum of four audio signals for first order and more signals as the order increases. YouTube and Facebook have adopted Ambisonics as the 3D audio system for their 360 video players because lower-order Ambisonics has modest run-time processing requirements. In addition, Ambisonics sound fields can be rotated in yaw, pitch, and roll for use with head-mounted displays.
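Two of the properties mentioned above can be sketched in a few lines of Python: how the channel count grows with Ambisonics order, and how a first-order sound field is rotated in yaw. This is a minimal illustration, not the code used by any particular player. The function names are hypothetical, and the axis convention (X front, Y left, Z up, positive yaw counter-clockwise seen from above) is an assumption that differs between tools.

```python
import math


def ambisonic_channel_count(order: int) -> int:
    """Number of channels in a full-sphere Ambisonics signal of a given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def rotate_yaw(w, x, y, z, yaw_radians):
    """Rotate one first-order B-format frame about the vertical axis.

    Assumed convention: X points front, Y points left, Z points up, and a
    positive yaw turns the sound field counter-clockwise seen from above.
    W (omnidirectional) and Z (height) are unaffected by a yaw rotation.
    """
    c, s = math.cos(yaw_radians), math.sin(yaw_radians)
    return (w, x * c - y * s, x * s + y * c, z)


print(ambisonic_channel_count(1))  # 4
print(ambisonic_channel_count(7))  # 64
```

To compensate for head tracking under this convention, a player would rotate the sound field by the negative of the listener's head yaw, so that sources stay fixed in the world as the head turns.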

How does the VUZE+ audio system work?

When designing the VUZE+ camera, Humaneyes added a microphone array to capture 3D audio and 360 video simultaneously. The VUZE+ comes equipped with four microphones arranged on a horizontal plane, which capture four channels of audio. The captured audio is then downloaded from the camera as four audio files to the VUZE+ VR Studio application, where it is converted to an Ambisonics sound field using special code developed by VisiSonics.

Because the four microphones sit on a flat horizontal plane, the VUZE+ captures sound across the full 360 degrees of that plane (front and rear) but not from above or below. With RealSpace 360 (RS360) Cinema, the audio post-production software developed by VisiSonics, users can add further audio tracks anywhere in a full-spherical sound field, including above and below. For example, a user can add ‘sweetening tracks’ to an outdoor scene recorded with the VUZE+ camera using RS360.

1st Order, B-Format Ambisonics. The VUZE+ camera does not capture the Z axis,
but using RS360, users can add additional tracks in full 3D.

In the VUZE+ VR Studio software from Humaneyes, audio captured from the camera is converted to Ambisonics: the four microphone signals become a four-channel, 1st order, B-format Ambisonics file. The Z channel is not used, since height is not recorded.
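To see why a planar recording leaves the Z channel empty, it helps to look at how a single source maps onto the four first-order channels. The sketch below uses the textbook B-format encoding equations for a point source; it is not the VisiSonics microphone-conversion code, which is not described here. The function name and the angle convention (azimuth counter-clockwise from front, elevation up from the horizontal plane) are assumptions for illustration.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample as a first-order B-format frame (W, X, Y, Z).

    Uses the traditional B-format weighting, in which the omnidirectional
    W channel carries the signal scaled by 1/sqrt(2).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)  # front/back
    y = sample * math.sin(az) * math.cos(el)  # left/right
    z = sample * math.sin(el)                 # up/down
    return (w, x, y, z)


# A source at ear level contributes nothing to Z ...
horizontal = encode_first_order(1.0, azimuth_deg=90.0)
# ... while a source directly overhead lives almost entirely in W and Z.
overhead = encode_first_order(1.0, azimuth_deg=0.0, elevation_deg=90.0)
print(horizontal[3], overhead[3])
```

Other Ambisonics conventions, such as AmbiX, use a different channel order and normalization, so the exact numbers depend on the target format.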

Once the Ambisonics output file is generated from VR Studio, the file can be imported into RealSpace 360 Cinema.

Introducing RealSpace 360 (RS360) Cinema for VUZE+ users

Available as a complement to the VUZE+ VR Studio software, RS360 is a 3D audio authoring tool for 360 content creators enabling the following:

  • Import of various audio and video formats (including stereoscopic video, Ambisonics up to 7th order, mono, and stereo audio formats)
  • Authoring of spatial audio paths with keyframes, in a highly visual and intuitive workflow that does not require a professional audio workstation
  • Export to the YouTube and Facebook 360 video platforms, in addition to Ambisonics audio up to 7th order
  • All VUZE+ purchasers will receive a 6-month free trial key for RS360 Cinema

With RS360, users can add tracks to be mixed, in full-spherical 3D, with the Ambisonics audio from VR Studio, enabling content creators to build more sophisticated sound mixes for export to YouTube and Facebook. Additional mono tracks can be keyframed along visually intuitive spatial audio paths, with track objects represented in a 3D ‘Scene Editor’.
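The idea of a keyframed spatial path can be sketched as follows. This is only an illustration under stated assumptions: how RS360 actually stores or interpolates keyframes is not described here, and the keyframe layout (time in seconds, azimuth and elevation in degrees), the linear interpolation, and the function names are all hypothetical. The second function shows that mixing Ambisonics signals amounts to summing them channel by channel.

```python
from bisect import bisect_right


def position_at(keyframes, t):
    """Linearly interpolate (azimuth_deg, elevation_deg) at time t.

    keyframes: list of (time_seconds, azimuth_deg, elevation_deg) tuples,
    sorted by time. Times outside the keyframed range hold the first or
    last position.
    """
    times = [k[0] for k in keyframes]
    if t <= times[0]:
        return keyframes[0][1:]
    if t >= times[-1]:
        return keyframes[-1][1:]
    i = bisect_right(times, t)  # index of the first keyframe after t
    t0, az0, el0 = keyframes[i - 1]
    t1, az1, el1 = keyframes[i]
    f = (t - t0) / (t1 - t0)
    return (az0 + f * (az1 - az0), el0 + f * (el1 - el0))


def mix_frames(bed, extra):
    """Sum two B-format frames channel by channel (Ambisonics mixing is linear)."""
    return tuple(b + e for b, e in zip(bed, extra))


# Hypothetical path: start in front, rise to the left, end behind the listener.
path = [(0.0, 0.0, 0.0), (2.0, 90.0, 30.0), (4.0, 180.0, 0.0)]
print(position_at(path, 1.0))  # (45.0, 15.0)
```

Plain linear interpolation of angles does not take the shortest route across the ±180 degree seam, so a production tool would likely handle wrap-around or interpolate positions in 3D instead.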

For more information on VisiSonics and RealSpace 360 Cinema, visit

The RS3D object-based renderer is available for Unity and Wwise, with additional engine support coming soon.