The spectrum of immersive sound: New technologies create enhanced audio experience


Imagine stepping out of everyday life and being totally immersed in the story during your next cinema experience, hearing everything as if you were actually there in the scene. Close your eyes. You’re at a café in Paris; around you, dishes are clanking and patrons are engaged in conversation. A woman is shouting from a third-floor window and birds are chirping in the trees. High overhead, a jet cruises by, and you subconsciously note that it’s departing to the east. You hear the familiar footsteps of your date approaching behind you. You hear all these details exactly where they belong.

This is the goal of Immersive Sound, the next big advance in cinema technology.
This emerging technology gives audio professionals more creative latitude than today’s standard channel-based 5.1 or 7.1 systems allow. Immersive technologies open up the height and space available for sounds and provide editors with the tools to identify objects (birds, planes, gunshots) and move those objects into those spaces as desired. Fundamentally, immersive sound allows audiences to be immersed in the sound field and, more importantly, in the story.

Achieving this next level of reality requires new approaches to traditional cinema sound. Existing 5.1 and 7.1 systems are channel-based: sounds emanate from the screen and from the horizontal plane around the audience. Immersive technologies add height, using additional speakers to lift the sound off the screen. Auro Technologies, for example, provides an 11.1 solution (Auro 11.1) that adds height channels above the standard 5.1 configuration. Auro uses the lowest four bits of each audio channel to carry the height audio, which a decoder extracts from the lower channels. Thus the same audio files can either be played as a standard 5.1 feature or be decoded to 11.1 by cinema servers equipped with Auro 11.1 technology.
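The idea of reserving the lowest four bits of each audio word can be illustrated with plain bit masking. The sketch below is not the Auro codec, whose internals are not described here. It only shows how a PCM word can be separated into its upper bits and its lowest four bits and then put back together. The 24-bit word length and the function names are assumptions made for illustration.

```python
from typing import Tuple

HEIGHT_BITS = 4    # number of low-order bits, as stated in the text
SAMPLE_BITS = 24   # assumed PCM word length (illustrative assumption)

LOW_MASK = (1 << HEIGHT_BITS) - 1  # 0b1111


def split_sample(sample: int) -> Tuple[int, int]:
    """Split an unsigned PCM word into (upper bits, lowest four bits)."""
    if not 0 <= sample < (1 << SAMPLE_BITS):
        raise ValueError("sample does not fit the assumed word length")
    return sample >> HEIGHT_BITS, sample & LOW_MASK


def join_sample(upper: int, lower: int) -> int:
    """Rebuild the PCM word from its upper bits and its lowest four bits."""
    return (upper << HEIGHT_BITS) | (lower & LOW_MASK)


if __name__ == "__main__":
    upper, lower = split_sample(0xABCDEF)
    print(hex(upper), hex(lower))          # upper 20 bits and low 4 bits
    print(hex(join_sample(upper, lower)))  # the original word again
```

A playback system that ignores the low bits still reproduces the upper 20 bits of each channel, which is why a single file can serve both kinds of theatre in this simplified picture.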

Object-based technologies provide a way to move beyond channel-based encoding. Both Dolby Atmos and DTS MDA (Multi-Dimensional Audio) systems employ audio objects, which are sounds with associated location metadata. These objects are independent of any particular audio channel or speaker location. Because each object carries its own location information, rendering equipment can place the sounds to match the original creative intent, given the theatre-specific speaker configuration. Object-based technologies are extensible, with current implementations supporting up to 128 simultaneous objects and 64 speaker channels.
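To make the notion of an audio object concrete, here is a minimal sketch of a sound with location metadata being rendered to whatever speakers a theatre has. It is neither the Atmos nor the MDA renderer. The inverse-distance panning rule, the 0-to-1 room coordinates, and all names (AudioObject, render_gains, the speaker labels) are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

Position = Tuple[float, float, float]  # assumed room coordinates, each axis 0..1


@dataclass
class AudioObject:
    """A sound plus location metadata, independent of any speaker."""
    name: str
    position: Position


def render_gains(obj: AudioObject, speakers: Dict[str, Position]) -> Dict[str, float]:
    """Return one gain per speaker using simple inverse-distance weighting.

    Gains are normalized so their squares sum to 1 (constant total power).
    """
    weights = {}
    for label, speaker_pos in speakers.items():
        distance = math.dist(obj.position, speaker_pos)
        weights[label] = 1.0 / (distance + 1e-6)  # closer speakers weigh more
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {label: w / norm for label, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical three-speaker layout; a real auditorium has many more.
    layout = {"L": (0.0, 0.0, 0.0), "R": (1.0, 0.0, 0.0), "Top": (0.5, 0.5, 1.0)}
    bird = AudioObject("bird", (0.5, 0.5, 0.9))
    for label, gain in render_gains(bird, layout).items():
        print(f"{label}: {gain:.3f}")
```

The same AudioObject can be passed to render_gains with a different layout dictionary, which mirrors how one object-based mix can adapt to each theatre's speaker configuration.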

Object-based solutions also maintain the notion of channel-based “bed channels.” These channels have sounds assigned directly to specific channels, typically with at least the traditional 5.1 speaker layout. In practice, objects are rendered and placed on top of the bed channels and all other available speakers to provide the total immersive experience, one rivaled only by exact environmental reproduction.
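The layering of objects on top of bed channels can be pictured as a per-speaker sum. The sketch below assumes a single sample per speaker and a hypothetical function name, mix_output. It is a simplification for illustration, not the behavior of any particular renderer.

```python
from typing import Dict


def mix_output(bed: Dict[str, float], object_feeds: Dict[str, float]) -> Dict[str, float]:
    """Add rendered object audio on top of bed-channel audio, per speaker.

    Both arguments map a speaker label to one sample value. Speakers that
    carry no bed audio (for example overhead speakers) receive only the
    object contribution.
    """
    output = dict(bed)  # start from the bed channels
    for speaker, value in object_feeds.items():
        output[speaker] = output.get(speaker, 0.0) + value
    return output


if __name__ == "__main__":
    bed = {"L": 0.25, "R": 0.25}             # channel-based bed audio
    object_feeds = {"R": 0.5, "Top": 0.125}  # audio already rendered from objects
    print(mix_output(bed, object_feeds))
```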

Dolby Atmos Post-production

Dolby provides a Rendering and Mastering Unit (RMU) for the dubbing stage. The Dolby ProTools plugin handles the sound object direction while the RMU handles audio panning and metadata in real time. The RMU also creates the audio master used to create the DCPs.
Dolby also provides specific details on the optimum speaker location and characteristics for a particular auditorium.

To place audio objects at the correct position, a location must be identified in 3D space. Interestingly, Atmos and MDA use different coordinate systems to map their objects in metadata: Atmos uses Cartesian coordinates and MDA uses polar coordinates. Auro's mixing tools nevertheless work in the Cartesian domain, following the preference of mixing engineers, and convert to polar for the MDA output.
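The conversion between the two coordinate systems is standard trigonometry. The sketch below assumes one particular convention (azimuth measured in the horizontal x-y plane, elevation measured upward from that plane, angles in degrees). The actual axis conventions used by Atmos, MDA, and the Auro tools are not given here, so these function names and conventions are illustrative only.

```python
import math
from typing import Tuple


def cartesian_to_polar(x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Convert (x, y, z) to (radius, azimuth in degrees, elevation in degrees)."""
    radius = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return radius, azimuth, elevation


def polar_to_cartesian(radius: float, azimuth: float, elevation: float) -> Tuple[float, float, float]:
    """Convert (radius, azimuth in degrees, elevation in degrees) to (x, y, z)."""
    az = math.radians(azimuth)
    el = math.radians(elevation)
    horizontal = radius * math.cos(el)  # projection onto the x-y plane
    return horizontal * math.cos(az), horizontal * math.sin(az), radius * math.sin(el)


if __name__ == "__main__":
    polar = cartesian_to_polar(1.0, 2.0, 3.0)
    print(polar)
    print(polar_to_cartesian(*polar))  # approximately the original point
```

Because the mapping is reversible up to floating-point rounding, a tool can let engineers work in Cartesian coordinates and still emit polar metadata.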

Auro/MDA Post-production

To support the object-based approach, Auro can generate an MDA-formatted mix in addition to its own encoded format, Auro-3D®. This gives Auro a path beyond the channel-based Auro 11.1.

For Auro and MDA, an object-based workflow is similar to traditional mixing in that panning is done with Auro Technologies’ ProTools plugin for post-production mixing software. However, the artist now has more spatial coverage control and the ability to attach object locational information—in the form of metadata—to the final mix. This metadata is what allows theatres to properly reproduce all the new creative possibilities given to the original artist.

With MDA object-based mixes that can render to nearly all theatre configurations, exhibitors have the ability to scale the number of speakers to an auditorium’s specific needs. Since a single audio mix supports multiple theatre configurations, DCP delivery can be simplified by avoiding separate discrete audio mixes.

Cinema Playback
Playback hardware is available to support each of the immersive audio systems. There is not space here to go into each offering, but USL has implemented an MDA system in software that renders immersive audio on existing cinema media servers. This unique approach allows MDA implementations up to 13.1 on existing servers without the need for any external rendering hardware. If adopted on other servers, this approach could make immersive audio less costly to implement and expand its use through simple software updates to existing equipment.

The major immersive systems, Auro, Atmos and MDA, each have differences and unique advantages that distinguish their formats. However, these initial offerings have a great deal in common, and the parties are all working diligently with the industry standards bodies toward a common goal.

Future Standards
The standards groups are particularly active in the area of immersive sound. Their challenge is to sort out the technical differences and find the common ground that will provide the industry with guidance and best practices for a robust immersive standard.

Digital Cinema Initiatives (DCI) has provided an addendum for Digital Cinema Object-Based Audio and, more recently, additional detail on Standardized Audio Formats. DCI has also added the Multiple Media Block (MMB) Architecture, which will accommodate the playback of Object-Based Audio Essence (OBAE).

SMPTE’s immersive standardization effort is chaired by Peter Lude. He leads a group of 70 expert participants and is receiving strong support from industry veterans. Peter indicates there has been “exemplary constructive collaboration in the immersive work” and that the draft file format is expected in early 2015, which is less than a year away.

Providing an extraordinary user experience is what immersive audio is all about, and this next big thing is happening now. Major motion pictures are already being released using immersive technologies. For example, The Amazing Spider-Man 2 and How to Train Your Dragon 2 were released with Auro 11.1 sound, and Gravity and Frozen were released with Dolby Atmos.

We highly recommend that you seek out the select theatres that are equipped for these new technologies and listen to the new reality for yourself!

Immersive sound: The term used to describe sound that emanates from sources beyond the horizontal plane by means of enhanced spatial properties such as additional height and overhead speakers and localized apparent sound sources within the auditorium.

Immersive audio: Audio that is created with the intent of being reproduced as Immersive Sound via an Immersive Sound System. Immersive audio can consist of channel- and/or object-based audio essence, with associated metadata, including temporal and spatial metadata.