Bridging the gap between traditional film sound and interactive audio for games


Kevin Bolen, Interactive Audio Supervisor at Skywalker Sound, spoke to Behind the Glass about his work crafting immersive audio for cinema and virtual reality games. With ten years of experience in film, AAA games and virtual reality, his credits include Avengers: Damage Control, Vader Immortal: A Star Wars VR Series, Millennium Falcon: Smugglers Run and Wreck-It Ralph: Ralph Breaks VR. He has won a Golden Reel Award and has been nominated for a BAFTA and the Game Audio Network Guild Awards.


Bolen’s work includes bridging the gap between traditional film sound and interactive audio for games, but how does he do that? “Traditional cinema allows total control of what the audience sees and hears at all points in the narrative, including standards for the quality of projection and the format and quality of the sound system,” he said. “The more absolute the control of the intended experience, the greater the audience’s suspension of disbelief. In comparison, immersion in video games is achieved by relinquishing control to the player, allowing them to increase their sense of agency and mastery over the mechanics of the game world. I strive to convince listeners that their personal narrative experience is being scored for them in real time. The flow of the music and the evolution of the soundscape can be creatively altered, subtly or drastically, based on as little input as the turn of the listener’s head. In this way, even a passive, non-interactive linear cinematic can be made to feel dynamic based on the movement of the listener.”


Game audio has evolved immensely since the first blip was heard in Pong. “Today’s VR experiences and video games are able to play back full-fidelity uncompressed audio files at film sound quality standards, and in some cases take advantage of higher-resolution audio files than are used by typical film post-production standards,” added Bolen. “However, I almost never have full control of what the player sees or hears, and so I always have to relinquish a little bit of control over the mix to the listeners themselves. Additionally, I have very little control over how the end user hears an experience, whether they use headphones or not, or the quality or number of speakers that they are using. So I have to invest time ensuring the mix translates to a larger number of devices with a greater disparity of sound quality, from a tiny speaker on the side of a mobile phone all the way up to an immersive home theatre system.”


When mixing for games, narratively important sounds are not always the loudest sounds in the game. Artificially boosting narratively important dialog over sound effects and music is commonplace in most games, but in VR it can feel like the sound of the world is dropping out around you when those characters speak. “I try to coordinate the frequency ranges of dialog, music, and sound effects from the earliest stages so that they all start with their own space in the mix, and utilize frequency-dependent ducking so that only conflicting frequencies are lowered in volume, rather than the entire soundscape.”
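Bolen doesn’t detail his implementation, but the idea of frequency-dependent ducking can be sketched in a few lines: the dialog sidechain is analysed per band, and only the music bands where it carries real energy are attenuated. The band names, threshold, and duck depth below are illustrative assumptions, not values from the interview.

```python
DUCK_DEPTH_DB = -6.0   # attenuation applied to conflicting bands (assumed value)
THRESHOLD = 0.1        # sidechain energy above which ducking engages (assumed)

def band_gains_db(dialog_band_energy, music_bands):
    """Return per-band gain (in dB) for the music bus, ducking only the
    bands where the dialog sidechain carries significant energy; all
    other bands pass at unity (0 dB)."""
    gains = {}
    for band in music_bands:
        energy = dialog_band_energy.get(band, 0.0)
        gains[band] = DUCK_DEPTH_DB if energy > THRESHOLD else 0.0
    return gains
```

With dialog energy concentrated around 1–4 kHz, only that band of the music is pulled down, leaving the low end and air of the score untouched.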


“Virtual reality experiences often call for a higher standard of immersion or embodiment than video games or other screen-based experiences, so I ensure that everything that might make a sound can make a sound, when needed,” added Bolen. “However, when an object doesn’t need to make a sound, or it is not important that a sound is heard, I prevent those sounds from ever playing in the first place. I can build complex systems that automatically change the volume and frequency of sounds based on distance from the listener, or even on whether they are outside the listener’s point of view. If a sound is too far away, or too many other, more important sounds need to be heard, I don’t let the sound play. Those are opportunities to take advantage of the strengths of the video game approach to mixing. Then, during the final mix, I have the creative freedom to override those automatic systems so that narratively important sounds take priority, or so that distant or off-screen threats to the player are heard over less consequential sounds. When a player reaches a specific location or conversation point, I can obviously adjust the balance of the music, drawing attention to a clear shift in the narrative arc. Those are opportunities to take advantage of the cinematic approach to mixing.” Rather than starting with one scene and refining the mix of all the elements to some final state, Bolen prefers to work in broad strokes, adjusting all the dialog, music, or sound effects over larger sequences in order to ensure consistency and establish a dynamic arc for both the loudness and complexity of the soundscape. “When I can play through a sequence and get completely lost in a flow state, when I forget that I was mixing and find myself just playing, then I know my work is done.”
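The automatic systems Bolen describes — culling sounds that are too far away, and letting more important sounds win when the engine’s voice budget is full — can be sketched as below. The radius, voice budget, and tuple shape are hypothetical, chosen only to illustrate the logic.

```python
MAX_VOICES = 8          # assumed engine voice budget
AUDIBLE_RADIUS = 40.0   # metres beyond which a sound is never started (assumed)

def distance_gain(distance, ref=1.0):
    """Simple inverse-distance attenuation: unity gain inside the
    reference distance, falling off as ref/distance beyond it."""
    return min(1.0, ref / max(distance, ref))

def select_voices(sounds):
    """Cull sounds outside the audible radius, then keep only the most
    important ones within the voice budget. `sounds` is a list of
    (name, distance_m, priority) tuples; higher priority wins, with
    nearer sounds breaking ties."""
    audible = [s for s in sounds if s[1] <= AUDIBLE_RADIUS]
    audible.sort(key=lambda s: (-s[2], s[1]))
    return audible[:MAX_VOICES]
```

A final-mix override of the kind Bolen mentions would simply assign a narratively important event a priority high enough that it always survives this selection.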


When recreating acoustic behaviour in VR, it has to be convincing. “I’ve found that when I can establish a believable presentation of acoustic space and maintain it consistently, spatial audio can feel authentic without being scientifically accurate. Complex soundscapes with elaborate interactivity can push the technical limits of the smartphones and other devices that play VR experiences, so I often have to find ways to cheat the limits of simulation, by pre-rendering reverb or binaural effects directly into audio files, or by using a computationally cheap stereo reverb instead of a processor-hungry convolution reverb with early-reflection simulations. When I establish audio behaviours early, and only subvert those expectations intentionally, I only have to convince the listener of the level of realism that is appropriate for the detail of the virtual world they are in, not necessarily live up to the expectations of the complexity of the real world.”
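The cheap end of the trade-off Bolen describes can be illustrated with a single feedback comb filter, the building block of classic Schroeder-style algorithmic reverbs: constant work per sample, versus convolution’s cost growing with the length of the impulse response. The delay length and feedback amount here are illustrative, not taken from the interview.

```python
def comb_reverb(dry, delay_samples=1103, feedback=0.5):
    """Feedback comb filter: each output sample is the input plus a
    scaled copy of the output from delay_samples ago, producing a
    decaying train of echoes. O(1) work per sample, versus
    O(len(impulse_response)) per sample for convolution reverb."""
    out = []
    for n, x in enumerate(dry):
        echo = out[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + feedback * echo)
    return out
```

Feeding it an impulse shows the behaviour: a copy at the delay time at half amplitude, then a quarter, and so on — a crude but nearly free tail compared with convolving against a measured room response.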


When playing VR games it can be easy to become disoriented, so preventing this is crucial. “I try not to change the overall volume of the soundscape too drastically or too quickly, or allow small sounds to duck sounds that are expected to be loud. I also try to keep unimportant sounds static or less mobile, and allow intentional movement to draw attention to a few narratively important sounds.”
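“Not too drastically or too quickly” is, in engineering terms, a slew limit on the master gain. A minimal sketch, assuming a hypothetical cap of 6 dB per second (the interview names no figure):

```python
MAX_DB_PER_SEC = 6.0  # assumed cap on how fast the overall gain may move

def slew_limit(current_db, target_db, dt):
    """Move the master gain toward target_db, but by no more than
    MAX_DB_PER_SEC * dt in one update, so the overall soundscape
    never jumps abruptly even when the game state does."""
    step = MAX_DB_PER_SEC * dt
    if target_db > current_db:
        return min(current_db + step, target_db)
    return max(current_db - step, target_db)
```

Calling this once per audio-engine update makes even an instant scripted drop to silence arrive as a smooth two-second fade rather than a disorienting cut.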


Interactive virtual reality often relies on headphones for sound playback, though more devices are being designed with integrated speakers of varying quality. “Most frequently I mix object-based audio rendered into a binaural stereo stream to the headphones,” added Bolen. “All of my 360 video experiences are mixed in third-order ambisonics, and mixed down into lower-order or binaural streams for final playback and distribution. Location-based entertainment experiences often rely on headphones in combination with subwoofers, floor transducers, or other haptic effects that are also driven by audio signals. Very rarely do I mix an experience in traditional stereo or multichannel, other than trailers and behind-the-scenes videos.”
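The simplest case of the mixdown Bolen describes — an ambisonic stream decoded to plain stereo — can be sketched with two virtual cardioid microphones aimed left and right. This assumes first-order horizontal components in the common SN3D normalisation (W omnidirectional, X front, Y left); the 90-degree spread is an illustrative choice, and a binaural render would instead convolve with HRTFs.

```python
import math

def decode_stereo(w, x, y, spread_deg=90.0):
    """Decode first-order (SN3D) horizontal ambisonics to stereo with
    two virtual cardioid microphones at +/- spread_deg/2 azimuth.
    A cardioid at azimuth t picks up 0.5*W + 0.5*(X*cos t + Y*sin t)."""
    t = math.radians(spread_deg / 2.0)
    left = 0.5 * w + 0.5 * (x * math.cos(t) + y * math.sin(t))
    right = 0.5 * w + 0.5 * (x * math.cos(t) - y * math.sin(t))
    return left, right
```

A source dead ahead (W=1, X=1, Y=0) lands equally in both channels, while a source hard left (Y=1) favours the left virtual microphone — the same per-sample arithmetic, scaled up, underlies higher-order decodes.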


As for cinema, immersive sound is critical for certain types of film. “Miyazaki’s 2013 film ‘The Wind Rises’ was mixed in mono, and I was just as entranced in that theatre as I was hearing Alfonso Cuarón’s Gravity in Dolby Atmos, released the very same year. ‘Immersive’ is often used to describe the complexity of the speaker system, implying that more is always better. To paraphrase Randy Thom, it’s the nature of the details that makes them important, not their number.”


Audience awareness of sound in film is growing. Part of it is the enhanced storytelling potential and the way we use the tools available, and part of it is driven by technology companies becoming better able to inform and influence consumers. “It’s a clearly magical moment when you hear the Bubbleship circle in the overhead speakers before landing in Joseph Kosinski’s Oblivion, or the disorienting way the characters’ dialog spins about the room as the camera tumbles in Gravity. It’s hard to ignore those moments when sound becomes the most important narrative element, seamlessly moving the story off the screen and back again.”


Today’s immersive sound formats enable us to manipulate three-dimensional soundscapes with greater flexibility and precision than ever before. “The creative techniques we employ must remain focused on the listeners’ perception of the story, whether on- or off-screen. The beauty of the new immersive formats is that if the story calls for us to place a sound off screen, behind us in the room or over our heads, we are more likely to be able to place it appropriately in the sound field.” On extending the narrative off-screen, re-recording mixer Juan Peralta agrees: “I love spatial storytelling with sound effects. I look at the characters’ eyes. Anytime they look off-screen, or an object leaves the screen, I try to put an appropriate sound there.”


Regardless of the specific format of immersive or 3D film sound, the quality of the audio experience dramatically increases the value of cinema. “All location-based entertainment needs to provide an audio-visual experience that simply cannot be replicated in one’s home, or even in VR. As filmmakers and storytellers become more familiar and comfortable with the technology, we’ll continue to see and hear interesting and novel ideas that literally could not have been done in the past.”


As for kit, Bolen is constantly falling in and out of love with different tools. “Most commonly, it’s a different ambisonic panner or binaural renderer than I used on the last production, or a different pair of headphones with a slightly different impedance, sensitivity, or frequency response. It’s always something that starts with my ears and better informs my creative choices, both in the content I create and how I create it.”


As technology moves forward, we already have 5.1, 7.1, Dolby Atmos, VR and AR, so what comes next?

Bolen feels that the current pace of rapid development and deployment of enhanced audio tools and playback devices has led to extreme fragmentation, which is great for innovation but makes it challenging to develop expertise with any one toolset or workflow. “Everything has a slightly different set of bells and whistles arranged in a slightly different way, and sounds slightly different than it did in the previous version. I think the next steps will be the evolution of standards that support the creation of immersive media with six degrees of freedom, over both downloaded and streaming media, and allow for both cross-platform content creation and playback. Hopefully we’ll see a convergence of the capabilities of tools and platforms so that the difference between Mac, PC, game console, Android, or iOS devices matters less to both the content creator and the end user.”


