Cinematic Audio for Games

Games are now telling more complex stories, and we are seeing more cinematic aspects incorporated into their sound design. With a shift towards hyper-realism, delivering cinematic experiences with emotional heft and using complex object-based audio to achieve a more immersive sound environment can only mean content with more depth, more detail and more demands. We spoke to Barney Pratt, Director of Audio at Supermassive Games, and Matthew Florianz, Lead Audio Designer at Frontier Developments, about creating cinematic audio for gameplay.


“Ten years ago I would have banged on about how game sound aspires to be like cinema, but things have moved on a lot since then, and nowadays both industries co-exist as equals,” said Pratt. “We still talk in film jargon and employ cinematic techniques, but the real wins come when you adapt those techniques to the very different technical demands and opportunities of games, whichever genre of game you are working on. Game audio techniques have now, in most areas, progressed to the point where only a residue remains of the cinematic techniques that served as their origin.”


Supermassive Games approach the audio for their narrative-driven games, such as The Dark Pictures horror anthology, as short sections of film, benefitting from exact linear edit timings, then employ game audio techniques to blend interactively between what may be two, three or even six different outcomes. They also use hybrid versions of premix mastering stages and 5.1 surround panning rules.


“Film has mastered the art of storytelling in an economical way,” said Florianz. “Scene by scene, the director and sound team make decisions about the needs of their finished film. Game audio integration starts much earlier in production: sounds are implemented onto game-world objects and mixed by assumption about a state of gameplay that won’t be finished until years later. The complexity of Frontier Developments’ non-linear games has advanced to the point that, without a filter, hundreds of sound objects could be active simultaneously. Without a cinematic approach, a cluttered and unfocused mix might obscure player-relevant information. Better storytelling can be achieved using a real-time contextual mixing solution that emulates the director/audio-team relationship. The difference is that rather than mixing audio to a scene, in games the mix is achieved by filtering from a large pool of available game objects.”


Consistency within gameplay audio is paramount, and providing a seamless transition in terms of mix, assets and aesthetics is at the forefront of the player experience. “For our narrative-based games at Supermassive, we approach sections of the game as short sections of film, editing with precise frame accuracy to the action,” added Pratt. “Where the narrative branches, often to five or six possible outcomes, we need to author the incoming section of music to be able to handle a transition, whether immediate, on beat, bar or phrase, and, just as importantly, author the outgoing piece of music to be able to handle every single possible transition so that it is inaudible to the player whichever route they choose. Another example is panning, or emitter position relative to listener. For a smooth first-person POV (point of view) experience, audio would pan naturally around the player, but with third-person cinematic cameras that cut mid-line, naturally panned audio from a precise location relative to the listener harshly changes position, breaking the player’s immersion. We have developed lots of softening techniques that allow panning to meet the player’s expectations but soften the result to maintain cinematic immersion.”
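The transition authoring Pratt describes can be sketched in a few lines. This is not Supermassive’s code, just a minimal illustration that assumes a 4/4 track with an eight-bar phrase length; the names and constants are invented for the example:

```python
import math

# Illustrative assumptions: 4/4 time, eight-bar phrases.
BEATS_PER_BAR = 4
BARS_PER_PHRASE = 8

def next_sync_point(elapsed_beats: float, granularity: str) -> float:
    """Return the beat position at which a music transition may fire.

    'immediate' transitions fire now; otherwise the transition is
    quantised to the next beat, bar or phrase boundary.
    """
    if granularity == "immediate":
        return elapsed_beats
    size = {
        "beat": 1,
        "bar": BEATS_PER_BAR,
        "phrase": BEATS_PER_BAR * BARS_PER_PHRASE,
    }[granularity]
    return math.ceil(elapsed_beats / size) * size
```

The outgoing cue would be authored so that every such boundary is a musically safe exit point, which is what makes the branch inaudible to the player.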


“Game audio works by attaching sound to objects in the game world,” added Florianz. “The POV acts as a microphone. In this scenario, distance is often used to decide what is audible in the mix, which leaves some decisions to level design. This is not CPU-performant in a busy game and can also result in a cluttered mix. For those unfamiliar with game audio implementation, think of an audio project as a fully automated DAW meets sampler that can play back, manipulate and mix a game in real time. A mix-oriented implementation allows for creative decision making towards the end of a project cycle.”


“If the implementation is fully object-centred, then audio is linearly added as it is created over the course of a project. This might result in shared implementations between sound designers and across the structure of a project which are hard to maintain or alter towards the end of development. Imagine having aux busses, automation and grouping all more or less capable of mixing a sound. Multiply this across nested projects and, on average, 25,000 sounds, and the result is difficult to mix. In such a scenario, game-wide mixing decisions might require extended bug fixing. At the very least, it can increase the amount of testing needed in different circumstances.”


A more consistent approach can be achieved by decoupling audio from game objects (breaking the so-called 1:1 relationship) and adding sounds when needed based on context. The audio project should equally decouple and separate the role of the sound designer (implementation) from the role of the game mixer. “All this allows the relationship between the sound design/implementation and the mixing of a game to be much like the relationship between a sound designer and a re-recording mixer in film, and that benefits consistency,” said Florianz. “A decoupled approach also manages CPU performance better. When a team is fire-fighting performance, they are not spending time mixing and improving game audio!”


Employing similar tricks to those used in film, sound designers can ensure a dynamic mix that focuses the audio for the player. “Dialogue is ‘always audible’ no matter how far away the listener is placed in relation to the character, but footsteps and cloth drop off a little faster, so hearing them more clearly at close proximity improves intimacy,” added Pratt. “I like to have a clear sense of distance to help with the voyeurism of being in the scene viewing the action, so we don’t focus-pull the mix completely to the subject, but find a happy medium which allows both the horror-enhancing voyeurism and the effect of distance, in a hyper-real sense that maintains audibility.”
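One way to picture this is per-category distance curves, where dialogue has a gain floor (“always audible”) while footsteps and cloth roll off more steeply. This is only a sketch of the idea; the curve shape and constants are invented, not taken from any engine discussed here:

```python
def attenuation(distance: float, rolloff: float, floor: float = 0.0) -> float:
    """Inverse-distance gain with an optional minimum-gain floor.

    A higher rolloff makes the category fade faster with distance;
    a non-zero floor keeps it audible no matter how far away it is.
    """
    gain = 1.0 / (1.0 + rolloff * distance)
    return max(floor, min(1.0, gain))

# Hypothetical categories at 30 metres from the listener:
dialogue_gain = attenuation(30.0, rolloff=0.05, floor=0.4)   # never fully silent
footsteps_gain = attenuation(30.0, rolloff=0.30)             # steeper rolloff
```

At distance the dialogue is quieter but still present, while footsteps have largely fallen away, which is roughly the “happy medium” Pratt describes.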


“By implementing contextual relevance into the sorting of unimportant and important game audio, decisions about audibility are moved away from ‘is it close enough to hear?’ to ‘what’s important right now?’” added Florianz. “Contextual decision making uses simple questions to determine importance before adding audio objects to the game. The mix is thus always focused on what a player needs to hear, rather than what is close to the camera. When we were working on Jurassic World Evolution, code and design developed a system for making context-based decisions. For example, observing a mixed group of dinosaurs: allowing them all into the mix is messy because these animals are huge. Our system uses simple yes/no questions to help build a score of relevance every frame: Is it moving towards the player? Is it moving faster than other objects? Is it visually larger than other objects? Is it behaving in a player-informative way, such as fighting?”
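The per-frame yes/no scoring Florianz outlines could look something like the following. The field names, weights and voice budget are all illustrative assumptions, not Frontier’s implementation:

```python
from dataclasses import dataclass

@dataclass
class AudioObject:
    name: str
    moving_towards_player: bool
    speed: float                 # compared against nearby objects
    screen_size: float           # compared against nearby objects
    informative_behaviour: bool  # e.g. fighting

def relevance(obj: AudioObject, avg_speed: float, avg_size: float) -> int:
    """Build a relevance score from simple yes/no questions (booleans sum as 0/1)."""
    return (obj.moving_towards_player
            + (obj.speed > avg_speed)
            + (obj.screen_size > avg_size)
            + obj.informative_behaviour)

def select_for_mix(objects, avg_speed, avg_size, budget=2):
    """Each frame, admit only the highest-scoring objects into the mix."""
    ranked = sorted(objects, key=lambda o: relevance(o, avg_speed, avg_size),
                    reverse=True)
    return ranked[:budget]
```

Run every frame over the pool of candidate objects, this keeps the mix focused on what the player needs to hear rather than on whatever is closest to the camera.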


The use of licensed music creates recognition and identification, and any licensed track placed well will do a lot to lift an overall soundtrack. “The main limitation of using licensed music over original composition is the inability to make these tracks interactive, to exactly match narrative and gameplay,” said Pratt. “With original compositions we have stems to edit and rearrange to enable variation, loops and transitions, but a licensed track is normally a stereo WAV, which is a lot harder to seamlessly blend and transition. Thankfully, stems are becoming a standard in the music industry, and recently we have licensed tracks with stems, thus benefitting from both the creative and technical goals.”


“Licensed music provides access to a cultural consciousness and does the heavy lifting of explaining context, mood, time or place in an economical way. In some cases, the inclusion of licensed music lends authenticity,” added Florianz. “Newly created music can precisely voice the specific mood and feeling of a situation and doesn’t risk invoking an out-of-context memory; it keeps the attention firmly grounded in the game world. Alan Wake is an example of a game that plays with both these concepts in a post-modern way. It presents licensed music through a radio show, and for a brief moment pulls both player and Wake out of the dark world. It’s a callback to the main character’s sanity, which is slowly lost in the woods, and an example where licensed music adds another layer to storytelling.”


Streaming and copyright present a hurdle when deciding whether to use licensed music. Acquiring licensed music represents a significant investment, only for it to be removed when a game is streamed. Additionally, it is rare to have access to the stems of licensed music, and rights owners are less keen on the mix being altered, which means a lack of flexibility in terms of interactivity. For this reason, Florianz tends to use licensed music only for things like cut-scenes, intros and trailers.

Dolby Atmos is already widely used in games to enhance the audio experience, so we asked what other tools are used to create spatial and immersive audio. “Spatial audio is still a big growth area in game audio tech,” added Pratt. “Whereas Dolby Atmos provides a linear mix with emitters to move around a virtual space, in games every sound is already attached to an emitter moving around in virtual space, so our use of Dolby Atmos is different to cinema’s. Virtual surround delivered in stereo headphones via an HRTF solution is the most commonly offered spatial audio, but these solutions have both positive and negative effects on the audio, so you have to hunt down a good one and treat your audio output in accordance with certain limitations to get the best result. Here at Supermassive we are advancing our tech in this area all the time, and as ever there are some amazing opportunities on offer, whether via middleware such as Wwise or bespoke engineered solutions.”


“Positional audio is a core principle in game audio. All diegetic sounds are assigned in-world positions and the listener is dynamic within this game world,” added Florianz. “Games generally run at 30 or 60fps and we have to deliver new audio for each frame. Contextual mixing filters player-relevant audio, which leaves CPU time to apply expensive spatial and immersive processing such as impulse responses, reverbs, tap delays, speaker modelling and EQs.”

“Game data from the virtual game environment can be used to manipulate the mix for room characteristics and provide early reflections and size-dependent reverbs. A consistent distance model is important, as it conveys the internal logic and ‘distance feeling’ of a game world. It doesn’t have to be physically correct, but it does have to be consistent to reinforce that logic. Dolby Atmos, DTS HPX, HRTF or any other binaural solution acts as a transport for game audio. These new technologies allow us to explore the Y plane (height) alongside the X (panning) and Z (depth) planes of traditional 3D surround audio technology. What’s really good about all these developments is that after 15 years of not having hardware acceleration, there’s a revived interest in creating bespoke hardware for audio.”


When creating cinematic audio for games, there are certainly some challenges along the way. “The biggest challenge is always in delivering the best player experience – everything else is subservient, whether considered cinematic or game audio,” said Pratt. “Shifting cinematic, passive approaches into the active and interactive environments of games is always an interesting challenge. To achieve the best results in game, we spend a lot of time developing the tech and pipelines that enable fast processing and integration of sound, freeing us up to have more creative discussions in order to provide the best player experience.”


“An abundance of audio in the game world needs to be managed in a game mix or the game will sound cluttered,” said Florianz. “The means of managing this can be crude or highly sophisticated, but it has to be implemented and maintained as the project grows. Timing is also tricky. Dramatic storytelling built around the relationship between anticipation, tension and release can be managed if it’s abrupt, but predicting moment-to-moment gameplay is a more capricious undertaking, and implementation needs to incorporate player decision making into the cadence and transition arcs of anticipation building.”


As for additional considerations, each department has its own when creating audio, and they can be both technical and creative. “Dialogue, as in film, is vital to master consistently, but for games we need additional attention to the natural feel of any dialogue systems or manually added assets,” added Pratt. “Foley is also vital for our character-focused games with otherwise silent characters, bedding the animated characters in reality, and the foley team needs to be aware of consistency between both linear and procedural systems. Ambience needs to reflect and enhance environments which would otherwise be absolutely silent, and as with film we can flex the ambiences to reflect and enhance the narrative. Music for games can be treated in widely different ways to film, serving interactive purposes responsive to player input, and music editors need to be conscious of multiple outcomes, conditions and game variables when producing assets, making this a lot more complicated than for cinema. Sound effects adhere much more closely to film sound, offering hyper-real enhancements reflecting character personalities or narrative goals, but again the integration presents significantly greater technical challenges to the consistency of the way those sounds are perceived. Sound design, an often ambiguous term, we use to describe all of the elements that sit in the huge grey area of opportunity between diegetic and non-diegetic audio, rather than the design of the overall soundscape, and as in film, the opportunities with this abstract audio layer are limitless.”


Audio departments are in continuous dialogue with animation about repetition (preventing predictable, audio-amplified rhythms and sequences), with rendering about weather (exposing surfaces for rain, or the waving of trees in the wind), and with UI about the timing and sequencing of data. “Visual repetition is less noticeable than auditory repetition,” added Florianz. “One could look at someone knocking on a door all day long, but sound would immediately add a sense of urgency, if not annoyance through repetition. Information to make sound interactive needs to be exposed in an audio-relevant way. A car engine with ‘faked’ physics can look and behave fine, but as soon as that data is used in the context of interactive audio, it won’t work. This is because we feel sound emotionally, more than we hear it.”


There are also some common problems associated with creating immersive audio. “Most of the issues we have are with our integration tools, so we try to design pipelines that have safety nets and do a lot of the work for us,” added Pratt. “We use the technology as much as we can to create solutions that work at runtime rather than time-consuming offline authoring solutions. For example, when consistently premixed foley assets hit the game, the bussing structure and other variables enable the right EQ mix, prioritisation, ducking and levelling, so those assets sound nearly final the moment they hit the game.”


Intelligible dialogue is important to gameplay. “Dialogue in our games is always audible, but we allow the player to get a sense of distance when the listener/camera is further away from the subject. This does make the dialogue quieter, but we have a ducking and prioritisation process that enables that dialogue to be heard over all other elements. There are two approaches to this: HDR audio and traditional sidechain ducking. With sidechain ducking we have the ability to affect not only the volume of ducked sources; individual frequencies and other plug-in parameters can change actively, reacting to other audio sources at runtime, such as dialogue amplitude, giving us a comprehensive platform to enable a better result at runtime in games.”
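The traditional sidechain approach mentioned here can be reduced to a small sketch: dialogue amplitude drives a gain reduction applied to competing buses. The threshold and duck depth below are illustrative assumptions, not values from any shipped game:

```python
def duck_gain(dialogue_level: float, threshold: float = 0.1,
              depth_db: float = -9.0) -> float:
    """Return the linear gain to apply to ducked buses (music, SFX, ambience).

    dialogue_level is a 0..1 amplitude envelope of the dialogue bus.
    Below the threshold no ducking occurs; above it, the duck deepens
    proportionally, reaching depth_db at full dialogue level.
    """
    if dialogue_level <= threshold:
        return 1.0
    over = min(1.0, (dialogue_level - threshold) / (1.0 - threshold))
    return 10 ** (depth_db * over / 20.0)
```

In practice the same sidechain signal could also drive an EQ dip in the dialogue’s frequency band rather than a broadband gain cut, which is closer to the frequency-selective ducking described above.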


“The hardest part is focus and transitions,” said Florianz. “A human generally won’t turn away from a conversation mid-sentence, and the brain can order a cacophony of many voices into audible conversations. These situations are less straightforward in an open-world game environment. People find it important to understand dialogue, and any audible dialogue is potentially perceived as intended for the player, so we have to take on the role of the brain and filter for relevance. For legibility we mix game audio in line with cinematic dialogue delivery, using sidechain with subtle EQ ducking, or aggressively ducking sound effects in the case of accessibility options.”


Film appears to have led the way for cinematic sound in games; however, the recent use of game technology in the creation of film content is something to watch, and games may end up influencing film rather than the other way around.