3D Audio, Game Theory, and Immersion Within Virtual Reality

September 25, 2018
Posted in News

By Tommy Goodkin, Head of Content at VR World

Until very recently, virtual reality start-ups have concentrated largely on the visual aspects of the virtual reality experience, and little attention has been paid to the use of sound to enhance the immersive qualities of the medium. However, as the demand for greater realism in virtual worlds grows, content creators are acknowledging the role of audio in creating an illusion of physical presence within a simulation. While a great deal of virtual reality content succeeds in visually portraying an interactive world, it often overlooks the importance of other sensory cues that inform our perceptual understanding of an environment.

Game audio theory is a useful context for examining the role of sound in interactive experiences. In a video game, sound is categorized as either diegetic (belonging to the game world) or extradiegetic. Diegetic sounds are what a video game character might hear, while extradiegetic sounds are directed at the player to enhance gameplay. Diegetic sounds give the player environmental context in the game world; they can include in-game dialogue or signal imminent action. Extradiegetic sound may include music that enhances the overall atmosphere, but it can also serve practical functions, allowing the player to distinguish between different modes of the game or to identify gameplay objectives. The extradiegetic element of music may also be contingent on the player’s actions, building in intensity during arcs in the gameplay narrative or signaling entry to a new location or level within the game.

Because of game audio’s reliance on interactivity, it serves as a far more useful model than film for developing sound theory in the context of virtual reality. Unlike a film score that follows a predetermined narrative, sound in virtual reality can be user-generated; it must rely on an adaptive model of synchronization between sound and the visual environment, one that accounts for the potential outcomes within an experience. Unlike video games, however, a virtual reality experience may not be limited by restricted mobility and visual perception. Because of this, extradiegetic sounds that disrupt full immersion in the game world for the purpose of gameplay enhancement are sometimes unnecessary. Virtual reality’s greater potential for immersion allows sound design to play a more important role in contributing to the immersive experience rather than detracting from it. In virtual reality, diegetic sound takes on an entirely new importance, not only establishing an environment but also conveying a sense of realism within it. Diegetic audio in video games, like footsteps indicating movement, tends to be exaggerated to emphasize an unseen action. Virtual reality, in contrast, has the ultimate goal of enabling users to truly believe they have been transported to a different environment. This requires not only realistic representations of diegetic sounds (or logically approximated ones, in the case of a fantasy world), but also spatial cues that correspond to each sound’s origin within the world.

As of yet, there is no uniform approach to synchronizing audio with a 3D environment, but various developers have been experimenting with mimicking how we perceive sound in the real world and relating it to the other sensory elements of the virtual reality experience. To successfully convey the illusion of embodiment, the participant must be surrounded by sound in the same way they are in the real world: sound cues must distinguish what is above, below, behind, in front of, and around us. Emulating a multi-directional sound environment that mimics human perception is essential to sustaining the illusion of embodiment. Unlike popular music, which relies on the left-right balance of a stereo configuration, virtual reality must rely on 3D audio to place sounds within a three-dimensional space. While surround sound can also be integrated into virtual reality, the cost and size of surround sound systems optimized for virtual content could keep developers from their ultimate goal of a product intended for mass consumption.

Beyond the constraints of realism, experimental and artistic virtual reality content should still synchronize audio and visuals in a way that conveys a multidimensional environment, real or imagined, to multiple senses. In these projects, elements that are traditionally extradiegetic in film and video games, such as music, can be diegetic in the context of the experience. An example might be a multimedia art project in which music and visuals are synchronized in three dimensions to communicate their own narrative. In traditional narrative or objective-based content, a music score can contribute to the artistic value and emotional immersion of virtual reality content, but creators should be wary of 3D audio’s potential to blur the line between the diegetic and the extradiegetic within the experience. For example, in content that uses music as an atmospheric element, 3D audio can create the illusion that the music is being performed within the diegesis and confuse the user. In a virtual concert-going experience, however, 3D audio can establish both the diegesis and a realistic environment, following the spatial relationship between the viewer and the musical event. Two-dimensional audio still has appropriate applications within virtual reality, as scoring contributes to immersion by enhancing or highlighting the interactive nature of the content.

The importance of sustaining immersion through diegetic sound distinguishes virtual reality from film. Because film is confined within the space of a frame, the medium can rely on the audience’s subconscious auditory completion of implied actions. For example, a film doesn’t need 3D audio to express a character’s movement away from the camera as they exit an environment; the perceived depth of the action has no bearing on the immersive qualities of the film. The audience automatically distinguishes themselves as separate from the film world, so diegetic sound doesn’t carry the responsibility for immersion that it does in virtual reality content and games. The immersive quality of video games, however, is usually confined to the user’s responsibility to achieve a goal set within the game world. Virtual reality can include the goal-oriented entertainment of a game, but it can also present a narrative that unfolds before the viewer in a first-person, interactive environment. This form of entertainment is still interactive, but it has a pre-established storyline and doesn’t require active, task-oriented participation. Narrative-based content such as this relies on the potential for embodiment within a virtual storyline to be immersive entertainment, and its diegetic sound is of enormous importance to the enhancement of the narrative and the quality of immersion. Unlike film, it requires 3D audio to fully engage the viewer in the virtual story world by establishing continuity between what the user sees and hears. If that continuity is broken, and three-dimensional actions are accompanied by two-dimensional sounds, the mind will register the inauthenticity of the environment and prevent the sensation of being fully transported into the virtual world.

The idea that sound can occupy space in a recording the same way it does in an actual acoustic event was first explored through binaural recording. Though the technique is over a century old, it was first fully realized by the microphone company Neumann in 1973. Neumann’s recording device was an anatomical replica of the human head, with microphones placed where the eardrums would be, so that it recorded from the physical point where humans perceive sound. The dummy head replicated the human process of hearing, accurately recreating the sound that arrives at each ear during perception of an acoustic event.

The human ability to localize sound is based primarily on ITDs (interaural time differences) and ILDs (interaural level differences), which compare the time of arrival and the level of a sound between the two ears to determine its location. Mathematical functions known as head-related transfer functions, or HRTFs, describe how sound travels before it is perceived at the eardrum. Beyond the influence of ITDs and ILDs, the HRTFs translate the travel of sound into an algorithm based on the shape of the listener’s ear, the shape of their body, and the acoustic characteristics of the environment. By approximating the HRTFs that occur during human hearing and configuring microphones based on that approximation, a binaural recording device captures all the spatial cues present in a real acoustic event. A pair of HRTFs, one for each ear, provides the brain with enough information to infer the location of a sound.
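
To make the ITD cue concrete, here is a minimal sketch (not from the article) using Woodworth’s classic spherical-head approximation; the head radius and speed of sound are assumed average values.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C
HEAD_RADIUS = 0.0875    # m, a commonly assumed average head radius

def interaural_time_difference(azimuth_deg: float) -> float:
    """Approximate the ITD in seconds for a distant source using
    Woodworth's spherical-head model: ITD = (r / c) * (theta + sin theta),
    where theta is the source azimuth in radians (0 = straight ahead)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# The ITD grows from zero straight ahead to about 0.66 ms at 90 degrees;
# the brain localizes sound from differences this small.
for azimuth in (0, 30, 60, 90):
    print(f"{azimuth:3d} deg -> {interaural_time_difference(azimuth) * 1e6:5.0f} us")
```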

Because binaural recording can only capture acoustic sounds, its most useful virtual reality applications are realistic emulations of the world: reproducing a conversation, a live music experience, or the background noise of nature or an urban landscape. In applications that require synthesized sound or directly recorded electronic instruments, a 3D audio effect requires digital manipulation of the sound to localize its source. Using algorithms derived from HRTF measurements, the spectral qualities that determine the perceived position of a sound can be analyzed and replicated. Processes like the selective filtering of spectral qualities toward a desired positional placement take individual HRTFs into account and construct a three-dimensional audio world from an algorithmic approximation of the spectral cues through which human hearing perceives directional sound. It remains the audio engineer’s responsibility, however, to design a realistic acoustic environment, using environmental modeling with reverb and delays.
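
In practice, this kind of synthesis often amounts to convolving a dry mono source with a pair of head-related impulse responses (the time-domain form of HRTFs). A minimal sketch, assuming per-direction impulse responses from a measured HRTF set are already available; the stub arrays below are placeholders, not real measurements.

```python
import numpy as np

def binaural_render(mono: np.ndarray,
                    hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    """Place a mono signal at the direction encoded by a pair of
    head-related impulse responses by convolving the signal with each
    ear's HRIR (the pair must share a length). Returns a (samples, 2)
    stereo array suitable for headphone playback."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)

# Placeholder HRIRs: in a real system these come from a measured,
# per-direction HRTF data set; the stubs below merely delay and
# attenuate the right ear to suggest a source off to the left.
hrir_left = np.array([1.0, 0.3, 0.0, 0.0])
hrir_right = np.array([0.0, 0.0, 0.7, 0.2])
source = np.random.randn(48000)  # one second of noise at 48 kHz
stereo = binaural_render(source, hrir_left, hrir_right)
print(stereo.shape)  # (48003, 2)
```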

The Oculus software development kit includes resources devoted to content developers

who want to create immersive audio. It features Oculus Spatializer, a plugin for digital audio

workstations that employs 3D audio synthesis to create the illusion of positional sound. An

important focus of audio engineers and content designers working in the integration of 3D audio

into virtual reality environments is the ability to automate spatial mixes in relation to user-

generated movements. Using coding within Unity 5 (or similar game development engine) in

conjunction with the Oculus audio software development kit, content creators can associate

various game responses with audio cues that create an interactive 3D audio world. In

appropriately coded content, elements of the game have positional sound effects that create an

illusion of space. In addition, the illusion of space will interact with the user’s position within the

virtual environment. As they approach the direction of a sound, it may get louder or change. On

Oculus’s developer website, however, they state that the audio development kit “only supports

direct reflections and does not factor in the virtual world geometry.”1 Though the technology has

yet to be developed, creating audio that references the geometry of a virtual world to capture

accurate sound propagation is an important part of creating a coherent simulated environment.
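
As a rough illustration of the per-frame bookkeeping such a system performs (a sketch only; the function name and constants are illustrative and not taken from the Oculus SDK or Unity), a positional source can be reduced each frame to a gain that follows distance and a world-space direction for the spatializer:

```python
import math

def positional_gain_and_direction(listener_pos, source_pos,
                                  ref_distance=1.0, min_gain=0.05):
    """Per-frame update for a positional sound: returns (gain, azimuth)
    in world space. Gain follows a simple inverse-distance law, so the
    sound grows louder as the user approaches; the azimuth is what a
    spatializer would use to pick or interpolate an HRTF."""
    dx = source_pos[0] - listener_pos[0]
    dz = source_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dz)
    gain = max(min(ref_distance / max(distance, 1e-6), 1.0), min_gain)
    azimuth = math.degrees(math.atan2(dx, dz))  # 0 deg = straight ahead (+z)
    return gain, azimuth

# Walking toward a source directly ahead: the gain rises smoothly.
for z in (10.0, 5.0, 1.0):
    g, az = positional_gain_and_direction((0.0, 0.0), (0.0, z))
    print(f"distance {z:4.1f} m -> gain {g:.2f}, azimuth {az:+.1f} deg")
```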

The quality of interactive 3D audio within virtual reality depends on the implementation of the technology in virtual reality headsets. To communicate a spatial environment that reacts to the user’s specific movements, effective head tracking is essential. A sound meant to remain stationary in the world that fails to hold its position as the user moves will automatically create logical inconsistencies that compromise the immersive qualities of virtual reality. Software developers at multiple virtual reality companies, including Microsoft, Oculus, and Sony, have already begun perfecting prototypes of head tracking devices that synchronize with 3D audio. The main obstacle is that the HRTF measurements used to keep the placement of a sound synchronized with the user’s movements are unique to every individual, depending on overall body shape and on ear size and placement. With its Kinect technology, Microsoft has developed a way to approximate a user’s HRTF with a 3D scan of their body. Awaiting release as a consumer version in early 2016, Oculus’s most advanced prototype is the first to use head tracking for complete 3D audio-visual synchronization, though Sony’s Morpheus claims similar capabilities. The success of these devices will depend on how well they integrate this technology with other features to create a convincing, full immersion of all the senses. Ultimately, audio immersion depends on the implementation of interactive 3D audio by creative professionals. The technical ability required to program the technology may alienate composers and musicians, so it is important to establish a mode of communication between programmers and creatives so that artistic ideas can be brought to life within virtual reality.
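
The head-tracking requirement described above boils down to one transform, updated every frame: the direction handed to the audio renderer must be the source’s world direction corrected by the tracked head orientation. A minimal sketch (yaw only; names are illustrative):

```python
def head_relative_direction(source_azimuth_world: float,
                            head_yaw: float) -> float:
    """Keep a sound anchored in the world as the head turns: the
    direction fed to the 3D audio renderer is the world direction
    minus the tracked head yaw (all angles in degrees, wrapped to
    the range -180..180)."""
    return (source_azimuth_world - head_yaw + 180.0) % 360.0 - 180.0

# A source fixed 30 degrees to the right: as the user turns toward it,
# the rendered azimuth moves to 0 (straight ahead), so the sound holds
# its position in the virtual world instead of following the head.
for yaw in (0.0, 15.0, 30.0):
    print(f"head yaw {yaw:5.1f} -> rendered azimuth "
          f"{head_relative_direction(30.0, yaw):+6.1f} deg")
```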

Works Cited:

[1] “Introduction to Virtual Reality Audio.” Developer Center. Oculus VR, n.d. Web. <https://developer.oculus.com/documentation/audiosdk/latest/>.

Dolezal, Luna. “The Remote Body: The Phenomenology of Telepresence and Re-embodiment.” Human Technology 5(2) (2009): 208-26. Academia. Web.

Lalwani, Mona. “Surrounded by Sound: How 3D Audio Hacks Your Brain.” The Verge. N.p., 12 Feb. 2015. Web.

Huiberts, S. Captivating Sound: The Role of Audio for Immersion in Games. Doctoral thesis. University of Portsmouth and Utrecht School of the Arts, Portsmouth, 2010.

Dorrier, Jason. “What’s Missing from Virtual Reality? Immersive 3D Soundscapes.” Singularity Hub. N.p., 6 July 2014. Web.

Basnicki, Erica. “Feature: The Rise of Immersive Audio.” Audio Media International. N.p., 2 June 2015. Web.

Simonite, Tom. “Microsoft’s ‘3-D Audio’ Gives Virtual Objects a Voice.” MIT Technology Review. N.p., 4 June 2014. Web.

Nelson, Phillip A. “Virtual Acoustics and Audio Engineering.” University of Southampton: Institute of Sound and Vibration Research. N.p., n.d. Web.

Geier, Matthias, Jens Ahrens, and Sascha Spors. “Object-Based Audio Reproduction and the Audio Scene Description Format.” Organised Sound 15.3 (2010): 219-27. ProQuest. Web. 16 Dec. 2015.

Tonnesen, Cindy, and Joe Steinmetz. “3D Sound Synthesis.” Encyclopedia of Virtual Environments, University of Washington. Web.

Burns, Chris. “Oculus Rift Crescent Bay Hands-on with 3D Audio.” SlashGear. N.p., 12 Jan. 2015. Web. <http://www.slashgear.com/oculus-rift-crescent-bay-hands-on-with-3d-audio-12363933/>.
