By Tommy Goodkin, Head of Content at VR World
Until very recently, virtual reality start-ups have concentrated on the visual aspects of the virtual reality experience; little attention has been paid to the use of sound to enhance its immersive qualities. However, as demand grows for greater realism in virtual worlds, content creators are acknowledging the role of audio in creating an illusion of physical presence within a simulation. While much virtual reality content succeeds in visually portraying an interactive world, it overlooks the importance of the other sensory cues that inform our perceptual understanding of an environment.
Game audio theory is a useful context for examining the role of sound in interactive
experiences. In a video game, sound is categorized as either diegetic (belonging to the game world) or extradiegetic. Diegetic sounds are what a video game character might hear, while extradiegetic sounds are directed at the player to enhance gameplay. Diegetic sounds, such as in-game dialogue or cues for imminent action, give the player environmental context within the game world. Extradiegetic sound in video games may
include music that enhances the overall atmosphere, but can also serve practical functions that
allow the player to distinguish between gameplay and different modes of the game or identify
gameplay objectives. In addition, the extradiegetic element of music may be contingent upon the
actions of the player, building in intensity during arcs in the gameplay narrative or signaling
entry to a different location or level within the game.
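The diegetic/extradiegetic split described above is, in practice, a routing decision an audio engine makes for every sound event. The sketch below illustrates that idea in Python; the class, function, and bus names are hypothetical, not any engine's actual API.

```python
from dataclasses import dataclass

@dataclass
class SoundEvent:
    name: str
    diegetic: bool  # True if the sound exists inside the game world

def mix_bus(event: SoundEvent) -> str:
    # Diegetic sounds belong on a "world" bus, where they can be
    # spatialized like any in-world object; extradiegetic sounds
    # (menus, score) are mixed straight to the player.
    return "world" if event.diegetic else "interface"

footsteps = SoundEvent("footsteps", diegetic=True)
menu_chime = SoundEvent("menu_chime", diegetic=False)
print(mix_bus(footsteps))   # world
print(mix_bus(menu_chime))  # interface
```

The point of the sketch is that the categories are not merely theoretical: they determine which processing a sound receives downstream.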
Because of game audio’s reliance on interactivity, it serves as a much more useful model
than film for developing sound theory in the context of virtual reality. Unlike a film score that
follows a predetermined narrative, sound in virtual reality can be user-generated. It must rely on
an adaptive model of synchronization between sound and the visual environment that depends on
the potential outcomes within an experience. Unlike video games, however, a virtual reality
experience may not have the limitations of restricted mobility and visual perception. Because of
this, extradiegetic sounds that disrupt full immersion into the game world for the purpose of
gameplay enhancement are sometimes unnecessary. Virtual reality's greater potential for immersion allows sound design to contribute to the immersive experience rather than detract from it. In virtual reality, diegetic sound takes on an
entirely new importance, not only establishing an environment but also conveying a sense of
realism within that environment. Diegetic audio in video games, like footsteps indicating movement, tends to be exaggerated to emphasize an unseen action. In contrast, virtual
reality has the ultimate goal of enabling the user to truly believe they have been transported to a
different environment. This not only requires realistic representations of diegetic sounds (or
logically approximated, in the case of a fantasy world), but also spatial cues that correspond to
the sound’s origin within the world.
As yet, there is no uniform approach to synchronizing audio with a 3D environment, but
various developers have been experimenting with mimicking how we perceive sound in the real
world and relating it to other sensory elements of the virtual reality experience. To successfully
convey the illusion of embodiment, the participant must be surrounded by sound in the same way
they are in the real world. Sound cues must distinguish what is above, below, behind, in front of, and around us; emulating a multi-directional sound environment that mimics human perception is essential to sustaining the illusion of embodiment. Unlike popular
music, which relies on the left-right balance of a stereo configuration, virtual reality must rely on
3D audio to place sounds within a three-dimensional space. While surround sound can also be integrated into virtual reality, the cost and size of surround sound systems optimized for virtual content could keep virtual reality developers from their ultimate goal of a product intended for everyday consumers.
Beyond the constraints of realism, experimental and artistic virtual reality content should
still have some synchronization between audio and visuals that can convey a multidimensional
environment, real or imagined, to multiple senses. In these projects, traditionally extradiegetic
elements within film and video games such as music can be diegetic in the context of the
experience. An example could be a multimedia art project in which virtual reality and 3D audio synchronize music and visuals in three dimensions to communicate their own narrative. In the context of traditional narrative or
objective-based content, a music score can contribute to the artistic value and emotional immersion of virtual reality content, but creators should be wary of 3D audio's potential to blur the line between diegetic and extradiegetic sound within the experience. For example, in
content that uses music as an atmospheric element, 3D audio can create the illusion that the
music is being performed within the diegesis and confuse the user. However, in a virtual concert-going experience, 3D audio can establish both the diegesis and a realistic environment, following
the spatial relationship between the viewer and the music event. Two-dimensional audio still has
appropriate applications within virtual reality, as scoring contributes to immersion by enhancing
or highlighting the interactive nature of the content.
The importance of sustaining immersion through diegetic sound distinguishes virtual
reality from film. Film's confinement within the space of a frame lets the medium rely on the audience's subconscious auditory completion of implied actions. For example, a film doesn't need 3D audio to express a character's movement away from the camera as they exit a scene. The perceived depth of the action isn't of any importance to
the immersive qualities of a film. The audience automatically distinguishes themselves as
separate entities from the film world, so diegetic sound doesn’t have the responsibility of
immersion that exists in virtual reality content and games. However, the immersive quality of
video games is usually confined to the user’s responsibility to achieve a goal set within the game
world. Virtual reality can include the goal-oriented entertainment of a game, but can also be a
narrative that unfolds before the viewer in a first-person, interactive environment. This form
of entertainment is still interactive, but has a pre-established storyline and doesn’t require active
task-oriented participation. Narrative-based content such as this relies on the potential of
embodiment within a virtual storyline to be immersive entertainment. The diegetic sound in virtual reality storytelling is critically important to the enhancement of the narrative and the quality of immersion. Unlike film, such content requires 3D audio to fully engage the viewer within
the virtual story world by establishing continuity between what the user sees and hears. If the
continuity is broken, and three-dimensional actions are accompanied by two-dimensional sounds,
the mind will register the inauthenticity of the environment and prevent the sensation of being
fully transported into the virtual world.
The idea that sound can occupy space in a recording the same way it does in an actual acoustic event was first explored through binaural recording. Though the technique is over a century old, it was fully realized by the microphone company Neumann in 1973. Neumann's
recording device was an anatomical replica of the human head with microphones placed where the eardrums would be, recording from the physical point where humans perceive sound. The model replicated the human process of hearing, accurately recreating the sound that arrives at each ear during perception of the acoustic event.
The human ability to localize sound is based primarily on ITDs (interaural time differences) and ILDs (interaural level differences), which compare the differences in time of arrival and level of a sound between the two ears to determine its location. Mathematical
functions known as head-related transfer functions, or HRTFs, identify how sound travels before
it is perceived by the eardrum. Beyond the influence of ITDs and ILDs, HRTFs describe how the shape of the listener's ears, the shape of their body, and the acoustic characteristics of an environment filter a sound on its way to the ear. By
approximating the HRTFs that occur during human hearing and configuring microphones based
on the approximation, a binaural recording device captures all the spatial cues that occur in a real acoustic event. A pair of HRTFs, one for each ear, provides the brain with enough information to make inferences about the location of a sound.
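The ITD cue described above can be illustrated with the classic spherical-head approximation (Woodworth's formula), which estimates how much later a distant sound arrives at the far ear. The head radius and speed of sound below are assumed average values, and the function is an illustrative sketch, not a measured HRTF.

```python
import math

HEAD_RADIUS_M = 0.0875   # average adult head radius (assumption)
SPEED_OF_SOUND = 343.0   # m/s in air at roughly room temperature

def itd_seconds(azimuth_deg: float) -> float:
    """Estimated time-of-arrival difference between the ears for a
    distant source (0 deg = straight ahead, 90 deg = directly to one side)."""
    theta = math.radians(azimuth_deg)
    # Woodworth's spherical-head model: path difference around the head.
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (math.sin(theta) + theta)

# A source straight ahead produces no ITD; one at the side, the maximum.
print(round(itd_seconds(0) * 1e6))   # 0 microseconds
print(round(itd_seconds(90) * 1e6))  # about 656 microseconds
```

Differences on the order of hundreds of microseconds are all the brain needs to lateralize a sound, which is why even small synchronization errors are audible.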
Because binaural recording can capture only acoustic sounds, its most useful virtual reality applications are realistic emulations of the world: reproducing a conversation, a live music experience, or the background noise of nature or an urban landscape. In applications that require
synthesized sound or directly recorded electronic instruments, a 3D audio effect requires digital
manipulation of the sound to localize a source. Using algorithms derived from HRTF measurements, the spectral qualities that determine the perceived position of a sound can be analyzed and replicated. Processes like selective filtering apply approximations of individual HRTFs to synthesize a three-dimensional audio world from the spectral cues by which human hearing perceives directional sound. However, it remains the audio engineer's responsibility to design a realistic acoustic environment, using environmental modeling with reverb and delays.
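As a rough illustration of this digital manipulation, a mono signal can be given crude positional cues by delaying and attenuating the channel for the far ear, faking the ITD and ILD respectively. This is a deliberately simplified sketch with assumed constants; real spatializers convolve the signal with measured HRTF filters rather than applying a single delay and gain.

```python
import math

def spatialize(mono, sample_rate, azimuth_deg):
    """Return (left, right) channels for a mono signal placed at the
    given azimuth (positive = to the listener's right)."""
    theta = math.radians(azimuth_deg)
    # Fake ITD: delay the far ear (spherical-head estimate, up to ~0.66 ms).
    delay_s = (0.0875 / 343.0) * (abs(math.sin(theta)) + abs(theta))
    delay_samples = int(round(delay_s * sample_rate))
    # Fake ILD: attenuate the far ear to mimic head shadowing (crude).
    far_gain = 1.0 - 0.5 * abs(math.sin(theta))
    near = mono
    far = [0.0] * delay_samples + [s * far_gain for s in mono]
    far = far[:len(mono)]
    # Positive azimuth: the right ear is the near ear.
    return (far, near) if azimuth_deg > 0 else (near, far)
```

For example, a click panned to 90 degrees arrives at the right ear immediately and at the left ear roughly 29 samples later (at 44.1 kHz) and at half the level.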
The Oculus software development kit includes resources devoted to content developers
who want to create immersive audio. It features Oculus Spatializer, a plugin for digital audio
workstations that employs 3D audio synthesis to create the illusion of positional sound. An
important focus of audio engineers and content designers working in the integration of 3D audio
into virtual reality environments is the ability to automate spatial mixes in relation to user-
generated movements. Using scripting in Unity 5 (or a similar game engine) in
conjunction with the Oculus audio software development kit, content creators can associate
various game responses with audio cues that create an interactive 3D audio world. In
appropriately coded content, elements of the game have positional sound effects that create an illusion of space, and that illusion responds to the user's position within the virtual environment: as the user approaches the source of a sound, it may get louder or change. On
Oculus’s developer website, however, they state that the audio development kit “only supports
direct reflections and does not factor in the virtual world geometry.”1 Though the technology has
yet to be developed, creating audio that references the geometry of a virtual world to capture
accurate sound propagation is an important part of creating a coherent simulated environment.
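The position-dependent behavior described above (sounds getting louder as the user approaches) reduces, in its simplest form, to recomputing each source's gain from the listener's position every frame. The sketch below uses the standard inverse-distance rolloff; the function name and constants are illustrative, not the Oculus SDK's or Unity's actual API.

```python
import math

def distance_gain(source, listener, ref_dist=1.0, min_gain=0.01):
    """Gain for a source at position `source` heard from `listener`
    (both (x, y, z) tuples), using inverse-distance attenuation."""
    dx, dy, dz = (s - l for s, l in zip(source, listener))
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Full volume inside the reference distance, falling off as 1/d
    # beyond it, floored so distant sources don't vanish abruptly.
    if dist <= ref_dist:
        return 1.0
    return max(ref_dist / dist, min_gain)

print(distance_gain((0, 0, 2), (0, 0, 0)))   # 0.5: twice the reference distance
print(distance_gain((0, 0, 10), (0, 0, 0)))  # 0.1
```

A real engine layers direction (via HRTFs), occlusion, and reflections on top of this distance term; the Oculus documentation's caveat about world geometry concerns exactly those missing layers.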
The qualities of interactive 3D audio within virtual reality depend on the implementation
of the technology in virtual reality headsets. To communicate a spatial environment that reacts to
the user's specific movements, effective head tracking is essential. Positional sound that cannot hold its location relative to the user as they move will create logical inconsistencies that compromise the immersive qualities of virtual reality. Software developers
for multiple virtual reality companies including Microsoft, Oculus, and Sony have already begun
perfecting their prototypes for head tracking devices that synchronize with 3D audio. The main
obstacle is that the HRTF measurements used to maintain synchronization between the user’s
movements and the placement of the sound are unique to every individual, depending on their
overall body shape, ear size, and ear placement. With its Kinect technology, Microsoft has
developed a way to approximate a user’s HRTF with a 3D scan of their body. Awaiting release as
a consumer version in early 2016, Oculus’s most advanced prototype is the first to use head
tracking for complete 3D audio-visual synchronization, but Sony’s Morpheus claims similar
capabilities. The success of these devices will depend on how well they integrate this technology
into other features, creating a convincing, full immersion of all the senses. Audio immersion ultimately depends on the implementation of interactive 3D audio by creative professionals. The
technical ability required to program the technology may alienate composers and musicians, so
it’s important that a mode of communication between programmers and creatives is established
so artistic ideas can be brought to life within virtual reality technology.
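The head-tracking synchronization this section describes amounts to re-expressing each world-anchored sound source in head-relative coordinates on every frame, so that turning the head shifts the sound the opposite way. The yaw-only 2D sketch below is a simplification with hypothetical names; a real headset tracks full three-axis orientation.

```python
import math

def to_head_relative(source_xy, head_yaw_deg):
    """Rotate a world-space source position into the listener's head
    frame. Convention (assumed): +x is the listener's right, +y is
    straight ahead, and positive yaw is a counterclockwise (leftward)
    head turn."""
    yaw = math.radians(head_yaw_deg)
    x, y = source_xy
    # Apply the inverse of the head rotation to the world position.
    rx = x * math.cos(-yaw) - y * math.sin(-yaw)
    ry = x * math.sin(-yaw) + y * math.cos(-yaw)
    return rx, ry

# A source straight ahead, after a 90-degree leftward head turn,
# should now sit directly to the listener's right.
rel = to_head_relative((0.0, 1.0), 90.0)
print(round(rel[0]), round(rel[1]))  # 1 0
```

If this per-frame update lags or drifts, a nominally stationary sound appears to wander, producing exactly the logical inconsistencies described above.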
1 “Introduction to Virtual Reality Audio.” Developer Center. Oculus VR, n.d. Web. <https://developer.oculus.com/documentation/audiosdk/latest/>.
Dolezal, Luna. “The Remote Body: The Phenomenology of Telepresence and Re-embodiment.” Human Technology 5(2) (2009): 208-26. Academia. Web.
Lalwani, Mona. “Surrounded by Sound: How 3D Audio Hacks Your Brain.” The Verge. N.p., 12 Feb. 2015. Web.
Huiberts, S., Captivating Sound: the Role of Audio for Immersion in Games. Doctoral Thesis. University of Portsmouth and Utrecht School of the Arts, Portsmouth, 2010.
Dorrier, Jason. “What’s Missing from Virtual Reality? Immersive 3D Soundscapes” Singularity HUB. N.p., 06 July 2014. Web.
Basnicki, Erica. “Feature: The Rise of Immersive Audio.” Audio Media International. N.p., 2 June 2015. Web.
Simonite, Tom. “Microsoft’s ‘3-D Audio’ Gives Virtual Objects a Voice.” MIT Technology Review. N.p., 04 June 2014. Web.
Nelson, Phillip A. “Virtual Acoustics and Audio Engineering.” University of Southampton: Institute of Sound and Vibration Research. N.p., n.d. Web.
Geier, Matthias, Jens Ahrens, and Sascha Spors. “Object-Based Audio Reproduction and the Audio Scene Description Format.” Organised Sound 15.3 (2010): 219-27. ProQuest. Web. 16 Dec. 2015.
Tonnesen, Cindy, and Joe Steinmetz. “3D Sound Synthesis.” Encyclopedia of Virtual Environments, University of Washington. Web.
Burns, Chris. “Oculus Rift Crescent Bay Hands-on with 3D Audio.” SlashGear. N.p., 12 Jan. 2015. Web. <http://www.slashgear.com/oculus-rift-crescent-bay-hands-on-with-3d- audio-12363933/>.