Sound and Music for Video Games

57
Sound and Music for Video Games Technology Overview Roger Crawfis Ohio State University

description

Sound and Music for Video Games. Technology Overview Roger Crawfis Ohio State University. Overview. Fundamentals of Sound Psychoacoustics Interactive Audio Applications. What is sound?. Sound is the sensation perceived by the sense of hearing - PowerPoint PPT Presentation

Transcript of Sound and Music for Video Games

Page 1: Sound and Music for Video Games

Sound and Music for Video Games

Technology Overview

Roger Crawfis

Ohio State University

Page 2: Sound and Music for Video Games

Overview

• Fundamentals of Sound

• Psychoacoustics

• Interactive Audio

• Applications

Page 3: Sound and Music for Video Games

What is sound?

• Sound is the sensation perceived by the sense of hearing

• Audio is acoustic, mechanical, or electrical frequencies corresponding to normally audible sound waves

Page 4: Sound and Music for Video Games

Dual Nature of Sound

• Transfer of sound and physical stimulation of ear

• Physiological and psychological processing in ear and brain (psychoacoustics)

Page 5: Sound and Music for Video Games

Transmission of Sound

• Requires a medium with elasticity and inertia (air, water, steel, etc.)

• Movements of air molecules result in the propagation of a sound wave

Page 6: Sound and Music for Video Games

Longitudinal Motion of Air

Page 7: Sound and Music for Video Games

Wavefronts and Rays

Page 8: Sound and Music for Video Games

Reflection of Sound

Page 9: Sound and Music for Video Games

Absorption of Sound

• Some materials readily absorb the energy of a sound wave

• Example: carpet, curtains at a movie theater

Page 10: Sound and Music for Video Games

Refraction of Sound

Page 11: Sound and Music for Video Games

Refraction of Sound

Page 12: Sound and Music for Video Games

Diffusion of Sound

• Not analogous to diffusion of light

• Naturally occurring diffusions of sounds typically affect only a small subset of audible frequencies

• Nearly full diffusion of sound requires a reflection phase grating (Schroeder Diffuser)

Page 13: Sound and Music for Video Games

The Inverse-Square Law (Attenuation)

24 r

WI

I is the sound intensity in W/cm^2

W is the sound power of the source in W

r is the distance from the source in cm

Page 14: Sound and Music for Video Games

The Skull

• Occludes wavelengths “small” relative to the skull

• Causes diffraction around the head (helps amplify sounds)

• Wavelengths much larger than the skull are not affected (explains how low frequencies are not directional)

Page 15: Sound and Music for Video Games

The Pinna

Page 16: Sound and Music for Video Games

Ear Canal and Skull

• (A) Dark line – ear canal only

• (B) Dashed line – ear canal and skull diffraction

Page 17: Sound and Music for Video Games

Auditory Area (20Hz-20kHz)

Page 18: Sound and Music for Video Games

Spatial Hearing

• Ability to determine direction and distance from a sound source

• Not fully understood process

• However, some cues have been identified as useful

Page 19: Sound and Music for Video Games

The “Duplex” Theory of Localization

• Interaural Intensity Differences (IIDs)

• Interaural Arrival-Time Differences (ITDs)

Page 20: Sound and Music for Video Games

Interaural Intensity Difference

• The skull produces a sound shadow• Intensity difference results from one ear being

shadowed and the other not• The IID does not apply to frequencies below

1000Hz (waves similar or larger than size of head)

• Sound shadowing can result in up to ~20dB drops for frequencies >=6000Hz

• The Inverse-Square Law can also effect intensity

Page 21: Sound and Music for Video Games

Head Rotation or Tilt

• Rotation or tilt can alter interaural spectrum in predictable manner

Page 22: Sound and Music for Video Games

Interaural Arrival-Time Difference

• Perception of phase difference between ears caused by arrival-time delay (ITD)

• Ear closest to sound source hears the sound before the other ear

Page 23: Sound and Music for Video Games

Digital Sound

• Remember that sound is an analogue process (like vision).

• Computers need to deal with digital processes (like digital images).

• Many similar properties between computer imagery and computer sound processing.

Page 24: Sound and Music for Video Games

Class or Semantics

• Sample

• Stream Sounds

• Music

• Tracks

• MIDI

Page 25: Sound and Music for Video Games

Sound for Games

• Stereo doesn’t cut it anymore – you need positional audio.

• Positional audio increases immersion• The Old: Vary volume as position changes• The New: Head-Related Transfer Functions (HRTF) for

3d positional audio with 2-4 speakers• Games use:

– Dolby 5.1: requires lots of speakers – Creative’s EAX: “environmental audio”– Aureal’s A3D: good positional audio– DirectSound3D: Microsoft’s answer– OpenAL: open, cross-platform API

Page 26: Sound and Music for Video Games

Audio Basics

• Has two fundamental physical properties– Frequency (the pitch of the wave – oscillations per second

(Hertz))– Amplitude (the loudness or strength of the wave - decibels)

Frequency

Amplitude

Page 27: Sound and Music for Video Games

Sampling• A sound wave is “sampled”

– measurements of amplitude taken at a “fast” rate– results in a stream of numbers

5 10 15 20mS Time

-1

-0.5

0.5

1

Amplitude

Page 28: Sound and Music for Video Games

Data Rates for Sound

• Human ear can hear frequencies between ?? and ??.• Must sample at twice the highest frequency.

– Assume stereo (two channels)

– Assume 44Khz sampling rate (CD sampling rate)

– Assume 2 bytes per channel per sample

– How much raw data is required to record 3 minutes of music?

Page 29: Sound and Music for Video Games

Waveform Sampling: Quantization

• Quantization

• Introduces

• Noise

• Examples: 16, 12, 8, 6, 4 bit music

• 16, 12, 8, 6, 4 bit speech

Page 30: Sound and Music for Video Games

Limits of Human Hearing• Time and Frequency

Events longer than 0.03 seconds are resolvable in time

shorter events are perceived as features in frequency

20 Hz. < Human Hearing < 20 KHz.

(for those under 15 or so)

“Pitch” is PERCEPTION related to FREQUENCY

Human Pitch Resolution is about 40 - 4000 Hz.

Page 31: Sound and Music for Video Games

Limits of Human Hearing

• Amplitude or Power???

– “Loudness” is PERCEPTION related to POWER, not

AMPLITUDE

– Power is proportional to (integrated) square of signal

– Human Loudness perception range is about 120 dB, where +10 db = 10 x power = 20 x amplitude

– Waveform shape is of little consequence. Energy at each frequency, and how that changes in time, is the most important feature of a sound.

Page 32: Sound and Music for Video Games

Limits of Human Hearing• Waveshape or Frequency Content??

– Here are two waveforms with identical power spectra, and which are (nearly) perceptually identical:

Wave 1

Wave 2

Magnitude

Spectrum

Page 33: Sound and Music for Video Games

Limits of Human HearingMasking in Amplitude, Time, and Frequency

– Masking in Amplitude: Loud sounds ‘mask’ soft ones.Masking in Amplitude: Loud sounds ‘mask’ soft ones.

Example: Quantization NoiseExample: Quantization Noise

– Masking in time: A soft sound just before a louderMasking in time: A soft sound just before a louder

sound is more likely to be heard than if it is just after.sound is more likely to be heard than if it is just after.

Example (and reason): Reverb vs. “Preverb”Example (and reason): Reverb vs. “Preverb”

– Masking in Frequency: Loud ‘neighbor’ frequencyMasking in Frequency: Loud ‘neighbor’ frequency

masks soft spectral components. Low soundsmasks soft spectral components. Low sounds

mask higher ones more than high masking low.mask higher ones more than high masking low.

Page 34: Sound and Music for Video Games

Limits of Human Hearing• Masking in Amplitude• Intuitively, a soft sound will not be heard if

there is a competing loud sound. Reasons:– Gain controls in the ear

stapedes reflex and more

– Interaction (inhibition) in the cochlea– Other mechanisms at higher levels

Page 35: Sound and Music for Video Games

Limits of Human Hearing• Masking in Time

– In the time range of a few milliseconds:

– A soft event following a louder event tends to be grouped perceptually as part of that louder event

– If the soft event precedes the louder event, it might be heard as a separate event (become audible)

Page 36: Sound and Music for Video Games

Limits of Human Hearing• Masking in Frequency

Only one component in this spectrum is

audible because of frequency masking

Page 37: Sound and Music for Video Games

Sampling Rates

• For Cheap Compression, Look atLowering the Sampling

Rate First• 44.1kHz 16 bit = CD Quality

• 8kHz 8 bit MuLaw = Phone Quality

• Examples:• Music: 44.1, 32, 22.05, 16, 11.025kHz• Speech: 44.1, 32, 22.05, 16, 11.025, 8kHz

Page 38: Sound and Music for Video Games

Views of Digital Sound• Two (mainstream) views of sound

and their implications for compression

• 1) Sound is Perceived– The auditory system doesn’t hear everything present

– Bandwidth is limited

– Time resolution is limited

– Masking in all domains

• 2) Sound is Produced

– “Perfect” model could provide perfect compression

Page 39: Sound and Music for Video Games

Production Models

• Build a model of the sound production system, then fit the parameters– Example: If signal is speech, then a well-

parameterized vocal model can yield highest quality and compression ratio

• Benefits: Highest possible compression• Drawbacks: Signal source(s) must be

assumed, known, or identified

Page 40: Sound and Music for Video Games

MIDI and Other ‘Event’ Models• Musical Instrument Digital Interface

– Represents Music as Notes and Events – and uses a synthesis engine to “render” it.

• An Edit Decision List (EDL) is another example.– A history of source materials, transformations, and

processing steps is kept. Operations can be undone or recreated easily. Intermediate non-parametric files are not saved.

Page 41: Sound and Music for Video Games

Event Based Compression

• A Musical Score is a very compact representation of music

• Benefits:– Highest possible compression

• Drawbacks:– Cannot guarantee the “performance”– Cannot assure the quality of the sounds– Cannot make arbitrary sounds

Page 42: Sound and Music for Video Games

Event Based Compression• Enter General MIDI

– Guarantees a base set of instrument sounds,– and a means for addressing them,– but doesn’t guarantee any quality

• Better Yet, Downloadable Sounds– Download samples for instruments– Benefits: Does more to guarantee quality

– Drawbacks: Samples aren’t reality

Page 43: Sound and Music for Video Games

Event Based Compression

• Downloadable Algorithms– Specify the algorithm, the synthesis engine runs it,

and we just send parameter changes– Part of “Structured Audio” (MPEG4)

• Benefits:– Can upgrade algorithms later– Can implement scalable synthesis

• Drawbacks:– Different algorithm for each class of sounds

(but can always fall back on samples)

Page 44: Sound and Music for Video Games

Compressed Audio Formats

Name Extension OwnershipAIFF (Mac) .aif, .aiff Public

AU (Sun/Next) .au Public

CD audio (CDDA) N/A Public

MP3 .mp3 MPEG Audio Layer-III

Windows Media Audio .wma Proprietary (Microsoft)

QuickTime .qt Proprietary (Apple)

RealAudio .ra, ram Proprietary (Real Networks)

WAV .wav Public

Page 45: Sound and Music for Video Games

To be continued …

• Stop here

• Sound Group Technical Presentations.

• Suggested Topics:– Compression– Controlling the Environment– ToolKit I features– ToolKit II features– Examples and Demos

Page 46: Sound and Music for Video Games

Environmental Effects

• Obstruction/Occlusion

• Reverberation

• Doppler Shift

• Atmospheric Effects

Page 47: Sound and Music for Video Games

Obstruction

• Same as sound shadowing

• Generally approximated by a ray test and a low pass filter

• High frequencies should get shadowed while low frequencies diffract

Page 48: Sound and Music for Video Games

Obstruction

Page 49: Sound and Music for Video Games

Occlusion

• A completely blocked sound

• Example: A sound that penetrates a closed door or a wall

• The sound will be muffled (low pass filter)

Page 50: Sound and Music for Video Games

Reverberation

• Effects from sound reflection

• Similar to echo

• Static reverberation

• Dynamic reverberation

Page 51: Sound and Music for Video Games

Static Reverberation

• Relies on the “closed container” assumption

• Parameters used to specify approximate environment conditions (decay, room size, etc.)

• Example: Microsoft DirectSound3D EAX

Page 52: Sound and Music for Video Games

Static Reverberation

Page 53: Sound and Music for Video Games

Dynamic Reverberation

• Calculation of reflections off of surfaces taking into account surface properties

• Typically diffusion and diffraction ignored

• “Wave Tracing”

• Example: Aureal A3D 2.0

Page 54: Sound and Music for Video Games

Dynamic Reverberation

Page 55: Sound and Music for Video Games

Comparison

• Static Reverberation less expensive computationally, simple to implement

• Dynamic Reverberation very expensive computationally, difficult to implement, but potentially superior results

Page 56: Sound and Music for Video Games

Doppler Shift

• Change in frequency due to velocity

• Very susceptible to temporal aliasing

• The faster the update rate the better

• Requires dedicated hardware

Page 57: Sound and Music for Video Games

Atmospheric Effects

• Attenuate high frequencies faster than low frequencies

• Moisture in air increases this effect