1
Brent Cowan and Bill Kapralos
Faculty of Business and Information Technology, University of Ontario Institute of Technology
2000 Simcoe Street North, Oshawa, Ontario, Canada. L1H 7K4.
2
Motivation (1): Importance of Real World Sounds
Sounds give detailed info of our surroundings
Determine direction and distance to objects
Warn of approaching dangers → particularly important in the “animal kingdom”, e.g., predators
Unlike vision, hearing is omni-directional
Can hear in complete darkness!
Can guide the more “finely tuned” visual system
UOIT Student Research Day – August 22 2008
Eases the burden of the visual system
3
Motivation (2): Importance of Real World Sounds (cont.)
We do not need to see a “roaring” lion to realize that we may be in a potentially dangerous situation
The lion’s roar is enough!
4
Motivation (3): Importance of Real World Sounds (cont.)
We do not need to see an “angry” dog to realize that we may be in a potentially dangerous situation
The dog’s bark is enough!
5
Motivation (4): Sound is an Essential Part of Any Immersive Environment (VR, Games, etc.)
Conveys basic information to the users
e.g., footsteps in a small room vs. footsteps outside in a large open field
Allows users to orient themselves
Increases situational awareness
Helps increase immersion and hence presence
Can enhance perception of poor video
Can provide a sense of ambience → mood and emotion
6
Motivation (5): Sound is an Essential Part of Any Immersive Environment (VR, Games, etc.) (cont.)
Although definitely downplayed, sound has actually been a key element of video games from the “early times”
Consider the following sample
Does it sound familiar?
7
Motivation (6): Sound is an Essential Part of Any Immersive Environment (VR, Games, etc.) (cont.)
Namco’s Pac-Man (1980)
“The world’s most popular arcade video game ever”
You can still recollect a “key” sound in this game → sound is more important than you might have thought!
8
Motivation (7): Spatial Sound Often Ignored in a VE
When present, typically:
Cues are poor → don’t always reflect natural spatial cues
“Far-field” acoustical model assumed → sound source at infinity, plane waves
Emphasis typically placed on visual senses
Graphics
Stereo vision, etc.
9
Overview (1): What is Auralization?
According to Kleiner et al.:
The process of rendering audible, by physical or mathematical modeling, the sound field of a source in space in such a way as to simulate the binaural listening experience at a given position in the modeled space
Goal → recreate a particular listening environment, taking into account the acoustics of the environment (e.g., the “room acoustics”), and the characteristics of the listener
10
Overview (2): What is Auralization? (cont.)
Auralization can be realized by determining the binaural room impulse response (BRIR)
BRIR represents the response of a particular acoustical environment and human listener to sound energy and captures the room acoustics for a particular sound source and listener configuration
11
Overview (3): For Simplicity, Typically Decomposed Into Two Components
Room impulse response (RIR)
Represents the reflection (reverberation), diffraction, refraction, sound attenuation, and absorption properties of a particular room configuration
The environmental context of a listening room or the “room acoustics”
Head-related transfer function (HRTF)
Filtering of sound spectrum by interactions of sound with head, torso, and particularly pinna
12
Sound Localization (1): Head-Related Transfer Function (HRTF)
Filtering of sound spectrum by interactions of sound with head, torso, and particularly pinna
Pinna: Series of grooves and notches which accentuate or suppress mid- and high-frequency components in a position-dependent manner
Each person’s pinna differs → filtering effects differ
13
Spatial Audio (1): Binaural Synthesis
Assume the HRTF and RIR can both be modeled by linear time-invariant (LTI) filters
Measure or model the HRTF and RIR → resulting transfer function can be used to filter a source sound
Combine the HRTF- and RIR-filtered (processed) sounds via a post-processing operation
When presented to the listener, the impression of the environment being synthesized is recreated
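The binaural synthesis steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the impulse responses are made-up toy values (not measured HRTF or RIR data), the function names are hypothetical, and the two filters are applied in cascade (room first, then each ear), which is one common arrangement. The slides do not specify the post-processing operation used to combine the filtered sounds.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural_synthesis(source, rir, hrir_left, hrir_right):
    """Filter a mono source with the room response, then with each ear's response.

    Assumption: cascade ordering (RIR first, then per-ear HRIR). Because both
    filters are LTI, this equals filtering once with the combined response.
    """
    reverberant = convolve(source, rir)
    left = convolve(reverberant, hrir_left)
    right = convolve(reverberant, hrir_right)
    return left, right


# Toy values for illustration only (hypothetical, not measured data).
source = [1.0, 0.5, 0.25]
rir = [1.0, 0.0, 0.3]          # direct sound plus one weak reflection
hrir_left = [0.9, 0.1]         # left ear: early, strong arrival
hrir_right = [0.0, 0.6, 0.2]   # right ear: delayed, attenuated arrival

left, right = binaural_synthesis(source, rir, hrir_left, hrir_right)
```

Because the filters are assumed LTI, the order of the two convolutions does not change the result, which is what allows the RIR and HRTF to be measured or modeled separately.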
14
Graphics Processing Unit (GPU) (1): Overview
A dedicated graphics rendering device for a personal computer, workstation, or game console
Very efficient at manipulating and displaying computer graphics, and their highly parallel structure makes them more effective than general-purpose CPUs for a range of complex algorithms
Modern GPUs use most of their power to do calculations related to 3D computer graphics
15
Graphics Processing Unit (GPU) (2): Overview (cont.)
Modern GPUs contain a programmable pipeline
User flexibility → programmer is free to exploit the inherent power of the GPU
Shader → GPU program written in one of many “shader languages”
Can exploit GPU power for non-computer graphics applications
General purpose GPU or GPGPU → many applications including computer vision, audio…
16
Goals of this Work (1): Application of the GPU to Spatial Audio
Take advantage of the tremendous computational power of the graphics processing unit
Develop a real-time, one-dimensional convolution method that utilizes the GPU and can be employed for the generation of spatial audio
Allow for the inclusion of plausible spatial audio in interactive virtual environments and games
Provide a comparison of general software-based convolution and GPU-based convolution
17
Method / Implementation (1): Implementation
OpenGL Shading Language
Executed on typical programmable graphics cards
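The slides do not show the shader itself, so the following is not the authors' implementation. It is a hedged Python sketch of the "gather" formulation that suits a GPU: every output sample is computed independently from the input and the filter taps, so each one could in principle be handled by a separate shader invocation. The function names are hypothetical.

```python
def output_sample(signal, taps, n):
    """Compute y[n] = sum over k of taps[k] * signal[n - k].

    Input samples outside the signal are treated as zero. Nothing here
    depends on any other output sample, which is what makes the work
    easy to spread across many parallel GPU threads.
    """
    acc = 0.0
    for k, h in enumerate(taps):
        i = n - k
        if 0 <= i < len(signal):
            acc += h * signal[i]
    return acc


def gather_convolve(signal, taps):
    """Emulate the parallel launch with a plain loop over output indices."""
    if not signal or not taps:
        return []
    n_out = len(signal) + len(taps) - 1
    return [output_sample(signal, taps, n) for n in range(n_out)]


result = gather_convolve([1.0, 2.0, 3.0], [1.0, 1.0])
```

With a fixed filter length, the work per output sample is constant, so on sufficiently parallel hardware the total running time can stay roughly flat as the signal grows.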
18
Results (1): Comparison
Running time comparison between software-based and GPU-based convolution
Input signal → “sine-wave” whose size varied from 5,000 to 60,000 samples in increments of 5,000
HRTF → obtained from the CIPIC HRTF dataset and consisted of 200 samples
All tests were performed on a Dell XPS 720 high-end gaming PC → Intel Core 2 6700 (2.66 GHz) with an NVIDIA GeForce 8800 GTX graphics card
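The software-based side of such a timing comparison could be set up roughly as follows. This is a sketch, not the authors' test harness: the sine frequency, sample rate, and placeholder filter are assumptions (the slides give only the signal sizes and the 200-sample filter length), and the filter here is a synthetic decaying sequence, not CIPIC data.

```python
import math
import time

SIGNAL_SIZES = list(range(5_000, 60_001, 5_000))  # sizes listed on the slide
N_TAPS = 200                                      # filter length listed on the slide


def sine_wave(n_samples, freq_hz=440.0, sample_rate=44_100):
    """Generate a sine wave; frequency and sample rate are assumed values."""
    return [math.sin(2.0 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


def placeholder_filter(n_taps):
    """Synthetic decaying filter used as a stand-in; NOT CIPIC HRTF data."""
    return [math.exp(-k / 20.0) for k in range(n_taps)]


def convolve(signal, taps):
    """Direct-form software convolution (the CPU baseline)."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for k, h in enumerate(taps):
            out[i + k] += s * h
    return out


def time_convolution(n_samples, taps):
    """Return the wall-clock seconds taken to filter one sine wave."""
    signal = sine_wave(n_samples)
    start = time.perf_counter()
    convolve(signal, taps)
    return time.perf_counter() - start


def run_benchmark(sizes=SIGNAL_SIZES, n_taps=N_TAPS):
    """Map each signal size to its measured convolution time in seconds."""
    taps = placeholder_filter(n_taps)
    return {n: time_convolution(n, taps) for n in sizes}
```

Calling `run_benchmark()` with the defaults sweeps all twelve signal sizes; in pure Python this is slow, so a shorter list such as `run_benchmark(sizes=[5_000])` is a quicker first check. Absolute timings from an interpreted baseline like this are not comparable to the figures reported in the slides.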
19
Results (2): Comparison (cont.)
Graphical summary
20
Results (3): Graphical Comparison
Higher-order bytes
Visually, there appears to be no difference between software-based and GPU-based convolution
21
Results (4): Graphical Comparison
Higher-order bytes
Visually, it is evident that artifacts (noise) are introduced to the lower-order bytes of the GPU-based convolution output
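A byte-level comparison of this kind can also be made numerically rather than visually. The sketch below assumes signed 16-bit samples, which the slides do not state, and uses made-up sample values; the function names are hypothetical. It counts how many samples differ in the higher-order byte and how many differ in the lower-order byte.

```python
def split_bytes(sample):
    """Split a signed 16-bit sample into (high_byte, low_byte), each 0..255.

    Assumption: samples are 16-bit two's-complement integers.
    """
    unsigned = sample & 0xFFFF
    return unsigned >> 8, unsigned & 0xFF


def byte_mismatches(reference, candidate):
    """Count samples whose high byte differs and whose low byte differs."""
    if len(reference) != len(candidate):
        raise ValueError("signals must have the same length")
    high = 0
    low = 0
    for r, c in zip(reference, candidate):
        r_high, r_low = split_bytes(r)
        c_high, c_low = split_bytes(c)
        high += int(r_high != c_high)
        low += int(r_low != c_low)
    return high, low


# Made-up values for illustration only (not results from the slides).
software_out = [1000, -2000, 3000, -4000]
gpu_out = [1001, -2003, 3000, -4002]

high_diffs, low_diffs = byte_mismatches(software_out, gpu_out)
```

In this toy case the two outputs agree in every higher-order byte and differ in three lower-order bytes. Note that a small error near a byte boundary can carry into the higher-order byte, so a nonzero high-byte count does not necessarily indicate a large error.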
22
Conclusions (1): Summary
Development of a GPU-based convolution method using the OpenGL Shading Language
Real-time performance → constant running time of approximately 4 ms with a filter with 200 coefficients
Can be executed on general graphics cards that support programmable GPUs
Convolution is vital to the generation of spatial (3D) sound
This work demonstrates that real-time spatial sound is now possible!
23
Conclusions (2): Future Work
Despite the real-time performance, the method does introduce artifacts (noise) to the resulting filtered signal
Only the lower-order bytes are affected
Hearing is a perceptual process
Will these artifacts have any perceptual consequences?
User tests must be conducted to examine what (if any) effect these artifacts have on the listener