Begault 2002 Challenges for Realistic Room Simulation€¦ · Challenges and solutions for...

Post on 09-Oct-2020

3 views 0 download

Transcript of Begault 2002 Challenges for Realistic Room Simulation€¦ · Challenges and solutions for...

Challenges and solutions for realistic room simulation

ASA June, 2002

Durand R. BegaultHuman Factors Research and Technology Division

NASA Ames Research Center

Moffett Field, California

Background

• Perception of the acoustical environmental context • Basic technique of auralization (virtual room synthesis)

Challenges for improved simulation

• Determining acceptable data reduction in measurement and synthesis (echo thresholds)

•Simulation of low frequency energy, coupled spaces

• Accounting for localization error due to reproduction method

• Accurate simulation of source directivity

Spatial hearing fundamentally involves perception of the location of a sound source at a point in space (azimuth, elevation, distance).

Distance

Elevation

Azimuth

Listener

Spatial (and non-spatial) hearing simultaneously reveals information about the environmental context (EC) of a sound source.

• Reverberance, echoes, surface characteristics reveal dimension, extent of space via cognitive associations

DistanceElevation

Azimuth

Environmental Context

Listener

Spatial (and non-spatial) hearing simultaneously reveals information about the environmental context (EC) of a sound source.

• Reverberance, echoes, surface characteristics reveal dimension, extent of space via cognitive associations

• Interactive cues: EC’s effect on speech effort, intelligibility

DistanceElevation

Azimuth

Environmental Context

Listener

Spatial (and non-spatial) hearing simultaneously reveals information about the environmental context (EC) of a sound source.

• Reverberance, echoes, surface characteristics reveal dimension, extent of space via cognitive associations

• Interactive cues: EC’s effect on speech effort, intelligibility

• Perceived sound source width and distance affected by EC

(“concert hall” musical acoustic factors - intimacy, envelopment)

DistanceElevation

Azimuth

Environmental Context

Listener

Image Size

Spatial (and non-spatial) hearing simultaneously reveals information about the environmental context (EC) of a sound source.

• Reverberance, echoes, surface characteristics reveal dimension, extent of space via cognitive associations

• Interactive cues: EC’s effect on speech effort, intelligibility

• Perceived sound source width and distance affected by EC

• Simultaneous sources: ambient sound & background noises (inside & outside) contribute toEC perception

DistanceElevation

Azimuth

Environmental Context

Listener

Auralization: virtual simulation of sound sources within an environmental context, using loudspeakers or headphones.

Useful applications for auralization technology include:

• Psychoacoustic investigations of spatial hearing in realistic acoustical environments (e.g., echo thresholds)

• Acoustical engineering(listening to a virtual room as a part of the design phase)

• Improved virtual environmentsimulation and 3-D sound reproduction

• Increased realism for entertainment (music reproduction)

• Evaluation of sound quality for industrial applications

Basic technique for auralization• Modeling (pre-synthesis) phase:

either measure or predictroom impulse response(data reduction is usuallynecessary)

Basic technique for auralization• Modeling (pre-synthesis) phase:

either measure or predictroom impulse response(data reduction is usuallynecessary)

• Translate impulse responseto listener-referenced virtual source locations

• Apply virtual acoustic techniques (HRTF filtering) to derive the binaural impulse response

Basic technique for auralization• Modeling/pre-synthesis phase:

either measure or predictroom impulse response(data reduction is usuallynecessary)

• Translate impulse responseto listener-referenced virtual source locations

• Apply virtual acoustic techniques (HRTF filtering) to derive the binaural impulse response

Basic technique for auralization• Modeling/pre-synthesis phase:

either measure or predictroom impulse response(data reduction is usuallynecessary)

• Translate impulse responseto listener-referenced virtual source locations

• Synthesis phase:Convolve “neutral” (anechoic)signals with the binaural impulse response of the room

(data reduction is usuallynecessary for rendering)

• Recent study indicates that echo thresholds in real and virtualenvironments are similar

• Engineering rule of thumb: early reflections re direct sound should be inaudible < -21 dB @ 3 ms and < -30 dB @ 15-30 ms (add 5 dB for speech stimuli).

Although thresholds for reverberation are relatively low, background noise(e.g., NC 35) can mask a significant portion of reverberant energy(particularly the reverberant decay).

One-Third Octave Band Noise Criteria (NC)

0

10

20

30

40

50

60

70

80

One-Third Octave Band Center Frequency, Hz

Normal Speech

relative threshold forreverberation

NC 65

NC 60

NC 55

NC 50

NC 45

NC 40

NC 35

NC 30

NC 25

NC 20

NC 15Approximate Threshold of Hearing for Continuous Noise

NC 10NC 5

31.5 63 125 250 500 1k 2k 4k 8k-40

-35

-30

-25

-20

-15

-10

250

500

1000

2000 fb

w

250

500

1000

2000 fb

w

250

500

1000

2000 fb

w

Small Medium Large

Octave-Band Center Frequency (fbw=full bandwidth)

Rev

erbe

ratio

n th

resh

old

(spe

ech)

re 6

0 dB

SP

L

• Creating virtual spatial sound environments:playback

Historically, the use of acoustical absorption to ‘neutralize’acoustical characteristics of rooms began ca. 1930s

Sound stage & control room Acoustical wall/ceiling panels

Recording for film

The realism of environmental context simulations has increased in tandem with the “neutralization” of rooms via absorption

• Echo chambers• Reverberation plates• Spring reverberators••• Real-time convolution

for head-trackedauralization

Reverberation using echo chamber, NBC, 1930s

Auditorium simulators using actual sound sources

University of Goettigen 1965.

Technical University of Denmark,Lyngby, 1992.

0

5

10

15

20

25

30

anechoic early reflections full auralization

reverberation treatment

Uns

igne

d az

imut

h er

ror

(deg

rees

)

Localization error for headphone stimuli (azimuth)

AnechoicSpeech:Individualdifferences

Mean values for different reverberation conditions

Head-tracked systems increase realism of 3D audio simulations in general (minimization of reversals)

Auditory localization can be influenced or biased bycognitive references and visual capture

• Creating virtual spatial sound environments:measurement

2. Make a binaural recording using a dummy head

Spatial room impulseresponses can be obtainedfrom a existing room by…

1. Calculating the responsefrom a room model

(ray tracing, image modeling)

2. Recording the binaural impulse response

3. Recording a directional impulse response for post-processing of directional information

Challenges:-adequate yet reasonablesampling of sources & receivers

-directivity data for sound sourceslimited; sound source movementusually unaccounted for

-modeling low frequency behavior

Calculate the impulse responsefrom a room model

(using ray tracing, image modeling)

Ray tracing, while imperfect, can be an efficient means for finding the location of disturbing echoes

directional horn

Challenges:-localization error due to non-individualized HRTF-deriving spatial information froma 2-channel source

-multiple measurements for each receiver position required

Measure the binauralroom impulse response

via a dummy head recording

Directional mic room impulse response.

Essert (1996) used a B-format output from a SoundFieldMKV microphone to obtain one omnidirectional (W) and three dipole IRs (oriented left-right X, back-front Y, and down-up Z, respectively)

The omnidirectional response reveals the arrival time of significant early reflections;

Cross-correlations between the monopole and dipole responses indicate reflection direction of arrival.

Individualized HRTFs can be applied during the synthesis phase to significant early reflections

Intensity measurements have also been investigated in the literature.

• Simulation of coupled spaces, low frequency energy

Reverberation can cause spatial movement of reverberant sound within coupled spaces

-examples: theater; organ music in a gothic cathedral

-difficult to model

Undamped stagehouseLong reverberant decay

Sound source

Absorptive seating areaShort reverberant decay

Measurement of “moving reverberation”:

Grace Cathedral, San Francisco

-starter pistol sound source at 5 locations

-7 measurement microphones simultaneously recorded with synch pulse

-Superior to binaural measurement since location/time can be tracked

310’

0 . 5 0 . 5 5 0 . 6 0 . 6 5 0 . 7 0 . 7 5 0 . 8 0 . 8 5 0 . 9 0 . 9 5 1- 8 0

- 7 5

- 7 0

- 6 5

- 6 0

- 5 5

- 5 0

- 4 5

- 4 06 3 H z O c t a v e b a n d 2 0 m s e n v e l o p e

T i m e ( s )

Rel

ativ

e le

vel (

dB)

u p p e r a p s e l o w e r a p s e u p p e r a l t a r l o w e r a l t a r u p p e r n a v e l o w e r n a v e v e s t i b u l e Upper nave Upper apse

0 0 . 0 5 0 . 1 0 . 1 5 0 . 2 0 . 2 5 0 . 3 0 . 3 5 0 . 4 0 . 4 5 0 . 5- 8 0

- 7 5

- 7 0

- 6 5

- 6 0

- 5 5

- 5 0

- 4 5

- 4 06 3 H z O c t a v e b a n d 2 0 m s e n v e l o p e

T i m e ( s )

Rel

ativ

e le

vel (

dB)

u p p e r a p s e l o w e r a p s e u p p e r a l t a r l o w e r a l t a r u p p e r n a v e l o w e r n a v e v e s t i b u l e

Upper-lower altarvestibule

Sound source at altar: 20 ms envelope

Overlap of frequency sensitivities for haptic and acoustic stimuli

“Haptic” includes both kinesthetic (muscle) and touch (cutaneous) sensations.– Touch: 0 – 1 kHz– Kinesthesia: 0 – 30 Hz– Audition: .02- 20 kHz: overlap from .02 –1 kHz

Very low-frequency vibration can also be sensed via vestibular, visceral, and visual sensory systems

Transmission paths from a “felt-heard” source to a receiver

-Airborne pressure waves (sonic & infrasonic)-transfer of acoustic wave to mechanical vibration possible-simultaneous auditory-haptic stimuli possible

(touch; bone conduction; visceral)-Ground / structure-borne vibration

-”Full-body” vibrationfrom 8–80 Hz (displacement threshold 1.5 x 10-6 m)

-Haptic sensation via kinesthetic, touch sensors-Transfer of vibration into acoustic excitation

-Perception of haptic and acoustical stimuli influenced by presence of coordinated visual stimuli

Vibration source(internal, external)

Ground, Structure response

hearing feeling seeing

Airbornesound

Response: qualitative assessment

Performance metric

Walls,Windows,

objects

Chairs,Tables,

floor

Walls,Windows,

plants

3-D audiodisplay

Head-mounted

visualdisplay

ExpectationInter-modal coordinationIdentificationExperience-adaptation

Why simulate- and manipulate- both haptic and acoustical phenomena simultaneously?

-Improved “realism” ,“immersion” in virtual enviroments(experiments; entertainment; training)

Simulations of musical performance (as player or listener) in particular seem incomplete without tactile cues

-Product quality evaluation (e.g., automotive industry)

-Performance enhancement for tasks in VE(e.g., transportation simulators)

Thresholds for vibrotactile perception, ball of thumb-peak at 250 Hz corresponds to acceleration of 0.24 m/s2

Workplace sound and vibration at a desk

Summary

• Challenges for improved simulation include determining acceptable data reduction in measurement and synthesis (echo thresholds)

• Data reduction based on echo thresholds and reverberant masking can relax demands onauralization synthesis computation (important for real-time rendering that includes head tracking).

•Simulation of low frequency effects, modal prediction, vibration, and source directivity require further research efforts