E6820 SAPR - Dan Ellis L08 - Spatial sound 2006-03-23 - 1
EE E6820: Speech & Audio Processing & Recognition
Lecture 8:
Spatial sound
1 Spatial acoustics
2 Binaural perception
3 Synthesizing spatial audio
4 Extracting spatial sounds
Dan Ellis <[email protected]> http://www.ee.columbia.edu/~dpwe/e6820/
Columbia University Dept. of Electrical Engineering, Spring 2006
Spatial acoustics
• Received sound = source + channel
- so far, only considered ideal source waveform
• Sound carries information on its spatial origin
- e.g. “ripples in the lake”
- evolutionary significance
• The basis of scene analysis?
- yes and no - try blocking an ear
Ripples in the lake
• Effect of relative position on sound
- delay = $\Delta r / c$
- energy decay ~ $1/r^2$
- absorption ~ $G(f)^r$
- direct energy plus reflections
• Give cues for recovering source position
• Describe wavefront by its normal
[Figure: source radiating wavefronts (travelling at c m/s) toward a listener; energy ∝ $1/r^2$]
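The delay and spreading-loss relations above can be sketched numerically. This is a minimal illustration only: the speed of sound (343 m/s) and the function names are assumptions, and absorption and reflections are ignored.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def propagation_delay(r_m, c=SPEED_OF_SOUND):
    """Time in seconds for a wavefront to travel r_m metres (delay = r / c)."""
    return r_m / c


def relative_energy_db(r_m, r_ref_m=1.0):
    """Level change in dB relative to distance r_ref_m, from 1/r^2 spreading alone."""
    return 10.0 * math.log10((r_ref_m / r_m) ** 2)


delay_10m = propagation_delay(10.0)          # about 29 ms
drop_per_doubling = relative_energy_db(2.0)  # about -6 dB per doubling of distance
```

Both quantities grow with distance, so each is a possible cue to source range.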
Recovering spatial information
• Source direction as wavefront normal
- moving plane found from timing at 3 points
- need to solve correspondence
• Space: need 3 parameters
- e.g. 2 angles and range
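[Figure: wavefront crossing sensors A, B, C, with pressure-vs-time traces at each; path difference $\Delta s = c\,\Delta t = AB\cos\theta$]
[Figure: source position given by range $r$, azimuth $\theta$, elevation $\phi$]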
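As a sketch of the timing relation for a single sensor pair, the arrival angle relative to the A-B axis follows from $\cos\theta = c\,\Delta t / AB$. The function name and the 343 m/s speed of sound are assumptions; one pair only fixes the angle to that axis, which is why the slide asks for timing at 3 points.

```python
import math


def arrival_angle_deg(delta_t_s, spacing_m, c=343.0):
    """Angle between the wavefront normal and the sensor axis, from one time difference."""
    cos_theta = c * delta_t_s / spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1]; clip before acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


spacing = 0.2                                               # metres between A and B
dt = spacing * math.cos(math.radians(60.0)) / 343.0         # synthetic time difference
angle = arrival_angle_deg(dt, spacing)                      # recovers 60 degrees
```

A zero time difference gives 90 degrees, i.e. a source broadside to the pair.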
The effect of the environment
• Reflection causes additional wavefronts
- + scattering, absorption
- many paths → many echoes
• Reverberant effect
- causal ‘smearing’ of signal energy
[Figure: sound paths showing reflection, diffraction & shadowing; two spectrograms, time / sec (0 to 1.5) vs. freq / Hz (0 to 8000)]
Reverberation impulse response
• Exponential decay of reflections: $h_{room}(t) \sim e^{-t/T}$
• Frequency-dependent
- greater absorption at high frequencies → faster decay
• Size-dependent
- larger rooms → longer delays → slower decay
• Sabine’s equation: $RT_{60} = \dfrac{0.049\,V}{S\bar{\alpha}}$
• Time constant varies with size, absorption
[Figure: spectrogram of impulse response "hlwy16" (128pt window), time / s (0 to 0.7) vs. freq / Hz (0 to 8000), level scale -70 to -10]
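A small numerical sketch of Sabine's equation follows. The 0.049 constant corresponds to imperial units (V in cubic feet, S in square feet); the metric form uses 0.161. Treating $h_{room}(t)$ as an amplitude envelope, so that a 60 dB drop is a factor of 1000, is an assumption made here to relate $RT_{60}$ to $T$; the room dimensions are made up.

```python
import math


def sabine_rt60(volume_ft3, surface_ft2, mean_absorption):
    """RT60 in seconds from Sabine's equation, RT60 = 0.049 V / (S * alpha)."""
    return 0.049 * volume_ft3 / (surface_ft2 * mean_absorption)


def amplitude_time_constant(rt60_s):
    """T such that exp(-RT60 / T) = 10**-3, i.e. a 60 dB amplitude decay."""
    return rt60_s / math.log(1000.0)


rt60 = sabine_rt60(10000.0, 3000.0, 0.2)   # hypothetical room: about 0.82 s
T = amplitude_time_constant(rt60)          # about 0.12 s
```

Doubling the volume at fixed surface area and absorption doubles the decay time, matching the "larger rooms → slower decay" bullet.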
Outline
1 Spatial acoustics
2 Binaural perception
- The sound at the two ears
- Available cues
- Perceptual phenomena
3 Synthesizing spatial audio
4 Extracting spatial sounds
Binaural perception
• What is the information in the 2 ear signals?
- the sound of the source(s) (L+R)
- the position of the source(s) (L-R)
• Example waveforms (ShATR database)
[Figure: source to one side of the head (ears L, R), showing the path length difference and head shadow (high freq)]
[Figure: shatr78m3 waveform, Left and Right channels, amplitude -0.1 to 0.1 vs. time / s (2.2 to 2.235)]
Main cues to spatial hearing
• Interaural time difference (ITD)
- from different path lengths around head
- dominates in low frequency (< 1.5 kHz)
- max ~ 750 μs → ambiguous for freqs > 600 Hz
• Interaural intensity difference (IID)
- from head shadowing of far ear
- negligible for LF; increases with frequency
• Spectral detail (from pinna reflections) useful for elevation & range
• Direct-to-reverberant ratio useful for range
[Figure: spectrogram of claps 33 and 34 from 627M:nf90, time / s (0 to 1.8) vs. freq / kHz (0 to 20)]
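One way to read the ITD figures above is that interaural phase becomes ambiguous once the delay reaches half a period of the tone. That reading is an assumption, as are the function names and the 343 m/s speed of sound; it gives a limit of the same order as the ~600 Hz quoted.

```python
def path_difference_m(itd_s, c=343.0):
    """Extra path length around the head implied by an interaural time difference."""
    return itd_s * c


def phase_ambiguity_freq_hz(itd_s):
    """Frequency at which the ITD equals half a period (interaural phase = pi)."""
    return 1.0 / (2.0 * itd_s)


max_itd = 750e-6                              # seconds, from the slide
extra_path = path_difference_m(max_itd)       # about 0.26 m
f_limit = phase_ambiguity_freq_hz(max_itd)    # about 667 Hz
```

Above this frequency a pure tone's phase difference no longer says which ear leads, so other cues (IID, onsets) have to take over.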
Head-Related Transfer Fns (HRTFs)
• Capture source coupling as impulse responses $\{\, l_{\theta,\phi,R}(t),\ r_{\theta,\phi,R}(t) \,\}$
• Collection: (http://interface.cipic.ucdavis.edu/)
• Highly individual!
[Figure: HRIR_021 Left and Right @ 0 el, shown against Azimuth / deg (-45 to 45) and time / ms (0 to 1.5); HRIR_021 Left and Right waveforms @ 0 el 0 az, time / ms (0 to 1.5)]
Cone of confusion
• Interaural timing cue dominates (below 1kHz)
- from differing path lengths to two ears
• But: only resolves to a cone
- Up/down? Front/back?
[Figure: cone of confusion (approx equal ITD) at azimuth $\theta$]
Further cues
• Pinna causes elevation-dependent coloration
• Monaural perception
- separate coloration from source spectrum?
• Head motion
- synchronized spectral changes
- also for ITD (front/back) etc.
Combining multiple cues
• Both ITD and ILD influence azimuth; what happens when they disagree?
- trading at around 0.1 ms / dB
[Figure: l(t) and r(t) pulse pairs in three cases. Identical signals to both ears → image is centered; delaying right channel (1 ms) moves image to left; attenuating left channel returns image to center]
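The trading ratio can be turned into a rough calculation of how much attenuation offsets a given delay. This is only a sketch: it assumes the ~0.1 ms / dB figure applies linearly, which real listeners follow only approximately, and the function names are hypothetical.

```python
TRADING_MS_PER_DB = 0.1  # approximate time-intensity trading ratio from the slide


def compensating_attenuation_db(delay_ms, trading=TRADING_MS_PER_DB):
    """Attenuation of the leading channel (dB) that would re-center the image."""
    return delay_ms / trading


def db_to_gain(attenuation_db):
    """Linear amplitude gain for a given attenuation in dB."""
    return 10.0 ** (-attenuation_db / 20.0)


atten_db = compensating_attenuation_db(1.0)   # 1 ms delay -> about 10 dB
left_gain = db_to_gain(atten_db)              # about 0.32
```

For the 1 ms delay in the figure, this suggests attenuating the left channel by roughly 10 dB to bring the image back to center.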
Binaural position estimation
• Imperfect results: (Arruda, Kistler & Wightman 1992)
- listening to ‘wrong’ HRTFs → errors
- front/back reversals stay on cone of confusion
[Figure: Judged Azimuth (Deg) vs. Target Azimuth (Deg), both axes -180 to 180]
The Precedence Effect
• Reflections give misleading spatial cues
• But: Spatial impression based on 1st wavefront, then ‘switches off’ for ~50 ms
- .. even if ‘reflections’ are louder
- .. leads to impression of room
[Figure: direct l(t) and reflected r(t) on a time axis, separated by R/c for path difference R]
Binaural Masking Release
• Adding noise to reveal target
- why does this make sense?
• Binaural Masking Level Difference up to 12 dB
- greatest for noise in phase, tone anti-phase
[Figure: two cases. Tone + noise to one ear: tone is masked; identical noise added to other ear: tone is audible]
Outline
1 Spatial acoustics
2 Binaural perception
3 Synthesizing spatial audio
- Position
- Environment
4 Extracting spatial sounds
Synthesizing spatial audio
• Goal: recreate realistic soundfield
- hi-fi experience
- synthetic environments (VR)
• Constraints
- resources
- information (individual HRTFs)
- delivery mechanism (headphones)
• Source material types
- live recordings (actual soundfields)
- synthetic (studio mixing, virtual environments)
Classic stereo
• ‘Intensity panning’: no timing modifications, just vary level ±20 dB
- works as long as listener is equidistant (ILD)
• Surround sound: extra channels in center, sides, ...
- same basic effect - pan between pairs
[Figure: L and R loudspeakers]
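Intensity panning can be sketched as a pair of gains set from a level difference in dB. The constant-total-power normalization and the function names below are assumptions, not something the slide specifies; only the ±20 dB range comes from the text.

```python
import math


def pan_gains(level_diff_db):
    """(g_left, g_right) for a right-minus-left level difference, clipped to +/-20 dB."""
    d = max(-20.0, min(20.0, level_diff_db))
    ratio = 10.0 ** (d / 20.0)                      # g_right / g_left
    g_left = 1.0 / math.sqrt(1.0 + ratio * ratio)   # keeps g_left^2 + g_right^2 = 1
    return g_left, ratio * g_left


def pan(mono, level_diff_db):
    """Apply the same mono signal to both channels with different levels, no delays."""
    g_left, g_right = pan_gains(level_diff_db)
    return [g_left * x for x in mono], [g_right * x for x in mono]


left, right = pan([1.0, -0.5, 0.25], 6.0)   # image pulled toward the right speaker
```

Because no timing is changed, the image position depends on the listener hearing both speakers at equal distance, as the slide notes.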
Simulating reverberation
• Can characterize reverb by impulse response
- spatial cues are important - record in stereo
- IRs of ~ 1 sec → very long convolution
• Image model: reflections as duplicate sources
• ‘Early echoes’ in room impulse response:
• Actual reflection may be $h_{reflect}(t)$, not $\delta(t)$
[Figure: source and listener in a room, with virtual (image) sources behind the walls and a reflected path; $h_{room}(t)$ vs. t showing the direct path followed by early echoes]
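A minimal sketch of the image model for first-order reflections in a 2-D rectangular room follows. The room geometry, the frequency-independent reflection gain of 0.8 (i.e. an ideal $\delta(t)$ reflection scaled by a constant), the 1/r pressure law and all names are illustrative assumptions.

```python
import math


def first_order_images(src, room):
    """Mirror a 2-D source position across each wall of a width x depth room."""
    x, y = src
    width, depth = room
    return [(-x, y), (2 * width - x, y), (x, -y), (x, 2 * depth - y)]


def early_echoes(src, listener, room, c=343.0, reflection_gain=0.8):
    """[(delay_s, amplitude)] for the direct path and 4 first-order echoes, by delay."""
    paths = [(src, 1.0)] + [(img, reflection_gain)
                            for img in first_order_images(src, room)]
    taps = []
    for pos, gain in paths:
        r = math.dist(pos, listener)
        taps.append((r / c, gain / r))   # pressure ~ 1/r, so energy ~ 1/r^2
    return sorted(taps)


taps = early_echoes(src=(1.0, 1.0), listener=(4.0, 3.0), room=(5.0, 4.0))
```

Each tap is one spike of the early part of $h_{room}(t)$; higher-order reflections would come from mirroring the images again.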
Artificial reverberation
• Reproduce perceptually salient aspects
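- early echo pattern (→ room size impression)
- overall decay tail (→ wall materials...)
- interaural coherence (→ spaciousness)
• Nested allpass filters (Gardner ’92)
[Figure: allpass section from x[n] to y[n] built from a delay $z^{-k}$ with gains $g$ and $-g$, drawn as a block labeled "g,k"]
Allpass: $H(z) = \dfrac{z^{-k} - g}{1 - g\,z^{-k}}$
Impulse response $h[n]$: $-g$ at $n = 0$, $1-g^2$ at $n = k$, $g(1-g^2)$ at $n = 2k$, $g^2(1-g^2)$ at $n = 3k$, ...
[Figure: Nested+Cascade Allpass Synthetic Reverb, with allpass sections AP0, AP1, AP2 (block labels 20,0.3; 30,0.7; 50,0.5), output taps a0, a1, a2, and a feedback path through LPF and gain g]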
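The single allpass section can be checked directly from its difference equation, $y[n] = -g\,x[n] + x[n-k] + g\,y[n-k]$, which follows from $H(z) = (z^{-k} - g)/(1 - g z^{-k})$. This is a plain-Python sketch of one section only, not the nested structure; the values of k and g are arbitrary.

```python
def allpass(x, k, g):
    """Allpass section: y[n] = -g*x[n] + x[n-k] + g*y[n-k]."""
    y = []
    for n, xn in enumerate(x):
        x_del = x[n - k] if n >= k else 0.0   # delayed input
        y_del = y[n - k] if n >= k else 0.0   # delayed output (feedback)
        y.append(-g * xn + x_del + g * y_del)
    return y


k, g = 4, 0.5
impulse = [1.0] + [0.0] * (4 * k)
h = allpass(impulse, k, g)
# h[0] = -g, h[k] = 1 - g^2, h[2k] = g(1 - g^2), h[3k] = g^2 (1 - g^2)
```

The response is a decaying train of echoes every k samples, which is why cascading and nesting such sections builds up echo density without coloring the long-term spectrum.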
Synthetic binaural audio
• Source convolved with {L,R} HRTFs gives precise positioning
- ...for headphone presentation
- can combine multiple sources (by adding)
• Where to get HRTFs?
- measured set, but: specific to individual, discrete
- interpolate by linear crossfade, PCA basis set
- or: parametric model - delay, shadow, pinna
• Head motion cues?
- head tracking + fast updates
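[Figure: parametric binaural model (after Brown & Duda '97). The source feeds left and right chains of Delay, Shadow and Pinna stages, and a Room echo path is added into each ear signal]
Delay: $z^{-t_{DL}(\theta)}$, $z^{-t_{DR}(\theta)}$
Shadow: $\dfrac{1 - b_L(\theta)\,z^{-1}}{1 - a\,z^{-1}}$, $\dfrac{1 - b_R(\theta)\,z^{-1}}{1 - a\,z^{-1}}$
Pinna: $\sum_k p_{kL}(\theta,\phi)\,z^{-t_{PkL}(\theta,\phi)}$, $\sum_k p_{kR}(\theta,\phi)\,z^{-t_{PkR}(\theta,\phi)}$
Room echo: $K_E\,z^{-t_E}$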
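The basic rendering step, convolving each source with a left and a right impulse response and summing across sources, can be sketched as below. The HRIRs here are made-up toy responses (a pure delay and attenuation), not measured data, and the function names are hypothetical.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def render_binaural(sources):
    """sources: list of (mono_signal, hrir_left, hrir_right). Returns (left, right)."""
    n = max(len(s) + max(len(hl), len(hr)) - 1 for s, hl, hr in sources)
    left, right = [0.0] * n, [0.0] * n
    for s, hl, hr in sources:
        for ear, h in ((left, hl), (right, hr)):
            for i, v in enumerate(convolve(s, h)):
                ear[i] += v          # multiple sources combine by adding
    return left, right


# Toy HRIRs for a source on the left: the right ear hears it later and quieter.
hrir_l = [1.0, 0.0, 0.0]
hrir_r = [0.0, 0.0, 0.5]
left, right = render_binaural([([1.0, -1.0], hrir_l, hrir_r)])
```

With measured HRIRs the same loop applies; only the impulse responses change with source direction and listener.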
Transaural sound
• Binaural signals without headphones?
• Can cross-cancel wrap-around signals
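- speakers $S_{L,R}$, ears $E_{L,R}$, binaural signals $B_{L,R}$
$S_L = H_{LL}^{-1}\,(B_L - H_{RL}\,S_R)$
$S_R = H_{RR}^{-1}\,(B_R - H_{LR}\,S_L)$
• Narrow ‘sweet spot’
- head motion?
[Figure: binaural signals $B_L$, $B_R$ pass through a block M to drive speakers $S_L$, $S_R$; paths $H_{LL}$, $H_{LR}$, $H_{RL}$, $H_{RR}$ lead from the speakers to ears $E_L$, $E_R$]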
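The two coupled cancellation equations can be solved together at a single frequency, since the ear signals are $E_L = H_{LL}S_L + H_{RL}S_R$ and $E_R = H_{LR}S_L + H_{RR}S_R$. The sketch below assumes $H_{XY}$ denotes the response from speaker X to ear Y (consistent with the equations above) and uses made-up complex transfer values.

```python
import cmath


def crosstalk_cancel(b_l, b_r, h_ll, h_lr, h_rl, h_rr):
    """Speaker spectra (S_L, S_R) at one frequency so the ears receive (B_L, B_R)."""
    det = h_ll * h_rr - h_rl * h_lr
    s_l = (h_rr * b_l - h_rl * b_r) / det
    s_r = (h_ll * b_r - h_lr * b_l) / det
    return s_l, s_r


def ear_signals(s_l, s_r, h_ll, h_lr, h_rl, h_rr):
    """(E_L, E_R) produced at the ears by speaker spectra (S_L, S_R)."""
    return h_ll * s_l + h_rl * s_r, h_lr * s_l + h_rr * s_r


# Hypothetical values: direct paths 1, cross paths weaker and delayed in phase.
H = dict(h_ll=1.0 + 0j, h_rr=1.0 + 0j,
         h_lr=0.4 * cmath.exp(-0.3j), h_rl=0.4 * cmath.exp(-0.3j))
b_l, b_r = 1.0 + 0j, 0.2j
s_l, s_r = crosstalk_cancel(b_l, b_r, **H)
e_l, e_r = ear_signals(s_l, s_r, **H)
```

The solution depends on the exact path responses, which is one way to see why the sweet spot is narrow: moving the head changes every H.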
Soundfield reconstruction
• Stop thinking about ears: just reconstruct pressure + spatial derivatives
- ears in reconstructed field receive same sounds
• Complex reconstruction setup (ambisonics)
- able to preserve head motion cues?
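[Figure: pressure $p(x,y,z,t)$ with spatial derivatives $\partial p(t)/\partial x$, $\partial p(t)/\partial y$, $\partial p(t)/\partial z$]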
Outline
1 Spatial acoustics
2 Binaural perception
3 Synthesizing spatial audio
4 Extracting spatial sounds
- Microphone arrays
- Modeling binaural processing
Extracting spatial sounds
• Given access to soundfield, can we recover separate components?
- degrees of freedom: >N signals from N sensors is hard
- but: people can do it (somewhat)
• Information-theoretic approach
- use only very general constraints
- rely on precision measurements
• Anthropic approach
- examine human perception
- attempt to use same information
Microphone arrays
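• Signals from multiple microphones can be combined to enhance/cancel certain sources
• ‘Coincident’ mics with diff. directional gains
$\begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} \hat{s}_1 \\ \hat{s}_2 \end{bmatrix} = A^{-1} m$
[Figure: sources s1, s2 reaching mics m1, m2 with gains a11, a12, a21, a22]
• Microphone arrays (endfire)
[Figure: endfire array of mics with delays D between successive elements, summed; directional response in dB (0 to -40) for $\lambda = 4D$, $\lambda = 2D$, $\lambda = D$]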
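The coincident-mic case, $m = A s$ followed by $\hat{s} = A^{-1} m$, can be sketched for two sources and two mics. The gain matrix and signals below are hypothetical, and the mixing is assumed instantaneous (pure gains, no delays) and known exactly.

```python
def invert_2x2(a):
    """Inverse of a 2x2 matrix given as [[a11, a12], [a21, a22]]."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("mixing matrix is singular; sources cannot be separated")
    return [[a22 / det, -a12 / det], [-a21 / det, a11 / det]]


def apply_2x2(a, x1, x2):
    """Apply matrix a to two equal-length signals, sample by sample."""
    return ([a[0][0] * u + a[0][1] * v for u, v in zip(x1, x2)],
            [a[1][0] * u + a[1][1] * v for u, v in zip(x1, x2)])


A = [[1.0, 0.5], [0.3, 1.0]]                        # hypothetical directional gains
s1, s2 = [1.0, 0.0, -1.0], [0.0, 2.0, 2.0]
m1, m2 = apply_2x2(A, s1, s2)                       # mic signals, m = A s
s1_hat, s2_hat = apply_2x2(invert_2x2(A), m1, m2)   # estimates, s_hat = A^-1 m
```

If the two mics have the same directional pattern the rows of A become proportional, the determinant goes to zero, and no linear combination separates the sources.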
Adaptive Beamforming & Independent Component Analysis (ICA)
• Formulate mathematical criteria to optimize
• Beamforming: Drive interference to zero
- cancel energy during nontarget intervals
• ICA: maximize mutual independence of outputs
- from higher-order moments during overlap
• Limited by separation model parameter space
- only NxN?
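[Figure: mic signals m1, m2 combined through weights a11, a12, a21, a22 to give outputs s1, s2; weights adapted along $-\partial\,\mathrm{MutInfo} / \partial a$]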
Binaural models
• Human listeners do better?
- certainly given only 2 channels
• Extract ITD and IID cues?
- cross-correlation finds timing differences
- ‘consume’ counter-moving pulses
- how to achieve IID, trading
- vertical cues...
[Figure: interaural cross-correlation, lag / ms (-6 to 6) vs. center freq / Hz (100 to 3200), with the target azimuth marked]
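Finding a timing difference by cross-correlation can be sketched on a single broadband pair of signals. The figure above does this per frequency band; the version below skips the filterbank, uses a synthetic tone burst, and all names and parameter values are illustrative assumptions.

```python
import math


def best_lag(left, right, max_lag):
    """Lag (samples) maximising sum_n left[n] * right[n + lag].

    A positive result means the right channel lags the left.
    """
    def xcorr(lag):
        return sum(left[n] * right[n + lag]
                   for n in range(len(left))
                   if 0 <= n + lag < len(right))
    return max(range(-max_lag, max_lag + 1), key=xcorr)


fs = 16000
# 300 Hz tone burst with a Gaussian envelope, so the signal is ~0 at both ends.
left = [math.sin(2 * math.pi * 300 * n / fs) * math.exp(-((n - 200) / 60.0) ** 2)
        for n in range(400)]
delay = 5
right = [0.0] * delay + left[:-delay]      # right ear hears the same burst later
lag = best_lag(left, right, max_lag=12)
itd_s = lag / fs                           # estimated ITD in seconds
```

The lag search range is kept below half the tone period here; with wider ranges or higher frequencies the correlation has several near-equal peaks, which is the ITD ambiguity noted earlier.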
Nonlinear filtering
• How to separate sounds based on direction?
- estimate direction locally
- choose target direction
- remove energy from other directions
• E.g. Kollmeier, Peissig & Hohmann ’93
- IID from $|L_\omega| / |R_\omega|$; ITD (IPD) from $\arg\{L_\omega R_\omega^*\}$
- match to IID/IPD template for desired direction
- also reverberation?
[Figure: block diagram. Inputs l and r each pass through FFT analysis to give $L_\omega$, $R_\omega$ on a time-frequency grid $X_\omega(mH, 2\pi k/NT)$; the squared moduli $|L_\omega|^2$, $|R_\omega|^2$ and the cross-correlation $L_\omega R_\omega^*$ are each smoothed by $(1-a)/(1-a z^{-1})$ to give $S_{LL}$, $S_{RR}$, $S_{LR}$; a gain factor calc produces g, which is applied before OLA-FFT synthesis of l' and r']
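The cue-extraction step, IID from $|L_\omega|/|R_\omega|$ and IPD from $\arg\{L_\omega R_\omega^*\}$, can be sketched for one frame and one frequency bin. This covers only the cue measurement, not the smoothing, template matching or gain stages; the naive DFT, the test signal and the names are assumptions for illustration.

```python
import cmath
import math


def dft_bin(x, k):
    """k-th DFT coefficient of the frame x (naive O(N) sum for a single bin)."""
    n_fft = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_fft)
               for n, v in enumerate(x))


def iid_ipd(left_frame, right_frame, k):
    """IID in dB from |L|/|R| and IPD in radians from arg{L R*} at bin k."""
    l_k, r_k = dft_bin(left_frame, k), dft_bin(right_frame, k)
    iid_db = 20.0 * math.log10(abs(l_k) / abs(r_k))
    ipd = cmath.phase(l_k * r_k.conjugate())
    return iid_db, ipd


N, k, d = 64, 4, 2
left = [math.cos(2 * math.pi * k * n / N) for n in range(N)]
right = [0.5 * left[(n - d) % N] for n in range(N)]   # half amplitude, 2 samples later
iid_db, ipd = iid_ipd(left, right, k)
```

Here the right channel is 6 dB down and delayed, so the IID is about 6 dB and the IPD is $2\pi k d / N = \pi/4$; a direction-selective filter would keep bins whose cues match the target direction and attenuate the rest.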
Summary
• Spatial sound
- sampling at more than one point gives information on origin direction
• Binaural perception
- time & intensity cues used between/within ears
• Sound rendering
- conventional stereo
- HRTF-based
• Spatial analysis
- optimal linear techniques
- elusive auditory models
References
B.C.J. Moore, An introduction to the psychology of hearing (4th ed.), Academic, 1997.
J. Blauert, Spatial Hearing (revised ed.), MIT Press, 1996.
R.O. Duda, Sound Localization Research, http://www.engr.sjsu.edu/~duda/Duda.Research.frameset.html