E6820 SAPR - Dan Ellis    L08 - Spatial sound    2006-03-23

EE E6820: Speech & Audio Processing & Recognition
Lecture 8: Spatial sound

1. Spatial acoustics
2. Binaural perception
3. Synthesizing spatial audio
4. Extracting spatial sounds

Dan Ellis <dpw bia.edu>
http://www.ee.columbia.edu/~dpwe/e6820/
Columbia University Dept. of Electrical Engineering
Spring 2006

1. Spatial acoustics
• Received sound = source + channel
  - so far, only considered ideal source waveform
• Sound carries information on its spatial origin
  - e.g. “ripples in the lake”
  - evolutionary significance
• The basis of scene analysis?
  - yes and no - try blocking an ear

Ripples in the lake
• Effect of relative position on sound
  - delay = $\Delta r / c$
  - energy decay ~ $1/r^2$
  - absorption ~ $G(f)^r$
  - direct energy plus reflections
• Give cues for recovering source position
• Describe wavefront by its normal
[Figure: source and listener; wavefront travelling at c m/s; energy ∝ $1/r^2$]
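
As a rough numerical illustration of the delay and energy-decay bullets, the sketch below evaluates the delay Δr/c and the level change implied by 1/r² energy decay. The speed of sound (343 m/s) and the function names are assumptions chosen for this example, not values from the slides.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def propagation_delay(distance_m, c=SPEED_OF_SOUND):
    """Time in seconds for a wavefront to travel distance_m."""
    return distance_m / c


def level_change_db(r_near, r_far):
    """Level change in dB when moving from r_near to r_far with energy ~ 1/r^2."""
    return 10.0 * math.log10((r_near / r_far) ** 2)


delay = propagation_delay(3.43)    # 3.43 m of extra path
drop = level_change_db(1.0, 2.0)   # doubling the distance
print(f"delay = {delay * 1e3:.1f} ms, level change = {drop:.2f} dB")
```

Doubling the distance costs about 6 dB, which is why level alone is only a weak range cue unless the source level is known.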

Recovering spatial information
• Source direction as wavefront normal
  - moving plane found from timing at 3 points
  - need to solve correspondence
• Space: need 3 parameters
  - e.g. 2 angles and range
[Figure: wavefront passing points A, B, C, with pressure vs. time at each point; $c\,\Delta t = \Delta s = \overline{AB}\cos\theta$; source position given by range r, azimuth θ, elevation φ]
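
The geometric relation in the figure can be inverted to get an arrival angle from a measured time difference between two points. This is a minimal sketch under a plane-wave (far-field) assumption; the function name, the 343 m/s speed of sound, and the example spacing are illustrative choices.

```python
import math


def arrival_angle(delta_t, spacing, c=343.0):
    """Angle theta (radians) between baseline AB and the propagation direction.

    Solves c * delta_t = spacing * cos(theta) for a plane wave.
    """
    cos_theta = c * delta_t / spacing
    # Measurement noise can push the ratio slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


theta = arrival_angle(delta_t=0.0005, spacing=0.343)
print(f"theta = {math.degrees(theta):.1f} degrees")
```

One pair of points only fixes the angle to the baseline; timing at a third, non-collinear point is needed to pin down the wavefront normal, as the slide notes.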

The effect of the environment
• Reflection causes additional wavefronts
  - + scattering, absorption
  - many paths → many echoes
• Reverberant effect
  - causal ‘smearing’ of signal energy
[Figure: reflection, diffraction & shadowing around obstacles; two spectrograms, time / sec (0 to 1.5) vs. freq / Hz (0 to 8000)]

Reverberation impulse response
• Exponential decay of reflections: $h_{room}(t) \sim e^{-t/T}$
• Frequency-dependent
  - greater absorption at high frequencies → faster decay
• Size-dependent
  - larger rooms → longer delays → slower decay
• Sabine’s equation: $RT_{60} = \dfrac{0.049\,V}{S\alpha}$
• Time constant varies with size and absorption
[Figure: spectrogram of room impulse response hlwy16, 128 pt window; time / s (0 to 0.7) vs. freq / Hz (0 to 8000); level scale -70 to -10 dB]
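
A small worked example of Sabine's equation may help. The constant 0.049 corresponds to imperial units (volume in cubic feet, surface area in square feet); the room dimensions and absorption coefficient below are made up for illustration. The second function relates RT60 to the time constant T, assuming that e^{-t/T} describes the amplitude envelope (so 60 dB is a factor of 1000).

```python
import math


def rt60_sabine(volume_ft3, surface_ft2, mean_absorption):
    """Sabine reverberation time in seconds (imperial units)."""
    return 0.049 * volume_ft3 / (surface_ft2 * mean_absorption)


def time_constant_from_rt60(rt60):
    """T such that the amplitude envelope exp(-t/T) falls by 60 dB at t = rt60."""
    return rt60 / (3.0 * math.log(10.0))


# Hypothetical 20 ft x 15 ft x 10 ft room with mean absorption 0.2
volume = 20 * 15 * 10
surface = 2 * (20 * 15 + 20 * 10 + 15 * 10)
rt60 = rt60_sabine(volume, surface, 0.2)
T = time_constant_from_rt60(rt60)
print(f"RT60 = {rt60:.3f} s, T = {T * 1e3:.1f} ms")
```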

Outline
1. Spatial acoustics
2. Binaural perception
  - The sound at the two ears
  - Available cues
  - Perceptual phenomena
3. Synthesizing spatial audio
4. Extracting spatial sounds

2. Binaural perception
• What is the information in the 2 ear signals?
  - the sound of the source(s) (L+R)
  - the position of the source(s) (L-R)
• Example waveforms (ShATR database)
[Figure: source off to one side of the head, showing path length difference and head shadow (high freq) between L and R ears; shatr78m3 waveform, Left and Right channels, time / s (2.2 to 2.235), amplitude -0.1 to 0.1]

Main cues to spatial hearing
• Interaural time difference (ITD)
  - from different path lengths around head
  - dominates in low frequency (< 1.5 kHz)
  - max ~750 µs → ambiguous for freqs > 600 Hz
• Interaural intensity difference (IID)
  - from head shadowing of far ear
  - negligible for LF; increases with frequency
• Spectral detail (from pinna reflections) useful for elevation & range
• Direct-to-reverberant useful for range
[Figure: Claps 33 and 34 from 627M:nf90; spectrogram, time / s (0 to 1.8) vs. freq / kHz (0 to 20)]

Head-Related Transfer Fns (HRTFs)
• Capture source coupling as impulse responses: $\{\ell_{\theta,\phi,R}(t),\ r_{\theta,\phi,R}(t)\}$
• Collection: (http://interface.cipic.ucdavis.edu/)
• Highly individual!
[Figure: HRIR_021 Left @ 0 el and HRIR_021 Right @ 0 el, shown as azimuth / deg (-45 to 45) vs. time / ms (0 to 1.5); HRIR_021 Left @ 0 el 0 az and HRIR_021 Right @ 0 el 0 az waveforms vs. time / ms]

Cone of confusion
• Interaural timing cue dominates (below 1 kHz)
  - from differing path lengths to two ears
• But: only resolves to a cone
  - Up/down? Front/back?
[Figure: head with azimuth θ and the cone of confusion (approx equal ITD)]

Further cues
• Pinna causes elevation-dependent coloration
• Monaural perception
  - separate coloration from source spectrum?
• Head motion
  - synchronized spectral changes
  - also for ITD (front/back) etc.

Combining multiple cues
• Both ITD and ILD influence azimuth; what happens when they disagree?
  - trading @ around 0.1 ms / dB
[Figure: three panels of l(t) and r(t):
  - Identical signals to both ears → image is centered
  - Delaying right channel (1 ms) moves image to left
  - Attenuating left channel returns image to center]

Binaural position estimation
• Imperfect results: (Arruda, Kistler & Wightman 1992)
  - listening to ‘wrong’ HRTFs → errors
  - front/back reversals stay on cone of confusion
[Figure: Judged Azimuth (Deg) vs. Target Azimuth (Deg), both axes -180 to 180]

The Precedence Effect
• Reflections give misleading spatial cues
• But: Spatial impression based on 1st wavefront, then ‘switches off’ for ~50 ms
  - .. even if ‘reflections’ are louder
  - .. leads to impression of room
[Figure: l(t) and r(t) showing direct and reflected arrivals; the reflected path of extra length R arrives R/c later]

Binaural Masking Release
• Adding noise to reveal target
  - why does this make sense?
• Binaural Masking Level Difference up to 12 dB
  - greatest for noise in phase, tone anti-phase
[Figure: Tone + noise to one ear: tone is masked. Adding identical noise to other ear: tone is audible]

Outline
1. Spatial acoustics
2. Binaural perception
3. Synthesizing spatial audio
  - Position
  - Environment
4. Extracting spatial sounds

3. Synthesizing spatial audio
• Goal: recreate realistic soundfield
  - hi-fi experience
  - synthetic environments (VR)
• Constraints
  - resources
  - information (individual HRTFs)
  - delivery mechanism (headphones)
• Source material types
  - live recordings (actual soundfields)
  - synthetic (studio mixing, virtual environments)

Classic stereo
• ‘Intensity panning’: no timing modifications, just vary level ±20 dB
  - works as long as listener is equidistant (ILD)
• Surround sound: extra channels in center, sides, ...
  - same basic effect - pan between pairs
[Figure: listener in front of L and R loudspeakers]
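
Intensity panning can be sketched as applying an interchannel level difference to a mono signal with no added delay. Splitting the difference symmetrically between the two channels is an assumption made here for simplicity; the function name and sign convention are likewise illustrative.

```python
import math


def intensity_pan(mono, level_diff_db):
    """Return (left, right) with right louder than left by level_diff_db.

    Half of the level difference is applied to each channel; no timing
    change is made, so the only cue produced is an interchannel level
    difference.
    """
    g_right = 10.0 ** (level_diff_db / 40.0)
    g_left = 10.0 ** (-level_diff_db / 40.0)
    return [g_left * x for x in mono], [g_right * x for x in mono]


left, right = intensity_pan([1.0, -0.5, 0.25], 20.0)   # hard towards the right
ild_db = 20.0 * math.log10(abs(right[0]) / abs(left[0]))
print(f"interchannel level difference = {ild_db:.1f} dB")
```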

Simulating reverberation
• Can characterize reverb by impulse response
  - spatial cues are important - record in stereo
  - IRs of ~1 sec → very long convolution
• Image model: reflections as duplicate sources
• ‘Early echos’ in room impulse response $h_{room}(t)$
• Actual reflection may be $h_{reflect}(t)$, not $\delta(t)$
[Figure: source, listener, and virtual (image) sources behind the walls, with a reflected path; $h_{room}(t)$ vs. t showing the direct path followed by early echos]
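
The image model can be sketched for the simplest case: a 2-D rectangular room and first-order reflections only. Each wall contributes one mirror-image source, and each image yields one early echo whose delay follows from its distance to the listener. The 1/r amplitude law, the single wall reflection coefficient, and all names below are assumptions for illustration; real reflections would be filtered rather than ideal impulses.

```python
import math


def early_echoes(src, lis, room, c=343.0, reflect=0.8):
    """Direct path plus first-order image sources of a 2-D rectangular room.

    Walls lie at x=0, x=room[0], y=0, y=room[1]. Returns a list of
    (delay_seconds, amplitude) pairs sorted by delay.
    """
    sx, sy = src
    lx, ly = room
    # Mirror the source in each of the four walls.
    images = [(-sx, sy), (2 * lx - sx, sy), (sx, -sy), (sx, 2 * ly - sy)]

    def tap(point, gain):
        r = math.hypot(point[0] - lis[0], point[1] - lis[1])
        return (r / c, gain / r)

    return sorted([tap(src, 1.0)] + [tap(p, reflect) for p in images])


taps = early_echoes(src=(1.0, 2.0), lis=(4.0, 2.0), room=(5.0, 4.0))
for delay, amp in taps:
    print(f"{delay * 1e3:6.2f} ms  amplitude {amp:.3f}")
```

In this particular geometry all four image sources happen to lie 5 m from the listener, so the four first-order echoes coincide.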

Artificial reverberation
• Reproduce perceptually salient aspects
  - early echo pattern (→ room size impression)
  - overall decay tail (→ wall materials...)
  - interaural coherence (→ spaciousness)
• Nested allpass filters (Gardner ’92)
  - allpass section with delay k and gain g: $H(z) = \dfrac{z^{-k} - g}{1 - g\,z^{-k}}$
  - impulse response $h[n]$: $-g$ at $n = 0$, $(1-g^2)$ at $n = k$, $g(1-g^2)$ at $n = 2k$, $g^2(1-g^2)$ at $n = 3k$, ...
[Figure: allpass block diagram (x[n] → y[n], delay $z^{-k}$, gains g and -g); nested + cascade allpass with (k, g) = (20, 0.3), (30, 0.7), (50, 0.5); synthetic reverb built from AP0, AP1, AP2, an LPF with feedback gain g, and output taps a0, a1, a2]
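
To check the allpass impulse response numerically, the sketch below implements a single (non-nested) section directly from its difference equation, y[n] = -g·x[n] + x[n-k] + g·y[n-k], which corresponds to H(z) = (z^-k - g)/(1 - g·z^-k). The values k = 4 and g = 0.5 are arbitrary test values, not taken from the slide.

```python
def allpass(x, k, g):
    """Single allpass section: y[n] = -g*x[n] + x[n-k] + g*y[n-k]."""
    y = []
    for n, xn in enumerate(x):
        x_delayed = x[n - k] if n >= k else 0.0
        y_delayed = y[n - k] if n >= k else 0.0
        y.append(-g * xn + x_delayed + g * y_delayed)
    return y


k, g = 4, 0.5
impulse = [1.0] + [0.0] * 15
h = allpass(impulse, k, g)
print([h[n] for n in (0, k, 2 * k, 3 * k)])
```

The nonzero samples fall only at multiples of k and decay geometrically by a factor g, which is what makes cascaded or nested sections with different delays useful for building up echo density.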

Synthetic binaural audio
• Source convolved with {L,R} HRTFs gives precise positioning
  - ...for headphone presentation
  - can combine multiple sources (by adding)
• Where to get HRTFs?
  - measured set, but: specific to individual, discrete
  - interpolate by linear crossfade, PCA basis set
  - or: parametric model - delay, shadow, pinna
• Head motion cues?
  - head tracking + fast updates
[Figure (after Brown & Duda '97): Source feeds parallel left and right chains that are summed with a room echo $K_E\,z^{-t_E}$.
  - Delay: $z^{-t_{DL}(\theta)}$, $z^{-t_{DR}(\theta)}$
  - Shadow: $\dfrac{1 - b_L(\theta)\,z^{-1}}{1 - a\,z^{-1}}$, $\dfrac{1 - b_R(\theta)\,z^{-1}}{1 - a\,z^{-1}}$
  - Pinna: $\sum_k p_{kL}(\theta,\phi)\,z^{-t_{PkL}(\theta,\phi)}$, $\sum_k p_{kR}(\theta,\phi)\,z^{-t_{PkR}(\theta,\phi)}$]
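
The first bullet (convolve each source with a left/right impulse-response pair, then add sources) can be sketched in a few lines. The toy "HRIRs" below are just a delay and attenuation at the far ear, invented for illustration; measured HRIRs would be a few hundred samples long. The direct-form convolution is written out in plain Python to keep the example dependency-free.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def add_padded(a, b):
    """Sample-wise sum of two sequences of possibly different length."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0.0) + (b[i] if i < len(b) else 0.0)
            for i in range(n)]


def binaural_mix(sources):
    """sources: iterable of (signal, hrir_left, hrir_right) triples."""
    left, right = [], []
    for signal, hrir_l, hrir_r in sources:
        left = add_padded(left, convolve(signal, hrir_l))
        right = add_padded(right, convolve(signal, hrir_r))
    return left, right


# Toy impulse-response pairs (hypothetical): far ear delayed by 2 samples, halved.
from_left = ([1.0], [0.0, 0.0, 0.5])
from_right = ([0.0, 0.0, 0.5], [1.0])
left, right = binaural_mix([([1.0, 2.0], *from_left), ([3.0], *from_right)])
print(left, right)
```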

Transaural sound
• Binaural signals without headphones?
• Can cross-cancel wrap-around signals
  - speakers $S_{L,R}$, ears $E_{L,R}$, binaural signals $B_{L,R}$
  - $S_L = H_{LL}^{-1}\,(B_L - H_{RL}\,S_R)$
  - $S_R = H_{RR}^{-1}\,(B_R - H_{LR}\,S_L)$
• Narrow ‘sweet spot’
  - head motion?
[Figure: binaural signals $B_L$, $B_R$ pass through a cross-cancellation network M to speakers $S_L$, $S_R$, which reach ears $E_L$, $E_R$ via direct paths $H_{LL}$, $H_{RR}$ and wrap-around paths $H_{LR}$, $H_{RL}$]
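
The two coupled equations for the speaker signals can be solved jointly at each frequency by inverting the 2x2 matrix of speaker-to-ear transfer functions. The sketch below does this for a single frequency with made-up complex gains; H_RL denotes the path from the right speaker to the left ear, following the slide's equations. All numerical values are hypothetical.

```python
import cmath


def crosstalk_cancel(b_left, b_right, h_ll, h_rl, h_lr, h_rr):
    """Speaker signals (S_L, S_R) that deliver B_L, B_R at the ears.

    Assumed ear model: E_L = H_LL*S_L + H_RL*S_R, E_R = H_LR*S_L + H_RR*S_R.
    """
    det = h_ll * h_rr - h_rl * h_lr
    s_left = (h_rr * b_left - h_rl * b_right) / det
    s_right = (h_ll * b_right - h_lr * b_left) / det
    return s_left, s_right


# Hypothetical single-frequency transfer functions: unit direct paths,
# wrap-around paths attenuated by half with a 0.3 rad phase lag.
h_ll = h_rr = 1.0 + 0j
h_rl = h_lr = 0.5 * cmath.exp(-0.3j)
b_left, b_right = 1.0 + 0j, 0.2j

s_left, s_right = crosstalk_cancel(b_left, b_right, h_ll, h_rl, h_lr, h_rr)
e_left = h_ll * s_left + h_rl * s_right
e_right = h_lr * s_left + h_rr * s_right
print(abs(e_left - b_left), abs(e_right - b_right))
```

The inverse depends on the exact transfer functions, so small head movements that change them break the cancellation, which is one way to read the ‘sweet spot’ bullet.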

Soundfield reconstruction
• Stop thinking about ears; just reconstruct pressure + spatial derivatives: $p(x,y,z,t)$, $\partial p(t)/\partial x$, $\partial p(t)/\partial y$, $\partial p(t)/\partial z$
  - ears in reconstructed field receive same sounds
• Complex reconstruction setup (ambisonics)
  - able to preserve head motion cues?

Outline
1. Spatial acoustics
2. Binaural perception
3. Synthesizing spatial audio
4. Extracting spatial sounds
  - Microphone arrays
  - Modeling binaural processing

4. Extracting spatial sounds
• Given access to soundfield, can we recover separate components?
  - degrees of freedom: >N signals from N sensors is hard
  - but: people can do it (somewhat)
• Information-theoretic approach
  - use only very general constraints
  - rely on precision measurements
• Anthropic approach
  - examine human perception
  - attempt to use same information
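
Microphone arrays
• Signals from multiple microphones can be combined to enhance/cancel certain sources
• ‘Coincident’ mics with diff. directional gains
  - $\begin{bmatrix} m_1 \\ m_2 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} \hat{s}_1 \\ \hat{s}_2 \end{bmatrix} = A^{-1}\,m$
• Microphone arrays (endfire)
[Figure: sources s1, s2 reaching mics m1, m2 with gains a11, a12, a21, a22; endfire array of mics at spacing D combined through delays D and adders; directional response (-40 to 0 dB) for λ = 4D, λ = 2D, λ = D]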
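
For the coincident-microphone case, recovering the sources amounts to inverting the 2x2 gain matrix A, assuming it is known and well conditioned. The sketch below uses made-up gains and short test signals; the function names are illustrative.

```python
def mix(s1, s2, a):
    """Microphone signals m = A s for instantaneous (delay-free) mixing."""
    (a11, a12), (a21, a22) = a
    m1 = [a11 * x + a12 * y for x, y in zip(s1, s2)]
    m2 = [a21 * x + a22 * y for x, y in zip(s1, s2)]
    return m1, m2


def unmix(m1, m2, a):
    """Source estimates s_hat = A^{-1} m using the closed-form 2x2 inverse."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    s1 = [(a22 * x - a12 * y) / det for x, y in zip(m1, m2)]
    s2 = [(-a21 * x + a11 * y) / det for x, y in zip(m1, m2)]
    return s1, s2


A = ((1.0, 0.5), (0.3, 1.0))           # hypothetical directional gains
s1_true = [1.0, 0.0, -1.0, 2.0]
s2_true = [0.5, 0.5, 0.5, -0.5]
m1, m2 = mix(s1_true, s2_true, A)
s1_hat, s2_hat = unmix(m1, m2, A)
print(s1_hat, s2_hat)
```

With spaced rather than coincident microphones the gains become frequency-dependent delays, and the same inversion would have to be done per frequency.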

Adaptive Beamforming & Independent Component Analysis (ICA)
• Formulate mathematical criteria to optimize
• Beamforming: Drive interference to zero
  - cancel energy during nontarget intervals
• ICA: maximize mutual independence of outputs
  - from higher-order moments during overlap
• Limited by separation model parameter space
  - only N x N?
[Figure: sources s1, s2 mixed into m1, m2, followed by an unmixing network with weights a11, a12, a21, a22 adapted along $-\delta\,\mathrm{MutInfo}/\delta a$]

Binaural models
• Human listeners do better?
  - certainly given only 2 channels
• Extract ITD and IID cues?
  - cross-correlation finds timing differences
  - ‘consume’ counter-moving pulses
  - how to achieve IID, trading
  - vertical cues...
[Figure: interaural cross-correlation for a target azimuth; lag / ms (-6 to 6) vs. center freq / Hz (100 to 3200)]
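
The cross-correlation idea can be sketched on a single broadband channel: slide one ear signal against the other and pick the lag with the largest correlation. This omits the per-frequency-band analysis shown in the figure; the signals, sampling rate, and names are invented for the example.

```python
def best_lag(left, right, max_lag):
    """Lag in samples that maximizes sum_n left[n] * right[n + lag].

    A positive result means the right signal is a delayed copy of the left.
    """
    def xcorr(lag):
        return sum(left[n] * right[n + lag]
                   for n in range(len(left))
                   if 0 <= n + lag < len(right))

    return max(range(-max_lag, max_lag + 1), key=xcorr)


fs = 16000                                   # assumed sampling rate in Hz
left = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]   # left delayed by 3
lag = best_lag(left, right, max_lag=5)
print(f"lag = {lag} samples, ITD = {lag / fs * 1e6:.1f} us")
```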

Nonlinear filtering
• How to separate sounds based on direction?
  - estimate direction locally
  - choose target direction
  - remove energy from other directions
• E.g. Kollmeier, Peissig & Hohmann ’93
  - IID from $|L_\omega| / |R_\omega|$; ITD (IPD) from $\arg\{L_\omega R_\omega^*\}$
  - match to IID/IPD template for desired direction
  - also reverberation?
[Figure: block diagram. l and r each pass through FFT analysis to give $L_\omega$, $R_\omega$ on a time-frequency grid $X_\omega(mH, 2\pi k/NT)$; modulus blocks give $|L_\omega|^2$, $|R_\omega|^2$ and a cross-correlation block gives $L_\omega R_\omega^*$; each is smoothed by $(1-a)/(1-a z^{-1})$ to give $S_{LL}$, $S_{RR}$, $S_{LR}$; a gain factor calculation produces g, applied before OLA-FFT synthesis of l' and r']
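
The two per-bin cues named above can be computed directly from the left and right spectra. The sketch below uses a naive DFT on one frame and evaluates a single bin; it does not reproduce the smoothing or gain stages of the block diagram, and the test signal (right ear attenuated by half and lagging by 0.6 rad) is made up.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def interaural_cues(left, right, k):
    """Return (IID in dB, IPD in radians) at DFT bin k.

    IID = 20*log10(|L|/|R|); IPD = arg(L * conj(R)), positive when the
    left signal leads the right.
    """
    L, R = dft(left)[k], dft(right)[k]
    iid_db = 20.0 * math.log10(abs(L) / abs(R))
    ipd = cmath.phase(L * R.conjugate())
    return iid_db, ipd


N, k = 64, 4
left = [math.cos(2 * math.pi * k * n / N) for n in range(N)]
right = [0.5 * math.cos(2 * math.pi * k * n / N - 0.6) for n in range(N)]
iid_db, ipd = interaural_cues(left, right, k)
print(f"IID = {iid_db:.2f} dB, IPD = {ipd:.2f} rad")
```

Dividing the IPD by 2πf for the bin's frequency f converts it to a time difference, valid only while the phase does not wrap.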

Summary
• Spatial sound
  - sampling at more than one point gives information on origin direction
• Binaural perception
  - time & intensity cues used between/within ears
• Sound rendering
  - conventional stereo
  - HRTF-based
• Spatial analysis
  - optimal linear techniques
  - elusive auditory models

References
B.C.J. Moore, An introduction to the psychology of hearing (4th ed.), Academic, 1997.
J. Blauert, Spatial Hearing (revised ed.), MIT Press, 1996.
R.O. Duda, Sound Localization Research, http://www.engr.sjsu.edu/~duda/Duda.Research.frameset.html