Nonlinear Manifold Learning for Visual Speech Recognition · 2009-03-29 · Overview . Manifold...
Nonlinear Manifold Learning for Visual Speech Recognition
Christoph Bregler and Stephen Omohundro
University of California, Berkeley & NEC Research Institute, Inc.
1/25
Overview
Manifold Learning:
Applications to Lip Tracking, Image Interpolation, and Feature Extraction.
Experiments with a full Speech Recognition System:
A technique for representing and learning smooth nonlinear manifolds is presented and applied to several lip reading tasks. Given a set of points drawn from a smooth manifold in an abstract feature space, the technique is capable of determining the structure of the surface and of finding the closest manifold point to a given query point.
We use this technique to learn the "space of lips" in a visual speech recognition task. The learned manifold is used for tracking and extracting the lips, for interpolating between frames in an image sequence and for providing features for recognition.
We describe a system based on Hidden Markov Models and this learned lip manifold that significantly improves the performance of acoustic speech recognizers in degraded environments. We also present preliminary results on a purely visual lip reader.
2/25
Problem
1. Boundary Tracking
2. Image Interpolation
3. Lip Features
4. Speech Recognition [HMM]
[Figure labels: V1 V2 V3 V4 V5 V6 V7; Acoustic Features A1 A2 A3 A4 A5 A6 A7]
3/25
Full Recognition System
"Visual Speech": "Contour-Surfing", Lip-Interpolation, "Eigenlips"
Acoustic Speech: RASTA-PLP
Hybrid MLP/HMM Speech-Recognition System
Full Sentence, Car-Noise, Cross-Talk
4/25
Lip Tracking using Active Contour Models
"Snake" Tracking
Snake Model = "Controlled Continuity Spline"
5/25
Space of Contours as Constrained Manifold
N Dimensional Features are Points in N Dimensional Space.
N Dimensional Feature Space
K Dimensional Subspace
6/25
Abstract Manifold Learning Task
[Figure: three panels labeled "Samples", "Mixture of local adaptive Patches", and "Learned Representation"]
7/25
Mixture of Patches Architecture
[Figure labels: Influence Function, Linear Patch]
$$P(x) = \frac{\sum_i G_i(x) \cdot P_i(x)}{\sum_i G_i(x)}$$
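The blending formula on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: each `P_i` is taken to be an orthogonal projection onto a linear patch (a center plus an orthonormal basis), and each `G_i` is taken to be a Gaussian centered on its patch. The slide does not give the form of `G_i`, and the names `project_onto_patches`, `centers`, `bases`, and `sigma` are hypothetical.

```python
import numpy as np

def project_onto_patches(x, centers, bases, sigma=1.0):
    """Blend local linear projections P_i(x) using influence weights G_i(x)."""
    x = np.asarray(x, dtype=float)
    projections = []
    weights = []
    for c, B in zip(centers, bases):
        # P_i(x): orthogonal projection onto the patch through c spanned by the rows of B
        offset = x - c
        projections.append(c + offset @ B.T @ B)
        # G_i(x): Gaussian influence centered on the patch (assumed form, not from the slide)
        weights.append(np.exp(-np.sum(offset ** 2) / (2.0 * sigma ** 2)))
    weights = np.array(weights)
    projections = np.array(projections)
    # P(x) = sum_i G_i(x) P_i(x) / sum_i G_i(x)
    return weights @ projections / weights.sum()

# Two hypothetical patches in the plane, both lying along the x-axis
centers = [np.array([0.0, 0.0]), np.array([2.0, 0.0])]
bases = [np.array([[1.0, 0.0]]), np.array([[1.0, 0.0]])]
query = np.array([1.0, 0.5])
closest = project_onto_patches(query, centers, bases)
```

Because both toy patches lie on the same line, every local projection of the query agrees and the blend returns the point directly below it on that line.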
8/25
Manifold Learning
• Initial Estimate
- Cluster Feature Space (K-Means)
- Local PCA
• Minimize MSE between Training Set and Projection
- EM Algorithm
- Gradient Descent
• First Best Model Merging
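The "Initial Estimate" step above (K-means clustering followed by local PCA) might look roughly like the sketch below. It covers only that first step: the later MSE minimization (EM or gradient descent) and the model merging are not shown. The toy data (points on a unit circle), the cluster count, and all function names are assumptions made for illustration.

```python
import numpy as np

def kmeans(points, k, iters=25, seed=0):
    """Plain Lloyd's K-means; returns a cluster label for every point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def local_pca(points, labels, k, dim):
    """One linear patch per cluster: (mean, top `dim` principal directions)."""
    patches = []
    for j in range(k):
        members = points[labels == j]
        mean = members.mean(axis=0)
        # Rows of vt are principal directions sorted by decreasing variance
        _, _, vt = np.linalg.svd(members - mean, full_matrices=False)
        patches.append((mean, vt[:dim]))
    return patches

def projection_mse(points, labels, patches):
    """Mean squared distance between each point and its projection onto its own patch."""
    total = 0.0
    for x, j in zip(points, labels):
        mean, basis = patches[j]
        offset = x - mean
        residual = offset - offset @ basis.T @ basis
        total += residual @ residual
    return total / len(points)

# Hypothetical training set: 400 samples from a 1-D manifold (unit circle) in 2-D
angles = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
samples = np.column_stack([np.cos(angles), np.sin(angles)])
labels = kmeans(samples, k=8)
patches = local_pca(samples, labels, k=8, dim=1)
mse = projection_mse(samples, labels, patches)
```

The quantity `mse` is the training-set projection error that the second bullet proposes to minimize further.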
9/25
Synthetic Examples
Learned surface
Training Samples
10/25
Comparison to Nearest Neighbor
[Figure labels: "Surface Prior", "nearest point", "query"; plot with MSE axis from 0.0 to 0.3]
11/25
Manifold Dimension?
[Plot: eigenvalue (0 to 180) against samples per PCA (0 to 200), with curves for the 1st, 2nd, and 3rd eigenvalue]
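One possible reading of this plot is that the manifold dimension is estimated from the local PCA spectrum: the leading eigenvalues grow with the neighborhood size while the remaining ones stay small. The sketch below illustrates that idea on made-up data, a noisy 2-D surface embedded in 3-D. The data, the neighborhood sizes, and the function name `local_eigenvalues` are assumptions, not taken from the slides.

```python
import numpy as np

def local_eigenvalues(points, center, n_neighbors):
    """Covariance eigenvalues (descending) of the n_neighbors points closest to center."""
    dists = np.linalg.norm(points - center, axis=1)
    nearest = points[np.argsort(dists)[:n_neighbors]]
    cov = np.cov(nearest, rowvar=False)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

rng = np.random.default_rng(0)
# Hypothetical samples: surface z = 0.5 * (x^2 + y^2) with a little noise
xy = rng.uniform(-1.0, 1.0, size=(2000, 2))
z = 0.5 * np.sum(xy ** 2, axis=1) + 0.01 * rng.standard_normal(2000)
points = np.column_stack([xy, z])

# Spectrum as a function of the number of samples used for the local PCA
for n in (50, 100, 150, 200):
    print(n, local_eigenvalues(points, np.zeros(3), n))

spectrum = local_eigenvalues(points, np.zeros(3), 200)
```

On this toy surface two eigenvalues dominate the third, which matches its intrinsic dimension of two.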
12/25
Manifold-based Snake Tracking
[Figure labels: Directed Graylevel Gradients along Snake, Lip-Space, Learned Manifold]
Manifold Snake Energy = Image Term + Learned Manifold Term
Use Manifold-Snakes to estimate position, scale, and rotation for gray level matrix extraction.
13/25
Graylevel Manifolds
[Figure labels: Linear (Linear Lip-Space, Dimension Reduction) => Nonlinear (Real Lip-Space, Dimension Reduction)]
14/25
Linear vs. Nonlinear Interpolation
[Figure axis label: Graylevel Dimensions]
- Learn nonlinear subspace of "legal" images
- Constrain interpolated images to subspace
(16x16 pixel = 256 dim. space)
15/25
Nonlinear Interpolation Technique
[Figure label: "Manifold-Snake"]
$$E = \sum_i \left( \lVert V_{i-1} - 2V_i + V_{i+1} \rVert^2 + \mathrm{distance}(V_i) \right)$$
- Begin with sequence of linear interpolated points
- Iteratively move points toward manifold
- Iterations are equal to gradient descent steps using the Manifold-Snake Energy
[Figure panels: 0 Iter., 1 Iter., 2 Iter., 5 Iter., 10 Iter.]
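The procedure on this slide can be sketched as plain gradient descent. Several assumptions are made here: the unit circle stands in for the learned lip manifold, the distance term is squared so that its gradient is simply `2 * (V - P(V))`, the two endpoints are held fixed, and the step size and iteration count are arbitrary. The names `project_to_circle`, `energy_gradient`, and `interpolate` are hypothetical.

```python
import numpy as np

def project_to_circle(V):
    """Stand-in manifold projection: closest point on the unit circle."""
    return V / np.linalg.norm(V, axis=-1, keepdims=True)

def energy_gradient(V, project, weight):
    """Gradient of sum_i |V[i-1] - 2 V[i] + V[i+1]|^2 + weight * |V[i] - P(V[i])|^2."""
    d = V[:-2] - 2.0 * V[1:-1] + V[2:]   # second differences, one per interior point
    grad = np.zeros_like(V)
    grad[:-2] += 2.0 * d                 # each difference touches its left neighbor,
    grad[1:-1] -= 4.0 * d                # its center point,
    grad[2:] += 2.0 * d                  # and its right neighbor
    grad += 2.0 * weight * (V - project(V))
    return grad

def interpolate(start, end, n_points, project, weight=1.0, step=0.02, iters=2000):
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    V = (1.0 - t) * start + t * end      # begin with linearly interpolated points
    for _ in range(iters):
        grad = energy_gradient(V, project, weight)
        grad[0] = 0.0                    # keep the two given endpoints fixed
        grad[-1] = 0.0
        V = V - step * grad              # move the points toward the manifold
    return V

start = np.array([1.0, 0.0])
end = np.array([0.0, 1.0])
path = interpolate(start, end, n_points=9, project=project_to_circle)
radii = np.linalg.norm(path, axis=1)
```

The straight-line midpoint between the two endpoints lies well inside the circle, while the descended path stays close to it. That is the toy analogue of constraining interpolated images to the subspace of "legal" images.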
16/25
Natural Lip Images Linear Interpolation Nonlinear Interpolation
24x24 Images projected in a 32-D linear space and 8-D nonlinear manifold
18/25
Phone Probability Estimation
P(phone | acoustics, visual-input)
63 Softmax Prob. Outputs
Multi-Layer Perceptron, 256-512 Hidden Units
19 frames
Acoustic input: 9 features (1 Energy, 8 RASTA-PLP)
Visual input: 10 Eigen-Features (+ 10 Delta-Eigen-Features)
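A forward pass through a network of this shape might look like the sketch below. The layer sizes follow the slide as it reads in this transcript (19-frame window, 9 acoustic plus 20 visual features per frame, 256 hidden units, 63 softmax outputs). The tanh hidden activation, the concatenation of all frames into one input vector, and the random weights are assumptions for illustration only, and nothing here is a trained model.

```python
import numpy as np

N_FRAMES = 19      # context window as it reads in the transcript
N_ACOUSTIC = 9     # 1 energy + 8 RASTA-PLP
N_VISUAL = 20      # 10 eigen-features + 10 delta-eigen-features
N_HIDDEN = 256     # lower end of the 256-512 range
N_PHONES = 63      # softmax outputs

def softmax(z):
    z = z - z.max()            # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def phone_posteriors(window, W1, b1, W2, b2):
    """Estimate P(phone | acoustics, visual-input) for one window of frames."""
    x = window.reshape(-1)                 # concatenate all frames (assumed layout)
    hidden = np.tanh(W1 @ x + b1)          # assumed hidden nonlinearity
    return softmax(W2 @ hidden + b2)

rng = np.random.default_rng(0)
n_in = N_FRAMES * (N_ACOUSTIC + N_VISUAL)
W1 = 0.01 * rng.standard_normal((N_HIDDEN, n_in))
b1 = np.zeros(N_HIDDEN)
W2 = 0.01 * rng.standard_normal((N_PHONES, N_HIDDEN))
b2 = np.zeros(N_PHONES)

window = rng.standard_normal((N_FRAMES, N_ACOUSTIC + N_VISUAL))  # placeholder features
posteriors = phone_posteriors(window, W1, b1, W2, b2)
```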
20/25
Hidden Markov Models
Viterbi Approximation
M_i (Word-Model)
Likelihood: P(A, V | M_i)
$$P(A, V \mid \text{phone}) \propto \frac{P(\text{phone} \mid A, V)}{P(\text{phone})}$$
[Figure: HMM state lattice with a Time axis]
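The two ingredients on this slide can be sketched together: posteriors divided by phone priors give scaled likelihoods, and the Viterbi approximation scores a word model by its single best state path. The two-state left-to-right model, the posterior values, and the priors below are made-up numbers, and the function names are hypothetical.

```python
import numpy as np

def scaled_log_likelihoods(posteriors, priors):
    """log of P(phone | A, V) / P(phone), proportional to P(A, V | phone)."""
    return np.log(posteriors) - np.log(priors)

def viterbi_log_score(log_obs, log_trans, log_init):
    """Log score of the best state path (Viterbi approximation to P(A, V | M_i))."""
    score = log_init + log_obs[0]
    for t in range(1, len(log_obs)):
        # Best predecessor for every state, then add the frame's observation score
        score = np.max(score[:, None] + log_trans, axis=0) + log_obs[t]
    return np.max(score)

# Toy word model with two phone states, left-to-right (illustrative values)
with np.errstate(divide="ignore"):         # log(0) = -inf marks forbidden moves
    log_init = np.log(np.array([1.0, 0.0]))
    log_trans = np.log(np.array([[0.5, 0.5],
                                 [0.0, 1.0]]))

# Network outputs for three frames: rows are frames, columns are phones
posteriors = np.array([[0.8, 0.2],
                       [0.6, 0.4],
                       [0.1, 0.9]])
priors = np.array([0.5, 0.5])

log_obs = scaled_log_likelihoods(posteriors, priors)
word_score = viterbi_log_score(log_obs, log_trans, log_init)
```

Recognition would then compare `word_score` across the competing word models and pick the largest.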
21/25
Recognition Results

| Task | Acoustic | Eigenlips | Delta-Lips | Rel.Err.Red. |
| --- | --- | --- | --- | --- |
| clean | 11.0 % | 10.1 % | 11.3 % | - |
| 20db SNR | 33.5 % | 28.9 % | 26.0 % | 22.4 % |
| 10db SNR | 56.1 % | 51.7 % | 48.0 % | 14.4 % |
| 15db SNR crosstalk | 67.3 % | 51.7 % | 46.0 % | 31.6 % |

Table 1: Spelling Task (6 Speakers) Word Error (wrong + insertion + deletion)
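The "Rel.Err.Red." column appears to be the relative reduction in word error from the acoustic-only system to the Delta-Lips system. That formula is an inference, not something stated on the slide, but it reproduces the three reported values from the Table 1 error rates:

```python
# Word error rates (%) from Table 1
acoustic = {"20db SNR": 33.5, "10db SNR": 56.1, "15db SNR crosstalk": 67.3}
delta_lips = {"20db SNR": 26.0, "10db SNR": 48.0, "15db SNR crosstalk": 46.0}

def relative_error_reduction(baseline, combined):
    """Percentage of the baseline error removed by the combined system (assumed definition)."""
    return 100.0 * (baseline - combined) / baseline

reductions = {
    task: round(relative_error_reduction(acoustic[task], delta_lips[task]), 1)
    for task in acoustic
}
print(reductions)
```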

| Task | Pure Lipreading-Error |
| --- | --- |
| Bar-Tender | 4.5 % |

Table 2: Pure Lipreading (1 Speaker, "Bar-Task": 4 Cocktail-Names)
22/25
Related Work
Related Linear Representations:
Kirby et al., Pentland et al.: Linear Subspaces
Simard: Tangent Distance
Related Nonlinear Representation:
Jacobs, Jordan, Nowlan, Hinton: Mixture of Experts
Kambhatla, Leen: Local PCA
Related Interpolation Techniques:
Poggio et al.: RBFs for Image Interpolation
23/25
Computer Lipreading History
- U.S.Patent 3192321 (1965) E. Nassimbene (IBM)
- E. Petajan (1984), Dissertation
- B. Yuhas, M. Goldstein, T. Sejnowski (1989)
- K. Mase, A. Pentland (1991)
- O. Garcia, A. Goldschen, E. Petajan (1992)
- D. Stork, G. Wolff, K. V. Prasad, M. Hennecke (1992)
- P.L. Silsbee (1993), Dissertation
- A.J. Goldschen (1993), Dissertation
- J. Movellan (1994)
24/25
Summary
Lipreading:
- Improves Speech Recognition Performance significantly
- Full System Solution
Manifold Learning:
- New fundamental technique for learning in geometric domains
- Applicable to variety of different tasks
25/25