Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

Mark Skowronski and John Harris, Computational Neuro-Engineering Lab, University of Florida


Page 1: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

Mark Skowronski and John Harris

Computational Neuro-Engineering Lab

University of Florida

Page 2: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

Automatic Speech Recognition Using an Echo State Network

Mark Skowronski and John Harris

Computational Neuro-Engineering Lab

University of Florida

Page 3: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

Transformation of a graduate student

[Photos labeled 2000 and 2006]

Page 4: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

Motivation: Man vs. Machine

Wall Street Journal/Broadcast news readings, 5000 words

Untrained human listeners vs. Cambridge HTK LVCSR system

Page 5: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

Overview

• Why is ASR so poor?

• Hidden Markov Model (HMM)

• Echo state network (ESN)

• ESN applied to speech

• Conclusions

Page 6: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

ASR State of the Art

• Feature extraction: MFCC vs. HFCC*

• Acoustic pattern rec: HMM

• Language models

*Skowronski & Harris. JASA, (3):1774–1780, 2004.

[Figure: feature extraction, from the frequency axis to coefficients m1 ... m6]

Page 7: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

Hidden Markov Model

Premier stochastic model of non-stationary time series used for decision making.

Assumptions:

1) Speech is a piecewise-stationary process.

2) Features are independent.

3) State duration is exponentially distributed.

4) State transition probability is a function of the previous and next state only.
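Assumption 3 can be made concrete with a short sketch. It assumes a discrete-time HMM state with a fixed self-transition probability (the name `a_self` and the value 0.8 are hypothetical, chosen only for illustration). Under assumption 4, the chance of staying exactly d frames is a^(d-1)(1-a), which decays exponentially with d.

```python
import numpy as np

def state_duration_pmf(a_self, max_d):
    """P(d) = a_self**(d-1) * (1 - a_self) for d = 1..max_d (geometric law)."""
    d = np.arange(1, max_d + 1)
    return a_self ** (d - 1) * (1.0 - a_self)

# Illustrative value only: a state that repeats with probability 0.8.
pmf = state_duration_pmf(0.8, 50)

# Expected number of frames spent in the state for a geometric law.
mean_duration = 1.0 / (1.0 - 0.8)
```

The most likely duration is always a single frame, whatever the self-transition probability, which is a poor fit for speech sounds that have a typical nonzero length.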

Page 8: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

ASR Example

• Isolated English digits “zero” - “nine” from TI46: 8 male, 8 female, 26 utterances each, fs=12.5 kHz.

• 10 word models, various #states and #gaussians/state.

• Features: 13 HFCC, 100 fps, Hamming window, pre-emphasis (α=0.95), CMS, Δ+ΔΔ (±4 frames)

• Pre-processing: zero-mean and whitening transform

• M1/F1: testing; M2/F2: validation; M3-M8/F3-F8: training

• Test: corrupted by additive noise from “real” sources (subway, babble, car, exhibition hall, restaurant, street, airport terminal, train station)
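The feature bullet above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 13 static coefficients per frame are already computed (HFCC extraction itself is not shown), it assumes the common regression formula for Δ over ±4 frames with edge frames replicated, and it assumes ΔΔ is the Δ of Δ. The whitening transform is also left out. All function names are hypothetical.

```python
import numpy as np

def pre_emphasis(signal, alpha=0.95):
    """y[n] = x[n] - alpha * x[n-1]; the first sample is passed through."""
    signal = np.asarray(signal, dtype=float)
    return np.concatenate(([signal[0]], signal[1:] - alpha * signal[:-1]))

def cepstral_mean_subtraction(feats):
    """Subtract the per-utterance mean of each coefficient (CMS)."""
    return feats - feats.mean(axis=0, keepdims=True)

def delta(feats, half_width=4):
    """Regression delta over +/- half_width frames (assumed formula)."""
    n_frames = feats.shape[0]
    padded = np.pad(feats, ((half_width, half_width), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, half_width + 1))
    out = np.zeros(feats.shape, dtype=float)
    for k in range(1, half_width + 1):
        ahead = padded[half_width + k: half_width + k + n_frames]
        behind = padded[half_width - k: half_width - k + n_frames]
        out += k * (ahead - behind)
    return out / denom

def stack_features(static):
    """Static + delta + delta-delta: 13 coefficients become 39 per frame."""
    static = cepstral_mean_subtraction(np.asarray(static, dtype=float))
    d1 = delta(static)
    d2 = delta(d1)
    return np.hstack([static, d1, d2])

# Usage with random stand-in data: 100 frames of 13 coefficients.
frames = stack_features(np.random.default_rng(0).standard_normal((100, 13)))
```

At 100 frames per second, a ±4 frame window spans 90 ms of speech around each frame.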

Page 9: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

HMM Test Results

Page 10: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

Overcoming the limitations of HMMs

• HMMs do not take advantage of the dynamics of speech

• Well known HMM limitations include:

– Only the present state affects transition probabilities

– Successive observations are independent

– Assumes static density models

Need an architecture that better captures the dynamics of speech

Page 11: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

Echo State Network

Recurrent neural network proposed by Jaeger (2001).

[Diagram: input, Win, reservoir W, linear mapper Wout, output; labels dx and dy]

Random untrained input weights (Win).

Recurrent “reservoir” of nonlinear processing elements with random untrained weights (W).

Linear readout (linear mapper), easily trained weights (Wout).

Note similarities to Liquid State Machine.

Page 12: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

ESN Diagram & Equations

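$$\mathbf{x}(n) = f\left(\mathbf{W}^{in}\mathbf{u}(n) + \mathbf{W}\mathbf{x}(n-1)\right)$$

$$\mathbf{y}(n) = \mathbf{W}^{out}\mathbf{x}(n)$$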
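The state update on this slide, x(n) = f(Win u(n) + W x(n-1)), can be sketched in a few lines. This is a minimal illustration under stated assumptions: f is taken to be tanh, the reservoir starts at zero, W is drawn from a Gaussian and rescaled to a chosen spectral radius, and Win is uniform. The names `spectral_radius` and `input_scale` and their default values are hypothetical, not taken from the slides.

```python
import numpy as np

def make_reservoir(n_inputs, n_units, spectral_radius=0.9,
                   input_scale=0.1, seed=0):
    """Draw random, untrained W and W_in (illustrative distributions)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_units, n_units))
    # Rescale so the largest eigenvalue magnitude equals spectral_radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = input_scale * rng.uniform(-1.0, 1.0, size=(n_units, n_inputs))
    return W, W_in

def run_reservoir(W, W_in, inputs, f=np.tanh):
    """Apply x(n) = f(W_in u(n) + W x(n-1)) with x(-1) = 0.

    inputs has shape (T, n_inputs); the result has shape (T, n_units).
    """
    x = np.zeros(W.shape[0])
    states = np.empty((inputs.shape[0], W.shape[0]))
    for n, u in enumerate(inputs):
        x = f(W_in @ u + W @ x)
        states[n] = x
    return states

# Usage: 39-dimensional input frames driving 60 processing elements.
W, W_in = make_reservoir(n_inputs=39, n_units=60)
u = np.random.default_rng(1).standard_normal((5, 39))
X = run_reservoir(W, W_in, u)
```

Only the readout applied to these states is trained; W and W_in stay fixed after they are drawn.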

Page 13: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

How to classify with predictors

Build 10 word models that are trained to predict the future of each of the 10 digits

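[Diagram: a unit delay (Z-1) feeds a bank of predictors labeled 0, 1, 2, ..., 8, 9, which decide the unknown class “?”]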

The best predictor determines the class

Not a new idea!
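The decision rule on this slide can be sketched as below. It is a simplified illustration: each word model is reduced to a hypothetical callable that maps past frames to a prediction of the following frames, whereas in the talk each model is an ESN with trained readouts. The callable must be causal, so the prediction for frame n+1 may only use frames up to n.

```python
import numpy as np

def classify_by_prediction(frames, predictors):
    """Return the label whose predictor has the lowest mean squared error.

    frames: array of shape (T, D).
    predictors: dict mapping label -> callable taking frames[:-1] and
    returning predictions for frames[1:] (hypothetical interface).
    """
    past, target = frames[:-1], frames[1:]
    errors = {
        label: float(np.mean((predict(past) - target) ** 2))
        for label, predict in predictors.items()
    }
    best = min(errors, key=errors.get)
    return best, errors

# Toy usage: the sequence decays by 0.9 per frame, so model "A" fits it.
sequence = (0.9 ** np.arange(30))[:, None] * np.ones((1, 3))
toy_models = {
    "A": lambda past: 0.9 * past,
    "B": lambda past: -0.5 * past,
}
label, errors = classify_by_prediction(sequence, toy_models)
```

The output of each model is an error rather than a likelihood, so no probabilistic model of the features is needed to make the decision.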

Page 14: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

ESN Training

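• Minimize mean-squared error between y(n) and desired signal d(n).

Wiener solution:

$$\mathbf{W}^{out} = \mathbf{R}^{-1}\mathbf{p} = \left(\sum_n \mathbf{x}(n)\mathbf{x}^T(n)\right)^{-1}\left(\sum_n \mathbf{x}(n)\mathbf{d}^T(n)\right)$$

Here R is the autocorrelation of the reservoir state and p is the cross-correlation between the state and the desired signal. Written this way, the solution is the transpose of the W^out used in the readout y(n) = W^out x(n).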
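A minimal sketch of the Wiener solution for the readout follows, assuming the reservoir states and desired signals are stacked row-wise over time. The `ridge` term is an added assumption for numerical safety and is not part of the slide; its default of zero gives the plain solution. The function returns the weights arranged so that y(n) = W_out x(n).

```python
import numpy as np

def wiener_readout(states, desired, ridge=0.0):
    """Solve R W = p for the linear readout.

    states: (T, M) reservoir states x(n), one row per frame.
    desired: (T, D) desired signals d(n), one row per frame.
    Returns W_out with shape (D, M), so that y(n) = W_out @ x(n).
    """
    R = states.T @ states            # sum over n of x(n) x(n)^T
    p = states.T @ desired           # sum over n of x(n) d(n)^T
    R = R + ridge * np.eye(R.shape[0])   # optional, assumed regularizer
    return np.linalg.solve(R, p).T

# Usage: recover a known linear map from noiseless data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
W_true = rng.standard_normal((3, 5))
D = X @ W_true.T
W_out = wiener_readout(X, D)
```

Solving the linear system directly avoids forming the explicit inverse of R, which is the usual numerical preference.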

Page 15: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

Multiple Readout Filters

• Need good predictors for separation of classes

• One linear filter will give mediocre prediction.

• Question: how to divide reservoir space and use multiple readout filters?

• Answer: competitive network of filters

$$\mathbf{y}_k(n) = \mathbf{W}^{out}_k\mathbf{x}(n), \quad k \in [1, K]$$

• Question: how to train/test competitive network of K filters?

• Answer: mimic HMM.
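The competition among K readout filters can be sketched as a per-frame winner-take-all selection. This is a simplified illustration: it only picks, for each frame, the filter with the smallest squared error, and it ignores the HMM-like state sequencing the slide alludes to. The function name and array layout are hypothetical.

```python
import numpy as np

def winner_take_all(states, desired, readouts):
    """Pick the best of K readout filters independently for each frame.

    states: (T, M) reservoir states.
    desired: (T, D) desired signals.
    readouts: (K, D, M) stack of readout matrices W_out_k.
    Returns the winning filter index per frame and the squared error of
    the winners (summed over dimensions, averaged over frames).
    """
    # predictions[k, t] = readouts[k] @ states[t]
    predictions = np.einsum("kdm,tm->ktd", readouts, states)
    sq_err = np.sum((predictions - desired[None]) ** 2, axis=2)  # (K, T)
    winners = np.argmin(sq_err, axis=0)
    mean_err = float(np.mean(sq_err[winners, np.arange(winners.size)]))
    return winners, mean_err

# Usage: the first 25 frames follow filter 0, the rest follow filter 1.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
filters = rng.standard_normal((2, 3, 4))
D = np.vstack([X[:25] @ filters[0].T, X[25:] @ filters[1].T])
winners, mean_err = winner_take_all(X, D, filters)
```

In training, the frames won by each filter could be used to refit that filter, much as segmental K-means reassigns frames to states, though that loop is not shown here.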

Page 16: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

ASR Example

• Same spoken digit experiment as before.

• ESN: M=60 PEs, r=2.0, rin=0.1, 10 word models, various #states and #filters/state.

• Identical pre-processing and input features

• Desired signal: next frame of 39-dimension features

Page 17: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

ESN Results

Page 18: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

ESN/HMM Comparison

Page 19: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

Conclusions

• ESN classifies by predicting

• Multiple filters mimic sequential nature of HMMs

• ESN classifier noise robust compared to HMM:

– Average over all sources, 0-20 dB SNR: +21 percentage points

– Average over all sources: +9 dB SNR

• ESN reservoir provides a dynamical model of the history of the speech.

Questions?

Page 20: Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model

HMM vs. ESN Classifier

                     HMM                               ESN Classifier
Output               Likelihood                        MSE
Architecture         States, left-to-right             States, left-to-right
Minimum element      Gaussian kernel                   Readout filter
Elements combined    GMM                               Winner-take-all
Transitions          State transition matrix           Binary switching matrix
Training             Segmental K-means (Baum-Welch)    Segmental K-means
Discriminatory       No                                Maybe, depends on desired signal