Neural Network Acoustic Modelling Across the Decades

Steve Renals, University of Edinburgh

[email protected]

ICSI, 14 March 2015


★NN, BDNN, CDNN, …, DNN: Decades of Neural Networks

So how was it in the early 90s?

• Big neural networks trained as acoustic classifiers

• Use an HMM for (limited) sequence processing

• RNNs starting to look attractive as large-scale acoustic models

• Worrying about speaker adaptation, modelling acoustic context, robustness, …

• Use vector processors to train networks quickly
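The HMM bullet above refers to the hybrid HMM/NN setup, in which a network trained as a frame classifier supplies the HMM's emission scores. A common way to do this is to divide the network's state posteriors by the state priors, giving scaled likelihoods. The sketch below shows only that conversion; the function name `scaled_log_likelihoods`, the flooring constant, and all the numbers are assumptions for illustration and do not come from the talk.

```python
import math

def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Convert NN state posteriors P(q|x) into scaled log-likelihoods.

    By Bayes' rule p(x|q) / p(x) = P(q|x) / P(q), so dividing by the
    state prior gives a score usable as an HMM emission term.
    """
    return [
        math.log(max(post, floor)) - math.log(max(prior, floor))
        for post, prior in zip(posteriors, priors)
    ]

# Hypothetical 3-state example: network output for one frame, and
# state priors estimated from training-set state frequencies.
posteriors = [0.7, 0.2, 0.1]
priors = [0.5, 0.3, 0.2]
scores = scaled_log_likelihoods(posteriors, priors)
```

States whose posterior exceeds their prior get a positive score, and the rest get a negative one. The flooring only guards against taking the log of zero.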

If Moore’s Law had applied to this, then by 2015: 16 billion weights, 96 billion examples


Not much has changed?


Context-dependent acoustic modelling

• Then
  • largely context-independent NNs
  • some context-dependence

• Now
  • large scale context-dependent NNs – CD classes typically (not always) derived from HMM/GMM decision tree

Context-dependent acoustic modelling (then)

Context-dependent acoustic modelling (now)

• ~6000 CD phone outputs

• 3-8 hidden layers, ~2000 hidden units each

• 9x39 MFCC/LogSpec inputs
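To get a feel for the scale of the network described above, the parameter count can be worked out from the layer sizes. This is a rough sketch under stated assumptions: the layers are taken to be fully connected, the 9x39 input is read as 9 frames of 39 coefficients flattened into one vector, and 6 hidden layers is an arbitrary point in the 3-8 range. The helper `count_parameters` is hypothetical.

```python
def count_parameters(input_dim, hidden_dim, num_hidden_layers, output_dim):
    """Count weights and biases in a fully connected feed-forward network."""
    layer_sizes = [input_dim] + [hidden_dim] * num_hidden_layers + [output_dim]
    # One weight matrix between each pair of adjacent layers.
    weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    # One bias per unit in every non-input layer.
    biases = sum(layer_sizes[1:])
    return weights, biases

# Sizes read off the slide; 6 hidden layers is an assumed choice.
input_dim = 9 * 39  # 351 inputs
weights, biases = count_parameters(input_dim, 2000, 6, 6000)
print(weights, biases)  # 32702000 18000
```

Under these assumptions the output layer alone accounts for 12 million of the roughly 33 million weights, which is one reason the number of CD targets matters.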


Why model context-dependence?

• Divide and conquer the acoustic modelling space, if enough training data

• Context-sensitive adaptation of phone models to local acoustic/phonetic context – enhances discrimination of acoustically confusable phones

More important for GMM-based generative models than neural networks? Neural networks focus on class boundaries rather than within-class structure


Drawbacks of context-dependent phone models in neural networks

• No distinction between different phones and the same phone in different contexts
  • leads to hidden units learning discriminations that are not useful
  • discriminations depend on particular state clustering

• Data sparsity

• How to avoid these problems?
  • factorised CDNN
  • sequence-level discriminative training


Another way? Multitask CD and CI

• Multitask learning – learning CD and CI phones together

• Interpret as regulariser? Compare with pretraining
  • unsupervised RBM
  • discriminative monophone pretraining (cf. curriculum learning)

• Can combine pretraining with multitask learning

[Diagram: acoustic features feeding two sets of output targets, 6000 CD targets and 41 monophone targets]

Joint work with Peter Bell
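One way to picture the multitask setup on this slide is a network with shared hidden layers and two softmax output layers, trained on the sum of the two cross-entropy losses. The sketch below is a toy version under several assumptions the slide does not state: a single shared tanh hidden layer, equal weighting of the two tasks (`ci_weight=1.0`), and tiny layer sizes in place of the 6000 CD and 41 monophone targets. All function and parameter names are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, bias, inputs):
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def random_layer(rng, n_out, n_in):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def multitask_loss(features, cd_target, ci_target, params, ci_weight=1.0):
    """Cross-entropy on the CD target plus weighted cross-entropy on the CI target."""
    # Both output layers read the same shared hidden representation.
    hidden = [math.tanh(v) for v in linear(*params["shared"], features)]
    cd_post = softmax(linear(*params["cd_head"], hidden))
    ci_post = softmax(linear(*params["ci_head"], hidden))
    cd_loss = -math.log(cd_post[cd_target])
    ci_loss = -math.log(ci_post[ci_target])
    return cd_loss + ci_weight * ci_loss, cd_loss, ci_loss

rng = random.Random(0)
n_in, n_hidden, n_cd, n_ci = 8, 16, 60, 41  # toy sizes, not the slide's 6000 CD targets
params = {
    "shared": random_layer(rng, n_hidden, n_in),
    "cd_head": random_layer(rng, n_cd, n_hidden),
    "ci_head": random_layer(rng, n_ci, n_hidden),
}
features = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
total, cd_loss, ci_loss = multitask_loss(features, cd_target=5, ci_target=3, params=params)
```

Because gradients from both losses flow into the shared layer, the monophone task constrains the hidden representation, which fits the slide's reading of multitask learning as a regulariser. At recognition time only the CD output layer would be needed.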


Experiment

[Chart: WER / % (axis from 11 to 16) for Baseline and Multitask systems, each with no pretrain, RBM pretrain and mono pretrain]

TED dev2010+tst2010+tst2011, speaker adapted PLP features

5% relative improvement
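The improvement above is quoted as a relative figure, which is easy to confuse with an absolute change in WER. The short sketch below shows the arithmetic only. The two WER values are hypothetical, chosen to sit within the chart's axis range, and are not results from the talk.

```python
def relative_wer_improvement(baseline_wer, new_wer):
    """Relative WER reduction, as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical WERs (in %) chosen only to illustrate the arithmetic.
baseline_wer = 14.0
multitask_wer = 13.3
improvement = relative_wer_improvement(baseline_wer, multitask_wer)
absolute = baseline_wer - multitask_wer
print(f"{improvement:.1%} relative, {absolute:.1f} absolute")
```

With these illustrative numbers, a 5% relative improvement corresponds to well under one point of absolute WER.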

MLAN features

• Tandem/bottleneck features from OOD NN

• OOD net trained on Switchboard

[Diagram labels: OOD inputs, OOD CD targets, in-domain inputs, 6000 CD targets, 41 monophone targets]
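The tandem/bottleneck idea above can be sketched as feature augmentation: a network trained on out-of-domain (OOD) data is run over the in-domain frames, and its output is appended to the in-domain features before the in-domain network is trained. The sketch assumes simple per-frame concatenation. `toy_bottleneck` is a hypothetical stand-in for a real OOD network's bottleneck layer, and it does no neural computation at all.

```python
def append_ood_features(in_domain_frames, ood_feature_extractor):
    """Concatenate each in-domain frame with OOD-network features for that frame."""
    return [frame + ood_feature_extractor(frame) for frame in in_domain_frames]

def toy_bottleneck(frame, dim=2):
    """Stand-in for the bottleneck layer of an OOD-trained network (hypothetical)."""
    # A real extractor would run a forward pass up to the bottleneck layer;
    # here we average chunks of the input to get a low-dimensional vector.
    chunk = len(frame) // dim
    return [sum(frame[i * chunk:(i + 1) * chunk]) / chunk for i in range(dim)]

# Two toy 4-dimensional in-domain frames.
frames = [[0.1, 0.2, 0.3, 0.4], [1.0, 1.0, 2.0, 2.0]]
augmented = append_ood_features(frames, toy_bottleneck)
```

Each augmented frame is longer than the original by the bottleneck dimension. The in-domain network, with its CD and monophone targets, would then be trained on these augmented frames.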


Experiment

[Chart: WER / % (axis from 11 to 16) for Baseline and Multitask systems, each with no pretrain and mono pretrain]

TED dev2010+tst2010+tst2011, speaker adapted PLP + MLAN features

3% relative improvement


Practical Optimism

[Network diagram: 9x39 MFCC/LogSpec inputs; 3-8 hidden layers, ~2000 hidden units each; ~6000 CD phone outputs]

• WEIGHT SHARING – adaptation, CNNs

• ACTIVATION FUNCTIONS – pooling, ReLU, gated units

• ACOUSTIC INPUT – learning features?

• TRAINING – objective function, optimisation

• ARCHITECTURES – convolutional, recurrent


Closing thoughts

• Morgan (and Hervé) set the framework within which all the current HMM/NN acoustic modelling stuff resides

• What a good idea to build a supercomputer to do NN training at scale 25 years ago

• And quite a few of the things that seem to have been forgotten are worth remembering

Thanks!