Neural Network Acoustic Modelling Across the Decades


Steve Renals, University of Edinburgh
s.renals@ed.ac.uk

ICSI, 14 March 2015


So how was it in the early 90s?

• Big neural networks trained as acoustic classifiers

• Use an HMM for (limited) sequence processing

• RNNs starting to look attractive as large-scale acoustic models

• Worrying about speaker adaptation, modelling acoustic context, robustness, …

• Use vector processors to train networks quickly

If Moore's Law had applied to this, then by 2015: 16 billion weights, 96 billion examples
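To unpack the arithmetic (my back-calculation, not spelled out on the slide): doubling every 18 months from the early 90s (~1994) to 2015 is roughly 14 doublings, which puts the implied early-90s system at around a million weights and six million training examples:

    % Moore's-law back-of-envelope, assuming 14 doublings
    % (~21 years at one doubling per 18 months)
    \[
      2^{14} \approx 16\,000, \qquad
      \frac{16 \times 10^{9}\ \text{weights}}{2^{14}} \approx 10^{6}, \qquad
      \frac{96 \times 10^{9}\ \text{examples}}{2^{14}} \approx 6 \times 10^{6}.
    \]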

Not much has changed?

Context-dependent acoustic modelling

• Then
  • largely context-independent NNs
  • some context-dependence

• Now
  • large-scale context-dependent NNs – CD classes typically (not always) derived from the HMM/GMM decision tree (illustrated below)
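A toy illustration of what "CD classes derived from the HMM/GMM decision tree" means in practice. The triphone names and state numbers here are hypothetical, and real systems use tree-based clustering tools rather than a hand-written table:

    # Hypothetical tree leaves: logical triphone states mapped to tied states.
    # The NN classifies the tied state (the CD class), not the raw triphone.
    tied_states = {
        ("k-ae+t", 0): 1412,   # state 0 of triphone k-ae+t
        ("k-ae+p", 0): 1412,   # clustered onto the same tree leaf
        ("b-ae+t", 0): 2087,
    }

    def nn_target(triphone, state):
        return tied_states[(triphone, state)]

    assert nn_target("k-ae+t", 0) == nn_target("k-ae+p", 0)  # shared CD class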

Context-dependent acoustic modelling (then)

Context-dependent acoustic modelling (now)

[Figure: 9x39 MFCC/LogSpec input → 3-8 hidden layers of ~2000 hidden units each → ~6000 CD phone outputs]
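As a concrete sketch of that stack, assuming PyTorch; the layer sizes come from the slide, but the sigmoid activations, layer count, and all identifiers are illustrative choices:

    import torch
    import torch.nn as nn

    class HybridDNN(nn.Module):
        """Hybrid NN/HMM acoustic model as sketched on the slide:
        9 spliced frames of 39-dim MFCCs in, ~6000 tied CD states out."""
        def __init__(self, n_frames=9, feat_dim=39, n_hidden=2000,
                     n_layers=5, n_cd_states=6000):
            super().__init__()
            layers, width = [], n_frames * feat_dim   # 9 x 39 = 351 inputs
            for _ in range(n_layers):                 # 3-8 hidden layers
                layers += [nn.Linear(width, n_hidden), nn.Sigmoid()]
                width = n_hidden
            self.hidden = nn.Sequential(*layers)
            self.out = nn.Linear(width, n_cd_states)  # CD phone outputs

        def forward(self, x):
            # log-softmax gives the scaled log scores the HMM decoder consumes
            return torch.log_softmax(self.out(self.hidden(x)), dim=-1)

    model = HybridDNN()
    frames = torch.randn(32, 9 * 39)   # a minibatch of spliced frames
    log_probs = model(frames)          # (32, 6000) CD state log posteriors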

Why model context-dependence?

• Divide and conquer the acoustic modelling space, if there is enough training data

• Context-sensitive adaptation of phone models to the local acoustic/phonetic context – enhances discrimination of acoustically confusable phones

More important for GMM-based generative models than for neural networks? Neural networks focus on class boundaries rather than within-class structure.

Drawbacks of context-dependent phone models in neural networks

• No distinction between different phones and the same phone in different contexts
  • leads to hidden units learning discriminations that are not useful
  • discriminations depend on the particular state clustering

• Data sparsity

• How to avoid these problems?
  • factorised CDNN
  • sequence-level discriminative training

Another way? Multitask CD and CI

• Multitask learning – learning CD and CI phones together

• Interpret as a regulariser? Compare with pretraining
  • unsupervised RBM
  • discriminative monophone pretraining (cf. curriculum learning)

• Can combine pretraining with multitask learning

[Figure: a shared network over the acoustic features with two output layers – 6000 CD targets and 41 monophone targets (sketched in code below)]

Joint work with Peter Bell
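A minimal sketch of that two-headed network, again assuming PyTorch; the loss weight lam and every identifier are illustrative assumptions, not the exact recipe:

    import torch
    import torch.nn as nn

    class MultitaskDNN(nn.Module):
        """Shared hidden layers with two softmax heads:
        6000 CD tied-state targets and 41 monophone (CI) targets."""
        def __init__(self, in_dim=9*39, n_hidden=2000, n_layers=5,
                     n_cd=6000, n_mono=41):
            super().__init__()
            layers, width = [], in_dim
            for _ in range(n_layers):
                layers += [nn.Linear(width, n_hidden), nn.Sigmoid()]
                width = n_hidden
            self.trunk = nn.Sequential(*layers)
            self.cd_head = nn.Linear(width, n_cd)      # CD targets
            self.mono_head = nn.Linear(width, n_mono)  # CI targets

        def forward(self, x):
            h = self.trunk(x)
            return self.cd_head(h), self.mono_head(h)

    # Joint cross-entropy loss; lam trades off the auxiliary CI task.
    def multitask_loss(cd_logits, mono_logits, cd_y, mono_y, lam=0.5):
        ce = nn.functional.cross_entropy
        return ce(cd_logits, cd_y) + lam * ce(mono_logits, mono_y)

    model = MultitaskDNN()
    cd_logits, mono_logits = model(torch.randn(32, 9 * 39))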

Experiment

[Figure: bar chart of WER / % (y-axis 11-16), TED dev2010+tst2010+tst2011, speaker-adapted PLP features; Baseline vs Multitask bars for no pretraining, RBM pretraining, and monophone pretraining. Annotation: 5% relative improvement.]

MLAN features

• Tandem/bottleneck features from an OOD NN

• OOD net trained on Switchboard

[Figure: the OOD net maps OOD inputs to OOD CD targets; its features feed, together with the in-domain inputs, into the multitask net with 6000 CD targets and 41 monophone targets]
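A sketch of the MLAN feature pipeline under stated assumptions: the bottleneck width (40 here) and all names are hypothetical, and in practice the OOD net would be trained on Switchboard and then frozen before feature extraction:

    import torch
    import torch.nn as nn

    class OODNet(nn.Module):
        """Out-of-domain net whose bottleneck layer supplies tandem features."""
        def __init__(self, in_dim=9*39, n_hidden=2000, n_bottleneck=40, n_cd=6000):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(in_dim, n_hidden), nn.Sigmoid(),
                nn.Linear(n_hidden, n_bottleneck), nn.Sigmoid())  # bottleneck
            self.out = nn.Linear(n_bottleneck, n_cd)  # OOD CD targets (training only)

        def bottleneck(self, x):
            return self.body(x)   # tandem/bottleneck features

    ood = OODNet()  # assume: trained on Switchboard, then frozen
    in_domain = torch.randn(32, 9 * 39)   # in-domain spliced frames
    mlan = torch.cat([in_domain, ood.bottleneck(in_domain)], dim=-1)
    # 'mlan' (32, 351+40) becomes the input to the in-domain multitask DNN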

Experiment

[Figure: bar chart of WER / % (y-axis 11-16), TED dev2010+tst2010+tst2011, speaker-adapted PLP + MLAN features; Baseline vs Multitask bars for no pretraining and monophone pretraining. Annotation: 3% relative improvement.]

Practical Optimism

[Figure: the same 9x39 MFCC/LogSpec → 3-8 hidden layers of ~2000 hidden units → ~6000 CD phone outputs stack, annotated with directions to explore:]

• WEIGHT SHARING – adaptation, CNNs
• ACTIVATION FUNCTIONS – pooling, ReLU, gated units
• ACOUSTIC INPUT – learning features?
• TRAINING – objective function, optimisation
• ARCHITECTURES – convolutional, recurrent
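To make two of those directions concrete (my illustration, not from the talk): a convolutional front end over the 9x39 input with ReLU activations and frequency pooling, which could replace the first fully connected layer of the sketches above:

    import torch
    import torch.nn as nn

    # Illustrative only: convolution over the 9x39 time-frequency patch,
    # ReLU activations, and max-pooling along the frequency axis.
    conv_front_end = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=(9, 8)),  # 64 filters spanning all 9 frames
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=(1, 2)),      # pool along frequency only
        nn.Flatten(),
    )

    x = torch.randn(32, 1, 9, 39)    # minibatch of log-spectral patches
    print(conv_front_end(x).shape)   # torch.Size([32, 1024])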

Closing thoughts

• Morgan (and Hervé) set the framework within which all the current HMM/NN acoustic modelling stuff resides

• What a good idea to build a supercomputer to do NN training at scale 25 years ago

• And quite a few things that seem to have been forgotten are worth remembering

Thanks!