DT2118 Speech and Speaker Recognition - Language Modelling · 2016-05-03 · \ice cream" vs \I...
DT2118 Speech and Speaker Recognition
Language Modelling
Giampiero Salvi
KTH/CSC/TMH [email protected]
VT 2015
Outline
Introduction
Formal Language Theory
Stochastic Language Models (SLM)
  N-gram Language Models
  N-gram Smoothing
  Class N-grams
  Adaptive Language Models
Language Model Evaluation
Components of ASR System
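[Figure: block diagram of an ASR system. Signal path: Speech Signal, Spectral Analysis, Feature Extraction, Search and Match, Recognised Words. Knowledge sources: Acoustic Models, Lexical Models, Language Models. Stage labels: Representation, Constraints - Knowledge, Decoder. Language Models is the component covered here.]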
Why do we need language models?
Bayes’ rule:
$$P(\text{words} \mid \text{sounds}) = \frac{P(\text{sounds} \mid \text{words})\,P(\text{words})}{P(\text{sounds})}$$
where P(words) is the a priori probability of the words (the Language Model).
We could use non-informative priors (P(words) = 1/N), but...
Branching Factor
- if we have N words in the dictionary
- at every word boundary we have to consider N equally likely alternatives
- N can be in the order of millions
[Figure: a word node branching to word1, word2, ..., wordN]
Ambiguity
“ice cream” vs “I scream”
/aɪ s k ɹ iː m/
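To make the role of the prior concrete, here is a minimal sketch of how Bayes' rule picks between the two readings. All probabilities are hypothetical numbers invented for illustration, and `best_hypothesis` is a helper name introduced here, not part of any toolkit.

```python
import math

# Hypothetical acoustic scores P(sounds | words): the two readings sound
# almost identical, so the acoustics alone barely separate them.
acoustic_logp = {"ice cream": math.log(0.20), "I scream": math.log(0.22)}

# Hypothetical language-model priors P(words).
lm_logp = {"ice cream": math.log(1e-4), "I scream": math.log(1e-6)}

# Non-informative prior: every hypothesis is equally likely.
uniform_logp = {w: math.log(0.5) for w in acoustic_logp}


def best_hypothesis(acoustic, prior):
    """Return the word sequence maximising P(sounds|words) * P(words).

    P(sounds) is identical for all hypotheses, so it can be dropped.
    """
    return max(acoustic, key=lambda w: acoustic[w] + prior[w])


print(best_hypothesis(acoustic_logp, uniform_logp))  # acoustics decide
print(best_hypothesis(acoustic_logp, lm_logp))       # the prior decides
```

With the uniform prior the tiny acoustic difference decides the outcome; with an informative prior the language model can overrule it.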
Language Models in ASR
We want to:
1. limit the branching factor in the recognition network
2. augment and complete the acoustic probabilities
- we are only interested in knowing whether the sequence of words is grammatically plausible or not
- this kind of grammar is integrated in the recognition network prior to decoding
Language Models in Dialogue Systems
- we want to assign a class to each word (noun, verb, attribute... parts of speech)
- parsing is usually performed on the output of a speech recogniser
The grammar is used twice in a Dialogue System!
Language Models in ASR
- small vocabulary: often formal grammar specified by hand
  - example: loop of digits as in the HTK exercise
- large vocabulary: often stochastic grammar estimated from data
Formal Language Theory
grammar: formal specification of permissible structures for the language
parser: algorithm that can analyse a sentence and determine if its structure is compliant with the grammar
Chomsky’s formal grammar
Noam Chomsky: linguist, philosopher, . . .
G = (V, T, P, S)
where
V: set of non-terminal constituents
T: set of terminals (lexical items)
P: set of production rules
S: start symbol
Example
S = sentence
V = {NP (noun phrase), NP1, VP (verb phrase), NAME, ADJ, V (verb), N (noun)}
T = {Mary, person, loves, that, ...}
P = {
  S → NP VP
  NP → NAME
  NP → ADJ NP1
  NP1 → N
  VP → V NP
  NAME → Mary
  V → loves
  N → person
  ADJ → that
}
Parse tree for "Mary loves that person":
[S [NP [NAME Mary]] [VP [V loves] [NP [ADJ that] [NP1 [N person]]]]]
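As a rough illustration of what "generating" from the grammar means, the example rules can be stored in a dictionary and expanded exhaustively. The dictionary layout and the `expand` helper are illustrative choices made here. The sketch only terminates because this particular grammar has no recursive rules.

```python
from itertools import product

# Production rules P of the example grammar; keys are non-terminals,
# values are lists of alternative right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["NAME"], ["ADJ", "NP1"]],
    "NP1": [["N"]],
    "VP": [["V", "NP"]],
    "NAME": [["Mary"]],
    "V": [["loves"]],
    "N": [["person"]],
    "ADJ": [["that"]],
}


def expand(symbol):
    """Return every word tuple derivable from symbol (non-recursive grammars only)."""
    if symbol not in RULES:  # terminal symbol: nothing left to rewrite
        return [(symbol,)]
    results = []
    for rhs in RULES[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(expand(s) for s in rhs)):
            results.append(tuple(word for part in parts for word in part))
    return results


sentences = [" ".join(words) for words in expand("S")]
for sentence in sentences:
    print(sentence)
```

Starting from S, the listed rules generate exactly four sentences, one of which is "Mary loves that person".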
Chomsky’s hierarchy
- Greek letters: sequence of terminals or non-terminals
- Upper-case Latin letters: single non-terminal
- Lower-case Latin letters: single terminal

| Types | Constraints | Automata |
|---|---|---|
| Phrase structure grammar | α → β. This is the most general grammar | Turing machine |
| Context-sensitive grammar | length of α ≤ length of β | Linear bounded |
| Context-free grammar | A → β. Equivalent to A → w, A → BC | Push down |
| Regular grammar | A → w, A → wB | Finite-state |

Context-free and regular grammars are used in practice
Are languages context-free?
Mostly true, with exceptions
Swiss German: “... das mer d’chind em Hans es huus lönd hälfe aastriiche”
Word-by-word: “... that we the children Hans the house let help paint”
Translation: “... that we let the children help Hans paint the house”
Parsers
- assign each word in a sentence to a part of speech
- originally developed for programming languages (no ambiguities)
- only available for context-free and regular grammars
- top-down: start with S and generate rules until you reach the words (terminal symbols)
- bottom-up: start with the words and work your way up until you reach S
Example: Top-down parser
| Parts of speech | Rules |
|---|---|
| S | |
| NP VP | S → NP VP |
| NAME VP | NP → NAME |
| Mary VP | NAME → Mary |
| Mary V NP | VP → V NP |
| Mary loves NP | V → loves |
| Mary loves ADJ NP1 | NP → ADJ NP1 |
| Mary loves that NP1 | ADJ → that |
| Mary loves that N | NP1 → N |
| Mary loves that person | N → person |
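The trace above can be reproduced with a small backtracking recogniser that always rewrites the leftmost non-terminal. This is a minimal sketch, not a production parser. The `derive` function and the `RULES` layout are names introduced here, and the length-based pruning assumes a grammar without empty productions, as in the example.

```python
# Example grammar; alternatives are tried in the order listed.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["NAME"], ["ADJ", "NP1"]],
    "NP1": [["N"]],
    "VP": [["V", "NP"]],
    "NAME": [["Mary"]],
    "V": [["loves"]],
    "N": [["person"]],
    "ADJ": [["that"]],
}


def derive(form, words):
    """Top-down search for a leftmost derivation of words from form.

    Returns the list of (lhs, rhs) rules applied, or None if there is none.
    """
    # Terminals at the front of the form must match the input words.
    i = 0
    while i < len(form) and form[i] not in RULES:
        if i >= len(words) or form[i] != words[i]:
            return None
        i += 1
    if i == len(form):  # only terminals left
        return [] if len(form) == len(words) else None
    if len(form) > len(words):  # every symbol yields at least one word
        return None
    head = form[i]  # leftmost non-terminal
    for rhs in RULES[head]:
        rest = derive(form[:i] + rhs + form[i + 1:], words)
        if rest is not None:
            return [(head, rhs)] + rest
    return None  # no alternative worked: backtrack


steps = derive(["S"], "Mary loves that person".split())
for lhs, rhs in steps:
    print(lhs, "->", " ".join(rhs))
```

Note how the search first predicts NP → NAME for the object and has to backtrack when "that" does not match. This is the "predicts constituents that do not have a match in the text" behaviour of top-down parsing.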
Example: Bottom-up parser
| Parts of speech | Rules |
|---|---|
| Mary loves that person | |
| NAME loves that person | NAME → Mary |
| NAME V that person | V → loves |
| NAME V ADJ person | ADJ → that |
| NAME V ADJ N | N → person |
| NP V ADJ N | NP → NAME |
| NP V ADJ NP1 | NP1 → N |
| NP V NP | NP → ADJ NP1 |
| NP VP | VP → V NP |
| S | S → NP VP |
Top-down vs bottom-up parsers
- Top-down characteristics:
  + very predictive
  + only consider grammatical combinations
  – predict constituents that do not have a match in the text
- Bottom-up characteristics:
  + check input text only once
  + suitable for robust language processing
  – may build trees that do not lead to full parse
- All in all, similar performance
Chart parsing (dynamic programming)
[Figure: chart for the sentence "Mary loves that person", filled in one word at a time. A ° marks how far a rule has been matched.
- Mary: Name → Mary [1,1]; NP → Name; S → NP ° VP
- loves: V → loves [2,2]; VP → V ° NP
- that: ADJ → that; NP → ADJ ° NP1; S → NP ° VP
- person: N → person; NP1 → N
- completed edges: NP → ADJ NP1; VP → V NP; S → NP VP]
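The dynamic-programming idea can be sketched with a CYK-style bottom-up chart, which is a simpler relative of the dotted-rule chart in the figure rather than the same algorithm. Each cell stores the symbols that cover a span of words, so no span is analysed twice. `chart_parse` and the `RULES` layout are names introduced here. The sketch assumes rules with one or two symbols on the right-hand side, which holds for the example grammar.

```python
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["NAME"], ["ADJ", "NP1"]],
    "NP1": [["N"]],
    "VP": [["V", "NP"]],
    "NAME": [["Mary"]],
    "V": [["loves"]],
    "N": [["person"]],
    "ADJ": [["that"]],
}


def chart_parse(words):
    """Return a chart mapping (i, j) to the set of symbols spanning words[i:j]."""
    n = len(words)
    chart = {(i, j): set() for i in range(n) for j in range(i + 1, n + 1)}
    for i, word in enumerate(words):
        chart[(i, i + 1)].add(word)
    # Shorter spans are completed before the longer spans that reuse them.
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[(i, j)]
            # Binary rules: join two adjacent spans that are already filled in.
            for k in range(i + 1, j):
                for lhs, alternatives in RULES.items():
                    for rhs in alternatives:
                        if (len(rhs) == 2 and rhs[0] in chart[(i, k)]
                                and rhs[1] in chart[(k, j)]):
                            cell.add(lhs)
            # Unary rules: apply repeatedly until the cell stops growing.
            changed = True
            while changed:
                changed = False
                for lhs, alternatives in RULES.items():
                    for rhs in alternatives:
                        if len(rhs) == 1 and rhs[0] in cell and lhs not in cell:
                            cell.add(lhs)
                            changed = True
    return chart


chart = chart_parse("Mary loves that person".split())
print("S" in chart[(0, 4)])
```

The sentence is accepted when the start symbol S appears in the cell that covers the whole input.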
Stochastic Language Models (SLM)
1. formal grammars lack coverage (for general domains)
2. spoken language does not strictly follow the grammar
Model sequences of words statistically:
$$P(W) = P(w_1 w_2 \ldots w_n)$$
Probabilistic Context-free grammars (PCFGs)
Assign probabilities to generative rules:
$$P(A \to \alpha \mid G)$$
Then calculate the probability of generating a word sequence $w_1 w_2 \ldots w_n$ as the probability of the rules necessary to go from S to $w_1 w_2 \ldots w_n$:
$$P(S \Rightarrow w_1 w_2 \ldots w_n \mid G)$$
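One standard way to spell this out, under the usual PCFG assumption that rule applications are independent of each other: a single derivation $d$ that applies the rules $r_1, \ldots, r_K$ has the product of their probabilities, and the probability of the word sequence sums over all derivations that yield it. The symbols $d$, $r_k$ and $D(\cdot)$ are notation introduced here for illustration.

$$P(d \mid G) = \prod_{k=1}^{K} P(r_k \mid G), \qquad P(S \Rightarrow w_1 w_2 \ldots w_n \mid G) = \sum_{d \in D(w_1 \ldots w_n)} P(d \mid G)$$

For "Mary loves that person" the example grammar allows only one derivation, so the sum has a single term: the product of the nine rule probabilities used in that derivation.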
Training PCFGs
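If the corpus is annotated, use the Maximum Likelihood estimate:
$$P(A \to \alpha_j) = \frac{C(A \to \alpha_j)}{\sum_{i=1}^{m} C(A \to \alpha_i)}$$
where $C(\cdot)$ is the number of times a rule is used in the corpus and $\alpha_1, \ldots, \alpha_m$ are the alternative expansions of $A$.
If the corpus is not annotated: inside-outside algorithm (similar to HMM training, forward-backward)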
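For the annotated case, the estimate is just a relative frequency per left-hand side. A minimal sketch follows. The rule counts are hypothetical, as if read off a tiny treebank, and `estimate_rule_probs` is a helper name introduced here.

```python
from collections import defaultdict

# Hypothetical rule counts C(A -> alpha), keyed by (lhs, rhs).
rule_counts = {
    ("S", ("NP", "VP")): 2,
    ("NP", ("NAME",)): 3,
    ("NP", ("ADJ", "NP1")): 1,
}


def estimate_rule_probs(counts):
    """Maximum Likelihood estimate: count of a rule over the total for its lhs."""
    totals = defaultdict(int)
    for (lhs, _rhs), count in counts.items():
        totals[lhs] += count
    return {rule: count / totals[rule[0]] for rule, count in counts.items()}


probs = estimate_rule_probs(rule_counts)
for (lhs, rhs), p in probs.items():
    print(lhs, "->", " ".join(rhs), p)
```

By construction the probabilities of all rules sharing a left-hand side sum to one.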
Independence assumption
[Figure: parse tree of "Mary loves that person": [S [NP [NAME Mary]] [VP [V loves] [NP [ADJ that] [NP1 [N person]]]]]]
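Inside-outside probabilities
Chomsky’s normal forms: $A_i \to A_m A_n$ or $A_i \to w_l$
$$\text{inside}(s, A_i, t) = P(A_i \Rightarrow w_s w_{s+1} \ldots w_t)$$
$$\text{outside}(s, A_i, t) = P(S \Rightarrow w_1 \ldots w_{s-1}\, A_i\, w_{t+1} \ldots w_T)$$
[Figure: S spans the whole sequence $w_1 \ldots w_{s-1}\, w_s \ldots w_t\, w_{t+1} \ldots w_T$; $A_i$ spans the inner part $w_s \ldots w_t$]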
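With rules in the normal form above, the inside probabilities can be computed bottom-up, from short spans to long ones. The recursion below is the standard textbook form, written with the symbols already defined. The split point $r$ is an index introduced here.

$$\text{inside}(s, A_i, s) = P(A_i \to w_s)$$

$$\text{inside}(s, A_i, t) = \sum_{m,n} \sum_{r=s}^{t-1} P(A_i \to A_m A_n)\, \text{inside}(s, A_m, r)\, \text{inside}(r+1, A_n, t), \qquad s < t$$

$$P(S \Rightarrow w_1 \ldots w_T) = \text{inside}(1, S, T)$$

The inner sum tries every way of splitting the span $w_s \ldots w_t$ between the two children, which plays the same role as the sum over predecessor states in the forward algorithm for HMMs.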
Probabilistic Context-free grammars: limitations
- probabilities help sorting alternative explanations, but
- still problem with coverage: the production rules are hand made
$$P(A \to \alpha \mid G)$$
N-gram Language Models
Flat model: no hierarchical structure
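$$\begin{aligned}
P(W) &= P(w_1, w_2, \ldots, w_n) \\
&= P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1, w_2) \cdots P(w_n \mid w_1, w_2, \ldots, w_{n-1}) \\
&= \prod_{i=1}^{n} P(w_i \mid w_1, w_2, \ldots, w_{i-1})
\end{aligned}$$
Approximations:
$$\begin{aligned}
P(w_i \mid w_1, w_2, \ldots, w_{i-1}) &\approx P(w_i) && \text{(Unigram)} \\
P(w_i \mid w_1, w_2, \ldots, w_{i-1}) &\approx P(w_i \mid w_{i-1}) && \text{(Bigram)} \\
P(w_i \mid w_1, w_2, \ldots, w_{i-1}) &\approx P(w_i \mid w_{i-2}, w_{i-1}) && \text{(Trigram)} \\
P(w_i \mid w_1, w_2, \ldots, w_{i-1}) &\approx P(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) && \text{(N-gram)}
\end{aligned}$$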
Example (Bigram)
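$$P(\text{Mary}, \text{loves}, \text{that}, \text{person}) = P(\text{Mary} \mid \langle s \rangle)\,P(\text{loves} \mid \text{Mary})\,P(\text{that} \mid \text{loves})\,P(\text{person} \mid \text{that})\,P(\langle /s \rangle \mid \text{person})$$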
N-gram estimation (Maximum Likelihood)
$$P(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) = \frac{C(\overbrace{w_{i-N+1}, \ldots, w_{i-1}, w_i}^{N})}{C(\underbrace{w_{i-N+1}, \ldots, w_{i-1}}_{N-1})} = \frac{C(w_{i-N+1}, \ldots, w_{i-1}, w_i)}{\sum_{w_i} C(w_{i-N+1}, \ldots, w_{i-1}, w_i)}$$
Problem: data sparseness
N-gram estimation example
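Corpus:
1. John read her book
2. I read a different book
3. John read a book by Mulan
$$P(\text{John} \mid \langle s \rangle) = \frac{C(\langle s \rangle, \text{John})}{C(\langle s \rangle)} = \frac{2}{3}$$
$$P(\text{read} \mid \text{John}) = \frac{C(\text{John}, \text{read})}{C(\text{John})} = \frac{2}{2}$$
$$P(\text{a} \mid \text{read}) = \frac{C(\text{read}, \text{a})}{C(\text{read})} = \frac{2}{3}$$
$$P(\text{book} \mid \text{a}) = \frac{C(\text{a}, \text{book})}{C(\text{a})} = \frac{1}{2}$$
$$P(\langle /s \rangle \mid \text{book}) = \frac{C(\text{book}, \langle /s \rangle)}{C(\text{book})} = \frac{2}{3}$$
$$P(\text{John}, \text{read}, \text{a}, \text{book}) = P(\text{John} \mid \langle s \rangle)\,P(\text{read} \mid \text{John})\,P(\text{a} \mid \text{read})\,P(\text{book} \mid \text{a})\,P(\langle /s \rangle \mid \text{book}) = 0.148$$
$$P(\text{Mulan}, \text{read}, \text{a}, \text{book}) = P(\text{Mulan} \mid \langle s \rangle) \cdots = 0$$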
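The numbers in this example can be reproduced with a short script. This is a minimal sketch of maximum likelihood bigram estimation on the three-sentence corpus; the function names `p_ml` and `sentence_prob` are chosen here for illustration.

```python
from collections import Counter

corpus = [
    "John read her book",
    "I read a different book",
    "John read a book by Mulan",
]

bigram_counts = Counter()   # C(prev, word)
history_counts = Counter()  # C(prev), counted as a bigram history
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[(prev, word)] += 1
        history_counts[prev] += 1

def p_ml(word, prev):
    """Maximum likelihood estimate C(prev, word) / C(prev)."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]

def sentence_prob(sentence):
    """Product of bigram probabilities, including <s> and </s>."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= p_ml(word, prev)
    return prob

print(round(sentence_prob("John read a book"), 3))  # 0.148
print(sentence_prob("Mulan read a book"))           # 0.0
```

The second sentence gets probability zero only because the bigram (<s>, Mulan) never occurs in the corpus, which is the data sparseness problem in miniature.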
N-gram Smoothing
Problem:
- Many perfectly plausible word sequences may have been observed zero or very few times in the training data
- This leads to extremely low probabilities, effectively disabling these word sequences, no matter how strong the acoustic evidence is
Solution: smoothing
- produce more robust probabilities for unseen data at the cost of modelling the training data slightly worse
Simplest Smoothing technique
Instead of the ML estimate
$$P(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) = \frac{C(w_{i-N+1}, \ldots, w_{i-1}, w_i)}{\sum_{w_i} C(w_{i-N+1}, \ldots, w_{i-1}, w_i)}$$
use
$$P(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) = \frac{1 + C(w_{i-N+1}, \ldots, w_{i-1}, w_i)}{\sum_{w_i} \left(1 + C(w_{i-N+1}, \ldots, w_{i-1}, w_i)\right)}$$
- prevents zero probabilities
- but still gives very low probabilities
N-gram simple smoothing example
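Corpus:
1. John read her book
2. I read a different book
3. John read a book by Mulan
The constant 11 in the denominators is the vocabulary size: the 9 distinct words plus the sentence markers $\langle s \rangle$ and $\langle /s \rangle$.
$$P(\text{John} \mid \langle s \rangle) = \frac{1 + C(\langle s \rangle, \text{John})}{11 + C(\langle s \rangle)} = \frac{3}{14}$$
$$P(\text{read} \mid \text{John}) = \frac{1 + C(\text{John}, \text{read})}{11 + C(\text{John})} = \frac{3}{13}$$
. . .
$$P(\text{Mulan} \mid \langle s \rangle) = \frac{1 + C(\langle s \rangle, \text{Mulan})}{11 + C(\langle s \rangle)} = \frac{1}{14}$$
$$P(\text{John}, \text{read}, \text{a}, \text{book}) = P(\text{John} \mid \langle s \rangle)\,P(\text{read} \mid \text{John})\,P(\text{a} \mid \text{read})\,P(\text{book} \mid \text{a})\,P(\langle /s \rangle \mid \text{book}) = 0.00035 \quad (\text{unsmoothed: } 0.148)$$
$$P(\text{Mulan}, \text{read}, \text{a}, \text{book}) = P(\text{Mulan} \mid \langle s \rangle)\,P(\text{read} \mid \text{Mulan})\,P(\text{a} \mid \text{read})\,P(\text{book} \mid \text{a})\,P(\langle /s \rangle \mid \text{book}) = 0.000084 \quad (\text{unsmoothed: } 0)$$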
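The add-one estimates can be checked with the following sketch, which assumes a vocabulary of the 9 distinct corpus words plus <s> and </s> (11 symbols). Under that assumption it reproduces 3/14, 1/14 and the value 0.00035 for the first sentence. For the second sentence it yields about 0.000042 rather than the 0.000084 quoted above; the intermediate factors are not shown in the source, so the factor of two may come from a different counting convention for the history Mulan.

```python
from collections import Counter

corpus = [
    "John read her book",
    "I read a different book",
    "John read a book by Mulan",
]

bigram_counts, history_counts, vocab = Counter(), Counter(), set()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    vocab.update(tokens)
    for prev, word in zip(tokens, tokens[1:]):
        bigram_counts[(prev, word)] += 1
        history_counts[prev] += 1

V = len(vocab)  # 11: nine words plus <s> and </s> (assumed vocabulary)

def p_add_one(word, prev):
    """Add-one estimate (1 + C(prev, word)) / (V + C(prev))."""
    return (1 + bigram_counts[(prev, word)]) / (V + history_counts[prev])

def sentence_prob(sentence):
    """Product of add-one bigram probabilities, including <s> and </s>."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= p_add_one(word, prev)
    return prob

print(f"{sentence_prob('John read a book'):.5f}")   # 0.00035
print(f"{sentence_prob('Mulan read a book'):.6f}")  # 0.000042
```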
Interpolation vs Backoff smoothing
Interpolation models:
- Linear combination with lower order n-grams
- Modifies the probabilities of both nonzero and zero count n-grams
Backoff models:
- Use lower order n-grams when the requested n-gram has zero or very low count in the training data
- Nonzero count n-grams are unchanged
- Discounting: reduce the probability of seen n-grams and distribute it among unseen ones
Interpolation vs Backoff smoothing
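Interpolation models:
$$P_{\text{smooth}}(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) = \lambda\, \overbrace{P_{\text{ML}}(w_i \mid w_{i-N+1}, \ldots, w_{i-1})}^{N} + (1 - \lambda)\, \overbrace{P_{\text{smooth}}(w_i \mid w_{i-N+2}, \ldots, w_{i-1})}^{N-1}$$
Backoff models:
$$P_{\text{smooth}}(w_i \mid w_{i-N+1}, \ldots, w_{i-1}) = \begin{cases} \alpha\, \overbrace{P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})}^{N} & \text{if } C(w_{i-N+1}, \ldots, w_{i-1}, w_i) > 0 \\ \gamma\, \overbrace{P_{\text{smooth}}(w_i \mid w_{i-N+2}, \ldots, w_{i-1})}^{N-1} & \text{if } C(w_{i-N+1}, \ldots, w_{i-1}, w_i) = 0 \end{cases}$$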
Deleted interpolation smoothing
Recursively interpolate with n-grams of lower order: if $\text{history}_n = w_{i-n+1}, \ldots, w_{i-1}$,
$$P_I(w_i \mid \text{history}_n) = \lambda_{\text{history}_n} P(w_i \mid \text{history}_n) + (1 - \lambda_{\text{history}_n})\, P_I(w_i \mid \text{history}_{n-1})$$
- hard to estimate $\lambda_{\text{history}_n}$ for every history
- cluster into a moderate number of weights
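A minimal sketch of the interpolation recursion, limited to one step (bigram interpolated with unigram). It uses a single fixed weight `lam = 0.7`, which is an assumption for illustration: in deleted interpolation the weights depend on the history and are estimated on held-out data. The toy corpus and function names are also illustrative.

```python
from collections import Counter

corpus = [
    "John read her book",
    "I read a different book",
    "John read a book by Mulan",
]

unigrams, bigrams, histories = Counter(), Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens[1:])  # every predicted token, including </s>
    for prev, word in zip(tokens, tokens[1:]):
        bigrams[(prev, word)] += 1
        histories[prev] += 1
total = sum(unigrams.values())

def p_unigram(word):
    return unigrams[word] / total

def p_interp(word, prev, lam=0.7):
    """lam * P_ML(word | prev) + (1 - lam) * P(word)."""
    if histories[prev] == 0:
        return p_unigram(word)  # unseen history: rely on the unigram alone
    p_bigram = bigrams[(prev, word)] / histories[prev]
    return lam * p_bigram + (1 - lam) * p_unigram(word)

# The bigram (Mulan, read) is unseen, yet the interpolated estimate is nonzero.
print(p_interp("read", "Mulan"))
```

Because both component distributions sum to one over the vocabulary, the interpolated distribution does as well, whatever value of `lam` between 0 and 1 is chosen.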
Backoff smoothing
Use $P(w_i \mid \text{history}_{n-1})$ only if you lack data for $P(w_i \mid \text{history}_n)$
Good-Turing estimate
- Partition n-grams into groups depending on their frequency in the training data
- Change the number of occurrences of an n-gram according to
$$r^* = (r + 1)\frac{n_{r+1}}{n_r}$$
where $r$ is the occurrence number and $n_r$ is the number of n-grams that occur $r$ times
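The adjusted counts can be computed directly from the count-of-counts table. The sketch below applies the raw formula to a hypothetical set of n-gram counts; the function name `good_turing_counts` and the toy data are assumptions for illustration.

```python
from collections import Counter

def good_turing_counts(ngram_counts):
    """Map each observed count r to r* = (r + 1) * n_{r+1} / n_r."""
    count_of_counts = Counter(ngram_counts.values())  # n_r for each r
    adjusted = {}
    for r, n_r in sorted(count_of_counts.items()):
        n_next = count_of_counts.get(r + 1, 0)  # n_{r+1}, zero if no such n-gram
        adjusted[r] = (r + 1) * n_next / n_r
    return adjusted

# Hypothetical counts: three n-grams seen once, two seen twice, one seen three times.
toy = {"a b": 1, "b c": 1, "c d": 1, "d e": 2, "e f": 2, "f g": 3}
print(good_turing_counts(toy))
```

With these counts, n_1 = 3, n_2 = 2 and n_3 = 1, so r = 1 maps to 4/3 and r = 2 maps to 1.5. The highest count maps to 0 because n_4 = 0, which illustrates why the raw estimate is unreliable for large r.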
Katz smoothing
Based on Good-Turing: combine higher and lower order n-grams
For every N-gram:
1. if count r is large (> 5 or 8), do not change it
2. if count r is small but non-zero, discount with ≈ r*
3. if count r = 0, reassign discounted counts with lower order N-gram
$$C^*(w_{i-1}, w_i) = \alpha(w_{i-1})\,P(w_i)$$
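The three cases can be sketched for bigrams as follows. This is a Katz-style backoff, not a full Katz implementation: the discount ratio is a constant placeholder (0.5 for small counts) instead of the Good-Turing ratio r*/r, because the toy corpus is too small for reliable count-of-counts. The threshold `K`, the corpus and all function names are assumptions for illustration.

```python
from collections import Counter

corpus = [
    "John read her book",
    "I read a different book",
    "John read a book by Mulan",
]

unigrams, bigrams, histories = Counter(), Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens[1:])
    for prev, word in zip(tokens, tokens[1:]):
        bigrams[(prev, word)] += 1
        histories[prev] += 1
total = sum(unigrams.values())

K = 5  # counts above K are trusted and left unchanged

def discount(r):
    """Placeholder discount ratio; Katz would use the Good-Turing ratio r*/r."""
    return 1.0 if r > K else 0.5

def p_backoff(word, prev):
    if histories[prev] == 0:
        return unigrams[word] / total  # unseen history: plain unigram
    r = bigrams[(prev, word)]
    if r > 0:
        return discount(r) * r / histories[prev]  # cases 1 and 2
    # Case 3: spread the freed probability mass over unseen successors.
    seen = [w for (h, w) in bigrams if h == prev]
    kept = sum(discount(bigrams[(prev, w)]) * bigrams[(prev, w)] for w in seen)
    left_over = 1.0 - kept / histories[prev]
    unseen_mass = 1.0 - sum(unigrams[w] / total for w in seen)
    alpha = left_over / unseen_mass  # backoff weight alpha(prev)
    return alpha * unigrams[word] / total

print(p_backoff("book", "read"))  # unseen bigram, backed off to the unigram
```

The weight `alpha` is chosen so that the probabilities for a given history still sum to one after discounting.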
Kneser-Ney smoothing: motivation
Background
- Lower order n-grams are often used as a backoff model if the count of a higher-order n-gram is too low (e.g. unigram instead of bigram)
Problem
- Some words with relatively high unigram probability only occur in a few bigrams, e.g. Francisco, which is mainly found in San Francisco. However, infrequent word pairs, such as New Francisco, will be given too high a probability if the unigram probabilities of New and Francisco are used. Maybe instead, the Francisco unigram should have a lower value to prevent it from occurring in other contexts.
I can’t see without my reading . . .
Kneser-Ney intuition
If a word has been seen in many contexts it is more likely to be seen in new contexts as well.
- instead of backing off to lower order n-gram, use continuation probability
Example: instead of unigram $P(w_i)$, use
$$P_{\text{CONTINUATION}}(w_i) = \frac{|\{w_{i-1} : C(w_{i-1} w_i) > 0\}|}{\sum_{w_i} |\{w_{i-1} : C(w_{i-1} w_i) > 0\}|}$$
I can’t see without my reading . . . glasses
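The continuation probability counts distinct left contexts rather than occurrences. The sketch below computes it from a table of bigram counts; the counts themselves are hypothetical and chosen only to mirror the Francisco and glasses examples.

```python
from collections import Counter, defaultdict

# Hypothetical bigram counts, for illustration only.
bigram_counts = Counter({
    ("San", "Francisco"): 50,
    ("reading", "glasses"): 3,
    ("my", "glasses"): 4,
    ("wine", "glasses"): 2,
    ("new", "glasses"): 1,
})

# For each word, the set of distinct words that precede it at least once.
left_contexts = defaultdict(set)
for (prev, word), count in bigram_counts.items():
    if count > 0:
        left_contexts[word].add(prev)

total_types = sum(len(contexts) for contexts in left_contexts.values())

def p_continuation(word):
    """Fraction of all distinct bigram types that end in `word`."""
    return len(left_contexts.get(word, ())) / total_types

print(p_continuation("Francisco"))  # 0.2
print(p_continuation("glasses"))    # 0.8
```

By raw frequency Francisco is five times more common than glasses in this toy table, but it follows only one distinct word, so its continuation probability is the lower of the two.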
Class N-grams
1. Group words into semantic or grammatical classes
2. build n-grams for class sequences:
$$P(w_i \mid c_{i-N+1} \ldots c_{i-1}) = P(w_i \mid c_i)\,P(c_i \mid c_{i-N+1} \ldots c_{i-1})$$
- rapid adaptation, small training sets, small models
- works on limited domains
- classes can be rule-based or data-driven
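A minimal sketch of a class bigram, assuming each word belongs to exactly one class. The word-to-class map, the training sentences and the function name are hypothetical and serve only to show how an unseen word pair can still receive probability through its classes.

```python
from collections import Counter

# Hypothetical rule-based classes; every word has exactly one class.
word_class = {"John": "NAME", "Mary": "NAME", "reads": "VERB", "writes": "VERB"}
training = ["John reads", "Mary reads", "John writes"]

word_counts, class_counts = Counter(), Counter()
class_bigrams, class_histories = Counter(), Counter()
for sentence in training:
    words = sentence.split()
    classes = [word_class[w] for w in words]
    for w, c in zip(words, classes):
        word_counts[w] += 1
        class_counts[c] += 1
    for c_prev, c in zip(classes, classes[1:]):
        class_bigrams[(c_prev, c)] += 1
        class_histories[c_prev] += 1

def p_class_bigram(word, prev_word):
    """P(word | class(word)) * P(class(word) | class(prev_word))."""
    c, c_prev = word_class[word], word_class[prev_word]
    p_word_given_class = word_counts[word] / class_counts[c]
    p_class_given_prev = class_bigrams[(c_prev, c)] / class_histories[c_prev]
    return p_word_given_class * p_class_given_prev

# "Mary writes" never occurs in training, but NAME followed by VERB does.
print(p_class_bigram("writes", "Mary"))
```

A word bigram model trained on the same three sentences would give "Mary writes" a maximum likelihood probability of zero, while the class model assigns it 1/3.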
Combining PCFGs and N-grams
Only N-grams:
Meeting at three with Zhou Li
Meeting at four PM with Derek
P(Zhou | three, with) and P(Derek | PM, with)
N-grams + CFGs:
Meeting {at three: TIME} with {Zhou Li: NAME}
Meeting {at four PM: TIME} with {Derek: NAME}
P(NAME | TIME, with)
Adaptive Language Models
- conversational topic is not stationary
- topic stationary over some period of time
- build more specialised models that can adapt in time
Techniques
- Cache Language Models
- Topic-Adaptive Models
- Maximum Entropy Models
Cache Language Models
1. build a full static n-gram model
2. during conversation accumulate low order n-grams
3. interpolate between 1 and 2
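The three steps can be sketched as follows. For brevity the static model here is a hypothetical unigram table rather than a full n-gram model, the cache holds unigram counts, and the interpolation weight of 0.2 is an assumed value.

```python
from collections import Counter

class CacheUnigramLM:
    """Interpolates a static model with a unigram cache of recent words."""

    def __init__(self, static_prob, cache_weight=0.2):
        self.static_prob = static_prob    # callable: word -> probability
        self.cache_weight = cache_weight  # assumed interpolation weight
        self.cache = Counter()

    def observe(self, word):
        """Step 2: accumulate counts during the conversation."""
        self.cache[word] += 1

    def prob(self, word):
        """Step 3: interpolate the static and cache estimates."""
        cache_total = sum(self.cache.values())
        if cache_total == 0:
            return self.static_prob(word)  # empty cache: static model only
        p_cache = self.cache[word] / cache_total
        return ((1 - self.cache_weight) * self.static_prob(word)
                + self.cache_weight * p_cache)

# Step 1: a hypothetical static model.
static = {"the": 0.5, "stock": 0.1, "market": 0.1, "weather": 0.3}
lm = CacheUnigramLM(lambda w: static.get(w, 0.0))

before = lm.prob("stock")
for w in "the stock market".split():
    lm.observe(w)
after = lm.prob("stock")
print(before, after)
```

After the words have been observed, the probability of "stock" rises above its static value, which is the intended adaptation to the current topic.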
Topic-Adaptive Models
1. cluster documents into topics (manually or data-driven)
2. use information retrieval techniques with current recognition output to select the right cluster
3. if off-line, run recognition in several passes
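Step 2 can be illustrated with one common information retrieval technique, cosine similarity between bag-of-words vectors. The slide does not say which technique is used, so this choice, the topic clusters and the function names are all assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical topic clusters, each summarised by a bag of words.
topic_docs = {
    "finance": "stock market shares price bank",
    "weather": "rain sun wind temperature forecast",
}
topic_vectors = {t: Counter(text.split()) for t, text in topic_docs.items()}

def select_topic(recognised_text):
    """Pick the cluster most similar to the current recognition output."""
    query = Counter(recognised_text.split())
    return max(topic_vectors, key=lambda t: cosine(query, topic_vectors[t]))

print(select_topic("the stock price fell"))  # finance
```

The language model trained on the selected cluster would then be used, alone or interpolated with a general model, for the next recognition pass.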
Maximum Entropy Models
Instead of linear combination:
1. reformulate information sources into constraints
2. choose maximum entropy distribution that satisfies the constraints
Constraints general form:
$$\sum_X P(X)\,f_i(X) = E_i$$
Example: unigram
$$f_{w_i} = \begin{cases} 1 & \text{if } w = w_i \\ 0 & \text{otherwise} \end{cases}$$
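As a complement, a standard result that is not stated in the source: the distribution that maximises entropy subject to linear constraints of this kind has an exponential form. The weights $\lambda_i$ and the normaliser $Z$ are symbols introduced here for illustration; the weights are chosen so that every constraint is satisfied.

$$P(X) = \frac{1}{Z} \exp\Big(\sum_i \lambda_i f_i(X)\Big), \qquad Z = \sum_X \exp\Big(\sum_i \lambda_i f_i(X)\Big)$$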
Language Model Evaluation
- Evaluation in combination with Speech Recogniser
- hard to separate contribution of the two
- Evaluation based on probabilities assigned to text in the training and test set
Information, Entropy, Perplexity
Information:
$$I(x_i) = \log \frac{1}{P(x_i)}$$
Entropy:
$$H(X) = E[I(X)] = -\sum_i P(x_i) \log P(x_i)$$
Perplexity:
$$PP(X) = 2^{H(X)}$$
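These three definitions translate directly into code. The sketch below assumes base-2 logarithms, which is what makes perplexity equal to 2 raised to the entropy; the uniform digit distribution is a hypothetical example.

```python
import math

def information(p):
    """Information in bits of an event with probability p."""
    return math.log2(1.0 / p)

def entropy(dist):
    """Expected information of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def perplexity(dist):
    return 2 ** entropy(dist)

# Ten equally likely digits.
digits = {str(d): 0.1 for d in range(10)}
print(entropy(digits), perplexity(digits))
```

For ten equally likely outcomes the entropy is log2(10), about 3.32 bits, and the perplexity is 10 up to floating-point rounding.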
Perplexity of a model
We do not know the “true” distribution $p(w_1, \ldots, w_n)$. But we have a model $m(w_1, \ldots, w_n)$. The cross-entropy is:
$$H(p, m) = -\sum_{w_1, \ldots, w_n} p(w_1, \ldots, w_n) \log m(w_1, \ldots, w_n)$$
Cross-entropy is an upper bound on the entropy:
$$H \le H(p, m)$$
The better the model, the lower the cross-entropy and the lower the perplexity (on the same data)
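The bound can be checked numerically on a small example. The distributions below are hypothetical; `p` plays the role of the true distribution, which in practice is unknown.

```python
import math

def cross_entropy(p, m):
    """H(p, m) = -sum_x p(x) log2 m(x), over outcomes with p(x) > 0."""
    return -sum(p[x] * math.log2(m[x]) for x in p if p[x] > 0)

p = {"a": 0.5, "b": 0.25, "c": 0.25}   # hypothetical "true" distribution
good = {"a": 0.4, "b": 0.3, "c": 0.3}  # model close to p
poor = {"a": 0.1, "b": 0.1, "c": 0.8}  # model far from p

entropy_p = cross_entropy(p, p)  # H(p, p) is the entropy of p
print(entropy_p, cross_entropy(p, good), cross_entropy(p, poor))
```

The entropy of `p` is 1.5 bits, and both models give a larger cross-entropy, with the poorer model giving the largest value.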
Test-set Perplexity
Estimate the distribution $p(w_1, \ldots, w_n)$ on the training data
Evaluate it on the test data
$$H = -\sum_{w_1, \ldots, w_n \in \text{test set}} p(w_1, \ldots, w_n) \log p(w_1, \ldots, w_n)$$
$$PP = 2^H$$
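In practice the sum is usually approximated by the average negative log probability per token that the model assigns to the test text. The sketch below follows that common per-word convention, which is an assumption about how the formula above is evaluated; the function name and the uniform model are hypothetical.

```python
import math

def corpus_perplexity(sentences, cond_prob):
    """Per-token perplexity of a bigram model on test sentences.

    cond_prob(word, prev) must return a nonzero probability.
    """
    log_prob, n_tokens = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_prob += math.log2(cond_prob(word, prev))
            n_tokens += 1
    cross_entropy = -log_prob / n_tokens  # bits per token
    return 2 ** cross_entropy

# Hypothetical model: 10 equally likely symbols regardless of history.
uniform = lambda word, prev: 1 / 10
print(corpus_perplexity(["3 1 4 1 5"], uniform))
```

A model that always spreads its probability evenly over 10 symbols has a perplexity of 10 up to floating-point rounding, whatever the test text is.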
Perplexity and branching factor
Perplexity is roughly the geometric mean of the branching factor
[Figure: a word branching to word1, word2, . . . , wordN]
- Shannon: 2.39 for English letters and 130 for English words
- Digit strings: 10
- n-gram English: 50–1000
- Wall Street Journal test set: 180 (bigram), 91 (trigram)
Performance of N-grams
| Models | Perplexity | Word Error Rate |
|---|---|---|
| Unigram Katz | 1196.45 | 14.85% |
| Unigram Kneser-Ney | 1199.59 | 14.86% |
| Bigram Katz | 176.31 | 11.38% |
| Bigram Kneser-Ney | 176.11 | 11.34% |
| Trigram Katz | 95.19 | 9.69% |
| Trigram Kneser-Ney | 91.47 | 9.60% |
Wall Street Journal database
Dictionary: 60 000 words
Training set: 260 000 000 words