Learning Within-Sentence Semantic Coherence
Elena Eneva
Rose Hoberman
Lucian Lita
Carnegie Mellon University
Semantic (in)Coherence
Trigram: content words unrelated
Effect on speech recognition:
– Actual Utterance: “THE BIRD FLU HAS AFFECTED CHICKENS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMANS SICK”
– Top Hypothesis: “THE BIRD FLU HAS AFFECTED SECONDS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMAN SAID”
Our goal: model semantic coherence
A Whole Sentence Exponential Model [Rosenfeld 1997]
P0(s) is an arbitrary initial model (typically N-gram)
fi(s)’s are arbitrary computable properties of s (aka features)
Z is a universal normalizing constant
$$\Pr(s) \;\overset{\text{def}}{=}\; \frac{1}{Z}\, P_0(s)\, \exp\!\Big(\sum_i \lambda_i f_i(s)\Big)$$
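As a minimal illustration, here is a Python sketch of how a sentence would be scored under this model; the `baseline_logprob`, `features`, and `lambdas` arguments are hypothetical stand-ins for trained components, and Z is dropped since it is shared by all sentences:

```python
def unnormalized_logscore(sentence, baseline_logprob, features, lambdas):
    """Log of the unnormalized whole-sentence probability:
    log P0(s) + sum_i lambda_i * f_i(s).
    The universal constant Z is the same for every sentence, so it can
    be ignored when ranking recognition hypotheses against each other."""
    log_p0 = baseline_logprob(sentence)  # e.g. from the trigram baseline
    return log_p0 + sum(lam * f(sentence) for lam, f in zip(lambdas, features))
```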
A Methodology for Feature Induction
Given corpus T of training sentences:
1. Train best-possible baseline model, P0(s)
2. Use P0(s) to generate corpus T0 of “pseudo sentences”
3. Pose a challenge: find (computable) differences that allow discrimination between T and T0
4. Encode the differences as features fi(s)
5. Train a new model:
$$P_1(s) \;=\; \frac{1}{Z}\, P_0(s)\, \exp\!\Big(\sum_i \lambda_i f_i(s)\Big)$$
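A minimal Python sketch of one induction round, assuming hypothetical callables for each component named in the steps above:

```python
def feature_induction_round(T, train_baseline, sample_corpus,
                            induce_features, fit_weights):
    """One round of the feature-induction methodology (steps 1-5);
    every argument except the training corpus T is a hypothetical
    callable standing in for a component described on the slide."""
    P0 = train_baseline(T)               # 1. best-possible baseline model P0(s)
    T0 = sample_corpus(P0, size=len(T))  # 2. generate "pseudo sentence" corpus T0
    fs = induce_features(T, T0)          # 3.-4. find and encode discriminating features f_i(s)
    lambdas = fit_weights(P0, fs, T)     # 5. train the new model P1(s)
    return P0, fs, lambdas
```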
Discrimination Task:
1. - - - feel - - sacrifice - - sense - - - - - - - - -meant - - - - - - - - trust - - - - truth
2. - - kind - free trade agreements - - - living - - ziplock bag - - - - - - university japan's daiwa bank stocks step –
Are these content words generated from a trigram or a natural sentence?
Building on Prior Work
Define “content words” (all but the top 50)
Goal: model the distribution of content words in a sentence
Simplify: model pairwise co-occurrences (“content word pairs”)
Collect contingency tables; calculate a measure of association for them
Q Correlation Measure
Q values range from –1 to +1
$$Q = \frac{c_{11}c_{22} - c_{12}c_{21}}{c_{11}c_{22} + c_{12}c_{21}}$$

Derived from the co-occurrence contingency table:

|        | W1 yes | W1 no |
|--------|--------|-------|
| W2 yes | c11    | c21   |
| W2 no  | c12    | c22   |
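A direct Python implementation of the Q measure above; the example counts are made up for illustration:

```python
def yules_q(c11, c12, c21, c22):
    """Yule's Q from the 2x2 co-occurrence contingency table:
    Q = (c11*c22 - c12*c21) / (c11*c22 + c12*c21), ranging
    from -1 (never co-occur) to +1 (always co-occur)."""
    num = c11 * c22 - c12 * c21
    den = c11 * c22 + c12 * c21
    # Q is undefined when both products are 0; treating that as
    # "no association" is an assumption made here.
    return num / den if den else 0.0

# Made-up counts: 40 sentences contain both words, 870 contain neither.
print(yules_q(c11=40, c12=60, c21=30, c22=870))  # ~0.90, strongly associated
```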
Density Estimates
We hypothesized:
– Trigram sentences: wordpair correlation completely determined by distance
– Natural sentences: wordpair correlation independent of distance
Kernel density estimation:
– distribution of Q values in each corpus
– at varying distances
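A minimal sketch of the per-distance density estimation, assuming SciPy's Gaussian KDE (the slides do not specify the kernel) and a hypothetical dict-of-lists input format:

```python
import numpy as np
from scipy.stats import gaussian_kde

def q_densities_by_distance(q_values_by_distance):
    """Fit one kernel density estimate per word-pair distance d over the
    Q values observed at that distance in a corpus. `q_values_by_distance`
    maps d -> list of Q values (a hypothetical input format)."""
    return {d: gaussian_kde(np.asarray(qs))
            for d, qs in q_values_by_distance.items()}

# Fit separately for each corpus; then bnews_kde[8].evaluate([0.76])[0]
# estimates Pr(Q = 0.76 | d = 8, BNews).
```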
Q Distributions
[Figure: density of Q values for broadcast news and trigram-generated (----) sentences, shown at distance 1 and distance 3; x-axis: Q value, y-axis: density.]
Likelihood Ratio Feature
$$L \;=\; \prod_{\text{wordpairs } i,j} \frac{\Pr(Q_{ij} \mid d_{ij}, \text{BNews})}{\Pr(Q_{ij} \mid d_{ij}, \text{Trigram})}$$
she is a country singer searching for fame and fortune in nashville
Q(country, nashville) = 0.76, distance = 8
Pr(Q = 0.76 | d = 8, BNews) = 0.32
Pr(Q = 0.76 | d = 8, Trigram) = 0.11
Likelihood ratio = 0.32 / 0.11 ≈ 2.9
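A sketch of this feature in Python, assuming the per-distance conditional densities are available as callables (e.g. the KDEs sketched earlier); the example reproduces the slide's numbers:

```python
import math

def likelihood_ratio(word_pairs, pr_bnews, pr_trigram):
    """Likelihood-ratio feature: the product over content-word pairs of
    Pr(Q_ij | d_ij, BNews) / Pr(Q_ij | d_ij, Trigram). `word_pairs` is a
    list of (Q, d) tuples; the two probability arguments are hypothetical
    callables. Computed in log space to avoid underflow on sentences
    with many pairs."""
    log_l = sum(math.log(pr_bnews(q, d)) - math.log(pr_trigram(q, d))
                for q, d in word_pairs)
    return math.exp(log_l)

# The slide's worked example: Q(country, nashville) = 0.76 at distance 8.
print(likelihood_ratio([(0.76, 8)],
                       pr_bnews=lambda q, d: 0.32,
                       pr_trigram=lambda q, d: 0.11))  # 0.32/0.11 ~ 2.9
```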
Simpler Features
Q-value based:
– Mean, median, min, max of Q values for content word pairs in the sentence (Cai et al. 2000)
– Percentage of Q values above a threshold
– High/low correlations across large/small distances
Other:
– Word and phrase repetition
– Percentage of stop words
– Longest sequence of consecutive stop/content words
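A minimal sketch of the Q-value-based summary features; the 0.5 threshold is an illustrative assumption, not a value from the slides:

```python
import statistics

def simple_q_features(q_values, threshold=0.5):
    """Summary features over the Q values of a sentence's content-word
    pairs; the threshold is an illustrative choice."""
    return {
        "q_mean": statistics.mean(q_values),
        "q_median": statistics.median(q_values),
        "q_min": min(q_values),
        "q_max": max(q_values),
        "pct_above": sum(q > threshold for q in q_values) / len(q_values),
    }
```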
Datasets
LM and contingency tables (Q values) derived from 103 million words of Broadcast News (BN)
From the remainder of the BN corpus and sentences sampled from the trigram LM:
– Q-value distributions estimated from ~100,000 sentences
– Decision tree trained and tested on ~60,000 sentences
Disregarded sentences with < 7 words, e.g.:
– “Mike Stevens says it’s not real”
– “We’ve been hearing about it”
Experiments
Learners:
– C5.0 decision tree
– Boosting decision stumps with AdaBoost.MH
Methodology:
– 5-fold cross-validation on ~60,000 sentences
– Boosting for 300 rounds
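For a rough sense of the setup: AdaBoost.MH itself is not in scikit-learn, so this sketch substitutes scikit-learn's AdaBoostClassifier over depth-1 trees (decision stumps), with synthetic data standing in for the real coherence features:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: one row of coherence features per sentence,
# label 1 = broadcast news, 0 = trigram-generated.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

stumps = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stumps
    n_estimators=300,                               # 300 boosting rounds
)
scores = cross_val_score(stumps, X, y, cv=5)        # 5-fold cross-validation
print(f"{scores.mean():.3f} +/- {scores.std():.3f}")
```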
Results
| Feature Set | Classification Accuracy (%) |
|---|---|
| Q mean, median, min, max (Previous Work) | 73.39 ± 0.36 |
| Likelihood Ratio | 77.76 ± 0.49 |
| All but Likelihood Ratio | 80.37 ± 0.42 |
| All Features | 80.37 ± 0.46 |
| Likelihood Ratio + non-Q | |
Shannon-Style Experiment
50 sentences:
– ½ “real” and ½ trigram-generated
– Stopwords replaced by dashes
30 participants:
– Average accuracy of 73.77% ± 6
– Best individual accuracy of 84%
Our classifier:
– Accuracy of 78.9% ± 0.42
Summary
Introduced a set of statistical features which capture aspects of semantic coherence
Trained a decision tree classifier that reaches 80% accuracy
Next step: incorporate features into exponential LM
Future Work
Combat data sparsity:
– Confidence intervals
– Different correlation statistic
– Stemming or clustering the vocabulary
Evaluate derived features:
– Incorporate into an exponential language model
– Evaluate the model on a practical application
Agreement among Participants
Expected Perplexity Reduction
Semantic coherence feature:
– 78% of broadcast news sentences
– 18% of trigram-generated sentences
Kullback-Leibler divergence: 0.814
Average perplexity reduction per word = 0.0419 (2^0.814/21)
Per sentence? Features modify the probability of the entire sentence; the effect of the feature on per-word probability is small.
Distribution of Likelihood Ratio
[Figure: density of the likelihood-ratio value for broadcast news and trigram-generated (----) sentences; x-axis: likelihood value, y-axis: density.]
Discrimination Task
Natural sentence:
– but it doesn't feel like a sacrifice in a sense that you're really saying this is you know i'm meant to do things the right way and you trust it and tell the truth
Trigram-generated:
– they just kind of free trade agreements which have been living in a ziplock bag that you say that i see university japan's daiwa bank stocks step though
Q Values at Distance 1
[Figure: density of Q values at distance 1 for broadcast news and trigram-generated (----) sentences; x-axis: Q value, y-axis: density.]
Q Values at Distance 3
[Figure: density of Q values at distance 3 for broadcast news and trigram-generated (----) sentences; x-axis: Q value, y-axis: density.]
Outline
The problem of semantic (in)coherence
Incorporating this into the whole-sentence exponential LM
Finding better features for this model using machine learning
Semantic coherence features
Experiments and results