Overview of Peter D. Turney’s Work on Similarity From 2001-2008.
Similarity
Attributional similarity (2001-2003): the degree to which two words are synonymous; also known as semantic relatedness and semantic association.
Relational similarity (2005-2008): the degree to which two relations are analogous.
Objective evaluation of the approaches:
Attributional similarity: 80 TOEFL synonym questions
Relational similarity: 374 SAT analogy questions
2001: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the 12th European Conference on Machine Learning, pages 491–502, Springer, Berlin, 2001.
1 Introduction
Synonym recognition: given a word and a set of candidate words, pick the candidate whose meaning is closest to the given word. Core idea: co-occurrence: "a word is characterized by the company it keeps".
1 Introduction: idea
Given a word problem and a set of candidates {choice1, choice2, ..., choicen}, compute score(choicei) for each candidate; the highest-scoring candidate is taken as the synonym. PMI-IR uses Pointwise Mutual Information (PMI) to analyze statistical data collected by Information Retrieval (IR):

score(choicei) = log2 [ p(problem & choicei) / (p(problem) p(choicei)) ]
2 Formula
Score 1: score1(choicei) = hits(problem AND choicei) / hits(choicei)
Score 2 (NEAR means within ten words): score2(choicei) = hits(problem NEAR choicei) / hits(choicei)
2 Formula
Score 3 (to avoid antonyms such as big vs. small):
score3(choicei) = hits((problem NEAR choicei) AND NOT ((problem OR choicei) NEAR "not")) / hits(choicei AND NOT (choicei NEAR "not"))
Score 4 (introduces context; only one context word is chosen, to keep the sample count up):
score4(choicei) = hits((problem NEAR choicei) AND context AND NOT ((problem OR choicei) NEAR "not")) / hits(choicei AND context AND NOT (choicei NEAR "not"))
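The hit-count scores above can be sketched as follows. The hits table is a stand-in for a search engine that supports AND and NEAR operators (as AltaVista did at the time); the counts and query-string format here are made up for illustration, not from the paper.

```python
# Hypothetical hit counts standing in for search-engine queries.
# NEAR = the two words occur within ten words of each other.
HITS = {
    "big AND large": 800.0,
    "big NEAR large": 500.0,
    "large": 10000.0,
}

def score1(problem, choice, hits):
    # score1 = hits(problem AND choice) / hits(choice)
    return hits[f"{problem} AND {choice}"] / hits[choice]

def score2(problem, choice, hits):
    # score2 = hits(problem NEAR choice) / hits(choice)
    return hits[f"{problem} NEAR {choice}"] / hits[choice]
```

Score 3 and score 4 have the same shape; the AND NOT "not" clauses and the context word are simply folded into both query strings.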
3 Experiments
Comparison with LSA (Latent Semantic Analysis): an initial matrix X of 61,000 * 30,473 is built from an encyclopedia, with whole documents as the document fragments; SVD compresses the dimensionality; the elements are tf-idf weights; similarity is cosine. Human baseline: students' TOEFL scores.
Dataset: 80 TOEFL questions and 50 ESL test questions.
3 Experiments: PMI-IR vs. LSA
Time efficiency: PMI-IR is simple and fast: 2 s/query * 8 queries, with almost all of the time spent on network interaction; run in parallel, about 2 s total. LSA is slow: compressing the 61,000 * 30,473 matrix to 61,000 * 300 takes about three hours on a UNIX workstation.
3 Experiments
On the 80 TOEFL questions and 50 ESL questions: PMI-IR 73.75% (59/80) and 74% (37/50); foreign students 64.5% (51.6/80); LSA 64.4% (51.5/80).
Performance: PMI-IR wins by about 10%. Reasons: the NEAR operator and a smaller chunk size. LSA 64.4%; PMI-IR with AND 62.5%; PMI-IR with NEAR 72.5%.
4 Conclusion
Combining PMI and IR uses co-occurrence to measure the degree of relatedness between words. The PMI statistics are collected by sending queries to a search engine, which alleviates the data-sparseness problem.
2003: Combining Independent Modules in Lexical Multiple-Choice Problems. In RANLP-03, pages 482–489, Borovets, Bulgaria. (RANLP: Recent Advances in Natural Language Processing.)
1 Introduction
There are several approaches to natural language problems, and no single one is best for all problem instances. How about combining them?
1 Introduction
Two main contributions: it introduces and evaluates several new modules for answering multiple-choice synonym questions and analogy questions, and it presents a novel product merging rule, comparing it with two other similar merging rules.
2 Merging rules: the parameters
p_ij^h >= 0 is the probability assigned by the i-th module (1 <= i <= n) to choice j (1 <= j <= k) of instance h (1 <= h <= m). D_j^{h,w} is the probability assigned by the merging rule to choice j of training instance h when the weights are set to w. a(h), with 1 <= a(h) <= k, is the correct answer for instance h. Training picks the weights that maximize the probability of the correct answers:

w' = argmax_w prod_h D_{a(h)}^{h,w}
2 Merging rules: old
Mixture rule (very common): M_j^{h,w} = sum_i w_i p_ij^h, normalized over the k choices:
D_j^{h,w} = M_j^{h,w} / sum_{j=1..k} M_j^{h,w}
Logarithmic rule: L_j^{h,w} = exp(sum_i w_i ln p_ij^h) = prod_i (p_ij^h)^{w_i}, normalized the same way:
D_j^{h,w} = L_j^{h,w} / sum_{j=1..k} L_j^{h,w}
2 Merging rules: novel
Product rule: P_j^{h,w} = prod_i (w_i p_ij^h + (1 - w_i) / k), normalized:
D_j^{h,w} = P_j^{h,w} / sum_{j=1..k} P_j^{h,w}
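A minimal sketch of the three merging rules, assuming p[i][j] holds module i's probability for choice j and w[i] is the per-module weight (the argument names are mine, not from the paper):

```python
import math

def mixture(w, p):
    # M_j = sum_i w_i * p_ij, then normalize over the choices j.
    k = len(p[0])
    raw = [sum(wi * pi[j] for wi, pi in zip(w, p)) for j in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

def logarithmic(w, p):
    # L_j = exp(sum_i w_i * ln p_ij) = prod_i p_ij^{w_i}, normalized.
    k = len(p[0])
    raw = [math.exp(sum(wi * math.log(pi[j]) for wi, pi in zip(w, p)))
           for j in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

def product(w, p):
    # P_j = prod_i (w_i * p_ij + (1 - w_i) / k): each module's opinion is
    # blended with the uniform distribution before multiplying.
    k = len(p[0])
    raw = [1.0] * k
    for wi, pi in zip(w, p):
        for j in range(k):
            raw[j] *= wi * pi[j] + (1.0 - wi) / k
    total = sum(raw)
    return [r / total for r in raw]
```

Note how the product rule with w_i = 0 ignores module i entirely (its factor becomes uniform), while the logarithmic rule with all w_i = 1 reduces to a plain product of the module probabilities.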
3 Synonym: dataset
A training set of 431 4-choice synonym questions, randomly divided into 331 training questions and 100 testing questions. The weights w are optimized on the training set.
3 Synonym: modules
LSA; PMI-IR; Thesaurus, which queries Wordsmyth (www.wordsmyth.net), creates synonym lists for both the stem and the choices, and scores them by their overlap; and Connector, which uses the summary pages from querying Google with a pair of words, taking a weighted sum of the number of times the words appear separated by one of the symbols [, ", :, ,, =, /, (, ], by "means", "defined", "equals", "synonym", or by whitespace, plus the number of times "dictionary" or "thesaurus" appears.
3 Synonym: combined results
The three rules' accuracies are nearly identical, but the product and logarithmic rules assign higher probabilities to the correct answers, as evidenced by the mean likelihood.
3 Synonym: compare with other approaches
4 Analogies: dataset
374 5-choice instances, randomly split into 274 training instances and 100 testing instances. E.g. cat:meow :: (a) mouse:scamper, (b) bird:peck, (c) dog:bark, (d) horse:groom, (e) lion:scratch.
4 Analogies: modules
Phrase vectors: create a vector r to represent the relationship between X and Y, using phrases built from 128 patterns such as "X for Y", "Y with X", "X in the Y", "Y on X"; query a search engine, record the number of hits, and measure similarity by cosine.
Thesaurus paths (WordNet): degree of similarity between paths.
4 Analogies: modules (continued)
Lexical relation modules: a set of more specific modules using WordNet; 9 modules, each checking one relationship: Synonym, Antonym, Hypernym, Hyponym, Meronym:substance, Meronym:part, Meronym:member, Holonym:substance, Holonym:member. Each checks the stem first, then the choices.
Similarity modules make use of definitions: Similarity:dict uses dictionary.com and Similarity:wordsmyth uses wordsmyth.net. Given A:B::C:D, similarity = sim(A, C) + sim(B, D).
5 Conclusion
Applied three trained merging rules to TOEFL questions; accuracy: 97.5%.
Provided the first results on a challenging analogy task, with a set of novel modules that use both lexical databases and statistical information; accuracy: 45%.
The popular mixture rule was consistently weaker than the logarithmic and product rules at assigning high probabilities to correct answers.
State of the art (accuracy)
Synonym questions: LSA 64.4%, HUMAN 64.5%, PMI-IR (2001) 73.75%, HYBRID (2003) 97.5%
Analogies: HYBRID (2003) 45%, HUMAN 57%
2005: Corpus-based Learning of Analogies and Semantic Relations. In IJCAI 2005, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK, July 30-August 5, 2005.
1 Introduction
Verbal analogies (A:B :: C:D) are solved with the VSM; the novelty of the paper is the application of the VSM to measuring the similarity between relationships. Noun-modifier pair relations are classified with a supervised nearest neighbour algorithm. Dataset: Nastase and Szpakowicz (2003), 600 noun-modifier pairs.
1 Introduction: examples
Analogy; noun-modifier pair relations, e.g. "laser printer" (relation: instrument).
2 Solving Analogy Problems
Assign scores to candidate analogies A:B::C:D; for multiple-choice questions, guess the highest-scoring choice. The difficulty with sim(R1, R2) is that R1 and R2 are implicit; the paper attempts to learn R1 and R2 by unsupervised learning from a very large corpus.
2 Solving Analogy Problems: Vector Space Model
Create vectors r1 and r2 that represent features of R1 and R2; measure the similarity of R1 and R2 by the cosine of the angle θ between r1 and r2.
2 Solving Analogy Problems: simplified diagram
Generate a vector for each word pair A:B. The 64 joining terms (e.g. "X for Y", "Y with X", "X in the Y", "Y on X") yield phrases; search, record the hit counts, and take logs:
vector = [log(hits1), log(hits2), ..., log(hits128)]
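The diagram above can be sketched as follows. Here hit_count is a stand-in for a web-search query, and only 4 of the 128 joining phrases are shown, so treat the template strings and counts as illustrative only:

```python
import math

# A few of the joining terms; the paper uses 64 terms, each in both
# orders, giving 128 vector elements per word pair.
JOINING_TERMS = ["{X} for {Y}", "{Y} with {X}", "{X} in the {Y}", "{Y} on {X}"]

def pair_vector(x, y, hit_count):
    # One element per joining phrase: log of the (smoothed) hit count.
    phrases = [t.format(X=x, Y=y) for t in JOINING_TERMS]
    return [math.log(hit_count(p) + 1.0) for p in phrases]

def cosine(u, v):
    # Relational similarity of two pairs = cosine of their vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```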
2 Solving Analogy Problems: experiment
3 Noun-Modifier Semantic Relations
First attempt to classify semantic relations without a lexicon.
30 Semantic Relations of training data
3 Noun-Modifier Semantic Relations: algorithm
Supervised nearest-neighbour learning, where nearest neighbour = highest cosine: cosine(training pair, testing pair) over vectors of 128 elements, using the same joining terms as before.
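A sketch of the nearest-neighbour step, assuming each pair has already been mapped to its 128-element vector (the helper names are mine):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(test_vector, training):
    # training: list of (vector, relation_label) pairs; the test pair
    # gets the label of its single nearest (highest-cosine) neighbour.
    best_vec, best_label = max(training, key=lambda t: cosine(test_vector, t[0]))
    return best_label
```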
3 Noun-Modifier Semantic Relations: Experiment for the 30 Classes
30 Semantic Relations
F when precision and recall are balanced: 26.5%; F for random guessing: 3.3%. Much better than random guessing, but still much room for improvement. 30 classes is hard: there are too many possibilities for confusing classes. So try 5 classes instead, grouping classes together.
5 Semantic Relations

F for the 5 Classes
5 Semantic Relations
F when precision and recall are balanced: 43.2%; F for random guessing: 20.0%. Better than random guessing and better than the 30-class setting (26.5%), but there is still room for improvement.
Execution Time
The experiments presented here required 76,800 queries to AltaVista (600 word pairs * 128 queries per word pair). As a courtesy to AltaVista, a five-second delay was inserted between queries; processing the 76,800 queries took about five days.
Conclusion
The cosine metric in the VSM is used to solve analogies and to classify semantic relations. It performs much better than random guessing, but below human levels.
State of the art
Accuracy, Analogies: HYBRID (2003) 45%, VSM (2005) 47%, HUMAN 57%
F-measure, Noun-Modifier (5 classes): VSM (2005) 43.2%
2006a: Similarity of Semantic Relations. Computational Linguistics, 32(3):379–416.
1 Introduction
Latent Relational Analysis (LRA) extends the VSM approach of Turney and Littman (2005) in three ways: the connecting patterns are derived automatically from the corpus, instead of using a fixed set of patterns; Singular Value Decomposition (SVD) is used to smooth the frequency data; and automatically generated synonyms are used to explore variations of the word pairs.
2 A short description of LRA: simplified diagram
Generate a vector for each word pair A:B: expand the pair with synonyms (A':B, A:B'); build phrases from the 64 joining terms plus automatically derived patterns; search and record the hit counts; weight each element by entropy * log(hits); assemble the matrix; apply SVD; and score candidates by the average cosine.
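The weighting-plus-SVD stage can be sketched with NumPy. The log1p weighting below is a simplification of the paper's entropy-times-log weighting, and the matrix is a toy one, so this is a sketch of the idea rather than the paper's exact pipeline:

```python
import numpy as np

def lra_reduce(freq_matrix, k):
    # Weight the raw pair-by-pattern frequencies (the paper multiplies
    # an entropy factor by log; plain log1p is used here for brevity),
    # then keep the top-k singular components as smoothed row vectors.
    weighted = np.log1p(freq_matrix)
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)
    return u[:, :k] * s[:k]

def avg_cosine(a, b):
    # Relational similarity of two row vectors in the reduced space.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Rows with similar pattern profiles stay close after the reduction, which is what makes the final average-cosine scoring meaningful.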
3 Experiment: Word Analogy Questions
Baseline LRA. Matrix: 17,232 * 8,000, density 5.8%. Time required: 209:49:36, about 9 days.
Experiment: Word Analogy Questions: LRA vs. VSM
Corpus size: AltaVista, 5*10^11 English words; WMTS, 5*10^10 English words.
Experiment: Word Analogy Questions: Varying the Parameters
Experiment: Word Analogy Questions: Ablation Experiments
No SVD: the drop is not significant, but might become significant with more word pairs. No synonyms: recall drops. Neither SVD nor synonyms: recall drops. VSM: the drop is significant.
Experiments with Noun-Modifier Relations
Dataset: 600 noun-modifier pairs, hand-labeled with 30 classes of semantic relations. Algorithm: baseline LRA with a single nearest neighbour, where LRA provides the distance (nearness) measure.
Discussion
For word analogy questions, performance is not yet adequate for practical application, and speed is a concern. For noun-modifier classification, more hand-labeled data would help but is expensive, and the choice of classification scheme for the semantic relations matters. A hybrid approach could combine the corpus-based approach of LRA with the lexicon-based approach of Veale (2004).
Conclusion of 2006a
LRA extends the VSM (2005): patterns are derived automatically; SVD is used to smooth and compress the data; and automatically generated synonyms are used to explore variations of the word pairs.
State of the art
Accuracy, Analogies: HYBRID (2003) 45%, VSM (2005) 47%, LRA (2006a) 56.8%, HUMAN 57%
F-measure, Noun-Modifier (5 classes): VSM (2005) 43.2%, LRA (2006a) 54.6%
2006b: Expressing Implicit Semantic Relations without Supervision. Coling/ACL-06.
Introduction
Hearst (1992): pattern → X:Y. The pattern "Y such as the X" can be used to mine large text corpora for hypernym-hyponym pairs: if a search using the pattern finds the string "bird such as the ostrich", we can infer that "ostrich" is a hyponym of "bird". Here we consider the inverse problem, X:Y → pattern: can we mine a large text corpus for patterns that express the implicit relations between X and Y?
Introduction
Discovering high-quality patterns: pertinence is the measure of quality; pertinent patterns are reliable for mining further word pairs with the same semantic relations.
2 Pertinence
The first formal measure of quality for text-mining patterns. Given a set of word pairs W = {X1:Y1, ..., Xn:Yn} and a set of patterns P = {P1, ..., Pm}, Pi is pertinent to Xj:Yj if word pairs Xk:Yk that are highly typical for Pi tend to be relationally similar to Xj:Yj. Pertinence tends to be highest for unambiguous patterns.

pertinence(Xj:Yj, Pi) = sum_{k=1..n} p(Xk:Yk | Pi) * sim_r(Xj:Yj, Xk:Yk)
2 Pertinence: computation
f_{k,i} is the number of occurrences in a corpus of the word pair Xk:Yk with the pattern Pi. A direct estimate would be

p(Xk:Yk | Pi) = p(Xk:Yk, Pi) / p(Pi) = f_{k,i} / sum_{j=1..n} f_{j,i}

Instead, estimate

p(Pi | Xk:Yk) = f_{k,i} / sum_{j=1..m} f_{k,j}

and apply Bayes' theorem:

p(Xk:Yk | Pi) = p(Xk:Yk) p(Pi | Xk:Yk) / sum_{j=1..n} p(Xj:Yj) p(Pi | Xj:Yj)

Smoothing: take p(Xj:Yj) = 1/n, so that

p(Xk:Yk | Pi) = p(Pi | Xk:Yk) / sum_{j=1..n} p(Pi | Xj:Yj)
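A small sketch of the smoothed estimate, assuming a frequency table f[k][i] (pair k observed with pattern i); the function names are mine:

```python
def cond_prob_pattern(f):
    # p(Pi | Xk:Yk) = f[k][i] / sum_j f[k][j]  (row-normalize).
    return [[fki / sum(row) for fki in row] for row in f]

def cond_prob_pair(f):
    # Smoothed p(Xk:Yk | Pi) = p(Pi | Xk:Yk) / sum_j p(Pi | Xj:Yj),
    # i.e. Bayes with a uniform prior p(Xj:Yj) = 1/n over word pairs.
    p = cond_prob_pattern(f)
    n, m = len(p), len(p[0])
    cols = [sum(p[k][i] for k in range(n)) for i in range(m)]
    return [[p[k][i] / cols[i] for i in range(m)] for k in range(n)]
```

Note that each column of the result sums to 1: for a fixed pattern Pi, the smoothed probabilities over word pairs form a distribution.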
3 Related Work
Hearst (1992) describes a method for finding patterns like "Y such as the X", but her method requires human judgment. Riloff and Jones (1999) use a mutual bootstrapping technique that can find patterns automatically, but the bootstrapping requires an initial seed of manually chosen examples. Other work likewise requires training examples or initial seed patterns for each relation.
3 Related Work
Turney (2006a): LRA maps each pair X:Y to a high-dimensional vector v and then calculates cosines; pertinence is based on it. A limitation: the semantic content of the vectors is difficult to interpret.
The Algorithm
1. Find phrases. 2. Generate patterns, noting pattern frequency (TF), a local frequency count. 3. Count pair frequency, a global frequency count (DF). 4. Map pairs to rows, for both Xj:Yj and Yj:Xj. 5. Map patterns to columns, dropping all patterns with a pair frequency less than 10 (1,706,845 distinct patterns reduce to 42,032).
The Algorithm (continued)
6. Build a sparse matrix whose elements are frequencies. 7. Apply the log and entropy weighting, which gives more weight to patterns that vary substantially in frequency across pairs. 8. Apply SVD. 9. Calculate cosines. 10. Calculate the conditional probability p(Xk:Yk | Pi) = p(Pi | Xk:Yk) / sum_{j=1..n} p(Pi | Xj:Yj) for every word pair and every pattern. 11. Calculate pertinence.
The Algorithm: simplified diagram
Semantic similarity = similarity of pattern lists. From the set of word pairs, build a matrix of (pair 1, pattern list 1), ..., (pair n, pattern list n): search the corpus, count the patterns, then compute pertinence and rank.
5 Experiments with Word Analogies
Dataset: 374 college-level multiple-choice word analogies, taken from the SAT test; 6 * 374 = 2,244 pairs; 4,194 rows * 84,064 columns; sparse-matrix density 0.91%.
Score = (rank_stem + rank_choice) / 2
the four highest ranking patterns for the stem and solution for the first example

the top five pairs match the pattern “Y such as the X”.

Comparing with other measures

Experiments with Noun-Modifiers
Method and Result
Method: a single nearest neighbour algorithm with leave-one-out cross-validation; the distance between two noun-modifier pairs is measured by the average rank of their best shared pattern.
Result
More
For the 5 general classes

Comparing with other measures
Discussion
Time: word analogies take 5 hours, vs. 5 days (2005) and 9 days (2006a); noun-modifiers take 9 hours; the majority of the time is spent searching.
Performance: near the level of the average senior high-school student (54.6% vs. 57%). For applications such as building a thesaurus, lexicon, or ontology, this level of performance suggests that the algorithm could assist, but not replace, a human expert.
Conclusion
LRA is a black box. The main contribution of this paper is the idea of pertinence, used to find patterns that express the implicit semantic relations between two words.
State of the art
Accuracy, Analogies: HYBRID (2003) 45%, VSM (2005) 47%, LRA (2006a) 56.8%, pertinence (2006b) 55.7%, HUMAN 57%
F-measure, Noun-Modifier (5 classes): VSM (2005) 43.2%, LRA (2006a) 54.6%, pertinence (2006b) 50.2%
2008: A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations. Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), Manchester, UK, August 2008, pages 905-912.
1 Introduction
There are too many kinds of semantic relations to supply a special algorithm for each, so we restrict our attention to word pairs that are analogous, synonymous, antonymous, or associated. As far as we know, the algorithm proposed here is the first attempt to deal with all four tasks using a uniform approach.
1 Introduction: idea
Reduce everything to analogy:
Synonymous: X:Y is analogous to the pair levied:imposed.
Antonymous: X:Y is analogous to the pair black:white.
Associated: X:Y is analogous to the pair doctor:hospital.
1 Introduction: Why not WordNet?
WordNet contains all of the needed relations, but a corpus-based algorithm is BETTER than a lexicon: on the 374 multiple-choice SAT analogy questions, WordNet (Veale, 2004) scores 43% while the corpus-based approach (Turney, 2006a) scores 56%. A corpus-based approach also requires less human labor and is easy to extend to other languages.
1 Introduction: experiments
SAT (college entrance test), TOEFL, ESL, and a set of word pairs labeled similar, associated, and both, developed for experiments in cognitive psychology.
2 Algorithm: PairClass
View the task of recognizing word analogies as a problem of classifying word pairs: a standard classification problem for supervised machine learning.
2 Algorithm: resources
Corpus: 5 * 10^10 words, consisting of web pages gathered by a web crawler (Clarke, Charles L.A., 2003). Search engine: Wumpus (http://www.wumpus-search.org/), an efficient search engine for passage retrieval from large corpora, built to study issues that arise in indexing dynamic text collections in multi-user environments.
[Page 90]
2 Algorithm: PairClass (training set & testing set)
- Step 1: generate morphological variations — e.g., mason:stone → masons:stones
- Step 2: search a large corpus for all phrases of the form [0 to 1 words] X [0 to 3 words] Y [0 to 1 words] — e.g., "the mason cut the stone with"
- Step 3: generate patterns from each phrase — e.g., "the X cut * Y with", "* X * the Y *"; an n-word phrase yields 2^(n−2) patterns
- Step 4: reduce the number of patterns — keep the top kN patterns, with k = 20 and N the number of word pairs
- Step 5: generate feature vectors
- Step 6: apply a standard supervised learning algorithm — Weka's SMO with an RBF kernel
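The pattern-generation step can be sketched as follows: each context word in a matched phrase may either be kept or replaced by a wildcard, so an n-word phrase yields 2^(n−2) patterns. This is a minimal Python sketch; the function name and the `*` wildcard syntax are illustrative, not taken from the paper.

```python
from itertools import product

def generate_patterns(phrase, x, y):
    # Replace X and Y by placeholders; each remaining context word may be
    # kept or replaced by the wildcard '*', giving 2^(n-2) patterns for an
    # n-word phrase (sketch of the pattern-generation step).
    words = ['X' if w == x else 'Y' if w == y else w for w in phrase.split()]
    context = [i for i, w in enumerate(words) if w not in ('X', 'Y')]
    patterns = set()
    for keep in product([True, False], repeat=len(context)):
        pat = list(words)
        for flag, i in zip(keep, context):
            if not flag:
                pat[i] = '*'
        patterns.add(' '.join(pat))
    return patterns

pats = generate_patterns("the mason cut the stone with", "mason", "stone")
# a 6-word phrase yields 2^(6-2) = 16 patterns, e.g. "the X cut * Y with"
```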
[Page 91]
PairClass vs. LRA (Turney, 2006a)
- PairClass does not use a lexicon to find synonyms for the input word pairs: a pure corpus-based algorithm can handle synonyms without a lexicon.
- PairClass uses a support vector machine (SVM) instead of a nearest-neighbour (NN) learning algorithm.
- PairClass does not use SVD to smooth the feature vectors. It has been our experience that SVD is not necessary with SVMs.
[Page 92]
- Measure of similarity — PairClass: probability estimates (more useful); Turney (2006): cosine.
- The automatically generated patterns are slightly more general — PairClass: [0 to 1 words] X [0 to 3 words] Y [0 to 1 words]; Turney (2006): X [0 to 3 words] Y.
- The morphological processing in PairClass (Minnen et al., 2001) is more sophisticated than in Turney (2006).
[Page 93]
3 Experiment: SAT Analogies
Use a set of 374 multiple-choice questions from the SAT college entrance exam, treated as a binary classification problem.
[Page 94]
3 Experiment: SAT Analogies
1st DIFFICULTY: no negative examples — the training set consists of one positive example (the stem pair), and the testing set consists of five unlabeled examples (the five choice pairs).
Solution: randomly choose the stem pair of one of the other 373 questions as a negative example, use PairClass to estimate the probability that each testing example is positive, and guess the testing example with the highest probability.
[Page 95]
[Page 96]
3 Experiment: SAT Analogies
2nd DIFFICULTY: the algorithm is very unstable, for lack of examples.
Solution: to increase stability, repeat the learning process 10 times, using a different randomly chosen negative training example each time, and average the 10 probability estimates.
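The two fixes together can be sketched as a small loop. This is a hypothetical sketch: `train_and_score(pos, neg, pair)` stands in for training Weka's SMO on one positive and one negative pair and returning the probability that `pair` is positive.

```python
import random

def solve_sat_question(stem, choices, other_stems, train_and_score, runs=10):
    # Repeat the learning process `runs` times with a different randomly
    # chosen negative example each time, average the probability estimates,
    # and guess the choice pair with the highest average probability.
    totals = [0.0] * len(choices)
    for _ in range(runs):
        negative = random.choice(other_stems)
        for i, pair in enumerate(choices):
            totals[i] += train_and_score(stem, negative, pair)
    return max(range(len(choices)), key=lambda i: totals[i] / runs)
```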
[Page 97]
PairClass: accuracy of 52.1%
[Page 98]
3 Experiment: TOEFL Synonyms
Recognizing synonyms: a set of 80 multiple-choice synonym questions from the TOEFL.
[Page 99]
View it as a binary classification problem
[Page 100]
3 Experiment: TOEFL Synonyms
The 80 questions give 320 word pairs: 80 positive and 240 negative.
Apply PairClass using ten-fold cross-validation: in each random fold, 90% of the pairs are used for training and 10% for testing. For each fold, the model learned from the training set assigns probabilities to the pairs in the testing set. The folds are non-overlapping, so together they cover the whole dataset.
Choice: the one with the highest probability.
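The non-overlapping folds can be sketched as follows (a generic ten-fold split, not Turney's exact code):

```python
import random

def ten_fold_splits(n_items, folds=10, seed=0):
    # Shuffle the item indices and deal them into `folds` disjoint test
    # sets; together the test sets cover the whole dataset exactly once.
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::folds]) for i in range(folds)]

splits = ten_fold_splits(320)  # 80 TOEFL questions x 4 choices = 320 pairs
```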
[Page 101]
PairClass: accuracy of 76.1%
[Page 102]
3 Experiment: Synonyms and Antonyms
a set of 136 ESL practice questions
[Page 103]
3 Experiment: Synonyms and Antonyms
Patterns: Lin et al. (2003) hand-coded two patterns, "from X to Y" and "either X or Y".
Antonyms occasionally appear in a large corpus in one of these two patterns; synonyms very rarely appear in them.
PairClass learns its patterns automatically.
[Page 104]
3 Experiment: Synonyms and Antonyms
RESULT — PairClass (ten-fold cross-validation): accuracy of 75.0%.
Baseline (always guessing the majority class): accuracy of 65.4%.
No comparison with other systems is available for this dataset.
[Page 105]
3 Experiment: Similar, Associated, and Both
Lund et al. (1995) evaluated their corpus-based algorithm for measuring word similarity with word pairs that were labeled similar, associated, or both.
These 144 labeled pairs were originally created for cognitive psychology experiments with human subjects
[Page 106]
3 Experiment: Similar, Associated, and Both
Lund et al. (1995) did not measure accuracy; they showed that their algorithm's similarity scores were correlated with the response times of human subjects in priming tests.
PairClass with ten-fold cross-validation: accuracy of 77.1%.
Baseline: since the three classes are of equal size, both majority-class guessing and random guessing give 33.3%.
[Page 107]
3 Experiment: summary
For the first two experiments, PairClass is not the best, but it performs competitively.
For the second two experiments, PairClass performs significantly above the baselines.
[Page 108]
State of the art

| Year  | Algorithm | Type         | Synonyms (TOEFL) | Analogies (SAT) |
|-------|-----------|--------------|------------------|-----------------|
| 2001  | PMI-IR    | Corpus-based | 73.75%           |                 |
| 2003  | PR        | Hybrid       | 97.50%           |                 |
| 2005  | VSM       | Corpus-based |                  | 47.1%           |
| 2006a | LRA       | Corpus-based |                  | 56.1%           |
| 2006b | PERT      | Corpus-based |                  | 53.5%           |
| 2008  | PairClass | Corpus-based | 76.1%            | 52.1%           |
| Human |           |              | 64.5%            | 57.0%           |
[Page 109]
That's all! o_0
Any Questions?