Identifying Comparative Sentences in Text Documents
![Page 1: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/1.jpg)
Identifying Comparative Sentences in Text Documents
Nitin Jindal and Bing Liu
University of Illinois
SIGIR 2006
![Page 2: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/2.jpg)
Introduction
• Comparisons are one of the most convincing ways of evaluation.
• Much of this information is available on the Web in customer reviews, forum discussions, and blogs.
• Useful for product manufacturers and potential customers (to make purchasing decisions).
![Page 3: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/3.jpg)
Comparisons vs. Opinions
• Comparisons can be either objective or subjective.
• Comparative sentences have different language constructs from typical opinion sentences.
• Comparative sentences may contain indicator words such as "better", "longer", and "than".
Car X is much better than Car Y (subjective)
Car X is two feet longer than Car Y (objective)
![Page 4: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/4.jpg)
Related Work
• Linguistics: based on grammars (syntax and semantics) and logic (gradability), which is more for human consumption than for automatic identification.
• Opinion tasks: opinion extraction and classification, which are quite different problems from identifying comparisons.
![Page 5: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/5.jpg)
Comparatives (Linguistic)
• Comparatives are used to express explicit orderings between objects with respect to the degree or amount to which they possess some gradable property.
John is taller than he was
=>
John is tall to degree d
![Page 6: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/6.jpg)
Comparatives (Linguistic)
• Two broad types:
– Metalinguistic Comparatives: compare properties of one entity.
Ronaldo is angrier than upset.
– Propositional Comparatives: compare between two propositions. Three subcategories:
![Page 7: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/7.jpg)
Comparatives (Propositional)
• Nominal Comparatives: (two sets of entities)
Paul ate more grapes than bananas.
• Adjectival Comparatives: (than, as good as)
Ford is cheaper than Volvo.
• Adverbial Comparatives: (occur after a verb phrase)
Tom ate more quickly than Jane.
![Page 8: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/8.jpg)
Superlatives
• Adjectival Superlatives:
John is the tallest person.
• Adverbial Superlatives:
Jill did her homework most frequently.
• Equality: conjunctions like and, or, …
John and Sue both like sushi.
![Page 9: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/9.jpg)
POS involved
• NN: Noun
• NNP: Proper Noun
• VBZ: Verb, present tense, 3rd person singular
• JJ: Adjective
• RB: Adverb
• JJR: Adjective, comparative
• JJS: Adjective, superlative
• RBR: Adverb, comparative
• RBS: Adverb, superlative
![Page 10: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/10.jpg)
Limitations of linguistic classification.
• Non-comparatives with comparative words: many non-comparatives contain comparative words.
In the context of speed, faster means better.
John has to try his best to win this game.
• Limited coverage: many comparatives contain no comparative words.
In market capital, Intel is way ahead of Amd.
Nokia, Samsung, both cell phones perform badly on heat dissipation index.
The M7500 earned a World bench score of 85, whereas Asus A3V posted a mark of 89.
![Page 11: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/11.jpg)
Enhancements
• First limitation: machine learning methods to distinguish comparatives and non-comparatives.
• Second limitation:
– User preferences:
I prefer Intel to Amd = Intel is better than Amd
– Implicit comparatives:
Camera X has 2 MP, whereas camera Y has 5 MP.
![Page 12: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/12.jpg)
Types of Comparatives
• Non-Equal Gradable: greater or less than type, including user preferences.
• Equative (Gradable): equal to type
• Superlative (Gradable): greater or less than all others type
• Non-Gradable:
– A is similar to B; A has feature F1 while B has F2; A has feature F but B doesn’t
![Page 13: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/13.jpg)
Tasks
• Identifying comparative sentences from a given text data set.
• Extracting comparative relations from sentences. (Mining comparative sentences and relations, AAAI 2006)
![Page 14: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/14.jpg)
Class Sequential Rules with Multiple Minimum Supports
• In a class sequential rule (CSR), the sequential pattern is on the left and the class is on the right.
• Selecting patterns by keywords: POS tags (JJR, RBR, JJS, RBS) + words (favor, prefer, win, beat, but, …) + phrases (number one, up against)
• Using keywords alone gives precision P = 32% and recall R = 94%.
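The keyword strategy above can be sketched as a simple filter over POS-tagged text. This is only an illustration, not the authors' implementation: the `word/TAG` input format is assumed, the function names are hypothetical, "win" and "beat" are treated as separate keywords, and the keyword lists contain only the examples named on the slide.

```python
# Sketch of a keyword-based candidate filter (assumed "word/TAG" input).
# The lists below hold only the examples from the slide, not a full lexicon.
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}
KEYWORDS = {"favor", "prefer", "win", "beat", "but"}
PHRASES = [("number", "one"), ("up", "against")]


def parse_tagged(sentence):
    """Split 'word/TAG word/TAG ...' into lowercase (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word.lower(), tag))
    return pairs


def has_keyword(sentence):
    """Return True if the sentence contains any comparative indicator."""
    pairs = parse_tagged(sentence)
    words = [word for word, _ in pairs]
    # 1. Comparative or superlative POS tag.
    if any(tag in COMPARATIVE_TAGS for _, tag in pairs):
        return True
    # 2. Single-word keyword.
    if any(word in KEYWORDS for word in words):
        return True
    # 3. Two-word phrase, matched on adjacent words.
    bigrams = set(zip(words, words[1:]))
    return any(phrase in bigrams for phrase in PHRASES)
```

A filter like this also accepts non-comparative sentences such as "John has to try his best to win this game" (it contains a JJS tag and the keyword "win"), which is consistent with the high recall and low precision reported for keywords alone.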
![Page 15: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/15.jpg)
Support and Confidence
• Using a minimum support of 20% and a minimum confidence of 40%, one of the discovered CSRs is:
![Page 16: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/16.jpg)
Building the Sequence DB
this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD
{NN}{VBZ}{RB}{moreJJR}{NN}{IN}{NN} -> comparative
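The mapping from the tagged sentence to the sequence above can be sketched as follows. Several details are assumptions inferred from this single example rather than stated on the slide: a window of three tokens on each side of the keyword (`radius=3`), keywords detected by POS tag only, and only the first keyword in the sentence used. The function names are hypothetical.

```python
# Sketch: turn a POS-tagged sentence into a sequence centred on a keyword.
# Assumptions (not stated on the slide): radius of 3 tokens, keyword found
# by its POS tag, only the first keyword in the sentence is used.
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}


def build_sequence(tagged_sentence, radius=3):
    """Return the list of items around the first keyword, or None."""
    # "more/JJR".rpartition("/") -> ("more", "/", "JJR"); [::2] keeps word, tag.
    tokens = [tok.rpartition("/")[::2] for tok in tagged_sentence.split()]
    for i, (_, tag) in enumerate(tokens):
        if tag in COMPARATIVE_TAGS:
            start = max(0, i - radius)
            window = tokens[start:i + radius + 1]
            items = []
            for j, (w, t) in enumerate(window, start=start):
                # The keyword keeps its word form; other tokens keep only the tag.
                items.append(w + t if j == i else t)
            return items
    return None


def format_sequence(items):
    """Render items in the slide's {A}{B}... notation."""
    return "".join("{" + item + "}" for item in items)


sentence = ("this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN "
            "at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD")
print(format_sequence(build_sequence(sentence)))
```

Under these assumptions the printed sequence is `{NN}{VBZ}{RB}{moreJJR}{NN}{IN}{NN}`, the same as the left-hand side of the rule on the slide.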
• Sequences that exceed the 60% confidence threshold become rules. Minimum support = 10%.
• 13 manual rules with conjunctions such as whereas/IN, but/CC, however/RB, while/IN, though/IN, although/IN, etc.
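How support and confidence decide whether a candidate pattern becomes a rule can be illustrated with a small sketch. It uses the usual definitions for class sequential rules (support: fraction of all sequences that contain the pattern and carry the class; confidence: fraction of the sequences containing the pattern that carry the class). The toy database, the function names, and the single global minimum support are illustrative assumptions; the multiple-minimum-supports scheme is not modelled here.

```python
# Sketch: support and confidence of a candidate rule  pattern -> label.
# The toy database below is made up for illustration only.

def contains(sequence, pattern):
    """True if pattern is a (not necessarily contiguous) subsequence."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support_confidence(pattern, label, database):
    """Return (support, confidence) of the rule pattern -> label."""
    covered = [cls for seq, cls in database if contains(seq, pattern)]
    matched = sum(1 for cls in covered if cls == label)
    support = matched / len(database)
    confidence = matched / len(covered) if covered else 0.0
    return support, confidence


def is_rule(pattern, label, database, min_sup=0.10, min_conf=0.60):
    """Apply the thresholds quoted on the slide (10% support, 60% confidence)."""
    sup, conf = support_confidence(pattern, label, database)
    return sup >= min_sup and conf >= min_conf


toy_db = [
    (["NN", "VBZ", "RB", "moreJJR", "NN", "IN", "NN"], "comparative"),
    (["NN", "VBZ", "moreJJR", "JJ", "IN", "NN"], "comparative"),
    (["PRP", "VBZ", "moreJJR", "NN"], "non-comparative"),
    (["NN", "VBZ", "JJ"], "non-comparative"),
]
print(support_confidence(["moreJJR", "IN"], "comparative", toy_db))
```

In this toy database the pattern `moreJJR ... IN` occurs in two of four sequences, both comparative, so it would pass both thresholds.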
![Page 17: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/17.jpg)
Classification Learning
• Machine learning methods:
Feature Set = {X | X is the sequential pattern in CSR X → y} ∪ {Z | Z is the pattern in a manual rule Z → y}
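One way to turn this feature set into input for a classifier is a binary vector with one entry per pattern. The sketch below assumes that encoding (1 if the sentence's sequence contains the pattern, 0 otherwise); the slide does not state the encoding, and the example patterns and function names are hypothetical.

```python
# Sketch: encode a sentence's sequence as a binary feature vector,
# one feature per pattern from the CSRs and the manual rules.
# The binary encoding and the patterns below are assumptions.

def contains(sequence, pattern):
    """True if pattern is a (not necessarily contiguous) subsequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def to_feature_vector(sequence, patterns):
    """Return a list with 1 where the sequence contains the pattern, else 0."""
    return [1 if contains(sequence, p) else 0 for p in patterns]


patterns = [
    ("moreJJR", "IN"),   # hypothetical pattern from a mined CSR
    ("whereasIN",),      # hypothetical pattern from a manual rule
    ("NN", "VBZ"),       # hypothetical pattern from a mined CSR
]
sequence = ["NN", "VBZ", "RB", "moreJJR", "NN", "IN", "NN"]
print(to_feature_vector(sequence, patterns))
```

Vectors of this form can then be passed to any standard learner together with the comparative / non-comparative labels.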
![Page 18: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/18.jpg)
Data Preparation
• Consumer reviews on products such as digital cameras, DVD players, MP3 players and cellular phones.
• Forum discussions on topics such as Intel vs. AMD, Coke vs. Pepsi, and Microsoft vs. Google.
• News articles on topics such as automobiles, iPods, and soccer vs. football.
![Page 19: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/19.jpg)
Number of Sentences in Data Sets
![Page 20: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/20.jpg)
Experimental Results (1)
![Page 21: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/21.jpg)
Experimental Results (2)
• Reviews: low recall, high precision -> sentences are short, so patterns are hard to find.
• Articles and forums: high recall, low precision -> sentences are long, so patterns match too easily or too many patterns are found.
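For reference, the precision (P) and recall (R) figures discussed above follow the standard definitions, sketched below. The counts in the example are made up for illustration and are not results from the paper.

```python
# Sketch: standard precision and recall from classification counts.
# The counts used in the example call are hypothetical.

def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for the 'comparative' class."""
    predicted = true_pos + false_pos   # sentences labelled comparative
    actual = true_pos + false_neg      # sentences that are comparative
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# 8 correct detections, 2 false alarms, 8 missed comparatives.
print(precision_recall(true_pos=8, false_pos=2, false_neg=8))
```

The example call has few false alarms but many misses, so precision is high and recall is low, the same pattern described for reviews.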
![Page 22: Identifying Comparative Sentences in Text Documents](https://reader035.fdocuments.in/reader035/viewer/2022081420/5681599e550346895dc6e92a/html5/thumbnails/22.jpg)
Conclusion and Future Work
• Identifying comparative sentences.
• Analyzing different types of comparative sentences.
• Studying how to automatically classify subjective and objective comparisons.