Exploiting Structured Ontology to Organize Scattered Online Opinions
1
Yue Lu, Huizhong Duan, Hongning Wang, ChengXiang Zhai
University of Illinois at Urbana-Champaign
August 24, COLING’2010 Beijing, China
2
Online Opinions: Valuable Resource
Need to organize them in a meaningful way!
…
3
Aspect Summarization
Childhood
Barack Obama is an African American whose father was born in Kenya and got a scholarship to study in America. Born in Honolulu, Hawaii, to Barack Hussein Obama Sr., a Kenyan, and Kansas-born Ann Dunham.
Presidential Campaign
The Obama campaign’s use of new media technologies to revitalize political activism among youth, engage the public at large, and raise enormous, record-breaking sums of money was unlike that of any political campaign to date.
Health Care Reform
Several months after the landmark healthcare bill was passed, America's faith in healthcare increases dramatically. For health insurance brokers, the new health care reform legislation has created uncertainty of …
What are “good aspects”?
1. Concise
2. Relevant to topic
3. Captures major opinions
4. Reasonable order
4
Existing Work
What are “good aspects”?
1. Concise
2. Relevant to topic
3. Captures major opinions
4. Reasonable order
Clustering + Phrase Selection
NA
[Chen&Dumais 2000]
Our idea: use structured ontology
5
Why Using Ontology?
What are “good aspects”?
1. Concise
2. Relevant to topic
3. Captures major opinions
4. Reasonable order
Ontology-based
In addition:
• Great coverage
– 12 million entities, e.g. a person, place, or thing
• Consistently growing
– Anyone can contribute data
Clustering-based
NA
6
Problem Definition
Input:
• Topic = “Abraham Lincoln”
• Ontology (>50 aspects), e.g. Professions, Quotations, Parents, Date of Birth, Place of Death, …
• Online opinion sentences
Output:
• A selected subset of aspects, e.g. Date of Birth, Place of Birth, Place of Death, Books written, Children, Spouse
• Selected matching opinions for each aspect
• Aspects ordered to optimize readability
Two Main Tasks:
– Aspect Selection
– Aspect Ordering
7
Aspect Selection: Task Definition
What are “good aspects”? – 3. Captures major opinions
• Each ontology aspect (e.g. Professions, Parents, …) is used as a query against the collection of opinion sentences; a KL-divergence retrieval model returns the aligned relevant opinions for that aspect
• Task: Select a subset of K aspects
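The alignment step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the lowercase whitespace tokenizer, the query built from the aspect text, and Dirichlet-smoothed sentence models with a hypothetical prior `mu = 100` are all assumptions. Ranking by Σ_w p(w|θ_Q) log p(w|θ_D) is rank-equivalent to ranking by the negative KL divergence −D(θ_Q‖θ_D), because the query-entropy term is the same for every sentence.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def kl_rank(aspect_query, sentences, mu=100.0):
    """Rank opinion sentences for one aspect query (KL-divergence retrieval sketch).

    score(D) = sum_w p(w|Q) * log p(w|D), with a maximum-likelihood query model
    and a Dirichlet-smoothed sentence model (mu is an assumed prior).
    """
    docs = [Counter(tokenize(s)) for s in sentences]
    collection = Counter()
    for d in docs:
        collection.update(d)
    coll_len = sum(collection.values())

    # Query model: keep only words seen in the collection so that p(w|C) > 0.
    query = Counter(w for w in tokenize(aspect_query) if w in collection)
    if not query:
        return list(range(len(sentences)))
    q_len = sum(query.values())

    scores = []
    for d in docs:
        d_len = sum(d.values())
        score = 0.0
        for w, c in query.items():
            p_wc = collection[w] / coll_len
            p_wd = (d[w] + mu * p_wc) / (d_len + mu)
            score += (c / q_len) * math.log(p_wd)
        scores.append(score)
    # Sentence indices, best-aligned first (ties keep their input order).
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)


# Hypothetical opinion sentences for the topic "Abraham Lincoln".
opinions = [
    "Lincoln worked as a lawyer before becoming a politician",
    "Lincoln was born in a log cabin",
    "His wife was Mary Todd",
]
print(kl_rank("Professions lawyer politician", opinions))  # [0, 2, 1]
```

Keeping the top-ranked sentences, or those above a score threshold, as an aspect's aligned relevant opinions is likewise an assumed cut-off.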
8
Aspect Selection: Methods (1) (2)
• Size-based
– Size = number of aligned relevant opinions
– Select the K aspects of largest size
• Opinion Coverage-based
– Reduce redundancy, maximize coverage
– Select K aspects sequentially (max cover problem)
[Figure: example aspects Professions (opinions 1, 2, 3, …), Position (4, 5, 3, …) and Parents (4, 5, 6, …), with sizes 800, 600 and 500; the aspects share opinions]
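A minimal Python sketch of the two selection rules, assuming each aspect's aligned opinions are available as a set of opinion IDs (the IDs below are made up). The opinion-coverage rule uses the standard greedy approximation for max cover; the slides say only that aspects are selected sequentially, so the exact greedy criterion is an assumption.

```python
def size_based(aspects, k):
    """Size-based selection: the k aspects with the most aligned opinions."""
    return sorted(aspects, key=lambda a: len(aspects[a]), reverse=True)[:k]


def opinion_coverage(aspects, k):
    """Greedy max cover: repeatedly add the aspect that covers the most
    opinions not yet covered by the aspects already selected."""
    selected, covered = [], set()
    for _ in range(min(k, len(aspects))):
        remaining = [a for a in aspects if a not in selected]
        best = max(remaining, key=lambda a: len(aspects[a] - covered))
        selected.append(best)
        covered |= aspects[best]
    return selected


# Hypothetical aspect -> aligned-opinion-ID sets; Position overlaps Professions.
aspects = {
    "Professions": {1, 2, 3, 4, 5},
    "Position": {3, 4, 5, 6},
    "Parents": {7, 8, 9},
}
print(size_based(aspects, 2))        # ['Professions', 'Position']
print(opinion_coverage(aspects, 2))  # ['Professions', 'Parents']
```

The example shows the difference: size alone picks Position even though it adds only one new opinion, while the coverage rule prefers Parents.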
9
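Aspect Selection: Method (3) Conditional Entropy-based
• Cluster the opinion collection, e.g. with K-means, into clusters C = {C1, C2, C3, …}
• Choose the aspect subset A = {A1, A2, A3, …} (from aspects such as Professions, Parents, Position, …) whose aligned opinions best predict the clusters, i.e. minimize the conditional entropy of C given A:
$$A^* = \arg\min_{A} H(C \mid A) = \arg\min_{A} \left[ -\sum_{i}\sum_{j} p(A_i, C_j) \log \frac{p(A_i, C_j)}{p(A_i)} \right]$$
• Use a greedy algorithm to approximate the solution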
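A Python sketch of the greedy search, under assumptions the slides leave open: cluster labels are given (the K-means step is omitted), p(A_i, C_j) is estimated from counts of (aspect, cluster) pairs over aligned opinions, and opinions not aligned to any selected aspect are pooled into one extra group so that a subset cannot lower H(C|A) simply by ignoring opinions. All helper names and data are hypothetical.

```python
import math
from collections import Counter

REST = "__rest__"  # hypothetical catch-all group for opinions outside the subset


def conditional_entropy(subset, aspects, cluster_of):
    """H(C|A) estimated from counts of (aspect, cluster) pairs over aligned opinions."""
    joint = Counter()
    covered = set()
    for a in subset:
        for o in aspects[a]:
            joint[(a, cluster_of[o])] += 1
            covered.add(o)
    # Assumption: opinions not aligned to any selected aspect form one extra group.
    for o, c in cluster_of.items():
        if o not in covered:
            joint[(REST, c)] += 1
    total = sum(joint.values())
    marginal = Counter()
    for (a, _), n in joint.items():
        marginal[a] += n
    h = 0.0
    for (a, _), n in joint.items():
        # p(A_i, C_j) * log(p(A_i, C_j) / p(A_i)); the totals cancel inside the log.
        h -= (n / total) * math.log(n / marginal[a])
    return h


def greedy_select(aspects, cluster_of, k):
    """Add, one at a time, the aspect whose inclusion gives the lowest H(C|A)."""
    subset = []
    for _ in range(min(k, len(aspects))):
        candidates = [a for a in aspects if a not in subset]
        best = min(candidates,
                   key=lambda a: conditional_entropy(subset + [a], aspects, cluster_of))
        subset.append(best)
    return subset


# Hypothetical data: opinion ID -> cluster label, aspect -> aligned opinion IDs.
cluster_of = {0: "C1", 1: "C1", 2: "C1", 3: "C1", 4: "C2", 5: "C2", 6: "C2", 7: "C2"}
aspects = {"Professions": {0, 1, 2, 3}, "Quotations": {0, 4}, "Parents": {4, 5, 6, 7}}
print(greedy_select(aspects, cluster_of, 2))  # ['Professions', 'Parents']
```

Quotations mixes both clusters, so adding it raises H(C|A); the two aspects that each match one cluster drive it to zero.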
10
Aspect Ordering: Task Definition
What are “good aspects”? – 4. Reasonable order
• Input: an un-ordered aspect subset, e.g. Date of Birth, Place of Death, Professions, Quotations
• Output: the same aspect subset, ordered
11
Aspect Ordering: Methods
• Ontology Order
– Use the order in which the aspects appear in the ontology
• Coherence Order
– Follow the order of the aligned opinions in their original articles (e.g. blog article, customer review)
12
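Aspect Ordering: Coherence Order
[Figure: original articles containing aligned opinions for two aspects, A1 and A2 (the example uses Date of Birth and Place of Death)]
Coherence(A1, A2) = #(A1 is before A2 in the original articles)
Coherence(A2, A1) = #(A2 is before A1 in the original articles)
In the example, Coherence(A2, A1) > Coherence(A1, A2), so A2 should be placed before A1.
Choose the ordering Π(A) of the aspect subset that maximizes total coherence:
$$\Pi^*(A) = \arg\max_{\Pi(A)} \sum_{A_i \text{ before } A_j \text{ in } \Pi(A)} \mathrm{Coherence}(A_i, A_j)$$
Use a greedy algorithm to approximate the solution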
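A Python sketch of coherence ordering, assuming each original article is reduced to the sequence of aspect labels of its aligned opinions, in reading order. The slides do not say which greedy rule was used; the one below (place next the aspect with the largest outgoing-minus-incoming coherence among the unplaced aspects) is a common heuristic for this kind of pairwise ordering objective, and all names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coherence_counts(articles):
    """coh[(a, b)] = number of times an opinion aligned to aspect a appears
    before one aligned to aspect b within the same article."""
    coh = Counter()
    for labels in articles:
        for i, j in combinations(range(len(labels)), 2):  # always i < j
            a, b = labels[i], labels[j]
            if a != b:
                coh[(a, b)] += 1
    return coh


def greedy_order(aspects, coh):
    """Repeatedly place next the aspect whose outgoing minus incoming coherence,
    over the aspects still unplaced, is largest."""
    remaining = list(aspects)
    order = []
    while remaining:
        best = max(remaining,
                   key=lambda a: sum(coh[(a, b)] - coh[(b, a)] for b in remaining if b != a))
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical articles: aspect labels of aligned opinions, in reading order.
articles = [
    ["Date of Birth", "Education", "Place of Death"],
    ["Date of Birth", "Place of Death"],
    ["Education", "Place of Death"],
]
coh = coherence_counts(articles)
print(greedy_order(["Place of Death", "Education", "Date of Birth"], coh))
# ['Date of Birth', 'Education', 'Place of Death']
```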
13
Experiments: Data Sets
• Ontology
– Freebase
• Opinions
– Blog entries and CNET customer reviews

Statistics         US Presidents   Digital Cameras
# Topics           36              110
# Aspects/Topic    65±26           32±4
# Opinions/Topic   1001±1542       140±249
14
Sample Results: Sony Cybershot DSC-W200
Freebase Aspect | sup | Representative Opinion Sentences
Format: Compact | 13 | Quality pictures in a compact package. / …amazing is that this is such a small and compact unit but packs so much power
Supported Storage Types: Memory Stick Duo | 11 | This camera can use Memory Stick Pro Duo up to 8 GB / Using a universal storage card and cable (c’mon Sony)
Sensor type: CCD | 10 | I think the larger ccd makes a difference. / but remember this is a small CCD in a compact point-and-shoot.
Digital zoom: 2X | 47 | once the digital :smart” zoom kicks in you get another 3x of zoom. / I would like a higher optical zoom, the W200 does a great digital zoom translation...
15
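Aspect Selection: Evaluation Measures
• Aspect Coverage (AC)
• Aspect Precision (AP), based on Jaccard similarity J
• Average Aspect Precision (AAP)
Example: selected aspects A1, A2, A3 (aspects such as Professions, Parents, Position) matched to opinion clusters C1, C2, C3
– J(A1, C2) = 1, J(A2, C2) = 2/4, J(A3, C1) = 2/4
– Per-cluster AP: C1 = 0.5, C2 = 0.75, C3 = 0
– AC = 2/3, AP = 0.625, AAP = 0.42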
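The slides name the measures but do not give their formulas. The Python sketch below is one reading that reproduces the worked example's numbers (AC = 2/3, AP = 0.625, AAP = 0.42); treat the definitions, the best-Jaccard matching rule, and the opinion-ID sets as assumptions rather than the paper's method.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two opinion-ID sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def selection_measures(aspects, clusters):
    """Return (AC, AP, AAP) for selected aspects vs. opinion clusters.

    Assumed reading: each aspect is matched to its best-Jaccard cluster; a
    cluster's AP is the mean Jaccard of the aspects matched to it (0 if none);
    AP averages over covered clusters, AAP over all clusters.
    """
    matched = {c: [] for c in clusters}
    for opinions in aspects.values():
        best = max(clusters, key=lambda c: jaccard(opinions, clusters[c]))
        score = jaccard(opinions, clusters[best])
        if score > 0:
            matched[best].append(score)
    per_cluster = {c: (sum(js) / len(js) if js else 0.0) for c, js in matched.items()}
    covered = [c for c, js in matched.items() if js]
    ac = len(covered) / len(clusters)
    ap = sum(per_cluster[c] for c in covered) / len(covered) if covered else 0.0
    aap = sum(per_cluster.values()) / len(clusters)
    return ac, ap, aap


# Hypothetical opinion-ID sets chosen to give J(A1,C2)=1, J(A2,C2)=2/4, J(A3,C1)=2/4.
clusters = {"C1": {1, 2, 3}, "C2": {4, 5, 6}, "C3": {7, 8}}
aspects = {"A1": {4, 5, 6}, "A2": {4, 5, 10}, "A3": {1, 2, 9}}
ac, ap, aap = selection_measures(aspects, clusters)
print(round(ac, 2), round(ap, 3), round(aap, 2))  # 0.67 0.625 0.42
```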
16
Conditional Entropy-based method provides best trade-off for Aspect Selection
Method        Aspect Coverage   Aspect Precision   Average Aspect Precision
US Presidents
Random        0.5140            0.0933             0.1223
Size-based    0.3108            0.1508             0.0949
Opin-Cover    0.5463            0.0913             0.1316
Cond Ent      0.5770            0.0856             0.1552
Digital Cameras
Random        0.6554            0.0871             0.1271
Size-based    0.6071            0.1077             0.1340
Opin-Cover    0.6998            0.0914             0.1564
Cond Ent      0.7497            0.0789             0.1574
17
Aspect Ordering: Human Labeling
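[Figure: annotators label an aspect subset of size K (drawn from aspects such as Professions, Quotations, Parents, …), ×3]
• Cluster Constraints: aspects that should be presented together, e.g. Parents, Spouse, Children; Party, Positions
• Order Constraints: aspect pairs with a required order, e.g. Date of Birth before Date of Death; Education before Positions; Spouse before Children
• Human Agreement: 37%, 47%, 16%; 89%, 11%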
18
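Aspect Ordering: Measures
Cluster Constraints: {Parents, Spouse, Children}, {Party, Positions}
Constraint pairs: Parents–Spouse, Parents–Children, Children–Spouse, Party–Positions
• Cluster Precision: is this pair presented together in the output? Two of the four pairs are, so Cluster Precision = 2/4 = 0.5
• Cluster Penalty: how many aspects are placed between this pair in the output? The counts sum to 5 over the four pairs, so Cluster Penalty = 5/4 = 1.25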
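A minimal Python sketch of both cluster measures. Every pair inside a cluster-constraint group counts as one constraint; treating a pair as "presented together" when no aspect is placed between its members is an assumption, though it agrees with the example, where each pair marked as together has zero aspects between it. The output order below is hypothetical and merely happens to give the same totals as the slide.

```python
from itertools import combinations


def cluster_measures(output_order, groups):
    """Return (Cluster Precision, Cluster Penalty) of an output aspect order.

    groups: lists of aspects that annotators said belong together.
    Assumption: a pair is "presented together" if no aspect lies between it.
    """
    pos = {a: i for i, a in enumerate(output_order)}
    pairs = [p for g in groups for p in combinations(g, 2)]
    between = [abs(pos[a] - pos[b]) - 1 for a, b in pairs]
    precision = sum(1 for n in between if n == 0) / len(pairs)
    penalty = sum(between) / len(pairs)
    return precision, penalty


groups = [["Parents", "Spouse", "Children"], ["Party", "Positions"]]
# Hypothetical output order (not taken from the slides).
order = ["Parents", "Spouse", "Party", "Positions", "Children"]
print(cluster_measures(order, groups))  # (0.5, 1.25)
```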
19
Aspect Ordering: Evaluation Results
Measures:
• Cluster Precision: higher is better
• Cluster Penalty: lower is better

Cluster Precision
Gold STD   Random Order   Ontology Order   Coherence Order
1          0.2540         0.9355           0.8978
2          0.2335         0.7758           0.8323
3          0.2523         0.4030           0.5545
union      0.3067         0.7268           0.7488

Cluster Penalty
Gold STD   Random Order   Ontology Order   Coherence Order
1          2.0656         0.2957           0.2016
2          2.1790         0.7530           0.5222
3          2.3079         2.1328           1.1611
union      1.9735         1.0720           0.7196
20
Aspect Ordering: Evaluation Results
Order Precision (higher is better)
Gold STD   Random Order   Ontology Order   Coherence Order
1          0.5106         0.7111           0.5444
2          0.4759         0.6759           0.5093
3          0.5294         0.7143           0.8175
union      0.5006         0.6500           0.6833

Order Constraints: is this order pair preserved in the output?
– Date of Birth → Date of Death: 1
– Education → Positions: 0
– Spouse → Children: 1
Order Precision = 2/3 = 0.67
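Order Precision can be sketched in Python as the fraction of (earlier, later) constraint pairs whose order the output preserves. The output order below is hypothetical, chosen so that one of the three constraints is violated, as in the slide's example.

```python
def order_precision(output_order, constraints):
    """Fraction of (earlier, later) constraint pairs whose order the output preserves."""
    pos = {a: i for i, a in enumerate(output_order)}
    kept = sum(1 for earlier, later in constraints if pos[earlier] < pos[later])
    return kept / len(constraints)


constraints = [
    ("Date of Birth", "Date of Death"),
    ("Education", "Positions"),
    ("Spouse", "Children"),
]
# Hypothetical output order: Education/Positions is reversed, the other two pairs hold.
order = ["Date of Birth", "Positions", "Education", "Spouse", "Children", "Date of Death"]
print(round(order_precision(order, constraints), 2))  # 0.67
```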
21
Conclusions
• Novel problem: exploit an ontology for the structured organization of online opinions
– Aspect selection
– Aspect ordering
• Evaluation on US presidents and digital cameras
– Conditional Entropy-based aspect selection
– Coherence ordering
• Future directions
– New aspect suggestion for the ontology
– Better alignment of opinion sentences and aspects
– Ontology + well-written articles
22
Thank you! Questions?