Exploiting Structured Ontology to Organize Scattered Online Opinions

Yue Lu, Huizhong Duan, Hongning Wang, ChengXiang Zhai. University of Illinois at Urbana-Champaign. August 24, COLING’2010, Beijing, China.


Transcript of Exploiting Structured Ontology to Organize Scattered Online Opinions

Page 1: Exploiting Structured Ontology to Organize Scattered Online Opinions

1

Exploiting Structured Ontology to Organize Scattered Online Opinions

Yue Lu, Huizhong Duan, Hongning Wang, ChengXiang Zhai

University of Illinois at Urbana-Champaign

August 24, COLING’2010 Beijing, China

Page 2: Exploiting Structured Ontology to Organize Scattered Online Opinions

2

Online Opinions: Valuable Resource

Need to organize them in a meaningful way!

Page 3: Exploiting Structured Ontology to Organize Scattered Online Opinions

3

Aspect Summarization

Childhood

Barack Obama is an African American whose father was born in Kenya and got a scholarship to study in America. Born in Honolulu, Hawaii, to Barack Hussein Obama Sr., a Kenyan, and Kansas-born Ann Dunham.

Presidential Campaign

The Obama campaign’s use of new media technologies to revitalize political activism among youth, engage the public at large, and raise enormous, record-breaking sums of money was unlike that of any political campaign to date.

Health Care Reform

Several months after the landmark healthcare bill was passed, America's faith in healthcare increases dramatically. For health insurance brokers, the new health care reform legislation has created uncertainty of …

What are “good aspects”?
1. Concise
2. Relevant to topic
3. Captures major opinions
4. Reasonable order

Page 4: Exploiting Structured Ontology to Organize Scattered Online Opinions

4

Existing Work

What are “good aspects”?
1. Concise
2. Relevant to topic
3. Captures major opinions
4. Reasonable order

Clustering + Phrase Selection

NA

[Chen&Dumais 2000]

Our idea: use structured ontology

Page 5: Exploiting Structured Ontology to Organize Scattered Online Opinions

5

Why Use an Ontology?

What are “good aspects”?
1. Concise
2. Relevant to topic
3. Captures major opinions
4. Reasonable order

Ontology-based

In addition:
• Great coverage
  – 12 million entities, e.g. person, place, or thing
• Consistently growing
  – Anyone can contribute data

Clustering-based

NA

Page 6: Exploiting Structured Ontology to Organize Scattered Online Opinions

6

Problem Definition

Input:
• Topic = “Abraham Lincoln”
• Ontology (>50 aspects): Professions, Quotations, Parents, Date of Birth, Place of Death, Books written, Place of Birth, Children, Spouse, …
• Online Opinion Sentences

Output:
• Selected Subset of Aspects
• Selected Matching Opinions
• Ordered to optimize readability

Two Main Tasks:
– Aspect Selection
– Aspect Ordering

Page 7: Exploiting Structured Ontology to Organize Scattered Online Opinions

7

Aspect Selection: Task Definition

What are “good aspects”?
– 3. Captures major opinions

(Diagram: each aspect, e.g. Professions or Parents, serves as the Query; the opinion sentences form the Collection; a KL-divergence retrieval model returns the aligned relevant opinions for that aspect.)

Task: Select a subset of K aspects
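The alignment step named on this slide, a KL-divergence retrieval model, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `rank_sentences`, the whitespace tokenizer, the Dirichlet prior `mu`, and the add-one smoothing of the collection model are not taken from the slides. Ranking by the sum of p(w|Q) log p(w|D) is rank-equivalent to ranking by negative KL divergence, since the query entropy term is constant for a given query.

```python
import math
from collections import Counter

def rank_sentences(query, sentences, mu=10.0):
    """Return sentence indices sorted by a KL-divergence retrieval score, best first.

    Hypothetical helper: tokenization and smoothing choices are assumptions.
    """
    docs = [s.lower().split() for s in sentences]
    q_counts = Counter(query.lower().split())
    q_len = sum(q_counts.values())
    coll = Counter(w for d in docs for w in d)
    coll_len = sum(coll.values())
    vocab = len(coll)

    def score(doc):
        d_counts = Counter(doc)
        total = 0.0
        for w, c in q_counts.items():
            p_wq = c / q_len
            # Add-one smoothed collection model, so unseen query words never give log(0).
            p_wc = (coll[w] + 1) / (coll_len + vocab + 1)
            # Dirichlet-smoothed sentence language model.
            p_wd = (d_counts[w] + mu * p_wc) / (len(doc) + mu)
            total += p_wq * math.log(p_wd)
        return total

    return sorted(range(len(docs)), key=lambda i: score(docs[i]), reverse=True)

# Toy opinion sentences (invented for this example).
sentences = [
    "Lincoln was a lawyer by profession",
    "His parents were Thomas and Nancy Lincoln",
    "The camera has great zoom",
]
profession_rank = rank_sentences("profession lawyer", sentences)
parents_rank = rank_sentences("parents", sentences)
```

Taking the top-ranked sentences for each aspect query would give that aspect's set of aligned opinions; how many to keep is a separate choice the slide does not state.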

Page 8: Exploiting Structured Ontology to Organize Scattered Online Opinions

8

Aspect Selection: Methods (1) (2)

• Size-based
  – Size = Number of aligned relevant opinions
  – Select K aspects of largest size
• Opinion Coverage-based
  – Reduce redundancy, maximum coverage
  – Select K aspects sequentially (max cover problem)

(Diagram: Professions with opinions 1, 2, 3; Position with opinions 3, 4, 5; Parents with opinions 4, 5, 6; … Sizes shown: Size=800, Size=600, Size=500.)
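The two strategies on this slide can be sketched as follows. This is a minimal illustration, assuming each aspect is represented by the set of ids of its aligned opinions; the function names and the toy data are hypothetical. The coverage variant uses the standard greedy heuristic for the max cover problem, which the slide names but does not spell out.

```python
def select_by_size(aligned, k):
    """Size-based: keep the k aspects with the most aligned opinions."""
    return sorted(aligned, key=lambda a: len(aligned[a]), reverse=True)[:k]

def select_by_coverage(aligned, k):
    """Opinion coverage-based: greedily add the aspect covering the most new opinions."""
    covered = set()
    chosen = []
    remaining = dict(aligned)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda a: len(remaining[a] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen

# Toy alignment (invented): aspect -> ids of aligned opinion sentences.
aligned = {
    "Professions": {1, 2, 3, 4},
    "Position": {3, 4, 5},
    "Parents": {6, 7},
}
by_size = select_by_size(aligned, 2)
by_coverage = select_by_coverage(aligned, 2)
```

In this toy case the size-based choice keeps Position even though most of its opinions are already covered by Professions, while the coverage-based choice prefers Parents because it adds two opinions not yet covered.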

Page 9: Exploiting Structured Ontology to Organize Scattered Online Opinions

9

Aspect Selection: Method (3)
Conditional Entropy-based

(Diagram: the Collection of opinions is clustered, e.g. with K-means, into Clusters C = {C1, C2, C3, …}; the aspects Professions, Parents, Position, … form the Aspect Subset A = {A1, A2, A3, …}.)

$$A = \arg\min H(C \mid A) = \arg\min \left[ -\sum_i p(A_i, C_i) \log \frac{p(A_i, C_i)}{p(A_i)} \right]$$

Use a greedy algorithm to approximate the solution
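A minimal sketch of a greedy approximation for the conditional entropy criterion follows. Several choices are assumptions, since the slide gives only the objective: probabilities are estimated from counts over the currently selected aspects only, the greedy step adds whichever aspect keeps H(C|A) lowest, and the names `conditional_entropy`, `greedy_select` and `cluster_of` are hypothetical. Cluster labels are taken as given (for example from K-means).

```python
import math

def conditional_entropy(selected, aligned, cluster_of):
    """H(C|A) over the opinions aligned to the selected aspects.

    Assumption: p(a, c) = n_ac / total, where total counts (aspect, opinion)
    alignments within the selected subset only.
    """
    total = sum(len(aligned[a]) for a in selected)
    if total == 0:
        return 0.0
    h = 0.0
    for a in selected:
        n_a = len(aligned[a])
        counts = {}
        for s in aligned[a]:
            counts[cluster_of[s]] = counts.get(cluster_of[s], 0) + 1
        for n_ac in counts.values():
            # p(a, c) * log(p(a, c) / p(a)), where the ratio simplifies to n_ac / n_a
            h -= (n_ac / total) * math.log(n_ac / n_a)
    return h

def greedy_select(aligned, cluster_of, k):
    """Add, one at a time, the aspect that keeps H(C|A) smallest."""
    selected = []
    candidates = list(aligned)
    while candidates and len(selected) < k:
        best = min(
            candidates,
            key=lambda a: conditional_entropy(selected + [a], aligned, cluster_of),
        )
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data (invented): opinion id -> cluster label, aspect -> aligned opinion ids.
cluster_of = {1: "C1", 2: "C1", 3: "C2", 4: "C2", 5: "C3", 6: "C3"}
aligned = {"Professions": {1, 2}, "Parents": {3, 4}, "Position": {1, 3, 5}}
chosen = greedy_select(aligned, cluster_of, 2)
h_position = conditional_entropy(["Position"], aligned, cluster_of)
```

Under this normalization an aspect whose opinions all fall in one cluster contributes zero entropy, so the sketch favors aspects that agree with the clustering; a different normalization would change which aspects win.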

Page 10: Exploiting Structured Ontology to Organize Scattered Online Opinions

10

Aspect Ordering: Task Definition

(Diagram: an Un-Ordered Aspect Subset containing Date of Birth, Place of Death, Professions, Quotations is turned into an Ordered list of the same aspects.)

What are “good aspects”?
– 4. Reasonable order

Page 11: Exploiting Structured Ontology to Organize Scattered Online Opinions

11

Aspect Ordering: Methods

• Ontology Order
  – Use the order that aspects appear in ontology
• Coherence Order
  – Follow the order of aligned opinions in their original articles (e.g. blog article, customer review)

Page 12: Exploiting Structured Ontology to Organize Scattered Online Opinions

12

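Aspect Ordering: Coherence Order

(Diagram: Original Articles containing opinions aligned to two aspects, A1 and A2; the aspects shown are Date of Birth and Place of Death.)

Coherence(A1, A2): #(an opinion aligned to A1 is before an opinion aligned to A2)
Coherence(A2, A1): #(an opinion aligned to A2 is before an opinion aligned to A1)

So, Coherence(A2, A1) > Coherence(A1, A2)

$$\Pi(A) = \arg\max \sum_{A_i \text{ before } A_j} \mathrm{Coherence}(A_i, A_j)$$

Use a greedy algorithm to approximate the solution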
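One possible greedy approximation of the coherence ordering is sketched below. The slide does not say which greedy scheme was used, so the rule here (repeatedly place next the aspect that most often precedes the aspects still unplaced) is an assumption, as are the function names and the representation of each original article as the sequence of aspect labels of its aligned sentences.

```python
from collections import defaultdict
from itertools import combinations

def coherence_counts(articles):
    """Count how often an opinion of aspect x appears before an opinion of aspect y."""
    coh = defaultdict(int)
    for seq in articles:
        for i, j in combinations(range(len(seq)), 2):
            if seq[i] != seq[j]:
                coh[(seq[i], seq[j])] += 1
    return coh

def greedy_order(aspects, coh):
    """Repeatedly pick the aspect with the largest net 'comes before the rest' count."""
    remaining = list(aspects)
    order = []
    while remaining:
        def gain(a):
            return sum(coh.get((a, b), 0) - coh.get((b, a), 0)
                       for b in remaining if b != a)
        best = max(remaining, key=gain)
        order.append(best)
        remaining.remove(best)
    return order

# Toy articles (invented): aspect labels of aligned sentences, in article order.
articles = [
    ["Date of Birth", "Professions", "Place of Death"],
    ["Date of Birth", "Place of Death"],
    ["Professions", "Place of Death", "Quotations"],
]
coh = coherence_counts(articles)
ordered = greedy_order(
    ["Place of Death", "Quotations", "Date of Birth", "Professions"], coh
)
```

The greedy pass costs a quadratic number of coherence lookups per placement, which is small for aspect subsets of the sizes used here; it does not guarantee the optimal permutation.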

Page 13: Exploiting Structured Ontology to Organize Scattered Online Opinions

13

Experiments: Data Sets

• Ontology
  – Freebase
• Opinions
  – Blog entries and CNET customer reviews

Statistics          US Presidents   Digital Cameras
# Topics            36              110
# Aspects/Topic     65±26           32±4
# Opinions/Topic    1001±1542       140±249

Page 14: Exploiting Structured Ontology to Organize Scattered Online Opinions

14

Sample Results: Sony Cybershot DSC-W200

Columns: Freebase Aspects | sup | Representative Opinion Sentences

Format: Compact | sup 13
  – Quality pictures in a compact package.
  – …amazing is that this is such a small and compact unit but packs so much power

Supported Storage Types: Memory Stick Duo | sup 11
  – This camera can use Memory Stick Pro Duo up to 8 GB
  – Using a universal storage card and cable (c’mon Sony)

Sensor type: CCD | sup 10
  – I think the larger ccd makes a difference.
  – but remember this is a small CCD in a compact point-and-shoot.

Digital zoom: 2X | sup 47
  – once the digital “smart” zoom kicks in you get another 3x of zoom.
  – I would like a higher optical zoom, the W200 does a great digital zoom translation...

Page 15: Exploiting Structured Ontology to Organize Scattered Online Opinions

15

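Aspect Selection: Evaluation Measures

• Aspect Coverage (AC)
• Aspect Precision (AP) = Jaccard similarity
• Average Aspect Precision (AAP)

(Diagram: aspects A1, A2, A3, labelled Professions, Parents, Position, matched against clusters C1, C2, C3.)

J(A1,C2) = 1
J(A2,C2) = 2/4
J(A3,C1) = 2/4

Per-cluster values shown: AP=0.5, AP=0.75, AP=0

Resulting values shown, in the order of the three measures above: 2/3, 0.625, 0.42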
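The numbers on this slide can be reproduced with the sketch below, under one reading of the measures that is an assumption rather than the authors' stated definition: each aspect is assigned to the cluster it overlaps most by Jaccard similarity; AC is the fraction of clusters that receive at least one aspect; a cluster's AP is the mean Jaccard of the aspects assigned to it (0 if none); the 0.625 figure is the mean of per-cluster AP over covered clusters; and AAP is the mean over all clusters. The sentence ids are invented so that the Jaccard values match the slide.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)

def evaluate(aspects, clusters):
    """Return (coverage, AP over covered clusters, AAP over all clusters).

    Assumed definitions; see the lead-in for what is and is not from the slide.
    """
    assigned = {}
    for sents in aspects.values():
        best = max(clusters, key=lambda c: jaccard(sents, clusters[c]))
        assigned.setdefault(best, []).append(jaccard(sents, clusters[best]))
    per_cluster = {
        c: (sum(assigned[c]) / len(assigned[c]) if c in assigned else 0.0)
        for c in clusters
    }
    coverage = len(assigned) / len(clusters)
    ap_covered = sum(per_cluster[c] for c in assigned) / len(assigned)
    aap = sum(per_cluster.values()) / len(clusters)
    return coverage, ap_covered, aap

# Invented sentence ids chosen so that J(A1,C2)=1, J(A2,C2)=2/4, J(A3,C1)=2/4.
clusters = {"C1": {5, 6, 7}, "C2": {1, 2}, "C3": {9, 10}}
aspects = {"A1": {1, 2}, "A2": {1, 2, 3, 4}, "A3": {5, 6, 8}}
coverage, ap_covered, aap = evaluate(aspects, clusters)
```

With these sets the per-cluster values are 0.5 for C1, 0.75 for C2 and 0 for C3, which matches the three AP values printed on the slide.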

Page 16: Exploiting Structured Ontology to Organize Scattered Online Opinions

16

Conditional Entropy-based method provides best trade-off for Aspect Selection

US Presidents
Methods       Aspect Coverage   Aspect Precision   Average Aspect Precision
Random        0.5140            0.0933             0.1223
Size-based    0.3108            0.1508             0.0949
Opin-Cover    0.5463            0.0913             0.1316
Cond Ent      0.5770            0.0856             0.1552

Digital Cameras
Methods       Aspect Coverage   Aspect Precision   Average Aspect Precision
Random        0.6554            0.0871             0.1271
Size-based    0.6071            0.1077             0.1340
Opin-Cover    0.6998            0.0914             0.1564
Cond Ent      0.7497            0.0789             0.1574

Page 17: Exploiting Structured Ontology to Organize Scattered Online Opinions

17

Aspect Ordering: Human Labeling

(Diagram: an aspect subset of size = K, e.g. Professions, Quotations, Parents, is labelled by humans; the diagram marks “X 3” three times.)

Cluster Constraints (aspects shown): Parents, Spouse, Children; Party, Positions

Order Constraints (pairs shown):
– Date of Birth, Date of Death
– Education, Positions
– Spouse, Children

Human Agreement (values shown): 37%, 47%, 16%; 89%, 11%

Page 18: Exploiting Structured Ontology to Organize Scattered Online Opinions

18

Aspect Ordering: Measures

Cluster Constraints

Aspects shown: Parents, Spouse, Party, Positions, Children

Constraint pairs:
– Parents, Spouse
– Parents, Children
– Children, Spouse
– Party, Positions

Cluster Precision = 0.5
  Is this pair presented together in the output? Values shown: 1, 0, 1, 0

Cluster Penalty = 1.25
  # aspects placed between this pair in the output? Values shown: 0, 2, 0, 3
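The two cluster measures can be checked with a short sketch. It assumes that the aspect list on this slide (Parents, Spouse, Party, Positions, Children) is the output order being scored, that Cluster Precision is the fraction of constraint pairs placed adjacent to each other, and that Cluster Penalty is the mean number of aspects placed between the two members of a pair. Under those assumptions the slide's values of 0.5 and 1.25 are reproduced; the function name is hypothetical.

```python
def cluster_measures(output_order, cluster_pairs):
    """Return (cluster precision, cluster penalty) for an ordering.

    Assumed definitions: precision = share of pairs that are adjacent,
    penalty = average number of aspects placed between the two members.
    """
    pos = {aspect: i for i, aspect in enumerate(output_order)}
    gaps = [abs(pos[a] - pos[b]) - 1 for a, b in cluster_pairs]
    precision = sum(1 for g in gaps if g == 0) / len(gaps)
    penalty = sum(gaps) / len(gaps)
    return precision, penalty

# Assumed output order (read off the slide) and the four constraint pairs.
output_order = ["Parents", "Spouse", "Party", "Positions", "Children"]
cluster_pairs = [
    ("Parents", "Spouse"),
    ("Parents", "Children"),
    ("Children", "Spouse"),
    ("Party", "Positions"),
]
precision, penalty = cluster_measures(output_order, cluster_pairs)
```

Here two of the four pairs are adjacent, and the gaps are 0, 3, 2 and 0 aspects, which is the same multiset of values as printed on the slide.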

Page 19: Exploiting Structured Ontology to Organize Scattered Online Opinions

19

Aspect Ordering: Evaluation Results

Measures:

Cluster Precision (Higher is better)
Gold STD   Random Order   Ontology Order   Coherence Order
1          0.2540         0.9355           0.8978
2          0.2335         0.7758           0.8323
3          0.2523         0.4030           0.5545
union      0.3067         0.7268           0.7488

Cluster Penalty (Lower is better)
Gold STD   Random Order   Ontology Order   Coherence Order
1          2.0656         0.2957           0.2016
2          2.1790         0.7530           0.5222
3          2.3079         2.1328           1.1611
union      1.9735         1.0720           0.7196

Page 20: Exploiting Structured Ontology to Organize Scattered Online Opinions

20

Aspect Ordering: Evaluation Results

Order Precision (Higher is better)
Gold STD   Random Order   Ontology Order   Coherence Order
1          0.5106         0.7111           0.5444
2          0.4759         0.6759           0.5093
3          0.5294         0.7143           0.8175
union      0.5006         0.6500           0.6833

Order Constraints (pairs shown):
– Date of Birth, Date of Death
– Education, Positions
– Spouse, Children

Is this order pair preserved in the output? Values shown: 1, 0, 1

Order Precision = 0.67
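Order Precision can be sketched as the fraction of order constraints whose first aspect appears before its second aspect in the output. The output order below is hypothetical, chosen only so that two of the three pairs are preserved as in the slide's example; the slide does not show the actual output.

```python
def order_precision(output_order, order_pairs):
    """Fraction of (first, second) constraints kept in that order in the output."""
    pos = {aspect: i for i, aspect in enumerate(output_order)}
    preserved = [pos[first] < pos[second] for first, second in order_pairs]
    return sum(preserved) / len(preserved)

# Hypothetical output order; the three constraint pairs are from the slide.
output_order = [
    "Date of Birth", "Positions", "Education",
    "Spouse", "Children", "Date of Death",
]
order_pairs = [
    ("Date of Birth", "Date of Death"),
    ("Education", "Positions"),
    ("Spouse", "Children"),
]
op = order_precision(output_order, order_pairs)
```

In this ordering only the Education, Positions pair is reversed, so two of three constraints hold, which rounds to the 0.67 shown on the slide.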

Page 21: Exploiting Structured Ontology to Organize Scattered Online Opinions

21

Conclusions

• Novel Problem: exploit ontology for structured organization of online opinions
  – Aspect selection
  – Aspect ordering
• Evaluation: US presidents and digital cameras
  – Conditional Entropy-based aspect selection
  – Coherence ordering
• Future Directions:
  – New aspect suggestion for ontology
  – Better alignment of opinion sentences and aspects
  – Ontology + well-written articles

Page 22: Exploiting Structured Ontology to Organize Scattered Online Opinions

22

Thank you! & Questions?