Summarizing Entity Descriptions for Effective and Efficient Human-centered Entity Linking

Summarizing Entity Descriptions for Effective and Efficient Human-centered Entity Linking. Gong Cheng, Danyun Xu, Yuzhong Qu. Websoft Research Group, State Key Laboratory for Novel Software Technology, Nanjing University, China.


Page 1

Summarizing Entity Descriptions for Effective and Efficient Human-centered Entity Linking

Gong Cheng, Danyun Xu, Yuzhong Qu

Websoft Research Group
State Key Laboratory for Novel Software Technology
Nanjing University, China

Page 2

Entity Linking (EL)

Text

But with the release of the iPhone 6 and the 6 Plus phablet, Apple has finally gone into big-screen territory, giving Samsung a challenge in the category that the company has been dominating for some time now.

Knowledge Base

iPhone 6
- type: Smartphone
- ...

Samsung Electronics
- type: IT Company
- ...

Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...

Apple (Fruit)
- type: Fruit
- genus: Malus
- ...

Candidate entities for "Apple": Apple (Inc.) or Apple (Fruit)?

Page 3

Human-centered EL is needed


• for defining gold standard,
• for crowdsourced EL.

Page 4

Entity description: a set of property-value pairs (called features)
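A description of this kind can be held as a plain set of (property, value) pairs. The sketch below is only illustrative: `Feature`, `kb`, and `properties` are hypothetical names, and the toy knowledge base lists only the features visible on the slides (the elided "..." features are left out).

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """One property-value pair of an entity description."""
    prop: str
    value: str


# Hypothetical toy knowledge base: entity name -> set of features.
# Only features that appear on the slides are listed.
kb: dict[str, set[Feature]] = {
    "Apple (Inc.)": {
        Feature("type", "Company"),
        Feature("type", "IT company"),
        Feature("product", "iPhone 5"),
    },
    "Apple (Fruit)": {
        Feature("type", "Fruit"),
        Feature("genus", "Malus"),
    },
}


def properties(description: set[Feature]) -> set[str]:
    """Return the distinct properties used in a description."""
    return {feature.prop for feature in description}
```

A set (rather than a dictionary keyed by property) is used because a property such as `type` may carry several values for the same entity.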


Page 5

Entity descriptions are long.


Page 7

Short, extractive summaries are adequate for human-centered EL.

Apple (Inc.)
- type: Company
- product: iPhone 5

Apple (Corps)
- type: Company
- product: Let It Be

Apple (Fruit)
- type: Fruit

"… Apple" → ?

Summarizing entity descriptions as combinatorial optimization

Summary of k candidate entity descriptions: k subsets of features (subject to a length limit)
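The constraint side of this definition can be sketched as a feasibility check. Two points are assumptions rather than statements from the slides: the length of a feature is taken to be the number of characters in "property: value", and the length limit is applied to each of the k subsets separately. All function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Feature = Tuple[str, str]  # (property, value)


def feature_length(feature: Feature) -> int:
    """Assumed length measure: characters in 'property: value'."""
    prop, value = feature
    return len(f"{prop}: {value}")


def is_valid_summary(
    summaries: Dict[str, Set[Feature]],
    descriptions: Dict[str, Set[Feature]],
    limit: int,
) -> bool:
    """Check that every summary is an extractive subset within the limit."""
    for name, description in descriptions.items():
        summary = summaries.get(name, set())
        if not summary <= description:  # extractive: subset of the description
            return False
        if sum(feature_length(f) for f in summary) > limit:
            return False
    return True


descriptions = {
    "Apple (Inc.)": {("type", "Company"), ("type", "IT company"),
                     ("product", "iPhone 5")},
    "Apple (Fruit)": {("type", "Fruit"), ("genus", "Malus")},
}
summaries = {
    "Apple (Inc.)": {("type", "Company"), ("product", "iPhone 5")},
    "Apple (Fruit)": {("type", "Fruit")},
}
```

Choosing which valid summary is best is the optimization part; this check only describes the search space.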


Page 9

Optimization goal (1): +characterizing power, -information overlap

• Characterizing power of a feature (ch)

$$
\mathrm{ch}(f) = -\log \frac{\text{number of entities having } f}{\text{number of all entities}}
$$

ch(type: IT company) < ch(product: iPhone 5)

(type: IT company is shared by Apple (Inc.) and Samsung Electronics, so it characterizes Apple (Inc.) less than product: iPhone 5 does.)

Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
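The formula for ch can be tried on a toy knowledge base. This is a minimal sketch: the three-entity `kb` is hypothetical, the natural logarithm is an assumption (the slide does not state the base), and the two spellings "IT Company" and "IT company" are treated as the same class.

```python
import math


def characterizing_power(feature, kb):
    """ch(f) = -log(number of entities having f / number of all entities)."""
    having = sum(1 for description in kb.values() if feature in description)
    if having == 0:
        raise ValueError("feature does not occur in the knowledge base")
    return -math.log(having / len(kb))


# Hypothetical toy knowledge base built from the features on the slides.
kb = {
    "Apple (Inc.)": {("type", "Company"), ("type", "IT company"),
                     ("product", "iPhone 5")},
    "Samsung Electronics": {("type", "IT company")},
    "Apple (Fruit)": {("type", "Fruit"), ("genus", "Malus")},
}

ch_it_company = characterizing_power(("type", "IT company"), kb)  # -log(2/3)
ch_iphone5 = characterizing_power(("product", "iPhone 5"), kb)    # -log(1/3)
```

The rarer feature gets the larger value, which matches ch(type: IT company) < ch(product: iPhone 5).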


Page 12

Optimization goal (1): +characterizing power, -information overlap

• Information overlap between features (ov)

a) logical inference
entailment = maximized ov

ov(type: IT company, type: Company) = MAX

b) string/numerical similarity
ov = max{similarity between properties, similarity between values}

ov(type: IT company, product: iPhone 5) = SMALL

Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
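The two cases of ov can be sketched as follows. Several pieces are assumptions: the `SUPERCLASS` table is a hypothetical one-axiom class hierarchy, entailment is only checked between `type` features, `difflib.SequenceMatcher` stands in for whatever string similarity the authors used, and MAX is taken to be 1.0.

```python
from difflib import SequenceMatcher

OV_MAX = 1.0  # assumed value of MAX

# Hypothetical subclass axioms: class -> direct superclass.
SUPERCLASS = {"IT company": "Company"}


def entails(f, g):
    """True if feature f entails feature g (type/subclass reasoning only)."""
    (p1, v1), (p2, v2) = f, g
    if p1 != "type" or p2 != "type":
        return False
    cls = v1
    while cls is not None:  # walk up the class hierarchy
        if cls == v2:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def string_similarity(a, b):
    """Stand-in string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def overlap(f, g):
    """ov(f, g): maximal under entailment, else max of the two similarities."""
    if entails(f, g) or entails(g, f):
        return OV_MAX
    return max(string_similarity(f[0], g[0]), string_similarity(f[1], g[1]))


ov_entailed = overlap(("type", "IT company"), ("type", "Company"))
ov_unrelated = overlap(("type", "IT company"), ("product", "iPhone 5"))
```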

Page 13

Optimization goal (1): +characterizing power, -information overlap

• Formulated as k Quadratic Knapsack Problems (QKP)

weight of a feature: length
profit of a pair of features: defined to maximize characterizing power and to minimize information overlap
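A small exhaustive solver shows what one such QKP looks like. The profit formula from the slide is not reproduced in this transcript, so the profit matrix below is purely an assumption: the diagonal holds made-up ch values and the off-diagonal entries hold made-up negative ov values. Brute force is used only because the example is tiny; it is not the authors' solution method.

```python
from itertools import combinations


def solve_qkp(weights, profit, capacity):
    """Exhaustively solve a small Quadratic Knapsack Problem.

    weights[i]   : weight of item i
    profit[i][j] : profit earned when items i and j are both selected
                   (profit[i][i] is the individual profit of item i)
    Returns (best_value, best_items); the empty selection scores 0.
    """
    n = len(weights)
    best_value, best_items = 0.0, ()
    for size in range(1, n + 1):
        for items in combinations(range(n), size):
            if sum(weights[i] for i in items) > capacity:
                continue  # violates the length limit
            value = sum(profit[i][j] for i in items for j in items if i <= j)
            if value > best_value:
                best_value, best_items = value, items
    return best_value, best_items


# Items: 0 = type: Company, 1 = type: IT company, 2 = product: iPhone 5.
weights = [13, 16, 17]  # characters in "property: value"
profit = [              # illustrative numbers, not taken from the paper
    [0.2, -1.0, -0.1],
    [-1.0, 0.4, -0.1],
    [-0.1, -0.1, 1.1],
]
best_value, best_items = solve_qkp(weights, profit, capacity=35)
```

With these numbers the solver keeps items 1 and 2: the pair of `type` features is penalized for overlapping, and all three features together exceed the capacity.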


Page 16

Optimization goal (2): +differentiating power

• Differentiating power of a pair of features (di)

a) string/numerical dissimilarity
di = dissimilarity between values * property’s value uniqueness

di(type: IT company, type: Fruit) = LARGE*SMALL = MEDIUM

(Single-valued properties are more useful.)

b) logical inference
entailment = minimized di

di(type: IT company, type: Company) = MIN

Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...

Apple (Fruit)
- type: Fruit
- genus: Malus
- ...

Samsung Electronics
- type: IT Company
- ...
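The two rules for di can be sketched in the same spirit. Everything numeric here is an assumption: `UNIQUENESS` holds made-up per-property uniqueness scores (low for the multi-valued `type`, high for a single-valued property), MIN is taken to be 0.0, `difflib.SequenceMatcher` stands in for the value dissimilarity, and features with different properties are given di = 0, which the slides do not specify.

```python
from difflib import SequenceMatcher

# Hypothetical subclass axioms: class -> direct superclass.
SUPERCLASS = {"IT company": "Company"}

# Made-up uniqueness scores; single-valued properties score higher.
UNIQUENESS = {"type": 0.3, "genus": 1.0}


def is_subclass(cls, ancestor):
    """True if cls equals ancestor or is a (transitive) subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def differentiating_power(f, g, uniqueness=UNIQUENESS):
    """di(f, g) for features f and g of two different candidate entities."""
    (p1, v1), (p2, v2) = f, g
    if p1 != p2:
        return 0.0  # assumption: only values of the same property are compared
    if p1 == "type" and (is_subclass(v1, v2) or is_subclass(v2, v1)):
        return 0.0  # entailment: minimized di
    dissimilarity = 1.0 - SequenceMatcher(None, v1.lower(), v2.lower()).ratio()
    return dissimilarity * uniqueness.get(p1, 0.0)


di_fruit = differentiating_power(("type", "IT company"), ("type", "Fruit"))
di_company = differentiating_power(("type", "IT company"), ("type", "Company"))
```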

Page 17

Optimization goal (2): +differentiating power

• Formulated as a Quadratic Multidimensional Knapsack Problem (QMKP)

weight of a feature: length
profit of a pair of features: differentiating power
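For orientation, a general quadratic knapsack program with k capacity constraints can be written as below. The mapping to this slide is an interpretation, not the authors' stated model: the n features of all k candidate entities are pooled, x_i = 1 means feature i is selected, p_{ij} is the differentiating power of features i and j, w_{id} is the length of feature i if it belongs to candidate d and 0 otherwise, and c_d is the length limit of candidate d's summary.

```latex
\begin{aligned}
\max_{x \in \{0,1\}^n} \quad & \sum_{i=1}^{n} \sum_{j=i+1}^{n} p_{ij}\, x_i x_j \\
\text{s.t.} \quad & \sum_{i=1}^{n} w_{id}\, x_i \le c_d, \qquad d = 1, \dots, k
\end{aligned}
```

Under this reading the k summaries cannot be optimized independently, because the profit terms couple features that belong to different candidates.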

Page 18

Optimization goal (3): +relevance to context

• Relevance of a feature to the context of entity mention
  • cosine similarity in the class vector model (cs)

Vector(context) = {Smartphone, IT company}
Vector(type: Fruit) = {Fruit}
Vector(product: iPhone 5) = {Smartphone}

cs(context, product: iPhone 5) = HIGH

• class weighting: class frequency – inverse instance frequency (CF-IIF)
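The cosine similarity over class vectors can be sketched with sparse dictionaries. The slides do not define CF-IIF, so every class simply gets weight 1.0 here as a placeholder; with real CF-IIF weights only the numbers in the dictionaries would change.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse class vectors (class -> weight)."""
    dot = sum(weight * v.get(cls, 0.0) for cls, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Placeholder weights of 1.0 stand in for CF-IIF weights.
context = {"Smartphone": 1.0, "IT company": 1.0}
type_fruit = {"Fruit": 1.0}
product_iphone5 = {"Smartphone": 1.0}

cs_iphone5 = cosine(context, product_iphone5)  # shares the class Smartphone
cs_fruit = cosine(context, type_fruit)         # no shared class
```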



Page 23

Optimization goal (3): +relevance to context

• Solved by k Maximal Marginal Relevance (MMR) frameworks
• Features are iteratively selected.
• In each iteration, candidate features are re-ranked by
  • relevance to context
  • dissimilarity to selected features
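The iterative selection described above can be sketched as a greedy loop for one candidate entity. The trade-off parameter `lam`, the length budget handling, and all scores in the example are assumptions made for illustration; the slides only state the two re-ranking criteria.

```python
def mmr_select(candidates, relevance, similarity, length, budget, lam=0.5):
    """Greedy MMR-style selection under a length budget.

    Each round picks the feature that maximizes
        lam * relevance(f) - (1 - lam) * max similarity to selected features
    among the features that still fit into the budget.
    """
    selected, remaining, used = [], list(candidates), 0

    def score(feature):
        redundancy = max((similarity(feature, s) for s in selected), default=0.0)
        return lam * relevance(feature) - (1 - lam) * redundancy

    while remaining:
        fitting = [f for f in remaining if used + length(f) <= budget]
        if not fitting:
            break
        best = max(fitting, key=score)
        selected.append(best)
        remaining.remove(best)
        used += length(best)
    return selected


# Made-up relevance scores for the features of Apple (Inc.).
RELEVANCE = {
    ("type", "Company"): 0.5,
    ("type", "IT company"): 0.8,
    ("product", "iPhone 5"): 0.7,
}

summary = mmr_select(
    candidates=list(RELEVANCE),
    relevance=RELEVANCE.get,
    # Toy similarity: the two type features overlap fully, others barely.
    similarity=lambda f, g: 1.0 if f[0] == g[0] == "type" else 0.1,
    length=lambda f: len(f"{f[0]}: {f[1]}"),
    budget=35,
)
```

In this run `type: IT company` is picked first, `product: iPhone 5` second because `type: Company` is penalized for redundancy, and `type: Company` no longer fits the budget.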

Page 24

Optimization goal (1+2+3)

• Formulated as a Quadratic Multidimensional Knapsack Problem (QMKP)

Page 25

Experiments: data sets

• Text corpora (with entity mentions linked to Wikipedia)
  • AQUAINT
  • IITB

• Knowledge base
  • DBpedia

• Gold-standard links
  • entity mentions → Wikipedia articles → DBpedia entities
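The last hop of this chain can be sketched as a URL rewrite, relying on the fact that DBpedia resource URIs are derived from English Wikipedia article titles. The function name is hypothetical, and redirects, renamed articles, and encoding differences are ignored, so this is not necessarily how the authors built their gold standard.

```python
from urllib.parse import urlparse

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url: str) -> str:
    """Map an English Wikipedia article URL to the DBpedia URI with the same title."""
    path = urlparse(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix) or len(path) == len(prefix):
        raise ValueError(f"not a Wikipedia article URL: {url}")
    return DBPEDIA_PREFIX + path[len(prefix):]


dbpedia_uri = wikipedia_to_dbpedia("https://en.wikipedia.org/wiki/Apple_Inc.")
```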

Page 26

Experiments: EL tasks

"..., Apple has finally gone into big-screen territory, …" → ?

Apple (Inc.)
- type: Company
- product: iPhone 5

Apple (Corps)
- type: Company
- product: Let It Be

Apple (Fruit)
- type: Fruit

1 target entity
• gold-standard

2 (very challenging) noise entities
• sharing a common name with the target entity, obtained from Wikipedia’s disambiguation pages

Page 27

Experiments: approaches

• Proposed approaches
  • CHR: +characterizing power, -information overlap
  • DFF: +differentiating power
  • CNT: +relevance to context
  • COMB: CHR+DFF+CNT

• Baseline approaches
  • DESC: returns entire entity descriptions
  • RELIN: a state-of-the-art entity summarization approach for generic purposes

• average length of entity descriptions: 680 characters
• length limit for summaries: 100 characters (14.7%)
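The 14.7% figure follows directly from the two lengths on this slide; a quick check, with hypothetical variable names:

```python
average_description_length = 680  # characters, from the slide
summary_length_limit = 100        # characters, from the slide

# Share of an average description that fits into a summary, in percent.
share_percent = round(100 * summary_length_limit / average_description_length, 1)
```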

Page 28

Experiments: extrinsic evaluation

• COMB is the only approach that achieved the following statistically significant results on both data sets:
  • accuracy (% of correct answers): COMB = DESC
  • time: COMB < DESC (22-23% faster)

Page 29

Experiments: intrinsic evaluation

• Statistically significant results on both data sets:
  • human ratings: COMB > CHR > other approaches

Page 30

Future work

• More extensive experiments
  • to test with not-in-the-list

• Summaries for automatic EL

Page 31

Questions?