Summarizing Entity Descriptions for Effective and Efficient Human-centered Entity Linking
Gong Cheng, Danyun Xu, Yuzhong Qu
Websoft Research Group, State Key Laboratory for Novel Software Technology
Nanjing University, China
Entity Linking (EL)
Text:
But with the release of the iPhone 6 and the 6 Plus phablet, Apple has finally gone into big-screen territory, giving Samsung a challenge in the category that the company has been dominating for some time now.
Knowledge base:
iPhone 6
- type: Smartphone
- ...
Samsung Electronics
- type: IT Company
- ...
Candidate entities for "Apple" (?):
Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
Apple (Fruit)
- type: Fruit
- genus: Malus
- ...
Human-centered EL is needed
• for defining gold standards
• for crowdsourced EL
Entity description: a set of property-value pairs (called features)
Entity descriptions are long.
Short, extractive summaries are adequate for human-centered EL.
? … Apple
Apple (Inc.)
- type: Company
- product: iPhone 5
Apple (Corps)
- type: Company
- product: Let It Be
Apple (Fruit)
- type: Fruit
Summary of k candidate entity descriptions: k subsets of features (subject to a length limit)
Summarizing entity descriptions → combinatorial optimization
Optimization goal (1): +characterizing power, -information overlap
• Characterizing power of a feature (ch)
$\mathit{ch}(f) = -\log \dfrac{\text{number of entities having } f}{\text{number of all entities}}$
ch(type: IT company) < ch(product: iPhone 5)
Entities having type: IT company: Apple (Inc.), Samsung Electronics
Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
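The characterizing-power formula can be illustrated with a minimal Python sketch. The toy knowledge base `kb` and the function name `characterizing_power` are assumptions made for illustration (features are modelled as `(property, value)` tuples, and class names are normalized to one spelling); they are not taken from the authors' implementation.

```python
import math

def characterizing_power(feature, descriptions):
    """ch(f) = -log(entities having f / all entities)."""
    having = sum(1 for feats in descriptions.values() if feature in feats)
    if having == 0:
        raise ValueError("feature does not occur in any description")
    return -math.log(having / len(descriptions))

# Hypothetical toy knowledge base built from the running example.
kb = {
    "Apple (Inc.)": {("type", "Company"), ("type", "IT company"),
                     ("product", "iPhone 5")},
    "Samsung Electronics": {("type", "IT company")},
    "Apple (Fruit)": {("type", "Fruit"), ("genus", "Malus")},
    "iPhone 6": {("type", "Smartphone")},
}

ch_it = characterizing_power(("type", "IT company"), kb)      # shared by 2 of 4
ch_prod = characterizing_power(("product", "iPhone 5"), kb)   # held by 1 of 4
print(ch_it < ch_prod)
```

A feature shared by many entities gets a low score, so the rarer `product: iPhone 5` characterizes Apple (Inc.) better than `type: IT company`.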
Optimization goal (1): +characterizing power, -information overlap
• Information overlap between features (ov)
a) logical inference: entailment → maximized ov
ov(type: IT company, type: Company) = MAX
b) string/numerical similarity: ov = max{similarity between properties, similarity between values}
ov(type: IT company, product: iPhone 5) = SMALL
Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
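The two cases of information overlap can be sketched as follows. This is only an illustration under stated assumptions: `MAX_OV = 1.0` stands in for the slide's MAX, the subclass table `SUBCLASS_OF` is hypothetical, and `difflib.SequenceMatcher` is a stand-in for whatever string similarity the authors use.

```python
from difflib import SequenceMatcher

MAX_OV = 1.0  # assumed numeric value for the slide's "MAX"

# Hypothetical subclass axioms: child class -> parent class.
SUBCLASS_OF = {"IT company": "Company"}

def entails(f, g):
    """True if type feature f entails type feature g via the subclass chain."""
    (pf, vf), (pg, vg) = f, g
    if pf != "type" or pg != "type":
        return False
    while vf is not None:
        if vf == vg:
            return True
        vf = SUBCLASS_OF.get(vf)
    return False

def string_sim(a, b):
    """Stand-in string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def overlap(f, g):
    """ov: MAX under entailment, else max of property and value similarity."""
    if entails(f, g) or entails(g, f):
        return MAX_OV
    return max(string_sim(f[0], g[0]), string_sim(f[1], g[1]))

ov_entailed = overlap(("type", "IT company"), ("type", "Company"))
ov_other = overlap(("type", "IT company"), ("product", "iPhone 5"))
print(ov_entailed, ov_other < ov_entailed)
```

Because the overlap is the maximum of the two similarities, any two features with the same property already overlap fully in this sketch; that follows directly from the formula as stated on the slide.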
Optimization goal (1): +characterizing power, -information overlap
• Formulated as k Quadratic Knapsack Problems (QKP)
  • weight of a feature: length
  • profit of a pair of features: defined to maximize characterizing power and to minimize information overlap
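For reference, the generic form of a Quadratic Knapsack Problem is shown below. Here $x_i = 1$ if feature $i$ of one candidate entity is selected, $w_i$ is its length, and $C$ is the length limit. The pairwise profit $p_{ij}$ is left abstract, since its exact definition is not spelled out here; one such problem would be solved per candidate entity.

```latex
\max_{x \in \{0,1\}^n} \; \sum_{i=1}^{n} \sum_{j=i}^{n} p_{ij}\, x_i x_j
\quad \text{subject to} \quad \sum_{i=1}^{n} w_i x_i \le C
```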
Optimization goal (2): +differentiating power
• Differentiating power of a pair of features (di)
a) string/numerical dissimilarity: di = dissimilarity between values * property’s value uniqueness
di(type: IT company, type: Fruit) = LARGE*SMALL = MEDIUM
(Single-valued properties are more useful.)
b) logical inference: entailment → minimized di
di(type: IT company, type: Company) = MIN
Apple (Inc.)
- type: Company
- type: IT company
- product: iPhone 5
- ...
Apple (Fruit)
- type: Fruit
- genus: Malus
- ...
Samsung Electronics
- type: IT Company
- ...
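A rough sketch of differentiating power follows. Several pieces are assumptions rather than the authors' definitions: value uniqueness is approximated as (entities having the property) / (total values of the property), so a single-valued property scores 1.0; `MIN_DI = 0.0` stands in for MIN; the subclass table is hypothetical; and the sketch only compares features that share a property, as in the slide's examples.

```python
from difflib import SequenceMatcher

MIN_DI = 0.0  # assumed numeric value for the slide's "MIN"
SUBCLASS_OF = {"IT company": "Company"}  # hypothetical subclass axiom

def is_subclass(v, w):
    """True if class v equals w or reaches w through the subclass chain."""
    while v is not None:
        if v == w:
            return True
        v = SUBCLASS_OF.get(v)
    return False

def value_uniqueness(prop, kb):
    """Assumed proxy: 1.0 for single-valued properties, lower otherwise."""
    entities = values = 0
    for feats in kb.values():
        n = sum(1 for p, _ in feats if p == prop)
        if n:
            entities += 1
            values += n
    return entities / values if values else 0.0

def dissimilarity(a, b):
    """Stand-in string dissimilarity in [0, 1]."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()

def differentiating_power(f, g, kb):
    (pf, vf), (pg, vg) = f, g
    if pf != pg:
        raise ValueError("this sketch only compares features of one property")
    if pf == "type" and (is_subclass(vf, vg) or is_subclass(vg, vf)):
        return MIN_DI  # entailment: the pair does not tell entities apart
    return dissimilarity(vf, vg) * value_uniqueness(pf, kb)

# Hypothetical toy knowledge base from the running example.
kb = {
    "Apple (Inc.)": {("type", "Company"), ("type", "IT company"),
                     ("product", "iPhone 5")},
    "Apple (Fruit)": {("type", "Fruit"), ("genus", "Malus")},
    "Samsung Electronics": {("type", "IT company")},
}

di_fruit = differentiating_power(("type", "IT company"), ("type", "Fruit"), kb)
di_entailed = differentiating_power(("type", "IT company"), ("type", "Company"), kb)
print(di_fruit > di_entailed)
```

In this toy data `type` is multi-valued (uniqueness 0.75) while `genus` is single-valued (uniqueness 1.0), which mirrors the remark that single-valued properties are more useful.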
Optimization goal (2): +differentiating power
• Formulated as a Quadratic Multidimensional Knapsack Problem (QMKP)
  • weight of a feature: length
  • profit of a pair of features: differentiating power
Optimization goal (3): +relevance to context
• Relevance of a feature to the context of entity mention
  • cosine similarity in the class vector model (cs)
    Vector(context) = {Smartphone, IT company}
    Vector(type: Fruit) = {Fruit}
    Vector(product: iPhone 5) = {Smartphone}
    cs(context, product: iPhone 5) = HIGH
  • class weighting: class frequency – inverse instance frequency (CF-IIF)
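The cosine similarity between class vectors can be sketched as below. Vectors are modelled as dictionaries from class name to weight; the uniform weight 1.0 is an assumption made for illustration and replaces the CF-IIF weighting, whose exact formula is not given here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse class vectors (dict: class -> weight)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Class vectors from the example, with assumed uniform weights.
context = {"Smartphone": 1.0, "IT company": 1.0}
type_fruit = {"Fruit": 1.0}
product_iphone5 = {"Smartphone": 1.0}

cs_iphone = cosine(context, product_iphone5)
cs_fruit = cosine(context, type_fruit)
print(cs_iphone > cs_fruit)
```

The feature `product: iPhone 5` shares the class Smartphone with the context and scores higher than `type: Fruit`, which shares no class with it.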
Optimization goal (3): +relevance to context
• Solved by k Maximal Marginal Relevance (MMR) frameworks
  • Features are iteratively selected.
  • In each iteration, candidate features are re-ranked by
    • relevance to context
    • dissimilarity to selected features
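The iterative selection described above can be sketched as a greedy MMR loop, run once per candidate entity. The trade-off parameter `lam`, the toy relevance scores, and the toy similarity function are all assumptions made for illustration; the length limit is measured in characters of the feature string.

```python
def mmr_select(candidates, relevance, similarity, length_limit, lam=0.5):
    """Greedily pick features that are relevant and dissimilar to those already picked."""
    selected, remaining, used = [], list(candidates), 0
    while remaining:
        # Only features that still fit within the length limit are eligible.
        fitting = [f for f in remaining if used + len(f) <= length_limit]
        if not fitting:
            break

        def score(f):
            redundancy = max((similarity(f, s) for s in selected), default=0.0)
            return lam * relevance(f) - (1 - lam) * redundancy

        best = max(fitting, key=score)
        selected.append(best)
        remaining.remove(best)
        used += len(best)
    return selected

# Hypothetical relevance-to-context scores for features of Apple (Inc.).
scores = {"product: iPhone 5": 0.9, "type: IT company": 0.8, "type: Company": 0.3}

def toy_similarity(f, g):
    """Toy similarity: 1.0 when both features use the same property, else 0.0."""
    return 1.0 if f.split(":")[0] == g.split(":")[0] else 0.0

summary = mmr_select(list(scores), scores.get, toy_similarity, length_limit=40)
print(summary)
```

With a 40-character limit, the two most relevant features fit (17 + 16 characters) and `type: Company` is left out because it would exceed the limit.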
Optimization goal (1+2+3)
• Formulated as a Quadratic Multidimensional Knapsack Problem (QMKP)
Experiments: data sets
• Text corpora (with entity mentions linked to Wikipedia)
  • AQUAINT
  • IITB
• Knowledge base
  • DBpedia
• Gold-standard links
  • entity mentions → Wikipedia articles → DBpedia entities
Experiments: EL tasks
? ..., Apple has finally gone into big-screen territory, …
Apple (Inc.)
- type: Company
- product: iPhone 5
Apple (Corps)
- type: Company
- product: Let It Be
Apple (Fruit)
- type: Fruit
• 1 target entity
  • gold-standard
• 2 (very challenging) noise entities
  • sharing a common name with the target entity, obtained from Wikipedia’s disambiguation pages
Experiments: approaches
• Proposed approaches
  • CHR: +characterizing power, -information overlap
  • DFF: +differentiating power
  • CNT: +relevance to context
  • COMB: CHR+DFF+CNT
• Baseline approaches
  • DESC: returns entire entity descriptions
  • RELIN: a state-of-the-art entity summarization approach for generic purposes
• average length of entity descriptions: 680 characters
• length limit for summaries: 100 characters (14.7%)
Experiments: extrinsic evaluation
• COMB is the only approach that achieved the following statistically significant results on both data sets:
  • accuracy (% of correct answers): COMB = DESC
  • time: COMB < DESC (22-23% faster)
Experiments: intrinsic evaluation
• Statistically significant results on both data sets:
  • human ratings: COMB > CHR > other approaches
Future work
• More extensive experiments
  • to test with not-in-the-list
• Summaries for automatic EL
Questions?