Cross-Lingual Linking of News Stories using ESA

Copyright 2011 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute, www.deri.ie. Enabling Networked Knowledge. Cross-Lingual Linking of News Stories using ESA. Nitish Aggarwal, Kartik Asooja, Paul Buitelaar, Tamara Polajnar, Jorge Gracia. DERI, NUI Galway, Ireland; OEG, UPM, Madrid, Spain. Tuesday, 18 Dec, 2012. CL!NSS, FIRE-2012.


Transcript of Cross-Lingual Linking of News Stories using ESA

Page 1: Cross-Lingual Linking of  News Stories  using ESA

Copyright 2011 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

Enabling Networked Knowledge

Cross-Lingual Linking of News Stories using ESA

Nitish Aggarwal, Kartik Asooja, Paul Buitelaar, Tamara Polajnar, Jorge Gracia

DERI, NUI Galway, Ireland
OEG, UPM, Madrid, Spain

Tuesday, 18 Dec, 2012
CL!NSS, FIRE-2012

Page 2: Cross-Lingual Linking of  News Stories  using ESA

Overview

Problem Space
Approach
– Search Space Reduction
– Semantic Ranking
Cross-Lingual Explicit Semantic Analysis (CL-ESA)
Evaluations
Conclusion & Future Work

Page 3: Cross-Lingual Linking of  News Stories  using ESA

Problem Space

Cross-lingual news story linking
– Identify the same news articles in different languages
– Cross-lingual plagiarism detection

Data set
– 50 English news stories
– 50K Hindi news stories

Challenge
– Stories are not directly translated
  – Similar keywords in different stories
  – Different keywords in similar stories

Page 4: Cross-Lingual Linking of  News Stories  using ESA

Approach

Search Space Reduction
– News publication dates
  – by taking a window of K days
– Vocabulary overlap
  – Translating English news stories using Google Translate

Semantic Ranking
– Rank the news stories by their semantic relatedness
– CL-ESA semantic relatedness score
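The date-based part of the search space reduction can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the `Story` record, the `within_window` function, and the `days_each_side` parameter are all hypothetical names, and the window is assumed to be symmetric around the publication date of the English story.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Story:
    """Hypothetical record for one news story."""
    story_id: str
    published: date
    text: str


def within_window(query: Story, candidates: list[Story], days_each_side: int) -> list[Story]:
    """Keep candidates published at most `days_each_side` days before or after the query."""
    limit = timedelta(days=days_each_side)
    return [c for c in candidates if abs(c.published - query.published) <= limit]


# Example: a window of 4 days, read here as 2 days before and 2 days after.
query = Story("en-1", date(2012, 12, 18), "english story text")
hindi = [
    Story("hi-1", date(2012, 12, 16), "..."),
    Story("hi-2", date(2012, 12, 21), "..."),
    Story("hi-3", date(2012, 12, 20), "..."),
]
shortlist = within_window(query, hindi, days_each_side=2)
```

Filtering on dates first keeps the more expensive semantic ranking restricted to a small candidate set.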


Page 5: Cross-Lingual Linking of  News Stories  using ESA

Semantic Ranking/Relatedness

Corpus-based Relatedness
– Semantic meaning as a distributional vector
  – Words that occur in similar contexts tend to have similar or related meanings, i.e. the meaning of a word can be defined in terms of its context (Distributional Hypothesis; Harris, 1954)
– Latent Semantic Analysis (LSA)
  – Latent or implicit semantics (unsupervised)
– Explicit Semantic Analysis (ESA)
  – Explicit semantics from explicitly derived concepts (supervised)
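To make the distributional hypothesis concrete, here is a small sketch that builds context-count vectors from raw text. It is an assumption-laden toy: `cooccurrence_vectors` and its `window` parameter are hypothetical, tokenization is plain whitespace splitting, and neither LSA nor ESA is implemented here.

```python
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=1):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


vectors = cooccurrence_vectors(["the cat drinks milk", "the dog drinks water"])
# "cat" and "dog" occur in the same contexts ("the", "drinks"),
# so their context vectors coincide in this toy corpus.
same_contexts = vectors["cat"] == vectors["dog"]
```

Roughly speaking, LSA derives latent dimensions from counts like these, whereas ESA replaces the context dimensions with explicit concepts such as Wikipedia articles.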

Page 6: Cross-Lingual Linking of  News Stories  using ESA

Cross-lingual ESA (CL-ESA)

[Figure: CL-ESA pipeline. Per-language inverted indexes (EN, HI, ES) map each word (Word1 … Wordn) to a weighted concept vector w1*URI1 + w2*URI2 + … + wn*URIn. Term@en and Term@hi are each mapped to such a vector (w11*URI1 + w12*URI2 + … + w1n*URIn), and the vector cosine of the two gives their semantic relatedness.]

Multilingual Wikipedia index
– EN, DE, ES, PT, FR, NL, HI
  – Easily extendable to other languages
– Performed better than CL-latent models
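The relatedness computation in the figure can be sketched as follows. This is an illustrative toy, not the authors' implementation: the `INDEX` contents, the concept ids, and all weights are invented, and the ids stand in for Wikipedia article URIs that are assumed to be aligned across languages. In a real index the weights would come from the Wikipedia corpus rather than being typed in by hand.

```python
import math
from collections import defaultdict

# Toy per-language inverted indexes: word -> {concept_id: weight}.
# Concept ids are placeholders for language-aligned Wikipedia article URIs.
INDEX = {
    "en": {
        "election": {"C_politics": 0.9, "C_india": 0.2},
        "cricket": {"C_sport": 0.8, "C_india": 0.4},
    },
    "hi": {
        "चुनाव": {"C_politics": 0.85, "C_india": 0.3},
        "क्रिकेट": {"C_sport": 0.9, "C_india": 0.35},
    },
}


def concept_vector(text, lang):
    """Sum the concept vectors of all indexed words in `text`."""
    vec = defaultdict(float)
    for token in text.lower().split():
        for concept, weight in INDEX[lang].get(token, {}).items():
            vec[concept] += weight
    return dict(vec)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cl_esa_relatedness(text_a, lang_a, text_b, lang_b):
    """Relatedness of two texts in possibly different languages."""
    return cosine(concept_vector(text_a, lang_a), concept_vector(text_b, lang_b))


same_topic = cl_esa_relatedness("election", "en", "चुनाव", "hi")
other_topic = cl_esa_relatedness("election", "en", "क्रिकेट", "hi")
```

Because both texts are projected into the same concept space, no translation step is needed at comparison time; supporting another language only requires an index over that language's Wikipedia.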

Page 7: Cross-Lingual Linking of  News Stories  using ESA

Experiments

Run 1
– Window of 4 days (2 days before and 2 days after)
– Rank all news stories using CL-ESA

Run 2
– Window of 14 days (7 days before and 7 days after)
– Rank all news stories using modified CL-ESA

Run 3
– English stories were translated into Hindi using Google Translate
– Took the top 1000 Hindi news stories using vocabulary overlap
– Re-rank all news stories using CL-ESA
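The two stages of Run 3 can be sketched as a shortlist step followed by a re-ranking step. Everything here is an assumption made for illustration: the slides do not say how vocabulary overlap was measured, so Jaccard overlap of whitespace tokens is used as a stand-in, and `top_k_by_overlap`, `rerank`, and the `relatedness` callback are hypothetical names. The callback is where a CL-ESA score would be plugged in.

```python
from __future__ import annotations

from typing import Callable


def vocabulary_overlap(text_a: str, text_b: str) -> float:
    """Jaccard overlap of lowercased whitespace tokens (an assumed measure)."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def top_k_by_overlap(translated_query: str, candidates: dict[str, str], k: int) -> list[str]:
    """Return ids of the k candidates sharing the most vocabulary with the translated query."""
    ranked = sorted(
        candidates,
        key=lambda cid: vocabulary_overlap(translated_query, candidates[cid]),
        reverse=True,
    )
    return ranked[:k]


def rerank(query: str, shortlist: list[str], candidates: dict[str, str],
           relatedness: Callable[[str, str], float]) -> list[str]:
    """Re-order the shortlist by a relatedness score, highest first."""
    return sorted(shortlist, key=lambda cid: relatedness(query, candidates[cid]), reverse=True)


# Toy data; single letters stand in for Hindi tokens.
hindi_stories = {"h1": "a b c", "h2": "x y z", "h3": "a b d e"}
shortlist = top_k_by_overlap("a b c", hindi_stories, k=2)
# A dummy scorer (text length) stands in for the CL-ESA relatedness score.
final = rerank("english query", shortlist, hindi_stories, lambda q, t: float(len(t)))
```

In the run described above, k would be 1000 and the query passed to the first stage would be the machine-translated English story.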

Page 8: Cross-Lingual Linking of  News Stories  using ESA

Evaluation: Results

CL!NSS challenge

Page 9: Cross-Lingual Linking of  News Stories  using ESA

Conclusion

Initial approach for cross-lingual linking of news stories
– Bigger window with modified CL-ESA works best
– Translated vocabulary overlap did not work well

Use other ranking scores
– LSA, LDA

Evaluate the separate effect of each component
– Bigger window size vs. ranking function

Page 10: Cross-Lingual Linking of  News Stories  using ESA

Thank You

Questions?