News Semantic Snapshot
GENERATING NEWSCASTS SEMANTIC SNAPSHOTS USING ENTITY EXPANSION
JOSÉ LUIS REDONDO GARCIA
GIUSEPPE RIZZO
LILIA PÉREZ ROMERO
MICHIEL HILDEBRAND
RAPHAËL TRONCY
@peputo
@giusepperizzo
@McHildebrand
@rtroncy
15th International Conference on Web Engineering (ICWE), April 15, 2023
NEWS CONSUMPTION SEMANTIC SNAPSHOT (NSS)
[Figure: News item "Snowden asks Russia for asylum" → Named Entity Expansion → News Semantic Snapshot (NSS)]
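NEWS ENTITY EXPANSION
Web-based, unsupervised, sequential pipeline producing the NSS:
Document collection (20 variations) → Document annotation (1) → Entity filtering (4 variations) → Entity ranking (4 variations) → NSS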
EVALUATION: NEWS ENTITIES GOLD STANDARD
Involving: experts in the news domain and users
Dimensions:
- (1) Video subtitles
- (2) Image in the video
- (3) Text in the video image
- (4) Suggestions of an expert
- (5) Related articles
Play with the data and help us extend it at:
https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation
DOCUMENT COLLECTION (20 variations)
Using Google Custom Search Engine (CSE) [1]
[1] https://cse.google.com/cse/all
Web sites to be crawled:
- Google
- L1: a set of 10 international English-speaking newspapers
- L2: a set of 3 international newspapers used in the gold standard (GS)
Temporal window:
- 1W
- 2W
Annotation filtering:
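A minimal sketch of how one collection variant could be issued against the Google Custom Search JSON API. It is not the authors' code: it assumes one engine ID (`cx`) per site set (Google, L1, L2), reads 1W/2W as the one or two weeks before the news date, and expresses that window with a `date:r:` range sort, which only works if the engine has date metadata for the pages. The key, engine ID, query and date are placeholders, and Schema.org annotation filtering is not modelled.

```python
from datetime import date, timedelta
from urllib.parse import parse_qs, urlencode, urlparse

# Public endpoint of the Google Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_query(query, api_key, engine_id, news_date=None, window_weeks=None):
    """Return the request URL for one document-collection variant.

    engine_id (cx) is ASSUMED to select the crawled site set: a whole-web
    engine for "Google", or engines restricted to the L1 / L2 newspapers.
    window_weeks (1 or 2) is an ASSUMED reading of the 1W / 2W windows:
    the weeks immediately preceding news_date. None means no window.
    """
    params = {"key": api_key, "cx": engine_id, "q": query}
    if news_date is not None and window_weeks is not None:
        start = news_date - timedelta(weeks=window_weeks)
        # Restrict results to [start, news_date] with a date-range sort expression.
        params["sort"] = f"date:r:{start:%Y%m%d}:{news_date:%Y%m%d}"
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


# Illustrative values only: placeholder key/engine and a hypothetical news date.
url = build_cse_query("Snowden asylum Russia", "API_KEY", "ENGINE_ID",
                      news_date=date(2013, 7, 12), window_weeks=2)
params = parse_qs(urlparse(url).query)
print(params["sort"])  # ['date:r:20130628:20130712']
```

Varying `engine_id` and `window_weeks` is what would produce the different collection variants in this reading.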
DOCUMENT ANNOTATION
NER extractors in NERD (*)
(*) Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web, Rizzo et al. (2014)
ENTITY FILTERING (4 variations)
Filtering dimensions:
- F1: NERD type: Person, Organization, Location
- F2: Confidence score > threshold
- F3: Capitalization (example terms: country, president, Obama, asylum)
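The three filters can be sketched as predicates over annotated entities. Assumptions: the four variations are taken to be F1, F2 and F3 plus No_Filter (the setting BS1 uses), the confidence threshold of 0.5 is hypothetical because the slide gives no value, and F3 is read as "keep labels that start with a capital letter", which keeps Obama and drops country, president and asylum.

```python
from dataclasses import dataclass

# NERD types kept by F1 (from the slide).
KEPT_TYPES = {"Person", "Organization", "Location"}
# F2 threshold: the slide gives no value; 0.5 is a HYPOTHETICAL choice.
CONFIDENCE_THRESHOLD = 0.5


@dataclass
class Entity:
    label: str         # surface form returned by the extractor
    nerd_type: str     # NERD type, e.g. "Person"
    confidence: float  # extractor confidence score


FILTERS = {
    "No_Filter": lambda e: True,                          # keep everything
    "F1": lambda e: e.nerd_type in KEPT_TYPES,            # NERD type
    "F2": lambda e: e.confidence > CONFIDENCE_THRESHOLD,  # confidence score
    "F3": lambda e: e.label[:1].isupper(),                # capitalization
}


def apply_filter(entities, variant):
    """Keep only the entities accepted by the chosen filtering variant."""
    keep = FILTERS[variant]
    return [e for e in entities if keep(e)]


# Illustrative annotations for the slide's example terms (types and scores made up).
sample = [
    Entity("country", "Thing", 0.9),
    Entity("president", "Thing", 0.4),
    Entity("Obama", "Person", 0.8),
    Entity("asylum", "Thing", 0.3),
]
print([e.label for e in apply_filter(sample, "F3")])  # ['Obama']
```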
RANKING STRATEGIES (1)
To increase representativeness, leverage entity frequency:
- Freq
- Gaussian
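A sketch of the Freq strategy, assuming it simply counts how often each entity label occurs across the documents collected for one news item. The Gaussian variant is not shown because the slide does not define it; the documents below are hypothetical.

```python
from collections import Counter


def rank_by_frequency(documents, top_n=10):
    """Freq (sketch): rank entity labels by how often they occur across
    all documents collected for one news item.

    documents: one list of extracted entity labels per document.
    """
    counts = Counter(label for doc in documents for label in doc)
    return counts.most_common(top_n)


# Hypothetical annotated documents for the Snowden news item.
docs = [
    ["Snowden", "Russia", "Putin"],
    ["Snowden", "Moscow"],
    ["Snowden", "Russia"],
]
print(rank_by_frequency(docs, top_n=2))  # [('Snowden', 3), ('Russia', 2)]
```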
RANKING STRATEGIES (2)
(4 variations)
POPULARITY
- Based on Google Trends
- Window: w = 2 months
- Threshold: μ + 2σ (2.5%)
EXPERT RULES
- Rules: [ Sel(e), weight ]
- Example:
  - [ Location, 0.48 ]
  - [ Person, 0.74 ]
  - [ Organization, 0.95 ]
  - [ < 2, 0.0 ]
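The two signals on this slide can be sketched as follows. POP assumes a Google Trends series covering the 2-month window and flags the entity when the value on the news date exceeds μ + 2σ of the earlier values; EXP applies the example weights from the slide and reads "[ < 2, 0.0 ]" as "entities seen fewer than two times get weight 0". Both readings, the default weight for other types, and the sample series are assumptions, and since the slide does not say how POP and EXP are combined with Freq, no combination is shown.

```python
from statistics import mean, pstdev

# EXP weights copied from the slide's example rules.
EXPERT_RULES = {"Location": 0.48, "Person": 0.74, "Organization": 0.95}


def is_popular(trend_window):
    """POP (sketch): trend_window holds Google Trends values for the w = 2 months
    window, ending on the news date. ASSUMPTION: the entity is popular when the
    last value exceeds mu + 2*sigma of the preceding values."""
    history, current = trend_window[:-1], trend_window[-1]
    return current > mean(history) + 2 * pstdev(history)


def expert_weight(nerd_type, frequency):
    """EXP (sketch): rule-based weight for an entity."""
    if frequency < 2:  # ASSUMED reading of the "[ < 2, 0.0 ]" rule
        return 0.0
    # Types without a rule get a HYPOTHETICAL neutral weight of 1.0.
    return EXPERT_RULES.get(nerd_type, 1.0)


spike = [20] * 59 + [95]  # hypothetical daily values peaking on the news date
flat = [20] * 60
print(is_popular(spike), is_popular(flat))  # True False
print(expert_weight("Person", 5), expert_weight("Organization", 1))  # 0.74 0.0
```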
EVALUATION: MEASURES
- Mean P/R at N:
  - Most popular
  - Easy to interpret
- Mean Average Precision at N (MAP):
  - Considers ranking
  - Rewards relevant documents at the top positions
- Mean Normalized Discounted Cumulative Gain at N (MNDCG):
  - Handles different levels of document relevance
  - The lower a highly relevant document is ranked, the less useful it is for the user
N = 10
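The per-item versions of the three measures can be sketched as below; each "Mean" measure then averages these scores over all news items in the gold standard. Conventions the slide leaves open are assumptions: AP@N is normalised by min(|relevant|, N), and NDCG uses linear gains with a log2(rank + 1) discount (another common variant uses 2^rel - 1 gains). The ranking and gold standard are hypothetical.

```python
from math import log2


def precision_recall_at_n(ranked, relevant, n=10):
    """P@N and R@N for one news item; relevant is the gold-standard entity set."""
    hits = sum(1 for e in ranked[:n] if e in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    return hits / n, recall


def average_precision_at_n(ranked, relevant, n=10):
    """AP@N: precision averaged over the ranks holding a relevant entity,
    normalised by min(|relevant|, N) (one common convention)."""
    hits, total = 0, 0.0
    for rank, e in enumerate(ranked[:n], start=1):
        if e in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), n)
    return total / denom if denom else 0.0


def ndcg_at_n(ranked, gains, n=10):
    """NDCG@N with graded relevance; gains maps entity -> grade (absent = 0)."""
    dcg = sum(gains.get(e, 0) / log2(rank + 1)
              for rank, e in enumerate(ranked[:n], start=1))
    ideal = sorted(gains.values(), reverse=True)[:n]
    idcg = sum(g / log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0


# Hypothetical ranking and graded gold standard for one news item.
ranked = ["Snowden", "Obama", "Russia", "Moscow"]
gains = {"Snowden": 2, "Russia": 1, "Putin": 1}
relevant = set(gains)
print(precision_recall_at_n(ranked, relevant, n=4))
print(average_precision_at_n(ranked, relevant, n=4))
print(ndcg_at_n(ranked, gains, n=4))
```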
RESULTS (1)
Baselines:
- BS1: Former Entity Expansion implementation (*)
  - Google
  - No temporal window
  - No_Schema.org
  - No_Filter
- BS2: TF-IDF-based function
(*) Describing and Contextualizing Events in TV News Show, Redondo et al. (2014)
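BS2 is only described as a TF-IDF-based function. One way to sketch such a baseline, assuming term frequencies come from the item's own documents and document frequencies from a separate background corpus, uses the smoothed IDF ln((1 + n) / (1 + df)) + 1 that scikit-learn applies by default. The corpus split and all data below are assumptions for illustration.

```python
from collections import Counter
from math import log


def tfidf_rank(item_docs, background_docs, top_n=10):
    """BS2-style baseline (sketch): score entities of one news item by TF-IDF.

    item_docs:       entity-label lists for the documents collected for the item.
    background_docs: entity-label lists for an ASSUMED background corpus,
                     used only to estimate document frequencies.
    """
    tf = Counter(label for doc in item_docs for label in doc)
    df = Counter(label for doc in background_docs for label in set(doc))
    n = len(background_docs)
    # Smoothed IDF so entities unseen in the background still get a finite score.
    scores = {e: tf[e] * (log((1 + n) / (1 + df[e])) + 1) for e in tf}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


item_docs = [["Snowden", "Russia"], ["Snowden", "Obama"]]
background = [["Obama", "Congress"], ["Obama", "budget"], ["Russia", "Ukraine"]]
print([e for e, _ in tfidf_rank(item_docs, background)])  # ['Snowden', 'Russia', 'Obama']
```

Unlike plain Freq, this down-weights entities that are frequent across the background (here Obama), which is the usual motivation for a TF-IDF baseline.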
RESULTS (2)
20 × 4 × 4 = 320 runs
Best run: Google + 2W + Schema.org (collection), F3 (filtering), Freq + POP + EXP (ranking)
CONCLUSIONS & FUTURE WORK
- News Entity Expansion generates the News Semantic Snapshot
- Best score: 0.666 MNDCG@10, better than BS1 and BS2, obtained with:
  - Collection: CSE (Google + 2W + Schema.org)
  - Filtering: F3
  - Ranking: Freq + POP + EXP
What’s next:
- Extend the ground truth
- Supervised approach
- Better exploit semantic connections between entities in the knowledge base (KB)
- Is MNDCG@10 an ideal indicator for assessing NSS quality?