ViSTA-TV Workpackage 6: External Data Service for Metadata Enrichment & Novel TV Recommendations
Uploaded by lora-aroyo
Video Stream Analytics for Viewers and the TV Industry
WP6: External Data Service
WP6 Objectives
• O.6.1: External data service design
   • Analysis of candidate sources
   • Analysis of the extracted data
• O.6.2: External data service in operation
   • Enrich the EPG data
   • Enrich the feature extraction data
   • Discover links between programs for novel recommendations
• O.6.3: Publish data to the Linked Open Data cloud
The external data service aims to support the recommendation process by improving the connectivity between TV programs, which standard EPG metadata alone does not reveal.
ViSTA-TV External Data Service
[Figure: ViSTA-TV external data service: load, enrich, publish]
External Data Service
[Figure: enrichment of an EPG record. Components: EPG, DWH, language detection, concept tagging of the title, synopsis and credits, DBpedia lookup (DBpedia:<LABEL>, rdfs:label, dc:subject, DBpedia:<concept>)]
po:long_synopsis: "In this episode, Larry meets two veterans who each lost a limb in World War 2 to ask how differently we treat today's injured soldiers. Plus a look back at the iconic Green Cross Code films. With Stuart Hall and Miriam Stoppard"
Synopsis concepts: "World War II", "Television Program", "Green Cross Code", "Tom Stoppard", "David Prowse"
po:credit: "Larry Lamb", "Miriam Stoppard", "Stuart Hall"
Candidate DBpedia resources for the credits:
• Larry Lamb: http://dbpedia.org/resource/Larry_Lamb_(newspaper_editor), http://dbpedia.org/resource/Larry_Lamb_(actor)
• Miriam Stoppard: http://dbpedia.org/resource/Miriam_stoppard
• Stuart Hall: http://dbpedia.org/resource/Stuart_Hall_(boxer), http://dbpedia.org/resource/Stuart_Hall_(presenter), http://dbpedia.org/resource/Stuart_Hall_(cultural_theorist), http://dbpedia.org/resource/Stuart_Hall_(musician)
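The credits above show the ambiguity the enrichment step has to resolve: a single name such as "Stuart Hall" matches several DBpedia resources. The Python sketch below is a hypothetical illustration of a naive approach, not the service's actual method. It builds the canonical DBpedia URI for a label and keeps the one candidate whose parenthetical qualifier (e.g. `presenter`) appears among some context keywords. The function names and the keyword rule are assumptions.

```python
from urllib.parse import unquote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def label_to_uri(label: str) -> str:
    """Canonical DBpedia resource URI for a label: spaces become underscores."""
    return DBPEDIA_RESOURCE + label.strip().replace(" ", "_")


def qualifier(uri: str) -> str:
    """Parenthetical qualifier of a DBpedia URI, e.g. 'presenter'; '' if there is none."""
    name = unquote(uri.rsplit("/", 1)[-1])
    if name.endswith(")") and "(" in name:
        return name[name.rindex("(") + 1 : -1].replace("_", " ").lower()
    return ""


def pick_candidate(candidates, context_keywords):
    """Hypothetical naive disambiguation (not the ViSTA-TV method): return the
    single candidate whose qualifier is among the context keywords, else None."""
    keywords = {k.lower() for k in context_keywords if k}  # ignore empty keywords
    matches = [uri for uri in candidates if qualifier(uri) in keywords]
    return matches[0] if len(matches) == 1 else None


stuart_hall = [
    "http://dbpedia.org/resource/Stuart_Hall_(boxer)",
    "http://dbpedia.org/resource/Stuart_Hall_(presenter)",
    "http://dbpedia.org/resource/Stuart_Hall_(cultural_theorist)",
    "http://dbpedia.org/resource/Stuart_Hall_(musician)",
]
print(pick_candidate(stuart_hall, {"presenter", "television"}))
```

With the hypothetical keywords `{"presenter", "television"}` (e.g. derived from the "Television Program" concept), the sketch selects `Stuart_Hall_(presenter)`; when no qualifier or several qualifiers match, it returns `None` and leaves the decision to another step.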
Zattoo Data Service: RDF
[Figure: RDF graph of a Zattoo EPG record]
• po:pid "9966901"
• dc:title "Die allerbeste Sebastian Winkler Show"
• zattoo:episode_title "mit Motsi Mabuse, Lady Bitch Ray und Sarah Brendel" (German: "with Motsi Mabuse, Lady Bitch Ray and Sarah Brendel")
• po:long_synopsis "(Premiere in Einsfestival)" (German: "premiering on Einsfestival")
• three po:credit nodes, each with po:role "guest" and po:alias "Sarah Brendel", "Motsi Mabuse" and "Lady Bitch Ray" respectively
• further properties shown in the graph: po:masterbrand, po:category, po:episode, rdf:type
Prefixes: po = http://purl.org/ontology/po/, zattoo = http://zattoo.com/, dc = http://purl.org/dc/elements/1.1/, rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
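To make the graph concrete, here is a minimal, standard-library-only Python sketch that serializes such a record as N-Triples. The property names and prefixes come from the figure. The subject IRI, the blank-node labels for the credits and the function names are assumptions for illustration. Edges whose targets the figure does not show (po:masterbrand, po:category, po:episode, rdf:type) are omitted.

```python
# Prefixes as listed in the figure (rdf namespace written out in full).
PREFIXES = {
    "po": "http://purl.org/ontology/po/",
    "zattoo": "http://zattoo.com/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def iri(curie: str) -> str:
    """Expand a prefixed name such as 'po:pid' into an N-Triples IRI term."""
    prefix, local = curie.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def literal(text: str) -> str:
    """Plain string literal with backslashes, quotes and line breaks escaped."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def program_to_ntriples(subject, fields, credits):
    """Serialize one EPG record.

    subject: program IRI (hypothetical; the figure does not show one).
    fields:  list of (property, value) pairs with plain-literal values.
    credits: list of (role, alias) pairs, each written as a blank-node po:credit.
    """
    s = f"<{subject}>"
    lines = [f"{s} {iri(prop)} {literal(value)} ." for prop, value in fields]
    for i, (role, alias) in enumerate(credits, start=1):
        node = f"_:credit{i}"  # blank-node label chosen for this sketch
        lines.append(f"{s} {iri('po:credit')} {node} .")
        lines.append(f"{node} {iri('po:role')} {literal(role)} .")
        lines.append(f"{node} {iri('po:alias')} {literal(alias)} .")
    return "\n".join(lines)


record = program_to_ntriples(
    "http://example.org/program/9966901",  # hypothetical subject IRI
    [("po:pid", "9966901"),
     ("dc:title", "Die allerbeste Sebastian Winkler Show"),
     ("zattoo:episode_title", "mit Motsi Mabuse, Lady Bitch Ray und Sarah Brendel"),
     ("po:long_synopsis", "(Premiere in Einsfestival)")],
    [("guest", "Sarah Brendel"), ("guest", "Motsi Mabuse"), ("guest", "Lady Bitch Ray")],
)
print(record)
```

Generating N-Triples by hand keeps the sketch dependency-free. A production service would more likely use an RDF library that handles escaping and serialization formats.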
Enrichment Service
http://eculture2.cs.vu.nl:4000/browse/list_graphs
LOD Linking Service
WP5
Recommendations
LOD for recommendations
• LOD datasets provide additional information that can be used to generate novel TV recommendations.
• The challenge is to identify which links are most useful in the recommendation process.
• We have started analyzing the datasets to identify features that help select the right links.
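As an illustration of how enrichment can connect programs, the hypothetical sketch below links two programs whenever they share an LOD entity. It exposes one simple candidate feature for link selection: the number of programs an entity connects, on the assumption that very common entities make weak recommendation links. This is not the feature set used in the project; the input format, program ids and threshold are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def shared_entity_links(program_entities, max_programs=None):
    """Hypothetical sketch: link programs that share an LOD entity.

    program_entities: dict mapping a program id to the set of entity URIs it was
                      enriched with (assumed input format).
    max_programs:     if set, skip entities attached to more than this many programs,
                      assuming that very common entities give weak links.
    Returns a dict mapping (program_a, program_b) to the sorted shared entities.
    """
    programs_by_entity = defaultdict(set)
    for program, entities in program_entities.items():
        for entity in entities:
            programs_by_entity[entity].add(program)

    links = defaultdict(list)
    for entity, programs in programs_by_entity.items():
        if max_programs is not None and len(programs) > max_programs:
            continue  # entity too common to be informative under our assumption
        for a, b in combinations(sorted(programs), 2):
            links[(a, b)].append(entity)
    return {pair: sorted(entities) for pair, entities in links.items()}


epg = {
    "p1": {"dbpedia:World_War_II", "dbpedia:Green_Cross_Code"},
    "p2": {"dbpedia:World_War_II"},
    "p3": {"dbpedia:World_War_II", "dbpedia:Rococo"},
    "p4": {"dbpedia:Rococo"},
}
print(shared_entity_links(epg, max_programs=2))
```

With `max_programs=2`, the entity shared by three programs is dropped and only the p3/p4 link via `dbpedia:Rococo` remains. This shows how a single frequency feature already changes which links reach the recommender.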
Current & Future Work
1. Continuously adding new sources
2. Continuous improvement of EPG enrichment quality
   • complementary services
   • crowdsourcing
3. Defining an LOD-based notion of serendipity
4. Further studies of LOD patterns and their suitability for recommendations
5. Applying the approach to other domains, e.g. books
1. Adding new sources
Dataset | Objects | Triples | Links to ...
DBpedia | 3.77 mil | 400 mil | 27.2 mil
Freebase | 23 mil | 337 mil | 3.9 mil
BBC | | 60 mil | 43,237
BBC music | | 20 mil | 23,000
NYT | 10,467 | 345,889 | 23,400
MusicBrainz | | 178 mil | 855,754
Flickr | 1.95 mil | 5.61 mil | 3,400,000
LinkedMDB | 503,242 | 6 mil | 162,756
GeoNames | 8 mil | 94 mil | 0
LinkedGeoData | 1 bil | 20 bil | 53,204
2. Data cleaning
Following the grandeur of Baroque, Rococo art is often dismissed as frivolous and unserious, but Waldemar Januszczak disagrees. […] The first episode is about travel in the 18th century and how it impacted greatly on some of the finest art ever made. The world was getting smaller and took on new influences shown in the glorious Bavarian pilgrimage architecture, Canaletto's romantic Venice and the blossoming of exotic designs and tastes all over Europe. The Rococo was art expressing itself in new, exciting ways.
Enrichment errors:
• “Canaletto” → ontology:Location (type misclassification)
• “Rococo” → dbpedia:Rococo_(band) (URI mis-annotation)
Proposed solutions:
• Integration of the results of different text annotators
• Validation through crowdsourcing tasks
Collaboration with: Silvia Giannini
extractor | label | DBpedia ontology class | DBpedia URI
(not named) | Canaletto | ontology:Location | dbpedia:Canaletto
TextRazor | Canaletto | dbpedia-owl:[Artist, Agent, Person] | dbpedia:Canaletto
(not named) | Canaletto | dbpedia-owl:[Artist, Agent, Person] | dbpedia:Canaletto
(not named) | Canaletto | dbpedia-owl:[Artist, Agent, Person] | dbpedia:Canaletto
Outputs listed for the annotators:
• label, NERD ontology class, sameAs link
• label, DBpedia ontology class, DBpedia URI
• label, DBpedia category, Wikipedia page
• label, DBpedia ontology class, DBpedia URI
Type & URI alignment
Voting system: <Canaletto, dbpedia-owl:[Artist, Agent, Person], dbpedia:Canaletto> 3/4
[Figure: program synopsis → automatic integration of text annotators for enrichment → results integration → aggregated enrichment (based on majority vote)]
Validate:
• label relevance
• relevant label types
Analysis of the collected data for:
• voting system validation (also URIs)
• parameter tuning (e.g., complementarity handling)
What if:
• there is a tie?
• the majority of annotators are wrong?
• more granular alignment ontologies are adopted to avoid a missing type (or type owl:Thing)?
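The voting step can be sketched in a few lines of Python. This sketch assumes that each annotator contributes one (ontology classes, URI) pair per label, that class lists are compared as unordered sets, and that "majority" means strictly more than half of the annotators. Under those assumptions a tie, one of the open questions above, yields no decision. The function name is hypothetical.

```python
from collections import Counter


def majority_vote(annotations):
    """Aggregate the annotators' outputs for one label.

    annotations: one (ontology_classes, uri) pair per annotator; the classes are
                 compared as an unordered set.
    Returns (classes, uri, votes, total) when strictly more than half of the
    annotators agree (our reading of "majority"), otherwise None, which covers
    ties and fragmented votes alike.
    """
    ballots = [(frozenset(classes), uri) for classes, uri in annotations]
    if not ballots:
        return None
    (classes, uri), votes = Counter(ballots).most_common(1)[0]
    if 2 * votes <= len(ballots):
        return None
    return classes, uri, votes, len(ballots)


canaletto = [
    (["ontology:Location"], "dbpedia:Canaletto"),
    (["dbpedia-owl:Artist", "dbpedia-owl:Agent", "dbpedia-owl:Person"], "dbpedia:Canaletto"),
    (["dbpedia-owl:Artist", "dbpedia-owl:Agent", "dbpedia-owl:Person"], "dbpedia:Canaletto"),
    (["dbpedia-owl:Person", "dbpedia-owl:Artist", "dbpedia-owl:Agent"], "dbpedia:Canaletto"),
]
classes, uri, votes, total = majority_vote(canaletto)
print(sorted(classes), uri, f"{votes}/{total}")
```

For the four Canaletto annotations, the sketch returns the `dbpedia-owl:[Artist, Agent, Person]` / `dbpedia:Canaletto` pair with 3 of 4 votes, matching the slide. Comparing classes as sets means that annotators listing the same types in a different order still agree.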
LOD & Serendipity
3. LOD-based Serendipity
Collaboration with:
LOD-based Serendipity
Diversity
4. LOD-based Patterns for Diversity
LOD-based method for increasing diversity in recommendations:
• extracts all the patterns from an RDF dataset → clusters are generated and measured for diversity
• fed into two statistical models to determine which semantic patterns can extract subsets of Linked Data that improve diversity in recommendations
• data characterization step to choose the model
• diversity measures, e.g. entropy and semantic similarity
• IMDB & DBpedia: noisiness, size and sparsity of LOD
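The two kinds of diversity measure listed above can be illustrated with a short Python sketch. The first is Shannon entropy over the clusters (or categories) of the recommended items. The second is intra-list diversity, computed as one minus the mean pairwise similarity. Here, Jaccard overlap of each item's feature set (e.g. its DBpedia categories) stands in for semantic similarity. That substitution and all names are assumptions, not the measures used in the study.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy, in bits, of the cluster/category labels of a recommendation list."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def jaccard(a, b):
    """Set overlap |a ∩ b| / |a ∪ b|, used here as a stand-in for semantic similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def intra_list_diversity(feature_sets):
    """1 - mean pairwise Jaccard similarity of the items' feature sets."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(jaccard(a, b) for a, b in pairs) / len(pairs)


print(entropy(["war", "war", "art", "art"]))           # two equally sized clusters
print(intra_list_diversity([{"x", "y"}, {"y", "z"}]))  # one shared feature out of three
```

Entropy rewards lists spread evenly over many clusters, while intra-list diversity rewards items that share few features. A list can score well on one measure and poorly on the other, which is one reason to report both.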
Applied to ‘Books’ Domain