Exploratory Search Based on Linked Open Data

The Journey is the Reward - Explorative Semantic Search based on Linked Open Data. Sophia Antipolis, 9 October 2014. Dr. Harald Sack, Hasso-Plattner-Institute for IT Systems Engineering, University of Potsdam. Thursday, 9 October 2014.


Exploratory Search and Intelligent Recommendations, CrEDIBLE 2014 Workshop, Sophia-Antipolis, France, 08-10.10.2014

Transcript of Exploratory Search Based on Linked Open Data

Page 1: Exploratory Search Based on Linked Open Data

The Journey is the Reward - Explorative Semantic Search based on Linked Open Data

Sophia Antipolis, 9 October 2014

Dr. Harald Sack
Hasso-Plattner-Institute for IT Systems Engineering, University of Potsdam

Page 2: Exploratory Search Based on Linked Open Data

Hasso Plattner Institute for IT Systems Engineering
Semantic Technologies & Multimedia Retrieval Research Group

Page 3: Exploratory Search Based on Linked Open Data

Hasso Plattner Institute for IT Systems Engineering
Semantic Technologies & Multimedia Retrieval Research Group

• Research Topics
  □ Semantic Web Technologies
  □ Knowledge Discovery
  □ Ontological Engineering
  □ Multimedia Analysis & Retrieval
  □ Social Networking
  □ Data/Information Visualization

• Research Projects:

Page 4: Exploratory Search Based on Linked Open Data

Overview

(1) Search & Retrieval
    and why we are not always content with it...
(2) Semantic Analysis
    to better "understand" the content
(3) Explorative Semantic Search
    switching from "retrieval" to "discovery"
(4) Intelligent Recommendation
    variatio delectat - variation is delectable

The Journey is the Reward - Explorative Semantic Search based on Linked Open Data

Page 5: Exploratory Search Based on Linked Open Data

Search & Retrieval today...


Page 6: Exploratory Search Based on Linked Open Data

Autocompletion

Google Knowledge Graph


Page 7: Exploratory Search Based on Linked Open Data

Query by Example

Visual Analysis

Recommendations

Page 8: Exploratory Search Based on Linked Open Data

The Ordinary Archive is a Small World...

Jules Verne


Page 9: Exploratory Search Based on Linked Open Data

Information Retrieval Paradigm

(Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York 1983)

• Set of Documents (files of records) ➞ Indexing ➞ Document Index
• Set of Queries (information requests) ➞ Query Formulation ➞ Query
• Query and Document Index are compared based on "similarity" (String Matching)
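A minimal sketch of this paradigm, assuming a whitespace tokenizer and three made-up toy documents (none of this is from the talk): documents are indexed, the query is split into terms, and both are compared purely by string matching. The function names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict

def build_index(documents):
    """Indexing: map each lowercased term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank documents by the number of query terms they literally contain."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Toy documents, invented for illustration only.
documents = {
    "d1": "a trip to the moon silent movie",
    "d2": "apollo 11 spaceflight landed on the moon",
    "d3": "le voyage dans la lune",
}
index = build_index(documents)
results = search(index, "moon spaceflight impact silent movie")
print(results)
```

Document `d3` shares no literal term with the query, so pure string matching never returns it, however related its content may be.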

Page 10: Exploratory Search Based on Linked Open Data

Let's assume you are looking for something and you don't know how to phrase your search correctly...

Page 11: Exploratory Search Based on Linked Open Data

moon


Page 12: Exploratory Search Based on Linked Open Data

moon spaceflight


Page 13: Exploratory Search Based on Linked Open Data

moon spaceflight impact


Page 14: Exploratory Search Based on Linked Open Data

moon spaceflight impact silent movie


Page 15: Exploratory Search Based on Linked Open Data

moon spaceflight impact silent movie


Page 16: Exploratory Search Based on Linked Open Data

• Sometimes simple query matching against text content or metadata alone is not sufficient to fulfill the user's information needs

• What is often missing are the relational connections and circumstances, i.e. contextual information is needed to answer the query

• To achieve this, the content must be "understood"

Semantic Analysis

Page 17: Exploratory Search Based on Linked Open Data

Overview

(1) Search & Retrieval
    and why we are not always content with it...
(2) Semantic Analysis
    to better "understand" the content
(3) Explorative Semantic Search
    switching from "retrieval" to "discovery"
(4) Intelligent Recommendation
    variatio delectat - variation is delectable

Page 18: Exploratory Search Based on Linked Open Data

How to Determine the Meaning of (Meta)data?

(Meta)data Source
• Authoritative
  • structured data
  • semi-structured data
  • natural language text
• Non-authoritative
  • (free) user tags and comments
  • restricted vocabularies
• (Media) Analysis
  • low level features
  • high level features

Semantic Analysis: reliability, context, pragmatics, location dependency, accuracy, time dependency, level of abstraction

Page 19: Exploratory Search Based on Linked Open Data

From Raw (Text) Data to Semantic Entities

Neil Armstrong (as tag, text, or image annotation)

Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: "Houston, Tranquility Base here. The Eagle has landed."

Entities
• Neil Armstrong rdf:type Astronaut

Ontologies
• Astronaut rdfs:subClassOf Person
• Person dbpedia-owl:birth_name string
• Person dbpedia-owl:birth_date date
• SpaceMission dbpedia-owl:crewMember Astronaut
• SpaceMission rdfs:subClassOf Event
• SpaceMission dbpedia-owl:crewSize integer

Page 20: Exploratory Search Based on Linked Open Data

Web of Data = Linked Open Data

• Neil Armstrong rdf:type Astronaut
• Astronaut rdfs:subClassOf Person
• Person dbpedia-owl:birth_name string
• Person dbpedia-owl:birth_date date
• SpaceMission dbpedia-owl:crewMember Astronaut
• SpaceMission rdfs:subClassOf Event
• SpaceMission dbpedia-owl:crewSize integer
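A small sketch of how such statements can be held as subject-predicate-object triples, assuming plain strings in place of real IRIs. The helpers `superclasses` and `types_of` are hypothetical and only follow rdfs:subClassOf links; they are not a full RDFS reasoner.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Triples taken from the slide labels; plain strings stand in for IRIs.
triples = {
    ("Neil Armstrong", RDF_TYPE, "Astronaut"),
    ("Astronaut", SUBCLASS_OF, "Person"),
    ("SpaceMission", SUBCLASS_OF, "Event"),
}

def superclasses(cls, triples):
    """Return cls together with every class reachable via rdfs:subClassOf."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen

def types_of(entity, triples):
    """Collect the asserted classes of an entity plus their superclasses."""
    result = set()
    for s, p, o in triples:
        if s == entity and p == RDF_TYPE:
            result |= superclasses(o, triples)
    return result

armstrong_types = types_of("Neil Armstrong", triples)
print(sorted(armstrong_types))
```

The entity is only asserted to be an Astronaut; that it is also a Person follows from the ontology.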

Page 21: Exploratory Search Based on Linked Open Data

Named Entity Resolution

Neil Armstrong (as text or image annotation)

Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: "Houston, Tranquility Base here. The Eagle has landed."

Page 22: Exploratory Search Based on Linked Open Data

Named Entity Resolution

(1) Determine possible Entity Candidates

Text: Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: "Houston, Tranquility Base here. The Eagle has landed."

• linguistic analysis (POS tagging)
• n-gram analysis
• normalization (stemming)
• encoding and spelling
• language dependent spellings
• abbreviations & acronyms
• type dependent spellings
• alternative names and synonyms
• fuzzy string mapping
• ...
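A minimal sketch of two of these steps, n-gram analysis and normalization, assuming lowercasing as the only normalization and a tiny hand-made label dictionary. The dictionary content and the function names `ngrams` and `candidates` are hypothetical, not the system described in the talk.

```python
import re

def ngrams(tokens, max_n=3):
    """Yield all contiguous token n-grams up to length max_n as strings."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def candidates(text, labels, max_n=3):
    """Map every n-gram that matches a known label to its candidate entities."""
    tokens = re.findall(r"\w+", text.lower())  # normalization: lowercase, drop punctuation
    found = {}
    for gram in ngrams(tokens, max_n):
        if gram in labels:
            found[gram] = labels[gram]
    return found

# Hypothetical label dictionary: surface form -> possible entities.
labels = {
    "neil armstrong": ["Neil_Armstrong"],
    "houston": ["Houston", "Whitney_Houston"],
    "eagle": ["Eagle", "Apollo_Lunar_Module"],
    "tranquility base": ["Tranquility_Base"],
}
text = "Neil Armstrong radios: Houston, Tranquility Base here. The Eagle has landed."
found = candidates(text, labels)
print(found)
```

Ambiguous terms such as "houston" yield several candidates, which is why the later filtering and disambiguation steps are needed.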

Page 23: Exploratory Search Based on Linked Open Data

Named Entity Resolution

(2) Subsequent Filtering of Entity Candidates

Text: Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: "Houston, Tranquility Base here. The Eagle has landed."

• Named Entity Tagging
  • Persons
  • Locations
  • Organizations
  • Time
  • Date
  • Money
  • ...

Page 24: Exploratory Search Based on Linked Open Data

(3) Disambiguation of Correct Entity

• Which entity candidate to choose depends on the context
• Context Analysis
  • takes into account Ambiguity, Accuracy, and Reliability of source data and mapping

Context Item
• Context Dimensions: Temporal Context, Spatial Context, Social Context
• Contextual Description: Class Diversity, Level of Structure, Source Reliability, Source Diversity

Contextual Description ➞ influences ➞ Ambiguity, Accuracy ➞ determines ➞ Relevance

N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, ESWC 2013

Page 25: Exploratory Search Based on Linked Open Data

(3) Disambiguation of Correct Entity

• Determine candidates for all entities within the given context

Neil Armstrong, Tranquility Base, Houston, Earth, Commander, Eagle, Mission Control

Page 26: Exploratory Search Based on Linked Open Data

(3) Disambiguation of Correct Entity

• Look for existing connections/relations among entity candidates within the given context

Tranquility Base, Neil Armstrong, Mission Control

Page 27: Exploratory Search Based on Linked Open Data

(3) Disambiguation of Correct Entity

• Link Graph Analysis

Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: "Houston, Tranquility Base here. The Eagle has landed."

Page 28: Exploratory Search Based on Linked Open Data

(3) Disambiguation of Correct Entity

• Link Graph Analysis

Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: "Houston, Tranquility Base here. The Eagle has landed."

Tranquility Base, Neil Armstrong, Mission Control, Houston, Eagle, Earth

Page 29: Exploratory Search Based on Linked Open Data

(3) Disambiguation of Correct Entity

Terms: Neil Armstrong | Houston | Tranquility Base | mission control | earth | Eagle
Entities: candidate entities for each term

• Link Graph Analysis
  • identify connected components that cover the most term partitions
  • only one node per partition should be covered
  • strongly connected components consolidate the disambiguation
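The link graph analysis can be sketched as follows, assuming an undirected link graph over candidate entities and hand-made toy links. The function names, candidate lists, and edges are hypothetical; when a component contains several candidates for one term, this sketch simply takes the first one, which is a simplification of "only one node per partition".

```python
def connected_components(nodes, edges):
    """Group nodes into connected components of an undirected graph."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components

def disambiguate(partitions, edges):
    """Choose the component covering the most term partitions, one entity per term."""
    nodes = [entity for cands in partitions.values() for entity in cands]

    def coverage(component):
        return sum(1 for cands in partitions.values() if component & set(cands))

    best = max(connected_components(nodes, edges), key=coverage)
    return {term: next(e for e in cands if e in best)
            for term, cands in partitions.items() if best & set(cands)}

# Toy data: each term (partition) with its candidate entities, plus assumed links.
partitions = {
    "Neil Armstrong": ["Neil_Armstrong"],
    "Houston": ["Houston", "Whitney_Houston"],
    "Eagle": ["Eagle_(bird)", "Lunar_Module_Eagle"],
    "Tranquility Base": ["Tranquility_Base"],
}
edges = [
    ("Neil_Armstrong", "Tranquility_Base"),
    ("Neil_Armstrong", "Lunar_Module_Eagle"),
    ("Tranquility_Base", "Lunar_Module_Eagle"),
    ("Neil_Armstrong", "Houston"),
]
resolved = disambiguate(partitions, edges)
print(resolved)
```

Candidates without any link into the winning component, here the singer and the bird, are discarded.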

Page 30: Exploratory Search Based on Linked Open Data

(3) Disambiguation of Correct Entity

• for our example:

Neil Armstrong, the 38-year-old civilian commander, radios to earth and the mission control room here: "Houston, Tranquility Base here. The Eagle has landed."

Page 31: Exploratory Search Based on Linked Open Data

Overview

(1) Search & Retrieval
    and why we are not always content with it...
(2) Semantic Analysis
    to better "understand" the content
(3) Explorative Semantic Search
    switching from "retrieval" to "discovery"
(4) Intelligent Recommendation
    variatio delectat - variation is delectable

Page 32: Exploratory Search Based on Linked Open Data

Search vs. Exploration


Page 33: Exploratory Search Based on Linked Open Data

Search vs. Exploration

VERNE, Jules: From the Earth to the Moon, Direct in 97 Hours 20 Minutes and a Trip Round It, Sampson Low, Marston & Company, London (1873), viii, 323 p., plates.

GRC C.194.a.659, 12516.g.20

Page 34: Exploratory Search Based on Linked Open Data

Search vs. Exploration

• Find another ("comparable") book (that will interest me...)
  • Find books on related topics?
  • How did the author / the topic develop over time?
  • What else would I like to read?

Page 35: Exploratory Search Based on Linked Open Data

Search vs. Exploration

• Find another ("comparable") book (that will interest me...)
  • Find books on related topics?
  • How did the author / the topic develop over time?
  • What else would I like to read?

➞ Exploratory Search

Page 36: Exploratory Search Based on Linked Open Data

(Traditional) Libraries also enable Exploratory Search

Page 37: Exploratory Search Based on Linked Open Data

(Traditional) Librarians enable "intelligent" Recommendations

Page 38: Exploratory Search Based on Linked Open Data

Overview

(1) Search & Retrieval
    and why we are not always content with it...
(2) Semantic Analysis
    to better "understand" the content
(3) Explorative Semantic Search
    switching from "retrieval" to "discovery"
(4) Intelligent Recommendation
    variatio delectat - variation is delectable

Page 39: Exploratory Search Based on Linked Open Data

Exploratory Search based on Linked Open Data

http://dbpedia.org/resource/From_the_Earth_to_the_Moon


Page 40: Exploratory Search Based on Linked Open Data

Exploratory Search based on Linked Open Data

:From_the_Earth_to_the_Moon
• rdf:type dbpedia-owl:Book
• dbpedia-owl:author :Jules_Verne
• dbprop:preceded_by :In_Search_of_the_Castaways
• dcterms:subject
  • category:1865_novels
  • category:French_science_fiction_novels
  • category:Novels_by_Jules_Verne
  • category:Moon_in_fiction
  • category:Fictional_rivalries
  • category:Novels_set_in_Florida
  • category:1860s_science_fiction_novels
  • ...

:Jules_Verne
• dbpedia-owl:influenced :H._G._Wells

Page 42: Exploratory Search Based on Linked Open Data

Similar Results ➞ belong to consistent categories

• category:French_science_fiction_novels
• category:Moon_in_fiction
• category:Novels_by_Jules_Verne
• category:American_Civil_War_novels
• category:Novels_set_in_Florida
• category:1860s_science_fiction_novels
• category:1865_novels
• category:Fictional_rivalries
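A minimal sketch of category-based similar results, assuming each item is described only by its set of dcterms:subject categories. The category assignments below are illustrative toy data, not a DBpedia extract, and `rank_by_shared_categories` is a hypothetical helper.

```python
from collections import Counter

def rank_by_shared_categories(seed, subjects):
    """Rank all other items by how many categories they share with the seed item."""
    seed_categories = subjects[seed]
    shared = Counter()
    for item, categories in subjects.items():
        if item == seed:
            continue
        overlap = len(seed_categories & categories)
        if overlap:
            shared[item] = overlap
    return shared.most_common()

# Toy category assignments, invented for illustration.
subjects = {
    "From_the_Earth_to_the_Moon": {
        "French_science_fiction_novels", "Moon_in_fiction",
        "Novels_by_Jules_Verne", "American_Civil_War_novels",
    },
    "Around_the_Moon": {
        "French_science_fiction_novels", "Moon_in_fiction", "Novels_by_Jules_Verne",
    },
    "The_First_Men_in_the_Moon": {"Moon_in_fiction"},
    "Some_Cookbook": {"Cookbooks"},
}
ranking = rank_by_shared_categories("From_the_Earth_to_the_Moon", subjects)
print(ranking)
```

The more categories two items share, the higher the item ranks, so the top results tend to stay within the same few categories.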

Page 43: Exploratory Search Based on Linked Open Data

Similar Results ➞ belong to consistent categories

• category:French_science_fiction_novels
• category:Moon_in_fiction
• category:Novels_by_Jules_Verne
• category:American_Civil_War_novels
• category:Novels_set_in_Florida
• category:1860s_science_fiction_novels
• category:1865_novels
• category:Fictional_rivalries

Problem: too "similar" Recommendations (in the long run)

Page 44: Exploratory Search Based on Linked Open Data


Page 45: Exploratory Search Based on Linked Open Data

Serendipity helps to improve the Quality of Discovery & Exploration

Serendipity:

• finding a solution to a problem that is relevant but not intentionally thought of

• a recommended item is a serendipitous discovery if it is interesting and positively surprising because one was not searching for anything of its kind

Relevance + Unexpectedness = Serendipity

Page 46: Exploratory Search Based on Linked Open Data

Serendipity helps to improve the Quality of Discovery & Exploration

Relevance:

• In the general case, facts that are referenced (cited) often are considered more relevant

• In the special case, the relevance of a fact must be adapted to the current (personal) context

Unexpectedness:

• the likelihood of co-occurrence should be low

Exploration: combine
• similarity-based recommendations
• with serendipitous but relevant findings
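One way to read "Relevance + Unexpectedness = Serendipity" as a score, sketched under assumptions: unexpectedness is taken as 1 minus the co-occurrence likelihood, all values lie in [0, 1], and the candidate numbers are made up. The names `serendipity_score` and `recommend` are hypothetical; the talk does not specify this formula in detail.

```python
def serendipity_score(relevance, cooccurrence):
    """Relevance plus unexpectedness, where unexpectedness = 1 - co-occurrence likelihood."""
    return relevance + (1.0 - cooccurrence)

def recommend(candidates, similar_k=2, serendipitous_k=1):
    """Combine the most similar items with the most serendipitous of the remaining ones."""
    by_similarity = sorted(candidates, key=lambda c: -c["similarity"])
    similar = by_similarity[:similar_k]
    rest = by_similarity[similar_k:]
    surprising = sorted(
        rest,
        key=lambda c: -serendipity_score(c["relevance"], c["cooccurrence"]),
    )
    return [c["name"] for c in similar + surprising[:serendipitous_k]]

# Made-up candidate scores for illustration.
candidates = [
    {"name": "item_a", "similarity": 0.9, "relevance": 0.8, "cooccurrence": 0.9},
    {"name": "item_b", "similarity": 0.8, "relevance": 0.7, "cooccurrence": 0.8},
    {"name": "item_c", "similarity": 0.3, "relevance": 0.6, "cooccurrence": 0.1},
    {"name": "item_d", "similarity": 0.2, "relevance": 0.1, "cooccurrence": 0.05},
]
picks = recommend(candidates)
print(picks)
```

Here `item_c` is added to the two most similar items because it is rarely seen together with the seed yet still relevant, while `item_d` is surprising but not relevant enough.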

Page 47: Exploratory Search Based on Linked Open Data

Serendipity: Skip most common (similar) categories (classes)

dcterms:subject
• category:1865_novels
• category:French_science_fiction_novels
• category:Novels_by_Jules_Verne
• category:Moon_in_fiction
• category:Fictional_rivalries
• category:Novels_set_in_Florida
• category:1860s_science_fiction_novels
• ...

most common (similar) category for a specific entity
• contains the most similar entities for this specific entity
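A sketch of skipping the most common category, assuming "most similar entities" is measured by the average number of categories a category's members share with the seed. That measure, the toy category assignments, and the function name `skip_most_common_category` are assumptions made for illustration.

```python
def skip_most_common_category(seed, subjects):
    """Find the seed's category with the most similar members and exclude its members."""
    seed_categories = subjects[seed]
    members = {}  # category -> other items in that category
    for item, categories in subjects.items():
        if item == seed:
            continue
        for category in categories & seed_categories:
            members.setdefault(category, set()).add(item)
    if not members:
        return None, set()

    def mean_overlap(category):
        items = members[category]
        return sum(len(subjects[i] & seed_categories) for i in items) / len(items)

    # sorted() makes tie-breaking deterministic.
    most_common = max(sorted(members), key=mean_overlap)
    remaining = set().union(*members.values()) - members[most_common]
    return most_common, remaining

# Toy category assignments, invented for illustration.
subjects = {
    "From_the_Earth_to_the_Moon": {
        "Novels_by_Jules_Verne", "Moon_in_fiction",
        "French_science_fiction_novels", "Novels_set_in_Florida",
    },
    "Around_the_Moon": {
        "Novels_by_Jules_Verne", "Moon_in_fiction", "French_science_fiction_novels",
    },
    "Twenty_Thousand_Leagues_Under_the_Sea": {
        "Novels_by_Jules_Verne", "French_science_fiction_novels",
    },
    "Planet_of_the_Apes": {"French_science_fiction_novels"},
    "The_First_Men_in_the_Moon": {"Moon_in_fiction"},
    "Some_Florida_Novel": {"Novels_set_in_Florida"},
}
skipped, remaining = skip_most_common_category("From_the_Earth_to_the_Moon", subjects)
print(skipped, sorted(remaining))
```

With this toy data the Jules Verne category is skipped, so the remaining recommendations come from less expected corners of the category graph.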

Page 48: Exploratory Search Based on Linked Open Data

Similarity ≈ Sharing common Properties

Two entities that share the same property values:
• rdf:type dbpedia-owl:Book
• dbpedia-owl:literaryGenre dbpedia:Science_Fiction
• dbpedia-owl:author dbpedia:Jules_Verne
• dbpedia-owl:series dbpedia:Voyages_Extraordinaires
• dbprop:country France
• dcterms:subject category:Moon_in_fiction
• dcterms:subject category:French_science_fiction_novels
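A minimal sketch of similarity as shared properties, assuming each entity is a set of (property, value) pairs and using the Jaccard coefficient as the measure. The talk does not name a specific measure, so Jaccard is a swapped-in choice; the two unshared category pairs are added for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient: shared (property, value) pairs over all pairs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# The seven shared pairs follow the slide; the entity names are illustrative.
shared_pairs = {
    ("rdf:type", "dbpedia-owl:Book"),
    ("dbpedia-owl:literaryGenre", "dbpedia:Science_Fiction"),
    ("dbpedia-owl:author", "dbpedia:Jules_Verne"),
    ("dbpedia-owl:series", "dbpedia:Voyages_Extraordinaires"),
    ("dbprop:country", "France"),
    ("dcterms:subject", "category:Moon_in_fiction"),
    ("dcterms:subject", "category:French_science_fiction_novels"),
}
# One additional, unshared pair per entity (assumed for illustration).
book_a = shared_pairs | {("dcterms:subject", "category:1865_novels")}
book_b = shared_pairs | {("dcterms:subject", "category:1870_novels")}

similarity = jaccard(book_a, book_b)
print(round(similarity, 3))
```

Seven of the nine distinct pairs are shared, so the two books score close to 1 and would be recommended for each other under a purely similarity-based approach.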

Page 53: Exploratory Search Based on Linked Open Data

Serendipity: look for the least expected yet relevant...


Page 57: Exploratory Search Based on Linked Open Data

Content-Based Recommendations

Industry Project: cENTERTAIN.me video recommendation
http://mediaglobe.yovisto.com:8080/c.me-gui-0.0.1-SNAPSHOT2/

Page 58: Exploratory Search Based on Linked Open Data

Overview

(1) Search & Retrieval
    and why we are not always content with it...
(2) Semantic Analysis
    to better "understand" the content
(3) Explorative Semantic Search
    switching from "retrieval" to "discovery"
(4) Intelligent Recommendation
    variatio delectat - variation is delectable

The Journey is the Reward - Explorative Semantic Search based on Linked Open Data

Dr. Harald Sack
Hasso-Plattner-Institut für Softwaresystemtechnik, Universität Potsdam
Prof.-Dr.-Helmert-Str. 2-3, D-14482 Potsdam