BipRank: Ranking and Summarizing RDF Vocabulary Descriptions

ws.nju.edu.cn


Gong Cheng¹, Feng Ji², Shengmei Luo², Weiyi Ge¹, Yuzhong Qu¹

¹ State Key Laboratory for Novel Software Technology, Nanjing University, China
² Communication Services R&D Institute, ZTE Corporation, China

Presented at JIST2011

Gong Cheng (程龚)


Outline

Introduction

Salience measurement

Vocabulary summarization

Conclusions



Vocabularies and Linked Data

[Figure: Linked Data; Vocabularies; Your own vocabulary; Reuse]



Vocabulary search engines



Vocabularies

Scale



Vocabulary snippets --- state of the art



Vocabulary snippets --- our approach




Vocabulary summarization = ranking and selecting RDF sentences
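To make "RDF sentence" concrete, here is a minimal sketch assuming the RDF-sentence notion of Zhang et al. (WWW2007), under which triples that share blank nodes belong to the same sentence. It groups triples with union-find; the `_:` blank-node prefix and the function name `rdf_sentences` are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict

Triple = tuple[str, str, str]

def is_blank(node: str) -> bool:
    # Assumption: blank nodes use the N-Triples style "_:" label prefix.
    return node.startswith("_:")

def rdf_sentences(triples: list[Triple]) -> list[list[Triple]]:
    """Group triples connected through shared blank nodes (union-find)."""
    parent = list(range(len(triples)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Remember the first triple mentioning each blank node; link later ones to it.
    first_seen: dict[str, int] = {}
    for idx, (s, _, o) in enumerate(triples):
        for node in (s, o):
            if is_blank(node):
                if node in first_seen:
                    union(first_seen[node], idx)
                else:
                    first_seen[node] = idx

    groups: dict[int, list[Triple]] = defaultdict(list)
    for idx, t in enumerate(triples):
        groups[find(idx)].append(t)
    return list(groups.values())
```

For example, an `owl:Restriction` described through the blank node `_:r` ends up in one sentence together with the `rdfs:subClassOf` triple that points to it, while an unrelated `:Person rdf:type owl:Class` triple forms a sentence of its own.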



Outline

Introduction

Salience measurement

Vocabulary summarization

Conclusions



A bipartite view of vocabulary description
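One way to picture the bipartite view is as two adjacency maps: RDF sentences on one side, terms on the other, with an edge whenever a sentence mentions a term. The sketch below assumes a "term" is any node whose name starts with the vocabulary namespace `ns` (a hypothetical simplification), so built-in terms such as `rdf:type` are not treated as nodes.

```python
from collections import defaultdict

Triple = tuple[str, str, str]
Sentence = tuple[Triple, ...]

def build_bipartite(sentences: list[Sentence], ns: str):
    """Return (sentence index -> terms, term -> sentence indices).

    Assumption: a term is any subject, predicate, or object whose
    name starts with the vocabulary namespace `ns`.
    """
    s2t: dict[int, set[str]] = {}
    t2s: dict[str, set[int]] = defaultdict(set)
    for i, sentence in enumerate(sentences):
        terms = {node for triple in sentence for node in triple if node.startswith(ns)}
        s2t[i] = terms
        for t in terms:
            t2s[t].add(i)
    return s2t, dict(t2s)
```

Keeping both directions makes each surfer step a direct dictionary lookup, whichever side of the graph the surfer is currently on.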



Surfer behavior --- type A



Surfer behavior --- type B



BipRank

type-A behavior

type-B behavior

[Figure: the surfer moves from the current step to the next step. Is the choice of the next step uniform?]
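The slides do not give BipRank's exact formulas, so the following is only a sketch of a random walk on the sentence-term graph under stated assumptions. The surfer alternates between moving from a sentence to one of its terms (uniformly) and from a term u to a sentence s with p(s|u) proportional to a per-sentence weight; with probability 1 - d it restarts at a random sentence. The damping factor `d`, the restart rule, and the name `biprank` are hypothetical choices borrowed from PageRank-style walks.

```python
from typing import Optional

def biprank(s2t: dict[int, set[str]], t2s: dict[str, set[int]],
            weight: Optional[dict[int, float]] = None,
            d: float = 0.85, iters: int = 100, tol: float = 1e-10) -> dict[int, float]:
    """Random-walk salience of RDF sentences on a sentence-term graph.

    Assumed surfer model (a sketch, not the paper's exact formulas):
      sentence -> term: uniform over the terms the sentence contains;
      term u -> sentence s: p(s|u) proportional to weight[s] (must be > 0);
      with probability 1 - d the surfer restarts at a random sentence.
    """
    sentences = list(s2t)
    n = len(sentences)
    if n == 0:
        return {}
    w = weight if weight is not None else {s: 1.0 for s in sentences}
    score = {s: 1.0 / n for s in sentences}
    for _ in range(iters):
        # Sentence -> term step: each sentence splits its score evenly.
        term_score = {t: 0.0 for t in t2s}
        stranded = 0.0  # score held by sentences that contain no terms
        for s, terms in s2t.items():
            if terms:
                for t in terms:
                    term_score[t] += score[s] / len(terms)
            else:
                stranded += score[s]
        # Term -> sentence step: p(s|u) = w[s] / (sum of w over sentences of u).
        new = {s: 0.0 for s in sentences}
        for t, containing in t2s.items():
            total = sum(w[s] for s in containing)
            for s in containing:
                new[s] += term_score[t] * w[s] / total
        # Restart: stranded mass and the (1 - d) share are spread uniformly.
        base = (d * stranded + 1.0 - d) / n
        new = {s: d * new[s] + base for s in sentences}
        delta = sum(abs(new[s] - score[s]) for s in sentences)
        score = new
        if delta < tol:
            break
    return score
```

With `weight=None` every sentence adjacent to a term is equally likely (pattern-unaware, in the spirit of BipRank-U); passing non-uniform weights biases the term-to-sentence step, which is the question the "Uniform?" slide raises. The scores always sum to 1 because the restart step redistributes exactly the mass removed by damping.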



Pattern of RDF sentence



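p(s|u): instead of a uniform choice, the probability of moving from term u to RDF sentence s is weighted by the pattern of s, using either

Frequency of Pattern(s): the number of RDF sentences in the vocabulary that have the same pattern as s

Popularity of Pattern(s): the number of vocabularies in the repository that contain an RDF sentence with the same pattern as s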
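The two weights can be sketched as counts over abstracted sentences. The slides do not define how a pattern is derived, so the `pattern` function here is hypothetical: it assumes vocabulary-local terms (namespace `ns`) become `?t`, blank nodes become `?b`, and built-in terms such as `rdf:type` are kept, so that structurally identical sentences share a pattern.

```python
from collections import Counter

Triple = tuple[str, str, str]
Sentence = tuple[Triple, ...]

def pattern(sentence: Sentence, ns: str) -> tuple[Triple, ...]:
    """Hypothetical pattern: abstract local terms and blank nodes."""
    def abstract(node: str) -> str:
        if node.startswith("_:"):
            return "?b"  # assumed placeholder for blank nodes
        if node.startswith(ns):
            return "?t"  # assumed placeholder for the vocabulary's own terms
        return node      # built-in terms (rdf:, rdfs:, owl:) are kept
    return tuple(sorted(tuple(abstract(n) for n in triple) for triple in sentence))

def pattern_frequency(vocab: list[Sentence], ns: str) -> list[int]:
    """For each sentence: how many sentences in this vocabulary share its pattern."""
    pats = [pattern(s, ns) for s in vocab]
    counts = Counter(pats)
    return [counts[p] for p in pats]

def pattern_popularity(vocab: list[Sentence], ns: str,
                       repository: list[tuple[list[Sentence], str]]) -> list[int]:
    """For each sentence: how many vocabularies in `repository` contain its pattern.

    `repository` is a list of (vocabulary, namespace) pairs.
    """
    vocab_count: Counter = Counter()
    for other, other_ns in repository:
        # A set, so each vocabulary counts at most once per pattern.
        vocab_count.update({pattern(s, other_ns) for s in other})
    return [vocab_count[pattern(s, ns)] for s in vocab]
```

Either list could then serve as the per-sentence weight in p(s|u). Frequency only looks inside one vocabulary, whereas popularity rewards modelling idioms that many vocabularies in the repository share.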



Evaluation setting

Test cases: 9 moderate-sized vocabularies randomly selected from Falcons

Gold standard: salience given by 6 human experts

Competitor: Cp (Zhang et al., WWW2007)

Our approach: BipRank-U (pattern-unaware), BipRank-F (using pattern frequency), BipRank-P (using pattern popularity)

Metric: Pearson product-moment correlation coefficient
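The metric compares each method's salience scores with the experts' gold-standard scores over the same RDF sentences. A minimal sketch of the Pearson product-moment correlation coefficient; how the six experts' judgments are combined into one gold-standard vector is not stated on the slide, so that step is left to the caller.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

A value near 1 means the method orders and spaces sentences much like the experts do. On Python 3.10+ the standard library's `statistics.correlation` computes the same coefficient.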



Evaluation results



Outline

Introduction

Salience measurement

Vocabulary summarization

Conclusions



Goodness of a summary

Salience

Query relevance: textual similarity between the query and the summary

Cohesion: term overlap between RDF sentences
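The slide names the two criteria but not the exact measures. As an assumption-laden sketch: query relevance as cosine similarity of bag-of-words vectors, and cohesion as the mean Jaccard overlap of the term sets of the selected RDF sentences. Both measure choices, and the simple tokenizer, are hypothetical stand-ins.

```python
import math
import re
from collections import Counter
from itertools import combinations

def tokens(text: str) -> Counter:
    # Lower-cased alphanumeric tokens; camelCase splitting is omitted for brevity.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def query_relevance(query: str, summary_text: str) -> float:
    """Assumed measure: cosine similarity of bag-of-words vectors."""
    return cosine(tokens(query), tokens(summary_text))

def cohesion(term_sets: list[set[str]]) -> float:
    """Assumed measure: mean Jaccard overlap over all pairs of sentences.

    Pairs in which neither sentence has any term contribute zero.
    """
    pairs = list(combinations(term_sets, 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) / len(a | b) for a, b in pairs if a | b) / len(pairs)
```

Cohesion is defined over pairs, so a one-sentence summary scores zero here. Any replacement measure should likewise reward sentences that talk about the same terms.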



Looking for the best summary

Multi-objective optimization

Single aggregate objective function

Solution: a greedy strategy
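A sketch of the greedy strategy over a single aggregate objective. It assumes the objective is a weighted sum of summed salience, query relevance, and cohesion, and that a summary is capped at `k` sentences. The weights and the names `aggregate` and `greedy_summary` are hypothetical, not values or APIs from the talk.

```python
from typing import Callable, Sequence

Objective = Callable[[list[int]], float]

def greedy_summary(candidates: Sequence[int], k: int, objective: Objective) -> list[int]:
    """Repeatedly add the candidate sentence that gives the best objective value."""
    summary: list[int] = []
    remaining = list(candidates)
    while remaining and len(summary) < k:
        best = max(remaining, key=lambda c: objective(summary + [c]))
        summary.append(best)
        remaining.remove(best)
    return summary

def aggregate(salience: dict[int, float], relevance: Objective, cohesion: Objective,
              weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> Objective:
    """Fold the three criteria into one objective (weights are hypothetical)."""
    a, b, c = weights

    def objective(summary: list[int]) -> float:
        return (a * sum(salience[i] for i in summary)
                + b * relevance(summary)
                + c * cohesion(summary))
    return objective
```

Greedy selection keeps the search cheap, but it is not guaranteed to be optimal. With salience {0: 0.5, 1: 0.3, 2: 0.2} and a cohesion bonus of 1.0 only when sentences 1 and 2 appear together, it returns [0, 1] (score 0.8) and misses [1, 2] (score 1.5).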



Evaluation setting

Judges: 18 human experts

Test cases: 190 searches over 2,012 vocabularies crawled by Falcons

Competitor: Generic (Zhang et al., WWW2007)

Our approach: QR (query relevance), QR+S (query relevance + salience), QR+C (query relevance + cohesion)

Metric: rating on a 10-point scale



Evaluation results



Performance testing

[Figure: runtime against size of vocabulary and size of summary]



Outline

Introduction

Salience measurement

Vocabulary summarization

Conclusions



Conclusions

Salience measurement: sentence-term graph; BipRank; pattern of RDF sentence

Vocabulary summarization: salience; query relevance; cohesion

Implemented in Falcons Ontology Search: http://ws.nju.edu.cn/falcons/ontologysearch/
