Large Scale Integration of Senses for the Semantic Web
Jorge Gracia, Mathieu d’Aquin, Eduardo Mena
Computer Science and Systems Engineering Department (DIIS), University of Zaragoza, Spain
Knowledge Media Institute (KMi), Open University, United Kingdom
18th International World Wide Web Conference
Madrid, Spain, 20th-24th April 2009
WWW 2009 2
Outline
Introduction
Method
Optimization study
Experiments
Conclusions
Introduction
Current Semantic Web
Favoured by the increasing number of online ontologies already available on the Web
Hampered by the high heterogeneity that this growing semantic content introduces
The redundancy problem
An excess of different semantic descriptions, coming from different sources, that describe the same intended meaning
Our proposal
A method to cluster the ontology terms that one can find on the Semantic Web, according to the meaning that they intend to represent
Redundancy problem: many representations of the same meanings
[Figure labels: Watson; "apple"; The Semantic Web]
Proposed solution: pool of cross-ontology integrated senses
[Figure labels: "clustered" Watson; "apple"; The Semantic Web; senses: The Fruit, The Tree, The Company]
[Figure labels: Watson; The Semantic Web; surrounding boxes: Multiontology Semantic Disambiguator; Ontology Evolution; Semantic Browsing; Scarlet Ontology Matching Folksonomy Enrichment; QueryGen Semantic Query Generation; Question Answering]
Method
[Flowchart. OFF-LINE: ontology terms are extracted from Watson into keyword maps; synonym expansion turns these into synonym maps; each synonym map goes through sense clustering (similarity computation with CIDER, integration when similarity > threshold, repeated while more ontology terms remain), producing senses. RUN-TIME: the integration degree can be modified; if the threshold rises, senses are disintegrated, otherwise they are integrated further, and the senses are clustered again.]
Keyword maps: ontology terms with identical label
[Figure labels: Watson; twelve ontology terms labelled "apple"]
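The keyword-map step can be illustrated with a minimal sketch. The (URI, label) pairs below are hypothetical and not taken from Watson, and the case-insensitive label normalization is an assumption; the slides only say that terms with identical labels are grouped.

```python
from collections import defaultdict


def build_keyword_maps(terms):
    """Group ontology terms whose labels are identical.

    Assumption: labels are compared after trimming and lower-casing.
    """
    keyword_maps = defaultdict(list)
    for uri, label in terms:
        keyword_maps[label.strip().lower()].append(uri)
    return dict(keyword_maps)


# Hypothetical (URI, label) pairs, for illustration only.
terms = [
    ("http://example.org/food#Apple", "apple"),
    ("http://example.org/botany#Apple", "Apple"),
    ("http://example.org/tech#AppleInc", "Apple Inc."),
]

keyword_maps = build_keyword_maps(terms)
for keyword, uris in keyword_maps.items():
    print(keyword, uris)
```

Each keyword map still mixes meanings (fruit, tree, company); separating them is the job of the later clustering step.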
Synonym maps: ontology terms with synonym labels
[Figure labels: Watson; terms labelled "apple" (twelve), "apple tree" (two), "Apple Inc." (two), and "manzana" (one)]
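Synonym expansion can be sketched as merging keyword maps whose keywords are synonyms. The synonym dictionary and the term identifiers here are hypothetical; the slides do not say which lexical resource supplies the synonyms.

```python
def build_synonym_maps(keyword_maps, synonyms):
    """Extend each keyword map with the terms of its synonym keywords."""
    synonym_maps = {}
    for keyword, uris in keyword_maps.items():
        merged = list(uris)
        for synonym in synonyms.get(keyword, []):
            merged.extend(keyword_maps.get(synonym, []))
        synonym_maps[keyword] = merged
    return synonym_maps


# Hypothetical keyword maps and synonym dictionary, for illustration only.
keyword_maps = {
    "apple": ["ex:food#apple", "ex:botany#apple"],
    "apple tree": ["ex:botany#apple_tree"],
    "manzana": ["ex:es#manzana"],
}
synonyms = {"apple": ["apple tree", "manzana"]}

synonym_maps = build_synonym_maps(keyword_maps, synonyms)
print(synonym_maps["apple"])
```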
Agglomerative clustering
[Figure labels: CIDER; terms a, b, c, d; a and d merged into a'; then a'', ..., e]
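A threshold-based agglomerative clustering can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Jaccard overlap of descriptive words stands in for the CIDER similarity measure, and the term descriptions are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity between two sets of descriptive words (NOT CIDER)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_senses(terms, threshold, similarity=jaccard):
    """Merge the most similar pair of senses until no pair exceeds the threshold."""
    # Each sense is a pair: (member term names, merged description).
    senses = [({name}, set(words)) for name, words in terms.items()]
    while len(senses) > 1:
        (i, j), best = max(
            (((i, j), similarity(senses[i][1], senses[j][1]))
             for i, j in combinations(range(len(senses)), 2)),
            key=lambda pair: pair[1],
        )
        if best <= threshold:
            break
        merged = (senses[i][0] | senses[j][0], senses[i][1] | senses[j][1])
        senses = [s for k, s in enumerate(senses) if k not in (i, j)] + [merged]
    return [members for members, _ in senses]


# Hypothetical descriptions of four ontology terms.
terms = {
    "a": {"fruit", "food", "red"},
    "b": {"company", "computer"},
    "c": {"tree", "plant"},
    "d": {"fruit", "food", "sweet"},
}

senses = cluster_senses(terms, threshold=0.3)
print(senses)
```

With these descriptions only a and d are similar enough to merge, which mirrors the first step in the slide, where a and d become a'.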
Sense maps: semantically equivalent terms grouped
[Figure labels: CIDER; the terms labelled "apple", "Apple Inc.", "apple tree", and "manzana" grouped into three senses: The Fruit, The Tree, The Company]
Falling threshold (integration)
Rising threshold (disintegration)
Optimal threshold
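The effect of moving the threshold can be shown with a simplified sketch. It assumes pairwise similarities are already known and groups terms connected by any similarity above the threshold (a single-link simplification using union-find), which is not necessarily how the authors' system integrates or disintegrates senses. All values are invented.

```python
def senses_at(threshold, terms, similarities):
    """Group terms linked by a similarity above the threshold (union-find)."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for (a, b), sim in similarities.items():
        if sim > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for t in terms:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())


# Hypothetical terms and pairwise similarities.
terms = ["t1", "t2", "t3", "t4"]
similarities = {("t1", "t2"): 0.45, ("t2", "t3"): 0.25, ("t3", "t4"): 0.10}

counts = {th: len(senses_at(th, terms, similarities)) for th in (0.05, 0.20, 0.30, 0.50)}
print(counts)
```

A lower threshold yields fewer, more integrated senses; a higher threshold splits them apart again.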
Optimization study
Integration level varies with similarity threshold
Integration Level = 1 - # finalSenses / # initialOntologyTerms
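As a worked instance of this formula, the numbers from the "turkey" example reported later in the slides (58 ontology terms, 9 integrated senses at threshold 0.27) give:

```latex
\[
\text{IntegrationLevel}
  = 1 - \frac{\#\,\text{finalSenses}}{\#\,\text{initialOntologyTerms}}
  = 1 - \frac{9}{58} \approx 0.845
\]
```

A level near 1 means almost all terms were merged into a few senses; a level of 0 means no terms were merged.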
Which similarity threshold is the best one? Three ways to explore:
Experimenting with ontology matching benchmarks: obtained a lower bound of 0.13 for the optimal threshold
Contrasting with human opinion: range of good values between 0.2 and 0.3
Optimizing response time, because:
  It reduces the response time of the overall system
  It is compatible with the other two ways
  It is not always feasible to have reference alignments or a large enough number of humans to ask
Response time varies with threshold
Optimal value around 0.22
Experiments
Scalability study
9156 keywords, 73169 different ontology terms to be clustered
Processing time is linear in the number of ontology terms
Scalability study
Processing time is independent of ontology size
Illustrative example
Keyword = turkey
Synonym map = turkey, Türkei, Türkiye
No. of ontology terms = 58
No. of integrated senses = 9 (threshold = 0.27)
More examples (threshold = 0.19)
Keyword       #initial terms   #final senses
appalachian   7                1
apple         39               7
free          51               2
mace          7                3
plant         52               18
poll          5                4
stein         5                1
turkey        58               8
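The integration level defined in the optimization study can be computed for each row above. The sketch below uses only the counts from the table; the function and variable names are illustrative.

```python
def integration_level(initial_terms, final_senses):
    """Integration Level = 1 - #finalSenses / #initialOntologyTerms."""
    return 1 - final_senses / initial_terms


# (initial terms, final senses) per keyword, copied from the table (threshold = 0.19).
results = {
    "appalachian": (7, 1),
    "apple": (39, 7),
    "free": (51, 2),
    "mace": (7, 3),
    "plant": (52, 18),
    "poll": (5, 4),
    "stein": (5, 1),
    "turkey": (58, 8),
}

levels = {kw: round(integration_level(i, f), 3) for kw, (i, f) in results.items()}
total_initial = sum(i for i, _ in results.values())
total_final = sum(f for _, f in results.values())
overall = round(integration_level(total_initial, total_final), 3)

for kw, level in levels.items():
    print(f"{kw:12s} {level:.3f}")
print(f"{'overall':12s} {overall:.3f}")
```

The spread is wide: "free" collapses 51 terms into 2 senses, while "poll" barely integrates at all.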
Positive facts
Terms from different versions of the same ontology are easily detected
Very different meanings are not wrongly integrated (e.g., “plant” as “living organism” with “plant” as “industrial buildings”)
Negative facts
Hard to obtain a total integration of the same meanings (caused by very different semantic descriptions)
Conclusions
Redundancy of semantic descriptions on the Web can be significantly reduced
Our integration technique scales when used on a large body of knowledge
The proposed method is flexible enough to configure and adapt the integration level to the needs of client applications
Future work
More advanced prototype
More extensive human-based evaluation
Study and evaluation of impact on other systems
END of presentation
Thank you!