04/10/23
A Pragmatic Approach to Semantic Repositories Benchmarking
Dhaval Thakker, Taha Osman, Shakti Gohil, Phil Lakin
© Dhaval Thakker, Press Association, Nottingham Trent University
Outline
- Introduction to the Semantic Technology Project at PA Images
- Semantic repository benchmarking: parameters, datasets
- Results and analysis: loading and querying results, modification tests
- Conclusions
http://www.pressassociation.com
Semantic Web project Benchmarking Results Conclusions
Press Association & its operations
- UK’s leading multimedia news & information provider
- Core news agency operation
- Content and editorial services: sports data, entertainment guides, weather forecasting, photo syndication
Press Association Images
Current Search Engine
Browsing Engine
- Images of sports, entertainment and news domain entities: people, events, locations, etc.
- The current engine lacks metadata-rich browsing functionality that can exploit these entities for a richer browsing experience.
- Goal: help searchers browse through images based on these entities and their relationships
- Approach: a Semantic Web based browsing engine
Semantic Repository Benchmarking
“a tool, which combines the functionality of an RDF-based DBMS and an inference engine and can store data and evaluate queries, regarding the semantics of ontologies and metadata schemata.” *
- Criteria for selection: analytical parameters, such as the expected level of reasoning and query language support
- Selected semantic repositories: AllegroGraph, BigOWLIM, Oracle, Sesame, Jena TDB, Virtuoso
* Kiryakov, A., Measurable Targets for Scalable Reasoning, Ontotext Technology White Paper, Nov 2007.
8
PA Dataset
- Ontology: OWL Lite to OWL DL
  - Classification, subproperties, inverse properties and hasValue for automatic classification
  - 147 classes, 60 object properties and 30 data properties
- Knowledge base (entities): 6.6M triples, approx. 1.2M entities, disk space 1.23 GB
- Image annotations: each image has 2-4 triples; 8M triples, approx. 5M images, disk space 1.57 GB
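The hasValue-driven automatic classification used in the ontology can be sketched as a simple forward rule over triples. This is an illustrative toy, not the PA ontology: the class names, property names and example entities below are all hypothetical.

```python
# hasValue restrictions: an entity carrying (prop, value) is inferred to
# belong to cls. All names here are hypothetical illustrations.
RESTRICTIONS = [
    ("playsFor", "Team:England", "EnglandPlayer"),
    ("competesIn", "Event:Wimbledon", "TennisPlayer"),
]

def classify(triples):
    """Forward-apply each hasValue restriction, emitting rdf:type triples."""
    inferred = set()
    for prop, value, cls in RESTRICTIONS:
        for s, p, o in triples:
            if p == prop and o == value:
                inferred.add((s, "rdf:type", cls))
    return inferred

triples = {
    ("Person:Rooney", "playsFor", "Team:England"),
    ("Person:Murray", "competesIn", "Event:Wimbledon"),
}
print(sorted(classify(triples)))
```

A repository with OWL support performs this expansion itself, either at load time or at query time, which is exactly what the benchmark queries probe.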
Published Benchmarks & Datasets
- The Lehigh University Benchmark (LUBM): the first standard platform for benchmarking OWL systems, but it gradually fell behind the increasing expressivity of OWL reasoning
- The University Ontology Benchmark (UOBM): improves the reasoning coverage of LUBM; OWL-DL and OWL-Lite inferencing
- Berlin SPARQL Benchmark (BSBM): provides a comprehensive evaluation of SPARQL query features
Benchmarking Parameters
1) Classification of semantic stores as native, memory-based or database-based (A)
2) Forward-chaining or backward-chaining (A)
3) Load time (P)
4) Query response time (P)
5) Query results analysis (P)
6) RDF store update tests (P)
7) Study of different serialisations and their impact on performance
8) Scalability (A)
9) Reasoner integration (A)
10) Query languages supported (A)
11) Clustering support (A)
12) Programming language support (A)
13) Platform support (A)
14) RDF view support (support for non-RDF data) (A)
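Parameter (2) distinguishes when inference happens. A minimal sketch of the distinction, assuming a toy rdfs:subClassOf hierarchy with hypothetical names: a forward-chaining store materialises inferred triples at load time, while a backward-chaining store derives them at query time.

```python
# Toy class hierarchy: child -> direct parent (hypothetical names)
SUBCLASS = {"Footballer": "Athlete", "Athlete": "Person"}

def superclasses(cls):
    """All ancestors of cls, following rdfs:subClassOf transitively."""
    seen = []
    while cls in SUBCLASS:
        cls = SUBCLASS[cls]
        seen.append(cls)
    return seen

def forward_materialise(facts):
    """Forward chaining: expand (entity, class) facts once, at load time."""
    out = set(facts)
    for ent, cls in facts:
        for sup in superclasses(cls):
            out.add((ent, sup))
    return out

def backward_query(facts, entity, cls):
    """Backward chaining: answer 'is entity a cls?' at query time."""
    return any(e == entity and (c == cls or cls in superclasses(c))
               for e, c in facts)

facts = {("Rooney", "Footballer")}
print(forward_materialise(facts))                 # includes ("Rooney", "Person")
print(backward_query(facts, "Rooney", "Person"))  # True
```

The trade-off this exposes is visible in the results that follow: materialising stores pay at load time (parameter 3) and answer quickly, while query-time reasoners load quickly but pay per query (parameter 4).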
UOBM: Load Time
[Chart: UOBM loading time (minutes, 0-350) for Virtuoso, Sesame, Jena TDB, Oracle, BigOWLIM and AllegroGraph on the UOBM1, UOBM5, UOBM10 and UOBM30 datasets, plus totals]
Load Time: PA Dataset
[Chart: PA Dataset loading time (minutes, 0-350) for Virtuoso, AllegroGraph, Sesame, Jena TDB and BigOWLIM, broken down into knowledge base, image captions and total]
Dataset Queries
- Measuring query execution speed: SPARQL queries issued from a Java-based client, with execution speed measured over three runs
- PA Dataset: 13 queries to test the expressiveness supported: subsumption, inverse properties (Q6, Q12, Q15), automatic classification
- UOBM Dataset: 15 queries, of which 12 fall under OWL-Lite and 3 are of OWL-DL expressivity
  - Q5 and Q7 involve transitive properties (owl:TransitiveProperty)
  - Q6 relies on the semantic repository supporting owl:inverseOf
  - Q10 requires symmetric properties (owl:SymmetricProperty)
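The OWL property semantics these queries exercise can be sketched as forward rules applied over a set of (subject, predicate, object) triples until a fixpoint is reached. This is a toy illustration with hypothetical property names, not how any of the benchmarked repositories implements reasoning.

```python
# Hypothetical property declarations for the three OWL constructs tested
TRANSITIVE = {"locatedIn"}                 # owl:TransitiveProperty
SYMMETRIC = {"teamMateOf"}                 # owl:SymmetricProperty
INVERSE = {"playsFor": "hasPlayer"}        # prop -> its owl:inverseOf

def saturate(triples):
    """Apply the three rules until no new triples are produced."""
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in triples:
            if p in SYMMETRIC:
                new.add((o, p, s))
            if p in INVERSE:
                new.add((o, INVERSE[p], s))
            if p in TRANSITIVE:
                for s2, p2, o2 in triples:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if not new <= triples:
            triples |= new
            changed = True
    return triples

facts = {
    ("Trafford", "locatedIn", "Manchester"),
    ("Manchester", "locatedIn", "England"),
    ("Rooney", "playsFor", "ManUtd"),
    ("Rooney", "teamMateOf", "Giggs"),
}
closed = saturate(facts)
print(("Trafford", "locatedIn", "England") in closed)  # True (transitive)
print(("ManUtd", "hasPlayer", "Rooney") in closed)     # True (inverse)
print(("Giggs", "teamMateOf", "Rooney") in closed)     # True (symmetric)
```

A repository that lacks one of these rules returns no answer (N) or a partial answer (P) for the corresponding query, which is exactly what the result tables below record.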
UOBM: Query Execution
(P) Partially answered. N: Query was not answered by this tool.
Execution Timings (seconds)
No. Virtuoso Allegrograph Oracle Sesame Jena TDB BigOWLIM
Q1 6.766 (P) 21.921 0.141 0.203 0.031 0.047
Q2 N 8.906(P) N 0.001(P) 0.001(P) 0.062
Q3 N 651.237 N 0.109 0.016 0.062
Q4 N N(infinite) N 0.14 120 0.063
Q5 N 1.281 N N N 0.047
Q6 N 1153.025 N N N 0.047
Q7 N 300.12 N N N 0.001
Q8 N 6.843(P) N N N 0.031
Q9 N N N N N 0.031(P)
Q10 0 0.25 0.001(P) 0.001 0.001 0.016
Q11 N N(infinite) 0.001(P) 0.094(P) N(infinite) 0.062
Q12 N 476.507 N N N 0.016
Q13 N N N N N N
Q14 N N(infinite) N N N 0.016
Q15 N N N N N N
PA Dataset: Query Execution
Query No. Virtuoso Allegrograph Sesame Jena TDB BigOWLIM
Q1 2.234 (P) 26.422 0.469(P) 0.047 0.219
Q2 N N N N 0.063
Q4 N N N N 0.047
Q5 0.172 1.719 0.141 N 0.078
Q6 N 3.765 N 0.001 0.45
Q7 84.469 28.688 0.203 N 0.093
Q8 0.047 3.39 0.11 0.001 0.062
Q9 0.156 1.782 0.171 N 0.016
Q10 0.001 1.734 0.047 N 0
Q11 N 1.734 0.11 0.001 0.062
Q12 N 16.14 N N 0.079
Q13 5.563(P) 1.812 0.016(P) 0.001 0.641
Q15 N 1.688 N N 0.031
(P) Partially answered. N: Query was not answered by this tool.
Two Complete Stores: BigOWLIM vs AllegroGraph
[Chart: BigOWLIM vs AllegroGraph query execution time (seconds, 0-25) for queries Q1, Q5-Q13 and Q15]
Two Fast Stores: BigOWLIM vs Sesame
[Chart: BigOWLIM vs Sesame query execution time (seconds, 0-0.25) for queries Q5 and Q7-Q11]
Modification Tests: Insert
[Chart: image insert operations. Time (seconds, 0-20) for insert operations U1-U3 on Virtuoso, AllegroGraph, Sesame, Jena TDB and BigOWLIM]
[Chart: KB insert operations. Time (seconds, 0-2) for insert operations U1-U3 on AllegroGraph, Virtuoso, Sesame, Jena TDB and BigOWLIM]
Modification Tests: Update & Delete
[Chart: modification execution speed. Time (seconds, 0-15) for delete operations D1-D3, update operations U1-U3 and their average on BigOWLIM, AllegroGraph, Virtuoso and Sesame]
Conclusions
- PA Dataset benchmarking
  - Essential and desirable requirements of our application were translated into a set of functional (practical) and non-functional (analytical) parameters
  - To consolidate our findings we used UOBM, a public benchmark that satisfies the requirements of our target system
- Analysis
  - All the repositories are sound; however, none is complete
  - BigOWLIM provides the best average query response time and answers the maximum number of queries for both datasets, but is slower in loading and in the modification tests
  - Sesame, Jena, Virtuoso and Oracle offered sub-second query response times for the majority of the queries they answered
  - AllegroGraph answers more queries than the former four repositories, hence offers better coverage of OWL properties, but its average query response time was the highest for both datasets
- Further work
  - Expanding this benchmark exercise to a billion triples and more repositories
  - Adding extra benchmarking parameters, such as the performance impact of concurrent users and transaction-related operations