Efficient SPARQL Query Processing in MapReduce through Data Partitioning and Indexing

Nie Zhi
niezhixuesen@163.com

Outline

Introduction
Related work
SPARQL Query Processing in MapReduce
Experiments
Conclusion

Introduction

RDF

Resource Description Framework
Subject-predicate-object expressions (S-P-O)

[Figure: RDF graph in the namespace http://www.mpii.de/yago/resource/. The subject (S) Albert Einstein is linked by the predicates (P) isCalled, hasWonPrize and wasBornIn to the objects (O) "Albert Einstein", "阿尔伯特•爱因斯坦", Nobel Prize in Physics and Ulm.]

SPARQL Query Language for RDF

Query:

PREFIX source: <http://www.mpii.de/yago/resource/>
SELECT ?name ?where
WHERE {
  ?who source:hasWonPrize "Nobel Prize in Physics" .
  ?who source:isCalled ?name .
  ?who source:wasBornIn ?where
}

[Figure: the query pattern (hasWonPrize, isCalled, wasBornIn) matched against the RDF graph of Albert Einstein in http://www.mpii.de/yago/resource/]

Result:

name                 where
Albert Einstein      Ulm
阿尔伯特•爱因斯坦      Ulm
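The matching step behind this query can be sketched in a few lines of Python. This is a minimal illustration, not the system described in the talk: the in-memory `TRIPLES` list, the helper names and the abbreviated predicate names (without the `source:` prefix) are all assumptions made for the example.

```python
# Hypothetical in-memory triple store; predicates are written without the namespace.
TRIPLES = [
    ("Albert Einstein", "isCalled", "Albert Einstein"),
    ("Albert Einstein", "isCalled", "阿尔伯特•爱因斯坦"),
    ("Albert Einstein", "wasBornIn", "Ulm"),
    ("Albert Einstein", "hasWonPrize", "Nobel Prize in Physics"),
]


def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate_bgp(patterns, triples):
    """Evaluate a basic graph pattern by extending bindings one pattern at a time."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


QUERY = [
    ("?who", "hasWonPrize", "Nobel Prize in Physics"),
    ("?who", "isCalled", "?name"),
    ("?who", "wasBornIn", "?where"),
]

# Project the solutions to ?name and ?where, as the SELECT clause does.
rows = [(b["?name"], b["?where"]) for b in evaluate_bgp(QUERY, TRIPLES)]
```

Each triple pattern narrows or extends the set of variable bindings, which is the same work the later slides distribute over MapReduce jobs.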

RDF knowledge base…

Semantic Web, Web 2.0
Extract knowledge from the Web
– YAGO
– DBpedia
– Freebase
– Billion Triple Challenge
– …

RDF knowledge base

295 data sets
31 billion RDF triples
504 million RDF links

(September 2011)

Challenge and Opportunity

Challenge
– RDF data is growing rapidly. Researchers are working with billions of triples.
– Relational databases have limited scalability.

Opportunity
– Google GFS, MapReduce, BigTable
– Hadoop: an implementation of the MapReduce framework and HDFS
– Achievements: Yahoo!, Amazon, Tencent, Baidu, Taobao ......

We need to build on these recent achievements in handling massive-scale Web data on clusters.

MapReduce: word count

Input
file1: the weather is good
file2: today is good
file3: good weather is good.

Map output
Worker 1: (the 1), (weather 1), (is 1), (good 1)
Worker 2: (today 1), (is 1), (good 1)
Worker 3: (good 1), (weather 1), (is 1), (good 1)

Reduce input
Worker 1: (the 1)
Worker 2: (is 1), (is 1), (is 1)
Worker 3: (weather 1), (weather 1)
Worker 4: (today 1)
Worker 5: (good 1), (good 1), (good 1), (good 1)

Reduce output
Worker 1: (the 1)
Worker 2: (is 3)
Worker 3: (weather 2)
Worker 4: (today 1)
Worker 5: (good 4)

Map(k1, v1) → list(k2, v2)
Reduce(k2, list(v2)) → list(k3, v3)
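The word-count flow can be simulated on a single machine to make the three stages concrete. The sketch below is an illustration only: the function names are invented here, the "shuffle" step stands in for what the framework does between map and reduce, and stripping the trailing period from "good." is an assumption about tokenization.

```python
from collections import defaultdict

FILES = {
    "file1": "the weather is good",
    "file2": "today is good",
    "file3": "good weather is good.",
}


def map_phase(filename, text):
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in one file."""
    return [(word.strip(".,").lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Reduce(k2, list(v2)) -> list(k3, v3): sum the counts for one word."""
    return [(word, sum(counts))]


map_output = [pair for name, text in FILES.items() for pair in map_phase(name, text)]
reduce_input = shuffle(map_output)
word_counts = dict(
    pair
    for word, counts in reduce_input.items()
    for pair in reduce_phase(word, counts)
)
```

In a real cluster each `map_phase` call runs on the worker holding that file, and each key of `reduce_input` is routed to one reduce worker.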

Related work

Solution 1

Map SPARQL directly into a sequence of MapReduce jobs

Pro.
– Scalable

Con.
– A burden on the user in terms of usage and maintenance
– No support for complex queries
– No index
– Does not consider the characteristics of RDF data

Solution 2

Map SPARQL to Pig -> MapReduce jobs

Pro.
– Scalable
– Supports complex queries

Con.
– No index
– Does not consider the characteristics of RDF data

SPARQL Query Processing in MapReduce

Architecture overview

[Figure: layered architecture. Components shown: HDFS; Map-Reduce Runtime; JSON Data Model; JAQL Query Language (Transform, Filter, Join, Sort, Group, Built-in Functions); SPARQL Translator (BGP, Union, Filter, Optional); RDF 2 JSON Loader; Optimizer; Cluster Deployment and Management.]

JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format

It is based on a subset of the JavaScript Programming Language

JSON is built on two structures:
– A collection of name/value (key/value) pairs
– An ordered list of values (array)

RDF to JSON

RDF triples:

Albert Einstein isCalled Albert Einstein
Albert Einstein isCalled 阿尔伯特•爱因斯坦
Albert Einstein wasBornIn Ulm
Albert Einstein wasBornOnDate 1879-03-14
Albert Einstein hasWonPrize Nobel Prize in Physics
Albert Einstein diedOnDate 1955-04-18

JSON format:

[{s:Albert Einstein, p:isCalled, o:Albert Einstein },{s:Albert Einstein, p:isCalled, o: 阿尔伯特•爱因斯坦 },{s:Albert Einstein, p:wasBornIn, o:Ulm },{s:Albert Einstein, p:wasBornOnDate, o:1879-03-14 },{s:Albert Einstein, p:hasWonPrize, o:Nobel Prize in Physics },{s:Albert Einstein, p:diedOnDate, o:1955-04-18 }]

JSON is built on two structures:
– name/value (key/value) pairs: {s:Albert Einstein}
– list of values (array): [{s:Albert Einstein},{}…]
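As a rough sketch of this conversion, the snippet below turns (s, p, o) tuples into JSON records with Python's standard `json` module. The tuple representation and the function name `triple_to_record` are assumptions for illustration; the actual RDF 2 JSON loader is not shown in the slides. Note that strict JSON quotes both names and string values, whereas the slide uses an unquoted shorthand.

```python
import json

# A few of the example triples, as (subject, predicate, object) tuples.
TRIPLES = [
    ("Albert Einstein", "isCalled", "Albert Einstein"),
    ("Albert Einstein", "isCalled", "阿尔伯特•爱因斯坦"),
    ("Albert Einstein", "wasBornIn", "Ulm"),
    ("Albert Einstein", "wasBornOnDate", "1879-03-14"),
]


def triple_to_record(triple):
    """Turn one triple into a name/value record with the keys s, p and o."""
    s, p, o = triple
    return {"s": s, "p": p, "o": o}


# The whole data set becomes an array (ordered list) of records.
records = [triple_to_record(t) for t in TRIPLES]

# ensure_ascii=False keeps non-ASCII literals readable in the output.
json_text = json.dumps(records, ensure_ascii=False, indent=2)
```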

JAQL

JAQL is an open-source language for querying JSON (JavaScript Object Notation) data.

It provides a general parallel data processing platform on Hadoop

Developed by IBM

Basic Idea

SPARQL can be supported on Hadoop by translating queries into JAQL operators

Filter

Transform

Join

Group

Sort

Built-in functions: merge(d1, d2), regex(), etc.

SPARQL to JAQL Transformation

SPARQL Query

PREFIX source: <http://www.mpii.de/yago/resource/>
SELECT ?name ?where
WHERE {
  ?who source:hasWonPrize "Nobel Prize in Physics" .
  ?who source:isCalled ?name .
  ?who source:wasBornIn ?where .
}

JAQL Query

// read files from HDFS by predicate name
$1 = read(hdfs('source:hasWonPrize'))
     -> filter $.o == "Nobel Prize in Physics"   // select
     -> transform {$.s};                          // project
$2 = read(hdfs('source:isCalled')) -> transform {$.s, $.o};
$3 = read(hdfs('source:wasBornIn')) -> transform {$.s, $.o};
// multi-join
join $1, $2, $3
  where $1.s == $2.s and $2.s == $3.s
  into { name: $2.o, where: $3.o };               // project to ?name ?where

Record format: {s:Albert Einstein, p:isCalled, o:Albert Einstein }

[Figure annotations: the three triple patterns are numbered 1 to 3; the four JAQL statements are numbered 1 to 4 and labeled Mapreduce job1 to Mapreduce job4.]
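To show what the filter, transform and join operators compute, here is a plain-Python equivalent of the JAQL pipeline. It is only a single-machine sketch: the `FILES` dictionary is a hypothetical stand-in for the per-predicate files in HDFS, and `read` is a local helper, not a JAQL or Hadoop call.

```python
# Hypothetical stand-in for the per-predicate files stored in HDFS.
FILES = {
    "hasWonPrize": [
        {"s": "Albert Einstein", "o": "Nobel Prize in Physics"},
        {"s": "Charles K. Kao", "o": "Nobel Prize in Physics"},
        {"s": "Faye Wong", "o": "MTV Video Music Awards"},
    ],
    "isCalled": [
        {"s": "Albert Einstein", "o": "Albert Einstein"},
        {"s": "Albert Einstein", "o": "阿尔伯特•爱因斯坦"},
    ],
    "wasBornIn": [
        {"s": "Albert Einstein", "o": "Ulm"},
        {"s": "Charles K. Kao", "o": "Shanghai"},
        {"s": "Faye Wong", "o": "Beijing"},
    ],
}


def read(name):
    """Return the records of one predicate file (local stand-in for read(hdfs(...)))."""
    return FILES[name]


# $1: filter (select) on the object, then transform (project) to the subject.
winners = [{"s": r["s"]} for r in read("hasWonPrize") if r["o"] == "Nobel Prize in Physics"]

# $2 and $3: transform (project) to subject and object.
names = [{"s": r["s"], "o": r["o"]} for r in read("isCalled")]
places = [{"s": r["s"], "o": r["o"]} for r in read("wasBornIn")]

# Multi-way equi-join on the subject, projected to ?name and ?where.
result = [
    {"name": n["o"], "where": p["o"]}
    for w in winners
    for n in names if w["s"] == n["s"]
    for p in places if n["s"] == p["s"]
]
```

Charles K. Kao passes the filter but has no isCalled record in this sample, so the join drops him, which matches the two-row result shown earlier.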

Data storage

In the Hadoop framework,
– a file is the smallest unit of input to a MapReduce job and is read from disk.

One straightforward strategy is to store all the data in one file
– Every read operation must then scan the entire data set

Data Partitioning Strategy

Data Partitioning Strategy

Horizontal partitioning
Vertical partitioning
Clustered property partitioning

Horizontal partitioning with JSON

For example, the triples

Albert Einstein isCalled Albert Einstein
Albert Einstein isCalled 阿尔伯特•爱因斯坦
Albert Einstein wasBornIn Ulm
Albert Einstein wasBornOnDate 1879-03-14
Albert Einstein hasWonPrize Nobel Prize in Physics
Albert Einstein diedOnDate 1955-04-18
Charles K. Kao hasWonPrize Nobel Prize in Physics
Charles K. Kao wasBornIn Shanghai
Faye Wong hasWonPrize MTV Video Music Awards
Faye Wong wasBornIn Beijing

Store in HDFS, one file per subject hash:

File 1 File name: Hash(Subject1)

[{s:Albert Einstein, p:isCalled, o:Albert Einstein },{s:Albert Einstein, p:isCalled, o:阿尔伯特•爱因斯坦 },{s:Albert Einstein, p:wasBornIn, o:Ulm },{s:Albert Einstein, p:wasBornOnDate, o:1879-03-14 },{s:Albert Einstein, p:hasWonPrize, o:Nobel Prize in Physics },{s:Albert Einstein, p:diedOnDate, o:1955-04-18 }]

File 2 File name: Hash(Subject2)

[{s:Charles K. Kao , p:hasWonPrize, o:Nobel Prize in Physics },{s:Charles K. Kao , p:wasBornIn, o:Shanghai }]

File 3 File name: Hash(Subject3)

[{s:Faye Wong, p:hasWonPrize, o:MTV Video Music Awards },{s:Faye Wong, p:wasBornIn, o:Beijing}]
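A minimal sketch of this horizontal scheme groups triples into files keyed by a hash of the subject. The slides do not say which hash function is used; SHA-256 and the helper names below are assumptions chosen only so the example is deterministic.

```python
import hashlib
from collections import defaultdict

TRIPLES = [
    ("Albert Einstein", "isCalled", "Albert Einstein"),
    ("Albert Einstein", "isCalled", "阿尔伯特•爱因斯坦"),
    ("Albert Einstein", "wasBornIn", "Ulm"),
    ("Albert Einstein", "wasBornOnDate", "1879-03-14"),
    ("Albert Einstein", "hasWonPrize", "Nobel Prize in Physics"),
    ("Albert Einstein", "diedOnDate", "1955-04-18"),
    ("Charles K. Kao", "hasWonPrize", "Nobel Prize in Physics"),
    ("Charles K. Kao", "wasBornIn", "Shanghai"),
    ("Faye Wong", "hasWonPrize", "MTV Video Music Awards"),
    ("Faye Wong", "wasBornIn", "Beijing"),
]


def file_name_for(subject):
    """Derive a stable file name from the subject (the hash choice is an assumption)."""
    return hashlib.sha256(subject.encode("utf-8")).hexdigest()


def horizontal_partition(triples):
    """Place every triple in the file named after the hash of its subject."""
    files = defaultdict(list)
    for s, p, o in triples:
        files[file_name_for(s)].append({"s": s, "p": p, "o": o})
    return dict(files)


partitions = horizontal_partition(TRIPLES)
einstein_file = partitions[file_name_for("Albert Einstein")]
```

With this layout a pattern whose subject is known touches exactly one file, while a pattern with only the predicate known may have to read every file.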

Vertical Partitioning with JSON

For example, the triples

Albert Einstein isCalled Albert Einstein
Albert Einstein isCalled 阿尔伯特•爱因斯坦
Albert Einstein wasBornIn Ulm
Albert Einstein wasBornOnDate 1879-03-14
Albert Einstein hasWonPrize Nobel Prize in Physics
Albert Einstein diedOnDate 1955-04-18
Charles K. Kao hasWonPrize Nobel Prize in Physics
Charles K. Kao wasBornIn Shanghai
Faye Wong hasWonPrize MTV Video Music Awards
Faye Wong wasBornIn Beijing

Store in HDFS, one file per predicate:

File 1 File name: isCalled

[{s:Albert Einstein, o:Albert Einstein },{s:Albert Einstein, o:阿尔伯特•爱因斯坦 }]

File 2 File name: wasBornIn

[{s:Albert Einstein, o:Ulm },{s:Charles K. Kao , o:Shanghai},{s:Faye Wong, o:Beijing}]

File 3 File name: wasBornOnDate

[{s:Albert Einstein, o:1879-03-14 }]

File 4 File name: hasWonPrize

[{s:Albert Einstein, o:Nobel Prize in Physics },{s:Charles K. Kao , o:Nobel Prize in Physics },{s:Faye Wong, o:MTV Video Music Awards }]

File 5 File name: diedOnDate

[{s:Albert Einstein, o:1955-04-18 }]
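Vertical partitioning can be sketched as grouping triples by predicate and dropping the predicate from each record, since the file name already carries it. The function name and the in-memory dictionary of "files" are assumptions for illustration.

```python
from collections import defaultdict

TRIPLES = [
    ("Albert Einstein", "isCalled", "Albert Einstein"),
    ("Albert Einstein", "isCalled", "阿尔伯特•爱因斯坦"),
    ("Albert Einstein", "wasBornIn", "Ulm"),
    ("Albert Einstein", "wasBornOnDate", "1879-03-14"),
    ("Albert Einstein", "hasWonPrize", "Nobel Prize in Physics"),
    ("Albert Einstein", "diedOnDate", "1955-04-18"),
    ("Charles K. Kao", "hasWonPrize", "Nobel Prize in Physics"),
    ("Charles K. Kao", "wasBornIn", "Shanghai"),
    ("Faye Wong", "hasWonPrize", "MTV Video Music Awards"),
    ("Faye Wong", "wasBornIn", "Beijing"),
]


def vertical_partition(triples):
    """One file per predicate; records keep only the subject and the object."""
    files = defaultdict(list)
    for s, p, o in triples:
        files[p].append({"s": s, "o": o})
    return dict(files)


files = vertical_partition(TRIPLES)
```

A triple pattern with a known predicate, which is the common case in the example queries, then reads a single file.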

Clustered property partitioning with JSON

For example, the triples

Albert Einstein isCalled Albert Einstein
Albert Einstein isCalled 阿尔伯特•爱因斯坦
Albert Einstein wasBornIn Ulm
Albert Einstein wasBornOnDate 1879-03-14
Albert Einstein hasWonPrize Nobel Prize in Physics
Albert Einstein diedOnDate 1955-04-18
Charles K. Kao hasWonPrize Nobel Prize in Physics
Charles K. Kao wasBornIn Shanghai
Faye Wong hasWonPrize MTV Video Music Awards
Faye Wong wasBornIn Beijing

Store in HDFS, one file per cluster:

File 1 File name: cluster1

[{s:Albert Einstein, p:isCalled, o:Albert Einstein },{s:Albert Einstein, p:isCalled, o:阿尔伯特•爱因斯坦 },{s:Albert Einstein, p:wasBornIn, o:Ulm },{s:Albert Einstein, p:wasBornOnDate, o:1879-03-14 },{s:Albert Einstein, p:hasWonPrize, o:Nobel Prize in Physics },{s:Albert Einstein, p:diedOnDate, o:1955-04-18 }]

File 2 File name: cluster2

[{s:Charles K. Kao , p:hasWonPrize, o:Nobel Prize in Physics },{s:Charles K. Kao , p:wasBornIn, o:Shanghai },{s:Faye Wong, p:hasWonPrize, o:MTV Video Music Awards },{s:Faye Wong, p:wasBornIn, o:Beijing}]
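The slides do not state how clusters are formed. One reading consistent with the example is that subjects sharing the same set of properties go into the same cluster; the sketch below implements that reading and should be treated as an assumption, not as the authors' algorithm. The cluster naming scheme is likewise invented.

```python
from collections import defaultdict

TRIPLES = [
    ("Albert Einstein", "isCalled", "Albert Einstein"),
    ("Albert Einstein", "isCalled", "阿尔伯特•爱因斯坦"),
    ("Albert Einstein", "wasBornIn", "Ulm"),
    ("Albert Einstein", "wasBornOnDate", "1879-03-14"),
    ("Albert Einstein", "hasWonPrize", "Nobel Prize in Physics"),
    ("Albert Einstein", "diedOnDate", "1955-04-18"),
    ("Charles K. Kao", "hasWonPrize", "Nobel Prize in Physics"),
    ("Charles K. Kao", "wasBornIn", "Shanghai"),
    ("Faye Wong", "hasWonPrize", "MTV Video Music Awards"),
    ("Faye Wong", "wasBornIn", "Beijing"),
]


def clustered_property_partition(triples):
    """Group subjects by their property set (assumed criterion), one file per group."""
    # First pass: collect the set of predicates used by each subject.
    predicates_of = defaultdict(set)
    for s, p, _ in triples:
        predicates_of[s].add(p)

    # Second pass: assign each distinct property set a cluster name in order of appearance.
    cluster_of_signature = {}
    files = defaultdict(list)
    for s, p, o in triples:
        signature = frozenset(predicates_of[s])
        if signature not in cluster_of_signature:
            cluster_of_signature[signature] = f"cluster{len(cluster_of_signature) + 1}"
        files[cluster_of_signature[signature]].append({"s": s, "p": p, "o": o})
    return dict(files)


clusters = clustered_property_partition(TRIPLES)
```

On the example data this reproduces the two files above: Albert Einstein alone in cluster1, and Charles K. Kao with Faye Wong in cluster2.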

Partition Index: Vertical Partitioning

Inverted index on subject (s)

s                  File list
Albert Einstein    isCalled, wasBornIn, wasBornOnDate, hasWonPrize, diedOnDate
……

Inverted index on object (o)

o                  File list
Albert Einstein    isCalled
……

File 1 File name: isCalled

[{s:Albert Einstein, o:Albert Einstein },{s:Albert Einstein, o:阿尔伯特•爱因斯坦 }]

File 2 File name: wasBornIn

[{s:Albert Einstein, o:Ulm },{s:Charles K. Kao , o:Shanghai},{s:Faye Wong, o:Beijing}]

File 5 File name: diedOnDate

[{s:Albert Einstein, o:1955-04-18 }]

File 3 File name: wasBornOnDate

[{s:Albert Einstein, o:1879-03-14 }]

File 4 File name: hasWonPrize

[{s:Albert Einstein, o:Nobel Prize in Physics },{s:Charles K. Kao , o:Nobel Prize in Physics },{s:Faye Wong, o:MTV Video Music Awards }]
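Building such an inverted index amounts to recording, for every subject or object value, the files in which it occurs. The following sketch does this for the vertically partitioned files; the `FILES` dictionary and the function name are assumptions, and a production index would be built and stored on the cluster rather than in memory.

```python
from collections import defaultdict

# The vertically partitioned example files, keyed by file (predicate) name.
FILES = {
    "isCalled": [
        {"s": "Albert Einstein", "o": "Albert Einstein"},
        {"s": "Albert Einstein", "o": "阿尔伯特•爱因斯坦"},
    ],
    "wasBornIn": [
        {"s": "Albert Einstein", "o": "Ulm"},
        {"s": "Charles K. Kao", "o": "Shanghai"},
        {"s": "Faye Wong", "o": "Beijing"},
    ],
    "wasBornOnDate": [{"s": "Albert Einstein", "o": "1879-03-14"}],
    "hasWonPrize": [
        {"s": "Albert Einstein", "o": "Nobel Prize in Physics"},
        {"s": "Charles K. Kao", "o": "Nobel Prize in Physics"},
        {"s": "Faye Wong", "o": "MTV Video Music Awards"},
    ],
    "diedOnDate": [{"s": "Albert Einstein", "o": "1955-04-18"}],
}


def build_inverted_index(files, field):
    """Map each value of `field` ('s' or 'o') to the list of files containing it."""
    index = defaultdict(list)
    for file_name, records in files.items():
        for record in records:
            file_list = index[record[field]]
            if file_name not in file_list:
                file_list.append(file_name)
    return dict(index)


subject_index = build_inverted_index(FILES, "s")
object_index = build_inverted_index(FILES, "o")
```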

Partition Index: Horizontal partitioning

Inverted index on predicate (p)

p                         File list
isCalled                  Hash(Subject1)
……

Inverted index on object (o)

o                         File list
Nobel Prize in Physics    Hash(Subject1), Hash(Subject2)
……

File 1 File name: Hash(Subject1)

[{s:Albert Einstein, p:isCalled, o:Albert Einstein },{s:Albert Einstein, p:isCalled, o:阿尔伯特•爱因斯坦 },{s:Albert Einstein, p:wasBornIn, o:Ulm },{s:Albert Einstein, p:wasBornOnDate, o:1879-03-14 },{s:Albert Einstein, p:hasWonPrize, o:Nobel Prize in Physics },{s:Albert Einstein, p:diedOnDate, o:1955-04-18 }]

File 2 File name: Hash(Subject2)

[{s:Charles K. Kao , p:hasWonPrize, o:Nobel Prize in Physics },{s:Charles K. Kao , p:wasBornIn, o:Shanghai }]

File 3 File name: Hash(Subject3)

[{s:Faye Wong, p:hasWonPrize, o:MTV Video Music Awards },{s:Faye Wong, p:wasBornIn, o:Beijing}]

Partition Index: Clustered property partitioning

File 1 File name: cluster1

[{s:Albert Einstein, p:isCalled, o:Albert Einstein },{s:Albert Einstein, p:isCalled, o:阿尔伯特•爱因斯坦 },{s:Albert Einstein, p:wasBornIn, o:Ulm },{s:Albert Einstein, p:wasBornOnDate, o:1879-03-14 },{s:Albert Einstein, p:hasWonPrize, o:Nobel Prize in Physics },{s:Albert Einstein, p:diedOnDate, o:1955-04-18 }]

File 2 File name: cluster2

[{s:Charles K. Kao , p:hasWonPrize, o:Nobel Prize in Physics },{s:Charles K. Kao , p:wasBornIn, o:Shanghai },{s:Faye Wong, p:hasWonPrize, o:MTV Video Music Awards },{s:Faye Wong, p:wasBornIn, o:Beijing}]

Inverted index on predicate (p)

p                  File list
isCalled           cluster1
……

Inverted index on object (o)

o                  File list
Albert Einstein    cluster1
……

Inverted index on subject (s)

s                  File list
Albert Einstein    cluster1
Charles K. Kao     cluster2
Faye Wong          cluster2
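At query time the indexes can be used to decide which files a triple pattern needs to read. The sketch below intersects the file lists of every known term in a pattern. The index contents are derived from the two example cluster files (the slides show only a few entries), and the function name and the use of `None` for variables are assumptions.

```python
# Index entries derived from the cluster1 and cluster2 example files (partial on objects).
S_INDEX = {
    "Albert Einstein": {"cluster1"},
    "Charles K. Kao": {"cluster2"},
    "Faye Wong": {"cluster2"},
}
P_INDEX = {
    "isCalled": {"cluster1"},
    "wasBornOnDate": {"cluster1"},
    "diedOnDate": {"cluster1"},
    "wasBornIn": {"cluster1", "cluster2"},
    "hasWonPrize": {"cluster1", "cluster2"},
}
O_INDEX = {
    "Albert Einstein": {"cluster1"},
    "Ulm": {"cluster1"},
    "Nobel Prize in Physics": {"cluster1", "cluster2"},
    "Beijing": {"cluster2"},
}
ALL_FILES = {"cluster1", "cluster2"}


def files_for_pattern(s, p, o):
    """Return the files that may contain matches; None marks a variable position."""
    candidates = set(ALL_FILES)
    for term, index in ((s, S_INDEX), (p, P_INDEX), (o, O_INDEX)):
        if term is not None:
            # A known term restricts the candidates to the files listed for it.
            candidates &= index.get(term, set())
    return candidates
```

For instance, `files_for_pattern(None, "isCalled", None)` selects only cluster1, so the read for that pattern skips cluster2 entirely.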

Experiments

Experiments

Dataset: Billion Triples Challenge 2010 (BTC10)
– 3.2B <s, p, o, q> quads, 624 GB
– The resulting dataset has 1,426,823,976 unique triples

Hadoop 0.20.2, with "dfs.replication" set to 2
Ubuntu 10.04, Linux 2.6.32-24-server, 64-bit
30 nodes: one node is the master, the others are slaves
47 GB memory, 4.3 TB disk space and 24 processors (Intel(R) Xeon(R) CPU E5645 @ 2.40GHz)

JAQL 0.5.1
Java 1.6
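The slides do not describe how the quads were reduced to unique triples. A plausible reading is that the fourth component (q) is dropped and duplicate triples are removed; the sketch below shows that step on a few hypothetical quads. The example.org source URLs are invented for illustration, and at BTC10 scale this would itself run as a distributed job rather than in memory.

```python
def unique_triples(quads):
    """Drop the fourth component (q) of each quad and keep each triple once."""
    seen = set()
    triples = []
    for s, p, o, _q in quads:
        triple = (s, p, o)
        if triple not in seen:
            seen.add(triple)
            triples.append(triple)
    return triples


# Hypothetical quads: the same statement reported by two different sources.
QUADS = [
    ("Albert Einstein", "wasBornIn", "Ulm", "http://example.org/source-a"),
    ("Albert Einstein", "wasBornIn", "Ulm", "http://example.org/source-b"),
    ("Faye Wong", "wasBornIn", "Beijing", "http://example.org/source-a"),
]

triples = unique_triples(QUADS)
```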

Experiments

Fig. Distribution of data

Experiments

Fig. Execution time of each query

Conclusion

Conclusion

Solution for SPARQL queries in MapReduce
– Transform the queries into JAQL operators running on Hadoop

Transformation of SPARQL to JAQL
– Filter, Transform, Join ……

Data partitioning strategy
– Horizontal partitioning
– Vertical partitioning
– Clustered property partitioning

Experiments show the performance
– Clustered property partitioning performs best
– Horizontal partitioning performs worst

Scalability

RDBMS
– Waits and deadlocks increase nonlinearly with transaction size and concurrency
– Scale-up (vertical scaling): commercial RDBMSes are very, very expensive
– Schema: structured data

MapReduce
– Linear scaling, high throughput
– Scale-out (horizontal scaling)
– Schema-free: unstructured data

RDBMS vs. MapReduce

             Traditional RDBMS            MapReduce
Data size    Gigabytes                    Petabytes
Access       Interactive and batch        Batch
Updates      Read and write many times    Write once, read many times
Structure    Static schema                Dynamic schema
Integrity    High                         Low
Scaling      Nonlinear                    Linear

Table. RDBMS compared to MapReduce

Limits of Hadoop

The Apache Hadoop MapReduce framework has hit a scalability limit around 4,000 machines

The MapReduce JobTracker needs a drastic overhaul to address several deficiencies in its scalability, memory consumption, threading-model, reliability and performance

The Next Generation of Apache Hadoop MapReduce

Divide the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate components.

– ResourceManager
– ApplicationMaster

Reliability

Availability

Scalability: beyond 10,000 machines

Backward (and forward) compatibility

Evolution: for customers to control upgrades

Predictable latency

Cluster utilization

Conclusion

Hadoop (MapReduce)
– Pro.
  Scalable
  High throughput
– Con.
  Expense of latency
  No index
  No more than 4000 nodes

SPARQL on Cloud
– Pro.
  Scalable
  High throughput
– Con.
  Expense of latency
  Complex query: JAQL join operation

Thanks!

SPARQL queries

Q1: select ?X ?Y where { ?X rdfs:label "Albert Einstein" . ?X smc:page ?Y . ?X rdf:type smc:Subject . }

Q2: select ?x ?y ?z where { dbsc:Ulm rdf:type ?x . ?x rdfs:label ?y . ?x rdfs:comment ?z . }

Q3: select ?who ?Y ?date1 ?Z ?date2 ?prize where { ?who source:bornIn ?Y . ?who source:bornOnDate ?date1 . ?who source:diedIn ?Z . ?who source:diedOnDate ?date2 . ?who source:hasWonPrize ?prize . }

Q4: select ?x ?author ?title where { ?x purl:hasAuthor ?author . ?x purl:hasBooktitle "ISWC 2009" . ?x purl:hasTitle ?title . }

Q5: select distinct ?name ?lat ?long ?pop where { ?a property:name ?name . ?a property:region dbsc:Nord-Pas-de-Calais . ?a pos:lat ?lat . ?a pos:long ?long . ?a property:population ?pop . }

SPARQL queries

Q6: select ?bn ?b ?p where { ?a property:name ?bn . ?a property:dateOfBirth ?b . ?a property:placeOfBirth ?p . }

Q7: select ?Y ?type ?prize where { source:Albert_Einstein source:bornIn ?Y . source:Albert_Einstein rdf:type ?type . source:Albert_Einstein source:hasWonPrize ?prize . }

Q8: select ?a ?type ?pub where { ?a rdf:type ?type . ?a semweb:publisher ?pub . ?a semweb:periodical_title "Theory of Computing Systems" . }

Q9: select distinct ?a ?lat ?long ?pop where { ?a geo:ontology#name "Chevilly" . ?a geo:ontology#inCountry geo:countries#FR . ?a pos:lat ?lat . ?a pos:long ?long . ?a geo:ontology#population ?pop . }

Q10: select distinct ?l ?long ?lat where { ?a property:placeOfBirth ?l . ?l pos:lat ?lat . ?l pos:long ?long . }

SPARQL queries

Q3 and Q10 are star-join queries with popular predicates and unspecified objects.

Q1, Q4, Q5, Q6, Q8 and Q9 are also star joins, but with one or more known objects.

Q2 is a chain query.

In Q7, the subject is a fixed value rather than a variable.