SSN-TC workshop talk at ISWC 2015 on Emrooz

19
First Joint International Workshop on Semantic Sensor Networks and Terra Cognita October 11, 2015, Bethlehem, PA, USA Emrooz: A Scalable Database for SSN Observations Markus Stocker, Narasinha Shurpali, Kerry Taylor, George Burba, Mauno Rönkkö, Mikko Kolehmainen markus.stocker@uef.fi @markusstocker and @envinf

Transcript of SSN-TC workshop talk at ISWC 2015 on Emrooz

Page 1: SSN-TC workshop talk at ISWC 2015 on Emrooz

First Joint International Workshop onSemantic Sensor Networks and Terra CognitaOctober 11, 2015, Bethlehem, PA, USA

Emrooz: A Scalable Database forSSN Observations

Markus Stocker, Narasinha Shurpali, Kerry Taylor, GeorgeBurba, Mauno Rönkkö, Mikko Kolehmainen

[email protected]@markusstocker and @envinf

Page 2: SSN-TC workshop talk at ISWC 2015 on Emrooz

2

IntroductionI Expressive ontologies for sensor (meta-) data (SSN)I Flexible graph data model (RDF)I Triple stores obvious choiceI Unfortunately hardly viable at scaleI Triple stores indexes for graph pattern queriesI Not designed for time series interval queries

Page 3: SSN-TC workshop talk at ISWC 2015 on Emrooz

3

AimI Build a database that ...I Consumes SSN observations in RDFI Evaluates SSN observation SPARQL queriesI Scales to billions of observationsI Has better query performance than triple stores

Page 4: SSN-TC workshop talk at ISWC 2015 on Emrooz

4

Architecture

Page 5: SSN-TC workshop talk at ISWC 2015 on Emrooz

5

Cassandra data modelI Schema consisting of

I Partition key (row key) of type asciiI Clustering key (column name) of type timeuuidI Column value of type blob

I The partition key consists of two (dash-concatenated) partsI SHA-256 hex string digest of sensor-property-feature URIsI Date time string of pattern yyyyMMddHHmm

I Computed from observation result timeI Floor-rounded to year, month, day, hour, or minuteI Rounding depends on sensor sampling frequencyI Goal is to limit the number of columns per row

I Clustering key determined by observation result timeI Column value is set of triples for observation (binary)

Page 6: SSN-TC workshop talk at ISWC 2015 on Emrooz

6

Experiments

I LI-7500A Open Path CO2/H2O Gas AnalyzerI LI-7700 Open Path CH4 AnalyzerI Property of mole fractionI Three features for the monitored gases

Page 7: SSN-TC workshop talk at ISWC 2015 on Emrooz
Page 8: SSN-TC workshop talk at ISWC 2015 on Emrooz

8

Experiments

I January 7 to May 26, 2015, 6045 GHG archive filesI Estimated # of sensor observations is 326 430 000I Estimated # of triples is 4.9 billion (15 triples / observation)I Load and query performance on 10 subsetsI SPARQL query with 10 min intervalI Compared to Stardog and BlazegraphI Test performance with varying time interval

Page 9: SSN-TC workshop talk at ISWC 2015 on Emrooz

9

The query

select ?time ?valuewhere { [

ssn:observedBy licor:LERS-75H-2035 ;ssn:observedProperty sweet-propFraction:MoleFraction ;ssn:featureOfInterest sweet-matrCompound:CO2 ;ssn:observationResultTime [ time:inXSDDateTime ?time ] ;ssn:observationResult [ ssn:hasValue [

dul:hasRegionDataValue ?value] ]

]filter (?time >= "2015-04-15T00:00:00.000+06:00"^^xsd:dateTime

&& ?time < "2015-04-15T00:10:00.000+06:00"^^xsd:dateTime)}order by asc(?time)

Page 10: SSN-TC workshop talk at ISWC 2015 on Emrooz

10

Results: Some figures

Subset Observations Triples Distinct30 m 54 000 810 000 648 0071 h 108 000 1 620 000 1 296 0073 h 324 000 4 860 000 3 888 0076 h 647 997 9 719 955 7 775 97112 h 1 295 997 19 439 955 15 551 9711 d 2 591 994 38 879 910 31 103 9357 d 18 140 271 272 104 065 217 683 2591 M 72 526 464 1 087 896 960 870 317 5753 M 194 188 107 2 912 821 605 *J-M 328 715 445 4 930 731 675 *

Page 11: SSN-TC workshop talk at ISWC 2015 on Emrooz

11

Results: Load performance

10

100

1000

10000

100000

1000000

30 m 1 h 3 h 6 h 12 h 1 d 7 d 1 m 3 m J-M

Tim

e (lo

g sc

ale)

[s]

Subsets

EmroozBlazegraph

Stardog

Page 12: SSN-TC workshop talk at ISWC 2015 on Emrooz

12

Results: Query performance

10

100

1000

30 m 1 h 3 h 6 h 12 h 1 d 7 d 1 m 3 m J-M

Tim

e (lo

g sc

ale)

[s]

Subsets

EmroozBlazegraph

Stardog

Page 13: SSN-TC workshop talk at ISWC 2015 on Emrooz

13

Results: Query size performance

0

1

2

3

4

5

6

7

8

9

10

1 s 30 s 1 m 5 m 10 m 20 m 30 m 40 m 50 m 60 m

Tim

e [s

]

Query time interval

Emrooz

Page 14: SSN-TC workshop talk at ISWC 2015 on Emrooz

14

REST

curl http://localhost:8080/sensors/listcurl http://localhost:8080/properties/listcurl http://localhost:8080/features/list

curl -H "Accept: application/json" \http://localhost:8080/sensors/list

curl -H "Accept: text/csv" -G \--data-urlencode sensor=http://example.org#thermometer \--data-urlencode property=http://example.org#temperature \--data-urlencode feature=http://example.org#air \--data-urlencode from=2015-04-21T01:00:00.000+03:00 \--data-urlencode to=2015-04-21T02:00:00.000+03:00 \http://localhost:8080/observations/sensor/list

Page 15: SSN-TC workshop talk at ISWC 2015 on Emrooz

15

R

host <- "http://localhost:8080"

df.sensors <- read.csv(text=getURL(paste0(host, "/sensors/list")),header=FALSE, col.names=c("sensor"))

df.sensorssensor

1 http://licor.com#LERS-75H-CH42 http://licor.com#LERS-75H-CO2

Page 16: SSN-TC workshop talk at ISWC 2015 on Emrooz

16

Rhost <- "http://localhost:8080"sensor <- "http://licor.com#LERS-75H-CO2"property <- "http://sweet.jpl.nasa.gov/2.3/propMass.owl#Density"feature <- "http://sweet.jpl.nasa.gov/2.3/matrCompound.owl#CarbonDioxide"from <- "2015-01-07T00:00:00.000+06:00"to <- "2015-01-07T00:01:00.000+06:00"

url <- paste0(host, "/observations/sensor/list?","sensor=", curlEscape(sensor),"&property=", curlEscape(property),"&feature=", curlEscape(feature),"&from=", curlEscape(from),"&to=", curlEscape(to))

df.observations <- read.csv(text=getURL(url,httpheader=c(Accept="text/csv")), header=TRUE, sep=",")

ggplot(data=df.observations, aes(time, value))+ geom_line() + xlab("Time") + ylab("CO2 [mmol m-3]")

Page 17: SSN-TC workshop talk at ISWC 2015 on Emrooz

17

R

18.00

18.05

18.10

18.15

00 15 30 45 00

Time

CO

2 [m

mol m

−3]

Page 18: SSN-TC workshop talk at ISWC 2015 on Emrooz

18

Related and future workI Other authors have pointed out the problemI “Semantification of measurement data not promising”I RDF databases on NoSQL systems (e.g. Cumulus RDF)I Support for QB observations (done)I REST API (preliminary)I Integration with R/Matlab (preliminary)I Performance comparison with other systems

Page 19: SSN-TC workshop talk at ISWC 2015 on Emrooz

19

ConclusionI SSN and RDF nice for sensor (meta-) dataI Triple stores inadequate for observation dataI Alternative approaches requiredI What are the advantages and disadvantages?I Reasoning on all data by some sensor?I Query for observation values exceeding threshold?