Geospatial Stream Query Processing using Microsoft SQL Server StreamInsight



Seyed Jalal Kazemitabar¹, Ugur Demiryurek¹, Mohamed Ali², Afsin Akdogan¹, Cyrus Shahabi¹

¹ Integrated Media Systems Center, University of Southern California
² Microsoft SQL Server, Microsoft Corporation

ICampus, IWatch, CT

Introduction

• A real-world data-driven framework which enables:
– Fast query processing over stream data using Microsoft StreamInsight™
– Running spatial queries over geospatial data
– Online analysis and prediction based on historic data using our in-memory sketching technique

GeoInsight

• StreamInsight Architecture (figure: Streaming Engine)

• Stream flow in demo

[Figure: stream flow in the demo. An Input Adapter feeds queries Q1 to Q7, built from Value Filter, Spatial Filter, PCA, Predict, Refine, and Average operators; results leave through an Output Adapter.]
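The operator chain named in the stream-flow figure can be sketched in plain Python. This is not the StreamInsight API (StreamInsight queries are written against .NET); it only illustrates readings passing through a value filter, a spatial filter, and a windowed average. The `Reading` fields, the speed bounds, the bounding box, and the 60-second tumbling window are all assumptions made for this example.

```python
from collections import namedtuple

# Hypothetical sensor reading: timestamp in seconds, position, speed in mph.
Reading = namedtuple("Reading", "ts sensor_id lat lon speed")

def value_filter(stream, lo=0.0, hi=120.0):
    """Drop readings whose speed is outside a plausible range (assumed bounds)."""
    for r in stream:
        if lo <= r.speed <= hi:
            yield r

def spatial_filter(stream, box):
    """Keep readings inside a (min_lat, min_lon, max_lat, max_lon) bounding box."""
    min_lat, min_lon, max_lat, max_lon = box
    for r in stream:
        if min_lat <= r.lat <= max_lat and min_lon <= r.lon <= max_lon:
            yield r

def tumbling_average(stream, window_seconds):
    """Yield (window_start, mean_speed) per window; assumes non-decreasing ts."""
    current, total, count = None, 0.0, 0
    for r in stream:
        start = r.ts - (r.ts % window_seconds)
        if current is not None and start != current:
            yield current, total / count
            total, count = 0.0, 0
        current = start
        total += r.speed
        count += 1
    if count:
        yield current, total / count

# Illustrative data only; the box roughly covers central Los Angeles.
LA_BOX = (33.9, -118.5, 34.2, -118.0)
readings = [
    Reading(0, "a", 34.05, -118.25, 60.0),
    Reading(10, "b", 34.06, -118.24, 50.0),
    Reading(20, "c", 40.00, -100.00, 30.0),   # outside the box
    Reading(30, "a", 34.05, -118.25, 999.0),  # implausible speed
    Reading(70, "a", 34.05, -118.25, 40.0),
]

pipeline = tumbling_average(spatial_filter(value_filter(iter(readings)), LA_BOX), 60)
for window_start, mean_speed in pipeline:
    print(window_start, mean_speed)
```

Each stage is a generator, so readings are handled one at a time and nothing has to be stored beyond the running window totals.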

Application

Hybrid queries over spatio-temporal windows provide great analysis functionality including:

• Refinement functions
– Smoothing noisy input data according to previously observed patterns
– Detection of anomalies characterized by sensor readings that are highly deviated from historical mean values
• Prediction functions
– Predicting near future trends based on previously observed patterns
– Responding to anomalies and deliberately attempting to change future conditions

Approach

Online Analytical Refinement and Prediction (OARP) Using In-memory Sketches

• Instead of storing the whole data in DB, store the sketches in memory
• Principal Component Analysis (PCA): a mathematical approach for analyzing correlated data
• A number of components with great influence selected as coordinates
• Improving PCA performance for aggregate queries by calculating the query result in transformed space
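The sketching idea in the Approach (keep a small PCA sketch in memory and answer aggregate queries in the transformed space) can be illustrated with a minimal, dependency-free example. It is a sketch under stated assumptions, not the authors' implementation: the function names, the toy speed profiles, and the use of power iteration in place of a linear algebra library are all choices made here for illustration. The average of a column over a set of rows is computed from the mean vector and the averaged component scores, so the original rows are never touched. The same sketch also serves a refinement function: projecting a noisy profile onto the stored components smooths it toward previously observed patterns.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_components(centered, k, iters=100):
    """Top-k principal directions via power iteration with deflation."""
    d = len(centered[0])
    # Scatter matrix Xc^T Xc; its eigenvectors are the principal directions.
    scatter = [[sum(r[i] * r[j] for r in centered) for j in range(d)] for i in range(d)]
    components = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d  # fixed start; assumed not orthogonal to the answer
        for _ in range(iters):
            w = [dot(row, v) for row in scatter]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # no variance left to extract
                break
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(row, v) for row in scatter])
        components.append(v)
        # Deflate: remove the direction just found.
        scatter = [[scatter[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
                   for i in range(d)]
    return components

def build_sketch(rows, k):
    """Sketch = (column means, k components, per-row scores in component space)."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, mean)] for r in rows]
    components = top_components(centered, k)
    scores = [[dot(c, r) for c in components] for r in centered]
    return mean, components, scores

def average_from_sketch(sketch, row_ids, column):
    """Average one column over selected rows, working in the transformed space."""
    mean, components, scores = sketch
    avg_score = [sum(scores[i][c] for i in row_ids) / len(row_ids)
                 for c in range(len(components))]
    return mean[column] + sum(s * comp[column] for s, comp in zip(avg_score, components))

def refine(sketch, profile):
    """Smooth a new profile by projecting it onto the stored components."""
    mean, components, _ = sketch
    centered = [x - m for x, m in zip(profile, mean)]
    coeffs = [dot(c, centered) for c in components]
    return [m + sum(a * c[j] for a, c in zip(coeffs, components))
            for j, m in enumerate(mean)]

# Toy data (hypothetical): 5 days x 6 time slots of speeds, one dominant pattern.
base = [60.0, 55.0, 30.0, 25.0, 50.0, 62.0]
pattern = [0.0, 1.0, 4.0, 5.0, 2.0, 0.0]
rows = [[b + a * p for b, p in zip(base, pattern)] for a in (-2, -1, 0, 1, 2)]

sketch = build_sketch(rows, k=1)
from_sketch = average_from_sketch(sketch, [3, 4], column=2)
direct = (rows[3][2] + rows[4][2]) / 2
print(round(from_sketch, 6), direct)

noisy = [63.0, 56.5, 36.0, 32.5, 53.0, 59.0]
print([round(x, 6) for x in refine(sketch, noisy)])
```

In this toy data every day differs from the mean along a single pattern, so one component reproduces the rows exactly and the sketch-based average agrees with the direct one. On real traffic data the sketch would be approximate, and the number of components kept controls the trade-off between size and error.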

Challenges

Large Datasets and Spatial Queries

• Large response time caused by disk I/O limits the availability of hybrid queries in real-time streaming applications

"What was the average speed in I-10 in LA county during summer 2009 from 4:00-5:00 pm?"

Database response time for the indexed table containing data of one year (150 GB): 58 seconds!

Contribution/Experiments

PCA for Traffic Data

• High data compression rate
– 98% for highway data
• Extra short response time
– 2 milliseconds (compare to 58 sec.)
• Highly accurate for Traffic Data
– MSE for same query: 10^-4 Mph

[Figure: "Real Data" and "Transformed Data" panels plotting Speed against Time, and a plot of % of Variance in data against Components, annotated with 98%.]
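To put the reported figures side by side, the short calculation below derives the speedup implied by the two response times. It also estimates a sketch size, but the poster does not say that the 98% compression rate was measured on the same 150 GB table, so that second number is an illustrative assumption rather than a reported result.

```python
# Figures reported on the poster.
db_response_ms = 58 * 1000   # 58 seconds for the indexed table
sketch_response_ms = 2       # 2 milliseconds using the in-memory sketch
dataset_gb = 150             # one year of data
compression_pct = 98         # compression rate reported for highway data

speedup = db_response_ms // sketch_response_ms

# Assumption: the 98% rate applies to the same 150 GB dataset.
sketch_gb = dataset_gb * (100 - compression_pct) / 100

print(f"speedup: {speedup}x")                  # speedup: 29000x
print(f"implied sketch size: {sketch_gb} GB")  # implied sketch size: 3.0 GB
```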

Conclusion and Future Work

• Limited support for geostreaming (continuous spatial queries) in current database technologies
• Demo application as a proof of concept for a system which runs spatial queries over real-time data
• Implementing the fundamentals of Clever Transportation (CT) project as a platform for monitoring, querying, and analyzing real-time Los Angeles traffic data
• Devising a scalable spatial alarm continuous query suitable for location-based services