BMQ-Index: Shared and Incremental Processing of Border Monitoring Queries over Data Streams

Jinwon Lee, Y. Lee, S. Kang, S. Lee, H. Jin, B. Kim and J. Song (Korea Advanced Institute of Science and Technology)



Page 2

Outline

• Border Monitoring Query (BMQ)
• BMQ-Index
• Experiments
• Related work
• Conclusion

Page 3

Emerging Computing Environment

Data stream monitoring: GPSs and sensors emit data streams (e.g., 11, 10, 12, 13, 12, 14) that are evaluated against continuous range queries:

Q1: 10 < value
Q2: 11 < value < 13
…….

Application areas:

• Remote Medical Service
• Disaster Prevention: Flood Warning, Earthquake Prediction, Building Monitoring, Traffic light control
• Automatic Home: Automatic Ventilation, Automatic Temperature Control, Automatic Humidity Control
• Logistics: Management, Theft-proofing, Catalog, Advertisement
• Location-based Service: Tracking (Friends, Employee), Vehicle Monitoring, Intelligent Transportation

Page 4

Motivating Service Scenario #1

Stock trading

[Figure: SAMSUNG stock price during 23 days from Nov. 16th to Dec. 23rd, 2005. X-axis: time; y-axis: data stream value ($), 580 to 660. An upper border marks "Expensive !! ( > $640)", where the investor sells, and a lower border marks "Cheap !! ( < $600)", where the investor buys.]

Monitor stock data streams crossing the borders !!

Page 5

Motivating Service Scenario #2

Location-based advertisement

Send a special lunch menu to people within 1km during lunch time !!

[Figure: people coming into and going out of the area around a store; example services: Coupon, Pet-Care.]

Monitor location data streams crossing the borders !!

Page 6

Border Monitoring Query

To monitor data streams crossing the borders
– Essential concern in many practical applications
– Users’ main interest
– Useful to automatically trigger or stop relevant actions

BMQ (Border Monitoring Query)
– A new type of continuous range query !!
– It reports only data crossing the borders of a query range (= coming into or going out from the query range)

RMQ (Region Monitoring Query)
– Conventional continuous range query
– It reports all matching data within a query range
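The difference between the two query types can be illustrated with a minimal Python sketch. The function names and the price sequence are hypothetical; the range borders ($600 and $640) are borrowed from the stock trading scenario, and the sketch assumes that the very first tuple of a stream is never reported because there is no previous value to compare with.

```python
from typing import Iterable, List, Tuple

def rmq_reports(lo: float, hi: float, stream: Iterable[float]) -> List[float]:
    """Region monitoring: report every value inside the range lo < v < hi."""
    return [v for v in stream if lo < v < hi]

def bmq_reports(lo: float, hi: float, stream: Iterable[float]) -> List[Tuple[str, float]]:
    """Border monitoring: report only values whose arrival changes the inside/outside status."""
    events = []
    prev_inside = None  # unknown before the first tuple (assumption: nothing is reported for it)
    for v in stream:
        inside = lo < v < hi
        if prev_inside is not None and inside != prev_inside:
            events.append(("in" if inside else "out", v))
        prev_inside = inside
    return events

prices = [590, 610, 620, 630, 650, 645]   # hypothetical stock prices ($)
print(rmq_reports(600, 640, prices))      # [610, 620, 630]
print(bmq_reports(600, 640, prices))      # [('in', 610), ('out', 650)]
```

The RMQ reports three tuples while the price stays inside the range; the BMQ reports only the two tuples that cross a border.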

Page 7

Problem: Scalability !!

A large number of BMQs can be issued
• Millions of stock investors will register their own queries
• Millions of stores will register their own queries
+ A huge volume of data streams arrives rapidly
+ Fast response is also essential for users

How can we process BMQs over data streams efficiently?
– (1) Naïve approach: individual BMQ processing at each data update. Lack of scalability !!
– (2) Based on existing mechanisms for RMQ evaluation: shared RMQ processing by indexing queries. Costly post-processing !!
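Approach (2) can be pictured as follows: an RMQ index returns the full set of matching queries for the previous and for the current value, and the border-crossed queries are obtained by comparing the two sets. The sketch below is only an illustration under assumptions: `rmq_match` is a hypothetical stand-in that scans all queries instead of using a real RMQ index such as CEI or IS-list, and the example queries are Q1 and Q2 from the slides.

```python
from typing import Dict, Set, Tuple

Range = Tuple[float, float]  # open range: lo < value < hi

def rmq_match(queries: Dict[str, Range], value: float) -> Set[str]:
    """Stand-in for a shared RMQ index: all queries whose range contains value."""
    return {q for q, (lo, hi) in queries.items() if lo < value < hi}

def bmq_by_postprocessing(queries: Dict[str, Range], prev: float, cur: float):
    """Derive border crossings by comparing two complete RMQ result sets."""
    before = rmq_match(queries, prev)
    after = rmq_match(queries, cur)
    # The set differences touch every matching query, even those whose status did not change.
    return after - before, before - after

queries = {"Q1": (10, float("inf")), "Q2": (11, 13)}  # Q1: 10 < value, Q2: 11 < value < 13
entered, left = bmq_by_postprocessing(queries, prev=12, cur=14)
print(entered, left)  # set() {'Q2'}
```

The post-processing cost grows with the size of the matching sets rather than with the number of borders actually crossed, which is the overhead the slide calls costly.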

Page 8

Solution Approach: BMQ-Index

Shared processing
– By query indexing approach
– BMQ-Index is built on registered BMQs
– Upon a data arrival, only border-crossed queries are quickly searched for
– Achieves a high level of scalability !!

[Figure: a data tuple (14) is fed to the BMQ-Index built on the registered BMQs (Q1: 10 < value, Q2: 11 < value < 13, …….); the output is Q1, Q2 (border-crossed queries).]

Page 9

Solution Approach: BMQ-Index

Incremental processing
– By incremental access method
– Use the previous search step for the next search: successive searches are significantly accelerated !!
– Keep only the information needed for incremental search: low storage cost !!

Locality of data streams !!

[Figure: a series of data tuples (10, 12, 13, 12, 14) is fed to the BMQ-Index built on the registered BMQs (Q1: 10 < value, Q2: 11 < value < 13, …….); the output is Q1, Q2 (border-crossed queries).]

Page 10

One-dimensional BMQ-Index (Example)

Example request: "Notify me whenever the IBM stock price is coming into or going out from my reasonable price range !!" (reasonable price range: $10 to $30)

Registered BMQs (unit: $)

QueryID   Range
Q1        (0, 10)
Q2        (0, 25)
Q3        (5, 20)
Q4        (15, 30)
Q5        (35, 45)

Stream Table

Stream_ID   Node pointer
IBM         (points to a node of the linked list)

Linked list (one node per query border, sorted by border value; +Q marks the lower border of Q and -Q the upper border)

Border   Labels
0        +Q1, +Q2
5        +Q3
10       -Q1
15       +Q4
20       -Q3
25       -Q2
30       -Q4
35       +Q5
45       -Q5
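The construction of the border list can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, a sorted Python list stands in for the linked list, and the query ranges are one reading of the example figure (Q1 = 0 to 10, Q2 = 0 to 25, Q3 = 5 to 20, Q4 = 15 to 30, Q5 = 35 to 45) chosen to agree with the search cases on the next slide.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_border_list(queries: Dict[str, Tuple[float, float]]) -> List[Tuple[float, List[str]]]:
    """Return [(border, labels)] sorted by border value.

    '+Q' marks the lower border of query Q and '-Q' its upper border,
    so every query contributes exactly two labels.
    """
    labels = defaultdict(list)
    for name, (lo, hi) in sorted(queries.items()):
        labels[lo].append("+" + name)
        labels[hi].append("-" + name)
    return sorted(labels.items())

# Assumed ranges (unit: $), see the lead-in above.
queries = {"Q1": (0, 10), "Q2": (0, 25), "Q3": (5, 20), "Q4": (15, 30), "Q5": (35, 45)}
border_list = build_border_list(queries)
for border, tags in border_list:
    print(border, tags)
```

Because each query is stored only at its two borders, the list holds 2 × (number of queries) labels regardless of how wide the ranges are.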

Page 11

Search Operation in One-dimension (Example)

The node pointer of the IBM stream rests at the previous data value (vt-1 = 21); the current data value (vt) decides the traversal:

Case 1) 21 → 23: No border-crossed query. No node traversal
Case 2) 21 → 37: -Q2, -Q4, +Q5. Traverse BMQ-Index to the right
Case 3) 21 → 8: +Q3, -Q4, +Q1. Traverse BMQ-Index to the left

[Figure: the linked list for Q1 to Q5 (an ∞ node and the borders 0, 5, 10, 15, 20, 25, 30, 35, 45), with the stream table entry for IBM pointing at the position of the previous data value 21 and the current data values 23, 37 and 8 marked.]
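The three cases can be reproduced with a small incremental search sketch. Everything named here is hypothetical: a sorted array with an integer position replaces the linked list and node pointer, the query ranges are one reading of the example figure, and a value is assumed to be inside a range when lo <= value < hi so that a border is crossed exactly when it lies between the previous and the current value. A query whose two borders are both crossed in one step would show up as both +Q and -Q; the sketch does not cancel such pairs.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

def flip(tag: str) -> str:
    """Crossing a border leftwards inverts its meaning: '+Q' <-> '-Q'."""
    return ("-" if tag[0] == "+" else "+") + tag[1:]

class BorderIndex1D:
    def __init__(self, queries: Dict[str, Tuple[float, float]]):
        tagged: Dict[float, List[str]] = {}
        for name, (lo, hi) in sorted(queries.items()):
            tagged.setdefault(lo, []).append("+" + name)
            tagged.setdefault(hi, []).append("-" + name)
        self.borders = sorted(tagged)
        self.labels = [tagged[b] for b in self.borders]
        self.pointer: Dict[str, int] = {}  # stream table: stream id -> number of borders <= last value

    def update(self, stream_id: str, value: float) -> List[str]:
        if stream_id not in self.pointer:
            # First tuple: locate the position once; nothing is reported (assumption).
            self.pointer[stream_id] = bisect_right(self.borders, value)
            return []
        pos, events = self.pointer[stream_id], []
        while pos < len(self.borders) and self.borders[pos] <= value:  # traverse to the right
            events += self.labels[pos]
            pos += 1
        while pos > 0 and self.borders[pos - 1] > value:               # traverse to the left
            pos -= 1
            events += [flip(tag) for tag in self.labels[pos]]
        self.pointer[stream_id] = pos
        return events

queries = {"Q1": (0, 10), "Q2": (0, 25), "Q3": (5, 20), "Q4": (15, 30), "Q5": (35, 45)}

def demo(cur: float) -> List[str]:
    index = BorderIndex1D(queries)
    index.update("IBM", 21)           # previous data value
    return index.update("IBM", cur)   # current data value

print(demo(23))  # []
print(demo(37))  # ['-Q2', '-Q4', '+Q5']
print(demo(8))   # ['+Q3', '-Q4', '+Q1']
```

Only the borders lying between the previous and the current value are visited, so the work per tuple depends on how far the value moved, not on the total number of queries.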

Page 12

Multi-dimensional BMQ-Index

Stream Table

StreamID   V            PX      PY
s1         (vX1, vY1)   RS-X2   RS-Y2
s2         (vX2, vY2)   RS-X3   RS-Y5
s3         (vX3, vY3)   RS-X5   RS-Y4

Query Table

QueryID   Range
Q1        (bX1, bX3, bY1, bY4)
Q2        (bX2, bX6, bY2, bY6)
Q3        (bX4, bX5, bY3, bY5)

[Figure: three rectangular queries Q1, Q2 and Q3 in the X-Y plane, together with the stream values v(s1), v(s2) and v1(s3), v2(s3), v3(s3). The borders bX0 to bX7 cut the X axis into the RS-X List (RS-X1 to RS-X7) and the borders bY0 to bY7 cut the Y axis into the RS-Y List (RS-Y1 to RS-Y7). Each RS node carries a +DQSet and a -DQSet, whose contents are {Q1}, {Q2}, {Q3} or {}.]

Page 13

Search Operation in Multi-dimension

Overall flow for a current data value (xc, yc)
– (1) Per-dimension search: RS-X list.search() on xc yields ±XQSet; RS-Y list.search() on yc yields ±YQSet
– (2) Validation through cross-check: ±XQSet is cross-checked with the Y-dimension, giving ±XBMQSet; ±YQSet is cross-checked with the X-dimension, giving ±YBMQSet
– (3) Union of per-dimension results: ±XBMQSet and ±YBMQSet are combined into ±QSet

Performance Analysis (d-dimension)
– Search performance: ((d – 1)d) × (one-dimensional search time)
– Storage cost: d × (one-dimensional storage cost)

Page 14

Experiments

Workload generation
– Stock trading scenario (one-dimensional case)

Data stream generation (Korea stock market[9])
– Fluctuation level (FL): 0.01% ~ 0.1%
– 2000 stream sources, 1000 tuples in each stream

Query generation
– Lower bound: randomly chosen (1 ~ 10^6)
– Width of queries (W): 1 ~ 10 times larger than FL
– Number of queries (N): 10,000 ~ 100,000

Comparisons
– An approach based on state-of-the-art RMQ indexes (CEI[CIKM’05] and IS-list[Information Systems’96])

Performance metrics
– Average search time per data tuple (millisecond)
– Index storage size (Mbyte)

Page 15

Search performance

[Figure: Effects of the number of queries (W=0.1%, FL=0.01%). X-axis: number of queries (0 to 100,000); y-axis: average search time (ms), 0 to 100. Series: BMQ-Index, CEI-based, IS-list-based.]

[Figure: Effects of the widths of queries (N=100000, FL=0.01%). X-axis: width of queries (0 to 0.1); y-axis: average search time (ms), 0 to 100. Series: BMQ-Index, CEI-based, IS-list-based.]

Page 16

Storage cost

Number of times each query is stored
– BMQ-Index: twice
– IS-list: log (# of queries) times
– CEI: all grids covered by a query range

[Figure: Effects of the number of queries (W=0.1%). X-axis: number of queries (0 to 100,000); y-axis: index storage size (MB), 0 to 80. Series: BMQ-Index, CEI-based, IS-list-based.]

[Figure: Effects of the widths of queries (N=100000). X-axis: width of queries (0 to 0.1); y-axis: index storage size (MB), 0 to 80. Series: BMQ-Index, CEI-based, IS-list-based.]

Page 17

Related Work

Semantics
– CQL (Continuous Query Language developed by STREAM project)
– General concept to transform a Relation to a Stream
– BMQ is a specific class of continuous range query

Shared and Incremental Processing

Area                       Previous research                      Difference
Data stream processing     Tree-based (1-D: [2][4][5][14])        O(log N) search performance; O(N log N) storage cost
Data stream processing     Grid-based (1-D: [17], 2-D: [6][13])   Better search performance than tree-based; requires more storage
Spatio-temporal database   SINA[11] (shared and incremental)      Disk-based algorithm; not a purely incremental access method
Spatio-temporal database   GPAC[12] (incremental)                 Not for shared processing

Generally not feasible for BMQs !!

Page 18

Conclusion

Summary
– Characterize a new type of continuous range query: Border Monitoring Query (BMQ), useful and practical in many emerging applications
– One- and multi-dimensional BMQ-Index: evaluates a large number of BMQs in a shared and incremental manner, thereby achieving excellent search performance and low storage cost

Page 19

Thank you

Questions?

Page 20

Backup slide

Page 21

Performance Analysis

1-dimensional BMQ-Index
– Search performance: 2 Nq FL
– Storage cost: 2Nq + Nd

d-dimensional BMQ-Index
– Search performance: ((d – 1)d) × 2 Nq FL, only 2 times the 1-dimensional cost when d=2
– Storage cost: d(2Nq + Nd) + Nq

Nq = Number of queries
Nd = Number of data streams
FL = Fluctuation level
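As a rough worked example of the storage formulas, the largest experimental setting (100,000 queries, 2000 stream sources) can be plugged in. Counting storage in index entries is an assumption made here for illustration, and the 2-dimensional line simply applies the formula to the same counts even though the reported experiments are one-dimensional.

```latex
\begin{align*}
\text{1-D storage} &= 2N_q + N_d = 2(100{,}000) + 2{,}000 = 202{,}000 \\
\text{2-D storage} &= d\,(2N_q + N_d) + N_q = 2(202{,}000) + 100{,}000 = 504{,}000 \\
\text{search factor} &= (d-1)\,d = 2 \ \text{for } d = 2, \qquad 6 \ \text{for } d = 3
\end{align*}
```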

Page 22

Cross checking

Algorithm
– For +XQSet: check whether vt is located between the Y predicates
– For –XQSet: check whether vt-1 is located between the Y predicates
– ±YQSet is checked with the X-dimension in a similar manner
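The cross-check rule combines with the per-dimension search and the final union roughly as in the following two-dimensional sketch. It is an illustration under several assumptions: all function names are hypothetical, `per_dimension` scans every query as a stand-in for the RS-list search, a value counts as inside a range when lo <= value < hi, and the numeric rectangles take border bXi (or bYi) to be the number i.

```python
from typing import Dict, Set, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, x_hi, y_lo, y_hi)
Point = Tuple[float, float]               # (x, y)

def inside(lo: float, hi: float, v: float) -> bool:
    return lo <= v < hi  # assumed half-open convention

def per_dimension(queries: Dict[str, Rect], dim: int, prev: Point, cur: Point):
    """Stand-in for the RS-list search: queries whose range in one dimension was entered or left."""
    plus, minus = set(), set()
    for q, rect in queries.items():
        lo, hi = rect[2 * dim], rect[2 * dim + 1]
        was, now = inside(lo, hi, prev[dim]), inside(lo, hi, cur[dim])
        if now and not was:
            plus.add(q)
        if was and not now:
            minus.add(q)
    return plus, minus

def cross_check(queries, plus: Set[str], minus: Set[str], other: int, prev: Point, cur: Point):
    """Validate candidates against the other dimension: vt for '+', vt-1 for '-'."""
    def holds(q: str, point: Point) -> bool:
        return inside(queries[q][2 * other], queries[q][2 * other + 1], point[other])
    return {q for q in plus if holds(q, cur)}, {q for q in minus if holds(q, prev)}

def search_2d(queries: Dict[str, Rect], prev: Point, cur: Point):
    x_plus, x_minus = per_dimension(queries, 0, prev, cur)          # ±XQSet
    y_plus, y_minus = per_dimension(queries, 1, prev, cur)          # ±YQSet
    xb_plus, xb_minus = cross_check(queries, x_plus, x_minus, 1, prev, cur)  # ±XBMQSet
    yb_plus, yb_minus = cross_check(queries, y_plus, y_minus, 0, prev, cur)  # ±YBMQSet
    return xb_plus | yb_plus, xb_minus | yb_minus                   # union: ±QSet

queries = {"Q1": (1, 3, 1, 4), "Q2": (2, 6, 2, 6), "Q3": (4, 5, 3, 5)}
print(search_2d(queries, prev=(0.5, 5.0), cur=(2.5, 5.0)))  # ({'Q2'}, set())
```

In the example the X search proposes both Q1 and Q2 as entered, but the current y value 5.0 lies outside the Y range of Q1, so the cross-check keeps only Q2.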