Patel-Paper Review

Linear Road: A Stream Data Management Benchmark. A. Arasu, M. Cherniack, E. Galvez, D. Maier, A. Maskey, E. Ryvkina, M. Stonebraker, R. Tibbetts. VLDB Conference, Toronto, Canada, 2004.

Your name: Nabilahmed Patel
Email: [email protected]



References

A. Arasu, M. Cherniack, E. Galvez, D. Maier, A. Maskey, E. Ryvkina, M. Stonebraker, R. Tibbetts. Linear Road: A Stream Data Management Benchmark. VLDB Conference, Toronto, Canada, 2004. http://www.cs.brandeis.edu/~linearroad/linear-road.pdf

Sharma Chakravarthy 2


05/03/2023 © your name 3

Acknowledgments

I would like to thank Dr. Sharma and Mr. Nandan for their help.


Talk Outline

Summary of the paper: challenges in designing a benchmark for stream data, benchmark requirements, and implementation and experiments

Strong and weak points

Why you think it got accepted

How to extend/improve the work

Conclusions

Others


Summary of the paper

Due to the unbounded and continuous nature of stream data, the input data introduces some unique challenges:
1. Semantically valid input
2. Continuous query performance metrics
3. Many correct results
4. No standard query language

Linear Road addresses these challenges in specific ways:
1. The input data is generated by a traffic simulator, MITSIM.
2. Performance is measured by response time and supported query load, reported as the L-rating: the number of expressways a system can handle while meeting the benchmark's response-time and accuracy requirements.
3. Validation of each query considers all possible valid answers.
4. Queries are specified formally in "predicate calculus".
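The L-rating mentioned above can be illustrated with a small sketch. It assumes a hypothetical `Run` record per benchmark run (the number of expressways L, the latency of every response, and the validator's verdict) and a single uniform 5-second deadline; the benchmark's actual limits differ by query type, so the threshold here is an assumption, not the specification.

```python
from dataclasses import dataclass


@dataclass
class Run:
    expressways: float        # L: number of simulated expressways in this run
    latencies_s: list[float]  # response time of every output tuple, in seconds
    all_correct: bool         # True if the validator accepted every answer


def l_rating(runs: list[Run], deadline_s: float = 5.0) -> float:
    """Largest L whose run was fully correct and met the deadline (0.0 if none did).

    The uniform 5-second deadline is an assumption; real limits vary by query type.
    """
    passing = [
        run.expressways
        for run in runs
        if run.all_correct and all(t <= deadline_s for t in run.latencies_s)
    ]
    return max(passing, default=0.0)
```

Taking the largest passing L is a design choice for brevity; a stricter rule would also require every smaller L to pass.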


Summary of the paper

Understanding the data: the input consists of four types of tuples.

1) Position reports: (Type = 0, Time, VID, Spd, XWay, Lane, Dir, Seg, Pos)

2) Account Balance: A request for the vehicle’s current account balance.

(Type = 2, Time, VID, QID)

3) Daily Expenditure: A request for the vehicle’s total tolls on a specified

expressway, on a specified day in the previous 10 weeks.

(Type = 3, Time, VID, XWay, QID, Day)

4) Travel Time: A request for an estimated toll and travel time for a journey on

a given expressway on a given day of the week, at a given time.

(Type = 4, Time, VID, XWay, QID, Sinit, Send, DOW, TOD)


Summary of the paper

To avoid the complication of unpredictable event delivery order, the four types of input tuples are multiplexed into a single stream of tuples consisting of the union of all fields.

In order, these are:

(Type, Time, VID, Spd, XWay, Lane, Dir, Seg, Pos, QID, Sinit, Send, DOW, TOD, Day).

Linear Road implementations can use the Type field to determine which

fields are relevant for a given tuple.
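Demultiplexing by the Type field can be sketched as follows. The sketch assumes each input tuple arrives as one comma-separated line of integers in the field order above, with placeholder values (for example -1) in fields that do not apply; `demultiplex`, `UNION_FIELDS`, and `RELEVANT` are hypothetical names, and the per-type field lists are copied from the four tuple schemas on the previous slide.

```python
# Field order of the multiplexed input tuple (from the slide).
UNION_FIELDS = ("Type", "Time", "VID", "Spd", "XWay", "Lane", "Dir", "Seg",
                "Pos", "QID", "Sinit", "Send", "DOW", "TOD", "Day")

# Fields that are meaningful for each Type value (from the four tuple schemas).
RELEVANT = {
    0: ("Type", "Time", "VID", "Spd", "XWay", "Lane", "Dir", "Seg", "Pos"),    # position report
    2: ("Type", "Time", "VID", "QID"),                                         # account balance
    3: ("Type", "Time", "VID", "XWay", "QID", "Day"),                          # daily expenditure
    4: ("Type", "Time", "VID", "XWay", "QID", "Sinit", "Send", "DOW", "TOD"),  # travel time
}


def demultiplex(line: str) -> dict[str, int]:
    """Parse one multiplexed tuple and keep only the fields relevant to its Type."""
    values = [int(v) for v in line.split(",")]
    if len(values) != len(UNION_FIELDS):
        raise ValueError(f"expected {len(UNION_FIELDS)} fields, got {len(values)}")
    record = dict(zip(UNION_FIELDS, values))
    # An unknown Type value raises KeyError here.
    return {name: record[name] for name in RELEVANT[record["Type"]]}
```

For example, `demultiplex("2,120,7,-1,-1,-1,-1,-1,-1,42,-1,-1,-1,-1,-1")` returns `{"Type": 2, "Time": 120, "VID": 7, "QID": 42}`.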


Summary of the paper

Continuous Queries:
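The query definitions on these slides did not survive extraction. As a hedged illustration of the kind of work a continuous query performs, the sketch below aggregates position reports into per-segment, per-minute statistics and derives a segment toll from them. The toll rule (no toll near an accident, or unless more than 50 vehicles were in the segment and their average speed was below 40 MPH; otherwise 2 × (count − 50)²) is modelled on the paper's tolling rule and should be checked against it. The function names are hypothetical, and averaging speed over a single minute is a simplification; the paper uses a longer window.

```python
from collections import defaultdict

# Hypothetical constants modelled on the paper's tolling rule; verify before relying on them.
BASE_TOLL = 2
CONGESTED_COUNT = 50  # vehicles seen in a segment during a minute
CONGESTED_MPH = 40    # average speed below which a segment counts as congested


def minute_stats(reports):
    """reports: iterable of (time_s, vid, spd, xway, dir_, seg) position-report fields.

    Returns {(xway, dir_, seg, minute): (distinct vehicle count, average speed)}.
    """
    vehicles = defaultdict(set)
    speeds = defaultdict(list)
    for time_s, vid, spd, xway, dir_, seg in reports:
        key = (xway, dir_, seg, time_s // 60)
        vehicles[key].add(vid)
        speeds[key].append(spd)
    return {key: (len(vehicles[key]), sum(speeds[key]) / len(speeds[key]))
            for key in vehicles}


def segment_toll(num_vehicles: int, avg_mph: float, accident_nearby: bool) -> int:
    """Toll for entering a segment, given the previous minute's statistics."""
    if accident_nearby or num_vehicles <= CONGESTED_COUNT or avg_mph >= CONGESTED_MPH:
        return 0
    return BASE_TOLL * (num_vehicles - CONGESTED_COUNT) ** 2
```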



Summary of the paper

Historical Queries:
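The Account Balance and Daily Expenditure requests described earlier can be sketched as lookups over recorded tolls. `TollLedger` is a hypothetical in-memory store keyed by (VID, XWay, Day); it treats the account balance simply as the sum of every toll charged to the vehicle, which is a simplification of the benchmark's definition rather than its exact semantics.

```python
from collections import defaultdict


class TollLedger:
    """Hypothetical in-memory store of assessed tolls, keyed by (vid, xway, day)."""

    def __init__(self) -> None:
        self._totals = defaultdict(int)  # (vid, xway, day) -> total tolls charged

    def charge(self, vid: int, xway: int, day: int, amount: int) -> None:
        """Record a toll assessed to a vehicle on an expressway on a given day."""
        self._totals[(vid, xway, day)] += amount

    def account_balance(self, vid: int) -> int:
        """Type 2 query (simplified): total of all tolls charged to the vehicle."""
        return sum(total for (v, _xway, _day), total in self._totals.items() if v == vid)

    def daily_expenditure(self, vid: int, xway: int, day: int) -> int:
        """Type 3 query: tolls charged to the vehicle on one expressway on one day."""
        return self._totals.get((vid, xway, day), 0)
```

Using `.get` in `daily_expenditure` avoids inserting empty entries into the `defaultdict` when a query asks about a day with no tolls.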



Summary of the paper

Implementation: The historical data generator is run to generate flat files containing 10 weeks' worth of historical data. The traffic simulator is run to generate L flat files, each of which consists of 3 hours of traffic data and historical query requests from vehicles reporting from a single expressway during rush hour. The data driver is then invoked to deliver this data in a manner that simulates its arrival in real time.
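The data driver's job, releasing pre-generated tuples at the moment they would have arrived, can be sketched as a replay loop. It assumes the tuples are ordered by their Time field, which sits at index 1 (seconds since the simulation started) as in the multiplexed schema; `replay`, `emit`, and the `speedup` factor are hypothetical, and the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Iterable


def replay(tuples: Iterable[tuple], emit: Callable[[tuple], None],
           speedup: float = 1.0,
           clock: Callable[[], float] = time.monotonic,
           sleep: Callable[[float], None] = time.sleep) -> None:
    """Deliver time-ordered tuples as if they were arriving in real time.

    Each tuple's Time field (index 1, seconds since the simulation started)
    says when it should be emitted; speedup > 1 compresses the schedule.
    """
    start = clock()
    for tup in tuples:
        due = tup[1] / speedup          # target emission time, relative to start
        delay = due - (clock() - start)
        if delay > 0:                   # ahead of schedule: wait; behind: emit at once
            sleep(delay)
        emit(tup)
```

Computing each delay against the start time, rather than sleeping a fixed gap between tuples, keeps small scheduling errors from accumulating over a 3-hour run.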

The system running the benchmark is configured to generate a flat file containing all output tuples in response to the queries defined in the benchmark.

The validation tool is used to check whether the response times and accuracy of the generated output meet the requirements of the benchmark.
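The validation step can be sketched as two checks per output tuple: was it produced within its deadline, and is the answer among the set of acceptable answers (reflecting the "many correct results" challenge). The tuple layout, the `accepted` map, and the uniform 5-second deadline are assumptions for illustration, not the format of the benchmark's actual validation tool.

```python
def validate(outputs, accepted, deadline_s=5.0):
    """Check output tuples for lateness and correctness.

    outputs:  iterable of (qid, answer, request_time_s, emit_time_s)
    accepted: dict mapping qid -> set of answers considered valid
    Returns (late_qids, wrong_qids).
    """
    late, wrong = [], []
    for qid, answer, requested, emitted in outputs:
        if emitted - requested > deadline_s:
            late.append(qid)
        if answer not in accepted.get(qid, set()):
            wrong.append(qid)
    return late, wrong
```

A fuller validator would also flag queries that received no answer at all.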


Strong points

The problem is defined and explained in detail with a real-world example.

The simulation and presentation of the stream data are described extensively, which makes the queries easy to understand.

The queries are defined in "predicate calculus" rather than in a particular query language, so anyone can understand them and implement them in their own system's query language.


Weak points

Some details are repeated in several different sections.

The "Travel Time Estimation query" is not supported by the benchmark implementation, as it is too complex to express.


Why do you think it got accepted?

The Linear Road Benchmark was the first attempt to compare the performance characteristics of SDMSs.

The paper creatively maps a real-world example (variable tolling on expressways) onto the design of an SDMS benchmark, which makes it easy to understand.

It offers creative thinking on how to meet the challenges of large-scale streaming data applications.


Possible Future Work

The Travel Time Estimation query is not supported by the current implementation of the benchmark, so the implementation could be extended to answer this kind of query.

The paper compares an SDMS with an RDBMS, but by following the same procedure it is possible to compare two different SDMSs.


Conclusions

The paper describes how the design of the Linear Road Benchmark addresses the challenges introduced by the nature of stream data.

It also covers how the stream data is simulated and generated using a traffic simulator (MITSIM).

Two continuous and two historical queries are defined in predicate calculus.

Implementations of the Linear Road Benchmark in Aurora (an SDMS) and System X (an RDBMS) are covered.

Experimental results show that an SDMS can outperform a relational database system in processing stream data by at least a factor of 5.


Others

Readability/presentation: The paper is well explained and extensively described with experimental results. It is easy to read and well presented, with detailed explanations.

Technical depth: The queries are specified in predicate calculus, and the stream data is well defined.

Novelty: STREAM (an SDMS) has also implemented the Linear Road Benchmark. The benchmark makes it possible to compare the performance characteristics of SDMSs relative to each other and to RDBMSs.

Overall comment: Experiments show that an SDMS outperforms an RDBMS by at least a factor of 5.


Thank you!