
49221052 施賀傑

69521041 何承恩

TelegraphCQ

Outline
  Introduction
  Data Movement Implies Adaptivity
  Telegraph - an Ancestor of TelegraphCQ
  Adaptive Building Blocks
  Initial CQ Approaches
  TelegraphCQ
  Conclusion and Future Work

Introduction
TelegraphCQ is an extension to the Telegraph project
Handles large numbers of continuous queries over high-volume, highly variable data streams
Traditional data processing environments are not suitable for data in motion:
  Large scale
  Unpredictability of the environment
  Need for close interaction with users

Data Movement Implies Adaptivity
Traditional databases are inappropriate for dataflow processing
Streaming data
  Data is pushed to the system instead of being pulled by it
  Has to be processed on the fly
Continuous Queries (CQ)
  Queries are continuously active
  Data initiates access to queries
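The idea that data initiates access to queries can be sketched as follows. This is a minimal illustration and not TelegraphCQ code: the class `ContinuousQueryEngine`, its methods, and the temperature readings are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List


class ContinuousQueryEngine:
    """Hypothetical engine: queries stay registered, arriving tuples are pushed to them."""

    def __init__(self) -> None:
        self.queries: Dict[str, Callable[[dict], bool]] = {}
        self.results: Dict[str, List[dict]] = {}

    def register(self, name: str, predicate: Callable[[dict], bool]) -> None:
        # A continuous query is stored once and remains active afterwards.
        self.queries[name] = predicate
        self.results[name] = []

    def push(self, tup: dict) -> List[str]:
        # The arriving tuple drives the work: it is checked against every standing query.
        matched = []
        for name, predicate in self.queries.items():
            if predicate(tup):
                self.results[name].append(tup)
                matched.append(name)
        return matched


engine = ContinuousQueryEngine()
engine.register("hot", lambda t: t["temp"] > 30)
engine.register("cold", lambda t: t["temp"] < 5)
for reading in [{"temp": 35}, {"temp": 20}, {"temp": 2}]:
    engine.push(reading)
```

In a traditional database the roles are reversed: the data is stored and each query is run once against it.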

Data Movement Implies Adaptivity
Shared processing
  Avoid blocking or interrupting the dataflow
  Processing each query individually can be slow and wasteful of resources
  Queries should have some commonalities
Other Sources of Unpredictability
  Deeply networked environment
  Users may need to adjust the query on the fly based on previous results

Telegraph - an Ancestor of TelegraphCQ
Designed to provide adaptability to individual dataflow graphs
Two new prototypes extend Telegraph to support shared processing over streams:
  CACQ
  PSoup

Adaptive Building Blocks
Telegraph consists of a set of modules
Module Types
  Ingress and Caching
  Query Processing
  Adaptive Routing

Adaptive Building Blocks

Adaptive Building Blocks
Ingress and Caching
  Interface with external data sources
  HTML/XML screen scraper (TeSS)
  Proxy for fetching data from peer-to-peer networks (TeleNap)
Query Processing
  Routing tuples through query modules on a tuple-by-tuple basis
  A special type of module known as a State Module (SteM)

Adaptive Building Blocks
Adaptive Routing
  Construct a query plan that contains adaptive routing modules
  Be able to re-optimize the plan while a query is running
  Eddy: routes data to other query operators
  Juggle: performs online reordering
  FLuX: routes tuples to support parallelism with load balancing and fault tolerance

Eddy
Continuously routes tuples among a set of other modules according to a routing policy

Eddy
Routing policy
  Naive Eddy:
    Handles only operators with different costs but equal selectivity
    Delivers tuples to the two selections equally
  Fast Eddy:
    Improves the Naive Eddy with lottery scheduling
    Eddy sends a tuple to an operator → the operator is credited a “ticket”
    Operator returns a tuple to the eddy → a “ticket” is debited
    Benefit: nearly optimal performance with less effort
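The ticket accounting behind lottery scheduling can be sketched as below. This is a simplified illustration under stated assumptions, not the Telegraph implementation: an operator gains a ticket when the eddy hands it a tuple and loses one when it hands the tuple back, so operators that drop many tuples accumulate tickets and win the lottery more often. The class names, the `+ 1` weight that keeps every operator eligible, and the two example predicates are all hypothetical.

```python
import random


class Operator:
    """A selection operator with a running ticket count (hypothetical helper)."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self.tickets = 0


class LotteryEddy:
    def __init__(self, operators, seed=0):
        self.operators = operators
        self.rng = random.Random(seed)

    def _pick(self, eligible):
        # Assumption: weight is tickets + 1 so an operator with no tickets can still win.
        weights = [op.tickets + 1 for op in eligible]
        return self.rng.choices(eligible, weights=weights, k=1)[0]

    def route(self, tup):
        """Return True if the tuple passes every operator, False if it is dropped."""
        done = set()
        while len(done) < len(self.operators):
            eligible = [op for op in self.operators if op.name not in done]
            op = self._pick(eligible)
            op.tickets += 1              # credited when the tuple is delivered
            if not op.predicate(tup):
                return False             # tuple dropped, the operator keeps its ticket
            op.tickets -= 1              # debited when the tuple comes back
            done.add(op.name)
        return True


# s1 passes 90% of the tuples, s2 passes 10%.
s1 = Operator("s1", lambda x: x % 10 != 0)
s2 = Operator("s2", lambda x: x % 10 == 1)
eddy = LotteryEddy([s1, s2])
passed = [x for x in range(1000) if eddy.route(x)]
```

After the run, the more selective operator `s2` holds most of the tickets, so new tuples tend to visit it first and are dropped early.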

Eddy
Our query

Eddy
Suppose s1 and s2 have the same selectivity
Set the cost of s2 to 5 delay units

Eddy
Suppose s1 and s2 have the same cost
Fix the selectivity of s2 at 50%
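Why the order of the two selections matters in both scenarios can be checked with a small calculation. For two selections applied in sequence, the expected work per input tuple is the cost of the first operator plus the cost of the second weighted by the fraction of tuples the first lets through. The concrete numbers below are assumptions made for illustration only: the slides give just the cost of 5 delay units and the 50% selectivity for s2; the cost of 1 for s1 and the other selectivities are hypothetical.

```python
def expected_cost(first, second):
    """Expected work per input tuple when `first` runs before `second`.

    Each argument is a (cost, selectivity) pair; selectivity is the
    fraction of tuples the operator lets through.
    """
    cost1, selectivity1 = first
    cost2, _ = second
    return cost1 + selectivity1 * cost2


# Scenario 1: equal selectivity (assumed 50%), s2 costs 5 units, s1 assumed to cost 1.
s1, s2 = (1.0, 0.5), (5.0, 0.5)
cheap_first = expected_cost(s1, s2)      # 1 + 0.5 * 5
costly_first = expected_cost(s2, s1)     # 5 + 0.5 * 1

# Scenario 2: equal cost (assumed 1 unit), s2 passes 50%, s1 assumed to pass 80%.
s1b, s2b = (1.0, 0.8), (1.0, 0.5)
less_selective_first = expected_cost(s1b, s2b)   # 1 + 0.8 * 1
more_selective_first = expected_cost(s2b, s1b)   # 1 + 0.5 * 1
```

Under these assumed numbers, running the cheaper operator first wins in the first scenario, and running the operator that drops more tuples first wins in the second. An eddy has to discover these orderings while the query runs.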

SteMs
A SteM is a temporary repository of tuples
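One way to picture a SteM is as a keyed store that accepts new tuples (build) and answers lookups (probe). The sketch below assumes that interface and uses two such stores to join two streams symmetrically. The class `SteM`, the function `symmetric_join`, and the stream format are hypothetical names for this example, not the Telegraph API.

```python
from collections import defaultdict


class SteM:
    """Hypothetical state module: holds tuples from one source, indexed on a key."""

    def __init__(self, key):
        self.key = key
        self.index = defaultdict(list)

    def build(self, tup):
        # Store the tuple so that later arrivals from the other source can find it.
        self.index[tup[self.key]].append(tup)

    def probe(self, value):
        # Return the stored tuples whose key equals the probing value.
        return list(self.index.get(value, []))


def symmetric_join(stream, key):
    """Join sources 'R' and 'S'; `stream` yields (source, tuple) pairs."""
    stems = {"R": SteM(key), "S": SteM(key)}
    out = []
    for source, tup in stream:
        stems[source].build(tup)
        other = "S" if source == "R" else "R"
        for match in stems[other].probe(tup[key]):
            r, s = (tup, match) if source == "R" else (match, tup)
            out.append((r, s))
    return out


stream = [
    ("R", {"k": 1, "a": "r1"}),
    ("S", {"k": 1, "b": "s1"}),
    ("S", {"k": 2, "b": "s2"}),
    ("R", {"k": 2, "a": "r2"}),
]
joined = symmetric_join(stream, "k")
```

Because each tuple is first stored and then probed against the other side, matches are produced regardless of which source arrives first.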

Fjords
An inter-module communications API
Allows query plans to use a mixture of push and pull connections between modules
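A mixture of push and pull connections can be sketched with two queue types that share one `get()` call. This is an illustrative assumption about how such an API might look, not the Fjords interface itself: `PushQueue`, `PullQueue`, and `merge_step` are hypothetical names. A push queue is filled by its producer and may be empty when polled; a pull queue asks its source for the next item on demand.

```python
from collections import deque


class PushQueue:
    """Producer calls put(); the consumer polls without blocking."""

    def __init__(self):
        self._items = deque()

    def put(self, item):
        self._items.append(item)

    def get(self):
        # Non-blocking: None means "nothing has arrived yet".
        return self._items.popleft() if self._items else None


class PullQueue:
    """The consumer's get() request drives the source to produce the next item."""

    def __init__(self, source):
        self._source = iter(source)

    def get(self):
        # None means the source is exhausted.
        return next(self._source, None)


def merge_step(inputs):
    """One scheduling step: poll every input once and collect what was available."""
    collected = []
    for queue in inputs:
        item = queue.get()
        if item is not None:
            collected.append(item)
    return collected


sensor = PushQueue()             # e.g. a streaming source
table = PullQueue([1, 2])        # e.g. a stored relation scanned on demand
first = merge_step([sensor, table])
sensor.put("a")
second = merge_step([sensor, table])
third = merge_step([sensor, table])
```

The consuming module never blocks on the empty push queue, so a slow stream does not stall the rest of the plan.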

Initial CQ Approaches
  CACQ
  PSoup
Limitations of CACQ and PSoup
  Restricted their processing to data that could fit in memory
  Did not investigate scheduling and resource management issues for queries with little or no overlap
  Did not explicitly deal with the notion of QoS for adapting to resource limitations
  Did not explore opportunities for varying the degree of adaptivity to trade off flexibility against overhead

CACQ
First continuous query engine to exploit the adaptive query processing framework of Telegraph
Modifies Eddies to execute multiple queries simultaneously
Uses grouped filters to optimize selections in the shared execution of the individual queries
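The benefit of a grouped filter can be sketched for one simple case: many queries that each test the same attribute with a predicate of the form `value > constant`. Instead of evaluating every predicate separately, the constants are kept sorted so that one binary search finds all matching queries. The class below is a hypothetical illustration limited to that predicate form and is not CACQ's data structure.

```python
import bisect


class GreaterThanGroupedFilter:
    """Indexes predicates `attribute > constant` from many queries on one attribute."""

    def __init__(self):
        self._constants = []    # kept sorted in ascending order
        self._query_ids = []    # query id stored at the same position as its constant

    def add_query(self, query_id, constant):
        position = bisect.bisect_left(self._constants, constant)
        self._constants.insert(position, constant)
        self._query_ids.insert(position, query_id)

    def matching_queries(self, value):
        # Every constant strictly below `value` belongs to a satisfied predicate.
        position = bisect.bisect_left(self._constants, value)
        return set(self._query_ids[:position])


grouped = GreaterThanGroupedFilter()
grouped.add_query("q1", 10)
grouped.add_query("q2", 20)
grouped.add_query("q3", 30)
matches = grouped.matching_queries(25)
```

A single lookup per tuple replaces one predicate evaluation per query, which is where the saving comes from when many queries share an attribute.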

PSoup
Extends the mechanisms developed in CACQ in two main ways:
  Allows queries to access historical data
  Supports disconnected operation
New queries can be applied to old data
New data can be applied to old queries
Accomplished by creating a query SteM
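The symmetry between new queries over old data and new data over old queries can be sketched as follows. This is a minimal illustration, not PSoup's implementation: plain Python containers stand in for the data SteM and the query SteM, results are kept per query so that a client can fetch them after reconnecting, and the class name, method names, and the two example queries are hypothetical.

```python
class PSoupSketch:
    """Hypothetical model: stored data and stored queries are treated symmetrically."""

    def __init__(self):
        self.data_stem = []      # tuples seen so far
        self.query_stem = {}     # query id -> predicate
        self.results = {}        # query id -> matching tuples, kept for later retrieval

    def add_query(self, query_id, predicate):
        # A new query is applied to old data.
        self.query_stem[query_id] = predicate
        self.results[query_id] = [t for t in self.data_stem if predicate(t)]

    def add_data(self, tup):
        # New data is applied to old queries.
        self.data_stem.append(tup)
        for query_id, predicate in self.query_stem.items():
            if predicate(tup):
                self.results[query_id].append(tup)

    def fetch(self, query_id):
        # A client that was disconnected can collect its current answer at any time.
        return list(self.results[query_id])


soup = PSoupSketch()
soup.add_data({"a": 1, "b": 8})
soup.add_query("q1", lambda t: t["a"] < 5)    # hypothetical query, not from the slides
soup.add_data({"a": 3, "b": 6})
soup.add_query("q2", lambda t: t["b"] > 7)    # hypothetical query, not from the slides
```

Here `q1` sees both the tuple that arrived before it and the one that arrived after it, while `q2` is answered entirely from data that was already stored.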

PSoup
For example: add a new query

PSoup
Exercise: add a new data tuple using PSoup, working through the example step by step
R.a=3 R.b=6
