TelegraphCQ
49221052 施賀傑
69521041 何承恩
Outline
  Introduction
  Data Movement Implies Adaptivity
  Telegraph - an Ancestor of TelegraphCQ
  Adaptive Building Blocks
  Initial CQ Approaches
  TelegraphCQ
  Conclusion and Future Work
Introduction
  TelegraphCQ is an extension to the Telegraph project
  Handles large streams of continuous queries over high-volume, highly variable data streams
  Traditional data processing environments are not suitable for data in motion
    Large scale
    Unpredictability of the environment
    Need for close interaction with users
Data Movement Implies Adaptivity
  Traditional databases are inappropriate for dataflow processing
  Streaming data
    Pushed, instead of pulled
    Has to be processed on the fly
  Continuous Queries (CQ)
    Queries are continuously active
    Data initiates access to queries
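The idea that "data initiates access to queries" can be illustrated with a small sketch. It is not TelegraphCQ code: the class `ContinuousQueryRegistry` and its methods are hypothetical names chosen for illustration. Queries are registered once and stay active; each arriving tuple is pushed in and probes the standing queries.

```python
from typing import Callable, Dict, List


class ContinuousQueryRegistry:
    """Hypothetical registry of standing queries (illustration only)."""

    def __init__(self) -> None:
        self.queries: Dict[str, Callable[[dict], bool]] = {}
        self.results: Dict[str, List[dict]] = {}

    def register(self, name: str, predicate: Callable[[dict], bool]) -> None:
        # A continuous query stays active until it is removed.
        self.queries[name] = predicate
        self.results[name] = []

    def push(self, tup: dict) -> List[str]:
        # The arriving tuple drives the work: it is tested against
        # every standing query instead of a query scanning stored data.
        matched = []
        for name, predicate in self.queries.items():
            if predicate(tup):
                self.results[name].append(tup)
                matched.append(name)
        return matched
```

A caller would register predicates such as `lambda t: t["temp"] > 30` and then call `push` for every tuple that arrives from the stream.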
Data Movement Implies Adaptivity
  Shared processing
    Avoid blocking or interrupting the dataflow
    Processing each query individually can be slow and wasteful of resources
    Queries should have some commonalities
  Other Sources of Unpredictability
    Deeply networked environment
    Users may need to adjust the query on the fly based on previous results
Telegraph - an Ancestor of TelegraphCQ
  Designed to provide adaptability to individual dataflow graphs
  Two new prototypes extend Telegraph to support shared processing over streams
    CACQ
    PSoup
Adaptive Building Blocks
  Telegraph consists of a set of modules
  Module Types
    Ingress and Caching
    Query Processing
    Adaptive Routing
Adaptive Building Blocks
  Ingress and Caching
    Interface with external data sources
    HTML/XML screen scraper (TeSS)
    Proxy for fetching data from peer-to-peer networks (TeleNap)
  Query Processing
    Routing tuples through query modules on a tuple-by-tuple basis
    A special type of module known as a State Module (SteM)
Adaptive Building Blocks
  Adaptive Routing
    Construct a query plan that contains adaptive routing modules
    Able to re-optimize the plan while a query is running
    Eddy: routes data to other query operators
    Juggle: performs online reordering
    FLuX: routes tuples to support parallelism with load balancing and fault tolerance
Eddy
  Routing policy
  Naive Eddy:
    Handles only operators with different costs but equal selectivity
    Delivers tuples to the two selections equally
  Fast Eddy:
    Improves the Naive Eddy with Lottery Scheduling
    Tuple to operator → the operator is credited a “ticket”
    Operator returns a tuple → a “ticket” is debited
    Benefit: nearly optimal performance with less effort
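The ticket scheme can be sketched as follows. This is a minimal illustration, not the Telegraph implementation: the class `LotteryEddy`, the `+1` smoothing of the weights, and the fixed random seed are assumptions made for the example. An operator gains a ticket when it is handed a tuple and loses one when it returns the tuple, so operators that drop many tuples accumulate tickets and tend to be chosen earlier.

```python
import random
from typing import Callable, Dict


class LotteryEddy:
    """Hypothetical eddy that routes tuples by lottery scheduling."""

    def __init__(self, operators: Dict[str, Callable[[int], bool]], seed: int = 0):
        self.operators = operators
        self.tickets = {name: 0 for name in operators}
        self.rng = random.Random(seed)  # seeded only to make runs repeatable

    def _pick(self, candidates):
        # +1 is an assumption so that operators holding no tickets
        # still have a chance of being selected.
        weights = [self.tickets[name] + 1 for name in candidates]
        return self.rng.choices(candidates, weights=weights, k=1)[0]

    def route(self, tup: int) -> bool:
        """Send tup through every operator; return True if it survives all."""
        pending = list(self.operators)
        while pending:
            name = self._pick(pending)
            self.tickets[name] += 1        # credit: tuple handed to operator
            if not self.operators[name](tup):
                return False               # dropped: operator keeps the ticket
            self.tickets[name] -= 1        # debit: operator returned the tuple
            pending.remove(name)
        return True
```

In this sketch an operator's ticket count equals the number of tuples it has dropped so far, which is why a highly selective filter is favored as routing proceeds.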
Fjords
  An inter-module communications API
  Allows query plans to use a mixture of push and pull connections between modules
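A rough sketch of mixing push and pull connections behind one interface is shown here. The names `PushQueue`, `PullQueue`, and `merge_step` are hypothetical and do not come from the Fjords API; the point is only that a module can poll both kinds of input the same way without blocking on the pushed one.

```python
from collections import deque
from typing import Iterable, List


class PushQueue:
    """Push connection: the producer enqueues, the consumer polls."""

    def __init__(self) -> None:
        self._buf = deque()

    def put(self, item) -> None:
        self._buf.append(item)

    def get(self):
        # Non-blocking: return None when nothing has arrived yet.
        return self._buf.popleft() if self._buf else None


class PullQueue:
    """Pull connection: the consumer's get() drives the producer."""

    def __init__(self, source: Iterable) -> None:
        self._it = iter(source)

    def get(self):
        return next(self._it, None)


def merge_step(inputs: List) -> list:
    """One scheduling step of a module: poll each input once."""
    out = []
    for q in inputs:
        item = q.get()
        if item is not None:
            out.append(item)
    return out
```

Because both queue types expose the same `get()` call, the consuming module does not need to know whether a given input is a streaming sensor or a stored table.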
Initial CQ Approaches
  CACQ
  PSoup
  Limitations of CACQ and PSoup
    Restricted their processing to data that could fit in memory
    Did not investigate scheduling and resource management issues for queries with little or no overlap
    Did not explicitly deal with the notion of QoS for adapting to resource limitations
    Did not explore opportunities for varying the degree of adaptivity to trade off flexibility and overhead
CACQ
  First continuous query engine to exploit the adaptive query processing framework of Telegraph
  Modifies Eddies to execute multiple queries simultaneously
  Uses grouped filters to optimize selections in the shared execution of the individual queries
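One way to picture a grouped filter is as an index over the constants of many similar predicates, so that a tuple is tested against all of them in one lookup. The sketch below covers only predicates of the form `attribute > constant`; the class `GreaterThanGroupedFilter` and its sorted-list layout are assumptions for illustration, not CACQ's actual data structure.

```python
import bisect
from typing import List, Set


class GreaterThanGroupedFilter:
    """Hypothetical index over predicates of the form: value > threshold."""

    def __init__(self) -> None:
        self._thresholds: List[float] = []  # kept sorted ascending
        self._query_ids: List[str] = []     # aligned with _thresholds

    def add_query(self, query_id: str, threshold: float) -> None:
        i = bisect.bisect_left(self._thresholds, threshold)
        self._thresholds.insert(i, threshold)
        self._query_ids.insert(i, query_id)

    def matching_queries(self, value: float) -> Set[str]:
        # Every threshold strictly below value satisfies value > threshold,
        # so a single binary search answers all registered queries.
        k = bisect.bisect_left(self._thresholds, value)
        return set(self._query_ids[:k])
```

With three queries registered at thresholds 10, 20, and 30, a tuple with value 25 is matched to the first two by one binary search rather than three separate comparisons.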
PSoup
  Extends the mechanisms developed in CACQ in two main ways
    Allows queries to access historical data
    Supports disconnected operation
  New queries can be applied to old data
  New data can be applied to old queries
  Accomplished by creating a query SteM
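The symmetry between data and queries can be sketched with two stores that probe each other. This is a simplified illustration under stated assumptions: the class `PSoupSketch`, its plain list and dict storage, and the `fetch` method are hypothetical, and windowing is left out entirely. A new query is first run over stored data, and a new tuple is run over stored queries; results are kept so that a client that was disconnected can fetch them later.

```python
from typing import Callable, Dict, List


class PSoupSketch:
    """Hypothetical store that treats data and queries symmetrically."""

    def __init__(self) -> None:
        self.data_stem: List[int] = []                       # stored tuples
        self.query_stem: Dict[str, Callable[[int], bool]] = {}  # stored queries
        self.results: Dict[str, List[int]] = {}              # materialized answers

    def add_query(self, qid: str, predicate: Callable[[int], bool]) -> None:
        # New query applied to old data.
        self.query_stem[qid] = predicate
        self.results[qid] = [t for t in self.data_stem if predicate(t)]

    def add_tuple(self, tup: int) -> None:
        # New data applied to old queries.
        self.data_stem.append(tup)
        for qid, predicate in self.query_stem.items():
            if predicate(tup):
                self.results[qid].append(tup)

    def fetch(self, qid: str) -> List[int]:
        # A reconnecting client reads the answers accumulated so far.
        return list(self.results[qid])
```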