Dynamic Load Distribution in the Borealis Stream Processor
Network Computing Laboratory
Ying Xing, Stan Zdonik, Jeong-Heon Hwang
Brown Univ.
ICDE 2005
One line comment
Proposes an algorithm that balances load dynamically by redistributing operators under highly fluctuating input data, in the context of a clustered Borealis (CQE) system
Problem
[Figure: cluster of Borealis nodes]
•In a push-based (CQE) system, load fluctuation occurs with the input data rate
•A temporary load spike can significantly affect data processing latency
We should avoid temporary overload as much as possible!
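Challenges
[Figure: two plans for mapping operators onto a cluster of Borealis nodes. Stream S1 (rate r) feeds operators A and B; stream S2 (rate 2r) feeds operators C and D. Connected Plan: node loads 2cr and 4cr. A Better Plan: node loads 3cr and 3cr.]
•What operator mapping plan can balance the load best?
•How should we rearrange the plan dynamically as the load changes?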
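The load figures in the two plans can be reproduced with a few lines of arithmetic. The sketch below assumes, beyond what the transcript states, that every operator costs c per tuple and forwards every tuple it receives, and that the better plan splits each chain across the two nodes; the node names, `node_loads`, and the numeric values of r and c are hypothetical.

```python
# Worked example for the two plans (assumed: equal per-tuple cost c for all
# operators, and every operator forwards every tuple it receives).

def node_loads(plan, op_rate, cost_per_tuple):
    """Load of each node = sum of (input rate * per-tuple cost) of its operators."""
    return {
        node: sum(op_rate[op] * cost_per_tuple for op in ops)
        for node, ops in plan.items()
    }

r, c = 100, 1  # hypothetical values: 100 tuples/s, 1 ms of CPU per tuple

# S1 (rate r) feeds A then B; S2 (rate 2r) feeds C then D.
op_rate = {"A": r, "B": r, "C": 2 * r, "D": 2 * r}

connected_plan = {"N1": ["A", "B"], "N2": ["C", "D"]}  # each chain on one node
better_plan = {"N1": ["A", "C"], "N2": ["B", "D"]}     # assumed split of the chains

connected = node_loads(connected_plan, op_rate, c)  # 2cr and 4cr
better = node_loads(better_plan, op_rate, c)        # 3cr and 3cr
print(connected, better)
```

With r = 100 and c = 1, the connected plan gives loads of 200 and 400, while the better plan gives 300 on both nodes, and that balance holds however r changes.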
Solution approach
•Busy all together, idle all together
  Find out the operators that are busy at the same time
  Calculate the correlation of the operators
•Distribute busy operators
  Move the operators from a heavily loaded machine to an underloaded machine
•Perform the above operations periodically
•Propose a two-phase operator distribution algorithm
  Global algorithm: initial operator mapping
  Pair-wise algorithm: dynamically rearrange the mapping
Statistics measured
•Load time series of operators or nodes
  Load of an operator = # of tuples arrived * CPU time required for a tuple
  Load of a machine = sum of the loads of its operators
  Only keep the most recent k statistics
•Average load of a machine X_1
  Average of the load time series S_1 = (s_1, s_2, …, s_k)
•Correlation of operators X_1, X_2
  Correlation of the load time series S_X1, S_X2
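The statistics above can be sketched in a few helper functions. This is a minimal illustration, not the Borealis implementation: it assumes "correlation" means the Pearson correlation coefficient, treats a constant series as having correlation 0, and uses a hypothetical window size and sample values.

```python
from collections import deque
from math import sqrt

def operator_load(tuples_arrived, cpu_time_per_tuple):
    """Load of an operator in one period: # of tuples * CPU time per tuple."""
    return tuples_arrived * cpu_time_per_tuple

def machine_series(operator_series):
    """Load series of a machine: element-wise sum of its operators' series."""
    return [sum(samples) for samples in zip(*operator_series)]

def average(series):
    return sum(series) / len(series)

def correlation(s1, s2):
    """Pearson correlation (assumed measure); 0.0 if either series is constant."""
    m1, m2 = average(s1), average(s2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(s1, s2))
    v1 = sum((a - m1) ** 2 for a in s1)
    v2 = sum((b - m2) ** 2 for b in s2)
    if v1 == 0 or v2 == 0:
        return 0.0
    return cov / sqrt(v1 * v2)

# Keep only the most recent K samples (K = 3 is a hypothetical value).
K = 3
window = deque(maxlen=K)
for tuples_arrived in [10, 20, 30, 40]:
    window.append(operator_load(tuples_arrived, 2))
recent = list(window)  # the oldest sample has been dropped

s_x1 = [1, 2, 3, 4]
s_x2 = [2, 4, 6, 8]   # busy at the same time as s_x1
s_x3 = [4, 3, 2, 1]   # busy when s_x1 is idle
node = machine_series([s_x1, s_x3])
print(recent, correlation(s_x1, s_x2), correlation(s_x1, s_x3), node)
```

The last example shows why correlation matters: placing the two negatively correlated operators on the same machine yields a flat load series.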
Ideal state of the cluster
•The average loads of all machines are equal
•Minimize the average of the machines' load variances
  Make the lower bound of the average variance as small as possible
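To make the two criteria concrete, the sketch below compares two mappings with identical average loads but different average load variance. The operator names, sample values, and the use of population variance are assumptions for illustration only.

```python
def variance(series):
    """Population variance of a load time series."""
    m = sum(series) / len(series)
    return sum((s - m) ** 2 for s in series) / len(series)

def average_load_variance(mapping, op_series):
    """Average, over all machines, of the variance of each machine's load series."""
    total = 0.0
    for ops in mapping.values():
        if not ops:
            continue  # an empty machine has a constant (zero) load
        node_series = [sum(v) for v in zip(*(op_series[o] for o in ops))]
        total += variance(node_series)
    return total / len(mapping)

# Hypothetical load series: a and b are busy together, c and d are busy
# exactly when a and b are idle.
op_series = {
    "a": [1, 3, 1, 3],
    "b": [1, 3, 1, 3],
    "c": [3, 1, 3, 1],
    "d": [3, 1, 3, 1],
}

correlated = {"M1": ["a", "b"], "M2": ["c", "d"]}  # busy operators together
mixed = {"M1": ["a", "c"], "M2": ["b", "d"]}       # busy operators separated

var_correlated = average_load_variance(correlated, op_series)
var_mixed = average_load_variance(mixed, op_series)
print(var_correlated, var_mixed)
```

Both mappings give every machine an average load of 4, but only the second keeps each machine's load constant over time.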
Pair-wise Load Distribution Algorithm: One-way
[Figure: operators 1 to 9 on machines M1 and M2]
•Select the operators having the greatest score until the load of the selected operators exceeds (L1-L2)/2
•Score function: Co(O1, M1) - Co(O1, M2)
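A minimal sketch of the one-way selection follows. Several points are assumptions rather than facts from the slide: Co is read as the Pearson correlation between an operator's load series and a machine's load series, L1 and L2 as the average loads of M1 and M2, scores are computed once rather than after every move, and selection stops as soon as the moved load reaches (L1-L2)/2. All function and operator names are hypothetical.

```python
from math import sqrt

def average(series):
    return sum(series) / len(series)

def correlation(s1, s2):
    """Pearson correlation (assumed meaning of Co); 0.0 for a constant series."""
    m1, m2 = average(s1), average(s2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(s1, s2))
    v1 = sum((a - m1) ** 2 for a in s1)
    v2 = sum((b - m2) ** 2 for b in s2)
    return 0.0 if v1 == 0 or v2 == 0 else cov / sqrt(v1 * v2)

def total_series(ops, length):
    """Element-wise sum of the operators' series (all zeros if there are none)."""
    return [sum(s[i] for s in ops.values()) for i in range(length)]

def one_way(m1_ops, m2_ops):
    """Pick operators to move from the heavier machine M1 to M2.

    m1_ops, m2_ops: dict mapping operator name -> load time series.
    """
    length = len(next(iter(m1_ops.values())))
    m1, m2 = total_series(m1_ops, length), total_series(m2_ops, length)
    target = (average(m1) - average(m2)) / 2          # (L1 - L2) / 2
    score = {o: correlation(s, m1) - correlation(s, m2) for o, s in m1_ops.items()}
    selected, moved = [], 0.0
    for op in sorted(score, key=score.get, reverse=True):
        if moved >= target:                           # assumed stopping rule
            break
        selected.append(op)
        moved += average(m1_ops[op])
    return selected

m1_ops = {"a": [4, 0, 4, 0], "b": [0, 2, 0, 2], "c": [1, 1, 1, 1]}
m2_ops = {"d": [0, 2, 0, 2]}
to_move = one_way(m1_ops, m2_ops)
print(to_move)
```

In this example operator a is busy when M1 is busy and idle when M2 is busy, so it gets the highest score and is the one selected for migration.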
Pair-wise Load Distribution Algorithm: Two-way
[Figure: operators 1 to 9 redistributed between machines M1 and M2]
•Redistribute all movable operators
•The less loaded node is selected
•Operators are assigned one by one
•The operator having the highest score is selected
Global Operator Distribution
[Figure: operators assigned to machines M1 and M2]
•Redistribute all movable operators after a warm-up period
•The node with the lowest load is selected
•Operators are assigned one by one
•The operator having the highest score is selected
•Score function: [formula shown as an image on the slide]
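The greedy procedure in the bullets above can be sketched as follows. The score function did not survive in the transcript, so the one used here is a stand-in and not the authors' formula: it prefers the operator whose load series is least correlated with the current load of the selected node. The function names and sample data are hypothetical.

```python
from math import sqrt

def average(series):
    return sum(series) / len(series)

def correlation(s1, s2):
    """Pearson correlation; 0.0 if either series is constant."""
    m1, m2 = average(s1), average(s2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(s1, s2))
    v1 = sum((a - m1) ** 2 for a in s1)
    v2 = sum((b - m2) ** 2 for b in s2)
    return 0.0 if v1 == 0 or v2 == 0 else cov / sqrt(v1 * v2)

def greedy_distribution(op_series, node_names):
    """Assign operators one by one to the currently least loaded node."""
    length = len(next(iter(op_series.values())))
    node_load = {n: [0.0] * length for n in node_names}
    mapping = {n: [] for n in node_names}
    unassigned = dict(op_series)
    while unassigned:
        # The node with the lowest average load is selected.
        node = min(node_names, key=lambda n: average(node_load[n]))
        # Stand-in score (assumption): negative correlation with the node's load.
        best = max(unassigned,
                   key=lambda o: -correlation(unassigned[o], node_load[node]))
        mapping[node].append(best)
        node_load[node] = [x + y for x, y in zip(node_load[node], unassigned[best])]
        del unassigned[best]
    return mapping, node_load

op_series = {
    "a": [1, 3, 1, 3],
    "b": [1, 3, 1, 3],
    "c": [3, 1, 3, 1],
    "d": [3, 1, 3, 1],
}
mapping, node_load = greedy_distribution(op_series, ["M1", "M2"])
print(mapping, node_load)
```

With two nodes the same loop also covers the two-way pair-wise case, where all movable operators of a pair of machines are redistributed.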
Experimental results
1. Computation overhead of the algorithms
2. Effectiveness of the global algorithm
  Load variance
  End-to-end latency
3. Pair-wise algorithms
  Adaptivity to load changes
Experimental results
Computation overhead (AMD Athlon 3200+ 2GHz, 1G Mem)
Global Algorithm
  n            10     20     50
  Comp (sec)   0.5    3.4    54
Pair-wise Algorithm
  6 ms for each pair when n = 10, 10 operators/machine
Experimental results
Global algorithm evaluation
System load level = (sum of busy time) / (# of nodes * simulation duration)
Latency ratio = end-to-end latency / sum (processing delay)
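The two evaluation metrics are simple ratios; a small sketch with hypothetical numbers shows how they would be computed. The function names and sample values are illustrative and do not come from the experiments.

```python
def system_load_level(busy_times, simulation_duration):
    """(sum of busy time) / (# of nodes * simulation duration)."""
    return sum(busy_times) / (len(busy_times) * simulation_duration)

def latency_ratio(end_to_end_latency, processing_delays):
    """end-to-end latency / sum of the operators' processing delays."""
    return end_to_end_latency / sum(processing_delays)

# Hypothetical run: 4 nodes observed for 60 s, busy for 30, 45, 60 and 45 s.
load_level = system_load_level([30, 45, 60, 45], 60)

# Hypothetical tuple: 25 ms end to end through 10 operators of 1 ms each.
ratio = latency_ratio(25, [1] * 10)
print(load_level, ratio)
```

A latency ratio of 1 would mean a tuple never waits in a queue; values above 1 measure the extra delay caused by queuing on loaded nodes.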
Experimental results
Pair-wise algorithm evaluation
Critiques
Strong points
•Balances load according to changes in the input data rate (data pushed into the system)
•A simple algorithm using correlation
Weak points
•Unrealistic workload (operator chains, input streams)
•Hard to define the parameters of the statistics measurement
  Load collection period, score threshold, # of time series, …
  They must be changed depending on the workload
•If an input fluctuation doesn't have any historical behavior, the effect will be limited
•Does not consider dynamic changes of the operator network (query addition, deletion)
Parameters for Experiments (supplementary)
•Independent linear operator chain (10 ops)
•Number of nodes = 20
•20 queries
•Operator processing delay = 1 ms
•Load measuring time period = 1 sec
•# of samples in a load time series = 10
•Execute the pair-wise algorithm every second
•Operator migration time = 200 ms