1 Load Shedding in a Data Stream Manager Slides edited from the original slides of Kevin Hoeschele...

  • date post

    21-Dec-2015

Load Shedding in a Data Stream Manager

Slides edited from the original slides of Kevin Hoeschele

Anurag Shakti Maskey

Overview

• Load shedding in streams: an example

• How Aurora looks at load shedding

• The algorithms used by Aurora

• Experiments and results

Load Shedding in a DSMS

• Systems have a limit on how fast data can be processed

• When the input rate is too high, queues build up waiting for system resources

• Load shedding discards some data so the system can keep up

• Different from networking load shedding
  – Data has semantic value in a DSMS
  – QoS can be used to find the best stream to drop

• Hospital network
  – Stream of free doctors' locations
  – Stream of untreated patients' locations and their condition (dying, critical, injured, barely injured)
  – Output: match a patient with doctors within a certain distance

[Diagram: the Doctors and Patients streams feed a Join; the output is the doctors who can work on a patient]

Too many patients: what to do?

• Load shedding based on condition
  – Official name: "triage"
  – Most critical patients get treated first
  – Filter added before the Join

• Selectivity based on the number of untreated patients

[Diagram: the Patients stream passes through a Condition Filter before the Join with the Doctors stream; the output is the doctors who can work on a patient]
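As a rough illustration of the triage idea, the sketch below picks a severity threshold so that the number of patients passing the filter fits a given capacity. The `SEVERITY` ranking, the `capacity` parameter, and both function names are assumptions made for this example; the slides do not say how the selectivity is actually chosen.

```python
# Hypothetical ranking of the four conditions named on the slide.
SEVERITY = {"barely injured": 0, "injured": 1, "critical": 2, "dying": 3}


def choose_threshold(patients, capacity):
    """Return the lowest severity level whose passing patients fit `capacity`."""
    top = max(SEVERITY.values())
    for level in range(top + 1):
        passing = sum(1 for p in patients if SEVERITY[p["condition"]] >= level)
        if passing <= capacity:
            return level
    # Even the most severe group exceeds capacity: filter as hard as possible.
    return top


def condition_filter(patients, threshold):
    """Keep only patients at or above the severity threshold."""
    return [p for p in patients if SEVERITY[p["condition"]] >= threshold]


patients = [
    {"id": 1, "condition": "dying"},
    {"id": 2, "condition": "critical"},
    {"id": 3, "condition": "critical"},
    {"id": 4, "condition": "injured"},
    {"id": 5, "condition": "barely injured"},
]
threshold = choose_threshold(patients, capacity=3)
kept = condition_filter(patients, threshold)
```

With more untreated patients than capacity, the threshold rises and the filter becomes more selective, which matches the slide's remark that selectivity depends on the number of untreated patients.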

Aurora Overview

• Push-based data from streaming sources

• 3 kinds of Quality of Service (QoS)
  – Latency
    • Shows how utility drops as answers take longer to arrive
  – Value-based
    • Shows which output values are most important
  – Loss-tolerance
    • Shows how approximate answers affect a query

Load Shedding Techniques

• Filters (semantic drop)
  – Chooses what to shed based on QoS
  – Filter with a predicate whose selectivity = 1 - p
  – Lowest-utility tuples are dropped

• Drops (random drop)
  – Eliminates a random fraction of the input
  – Each incoming tuple has a p% chance of being dropped
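The two techniques can be contrasted with a minimal sketch. It works on a finite batch of tuples rather than a live stream, takes `p` as a fraction between 0 and 1, and uses a caller-supplied `utility` function; all of these are simplifying assumptions for illustration, not Aurora's operators.

```python
import random


def random_drop(tuples, p, rng=None):
    """Drop each tuple independently with probability p."""
    rng = rng or random.Random()
    return [t for t in tuples if rng.random() >= p]


def semantic_drop(tuples, p, utility):
    """Drop the fraction p of tuples with the lowest utility (selectivity 1 - p)."""
    n_drop = int(len(tuples) * p)
    # Rank positions by utility, then drop the lowest-ranked ones.
    by_utility = sorted(range(len(tuples)), key=lambda i: utility(tuples[i]))
    dropped = set(by_utility[:n_drop])
    # Preserve the original arrival order of the surviving tuples.
    return [t for i, t in enumerate(tuples) if i not in dropped]


values = [5, 1, 4, 2, 3]
survivors = semantic_drop(values, 0.4, utility=lambda v: v)
```

A random drop needs no knowledge of tuple contents, while the semantic drop needs a utility ordering, which is what the value-based QoS provides.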

3 Questions of Load Shedding

• When
  – The load of the system needs constant evaluation

• Where
  – Dropping as early as possible saves the most resources
    • Can be a problem with streams that fan out and are used by multiple queries

• How much
  – The percentage for a random drop
  – The predicate for a semantic drop (filter)

Load Shedding in Aurora

• Aurora Catalog
  – Holds QoS and other statistics
  – Holds the network description

• The load shedder monitors these and the input rates, and makes load shedding decisions
  – Inserts drops/filters into the query network, which are stored in the catalog

[Diagram: input streams flow through the Query Network to the output; the Load Shedder and the Catalog sit alongside it, connected by arrows labeled "Network description", "Changes to query plans" and "Data rates"]

Equation

• N = network

• I = input streams

• C = processing capacity

• Uaccuracy = utility from the loss-tolerance QoS graph

• H = headroom factor, the % of system resources that can be used at steady state

If Load(N(I)) > C, then load shedding is needed (why no H?)

Goal: find a new network N' based on N that minimizes

Uaccuracy(N(I)) - Uaccuracy(N'(I))

subject to

Load(N'(I)) < H * C

Load Shedding Algorithm

• Evaluation Step
  – When to shed load?

• Load Shedding Road Map (LSRM)
  – Where to shed load?
  – How much load to shed?

Load Evaluation

• Load Coefficients (L) [processor cycles / tuple]
  – The number of processor cycles required to push a single tuple through the network to the outputs

[Diagram: I → operator 1 (c1, s1) → operator 2 (c2, s2) → … → operator n (cn, sn) → O]

$$L = \sum_{i=1}^{n} \Big( \prod_{j=1}^{i-1} s_j \Big) \cdot c_i$$

• n operators

• ci = cost

• si = selectivity
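For a linear chain, the load coefficient adds each operator's cost weighted by the product of the selectivities upstream of it. A minimal sketch follows; the function name and the list-of-pairs representation are assumptions for illustration.

```python
def chain_load_coefficient(operators):
    """Cycles needed to push one tuple through a chain of operators.

    operators: list of (cost, selectivity) pairs in pipeline order.
    """
    total = 0.0
    surviving = 1.0  # fraction of the original tuple still flowing at this point
    for cost, selectivity in operators:
        total += surviving * cost
        surviving *= selectivity
    return total


# Single-query chain used on the later "Drop Locations" slides.
chain = [(10, 0.5), (10, 0.8), (5, 1.0)]
L1 = chain_load_coefficient(chain)  # 10 + 0.5*10 + 0.5*0.8*5
```

The result agrees with the L1 = 17 shown for the single-query chain later in the deck.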

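Load Evaluation: Load Coefficient

L1 = 10 + (0.5 * 10) + (0.5 * 0.8 * 5) + (0.5 * 10) = 22

L2 = 10 + (0.8 * 5) = 14

[Diagram: input I feeds operator 1 (c1 = 10, s1 = 0.5), which fans out to operator 2 (c2 = 10, s2 = 0.8) followed by operator 3 (c3 = 5, s3 = 1.0) and output O1, and to operator 4 (c4 = 10, s4 = 0.9) and output O2. Load coefficients: L1 = 22, L2 = 14, L3 = 5, L4 = 10, L(I) = 22.]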
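When an operator's output fans out to several branches, its load coefficient is its own cost plus its selectivity times the sum of the coefficients of its successors. The sketch below reproduces the slide's numbers; the dictionary layout and function name are assumptions for illustration.

```python
def load_coefficients(network, root):
    """Compute the load coefficient of every operator reachable from `root`.

    network: {operator_id: (cost, selectivity, [downstream operator ids])}
    """
    coefficients = {}

    def visit(op):
        cost, selectivity, children = network[op]
        coefficients[op] = cost + selectivity * sum(visit(c) for c in children)
        return coefficients[op]

    visit(root)
    return coefficients


# The shared two-query network from the slide.
network = {
    1: (10, 0.5, [2, 4]),
    2: (10, 0.8, [3]),
    3: (5, 1.0, []),
    4: (10, 0.9, []),
}
coeffs = load_coefficients(network, root=1)
```

Operator 4's selectivity does not affect any coefficient here because nothing is downstream of it.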

Load Evaluation

• Stream Load (S)
  – Load created by the current stream rates

$$S = \sum_{i=1}^{m} L_i \cdot r_i$$

• m input streams

• Li = load coefficient

• ri = input rate

Load Evaluation: Stream Load

S = 22 * 10 = 220

[Diagram: the shared two-query network (operators 1 to 4, with L1 = 22, L2 = 14, L3 = 5, L4 = 10, L(I) = 22) and input rate r = 10]
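The stream load is a sum of load coefficient times input rate over all input streams. A short sketch, with a hypothetical function name and a list-of-pairs input:

```python
def stream_load(streams):
    """Load created by the current input rates.

    streams: iterable of (load_coefficient, input_rate) pairs, one per input.
    """
    return sum(coefficient * rate for coefficient, rate in streams)


S = stream_load([(22, 10)])  # the single input from the slide
```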

Load Evaluation

• Queue Load (Q)
  – Load due to any queues that may have built up since the last load evaluation step

• MELT_RATE = how fast to shrink the queues (queue length reduction per unit time)

Q = MELT_RATE * Li * qi

• Li = load coefficient

• qi = queue length

Load Evaluation: Queue Load

MELT_RATE = 0.1

Q = 0.1 * 5 * 100 = 50

[Diagram: the shared two-query network (operators 1 to 4, with L1 = 22, L2 = 14, L3 = 5, L4 = 10, L(I) = 22), input rate r = 10, and a queue of length q = 100; the calculation uses L = 5, the coefficient at operator 3]
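The queue load can be sketched the same way. The slide shows the term for a single queue; summing that term over several queues is an assumption made here, as are the function name and the input layout.

```python
MELT_RATE = 0.1  # value used in the slide's example


def queue_load(queues, melt_rate=MELT_RATE):
    """Load due to queued tuples that should be drained.

    queues: iterable of (load_coefficient, queue_length) pairs.
    """
    return sum(melt_rate * coefficient * length for coefficient, length in queues)


Q = queue_load([(5, 100)])  # queue of 100 tuples where L = 5
```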

Load Evaluation: Total Load

• Total Load (T) = S + Q

T = 220 + 50 = 270

[Diagram: the shared two-query network (operators 1 to 4, with L1 = 22, L2 = 14, L3 = 5, L4 = 10, L(I) = 22), input rate r = 10, and queue length q = 100]

Load Evaluation

• The system is overloaded when

T > H * C

where H is the headroom factor and C is the processing capacity
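Putting the evaluation step together: the total load is compared against the usable capacity H * C, and the difference is what must be shed. In the sketch below the total of 270 comes from the slides, while the values of H and C and the helper names are hypothetical.

```python
def total_load(stream_load, queue_load):
    """T = S + Q."""
    return stream_load + queue_load


def excess_load(total, headroom, capacity):
    """Cycles that must be shed; zero when the system is not overloaded."""
    return max(0.0, total - headroom * capacity)


T = total_load(220, 50)   # 270, as on the Total Load slide
H, C = 0.9, 250           # hypothetical headroom factor and capacity
overloaded = T > H * C
to_shed = excess_load(T, H, C)
```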

Load Shedding Algorithm

• Evaluation Step
  – When to drop?

• Load Shedding Road Map (LSRM)
  – How much to drop?
  – Where to drop?

Load Shedding Road Map (LSRM)

LSRM entry: <Cycle Savings Coefficients (CSC), Drop Insertion Plan (DIP), Percent Delivery Cursors (PDC)>

• CSC: how many cycles will be saved

• DIP: the set of drops that will be inserted

• PDC: where the system will be running when the DIP is adopted

[Diagram: the LSRM as a row of entries from ENTRY n (max savings) to ENTRY 1 (CSC = (0,0,0,…,0)), each holding CSC, DIP and PDC; a cursor moves toward ENTRY n for more load shedding and toward ENTRY 1 for less]
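One way to picture the LSRM is as a list of entries ordered by increasing savings, with a cursor that advances until the savings cover the excess load. The sketch below assumes that the CSC holds one coefficient per input stream and that an entry's savings are the sum of coefficient times input rate; the slides do not spell this out, and the class, field, and function names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LSRMEntry:
    csc: List[float]        # cycle savings coefficient per input stream (assumed layout)
    dip: Dict[str, float]   # drop location -> drop fraction
    pdc: List[float]        # percent delivered per output under this plan


def cycles_saved(entry, input_rates):
    """Savings of an entry at the current input rates (assumed: sum of csc * rate)."""
    return sum(c * r for c, r in zip(entry.csc, input_rates))


def move_cursor(lsrm, input_rates, excess):
    """Index of the first entry whose savings cover `excess`.

    Entries are assumed sorted by increasing savings; if none suffices,
    the last (maximum savings) entry is used.
    """
    for cursor, entry in enumerate(lsrm):
        if cycles_saved(entry, input_rates) >= excess:
            return cursor
    return len(lsrm) - 1


lsrm = [
    LSRMEntry(csc=[0.0], dip={}, pdc=[100.0]),
    LSRMEntry(csc=[2.0], dip={"A": 0.10}, pdc=[90.0]),
    LSRMEntry(csc=[5.0], dip={"A": 0.25}, pdc=[75.0]),
]
cursor = move_cursor(lsrm, input_rates=[10], excess=45)
```

Because the road map is built ahead of time, reacting to overload at run time is only a cursor move and the adoption of the stored DIP.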

LSRM Construction

[Flowchart: set Drop Locations → compute & sort Loss/Gain ratios, then one of two branches]

• Drop-Based LS boxes: how much to drop?, take the least ratio, insert Drop, create LSRM entry

• Filter-Based (Semantic) LS boxes: how much to drop?, take the least ratio, determine predicate, insert Filter, create LSRM entry

Drop Locations: Single Query

[Flowchart step: set Drop Locations]

[Diagram: single-query chain I → operator 1 (c1 = 10, s1 = 0.5) → operator 2 (c2 = 10, s2 = 0.8) → operator 3 (c3 = 5, s3 = 1.0) → O, with L1 = 17, L2 = 14, L3 = 5; candidate drop locations A, B, C, D along the chain]

Drop Locations: Single Query

[Flowchart step: set Drop Locations]

[Diagram: the same single-query chain (L1 = 17, L2 = 14, L3 = 5) with only drop location A remaining]

Drop Locations: Shared Query

[Flowchart step: set Drop Locations]

[Diagram: the shared two-query network (operator 1 fans out to operators 2 and 3 toward O1 and to operator 4 toward O2; L1 = 22, L2 = 14, L3 = 5, L4 = 10) with candidate drop locations A, B, C, D, E, F]

Drop Locations: Shared Query

[Flowchart step: set Drop Locations]

[Diagram: the same shared two-query network (L1 = 22, L2 = 14, L3 = 5, L4 = 10) with only drop locations A, B, C remaining]

Loss/Gain Ratio: Loss

[Flowchart step: compute & sort Loss/Gain ratios]

• Loss – utility loss as tuples are dropped
  – Determined using the loss-tolerance QoS graph

[Graph: loss-tolerance QoS; x-axis "% tuples" with ticks 100, 50, 0; y-axis "utility" with ticks 0, 0.7, 1]

Loss for first piece of graph

= (1 – 0.7) / 50

= 0.006
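The loss of each linear piece is the utility drop divided by the percentage of tuples spanned by that piece. The sketch below assumes a piecewise-linear graph through (100%, 1.0), (50%, 0.7) and (0%, 0.0); the first two points follow from the slide's calculation, the last is an assumption read off the axis ticks.

```python
def segment_losses(points):
    """Utility lost per percentage point of tuples dropped, for each linear piece.

    points: (percent_delivered, utility) pairs, sorted by decreasing percent.
    """
    return [
        (u_hi - u_lo) / (p_hi - p_lo)
        for (p_hi, u_hi), (p_lo, u_lo) in zip(points, points[1:])
    ]


qos_graph = [(100, 1.0), (50, 0.7), (0, 0.0)]
losses = segment_losses(qos_graph)
```

Under these assumed points the second piece is steeper than the first, so dropping the first 50% of tuples is cheaper per tuple than dropping the rest.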

Loss/Gain Ratio: Gain

[Flowchart step: compute & sort Loss/Gain ratios]

• Gain – processor cycles gained

• R = input rate into the drop operator

• L = load coefficient

• x = drop percentage

• D = cost of the drop operator

• STEP_SIZE = increment for x used to find G(x) (to guarantee G(x) > 0)

$$G(x) = \begin{cases} R \cdot (x \cdot L - D) & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$$
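The gain formula says a drop only pays off once the cycles it saves (x * L per tuple) exceed the cost D of running the drop operator itself. The sketch below treats x as a fraction between 0 and 1 and searches in increments of a step size for the smallest drop with positive gain; the helper names and the example values of R, L and D are hypothetical.

```python
def gain(x, R, L, D):
    """Processor cycles gained by dropping a fraction x of the input."""
    return R * (x * L - D) if x > 0 else 0.0


def smallest_useful_drop(R, L, D, step_size):
    """Smallest multiple of step_size with positive gain, or None if there is none."""
    steps = int(round(1.0 / step_size))
    for k in range(1, steps + 1):
        x = k * step_size
        if gain(x, R, L, D) > 0:
            return x
    return None


g = gain(0.5, R=10, L=22, D=1)                       # 10 * (0.5 * 22 - 1)
x_min = smallest_useful_drop(R=10, L=22, D=1, step_size=0.01)
```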

Drop-Based Load Shedding: how much to drop?

• Take the least Loss/Gain ratio

• Determine the drop percentage p

Drop-Based Load Shedding: where to drop?

[Diagram: single-query chain I → operator 1 (c1 = 10, s1 = 0.5) → operator 2 (c2 = 10, s2 = 0.8) → operator 3 (c3 = 5, s3 = 1.0) → O, with L1 = 17, L2 = 14, L3 = 5; drop operators are shown being inserted along the chain, starting at location A]

If there are other drops in the network, modify their drop percentages.

Drop-Based Load Shedding: make LSRM entry

• All drop operators with the modified percentages form the DIP

• Compute the CSC

• Advance the QoS cursors and store them in the PDC

LSRM Entry: <Cycle Savings Coefficients (CSC), Drop Insertion Plan (DIP), Percent Delivery Cursors (PDC)>

Filter-Based (Semantic) Load Shedding: how much to drop? What predicate for the filter?

• Start dropping from the interval with the lowest utility.

• Keep a list of intervals sorted by utility and relative frequency.

• Find out how much to drop and which intervals are needed to cover it.

• Determine the predicate for the filter.
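The steps above can be sketched as a greedy pass over value intervals, lowest utility first, until the required drop fraction is covered. The sketch sorts by utility only, represents each interval as a (low, high, utility, relative_frequency) tuple, and builds a predicate that rejects only fully dropped intervals; these are simplifying assumptions, and a partially dropped interval would need further splitting.

```python
def plan_filter(intervals, drop_fraction):
    """Choose which value intervals to drop.

    intervals: (low, high, utility, relative_frequency) tuples.
    Returns (low, high, share) triples; share is the part of the interval to drop.
    """
    plan, remaining = [], drop_fraction
    for low, high, _utility, freq in sorted(intervals, key=lambda iv: iv[2]):
        if remaining <= 1e-12:
            break
        if freq <= 0:
            continue  # nothing to gain from an interval that never occurs
        take = min(freq, remaining)
        plan.append((low, high, take / freq))
        remaining -= take
    return plan


def make_predicate(plan):
    """Keep a value unless it falls in a fully dropped interval [low, high)."""
    dropped = [(lo, hi) for lo, hi, share in plan if share >= 1.0]
    return lambda v: not any(lo <= v < hi for lo, hi in dropped)


intervals = [(50, 100, 1.0, 0.5), (0, 50, 0.2, 0.5)]
plan = plan_filter(intervals, 0.75)
keep = make_predicate(plan)
```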

Filter-Based Load Shedding: place the filter

[Diagram: single-query chain I → operator 1 (c1 = 10, s1 = 0.5) → operator 2 (c2 = 10, s2 = 0.8) → operator 3 (c3 = 5, s3 = 1.0) → O, with L1 = 17, L2 = 14, L3 = 5; filter operators are shown being inserted along the chain, starting at location A]

If there are other filters in the network, modify their selectivities.

Experiment setup

• Simulated network
  – Tuple processing time is simulated by having the simulator process use the CPU for the amount of time an operator needs to consume a tuple
  – One process for each input stream
  – Randomly created network
    • The number of queries and the number of operators per query are chosen

• Are random networks a good benchmark?

Experiments

• Used only the Join, Filter, and Union Aurora operators
  – Filters were simple comparison predicates of the form:
    • Input_value > filter_constant

• Filter and Drop load shedding were compared to 4 admission control algorithms
  – Similar in style to networking load shedding

Evaluation Methods

• Loss-tolerance and value-based QoS were used

• Tuple Utility is the utility from loss-tolerance QoS
  – k = number of time segments
  – ni = number of tuples per time segment i
  – ui = loss-tolerance utility of each tuple during time segment i
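The slide's formula for Tuple Utility did not survive in this transcript. One reading consistent with the symbols k, ni and ui is a tuple-weighted average of the per-segment utilities; the sketch below implements that reading as an assumption, not as the slide's confirmed definition.

```python
def tuple_utility(segments):
    """Tuple-weighted average utility over time segments (assumed definition).

    segments: (n_i, u_i) pairs, i.e. tuples delivered and utility per tuple
    in each of the k time segments.
    """
    total_tuples = sum(n for n, _ in segments)
    if total_tuples == 0:
        return 0.0
    return sum(n * u for n, u in segments) / total_tuples


utility = tuple_utility([(100, 1.0), (100, 0.5)])
```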

Value Utility

• Value Utility is the utility from value-based QoS
  – fi = relative frequency of tuples in value interval i with no drops
  – fi' = frequency relative to the total number of tuples
  – Ui = average value utility for value interval i

• When there are multiple queries, Overall Utility is the sum of the utilities for each query

Algorithms

• Input-Random
  – One random stream is chosen, and tuples are shed until the excess load is covered
  – If the whole stream is shed and there is still excess load, another random stream is chosen

• Input-Cost-Top
  – Similar to Input-Random, but uses the input stream with the most costly input

• Input-Uniform
  – Distributes load shedding uniformly across the input streams

• Input-Cost-Uniform
  – Load is shed from all input streams, weighted by their cost
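Three of these admission control baselines can be sketched as functions that return, for each input stream, the fraction of its tuples to shed. The sketch assumes every stream has a positive load, takes "cost" to mean the stream's current load (load coefficient times rate), and leaves out Input-Cost-Uniform because the slide does not specify its weighting; all names are illustrative.

```python
import random


def shed_in_order(loads, order, excess):
    """Shed whole streams in the given order until `excess` is covered."""
    plan = {stream: 0.0 for stream in loads}
    for stream in order:
        if excess <= 0:
            break
        shed = min(loads[stream], excess)
        plan[stream] = shed / loads[stream]
        excess -= shed
    return plan


def input_random(loads, excess, rng=None):
    """Pick streams in random order."""
    rng = rng or random.Random()
    order = list(loads)
    rng.shuffle(order)
    return shed_in_order(loads, order, excess)


def input_cost_top(loads, excess):
    """Pick the most costly stream first (cost assumed to be the stream's load)."""
    order = sorted(loads, key=loads.get, reverse=True)
    return shed_in_order(loads, order, excess)


def input_uniform(loads, excess):
    """Shed the same fraction from every stream."""
    fraction = min(1.0, excess / sum(loads.values()))
    return {stream: fraction for stream in loads}


loads = {"a": 220.0, "b": 100.0}
top_plan = input_cost_top(loads, excess=250.0)
uniform_plan = input_uniform(loads, excess=250.0)
```

None of these consult QoS information, which is the contrast the experiments draw with Drop-LS and Filter-LS.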

Results – Tuple Utility Loss

Observations:

QoS-driven algorithms perform better

Filter works better than Drop

Results – Value Utility Loss

Filter-LS is clearly the best

Drop-LS is no better than the admission control algorithms

Conclusion

• Load shedding is important to a DSMS

• There are many variables to consider when planning to use load shedding

• Drop and Filter are two QoS-driven algorithms

• QoS-based strategies work better than admission control

Questions

• Drop and Filter were the two QoS load shedding algorithms given here. Are there any others?

• Admission control may be a viable option for processing network requests, but in a streaming database system the connection is already made. If the incoming tuples were put into a buffer, in effect denying the stream bandwidth, would this increase utility?

• Why are REDs useful or not useful for streaming databases?

More Questions

• When we have a low bandwidth connection like a sensor that is unreliable and when a significant amount of traffic is out of order, is TCP the best transport protocol?

• When there is high traffic, to what extent should the network do the load shedding? Should the database system be doing more because it knows the semantics of the tuples?

• So the idea of admission control doesn't carry over directly from networks to streaming databases. But does buffering the input when the process becomes overloaded achieve the same effect? Why doesn't Aurora have this?