Load Balancing Tasks with Overlapping Requirements Milan Vojnovic Microsoft Research Joint work with...

Post on 26-Dec-2015


Load Balancing Tasks with Overlapping Requirements

Milan Vojnovic, Microsoft Research

Joint work with Dan Alistarh, Christos Gkantsidis, Jennifer Iglesias, Bo Zong

2

Motivating Application Scenario: Stream Processing Platforms

3

Tasks and Requirements

4

5

Problem #1: Bi-Criteria Load Balancing

Query Assignment Problem:

• Find an assignment of tasks to machines that

Criterion 1: minimizes the total number of distinct requirements that need to be supplied to machines

Criterion 2: balances the number of tasks assigned across machines
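
As a rough illustration of the two criteria, the sketch below scores a given assignment. The function name, the dictionary layout, and the toy tasks are assumptions made for this example, not part of the talk.

```python
from typing import Dict, List, Set, Tuple

def bicriteria_scores(assignment: Dict[str, int],
                      requirements: Dict[str, Set[str]],
                      num_machines: int) -> Tuple[int, int, int]:
    """Return (total distinct requirements supplied, max load, min load).

    assignment maps task -> machine index; requirements maps task -> its
    set of requirements. Both layouts are hypothetical.
    """
    needed: List[Set[str]] = [set() for _ in range(num_machines)]
    load = [0] * num_machines
    for task, machine in assignment.items():
        needed[machine] |= requirements[task]   # requirements this machine must receive
        load[machine] += 1                      # number of tasks on this machine
    total_supplied = sum(len(s) for s in needed)
    return total_supplied, max(load), min(load)

# Toy instance: four tasks with overlapping requirements, two machines.
requirements = {"q1": {"a", "b"}, "q2": {"b", "c"}, "q3": {"a"}, "q4": {"c"}}
grouped = {"q1": 0, "q3": 0, "q2": 1, "q4": 1}   # overlapping tasks share a machine
mixed = {"q1": 0, "q4": 0, "q2": 1, "q3": 1}     # overlapping tasks are split up

print(bicriteria_scores(grouped, requirements, 2))
print(bicriteria_scores(mixed, requirements, 2))
```

Both assignments are perfectly balanced in task count, but the grouped one supplies fewer requirements in total, which is the tension the two criteria capture.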

6

Problem #2: Min-Max Load Balancing

Query Assignment Problem:

• Find an assignment of tasks to machines that minimizes the maximum number of distinct requirements needed by a machine
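
A minimal sketch of the min-max objective, assuming tasks are given as sets of requirements and an assignment as a task-to-machine map (both representations are hypothetical):

```python
from typing import Dict, Set

def max_requirements_per_machine(assignment: Dict[str, int],
                                 requirements: Dict[str, Set[str]],
                                 num_machines: int) -> int:
    """Largest number of distinct requirements needed by any single machine."""
    needed = [set() for _ in range(num_machines)]
    for task, machine in assignment.items():
        needed[machine] |= requirements[task]
    return max(len(s) for s in needed)

# Toy instance: two pairs of tasks with identical requirement sets.
reqs = {"q1": {"a", "b"}, "q2": {"a", "b"}, "q3": {"c", "d"}, "q4": {"c", "d"}}
together = {"q1": 0, "q2": 0, "q3": 1, "q4": 1}
split = {"q1": 0, "q3": 0, "q2": 1, "q4": 1}

print(max_requirements_per_machine(together, reqs, 2))
print(max_requirements_per_machine(split, reqs, 2))
```

Keeping tasks with identical requirements together halves the objective on this instance.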

7

Other Motivating Application Scenarios

• Scheduling tasks in distributed clusters of machines with data locality

• …

• Beyond resource allocation in data centres:

• Clustering of information objects (documents, images, videos)

• Summarizing topics for collections of documents

• …

8

Related Work

Standard load balancing

• Identical machines: Graham-1996

• Related machines: Aspnes et al-1993, Cho and Sahni-1988

• Restricted machines: Azar et al-1992

• Unrelated machines: Aspnes et al-1993

• Routing: Aspnes et al-1993

Min-max multiway cut: Bansal et al-2014, Svitkina and Tardos 2004

9

Problem #1: Bi-Criteria Load Balancing

Minimize

subject to

for

$S$ = set of requirements, $Q$ = set of tasks

$f(Q') = \sum_{s \in S} w(s)\, \mathbf{1}(s \text{ is required by some } q \in Q')$

$S_q \subseteq S$ for every $q \in Q$
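
The cost function $f$ can be sketched directly from its definition: take the union of the requirement sets of the tasks in $Q'$ and sum the weights. The names `required_by` and `weight` and the toy values are assumptions for this example.

```python
from typing import Dict, Iterable, Set

def requirement_cost(tasks: Iterable[str],
                     required_by: Dict[str, Set[str]],
                     weight: Dict[str, float]) -> float:
    """f(Q'): total weight of requirements needed by at least one task in Q'."""
    needed: Set[str] = set()
    for q in tasks:
        needed |= required_by[q]        # S_q, the requirements of task q
    return sum(weight[s] for s in needed)

required_by = {"q1": {"s1", "s2"}, "q2": {"s2", "s3"}}
weight = {"s1": 1.0, "s2": 2.0, "s3": 0.5}

print(requirement_cost(["q1", "q2"], required_by, weight))  # s2 is counted once
print(requirement_cost(["q1"], required_by, weight))
```

The shared requirement `s2` contributes its weight only once, which is why co-locating overlapping tasks lowers the cost.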

10

NP Hardness

• Query Assignment Problem is NP-complete

Proof: reduction from the well-known bin packing problem

11

Random Query Assignment

• Maximum number of tasks per machine:

with probability

[Raab and Steger, 1998]

• The expected number of requirements needed by the machines:

= number of tasks needing requirement
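
The formula on this slide did not survive extraction. As a hedged stand-in, the sketch below uses the standard calculation under the assumption that each task is assigned independently and uniformly at random to one of k machines: a requirement needed by n_s tasks is then needed on a given machine with probability 1 - (1 - 1/k)^{n_s}. The symbols k and n_s are chosen for this example and may differ from the slide's notation.

```python
from typing import Iterable

def expected_requirement_copies(tasks_per_requirement: Iterable[int],
                                num_machines: int) -> float:
    """Expected total number of (requirement, machine) pairs that must be supplied
    when every task picks a machine independently and uniformly at random."""
    k = num_machines
    return sum(k * (1.0 - (1.0 - 1.0 / k) ** n_s) for n_s in tasks_per_requirement)

# Two machines; one requirement needed by a single task, one needed by two tasks.
print(expected_requirement_copies([1, 2], 2))
```

A requirement needed by many tasks ends up on almost every machine under random assignment, which motivates the deficiency discussed next.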

12

Deficiency of Random Query Assignment

[Figure: example instance with three blocks labeled n/l and three blocks labeled m/l]

• Expected number of needed requirements:

as

• Optimal:

13

Special Case: Tasks with Singleton Requirements

• There exists a polynomial-time algorithm that guarantees 2-approximation for singleton task requirements with arbitrary weights

14

Algorithm

15

Tasks with Arbitrary Sets of Requirements

• For unit-weight requirements, there exists a polynomial-time algorithm with approximation ratio

where is the maximum number of requirements of a task

• For arbitrary-weight requirements, the same approximation ratio holds but with an extra factor: the ratio of the max to the min weight

16

Gadget: Minimum Task Type Packing

• Given a set of requirements, a set of tasks, and a real number

• Find a subset of query types that minimizes

subject to

17

Algorithm

1. Pick an empty machine

2. Find a subset of query types that approximately solves the MQP problem with parameter

3. Let be the subset of unassigned queries of type in

4. If then apply a pruning procedure

5. If there are unassigned queries, go to 1

18

Experimental Evaluation

• Random bipartite graph for subscriptions of tasks to requirements

• Number of tasks per requirement according to a Zipf distribution

• Number of requirements per task fixed to a constant

• Metric: replication factor

= total number of needed requirements / m
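
A small sketch of the replication factor, assuming m denotes the number of distinct requirements (the slide does not define m here, so this reading is an assumption, as are the function and variable names):

```python
from typing import Dict, Set

def replication_factor(assignment: Dict[str, int],
                       requirements: Dict[str, Set[str]],
                       num_machines: int,
                       m: int) -> float:
    """Total number of requirements needed across machines, divided by m."""
    needed = [set() for _ in range(num_machines)]
    for task, machine in assignment.items():
        needed[machine] |= requirements[task]
    return sum(len(s) for s in needed) / m

task_reqs = {"q1": {"a", "b"}, "q2": {"b", "c"}}
same_machine = {"q1": 0, "q2": 0}
different_machines = {"q1": 0, "q2": 1}

print(replication_factor(same_machine, task_reqs, 2, m=3))
print(replication_factor(different_machines, task_reqs, 2, m=3))
```

Under this reading, a value of 1 means no requirement is supplied to more than one machine; larger values indicate replication.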

19

Offline Algorithms

• MQP = defined in an earlier slide

• OffRand = uniform random assignment of a query type to a machine

• IC = Incremental cost

• MMS = Min-max traffic cost per machine

20

Performance of Offline Algorithms

[Figure: plot against the number of requirements per task]

21

Online Task Assignment

• LeastCost

• LeastSource

• LeastQT

22

Performance of Online Algorithms

[Figure: plot against the number of requirements per task]

23

Problem #2: Min-Max Load Balancing

Minimize

subject to

24

Online Task Assignment

• At each arrival of task

• Compute for every

• Assign task to machine in
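
The quantity computed per machine is missing from the transcript. The sketch below shows one plausible greedy rule, stated as an assumption rather than the talk's exact rule: on each arrival, compute for every machine the size of its requirement set after adding the task, and assign the task to a machine where that size is smallest (ties go to the lowest index).

```python
from typing import Iterable, List, Set, Tuple

def greedy_online_assign(task_stream: Iterable[Set[str]],
                         num_machines: int) -> Tuple[List[int], List[Set[str]]]:
    """Assign each arriving task to the machine whose requirement set would be
    smallest after the assignment. Hypothetical rule, for illustration only."""
    needed: List[Set[str]] = [set() for _ in range(num_machines)]
    placement: List[int] = []
    for reqs in task_stream:
        # Score every machine by its requirement count after adding this task.
        best = min(range(num_machines), key=lambda j: (len(needed[j] | reqs), j))
        needed[best] |= reqs
        placement.append(best)
    return placement, needed

stream = [{"a", "b"}, {"a", "b"}, {"c", "d"}, {"c"}]
placement, needed = greedy_online_assign(stream, 2)
print(placement)
print(max(len(s) for s in needed))
```

This rule keeps only one requirement set per machine in memory and never revisits earlier decisions, which matches the online setting.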

25

Hidden Co-Clustering Input

26

Recovery Theorem

• Suppose and

There exists an online assignment of tasks that guarantees asymptotic recovery of hidden clusters

Proof: coupling to a Pólya urn process

Asymptotic recovery: the fraction of tasks from the same hidden cluster that are assigned to the same bin tends to 1 as the number of tasks grows large
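
One way to read the recovery definition for a finite run is to measure, for each hidden cluster, the largest fraction of its tasks that landed in a single bin. The function below is a sketch of that reading; the names and the choice of the most popular bin are assumptions.

```python
from collections import Counter
from typing import Dict

def recovery_fraction(hidden_cluster: Dict[str, str],
                      bin_of: Dict[str, int]) -> Dict[str, float]:
    """For each hidden cluster, the fraction of its tasks in its most common bin."""
    counts: Dict[str, Counter] = {}
    for task, cluster in hidden_cluster.items():
        counts.setdefault(cluster, Counter())[bin_of[task]] += 1
    return {c: max(cnt.values()) / sum(cnt.values()) for c, cnt in counts.items()}

hidden = {"t1": "A", "t2": "A", "t3": "A", "t4": "B"}
bins = {"t1": 0, "t2": 0, "t3": 1, "t4": 1}
fractions = recovery_fraction(hidden, bins)
print(fractions)
```

Asymptotic recovery then corresponds to every value in this dictionary approaching 1 as the number of tasks grows.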

27

Experimental Evaluation

• Dataset

• Greedy

• Random = random task arrival

• Decreasing = task arrival in decreasing order of the number of requirements

• Balance big = large tasks to least loaded, small tasks according to greedy

• Prefer big = large tasks to least loaded, delayed assignment of up to a fixed number of small tasks

28

Retail dataset

29

Conclusion

• Studied two variants of non-standard load balancing problems

• Bi-criteria and min-max

• Approximation ratios for offline problems

• Hidden clustering recovery conditions for a simple greedy online task assignment strategy

• Open questions:

• Tighter approximation ratios for offline versions of both problems?

• Similar hidden cluster recovery questions (allowing for more memory)?