Parallel Algorithms - School of Computing · 2018-02-22 · Bucket sort Assume input is uniformly...

Parallel Algorithms PART 2


Last time …

Introduction to Parallel Algorithms

Complexity analysis

Work/Depth model

Prefix Sum, Parallel Select

Questions?


Parallel Select

Select numbers < pivot

𝐴 ← [1 2 3 0 4 0 2 3 0 1 3 4]

pivot ← 2

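t ← [1 0 0 1 0 1 0 0 1 1 0 0]   // flags: t[i] = 1 iff A[i] < pivot

s ← scan(t) = [1 1 1 2 2 3 3 3 4 5 5 5]   // inclusive prefix sum; s[i] - 1 is the output slot of a selected A[i]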


Parallel Select: the $a_i$ < pv

[l,m] ← select_lower (a, n, pv)

// t = t[0,…,n-1] holds the flags, s = s[0,…,n-1] their inclusive prefix sum

parfor (i=0; i<n; ++i) t[i] ← a[i] < pv;

s ← scan (t); m ← s[n-1];

parfor (i=0; i<n; ++i) if (t[i]) l[s[i] - 1] ← a[i];

$W(n) = O(n)$, $D(n) = O(\log n)$
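The select step can be sketched sequentially in Python. This is only an illustration of the flag / scan / scatter pattern: the two `parfor` loops are written as ordinary loops, `itertools.accumulate` stands in for a parallel scan, and the function name mirrors the slide's `select_lower`.

```python
from itertools import accumulate

def select_lower(a, pv):
    """Return (l, m): the elements of a that are < pv, in order, and their count."""
    # Step 1 (parfor on the slide): one flag per element.
    t = [1 if x < pv else 0 for x in a]
    # Step 2: inclusive prefix sum of the flags (a parallel scan in the model).
    s = list(accumulate(t))
    m = s[-1] if s else 0
    # Step 3 (parfor on the slide): every selected element writes to its own slot,
    # so the writes never conflict.
    l = [None] * m
    for i, x in enumerate(a):
        if t[i]:
            l[s[i] - 1] = x
    return l, m

A = [1, 2, 3, 0, 4, 0, 2, 3, 0, 1, 3, 4]
print(select_lower(A, 2))  # ([1, 0, 0, 0, 1], 5)
```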


Today …

Intro to Parallel Algorithms

Parallel Search

Parallel Sorting

Merge sort

Sample sort

Bitonic sort

Communication costs


Parallel Search

Problem Description

Given a sorted list 𝑋 of size 𝑛 and an element 𝑦

Find the index $i$ such that $x_i \le y < x_{i+1}$

Sequential

Use binary search

𝑂(log𝑛) time

Work depth

parfor(i) if $x_i \le y < x_{i+1}$ return i; // no duplicates

$W = n$, $D = 1$

PRAM

$O(\log n / \log p)$ using $p$ processes
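One way to reach the $O(\log n / \log p)$ bound is a $p$-ary search: in every round, $p$ probes split the remaining range into about $p + 1$ pieces. The sketch below simulates the rounds sequentially; the function name `pary_search` and the returned round counter are illustrative assumptions, not part of the slides.

```python
def pary_search(x, y, p):
    """Return (r, rounds) where r is the number of elements of sorted x that are <= y.

    Each round corresponds to one parallel step with p processes.
    """
    lo, hi = 0, len(x)          # the answer r always satisfies lo <= r <= hi
    rounds = 0
    while lo < hi:
        width = hi - lo
        # p probe positions spread evenly over x[lo:hi]
        probes = [lo + (k * width) // (p + 1) for k in range(1, p + 1)]
        # In the PRAM model all p comparisons happen in the same step.
        flags = [x[q] <= y for q in probes]
        for q, ok in zip(probes, flags):
            if ok:
                lo = max(lo, q + 1)   # everything up to q is <= y
            else:
                hi = min(hi, q)       # everything from q on is > y
        rounds += 1
    return lo, rounds

x = [7, 13, 25, 26, 31, 54]
print(pary_search(x, 27, 3)[0])  # 4
```

With `p = 1` this degenerates to ordinary binary search.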


Ranking

Given ordered lists, 𝐴, 𝐵 of lengths 𝑠, 𝑡

Define:

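$\operatorname{rank}(z : A)$ ← number of elements $a_i \in A$ such that $a_i \le z$

Define:

$\operatorname{rank}(B : A) := (r_1, r_2, \ldots, r_t)$, where $r_i$ ← $\operatorname{rank}(b_i : A)$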


Ranking

𝐴 = [7 13 25 26 31 54]

𝐵 = [1 8 13 27]

rank(𝐵 : 𝐴) = [0 1 2 4]

rank(𝐴 : 𝐵) = [1 3 3 3 4 4]

Use binary search

Consider a multithreaded vs Hadoop implementation
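A minimal Python sketch of ranking by binary search, using the standard library's `bisect_right` (which counts the elements that are ≤ the query). The helper names `rank` and `rank_list` are assumptions made for this example; each call in `rank_list` is independent, so the loop could be run as a parallel for.

```python
from bisect import bisect_right

def rank(z, a):
    """Number of elements a_i in the sorted list a with a_i <= z."""
    return bisect_right(a, z)

def rank_list(b, a):
    """rank(B : A): rank of every element of b in a (independent searches)."""
    return [rank(z, a) for z in b]

A = [7, 13, 25, 26, 31, 54]
B = [1, 8, 13, 27]
print(rank_list(B, A))  # [0, 1, 2, 4]
print(rank_list(A, B))  # [1, 3, 3, 3, 4, 4]
```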


Parallel Sort: MERGE SORT


Divide & Conquer Merge Sort

Divide 𝑋 into 𝑋1 and 𝑋2

Sort 𝑋1 and 𝑋2

Merge 𝑋1 and 𝑋2

Uses a Binary Tree

Bottom-up approach

Start with the leaves

Climb to the root

Merge the branches

Requires parallel Merge


example

-8, -7, -5, 3, 6, 12, 28, 51

-7, -5, 12, 51

-5, 12

12 -5

-7, 51

-7 51

-8, 3, 6, 28

6, 28

6 28

-8, 3

-8 3

Input


Merge sort

b = Merge_Sort(a,n)

if n < 100

return seqSort(a, n);

b1 = Merge_Sort(a[0,…,n/2-1], n/2);

b2 = Merge_Sort(a[n/2,…,n-1], n - n/2);

return Merge (b1, b2);
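The recursion above can be sketched in Python as follows. The two recursive calls are independent and could run in parallel; here they run one after the other. The `cutoff` parameter plays the role of the constant 100 on the slide, and `merge` is a plain sequential merge standing in for the parallel merge discussed later.

```python
def merge(x, y):
    """Sequential merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            out.append(x[i])
            i += 1
        else:
            out.append(y[j])
            j += 1
    return out + x[i:] + y[j:]

def merge_sort(a, cutoff=100):
    n = len(a)
    if n < cutoff:                 # small inputs: fall back to a sequential sort
        return sorted(a)
    mid = n // 2
    b1 = merge_sort(a[:mid], cutoff)   # these two calls are independent
    b2 = merge_sort(a[mid:], cutoff)   # and could be spawned in parallel
    return merge(b1, b2)

# The leaves of the example tree, with cutoff=2 so the recursion is exercised:
print(merge_sort([12, -5, -7, 51, 6, 28, -8, 3], cutoff=2))
# [-8, -7, -5, 3, 6, 12, 28, 51]
```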


Merge Sort - Complexity


Parallel Merge


Merging two lists of lengths 𝑛,𝑚

Problem description (𝑚 ≤ 𝑛)

Given $A = (a_1, a_2, \ldots, a_n)$ and $B = (b_1, b_2, \ldots, b_m)$

$a_i < a_{i+1} \ \forall i$

$b_i < b_{i+1} \ \forall i$

$A \cap B = \emptyset$

Build $C = (c_1, c_2, \ldots, c_{n+m})$

$c_i \in A \cup B$

$c_i < c_{i+1} \ \forall i$


Merging two sorted lists

Best Sequential Time: 𝑂(𝑛)

Parallel Merge:

Tradeoffs between

Depth-Optimal

Work-Optimal


Merging using Ranking

Assume elements in 𝐴 and 𝐵 are distinct

Let 𝐶 be the merged result. Given,

𝑥 ∈ 𝐶

$\operatorname{rank}(x : C) = i$

$c_i = x$

Property: $\operatorname{rank}(x : C) = \operatorname{rank}(x : A) + \operatorname{rank}(x : B)$

Solution to the merging problem,

Find $\operatorname{rank}(A : B)$ and $\operatorname{rank}(B : A)$

Parallel searches using $p = nm$: $D = O(1)$ but $W = O(n^2)$

Concurrent binary searches: $D = O(\log n)$ and $W = O(n \log n)$

Goal: Parallelize with optimal work

Recall that an algorithm is work optimal iff $W_p = W_{seq}$
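The rank property gives each element its final position directly. A minimal sketch of the "concurrent binary searches" variant, assuming (as the slides do) that all elements of A and B are distinct; the function name `merge_by_ranking` is made up for this example. With 0-based indexing, element `a[i]` has `i` smaller elements in A, so its slot is `i` plus its rank in B.

```python
from bisect import bisect_left

def merge_by_ranking(a, b):
    """Merge two sorted lists with distinct elements by computing output positions."""
    c = [None] * (len(a) + len(b))
    # Both loops are parfor candidates: every element writes to a distinct slot.
    for i, x in enumerate(a):
        c[i + bisect_left(b, x)] = x   # i elements of a and rank(x : b) elements of b precede x
    for j, y in enumerate(b):
        c[j + bisect_left(a, y)] = y
    return c

print(merge_by_ranking([7, 13, 25, 26, 31, 54], [1, 8, 14, 27]))
# [1, 7, 8, 13, 14, 25, 26, 27, 31, 54]
```

Each search costs $O(\log n)$, so the depth is logarithmic but the total work is $O(n \log n)$, which is why this version is not work optimal.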


Example


Work-optimal merge - Merge1

$A = (a_1, \ldots, a_n)$, $B = (b_1, \ldots, b_m)$, $n \ge m$

1. Partition $B$ into $m / \log m$ blocks

Size of each block: $\log m$

2. parallel for i = 1 : $m / \log m$

$R_i$ ← $\operatorname{rank}(b_{i \log m} : A)$ using sequential binary search

3. Partition $A$ accordingly

Block $A_i$ : $(a_{R_{i-1}+1}, \ldots, a_{R_i})$

4. Merge blocks $A_i$ and $B_i$ in $O(\log m)$ time using sequential merge

But if $|A_i| \gg |B_i| = \log m$, then recurse … Merge1($B_i$, $A_i$)

Work ?

Depth ?
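A sequential Python sketch of steps 1 to 4 of Merge1, under two stated simplifications: the block size is taken as $\lceil \log_2 m \rceil$ (at least 1), and the recursive case for oversized $A_i$ blocks is left out, so every block pair is merged sequentially. The names `merge1` and `seq_merge` are illustrative.

```python
from bisect import bisect_right
from math import ceil, log2

def seq_merge(x, y):
    """Plain sequential merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(x) and j < len(y):
        if x[i] <= y[j]:
            out.append(x[i])
            i += 1
        else:
            out.append(y[j])
            j += 1
    return out + x[i:] + y[j:]

def merge1(a, b):
    """Merge sorted lists a and b by partitioning b into blocks of about log m elements."""
    m = len(b)
    if m == 0:
        return list(a)
    k = max(1, ceil(log2(m)))                       # block size ~ log m
    b_blocks = [b[s:s + k] for s in range(0, m, k)] # step 1
    # Step 2: rank the last element of every block in a (independent binary searches).
    ranks = [bisect_right(a, blk[-1]) for blk in b_blocks]
    out, prev = [], 0
    # Steps 3 and 4: block pairs cover disjoint value ranges, so they can be
    # merged independently and concatenated in order.
    for blk, r in zip(b_blocks, ranks):
        out.extend(seq_merge(a[prev:r], blk))
        prev = r
    out.extend(a[prev:])                            # elements of a above the last b
    return out

print(merge1([7, 13, 25, 26, 31, 54], [1, 8, 14, 27]))
# [1, 7, 8, 13, 14, 25, 26, 27, 31, 54]
```

Because the recursion is omitted, a single $A_i$ block can still be long in this sketch; that is exactly the case the "recurse" remark on the slide addresses.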


Sequential Sorting

What is the complexity ?

𝒪(? )


Sequential Sorting

Comparison based

𝒪(𝑛 log 𝑛)

Can we sort faster than 𝒪(𝑛 log 𝑛) ?

Non-comparison based

𝒪(𝑛)


Bucket sort

Assume input is uniformly distributed over an interval [𝑎, 𝑏]

Divide interval into 𝑚 equal sized intervals (buckets)

Drop numbers into appropriate buckets

Sort each bucket (say using quicksort)

$\mathcal{O}\left(n \log \frac{n}{m}\right)$

For $m = \mathcal{O}(n)$: $\mathcal{O}(n)$ sorting

Radix sort

dense, uniform distribution
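A minimal bucket sort sketch in Python, assuming every input value lies in $[a, b]$ with $a < b$. The function name and parameter order are chosen for this example; Python's built-in `sorted` stands in for the per-bucket quicksort, and the per-bucket sorts are independent of each other.

```python
def bucket_sort(xs, a, b, m):
    """Sort values from the interval [a, b] using m equal-width buckets."""
    width = (b - a) / m
    buckets = [[] for _ in range(m)]
    for x in xs:
        # Bucket index; the value x == b is clamped into the last bucket.
        k = min(int((x - a) / width), m - 1)
        buckets[k].append(x)
    out = []
    for bucket in buckets:          # each bucket could be sorted by a different process
        out.extend(sorted(bucket))
    return out

print(bucket_sort([0.42, 0.07, 0.99, 0.5, 0.31], 0.0, 1.0, 4))
# [0.07, 0.31, 0.42, 0.5, 0.99]
```

If the input is far from uniform, most values land in a few buckets and the cost falls back toward that of the per-bucket sort.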

Parallel Quicksort

[Figure sequence over processes $p_1$, $p_2$, $p_3$, $p_4$]

parallel median selection

parallel exchange

Sample Sort

[Figure sequence over processes $p_1$, $p_2$, $p_3$, $p_4$, with an additional row labeled $p_1$ in later frames]

Pick splitters and broadcast

bucket data & all2all exchange

Sample sort

randomly partition input into blocks of $n/p$ points, one per process

sort locally

select 𝑝 splitters/processor (evenly)

guarantees no more than $2n/p$ elements / bucket (proof?)

gather(splitters) in $p_0$

sort splitters in $p_0$ and create buckets

block partition using $p$ binary searches on the locally sorted sequence of $n/p$ elements

Exchange data

sort
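The steps above can be simulated in a single Python process, with one list per "process". This is a sketch under assumptions: it takes $p - 1$ evenly spaced samples per process and picks the $p - 1$ final splitters evenly from the sorted sample, the shuffle seed is arbitrary, and the all-to-all exchange is just list concatenation. In a real distributed run the gather, broadcast, and exchange would be communication calls.

```python
import random
from bisect import bisect_right

def sample_sort(data, p, seed=0):
    """Simulate sample sort on p processes; return (sorted list, bucket sizes)."""
    items = list(data)
    if not items:
        return [], [0] * p
    random.Random(seed).shuffle(items)
    # Random partition (about n/p items each), then sort locally.
    local = [sorted(items[r::p]) for r in range(p)]
    # Every process contributes p - 1 evenly spaced samples.
    samples = []
    for blk in local:
        if blk:
            step = len(blk) / p
            samples.extend(blk[int(k * step)] for k in range(1, p))
    # "Gather" on one process, sort, and pick p - 1 splitters evenly.
    samples.sort()
    splitters = [samples[(k * len(samples)) // p] for k in range(1, p)]
    # Block partition: binary-search each splitter in each sorted local block,
    # then "exchange" so that bucket j receives the values in (s_{j-1}, s_j].
    buckets = [[] for _ in range(p)]
    for blk in local:
        cuts = [0] + [bisect_right(blk, s) for s in splitters] + [len(blk)]
        for j in range(p):
            buckets[j].extend(blk[cuts[j]:cuts[j + 1]])
    # Final local sort of every bucket.
    buckets = [sorted(bk) for bk in buckets]
    return [x for bk in buckets for x in bk], [len(bk) for bk in buckets]

out, sizes = sample_sort([(i * 37) % 101 for i in range(200)], p=4)
print(out == sorted(out), sum(sizes))  # True 200
```

Printing `sizes` for different inputs is a quick way to see how well the splitters balance the buckets.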


Sample Sort

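Sort locally: $\mathcal{O}\left(\frac{n}{p} \log \frac{n}{p}\right)$

Select $p - 1$ splitters per process: $\mathcal{O}(p)$

Gather splitters in $p_0$: $\mathcal{O}(p^2)$

Sort splitters in $p_0$: $\mathcal{O}(p^2 \log p)$

Broadcast splitters: $\mathcal{O}(p \log p)$

Sort again: $\mathcal{O}\left(\frac{n}{p} \log \frac{n}{p}\right)$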


Sample Sort – load balance

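Guarantees no more than $\frac{2n}{p}$ elements / bucket

Proof:

All entries on $p_i$ must be $> s_{i-1}$ and $\le s_i$

$(i - 2)p + \frac{p}{2}$ elements of the sample $\le s_i$

lower bound elements ($lb$) $= \left((i - 2)p + \frac{p}{2}\right)\frac{n}{p^2}$

$(p - i)p - \frac{p}{2}$ elements of the sample $> s_i$

upper bound elements ($ub$) $= \left((p - i)p - \frac{p}{2}\right)\frac{n}{p^2} + \frac{n}{p^2} - 1$

Maximum number of elements on processor $i$,

$n - ub - lb = \frac{2n}{p} - \frac{n}{p^2} + 1 \le \frac{2n}{p}$ ∎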
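The last step of the proof can be expanded term by term. The block below uses the two bounds stated in the proof, $lb = \left((i-2)p + \frac{p}{2}\right)\frac{n}{p^2}$ and $ub = \left((p-i)p - \frac{p}{2}\right)\frac{n}{p^2} + \frac{n}{p^2} - 1$; the final inequality additionally assumes $n \ge p^2$, which the slides do not state explicitly.

```latex
\begin{align*}
n - ub - lb
  &= n - \Big[\big((p-i)p - \tfrac{p}{2}\big) + \big((i-2)p + \tfrac{p}{2}\big)\Big]\frac{n}{p^2}
       - \frac{n}{p^2} + 1 \\
  &= n - \big(p^2 - 2p\big)\frac{n}{p^2} - \frac{n}{p^2} + 1 \\
  &= \frac{2n}{p} - \frac{n}{p^2} + 1 \\
  &\le \frac{2n}{p} \qquad \text{whenever } \frac{n}{p^2} \ge 1 .
\end{align*}
```

Note that the $i$-dependent terms and the $\pm\frac{p}{2}$ terms cancel, so the bound is the same for every processor.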