Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic...
-
Upload
sarina-turney -
Category
Documents
-
view
213 -
download
0
Transcript of Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic...
![Page 1: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/1.jpg)
Parallel Algorithms
![Page 2: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/2.jpg)
Computation Models
• Goal of computation model is to provide a realistic representation of the costs of programming.
• Model provides algorithm designers and programmers a measure of algorithm complexity which helps them decide what is “good” (i.e. performance-efficient)
![Page 3: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/3.jpg)
Goal for Modeling
• We want to develop computational models which accurately represent the cost and performance of programs
• If model is poor, optimum in model may not coincide with optimum observed in practice
Model Real World
xA
B
optimum
optimum
Y
![Page 4: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/4.jpg)
Models of Computation
What’s a model good for??
• Provides a way to think about computers. Influences design of:
• Architectures
• Languages
• Algorithms
• Provides a way of estimating how well a program will perform.
Cost in model should be roughly same as cost of executing program
![Page 5: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/5.jpg)
The Random Access Machine Model
RAM model of serial computers:– Memory is a sequence of words, each
capable of containing an integer.
– Each memory access takes one unit of time
– Basic operations (add, multiply, compare) take one unit time.
– Instructions are not modifiable
– Read-only input tape, write-only output tape
![Page 6: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/6.jpg)
Has RAM influenced our thinking?
Language design:
No way to designate registers, cache, DRAM.
Most convenient disk access is as streams.
How do you express atomic read/modify/write?
Machine & system design:
It’s not very easy to modify code.
Systems pretend instructions are executed in-order.
Performance Analysis:
Primary measures are operations/sec (MFlop/sec, MHz, ...)
What’s the difference between Quicksort and Heapsort??
![Page 7: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/7.jpg)
What about parallel computers
• RAM model is generally considered a very successful “bridging model” between programmer and hardware.
• “Since RAM is so successful, let’s generalize it for parallel computers ...”
![Page 8: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/8.jpg)
PRAM [Parallel Random Access Machine]
PRAM composed of:– P processors, each with its own unmodifiable
program.
– A single shared memory composed of a sequence of words, each capable of containing an arbitrary integer.
– a read-only input tape.
– a write-only output tape.
PRAM model is a synchronous, MIMD, shared address space parallel computer.
(Introduced by Fortune and Wyllie, 1978)
![Page 9: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/9.jpg)
PRAM model of computation
• p processors, each with local memory
• Synchronous operation
• Shared memory reads and writes
• Each processor has unique id in range 1-p
Shared memory
![Page 10: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/10.jpg)
Characteristics• At each unit of time, a processor is either
active or idle (depending on id)
• All processors execute same program
• At each time step, all processors execute same instruction on different data (“data-parallel”)
• Focuses on concurrency only
![Page 11: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/11.jpg)
Variants of PRAM modelExclusive
WriteConcurrent
Write
ExclusiveRead
EREW ERCW
ConcurrentRead
CREW CRCW
![Page 12: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/12.jpg)
More PRAM taxonomy• Different protocols can be used for reading
and writing shared memory.– EREW - exclusive read, exclusive write
A program isn’t allowed to have two processors access the same memory location at the same time.
– CREW - concurrent read, exclusive write
– CRCW - concurrent read, concurrent write Needs protocol for arbitrating write conflicts
– CROW – concurrent read, owner writeEach memory location has an official “owner”
• PRAM can emulate a message-passing machine by partitioning memory into private memories.
![Page 13: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/13.jpg)
Sub-variants of CRCW• Common CRCW
– CW iff all processors writing same value
• Arbitrary CRCW– Arbitrary value of write set stored
• Priority CRCW– Value of min-index processor stored
• Combining CRCW
![Page 14: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/14.jpg)
Why study PRAM algorithms?• Well-developed body of literature on
design and analysis of such algorithms
• Baseline model of concurrency
• Explicit model– Specify operations at each step
– Scheduling of operations on processors
• Robust design paradigm
![Page 15: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/15.jpg)
Work-Time paradigm• Higher-level abstraction for PRAM algorithms
• WT algorithm = (finite) sequence of time steps with arbitrary number of operations at each step
• Two complexity measures– Step complexity T(n)
– Work complexity W(n)
WT algorithm work-efficient if W(n) = (TS(n))
optimal sequential Algorithm
![Page 16: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/16.jpg)
Designing PRAM algorithms• Balanced trees
• Pointer jumping
• Euler tours
• Divide and conquer
• Symmetry breaking
• . . .
![Page 17: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/17.jpg)
Balanced trees• Key idea: Build balanced binary tree on
input data, sweep tree up and down
• “Tree” not a data structure, often a control structure (e.g., recursion)
![Page 18: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/18.jpg)
Alg : Sum • Given: Sequence a of n = 2k elements
• Given: Binary associative operator +
• Compute: S = a1 + ... + an
![Page 19: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/19.jpg)
WT description of sum
integer B[1..n]forall i in 1 : n do B[i] := ai
enddofor h = 1 to k do forall i in 1 : n/2h do B[i] := B[2i-1] + B[2i] enddoenddoS := B[1]
![Page 20: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/20.jpg)
Points to note about WT pgm• Global program: no references to
processor id
• Contains both serial and concurrent operations
• Semantics of forall
• Order of additions different from sequential order: associativity critical
![Page 21: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/21.jpg)
Analysis of scan operation• Algorithm is correct
(lg n) steps, (n) work
• EREW model
• Two variants– Inclusive: as discussed
– Exclusive: s1 = I, sk = x1 + ... + xk-1
• If n not power of 2, pad to next power
![Page 22: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/22.jpg)
)(
1)(1 2
n
nnnW
k
hh
Complexity measures of Sum• Recall definitions of
step complexity T(n) and work complexity W(n)
• Concurrent execution reduces number of steps
)(lg11)( nknT
![Page 23: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/23.jpg)
How to do prefix sum ?• Input: Sequence x of n = 2k elements,
binary associative operator +
• Output: Sequence s of n = 2k elements, with sk = x1 + ... + xk
• Example:
x = [1, 4, 3, 5, 6, 7, 0, 1]
s = [1, 5, 8, 13, 19, 26, 26, 27]
![Page 24: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/24.jpg)
List Ranking
• List ranking problem– Given a singly linked list L with n objects, for each node,
compute the distance to the end of the list
• If d denotes the distance– node.d = 0 if node.next = nil
– node.next.d + 1 otherwise
• Serial algorithm: O(n)
• Parallel algorithm– Assign one processor for each node
– Assume there are as many processors as list objects
– For each node i, perform1. i.d = i.d + i.next.d
2. i.next = i.next.next // pointer jumping
{
![Page 25: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/25.jpg)
List Ranking - Pointer Jumping
• List_ranking(L)1. for each node i, in parallel do
2. if i.next = nil then i.d = 0
3. else i.d = 1
4. while exists a node i, such that i.next != nil do
5. for each node i, in parallel do
6. if i.next != nil then
7. i.d = i.d + i.next.d // i updates i itself
8. i.next = i.next.next
• Analysis– After a pointer jumping, a list is transformed into two (interleaved) lists
– After that, four (interleaved) lists
– Each pointer jumping doubles the number of lists and halves their length
– After log n, all lists contain only one node
– Total time: O(log n)
![Page 26: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/26.jpg)
List Ranking - Example
![Page 27: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/27.jpg)
List Ranking - Discussion
• Synchronization is important– In step 8 (i.next = i.next.next), all processors must read right hand side
before any processor write left hand side
• The list ranking algorithm is EREW– If we assume in step 7 (i.d = i.d + i.next.d) all processors read i.d and
then read i.next.d
– If j.next = i, i and j do not read i.d concurrently
• Work performance– performs O(n log n) work since n processors in O(log n) time
• Work efficient– A PRAM algorithm is work efficient w.r.t another algorithm if two
algorithms are within a constant factor
– Is the link ranking algorithm work-efficient w.r.t the serial algorithm?• No, because O(n log n) versus O(n)
• Speedup – S = n / log n
![Page 28: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/28.jpg)
Parallel Prefix on a List
• Prefix computation
– Input <x1, x2, .., xn>, a binary, associative operator
– Output <y1, y2, .., yn>
– Prefix computation: yk = x1 x2 .. xk
• Example
– if xk = 1 for k=1..n and = +
– Then yk = k, for k = 1..n
• Serial algorithm: O(n)
• Notation
– [i, j] = xi xi+1 .. xj
• [k, k] = xk • [i, k] [k+1, j] = [i, j]
• Idea: perform prefix computation on a linked list so that
– each node k contains [k, k] = xk initially
– finally each node k contains [1, k] = yk
![Page 29: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/29.jpg)
Parallel Prefix on a List (2)
• List_prefix(L, X)
// L: list, X: <x1, x2, .., xn>
1. for each node i, in parallel
2. i.y = xi
3. While exists a node i such that i.next != nil do
4. for each node i, in parallel do
5. if i.next != nil then
6. i.next.y = i.y i.next.y // i updates its successor
7. i.next = i.next.next
• Analysis– Initially k-th node has [k,k] as y-value, points to (k+1)-th node
– At the first iteration, • k-th node fetches [k+1,k+1] from its successor and
• perform [k,k] [k+1,k+1] resulting in [k,k+1] and • update its successor
– At the second iteration• k-th node fetches [k+1,k+2] from its successor and
• perform [k-1,k] [k+1,k+2] resulting in [k-1,k+2] and • update its successor
![Page 30: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/30.jpg)
Parallel Prefix on a List (3)
![Page 31: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/31.jpg)
Parallel Prefix on a List (4)
• Running time: O(log n)
– After log n, all lists contain only one node
• Work performed: O(n log n)
• Speedup
– S = n / log n
![Page 32: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/32.jpg)
Pointer jumping• Fast parallel processing of linked data
structures (lists, trees)
• Convention: Draw trees with edges directed from children to parents
• Example: Finding the roots of forest represented as parent array P– P[i] = j if and only if (i, j) is a forest edge
– P[i] = i if and only if i is a root
![Page 33: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/33.jpg)
Algorithm (Roots of forest)
forall i in 1:n do S[i] := P[i] while S[i] != S[S[i]] do S[i] := S[S[i]] endwhileenddo
![Page 34: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/34.jpg)
Initial state of forest
![Page 35: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/35.jpg)
After one iteration
![Page 36: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/36.jpg)
After another iteration
![Page 37: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/37.jpg)
Concurrent Read – Finding Roots
![Page 38: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/38.jpg)
Analysis of pointer jumping• Termination detection?
• At each step, tree distance between i and S[i] doubles unless S[i] is a root
• CREW model
• Correctness by induction on h
• O(lg h) steps, O(n lg h) work
• TS(n) = O(n)
• Not work-efficient unless h constant
![Page 39: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/39.jpg)
Concurrent Read – Finding Roots
• This is a CREW algorithm
• Suppose Exclusive-Read is used, what will be the running time?
– Initially only one node i has root information
– First iteration: Another node reads from the node i
• Totally two nodes are filled up
– Second iteration: Another two nodes can reads from the two nodes
• Totally four nodes are filled up
– k-th iteration: 2k-1 nodes are filled up
– If there are n nodes, k=log n
– So Find_root with Exclusive-Read takes O(log n).
• O(log log n) vs. O(log n)
![Page 40: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/40.jpg)
Euler tours• Technique for fast optimal processing of
tree data
• Euler circuit of directed graph: directed cycle that traverses each edge exactly once
• Represent (rooted) tree by Euler circuit of its directed version
![Page 41: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/41.jpg)
Trees = balanced parentheses
( ( ( ) ( ) ) ( ) ( ( ) ( ) ( ) ) )
Key property: The parenthesis subsequence corresponding to a subtree is balanced.
![Page 42: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/42.jpg)
Computing the Depth
• Problem definition– Given a binary tree with n nodes, compute the depth
of each node
• Serial algorithm takes O(n) time
• A simple parallel algorithm– Starting from root, compute the depths level by level
– Still O(n) because the height of the tree could be as high as n
• Euler tour algorithm– Uses parallel prefix computation
![Page 43: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/43.jpg)
Computing the Depth (2)• Euler tour: A cycle that traverses each edge exactly once in a
graph– It is a directed version of a tree
• Regard an undirected edge into two directed edges
– Any directed version of a tree has an Euler tour by traversing the tree• in a DFS way forming a linked list.
• Employ 3*n processors– Each node i has fields i.parent, i.left, i.right
– Each node i has three processors, i.A, i.B, and i.C.
• Three processors in each node of the tree are linked as follows– i.A = i.left.A if i.left != nil
– i.B if i.left = nil
– i.B = i.right.A if i.right != nil
– i.C if i.right = nil
– i.C = i.parent.B if i is the left child
– i.parent.C if i is the right child
– nil if i.parent = nil
{
{
{
![Page 44: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/44.jpg)
Computing the Depth (3)
• Algorithm– Construct the Euler tour for the tree – O(1) time
– Assign 1 to all A processors, 0 to B processors, -1 to C processors
– Perform a parallel prefix computation
– The depth of each node resides in its C processor
• O(log n)– Actually log 3n
• EREW because no concurrent read or write
• Speedup– S = n/log n
![Page 45: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/45.jpg)
Computing the Depth (4)
![Page 46: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/46.jpg)
Broadcasting on a PRAM
• “ Broadcast” can be done on CREW PRAM in O(1) steps:– Broadcaster sends value to shared memory
– Processors read from shared memory
• Requires lg(P) steps on EREW PRAM.
M
PPPPPPPP
B
![Page 47: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/47.jpg)
Concurrent Write – Finding Max
• Finding max problem– Given an array of n elements, find the maximum(s)
– sequential algorithm is O(n)
• Data structure for parallel algorithm– Array A[1..n]
– Array m[1..n]. m[i] is true if A[i] is the maximum
– Use n2 processors
• Fast_max(A, n)1. for i = 1 to n do, in parallel
2. m[i] = true // A[i] is potentially maximum
3. for i = 1 to n, j = 1 to n do, in parallel
4. if A[i] < A[j] then
5. m[i] = false
6. for i = 1 to n do, in parallel
7. if m[i] = true then max = A[i]
8. return max
• Time complexity: O(1)
![Page 48: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/48.jpg)
Concurrent Write – Finding Max
• Concurrent-write– In step 4 and 5, processors with A[i] < A[j] write the same value
‘false’ into the same location m[i]
– This actually implements m[i] = (A[i] A[1]) … (A[i] A[n])
• Is this work efficient? – No, n2 processors in O(1)
– O(n2) work vs. sequential algorithm is O(n)
• What is the time complexity for the Exclusive-write? – Initially elements “think” that they might be the maximum
– First iteration: For n/2 pairs, compare. • n/2 elements might be the maximum.
– Second iteration: n/4 elements might be the maximum.• log n th iteration: one element is the maximum.
– So Fast_max with Exclusive-write takes O(log n).
• O(1) (CRCW) vs. O(log n) (EREW)
![Page 49: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/49.jpg)
Simulating CRCW with EREW• CRCW algorithms are faster than EREW algorithms
– How much fast?
• Theorem
– A p-processor CRCW algorithm can be no more than O(log p) times faster than the best p-processor EREW algorithm
• Proof by simulating CRCW steps with EREW steps– Assumption: A parallel sorting takes O(log n) time with n processors
– When CRCW processor pi write a datum xi into a location li, EREW pi writes the pair (li, xi) into a separate location A[i]
• Note EREW write is exclusive, while CRCW may be concurrent
– Sort A by li
• O(log p) time by assumption
– Compare adjacent elements in A
– For each group of the same elements, only one processor, say first, write xi into the global memory li.
• Note this is also exclusive.
– Total time complexity: O(log p)
![Page 50: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/50.jpg)
Simulating CRCW with EREW (2)
![Page 51: Parallel Algorithms. Computation Models Goal of computation model is to provide a realistic representation of the costs of programming. Model provides.](https://reader036.fdocuments.in/reader036/viewer/2022081518/5518d09b550346a61f8b5c92/html5/thumbnails/51.jpg)
CRCW versus EREW - Discussion
• CRCW– Hardware implementations are expensive
– Used infrequently
– Easier to program, runs faster, more powerful.
– Implemented hardware is slower than that of EREW• In reality one cannot find maximum in O(1) time
• EREW– Programming model is too restrictive
• Cannot implement powerful algorithms