Decomposition Data Decomposition – Dividing the data into subgroups and assigning each piece to...
Uploaded by: meredith-goodwin
![Page 1: Decomposition Data Decomposition – Dividing the data into subgroups and assigning each piece to different processors – Example: Embarrassingly parallel.](https://reader035.fdocuments.in/reader035/viewer/2022081603/5697bf871a28abf838c88a1d/html5/thumbnails/1.jpg)
Decomposition
• Data Decomposition
– Dividing the data into subgroups and assigning each piece to a different processor
– Example: Embarrassingly parallel applications
• Functional Decomposition
– Dividing an algorithm into its functional pieces and executing the pieces on separate processors
– Example: Pipelining
Pipelined Computations
• Divide a problem into a series of tasks
• A processor completes a task sequentially and pipes the results to the next processor
Example of Summing Groups of Numbers
[Figure: pipeline P0 → P1 → P2 → P3 → P4 → P5. A zero enters P0, stage Pk adds its group sum ∑A[ik], and the total leaves P5.]
Question: Is this data or is it functional decomposition?
Where is Pipelining Applicable?
Type 1
– More than one instance of a problem
– Example: Multiple simulations with different parameter settings
Type 2
– Series of data items with multiple operations
– Example: Signal filter or Sieve of Eratosthenes
Type 3
– Partial results passed on while processing continues
– Example: Solving sets of linear equations
Considerations
– Is there a series of sequential tasks?
– Is the processing time of each task approximately equal?
– Can items be grouped to minimize communication cost?
– If stages exceed processors
    o Group stages
    o Wrap the last stage back to the first
– Determine where the result will be at the end of the process
Summing Numbers Example
Process P0
    send(&number, P1);
Process Pi, where 0 < i < N-1
    recv(&sum, Pi-1);
    sum += number;
    send(&sum, Pi+1);
Process PN-1
    recv(&sum, PN-2);
    sum += number;
    Save or display result
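The three process roles above can be traced with a small sequential simulation. This is only a sketch: the function name `pipeline_sum` and the convention that stage i holds `numbers[i]` are assumptions made for illustration, and no real message passing takes place.

```python
def pipeline_sum(numbers):
    """Simulate the summing pipeline; stage i holds numbers[i].

    Returns the total kept by the last stage and the list of messages
    that travelled between neighbouring stages. Needs at least 2 stages.
    """
    if len(numbers) < 2:
        raise ValueError("the pipeline needs at least two stages")

    messages = []

    # P0: sends its own number to P1.
    message = numbers[0]
    messages.append(message)

    # Interior stages: receive the running sum, add the local number, send on.
    for number in numbers[1:-1]:
        message = message + number
        messages.append(message)

    # Last stage: receives the running sum, adds its number, keeps the result.
    total = message + numbers[-1]
    return total, messages


if __name__ == "__main__":
    total, messages = pipeline_sum([1, 2, 3, 4])
    print(total, messages)
```

Each entry in `messages` is the value one stage sends to the next, so the list has one entry fewer than there are stages.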
Application
• Remove frequencies from a signal
– Sequential algorithm: Fourier analysis, O(N lg N)
– Parallel: Apply filters to the signal with convolution, O(N*FilterLength)
– Filter examples: Chebyshev, Butterworth, etc.
– Derive a filter: Set the z-domain poles and zeroes, then perform the inverse transformation
– Filters can be used to manipulate signals, detect patterns, etc.
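To make the O(N*FilterLength) cost concrete, here is a minimal sketch of filtering by direct convolution. The function name `fir_filter` and the two-tap averaging coefficients are assumptions chosen for illustration; they are not a Chebyshev or Butterworth design.

```python
def fir_filter(signal, coefficients):
    """Filter a digitized signal by direct convolution.

    The outer loop runs N times and the inner loop FilterLength times,
    which gives the O(N * FilterLength) cost.
    """
    output = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coefficients):
            if n - k >= 0:  # samples before the start count as zero
                acc += c * signal[n - k]
        output.append(acc)
    return output


if __name__ == "__main__":
    # Hypothetical two-tap averaging filter applied to a short ramp.
    print(fir_filter([1, 2, 3, 4], [0.5, 0.5]))
```

Each output sample depends only on nearby input samples, so the samples can be streamed through one filter after another, which is what the pipeline does.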
Chebyshev Filter Design
[Figure: two panels, "Chebyshev in the z-domain" and "Chebyshev Frequency Response"]
Note: Depending on the placement of the poles (+) and zeroes (0), the filter will affect a signal differently
Type 1: Multiple Instances
Sequential execution: $t_1 = m\,t_m$
Parallel processing: $(m + p - 1)\,t_m/p$
Parallel communication: $(m + p - 1)(t_{start} + n\,t_{data})$
Speed-up: $S = \dfrac{m\,t_m}{(m + p - 1)(t_m/p + t_{start} + n\,t_{data})}$
[Figure: space-time diagram showing Instances 1 to 5 passing through processors P0 to P5 over time]
Notation
1. $m$ = instances, $p$ = processors
2. $t_{start}$ = latency, $t_{data}$ = transmission time per data item (the bandwidth term)
3. $n$ = data transmitted per instance
4. $t_m$ = total time to process an instance
5. Total pipeline cycles = $m + p - 1$
6. Assume: Equal processing per stage
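The speed-up expression can be evaluated directly. The sketch below simply codes the formulas from this slide; the function name and the sample values in the usage lines are assumptions for illustration.

```python
def pipeline_speedup(m, p, t_m, t_start, n, t_data):
    """Speed-up of a Type 1 pipeline with equal work per stage.

    m: instances, p: processors, t_m: time to process one instance,
    t_start: latency, n: data items sent per instance,
    t_data: transmission time per data item.
    """
    sequential = m * t_m
    cycles = m + p - 1                         # total pipeline cycles
    per_cycle = t_m / p + t_start + n * t_data  # compute + communicate
    return sequential / (cycles * per_cycle)


if __name__ == "__main__":
    # Hypothetical values: no communication cost, then a small one.
    print(pipeline_speedup(100, 5, 10.0, 0.0, 0, 0.0))
    print(pipeline_speedup(100, 5, 10.0, 0.1, 4, 0.05))
```

With zero communication cost the result approaches p as m grows, because the p - 1 cycles needed to fill the pipeline are amortized over more instances.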
Type 2: Multiple Data Elements
[Figure: the unfiltered signal samples d0, d1, …, d9 enter P0; stages P0 through P5 apply filters f0 through f5 in turn; the filtered signal leaves P5]
Example: Signal Filter
Each process removes one or more frequencies from a digitized signal
Type 2 Timing Diagram
Type 3: Partial Processing
• The next stage receives information to continue processing
• Additional processing continues at the source processor
Question: How do we determine speed-up?
[Figure: two space-time diagrams for P0 to P5, titled "Linear Equations" and "A More Balanced Load". Legend: idle, executing.]
Operation at each processor
Types 1 and 2
• Processor with rank r = 0
– Generate the instance (type 1) or the data (type 2) to process
– Process appropriately
– Send message to the processor with rank 1
• Processors with rank r = 1, 2, …, p-2
– Receive message from the processor with rank r-1
– Process appropriately
– Send message to the processor with rank r+1
• Processor with rank r = p-1
– Receive message from the processor with rank r-1
– Process appropriately
– Output final results
Examples
1) Adding numbers: n1 -> n1+n2 -> n1+n2+n3 -> . . .
2) Frequency removal: f(t) -> f0; f(t-f0) -> f1; f(t-f0-f1) -> . . .
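The receive, process, send pattern above can be sketched with one thread per stage and a queue between neighbours. This is an illustration under stated assumptions, not a message-passing implementation: the names `stage`, `run_pipeline`, and the `DONE` marker are hypothetical, and the caller plays the role of the generator feeding the first stage.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker


def stage(work, inbox, outbox):
    """One pipeline stage: receive, process, send, until DONE arrives."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # forward the marker so later stages stop too
            return
        outbox.put(work(item))


def run_pipeline(works, items):
    """Run items through the stage functions in `works`, in order."""
    queues = [queue.Queue() for _ in range(len(works) + 1)]
    threads = [
        threading.Thread(target=stage, args=(work, queues[k], queues[k + 1]))
        for k, work in enumerate(works)
    ]
    for t in threads:
        t.start()

    for item in items:        # the generator role: feed the first stage
        queues[0].put(item)
    queues[0].put(DONE)

    results = []
    while True:               # collect what the last stage outputs
        out = queues[-1].get()
        if out is DONE:
            break
        results.append(out)

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([lambda v: v + 1, lambda v: v * 2], [1, 2, 3]))
```

Because every queue is first-in, first-out and each stage is a single thread, results leave the pipeline in the order the items entered.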
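Parallel Pipeline Sort
[Figure: space-time table of a pipelined insertion sort. Columns: Step, Numbers (input not yet sent), P0 to P4. The input sequence 4, 3, 1, 2, 5 is fed into P0 one number at a time, starting from the right, over steps 1 to 10; each processor keeps the largest number it has seen and passes the others on.]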
• Pseudo code
    Receive xi
    IF xi < xmax
        Send xi
    ELSE
        Send xmax
        xmax = xi
Note: Processors can hold blocks of numbers for better efficiency
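The compare-and-forward rule in the pseudo code can be checked with a sequential simulation. This is a sketch only: the function name `pipeline_sort` is an assumption, each stage holds a single number rather than a block, and the stages run one after another instead of concurrently.

```python
def pipeline_sort(numbers):
    """Simulate the pipelined insertion sort, one number per stage.

    held[k] plays the role of xmax on stage k. Each incoming number
    visits the stages in order; a stage keeps the larger value and
    forwards the smaller one.
    """
    held = []
    for x in numbers:
        for k in range(len(held)):
            if x > held[k]:
                # Stage k keeps the new, larger number and sends on its old one.
                held[k], x = x, held[k]
        # The first stage that holds nothing yet keeps whatever arrives.
        held.append(x)
    return held


if __name__ == "__main__":
    # 5 enters the pipeline first, as in the slide's example input.
    print(pipeline_sort([5, 2, 1, 3, 4]))
```

The returned list is in stage order, so the first stage ends up holding the largest value.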
Bi-Directional Pipeline
Example: Sorting
• Use the pipeline to return results to the master
– Useful for line, ring, or hypercube topologies
[Figure: pipeline P0 to P5, and a space-time diagram for P0 to P4 showing the sorting phase followed by the gather phase]
• Phases: N generate steps + (N-1) propagate steps + (N-1) return steps = 3N-2 steps
• Sort phase
    If (myid == 0) generate number
    Else receive(&number, P[myid-1])
    If (number > max) {
        If (myid < P-1) send(max, P[myid+1]);
        max = number;
    } Else If (myid < P-1) send(number, P[myid+1]);
• Gather phase
    If (myid < P-1) receive sorted numbers from P[myid+1]
    If (myid > 0) send sorted numbers to P[myid-1]
Sieve of Eratosthenes
Prime Number Generation
Sieve of Eratosthenes (Type 2 pipeline)
• Concept
– Each processor filters blocks of non-primes from the flow of data
– The “potential” prime numbers pass through to the next processor
• Pseudo-code
    The master processor generates an array of n odd numbers
    In a loop, after receiving a group of numbers
        Filter the group of numbers; pass unfiltered numbers down the pipeline
    Gather all of the primes
• Notes
– Wrapping the pipeline in a ring could help maintain load balance
– A termination message signals when the pipeline has emptied
Question: What range of numbers should each processor get?
Implementation
Sequential code
    for (i = 2; i < n; i++)
        prime[i] = 1;
    for (i = 2; i <= sqrt_n; i++)
        if (prime[i] == 1)
            for (j = i + i; j < n; j = j + i)
                prime[j] = 0;
Sequential time: O(n^2)
Parallel code, processor pi > 0
    Recv(number, rank-1);
    PRIME = TRUE;
    FOR (int x = MIN; x < MAX; x += MIN)
        IF ((number % x) == 0) PRIME = FALSE and BREAK
    IF (PRIME) send(number, rank+1);
Termination
    recv(number, rank-1);
    send(number, rank+1);
    IF (number == terminator) break;
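The filtering idea can be simulated sequentially as follows. This sketch makes a simplifying assumption that differs from the slides: every stage owns exactly one prime instead of a block of divisors in a range MIN to MAX, and the candidates include 2 rather than odd numbers only. The function name `pipeline_primes` is hypothetical.

```python
def pipeline_primes(limit):
    """Simulate a sieve pipeline in which each stage owns one prime.

    A candidate travels down the stages; a stage drops it if its prime
    divides the candidate, otherwise it passes the candidate on. A
    candidate that survives every stage is prime and founds a new stage.
    """
    stage_primes = []  # stage_primes[k] is the prime owned by stage k
    for number in range(2, limit + 1):
        for p in stage_primes:
            if number % p == 0:
                break  # filtered out at this stage; not passed on
        else:
            stage_primes.append(number)
    return stage_primes


if __name__ == "__main__":
    print(pipeline_primes(30))
```

Early stages see every candidate while later stages see very few, which is the load-balance problem behind the question about what range each processor should get.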
Upper Triangular Matrix
All entries below the diagonal are zero
Useful for solving N equations in N unknowns
Solving Sets of Linear Equations
• Upper Triangular Form
$$a_{n-1,0}x_0 + a_{n-1,1}x_1 + \dots + a_{n-1,n-1}x_{n-1} = b_{n-1}$$
$$a_{n-2,0}x_0 + a_{n-2,1}x_1 + \dots + a_{n-2,n-2}x_{n-2} = b_{n-2}$$
$$\vdots$$
$$a_{1,0}x_0 + a_{1,1}x_1 = b_1$$
$$a_{0,0}x_0 = b_0$$
• Back Substitution
$$x_0 = b_0/a_{0,0}$$
$$x_1 = (b_1 - a_{1,0}x_0)/a_{1,1}$$
$$x_2 = (b_2 - a_{2,0}x_0 - a_{2,1}x_1)/a_{2,2}$$
• General solution for $x_i$
$$x_i = \Big(b_i - \sum_{j=0}^{i-1} a_{i,j}x_j\Big)/a_{i,i}$$
• Sequential code
    x[0] = b[0]/a[0][0];
    FOR (i = 1; i < n; i++) {
        sum = 0;
        FOR (j = 0; j < i; j++)
            sum += a[i][j]*x[j];
        x[i] = (b[i] - sum)/a[i][i];
    }
• Parallel code for pi where 1 <= i < n
    sum = 0;
    FOR (j = 0; j < i; j++) {
        receive(&x[j], P[i-1]);
        sum += a[i][j]*x[j];
        send(x[j], P[i+1]);
    }
    x[i] = (b[i] - sum)/a[i][i];
    send(x[i], P[i+1]);
• Parallel pseudo code (receive and forward first, then compute)
    for (j = 0; j < i; j++) {
        recv(x[j], P[i-1]);
        send(x[j], P[i+1]);
    }
    sum = 0;
    for (j = 0; j < i; j++)
        sum = sum + a[i][j]*x[j];
    x[i] = (b[i] - sum)/a[i][i];
    send(x[i], P[i+1]);
This is a type 3 pipeline example
Note: a[i][j] and b[i] are constants. The last processor has no successor, so it skips the sends.
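The substitution above can be simulated stage by stage. In this sketch, stage i owns row i of the system and keeps a running partial sum; as soon as a value of x is known it is folded into the partial sums of all later stages, which stands in for passing it down the pipeline. The function name `pipeline_back_substitution` and the sample system are assumptions for illustration, and the stages run sequentially here rather than in parallel.

```python
def pipeline_back_substitution(a, b):
    """Solve the triangular system with the indexing used in the slides.

    a[i][j] is used only for j <= i, and a[i][i] must be non-zero.
    """
    n = len(b)
    partial = [0.0] * n  # partial[k]: sum of a[k][j] * x[j] received so far
    x = []
    for i in range(n):
        # Stage i has already received x[0..i-1], so it can finish x[i].
        xi = (b[i] - partial[i]) / a[i][i]
        x.append(xi)
        # x[i] travels down the pipeline; each later stage accumulates it.
        for k in range(i + 1, n):
            partial[k] += a[k][i] * xi
    return x


if __name__ == "__main__":
    # Hypothetical 3x3 system: 2*x0 = 4, x0 + x1 = 5, x0 + 2*x1 + 4*x2 = 20.
    a = [[2.0, 0.0, 0.0],
         [1.0, 1.0, 0.0],
         [1.0, 2.0, 4.0]]
    b = [4.0, 5.0, 20.0]
    print(pipeline_back_substitution(a, b))
```

Stage i performs i multiply-accumulate steps, so later stages do more work than earlier ones.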
Pipeline Solution
DO
    IF p ≠ master, receive xj from the previous processor
    IF p ≠ P-1, send xj to the next processor
    back substitute xj
UNTIL xi evaluated
IF p ≠ P-1, send xi to the next processor
• Notes:
1. Processing continues after sending values down the pipeline
2. Is the load imbalanced?
Illustration of Type 3 Solution
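[Figure: P0 to P3 compute x0 to x3 in turn. Each computed value is passed down the pipeline, so P1 receives x0, P2 receives x0 and x1, and P3 receives x0, x1, and x2. A second space-time diagram shows P0 to P5 over time.]
How balanced is this load?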