STRATEGIC NAMING: MULTI-THREADED ALGORITHM (Ch 27, Cormen et al.)
• Parallelization
• Four types of computing (Flynn's taxonomy):
– Instruction stream: single or multiple per clock cycle
– Data stream: single or multiple per clock cycle
• Single Instruction Single Data: Serial computing
• Single Instruction Multiple Data: Multiple processors, GPU
• Multiple Instruction Single Data: Shared memory
• MIMD: Cluster computing, multi-core CPU, multi-threaded, message-passing (IBM SP-x on hypercube; Intel single-chip Xeon Phi: http://spectrum.ieee.org/semiconductors/processors/what-intels-xeon-phi-coprocessor-means-for-the-future-of-supercomputing)
Grid Computing & Cloud
• Not necessarily parallel
• Primary focus is the utilization of CPU cycles across nodes
• Just networked CPUs, but middle-layer software makes node utilization transparent
• A major focus: avoid data transfer – run codes where the data are
• Another focus: load balancing
• Message-passing parallelization is possible: MPI, PVM, etc.
• Community-specific Grids: CERN, Bio-grid, Cardio-vascular grid, etc.
• Cloud: data-archiving focus, but really commercial versions of Grid; CPU utilization is under-sold but coming up: expect the service-oriented software business model to pick up
RAM Memory Utilization
• Two types feasible:
• Shared memory:
• Fast, possibly on-chip, no message passing time, no dependency on a ‘pipe’ and its possible failure
• But consistency must be explicitly controlled; that may cause deadlock, which requires a deadlock detection-and-breaking mechanism, adding overhead
• Distributed local memory:
• communication overhead
• ‘pipe’ failure possibility is a practical problem
• good model where threads are independent of each other
• most general model for parallelization
• easy to code, & well-established library (MPI)
• scaling up is easy – on-chip to over-the-globe
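The distributed-memory, message-passing model above can be sketched with Python's multiprocessing module, which mimics MPI-style explicit send/receive over a pipe. The worker function and the data it processes here are illustrative, not part of the original slides:

```python
# Minimal sketch of distributed-memory message passing: the worker process
# has its own local memory and communicates only through an explicit pipe,
# much like MPI send/recv.
from multiprocessing import Process, Pipe

def worker(conn):
    data = conn.recv()                       # receive a chunk of data
    conn.send(sum(x * x for x in data))      # compute locally, send result back
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send([1, 2, 3, 4])           # explicit communication step
    result = parent_conn.recv()              # blocks until the worker replies
    p.join()
    print(result)                            # -> 30
```

The blocking `recv` is where the 'pipe' failure risk and communication overhead noted above would show up in practice.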
Threading Types
• Two types feasible:
• Static threading: the OS controls the threads, typically on single-core CPUs (why would one do it? – the OS does), but multi-core CPUs can use it if the compiler guarantees safe execution
• Dynamic threading: Program controls explicitly, threads are created/destroyed as needed, parallel computing model
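Dynamic threading can be sketched in Python: the program itself creates threads as work appears and joins (destroys) them when the work is done. The task function and values are illustrative:

```python
# Dynamic threading sketch: the program, not the OS schedule, decides
# when threads are created and destroyed.
import threading

results = {}
lock = threading.Lock()

def work(i):
    with lock:                # guard the shared dict against concurrent writes
        results[i] = i * i

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()                 # threads created on demand
for t in threads:
    t.join()                  # threads destroyed once their work completes
print(results)                # -> {0: 0, 1: 1, 2: 4, 3: 9}
```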
Fibonacci Recursive (serial)
Fib(n)
1. If n <= 1 then return n;
else
2. x = Fib(n-1);
3. y = Fib(n-2);
4. return (x + y).
Complexity: O(φ^n), where φ is the golden ratio ≈ 1.618
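The serial pseudocode above translates directly to Python; the exponential O(φ^n) running time comes from the two overlapping recursive calls:

```python
def fib(n):
    # Direct transcription of the recursive pseudocode above.
    if n <= 1:
        return n
    x = fib(n - 1)
    y = fib(n - 2)
    return x + y

print(fib(10))  # -> 55
```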
Multi-threaded Fibonacci Recursive
Fib(n)
1. If n <= 1 then return n;
else
2. x = Spawn Fib(n-1);
3. y = Fib(n-2);
4. Sync;
5. return (x + y).
• Parallelization of threads is optional: a scheduler decides (programmer, script translator, compiler, OS)
• GPU-type parallelization's ideal time ~ critical-path length
• The more balanced the tree is, the shorter the critical path
• A Spawn, or data-collection, node is counted as one time unit; this is message passing
• Note, GPU/SIMD uses a different model: each thread does the same work (the kernel), and data go to shared memory
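The Spawn/Sync pseudocode above can be sketched with Python's concurrent.futures: "Spawn" maps to submitting Fib(n-1) to an executor while the current thread computes Fib(n-2), and "Sync" maps to waiting on the future. This is only a structural sketch – Python's GIL prevents true CPU-bound speedup here – and the helper names are my own:

```python
# Spawn/Sync sketch: submit one branch, compute the other, then sync.
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

def pfib(n, executor):
    if n <= 1:
        return n
    future = executor.submit(fib, n - 1)   # Spawn: Fib(n-1) runs concurrently
    y = fib(n - 2)                         # this thread computes Fib(n-2)
    x = future.result()                    # Sync: wait for the spawned branch
    return x + y

with ThreadPoolExecutor() as ex:
    print(pfib(10, ex))  # -> 55
```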
Terminologies/Concepts
• For P available processors: Tinf, TP, T1 : times with unlimited processors, with P processors, and with a single (serial) processor
• Ideal parallelization: TP = T1 / P
• Real situation: TP >= T1 / P
• Tinf is theoretical minimum feasible, so, TP >= Tinf
• Speedup factor = T1 / TP
• T1 / TP <= P
• Linear speedup: T1 / TP = O(P) [e.g. 3P +c]
• Perfect linear speedup: T1 / TP = P
• My preferred factor would be TP / T1 (inverse speedup: a slowdown factor?)
– linear O(P); quadratic O(P^2), …, exponential O(k^P), k > 1
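The definitions above amount to simple arithmetic; here is a worked example with hypothetical timings (all values invented for illustration):

```python
# Worked example of the speedup definitions, with hypothetical timings.
T1 = 120.0   # time on a single serial processor (hypothetical)
P = 8        # available processors
TP = 20.0    # measured time on P processors (hypothetical)

speedup = T1 / TP        # 6.0: six times faster than serial
assert speedup <= P      # T1/TP <= P always holds in this model
ideal_TP = T1 / P        # 15.0: the best TP that ideal parallelization allows
print(speedup, ideal_TP)
```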
Terminologies/Concepts (contd.)
• For P available processors: Tinf, TP, T1 : times with unlimited processors, with P processors, and with a single (serial) processor
• Parallelism factor: T1 / Tinf
– serial-time by ideal-parallelized-time
– note, this is about your algorithm,
• unoptimized over the actual configuration available to you
• T1 / Tinf < P implies NOT linear speedup
• T1 / Tinf << P implies processors are underutilized
• We want to be close to P: T1 / Tinf → P, in the limit
• Slackness factor: (T1 / Tinf) / P, i.e., T1 / (Tinf × P)
• We want slackness 1, the minimum feasible
– i.e., we want no slack
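Parallelism and slackness are likewise a short computation; the timings below are hypothetical, chosen only to illustrate the two ratios:

```python
# Parallelism and slackness for hypothetical timings.
T1 = 120.0    # total serial work (hypothetical)
Tinf = 10.0   # critical-path length: time with unlimited processors (hypothetical)
P = 8

parallelism = T1 / Tinf       # 12.0: the maximum useful speedup of this algorithm
slackness = parallelism / P   # 1.5: above 1, so the P processors can be kept busy
print(parallelism, slackness)
```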