PDC Lesson2

  • 8/12/2019 PDC Lesson2

    1/13

    Parallel and Distributed Computing

    Dr. Haroon Mahmood

    Assistant Professor

    University of Central Punjab, Lahore

    Date: 04-04-2014

    Lecture Outline

      • Evaluation criteria
      • Overview of previous lecture
      • Amdahl's law
      • Multiprocessor system classification by Flynn's taxonomy

    Marks Distribution

      • Quizzes/Assignments: 15%
      • Presentation: 15%
      • Midterm: 20%
      • 50%

    Motivation for Parallel and Distributed Computing

    Uniprocessors are fast, but some problems:

      • require too much computation
      • use too much data
      • have too many parameters to explore

    Such problems call for parallel and distributed systems.

    Parallel and distributed systems

    Parallel and distributed computing is becoming more and more important:

      • Dual- and quad-core processors are very common
      • Up to six and eight cores per CPU
      • Multithreading is growing

    Hardware structure or architecture is important for understanding how much speedup is possible beyond a single CPU.

    The capability of compilers to generate efficient code is also very important.

    It is always difficult to distinguish between hardware and software influences.

    Amdahl's law (1)

    $$\text{Speedup} = \frac{1}{(1 - p) + \frac{p}{n}}$$

    p is the fraction of total execution time spent in parallelizable code, from 0 to 1

    n is the number of processors the code can use

    If there are 4 processors and only 10% of the code is parallelizable:

    $$\text{Speedup} = \frac{1}{0.9 + \frac{0.1}{4}} = \frac{1}{0.925} \approx 1.081$$

    That is a speedup of only about 8% with 4 processors!
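
The formula is easy to check numerically. The following is a minimal sketch, not part of the original slides; the function name `amdahl_speedup` is an assumption chosen for illustration, while `p` and `n` follow the definitions above.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup predicted by Amdahl's law.

    p: fraction of execution time that is parallelizable (0 to 1)
    n: number of processors the parallel part can use (>= 1)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    # The slide's example: 10% parallelizable code on 4 processors.
    print(f"{amdahl_speedup(0.1, 4):.3f}")  # prints 1.081

    # Adding processors helps less and less: the speedup can never
    # exceed 1 / (1 - p), however large n becomes.
    for n in (1, 4, 16, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With p = 0 the function returns 1 (no gain), and with p = 1 it returns n (perfect scaling), which matches the two limiting cases of the formula.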

    Flynn's Classical Taxonomy

    Most widely used classification of parallel computers

    Distinguishes multiprocessor computers according to two dimensions: instruction and data

      • SISD: single instruction stream, single data stream
      • SIMD: single instruction stream, multiple data streams
      • MISD: multiple instruction streams, single data stream
      • MIMD: multiple instruction streams, multiple data streams
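
The two dimensions can be expressed as a small lookup. This is only an illustrative sketch: the function `flynn_class` and its integer arguments are assumptions made here, not something defined in the lecture.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category for the given numbers of streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    # "S" for a single stream, "M" for more than one, in each dimension.
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    print(flynn_class(1, 1))  # prints SISD
    print(flynn_class(1, 8))  # prints SIMD
    print(flynn_class(4, 4))  # prints MIMD
```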

    Processor organizations

      • Single instruction, single data stream (SISD)
          • Uniprocessor
      • Single instruction, multiple data stream (SIMD)
          • Vector processor
          • Array processor
      • Multiple instruction, single data stream (MISD)
      • Multiple instruction, multiple data stream (MIMD)
          • Shared memory (tightly coupled)
              • Symmetric multiprocessor (SMP)
              • Nonuniform memory access (NUMA)
          • Distributed memory (loosely coupled)
              • Clusters

    SISD (1)

    A serial (non-parallel) computer

      • Single instruction: one instruction per cycle
      • Single data: only one data stream per cycle
      • Easy and deterministic execution

    Example: single-CPU workstations. Most workstations from HP, IBM and SGI are SISD machines.

    Load A

    Load B

    C = A + B

    Store C
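
The four-instruction sequence above can be traced with a toy interpreter. This is a sketch under stated assumptions: the tuple encoding of instructions and the name `run_sisd` are invented here for illustration, and one instruction is assumed to take exactly one cycle.

```python
def run_sisd(program, memory):
    """Execute instructions strictly in order, one per cycle.

    Returns the number of cycles used; results are written into `memory`.
    """
    registers = {}
    cycles = 0
    for op, *args in program:
        cycles += 1
        if op == "load":       # copy a memory cell into a register
            (name,) = args
            registers[name] = memory[name]
        elif op == "add":      # dest = src1 + src2
            dest, src1, src2 = args
            registers[dest] = registers[src1] + registers[src2]
        elif op == "store":    # write a register back to memory
            (name,) = args
            memory[name] = registers[name]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return cycles


if __name__ == "__main__":
    memory = {"A": 2, "B": 3}  # made-up input values
    program = [
        ("load", "A"),
        ("load", "B"),
        ("add", "C", "A", "B"),
        ("store", "C"),
    ]
    print(run_sisd(program, memory), memory["C"])  # prints 4 5
```

Because there is a single instruction stream acting on a single data stream, the same program with the same inputs always produces the same trace, which is the "deterministic execution" noted above.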

    SISD (2)

    Performance of a processor can be measured with:

    $$\text{MIPS rate} = f \times \text{IPC}$$

    where f is the clock frequency (in MHz) and IPC is the number of instructions completed per cycle.

    How to increase performance:

      • increasing the clock frequency
      • increasing the number of instructions completed during a processor cycle (multiple pipelines in a superscalar architecture and/or out-of-order execution)
      • multithreading
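
As a quick worked example of the MIPS formula, the sketch below compares raising the clock frequency with raising IPC. The function name `mips_rate` and all numeric values are made up for illustration; the clock frequency is assumed to be given in MHz so that the product is directly in millions of instructions per second.

```python
def mips_rate(clock_mhz: float, ipc: float) -> float:
    """MIPS rate = clock frequency (MHz) x instructions per cycle."""
    if clock_mhz <= 0 or ipc <= 0:
        raise ValueError("clock_mhz and ipc must be positive")
    return clock_mhz * ipc


if __name__ == "__main__":
    base = mips_rate(2000, 1.0)           # hypothetical 2 GHz core, IPC = 1
    faster_clock = mips_rate(3000, 1.0)   # raise the clock frequency
    superscalar = mips_rate(2000, 1.5)    # complete more instructions per cycle
    print(base, faster_clock, superscalar)  # prints 2000.0 3000.0 3000.0
```

In this made-up case a 50% higher clock and a 50% higher IPC give the same MIPS rate, which is why both appear in the list of ways to increase performance.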

    SISD (3) - Multithreading

    Implicit multithreading

      • Concurrent execution of multiple threads extracted from a single sequential program
      • Managed by processor hardware
      • Improves individual application performance

    Explicit multithreading

      • Concurrent execution of instructions from different explicit threads, either by interleaving instructions from different threads or by parallel execution on parallel pipelines

    SISD - Explicit Multithreading

    Four approaches for explicit multithreading:

      • Interleaved multithreading (fine-grained): switching can occur at each clock cycle. With few active threads, performance degrades
      • Blocked multithreading (coarse-grained): events like a cache miss trigger a switch
      • Simultaneous multithreading (SMT): execution units of a superscalar processor receive instructions from multiple threads
      • Chip multiprocessing: e.g. dual core (not SISD)

    Architectures like IA-64 Very Long Instruction Word (VLIW) allow multiple instructions (to be executed in parallel) in a single word
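
The difference between the interleaved and blocked approaches can be seen in the order in which instructions are issued. The sketch below is a simplified model, not a description of real hardware: it tracks issue order only (no stall timing), and the `"miss"` marker, the instruction names and both function names are assumptions made for illustration.

```python
from collections import deque


def interleaved(threads):
    """Fine-grained: switch to the next thread after every instruction."""
    ready = deque((tid, deque(instrs)) for tid, instrs in enumerate(threads))
    trace = []
    while ready:
        tid, queue = ready.popleft()
        if queue:
            trace.append((tid, queue.popleft()))
        if queue:  # thread still has work: put it at the back of the line
            ready.append((tid, queue))
    return trace


def blocked(threads):
    """Coarse-grained: stay on one thread until it issues a "miss"."""
    ready = deque((tid, deque(instrs)) for tid, instrs in enumerate(threads))
    trace = []
    while ready:
        tid, queue = ready.popleft()
        while queue:
            instr = queue.popleft()
            trace.append((tid, instr))
            if instr == "miss":  # cache-miss event triggers the switch
                break
        if queue:
            ready.append((tid, queue))
    return trace


if __name__ == "__main__":
    threads = [["a1", "miss", "a3"], ["b1", "b2"]]
    print(interleaved(threads))
    # [(0, 'a1'), (1, 'b1'), (0, 'miss'), (1, 'b2'), (0, 'a3')]
    print(blocked(threads))
    # [(0, 'a1'), (0, 'miss'), (1, 'b1'), (1, 'b2'), (0, 'a3')]
```

With a single thread both functions produce the same trace, which hints at why interleaving brings little benefit when few threads are active.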

    Intel's Hyper-Threading Technology

      • A single physical processor appears as two logical processors by applying a two-threaded SMT approach
      • Each logical processor maintains a complete set of architecture state (general-purpose registers, control registers, ...)
      • Logical processors share nearly all other resources, such as caches, execution units, branch predictors, control logic and buses
      • Partitioned resources are recombined when only one thread is active
      • Adds less than 5% to the relative chip size
      • Improves performance by 16% to 28%
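
From software, the logical processors are what the operating system exposes. As a small sketch (the helper name `logical_processors` is an assumption for illustration), Python's standard library reports the logical count; it does not report the number of physical cores, so the two-to-one ratio described above cannot be confirmed from this call alone.

```python
import os


def logical_processors() -> int:
    """Number of logical processors reported by the OS (at least 1)."""
    # os.cpu_count() may return None if the count cannot be determined.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"logical processors: {logical_processors()}")
```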