MODULE 1 - Universiti Teknologi Malaysia · 2015. 9. 10. · Computer Architecture Computer...

  • REFERENCE: DAVID A. PATTERSON & JOHN L. HENNESSY – COMPUTER ORGANIZATION AND DESIGN

    Module 1a: Organization & Architecture: Structure & Function

    MODULE 1 (Overview & Computer Performance)

  • Comp. Architecture vs Comp. Organization

    Computer Architecture: refers to the set of attributes of a system as seen by the programmer. Examples:

    • The instruction set

    • The number of bits used to represent data types

    • I/O mechanisms

    • Memory addressing techniques

    Computer Organization: deals with the physical components of a computer system that interact with each other to perform its various functions. Examples:

    • Control signals

    • Interfaces between the computer and peripherals

    • The memory technology being used

    The lower level of computer organization is known as microarchitecture, which is more detailed and concrete.

  • Comp. Architecture vs Comp. Organization

    The difference between architecture and organization is best described by a non-computer example:

    “Is the gear lever in a motorcycle part of its architecture or its organization?

    The architecture of a motorcycle is simple: it transports you from A to B. The gear lever belongs to the motorcycle's organization because it implements the function of a motorcycle but is not part of that function.”

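  • Comp. Architecture vs Comp. Organization

    Computer Architecture: refers to those attributes visible to the programmer.

    Example question: "Can we multiply 2 numbers?" Yes, we can multiply (the instruction set provides a multiply instruction).

    Computer Organization: refers to how those features are implemented.

    Example question: "How do we multiply?" (e.g. with a special multiply unit, or by making repeated use of the add unit)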
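A minimal Python sketch of the multiply question (the function names are hypothetical). The architectural question is whether a multiply operation exists; the organizational question is how it is carried out. Below, the same visible operation is implemented in two different ways, one by repeated addition and one by shift-and-add, restricted to non-negative integers for simplicity.

```python
def multiply_repeated_add(a: int, b: int) -> int:
    """Organization 1: reuse the adder, adding `a` to itself `b` times."""
    result = 0
    for _ in range(b):
        result += a
    return result


def multiply_shift_add(a: int, b: int) -> int:
    """Organization 2: shift-and-add, one step per bit of `b`."""
    result = 0
    while b > 0:
        if b & 1:      # lowest bit of the multiplier is set: add the shifted multiplicand
            result += a
        a <<= 1        # shift the multiplicand left (multiply by 2)
        b >>= 1        # move on to the next bit of the multiplier
    return result


# Architecture: the programmer only sees "multiply(a, b) gives a * b".
for multiply in (multiply_repeated_add, multiply_shift_add):
    print(multiply.__name__, multiply(6, 7))   # both print 42
```

Software written against the visible operation gets the same result from either function; only the number of steps, and therefore the speed, differs.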

  • The Computer Family

    Many computer manufacturers offer a family of computer models, all with the same architecture but with differences in organization.

    All members of the Intel x86 family share the same basic architecture.

    The IBM System/370 architecture, first introduced in 1970, included a number of models that share the same basic architecture, and it has survived to this day as the architecture of IBM’s mainframe product line.

    The newer models retained the same architecture so that the customer’s software investment was protected (code compatibility).

  • [Figure: register sets differ between models and are growing fast]

  • [Figure: same architecture, differences in organization]

  • Structure and Function

    A computer is a complex system: a hierarchy of interrelated subsystems at different levels.

    At each level, the designer is concerned with structure and function:

    Structure: The way in which the components are interrelated.

    Function: The operation of each individual component as part of the structure.

  • Structure

    4 main structural components:

    • Central processing unit (CPU): controls the operation of the computer and performs its data processing functions

    • Main memory: stores data

    • I/O: moves data between the computer and its external environment

    • System interconnection: mechanism for communication among the CPU, main memory, and I/O

  • Structure: CPU

    CPU components:

    • Control unit: controls the operation of the CPU

    • Arithmetic and logic unit (ALU): performs the computer’s data processing functions

    • Registers: provide storage internal to the CPU

    • CPU interconnection: mechanism for communication among the control unit, ALU, and registers

  • THE COMPUTER: TOP-LEVEL STRUCTURE

    [Figure: the computer (central processing unit, main memory, I/O, and systems interconnection) connected to peripherals and communication lines]

  • FUNCTION

    There are only four functions:

    • Data processing: process data in a variety of forms and for a variety of requirements

    • Data storage: short- and long-term storage of data for retrieval and update

    • Data movement: move data between the computer and the outside world

    • Control: control the processing, movement, and storage of data using instructions

    How are these functions performed? Through PROGRAMS.

  • Program

    A sequence of steps

    For each step, a computer function is executed

    For each operation, a different/new set of control signals is needed

    For each operation, a unique code (instruction) is provided, e.g. ADD, MOVE

    A hardware segment accepts the code and issues the control signals
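The idea that a hardware segment turns each instruction code into a set of control signals can be sketched as a lookup table. This is an illustration only: the signal names (`alu_op`, `reg_write`, `mem_read`, `mem_write`) and the `decode` function are hypothetical, and a real control unit drives many more signals.

```python
# Hypothetical control signals for a tiny control unit (illustration only).
CONTROL_SIGNALS = {
    "ADD":   {"alu_op": "add", "reg_write": True,  "mem_read": False, "mem_write": False},
    "MOVE":  {"alu_op": None,  "reg_write": True,  "mem_read": False, "mem_write": False},
    "LOAD":  {"alu_op": None,  "reg_write": True,  "mem_read": True,  "mem_write": False},
    "STORE": {"alu_op": None,  "reg_write": False, "mem_read": False, "mem_write": True},
}


def decode(opcode: str) -> dict:
    """Play the role of the hardware segment: map an instruction code to its control signals."""
    if opcode not in CONTROL_SIGNALS:
        raise ValueError(f"unknown instruction code: {opcode}")
    return CONTROL_SIGNALS[opcode]


# A program is a sequence of codes; each step selects a new set of control signals.
program = ["LOAD", "ADD", "MOVE", "STORE"]
for step, opcode in enumerate(program, start=1):
    print(step, opcode, decode(opcode))
```

To make the same hardware do a different job, only the list of codes changes; the decoder stays the same.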

  • Executing A Program

    Approach 1: Hardwired program

    Connect/combine various logic components to store data and perform arithmetic and logic operations

    Hardwired systems are inflexible: changing the task means re-wiring the hardware

  • Executing A Program

    Approach 2: Software

    General purpose hardware can do different tasks, given correct control signals

    Instead of re-wiring, supply a new set of control signals through instruction codes

  • REFERENCE: WILLIAM STALLINGS – COMPUTER ORGANIZATION & ARCHITECTURE (CH: PENTIUM EVOLUTION)

    Computer Evolution

  • Von Neumann Machine

    1945: the stored-program concept was first proposed, by John von Neumann, for the EDVAC (Electronic Discrete Variable Automatic Computer).

    Key concepts:

    Data and instructions are stored in a single read-write memory.

    The contents of this memory are addressable by location, without regard to the type of data contained there.

    Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next.
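The three key concepts can be illustrated with a toy stored-program machine. This is a sketch under stated assumptions: the instruction format `(opcode, address)`, the opcodes LOAD, ADD, STORE, HALT, and the single accumulator are hypothetical choices for illustration, not the EDVAC instruction set. Instructions and data share one list, `memory`; every cell is addressed only by its index; and the program counter advances sequentially.

```python
def run(memory: list) -> list:
    """Execute a toy stored-program machine until HALT; returns the (mutated) memory."""
    acc = 0   # accumulator register
    pc = 0    # program counter: address of the next instruction
    while True:
        opcode, address = memory[pc]   # fetch: instructions are read from the same memory as data
        pc += 1                        # sequential execution: advance to the next cell
        if opcode == "LOAD":           # acc <- memory[address]
            acc = memory[address]
        elif opcode == "ADD":          # acc <- acc + memory[address]
            acc += memory[address]
        elif opcode == "STORE":        # memory[address] <- acc
            memory[address] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r} at address {pc - 1}")


# Addresses 0-3 hold instructions and addresses 4-6 hold data, in one memory.
memory = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", 0),
    2,   # address 4: first operand
    3,   # address 5: second operand
    0,   # address 6: result
]
print(run(memory)[6])   # 5
```

Because STORE can write to any address, the same mechanism could even overwrite an instruction, a direct consequence of keeping code and data in a single read-write memory.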

  • [Figure: Structure of von Neumann machine]

  • Microprocessors (µP) Intel

    Microprocessor: all CPU components on a single chip

    1971 - 4004: first microprocessor, 4 bit

    Followed in 1972 by the 8008: 8 bit

    Both were designed for specific applications

    1974 - 8080: Intel’s first general-purpose microprocessor, designed to be the CPU of a general-purpose microcomputer

  • Intel µP Evolution ..

    8080: first general-purpose microprocessor; 8 bit data path; used in the first personal computer, the Altair

    8086: much more powerful; 16 bit; instruction cache (queue) that prefetches a few instructions

    8088 (8 bit external bus): used in the first IBM PC

    80286: 16 MB of addressable memory

    80386: Intel’s first 32 bit design; support for multitasking (running multiple programs at the same time)

  • .. Intel µP Evolution ..

    80486: more sophisticated and powerful cache and instruction pipelining; built-in math coprocessor

    Pentium: superscalar techniques, so that multiple instructions are executed in parallel

    Pentium Pro: increased superscalar organization, with aggressive register renaming, branch prediction, data flow analysis, and speculative execution

  • .. Intel µP Evolution

    Pentium II: MMX technology for graphics, video, and audio processing

    Pentium III: additional floating-point instructions for 3D graphics

    Pentium 4: further floating-point and multimedia enhancements

    Itanium: 64 bit

    Core Duo: the start of Intel’s multicore processors

  • [Figure: Intel Evolution]

  • REFERENCE: DAVID A. PATTERSON & JOHN L. HENNESSY – COMPUTER ORGANIZATION AND DESIGN

    Module 1b: Understanding & Measuring Performance

  • Introduction

    Hardware performance is often key to the effectiveness of an entire system of hardware and software.

    For different types of applications, different performance metrics may be appropriate, and different aspects of a computer system may be the most significant factor in determining overall performance.

    Understanding how best to measure performance, and the limitations of performance, is important when selecting a computer system.

    Understanding performance also helps answer questions such as: Why does a piece of software perform as it does?

    Why can one instruction set be implemented to perform better than another?

    How does a given hardware feature affect performance?

  • Why measure performance?

    Performance is important!

    Identify HW/SW performance problems

    Comparisons:

    Which machine is faster?

    Which ISA is better?

    Which implementation (of an ISA) is faster?

    Expose significant performance issues (enable us to ignore unimportant issues)

  • More than one way to measure performance

    Performance is evaluated differently by different entities:

    Better performance means faster processing speed (e.g. faster completion of a task/job)

    Better performance means higher throughput (doing more jobs in a given time)

    Better performance means doing more jobs at a smaller cost

  • Which plane has better performance?

    [Figure: comparison of several passenger planes]

    The answer depends on how performance is defined: higher throughput (transporting more passengers), higher speed, or a longer range. Each definition can favor a different plane.

  • Understanding terminology

    Execution time (a.k.a. response time): the total time it takes to complete a task, from start to completion

    Throughput: the total number of tasks completed in a given time interval

    CPU execution time (a.k.a. CPU time): the actual time the CPU spends on a specific task (user CPU time + system CPU time)

    User CPU time: time the CPU spends running the actual program

    System CPU time: time the CPU spends on OS overhead on behalf of the program

    Clock cycle (a.k.a. tick, cycle): a discrete time interval of the processor clock, which runs at a constant rate. Its length is usually given in nanoseconds (ns) or picoseconds (ps)
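The difference between execution (wall-clock) time and user/system CPU time can be observed with Python's standard library: `time.perf_counter()` measures elapsed time and `os.times()` reports the user and system CPU time of the current process. The `measure` helper and the `workload` below are hypothetical placeholders; actual numbers vary by machine and load.

```python
import os
import time


def measure(task) -> dict:
    """Run task() and report elapsed (wall-clock) time and the CPU time split."""
    start_wall = time.perf_counter()
    start_cpu = os.times()
    task()
    end_cpu = os.times()
    end_wall = time.perf_counter()
    return {
        # Execution (response) time: includes waiting, not just CPU work.
        "execution_time": end_wall - start_wall,
        # User CPU time: the CPU running the program's own code.
        "user_cpu_time": end_cpu.user - start_cpu.user,
        # System CPU time: the OS working on the program's behalf.
        "system_cpu_time": end_cpu.system - start_cpu.system,
    }


def workload():
    # Placeholder task: some arithmetic (uses the CPU) plus a short sleep (waiting).
    sum(i * i for i in range(1_000_000))
    time.sleep(0.1)


print(measure(workload))
```

Because `sleep` waits without using the CPU, the execution time is typically noticeably larger than user + system CPU time.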

  • Understanding terminology

    Clock period (a.k.a. clock cycle time): the duration of one clock cycle, in seconds (or ms, ns, ps)

    Clock rate (or frequency): the number of clock cycles per second, i.e. how often the clock ticks. Measured in MHz/GHz. Frequency = 1 / clock period

    1 MHz represents 1 million ($10^6$) cycles per second

    1 GHz represents 1 thousand million ($10^9$) cycles per second

    Clock cycles per instruction (CPI): the average number of clock cycles each instruction takes to execute

  • [Figures 1 and 2: processor clock signal; 1 cycle time = the length of one clock cycle]

  • Common performance metrics

    MB/s, Mb/s: Megabytes, Megabits Per Second

    MIPS: Millions of Instructions Per Second

    CPI: Clock Cycles Per Instruction

    IPC: Instructions Per Clock cycle

    Hz: (processor clock frequency) cycles Per Second

    LIPS: Logical Inferences Per Second

    FLOPS: Floating-Point arithmetic Operations Per Second
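Several of these metrics are simple ratios. A minimal sketch using only their definitions: MIPS = instruction count / (execution time × $10^6$), and, combined with CPU time = instruction count × CPI / clock rate, MIPS = clock rate / (CPI × $10^6$); IPC is the reciprocal of CPI. The function names and all input values are hypothetical.

```python
def mips(instruction_count: int, execution_time_s: float) -> float:
    """Millions of instructions per second."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_clock(clock_rate_hz: float, cpi: float) -> float:
    """Same metric from the clock: instructions per second = clock rate / CPI."""
    return clock_rate_hz / (cpi * 1e6)


def ipc(cpi: float) -> float:
    """Instructions per clock cycle is the reciprocal of CPI."""
    return 1 / cpi


# Hypothetical values: 2 billion instructions in 1 second,
# or a 4 GHz processor averaging 2 clock cycles per instruction.
print(mips(2_000_000_000, 1.0))    # 2000.0
print(mips_from_clock(4e9, 2.0))   # 2000.0
print(ipc(2.0))                    # 0.5
```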

  • Computer performance measures

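    Performance is related to execution time: $\text{Performance} = \dfrac{1}{\text{Execution time}}$

    To maximize performance, we want to minimize the execution time.

    If the performance of Computer A is 10 times better than that of Computer B, what is the relation between their execution times?

    $\dfrac{\text{Performance}_A}{\text{Performance}_B} = \dfrac{\text{Execution time}_B}{\text{Execution time}_A} = 10$

    This shows that Computer B needs 10 times more time than Computer A to execute a given task.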

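  • CPU Execution Time

    $\text{CPU execution time} = \text{CPU clock cycles} \times \text{Clock period} = \dfrac{\text{CPU clock cycles}}{\text{Clock rate}}$

    $\text{Clock period} = \dfrac{1}{\text{Frequency}}$

    If a processor has a frequency of 320 MHz:

    $\text{Clock period} = \dfrac{1}{320\,000\,000\ \text{Hz}} = 3.125\ \text{ns}$

    Clock rate = frequency (Hz): “the frequency at which a CPU is running. It is measured in Hz.”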
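The clock period and CPU time relations can be checked numerically. A small sketch with hypothetical function names: `clock_period_ns` applies clock period = 1 / frequency (scaled to nanoseconds), and `cpu_time_s` applies CPU execution time = CPU clock cycles / clock rate.

```python
def clock_period_ns(frequency_hz: float) -> float:
    """Clock period in nanoseconds: 1 / frequency, times 10^9 ns per second."""
    return 1e9 / frequency_hz


def cpu_time_s(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU execution time = clock cycles x clock period = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


print(clock_period_ns(320e6))   # 3.125 (the 320 MHz processor)
print(clock_period_ns(4e9))     # 0.25, i.e. a 4 GHz clock has a 250 ps period
print(cpu_time_s(40e9, 4e9))    # 10.0 seconds for 40 x 10^9 cycles at 4 GHz
```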

  • Example 1: Improving Performance

    Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. Computer B will run this program in 6 seconds, given that computer B requires 1.2 times as many clock cycles as computer A for this program. What is computer B’s clock rate? Answer: 8 GHz

    What do we know?

    Computer A

    CPU Execution Time = 10 s

    Clock rate (CR) = 4 GHz = $4 \times 10^9$ Hz

    Computer B

    CPU Execution Time = 6 s

    Clock cycles (CC) = 1.2 × clock cycles of Computer A

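  • Example 1: Improving Performance (solution)

    Clock cycles of Computer A: $\text{CC}_A = \text{CPU time}_A \times \text{CR}_A = 10\ \text{s} \times 4 \times 10^9\ \text{Hz} = 40 \times 10^9$ cycles

    Clock cycles of Computer B: $\text{CC}_B = 1.2 \times \text{CC}_A = 48 \times 10^9$ cycles

    Clock rate of Computer B: $\text{CR}_B = \dfrac{\text{CC}_B}{\text{CPU time}_B} = \dfrac{48 \times 10^9}{6\ \text{s}} = 8 \times 10^9\ \text{Hz} = 8\ \text{GHz}$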
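The reasoning of Example 1 can be packaged as a small function. This is a sketch; `required_clock_rate` and its parameter names are hypothetical. It computes the clock cycles the program needs on the original computer (time × clock rate), scales them by the cycle factor, and divides by the target time.

```python
def required_clock_rate(time_old_s: float, clock_rate_old_hz: float,
                        cycle_factor: float, time_new_s: float) -> float:
    """Clock rate the new computer needs to run the same program in time_new_s.

    cycle_factor: how many times more clock cycles the new design needs.
    """
    cycles_old = time_old_s * clock_rate_old_hz   # CPU clock cycles = time x clock rate
    cycles_new = cycle_factor * cycles_old        # the new design needs more cycles
    return cycles_new / time_new_s                # clock rate = cycles / time


# Example 1: A runs in 10 s at 4 GHz; B needs 1.2x the cycles and should take 6 s.
print(required_clock_rate(10, 4e9, 1.2, 6) / 1e9, "GHz")   # 8.0 GHz
```

The same function gives the target for Example 5: `required_clock_rate(20, 8e9, 1.5, 5)` returns $4.8 \times 10^{10}$ Hz, i.e. 48 GHz.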

  • Clock Cycles per Instruction (CPI)

    Previously, our calculations of Execution time did not include the number of instructions needed for the program.

    Different instructions may take different amounts of time to execute, depending on what they do.

    Example: The MOV (Move) instruction – moving data from one place to another

  • The MOV instruction : Analogy

    Analyze Conrad’s movements as he puts the red balls into the container.

    Balls from prime storage: walk, fetch ball, walk (halfway), walk, put ball in container. Total = 5 movements

    Balls from sub storage: fetch ball, walk, put ball in container. Total = 3 movements

    Doing 5 movements takes longer than doing 3. In the same way, an instruction that needs more steps takes more clock cycles to execute.

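  • CPU Execution Time

    $\text{CPU execution time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} = \dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$

    where $\text{Instruction count} \times \text{CPI}$ = CPU clock cycles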

  • Example 2: Using Performance Equation

    Suppose we have two implementations of the same instruction set architecture (ISA), both running the same program. Which computer is faster, and by how much?

    Computer A: clock cycle time = 250 ps and CPI = 2.0

    Computer B: clock cycle time = 500 ps and CPI = 1.2

    Note: because both computers run the same program and the instruction count is not given, we can represent it by a variable I.

    Remember the formula: $\text{CPU time} = I \times \text{CPI} \times \text{Clock cycle time}$

  • Example 2: Using Performance Equation

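    $\text{CPU time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps}$

    $\text{CPU time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps}$

    Remember: the lower the execution time, the better the performance.

    Computer A is faster.

    How much faster is Computer A?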

  • Example 2 (continued)…

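    $\dfrac{\text{Performance}_A}{\text{Performance}_B} = \dfrac{\text{CPU time}_B}{\text{CPU time}_A} = \dfrac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$

    We can conclude that A is 1.2 times faster than B for this program.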
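The Example 2 calculation as a Python sketch. `cpu_time` is a hypothetical helper that applies CPU time = instruction count × CPI × clock cycle time, with times in picoseconds. Since the instruction count I cancels in the ratio, any value can be used for it.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_ps: float) -> float:
    """CPU time (ps) = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


instruction_count = 1   # I cancels out in the ratio, so any value works
time_a = cpu_time(instruction_count, cpi=2.0, cycle_time_ps=250)   # 500 ps per instruction
time_b = cpu_time(instruction_count, cpi=1.2, cycle_time_ps=500)   # 600 ps per instruction

# Performance = 1 / execution time, so Perf_A / Perf_B = time_B / time_A.
print(f"A is {time_b / time_a:.1f} times faster than B")   # A is 1.2 times faster than B
```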

  • Measuring the CPI

    Sometimes it is possible to compute the CPU clock cycles by looking at the different types of instructions and using their individual clock cycle counts:

    $\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$

    $C_i$ = count of the number of instructions of class i executed

    $\text{CPI}_i$ = average number of cycles per instruction for that instruction class

    n = number of instruction classes

    Remember that the overall CPI for a program depends on both the number of cycles for each instruction type and the frequency of each instruction type in the program execution.

  • Sample: Calculate CPI

    You are on the design team for a new processor. The clock of the processor runs at 200 MHz. The following table gives instruction frequencies for Benchmark B, as well as how many cycles the instructions take, for the different classes of instructions. For this problem, we assume that (unlike many of today's computers) the processor only executes one instruction at a time.

    Instruction Type          Frequency   Cycles
    Loads & Stores            30%         6 cycles
    Arithmetic Instructions   50%         4 cycles
    All Others                20%         3 cycles

    If we say that there are 100 instructions, then:

    30 of them will be loads and stores.

    50 of them will be arithmetic instructions.

    20 of them will be all others.

    Total cycles: (30 × 6) + (50 × 4) + (20 × 3) = 440 cycles

    CPI = 440 cycles / 100 instructions = 4.4 cycles per instruction
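The same weighted average can be computed directly from the instruction mix. A minimal sketch: the dictionary keys and the `average_cpi` name are hypothetical, and the frequencies and cycle counts are those of Benchmark B. Using fractions instead of "100 instructions" gives the same result, because the instruction count cancels.

```python
# Instruction mix for Benchmark B: class -> (fraction of instructions, cycles per instruction)
benchmark_b = {
    "loads_stores": (0.30, 6),
    "arithmetic":   (0.50, 4),
    "others":       (0.20, 3),
}


def average_cpi(instruction_mix: dict) -> float:
    """Overall CPI = sum over classes of (fraction of instructions in class i) x CPI_i."""
    return sum(fraction * cycles for fraction, cycles in instruction_mix.values())


print(round(average_cpi(benchmark_b), 2))   # 4.4
```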

  • Factors Affecting the CPU Performance

  • Example 3 : Comparing Code Segments

    A compiler designer is trying to decide between two code sequences for a particular computer. The hardware designers have supplied the following facts:

    Instruction class    A   B   C
    CPI for this class   1   2   3

    For a particular high-level-language statement, the compiler writer is considering two code sequences that require the following instruction counts:

    Code sequence   Instruction counts for each class
                    A   B   C
    1               2   1   2
    2               4   1   1

    a) Which code sequence executes the most instructions?

    b) Which will be faster?

    c) What is the CPI for each sequence?

  • Example 3 : Part (a)

    Sequence 1 executes 2 + 1 + 2 = 5 instructions

    Sequence 2 executes 4 + 1 + 1 = 6 instructions

    Seq 2 executes THE MOST instructions

  • Example 3 : Part (b)

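    Using $\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$:

    Seq 1: (2 × 1) + (1 × 2) + (2 × 3) = 10 cycles to execute 5 instructions

    Seq 2: (4 × 1) + (1 × 2) + (1 × 3) = 9 cycles to execute 6 instructions

    Seq 2 is FASTER, even though it executes more instructions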

  • Example 3 : Part (c)

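    $\text{CPI}_1 = \dfrac{10}{5} = 2.0$ and $\text{CPI}_2 = \dfrac{9}{6} = 1.5$

    Code SEQ2 executes more instructions but uses fewer clock cycles, so it must have a lower CPI.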
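A sketch that recomputes all three parts of Example 3, assuming class CPIs of 1, 2, and 3 for classes A, B, and C (the values that reproduce the 10-cycle and 9-cycle totals). `CPI_BY_CLASS` and `analyse` are hypothetical names; the cycle count uses CPU clock cycles = Σ (CPI_i × C_i).

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}   # assumed CPI for each instruction class


def analyse(counts: dict) -> tuple:
    """Return (instructions, clock cycles, CPI) for one code sequence."""
    instructions = sum(counts.values())
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())   # sum of CPI_i x C_i
    return instructions, cycles, cycles / instructions


seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}
print(analyse(seq1))   # (5, 10, 2.0)
print(analyse(seq2))   # (6, 9, 1.5)
```

On the same computer (same clock rate), fewer clock cycles means less CPU time, so the sequence with the smaller cycle count is the faster one, regardless of how many instructions it executes.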

  • Example 4 : Comparing Code Segments

    A processor has 3 classes of instructions. Which code sequence is faster?

    Instruction   CPI   Code SEQ1   Code SEQ2   Clock cycles SEQ1   Clock cycles SEQ2
    A             1     5           3           5                   3
    B             2     3           2           6                   4
    C             5     1           2           5                   10
    Total               9 ins.      7 ins.      16 cycles           17 cycles

    Code SEQ1 takes 16 cycles to execute 9 instructions.

    Code SEQ2 takes 17 cycles to execute 7 instructions.

    Code SEQ1 is FASTER, even though it executes more instructions.

    Recall: $\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$

  • Example 4a: Calculating with CPI

    The ADD instruction takes 1 clock cycle to execute, while the MUL instruction takes 3 clock cycles. If a program consists of 20 ADD and 10 MUL instructions, what is the average CPI?

    What do we know?

    Instruction   Clock cycles   Instruction count
    ADD           1              20
    MUL           3              10

    There are 2 instruction classes (30 instructions in total).

  • Example 4a: Calculating average CPI

    Instruction   Clock cycles   Instruction count
    ADD           1              20
    MUL           3              10

    $\text{Average CPI} = \dfrac{(20 \times 1) + (10 \times 3)}{20 + 10} = \dfrac{50}{30} \approx 1.67$

  • Homework (to make you cleverer)

    CPU X runs a program/code sequence Y, which consists of 100 instructions. Calculate and fill in the table below: a) the CPI for each instruction class; b) the execution time for each instruction class, given a clock cycle time of 0.25 milliseconds. Then calculate c) CPU X’s execution time and d) CPU X’s clock rate.

    Instruction   Instruction count   Clock cycles   a) CPI   b) Execution time
    A             20                  3
    B             25                  1
    C             10                  2
    D             30                  2
    E             10                  3
    F             5                   4

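  • Did you get the same???

    Clock cycle time = 0.25 ms

    Instruction   Instruction count   Clock cycles   a) CPI   b) Execution time (ms)
    A             20                  3              0.15     0.75
    B             25                  1              0.04     0.25
    C             10                  2              0.2      0.5
    D             30                  2              0.07     0.5
    E             10                  3              0.3      0.75
    F             5                   4              0.8      1
    Total         100                 15                      3.75

    (a) CPI = clock cycles / instruction count, e.g. A: 3 / 20 = 0.15

    (b) Execution time = clock cycles × clock cycle time, e.g. A: 3 × 0.25 ms = 0.75 ms

    (c) CPU X’s execution time = ∑ execution time = 3.75 ms

    (d) Clock rate = ∑ clock cycles / ∑ execution time = 15 cycles / 3.75 ms = 4 cycles per ms = 4 kHz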
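The whole homework table can be reproduced with a short loop. A minimal sketch (the variable names are hypothetical), reading the "Clock cycles" column as the total clock cycles consumed by each instruction class, which is what the answers above imply: CPI = clock cycles / instruction count, and execution time = clock cycles × clock cycle time.

```python
CLOCK_CYCLE_TIME_MS = 0.25

# class -> (instruction count, clock cycles for that class)
table = {
    "A": (20, 3), "B": (25, 1), "C": (10, 2),
    "D": (30, 2), "E": (10, 3), "F": (5, 4),
}

total_cycles = 0
total_time_ms = 0.0
for cls, (count, cycles) in table.items():
    cpi = cycles / count                     # (a) CPI = cycles / instructions
    time_ms = cycles * CLOCK_CYCLE_TIME_MS   # (b) execution time = cycles x cycle time
    total_cycles += cycles
    total_time_ms += time_ms
    print(f"{cls}: CPI = {cpi:.2f}, execution time = {time_ms} ms")

clock_rate_hz = total_cycles / (total_time_ms / 1000)   # (d) cycles per second
print(f"(c) execution time = {total_time_ms} ms")       # 3.75 ms
print(f"(d) clock rate = {clock_rate_hz:.0f} Hz")       # 4000 Hz
```

A clock rate of 4000 Hz (4 kHz) is simply the reciprocal of the 0.25 ms clock cycle time, which is a useful sanity check on part (d).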

  • Increasing the CPU Performance

    Decreasing the clock cycle time

    Datapath organization leading to lower CPI

    Reduction in the number of executed instructions.

  • Example 5: Improve Performance

    Our favourite program runs in 20 seconds on Computer P, which has an 8 GHz clock. We are trying to help a computer designer build Computer Q, which will run this program in 5 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this will affect the rest of the CPU design, causing Computer Q to require 1.5 times as many clock cycles as Computer P for this program. What clock rate should we tell the designer to target?

    What do we know?

    Computer P

    CPU Execution Time = 20 s

    Clock rate (CR) = 8 GHz = $8 \times 10^9$ Hz

    Computer Q

    CPU Execution Time = 5 s

    Clock cycles (CC) = 1.5 × clock cycles of Computer P

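  • Example 5: Improve Performance (solution)

    Clock cycles of Computer P: $\text{CC}_P = \text{CPU time}_P \times \text{CR}_P = 20\ \text{s} \times 8 \times 10^9\ \text{Hz} = 160 \times 10^9$ cycles

    Clock cycles of Computer Q: $\text{CC}_Q = 1.5 \times \text{CC}_P = 240 \times 10^9$ cycles

    Target clock rate: $\text{CR}_Q = \dfrac{\text{CC}_Q}{\text{CPU time}_Q} = \dfrac{240 \times 10^9}{5\ \text{s}} = 48 \times 10^9\ \text{Hz} = 48\ \text{GHz}$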

  • Mandatory Homework

    Do Tutorial Module 1 (e-learning). It is COMPULSORY for my class!

    Submission date will be announced.

  • Understanding the Units

    CPU execution time for a program = seconds for the program (S/P)

    CPU clock cycles = clock cycles per program (C/P)

    Clock cycle time = seconds per clock cycle (S/C)

    Clock rate = clock cycles per second (C/S)

    Instruction count = instructions executed for the program (I/P)

    Clock cycles per instruction (CPI) = average number of clock cycles per instruction (C/I)

  • Understanding the Units

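    The units cancel each other out to give the unit of the result:

    $\dfrac{\text{Seconds}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Clock cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Clock cycle}}$

    Example:

    10 s = 20 cycles / clock rate

    Clock rate = 20 cycles / 10 s = 2 cycles per second = 2 Hz

    1 Hz is 1 cycle per second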