8_3501_07_2


    CSC3501 S07, Louisiana State University: 8 - Single Cycle Datapath

    Designing a MIPS Processor (Single-Cycle)

    Dr. Arjan Durresi, Louisiana State University

    Baton Rouge, LA [email protected]

    These slides are available at: http://www.csc.lsu.edu/~durresi/CSC3501_07/

    Overview

    Datapath

    Control Unit

    Problems with single cycle Datapath

    Performance

    Instruction count

    Clock cycle time

    Clock cycles per instruction (CPI)

    Implementation of Processor


    We're ready to look at an implementation of the MIPS

    Simplified to contain only:

    memory-reference instructions: lw, sw

    arithmetic-logical instructions: add, sub, and, or, slt

    control flow instructions: beq, j

    Generic Implementation:

    use the program counter (PC) to supply instruction address

    get the instruction from memory

    read registers

    use the instruction to decide exactly what to do

    All instructions (except jump) use the ALU after reading the registers. Why? memory-reference? arithmetic? control flow?

    The Processor: Datapath & Control
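The "Why?" question above can be answered by writing down what the ALU is asked to do for each instruction class. The following is a minimal sketch, not taken from the slides; the helper name `alu_role` and the wording of the purposes are assumptions made for illustration.

```python
# Illustrative sketch only; `alu_role` is a hypothetical helper, not part of the slides.
def alu_role(mnemonic: str):
    """Return (ALU operation, purpose) for one instruction of the simplified MIPS subset."""
    if mnemonic in ("lw", "sw"):
        # Memory-reference: the ALU forms the effective address.
        return ("add", "base register + sign-extended offset = memory address")
    if mnemonic in ("add", "sub", "and", "or", "slt"):
        # Arithmetic-logical: the ALU performs the named operation.
        return (mnemonic, "operate on the two register operands")
    if mnemonic == "beq":
        # Control flow: the ALU subtracts and its Zero output decides the branch.
        return ("sub", "compare the two register operands")
    if mnemonic == "j":
        # Jump is the exception: it replaces the PC without using the ALU.
        return (None, "no ALU operation")
    raise ValueError(f"not in the simplified subset: {mnemonic}")


for m in ("lw", "add", "beq", "j"):
    print(m, alu_role(m))
```

Every class that reads registers then hands at least one register value to the ALU, which is why the ALU sits directly after the register file in the datapath.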


    Abstract / Simplified View:

    Two types of functional units:

    elements that operate on data values (combinational)

    elements that contain state (sequential)

    More Implementation Details

    [Figure: abstract view of the datapath with PC, instruction memory, registers (three register-number inputs), ALU, data memory, and two adders, one of which adds 4 to the PC.]

    Building the Datapath

    Our Implementation

    An edge-triggered methodology

    Typical execution:

    read contents of some state elements at the beginning of the clock cycle,

    send values through some combinational logic,

    write results to one or more state elements at the end of the clock cycle.

    An edge-triggered methodology allows a state element to be read and written in the same clock cycle without creating a race that could lead to indeterminate data.

    [Figure: state element 1 feeds combinational logic, which feeds state element 2, all within one clock cycle.]
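The read, compute, write pattern of the edge-triggered methodology can be imitated in software. This is a minimal sketch under stated assumptions: state elements are modelled as a dictionary, and the names `clock_edge` and `logic` are hypothetical. The point it illustrates is that every read sees the old values, and all writes commit together at the clock edge.

```python
# Illustrative model of edge-triggered state elements (names are assumptions).
def clock_edge(state, combinational_logic):
    """Advance one clock cycle: read old state, compute, then commit all writes."""
    next_state = combinational_logic(dict(state))  # reads see only pre-edge values
    state.update(next_state)                       # writes take effect at the edge
    return state


def logic(s):
    # The same elements are both read and written in this cycle:
    # pc is incremented, and a and b exchange their contents.
    return {"pc": s["pc"] + 4, "a": s["b"], "b": s["a"]}


state = {"pc": 0, "a": 1, "b": 2}
clock_edge(state, logic)
print(state)
```

Because `a` and `b` are computed from a snapshot taken before any write, the swap works without a temporary, which is the software analogue of avoiding the race described above.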

    Single Cycle Design

    We shall first design a simpler processor that executes each instruction in only one clock cycle.

    This is not efficient from a performance point of view, since:

    the clock cycle time (i.e. the clock rate) must be chosen such that the longest instruction can be executed in one clock cycle,

    which makes shorter instructions execute in one unnecessarily long cycle.

    Additionally, no resource in the design may be used more than once per instruction, thus some resources must be duplicated.

    Because of that, the single cycle design will require:

    two memories (instruction and data),

    two additional adders.


    Elements for Datapath Design

    Include the functional units we need for each instruction

    Why do we need this stuff?

    [Figure: datapath elements.
    Instruction fetch: a. Instruction memory (instruction address in, instruction out); b. Program counter (PC); c. Adder (Add, Sum).
    R-format: a. Registers (5-bit read register 1, read register 2, and write register numbers; write data; read data 1 and read data 2; RegWrite control); b. ALU (4-bit ALU operation input; Zero and ALU result outputs).
    Loads and stores: a. Data memory unit (address, write data, read data; MemRead and MemWrite controls); b. Sign-extension unit (16 bits in, 32 bits out).]

    Abstract / Simplified View (1st look)

    Generic implementation:

    use the program counter (PC) to supply instruction address,

    get the instruction from memory,

    read registers,

    use the instruction to decide exactly what to do.

    Abstract / Simplified View (2nd look)

    PC is incremented by 4 by most instructions, and by 4 + 4 x offset by taken branch instructions.

    Jump instructions change PC differently (not shown).
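The PC update rule can be sketched in a few lines. This is an illustration only: the function names `sign_extend16` and `next_pc` are assumptions, the offset is the 16-bit field of a branch instruction, and jumps are left out, as on the slide.

```python
# Illustrative next-PC computation (function names are hypothetical).
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def next_pc(pc: int, taken_branch_offset=None) -> int:
    """Return PC + 4, or PC + 4 + 4 * offset when a branch is taken."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    if taken_branch_offset is None:
        return pc_plus_4
    # The offset counts instructions (words), so it is shifted left by 2.
    return (pc_plus_4 + (sign_extend16(taken_branch_offset) << 2)) & 0xFFFFFFFF


print(hex(next_pc(0x1000)))          # sequential instruction
print(hex(next_pc(0x1000, 3)))       # branch forward by 3 instructions
print(hex(next_pc(0x1000, 0xFFFF)))  # offset -1: branch back to itself
```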

    [Figure: abstract view of the datapath with PC, instruction memory, registers (three register-number inputs), ALU, data memory, and two adders, one of which adds 4 to the PC.]

    Incrementing PC & Fetching Instruction

    Two elements needed to implement R-format ALU operations

    add $t1,$t2,$t3

    Complete Datapath for R-type Instructions

    Two elements needed to implement loads and stores

    lw $t1, offset_value($t2)

    sw $t1, offset_value($t2)

    Compute a memory address by adding the base register ($t2) to the 16-bit signed offset field contained in the instruction
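The address computation for lw and sw can be sketched as follows. This is a minimal illustration, assuming 32-bit addresses; the function names and the example base address are made up for the example.

```python
# Illustrative effective-address computation for lw/sw (names are hypothetical).
def sign_extend16(value: int) -> int:
    """Sign-extension unit: widen a 16-bit two's-complement field to a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base: int, offset16: int) -> int:
    """ALU add: base register contents plus the sign-extended offset field."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


base = 0x10010000                             # assumed contents of $t2
print(hex(effective_address(base, 8)))        # lw $t1, 8($t2)
print(hex(effective_address(base, 0xFFFC)))   # lw $t1, -4($t2)
```

The second call shows why the sign-extension unit is needed: the field 0xFFFC encodes -4, so the address moves below the base.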

    Datapath for LW and SW Instructions

    Datapath for R-type, LW & SW Instructions

    The datapath for a branch

    beq $t1, $t2, offset

    Compute the branch target address by adding the sign-extended offset of the instruction, shifted left by 2, to PC + 4

    Datapath for R-type, LW, SW & BEQ

    ALU

    Generate the 4-bit ALU control input using a small control unit that has as inputs the function field of the instruction and a 2-bit control field, ALUOp

    Control

    Selecting the operations to perform (ALU, read/write, etc.)

    Controlling the flow of data (multiplexor inputs)

    Information comes from the 32 bits of the instruction

    Example: add $8, $17, $18

    Instruction format:

    bits    31:26    25:21   20:16   15:11   10:6    5:0
    field   op       rs      rt      rd      shamt   funct
    value   000000   10001   10010   01000   00000   100000

    ALU's operation is based on instruction type and function code
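The field boundaries of the R-format can be checked by decoding the example word for add $8, $17, $18. The sketch below is illustrative; the function name `decode_r_format` is an assumption, while the bit positions and the binary encoding come from the slide.

```python
# Illustrative R-format field extraction (function name is hypothetical).
def decode_r_format(word: int) -> dict:
    """Split a 32-bit instruction word into its R-format fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31:26
        "rs": (word >> 21) & 0x1F,     # bits 25:21
        "rt": (word >> 16) & 0x1F,     # bits 20:16
        "rd": (word >> 11) & 0x1F,     # bits 15:11
        "shamt": (word >> 6) & 0x1F,   # bits 10:6
        "funct": word & 0x3F,          # bits 5:0
    }


# add $8, $17, $18 as given on the slide, field by field.
word = int("000000" "10001" "10010" "01000" "00000" "100000", 2)
fields = decode_r_format(word)
print(hex(word), fields)
```

The decoded fields give rs = 17, rt = 18, and rd = 8, matching the source and destination registers of the assembly form.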

    Load, store and branch instructions

    Load or store instruction:

    bits    31:26      25:21   20:16   15:0
    field   35 or 43   rs      rt      address

    Branch instruction:

    bits    31:26   25:21   20:16   15:0
    field   4       rs      rt      address

    Control

    Must describe hardware to compute the 4-bit ALU control input, given:

    the instruction type, encoded as the 2-bit ALUOp computed from the instruction type: 00 = lw, sw; 01 = beq; 10 = arithmetic

    the function code for arithmetic

    Describe it using a truth table (can turn into gates).
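The ALU control unit can be sketched as a lookup on ALUOp and the function code. The slide's own truth table did not survive in this transcript, so the 4-bit output encodings below are an assumption: they follow the common textbook convention (0000 and, 0001 or, 0010 add, 0110 subtract, 0111 set on less than). The function name `alu_control` is also hypothetical.

```python
# Illustrative ALU control; the 4-bit encodings are an assumed textbook convention.
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# funct field (bits 5:0) of the arithmetic-logical instructions in the subset.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Map the 2-bit ALUOp and the 6-bit funct field to the 4-bit ALU control input."""
    if alu_op == 0b00:      # lw, sw: add to form the address
        return ALU_ADD
    if alu_op == 0b01:      # beq: subtract to compare
        return ALU_SUB
    if alu_op == 0b10:      # arithmetic: decided by the function code
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError(f"unused ALUOp value: {alu_op:02b}")


print(format(alu_control(0b10, 0b101010), "04b"))  # slt
```

Splitting control into two levels keeps the main control unit small: it only has to produce ALUOp from the opcode, and the funct field is consulted only for arithmetic instructions.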

    Truth Table for (Main) Control Unit

    Truth Table of ALU Control Unit

    Design of (Main) Control Unit

    Design of Control Unit (J included)

    Design of 7-Function ALU Control Unit

    Cycle Time Calculation

    Single Cycle Implementation

    Calculate cycle time assuming negligible delays except:

    memory (200ps), ALU and adders (100ps), register file access (50ps)

    [Figure: single-cycle datapath with PC, instruction memory, register file, sign-extend (16 to 32 bits), shift left 2, ALU (4-bit ALU operation input, Zero output), two adders, data memory, and three multiplexors, with control signals RegWrite, ALUSrc, MemRead, MemWrite, MemtoReg, and PCSrc.]

    Performance of Single cycle machines

    Memory (200ps), ALU and adders (100ps), register file access (50ps)

    Instruction mix: 25% loads, 10% stores, 45% ALU, 15% branches, 5% jumps.

    Which of the following implementations would be faster?

    1. Every instruction operates in 1 clock cycle of a fixed length

    2. Every instruction operates in 1 clock cycle of a variable length

    CPU execution time = Instruction count x CPI x Clock cycle time

    Since CPI = 1, CPU execution time = Instruction count x Clock cycle time

    Using the critical paths we can compute the required length for each class: R-type 400ps, Load word 600ps, Store word 550ps, Branch 350ps, Jump 200ps

    In case 1 the clock cycle has to be 600ps, set by the longest instruction

    A machine with a variable clock will have a clock cycle that varies between 200ps and 600ps.

    The average CPU clock cycle = 600 x 25% + 550 x 10% + 400 x 45% + 350 x 15% + 200 x 5% = 447.5ps

    So the variable clock machine is 600 / 447.5 = 1.34 times faster
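The comparison can be reproduced with a short calculation. The class times and the instruction mix are the ones given on the slide; the variable names are assumptions made for illustration, and percentages are kept as integers so the arithmetic stays exact.

```python
# Reproduce the fixed-clock versus variable-clock comparison from the slide.
class_time_ps = {"load": 600, "store": 550, "alu": 400, "branch": 350, "jump": 200}
mix_percent = {"load": 25, "store": 10, "alu": 45, "branch": 15, "jump": 5}

# Fixed clock: every instruction takes as long as the slowest class.
fixed_cycle_ps = max(class_time_ps.values())

# Variable clock: weighted average of the per-class times.
average_cycle_ps = sum(class_time_ps[c] * mix_percent[c] for c in mix_percent) / 100

# CPI = 1 and the instruction count is the same, so the ratio of cycle times
# equals the ratio of execution times.
speedup = fixed_cycle_ps / average_cycle_ps

print(fixed_cycle_ps, average_cycle_ps, round(speedup, 2))
```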

    Example

    Instruction class    Functional units used by the instruction class
    R-type               Instruction fetch   Register access   ALU   Register access
    Load word            Instruction fetch   Register access   ALU   Memory access   Register access
    Store word           Instruction fetch   Register access   ALU   Memory access
    Branch               Instruction fetch   Register access   ALU
    Jump                 Instruction fetch

    Example

    Instruction class   Instruction memory   Register read   ALU   Data memory   Register write   Total
    R-type              200                  50              100   0             50               400ps
    Load word           200                  50              100   200           50               600ps
    Store word          200                  50              100   200           0                550ps
    Branch              200                  50              100   0             0                350ps
    Jump                200                  0               0     0             0                200ps
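The totals in the table follow from summing the delays of the functional units on each class's path. The sketch below recomputes them; the unit delays are the ones stated on the slides, while the dictionary names and unit labels are assumptions made for illustration.

```python
# Recompute the per-class totals from the functional unit delays (in ps).
DELAY_PS = {
    "instruction_memory": 200,
    "register_read": 50,
    "alu": 100,
    "data_memory": 200,
    "register_write": 50,
}

# Units on the path of each instruction class, in the order they are used.
UNITS_USED = {
    "R-type": ["instruction_memory", "register_read", "alu", "register_write"],
    "Load word": ["instruction_memory", "register_read", "alu", "data_memory", "register_write"],
    "Store word": ["instruction_memory", "register_read", "alu", "data_memory"],
    "Branch": ["instruction_memory", "register_read", "alu"],
    "Jump": ["instruction_memory"],
}

latency_ps = {cls: sum(DELAY_PS[u] for u in units) for cls, units in UNITS_USED.items()}

# The single-cycle clock must accommodate the slowest class.
cycle_time_ps = max(latency_ps.values())

for cls, total in latency_ps.items():
    print(f"{cls:<10} {total}ps")
print("cycle time:", cycle_time_ps, "ps")
```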

    Abstract View of our single cycle processor

    looks like a FSM with PC as state

    [Figure: PC and Next PC feed Instruction Fetch, followed by Register Fetch, Ext, ALU, Mem Access (Data Mem), and Reg. Wrt (Result Store). Main Control (input: op) and ALU control (input: fun) drive the signals nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, and MemWr; Equal is fed back from the datapath.]

    What's wrong with our CPI=1 processor?

    Long Cycle Time

    All instructions take as much time as the slowest

    Real memory is not as nice as our idealized memory

    cannot always get the job done in one (short) cycle

    Arithmetic & Logical:   PC -> Inst Memory -> Reg File -> mux -> ALU -> mux -> setup
    Load (Critical Path):   PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem -> mux -> setup
    Store:                  PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
    Branch:                 PC -> Inst Memory -> Reg File -> cmp -> mux

    Where we are headed

    Single Cycle Problems:

    what if we had a more complicated instruction like floating point?

    wasteful of area

    One Solution:

    use a smaller cycle time

    have different instructions take different numbers of cycles

    a multicycle datapath:

    [Figure: multicycle datapath with PC, a single memory for instructions and data, instruction register, memory data register, register file, A and B registers, ALU, and ALUOut.]

    Single Cycle Processor: Conclusion

    Single Cycle Problems:

    what if we had a more complicated instruction like floating point?

    a clock cycle would be much longer,

    and thus wasteful of time for shorter and more often used instructions, such as add & lw.

    One Solution:

    use a smaller cycle time, and

    have different instructions take different numbers of cycles.

    And that is a multi-cycle processor.