Eee446 Lab Manual Spring2015

METU NORTHERN CYPRUS CAMPUS Computer Architecture II EEE446 LABORATORIES Ali Muhtaroğlu Spring 2015


    Regulations:

    Students are not permitted to perform an experiment without completing the preliminary work before coming to the laboratory. Preliminary work may not be done in the laboratory during the experiment.

    No food or drink in the lab.

    Initial experiments will be done individually. You will form groups in the CPU design portions.

    Cheating is not tolerated in this laboratory. Plagiarism is a form of cheating, as is using someone else's written work with minor changes and no attribution. If you are caught cheating, you will, at the very least, receive a zero for that preliminary work.

    Students who miss the lab three times without a valid excuse will receive a zero for the laboratory portion of the EEE446 course grade.

    Students who do not achieve a satisfactory score in the laboratory portion will fail the course. The required score will be finalized later but is expected to be around 70%.


    EXPERIMENT #1

    INTRODUCTION TO COMPUTER ARCHITECTURE LABORATORY:

    8-BIT BOOTH'S MULTIPLIER DESIGN

    1.1 OBJECTIVE

    The purpose of the first laboratory exercise is to implement an 8-bit signed integer multiplier in hardware using Booth's algorithm. The exercise will also serve as a vehicle for becoming familiar with the Altera Quartus II software, the Cyclone II Field Programmable Gate Array (FPGA) family, and the programming interface, switch inputs, LED outputs, and clocking interface of the DE2-70 development board.

    1.2 PRELIMINARY WORK

    1.2.1 Download the free Quartus II Web Edition software.

    1.2.2 Launch Quartus II, then select and complete the Quartus II Interactive Tutorial, which is one of the opening menu options. Plenty of self-training material is available under the Training and Online Demos selections in the opening menu. Read the attached handouts to become familiar with creating a hierarchical design using the schematic editor and VHDL, and with compiling, simulating, and configuring a design. Useful documentation is available at any time through the Quartus II Help menu.

    1.2.3 Browse through the DE2-70 User Manual at the link below to understand the general functionality of the board and some of the features that will be useful in this introductory laboratory. You may want to read all of Chapters 1-5. Pay special attention to the use of the LEDs, switches, and 7-segment displays explained in Chapter 5. http://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=53&No=226&PartNo=4

    We will have a copy of the User Manual in the lab, but you may want to download your own copy as well.

    1.2.4 Draw out the 8-bit Booth multiplier datapath for signed multiplication, labeling all blocks and signals clearly.

    1.2.5 Enter the design of the 8-bit multiplier datapath (1.2.4) in VHDL, and simulate it to ensure correct operation. You can implement the top level block using the schematic editor if you prefer.
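    A behavioral reference model kept next to the structural design can help you generate expected values for these simulations. The package below is a minimal, simulation-only sketch. The names booth_ref, booth_step, and booth_mult are hypothetical and are not part of the lab template. The model mirrors the A, Q, Qm1, and M registers and performs the standard radix-2 Booth step: inspect Q(0),Qm1, add or subtract M into A, then arithmetic-shift A,Q,Qm1 right by one bit. Like an 8-bit A register in the datapath, the model keeps A at 8 bits. It therefore shares the known corner case M = -128, where the add/subtract can overflow (for example, -128 x 1 comes out as +128). Avoid that operand, or widen A, if you need it.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Simulation-only Booth reference model (hypothetical names, not part of
-- the lab template). Variables mirror the datapath: A, Q, Qm1, M.
package booth_ref is
  -- One radix-2 Booth iteration on the A,Q,Qm1 chain.
  procedure booth_step(variable A   : inout signed(7 downto 0);
                       variable Q   : inout std_logic_vector(7 downto 0);
                       variable Qm1 : inout std_logic;
                       constant M   : in    signed(7 downto 0));
  -- Eight iterations; returns the 16-bit product A(7:0) & Q(7:0).
  function booth_mult(m_in, q_in : integer) return signed;
end package booth_ref;

package body booth_ref is
  procedure booth_step(variable A   : inout signed(7 downto 0);
                       variable Q   : inout std_logic_vector(7 downto 0);
                       variable Qm1 : inout std_logic;
                       constant M   : in    signed(7 downto 0)) is
    variable pair : std_logic_vector(1 downto 0);
  begin
    pair := Q(0) & Qm1;
    if pair = "10" then
      A := A - M;                    -- Q(0),Qm1 = 1,0: subtract M
    elsif pair = "01" then
      A := A + M;                    -- Q(0),Qm1 = 0,1: add M
    end if;                          -- 00 or 11: no arithmetic
    -- Arithmetic shift right of A,Q,Qm1 (old values are used first)
    Qm1 := Q(0);
    Q   := A(0) & Q(7 downto 1);
    A   := shift_right(A, 1);        -- shift_right on signed replicates A(7)
  end procedure booth_step;

  function booth_mult(m_in, q_in : integer) return signed is
    variable A   : signed(7 downto 0) := (others => '0');
    variable Q   : std_logic_vector(7 downto 0);
    variable Qm1 : std_logic := '0';
    variable M   : signed(7 downto 0);
  begin
    M := to_signed(m_in, 8);                      -- multiplicand
    Q := std_logic_vector(to_signed(q_in, 8));    -- multiplier
    for i in 1 to 8 loop
      booth_step(A, Q, Qm1, M);
    end loop;
    return A & signed(Q);
  end function booth_mult;
end package body booth_ref;
```

    In a testbench, booth_mult(-14, 6) should compare equal to to_signed(-84, 16). You can also call booth_step once per clock in a checker process to compare A and Q against the hardware cycle by cycle.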

    1.2.6 Draw a state diagram for the Booth multiplier control unit, given your datapath design from 1.2.5. A START signal (preferably one of the push-buttons on the board) should start the multiplication by loading the two 8-bit numbers from the DE2-70 toggle-switches into the proper registers. The multiplication should execute only once; the control unit should then wait until the START button is released and pressed again.
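    The START handshake described above can be captured in a few states. The sketch below is a hypothetical skeleton; the entity, state, and port names are illustrative only. It issues one load cycle and eight Booth step cycles per press. It then waits in WAIT_RELEASE until START returns to 0, so holding the button down cannot trigger a second multiplication.

    The sketch assumes start is active-high. The DE2-70 KEY push-buttons read 0 when pressed (confirm this in the User Manual), so invert the KEY signal before connecting it. Your own datapath may need more states, for example separate add/subtract and shift cycles.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical control-unit skeleton showing only the START handshake.
-- start is active-high here: invert the DE2-70 KEY input (0 when pressed).
entity booth_ctrl_sketch is
  port (clk, rst, start : in  std_logic;
        load : out std_logic;   -- load M and Q from the switches, clear A/Qm1
        step : out std_logic;   -- perform one Booth add/subtract-and-shift step
        done : out std_logic);  -- product is valid; waiting for START release
end entity booth_ctrl_sketch;

architecture rtl of booth_ctrl_sketch is
  type state_t is (IDLE, LOAD_S, RUN_S, WAIT_RELEASE);
  signal state : state_t := IDLE;
  signal count : unsigned(2 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= IDLE;
      else
        case state is
          when IDLE =>
            if start = '1' then
              state <= LOAD_S;
            end if;
          when LOAD_S =>
            count <= (others => '0');
            state <= RUN_S;
          when RUN_S =>
            if count = 7 then
              state <= WAIT_RELEASE;     -- eighth step has been issued
            else
              count <= count + 1;
            end if;
          when WAIT_RELEASE =>
            -- One press = one multiplication: wait for START to be released
            if start = '0' then
              state <= IDLE;
            end if;
        end case;
      end if;
    end if;
  end process;

  load <= '1' when state = LOAD_S       else '0';
  step <= '1' when state = RUN_S        else '0';
  done <= '1' when state = WAIT_RELEASE else '0';
end architecture rtl;
```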

    1.2.7 Design the control unit using VHDL, and simulate to ensure correct operation.

    1.2.8 Connect the datapath and the control unit blocks using the schematic editor. Make sure the FPGA device selected in your project is Cyclone II EP2C70F896C6. Use the pin mapping tool to map the 8-bit binary multiplication inputs to toggle-switches, the hexadecimal output to 7-segment displays, and the START and CLK signals to push-buttons. It is good practice to also have a RESET input (toggle-switch or push-button) to initialize all flip-flops in your design. Simulate the design to ensure correct operation. In addition to functional simulations, run timing simulations to ensure there are no timing problems.
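    Showing the product in hexadecimal needs one decoder per 7-segment digit. The sketch below is a hypothetical hex7seg entity for a single digit. Two points are assumptions to check against the HEX pin table in the DE2-70 User Manual: that bit seg(6 downto 0) maps to segments g f e d c b a, and that the segments are active-low (a segment lights when driven 0).

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical hex-to-7-segment decoder for one digit.
-- Assumes seg(6 downto 0) = segments g f e d c b a, active-low.
entity hex7seg is
  port (hex : in  std_logic_vector(3 downto 0);
        seg : out std_logic_vector(6 downto 0));
end entity hex7seg;

architecture rtl of hex7seg is
begin
  with hex select
    seg <= "1000000" when "0000",   -- 0
           "1111001" when "0001",   -- 1
           "0100100" when "0010",   -- 2
           "0110000" when "0011",   -- 3
           "0011001" when "0100",   -- 4
           "0010010" when "0101",   -- 5
           "0000010" when "0110",   -- 6
           "1111000" when "0111",   -- 7
           "0000000" when "1000",   -- 8
           "0010000" when "1001",   -- 9
           "0001000" when "1010",   -- A
           "0000011" when "1011",   -- b
           "1000110" when "1100",   -- C
           "0100001" when "1101",   -- d
           "0000110" when "1110",   -- E
           "0001110" when "1111",   -- F
           "1111111" when others;   -- blank for 'U', 'X', etc.
end architecture rtl;
```

    Four instances, fed from A(7:4), A(3:0), Q(7:4), and Q(3:0), would display the full 16-bit product. Negative products then appear in two's complement form.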

    1.2.9 Report the maximum clock speed (minimum clock period) for your design.

    1.2.10 Try to optimize your design to minimize the number of clock cycles it takes to execute one multiplication (without significantly degrading the clock period).

    1.2.11 Submit a preliminary work report including:

    an objective statement;

    your drawings from 1.2.4 and 1.2.6;

    printouts of all your VHDL and schematic design files;

    Booth multiplier timing simulation results showing correct operation for the multiplications 14x6, -14x6, -6x14, and -6x-14;

    the summary from the Timing Analyzer showing the maximum clock frequency you reported in 1.2.9;

    any optimizations you came up with in 1.2.10 (bonus).
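    The expected 16-bit products for the four required test cases are worked out below so you can check the simulation waveforms against them. This assumes the 8-bit operands are entered in two's complement and the product A(7:0),Q(7:0) is read as a 16-bit two's complement value in hexadecimal. In particular, -84 = 2^16 - 84 = 65452 = FFAC in hexadecimal.

```latex
\begin{aligned}
14 \times 6 &= 84   &&\Rightarrow\ \texttt{0E}_{16} \times \texttt{06}_{16} = \texttt{0054}_{16}\\
(-14) \times 6 &= -84 &&\Rightarrow\ \texttt{F2}_{16} \times \texttt{06}_{16} = \texttt{FFAC}_{16}\\
(-6) \times 14 &= -84 &&\Rightarrow\ \texttt{FA}_{16} \times \texttt{0E}_{16} = \texttt{FFAC}_{16}\\
(-6) \times (-14) &= 84 &&\Rightarrow\ \texttt{FA}_{16} \times \texttt{F2}_{16} = \texttt{0054}_{16}
\end{aligned}
```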

    Bring the project and design files you created in your preliminary work to the lab in order to save time.

    1.3 EXPERIMENTAL WORK

    1.3.1 Experimental Setup

    Ensure that the DE2-70 board is connected to your PC through USB and to a power supply. Power up the board and observe the flashing LEDs and cycling numbers on the 7-segment displays. The LCD should display "Welcome to the Altera DE2-70".

    1.3.2 Control Panel Checkout

    Run the DE2-70 Control Panel executable on your desktop and download the code to either SDRAM-U2 or SSRAM. Connect to the board by pushing the Connect button unless communication with the board has already been established.

    Exercise the Control Panel features to confirm a healthy interface on the DE2-70 board. Check all toggle switches, LEDs, push-buttons, 7-segment displays, and the LCD.

    1.3.3 Design Validation

    i) Load your project and ensure the correct FPGA family is selected in the Set project and Compiler Settings menu item under Assign Constraints. Device Category should indicate Cyclone II EP2C70F896C6.

    ii) Enter pin assignments if you have not done so already. Compile the design and program the device.

    iii) Test your design with carefully chosen input vectors, stepping the CLOCK with the push-button. If the design does not work, re-run the timing simulations with the same input vectors to confirm that your design entry is correct. For each cycle, the multiplier output observed on the 7-segment displays should match your simulation result.

    iv) Debug any problems you run into using a divide-and-conquer approach.

    v) If the design works:

    a. Demonstrate to your lab instructor.


    b. How many clock cycles does it take to complete one multiplication operation? Can you think of any performance enhancements in your architecture to improve CPI?

    c. Remap your CLOCK input pin to each of the free-running clocks in turn, from slowest to fastest. Does your design work with the fastest clock signal available on the DE2-70 board?

    vi) If you have not finished within the allocated time, demonstrate how far you were able to get and explain your debug and root-causing process in order to get partial credit.


    EXPERIMENT #2

    8-bit INTEGER ARITHMETIC PROCESSOR DATAPATH DESIGN

    2.1 OBJECTIVE

    In the first laboratory exercise you became familiar with the Altera Quartus II CAD and DE2-70 prototyping environment while delivering an 8-bit Booth multiplier with separate Datapath and Control Units. In this experiment we will extend the multiplier datapath into an 8-bit integer Arithmetic Processor (AP) datapath. The Booth multiplier Control Unit from the last experiment will be modified to support the new datapath. This AP will provide the processing power of the CPU core you will complete in Lab 3.

    2.2 PRELIMINARY WORK

    2.2.1 Since this experiment builds upon the results of the first lab, it is important that you finish any incomplete parts of the Booth multiplier design before you start this preliminary work.

    2.2.2 Read through the AP specifications (Section 2.3).

    2.2.3 You will follow a modular design approach as in the first experiment. Design the ALU, register, and multiplexer blocks as separate VHDL modules. After verifying that each block individually satisfies the functional requirements provided, instantiate and connect the blocks in the schematic editor to build the AP datapath. Save this as your APdatapath design file. Run timing simulations to ensure full functionality.

    2.2.4 Modify the Booth multiplier control unit (FSM) to support the new set of signals in Table 1, and connect it to APdatapath. Save this as your new boothmultiplier design file. Run timing simulations to ensure that multiplication still works correctly.

    2.2.5 Submit a preliminary work report that includes:

    an objective statement;

    printouts of all your VHDL and schematic design files from 2.2.3 and 2.2.4;

    AP datapath timing simulations from 2.2.3 showing all of the main AP micro-operations documented in Table 1, with realistic signal delays (you may want to submit a few distinct simulations for clarity instead of packing all the functions into one simulation run);

    the Booth multiplier control unit state diagram, enhanced from Experiment 1 to support the new signals in Table 1;

    timing simulations from 2.2.4 showing an example multiplication after the modified Booth control unit is tied to the new datapath;

    a summary from the Timing Analyzer showing the maximum clock frequency of your design.

    Note: It is crucial that your simulations demonstrate a healthy design before you come to the scheduled lab session. Valuable lab time and resources should not be used for design work; you will use the lab session for hardware prototyping, debug, and validation. If you cannot demonstrate successful simulations in your preliminary work, you may not be allowed to perform this lab.


    2.3 Arithmetic Processor Datapath Specifications

    2.3.1 Top Level Datapath

    The AP top level datapath is depicted in Figure 1. Note that this is an enhanced version of the datapath designed in the first experiment, extended to support the new micro-operations in Table 1. The clk signal is not shown in the datapath. The three registers and the Qm1 flip-flop are triggered on the rising edge of clk.

    Figure 1. Arithmetic Processor datapath top level block diagram (figure not reproduced; it shows the M, A, and Q registers, the Qm1 flip-flop, the ALU with its ASEL(1:0) and BSEL(1:0) input multiplexers, and the SRSEL multiplexer that selects 0 or A(7) as the shift-right input of the A register)

    Table 1. Arithmetic Processor Micro-Operations

    Micro-operation                           LDM LDA LDQ ASEL BSEL AOP[2:0] SRSEL SR  SL  RST
    Q ← 0; Qm1 ← 0; A ← 0; M ← 0              X   X   X   X    X    X        X     X   X   1
    M ← DAT1                                  1   0   0   X    X    X        X     0   0   0
    Q ← DAT2; Qm1 ← 0                         0   0   1   X    X    X        X     0   0   0
    A(7:0),Q(7:0),Qm1 ← A(6:0),Q(7:0),Qm1,0   0   0   0   X    X    X        X     X   1   0
    A(7:0),Q(7:0),Qm1 ← 0,A(7:0),Q(7:0)       0   0   0   X    X    X        0     1   0   0
    A(7:0),Q(7:0),Qm1 ← A(7),A(7:0),Q(7:0)    0   0   0   X    X    X        1     1   0   0
    A ← 0 AOP 0                               0   1   0   0    0    AOP      X     0   0   0
    A ← 0 AOP 1                               0   1   0   0    1    AOP      X     0   0   0
    A ← 0 AOP M                               0   1   0   0    2    AOP      X     0   0   0
    A ← 0 AOP DAT3                            0   1   0   0    3    AOP      X     0   0   0
    A ← 1 AOP 0                               0   1   0   1    0    AOP      X     0   0   0
    A ← 1 AOP 1                               0   1   0   1    1    AOP      X     0   0   0
    A ← 1 AOP M                               0   1   0   1    2    AOP      X     0   0   0
    A ← 1 AOP DAT3                            0   1   0   1    3    AOP      X     0   0   0
    A ← Q AOP 0                               0   1   0   2    0    AOP      X     0   0   0
    A ← Q AOP 1                               0   1   0   2    1    AOP      X     0   0   0
    A ← Q AOP M                               0   1   0   2    2    AOP      X     0   0   0
    A ← Q AOP DAT3                            0   1   0   2    3    AOP      X     0   0   0
    A ← A AOP 0                               0   1   0   3    0    AOP      X     0   0   0
    A ← A AOP 1                               0   1   0   3    1    AOP      X     0   0   0
    A ← A AOP M                               0   1   0   3    2    AOP      X     0   0   0
    A ← A AOP DAT3                            0   1   0   3    3    AOP      X     0   0   0

    2.3.2 ALU

    The block diagram of the simple combinational ALU in the AP datapath is shown in Figure 2.

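    Figure 2. ALU inputs and outputs (figure not reproduced; inputs: A(7:0), B(7:0), AOP(2:0), CI; outputs: F(7:0), CO, OVF, Z, and N = F(7))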

    Its functions and controls are defined in Table 2 below. The Carry-In (CI) input and Carry-Out (CO) output of the ALU are only used for add/subtract operations. In addition to the CO bit, the ALU has three other status output bits: Overflow (OVF), MSB of the ALU result (N), and Zero (Z). The status bits and the significance of each are summarized in Table 3.


    Table 2. ALU Function Control

    Mnemonic  AOP[2:0]  ALU Function            Symbol
    ADD       0         A Plus B Plus CI        CO,F ← A + B + CI
    SUBB      1         A Minus B Minus !CI     CO,F ← A - B - !CI
    SUBA      2         B Minus A Minus !CI     CO,F ← B - A - !CI
    OR        3         A OR B                  F ← A + B
    AND       4         A AND B                 F ← A · B
    NOTAB     5         !A AND B                F ← !A · B
    XOR       6         A XOR B                 F ← A ⊕ B
    XNOR      7         A XNOR B                F ← !(A ⊕ B)

    Table 3. ALU Status Specifications

    Status  Description
    CO      1 if there is a Carry Out from add and subtract operations; 0 for logic operations
    OVF     1 if the add or subtract operation results in overflow (XOR of the two most significant carry bits); 0 for all logic operations
    Z       1 if the value of F(7:0) is 0; applies to all operations
    N       1 if the MSB of F(7:0) is 1; applies to all operations
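    Tables 2 and 3 translate fairly directly into one combinational process. The sketch below uses a hypothetical entity name, alu8_sketch. It assumes the common two's complement implementation in which the subtractions are computed as A + NOT B + CI (SUBB) and B + NOT A + CI (SUBA). This realizes A - B - !CI exactly, and it means that for the subtract operations CO = 1 signals "no borrow". OVF uses the sign-based form of the overflow test, which is equivalent to XOR-ing the carries into and out of bit 7 as Table 3 specifies.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical 8-bit ALU following Tables 2 and 3.
entity alu8_sketch is
  port (A, B          : in  std_logic_vector(7 downto 0);
        AOP           : in  std_logic_vector(2 downto 0);
        CI            : in  std_logic;
        F             : out std_logic_vector(7 downto 0);
        CO, OVF, Z, N : out std_logic);
end entity alu8_sketch;

architecture rtl of alu8_sketch is
begin
  process (A, B, AOP, CI)
    variable x, y : unsigned(7 downto 0);   -- adder operands
    variable cin  : unsigned(8 downto 0);   -- CI zero-extended to 9 bits
    variable sum  : unsigned(8 downto 0);   -- sum(8) is the carry out
    variable r    : std_logic_vector(7 downto 0);
  begin
    x := unsigned(A);
    y := unsigned(B);
    CO  <= '0';                             -- logic operations: CO = OVF = 0
    OVF <= '0';
    case AOP is
      when "000" | "001" | "010" =>
        if AOP = "001" then                 -- SUBB: A - B - !CI = A + not B + CI
          y := unsigned(not B);
        elsif AOP = "010" then              -- SUBA: B - A - !CI = B + not A + CI
          x := unsigned(B);
          y := unsigned(not A);
        end if;                             -- ADD:  A + B + CI
        cin    := (others => '0');
        cin(0) := CI;
        sum    := ('0' & x) + ('0' & y) + cin;
        r      := std_logic_vector(sum(7 downto 0));
        CO     <= sum(8);
        -- Operands share a sign but the result sign differs; equivalent to
        -- the XOR of the carries into and out of bit 7 (Table 3)
        OVF    <= (x(7) xnor y(7)) and (x(7) xor sum(7));
      when "011"  => r := A or B;           -- OR
      when "100"  => r := A and B;          -- AND
      when "101"  => r := (not A) and B;    -- NOTAB
      when "110"  => r := A xor B;          -- XOR
      when others => r := A xnor B;         -- XNOR ("111")
    end case;
    F <= r;
    N <= r(7);
    if r = "00000000" then
      Z <= '1';
    else
      Z <= '0';
    end if;
  end process;
end architecture rtl;
```

    Every output is assigned on every path through the process, so no latches should be inferred.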

    2.3.3 Registers

    The M register in Figure 1 has synchronous reset (RST) and load (LDM) functions. The Q and A registers have shift-right (SR) and shift-left (SL) functions, with corresponding shift-in (SIR, SIL) and shift-out (SOR, SOL) bits, in addition to reset (RST) and load (LDQ, LDA). The Qm1 flip-flop has reset (RST) and load (LDQ) control inputs, and it resets when either RST or LDQ is asserted. Otherwise, it loads the least significant bit of the Q register (SOR) on a right shift (SR), clears on a left shift (SL), and retains its value when neither shift is asserted. The specifications of the synchronous logic blocks in the AP datapath are summarized in Tables 4, 5, and 6.

    Table 4. M functions

    RST  LDM  Operation
    1    X    M ← 0
    0    0    Retain
    0    1    M ← DAT1

    Table 5. Q (or A) functions

    RST  LDQ (LDA)  SL  SR  Operation
    1    X          X   X   Q (A) ← 0
    0    0          0   0   Retain
    0    1          X   X   Q ← DAT2 (A ← ALU)
    0    0          1   X   Q ← Q(6:0),SIL (A ← A(6:0),SIL)
    0    0          0   1   Q ← SIR,Q(7:1) (A ← SIR,A(7:1))

    Table 6. Qm1 functions

    RST  LDQ  SL  SR  Operation
    1    X    X   X   Qm1 ← 0
    0    0    0   0   Retain
    0    1    X   X   Qm1 ← 0
    0    0    1   X   Qm1 ← 0
    0    0    0   1   Qm1 ← Q(0)
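    A behavioral description that follows Tables 5 and 6 row by row might look like the sketch below. The entity and port names (shreg8, qm1_ff, LD, QSOR, and so on) are hypothetical. The if/elsif chain encodes the priority implied by the X entries in the tables: RST first, then load, then SL, then SR, and otherwise retain. Because all registers share the rising clk edge, qm1_ff captures the value Q(0) had before the shift. That is exactly what the A,Q,Qm1 right shift in Table 1 requires.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical Q (or A) register per Table 5: RST > LD > SL > SR > retain.
entity shreg8 is
  port (clk, RST, LD, SL, SR : in  std_logic;
        SIL, SIR             : in  std_logic;                     -- shift-in bits
        D                    : in  std_logic_vector(7 downto 0);  -- DAT2 or ALU
        R                    : out std_logic_vector(7 downto 0);
        SOL, SOR             : out std_logic);                    -- shift-out bits
end entity shreg8;

architecture rtl of shreg8 is
  signal q : std_logic_vector(7 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if RST = '1' then    q <= (others => '0');
      elsif LD = '1' then  q <= D;
      elsif SL = '1' then  q <= q(6 downto 0) & SIL;
      elsif SR = '1' then  q <= SIR & q(7 downto 1);
      end if;                                   -- otherwise retain
    end if;
  end process;
  R   <= q;
  SOL <= q(7);
  SOR <= q(0);
end architecture rtl;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical Qm1 flip-flop per Table 6; QSOR is the Q register's SOR.
entity qm1_ff is
  port (clk, RST, LDQ, SL, SR, QSOR : in  std_logic;
        Qm1                         : out std_logic);
end entity qm1_ff;

architecture rtl of qm1_ff is
  signal s : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if RST = '1' or LDQ = '1' or SL = '1' then
        s <= '0';
      elsif SR = '1' then
        s <= QSOR;                              -- Q(0) from before the shift
      end if;
    end if;
  end process;
  Qm1 <= s;
end architecture rtl;
```

    To form the shift chain of Table 1, make the following connections:
    o the Q register's SIR to A's SOR, and its SIL to the Qm1 output;
    o A's SIL to Q's SOL;
    o A's SIR from the SRSEL multiplexer (0 or A(7)).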

    2.4 EXPERIMENTAL WORK

    2.4.1 Arithmetic Processor Datapath Validation

    i) Set your APdatapath design file as the top level, and make sure the correct device is selected in your project (Cyclone II EP2C70F896C6).

    ii) Enter pin assignments for the AP datapath in order to assert the input data and control signals using the available toggle switches and push-buttons, and monitor the outputs using the 7-segment displays and LEDs. Do not forget the CI input bit to the ALU in addition to the control signals in Table 1. Since the number of switches on the DE2-70 board is limited, you may want to connect only 2 or 3 bits of the input data buses to the switches and hard-wire the rest to GND in your design. Clock and RST signals can use the push-buttons. You can take advantage of the remaining two push buttons for the other signals.

    iii) Debug any problems you run into. Validate your datapath using the previously simulated vectors to ensure you get the same results as your simulations.

    iv) Demo 1: Demonstrate your validation results to the lab instructor.

    2.4.2 Booth Multiplier Validation with the Arithmetic Processor Datapath

    v) Set your new boothmultiplier design file as the top level.

    vi) Make sure you have the new pin assignments. Since the control signals to the AP datapath will now come from the control unit in the design, you no longer need to tie them to the switches.

    vii) Debug any problems. Validate the Booth multiplier operation using previously simulated vectors to ensure you get the same results as your working simulations.

    viii) Demo 2: Demonstrate your validation results to the lab instructor.


    EXPERIMENT #3

    CPU ISA & DATAPATH DESIGN

    3.1 OBJECTIVE

    In this lab exercise you will define and design the datapath of a CPU based on the high level specifications and constraints provided. As part of the experiment, you will:

    a. plan the instructions and instruction formats (ISA) supported by your CPU;

    b. complete the missing pieces of the provided CPU core template in order to design a capable general purpose CPU datapath;

    c. do the design entry and validation using the Altera Quartus II CAD and DE2-70 prototyping environment.

    Even though you have 3 weeks to complete this experiment, you are expected to submit the preliminary work for parts (a) and (b) at the end of the first 1.5 weeks.

    3.2 PRELIMINARY WORK

    3.2.1 Define an ISA for the CPU with the below requirements:

    The 8-bit core processing datapath of the CPU is depicted in Figure 3.1. Note this datapath has some similarities with the design you have delivered in Lab 2. It also has some differences due to the addition of a Register File block. Also note that this is not a complete CPU datapath; you will need to enhance it to be able to implement the instruction fetch-decode-execute cycle, and support all instruction types.

    The ISA needs to support general purpose computing, and will therefore contain different instructions for arithmetic-logic, memory load-store, and program control operations. There should be a sufficient variety of addressing modes to support not only simple constants and variables but also more complex data structures such as indexed arrays.

    You will need to analyze the capabilities of the provided CPU core datapath in order to identify the arithmetic-logic instructions you would like to support. It is required that you have at least one multiplication and one division instruction in your arithmetic-logic instruction list.

    The architecture will use memory-mapped I/O, so there is no need to define distinct I/O instructions.

    Both the Instruction and Data Memory have a 10-bit address space with a 16-bit word at each address, i.e., you have 2 kB each of accessible Instruction Memory and Data Memory.
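    The 2 kB figure follows from the address and word sizes, and it applies separately to each memory. The same numbers also matter for instruction formats. If an instruction carries a full 10-bit direct address inside a single 16-bit word, only 16 - 10 = 6 bits remain for the opcode and any register fields.

```latex
2^{10}\ \text{words} \times 16\ \frac{\text{bits}}{\text{word}}
  = 1024 \times 2\ \text{bytes} = 2048\ \text{bytes} = 2\ \text{kB}
```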

    You will submit a report with the details of your ISA specifications in 1.5 weeks. The report should contain a description of different instruction formats and details of each instruction in your ISA. Remember to come up with a name for your ISA (e.g. we previously studied the details of MIPS ISA, Motorola 68HC11 ISA, generally discussed Intel x86 ISA etc.). Add a discussion in your report on different choices you made associated with your ISA design. e.g. How many bits did you assign to your opcode and why? What are the different addressing modes and why? What are the different types of branches/jumps and why? When you make these choices, think about the type of things a high-level programmer (e.g. a C-programmer) is interested in doing while writing a general purpose program, in addition to paying attention to the principles of ISA design we studied last semester:

    o Simplicity favors regularity

    o Smaller is faster

    o Make the common case fast

    o Good design demands good compromises


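    Figure 3.1. 8-bit CPU Core Datapath (figure not reproduced; it combines an 8x8-bit Register File, with ports SrcAdrA(2:0), SrcAdrB(2:0), DstAdr(2:0), WE, DataIn(7:0), DataOutA(7:0), and DataOutB(7:0), with the A and Q registers, the Qm1 flip-flop, and the ALU of Figure 1; a 4-input multiplexer controlled by DSEL(1:0) is also shown)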

    3.2.2 Design a CPU datapath for a multi-cycle computer to support your ISA description above.

    Think about the essential functional blocks required in the datapath by the von Neumann Fetch-Decode-Execute cycle.

    You will need to do the design along with your ISA definition. You should develop the sequence (cycle-by-cycle) of how each of the instruction types in your ISA is going to execute through the datapath.
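    Every multi-cycle fetch needs a program counter that can either advance to the next instruction or load a branch/jump target. Below is a minimal sketch with hypothetical entity and port names. It assumes word addressing: each of the 1024 addresses holds one 16-bit word, so the next sequential instruction is at PC + 1. It also assumes single-word instructions. Your ISA may need other target sources, such as a relative-branch adder.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical 10-bit program counter for a word-addressed memory.
entity pc10 is
  port (clk, rst : in  std_logic;
        pc_we    : in  std_logic;                     -- PC write enable
        pc_sel   : in  std_logic;                     -- '0': PC + 1, '1': target
        target   : in  std_logic_vector(9 downto 0);  -- branch/jump address
        pc       : out std_logic_vector(9 downto 0));
end entity pc10;

architecture rtl of pc10 is
  signal r : unsigned(9 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        r <= (others => '0');             -- first instruction assumed at address 0
      elsif pc_we = '1' then
        if pc_sel = '1' then
          r <= unsigned(target);
        else
          r <= r + 1;                     -- wraps from 1023 back to 0
        end if;
      end if;
    end if;
  end process;
  pc <= std_logic_vector(r);
end architecture rtl;
```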

    Along with the ISA specification report described in 3.2.1, you will submit a CPU datapath drawing in 1.5 weeks showing the details of all registers, flip-flops, signals, and signal buses in your CPU, as an extended version of Figure 3.1. It is crucial that this drawing is consistent with your ISA definition. For this lab exercise, you do not need to worry about:

    o Control Unit (You should, however, have a clear, workable idea of how you would design the Control Unit (or FSM) for your datapath.)

    o Instruction and Data Memory: Please show these as abstracted logic blocks in your datapath drawing, assuming a 10-bit one-way address bus and a 16-bit two-way (bi-directional) data bus between your CPU datapath and the Memory Units. We will deal with the control signals associated with the Memory in a later lab exercise.

    3.2.3 Review the Register File design and implementation discussion in Section 3.3.

    3.2.4 Follow a modular design approach to do the design entry in VHDL for all the blocks in your CPU datapath (designed in 3.2.2), including the Register File and any other components you deem necessary. Connect the blocks using the schematic capture tool.

    Note: As your design grows, it becomes necessary to build some organization skills (like doing artwork) to ensure that your schematic is easy for anyone to follow, i.e., you should not have a lot of twisting, twirling, or snaking signals around your blocks. Plan your signal routing to be as straight and clean as possible, similar to the way signals look in a drawing like Figure 3.1. If it helps, build hierarchies by combining multiple blocks under a simpler symbol; for example, combine the registers and multiplexers around the ALU under a single symbol along with the ALU. Your top level schematic should represent the hierarchy of main functions in your CPU datapath.

    3.2.5 Run a functional simulation of your datapath for each instruction in your ISA. Use a relaxed clock frequency when defining your simulation waveforms (e.g., a 50 ns period). Do not add more than two or three instructions to each simulation waveform. Since the Control Unit, Instruction Memory, and Data Memory do not yet exist in your design, enter the control and memory interface signals manually through the waveform editor with the correct timing, as if these blocks existed.

    3.2.6 After you achieve functionality, run timing simulations to ensure that your design does not have timing problems. Before you run the timing simulations, save your top level design under a different name and make appropriate pin assignments, using the available LEDs, 7-segment displays, push-buttons, and toggle-switches on the DE2-70 board efficiently. Do not use the problematic switch SW7 for now. The pin assignments make your timing simulations accurate. Do not leave any unconnected inputs floating in your design (tie them to 0 or 1).

    3.2.7 Submit a preliminary work report when you arrive at the lab (in 3 weeks) that includes:

    an objective statement;

    printouts for all of your VHDL, schematic, and timing simulation files from 3.2.4 and 3.2.5 (do not submit meaningless simulations which do not demonstrate results clearly),

    a summary from the Timing Analyzer showing the maximum clock frequency of your design.

    Note: It is crucial that your simulations demonstrate a healthy design before you come to the scheduled lab session. Valuable lab time and resources should not be used for design work; you will use the lab session for hardware prototyping, debug, and validation. If you cannot demonstrate successful simulations in your preliminary work, you may not be allowed to perform this lab.


    3.3 Register Files

    3.3.1 Overview

    A register file (RF) is the central storage of a CPU. Most operations involve using or modifying data stored in the register file. The register file depicted in Figure 3.2 has 4 locations (00, 01, 10, 11), each 4 bits wide, so it is a 4x4-bit RF. An RF can use either a latch or a flip-flop as its 1-bit cell. In this example, each cell of the register file is a D-FF with synchronous RST, so each register is one row of the 2-dimensional D-FF array shown in the figure. Each of the four registers has a clock input (positive edge triggered), a data input, and a data output. The registers are managed by one 2-to-4 decoder at the input stage and two 4-to-1 multiplexers at the output stage. The register file has two output ports (PortA and PortB), and the output of each register is connected to both multiplexers. The two registers to be read at the output ports are selected through SrcAdrA and SrcAdrB. The register to be written is selected through the DstAdr input to the 2-to-4 decoder. The Write-Enable (WE) signal gates the output of the 2-to-4 decoder, enabling the RF for writing. The enabled decoder line is further gated with CLK, so a write happens only on the rising edge of the clock. The read operation is asynchronous: DataOutA and DataOutB are available as soon as SrcAdrA and SrcAdrB, respectively, are set.

    Figure 3.2. A Register File block diagram

    3.3.2 Implementing Variable Length Arrays in VHDL

    VHDL allows the generated hardware size (e.g., the number of bits in a register or the number of registers in a register file) to be controlled through a parameterized description. This makes the VHDL code flexible: the hardware can be scaled easily by changing the values of a few parameters declared in generic statements. Figure 3.3 contains an example of a variable-sized register coded for RF applications. The generic integer parameter w controls the size of the register (set to 4 bits in this case), so the number of bits stored by the register can be changed by modifying the 4th line of the code. Note the use of the loop variable i in the body of the code without any prior declaration.

    Used as the index of a for loop, as in a high-level programming language, this variable allows the flip-flops in the register to be generated one bit at a time.


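    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    entity RFregister is
    generic (w : INTEGER := 4);
    Port ( RST, LOAD : in STD_LOGIC;
           clk : in STD_LOGIC;
           D : in STD_LOGIC_VECTOR(w-1 DOWNTO 0);
           O : out STD_LOGIC_VECTOR(w-1 DOWNTO 0));
    end RFregister;

    architecture Behavioral of RFregister is
    begin
        p1: process(clk)
        begin
            if (clk'EVENT AND clk='1') then
                -- i is the implicitly declared loop index; one D-FF per bit
                for i in w-1 downto 0 loop
                    if RST='1' then
                        O(i) <= '0';        -- synchronous reset
                    elsif LOAD='1' then
                        O(i) <= D(i);       -- load this bit
                    end if;                 -- otherwise the bit keeps its value
                end loop;
            end if;
        end process p1;
    end Behavioral;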


    architecture Behavioral of RF is
        -- following are defined for the outputs of the D-FF array
        signal tmpq: std_logic_vector(w*l-1 downto 0);
        -- following are the load signals for individual registers
        signal load: std_logic_vector(l-1 downto 0);
        component RFregister is
            generic (w : INTEGER := 4);
            Port ( RST, LOAD : in STD_LOGIC;
                   clk : in STD_LOGIC;
                   D : in STD_LOGIC_VECTOR(w-1 DOWNTO 0);
                   O : out STD_LOGIC_VECTOR(w-1 DOWNTO 0));
        end component;
    begin
        -- Generation of correct number of registers:
        genreg1: for i in l-1 downto 0 generate begin
            registers: RFregister
                generic map (w => w)   -- pass the RF word width to each register
                port map(
                    RST => RST,
                    LOAD => load(i),
                    clk => clk,
                    D => DataIn(w-1 DOWNTO 0),
                    O => tmpq((i+1)*w-1 DOWNTO i*w)
                );
        end generate genreg1;
        -- Parameterized DstAdr Decoder:
        -- (conv_integer on a std_logic_vector requires IEEE.STD_LOGIC_UNSIGNED)
        p1: process (WE, DstAdr) begin
            for i in l-1 downto 0 loop
                if ((i=conv_integer('0'&DstAdr)) AND (WE='1')) then
                    load(i) <= '1';
                else
                    load(i) <= '0';
                end if;
            end loop;
        end process p1;
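    For comparison with the generate-based listing above, here is a compact, self-contained behavioral sketch of a complete register file. It includes the entity declaration and the two read multiplexers. It does not use the manual's structural coding style. The entity name RF_sketch and the generic a (the number of address bits, giving 2**a registers) are assumptions. The write port behaves like the decoder-plus-WE scheme of Figure 3.2: a register is written only on a rising clk edge when WE = 1. Both read ports are asynchronous multiplexers. With the defaults w = 8 and a = 3, it has the 8x8-bit shape of the RF in Figure 3.1.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical behavioral register file: 2**a registers of w bits,
-- one synchronous write port and two asynchronous read ports.
entity RF_sketch is
  generic (w : integer := 8;    -- bits per register
           a : integer := 3);   -- address bits -> 2**a registers
  port (clk, RST, WE              : in  std_logic;
        DstAdr, SrcAdrA, SrcAdrB  : in  std_logic_vector(a-1 downto 0);
        DataIn                    : in  std_logic_vector(w-1 downto 0);
        DataOutA, DataOutB        : out std_logic_vector(w-1 downto 0));
end entity RF_sketch;

architecture rtl of RF_sketch is
  type reg_array is array (0 to 2**a - 1) of std_logic_vector(w-1 downto 0);
  signal regs : reg_array := (others => (others => '0'));
begin
  -- Write port: the DstAdr decoder and WE gating become an indexed assignment
  process (clk)
  begin
    if rising_edge(clk) then
      if RST = '1' then
        regs <= (others => (others => '0'));   -- synchronous reset of all cells
      elsif WE = '1' then
        regs(to_integer(unsigned(DstAdr))) <= DataIn;
      end if;
    end if;
  end process;

  -- Read ports: combinational multiplexers (asynchronous read)
  DataOutA <= regs(to_integer(unsigned(SrcAdrA)));
  DataOutB <= regs(to_integer(unsigned(SrcAdrB)));
end architecture rtl;
```

    Because the reads are asynchronous, the synthesizer should build this from logic-element registers rather than an on-chip memory block. That matches the D-FF array of Figure 3.2.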

    3.4 EXPERIMENTAL WORK

    CPU Datapath Validation

    i) Program the version of the design with pin assignments into the Cyclone II FPGA on the DE2-70.

    ii) Run through the combination of tests you simulated to verify the functionality of your CPU. Ensure each instruction in your ISA is covered. Debug any problems you run into, and note any discrepancies with the simulations.

    iii) Demonstrate your validation results to the lab instructor.

    iv) After achieving functionality with your design, use the ALTPLL megafunction in the Altera component library to replace your clock input with a variable-frequency clock. Check with the instructor on how to do this. Validate the maximum frequency your design can achieve at room temperature.


    EXPERIMENT #4 & #5

    MULTI-CYCLE CPU DESIGN w/ SPLIT INSTRUCTION AND DATA MEMORY

    4.1 OBJECTIVE

    You will complete and integrate a few missing building blocks of your multi-cycle CPU in this lab exercise using a split-memory architecture:

    a. Control Unit and Front Panel;

    b. Instruction Memory;

    c. Data Memory.

    You will build and test the whole of the CPU within two lab sessions by running each of the instructions in your ISA (from Experiment 3) after broadside loading the Instruction Memory during the programming of the Cyclone II FPGA on the DE2-70 Development board.

    4.2 PRELIMINARY WORK

    4.2.1 Control Unit and Front Panel

    Define and design a hardwired (not microcoded) Control Unit based on the register transfer sequences you identified for each instruction during the datapath and ISA design phase.

    Remember the average CPI performance of your design will directly depend on the number of FSM states reserved for each instruction type. Try to minimize the CPI.

    Depending on your prior design of the ISA and datapath, you may have some instructions (e.g., multiplication) that take many distinct cycles (or FSM states) to complete. One strategy is to first design and validate the control for these instructions as separate FSMs based on previous labs, and integrate them into the main FSM afterwards.

    Before you can complete the FSM, you will need to make sure the memory interface has the correct timing. Go through the next two sections to generate the Instruction and Data Memory modules and ensure their timing is consistent with the Control Unit outputs.

    Implement the user accessible inputs shown in Table 4.1 at the interface to the Control Unit in order to facilitate the start, stop, and debug of your machine.

    Table 4.1. Front Panel Inputs

    RUN (toggle switch): When RUN=1, fetch the instruction from the Instruction Memory address held in the PC (e.g. address 0), and continue the FETCH-DECODE-EXECUTE cycle for the rest of the program. When RUN=0, pause execution after the current instruction completes. When RUN is reasserted, execution should continue from the next instruction, whose address is in the PC.

    CLR/INIT (push-button): Initialize the PC to its default value (the location of the first instruction in the Instruction Memory), and clear the machine state (reset all architectural and user registers).

    AUTO(0)/MANUAL(1) (toggle switch): In MANUAL mode, switch the clock input over to a push-button on the DE2-70 board. In AUTO mode, use a free-running clock, either directly from the board or from the output of a Phase-Locked Loop (PLL) integrated into your design.

    MAN_CLK (push-button): Clock used only in MANUAL mode.
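    The RUN semantics above amount to sampling RUN only at instruction boundaries. A minimal behavioral sketch in Python is shown below, assuming a hypothetical FSM with just FETCH, DECODE, EXECUTE, and IDLE states (your design will have more states per instruction); the state names and the one-cycle IDLE-to-FETCH transition are illustrative choices, not requirements.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    FETCH = 1
    DECODE = 2
    EXECUTE = 3


class FrontPanelFSM:
    """Behavioral model of RUN and CLR/INIT handling (hypothetical states)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # CLR/INIT: PC back to its default address, FSM back to IDLE.
        self.pc = 0
        self.state = State.IDLE
        self.completed = 0  # number of instructions finished

    def clock(self, run, clr=False):
        """Advance one clock edge given the RUN switch and CLR/INIT button."""
        if clr:
            self.reset()
            return
        if self.state is State.IDLE:
            # A new instruction starts only while RUN is asserted.
            if run:
                self.state = State.FETCH
        elif self.state is State.FETCH:
            self.pc += 1  # PC now points at the next instruction
            self.state = State.DECODE
        elif self.state is State.DECODE:
            self.state = State.EXECUTE
        elif self.state is State.EXECUTE:
            self.completed += 1
            # RUN is sampled only here, at the instruction boundary, so
            # RUN=0 never interrupts an instruction midway.
            self.state = State.FETCH if run else State.IDLE


if __name__ == "__main__":
    fsm = FrontPanelFSM()
    # RUN goes low while the first instruction is being decoded.
    for run in [True, True, False, False, False]:
        fsm.clock(run)
        print(fsm.state.name, "pc =", fsm.pc)
```

    Even though RUN drops during DECODE, the model still finishes the instruction and then waits in IDLE with the PC pointing at the next instruction, which is the resume point when RUN is reasserted.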


    Implement the user-visible outputs shown in Table 4.2 at the interface to the Control Unit to make debugging your machine easier.

    Table 4.2. Front Panel Outputs

    RUN Indicator (green LED): On when the machine is in the RUNning state; off when RUN=0.

    CLR/INIT Indicator (green LED): Turns on when the CLR/INIT button is pressed and turns back off when the button is released.

    AUTO/MANUAL Indicator (green LED): On in AUTO mode and off in MANUAL mode.

    Critical Debug Signals (red LEDs): Bring each of the following signals out to an LED to ease debugging: PC Write Enable, IR Write Enable, Register File Write Enable, Register A and Q Write Enables, and Data Memory Write Enable (plus any other important signals you would like to monitor).

    Control State (7-segment display): Decode the present state of your FSM and display it as hexadecimal digits on the DE2-70 board.
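    For the Control State display, each hexadecimal digit of the state number must be mapped to seven segment-enable bits. The Python sketch below generates that lookup table, which you could then transcribe into a VHDL case statement. It assumes the bit order gfedcba (bit 0 = segment a) and active-low segment inputs; check the DE2-70 user manual for the actual polarity and pin assignments before relying on it.

```python
# Segment patterns for hex digits 0-F, active-high, bit 0 = a ... bit 6 = g.
#      a
#    f   b
#      g
#    e   c
#      d
HEX_SEGMENTS = [
    0x3F, 0x06, 0x5B, 0x4F,  # 0 1 2 3
    0x66, 0x6D, 0x7D, 0x07,  # 4 5 6 7
    0x7F, 0x6F, 0x77, 0x7C,  # 8 9 A b
    0x39, 0x5E, 0x79, 0x71,  # C d E F
]


def digit_to_segments(digit, active_low=True):
    """Return the 7-bit segment code for one hex digit (0-15)."""
    if not 0 <= digit <= 15:
        raise ValueError("digit must be 0..15")
    code = HEX_SEGMENTS[digit]
    # Invert for displays whose segments light up on a logic 0 (assumed).
    return (~code) & 0x7F if active_low else code


def state_to_displays(state, num_digits=2, active_low=True):
    """Split an FSM state number into hex digits, least significant first."""
    return [digit_to_segments((state >> (4 * i)) & 0xF, active_low)
            for i in range(num_digits)]


if __name__ == "__main__":
    for d in range(16):
        print(f"{d:X}: {digit_to_segments(d):07b}")
```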

    4.2.2 Instruction Memory

    The Instruction Memory will be implemented as a 2 kB ROM with 1024 words of 16 bits each, as described in Experiment 3. The ROM can be programmed either while uploading the design bit file to the Cyclone II FPGA, or during an interactive debug session using the In-System Memory Content Editor under the Quartus II Tools menu. Follow these steps to instantiate a ROM in your top-level schematic design:

    a. Double-click an empty area of the schematic to launch the library menu.

    b. Pick the altsyncram component under quartus/libraries/megafunctions/storage (or just type its name to find it).

    c. Pick VHDL as the output file type in the first window of the MegaWizard Plug-In Manager. Make sure the output file will be created in your own project directory.

    d. Next window: Pick "With one read port (ROM mode)" at the top of the menu. Also indicate that you want to specify the memory size "As a number of words" in the menu below.

    e. Next window: Select 16 bits for the width of the read/write ports and 1024 for the number of 16-bit words. You can leave "Auto" for the memory block type and maximum block depth.

    f. Next window: Select "Single clock" for the clocking mode.

    g. Next window: Deselect any ports to be registered; you want the minimum amount of registering so that a read operation does not take many clock cycles. The read input ports are registered by default, and you cannot change this because of the internal memory architecture of the Cyclone II FPGA. No clock enable or clear signals are needed, although you can add them if you prefer.

    h. Next window: Pick "Yes, use this file for the memory content data". In the space provided, type a sensible name, with a .mif suffix, for the file that will store your program in hexadecimal format, e.g. test_program_instructions.mif. Also check the option at the bottom, "Allow In-System Memory Content Editor to capture and update content", and enter a short word for the Instance ID (e.g. rom1), which you will use later in the In-System Memory Content Editor to refer to this particular ROM.

    i. You do not have to enter anything in the remaining windows; just finish creating the block. Once you are done, you will find a .JPEG file generated in your project directory that specifies the pin timing of the memory module you have just created. Study this file to make sure the timing of the input and output signals with respect to the clock matches your expectations. If it does not, you may need to modify your Control FSM slightly.

    j. Connect the ROM as the Instruction Memory to the rest of your design.

    You can modify the contents to be programmed into the ROM by opening the .mif file created in your project directory in a text editor such as WordPad. The contents of this file are built into the bit file used to program the FPGA when the design is compiled. You can also use the same file to modify the ROM contents with the In-System Memory Content Editor while the system is running on the DE2-70 board.
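    Editing the .mif file by hand gets tedious as programs grow. A minimal Python sketch that produces a Memory Initialization File from a list of 16-bit words is shown below; the function names are illustrative, the file uses hexadecimal addresses and data, and any unused addresses are filled with zero using the MIF address-range syntax.

```python
def mif_text(words, depth=1024, width=16):
    """Return the contents of a Quartus .mif file (hex addresses and data)."""
    if len(words) > depth:
        raise ValueError("program does not fit in memory")
    mask = (1 << width) - 1
    digits = (width + 3) // 4  # hex digits per data word
    lines = [
        f"WIDTH={width};",
        f"DEPTH={depth};",
        "ADDRESS_RADIX=HEX;",
        "DATA_RADIX=HEX;",
        "",
        "CONTENT BEGIN",
    ]
    for addr, word in enumerate(words):
        # Masking turns negative values into two's complement words.
        lines.append(f"    {addr:X} : {word & mask:0{digits}X};")
    start = len(words)
    zero = f"{0:0{digits}X}"
    if start == depth - 1:
        lines.append(f"    {start:X} : {zero};")
    elif start < depth - 1:
        # Fill the unused addresses with zeros using a single range entry.
        lines.append(f"    [{start:X}..{depth - 1:X}] : {zero};")
    lines.append("END;")
    return "\n".join(lines) + "\n"


def write_mif(path, words, depth=1024, width=16):
    """Write the .mif text to a file, e.g. test_program_instructions.mif."""
    with open(path, "w") as f:
        f.write(mif_text(words, depth, width))


if __name__ == "__main__":
    # Hypothetical machine words; replace with your assembler output.
    print(mif_text([0x1205, 0x143D], depth=8))
```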

    4.2.3 Data Memory

    The Data Memory will be implemented as a 2 kB static RAM with 1024 words of 16 bits each, as described in Experiment 3. The steps required to create the RAM are very similar to those followed above for the ROM, except that a write port is required. Only the steps that change are listed below.

    b. Again, use the altsyncram component from the megafunction libraries.

    d. Pick "With one read/write port (Single-port mode)" in the general parameter settings menu. Also indicate that you want to specify the memory size "As a number of words" in the menu below.

    e. Next window: Select 16 bits for the width of the read/write ports and 1024 for the number of 16-bit words. You can leave "Auto" for the memory block type and maximum block depth.

    h. Next window: Pick "Yes, use this file for the memory content data". In the space provided, type a file name different from the one you used for the ROM. You can use this .mif file to initialize the Data Memory with the desired data. Remember to check the option at the bottom, "Allow In-System Memory Content Editor to capture and update content", and enter a short word for the Instance ID (e.g. ram1), which you will use later in the In-System Memory Content Editor to refer to this particular RAM.

    i. You do not have to enter anything in the remaining windows; just finish creating the block.

    j. Connect the RAM as the Data Memory to the rest of your design.

    You can modify the initial contents of the RAM by editing the .mif file created in your project directory with a text editor such as WordPad. You can also use the same file to modify the RAM contents during a debug session with the In-System Memory Content Editor.
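    Because the memory input ports are registered (see step g of the ROM instructions), a read returns its data only after the clock edge that captures the address. The Python sketch below is a cycle-level approximation of a single-port RAM with registered inputs and an unregistered output, useful for reasoning about where your Control FSM must wait for memory data. It is a simplification: it shows the newly written data on a write cycle, whereas the real read-during-write behavior depends on the options selected in the MegaWizard.

```python
class SyncRam:
    """Cycle model: inputs captured on the rising edge, output not registered."""

    def __init__(self, depth=1024, width=16):
        self.mem = [0] * depth
        self.mask = (1 << width) - 1
        self.addr_reg = 0

    @property
    def q(self):
        # The output follows the *registered* address, not the live input.
        return self.mem[self.addr_reg]

    def rising_edge(self, address, data=0, wren=False):
        """Capture the inputs that were set up before this clock edge."""
        if wren:
            self.mem[address] = data & self.mask
        self.addr_reg = address


if __name__ == "__main__":
    ram = SyncRam()
    ram.mem[5] = 0x00AB
    # Cycle 1: the FSM drives address 5, but q still reflects the old address.
    print(hex(ram.q))
    ram.rising_edge(address=5)
    # Cycle 2: after the edge, the data is valid and can be latched by a register.
    print(hex(ram.q))
```

    In FSM terms, a load needs one state that drives the address and at least one following state in which the register capturing the memory output is enabled.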

    Special Note: Cyclone II has a problem involving the M4K memory blocks and the In-System Memory Content Editor that prevents your design from compiling once you add a RAM component configured as in step h above. To work around it, use a text editor to add the following line to the very end of your project .QSF file (in your project directory), and save the file before running compilation:

    set_global_assignment -name CYCLONEII_M4K_COMPATIBILITY OFF

    As your design gets larger, it becomes more important to modularize its parts (e.g. the memory subsystem, the RF and ALU subsystem, the control subsystem) so that your top-level schematic stays readable. There is a tradeoff: some internal signals lose visibility at the top level once you push them into modules, so debugging takes a bit more work when you need to look at those signals.


    4.3 EXPERIMENTAL WORK

    Preliminary Work Report

    i) Submit a report containing your simulations of:

    a. The Control Unit, showing the state transitions for each instruction type.

    b. Full simulations of the following kind, showing critical control and data signals to demonstrate functionality (fully annotate the simulation waveforms to explain them):

    1. Load immediate values into two registers.
    2. Do an arithmetic operation between the two registers and store the result in a register.
    3. Store the register value to a memory address.
    4. Load from the same memory address into another register.
    5. Run a conditional and an unconditional branch (jump).

    ii) Also be ready to submit a soft copy of your full design with the simulation waveform files.

    Multi-Cycle CPU Validation

    i) First, use the Front Panel features to validate your Control Unit: store each of your opcodes in the Instruction Memory and step through the FSM cycle by cycle in MANUAL clocking mode to ensure all state transitions are as expected. Also watch the LEDs to ensure critical control signals are asserted in the expected FSM states. It is very important that your PC is initialized to a known address (e.g. 0x0000) at startup, because this determines the address of the first instruction fetched from the Instruction Memory.

    ii) Write simple code segments to load some registers, run arithmetic/logic operations on them, and store the results in the Data Memory. You can check the final state of the Data Memory in the lab using the In-System Memory Content Editor.

    iii) Validate the more complex instructions that take many clock cycles to execute, such as multiplication, division, and shift.

    iv) Finally, write program segments to validate the conditional and unconditional branches/jumps in your ISA.

    v) Demonstrate each of your instruction types to your lab instructor.


    EXPERIMENT #6 & #7: FINAL

    PROGRAMMING AND VALIDATION OF A CPU

    6.1 OBJECTIVE

    In this experiment you will complete a set of macro-code programs for the CPU you designed in the previous labs. You are expected to write an assembler that converts your mnemonic-based assembly language into binary machine code (similar to one of the assignments you completed last semester for the MIPS architecture), since programming directly in machine language would be too cumbersome. You will upload your programs to the Instruction Memory one at a time and execute them to validate the full functionality of your CPU.
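    Because every team's ISA is different, the following is only a sketch of a two-pass assembler for a hypothetical 16-bit format: a 4-bit opcode, two 3-bit register fields, and a 6-bit immediate/offset field. The mnemonics, opcodes, field layout, and the rule that labels sit on their own line are all invented for illustration; replace them with your own instruction formats from Experiment 3. The first pass records label addresses, and the second pass encodes each instruction, turning branch targets into offsets relative to the next instruction.

```python
# Hypothetical ISA: word = opcode[15:12] | ra[11:9] | rb[8:6] | imm[5:0]
# Operand kinds: r = register, i = signed immediate, a = data address,
# l = label, encoded as an offset from the next instruction (PC + 1).
ISA = {
    "LI":  (0x1, "ri"),
    "ADD": (0x2, "rr"),
    "SUB": (0x3, "rr"),
    "LD":  (0x4, "ra"),
    "ST":  (0x5, "ra"),
    "BEQ": (0x6, "rrl"),
    "JMP": (0x7, "l"),
}


def parse(source):
    """Pass 1: strip comments, record label addresses, collect instructions."""
    labels, program = {}, []
    for line in source.splitlines():
        line = line.split(";")[0].strip()
        if not line:
            continue
        if line.endswith(":"):  # a label on its own line
            labels[line[:-1]] = len(program)
            continue
        mnemonic, *rest = line.split(None, 1)
        operands = [op.strip() for op in rest[0].split(",")] if rest else []
        program.append((mnemonic.upper(), operands))
    return labels, program


def field(value, bits, signed):
    """Range-check a value and truncate it to a bits-wide field."""
    lo, hi = ((-(1 << (bits - 1)), (1 << (bits - 1)) - 1) if signed
              else (0, (1 << bits) - 1))
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value & ((1 << bits) - 1)


def assemble(source):
    """Pass 2: encode every instruction into a 16-bit machine word."""
    labels, program = parse(source)
    words = []
    for pc, (mnemonic, operands) in enumerate(program):
        if mnemonic not in ISA:
            raise ValueError(f"unknown mnemonic {mnemonic}")
        opcode, kinds = ISA[mnemonic]
        if len(operands) != len(kinds):
            raise ValueError(f"instruction {pc}: {mnemonic} expects {len(kinds)} operands")
        regs, imm = [], 0
        for kind, text in zip(kinds, operands):
            if kind == "r":
                regs.append(field(int(text.lstrip("rR")), 3, signed=False))
            elif kind == "i":
                imm = field(int(text, 0), 6, signed=True)
            elif kind == "a":
                imm = field(int(text, 0), 6, signed=False)
            else:  # "l"
                imm = field(labels[text] - (pc + 1), 6, signed=True)
        regs += [0] * (2 - len(regs))
        words.append((opcode << 12) | (regs[0] << 9) | (regs[1] << 6) | imm)
    return words


# A test program following the five preliminary-report steps.
PROGRAM = """
    LI  r1, 5        ; 1. load immediates
    LI  r2, -3
    ADD r1, r2       ; 2. r1 <- r1 + r2
    ST  r1, 10       ; 3. store r1 to data address 10
    LD  r3, 10       ; 4. load it back into r3
    BEQ r1, r3, done ; 5. conditional branch (taken)
    JMP done         ;    unconditional jump
done:
    JMP done         ; loop forever
"""

if __name__ == "__main__":
    for addr, word in enumerate(assemble(PROGRAM)):
        print(f"{addr:X} : {word:04X};")
```

    The printed lines use the same "address : data;" form as the CONTENT section of a .mif file, so the output can be pasted into the Instruction Memory initialization file.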

    6.2 PRELIMINARY WORK

    Write code to execute each of the following tasks using your CPU.

    6.2.1 n-long Array Computation

    You will be provided with Data Memory entries like those in Table 6.1. The first entry, n, is the array length. The remaining entries hold the arrays A, B, and C, each n memory words long. Your program will read the three input arrays, perform the following computation on their elements, which are 8-bit signed integers, and store the result array Z back to the Data Memory starting at the next available entry:

    Z[i] = A[i] + B[i] - C[i], for i = 0, 1, ..., n-1

    Table 6.1. Data memory entries for the n×1 array computation

    Address   Entry     Address   Entry     Address   Entry     Address   Entry
    0         n         -         -         -         -         -         -
    1         A[0]      n+1       B[0]      2n+1      C[0]      3n+1      Z[0]
    2         A[1]      n+2       B[1]      2n+2      C[1]      3n+2      Z[1]
    ...       ...       ...       ...       ...       ...       ...       ...
    n         A[n-1]    2n        B[n-1]    3n        C[n-1]    4n        Z[n-1]
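    Before running the program on the board, it helps to have a reference model that produces both the input memory image and the expected result. The Python sketch below builds the Table 6.1 layout and fills in Z. It assumes each 8-bit signed element occupies one 16-bit word and that Z[i] is stored as a 16-bit two's complement value, since A[i] + B[i] - C[i] ranges from -383 to 382 and may not fit in 8 bits; adjust this if your design keeps only 8 bits.

```python
def to_word(value, width=16):
    """Encode a signed integer as a two's complement memory word."""
    return value & ((1 << width) - 1)


def from_word(word, width=8):
    """Interpret the low `width` bits of a memory word as a signed integer."""
    word &= (1 << width) - 1
    return word - (1 << width) if word & (1 << (width - 1)) else word


def build_memory(a, b, c):
    """Data Memory image in the Table 6.1 layout, with Z entries zeroed."""
    n = len(a)
    if len(b) != n or len(c) != n:
        raise ValueError("A, B and C must have the same length")
    return [n] + [to_word(x) for x in a + b + c] + [0] * n


def fill_expected_z(mem):
    """Compute Z[i] = A[i] + B[i] - C[i] and store it at 3n+1 ... 4n."""
    n = mem[0]
    a = [from_word(w) for w in mem[1:n + 1]]
    b = [from_word(w) for w in mem[n + 1:2 * n + 1]]
    c = [from_word(w) for w in mem[2 * n + 1:3 * n + 1]]
    z = [x + y - w for x, y, w in zip(a, b, c)]
    for i, value in enumerate(z):
        mem[3 * n + 1 + i] = to_word(value)  # 16-bit two's complement (assumed)
    return z


if __name__ == "__main__":
    mem = build_memory([1, 127, -128], [2, 127, -128], [3, -128, 127])
    print(fill_expected_z(mem))
    print([f"{w:04X}" for w in mem])
```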

    6.2.2 Text Parser

    This program will read ASCII-encoded text from the Data Memory starting at address 2, count the alphanumeric and space characters (backspace, tab, line feed, form feed, carriage return), and replace any run of consecutive identical space characters with a single space character. The program will stop when it reaches a null character in the text, and will then store the number of alphanumeric characters detected at Data Memory address 0 and the number of space characters at address 1.
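    The specification leaves a few details open, so the Python reference model below fixes them by assumption: one ASCII character per 16-bit word; a whitespace set made of the characters listed above plus the ordinary space (0x20); counts taken on the original text before collapsing; a run of identical whitespace characters reduced to one copy of that character; and the compacted text written back in place from address 2 with a new null terminator. Change any of these to match the interpretation you agree on with your instructor.

```python
# Assumed whitespace set: the ordinary space plus the characters listed above.
SPACE_CHARS = {0x20, 0x08, 0x09, 0x0A, 0x0C, 0x0D}


def is_alnum(ch):
    """ASCII digit, upper-case letter, or lower-case letter."""
    return 0x30 <= ch <= 0x39 or 0x41 <= ch <= 0x5A or 0x61 <= ch <= 0x7A


def parse_text(mem, start=2):
    """Count characters and collapse repeated whitespace, in place."""
    alnum = spaces = 0
    read = write = start
    prev = None
    while mem[read] != 0:  # stop at the null terminator
        ch = mem[read]
        if is_alnum(ch):
            alnum += 1
        elif ch in SPACE_CHARS:
            spaces += 1
        # Keep the character unless it repeats the previous whitespace char.
        if not (ch in SPACE_CHARS and ch == prev):
            mem[write] = ch
            write += 1
        prev = ch
        read += 1
    mem[write] = 0  # new null terminator after the compacted text
    mem[0], mem[1] = alnum, spaces
    return mem


if __name__ == "__main__":
    mem = [0, 0] + [ord(c) for c in "ab  c\t\td"] + [0]
    parse_text(mem)
    print(mem[0], mem[1], repr(bytes(mem[2:mem.index(0, 2)]).decode()))
```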


    6.2.3 Multiplication

    The program will read two 8-bit signed numbers from Data Memory addresses 0 and 1, multiply them, and store the result to address 2.
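    The product of two 8-bit signed numbers always fits in 16 bits (the extremes are -128 × -128 = 16384 and -128 × 127 = -16256), so a single data word at address 2 can hold it. If your ISA has no multiply instruction, one option is to implement radix-2 Booth's algorithm from Experiment 1 in software. The Python reference model below follows that algorithm step by step, using the textbook A, Q, and Q-1 registers with an arithmetic right shift; the register names are the usual textbook ones, not necessarily those of your datapath. A is given one extra guard bit so that a multiplicand of -128, whose negation does not fit in 8 bits, is still handled correctly.

```python
def booth_multiply(multiplicand, multiplier, n=8):
    """Radix-2 Booth multiplication of two n-bit signed integers."""
    w = n + 1                        # A gets one guard bit for M = -2**(n-1)
    a_mask = (1 << w) - 1
    q_mask = (1 << n) - 1
    m = multiplicand & a_mask        # M sign-extended to n+1 bits
    a, q, q_1 = 0, multiplier & q_mask, 0
    for _ in range(n):
        pair = (q & 1, q_1)
        if pair == (1, 0):           # start of a run of 1s: A <- A - M
            a = (a - m) & a_mask
        elif pair == (0, 1):         # end of a run of 1s: A <- A + M
            a = (a + m) & a_mask
        # Arithmetic shift right of the combined A:Q:Q-1 register.
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (w - 1)))
    product = (a << n) | q           # (2n+1)-bit two's complement result
    if product & (1 << (w + n - 1)):
        product -= 1 << (w + n)
    return product


if __name__ == "__main__":
    x, y = -128, -128
    p = booth_multiply(x, y)
    print(p, f"stored word = {p & 0xFFFF:04X}")
```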

    6.2.4 Division

    The program will read two 8-bit signed numbers from Data Memory addresses 0 and 1, divide the first number by the second number, and store the result to address 2.
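    The manual does not say how the sign of the quotient or a zero divisor should be handled. The Python sketch below assumes C-style truncation toward zero and treats division by zero as an error. It applies restoring division (shift in a dividend bit, subtract the divisor, and add it back if the partial remainder goes negative) to the magnitudes and then fixes the signs, which maps naturally onto an ALU with shift, add, and subtract operations. Note that -128 / -1 = 128 does not fit in 8 bits but does fit in a 16-bit data word.

```python
def restoring_divide(dividend, divisor, n=8):
    """Signed n-bit division with truncation toward zero (assumed semantics).

    Returns (quotient, remainder); the remainder takes the dividend's sign.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    negative = (dividend < 0) != (divisor < 0)
    num, den = abs(dividend), abs(divisor)  # at most 2**(n-1): fits in n bits
    remainder, quotient = 0, 0
    for i in range(n - 1, -1, -1):
        remainder = (remainder << 1) | ((num >> i) & 1)
        quotient <<= 1
        remainder -= den          # trial subtraction
        if remainder < 0:
            remainder += den      # restore: the divisor did not fit
        else:
            quotient |= 1
    if negative:
        quotient = -quotient
    if dividend < 0:
        remainder = -remainder
    return quotient, remainder


if __name__ == "__main__":
    print(restoring_divide(-7, 2))
    print(restoring_divide(-128, -1))
```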

    6.2.5 Infinite Loop

    Write the shortest possible program in your ISA that loops on itself forever. This program will be used to examine performance and power dissipation.

    6.3 EXPERIMENTAL WORK

    i. Debug and validate each of the above programs, and demonstrate their correct execution, as well as the CPU Front Panel functions outlined in Lab 5, to the lab instructor.

    ii. What is the average CPI for each of your test programs? What is the execution time (CPU time only) for each?
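    Both quantities follow from cycle counts. If a program executes I_k instructions of type k, each taking CPI_k cycles (FSM states), its CPU time at clock frequency f_clk is given below, followed by a worked example that assumes, purely for illustration, 1000 executed instructions, an average CPI of 4, and a 50 MHz clock (use the frequency your design actually runs at in AUTO mode).

```latex
T_{\mathrm{CPU}} = \frac{\text{total clock cycles}}{f_{\mathrm{clk}}}
               = \frac{\sum_k I_k\,\mathrm{CPI}_k}{f_{\mathrm{clk}}}
               = \frac{I \times \mathrm{CPI}_{\mathrm{avg}}}{f_{\mathrm{clk}}},
\qquad I = \sum_k I_k

T_{\mathrm{CPU}} = \frac{1000 \times 4}{50 \times 10^{6}\,\mathrm{Hz}}
               = 8 \times 10^{-5}\,\mathrm{s} = 80\,\mu\mathrm{s}
```

    Count executed (dynamic) instructions, not program lines: a loop body that runs n times contributes n times its instruction count.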

    iii. Check with your lab instructor about measuring the average power consumed during execution of the Infinite Loop test. Do not attempt this yourself, since the measurement requires removing the DE2-70 top lid to access the power lines.

    iv. BONUS: Bonus points will be awarded if you can demonstrate that your machine can handle 16-bit data memory words, i.e. two 8-bit numbers per memory word.
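    For the bonus, the core operations are packing two 8-bit signed values into one 16-bit word and unpacking them with sign extension. A small Python sketch of both is shown below, assuming the first number goes in the high byte; this order is an arbitrary choice, so use whatever order your load and store instructions expect.

```python
def pack(high, low):
    """Pack two 8-bit signed integers into one 16-bit word."""
    for v in (high, low):
        if not -128 <= v <= 127:
            raise ValueError(f"{v} is not an 8-bit signed value")
    return ((high & 0xFF) << 8) | (low & 0xFF)


def sign_extend8(byte):
    """Sign-extend an 8-bit field to a full integer."""
    return byte - 0x100 if byte & 0x80 else byte


def unpack(word):
    """Return the (high, low) signed bytes of a 16-bit word."""
    return sign_extend8((word >> 8) & 0xFF), sign_extend8(word & 0xFF)


if __name__ == "__main__":
    w = pack(-1, 5)
    print(f"{w:04X}", unpack(w))
```

    In hardware, the unpack step corresponds to selecting a byte lane and replicating its bit 7 into the upper 8 bits before the value reaches the register file.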

    6.4 FINAL REPORT

    You will prepare a final report for the semester that contains all the critical information about the CPU you designed during the semester:

    - The CPU name and the design team
    - Objectives and design approach
    - The final ISA and the main instruction formats (entered electronically, not handwritten)
    - The final design files (VHDL and schematic)
    - The code for each of the five programs described above (in assembly and binary)
    - Sample input and output files showing the relevant portions of the data memory before and after each program is run
    - Performance measurements and power estimates
    - Conclusions

    You will submit one copy of the final report by June 1st. Please type it neatly and bind it into a formal report booklet. We recommend that you keep a copy of the report for yourself for future reference.