Machine Structures Lecture 17 – Introduction to CPU Design


Page 1: Machine Structures Lecture 17 –  Introduction to CPU Design

Fedora Core 6 (FC6) just out
The latest version of the distro has been released; they suggest using BitTorrent to get it. Performance improvements and support for Intel-based Macs. (Oh, Apple just upgraded Pros’ CPU to Intel Core 2 Duo.)
fedoraproject.org

Machine Structures
Lecture 17 – Introduction to CPU Design

Page 2: Machine Structures Lecture 17 –  Introduction to CPU Design

Five Components of a Computer

Computer
•Processor
 • Control
 • Datapath
•Memory (passive): where programs and data live when running
•Devices
 • Input: keyboard, mouse
 • Output: display, printer
 • Disk: where programs and data live when not running

Page 3: Machine Structures Lecture 17 –  Introduction to CPU Design

The CPU

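•Processor (CPU): the core of the computer; it does all the work (data manipulation and decision-making)
•Datapath: the part of the processor that performs the operations (the brawn)
•Control: the part of the processor that tells the datapath what to do (the brain)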

Page 4: Machine Structures Lecture 17 –  Introduction to CPU Design

Stages of the Datapath : Overview

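•Problem: a single block that "executes an entire instruction" (everything from instruction fetch onward) would be
 • too bulky
 • inefficient
•Solution: break the process of "executing an instruction" into stages, then connect the stages to create the whole datapath
 • each stage is smaller and therefore easier to design
 • one stage can be optimized without touching the others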

Page 5: Machine Structures Lecture 17 –  Introduction to CPU Design

Stages of the Datapath (1/5)

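•MIPS has many kinds of instructions: what steps do they have in common?
•Stage 1: Instruction Fetch
 • No matter what the instruction is, the 32-bit instruction word must first be fetched from memory (the cache hierarchy may be involved)
 • This is also where we increment the PC (that is, PC = PC + 4, to point to the next instruction; +4 because memory is byte-addressed)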
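The PC update in this stage can be sketched in behavioral Verilog. This is a minimal sketch, not the course's design: the module name `fetch_pc`, its ports, and the synchronous reset are assumptions made for illustration.

```verilog
// Hypothetical program-counter register for the fetch stage.
// On each rising clock edge the PC advances by 4 (one 32-bit
// instruction word in a byte-addressed memory); RST forces it to 0.
module fetch_pc (CLK, RST, PC);
  input         CLK, RST;
  output [31:0] PC;
  reg    [31:0] PC;

  always @(posedge CLK)
    if (RST) PC <= 32'd0;
    else     PC <= PC + 32'd4;
endmodule // fetch_pc
```

Branches and jumps would need an extra input that overrides the +4 path; that is left out here.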

Page 6: Machine Structures Lecture 17 –  Introduction to CPU Design

Stages of the Datapath (2/5)

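•Stage 2: Instruction Decode
 • Once the instruction has been fetched, the next step is to gather data from its fields (decode all the necessary instruction data)
 • First, read the opcode to determine the instruction type and field lengths
 • Second, read in data from all the necessary registers
   • for add, read two registers
   • for addi, read one register
   • for jal, no reads necessary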
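Splitting the instruction word into fields is just wiring, which the following sketch tries to show. The module name `decode_fields` and its port names are hypothetical; the bit positions are the standard MIPS R-, I-, and J-format field boundaries.

```verilog
// Hypothetical decoder that splits a 32-bit MIPS instruction word
// into its fields. Which fields are meaningful depends on the
// format (R, I, or J) implied by the opcode.
module decode_fields (inst, opcode, rs, rt, rd, shamt, funct, imm, target);
  input  [31:0] inst;
  output [5:0]  opcode, funct;
  output [4:0]  rs, rt, rd, shamt;
  output [15:0] imm;
  output [25:0] target;

  assign opcode = inst[31:26]; // all formats
  assign rs     = inst[25:21]; // R- and I-format
  assign rt     = inst[20:16]; // R- and I-format
  assign rd     = inst[15:11]; // R-format only
  assign shamt  = inst[10:6];  // R-format only
  assign funct  = inst[5:0];   // R-format only
  assign imm    = inst[15:0];  // I-format only
  assign target = inst[25:0];  // J-format only
endmodule // decode_fields
```

All fields are extracted in parallel; the controller decides from `opcode` (and `funct`) which of them are actually used.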

Page 7: Machine Structures Lecture 17 –  Introduction to CPU Design

Stages of the Datapath (3/5)

•Stage 3: ALU (Arithmetic-Logic Unit)
 • The real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt)
 • What about loads and stores?
   • lw $t0, 40($t1)
   • the memory address to access = the value in $t1 + 40
   • so we do this addition in this stage
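A behavioral ALU covering a few of these operations might look like the sketch below. The module name `alu_sketch` and the 3-bit `op` encoding are invented for illustration and are not the MIPS funct codes; multiply, divide, and shifts are omitted. `$signed` requires Verilog-2001.

```verilog
// Hypothetical ALU sketch. The op encoding is an assumption.
module alu_sketch (op, a, b, result);
  input  [2:0]  op;
  input  [31:0] a, b;
  output [31:0] result;
  reg    [31:0] result;

  always @(op or a or b)
    case (op)
      3'd0:    result = a + b;  // add; also lw/sw address = base + offset
      3'd1:    result = a - b;  // sub
      3'd2:    result = a & b;  // and
      3'd3:    result = a | b;  // or
      3'd4:    result = ($signed(a) < $signed(b)) ? 32'd1 : 32'd0; // slt
      default: result = 32'd0;
    endcase
endmodule // alu_sketch
```

For `lw $t0, 40($t1)` the controller would select the add operation with `a` = the value of `$t1` and `b` = 40.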

Page 8: Machine Structures Lecture 17 –  Introduction to CPU Design

Stages of the Datapath (4/5)

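•Stage 4: Memory Access
 • Actually, only the load and store instructions do anything during this stage; the others remain idle or skip it altogether
 • Since loads and stores need this step, a dedicated stage is required to handle them
 • Thanks to the cache system, this stage is expected to be fast
 • Without caches, this stage would be slow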

Page 9: Machine Structures Lecture 17 –  Introduction to CPU Design

Stages of the Datapath (5/5)

•Stage 5: Register Write
 • Most instructions write the result of some computation into a register
 • Examples: arithmetic, logical, shifts, loads, slt
 • What about stores, branches, jumps?
   • they don’t write anything into a register at the end
   • they remain idle during this fifth stage or skip it altogether

Page 10: Machine Structures Lecture 17 –  Introduction to CPU Design

Generic Steps of Datapath

[Figure: generic datapath, left to right: PC (with +4 adder), instruction memory, registers (rs, rt, rd) and imm, ALU, data memory]

1. Instruction Fetch   2. Decode/Register Read   3. Execute   4. Memory   5. Reg. Write

Page 11: Machine Structures Lecture 17 –  Introduction to CPU Design

Datapath Walkthroughs (1/3)

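•add $r3,$r1,$r2 # r3 = r1+r2
 • Stage 1: fetch this instruction, increment PC
 • Stage 2: decode to find that it is an add, then read registers $r1 and $r2
 • Stage 3: add the two values retrieved in Stage 2
 • Stage 4: idle (nothing to read from or write to memory)
 • Stage 5: write the result of Stage 3 into register $r3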

Page 12: Machine Structures Lecture 17 –  Introduction to CPU Design

Example: add Instruction

[Figure: datapath executing add r3, r1, r2. The register file reads reg[1] and reg[2]; the ALU computes reg[1]+reg[2]; the result goes back to register 3.]

Page 13: Machine Structures Lecture 17 –  Introduction to CPU Design

Datapath Walkthroughs (2/3)

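•slti $r3,$r1,17
 • Stage 1: fetch this instruction, increment PC
 • Stage 2: decode to find that it is an slti, then read register $r1
 • Stage 3: compare the value retrieved in Stage 2 with the integer 17
 • Stage 4: idle
 • Stage 5: write the result of Stage 3 into register $r3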

Page 14: Machine Structures Lecture 17 –  Introduction to CPU Design

Example: slti Instruction

[Figure: datapath executing slti r3, r1, 17. The register file reads reg[1]; imm supplies 17; the ALU computes reg[1]<17?; the result goes back to register 3.]

Page 15: Machine Structures Lecture 17 –  Introduction to CPU Design

Datapath Walkthroughs (3/3)

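•sw $r3, 17($r1)
 • Stage 1: fetch this instruction, increment PC
 • Stage 2: decode to find that it is a sw, then read registers $r1 and $r3
 • Stage 3: add 17 to the value in register $r1 (retrieved in Stage 2)
 • Stage 4: write the value in register $r3 (retrieved in Stage 2) to the memory address computed in Stage 3
 • Stage 5: idle (nothing to write into a register)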

Page 16: Machine Structures Lecture 17 –  Introduction to CPU Design

Example: sw Instruction

[Figure: datapath executing sw r3, 17(r1). The register file reads reg[1] and reg[3]; imm supplies 17; the ALU computes reg[1]+17; data memory performs MEM[r1+17] <= r3.]

Page 17: Machine Structures Lecture 17 –  Introduction to CPU Design

Why Five Stages? (1/2)

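•Could we have a different number of stages?
 • Yes, and other architectures do
•So why does MIPS have five if instructions tend to idle for at least one stage?
 • The five stages are the union of all the operations needed by all the instructions
 • There is one instruction that uses all five stages: the load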

Page 18: Machine Structures Lecture 17 –  Introduction to CPU Design

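Why Five Stages? (2/2)
•lw $r3, 17($r1)
 • Stage 1: fetch this instruction, increment PC
 • Stage 2: decode to find that it is a lw, then read register $r1
 • Stage 3: add 17 to the value in register $r1 (retrieved in Stage 2)
 • Stage 4: read the value from the memory address computed in Stage 3
 • Stage 5: write the value read in Stage 4 into register $r3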
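The Stage 3 address computation shared by lw and sw can be sketched as follows. The module name `mem_addr` and its ports are hypothetical; the one detail added beyond the walkthrough is that MIPS sign-extends the 16-bit offset before the addition.

```verilog
// Hypothetical effective-address unit for lw/sw.
// addr = base register value + sign-extended 16-bit offset.
module mem_addr (base, offset, addr);
  input  [31:0] base;    // value read from the base register, e.g. $r1
  input  [15:0] offset;  // immediate field of the instruction, e.g. 17
  output [31:0] addr;

  // Replicate the offset's sign bit 16 times to widen it to 32 bits.
  assign addr = base + {{16{offset[15]}}, offset};
endmodule // mem_addr
```

In a full datapath this addition would be performed by the ALU itself rather than by a separate unit.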

Page 19: Machine Structures Lecture 17 –  Introduction to CPU Design

Example: lw Instruction

[Figure: datapath executing lw r3, 17(r1). The register file reads reg[1]; imm supplies 17; the ALU computes reg[1]+17; data memory reads MEM[r1+17]; the loaded value goes back to register 3.]

Page 20: Machine Structures Lecture 17 –  Introduction to CPU Design

Datapath Summary

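•To execute instructions, we need a datapath built around the data transfers the instructions require
•A controller causes the right transfers to happen

[Figure: generic datapath (PC with +4 adder, instruction memory, registers rs/rt/rd, imm, ALU, data memory) with a Controller block driven by opcode and funct]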

Page 21: Machine Structures Lecture 17 –  Introduction to CPU Design

What Hardware Is Needed? (1/2)

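•PC register: keeps track of the memory address of the next instruction
•General-purpose registers
 • used in Stage 2 (Read) and Stage 5 (Write)
 • MIPS has 32 of these
•Memory
 • used in Stage 1 (Fetch) and Stage 4 (R/W)
 • the cache system makes these two stages as fast as the others, on average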

Page 22: Machine Structures Lecture 17 –  Introduction to CPU Design

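What Hardware Is Needed? (2/2)
•ALU
 • used in Stage 3
 • performs all the necessary functions: arithmetic, logicals, etc.
 • we will design the details later
•Miscellaneous registers
 • To perform one stage per clock cycle, registers are inserted between stages to hold the intermediate data and control signals as they travel from stage to stage.
 • Note: register is a general term for anything that stores bits. Not all registers are in the "register file".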
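One of these in-between registers can be sketched as a plain clocked register. The module name `stage_reg` and the `WIDTH` parameter are assumptions for illustration; a real design would bundle several data and control signals into each such register.

```verilog
// Hypothetical register placed between two datapath stages.
// It captures the previous stage's output on each rising clock edge
// and holds it steady for the next stage during the following cycle.
module stage_reg (CLK, D, Q);
  parameter WIDTH = 32;
  input              CLK;
  input  [WIDTH-1:0] D;
  output [WIDTH-1:0] Q;
  reg    [WIDTH-1:0] Q;

  always @(posedge CLK)
    Q <= D;
endmodule // stage_reg
```

This register is separate from the 32 general-purpose registers of the register file.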

Page 23: Machine Structures Lecture 17 –  Introduction to CPU Design

CPU clocking (1/2)

•Single-cycle CPU: all stages of an instruction are completed within one long clock cycle.
 • The clock cycle is made sufficiently long to allow each instruction to complete all stages without interruption and within one cycle.

For each instruction, how do we control the flow of information through the datapath?

1. Instruction Fetch   2. Decode/Register Read   3. Execute   4. Memory   5. Reg. Write

Page 24: Machine Structures Lecture 17 –  Introduction to CPU Design

CPU clocking (2/2)

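•Multiple-cycle CPU: only one stage of an instruction is executed per clock cycle.
 • The clock is made as long as the slowest stage.

Several benefits over single-cycle execution: stages that an instruction does not use can be skipped, and instructions can be pipelined (overlapped).

For each instruction, how do we control the flow of information through the datapath?

1. Instruction Fetch   2. Decode/Register Read   3. Execute   4. Memory   5. Reg. Write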
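A small worked example may make the two clocking schemes concrete. The stage delays below are invented for illustration (they are not measurements of any real machine): assume Stages 1 to 5 take 2, 1, 2, 2, and 1 ns.

```latex
% Hypothetical stage delays: t_1 = 2, t_2 = 1, t_3 = 2, t_4 = 2, t_5 = 1 (ns)
\begin{align*}
T_{\text{single}} &\ge t_1 + t_2 + t_3 + t_4 + t_5 = 2 + 1 + 2 + 2 + 1 = 8\ \text{ns} \\
T_{\text{multi}}  &\ge \max(t_1, \dots, t_5) = 2\ \text{ns} \\
\text{add, skipping Stage 4:}\quad & 4 \times T_{\text{multi}} = 8\ \text{ns} \\
\text{lw, all five stages:}\quad   & 5 \times T_{\text{multi}} = 10\ \text{ns}
\end{align*}
```

Under these assumed numbers a lone lw is actually slower on the multiple-cycle design. The benefit comes from skipping unused stages and, above all, from pipelining, where ideally a new instruction can start every 2 ns.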

Page 25: Machine Structures Lecture 17 –  Introduction to CPU Design

Verilog big idea: Time in code

•One difference from a programming language is that time is part of the language
 • part of what we are trying to describe is when things occur, or how long things will take
•In both structural and behavioral Verilog, specify time with #n: the event will take place in n time units
 • structural: not #2(notX, X) says notX does not change until time advances 2 ns
 • assign #2 Z = A ^ B; says Z does not change until time advances 2 ns
•Default unit is nanoseconds; can change
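The time unit can be set explicitly with the `timescale` compiler directive. The following is a minimal sketch, assuming a hypothetical module name `delay_demo`; it reuses the delayed XOR assignment from the slide and prints each change so the delay becomes visible.

```verilog
`timescale 1ns / 1ps  // time unit = 1 ns, simulation precision = 1 ps

// Hypothetical demo module: shows a 2-unit delay on a continuous assignment.
module delay_demo;
  reg  A, B;
  wire Z;

  assign #2 Z = A ^ B;  // Z takes its new value 2 time units after A or B changes

  initial begin
    // Print a line whenever A, B, or Z changes.
    $monitor("time=%0d A=%b B=%b Z=%b", $time, A, B, Z);
    A = 0; B = 0;
    #5 A = 1;           // change an input at time 5
    #5 $finish;         // end the simulation at time 10
  end
endmodule // delay_demo
```

In the printed trace, Z should lag each input change by two time units, starting out as x before the first update arrives.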

Page 26: Machine Structures Lecture 17 –  Introduction to CPU Design

2-input Mux with delay

module mux2 (in0, in1, select, out);
  input in0,in1,select;
  output out;
  wire s0,w0,w1;

  not #1 (s0, select); // 1ns gate delays
  and #1 (w0, s0, in0),
         (w1, select, in1);
  or  #1 (out, w0, w1);
endmodule // mux2

Page 27: Machine Structures Lecture 17 –  Introduction to CPU Design

Testing in Verilog
•Code examples so far define hardware modules.
•Need separate code to test the module (just like C/Java)
•Since hardware is hard to build, there is a major emphasis on testing in HDLs
•Testing modules are called “test benches” in Verilog;
 • like a bench in a lab dedicated to testing
•Could design special hardware blocks to test other blocks, but that is awkward! Use behavioral Verilog instead

Page 28: Machine Structures Lecture 17 –  Introduction to CPU Design

Example: Behavioral Test Block (signal generator)

Page 29: Machine Structures Lecture 17 –  Introduction to CPU Design

Testing Verilog
•Create a test module for mux2:

module testmux;
  reg a, b, s;
  reg expected;
  wire f;
  mux2 myMux(.select(s), .in0(a), .in1(b), .out(f));
  /* add testing code */
endmodule

•Outline: declare variables to use for the connections from the testbench, instantiate the module, specify stimulus, (compare output to expected), print results (or view with a waveform viewer)

Page 30: Machine Structures Lecture 17 –  Introduction to CPU Design

Testing continued
Now we write code to try different inputs by assigning to connections:

initial
begin
  #0  s=0; a=0; b=1; expected=0;
  #10 a=1; b=0; expected=1;
  #10 s=1; a=0; b=1; expected=1;
  #10 $stop;
end

Page 31: Machine Structures Lecture 17 –  Introduction to CPU Design

Testing continued
•Use $monitor to watch some signals and see every time they are updated:

initial
  $monitor(
    "select=%b in0=%b in1=%b out=%b, expected out=%b time=%d",
    s, a, b, f, expected, $time);

•$time is a system function which gives the current (simulated) time

Page 32: Machine Structures Lecture 17 –  Introduction to CPU Design

Completed Example

Page 33: Machine Structures Lecture 17 –  Introduction to CPU Design

Output

select=0 in0=0 in1=1 out=x, expected out=0 time= 0
select=0 in0=0 in1=1 out=0, expected out=0 time= 2
select=0 in0=1 in1=0 out=0, expected out=1 time= 10
select=0 in0=1 in1=0 out=1, expected out=1 time= 12
select=1 in0=0 in1=0 out=1, expected out=0 time= 20
select=1 in0=0 in1=0 out=0, expected out=0 time= 22

•Expected value (of behavioral Verilog) matches actual value (of structural Verilog), so the module works for the input patterns tested.
•Simple to extend this testbench to do exhaustive testing.
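One way such an exhaustive test could look is sketched below. The testbench name `testmux_all`, the loop variable, and the error counter are assumptions for illustration; mux2 is repeated from the earlier slide so the example is self-contained.

```verilog
// mux2 as defined earlier in the lecture (1-unit gate delays).
module mux2 (in0, in1, select, out);
  input  in0, in1, select;
  output out;
  wire   s0, w0, w1;
  not #1 (s0, select);
  and #1 (w0, s0, in0),
         (w1, select, in1);
  or  #1 (out, w0, w1);
endmodule // mux2

// Hypothetical exhaustive testbench: tries all 8 input patterns.
module testmux_all;
  reg     a, b, s;
  reg     expected;
  wire    f;
  integer i, errors;

  mux2 myMux (.select(s), .in0(a), .in1(b), .out(f));

  initial begin
    errors = 0;
    for (i = 0; i < 8; i = i + 1) begin
      {s, b, a} = i;          // low 3 bits of i drive the inputs
      expected  = s ? b : a;  // behavioral model of a 2-input mux
      #10;                    // wait well past the 3-unit worst-case path
      if (f !== expected) begin
        errors = errors + 1;
        $display("FAIL s=%b in0=%b in1=%b out=%b expected=%b",
                 s, a, b, f, expected);
      end
    end
    $display("%0d mismatches in 8 patterns", errors);
    $finish;
  end
endmodule // testmux_all
```

Here the expected value is computed by behavioral Verilog rather than enumerated by hand, and the output is sampled only after the gates have settled.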

Page 34: Machine Structures Lecture 17 –  Introduction to CPU Design

Another Testbench for mux2

Page 35: Machine Structures Lecture 17 –  Introduction to CPU Design

For more help ...
•Read the Verilog Tutorial for many more ideas on building test benches, including:
 • more Verilog behavioral constructs
 • more looping constructs
 • using Verilog to generate the expected output (rather than enumerating it by mimicking the behavior of the HW module)
 • more output routines
 • testing circuits with state
•Read the ModelSim manual for use of the waveform viewer

Page 36: Machine Structures Lecture 17 –  Introduction to CPU Design

Specifying a clock signal

...
initial
begin
  CLK = 1'b0;
  forever
    #1 CLK = ~CLK;
end
...

•No built-in clock in Verilog, so specify one
•Clock CLK above alternates forever with a 2 ns period: 1 ns at 0, 1 ns at 1

Page 37: Machine Structures Lecture 17 –  Introduction to CPU Design

Accumulator Example

//Accumulator
module acc (CLK,RST,IN,OUT);
  input CLK,RST;
  input [3:0] IN;
  output [3:0] OUT;
  wire [3:0] W0;

  add4 myAdd (.S(W0), .A(IN), .B(OUT));
  reg4 myReg (.CLK(CLK), .Q(OUT), .D(W0), .RST(RST));
endmodule // acc

•This module uses prior modules, using a wire to connect the output of the adder to the input of the register

Page 38: Machine Structures Lecture 17 –  Introduction to CPU Design

Accumulator TestBench

module accTest;
  reg [3:0] IN;
  reg CLK, RST;
  wire [3:0] OUT;

  acc myAcc (.CLK(CLK), .RST(RST), .IN(IN), .OUT(OUT));

  initial
  begin
    CLK = 1'b0;
    repeat (20)
      #5 CLK = ~CLK;
  end
  ...

•Clock has an oscillation cycle of _ ns?

Page 39: Machine Structures Lecture 17 –  Introduction to CPU Design

Part II ...

  initial
  begin
    #0  RST=1'b1; IN=4'b0001;
    #10 RST=1'b0;
  end

  initial
    $monitor("time=%0d: OUT=%1h", $time,OUT);
endmodule // accTest

• What does this initial block do?

• What is output sequence?

• How many lines of output?
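To try these questions in a simulator, the accumulator needs definitions of add4 and reg4, which come from earlier material and are not shown here. The versions below are stand-ins written for this purpose, assuming the port names used in the acc instantiations, zero gate delays, and a synchronous reset; the course's own modules may differ, which could change the exact output timing.

```verilog
// Hypothetical 4-bit adder: S = A + B, carry-out discarded.
module add4 (S, A, B);
  output [3:0] S;
  input  [3:0] A, B;

  assign S = A + B;
endmodule // add4

// Hypothetical 4-bit register with synchronous reset.
// Assumption: the course's reg4 may instead reset asynchronously
// or include propagation delays.
module reg4 (CLK, Q, D, RST);
  input        CLK, RST;
  input  [3:0] D;
  output [3:0] Q;
  reg    [3:0] Q;

  always @(posedge CLK)
    if (RST) Q <= 4'b0000;
    else     Q <= D;
endmodule // reg4
```

Compiling these together with acc and accTest from the slides gives a complete simulation against which the answers can be checked.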