1998 Morgan Kaufmann Publishers Mario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-1 Chapter Five The...
Transcript of 1998 Morgan Kaufmann Publishers Mario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-1 Chapter Five The...
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-1
Chapter Five
The Processor: Datapath and Control(Parte B: multiciclo)
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-2
• Break up the instructions into steps, each step takes a cycle
– balance the amount of work to be done
– restrict each cycle to use only one major functional unit
• 1 ALU, 1 Memória, 1 Banco de Registradores
• At the end of a cycle
– store values for use in later cycles (easiest thing to do)
– introduce additional “internal” registers
Multicycle Approach
PC
Memory
Address
Instructionor data
Data
Instructionregister
Registers
Register #
Data
Register #
Register #
ALU
Memorydata
register
A
B
ALUOut
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-3
• IR (Instruction Register) e MDR (Memory Data Register) salvam saída da memória
• Registradores A e B salvam saída do banco de registradores
• ALUout salva saída da ALU
• Todos, exceto IR, guardam dados por um clock controle de escrita é desnecessário
• novos MUXes: endereço da memória, operandos da ALU
Multicycle Approach
Shiftleft 2
PC
Memory
MemData
Writedata
Mux
0
1
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Mux
0
1
Mux
0
1
4
Instruction[15– 0]
Signextend
3216
Instruction[25– 21]
Instruction[20– 16]
Instruction[15– 0]
Instructionregister
1 Mux
0
3
2
Mux
ALUresult
ALUZero
Memorydata
register
Instruction[15– 11]
A
B
ALUOut
0
1
Address
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-4
Via de dados multiciclo e sinais de controle
Shiftleft 2
MemtoReg
IorD MemRead MemWrite
PC
Memory
MemData
Writedata
Mux
0
1
RegistersWriteregister
Writedata
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Instruction[15– 11]
Mux
0
1
Mux
0
1
4
ALUOpALUSrcB
RegDst RegWrite
Instruction[15– 0]
Instruction [5– 0]
Signextend
3216
Instruction[25– 21]
Instruction[20– 16]
Instruction[15– 0]
Instructionregister
1 Mux
0
3
2
ALUcontrol
Mux
0
1ALU
resultALU
ALUSrcA
ZeroA
B
ALUOut
IRWrite
Address
Memorydata
register
Falta implementar 3 possíveis fontes de carga para PC: PC+4, beq e j
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-5
Via de dados completa
S h ift
le f t 2
P C
Mux
0
1
R e gis te rsW ritereg is te r
W rited a ta
R e a dda ta 1
R e a dda ta 2
R e a dreg is te r 1
R e a dreg is te r 2
In stru c tio n[1 5– 11 ]
Mux
0
1
Mux
0
1
4
Ins truc t io n[1 5– 0 ]
S ig n
ex te n d
3 216
In s t ru ct ion[25 – 21 ]
In s t ru ct ion[20 – 16 ]
In s t ru ct ion[1 5 – 0 ]
Ins tru c tio n
re g is te r
A L U
c on t ro l
A L Ur es u l t
A L U
Z er o
M e m o ry
d a ta
re g is te r
A
B
Io r D
M e m R e a d
M e m W r ite
M e m to R e g
P C W r it e C o n d
P C W rit e
IR W ri te
A L U O p
A L U S r c B
A L U S rc A
R e g D s t
P C S o u r c e
R e g W rite
C o n tr o l
O u tp u ts
O p
[5 – 0 ]
Ins truc tio n[3 1 - 26 ]
Ins tru c t io n [5 – 0]
Mu
x
0
2
J u m p
a d d re s s [3 1 -0 ]Ins truc t io n [2 5 – 0 ] 2 6 28S h ift
le f t 2
P C [3 1 - 2 8 ]
1
1 Mux
0
3
2
Mux
0
1
A L U O u t
M e m or y
M em D ata
W r ited a ta
A d d re s s
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-6
Sinais de controle (1 bit)
Desligado Ligado
RegDst Reg de escrita rt Reg de escrita rd
RegWR Escreve no banco
ALUSrcA operando é o PC operando é o reg A
MemRD Lê a memória
MemWR Escreve na memória
Memto Reg Reg WR data ALUout Reg WR data MDR
IorD Endereço do PC Endereço de ALUout
IRWrite Escreve no IR
PCWrite Escreve no PC
PCWriteCond Escr PC se ALU=0
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-7
Sinais de controle (2 bits)
00 Soma 01 Subtração
ALUop
10 Function 00 Segundo operando é o registrador B 01 “ constante 4 10 “ sign-extend, 16 bits do IR
ALUSrcB
11 “ idem acima, deslocado 2 bits à esquerda 00 PC + 4 01 ALUout (target address)
PCSrc
10 Jump (4 bits PC & 26 bits ender & 00)
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-8
• Instruction Fetch
• Instruction Decode and Register Fetch
• Execution, Memory Address Computation, or Branch Completion
• Memory Access or R-type instruction completion
• Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
Five Execution Steps
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-9
• Use PC to get instruction and put it in the Instruction Register.
• Increment the PC by 4 and put the result back in the PC.
• Can be described succinctly using RTL "Register-Transfer Language"
IR = Memory[PC];PC = PC + 4;
Can we figure out the values of the control signals?
What is the advantage of updating the PC now?
Step 1: Instruction Fetch
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-10
• Read registers rs and rt in case we need them
• Compute the branch address in case the instruction is a branch
• RTL:
A = Reg[IR[25-21]];B = Reg[IR[20-16]];ALUOut = PC + (sign-extend(IR[15-0]) << 2);
• We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)
Step 2: Instruction Decode and Register Fetch
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-11
• ALU is performing one of three functions, based on instruction type
• Memory Reference:
ALUOut = A + sign-extend(IR[15-0]);
• R-type:
ALUOut = A op B;
• Branch:
if (A==B) PC = ALUOut;
Step 3 (instruction dependent)
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-12
• Loads and stores access memory
MDR = Memory[ALUOut];or
Memory[ALUOut] = B;
• R-type instructions finish
Reg[IR[15-11]] = ALUOut;
The write actually takes place at the end of the cycle on the edge
Step 4 (R-type or memory-access)
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-13
• Reg[IR[20-16]]= MDR;
What about all the other instructions?
Write-back step
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-14
Summary:
Step nameAction for R-type
instructionsAction for memory-reference
instructionsAction for branches
Action for jumps
Instruction fetch IR = Memory[PC]PC = PC + 4
Instruction A = Reg [IR[25-21]]decode/register fetch B = Reg [IR[20-16]]
ALUOut = PC + (sign-extend (IR[15-0]) << 2)
Execution, address ALUOut = A op B ALUOut = A + sign-extend if (A ==B) then PC = PC [31-28] IIcomputation, branch/ (IR[15-0]) PC = ALUOut (IR[25-0]<<2)jump completion
Memory access or R-type Reg [IR[15-11]] = Load: MDR = Memory[ALUOut]completion ALUOut or
Store: Memory [ALUOut] = B
Memory read completion Load: Reg[IR[20-16]] = MDR
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-15
Outra visão, fluxograma
IR=Mem[PC]PC=PC+4
A=Reg[IR[25:21]]B=Reg[IR[20:16]]
ALUout=PC+(SignExt(IR(15:0))<<2)
ALUout=A op B
Reg[IR[15:11]]=ALUout
ALUout=A+SignExt(IR(15:0))
Mem(ALUout)=BMDR=Mem(ALUout)
Reg[IR[15:11]]=MDR
If A==B PC=ALUout
PC=PC[31:28]&IR[25:0]<<2
5
0
1
6 2 8 9
7 3
4
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-16
• How many cycles will it take to execute this code?
lw $t2, 0($t3)lw $t3, 4($t3)beq $t2, $t3, Label #assume notadd $t5, $t2, $t3sw $t5, 8($t3)
Label: ...
• What is going on during the 8th cycle of execution?• In what cycle does the actual addition of $t2 and $t3 takes place?
Simple Questions
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-17
• Value of control signals is dependent upon:
– what instruction is being executed
– which step is being performed
• Use the information we’ve acculumated to specify a finite state machine
– specify the finite state machine graphically, or
– use microprogramming
• Implementation can be derived from specification
Implementing the Control
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-18
Tabela dos sinais de controle
• How many state bits will we need?
Graphical Specification of FSM
PCWritePCSource = 10
ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCWriteCond
PCSource = 01
ALUSrcA =1ALUSrcB = 00ALUOp= 10
RegDst = 1RegWrite
MemtoReg = 0MemWriteIorD = 1
MemReadIorD = 1
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
RegDst=0RegWrite
MemtoReg=1
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
MemReadALUSrcA = 0
IorD = 0IRWrite
ALUSrcB = 01ALUOp = 00
PCWritePCSource = 00
Instruction fetchInstruction decode/
register fetch
Jumpcompletion
BranchcompletionExecution
Memory addresscomputation
Memoryaccess
Memoryaccess R-type completion
Write-back step
(Op = 'LW') or (Op = 'SW') (Op = R-type)
(Op
= 'BE
Q')
(Op
= 'J
' )
(Op = 'SW')
(Op
= ' L
W' )
4
01
9862
753
Start
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-20
Finite State Machine for Control
PCWrite
PCWriteCond
IorD
MemtoReg
PCSource
ALUOp
ALUSrcB
ALUSrcA
RegWrite
RegDst
NS3NS2NS1NS0
Op5
Op4
Op3
Op2
Op1
Op0
S3
S2
S1
S0
State register
IRWrite
MemRead
MemWrite
Instruction registeropcode field
Outputs
Control logic
Inputs
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-21
Desempenho desta máquina para o gcc
• CPI = 0,23*5 + 0,13*4 + 0,43*4 + 0,19*3 + 0,02*3 = 4,02
• Melhor do que se todas as instruções tomassem 5 ciclos
• Melhoria em MIPS (supor ck = 100 MHz)
– CPI = 4 -> 25 MIPS
– CPI = 5 -> 20 MIPS
– CPI = 1 (fazer tudo em um ciclo) -> 100 MIPS
lw sw Tipo R beq jump
% 23 13 43 19 2
CPI 5 4 4 3 3
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-22
PLA Implementation
• If I picked a horizontal or vertical line could you explain it?Op5
Op4
Op3
Op2
Op1
Op0
S3
S2
S1
S0
IorD
IRWrite
MemReadMemWrite
PCWritePCWriteCond
MemtoRegPCSource1
ALUOp1
ALUSrcB0ALUSrcARegWriteRegDstNS3NS2NS1NS0
ALUSrcB1ALUOp0
PCSource0
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-23
• ROM = "Read Only Memory"– values of memory locations are fixed ahead of time
• A ROM can be used to implement a truth table– if the address is m-bits, we can address 2m entries in the ROM.– our outputs are the bits of data that the address points to.
m is the "heigth", and n is the "width"
ROM Implementation
m n
0 0 0 0 0 1 10 0 1 1 1 0 00 1 0 1 1 0 00 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 11 1 0 0 1 1 01 1 1 0 1 1 1
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-24
• How many inputs are there?6 bits for opcode, 4 bits for state = 10 address lines(i.e., 210 = 1024 different addresses)
• How many outputs are there?16 datapath-control outputs, 4 state bits = 20 outputs
• ROM is 210 x 20 = 1K x 20 bits (and a rather unusual size)
• Rather wasteful, since for lots of the entries, the outputs are the same
— i.e., opcode is often ignored
ROM Implementation
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch5B-25
• Break up the table into two parts
— 4 state bits tell you the 16 outputs, 24 x 16 bits of ROM
— 10 bits tell you the 4 next state bits, 210 x 4 bits of ROM
— Total: 4.3K bits of ROM
• PLA is much smaller
— can share product terms
— only need entries that produce an active output
— can take into account don't cares
• Size is (#inputs #product-terms) + (#outputs #product-terms)
For this example = (10x17)+(20x17) = 460 PLA cells
• PLA cells usually about the size of a ROM cell (slightly bigger)
ROM vs PLA