Computer Architecture (Kiến trúc máy tính)

249
Nguyen Thanh Kien Department of Computer Engineering Faculty of Information Technology Hanoi University of Technology Computer Architecture This course for using in HEDSPI Project

description

Đây là giáo trình Kiến trúc máy tính của giảng viên Nguyễn Thành Kiên dùng cho lớp kỹ sư chương trình hợp tác Việt Nhật của Khoa Công nghệ thông tin (nay là Viện Công nghệ thông tin và truyền thông) trường Đại học Bách Khoa Hà Nội.

Transcript of Computer Architecture (Kiến trúc máy tính)

Page 1: Computer Architecture (Kiến trúc máy tính)

Nguyen Thanh KienDepartment of Computer EngineeringFaculty of Information TechnologyHanoi University of Technology

Computer ArchitectureThis course for using in HEDSPI Project

Page 2: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 2

Acknowledge

These materials are used as reference for this slide:

– “Computer Architecture, Background and Motivation” slides of UCSB, B.Parhami.

– Computer architecture - Behrooz Parhami

– Computer organization and design - John L. Hennessy & David A. Patterson

Page 3: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 3

Reference

Text Books:

– Computer architecture - Behrooz Parhami

– Computer organization and design - John L. Hennessy & David A. Patterson

Reference books

– Computer Architecture and Organization - John P. Hayes

– Computer Organization and Architecture – Designing for Performance - William Stallings

– Computer Architecture: Single and Parallel Systems - Mehdi R. Zargham

– Assembly Language Programming and Organization of the IBM-PC - Ytha Yu, Charles Marut

Page 4: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 4

About

Lecturer:

– Nguyen Thanh Kien

– Department of Computer Engineering, Faculty of Information Technology, Hanoi University of Technology

– Mobile: +84 983588135

– Email: [email protected]

[email protected]

– Address:

• Room 322, C1, Hanoi University of Technology

• No.1, Dai Co Viet, Hai Ba Trung, Hanoi.

Page 5: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 5

Content

1. Introduction - Computer system technology and Computer Performance

2. Instruction Set Architecture

3. Arithmetic for Computer

4. CPU Organization

Page 6: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 6

Content

1. Introduction - Computer system technology and Computer Performance

2. Instruction Set Architecture

3. Arithmetic for Computer

4. CPU Organization

Page 7: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 7

Content

1. Introduction - Computer system technology and Computer Performance

2. Instruction Set Architecture

3. Arithmetic for Computer

4. CPU Organization

Page 8: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 8

2. Instruction Set Architecture

2.1. Instructions and addressing

2.2. Procedures and Data

Page 9: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 9

2.1. Instructions and addressing

2.1.1 Abstract View of Hardware

2.1.2 Instruction Formats

2.1.3 Simple Arithmetic / Logic Instructions

2.1.4 Load and Store Instructions

2.1.5 Jump and Branch Instructions

2.1.6 Addressing Modes

Page 10: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 10

2.1. Instructions and addressing

2.1.1 Abstract View of Hardware

2.1.2 Instruction Formats

2.1.3 Simple Arithmetic / Logic Instructions

2.1.4 Load and Store Instructions

2.1.5 Jump and Branch Instructions

2.1.6 Addressing Modes

Page 11: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 11

2.1.1 Abstract View of Hardware

Memory and processing subsystems for MiniMIPS.

Memory up to 2 words 30

Loc 0 Loc 4 Loc 8

Loc m 4

Loc m 8

4 B / location

m 2 32

$0

$1

$2

$31

Hi Lo

ALU

$0

$1

$2

$31 FP

arith

EPC

Cause

BadVaddr Status

EIU FPU

TMU

Execution & integer unit

Floating- point unit

Trap & memory unit

. . .

. . .

(Coproc. 1)

(Coproc. 0)

(Main proc.)

Integer mul/div

Chapter 10

Chapter 11

Chapter 12

Page 12: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 12

Data Types

MiniMIPS registers hold 32-bit (4-byte) words. Other common data sizes include byte, halfword, and doubleword.

Byte

Halfword

Word

Doubleword

Halfword = 2 bytes

Byte = 8 bits

Word = 4 bytes

Doubleword = 8 bytes

Page 13: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 13

Register Conventions

Registers and data sizes in MiniMIPS.

Temporary values

More temporaries

Operands

Global pointer Stack pointer Frame pointer Return address

Saved

Saved Procedure arguments

Saved across

procedure calls

Procedure results

Reserved for assembler use

Reserved for OS (kernel)

$0 $1 $2 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $15 $16 $17 $18 $19 $20 $21 $22 $23 $24 $25 $26 $27 $28 $29 $30 $31

0

$zero

$t0

$t2

$t4

$t6

$t1

$t3

$t5

$t7

$s0

$s2

$s4

$s6

$s1

$s3

$s5

$s7

$t8

$t9

$gp

$sp

$fp

$ra

$at

$k0 $k1

$v0

$a0

$a2

$v1

$a1

$a3

A doubleword sits in consecutive registers or memory locations according to the

big- endian order (most significant word comes first)

When loading a byte into a register, it goes in the low end Byte

Word

Doubleword

Byte numbering: 0 1 2 3

3

2

1

0

A 4- byte word sits in consecutive memory addresses according to the

big- endian order (most significant byte has the lowest address)

Page 14: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 14

Registers Used in This Chapter

10 temporary registers

8 operand registers

Page 15: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 15

2.1.2 Instruction Formats

A typical instruction for MiniMIPS and steps in its execution.

Assembly language instruction:

Machine language instruction:

High-level language statement:

000000 10010 10001 11000 00000 100000

add $t8, $s2, $s1

a = b + c

ALU-type instruction

Register 18

Register 17

Register 24 Unused

Addition opcode

ALU

Instruction fetch

Register readout

Operation

Data read/store

Register writeback

Register file

Instruction

cache

Data cache (not used)

Register file

P C

$17 $18

$24

Page 16: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 16

Add, Subtract, and Specification of Constants

MiniMIPS add & subtract instructions; e.g., compute: g = (b + c) (e + f)

add $t8,$s2,$s3 # put the sum b + c in $t8 add $t9,$s5,$s6 # put the sum e + f in $t9 sub $s7,$t8,$t9 # set g to ($t8) ($t9)

Decimal and hex constants

Decimal 25, 123456, 2873 Hexadecimal 0x59, 0x12b4c6, 0xffff0000

Machine instruction typically contains

an opcode one or more source operands possibly a destination operand

Page 17: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 17

MiniMIPS Instruction Formats

MiniMIPS instructions come in only three formats: register (R), immediate (I), and jump (J).

5 bits 5 bits

31 25 20 15 0

Opcode Source register 1

Source register 2

op rs rt

R 6 bits 5 bits

rd

5 bits

sh

6 bits

10 5 fn

Destination register

Shift amount

Opcode extension

Immediate operand or address offset

31 25 20 15 0

Opcode Destination or data

Source or base

op rs rt operand / offset

I 5 bits 6 bits 16 bits 5 bits

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0

31 0

Opcode

op jump target address

J Memory word address (byte address divided by 4)

26 bits

25

6 bits

Page 18: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 18

2.1.3 Simple Arithmetic/Logic Instructions

The arithmetic instructions add and sub have a format that is common to all two-operand ALU instructions. For these, the fn field specifies the arithmetic/logic operation to be performed.

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 x 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0

31 25 20 15 0

ALU instruction

Source register 1

Source register 2

op rs rt

R rd sh

10 5 fn

Destination register

Unused add = 32 sub = 34

Add and subtract already discussed; logical instructions are similar

add $t0,$s0,$s1 # set $t0 to ($s0)+($s1) sub $t0,$s0,$s1 # set $t0 to ($s0)-($s1) and $t0,$s0,$s1 # set $t0 to ($s0)($s1) or $t0,$s0,$s1 # set $t0 to ($s0)($s1) xor $t0,$s0,$s1 # set $t0 to ($s0)($s1) nor $t0,$s0,$s1 # set $t0 to (($s0)($s1))

Page 19: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 19

Arithmetic/Logic with One Immediate Operand

Instructions such as addi allow us to perform an arithmetic or logic operation for which one operand is a small constant.

0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0

31 25 20 15 0

addi = 8 Destination Source Immediate operand

op rs rt operand / offset

I 1

An operand in the range [32 768, 32 767], or [0x0000, 0xffff], can be specified in the immediate field.

addi $t0,$s0,61 # set $t0 to ($s0)+61 andi $t0,$s0,61 # set $t0 to ($s0)61 ori $t0,$s0,61 # set $t0 to ($s0)61 xori $t0,$s0,0x00ff # set $t0 to ($s0) 0x00ff

For arithmetic instructions, the immediate operand is sign-extended

1 0 0 1Errors

Page 20: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 20

2.1.4 Load and Store Instructions

MiniMIPS lw and sw instructions and their memory addressing convention that allows for simple access to array elements via a base

address and an offset (offset = 4i leads us to the i th word).

0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 x 1 0 0 0 0 0 0

31 25 20 15 0

lw = 35 sw = 43

Base register

Data register

Offset relative to base

op rs rt operand / offset

I 1 1 0 0 1 1 1 1 1

A[0] A[1] A[2]

A[i]

Address in base register

Offset = 4i

.

.

.

Memory

Element i of array A

Note on base and offset: The memory address is the sum of (rs) and an immediate value. Calling one of these the base and the other the offset is quite arbitrary. It would make perfect sense to interpret the address A($s3) as having the base A and the offset ($s3). However, a 16-bit base confines us to a small portion of memory space.

lw $t0,40($s3)lw $t0,A($s3)

Page 21: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 21

lw, sw, and lui Instructions

The lui instruction allows us to load an arbitrary 16-bit value into the upper half of a register while setting its lower half to 0s.

0

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0

31 25 20 15 0

lui = 15 Destination Unused Immediate operand

op rs rt operand / offset

I

Content of $s0 after the instruction is executed

lw $t0,40($s3) # load mem[40+($s3)] in $t0 sw $t0,A($s3) # store ($t0) in mem[A+($s3)]

# “($s3)” means “content of

$s3” lui $s0,61 # The immediate value 61 is

# loaded in upper half of $s0 # with lower 16b set to 0s

Page 22: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 22

Initializing a Register

Example

Show how each of these bit patterns can be loaded into $s0:

0010 0001 0001 0000 0000 0000 0011 1101 1111 1111 1111 1111 1111 1111 1111 1111

Solution

The first bit pattern has the hex representation: 0x2110003d

lui $s0,0x2110 # put the upper half in $s0 ori $s0, $s0, 0x003d# put the lower half in $s0

Same can be done, with immediate values changed to 0xffff for the second bit pattern. But, the following is simpler and faster:

nor $s0,$zero,$zero # because (0 0) = 1

Page 23: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 23

2.1.5 Jump and Branch Instructions

Unconditional jump and jump through register instructions

j verify # go to mem loc named “verify” jr $ra # go to address that is in $ra;

# $ra may hold a return address

The jump instruction j of MiniMIPS is a J-type instruction which is shown along with how its effective target address is obtained. The jump register (jr) instruction is R-type, with its specified register often being $ra.

0

0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0

31 0

j = 2

op jump target address

J

Effective target address (32 bits)

25

From PC

0 0

x x x x

0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0

31 25 20 15 0

ALU instruction

Source register

Unused

op rs rt

R rd sh

10 5 fn

Unused Unused jr = 8

Page 24: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 24

Conditional Branch Instructions

Conditional branch instructions of MiniMIPS.

Conditional branches use PC-relative addressing

bltz $s1,L # branch on ($s1)< 0 beq $s1,$s2,L # branch on ($s1)=($s2) bne $s1,$s2,L # branch on ($s1)($s2)

0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0

31 25 20 15 0

bltz = 1 Zero Source Relative branch distance in words

op rs rt operand / offset

I 0

1 1 0 0 x 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0

31 25 20 15 0

beq = 4 bne = 5

Source 2 Source 1 Relative branch distance in words

op rs rt operand / offset

I 1

Page 25: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 25

Comparison Instructions for Conditional Branching

Comparison instructions of MiniMIPS.

slt $s1,$s2,$s3 # if ($s2)<($s3), set $s1 to 1 # else set $s1 to 0;

# often followed by beq/bne slti $s1,$s2,61 # if ($s2)<61, set $s1 to 1 # else set $s1 to 0

1 1 1 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0

31 25 20 15 0

ALU instruction

Source 1 register

Source 2 register

op rs rt

R rd sh

10 5 fn

Destination Unused slt = 42

1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0

31 25 20 15 0

slti = 10 Destination Source Immediate operand

op rs rt operand / offset

I 1

Page 26: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 26

 

Examples for Conditional Branching

If the branch target is too far to be reachable with a 16-bit offset (rare occurrence), the assembler automatically replaces the branch instruction beq $s0,$s1,L1 with:

bne $s1,$s2,L2 # skip jump if (s1)(s2)

j L1 # goto L1 if (s1)=(s2) L2: ...

Forming if-then constructs; e.g., if (i == j) x = x + y

bne $s1,$s2,endif # branch on ij add $t1,$t1,$t2 # execute the “then”

partendif: ...

If the condition were (i < j), we would change the first line to:

slt $t0,$s1,$s2 # set $t0 to 1 if i<j beq $t0,$0,endif # branch if ($t0)=0;

# i.e., i not< j or ij

Page 27: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 27

Example

 

if-then-else Statements

Show a sequence of MiniMIPS instructions corresponding to:

if (i<=j) x = x+1; z = 1; else y = y–1; z = 2*z

Solution

slt $t0,$s2,$s1 # j<i? (inverse condition) bne $t0,$zero,else # if j<i goto else part addi $t1,$t1,1 # begin then part: x = x+1 addi $t3,$zero,1 # z = 1 j endif # skip the else part

else: addi $t2,$t2,-1 # begin else part: y = y–1 add $t3,$t3,$t3 # z = z+z

endif:...

Page 28: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 28

The simple while loop: while (A[i]==k) i=i+1;

Assuming that: i, A, k are stored in $s1,$s2,$s3

Solution

loop: add $t1,$s1,$s1 # t1 = 4*i

add $t1,$t1,$t1 #

add $t1,$t1,$s2 # t1 = A + 4*i

lw $t0,0($t1) # t0 = A[i]

bne $t0,$s3,endwhl #

addi $s1,$s1,1 #

j loop #

endwhl: … #

while Statements

Example

Page 29: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 29

switch(test) {

case 0:

a=a+1; break;

case 1:

a=a-1; break;

case 2:

b=2*b; break;

default:

}

beq s1,t0,case_0beq s1,t1,case_1beq s1,t2,case_2b default

case_0:addi s2,s2,1 #a=a+1b continue

case_1:sub s2,s2,t1 #a=a-1b continue

case_2:add s3,s3,s3 #b=2*bb continue

default:continue:

Example

switch Statements

Assuming that: test,a,b are stored in $s1,$s2,$s3

The simple switch

Page 30: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 30

2.1.6 Addressing Modes

Schematic representation of addressing modes in MiniMIPS.

Addressing Instruction Other elements involved Operand

Implied

Immediate

Register

Base

PC-relative

Pseudodirect

Some place in the machine

Extend, if required

Reg file Reg spec Reg data

Memory Add

Reg file

Mem addr

Constant offset

Reg base Reg data

Mem data

Add

PC

Constant offset

Memory

Mem addr Mem

data

Memory Mem data

PC Mem addr

Addressing mode is the method by which the location of an operandis specified within an instruction. MiniMIPS uses sixs addr modes:

jal

addi, ori…

lw, sw…

branch instr

j

Page 31: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 31

Example

 

Finding the Maximum Value in a List of Integers

List A is stored in memory beginning at the address given in $s1. List length is given in $s2. Find the largest integer in the list and copy it into $t0.

Solution

lw $t0,0($s1) # initialize maximum to A[0]

addi $t1,$zero,0 # initialize index i to 0 loop: add $t1,$t1,1 # increment index i by 1

beq $t1,$s2,done # if all elements examined, quit

add $t2,$t1,$t1 # compute 2i in $t2add $t2,$t2,$t2 # compute 4i in $t2 add $t2,$t2,$s1 # form address of A[i] in

$t2 lw $t3,0($t2) # load value of A[i] into

$t3slt $t4,$t0,$t3 # maximum < A[i]?beq $t4,$zero,loop # if not, repeat

with no changeaddi $t0,$t3,0 # if so, A[i] is the

new maximum j loop # change completed; now

repeat done: ... # continuation of the

program

Page 32: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 32

The 20 MiniMIPS Instructions Covered So Far Instruction Usage

Load upper immediate lui rt,imm

Add  add rd,rs,rt

Subtract sub rd,rs,rt

Set less than slt rd,rs,rt

Add immediate  addi rt,rs,imm

Set less than immediate slti rd,rs,imm

AND and rd,rs,rt

OR or rd,rs,rt

XOR xor rd,rs,rt

NOR nor rd,rs,rt

AND immediate andi rt,rs,imm

OR immediate ori rt,rs,imm

XOR immediate xori rt,rs,imm

Load word lw rt,imm(rs)

Store word sw rt,imm(rs)

Jump  j L

Jump register jr rs

Branch less than 0 bltz rs,L

Branch equal beq rs,rt,L

Branch not equal  bne rs,rt,L

Copy

Control transfer

Logic

Arithmetic

Memory access

op15

0008

100000

1213143543

20145

fn

323442

36373839

8 Table 5.1

Page 33: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 33

2. Instruction Set Architecture

2.1. Instructions and addressing

2.2. Procedures and Data

Page 34: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 34

2.2. Procedures and Data

2.2.1 Simple Procedure Calls

2.2.2 Using the Stack for Data Storage

2.2.3 Parameters and Results

2.2.4 Data Types

2.2.5 Arrays and Pointers

2.2.6 Additional Instructions

Page 35: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 35

#Laboratory Exercise 4, Home Assignment 1

#include <iregdef.h>

.text

.set noreorder

.globl start

.ent start

start:

li a0,-45 #load input parameter

jal abs #jum and link to abs procedure

nop

.end start

.ent abs

abs:

sub v0,zero,a0 #put -(a0) in v0; in case (a0)<0

bltz a0,done #if (a0)<0 then done

nop

add v0,a0,zero #else put (a0) in v0

done:

jr ra

.end abs

2.2.1 Simple Procedure Calls

Page 36: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 36

A procedure is: a subprogram that when called (initiated, invoked) performs a specific task, perhaps leading to one or more results, based on the input parameters (arguments) with which it is provided and returns to the point of call, having perturbed nothing else.

In assembly language, a procedure is associated with a symbolic name that denotes its starting address. The jal instruction in MIPS is intended specifically for procedure calls:

– it performs the control transfer (unconditional jump) to the starting address of the procedure,

– while also saving the return address in register $ra.

2.2.1 Simple Procedure Calls

Page 37: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 37

Illustrating a Procedure Call

Relationship between the main program and a procedure.

jal proc

jr $ra

proc Save, etc.

Restore

PC Prepare

to continue

Prepare to call

main

Page 38: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 38

Using a procedure involves the following sequence of actions:

1. Put arguments in places known to procedure (reg’s $a0-$a3) 2. Transfer control to procedure, saving the return address (jal) 3. Acquire storage space, if required, for use by the procedure 4. Perform the desired task 5. Put results in places known to calling program (reg’s $v0-$v1) 6. Return control to calling point (jr)

MiniMIPS instructions for procedure call and return from procedure:

jal proc # jump to loc “proc” and link; # “link” means “save the return

# address” (PC)+4 in $ra ($31)

jr rs # go to loc addressed by rs

2.2.1 Simple Procedure Calls

Page 39: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 39

Recalling Register Conventions

Registers and data sizes in MiniMIPS.

Temporary values

More temporaries

Operands

Global pointer Stack pointer Frame pointer Return address

Saved

Saved Procedure arguments

Saved across

procedure calls

Procedure results

Reserved for assembler use

Reserved for OS (kernel)

$0 $1 $2 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $15 $16 $17 $18 $19 $20 $21 $22 $23 $24 $25 $26 $27 $28 $29 $30 $31

0

$zero

$t0

$t2

$t4

$t6

$t1

$t3

$t5

$t7 $s0

$s2

$s4

$s6

$s1

$s3

$s5

$s7 $t8 $t9

$gp $sp $fp $ra

$at

$k0 $k1

$v0

$a0

$a2

$v1

$a1

$a3

A doubleword sits in consecutive registers or memory locations according to the big-endian order (most significant word comes first)

When loading a byte into a register, it goes in the low end Byte

Word

Doublew ord

Byte numbering: 0 1 2 3

3

2

1

0

A 4-byte word sits in consecutive memory addresses according to the big-endian order (most significant byte has the lowest address)

Page 40: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 40

Example

 

A Simple MiniMIPS Procedure

Procedure to find the absolute value of an integer.

$v0 |($a0)|

Solution

The absolute value of x is –x if x < 0 and x otherwise.

abs: sub $v0,$zero,$a0 # put -($a0) in $v0; # in case ($a0) < 0 bltz $a0,done # if ($a0)<0 then done add $v0,$a0,$zero # else put ($a0) in $v0 done: jr $ra # return to calling program

In practice, we seldom use such short procedures because of the overhead that they entail. In this example, we have 3-4

instructions of overhead for 3 instructions of useful computation.

Page 41: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 41

Nested Procedure Calls

Example of nested procedure calls.

jal abc

jr $ra

abc Save

Restore

PC Prepare to continue

Prepare to call

main

jal xyz

jr $ra

xyz

Procedure abc

Procedure xyz

Text version is incorrect

Page 42: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 42

2.2.2 Using the Stack for Data Storage

Effects of push and pop operations on a stack.

b a

sp

b a

sp

b a sp

c

Push c Pop x

sp = sp – 4 mem[sp] = c

x = mem[sp] sp = sp + 4

push: addi $sp,$sp,-4 sw $t4,0($sp)

pop: lw $t5,0($sp) addi $sp,$sp,4

Page 43: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 43

Memory Map in MiniMIPS

Overview of the memory address space in MiniMIPS.

Reserved

Program

Stack

1 M words

Hex address

10008000

1000ffff

10000000

00000000

00400000

7ffffffc

Text segment 63 M words

Data segment

Stack segment

Static data

Dynamic data

$gp

$sp

$fp

448 M words

Second half of address space reserved for memory-mapped I/O

$28 $29 $30

Addressable with 16-bit signed offset

80000000

Page 44: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 44

2.2.3 Parameters and Results

Use of the stack by a procedure.

b a

$sp c Frame for current procedure

$fp

. . .

Before calling

b a

$sp

c Frame for previous procedure

$fp

. . .

After calling

Frame for current procedure

Old ($fp)

Saved registers

y z

. . . Local variables

Stack allows us to pass/return an arbitrary number of values

Page 45: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 45

Example of Using the Stack

proc: sw $fp,-4($sp) # save the old frame pointeraddi $fp,$sp,0 # save ($sp) into $fpaddi $sp,$sp,–12 # create 3 spaces on top of stacksw $ra,-8($fp) # save ($ra) in 2nd stack elementsw $s0,-12($fp) # save ($s0) in top stack element . . .lw $s0,-12($fp) # put top stack element in $s0lw $ra,-8($fp) # put 2nd stack element in $raaddi $sp,$fp, 0 # restore $sp to original statelw $fp,-4($sp) # restore $fp to original statejr $ra # return from procedure

Saving $fp, $ra, and $s0 onto the stack and restoring them at the end of the procedure

$fp

$sp($fp)

$fp

$sp($ra)($s0)

Page 46: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 46

2.2.4 Data Types

Data size (number of bits), data type (meaning assigned to bits)

Signed integer: byte wordUnsigned integer: byte wordFloating-point number: word doublewordBit string: byte word doubleword

Converting from one size to another

Type 8-bit number Value 32-bit version of the number 

Unsigned 0010 1011 43 0000 0000 0000 0000 0000 0000 0010 1011Unsigned 1010 1011 171 0000 0000 0000 0000 0000 0000 1010 1011

 Signed 0010 1011 +43 0000 0000 0000 0000 0000 0000 0010 1011Signed 1010 1011 –85 1111 1111 1111 1111 1111 1111 1010 1011

Page 47: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 47

ASCII Characters

ASCII (American standard code for information interchange)

NUL DLE SP 0 @ P ` p

SOH DC1 ! 1 A Q a q

STX DC2 “ 2 B R b r

ETX DC3 # 3 C S c s

EOT DC4 $ 4 D T d t

ENQ NAK % 5 E U e u

ACK SYN & 6 F V f v

BEL ETB ‘ 7 G W g w

BS CAN ( 8 H X h x

HT EM ) 9 I Y i y

LF SUB * : J Z j z

VT ESC + ; K [ k {

FF FS , < L \ l |

CR GS - = M ] m }

SO RS . > N ^ n ~

SI US / ? O _ o DEL

0

1

2

3

4

5

6

7

8

9

a

b

c

d

e

f

0 1 2 3 4 5 6 7 8-9 a-f

More

controls

More

symbols

8-bit ASCII code

(col #, row #)hex

e.g., code for +

is (2b) hex or

(0010 1011)two

Page 48: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 48

Loading and Storing Bytes

Load and store instructions for byte-size data elements.

x x 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0

31 25 20 15 0

lb = 32 lbu = 36 sb = 40

Data register

Base register

Address offset

op rs rt immediate / offset

I 1 1 0 0 0 1 1

Bytes can be used to store ASCII characters or small integers. MiniMIPS addresses refer to bytes, but registers hold words.

lb $t0,8($s3) # load rt with mem[8+($s3)]# sign-extend to fill

reg lbu $t0,8($s3) # load rt with mem[8+($s3)]

# zero-extend to fill reg sb $t0,A($s3) # LSB of rt to mem[A+($s3)]

Page 49: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 49

Meaning of a Word in Memory

A 32-bit word has no inherent meaning and can be interpreted in a number of equally valid ways in the absence of other cues (e.g., context) for the intended meaning.

0000 0010 0001 0001 0100 0000 0010 0000

Positive integer

Four-character string

Add instruction

Bit pattern (02114020) hex

00000010000100010100000000100000

00000010000100010100000000100000

00000010000100010100000000100000

Page 50: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 50

2.2.5 Arrays and Pointers

Index: Use a register that holds the index i and increment the register in each step to effect moving from element i of the list to element i + 1

Pointer: Use a register that points to (holds the address of) the list element being examined and update it in each step to point to the next element

Add 4 to get the address of A[i + 1]

Pointer to A[i] Array index i

Add 1 to i; Compute 4i; Add 4i to base

A[i] A[i + 1]

A[i] A[i + 1]

Base Array A Array A

Stepping through the elements of an array using the indexing method and the pointer updating method.

Page 51: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 51

Selection Sort

One iteration of selection sort.

first

last

max

first

last

first

last

Start of iteration Maximum identified End of iteration

x

x y

y

A A A

To sort a list of numbers, repeatedly perform the following:Find the max element, swap it with the last item, move up the “last” pointer

Page 52: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 52

Selection Sort Using the Procedure max

sort: beq $a0,$a1,done # single-element list is sorted jal max # call the max procedure lw $t0,0($a1) # load last element into $t0 sw $t0,0($v0) # copy the last element to max loc sw $v1,0($a1) # copy max value to last element addi $a1,$a1,-4 # decrement pointer to last element j sort # repeat sort for smaller listdone: ... # continue with rest of program

first

last

max

first

last

first

last

Start of iteration Maximum identified End of iteration

x

x y

y

A A A

Inputs toproc max

Outputs fromproc max

In $a0

In $a1

In $v0 In $v1

Page 53: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 53

2.2.6 Additional Instructions

The multiply (mult) and divide (div) instructions of MiniMIPS.

1 0 0 1 1 0 0

fn

0 0 0 0 0 0 0 0 0 0 0 0 x 0 0 1 1 0 0 0 0 0 0 0 0

31 25 20 15 0

ALU instruction

Source register 1

Source register 2

op rs rt

R rd sh

10 5

Unused Unused mult = 24 div = 26

1 0 0 0 0 0 0 1 0 0

fn

0 0 0 0 0 0 0 0 0 0 0 x 0 0 0 0 0 0 0 0 0 0

31 25 20 15 0

ALU instruction

Unused Unused

op rs rt

R rd sh

10 5

Destination register

Unused mfhi = 16 mflo = 18

MiniMIPS instructions for copying the contents of Hi and Lo registers into general registers .

MiniMIPS instructions for multiplication and division:

mult $s0, $s1 # set Hi,Lo to ($s0)($s1)div $s0, $s1 # set Hi to ($s0)mod($s1)

# and Lo to ($s0)/($s1)mfhi $t0 # set $t0 to (Hi)mflo $t0 # set $t0 to (Lo)

Page 54: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 54

Logical Shifts

The four logical shift instructions of MiniMIPS.

MiniMIPS instructions for left and right shifting:

sll $t0,$s1,2 # $t0=($s1) left-shifted by 2 srl $t0,$s1,2 # $t0=($s1) right-shifted by 2 sllv $t0,$s1,$s0 # $t0=($s1) left-shifted by ($s0) srlv $t0,$s1,$s0 # $t0=($s1) right-shifted by ($s0)

0

x

0 0

fn

0 0 0 0 0 0 0 0 0 0 0 1 0 0 x 0 0 1 1 1 0 0 0 0 0 0 0 0 0

31 25 20 15 0

ALU instruction

Unused Source register

op rs rt

R rd sh

10 5

Destination register

Shift amount

sll = 0 srl = 2

1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0

31 25 20 15 0

ALU instruction

Amount register

Source register

op rs rt

R rd sh

10 5 fn

Destination register

Unused sllv = 4 srlv = 6

Page 55: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 55

Unsigned Arithmetic and Miscellaneous Instructions

MiniMIPS instructions for unsigned arithmetic (no overflow exception):

addu $t0,$s0,$s1 # set $t0 to ($s0)+($s1)subu $t0,$s0,$s1 # set $t0 to ($s0)–($s1)multu $s0,$s1 # set Hi,Lo to ($s0)($s1)divu $s0,$s1 # set Hi to ($s0)mod($s1)

# and Lo to ($s0)/($s1)addiu $t0,$s0,61 # set $t0 to ($s0)+61;

# the immediate operand is

# sign extendedTo make MiniMIPS more powerful and complete, we introduce later:

sra $t0,$s1,2 # sh. right arith (Sec. 10.5)srav $t0,$s1,$s0 # shift right arith variablesyscall # system call (Sec. 7.6)

Page 56: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 56

The 20 MiniMIPS Instructions from Chapter 6(40 in all so far)

Instruction Usage

Move from Hi mfhi rd

Move from Lo  mflo rd

Add unsigned addu rd,rs,rt

Subtract unsigned subu rd,rs,rt

Multiply  mult rs,rt

Multiply unsigned multu rs,rt

Divide div rs,rt

Divide unsigned divu rs,rt

Add immediate unsigned addiu rs,rt,imm

Shift left logical sll rd,rt,sh

Shift right logical srl rd,rt,sh

Shift right arithmetic sra rd,rt,sh

Shift left logical variable sllv rd,rt,rs

Shift right logical variable srlv rt,rd,rs

Shift right arith variable srav rd,rt,rd

Load byte lb rt,imm(rs)

Load byte unsigned lbu rt,imm(rs)

Store byte sb rt,imm(rs)

Jump and link jal L

System call syscall

Copy

Control transfer

Shift

Arithmetic

Memory access

op000000009000000

323640

30

fn1618333524252627

02

3 4 6 7

12

Table 6.2 (partial)

Page 57: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 57

Instruction Usage

Move from Hi mfhi rd

Move from Lo  mflo rd

Add unsigned addu rd,rs,rt

Subtract unsigned subu rd,rs,rt

Multiply  mult rs,rt

Multiply unsigned multu rs,rt

Divide div rs,rt

Divide unsigned divu rs,rt

Add immediate unsigned addiu rs,rt,imm

Shift left logical sll rd,rt,sh

Shift right logical srl rd,rt,sh

Shift right arithmetic sra rd,rt,sh

Shift left logical variable sllv rd,rt,rs

Shift right logical variable srlv rd,rt,rs

Shift right arith variable srav rd,rt,rs

Load byte lb rt,imm(rs)

Load byte unsigned lbu rt,imm(rs)

Store byte sb rt,imm(rs)

Jump and link jal L

System call syscall

Instruction Usage

Load upper immediate lui rt,imm

Add  add rd,rs,rt

Subtract sub rd,rs,rt

Set less than slt rd,rs,rt

Add immediate  addi rt,rs,imm

Set less than immediate slti rd,rs,imm

AND and rd,rs,rt

OR or rd,rs,rt

XOR xor rd,rs,rt

NOR nor rd,rs,rt

AND immediate andi rt,rs,imm

OR immediate ori rt,rs,imm

XOR immediate xori rt,rs,imm

Load word lw rt,imm(rs)

Store word sw rt,imm(rs)

Jump  j L

Jump register jr rs

Branch less than 0 bltz rs,L

Branch equal beq rs,rt,L

Branch not equal  bne rs,rt,L

The 37 + 3 MiniMIPS Instructions Covered So Far

Page 58: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 58

Content

1. Introduction - Computer system technology and Computer Performance

2. Instruction Set Architecture

3. Arithmetic for Computer

4. CPU Organization

Page 59: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 59

3. Arithmetic for Computer

1. Introduction

2. Unsigned Numbers

3. Signed Number

4. Addition and Subtraction

5. Multiplication

6. Division

7. Floating Point

Page 60: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 60

3.1. Introduction

Page 61: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 61

Number Representation

Numbers are normally represented using a positional number system:

– Base/radix: b (the number of digits)

– Digits: 0..(b-1)

• 0 ≤ ai ≤ (b-1)

– Binary: b=2, digits:0,1

– Decimal: b=10, digits: 0,1,2,3,4,5,6,7,8,9

– Octal: b=8, digits: 0,1,2,3,4,5,6,7

– Hexadecimal: b=16, digits: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F

mnnnb aaaaaaaaN ....... 210121)(

Page 62: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 62

Number Representation

mnnnb aaaaaaaaN ....... 210121)(

mm

nn

nn babababababaN

............ 11

00

11

11)10(

n

mi

ii baN .)10(

11101.11(2) = 1x24+1x23+1x22+0x21+1x20+1x2-1+1x2-2= 29.75(10)

Page 63: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 63

Number Representation

Decimal:

– b=10

– Digits: 0,1,2,3,4,5,6,7,8,9

– Eg:

539.45(10) = 5x102+3x101+9x100+4x10-1+5x10-2

mnnn aaaaaaaaN ....... 210121)10( ai = 0..9

n

mi

iiaN 10.)10(

Page 64: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 64

Number Representation

Binary:

– b=2

– Digits: 0,1

– Eg:

1011.011(2) = 1x23 + 0x22 + 1x21 + 1x20 + 0x2-1 + 1x2-2 + 1x2-3

mnnn aaaaaaaaN ....... 210121)2( ai = 0,1

bit – binary digit

n

mi

iiaN 2.)10(

Page 65: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 65

Number Representation

Binary (cnt’)

– n-bit binary number can represent which range?

• an-1...a1a0 from 0 to 2n-1

– MSB – Most Significant Bit

– LSB – Least Significant Bit

0001 = 1 0110 = 6 1011 = 11

0010 = 2 0111 = 7 1100 = 12

0011 = 3 1000 = 8 1101 = 13

0100 = 4 1001 = 9 1110 = 14

0101 = 5 1010 = 10 1111 = 15

0121)2( ... aaaaN nn

MSB LSB

Page 66: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 66

Number Representation

Octal:

– b=8

– Digits: 0,1,2,3,4,5,6,7

– Eg:

503.071(8) = 5x82 + 0x81 + 3x80 + 0x8-1 + 7x8-2 + 1x8-3

mnn aaaaaaaN ....... 21011)8(

ai = 0..7

Hexadecimal:

– b=16

– Digits: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F

– Eg:

503.071(16) = 5x162 + 0x161 + 3x160 + 0x16-1 + 7x16-2 + 1x16-3

ai = 0..F

mnn aaaaaaaN ....... 21011)16(

Page 67: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 67

Convert from base b to base 10

Base b to base 10 conversion

Eg:

– 1010.11(2)= 1x23+0x22+1x21+0x20+1x2-1+1x2-2=10.75(10)

– 1010.11(8)=?

– A12(16)=?

mnnnb aaaaaaaaN ....... 210121)(

mm

nn

nn babababababaN

............ 11

00

11

11)10(

Page 68: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 68

Convert from base 10 to base b

Base 10 to base b conversion

– For integer part:

• Divide integer part by b until the result is 0

• Write remainders in reverse order to get the converted result.

– For the odd part after “.”

• Multiply by b until the result is 0

Page 69: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 69

Convert from base 10 to base 2

Eg1: 6.625(10) = ?(2)

– The integer part

Eg2: 120.5625(10) = 1111000.1001(2)

6 2

0 3 2

1 1 2

1 0

– The odd part after “.”

• 0.625 x 2 = 1.25

• 0.25 x 2 = 0.5

• 0.5 x 2 = 1.0

6.625(10) = 110.101(2)

Page 70: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 70

Convert from base 2 to base 2n

Group from right to left n-bit groups and replace the equivalent values in base 2n

Eg:

101011(2) = ?(8) 1010.110(2)=12.6(8)

101011(2) = ?(16) 1010.110(2)=A.C(16)

Page 71: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 71

Convert from base 2n to base 2

Each digit in base 2n is replaced by n bit in base 2.

Eg:

37A.B(16)=?(2)

Page 72: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 72

Convert from base i to base j

If both i and j are powers of 2, use base 2 as an intermediate base:

– Eg: base 8 base 2 base 16

– 735.37(8)=?(16)

Else, use base 10 as an intermediate base:

– Eg: base 5 base 10 base 2

Page 73: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 73

3. Arithmetic for Computer

1. Introduction

2. Unsigned Numbers

3. Signed Number

4. Addition and Subtraction

5. Multiplication

6. Division

7. Floating Point

Page 74: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 74

3.2. Unsigned Numbers

The general form of signed number A:

an-1an-2...a2a1a0

Value of A:

Range of representation:

– Use n bit to represent 2’s complement numbers

– Range: 0 => 2n-1

1

0

00

11

22

11

2

22...22n

i

ii

nn

nn

aA

aaaaA

Page 75: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 75

3. Arithmetic for Computer

1. Introduction

2. Unsigned Numbers

3. Signed Number

4. Addition and Subtraction

5. Multiplication

6. Division

7. Floating Point

Page 76: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 76

3.3. Signed Numbers

1’s complement and 2’s complement number

– A binary integer A is represented by n bit:

• 1’s complement number of A is (2n - 1) – A

• 2’s complement number of A is 2n - A

• Notes: 2’s complement number of A = 1’s complement number + 1

– Eg:

• n=8, A = 00110101

• 1’s complement number of A is (28 - 1) - 00110101=

• 2’s complement number of A is 28 - 00110101=

Page 77: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 77

2’s complement representation of signed numbers

Most left bit is sign bit

Positive and 0 numbers are expressed in usual binary format.

– The largest number can be represented is 2n-1-1

– n=8 => largest signed number: 28-1-1 = 127

Negative number a is stored as the binary equivalent of 2n-A in one n-bit system.

– -3 is stored as 28-3=11111101 in a 8-bit system

– The most negative number can be stored is -2n-1

Page 78: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 78

2’s complement representation of signed numbers

The general form of signed number A:

an-1an-2...a2a1a0

Value of A:

Range of representation:

– Use n bit to represent 2’s complement numbers

– Range: -2n-1 => 2n-1-1

2

0

11 22

n

i

ii

nn aaA

Page 79: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 79

2’s complement representation of signed numbers

+10 = 0000 1010

- 10 = 28-10 = 1 0000 0000

– 0000 1010

1111 0110

- 10 = 1111 0110

+10 + (-10) = ?

Page 80: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 80

MIPS signed number representation

32 bit signed numbers:

0000 0000 0000 0000 0000 0000 0000 0000two = 0(10)

0000 0000 0000 0000 0000 0000 0000 0001two = + 1(10)

0000 0000 0000 0000 0000 0000 0000 0010two = + 2(10)

...0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646(10)

0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647(10)

1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648(10)

1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647(10)

1000 0000 0000 0000 0000 0000 0000 0010two = – 2,147,483,646(10)

...1111 1111 1111 1111 1111 1111 1111 1101two = – 3(10)

1111 1111 1111 1111 1111 1111 1111 1110two = – 2(10)

1111 1111 1111 1111 1111 1111 1111 1111two = – 1(10)

Page 81: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 81

2’s complement representation of signed numbers

Procedure to find binary representation of negative number in 2’s complement:

– Find the binary equivalent of the magnitude

– Complement each bit (0=>1, 1=>0)

– Add 1

Eg: find representation of -13 in 8-bit signed number system using 2’s complement:

• Magnitude: 13 = 0000 1101

• 1’s complement: 1111 0010

• Add 1: + 1

• -13 = 1111 0011

Page 82: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 82

2’s complement representation of signed numbers

To find the magnitude of a negative number:

– Complement each bit

– Add 1

Eg:

-5 11111011 -1 11111111

00000100 00000000

1 1

5 00000101 1 00000001

Page 83: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 83

3. Arithmetic for Computer

1. Introduction

2. Unsigned Numbers

3. Signed Number

4. Addition and Subtraction

5. Multiplication

6. Division

7. Floating Point

Page 84: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 84

3.4. Addition & Subtraction

3.4.1. Addition of unsigned numbers

3.4.2. Addition of signed numbers

3.4.3. Subtraction of signed numbers

Page 85: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 85

3.4.1. Addition of Unsigned Numbers

Unsigned binary addition similar to decimal addition.

decimal binarycarry 1100 11110

A 2565 10110

B 6754 11011

sum 9319 110001

Eg: 10101(2) + 11011(2) = ? (2)

Page 86: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 86

Addition of Unsigned Numbers

Overflow

– Occur when the result of addition is out of range of representation (the result can not be stored in the predefined number of bits)

– Eg:

X = 1001 0110 = 150

Y = 0001 0011 = 19

S = 1010 1001 = 169

Cout = 0

Overflow occurs when Cout = 1

X = 1100 0101 = 197

Y = 0100 0110 = 70

S = 0000 1011=11 267

Cout = 1 carry-out

Page 87: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 87

3.4.2. Addition of Signed Numbers

The reason that 2’s complement is so popular is the simplicity of addition.

To add any two numbers, no matter what the sign of each is, we just do binary addition on their representation.

-5 1011 -5 1011 -5 1011

+7 0111 +5 0101 +3 0011

+2 0010 0 0000 -2 1110

Page 88: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 88

Addition of Signed Numbers

Overflow

– Occur when the result of addition is out of range of representation (the result can not be stored in the predefined number of bits)

– Occur when?

• Add two numbers of the opposite sign?

• Add two positive numbers?

• Add two negative numbers?

Overflow occurs when adding two numbers with the same sign and the result is in different sign

Page 89: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 89

3.4.3. Subtraction of Signed Numbers

Principle:

– Subtraction is addition of negative number.

a – b = a + (-b)

Eg: 7 – 5 = ?5 0101 7 0111

1010 -5 +1011

+ 1 2 0010

-5 1011

Page 90: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 90

Overflow when adding or subtracting signed numbers:

Page 91: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 91

Addition & Subtraction in MIPS

Unsigned integers are commonly used for memory addresses where overflow ignored, so MIPS provide two kinds of addition and subtraction:

– add, addi, sub cause exception when overflow

– addu, addiu, subu do not cause exception when overflow

– MIPS has a register called exception program counter (EPC) to store the address of the instruction that caused the exception.

– Instruction mfc0 is used to copy EPC into a general purpose register

mfc0 s1,epc

Page 92: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 92

3. Arithmetic for Computer

1. Introduction

2. Unsigned Numbers

3. Signed Number

4. Addition and Subtraction

5. Multiplication

6. Division

7. Floating Point

Page 93: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 93

3.5. Multiplication

3.5.1. Multiplication of unsigned numbers

3.5.2. Multiplication of signed numbers

Page 94: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 94

3.5.1. Multiplication of unsigned numbers

Multiplication of two n-bit unsigned numbers, the product is one 2n-bit unsigned number.

1000 Multiplicand (+8) x 1001 Multiplier (+9) 1000 0000 000010001001000 Product (+72)

Page 95: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 95

Multiplication implementation - 1st version

Multiplicand, ALU, product are 64 bit

Multiplier is 32 bit

Page 96: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 96

Multiplication algorithm

If each step take one clock cycle, this multiplication algorithm will require almost 100 clock cycles to multiply two 32-bit numbers.

Page 97: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 97

Multiplication implementation – 2nd version

Multiplicand, ALU, multiplier are 32 bit

Product is 64 bit

Page 98: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 98

3.5.2. Multiplication of signed numbers

Use unsigned multiplication:

– Convert multiplicand & multiplier to positive numbers

– Multiply using unsigned multiplication algorithm

– Change the sign of product:

• If the signs of multiplicand and multiplier are the same, the product is the result of step 2.

• If the signs disagree, the product is two’s complement of the result of step 2.

Page 99: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 99

Faster multiplication

Page 100: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 100

Multiply in MIPS

Product is stored in a pair of 32-bit special registers, called Hi & Lo

To produce properly signed and unsigned product, MIPS has two multiplication instructions:

– mult – multiply signed

– multu – multiply unsigned

To fetch the 32-bit product, MIPS provide two instructions:

– mfhi

– mflo

Page 101: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 101

3. Arithmetic for Computer

1. Introduction

2. Unsigned Numbers

3. Signed Number

4. Addition and Subtraction

5. Multiplication

6. Division

7. Floating Point

Page 102: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 102

3.6. Division

3.6.1. Division of unsigned numbers

3.6.2. Division of signed numbers

Page 103: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 103

3.6.1. Division of unsigned numbers

Divide 1001010(10) by 1000(10)

Page 104: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 104

Division implementation – 1st version

Divisor, ALU, remainder are 64 bit

Quotient is 32 bit

Page 105: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 105

Division algorithm

Page 106: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 106

Division implementation – 2nd version

Page 107: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 107

3.6.2. Division of signed numbers

Use unsigned division algorithm:

– Convert dividend and divisor to positive numbers

– Divide using unsigned division algorithm

– Change the sign of product:

• Dividend Divisor Quotient Remainder

+ + Keep Keep

+ - Negate Keep

- + Negate Negate

- - Keep Negate

Page 108: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 108

Faster division

Page 109: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 109

Divide in MIPS

Remainder is stored in Hi, quotient is stored in Lo

To produce properly signed and unsigned result, MIPS has two division instructions:

– div – divide signed

– divu – divide unsigned

To fetch the 32-bit result, MIPS provide two instructions:

– mfhi

– mflo

Page 110: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 110

3. Arithmetic for Computer

1. Introduction

2. Unsigned Numbers

3. Signed Number

4. Addition and Subtraction

5. Multiplication

6. Division

7. Floating Point

Page 111: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 111

3.7. Floating Point

Floating-point numbers can be represented in the form:

(-1)sx1.xxxxx(2)x2yyyy

s: sign (s=0 => positive, s=1 => negative)

x,y={0,1}

15.75(10) =1111.11(2) = 1.11111x23

Page 112: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 112

Floating-point number representation

Page 113: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 113

IEEE 754 standard

To represent floating-point numbers

Base b=2

Three basic forms:

– Single-precision number representation, 32 bit

– Double-precision number representation, 64 bit

– Extended-precision number representation, 80 bit

Formats:

S me

79 63 078 64

S me

31 22 030 23

S me

63 51 062 52

Page 114: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 114

IEEE 754 standard

X = (-1)S x 1.m x 2e-b s: sign bit (s=0 => positive, s=1 => negative) e: excess b: bias

– Single-precision 32-bit : b = 127

– Double-precision 64-bit : b = 1023

– Extended-precisioin 80-bit : b = 16383

S me

79 63 078 64

S me

31 22 030 23

S me

63 51 062 52

Page 115: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 115

IEEE 754 standard

Example: One real number X is represented using IEEE 754 standard with the following format:

1100 0001 0101 0110 0000 0000 0000 0000

Show the decimal value of X

Solution: 1100 0001 0101 0110 0000 0000 0000 0000

X = (-1)S x 1.m x 2e-b

= (-1)1 x 1.1010110 x 2130-127

= -1.101011 x 23= -1101.011 = -13.375(10)

s=1 e=130 m

Page 116: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 116

IEEE 754 standard

Example: One real number X is represented using IEEE 754 standard with the following format:

0011 1111 1000 0000 0000 0000 0000 0000

Show the decimal value of X

Solution:

X = (-1)S x 1.m x 2e-127

= (-1)0 x 1.0 x 2127-127

= 1

Page 117: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 117

IEEE 754 standard

Example: Represent the real number -19.7890625(10) using 32 bit IEEE 754 standard

19.7890625 = 10011.1100101(2) = 1.00111100101 x 24

-19.7890625 = (-1)1 x 1.00111100101 x 2131-127

= 1 1000 0011 001111001010...0

Page 118: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 118

Special convention

Page 119: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 119

Range of Representation

32 bit: a = 2-127 ≈ 10-38 b = 2+127 ≈ 10+38

64 bit: a = 2-1023 ≈ 10-308 b = 2+1023 ≈ 10+308

80 bit: a = 2-16383 ≈ 10-4932 b = 2+16383 ≈ 10+4932

-0 +0-a b-b a

underflowoverflow overflow

¥ ¥

Page 120: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 120

Floating-point addition

Page 121: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 121

Block

diagram

of

an arithmetic

unit

dedicated to

floating-point

addition

Page 122: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 122

Floating-point multiplication

Page 123: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 123

Floating-point instructions in MIPS

Operations Single Double

Addition add.s add.d

Subtraction sub.s sub.d

Multiplication mul.s mul.d

Division div.s div.d

Comparison c.x.s c.x.d

bclt Branch if true

bclf Branch if false

x = eq, neq, lt, le, gt, ge

Page 124: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 124

Floating-point instructions in MIPS

MIPS has dedicated registers and instructions used for floating-point operations:

– 32 floating-point registers: $f0,$f1,$f2...,$f31

– Separate load and store for floating-point registers: lwc1, swc1,

lwc1 f4,0(sp) # Load 32 bit F.P number into f4

lwc1 f6,4(sp) # Load 32 bit F.P number into f6

add.s f2,f4,f6 # f2 = f4 + f6

swc1 f2,8(sp) # Store F.P number from f2

Page 125: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 125

Floating-point instructions in MIPS

A double precision register is combination of an even-old pair of single precision registers, using even register as its name

lwc1 f4,0(sp) # Load upper part of 64 bit F.P number into f4

lwc1 f5,4(sp) # Load lower part of 64 bit F.P number into f5

lwc1 f6,8(sp) # Load upper part of 64 bit F.P number into f6

lwc1 f7,12(sp) # Load lower part of 64 bit F.P number into f7

add.d f2,f4,f6 # (f2,f3) = (f4,f5) + (f6,f7)

swc1 f2,16(sp) # Store upper part of 64 bit F.P number from f2

swc1 f2,20(sp) # Store upper part of 64 bit F.P number from f2

Page 126: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 126

MIPS floating-point assembly language

Page 127: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 127

Content

1. Introduction - Computer system technology and Computer Performance

2. Instruction Set Architecture

3. Arithmetic for Computer

4. CPU Organization

Page 128: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 128

4. CPU Organization

4.1. Building a datapath

4.2. Single cycle implementation

4.3. Multi cycle implementation

4.4. Exceptions

4.5. Pipelining

Page 129: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 129

4.1. Building a datapath

A basic MIPS implementation

Implement a subset of the core MIPS instruction set:

– Memory-reference instructions: lw, sw

– Arithmetic-logical instructions: add, sub, and, or, slt

– Branch instructions: beq, j

Page 130: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 130

5 bits 5 bits

31 25 20 15 0

Opcode Source register 1

Source register 2

op rs rt

R 6 bits 5 bits

rd

5 bits

sh

6 bits

10 5 fn

Destination register

Shift amount

Opcode extension

Immediate operand or address offset

31 25 20 15 0

Opcode Destination or data

Source or base

op rs rt operand / offset

I 5 bits 6 bits 16 bits 5 bits

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0

31 0

Opcode

op jump target address

J Memory word address (byte address divided by 4)

26 bits

25

6 bits

Review of instruction classes

Page 131: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 131

A basic MIPS implementation

For every instruction, the first two steps are identical:

– Send the program counter PC to instruction memory and fetch the instruction from that memory.

– Read one or two registers, using fields of the instruction to select the registers to read.

– Instructions class except for jump use ALU:

• Memory-reference instructions use ALU for address calculation

• Arithmetic-logical instructions use ALU for operation execution

• Branch instructions use ALU for comparison

– Next actions differ according to instructions:

• Memory-reference instructions: access memory to write / read

• Arithmetic-logical instructions: write data from ALU back to registers

• Branch instructions: change next instruction address based in comparison

Page 132: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 132

Abstract view of a basic MIPS implementation – v1.0

1

2

3

Page 133: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 133

Abstract view of a basic MIPS implementation – v1.0

Drawback in version 1.0 of the basic MIPS implementation:

– In several places, data going into a particular unit as coming from two different sources.

• Value written into PC can come from one of two adders.

• Data written into registers can come from either ALU or date memory

– Several of the units must be controlled depending on the type of instruction.

• Data memory must read on load and write on store.

• Registers must be written on a load and an arithmetic-logical instruction.

Page 134: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 134

Abstract view of a basic MIPS implementation – v2.0

Page 135: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 135

Abstract view of a basic MIPS implementation – ver 2

Benefits:

– Simple, easy to understand

– Single-cycle datapath

Requirement:

– Instruction memory and data memory are separate because:

• Format of data and instruction is different

• Having separate memories is less expensive

• The processor operates in one cycle and cannot use single-portet memory for two different access within that cycle.

Page 136: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 136

Logic design conventions

To design the machine, how the logic implementing the machine will operate and how the machine is clocked needed to be decided.

MIPS design consists of two types of logic elements:

– Combinational elements

– Sequential elements:

• Flip-flops, memories, registersPlease revise

Digital Logic Design

Page 137: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 137

Building a datapath

Fetching instructions and incrementing PC

Implement R-format ALU operations

Loading and storing data

Branching with beq

Page 138: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 138

Fetching instructions and incrementing PC

Read data from instruction memory to output

PC = PC + 4

Page 139: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 139

Implement R-format ALU operations

R-format ALU instructions: add, sub, and, or, slt

add s1,s2,s3 # s1 = s2 + s3

1. Read s2,s3 from register file according to two register numbers to outputs

2. ALU execute operation add

3. Write result back to register s1, need write control signal

Page 140: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 140

Loading and storing data

lw t1, offset_value(t2)

sw t1, offset_value(t2)

1. Compute memory address by adding the base register t2 with 16-bit sign extended field offset_value.

2. Read data from register t1 to write to calculated address in data memory

or Read data from calculated

address in data memory to

write to register t1.

Page 141: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 141

Branching with beq

beq t1,t2,offset

Compute branch target address by adding sign-extended offset to PC

– Two notes in the definition of branch instruction:

• The instruction set architecture specifies that the base for branch address calculation is the address of the instruction following the branch. So we can always compute PC+4 in fetching period.

• The offset field is shifted left 2 bits so that it’s a word offset.

Use ALU to evaluate branch condition:

– Read two registers t1, t2 from register file to inputs of ALU

– Subtract two inputs of ALU, assert control signal

Page 142: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 142

Branching with beq

Page 143: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 143

Creating a single datapath

Single datapath:

– Execute every instruction in one clock cycle.

– No resource used more than once per instruction, so any elements needed more than once must be duplicated.

• Separate memories for instruction and data.

Page 144: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 144

A single datapath for memory and R-type instructions

Page 145: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 145

A single datapath for a basic MIPS

Control signals are not connected

Page 146: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 146

4. CPU Organization

4.1. Building a datapath

4.2. Single cycle implementation

4.3. Multi cycle implementation

4.4. Exceptions

4.5. Pipelining

Page 147: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 147

4.2. Single cycle implementation

ALU Control

Main Control Unit

Page 148: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 148

ALU Control

ALU has four control inputs:

Depend on the instruction class, one of first five functions of ALU will be performed (NOR is needed for other parts):

– Load, store instructions: use ALU to compute memory address by addition

– R-type instructions: ALU performs one of five actions (add, sub, and, or, slt) based on 6-bit function field in the instruction.

– Branch beq: ALU performs subtraction

Page 149: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 149

ALU control inputs

Truth table which uses don’t care values to have compact minimization form

Page 150: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 150

Designing the Main Control Unit

Page 151: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 151

Three instruction classes

The opcode is always contained in bits 31:26. We refer to this field as Opcode[5:0]

Two registers to be read (R-type, beq, sw) are always at 25:21 and 20:16

The base register for load and store instruction is always in bit 25:21

16 bit offset for load/store/beq is always in 15:0

The destination register is in one of two places:

– For load, it’s in 20:16

– For R-type, it’s in 15:11

Page 152: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 152

A simple datapath with all control lines identified

Page 153: Computer Architecture (Kiến trúc máy tính)

153Single datapathwith control unit

Page 154: Computer Architecture (Kiến trúc máy tính)

154Single control & datapathextended to handle jump

Page 155: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 155

Why a single-cycle implementation is not used today?

Although single-cycle design will work correctly, it will not be used in modern designs because it’s inefficient:

– Clock cycle must have the same length for every instruction in this single-cycle design => Clock cycle is determined by the longest possible path in the machine.

• Longest instruction: Load instruction: uses 5 functional units in series: instruction memory, register file, ALU, data memory, register file.

=> Not good since several instruction could fit in a shorter clock cycle

– Some functional units (ALU) must be duplicated

=> Hardware cost

Page 156: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 156

4. CPU Organization

4.1. Building a datapath

4.2. Single cycle implementation

4.3. Multicycle implementation

4.4. Exceptions

4.5. Pipelining

Page 157: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 157

4.3. Multicycle implementation

Multicycle implementation:

– Break each instruction into a series of steps corresponding to the functional unit operations needed.

• Each step in the execution will take one clock cycle. Each instruction will take different numbers of clock cycles.

• Allow a functional unit to be used more than once per instruction => hardware share.

Page 158: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 158

High-level view of multicycle datapath

Key elements:

– A shared memory unit for both instructions & data

– A single ALU

– Require additional registers: IR, Memory data register, A,B, ALUout

– Require additional multiplexers

Page 159: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 159

Multicycle implementation

At the end of a clock cycle, all data used in subsequent clock cycle must be stored in a state element:

– Data used by subsequent instructions in a later clock cycle is stored in one of the programmer-visible state elements: register file, PC, memory

– Data used by the same instruction in a later clock cycle is stored in one of additional registers.

One clock cycle can accommodate at most one of the following operations: A memory access, A register file access (two reads and one write), An ALU operation

=> Any data produced by these three functional units must be saved into a temporary register for use in later cycle. If not saved, timing race.

Page 160: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 160

Additional registers

Instruction Register (IR) and Memory data register (MDR) are added to save the output of memory for instruction read and a data read. Two separate registers are used since both values are needed during the same clock.

A,B registers are used to hold values read from register file.

ALUout register holds the output of ALU.

Page 161: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 161

Additional multiplexers

Replacing three ALUs of single-cycle datapath by a single ALU requires:

– An additional multiplexer: {A,PC} => the first ALU input.

– An additional multiplexer: {B,4,Sign extend, shift left 2} => the second ALU input

Page 162: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 162

Multicycle datapath with control lines

Page 163: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 163

Branch and jump instructions

With jump and branch instructions, three possible sources of values to be written into PC:

– The output of ALU, PC + 4 => PC directly

– Register ALUout – the address of branch target after it is computed

– Address of jump target = Lower 26 bit of IR << 2 concatenated with 4 upper bits of the incremented PC

Page 164: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 164

Complete datapath for multicycle implementation

Page 165: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 165

5 bits 5 bits

31 25 20 15 0

Opcode Source register 1

Source register 2

op rs rt

R 6 bits 5 bits

rd

5 bits

sh

6 bits

10 5 fn

Destination register

Shift amount

Opcode extension

Immediate operand or address offset

31 25 20 15 0

Opcode Destination or data

Source or base

op rs rt operand / offset

I 5 bits 6 bits 16 bits 5 bits

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0

31 0

Opcode

op jump target address

J Memory word address (byte address divided by 4)

26 bits

25

6 bits

Page 166: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 166

ftp address to download materials

ftp://dce.hut.edu.vn/kiennt/

Page 167: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 167

Action of the 1-bit control signals

Page 168: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 168

Action of the 2-bit control signals

Page 169: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 169

Breaking instruction execution into clock cycles

Each MIPS instruction needs from three to five of these steps

– 1. Instruction fetch step.

– 2. Instruction decode and register fetch step.

– 3. Execution, memory address computation or branch completion.

– 4. Memory access or R-type instruction completion step.

– 5. Memory read completion step.

Same for all instructions

Page 170: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 170

1. Instruction fetch step

Fetch the instruction from memory and compute the address of the next sequential instructions:

– IR <= Memory[PC];

• Assert the control signals MemRead and IRWrite and set IorD = 0 to select PC as the source of address.

– PC <= PC + 4;

• ALUSrcA = 0 (sending PC to ALU input)

• ALUSrcB = 01 (sending 4 to ALU input)

• ALUOp = 00 (to make ALU add)

• PCSource = 00 (to select output of ALU addition as source)

• assert PCWrite to write back to PC

Page 171: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 171

2. Instruction decode and register fetch step

Read two registers in rs and rt instruction fields and compute the branch target address with ALU

– A <= Reg[IR[25:21]]

– B <= Reg[IR[20:16]]

– ALUOut <= PC + (sign-extend (IR[15:0])<<2

• ALUSrcA = 0 to select PC as ALU input

• ALUSrcB = 11 to select sign-extended and shifted offset field as ALU input

• ALUOp = 00 to make ALU add.

Are these operations necessary for all

instructions?

Page 172: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 172

3. Execution, memory address computation or branch completion step

This is the first cycle during which the datapath operation is determined by the instruction class

– Memory reference

• ALUOut <= A + sign-extend (IR[15:0])

– Arithmetic-logical instruction (R-type)

• ALUOut <= A op B

– Branch

• if (A==B) PC <= ALUOut

– Jump

• PC <= {PC[31:28], (IR[25:0],2’b00)};

Page 173: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 173

4. Memory access or R-type instruction completion

During this step, a load or store instruction accesses memory and an arithmetic-logical instruction writes its result.

– Memory reference:

• MDR <= Memory [ALUOut];

• or Memory [ALUOut] <= B;

– Arithmetic-logical instruction:

• Reg[ IR[15:11] ] <= ALUOut;

Page 174: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 174

5. Memory read completion step

During this step, loads complete by writing back the value from memory

– Reg [IR[20:16]] <= MDR;

Page 175: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 175

Summary of steps taken to execute any instruction class

Page 176: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 176

Defining the control

We use state machine to describe the operation of MIPS

Page 177: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 177

Figure 5.32. FSM for instruction fetch and decode

Page 178: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 178

Figure 5.33. FSM for controlling memory-reference instructions

Page 179: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 179

Figure 5.34. FSM for R-type instruction

Page 180: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 180

Figure 5.35. FSM for branch instruction

Page 181: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 181

Figure 5.36. FSM for jump instruction

Page 182: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 182

Complete FSM

Page 183: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 183

4. CPU Organization

4.1. Building a datapath

4.2. Single cycle implementation

4.3. Multicycle implementation

4.4. Exceptions

Page 184: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 184

4.4. Exceptions

Exception:

– Is an unexpected event from within the processor

– Ex: arithmetic overflow, using an undefined instruction...

Interrupt:

– Is an event that causes an unexpected change in control flow but comes from outside of the processor.

– Interrupts are used by IO devices to communicate with the processor.

– Ex: timer interrupt...

Page 185: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 185

How exceptions are handled?

Two types of exceptions which our current implementation can generate:

– Execution of an undefined instruction

– Execution of an arithmetic overflow

Basic action of machine when an exception occurs:

– Save address of offending instruction in the Exception Program Counter (EPC).

– Transfer control to the OS at some specific address.

• The OS can then take appropriate action which may involve providing some service to the user program, taking predefined action in response to and overflow, or stopping the execution of program and reporting errors.

• Then the OS can terminate the program or may continue its execution, using EPC to determine the address to restart the program.

Page 186: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 186

How exceptions are handled?

We can implement the processing required for exception by adding a few extra registers and control signals to our basic implementation and by slightly extending the FSM.

– Two additional registers:

• EPC: 32-bit register used to hold address of the affected instruction

• Cause: 32-bit register used to record the cause of exception

– Cause register = 0: undefined instruction

– Cause register = 1: arithmetic overflow

– Two control signals used to cause EPC and Cause register to be written: EPCWrite, CauseWrite.

– One-bit signal to set value 0/1 to Cause register.

– Write exception address of handling code to PC (8000 0180(16))

Page 187: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 187

The multicycle datapath with exception handling

Page 188: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 188

How control checks for exceptions?

Undefined instruction exception:

– This is detected when no next state is defined from state 1 for op value

– We handle this exception by defining the next-state value for all op values rather than lw, sw, 0 (R-type), j and beq as state 10.

Arithmetic overflow exception:

– The ALU designed includes logic to detect overflow and a signal called Overflow is provided as an output from ALU. This signal is used to specify additional possible next state (state 11) for state 7.

Page 189: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 189

FSM

with

exception

detection

Page 190: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 190

Midterm exam (70’)

Using multicycle datapath implementation of MIPS, explain the operation of the following instructions:

– a. load/store t1,15(s0)

– b. add/sub/and/or/slt s0,t1,t2

– c. beq t1,t2,Label

– d. jump Label

Page 191: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 191

4. CPU Organization

4.1. Building a datapath

4.2. Single cycle implementation

4.3. Multicycle implementation

4.4. Exceptions

4.5. Pipelining

Page 192: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 192

4.5. Pipelining

Introduction

Pipelined Datapath

Pipelined Control

Pipelined Hazards

Page 193: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 193

4.5. Pipelining

Introduction

Pipelined Datapath

Pipelined Control

Pipelined Hazards

Page 194: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 194

Introduction

Pipelining is an implementation technique used to enhance performance in which multiple instructions are overlapped in execution.

The idea of pipelining is: Divide an instruction into smaller steps which can be executed concurrently.

Page 195: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 195

Introduction

MIPS instructions classically take five steps:

– 1. Fetch instruction from memory.

– 2. Read registers while decoding the instruction. The format of MIPS instruction allow reading and decoding to occur simultaneously.

– 3. Execute the operation or calculate the address.

– 4. Access an operand in memory.

– 5. Write the result into a register.

Page 196: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 196

Pipelining

Inst 1Inst 2Inst 3

Inst 4Inst 5

Inst 6

Inst 7

1 2 3 4 5 6 7 8 9 10 11 12

FI DI EX FO WR

Page 197: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 197

Single-cycle versus Pipelined performance

In this example, we limit our attention to 8 instructions: lw, sw, add, sub, and, or, slt, beq.

Total time for each instruction:

Page 198: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 198

Single-cycle versus Pipelined performance

Page 199: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 199

Pipelined performance

If the stages are perfectly balanced, then the time between instructions in pipelined processor when the number of instruction is large is equal to:

Page 200: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 200

4.5. Pipelining

Introduction

Pipelined Datapath

Pipelined Control

Pipelined Hazards

Page 201: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 201

A pipelined datapath

The division of an instruction into five stages means a five-stage pipeline, which in turn means five instructions will be in execution during any single clock cycle:

Page 202: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 202

Pipelined execution in single-cycle datapath of MIPS

Because each resource is used during only one of the five stages of an instruction, allowing it to be shared by other instructions during the other four stages.

=> To retain the value of an individual instruction for its other four stages, the value must be saved in a register.

Page 203: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 203

Pipelined datapath of MIPS

RegistersRegisters must be wide enough to store all data corresponding to the lines that go through them. IF/ID:64, ID/EX:128, EX/MEM:97,MEM/WB:64

Page 204: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 204

How are portions of datapath used during an instruction?

– Example of a load instruction

Page 205: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 205

IF: first stage of an instruction

Page 206: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 206

ID: second stage of an instruction

Page 207: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 207

EX: third stage of an instruction

Page 208: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 208

MEM: forth stage of an instruction

Page 209: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 209

WB: fifth stage of an instruction

Write register number needed to be saved for WB stage

Page 210: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 210

Corrected datapath to handle a load instruction

Page 211: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 211

4.5. Pipelining

Introduction

Pipelined Datapath

Pipelined Control

Pipelined Hazards

Page 212: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 212

Pipelined datapath of MIPS with control

For pipelined datapath, we can divide control lines into five groups according to the pipeline stage:

– IF: The control signals to read instruction memory and to write the PC are always asserted, so there is nothing special to control in this pipeline stage.

– ID: As in the previous stage, the same thing happens at every clock cycle, so there are no optional control lines to set.

– EX: The signals to be set are RegDst, ALUOp, ALUSrc. The signals select Result register, the ALU operation, and either read data 2 or a sign-extended immediate for ALU.

– MEM: The control lines set in this stage are Branch, MemRead, MemWrite. These signals are set by the branch equal, load, store instructions.

– WB: two control lines are MemtoReg, which decides between sending the ALU result or the memory value to the register file, and RegWrite, which writes the chosen value.

Page 213: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 213

Pipelined datapath of MIPS with control

Page 214: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 214

9 control lines for final three stages

Four control lines for EX stageThree control lines for MEM stageTwo control lines for WB stage

Page 215: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 215

The pipelined datapath with control signals connected

Page 216: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 216

Designing Instruction sets for pipelining

The design of MIPS was designed for pipeline execution:

– 1. All MIPS instructions are the same length.

» Easy to fetch instruction in 1st stage and decode it in 2nd stage.

– 2. MIPS only has a few instruction formats, with the source register fields being located in the same place in each instruction.

» In 2nd stage, register file can be read at the same time that hardware is deciding what type of instruction was fetched.

– 3. Memory operands only appear in load and store in MIPS.

– 4. Operands are aligned in memory.

– RISC – Reduced Instruction Set Computer

– CISC – Complex Instruction Set Computer

Page 217: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 217

4.5. Pipelining

Introduction

Pipelined Datapath

Pipelined Control

Pipelined Hazards

Page 218: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 218

Pipeline Hazards

There are situations in pipelining when the next instruction can not execute in the following clock cycle => hazards.

Three types of hazards:

– Structural Hazards

– Data Hazards

• Forwarding

• Stall

– Control Hazards

Page 219: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 219

Hazards

Structural Hazards

Data Hazards

– Forwarding

– Stall

Control Hazards

Page 220: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 220

Structural Hazards

Structural Hazards occur when the hardware cannot support the combination of instructions that we want to execute in the same clock cycle.

MIPS is designed to avoid structural harzards when designing a pipeline.

– Ex: suppose we have a single memory instead of two mem (instruction memory + data memory)

Conflict when both read memory

Page 221: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 221

Hazards

Structural Hazards

Data Hazards

– Forwarding

– Stall

Control Hazards

Page 222: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 222

Data Hazards

Data Hazards occur when the pipeline must be stalled because one step must wait for another to complete.

– In a computer pipeline, data hazards arise from the dependence of one instruction on an ealier one that is still in pipeline.

add s0, t0, t1

sub t2, s0, t3

# add instruction doesn’t write its result to s0 until fifth stage

# sub instruction read s0 register in second stage

Solution:

– Observation: we don’t need to wait for instruction to complete before trying to resolve the data hazards. For code above, as soon as the ALU creates sum for addition, we can supply it as input for substraction.

– Adding extra hardware to to retrieve missing item early from the internal resources is called forwading or bypassing.

Page 223: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 223

Representation of instruction pipeline

IF: Instruction fetch ID: Instruction decode + register file read EX: Execution MEM: Memory access WB: Write back

The shading indicates the element is used by instruction– MEM: white => add doesn’t access memory

– Shading on the right half: Read

– Shading on the left half: Write

Page 224: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 224

Hazards

Structural Hazards

Data Hazards

– Forwarding

– Stall

Control Hazards

Page 225: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 225

Data hazards and forwarding

Instead of waiting until the fifth stage of add instruction for the result in s0 register, forwarding uses extra hardware to write the output of EX stage of add instruction to the input of EX stage for sub instruction. (extra hardware to create connection)

Page 226: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 226

Data hazards and forwarding

Forwarding paths are valid if only the destination stage is later in time than the source stage.

– Ex: lw s0,20(t1)

sub t2,s0,t3

there cannot be a valid forwarding path from the output of memory access stage in the first instruction to the input of the execution stage of the following , since that would mean going backward in time.

pipeline stall

Page 227: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 227

Data hazards and forwarding

Pipeline stall or bubble is added to solve data hazards when an R-format instruction following a load tries to use the data.

pipeline stall

Page 228: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 228

Reordering code to avoid pipeline stalls

Page 229: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 229

Reordering code to avoid pipeline stalls

Page 230: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 230

Forwarding

Forwarding yields another insight into the MIPS architecture. Each MIPS instruction writes at most one result and does so near the end of the pipeline. Forwarding is harder if there are multiple results to forward per instruction or they need to write a result early on in instruction execution.

Notes:

– The name “forwarding” comes from the idea that the result is passed forward from an earlier instruction to a later instruction. “Bypassing” comes from passing the result by register file to the desired unit.

Page 231: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 231

ALU and pipeline registers without forwarding

Page 232: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 232

ALU and pipeline registers with forwarding

Page 233: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 233

Datapath modified to resolve hazards via forwarding

Page 234: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 234

Hazards

Structural Hazards

Data Hazards

– Forwarding

– Stall

Control Hazards

Page 235: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 235

Data hazards and Stalls

Forwarding can not resolve this program because the destination stage is ealier in time than the source stage.

?

Page 236: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 236

Stalls are inserted into pipeline

Page 237: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 237

Harzards detection unit used for stalls

Page 238: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 238

Harzards detection unit used for stalls

Harzards detection unit is used to solve hazard when an instruction tries to read a register following a load instruction that writes the same register.

Checking for load instruction:

– if (ID/EX.MemRead and

(( ID/EX.RegisterRt = IF/ID.RegisterRs) or

( ID/EX.RegisterRt = IF/ID.RegisterRt)))

stall the pipeline

load instruction or not?

the destination register field of load instructionin EX stage matcheseither source register of the instruction in IDstage

Page 239: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 239

Hazards

Structural Hazards

Data Hazards

– Forwarding

– Stall

Control Hazards

Page 240: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 240

Control Hazards (Branch Hazards)

Control hazards or branch hazards occur when the proper instruction can not execute in the proper clock cycle because the instruction that was fetched is not the one that is needed, that is the flow of instruction addresses is not what the pipeline expected.

In branch instruction:

– We need to fetch the instruction following the branch on the next clock cycle to allow pipeline.

– But the pipeline cannot possibly know what the next instruction should be, since it only just received the branch instruction from memory.

Page 241: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 241

Impact of pipeline on the branch instruction

if branch is taken,we need to discard(flush) these instructions

Page 242: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 242

Solution 1: Stall

If we can move branch decision up earlier, we have less delay.

Assume that we put in enough extra hardware so that we can test registers, calculate the branch address, and update PC during the second stage of pipeline. Even with this highly costed extra hardware, we still have stall:– lw instruction, executed if the branch fails, is stalled one extra 200ps clock

cycle before starting.

The cost of extra hw for most computer is too high

Page 243: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 243

Solution 2: Branch Prediction

To solve branch hazards, we use branch prediction:

– One simple approach is to always predict that branches will be undertaken.

• When branches are correctly undertaken, the pipeline proceeds at full speed.

• When branches are taken, we have some stalls.

Page 244: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 244

Branch prediction is solution to control hazards

Page 245: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 245

Branch prediction

A more sophisticated version of branch prediction would have some branches predicted as undertaken, and some as taken:

– For usual branch like previous example, we predict branch as undertaken.

– For branches at the loops, branches are usually jump back to the top of loop. So branches are predicted as taken.

– Dynamic hardware predictor:

• May change predictions for a branch over the life of a program.

• Keeping a history table for each branch as taken or not taken, the using the past behavior to predict the future.

Page 246: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 246

Solution 3: Delayed decision (used by MIPS)

The delayed branch always executes the next sequential instruction, with branch taking place after that one instruction delay.

Compiler and assembler try to place an instruction that always executes after the branch in the branch delay slot.

add t1,t2,t4beq s0,s1,LABELor t5,t6,t7

LABEL: ....

beq s0,s1,LABELadd t1,t2,t4or t5,t6,t7

LABEL: ....

Page 247: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 247

Scheduling the branch delay slot

Page 248: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 248

Pipeline summary

We have seen three models of execution: single cycle, multicycle, and pipelined.

Pipelined control strives for 1 clock cycle per instruction, like single cycle, but also for a fast clock cycle, like multicycle.

Page 249: Computer Architecture (Kiến trúc máy tính)

© NTK 2009Page 249

FINISH