CMPT 250 Computer Architecture

Instructor: Yuzhuang Hu [email protected]


Page 1: CMPT 250 Computer Architecture

Instructor: Yuzhuang Hu [email protected]

Page 2: CMPT 250 Computer Architecture

Assembly Lines

An assembly line is a manufacturing process in which parts are added to a product in a sequential manner, using optimally planned logistics, to create a finished product much faster than handcrafting-type methods.

The Ford Motor Company built the world’s first assembly line between 1908 and 1915.

This pipeline made the Ford Model T affordable and brought high wages to Ford workers.

Page 3: CMPT 250 Computer Architecture

Some Pictures of the Ford 1913 Assembly Line

Page 4: CMPT 250 Computer Architecture

A Calculation

Consider assembling a car. Assume it has three steps: install the hood (5 minutes), install the engine (20 minutes), and install the wheel (10 minutes).

One car takes 35 minutes. Three cars take 105 minutes if only one car can be worked on at a time.

Page 5: CMPT 250 Computer Architecture

A Calculation contd.

What if we have three workers, one for each step?

Ideally, one car is completed every 20 minutes.

[Figure: pipelined timeline, in minutes. 1st car: hood done at 5, engine at 25, wheel at 35. 2nd car: engine done at 45, wheel at 55. 3rd car: engine done at 65, wheel at 75.]
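The timeline above can be checked with a short simulation. This is a minimal sketch, assuming the step order and durations read from the slide figure (hood 5, engine 20, wheel 10 minutes), that a finished car may wait between steps, and that each worker handles one car at a time. The function name `pipeline_finish_times` is made up for illustration.

```python
def pipeline_finish_times(step_minutes, num_cars):
    """Return finish[c][s]: the minute at which car c completes step s."""
    finish = []
    for car in range(num_cars):
        row = []
        for step, minutes in enumerate(step_minutes):
            # The car must have left the previous step ...
            car_ready = row[step - 1] if step > 0 else 0
            # ... and the worker must have finished the previous car.
            worker_free = finish[car - 1][step] if car > 0 else 0
            row.append(max(car_ready, worker_free) + minutes)
        finish.append(row)
    return finish


steps = [5, 20, 10]  # hood, engine, wheel (assumed order)
times = pipeline_finish_times(steps, 3)
sequential_total = sum(steps) * 3   # one car at a time
pipelined_total = times[-1][-1]     # last car leaves the last step
print(times, sequential_total, pipelined_total)
```

The slowest step (the 20-minute engine install) sets the spacing between finished cars, which is why the stages of a pipeline should be of almost the same length.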

Page 6: CMPT 250 Computer Architecture

Pipeline Design

Separate the process into different stages of almost the same length.

These stages are separated by registers.

These registers provide temporary storage for data passing through the pipeline and are called pipeline platforms.

Page 7: CMPT 250 Computer Architecture

A Pipelined Datapath

Conventional: component delays of 0.6, 0.6, 0.2, 0.8, and 0.2 ns. Total: 2.4 ns, so the maximum clock rate is 416.7 MHz.

Pipelined (new version): component delays of 0.6, 0.6, 0.2, 0.2, 0.8, 0.2, and 0.2 ns. Clock period: 1 ns, so the maximum clock rate is 1 GHz.
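The clock rates quoted on this slide follow directly from the clock periods. The sketch below works in integer picoseconds to avoid floating-point rounding. The 1 ns pipelined period is taken from the slide as given, since the grouping of delays into stages is not recoverable from the transcript. The function name is illustrative.

```python
def max_clock_rate_mhz(period_ps):
    """Clock rate in MHz for a clock period given in picoseconds."""
    return 1_000_000 / period_ps


conventional_delays_ps = [600, 600, 200, 800, 200]
conventional_period_ps = sum(conventional_delays_ps)  # whole path in one cycle
pipelined_period_ps = 1000                            # stated on the slide

conventional_mhz = max_clock_rate_mhz(conventional_period_ps)
pipelined_mhz = max_clock_rate_mhz(pipelined_period_ps)
print(round(conventional_mhz, 1), pipelined_mhz)
```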

Page 8: CMPT 250 Computer Architecture

D Latch

Eliminate the undesirable undefined state in the SR latch: ensure S and R are never 1 at the same time.

[Figure: D latch with inputs D and C and outputs Q and its complement.]
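A behavioural sketch of the D latch may help here. It assumes the usual construction, in which the internal SR latch is driven with S = D AND C and R = (NOT D) AND C, so S and R can never both be 1. The class name `DLatch` is made up for illustration.

```python
class DLatch:
    """Level-sensitive D latch modelled on top of SR-latch behaviour."""

    def __init__(self):
        self.q = 0

    def update(self, d, c):
        s = d & c          # set only when D=1 and the latch is enabled
        r = (1 - d) & c    # reset only when D=0 and the latch is enabled
        assert not (s and r), "S and R are never 1 together"
        if s:
            self.q = 1
        elif r:
            self.q = 0
        # s == r == 0: hold the stored value
        return self.q


latch = DLatch()
trace = [latch.update(1, 1), latch.update(0, 0), latch.update(0, 1)]
print(trace)
```

With C=1 the output follows D. With C=0 the stored value is held.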

Page 9: CMPT 250 Computer Architecture

Negative-Edge-Triggered D Flip-Flop

1s-catching behaviour is eliminated, as S and R cannot both be 0 in a D flip-flop.

[Figure: negative-edge-triggered D flip-flop; labeled signals are D, C, S, and R.]

Page 10: CMPT 250 Computer Architecture

Assume no data hazards.

Page 11: CMPT 250 Computer Architecture

How much can we gain?

Conventional: 2.4 * 7 ns = 16.8 ns

Pipeline: 9 * 1 ns = 9 ns
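The comparison above can be reproduced with the standard cycle count for an ideal pipeline, n + k - 1 cycles for n operations and k stages. This sketch assumes a three-stage datapath pipeline (an assumption, chosen because it matches the 9 cycles on the slide for 7 operations) and no hazards. Times are in picoseconds.

```python
def pipeline_cycles(num_ops, num_stages):
    """Cycles to push num_ops through an ideal pipeline with no stalls."""
    return num_ops + num_stages - 1


num_ops = 7
conventional_ps = num_ops * 2400                     # 2.4 ns per operation
pipelined_ps = pipeline_cycles(num_ops, 3) * 1000    # 1 ns per cycle
speedup = conventional_ps / pipelined_ps
print(conventional_ps, pipelined_ps, round(speedup, 2))
```

The speedup stays below the ratio of the clock periods (2.4) because of the cycles spent filling and emptying the pipeline. It approaches 2.4 as the number of operations grows.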

Page 12: CMPT 250 Computer Architecture

Assume no data and control hazards.

Page 13: CMPT 250 Computer Architecture

Pipeline contd.

In the first four clock cycles, the pipeline is filling.

In the next four clock cycles, all stages of the pipeline are active. The pipeline is fully utilized.

In the last three clock cycles, not all stages of the pipeline are active, since the pipeline is emptying.

Page 14: CMPT 250 Computer Architecture

The Reduced Instruction Set Computer (RISC)

The goal of a RISC architecture is high throughput and fast execution. To achieve these goals, accesses to memory are to be avoided.

A RISC architecture has the following properties:

Memory accesses are restricted to load and store instructions, and data-manipulation instructions are register-to-register.

Addressing modes are limited in number.

Instruction formats are all of the same length.

Instructions perform elementary operations.

Page 15: CMPT 250 Computer Architecture

A RISC Instruction Set Architecture

32 registers, R0 through R31. R0 is a special register storing the value zero.

Page 16: CMPT 250 Computer Architecture
Page 17: CMPT 250 Computer Architecture
Page 18: CMPT 250 Computer Architecture

Datapath Organization

The new datapath has 32 32-bit registers. The address inputs are therefore five bits.

The single-bit position shifter is replaced with a barrel shifter to permit multiple-position (SH) shifting.

In the function unit, the ALU is expanded to 32 bits.

The constant unit performs zero fill for CS=0 and sign extension for CS=1.

MUX A is added to provide a path from the updated PC, PC-1, for implementation of the JML instruction.
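The behaviour of the constant unit can be sketched as follows. The slides do not give the width of the immediate field, so the 15-bit default here is an assumption, as is the function name `constant_unit`.

```python
def constant_unit(im, cs, im_bits=15, word_bits=32):
    """Extend an im_bits-wide immediate to word_bits.

    cs=0: zero fill. cs=1: sign extension (copy the top immediate bit).
    """
    low_mask = (1 << im_bits) - 1
    word_mask = (1 << word_bits) - 1
    value = im & low_mask
    sign_bit = value >> (im_bits - 1)
    if cs == 1 and sign_bit:
        value |= word_mask & ~low_mask  # fill the upper bits with 1s
    return value


print(hex(constant_unit(0x4000, 0)), hex(constant_unit(0x4000, 1)))
```

Zero fill treats the immediate as an unsigned constant. Sign extension preserves the value of a negative two's-complement constant.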

Page 19: CMPT 250 Computer Architecture

Datapath Organization contd.

An additional input is added to MUX D to implement the Set if Less Than (SLT) instruction. This input is 1 when N is 1 and V is 0, or when N is 0 and V is 1, that is, when N and V differ (N XOR V).

A final difference is that the register file is no longer edge triggered and is no longer a part of a pipeline platform at the end of the write-back (WB) stage.

In the second half of the cycle, it is possible to read data written into the register file during the first half of the same clock cycle. It is called a read-after-write register file.
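The SLT condition on this slide (N and V differ) can be checked against an ordinary signed comparison. This is a sketch that assumes SLT compares two signed 32-bit values by computing A - B and reading the negative (N) and overflow (V) flags. The function name `slt` is illustrative.

```python
MASK = 0xFFFFFFFF


def slt(a, b):
    """Return 1 if a < b as signed 32-bit values, using N XOR V of a - b."""
    ua, ub = a & MASK, b & MASK
    diff = (ua - ub) & MASK
    n = diff >> 31                      # sign bit of the 32-bit result
    sign_a, sign_b = ua >> 31, ub >> 31
    # Subtraction overflows when the operands have different signs and
    # the result's sign differs from the sign of a.
    v = 1 if (sign_a != sign_b and n != sign_a) else 0
    return n ^ v


samples = [-2**31, -7, -1, 0, 1, 5, 2**31 - 1]
agrees = all(slt(a, b) == int(a < b) for a in samples for b in samples)
print(agrees)
```

N alone is not enough: when the subtraction overflows, the sign of the result is inverted, and V corrects for that.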

Page 20: CMPT 250 Computer Architecture

Control Organization

SH is added to IR, CS is added to the instruction decoder, and MD is expanded to two bits.

MUX C selects from three different sources for the next value of PC.

BrA is formed from the sum of the updated PC value for the branch instruction and the target offset.

RAA is used for the register jump.

BS, PS and Z are used to select the next PC value.

Page 21: CMPT 250 Computer Architecture

Control Organization contd.

To determine the control codes, the CPU is viewed much like the single-cycle CPU.

However, it is important to examine the timing carefully to be sure that various parts of the register transfer statement take place in the right stage of the pipeline.

Note that BrA and RAA are obtained in the EX stage.

Page 22: CMPT 250 Computer Architecture

More on Instruction Set Architecture

The format of an instruction is depicted in a rectangular box symbolizing the bits of the binary instruction.

The bits are divided into groups called fields:

An opcode field.

An address field.

A mode field, which specifies the way the address field is to be interpreted.

Page 23: CMPT 250 Computer Architecture

Operand Addressing

To illustrate the influence of the number of operands on computer programs, we will evaluate the arithmetic statement X = (A + B)(C + D).

Three-address instructions:

ADD T1, A, B    M[T1] <- M[A] + M[B]
ADD T2, C, D    M[T2] <- M[C] + M[D]
MUL X, T1, T2   M[X] <- M[T1] * M[T2]

Or, using registers:

ADD R1, A, B    R1 <- M[A] + M[B]
ADD R2, C, D    R2 <- M[C] + M[D]
MUL X, R1, R2   M[X] <- R1 * R2

Page 24: CMPT 250 Computer Architecture

Operand Addressing contd.

Two-Address Instructions

MOVE T1, A   M[T1] <- M[A]
ADD T1, B    M[T1] <- M[T1] + M[B]
MOVE X, C    M[X] <- M[C]
ADD X, D     M[X] <- M[X] + M[D]
MUL X, T1    M[X] <- M[X] * M[T1]

One-Address Instructions

LD A     ACC <- M[A]
ADD B    ACC <- ACC + M[B]
ST X     M[X] <- ACC
LD C     ACC <- M[C]
ADD D    ACC <- ACC + M[D]
MUL X    ACC <- ACC * M[X]
ST X     M[X] <- ACC
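The one-address program can be traced with a tiny accumulator-machine interpreter. This is a sketch only: the sample values for A, B, C, and D and the function name `run_accumulator` are made up for illustration, and memory is modelled as a dictionary keyed by variable name.

```python
def run_accumulator(program, memory):
    """Execute (opcode, address) pairs on a single-accumulator machine."""
    acc = 0
    for op, addr in program:
        if op == "LD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "MUL":
            acc *= memory[addr]
        elif op == "ST":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


one_address = [("LD", "A"), ("ADD", "B"), ("ST", "X"),
               ("LD", "C"), ("ADD", "D"), ("MUL", "X"), ("ST", "X")]
acc_result = run_accumulator(one_address, {"A": 2, "B": 3, "C": 4, "D": 5})
print(acc_result["X"])
```

X doubles as the temporary for A + B, which is why the program stores to X twice.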

Page 25: CMPT 250 Computer Architecture

Zero-Address Instructions

We use a stack. The top of the stack is referred to as TOS. The word below it is TOS-1.

PUSH A    TOS <- M[A]
PUSH B    TOS <- M[B]
ADD       TOS <- TOS + TOS-1
PUSH C    TOS <- M[C]
PUSH D    TOS <- M[D]
ADD       TOS <- TOS + TOS-1
MUL       TOS <- TOS * TOS-1
POP X     M[X] <- TOS
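The zero-address program can be traced the same way with a small stack machine. In this sketch ADD and MUL pop the top two words and push the result. The sample values and the function name `run_stack` are made up for illustration.

```python
def run_stack(program, memory):
    """Execute a zero-address program; only PUSH and POP name an address."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(memory[instr[1]])
        elif op == "POP":
            memory[instr[1]] = stack.pop()
        elif op in ("ADD", "MUL"):
            top = stack.pop()     # TOS
            below = stack.pop()   # the word below TOS
            stack.append(top + below if op == "ADD" else top * below)
        else:
            raise ValueError(f"unknown opcode {op}")
    return memory


zero_address = [("PUSH", "A"), ("PUSH", "B"), ("ADD",),
                ("PUSH", "C"), ("PUSH", "D"), ("ADD",),
                ("MUL",), ("POP", "X")]
stack_result = run_stack(zero_address, {"A": 2, "B": 3, "C": 4, "D": 5})
print(stack_result["X"])
```

The arithmetic instructions carry no address at all. The price is a longer program, eight instructions against three in the three-address form.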

Page 26: CMPT 250 Computer Architecture

Addressing Modes

The addressing mode of an instruction specifies a rule for interpreting or modifying the address field of the instruction.

The address of the operand produced by such a rule is called the effective address.

Addressing modes are used to:

Give programming flexibility to the user.

Reduce the number of bits in the address fields of the instruction.

Page 27: CMPT 250 Computer Architecture

Addressing Modes contd.

Implied Mode: the operand is specified implicitly in the opcode, e.g. ADD in a stack computer.

Immediate Mode: the operand is given in the instruction itself, e.g. LDI R0, 3.

Register and Register-Indirect Modes

Register Mode: the address field specifies a register.

Register-Indirect Mode: the address field specifies a register whose content gives the address of the operand in memory.

Auto Increment/Decrement Mode, e.g.

ADD (R1)+, 3    M[R1] <- M[R1] + 3, R1 <- R1 + 1

Page 28: CMPT 250 Computer Architecture

Addressing Modes contd.

Direct Addressing Mode: the address field of the instruction gives the address of the operand in memory.

Indirect Addressing Mode: the address field of the instruction gives the address at which the effective address is stored in memory.

Relative Addressing Mode: effective address = address part of the instruction + PC.

Page 29: CMPT 250 Computer Architecture

Addressing Modes contd.

Index Addressing Mode: the content of an index register is added to the address part of the instruction to obtain the effective address.

The index register may be a special CPU register or simply a register in a register file, e.g. for arrays.

The Base-Register Mode: the contents of a base register are added to the address part of the instruction to obtain the effective address.

Page 30: CMPT 250 Computer Architecture

Addressing Modes Examples

Opcode: Load to ACC

PC = 250

R1 = 400

[Figure: registers PC, R1, and ACC, and a memory with labeled addresses 250, 251, 252, 400, 500, 752, 800, and 900.]

Page 31: CMPT 250 Computer Architecture

Addressing Modes Examples contd.

Addressing mode     Mnemonic        Register transfer      Effective address   Contents of ACC
Direct              LDA ADRS        ACC <- M[ADRS]         500                 800
Immediate           LDA #NBR        ACC <- NBR             251                 500
Indirect            LDA [ADRS]      ACC <- M[M[ADRS]]      800                 300
Relative            LDA $ADRS       ACC <- M[ADRS + PC]    752                 600
Index               LDA ADRS(R1)    ACC <- M[ADRS + R1]    900                 200
Register            LDA R1          ACC <- R1              -----               400
Register-Indirect   LDA (R1)        ACC <- M[R1]           400                 700
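The effective addresses and ACC contents in this example can be recomputed from the register-transfer rules. The memory contents below are not legible in the transcript. They are inferred from the table values, as is the assumption that the PC has advanced to 252 (past the two-word instruction at 250 and 251) when the relative address is formed. The function name `load_acc` is made up for illustration.

```python
# Inferred from the table: address -> content.
memory = {251: 500, 400: 700, 500: 800, 752: 600, 800: 300, 900: 200}
PC_AFTER_FETCH = 252   # assumption: PC already points at the next instruction
R1 = 400
ADRS = memory[251]     # address (or NBR) field stored in the second word


def load_acc(mode):
    """Return (effective_address, acc) for a Load-to-ACC in the given mode."""
    if mode == "register":
        return None, R1              # operand is in R1, no memory address
    if mode == "direct":
        ea = ADRS
    elif mode == "immediate":
        ea = 251                     # the operand is the instruction word itself
    elif mode == "indirect":
        ea = memory[ADRS]
    elif mode == "relative":
        ea = ADRS + PC_AFTER_FETCH
    elif mode == "index":
        ea = ADRS + R1
    elif mode == "register-indirect":
        ea = R1
    else:
        raise ValueError(f"unknown mode {mode}")
    return ea, memory[ea]


modes = ["direct", "immediate", "indirect", "relative",
         "index", "register", "register-indirect"]
results = {mode: load_acc(mode) for mode in modes}
for mode, (ea, acc) in results.items():
    print(mode, ea, acc)
```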

Page 32: CMPT 250 Computer Architecture

CISC Architecture

The goal of the CISC architecture is to match more closely the operations used in programming languages and to provide instructions that facilitate compact programs and conserve memory.

A purely CISC architecture has the following properties:

Memory access is directly available to most types of instructions.

Addressing modes are substantial in number.

Instruction formats are of different lengths.

Instructions perform both elementary and complex operations.

Page 33: CMPT 250 Computer Architecture

THANKS!