L4 speeding-up-execution

General Aspects of Computer Organization (Lecture-4)

R S Ananda Murthy

Associate Professor and Head, Department of Electrical & Electronics Engineering,

Sri Jayachamarajendra College of Engineering, Mysore 570 006

Data Path Inside the CPU

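[Figure: data path inside the CPU. The registers supply operands A and B to the ALU input registers; the ALU input bus carries them to the ALU; the result A+B is held in the ALU output register and written back to the registers.]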

Feeding two operands to the ALU and storing the output of the ALU in an internal register is called a data path cycle.
A faster data path cycle results in faster program execution.
Multiple ALUs operating in parallel result in a faster data path cycle.
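The data path cycle described above can be sketched in a few lines of Python. This is only a toy model: the register names (R0, R1, R2), the function name `data_path_cycle`, and the use of a dictionary as the register file are assumptions made for illustration, not part of the lecture.

```python
import operator

def data_path_cycle(registers, src_a, src_b, dest, alu_op=operator.add):
    """Run one data path cycle and return the updated register file."""
    # Step 1: copy the two operands into the ALU input registers.
    alu_in_a = registers[src_a]
    alu_in_b = registers[src_b]
    # Step 2: the ALU combines them; the result is held in the ALU output register.
    alu_out = alu_op(alu_in_a, alu_in_b)
    # Step 3: store the ALU output register into an internal register.
    updated = dict(registers)
    updated[dest] = alu_out
    return updated

regs = {"R0": 7, "R1": 5, "R2": 0}
regs = data_path_cycle(regs, "R0", "R1", "R2")
print(regs["R2"])  # 12
```

Every register-to-register instruction repeats these three steps, which is why shortening the cycle shortens program execution.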


RISC Design Speeds Up Program Execution

Most manufacturers today implement the following features in their processors to improve performance:

All instructions are directly executed by hardware instead of being interpreted by a microprogram.
Maximize the rate at which instructions are issued by adopting instruction-level parallelism.
Use simple fixed-length instructions to speed up decoding.
Avoid performing arithmetic and logical operations directly on data present in memory, i.e., only LOAD and STORE instructions should be executed with reference to memory.
Provide plenty of registers inside the CPU.
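The LOAD/STORE rule in the list above can be illustrated with a toy interpreter. The three opcodes, the tuple encoding of instructions, and the memory addresses used here are hypothetical and chosen only for this sketch; they do not correspond to any real instruction set.

```python
def run(program, memory, num_registers=8):
    """Execute a toy load/store program and return the final memory."""
    regs = [0] * num_registers
    mem = dict(memory)
    for op, *args in program:
        if op == "LOAD":       # LOAD rd, addr: register <- memory
            rd, addr = args
            regs[rd] = mem[addr]
        elif op == "STORE":    # STORE rs, addr: memory <- register
            rs, addr = args
            mem[addr] = regs[rs]
        elif op == "ADD":      # ADD rd, ra, rb: operates on registers only
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return mem

# C = A + B, with A at address 100, B at 104, and C at 108.
program = [
    ("LOAD", 1, 100),
    ("LOAD", 2, 104),
    ("ADD", 3, 1, 2),
    ("STORE", 3, 108),
]
result = run(program, {100: 7, 104: 5, 108: 0})
print(result[108])  # 12
```

Note that ADD never touches memory: the addition takes four simple instructions instead of one complex one, but each of them is easy to decode and execute directly in hardware.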


Pipelining for High Performance

The number of stages in a pipeline varies depending upon the hardware design of the CPU.
Each stage in a pipeline is executed by a dedicated hardware unit inside the CPU.
Each stage in a pipeline takes the same amount of time to complete its task.
Hardware units of different stages in a pipeline can work concurrently.
Operation of the hardware units is synchronized by the clock signal.
To implement pipelining, instructions must be of fixed length and have the same instruction cycle time.
Pipelining requires sophisticated compiling techniques to be implemented in the compiler.
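The concurrency of pipeline stages can be sketched with a small schedule generator. It assumes an ideal pipeline with no stalls or hazards and one clock cycle per stage; the stage letters F, D, E, W are those of the four-stage example in this lecture, and the function name `pipeline_schedule` is made up for the illustration.

```python
STAGES = ["F", "D", "E", "W"]  # fetch, decode, execute, write

def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {cycle: [(stage, instruction), ...]} for an ideal pipeline."""
    schedule = {}
    for i in range(1, num_instructions + 1):
        for s, stage in enumerate(stages):
            cycle = i + s  # instruction i occupies stage number s in cycle i + s
            schedule.setdefault(cycle, []).append((stage, i))
    return schedule

sched = pipeline_schedule(4)
for cycle in sorted(sched):
    busy = " ".join(f"{stage}{i}" for stage, i in sched[cycle])
    print(f"cycle {cycle}: {busy}")
```

With four instructions, cycle 4 is the first in which all four hardware units are busy at once (W1, E2, D3, F4), and the last instruction finishes in cycle 7.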


A 4-Stage Pipeline

Instruction   Clock cycle (time)
              1    2    3    4    5    6    7
I1            F1   D1   E1   W1
I2                 F2   D2   E2   W2
I3                      F3   D3   E3   W3
I4                           F4   D4   E4   W4

Hardware Stages in Pipeline
F: Fetch instruction
D: Decode and get operands
E: Execute the instruction
W: Write result at destination

T: period of the clock signal
n: number of stages in the pipeline

[Figure: the four pipeline stages in sequence (F: Fetch instruction, D: Decode and get operands, E: Execute operation, W: Write results), separated by storage buffers B1, B2, and B3.]

B1, B2, and B3 are storage buffers.
Information is passed from one stage to the next through storage buffers.
Time taken to execute each instruction is nT.
Processor bandwidth is 1/(T × 10^6) MIPS (million instructions per second), with T in seconds.
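A short worked example of the two formulas above, assuming a 4-stage pipeline and a clock period of 2 ns. These numbers and the function names are chosen only for illustration.

```python
def instruction_latency(n_stages, clock_period_s):
    """Time for one instruction to pass through all stages: n * T seconds."""
    return n_stages * clock_period_s

def bandwidth_mips(clock_period_s):
    """One instruction completes per clock period: 1 / (T * 10^6) MIPS."""
    return 1.0 / (clock_period_s * 1e6)

T = 2e-9  # assumed clock period: 2 ns
n = 4     # assumed number of pipeline stages

print(f"latency:   {instruction_latency(n, T) * 1e9:.1f} ns")  # latency:   8.0 ns
print(f"bandwidth: {bandwidth_mips(T):.1f} MIPS")              # bandwidth: 500.0 MIPS
```

The latency grows with the number of stages, but the bandwidth depends only on the clock period, because in the ideal case one instruction completes in every clock cycle once the pipeline is full.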


Superscalar Architecture

[Figure: dual-pipeline superscalar processor. A single instruction fetch unit (S1) feeds two parallel pipelines, each with an instruction decode unit (S2), an operand fetch unit (S3), an instruction execution unit (S4), and a write back unit (S5).]

Superscalar architecture has multiple pipelines as shown above.
In the above example, a single fetch unit fetches a pair of instructions together and puts each one into its own pipeline, complete with its own ALU for parallel operation.
Compiler must ensure that the two instructions fetched do not conflict over resource usage.
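The pairing check that the compiler has to perform can be sketched as follows. This is a simplified illustration, not an actual compiler algorithm: it assumes that registers are the only shared resource and treats any overlap between one instruction's destination and the other's registers as a conflict. The `Instr` tuple and the function `can_pair` are hypothetical names.

```python
from collections import namedtuple

# name: mnemonic, dest: destination register, srcs: tuple of source registers
Instr = namedtuple("Instr", "name dest srcs")

def can_pair(first, second):
    """Return True if the two instructions may enter the two pipelines together."""
    # second reads a register that first is about to write
    read_after_write = first.dest in second.srcs
    # both write the same register
    write_after_write = first.dest is not None and first.dest == second.dest
    # second overwrites a register that first still has to read
    write_after_read = second.dest in first.srcs
    return not (read_after_write or write_after_write or write_after_read)

add = Instr("ADD", "R1", ("R2", "R3"))
sub = Instr("SUB", "R4", ("R5", "R6"))
mul = Instr("MUL", "R7", ("R1", "R2"))

print(can_pair(add, sub))  # True: no shared registers
print(can_pair(add, mul))  # False: MUL needs R1, which ADD has not yet written
```

When a pair fails the check, the instructions have to be issued in separate cycles or reordered so that independent instructions sit next to each other.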


Superscalar Architecture with Five Functional Units

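[Figure: superscalar processor with five functional units. A single pipeline with an instruction fetch unit (S1), an instruction decode unit (S2), an operand fetch unit (S3), an execution stage (S4) containing five functional units (two ALUs, LOAD, STORE, and floating point), and a write back unit (S5).]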

Nowadays the word "superscalar" is used to describe processors that issue multiple instructions, often four to six, in a single clock cycle.
Superscalar processors generally have one pipeline with multiple functional units as shown above.
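How several instructions can start in the same clock cycle on a pipeline with multiple functional units can be sketched as below. The unit counts follow the five-unit figure (two ALUs, LOAD, STORE, floating point); everything else is an assumption made for this sketch: instructions are issued in program order, each unit accepts one instruction per cycle, and data dependences are ignored.

```python
# Functional units in stage S4, as in the five-unit figure.
UNITS = {"ALU": 2, "LOAD": 1, "STORE": 1, "FP": 1}

def issue_cycle(pending, units=UNITS):
    """Split pending instruction kinds into (issued this cycle, still waiting)."""
    free = dict(units)
    issued, waiting = [], []
    for kind in pending:
        # In-order issue: once one instruction has to wait, all later ones wait too.
        if not waiting and free.get(kind, 0) > 0:
            free[kind] -= 1
            issued.append(kind)
        else:
            waiting.append(kind)
    return issued, waiting

issued, waiting = issue_cycle(["ALU", "LOAD", "ALU", "ALU", "FP"])
print(issued)   # ['ALU', 'LOAD', 'ALU']
print(waiting)  # ['ALU', 'FP']
```

Here three instructions start in one cycle; the third ALU instruction finds both ALUs busy, so it and the floating-point instruction behind it wait for the next cycle.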


License

This work is licensed under a Creative Commons Attribution 4.0 International License.

