L4 speeding-up-execution
General Aspects of Computer Organization (Lecture-4)
R S Ananda Murthy
Associate Professor and Head
Department of Electrical & Electronics Engineering,
Sri Jayachamarajendra College of Engineering,
Mysore 570 006
R S Ananda Murthy General Aspects of Computer Organization
Data Path Inside the CPU
[Figure: data path inside the CPU, showing the registers, the ALU input bus, the ALU input registers holding A and B, the ALU, and the ALU output register holding A+B.]
Feeding two operands to the ALU and storing the output of the ALU in an internal register is called a data path cycle.
A faster data path cycle results in faster program execution.
Multiple ALUs operating in parallel let several data path cycles proceed at the same time, which speeds up execution further.
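
The data path cycle described above can be sketched in a few lines of Python. This is only an illustrative model, not something taken from the lecture: the register names R0 to R3 and the function name data_path_cycle are hypothetical.

```python
def data_path_cycle(registers, src_a, src_b, dest, alu_op):
    """Run one data path cycle on a dict that models the register file."""
    # Copy the two operands into the ALU input registers A and B.
    a = registers[src_a]
    b = registers[src_b]
    # The ALU combines the inputs; the result lands in the ALU output register.
    alu_output = alu_op(a, b)
    # Store the ALU output back into an internal register.
    registers[dest] = alu_output
    return registers


# Hypothetical register file with four registers.
regs = {"R0": 5, "R1": 7, "R2": 0, "R3": 0}
data_path_cycle(regs, "R0", "R1", "R2", lambda a, b: a + b)
print(regs["R2"])  # 12
```

Each call corresponds to one pass around the loop in the figure: registers, ALU inputs, ALU, ALU output, and back to the registers.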
RISC Design Speeds Up Program Execution
Most manufacturers today implement the following features in their processors to improve performance:
All instructions are directly executed by hardware instead of being interpreted by a microprogram.
Maximize the rate at which instructions are issued by adopting instruction-level parallelism.
Use simple fixed-length instructions to speed up decoding.
Avoid performing arithmetic and logical operations directly on data present in the memory, i.e., only LOAD and STORE instructions should be executed with reference to memory.
Provide plenty of registers inside the CPU.
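
The LOAD/STORE rule above can be illustrated with a toy register machine. This is a minimal sketch under stated assumptions: the tuple encoding of instructions, the ADD instruction, the register numbering, and the function name run are all hypothetical and chosen only for illustration. Only LOAD and STORE touch memory; ADD works on registers alone.

```python
def run(program, memory, num_registers=8):
    """Execute a list of instruction tuples on a toy load/store machine."""
    regs = [0] * num_registers
    for op, *args in program:
        if op == "LOAD":       # LOAD rd, addr: copy memory into a register
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":    # STORE rs, addr: copy a register into memory
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":      # ADD rd, rs1, rs2: operates on registers only
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory


# Compute c = a + b without ever doing arithmetic directly on memory.
memory = {"a": 3, "b": 4, "c": 0}
program = [
    ("LOAD", 1, "a"),
    ("LOAD", 2, "b"),
    ("ADD", 3, 1, 2),
    ("STORE", 3, "c"),
]
run(program, memory)
print(memory["c"])  # 7
```

Because every instruction has the same simple shape, decoding reduces to a single dispatch on the opcode, which is the point of the fixed-length instruction rule.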
Pipelining for High Performance
The number of stages in a pipeline varies depending upon the hardware design of the CPU.
Each stage in a pipeline is executed by a dedicated hardware unit inside the CPU.
Each stage in a pipeline takes the same amount of time to complete its task.
Hardware units of different stages in a pipeline can work concurrently.
Operation of the hardware units is synchronized by the clock signal.
To implement pipelining, instructions must be of fixed length and have the same instruction cycle time.
Pipelining requires sophisticated techniques to be implemented in the compiler.
A 4-Stage Pipeline
Clock cycle    1    2    3    4    5    6    7
I1             F1   D1   E1   W1
I2                  F2   D2   E2   W2
I3                       F3   D3   E3   W3
I4                            F4   D4   E4   W4

Hardware stages in the pipeline:
F: Fetch instruction
D: Decode and get operands
E: Execute the instruction
W: Write result at destination

T: period of clock signal
n: number of stages in pipeline

[Figure: F: Fetch instruction -> B1 -> D: Decode and get operands -> B2 -> E: Execute operation -> B3 -> W: Write results]
B1, B2, and B3 are storage buffers.
Information is passed from one stage to the next through storage buffers.
The time taken to execute each instruction is nT, where n is the number of pipeline stages and T is the clock period.
Processor bandwidth is 1/(T × 10^6) MIPS (Million Instructions Per Second), with T in seconds.
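
The timing relations above can be checked with a short Python sketch. It rebuilds the 4-stage schedule and evaluates the latency nT and the bandwidth 1/(T × 10^6) MIPS. The function names and the 2 ns clock period are assumptions made for illustration; they do not come from the lecture.

```python
def pipeline_schedule(num_instructions, stages="FDEW"):
    """Return a list of (instruction name, {clock cycle: stage label})."""
    schedule = []
    for i in range(1, num_instructions + 1):
        # Instruction i occupies stage number s (0-based) in clock cycle i + s.
        row = {i + s: f"{stage}{i}" for s, stage in enumerate(stages)}
        schedule.append((f"I{i}", row))
    return schedule


def instruction_latency(n_stages, clock_period_s):
    """Time for one instruction to pass through all stages: n * T."""
    return n_stages * clock_period_s


def bandwidth_mips(clock_period_s):
    """Processor bandwidth in MIPS: 1 / (T * 10^6), with T in seconds."""
    return 1.0 / (clock_period_s * 1e6)


n, k = 4, 4                 # 4 stages, 4 instructions, as in the diagram
total_cycles = n + k - 1    # the last instruction finishes in cycle 7
for name, row in pipeline_schedule(k):
    cells = [row.get(cycle, "--") for cycle in range(1, total_cycles + 1)]
    print(name, " ".join(cells))

T = 2e-9                    # assumed clock period of 2 ns
print(f"latency   = {instruction_latency(n, T) * 1e9:.1f} ns")
print(f"bandwidth = {bandwidth_mips(T):.1f} MIPS")
```

With the assumed 2 ns clock, each instruction still needs 8 ns to pass through all four stages, but once the pipeline is full one instruction completes every clock cycle, which gives 500 MIPS.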
Superscalar Architecture
[Figure: dual-pipeline superscalar processor. A single instruction fetch unit (S1) feeds two parallel pipelines, each with an instruction decode unit (S2), an operand fetch unit (S3), an instruction execution unit (S4), and a write back unit (S5).]
Superscalar architecture has multiple pipelines as shown above.
In the above example, a single fetch unit fetches a pair of instructions together and puts each one into its own pipeline, complete with its own ALU for parallel operation.
The compiler must ensure that the two instructions fetched do not conflict over resource usage.
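
The kind of check a compiler might make before pairing two instructions can be sketched as follows. This is a simplified illustration that looks only at register usage; real conflicts also involve functional units, memory, and other resources. The Instr tuple and the function name can_issue_together are hypothetical.

```python
from collections import namedtuple

# Hypothetical instruction model: one destination register, some source registers.
Instr = namedtuple("Instr", ["dest", "sources"])


def can_issue_together(first, second):
    """Return True if the pair has no register conflict (simplified check)."""
    # Read after write: the second instruction needs the first one's result.
    if first.dest in second.sources:
        return False
    # Write after write: both instructions write the same register.
    if first.dest == second.dest:
        return False
    # Write after read: the second overwrites a register the first still reads.
    if second.dest in first.sources:
        return False
    return True


i1 = Instr("R3", ("R1", "R2"))   # R3 = R1 op R2
i2 = Instr("R6", ("R4", "R5"))   # R6 = R4 op R5, independent of i1
i3 = Instr("R7", ("R3", "R4"))   # R7 = R3 op R4, needs the result of i1

print(can_issue_together(i1, i2))  # True
print(can_issue_together(i1, i3))  # False
```

Pairs that fail the check would have to be issued in separate cycles or reordered so that independent instructions sit next to each other.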
Superscalar Architecture with Five Functional Units
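[Figure: superscalar processor with five functional units. An instruction fetch unit (S1), an instruction decode unit (S2), and an operand fetch unit (S3) feed an execute stage (S4) made up of two ALUs, a LOAD unit, a STORE unit, and a floating point unit, followed by a write back unit (S5).]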
Nowadays the word "superscalar" is used to describe processors that issue multiple instructions, often four to six, in a single clock cycle.
Superscalar processors generally have one pipeline with multiple functional units as shown above.
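
A rough feel for multiple issue can be had from the sketch below, which counts how many clock cycles are needed to issue a sequence of instructions to the five functional units in the figure. It rests on strong simplifying assumptions that are not from the lecture: instructions issue in program order, every unit is free again in the next cycle, and there are no data dependencies. The function name cycles_to_issue is hypothetical.

```python
def cycles_to_issue(op_kinds, units=None):
    """Count clock cycles needed to issue op_kinds in order (toy model)."""
    if units is None:
        # Functional units as in the figure: two ALUs, LOAD, STORE, floating point.
        units = {"ALU": 2, "LOAD": 1, "STORE": 1, "FP": 1}
    for kind in op_kinds:
        if units.get(kind, 0) < 1:
            raise ValueError(f"no functional unit for: {kind}")
    cycles = 0
    position = 0
    while position < len(op_kinds):
        free = dict(units)  # assumption: every unit is free again each cycle
        # In-order issue: stop at the first instruction whose unit is taken.
        while position < len(op_kinds) and free[op_kinds[position]] > 0:
            free[op_kinds[position]] -= 1
            position += 1
        cycles += 1
    return cycles


mixed = ["LOAD", "ALU", "ALU", "FP", "STORE", "ALU"]
print(cycles_to_issue(mixed))                     # 2
print(cycles_to_issue(["LOAD", "LOAD", "LOAD"]))  # 3
```

The mixed sequence issues five instructions in the first cycle because they spread across all five units, while three LOADs in a row take three cycles because they compete for the single LOAD unit.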
License
This work is licensed under a Creative Commons Attribution 4.0 International License.