8/16/2019 Solution Assignment No 2
http://slidepdf.com/reader/full/solution-assignment-no-2 1/8
Question 1:
Give a high-level view of the pipelined processor datapath and explain its working; compare the performance of the pipelined datapath and the multi-cycle datapath.
Solution:
Instruction pipelining is a technique that implements a form
of parallelism called instruction-level parallelism within a single processor. It therefore
allows faster CPU throughput (the number of instructions that can be executed in a unit
of time) than would otherwise be possible at a given clock rate. The basic instruction
cycle is broken up into a series called a pipeline. Rather than processing each
instruction sequentially (finishing one instruction before starting the next), each
instruction is split up into a sequence of steps so different steps can be executed
in parallel and instructions can be processed concurrently (starting one instruction before
finishing the previous one).
Pipelining increases instruction throughput by performing multiple operations at the
same time, but does not reduce instruction latency, which is the time to complete a
single instruction from start to finish, as it still must go through all steps. Indeed, it may
increase latency due to additional overhead from breaking the computation into
separate steps and, worse, the pipeline may stall (or even need to be flushed), further
increasing the latency. Thus, pipelining increases throughput at the cost of latency, and
is frequently used in CPUs but avoided in real-time systems, in which latency is a hard
constraint.
Each instruction is split into a sequence of dependent steps. The first step is always to
fetch the instruction from memory; the final step is usually writing the results of the
instruction to processor registers or to memory. Pipelining seeks to let the processor
work on as many instructions as there are dependent steps, just as an assembly
line builds many vehicles at once, rather than waiting until one vehicle has passed
through the line before admitting the next one. Just as the goal of the assembly line is to
keep each assembler productive at all times, pipelining seeks to keep every portion of
the processor busy with some instruction. Pipelining lets the computer's cycle time be
the time of the slowest step, and ideally lets one instruction complete in every cycle.
The term pipeline is an analogy to the fact that there is fluid in each link of a pipeline, as
each part of the processor is occupied with work.
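To make the comparison with the multi-cycle datapath concrete, the following minimal sketch counts clock cycles for both designs. It assumes a five-stage pipeline, that every instruction needs five steps on the multi-cycle datapath (real multi-cycle designs often need fewer steps for some instructions), and that both designs run at the same clock period. The function names are illustrative only.

```python
def multicycle_cycles(n_instructions, steps_per_instruction=5):
    """Cycles when each instruction finishes all of its steps before the next one starts."""
    return n_instructions * steps_per_instruction


def pipelined_cycles(n_instructions, stages=5, stall_cycles=0):
    """Cycles for a pipeline: fill time, then one completion per cycle, plus any stalls."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1) + stall_cycles


def speedup(n_instructions, stages=5, stall_cycles=0):
    """Ratio of multi-cycle to pipelined cycle counts (assumes equal clock periods)."""
    return multicycle_cycles(n_instructions, stages) / pipelined_cycles(
        n_instructions, stages, stall_cycles
    )


if __name__ == "__main__":
    for n in (1, 7, 1000):
        print(n, multicycle_cycles(n), pipelined_cycles(n), round(speedup(n), 2))
```

Under these assumptions, 7 instructions take 35 cycles on the multi-cycle datapath and 11 on the pipelined one, and the ratio approaches the number of stages as the instruction count grows. The latency of a single instruction stays at five cycles in both cases.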
Question 2:
The following lines of code are written in a high-level language:
a = c + d;
b = c + e;
The corresponding instructions for MIPS are:
LW R1, 0(R0)
LW R2, 4(R0)
ADD R3, R1, R2
SW R3, 12(R0)
LW R4, 8(R0)
ADD R5, R1, R4
SW R5, 16(R0)
These instructions are to be executed on a pipelined processor with forwarding.
a. Identify hazards by showing the execution of these instructions on a per-cycle basis.
b. Reorder these instructions to avoid any pipeline stalls.
c. How many cycles are saved after executing the reordered instructions?
Solution:
a. Identify hazards by showing the execution of these instructions on a per-cycle basis.
SR.NO.  CODE              ASSEMBLY LINE CODE
1       LW  R1, 0(R0)     LOAD R1 <- Mem[0 + Reg[R0]]
2       LW  R2, 4(R0)     LOAD R2 <- Mem[4 + Reg[R0]]
3       ADD R3, R1, R2    Reg[R3] <- Reg[R1] + Reg[R2]
4       SW  R3, 12(R0)    Mem[12 + Reg[R0]] <- Reg[R3]
5       LW  R4, 8(R0)     LOAD R4 <- Mem[8 + Reg[R0]]
6       ADD R5, R1, R4    Reg[R5] <- Reg[R1] + Reg[R4]
7       SW  R5, 16(R0)    Mem[16 + Reg[R0]] <- Reg[R5]
Sample
Instruction       1     2     3     4     5     6     7     8     9     10    11
LW  R1, 0(R0)     IF    ID    EXE   MEM   WB
LW  R2, 4(R0)           IF    ID    EXE   MEM   WB
ADD R3, R1, R2                IF    ID    EXE   MEM   WB
SW  R3, 12(R0)                      IF    ID    EXE   MEM   WB
LW  R4, 8(R0)                             IF    ID    EXE   MEM   WB
ADD R5, R1, R4                                  IF    ID    EXE   MEM   WB
SW  R5, 16(R0)                                        IF    ID    EXE   MEM   WB
Instruction       1     2     3     4     5     6     7     8     9     10    11    12    13
LW  R1, 0(R0)     IF    ID    EXE   MEM   WB
LW  R2, 4(R0)           IF    ID    EXE   MEM   WB
ADD R3, R1, R2                IF    stall ID    EXE   MEM   WB
SW  R3, 12(R0)                            IF    ID    EXE   MEM   WB
LW  R4, 8(R0)                                   IF    ID    EXE   MEM   WB
ADD R5, R1, R4                                        IF    stall ID    EXE   MEM   WB
SW  R5, 16(R0)                                                    IF    ID    EXE   MEM   WB
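The two stalls in the diagram are load-use hazards: an instruction reads a register in the cycle right after the LW that loads it, which forwarding alone cannot cover. As a rough illustration, the sketch below scans the instruction list for such pairs. The tuple encoding `(opcode, destination, sources)` and the function name are assumptions made for this example, and the check treats any register read right after a load as a stall.

```python
# Each instruction: (opcode, destination register or None, source registers).
PROGRAM = [
    ("LW",  "R1", ("R0",)),        # LW  R1, 0(R0)
    ("LW",  "R2", ("R0",)),        # LW  R2, 4(R0)
    ("ADD", "R3", ("R1", "R2")),   # ADD R3, R1, R2
    ("SW",  None, ("R3", "R0")),   # SW  R3, 12(R0)
    ("LW",  "R4", ("R0",)),        # LW  R4, 8(R0)
    ("ADD", "R5", ("R1", "R4")),   # ADD R5, R1, R4
    ("SW",  None, ("R5", "R0")),   # SW  R5, 16(R0)
]


def load_use_hazards(program):
    """Return (load_index, user_index, register) for every load-use pair."""
    hazards = []
    for i in range(1, len(program)):
        prev_op, prev_dest, _ = program[i - 1]
        _, _, sources = program[i]
        # With forwarding, only a value loaded by the immediately preceding LW
        # arrives too late for the next instruction's EXE stage.
        if prev_op == "LW" and prev_dest in sources:
            hazards.append((i - 1, i, prev_dest))
    return hazards


if __name__ == "__main__":
    for load_idx, user_idx, reg in load_use_hazards(PROGRAM):
        print(f"instruction {user_idx + 1} uses {reg} loaded by instruction {load_idx + 1}")
```

For the sequence above this reports the pairs (LW R2, ADD R3) and (LW R4, ADD R5), matching the two stall cycles in the diagram.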
b. Reorder these instructions to avoid any pipeline stalls.
SR.NO.  CODE              ASSEMBLY LINE CODE
1       LW  R1, 0(R0)     LOAD R1 <- Mem[0 + Reg[R0]]
2       LW  R2, 4(R0)     LOAD R2 <- Mem[4 + Reg[R0]]
3       LW  R4, 8(R0)     LOAD R4 <- Mem[8 + Reg[R0]]
4       ADD R3, R1, R2    Reg[R3] <- Reg[R1] + Reg[R2]
5       SW  R3, 12(R0)    Mem[12 + Reg[R0]] <- Reg[R3]
6       ADD R5, R1, R4    Reg[R5] <- Reg[R1] + Reg[R4]
7       SW  R5, 16(R0)    Mem[16 + Reg[R0]] <- Reg[R5]
Instruction       1     2     3     4     5     6     7     8     9     10    11
LW  R1, 0(R0)     IF    ID    EXE   MEM   WB
LW  R2, 4(R0)           IF    ID    EXE   MEM   WB
LW  R4, 8(R0)                 IF    ID    EXE   MEM   WB
ADD R3, R1, R2                      IF    ID    EXE   MEM   WB
SW  R3, 12(R0)                            IF    ID    EXE   MEM   WB
ADD R5, R1, R4                                  IF    ID    EXE   MEM   WB
SW  R5, 16(R0)                                        IF    ID    EXE   MEM   WB
c. How many cycles are saved after executing the reordered instructions?
The code before reordering takes 13 clock cycles:
Instruction       1     2     3     4     5     6     7     8     9     10    11    12    13
LW  R1, 0(R0)     IF    ID    EXE   MEM   WB
LW  R2, 4(R0)           IF    ID    EXE   MEM   WB
ADD R3, R1, R2                IF    stall ID    EXE   MEM   WB
SW  R3, 12(R0)                            IF    ID    EXE   MEM   WB
LW  R4, 8(R0)                                   IF    ID    EXE   MEM   WB
ADD R5, R1, R4                                        IF    stall ID    EXE   MEM   WB
SW  R5, 16(R0)                                                    IF    ID    EXE   MEM   WB
The code after reordering takes 11 clock cycles:
Instruction       1     2     3     4     5     6     7     8     9     10    11
LW  R1, 0(R0)     IF    ID    EXE   MEM   WB
LW  R2, 4(R0)           IF    ID    EXE   MEM   WB
LW  R4, 8(R0)                 IF    ID    EXE   MEM   WB
ADD R3, R1, R2                      IF    ID    EXE   MEM   WB
SW  R3, 12(R0)                            IF    ID    EXE   MEM   WB
ADD R5, R1, R4                                  IF    ID    EXE   MEM   WB
SW  R5, 16(R0)                                        IF    ID    EXE   MEM   WB
Due to reordering, two cycles are saved.
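The cycle counts can be cross-checked with a small timing model. This is only a sketch under stated assumptions: a five-stage pipeline with forwarding, one stall cycle whenever an instruction reads a register loaded by the LW immediately before it, and no other hazards. The tuple encoding and function name are hypothetical.

```python
# Each instruction: (opcode, destination register or None, source registers).
ORIGINAL = [
    ("LW",  "R1", ("R0",)),
    ("LW",  "R2", ("R0",)),
    ("ADD", "R3", ("R1", "R2")),
    ("SW",  None, ("R3", "R0")),
    ("LW",  "R4", ("R0",)),
    ("ADD", "R5", ("R1", "R4")),
    ("SW",  None, ("R5", "R0")),
]

REORDERED = [
    ("LW",  "R1", ("R0",)),
    ("LW",  "R2", ("R0",)),
    ("LW",  "R4", ("R0",)),
    ("ADD", "R3", ("R1", "R2")),
    ("SW",  None, ("R3", "R0")),
    ("ADD", "R5", ("R1", "R4")),
    ("SW",  None, ("R5", "R0")),
]


def total_cycles(program, stages=5):
    """Cycle in which the last instruction completes its final stage."""
    last_busy = 0   # last cycle occupied at the front of the pipe (fetch or stall)
    finish = 0
    for i, (_, _, sources) in enumerate(program):
        prev = program[i - 1] if i > 0 else None
        # Load-use hazard: one bubble even with forwarding.
        stall = 1 if prev and prev[0] == "LW" and prev[1] in sources else 0
        fetch = last_busy + 1
        finish = fetch + stall + stages - 1
        last_busy = fetch + stall
    return finish


if __name__ == "__main__":
    before = total_cycles(ORIGINAL)
    after = total_cycles(REORDERED)
    print(before, after, before - after)
```

With these assumptions the model gives 13 cycles for the original order and 11 for the reordered one, a saving of two cycles.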
Question 3:
Read the research paper titled "An optimizing pipeline stall reduction algorithm for
power and performance on multi-core CPUs" and answer the following questions:
a. How does the proposed Left-Right (LR) algorithm work?
b. Why does the LR algorithm give better results than the traditional in-order
and Tomasulo's algorithms?
a. How does the proposed Left-Right (LR) algorithm work?
Proposed algorithm (LR (Left-Right)): We have proposed an algorithm which performs
the stall reduction in a Left-Right (LR) manner, in sequential instruction execution, as
shown in Figure 1. Our algorithm introduces a hybrid order of instruction execution in
order to reduce the power dissipation. More precisely, it executes the instructions
serially as in in-order execution until a stall condition is encountered, and thereafter it
uses the concept of out-of-order execution to replace the stall with an independent
instruction. Thus, LR increases the throughput by executing independent instructions
while the lengthy instructions are still executing in other functional units or the registers
are involved in an ongoing process. LR also prevents the hazards that might occur
during the instruction execution. The instructions are scheduled statically at compile
time as shown in Figure 2. In our proposed approach, if a buffer is present that can hold a
certain number of sequential instructions, our algorithm will generate a sequence
in which the instructions should be executed to reduce the number of stalls while
maximizing the throughput of the processor. It is assumed that all the instructions are in
the op-code source destination format.
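The idea described above (run in order until a stall would occur, then fill the slot with an independent instruction from a bounded buffer) can be illustrated with a small static scheduler. This is not the paper's algorithm, only a minimal sketch of the principle: the function names, the `window` parameter standing in for the instruction buffer, and the tuple encoding are all assumptions. It checks register dependences only and assumes that loads and stores touch distinct memory addresses, as they do in the Question 2 sequence used here as input.

```python
# Each instruction: (opcode, destination register or None, source registers).
PROGRAM = [
    ("LW",  "R1", ("R0",)),
    ("LW",  "R2", ("R0",)),
    ("ADD", "R3", ("R1", "R2")),
    ("SW",  None, ("R3", "R0")),
    ("LW",  "R4", ("R0",)),
    ("ADD", "R5", ("R1", "R4")),
    ("SW",  None, ("R5", "R0")),
]


def reads(instr):
    return set(instr[2])


def writes(instr):
    return {instr[1]} if instr[1] else set()


def would_stall(prev, instr):
    """Load-use hazard: instr reads the register loaded by the previous LW."""
    return prev is not None and prev[0] == "LW" and prev[1] in reads(instr)


def independent(candidate, skipped):
    """True if candidate may move ahead of every skipped instruction.

    Register dependences only (RAW, WAR, WAW); memory accesses are ASSUMED
    not to alias, which a real scheduler would have to verify.
    """
    for other in skipped:
        if writes(other) & reads(candidate):    # RAW
            return False
        if writes(candidate) & reads(other):    # WAR
            return False
        if writes(candidate) & writes(other):   # WAW
            return False
    return True


def lr_style_schedule(program, window=4):
    """In-order issue; on a would-be stall, pull an independent instruction forward."""
    pending = list(program)
    scheduled = []
    while pending:
        prev = scheduled[-1] if scheduled else None
        pick = 0
        if would_stall(prev, pending[0]):
            # Search the buffer (window) for the first safe replacement.
            for j in range(1, min(window, len(pending))):
                cand = pending[j]
                if not would_stall(prev, cand) and independent(cand, pending[:j]):
                    pick = j
                    break
        scheduled.append(pending.pop(pick))
    return scheduled


if __name__ == "__main__":
    for op, dest, srcs in lr_style_schedule(PROGRAM):
        print(op, dest, srcs)
```

On the Question 2 sequence, this sketch moves LW R4 ahead of ADD R3, which yields the same stall-free order as the manual reordering in part b.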
b. Why does the LR algorithm give better results than the traditional in-order and Tomasulo's algorithms?
Solution:
Comparison of LR vs. Tomasulo algorithm
In this section, the performance and power gain of the LR and the Tomasulo algorithms are
compared.
Simulation and power-performance evaluation
As our baseline configuration, we use an Intel core i5 dual core processor with 2.40 GHz clock
frequency, and 64-bit operating system. We also use the Sim-Panalyzer simulator [25]. The LR,
in-order, and Tomasulo algorithms are developed as C programs. These C programs were compiled
using arm-linux-gcc in order to obtain the object files for each
of them, on an ARM microprocessor model.
At the early stage of the processor design, various levels of simulators can be used to estimate the
power and performance, such as transistor level, system level, instruction level, and
micro-architecture level simulators. In transistor level simulators, one can estimate the voltage and
current behaviour over time. This type of simulator is used for integrated circuit design, and is not
suitable for large programs. On the other hand, micro-architecture level simulators provide the power
estimation across cycles and these are used in modern processors. Our work is similar to this kind of
simulator because our objective is to evaluate the power-performance behaviour of a
micro-architecture level design abstraction. Although a literature survey suggests several power
estimation tools such as CACTI and WATTCH [26], we have chosen the Sim-Panalyzer [25] since it
provides accurate power modelling by taking into account both the leakage and dynamic power dissipation.
The actual instruction execution of our proposed algorithm against existing ones is shown in
Algorithms 1 and 2. In the LR algorithm, an instruction is executed serially in-order until a stall
occurs, and thereafter the out-of-order execution technique comes into play to replace the stall with an
independent instruction stage. Therefore, in most cases, our proposed algorithm takes fewer cycles of
operation and less cycle time
compared to existing algorithms, as shown in Algorithm [2]. The comparison of our proposed
algorithm against the Tomasulo algorithm and the in-order algorithm is shown in Table 1. The next
section focusses on the power-performance efficiency of our proposed algorithm.