Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment
![Page 1: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/1.jpg)
Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment
ISPD 2005, San Francisco, CA
May 5th, 2005
Mario R. Casu - Politecnico di Torino
and Luca Macchiarulo - University of Hawaii at Manoa
![Page 2: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/2.jpg)
Outline
Communication concerns at the physical layer
Great Expectations of “Wire Pipelining”
– No block delay
– Block delay limitation
Computation locality
Adaptive Communications
Floorplanning strategy for adaptive systems
Experimental results
![Page 3: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/3.jpg)
Wire pipelining - concept
Wire delay: substantial share of overall delay
Global wires are difficult to deal with
Global wire scaling does not follow that of
– Transistors
– Local wiring
Del
![Page 4: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/4.jpg)
Wire pipelining - concept
Introducing a latch/FF reduces the timing constraints
Similar to classical pipelining
Del’
Del’’
![Page 5: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/5.jpg)
Critical Length
Maximal length for which the wire can be driven at a given frequency
– Optimum number of buffers
– Optimum buffer dimensions
– Optimum wire sizing
Del=1/f
![Page 6: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/6.jpg)
Wire Pipelining
Above the critical length, clocked elements are needed (pipeline stages)
Del>1/f
![Page 7: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/7.jpg)
“Wire Pipelining” techniques
Problem: maintaining functionality with a minimum loss in performance.
Solutions:
– Globally Asynchronous Locally Synchronous (GALS)
– Retiming
– Regular Distributed Register (J. Cong)
– c-slowing (S. Sapatnekar)
– Latency Insensitive Protocols (L. Carloni)
![Page 8: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/8.jpg)
LIPs: Concept
[Figure labels: Shell, Pearl, Relay Station]
![Page 9: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/9.jpg)
Shell – Relay Station Interaction
valid stop
![Page 10: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/10.jpg)
Feedback Topology
[Figure: feedback loops, animation frame; channel labels: τ, τ, τ, τ, 0, 0; sequence so far: 0]
![Page 11: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/11.jpg)
Feedback Topology
[Figure: feedback loops, animation frame; channel labels: 0, τ, 0, 0, τ, τ; sequence so far: 0 τ]
![Page 12: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/12.jpg)
Feedback Topology
[Figure: feedback loops, animation frame; channel labels: τ, 0, τ, 0, 1, τ; sequence so far: 0 τ 1]
![Page 13: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/13.jpg)
Feedback Topology
[Figure: feedback loops, animation frame; channel labels: 1, τ, τ, 1, τ, 1; sequence so far: 0 τ 1 τ]
![Page 14: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/14.jpg)
Feedback Topology
[Figure: feedback loops, animation frame; channel labels: τ, 1, 1, 1, τ, τ; sequence so far: 0 τ 1 τ τ]
![Page 15: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/15.jpg)
Feedback Topology
[Figure: feedback loops, animation frame; channel labels: τ, τ, τ, τ, 2, 2; sequence so far: 0 τ 1 τ τ 2]
![Page 16: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/16.jpg)
Feedback Topology: Performance
Void data circulate in the loops: initially as many as relay stations ($r$)
“Period” of void-stop equal to the number of shells ($s$) and relay stations ($r$) in the loop
The worst loop fixes the throughput: $T = s/(s+r)$
$T_a = 2/4$, $T_b = 2/5$
$T = 2/5$
[Figure: loops a and b; channel labels: τ, τ, τ, τ, 2, 2; sequence so far: 0 τ 1 τ τ 2]
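The loop-throughput rule on this slide can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the (shells, relay stations) pairs for loops a and b are inferred from the fractions 2/4 and 2/5 rather than read off the figure.

```python
from fractions import Fraction


def loop_throughput(shells: int, relay_stations: int) -> Fraction:
    """Throughput of a single loop: T = s / (s + r)."""
    return Fraction(shells, shells + relay_stations)


def system_throughput(loops) -> Fraction:
    """The worst (lowest-throughput) loop fixes the system throughput."""
    return min(loop_throughput(s, r) for s, r in loops)


# Assumed (shells, relay stations) per loop, chosen to match T_a = 2/4 and T_b = 2/5.
loops = {"a": (2, 2), "b": (2, 3)}
T = system_throughput(loops.values())
print(T)  # 2/5
```

Exact fractions are used so that the result can be compared directly with the values quoted on the slide.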
![Page 17: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/17.jpg)
Classical Floorplanning
Problem: find a placement of (soft or hard) blocks that optimally fits a floorplan
Optimality is measured by whitespace, overall wirelength, critical path, or a combination
![Page 18: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/18.jpg)
Floorplanning for Throughput [ISPD2004]
The optimal floorplan in our case is that which guarantees the maximum throughput compatible with the given blocks’ dimensions
Maximum throughput is equivalent to the worst cost-to-time ratio loop
![Page 19: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/19.jpg)
New Heuristic Throughput Computation
Heuristic:
– Statically compute, for every edge $e$, the shortest loop $l(e)$ in which it appears
– For every optimization iteration:
  $\mathrm{Cost}(e) = \frac{1}{l(e)} \left\lfloor \frac{\mathrm{length}(e)}{C_{\mathrm{length}}} \right\rfloor$
  $\mathrm{TotCost} = \sum_e \mathrm{Cost}(e)$
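A minimal sketch of how this heuristic could be evaluated, under stated assumptions: the netlist is a directed graph given as an adjacency dictionary, loop length l(e) is counted in edges, edges that lie on no loop are skipped, and all names and numbers below are hypothetical. It is not the floorplanner's actual implementation.

```python
import math
from collections import deque


def shortest_loop_through_edge(adj, u, v):
    """Edges in the shortest directed loop containing edge (u, v), or None.

    Computed as 1 + BFS distance from v back to u.
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:
        node = queue.popleft()
        if node == u:
            return dist[node] + 1
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def total_cost(adj, wire_length, critical_length):
    """Sum over edges of floor(length / critical_length) / l(e)."""
    total = 0.0
    for u, successors in adj.items():
        for v in successors:
            loop_len = shortest_loop_through_edge(adj, u, v)
            if loop_len is None:
                continue  # assumption: an edge on no loop adds no cost
            stages = math.floor(wire_length[(u, v)] / critical_length)
            total += stages / loop_len
    return total


# Hypothetical three-block netlist with wire lengths in arbitrary units.
adj = {"A": ["B"], "B": ["C", "A"], "C": ["A"]}
wire_length = {("A", "B"): 25.0, ("B", "C"): 12.0, ("B", "A"): 9.0, ("C", "A"): 31.0}
cost = total_cost(adj, wire_length, critical_length=10.0)
```

In a real flow l(e) would be computed once, since the netlist topology does not change, and only the wire lengths would be refreshed at each optimization iteration.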
![Page 20: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/20.jpg)
Throughput-frequency trade-off
$f = 1/L$
$T = 1$
$DR_0 = 1 \cdot 1/L = 1/L$
![Page 21: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/21.jpg)
Throughput-frequency trade-off
$f = 2/L$
$T = 2/(2+2) = 1/2$
$DR = 1/2 \cdot 2/L = 1/L$
No advantage!
![Page 22: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/22.jpg)
Throughput-frequency trade-off
[Figure: loop with wire lengths $L$, $L$, $L/2$]
$f = 1/L$
$T = 1$
$DR_0 = 1/L \cdot 1 = 1/L$
![Page 23: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/23.jpg)
Throughput-frequency trade-off
[Figure: loop with five wire segments, each of length $L/2$]
$f = 2/L$
$T = 3/(3+2)$
$DR = 2/L \cdot 3/5 = 6/(5L)$
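The arithmetic of these examples can be reproduced with exact fractions. The sketch below normalizes L to 1 and uses a hypothetical helper `data_rate`; the shell and relay-station counts are those implied by the throughput fractions on the slides.

```python
from fractions import Fraction


def data_rate(shells: int, relay_stations: int, frequency: Fraction) -> Fraction:
    """Data rate = throughput * frequency, with throughput s / (s + r)."""
    throughput = Fraction(shells, shells + relay_stations)
    return throughput * frequency


L = 1  # longest wire, normalized
f0 = Fraction(1, L)  # non-pipelined clock frequency

dr0 = data_rate(2, 0, f0)              # no relay stations: T = 1, DR0 = 1/L
dr_uniform = data_rate(2, 2, 2 * f0)   # T = 2/(2+2), f = 2/L
dr_mixed = data_rate(3, 2, 2 * f0)     # T = 3/(3+2), f = 2/L

print(dr_uniform / dr0)  # 1
print(dr_mixed / dr0)    # 6/5
```

The first pipelined case gains nothing over the baseline, while the second gains 20%, which matches the two outcomes shown on the slides.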
![Page 24: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/24.jpg)
Data Rate as the basic performance metric – Speed-up
Wire pipelining allows increased frequency
But it decreases the throughput according to the previous considerations
Real performance is given by DATA RATE = Thr · f
Advantage w.r.t. non-pipelined systems to be assessed through DR measures
Speed-Up $SU = DR/DR_0$
$L/(l_m + l_{\max}) < SU < L/l_m$
Floorplanning can be extremely beneficial if it can reduce the average branch length $l_m$
![Page 25: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/25.jpg)
Block delay effect
Blocks put a cap on the max frequency
– $f_{\max} < 1/\max_i(d_i)$
We can measure delay in “length”, by using a proportionality factor
Block delay can enter in the picture if signals are latched at the input or output side only
[Figure labels: $L$, $l_d$]
![Page 26: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/26.jpg)
Block delay models
We used two different models
– Delay proportional to block edge
  Rationale: complexity of logic is related to block size
  Minimum constant of proportionality = 1: delay is the same as that needed for the fastest signal to traverse the entire block
  Optimistic assumption
– Delay constant, related to technology and equal to 13 FO4
  Derived from assumptions in the roadmap
  More realistic for high performance design
  More pessimistic (see below)
Probably the reality is somewhere between the two cases
![Page 27: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/27.jpg)
Speed-up with block delay
Taking the block delay into account modifies the previous considerations
$\frac{\max_i(L_i + d_i)}{l_m + d_m + d_{\max}} < SU < \frac{\max_i(L_i + d_i)}{l_m + d_m}$
In general, much worse than the previous case
![Page 28: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/28.jpg)
Throughput driven floorplan experiments
We used the floorplanner described in ISPD’04 to evaluate the optimal frequency (maximum DR)
On GSRC and MCNC benchmarks with input-output information
No block delay:
– SU varies from 0.8% to 36%
– Better on benchmarks with greater complexity
Block delay
– Proportional to blocks’ edges: -7% to 44%
– Equal to 13 FO4: -11% to 12%
– MCNC suite shows the worse behavior
High speed systems with highly optimized blocks lead to negligible or irrelevant SU, despite a large increase in clock frequency.
![Page 29: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/29.jpg)
Space for better performance?
Not all point-to-point connections are actually used at every clock cycle.
Ex. CPU to Cache communication.
Read cycle
Addr
Data-in
Data-out
![Page 30: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/30.jpg)
Space for better performance?
Not all point-to-point connections are actually used at every clock cycle.
Ex. CPU to Cache communication.
Write cycle
Addr
Data-in
Data-out
![Page 31: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/31.jpg)
Space for better performance?
Unused communication channels effectively break throughput-limiting loops
Pipelining without limitation can become possible
Stream Write cycle
Addr 1
Data-out 1τ
![Page 32: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/32.jpg)
Space for better performance?
Unused communication channels effectively break throughput-limiting loops
Pipelining without limitation can become possible
Stream Write cycle
Addr 2
Data-out 2
Addr 1
Data-out 1
![Page 33: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/33.jpg)
Space for better performance?
Unused communication channels effectively break throughput-limiting loops
Pipelining without limitation can become possible
Stream Write cycle
Addr 3
Data-out 3
Addr 2
Data-out 2
![Page 34: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/34.jpg)
Adaptive Latency Insensitive Protocol
Need a mechanism to allow blocks to discard useless “packets”: Adaptive communication
Details are out of the scope of the paper, but
– It is possible through a simple modification of the original protocol
– Requires the introduction of “oracles” predicting unused inputs for each block
– We designed a functional implementation in synthesizable VHDL
– We proved the correctness of the implementation (absence of deadlocks and correct signal sequencing)
![Page 35: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/35.jpg)
ALIP performance evaluation
The adaptiveness of the approach prevents a static prediction of performance
However, a few conclusions can be reached:
– The performance is bounded above by static LIP
– Performance in long sequences of input independence is equivalent to the simplified network with the channel removed
If the system experiences infrequent “context switching” on its channels, such that at any given time the performance is static $Th_i$, the average performance can be approximated as:
– $Th = \sum_i \alpha_i \, Th_i$
  $\alpha_i$: fraction of time with performance $Th_i$
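The time-weighted average can be written out as a short sketch. The function name and the use of `alpha` for the time fractions are naming assumptions; the numbers are the ones that appear in the worked example later in the deck (two phases with throughput 1 and 1/2, each active half of the time).

```python
from fractions import Fraction


def average_throughput(phases) -> Fraction:
    """Weighted average of static throughputs.

    phases: iterable of (alpha_i, Th_i), where alpha_i is the fraction of
    time spent in a phase whose static throughput is Th_i.
    """
    phases = list(phases)
    if sum(alpha for alpha, _ in phases) != 1:
        raise ValueError("time fractions must sum to 1")
    return sum(alpha * th for alpha, th in phases)


phases = [
    (Fraction(1, 2), Fraction(1)),     # e.g. a streaming-write phase, Th_1 = 1
    (Fraction(1, 2), Fraction(1, 2)),  # e.g. a read phase, Th_2 = 1/2
]
print(average_throughput(phases))  # 3/4
```

The approximation ignores any cycles lost when the system switches between phases, which is why it only holds when such switches are infrequent.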
![Page 36: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/36.jpg)
ALIP performance evaluation - Example
Stream Write cycle
Addr 1
Data-out 1 τ
Ck=1, Valid Data=1
![Page 37: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/37.jpg)
ALIP performance evaluation - Example
Stream Write cycle
Addr 2
Data-out 2
Addr 1
Data-out 1
Ck=2, Valid Data=2
![Page 38: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/38.jpg)
ALIP performance evaluation - Example
Stream Write cycle
Addr 3
Data-out 3
Addr 2
Data-out 2
Ck=3, Valid Data=3
![Page 39: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/39.jpg)
ALIP performance evaluation - Example
Read cycle
Addr 4 Addr 3
Data-out 3
Ck=4, Valid Data=4
![Page 40: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/40.jpg)
ALIP performance evaluation - Example
Read cycle
----- Addr 4
Ck=5, Valid Data=5
ττ
![Page 41: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/41.jpg)
ALIP performance evaluation - Example
Read cycle
-----
Ck=6, Valid Data=5
Data-in 4 τ
τ
![Page 42: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/42.jpg)
ALIP performance evaluation - Example
Read cycle
Ck=7, Valid Data=5
-----
τ
Data-in4
τ
![Page 43: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/43.jpg)
ALIP performance evaluation - Example
Read cycle
Ck=8, Valid Data=6
-----
τ Addr 5
τ
![Page 44: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/44.jpg)
ALIP performance evaluation - Example
Read cycle
Ck=8, Valid Data=6, Throughput=3/4
$Th_1 = 1$, $Th_2 = 1/2$, $\alpha_1 = 1/2$, $\alpha_2 = 1/2$
-----
τ Addr 5
τ
![Page 45: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/45.jpg)
Adaptive communication performance evaluation - assumptions
Assumption 1: No time lost in “context switching”
– Unrealistic, but acceptable for burst communication, and consistent with experiments
Assumption 2: Channels behave in a statistically independent fashion
– Only single clock cycle independence is important for our purposes
Under 1 and 2, we can compute channel activities and use them to weight the connections
![Page 46: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/46.jpg)
Floorplanning for Throughput – adaptive case
The optimal floorplan in our case is that which guarantees the maximum throughput compatible with the given blocks’ dimensions
Maximum throughput is equivalent to the worst cost-to-time ratio loop, weighted by the loop activation ratio
It can be approximated by taking into account the channel activation ratio
![Page 47: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/47.jpg)
New Heuristic Throughput Computation
Heuristic:
– Statically compute, for every edge $e$, the shortest loop $l(e)$ in which it appears
– For every optimization iteration:
  $\mathrm{Cost}(e) = \frac{1}{l(e)} \left\lfloor \frac{\mathrm{length}(e)}{C_{\mathrm{length}}} \right\rfloor \alpha(e)$
  $\mathrm{TotCost} = \sum_e \mathrm{Cost}(e)$
The only change consists in the inclusion of the term $\alpha(e)$
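A sketch of the activity-weighted cost, assuming the shortest-loop length of each edge has already been computed and that the extra per-edge term is a channel activation ratio between 0 and 1. The tuple layout, the function name, and the sample numbers are hypothetical.

```python
import math


def weighted_total_cost(edges, critical_length):
    """Sum of activity * floor(length / critical_length) / loop_len per edge.

    edges: iterable of (loop_len, wire_length, activity) tuples, where
    activity is the assumed channel activation ratio in [0, 1].
    """
    return sum(
        activity * math.floor(length / critical_length) / loop_len
        for loop_len, length, activity in edges
    )


# Hypothetical edges: (shortest loop length, wire length, activation ratio).
edges = [(2, 25.0, 1.0), (3, 12.0, 0.5), (3, 31.0, 0.25)]
adaptive_cost = weighted_total_cost(edges, critical_length=10.0)

# Setting every activity to 1 recovers the non-adaptive cost.
static_cost = weighted_total_cost([(n, w, 1.0) for n, w, _ in edges], 10.0)
```

Rarely used channels contribute less to the total, so the optimizer is free to leave them long and spend its effort shortening the busy ones.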
![Page 48: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/48.jpg)
Experiments
GSRC/MCNC benchmarks
– Burst mode
– Uniformly distributed phases and activation times
– Comparison between non-pipelined solution and adaptively pipelined (13 FO4 case)
– After optimization, a VHDL netlist is automatically generated and simulated to measure the real performance of the system (as opposed to the approximation from the floorplanner)
Results:
– SU between 16% and 44%
– Monotonic behavior in the legal interval
– Limitations due mainly to FO4 delays
![Page 49: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/49.jpg)
Experiments
MPEG decoder
– Strict data dependency
– Optimization as in other cases
– Simulation as before and with real channel utilization profiles
Results:
– SU of 42% with block delay, 76% without
– Real SU of 31% (effect of non-random correlation)
![Page 50: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/50.jpg)
Conclusions and future work
Pure “blind” pipelining fails to achieve available optimization, due to neglect of common information
Adaptive protocols can take advantage of the information available to the blocks
We will concentrate on
– Automated extraction of information from the blocks
– Power optimization (power/timing trade-offs)
– Routing constraints effects
![Page 51: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/51.jpg)
Thank you
![Page 52: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/52.jpg)
Shell – Relay Station Interaction
valid stop
a
![Page 53: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/53.jpg)
Shell – Relay Station Interaction
valid stop
b
a
![Page 54: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/54.jpg)
Shell – Relay Station Interaction
valid stop
c
b
![Page 55: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/55.jpg)
Shell – Relay Station Interaction
valid stop
d
bc
![Page 56: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/56.jpg)
Feedforward equalization
Maximum performance can be recovered by equalizing various paths
Longest path computation to obtain the appropriate number of added relay stations
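One way to picture the equalization step is a longest-path pass over the feedforward (acyclic) part of the netlist. The sketch below is an illustration under assumptions, not the authors' procedure: each shell is taken to add one cycle of latency, each relay station one more, all source blocks are assumed aligned at time 0, and the function name and sample netlist are hypothetical.

```python
from graphlib import TopologicalSorter


def equalize(edges):
    """Extra relay stations per channel so that all inputs of a block align.

    edges: dict mapping (src, dst) -> relay stations already on that channel.
    Returns a dict mapping (src, dst) -> relay stations to add.
    """
    preds = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
        preds.setdefault(src, set())

    # Longest-path arrival time of every block, visiting predecessors first.
    arrival = {}
    for node in TopologicalSorter(preds).static_order():
        arrival[node] = max(
            (arrival[p] + 1 + edges[(p, node)] for p in preds[node]),
            default=0,
        )

    # Slack of each channel with respect to the latest-arriving input.
    return {
        (src, dst): arrival[dst] - (arrival[src] + 1 + relays)
        for (src, dst), relays in edges.items()
    }


# Hypothetical netlist: A feeds C directly and through B over a long channel.
channels = {("A", "B"): 0, ("B", "C"): 2, ("A", "C"): 0}
extra = equalize(channels)
print(extra[("A", "C")])  # 3
```

This padding equalizes every reconvergent path but makes no attempt to minimize the total number of added relay stations.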
![Page 57: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/57.jpg)
Critical Length and Pipelining Stages (ITRS projections)

| Year | Node | Clock Frequency | Critical Length | Stages (10 mm) | Stages (34 mm) |
|------|------|-----------------|-----------------|----------------|----------------|
| 2001 | 130 nm | 1.684 GHz | 17.11 mm | 0 | 1 |
| 2002 | 115 nm | 2.317 GHz | 12.17 mm | 0 | 2 |
| 2003 | 100 nm | 3.088 GHz | 8.95 mm | 1 | 3 |
| 2004 | 90 nm | 3.990 GHz | 7.37 mm | 1 | 4 |
| 2005 | 80 nm | 5.173 GHz | 5.28 mm | 1 | 6 |
| 2006 | 70 nm | 5.631 GHz | 4.63 mm | 2 | 7 |
| 2007 | 65 nm | 6.739 GHz | 4.16 mm | 2 | 8 |
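The stage counts in this table are consistent with taking the floor of the wire length divided by the critical length. That relationship is inferred from the numbers rather than stated on the slide, and the helper name below is hypothetical.

```python
import math


def pipeline_stages(wire_mm: float, critical_mm: float) -> int:
    """Clocked stages needed on a wire, assumed to be floor(length / critical length)."""
    return math.floor(wire_mm / critical_mm)


# (year, critical length in mm) taken from the table.
critical_lengths = [
    (2001, 17.11), (2002, 12.17), (2003, 8.95), (2004, 7.37),
    (2005, 5.28), (2006, 4.63), (2007, 4.16),
]

stages_10mm = [pipeline_stages(10, c) for _, c in critical_lengths]
stages_34mm = [pipeline_stages(34, c) for _, c in critical_lengths]
print(stages_10mm)  # [0, 0, 1, 1, 1, 2, 2]
print(stages_34mm)  # [1, 2, 3, 4, 6, 7, 8]
```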
![Page 58: Floorplan Assisted Data Rate Enhancement through Wire Pipelining: A Real Assessment](https://reader036.fdocuments.in/reader036/viewer/2022070413/56814bfa550346895db8f3e0/html5/thumbnails/58.jpg)
General Performance Evaluation
Generic netlists of blocks are feedforward connections of loops
If feedforward connections are equalized, the “worst” loop dominates throughput
Problem formulation: max cost-to-time ratio (polynomial time).