Soha Hassoun
Tufts University
Medford, MA
Thanks to: Carl Ebeling
University of Washington
Seattle, WA
Fine Grain Incremental Rescheduling Via Architectural Retiming

Example
Problem -- Clock period is too large
[Figure labels: RAM, Write Address, Read Address, Offset]

Pipelining
Problems w/ consecutive dependent operations

Performance Bottleneck
Latency constrained paths
[Figure label: Latency = n]
Approach
• apply architectural retiming at the RT level

Architectural Retiming
Problem: too much work, too little time
[Figure, built up over four slides. Labels: y_k; pipeline register (D); negative register (N) with pipeline register (D, C); final build adds the labels "precomputation" and "prediction"]

Outline
• Precomputation
    • incremental rescheduling without resource constraints
• Prediction
    • incremental rescheduling with resource constraints
• Results

Precomputation Function
[Figure labels: h, D, C, x_i, f, g, y_k, N, f']
$D^{t} = C^{t+1} = f(\ldots, x_i^{t+1}, \ldots)$
${x_i}^{t+1} = {x'_i}^{t} = g(\ldots, y_k^{t}, \ldots)$
$D^{t} = f(\ldots, g(\ldots, y_k^{t}, \ldots), \ldots) = f'(\ldots, y_k^{t}, \ldots)$
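
The relation D^t = C^{t+1} can be checked on a toy model. The sketch below is only an illustration: the bodies of `f` and `g` are made-up stand-ins (the slides do not define them), the circuit is reduced to one register x_i between g and f, and the initial register value `x0` is an assumption.

```python
from typing import List


def g(y: int) -> int:
    """Hypothetical logic that drives the input of register x_i."""
    return 3 * y + 1


def f(x: int) -> int:
    """Hypothetical logic between register x_i and signal C."""
    return x * x - 2


def f_prime(y: int) -> int:
    """Precomputation function f': f composed with g, so it reads y_k one cycle early."""
    return f(g(y))


def simulate_original(ys: List[int], x0: int = 0) -> List[int]:
    """C^t = f(x_i^t), with x_i^{t+1} = g(y_k^t) and x_i^0 = x0 (assumed reset value)."""
    c_values = []
    x = x0
    for y in ys:
        c_values.append(f(x))
        x = g(y)  # register x_i loads g(y_k^t) for the next cycle
    return c_values


def simulate_precomputed(ys: List[int]) -> List[int]:
    """D^t = f'(y_k^t): the value C will take one cycle later."""
    return [f_prime(y) for y in ys]


ys = [4, 7, 1, 9, 2]
C = simulate_original(ys)
D = simulate_precomputed(ys)

# D at cycle t matches C at cycle t+1 for every cycle where both are defined.
assert D[:-1] == C[1:]
```

In this toy setting f' is simply the composition of f and g; in a real design the point of synthesizing f' is that the composed logic can then be optimized as a single block.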

Incremental Rescheduling
[Figure labels: h, f, g, y_k, N]
Original schedule:      Time n: g     Time n+1: f, h
After precomputation:   Time n: f'    Time n+1: h

Precomputing With Register Arrays
[Figure, built up over four slides. Labels: Write Address, Read Address, Write Data, Read Data, Out, N, F]
$F^{t} = \text{Out}^{t+1} = \text{Array}^{t+1}[\text{Read Address}^{t+1}]$

Synthesizing Bypass Paths
[Figure: register array with Write Address, Read Address, Write Data, and Read Data ports; a comparator (=?) checks the Precomputed Read Address against the Write Address]
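
A bypass path of this kind can be sketched in software. The model below is an assumption, not taken from the slides: reads are combinational, a write issued in cycle t becomes visible in cycle t+1, and the next cycle's read address is available one cycle early (the "precomputed read address"). All function names and the trace are hypothetical.

```python
from typing import List, Tuple

# One cycle of stimulus: (write_address, write_data, read_address)
Cycle = Tuple[int, int, int]


def simulate_reads(trace: List[Cycle], size: int = 4) -> List[int]:
    """Reference behaviour: Out^t = Array^t[ReadAddress^t]."""
    array = [0] * size
    out = []
    for wa, wd, ra in trace:
        out.append(array[ra])
        array[wa] = wd  # the write becomes visible in the next cycle
    return out


def simulate_precomputed_reads(
    trace: List[Cycle], next_read_addresses: List[int], size: int = 4
) -> List[int]:
    """F^t = Array^{t+1}[ReadAddress^{t+1}], computed in cycle t with a bypass."""
    array = [0] * size
    f_values = []
    for (wa, wd, _), ra_next in zip(trace, next_read_addresses):
        if ra_next == wa:
            f_values.append(wd)  # bypass: forward the data being written now
        else:
            f_values.append(array[ra_next])  # the stored value is still current
        array[wa] = wd
    return f_values


trace = [(0, 10, 1), (1, 20, 0), (2, 30, 0), (1, 40, 2), (3, 50, 1)]
out = simulate_reads(trace)
precomputed_ra = [ra for _, _, ra in trace[1:]]  # read addresses known one cycle early
F = simulate_precomputed_reads(trace, precomputed_ra)

# F in cycle t equals Out in cycle t+1.
assert F == out[1:]
```

The address comparison is what keeps the early read correct when the current write targets the location that will be read next; without it the early read would return stale data.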

Prediction
[Figure labels: D, C, f, gi, Z, N]
What if?
• can't precompute,
• too many additional resources, or
• performance is unsatisfactory
Predict C one cycle before its arrival

Schedule with Mispredictions
[Figure, built up over three slides: blocks C and H with registers R1 and R2, a Negative Register, and a Verify block]
[Timing chart over cycles t-1, t, t+1. Entries: C: c1, c2; H: h1, h2; Negative Register: c1*, c2*; Verify: c1* =? c1, c2* =? c2. The cycle each entry falls in is not recoverable from the transcript]
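
The predict-then-verify flow can be illustrated with a small behavioural sketch. Everything concrete here is an assumption made for illustration: `h` is a made-up function for block H, the predictor simply repeats the last actual value, and a misprediction is handled by discarding the speculative result and recomputing from the actual value. The slides do not specify any of these choices.

```python
from typing import List, Tuple


def h(c: int) -> int:
    """Hypothetical computation performed by block H on a value of C."""
    return c + 100


def predict_last_value(history: List[int]) -> int:
    """Hypothetical predictor: guess that C repeats its most recent actual value."""
    return history[-1] if history else 0


def run_with_prediction(actual_cs: List[int]) -> Tuple[List[int], int]:
    """Return H's committed results and the number of mispredictions."""
    results = []
    history: List[int] = []
    mispredictions = 0
    for c in actual_cs:
        c_star = predict_last_value(history)  # predicted value, available early
        speculative = h(c_star)  # H starts on the prediction
        if c_star == c:  # Verify: c* =? c
            results.append(speculative)
        else:
            mispredictions += 1
            results.append(h(c))  # nullify the speculative result and recompute
        history.append(c)
    return results, mispredictions


cs = [5, 5, 5, 7, 7, 5]
results, misses = run_with_prediction(cs)

# Functionality is preserved regardless of prediction accuracy.
assert results == [h(c) for c in cs]
```

Prediction accuracy only affects how often the correction path is taken, never the committed values.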

Synthesis Issues in Prediction
• Negative register as predicting FSM
    • use signal transition probabilities
    • incorporate don't care conditions
• Nullifying mispredictions
    • Two correction strategies
        • As-Soon-As-Possible restoration
        • As-Late-As-Possible correction
• Add handshaking signals to coordinate with interface
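
One way to read "use signal transition probabilities" is to derive the predictor's next-value table from an observed trace of the signal. The sketch below assumes that reading; `build_predictor` and the trace are hypothetical, and don't-care conditions are not modelled.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def build_predictor(trace: List[int]) -> Dict[int, int]:
    """Map each observed value to its most frequent successor in the trace."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for current, following in zip(trace, trace[1:]):
        counts[current][following] += 1
    return {value: successors.most_common(1)[0][0] for value, successors in counts.items()}


def accuracy(predictor: Dict[int, int], trace: List[int]) -> float:
    """Fraction of transitions in the trace that the predictor guesses correctly."""
    pairs = list(zip(trace, trace[1:]))
    hits = sum(1 for current, following in pairs if predictor.get(current) == following)
    return hits / len(pairs)


trace = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
predictor = build_predictor(trace)
```

On this trace the value 0 is followed by 0 four times and by 1 three times, so the table predicts 0 after 0; the value 1 is always followed by 0.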

Related Work
Precomputation
• Bypass Synthesis
• lookahead [Kogge '81, …..]
Prediction / Speculative Execution
• Most likely path, arbitrarily deep [Holtmann & Ernst '93, '95]
• Pre-execution [Radivojevic & Brewer '94]
• Possible multiple paths & arbitrarily deep [Lakshminarayana et al. '98]
• Percolation scheduling [Potasman et al. '90]

Results
[Bar chart: Speed up and Area Increase (vertical axis 0 to 2.5) for Seq, QC, GCD-prec, FA1, FA2, MIM, MIM-pred, GCD-pred]

Architectural Retiming
• Improves throughput while preserving functionality and sometimes latency
• Bridges gap between HLS and logic optimizations
• Unifies several sequential optimizations
    • bypass synthesis
    • lookahead transformation
    • branch prediction
    • fine-grain cross register optimizations

Ph.D. Forum at DAC '99
• Goal
    • increase interaction between academia and industry
• Format
    • students present work at poster session at DAC
    • researchers give feedback
• Who's eligible?
    • Students within 1 or 2 years of finishing Ph.D. thesis
www.cs.washington.edu/homes/soha/forum

Precomputing in Single-Register Cycles
Lookahead -- A(n) is a function of B(n-2)
[Figure labels, before: N, B, A; after: A', B, A, B']
[Kogge, '81], [Parhi & Messerschmitt, '89]
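
As a loose illustration of a lookahead transformation, consider a first-order linear recurrence rewritten so that each value depends on the value two steps back. The recurrence, the coefficient `a`, and the input sequence are assumptions chosen for the example; they are not the circuit on the slide.

```python
from typing import List


def direct(a: int, u: List[int], x0: int) -> List[int]:
    """x(n) = a * x(n-1) + u(n), with x(0) = x0."""
    xs = [x0]
    for n in range(1, len(u)):
        xs.append(a * xs[n - 1] + u[n])
    return xs


def lookahead(a: int, u: List[int], x0: int) -> List[int]:
    """Same sequence, but x(n) is computed from x(n-2):
    x(n) = a^2 * x(n-2) + a * u(n-1) + u(n).
    Requires len(u) >= 2."""
    xs = [x0, a * x0 + u[1]]
    for n in range(2, len(u)):
        xs.append(a * a * xs[n - 2] + a * u[n - 1] + u[n])
    return xs


a, x0 = 3, 1
u = [0, 1, 2, 3, 4]  # u[0] is unused; x(0) is given by x0
assert direct(a, u, x0) == lookahead(a, u, x0)
```

Substituting the recurrence into itself once removes the dependence on the immediately preceding value, which gives the feedback loop two cycles instead of one at the cost of extra feed-forward computation.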