A Personal View of Real-Time Computing - Ptolemy …eal/presentations/Lee...Compute future time...
Transcript of A Personal View of Real-Time Computing - Ptolemy …eal/presentations/Lee...Compute future time...
UniversityofCaliforniaatBerkeley
APersonalViewofReal-TimeComputing
EdwardA.LeeProfessoroftheGraduateSchool
ÉcoleNormaleSupérieureParis,France,February26,2020
Invited Talk
CyberPhysicalSystems
2
Predictabilityrequiresdeterminacyanddependsontiming,includingexecutiontimesandnetworkdelays.
WhatisRealTime?
• fastcomputation• prioritizedscheduling• computationonstreamingdata• boundedexecutiontime• temporalsemanticsinprograms• temporalsemanticsinnetworks
3 Lee,Berkeley
WhatisRealTime?
• fastcomputation• prioritizedscheduling• computationonstreamingdata• boundedexecutiontime• temporalsemanticsinprograms• temporalsemanticsinnetworks
4 Lee,Berkeley
Theseareverydifferentfromoneanother.Wehavetodecidewhichtofocuson.
Correct execution of a program in all widely used programming languages, and correct delivery of a network message in all general-purpose networks has nothing to do with how long it takes to do anything.
Programmershavetostepoutsidetheprogrammingabstractionstospecifytimingbehavior.
Lee,Berkeley 5
Timing is not part of software and network semantics
AchievingRealTime
• overengineering• usingoldtechnology• response-timeanalysis• real-timeoperatingsystems(RTOSs)• specializednetworks• extensivetestingandvalidation
6 Lee,Berkeley
Timingofprogramsemergesfromtheimplementation
• Pipelinehazards• Cacheeffects• VariableDRAMlatencies• Speculativeexecution• Interrupts• Forwarding• Dynamicvoltage/frequency• …
7 Lee,Berkeley
PC
Instructionmemory Mux
Add
4
fetchdecode
executem
emory
writeback
Registerbank
Mux
ALU
Decode
Zero?
branchtaken
control hazard (conditional branch)data hazard (com
puted branch)
datamemory
Mux
data hazard (mem
ory read or ALU result)
ImagefromLee&Seshia,IntroductiontoEmbeddedSystems
MITPress,2017
MessyTime
8 Lee,Berkeley
ModelTimebecomesamesswithinterruptsandthreads
IEEEComputer,May,2006.
CurrentTrendsinReal-TimeSoftware
• Modelthedetails• Analyzethemodels
9 Lee,Berkeley
Evendeterministicreal-timemodelscanleadtochaos.[ThieleandKumar,EMSOFT2015]
Resultisexpensive,intractablemodels.
NewtonianTimeforPhysicalDynamics
PhysicalSystem Model
Signal Signal
Newtoniantimeadvanceseverywhereuniformlyinacontinuum.
Lee,Berkeley 10
Image:WikimediaCommons
PitfallwithNewtonianTime(1)
Whenrealizedinasoftware-basedmodel:1. Theprecisionoftimeshouldbefiniteandthesame
forallobservers.2. Theprecisionoftimeshouldbeindependentofthe
absolutemagnitudeofthetime.3. Additionoftimeshouldbeassociative.Thatis,for
anythreetimeintervalst1,t2,andt3,(t1+t2)+t3=t1+(t2+t3)
Floatingpointnumbersdonotsatisfythese.
11 Lee,Berkeley
[1]Broman,etal.“Requirementsforhybridcosimulationstandards.HSCC2015.[2]Cremona,etal.,“Hybridco-simulation:it'sabouttime,”SoftwareandSystemsModeling2017.
PitfallwithNewtonianTime(2)
• “Continuum”doesnotimply“continuous.”
12 Lee,Berkeley
v1v2
-1.5
-1.0
-0.5
0.0
0.5
1.0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Velocities
time
Signal Signal
13 Image:WikimediaCommonsLee,Berkeley
BringtheCyberandthePhysicalTogether
AchievingRealTimeinPractice
• overengineering• usingoldtechnology• response-timeanalysis• real-timeoperatingsystems(RTOSs)• specializednetworks• extensivetestingandvalidation
14 Lee,Berkeley
Maybewecandobetter?
AnEpiphany
15
• Inscience,thevalueofamodelliesinhowwellitsbehaviormatchesthatofthephysicalsystem.
• Inengineering,thevalueofthephysicalsystemliesinhowwellitsbehaviormatchesthatofthemodel.
Ascientistasks,“CanImakeamodelforthisthing?”Anengineerasks,“CanImakeathingforthismodel?”
Lee,Berkeley 16
TheValueofModels
Modelsvs.Reality
Inthisexample,themodelingframeworkiscalculusandNewton’slaws.Fidelityishowwellthemodelanditstargetmatch
17 Lee,Berkeley
Themodel
Thetarget(thethingbeingmodeled).
AModel
18 Lee,BerkeleyImagebyDominiqueToussaint,GNUFreeDocumentationLicense,Version1.2orlater.
APhysicalRealization
19 Lee,Berkeley
ModelFidelity
• Toascientist,themodelisflawed.• Toanengineer,therealizationisflawed.
Toarealist,bothareflawed…Perhapsweshouldbemakingourrealizationsmorefaithfultoourmodelsratherthantheotherwayaround?
20 Lee,Berkeley
UsefulModelsandUsefulThings
“Essentially,allmodelsarewrong,butsomeareuseful.”
Box,G.E.P.andN.R.Draper,1987:EmpiricalModel-BuildingandResponseSurfaces.WileySeriesinProbabilityandStatistics,Wiley.
“Essentially,allsystemimplementations
arewrong,butsomeareuseful.”LeeandSirjani,“Whatgoodaremodels,”FACS2018.
Lee,Berkeley 21
TheValueofSimulation
“Simulationisdoomedtosucceed.”
[anonymous]
Couldthisstatementbeconfusingengineeringmodelsforscientificones?
22 Lee,Berkeley
LeeandSirjani,“Whatgoodaremodels,”FACS2018.
ChangingtheQuestion
Isthequestionwhetherourmodelsdescribethebehaviorofreal-timesystems(withhighfidelity)?OrIsthequestionwhetherwecanbuildreal-timesystemswherebehaviormatchesthatofourmodels(withhighprobability)?
23 Lee,Berkeley
The hardware out of which we build computers is capable of delivering “correct” computations and precise timing…
Synchronous digital logic delivers precise, repeatable timing. … but the overlaying software abstractions discard timing.
// Perform the convolution. for (int i=0; i<10; i++) { x[i] = a[i]*b[j-i]; // Notify listeners. notify(x[i]); }
Lee,Berkeley 24
PRET Machines – Giving Software the Capabilities its Hardware Already Has.
• PREcision-Timed processors = PRET • Predictable, REpeatable Timing = PRET • Performance with REpeatable Timing = PRET
= PRET + Computing
With time
// Perform the convolution. for (int i=0; i<10; i++) { x[i] = a[i]*b[j-i]; // Notify listeners. notify(x[i]); }
http://chess.eecs.berkeley.edu/pret
Lee,Berkeley 25
MajorChallengesandexistenceproofsthattheycanbemet
• Pipelines– fine-grainmultithreading
• Memoryhierarchy– memorycontrollerswithcontrollablelatency
• I/O– threadedinterruptswithzeroeffectontiming
Lee,Berkeley 26
ThreeGenerationsofPRETMachinesatBerkeley
• PRET1,Sparc-based(simulationonly)– [Licklyetal.,CASES,2008]
• PTARM,ARM-based(FPGAimplementation)– [Liuetal.,ICCD,2012]
• FlexPRET,RISC-V-based(FPGA+simulation)– [Zimmeretal.,RTAS,2014,PhDThesis2015]
Lee,Berkeley 27
HardwarethreadHardwarethreadHardwarethread
Our Second Generation PRET PTArm, a soft core on a Xilinx Virtex 5 FPGA (2012)
Hardwarethread
registers
scratchpad
memory
I/Odevices
Interleaved pipeline with one set of registers
per thread
SRAM scratchpad
shared among threads
DRAM main memory,
separate banks per thread
memorymemory
memory
IsaacLiu,PhDThesis,2012
PipelineInterleavingisOld
FirstusedintheCDC6600.
29
OurThird-GenerationPRET:Open-SourceFlexPRET(Zimmer2014/15)
• 32-bit,5-stagethreadinterleavedpipeline,RISC-VISA– Hardreal-timeHWthreads:scheduledatconstantrateforisolationandrepeatability.
– Softreal-timeHWthreads:shareallavailablecyclesforefficiency.
• DeployedonXilinxFPGA
Lee,Berkeley 30
Every3cycles(unlessdone)
WMXD
WMX
WMW
F DF
XDF
MXDF
WMXDF
HRTT0
SRTT1
SRTT2
Clockcycles
Instructions
Whenevercycleavailable(arbitraryinterleaving) DigilentAtlys(Spartan6)and
NImyRIO(Zync)
SRT thread
HardwarethreadHardwarethreadHardwarethreadHardwarethread
FlexPRET Hard-Real-Time (HRT) Threads Interleaved with Soft-Real-Time (SRT) Threads
Hardwarethread
registers
scratchpad
HRT threads have deterministic timing. SRT threads share remaining cycles
SRAM scratchpad
shared among threads
DRAM main memory provides
deterministic latency for HRT threads.
Conventional behavior for the rest.
memorymemory
memory
HRT thread
MichaelZimmer
Fact
Thereal-timeperformanceofaFlexPRETmachineisneverworsethanthatofaconventionalmachine.Proof:AFlexPRETmachineisaconventionalmachineifthememory-mappedregisterscontrollingHRTandSRTthreadsissettohaveonlyonethread,aSRTthread.
32 Lee,Berkeley
Benefits
• Fourhardwarethreadsisenoughtoeliminateallpipelinebubblesandmemorylatencyvariability.
• Unrealistictaskmodelsbecomerealistic.– Exact,knownWCET.– Zero-interferencetasks.– Interruptsenabledatalltimes.
• High-precisiontiminginstructions– Repeatablenanosecondprecision
33 Lee,Berkeley
TheCost
[Zimmer,Broman,Shaver,Lee,RTAS2014]34 Lee,Berkeley
Size:
TheCost
AbaselineRISC-Vwithoutanycomplexinstructions(floatingpoint,integerdivision,packedinstructions)canberealizedonanFPGAwith580flipflopsand2,788LUTs.A4-threadFlexPRETcanberealizedwith908flipflopsand3,943LUTs,anincreaseof56%and41%respectively.Percentageismuchlowerwithfloatingpoint,division,etc.[Zimmer,Broman,Shaver,Lee,RTAS2014]
35 Lee,Berkeley
AboutInterrupts
“[M]anyasystemsprogrammer’sgreyhairbearswitnesstothefactthatweshouldnottalklightlyaboutthelogicalproblemscreatedbythatfeature” -EdsgerDijkstra(1972)
36 Lee,Berkeley
Interrupts
• Nondeterministicallyinterleavedwithprogram• Makeresponsetime>executiontime• Disruptcacheandbranchpredictors• Overheadofcontextswitching
• ForWCETanalysis,havetodisableinterrupts• Disablinginterruptsincreasesvariabilityinresponsetime
37 Lee,Berkeley
Interrupts
Scientificsolution:• Modelalltheseeffects
Engineeringsolution:• Eliminatealltheseeffects
ThelatteriswhatPRETmachinesdo.
38 Lee,Berkeley
Interrupt Handler Thread
HardwarethreadHardwarethreadHardwarethread
FlexPRET I/O Interrupt Handler Thread Option
Hardwarethread
registers
scratchpad
Such interrupts have no effect on HRT threads, and
bounded effect on SRT threads!
memorymemory
memory
MichaelZimmer
AsimilarstrategyisalsousedbyXMOS,butwithlessisolation.
AbstractPRETMachines(APM)
RTSS,2017,Paris.Thispapershowsthatachievingdeterministicresponsetimesthatmeetdeadlines,whenthatisfeasible,comesatnocostinworst-caseresponsetimes.
ThisisshownforataskmodelofNsporadicindependenttaskswithdeadlines.
40 Lee,Berkeley
Intuition
• Nsporadicreal-timetaskswithminimuminterarrivaltimeTi,deadlinesDi,andWCETCi.
Theorem:WhenTi=Di,PRETyieldsdeterministicresponsetimesnoworsethantheworstcaseresponsetimeofaconventionalarchitecture.
WhenTi>Di,ifanyprocessorcandeliverdeterministicresponsetimes,PRETwill,withworstcaseresponsetimenoworsethanaconventionalarchitecture.
41 Lee,Berkeley
AvionicsMixedCriticalityTestCase
42 S.Vestal,“PreemptiveSchedulingofMulti-criticalitySystemswithVaryingDegreesofExecutionTimeAssurance,”inRTSS2007.
T0⌧A1
T1⌧A2
T2⌧A3
T3⌧A4
T4
⌧B1⌧B2⌧B3
T5
⌧B4⌧B5⌧B6⌧B7
T6
⌧C1⌧D1⌧D2⌧D3⌧D4
T7
⌧D5⌧D6⌧D7⌧D8⌧D9
t (ms)0 25 50 75 100 125 150 175 200
Fig. 1: FlexPRET-8T executing a mixed-criticality avionicscase study.
• Abstractworkloadof21tasksderivedfromatime-partitionedavionicssystem1with93%utilization
• Eachiterationperformsidenticalcomputation,but…
Deployedon8HWthreads
‘A’level(mostcritical)tasksonseparate
threads
‘B’level(critical)tasksonathreadusingnon-
preemptivestaticscheduler
‘B’level(critical)tasksonathreadusingrate-monotonicscheduler
‘C’and‘D’level(lesscritical)tasksonthreadsthatuse
earliestdeadlinefirst(EDF)
Isolatedexecutiontime
ExecutiontimedependentonHRTTs
Zimmer,PhDThesis,2015.
Programming:BasicTimingControlinPRET
43
Bui,Lee,Liu,Patel,andReineke,“Temporalisolationonmultiprocessingarchitectures,”DAC2011.
time r; get_time(r); while(1) {
add_ms(r, 10); exception_on_expire(r); task(); deactivate_exception(); delay_until(r);
}
internalclockvalue(in
nanoseconds)
Computefuturetime
Executioninterruptedattimer(deadlinemiss)Ifhere,nodeadline
misssodeactivate
Waituntilnextperiod,SRTTcanuseallocated
cycles
ExtendedRISC-VISAwithtiminginstructions.E.g.Hard-real-timeperiodictask:
Four Patterns of Timed Code Blocks
[V1]Besteffort:set_time r1, 1s // Code block delay_until r1
[V2]Latemissdetectionset_time r1, 1s // Code block branch_expired r1, <target> delay_until r1
set_time r1, 1s exception_on_expire r1, 1 // Code block deactivate_exception 1 delay_until r1
[V3]Immediatemissdetection
[V4]Exactexecution:set_time r1, 1s // Code block MTFD r1
Lee,Berkeley 44
ButWait…
ThewholepointofanISAisthatthesameprogramdoesthesamethingonmultiplehardwarerealizations.Isn’tthisincompatiblewithdeterministictiming?
Lee,Berkeley 45
InstructionSetArchitecture(ISA)
TheconceptofanISAhasn’tchangedmuchsincethe1960s.
46 FredBrooks
Photocourtesyofcomputerhistory.org
Parametric PRET Machines
ISA that admits a variety of implementations: • Variable clock rates and energy profiles • Variable number of cycles per instruction • Latency of memory access varying by address • Varying sizes of memory regions • …
A given program may meet deadlines on only some realizations of the same parametric PRET ISA.
set_time r1, 1s // Code block MTFD r1
Lee,Berkeley 47
Realizing the MTFD instruction on a Parametric PRET machine
The goal is to make software that will run correctly on many implementations of the ISA, and that correctness can be checked for each implementation.
set_time r1, 1s // Code block MTFD r1
Lee,Berkeley 48
BenefitsofPRET(Evenifyoudon’tcareaboutdeterminism)
• Verylowcontextswitchoverhead– Uptothenumberofhardwarethreads.– Conventionaloverheadabovethat.
• TighterWCETanalysis– Particularlywhenactivatingenoughthreadstoeliminatepipelinebubblesandmemoryaccessorderdependencies.
• NolongerneedtoberestrictedtopollingI/O– Effectofinterruptsisbounded.
49 Lee,Berkeley
BenefitsofPRET(Ifyoutakeadvantageofdeterminism)
• Modularity– Non-interferencebetweentasks.– Interruptshaveexactlynoeffectonhard-real-timetasks.
• Exactness– CangetnotjustWCET,butactualET.– NotjustET,butresponsetime.
• Repeatability– Worksinthefieldlikeonthebench.– Eventorderingcanbemadeinvariant.
• Complexity– Morehard-real-timetasksisbetterthanfewer.
• Certifiability– Everycorrectexecutionofthesoftwaregivesthesamebehavior.
• Energy– Reducevoltageandfrequencytothebareminimumtomeetdeadlines.
50 Lee,Berkeley
Sowhyisn’teverymodernprocessoraFlexPRET?
Possibilities:• Ourclaimslooktoogoodtobetrue.• It’saparadigmshift.
• Programmingmodelsthatcantakeadvantageofthisaremissing.
51
Today:LinguaFranca
52
Apolyglotmeta-languagewithDEsemanticsforimplementation(notsimulation)ofdeterministic,concurrent,time-sensitivesystems.
https://github.com/icyphy/lingua-franca/wiki
Conclusion
• Inscience,thevalueofamodelliesinhowwellitsbehaviormatchesthatofthephysicalsystem.
• Inengineering,thevalueofthephysicalsystemliesinhowwellitsbehaviormatchesthatofthemodel.
Mymessage:Dolessscienceandmoreengineering.http://ptolemy.berkeley.edu/prethttps://github.com/icyphy/lingua-franca http://platoandthenerd.org