NASH - Scheme and Functional Programming Workshop
NASH: an experimental tracing JIT VM for GNU Guile
1
TRACING JIT
2
ASSUMPTIONS
Programs spend most of their runtime in loops.
Several iterations of the same loop are likely to take similar code paths.
3
TERMINOLOGY
Trace: Recorded instruction sequence.
Fragment: Artifact made from a trace.
Guard: A test inserted into native code.
4
GNU GUILE
5
Guile compiles Scheme source code to bytecode.
Bytecode has a flat structure.
Bytecode contains labels for jump destinations.
    (define (mandelbrot x y)
      (let ((cr (- y 0.5))
            (ci x)
            (zi 0.0)
            (zr 0.0))
        (let lp ((i 0) (zr zr) (zi zi))
          (if (< *max-iterations* i)
              0
              (let ((zi2 (* zi zi))
                    (zr2 (* zr zr)))
                (if (< *bailout* (+ zi2 zr2))
                    i
                    (lp (+ i 1)
                        (+ (- zr2 zi2) cr)
                        (+ (* 2.0 zr zi) ci))))))))
Disassembly of mandelbrot at #x2a8:

       0    (assert-nargs-ee/locals 3 9)
       1    (static-ref 11 3087)
       3    (sub 11 9 11)
       4    (make-short-immediate 9 2)
       5    (static-ref 8 3093)
       7    (toplevel-box 7 3093 2925 2923 #t)
      12    (box-ref 7 7)
      13    (static-ref 6 3097)
      15    (mov 5 8)
      16    (mov 4 8)
      17    (mov 8 9)
    L1:
      18    (br-if-< 7 8 #f 26)
      21    (mul 3 4 4)
      22    (mul 2 5 5)
      23    (toplevel-box 1 3089 2909 2897 #t)
      28    (box-ref 1 1)
      29    (add 0 3 2)
      30    (br-if-< 1 0 #f 12)
      33    (add/immediate 8 8 1)
      34    (sub 3 2 3)
      35    (add 3 3 11)
      36    (mul 5 6 5)
      37    (mul 5 5 4)
      38    (add 5 5 10)
      39    (mov 4 5)
      40    (mov 5 3)
      41    (br -23)
    L2:
      42    (mov 10 8)
      43    (return-values 2)
    L3:
      44    (mov 10 9)
      45    (return-values 2)
6
Compiled bytecode instructions are interpreted by a C function...
...which is called the VM-engine.
The VM-engine can be viewed as a huge switch ... case statement.
    static SCM
    VM_NAME (scm_i_thread *thread, struct scm_vm *vp,
             scm_i_jmp_buf *registers, int resume)
    {
      ...
      VM_DEFINE_OP (1, call, ...)
        {
          ...
        }
      VM_DEFINE_OP (2, call_label, ...)
        {
          ...
        }
      VM_DEFINE_OP (3, tail_call, ...)
        {
          ...
        }
      ...
    }
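The "huge switch ... case" idea can be illustrated with a toy register VM. This is only a sketch, not Guile's VM-engine: the opcode names are borrowed from the disassembly, but the operand meanings, the `return` opcode, and the one-slot-per-instruction layout are simplifying assumptions made for this example.

```python
# Toy register VM: a sketch of switch-style dispatch, not Guile's VM-engine.
# Assumption: simplified operand meanings, one list slot per instruction.

def run(code, regs):
    """Interpret `code` (a list of tuples) against the register list `regs`."""
    ip = 0
    while ip < len(code):
        op, *args = code[ip]
        # One branch per opcode, like the cases of a big switch statement.
        if op == "mov":            # (mov dst src)
            dst, src = args
            regs[dst] = regs[src]
            ip += 1
        elif op == "add":          # (add dst a b)
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
            ip += 1
        elif op == "br-if-<":      # (br-if-< a b offset): jump if regs[a] < regs[b]
            a, b, offset = args
            ip += offset if regs[a] < regs[b] else 1
        elif op == "br":           # (br offset): unconditional relative jump
            ip += args[0]
        elif op == "return":       # (return src), hypothetical opcode for this toy
            return regs[args[0]]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return None


# Sum the integers below `limit`. Registers: [limit, i, acc, one]
program = [
    ("br-if-<", 1, 0, 2),   # 0: if i < limit, go to 2
    ("return", 2),          # 1: otherwise return acc
    ("add", 2, 2, 1),       # 2: acc += i
    ("add", 1, 1, 3),       # 3: i += 1
    ("br", -4),             # 4: backward jump to 0
]
```

Calling `run(program, [5, 0, 0, 1])` interprets the loop five times before the branch at index 0 falls through to `return`. The backward `br` at the end is the kind of instruction a tracing JIT watches for.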
7
NASH OVERVIEW
8
[Figure: Nash control-flow diagram. Components: Nash Interpreter, Nash Compiler, Native code. Boxes: Start, Interpret bytecode, Hot loop found?, Look up native code, Native code found?, Record and interpret, End of loop?, Compile, Run compiled code, Guard failed?, Recover interpreter state. Decision boxes branch on Yes/No.]
Key components: Interpreter, Compiler, Native code
9
User programs start in the interpreter.
The interpreter observes each bytecode instruction, looking for hot loops.
When a hot loop is detected, the interpreter looks up the corresponding native code.
If no native code is found, the interpreter starts recording the instructions in the loop.
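One common way to find hot loops is to count how often a backward jump reaches the same target. The slides do not say how Nash decides that a loop is hot, so the counter and the threshold value below are assumptions made for illustration; only the jump at IP 41 with offset -23 comes from the disassembly.

```python
from collections import defaultdict

HOT_THRESHOLD = 3  # hypothetical value, chosen only for this example


class HotLoopDetector:
    """Counts backward jumps per target IP (a sketch, not Nash's actual policy)."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counts = defaultdict(int)  # loop-head IP -> backward jumps seen

    def on_jump(self, ip, offset):
        """Return the loop-head IP when this jump makes the loop hot, else None."""
        if offset >= 0:
            return None                 # forward jumps do not close a loop
        target = ip + offset            # e.g. 41 + (-23) = 18
        self.counts[target] += 1
        if self.counts[target] == self.threshold:
            return target
        return None
```

With this sketch, feeding `on_jump(41, -23)` repeatedly reports IP 18 as hot on the third call, and forward jumps such as `on_jump(18, 26)` are ignored.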
10
IP 41 (br -23) is a backward jump instruction.
The interpreter jumps back to IP 18 (br-if-< 7 8 #f 26), which is marked as L1, label one.
Then the interpreter records the instructions between IP 18 and 41.
Recorded trace:
    ;;; trace 1: bytecode 16
    7f39397e92f0  (br-if-< 7 8 #f 26)
    7f39397e92fc  (mul 3 4 4)
    7f39397e9300  (mul 2 5 5)
    7f39397e9304  (toplevel-box 1 3089 2909 2897 #t)
    7f39397e9318  (box-ref 1 1)
    7f39397e931c  (add 0 3 2)
    7f39397e9320  (br-if-< 1 0 #f 12)
    7f39397e932c  (add/immediate 8 8 1)
    7f39397e9330  (sub 3 2 3)
    7f39397e9334  (add 3 3 11)
    7f39397e9338  (mul 5 6 5)
    7f39397e933c  (mul 5 5 4)
    7f39397e9340  (add 5 5 10)
    7f39397e9344  (mov 4 5)
    7f39397e9348  (mov 5 3)
    7f39397e934c  (br -23)
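The recording step can be sketched as collecting every executed instruction from the loop head until a jump returns to it. The function name, the tuple layout of the instruction stream, and the convention that only taken jumps carry an offset are assumptions of this example; the IPs and opcode names are taken from the disassembly.

```python
def record_trace(executed, loop_head):
    """Collect instructions from `loop_head` until a jump back to it.

    `executed` yields (ip, opcode, taken_jump_offset_or_None) in execution order.
    """
    trace = []
    recording = False
    for ip, opcode, offset in executed:
        if ip == loop_head:
            recording = True            # start recording at the loop head
        if not recording:
            continue
        trace.append((ip, opcode))
        if offset is not None and ip + offset == loop_head:
            break                       # end of loop: the trace is complete
    return trace


# One iteration of the mandelbrot loop; neither conditional branch is taken.
loop_body = [
    (18, "br-if-<", None), (21, "mul", None), (22, "mul", None),
    (23, "toplevel-box", None), (28, "box-ref", None), (29, "add", None),
    (30, "br-if-<", None), (33, "add/immediate", None), (34, "sub", None),
    (35, "add", None), (36, "mul", None), (37, "mul", None),
    (38, "add", None), (39, "mov", None), (40, "mov", None),
    (41, "br", -23),
]
executed = [(17, "mov", None)] + loop_body + loop_body
trace = record_trace(executed, loop_head=18)
```

The resulting `trace` holds the 16 instructions from IP 18 to IP 41, matching the count in the `;;; trace 1: bytecode 16` header.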
11
The recorded instructions are then passed to the compiler.
The compiler is written in Scheme.
Uses A-normal form IR.
Uses GNU Lightning as the assembler backend.
12
The IR contains a prologue section and a loop body section.
The prologue section loads locals from the stack, which is shared with the interpreter.
Compilation from IR to native code is done in an almost straightforward manner.
    ;;; trace 1: anf
    (lambda ()
      (let* ((_ (%snap 0))
             (v0 (%sref/f 0 2))
             (v2 (%sref/f 2 2))
             (v3 (%sref/f 3 2))
             (v4 (%sref/f 4 2))
             (v5 (%sref/f 5 2))
             (v6 (%sref/f 6 2))
             (v7 (%sref 7 1))
             (v8 (%sref 8 1))
             (v10 (%sref/f 10 2))
             (v11 (%sref/f 11 2)))
        (loop v0 v1 v2 v3 v4 v5 v6 v7 v8 v10 v11)))
    (lambda (v0 v1 v2 v3 v4 v5 v6 v7 v8 v10 v11)
      (let* ((_ (%snap 1 v0 v1 v2 v3 v4 v5 v8))
             (_ (%ge v7 v8))
             (v3 (%fmul v4 v4))
             (v2 (%fmul v5 v5))
             (v1 (%cref 15967664 1))
             (v0 (%fadd v3 v2))
             (_ (%snap 2 v0 v1 v2 v3 v4 v5 v8))
             (_ (%typeq v1 2))
             (f2 (%cref/f v1 2))
             (_ (%fge f2 v0))
             (_ (%snap 3 v0 v1 v2 v3 v4 v5 v8))
             (v8 (%addov v8 4))
             (v3 (%fsub v2 v3))
             (v3 (%fadd v3 v11))
             (v5 (%fmul v6 v5))
             (v5 (%fmul v5 v4))
             (v5 (%fadd v5 v10))
             (v4 v5)
             (v5 v3))
        (loop v0 v1 v2 v3 v4 v5 v6 v7 v8 v10 v11)))
13
After compilation, control flow goes back to the interpreter.
From the next iteration on, the interpreter will find the native code of the loop.
Native code runs until a guard fails. The guard failure triggers bailout code.
The bailout code recovers the VM state and passes control of the user program back to the interpreter.
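The guard and bailout mechanism can be modeled in a few lines. This is a sketch under stated assumptions, not Nash's implementation: the exception class, the snapshot table, and the choice of resume IP 44 (the target of the branch at IP 18 in the disassembly) are hypothetical, and a Python function stands in for the native code.

```python
class GuardFailure(Exception):
    """Raised by the stand-in native code when a guard does not hold."""

    def __init__(self, snapshot_id, live):
        super().__init__(f"guard failed at snapshot {snapshot_id}")
        self.snapshot_id = snapshot_id
        self.live = live  # stack slot -> value of each live variable


# Hypothetical snapshot table: snapshot id -> bytecode IP where interpretation resumes.
SNAPSHOTS = {1: 44}


def native_loop(stack):
    """Stand-in for compiled code: loops until its guard fails."""
    limit, i = stack[7], stack[8]       # prologue: load locals from the shared stack
    while True:
        if not (limit >= i):            # guard, compare (%ge v7 v8) in the IR
            raise GuardFailure(1, {8: i})
        i += 1                          # loop body


def run_native(stack):
    """Run the compiled loop; on guard failure restore VM state, return the resume IP."""
    try:
        native_loop(stack)
    except GuardFailure as failure:
        for slot, value in failure.live.items():
            stack[slot] = value         # write live values back for the interpreter
        return SNAPSHOTS[failure.snapshot_id]
```

The design point is that the loop keeps its working values in local variables (registers, in real native code) and only writes them back to the shared stack when a guard fails, so the interpreter can resume from a consistent state.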
14
IR of recorded trace: see the ";;; trace 1: anf" listing above.
Native code of loop body (x86-64):
    ;;; trace 1: ncode 624
    ...
    loop:
    0x01ee61c8  cmp     r14,r15
    0x01ee61cb  jl      0x01efe028                    ->1
    0x01ee61d1  movsd   xmm13,xmm14
    0x01ee61d6  mulsd   xmm13,xmm14
    0x01ee61db  movsd   xmm12,xmm15
    0x01ee61e0  mulsd   xmm12,xmm15
    0x01ee61e5  mov     r9,QWORD PTR ds:0x1cac5a8
    0x01ee61ed  movsd   xmm11,xmm13
    0x01ee61f2  addsd   xmm11,xmm12
    0x01ee61f7  test    r9,0x6
    0x01ee61fe  jne     0x01efe030
    0x01ee6204  mov     rax,QWORD PTR [r9]
    0x01ee6207  and     rax,0xffff
    0x01ee620d  cmp     rax,0x217
    0x01ee6213  jne     0x01efe030
    0x01ee6219  movsd   xmm10,QWORD PTR [r9+0x10]
    0x01ee621f  ucomisd xmm11,xmm10
    0x01ee6224  ja      0x01efe030                    ->2
    0x01ee622a  mov     r11,r15
    0x01ee622d  add     r11,0x4
    0x01ee6231  jo      0x01efe038
    0x01ee6237  mov     r15,r11
    0x01ee623a  movsd   xmm8,xmm13
    0x01ee623f  movsd   xmm13,xmm12
    0x01ee6244  subsd   xmm13,xmm8
    0x01ee6249  addsd   xmm13,xmm5
    0x01ee624e  mulsd   xmm15,xmm7
    0x01ee6253  mulsd   xmm15,xmm14
    0x01ee6258  addsd   xmm15,xmm6
    0x01ee625d  movsd   xmm14,xmm15
    0x01ee6262  movsd   xmm15,xmm13
    0x01ee6267  jmp     0x01ee61c8                    ->loop
    ...
15
BENCHMARKS
16
Total time normalized to Guile bytecode interpreter
[Figure: bar chart, one bar per benchmark for Nash, vertical axis from 0 to 3.]
            sumfp   mbrot   sum     ctak    string  parsing  fibc    dynamic  GM
    Nash    0.024   0.034   0.119   1.073   1.115   1.654    1.678   2.506    0.400
"sumfp" and "mbrot" contain loops with floating-point arithmetic.
"string" and "fibc" mostly use procedures implemented in C.
"parsing" and "dynamic" contain a large number of data-driven conditional branches.
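Assuming the GM column is a geometric mean of the normalized times (the slides do not spell the abbreviation out), the calculation can be sketched as follows. Only the eight values shown in the table are used here, so the result, roughly 0.41, is not expected to reproduce the 0.400 reported for the whole suite.

```python
import math


def geometric_mean(values):
    """Geometric mean computed via logarithms: exp(mean(log(v)))."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Normalized total times for Nash, copied from the table (1.0 = Guile interpreter).
shown = {
    "sumfp": 0.024, "mbrot": 0.034, "sum": 0.119, "ctak": 1.073,
    "string": 1.115, "parsing": 1.654, "fibc": 1.678, "dynamic": 2.506,
}
gm_shown = geometric_mean(shown.values())
```

A geometric mean suits normalized ratios because a 2x slowdown and a 2x speedup cancel out, which an arithmetic mean would not capture.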
17
Geometric standard scores of the benchmark suite from 10 Scheme native code compilers
[Figure: distribution of benchmark results, one entry per benchmark, axis from -3 to 3, legend: Nash.]
          Chez    Bigloo  Ikarus  Pycket  Gambit  Larceny  Racket  Nash    Chicken  MIT
    GM    0.148   0.236   0.244   0.252   0.274   0.301    0.324   0.400   0.448    0.486
Nash showed the best score in "string", but the Guile bytecode interpreter was faster.
"parsing" was the slowest benchmark for Nash, but the score of Pycket was not so bad.
18
PROS AND CONS OF NASH
Significant improvement in tight loops with floating-point arithmetic.
No need to use "fl+", "fx+", ... etc.
Not much difference for procedures implemented in C.
Not well suited to programs with a large number of conditional branches, e.g. parsers, interpreters, and compilers.
JIT warm-up time.
19
QUESTIONS?
20