Modular Multiplication Algorithms for...
![Page 1: Modular Multiplication Algorithms for FPGAskoclab.cs.ucsb.edu/teaching/ca/docx/w08a/ModMulParlak.pdfWhat is an FPGA • Logic blocks – to implement combinational and sequential logic](https://reader031.fdocuments.in/reader031/viewer/2022030500/5aabf0377f8b9a693f8c989b/html5/thumbnails/1.jpg)
Modular Multiplication Algorithms for FPGAs
Mustafa Parlak
Outline
• What is an FPGA?
• FPGA vs. ASIC & Microprocessors
• FPGA Design Metrics
• FPGAs in Cryptography
• Adders: Basic operator of Modular Multiplications
• Modular Multiplications
  – Interleaved Modular Multiplications
  – Montgomery Modular Multiplications
• Comparison of Modular Multiplication algorithms
What is an FPGA
• FPGA = Field Programmable Gate Array
• A semiconductor IC that can be configured by the user (designer) after manufacturing
• Two-dimensional array of customizable logic blocks placed in an interconnect framework
• The user configures:
  1. The function of each logic block
  2. The interconnection between the logic blocks
• Can be programmed using a logic circuit diagram (schematic) or source code in VHDL or Verilog
What is an FPGA
• Logic blocks
  – to implement combinational and sequential logic
• Interconnect
  – wires to connect inputs and outputs to logic blocks
• I/O blocks
  – special logic blocks at periphery of device for external connections
• Key questions:
  – how to make logic blocks programmable?
  – how to connect the wires?
  – after the chip has been fabricated
FPGA Logic Blocks
• 4-input lookup table (LUT)
  – implements combinational logic functions
• Register
  – optionally stores output of LUT
[Figure: logic block diagram. Labels: INPUTS, 4-input "look up table" (4-LUT), FF, 1/0 select latch set by configuration bit-stream, OUTPUT]
FPGA Interconnect
LUTs (Look-Up Tables)
• A LUT contains memory cells to implement small logic functions
• Each cell holds ‘0’ or ‘1’.
• Programmed with the outputs of a truth table
• Inputs select the content of one of the cells as output
[Figure: LUT implementations. Labels include: inputs a, b, c, d; 4-input LUT; 16x1 RAM; 16-bit SR; mux; flip-flop with clock, clock enable, set/reset; "3 Inputs LUT -> 8 Memory Cells"; 3 – 6 inputs; SRAM (Static Random Access Memory) cells; multiplexer (MUX)]
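The LUT idea above (memory cells programmed with truth-table outputs, inputs acting as a cell address) can be modelled in a few lines of Python. This is only an illustrative software sketch, not hardware description code; the helper name `make_lut` and the majority-function example are assumptions introduced here.

```python
def make_lut(truth_table_outputs):
    """Return a function modelling a k-input LUT.

    truth_table_outputs holds one bit per input combination, so a
    k-input LUT needs 2**k memory cells.
    """
    cells = list(truth_table_outputs)
    k = len(cells).bit_length() - 1
    if len(cells) != 1 << k:
        raise ValueError("number of cells must be a power of two")

    def lut(*inputs):
        if len(inputs) != k:
            raise ValueError(f"expected {k} inputs")
        # The input bits form the address of the selected memory cell
        # (first input is the most significant address bit).
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return cells[address]

    return lut


# A 3-input LUT (8 memory cells) programmed as a majority function.
majority = make_lut([0, 0, 0, 1, 0, 1, 1, 1])
print(majority(1, 0, 1))
```

Reprogramming the same structure with a different list of eight bits yields a different 3-input function, which mirrors how the configuration bit-stream changes the behaviour of a logic block.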
Configuring FPGA
• Millions of SRAM cells holding LUTs and interconnect routing
• Volatile memory: loses configuration when board power is turned off.
• Keep the bit pattern describing the SRAM cells in non-volatile memory, e.g. Flash
• Configuration takes ~secs
[Figure: configuration chain. Labels: Configuration data in, Configuration data out; legend: I/O pin/pad, SRAM cell]
Generic FPGA Design Flow
• Design entry:
  – Create your design files using:
    • Schematic editor or
    • Hardware description language (Verilog, VHDL)
• Design “implementation” on FPGA:
  – Synthesis, partition, place, and route to create bit-stream file
• Design verification:
  – Use simulator to check function,
  – other software determines max clock frequency.
  – Load onto FPGA device (cable connects PC to development board)
    • Check operation at full speed in real environment.
FPGA vs. ASIC/Microprocessors
– ASIC gives high performance at the cost of inflexibility.
– Processor is very flexible but not tuned to the application.
– Reconfigurable hardware is a nice compromise.
[Figure: spectrum from Microprocessor (Software) through Reconfigurable Hardware (Firmware) to ASIC (Hardware)]
FPGA vs. ASIC
FPGA
• Reconfigurable
• Low throughput
• Short design cycle
• Suitable for low volume production
  – Low cost at small number
• High power
• High silicon area
  – Utilization problem
• No testing cost
• Already fabricated
ASIC
• No reconfiguration
• High throughput
• Long design cycle
• Suitable for high volume production (> 1 million)
  – Low cost at large number
• Low power
• Low silicon area
  – Fully utilized
• High testing cost
• Needs to be fabricated
FPGA vs. Processors
FPGA
• Long design cycle
• Expensive
• High throughput
  – (more than 20~100x)
Processor
• Short design cycle
• Cheap
• Low throughput
  – Significantly slower
FPGA Based Applications
• Cryptography
• Network processors
• Evolvable and biologically-inspired hardware
• Rapid ASIC prototyping
• Real-time systems
• Embedded applications
• Custom-computing hardware
• Reconfigurable computing
• Special-purpose computation engines
  – Hardware dedicated to solving one problem (or class of problems)
  – Accelerators attached to general-purpose computers
FPGA Design Metrics
• Time complexity
  – Throughput is the amount of data processed per unit time (bits/sec)
  – The higher the throughput of a design, the better its efficiency
• Area complexity
  – # of LUTs, FFs, RAMs, etc.
• Design metric combining time and area together
  – Throughput/Area
  – The ratio is higher in case of high throughput and less space
• Another important design metric: Power
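As a rough illustration of the throughput and throughput/area metrics, the following Python sketch computes both for a bit-serial multiplier. The function names and every number used (1024-bit operands, 1024 cycles per multiplication, a 100 MHz clock, 5000 LUTs) are hypothetical values chosen only for the arithmetic; they are not measurements from the slides.

```python
def throughput_bits_per_sec(bits_per_operation, cycles_per_operation, clock_hz):
    """Processed bits per second for a design that finishes one
    operation every cycles_per_operation clock cycles."""
    return bits_per_operation * clock_hz / cycles_per_operation


def throughput_per_area(throughput, area_luts):
    """Combined time/area metric: higher is better."""
    return throughput / area_luts


# Hypothetical design: 1024-bit operands, 1024 cycles, 100 MHz, 5000 LUTs.
tp = throughput_bits_per_sec(1024, 1024, 100_000_000)
ratio = throughput_per_area(tp, 5000)
print(tp, ratio)
```

Doubling the clock frequency or halving the cycle count doubles the ratio, while doubling the LUT count halves it, which is the trade-off the combined metric is meant to expose.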
Area-Speed optimization
Loop unrolling & pipelining
In general there is a trade-off between
• Speed
• Area
• Speed boosters
  • Parallel execution
  • Loop unrolling and pipelining
• In all cases area increases with increasing speed
Why FPGA?
• Flexibility from general-purpose computing and speed from reconfigurable logic
• Due to the inherent fine-grained granularity, the parallelism tends to be very high
• Registers, latches and even distributed RAM blocks can be created and distributed wherever needed by the datapath
• The lack of a fixed architecture in an FPGA allows designers to tailor the design's datapath and control flow arbitrarily
• Highly regular and iterative applications with non-standard word lengths.
Why FPGA suits well in Cryptography
• Speed & real-time execution
  – Encryption/decryption data rate up to 1 Gb/sec for IPsec crypto devices
• RNG integrity
  – RO based RNG
• COMSEC criteria
  – Red-Black separation.
  – Harder to attack and break the cryptosystem running on FPGA as compared to GPPs
• The effectiveness of the FPGA’s cell structure for implementing bit-wise logical operations typical of many cryptographic algorithms
• The large amount of memory inside FPGA
  – Eases the implementation of memory-intensive substitution operations
  – Local storage working as a cache whenever needed
• Low power (as compared to GPP?)
Modular Multiplication Algorithms
• Why is modular multiplication important?
  – Most common operation of
    • RSA
    • Finite field arithmetic
    • DSA
    • Diffie-Hellman key exchange
    • ECC
• Modular multiplication algorithms in GF(p)
  – Multiply and divide
    • Naïve method
  – Interleaved modular multiplication
    • Multiplication and reduction are interleaved
  – Montgomery modular multiplication
    • Transformation and operations in residue domain
  – Other algorithms
    • Brickell’s method
    • …etc
Adders: Basic Building Block of Multiplication
• A full adder (FA) is a combinational circuit with 3 inputs and two outputs
• Computes sum (S_i) and carry (C_{i+1}) for the next stage
• An FA is a one-bit adder. What happens if FAs are cascaded to make an n-bit adder?
  § Carry has to be propagated
  § Problem: propagation delay
  § Can we get rid of carry propagation or decrease it?
  § A number of methods have been proposed to implement addition efficiently
    • Ripple Carry (obvious one)
    • Carry Look Ahead
    • Carry Save
    • Delayed Carry
    • Brent-Kung
    • etc.
Ripple Carry and Carry Save Adders
Ripple Carry Adder
• Each FA receives C_in from the previous FA
• Advantages
  – Sign detection is easy
• Disadvantage
  – Delay is high
  – Let the delay of an FA be T(FA)
  – Delay of an n-bit adder is n*T(FA)
Carry-Save Adder
• Parallel ensemble of FAs
• Advantages
  – Delay is constant and equal to that of one FA
• Disadvantages
  – Adds three numbers and produces two
  – Sign detection is hard
  – Needs a conventional adder to get the final result
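The contrast between the two adders can be sketched in Python using integers as bit vectors. This is a behavioural model only (it says nothing about real gate delays), and the function names are assumptions made for this example.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: three input bits, sum and carry-out."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out


def ripple_carry_add(x, y, n):
    """Add two n-bit integers; the carry ripples through n full adders,
    which is why the delay grows as n * T(FA)."""
    carry = 0
    result = 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << n)


def carry_save_add(x, y, z):
    """Reduce three operands to a (sum, carry) pair. Every bit position
    is handled independently, so there is no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


# Three operands in, two out; a conventional adder resolves the pair.
s, c = carry_save_add(13, 7, 9)
total = ripple_carry_add(s, c, 8)
print(s, c, total)
```

The pair `(s, c)` is the redundant representation mentioned later in the slides: its value is `s + c`, but obtaining a single number (or its sign) still needs one carry-propagating addition.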
Other Adders and Comparison
Carry Look Ahead
• Improves speed by reducing carry propagation
Carry Delayed Adder
• Two-level carry save adder
Modular Addition
• Given A, B < P compute A + B (mod P)
  1. Find S' = A + B
  2. If (S' ≥ P)
  3. S = S' - P
  4. else S = S'
• Omura’s Method: an efficient method for computing the modular addition
  – Useful for multi-operand modular addition
  – Eliminates the need for subtraction
  – For n-bit operands, this method always keeps the intermediate results within n bits. They never grow beyond that
  – Whenever the result exceeds n bits, the carry-out is ignored and a correction is performed.
Omura’s Method
1. Compute the correction factor m = 2^n - P
2. Compute S' = A + B.
3. If there is a carry-out (nth bit), then S = S' + m, else S = S'.
Ex: Assume P = 39, m = 2^6 - 39 = 25 = (011001)_2
We obtain the result as S = 31, which is 70 (mod 39)
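A minimal Python sketch of Omura's correction step follows. The transcript only preserves the sum 70 and the result 31 of the worked example, so the operands 40 and 30 used below are an assumption chosen to give that sum; the function name `omura_add` is also illustrative. The loop (rather than a single `if`) is a small generalisation for operand pairs where the corrected value overflows n bits again.

```python
def omura_add(a, b, p, n):
    """Add a and b modulo p while keeping the result within n bits.

    The result is congruent to a + b (mod p) but is not necessarily
    fully reduced below p.
    """
    mask = (1 << n) - 1
    m = (1 << n) - p          # correction factor m = 2^n - P
    s = a + b
    while s >> n:             # carry-out of the n-bit adder
        s = (s & mask) + m    # ignore the carry, add the correction
    return s


# P = 39, n = 6, m = 25; 40 + 30 = 70 produces a carry-out.
print(omura_add(40, 30, 39, 6))
```

Dropping the carry subtracts 2^n and adding m puts back 2^n - P, so each correction is equivalent to subtracting P without using a subtractor.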
Interleaved Modular Multiplication
At most two subtractions are needed to reduce the partial product
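The algorithm figure for this slide did not survive in the transcript. The following Python sketch shows the commonly described form of interleaved modular multiplication (shift, conditional add, then up to two subtractions); it is offered as an assumption about what the slide depicted, with illustrative names.

```python
def interleaved_modmul(a, b, p, n):
    """Compute a*b mod p for a, b < p < 2^n, scanning b from the MSB."""
    r = 0
    for i in reversed(range(n)):
        r = 2 * r                 # shift the partial product left
        if (b >> i) & 1:
            r += a                # add the multiplicand when B_i = 1
        # r < 3p at this point, so at most two subtractions give r < p
        if r >= p:
            r -= p
        if r >= p:
            r -= p
    return r


print(interleaved_modmul(17, 23, 39, 6))
```

Since r < p before each iteration, 2r + a is below 3p, which is where the bound of two subtractions per iteration comes from.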
Interleaved with Omura’s Method
Observations with the standard interleaved method
• 3 additions (or subtractions) per iteration
• Two comparisons and two reductions per iteration
• Partial addition result goes beyond n bits
• Use Omura’s method to get rid of subtractions and comparisons
Advantages
• Comparisons and subtractions eliminated
• Partial product R never grows beyond n bits
Disadvantages
• Pre-computation increases execution time
• Still 3 additions per iteration
• Extra memory for storing M
• One final correction subtraction may be required
Interleaved with Pre-computation
• A clever method to reduce 3 additions/reductions to 1 addition:
  – Idea: the reduction of the ith iteration can be calculated and made ready for the next, (i+1)th, iteration (correction step)
  – The correction can be added to the next iteration's intermediate product
  – Instead of reducing with P, reduce with 2^n, which is selecting the n least significant bits
  – These possible correction values can be pre-computed before multiplication starts and stored in a look-up table
• At the ith iteration, assume the partial product is calculated as R = A•B_i + 2R and is ready for the next iteration.
• Partial product R may grow by only 2 more bits, from n to n+2, as R = (R_{n+1} R_n R_{n-1} … R_0)
• Assume that R grows by only 1 bit, R = (R_n R_{n-1} … R_0).
  – Now R is n+1 bits long
Interleaved with Pre-computation
• Instead of reducing R modulo P, reduce it modulo 2^n.
  – R' = R (mod 2^n) selects the n least significant bits of R. Then R' = R – 2^n is ready for the (i+1)th iteration
  – Add a correction factor at the next, (i+1)th, iteration to restore the same partial product as in the original interleaved algorithm
• At the (i+1)th iteration, assume B_{i+1} = 0
  – The original interleaved algorithm finds R'' = A•B_{i+1} + 2R (mod P) = 0 + 2R (mod P) = 2R (mod P)
• Verification
  – Shift left (doubles the partial product): R'' = 2R' = 2(R – 2^n) = 2R - 2^{n+1}
  – Reduce the partial product by adding 2^{n+1} (mod P) (correction factor)
  – R'' = 2R - 2^{n+1} + 2^{n+1} (mod P) = 2R (mod P), which is the desired result for the (i+1)th iteration.
• Only a few possible correction factors may occur.
  – 0, B, 2^{n+1} (mod P), B + 2^{n+1} (mod P), 2^{n+2} (mod P), B + 2^{n+2} (mod P)
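The core idea, replacing reduction modulo P by truncation to n bits and restoring the dropped value from a pre-computed table, can be illustrated with the simplified Python sketch below. It is not the exact scheme of the slides: here the correction is added in the same iteration as a separate addition, whereas the slides fold it into the single addition of the next iteration. The function name and the table size of 8 entries are assumptions of this sketch.

```python
def interleaved_modmul_lut(a, b, p, n):
    """a*b mod p for a, b < p < 2^n, with reduction replaced by
    truncation to n bits plus a table-driven correction."""
    mask = (1 << n) - 1
    # correction[k] = k * 2^n mod p for every overflow value k that can
    # occur (k stays below 5 here; 8 entries leave some headroom).
    correction = [(k << n) % p for k in range(8)]
    r = 0
    for i in reversed(range(n)):
        r = 2 * r + (a if (b >> i) & 1 else 0)
        overflow = r >> n                  # bits that grew beyond n bits
        r = (r & mask) + correction[overflow]
    # comparison and subtraction at the end
    while r >= p:
        r -= p
    return r


print(interleaved_modmul_lut(17, 23, 39, 6))
```

Inside the loop there is no comparison with P at all; the only data-dependent choice is a table lookup indexed by the overflow bits, which maps naturally onto LUT or block-RAM storage.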
Interleaved with Pre-computation
• Advantages
  – One addition in each iteration
  – Almost 2x increase in speed
• Disadvantages
  – Requires pre-computation (breaks the regularity)
  – Requires one extra iteration
  – Requires extra local storage (4 x operand bit length)
    • Ex: 2048-RSA modular multiplication (4 x 2048 = 8 kbit)
  – Comparison and subtraction at the end
Interleaved with Pre-computation Datapath
Interleaved with CSA Utilization
• The conventional adder is replaced with a CSA (redundant representation)
• Reduction modulo M in the interleaved algorithm is replaced with reduction modulo 2^n
• Afterwards, the value of k•2^n (mod M) is added in order to reconstruct the correct intermediate result at the next iteration
• At the end, S and C are added to find the correct result
Interleaved with CSA Utilization
Advantages
• Two additions per iteration (?)
• Additions in constant time (no carry propagation)
Disadvantages
• The result is in redundant form (C, S), which has to be calculated with a conventional adder (one more adder)
• Calculation of A is not straightforward and needs subtractions and comparisons
• Needs more storage to save S, C instead of one value.
• Datapath requires more logic
• Complex FSM and address generation
Interleaved with CSA Utilization and Pre-computation
• The problems mentioned make the algorithm infeasible
• The same pre-computation idea is applied
  – The intermediate result I has only two possible values (0, Y)
  – In the correction phase, A also has only a few possible values
  – These two can be combined as 2A + I, pre-computed and stored
Interleaved with CSA Utilization and Pre-computation
Advantages
– Only one addition per iteration, in constant time
– No comparison and reduction
Disadvantages
– Requires pre-computation (breaks the regularity)
– Requires one extra iteration
– Requires extra storage (6 x operand bit length)
  • Ex: 2048-RSA modular multiplication
  • 6 x 2048 = 12 kbit local storage
– At the end of iterations
  • Requires a conventional adder to calculate (C + S)
  • May require one extra reduction (subtraction)
– Requires 3-operand memory bandwidth per cycle
Montgomery Modular Multiplication
• In 1985, P. L. Montgomery introduced an efficient algorithm for computing A·B (mod P)
• It performs modulo reduction without division
• The algorithm replaces the division-by-P operation with division by a power of 2
  – Well suited to computer systems because division by a power of 2 is simply the shift operation
• Define a P-residue to be a residue class modulo P.
  – Given A, B as n-bit operands: A' = A·R (mod P), B' = B·R (mod P)
• Select R co-prime to P. A natural choice is R = 2^n, where n is the operand size.
• Montgomery multiplication computes
  – MonPro(A, B) = A·B·R^{-1} (mod P)
• Given A' = A·R (mod P), B
  – MonPro(A', B) = A'·B·R^{-1} (mod P) = A·R·B·R^{-1} (mod P) = A·B (mod P)
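The identities above can be checked numerically with a short Python sketch. Here `mon_pro` is written directly from its definition using a modular inverse, so it is a reference model of what MonPro returns, not the division-free algorithm itself; the values P = 39, n = 6, A = 17, B = 23 are arbitrary illustrative choices. It relies on `pow(r, -1, p)`, available in Python 3.8 and later.

```python
def mon_pro(a, b, p, r):
    """Reference definition: A * B * R^-1 mod P (needs gcd(R, P) = 1)."""
    return (a * b * pow(r, -1, p)) % p


n, p = 6, 39
r = 1 << n                      # R = 2^n, co-prime to the odd modulus P
a, b = 17, 23

a_res = (a * r) % p             # A' = A*R mod P
b_res = (b * r) % p             # B' = B*R mod P

prod_res = mon_pro(a_res, b_res, p, r)   # equals A*B*R mod P (still a P-residue)
product = mon_pro(prod_res, 1, p, r)     # multiply by 1 to leave the residue domain
mixed = mon_pro(a_res, b, p, r)          # MonPro(A', B) = A*B mod P directly

print(product, mixed)
```

The product of two residues stays in the residue domain, which is why the conversion cost is amortised when many multiplications share the same modulus, as in modular exponentiation.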
Binary Montgomery Modular Multiplication
• A, B, P are n-bit numbers (A, B, P < 2^n)
• Let A = (A_{n-1} A_{n-2} ··· A_0) be the binary representation of A.
• Choose R = 2^n
• MonPro(A, B) = A·B·2^{-n} (mod P)
• Start from the least significant bit, and obtain the following binary add-shift algorithm to compute T = A·B·2^{-n}
Binary Montgomery Modular Multiplication
• We are interested in T = A•B•2^{-n} (mod P), not T = A•B•2^{-n}
• Reduce the partial product T in each iteration
  – If T is even then
    • T/2 (mod P) = T/2
    • Reduce by just right shifting by one bit
  – If T is odd then T + P must be even (P is odd)
    • We know T < P
    • T (mod P) = T + P (mod P)
    • (T + P) < 2P => (T + P)/2 < P
    • Result is already reduced modulo P
    • Reduce by adding P and then right shifting by one bit
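A bit-serial version of this add-shift procedure can be sketched in Python as follows. It follows the widely used formulation in which the partial product is allowed to stay below 2P during the loop and one subtraction is applied at the end; the function name and the test values are illustrative assumptions.

```python
def binary_montgomery(a, b, p, n):
    """Return a*b*2^(-n) mod p for odd p and a, b < p < 2^n."""
    t = 0
    for i in range(n):            # scan A from the least significant bit
        if (a >> i) & 1:
            t += b                # add B when A_i = 1
        if t & 1:                 # odd: adding the odd modulus makes it even
            t += p
        t >>= 1                   # exact division by 2
    if t >= p:                    # one final subtraction may be needed
        t -= p
    return t


print(binary_montgomery(1, 1, 39, 6))
```

The only test inside the loop is on the least significant bit of `t`, so no comparison against P and no division is required until the final conditional subtraction.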
Binary Montgomery Modular Multiplication
Advantages
• On average more than one addition for each iteration
• Only a one-bit comparison is performed to decide the P addition
Disadvantages
• One extra subtraction is needed at the end
• Requires conversion to the residue domain
  • Not a big problem if multiple multiplications are required for the same modulus
Montgomery Multiplication with Pre-computation
• Before computing the partial product, it is known that one of 0, P, B, B+P needs to be added.
• The following truth table shows what is to be added
R0 Ai B0 Precomp
0 0 0 0
0 0 1 0
0 1 0 B
0 1 1 B+P
1 0 0 P
1 0 1 P
1 1 0 B+P
1 1 1 B
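The truth table translates directly into a lookup keyed by (R0, Ai, B0). The Python sketch below is one way to model it, assuming an odd modulus and operands below P; the function name is illustrative, and the final conditional subtraction follows the usual Montgomery convention rather than anything stated on this slide.

```python
def montgomery_precomp(a, b, p, n):
    """a*b*2^(-n) mod p with one table-selected addition per iteration."""
    b_plus_p = b + p                       # computed once before the loop
    # Addend indexed by (R0, Ai, B0), exactly as in the truth table.
    addend = {
        (0, 0, 0): 0,        (0, 0, 1): 0,
        (0, 1, 0): b,        (0, 1, 1): b_plus_p,
        (1, 0, 0): p,        (1, 0, 1): p,
        (1, 1, 0): b_plus_p, (1, 1, 1): b,
    }
    r = 0
    for i in range(n):
        key = (r & 1, (a >> i) & 1, b & 1)
        r = (r + addend[key]) >> 1         # single addition, then shift
    if r >= p:
        r -= p
    return r


print(montgomery_precomp(17, 23, 39, 6))
```

Each table entry is chosen so that the sum `r + addend` is even, which is why the right shift never discards a set bit.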
Montgomery with Pre-computation
Advantages
• Less than one addition per iteration
  – Latency decreased
• Simpler datapath
Disadvantages
• Storage is required to save B+P
• B+P has to be calculated before iterations start.
• Slightly more complex loop control compared to simple Montgomery multiplication
  – Negligible
Montgomery with CSA utilization
Montgomery with CSA utilization
Advantages
• Additions are done by a CSA, which has 1 FA delay
  – Improves operation frequency
• Almost one addition per iteration
Disadvantages
• Memory bandwidth is 3 operands per cycle (C, S, I)
• Requires 1 extra iteration to restore the result
• Storage increases
  – X, Y, P, Y+P, C, S need to be stored
• Complex datapath (2x larger because of redundant representation {C, S})
  – Conventional adder needed to get C + S
    • Directly affects operation frequency (think of RCA n*FA delay)
• The result of the conventional addition needs to be reduced (final reduction)
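The algorithm figure for this variant is not in the transcript, so the following Python sketch is an assumption about its general shape: the partial product is held as a carry-save pair (S, C), the addend is chosen from {0, B, P, B+P} using only least significant bits, and a conventional addition plus one subtraction finishes the job. It uses A, B where the slide uses X, Y, and it does not model the extra iteration mentioned above.

```python
def carry_save_add(x, y, z):
    """3-to-2 compression with no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def montgomery_csa(a, b, p, n):
    """a*b*2^(-n) mod p for odd p, a, b < p < 2^n, using a carry-save
    partial product (S, C)."""
    b_plus_p = b + p
    s, c = 0, 0
    for i in range(n):
        a_i = (a >> i) & 1
        # Parity of S + C + A_i*B decides whether P must be added too.
        odd = (s ^ c ^ (a_i & b)) & 1
        if a_i:
            addend = b_plus_p if odd else b
        else:
            addend = p if odd else 0
        s, c = carry_save_add(s, c, addend)
        # S + C is even and C has a zero LSB, so both shifts are exact.
        s >>= 1
        c >>= 1
    t = s + c                      # conventional adder resolves (C, S)
    if t >= p:                     # final reduction
        t -= p
    return t


print(montgomery_csa(17, 23, 39, 6))
```

All per-iteration work is bitwise, which corresponds to the constant one-FA delay; the carry-propagating addition appears only once, after the loop.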
Comparisons of MM Algorithms

| Algorithm | # of additions / iteration | # of adders | Storage needed |
|---|---|---|---|
| Interleaved | Greater than 2 | 1 | 3 x operand length |
| Interleaved with Pre-computation | Slightly greater than 1 (one extra iteration) | 1 | 7 x operand length |
| Interleaved with CSA | Slightly greater than 1 (one extra iteration) | 2 (1 CSA, 1 RCA); complex datapath (redundant rep) | 9 x operand length |
| Montgomery | Greater than 1, less than 1.5 | 1 | 3 x operand length |
| Montgomery with Pre-computation | Less than 1 | 1 | 4 x operand length |
| Montgomery with CSA | Slightly greater than 1 (one extra iteration) | 2 (1 CSA, 1 RCA); complex datapath (redundant rep) | 4 x operand length |