Robust Arithmetic

Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates Jonathan Richard Shewchuk May 17, 1996 CMU-CS-96-140 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Abstract Exact computer arithmetic has a variety of uses including, but not limited to, the robust implementation of geometric algorithms. This report has three purposes. The first is to offer fast software-level algorithms for exact addition and multiplication of arbitrary precision floating-point values. The second is to propose a technique for adaptive-precision arithmetic that can often speed these algorithms when one wishes to perform multiprecision calculations that do not always require exact arithmetic, but must satisfy some error bound. The third is to provide a practical demonstration of these techniques, in the form of implementations of several common geometric calculations whose required degree of accuracy depends on their inputs. These robust geometric predicates are adaptive; their running time depends on the degree of uncertainty of the result, and is usually small. These algorithms work on computers whose floating-point arithmetic uses radix two and exact rounding, including machines complying with the IEEE 754 standard. The inputs to the predicates may be arbitrary single or double precision floating-point numbers. C code is publicly available for the 2D and 3D orientation and incircle tests, and robust Delaunay triangulation using these tests. Timings of the implementations demonstrate their effectiveness. Supported in part by the Natural Sciences and Engineering Research Council of Canada under a 1967 Science and Engineering Scholarship and by the National Science Foundation under Grant CMS-9318163. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either express or implied, of NSERC, NSF, or the U.S. Government.

Transcript of Robust Arithmetic

Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates

Jonathan Richard Shewchuk

May 17, 1996
CMU-CS-96-140

School of Computer Science, Carnegie Mellon University

Pittsburgh, PA 15213

Abstract

Exact computer arithmetic has a variety of uses including, but not limited to, the robust implementation of geometric algorithms. This report has three purposes. The first is to offer fast software-level algorithms for exact addition and multiplication of arbitrary precision floating-point values. The second is to propose a technique for adaptive-precision arithmetic that can often speed these algorithms when one wishes to perform multiprecision calculations that do not always require exact arithmetic, but must satisfy some error bound. The third is to provide a practical demonstration of these techniques, in the form of implementations of several common geometric calculations whose required degree of accuracy depends on their inputs. These robust geometric predicates are adaptive; their running time depends on the degree of uncertainty of the result, and is usually small.

These algorithms work on computers whose floating-point arithmetic uses radix two and exact rounding, including machines complying with the IEEE 754 standard. The inputs to the predicates may be arbitrary single or double precision floating-point numbers. C code is publicly available for the 2D and 3D orientation and incircle tests, and robust Delaunay triangulation using these tests. Timings of the implementations demonstrate their effectiveness.

Supported in part by the Natural Sciences and Engineering Research Council of Canada under a 1967 Science and Engineering Scholarship and by the National Science Foundation under Grant CMS-9318163. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either express or implied, of NSERC, NSF, or the U.S. Government.

Keywords: arbitrary precision floating-point arithmetic, computational geometry, geometric robustness, orientation test, incircle test, Delaunay triangulation

Contents

1 Introduction

2 Arbitrary Precision Floating-Point Arithmetic
  2.1 Background
  2.2 Properties of Binary Arithmetic
  2.3 Simple Addition
  2.4 Expansion Addition
  2.5 Simple Multiplication
  2.6 Expansion Scaling
  2.7 Compression and Approximation
  2.8 Other Operations

3 Adaptive Precision Arithmetic
  3.1 Why Adaptivity?
  3.2 Making Arithmetic Adaptive

4 Implementation of Geometric Predicates
  4.1 Related Work in Robust Computational Geometry
  4.2 The Orientation and Incircle Tests
  4.3 ORIENT2D
  4.4 ORIENT3D, INCIRCLE, and INSPHERE
  4.5 Performance in Two Triangulation Programs

5 Caveats

6 Conclusions

A Why the Tiebreaking Rule is Important

B Linear-Time Expansion Addition without Round-to-Even Tiebreaking


About this Report

An electronic copy of this report, and the software described herein, can be obtained through the Web page http://www.cs.cmu.edu/~quake/robust.html.

Copyright 1996 by Jonathan Richard Shewchuk. This report may be freely duplicated and distributed so long as this copyright notice remains intact. Please mail me ([email protected]) comments and corrections.

Many thanks to Steven Fortune, Douglas Priest, and Christopher Van Wyk, who each provided comments on a draft of this paper, and whose papers provided the foundations for this research. Steven Fortune also provided LN-generated predicates for timing comparisons, and unwittingly sparked this research two years ago with a few brief email responses. Thanks also to David O'Hallaron, James Stichnoth, and Daniel Tunkelang for their comments.


1 Introduction

Software libraries for arbitrary precision floating-point arithmetic can be used to accurately perform many error-prone or ill-conditioned computations that would be infeasible using only hardware-supported approximate arithmetic. Some of these computations have accuracy requirements that vary with their input. For instance, consider the problem of finding the center of a circle, given three points that lie on the circle. Normally, hardware precision arithmetic will suffice, but if the input points are nearly collinear, the problem is ill-conditioned and the approximate calculation may yield a wildly inaccurate result or a division by zero. Alternatively, an exact arithmetic library can be used and will yield a correct result, but exact arithmetic is slow; one would rather use it only when one really needs to.

This report presents two techniques for writing fast implementations of extended precision calculations like these, and demonstrates them with implementations of four commonly used geometric predicates. The first technique is a suite of algorithms, several of them new, for performing arbitrary precision arithmetic. The method has its greatest advantage in computations that process values of extended but small precision (several hundred or thousand bits), and seems ideal for computational geometry and some numerical methods, where much benefit can be realized from a modest increase in precision. The second technique is a way to modify these algorithms so that they compute their result adaptively; they are quick in most circumstances, but are still slow when their results are prone to have high relative error. A third subject of this report is a demonstration of these techniques with implementations and performance measurements of four commonly used geometric predicates. An elaboration of each of these three topics follows.

Methods of simulating exact arithmetic in software can be classified by several characteristics. Some exact arithmetic libraries operate on integers or fixed-point numbers, while others operate on floating-point numbers. To represent a number, the former libraries store a significand of arbitrary length; the latter store an exponent as well. Some libraries use the hardware's integer arithmetic units, whereas others use the floating-point units. Oddly, the decision to use integers or floating-point numbers internally is orthogonal to the type of number being represented. It was once the norm to use integer arithmetic to build extended precision floating-point libraries, especially when floating-point hardware was uncommon and differed between computer models. Times have changed, and modern architectures are highly optimized for floating-point performance; on many processors, floating-point arithmetic is faster than integer arithmetic. The trend is reversing for software libraries as well, and there are several proposals to use floating-point arithmetic to perform extended-precision integer calculations. Fortune and Van Wyk [10, 9], Clarkson [4], and Avnaim, Boissonnat, Devillers, Preparata, and Yvinec [1] have described algorithms of this kind, designed to attack the same computational geometry robustness problems considered later in this report. These algorithms are surveyed in Section 4.1.

Another differentiating feature of multiprecision libraries is whether they use multiple exponents. Most arbitrary precision libraries store numbers in a multiple-digit format, consisting of a sequence of digits (usually of large radix, like $2^{32}$) coupled with a single exponent. A freely available example of the multiple-digit approach is Bailey's MPFUN package [2], a sophisticated portable multiprecision library that uses digits of machine-dependent radix (usually $2^{24}$) stored as single precision floating-point values. An alternative is the multiple-term format, wherein a number is expressed as a sum of ordinary floating-point words, each with its own significand and exponent [21, 5, 17]. This approach has the advantage that the result of an addition like $2^{300} + 2^{-300}$ (which may well arise in calculations like the geometric predicates discussed in Section 4.2) can be stored in two words of memory, whereas the multiple-digit approach will use at least 601 bits to store the sum, and incur a corresponding speed penalty when performing arithmetic with it. On the other hand, the multiple-digit approach can more compactly represent most numbers, because only one exponent is stored. (MPFUN sacrifices this compactness to take advantage of floating-point hardware; the exponent of each digit is unused.) More pertinent is the difference in speed, discussed briefly in Section 2.1.

The algorithms described herein use floating-point hardware to perform extended precision floating-point arithmetic, using the multiple-term approach. These algorithms, described in Section 2, work under the assumption that hardware arithmetic is performed in radix two with exact rounding. This assumption holds on processors compliant with the IEEE 754 floating-point standard. Proofs of the correctness of all algorithms are given.

The methods herein are closely related to, and occasionally taken directly from, methods developed by Priest [21, 22], but are faster. The improvement in speed arises partly because Priest's algorithms run on a wide variety of floating-point architectures, with different radices and rounding behavior, whereas mine are limited to and optimized for radix two with exact rounding. This specialization is justified by the wide acceptance of the IEEE 754 standard. My algorithms also benefit from a relaxation of Priest's normalization requirement, which is less strict than the normalization required by multiple-digit algorithms, but is nonetheless time-consuming to enforce.

I demonstrate these methods with publicly available code that performs the two-dimensional and three-dimensional orientation and incircle tests, calculations that commonly arise in computational geometry. The orientation test determines whether a point lies to the left of, to the right of, or on a line or plane; it is an important predicate used in many (perhaps most) geometric algorithms. The incircle test determines whether a point lies inside, outside, or on a circle or sphere, and is used for Delaunay triangulation [12]. Inexact versions of these tests are vulnerable to roundoff error, and the wrong answers they produce can cause geometric algorithms to hang, crash, or produce incorrect output. Although exact arithmetic banishes these difficulties, it is common to hear reports of implementations being slowed by factors of ten or more as a consequence [14, 9]. For these reasons, computational geometry is an important arena for evaluating extended precision arithmetic schemes.

The orientation and incircle tests evaluate the sign of a matrix determinant. It is significant that only the sign, and not the magnitude, of the determinant is needed. Fortune and Van Wyk [9] take advantage of this fact by using a floating-point filter: the determinant is first evaluated approximately, and only if forward error analysis indicates that the sign of the approximate result cannot be trusted does one use an exact test. I carry their suggestion to its logical extreme by computing a sequence of successively more accurate approximations to the determinant, stopping only when the accuracy of the sign is assured. To reduce computation time, approximations reuse a previous, less accurate computation when it is economical to do so. Procedures thus designed are adaptive; they refine their results until they are certain of the correctness of their answer. The technique is not limited to computational geometry, nor is it limited to finding signs of expressions; it can be employed in any calculation where the required degree of accuracy varies. This adaptive approach is described in Section 3, and its application to the orientation and incircle tests is described in Section 4.

Readers who wish to use these predicates in their own applications are encouraged to download them and try them out. However, be certain to read Section 5, which covers two important issues that must be considered to ensure the correctness of the implementation: your processor's floating-point behavior and your compiler's optimization behavior. Furthermore, be aware that exact arithmetic is not a panacea for all robustness woes; its uses and limitations are discussed in Section 4.1. Exact arithmetic can make robust many algorithms that take geometric input and return purely combinatorial output; for instance, a fully robust convex hull implementation can be produced with recourse only to an exact orientation test. However, in algorithms that construct new geometric objects, exact arithmetic is sometimes constrained by its cost and its inability to represent arbitrary irrational numbers.


2 Arbitrary Precision Floating-Point Arithmetic

2.1 Background

Most modern processors support floating-point numbers of the form $\pm\,\mathrm{significand} \times 2^{\mathrm{exponent}}$. The significand is a $p$-bit binary number of the form $b.bbb\ldots$, where each $b$ denotes a single bit; one additional bit represents the sign. This report does not address issues of overflow and underflow, so I allow the exponent to be an integer in the range $[-\infty, \infty]$. (Fortunately, many applications have inputs whose exponents fall within a circumscribed range. The four predicates implemented for this report will not overflow nor underflow if their inputs have exponents in the range $[-142, 201]$ and IEEE 754 double precision arithmetic is used.) Floating-point values are generally normalized, which means that if a value is not zero, then its most significant bit is set to one, and the exponent adjusted accordingly. For example, in four-bit arithmetic, binary 1101 (decimal 13) is represented as $1.101 \times 2^3$. See the survey by Goldberg [11] for a detailed explanation of floating-point storage formats, particularly the IEEE 754 standard.

Exact arithmetic often produces values that require more than $p$ bits to store. For the algorithms herein, each arbitrary precision value is expressed as an expansion¹ $x = x_n + \cdots + x_2 + x_1$, where each $x_i$ is called a component of $x$ and is represented by a floating-point value with a $p$-bit significand. To impose some structure on expansions, they are required to be nonoverlapping and ordered by magnitude ($x_n$ largest, $x_1$ smallest). Two floating-point values $x$ and $y$ are nonoverlapping if the least significant nonzero bit of $x$ is more significant than the most significant nonzero bit of $y$, or vice versa; for instance, the binary values 1100 and $-10.1$ are nonoverlapping, whereas 101 and 10 overlap.² The number zero does not overlap any number. An expansion is nonoverlapping if all its components are mutually nonoverlapping. Note that a number may be represented by many possible nonoverlapping expansions; consider $1100 + (-10.1) = 1001 + 0.1 = 1000 + 1 + 0.1$. A nonoverlapping expansion is desirable because it is easy to determine its sign (take the sign of the largest component) or to produce a crude approximation of its value (take the component with largest magnitude).

Two floating-point values $x$ and $y$ are adjacent if they overlap, if $x$ overlaps $2y$, or if $2x$ overlaps $y$. For instance, 1100 is adjacent to 11, but 1000 is not. An expansion is nonadjacent if no two of its components are adjacent. Surprisingly, any floating-point value has a corresponding nonadjacent expansion; for instance, 11111 may appear at first not to be representable as a nonoverlapping expansion of one-bit components, but consider the expansion $100000 + (-1)$. The trick is to use the sign bit of each component to separate it from its larger neighbor. We will later see algorithms in which nonadjacent expansions arise naturally.
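To make the representation concrete, the following C sketch (illustrative only; the struct and function names are mine and are not part of this report's released code) shows one way to hold an expansion and to take the crude approximation just described, assuming the components are stored from smallest to largest magnitude.

    /* Illustrative sketch: an expansion is simply an array of p-bit
       floating-point components whose exact sum is the value represented. */
    typedef struct {
      int     n;       /* number of components                       */
      double *comp;    /* comp[0] smallest, ..., comp[n-1] largest   */
    } expansion;

    /* Crude approximation of a nonoverlapping expansion: its component of
       largest magnitude.  If that component is nonzero, its sign is also
       the sign of the whole value. */
    static double expansion_approx(const expansion *e)
    {
      return (e->n > 0) ? e->comp[e->n - 1] : 0.0;
    }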

Multiple-term algorithms (based on the expansions defined above) can be faster than multiple-digit algorithms because the latter require expensive normalization of results to fixed digit positions, whereas multiple-term algorithms can allow the boundaries between terms to wander freely. Boundaries are still enforced, but can fall at any bit position. In addition, it usually takes time to convert an ordinary floating-point number to the internal format of a multiple-digit library, whereas any ordinary floating-point number is an expansion of length one. Conversion overhead can account for a significant part of the cost of small extended precision computations.

The central conceptual difference between standard multiple-digit algorithms and the multiple-term algorithms described herein is that the former perform exact arithmetic by keeping the bit complexity of operands small enough to avoid roundoff error, whereas the latter allow roundoff to occur, then account for it after the fact.

¹ Note that this definition of expansion is slightly different from that used by Priest [21]; whereas Priest requires that the exponents of any two components of the expansion differ by at least $p$, no such requirement is made here.

² Formally, $x$ and $y$ are nonoverlapping if there exist integers $r$ and $s$ such that $x = r2^s$ and $|y| < 2^s$, or $y = r2^s$ and $|x| < 2^s$.


To measure roundoff quickly and correctly, a certain standard of accuracy is required from the processor's floating-point units. The algorithms presented herein rely on the assumption that addition, subtraction, and multiplication are performed with exact rounding. This means that if the exact result can be stored in a $p$-bit significand, then the exact result is produced; if it cannot, then it is rounded to the nearest $p$-bit floating-point value. For instance, in four-bit arithmetic the product $111 \times 101 = 100011$ is rounded to $1.001 \times 2^5$. If a value falls precisely halfway between two consecutive $p$-bit values, a tiebreaking rule determines the result. Two possibilities are the round-to-even rule, which specifies that the value should be rounded to the nearest $p$-bit value with an even significand, and the round-toward-zero rule. In four-bit arithmetic, 10011 is rounded to $1.010 \times 2^4$ under the round-to-even rule, and to $1.001 \times 2^4$ under the round-toward-zero rule. The IEEE 754 standard specifies round-to-even tiebreaking as a default. Throughout this report, the symbols $\oplus$, $\ominus$, and $\otimes$ represent $p$-bit floating-point addition, subtraction, and multiplication with exact rounding. Due to roundoff, these operators lack several desirable arithmetic properties. Associativity is an example; in four-bit arithmetic, $(1000 \oplus 0.011) \oplus 0.011 = 1000$, but $1000 \oplus (0.011 \oplus 0.011) = 1001$. A list of reliable identities for floating-point arithmetic is given by Knuth [15].

Roundoff is often analyzed in terms of ulps, or "units in the last place". An ulp is the effective magnitude of the low-order ($p$th) bit of a $p$-bit significand. An ulp is defined relative to a specific floating-point value; I shall use $\mathrm{ulp}(a)$ to denote this quantity. For instance, in four-bit arithmetic, $\mathrm{ulp}(-1100) = 1$, and $\mathrm{ulp}(1) = 0.001$.

Another useful notation is $\mathrm{err}(a \circledast b)$, which denotes the roundoff error incurred by using a $p$-bit floating-point operation $\circledast$ to approximate a real operation $*$ (addition, subtraction, multiplication, or division) on the operands $a$ and $b$. Note that whereas ulp is an unsigned quantity, err is signed. For any basic operation, $a \circledast b = a * b + \mathrm{err}(a \circledast b)$, and exact rounding guarantees that $|\mathrm{err}(a \circledast b)| \le \frac{1}{2}\mathrm{ulp}(a \circledast b)$.

In the pages that follow, various properties of floating-point arithmetic are proven, and algorithms for manipulating expansions are developed based on these properties. Throughout, binary and decimal numbers are intermixed; the base should be apparent from context. A number is said to be expressible in $p$ bits if it can be expressed with a $p$-bit significand, not counting the sign bit or the exponent. I will occasionally refer to the magnitude of a bit, defined relative to a specific number; for instance, the magnitude of the second nonzero bit of binary $-1110$ is four. The remainder of this section is quite technical; the reader may wish to skip the proofs on a first reading. The key new results are Theorems 13, 19, and 24, which provide algorithms for summing and scaling expansions.

2.2 Properties of Binary Arithmetic

Exact rounding guarantees that $|\mathrm{err}(a \circledast b)| \le \frac{1}{2}\mathrm{ulp}(a \circledast b)$, but one can sometimes find a smaller bound for the roundoff error, as evidenced by the two lemmata below. The first lemma is useful when one operand is much smaller than the other, and the second is useful when the sum is close to a power of two. For Lemmata 1 through 5, let $a$ and $b$ be $p$-bit floating-point numbers.

Lemma 1 Let $a \oplus b = a + b + \mathrm{err}(a \oplus b)$. The roundoff error $|\mathrm{err}(a \oplus b)|$ is no larger than $|a|$ or $|b|$. (An analogous result holds for subtraction.)

Proof: Assume without loss of generality that $|a| \ge |b|$. The sum $a \oplus b$ is the $p$-bit floating-point number closest to $a + b$. But $a$ is a $p$-bit floating-point number, so $|\mathrm{err}(a \oplus b)| \le |b| \le |a|$. (See Figure 1.) ∎

Corollary 2 The roundoff error $\mathrm{err}(a \oplus b)$ can be expressed with a $p$-bit significand.


[Figure 1 omitted: a number line of four-bit floating-point values from 101.1 to 1010, marking the positions of $a$, $a + b$, and $a \oplus b$.]

Figure 1: Demonstration of the first two lemmata. Vertical lines represent four-bit floating-point values. The roundoff error is the distance between $a + b$ and $a \oplus b$. Lemma 1 states that the error cannot be larger than $|b|$. Lemma 3(b) states that if $|a \oplus b| \ge 2^i(2^{p+1} + 1)$ (for $i = -2$ and $p = 4$, this means that $a \oplus b$ falls into the darkened region), then the error is no greater than $2^i$. This lemma is useful when a computed value falls close to a power of two.

Proof: Assume without loss of generality that $|a| \ge |b|$. Clearly, the least significant nonzero bit of $\mathrm{err}(a \oplus b)$ is no smaller in magnitude than $\mathrm{ulp}(b)$. By Lemma 1, $|\mathrm{err}(a \oplus b)| \le |b|$; hence, the significand of $\mathrm{err}(a \oplus b)$ is no longer than that of $b$. It follows that $\mathrm{err}(a \oplus b)$ is expressible in $p$ bits. ∎

Lemma 3 For any basic floating-point operation $\circledast$, let $a \circledast b = a * b + \mathrm{err}(a \circledast b)$. Then:

(a) If $|\mathrm{err}(a \circledast b)| \ge 2^i$ for some integer $i$, then $|a * b| \ge 2^i(2^p + 1)$.

(b) If $|\mathrm{err}(a \circledast b)| > 2^i$ for some integer $i$, then $|a * b| > 2^i(2^{p+1} + 1)$.

Proof:

(a) The numbers $2^i(2^p), 2^i(2^p - 1), 2^i(2^p - 2), \ldots, 0$ are all expressible in $p$ bits. Any value $|a * b| < 2^i(2^p + 1)$ is within a distance less than $2^i$ from one of these numbers.

(b) The numbers $2^i(2^{p+1}), 2^i(2^{p+1} - 2), 2^i(2^{p+1} - 4), \ldots, 0$ are all expressible in $p$ bits. Any value $|a * b| \le 2^i(2^{p+1} + 1)$ is within a distance of $2^i$ from one of these numbers. (See Figure 1.) ∎

The next two lemmata identify special cases for which computer arithmetic is exact. The first shows that addition and subtraction are exact if the result has smaller magnitude than the operands.

Lemma 4 Suppose that $|a + b| \le |a|$ and $|a + b| \le |b|$. Then $a \oplus b = a + b$. (An analogous result holds for subtraction.)

Proof: Without loss of generality, assume $|a| \ge |b|$. Clearly, the least significant nonzero bit of $a + b$ is no smaller in magnitude than $\mathrm{ulp}(b)$. However, $|a + b| \le |b|$. It follows that $a + b$ can be expressed in $p$ bits. ∎

Many of the algorithms will rely on the following lemma, which shows that subtraction is exact for two operands within a factor of two of each other:


[Figure 2 omitted: two worked four-bit subtractions in which the difference $a - b$ is computed exactly.]

Figure 2: Two demonstrations of Lemma 5.

Lemma 5 (Sterbenz [24]) Suppose that $b \in [\frac{a}{2}, 2a]$. Then $a \ominus b = a - b$.

Proof: Without loss of generality, assume $|a| \ge |b|$. (The other case is symmetric, because $a \ominus b = -b \ominus -a$.) Then $b \in [\frac{a}{2}, a]$. The difference satisfies $|a - b| \le |b| \le |a|$; the result follows by Lemma 4. ∎

Two examples demonstrating Lemma 5 appear in Figure 2. If $a$ and $b$ have the same exponent, then floating-point subtraction is analogous to finding the difference between two $p$-bit integers of the same sign, and the result is expressible in $p$ bits. Otherwise, the exponents of $a$ and $b$ differ by one, because $b \in [\frac{a}{2}, 2a]$. In this case, the difference has the smaller of the two exponents, and so can be expressed in $p$ bits.

2.3 Simple Addition

An important basic operation in all the algorithms for performing arithmetic with expansions is the addition of two $p$-bit values to form a nonoverlapping expansion (of length two). Two such algorithms, due to Dekker and Knuth respectively, are presented.

Theorem 6 (Dekker [5]) Let $a$ and $b$ be $p$-bit floating-point numbers such that $|a| \ge |b|$. Then the following algorithm will produce a nonoverlapping expansion $x + y$ such that $a + b = x + y$, where $x$ is an approximation to $a + b$ and $y$ represents the roundoff error in the calculation of $x$.

FAST-TWO-SUM$(a, b)$
1  $x \Leftarrow a \oplus b$
2  $b_{\mathrm{virtual}} \Leftarrow x \ominus a$
3  $y \Leftarrow b \ominus b_{\mathrm{virtual}}$
4  return $(x, y)$

Proof: Line 1 computes $a + b$, but may be subject to rounding, so we have $x = a + b + \mathrm{err}(a \oplus b)$. By assumption $|a| \ge |b|$, so $a$ and $x$ must have the same sign (or $x = 0$).

Line 2 computes the quantity $b_{\mathrm{virtual}}$, which is the value that was really added to $a$ in Line 1. This subtraction is computed exactly; this fact can be proven by considering two cases. If $a$ and $b$ have the same sign, or if $|b| \le \frac{|a|}{2}$, then $x \in [\frac{a}{2}, 2a]$ and one can apply Lemma 5 (see Figure 3). On the other hand, if $a$ and $b$ are opposite in sign and $|b| > \frac{|a|}{2}$, then $b \in [-\frac{a}{2}, -a]$ and one can apply Lemma 5 to Line 1, showing that $x$ was computed exactly and therefore $b_{\mathrm{virtual}} = b$ (see Figure 4). In either case the subtraction is exact, so $b_{\mathrm{virtual}} = x - a = b + \mathrm{err}(a \oplus b)$.

Line 3 is also computed exactly. By Corollary 2, $b - b_{\mathrm{virtual}} = -\mathrm{err}(a \oplus b)$ is expressible in $p$ bits.

It follows that $y = -\mathrm{err}(a \oplus b)$ and $x = a + b + \mathrm{err}(a \oplus b)$, hence $a + b = x + y$. Exact rounding guarantees that $|y| \le \frac{1}{2}\mathrm{ulp}(x)$, so $x$ and $y$ are nonoverlapping. ∎


[Figure 3 omitted: a worked four-bit example of FAST-TWO-SUM with $a = 1111 \times 2^2$ and $b = 1001$, giving $x = a \oplus b = 1001 \times 2^3$, $b_{\mathrm{virtual}} = x \ominus a = 1100$, and $y = b \ominus b_{\mathrm{virtual}} = -11$.]

Figure 3: Demonstration of FAST-TWO-SUM where $a$ and $b$ have the same sign. The sum of 111100 and 1001 is the expansion $1001000 + (-11)$.

[Figure 4 omitted: a worked four-bit example of FAST-TWO-SUM with $a = 1001 \times 2^1$ and $b = -1011$, giving $x = a \oplus b = 111$, $b_{\mathrm{virtual}} = x \ominus a = -1011$, and $y = b \ominus b_{\mathrm{virtual}} = 0$.]

Figure 4: Demonstration of FAST-TWO-SUM where $a$ and $b$ have opposite sign and $|b| \ge \frac{|a|}{2}$.

Note that the outputs $x$ and $y$ do not necessarily have the same sign, as Figure 3 demonstrates. Two-term subtraction ("FAST-TWO-DIFF") is implemented by the sequence $x \Leftarrow a \ominus b$; $b_{\mathrm{virtual}} \Leftarrow a \ominus x$; $y \Leftarrow b_{\mathrm{virtual}} \ominus b$. The proof of the correctness of this sequence is analogous to Theorem 6.
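As a concrete illustration (not taken from the report's released predicates code), FAST-TWO-SUM and FAST-TWO-DIFF translate directly into C. The function and type names below are mine; the sketch assumes radix-two IEEE 754 double precision with exact rounding and a compiler that does not reorder or fuse the floating-point operations (see the caveats in Section 5).

    /* x + y == a + b exactly, with x the rounded sum and y the roundoff error. */
    typedef struct { double x, y; } two_term;

    /* FAST-TWO-SUM (Theorem 6): requires |a| >= |b|. */
    static two_term fast_two_sum(double a, double b)
    {
      two_term r;
      double b_virtual;
      r.x = a + b;            /* Line 1 */
      b_virtual = r.x - a;    /* Line 2: exact (see the proof of Theorem 6) */
      r.y = b - b_virtual;    /* Line 3: the roundoff error */
      return r;
    }

    /* FAST-TWO-DIFF: requires |a| >= |b|; returns x + y == a - b exactly. */
    static two_term fast_two_diff(double a, double b)
    {
      two_term r;
      double b_virtual;
      r.x = a - b;
      b_virtual = a - r.x;
      r.y = b_virtual - b;
      return r;
    }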

The difficulty with using FAST-TWO-SUM is the requirement that $|a| \ge |b|$. If the relative sizes of $a$ and $b$ are unknown, a comparison is required to order the addends before invoking FAST-TWO-SUM. With most C compilers³, perhaps the fastest portable way to implement this test is with the statement "if ((a > b) == (a > -b))". This test takes time to execute, and the slowdown may be surprisingly large because on modern pipelined and superscalar architectures, an if statement coupled with imperfect microprocessor branch prediction may cause a processor's instruction pipeline to drain. This explanation is speculative and machine-dependent, but the TWO-SUM algorithm below, which avoids a comparison at the cost of three additional floating-point operations, is usually empirically faster⁴. Of course, FAST-TWO-SUM remains faster if the relative sizes of the operands are known a priori, and the comparison can be avoided.

Theorem 7 (Knuth [15]) Let $a$ and $b$ be $p$-bit floating-point numbers, where $p \ge 3$. Then the following algorithm will produce a nonoverlapping expansion $x + y$ such that $a + b = x + y$, where $x$ is an approximation to $a + b$ and $y$ is the roundoff error in the calculation of $x$.

³ The exceptions are those few that can identify and optimize the fabs() math library call.

⁴ On a DEC Alpha-based workstation, using the bundled C compiler with optimization level 3, TWO-SUM uses roughly 65% as much time as FAST-TWO-SUM conditioned with the test "if ((a > b) == (a > -b))". On a SPARCstation IPX, using the GNU compiler with optimization level 2, TWO-SUM uses roughly 85% as much time. On the other hand, using the SPARCstation's bundled compiler with optimization (which produces slower code than gcc), conditional FAST-TWO-SUM uses only 82% as much time as TWO-SUM. The lesson is that for optimal speed, one must time each method with one's own machine and compiler.


[Figure 5 omitted: a worked four-bit example of TWO-SUM with $a = 11.11$ and $b = 1101$, illustrating the case $|x| \ge |a|$ from the proof of Theorem 7: $x = a \oplus b = 1000 \times 2^1$, $b_{\mathrm{virtual}} = x \ominus a = 1100$, $a_{\mathrm{virtual}} = x \ominus b_{\mathrm{virtual}} = 100$, $b_{\mathrm{roundoff}} = b \ominus b_{\mathrm{virtual}} = 1$, $a_{\mathrm{roundoff}} = a \ominus a_{\mathrm{virtual}} = -0.01$, and $y = a_{\mathrm{roundoff}} \oplus b_{\mathrm{roundoff}} = 0.11$.]

Figure 5: Demonstration of TWO-SUM where $|a| < |b|$. The sum of 11.11 and 1101 is the expansion $10000 + 0.11$.

TWO-SUM$(a, b)$
1  $x \Leftarrow a \oplus b$
2  $b_{\mathrm{virtual}} \Leftarrow x \ominus a$
3  $a_{\mathrm{virtual}} \Leftarrow x \ominus b_{\mathrm{virtual}}$
4  $b_{\mathrm{roundoff}} \Leftarrow b \ominus b_{\mathrm{virtual}}$
5  $a_{\mathrm{roundoff}} \Leftarrow a \ominus a_{\mathrm{virtual}}$
6  $y \Leftarrow a_{\mathrm{roundoff}} \oplus b_{\mathrm{roundoff}}$
7  return $(x, y)$

Proof: If $|a| \ge |b|$, then Lines 1, 2, and 4 correspond precisely to the FAST-TWO-SUM algorithm. Recall from the proof of Theorem 6 that Line 2 is calculated exactly; it follows that Line 3 of TWO-SUM is calculated exactly as well, because $a_{\mathrm{virtual}} = a$ can be expressed exactly. Hence, $a_{\mathrm{roundoff}}$ is zero, $y = b_{\mathrm{roundoff}}$ is computed exactly, and the procedure is correct.

� 1 canbeexpressedexactly. Hence,1 roundoff is zero,� � � roundoff iscomputedexactly, andtheprocedureis correct.

Now, supposethat ; 1@;CiD; �A; , andconsidertwo cases.If ; � ;CiD; 1@;CiD; �A; , then � is computedexactly byLemma4. It immediatelyfollowsthat � virtual

� � , 1 virtual� 1 , and � roundoff, 1 roundoff, and � arezero.

Conversely, if ; � ;#B�; 1@; , Lines 1 and 2 may be subjectto rounding,so ��� 1 � � � err.21$+��/ ,and � virtual

� � � err.21�+��/ � err. � ,�1*/ . (SeeFigure 5.) Lines 2, 3, and 5 are analogousto thethreelines of FAST-TWO-DIFF (with Line 5 negated),so Lines 3 and 5 are computedexactly. Hence,1 virtual

��� �n� virtual� 19� err. � ,Y1*/ , and 1 roundoff

� err. � ,Y1*/ .Because; �A;Cb|; 1@; , we have ; � ; � ; 1�+?�A;*< 2 ; �A; , sotheroundoff errorserr.219+?�/ anderr. � ,01*/ each

cannotbemorethanulp .]�/ , so � virtual pn�]�2 � 2��� (for �LB 3) andLemma5 canbeappliedto show thatLine4 is exact. Hence,� roundoff

� � err.21�+?�/�� err. � ,01*/ . Finally, Line 6 is exactbecauseby Corollary2,1 roundoff� � roundoff

� � err.21�+0�/ is expressiblein � bits.

It follows that � � � err.21�+0�/ and ��� 1 � � � err.21�+0�/ , hence1 � � ��� � � . FTwo-termsubtraction(“TWO-DIFF”) is implementedby the sequence�?v 16,�� ; � virtual

v 1g, � ;1 virtualvw� +8� virtual; � roundoff

v � virtual ,8� ; 1 roundoffv 1),>1 virtual; � v 1 roundoff +8� roundoff.

Arbitrary PrecisionFloating-PointArithmetic 9

Corollary 8 Let � and � bethevaluesreturnedbyFAST-TWO-SUM or TWO-SUM.

(a) If ; �[;'B 2�for someinteger ` , then ; � � �[;=B 2

� . 2a � 1/ .(b) If ; �[;'b 2

�for someinteger ` , then ; � � �[;=b 2

� . 2adc 1 � 1/ .Proof: � is theroundoff error � err.21o+l�/ for some1 and � . By Theorems6 and7, 1 � � ��� � � . Theresultsfollow directly from Lemma3. FCorollary 9 Let � and � be the valuesreturnedby FAST-TWO-SUM or TWO-SUM. On a machinewhosearithmeticusesround-to-eventiebreaking, � and � arenonadjacent.

Proof: Exactroundingguaranteesthat ��< 12ulp . � / . If theinequalityis strict, � and � arenonadjacent.If� � 1

2ulp . � / , theround-to-evenruleensuresthattheleastsignificantbit of thesignificandof � is zero,so �and � arenonadjacent. F2.4 ExpansionAddition

Havingestablishedhow toaddtwo � -bit values,I turnto thetopicof how toaddtwoarbitraryprecisionvaluesexpressedasexpansions.Threemethodsareavailable. EXPANSION-SUM addsan � -componentexpansionto an � -componentexpansionin �$.�����/ time. LINEAR-EXPANSION-SUM andFAST-EXPANSION-SUM do thesamein �$.�� � ��/ time.

Despiteits asymptoticdisadvantage,EXPANSION-SUM canbe fasterthanthe linear-time algorithmsincaseswherethesizeof eachexpansionis smallandfixed,becauseprogramloopscanbecompletelyunrolledandindirectionoverheadcanbeeliminated(by avoidingtheuseof arrays).Thelinear-timealgorithmshaveconditionalsthatmake suchoptimizationsuntenable.Hence,EXPANSION-SUM andFAST-EXPANSION-SUM

arebothusedin theimplementationsof geometricpredicatesdescribedin Section4.

EXPANSION-SUM andLINEAR-EXPANSION-SUM bothhavethepropertythattheiroutputsarenonoverlap-ping if their inputsarenonoverlapping,andnonadjacentif their inputsarenonadjacent.FAST-EXPANSION-SUM is fasterthanLINEAR-EXPANSION-SUM, performingsix floating-pointoperationspercomponentratherthannine, but hasthreedisadvantages.First, FAST-EXPANSION-SUM doesnot alwayspreserve eitherthenonoverlappingnor the nonadjacentproperty; instead,it preserves an intermediateproperty, describedlater. Second,whereasLINEAR-EXPANSION-SUM makesno assumptionaboutthe tiebreakingrule, FAST-EXPANSION-SUM is designedfor machinesthat useround-to-even tiebreaking,andcanfail on machineswith other tiebreakingrules. Third, the correctnessproof for FAST-EXPANSION-SUM is much more te-dious. Nevertheless,I useFAST-EXPANSION-SUM in my geometricpredicates,and relegate the slowerLINEAR-EXPANSION-SUM to AppendixB. Usersof machinesthathaveexactroundingbut notround-to-eventiebreakingshouldreplacecallsto FAST-EXPANSION-SUM with callsto LINEAR-EXPANSION-SUM.

A complicatingcharacteristicof all the algorithmsfor manipulatingexpansionsis that theremay bespuriouszerocomponentsscatteredthroughouttheoutputexpansions,evenif no zeroswerepresentin theinputexpansions.For instance,if theexpansions1111

�0 � 0101and1100

�0 � 11arepassedasinputstoany of

thethreeexpansionadditionalgorithms,theoutputexpansionin four-bit arithmeticis11100�

0�

0�

0 � 0001.Onemaywantto addexpansionsthusproducedto otherexpansions;fortunately, all thealgorithmsin thisreportcopewell with spuriouszerocomponentsin their input expansions.Unfortunately, accountingforthesezerocomponentscould complicatethe correctnessproofssignificantly. To avoid confusion,most

10 JonathanRichardShewchuk

TWOSUM

TWOSUM

TWOSUM

TWOSUM �����

�1

�2

�3

�4

����

1�

2�

3�

4

� � � � ��5

�4

�3

�2

�1

Figure6: Operation of GROW-EXPANSION. The expansions � and � are illustrated with their most significantcomponents on the left. All TWO-SUM boxes in this report observe the convention that the larger output (

�)

emerges from the left side of each box, and the smaller output ( � ) from the bottom or right. Each � P term isan approximate running total.

of the proofsfor the additionandscalingalgorithmsarewritten as if all input componentsarenonzero.Spuriouszeroscanbe integratedinto the proofs(after the fact) by noting that the effect of a zero inputcomponentis alwaysto producea zerooutputcomponentwithout changingthevalueof theaccumulator(denotedby thevariable

�). Theeffect canbelikenedto a pipelinedelay;it will becomeclearin thefirst

few proofs.

Eachalgorithmhasan accompanying dataflow diagram,like Figure6. Readerswill find the proofseasierto understandif they follow thediagramswhile readingtheproofs,andkeepseveral factsin mind.First, Lemma1 indicatesthat thedown arrow from any TWO-SUM box representsa numberno larger thaneitherinput to the box. (This is why a zeroinput componentyields a zerooutputcomponent.)Second,Theorems6 and7 indicatethat thedown arrow from any TWO-SUM box representsa numbertoo small tooverlapthenumberrepresentedby theleft arrow from thebox.

I begin with analgorithmfor addingasingle� -bit valueto anexpansion.

Theorem10 Let � �E�^�� � 1� � bea nonoverlappingexpansionof ��� -bit components,andlet � bea � -bit

valuewhere �LB 3. Supposethatthecomponents� 1 � � 2 � ��� � � � aresortedin orderof increasingmagnitude,exceptthatanyof the � � maybezero. Thenthefollowingalgorithmwill produceanonoverlappingexpansion�

such that� � � � c 1� �

1� �¡� � � � , where thecomponents

�1 � � 2 � ��� � � � c 1 arealsoin orderof increasing

magnitude, exceptthat any of the� � maybe zero. Furthermore, if � is nonadjacentand round-to-even

tiebreakingis used,then�

is nonadjacent.

GROW-EXPANSION . � �f�/1

�0v �

2 for ` v 1 to �3 . � � � � � / v TWO-SUM . � � � 1 � � � /4

� � c 1v � �

5 return�

� � is anapproximatesumof � andthefirst ` componentsof � ; seeFigure6. In animplementation,thearray

�canbecollapsedinto a singlescalar.

Arbitrary PrecisionFloating-PointArithmetic 11

Proof: At the endof eachiterationof the for loop, the invariant� � � � �¢ �

1� ¢ � � � � �¢ � 1

� ¢ holds.Certainlythis invariantholdsfor ` � 0 afterLine 1 is executed.FromLine 3 andTheorem7, we have that� � � � ��� � � � 1

� � � ; from this onecandeduceinductively thattheinvariantholdsfor all (relevantvaluesof) ` . Thus,afterLine 4 is executed,� � c 1¢ �

1� ¢ � � �¢ �

1� ¢ � � .

For all ` , the output of TWO-SUM (in Line 3) hasthe propertythat� � and

� � do not overlap. ByLemma 1, ; � � ;�<£; � � ; , and because� is a nonoverlappingexpansionwhosenonzerocomponentsarearrangedin increasingorder,

� � cannotoverlapany of � � c 1 � � � c 2 � ��� . It follows that� � cannotoverlapany

of thelatercomponentsof�, becausetheseareconstructedby summing

� � with later � components.Hence,�is nonoverlappingandincreasing(exceptingzerocomponentsof

�). If round-to-eventiebreakingis used,

then� � and

� � arenonadjacentfor all ` (by Corollary9), soif � is nonadjacent,then�

is nonadjacent.

If any of the � � is zero,thecorrespondingoutputcomponent� � is alsozero,andtheaccumulatorvalue

�is unchanged(

� �¡� � � � 1). (For instance,considerFigure6, andsupposethat � 3 is zero.Theaccumulatorvalue

�2 shiftsthroughthepipelineto become

�3, anda zerois harmlesslyoutputas

�3. Thesameeffect

occursin severalalgorithmsin this report.) FCorollary 11 Thefirst � componentsof

�areeachnolarger thanthecorrespondingcomponentof � . (That

is, ; � 1 ;'<E; � 1 ; �O; � 2 ;'<E; � 2 ; � ��� �O; � � ;=<E; � � ; .) Furthermore, ; � 1 ;'<E; �A; .Proof: Follows immediatelyby applicationof Lemma1 to Line 3. (Both of thesefactsareapparentinFigure6. Recallthat the down arrow from any TWO-SUM box representsa numberno larger thaneitherinput to thebox.) F

If � is a long expansion,two optimizationsmight beadvantageous.Thefirst is to usea binarysearchto find thesmallestcomponentof � greaterthanor equalto ulp .]�/ , andstartthere. A variantof this idea,without thesearch,is usedin thenext theorem.Thesecondoptimizationis to stopearly if theoutputof aTWO-SUM operationis thesameasits inputs;theexpansionis alreadynonoverlapping.

A naıve way to add one expansionto anotheris to repeatedlyuseGROW-EXPANSION to add eachcomponentof oneexpansionto theother. Onecanimprove this ideawith asmallmodification.

Theorem12 Let � ���8�� � 1� � and ¤ ��� �� � 1 ¤ � benonoverlappingexpansionsof � and �)� -bit components,

respectively, where �^B 3. Supposethat thecomponentsof both � and ¤ are sortedin order of increasingmagnitude, exceptthat any of the � � or ¤ � may be zero. Thenthe following algorithm will produceanonoverlappingexpansion

�such that

� � � � c �� � 1� �x� � � ¤ , where thecomponentsof

�are in order of

increasingmagnitude, exceptthat anyof the� � maybezero. Furthermore, if � and ¤ arenonadjacentand

round-to-eventiebreakingis used,then�

is nonadjacent.

EXPANSION-SUM . � �f¤[/1

� v �2 for ` v 1 to �3 ¥ � � � � � c 1 � ��� � � � c ��¦ v GROW-EXPANSION .y¥ � � � � � c 1 � ��� � � � c � � 1 ¦ �f¤ � /4 return

�Proof: That � � c �� � 1

� �¡� �^�� �1� � � � �� �

1 ¤ � uponcompletioncanbeprovenby inductiononLine 3.

After setting� v � , EXPANSION-SUM traversesthe expansion¤ from smallestto largestcomponent,

individually addingthesecomponentsto�

usingGROW-EXPANSION (seeFigure7). The theoremwouldfollow directly from Theorem10 if eachcomponent¤ � wereaddedto thewholeexpansion

�, but to save

12 JonathanRichardShewchuk

TWOSUM

TWOSUM

TWOSUM

TWOSUM �����

¤ 1

�1

�2

�3

�4

���� � � �

��1

TWOSUM

TWOSUM

TWOSUM

TWOSUM � ¤ 2���

� � � �

��2

TWOSUM

TWOSUM

TWOSUM

TWOSUM � ¤ 3���

� � � � ��7

�6

�5

�4

�3

Figure7: Operation of EXPANSION-SUM.

time, only the subexpansion ¥ � � � � � c 1 � ��� � � � c � � 1 ¦ is considered.(In Figure7, this optimizationsavesthreeTWO-SUM operationsthatwouldotherwiseappearin thelower right cornerof thefigure.)

When ¤ � is considered,thecomponents¤ 1 �f¤ 2 � ��� �f¤ � � 1 havealreadybeensummedinto�. Accordingto

Corollary11, ; � ¢ ;m<|; ¤ ¢ ; afteriteration § of Line 3. Because¤ is anincreasingnonoverlappingexpansion,for any §6i?` , � ¢ cannotoverlap ¤ � , andfurthermore; � ¢ ;=iE; ¤ � ; (unless¤ �[� 0). Therefore,whenonesums¤ � into

�, onecanskipthefirst `C� 1 componentsof

�withoutsacrificingthenonoverlappingandincreasing

propertiesof�. Similarly, if � and ¤ arenonadjacent,onecanskip thefirst `¡� 1 componentsof

�without

sacrificingthenonadjacentpropertyof�.

No difficulty ensuesif ¤ � is a spuriouszerocomponent,becausezerodoesnot overlapany number.GROW-EXPANSION will deposita zeroat

� � andcontinuenormally. FUnlike EXPANSION-SUM, FAST-EXPANSION-SUM doesnot preserve thenonoverlappingor nonadjacent

properties,but it is guaranteedto producea strongly nonoverlappingoutput if its inputs are stronglynonoverlapping.An expansionis stronglynonoverlappingif no two of its componentsareoverlapping,nocomponentis adjacentto two othercomponents,andany pair of adjacentcomponentshave the propertythat both componentscanbe expressedwith a one-bitsignificand(that is, both arepowersof two). Forinstance,11000

�11 and10000

�1000

�10�

1 areboth stronglynonoverlapping,but 11100�

11 isnot,nor is 100

�10�

1. A characteristicof this propertyis thata zerobit mustoccurin theexpansionatleastonceevery � � 1 bits. For instance,in four-bit arithmetic,astronglynonoverlappingexpansionwhoselargestcomponentis 1111canbe no greaterthan1111� 01111011110��� . Any nonadjacentexpansionisstronglynonoverlapping,andany stronglynonoverlappingexpansionis nonoverlapping,but theconverseimplicationsdo notapply.

Undertheassumptionthatall expansionsarestronglynonoverlapping,it is possibleto prove thefirst

Arbitrary PrecisionFloating-PointArithmetic 13

TWOSUM

TWOSUM

TWOSUM

FASTTWOSUM

�����

¨1

¨2

¨3

¨4

¨5

����

2�

3�

4�

5

� � � � ��5

�4

�3

�2

�1

Figure8: Operation of FAST-EXPANSION-SUM. The � P terms maintain an approximate running total.

key resultof thisreport: theFAST-EXPANSION-SUM algorithmdefinedbelow behavescorrectlyunderround-to-even tiebreaking. The algorithmcanalsobe usedwith round-toward-zeroarithmetic,but the proof isdifferent. I haveemphasizedround-to-evenarithmeticheredueto theIEEE754standard.

A variantof this algorithmwaspresentedby Priest[21], but it is useddifferentlyhere. Priestusesthealgorithmto sumtwo nonoverlappingexpansions,andprovesundergeneralconditionsthatthecomponentsof the resultingexpansionoverlapby at mostonedigit (i.e. onebit in binary arithmetic). An expensiverenormalizationstepis requiredafterwardto removetheoverlap.Here,by contrast,thealgorithmis usedtosumtwo stronglynonoverlappingexpansions,andtheresultis alsoa stronglynonoverlappingexpansion.Not surprisingly, theproof demandsmorestringentconditionsthanPriestrequires:binaryarithmeticwithexactroundingandround-to-eventiebreaking,consonantwith theIEEE754standard.No renormalizationis needed.

Theorem13 Let � �|�^�� � 1� � and ¤ �|� �� � 1 ¤ � bestronglynonoverlappingexpansionsof � and �6� -bit

components,respectively, where �LB 4. Supposethat thecomponentsof both � and ¤ aresortedin orderofincreasingmagnitude, exceptthatanyof the � � or ¤ � maybezero. Ona machinewhosearithmeticusestheround-to-evenrule, thefollowing algorithmwill producea stronglynonoverlappingexpansion

�such that� � � � c �� � 1

� �x� � � ¤ , where thecomponentsof�

are alsoin orderof increasingmagnitude, exceptthatanyof the

� � maybezero.

FAST-EXPANSION-SUM . � �f¤[/1 Merge � and ¤ into a singlesequence, in orderof

nondecreasingmagnitude(possiblywith interspersedzeros)2 . � 2 � � 1 / v FAST-TWO-SUM . ¨ 2 � ¨ 1 /3 for ` v 3 to � � �4 . � � � � � � 1 / v TWO-SUM . � � � 1 � ¨ � /5

� � c �\v � � c �6 return

�� � is anapproximatesumof thefirst ` componentsof ¨ ; seeFigure8.

Severallemmatawill aid theproofof Theorem13. I begin with a proof thatthesumitself is correct.

Lemma 14(Q Invariant) Attheendofeachiterationofthefor loop,theinvariant� � � � � � 1¢ �

1� ¢ � � �¢ �

1¨ ¢

holds. This assuresus that after Line 5 is executed,� � c �¢ �1� ¢ � � � c �¢ �

1¨ ¢ , so thealgorithmproducesa

correctsum.

14 JonathanRichardShewchuk

Proof: Theinvariantclearlyholdsfor ` � 2 afterLine 2 is executed.For largervaluesof ` , Line 4 ensuresthat

� � � � � � 1� � � � 1

� ¨ � ; theinvariantfollowsby induction. FLemma 15 Let ©¨ ���_ª¢ � 1 ©¨ ¢ bea seriesformedbymergingtwostronglynonoverlappingexpansions,or asubseriesthereof. Supposethat ©¨ ª is thelargesttermandhasa nonzero bit of magnitude2

�or smallerfor

someinteger ` . Then ; � ª¢ � 1 ©¨ ¢ ;'i 2� . 2adc 1 � 1/ , and ; � ª � 1¢ �

1 ©¨ ¢ ;=i 2� . 2aA/ .

Proof: Let ©� and ©¤ be the expansions(or subsequencesthereof)from which ©¨ wasformed,andassumethat theterm ©¨ ª comesfrom theexpansion©� . Because©¨ ª is the largesttermof ©� andhasa nonzerobit ofmagnitude2

�or smaller, andbecause©� is stronglynonoverlapping,; ©� ; is boundedbelow 2

� . 2a�� 12 / . (For

instance,if � � 4 and ` � 0, then ; ©� ;�< 1111� 0111101111��� .) Thesameboundappliesto theexpansion©¤ , so ; ©¨ ; � ; ©� � ©¤x;=i 2� . 2adc 1 � 1/ .

If we omit ©¨ ª from the sum, thereare two casesto consider. If ©¨ ª � 2�, then ; ©� ��©¨ ª ; is bounded

below 2�, and ; ©¤x; is boundedbelow 2

� . 2/ . (For instance,if � � 4, ` � 0, and ©¨ ª � 1, then ; ©� �?©¨ ª ;¡<0 � 10111101111��� , and ; ©¤x;=< 1 � 10111101111��� .) Conversely, if ©¨ ª�«� 2

�, then ; ©� ��©¨ ª ; is boundedbelow

2� . 12 / , and ; ©¤5; is boundedbelow 2

� . 2as� 12 / . (For instance,if � � 4, ` � 0, and ©¨ ª � 1111,then ; ©� �n©¨ ª ;'<

0 � 0111101111��� , and ; ©¤5;'< 1111� 0111101111��� .) In eithercase,; ©¨ ��©¨ ª ; � ; ©� �n©¨ ª � ©¤x;=i 2� . 2ae/ . F

Lemma 16 The expansion�

producedby FAST-EXPANSION-SUM is a nonoverlappingexpansionwhosecomponentsare in orderof increasingmagnitude(exceptingzeros).

Proof: Supposefor thesakeof contradictionthattwo successivenonzerocomponentsof�

overlapor occurin orderof decreasingmagnitude.Denotethefirst suchpair produced5

� � � 1 and� � ; thenthecomponents�

1 � ��� � � � � 1 arenonoverlappingandincreasing(exceptingzeros).

Assumewithout lossof generalitythat theexponentof� � � 1 is zero,so that

� � � 1 is of the form � 1 � 7 ,whereanasteriskrepresentsasequenceof arbitrarybits.� � and

� � � 1 areproducedby a TWO-SUM or FAST-TWO-SUM operation,andarethereforenonadjacentby Corollary9 (becausetheround-to-evenrule is used).

� � is thereforeof theform ��7 00 (having no bitsof magnitudesmallerthanfour). Because; � � � 1 ;=B 1, Corollary8(a)guaranteesthat

; � � � � � � 1 ;'B 2a � 1 � (1)

Becausethe offendingcomponents� � � 1 and

� � arenonzeroandeitheroverlappingor of decreasingmagnitude,theremustbe at leastonenonzerobit in thesignificandof

� � whosemagnitudeis no greaterthanone. Onemayask,wheredoesthis offendingbit comefrom?

� � is computedby Line 4 from� � and¨ � c 1, andtheoffendingbit cannotcomefrom

� � (which is of theform �l7 00),soit musthavecomefrom¨ � c 1. Hence, ; ¨ � c 1 ; hasa nonzerobit of magnitudeoneor smaller. Applying Lemma15, onefinds that; � �¢ � 1¨ ¢ ;'i 2a .

A boundfor � � � 2¢ �1� ¢ canbederivedby recallingthat

� � � 1 is of the form � 1 � 7 , and�

1 � ��� � � � � 1 are

nonoverlappingandincreasing.Hence,; � � � 2¢ �1� ¢ ;'i 1.

Rewrite theQ Invariantin theform� � � � � � 1

� � �¢ �1¨ ¢ � � � � 2¢ �

1� ¢ . Usingtheboundsderivedabove,

weobtain ; � � � � � � 1 ;'i 2a � 1 � (2)

Arbitrary PrecisionFloating-PointArithmetic 15

; � �¢ � 1¨ � ; ¬ ; � ¢y­ � ¢y­ ;'< ¨ � c 1 � 0 1 1 1 1 0 1 1 1 1 0 1 1; � ¢y­ ­ ¤ ¢y­ ­ ;'< 1 1 1 1 � 0 1 1 1 1 0 1 1 1 1 0 1 1; � � � 2¢ �

1� ¢ ;'< 0 � 1 1 1 1 1 1 1 1 1 1 1 1 1; � � � � � � 1 ;'< 1 0 0 0 0 � 1 1 1 1 0 1 1 1 1 1 1 0 1

Figure9: Demonstration (for Z$W 4) of how the Q Invariant is used in the proof that � is nonoverlapping.The top two values, � and ® , are being summed to form � . Because ¯ P T 1 has a nonzero bit of magnitudeno greater than 1, and because ¯ is formed by merging two strongly nonoverlapping expansions, the sumM � P°R±

1 ¯ P M I M � P ² 2°R±1 � ° M can be no larger than illustrated in this worst-case example. As a result,

M � P I>� P ² 1M

cannot be large enough to have a roundoff error of 1, soM � P ² 1

Mis smaller than 1 and cannot overlap ¯ P T 1.

(Note that ¯ P T 1 is not part of the sum; it appears above in a box drawn as a placeholder that bounds thevalue of each expansion.)

SeeFigure9 for aconcreteexample.

Inequalities1 and2 cannotholdsimultaneously. Theresultfollowsby contradiction. FProof of Theorem 13: Lemma14 ensuresthat

� � � � ¤ . Lemma16 eliminatesthepossibility that thecomponentsof

�overlapor fail to occurin orderof increasingmagnitude;it remainsonly to provethat

�is

stronglynonoverlapping.Supposethattwo successivenonzerocomponents� � � 1 and

� � areadjacent.

Assumewithout lossof generalitythat theexponentof� � � 1 is zero,so that

� � � 1 is of the form � 1 � 7 .As in theproofof Lemma16,

� � musthavetheform �l7 00.

Because� � � 1 and

� � areadjacent,theleastsignificantnonzerobit of� � hasmagnitudetwo; thatis,

� � isof theform ��7 10. Againweask,wheredoesthisbit comefrom? As before,thisbit cannotcomefrom

� � ,soit musthave comefrom ¨ � c 1. Hence, ; ¨ � c 1 ; hasa nonzerobit of magnitudetwo. Applying Lemma15,wefind that ; � � c 1¢ �

1¨ ¢ ;'i 2adc 2 � 2 and ; � �¢ � 1

¨ ¢ ;'i 2adc 1.

Boundsfor � � � 1¢ �1� ¢ and � � � 2¢ �

1� ¢ canalsobederivedby recallingthat

� � � 1 is of theform � 1 � 7 andis

thelargestcomponentof anonoverlappingexpansion.Hence,; � � � 1¢ �1� ¢ ;'i 2, and ; � � � 2¢ �

1� ¢ ;'i 1.

Rewriting theQ Invariantin theform� � c 1

� � �[� � � c 1¢ �1¨ ¢ � � � � 1¢ �

1� ¢ , weobtain

; � � c 1� � � ;'i 2adc 2 � (3)

TheQ Invariantalsogivesustheidentity� � � � � � 1

� � �¢ �1¨ ¢ � � � � 2¢ �

1� ¢ . Hence,

; � � � � � � 1 ;'i 2adc 1 � 1 � (4)

Recall that the value ; � � ; is at least2. Considerthe possibility that ; � � ; might be greaterthan2; byCorollary8(b), thiscanoccuronly if ; � � c 1

� � � ;'b 2adc 2 � 2, contradictingInequality3. Hence,; � � ; mustbeexactly2, andis expressiblein onebit. (Figure10givesanexamplewherethisoccurs.)

Similarly, thevalue ; � � � 1 ; is at least1. Considerthepossibilitythat ; � � � 1 ; might begreaterthan1; byCorollary8(b), this canoccuronly if ; � � � � � � 1 ;[b 2adc 1 � 1, contradictingInequality4. Hence, ; � � � 1 ;mustbeexactly1, andis expressiblein onebit.

5It is implicitly assumedherethat thefirst offendingpair is not separatedby interveningzeros.Theproof couldbewritten toconsiderthecasewhereinterveningzerosappear, but thiswouldmake it evenmoreconvoluted.Trustme.

16 JonathanRichardShewchuk

TWOSUM

TWOSUM

TWOSUM

FASTTWOSUM

�����

¤ 1 � 0 � 1�

1� 0 � 1

¤ 2 � 11110

�2� 11110

¤ 3 � 1 � 27

����

2�

3�

4�

5

11000001 � 26

�� � � ��

5� 1 � 1 � 27�

4� 0

�3� � 10

�2� � 1

�1� 0

Figure10: A four-bit example where FAST-EXPANSION-SUM generates two adjacent components � 2 and � 3.The figure permits me a stab at explaining the (admittedly thin) intuition behind Theorem 13: suppose � 2

is of the form ³ 1 � ´ . Because � 2 is the roundoff term associated with � 3, � 3 must be of the form ´ 00 ifround-to-even arithmetic is used. Hence, the bit of magnitude 2 in � 3 must have come from � 2. This impliesthat

M � 2 M is no larger than 11110, which imposes bounds on how largeM � 3Mand

M � 4Mcan be (Lemma 15);

these bounds in turn imply thatM � 2Mcan be no larger than 1, and

M � 3Mcan be no larger than 10. Furthermore,� 4 cannot be adjacent to � 3 because neither � 4 nor ® 3 can have a bit of magnitude 4.

By Corollary8(a), ; � � � � � � 1 ;[B 2a � 1 (because; � � � 1 ; � 1). Using this inequality, the inequality; � � � 2¢ �1� ¢ ;�i 1, andtheQ Invariant,onecandeducethat ; � �¢ � 1

¨ ¢ ;�b 2a . Because is formedfrom twononoverlappingincreasingexpansions,this inequalityimpliesthat ; ¨ � ;=B 2aO� 2 B 100binary(recallingthat�LB 4), andhence � c 2 � ¨ � c 3 � ��� mustall beof theform �?7 000(having nobitsof magnitudesmallerthan8).� � c 1 is alsoof theform �?7 000,because

� � c 1 and� � areproducedby aTWO-SUM or FAST-TWO-SUM

operation,andarethereforenonadjacentby Corollary9 (assumingtheround-to-evenrule is used).

Because� � c 1 and ¨ � c 2 � ¨ � c 3 � ��� are of the form �E7 000,

� � c 1 � � � c 2 � ��� must be as well, and arethereforenotadjacentto

� � . It follows that�

cannotcontainthreeconsecutiveadjacentcomponents.

Theseargumentsprovethatif two componentsof�

areadjacent,bothareexpressiblein onebit, andnoothercomponentsareadjacentto them.Hence,

�is stronglynonoverlapping. F

Theproofof Theorem13is morecomplex thanonewould like. It is unfortunatethattheproof requiresstronglynonoverlappingexpansions;it would be moreparsimoniousif FAST-EXPANSION-SUM producednonoverlappingoutputfrom nonoverlappinginput,or nonadjacentoutputfrom nonadjacentinput. Unfortu-nately, it doesneither. For a counterexampleto theformerpossibility, consideraddingthenonoverlappingexpansion11110000

�1111

�0 � 1111to itself in four-bit arithmetic.(Thisexampleproducesanoverlapping

expansionif oneusestheround-to-evenrule,but not if oneusestheround-toward-zerorule.) For a coun-terexampleto thelatterpossibility, seeFigure10. Onapersonalnote,it tookmequiteabit of effort to finda propertybetweennonoverlappingandnonadjacentthat is preservedby FAST-EXPANSION-SUM. SeveralconjectureswerelaboriouslyexaminedanddiscardedbeforeI convergedon the stronglynonoverlappingproperty. I persistedonly becausethealgorithmconsistentlyworksin practice.

It is also unfortunatethat the proof requiresexplicit considerationof the tiebreakingrule. FAST-EXPANSION-SUM works just aswell on a machinethat usesthe round-toward-zerorule. The conditionsunderwhich it works arealsosimpler— the outputexpansionis guaranteedto be nonoverlappingif theinput expansionsare. Onemight hopeto prove that FAST-EXPANSION-SUM works regardlessof rounding

Arbitrary PrecisionFloating-PointArithmetic 17

mode,but this is not possible. AppendixA demonstratesthe difficulty with an exampleof how mixinground-toward-zeroandround-to-evenarithmeticcanleadto thecreationof overlappingexpansions.

The algorithms EXPANSION-SUM and FAST-EXPANSION-SUM can be mixed only to a limited degree. EXPANSION-SUM preserves the nonoverlapping and nonadjacent properties, but not the strongly nonoverlapping property; FAST-EXPANSION-SUM preserves only the strongly nonoverlapping property. Because nonadjacent expansions are strongly nonoverlapping, and strongly nonoverlapping expansions are nonoverlapping, expansions produced exclusively by one of the two algorithms can be fed as input to the other, but it may be dangerous to repeatedly switch back and forth between the two algorithms. In practice, EXPANSION-SUM is only preferred for producing small expansions, which are nonadjacent and hence suitable as input to FAST-EXPANSION-SUM.

It is useful to consider the operation counts of the algorithms. EXPANSION-SUM uses mn TWO-SUM operations, for a total of 6mn flops (floating-point operations). FAST-EXPANSION-SUM uses m + n − 2 TWO-SUM operations and one FAST-TWO-SUM operation, for a total of 6m + 6n − 9 flops. However, the merge step of FAST-EXPANSION-SUM requires m + n − 1 comparison operations of the form "if |e_i| > |f_j|". Empirically, each such comparison seems to take roughly as long as three flops; hence, a rough measure is to estimate that FAST-EXPANSION-SUM takes as long to execute as 9m + 9n − 12 flops.

These estimates correlate well with the measured performance of the algorithms. I implemented each procedure as a function call whose parameters are variable-length expansions stored as arrays, and measured them on a DEC Alpha-based workstation using the bundled compiler with optimization level 3. By plotting their performance over a variety of expansion sizes and fitting curves, I found that EXPANSION-SUM runs in 0.83mn + 0.7 microseconds, and FAST-EXPANSION-SUM runs in 0.54(m + n) + 0.6 microseconds. FAST-EXPANSION-SUM is always faster except when one of the expansions has only one component, in which case GROW-EXPANSION should be used.

As I have mentioned, however, the balance shifts when expansion lengths are small and fixed. By storing small, fixed-length expansions as scalar variables rather than arrays, one can unroll the loops in EXPANSION-SUM, remove array indexing overhead, and allow components to be allocated to registers by the compiler. Thus, EXPANSION-SUM is attractive in this special case, and is used to advantage in my implementation of the geometric predicates of Section 4. Note that FAST-EXPANSION-SUM is difficult to unroll because of the conditionals in its initial merging step.

On the other hand, the use of arrays to store expansions (and non-unrolled loops to manage them) confers the advantage that spurious zero components can easily be eliminated from output expansions. In the procedures GROW-EXPANSION, EXPANSION-SUM, and FAST-EXPANSION-SUM, as well as the procedures SCALE-EXPANSION and COMPRESS in the sections to come, zero elimination can be achieved by maintaining a separate index for the output array h and advancing this index only when the procedure produces a nonzero component of h; a sketch of this indexing scheme appears below. In practice, versions of these algorithms that eliminate zeros are almost always preferable to versions that don't (except when loop unrolling confers a greater advantage). Zero elimination adds a small amount of overhead for testing and indexing, but the lost time is virtually always regained when further operations are performed on the resulting shortened expansions.
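As an illustration of this indexing scheme, here is a minimal C sketch (not the author's released code) of GROW-EXPANSION with zero elimination; two_sum is the exact addition routine of Section 2.2, and the names used here are illustrative.

/* Exact sum: x + y = a + b, where x is the rounded sum and y the roundoff. */
void two_sum(double a, double b, double *x, double *y)
{
    *x = a + b;
    double b_virtual = *x - a;
    double a_virtual = *x - b_virtual;
    double b_roundoff = b - b_virtual;
    double a_roundoff = a - a_virtual;
    *y = a_roundoff + b_roundoff;
}

/* Add the value b to the nonoverlapping expansion e[0..elen-1] (sorted by
 * increasing magnitude), writing only nonzero components to h.
 * Returns the number of components written (at most elen + 1). */
int grow_expansion_zeroelim(int elen, const double *e, double b, double *h)
{
    double Q = b;                     /* running accumulator */
    int hindex = 0;                   /* separate index for the output array */
    for (int i = 0; i < elen; i++) {
        double Qnew, hh;
        two_sum(Q, e[i], &Qnew, &hh);
        Q = Qnew;
        if (hh != 0.0)                /* advance only on a nonzero component */
            h[hindex++] = hh;
    }
    if (Q != 0.0 || hindex == 0)      /* keep at least one component */
        h[hindex++] = Q;
    return hindex;
}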

Experience suggests that it is economical to use unrolled versions of EXPANSION-SUM to form expansions of up to about four components, tolerating interspersed zeros, and to use FAST-EXPANSION-SUM with zero elimination when forming (potentially) larger expansions.


2.5 Simple Multiplication

The basic multiplication algorithm computes a nonoverlapping expansion equal to the product of two p-bit values. The multiplication is performed by splitting each value into two halves with half the precision, then performing four exact multiplications on these fragments. The trick is to find a way to split a floating-point value in two. The following theorem was first proven by Dekker [5]:

Theorem 17  Let a be a p-bit floating-point number, where p ≥ 3. Choose a splitting point s such that p/2 ≤ s ≤ p − 1. Then the following algorithm will produce a (p − s)-bit value a_hi and a nonoverlapping (s − 1)-bit value a_lo such that |a_hi| ≥ |a_lo| and a = a_hi + a_lo.

SPLIT(a, s)
1  c ← (2^s + 1) ⊗ a
2  a_big ← c ⊖ a
3  a_hi ← c ⊖ a_big
4  a_lo ← a ⊖ a_hi
5  return (a_hi, a_lo)
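For concreteness, a minimal C sketch of SPLIT for IEEE 754 double precision (p = 53, s = 27, so 2^s + 1 = 134217729) follows. It assumes exact rounding and no fused multiply-add contraction, and is an illustration of the pseudocode above rather than the author's released code.

/* Split a double into a_hi + a_lo, each expressible in 26 bits. */
void split(double a, double *a_hi, double *a_lo)
{
    double c = 134217729.0 * a;   /* (2^27 + 1) (x) a */
    double a_big = c - a;         /* Line 2 */
    *a_hi = c - a_big;            /* Line 3: high-order half */
    *a_lo = a - *a_hi;            /* Line 4: low-order half (exact) */
}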

The claim may seem absurd. After all, a_hi and a_lo have only p − 1 bits of significand between them; how can they carry all the information of a p-bit significand? The secret is hidden in the sign bit of a_lo. For instance, the seven-bit number 1001001 can be split into the three-bit terms 1010000 and −111. This property is fortunate, because even if p is odd, as it is in IEEE 754 double precision arithmetic, a can be split into two ⌊p/2⌋-bit values.

Proof: Line 1 is equivalent to computing 2^s a + a. (Clearly, 2^s a can be expressed exactly, because multiplying a value by a power of two only changes its exponent, and does not change its significand.) Line 1 is subject to rounding, so we have c = 2^s a + a + err(2^s a ⊕ a).

Line 2 is also subject to rounding, so a_big = 2^s a + err(2^s a ⊕ a) + err(c ⊖ a). It will become apparent shortly that the proof relies on showing that the exponent of a_big is no greater than the exponent of 2^s a. Both |err(2^s a ⊕ a)| and |err(c ⊖ a)| are bounded by (1/2) ulp(c), so the exponent of a_big can only be larger than that of 2^s a if every bit of the significand of a is nonzero except possibly the last (in four-bit arithmetic, a must have significand 1110 or 1111). By manually checking the behavior of SPLIT in these two cases, one can verify that the exponent of a_big is never larger than that of 2^s a.

The reason this fact is useful is that, with Line 2, it implies that |err(c ⊖ a)| ≤ (1/2) ulp(2^s a), and so the error term err(c ⊖ a) is expressible in s − 1 bits (for s ≥ 2).

By Lemma 5, Lines 3 and 4 are calculated exactly. It follows that a_hi = a − err(c ⊖ a), and a_lo = err(c ⊖ a); the latter is expressible in s − 1 bits. To show that a_hi is expressible in p − s bits, consider that its least significant bit cannot be smaller than ulp(a_big) = 2^s ulp(a). If a_hi has the same exponent as a, then a_hi must be expressible in p − s bits; alternatively, if a_hi has an exponent one greater than that of a (because a − err(c ⊖ a) has a larger exponent than a), then a_hi is expressible in one bit (as demonstrated in Figure 11).

Finally, the exactness of Line 4 implies that a = a_hi + a_lo as required. ∎

Multiplication is performed by setting s = ⌈p/2⌉, so that the p-bit operands a and b are each split into two ⌊p/2⌋-bit pieces, a_hi, a_lo, b_hi, and b_lo. The products a_hi b_hi, a_lo b_hi, a_hi b_lo, and a_lo b_lo can each be computed exactly by the floating-point unit, producing four values. These could then be summed using the FAST-EXPANSION-SUM procedure in Section 2.4. However, Dekker [5] provides several faster ways to accomplish the computation. Dekker attributes the following method to G. W. Veltkamp.


Figure 11: Demonstration of SPLIT splitting a five-bit number into two two-bit numbers.

Figure 12: Demonstration of TWO-PRODUCT in six-bit arithmetic, where a = b = 111011, a_hi = b_hi = 111000, and a_lo = b_lo = 11. Note that each intermediate result is expressible in six bits. The resulting expansion is 110110 × 2^6 + 11001.

Theorem 18  Let a and b be p-bit floating-point numbers, where p ≥ 6. Then the following algorithm will produce a nonoverlapping expansion x + y such that ab = x + y, where x is an approximation to ab and y represents the roundoff error in the calculation of x. Furthermore, if round-to-even tiebreaking is used, x and y are nonadjacent. (See Figure 12.)

TWO-PRODUCT(a, b)
1  x ← a ⊗ b
2  (a_hi, a_lo) = SPLIT(a, ⌈p/2⌉)
3  (b_hi, b_lo) = SPLIT(b, ⌈p/2⌉)
4  err_1 ← x ⊖ (a_hi ⊗ b_hi)
5  err_2 ← err_1 ⊖ (a_lo ⊗ b_hi)
6  err_3 ← err_2 ⊖ (a_hi ⊗ b_lo)
7  y ← (a_lo ⊗ b_lo) ⊖ err_3
8  return (x, y)
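A minimal C sketch of TWO-PRODUCT for IEEE 754 doubles follows, with the two SPLIT calls expanded inline (s = 27). It is an illustration of the pseudocode above under the same assumptions as the SPLIT sketch, not the released implementation.

/* On return, x + y = a * b exactly; x is the rounded product, y the roundoff. */
void two_product(double a, double b, double *x, double *y)
{
    *x = a * b;                               /* Line 1 */
    double c = 134217729.0 * a;               /* SPLIT(a, 27) */
    double a_big = c - a;
    double a_hi = c - a_big, a_lo = a - a_hi;
    c = 134217729.0 * b;                      /* SPLIT(b, 27) */
    double b_big = c - b;
    double b_hi = c - b_big, b_lo = b - b_hi;
    double err1 = *x - (a_hi * b_hi);         /* Lines 4-6: exact subtractions */
    double err2 = err1 - (a_lo * b_hi);
    double err3 = err2 - (a_hi * b_lo);
    *y = (a_lo * b_lo) - err3;                /* Line 7: the roundoff term */
}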


Proof: Line 1 is subject to rounding, so we have x = ab + err(a ⊗ b). The multiplications in Lines 4 through 7 are all exact, because each factor has no more than ⌊p/2⌋ bits; it will be proven that each of the subtractions is also exact, and thus y = −err(a ⊗ b).

Without loss of generality, assume that the exponents of a and b are p − 1, so that |a| and |b| are integers in the range [2^(p−1), 2^p − 1]. In the proof of Theorem 17 it emerged that |a_hi| and |b_hi| are integers in the range [2^(p−1), 2^p], and |a_lo| and |b_lo| are integers in the range [0, 2^(⌈p/2⌉−1)]. From these ranges and the assumption that p ≥ 6, one can derive the inequalities |a_lo| ≤ (1/8)|a_hi|, |b_lo| ≤ (1/8)|b_hi|, and err(a ⊗ b) ≤ 2^(p−1) ≤ (1/32)|a_hi b_hi|.

Intuitively, a_hi b_hi ought to be within a factor of two of a ⊗ b, so that Line 4 is computed exactly (by Lemma 5). To confirm this hunch, note that x = ab + err(a ⊗ b) = a_hi b_hi + a_lo b_hi + a_hi b_lo + a_lo b_lo + err(a ⊗ b) = a_hi b_hi ± (19/64)|a_hi b_hi| (using the inequalities stated above), which justifies the use of Lemma 5. Because Line 4 is computed without roundoff, err_1 = a_lo b_hi + a_hi b_lo + a_lo b_lo + err(a ⊗ b).

We are assured that Line 5 is executed without roundoff error if the value err_1 − a_lo b_hi = a_hi b_lo + a_lo b_lo + err(a ⊗ b) is expressible in p bits. I prove that this property holds by showing that the left-hand expression is a multiple of 2^(⌈p/2⌉), and the right-hand expression is strictly smaller than 2^(⌈3p/2⌉).

The upper bound on the absolute value of the right-hand expression follows immediately from the upper bounds for a_hi, a_lo, b_lo, and err(a ⊗ b). To show that the left-hand expression is a multiple of 2^(⌈p/2⌉), consider that err_1 must be a multiple of 2^(p−1) because a ⊗ b and a_hi b_hi have exponents of at least 2p − 2. Hence, err_1 − a_lo b_hi must be a multiple of 2^(⌈p/2⌉) because a_lo is an integer, and b_hi is a multiple of 2^(⌈p/2⌉). Hence, Line 5 is computed exactly, and err_2 = a_hi b_lo + a_lo b_lo + err(a ⊗ b).

To show that Line 6 is computed without roundoff error, note that a_lo b_lo is an integer no greater than 2^(p−1) (because a_lo and b_lo are integers no greater than 2^(⌈p/2⌉−1)), and err(a ⊗ b) is an integer no greater than 2^(p−1). Thus, err_3 = a_lo b_lo + err(a ⊗ b) is an integer no greater than 2^p, and is expressible in p bits.

Finally, Line 7 is exact simply because y = −err(a ⊗ b) can be expressed in p bits. Hence, ab = x + y. If round-to-even tiebreaking is used, x and y are nonadjacent by analogy to Corollary 9. ∎

2.6 Expansion Scaling

The following algorithm, which multiplies an expansion by a floating-point value, is the second key new result of this report.

Theorem 19  Let e = Σ_{i=1}^{m} e_i be a nonoverlapping expansion of m p-bit components, and let b be a p-bit value where p ≥ 4. Suppose that the components of e are sorted in order of increasing magnitude, except that any of the e_i may be zero. Then the following algorithm will produce a nonoverlapping expansion h such that h = Σ_{i=1}^{2m} h_i = be, where the components of h are also in order of increasing magnitude, except that any of the h_i may be zero. Furthermore, if e is nonadjacent and round-to-even tiebreaking is used, then h is nonadjacent.

SCALE-EXPANSION(e, b)
1  (Q_2, h_1) ← TWO-PRODUCT(e_1, b)
2  for i ← 2 to m
3      (T_i, t_i) ← TWO-PRODUCT(e_i, b)
4      (Q_{2i−1}, h_{2i−2}) ← TWO-SUM(Q_{2i−2}, t_i)
5      (Q_{2i}, h_{2i−1}) ← FAST-TWO-SUM(T_i, Q_{2i−1})
6  h_{2m} ← Q_{2m}
7  return h
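A C sketch of SCALE-EXPANSION with the zero elimination discussed in Section 2.4 appears below. It assumes the two_sum and two_product routines from the earlier sketches and defines a FAST-TWO-SUM helper; the output array h needs room for 2m components, and the function returns the number of nonzero components written. This is an illustrative rendering of the pseudocode, not the author's released code.

void two_sum(double a, double b, double *x, double *y);      /* Section 2.2 sketch */
void two_product(double a, double b, double *x, double *y);  /* Section 2.5 sketch */

/* FAST-TWO-SUM: exact when |a| >= |b|. */
void fast_two_sum(double a, double b, double *x, double *y)
{
    *x = a + b;
    double b_virtual = *x - a;
    *y = b - b_virtual;
}

int scale_expansion_zeroelim(int elen, const double *e, double b, double *h)
{
    double Q, hh, T, t, sum;
    int hindex = 0;
    two_product(e[0], b, &Q, &hh);            /* Line 1 */
    if (hh != 0.0) h[hindex++] = hh;
    for (int i = 1; i < elen; i++) {
        two_product(e[i], b, &T, &t);         /* Line 3 */
        two_sum(Q, t, &sum, &hh);             /* Line 4 */
        if (hh != 0.0) h[hindex++] = hh;
        fast_two_sum(T, sum, &Q, &hh);        /* Line 5 */
        if (hh != 0.0) h[hindex++] = hh;
    }
    if (Q != 0.0 || hindex == 0)              /* Line 6 */
        h[hindex++] = Q;
    return hindex;
}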


Figure 13: Operation of SCALE-EXPANSION.

As illustrated in Figure 13, SCALE-EXPANSION multiplies each component of e by b and sums the results. It should be apparent why the final expansion h is the desired product, but it is not so obvious why the components of h are guaranteed to be nonoverlapping and in increasing order. Two lemmata will aid the proof.

Lemma 20  Let e_i and e_j be two nonoverlapping nonzero components of e, with i < j and |e_i| < |e_j|. Let T_i be a correctly rounded approximation to e_i b, and let T_i + t_i be a two-component expansion exactly equal to e_i b. (Such an expansion is produced by Line 3, but here is defined also for i = 1.) Then t_i is too small in magnitude to overlap the double-width product e_j b. Furthermore, if e_i and e_j are nonadjacent, then t_i is not adjacent to e_j b.

Proof: By scaling e and b by appropriate powers of 2 (thereby shifting their exponents without changing their significands), one may assume without loss of generality that e_j and b are integers with magnitude less than 2^p, and that |e_i| < 1 (and hence a radix point falls between e_j and e_i).

It follows that e_j b is an integer, and |e_i b| < 2^p. The latter fact and exact rounding imply that |t_i| ≤ 1/2. Hence, e_j b and t_i do not overlap.

If e_i and e_j are nonadjacent, scale e so that e_j is an integer and |e_i| < 1/2. Then |t_i| ≤ 1/4, so e_j b and t_i are not adjacent. ∎

Lemma 21  For some i, let c be the smallest integer such that |e_i| < 2^c (hence e_i does not overlap 2^c). Then |Q_{2i}| ≤ 2^c |b|, and thus |h_{2i−1}| ≤ 2^(c−1) ulp(b).

Proof: The inequality |Q_{2i}| ≤ 2^c |b| holds for i = 1 after Line 1 is executed even if Q_2 is rounded to a larger magnitude, because |e_1 b| < 2^c |b|, and 2^c |b| is expressible in p bits. For larger values of i, the bound is proven by induction. Assume that r is the smallest integer such that |e_{i−1}| < 2^r; by the inductive hypothesis, |Q_{2i−2}| ≤ 2^r |b|.

Because e_i and e_{i−1} are nonoverlapping, e_i must be a multiple of 2^r. Suppose that c is the smallest integer such that |e_i| < 2^c; then |e_i| ≤ 2^c − 2^r.


Lines 3, 4, and 5 compute Q_{2i}, an approximation of Q_{2i−2} + e_i b, and are subject to roundoff error in Lines 4 and 5. Suppose that Q_{2i−2} and e_i b have the same sign, that |Q_{2i−2}| has its largest possible value 2^r |b|, and that |e_i| has its largest possible value 2^c − 2^r. For these assignments, roundoff does not occur in Lines 4 and 5, and |Q_{2i}| = |Q_{2i−2} + e_i b| = 2^c |b|. Otherwise, roundoff may occur, but the monotonicity of floating-point multiplication and addition ensures that |Q_{2i}| cannot be larger than 2^c |b|.

The inequality |h_{2i−1}| ≤ 2^(c−1) ulp(b) is guaranteed by exact rounding because h_{2i−1} is the roundoff term associated with the computation of Q_{2i} in Line 5. ∎

Proof of Theorem 19: One can prove inductively that at the end of each iteration of the for loop, the invariant Q_{2i} + Σ_{j=1}^{2i−1} h_j = b Σ_{j=1}^{i} e_j holds. Certainly this invariant holds for i = 1 after Line 1 is executed. By induction on Lines 3, 4, and 5, one can deduce that the invariant holds for all (relevant values of) i. (The use of FAST-TWO-SUM in Line 5 will be justified shortly.) Thus, after Line 6 is executed, Σ_{j=1}^{2m} h_j = b Σ_{j=1}^{m} e_j.

I shall prove that the components of h are nonoverlapping by showing that each time a component of h is written, that component is smaller than and does not overlap either the accumulator Q nor any of the remaining products (e_j b); hence, the component cannot overlap any portion of their sum. The first claim, that each component h_j does not overlap the accumulator Q_{j+1}, is true because h_j is the roundoff error incurred while computing Q_{j+1}.

To show that each component of h is smaller than and does not overlap the remaining products, I shall consider h_1, the remaining odd components of h, and the even components of h separately. The component h_1, computed by Line 1, does not overlap the remaining products (e_2 b, e_3 b, ...) by virtue of Lemma 20. The even components, which are computed by Line 4, do not overlap the remaining products because, by application of Lemma 1 to Line 4, a component |h_{2i−2}| is no larger than |t_i|, which is bounded in turn by Lemma 20.

Odd components of h, computed by Line 5, do not overlap the remaining products by virtue of Lemma 21, which guarantees that |h_{2i−1}| ≤ 2^(c−1) ulp(b). The remaining products are all multiples of 2^c ulp(b) (because the remaining components of e are multiples of 2^c).

If round-to-even tiebreaking is used, the output of each TWO-SUM, FAST-TWO-SUM, and TWO-PRODUCT statement is nonadjacent. If e is nonadjacent as well, the arguments above are easily modified to show that h is nonadjacent.

The use of FAST-TWO-SUM in Line 5 is justified because |T_i| ≥ |Q_{2i−1}| (except if T_i = 0, in which case FAST-TWO-SUM still works correctly). To see this, recall that e_i is a multiple of 2^r (with r defined as in Lemma 21), and consider two cases: if |e_i| = 2^r, then T_i is computed exactly and t_i = 0, so |T_i| = 2^r |b| ≥ |Q_{2i−2}| = |Q_{2i−1}|. If |e_i| is larger than 2^r, it is at least twice as large, and hence T_i is at least 2|Q_{2i−2}|, so even if roundoff occurs and t_i is not zero, |T_i| > |Q_{2i−2}| + |t_i| ≥ |Q_{2i−1}|.

Note that if an input component e_i is zero, then two zero output components are produced, and the accumulator value is unchanged (Q_{2i} = Q_{2i−2}). ∎

The following corollary demonstrates that SCALE-EXPANSION is compatible with FAST-EXPANSION-SUM.

Corollary 22  If e is strongly nonoverlapping and round-to-even tiebreaking is used, then h is strongly nonoverlapping.

Proof: Because e is nonoverlapping, h is nonoverlapping by Theorem 19. We have also seen that if e is nonadjacent, then h is nonadjacent and hence strongly nonoverlapping; but e is only guaranteed to be strongly nonoverlapping, and may deviate from nonadjacency.


Figure 14: An adjacent pair of one-bit components in a strongly nonoverlapping input expansion may cause SCALE-EXPANSION to produce an adjacent pair of one-bit components in the output expansion.

Suppose two successive components e_i and e_{i+1} are adjacent. By the definition of strongly nonoverlapping, e_i and e_{i+1} are both powers of two and are not adjacent to e_{i−1} or e_{i+2}. Let s be the integer satisfying e_i = 2^s and e_{i+1} = 2^(s+1). For these components the multiplication of Line 3 is exact, so T_i = 2^s b, T_{i+1} = 2^(s+1) b, and t_i = t_{i+1} = 0. Applying Lemma 1 to Line 4, h_{2i−2} = h_{2i} = 0. However, the components h_{2i−1} and h_{2i+1} may cause difficulty (see Figure 14). We know h is nonoverlapping, but can these two components be adjacent to their neighbors or each other?

The arguments used in Theorem 19 to prove that h is nonadjacent, if e is nonadjacent and round-to-even tiebreaking is used, can be applied here as well to show that h_{2i−1} and h_{2i+1} are not adjacent to any components of h produced before or after them, but they may be adjacent to each other. Assume that h_{2i−1} and h_{2i+1} are adjacent (they cannot be overlapping).

h_{2i+1} is computed in Line 5 from T_{i+1} and Q_{2i+1}. The latter addend is equal to Q_{2i}, because t_{i+1} = 0. Q_{2i} is not adjacent to h_{2i−1}, because they are produced in Line 5 from a FAST-TWO-SUM operation. Hence, the least significant nonzero bit of h_{2i+1} (that is, the bit that causes it to be adjacent to h_{2i−1}) must have come from T_{i+1}, which is equal to 2^(s+1) b. It follows that h_{2i+1} is a multiple of 2^(s+1) ulp(b). Because |e_{i+1}| < 2^(s+2), Lemma 21 implies that |h_{2i+1}| ≤ 2^(s+1) ulp(b). Hence, |h_{2i+1}| = 2^(s+1) ulp(b).

Similarly, because |e_i| < 2^(s+1), Lemma 21 implies that |h_{2i−1}| ≤ 2^s ulp(b). The components h_{2i+1} and h_{2i−1} can only be adjacent in the case |h_{2i−1}| = 2^s ulp(b). In this case, both components are expressible in one bit.

Hence, each adjacent pair of one-bit components in the input can give rise to an isolated adjacent pair of one-bit components in the output, but no other adjacent components may appear. If e is strongly nonoverlapping, so is h. ∎

2.7 Compression and Approximation

The algorithms for manipulating expansions do not usually express their results in the most compact form. In addition to the interspersed zero components that have already been mentioned (and are easily eliminated), it is also common to find components that represent only a few bits of an expansion's value. Such fragmentation rarely becomes severe, but it can cause the largest component of an expansion to be a poor approximation of the value of the whole expansion; the largest component may carry as little as one bit of significance. Such a component may result, for instance, from cancellation during the subtraction of two nearly equal expansions.

The COMPRESS algorithm below finds a compact form for an expansion. More importantly, COMPRESS guarantees that the largest component is a good approximation to the whole expansion. If round-to-even tiebreaking is used, COMPRESS also converts nonoverlapping expansions into nonadjacent expansions.

Priest [21] presents a more complicated "Renormalization" procedure that compresses optimally. Its greater running time is rarely justified by the marginal reduction in expansion length, unless there is a need to put expansions in a canonical form.

Theorem 23  Let e = Σ_{i=1}^{m} e_i be a nonoverlapping expansion of m p-bit components, where m ≥ 3. Suppose that the components of e are sorted in order of increasing magnitude, except that any of the e_i may be zero. Then the following algorithm will produce a nonoverlapping expansion h (nonadjacent if round-to-even tiebreaking is used) such that h = Σ_{i=1}^{n} h_i = e, where the components h_i are in order of increasing magnitude. If h ≠ 0, none of the h_i will be zero. Furthermore, the largest component h_n approximates h with an error smaller than ulp(h_n).

COMPRESS(e)
1   Q ← e_m
2   bottom ← m
3   for i ← m − 1 downto 1
4       (Q, q) ← FAST-TWO-SUM(Q, e_i)
5       if q ≠ 0 then
6           g_bottom ← Q
7           bottom ← bottom − 1
8           Q ← q
9   g_bottom ← Q
10  top ← 1
11  for i ← bottom + 1 to m
12      (Q, q) ← FAST-TWO-SUM(g_i, Q)
13      if q ≠ 0 then
14          h_top ← q
15          top ← top + 1
16  h_top ← Q
17  Set n (the length of h) to top
18  return h
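A C sketch of COMPRESS follows, with the g and h arrays merged into the single output array h (the note below explains why this is possible). It assumes the fast_two_sum routine from the SCALE-EXPANSION sketch, uses zero-based indices, and is illustrative rather than the released implementation.

void fast_two_sum(double a, double b, double *x, double *y);  /* earlier sketch */

/* e has elen >= 1 components in order of increasing magnitude;
 * h needs room for elen components.  Returns the compressed length n. */
int compress_expansion(int elen, const double *e, double *h)
{
    double Q = e[elen - 1];                       /* Line 1 */
    int bottom = elen - 1;                        /* Line 2 */
    double Qnew, q;
    for (int i = elen - 2; i >= 0; i--) {         /* first traversal, Lines 3-8 */
        fast_two_sum(Q, e[i], &Qnew, &q);
        if (q != 0.0) {
            h[bottom--] = Qnew;                   /* store g_bottom in h */
            Q = q;
        } else {
            Q = Qnew;
        }
    }
    h[bottom] = Q;                                /* Line 9 */
    int top = 0;                                  /* Line 10 */
    for (int i = bottom + 1; i < elen; i++) {     /* second traversal, Lines 11-15 */
        fast_two_sum(h[i], Q, &Qnew, &q);
        if (q != 0.0)
            h[top++] = q;
        Q = Qnew;
    }
    h[top++] = Q;                                 /* Line 16 */
    return top;                                   /* Line 17 */
}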

Figure 15 illustrates the operation of COMPRESS. For clarity, g and h are presented as two separate arrays in the COMPRESS pseudocode, but they can be combined into a single working array without conflict by replacing every occurrence of "g" with "h".

Figure 15: Operation of COMPRESS when no zero elimination occurs.

Proof Sketch: COMPRESS works by traversing the expansion from largest to smallest component, then back from smallest to largest, replacing each adjacent pair with its two-component sum. The first traversal, from largest to smallest, does most of the compression. The expansion g_m + g_{m−1} + ··· + g_bottom produced by Lines 1 through 8 has the property that |g_{j−1}| ≤ ulp(g_j) for all j (and thus successive components overlap by at most one bit). This fact follows because the output of FAST-TWO-SUM in Line 4 has the property that q ≤ (1/2) ulp(Q), and the value of q thus produced can only be increased slightly by the subsequent addition of smaller nonoverlapping components.

The second traversal, from smallest to largest, clips any overlapping bits. The use of FAST-TWO-SUM in Line 12 is justified because the property that |g_{i−1}| ≤ ulp(g_i) guarantees that Q (the sum of the components that are smaller than g_i) is smaller than g_i. The expansion h_top + h_{top−1} + ··· + h_1 is nonoverlapping (nonadjacent if round-to-even is used) because FAST-TWO-SUM produces nonoverlapping (nonadjacent) output.

During the second traversal, an approximate total is maintained in the accumulator Q. The component h_{n−1} is produced by the last FAST-TWO-SUM operation that produces a roundoff term; this roundoff term is no greater than (1/2) ulp(h_n). Hence, the sum |h_{n−1} + h_{n−2} + ··· + h_1| (where the components of h are nonoverlapping) is less than ulp(h_n), therefore |e − h_n| < ulp(h_n). ∎

To ensure that h_n is a good approximation to h, only the second traversal is necessary; however, the first traversal is more effective in reducing the number of components. The fastest way to approximate e is to simply sum its components from smallest to largest; by the reasoning used above, the result errs by less than one ulp. This observation is the basis for an APPROXIMATE procedure that is used in the predicates of Section 4.
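A minimal sketch of such an APPROXIMATE procedure in C (names illustrative):

/* Sum the components of a nonoverlapping expansion from smallest to
 * largest; the result errs by less than one ulp. */
double approximate(int elen, const double *e)
{
    double Q = e[0];
    for (int i = 1; i < elen; i++)
        Q += e[i];
    return Q;
}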

Theorem 23 is not the strongest statement that can be made about COMPRESS. COMPRESS is effective even if the components of the input expansion have a certain limited amount of overlap. Furthermore, the bound for |e − h_n| is not tight. (I conjecture that the largest possible relative error is exhibited by a number that contains a nonzero bit every pth bit; note that 1 + (1/2) ulp(1) + (1/4)[ulp(1)]² + ··· cannot be further compressed.) These improvements complicate the proof and are not explored here.

2.8 Other Operations

Distillation is the process of summing k unordered p-bit values. Distillation can be performed by the divide-and-conquer algorithm of Priest [21], which uses any expansion addition algorithm to sum the values in a tree-like fashion as illustrated in Figure 16. Each p-bit addend is a leaf of the tree, and each interior node represents a call to an expansion addition algorithm. If EXPANSION-SUM is used (and zero elimination is not), then it does not matter whether the tree is balanced; distillation will take precisely (1/2) k(k − 1) TWO-SUM operations, regardless of the order in which expansions are combined. If FAST-EXPANSION-SUM is used, the speed of distillation depends strongly on the balance of the tree. A well-balanced tree will yield an O(k log k) distillation algorithm, an asymptotic improvement over distilling with EXPANSION-SUM. As I have mentioned, it is usually fastest to use an unrolled EXPANSION-SUM to create expansions of length four, and FAST-EXPANSION-SUM with zero elimination to sum these expansions.

Figure 16: Distillation of sixteen p-bit floating-point values.
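The following C sketch shows one way to organize a balanced distillation tree by recursion. It assumes a routine fast_expansion_sum_zeroelim (FAST-EXPANSION-SUM with zero elimination, as discussed above) with the signature shown; the recursion strategy and names are illustrative, and error handling is omitted.

#include <stdlib.h>

/* Assumed from Section 2.4: merges two nonoverlapping expansions into h,
 * eliminating zeros, and returns the output length. */
int fast_expansion_sum_zeroelim(int elen, const double *e,
                                int flen, const double *f, double *h);

/* Distill n doubles into the expansion out (room for n components);
 * returns the expansion length. */
int distill(int n, const double *values, double *out)
{
    if (n == 1) {
        out[0] = values[0];
        return 1;
    }
    int half = n / 2;
    double *left  = malloc((size_t)half * sizeof(double));
    double *right = malloc((size_t)(n - half) * sizeof(double));
    int llen = distill(half, values, left);
    int rlen = distill(n - half, values + half, right);
    int outlen = fast_expansion_sum_zeroelim(llen, left, rlen, right, out);
    free(left);
    free(right);
    return outlen;
}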

To find the product of two expansions e and f, use SCALE-EXPANSION (with zero elimination) to form the expansions e f_1, e f_2, ..., then sum these using a distillation tree.

Division cannot always, of course, be performed exactly, but it can be performed to arbitrary precision by an iterative algorithm that employs multiprecision addition and multiplication. Consult Priest [21] for one such algorithm.

The easiest way to compare two expansions is to subtract one from the other, and test the sign of the result. An expansion's sign can be easily tested because of the nonoverlapping property; simply check the sign of the expansion's most significant nonzero component. (If zero elimination is employed, check the component with the largest index.) A nonoverlapping expansion is equal to zero if and only if all its components are equal to zero.
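A sketch of the sign test in C, scanning from the most significant component down (names illustrative):

/* Sign of a nonoverlapping expansion whose components are in increasing
 * order of magnitude: the sign of the most significant nonzero component. */
int expansion_sign(int elen, const double *e)
{
    for (int i = elen - 1; i >= 0; i--) {
        if (e[i] > 0.0) return  1;
        if (e[i] < 0.0) return -1;
    }
    return 0;   /* all components are zero */
}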


3 Adaptive Precision Arithmetic

3.1 Why Adaptivity?

Exact arithmetic is expensive, and when it can be avoided, it should be. Some applications do not need exact results, but require the absolute error of a result to fall below some threshold. If this threshold is known before the computation is performed, it is economical to employ adaptivity by prediction. One writes several procedures, each of which approximates the result with a different degree of precision, and with a correspondingly different speed. Error bounds are derived for each of these procedures; these bounds are typically much cheaper to compute than the approximations themselves, except for the least precise approximation. For any particular input, the application computes the error bounds and uses them to choose the procedure that will attain the necessary accuracy most cheaply.

Sometimes, however, one cannot determine whether a computation will be accurate enough before it is done. An example is when one wishes to bound the relative error, rather than the absolute error, of the result. (A special case is determining the sign of an expression; the result must have relative error less than one.) The result may prove to be much larger than its error bound, and low precision arithmetic will suffice, or it may be so close to zero that it is necessary to evaluate it exactly to satisfy the bound on relative error. One cannot generally know in advance how much precision is needed.

In the context of determinant evaluation for computational geometry, Fortune and Van Wyk [9] suggest using a floating-point filter. An expression is evaluated approximately in hardware precision arithmetic first. Forward error analysis determines whether the approximate result can be trusted; if not, an exact result is computed. If the exact computation is only needed occasionally, the application is slowed only a little.

One might hope to improve this idea further by computing a sequence of increasingly accurate results, testing each one in turn for accuracy. Alas, whenever an exact result is required, one suffers both the cost of the exact computation and the additional burden of computing several approximate results in advance. Fortunately, it is often possible to use intermediate results as stepping stones to more accurate results; work already done is not discarded but is refined.

3.2 Making Arithmetic Adaptive

FAST-TWO-SUM, TWO-SUM, and TWO-PRODUCT each have the feature that they can be broken into two parts: Line 1, which computes an approximate result, and the remaining lines, which calculate the roundoff error. The latter, more expensive calculation can be delayed until it is needed, if it is ever needed at all. In this sense, these routines can be made adaptive, so that they only produce as much of the result as is needed. I describe here how to achieve the same effect with more general expressions.
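As a minimal sketch of this splitting, TWO-SUM can be written as two C routines, with the cheap approximation computed eagerly and the roundoff term deferred (names illustrative):

/* Line 1 of TWO-SUM: the approximate result. */
double two_sum_approx(double a, double b)
{
    return a + b;
}

/* The remaining lines: the roundoff error, computed only if needed.
 * x must be the value returned by two_sum_approx(a, b). */
double two_sum_tail(double a, double b, double x)
{
    double b_virtual = x - a;
    double a_virtual = x - b_virtual;
    double b_roundoff = b - b_virtual;
    double a_roundoff = a - a_virtual;
    return a_roundoff + b_roundoff;
}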

Any expression composed of addition, subtraction, and multiplication operations can be calculated adaptively in a manner that defines a natural sequence of intermediate results whose accuracy it is appropriate to test. Such a sequence is most easily described by considering the tree associated with the expression, as in Figure 17(a). The leaves of this tree represent floating-point operands, and its internal nodes represent operations. Replace each node whose children are both leaves with the sum x_i + y_i, where x_i represents the approximate value of the subexpression, and y_i represents the roundoff error incurred while calculating x_i (Figure 17(b)); then expand the expression to form a polynomial.

In the expanded expression, the terms containing many occurrences of y variables are dominated by terms containing fewer occurrences. As an example, consider the expression (a_x − b_x)² + (a_y − b_y)² (Figure 17), which calculates the square of the distance between two points in the plane. Set a_x − b_x = x_1 + y_1 and a_y − b_y = x_2 + y_2. The resulting expression, expanded in full, is

    (x_1² + x_2²) + (2 x_1 y_1 + 2 x_2 y_2) + (y_1² + y_2²).    (5)

Figure 17: (a) Formula for the square of the distance between two points a and b. (b) The lowest subexpressions in the tree are expressed as the sum of an approximate value and a roundoff error. (c) A simple incremental adaptive method for evaluating the expression. The approximations A_1 and A_2 are generated and tested in turn. The final expansion A_3 is exact. Each A_i includes all terms of size O(ε^(i−1)) or larger, and hence has error no greater than O(ε^i). (d) Incremental adaptivity taken to an extreme. The three subexpression trees T_0, T_1, and T_2 are themselves calculated adaptively. Each B_i contains only the terms needed to reduce its error to O(ε^i).

It is significant that each y_i is small relative to its corresponding x_i. Using standard terminology from forward error analysis [26], the quantity (1/2) ulp(1) is called the machine epsilon, denoted ε. Recall that exact rounding guarantees that |y_i| ≤ ε|x_i|; the quantity ε bounds the relative error err(a ⊛ b)/(a ⊛ b) of any basic floating-point operation. Note that ε = 2^(−p). In IEEE 754 double precision arithmetic, ε = 2^(−53); in single precision, ε = 2^(−24).

Expression 5 can be divided into three parts, having magnitudes of O(1), O(ε), and O(ε²), respectively. Denote these parts T_0, T_1, and T_2. More generally, for any expression expanded in this manner, let T_i be the sum of all products containing i of the y variables, so that T_i has magnitude O(ε^i).


Figure 18: An adaptive method of intermediate complexity that is frequently more efficient than the other two. Each C_i achieves an O(ε^i) error bound by adding an inexpensive correctional term (labeled "ct") to A_{i−1}.

One can obtain an approximation A_j with error no larger than O(ε^j) by computing exactly the sum of the first j terms, T_0 through T_{j−1}. The sequence A_1, A_2, ... of increasingly accurate approximations can be formed incrementally; A_j is the exact sum of A_{j−1} and T_{j−1}. Members of this sequence are generated and tested, as illustrated in Figure 17(c), until one is sufficiently accurate.

A more intricate method is to modify this technique so that the subexpressions T_0, T_1, and T_2 are themselves computed adaptively. To produce an approximation having error of magnitude O(ε^j), one need only approximate each T term with error O(ε^j); these approximations are summed exactly to form a result B_j. Because the term T_k has magnitude at most O(ε^k), it need not be approximated with any better relative error than O(ε^(j−k)). This approach may be economical when adaptive choices can be made by prediction. It can also be used incrementally, as illustrated in Figure 17(d), but the cost is usually unnecessarily large because of unbalanced additions and the overhead of keeping track of many small pieces of the sum.

A better method for incremental adaptivity, which is used to derive the geometric predicates in Section 4, falls somewhere between the two described above. As in the first method, compute the sequence A_1, A_2, ..., and define also A_0 = 0. To obtain an approximation with error no larger than O(ε^j), take A_{j−1} (instead of A_j), and add (exactly) an inexpensive correctional term that approximates T_{j−1} (with ordinary floating-point arithmetic) to form a new approximation C_j, as illustrated in Figure 18. The correctional term reduces the error from O(ε^(j−1)) to O(ε^j), so C_j is nearly as accurate as A_j but takes much less work to compute. This scheme reuses the work done in performing exact calculations, but does not reuse the correctional terms. The first value (C_1) computed by this method is an approximation to T_0; if C_1 is sufficiently accurate, it is unnecessary to compute the y terms, or use any exact arithmetic techniques, at all. (Recall that the y terms are more expensive to compute than the x terms.) This first test is identical to Fortune and Van Wyk's floating-point filter.

This method does more work during each stage of the computation than the first method, but typically terminates one stage earlier. It is slower when the exact result must be computed, but is generally faster in applications that rarely need an exact result. In some cases, it may be desirable to test members of both sequences A and C for accuracy; the predicates defined in Section 4 do so.

The reader may wonder if writing an expression in sum-of-products form isn't inefficient. In ordinary floating-point arithmetic it often is, but it seems to make little difference when using the exact arithmetic algorithms of Section 2. Indeed, the multiplication operation described in Section 2.8 multiplies two expansions by expanding the product into sum-of-products form.

These ideas are not exclusively applicable to the multiple-term approach to arbitrary precision arithmetic. They will work with multiple-digit formats as well, though the details differ.

4 Implementation of Geometric Predicates

4.1 Related Work in Robust Computational Geometry

Most geometric algorithms are not originally designed for robustness at all; they are based on the real RAM model, in which quantities are allowed to be arbitrary real numbers, and all arithmetic is exact. There are several ways a geometric algorithm that is correct within the real RAM model can go wrong in an encounter with roundoff error. The output might be incorrect, but be correct for some perturbation of its input. The result might be usable yet not be valid for any imaginable input. Or, the program may simply crash or fail to produce a result. To reflect these possibilities, geometric algorithms are divided into several classes with varying amounts of robustness: exact algorithms, which are always correct; robust algorithms, which are always correct for some perturbation of the input; stable algorithms, for which the perturbation is small; quasi-robust algorithms, whose results might be geometrically inconsistent, but nevertheless satisfy some weakened consistency criterion; and fragile algorithms, which are not guaranteed to produce any usable output at all. The next several pages are devoted to a discussion of representative research in each class, and of the circumstances in which exact arithmetic and other techniques are or are not applicable. For more extensive surveys of geometric robustness, see Fortune [7] and Hoffmann [13].

Exact algorithms. A geometric algorithm is exact if it is guaranteed to produce a correct result when given an exact input. (Of course, the input to a geometric algorithm may only be an approximation of some real-world configuration, but this difficulty is ignored here.) Exact algorithms use exact arithmetic in some form, whether in the form of a multiprecision library or in a more disguised form.

There are several exact arithmetic schemes designed specifically for computational geometry; most are methods for exactly evaluating the sign of a determinant, and hence can be used to perform the orientation and incircle tests. Clarkson [4] proposes an algorithm for using floating-point arithmetic to evaluate the sign of the determinant of a small matrix of integers. A variant of the modified Gram-Schmidt procedure is used to improve the conditioning of the matrix, so that the determinant can subsequently be evaluated safely by Gaussian elimination. The 53 bits of significand available in IEEE double precision numbers are sufficient to operate on 10 × 10 matrices of 32-bit integers. Clarkson's algorithm is naturally adaptive; its running time is small for matrices whose determinants are not near zero6.

Recently, Avnaim, Boissonnat, Devillers, Preparata, and Yvinec [1] proposed an algorithm to evaluate signs of determinants of 2 × 2 and 3 × 3 matrices of b-bit integers using only b and (b + 1)-bit arithmetic, respectively. Surprisingly, this is sufficient even to implement the insphere test (which is normally written as a 4 × 4 or 5 × 5 determinant), but with a handicap in bit complexity; 53-bit double precision arithmetic is sufficient to correctly perform the insphere test on points having 24-bit integer coordinates.

Fortune and Van Wyk [10, 9] propose a more general approach (not specific to determinants, or even to predicates) that represents integers using a standard multiple-digit technique with digits of radix 2^23 stored as double precision floating-point values. (53-bit double precision significands make it possible to add several products of 23-bit integers before it becomes necessary to normalize.) Rather than use a general-purpose arbitrary precision library, they have developed LN, an expression compiler that writes code to evaluate a specific expression exactly. The size of the operands is arbitrary, but is fixed when LN is run; an expression can be used to generate several functions, each for arguments of different bit lengths. Because the expression and the bit lengths of all operands are fixed in advance, LN can tune the exact arithmetic aggressively, eliminating loops, function calls, and memory management. The running time of a function produced by LN depends on the bit complexity of the inputs. Fortune and Van Wyk report an order-of-magnitude speed improvement over the use of multiprecision libraries (for equal bit complexity). Furthermore, LN gains another speed improvement by installing floating-point filters wherever appropriate, calculating error bounds automatically.

Karasick, Lieber, and Nackman [14] report their experiences optimizing a method for determinant evaluation using rational inputs. Their approach reduces the bit complexity of the inputs by performing arithmetic on intervals (with low precision bounds) rather than exact values. The determinant thus evaluated is also an interval; if it contains zero, the precision is increased and the determinant reevaluated. The procedure is repeated until the interval does not contain zero (or contains only zero), and the result is certain. Their approach is thus adaptive, although it does not appear to use the results of one iteration to speed the next.

Because the Clarkson and Avnaim et al. algorithms are effectively restricted to low precision integer coordinates, I do not compare their performance with that of my algorithms, though theirs may be faster. Floating-point inputs are more difficult to work with than integer inputs, partly because of the potential for the bit complexity of intermediate values to grow more quickly. (The Karasick et al. algorithm also suffers this difficulty, and is probably not competitive with the other techniques discussed here, although it may be the best existing alternative for algorithms that require rational numbers, such as those computing exact line intersections.) When it is necessary for an algorithm to use floating-point coordinates, the aforementioned methods are not currently an option (although it might be possible to adapt them using the techniques of Section 2). I am not aware of any prior literature on exact determinant evaluation that considers floating-point operands, except for one limited example: Ottmann, Thiemt, and Ullrich [20] advocate the use of an accurate scalar product operation, ideally implemented in hardware (though a software distillation algorithm may also be used), as a way to evaluate some predicates such as the 2D orientation test.

Exact determinant algorithms do not satisfy the needs of all applications. A program that computes line intersections requires rational arithmetic; an exact numerator and exact denominator must be stored. If the intersections may themselves become endpoints of lines that generate more intersections, then intersections of greater and greater bit complexity may be generated. Even exact rational arithmetic is not always sufficient; a solid modeler, for instance, might need to determine the vertices of the intersection of two independent solids that have been rotated through arbitrary angles. Yet exact floating-point arithmetic can't even cope with rotating a square 45 degrees in the plane, because irrational vertex coordinates result. This problem might be solvable by storing coordinates in symbolic form and resolving all combinatorial queries with great numerical care, but such a treatment would almost certainly be sorely expensive. For the remainder of this discussion, consideration is restricted to algorithms whose input is geometric (e.g. coordinates are specified) but whose output is purely combinatorial, such as the construction of a convex hull or an arrangement of hyperplanes.

6 The method presented in Clarkson's paper does not work correctly if the determinant is exactly zero, but Clarkson (personal communication) notes that it is easily fixed. "By keeping track of the scaling done by the algorithm, an upper bound can be maintained for the magnitude of the determinant of the matrix. When that upper bound drops below one, the determinant must be zero, since the matrix entries are integers, and the algorithm can stop."

Robust algorithms. There are algorithms that can be made correct with straightforward implementations of exact arithmetic, but suffer an unacceptable loss of speed. An alternative is to relax the requirement for a correct solution, and instead accept a solution that is "close enough" in some sense that depends upon the application. Without exact arithmetic, an algorithm must somehow find a way to produce sensible output despite the fact that geometric tests occasionally tell it lies. No general techniques have emerged yet, although an army of bandages has appeared for specific algorithms, usually ensuring robustness or quasi-robustness through painstaking design and error analysis. The lack of generality of these techniques is not the only limitation of the relaxed approach to robustness; there is a more fundamental difficulty that deserves careful discussion.

When disaster strikes and a real RAM-correct algorithm implemented in floating-point arithmetic fails to produce a meaningful result, it is often because the algorithm has performed tests whose results are mutually contradictory. Figure 19 shows an error that arose in a two-dimensional Delaunay triangulation program I wrote. The program, which employs a divide-and-conquer algorithm presented by Guibas and Stolfi [12], failed in a subroutine that merges two triangulations into one. The geometrically nonsensical triangulation in the illustration was produced.

On close inspection with a debugger, I found that the failure was caused by a single incorrect result of the incircle test. At the bottom of Figure 19 appear four nearly-collinear points whose deviation from collinearity has been greatly exaggerated for clarity. The points a, b, c, and d had been sorted by their x-coordinates, and b had been correctly established (by orientation tests) to lie below the line ac and above the line ad. In principle, a program could deduce from these facts that a cannot fall inside the circle dcb. Unfortunately, the incircle test incorrectly declared that a lay inside, thereby leading to the invalid result.

It is significant that the incircle test was not just wrong about these particular points; it was inconsistent with the "known combinatorial facts". A correct algorithm (that computes a purely combinatorial result) will produce a meaningful result if its test results are wrong but are consistent with each other, because there exists an input for which those test results are correct. Following Fortune [6], an algorithm is robust if it always produces the correct output under the real RAM model, and under approximate arithmetic always produces an output that is consistent with some hypothetical input that is a perturbation of the true input; it is stable if this perturbation is small. Typically, bounds on the perturbation are proven by backward error analysis. Using only approximate arithmetic, Fortune gives an algorithm that computes a planar convex hull that is correct for points that have been perturbed by a relative error of at most O(ε) (where ε is defined as in Section 3.2), and an algorithm that maintains a triangulation that can be made planar by perturbing each vertex by a relative error of at most O(n²ε), where n is the number of vertices. If it seems surprising that a "stable" algorithm cannot keep a triangulation planar, consider the problem of inserting a new vertex so close to an existing edge that it is difficult to discern which side of the edge the vertex falls on. Only exact arithmetic can prevent the possibility of creating an "inverted" triangle.


Figure 19: Top left: A Delaunay triangulation. Top right: An invalid triangulation created due to roundoff error. Bottom: Exaggerated view of the inconsistencies that led to the problem. The algorithm "knew" that the point b lay between the lines ac and ad, but an incorrect incircle test claimed that a lay inside the circle dcb.


One might wonder if my triangulation program can be made robust by avoiding any test whose result can be inferred from previous tests. Fortune [6] explains that

[a]n algorithm is parsimonious if it never performs a test whose outcome has already been determined as the formal consequence of previous tests. A parsimonious algorithm is clearly robust, since any path through the algorithm must correspond to some geometric input; making an algorithm parsimonious is the most obvious way of making it robust. In principle it is possible to make an algorithm parsimonious: since all primitive tests are polynomial sign evaluations, the question of whether the current test is a logical consequence of previous tests can be phrased as a statement of the existential theory of the reals. This theory is at least NP-hard and is decidable in polynomial space [3]. Unfortunately, the full power of the theory seems to be necessary for some problems. An example is the line arrangement problem: given a set of lines (specified by real coordinates (a, b, c), so that ax + by = c), compute the combinatorial structure of the resulting arrangement in the plane. It follows from recent work of Mnev [19] that the problem of deciding whether a combinatorial arrangement is actually realizable with lines is as hard as the existential theory of the reals. Hence a parsimonious algorithm for the line arrangement problem ... seems to require the solution of NP-hard problems.

Because exact arithmetic does not require the solution of NP-hard problems, an intermediate course is possible; one could employ parsimony whenever it is efficient to do so, and resort to exact arithmetic otherwise. Consistency is guaranteed if exact tests are used to bootstrap the "parsimony engine." I am not aware of any algorithms in the literature that take this approach, although geometric algorithms are often designed by their authors to avoid the more obviously redundant tests.

Quasi-robust algorithms. The difficulty of determining whether a line arrangement is realizable suggests that, without exact arithmetic, robustness as defined above may be an unattainable goal. However, sometimes one can settle for an algorithm whose output might not be realizable. I place such algorithms in a bag labeled with the fuzzy term quasi-robust, which I apply to any algorithm whose output is somehow provably distinguishable from nonsense. Milenkovic [18] circumvents the aforementioned NP-hardness result while using approximate arithmetic by constructing pseudo-line arrangements; a pseudo-line is a curve constrained to lie very close to an actual line. Fortune [8] presents a 2D Delaunay triangulation algorithm that constructs, using approximate arithmetic, a triangulation that is nearly Delaunay in a well-defined sense using the pseudo-line-like notion of pseudocircles. Unfortunately, the algorithm's running time is O(n²), which compares poorly with the O(n log n) time of optimal algorithms. Milenkovic's and Fortune's algorithms are both quasi-stable, having small error bounds. Milenkovic's algorithm can be thought of as a quasi-robust algorithm for line arrangements, or as a robust algorithm for pseudo-line arrangements.

The degree of robustness required of an application is typically determined by how the output is used. For instance, many point location algorithms can fail when given a non-planar triangulation. For this very reason, my triangulator crashed after producing the flawed triangulation in Figure 19.

The reader should take three lessons from this section. First, problems due to roundoff can be severe and difficult to solve. Second, even if the inputs are imprecise and the user isn't picky about the accuracy of the output, internal consistency may still be necessary if any output is to be produced at all; exact arithmetic may be required even when exact results aren't. Third, neither exact arithmetic nor clever handling of tests that tell falsehoods is a universal balm. However, exact arithmetic is attractive when it is applicable, because it can be employed by naive program developers without the time-consuming need for careful analysis of a particular algorithm's behavior when faced with imprecision. (I occasionally hear of implementations where more than half the developers' time is spent solving problems of roundoff error and degeneracy.) Hence, efforts to improve the speed of exact arithmetic in computational geometry are well justified.


4.2 The Orientation and Incircle Tests

Let a, b, c, and d be four points in the plane. Define a procedure ORIENT2D(a, b, c) that returns a positive value if the points a, b, and c are arranged in counterclockwise order, a negative value if the points are in clockwise order, and zero if the points are collinear. A more common (but less symmetric) interpretation is that ORIENT2D returns a positive value if c lies to the left of the directed line ab; for this purpose the orientation test is used by many geometric algorithms.

Define also a procedure INCIRCLE(a, b, c, d) that returns a positive value if d lies inside the oriented circle abc. By oriented circle, I mean the unique (and possibly degenerate) circle through a, b, and c, with these points occurring in counterclockwise order about the circle. (If these points occur in clockwise order, INCIRCLE will reverse the sign of its output, as if the circle's exterior were its interior.) INCIRCLE returns zero if and only if all four points lie on a common circle. Both ORIENT2D and INCIRCLE have the symmetry property that interchanging any two of their parameters reverses the sign of their result.

These definitions extend trivially to arbitrary dimensions. For instance, ORIENT3D(a, b, c, d) returns a positive value if d lies below the oriented plane passing through a, b, and c. By oriented plane, I mean that a, b, and c appear in counterclockwise order when viewed from above the plane. (One can apply a left-hand rule: orient your left hand with fingers curled to follow the circular sequence abc. If your thumb points toward d, ORIENT3D returns a positive value.) To generalize the orientation test to dimensionality d, let u_1, u_2, ..., u_d be the unit vectors; ORIENT is defined so that ORIENT(u_1, u_2, ..., u_d, 0) = 1.

In any dimension, the orientation and incircle tests may be implemented as matrix determinants. For three dimensions:

ORIENT3D(a, b, c, d) = | a_x  a_y  a_z  1 |
                       | b_x  b_y  b_z  1 |
                       | c_x  c_y  c_z  1 |                      (6)
                       | d_x  d_y  d_z  1 |

                     = | a_x − d_x  a_y − d_y  a_z − d_z |
                       | b_x − d_x  b_y − d_y  b_z − d_z |        (7)
                       | c_x − d_x  c_y − d_y  c_z − d_z |

INSPHERE(a, b, c, d, e) = | a_x  a_y  a_z  a_x² + a_y² + a_z²  1 |
                          | b_x  b_y  b_z  b_x² + b_y² + b_z²  1 |
                          | c_x  c_y  c_z  c_x² + c_y² + c_z²  1 |     (8)
                          | d_x  d_y  d_z  d_x² + d_y² + d_z²  1 |
                          | e_x  e_y  e_z  e_x² + e_y² + e_z²  1 |

  = | a_x − e_x  a_y − e_y  a_z − e_z  (a_x − e_x)² + (a_y − e_y)² + (a_z − e_z)² |
    | b_x − e_x  b_y − e_y  b_z − e_z  (b_x − e_x)² + (b_y − e_y)² + (b_z − e_z)² |     (9)
    | c_x − e_x  c_y − e_y  c_z − e_z  (c_x − e_x)² + (c_y − e_y)² + (c_z − e_z)² |
    | d_x − e_x  d_y − e_y  d_z − e_z  (d_x − e_x)² + (d_y − e_y)² + (d_z − e_z)² |

These formulae generalize to other dimensions in the obvious way. Expressions 6 and 7 can be shown to be equivalent by simple algebraic transformations, as can Expressions 8 and 9 with a little more effort. These equivalences are unsurprising because one expects the results of any orientation or incircle test not to change if all the points undergo an identical translation in the plane. Expression 7, for instance, follows from Expression 6 by translating each point by −d.

When computing these determinants using the techniques of Section 2, the choice between Expressions 6 and 7, or between 8 and 9, is not straightforward. In principle, Expression 6 seems preferable because it can only produce a 96-component expansion, whereas Expression 7 could produce an expansion having 192 components. These numbers are somewhat misleading, however, because with zero elimination, expansions rarely grow longer than six components in real applications. Nevertheless, Expression 7 takes roughly 25% more time to compute in exact arithmetic, and Expression 9 takes about 50% more time than Expression 8. The disparity likely increases in higher dimensions.

Figure 20: Shaded triangles can be translated to the origin without incurring roundoff error (Lemma 5). In most triangulations, such triangles are the common case.

Nevertheless, the mechanics of error estimation turn the tide in the other direction. Important as a fast exact test is, it is equally important to avoid exact tests whenever possible. Expressions 7 and 9 tend to have smaller errors (and correspondingly smaller error estimates) because their errors are a function of the relative coordinates of the points, whereas the errors of Expressions 6 and 8 are a function of the absolute coordinates of the points.

In most geometric applications, the points that serve as parameters to geometric tests tend to be close to each other. Commonly, their absolute coordinates are much larger than the distances between them. By translating the points so they lie near the origin, working precision is freed for the subsequent calculations. Hence, the errors and error bounds for Expressions 7 and 9 are generally much smaller than for Expressions 6 and 8. Furthermore, the translation can often be done without roundoff error. Figure 20 demonstrates a toy problem: suppose ORIENT2D is used to find the orientation of each triangle in a triangulation. Thanks to Lemma 5, any shaded triangle can be translated so that one of its vertices lies at the origin without roundoff error; the white triangles may or may not suffer from roundoff during such translation. If the complete triangulation is much larger than the portion illustrated, only a small proportion of the triangles (those near a coordinate axis) will suffer roundoff. Because exact translation is the common case, my adaptive geometric predicates test for and exploit this case.


Once a determinant has been chosen for evaluation, there are several methods to evaluate it. A number of methods are surveyed by Fortune and Van Wyk [9], and only their conclusion is repeated here. The cheapest method of evaluating the determinant of a 5 × 5 or smaller matrix seems to be by dynamic programming applied to cofactor expansion. Evaluate the (n choose 2) determinants of all 2 × 2 minors of the first two columns, then the (n choose 3) determinants of all 3 × 3 minors of the first three columns, and so on. All four of my predicates use this method.

4.3 ORIENT2D

My implementation of ORIENT2D computes a sequence of up to four results (labeled A through D) as illustrated in Figure 21. The exact result D may be as long as sixteen components, but zero elimination is used, so a length of two to six components is more common in practice.

A, B, and C are logical places to test the accuracy of the result before continuing. In most applications, the majority of calls to ORIENT2D will end with the floating-point approximation A, which is computed without resort to any exact arithmetic techniques. Although the four-component expansion B, like A, has an error of O(ε), it is an appropriate value to test because B is the exact result if the four subtractions at the bottom of the expression tree are performed without roundoff error (corresponding to the shaded triangles in Figure 20). Because this is the common case, ORIENT2D explicitly tests for it; execution continues only if roundoff occurred during the translation of coordinates and B is smaller than its error bound. The corrected estimate C has an error bound of O(ε²). If C is not sufficiently accurate, the exact determinant D is computed.

There are two unusual features of this test, both of which arise because only the sign of the determinant is needed. First, the correctional term added to B to form C is not added exactly; instead, the APPROXIMATE procedure of Section 2.7 is used to find an approximation B′ of B, and the correctional term is added to B′ with the possibility of roundoff error. The consequent errors may be of magnitude O(εB), which would normally preclude obtaining an error bound of O(ε²). However, the sign of the determinant is only questionable if B is of magnitude O(ε), so an O(ε²) error bound for C can be established.

The second interesting feature is that, if C is not sufficiently accurate, no more approximations are computed before computing the exact determinant. To understand why, consider three collinear points a, b, and c; the determinant defined by these points is zero. If a coordinate of one of these points is perturbed by a single ulp, the determinant typically increases to O(ε). Hence, one might guess that when a determinant is no larger than O(ε²), it is probably zero. This intuition seems to hold in practice for all the predicates considered herein, on both random and "practical" point sets. Determinants that don't stop with approximation C are nearly always zero.

The derivation of error bounds for these values is tricky, so an example is given here. The easiest way to apply forward error analysis to an expression whose value is calculated in floating-point arithmetic is to express the exact value of each subexpression in terms of the computed value plus an unknown error term whose magnitude is bounded. For instance, the error incurred by the computation $x \Leftarrow a \oplus b$ is no larger than $\epsilon|x|$. Furthermore, the error is smaller than $\epsilon|a + b|$. Each of these bounds is useful under different circumstances. If $t$ represents the true value $a + b$, an abbreviated way of expressing these notions is to write $t = x \pm \epsilon|x|$ and $t = x \pm \epsilon|t|$. Henceforth, this notation will be used as shorthand for the relation $t = x + \lambda$ for some $\lambda$ that satisfies $|\lambda| \le \epsilon|x|$ and $|\lambda| \le \epsilon|t|$.

Let us consider the error bound for A. For each subexpression in the expression tree of the orientation test, denote its true (exact) value $t_i$ and its approximate value $x_i$ as follows.


Figure 21: Adaptive calculations used by the 2D orientation test. Dashed boxes represent nodes in the original expression tree.

$$t_1 = a_x - c_x \qquad x_1 = a_x \ominus c_x$$
$$t_2 = b_y - c_y \qquad x_2 = b_y \ominus c_y$$
$$t_3 = a_y - c_y \qquad x_3 = a_y \ominus c_y$$
$$t_4 = b_x - c_x \qquad x_4 = b_x \ominus c_x$$
$$t_5 = t_1 t_2 \qquad x_5 = x_1 \otimes x_2$$
$$t_6 = t_3 t_4 \qquad x_6 = x_3 \otimes x_4$$
$$t_A = t_5 - t_6 \qquad A = x_5 \ominus x_6$$

From these definitions, it is clear that $t_1 = x_1 \pm \epsilon|x_1|$; similar bounds hold for $t_2$, $t_3$, and $t_4$. Observe


Approximation    Error bound
A                $(3\epsilon + 16\epsilon^2) \otimes (|x_5| \oplus |x_6|)$
B'               $(2\epsilon + 12\epsilon^2) \otimes (|x_5| \oplus |x_6|)$
C                $(3\epsilon + 8\epsilon^2) \otimes |B'| \oplus (9\epsilon^2 + 64\epsilon^3) \otimes (|x_5| \oplus |x_6|)$

Table 1: Error bounds for the expansions calculated by ORIENT2D. B' is a $p$-bit approximation of the expansion B, computed by the APPROXIMATE procedure. Note that each coefficient is expressible in $p$ bits.

also that $x_5 = x_1 \otimes x_2 = x_1 x_2 \pm \epsilon|x_5|$. It follows that

$$t_5 = t_1 t_2 = x_1 x_2 \pm (2\epsilon + \epsilon^2)|x_1 x_2|$$
$$\phantom{t_5 = t_1 t_2} = x_5 \pm \epsilon|x_5| \pm (2\epsilon + \epsilon^2)(|x_5| \pm \epsilon|x_5|)$$
$$\phantom{t_5 = t_1 t_2} = x_5 \pm (3\epsilon + 3\epsilon^2 + \epsilon^3)|x_5|.$$

Similarly, $t_6 = x_6 \pm (3\epsilon + 3\epsilon^2 + \epsilon^3)|x_6|$. It may seem odd to be keeping track of terms smaller than $O(\epsilon)$, but the effort to find the smallest machine-representable coefficient for each error bound is justified if it ever prevents a determinant computation from becoming more expensive than necessary. An error bound for A can now be derived.

$$t_A = t_5 - t_6 = x_5 - x_6 \pm (3\epsilon + 3\epsilon^2 + \epsilon^3)(|x_5| + |x_6|)$$
$$\phantom{t_A = t_5 - t_6} = A \pm \epsilon|A| \pm (3\epsilon + 3\epsilon^2 + \epsilon^3)(|x_5| + |x_6|).$$

One can minimize the effect of the term $\epsilon|A|$ by taking advantage of the fact that we are only interested in the sign of $t_A$. One can conclude with certainty that A has the correct sign if
$$(1 - \epsilon)|A| \ge (3\epsilon + 3\epsilon^2 + \epsilon^3)(|x_5| + |x_6|),$$
which is true if
$$|A| \ge (3\epsilon + 6\epsilon^2 + 8\epsilon^3)(|x_5| + |x_6|).$$

This bound is not directly applicable, because its computation will incur roundoff error. To account for this, multiply the coefficient by $(1 + \epsilon)^2$ (a factor of $(1 + \epsilon)$ for the addition of $|x_5|$ and $|x_6|$, and another such factor for the multiplication). Hence, we are secure that the sign of A is correct if
$$|A| \ge (3\epsilon + 12\epsilon^2 + 24\epsilon^3) \otimes (|x_5| \oplus |x_6|).$$
This bound is not directly applicable either, because the coefficient is not expressible in $p$ bits. Rounding up to the next $p$-bit number, we have the coefficient $3\epsilon + 16\epsilon^2$, which should be exactly computed once at program initialization and reused during each call to ORIENT2D.
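For instance, the machine epsilon and the coefficient might be computed at startup along the following lines (a sketch; it assumes round-to-nearest binary arithmetic, and the volatile qualifier is a precaution against the extended-precision registers discussed in Section 5):

    double orient2d_errbound_A;   /* = 3*eps + 16*eps^2 */

    void exact_init(void) {
      volatile double check;
      double eps = 1.0;
      do {                        /* find the largest eps with 1.0 + eps == 1.0 */
        eps *= 0.5;
        check = 1.0 + eps;
      } while (check != 1.0);     /* for IEEE double, eps ends at 2^-53 */
      /* 3 + 16*eps is exactly representable, so this coefficient is exact */
      orient2d_errbound_A = (3.0 + 16.0 * eps) * eps;
    }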

Error bounds for A, B', and C are given in Table 1. The bound for B' takes advantage of Theorem 23, which shows that B' approximates B with relative error less than $2\epsilon$. (Recall from Section 2.7 that the largest component of B might have only one bit of precision.)

These bounds have the pleasing property that they are zero in the common case that all three input points lie on a horizontal or vertical line. Hence, although ORIENT2D usually resorts to exact arithmetic when given collinear input points, it only performs the approximate test (A) in the two cases that occur most commonly in practice.


Double precision ORIENT2D timings in microseconds

Method                          Uniform      Geometric    Nearly
                                Random       Random       Collinear
Approximate (7)                    0.15         0.15         0.16
Exact (6)                          6.56         6.89         6.31
Exact (7)                          8.35         8.48         8.13
Exact (6), MPFUN                  92.85        94.03        84.97
Adaptive A (7), approximate        0.28         0.27         0.22
Adaptive B (7)                                               1.89
Adaptive C (7)                                               2.14
Adaptive D (7), exact                                        8.35
LN adaptive (7), approximate       0.32          n/a
LN adaptive (7), exact              n/a                      4.43

Table 2: Timings for ORIENT2D on a DEC 3000/700 with a 225 MHz Alpha processor. All determinants use the 2D version of either Expression 6 or the more stable Expression 7 as indicated. The first two columns indicate input points generated from a uniform random distribution and a geometric random distribution. The third column considers two points chosen from one of the random distributions, and a third point chosen to be approximately collinear to the first two. Timings for the adaptive tests are categorized according to which result was the last generated. Each timing is an average of 60 or more randomly generated inputs. For each such input, time was measured by a Unix system call before and after 10,000 iterations of the predicate. Individual timings vary by approximately 10%. Timings of Bailey's MPFUN package and Fortune and Van Wyk's LN package are included for comparison.

Compiler effects affect the implementation of ORIENT2D. By separating the calculation of A and the remaining calculations into two procedures, with the former calling the latter if necessary, I reduced the time to compute A by 25%, presumably because of improvements in the compiler's ability to perform register allocation.

Table 2 lists timings for ORIENT2D, given random inputs. Observe that the adaptive test, when it stops at the approximate result A, takes nearly twice as long as the approximate test because of the need to compute an error bound. The table includes a comparison with Bailey's MPFUN [2], chosen because it is the fastest portable and freely available arbitrary precision package I know of. ORIENT2D coded with my (nonadaptive) algorithms is roughly thirteen times faster than ORIENT2D coded with MPFUN.

Also included is a comparison with an orientation predicate for 53-bit integer inputs, created by Fortune and Van Wyk's LN. The LN-generated orientation predicate is quite fast because it takes advantage of the fact that it is restricted to bounded integer inputs. My exact tests cost less than twice as much as LN's; this seems like a reasonable price to pay for the ability to handle arbitrary exponents in the input.

These timings are not the whole story; LN's static error estimate is typically much larger than the runtime error estimate used for adaptive stage A, and LN uses only two stages of adaptivity, so the LN-generated predicates are slower in some applications, as Section 4.5 will demonstrate. It is significant that for 53-bit integer inputs, the multiple-stage predicates will rarely pass stage B because the initial translation is usually done without roundoff error; hence, the LN-generated ORIENT2D usually takes more than twice as long to produce an exact result. It should be emphasized, however, that these are not inherent differences between LN's multiple-digit integer approach and my multiple-term floating-point approach; LN could, in principle, employ the same runtime error estimate and a similar multiple-stage adaptivity scheme.


Figure 22: Adaptive calculations used by the 3D orientation test. Bold numbers indicate the length of an expansion. Only part of the expression tree is shown; two of the three cofactors are omitted, but their results appear as dashed components and expansions.

4.4 ORIENT3D, INCIRCLE, and INSPHERE

Figure 22 illustrates the implementation of ORIENT3D, which is similar to the ORIENT2D implementation. A is the standard floating-point result. B is exact if the subtractions at the bottom of the tree incur no roundoff. C represents a drop in the error bound from $O(\epsilon)$ to $O(\epsilon^2)$. D is the exact determinant.


Approximation    Error bound
A                $(7\epsilon + 56\epsilon^2) \otimes (\alpha_a \oplus \alpha_b \oplus \alpha_c)$
B'               $(3\epsilon + 28\epsilon^2) \otimes (\alpha_a \oplus \alpha_b \oplus \alpha_c)$
C                $(3\epsilon + 8\epsilon^2) \otimes |B'| \oplus (26\epsilon^2 + 288\epsilon^3) \otimes (\alpha_a \oplus \alpha_b \oplus \alpha_c)$

$$\alpha_a = |x_1| \otimes (|x_6| \oplus |x_7|) = |a_z \ominus d_z| \otimes (|(b_x \ominus d_x) \otimes (c_y \ominus d_y)| \oplus |(b_y \ominus d_y) \otimes (c_x \ominus d_x)|)$$
$$\alpha_b = |b_z \ominus d_z| \otimes (|(c_x \ominus d_x) \otimes (a_y \ominus d_y)| \oplus |(c_y \ominus d_y) \otimes (a_x \ominus d_x)|)$$
$$\alpha_c = |c_z \ominus d_z| \otimes (|(a_x \ominus d_x) \otimes (b_y \ominus d_y)| \oplus |(a_y \ominus d_y) \otimes (b_x \ominus d_x)|)$$

Table 3: Error bounds for the expansions calculated by ORIENT3D.

Double precision ORIENT3D timings in microseconds

Method                          Uniform      Geometric    Nearly
                                Random       Random       Coplanar
Approximate (7)                    0.25         0.25         0.25
Exact (6)                         33.30        38.54        32.90
Exact (7)                         42.69        48.21        42.41
Exact (6), MPFUN                 260.51       262.08       246.64
Adaptive A (7), approximate        0.61         0.60         0.62
Adaptive B (7)                                              12.98
Adaptive C (7)                                              15.59
Adaptive D (7), exact                                       27.29
LN adaptive (7), approximate       0.85          n/a
LN adaptive (7), exact              n/a                     18.11

Table 4: Timings for ORIENT3D on a DEC 3000/700. All determinants are Expression 6 or the more stable Expression 7 as indicated. Each timing is an average of 120 or more randomly generated inputs. For each such input, time was measured by a Unix system call before and after 10,000 iterations of the predicate.

Error bounds for the largest component of each of these expansions are given in Table 3, partly in terms of the variables $x_1$, $x_6$, and $x_7$ in Figure 22. The bounds are zero if all four input points share the same $x$-, $y$-, or $z$-coordinate, so only the approximate test is needed in the most common instances of coplanarity.

Table 4 lists timings for ORIENT3D, given random inputs. The error bound for A is expensive to compute, and increases the amount of time required to perform the approximate test in the adaptive case by a factor of two and a half. The gap between my exact algorithm and MPFUN is smaller than in the 2D case, but is still a factor of nearly eight.

Oddly, the table reveals that D is calculated more quickly than the exact result is calculated by the nonadaptive version of ORIENT3D. The explanation is probably that D is only computed when the determinant is zero or very close to zero, hence the lengths of the intermediate expansions are smaller than usual, and the computation time is less. Furthermore, when some of the point coordinates are translated without roundoff error, the adaptive predicate ignores branches of the expression tree that evaluate to zero.

INCIRCLE is implemented similarly to ORIENT3D, as the determinants are similar. The corresponding


Approximation    Error bound
A                $(10\epsilon + 96\epsilon^2) \otimes (\alpha_a \oplus \alpha_b \oplus \alpha_c)$
B'               $(4\epsilon + 48\epsilon^2) \otimes (\alpha_a \oplus \alpha_b \oplus \alpha_c)$
C                $(3\epsilon + 8\epsilon^2) \otimes |B'| \oplus (44\epsilon^2 + 576\epsilon^3) \otimes (\alpha_a \oplus \alpha_b \oplus \alpha_c)$

$$\alpha_a = ((a_x \ominus d_x)^2 \oplus (a_y \ominus d_y)^2) \otimes (|(b_x \ominus d_x) \otimes (c_y \ominus d_y)| \oplus |(b_y \ominus d_y) \otimes (c_x \ominus d_x)|)$$
$$\alpha_b = ((b_x \ominus d_x)^2 \oplus (b_y \ominus d_y)^2) \otimes (|(c_x \ominus d_x) \otimes (a_y \ominus d_y)| \oplus |(c_y \ominus d_y) \otimes (a_x \ominus d_x)|)$$
$$\alpha_c = ((c_x \ominus d_x)^2 \oplus (c_y \ominus d_y)^2) \otimes (|(a_x \ominus d_x) \otimes (b_y \ominus d_y)| \oplus |(a_y \ominus d_y) \otimes (b_x \ominus d_x)|)$$

Table 5: Error bounds for the expansions calculated by INCIRCLE. Squares are approximate.

Double precision INCIRCLE timings in microseconds

Method                          Uniform      Geometric    Nearly
                                Random       Random       Cocircular
Approximate (9)                    0.31         0.28         0.30
Exact (8)                         71.66        83.01        75.34
Exact (9)                         91.71       118.30       104.44
Exact (8), MPFUN                 350.77       343.61       348.55
Adaptive A (9), approximate        0.64         0.59         0.64
Adaptive B (9)                                              44.56
Adaptive C (9)                                              48.80
Adaptive D (9), exact                                       78.06
LN adaptive (9), approximate       1.33          n/a
LN adaptive (9), exact              n/a                     32.44

Table 6: Timings for INCIRCLE on a DEC 3000/700. All determinants are the 2D version of either Expression 8 or the more stable Expression 9 as indicated. Each timing is an average of 100 or more randomly generated inputs, except adaptive stage D. (It is difficult to generate cases that reach stage D.) For each such input, time was measured by a Unix system call before and after 1,000 iterations of the predicate.

error bounds appear in Table 5, and timings appear in Table 6.

Timings for INSPHERE appear in Table 7. This implementation differs from the other tests in that, due to programmer laziness, D is not computed incrementally from B; rather, if C is not accurate enough, D is computed from scratch. Fortunately, C is usually accurate enough.

The LN exact tests have an advantage of a factor of roughly 2.5 for INCIRCLE and 4 for INSPHERE, so the cost of handling floating-point operands is greater with the larger expressions. As with the orientation tests, this cost is mediated by better error bounds and four-stage adaptivity.

The timings for the exact versions of all four predicates show some sensitivity to the distribution of the operands; they take 5% to 30% longer to execute with geometrically distributed operands (whose exponents vary widely) than with uniformly distributed operands. This difference occurs because the intermediate and final expansions are larger when the operands have broadly distributed exponents. The exact orientation predicates are cheapest when their inputs are collinear/coplanar, because of the smaller expansions that result, but this effect does not occur for the exact incircle predicates.


Double precision INSPHERE timings in microseconds

Method                          Uniform      Geometric    Nearly
                                Random       Random       Cospherical
Approximate (9)                    0.93         0.95         0.93
Exact (8)                        324.22       378.94       347.16
Exact (9)                        374.59       480.28       414.13
Exact (8), MPFUN               1,017.56     1,019.89     1,059.87
Adaptive A (9), approximate        2.13         2.14         2.14
Adaptive B (9)                                             166.21
Adaptive C (9)                                             171.74
Adaptive D (9), exact                                      463.96
LN adaptive (9), approximate       2.35          n/a
LN adaptive (9), exact              n/a                    116.74

Table 7: Timings for INSPHERE on a DEC 3000/700. All determinants are Expression 8 or the more stable Expression 9 as indicated. Each timing is an average of 25 or more randomly generated inputs, except adaptive stage D. For each such input, time was measured by a Unix system call before and after 1,000 iterations of the predicate.

4.5 Performance in Two Triangulation Programs

To evaluate the effectiveness of the adaptive tests in applications, I tested them in two of my Delaunay triangulation codes. Triangle [23] is a 2D Delaunay triangulator and mesh generator, publicly available from Netlib, that uses a divide-and-conquer algorithm [16, 12]. Pyramid is a 3D Delaunay tetrahedralizer that uses an incremental algorithm [25]. For both 2D and 3D, three types of inputs were tested: uniform random points, points lying (approximately) on the boundary of a circle or sphere, and a square or cubic grid of lattice points, tilted so as not to be aligned with the coordinate axes. The latter two were chosen for their nastiness. The lattices have been tilted using approximate arithmetic, so they are not perfectly cubical, and the exponents of their coordinates vary enough that LN cannot be used. (I have also tried perfect lattices with 53-bit integer coordinates, but ORIENT3D and INSPHERE never pass stage B; the perturbed lattices are preferred here because they occasionally force the predicates into stage C or D.)

The results for 2D, which appear in Table 8, indicate that the four-stage predicates add about 8% to the total running time for randomly distributed input points, mainly because of the error bound tests. For the more difficult point sets, the penalty may be as great as 30%. Of course, this penalty applies precisely for the point sets that are most likely to cause difficulties when exact arithmetic is not available.

The results for 3D, outlined in Table 9, are less pleasing. The four-stage predicates add about 35% to the total running time for randomly distributed input points; for points distributed approximately on the surface of a sphere, the penalty is a factor of eleven. Ominously, however, the penalty for the tilted grid is uncertain, because the tetrahedralization program using approximate arithmetic failed to terminate. A debugger revealed that the point location routine was stuck in an infinite loop because a geometric inconsistency had been introduced into the mesh due to roundoff error. Robust arithmetic is not always slower after all.

In these programs (and likely in any program), three of the four-stage predicates (INSPHERE being the exception) are faster than their LN equivalents. This is a surprise, considering that the four-stage predicates accept 53-bit floating-point inputs whereas the LN-generated predicates are restricted to 53-bit integer inputs. However, the integer predicates would probably outperform the floating-point predicates if they were to adopt the same runtime error estimate and a similar four-stage adaptivity scheme.

2D divide-and-conquer Delaunay triangulation

                                  Uniform        Perimeter      Tilted
                                  Random         of Circle      Grid
Input sites                       1,000,000      1,000,000      1,000,000
ORIENT2D calls
  Adaptive A, approximate         9,497,314      6,291,742      9,318,610
  Adaptive B                                                      121,081
  Adaptive C                                                          118
  Adaptive D, exact                                                     3
  Average time, microseconds           0.32           0.38           0.33
  LN approximate                  9,497,314      2,112,284            n/a
  LN exact                                       4,179,458            n/a
  LN average time, microseconds        0.35           3.16            n/a
INCIRCLE calls
  Adaptive A, approximate         7,596,885      3,970,796      7,201,317
  Adaptive B                                        50,551        176,470
  Adaptive C                                           120             47
  Adaptive D, exact                                                     4
  Average time, microseconds           0.65           1.11           1.67
  LN approximate                  6,077,062              0            n/a
  LN exact                        1,519,823      4,021,467            n/a
  LN average time, microseconds        7.36          32.78            n/a
Program running time, seconds
  Approximate version                  57.3           59.9           48.3
  Robust version                       61.7           64.7           62.2
  LN robust version                   116.0          214.6            n/a

Table 8: Statistics for 2D divide-and-conquer Delaunay triangulation of several point sets.

5 Caveats

Unfortunately, the arbitrary precision arithmetic routines described herein are not universally portable; both hardware and compilers can prevent them from functioning correctly.

Compilers can interfere by making invalid optimizations based on misconceptions about floating-point arithmetic. For instance, a clever but incorrect compiler might cause expansion arithmetic algorithms to fail by deriving the "fact" that $b_{\mathrm{virtual}}$, computed by Line 2 of FAST-TWO-SUM, is equal to $b$, and optimizing the subtraction away. This optimization would be valid if computers stored arbitrary real numbers, but is incorrect for floating-point numbers. Unfortunately, not all compiler developers are aware of the importance of maintaining correct floating-point language semantics, but as a whole, they seem to be improving. Goldberg [11, §3.2.3] presents several related examples of how carefully designed numerical algorithms can be utterly ruined by incorrect optimizations.
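One defensive measure (a sketch, using the volatile workaround discussed below; it guards against the invalid optimization, though not against double rounding in extended-precision registers) is to write FAST-TWO-SUM so that each intermediate result must be stored:

    /* FAST-TWO-SUM: x + y = a + b exactly, assuming |a| >= |b|. */
    void fast_two_sum(double a, double b, double *x, double *y) {
      volatile double sum = a + b;
      volatile double bvirtual = sum - a;   /* must not be "simplified" to b */
      *x = sum;
      *y = b - bvirtual;                    /* the roundoff error of the sum */
    }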


3D incremental Delaunay tetrahedralization

                                  Uniform        Surface        Tilted
                                  Random         of Sphere      Grid
Input sites                       10,000         10,000         10,000
ORIENT3D counts
  Adaptive A, approximate         2,735,668      1,935,978      5,542,567
  Adaptive B                                                      602,344
  Adaptive C                                                    1,267,423
  Adaptive D, exact                                                28,185
  Average time, microseconds           0.72           0.72           4.12
  LN approximate                  2,735,668      1,935,920            n/a
  LN exact                                               58            n/a
  LN average time, microseconds        0.99           1.00            n/a
INSPHERE counts
  Adaptive A, approximate           439,090        122,273      3,080,312
  Adaptive B                                        180,383        267,162
  Adaptive C                                          1,667        548,063
  Adaptive D, exact
  Average time, microseconds           2.23          96.45          48.12
  LN approximate                    438,194        104,616            n/a
  LN exact                              896        199,707            n/a
  LN average time, microseconds        2.50          70.82            n/a
Program running time, seconds
  Approximate version                   4.3            3.0
  Robust version                        5.8           34.1          108.5
  LN robust version                     6.5           30.5            n/a

Table 9: Statistics for 3D incremental Delaunay tetrahedralization of several point sets. The approximate code failed to terminate on the tilted grid input.

Even floating-point units that use binary arithmetic with exact rounding, including those that conform to the IEEE 754 standard, can have subtle properties that undermine the assumptions of the algorithms. The most common such difficulty is the presence of extended precision internal floating-point registers, such as those on the Intel 80486 and Pentium processors. While such registers usually improve the stability of floating-point calculations, they cause the methods described herein for determining the roundoff of an operation to fail. There are several possible workarounds for this problem. In C, it is possible to designate variables as volatile, implying that they must be stored to memory. This ensures that the variable is rounded to a $p$-bit significand before it is used in another operation. Forcing intermediate values to be stored to memory and reloaded can slow down the algorithms significantly, and there is a worse consequence. Even a volatile variable could be doubly rounded, being rounded once to the internal extended precision format, then rounded again to single or double precision when it is stored to memory. The result after double rounding is not always the same as it would be if it had been correctly rounded to the final precision, and Priest [22, page 103] describes a case wherein the roundoff error produced by double rounding may not be expressible in $p$ bits. This might be alleviated by a more complex (and slower) version of FAST-TWO-SUM. A better solution is to configure one's processor to round internally to double precision. While most processors with internal


extended precision registers can be thus configured, and most compilers provide support for manipulating processor control state, such support varies between compilers and is not portable. Nevertheless, the speed advantage of multiple-term methods makes it well worth the trouble to learn the right incantation to correctly configure your processor.
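For example, on Linux/x86 with glibc, one such incantation might look like the following sketch (the header and macros are specific to that platform and are assumed here; other systems need different, equally unportable code):

    #include <fpu_control.h>

    void round_internally_to_double(void) {
      fpu_control_t cw;
      _FPU_GETCW(cw);                             /* read the x87 control word */
      cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;   /* 53-bit internal precision */
      _FPU_SETCW(cw);                             /* write it back */
    }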

The algorithms do work correctly without special treatment on most current Unix workstations. Nevertheless, users should be careful when trying the routines, or moving to a new platform, to ensure that the underlying assumptions of the method are not violated.

6 Conclusions

The algorithms presented herein are simple and fast; looking at Figure 8, it is difficult to imagine how expansions could be summed with fewer operations without special hardware assistance. Two features of these techniques account for the improvement in speed relative to other techniques, especially for numbers whose precision is only a few components in length. The first is the relaxation of the usual condition that numbers be normalized to fixed digit positions. Instead, one enforces the much weaker condition that expansions be nonoverlapping (or strongly nonoverlapping). Expansions can be summed and the resulting components made nonoverlapping at a cost of six floating-point operations and one comparison per component. It seems unlikely that normalization to fixed digit positions can be done so quickly in a portable way on current processors. The second feature to which I attribute the improved speed is the fact that most packages require expensive conversions between ordinary floating-point numbers and the packages' internal formats. With the techniques Priest and I describe, no conversions are necessary.

The reader may be misled and attribute the whole difference between my algorithms and MPFUN to the fact that I store double precision components, while MPFUN stores single precision digits, and imagine the difference would go away if MPFUN were reimplemented in double precision. Such a belief betrays a misunderstanding of how MPFUN works. MPFUN uses double precision arithmetic internally, and obtains exact results by using digits narrow enough that they can be multiplied exactly. Hence, MPFUN's half-precision digits are an integral part of its approach: to calculate exactly by avoiding roundoff error. The surprise of multiple-term methods is that reasonable speed can be attained by allowing roundoff to happen, then accounting for it after the fact.

As well as being fast, multiple-term algorithms are also reasonably portable, making no assumptions other than that a machine has binary arithmetic with exact rounding (and round-to-even tiebreaking if FAST-EXPANSION-SUM is to be used instead of LINEAR-EXPANSION-SUM). No representation-dependent tricks like bit-masking to extract exponent fields are used. There are still machines that cannot execute these algorithms correctly, but their numbers seem to be dwindling as the IEEE standard becomes entrenched.

Perhaps the greatest limitation of the multiple-term approach is that while it easily extends the precision of floating-point numbers, there is no simple way to extend the exponent range without losing much of the speed. The obvious approach, associating a separate exponent field with each component, is sure to be too slow. A more promising approach is to express each multiprecision number as a multiexpansion consisting of digits of very large radix, where each digit is an expansion coupled with an exponent. In this scheme, the true exponent of a component is the sum of the component's own exponent and the exponent of the expansion that contains it. The fast algorithms described in this report can be used to add or multiply individual digits; digits are normalized by standard methods (such as those used by MPFUN). IEEE double precision values have an exponent range of $-1022$ to $1023$, so one could multiply digits of radix $2^{1000}$ with a simple expansion multiplication algorithm, or digits of radix $2^{2000}$ with a slightly more complicated one that splits each digit in half before multiplying.


The C code I have made publicly available might form the beginning of an extensive library of arithmetic routines similar to MPFUN, but a great deal of work remains to be done. In addition to the problem of expanding the exponent range, there is one problem that is particular to the multiple-term approach: it is not possible to use FFT-based multiplication algorithms without first renormalizing each expansion to a multiple-digit form. This normalization is not difficult to do, but it costs time and puts the multiple-term method at a disadvantage relative to methods that keep numbers in digit form as a matter of course.

As Priest points out, multiple-term algorithms can be used to implement extended (but finite) precision arithmetic as well as exact arithmetic; simply compress and then truncate each result to a fixed number of components. Perhaps the greatest potential of these algorithms lies not with arbitrary precision libraries, but in providing a fast and simple way to extend slightly the precision of critical variables in numerical algorithms. Hence, it would not be difficult to provide a routine that quickly computes the intersection point of two segments with double precision endpoints, correctly rounded to a double precision result. If an algorithm can be made significantly more stable by using double or quadruple precision for a few key values, it may save a researcher from spending a great deal of time devising and analyzing a stabler algorithm; Priest [22, §5.1] offers several examples. Speed considerations may make it untenable to accomplish this by calling a standard extended precision library. The techniques Priest and I have developed are simple enough to be coded directly in numerical algorithms, avoiding function call overhead and conversion costs.
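A sketch of this style of use, with names of my own choosing: a value is kept as an unevaluated sum of two components, new terms are folded in with TWO-SUM, and each result is compressed back to two components.

    /* TWO-SUM: hi + lo = a + b exactly. */
    static void two_sum(double a, double b, double *hi, double *lo) {
      double x = a + b;
      double bvirt = x - a;
      double avirt = x - bvirt;
      *hi = x;
      *lo = (a - avirt) + (b - bvirt);
    }

    /* Add b to the two-component value hi + lo, then truncate back to two
       components.  The extra component preserves increments that an ordinary
       double would lose. */
    static void dd_add(double hi, double lo, double b, double *rhi, double *rlo) {
      double s, e;
      two_sum(hi, b, &s, &e);
      e += lo;
      two_sum(s, e, rhi, rlo);   /* compress: renormalize to two components */
    }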

A useful tool in coding such algorithms would be an expression compiler similar to Fortune and Van Wyk's LN [10, 9], which converts an expression into exact arithmetic code, complete with error bound derivation and floating-point filters. Such a tool might even be able to automate the process of breaking an expression into adaptive stages as described in Section 3.

To see how adaptivity can be used for more than just determining the sign of an expression, suppose one wishes to find, with relative error no greater than 1%, the center $O$ of a circle that passes through the three points a, b, and c. One may use the following expressions.

$$O_x = c_x - \frac{\begin{vmatrix} a_y - c_y & (a_x - c_x)^2 + (a_y - c_y)^2 \\ b_y - c_y & (b_x - c_x)^2 + (b_y - c_y)^2 \end{vmatrix}}{2\begin{vmatrix} a_x - c_x & a_y - c_y \\ b_x - c_x & b_y - c_y \end{vmatrix}}, \qquad
O_y = c_y + \frac{\begin{vmatrix} a_x - c_x & (a_x - c_x)^2 + (a_y - c_y)^2 \\ b_x - c_x & (b_x - c_x)^2 + (b_y - c_y)^2 \end{vmatrix}}{2\begin{vmatrix} a_x - c_x & a_y - c_y \\ b_x - c_x & b_y - c_y \end{vmatrix}}.$$

The denominator of these fractions is precisely the expression computed by ORIENT2D. The computation of $O$ is unstable if a, b, and c are nearly collinear; roundoff error in the denominator can dramatically change the result, or cause a division by zero. Disaster can be avoided, and the desired error bound enforced, by computing the denominator with a variant of ORIENT2D that accepts an approximation only if its error bound is roughly 200 times smaller. A similar adaptive routine could accurately compute the numerators.
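The naive floating-point version of these formulas is sketched below (an illustration only; a robust routine would replace the computation of d with the adaptive variant of ORIENT2D just described, and treat the numerators similarly):

    /* Approximate circumcenter of the triangle (a, b, c); returns 0 if the
       computed denominator vanishes. */
    int circumcenter(const double a[2], const double b[2], const double c[2],
                     double center[2]) {
      double acx = a[0] - c[0], acy = a[1] - c[1];
      double bcx = b[0] - c[0], bcy = b[1] - c[1];
      double d = 2.0 * (acx * bcy - acy * bcx);   /* twice ORIENT2D(a, b, c) */
      if (d == 0.0) return 0;                     /* (nearly) collinear input */
      double alen2 = acx * acx + acy * acy;
      double blen2 = bcx * bcx + bcy * bcy;
      center[0] = c[0] + (bcy * alen2 - acy * blen2) / d;
      center[1] = c[1] + (acx * blen2 - bcx * alen2) / d;
      return 1;
    }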

It might be fruitful to explore whether the methods described by Clarkson [4] and Avnaim et al. [1] can be extended by fast multiprecision methods to handle arbitrary double precision floating-point inputs. One could certainly relax their constraints on the bit complexity of the inputs; for instance, the method of Avnaim et al. could be made to perform the INSPHERE test on 64-bit inputs using expansions of length three. Unfortunately, it is not obvious how to adapt these integer-based techniques to inputs with wildly differing exponents. It is also not clear whether such hybrid algorithms would be faster than straightforward adaptivity. Nevertheless, Clarkson's approach looks promising for larger determinants. Although my methods work well for small determinants, they are unlikely to work well for sizes much larger than $5 \times 5$. Even if one uses Gaussian elimination rather than cofactor expansion (an important adjustment for matrices larger than $5 \times 5$), the adaptivity technique does not scale well with determinants, because of the large number of terms


in the expanded polynomial. Clarkson's technique may be the only economical approach for matrices larger than $10 \times 10$.

Whether or not these issues are resolved in the near future, researchers can make use today of tests for orientation and incircle in two and three dimensions that are correct, fast in most cases, and applicable to single or double precision floating-point inputs. I invite working computational geometers to try my code in their implementations, and hope that it will save them from worrying about robustness so they may concentrate on geometry.

A Why the Tiebreaking Rule is Important

Theorem 13 is complicated by the need to consider the tiebreaking rule. This appendix gives an example that proves that this complication is necessary to ensure that FAST-EXPANSION-SUM will produce nonoverlapping output. If one's processor does not use round-to-even tiebreaking, one might use instead an algorithm that is independent of the tiebreaking rule, such as the slower LINEAR-EXPANSION-SUM in Appendix B.

Section 2.4 gave examples that demonstrate that FAST-EXPANSION-SUM does not preserve the nonoverlapping or nonadjacent properties. The following example demonstrates that, in the absence of any assumption about the tiebreaking rule, FAST-EXPANSION-SUM does not preserve any property that implies the nonoverlapping property. (As we have seen, the round-to-even rule ensures that FAST-EXPANSION-SUM preserves the strongly nonoverlapping property.)

For simplicity, assume that four-bit arithmetic is used. Suppose the round-toward-zero rule is initially in effect. The incompressible expansions $2^{14} + 2^8 + 2^4 + 1$ and $2^{11} + 2^6 + 2^2$ can each be formed by summing their components with any expansion addition algorithm. Summing these two expansions, FAST-EXPANSION-SUM (with zero elimination) yields the expansion $1001 \times 2^{11} + 2^8 + 2^6 + 2^4 + 2^2 + 1$. Similarly, one can form the expansion $1001 \times 2^{10} + 2^7 + 2^5 + 2^3 + 2^1$. Summing these two in turn yields $1101 \times 2^{11} + 2^{10} + 1111 \times 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 1$, which is nonoverlapping but not strongly nonoverlapping.

Switching to the round-to-even rule, suppose FAST-EXPANSION-SUM is used to sum two copies of this expansion. The resulting "expansion" is $111 \times 2^{13} - 2^{11} + 2^{10} - 2^5 + 2^5 - 2^1$, which contains a pair of overlapping components. Hence, it is not safe to mix the round-toward-zero and round-to-even rules, and it is not possible to prove that FAST-EXPANSION-SUM produces nonoverlapping expansions for any tiebreaking rule.

Although the expansion above is not nonoverlapping, it is not particularly bad, in the sense that APPROXIMATE will nonetheless produce an accurate approximation of the expansion's value. It can be proven that, regardless of tiebreaking rule, FAST-EXPANSION-SUM preserves what I call the weakly nonoverlapping property, which allows only a small amount of overlap between components, easily fixed by compression. (Details are omitted here.) I conjecture that the geometric predicates of Section 4 work correctly regardless of tiebreaking rule.

B Linear-Time Expansion Addition without Round-to-Even Tiebreaking

Theorem 24  Let $e = \sum_{i=1}^{m} e_i$ and $f = \sum_{i=1}^{n} f_i$ be nonoverlapping expansions of $m$ and $n$ $p$-bit components, respectively, where $m + n \ge 3$. Suppose that the components of both $e$ and $f$ are sorted in order of increasing magnitude, except that any of the $e_i$ or $f_i$ may be zero. Then the following algorithm will produce a nonoverlapping expansion $h$ such that $h = \sum_{i=1}^{m+n} h_i = e + f$, where the components of $h$ are also in order of increasing magnitude, except that any of the $h_i$ may be zero.

Figure 23: Operation of LINEAR-EXPANSION-SUM. $Q_i + q_i$ maintains an approximate running total. The FAST-TWO-SUM operations in the bottom row exist to clip a high-order bit off each $q_i$ term, if necessary, before outputting it.

LINEAR-EXPANSION-SUM($e$, $f$)
1   Merge $e$ and $f$ into a single sequence $g$, in order of nondecreasing magnitude (possibly with interspersed zeros)
2   $(Q_2, q_2) \Leftarrow$ FAST-TWO-SUM$(g_2, g_1)$
3   for $i \Leftarrow 3$ to $m + n$
4       $(R_i, h_{i-2}) \Leftarrow$ FAST-TWO-SUM$(g_i, q_{i-1})$
5       $(Q_i, q_i) \Leftarrow$ TWO-SUM$(Q_{i-1}, R_i)$
6   $h_{m+n-1} \Leftarrow q_{m+n}$
7   $h_{m+n} \Leftarrow Q_{m+n}$
8   return $h$

$Q_i + q_i$ is an approximate sum of the first $i$ terms of $g$; see Figure 23.
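A direct C transcription of the loop is sketched below (the merge step and zero elimination are omitted, and the compiler caveats of Section 5 apply):

    /* TWO-SUM and FAST-TWO-SUM produce the rounded sum and its exact roundoff. */
    static void two_sum(double a, double b, double *x, double *y) {
      double s = a + b;
      double bvirt = s - a;
      double avirt = s - bvirt;
      *x = s;
      *y = (a - avirt) + (b - bvirt);
    }

    static void fast_two_sum(double a, double b, double *x, double *y) {
      double s = a + b;          /* requires |a| >= |b| */
      double bvirt = s - a;
      *x = s;
      *y = b - bvirt;
    }

    /* g[0..len-1]: the merged sequence, in nondecreasing magnitude (len >= 2).
       h[0..len-1] receives the nonoverlapping sum, smallest component first. */
    void linear_expansion_sum(int len, const double *g, double *h) {
      double Q, q, R, hh;
      fast_two_sum(g[1], g[0], &Q, &q);
      for (int i = 2; i < len; i++) {
        fast_two_sum(g[i], q, &R, &hh);   /* clip a high-order bit off q */
        h[i - 2] = hh;
        two_sum(Q, R, &Q, &q);            /* running total Q + q */
      }
      h[len - 2] = q;
      h[len - 1] = Q;
    }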

Proof:  At the end of each iteration of the for loop, the invariant $Q_i + q_i + \sum_{j=1}^{i-2} h_j = \sum_{j=1}^{i} g_j$ holds. Certainly this invariant holds for $i = 2$ after Line 2 is executed. From Lines 4 and 5, we have that $Q_i + q_i + h_{i-2} = Q_{i-1} + q_{i-1} + g_i$; the invariant follows by induction. (The use of FAST-TWO-SUM in Line 4 will be justified shortly.) This assures us that after Lines 6 and 7 are executed, $\sum_{j=1}^{m+n} h_j = \sum_{j=1}^{m+n} g_j$, so the algorithm produces a correct sum.

The proof that $h$ is nonoverlapping and increasing relies on the fact that the terms of $g$ are summed in order from smallest to largest, so the running total $Q_i + q_i$ never grows much larger than the next component to be summed. Specifically, I prove by induction that the exponent of $Q_i$ is at most one greater than the exponent of $g_{i+1}$, and the components $h_1, \ldots, h_{i-1}$ are nonoverlapping and in order of increasing magnitude (excepting zeros). This statement holds for $i = 2$ because $|Q_2| = |g_1 \oplus g_2| \le 2|g_2| \le 2|g_3|$. To prove the statement in the general case, assume (for the inductive hypothesis) that the exponent of $Q_{i-1}$ is at most one greater than the exponent of $g_i$, and the components $h_1, \ldots, h_{i-2}$ are nonoverlapping and increasing.

$q_{i-1}$ is the roundoff error of the TWO-SUM operation that produces $Q_{i-1}$, so $|q_{i-1}| \le \frac{1}{2}\mathrm{ulp}(Q_{i-1})$. This inequality and the inductive hypothesis imply that $|q_{i-1}| \le \mathrm{ulp}(g_i)$, which justifies the use of a FAST-TWO-SUM operation in Line 4. This operation produces the sum $|R_i + h_{i-2}| = |g_i + q_{i-1}| < (2^p + 1)\,\mathrm{ulp}(g_i)$. Corollary 8(a) implies that $|h_{i-2}| < \mathrm{ulp}(g_i)$. Because $h_1, \ldots, h_{i-2}$ are nonoverlapping, we have the bound $|\sum_{j=1}^{i-2} h_j| < \mathrm{ulp}(g_i) \le \mathrm{ulp}(g_{i+1})$.

Assume without loss of generality that the exponent of $g_{i+1}$ is $p - 1$, so that $\mathrm{ulp}(g_{i+1}) = 1$, and $|g_1|, |g_2|, \ldots, |g_{i+1}|$ are bounded below $2^p$. Because $g$ is formed by merging two nonoverlapping increasing expansions, $|\sum_{j=1}^{i} g_j| < 2^p + 2^{p-1}$. Consider, for instance, if $g_{i+1} = 1000$ (in four-bit arithmetic); then $|\sum_{j=1}^{i} g_j|$ can be no greater than the sum of $1111.1111\ldots$ and $111.1111\ldots$.

Substituting these bounds into the invariant given at the beginning of this proof, we have $|Q_i + q_i| \le |\sum_{j=1}^{i-2} h_j| + |\sum_{j=1}^{i} g_j| < 2^p + 2^{p-1} + 1$, which confirms that the exponent of $Q_i$ is at most one greater than the exponent of $g_{i+1}$.

To show that $h_{i-1}$ is larger than previous components of $h$ (or is zero) and does not overlap them, observe from Figure 23 that $h_{i-1}$ is formed (for $i \ge 3$) by summing $g_{i+1}$, $R_i$, and $Q_{i-1}$. It can be shown that all three of these are either equal to zero or too large to overlap $h_{i-2}$, and hence so is $h_{i-1}$. We have already seen that $|h_{i-2}| < \mathrm{ulp}(g_i)$, which is bounded in turn by $\mathrm{ulp}(g_{i+1})$. It is clear that $|h_{i-2}|$ is too small to overlap $R_i$ because both are produced by a FAST-TWO-SUM operation. Finally, $|h_{i-2}|$ is too small to overlap $Q_{i-1}$ because $|h_{i-2}| \le |q_{i-1}|$ (applying Lemma 1 to Line 4), and $|q_{i-1}| \le \frac{1}{2}\mathrm{ulp}(Q_{i-1})$.

The foregoing discussion assumes that none of the input components is zero. If any of the $g_i$ is zero, the corresponding output component $h_{i-2}$ is also zero, and the accumulator values $Q$ and $q$ are unchanged ($Q_i = Q_{i-1}$, $q_i = q_{i-1}$). $\blacksquare$

References

[1] Francis Avnaim, Jean-Daniel Boissonnat, Olivier Devillers, Franco P. Preparata, and Mariette Yvinec. Evaluating Signs of Determinants Using Single-Precision Arithmetic. Manuscript available from http://www.inria.fr:/prisme/personnel/devillers/anglais/determinant, 1995.

[2] David H. Bailey. A Portable High Performance Multiprecision Package. Technical Report RNR-90-022, NASA Ames Research Center, Moffett Field, California, May 1993.

[3] John Canny. Some Algebraic and Geometric Computations in PSPACE. 20th Annual Symposium on the Theory of Computing (Chicago, Illinois), pages 460–467. Association for Computing Machinery, May 1988.

[4] Kenneth L. Clarkson. Safe and Effective Determinant Evaluation. 33rd Annual Symposium on Foundations of Computer Science (Pittsburgh, Pennsylvania), pages 387–395. IEEE Computer Society Press, October 1992.

[5] T. J. Dekker. A Floating-Point Technique for Extending the Available Precision. Numerische Mathematik 18:224–242, 1971.

[6] Steven Fortune. Stable Maintenance of Point Set Triangulations in Two Dimensions. 30th Annual Symposium on Foundations of Computer Science, pages 494–499. IEEE Computer Society Press, 1989.


[7] Steven Fortune. Progress in Computational Geometry. Directions in Geometric Computing (R. Martin, editor), chapter 3, pages 81–128. Information Geometers Ltd., 1993.

[8] Steven Fortune. Numerical Stability of Algorithms for 2D Delaunay Triangulations. International Journal of Computational Geometry & Applications 5(1–2):193–213, March–June 1995.

[9] Steven Fortune and Christopher J. Van Wyk. Efficient Exact Arithmetic for Computational Geometry. Proceedings of the Ninth Annual Symposium on Computational Geometry, pages 163–172. Association for Computing Machinery, May 1993.

[10] Steven Fortune and Christopher J. Van Wyk. Static Analysis Yields Efficient Exact Integer Arithmetic for Computational Geometry. To appear in Transactions on Mathematical Software, 1996.

[11] David Goldberg. What Every Computer Scientist Should Know About Floating-Point Arithmetic. ACM Computing Surveys 23(1):5–48, March 1991.

[12] Leonidas J. Guibas and Jorge Stolfi. Primitives for the Manipulation of General Subdivisions and the Computation of Voronoi Diagrams. ACM Transactions on Graphics 4(2):74–123, April 1985.

[13] Christoph M. Hoffmann. The Problems of Accuracy and Robustness in Geometric Computation. Computer 22(3):31–41, March 1989.

[14] Michael Karasick, Derek Lieber, and Lee R. Nackman. Efficient Delaunay Triangulation Using Rational Arithmetic. ACM Transactions on Graphics 10(1):71–91, January 1991.

[15] Donald Ervin Knuth. The Art of Computer Programming: Seminumerical Algorithms, second edition, volume 2. Addison Wesley, Reading, Massachusetts, 1981.

[16] Der-Tsai Lee and Bruce J. Schachter. Two Algorithms for Constructing a Delaunay Triangulation. International Journal of Computer and Information Sciences 9(3):219–242, 1980.

[17] Seppo Linnainmaa. Analysis of Some Known Methods of Improving the Accuracy of Floating-Point Sums. BIT 14:167–202, 1974.

[18] Victor Milenkovic. Double Precision Geometry: A General Technique for Calculating Line and Segment Intersections using Rounded Arithmetic. 30th Annual Symposium on Foundations of Computer Science, pages 500–505. IEEE Computer Society Press, 1989.

[19] N. E. Mnev. The Universality Theorems on the Classification Problem of Configuration Varieties and Convex Polytopes Varieties. Topology and Geometry - Rohlin Seminar (O. Ya. Viro, editor), Lecture Notes in Mathematics, volume 1346, pages 527–543. Springer-Verlag, 1988.

[20] Thomas Ottmann, Gerald Thiemt, and Christian Ullrich. Numerical Stability of Geometric Algorithms. Proceedings of the Third Annual Symposium on Computational Geometry, pages 119–125. Association for Computing Machinery, June 1987.

[21] Douglas M. Priest. Algorithms for Arbitrary Precision Floating Point Arithmetic. Tenth Symposium on Computer Arithmetic (Los Alamitos, California), pages 132–143. IEEE Computer Society Press, 1991.

[22] Douglas M. Priest. On Properties of Floating Point Arithmetics: Numerical Stability and the Cost of Accurate Computations. Ph.D. thesis, Department of Mathematics, University of California at Berkeley, Berkeley, California, November 1992. Available by anonymous FTP to ftp.icsi.berkeley.edu as pub/theory/priest-thesis.ps.Z.


[23] Jonathan Richard Shewchuk. Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator. First Workshop on Applied Computational Geometry. Association for Computing Machinery, May 1996.

[24] Pat H. Sterbenz. Floating-Point Computation. Prentice-Hall, Englewood Cliffs, New Jersey, 1974.

[25] David F. Watson. Computing the n-dimensional Delaunay Tessellation with Application to Voronoi Polytopes. Computer Journal 24(2):167–172, 1981.

[26] James Hardy Wilkinson. Rounding Errors in Algebraic Processes. Prentice-Hall, Englewood Cliffs, New Jersey, 1963.