Edinburgh Research Explorer

Low-cost Deterministic C++ Exceptions for Embedded Systems

Citation for published version:
Renwick, J, Spink, T & Franke, B 2019, Low-cost Deterministic C++ Exceptions for Embedded Systems. in Proceedings of the 28th International Conference on Compiler Construction (CC2019). ACM, Washington, DC, USA, pp. 76-86, 28th International Conference on Compiler Construction, Washington D.C., United States, 16/02/19. https://doi.org/10.1145/3302516.3307346

Digital Object Identifier (DOI): 10.1145/3302516.3307346
Link: Link to publication record in Edinburgh Research Explorer
Document Version: Peer reviewed version
Published In: Proceedings of the 28th International Conference on Compiler Construction (CC2019)

General rights
Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights.

Take down policy
The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact openaccess@ed.ac.uk providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 13. Mar. 2020


Low-Cost Deterministic C++ Exceptions for Embedded Systems

James Renwick
University of Edinburgh
Edinburgh, UK
s1227500@sms.ed.ac.uk

Tom Spink
University of Edinburgh
Edinburgh, UK
tspink@inf.ed.ac.uk

Björn Franke
University of Edinburgh
Edinburgh, UK
bfranke@inf.ed.ac.uk

ABSTRACT
The C++ programming language offers a strong exception mechanism for error handling at the language level, improving code readability, safety, and maintainability. However, current C++ implementations are targeted at general-purpose systems, often sacrificing code size, memory usage, and resource determinism for the sake of performance. This makes C++ exceptions a particularly undesirable choice for embedded applications, where code size and resource determinism are often paramount. Consequently, embedded coding guidelines either forbid the use of C++ exceptions, or embedded C++ tool chains omit exception handling altogether. In this paper, we develop a novel implementation of C++ exceptions that eliminates these issues and enables their use for embedded systems. We combine existing stack unwinding techniques with a new approach to memory management and run-time type information (RTTI). In doing so we create a compliant C++ exception handling implementation, providing bounded runtime and memory usage, while reducing code size requirements by up to 82% and incurring only a minimal runtime overhead for the common case of no exceptions.

CCS CONCEPTS
• Software and its engineering → Error handling and recovery; Software performance; Language features;

KEYWORDS
C++, exceptions, error handling

ACM Reference Format:
James Renwick, Tom Spink, and Björn Franke. 2019. Low-Cost Deterministic C++ Exceptions for Embedded Systems. In Proceedings of the 28th International Conference on Compiler Construction (CC '19), February 16-17, 2019, Washington, DC, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3302516.3307346

1 INTRODUCTION
One of the major benefits of C++ is its promise of zero-cost abstractions and its rule of "you don't pay for what you don't use". This allows the use of various safer and more productive modern programming idioms, without the performance drawbacks with which their use is often associated. However, C++'s wide usage has not been without issues. For many of the different domains within industry, the use of exceptions in C++, a core part of the language, has been disallowed. For example, the Google C++ Style Guide [7], the Joint Strike Fighter Air Vehicle C++ Coding Standards, and the Mars Rover flight software [14] do not allow the use of exceptions at all, while MISRA C++ specifies detailed rules for their use [1].

While there are many advantages to the use of exceptions in C++, perhaps their greatest disadvantage lies in their implementation. The current exception implementations were designed to prioritise performance in the case where exceptions do not occur, motivated primarily by the idea that exceptions are exceptional and thus should rarely happen. This prioritisation may suit applications for which errors are rare and execution time is paramount, but it comes at the cost of other factors, notably increased binary sizes (up to 15% in some cases [4]), memory usage, and a loss of determinism in both execution time and memory when exceptions do occur. These drawbacks make exceptions unsuitable for use in embedded systems, where binary size and determinism are often as important as, if not more important than, overall execution time [12, 18]. However, the C++ language, its standard library, and many prominent 3rd party libraries, such as ZMQ [10] and Boost [2], rely on exceptions. As a result, embedded software developed in C++ generally disables exceptions [8], at the cost of reduced disciplined error handling through C++ exceptions and the exclusion of available libraries. The Arm C++ compiler tool chain, for example, and in contrast to most other compilers, disables exceptions by default [5].

In this paper we develop a novel C++ exception implementation, which reduces memory and run-time costs and provides determinism. The key idea is to apply and extend a recently developed low-cost stack unwinding mechanism [19] for C++ exceptions. In our novel C++ exception implementation we make use of a stack-allocated object that records the necessary run-time information for throwing an exception, such as the type and size of the exception object. This state is allocated in a single place and is passed between functions via an implicit function parameter injected into functions which support exceptions. The state is initialised by throw expressions, and is re-used to enable re-throwing. catch statements use the state in order to determine whether they can handle the exception. After a call to a function which may throw exceptions, a run-time check is inserted to test whether the state contains an active exception. We have implemented our new C++ exception implementation in the LLVM compiler Clang and evaluated it on both an embedded Arm as well as a general-purpose x86-64 platform. Using targeted micro-benchmarks and full applications, we demonstrate binary size decreases up to 82% over existing implementations, and substantial performance improvements when handling and propagating exceptions. We demonstrate that our novel C++ exception implementation achieves memory and execution time determinism, thus making it better suited for embedded applications with specified worst-case execution time requirements.

    void C() {
        Bar bar;
        throw 0;
    }

    void B() {
        Foo foo;
        C();
    }

    void A() {
        try {
            B();
        } catch (int& p) {
            // Catch exception
        }
    }

Figure 1: Example code showing a possible scenario in which exceptions are used. Function A has a try/catch block that encompasses the call to B. B allocates a local object and calls C, which also allocates a local object and throws an exception.

1.1 Motivating Example
Consider the motivating example in Figure 1, where function A has a try/catch block that encompasses a call to B. B allocates a local object and calls C, which also allocates a local object and throws an exception. During exception handling, stack unwinding is performed, i.e. stack frames for functions C and B must be removed from the call stack and local objects destroyed. Finally, the catch handler (found in A) is invoked. Figure 3 shows this operation in action for both the standard implementation and our proposed implementation. In each case the stack must be unwound from the throw site in function C back to the catch handler in function A. This involves three steps:
(1) Local objects in C are destroyed and callee-saved registers are restored.
(2) Local objects in B are destroyed and callee-saved registers are restored.
(3) Control is transferred to the catch handler in A.

The most common implementation of C++ exception handling, in use by both e.g. GCC and Clang, makes use of table-based stack unwinding. It seeks to eliminate any runtime overhead in the case where no exceptions are thrown. This is achieved by storing abstract instruction sequences, required for identifying the catch handler, restoring the call stack and machine state, and invoking object destructors, in Unwind Tables, which are generated at compile-time and embedded in the final program binary. When an exception is thrown, the exception object is allocated on the heap and a 2-stage process for stack unwinding is started. The first stage, shown in Figure 2, is concerned with finding a suitable catch handler. Using the unwind tables, an embedded abstract machine executes the prepackaged instruction sequence to identify the catch handler by scanning over the call stack to determine how far back the stack should be unwound. The second stage, shown in the top half of Figure 3, then performs the actual stack unwinding. Again, an abstract machine implements a so-called personality routine, which uses instruction sequences stored in the unwind tables to destroy local objects, remove stack frames from the call stack, update and commit the machine state, and eventually invoke the catch handler.

[Figure 2: diagram of the call stack (frames A, B and C, each holding a return address and callee-saved registers, with local object foo in B and bar in C) next to the unwind tables, which are used to resolve the catch handler found in A.]

Figure 2: The standard exception handling implementation requires an initial phase to scan over the call stack, looking for a catch handler. This phase does not unwind the stack; it identifies how far back the stack should be unwound.

The unwind tables used by the conventional C++ exception implementations can grow large and become complex. Furthermore, these tables typically encode operations to perform during stack unwinding (such as restoring registers, running object destructors, etc.), which must be executed by an integrated software abstract machine. While this approach eliminates runtime overhead when no exceptions occur, it increases the memory footprint of the application and introduces non-determinism due to the abstract machine used at runtime for exception handling.

Our novel C++ exception implementation, shown in the bottom half of Figure 3, eliminates the need for unwind tables and the associated abstract machine. Instead, we introduce a stack-allocated exception state object, which is allocated at the outermost exception propagation barrier (function A in our example) and is populated at throw time (function C). This state object contains bookkeeping information for propagating the exception between the throw and catch sites. We pass this object into any invoked function via an implicit function parameter, and use it to initiate and orchestrate exception handling. After each return from a function call, the compiler inserts an additional check of the exception state to determine if an exception is in progress and, if so, uses a standard function return mechanism for step-by-step stack unwinding until we reach the matching catch statement (stages 1, 2 and 3 in Figure 3).
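The flow described above can be emulated by hand in standard C++. The following is a minimal sketch, not the output of the modified compiler: the `ExceptionState` struct with only `active` and `value` fields, the explicit `state` parameter, and the `trace` string used to observe destructor order are all illustrative assumptions.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified exception state: a flag and an int payload.
struct ExceptionState {
    bool active = false;
    int value = 0;
};

// Records the order of events so the unwinding sequence can be observed.
std::string trace;

struct Bar { ~Bar() { trace += "~Bar "; } };
struct Foo { ~Foo() { trace += "~Foo "; } };

// Emulates `void C() { Bar bar; throw 0; }`.
void C(ExceptionState* state) {
    assert(!state->active);  // this sketch supports one exception in flight
    Bar bar;
    state->value = 0;        // "throw 0": populate the state...
    state->active = true;    // ...mark it active...
    return;                  // ...and leave via the normal return path (destroys bar)
}

// Emulates `void B() { Foo foo; C(); }`.
void B(ExceptionState* state) {
    Foo foo;
    C(state);
    if (state->active) return;  // check inserted after the call; returning destroys foo
    trace += "B-continues ";
}

// Emulates A, which owns the state and contains the try/catch block.
bool A() {
    ExceptionState state;       // allocated at the propagation barrier
    B(&state);
    if (state.active) {         // check inserted after the call
        state.active = false;   // entering the catch handler clears the flag
        trace += "caught";
        return true;
    }
    return false;
}
```

Calling `A()` leaves `trace` holding the three stages in order: `bar` is destroyed, then `foo`, then the handler runs. No unwind tables or personality routine are involved; every step is an ordinary function return.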

1.2 Contributions
In this paper, we contribute to the development of a novel C++ exception implementation particularly suitable for use in embedded systems. We address the following issues:
(1) Reducing the binary size increase caused by exceptions.
(2) Permitting the bounding of memory usage and execution time when using exceptions, i.e. supporting determinism.
(3) Maintaining or improving the performance of exceptions.
Although we focus on embedded systems, we will ensure that these goals also apply to general-purpose applications.

[Figure 3: two rows of call-stack diagrams for frames A, B and C, one column per unwinding stage.
Standard implementation (top), driven by the unwind tables: (1) personality routine: destroy bar, update machine state, finish; (2) personality routine: destroy foo, update machine state, finish; (3) personality routine: commit machine state, continue in catch handler.
Our implementation (bottom), driven by the exception state: (1) standard function return sequence: destroy bar, restore saved registers, return to B; (2) standard function return sequence: destroy foo, restore saved registers, return to A; (3) check active flag in exception state, continue in catch handler.]

Figure 3: Stages involved in stack unwinding for the standard exception implementation (top) and our novel scheme (bottom).

2 DETERMINISTIC C++ EXCEPTIONS
Stack unwinding forms a core part of the mechanisms underpinning exceptions. It is also responsible for their poor execution times and increased binary sizes, and is part of the reason for exceptions being non-deterministic.

Sutter [19] has only very recently proposed re-using the existing function return mechanism in place of the traditional stack unwinding approaches, requiring no additional data or instructions to be stored, and little to no overhead in unwinding the stack. Furthermore, by removing stack unwinding's runtime reliance on tables encoded in the program binary itself, the issue of time and spatial determinism is solved. As it is possible to determine the worst-case execution times for programs not using exceptions, it follows that exception implementations making use of the same return mechanism must also be deterministic in stack unwinding.

Given these clear advantages, we have based our implementation on this design, with a core difference being the replacement of their use of registers with function parameters, allowing for much easier interoperability with C code, which can simply provide the parameter as necessary.

A limitation with [19] is that they require all exceptions be of the same type, leaving much of the standard exception-handling functionality up to the user. Our novel approach includes a method of throwing and catching exceptions of arbitrary types (as with existing exception handling), without imposing any meaningful execution-time penalties when exceptions do not occur. We also reduce the size of run-time type information (RTTI) and maintain determinism over existing implementations.

     1  void foo(__exception_t* __exception) throws {
     2      // Exception state '__exception' assigned to by throw
     3      throw SomeError();
     4  }
     5  int main() {
     6      // State automatically allocated in main
     7      __exception_t __exception_state;
     8      foo(&__exception_state);
     9      // check for exception
    10      if (__exception_state.active) goto catch;
    11  }

Figure 4: Pseudocode showing injected exception state object (line 7) and parameter (line 1).

Our scheme makes use of a stack-allocated object that records the necessary run-time information for throwing an exception, such as the type and size of the exception object. This state is allocated in a single place and is passed between functions via an implicit function parameter injected into functions which support exceptions. The state is initialised by throw expressions, and is re-used to enable re-throwing. catch statements use the state in order to determine whether they can handle the exception. After a call to a function which may throw exceptions, a run-time check is inserted to test whether the state contains an active exception.


    1  auto safe_divide(float dividend, float divisor) throws {
    2      if (divisor == 0)
    3          throw std::invalid_argument("Divide by zero");
    4      else
    5          return dividend / divisor;
    6  }

Figure 5: Example function marked with our proposed throws exception specifier (line 1), allowing exception propagation.

Figure 4 gives an example of the variables the compiler will automatically inject during code generation. Line 1 shows the normally hidden implicit exception state parameter __exception. Line 3 assigns values corresponding to the SomeError type to the exception state parameter. Line 7 shows the automatically allocated __exception_state variable, emitted by all noexcept functions. Line 8 shows the exception state object being automatically passed to the throws function foo. Line 10 shows the check inserted after each call to a throwing function to test for an active exception.

The design of this implementation performs almost all of the exception handling directly within the functions being executed, allowing the optimiser to work most effectively. By implementing exceptions on top of normal execution flow, code used in returning from functions is re-used when unwinding the stack following exceptions. This helps to reduce code size and unpredictability, and overcomes the largest roadblock in achieving deterministic execution time. This approach combines the exception-handling mechanism of C++ with the inter-mixed error-checking and non-error-handling code integral to the design of [9].

2.1 Throws Exception Specifier
A fundamental issue is to decide which functions should take the implicit exception parameter that our design requires (i.e. which functions could throw exceptions). The natural choice would be to apply it to all functions with C++ linkage, except for those marked noexcept, in keeping with the current exception implementation. However, the issue with this approach would be that C++ and C functions would then have different calling conventions. Whilst C++ functions can in general be called without special handling from C, this is only due to the coincidence that the two calling conventions are identical.

With our proposed scheme, the decision to have C++ functions support exceptions by default is impractical, as C++ has certain language holes when it comes to specifying linkage. Specifically, neither classes, structs, nor their members, such as function pointers, can have a specified linkage. Therefore, we propose to have functions be declared noexcept by default. This change in default exception specification preserves compatibility with C, at the cost of a language change in C++.

Like noexcept, we require that functions that might throw encode this into their signature (similar to [19]), as this impacts on how they are called. Thus, to indicate that functions can throw exceptions, we introduce an exception specifier (that is part of the function prototype), as shown in Figure 5.

Name        Description
type        The "type id" variable address identifying the exception object's type
base_types  Pointer to an array of "type id"s identifying the exception object's base class types
ptr         Whether the exception object is a pointer to a non-pointer type
size        The size in bytes of the exception object
align       The alignment in bytes of the exception object
ctor        The address of the object's move or copy constructor
dtor        The address of the object's destructor
buffer      The address of the exception object
active      Whether the exception is currently active

Table 1: Exception State Fields

2.2 Throwing Exceptions and Exception State
The exception state object exists to hold run-time information on the exception currently in progress. Every function marked noexcept, including the program main function and functions used as thread entry-points, allocates and initialises its own exception state object, effectively making noexcept functions boundaries beyond which exceptions cannot propagate.

The address of this object is passed to any of the called functions that are marked throws, allowing the exception to propagate down the call chain as far as the first noexcept function, where, if not handled, the program is terminated.

Following a throw expression, the fields in the exception state object are populated at the throw-site with values known at compile-time, corresponding to the exception object given in the expression. The active flag is set to true, the exception object is moved into a buffer, and its address is assigned to the buffer field of the exception state.
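To make the throw-site sequence concrete, the sketch below fills in a state object by hand for one exception type. It is an illustration under stated assumptions rather than the code our compiler emits: the C++ field types chosen for the Table 1 fields, the fixed-size static `exception_buffer`, and the helper names `move_SomeError`, `destroy_SomeError` and `throw_SomeError` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical layout mirroring Table 1; the field types are assumptions.
struct ExceptionState {
    const void* type = nullptr;                    // address of the "type id" variable
    const void* const* base_types = nullptr;       // "type id"s of the base classes
    bool ptr = false;                              // pointer to a non-pointer type?
    std::size_t size = 0;                          // size in bytes
    std::size_t align = 0;                         // alignment in bytes
    void (*ctor)(void* dst, void* src) = nullptr;  // move or copy constructor
    void (*dtor)(void* obj) = nullptr;             // destructor
    void* buffer = nullptr;                        // address of the exception object
    bool active = false;                           // is an exception in progress?
};

struct SomeError { int code; };

// Stand-in for the compiler-emitted type id variable of SomeError.
char typeid_for_SomeError;

// Stand-in for the storage that holds the in-flight exception object.
alignas(std::max_align_t) unsigned char exception_buffer[64];

void move_SomeError(void* dst, void* src) {
    new (dst) SomeError(std::move(*static_cast<SomeError*>(src)));
}

void destroy_SomeError(void* obj) {
    static_cast<SomeError*>(obj)->~SomeError();
}

// Emulates what would be generated for `throw SomeError{code};`.
void throw_SomeError(ExceptionState* state, int code) {
    static_assert(sizeof(SomeError) <= sizeof(exception_buffer), "buffer too small");
    assert(state != nullptr);
    state->type = &typeid_for_SomeError;  // everything below is known at compile time
    state->base_types = nullptr;          // SomeError has no base classes
    state->ptr = false;
    state->size = sizeof(SomeError);
    state->align = alignof(SomeError);
    state->ctor = &move_SomeError;
    state->dtor = &destroy_SomeError;
    state->buffer = new (exception_buffer) SomeError{code};  // object placed in the buffer
    state->active = true;
}
```

Apart from constructing the exception object itself, the throw site reduces to a handful of constant stores, which is what keeps its cost bounded.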

If the throw expression is directly within a try statement, the code jumps to the first catch block for handling. Otherwise, provided the current function is marked as throws, the exception is propagated: execution is transferred to the function epilogue as if returning normally, but the return value for the function, if any, is not initialised.

During the return sequence, local objects have their destructors executed as if a standard function return had occurred, efficiently unwinding the stack. The exception being marked active is not made visible to destructors, as destructors are necessarily noexcept in this context ("If a destructor called during stack unwinding exits with an exception, std::terminate is called (15.5.1)." [17]). In this way, the exception state is untouched until entering the initial catch block. If the current function is not marked as throws, propagation is replaced by an immediate call to std::terminate to terminate the program.

After each function call to a function marked as throws, a check is automatically inserted by the compiler to test whether the active flag has been set, and thus whether the exception has been propagated. This is the largest overhead as a result of our changes, requiring a test for each function called regardless of whether an exception was thrown or not. There is perhaps a good case to be made for its necessity, however. By indicating that a function can throw via throws, the developer is encoding into the function signature the fact that an error may occur during its execution. Thus, regardless of performance requirements, some test must be made for that error in a correct program.

2.3 Handling Exceptions
Once an exception has been thrown, and if it does not result in a call to std::terminate, control is transferred to the first available catch block.

Before each catch block executes, it first compares for value equality the "type id" of the exception type it handles and the type marked in the type field of the exception state. If they are equal, or if the type the catch block handles is a base type of the type in the type field, then the catch block is entered. Otherwise, execution jumps to the next catch block and repeats this process until the catch block is a catch-all or all blocks have been exhausted. If no blocks remain, the exception continues propagation as described previously.
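The matching rule can be written as a short, bounded loop. The sketch below is illustrative only: the reduced `ExceptionState`, the `catch_matches` helper, and in particular the null-terminated encoding of the `base_types` array are assumptions made for the example.

```cpp
#include <cassert>

// Reduced, hypothetical exception state: only the fields used for matching.
struct ExceptionState {
    const void* type = nullptr;               // type id of the thrown object
    const void* const* base_types = nullptr;  // null-terminated base type ids (assumed encoding)
};

// Returns true if a catch block handling `handled` accepts the exception in `state`.
bool catch_matches(const ExceptionState& state, const void* handled) {
    assert(handled != nullptr);
    if (state.type == handled) return true;         // exact type: one address comparison
    if (state.base_types == nullptr) return false;  // no base classes to try
    for (const void* const* base = state.base_types; *base != nullptr; ++base) {
        if (*base == handled) return true;          // handler names a base class
    }
    return false;                                   // fall through to the next catch block
}

// Stand-ins for compiler-emitted type id variables.
char typeid_for_Base;
char typeid_for_Derived;
char typeid_for_Other;

// Base list for Derived, which (in this example) inherits from Base only.
const void* const derived_bases[] = {&typeid_for_Base, nullptr};
```

Each comparison is a pointer equality test, and the loop length is fixed by the number of base classes of the thrown type, so the worst-case cost of selecting a handler is known statically.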

Once a catch block is executed, the active flag of the exception state is cleared, memory is allocated on the stack for the exception object, and the move or copy constructor is invoked to transfer ownership of the exception object into the current block.

If an exception were to occur during the move or copy operation, std::terminate is called to satisfy the C++ specification, which states: "If the exception handling mechanism, after completing evaluation of the expression to be thrown but before the exception is caught, calls a function that exits via an exception, std::terminate is called (15.5.1)."

Catch filters can take two forms: (1) by value, where the catch variable represents an object allocated within the catch block; or (2) by reference, where the catch variable is a reference to the exception object allocated elsewhere.

In the latter case, an additional unnamed variable is introduced in which to store the exception object, and its allocation and initialisation are performed as described above, using the equivalent non-reference type. The catch variable reference is then bound to this unnamed variable. In this way, even when the exception object is caught by reference, it is still owned by the catch block.

Following a successful move/copy of the exception object, the destructor is called on the original exception object instance within the buffer. Naturally, as destructors are necessarily noexcept, were the destructor to exit with an exception, again std::terminate would be called.

With the catch variable successfully initialised, the catch block then commences execution. Once complete, execution moves to a point past the final catch block in the current scope, and the catch variable is destroyed as usual.
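The sequence on entry to a catch block can be sketched for a by-reference handler as follows. This is a hand-written approximation, not generated code: the reduced `ExceptionState`, a `ctor` field that returns the address of the newly constructed object, and the helpers `move_string`, `destroy_string` and `handle_by_reference` are assumptions introduced for the example, with `std::string` standing in for the exception type.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Reduced, hypothetical exception state: only the fields a catch block uses.
struct ExceptionState {
    void* (*ctor)(void* dst, void* src) = nullptr;  // move/copy constructor thunk
    void (*dtor)(void* obj) = nullptr;              // destructor thunk
    void* buffer = nullptr;                         // in-flight exception object
    bool active = false;
};

void* move_string(void* dst, void* src) {
    return new (dst) std::string(std::move(*static_cast<std::string*>(src)));
}

void destroy_string(void* obj) {
    using S = std::string;
    static_cast<S*>(obj)->~S();
}

// Emulates entering `catch (std::string& e) { ... }`; returns what the body observed.
std::string handle_by_reference(ExceptionState& state) {
    assert(state.active);
    state.active = false;                                          // clear the active flag
    alignas(std::string) unsigned char slot[sizeof(std::string)];  // unnamed stack variable
    auto* owned = static_cast<std::string*>(state.ctor(slot, state.buffer));  // take ownership
    state.dtor(state.buffer);                                      // destroy the original
    state.buffer = nullptr;
    std::string& e = *owned;                                       // bind the catch variable
    std::string seen = e;                                          // body of the catch block
    destroy_string(owned);                                         // catch variable destroyed
    return seen;
}
```

After the handler has run, the buffer no longer holds a live object and the flag is clear, so the state can be reused by a later throw.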

2.4 Re-Throwing
One of the issues with maintaining and passing references to a single instance of exception state via the exception state parameter is that more than one exception may be active at a time and, crucially, the initial exception may need to be re-thrown following the intermediary exception.

    try { throw 1; }
    catch (int i) {
        try { throw 2; }
        catch (...) { }
        throw;
    }

Figure 6: Re-throwing following a nested exception. The first exception's state must be persisted, and not overwritten by the second exception, to allow the re-throw.

Figure 6 gives an example. The first exception is thrown on line 1, caught on line 2, then a second exception is thrown on line 3 and caught by the catch-all on line 4. The first exception is then re-thrown on line 5, which requires the first exception's state to have persisted across the second, inner throw/catch.

To allow for this, a local copy of the exception state is allocated when entering a try block, and if an exception occurs within the try block, the exception modifies the local copy. If the same exception is not handled by the try block's corresponding catch block or blocks, the local state is copied back to either the outer state, or to the function's exception state as passed in the parameter, for exception propagation.

If the inner exception is handled correctly, only its local exception state is modified; thus the outer exception state is maintained and can be used, for example, to successfully re-throw the outer exception. As we only initialise the active field of the local copy, its cost is a simple stack allocation plus zero-initialisation of a boolean flag.
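A hand-lowered version of Figure 6 shows why the per-try local state is sufficient. The sketch is a simplification under assumptions: the payload is reduced to an `int`, and the names `ExceptionState`, `figure6`, `first` and `second` are hypothetical.

```cpp
#include <cassert>

// Hypothetical, simplified exception state with an int payload.
struct ExceptionState {
    bool active = false;
    int value = 0;
};

// Emulates Figure 6 inside a throwing function; `outer` is the state passed by the caller.
void figure6(ExceptionState* outer) {
    assert(outer != nullptr);
    ExceptionState first;              // local state for the outer try block
    first.value = 1;                   // try { throw 1; }
    first.active = true;
    if (first.active) {                // catch (int i)
        first.active = false;
        ExceptionState second;         // local state for the inner try block
        second.value = 2;              // try { throw 2; }
        second.active = true;
        if (second.active) {           // catch (...) { }
            second.active = false;     // handled: only the inner local state changed
        }
        *outer = first;                // throw; copies the first exception's state outward
        outer->active = true;
        return;                        // propagate through the normal return path
    }
}
```

Because the second throw only ever writes to `second`, the details of the first exception are still intact when the re-throw copies them into the caller's state.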

C++ already has a mechanism for obtaining and re-throwing exceptions by making and passing around an explicit copy of (or reference to) the exception state, via the use of std::current_exception(), which, when called, returns an instance of std::exception_ptr. To provide similar functionality in our implementation, we propose an equivalent mechanism, namely std::get_exception_obj. This function returns a reference to a std::exception_obj instance stored on the stack.

What makes std::exception_obj novel is that its type is templated such that developers can specify a custom allocator, giving them control over where the exception object is to be allocated when being transferred to a new instance. However, we restrict the use of std::get_exception_obj such that it can only be called from directly within a catch block. This makes sense from a value-oriented perspective: functions can only inspect and access their own variables, and not their callers', unless shared explicitly via parameters. Since in our implementation the exception object is directly owned by the catch block, functions called from within that catch block would not normally have access to the object.

Re-throwing in our implementation is achieved by calling the std::rethrow function and passing a std::exception_obj instance. This is intuitive, arguably more so than the bare throw expression, and crucially makes clear the cost of re-throwing, as std::exception_obj instances must be explicitly moved around.

Figure 7 shows a complete example. On lines 1 and 2, a function is declared taking a std::exception_obj instance using the default allocator. This object is move-constructed from the corresponding std::exception_obj instance representing the exception object within the catch block on line 12, allocating the exception object as instructed. The object is then moved and stored in a global variable result.exception on line 5, which is then re-thrown on line 15.

     1  void store_exception(
     2      std::exception_obj<std::allocator<const char>> obj)
     3  throws {
     4      // Move the exception object into a global
     5      result.exception = std::move(obj);
     6  }
     7  // ---------------------------------
     8  try {
     9      // ...
    10  } catch (...) {
    11      // Move the exception object out of the catch block
    12      store_exception(std::move(std::get_exception_obj()));
    13  }
    14  // The exception can be re-thrown from a global
    15  std::rethrow(std::move(result.exception));

Figure 7: Example of re-throwing the exception object using our proposed std::rethrow function and std::exception_obj type.

To enable re-throwing, the std::exception_obj instance also contains the exception state object, which is already populated with the details of the exception. Thus, all that is required to re-throw that exception is to copy those fields to the current exception state, restore the exception object to the buffer, set the active flag, and then attempt to propagate the exception as if throwing normally.

2.5 Throwing Destructors
Although generally advised against by the C++ specification, destructors can throw and can be explicitly marked as throws. Were such destructors to be called during stack unwinding, they would incorrectly share the same exception state as the uncaught exception. In particular, the active flag would be set, leading to incorrect execution flow when function calls from within the destructor performed their exception check.

To avoid this, we modify the compiler such that when throwing destructors are entered, they allocate and initialise their own local exception state as if they were a noexcept function. This local state will be used within the destructor. If at any point an exception is unhandled within the destructor, such that it would exit with that exception, it first checks the exception state parameter to see if it is marked as active. If so, two unhandled exceptions are in progress, and thus the program must be terminated with std::terminate, per the C++ standard (15.5.1, as quoted above).

As throwing destructors may be called when an exception is not active, as part of normal logic, these additional measures constitute overhead in the case where exceptions are not thrown. However, throwing destructors are rare, and are conceptually odd: exceptions exist to indicate the failure of the operation, but in C++ destruction always succeeds.
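The exit check for a throwing destructor described in this section can be modelled as a small decision function. This is a hedged sketch: the exception_state type is reduced to its active flag, and the outcome enum and the function name are hypothetical. A real implementation would call std::terminate rather than return a value.

```cpp
#include <cassert>

// Hypothetical exception state, reduced to the one field this check needs.
struct exception_state { bool active = false; };

enum class outcome { returned, propagated, terminated };

// Models the exit path of a throwing destructor. `parent` is the state passed
// in by the caller; `body_threw` models whether an exception escaped the
// destructor body.
outcome throwing_destructor_exit(exception_state& parent, bool body_threw) {
    exception_state local;      // fresh local state, as for a noexcept function
    local.active = body_threw;  // calls inside the destructor check `local` only
    if (!local.active) {
        return outcome::returned;    // normal exit; parent state untouched
    }
    if (parent.active) {
        return outcome::terminated;  // two exceptions in flight: std::terminate()
    }
    parent = local;                  // hand the exception to the caller
    return outcome::propagated;
}
```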

2.6 Exception Object Type Identification

One of the disadvantages of current exception implementations is their use of RTTI in matching the types of thrown exception objects to a corresponding catch handler. Current approaches make use of complete RTTI objects in exception binaries, emitting a full std::type_info instance for each type, which includes a unique string identifying it. Either the address of these std::type_info instances or the string identifier is used to compare against the corresponding std::type_info instance for the catch handler. Inheritance adds an additional layer of complexity, as each candidate type must also enumerate and compare its public base classes when matching against the catch block type.

Seeking to reduce binary sizes, we introduce an optimisation that both reduces the space required to represent types and addresses concerns with execution time determinism. This solution is predicated upon the fact that each type requires solely some unique identifier, and not the full std::type_info instance.

For each unique type of exception object thrown, the compiler will automatically emit a char-sized variable with minimum alignment, whose identifier is composed of the name of the type prefixed with __typeid_for_. Pointer types will have an integer prefix corresponding to the number of levels of indirection. Type qualifiers such as const and volatile are ignored.

This "type id" variable will use its address to represent a unique type identifier, with which types might be compared for equality. In a throw expression, the type field of the exception state will be assigned the address of the "type id" matching the thrown type.

To handle polymorphic types, for each unique type of exception thrown, an additional array of addresses will be emitted, containing the addresses of the "type id" variables for the base types. In similar fashion to the "type id" variables, its identifier is prefixed with __typeid_bases_for_, followed by the name of the type. For those types which do not have base classes, a single shared array with the identifier __typeid_empty_bases will be emitted, containing only a null entry.

Testing whether a given type is a base class of another type is then a matter of getting the pointer to the base class array (which is known at compile-time) and iterating through its fixed-size list of bases. When an exception is thrown, its corresponding base class array will be assigned to the base_types field of the exception state.

This same iteration of base classes applies when throwing and catching pointer types. While the type field will contain a "type id" identifying the pointer, the base_types field will correspond to the base type array of the pointee type.

In both the "type id" and base-array cases, to ensure both emission and uniqueness, these variables will be marked as being weak symbols, an indication given to the linker that only one instance of each variable will be preserved in the final binary.
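The type identification scheme of this section can be imitated by hand in standard C++. In this sketch the "type id" variables and base arrays are written out manually rather than emitted by the compiler, the leading double underscores are dropped because such names are reserved, and the types Base and Derived, the exception_state layout, and the function catch_matches are assumptions made for illustration. Plain globals stand in for the weak symbols the compiler would emit.

```cpp
#include <cassert>

struct Base {};
struct Derived : Base {};

// One char-sized variable per thrown type; only its address matters.
char typeid_for_Base;
char typeid_for_Derived;

// Null-terminated arrays of base-class "type id" addresses.
const char* const typeid_empty_bases[] = { nullptr };
const char* const typeid_bases_for_Derived[] = { &typeid_for_Base, nullptr };

// Hypothetical subset of the exception state set by a throw expression.
struct exception_state {
    const void* type;               // "type id" of the thrown type
    const char* const* base_types;  // base array of the thrown type
};

// A handler matches on an exact "type id" or on any entry in the base array.
bool catch_matches(const exception_state& st, const void* handler_type) {
    if (st.type == handler_type) {
        return true;
    }
    for (const char* const* b = st.base_types; *b != nullptr; ++b) {
        if (static_cast<const void*>(*b) == handler_type) {
            return true;
        }
    }
    return false;
}
```

Matching reduces to pointer comparisons over a list whose length is fixed at compile time, which is what makes the cost of a catch-handler test bounded.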

bool type_is_base(void* type, const char** bases) noexcept {
    for (; bases[0] != nullptr; bases++) {
        if (static_cast<const void*>(bases[0]) == type) {
            return true;
        }
    }
    return false;
}

Figure 8: Example iteration over base entries; the candidate base is passed to the type parameter, and the array of base "type id"s to the bases parameter.

2.7 Exception Object Allocation

When throwing an exception, the exception object must be allocated such that it is constructed in the scope where it was thrown, but is also accessible to the scope where it is caught. It is not enough to simply allocate the object on the stack, despite the fact that the exception will cause the stack to be unwound, as local object destructors will run as the stack is unwound, potentially overwriting the exception object. Allocating the exception object on the heap is a solution, but the allocation may result in system calls, which (a) potentially have unbounded execution time and (b) may fail if memory is exhausted. As our target is to support deterministic exceptions, standard heap allocation is not an option.

We propose a statically sized buffer using a simple constant-time stack allocator for our exception object, with a heap-backed fall-back upon buffer exhaustion for those applications that cannot compute, or do not require, run-time determinism. The size of this buffer will be given a default value by the ABI in use, but will be optionally application-overridden at link-time via a linker command-line parameter, to optimise or prevent any and all heap allocation.

In the worst-case scenario, our solution matches the performance of current exception implementations, as at least that of g++ uses a similar buffer. However, as we free our exception objects as early as possible, we expect usage of this buffer to be less than in other implementations, and with our novel ability to customise its size (and in combination with earlier improvements), we uniquely offer deterministic allocation.
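A fixed-size buffer with a stack allocator and heap fall-back, as proposed above, might look roughly as follows. This is a sketch under assumptions: the class name, the 256-byte capacity, and the use of malloc for the fall-back are illustrative choices, and the paper's buffer size is set by the ABI or the linker rather than by a constant in code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>

// Hypothetical exception object buffer with LIFO (stack) allocation.
class exception_buffer {
public:
    // Constant time: bump an offset. Falls back to the heap when the buffer
    // is exhausted, which gives up determinism for that allocation.
    void* allocate(std::size_t size) {
        std::size_t rounded = round_up(size);
        if (rounded <= capacity - used_) {
            void* p = storage_ + used_;
            used_ += rounded;
            return p;
        }
        return std::malloc(size);
    }

    // Constant time for buffer allocations: move the offset back.
    // Objects must be freed in reverse order of allocation.
    void deallocate(void* p, std::size_t size) {
        if (owns(p)) {
            used_ -= round_up(size);
        } else {
            std::free(p);
        }
    }

    // True when p points into the fixed buffer rather than the heap.
    bool owns(const void* p) const {
        const unsigned char* c = static_cast<const unsigned char*>(p);
        std::less<const unsigned char*> before;  // total order over pointers
        return !before(c, storage_) && before(c, storage_ + capacity);
    }

    std::size_t used() const { return used_; }

private:
    static constexpr std::size_t capacity = 256;
    static constexpr std::size_t align = alignof(std::max_align_t);

    static std::size_t round_up(std::size_t n) {
        return (n + align - 1) / align * align;
    }

    alignas(std::max_align_t) unsigned char storage_[capacity];
    std::size_t used_ = 0;
};
```

An application that sizes the buffer for its largest possible set of live exception objects never reaches the malloc branch, which is the case the text describes as deterministic.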

3 STANDARDS COMPLIANCE

In this section, we consider the compliance of our exception implementation against the ABI specification and the C++ standards document.

3.1 ABI Compliance

Existing ABI specifications mandate the use of tables and manual stack unwinding. However, since using the existing function return mechanism is a superior method of achieving stack unwinding, our implementation deviates entirely from their design. Therefore, we will not evaluate our solution against any existing ABI specifications. A side effect of this is that our own ABI is significantly easier to implement, as the stack unwinding and exception handling code is generated in a platform-independent way by the compiler.

3.2 C++ Standard Compliance

In general, our implementation conforms to the C++ standard's requirements for exceptions. However, we have observed four clauses where there are deviations.

Clause 18.1.4 describes the interaction between the exception object and std::exception_ptr, which is designed for exception objects allocated on the heap. This is therefore not suited to our stack-based implementation; however, our replacement provides similar functionality.

Clause 18.2.2 requires local temporaries, such as return values, to have their destructors called when unwinding. However, clang's existing exception implementation does not conform to this part of the specification, and as a result our implementation also does not conform. This is a bug in the clang compiler (on which our implementation is based), and if this is repaired, our implementation will also be repaired.

Clauses 18.2.3 and 18.2.4 require the destruction of base class instances and sub-objects already constructed when an exception occurs during a constructor. This feature is not yet implemented and will be addressed in future work. However, it does not require any conceptual changes: following from its simplicity and similarity to local object destruction, we foresee no issues that might occur in its implementation.

4 EVALUATION

We have implemented our novel scheme in the clang compiler, and present results showing the performance of our implementation compared to the existing implementation.

4.1 Experimental Set-up

For our experiments, we used two different machines with two different platforms:

• x86-64 Machine: Desktop computer running Ubuntu 18.04, Intel 8700K 3.7-4.7GHz, 32GB 3200MHz DRAM, Samsung 980 Evo NVMe storage.

• Arm Machine: Raspberry Pi 1 Model B running Raspbian Stretch Lite, ARM1176JZF-S ARMv6 700MHz, 512 MB DRAM, 16GB SD storage.

All binaries were built with identical flags, other than those determining the type of exceptions to use. Run-time type information is disabled (-fno-rtti), the symbol table is removed (-s), and -O3 optimisation with link-time optimisation (-flto) is employed. Clang 7.0 was used as the compiler, along with libstdc++ and libgcc on Arm, and libc++ and libunwind on x86.

4.1.1 Benchmark Application. Lacking an obvious existing realistic benchmark with which to test exceptions, we instead developed our own microbenchmark. xmlbench is a simple XML parser, targeting ease of use and functionality as a real XML parser. It organically includes a good mixture of functions that can and cannot throw exceptions, and also those which do throw exceptions and those which are "exception neutral", i.e. allowing exception propagation. By supplying different XML files as input to the parser, we can precisely control the rate of exceptions. Errors in input XML syntax and external conditions (such as the input file missing or an out-of-memory condition) can be influenced in a realistic way. This approach to benchmarking is similar to [15], who also use an XML parser as a representative application.


Platform   Implementation   Size
Arm        Standard         59188 bytes
Arm        Deterministic    22068 bytes
x86-64     Standard         105312 bytes
x86-64     Deterministic    18600 bytes

Table 2: Final binary sizes for xmlbench benchmark with the standard and deterministic exception implementations.

4.2 Results

The following sections detail the results of our observations on various facets of the exception handling infrastructure.

4.2.1 Unwind Library Overhead. Statically-linked binaries participate in whole-program optimisation, and as such have the exception handling and unwind functions removed when using our implementation. This allows for a direct comparison against binaries that use the standard implementation.

Table 2 shows the results of the binary built with both the standard and the deterministic exception implementations. The results for both x86-64 and Arm show substantial decreases in binary size when using our deterministic implementation. This measurement indicates how large the unwinding code is. Arm's unwinding code is smaller, but by removing it we see a decrease in binary size of 36.3 Kilobytes, a 62.7% reduction for our program. x86-64 has a much larger unwind library, with our implementation saving 84.7 Kilobytes, an 82.3% reduction in size for identical functionality.
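The savings quoted above follow directly from the sizes in Table 2. The short helper below reproduces the arithmetic; the struct and function names are made up for this example, and it assumes that a Kilobyte means 1024 bytes, which is the reading that matches the reported 36.3 and 84.7 figures.

```cpp
#include <cassert>
#include <cmath>

// Result of comparing two binary sizes (names are illustrative).
struct size_saving {
    double kib;      // bytes saved, expressed in units of 1024 bytes
    double percent;  // bytes saved relative to the standard binary
};

size_saving saving(double standard_bytes, double deterministic_bytes) {
    double saved = standard_bytes - deterministic_bytes;
    return { saved / 1024.0, 100.0 * saved / standard_bytes };
}

// Arm:    saving(59188, 22068)  -> 37120 bytes saved
// x86-64: saving(105312, 18600) -> 86712 bytes saved
```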

On both platforms, the code section (.text) grows due to the inclusion of additional instructions for checking the exception state. On x86-64, the removal of the unwind tables compensates for this growth, with a reduction of 0.3%. However, on Arm, missed optimization opportunities by the compiler lead to a net increase of roughly 8%.

4.2.2 Application Benchmark Performance. We generated two sets of XML files, one with syntax errors and one without, and parsed them with xmlbench. The syntax error is contained within the deepest element, resulting in an exception thrown with the stack at its largest number of function calls. For each run, we measured the total run-time of the xmlbench program, including the start-up time.

Figure 9 shows the average execution time in microseconds per XML element parsed by our benchmark, for both the Arm (Figure 9a) and x86-64 (Figure 9b) platforms. On the Arm platform, the results show there is little difference between execution times, except for the standard exception implementation generally performing the worst of the four when faced with an exception. As was expected, the standard implementation performs better when no exception is raised with only 156 elements, as our implementation must check for active exceptions. This difference disappears as more elements are processed.

On the x86-64 platform, our deterministic implementation takes slightly longer per element than the standard implementation for larger numbers of nodes. This is primarily due to the overhead of checking whether an exception has occurred. As is expected, the difference in time for our implementation depending on whether an exception was thrown or not is very small, particularly in comparison to the same for the standard implementation, which at its peak has a 2.3% overhead.

Elems    Throw   Time (σ), Standard Impl.   Time (σ), Determ. Impl.

Arm
156              42.7ms (0.6ms)             41.7ms (0.6ms)
156              43.7ms (0.6ms)             43.0ms (0.0ms)
3906             158.7ms (2.1ms)            162.7ms (1.5ms)
3906             160.7ms (1.5ms)            162.0ms (2.0ms)

x86-64
3905             3.3ms (0.6ms)              3.3ms (0.6ms)
3905             3.3ms (0.6ms)              3.3ms (0.6ms)
97655            68.0ms (1.0ms)             70.7ms (0.6ms)
97655            69.0ms (1.0ms)             70.3ms (0.6ms)

Table 3: Overall execution time summary for xmlbench on the Arm and x86-64 platforms.

The benchmark clearly shows how throwing a single exception increases execution time for the standard exception implementation, given the lengthy unwind procedure. However, with our implementation, the improvements to exception handling speed are not quite enough to compensate for the overhead in the absence of exceptions. With more than a single exception, and particularly given the results for 488281 elements, where the trend suggests that our implementation scales better than the existing one, it should be enough for our solution to out-perform the current one.

As shown in Table 3, our implementation creates a small additional execution time overhead in comparison to the standard implementation; however, the difference is negligible and within the standard deviation.

4.2.3 Exception Propagation. To test exception propagation, we used a different benchmark, which simply called the same function recursively as many times as indicated by Stack Frames, with a local object having a simple custom destructor within each frame. After Stack Frames calls, an exception is thrown and caught by reference from the first invocation. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 4.
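A benchmark of the shape described above could be written as follows with standard C++ exceptions. This is a reconstruction for illustration, not the authors' benchmark source: the names Local, recurse, and time_propagation are invented, the destructor merely counts its calls, and std::chrono::steady_clock measures wall-clock time where the paper measured CPU time.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>

int destructor_calls = 0;

// Local object with a simple custom destructor, one per stack frame.
struct Local {
    ~Local() { ++destructor_calls; }
};

// Calls itself until `frames_left` frames exist, then throws.
void recurse(int frames_left) {
    Local local;
    if (frames_left <= 1) {
        throw std::runtime_error("deepest frame");
    }
    recurse(frames_left - 1);
}

// Returns the elapsed nanoseconds between the first call and arrival in the
// catch handler, or -1 if no exception arrived.
long long time_propagation(int stack_frames) {
    auto start = std::chrono::steady_clock::now();
    try {
        recurse(stack_frames);
    } catch (const std::exception&) {  // caught by reference
        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::nanoseconds>(
                   stop - start).count();
    }
    return -1;
}
```

Calling time_propagation repeatedly for 100, 1000, and 10000 frames and averaging the results would give figures comparable in kind to those in Table 4 for whichever exception implementation the compiler provides.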

The results show a significant difference in execution time between the two exception implementations. Our implementation is on average 98× faster at performing exception propagation on x86-64. The factor of improvement increases from 68× to 138× as the number of stack frames increases, indicating that the performance penalty seen in the current implementation increases as the number of frames increases. Our implementation performs best on x86-64 with deep call stacks.

On Arm, the improvement in execution time remains almost constant, at 32× to 33× for 100 and 1000 frames and a 35× speedup for 10000 frames. This suggests that our implementation will consistently offer improved performance, independent of the number of frames.


[Figure 9 plots. Series in both panels: Standard No Error, Standard With Error, Deterministic No Error, Deterministic With Error. (a) Arm: x-axis XML Elements (156, 781, 3906, 19531); y-axis Exec. time per element (µs), 0 to 300. (b) x86-64: x-axis XML Elements (3905, 19531, 97655, 488281); y-axis Exec. time per element (µs), 0.66 to 0.86.]

Figure 9: Execution times in µs per XML element for different element counts with our xmlbench benchmark on Arm (a) and x86-64 (b). Lower is better.

Stack Frames   Implementation   Time        σ

Arm
100            Standard         1351.3µs    74.3µs
100            Deterministic    41.7µs      0.6µs
1000           Standard         9.6ms       0.2ms
1000           Deterministic    0.3ms       0.0ms
10000          Standard         94.8ms      1.0ms
10000          Deterministic    2.7ms       0.1ms

x86-64
100            Standard         114.7µs     21.6µs
100            Deterministic    1.7µs       0.8µs
1000           Standard         2572.0µs    231.2µs
1000           Deterministic    29.0µs      0.0µs
10000          Standard         8823.8µs    960.0µs
10000          Deterministic    64.0µs      6.4µs

Table 4: Execution times for the exception propagation benchmark.

The improvements in execution time are a result of not having to resolve the encoded unwind information and walk the stack twice to locate the catch handler before unwinding.

4.2.4 Re-Throwing. To measure exception re-throwing, we used another benchmark, which called the same function recursively as many times as indicated by Stack Frames, with a local object having a custom destructor within each frame. After Stack Frames calls, an exception is thrown but, unlike before, caught and re-thrown by each function. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 5.
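The re-throwing variant differs from the propagation benchmark only in that every frame catches and re-throws. The sketch below uses the standard bare throw expression, since std::rethrow is the paper's proposed extension and is not available in a standard toolchain. The names and the rethrows counter are assumptions added for illustration.

```cpp
#include <cassert>
#include <stdexcept>

int rethrows = 0;

// Local object with a custom destructor, one per stack frame.
struct Local {
    ~Local() {}
};

// Every frame above the throwing one catches the exception and re-throws it.
void recurse_rethrow(int frames_left) {
    Local local;
    if (frames_left <= 1) {
        throw std::runtime_error("deepest frame");
    }
    try {
        recurse_rethrow(frames_left - 1);
    } catch (...) {
        ++rethrows;
        throw;  // standard re-throw of the in-flight exception
    }
}
```

With the proposed scheme, the catch block would instead move the exception object out with std::get_exception_obj and pass it to std::rethrow, which is where the per-frame copying cost discussed in the results comes from.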

These results show a similar pattern to those for straight propagation. On x86-64, for 10000 stack frames, the current implementation takes almost 3× longer to catch and re-throw than simply propagating, while our implementation is 38× slower than when simply propagating. This factor of slowdown is to be expected given how fast our implementation is when not re-throwing, but is also due to the multiple steps required to move the exception object into, and then out of, each stack frame. This is clearly a good target for optimisation, as such transfer is unnecessary. Despite this, our implementation yielded a 9× speedup over the standard implementation.

Stack Frames   Implementation   Time        σ

Arm
100            Standard         4.1ms       0.1ms
100            Deterministic    0.2ms       0.0ms
1000           Standard         33.5ms      0.4ms
1000           Deterministic    1.8ms       0.0ms
10000          Standard         318.4ms     6.3ms
10000          Deterministic    18.2ms      0.1ms

x86-64
100            Standard         1071.7µs    363.3µs
100            Deterministic    22.7µs      13.8µs
1000           Standard         3.8ms       0.1ms
1000           Deterministic    0.2ms       0.0ms
10000          Standard         22.0ms      0.3ms
10000          Deterministic    2.4ms       0.1ms

Table 5: Execution times for the exception re-throwing benchmark.

For 1000 stack frames on x86-64, the difference between execution times is larger, with our implementation executing 19× faster. With 100 frames, the results have a very high standard deviation, but there is a clear order of magnitude between the two times, indicating that the difference in performance remains largely independent of the number of frames.

On Arm, the additional time taken to initialise the exception state in our implementation is evident, as it takes roughly five times longer than in the propagation benchmark. The standard implementation also suffers a performance hit, but one which decreases as the number of frames increases, suggesting that a large part of it is constant. This matches the fact that in the standard implementation, the exception object is allocated on the heap, creating a large up-front cost, but one which is amortised across many re-throwings.

In reality, however, re-throwing at every frame is highly unusual behaviour, making both this amortisation and our worsening performance unlikely to be noticed.

4.3 Timing and Memory Determinism

We base our understanding of the language and algorithmic requirements for determinism on Verber and Colnarič [20]. In general, the requirements for deterministic execution fall into two categories: deterministic memory usage and deterministic execution time.

For memory usage, we can further subdivide the requirement into stack usage and heap usage. When no exception occurs, our stack usage is static. A fixed-size allocation for the exception state occurs per noexcept function invocation, throwing destructor invocation, and try block entry.

When an exception is thrown, the object is either allocated with a fixed size determined by the catch block, or with a variable size that can be bounded by the maximum allocation size of all exception objects thrown. The functions called by our implementation have a well-defined order and number, known at compile-time, and are not recursive. While pointers are used to store this address, their addresses are fixed within the range of the exception object buffer or the current stack frame.

Our implementation does not require heap allocation. However, it is possible for the program to allocate an arbitrary amount of memory if local object destructors throw exceptions. For example, during stack unwinding, destructors may throw an exception, causing their own local objects to be destroyed, which in turn may throw additional exceptions.

However, since this potential execution flow is known at compile-time, and is akin to normal function calls made directly via their function definition rather than via function pointer, static analysis tools are able to give a bound for the number and type of exceptions thrown, and thus the maximum size of allocation. Furthermore, due to the nature of stack unwinding, recursion is not possible except where introduced explicitly by the developer's code, and thus the number of stack frames is known.

Deterministic execution time generally refers to the ability to calculate the worst-case execution time (WCET), as demanded by most real-time systems. For execution time, unlike with existing implementations, it is trivial to calculate the WCET of all of our operations. Aside from the allocation and initialisation of the local exception state and the checking of the active flag (all of which have deterministic timing), we have three operations.

The first allocates the exception object in the exception buffer. In this case, the allocator is a stack allocator, which merely advances a pointer, except where the developer does not specify a correct maximum bound on exception allocation. The second frees the exception object, which again is trivially a pointer decrement. The final function resolves the base class "type id"s when matching polymorphic exception types against catch handlers. Our implementation's novel approach guarantees a fixed number of iterations through the list of base classes, again known at compile time and derived from the filter of the catch block.

Therefore, should a WCET analysis tool give correct predictions for programs not using exceptions, it would also give such predictions for our implementation, meeting the criteria for determinism.

5 RELATED WORK

Modern exception handling, including the semantics of raising (throwing) and handling (catching) exceptions, exception hierarchies, and control flow requirements, was first introduced in [6].

In [11], inherent overheads, both in terms of execution time and memory usage, in the current C++ implementations of exceptions are identified. A further evaluation of how C++ features impact embedded systems software is presented in [16]. This report singles out exceptions and their prerequisite, run-time type information (RTTI), as "the single most expensive feature an embedded design may consider".

Lang and Stewart [12] describe stack unwinding as it relates to exceptions in real-time systems, and show how the worst-case execution time of the stack unwinding is unbounded. This has later been confirmed in [9]. For our own definition of determinism, we take insight from [20], which details many of the requirements necessary for languages to be susceptible to time analysis.

A new exception handling approach which is better suited to static analysis is presented in [13], but it requires substantial code rewriting. SetJmp/LongJmp and table-based exception handling are discussed in [3]. The results demonstrate large increases in binary size and an execution time overhead of 10-15%. Gylfason and Hjalmtysson [8] develop a variant of table-based exception handling for use within the Itanium port of the Linux kernel.

Most relevant to the work presented in this paper is [19]. It focuses on propagating exceptions down the stack by duplexing the return value of functions, so as to fit the exception object itself within the register or stack memory used for the return value, and importantly to re-use the existing function return mechanism to perform stack unwinding. This approach hinges on two complex changes to C++'s function calling convention, though. This kind of change would be significant, breaking compatibility with C code and all existing C++ libraries. Instead, our novel exception implementation maintains the existing exception model by finding a new way to store and match exceptions at run-time, with little to no additional performance impact when exceptions are not thrown.

6 SUMMARY & CONCLUSIONS

In this paper, we have developed a novel implementation of C++ exceptions which better suits the requirements of embedded systems, while still upholding the C++ core design tenets. Our implementation shows binary size decreases of up to 82.3% over existing implementations, performance improvements of up to 325% when handling exceptions, minimal execution time overhead on x86-64, and none on the Arm platform. We have shown that our implementation achieves memory and execution time determinism, making it significantly better-suited to calculating worst-case execution times, a major hurdle in the adoption of exceptions.

Our future work will investigate further code size improvements resulting from code optimizations and shrinking of the exception state object, while reducing execution time by allocating the active flag in the exception state to a register.


REFERENCES

[1] David W. Binkley. 1997. C++ in Safety Critical Systems. Ann. Softw. Eng. 4, 1-4 (Jan. 1997), 223–234. http://dl.acm.org/citation.cfm?id=590565.590588

[2] Beman Dawes, David Abrahams, and R. Rivera. 2014. Boost C++ libraries. http://www.boost.org/doc/libs/1_55_0/libs/libraries.htm

[3] Christophe de Dinechin. 2000. C++ Exception Handling for IA64. In Proceedings of the First Workshop on Industrial Experiences with Systems Software, WIESS 2000, October 22, 2000, San Diego, CA, USA, Dejan S. Milojicic (Ed.). USENIX, 67–76. http://www.usenix.org/events/wiess2000/dinechin.html

[4] Bruce Eckel, Chuck D. Allison, and Chuck Allison. 2003. Thinking in C++, Vol. 2 (2 ed.). Pearson Education.

[5] C++ exception handling 2012. ARM Compiler toolchain Compiler Reference.

[6] John B. Goodenough. 1975. Exception Handling: Issues and a Proposed Notation. Commun. ACM 18, 12 (Dec. 1975), 683–696. https://doi.org/10.1145/361227.361230

[7] Google. 2018. Google C++ Style Guide. https://google.github.io/styleguide/cppguide.html

[8] Halldor Isak Gylfason and Gisli Hjalmtysson. 2004. Exceptional Kernel – Using C++ exceptions in the Linux kernel.

[9] Wolfgang A. Halang and Matjaž Colnarič. 2002. Dealing with exceptions in safety-related embedded systems. IFAC Proceedings Volumes 35, 1 (2002), 477–482.

[10] Pieter Hintjens. 2014. ZeroMQ: The Guide. http://zeromq.org

[11] ISO/IEC JTC1 SC22 WG21. 2006. ISO/IEC TR 18015: Technical Report on C++ Performance. Technical Report ISO/IEC TR 18015:2006. ISO/IEC. https://www.iso.org/standard/43351.html

[12] Jun Lang and David B. Stewart. 1998. A Study of the Applicability of Existing Exception-handling Techniques to Component-based Real-time Software Technology. ACM Trans. Program. Lang. Syst. 20, 2 (March 1998), 274–301. https://doi.org/10.1145/276393.276395

[13] Roy Levin. 1977. Program Structures for Exceptional Condition Handling. Technical Report. Carnegie-Mellon Univ., Pittsburgh, PA, Dept. of Computer Science.

[14] Mark Maimone. 2014. C++ On Mars. In Proceedings of the C++ Conference (CppCon 2014).

[15] Prakash Prabhu, Naoto Maeda, Gogul Balakrishnan, Franjo Ivančić, and Aarti Gupta. 2011. Interprocedural exception analysis for C++. In European Conference on Object-Oriented Programming. Springer, 583–608.

[16] César A. Quiroz. 1998. Using C++ efficiently in embedded applications. In Proceedings of the Embedded Systems Conference.

[17] Richard Smith et al. 2017. Working draft, standard for programming language C++. Technical Report.

[18] Alexander D. Stoyenko. 2012. Constructing predictable real time systems. Vol. 146. Springer Science & Business Media.

[19] Herb Sutter. 2018. P0709: Zero-overhead deterministic exceptions: Throwing values. Technical Report R0. SG14.

[20] Domen Verber and Matjaž Colnarič. 1996. Programming and time analysis of hard real-time applications. IFAC Proceedings Volumes 29, 6 (1996), 79–84.


Low-Cost Deterministic C++ Exceptions for Embedded Systems

James Renwick, University of Edinburgh, Edinburgh, UK, s1227500@sms.ed.ac.uk
Tom Spink, University of Edinburgh, Edinburgh, UK, tspink@inf.ed.ac.uk
Björn Franke, University of Edinburgh, Edinburgh, UK, bfranke@inf.ed.ac.uk

ABSTRACT

The C++ programming language offers a strong exception mechanism for error handling at the language level, improving code readability, safety, and maintainability. However, current C++ implementations are targeted at general-purpose systems, often sacrificing code size, memory usage, and resource determinism for the sake of performance. This makes C++ exceptions a particularly undesirable choice for embedded applications where code size and resource determinism are often paramount. Consequently, embedded coding guidelines either forbid the use of C++ exceptions, or embedded C++ tool chains omit exception handling altogether. In this paper, we develop a novel implementation of C++ exceptions that eliminates these issues and enables their use for embedded systems. We combine existing stack unwinding techniques with a new approach to memory management and run-time type information (RTTI). In doing so, we create a compliant C++ exception handling implementation, providing bounded runtime and memory usage, while reducing code size requirements by up to 82%, and incurring only a minimal runtime overhead for the common case of no exceptions.

CCS CONCEPTS

• Software and its engineering → Error handling and recovery; Software performance; Language features.

KEYWORDS

C++, exceptions, error handling

ACM Reference Format:
James Renwick, Tom Spink, and Björn Franke. 2019. Low-Cost Deterministic C++ Exceptions for Embedded Systems. In Proceedings of the 28th International Conference on Compiler Construction (CC '19), February 16–17, 2019, Washington, DC, USA. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3302516.3307346

1 INTRODUCTION

One of the major benefits of C++ is its promise of zero-cost abstractions and its rule of "you don't pay for what you don't use". This allows the use of various safer and more productive modern programming idioms, without the performance drawbacks with which their use is often associated. However, C++'s wide usage has not been without issues. For many of the different domains within industry, the use of exceptions in C++, a core part of the language, has been disallowed. For example, the Google C++ Style Guide [7], the Joint Strike Fighter Air Vehicle C++ Coding Standards, and the Mars Rover flight software [14] do not allow the use of exceptions at all, while MISRA C++ specifies detailed rules for their use [1].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CC '19, February 16–17, 2019, Washington, DC, USA
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6277-1/19/02...$15.00
https://doi.org/10.1145/3302516.3307346

While there are many advantages to the use of exceptions in C++, perhaps their greatest disadvantage lies in their implementation. The current exception implementations were designed to prioritise performance in the case where exceptions do not occur, motivated primarily by the idea that exceptions are exceptional, and thus should rarely happen. This prioritisation may suit applications for which errors are rare and execution time is paramount, but it comes at the cost of other factors: notably increased binary sizes (up to 15% in some cases [4]), memory usage, and a loss of determinism, in both execution time and memory, when exceptions do occur. These drawbacks make exceptions unsuitable for use in embedded systems, where binary size and determinism are often as important as, if not more important than, overall execution time [12, 18]. However, the C++ language, its standard library, and many prominent 3rd party libraries, such as ZMQ [10] and Boost [2], rely on exceptions. As a result, embedded software developed in C++ generally disables exceptions [8], at the cost of giving up disciplined error handling through C++ exceptions and excluding available libraries. The Arm C++ compiler tool chain, for example, and in contrast to most other compilers, disables exceptions by default [5].

In this paper, we develop a novel C++ exception implementation, which reduces memory and run-time costs and provides determinism. The key idea is to apply and extend a recently developed, low-cost stack unwinding mechanism [19] for C++ exceptions. In our novel C++ exception implementation, we make use of a stack-allocated object that records the necessary run-time information for throwing an exception, such as the type and size of the exception object. This state is allocated in a single place and is passed between functions via an implicit function parameter injected into functions which support exceptions. The state is initialised by throw expressions, and is re-used to enable re-throwing. catch statements use the state in order to determine whether they can handle the exception. After a call to a function which may throw exceptions, a run-time check is inserted to test whether the state contains an active exception. We have implemented our new C++ exception implementation in the LLVM compiler Clang and evaluated it on both an embedded Arm and a general-purpose x86-64 platform. Using targeted micro-benchmarks and full applications, we demonstrate binary size decreases of up to 82% over existing implementations and substantial performance improvements when handling and propagating exceptions. We demonstrate that our novel C++ exception implementation achieves memory and execution time determinism, thus making it better suited for embedded applications with specified worst-case execution time requirements.

    void C() {
        Bar bar;
        throw 0;
    }

    void B() {
        Foo foo;
        C();
    }

    void A() {
        try {
            B();
        } catch (int& p) {
            // Catch exception
        }
    }

Figure 1: Example code showing a possible scenario in which exceptions are used. Function A has a try/catch block that encompasses the call to B. B allocates a local object and calls C, which also allocates a local object and throws an exception.

1.1 Motivating Example
Consider the motivating example in Figure 1, where function A has a try/catch block that encompasses a call to B. B allocates a local object and calls C, which also allocates a local object and throws an exception. During exception handling, stack unwinding is performed, i.e. stack frames for functions C and B must be removed from the call stack, and local objects destroyed. Finally, the catch handler (found in A) is invoked. Figure 3 shows this operation in action for both the standard implementation and our proposed implementation. In each case, the stack must be unwound from the throw site in function C back to the catch handler in function A. This involves three steps:
(1) Local objects in C are destroyed, and callee-saved registers are restored.
(2) Local objects in B are destroyed, and callee-saved registers are restored.
(3) Control is transferred to the catch handler in A.

The most common implementation of C++ exception handling, in use by, e.g., both GCC and Clang, makes use of table-based stack unwinding. It seeks to eliminate any runtime overhead in the case where no exceptions are thrown. This is achieved by storing abstract instruction sequences, required for identifying the catch handler, restoring the call stack and machine state, and invoking object destructors, in Unwind Tables, which are generated at compile-time and embedded in the final program binary. When an exception is thrown, the exception object is allocated on the heap, and a 2-stage process for stack unwinding is started. The first stage, shown in Figure 2, is concerned with finding a suitable catch handler. Using the unwind tables, an embedded abstract machine executes the prepackaged instruction sequence to identify the catch handler, by scanning over the call stack to determine how far back the stack should be unwound. The second stage, shown in the top half of Figure 3, then performs the actual stack unwinding. Again, an abstract machine implements a so-called personality routine, which uses instruction sequences stored in the unwind tables to destroy local objects, remove stack frames from the call stack, update and commit the machine state, and eventually invoke the catch handler.

[Figure 2 diagram: the call stack with frames for A, B (holding foo) and C (holding bar), each with a return address and callee-saved registers; the unwind tables are consulted to resolve the catch handler (found in A).]

Figure 2: The standard exception handling implementation requires an initial phase to scan over the call stack, looking for a catch handler. This phase does not unwind the stack; it identifies how far back the stack should be unwound.

The unwind tables used by the conventional C++ exception implementations can grow large and become complex. Furthermore, these tables typically encode operations to perform during stack unwinding (such as restoring registers, running object destructors, etc.), which must be executed by an integrated software abstract machine. While this approach eliminates runtime overhead when no exceptions occur, it increases the memory footprint of the application and introduces non-determinism due to the abstract machine used at runtime for exception handling.

Our novel C++ exception implementation, shown in the bottom half of Figure 3, eliminates the need for unwind tables and the associated abstract machine. Instead, we introduce a stack-allocated exception state object, which is allocated at the outermost exception propagation barrier (function A in our example) and is populated at throw time (function C). This state object contains bookkeeping information for propagating the exception between the throw and catch sites. We pass this object into any invoked function via an implicit function parameter, and use it to initiate and orchestrate exception handling. After each return from a function call, the compiler inserts an additional check of the exception state to determine if an exception is in progress, and if so, uses a standard function return mechanism for step-by-step stack unwinding until we reach the matching catch statement (stages 1, 2 and 3 in Figure 3).
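To make the return-based unwinding concrete, the following is a minimal hand-written sketch of what the example from Figure 1 might look like after lowering. It is an illustration under stated assumptions, not the compiler's actual output: the ExceptionState struct, its value field, and the destroyed counter are hypothetical names introduced here, and the state is passed as an explicit parameter rather than an implicit one.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the compiler-generated exception state.
struct ExceptionState {
    bool active = false;  // set while an exception is in flight
    int  value  = 0;      // the thrown int (the real design stores a typed buffer)
};

static int destroyed = 0;  // counts destructor runs, for illustration only

struct Foo { ~Foo() { ++destroyed; } };
struct Bar { ~Bar() { ++destroyed; } };

// Lowered form of C(): "throw 0" fills the state and returns normally.
void C(ExceptionState* exc) {
    Bar bar;
    exc->active = true;
    exc->value  = 0;
    return;  // ordinary return sequence; bar is destroyed as usual
}

// Lowered form of B(): the check after the call propagates the exception.
void B(ExceptionState* exc) {
    Foo foo;
    C(exc);
    if (exc->active) return;  // inserted check; foo is destroyed on return
    // code following the call would run here when no exception is active
}

// Lowered form of A(): returns true if the catch handler ran.
bool A() {
    ExceptionState exc;  // allocated at the propagation barrier
    B(&exc);
    if (exc.active) {    // inserted check: enter the catch handler
        exc.active = false;
        int p = exc.value;  // stands in for "catch (int& p)"
        return p == 0;
    }
    return false;
}
```

Calling A() runs the destructors of bar and foo through the normal return path of C and B before the handler in A is reached, which mirrors stages 1 to 3 of the bottom half of Figure 3.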

1.2 Contributions
In this paper, we contribute to the development of a novel C++ exception implementation, particularly suitable for use in embedded systems. We address the following issues:
(1) Reducing the binary size increase caused by exceptions.
(2) Permitting the bounding of memory usage and execution time when using exceptions, i.e. supporting determinism.
(3) Maintaining or improving the performance of exceptions.
Although we focus on embedded systems, we will ensure that these goals also apply to general-purpose applications.

[Figure 3 diagram. Standard implementation (top), driven by the unwind tables: (1) a personality routine destroys bar, updates the machine state and finishes; (2) a personality routine destroys foo, updates the machine state and finishes; (3) a personality routine commits the machine state and continues in the catch handler. Our implementation (bottom), driven by the exception state: (1) the standard function return sequence destroys bar, restores saved registers and returns to B; (2) the standard function return sequence destroys foo, restores saved registers and returns to A; (3) A checks the active flag in the exception state and continues in the catch handler.]

Figure 3: Stages involved in stack unwinding for the standard exception implementation (top) and our novel scheme (bottom).

2 DETERMINISTIC C++ EXCEPTIONS
Stack unwinding forms a core part of the mechanisms underpinning exceptions. It is also responsible for their poor execution times and increased binary sizes, and is part of the reason for exceptions being non-deterministic.

Sutter [19] has only very recently proposed re-using the existing function return mechanism in place of the traditional stack unwinding approaches, requiring no additional data or instructions to be stored, and little to no overhead in unwinding the stack. Furthermore, by removing stack unwinding's runtime reliance on tables encoded in the program binary itself, the issue of time and spatial determinism is solved. As it is possible to determine the worst-case execution times for programs not using exceptions, it follows that exception implementations making use of the same return mechanism must also be deterministic in stack unwinding.

Given these clear advantages, we have based our implementation on this design, with a core difference being the replacement of their use of registers with function parameters, allowing for much easier interoperability with C code, which can simply provide the parameter as necessary.

A limitation of [19] is that it requires all exceptions to be of the same type, leaving much of the standard exception-handling functionality up to the user. Our novel approach includes a method of throwing and catching exceptions of arbitrary types (as with existing exception handling) without imposing any meaningful execution-time penalties when exceptions do not occur. We also reduce the size of run-time type information (RTTI) and maintain determinism over existing implementations.

     1  void foo(__exception_t* __exception) throws {
     2      // Exception state '__exception' assigned to by throw
     3      throw SomeError();
     4  }
     5  int main() {
     6      // State automatically allocated in main
     7      __exception_t __exception_state;
     8      foo(&__exception_state);
     9      // check for exception
    10      if (__exception_state.active) goto catch;
    11  }

Figure 4: Pseudocode showing injected exception state object (line 7) and parameter (line 1).

Our scheme makes use of a stack-allocated object that records the necessary run-time information for throwing an exception, such as the type and size of the exception object. This state is allocated in a single place and is passed between functions via an implicit function parameter injected into functions which support exceptions. The state is initialised by throw expressions, and is re-used to enable re-throwing. catch statements use the state in order to determine whether they can handle the exception. After a call to a function which may throw exceptions, a run-time check is inserted to test whether the state contains an active exception.

    1  auto safe_divide(float dividend, float divisor) throws {
    2      if (divisor == 0)
    3          throw std::invalid_argument("Divide by zero");
    4      else
    5          return dividend / divisor;
    6  }

Figure 5: Example function marked with our proposed throws exception specifier (line 1), allowing exception propagation.

Figure 4 gives an example of the variables the compiler will automatically inject during code generation. Line 1 shows the normally hidden implicit exception state parameter, __exception. Line 3 assigns values corresponding to the SomeError type to the exception state parameter. Line 7 shows the automatically allocated __exception_state variable, emitted by all noexcept functions. Line 8 shows the exception state object being automatically passed to the throws function foo. Line 10 shows the check inserted after each call to a throwing function, to test for an active exception.

The design of this implementation performs almost all of the exception handling directly within the functions being executed, allowing the optimiser to work most effectively, and by implementing exceptions on top of normal execution flow, code used in returning from functions is re-used when unwinding the stack following exceptions. This helps to reduce code size and unpredictability, and overcomes the largest roadblock in achieving deterministic execution time. This approach combines the exception-handling mechanism of C++ with the inter-mixed error-checking and non-error-handling code integral to the design of [9].

2.1 Throws Exception Specifier
A fundamental issue is to decide which functions should take the implicit exception parameter that our design requires (i.e. which functions could throw exceptions). The natural choice would be to apply it to all functions with C++ linkage, except for those marked noexcept, in keeping with the current exception implementation. However, the issue with this approach would be that C++ and C functions would then have different calling conventions. Whilst C++ functions can, in general, be called without special handling from C, this is only due to the coincidence that the two calling conventions are identical.

With our proposed scheme, the decision to have C++ functions support exceptions by default is impractical, as C++ has certain language holes when it comes to specifying linkage. Specifically, neither classes, structs, nor their members, such as function pointers, can have a specified linkage. Therefore, we propose to have functions be declared noexcept by default. This change in default exception specification preserves compatibility with C at the cost of a language change in C++.

Like noexcept, we require that functions that might throw encode this into their signature (similar to [19]), as this impacts on how they are called. Thus, to indicate that functions can throw exceptions, we introduce an exception specifier (that is part of the function prototype), as shown in Figure 5.

Name        Description
type        The "type id" variable address identifying the exception object's type
base_types  Pointer to an array of "type id"s identifying the exception object's base class types
ptr         Whether the exception object is a pointer to a non-pointer type
size        The size, in bytes, of the exception object
align       The alignment, in bytes, of the exception object
ctor        The address of the object's move or copy constructor
dtor        The address of the object's destructor
buffer      The address of the exception object
active      Whether the exception is currently active

Table 1: Exception State Fields
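The fields of Table 1 could be laid out roughly as in the sketch below. This is only an assumed layout: the paper gives field names and meanings but not their C++ types, so the types, the thunk signatures, the helper fill_state, and the names typeid_for_SomeError and typeid_empty_bases (written without the reserved double-underscore prefix) are illustrative choices made here.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Assumed layout; field names follow Table 1, field types are guesses.
struct exception_state {
    const void*        type;        // address of the "type id" variable
    const char* const* base_types;  // null-terminated array of base "type id"s
    bool               ptr;         // pointer to a non-pointer type?
    std::size_t        size;        // size of the exception object in bytes
    std::size_t        align;       // alignment of the exception object in bytes
    void (*ctor)(void* dst, void* src);  // move or copy constructor thunk
    void (*dtor)(void* obj);             // destructor thunk
    void*              buffer;      // address of the exception object
    bool               active;      // is an exception currently in flight?
};

struct SomeError { int code; };

// Stand-ins for the compiler-emitted "type id" variable and empty base array.
static char typeid_for_SomeError;
static const char* const typeid_empty_bases[] = { nullptr };

static void move_SomeError(void* dst, void* src) {
    new (dst) SomeError(std::move(*static_cast<SomeError*>(src)));
}
static void destroy_SomeError(void* obj) {
    static_cast<SomeError*>(obj)->~SomeError();
}

// What a throw site for "throw SomeError{...}" might store in the state.
// `storage` must be suitably sized and aligned for SomeError.
void fill_state(exception_state& st, void* storage, SomeError err) {
    st.type       = &typeid_for_SomeError;
    st.base_types = typeid_empty_bases;
    st.ptr        = false;
    st.size       = sizeof(SomeError);
    st.align      = alignof(SomeError);
    st.ctor       = &move_SomeError;
    st.dtor       = &destroy_SomeError;
    st.buffer     = new (storage) SomeError(std::move(err));
    st.active     = true;
}
```

Everything except buffer and active is a compile-time constant for a given thrown type, which is consistent with the description of the throw site in Section 2.2.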

2.2 Throwing Exceptions and Exception State
The exception state object exists to hold run-time information on the exception currently in progress. Every function marked noexcept, including the program main function and functions used as thread entry-points, allocates and initialises its own exception state object, effectively making noexcept functions boundaries beyond which exceptions cannot propagate.

The address of this object is passed to any of the called functions that are marked throws, allowing the exception to propagate down the call chain as far as the first noexcept function, where, if not handled, the program is terminated.

Following a throw expression, the fields in the exception state object are populated at the throw-site with values known at compile-time, corresponding to the exception object given in the expression. The active flag is set to true, the exception object is moved into a buffer, and its address is assigned to the buffer field of the exception state.

If the throw expression is directly within a try statement, the code jumps to the first catch block for handling. Otherwise, provided the current function is marked as throws, the exception is propagated: execution is transferred to the function epilogue as if returning normally, but the return value for the function, if any, is not initialised.

During the return sequence, local objects have their destructors executed as if a standard function return had occurred, efficiently unwinding the stack. The exception being marked active is not made visible to destructors, as destructors are necessarily noexcept in this context ("If a destructor called during stack unwinding exits with an exception, std::terminate is called (15.5.1)." [17]). In this way, the exception state is untouched until entering the initial catch block. If the current function is not marked as throws, propagation is replaced by an immediate call to std::terminate to terminate the program.

After each function call to a function marked as throws, a check is automatically inserted by the compiler to test whether the active flag has been set, and thus whether the exception has been propagated. This is the largest overhead resulting from our changes, requiring a test for each function called, regardless of whether an exception was thrown or not. There is perhaps a good case to be made for its necessity, however. By indicating that a function can throw via throws, the developer is encoding into the function signature the fact that an error may occur during its execution. Thus, regardless of performance requirements, some test must be made for that error in a correct program.
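The per-call check can be illustrated by lowering the safe_divide function of Figure 5 by hand. The sketch below is an assumption-laden illustration, not generated code: the State struct, its what field, and the caller average_ratio are hypothetical, and the propagating paths return 0.0f only because standard C++ requires some return value, whereas the paper leaves it uninitialised.

```cpp
#include <cassert>

// Hypothetical, reduced exception state: a flag plus a message.
struct State {
    bool        active = false;
    const char* what   = nullptr;
};

// Hand-lowered form of "safe_divide(...) throws".
float safe_divide(State* exc, float dividend, float divisor) {
    if (divisor == 0) {
        exc->active = true;            // throw std::invalid_argument(...)
        exc->what   = "Divide by zero";
        return 0.0f;                   // placeholder return value
    }
    return dividend / divisor;
}

// A throws function calling another throws function twice.
float average_ratio(State* exc, float a, float b, float c) {
    float x = safe_divide(exc, a, c);
    if (exc->active) return 0.0f;      // inserted check after the first call
    float y = safe_divide(exc, b, c);
    if (exc->active) return 0.0f;      // inserted check after the second call
    return (x + y) / 2;
}
```

Each check is a single test of a boolean that the caller already holds a pointer to, which is the cost the paragraph above describes for the no-exception path.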

2.3 Handling Exceptions
Once an exception has been thrown, and if it does not result in a call to std::terminate, control is transferred to the first available catch block.

Before each catch block executes, it first compares, for value equality, the "type id" of the exception type it handles and the type marked in the type field of the exception state. If they are equal, or if the type the catch block handles is a base type of the type in the type field, then the catch block is entered. Otherwise, execution jumps to the next catch block and repeats this process until the catch block is a catch-all or all blocks have been exhausted. If no blocks remain, the exception continues propagation as described previously.

Once a catch block is entered, the active flag of the exception state is cleared, memory is allocated on the stack for the exception object, and the move or copy constructor is invoked to transfer ownership of the exception object into the current block.

If an exception were to occur during the move or copy operation, std::terminate is called to satisfy the C++ specification, which states: "If the exception handling mechanism, after completing evaluation of the expression to be thrown but before the exception is caught, calls a function that exits via an exception, std::terminate is called (15.5.1)."

Catch filters can take two forms: (1) by value, where the catch variable represents an object allocated within the catch block, or (2) by reference, where the catch variable is a reference to the exception object allocated elsewhere.

In the latter case, an additional unnamed variable is introduced in which to store the exception object, and its allocation and initialisation are performed as described above, using the equivalent non-reference type. The catch variable reference is then bound to this unnamed variable. In this way, even when the exception object is caught by reference, it is still owned by the catch block.

Following a successful move/copy of the exception object, the destructor is called on the original exception object instance within the buffer. Naturally, as destructors are necessarily noexcept, were the destructor to exit with an exception, again std::terminate would be called.

With the catch variable successfully initialised, the catch block then commences execution. Once complete, execution moves to a point past the final catch block in the current scope, and the catch variable is destroyed as usual.
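The catch-entry sequence described in this section (compare type ids, clear the active flag, move the object into the handler, destroy the original) can be sketched as a small library-level emulation. All names here (State, type_id, raise, try_catch) are hypothetical, base-class matching is left out for brevity, and the real implementation emits this logic inline rather than through templates.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Hypothetical, reduced exception state.
struct State {
    const void* type   = nullptr;  // address identifying the thrown type
    void*       buffer = nullptr;  // where the exception object lives
    bool        active = false;
};

// One char-sized tag per type; its address serves as the "type id".
template <class T> struct type_id { static char tag; };
template <class T> char type_id<T>::tag;

// Emulates a throw: `storage` must be suitably sized and aligned for T.
template <class T>
void raise(State& st, void* storage, T value) {
    st.buffer = new (storage) T(std::move(value));
    st.type   = &type_id<T>::tag;
    st.active = true;
}

// Emulates "catch (T& e) { handler(e); }" for an exact type match.
// Returns false, leaving the state untouched, if the handler does not match.
template <class T, class Handler>
bool try_catch(State& st, Handler handler) {
    if (!st.active || st.type != &type_id<T>::tag) return false;
    st.active = false;                // the exception is now being handled
    T* original = static_cast<T*>(st.buffer);
    T caught(std::move(*original));   // handler-owned copy on the stack
    original->~T();                   // destroy the instance in the buffer
    handler(caught);
    return true;
}
```

A non-matching handler leaves the active flag set, so the next handler (or the caller's inserted check) still sees the exception.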

2.4 Re-Throwing
One of the issues with maintaining and passing references to a single instance of exception state via the exception state parameter is that more than one exception may be active at a time, and crucially, the initial exception may need to be re-thrown following the intermediary exception.

    try { throw 1; }
    catch (int i) {
        try { throw 2; }
        catch (...) { }
        throw;
    }

Figure 6: Re-throwing following a nested exception. The first exception's state must be persisted and not overwritten by the second exception to allow the re-throw.

Figure 6 gives an example. The first exception is thrown on line 1, caught on line 2, then a second exception is thrown on line 3 and caught by the catch-all on line 4. The first exception is then re-thrown on line 5, which requires the first exception's state to have persisted across the second, inner throw/catch.

To allow for this, a local copy of the exception state is allocated when entering a try block, and if an exception occurs within the try block, the exception modifies the local copy. If the same exception is not handled by the try block's corresponding catch block or blocks, the local state is copied back to either the outer state, or to the function's exception state as passed in the parameter for exception propagation.

If the inner exception is handled correctly, only its local exception state is modified; thus the outer exception state is maintained, and can be used, for example, to successfully re-throw the outer exception. As we only initialise the active field of the local copy, its cost is a simple stack allocation plus zero-initialisation of a boolean flag.
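The role of the per-try local state can be seen by lowering Figure 6 by hand. The following sketch is one possible reading of the scheme, not the compiler's output: the State struct with a plain int payload and the function rethrow_demo are hypothetical, and control flow that the compiler would express with jumps is written as nested blocks.

```cpp
#include <cassert>

// Hypothetical, reduced exception state carrying an int payload.
struct State {
    bool active = false;
    int  value  = 0;
};

// Hand-lowered version of Figure 6. Returns the value of the exception
// that escapes the outer try/catch, or -1 if none escapes.
int rethrow_demo() {
    State outer;  // state of the enclosing function
    {
        State try1;                    // local state for the outer try block
        try1.active = true;            // throw 1;
        try1.value  = 1;
        if (try1.active) {             // catch (int i)
            try1.active = false;
            {
                State try2;            // local state for the inner try block
                try2.active = true;    // throw 2;
                try2.value  = 2;
                if (try2.active) {     // catch (...) handles it locally
                    try2.active = false;
                }
            }
            // throw; re-raises the first exception, whose state was never
            // overwritten by the inner throw. Copy it outwards to propagate.
            outer = try1;
            outer.active = true;
        }
    }
    return outer.active ? outer.value : -1;
}
```

Because the inner throw only ever writes to try2, the value recorded for the first exception is still available when the bare throw is reached.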

C++ already has a mechanism for obtaining and re-throwing exceptions by making and passing around an explicit copy of (or reference to) the exception state via the use of std::current_exception(), which, when called, returns an instance of std::exception_ptr. To provide similar functionality in our implementation, we propose an equivalent mechanism, namely std::get_exception_obj. This function returns a reference to a std::exception_obj instance stored on the stack.

What makes std::exception_obj novel is that its type is templated such that developers can specify a custom allocator, giving them control over where the exception object is to be allocated when being transferred to a new instance. However, we restrict the use of std::get_exception_obj such that it can only be called from directly within a catch block. This makes sense from a value-oriented perspective: functions can only inspect and access their own variables and not their callers', unless shared explicitly via parameters. Since in our implementation the exception object is directly owned by the catch block, functions called from within that catch block would not normally have access to the object.

Re-throwing in our implementation is achieved by calling the std::rethrow function and passing a std::exception_obj instance. This is intuitive, arguably more so than the bare throw expression, and crucially makes clear the cost of re-throwing, as std::exception_obj instances must be explicitly moved around.

Figure 7 shows a complete example. On lines 1 and 2, a function is declared taking a std::exception_obj instance using the default allocator. This object is move-constructed from the corresponding std::exception_obj instance representing the exception object within the catch block on line 12, allocating the exception object as instructed. The object is then moved and stored in a global variable, resultexception, on line 5, which is then re-thrown on line 15.

     1  void store_exception(
     2      std::exception_obj<std::allocator<const char>> obj)
     3  throws {
     4      // Move the exception object into a global
     5      resultexception = std::move(obj);
     6  }
     7  // ---------------------------------
     8  try {
     9
    10  } catch (...) {
    11      // Move the exception object out of the catch block
    12      store_exception(std::move(std::get_exception_obj()));
    13  }
    14  // The exception can be re-thrown from a global
    15  std::rethrow(std::move(resultexception));

Figure 7: Example of re-throwing the exception object using our proposed std::rethrow function and std::exception_obj type.

To enable re-throwing, the std::exception_obj instance also contains the exception state object, which is already populated with the details of the exception. Thus all that is required to re-throw that exception is to copy those fields to the current exception state, restore the exception object to the buffer, set the active flag, and then attempt to propagate the exception as if throwing normally.

2.5 Throwing Destructors
Although generally advised against by the C++ specification, destructors can throw and can be explicitly marked as throws. Were such destructors to be called during stack unwinding, they would incorrectly share the same exception state as the uncaught exception. In particular, the active flag would be set, leading to incorrect execution flow when function calls from within the destructor performed their exception check.

To avoid this, we modify the compiler such that when throwing destructors are entered, they allocate and initialise their own local exception state, as if they were a noexcept function. This local state will be used within the destructor. If at any point an exception is unhandled within the destructor, such that it would exit with that exception, it first checks the exception state parameter to see if it is marked as active. If so, two unhandled exceptions are in progress, and thus the program must be terminated with std::terminate, per the C++ standard (15.5.1, as quoted above).

As throwing destructors may be called when an exception is not active, as part of normal logic, these additional measures constitute overhead in the case where exceptions are not thrown. However, throwing destructors are rare, and are conceptually odd: exceptions exist to indicate the failure of the operation, but in C++, destruction always succeeds.

2.6 Exception Object Type Identification
One of the disadvantages of current exception implementations is their use of RTTI in matching the types of thrown exception objects to a corresponding catch handler. Current approaches make use of complete RTTI objects in exception binaries, emitting a full std::type_info instance for each type, which includes a unique string identifying it. Either the address of these std::type_info instances, or the string identifier, is used to compare against the corresponding std::type_info instance for the catch handler. Inheritance adds an additional layer of complexity, as each candidate type must also enumerate and compare its public base classes when matching against the catch block type.

Seeking to reduce binary sizes, we introduce an optimisation that both reduces the space required to represent types and addresses concerns with execution time determinism. This solution is predicated upon the fact that each type requires solely some unique identifier, and not the full std::type_info instance.

For each unique type of exception object thrown, the compiler will automatically emit a char-sized variable with minimum alignment, whose identifier is composed of the name of the type prefixed with __typeid_for_. Pointer types will have an integer prefix corresponding to the number of levels of indirection. Type qualifiers such as const and volatile are ignored.

This "type id" variable will use its address to represent a unique type identifier, with which types might be compared for equality. In a throw expression, the type field of the exception state will be assigned the address of the "type id" matching the thrown type.

To handle polymorphic types, for each unique type of exception thrown, an additional array of addresses will be emitted, containing the addresses of the "type id" variables for the base types. In similar fashion to the "type id" variables, its identifier is prefixed with __typeid_bases_for_, followed by the name of the type. For those types which do not have base classes, a single shared array with the identifier __typeid_empty_bases will be emitted, containing only a null entry.

Testing whether a given type is a base class of another type is then a matter of getting the pointer to the base class array (which is known at compile-time) and iterating through its fixed-size list of bases. When an exception is thrown, its corresponding base class array will be assigned to the base_types field of the exception state.

This same iteration of base classes applies when throwing and catching pointer types. While the type field will contain a "type id" identifying the pointer, the base_types field will correspond to the base type array of the pointee type.

In both the "type id" and base-array cases, to ensure both emission and uniqueness, these variables will be marked as being weak symbols, an indication given to the linker that only one instance of each variable will be preserved in the final binary.
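As an illustration of the emitted data, the sketch below writes out by hand what the "type id" variables and base arrays for a two-class hierarchy might look like, together with a matching test in the spirit of type_is_base from Figure 8. The identifiers drop the reserved double-underscore prefix, weak linkage is omitted, and handler_matches is a hypothetical helper introduced here; none of this is the compiler's literal output.

```cpp
#include <cassert>

struct Base {};
struct Derived : Base {};

// Stand-ins for the compiler-emitted char-sized "type id" variables.
// Only their addresses matter; their values are never read.
char typeid_for_Base;
char typeid_for_Derived;

// Null-terminated base arrays: one shared empty array, one for Derived.
const char* const typeid_empty_bases[]       = { nullptr };
const char* const typeid_bases_for_Derived[] = { &typeid_for_Base, nullptr };

// Is `type` one of the entries of the null-terminated array `bases`?
bool type_is_base(const void* type, const char* const* bases) noexcept {
    for (; *bases != nullptr; ++bases) {
        if (static_cast<const void*>(*bases) == type)
            return true;
    }
    return false;
}

// Would a handler for `handled` accept an exception whose state holds
// `thrown` in its type field and `thrown_bases` in its base_types field?
bool handler_matches(const void* handled, const void* thrown,
                     const char* const* thrown_bases) noexcept {
    return handled == thrown || type_is_base(handled, thrown_bases);
}
```

The loop length is bounded by the number of base classes of the thrown type, which is fixed at compile-time, so the matching cost has a static upper bound.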

2.7 Exception Object Allocation
When throwing an exception, the exception object must be allocated such that it is constructed in the scope where it was thrown, but is also accessible to the scope where it is caught. It is not enough to simply allocate the object on the stack, despite the fact that the exception will cause the stack to be unwound, as local object destructors will run as the stack is unwound, potentially overwriting the exception object. Allocating the exception object on the heap is a solution, but the allocation may result in system calls, which (a) potentially have unbounded execution time and (b) may fail if memory is exhausted. As our target is to support deterministic exceptions, standard heap allocation is not an option.

    bool type_is_base(void* type, const char** bases) noexcept {
        for (; bases[0] != nullptr; bases++) {
            if (static_cast<const void*>(bases[0]) == type)
                return true;
        }
        return false;
    }

Figure 8: Example iteration over base entries; the candidate base is passed to the type parameter, and the array of base "type id"s to the bases parameter.

We propose a statically sized buffer, using a simple constant-time stack allocator for our exception object, with a heap-backed fall-back upon buffer exhaustion for those applications that cannot compute, or do not require, run-time determinism. The size of this buffer will be given a default value by the ABI in use, but can optionally be overridden by the application at link-time via a linker command-line parameter, to optimise or prevent any and all heap allocation.

In the worst-case scenario, our solution matches the performance of current exception implementations, as at least that of g++ uses a similar buffer. However, as we free our exception objects as early as possible, we expect usage of this buffer to be less than in other implementations, and with our novel ability to customise its size (in combination with earlier improvements), we uniquely offer deterministic allocation.
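A statically sized buffer with a constant-time stack allocator and a heap fall-back could look roughly like the sketch below. The buffer size kBufferSize, the Block record, and the ExceptionBuffer class are assumptions made for illustration; the paper does not give the allocator's interface, and the link-time size override is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical buffer size; the paper leaves the default to the ABI.
constexpr std::size_t kBufferSize = 64;

// Record handed back to the caller so that release() is constant-time.
struct Block {
    void*       ptr;          // storage for the exception object
    std::size_t restore_top;  // buffer offset to restore on release
    bool        on_heap;      // true if the fall-back path was taken
};

class ExceptionBuffer {
public:
    // `align` must be a power of two no larger than alignof(std::max_align_t).
    Block allocate(std::size_t size, std::size_t align) {
        std::size_t start = (top_ + align - 1) & ~(align - 1);
        if (start <= kBufferSize && size <= kBufferSize - start) {
            Block b{storage_ + start, top_, false};
            top_ = start + size;  // bump the stack pointer: constant time
            return b;
        }
        // Fall-back on exhaustion: gives up determinism for this allocation.
        return Block{std::malloc(size), top_, true};
    }

    // Blocks must be released in reverse order of allocation.
    void release(const Block& b) {
        if (b.on_heap) std::free(b.ptr);
        else           top_ = b.restore_top;
    }

    std::size_t used() const { return top_; }

private:
    alignas(std::max_align_t) unsigned char storage_[kBufferSize];
    std::size_t top_ = 0;  // offset of the first free byte
};
```

An application that needs fully bounded behaviour would size the buffer so that the fall-back branch is never taken, or replace that branch with a call to std::terminate.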

3 STANDARDS COMPLIANCEIn this section we consider the compliance of our exception im-plementation against the ABI specification and the C++ standardsdocument

31 ABI ComplianceExisting ABI specifications mandate the use of tables and manualstack unwinding However since using the existing function returnmechanism is a superior method of achieving stack unwinding ourimplementation deviates entirely from their design Therefore wewill not evaluate our solution against any existing ABI specifica-tions A side effect of this is that our own ABI is significantly easierto implement as the stack unwinding and exception handling codeis generated in a platform-independent way by the compiler

32 C++ Standard ComplianceIn general our implementation conforms to the C++ standardrsquos re-quirements for exceptions However we have observed four clauseswhere there are deviations

Clause 1814 describes the interaction between the exceptionobject and stdexception_ptr which is designed for exceptionobjects allocated on the heap This is therefore not suited to ourstack-based implementation however our replacement providessimilar functionality

Clause 1822 requires local temporaries such as return valuesto have their destructors called when unwinding However clangrsquosexisting exception implementation does not conform to this partof the specification and as a result our implementation also doesnot conform This is a bug in the clang compiler (on which ourimplementation is based) and if this is repaired our implementationwill also be repaired

Clauses 1823 and 1824 require the destruction of base classinstances and sub-objects already constructed when an exceptionoccurs during a constructor This feature is not yet implementedand will be addressed in future work however it does not requireany conceptual changes as following from its simplicity and simi-larity to local object destruction we foresee no issues that mightoccur in its implementation
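For readers unfamiliar with the requirement in Clauses 18.2.3 and 18.2.4, the sketch below shows the behaviour a conforming compiler produces when a constructor throws: already-constructed bases and members are destroyed in reverse order, and the destructor of the partially constructed object itself is not run. It is plain standard C++ for a conventional compiler; all type and function names are illustrative, and it says nothing about how the deterministic implementation would generate this code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Records construction and destruction order (illustrative helper).
inline std::string& event_log() { static std::string log; return log; }

struct Base {
    Base()  { event_log() += "B+ "; }
    ~Base() { event_log() += "B- "; }
};

struct Member {
    Member()  { event_log() += "M+ "; }
    ~Member() { event_log() += "M- "; }
};

struct Widget : Base {
    Member member;
    Widget() { throw std::runtime_error("constructor failed"); }
    ~Widget() { event_log() += "W- "; }  // not run: Widget never finished constructing
};

// Attempts to construct a Widget and returns the recorded event order.
inline std::string failed_construction_order() {
    event_log().clear();
    assert(event_log().empty());
    try {
        Widget w;
    } catch (const std::runtime_error&) {
        event_log() += "caught";
    }
    return event_log();
}
```

Calling `failed_construction_order()` yields `"B+ M+ M- B- caught"`: the member is destroyed before the base, and both before the handler is entered.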

4 EVALUATION
We have implemented our novel scheme in the clang compiler, and present results showing the performance of our implementation compared to the existing implementation.

4.1 Experimental Set-up
For our experiments we used two different machines, with two different platforms:

• x86-64 Machine: Desktop computer running Ubuntu 18.04; Intel 8700K, 3.7-4.7GHz; 32GB 3200MHz DRAM; Samsung 980 Evo NVMe storage.

• Arm Machine: Raspberry Pi 1 Model B running Raspbian Stretch Lite; ARM1176JZFS (ARMv6), 700MHz; 512 MB DRAM; 16GB SD storage.

All binaries were built with identical flags, other than those determining the type of exceptions to use. Run-time type information is disabled (-fno-rtti), the symbol table is removed (-s), and -O3 optimisation with link-time optimisation (-flto) is employed. Clang 7.0 was used as the compiler, along with libstdc++ and libgcc on Arm, and libc++ and libunwind on x86.

4.1.1 Benchmark Application. Lacking an obvious existing realistic benchmark with which to test exceptions, we instead developed our own microbenchmark. xmlbench is a simple XML parser, targeting ease of use and functionality as a real XML parser. It organically includes a good mixture of functions that can and cannot throw exceptions, and also those which do throw exceptions and those which are “exception neutral”, i.e. allowing exception propagation. By supplying different XML files as input to the parser, we can precisely control the rate of exceptions. Errors in input XML syntax and external conditions (such as the input file missing or an out-of-memory condition) can be influenced in a realistic way. This approach to benchmarking is similar to [15], who also use an XML parser as a representative application.

CC ’19, February 16–17, 2019, Washington, DC, USA. James Renwick, Tom Spink, and Björn Franke

| Platform | Implementation | Size |
|---|---|---|
| Arm | Standard | 59188 bytes |
| Arm | Deterministic | 22068 bytes |
| x86-64 | Standard | 105312 bytes |
| x86-64 | Deterministic | 18600 bytes |

Table 2: Final binary sizes for the xmlbench benchmark with the standard and deterministic exception implementations.

4.2 Results
The following sections detail the results of our observations on various facets of the exception handling infrastructure.

4.2.1 Unwind Library Overhead. Statically-linked binaries participate in whole-program optimisation, and as such have the exception handling and unwind functions removed when using our implementation. This allows for a direct comparison against binaries that use the standard implementation.

Table 2 shows the results of the binary built with both the standard and the deterministic exception implementations. The results for both x86-64 and Arm show substantial decreases in binary size when using our deterministic implementation. This measurement indicates how large the unwinding code is. Arm’s unwinding code is smaller, but by removing it we see a decrease in binary size of 36.3 Kilobytes, a 62.7% reduction for our program. x86-64 has a much larger unwind library, with our implementation saving 84.7 Kilobytes, an 82.3% reduction in size for identical functionality.
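The quoted savings follow directly from the byte counts in Table 2. As a small worked check, the sketch below recomputes them; the struct and function names are hypothetical, and it assumes that "Kilobytes" means units of 1024 bytes, which is the reading that reproduces the stated figures.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type for this worked check.
struct SizeDelta {
    double saved_kib;      // bytes saved, in units of 1024 bytes
    double reduction_pct;  // saving as a percentage of the standard build
};

// Computes the absolute and relative saving between two binary sizes.
inline SizeDelta size_delta(long standard_bytes, long deterministic_bytes) {
    assert(standard_bytes > 0);
    const long saved = standard_bytes - deterministic_bytes;
    return { saved / 1024.0, 100.0 * saved / standard_bytes };
}
```

With the Table 2 values, `size_delta(59188, 22068)` gives 36.25 units of 1024 bytes and about 62.7% for Arm, and `size_delta(105312, 18600)` gives about 84.7 and 82.3% for x86-64.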

On both platforms, the code section (.text) grows due to the inclusion of additional instructions for checking the exception state. On x86-64, the removal of the unwind tables compensates for this growth, with a reduction of 0.3%. However, on Arm, missed optimization opportunities by the compiler lead to a net increase of roughly 8%.

4.2.2 Application Benchmark Performance. We generated two sets of XML files, one with syntax errors and one without, and parsed them with xmlbench. The syntax error is contained within the deepest element, resulting in an exception thrown with the stack at its largest number of function calls. For each run, we measured the total run-time of the xmlbench program, including the start-up time.

Figure 9 shows the average execution time in microseconds per XML element parsed by our benchmark, for both the Arm (Figure 9a) and x86-64 (Figure 9b) platforms. On the Arm platform, the results show there is little difference between execution times, except for the standard exception implementation generally performing the worst of the four when faced with an exception. As was expected, the standard implementation performs better when no exception is raised with only 156 elements, as our implementation must check for active exceptions. This difference disappears as more elements are processed.

On the x86-64 platform, our deterministic implementation takes slightly longer per element than the standard implementation for larger numbers of nodes. This is primarily due to the overhead of checking whether an exception has occurred. As is expected, the difference in time for our implementation depending on whether an exception was thrown or not is very small, particularly in comparison to the same for the standard implementation, which at its peak has a 23% overhead.

| Platform | Elems | Throw | Time (σ), Standard Impl. | Time (σ), Determ. Impl. |
|---|---|---|---|---|
| Arm | 156 | | 42.7ms (0.6ms) | 41.7ms (0.6ms) |
| Arm | 156 | | 43.7ms (0.6ms) | 43.0ms (0.0ms) |
| Arm | 3906 | | 158.7ms (2.1ms) | 162.7ms (1.5ms) |
| Arm | 3906 | | 160.7ms (1.5ms) | 162.0ms (2.0ms) |
| x86-64 | 3905 | | 3.3ms (0.6ms) | 3.3ms (0.6ms) |
| x86-64 | 3905 | | 3.3ms (0.6ms) | 3.3ms (0.6ms) |
| x86-64 | 97655 | | 68.0ms (1.0ms) | 70.7ms (0.6ms) |
| x86-64 | 97655 | | 69.0ms (1.0ms) | 70.3ms (0.6ms) |

Table 3: Overall execution time summary for xmlbench on the Arm and x86-64 platforms.

The benchmark clearly shows how throwing a single exception increases execution time for the standard exception implementation, given the lengthy unwind procedure. However, with our implementation, the improvements to exception handling speed are not quite enough to compensate for the overhead in the absence of exceptions. With more than a single exception, and particularly given the results for 488281 elements, where the trend suggests that our implementation scales better than the existing one, it should be enough for our solution to out-perform the current one.

As shown in Table 3, our implementation creates a small additional execution time overhead in comparison to the standard implementation; however, the difference is negligible and within the standard deviation.

4.2.3 Exception Propagation. To test exception propagation, we used a different benchmark, which simply called the same function recursively as many times as indicated by Stack Frames, with a local object having a simple custom destructor within each frame. After Stack Frames calls, an exception is thrown and caught by reference from the first invocation. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 4.
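The benchmark source is not reproduced in the paper, so the following is only a sketch of what such a propagation benchmark might look like in standard C++. The names `Local`, `descend` and `measure_propagation` are hypothetical, the thrown type (`int`) is an assumption, and `std::chrono::steady_clock` measures wall time rather than the CPU time the paper reports.

```cpp
#include <cassert>
#include <chrono>

// Local object with a simple custom destructor, one per stack frame.
struct Local {
    int* counter;
    explicit Local(int* c) : counter(c) {}
    ~Local() { ++*counter; }
};

// Recurses until `frames` frames are live, then throws from the deepest one.
void descend(int frames, int* destroyed) {
    Local local(destroyed);
    if (frames <= 1) throw 42;
    descend(frames - 1, destroyed);
}

struct PropagationResult {
    int destroyed;    // destructors run while unwinding
    long long nanos;  // time from the first call to arrival in the handler
};

PropagationResult measure_propagation(int frames) {
    assert(frames >= 1);
    int destroyed = 0;
    const auto start = std::chrono::steady_clock::now();
    try {
        descend(frames, &destroyed);
    } catch (const int&) {  // caught by reference in the first invocation's caller
        const auto stop = std::chrono::steady_clock::now();
        const auto ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
        return { destroyed, static_cast<long long>(ns.count()) };
    }
    return { destroyed, -1 };  // not reached: descend always throws
}
```

A driver would call `measure_propagation` repeatedly for 100, 1000 and 10000 frames and report the mean and standard deviation of `nanos`.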

The results show a significant difference in execution time between the two exception implementations. Our implementation is on average 98× faster at performing exception propagation on x86-64. The factor of improvement increases from 68× to 138× as the number of stack frames increases, indicating that the performance penalty seen in the current implementation increases as the number of frames increases. Our implementation performs best on x86-64 with deep call stacks.

On Arm, the improvement in execution time remains almost constant, at 32× to 33× for 100 and 1000 frames, and a 35× speedup for 10000 frames. This suggests that our implementation will consistently offer improved performance, independent of the number of frames.

Low-Cost Deterministic C++ Exceptions for Embedded Systems. CC ’19, February 16–17, 2019, Washington, DC, USA

Figure 9: Execution times in µs per XML element for different element counts with our xmlbench benchmark on Arm (a) and x86-64 (b). Lower is better. [Plots not reproduced. Panels: (a) Arm, 156 to 19531 XML elements; (b) x86-64, 3905 to 488281 XML elements. Horizontal axis: XML Elements. Vertical axis: Exec. Time per element (µs). Series: Standard No Error, Standard With Error, Deterministic No Error, Deterministic With Error.]

| Platform | Stack Frames | Implementation | Time | σ |
|---|---|---|---|---|
| Arm | 100 | Standard | 1351.3µs | 74.3µs |
| Arm | 100 | Deterministic | 41.7µs | 0.6µs |
| Arm | 1000 | Standard | 9.6ms | 0.2ms |
| Arm | 1000 | Deterministic | 0.3ms | 0.0ms |
| Arm | 10000 | Standard | 94.8ms | 1.0ms |
| Arm | 10000 | Deterministic | 2.7ms | 0.1ms |
| x86-64 | 100 | Standard | 114.7µs | 21.6µs |
| x86-64 | 100 | Deterministic | 1.7µs | 0.8µs |
| x86-64 | 1000 | Standard | 2572.0µs | 231.2µs |
| x86-64 | 1000 | Deterministic | 29.0µs | 0.0µs |
| x86-64 | 10000 | Standard | 8823.8µs | 960.0µs |
| x86-64 | 10000 | Deterministic | 64.0µs | 6.4µs |

Table 4: Execution times for the exception propagation benchmark.

The improvements in execution time are a result of not having to resolve the encoded unwind information and walk the stack twice to locate the catch handler before unwinding.

4.2.4 Re-Throwing. To measure exception re-throwing, we used another benchmark, which called the same function recursively as many times as indicated by Stack Frames, with a local object having a custom destructor within each frame. After Stack Frames calls, an exception is thrown but, unlike before, caught and re-thrown by each function. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 5.
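As with propagation, the re-throwing benchmark is described but not listed. The sketch below is one plausible shape for it in standard C++, with timing left out so that the control flow is easy to follow; `Tracked`, `Counters`, `descend_rethrowing` and `run_rethrow_chain` are hypothetical names, and throwing an `int` is an assumption.

```cpp
#include <cassert>

// Local object with a custom destructor, one per stack frame.
struct Tracked {
    int* destroyed;
    explicit Tracked(int* d) : destroyed(d) {}
    ~Tracked() { ++*destroyed; }
};

struct Counters {
    int destroyed = 0;  // destructors run during unwinding
    int rethrows  = 0;  // frames that caught and re-threw
};

// Every frame above the throwing one catches the exception and re-throws it.
void descend_rethrowing(int frames, Counters* c) {
    Tracked local(&c->destroyed);
    if (frames <= 1) throw 42;
    try {
        descend_rethrowing(frames - 1, c);
    } catch (const int&) {
        ++c->rethrows;
        throw;  // re-throw the in-flight exception unchanged
    }
}

// Runs the chain and swallows the exception in the outermost handler.
Counters run_rethrow_chain(int frames) {
    assert(frames >= 1);
    Counters c;
    try {
        descend_rethrowing(frames, &c);
    } catch (const int&) {
        // final handler: the measurement would stop here
    }
    return c;
}
```

For 100 frames this destroys 100 local objects and performs 99 re-throws, one for each frame above the throwing frame.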

These results show a similar pattern to those for straight propagation. On x86-64, for 10000 stack frames, the current implementation takes almost 3× longer to catch and re-throw than simply propagating, while our implementation is 38× slower than when simply propagating. This factor of slowdown is to be expected given how fast our implementation is when not re-throwing, but is also due to the multiple steps required to move the exception object into, and then out of, each stack frame. This is clearly a good target for optimisation, as such transfer is unnecessary. Despite this, our implementation yielded a 9× speedup over the standard implementation.

| Platform | Stack Frames | Implementation | Time | σ |
|---|---|---|---|---|
| Arm | 100 | Standard | 4.1ms | 0.1ms |
| Arm | 100 | Deterministic | 0.2ms | 0.0ms |
| Arm | 1000 | Standard | 33.5ms | 0.4ms |
| Arm | 1000 | Deterministic | 1.8ms | 0.0ms |
| Arm | 10000 | Standard | 318.4ms | 6.3ms |
| Arm | 10000 | Deterministic | 18.2ms | 0.1ms |
| x86-64 | 100 | Standard | 1071.7µs | 363.3µs |
| x86-64 | 100 | Deterministic | 22.7µs | 13.8µs |
| x86-64 | 1000 | Standard | 3.8ms | 0.1ms |
| x86-64 | 1000 | Deterministic | 0.2ms | 0.0ms |
| x86-64 | 10000 | Standard | 22.0ms | 0.3ms |
| x86-64 | 10000 | Deterministic | 2.4ms | 0.1ms |

Table 5: Execution times for the exception re-throwing benchmark.

For 1000 stack frames on x86-64, the difference between execution times is larger, with our implementation executing 19× faster. With 100 frames, the results have a very high standard deviation, but there is a clear order of magnitude between the two times, indicating that the difference in performance remains largely independent of the number of frames.

On Arm, the additional time taken to initialise the exception state in our implementation is evident, as it takes roughly five times longer than in the propagation benchmark. The standard implementation also suffers a performance hit, but one which decreases as the number of frames increases, suggesting that a large part of it is constant. This matches the fact that in the standard implementation the exception object is allocated on the heap, creating a large up-front cost, but one which is amortised across many re-throwings.

In reality, however, re-throwing at every frame is highly unusual behaviour, making both this amortisation and our worsening performance unlikely to be noticed.

4.3 Timing and Memory Determinism
We base our understanding of the language and algorithmic requirements for determinism on Verber and Colnarič [20]. In general, the requirements for deterministic execution fall into two categories: deterministic memory usage and deterministic execution time.

For memory usage, we can further subdivide the requirement into stack usage and heap usage. When no exception occurs, our stack usage is static. A fixed-size allocation for the exception state occurs per noexcept function invocation, throwing destructor invocation, and try block entry.

When an exception is thrown, the object is either allocated with a fixed size determined by the catch block, or with a variable size that can be bounded by the maximum allocation size of all exception objects thrown. The functions called by our implementation have a well-defined order and number, known at compile-time, and are not recursive. While pointers are used to store this address, their addresses are fixed within the range of the exception object buffer or the current stack frame.

Our implementation does not require heap allocation. However, it is possible for the program to allocate an arbitrary amount of memory if local object destructors throw exceptions. For example, during stack unwinding, destructors may throw an exception, causing their own local objects to be destroyed, which in turn may throw additional exceptions.

However, since this potential execution flow is known at compile-time, and is akin to normal function calls made directly via their function definition rather than via function pointer, static analysis tools are able to give a bound for the number and type of exceptions thrown, and thus the maximum size of allocation. Furthermore, due to the nature of stack unwinding, recursion is not possible except where introduced explicitly by the developer’s code, and thus the number of stack frames is known.

Deterministic execution time generally refers to the ability to calculate the worst-case execution time (WCET), as demanded by most real-time systems. For execution time, unlike with existing implementations, it is trivial to calculate the WCET of all of our operations. Aside from the allocation and initialisation of the local exception state, and the checking of the active flag (all of which have deterministic timing), we have three operations.

The first allocates the exception object in the exception buffer. In this case, the allocator is a stack allocator, which merely advances a pointer, except where the developer does not specify a correct maximum bound on exception allocation. The second frees the exception object, which again is trivially a pointer decrement. The final function resolves the base class “type id”s when matching polymorphic exception types against catch handlers. Our implementation’s novel approach guarantees a fixed number of iterations through the list of base classes, again known at compile time, and derived from the filter of the catch block.
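The third operation can be pictured as a bounded linear scan over a per-type array of base “type id”s. The sketch below is a simplified illustration under assumptions, not the paper's algorithm: the `TypeId` representation (the address of a per-type tag), the function `matches_handler`, its parameter names and the example types are all hypothetical, and here the iteration bound is simply the array length fixed for each thrown type.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type id: the address of a unique per-type tag object.
using TypeId = const void*;

template <typename T>
TypeId type_id_of() {
    static const char tag = 0;  // one distinct object per instantiation
    return &tag;
}

// Returns true if a handler for `type` can catch an exception whose dynamic
// type is `thrown` and whose base classes are listed in bases[0..count).
// The loop runs at most `count` times, a bound fixed when the array is built.
bool matches_handler(TypeId type, TypeId thrown,
                     const TypeId* bases, std::size_t count) {
    assert(bases != nullptr || count == 0);
    if (type == thrown) return true;
    for (std::size_t i = 0; i < count; ++i) {
        if (bases[i] == type) return true;
    }
    return false;
}

// Example hierarchy used only for illustration.
struct ErrBase {};
struct ErrMid : ErrBase {};
struct ErrLeaf : ErrMid {};
struct Unrelated {};
```

For an `ErrLeaf` exception the base array would hold the ids of `ErrMid` and `ErrBase`, so a `catch (ErrBase&)` filter matches after at most two comparisons, while a handler for `Unrelated` is rejected after the same bounded scan.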

Therefore, should a WCET analysis tool give correct predictions for programs not using exceptions, it would also give such predictions for our implementation, meeting the criteria for determinism.

5 RELATED WORK
Modern exception handling, including the semantics of raising (throwing) and handling (catching) exceptions, exception hierarchies, and control flow requirements, was first introduced in [6].

In [11], inherent overheads, both in terms of execution time and memory usage, in the current C++ implementations of exceptions are identified. A further evaluation of how C++ features impact embedded systems software is presented in [16]. This report singles out exceptions and their prerequisite, run-time type information (RTTI), as “the single most expensive feature an embedded design may consider”.

Lang and Stewart [12] describe stack unwinding as it relates to exceptions in real-time systems, and show how the worst-case execution time of the stack unwinding is unbounded. This has later been confirmed in [9]. For our own definition of determinism, we take insight from [20], which details many of the requirements necessary for languages to be susceptible to time analysis.

A new exception handling approach which is better suited to static analysis is presented in [13], but it requires substantial code rewriting. SetJmp/LongJmp and table-based exception handling are discussed in [3]. The results demonstrate large increases in binary size and an execution time overhead of 10-15%. Gylfason and Hjalmtysson [8] develop a variant of table-based exception handling for use within the Itanium port of the Linux kernel.

Most relevant to the work presented in this paper is [19]. It focuses on propagating exceptions down the stack by duplexing the return value of functions, so as to fit the exception object itself within the register or stack memory used for the return value, and, importantly, to re-use the existing function return mechanism to perform stack unwinding. This approach hinges on two complex changes to C++’s function calling convention, though. This kind of change would be significant, breaking compatibility with C code and all existing C++ libraries. Instead, our novel exception implementation maintains the existing exception model by finding a new way to store and match exceptions at run-time, with little to no additional performance impact when exceptions are not thrown.

6 SUMMARY & CONCLUSIONS
In this paper we have developed a novel implementation of C++ exceptions, which better suits the requirements of embedded systems while still upholding the C++ core design tenets. Our implementation shows binary size decreases of up to 82.3% over existing implementations, performance improvements of up to 325% when handling exceptions, minimal execution time overhead on x86-64, and none on the Arm platform. We have shown that our implementation achieves memory and execution time determinism, making it significantly better-suited to calculating worst-case execution times, a major hurdle in the adoption of exceptions.

Our future work will investigate further code size improvements resulting from code optimizations and shrinking of the exception state object, while reducing execution time by allocating the active flag in the exception state to a register.


REFERENCES
[1] David W. Binkley. 1997. C++ in Safety Critical Systems. Ann. Softw. Eng. 4, 1-4 (Jan. 1997), 223–234. http://dl.acm.org/citation.cfm?id=590565.590588
[2] Beman Dawes, David Abrahams, and R. Rivera. 2014. Boost C++ libraries. http://www.boost.org/doc/libs/1_55_0/libs/libraries.htm
[3] Christophe de Dinechin. 2000. C++ Exception Handling for IA64. In Proceedings of the First Workshop on Industrial Experiences with Systems Software, WIESS 2000, October 22, 2000, San Diego, CA, USA, Dejan S. Milojicic (Ed.). USENIX, 67–76. http://www.usenix.org/events/wiess2000/dinechin.html
[4] Bruce Eckel, Chuck D. Allison, and Chuck Allison. 2003. Thinking in C++, Vol. 2 (2 ed.). Pearson Education.
[5] C++ exception handling 2012. ARM Compiler toolchain Compiler Reference.
[6] John B. Goodenough. 1975. Exception Handling: Issues and a Proposed Notation. Commun. ACM 18, 12 (Dec. 1975), 683–696. https://doi.org/10.1145/361227.361230
[7] Google. 2018. Google C++ Style Guide. https://google.github.io/styleguide/cppguide.html
[8] Halldor Isak Gylfason and Gisli Hjalmtysson. 2004. Exceptional Kernel–Using C++ exceptions in the Linux kernel.
[9] Wolfgang A. Halang and Matjaž Colnarič. 2002. Dealing with exceptions in safety-related embedded systems. IFAC Proceedings Volumes 35, 1 (2002), 477–482.
[10] Pieter Hintjens. 2014. ZeroMQ The Guide. http://zeromq.org
[11] ISO/IEC JTC1 SC22 WG21. 2006. ISO/IEC TR 18015 Technical Report on C++ Performance. Technical Report ISO/IEC TR 18015:2006. ISO/IEC. https://www.iso.org/standard/43351.html
[12] Jun Lang and David B. Stewart. 1998. A Study of the Applicability of Existing Exception-handling Techniques to Component-based Real-time Software Technology. ACM Trans. Program. Lang. Syst. 20, 2 (March 1998), 274–301. https://doi.org/10.1145/276393.276395
[13] Roy Levin. 1977. Program Structures for Exceptional Condition Handling. Technical Report. Carnegie-Mellon Univ. Pittsburgh PA Dept. of Computer Science.
[14] Mark Maimone. 2014. C++ On Mars. In Proceedings of the C++ Conference (CppCon 2014).
[15] Prakash Prabhu, Naoto Maeda, Gogul Balakrishnan, Franjo Ivančić, and Aarti Gupta. 2011. Interprocedural exception analysis for C++. In European Conference on Object-Oriented Programming. Springer, 583–608.
[16] César A. Quiroz. 1998. Using C++ efficiently in embedded applications. In Proceedings of the Embedded Systems Conference.
[17] Richard Smith et al. 2017. Working draft, standard for programming language C++. Technical Report.
[18] Alexander D. Stoyenko. 2012. Constructing predictable real time systems. Vol. 146. Springer Science & Business Media.
[19] Herb Sutter. 2018. P0709 Zero-overhead deterministic exceptions: Throwing values. Technical Report R0. SG14.
[20] Domen Verber and Matjaž Colnarič. 1996. Programming and time analysis of hard real-time applications. IFAC Proceedings Volumes 29, 6 (1996), 79–84.

  • Abstract
  • 1 Introduction
    • 1.1 Motivating Example
    • 1.2 Contributions
  • 2 Deterministic C++ Exceptions
    • 2.1 Throws Exception Specifier
    • 2.2 Throwing Exceptions and Exception State
    • 2.3 Handling Exceptions
    • 2.4 Re-Throwing
    • 2.5 Throwing Destructors
    • 2.6 Exception Object Type Identification
    • 2.7 Exception Object Allocation
  • 3 Standards Compliance
    • 3.1 ABI Compliance
    • 3.2 C++ Standard Compliance
  • 4 Evaluation
    • 4.1 Experimental Set-up
    • 4.2 Results
    • 4.3 Timing and Memory Determinism
  • 5 Related Work
  • 6 Summary & Conclusions
  • References

    void C() {
        Bar bar;
        throw 0;
    }

    void B() {
        Foo foo;
        C();
    }

    void A() {
        try {
            B();
        } catch (int& p) {
            // Catch exception
        }
    }

Figure 1: Example code showing a possible scenario in which exceptions are used. Function A has a try/catch block that encompasses the call to B. B allocates a local object and calls C, which also allocates a local object and throws an exception.
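A self-contained variant of the Figure 1 scenario makes the unwinding order observable on any conventional compiler. This is a sketch for illustration: the `trace` helper, the destructor bodies and the `std::string` return type of `A` are additions that are not in the figure.

```cpp
#include <cassert>
#include <string>

// Records the order of events (illustrative helper, not part of Figure 1).
inline std::string& trace() { static std::string t; return t; }

struct Foo { ~Foo() { trace() += "~Foo "; } };
struct Bar { ~Bar() { trace() += "~Bar "; } };

void C() {
    Bar bar;
    throw 0;
}

void B() {
    Foo foo;
    C();
}

// Same shape as A in Figure 1, but returns the recorded trace.
std::string A() {
    trace().clear();
    try {
        B();
    } catch (int& p) {
        assert(p == 0);
        trace() += "caught " + std::to_string(p);
    }
    return trace();
}
```

`A()` returns `"~Bar ~Foo caught 0"`: the local object in C is destroyed first, then the one in B, and only then does control reach the handler in A.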

novel C++ exception implementation achieves memory and execution time determinism, thus making it better suited for embedded applications with specified worst-case execution time requirements.

1.1 Motivating Example
Consider the motivating example in Figure 1, where function A has a try/catch block that encompasses a call to B. B allocates a local object and calls C, which also allocates a local object and throws an exception. During exception handling, stack unwinding is performed, i.e. stack frames for functions C and B must be removed from the call stack, and local objects destroyed. Finally, the catch handler (found in A) is invoked. Figure 3 shows this operation in action, for both the standard implementation and our proposed implementation. In each case, the stack must be unwound from the throw site in function C, back to the catch handler in function A. This involves three steps:
(1) Local objects in C are destroyed, and callee-saved registers are restored.
(2) Local objects in B are destroyed, and callee-saved registers are restored.
(3) Control is transferred to the catch handler in A.

The most common implementation of C++ exception handling, in use by e.g. both GCC and Clang, makes use of table-based stack unwinding. It seeks to eliminate any runtime overhead in the case where no exceptions are thrown. This is achieved by storing abstract instruction sequences, required for identifying the catch handler, restoring the call stack and machine state, and invoking object destructors, in Unwind Tables, which are generated at compile-time and embedded in the final program binary. When an exception is thrown, the exception object is allocated on the heap and a 2-stage process for stack unwinding is started. The first stage, shown in Figure 2, is concerned with finding a suitable catch handler. Using the unwind tables, an embedded abstract machine executes the prepackaged instruction sequence to identify the catch handler by scanning over the call stack to determine how far back the stack should be unwound. The second stage, shown in the top half of Figure 3, then performs the actual stack unwinding. Again, an abstract machine implements a so-called personality routine, which uses instruction sequences stored in the unwind tables to destroy local objects, remove stack frames from the call stack, update and commit the machine state, and eventually invoke the catch handler.

Figure 2: The standard exception handling implementation requires an initial phase to scan over the call stack, looking for a catch handler. This phase does not unwind the stack; it identifies how far back the stack should be unwound. [Diagram not reproduced: the call stack holds frames for A, B (with foo) and C (with bar), each with a return address and callee-saved registers; the unwind tables are used to resolve the catch handler, found in A.]

The unwind tables used by the conventional C++ exception implementations can grow large and become complex. Furthermore, these tables typically encode operations to perform during stack unwinding (such as restoring registers, running object destructors, etc.), which must be executed by an integrated software abstract machine. While this approach eliminates runtime overhead when no exceptions occur, it increases the memory footprint of the application and introduces non-determinism due to the abstract machine used at runtime for exception handling.

Our novel C++ exception implementation, shown in the bottom half of Figure 3, eliminates the need for unwind tables and the associated abstract machine. Instead, we introduce a stack-allocated exception state object, which is allocated at the outermost exception propagation barrier (function A in our example) and is populated at throw time (function C). This state object contains bookkeeping information for propagating the exception between the throw and catch sites. We pass this object into any invoked function via an implicit function parameter, and use it to initiate and orchestrate exception handling. After each return from a function call, the compiler inserts an additional check of the exception state to determine if an exception is in progress and, if so, uses a standard function return mechanism for step-by-step stack unwinding until we reach the matching catch statement (stages 1, 2 and 3 in Figure 3).
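To see how ordinary returns can perform the unwinding, the scheme can be emulated by hand in standard C++. The sketch below is a simplification under assumptions: `ExceptionState` here holds only an active flag and an `int` payload, whereas the paper's state object records type and size information, and in the real scheme the parameter and the checks are inserted by the compiler. All names are hypothetical.

```cpp
#include <cassert>
#include <string>

namespace lowered {

// Hypothetical stand-in for the compiler-generated exception state.
struct ExceptionState {
    bool active = false;
    int  value  = 0;  // simplified payload; the real state describes any type
};

// Records the order of events (illustrative helper).
inline std::string& steps() { static std::string s; return s; }

struct Foo { ~Foo() { steps() += "~Foo "; } };
struct Bar { ~Bar() { steps() += "~Bar "; } };

void C(ExceptionState* ex) {
    Bar bar;
    ex->active = true;  // "throw 0": populate the state at the throw site
    ex->value  = 0;
    return;             // unwind by returning; bar is destroyed as usual
}

void B(ExceptionState* ex) {
    Foo foo;
    C(ex);
    if (ex->active) return;  // check inserted after the call
    steps() += "B-continues ";
}

std::string A() {
    steps().clear();
    ExceptionState ex;  // allocated at the propagation barrier
    assert(!ex.active);
    B(&ex);
    if (ex.active) {    // stands in for the matching catch handler
        ex.active = false;
        steps() += "caught " + std::to_string(ex.value);
    }
    return steps();
}

}  // namespace lowered
```

`lowered::A()` produces `"~Bar ~Foo caught 0"`, the same observable order as conventional unwinding, using only normal function returns and one flag check per call site.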

1.2 Contributions
In this paper we contribute to the development of a novel C++ exception implementation particularly suitable for use in embedded systems. We address the following issues:
(1) Reducing the binary size increase caused by exceptions.
(2) Permitting the bounding of memory usage and execution time when using exceptions, i.e. supporting determinism.
(3) Maintaining or improving the performance of exceptions.
Although we focus on embedded systems, we will ensure that these goals also apply to general-purpose applications.

Figure 3: Stages involved in stack unwinding for the standard exception implementation (top) and our novel scheme (bottom). [Diagram not reproduced. Standard Implementation, driven by the unwind tables: stage 1, personality routine destroys bar, updates machine state, finishes; stage 2, personality routine destroys foo, updates machine state, finishes; stage 3, personality routine commits machine state and continues in the catch handler. Our Implementation, driven by the exception state: stage 1, standard function return sequence destroys bar, restores saved registers, returns to B; stage 2, standard function return sequence destroys foo, restores saved registers, returns to A; stage 3, check active flag in exception state and continue in the catch handler.]

2 DETERMINISTIC C++ EXCEPTIONS
Stack unwinding forms a core part of the mechanisms underpinning exceptions. It is also responsible for their poor execution times and increased binary sizes, and is part of the reason for exceptions being non-deterministic.

Sutter [19] has only very recently proposed re-using the existing function return mechanism in place of the traditional stack unwinding approaches, requiring no additional data or instructions to be stored, and little to no overhead in unwinding the stack. Furthermore, by removing stack unwinding’s runtime reliance on tables encoded in the program binary itself, the issue of time and spatial determinism is solved. As it is possible to determine the worst-case execution times for programs not using exceptions, it follows that exception implementations making use of the same return mechanism must also be deterministic in stack unwinding.

Given these clear advantages, we have based our implementation on this design, with a core difference being the replacement of their use of registers with function parameters, allowing for much easier interoperability with C code, which can simply provide the parameter as necessary.

A limitation with [19] is that they require all exceptions be of the same type, leaving much of the standard exception-handling functionality up to the user. Our novel approach includes a method of throwing and catching exceptions of arbitrary types (as with existing exception handling) without imposing any meaningful execution-time penalties when exceptions do not occur. We also reduce the size of run-time type information (RTTI) and maintain determinism over existing implementations.

     1 void foo(__exception_t* __exception) throws {
     2   // Exception state __exception assigned to by throw
     3   throw SomeError();
     4 }
     5 int main() {
     6   // State automatically allocated in main
     7   __exception_t __exception_state;
     8   foo(&__exception_state);
     9   // check for exception
    10   if (__exception_state.active) goto catch;
    11 }

Figure 4: Pseudocode showing injected exception state object (line 7) and parameter (line 1).

Our scheme makes use of a stack-allocated object that records the necessary run-time information for throwing an exception, such as the type and size of the exception object. This state is allocated in a single place, and is passed between functions via an implicit function parameter injected into functions which support exceptions. The state is initialised by throw expressions, and is re-used to enable re-throwing. catch statements use the state in order to determine whether they can handle the exception. After a call to a function which may throw exceptions, a run-time check is inserted to test whether the state contains an active exception.

CC '19, February 16–17, 2019, Washington, DC, USA. James Renwick, Tom Spink, and Björn Franke

    1  auto safe_divide(float dividend, float divisor) throws {
    2      if (divisor == 0)
    3          throw std::invalid_argument("Divide by zero");
    4      else
    5          return dividend / divisor;
    6  }

Figure 5: Example function marked with our proposed throws exception specifier (line 1), allowing exception propagation.

Figure 4 gives an example of the variables the compiler will automatically inject during code generation. Line 1 shows the normally hidden implicit exception state parameter, __exception. Line 3 assigns values corresponding to the SomeError type to the exception state parameter. Line 7 shows the automatically allocated __exception_state variable emitted by all noexcept functions. Line 8 shows the exception state object being automatically passed to the throws function foo. Line 10 shows the check inserted after each call to a throwing function to test for an active exception.

The design of this implementation performs almost all of the exception handling directly within the functions being executed, allowing the optimiser to work most effectively, and by implementing exceptions on top of normal execution flow, code used in returning from functions is re-used when unwinding the stack following exceptions. This helps to reduce code size and unpredictability, and overcomes the largest roadblock in achieving deterministic execution time. This approach combines the exception-handling mechanism of C++ with the inter-mixed error-checking and non-error-handling code integral to the design of [9].
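The lowering described above can be approximated by hand in standard C++. The following is a minimal sketch, not the compiler's actual output: the ExceptionState struct, its what field, and the try_divide wrapper are hypothetical names introduced only for illustration, and the state parameter that the compiler would inject implicitly is written out explicitly.

```cpp
#include <cassert>

// Hypothetical stand-in for the compiler-generated exception state.
struct ExceptionState {
    bool active = false;         // set by a "throw", tested after each call
    const char *what = nullptr;  // illustrative payload only
};

// Hand-lowered form of: auto safe_divide(float, float) throws
float safe_divide(float dividend, float divisor, ExceptionState *ex) {
    assert(ex != nullptr);
    if (divisor == 0) {
        ex->active = true;            // "throw": record the exception...
        ex->what = "Divide by zero";
        return 0.0f;                  // ...and leave via the normal return path
                                      // (placeholder; the caller must ignore it)
    }
    return dividend / divisor;
}

// Hand-lowered form of a noexcept caller: it owns the state and checks it.
bool try_divide(float a, float b, float *out) {
    ExceptionState state;                      // allocated by the noexcept function
    float result = safe_divide(a, b, &state);  // state passed as a parameter
    if (state.active) return false;            // injected check ("goto catch")
    *out = result;
    return true;
}
```

Because the error path is an ordinary return followed by a flag test, the optimiser sees the same control flow it would for a hand-written error code.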

2.1 Throws Exception Specifier

A fundamental issue is to decide which functions should take the implicit exception parameter that our design requires (i.e. which functions could throw exceptions). The natural choice would be to apply it to all functions with C++ linkage, except for those marked noexcept, in keeping with the current exception implementation. However, the issue with this approach would be that C++ and C functions would then have different calling conventions. Whilst C++ functions can, in general, be called without special handling from C, this is only due to the coincidence that the two calling conventions are identical.

With our proposed scheme, the decision to have C++ functions support exceptions by default is impractical, as C++ has certain language holes when it comes to specifying linkage. Specifically, neither classes, structs, nor their members such as function pointers can have a specified linkage. Therefore, we propose to have functions be declared noexcept by default. This change in default exception specification preserves compatibility with C, at the cost of a language change in C++.

Like noexcept, we require functions that might throw encode this into their signature (similar to [19]), as this impacts on how they are called. Thus, to indicate that functions can throw exceptions, we introduce an exception specifier (that is part of the function prototype), as shown in Figure 5.

Name         Description
type         The "type id" variable address identifying the exception object's type
base_types   Pointer to an array of "type id"s identifying the exception object's base class types
ptr          Whether the exception object is a pointer to a non-pointer type
size         The size in bytes of the exception object
align        The alignment in bytes of the exception object
ctor         The address of the object's move or copy constructor
dtor         The address of the object's destructor
buffer       The address of the exception object
active       Whether the exception is currently active

Table 1: Exception State Fields

2.2 Throwing Exceptions and Exception State

The exception state object exists to hold run-time information on the exception currently in progress. Every function marked noexcept, including the program main function and functions used as thread entry-points, allocates and initialises its own exception state object, effectively making noexcept functions boundaries beyond which exceptions cannot propagate.

The address of this object is passed to any of the called functions that are marked throws, allowing the exception to propagate down the call chain as far as the first noexcept function where, if not handled, the program is terminated.

Following a throw expression, the fields in the exception state object are populated at the throw-site with values known at compile-time, corresponding to the exception object given in the expression. The active flag is set to true, the exception object is moved into a buffer, and its address assigned to the buffer field of the exception state.

If the throw expression is directly within a try statement, the code jumps to the first catch block for handling. Otherwise, provided the current function is marked as throws, the exception is propagated: execution is transferred to the function epilogue as if returning normally, but the return value for the function, if any, is not initialised.

During the return sequence, local objects have their destructors executed as if a standard function return had occurred, efficiently unwinding the stack. The exception being marked active is not made visible to destructors, as destructors are necessarily noexcept in this context ("If a destructor called during stack unwinding exits with an exception, std::terminate is called (15.5.1)." [17]). In this way, the exception state is untouched until entering the initial catch block. If the current function is not marked as throws, propagation is replaced by an immediate call to std::terminate to terminate the program.

After each function call to a function marked as throws, a check is automatically inserted by the compiler to test whether the active flag has been set, and thus whether the exception has been propagated. This is the largest overhead as a result of our changes, requiring a test for each function called regardless of whether an exception was thrown or not. There is perhaps a good case to be made for its necessity, however. By indicating that a function can throw via throws, the developer is encoding into the function signature the fact that an error may occur during its execution. Thus, regardless of performance requirements, some test must be made for that error in a correct program.
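To make the throw-site steps of this section concrete, the sketch below writes out by hand what a throw of SomeError might expand to. Only the field names come from Table 1; the C++ types of the fields, the single static buffer, and the names throw_some_error, typeid_for_SomeError, typeid_empty_bases, leaf and middle are assumptions made for illustration, and the assertion that a state holds at most one exception is an invariant of this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Field names follow Table 1; the C++ types are assumptions.
struct ExceptionState {
    const void *type = nullptr;                   // address of the "type id" variable
    const void *const *base_types = nullptr;      // null-terminated base "type id"s
    bool ptr = false;
    std::size_t size = 0;
    std::size_t align = 0;
    void (*ctor)(void *dst, void *src) = nullptr; // move-constructor thunk
    void (*dtor)(void *obj) = nullptr;            // destructor thunk
    void *buffer = nullptr;
    bool active = false;
};

struct SomeError { std::string message; };

// Hypothetical per-type data that the compiler would emit.
char typeid_for_SomeError;
const void *const typeid_empty_bases[] = {nullptr};
alignas(SomeError) unsigned char exception_buffer[sizeof(SomeError)];

// What `throw SomeError{...}` might expand to at the throw-site.
void throw_some_error(ExceptionState *ex, SomeError &&err) {
    assert(!ex->active);  // sketch invariant: one exception per state
    ex->type = &typeid_for_SomeError;
    ex->base_types = typeid_empty_bases;
    ex->size = sizeof(SomeError);
    ex->align = alignof(SomeError);
    ex->ctor = [](void *dst, void *src) {
        ::new (dst) SomeError(std::move(*static_cast<SomeError *>(src)));
    };
    ex->dtor = [](void *obj) { static_cast<SomeError *>(obj)->~SomeError(); };
    ex->buffer = ::new (exception_buffer) SomeError(std::move(err));
    ex->active = true;
}

int destroyed_locals = 0;
struct Local { ~Local() { ++destroyed_locals; } };

// A `throws` function: after the throw it simply returns, so `guard`
// is destroyed by the ordinary function epilogue.
void leaf(ExceptionState *ex) {
    Local guard;
    throw_some_error(ex, SomeError{"boom"});
    return;
}

void middle(ExceptionState *ex) {
    Local guard;
    leaf(ex);
    if (ex->active) return;  // injected check: keep propagating
    // ... code that runs only when no exception is active ...
}
```

Calling middle with a fresh state leaves the state active, with both Local objects already destroyed by the normal return path.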

2.3 Handling Exceptions

Once an exception has been thrown, and if it does not result in a call to std::terminate, control is transferred to the first available catch block.

Before each catch block executes, it first compares for value equality the "type id" of the exception type it handles and the type marked in the type field of the exception state. If they are equal, or if the type the catch block handles is a base type of the type in the type field, then the catch block is entered. Otherwise, execution jumps to the next catch block and repeats this process until the catch block is a catch-all or all blocks have been exhausted. If no blocks remain, the exception continues propagation as described previously.

Once a catch block is executed, the active flag of the exception state is cleared, memory is allocated on the stack for the exception object, and the move or copy constructor is invoked to transfer ownership of the exception object into the current block.

If an exception were to occur during the move or copy operation, std::terminate is called to satisfy the C++ specification, which states: "If the exception handling mechanism, after completing evaluation of the expression to be thrown but before the exception is caught, calls a function that exits via an exception, std::terminate is called (15.5.1)."

Catch filters can take two forms: (1) by value, where the catch variable represents an object allocated within the catch block, or (2) by reference, where the catch variable is a reference to the exception object allocated elsewhere.

In the latter case, an additional unnamed variable is introduced in which to store the exception object, and its allocation and initialisation is performed as described above, using the equivalent non-reference type. The catch variable reference is then bound to this unnamed variable. In this way, even when the exception object is caught by reference, it is still owned by the catch block.

Following a successful move/copy of the exception object, the destructor is called on the original exception object instance within the buffer. Naturally, as destructors are necessarily noexcept, were the destructor to exit with an exception, again std::terminate would be called.

With the catch variable successfully initialised, the catch block then commences execution. Once complete, execution moves to a point past the final catch block in the current scope, and the catch variable is destroyed as usual.
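The sequence for entering a by-value catch block can be sketched as follows. This is an illustration under assumptions rather than the generated code: the reduced ExceptionState, the ParseError type, and the functions handler_matches and enter_catch_parse_error are hypothetical, and the catch body is reduced to copying the message out.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

struct ExceptionState {                       // reduced subset of Table 1
    const void *type = nullptr;
    const void *const *base_types = nullptr;  // null-terminated
    void *buffer = nullptr;
    bool active = false;
};

struct ParseError { std::string message; };
char typeid_for_ParseError;                   // hypothetical "type id" variable

// True if a handler for the type identified by `handled` may catch `ex`.
bool handler_matches(const ExceptionState &ex, const void *handled) {
    if (ex.type == handled) return true;
    for (const void *const *b = ex.base_types; b && *b; ++b)
        if (*b == handled) return true;
    return false;
}

// Sketch of entering `catch (ParseError e) { seen = e.message; }`.
// Returns false when this handler does not apply.
bool enter_catch_parse_error(ExceptionState &ex, std::string &seen) {
    if (!ex.active || !handler_matches(ex, &typeid_for_ParseError))
        return false;                         // try the next handler or propagate
    assert(ex.buffer != nullptr);
    ex.active = false;                        // the exception is now being handled
    auto *original = static_cast<ParseError *>(ex.buffer);
    ParseError e(std::move(*original));       // the catch block now owns the object
    original->~ParseError();                  // destroy the instance in the buffer
    ex.buffer = nullptr;
    seen = e.message;                         // body of the catch block
    return true;
}                                             // `e` is destroyed here as usual
```

A catch by reference would differ only in binding a reference to the local object e instead of naming it directly.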

2.4 Re-Throwing

One of the issues with maintaining and passing references to a single instance of exception state via the exception state parameter is that more than one exception may be active at a time and, crucially, the initial exception may need to be re-thrown following the intermediary exception.

    try { throw 1; }
    catch (int i) {
        try { throw 2; }
        catch (...) { }
        throw;
    }

Figure 6: Re-throwing following a nested exception. The first exception's state must be persisted, and not overwritten by the second exception, to allow the re-throw.

Figure 6 gives an example. The first exception is thrown on line 1, caught on line 2, then a second exception is thrown on line 3, and caught by the catch-all on line 4. The first exception is then re-thrown on line 5, which requires the first exception's state to have persisted across the second inner throw/catch.

To allow for this, a local copy of the exception state is allocated when entering a try block, and if an exception occurs within the try block, the exception modifies the local copy. If the same exception is not handled by the try block's corresponding catch block or blocks, the local state is copied back to either the outer state or to the function's exception state, as passed in the parameter, for exception propagation.

If the inner exception is handled correctly, only its local exception state is modified, thus the outer exception state is maintained and can be used, for example, to successfully re-throw the outer exception. As we only initialise the active field of the local copy, its cost is a simple stack allocation plus zero-initialisation of a boolean flag.
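The effect of the per-try local state can be seen by lowering Figure 6 by hand. In this sketch an int payload stands in for the exception object, and the names ExceptionState, throw_int and figure6 are hypothetical; the real state would carry the Table 1 fields.

```cpp
#include <cassert>

struct ExceptionState {  // reduced: an int payload stands in for the object
    bool active = false;
    int value = 0;
};

void throw_int(ExceptionState *ex, int v) {
    ex->value = v;
    ex->active = true;
}

// Hand-lowered Figure 6, called with the enclosing function's state.
// On return, *outer is active because the re-throw propagates.
void figure6(ExceptionState *outer) {
    ExceptionState first;           // local state for the outer try block
    throw_int(&first, 1);           // try { throw 1; }
    if (first.active) {             // catch (int i) {
        first.active = false;
        ExceptionState second;      //   local state for the inner try block
        throw_int(&second, 2);      //   try { throw 2; }
        if (second.active) {        //   catch (...) { }
            second.active = false;  //   handled: only `second` was modified
        }
        assert(first.value == 1);   //   the first exception's state survived
        first.active = true;        //   throw;
        *outer = first;             //   unhandled here: copy back and propagate
        return;
    }                               // }
}
```

Had both exceptions shared one state object, the inner throw would have overwritten the value 1 before the re-throw.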

C++ already has a mechanism for obtaining and re-throwing exceptions by making and passing around an explicit copy of (or reference to) the exception state, via the use of std::current_exception(), which when called returns an instance of std::exception_ptr. To provide similar functionality in our implementation, we propose an equivalent mechanism, namely std::get_exception_obj. This function returns a reference to a std::exception_obj instance stored on the stack.

What makes std::exception_obj novel is that its type is templated, such that developers can specify a custom allocator, giving them control over where the exception object is to be allocated when being transferred to a new instance. However, we restrict the use of std::get_exception_obj such that it can only be called from directly within a catch block. This makes sense from a value-oriented perspective - functions can only inspect and access their own variables and not their callers', unless shared explicitly via parameters. Since, in our implementation, the exception object is directly owned by the catch block, functions called from within that catch block would not normally have access to the object.

Re-throwing in our implementation is achieved by calling the std::rethrow function and passing a std::exception_obj instance. This is intuitive, arguably more so than the bare throw expression, and crucially makes clear the cost of re-throwing, as std::exception_obj instances must be explicitly moved around.

Figure 7 shows a complete example. On lines 1 and 2, a function is declared taking a std::exception_obj instance using the default allocator. This object is move-constructed from the corresponding std::exception_obj instance representing the exception object within the catch block on line 12, allocating the exception object as instructed. The object is then moved and stored in a global variable result.exception on line 5, which is then re-thrown on line 15.

     1  void store_exception(
     2      std::exception_obj<std::allocator<const char>> obj)
     3  throws {
     4      // Move the exception object into a global
     5      result.exception = std::move(obj);
     6  }
     7  // ---------------------------------
     8  try {
     9      ...
    10  } catch (...) {
    11      // Move the exception object out of the catch block
    12      store_exception(std::move(std::get_exception_obj()));
    13  }
    14  // The exception can be re-thrown from a global
    15  std::rethrow(std::move(result.exception));

Figure 7: Example of re-throwing the exception object using our proposed std::rethrow function and std::exception_obj type.

To enable re-throwing, the std::exception_obj instance also contains the exception state object, which is already populated with the details of the exception. Thus, all that is required to re-throw that exception is to copy those fields to the current exception state, restore the exception object to the buffer, set the active flag, and then attempt to propagate the exception as if throwing normally.

2.5 Throwing Destructors

Although generally advised against by the C++ specification, destructors can throw, and can be explicitly marked as throws. Were such destructors to be called during stack unwinding, they would incorrectly share the same exception state as the uncaught exception. In particular, the active flag would be set, leading to incorrect execution flow when function calls from within the destructor performed their exception check.

To avoid this, we modify the compiler such that when throwing destructors are entered, they allocate and initialise their own local exception state, as if they were a noexcept function. This local state will be used within the destructor. If at any point an exception is unhandled within the destructor, such that it would exit with that exception, it first checks the exception state parameter to see if it is marked as active. If so, two unhandled exceptions are in progress, and thus the program must be terminated with std::terminate, per the C++ standard (15.5.1, as quoted above).

As throwing destructors may be called when an exception is not active, as part of normal logic, these additional measures constitute overhead in the case where exceptions are not thrown. However, throwing destructors are rare, and are conceptually odd - exceptions exist to indicate the failure of the operation, but in C++ destruction always succeeds.
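The exit path of a throwing destructor can be sketched as below. The reduced ExceptionState, the fake_terminate hook (used in place of std::terminate so that the example can be exercised), and the failed parameter are all hypothetical. Copying the local state to the caller's state when no other exception is active is an assumption drawn from the propagation rules of Section 2.2.

```cpp
#include <cassert>

struct ExceptionState { bool active = false; };

bool terminate_requested = false;  // stand-in for calling std::terminate
void fake_terminate() { terminate_requested = true; }

// Lowered exit path of a destructor marked `throws`.
// `outer` is the state parameter; `failed` says whether the body throws.
void throwing_destructor(ExceptionState *outer, bool failed) {
    assert(outer != nullptr);
    ExceptionState local;             // own state, as for a noexcept function
    if (failed) local.active = true;  // a throw inside the destructor body
    if (local.active) {               // the destructor would exit with an exception
        if (outer->active) {          // a second unhandled exception is in flight
            fake_terminate();
            return;
        }
        *outer = local;               // otherwise propagate to the caller
    }
}
```

Calls made inside the destructor test local rather than the caller's state, so an exception already propagating in the caller cannot divert the destructor's control flow.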

2.6 Exception Object Type Identification

One of the disadvantages of current exception implementations is their use of RTTI in matching the types of thrown exception objects to a corresponding catch handler. Current approaches make use of complete RTTI objects in exception binaries, emitting a full std::type_info instance for each type, which includes a unique string identifying it. Either the address of these std::type_info instances or the string identifier is used to compare against the corresponding std::type_info instance for the catch handler. Inheritance adds an additional layer of complexity, as each candidate type must also enumerate and compare its public base classes when matching against the catch block type.

Seeking to reduce binary sizes, we introduce an optimisation that both reduces the space required to represent types and addresses concerns with execution time determinism. This solution is predicated upon the fact that each type requires solely some unique identifier, and not the full std::type_info instance.

For each unique type of exception object thrown, the compiler will automatically emit a char-sized variable with minimum alignment, whose identifier is composed of the name of the type prefixed with __typeid_for_. Pointer types will have an integer prefix corresponding to the number of levels of indirection. Type qualifiers such as const and volatile are ignored.

This "type id" variable will use its address to represent a unique type identifier with which types might be compared for equality. In a throw expression, the type field of the exception state will be assigned the address of the "type id" matching the thrown type.

To handle polymorphic types, for each unique type of exception thrown an additional array of addresses will be emitted, containing the addresses of the "type id" variables for the base types. In similar fashion to the "type id" variables, its identifier is prefixed with __typeid_bases_for_, followed by the name of the type. For those types which do not have base classes, a single shared array with the identifier __typeid_empty_bases will be emitted, containing only a null entry.

Testing whether a given type is a base class of another type is then a matter of getting the pointer to the base class array (which is known at compile-time) and iterating through its fixed-size list of bases. When an exception is thrown, its corresponding base class array will be assigned to the base_types field of the exception state.

This same iteration of base classes applies when throwing and catching pointer types. While the type field will contain a "type id" identifying the pointer, the base_types field will correspond to the base type array of the pointee type.

In both the "type id" and base-array cases, to ensure both emission and uniqueness, these variables will be marked as being weak symbols: an indication given to the linker that only one instance of each variable will be preserved in the final binary.
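As an illustration of the emitted data, the sketch below writes out by hand the "type id" variables and base arrays for a two-class hierarchy. The identifiers drop the leading double underscore, which is reserved for the implementation, and the ThrownType struct and handler_accepts function are hypothetical. Ordinary globals are used here; in a real multi-file program they would need to be weak symbols as described above.

```cpp
#include <cassert>

struct Base {};
struct Derived : Base {};

// One char-sized variable per thrown type; only its address is meaningful.
char typeid_for_Base;
char typeid_for_Derived;

// Null-terminated arrays holding the addresses of the base "type id"s.
const char *const typeid_empty_bases[] = {nullptr};
const char *const typeid_bases_for_Derived[] = {&typeid_for_Base, nullptr};

// The two values a throw expression would store in the exception state.
struct ThrownType {
    const char *type;
    const char *const *base_types;
};

const ThrownType thrown_Base{&typeid_for_Base, typeid_empty_bases};
const ThrownType thrown_Derived{&typeid_for_Derived, typeid_bases_for_Derived};

// A handler accepts the exception if the ids are equal or if the handled
// type appears among the thrown type's bases.
bool handler_accepts(const ThrownType &thrown, const char *handled) {
    assert(thrown.base_types != nullptr);
    if (thrown.type == handled) return true;
    for (const char *const *b = thrown.base_types; *b != nullptr; ++b)
        if (*b == handled) return true;
    return false;
}
```

Each type costs one byte plus one pointer per base class, and matching is a bounded sequence of pointer comparisons rather than string comparisons.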

    bool type_is_base(void *type, const char **bases) noexcept {
        for (; bases[0] != nullptr; bases++)
            if (static_cast<const void *>(bases[0]) == type)
                return true;
        return false;
    }

Figure 8: Example iteration over base entries. The candidate base is passed to the type parameter, and the array of base "type id"s to the bases parameter.

2.7 Exception Object Allocation

When throwing an exception, the exception object must be allocated such that it is constructed in the scope where it was thrown, but is also accessible to the scope where it is caught. It is not enough to simply allocate the object on the stack, despite the fact that the exception will cause the stack to be unwound, as local object destructors will run as the stack is unwound, potentially overwriting the exception object. Allocating the exception object on the heap is a solution, but the allocation may result in system calls which (a) potentially have unbounded execution time, and (b) may fail if memory is exhausted. As our target is to support deterministic exceptions, standard heap allocation is not an option.

We propose a statically sized buffer using a simple constant-time stack allocator for our exception object, with a heap-backed fall-back upon buffer exhaustion for those applications which cannot compute or do not require run-time determinism. The size of this buffer will be given a default value by the ABI in use, but will be optionally application-overridden at link-time via a linker command-line parameter to optimise or prevent any and all heap allocation.

In the worst-case scenario, our solution matches the performance of current exception implementations, as at least that of g++ uses a similar buffer. However, as we free our exception objects as early as possible, we expect usage of this buffer to be less than other implementations, and with our novel ability to customise its size (and in combination with earlier improvements) we uniquely offer deterministic allocation.
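A constant-time stack allocator over a static buffer, with a heap fall-back, might look like the following sketch. The buffer size of 256 bytes, the function names, and the use of malloc for the fall-back are assumptions made for illustration; the paper leaves the default size to the ABI and the override to a linker option. The sketch also assumes objects are freed in reverse order of allocation and that align is a power of two no larger than alignof(std::max_align_t).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t kBufferSize = 256;  // assumed default size

alignas(std::max_align_t) unsigned char exception_buffer[kBufferSize];
std::size_t buffer_top = 0;               // offset of the first free byte

bool in_buffer(const void *p) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto base = reinterpret_cast<std::uintptr_t>(exception_buffer);
    return addr >= base && addr < base + kBufferSize;
}

// Constant-time allocation: round the offset up to `align`, then bump it.
void *allocate_exception(std::size_t size, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    std::size_t start = (buffer_top + align - 1) & ~(align - 1);
    if (start <= kBufferSize && size <= kBufferSize - start) {
        buffer_top = start + size;
        return exception_buffer + start;
    }
    return std::malloc(size);  // fall-back: not deterministic, may fail
}

// Stack discipline: freeing an object releases everything above it.
void free_exception(void *p) {
    if (in_buffer(p))
        buffer_top = static_cast<std::size_t>(
            static_cast<unsigned char *>(p) - exception_buffer);
    else
        std::free(p);
}
```

An application that needs determinism would size the buffer for its largest simultaneous set of exception objects, so that the fall-back branch is never taken.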

3 STANDARDS COMPLIANCE

In this section, we consider the compliance of our exception implementation against the ABI specification and the C++ standards document.

3.1 ABI Compliance

Existing ABI specifications mandate the use of tables and manual stack unwinding. However, since using the existing function return mechanism is a superior method of achieving stack unwinding, our implementation deviates entirely from their design. Therefore, we will not evaluate our solution against any existing ABI specifications. A side effect of this is that our own ABI is significantly easier to implement, as the stack unwinding and exception handling code is generated in a platform-independent way by the compiler.

3.2 C++ Standard Compliance

In general, our implementation conforms to the C++ standard's requirements for exceptions. However, we have observed four clauses where there are deviations.

Clause 18.1.4 describes the interaction between the exception object and std::exception_ptr, which is designed for exception objects allocated on the heap. This is therefore not suited to our stack-based implementation; however, our replacement provides similar functionality.

Clause 18.2.2 requires local temporaries, such as return values, to have their destructors called when unwinding. However, clang's existing exception implementation does not conform to this part of the specification, and as a result our implementation also does not conform. This is a bug in the clang compiler (on which our implementation is based), and if this is repaired, our implementation will also be repaired.

Clauses 18.2.3 and 18.2.4 require the destruction of base class instances and sub-objects already constructed when an exception occurs during a constructor. This feature is not yet implemented, and will be addressed in future work; however, it does not require any conceptual changes as, following from its simplicity and similarity to local object destruction, we foresee no issues that might occur in its implementation.

4 EVALUATION

We have implemented our novel scheme in the clang compiler, and present results showing the performance of our implementation compared to the existing implementation.

4.1 Experimental Set-up

For our experiments, we used two different machines with two different platforms:

• x86-64 Machine: Desktop computer running Ubuntu 18.04, Intel 8700k 3.7-4.7GHz, 32GB 3200MHz DRAM, Samsung 980 Evo NVMe storage.

• Arm Machine: Raspberry Pi 1 Model B running Raspbian Stretch Lite, ARM1176JZF-S ARMv6 700MHz, 512 MB DRAM, 16GB SD storage.

All binaries were built with identical flags, other than those determining the type of exceptions to use. Run-time type information is disabled (-fno-rtti), the symbol table is removed (-s), and -O3 optimisation with link-time optimisation (-flto) is employed. Clang 7.0 was used as the compiler, along with libstdc++ and libgcc on Arm, and libc++ and libunwind on x86.

4.1.1 Benchmark Application. Lacking an obvious existing realistic benchmark with which to test exceptions, we instead developed our own microbenchmark. xmlbench is a simple XML parser targeting ease of use and functionality as a real XML parser. It organically includes a good mixture of functions that can and cannot throw exceptions, and also those which do throw exceptions and those which are "exception neutral", i.e. allowing exception propagation. By supplying different XML files as input to the parser, we can precisely control the rate of exceptions. Errors in input XML syntax and external conditions (such as the input file missing or an out-of-memory condition) can be influenced in a realistic way. This approach to benchmarking is similar to [15], who also use an XML parser as a representative application.


Platform   Implementation   Size
Arm        Standard         59188 bytes
Arm        Deterministic    22068 bytes
x86-64     Standard         105312 bytes
x86-64     Deterministic    18600 bytes

Table 2: Final binary sizes for xmlbench benchmark, with the standard and deterministic exception implementations.

4.2 Results

The following sections detail the results of our observations on various facets of the exception handling infrastructure.

4.2.1 Unwind Library Overhead. Statically-linked binaries participate in whole-program optimisation, and as such have the exception handling and unwind functions removed when using our implementation. This allows for a direct comparison against binaries that use the standard implementation.

Table 2 shows the results of the binary built with both the standard and the deterministic exception implementations. The results for both x86-64 and Arm show substantial decreases in binary size when using our deterministic implementation. This measurement indicates how large the unwinding code is. Arm's unwinding code is smaller, but by removing it we see a decrease in binary size of 36.3 Kilobytes, a 62.7% reduction for our program. x86-64 has a much larger unwind library, with our implementation saving 84.7 Kilobytes, an 82.3% reduction in size for identical functionality.
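The quoted savings follow directly from the sizes in Table 2, taking one Kilobyte as 1024 bytes. The short sketch below reproduces the arithmetic; the Saving struct and saving function are names introduced here for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Saving {
    double kib;      // bytes saved, in units of 1024 bytes
    double percent;  // bytes saved relative to the standard binary
};

Saving saving(long standard_bytes, long deterministic_bytes) {
    assert(standard_bytes > 0);
    long saved = standard_bytes - deterministic_bytes;
    return {saved / 1024.0, 100.0 * saved / standard_bytes};
}
```

For Arm, 59188 - 22068 = 37120 bytes, which is 36.25 KiB and about 62.7% of the standard binary. For x86-64, 105312 - 18600 = 86712 bytes, which is about 84.7 KiB and about 82.3%.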

On both platforms, the code section (.text) grows due to the inclusion of additional instructions for checking the exception state. On x86-64, the removal of the unwind tables compensates for this growth, with a reduction of 0.3%. However, on Arm, missed optimization opportunities by the compiler lead to a net increase of roughly 8%.

4.2.2 Application Benchmark Performance. We generated two sets of XML files, one with syntax errors and one without, and parsed them with xmlbench. The syntax error is contained within the deepest element, resulting in an exception thrown with the stack at its largest number of function calls. For each run, we measured the total run-time of the xmlbench program, including the start-up time.

Figure 9 shows the average execution time in microseconds per XML element parsed by our benchmark, for both the Arm (Figure 9a) and x86-64 (Figure 9b) platforms. On the Arm platform, the results show there is little difference between execution times, except for the standard exception implementation generally performing the worst of the four when faced with an exception. As was expected, the standard implementation performs better when no exception is raised with only 156 elements, as our implementation must check for active exceptions. This difference disappears as more elements are processed.

On the x86-64 platform, our deterministic implementation takes slightly longer per element than the standard implementation for larger numbers of nodes. This is primarily due to the overhead of checking whether an exception has occurred. As is expected, the difference in time for our implementation depending on whether an exception was thrown or not is very small, particularly in comparison to the same for the standard implementation, which at its peak has a 2.3% overhead.

Platform   Elems   Throw   Standard Impl. Time (σ)   Determ. Impl. Time (σ)
Arm        156     No      42.7ms (0.6ms)            41.7ms (0.6ms)
Arm        156     Yes     43.7ms (0.6ms)            43.0ms (0.0ms)
Arm        3906    No      158.7ms (2.1ms)           162.7ms (1.5ms)
Arm        3906    Yes     160.7ms (1.5ms)           162.0ms (2.0ms)
x86-64     3905    No      3.3ms (0.6ms)             3.3ms (0.6ms)
x86-64     3905    Yes     3.3ms (0.6ms)             3.3ms (0.6ms)
x86-64     97655   No      68.0ms (1.0ms)            70.7ms (0.6ms)
x86-64     97655   Yes     69.0ms (1.0ms)            70.3ms (0.6ms)

Table 3: Overall execution time summary for xmlbench on the Arm and x86-64 platforms.

The benchmark clearly shows how throwing a single exception increases execution time for the standard exception implementation, given the lengthy unwind procedure. However, with our implementation, the improvements to exception handling speed are not quite enough to compensate for the overhead in the absence of exceptions. With more than a single exception, and particularly given the results for 488281 elements, where the trend suggests that our implementation scales better than the existing one, it should be enough for our solution to out-perform the current one.

As shown in Table 3, our implementation creates a small additional execution time overhead in comparison to the standard implementation; however, the difference is negligible and within the standard deviation.

4.2.3 Exception Propagation. To test exception propagation, we used a different benchmark, which simply called the same function recursively as many times as indicated by Stack Frames, with a local object having a simple custom destructor within each frame. After Stack Frames calls, an exception is thrown and caught by reference from the first invocation. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 4.
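A benchmark of this shape can be sketched in standard C++ as follows. This is not the authors' benchmark source: it uses ordinary C++ exceptions (the standard implementation), std::clock as the CPU-time source, and hypothetical names throughout. For simplicity the handler sits in the timing function that makes the first call.

```cpp
#include <cassert>
#include <ctime>
#include <stdexcept>

int destructors_run = 0;

struct Local {
    ~Local() { ++destructors_run; }  // simple custom destructor in each frame
};

// Recurse `frames` deep, then throw from the innermost call.
void recurse(int frames) {
    Local local;
    if (frames <= 1) throw std::runtime_error("deepest frame reached");
    recurse(frames - 1);
}

// CPU time in microseconds between the first call and arrival in the handler.
double time_propagation_us(int frames) {
    assert(frames > 0);
    std::clock_t start = std::clock();
    try {
        recurse(frames);
    } catch (const std::exception &) {  // caught by reference
        std::clock_t end = std::clock();
        return 1e6 * static_cast<double>(end - start) / CLOCKS_PER_SEC;
    }
    return -1.0;  // not reached: recurse always throws
}
```

Repeating the call for each frame count and taking the mean and standard deviation of the returned times gives figures comparable in kind to those tabulated, although the resolution of std::clock may be too coarse for the smallest frame counts.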

The results show a significant difference in execution time between the two exception implementations. Our implementation is on average 98× faster at performing exception propagation on x86-64. The factor of improvement increases from 68× to 138× as the number of stack frames increases, indicating that the performance penalty seen in the current implementation increases as the number of frames increases. Our implementation performs best on x86-64 with deep call stacks.

On Arm, the improvement in execution time remains almost constant at 32× to 33× for 100 and 1000 frames, and 35× speedup for 10000 frames. This suggests that our implementation will consistently offer improved performance, independent of the number of frames.


[Figure 9 plots: (a) Arm, XML element counts 156, 781, 3906, 19531; (b) x86-64, XML element counts 3905, 19531, 97655, 488281. Axes: XML Elements (x) against Exec Time per element (µs) (y). Series: Standard No Error, Standard With Error, Deterministic No Error, Deterministic With Error.]

Figure 9: Execution times in µs per XML element for different element counts with our xmlbench benchmark on Arm (a) and x86-64 (b). Lower is better.

Platform   Stack Frames   Implementation   Time        σ
Arm        100            Standard         1351.3µs    74.3µs
Arm        100            Deterministic    41.7µs      0.6µs
Arm        1000           Standard         9.6ms       0.2ms
Arm        1000           Deterministic    0.3ms       0.0ms
Arm        10000          Standard         94.8ms      1.0ms
Arm        10000          Deterministic    2.7ms       0.1ms
x86-64     100            Standard         114.7µs     21.6µs
x86-64     100            Deterministic    1.7µs       0.8µs
x86-64     1000           Standard         2572.0µs    231.2µs
x86-64     1000           Deterministic    29.0µs      0.0µs
x86-64     10000          Standard         8823.8µs    960.0µs
x86-64     10000          Deterministic    64.0µs      6.4µs

Table 4: Execution times for the exception propagation benchmark.

The improvements in execution time are a result of not having to resolve the encoded unwind information and walk the stack twice to locate the catch handler before unwinding.

4.2.4 Re-Throwing. To measure exception re-throwing, we used another benchmark, which called the same function recursively as many times as indicated by Stack Frames, with a local object having a custom destructor within each frame. After Stack Frames calls, an exception is thrown but, unlike before, caught and re-thrown by each function. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 5.

These results show a similar pattern to those for straight propagation. On x86-64, for 10000 stack frames, the current implementation takes almost 3× longer to catch and re-throw than simply propagating, while our implementation is 38× slower than when simply propagating. This factor of slowdown is to be expected given how fast our implementation is when not re-throwing, but also due to the multiple steps required to move the exception object into and then out of each stack frame. This is clearly a good target for optimisation, as such transfer is unnecessary. Despite this, our implementation yielded a 9× speedup over the standard implementation.

Platform   Stack Frames   Implementation   Time        σ
Arm        100            Standard         4.1ms       0.1ms
Arm        100            Deterministic    0.2ms       0.0ms
Arm        1000           Standard         33.5ms      0.4ms
Arm        1000           Deterministic    1.8ms       0.0ms
Arm        10000          Standard         318.4ms     6.3ms
Arm        10000          Deterministic    18.2ms      0.1ms
x86-64     100            Standard         1071.7µs    363.3µs
x86-64     100            Deterministic    22.7µs      13.8µs
x86-64     1000           Standard         3.8ms       0.1ms
x86-64     1000           Deterministic    0.2ms       0.0ms
x86-64     10000          Standard         22.0ms      0.3ms
x86-64     10000          Deterministic    2.4ms       0.1ms

Table 5: Execution times for the exception re-throwing benchmark.

For 1000 stack frames on x86-64 the difference between execu-tion times is larger with our implementation executing 19times fasterWith 100 frames the results have a very high standard deviation butthere is a clear order of magnitude between the two times indicat-ing that the difference in performance remains largely independentof the number of frames

On Arm, the additional time taken to initialise the exception state in our implementation is evident, as it takes roughly five times longer than in the propagation benchmark. The standard implementation also suffers a performance hit, but one which decreases as the number of frames increases, suggesting that a large part of it is constant. This matches the fact that in the standard implementation the exception object is allocated on the heap, creating a large up-front cost, but one which is amortised across many re-throwings.

CC ’19, February 16–17, 2019, Washington, DC, USA. James Renwick, Tom Spink, and Björn Franke

In reality, however, re-throwing at every frame is highly unusual behaviour, making both this amortisation and our worsening performance unlikely to be noticed.

4.3 Timing and Memory Determinism

We base our understanding of the language and algorithmic requirements for determinism on Verber and Colnarič [20]. In general, the requirements for deterministic execution fall into two categories: deterministic memory usage and deterministic execution time.

For memory usage, we can further subdivide the requirement into stack usage and heap usage. When no exception occurs, our stack usage is static. A fixed-size allocation for the exception state occurs per noexcept function invocation, throwing destructor invocation, and try block entry.

When an exception is thrown, the object is either allocated with a fixed size determined by the catch block, or with a variable size that can be bounded by the maximum allocation size of all exception objects thrown. The functions called by our implementation have a well-defined order and number, known at compile-time, and are not recursive. While pointers are used to store the address of the exception object, the addresses they hold are fixed within the range of the exception object buffer or the current stack frame.
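One way to picture the bound on a single exception object's size is as a compile-time maximum over a closed list of thrown types. The sketch below assumes such a list is available; the three exception types, ExceptionBound and emplace are hypothetical and do not come from the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical exception types standing in for "all exception objects thrown".
struct SensorFault   { int sensor_id; };
struct BusTimeout    { int bus; long elapsed_us; };
struct ConfigInvalid { char key[32]; };

// Largest size and alignment over a closed list of types, found at compile time.
template <typename... Ts>
struct ExceptionBound {
    static constexpr std::size_t size  = std::max({sizeof(Ts)...});
    static constexpr std::size_t align = std::max({alignof(Ts)...});
};

using Bound = ExceptionBound<SensorFault, BusTimeout, ConfigInvalid>;

// Storage sized for the worst case of one in-flight object; an implementation
// would need more room if several exceptions can be live at once.
alignas(Bound::align) static unsigned char exception_storage[Bound::size];

// Construct a copy of `value` in the worst-case storage.
template <typename T>
T* emplace(const T& value) {
    static_assert(sizeof(T) <= Bound::size && alignof(T) <= Bound::align,
                  "type exceeds the computed bound");
    return new (exception_storage) T(value);
}
```

Because the bound is a constant expression, a type that exceeds it is rejected at compile time rather than discovered at run time.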

Our implementation does not require heap allocation. However, it is possible for the program to allocate an arbitrary amount of memory if local object destructors throw exceptions. For example, during stack unwinding, destructors may throw an exception, causing their own local objects to be destroyed, which in turn may throw additional exceptions.

However, since this potential execution flow is known at compile-time and is akin to normal function calls made directly via their function definition rather than via function pointer, static analysis tools are able to give a bound for the number and type of exceptions thrown, and thus the maximum size of allocation. Furthermore, due to the nature of stack unwinding, recursion is not possible except where introduced explicitly by the developer’s code, and thus the number of stack frames is known.

Deterministic execution time generally refers to the ability to calculate the worst-case execution time (WCET), as demanded by most real-time systems. For execution time, unlike with existing implementations, it is trivial to calculate the WCET of all of our operations. Aside from the allocation and initialisation of the local exception state and the checking of the active flag (all of which have deterministic timing), we have three operations.

The first allocates the exception object in the exception buffer. In this case, the allocator is a stack allocator, which merely advances a pointer, except where the developer does not specify a correct maximum bound on exception allocation. The second frees the exception object, which again is trivially a pointer decrement. The final function resolves the base class “type id”s when matching polymorphic exception types against catch handlers. Our implementation’s novel approach guarantees a fixed number of iterations through the list of base classes, again known at compile time and derived from the filter of the catch block.

Therefore, should a WCET analysis tool give correct predictions for programs not using exceptions, it would also give such predictions for our implementation, meeting the criteria for determinism.

5 RELATED WORK

Modern exception handling, including the semantics of raising (throwing) and handling (catching) exceptions, exception hierarchies, and control flow requirements, was first introduced in [6].

In [11], inherent overheads, both in terms of execution time and memory usage, in the current C++ implementations of exceptions are identified. A further evaluation of how C++ features impact embedded systems software is presented in [16]. This report singles out exceptions and their prerequisite run-time type information (RTTI) as “the single most expensive feature an embedded design may consider”.

Lang and Stewart [12] describe stack unwinding as it relates to exceptions in real-time systems and show how the worst-case execution time of the stack unwinding is unbounded. This was later confirmed in [9]. For our own definition of determinism we take insight from [20], which details many of the requirements necessary for languages to be susceptible to time analysis.

A new exception handling approach which is better suited to static analysis is presented in [13], but it requires substantial code rewriting. SetJmp/LongJmp and table-based exception handling are discussed in [3]. The results demonstrate large increases in binary size and an execution time overhead of 10-15%. Gylfason and Hjalmtysson [8] develop a variant of table-based exception handling for use within the Itanium port of the Linux kernel.

Most relevant to the work presented in this paper is [19]. It focuses on propagating exceptions down the stack by duplexing the return value of functions, so as to fit the exception object itself within the register or stack memory used for the return value and, importantly, to re-use the existing function return mechanism to perform stack unwinding. This approach hinges on two complex changes to C++’s function calling convention, though. This kind of change would be significant, breaking compatibility with C code and all existing C++ libraries. Instead, our novel exception implementation maintains the existing exception model by finding a new way to store and match exceptions at run-time, with little to no additional performance impact when exceptions are not thrown.

6 SUMMARY & CONCLUSIONS

In this paper we have developed a novel implementation of C++ exceptions which better suits the requirements of embedded systems, while still upholding the C++ core design tenets. Our implementation shows binary size decreases of up to 82.3% over existing implementations, performance improvements of up to 325 when handling exceptions, minimal execution time overhead on x86-64, and none on the Arm platform. We have shown that our implementation achieves memory and execution time determinism, making it significantly better suited to calculating worst-case execution times, a major hurdle in the adoption of exceptions.

Our future work will investigate further code size improvements resulting from code optimizations and shrinking of the exception state object, while reducing execution time by allocating the active flag in the exception state to a register.

Low-Cost Deterministic C++ Exceptions for Embedded Systems. CC ’19, February 16–17, 2019, Washington, DC, USA

REFERENCES

[1] David W. Binkley. 1997. C++ in Safety Critical Systems. Ann. Softw. Eng. 4, 1-4 (Jan. 1997), 223–234. http://dl.acm.org/citation.cfm?id=590565.590588
[2] Beman Dawes, David Abrahams, and R. Rivera. 2014. Boost C++ libraries. http://www.boost.org/doc/libs/1_55_0/libs/libraries.htm
[3] Christophe de Dinechin. 2000. C++ Exception Handling for IA64. In Proceedings of the First Workshop on Industrial Experiences with Systems Software, WIESS 2000, October 22, 2000, San Diego, CA, USA, Dejan S. Milojicic (Ed.). USENIX, 67–76. http://www.usenix.org/events/wiess2000/dinechin.html
[4] Bruce Eckel, Chuck D. Allison, and Chuck Allison. 2003. Thinking in C++, Vol. 2 (2 ed.). Pearson Education.
[5] C++ exception handling 2012. ARM Compiler toolchain Compiler Reference.
[6] John B. Goodenough. 1975. Exception Handling: Issues and a Proposed Notation. Commun. ACM 18, 12 (Dec. 1975), 683–696. https://doi.org/10.1145/361227.361230
[7] Google. 2018. Google C++ Style Guide. https://google.github.io/styleguide/cppguide.html
[8] Halldor Isak Gylfason and Gisli Hjalmtysson. 2004. Exceptional Kernel–Using C++ exceptions in the Linux kernel.
[9] Wolfgang A. Halang and Matjaž Colnarič. 2002. Dealing with exceptions in safety-related embedded systems. IFAC Proceedings Volumes 35, 1 (2002), 477–482.
[10] Pieter Hintjens. 2014. ZeroMQ: The Guide. http://zeromq.org
[11] ISO/IEC JTC1 SC22 WG21. 2006. ISO/IEC TR 18015: Technical Report on C++ Performance. Technical Report ISO/IEC TR 18015:2006. ISO/IEC. https://www.iso.org/standard/43351.html
[12] Jun Lang and David B. Stewart. 1998. A Study of the Applicability of Existing Exception-handling Techniques to Component-based Real-time Software Technology. ACM Trans. Program. Lang. Syst. 20, 2 (March 1998), 274–301. https://doi.org/10.1145/276393.276395
[13] Roy Levin. 1977. Program Structures for Exceptional Condition Handling. Technical Report. Carnegie-Mellon Univ., Pittsburgh, PA, Dept. of Computer Science.
[14] Mark Maimone. 2014. C++ On Mars. In Proceedings of the C++ Conference (CppCon 2014).
[15] Prakash Prabhu, Naoto Maeda, Gogul Balakrishnan, Franjo Ivančić, and Aarti Gupta. 2011. Interprocedural exception analysis for C++. In European Conference on Object-Oriented Programming. Springer, 583–608.
[16] César A. Quiroz. 1998. Using C++ efficiently in embedded applications. In Proceedings of the Embedded Systems Conference.
[17] Richard Smith et al. 2017. Working draft, standard for programming language C++. Technical Report.
[18] Alexander D. Stoyenko. 2012. Constructing predictable real time systems. Vol. 146. Springer Science & Business Media.
[19] Herb Sutter. 2018. P0709: Zero-overhead deterministic exceptions: Throwing values. Technical Report R0. SG14.
[20] Domen Verber and Matjaž Colnarič. 1996. Programming and time analysis of hard real-time applications. IFAC Proceedings Volumes 29, 6 (1996), 79–84.

  • Abstract
  • 1 Introduction
    • 1.1 Motivating Example
    • 1.2 Contributions
  • 2 Deterministic C++ Exceptions
    • 2.1 Throws Exception Specifier
    • 2.2 Throwing Exceptions and Exception State
    • 2.3 Handling Exceptions
    • 2.4 Re-Throwing
    • 2.5 Throwing Destructors
    • 2.6 Exception Object Type Identification
    • 2.7 Exception Object Allocation
  • 3 Standards Compliance
    • 3.1 ABI Compliance
    • 3.2 C++ Standard Compliance
  • 4 Evaluation
    • 4.1 Experimental Set-up
    • 4.2 Results
    • 4.3 Timing and Memory Determinism
  • 5 Related Work
  • 6 Summary & Conclusions
  • References

[Figure 3 diagram: for a call chain A → B → C, with local object foo in B and bar in C, each stack frame holds a return address and callee-saved registers. Standard implementation: driven by the unwind tables, a personality routine runs per frame to (1) destroy bar, (2) update machine state, (3) finish; then (1) destroy foo, (2) update machine state, (3) finish; and finally (1) commit machine state, (2) continue in catch handler. Our implementation: the standard function return sequence (1) destroys bar, (2) restores saved registers, (3) returns to B; then (1) destroys foo, (2) restores saved registers, (3) returns to A; A then (1) checks the active flag in the exception state and (2) continues in the catch handler.]

Figure 3. Stages involved in stack unwinding for the standard exception implementation (top) and our novel scheme (bottom).

(1) Reducing the binary size increase caused by exceptions.
(2) Permitting the bounding of memory usage and execution time when using exceptions, i.e. supporting determinism.
(3) Maintaining or improving the performance of exceptions.

Although we focus on embedded systems, we will ensure that these goals also apply to general-purpose applications.

2 DETERMINISTIC C++ EXCEPTIONS

Stack unwinding forms a core part of the mechanisms underpinning exceptions. It is also responsible for their poor execution times and increased binary sizes, and is part of the reason for exceptions being non-deterministic.

Sutter [19] has only very recently proposed re-using the existing function return mechanism in place of the traditional stack unwinding approaches, requiring no additional data or instructions to be stored and little to no overhead in unwinding the stack. Furthermore, by removing stack unwinding’s runtime reliance on tables encoded in the program binary itself, the issue of time and spatial determinism is solved. As it is possible to determine the worst-case execution times for programs not using exceptions, it follows that exception implementations making use of the same return mechanism must also be deterministic in stack unwinding.

Given these clear advantages, we have based our implementation on this design, with a core difference being the replacement of their use of registers with function parameters, allowing for much easier interoperability with C code, which can simply provide the parameter as necessary.

A limitation with [19] is that they require all exceptions be of the same type, leaving much of the standard exception-handling functionality up to the user. Our novel approach includes a method of throwing and catching exceptions of arbitrary types (as with existing exception handling) without imposing any meaningful execution-time penalties when exceptions do not occur. Relative to existing implementations, we also reduce the size of run-time type information (RTTI) and maintain determinism.

 1 void foo(__exception_t* __exception) throws {
 2   // Exception state __exception assigned to by throw
 3   throw SomeError();
 4 }
 5 int main() {
 6   // State automatically allocated in main
 7   __exception_t __exception_state;
 8   foo(&__exception_state);
 9   // check for exception
10   if (__exception_state.active) goto catch;
11 }

Figure 4. Pseudocode showing injected exception state object (line 7) and parameter (line 1).

Our scheme makes use of a stack-allocated object that records the necessary run-time information for throwing an exception, such as the type and size of the exception object. This state is allocated in a single place and is passed between functions via an implicit function parameter injected into functions which support exceptions. The state is initialised by throw expressions and is re-used to enable re-throwing. Catch statements use the state in order to determine whether they can handle the exception. After a call to a function which may throw exceptions, a run-time check is inserted to test whether the state contains an active exception.


1 auto safe_divide(float dividend, float divisor) throws {
2   if (divisor == 0)
3     throw std::invalid_argument("Divide by zero");
4   else
5     return dividend / divisor;
6 }

Figure 5. Example function marked with our proposed throws exception specifier (line 1), allowing exception propagation.

Figure 4 gives an example of the variables the compiler will automatically inject during code generation. Line 1 shows the normally hidden, implicit exception state parameter __exception. Line 3 assigns values corresponding to the SomeError type to the exception state parameter. Line 7 shows the automatically allocated __exception_state variable, emitted by all noexcept functions. Line 8 shows the exception state object being automatically passed to the throws function foo. Line 10 shows the check inserted after each call to a throwing function to test for an active exception.
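The same flow can be imitated in standard C++ by writing the injected pieces out by hand. In the sketch below, MiniState, the explicit pointer parameter and the integer error code are hypothetical simplifications; in the proposed scheme the compiler would generate the equivalent code from throw and catch.

```cpp
#include <cassert>

// Hypothetical, heavily reduced stand-in for the injected exception state.
struct MiniState {
    bool active = false;   // set at the throw site, tested by the caller
    int  code   = 0;       // stand-in for the type and buffer fields
};

// Models `void foo() throws`: the state pointer is the normally hidden parameter.
void foo(MiniState* exception, bool fail) {
    assert(exception != nullptr);
    if (fail) {
        exception->code   = 42;    // "throw SomeError()" populates the state
        exception->active = true;
        return;                    // and leaves through the ordinary return path
    }
}

// Models a noexcept caller such as main: it owns the state and checks it
// after every call to a function that may throw.
int call_foo(bool fail) noexcept {
    MiniState state;               // allocated in the caller's own frame
    foo(&state, fail);
    if (state.active) {            // the inserted run-time check
        state.active = false;      // handler entered: no exception in flight
        return state.code;         // stand-in for the body of the catch block
    }
    return 0;                      // normal path
}
```

The only cost on the non-throwing path in this sketch is the single test of the active flag after the call.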

The design of this implementation performs almost all of the exception handling directly within the functions being executed, allowing the optimiser to work most effectively, and by implementing exceptions on top of normal execution flow, code used in returning from functions is re-used when unwinding the stack following exceptions. This helps to reduce code size and unpredictability, and overcomes the largest roadblock in achieving deterministic execution time. This approach combines the exception-handling mechanism of C++ with the inter-mixed error-checking and non-error-handling code integral to the design of [9].

2.1 Throws Exception Specifier

A fundamental issue is to decide which functions should take the implicit exception parameter that our design requires (i.e. which functions could throw exceptions). The natural choice would be to apply it to all functions with C++ linkage, except for those marked noexcept, in keeping with the current exception implementation. However, the issue with this approach would be that C++ and C functions would then have different calling conventions. Whilst C++ functions can, in general, be called without special handling from C, this is only due to the coincidence that the two calling conventions are identical.

With our proposed scheme, the decision to have C++ functions support exceptions by default is impractical, as C++ has certain language holes when it comes to specifying linkage. Specifically, neither classes, structs, nor their members, such as function pointers, can have a specified linkage. Therefore, we propose to have functions be declared noexcept by default. This change in default exception specification preserves compatibility with C, at the cost of a language change in C++.

Like noexcept, we require that functions that might throw encode this into their signature (similar to [19]), as this impacts on how they are called. Thus, to indicate that functions can throw exceptions, we introduce an exception specifier (that is part of the function prototype), as shown in Figure 5.

Name        Description
type        The “type id” variable address identifying the exception object’s type
base_types  Pointer to an array of “type id”s identifying the exception object’s base class types
ptr         Whether the exception object is a pointer to a non-pointer type
size        The size in bytes of the exception object
align       The alignment in bytes of the exception object
ctor        The address of the object’s move or copy constructor
dtor        The address of the object’s destructor
buffer      The address of the exception object
active      Whether the exception is currently active

Table 1. Exception State Fields
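Read as a data structure, Table 1 might correspond to something like the following. The field names follow the table, but the C++ types, the thunk signatures and the describe helper are assumptions made for illustration and are not taken from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical layout: field names follow Table 1, the C++ types are assumptions.
struct ExceptionState {
    const void*        type;         // address of the "type id" variable
    const char* const* base_types;   // array of base-class "type id" addresses
    bool               ptr;          // object is a pointer to a non-pointer type
    std::size_t        size;         // size of the exception object in bytes
    std::size_t        align;        // alignment of the exception object in bytes
    void (*ctor)(void* dst, void* src);  // move (or copy) constructor
    void (*dtor)(void* obj);             // destructor
    void*              buffer;       // address of the exception object
    bool               active;       // whether an exception is in flight
};

// Type-erased thunks whose addresses can fill the ctor and dtor fields.
template <typename T>
void move_construct(void* dst, void* src) {
    assert(dst != nullptr && src != nullptr);
    new (dst) T(std::move(*static_cast<T*>(src)));
}

template <typename T>
void destroy(void* obj) {
    static_cast<T*>(obj)->~T();
}

// Fill in the fields a throw site knows at compile time for a non-pointer
// type T. `type` and `base_types` are left to the caller in this sketch.
template <typename T>
void describe(ExceptionState& s) noexcept {
    s.ptr   = false;
    s.size  = sizeof(T);
    s.align = alignof(T);
    s.ctor  = &move_construct<T>;
    s.dtor  = &destroy<T>;
}
```

Storing the constructor and destructor as plain function addresses is what lets a catch site move and destroy an object whose static type it does not otherwise know.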

2.2 Throwing Exceptions and Exception State

The exception state object exists to hold run-time information on the exception currently in progress. Every function marked noexcept, including the program main function and functions used as thread entry-points, allocates and initialises its own exception state object, effectively making noexcept functions boundaries beyond which exceptions cannot propagate.

The address of this object is passed to any of the called functions that are marked throws, allowing the exception to propagate down the call chain as far as the first noexcept function where, if not handled, the program is terminated.

Following a throw expression, the fields in the exception state object are populated at the throw-site with values known at compile-time, corresponding to the exception object given in the expression. The active flag is set to true, the exception object is moved into a buffer, and its address is assigned to the buffer field of the exception state.

If the throw expression is directly within a try statement, the code jumps to the first catch block for handling. Otherwise, provided the current function is marked as throws, the exception is propagated: execution is transferred to the function epilogue as if returning normally, but the return value for the function, if any, is not initialised.

During the return sequence, local objects have their destructors executed as if a standard function return had occurred, efficiently unwinding the stack. The exception being marked active is not made visible to destructors, as destructors are necessarily noexcept in this context (“If a destructor called during stack unwinding exits with an exception, std::terminate is called (15.5.1)” [17]). In this way, the exception state is untouched until entering the initial catch block. If the current function is not marked as throws, propagation is replaced by an immediate call to std::terminate to terminate the program.

After each function call to a function marked as throws, a check is automatically inserted by the compiler to test whether the active flag has been set, and thus whether the exception has been propagated. This is the largest overhead as a result of our changes, requiring a test for each function called regardless of whether an exception was thrown or not. There is perhaps a good case to be made for its necessity, however. By indicating that a function can throw via throws, the developer is encoding into the function signature the fact that an error may occur during its execution. Thus, regardless of performance requirements, some test must be made for that error in a correct program.

2.3 Handling Exceptions

Once an exception has been thrown, and if it does not result in a call to std::terminate, control is transferred to the first available catch block.

Before each catch block executes, it first compares, for value equality, the “type id” of the exception type it handles and the type marked in the type field of the exception state. If they are equal, or if the type the catch block handles is a base type of the type in the type field, then the catch block is entered. Otherwise, execution jumps to the next catch block and repeats this process until the catch block is a catch-all or all blocks have been exhausted. If no blocks remain, the exception continues propagation as described previously.

Once a catch block is executed, the active flag of the exception state is cleared, memory is allocated on the stack for the exception object, and the move or copy constructor is invoked to transfer ownership of the exception object into the current block.

If an exception was to occur during the move or copy operation, std::terminate is called to satisfy the C++ specification, which states: “If the exception handling mechanism, after completing evaluation of the expression to be thrown but before the exception is caught, calls a function that exits via an exception, std::terminate is called (15.5.1).”

Catch filters can take two forms: (1) by value, where the catch variable represents an object allocated within the catch block, or (2) by reference, where the catch variable is a reference to the exception object allocated elsewhere.

In the latter case, an additional unnamed variable is introduced in which to store the exception object, and its allocation and initialisation is performed as described above, using the equivalent non-reference type. The catch variable reference is then bound to this unnamed variable. In this way, even when the exception object is caught by reference, it is still owned by the catch block.

Following a successful move/copy of the exception object, the destructor is called on the original exception object instance within the buffer. Naturally, as destructors are necessarily noexcept, were the destructor to exit with an exception, again std::terminate would be called.

With the catch variable successfully initialised, the catch block then commences execution. Once complete, execution moves to a point past the final catch block in the current scope, and the catch variable is destroyed as usual.
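The catch sequence described in this section (match on type, clear the flag, transfer ownership, destroy the original) can be sketched as below. State, TypeId, g_buffer, throw_value and try_catch are hypothetical names; base-class matching is omitted, and a move assignment stands in for move-constructing the catch variable.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Hypothetical reduced state: just enough fields to show the catch sequence.
struct State {
    const void* type   = nullptr;   // "type id" of the in-flight exception
    void*       buffer = nullptr;   // where the exception object currently lives
    bool        active = false;
};

// One char-sized object per type; its address serves as the identifier.
template <typename T> struct TypeId { static const char id; };
template <typename T> const char TypeId<T>::id = 0;

alignas(std::max_align_t) static unsigned char g_buffer[256];  // assumed exception buffer

// Throw site: move the object into the buffer and mark the state active.
template <typename T>
void throw_value(State& s, T value) {
    static_assert(sizeof(T) <= sizeof(g_buffer), "object must fit the buffer");
    s.buffer = new (g_buffer) T(std::move(value));
    s.type   = &TypeId<T>::id;
    s.active = true;
}

// Catch block for exact type T. On a match: clear the flag, move the object
// into handler-owned storage, then destroy the instance left in the buffer.
template <typename T>
bool try_catch(State& s, T& caught) {
    if (!s.active || s.type != &TypeId<T>::id) {
        return false;                 // fall through to the next handler
    }
    assert(s.buffer != nullptr);
    s.active = false;
    T* original = static_cast<T*>(s.buffer);
    caught = std::move(*original);    // stands in for constructing the catch variable
    original->~T();
    s.buffer = nullptr;
    return true;
}
```

A handler chain is then a sequence of try_catch calls for different types, tried in order until one returns true.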

2.4 Re-Throwing

One of the issues with maintaining and passing references to a single instance of exception state via the exception state parameter is that more than one exception may be active at a time and, crucially, the initial exception may need to be re-thrown following the intermediary exception.

try { throw 1; }
catch (int i) {
  try { throw 2; }
  catch (...) { }
  throw; }

Figure 6. Re-throwing following a nested exception. The first exception’s state must be persisted, and not overwritten by the second exception, to allow the re-throw.

Figure 6 gives an example. The first exception is thrown on line 1, caught on line 2, then a second exception is thrown on line 3 and caught by the catch-all on line 4. The first exception is then re-thrown on line 5, which requires the first exception’s state to have persisted across the second, inner throw/catch.

To allow for this, a local copy of the exception state is allocated when entering a try block, and if an exception occurs within the try block, the exception modifies the local copy. If the same exception is not handled by the try block’s corresponding catch block or blocks, the local state is copied back to either the outer state or to the function’s exception state, as passed in the parameter, for exception propagation.

If the inner exception is handled correctly, only its local exception state is modified; thus the outer exception state is maintained and can be used, for example, to successfully re-throw the outer exception. As we only initialise the active field of the local copy, its cost is a simple stack allocation plus zero-initialisation of a boolean flag.
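Figure 6 can be walked through with every exception state made explicit. This is a hand-written model under simplifying assumptions: State, throw_int and nested are hypothetical names, and an int payload stands in for the exception object.

```cpp
#include <cassert>

// Hypothetical reduced state: an int payload stands in for the exception object.
struct State {
    bool active = false;
    int  value  = 0;
};

void throw_int(State& s, int v) { s.value = v; s.active = true; }

// Mirrors Figure 6. `fn_state` plays the role of the function's exception
// state parameter, which receives whatever finally propagates.
void nested(State& fn_state) {
    State outer;                       // local state for the outer try block
    throw_int(outer, 1);               // try { throw 1; }
    if (outer.active) {                // catch (int i)
        outer.active = false;          // handler entered; state kept for a re-throw
        State inner;                   // local state for the inner try block
        throw_int(inner, 2);           // try { throw 2; }
        if (inner.active) {            // catch (...)
            inner.active = false;      // inner exception handled: only `inner` changed
        }
        assert(outer.value == 1);      // the first exception's state has persisted
        outer.active = true;           // throw;  (re-throw the first exception)
    }
    if (outer.active) {                // not handled here: copy back for propagation
        fn_state = outer;
    }
}
```

After nested returns, the caller's state is active and carries the value 1, showing that the inner throw of 2 never touched the outer state.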

C++ already has a mechanism for obtaining and re-throwing exceptions by making and passing around an explicit copy of (or reference to) the exception state, via the use of std::current_exception(), which, when called, returns an instance of std::exception_ptr. To provide similar functionality in our implementation, we propose an equivalent mechanism, namely std::get_exception_obj. This function returns a reference to a std::exception_obj instance stored on the stack.
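For comparison, the existing standard mechanism looks as follows in use. Only std::current_exception, std::exception_ptr and std::rethrow_exception are standard here; g_saved, store_current and capture_and_rethrow are hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical global slot in which a captured exception is kept alive.
static std::exception_ptr g_saved;

// Capture the exception currently being handled; call from inside a handler.
void store_current() noexcept { g_saved = std::current_exception(); }

// Throw, stash the exception, leave the handler, then re-throw it later
// from the stored pointer and report its message.
std::string capture_and_rethrow() {
    try {
        throw std::invalid_argument("Divide by zero");
    } catch (...) {
        store_current();                    // exception outlives this handler via g_saved
    }
    assert(g_saved != nullptr);             // rethrow_exception requires a non-null pointer
    try {
        std::rethrow_exception(g_saved);    // re-throw outside the original handler
    } catch (const std::invalid_argument& e) {
        return e.what();
    }
    return "";
}
```

In typical implementations the object behind std::exception_ptr is reference-counted and heap-allocated, which is the kind of hidden allocation the proposed std::exception_obj is meant to make explicit.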

What makes std::exception_obj novel is that its type is templated such that developers can specify a custom allocator, giving them control over where the exception object is to be allocated when being transferred to a new instance. However, we restrict the use of std::get_exception_obj such that it can only be called from directly within a catch block. This makes sense from a value-oriented perspective: functions can only inspect and access their own variables, and not their callers’, unless shared explicitly via parameters. Since in our implementation the exception object is directly owned by the catch block, functions called from within that catch block would not normally have access to the object.

Re-throwing in our implementation is achieved by calling the std::rethrow function and passing a std::exception_obj instance. This is intuitive, arguably more so than the bare throw expression, and crucially makes clear the cost of re-throwing, as std::exception_obj instances must be explicitly moved around.

Figure 7 shows a complete example. On lines 1 and 2, a function is declared taking a std::exception_obj instance using the default allocator. This object is move-constructed from the corresponding std::exception_obj instance representing the exception object within the catch block on line 12, allocating the exception object as instructed. The object is then moved and stored in a global variable result.exception on line 5, which is then re-thrown on line 15.

 1 void store_exception(
 2     std::exception_obj<std::allocator<const char>> obj)
 3 throws {
 4   // Move the exception object into a global
 5   result.exception = std::move(obj);
 6 }
 7 // ---------------------------------
 8 try {
 9 }
10 catch (...) {
11   // Move the exception object out of the catch block
12   store_exception(std::move(std::get_exception_obj()));
13 }
14 // The exception can be re-thrown from a global
15 std::rethrow(std::move(result.exception));

Figure 7. Example of re-throwing the exception object using our proposed std::rethrow function and std::exception_obj type.

To enable re-throwing, the std::exception_obj instance also contains the exception state object, which is already populated with the details of the exception. Thus, all that is required to re-throw that exception is to copy those fields to the current exception state, restore the exception object to the buffer, set the active flag, and then attempt to propagate the exception as if throwing normally.

2.5 Throwing Destructors

Although generally advised against by the C++ specification, destructors can throw, and can be explicitly marked as throws. Were such destructors to be called during stack unwinding, they would incorrectly share the same exception state as the uncaught exception. In particular, the active flag would be set, leading to incorrect execution flow when function calls from within the destructor performed their exception check.

To avoid this, we modify the compiler such that when throwing destructors are entered, they allocate and initialise their own local exception state, as if they were a noexcept function. This local state will be used within the destructor. If at any point an exception is unhandled within the destructor, such that it would exit with that exception, it first checks the exception state parameter to see if it is marked as active. If so, two unhandled exceptions are in progress, and thus the program must be terminated with std::terminate, per the C++ standard (15.5.1, as quoted above).
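The decision a throwing destructor makes on exit can be modelled as a small function. This is a hypothetical model: State, Outcome and throwing_destructor are invented names, and the Terminate result marks the point where a real lowering would call std::terminate.

```cpp
#include <cassert>

// Hypothetical reduced exception state.
struct State {
    bool active = false;
    int  value  = 0;
};

enum class Outcome { Completed, Propagated, Terminate };

// Models the lowered body of a destructor marked `throws`.
// `incoming` is the state parameter shared with any unwinding in progress.
Outcome throwing_destructor(State& incoming, bool body_fails) {
    State local;                         // fresh state, as for a noexcept function
    if (body_fails) {                    // some operation inside the destructor throws
        local.value  = 7;
        local.active = true;
    }
    if (!local.active) {
        return Outcome::Completed;       // nothing escaped: incoming state untouched
    }
    if (incoming.active) {               // a second unhandled exception is in progress
        return Outcome::Terminate;       // a real lowering would call std::terminate()
    }
    incoming = local;                    // safe to propagate the destructor's exception
    return Outcome::Propagated;
}
```

Because the destructor body only ever sees its local state, calls made inside it perform their exception checks against that state rather than against the exception already unwinding.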

As throwing destructors may be called when an exception is not active, as part of normal logic, these additional measures constitute overhead in the case where exceptions are not thrown. However, throwing destructors are rare and are conceptually odd: exceptions exist to indicate the failure of the operation, but in C++ destruction always succeeds.

2.6 Exception Object Type Identification

One of the disadvantages of current exception implementations is their use of RTTI in matching the types of thrown exception objects to a corresponding catch handler. Current approaches make use of complete RTTI objects in exception binaries, emitting a full std::type_info instance for each type, which includes a unique string identifying it. Either the address of these std::type_info instances or the string identifier is used to compare against the corresponding std::type_info instance for the catch handler. Inheritance adds an additional layer of complexity, as each candidate type must also enumerate and compare its public base classes when matching against the catch block type.

Seeking to reduce binary sizes, we introduce an optimisation that both reduces the space required to represent types and addresses concerns with execution time determinism. This solution is predicated upon the fact that each type requires solely some unique identifier, and not the full std::type_info instance.

For each unique type of exception object thrown, the compiler will automatically emit a char-sized variable with minimum alignment, whose identifier is composed of the name of the type prefixed with __typeid_for_. Pointer types will have an integer prefix corresponding to the number of levels of indirection. Type qualifiers such as const and volatile are ignored.

This “type id” variable will use its address to represent a unique type identifier with which types might be compared for equality. In a throw expression, the type field of the exception state will be assigned the address of the “type id” matching the thrown type.

To handle polymorphic types, for each unique type of exception thrown, an additional array of addresses will be emitted, containing the addresses of the “type id” variables for the base types. In similar fashion to the “type id” variables, its identifier is prefixed with __typeid_bases_for_, followed by the name of the type. For those types which do not have base classes, a single shared array with the identifier __typeid_empty_bases will be emitted, containing only a null entry.

Testing whether a given type is a base class of another type is then a matter of getting the pointer to the base class array (which is known at compile-time) and iterating through its fixed-size list of bases. When an exception is thrown, its corresponding base class array will be assigned to the base_types field of the exception state.

This same iteration of base classes applies when throwing and catching pointer types. While the type field will contain a “type id” identifying the pointer, the base_types field will correspond to the base type array of the pointee type.

In both the “type id” and base-array cases, to ensure both emission and uniqueness, these variables will be marked as being weak symbols, an indication given to the linker that only one instance of each variable will be preserved in the final binary.

bool type_is_base(void* type, const char** bases) noexcept
{
    // The array of base "type id"s is terminated by a null entry.
    for (; bases[0] != nullptr; bases++)
    {
        if (static_cast<const void*>(bases[0]) == type)
            return true;
    }
    return false;
}

Figure 8: Example iteration over base entries: the candidate base is passed to the type parameter, and the array of base “type id”s to the bases parameter.

2.7 Exception Object Allocation

When throwing an exception, the exception object must be allocated such that it is constructed in the scope where it was thrown, but is also accessible to the scope where it is caught. It is not enough to simply allocate the object on the stack, despite the fact that the exception will cause the stack to be unwound, as local object destructors will run as the stack is unwound, potentially overwriting the exception object. Allocating the exception object on the heap is a solution, but the allocation may result in system calls which (a) potentially have unbounded execution time, and (b) may fail if memory is exhausted. As our target is to support deterministic exceptions, standard heap allocation is not an option.

We propose a statically sized buffer, using a simple constant-time stack allocator, for our exception object, with a heap-backed fall-back upon buffer exhaustion for those applications that cannot compute the required size or do not require run-time determinism. The size of this buffer will be given a default value by the ABI in use, but can optionally be overridden by the application at link-time via a linker command-line parameter, either to optimise memory use or to prevent any and all heap allocation.
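A constant-time stack allocator with a heap fall-back might look roughly as follows. This is a sketch rather than the allocator used in the implementation: the names `exception_alloc`, `exception_free` and `in_buffer`, the 256-byte size, and the strict last-in-first-out freeing order are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical buffer size; the text leaves the default to the ABI.
constexpr std::size_t kBufferSize = 256;
alignas(std::max_align_t) static unsigned char g_buffer[kBufferSize];
static std::size_t g_top = 0;  // offset of the first free byte

// True when p points into the static exception buffer.
bool in_buffer(const void* p) noexcept {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto base = reinterpret_cast<std::uintptr_t>(g_buffer);
    return addr >= base && addr < base + kBufferSize;
}

// Bump allocation: round the offset up to `align` (assumed to be a power of
// two no larger than alignof(std::max_align_t)), then advance it.
// Only when the buffer is exhausted does the heap fall-back run.
void* exception_alloc(std::size_t size, std::size_t align) noexcept {
    std::size_t start = (g_top + align - 1) & ~(align - 1);
    if (start <= kBufferSize && size <= kBufferSize - start) {
        g_top = start + size;
        return g_buffer + start;
    }
    return std::malloc(size);  // non-deterministic path
}

// Frees the most recently allocated object (LIFO order is assumed).
void exception_free(void* p) noexcept {
    if (in_buffer(p)) {
        g_top = static_cast<std::size_t>(static_cast<unsigned char*>(p) - g_buffer);
    } else {
        std::free(p);
    }
}
```

Sizing the buffer to cover the largest set of simultaneously live exception objects keeps every allocation on the pointer-bump path, so the `std::malloc` branch is never taken.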

In the worst-case scenario, our solution matches the performance of current exception implementations, as at least that of g++ uses a similar buffer. However, as we free our exception objects as early as possible, we expect usage of this buffer to be less than other implementations, and with our novel ability to customise its size (and in combination with earlier improvements) we uniquely offer deterministic allocation.

3 STANDARDS COMPLIANCE

In this section we consider the compliance of our exception implementation against the ABI specification and the C++ standards document.

3.1 ABI Compliance

Existing ABI specifications mandate the use of tables and manual stack unwinding. However, since using the existing function return mechanism is a superior method of achieving stack unwinding, our implementation deviates entirely from their design. Therefore, we will not evaluate our solution against any existing ABI specifications. A side effect of this is that our own ABI is significantly easier to implement, as the stack unwinding and exception handling code is generated in a platform-independent way by the compiler.

3.2 C++ Standard Compliance

In general, our implementation conforms to the C++ standard’s requirements for exceptions. However, we have observed four clauses where there are deviations.

Clause 18.1.4 describes the interaction between the exception object and std::exception_ptr, which is designed for exception objects allocated on the heap. This is therefore not suited to our stack-based implementation; however, our replacement provides similar functionality.

Clause 18.2.2 requires local temporaries, such as return values, to have their destructors called when unwinding. However, clang’s existing exception implementation does not conform to this part of the specification, and as a result our implementation also does not conform. This is a bug in the clang compiler (on which our implementation is based), and if this is repaired, our implementation will also be repaired.

Clauses 18.2.3 and 18.2.4 require the destruction of base class instances and sub-objects already constructed when an exception occurs during a constructor. This feature is not yet implemented and will be addressed in future work. It does not require any conceptual changes: given its simplicity and similarity to local object destruction, we foresee no issues in its implementation.

4 EVALUATION

We have implemented our novel scheme in the clang compiler, and present results showing the performance of our implementation compared to the existing implementation.

4.1 Experimental Set-up

For our experiments we used two different machines with two different platforms:

• x86-64 Machine: Desktop computer running Ubuntu 18.04, Intel 8700K 3.7-4.7 GHz, 32 GB 3200 MHz DRAM, Samsung 980 Evo NVMe storage.

• Arm Machine: Raspberry Pi 1 Model B running Raspbian Stretch Lite, ARM1176JZFS ARMv6 700 MHz, 512 MB DRAM, 16 GB SD storage.

All binaries were built with identical flags other than those determining the type of exceptions to use. Run-time type information is disabled (-fno-rtti), the symbol table is removed (-s), and -O3 optimisation with link-time optimisation (-flto) is employed. Clang-7.0 was used as the compiler, along with libstdc++ and libgcc on Arm, and libc++ and libunwind on x86.

4.1.1 Benchmark Application. Lacking an obvious existing realistic benchmark with which to test exceptions, we instead developed our own microbenchmark. xmlbench is a simple XML parser, targeting ease of use and functionality as a real XML parser. It organically includes a good mixture of functions that can and cannot throw exceptions, and also those which do throw exceptions and those which are “exception neutral”, i.e. allowing exception propagation. By supplying different XML files as input to the parser, we can precisely control the rate of exceptions. Errors in input XML syntax and external conditions (such as the input file missing, or an out-of-memory condition) can be influenced in a realistic way. This approach to benchmarking is similar to [15], who also use an XML parser as a representative application.


Platform   Implementation   Size
Arm        Standard         59,188 bytes
Arm        Deterministic    22,068 bytes
x86-64     Standard         105,312 bytes
x86-64     Deterministic    18,600 bytes

Table 2: Final binary sizes for the xmlbench benchmark with the standard and deterministic exception implementations.

4.2 Results

The following sections detail the results of our observations on various facets of the exception handling infrastructure.

4.2.1 Unwind Library Overhead. Statically-linked binaries participate in whole-program optimisation, and as such have the exception handling and unwind functions removed when using our implementation. This allows for a direct comparison against binaries that use the standard implementation.

Table 2 shows the results of the binary built with both the standard and the deterministic exception implementations. The results for both x86-64 and Arm show substantial decreases in binary size when using our deterministic implementation. This measurement indicates how large the unwinding code is. Arm’s unwinding code is smaller, but by removing it we see a decrease in binary size of 36.3 Kilobytes, a 62.7% reduction for our program. x86-64 has a much larger unwind library, with our implementation saving 84.7 Kilobytes, an 82.3% reduction in size for identical functionality.
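The quoted reductions can be recomputed from the byte counts in Table 2. The helper below is a small illustrative sketch; `size_reduction` and `Reduction` are names introduced here, and the use of 1024-byte kilobytes is an assumption that happens to reproduce the figures in the text.

```cpp
// Result of comparing two binary sizes.
struct Reduction {
    double kib_saved;  // bytes saved, expressed in units of 1024 bytes
    double percent;    // bytes saved, relative to the standard binary
};

// Computes the saving of the deterministic binary over the standard one.
inline Reduction size_reduction(long standard_bytes, long deterministic_bytes) {
    long saved = standard_bytes - deterministic_bytes;
    Reduction r;
    r.kib_saved = saved / 1024.0;
    r.percent = 100.0 * saved / standard_bytes;
    return r;
}

// Arm:    size_reduction(59188, 22068)  gives about 36.3 KiB and 62.7%
// x86-64: size_reduction(105312, 18600) gives about 84.7 KiB and 82.3%
```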

On both platforms, the code section (.text) grows due to the inclusion of additional instructions for checking the exception state. On x86-64, the removal of the unwind tables compensates for this growth, with a reduction of 0.3%. However, on Arm, missed optimization opportunities by the compiler lead to a net increase of roughly 8%.

4.2.2 Application Benchmark Performance. We generated two sets of XML files, one with syntax errors and one without, and parsed them with xmlbench. The syntax error is contained within the deepest element, resulting in an exception thrown with the stack at its largest number of function calls. For each run, we measured the total run-time of the xmlbench program, including the start-up time.

Figure 9 shows the average execution time in microseconds per XML element parsed by our benchmark for both the Arm (Figure 9a) and x86-64 (Figure 9b) platforms. On the Arm platform, the results show there is little difference between execution times, except for the standard exception implementation generally performing the worst of the four when faced with an exception. As was expected, the standard implementation performs better when no exception is raised with only 156 elements, as our implementation must check for active exceptions. This difference disappears as more elements are processed.

On the x86-64 platform, our deterministic implementation takes slightly longer per element than the standard implementation for larger numbers of nodes. This is primarily due to the overhead of checking whether an exception has occurred. As is expected, the difference in time for our implementation depending on whether an exception was thrown or not is very small, particularly in comparison to the same for the standard implementation, which at its peak has a 2.3% overhead.

Elems    Throw   Time (σ), Standard Impl.   Time (σ), Determ. Impl.
Arm
156              42.7 ms (0.6 ms)           41.7 ms (0.6 ms)
156              43.7 ms (0.6 ms)           43.0 ms (0.0 ms)
3906             158.7 ms (2.1 ms)          162.7 ms (1.5 ms)
3906             160.7 ms (1.5 ms)          162.0 ms (2.0 ms)
x86-64
3905             3.3 ms (0.6 ms)            3.3 ms (0.6 ms)
3905             3.3 ms (0.6 ms)            3.3 ms (0.6 ms)
97655            68.0 ms (1.0 ms)           70.7 ms (0.6 ms)
97655            69.0 ms (1.0 ms)           70.3 ms (0.6 ms)

Table 3: Overall execution time summary for xmlbench on the Arm and x86-64 platforms.

The benchmark clearly shows how throwing a single exception increases execution time for the standard exception implementation, given the lengthy unwind procedure. However, with our implementation, the improvements to exception handling speed are not quite enough to compensate for the overhead in the absence of exceptions. With more than a single exception, and particularly given the results for 488281 elements, where the trend suggests that our implementation scales better than the existing one, it should be enough for our solution to out-perform the current one.

As shown in Table 3, our implementation creates a small additional execution time overhead in comparison to the standard implementation; however, the difference is negligible and within the standard deviation.

4.2.3 Exception Propagation. To test exception propagation, we used a different benchmark, which simply called the same function recursively as many times as indicated by Stack Frames, with a local object having a simple custom destructor within each frame. After Stack Frames calls, an exception is thrown and caught by reference from the first invocation. The benchmark was compiled with the same flags as xmlbench.
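A benchmark of this shape can be sketched with standard C++ exceptions as follows. The code is an illustration, not the benchmark used for the measurements: the names `Tracker`, `recurse` and `time_propagation` are hypothetical, and wall-clock time from `std::chrono::steady_clock` is used here in place of CPU time.

```cpp
#include <chrono>
#include <stdexcept>

// Local object with a simple custom destructor, one per stack frame.
struct Tracker {
    static long destroyed;
    ~Tracker() { ++destroyed; }
};
long Tracker::destroyed = 0;

// Recurses until `frames` calls are on the stack, then throws.
void recurse(int frames) {
    Tracker t;
    if (frames <= 1) throw std::runtime_error("bottom of stack");
    recurse(frames - 1);
}

// Measures the time between the first call and arrival in the catch handler.
std::chrono::nanoseconds time_propagation(int frames) {
    auto start = std::chrono::steady_clock::now();
    try {
        recurse(frames);
    } catch (const std::exception&) {  // caught by reference
        return std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start);
    }
    return std::chrono::nanoseconds(-1);  // not reached: recurse always throws
}
```

Calling `time_propagation(100)` unwinds 100 frames and runs 100 `Tracker` destructors before the handler is entered.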

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 4.

The results show a significant difference in execution time between the two exception implementations. Our implementation is on average 98× faster at performing exception propagation on x86-64. The factor of improvement increases from 68× to 138× as the number of stack frames increases, indicating that the performance penalty seen in the current implementation increases as the number of frames increases. Our implementation performs best on x86-64 with deep call stacks.

On Arm, the improvement in execution time remains almost constant at 32× to 33× for 100 and 1000 frames, and a 35× speedup for 10000 frames. This suggests that our implementation will consistently offer improved performance, independent of the number of frames.


[Figure 9 plots not reproduced. Panel (a) Arm: execution time per element (µs, axis 0 to 300) against XML elements (156, 781, 3906, 19531). Panel (b) x86-64: execution time per element (µs, axis 0.66 to 0.86) against XML elements (3905, 19531, 97655, 488281). Series in both panels: Standard No Error, Standard With Error, Deterministic No Error, Deterministic With Error.]

Figure 9: Execution times in µs per XML element for different element counts with our xmlbench benchmark on Arm (a) and x86-64 (b). Lower is better.

Stack Frames   Implementation   Time         σ
Arm
100            Standard         1351.3 µs    74.3 µs
100            Deterministic    41.7 µs      0.6 µs
1000           Standard         9.6 ms       0.2 ms
1000           Deterministic    0.3 ms       0.0 ms
10000          Standard         94.8 ms      1.0 ms
10000          Deterministic    2.7 ms       0.1 ms
x86-64
100            Standard         114.7 µs     21.6 µs
100            Deterministic    1.7 µs       0.8 µs
1000           Standard         2572.0 µs    231.2 µs
1000           Deterministic    29.0 µs      0.0 µs
10000          Standard         8823.8 µs    960.0 µs
10000          Deterministic    64.0 µs      6.4 µs

Table 4: Execution times for the exception propagation benchmark.

The improvements in execution time are a result of not having to resolve the encoded unwind information and walk the stack twice to locate the catch handler before unwinding.

4.2.4 Re-Throwing. To measure exception re-throwing, we used another benchmark, which called the same function recursively as many times as indicated by Stack Frames, with a local object having a custom destructor within each frame. After Stack Frames calls, an exception is thrown, but unlike before, caught and re-thrown by each function. The benchmark was compiled with the same flags as xmlbench.
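The re-throwing variant differs from plain propagation only in that every frame catches the exception and throws it again. A minimal sketch with standard C++ exceptions, using the hypothetical names `Guard`, `rethrows` and `recurse_rethrow`, could read:

```cpp
#include <stdexcept>

// Local object with a custom destructor, one per stack frame.
struct Guard {
    static long destroyed;
    ~Guard() { ++destroyed; }
};
long Guard::destroyed = 0;

static long rethrows = 0;  // number of frames that caught and re-threw

// Recurses until `frames` calls are on the stack, then throws.
// Every frame above the throwing one catches and re-throws.
void recurse_rethrow(int frames) {
    Guard g;
    if (frames <= 1) throw std::runtime_error("bottom of stack");
    try {
        recurse_rethrow(frames - 1);
    } catch (...) {
        ++rethrows;
        throw;  // re-throw the same exception object to the caller
    }
}
```

With N stack frames the exception is re-thrown N - 1 times before it reaches the outermost handler, which is why the cost of each catch and re-throw step dominates the results.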

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 5.

Stack Frames   Implementation   Time         σ
Arm
100            Standard         4.1 ms       0.1 ms
100            Deterministic    0.2 ms       0.0 ms
1000           Standard         33.5 ms      0.4 ms
1000           Deterministic    1.8 ms       0.0 ms
10000          Standard         318.4 ms     6.3 ms
10000          Deterministic    18.2 ms      0.1 ms
x86-64
100            Standard         1071.7 µs    363.3 µs
100            Deterministic    22.7 µs      13.8 µs
1000           Standard         3.8 ms       0.1 ms
1000           Deterministic    0.2 ms       0.0 ms
10000          Standard         22.0 ms      0.3 ms
10000          Deterministic    2.4 ms       0.1 ms

Table 5: Execution times for the exception re-throwing benchmark.

These results show a similar pattern to those for straight propagation. On x86-64 for 10000 stack frames, the current implementation takes almost 3× longer to catch and re-throw than simply propagating, while our implementation is 38× slower than when simply propagating. This factor of slowdown is to be expected given how fast our implementation is when not re-throwing, but is also due to the multiple steps required to move the exception object into and then out of each stack frame. This is clearly a good target for optimisation, as such transfer is unnecessary. Despite this, our implementation yielded a 9× speedup over the standard implementation.

For 1000 stack frames on x86-64, the difference between execution times is larger, with our implementation executing 19× faster. With 100 frames, the results have a very high standard deviation, but there is a clear order of magnitude between the two times, indicating that the difference in performance remains largely independent of the number of frames.

On Arm, the additional time taken to initialise the exception state in our implementation is evident, as it takes roughly five times longer than in the propagation benchmark. The standard implementation also suffers a performance hit, but one which decreases as the number of frames increases, suggesting that a large part of it is constant. This matches the fact that in the standard implementation the exception object is allocated on the heap, creating a large up-front cost, but one which is amortised across many re-throwings.

In reality, however, re-throwing at every frame is highly unusual behaviour, making both this amortisation and our worsening performance unlikely to be noticed.

4.3 Timing and Memory Determinism

We base our understanding of the language and algorithmic requirements for determinism on Verber and Colnaric [20]. In general, the requirements for deterministic execution fall into two categories: deterministic memory usage and deterministic execution time.

For memory usage, we can further subdivide the requirement into stack usage and heap usage. When no exception occurs, our stack usage is static. A fixed-size allocation for the exception state occurs per noexcept function invocation, throwing destructor invocation, and try block entry.

When an exception is thrown, the object is either allocated with a fixed size determined by the catch block, or with a variable size that can be bounded by the maximum allocation size of all exception objects thrown. The functions called by our implementation have a well-defined order and number known at compile-time, and are not recursive. While pointers are used to store this address, their addresses are fixed within the range of the exception object buffer or the current stack frame.

Our implementation does not require heap allocation. However, it is possible for the program to allocate an arbitrary amount of memory if local object destructors throw exceptions. For example, during stack unwinding, destructors may throw an exception, causing their own local objects to be destroyed, which in turn may throw additional exceptions.

However, since this potential execution flow is known at compile-time, and is akin to normal function calls made directly via their function definition rather than via function pointer, static analysis tools are able to give a bound for the number and type of exceptions thrown, and thus the maximum size of allocation. Furthermore, due to the nature of stack unwinding, recursion is not possible except where introduced explicitly by the developer’s code, and thus the number of stack frames is known.

Deterministic execution time generally refers to the ability to calculate the worst-case execution time (WCET), as demanded by most real-time systems. For execution time, unlike with existing implementations, it is trivial to calculate the WCET of all of our operations. Aside from the allocation and initialisation of the local exception state and the checking of the active flag (all of which have deterministic timing), we have three operations.

The first allocates the exception object in the exception buffer. In this case, the allocator is a stack allocator which merely advances a pointer, except where the developer does not specify a correct maximum bound on exception allocation. The second frees the exception object, which again is trivially a pointer decrement. The final function resolves the base class “type id”s when matching polymorphic exception types against catch handlers. Our implementation’s novel approach guarantees a fixed number of iterations through the list of base classes, again known at compile time, and derived from the filter of the catch block.

Therefore, should a WCET analysis tool give correct predictions for programs not using exceptions, it would also give such predictions for our implementation, meeting the criteria for determinism.

5 RELATED WORK

Modern exception handling, including the semantics of raising (throwing) and handling (catching) exceptions, exception hierarchies and control flow requirements, was first introduced in [6].

In [11], inherent overheads, both in terms of execution time and memory usage, in the current C++ implementations of exceptions are identified. A further evaluation of how C++ features impact embedded systems software is presented in [16]. This report singles out exceptions and their prerequisite run-time type information (RTTI) as “the single most expensive feature an embedded design may consider”.

Lang and Stewart [12] describe stack unwinding as it relates to exceptions in real-time systems, and show how the worst-case execution time of the stack unwinding is unbounded. This has later been confirmed in [9]. For our own definition of determinism, we take insight from [20], which details many of the requirements necessary for languages to be susceptible to time analysis.

A new exception handling approach which is better suited to static analysis is presented in [13], but it requires substantial code rewriting. SetJmp/LongJmp and table-based exception handling are discussed in [3]. The results demonstrate large increases in binary size and an execution time overhead of 10-15%. Gylfason and Hjalmtysson [8] develop a variant of table-based exception handling for use within the Itanium port of the Linux kernel.

Most relevant to the work presented in this paper is [19]. It focuses on propagating exceptions down the stack by duplexing the return value of functions, so as to fit the exception object itself within the register or stack memory used for the return value, and importantly to re-use the existing function return mechanism to perform stack unwinding. This approach hinges on two complex changes to C++’s function calling convention, though. This kind of change would be significant, breaking compatibility with C code and all existing C++ libraries. Instead, our novel exception implementation maintains the existing exception model by finding a new way to store and match exceptions at run-time, with little to no additional performance impact when exceptions are not thrown.

6 SUMMARY & CONCLUSIONS

In this paper we have developed a novel implementation of C++ exceptions, which better suits the requirements of embedded systems while still upholding the C++ core design tenets. Our implementation shows binary size decreases of up to 82.3% over existing implementations, performance improvements of up to 325% when handling exceptions, minimal execution time overhead on x86-64, and none on the Arm platform. We have shown that our implementation achieves memory and execution time determinism, making it significantly better-suited to calculating worst-case execution times, a major hurdle in the adoption of exceptions.

Our future work will investigate further code size improvements resulting from code optimizations and shrinking of the exception state object, while reducing execution time by allocating the active flag in the exception state to a register.


REFERENCES

[1] David W. Binkley. 1997. C++ in Safety Critical Systems. Ann. Softw. Eng. 4, 1-4 (Jan. 1997), 223–234. http://dl.acm.org/citation.cfm?id=590565.590588
[2] Beman Dawes, David Abrahams, and R. Rivera. 2014. Boost C++ libraries. http://www.boost.org/doc/libs/1_55_0/libs/libraries.htm
[3] Christophe de Dinechin. 2000. C++ Exception Handling for IA64. In Proceedings of the First Workshop on Industrial Experiences with Systems Software, WIESS 2000, October 22, 2000, San Diego, CA, USA, Dejan S. Milojicic (Ed.). USENIX, 67–76. http://www.usenix.org/events/wiess2000/dinechin.html
[4] Bruce Eckel, Chuck D. Allison, and Chuck Allison. 2003. Thinking in C++, Vol. 2 (2 ed.). Pearson Education.
[5] C++ exception handling 2012. ARM Compiler toolchain Compiler Reference.
[6] John B. Goodenough. 1975. Exception Handling: Issues and a Proposed Notation. Commun. ACM 18, 12 (Dec. 1975), 683–696. https://doi.org/10.1145/361227.361230
[7] Google. 2018. Google C++ Style Guide. https://google.github.io/styleguide/cppguide.html
[8] Halldor Isak Gylfason and Gisli Hjalmtysson. 2004. Exceptional Kernel–Using C++ exceptions in the Linux kernel.
[9] Wolfgang A. Halang and Matjaž Colnarič. 2002. Dealing with exceptions in safety-related embedded systems. IFAC Proceedings Volumes 35, 1 (2002), 477–482.
[10] Pieter Hintjens. 2014. ZeroMQ: The Guide. http://zeromq.org
[11] ISO/IEC JTC1 SC22 WG21. 2006. ISO/IEC TR 18015: Technical Report on C++ Performance. Technical Report ISO/IEC TR 18015:2006. ISO/IEC. https://www.iso.org/standard/43351.html
[12] Jun Lang and David B. Stewart. 1998. A Study of the Applicability of Existing Exception-handling Techniques to Component-based Real-time Software Technology. ACM Trans. Program. Lang. Syst. 20, 2 (March 1998), 274–301. https://doi.org/10.1145/276393.276395
[13] Roy Levin. 1977. Program Structures for Exceptional Condition Handling. Technical Report. Carnegie-Mellon Univ., Pittsburgh, PA, Dept. of Computer Science.
[14] Mark Maimone. 2014. C++ On Mars. In Proceedings of the C++ Conference (CppCon 2014).
[15] Prakash Prabhu, Naoto Maeda, Gogul Balakrishnan, Franjo Ivančić, and Aarti Gupta. 2011. Interprocedural exception analysis for C++. In European Conference on Object-Oriented Programming. Springer, 583–608.
[16] César A. Quiroz. 1998. Using C++ efficiently in embedded applications. In Proceedings of the Embedded Systems Conference.
[17] Richard Smith et al. 2017. Working draft, standard for programming language C++. Technical Report.
[18] Alexander D. Stoyenko. 2012. Constructing predictable real time systems. Vol. 146. Springer Science & Business Media.
[19] Herb Sutter. 2018. P0709: Zero-overhead deterministic exceptions: Throwing values. Technical Report R0. SG14.
[20] Domen Verber and Matjaž Colnarič. 1996. Programming and time analysis of hard real-time applications. IFAC Proceedings Volumes 29, 6 (1996), 79–84.

  • Abstract
  • 1 Introduction
    • 1.1 Motivating Example
    • 1.2 Contributions
  • 2 Deterministic C++ Exceptions
    • 2.1 Throws Exception Specifier
    • 2.2 Throwing Exceptions and Exception State
    • 2.3 Handling Exceptions
    • 2.4 Re-Throwing
    • 2.5 Throwing Destructors
    • 2.6 Exception Object Type Identification
    • 2.7 Exception Object Allocation
  • 3 Standards Compliance
    • 3.1 ABI Compliance
    • 3.2 C++ Standard Compliance
  • 4 Evaluation
    • 4.1 Experimental Set-up
    • 4.2 Results
    • 4.3 Timing and Memory Determinism
  • 5 Related Work
  • 6 Summary & Conclusions
  • References

1 auto safe_divide(float dividend, float divisor) throws {
2     if (divisor == 0)
3         throw std::invalid_argument("Divide by zero");
4     else
5         return dividend / divisor;
6 }

Figure 5: Example function marked with our proposed throws exception specifier (line 1), allowing exception propagation.

Figure 4 gives an example of the variables the compiler will automatically inject during code generation. Line 1 shows the normally hidden implicit exception state parameter, __exception. Line 3 assigns values corresponding to the SomeError type to the exception state parameter. Line 7 shows the automatically allocated __exception_state variable emitted by all noexcept functions. Line 8 shows the exception state object being automatically passed to the throws function foo. Line 10 shows the check inserted after each call to a throwing function to test for an active exception.

The design of this implementation performs almost all of the exception handling directly within the functions being executed, allowing the optimiser to work most effectively. By implementing exceptions on top of normal execution flow, code used in returning from functions is re-used when unwinding the stack following exceptions. This helps to reduce code size and unpredictability, and overcomes the largest roadblock in achieving deterministic execution time. This approach combines the exception-handling mechanism of C++ with the inter-mixed error-checking and non-error-handling code integral to the design of [9].

2.1 Throws Exception Specifier

A fundamental issue is to decide which functions should take the implicit exception parameter that our design requires (i.e. which functions could throw exceptions). The natural choice would be to apply it to all functions with C++ linkage, except for those marked noexcept, in keeping with the current exception implementation. However, the issue with this approach would be that C++ and C functions would then have different calling conventions. Whilst C++ functions can, in general, be called without special handling from C, this is only due to the coincidence that the two calling conventions are identical.

With our proposed scheme, the decision to have C++ functions support exceptions by default is impractical, as C++ has certain language holes when it comes to specifying linkage. Specifically, neither classes, structs, nor their members, such as function pointers, can have a specified linkage. Therefore, we propose to have functions be declared noexcept by default. This change in default exception specification preserves compatibility with C, at the cost of a language change in C++.

Like noexcept, we require that functions that might throw encode this into their signature (similar to [19]), as this impacts on how they are called. Thus, to indicate that functions can throw exceptions, we introduce an exception specifier (that is part of the function prototype), as shown in Figure 5.

Name         Description
type         The “type id” variable address identifying the exception object’s type
base_types   Pointer to an array of “type id”s identifying the exception object’s base class types
ptr          Whether the exception object is a pointer to a non-pointer type
size         The size in bytes of the exception object
align        The alignment in bytes of the exception object
ctor         The address of the object’s move or copy constructor
dtor         The address of the object’s destructor
buffer       The address of the exception object
active       Whether the exception is currently active

Table 1: Exception State Fields
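The fields of Table 1 map naturally onto a plain struct. The sketch below is one possible layout, not the layout emitted by the compiler: the field types, their order, and the `populate` helper are assumptions introduced here for illustration.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical layout; field names follow Table 1, field types are assumed.
struct ExceptionState {
    const void*        type;        // address of the thrown type's "type id"
    const char* const* base_types;  // null-terminated array of base "type id"s
    bool               ptr;         // pointer to a non-pointer type?
    std::size_t        size;        // size of the exception object in bytes
    std::size_t        align;       // alignment of the exception object in bytes
    void (*ctor)(void* dst, void* src);  // move (or copy) constructor thunk
    void (*dtor)(void* obj);             // destructor thunk
    void*              buffer;      // address of the exception object
    bool               active;      // is an exception currently active?
};

// Fills the state at a throw-site for an object of type T stored at `object`.
// All values except `object` are known at compile time.
template <typename T>
void populate(ExceptionState& s, const void* type_id,
              const char* const* bases, T* object) noexcept {
    s.type = type_id;
    s.base_types = bases;
    s.ptr = std::is_pointer<T>::value &&
            !std::is_pointer<typename std::remove_pointer<T>::type>::value;
    s.size = sizeof(T);
    s.align = alignof(T);
    s.ctor = [](void* dst, void* src) {
        ::new (dst) T(std::move(*static_cast<T*>(src)));
    };
    s.dtor = [](void* obj) { static_cast<T*>(obj)->~T(); };
    s.buffer = object;
    s.active = true;
}
```

Storing the constructor and destructor as function pointers lets a catch block take ownership of the object and destroy the original without knowing its concrete type.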

2.2 Throwing Exceptions and Exception State

The exception state object exists to hold run-time information on the exception currently in progress. Every function marked noexcept, including the program main function and functions used as thread entry-points, allocates and initialises its own exception state object, effectively making noexcept functions boundaries beyond which exceptions cannot propagate.

The address of this object is passed to any of the called functions that are marked throws, allowing the exception to propagate down the call chain as far as the first noexcept function, where, if not handled, the program is terminated.

Following a throw expression, the fields in the exception state object are populated at the throw-site with values known at compile-time, corresponding to the exception object given in the expression. The active flag is set to true, the exception object is moved into a buffer, and its address is assigned to the buffer field of the exception state.

If the throw expression is directly within a try statement, the code jumps to the first catch block for handling. Otherwise, provided the current function is marked as throws, the exception is propagated: execution is transferred to the function epilogue as if returning normally, but the return value for the function, if any, is not initialised.

During the return sequence, local objects have their destructors executed as if a standard function return had occurred, efficiently unwinding the stack. The exception being marked active is not made visible to destructors, as destructors are necessarily noexcept in this context (“If a destructor called during stack unwinding exits with an exception, std::terminate is called (15.5.1).” [17]). In this way, the exception state is untouched until entering the initial catch block. If the current function is not marked as throws, propagation is replaced by an immediate call to std::terminate to terminate the program.

After each function call to a function marked as throws, a check is automatically inserted by the compiler to test whether the active flag has been set, and thus whether the exception has been propagated. This is the largest overhead resulting from our changes, as it requires a test for each function called, regardless of whether an exception was thrown or not. There is perhaps a good case to be made for its necessity, however. By indicating that a function can throw via throws, the developer is encoding into the function signature the fact that an error may occur during its execution. Thus, regardless of performance requirements, some test must be made for that error in a correct program.
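Written out by hand in ordinary C++, the lowering described in this section might look roughly as follows. This is a simplified sketch: `MiniState`, `average_ratio` and `ratio_or` are hypothetical names, the state carries only a message and the active flag, and the buffered exception object is omitted.

```cpp
// Minimal stand-in for the exception state; only what this sketch needs.
struct MiniState {
    const char* message = nullptr;  // simplification of the buffered object
    bool active = false;
};

// Lowered form of a function marked `throws`: the state is an extra parameter.
float safe_divide(MiniState* exc, float dividend, float divisor) {
    if (divisor == 0) {
        exc->message = "Divide by zero";
        exc->active = true;
        return 0.0f;  // return value is meaningless while `active` is set
    }
    return dividend / divisor;
}

// Lowered form of an exception-neutral `throws` function: after each call
// to a throwing function, the inserted check propagates by returning early.
float average_ratio(MiniState* exc, float a, float b, float c) {
    float x = safe_divide(exc, a, c);
    if (exc->active) return 0.0f;
    float y = safe_divide(exc, b, c);
    if (exc->active) return 0.0f;
    return (x + y) / 2;
}

// Lowered form of a noexcept function: it owns the state and must handle
// the exception itself (here, by acting as a catch-all with a fallback).
float ratio_or(float fallback, float a, float b, float c) noexcept {
    MiniState state;
    float r = average_ratio(&state, a, b, c);
    if (state.active) {
        state.active = false;  // exception handled
        return fallback;
    }
    return r;
}
```

Each early `return` runs the destructors of the enclosing frame exactly as a normal return would, which is how the ordinary function epilogue doubles as the unwinder.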

2.3 Handling Exceptions

Once an exception has been thrown, and if it does not result in a call to std::terminate, control is transferred to the first available catch block.

Before each catch block executes, it first compares for value equality the “type id” of the exception type it handles and the type marked in the type field of the exception state. If they are equal, or if the type the catch block handles is a base type of the type in the type field, then the catch block is entered. Otherwise, execution jumps to the next catch block and repeats this process until the catch block is a catch-all or all blocks have been exhausted. If no blocks remain, the exception continues propagation as described previously.
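The handler selection described above can be sketched as a linear scan over the catch blocks. The sketch assumes the address-based “type id” encoding of Section 2.6; the identifiers id_base, id_derived, id_other, handler_matches and select_handler are hypothetical, and a null handler entry is used here to stand for a catch-all.

```cpp
#include <cassert>

// One byte per type; only the address is used as the identifier.
const char id_base = 0, id_derived = 0, id_other = 0;

// Null-terminated lists of base "type id"s.
const void* const no_bases[]      = { nullptr };
const void* const derived_bases[] = { &id_base, nullptr };

struct ExceptionState {
    bool active;
    const void* type;               // "type id" of the thrown object
    const void* const* base_types;  // base "type id"s of the thrown type
};

// True if a handler for `handled` accepts the exception in `es`.
bool handler_matches(const ExceptionState& es, const void* handled) {
    if (es.type == handled) return true;              // exact type match
    for (const void* const* b = es.base_types; *b != nullptr; ++b)
        if (*b == handled) return true;               // handled type is a base
    return false;
}

// Index of the first handler to enter, or -1 if none match and the
// exception must continue propagating.
int select_handler(const ExceptionState& es,
                   const void* const* handlers, int count) {
    for (int i = 0; i < count; ++i)
        if (handlers[i] == nullptr || handler_matches(es, handlers[i]))
            return i;
    return -1;
}
```

The scan length is bounded by the number of catch blocks and the number of bases, both of which are fixed when the program is compiled.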

Once a catch block is executed, the active flag of the exception state is cleared, memory is allocated on the stack for the exception object, and the move or copy constructor is invoked to transfer ownership of the exception object into the current block.

If an exception were to occur during the move or copy operation, std::terminate is called to satisfy the C++ specification, which states: “If the exception handling mechanism, after completing evaluation of the expression to be thrown but before the exception is caught, calls a function that exits via an exception, std::terminate is called (15.5.1).”

Catch filters can take two forms: (1) by value, where the catch variable represents an object allocated within the catch block, or (2) by reference, where the catch variable is a reference to the exception object allocated elsewhere.

In the latter case, an additional unnamed variable is introduced in which to store the exception object, and its allocation and initialisation are performed as described above using the equivalent non-reference type. The catch variable reference is then bound to this unnamed variable. In this way, even when the exception object is caught by reference, it is still owned by the catch block.

Following a successful move/copy of the exception object, the destructor is called on the original exception object instance within the buffer. Naturally, as destructors are necessarily noexcept, were the destructor to exit with an exception, again std::terminate would be called.

With the catch variable successfully initialised, the catch block then commences execution. Once complete, execution moves to a point past the final catch block in the current scope, and the catch variable is destroyed as usual.

2.4 Re-Throwing

One of the issues with maintaining and passing references to a single instance of exception state via the exception state parameter is that more than one exception may be active at a time and, crucially, the initial exception may need to be re-thrown following the intermediary exception.

    try { throw 1; }
    catch (int i) {
        try { throw 2; }
        catch (...) { }
        throw;
    }

Figure 6: Re-throwing following a nested exception. The first exception's state must be persisted, and not overwritten by the second exception, to allow the re-throw.

Figure 6 gives an example. The first exception is thrown on line 1, caught on line 2, then a second exception is thrown on line 3 and caught by the catch-all on line 4. The first exception is then re-thrown on line 5, which requires the first exception's state to have persisted across the second, inner throw/catch.

To allow for this, a local copy of the exception state is allocated when entering a try block, and if an exception occurs within the try block, the exception modifies the local copy. If the same exception is not handled by the try block's corresponding catch block or blocks, the local state is copied back to either the outer state or to the function's exception state, as passed in the parameter, for exception propagation.

If the inner exception is handled correctly, only its local exception state is modified; thus the outer exception state is maintained and can be used, for example, to successfully re-throw the outer exception. As we only initialise the active field of the local copy, its cost is a simple stack allocation plus zero-initialisation of a boolean flag.
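The effect of giving each try block its own local state can be illustrated by hand-lowering the control flow of Figure 6. This is a loose sketch, not the compiler's actual output: ExceptionState and nested are hypothetical names, an int stands in for the exception object, and the explicit assignments imitate what the throw and catch sites would do.

```cpp
#include <cassert>

struct ExceptionState {
    bool active = false;
    int  value  = 0;   // stands in for the exception object
};

// Hand-lowered version of Figure 6. `caller` plays the role of the
// exception state parameter of the enclosing function.
void nested(ExceptionState& caller) {
    ExceptionState outer;            // local state for the outer try block
    outer.active = true;             // throw 1
    outer.value  = 1;

    if (outer.active) {              // catch (int i)
        outer.active = false;        // handler entered: flag cleared

        ExceptionState inner;        // local state for the inner try block
        inner.active = true;         // throw 2
        inner.value  = 2;
        if (inner.active)            // catch (...): handled locally,
            inner.active = false;    // only `inner` is modified

        caller.value  = outer.value; // throw; re-throws the first exception
        caller.active = true;        // using the untouched outer state
        return;
    }
}
```

Because the inner throw only ever writes to `inner`, the value 1 is still available when the re-throw copies the outer state to the caller.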

C++ already has a mechanism for obtaining and re-throwing exceptions by making and passing around an explicit copy of (or reference to) the exception state via the use of std::current_exception(), which when called returns an instance of std::exception_ptr. To provide similar functionality in our implementation, we propose an equivalent mechanism, namely std::get_exception_obj. This function returns a reference to a std::exception_obj instance stored on the stack.

What makes std::exception_obj novel is that its type is templated such that developers can specify a custom allocator, giving them control over where the exception object is to be allocated when being transferred to a new instance. However, we restrict the use of std::get_exception_obj such that it can only be called from directly within a catch block. This makes sense from a value-oriented perspective: functions can only inspect and access their own variables, and not their callers', unless shared explicitly via parameters. Since in our implementation the exception object is directly owned by the catch block, functions called from within that catch block would not normally have access to the object.

Re-throwing in our implementation is achieved by calling the std::rethrow function and passing a std::exception_obj instance. This is intuitive, arguably more so than the bare throw expression, and crucially makes clear the cost of re-throwing, as std::exception_obj instances must be explicitly moved around.

Figure 7 shows a complete example. On lines 1 and 2, a function is declared taking a std::exception_obj instance using the default allocator. This object is move-constructed from the corresponding std::exception_obj instance representing the exception object within the catch block on line 12, allocating the exception object as instructed. The object is then moved and stored in a global variable, resultexception, on line 5, which is then re-thrown on line 15.

     1  void store_exception(
     2      std::exception_obj<std::allocator<const char>> obj)
     3      throws {
     4      // Move the exception object into a global
     5      resultexception = std::move(obj);
     6  }
     7  // ---------------------------------
     8  try {
     9  }
    10  catch (...) {
    11      // Move the exception object out of the catch block
    12      store_exception(std::move(std::get_exception_obj()));
    13  }
    14  // The exception can be re-thrown from a global
    15  std::rethrow(std::move(resultexception));

Figure 7: Example of re-throwing the exception object using our proposed std::rethrow function and std::exception_obj type.

To enable re-throwing, the std::exception_obj instance also contains the exception state object, which is already populated with the details of the exception. Thus, all that is required to re-throw that exception is to copy those fields to the current exception state, restore the exception object to the buffer, set the active flag, and then attempt to propagate the exception as if throwing normally.

2.5 Throwing Destructors

Although generally advised against by the C++ specification, destructors can throw, and can be explicitly marked as throws. Were such destructors to be called during stack unwinding, they would incorrectly share the same exception state as the uncaught exception. In particular, the active flag would be set, leading to incorrect execution flow when function calls from within the destructor performed their exception check.

To avoid this, we modify the compiler such that when throwing destructors are entered, they allocate and initialise their own local exception state, as if they were a noexcept function. This local state will be used within the destructor. If at any point an exception is unhandled within the destructor, such that it would exit with that exception, it first checks the exception state parameter to see if it is marked as active. If so, two unhandled exceptions are in progress and thus the program must be terminated with std::terminate, per the C++ standard (15.5.1, as quoted above).
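The exit check for a throwing destructor can be summarised as a three-way decision. The sketch below is an illustration under assumptions: ExceptionState, DtorExit and leave_throwing_destructor are hypothetical names, and the Terminate result marks the point where the real implementation would call std::terminate.

```cpp
#include <cassert>

struct ExceptionState {
    bool active = false;
    int  value  = 0;    // stands in for the exception object
};

enum class DtorExit { Normal, Propagate, Terminate };

// Decision taken when a throwing destructor is about to return.
// `local` is the state the destructor allocated on entry; `parent` is the
// exception state parameter it was called with.
DtorExit leave_throwing_destructor(const ExceptionState& local,
                                   ExceptionState& parent) {
    if (!local.active)
        return DtorExit::Normal;      // no exception escaped the destructor
    if (parent.active)
        return DtorExit::Terminate;   // two unhandled exceptions in progress
    parent = local;                   // hand the exception to the caller
    return DtorExit::Propagate;
}
```

Keeping the destructor's own state separate means calls made inside the destructor never observe the active flag of the exception that triggered the unwinding.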

As throwing destructors may be called when an exception is not active, as part of normal logic, these additional measures constitute overhead in the case where exceptions are not thrown. However, throwing destructors are rare and are conceptually odd: exceptions exist to indicate the failure of an operation, but in C++ destruction always succeeds.

2.6 Exception Object Type Identification

One of the disadvantages of current exception implementations is their use of RTTI in matching the types of thrown exception objects to a corresponding catch handler. Current approaches make use of complete RTTI objects in exception binaries, emitting a full std::type_info instance for each type, which includes a unique string identifying it. Either the address of these std::type_info instances or the string identifier is used to compare against the corresponding std::type_info instance for the catch handler. Inheritance adds an additional layer of complexity, as each candidate type must also enumerate and compare its public base classes when matching against the catch block type.

Seeking to reduce binary sizes, we introduce an optimisation that both reduces the space required to represent types and addresses concerns with execution time determinism. This solution is predicated upon the fact that each type requires solely some unique identifier, and not the full std::type_info instance.

For each unique type of exception object thrown, the compiler will automatically emit a char-sized variable with minimum alignment, whose identifier is composed of the name of the type prefixed with __typeid_for_. Pointer types will have an integer prefix corresponding to the number of levels of indirection. Type qualifiers, such as const and volatile, are ignored.

This “type id” variable will use its address to represent a unique type identifier with which types might be compared for equality. In a throw expression, the type field of the exception state will be assigned the address of the “type id” matching the thrown type.

To handle polymorphic types, for each unique type of exception thrown an additional array of addresses will be emitted, containing the addresses of the “type id” variables for the base types. In similar fashion to the “type id” variables, its identifier is prefixed with __typeid_bases_for_, followed by the name of the type. For those types which do not have base classes, a single shared array with the identifier __typeid_empty_bases will be emitted, containing only a null entry.

Testing whether a given type is a base class of another type is then a matter of getting the pointer to the base class array (which is known at compile-time) and iterating through its fixed-size list of bases. When an exception is thrown, its corresponding base class array will be assigned to the base_types field of the exception state.

This same iteration of base classes applies when throwing and catching pointer types. While the type field will contain a “type id” identifying the pointer, the base_types field will correspond to the base type array of the pointee type.

In both the “type id” and base-array cases, to ensure both emission and uniqueness, these variables will be marked as being weak symbols, an indication given to the linker that only one instance of each variable will be preserved in the final binary.
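The idea of using the address of a one-byte variable as a type identifier can be imitated in standard C++ without compiler support. The sketch below is an analogy rather than the proposed mechanism: type_id and type_id_impl are hypothetical names, and the static local of a function template is used because toolchains typically emit it in a way that lets the linker keep a single copy, much like the weak symbols described above.

```cpp
#include <cassert>
#include <type_traits>

// One char-sized object per instantiation; only its address is meaningful.
template <typename T>
const void* type_id_impl() noexcept {
    static char id = 0;
    return &id;
}

// Top-level const and volatile are stripped first, so qualified and
// unqualified versions of a type share the same identifier.
template <typename T>
const void* type_id() noexcept {
    return type_id_impl<typename std::remove_cv<T>::type>();
}
```

Comparing two types for equality is then a single pointer comparison, with no string data or std::type_info objects involved.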

    bool type_is_base(void* type, const char** bases) noexcept {
        for (; bases[0] != nullptr; bases++) {
            if (static_cast<const void*>(bases[0]) == type)
                return true;
        }
        return false;
    }

Figure 8: Example iteration over base entries: the candidate base is passed to the type parameter and the array of base “type id”s to the bases parameter.

2.7 Exception Object Allocation

When throwing an exception, the exception object must be allocated such that it is constructed in the scope where it was thrown, but is also accessible to the scope where it is caught. It is not enough to simply allocate the object on the stack, despite the fact that the exception will cause the stack to be unwound, as local object destructors will run as the stack is unwound, potentially overwriting the exception object. Allocating the exception object on the heap is a solution, but the allocation may result in system calls, which (a) potentially have unbounded execution time and (b) may fail if memory is exhausted. As our target is to support deterministic exceptions, standard heap allocation is not an option.

We propose a statically sized buffer, using a simple constant-time stack allocator for our exception object, with a heap-backed fall-back upon buffer exhaustion for those applications which cannot compute, or do not require, run-time determinism. The size of this buffer will be given a default value by the ABI in use, but will be optionally application-overridden at link-time via a linker command-line parameter, to optimise or prevent any and all heap allocation.
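A fixed-size buffer with stack-style allocation and a heap fall-back might look roughly as follows. This is a minimal sketch under assumptions: the class name ExceptionBuffer, the 64-byte capacity and the use of std::malloc for the fall-back are illustrative choices, and deallocation is assumed to happen in last-in, first-out order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

class ExceptionBuffer {
public:
    // Constant-time allocation by advancing an offset; falls back to the
    // heap only when the request does not fit in the remaining space.
    void* allocate(std::size_t size) {
        size = round_up(size);
        if (size <= kCapacity - used_) {
            void* p = &storage_[used_];
            used_ += size;
            return p;
        }
        return std::malloc(size);        // non-deterministic fall-back path
    }

    // Releases the most recent allocation of `size` bytes.
    void deallocate(void* p, std::size_t size) {
        if (in_buffer(p)) used_ -= round_up(size);
        else std::free(p);
    }

    // True if `p` points into the static buffer rather than the heap.
    bool in_buffer(const void* p) const {
        const std::uintptr_t a  = reinterpret_cast<std::uintptr_t>(p);
        const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(&storage_[0]);
        return a >= lo && a < lo + kCapacity;
    }

    std::size_t used() const { return used_; }

private:
    static constexpr std::size_t kCapacity = 64;
    static constexpr std::size_t kAlign = alignof(std::max_align_t);

    // Rounds a size up to the next multiple of the maximum alignment.
    static std::size_t round_up(std::size_t n) {
        return (n + kAlign - 1) / kAlign * kAlign;
    }

    alignas(std::max_align_t) unsigned char storage_[kCapacity];
    std::size_t used_ = 0;
};
```

Sizing the buffer to the largest set of simultaneously live exception objects keeps every allocation on the constant-time path; the heap branch is only reached when that bound is exceeded.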

In the worst-case scenario, our solution matches the performance of current exception implementations, as at least that of g++ uses a similar buffer. However, as we free our exception objects as early as possible, we expect usage of this buffer to be less than in other implementations, and with our novel ability to customise its size (and in combination with earlier improvements) we uniquely offer deterministic allocation.

3 STANDARDS COMPLIANCE

In this section we consider the compliance of our exception implementation against the ABI specification and the C++ standards document.

3.1 ABI Compliance

Existing ABI specifications mandate the use of tables and manual stack unwinding. However, since using the existing function return mechanism is a superior method of achieving stack unwinding, our implementation deviates entirely from their design. Therefore, we will not evaluate our solution against any existing ABI specifications. A side effect of this is that our own ABI is significantly easier to implement, as the stack unwinding and exception handling code is generated in a platform-independent way by the compiler.

3.2 C++ Standard Compliance

In general, our implementation conforms to the C++ standard's requirements for exceptions. However, we have observed four clauses where there are deviations.

Clause 18.1.4 describes the interaction between the exception object and std::exception_ptr, which is designed for exception objects allocated on the heap. This is therefore not suited to our stack-based implementation; however, our replacement provides similar functionality.

Clause 18.2.2 requires local temporaries, such as return values, to have their destructors called when unwinding. However, clang's existing exception implementation does not conform to this part of the specification, and as a result our implementation also does not conform. This is a bug in the clang compiler (on which our implementation is based), and if this is repaired, our implementation will also be repaired.

Clauses 18.2.3 and 18.2.4 require the destruction of base class instances and sub-objects already constructed when an exception occurs during a constructor. This feature is not yet implemented and will be addressed in future work; however, it does not require any conceptual changes. Given its simplicity and similarity to local object destruction, we foresee no issues that might occur in its implementation.

4 EVALUATION

We have implemented our novel scheme in the clang compiler and present results showing the performance of our implementation compared to the existing implementation.

4.1 Experimental Set-up

For our experiments, we used two different machines with two different platforms:

• x86-64 Machine: Desktop computer running Ubuntu 18.04, Intel 8700K 3.7-4.7GHz, 32GB 3200MHz DRAM, Samsung 980 Evo NVMe storage.

• Arm Machine: Raspberry Pi 1 Model B running Raspbian Stretch Lite, ARM1176JZFS ARMv6 700MHz, 512MB DRAM, 16GB SD storage.

All binaries were built with identical flags, other than those determining the type of exceptions to use. Run-time type information is disabled (-fno-rtti), the symbol table is removed (-s), and -O3 optimisation with link-time optimisation (-flto) is employed. Clang-7.0 was used as the compiler, along with libstdc++ and libgcc on Arm, and libc++ and libunwind on x86.

4.1.1 Benchmark Application. Lacking an obvious existing realistic benchmark with which to test exceptions, we instead developed our own microbenchmark. xmlbench is a simple XML parser, targeting ease of use and functionality as a real XML parser. It organically includes a good mixture of functions that can and cannot throw exceptions, and also those which do throw exceptions and those which are “exception neutral”, i.e. allowing exception propagation. By supplying different XML files as input to the parser, we can precisely control the rate of exceptions. Errors in input XML syntax and external conditions (such as the input file missing or an out-of-memory condition) can be influenced in a realistic way. This approach to benchmarking is similar to [15], who also use an XML parser as a representative application.


Platform   Implementation   Size
Arm        Standard         59188 bytes
Arm        Deterministic    22068 bytes
x86-64     Standard         105312 bytes
x86-64     Deterministic    18600 bytes

Table 2: Final binary sizes for xmlbench benchmark with the standard and deterministic exception implementations.

4.2 Results

The following sections detail the results of our observations on various facets of the exception handling infrastructure.

4.2.1 Unwind Library Overhead. Statically-linked binaries participate in whole-program optimisation, and as such have the exception handling and unwind functions removed when using our implementation. This allows for a direct comparison against binaries that use the standard implementation.

Table 2 shows the results of the binary built with both the standard and the deterministic exception implementations. The results for both x86-64 and Arm show substantial decreases in binary size when using our deterministic implementation. This measurement indicates how large the unwinding code is. Arm's unwinding code is smaller, but by removing it we see a decrease in binary size of 36.3 Kilobytes, a 62.7% reduction for our program. x86-64 has a much larger unwind library, with our implementation saving 84.7 Kilobytes, an 82.3% reduction in size for identical functionality.
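The savings quoted above follow directly from the sizes in Table 2, as the short calculation below illustrates. The helper names percent_reduction and saved_kilobytes are hypothetical, and the sketch assumes that a Kilobyte here means 1024 bytes, which is the unit under which the Table 2 sizes reproduce the quoted figures.

```cpp
#include <cassert>

// Percentage reduction when going from `before` bytes to `after` bytes.
constexpr double percent_reduction(double before, double after) {
    return (before - after) / before * 100.0;
}

// Bytes saved, expressed in units of 1024 bytes.
constexpr double saved_kilobytes(double before, double after) {
    return (before - after) / 1024.0;
}
```

For Arm, 59188 - 22068 = 37120 bytes, which is 36.25 units of 1024 bytes and 62.7% of the original size; for x86-64, 105312 - 18600 = 86712 bytes, which is about 84.7 units and 82.3%.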

On both platforms, the code section (.text) grows due to the inclusion of additional instructions for checking the exception state. On x86-64, the removal of the unwind tables compensates for this growth, with a reduction of 0.3%. However, on Arm, missed optimization opportunities by the compiler lead to a net increase of roughly 8%.

4.2.2 Application Benchmark Performance. We generated two sets of XML files, one with syntax errors and one without, and parsed them with xmlbench. The syntax error is contained within the deepest element, resulting in an exception thrown with the stack at its largest number of function calls. For each run, we measured the total run-time of the xmlbench program, including the start-up time.

Figure 9 shows the average execution time in microseconds per XML element parsed by our benchmark for both the Arm (Figure 9a) and x86-64 (Figure 9b) platforms. On the Arm platform, the results show there is little difference between execution times, except for the standard exception implementation generally performing the worst of the four when faced with an exception. As was expected, the standard implementation performs better when no exception is raised with only 156 elements, as our implementation must check for active exceptions. This difference disappears as more elements are processed.

On the x86-64 platform, our deterministic implementation takes slightly longer per element than the standard implementation for larger numbers of nodes. This is primarily due to the overhead of checking whether an exception has occurred. As is expected, the difference in time for our implementation depending on whether an exception was thrown or not is very small, particularly in comparison to the same for the standard implementation, which at its peak has a 23% overhead.

Platform  Elems  Throw  Time (σ), Standard Impl.  Time (σ), Determ. Impl.
Arm       156           42.7ms (0.6ms)            41.7ms (0.6ms)
Arm       156           43.7ms (0.6ms)            43.0ms (0.0ms)
Arm       3906          158.7ms (2.1ms)           162.7ms (1.5ms)
Arm       3906          160.7ms (1.5ms)           162.0ms (2.0ms)
x86-64    3905          3.3ms (0.6ms)             3.3ms (0.6ms)
x86-64    3905          3.3ms (0.6ms)             3.3ms (0.6ms)
x86-64    97655         68.0ms (1.0ms)            70.7ms (0.6ms)
x86-64    97655         69.0ms (1.0ms)            70.3ms (0.6ms)

Table 3: Overall execution time summary for xmlbench on the Arm and x86-64 platforms.

The benchmark clearly shows how throwing a single exception increases execution time for the standard exception implementation, given the lengthy unwind procedure. However, with our implementation, the improvements to exception handling speed are not quite enough to compensate for the overhead in the absence of exceptions. With more than a single exception, and particularly given the results for 488281 elements, where the trend suggests that our implementation scales better than the existing one, it should be enough for our solution to out-perform the current one.

As shown in Table 3, our implementation creates a small additional execution time overhead in comparison to the standard implementation; however, the difference is negligible and within the standard deviation.

4.2.3 Exception Propagation. To test exception propagation, we used a different benchmark, which simply called the same function recursively as many times as indicated by Stack Frames, with a local object having a simple custom destructor within each frame. After Stack Frames calls, an exception is thrown and caught by reference from the first invocation. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 4.
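A benchmark of the shape described above could be written as follows. This is a sketch under assumptions, not the code used for the measurements: it uses standard C++ exceptions rather than the proposed throws annotation, the names Local, recurse and time_propagation are hypothetical, and it reads std::chrono::steady_clock (wall-clock time) instead of CPU time.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>

int g_destroyed = 0;                       // counts destructor runs during unwinding

struct Local {
    ~Local() { ++g_destroyed; }            // simple custom destructor
};

// Recurses until `frames_left` frames are on the stack, then throws.
void recurse(int frames_left) {
    Local guard;                           // one local object per frame
    if (frames_left <= 1)
        throw std::runtime_error("deepest frame");
    recurse(frames_left - 1);
}

// Time from the first call until control arrives in the catch handler.
std::chrono::nanoseconds time_propagation(int frames) {
    const auto start = std::chrono::steady_clock::now();
    try {
        recurse(frames);
    } catch (const std::exception&) {      // caught by reference
        return std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start);
    }
    return std::chrono::nanoseconds(-1);   // not reached: recurse always throws
}
```

Calling time_propagation repeatedly for 100, 1000 and 10000 frames and averaging the results would give figures comparable in kind, though not in value, to those tabulated in Table 4.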

The results show a significant difference in execution time between the two exception implementations. Our implementation is on average 98× faster at performing exception propagation on x86-64. The factor of improvement increases from 68× to 138× as the number of stack frames increases, indicating that the performance penalty seen in the current implementation increases as the number of frames increases. Our implementation performs best on x86-64 with deep call stacks.

On Arm, the improvement in execution time remains almost constant at 32× to 33× for 100 and 1000 frames, with a 35× speedup for 10000 frames. This suggests that our implementation will consistently offer improved performance independent of the number of frames.


[Figure 9 plots: (a) Arm, for 156, 781, 3906 and 19531 XML elements; (b) x86-64, for 3905, 19531, 97655 and 488281 XML elements. Axes: execution time per element (µs) against XML elements. Series: Standard No Error, Standard With Error, Deterministic No Error, Deterministic With Error.]

Figure 9: Execution times in µs per XML element for different element counts with our xmlbench benchmark on Arm (a) and x86-64 (b). Lower is better.

Platform   Stack Frames   Implementation   Time        σ
Arm        100            Standard         1351.3µs    74.3µs
Arm        100            Deterministic    41.7µs      0.6µs
Arm        1000           Standard         9.6ms       0.2ms
Arm        1000           Deterministic    0.3ms       0.0ms
Arm        10000          Standard         94.8ms      1.0ms
Arm        10000          Deterministic    2.7ms       0.1ms
x86-64     100            Standard         114.7µs     21.6µs
x86-64     100            Deterministic    1.7µs       0.8µs
x86-64     1000           Standard         2572.0µs    231.2µs
x86-64     1000           Deterministic    29.0µs      0.0µs
x86-64     10000          Standard         8823.8µs    960.0µs
x86-64     10000          Deterministic    64.0µs      6.4µs

Table 4: Execution times for the exception propagation benchmark.

The improvements in execution time are a result of not having to resolve the encoded unwind information and walk the stack twice to locate the catch handler before unwinding.

4.2.4 Re-Throwing. To measure exception re-throwing, we used another benchmark, which called the same function recursively as many times as indicated by Stack Frames, with a local object having a custom destructor within each frame. After Stack Frames calls, an exception is thrown but, unlike before, caught and re-thrown by each function. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 5.

These results show a similar pattern to those for straight propagation. On x86-64 for 10000 stack frames, the current implementation takes almost 3× longer to catch and re-throw than simply propagating, while our implementation is 38× slower than when simply propagating. This factor of slowdown is to be expected given how fast our implementation is when not re-throwing, but is also due to the multiple steps required to move the exception object into and then out of each stack frame. This is clearly a good target for optimisation, as such transfer is unnecessary. Despite this, our implementation yielded a 9× speedup over the standard implementation.

Platform   Stack Frames   Implementation   Time        σ
Arm        100            Standard         4.1ms       0.1ms
Arm        100            Deterministic    0.2ms       0.0ms
Arm        1000           Standard         33.5ms      0.4ms
Arm        1000           Deterministic    1.8ms       0.0ms
Arm        10000          Standard         318.4ms     6.3ms
Arm        10000          Deterministic    18.2ms      0.1ms
x86-64     100            Standard         1071.7µs    363.3µs
x86-64     100            Deterministic    22.7µs      13.8µs
x86-64     1000           Standard         3.8ms       0.1ms
x86-64     1000           Deterministic    0.2ms       0.0ms
x86-64     10000          Standard         22.0ms      0.3ms
x86-64     10000          Deterministic    2.4ms       0.1ms

Table 5: Execution times for the exception re-throwing benchmark.

For 1000 stack frames on x86-64, the difference between execution times is larger, with our implementation executing 19× faster. With 100 frames, the results have a very high standard deviation, but there is a clear order of magnitude between the two times, indicating that the difference in performance remains largely independent of the number of frames.

On Arm, the additional time taken to initialise the exception state in our implementation is evident, as it takes roughly five times longer than in the propagation benchmark. The standard implementation also suffers a performance hit, but one which decreases as the number of frames increases, suggesting that a large part of it is constant. This matches the fact that in the standard implementation the exception object is allocated on the heap, creating a large up-front cost, but one which is amortised across many re-throwings.

In reality, however, re-throwing at every frame is highly unusual behaviour, making both this amortisation and our worsening performance unlikely to be noticed.

4.3 Timing and Memory Determinism

We base our understanding of the language and algorithmic requirements for determinism on Verber and Colnaric [20]. In general, the requirements for deterministic execution fall into two categories: deterministic memory usage and deterministic execution time.

For memory usage, we can further subdivide the requirement into stack usage and heap usage. When no exception occurs, our stack usage is static. A fixed-size allocation for the exception state occurs per noexcept function invocation, throwing destructor invocation, and try block entry.

When an exception is thrown, the object is either allocated with a fixed size determined by the catch block, or with a variable size that can be bounded by the maximum allocation size of all exception objects thrown. The functions called by our implementation have a well-defined order and number known at compile-time, and are not recursive. While pointers are used to store this address, their addresses are fixed within the range of the exception object buffer or the current stack frame.

Our implementation does not require heap allocation. However, it is possible for the program to allocate an arbitrary amount of memory if local object destructors throw exceptions. For example, during stack unwinding, destructors may throw an exception, causing their own local objects to be destroyed, which in turn may throw additional exceptions.

However, since this potential execution flow is known at compile-time, and is akin to normal function calls made directly via their function definition rather than via function pointer, static analysis tools are able to give a bound for the number and type of exceptions thrown, and thus the maximum size of allocation. Furthermore, due to the nature of stack unwinding, recursion is not possible except where introduced explicitly by the developer's code, and thus the number of stack frames is known.

Deterministic execution time generally refers to the ability to calculate the worst-case execution time (WCET), as demanded by most real-time systems. For execution time, unlike with existing implementations, it is trivial to calculate the WCET of all of our operations. Aside from the allocation and initialisation of the local exception state and the checking of the active flag (all of which have deterministic timing), we have three operations.

The first allocates the exception object in the exception buffer. In this case, the allocator is a stack allocator, which merely advances a pointer, except where the developer does not specify a correct maximum bound on exception allocation. The second frees the exception object, which again is trivially a pointer decrement. The final function resolves the base class “type id”s when matching polymorphic exception types against catch handlers. Our implementation's novel approach guarantees a fixed number of iterations through the list of base classes, again known at compile time and derived from the filter of the catch block.

Therefore, should a WCET analysis tool give correct predictions for programs not using exceptions, it would also give such predictions for our implementation, meeting the criteria for determinism.

5 RELATED WORK

Modern exception handling, including the semantics of raising (throwing) and handling (catching) exceptions, exception hierarchies, and control flow requirements, was first introduced in [6].

In [11], inherent overheads, both in terms of execution time and memory usage, in the current C++ implementations of exceptions are identified. A further evaluation of how C++ features impact embedded systems software is presented in [16]. This report singles out exceptions and their prerequisite, run-time type information (RTTI), as “the single most expensive feature an embedded design may consider”.

Lang and Stuart [12] describe stack unwinding as it relates to exceptions in real-time systems, and show how the worst-case execution time of the stack unwinding is unbounded. This has later been confirmed in [9]. For our own definition of determinism, we take insight from [20], which details many of the requirements necessary for languages to be susceptible to time analysis.

A new exception handling approach which is better suited to static analysis is presented in [13], but it requires substantial code rewriting. SetJmp/LongJmp and table-based exception handling are discussed in [3]. The results demonstrate large increases in binary size and an execution time overhead of 10-15%. Gylfason and Hjalmtysson [8] develop a variant of table-based exception handling for use within the Itanium port of the Linux kernel.

Most relevant to the work presented in this paper is [19]. It focuses on propagating exceptions down the stack by duplexing the return value of functions, so as to fit the exception object itself within the register or stack memory used for the return value and, importantly, to re-use the existing function return mechanism to perform stack unwinding. This approach hinges on two complex changes to C++'s function calling convention, though. This kind of change would be significant, breaking compatibility with C code and all existing C++ libraries. Instead, our novel exception implementation maintains the existing exception model by finding a new way to store and match exceptions at run-time, with little to no additional performance impact when exceptions are not thrown.

6 SUMMARY & CONCLUSIONS

In this paper we have developed a novel implementation of C++ exceptions which better suits the requirements of embedded systems, while still upholding the C++ core design tenets. Our implementation shows binary size decreases of up to 82.3% over existing implementations, performance improvements of up to 325% when handling exceptions, minimal execution time overhead on x86-64, and none on the Arm platform. We have shown that our implementation achieves memory and execution time determinism, making it significantly better suited to calculating worst-case execution times, a major hurdle in the adoption of exceptions.

Our future work will investigate further code size improvements resulting from code optimizations and shrinking of the exception state object, as well as reducing execution time by allocating the active flag in the exception state to a register.

Low-Cost Deterministic C++ Exceptions for Embedded Systems. CC '19, February 16–17, 2019, Washington, DC, USA

REFERENCES

[1] David W. Binkley. 1997. C++ in Safety Critical Systems. Ann. Softw. Eng. 4, 1-4 (Jan. 1997), 223–234. http://dl.acm.org/citation.cfm?id=590565.590588

[2] Beman Dawes, David Abrahams, and R. Rivera. 2014. Boost C++ libraries. http://www.boost.org/doc/libs/1_55_0/libs/libraries.htm

[3] Christophe de Dinechin. 2000. C++ Exception Handling for IA64. In Proceedings of the First Workshop on Industrial Experiences with Systems Software, WIESS 2000, October 22, 2000, San Diego, CA, USA, Dejan S. Milojicic (Ed.). USENIX, 67–76. http://www.usenix.org/events/wiess2000/dinechin.html

[4] Bruce Eckel, Chuck D. Allison, and Chuck Allison. 2003. Thinking in C++, Vol. 2 (2 ed.). Pearson Education.

[5] C++ exception handling 2012. ARM Compiler toolchain Compiler Reference.

[6] John B. Goodenough. 1975. Exception Handling: Issues and a Proposed Notation. Commun. ACM 18, 12 (Dec. 1975), 683–696. https://doi.org/10.1145/361227.361230

[7] Google. 2018. Google C++ Style Guide. https://google.github.io/styleguide/cppguide.html

[8] Halldor Isak Gylfason and Gisli Hjalmtysson. 2004. Exceptional Kernel–Using C++ exceptions in the Linux kernel.

[9] Wolfgang A. Halang and Matjaž Colnarič. 2002. Dealing with exceptions in safety-related embedded systems. IFAC Proceedings Volumes 35, 1 (2002), 477–482.

[10] Pieter Hintjens. 2014. ZeroMQ: The Guide. http://zeromq.org

[11] ISO/IEC JTC1 SC22 WG21. 2006. ISO/IEC TR 18015: Technical Report on C++ Performance. Technical Report ISO/IEC TR 18015:2006. ISO/IEC. https://www.iso.org/standard/43351.html

[12] Jun Lang and David B. Stewart. 1998. A Study of the Applicability of Existing Exception-handling Techniques to Component-based Real-time Software Technology. ACM Trans. Program. Lang. Syst. 20, 2 (March 1998), 274–301. https://doi.org/10.1145/276393.276395

[13] Roy Levin. 1977. Program Structures for Exceptional Condition Handling. Technical Report. Carnegie-Mellon Univ. Pittsburgh, PA, Dept. of Computer Science.

[14] Mark Maimone. 2014. C++ On Mars. In Proceedings of the C++ Conference (CppCon 2014).

[15] Prakash Prabhu, Naoto Maeda, Gogul Balakrishnan, Franjo Ivančić, and Aarti Gupta. 2011. Interprocedural exception analysis for C++. In European Conference on Object-Oriented Programming. Springer, 583–608.

[16] César A. Quiroz. 1998. Using C++ efficiently in embedded applications. In Proceedings of the Embedded Systems Conference.

[17] Richard Smith et al. 2017. Working draft, standard for programming language C++. Technical Report.

[18] Alexander D. Stoyenko. 2012. Constructing predictable real time systems. Vol. 146. Springer Science & Business Media.

[19] Herb Sutter. 2018. P0709: Zero-overhead deterministic exceptions: Throwing values. Technical Report R0. SG14.

[20] Domen Verber and Matjaž Colnarič. 1996. Programming and time analysis of hard real-time applications. IFAC Proceedings Volumes 29, 6 (1996), 79–84.

  • Abstract
  • 1 Introduction
    • 1.1 Motivating Example
    • 1.2 Contributions
  • 2 Deterministic C++ Exceptions
    • 2.1 Throws Exception Specifier
    • 2.2 Throwing Exceptions and Exception State
    • 2.3 Handling Exceptions
    • 2.4 Re-Throwing
    • 2.5 Throwing Destructors
    • 2.6 Exception Object Type Identification
    • 2.7 Exception Object Allocation
  • 3 Standards Compliance
    • 3.1 ABI Compliance
    • 3.2 C++ Standard Compliance
  • 4 Evaluation
    • 4.1 Experimental Set-up
    • 4.2 Results
    • 4.3 Timing and Memory Determinism
  • 5 Related Work
  • 6 Summary & Conclusions
  • References

exception was thrown or not. There is perhaps a good case to be made for its necessity, however. By indicating that a function can throw via throws, the developer is encoding into the function signature the fact that an error may occur during its execution. Thus, regardless of performance requirements, some test must be made for that error in a correct program.
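To make the cost of this test concrete, the following is a minimal hand-written emulation in standard C++ of what a call to a throws function might be lowered to. It is a sketch under stated assumptions, not the compiler's actual output: `ExceptionState`, `parse_digit` and `sum_digits` are hypothetical names, and the exception object is reduced to an integer code.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the exception state object.
struct ExceptionState {
    bool active = false;  // set while an exception is propagating
    int code = 0;         // stands in for the exception object
};

// Emulates `int parse_digit(char c) throws`: the state is an extra parameter.
int parse_digit(char c, ExceptionState& exc) {
    if (c < '0' || c > '9') {
        exc.code = c;       // "throw": record the error...
        exc.active = true;  // ...and mark the exception as active
        return 0;           // unwind through the normal return path
    }
    return c - '0';
}

// Emulates an exception-neutral `throws` caller.
int sum_digits(const char* s, ExceptionState& exc) {
    assert(s != nullptr);
    int sum = 0;
    for (; *s != '\0'; ++s) {
        int d = parse_digit(*s, exc);
        if (exc.active) return 0;  // the test made after every throwing call
        sum += d;
    }
    return sum;
}
```

The single branch on `exc.active` after the call is the overhead discussed above; locals of `sum_digits` are destroyed by the ordinary return, so no separate unwinder is involved.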

2.3 Handling Exceptions

Once an exception has been thrown, and if it does not result in a call to std::terminate, control is transferred to the first available catch block.

Before each catch block executes, it first compares for value equality the “type id” of the exception type it handles and the type marked in the type field of the exception state. If they are equal, or if the type the catch block handles is a base type of the type in the type field, then the catch block is entered. Otherwise, execution jumps to the next catch block and repeats this process until the catch block is a catch-all or all blocks have been exhausted. If no blocks remain, the exception continues propagation as described previously.

Once a catch block is executed, the active flag of the exception state is cleared, memory is allocated on the stack for the exception object, and the move or copy constructor is invoked to transfer ownership of the exception object into the current block.

If an exception were to occur during the move or copy operation, std::terminate is called to satisfy the C++ specification, which states: “If the exception handling mechanism, after completing evaluation of the expression to be thrown but before the exception is caught, calls a function that exits via an exception, std::terminate is called (15.5.1).”

Catch filters can take two forms: (1) by value, where the catch variable represents an object allocated within the catch block, or (2) by reference, where the catch variable is a reference to the exception object allocated elsewhere.

In the latter case, an additional unnamed variable is introduced in which to store the exception object, and its allocation and initialisation are performed as described above, using the equivalent non-reference type. The catch variable reference is then bound to this unnamed variable. In this way, even when the exception object is caught by reference, it is still owned by the catch block.

Following a successful move/copy of the exception object, the destructor is called on the original exception object instance within the buffer. Naturally, as destructors are necessarily noexcept, were the destructor to exit with an exception, std::terminate would again be called.
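The ownership transfer described above can be sketched in ordinary C++ with placement new. This is an illustration only: the buffer name, its 64-byte size and the helper functions `emplace_in_buffer` and `take_ownership` are assumptions made for the example and do not come from the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <utility>

// Hypothetical exception object buffer; size and alignment are assumptions.
alignas(std::max_align_t) unsigned char exception_buffer[64];

// "Throw": construct the exception object inside the buffer.
template <typename T>
T* emplace_in_buffer(T value) {
    static_assert(sizeof(T) <= sizeof(exception_buffer), "object too large");
    return new (exception_buffer) T(std::move(value));
}

// Catch entry: move the object into storage owned by the catch block,
// then destroy the original instance that still lives in the buffer.
template <typename T>
T take_ownership(T* in_buffer) {
    assert(in_buffer != nullptr);
    T owned(std::move(*in_buffer));  // move (or copy) constructor
    in_buffer->~T();                 // destructor of the original instance
    return owned;
}
```

After `take_ownership` returns, the buffer no longer holds a live object, which is what allows it to be released before the catch block body runs.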

With the catch variable successfully initialised, the catch block then commences execution. Once complete, execution moves to a point past the final catch block in the current scope, and the catch variable is destroyed as usual.

2.4 Re-Throwing

One of the issues with maintaining and passing references to a single instance of exception state via the exception state parameter is that more than one exception may be active at a time and, crucially, the initial exception may need to be re-thrown following the intermediary exception.

try { throw 1; }
catch (int i) {
    try { throw 2; }
    catch (...) { }
    throw;
}

Figure 6: Re-throwing following a nested exception. The first exception's state must be persisted, and not overwritten by the second exception, to allow the re-throw.

Figure 6 gives an example. The first exception is thrown on line 1, caught on line 2, then a second exception is thrown on line 3 and caught by the catch-all on line 4. The first exception is then re-thrown on line 5, which requires the first exception's state to have persisted across the second, inner throw/catch.

To allow for this, a local copy of the exception state is allocated when entering a try block, and if an exception occurs within the try block, the exception modifies the local copy. If the same exception is not handled by the try block's corresponding catch block or blocks, the local state is copied back to either the outer state or the function's exception state, as passed in the parameter, for exception propagation.

If the inner exception is handled correctly, only its local exception state is modified; thus the outer exception state is maintained and can be used, for example, to successfully re-throw the outer exception. As we only initialise the active field of the local copy, its cost is a simple stack allocation plus zero-initialisation of a boolean flag.
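A hand-lowered version of Figure 6 may help to show why the per-try-block copy is sufficient. The sketch below is an emulation in standard C++ under simplifying assumptions: `ExceptionState`, `raise` and `figure6` are hypothetical names, and the exception object is reduced to an int stored in the state itself.

```cpp
#include <cassert>

// Hypothetical, simplified exception state: an active flag plus an int payload.
struct ExceptionState {
    bool active = false;
    int value = 0;
};

// Emulates "throw v" against whichever state is current.
void raise(ExceptionState& st, int v) {
    st.value = v;
    st.active = true;
}

// Hand-lowered Figure 6. `outer` plays the role of the state passed in
// through the function's exception state parameter.
void figure6(ExceptionState& outer) {
    assert(!outer.active);
    ExceptionState first;       // local copy for the outer try block
    raise(first, 1);            // throw 1
    if (first.active) {         // catch (int i)
        first.active = false;
        int i = first.value;

        ExceptionState second;  // local copy for the inner try block
        raise(second, 2);       // throw 2
        if (second.active) {    // catch (...)
            second.active = false;
        }

        // throw; the first exception was never overwritten by the second
        raise(outer, i);
    }
}
```

Because the inner throw only ever touches `second`, the value 1 is still available when the bare re-throw is reached, and the caller observes the first exception.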

C++ already has a mechanism for obtaining and re-throwing exceptions by making and passing around an explicit copy of (or reference to) the exception state via std::current_exception(), which, when called, returns an instance of std::exception_ptr. To provide similar functionality in our implementation, we propose an equivalent mechanism, namely std::get_exception_obj. This function returns a reference to a std::exception_obj instance stored on the stack.

What makes std::exception_obj novel is that its type is templated such that developers can specify a custom allocator, giving them control over where the exception object is to be allocated when being transferred to a new instance. However, we restrict the use of std::get_exception_obj such that it can only be called from directly within a catch block. This makes sense from a value-oriented perspective: functions can only inspect and access their own variables, and not their callers', unless shared explicitly via parameters. Since in our implementation the exception object is directly owned by the catch block, functions called from within that catch block would not normally have access to the object.

Re-throwing in our implementation is achieved by calling the std::rethrow function and passing a std::exception_obj instance. This is intuitive, arguably more so than the bare throw expression, and crucially makes clear the cost of re-throwing, as std::exception_obj instances must be explicitly moved around.

Figure 7 shows a complete example. On lines 1 and 2, a function is declared taking a std::exception_obj instance using the default allocator. This object is move-constructed from the corresponding std::exception_obj instance representing the exception object within the catch block on line 12, allocating the exception object as instructed. The object is then moved and stored in a global variable, result.exception, on line 5, which is then re-thrown on line 15.

CC '19, February 16–17, 2019, Washington, DC, USA. James Renwick, Tom Spink, and Björn Franke

 1 void store_exception(
 2     std::exception_obj<std::allocator<const char>> obj)
 3     throws {
 4     // Move the exception object into a global
 5     result.exception = std::move(obj);
 6 }
 7 // ---------------------------------
 8 try {
 9     ...
10 } catch (...) {
11     // Move the exception object out of the catch block
12     store_exception(std::move(std::get_exception_obj()));
13 }
14 // The exception can be re-thrown from a global
15 std::rethrow(std::move(result.exception));

Figure 7: Example of re-throwing the exception object using our proposed std::rethrow function and std::exception_obj type.

To enable re-throwing, the std::exception_obj instance also contains the exception state object, which is already populated with the details of the exception. Thus all that is required to re-throw that exception is to copy those fields to the current exception state, restore the exception object to the buffer, set the active flag, and then attempt to propagate the exception as if throwing normally.

2.5 Throwing Destructors

Although generally advised against by the C++ specification, destructors can throw and can be explicitly marked as throws. Were such destructors to be called during stack unwinding, they would incorrectly share the same exception state as the uncaught exception. In particular, the active flag would be set, leading to incorrect execution flow when function calls from within the destructor performed their exception check.

To avoid this, we modify the compiler such that when throwing destructors are entered, they allocate and initialise their own local exception state, as if they were a noexcept function. This local state will be used within the destructor. If at any point an exception is unhandled within the destructor such that it would exit with that exception, it first checks the exception state parameter to see if it is marked as active. If so, two unhandled exceptions are in progress, and thus the program must be terminated with std::terminate per the C++ standard (15.5.1, as quoted above).
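The decision made when a throwing destructor is about to exit can be written down as a small pure function. This is only a sketch of the rule stated above: the names `ExceptionState`, `DtorExit` and `on_destructor_exit` are hypothetical, and the real check is emitted inline by the compiler rather than called as a function.

```cpp
#include <cassert>

// Hypothetical, reduced exception state: only the active flag is modelled.
struct ExceptionState {
    bool active = false;
};

enum class DtorExit { Normal, Propagate, Terminate };

// Decision taken when a throwing destructor is about to return.
// `incoming` is the state passed in by the caller (possibly mid-unwind);
// `local` is the state the destructor allocated for its own body.
DtorExit on_destructor_exit(const ExceptionState& incoming,
                            const ExceptionState& local) {
    if (!local.active) return DtorExit::Normal;       // nothing escaped
    if (incoming.active) return DtorExit::Terminate;  // two exceptions in flight
    return DtorExit::Propagate;                       // hand ours to the caller
}
```

Only the `Terminate` case corresponds to the call to std::terminate; in the `Propagate` case the local state would be copied into the caller's state before returning.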

As throwing destructors may be called when an exception is not active, as part of normal logic, these additional measures constitute overhead in the case where exceptions are not thrown. However, throwing destructors are rare and are conceptually odd: exceptions exist to indicate the failure of an operation, but in C++ destruction always succeeds.

2.6 Exception Object Type Identification

One of the disadvantages of current exception implementations is their use of RTTI in matching the types of thrown exception objects to a corresponding catch handler. Current approaches make use of complete RTTI objects in exception binaries, emitting a full std::type_info instance for each type, which includes a unique string identifying it. Either the address of these std::type_info instances or the string identifier is used to compare against the corresponding std::type_info instance for the catch handler. Inheritance adds an additional layer of complexity, as each candidate type must also enumerate and compare its public base classes when matching against the catch block type.

Seeking to reduce binary sizes, we introduce an optimisation that both reduces the space required to represent types and addresses concerns with execution time determinism. This solution is predicated upon the fact that each type requires solely some unique identifier, and not the full std::type_info instance.

For each unique type of exception object thrown, the compiler will automatically emit a char-sized variable with minimum alignment, whose identifier is composed of the name of the type prefixed with __typeid_for_. Pointer types will have an integer prefix corresponding to the number of levels of indirection. Type qualifiers such as const and volatile are ignored.

This “type id” variable will use its address to represent a unique type identifier with which types might be compared for equality. In a throw expression, the type field of the exception state will be assigned the address of the “type id” matching the thrown type.

To handle polymorphic types, for each unique type of exception thrown an additional array of addresses will be emitted, containing the addresses of the “type id” variables for the base types. In similar fashion to the “type id” variables, its identifier is prefixed with __typeid_bases_for_, followed by the name of the type. For those types which do not have base classes, a single shared array with the identifier __typeid_empty_bases will be emitted, containing only a null entry.

Testing whether a given type is a base class of another type is then a matter of getting the pointer to the base class array (which is known at compile-time) and iterating through its fixed-size list of bases. When an exception is thrown, its corresponding base class array will be assigned to the base_types field of the exception state.

This same iteration of base classes applies when throwing and catching pointer types. While the type field will contain a “type id” identifying the pointer, the base_types field will correspond to the base type array of the pointee type.

In both the “type id” and base-array cases, to ensure both emission and uniqueness, these variables will be marked as weak symbols, an indication given to the linker that only one instance of each variable should be preserved in the final binary.
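As an illustration of the scheme, the variables the compiler would emit can be written out by hand for a two-class hierarchy. This is a sketch under assumptions: the leading double underscores are dropped because such identifiers are reserved in user code, the types `Base` and `Derived` and the function `catch_matches` are hypothetical, and the weak-symbol marking is omitted.

```cpp
#include <cassert>

// Hand-written stand-ins for the variables the compiler would emit.
struct Base {};
struct Derived : Base {};

char typeid_for_Base;     // only the address matters, never the value
char typeid_for_Derived;
char typeid_for_int;

// Null-terminated arrays holding the "type id" addresses of the base types.
const char* typeid_empty_bases[] = {nullptr};
const char* typeid_bases_for_Derived[] = {&typeid_for_Base, nullptr};

// Subset of the exception state that is relevant for matching.
struct ExceptionState {
    const void* type;
    const char** base_types;
};

// Catch-block test: exact match, or the handled type is among the bases.
bool catch_matches(const ExceptionState& st, const void* handled) {
    assert(st.base_types != nullptr);
    if (st.type == handled) return true;
    for (const char** b = st.base_types; *b != nullptr; ++b) {
        if (static_cast<const void*>(*b) == handled) return true;
    }
    return false;
}
```

Matching costs one pointer comparison plus a walk over a list whose length is fixed at compile time, which is what makes the worst case easy to bound. The loop has the same shape as the iteration shown in Figure 8.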

bool type_is_base(void* type, const char** bases) noexcept {
    // Walk the null-terminated list of base "type id" addresses.
    for (; bases[0] != nullptr; bases++) {
        if (static_cast<const void*>(bases[0]) == type)
            return true;
    }
    return false;
}

Figure 8: Example iteration over base entries; the candidate base is passed to the type parameter, and the array of base “type id”s to the bases parameter.

2.7 Exception Object Allocation

When throwing an exception, the exception object must be allocated such that it is constructed in the scope where it was thrown, but is also accessible to the scope where it is caught. It is not enough to simply allocate the object on the stack, despite the fact that the exception will cause the stack to be unwound, as local object destructors will run as the stack is unwound, potentially overwriting the exception object. Allocating the exception object on the heap is a solution, but the allocation may result in system calls which (a) potentially have unbounded execution time and (b) may fail if memory is exhausted. As our target is to support deterministic exceptions, standard heap allocation is not an option.

We propose a statically sized buffer, using a simple constant-time stack allocator, for our exception object, with a heap-backed fall-back upon buffer exhaustion for those applications which cannot compute, or do not require, run-time determinism. The size of this buffer will be given a default value by the ABI in use, but can optionally be overridden by the application at link-time via a linker command-line parameter, to optimise or prevent any and all heap allocation.
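One way such an allocator could look is sketched below: a bump allocator over a static buffer with a `malloc` fall-back. All names (`kBufferSize`, `g_buffer`, `exception_alloc`, `exception_free`) and the 256-byte size are assumptions for illustration; the text does not specify the default size or the allocator's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>

// Assumed default size; in the proposal the ABI or the linker would set it.
constexpr std::size_t kBufferSize = 256;
alignas(std::max_align_t) unsigned char g_buffer[kBufferSize];
std::size_t g_top = 0;  // offset of the first free byte

// Rounds n up to the strictest fundamental alignment.
constexpr std::size_t round_up(std::size_t n) {
    return (n + alignof(std::max_align_t) - 1) / alignof(std::max_align_t)
           * alignof(std::max_align_t);
}

bool in_buffer(const void* p) {
    std::less<const unsigned char*> lt;  // total order over pointers
    const unsigned char* c = static_cast<const unsigned char*>(p);
    return !lt(c, g_buffer) && lt(c, g_buffer + kBufferSize);
}

// Constant-time allocation; falls back to the heap when the buffer is full.
void* exception_alloc(std::size_t size) {
    std::size_t need = round_up(size);
    if (need <= kBufferSize - g_top) {
        void* p = g_buffer + g_top;
        g_top += need;
        return p;
    }
    return std::malloc(size);  // non-deterministic fall-back path
}

// Frees in LIFO order, matching how nested exceptions are released.
void exception_free(void* p, std::size_t size) {
    if (in_buffer(p)) {
        g_top -= round_up(size);
        assert(g_buffer + g_top == p);  // must be the most recent allocation
    } else {
        std::free(p);
    }
}
```

An application that needs determinism would size the buffer so that the fall-back branch is never taken; the buffer path itself is a comparison and an addition.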

In the worst-case scenario, our solution matches the performance of current exception implementations, as at least that of g++ uses a similar buffer. However, as we free our exception objects as early as possible, we expect usage of this buffer to be less than in other implementations, and with our novel ability to customise its size (and in combination with earlier improvements), we uniquely offer deterministic allocation.

3 STANDARDS COMPLIANCE

In this section we consider the compliance of our exception implementation with the ABI specification and the C++ standards document.

3.1 ABI Compliance

Existing ABI specifications mandate the use of tables and manual stack unwinding. However, since using the existing function return mechanism is a superior method of achieving stack unwinding, our implementation deviates entirely from their design. Therefore, we will not evaluate our solution against any existing ABI specifications. A side effect of this is that our own ABI is significantly easier to implement, as the stack unwinding and exception handling code is generated in a platform-independent way by the compiler.

3.2 C++ Standard Compliance

In general, our implementation conforms to the C++ standard's requirements for exceptions. However, we have observed four clauses where there are deviations.

Clause 18.1.4 describes the interaction between the exception object and std::exception_ptr, which is designed for exception objects allocated on the heap. This is therefore not suited to our stack-based implementation; however, our replacement provides similar functionality.

Clause 18.2.2 requires local temporaries, such as return values, to have their destructors called when unwinding. However, clang's existing exception implementation does not conform to this part of the specification, and as a result our implementation also does not conform. This is a bug in the clang compiler (on which our implementation is based), and if this is repaired, our implementation will also be repaired.

Clauses 18.2.3 and 18.2.4 require the destruction of base class instances and sub-objects already constructed when an exception occurs during a constructor. This feature is not yet implemented and will be addressed in future work. It does not require any conceptual changes: given its simplicity and similarity to local object destruction, we foresee no issues in its implementation.

4 EVALUATION

We have implemented our novel scheme in the clang compiler and present results showing the performance of our implementation compared to the existing implementation.

4.1 Experimental Set-up

For our experiments we used two different machines with two different platforms:

• x86-64 Machine: Desktop computer running Ubuntu 18.04, Intel 8700K 3.7-4.7 GHz, 32 GB 3200 MHz DRAM, Samsung 980 Evo NVMe storage.

• Arm Machine: Raspberry Pi 1 Model B running Raspbian Stretch Lite, ARM1176JZFS ARMv6 700 MHz, 512 MB DRAM, 16 GB SD storage.

All binaries were built with identical flags, other than those determining the type of exceptions to use. Run-time type information is disabled (-fno-rtti), the symbol table is removed (-s), and -O3 optimisation with link-time optimisation (-flto) is employed. Clang-7.0 was used as the compiler, along with libstdc++ and libgcc on Arm, and libc++ and libunwind on x86.

4.1.1 Benchmark Application. Lacking an obvious existing realistic benchmark with which to test exceptions, we instead developed our own microbenchmark. xmlbench is a simple XML parser targeting ease of use and functionality as a real XML parser. It organically includes a good mixture of functions that can and cannot throw exceptions, and also those which do throw exceptions and those which are “exception neutral”, i.e. allowing exception propagation. By supplying different XML files as input to the parser, we can precisely control the rate of exceptions. Errors in input XML syntax and external conditions (such as the input file missing or an out-of-memory condition) can be influenced in a realistic way. This approach to benchmarking is similar to [15], who also use an XML parser as a representative application.


Platform   Implementation   Size
Arm        Standard         59188 bytes
Arm        Deterministic    22068 bytes
x86-64     Standard         105312 bytes
x86-64     Deterministic    18600 bytes

Table 2: Final binary sizes for the xmlbench benchmark with the standard and deterministic exception implementations.

4.2 Results

The following sections detail the results of our observations on various facets of the exception handling infrastructure.

4.2.1 Unwind Library Overhead. Statically-linked binaries participate in whole-program optimisation, and as such have the exception handling and unwind functions removed when using our implementation. This allows for a direct comparison against binaries that use the standard implementation.

Table 2 shows the sizes of the binary built with both the standard and the deterministic exception implementations. The results for both x86-64 and Arm show substantial decreases in binary size when using our deterministic implementation. This measurement indicates how large the unwinding code is. Arm's unwinding code is smaller, but by removing it we see a decrease in binary size of 36.3 Kilobytes, a 62.7% reduction for our program. x86-64 has a much larger unwind library, with our implementation saving 84.7 Kilobytes, an 82.3% reduction in size for identical functionality.
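The quoted savings follow directly from the byte counts in Table 2, taking one Kilobyte as 1024 bytes. The short check below reproduces the arithmetic; the names `Saving`, `saving`, `matches_quoted` and `check_table2` are introduced here purely for illustration.

```cpp
#include <cassert>
#include <cmath>

// Size reduction between two binaries, from the byte counts in Table 2.
struct Saving {
    double kilobytes;  // saved bytes / 1024
    double percent;    // saved bytes relative to the standard binary
};

Saving saving(double standard_bytes, double deterministic_bytes) {
    double saved = standard_bytes - deterministic_bytes;
    return {saved / 1024.0, 100.0 * saved / standard_bytes};
}

// True when `value` agrees with `quoted` once rounded to one decimal place.
bool matches_quoted(double value, double quoted) {
    return std::fabs(std::round(value * 10.0) / 10.0 - quoted) < 1e-9;
}

// Re-derives the four figures quoted in the text from Table 2.
void check_table2() {
    Saving arm = saving(59188, 22068);   // 37120 bytes saved
    assert(matches_quoted(arm.kilobytes, 36.3));
    assert(matches_quoted(arm.percent, 62.7));
    Saving x86 = saving(105312, 18600);  // 86712 bytes saved
    assert(matches_quoted(x86.kilobytes, 84.7));
    assert(matches_quoted(x86.percent, 82.3));
}
```

For Arm the exact saving is 37120 bytes, or 36.25 Kilobytes, which the text rounds to 36.3.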

On both platforms, the code section (.text) grows due to the inclusion of additional instructions for checking the exception state. On x86-64, the removal of the unwind tables compensates for this growth, with a reduction of 0.3%. However, on Arm, missed optimization opportunities by the compiler lead to a net increase of roughly 8%.

4.2.2 Application Benchmark Performance. We generated two sets of XML files, one with syntax errors and one without, and parsed them with xmlbench. The syntax error is contained within the deepest element, resulting in an exception thrown with the stack at its largest number of function calls. For each run we measured the total run-time of the xmlbench program, including the start-up time.

Figure 9 shows the average execution time in microseconds per XML element parsed by our benchmark for both the Arm (Figure 9a) and x86-64 (Figure 9b) platforms. On the Arm platform, the results show there is little difference between execution times, except for the standard exception implementation generally performing the worst of the four when faced with an exception. As was expected, the standard implementation performs better when no exception is raised with only 156 elements, as our implementation must check for active exceptions. This difference disappears as more elements are processed.

On the x86-64 platform, our deterministic implementation takes slightly longer per element than the standard implementation for larger numbers of nodes. This is primarily due to the overhead of checking whether an exception has occurred. As is expected, the difference in time for our implementation depending on whether an exception was thrown or not is very small, particularly in comparison to the same for the standard implementation, which at its peak has a 2.3% overhead.

Elems    Throw   Time (σ), Standard Impl.   Time (σ), Determ. Impl.
Arm
156              42.7 ms (0.6 ms)           41.7 ms (0.6 ms)
156              43.7 ms (0.6 ms)           43.0 ms (0.0 ms)
3906             158.7 ms (2.1 ms)          162.7 ms (1.5 ms)
3906             160.7 ms (1.5 ms)          162.0 ms (2.0 ms)
x86-64
3905             3.3 ms (0.6 ms)            3.3 ms (0.6 ms)
3905             3.3 ms (0.6 ms)            3.3 ms (0.6 ms)
97655            68.0 ms (1.0 ms)           70.7 ms (0.6 ms)
97655            69.0 ms (1.0 ms)           70.3 ms (0.6 ms)

Table 3: Overall execution time summary for xmlbench on the Arm and x86-64 platforms.

The benchmark clearly shows how throwing a single exception increases execution time for the standard exception implementation, given the lengthy unwind procedure. However, with our implementation, the improvements to exception handling speed are not quite enough to compensate for the overhead in the absence of exceptions. With more than a single exception, and particularly given the results for 488281 elements, where the trend suggests that our implementation scales better than the existing one, it should be enough for our solution to out-perform the current one.

As shown in Table 3, our implementation creates a small additional execution time overhead in comparison to the standard implementation; however, the difference is negligible and within the standard deviation.

4.2.3 Exception Propagation. To test exception propagation, we used a different benchmark, which simply called the same function recursively as many times as indicated by Stack Frames, with a local object having a simple custom destructor within each frame. After Stack Frames calls, an exception is thrown and caught by reference from the first invocation. The benchmark was compiled with the same flags as xmlbench.
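The shape of such a benchmark can be sketched in portable C++ as follows. This is a reconstruction from the description, not the benchmark source: the names `Local`, `recurse` and `time_propagation` are hypothetical, it uses whatever exception mechanism the compiler provides, and it measures wall-clock time with `std::chrono::steady_clock` rather than CPU time.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>

// Counts destructor calls so the unwinding work can be observed.
static int g_destroyed = 0;

struct Local {
    ~Local() { ++g_destroyed; }  // simple custom destructor in every frame
};

// Recurses until `frames` frames are live, then throws from the deepest one.
void recurse(int frames) {
    Local local;
    if (frames <= 1) throw std::runtime_error("deepest frame");
    recurse(frames - 1);
}

// Returns the time between the first call and arrival in the catch handler.
std::chrono::nanoseconds time_propagation(int frames) {
    using clock = std::chrono::steady_clock;
    const clock::time_point start = clock::now();
    try {
        recurse(frames);
    } catch (const std::exception&) {  // caught by reference at the first call
        return std::chrono::duration_cast<std::chrono::nanoseconds>(
            clock::now() - start);
    }
    return std::chrono::nanoseconds(-1);  // not reached: recurse always throws
}
```

Calling `time_propagation(100)`, `time_propagation(1000)` and `time_propagation(10000)` repeatedly and averaging would give the same configurations as the Stack Frames column; the measured interval includes the recursive descent as well as the unwind.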

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 4.

The results show a significant difference in execution time between the two exception implementations. Our implementation is on average 98× faster at performing exception propagation on x86-64. The factor of improvement increases from 68× to 138× as the number of stack frames increases, indicating that the performance penalty seen in the current implementation increases as the number of frames increases. Our implementation performs best on x86-64 with deep call stacks.

On Arm, the improvement in execution time remains almost constant at 32× to 33× for 100 and 1000 frames, with a 35× speedup for 10000 frames. This suggests that our implementation will consistently offer improved performance, independent of the number of frames.


[Figure 9: two plots, (a) Arm and (b) x86-64. x-axis: XML Elements (156, 781, 3906, 19531 for Arm; 3905, 19531, 97655, 488281 for x86-64). y-axis: Exec Time per element (µs). Series: Standard No Error, Standard With Error, Deterministic No Error, Deterministic With Error.]

Figure 9: Execution times in µs per XML element for different element counts with our xmlbench benchmark on Arm (a) and x86-64 (b). Lower is better.

Stack Frames   Implementation   Time        σ
Arm
100            Standard         1351.3 µs   74.3 µs
100            Deterministic    41.7 µs     0.6 µs
1000           Standard         9.6 ms      0.2 ms
1000           Deterministic    0.3 ms      0.0 ms
10000          Standard         94.8 ms     1.0 ms
10000          Deterministic    2.7 ms      0.1 ms
x86-64
100            Standard         114.7 µs    21.6 µs
100            Deterministic    1.7 µs      0.8 µs
1000           Standard         2572.0 µs   231.2 µs
1000           Deterministic    29.0 µs     0.0 µs
10000          Standard         8823.8 µs   960.0 µs
10000          Deterministic    64.0 µs     6.4 µs

Table 4: Execution times for the exception propagation benchmark.

The improvements in execution time are a result of not having to resolve the encoded unwind information and walk the stack twice to locate the catch handler before unwinding.

4.2.4 Re-Throwing. To measure exception re-throwing, we used another benchmark, which called the same function recursively as many times as indicated by Stack Frames, with a local object having a custom destructor within each frame. After Stack Frames calls, an exception is thrown but, unlike before, caught and re-thrown by each function. The benchmark was compiled with the same flags as xmlbench.
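A portable approximation of this benchmark is sketched below. It is an assumption-laden reconstruction: it uses the bare `throw;` expression of standard C++ rather than the proposed std::rethrow, the names `recurse_rethrow` and `run_rethrow` are hypothetical, and the deepest frame only throws, so an N-frame run performs N-1 re-throws.

```cpp
#include <cassert>
#include <stdexcept>

static int g_rethrows = 0;        // number of catch-and-re-throw steps
static int g_frames_unwound = 0;  // number of local objects destroyed

struct Local {
    ~Local() { ++g_frames_unwound; }  // custom destructor in every frame
};

// Every frame above the deepest one catches the exception and re-throws it.
void recurse_rethrow(int frames) {
    Local local;
    if (frames <= 1) throw std::runtime_error("deepest frame");
    try {
        recurse_rethrow(frames - 1);
    } catch (...) {
        ++g_rethrows;
        throw;  // bare re-throw of the exception currently being handled
    }
}

// Returns true when the exception arrives at the outermost handler.
bool run_rethrow(int frames) {
    try {
        recurse_rethrow(frames);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Timing `run_rethrow` in the same way as the propagation benchmark would give figures comparable in structure to Table 5, though not in absolute value.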

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 5.

These results show a similar pattern to those for straight propagation. On x86-64, for 10000 stack frames, the current implementation takes almost 3× longer to catch and re-throw than simply propagating, while our implementation is 38× slower than when simply propagating. This factor of slowdown is to be expected given how fast our implementation is when not re-throwing, but is also due to the multiple steps required to move the exception object into and then out of each stack frame. This is clearly a good target for optimisation, as such transfer is unnecessary. Despite this, our implementation yielded a 9× speedup over the standard implementation.

Stack Frames   Implementation   Time        σ
Arm
100            Standard         4.1 ms      0.1 ms
100            Deterministic    0.2 ms      0.0 ms
1000           Standard         33.5 ms     0.4 ms
1000           Deterministic    1.8 ms      0.0 ms
10000          Standard         318.4 ms    6.3 ms
10000          Deterministic    18.2 ms     0.1 ms
x86-64
100            Standard         1071.7 µs   363.3 µs
100            Deterministic    22.7 µs     13.8 µs
1000           Standard         3.8 ms      0.1 ms
1000           Deterministic    0.2 ms      0.0 ms
10000          Standard         22.0 ms     0.3 ms
10000          Deterministic    2.4 ms      0.1 ms

Table 5: Execution times for the exception re-throwing benchmark.

For 1000 stack frames on x86-64, the difference between execution times is larger, with our implementation executing 19× faster. With 100 frames, the results have a very high standard deviation, but there is a clear order of magnitude between the two times, indicating that the difference in performance remains largely independent of the number of frames.

On Arm, the additional time taken to initialise the exception state in our implementation is evident, as it takes roughly five times longer than in the propagation benchmark. The standard implementation also suffers a performance hit, but one which decreases as the number of frames increases, suggesting that a large part of it is constant. This matches the fact that in the standard implementation the exception object is allocated on the heap, creating a large up-front cost, but one which is amortised across many re-throwings.

In reality however re-throwing at every frame is highly un-usual behaviour making both this amortisation and our worseningperformance unlikely to be noticed

4.3 Timing and Memory Determinism

We base our understanding of the language and algorithmic requirements for determinism on Verber and Colnarič [20]. In general, the requirements for deterministic execution fall into two categories: deterministic memory usage and deterministic execution time.

For memory usage, we can further subdivide the requirement into stack usage and heap usage. When no exception occurs, our stack usage is static. A fixed-size allocation for the exception state occurs per noexcept function invocation, throwing destructor invocation, and try block entry.

When an exception is thrown, the object is either allocated with a fixed size determined by the catch block, or with a variable size that can be bounded by the maximum allocation size of all exception objects thrown. The functions called by our implementation have a well-defined order and number known at compile-time, and are not recursive. While pointers are used to store the address of the exception object, the addresses they hold are fixed within the range of the exception object buffer or the current stack frame.

Our implementation does not require heap allocation. However, it is possible for the program to allocate an arbitrary amount of memory if local object destructors throw exceptions. For example, during stack unwinding, destructors may throw an exception, causing their own local objects to be destroyed, which in turn may throw additional exceptions.

However, since this potential execution flow is known at compile-time and is akin to normal function calls made directly via their function definition, rather than via function pointer, static analysis tools are able to give a bound for the number and type of exceptions thrown, and thus the maximum size of allocation. Furthermore, due to the nature of stack unwinding, recursion is not possible except where introduced explicitly by the developer’s code, and thus the number of stack frames is known.

Deterministic execution time generally refers to the ability to calculate the worst-case execution time (WCET), as demanded by most real-time systems. For execution time, unlike with existing implementations, it is trivial to calculate the WCET of all of our operations. Aside from the allocation and initialisation of the local exception state and the checking of the active flag (all of which have deterministic timing), we have three operations.

The first allocates the exception object in the exception buffer. In this case, the allocator is a stack allocator, which merely advances a pointer, except where the developer does not specify a correct maximum bound on exception allocation. The second frees the exception object, which again is trivially a pointer decrement. The final function resolves the base class “type id”s when matching polymorphic exception types against catch handlers. Our implementation’s novel approach guarantees a fixed number of iterations through the list of base classes, again known at compile time and derived from the filter of the catch block.

Therefore, should a WCET analysis tool give correct predictions for programs not using exceptions, it would also give such predictions for our implementation, meeting the criteria for determinism.

5 RELATED WORK

Modern exception handling, including the semantics of raising (throwing) and handling (catching) exceptions, exception hierarchies and control flow requirements, was first introduced in [6].

In [11], inherent overheads, both in terms of execution time and memory usage, in the current C++ implementations of exceptions are identified. A further evaluation of how C++ features impact embedded systems software is presented in [16]. This report singles out exceptions and their prerequisite, run-time type information (RTTI), as “the single most expensive feature an embedded design may consider”.

Lang and Stewart [12] describe stack unwinding as it relates to exceptions in real-time systems, and show how the worst-case execution time of the stack unwinding is unbounded. This has later been confirmed in [9]. For our own definition of determinism, we take insight from [20], which details many of the requirements necessary for languages to be susceptible to time analysis.

A new exception handling approach which is better suited to static analysis is presented in [13], but it requires substantial code rewriting. SetJmp/LongJmp and table-based exception handling are discussed in [3]. The results demonstrate large increases in binary size and an execution time overhead of 10-15%. Gylfason and Hjalmtysson [8] develop a variant of table-based exception handling for use within the Itanium port of the Linux kernel.

Most relevant to the work presented in this paper is [19]. It focuses on propagating exceptions down the stack by duplexing the return value of functions, so as to fit the exception object itself within the register or stack memory used for the return value and, importantly, to re-use the existing function return mechanism to perform stack unwinding. This approach hinges on two complex changes to C++’s function calling convention, though. This kind of change would be significant, breaking compatibility with C code and all existing C++ libraries. Instead, our novel exception implementation maintains the existing exception model by finding a new way to store and match exceptions at run-time, with little to no additional performance impact when exceptions are not thrown.

6 SUMMARY & CONCLUSIONS

In this paper we have developed a novel implementation of C++ exceptions which better suits the requirements of embedded systems, while still upholding the C++ core design tenets. Our implementation shows binary size decreases of up to 82.3% over existing implementations, performance improvements of up to 325% when handling exceptions, minimal execution time overhead on x86-64, and none on the Arm platform. We have shown that our implementation achieves memory and execution time determinism, making it significantly better-suited to calculating worst-case execution times, a major hurdle in the adoption of exceptions.

Our future work will investigate further code size improvements resulting from code optimizations and shrinking of the exception state object, while reducing execution time by allocating the active flag in the exception state to a register.


REFERENCES

[1] David W. Binkley. 1997. C++ in Safety Critical Systems. Ann. Softw. Eng. 4, 1-4 (Jan. 1997), 223–234. http://dl.acm.org/citation.cfm?id=590565.590588
[2] Beman Dawes, David Abrahams, and R. Rivera. 2014. Boost C++ libraries. http://www.boost.org/doc/libs/1_55_0/libs/libraries.htm
[3] Christophe de Dinechin. 2000. C++ Exception Handling for IA64. In Proceedings of the First Workshop on Industrial Experiences with Systems Software, WIESS 2000, October 22, 2000, San Diego, CA, USA, Dejan S. Milojicic (Ed.). USENIX, 67–76. http://www.usenix.org/events/wiess2000/dinechin.html
[4] Bruce Eckel, Chuck D. Allison, and Chuck Allison. 2003. Thinking in C++, Vol. 2 (2 ed.). Pearson Education.
[5] C++ exception handling 2012. ARM Compiler toolchain Compiler Reference.
[6] John B. Goodenough. 1975. Exception Handling: Issues and a Proposed Notation. Commun. ACM 18, 12 (Dec. 1975), 683–696. https://doi.org/10.1145/361227.361230
[7] Google. 2018. Google C++ Style Guide. https://google.github.io/styleguide/cppguide.html
[8] Halldor Isak Gylfason and Gisli Hjalmtysson. 2004. Exceptional Kernel–Using C++ exceptions in the Linux kernel.
[9] Wolfgang A. Halang and Matjaž Colnarič. 2002. Dealing with exceptions in safety-related embedded systems. IFAC Proceedings Volumes 35, 1 (2002), 477–482.
[10] Pieter Hintjens. 2014. ZeroMQ: The Guide. http://zeromq.org
[11] ISO/IEC JTC1 SC22 WG21. 2006. ISO/IEC TR 18015: Technical Report on C++ Performance. Technical Report ISO/IEC TR 18015:2006. ISO/IEC. https://www.iso.org/standard/43351.html
[12] Jun Lang and David B. Stewart. 1998. A Study of the Applicability of Existing Exception-handling Techniques to Component-based Real-time Software Technology. ACM Trans. Program. Lang. Syst. 20, 2 (March 1998), 274–301. https://doi.org/10.1145/276393.276395
[13] Roy Levin. 1977. Program Structures for Exceptional Condition Handling. Technical Report. Carnegie-Mellon Univ., Pittsburgh, PA, Dept. of Computer Science.
[14] Mark Maimone. 2014. C++ On Mars. In Proceedings of the C++ Conference (CppCon 2014).
[15] Prakash Prabhu, Naoto Maeda, Gogul Balakrishnan, Franjo Ivančić, and Aarti Gupta. 2011. Interprocedural exception analysis for C++. In European Conference on Object-Oriented Programming. Springer, 583–608.
[16] César A. Quiroz. 1998. Using C++ efficiently in embedded applications. In Proceedings of the Embedded Systems Conference.
[17] Richard Smith et al. 2017. Working draft, standard for programming language C++. Technical Report.
[18] Alexander D. Stoyenko. 2012. Constructing predictable real time systems. Vol. 146. Springer Science & Business Media.
[19] Herb Sutter. 2018. P0709: Zero-overhead deterministic exceptions: Throwing values. Technical Report R0. SG14.
[20] Domen Verber and Matjaž Colnarič. 1996. Programming and time analysis of hard real-time applications. IFAC Proceedings Volumes 29, 6 (1996), 79–84.

  • Abstract
  • 1 Introduction
    • 1.1 Motivating Example
    • 1.2 Contributions
  • 2 Deterministic C++ Exceptions
    • 2.1 Throws Exception Specifier
    • 2.2 Throwing Exceptions and Exception State
    • 2.3 Handling Exceptions
    • 2.4 Re-Throwing
    • 2.5 Throwing Destructors
    • 2.6 Exception Object Type Identification
    • 2.7 Exception Object Allocation
  • 3 Standards Compliance
    • 3.1 ABI Compliance
    • 3.2 C++ Standard Compliance
  • 4 Evaluation
    • 4.1 Experimental Set-up
    • 4.2 Results
    • 4.3 Timing and Memory Determinism
  • 5 Related Work
  • 6 Summary & Conclusions
  • References

 1  void store_exception(
 2      std::exception_obj<std::allocator<const char>> obj)
 3      throws {
 4    // Move the exception object into a global
 5    result.exception = std::move(obj);
 6  }
 7  // ---------------------------------
 8  try {
 9    ...
10  } catch (...) {
11    // Move the exception object out of the catch block
12    store_exception(std::move(std::get_exception_obj()));
13  }
14  // The exception can be re-thrown from a global
15  std::rethrow(std::move(result.exception));

Figure 7: Example of re-throwing the exception object using our proposed std::rethrow function and std::exception_obj type.

allocator. This object is move-constructed from the corresponding std::exception_obj instance representing the exception object within the catch block on line 12, allocating the exception object as instructed. The object is then moved and stored in a global variable, result.exception, on line 5, which is then re-thrown on line 15.

To enable re-throwing, the std::exception_obj instance also contains the exception state object, which is already populated with the details of the exception. Thus, all that is required to re-throw that exception is to copy those fields to the current exception state, restore the exception object to the buffer, set the active flag, and then attempt to propagate the exception as if throwing normally.

2.5 Throwing Destructors

Although generally advised against by the C++ specification, destructors can throw, and can be explicitly marked as throws. Were such destructors to be called during stack unwinding, they would incorrectly share the same exception state as the uncaught exception. In particular, the active flag would be set, leading to incorrect execution flow when function calls from within the destructor performed their exception check.

To avoid this, we modify the compiler such that when throwing destructors are entered, they allocate and initialise their own local exception state, as if they were a noexcept function. This local state will be used within the destructor. If at any point an exception is unhandled within the destructor, such that it would exit with that exception, it first checks the exception state parameter to see if it is marked as active. If so, two unhandled exceptions are in progress, and thus the program must be terminated with std::terminate, per the C++ standard (1551 as quoted above).

As throwing destructors may be called when an exception is not active, as part of normal logic, these additional measures constitute overhead in the case where exceptions are not thrown. However, throwing destructors are rare and are conceptually odd: exceptions exist to indicate the failure of the operation, but in C++ destruction always succeeds.

2.6 Exception Object Type Identification

One of the disadvantages of current exception implementations is their use of RTTI in matching the types of thrown exception objects to a corresponding catch handler. Current approaches make use of complete RTTI objects in exception binaries, emitting a full std::type_info instance for each type, which includes a unique string identifying it. Either the address of these std::type_info instances or the string identifier is used to compare against the corresponding std::type_info instance for the catch handler. Inheritance adds an additional layer of complexity, as each candidate type must also enumerate and compare its public base classes when matching against the catch block type.

Seeking to reduce binary sizes, we introduce an optimisation that both reduces the space required to represent types and addresses concerns with execution time determinism. This solution is predicated upon the fact that each type requires solely some unique identifier, and not the full std::type_info instance.

For each unique type of exception object thrown, the compiler will automatically emit a char-sized variable with minimum alignment, whose identifier is composed of the name of the type prefixed with __typeid_for_. Pointer types will have an integer prefix corresponding to the number of levels of indirection. Type qualifiers such as const and volatile are ignored.

This “type id” variable will use its address to represent a unique type identifier, with which types might be compared for equality. In a throw expression, the type field of the exception state will be assigned the address of the “type id” matching the thrown type.

To handle polymorphic types, for each unique type of exception thrown an additional array of addresses will be emitted, containing the addresses of the “type id” variables for the base types. In similar fashion to the “type id” variables, its identifier is prefixed with __typeid_bases_for_, followed by the name of the type. For those types which do not have base classes, a single shared array with the identifier __typeid_empty_bases will be emitted, containing only a null entry.

Testing whether a given type is a base class of another type is then a matter of getting the pointer to the base class array (which is known at compile-time) and iterating through its fixed-size list of bases. When an exception is thrown, its corresponding base class array will be assigned to the base_types field of the exception state.

This same iteration of base classes applies when throwing and catching pointer types. While the type field will contain a “type id” identifying the pointer, the base_types field will correspond to the base type array of the pointee type.

In both the “type id” and base-array cases, to ensure both emission and uniqueness, these variables will be marked as being weak symbols: an indication given to the linker that only one instance of each variable will be preserved in the final binary.
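The scheme described in this section can be illustrated with a minimal hand-written sketch. Everything here is an assumption made for illustration: the variables stand in for what the compiler would emit (they drop the reserved leading double underscore and are not marked weak), the exception_state struct models only the type and base_types fields named above, and handler_matches is a hypothetical helper rather than a function from the implementation.

```cpp
#include <cassert>

// Illustrative exception types.
struct Base {};
struct Derived : Base {};
struct Unrelated {};

// Stand-ins for the emitted "type id" variables: one char each.
// Only their addresses carry meaning; their values are never read.
char typeid_for_Base;
char typeid_for_Derived;
char typeid_for_Unrelated;

// Null-terminated arrays of base "type id" addresses.
const char* const typeid_empty_bases[] = { nullptr };
const char* const typeid_bases_for_Derived[] = { &typeid_for_Base, nullptr };

// Hypothetical model of the two exception-state fields used for matching.
struct exception_state {
    const void* type;               // address of the thrown type's "type id"
    const char* const* base_types;  // base "type id" array of the thrown type
};

// A handler matches when its "type id" equals the thrown type's id,
// or appears in the thrown type's list of bases.
bool handler_matches(const exception_state& st,
                     const void* handler_type) noexcept {
    if (st.type == handler_type) return true;
    for (const char* const* b = st.base_types; *b != nullptr; ++b) {
        if (static_cast<const void*>(*b) == handler_type) return true;
    }
    return false;
}
```

With a state of { &typeid_for_Derived, typeid_bases_for_Derived }, a handler for Base or Derived matches and a handler for Unrelated does not. The loop length is bounded by the array emitted for the thrown type, which is the property the text relies on for timing determinism.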

bool type_is_base(void* type, const char** bases) noexcept {
    for (; bases[0] != nullptr; bases++) {
        if (static_cast<const void*>(bases[0]) == type)
            return true;
    }
    return false;
}

Figure 8: Example iteration over base entries: the candidate base is passed to the type parameter, and the array of base “type id”s to the bases parameter.

2.7 Exception Object Allocation

When throwing an exception, the exception object must be allocated such that it is constructed in the scope where it was thrown, but is also accessible to the scope where it is caught. It is not enough to simply allocate the object on the stack, despite the fact that the exception will cause the stack to be unwound, as local object destructors will run as the stack is unwound, potentially overwriting the exception object. Allocating the exception object on the heap is a solution, but the allocation may result in system calls which (a) potentially have unbounded execution time and (b) may fail if memory is exhausted. As our target is to support deterministic exceptions, standard heap allocation is not an option.

We propose a statically sized buffer using a simple constant-time stack allocator for our exception object, with a heap-backed fall-back upon buffer exhaustion for those applications which cannot compute, or do not require, run-time determinism. The size of this buffer will be given a default value by the ABI in use, but will be optionally application-overridden at link-time via a linker command-line parameter, to optimise or prevent any and all heap allocation.

In the worst-case scenario, our solution matches the performance of current exception implementations, as at least that of g++ uses a similar buffer. However, as we free our exception objects as early as possible, we expect usage of this buffer to be less than other implementations, and with our novel ability to customise its size (and in combination with earlier improvements) we uniquely offer deterministic allocation.
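A buffer of this kind might look roughly like the following sketch. It is not the allocator from the implementation: the names exception_alloc, exception_free and in_buffer are hypothetical, the 256-byte size is an arbitrary stand-in for the ABI default or link-time override, and frees are assumed to happen in reverse order of allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>

// Assumed buffer size; the real value would come from the ABI or the linker.
constexpr std::size_t kBufferSize = 256;
alignas(std::max_align_t) unsigned char exception_buffer[kBufferSize];
std::size_t buffer_top = 0;  // offset of the first free byte

// Round a request up so every allocation stays maximally aligned.
constexpr std::size_t align_up(std::size_t n) {
    return (n + alignof(std::max_align_t) - 1) / alignof(std::max_align_t)
           * alignof(std::max_align_t);
}

// True if p points into the static buffer (std::less gives a total order).
bool in_buffer(const void* p) {
    const void* begin = exception_buffer;
    const void* end = exception_buffer + kBufferSize;
    return !std::less<const void*>{}(p, begin) &&
           std::less<const void*>{}(p, end);
}

// Constant-time bump allocation, with a heap fall-back on exhaustion.
void* exception_alloc(std::size_t size) {
    const std::size_t need = align_up(size);
    if (need <= kBufferSize - buffer_top) {
        void* p = exception_buffer + buffer_top;
        buffer_top += need;
        return p;
    }
    return std::malloc(size);  // non-deterministic path
}

// Release in LIFO order: a pointer decrement, or free() for heap blocks.
void exception_free(void* p, std::size_t size) {
    if (in_buffer(p)) {
        buffer_top -= align_up(size);
    } else {
        std::free(p);
    }
}
```

An application that needs determinism would size the buffer so that the malloc branch is never taken; in that case both operations reduce to a bounds check and a single addition or subtraction.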

3 STANDARDS COMPLIANCE

In this section we consider the compliance of our exception implementation against the ABI specification and the C++ standards document.

3.1 ABI Compliance

Existing ABI specifications mandate the use of tables and manual stack unwinding. However, since using the existing function return mechanism is a superior method of achieving stack unwinding, our implementation deviates entirely from their design. Therefore, we will not evaluate our solution against any existing ABI specifications. A side effect of this is that our own ABI is significantly easier to implement, as the stack unwinding and exception handling code is generated in a platform-independent way by the compiler.

3.2 C++ Standard Compliance

In general, our implementation conforms to the C++ standard’s requirements for exceptions. However, we have observed four clauses where there are deviations.

Clause 1814 describes the interaction between the exception object and std::exception_ptr, which is designed for exception objects allocated on the heap. This is therefore not suited to our stack-based implementation; however, our replacement provides similar functionality.

Clause 1822 requires local temporaries, such as return values, to have their destructors called when unwinding. However, clang’s existing exception implementation does not conform to this part of the specification, and as a result our implementation also does not conform. This is a bug in the clang compiler (on which our implementation is based), and if this is repaired, our implementation will also be repaired.

Clauses 1823 and 1824 require the destruction of base class instances and sub-objects already constructed when an exception occurs during a constructor. This feature is not yet implemented and will be addressed in future work. However, it does not require any conceptual changes: given its simplicity and similarity to local object destruction, we foresee no issues that might occur in its implementation.

4 EVALUATION

We have implemented our novel scheme in the clang compiler, and present results showing the performance of our implementation compared to the existing implementation.

4.1 Experimental Set-up

For our experiments, we used two different machines with two different platforms:

• x86-64 Machine: Desktop computer running Ubuntu 18.04, Intel 8700K 3.7-4.7GHz, 32GB 3200MHz DRAM, Samsung 980 Evo NVMe storage.

• Arm Machine: Raspberry Pi 1 Model B running Raspbian Stretch Lite, ARM1176JZF-S ARMv6 700MHz, 512 MB DRAM, 16GB SD storage.

All binaries were built with identical flags other than those determining the type of exceptions to use. Run-time type information is disabled (-fno-rtti), the symbol table is removed (-s), and -O3 optimisation with link-time optimisation (-flto) is employed. Clang-7.0 was used as the compiler, along with libstdc++ and libgcc on Arm, and libc++ and libunwind on x86.

4.1.1 Benchmark Application. Lacking an obvious existing realistic benchmark with which to test exceptions, we instead developed our own microbenchmark. xmlbench is a simple XML parser targeting ease of use and functionality as a real XML parser. It organically includes a good mixture of functions that can and cannot throw exceptions, and also those which do throw exceptions and those which are “exception neutral”, i.e. allowing exception propagation. By supplying different XML files as input to the parser, we can precisely control the rate of exceptions. Errors in input XML syntax and external conditions (such as the input file missing or an out-of-memory condition) can be influenced in a realistic way. This approach to benchmarking is similar to [15], who also use an XML parser as a representative application.


Platform  Implementation  Size
Arm       Standard         59188 bytes
Arm       Deterministic    22068 bytes
x86-64    Standard        105312 bytes
x86-64    Deterministic    18600 bytes

Table 2: Final binary sizes for the xmlbench benchmark with the standard and deterministic exception implementations.

4.2 Results

The following sections detail the results of our observations on various facets of the exception handling infrastructure.

4.2.1 Unwind Library Overhead. Statically-linked binaries participate in whole-program optimisation, and as such have the exception handling and unwind functions removed when using our implementation. This allows for a direct comparison against binaries that use the standard implementation.

Table 2 shows the results of the binary built with both the standard and the deterministic exception implementations. The results for both x86-64 and Arm show substantial decreases in binary size when using our deterministic implementation. This measurement indicates how large the unwinding code is. Arm’s unwinding code is smaller, but by removing it we see a decrease in binary size of 36.3 Kilobytes, a 62.7% reduction for our program. x86-64 has a much larger unwind library, with our implementation saving 84.7 Kilobytes, an 82.3% reduction in size for identical functionality.
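The quoted savings can be re-derived from the byte counts in Table 2. The short sketch below does the arithmetic, assuming a kilobyte of 1024 bytes (which is what the quoted figures appear to imply); SizeDelta and size_reduction are hypothetical names introduced only for this check.

```cpp
#include <cassert>
#include <cmath>

// Saving between two binary sizes, in KiB and as a percentage.
struct SizeDelta {
    double kib_saved;
    double percent_saved;
};

SizeDelta size_reduction(long standard_bytes, long deterministic_bytes) {
    const double saved =
        static_cast<double>(standard_bytes - deterministic_bytes);
    return { saved / 1024.0,
             100.0 * saved / static_cast<double>(standard_bytes) };
}
```

For Arm, size_reduction(59188, 22068) gives 37120 bytes saved, which is 36.25 KiB and about 62.7% of the standard binary. For x86-64, size_reduction(105312, 18600) gives 86712 bytes, roughly 84.7 KiB and about 82.3%.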

On both platforms, the code section (.text) grows due to the inclusion of additional instructions for checking the exception state. On x86-64, the removal of the unwind tables compensates for this growth, with a reduction of 0.3%. However, on Arm, missed optimization opportunities by the compiler lead to a net increase of roughly 8%.

4.2.2 Application Benchmark Performance. We generated two sets of XML files, one with syntax errors and one without, and parsed them with xmlbench. The syntax error is contained within the deepest element, resulting in an exception thrown with the stack at its largest number of function calls. For each run, we measured the total run-time of the xmlbench program, including the start-up time.

Figure 9 shows the average execution time in microseconds per XML element parsed by our benchmark, for both the Arm (Figure 9a) and x86-64 (Figure 9b) platforms. On the Arm platform, the results show there is little difference between execution times, except for the standard exception implementation generally performing the worst of the four when faced with an exception. As was expected, the standard implementation performs better when no exception is raised with only 156 elements, as our implementation must check for active exceptions. This difference disappears as more elements are processed.

On the x86-64 platform, our deterministic implementation takes slightly longer per element than the standard implementation for larger numbers of nodes. This is primarily due to the overhead of checking whether an exception has occurred. As is expected, the difference in time for our implementation depending on whether an exception was thrown or not is very small, particularly in comparison to the same for the standard implementation, which at its peak has a 2.3% overhead.

Elems   Throw   Time (σ), Standard Impl.   Time (σ), Determ. Impl.

Arm
156              42.7ms (0.6ms)             41.7ms (0.6ms)
156              43.7ms (0.6ms)             43.0ms (0.0ms)
3906            158.7ms (2.1ms)            162.7ms (1.5ms)
3906            160.7ms (1.5ms)            162.0ms (2.0ms)

x86-64
3905              3.3ms (0.6ms)              3.3ms (0.6ms)
3905              3.3ms (0.6ms)              3.3ms (0.6ms)
97655            68.0ms (1.0ms)             70.7ms (0.6ms)
97655            69.0ms (1.0ms)             70.3ms (0.6ms)

Table 3: Overall execution time summary for xmlbench on the Arm and x86-64 platforms.

The benchmark clearly shows how throwing a single exception increases execution time for the standard exception implementation, given the lengthy unwind procedure. However, with our implementation, the improvements to exception handling speed are not quite enough to compensate for the overhead in the absence of exceptions. With more than a single exception, and particularly given the results for 488281 elements, where the trend suggests that our implementation scales better than the existing one, it should be enough for our solution to out-perform the current one.

As shown in Table 3, our implementation creates a small additional execution time overhead in comparison to the standard implementation; however, the difference is negligible and within the standard deviation.

4.2.3 Exception Propagation. To test exception propagation, we used a different benchmark, which simply called the same function recursively as many times as indicated by Stack Frames, with a local object having a simple custom destructor within each frame. After Stack Frames calls, an exception is thrown and caught by reference from the first invocation. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 4.
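A benchmark of this shape could be written along the following lines. This is a sketch using ordinary C++ exceptions, not the benchmark source: FrameGuard, recurse and time_propagation are hypothetical names, and std::chrono::steady_clock measures wall-clock time as a stand-in for the CPU time used in the paper.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>

int destructor_runs = 0;

// Local object with a simple custom destructor, one per stack frame.
struct FrameGuard {
    ~FrameGuard() { ++destructor_runs; }
};

// Call ourselves until `frames` frames exist, then throw from the deepest.
void recurse(int frames) {
    FrameGuard guard;
    if (frames <= 1) throw std::runtime_error("deepest frame");
    recurse(frames - 1);
}

// Elapsed microseconds from the first call to arrival in the handler.
double time_propagation(int frames) {
    const auto start = std::chrono::steady_clock::now();
    try {
        recurse(frames);
    } catch (const std::exception&) {  // caught by reference
        const auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::micro>(stop - start).count();
    }
    return -1.0;  // not reached: recurse always throws
}
```

Calling time_propagation(100) unwinds 100 frames and runs 100 destructors before the handler is entered. Repeating the call and taking the mean and standard deviation would mirror the procedure described above.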

The results show a significant difference in execution time between the two exception implementations. Our implementation is on average 98× faster at performing exception propagation on x86-64. The factor of improvement increases from 68× to 138× as the number of stack frames increases, indicating that the performance penalty seen in the current implementation increases as the number of frames increases. Our implementation performs best on x86-64 with deep call stacks.

On Arm, the improvement in execution time remains almost constant at 32× to 33× for 100 and 1000 frames, with a 35× speedup for 10000 frames. This suggests that our implementation will consistently offer improved performance independent of the number of frames.


[Figure 9 plots. Panel (a) Arm: 156, 781, 3906 and 19531 XML elements. Panel (b) x86-64: 3905, 19531, 97655 and 488281 XML elements. x-axis: XML Elements. y-axis: Exec. time per element (µs). Series: Standard No Error, Standard With Error, Deterministic No Error, Deterministic With Error.]

Figure 9: Execution times in µs per XML element for different element counts with our xmlbench benchmark on Arm (a) and x86-64 (b). Lower is better.

Stack Frames  Implementation  Time       σ

Arm
100           Standard        1351.3µs   74.3µs
100           Deterministic     41.7µs    0.6µs
1000          Standard           9.6ms    0.2ms
1000          Deterministic      0.3ms    0.0ms
10000         Standard          94.8ms    1.0ms
10000         Deterministic      2.7ms    0.1ms

x86-64
100           Standard         114.7µs   21.6µs
100           Deterministic      1.7µs    0.8µs
1000          Standard        2572.0µs  231.2µs
1000          Deterministic     29.0µs    0.0µs
10000         Standard        8823.8µs  960.0µs
10000         Deterministic     64.0µs    6.4µs

Table 4: Execution times for the exception propagation benchmark.

The improvements in execution time are a result of not having to resolve the encoded unwind information and walk the stack twice to locate the catch handler before unwinding.

4.2.4 Re-Throwing. To measure exception re-throwing, we used another benchmark, which called the same function recursively as many times as indicated by Stack Frames, with a local object having a custom destructor within each frame. After Stack Frames calls, an exception is thrown but, unlike before, caught and re-thrown by each function. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 5.
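The catch-and-re-throw pattern exercised here can be sketched with standard C++ exceptions as follows. The names FrameGuard, recurse_rethrow and catch_count are hypothetical, and the counter exists only to make the behaviour observable; timing would be added around the outermost call in the same way as for the propagation benchmark.

```cpp
#include <cassert>
#include <stdexcept>

int catch_count = 0;      // intermediate handlers entered
int destroyed_count = 0;  // local objects destroyed during unwinding

// Local object with a custom destructor, one per stack frame.
struct FrameGuard {
    ~FrameGuard() { ++destroyed_count; }
};

// Every frame above the deepest one catches the exception and re-throws it.
void recurse_rethrow(int frames) {
    FrameGuard guard;
    if (frames <= 1) throw std::runtime_error("deepest frame");
    try {
        recurse_rethrow(frames - 1);
    } catch (...) {
        ++catch_count;
        throw;  // re-throw the exception currently being handled
    }
}
```

With 100 frames, the 99 frames above the throwing one each enter their handler once before the exception reaches the caller of the first invocation, so the handling cost is paid per frame rather than once.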

These results show a similar pattern to those for straight propagation. On x86-64 for 10000 stack frames, the current implementation takes almost 3× longer to catch and re-throw than simply propagating, while our implementation is 38× slower than when simply propagating. This factor of slowdown is to be expected given how fast our implementation is when not re-throwing, but also due to the multiple steps required to move the exception object into and then out of each stack frame. This is clearly a good target for optimisation, as such transfer is unnecessary. Despite this, our implementation yielded a 9× speedup over the standard implementation.

Stack Frames  Implementation  Time       σ

Arm
100           Standard           4.1ms    0.1ms
100           Deterministic      0.2ms    0.0ms
1000          Standard          33.5ms    0.4ms
1000          Deterministic      1.8ms    0.0ms
10000         Standard         318.4ms    6.3ms
10000         Deterministic     18.2ms    0.1ms

x86-64
100           Standard        1071.7µs  363.3µs
100           Deterministic     22.7µs   13.8µs
1000          Standard           3.8ms    0.1ms
1000          Deterministic      0.2ms    0.0ms
10000         Standard          22.0ms    0.3ms
10000         Deterministic      2.4ms    0.1ms

Table 5: Execution times for the exception re-throwing benchmark.

For 1000 stack frames on x86-64, the difference between execution times is larger, with our implementation executing 19× faster. With 100 frames the results have a very high standard deviation, but there is a clear order of magnitude between the two times, indicating that the difference in performance remains largely independent of the number of frames.

On Arm, the additional time taken to initialise the exception state in our implementation is evident, as it takes roughly five times longer than in the propagation benchmark. The standard implementation also suffers a performance hit, but one which decreases as the number of frames increases, suggesting that a large part of it is constant. This matches the fact that in the standard implementation the exception object is allocated on the heap, creating a large up-front cost, but one which is amortised across many re-throwings.

In reality, however, re-throwing at every frame is highly unusual behaviour, making both this amortisation and our worsening performance unlikely to be noticed.

4.3 Timing and Memory Determinism

We base our understanding of the language and algorithmic requirements for determinism on Verber and Colnarič [20]. In general, the requirements for deterministic execution fall into two categories: deterministic memory usage and deterministic execution time.

For memory usage, we can further subdivide the requirement into stack usage and heap usage. When no exception occurs, our stack usage is static. A fixed-size allocation for the exception state occurs per noexcept function invocation, throwing destructor invocation, and try block entry.

When an exception is thrown, the object is either allocated with a fixed size determined by the catch block, or with a variable size that can be bounded by the maximum allocation size of all exception objects thrown. The functions called by our implementation have a well-defined order and number known at compile-time, and are not recursive. While pointers are used to store the address of the exception object, the addresses they hold are fixed within the range of the exception object buffer or the current stack frame.

Our implementation does not require heap allocation. However, it is possible for the program to allocate an arbitrary amount of memory if local object destructors throw exceptions. For example, during stack unwinding, destructors may throw an exception, causing their own local objects to be destroyed, which in turn may throw additional exceptions.

However, since this potential execution flow is known at compile-time and is akin to normal function calls made directly via their function definition, rather than via function pointer, static analysis tools are able to give a bound for the number and type of exceptions thrown, and thus the maximum size of allocation. Furthermore, due to the nature of stack unwinding, recursion is not possible except where introduced explicitly by the developer’s code, and thus the number of stack frames is known.

Deterministic execution time generally refers to the ability to calculate the worst-case execution time (WCET), as demanded by most real-time systems. For execution time, unlike with existing implementations, it is trivial to calculate the WCET of all of our operations. Aside from the allocation and initialisation of the local exception state and the checking of the active flag (all of which have deterministic timing), we have three operations.

The first allocates the exception object in the exception buffer. In this case, the allocator is a stack allocator, which merely advances a pointer, except where the developer does not specify a correct maximum bound on exception allocation. The second frees the exception object, which again is trivially a pointer decrement. The final function resolves the base class “type id”s when matching polymorphic exception types against catch handlers. Our implementation’s novel approach guarantees a fixed number of iterations through the list of base classes, again known at compile time and derived from the filter of the catch block.

Therefore, should a WCET analysis tool give correct predictions for programs not using exceptions, it would also give such predictions for our implementation, meeting the criteria for determinism.

5 RELATED WORK

Modern exception handling, including the semantics of raising (throwing) and handling (catching) exceptions, exception hierarchies and control flow requirements, was first introduced in [6].

In [11], inherent overheads, both in terms of execution time and memory usage, in the current C++ implementations of exceptions are identified. A further evaluation of how C++ features impact embedded systems software is presented in [16]. This report singles out exceptions and their prerequisite, run-time type information (RTTI), as "the single most expensive feature an embedded design may consider".

Lang and Stewart [12] describe stack unwinding as it relates to exceptions in real-time systems, and show how the worst-case execution time of the stack unwinding is unbounded. This was later confirmed in [9]. For our own definition of determinism, we take insight from [20], which details many of the requirements necessary for languages to be susceptible to time analysis.

A new exception handling approach which is better suited to static analysis is presented in [13], but it requires substantial code rewriting. SetJmp/LongJmp and table-based exception handling are discussed in [3]. The results demonstrate large increases in binary size and an execution time overhead of 10-15%. Gylfason and Hjalmtysson [8] develop a variant of table-based exception handling for use within the Itanium port of the Linux kernel.

Most relevant to the work presented in this paper is [19]. It focuses on propagating exceptions down the stack by duplexing the return value of functions, so as to fit the exception object itself within the register or stack memory used for the return value, and, importantly, to re-use the existing function return mechanism to perform stack unwinding. This approach hinges on two complex changes to C++'s function calling convention, though. This kind of change would be significant, breaking compatibility with C code and all existing C++ libraries. Instead, our novel exception implementation maintains the existing exception model by finding a new way to store and match exceptions at run-time, with little to no additional performance impact when exceptions are not thrown.

6 SUMMARY & CONCLUSIONS

In this paper we have developed a novel implementation of C++ exceptions which better suits the requirements of embedded systems, while still upholding the C++ core design tenets. Our implementation shows binary size decreases of up to 82.3% over existing implementations, performance improvements of up to 325% when handling exceptions, minimal execution time overhead on x86-64, and none on the Arm platform. We have shown that our implementation achieves memory and execution time determinism, making it significantly better-suited to calculating worst-case execution times, a major hurdle in the adoption of exceptions.

Our future work will investigate further code size improvements resulting from code optimizations and shrinking of the exception state object, while reducing execution time by allocating the active flag in the exception state to a register.

REFERENCES

[1] David W. Binkley. 1997. C++ in Safety Critical Systems. Ann. Softw. Eng. 4, 1-4 (Jan. 1997), 223–234. http://dl.acm.org/citation.cfm?id=590565.590588

[2] Beman Dawes, David Abrahams, and R. Rivera. 2014. Boost C++ libraries. http://www.boost.org/doc/libs/1_55_0/libs/libraries.htm

[3] Christophe de Dinechin. 2000. C++ Exception Handling for IA64. In Proceedings of the First Workshop on Industrial Experiences with Systems Software, WIESS 2000, October 22, 2000, San Diego, CA, USA, Dejan S. Milojicic (Ed.). USENIX, 67–76. http://www.usenix.org/events/wiess2000/dinechin.html

[4] Bruce Eckel, Chuck D. Allison, and Chuck Allison. 2003. Thinking in C++, Vol. 2 (2 ed.). Pearson Education.

[5] C++ exception handling 2012. ARM Compiler toolchain Compiler Reference.

[6] John B. Goodenough. 1975. Exception Handling: Issues and a Proposed Notation. Commun. ACM 18, 12 (Dec. 1975), 683–696. https://doi.org/10.1145/361227.361230

[7] Google. 2018. Google C++ Style Guide. https://google.github.io/styleguide/cppguide.html

[8] Halldor Isak Gylfason and Gisli Hjalmtysson. 2004. Exceptional Kernel–Using C++ exceptions in the Linux kernel.

[9] Wolfgang A. Halang and Matjaž Colnarič. 2002. Dealing with exceptions in safety-related embedded systems. IFAC Proceedings Volumes 35, 1 (2002), 477–482.

[10] Pieter Hintjens. 2014. ZeroMQ: The Guide. http://zeromq.org

[11] ISO/IEC JTC1 SC22 WG21. 2006. ISO/IEC TR 18015: Technical Report on C++ Performance. Technical Report ISO/IEC TR 18015:2006. ISO/IEC. https://www.iso.org/standard/43351.html

[12] Jun Lang and David B. Stewart. 1998. A Study of the Applicability of Existing Exception-handling Techniques to Component-based Real-time Software Technology. ACM Trans. Program. Lang. Syst. 20, 2 (March 1998), 274–301. https://doi.org/10.1145/276393.276395

[13] Roy Levin. 1977. Program Structures for Exceptional Condition Handling. Technical Report. Carnegie-Mellon Univ., Pittsburgh, PA, Dept. of Computer Science.

[14] Mark Maimone. 2014. C++ On Mars. In Proceedings of the C++ Conference (CppCon 2014).

[15] Prakash Prabhu, Naoto Maeda, Gogul Balakrishnan, Franjo Ivančić, and Aarti Gupta. 2011. Interprocedural exception analysis for C++. In European Conference on Object-Oriented Programming. Springer, 583–608.

[16] César A. Quiroz. 1998. Using C++ efficiently in embedded applications. In Proceedings of the Embedded Systems Conference.

[17] Richard Smith et al. 2017. Working draft, standard for programming language C++. Technical Report.

[18] Alexander D. Stoyenko. 2012. Constructing predictable real time systems. Vol. 146. Springer Science & Business Media.

[19] Herb Sutter. 2018. P0709: Zero-overhead deterministic exceptions: Throwing values. Technical Report R0. SG14.

[20] Domen Verber and Matjaž Colnarič. 1996. Programming and time analysis of hard real-time applications. IFAC Proceedings Volumes 29, 6 (1996), 79–84.

  • Abstract
  • 1 Introduction
    • 1.1 Motivating Example
    • 1.2 Contributions
  • 2 Deterministic C++ Exceptions
    • 2.1 Throws Exception Specifier
    • 2.2 Throwing Exceptions and Exception State
    • 2.3 Handling Exceptions
    • 2.4 Re-Throwing
    • 2.5 Throwing Destructors
    • 2.6 Exception Object Type Identification
    • 2.7 Exception Object Allocation
  • 3 Standards Compliance
    • 3.1 ABI Compliance
    • 3.2 C++ Standard Compliance
  • 4 Evaluation
    • 4.1 Experimental Set-up
    • 4.2 Results
    • 4.3 Timing and Memory Determinism
  • 5 Related Work
  • 6 Summary & Conclusions
  • References

    bool type_is_base(void* type, const char** bases) noexcept {
        for (; bases[0] != nullptr; bases++) {
            if (static_cast<const void*>(bases[0]) == type) {
                return true;
            }
        }
        return false;
    }

Figure 8: Example iteration over base entries; the candidate base is passed to the type parameter, and the array of base "type id"s to the bases parameter.
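As a hedged illustration of how the routine in Figure 8 might be called, the sketch below repeats the loop and applies it to a hand-built, null-terminated table of base "type id"s. The tag variables and the table `derived_bases` are hypothetical stand-ins for whatever identifiers a compiler would emit; only `type_is_base` follows the figure.

```cpp
#include <cassert>

// Loop from Figure 8: walks a null-terminated array of base "type id"s.
bool type_is_base(void* type, const char** bases) noexcept {
    for (; bases[0] != nullptr; bases++) {
        if (static_cast<const void*>(bases[0]) == type) {
            return true;
        }
    }
    return false;
}

// Hypothetical "type id"s: the address of each tag identifies one type.
static const char id_exception = 0;
static const char id_runtime_error = 0;
static const char id_logic_error = 0;

// Hypothetical base table for a type deriving from runtime_error and exception.
static const char* derived_bases[] = {&id_runtime_error, &id_exception, nullptr};

// True if a handler for `candidate` would match the hypothetical derived type.
bool derived_matches(const char* candidate) noexcept {
    return type_is_base(const_cast<char*>(candidate), derived_bases);
}

// A handler type that is not in the table, used to show a failed match.
bool logic_error_matches() noexcept {
    return derived_matches(&id_logic_error);
}
```

The table length is fixed when the program is built, so the loop runs at most two comparisons here before reaching the terminator.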

exception will cause the stack to be unwound, as local object destructors will run as the stack is unwound, potentially overwriting the exception object. Allocating the exception object on the heap is a solution, but the allocation may result in system calls which (a) potentially have unbounded execution time, and (b) may fail if memory is exhausted. As our target is to support deterministic exceptions, standard heap allocation is not an option.

We propose a statically sized buffer, using a simple constant-time stack allocator for our exception object, with a heap-backed fall-back upon buffer exhaustion for those applications which cannot compute, or do not require, run-time determinism. The size of this buffer will be given a default value by the ABI in use, but can optionally be overridden by the application at link-time via a linker command-line parameter, to optimise or prevent any and all heap allocation.

In the worst-case scenario, our solution matches the performance of current exception implementations, as at least that of g++ uses a similar buffer. However, as we free our exception objects as early as possible, we expect usage of this buffer to be less than in other implementations, and with our novel ability to customise its size (and in combination with earlier improvements), we uniquely offer deterministic allocation.
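A minimal sketch of such a buffer is given below, assuming a fixed 256-byte capacity and strictly last-in, first-out release. The class name `ExceptionBuffer`, the constant `kBufferSize` and the use of `std::malloc` for the fall-back are illustrative assumptions, not the interface of the implementation described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>

// Hypothetical capacity; the text leaves the default to the ABI and the linker.
constexpr std::size_t kBufferSize = 256;

class ExceptionBuffer {
public:
    // Constant-time bump allocation; heap fall-back once the buffer is exhausted.
    void* allocate(std::size_t size) noexcept {
        if (size <= kBufferSize) {
            const std::size_t rounded = round_up(size);
            if (rounded <= kBufferSize - used_) {
                void* p = storage_ + used_;
                used_ += rounded;
                return p;
            }
        }
        return std::malloc(size);  // non-deterministic path
    }

    // Release the most recent allocation (LIFO order is assumed).
    void deallocate(void* p, std::size_t size) noexcept {
        if (owns(p)) {
            used_ -= round_up(size);  // pointer decrement
        } else {
            std::free(p);
        }
    }

    // True if p points into the static buffer rather than the heap.
    bool owns(const void* p) const noexcept {
        const auto* c = static_cast<const unsigned char*>(p);
        std::less<const unsigned char*> before;
        return !before(c, storage_) && before(c, storage_ + kBufferSize);
    }

    std::size_t used() const noexcept { return used_; }

private:
    static constexpr std::size_t kAlign = alignof(std::max_align_t);

    // Round a request up to the strictest fundamental alignment.
    static constexpr std::size_t round_up(std::size_t n) noexcept {
        return (n + kAlign - 1) / kAlign * kAlign;
    }

    alignas(std::max_align_t) unsigned char storage_[kBufferSize];
    std::size_t used_ = 0;
};
```

An application that needs determinism would size the buffer so that the `std::malloc` branch is never taken; one that does not can leave the default and accept the occasional heap allocation.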

3 STANDARDS COMPLIANCE

In this section we consider the compliance of our exception implementation against the ABI specification and the C++ standards document.

3.1 ABI Compliance

Existing ABI specifications mandate the use of tables and manual stack unwinding. However, since using the existing function return mechanism is a superior method of achieving stack unwinding, our implementation deviates entirely from their design. Therefore, we will not evaluate our solution against any existing ABI specifications. A side effect of this is that our own ABI is significantly easier to implement, as the stack unwinding and exception handling code is generated in a platform-independent way by the compiler.

3.2 C++ Standard Compliance

In general, our implementation conforms to the C++ standard's requirements for exceptions. However, we have observed four clauses where there are deviations.

Clause 18.1.4 describes the interaction between the exception object and std::exception_ptr, which is designed for exception objects allocated on the heap. This is therefore not suited to our stack-based implementation; however, our replacement provides similar functionality.

Clause 18.2.2 requires local temporaries, such as return values, to have their destructors called when unwinding. However, clang's existing exception implementation does not conform to this part of the specification, and as a result our implementation also does not conform. This is a bug in the clang compiler (on which our implementation is based), and if this is repaired, our implementation will also be repaired.

Clauses 18.2.3 and 18.2.4 require the destruction of base class instances and sub-objects already constructed when an exception occurs during a constructor. This feature is not yet implemented and will be addressed in future work. It does not require any conceptual changes: given its simplicity and its similarity to local object destruction, we foresee no issues in its implementation.

4 EVALUATION

We have implemented our novel scheme in the clang compiler, and present results showing the performance of our implementation compared to the existing implementation.

4.1 Experimental Set-up

For our experiments we used two different machines with two different platforms:

• x86-64 Machine: Desktop computer running Ubuntu 18.04, Intel 8700K 3.7-4.7 GHz, 32 GB 3200 MHz DRAM, Samsung 980 Evo NVMe storage.

• Arm Machine: Raspberry Pi 1 Model B running Raspbian Stretch Lite, ARM1176JZF-S ARMv6 700 MHz, 512 MB DRAM, 16 GB SD storage.

All binaries were built with identical flags, other than those determining the type of exceptions to use. Run-time type information is disabled (-fno-rtti), the symbol table is removed (-s), and -O3 optimisation with link-time optimisation (-flto) is employed. Clang 7.0 was used as the compiler, along with libstdc++ and libgcc on Arm, and libc++ and libunwind on x86.

4.1.1 Benchmark Application. Lacking an obvious existing realistic benchmark with which to test exceptions, we instead developed our own microbenchmark: xmlbench is a simple XML parser, targeting ease of use and functionality as a real XML parser. It organically includes a good mixture of functions that can and cannot throw exceptions, and also those which do throw exceptions and those which are "exception neutral", i.e. allowing exception propagation. By supplying different XML files as input to the parser, we can precisely control the rate of exceptions. Errors in input XML syntax and external conditions (such as the input file missing or an out-of-memory condition) can be influenced in a realistic way. This approach to benchmarking is similar to [15], who also use an XML parser as a representative application.

Platform   Implementation   Size
Arm        Standard         59188 bytes
Arm        Deterministic    22068 bytes
x86-64     Standard         105312 bytes
x86-64     Deterministic    18600 bytes

Table 2: Final binary sizes for the xmlbench benchmark with the standard and deterministic exception implementations.

4.2 Results

The following sections detail the results of our observations on various facets of the exception handling infrastructure.

4.2.1 Unwind Library Overhead. Statically-linked binaries participate in whole-program optimisation, and as such have the exception handling and unwind functions removed when using our implementation. This allows for a direct comparison against binaries that use the standard implementation.

Table 2 shows the sizes of the binary built with both the standard and the deterministic exception implementations. The results for both x86-64 and Arm show substantial decreases in binary size when using our deterministic implementation. This measurement indicates how large the unwinding code is. Arm's unwinding code is smaller, but by removing it we see a decrease in binary size of 36.3 Kilobytes, a 62.7% reduction for our program. x86-64 has a much larger unwind library, with our implementation saving 84.7 Kilobytes, an 82.3% reduction in size for identical functionality.
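The reductions quoted above can be checked directly against the byte counts in Table 2. The working below is a simple verification, taking one Kilobyte as 1024 bytes (an assumption, since the text does not state the convention):

```latex
\begin{align*}
\text{Arm:}\quad & 59188 - 22068 = 37120 \text{ bytes} \approx 36.3 \text{ KB}, &
  \frac{37120}{59188} &\approx 62.7\% \\
\text{x86-64:}\quad & 105312 - 18600 = 86712 \text{ bytes} \approx 84.7 \text{ KB}, &
  \frac{86712}{105312} &\approx 82.3\%
\end{align*}
```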

On both platforms, the code section (.text) grows due to the inclusion of additional instructions for checking the exception state. On x86-64, the removal of the unwind tables compensates for this growth, with a reduction of 0.3%. However, on Arm, missed optimization opportunities by the compiler lead to a net increase of roughly 8%.
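To make the source of these additional instructions concrete, the following hand-written analogue shows the kind of check a caller performs after each call that may throw. It is only a sketch: the `ExceptionState` layout, its field names and the explicit reference parameter are assumptions made for illustration, whereas the real checks are generated by the compiler.

```cpp
#include <cassert>

// Hypothetical exception state; field names are illustrative only.
struct ExceptionState {
    bool active = false;  // set while an exception is propagating
    int code = 0;         // stand-in for the exception object
};

// A "throwing" function: records the error and returns normally.
int parse_digit(char c, ExceptionState& state) noexcept {
    if (c < '0' || c > '9') {
        state.active = true;
        state.code = 1;
        return 0;
    }
    return c - '0';
}

// An "exception neutral" caller: tests the flag after every call and
// returns early, re-using the ordinary return path to unwind.
int parse_two_digits(const char* s, ExceptionState& state) noexcept {
    const int hi = parse_digit(s[0], state);
    if (state.active) return 0;  // extra check emitted per call site
    const int lo = parse_digit(s[1], state);
    if (state.active) return 0;  // extra check emitted per call site
    return hi * 10 + lo;
}
```

Each call site gains one load and one conditional branch in this model, which is the code growth that removing the unwind tables has to offset.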

4.2.2 Application Benchmark Performance. We generated two sets of XML files, one with syntax errors and one without, and parsed them with xmlbench. The syntax error is contained within the deepest element, resulting in an exception thrown with the stack at its largest number of function calls. For each run we measured the total run-time of the xmlbench program, including the start-up time.

Figure 9 shows the average execution time in microseconds per XML element parsed by our benchmark for both the Arm (Figure 9a) and x86-64 (Figure 9b) platforms. On the Arm platform, the results show there is little difference between execution times, except for the standard exception implementation generally performing the worst of the four when faced with an exception. As was expected, the standard implementation performs better when no exception is raised with only 156 elements, as our implementation must check for active exceptions. This difference disappears as more elements are processed.

On the x86-64 platform, our deterministic implementation takes slightly longer per element than the standard implementation for larger numbers of nodes. This is primarily due to the overhead of checking whether an exception has occurred. As is expected, the difference in time for our implementation depending on whether an exception was thrown or not is very small, particularly in comparison to the same for the standard implementation, which at its peak has a 2.3% overhead.

Platform   Elems   Throw   Time (σ), Standard Impl.   Time (σ), Determ. Impl.
Arm        156             42.7 ms (0.6 ms)           41.7 ms (0.6 ms)
Arm        156             43.7 ms (0.6 ms)           43.0 ms (0.0 ms)
Arm        3906            158.7 ms (2.1 ms)          162.7 ms (1.5 ms)
Arm        3906            160.7 ms (1.5 ms)          162.0 ms (2.0 ms)
x86-64     3905            3.3 ms (0.6 ms)            3.3 ms (0.6 ms)
x86-64     3905            3.3 ms (0.6 ms)            3.3 ms (0.6 ms)
x86-64     97655           68.0 ms (1.0 ms)           70.7 ms (0.6 ms)
x86-64     97655           69.0 ms (1.0 ms)           70.3 ms (0.6 ms)

Table 3: Overall execution time summary for xmlbench on the Arm and x86-64 platforms.

The benchmark clearly shows how throwing a single exception increases execution time for the standard exception implementation, given the lengthy unwind procedure. However, with our implementation, the improvements to exception handling speed are not quite enough to compensate for the overhead in the absence of exceptions. With more than a single exception, and particularly given the results for 488281 elements, where the trend suggests that our implementation scales better than the existing one, it should be enough for our solution to out-perform the current one.

As shown in Table 3, our implementation creates a small additional execution time overhead in comparison to the standard implementation; however, the difference is negligible and within the standard deviation.

4.2.3 Exception Propagation. To test exception propagation, we used a different benchmark, which simply called the same function recursively as many times as indicated by Stack Frames, with a local object having a simple custom destructor within each frame. After Stack Frames calls, an exception is thrown and caught by reference from the first invocation. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 4.
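The shape of this benchmark can be sketched as follows, using standard C++ exceptions. This is an approximation rather than the code that was measured: the names are hypothetical, and `std::chrono::steady_clock` (wall-clock time) stands in for the CPU-time measurement described above.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>

static int g_destroyed = 0;  // counts destructor runs during unwinding

struct Local {
    ~Local() { ++g_destroyed; }  // simple custom destructor
};

// Recurse `frames` times, then throw from the deepest frame.
void descend(int frames) {
    Local local;
    if (frames <= 1) {
        throw std::runtime_error("deepest frame");
    }
    descend(frames - 1);
}

// Returns elapsed microseconds from the first call to arrival in the handler.
double run_propagation(int frames) {
    g_destroyed = 0;
    const auto start = std::chrono::steady_clock::now();
    try {
        descend(frames);
    } catch (const std::runtime_error&) {  // caught by reference
        const auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::micro>(stop - start).count();
    }
    return -1.0;  // not reached: descend always throws
}
```

Calling `run_propagation` with 100, 1000 and 10000 frames, several times each, would give per-configuration samples from which a mean and standard deviation can be computed.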

The results show a significant difference in execution time between the two exception implementations. Our implementation is on average 98× faster at performing exception propagation on x86-64. The factor of improvement increases from 68× to 138× as the number of stack frames increases, indicating that the performance penalty seen in the current implementation increases as the number of frames increases. Our implementation performs best on x86-64 with deep call stacks.
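These factors follow from the x86-64 rows of Table 4. The working below assumes the table entries are read in microseconds with one decimal place, which is the reading consistent with the quoted speedups:

```latex
\begin{align*}
\frac{114.7}{1.7} &\approx 67.5, &
\frac{2572.0}{29.0} &\approx 88.7, &
\frac{8823.8}{64.0} &\approx 137.9, \\
\frac{67.5 + 88.7 + 137.9}{3} &\approx 98.0
\end{align*}
```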

On Arm, the improvement in execution time remains almost constant, at 32× to 33× for 100 and 1000 frames, and a 35× speedup for 10000 frames. This suggests that our implementation will consistently offer improved performance, independent of the number of frames.

[Figure 9 plots. x-axis: XML Elements; y-axis: Exec Time per element (µs). Series: Standard No Error, Standard With Error, Deterministic No Error, Deterministic With Error. Panel (a) Arm: 156, 781, 3906 and 19531 elements. Panel (b) x86-64: 3905, 19531, 97655 and 488281 elements.]

Figure 9: Execution times in µs per XML element for different element counts with our xmlbench benchmark on Arm (a) and x86-64 (b). Lower is better.

Platform   Stack Frames   Implementation   Time        σ
Arm        100            Standard         1351.3 µs   74.3 µs
Arm        100            Deterministic    41.7 µs     0.6 µs
Arm        1000           Standard         9.6 ms      0.2 ms
Arm        1000           Deterministic    0.3 ms      0.0 ms
Arm        10000          Standard         94.8 ms     1.0 ms
Arm        10000          Deterministic    2.7 ms      0.1 ms
x86-64     100            Standard         114.7 µs    21.6 µs
x86-64     100            Deterministic    1.7 µs      0.8 µs
x86-64     1000           Standard         2572.0 µs   231.2 µs
x86-64     1000           Deterministic    29.0 µs     0.0 µs
x86-64     10000          Standard         8823.8 µs   960.0 µs
x86-64     10000          Deterministic    64.0 µs     6.4 µs

Table 4: Execution times for the exception propagation benchmark.

The improvements in execution time are a result of not having to resolve the encoded unwind information and walk the stack twice to locate the catch handler before unwinding.

4.2.4 Re-Throwing. To measure exception re-throwing, we used another benchmark, which called the same function recursively as many times as indicated by Stack Frames, with a local object having a custom destructor within each frame. After Stack Frames calls, an exception is thrown but, unlike before, caught and re-thrown by each function. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 5.
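A hedged sketch of the re-throwing variant is shown below, again with standard C++ exceptions and hypothetical names. Every frame except the deepest one catches the in-flight exception and re-throws it with `throw;`, and the counters exist only to make the behaviour checkable.

```cpp
#include <cassert>
#include <stdexcept>

static int g_rethrows = 0;  // number of catch-and-re-throw steps
static int g_unwound = 0;   // number of local objects destroyed

struct Guard {
    ~Guard() { ++g_unwound; }  // custom destructor in each frame
};

// Recurse `frames` times; the deepest frame throws, and every frame
// above it catches the exception and re-throws it.
void descend_rethrow(int frames) {
    Guard guard;
    if (frames <= 1) {
        throw std::runtime_error("deepest frame");
    }
    try {
        descend_rethrow(frames - 1);
    } catch (...) {
        ++g_rethrows;
        throw;  // re-throw the in-flight exception unchanged
    }
}

// Returns true once the exception reaches the outermost handler.
bool run_rethrow(int frames) {
    g_rethrows = 0;
    g_unwound = 0;
    try {
        descend_rethrow(frames);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

With 100 frames, this performs 99 re-throws before the outermost handler runs, which is the per-frame work the re-throwing measurements capture.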

These results show a similar pattern to those for straight propagation. On x86-64 for 10000 stack frames, the current implementation takes almost 3× longer to catch and re-throw than simply propagating, while our implementation is 38× slower than when simply propagating. This factor of slowdown is to be expected given how fast our implementation is when not re-throwing, but is also due to the multiple steps required to move the exception object into and then out of each stack frame. This is clearly a good target for optimisation, as such transfer is unnecessary. Despite this, our implementation yielded a 9× speedup over the standard implementation.

Platform   Stack Frames   Implementation   Time        σ
Arm        100            Standard         4.1 ms      0.1 ms
Arm        100            Deterministic    0.2 ms      0.0 ms
Arm        1000           Standard         33.5 ms     0.4 ms
Arm        1000           Deterministic    1.8 ms      0.0 ms
Arm        10000          Standard         318.4 ms    6.3 ms
Arm        10000          Deterministic    18.2 ms     0.1 ms
x86-64     100            Standard         1071.7 µs   363.3 µs
x86-64     100            Deterministic    22.7 µs     13.8 µs
x86-64     1000           Standard         3.8 ms      0.1 ms
x86-64     1000           Deterministic    0.2 ms      0.0 ms
x86-64     10000          Standard         22.0 ms     0.3 ms
x86-64     10000          Deterministic    2.4 ms      0.1 ms

Table 5: Execution times for the exception re-throwing benchmark.

For 1000 stack frames on x86-64, the difference between execution times is larger, with our implementation executing 19× faster. With 100 frames, the results have a very high standard deviation, but there is a clear order of magnitude between the two times, indicating that the difference in performance remains largely independent of the number of frames.

On Arm the additional time taken to initialise the exceptionstate in our implementation is evident as it takes roughly fivetimes longer than in the propagation benchmark The standardimplementation also suffers a performance hit but one which de-creases as the number of frames increases suggesting that a largepart of it is constant This matches the fact that in the standard

CC rsquo19 February 16ndash17 2019 Washington DC USA James Renwick Tom Spink and Bjoumlrn Franke

implementation the exception object is allocated on the heap creat-ing a large up-front cost but one which is amortised across manyre-throwings

In reality however re-throwing at every frame is highly un-usual behaviour making both this amortisation and our worseningperformance unlikely to be noticed

43 Timing and Memory DeterminismWe base our understanding of the language and algorithmic require-ments for determinism on Verber and Colnaric [20] In general therequirements for deterministic execution fall into two categoriesdeterministic memory usage and deterministic execution time

Formemory usage we can further subdivide the requirement intostack usage and heap usage When no exception occurs our stackusage is static A fixed-size allocation for the exception state occursper noexcept function invocation throwing destructor invocationand try block entry

When an exception is thrown the object is either allocated with afixed size determined by the catch block or with a variable size thatcan be bounded by the maximum allocation size of all exceptionobjects thrown The functions called by our implementation havea well-defined order and number known at compile-time and arenot recursive While pointers are used to store this address theiraddresses are fixed within the range of the exception object bufferor the current stack frame

Our implementation does not require heap allocation Howeverit is possible for the program to allocate an arbitrary amount ofmemory if local object destructors throw exceptions For exam-ple during stack unwinding destructors may throw an exceptioncausing their own local objects to be destroyed which in turn maythrow additional exceptions

However since this potential execution flow is known at compile-time and is akin to normal function calls made directly via theirfunction definition rather than via function pointer static analysistools are able to give a bound for the number and type of exceptionsthrown and thus the maximum size of allocation Furthermore dueto the nature of stack unwinding recursion is not possible exceptwhere introduced explicitly by the developerrsquos code and thus thenumber of stack frames is known

Deterministic execution time generally refers to the ability tocalculate the worst-case execution time (WCET) as demanded bymost real-time systems For execution time unlike with existingimplementations it is trivial to calculate the WCET of all of ouroperations Aside from the allocation and initialisation of the localexception state and the checking of the active flag (all of whichhave deterministic timing) we have three operations

The first allocates the exception object in the exception buffer Inthis case the allocator is a stack allocator which merely advancesa pointer except where the developer does not specify a correctmaximum bound on exception allocation The second frees theexception object which again is trivially a pointer decrement Thefinal function resolves the base class ldquotype idrdquos when matchingpolymorphic exception types against catch handlers Our imple-mentationrsquos novel approach guarantees a fixed number of iterationsthrough the list of base classes again known at compile time andderived from the filter of the catch block

Therefore should a WCET analysis tool give correct predictionsfor programs not using exceptions it would also give such predic-tions for our implementation meeting the criteria for determinism

5 RELATEDWORKModern exception handling including the semantics of raising(throwing) and handling (catching) exceptions exception hierar-chies and control flow requirements were first introduced in [6]

In [11] inherent overheads both in terms of execution time andmemory usage in the current C++ implementations of exceptionsare identified A further evaluation on how C++ features impactembedded systems software is presented in [16] This report sin-gles out exceptions and its prerequisite run-time type information(RTTI) as ldquothe single most expensive feature an embedded designmay considerrdquo

Lang and Stuart [12] describe stack unwinding as it relates toexceptions in real-time systems and show how the worst-case ex-ecution time of the stack unwinding is unbounded This has laterbeen confirmed in [9] For our own definition of determinism wetake insight from [20] which details many of the requirementsnecessary for languages to be susceptible to time analysis

A new exception handling approach which is better suited tostatic analysis is presented in [13] but it requires substantial coderewriting SetJmpLongJmp and table-based exception handling arediscussed in[3] The results demonstrate large increases in binarysize and an execution time overhead of 10-15 Gylfason and Hjalm-tysson [8] develop a variant of table-based exception handling foruse within the Itanium port of the Linux kernel

Most relevant to the work present in this paper is [19] It focuseson propagating exceptions down the stack by duplexing the returnvalue of functions so as to fit the exception object itself within theregister or stackmemory used for the return value and importantlyto re-use the existing function return mechanism to perform stackunwinding This approach hinges on two complex changes to C++rsquosfunction calling convention though This kind of change would besignificant breaking compatibility with C code and all existing C++libraries Instead our novel exception implementationmaintains theexisting exception model by finding a new way to store and matchexceptions at run-time with little to no additional performanceimpact when exceptions are not thrown

6 SUMMARY amp CONCLUSIONSIn this paper we have developed a novel implementation of C++ ex-ceptions which better suits the requirements of embedded systemswhile still upholding the C++ core design tenets Our implemen-tation shows binary size decreases of up to 823 over existingimplementations performance improvements of up to 325 whenhandling exceptions minimal execution time overhead on x86-64and none on the Arm platform We have shown that our implemen-tation achieves memory and execution time determinism makingit significantly better-suited to calculating worst-case executiontimes a major hurdle in the adoption of exceptions

Our future work will investigate further code size improvementsresulting from code optimizations and shrinking of the exceptionstate object while reducing execution time by allocating the activeflag in the exception state to a register

Low-Cost Deterministic C++ Exceptions for Embedded Systems CC rsquo19 February 16ndash17 2019 Washington DC USA



CC '19, February 16–17, 2019, Washington, DC, USA. James Renwick, Tom Spink, and Björn Franke

Platform   Implementation   Size
Arm        Standard         59188 bytes
Arm        Deterministic    22068 bytes
x86-64     Standard         105312 bytes
x86-64     Deterministic    18600 bytes

Table 2: Final binary sizes for the xmlbench benchmark with the standard and deterministic exception implementations.

4.2 Results

The following sections detail the results of our observations on various facets of the exception handling infrastructure.

4.2.1 Unwind Library Overhead. Statically linked binaries participate in whole-program optimisation and, as such, have the exception handling and unwind functions removed when using our implementation. This allows for a direct comparison against binaries that use the standard implementation.

Table 2 shows the sizes of the binary built with both the standard and the deterministic exception implementations. The results for both x86-64 and Arm show substantial decreases in binary size when using our deterministic implementation. This measurement indicates how large the unwinding code is. Arm's unwinding code is smaller, but by removing it we see a decrease in binary size of 36.3 Kilobytes, a 62.7% reduction for our program. x86-64 has a much larger unwind library, with our implementation saving 84.7 Kilobytes, an 82.3% reduction in size for identical functionality.
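The savings quoted above follow directly from the sizes in Table 2. As a quick check, the sketch below recomputes them. The struct and function names are illustrative and not taken from the paper, and one Kilobyte is assumed to be 1024 bytes because that assumption reproduces the quoted figures.

```cpp
#include <cassert>
#include <cstdio>

// Binary sizes in bytes, copied from Table 2.
struct BinarySize {
    const char* platform;
    double standard_bytes;
    double deterministic_bytes;
};

// Absolute saving in Kilobytes (assumption: 1 Kilobyte = 1024 bytes).
inline double saved_kilobytes(const BinarySize& b) {
    return (b.standard_bytes - b.deterministic_bytes) / 1024.0;
}

// Relative saving as a percentage of the standard binary size.
inline double saved_percent(const BinarySize& b) {
    return 100.0 * (b.standard_bytes - b.deterministic_bytes) / b.standard_bytes;
}

// Prints one summary line per platform.
inline void print_savings(const BinarySize& b) {
    std::printf("%-7s saves %.1f KB (%.1f%%)\n", b.platform,
                saved_kilobytes(b), saved_percent(b));
}
```

Calling `print_savings` with the Arm row (59188 and 22068 bytes) and the x86-64 row (105312 and 18600 bytes) gives the rounded values used in the text.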

On both platforms, the code section (.text) grows due to the inclusion of additional instructions for checking the exception state. On x86-64, the removal of the unwind tables compensates for this growth, with a reduction of 0.3%. However, on Arm, missed optimization opportunities by the compiler lead to a net increase of roughly 8%.
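To illustrate where the extra instructions come from, the following is a hand-written emulation of checking an exception state after each call. It is only a sketch: in the implementation described here the checks are generated by the compiler, and the `ExceptionState` layout, its field names, and the three functions are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for the exception state; field names are
// illustrative and not taken from the authors' implementation.
struct ExceptionState {
    bool active = false;  // set while an exception is in flight
    int code = 0;         // stand-in for the exception object
};

// A "throwing" function receives the state and returns normally after
// marking it active.
inline int parse_digit(char c, ExceptionState& state) {
    if (c < '0' || c > '9') {
        state.active = true;
        state.code = 1;
        return 0;
    }
    return c - '0';
}

// Each call is followed by a check of the active flag. These checks are
// the kind of additional code that enlarges the code section.
inline int parse_two_digits(const char* s, ExceptionState& state) {
    int tens = parse_digit(s[0], state);
    if (state.active) return 0;  // propagate through the normal return path
    int ones = parse_digit(s[1], state);
    if (state.active) return 0;
    return tens * 10 + ones;
}

// The handling site owns a stack-allocated state and inspects it, playing
// the role of a try block with a catch handler.
inline int parse_or_default(const char* s, int fallback) {
    ExceptionState state;
    int value = parse_two_digits(s, state);
    if (state.active) return fallback;  // handler
    return value;
}
```

In this emulation the cost on the success path is one flag test per call, which matches the kind of overhead discussed in the benchmarks below.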

4.2.2 Application Benchmark Performance. We generated two sets of XML files, one with syntax errors and one without, and parsed them with xmlbench. The syntax error is contained within the deepest element, resulting in an exception thrown with the stack at its largest number of function calls. For each run, we measured the total run-time of the xmlbench program, including the start-up time.

Figure 9 shows the average execution time in microseconds per XML element parsed by our benchmark for both the Arm (Figure 9a) and x86-64 (Figure 9b) platforms. On the Arm platform, the results show there is little difference between execution times, except for the standard exception implementation generally performing the worst of the four when faced with an exception. As was expected, the standard implementation performs better when no exception is raised with only 156 elements, as our implementation must check for active exceptions. This difference disappears as more elements are processed.

On the x86-64 platform, our deterministic implementation takes slightly longer per element than the standard implementation for larger numbers of nodes. This is primarily due to the overhead of checking whether an exception has occurred. As is expected, the difference in time for our implementation depending on whether an exception was thrown or not is very small, particularly in comparison to the same difference for the standard implementation, which at its peak has a 2.3% overhead.

Platform   Elems   Throw   Time (σ), Standard Impl.   Time (σ), Determ. Impl.
Arm        156     No      42.7ms (0.6ms)             41.7ms (0.6ms)
Arm        156     Yes     43.7ms (0.6ms)             43.0ms (0.0ms)
Arm        3906    No      158.7ms (2.1ms)            162.7ms (1.5ms)
Arm        3906    Yes     160.7ms (1.5ms)            162.0ms (2.0ms)
x86-64     3905    No      3.3ms (0.6ms)              3.3ms (0.6ms)
x86-64     3905    Yes     3.3ms (0.6ms)              3.3ms (0.6ms)
x86-64     97655   No      68.0ms (1.0ms)             70.7ms (0.6ms)
x86-64     97655   Yes     69.0ms (1.0ms)             70.3ms (0.6ms)

Table 3: Overall execution time summary for xmlbench on the Arm and x86-64 platforms.

The benchmark clearly shows how throwing a single exception increases execution time for the standard exception implementation, given the lengthy unwind procedure. However, with our implementation, the improvements to exception handling speed are not quite enough to compensate for the overhead in the absence of exceptions. With more than a single exception, and particularly given the results for 488281 elements, where the trend suggests that our implementation scales better than the existing one, it should be enough for our solution to out-perform the current one.

As shown in Table 3, our implementation creates a small additional execution time overhead in comparison to the standard implementation; however, the difference is negligible and within the standard deviation.

4.2.3 Exception Propagation. To test exception propagation, we used a different benchmark, which simply called the same function recursively as many times as indicated by Stack Frames, with a local object having a simple custom destructor within each frame. After Stack Frames calls, an exception is thrown and caught by reference from the first invocation. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 4.
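The benchmark source is not reproduced in this section, so the following is only a minimal sketch of a benchmark with the shape described above, written against standard C++ exceptions. All names (`FrameGuard`, `recurse`, `run_propagation`) are hypothetical, the handler is placed in the function that makes the first call, and `std::chrono::steady_clock` measures elapsed time rather than the CPU time used in the experiments.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>

// Local object with a simple custom destructor, so every frame has
// clean-up work to run while the stack is unwound.
struct FrameGuard {
    int* counter;
    explicit FrameGuard(int* c) : counter(c) {}
    ~FrameGuard() { ++*counter; }
};

// Calls itself until `frames` frames are live, then throws from the deepest.
inline void recurse(int frames, int* destroyed) {
    FrameGuard guard(destroyed);
    if (frames <= 1) throw std::runtime_error("deepest frame");
    recurse(frames - 1, destroyed);
}

struct PropagationResult {
    long long nanoseconds;  // first call to arrival in the catch handler
    int destructors_run;    // number of FrameGuard destructors executed
    bool caught;
};

inline PropagationResult run_propagation(int frames) {
    PropagationResult result{0, 0, false};
    const auto start = std::chrono::steady_clock::now();
    try {
        recurse(frames, &result.destructors_run);
    } catch (const std::exception&) {  // caught by reference
        const auto stop = std::chrono::steady_clock::now();
        result.nanoseconds =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        result.caught = true;
    }
    return result;
}
```

Running `run_propagation` repeatedly for 100, 1000, and 10000 frames and averaging `nanoseconds` would give figures comparable in kind, though not in value, to the Standard rows of Table 4.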

The results show a significant difference in execution time between the two exception implementations. Our implementation is on average 98× faster at performing exception propagation on x86-64. The factor of improvement increases from 68× to 138× as the number of stack frames increases, indicating that the performance penalty seen in the current implementation increases as the number of frames increases. Our implementation performs best on x86-64 with deep call stacks.

On Arm, the improvement in execution time remains almost constant, at 32× to 33× for 100 and 1000 frames and a 35× speedup for 10000 frames. This suggests that our implementation will consistently offer improved performance, independent of the number of frames.
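The speedup factors above can be recomputed from the mean times in Table 4. The sketch below does so with all values converted to microseconds. The type and function names are illustrative, and the averaging (an arithmetic mean of the three per-row factors) is an assumption that happens to reproduce the quoted 98× figure.

```cpp
#include <cassert>

// Mean propagation times from Table 4, converted to microseconds.
struct TimingRow {
    int frames;
    double standard_us;
    double deterministic_us;
};

constexpr TimingRow kX86Rows[] = {
    {100, 114.7, 1.7}, {1000, 2572.0, 29.0}, {10000, 8823.8, 64.0}};
constexpr TimingRow kArmRows[] = {
    {100, 1351.3, 41.7}, {1000, 9600.0, 300.0}, {10000, 94800.0, 2700.0}};

// Speedup of the deterministic implementation over the standard one.
inline double speedup(const TimingRow& row) {
    return row.standard_us / row.deterministic_us;
}

// Arithmetic mean of the per-row speedups (assumed averaging method).
inline double mean_speedup(const TimingRow* rows, int count) {
    double sum = 0.0;
    for (int i = 0; i < count; ++i) sum += speedup(rows[i]);
    return sum / count;
}
```

With these inputs the x86-64 factors come out near 67.5, 88.7, and 137.9, and the Arm factors near 32.4, 32.0, and 35.1, in line with the rounded values in the text.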

[Figure 9 plots. (a) Arm: x-axis XML Elements (156, 781, 3906, 19531), y-axis Exec Time per element (µs), 0 to 300. (b) x86-64: x-axis XML Elements (3905, 19531, 97655, 488281), y-axis Exec Time per element (µs), 0.66 to 0.86. Series in both panels: Standard No Error, Standard With Error, Deterministic No Error, Deterministic With Error.]

Figure 9: Execution times in µs per XML element for different element counts with our xmlbench benchmark on Arm (a) and x86-64 (b). Lower is better.

Platform   Stack Frames   Implementation   Time        σ
Arm        100            Standard         1351.3µs    74.3µs
Arm        100            Deterministic    41.7µs      0.6µs
Arm        1000           Standard         9.6ms       0.2ms
Arm        1000           Deterministic    0.3ms       0.0ms
Arm        10000          Standard         94.8ms      1.0ms
Arm        10000          Deterministic    2.7ms       0.1ms
x86-64     100            Standard         114.7µs     21.6µs
x86-64     100            Deterministic    1.7µs       0.8µs
x86-64     1000           Standard         2572.0µs    231.2µs
x86-64     1000           Deterministic    29.0µs      0.0µs
x86-64     10000          Standard         8823.8µs    960.0µs
x86-64     10000          Deterministic    64.0µs      6.4µs

Table 4: Execution times for the exception propagation benchmark.

The improvements in execution time are a result of not having to resolve the encoded unwind information and walk the stack twice to locate the catch handler before unwinding.

4.2.4 Re-Throwing. To measure exception re-throwing, we used another benchmark, which called the same function recursively as many times as indicated by Stack Frames, with a local object having a custom destructor within each frame. After Stack Frames calls, an exception is thrown but, unlike before, caught and re-thrown by each function. The benchmark was compiled with the same flags as xmlbench.

The CPU time was measured between the first function call and arrival in the catch handler. The experiment was run multiple times for each configuration, and the mean and standard deviation of the results are tabulated in Table 5.
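As with the propagation benchmark, the source is not shown here, so the following is only a sketch of the catch-and-re-throw pattern described above, using standard C++ exceptions. The names `RethrowGuard`, `recurse_rethrow`, and `run_rethrow` are hypothetical, and timing is omitted to keep the control flow visible.

```cpp
#include <cassert>
#include <stdexcept>

// Local object with a custom destructor in every frame.
struct RethrowGuard {
    int* counter;
    explicit RethrowGuard(int* c) : counter(c) {}
    ~RethrowGuard() { ++*counter; }
};

// The deepest frame throws; every other frame catches the exception and
// re-throws it to its caller.
inline void recurse_rethrow(int frames, int* destroyed, int* rethrows) {
    RethrowGuard guard(destroyed);
    if (frames <= 1) throw std::runtime_error("deepest frame");
    try {
        recurse_rethrow(frames - 1, destroyed, rethrows);
    } catch (...) {
        ++*rethrows;
        throw;  // re-throw the exception currently being handled
    }
}

// Returns true if the exception reached the outermost handler.
inline bool run_rethrow(int frames, int* destroyed, int* rethrows) {
    try {
        recurse_rethrow(frames, destroyed, rethrows);
    } catch (const std::exception&) {
        return true;
    }
    return false;
}
```

For N frames, this sketch performs N - 1 re-throws, since the deepest frame throws without catching.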

These results show a similar pattern to those for straight propagation. On x86-64, for 10000 stack frames, the current implementation takes almost 3× longer to catch and re-throw than simply propagating, while our implementation is 38× slower than when simply propagating. This factor of slowdown is to be expected given how fast our implementation is when not re-throwing, but is also due to the multiple steps required to move the exception object into and then out of each stack frame. This is clearly a good target for optimisation, as such transfer is unnecessary. Despite this, our implementation yielded a 9× speedup over the standard implementation.

Platform   Stack Frames   Implementation   Time        σ
Arm        100            Standard         4.1ms       0.1ms
Arm        100            Deterministic    0.2ms       0.0ms
Arm        1000           Standard         33.5ms      0.4ms
Arm        1000           Deterministic    1.8ms       0.0ms
Arm        10000          Standard         318.4ms     6.3ms
Arm        10000          Deterministic    18.2ms      0.1ms
x86-64     100            Standard         1071.7µs    363.3µs
x86-64     100            Deterministic    22.7µs      13.8µs
x86-64     1000           Standard         3.8ms       0.1ms
x86-64     1000           Deterministic    0.2ms       0.0ms
x86-64     10000          Standard         22.0ms      0.3ms
x86-64     10000          Deterministic    2.4ms       0.1ms

Table 5: Execution times for the exception re-throwing benchmark.

For 1000 stack frames on x86-64, the difference between execution times is larger, with our implementation executing 19× faster. With 100 frames, the results have a very high standard deviation, but there is a clear order of magnitude between the two times, indicating that the difference in performance remains largely independent of the number of frames.

On Arm, the additional time taken to initialise the exception state in our implementation is evident, as it takes roughly five times longer than in the propagation benchmark. The standard implementation also suffers a performance hit, but one which decreases as the number of frames increases, suggesting that a large part of it is constant. This matches the fact that in the standard implementation the exception object is allocated on the heap, creating a large up-front cost, but one which is amortised across many re-throwings.

In reality, however, re-throwing at every frame is highly unusual behaviour, making both this amortisation and our worsening performance unlikely to be noticed.

4.3 Timing and Memory Determinism

We base our understanding of the language and algorithmic requirements for determinism on Verber and Colnarič [20]. In general, the requirements for deterministic execution fall into two categories: deterministic memory usage and deterministic execution time.

For memory usage, we can further subdivide the requirement into stack usage and heap usage. When no exception occurs, our stack usage is static. A fixed-size allocation for the exception state occurs per noexcept function invocation, throwing destructor invocation, and try block entry.

When an exception is thrown, the object is either allocated with a fixed size determined by the catch block, or with a variable size that can be bounded by the maximum allocation size of all exception objects thrown. The functions called by our implementation have a well-defined order and number, known at compile-time, and are not recursive. While pointers are used to store the address of the exception object, the addresses they hold are confined to the range of the exception object buffer or the current stack frame.
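A buffer whose capacity is fixed at compile time from the largest exception type can be sketched as follows. This is an illustration under stated assumptions, not the authors' allocator: the three exception types, the nesting limit `kMaxNested`, and the choice to return `nullptr` when the bound is exceeded are all hypothetical. It needs C++14 or later for the `constexpr` form of `std::max`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical exception types standing in for "all exception objects thrown".
struct XmlSyntaxError { int line; int column; };
struct FileError { int code; };
struct RangeError { long long low; long long high; long long value; };

// Compile-time upper bound on the size of a single exception object.
constexpr std::size_t kMaxExceptionSize =
    std::max({sizeof(XmlSyntaxError), sizeof(FileError), sizeof(RangeError)});

constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t round_up(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}

// Assumed limit on simultaneously live exception objects.
constexpr std::size_t kMaxNested = 2;
constexpr std::size_t kCapacity = kMaxNested * round_up(kMaxExceptionSize);

class ExceptionBuffer {
public:
    // Allocation only advances an offset inside the fixed buffer.
    void* allocate(std::size_t size) {
        const std::size_t n = round_up(size);
        if (n > kCapacity - used_) return nullptr;  // bound was too small
        void* p = storage_ + used_;
        used_ += n;
        return p;
    }
    // Freeing the most recent object only moves the offset back.
    void release(std::size_t size) {
        const std::size_t n = round_up(size);
        assert(n <= used_);
        used_ -= n;
    }
    std::size_t used() const { return used_; }

private:
    alignas(std::max_align_t) unsigned char storage_[kCapacity];
    std::size_t used_ = 0;
};
```

Every pointer returned by `allocate` lies inside `storage_`, which mirrors the property stated above that stored addresses stay within the exception object buffer.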

Our implementation does not require heap allocation. However, it is possible for the program to allocate an arbitrary amount of memory if local object destructors throw exceptions. For example, during stack unwinding, destructors may throw an exception, causing their own local objects to be destroyed, which in turn may throw additional exceptions.

However, since this potential execution flow is known at compile-time, and is akin to normal function calls made directly via their function definition rather than via function pointer, static analysis tools are able to give a bound for the number and type of exceptions thrown, and thus the maximum size of allocation. Furthermore, due to the nature of stack unwinding, recursion is not possible except where introduced explicitly by the developer's code, and thus the number of stack frames is known.

Deterministic execution time generally refers to the ability to calculate the worst-case execution time (WCET), as demanded by most real-time systems. For execution time, unlike with existing implementations, it is trivial to calculate the WCET of all of our operations. Aside from the allocation and initialisation of the local exception state and the checking of the active flag (all of which have deterministic timing), we have three operations.

The first allocates the exception object in the exception buffer. In this case, the allocator is a stack allocator, which merely advances a pointer, except where the developer does not specify a correct maximum bound on exception allocation. The second frees the exception object, which again is trivially a pointer decrement. The final function resolves the base class "type id"s when matching polymorphic exception types against catch handlers. Our implementation's novel approach guarantees a fixed number of iterations through the list of base classes, again known at compile time and derived from the filter of the catch block.
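The third operation can be pictured as a loop with a compile-time iteration count over a flattened list of type ids. The sketch below is an assumption-laden illustration: this section does not give the data layout, so the integer ids, the per-type id arrays, and `matches_handler` are hypothetical, and here the bound is the length of the thrown type's list rather than anything derived from the catch filter.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical type ids; a real implementation would use ids generated by
// the compiler.
using TypeId = int;
constexpr TypeId kBaseError = 1;
constexpr TypeId kParseError = 2;
constexpr TypeId kSyntaxError = 3;
constexpr TypeId kIoError = 4;

// Flattened id lists (the type itself, then its bases) for the hierarchy
// SyntaxError : ParseError : BaseError and IoError : BaseError.
constexpr std::array<TypeId, 3> kSyntaxErrorIds{{kSyntaxError, kParseError, kBaseError}};
constexpr std::array<TypeId, 2> kIoErrorIds{{kIoError, kBaseError}};

// Checks whether a handler for `handler_id` accepts the thrown type. The
// loop always runs exactly N times, so its cost has a fixed upper bound.
template <std::size_t N>
bool matches_handler(const std::array<TypeId, N>& thrown_ids, TypeId handler_id) {
    bool found = false;
    for (std::size_t i = 0; i < N; ++i) {
        found = found || (thrown_ids[i] == handler_id);
    }
    return found;
}
```

Because the loop has no early exit and N is a template parameter, a timing analysis can treat the match as a constant number of comparisons.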

Therefore, should a WCET analysis tool give correct predictions for programs not using exceptions, it would also give such predictions for our implementation, meeting the criteria for determinism.

5 RELATED WORK

Modern exception handling, including the semantics of raising (throwing) and handling (catching) exceptions, exception hierarchies, and control flow requirements, was first introduced in [6].

In [11], inherent overheads, both in terms of execution time and memory usage, in the current C++ implementations of exceptions are identified. A further evaluation of how C++ features impact embedded systems software is presented in [16]. This report singles out exceptions and their prerequisite run-time type information (RTTI) as "the single most expensive feature an embedded design may consider".

Lang and Stewart [12] describe stack unwinding as it relates to exceptions in real-time systems and show how the worst-case execution time of the stack unwinding is unbounded. This has later been confirmed in [9]. For our own definition of determinism we take insight from [20], which details many of the requirements necessary for languages to be susceptible to time analysis.

A new exception handling approach which is better suited to static analysis is presented in [13], but it requires substantial code rewriting. SetJmp/LongJmp and table-based exception handling are discussed in [3]. The results demonstrate large increases in binary size and an execution time overhead of 10-15%. Gylfason and Hjalmtysson [8] develop a variant of table-based exception handling for use within the Itanium port of the Linux kernel.

Most relevant to the work presented in this paper is [19]. It focuses on propagating exceptions down the stack by duplexing the return value of functions, so as to fit the exception object itself within the register or stack memory used for the return value and, importantly, to re-use the existing function return mechanism to perform stack unwinding. This approach hinges on two complex changes to C++'s function calling convention, though. This kind of change would be significant, breaking compatibility with C code and all existing C++ libraries. Instead, our novel exception implementation maintains the existing exception model by finding a new way to store and match exceptions at run-time, with little to no additional performance impact when exceptions are not thrown.

6 SUMMARY & CONCLUSIONS

In this paper we have developed a novel implementation of C++ exceptions which better suits the requirements of embedded systems, while still upholding the C++ core design tenets. Our implementation shows binary size decreases of up to 82.3% over existing implementations, performance improvements of up to 325% when handling exceptions, minimal execution time overhead on x86-64, and none on the Arm platform. We have shown that our implementation achieves memory and execution time determinism, making it significantly better suited to calculating worst-case execution times, a major hurdle in the adoption of exceptions.

Our future work will investigate further code size improvements resulting from code optimizations and shrinking of the exception state object, while reducing execution time by allocating the active flag in the exception state to a register.


REFERENCES

[1] David W. Binkley. 1997. C++ in Safety Critical Systems. Ann. Softw. Eng. 4, 1-4 (Jan. 1997), 223–234. http://dl.acm.org/citation.cfm?id=590565.590588

[2] Beman Dawes, David Abrahams, and R. Rivera. 2014. Boost C++ libraries. http://www.boost.org/doc/libs/1_55_0/libs/libraries.htm

[3] Christophe de Dinechin. 2000. C++ Exception Handling for IA64. In Proceedings of the First Workshop on Industrial Experiences with Systems Software, WIESS 2000, October 22, 2000, San Diego, CA, USA, Dejan S. Milojicic (Ed.). USENIX, 67–76. http://www.usenix.org/events/wiess2000/dinechin.html

[4] Bruce Eckel, Chuck D. Allison, and Chuck Allison. 2003. Thinking in C++, Vol. 2 (2 ed.). Pearson Education.

[5] C++ exception handling 2012. ARM Compiler toolchain Compiler Reference.

[6] John B. Goodenough. 1975. Exception Handling: Issues and a Proposed Notation. Commun. ACM 18, 12 (Dec. 1975), 683–696. https://doi.org/10.1145/361227.361230

[7] Google. 2018. Google C++ Style Guide. https://google.github.io/styleguide/cppguide.html

[8] Halldor Isak Gylfason and Gisli Hjalmtysson. 2004. Exceptional Kernel–Using C++ exceptions in the Linux kernel.

[9] Wolfgang A. Halang and Matjaž Colnarič. 2002. Dealing with exceptions in safety-related embedded systems. IFAC Proceedings Volumes 35, 1 (2002), 477–482.

[10] Pieter Hintjens. 2014. ZeroMQ: The Guide. http://zeromq.org

[11] ISO/IEC JTC1 SC22 WG21. 2006. ISO/IEC TR 18015: Technical Report on C++ Performance. Technical Report ISO/IEC TR 18015:2006. ISO/IEC. https://www.iso.org/standard/43351.html

[12] Jun Lang and David B. Stewart. 1998. A Study of the Applicability of Existing Exception-handling Techniques to Component-based Real-time Software Technology. ACM Trans. Program. Lang. Syst. 20, 2 (March 1998), 274–301. https://doi.org/10.1145/276393.276395

[13] Roy Levin. 1977. Program Structures for Exceptional Condition Handling. Technical Report. Carnegie-Mellon Univ., Pittsburgh, PA, Dept. of Computer Science.

[14] Mark Maimone. 2014. C++ On Mars. In Proceedings of the C++ Conference (CppCon 2014).

[15] Prakash Prabhu, Naoto Maeda, Gogul Balakrishnan, Franjo Ivančić, and Aarti Gupta. 2011. Interprocedural exception analysis for C++. In European Conference on Object-Oriented Programming. Springer, 583–608.

[16] César A. Quiroz. 1998. Using C++ efficiently in embedded applications. In Proceedings of the Embedded Systems Conference.

[17] Richard Smith et al. 2017. Working draft, standard for programming language C++. Technical Report.

[18] Alexander D. Stoyenko. 2012. Constructing predictable real time systems. Vol. 146. Springer Science & Business Media.

[19] Herb Sutter. 2018. P0709: Zero-overhead deterministic exceptions: Throwing values. Technical Report R0. SG14.

[20] Domen Verber and Matjaž Colnarič. 1996. Programming and time analysis of hard real-time applications. IFAC Proceedings Volumes 29, 6 (1996), 79–84.




CC '19, February 16–17, 2019, Washington, DC, USA. James Renwick, Tom Spink, and Björn Franke

implementation, the exception object is allocated on the heap, creating a large up-front cost, but one which is amortised across many re-throwings.

In reality, however, re-throwing at every frame is highly unusual behaviour, making both this amortisation and our worsening performance unlikely to be noticed.

4.3 Timing and Memory Determinism

We base our understanding of the language and algorithmic requirements for determinism on Verber and Colnarič [20]. In general, the requirements for deterministic execution fall into two categories: deterministic memory usage and deterministic execution time.

For memory usage, we can further subdivide the requirement into stack usage and heap usage. When no exception occurs, our stack usage is static. A fixed-size allocation for the exception state occurs per noexcept function invocation, throwing destructor invocation, and try block entry.
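As a rough illustration of why stack usage stays static when nothing is thrown, the sketch below models the exception state as a small fixed-size struct that lives in the caller's frame and is handed to the callee by pointer. The names `ExceptionState`, `may_fail`, and `caller` are hypothetical, and passing the state as an explicit pointer is an assumption made for readability; in a real implementation the compiler would generate this plumbing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size exception state. Its size is known at compile time,
// so each try block or noexcept function adds a constant amount of stack.
struct ExceptionState {
    bool  active = false;   // set while an exception is in flight
    void* object = nullptr; // address of the thrown object, if any
};

// Stand-in for a function that may throw: it reports failure by setting
// the active flag in the state supplied by its caller.
inline int may_fail(int input, ExceptionState* state) {
    static int error_code;          // stand-in for a thrown object
    if (input < 0) {
        error_code = input;
        state->object = &error_code;
        state->active = true;
        return 0;                   // return value is ignored when active
    }
    return input * 2;
}

// Stand-in for a try block: allocate the state locally, make the call,
// then test the flag once after the call returns.
inline int caller(int input) {
    ExceptionState state;           // fixed-size stack allocation
    int result = may_fail(input, &state);
    if (state.active) {
        return -1;                  // "catch" path
    }
    return result;
}
```

On the non-throwing path the only extra work in this sketch is initialising the struct and testing one flag, both of which take constant time.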

When an exception is thrown, the object is either allocated with a fixed size determined by the catch block, or with a variable size that can be bounded by the maximum allocation size of all exception objects thrown. The functions called by our implementation have a well-defined order and number, known at compile-time, and are not recursive. While pointers are used to store the exception object's address, the addresses they hold are confined to the exception object buffer or the current stack frame.
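One way to picture the bound on a variable-size allocation is to take the largest `sizeof` over every type that can be thrown. The sketch below does this at compile time; the exception types `SmallError` and `LargeError` and the helper `max_exception_size` are hypothetical and serve only to illustrate the bound, not how the implementation derives it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical exception types used only for this illustration.
struct SmallError { int code; };
struct LargeError { char message[64]; int code; };

// Upper bound on the size of any single exception object: the largest
// sizeof among all types that may be thrown, evaluated at compile time.
template <typename... Exceptions>
constexpr std::size_t max_exception_size() {
    return std::max({sizeof(Exceptions)...});
}

constexpr std::size_t kMaxExceptionSize =
    max_exception_size<SmallError, LargeError>();

static_assert(kMaxExceptionSize >= sizeof(SmallError),
              "bound must cover every listed type");
```

Because the bound is a constant expression, it can size a statically allocated buffer directly.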

Our implementation does not require heap allocation. However, it is possible for the program to allocate an arbitrary amount of memory if local object destructors throw exceptions. For example, during stack unwinding, destructors may throw an exception, causing their own local objects to be destroyed, which in turn may throw additional exceptions.

However, since this potential execution flow is known at compile-time and is akin to normal function calls made directly via their function definition, rather than via function pointer, static analysis tools are able to give a bound for the number and type of exceptions thrown, and thus the maximum size of allocation. Furthermore, due to the nature of stack unwinding, recursion is not possible except where introduced explicitly by the developer's code, and thus the number of stack frames is known.

Deterministic execution time generally refers to the ability to calculate the worst-case execution time (WCET), as demanded by most real-time systems. For execution time, unlike with existing implementations, it is trivial to calculate the WCET of all of our operations. Aside from the allocation and initialisation of the local exception state and the checking of the active flag (all of which have deterministic timing), we have three operations:

The first allocates the exception object in the exception buffer. In this case the allocator is a stack allocator, which merely advances a pointer, except where the developer does not specify a correct maximum bound on exception allocation. The second frees the exception object, which again is trivially a pointer decrement. The final function resolves the base class "type id"s when matching polymorphic exception types against catch handlers. Our implementation's novel approach guarantees a fixed number of iterations through the list of base classes, again known at compile time and derived from the filter of the catch block.
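The first two operations can be sketched as a fixed-capacity buffer with stack discipline. This is a minimal illustration under stated assumptions: the class name `ExceptionBuffer`, the rounding to `std::max_align_t` alignment, and returning `nullptr` when the bound is exceeded are all choices made for this sketch, and the behaviour of the actual implementation when the bound is wrong is not modelled here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity exception buffer with stack (LIFO) discipline.
template <std::size_t Capacity>
class ExceptionBuffer {
public:
    // Allocate by advancing the top-of-stack offset: constant time.
    // Returns nullptr when the configured bound would be exceeded.
    void* allocate(std::size_t size) {
        const std::size_t rounded = round_up(size);
        if (rounded > Capacity - top_) return nullptr;
        void* p = storage_ + top_;
        top_ += rounded;
        return p;
    }

    // Free the most recently allocated object by moving the offset back.
    void free_last(std::size_t size) {
        const std::size_t rounded = round_up(size);
        assert(rounded <= top_);
        top_ -= rounded;
    }

    std::size_t used() const { return top_; }

private:
    // Round a request up to the strictest fundamental alignment so every
    // returned address is suitably aligned for any exception type.
    static std::size_t round_up(std::size_t n) {
        const std::size_t align = alignof(std::max_align_t);
        return (n + align - 1) / align * align;
    }

    alignas(std::max_align_t) unsigned char storage_[Capacity];
    std::size_t top_ = 0;
};
```

Both `allocate` and `free_last` contain no loops and touch no global allocator state, so their execution time does not depend on how many exceptions have been allocated before.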

Therefore, should a WCET analysis tool give correct predictions for programs not using exceptions, it would also give such predictions for our implementation, meeting the criteria for determinism.
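To make the bounded base-class matching concrete, the sketch below stores, for each thrown type, a fixed-length list of type ids (the type itself followed by its bases) and compares a handler's id against it with a loop whose trip count is a compile-time constant. Everything here is an assumption for illustration: the `TypeId` enumeration, the `TypeInfo` layout, and the use of a single global depth limit `kMaxDepth`. The text above says the bound is derived from the filter of the catch block, which this simplified sketch does not reproduce.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical type ids for a small exception hierarchy:
// FileNotFound -> IoError -> BaseError, and ParseError -> BaseError.
enum class TypeId : unsigned { None, BaseError, IoError, FileNotFound, ParseError };

constexpr std::size_t kMaxDepth = 3;   // assumed bound on hierarchy depth

struct TypeInfo {
    std::array<TypeId, kMaxDepth> ids; // the type itself, then its bases
    std::size_t count;                 // number of valid entries in ids
};

constexpr TypeInfo kFileNotFound{
    {TypeId::FileNotFound, TypeId::IoError, TypeId::BaseError}, 3};
constexpr TypeInfo kParseError{
    {TypeId::ParseError, TypeId::BaseError, TypeId::None}, 2};

// Returns true if a handler for `handler` can catch an object described by
// `thrown`. The loop always runs exactly kMaxDepth iterations.
inline bool matches(const TypeInfo& thrown, TypeId handler) {
    bool found = false;
    for (std::size_t i = 0; i < kMaxDepth; ++i) {
        if (i < thrown.count && thrown.ids[i] == handler) {
            found = true;
        }
    }
    return found;
}
```

For example, `matches(kFileNotFound, TypeId::IoError)` succeeds because `IoError` appears in the thrown type's list, whereas `matches(kFileNotFound, TypeId::ParseError)` does not.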

5 RELATED WORK

Modern exception handling, including the semantics of raising (throwing) and handling (catching) exceptions, exception hierarchies, and control flow requirements, was first introduced in [6].

In [11], inherent overheads in the current C++ implementations of exceptions, both in terms of execution time and memory usage, are identified. A further evaluation of how C++ features impact embedded systems software is presented in [16]. This report singles out exceptions and their prerequisite, run-time type information (RTTI), as "the single most expensive feature an embedded design may consider".

Lang and Stewart [12] describe stack unwinding as it relates to exceptions in real-time systems and show how the worst-case execution time of the stack unwinding is unbounded. This was later confirmed in [9]. For our own definition of determinism we take insight from [20], which details many of the requirements necessary for languages to be susceptible to time analysis.

A new exception handling approach which is better suited to static analysis is presented in [13], but it requires substantial code rewriting. SetJmp/LongJmp and table-based exception handling are discussed in [3]. The results demonstrate large increases in binary size and an execution time overhead of 10-15%. Gylfason and Hjalmtysson [8] develop a variant of table-based exception handling for use within the Itanium port of the Linux kernel.

Most relevant to the work presented in this paper is [19]. It focuses on propagating exceptions down the stack by duplexing the return value of functions, so as to fit the exception object itself within the register or stack memory used for the return value and, importantly, to re-use the existing function return mechanism to perform stack unwinding. This approach hinges on two complex changes to C++'s function calling convention, though. This kind of change would be significant, breaking compatibility with C code and all existing C++ libraries. Instead, our novel exception implementation maintains the existing exception model by finding a new way to store and match exceptions at run-time, with little to no additional performance impact when exceptions are not thrown.
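The idea of duplexing the return value can be imitated by hand in standard C++, which may help show why it reuses the ordinary return path for unwinding. The sketch below is only an analogy: [19] proposes language and calling-convention changes, whereas the `IntOrError` struct, the `Error` type, and the two parsing functions here are hypothetical and written out explicitly.

```cpp
#include <cassert>

// Hypothetical error value small enough to travel in the return slot.
struct Error { int code; };

// Return slot shared by the normal result and the error, plus a discriminant.
struct IntOrError {
    bool failed;
    union {
        int   value;   // valid when failed == false
        Error error;   // valid when failed == true
    };
};

inline IntOrError parse_digit(char c) {
    IntOrError r;
    if (c < '0' || c > '9') {
        r.failed = true;
        r.error = Error{1};
    } else {
        r.failed = false;
        r.value = c - '0';
    }
    return r;
}

inline IntOrError parse_two_digits(char tens, char ones) {
    IntOrError t = parse_digit(tens);
    if (t.failed) return t;          // propagate through the ordinary return
    IntOrError o = parse_digit(ones);
    if (o.failed) return o;
    IntOrError r;
    r.failed = false;
    r.value = t.value * 10 + o.value;
    return r;
}
```

Each caller tests the discriminant and returns early on failure, so local objects are destroyed by the normal function epilogue rather than by a separate unwinder.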

6 SUMMARY & CONCLUSIONS

In this paper we have developed a novel implementation of C++ exceptions which better suits the requirements of embedded systems, while still upholding the C++ core design tenets. Our implementation shows binary size decreases of up to 82.3% over existing implementations, performance improvements of up to 325% when handling exceptions, minimal execution time overhead on x86-64, and none on the Arm platform. We have shown that our implementation achieves memory and execution time determinism, making it significantly better-suited to calculating worst-case execution times, a major hurdle in the adoption of exceptions.

Our future work will investigate further code size improvements resulting from code optimizations and shrinking of the exception state object, while reducing execution time by allocating the active flag in the exception state to a register.

REFERENCES

[1] David W. Binkley. 1997. C++ in Safety Critical Systems. Ann. Softw. Eng. 4, 1-4 (Jan. 1997), 223–234. http://dl.acm.org/citation.cfm?id=590565.590588
[2] Beman Dawes, David Abrahams, and R. Rivera. 2014. Boost C++ libraries. http://www.boost.org/doc/libs/1_55_0/libs/libraries.htm
[3] Christophe de Dinechin. 2000. C++ Exception Handling for IA64. In Proceedings of the First Workshop on Industrial Experiences with Systems Software, WIESS 2000, October 22, 2000, San Diego, CA, USA, Dejan S. Milojicic (Ed.). USENIX, 67–76. http://www.usenix.org/events/wiess2000/dinechin.html
[4] Bruce Eckel, Chuck D. Allison, and Chuck Allison. 2003. Thinking in C++, Vol. 2 (2 ed.). Pearson Education.
[5] C++ exception handling 2012. ARM Compiler toolchain Compiler Reference.
[6] John B. Goodenough. 1975. Exception Handling: Issues and a Proposed Notation. Commun. ACM 18, 12 (Dec. 1975), 683–696. https://doi.org/10.1145/361227.361230
[7] Google. 2018. Google C++ Style Guide. https://google.github.io/styleguide/cppguide.html
[8] Halldor Isak Gylfason and Gisli Hjalmtysson. 2004. Exceptional Kernel–Using C++ exceptions in the Linux kernel.
[9] Wolfgang A. Halang and Matjaž Colnarič. 2002. Dealing with exceptions in safety-related embedded systems. IFAC Proceedings Volumes 35, 1 (2002), 477–482.
[10] Pieter Hintjens. 2014. ZeroMQ: The Guide. http://zeromq.org
[11] ISO/IEC JTC1 SC22 WG21. 2006. ISO/IEC TR 18015: Technical Report on C++ Performance. Technical Report ISO/IEC TR 18015:2006. ISO/IEC. https://www.iso.org/standard/43351.html
[12] Jun Lang and David B. Stewart. 1998. A Study of the Applicability of Existing Exception-handling Techniques to Component-based Real-time Software Technology. ACM Trans. Program. Lang. Syst. 20, 2 (March 1998), 274–301. https://doi.org/10.1145/276393.276395
[13] Roy Levin. 1977. Program Structures for Exceptional Condition Handling. Technical Report. Carnegie-Mellon Univ., Pittsburgh, PA, Dept. of Computer Science.
[14] Mark Maimone. 2014. C++ On Mars. In Proceedings of the C++ Conference (CppCon 2014).
[15] Prakash Prabhu, Naoto Maeda, Gogul Balakrishnan, Franjo Ivančić, and Aarti Gupta. 2011. Interprocedural exception analysis for C++. In European Conference on Object-Oriented Programming. Springer, 583–608.
[16] César A. Quiroz. 1998. Using C++ efficiently in embedded applications. In Proceedings of the Embedded Systems Conference.
[17] Richard Smith et al. 2017. Working draft, standard for programming language C++. Technical Report.
[18] Alexander D. Stoyenko. 2012. Constructing predictable real time systems. Vol. 146. Springer Science & Business Media.
[19] Herb Sutter. 2018. P0709: Zero-overhead deterministic exceptions: Throwing values. Technical Report R0. SG14.
[20] Domen Verber and Matjaž Colnarič. 1996. Programming and time analysis of hard real-time applications. IFAC Proceedings Volumes 29, 6 (1996), 79–84.
