Empirical Study of Object-layout Strategies and Optimization Techniques


Empirical Study of Object-layout Strategies and Optimization Techniques. M.Sc. seminar (in the proceedings of ECOOP 2000). Natalie Eckel. Supervisor: Dr. Joseph (Yossi) Gil, Computer Science Department, Technion - The Israel Institute of Technology.

Transcript of Empirical Study of Object-layout Strategies and Optimization Techniques

  • Empirical Study of Object-layout Strategies and Optimization Techniques
    Natalie Eckel
    Supervisor: Dr. Joseph (Yossi) Gil
    Computer Science Department, Technion - The Israel Institute of Technology
    M.Sc. seminar (in the proceedings of ECOOP 2000)

  • Outline
    Overhead incurred due to multiple inheritance: VPTRs and VBPTRs
    The separate compilation dilemma
    Hierarchies used in our experiments
    Distribution of object size
    Optimization techniques:
      Elimination of transitive virtual inheritance
      Inlining virtual bases
      Bidirectional layout
      Hermaphrodite bidirectional layout
      Packing VBPTRs

  • The Subobject Rule
    Basic rule of OO: if class B inherits from class A, then every object of B must have inside it a subobject of A.
    Example (B. Meyer): if a SoftwareEngineer is an Engineer, then there is a part in every software engineer which is an engineer.
    Rationale: procedures and methods expecting objects of A should also be able to operate on an object of type B.
    [Figure: a Software Engineer object containing an Engineer subobject]
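
    The rule can be illustrated in C++ with a minimal sketch. The class members and the helper names (add_project, engineer_part_at_offset_zero) are hypothetical and only follow the slide's example; the offset-0 check reflects what mainstream compilers do for single inheritance, not a guarantee of the language standard.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes named after the slide's example.
struct Engineer {
    std::string name;
    int projects = 0;
};

// Every SoftwareEngineer object contains an Engineer subobject.
struct SoftwareEngineer : Engineer {
    std::string language;
};

// Written against Engineer only; it knows nothing about SoftwareEngineer.
int add_project(Engineer& e) {
    return ++e.projects;
}

// True if the Engineer subobject starts at the same address as the whole
// object, i.e. at fixed offset 0 (an assumption about the compiler's layout).
bool engineer_part_at_offset_zero(SoftwareEngineer& se) {
    Engineer* part = &se;  // implicit up-cast to the subobject
    return static_cast<void*>(part) == static_cast<void*>(&se);
}

void check_subobject_rule() {
    SoftwareEngineer se;
    assert(add_project(se) == 1);  // an Engineer procedure accepts a SoftwareEngineer
    assert(se.projects == 1);
    assert(engineer_part_at_offset_zero(se));
}
```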

  • The VPTR
    VPTR: virtual table pointer. A pointer leading from every object and every subobject to a table of virtual functions (and other RTTI).
    Single inheritance: the VPTR can be shared between an object, its subobject, its sub-subobject, etc. The VPTR is laid out at offset 0.
    Multiple inheritance: the VPTR can be shared with only one subobject.
    [Figure: a Software Engineer object with its Engineer subobject; the shared VPTR points to the virtual functions table (VTBL)]
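
    A rough way to observe VPTR sharing is to compare sizeof for classes that have virtual functions but no data members. This is a sketch under the assumption of a typical layout (for example the Itanium C++ ABI used by GCC and Clang); Musician and SingingEngineer are hypothetical names, and the exact sizes are not fixed by the C++ standard.

```cpp
#include <cassert>
#include <cstddef>

struct Engineer {
    virtual ~Engineer() = default;
    virtual int salary() const { return 1; }
};

// Single inheritance: the derived object can reuse the base's VPTR.
struct SoftwareEngineer : Engineer {
    int salary() const override { return 2; }
};

struct Musician {
    virtual ~Musician() = default;
    virtual int gigs() const { return 0; }
};

// Two roots: only one base subobject can share its VPTR with the whole
// object, so the layout needs (at least) two VPTRs.
struct SingingEngineer : Engineer, Musician {};

// Object size expressed in pointer-sized words. Since the classes above have
// no data members, this approximates the number of compiler-generated fields.
std::size_t size_in_words(std::size_t bytes) {
    return bytes / sizeof(void*);
}

void check_vptr_counts() {
    // Assumption: typical ABI. Sharing keeps the single-inheritance size unchanged.
    assert(sizeof(SoftwareEngineer) == sizeof(Engineer));
    // One VPTR per root in the multiple-inheritance case.
    assert(size_in_words(sizeof(SingingEngineer)) >= 2);
}
```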

  • The VBPTR
    VBPTR: virtual base pointer.
    Answers the question: where is the subobject?
    Occurs only in the multiple inheritance case.
    Rationale: the diamond problem. It is impossible for class Person to have a fixed offset with respect to both Teacher and Student.
    Solution: [Figure: the diamond hierarchy Person, Teacher, Student, TeachingAssistant (TA), and the TA object layout with its VPTRs and VBPTRs]
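
    The diamond from the slide can be written down directly in C++. The sketch below uses hypothetical data members; it checks that a TeachingAssistant holds a single shared Person, and that the distance from a Teacher subobject to its Person differs between a stand-alone Teacher and a Teacher inside a TeachingAssistant. The second check is an assumption about common ABIs, not something the standard promises.

```cpp
#include <cassert>
#include <cstddef>

struct Person  { int id = 0; virtual ~Person() = default; };
struct Teacher : virtual Person { int courses = 0; };
struct Student : virtual Person { int credits = 0; };
struct TeachingAssistant : Teacher, Student {};

// With virtual inheritance both paths lead to the same Person subobject.
bool has_single_person(TeachingAssistant& ta) {
    Person* via_teacher = static_cast<Teacher*>(&ta);
    Person* via_student = static_cast<Student*>(&ta);
    return via_teacher == via_student;
}

// Distance in bytes from a Teacher (sub)object to its Person subobject.
// The up-cast is resolved at run time, which is the job of the VBPTR
// (or an equivalent virtual-base offset stored in the virtual table).
std::ptrdiff_t person_offset_in_teacher(Teacher& t) {
    Person* p = &t;
    return reinterpret_cast<char*>(p) - reinterpret_cast<char*>(&t);
}

void check_diamond() {
    TeachingAssistant ta;
    Teacher alone;
    assert(has_single_person(ta));
    // Assumption (typical layouts): the offset is not fixed across complete types.
    assert(person_offset_in_teacher(alone) != person_offset_in_teacher(ta));
}
```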

  • Experimental Setting

    Hierarchy              Language   Weight in experiments   Classes   Inheritance links
    Unidraw                C++          7.2%                      613                 476
    Self                   Self        21.1%                     1801                1838
    Laure                  Laure        3.5%                      295                 315
    JDK 1.1                Java        19.3%                     1654                1927
    Eiffel 4               Eiffel      23.4%                     1999                2678
    Ed                     LOV          5.1%                      434                 750
    LOV                    LOV          5.1%                      436                 774
    Geode                  LOV         15.4%                     1318                2785
    Total:                            100%                       8550               11543
    Used in benchmarking:                                        6898                9616

  • No Dynamic Measurements
    Objective: estimate the saving for all possible object sizes.
    The chicken and egg problem: people may not use MI because of its current overhead.
    Dynamic measurement adds other factors:
      Selection of inputs
      How to deal with libraries?
      Correlated instantiations
      Cache

  • The Topology of Hierarchies

    Hierarchy   Depth   Average number of parents   Percentage of virtual bases   Average number of virtual bases
    Unidraw         8   1.02                        0.3%                          0.02
    Self           16   1.05                        0.2%                          0.73
    Laure          11   1.07                        3.7%                          2.86
    JDK 1.1         8   1.23                        1.1%                          0.52
    Eiffel 4       14   1.34                        3.2%                          2.49
    Ed              8   1.73                        5.3%                          3.79
    LOV             9   1.78                        5.5%                          3.99
    Geode          11   2.11                        7.6%                          8.37
    Total:         16   1.39                        2.9%                          2.62

  • Overheads of Multiple Inheritance
    Space overhead:
      VPTR: if a class X inherits from n roots, then its objects will have at least n VPTRs in their layout.
      VBPTR: one to every shared base, usually more than one.
    Time overhead:
      VPTR: add/subtract an offset, i.e., "this" adjustment, in down- and up-casts (not dealt with here).
      VBPTR: follow pointers in up-casts.
    Inessential VBPTRs (used by some compilers):
      Add a transitive edge to shortcut every chain of VBPTRs.
      Minimizes time overhead.
      Induces space overhead.
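
    The "this" adjustment can be made visible by comparing addresses before and after a cast. A minimal sketch with hypothetical classes A, B and C; the claim that the second base sits at a positive offset assumes the usual layout in which non-virtual bases are placed in declaration order.

```cpp
#include <cassert>
#include <cstddef>

struct A { virtual ~A() = default; int a = 1; };
struct B { virtual ~B() = default; int b = 2; };
struct C : A, B { int c = 3; };

// Number of bytes the compiler adds to the object address in the up-cast C -> B.
std::ptrdiff_t upcast_adjustment_to_B(C& obj) {
    B* as_b = &obj;  // the compiler emits the "this" adjustment here
    return reinterpret_cast<char*>(as_b) - reinterpret_cast<char*>(&obj);
}

// The down-cast subtracts the same offset again.
bool roundtrip_restores_address(C& obj) {
    B* as_b = &obj;
    return static_cast<C*>(as_b) == &obj;
}

void check_this_adjustment() {
    C c;
    assert(upcast_adjustment_to_B(c) > 0);  // assumption: B is laid out after A
    assert(roundtrip_restores_address(c));
}
```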

  • Compilation Models
    Given an inheritance link (a,b), is it:
      Simple inheritance (no diamonds)?
      Virtual inheritance (a diamond might show up later)?
    Whole program analysis:
      The whole picture is available for compilation.
      The compiler assigns virtual inheritance to solve diamond problems.
    Separate compilation:
      The compiler must make the decision without seeing the whole picture.
      Solution: all inheritance links are treated as virtual.
    C++ compilation model:
      The user takes the responsibility to assign virtual inheritance.
      We consider C++ compilers with whole program information.

  • Distribution of Object Size
    Definition: object size is the total number of compiler-generated fields in the layout of objects of a certain class.

  • Cost of Using Separate Compilation Over C++ Compilation Model

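  • Elimination of Transitive Virtual Inheritance
    A preliminary step to more sophisticated techniques.
    Can be done in any compilation model.
    [Figure: classes V, A and B connected by virtual (v) inheritance edges, before and after the elimination; one virtual edge is marked "this edge is transitive!"]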
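
    One possible reading of the step, sketched below: a virtual edge from a class to a base is dropped when the same base is already reached as a virtual base through another direct parent. The graph representation (Edge, Hierarchy) and the function names are assumptions made for illustration, not the authors' implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Edge { std::string parent; bool is_virtual; };
// Maps each class name to its direct inheritance links.
using Hierarchy = std::map<std::string, std::vector<Edge>>;

// True if some inheritance path from `from` ends in a virtual edge into `target`.
bool reaches_virtually(const Hierarchy& h, const std::string& from,
                       const std::string& target) {
    auto it = h.find(from);
    if (it == h.end()) return false;
    for (const Edge& e : it->second) {
        if (e.parent == target && e.is_virtual) return true;
        if (reaches_virtually(h, e.parent, target)) return true;
    }
    return false;
}

// Returns a copy of the hierarchy without transitive virtual edges.
Hierarchy eliminate_transitive_virtual_edges(const Hierarchy& h) {
    Hierarchy out;
    for (const auto& entry : h) {
        std::vector<Edge>& kept = out[entry.first];
        for (const Edge& e : entry.second) {
            bool transitive = false;
            if (e.is_virtual) {
                for (const Edge& other : entry.second) {
                    if (other.parent != e.parent &&
                        reaches_virtually(h, other.parent, e.parent)) {
                        transitive = true;
                        break;
                    }
                }
            }
            if (!transitive) kept.push_back(e);
        }
    }
    return out;
}

void check_elimination() {
    Hierarchy h;
    h["V"];                             // root
    h["B"].push_back(Edge{"V", true});  // B : virtual V
    h["A"].push_back(Edge{"B", true});  // A : virtual B,
    h["A"].push_back(Edge{"V", true});  //     virtual V  (transitive)
    Hierarchy out = eliminate_transitive_virtual_edges(h);
    assert(out["A"].size() == 1 && out["A"][0].parent == "B");
}
```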

  • The Efficacy
    Definition: the efficacy of an optimization technique for a certain class is the relative reduction in object size for that class due to application of the technique.

    Definition: accumulative efficacy = (x,y) means that x% of classes experience at least a y% reduction in their object size.
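
    Both definitions translate into a few lines of arithmetic. The sketch below uses a hypothetical ClassSize record holding the number of compiler-generated fields before and after a technique is applied; the sample numbers in the check are made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Object size (compiler-generated fields) before and after an optimization.
struct ClassSize { int before; int after; };

// Efficacy for one class: relative reduction in object size, in percent.
double efficacy(const ClassSize& c) {
    if (c.before == 0) return 0.0;
    return 100.0 * (c.before - c.after) / c.before;
}

// The x of an accumulative efficacy (x,y): the percentage of classes whose
// object size shrinks by at least y percent.
double classes_with_reduction_at_least(const std::vector<ClassSize>& classes,
                                       double y) {
    if (classes.empty()) return 0.0;
    int count = 0;
    for (const ClassSize& c : classes)
        if (efficacy(c) >= y) ++count;
    return 100.0 * count / classes.size();
}

void check_efficacy() {
    // Made-up sizes: efficacies are 50%, 0%, 25% and 50%.
    std::vector<ClassSize> classes = {{10, 5}, {4, 4}, {8, 6}, {20, 10}};
    assert(efficacy(classes[0]) == 50.0);
    // Half of the classes shrink by at least 50%: accumulative efficacy (50%,50%).
    assert(classes_with_reduction_at_least(classes, 50.0) == 50.0);
}
```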

  • Efficacy of Elimination of Transitive Virtual Inheritance
    Eliminates 4.1% of inheritance links.
    Reduces the fraction of virtual inheritance links from 35.2% to 28.6%.
    Accumulative efficacy = (8%,8%)

  • Inlining of Virtual Bases
    Inlining: lay out a virtual base inside a child, thus eliminating at least one VBPTR.
    Has the potential of saving a VPTR.
    A virtual base can be inlined into several children, as long as the shared inheritance semantics is obeyed.
    Not without whole program analysis! Must examine descendants!
    Can we inline X into Y? No! But we have to see Z to understand why: due to the repeated inheritance semantics of C++, class Z has two Y objects in it. If Y has X inlined into it, then there would be two copies of X in Z, which contradicts the C++ semantics.
    [Figure: hierarchy of classes X, Y, W and Z, where Y inherits virtually (v) from X]
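
    The obstacle can be reproduced in plain C++. The hierarchy below is a hypothetical variant of the slide's figure (two intermediate classes W1 and W2 instead of a single W) so that the repeated inheritance of Y is unambiguous to name; it shows two Y subobjects sharing one X.

```cpp
#include <cassert>

struct X { int x = 0; };
struct Y : virtual X { int y = 0; };  // X is a virtual base of Y
struct W1 : Y {};                     // non-virtual: each path carries its own Y
struct W2 : Y {};
struct Z : W1, W2 {};                 // repeated inheritance of Y

// Z contains two distinct Y subobjects ...
bool z_has_two_y(Z& z) {
    Y* first  = static_cast<W1*>(&z);
    Y* second = static_cast<W2*>(&z);
    return first != second;
}

// ... but only one shared X. If X were laid out inside every Y, Z would end
// up with two copies of X, which is why X cannot be inlined into Y here.
bool z_has_one_x(Z& z) {
    X* via_first  = static_cast<W1*>(&z);
    X* via_second = static_cast<W2*>(&z);
    return via_first == via_second;
}

void check_repeated_inheritance() {
    Z z;
    assert(z_has_two_y(z));
    assert(z_has_one_x(z));
}
```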

  • Inlining Techniques
    Devirtualization of single virtual inheritance:
      V is inlined into E.
    Simple inlining: devirtualization + inline into one child.
      V is inlined into E and either A, B, C or D.
    Aggressive inlining: find a maximally independent set of children to inline into.
      Classes are independent if they don't share a descendant.
      V is inlined into E, either A or B, and either C or D.
    [Figure: hierarchy with virtual base V, its children A, B, C, D and E, and further descendants F and G]
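
    For a small number of children, the independent set for aggressive inlining can be found by brute force. The sketch below is an exhaustive search over subsets, not the authors' algorithm. The conflict matrix in the check encodes an assumption about the figure: A and B share a descendant, and so do C and D, while E shares none.

```cpp
#include <cassert>
#include <vector>

// conflict[i][j] is true when children i and j share a descendant.
using Conflicts = std::vector<std::vector<bool>>;

int count_bits(unsigned mask) {
    int bits = 0;
    for (; mask != 0; mask >>= 1) bits += mask & 1u;
    return bits;
}

// Exhaustive search (exponential, fewer than 32 children): returns a bitmask
// of a largest set of pairwise independent children.
unsigned largest_independent_set(const Conflicts& conflict) {
    const unsigned n = static_cast<unsigned>(conflict.size());
    unsigned best = 0;
    for (unsigned mask = 0; mask < (1u << n); ++mask) {
        bool independent = true;
        for (unsigned i = 0; i < n && independent; ++i) {
            if (!(mask & (1u << i))) continue;
            for (unsigned j = i + 1; j < n; ++j) {
                if ((mask & (1u << j)) && conflict[i][j]) {
                    independent = false;
                    break;
                }
            }
        }
        if (independent && count_bits(mask) > count_bits(best)) best = mask;
    }
    return best;
}

void check_aggressive_inlining() {
    // Children of V, indexed A=0, B=1, C=2, D=3, E=4.
    Conflicts conflict(5, std::vector<bool>(5, false));
    conflict[0][1] = conflict[1][0] = true;  // A and B share a descendant
    conflict[2][3] = conflict[3][2] = true;  // C and D share a descendant
    unsigned chosen = largest_independent_set(conflict);
    assert(count_bits(chosen) == 3);  // E, one of A/B, one of C/D
    assert(chosen & (1u << 4));       // E is always included
}
```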

  • Efficacy of Inlining Techniques
    [Figure: Simple Inlining vs. Aggressive Inlining]

    Technique                               Devirtualization   Simple      Aggressive
    Inlined fraction of inheritance links   17%                17%+2.4%    17%+6.3%
    Average efficacy (for big objects)      10-20%             25-30%      60-70%
    Accumulative efficacy                   (20%,25%)          (30%,30%)   (35%,50%)

  • Bidirectional Object Layout
    Idea: use both ascending and descending memory addresses for object layout.
    One VPTR can be saved in a marriage of a positive and a negative class.
    C has mixed directionality.
    [Figure: class C inheriting from negative class A- and positive class B+; standard layout (A, B, C) vs. bidirectional layout (A-, B+, C)]

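  • Bidirectional Layout of Virtual Functions Table
    The virtual function table must also have a directionality.
    Positive classes: entries 0, 1, 2, ...
    Negative classes: entries -1, -2, ...
    [Figure: one table with entries -3 to 4; A's virtual table occupies the negative entries, B's virtual table the non-negative entries; the functions introduced in C are marked as well]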
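
    The indexing scheme can be imitated with an ordinary array and a pointer into its middle. This is only a sketch of the idea with hypothetical function names; a real compiler would emit such a table itself rather than build it from a struct.

```cpp
#include <cassert>

using Fn = int (*)();

// Hypothetical virtual functions of a negative class A and a positive class B.
int a_first()  { return -1; }
int a_second() { return -2; }
int b_first()  { return 10; }
int b_second() { return 11; }

// One shared table for a married pair: A's entries grow downward from the
// address the VPTR points to, B's entries grow upward from it.
struct SharedTable {
    Fn slots[4] = { a_second, a_first, b_first, b_second };
    const Fn* vptr() const { return slots + 2; }  // points at entry 0
};

// Virtual call through the shared VPTR; index < 0 selects a function of the
// negative class, index >= 0 a function of the positive class.
int call(const Fn* vptr, int index) {
    return vptr[index]();
}

void check_bidirectional_table() {
    SharedTable table;
    assert(call(table.vptr(), -1) == -1);  // A's first function
    assert(call(table.vptr(), -2) == -2);  // A's second function
    assert(call(table.vptr(), 0) == 10);   // B's first function
    assert(call(table.vptr(), 1) == 11);   // B's second function
}
```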

  • The Theorem of Marriage
    The BIG question: how to assign directionality to classes to maximize savings?
    Whole program analysis: various algorithms and heuristics are possible.
    Separate compilation: assign directionality at random! (Actually, use a good hash function.)
    The theorem of marriage: with random assignments, a class that has n roots will enjoy an expected saving of at least (n/2)/2 = n/4 VPTRs. In other words, about half of all root classes will eventually find a mate.
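
    The n/4 figure can be checked by brute force for small n. The sketch assumes the simple model from the speaker notes: the n roots are grouped into fixed pairs, and a pair marries exactly when its two classes received opposite directionalities. The function averages the number of marriages over all 2^n assignments.

```cpp
#include <cassert>
#include <cstdint>

// Expected number of marriages (saved VPTRs) among n roots when every root
// gets a random directionality. Bit i of `assignment` is the direction of
// root i; roots (0,1), (2,3), ... form the candidate pairs. Intended for
// small n only, since all 2^n assignments are enumerated.
double expected_marriages(unsigned n) {
    const std::uint64_t assignments = std::uint64_t{1} << n;
    std::uint64_t total = 0;
    for (std::uint64_t assignment = 0; assignment < assignments; ++assignment) {
        for (unsigned i = 0; i + 1 < n; i += 2) {
            const bool first_positive  = (assignment >> i) & 1u;
            const bool second_positive = (assignment >> (i + 1)) & 1u;
            if (first_positive != second_positive) ++total;  // opposite: married
        }
    }
    return static_cast<double>(total) / static_cast<double>(assignments);
}

void check_theorem_of_marriage() {
    assert(expected_marriages(4) == 1.0);  // 4/4
    assert(expected_marriages(8) == 2.0);  // 8/4
}
```

    Pairing the roots in advance is the pessimistic case; a smarter matching over the random directions can only marry more classes, which is consistent with the "at least" in the theorem.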

  • Marriages of Non-Virtual and Virtual Bases
    Once classes A and B are married in C, they remain married in all of C's descendants.
    However, a marriage of virtual bases cannot be permanent:
      V1 and V2 are married in A.
      V2 and V3 are married in B.
      What happens in C?
    Each class marries its virtual bases independently of what its ancestors did.
    Theorem: if there are n virtual base classes, then the number of marriages is n/2 - O(n); that is the expectation for the separate compilation model.
    [Figure: classes A, B and C with virtual bases V1+, V2- and V3+]

  • Bidirectional Layout Efficacy
    Applied after aggressive inlining.
    C++ compilation model with inessential VBPTRs:
      Big objects have 20% of their size occupied by VPTRs.
      5% savings for big objects: a quarter of the VPTRs, as predicted.
      Accumulative efficacy = (30%,30%)
    Separate compilation without inessential VBPTRs:
      The number of VPTRs and VBPTRs is about the same.
      15-20% savings for big objects: almost a half of the VPTRs, as predicted.
      Accumulative efficacy = (60%,18%)

  • Hermaphrodite Bidirectional Object Layout
    Bidirectional layout drawback: two base classes with the same directionality will never be married.
    Hermaphroditing: a directed (hermaphrodite) class has two types of instances: positive and negative.
    Two hermaphrodite classes can always be married.

  • Efficacy of Hermaphrodite Bidirectional Layout
    Settings compared: C++ compilation model with inessential VBPTRs; separate compilation without inessential VBPTRs.
    Applied after aggressive inlining.
    Accumulative efficacies shown: (33%,33%) and (50%,25%).
    Makes savings for all classes of size 2 and more!

  • Packing VBPTRs
    Observation: objects are laid out consecutively in memory.
    Motivation: in large objects VBPTRs occupy 80-90% of their size.
    Idea: instead of using full-blown pointers to virtual base sub-objects, use offsets.
    Assumption: machine word = 4 bytes.
    Small objects (under size 1K): an offset to a sub-object can be stored in one byte = 4 offsets in a word.
    Larger objects (under size 0.25MB): an offset can be stored in 2 bytes = 2 offsets in a word.
    A class can reuse empty slots in non-virtual bases.
    A class cannot reuse empty slots in virtual base sub-objects.
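
    The size limits on the slide are consistent with offsets counted in 4-byte words: 1K bytes is 256 words, which fits one byte, and 0.25MB is 65536 words, which fits two bytes. The sketch below rests on that reading, which is an assumption, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Pack four one-byte offsets (measured in words) into a single 32-bit word.
std::uint32_t pack4(std::uint8_t o0, std::uint8_t o1,
                    std::uint8_t o2, std::uint8_t o3) {
    return static_cast<std::uint32_t>(o0)
         | (static_cast<std::uint32_t>(o1) << 8)
         | (static_cast<std::uint32_t>(o2) << 16)
         | (static_cast<std::uint32_t>(o3) << 24);
}

// Read the offset stored in slot 0..3 of a packed word.
std::uint8_t unpack(std::uint32_t word, unsigned slot) {
    return static_cast<std::uint8_t>((word >> (8 * slot)) & 0xFFu);
}

// Bytes needed per offset for an object of the given size, assuming offsets
// are counted in 4-byte words rather than in bytes.
unsigned bytes_per_offset(std::uint32_t object_size_bytes) {
    const std::uint32_t words = (object_size_bytes + 3) / 4;
    if (words <= 256) return 1;    // up to 1K: 4 offsets in a word
    if (words <= 65536) return 2;  // up to 0.25MB: 2 offsets in a word
    return 4;                      // otherwise a full word
}

void check_packing() {
    const std::uint32_t word = pack4(1, 2, 3, 250);
    assert(unpack(word, 0) == 1);
    assert(unpack(word, 3) == 250);
    assert(bytes_per_offset(1000) == 1);
    assert(bytes_per_offset(100000) == 2);
}
```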

  • Efficacy of Packing in C++ Compilation Model
    [Figure: two plots, "2 slots in word" and "4 slots in word"]
    Expected savings:
      4 slots in word: saves 60-70% in object size
      2 slots in word: saves 40-45% in object size

  • Summary
    Evils of virtual inheritance and different compilation models.
    Distribution of object size votes against separate compilation.
    Optimization techniques:
      Inlining (not so trivial).
      Aggressive inlining.
      Bidirectional layout.
        Architectural support.
      Hermaphroditing idea.
        Secure savings for all sizes of objects.
        Possible run-time costs for checking the instance directionality.
      Packing VBPTRs.
    The bottom line: savings in the range of 40% can be achieved for all object sizes!

  • Future Research
    Dynamic measurements
    More optimization techniques
    Efficient implementation of Java interfaces

    Purpose of this work:
      Study the space overhead incurred due to multiple inheritance in different compilation models.
      Suggest optimization algorithms for reducing object size.
      Benchmarking.

    Cost of using multiple inheritance (no data about data members).
    Compiler-generated fields.
    Dependent on the particular compilation model.
    "Inside" means at a fixed offset known at compilation time.
    In C++ virtual functions have dynamic binding, i.e. an appropriate copy of a function is called according to the dynamic type of an object.

    Engineer shares a VPTR with Software Engineer, since a VPTR is always at offset 0 and both the Engineer sub-object and the Software Engineer object start at the same address.
    Standard object layout.
    In C++ there is virtual and non-virtual inheritance. Dashed arrows symbolize virtual inheritance.
    TA shares a VPTR only with Student; it cannot share one with Teacher.
    JDK wasn't used for benchmarking because of the very restricted form of MI in Java.
    LOV: a language similar to Eiffel, developed by Verilog, a distributor of CASE tools.
    Laure: Yves Caseau.
    Hierarchy's weight: number of classes divided by the total number of classes.

    Chicken and the egg: on one side we want to estimate object sizes and savings in class hierarchies which widely use multiple and repeated inheritance. However, it's very hard to find those hierarchies, since people are afraid of using multiple inheritance because of its current overhead.
    Dynamic measurements: an application creates different objects with different inputs.
    Ask Yossi: libraries?
    Correlated instantiations: instantiations of one type of objects will cause instantiation of other types of objects.
    Ask Yossi: cache? Objects are saved for future use?

    Depth: maximal length (in inheritance links) from a root class to a leaf.
    All hierarchies are pretty shallow: their depth doesn't exceed the depth of an AVL balanced tree.
    Ask Yossi: what's the out-degree in an AVL tree?
    Average number of parents: the extent of using multiple inheritance. In single inheritance == 1.
    Hierarchies are presented in ascending order by this parameter.
    Eiffel, LOV: extensive usage of MI; others not so much.

    Simple case: picture a class inheriting from n base classes (which are roots). It can share a VPTR with only one of them, hence n VPTRs. The number of VPTRs will be more than n if there are virtual bases involved, because a virtual base cannot share a VPTR with a derived class: you need a fixed offset for this.

    Q. Why more than one VBPTR for a shared base? Show a diamond problem

    Time overhead (not dealt with here):
      VPTRs: for calling virtual functions, "this" adjustment for MI.
      VBPTRs: accessing data members from a virtual base.

    Separate compilation dilemma: while compiling a class B, we have already compiled all its ancestors, but know nothing about its descendants.

    C++ compilation model: the current C++ compilation model doesn't maintain whole program information. Our goal is to show a version of the C++ compilation model which does use whole program information, and to convince the reader that it's profitable.

    250 = 250 and more

    7% of classes in separate compilation have more than 250 compiler-generated fields.
    C++: almost no classes of size more than 50.
    The cheapest (in space) separate compilation, without inessentials, vs. the wasteful C++ compilation model, with inessentials.
    Note: cheapest in space means a run-time cost.

    Less than 10% of classes suffer from C++. Maybe draw an example where C++ is worse than separate compilation.
    50% of classes suffer from a 150% increase in object size due to separate compilation.

    Saves both VBPTRs and VPTRs: "inside a child" means at a fixed offset, so there is no need for a VBPTR and, potentially, a VPTR can be shared between the child and the virtual base.
    Mention that finding a maximum independent set of nodes in a graph is an NP-hard problem, so we used exhaustive (exponential) search for small cases (under 50 nodes) and a heuristic for big ones.
    Hyperbolic curves y=f/x, where f is the number of saved compiler-generated fields.
    Requires architectural support.
    Simple case: no virtual inheritance, no repeated inheritance, a full binary tree with a class as the root and n different leaves. If the directionalities are assigned at random to the leaves, each pair of leaves that has a common parent can be married with probability 1/2. Before bidirectional layout a class had n VPTRs; there could be n/2 marriages, each saving a VPTR with probability 1/2 => n/2*1/2 = n/4 VPTRs saved.
    A marriage of non-virtual bases is permanent because the fixed offset must be preserved.

    Virtual bases are married separately for each class.
    In C++ compilation with inessentials: in large objects 20% of the object size is occupied by VPTRs.
    In separate compilation without inessentials: in large objects 40-50% of the object size is occupied by VPTRs; all class ancestors are its virtual bases, and no non-virtual bases exist.
    For each directed class there are two kinds of instances, positive and negative ones. Hence two directed classes can always be married by choosing appropriate directionalities for their instances.
    Requires a run-time resolution of the directionality of an object.
    Very successful in separate compilation, since all class ancestors are its virtual bases.
    Q. Why can slots in non-virtual bases be reused? A. The offset remains fixed.
    Q. Why can't slots in virtual bases be reused? A. It requires more than one "this" adjustment at run time.
    Savings without using any other optimization techniques.
    We get the expected savings.
    Separate compilation: compile time might be better, but by no means is it a better choice than whole program analysis regarding space.
    3% of classes are virtual bases.