Quantifying Uncertainty in Points-To Relations
Constantino Ribeiro and Marcelo Cintra
University of Edinburgh
http://www.homepages.inf.ed.ac.uk/mc/Projects/VESPA
LCPC 2006 2
Contributions
Scope
– Measure and compare the sizes of static vs. dynamic points-to sets from a context- and flow-sensitive algorithm
Goal
– Quantification of may-alias behavior that is intrinsic to applications
– Classification of reasons for the difference between static prediction and run-time behavior
Relevance
– Important step toward future aggressive (speculative) optimizations
This work is not about a new pointer analysis algorithm

Outline
Motivation
Pointer Analysis
Evaluation Methodology
Experimental Setup and Results
Related Work
Conclusions

Compiler Optimizations
To optimize well, a compiler must have accurate knowledge of:
– Data flow:
    Redundant variable elimination
    Constant propagation
    Register allocation
– Control flow:
    Dead code elimination
    Instruction scheduling

Data Flow Analysis
Data flow analysis: 100% precision is difficult to achieve because of:
– Use of pointer variables
    Same pointer may refer to different memory objects at different times
    Same pointer may refer to many memory objects at some program point
– Use of procedures
    Side effects caused by call by reference and access to global data
– Presence of control flow structures
    Multiple def-use chains

Real Points-to Behavior
So we want to
– Understand the points-to behavior in real applications
– Discover the causes of the ambiguities from static analysis
– Facilitate more aggressive optimizations for ambiguous points-to relations

Points-to analysis
Data dependence analysis for pointer variables
At each point of the program: set of pointer variables and the locations that they point to
Pointer variables may point to one address or to many addresses
Pointer variables can even point to other pointers
Many possible points-to targets restrict optimizations in conservative compilers
Procedures and their calls increase the complexity and time of the analysis

Types of Algorithms
Sensitivity:
– Flow-sensitive: points-to sets are computed for each program point
– Flow-insensitive: a single points-to set is computed regardless of statement order
– Context-sensitive: points-to sets within procedures are computed for each call site
– Context-insensitive: points-to sets within procedures are merged over all call sites
– Flow-sensitive + Context-sensitive → more precise analysis
Granularity:
– Fine: individual fields of complex data structures
– Coarse: whole data structures and arrays
Naming of dynamically created memory objects:
– Single name “heap”
– Per memory allocation site
– Per context
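The difference between the two flow options can be seen on a three-statement toy program. The sketch below is an illustration only: the program, the variable names and both helper functions are assumptions made for this example, and the flow-sensitive variant handles only straight-line address-of assignments with strong updates.

```python
# Toy straight-line program: each entry (pointer, target) stands for `pointer = &target;`.
# The program and all names are illustrative assumptions.
program = [("p", "a"), ("q", "a"), ("p", "b")]


def flow_insensitive(stmts):
    """One points-to set per pointer for the whole program; statement order is ignored."""
    result = {}
    for ptr, target in stmts:
        result.setdefault(ptr, set()).add(target)
    return result


def flow_sensitive(stmts):
    """One points-to map per program point; each assignment overwrites the old target."""
    state = {}
    per_point = []
    for ptr, target in stmts:
        state = dict(state)      # copy so earlier program points stay intact
        state[ptr] = {target}    # strong update: the previous target is killed
        per_point.append(state)
    return per_point


insensitive = flow_insensitive(program)
sensitive = flow_sensitive(program)

print(insensitive["p"])     # both a and b are reported for p
print(sensitive[-1]["p"])   # only b remains after the last statement
```

The flow-insensitive result reports two possible targets for `p` everywhere, while the flow-sensitive result reports a single target at every program point, which is the precision gain the slide refers to.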

Formal Representation
Location sets or locsets: individual named memory locations, where:
– Points-to relations (R): tuples (p, v) where
    p: pointer
    v: location set
– P and V: sets of pointers and location sets, where
    R ⊂ P × V: points-to relation
– Every tuple (p, v) ∈ R means: pointer p may point to location set v
    p → v
– Points-to graph: G = (N, E) with N = P ∪ V nodes and E = R edges
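As a small illustration of these definitions, the relation R can be held as a set of tuples and the graph derived from it. This is a sketch under assumptions: the example relation and the helper names `points_to_graph` and `targets` are invented here and are not part of the tool described in the talk.

```python
# Points-to relation R as a set of (pointer, location set) tuples.
# ("q", "p") models a pointer that points to another pointer.
R = {("p", "v"), ("p", "z"), ("q", "p")}

P = {p for p, _ in R}   # pointers appearing in R
V = {v for _, v in R}   # location sets appearing in R


def points_to_graph(relation):
    """Return (N, E): N is every pointer or location set in the relation, E is the relation itself."""
    nodes = {node for edge in relation for node in edge}
    edges = set(relation)
    return nodes, edges


def targets(relation, pointer):
    """All location sets that `pointer` may point to."""
    return {v for p, v in relation if p == pointer}


N, E = points_to_graph(R)
print(sorted(N))
print(sorted(targets(R, "p")))
```

Note that `p` is in both P and V here, because it is a pointer and also the target of `q`.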

Formal Representation
Analysis: compute the points-to graph from:
– Basic dataflow equations for the pointer manipulation operations:
    p1 = &p2; (Address-of assignment)
    p1 = p2; (Copy assignment)
    p1 = *p2; (Load assignment)
    *p1 = p2; (Store assignment)
– Resulting in: a points-to graph with all points-to relationships:
    Definitely points-to
    Possibly points-to
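The effect of the four assignment kinds on points-to sets can be sketched with a simple fixed-point iteration. Note the substitution: this is a flow- and context-insensitive, inclusion-based (Andersen-style) formulation chosen for brevity, not the context- and flow-sensitive algorithm evaluated in this talk. The statement encoding and the function name `analyze` are assumptions made for the example.

```python
from collections import defaultdict


def analyze(stmts):
    """stmts: list of (kind, lhs, rhs), kind in {"addr", "copy", "load", "store"}.

    Returns a dict mapping each variable to the set of locations it may point to.
    """
    pts = defaultdict(set)
    changed = True
    while changed:                      # iterate until no set grows any further
        changed = False
        for kind, lhs, rhs in stmts:
            if kind == "addr":          # lhs = &rhs;
                updates = [(lhs, {rhs})]
            elif kind == "copy":        # lhs = rhs;
                updates = [(lhs, set(pts[rhs]))]
            elif kind == "load":        # lhs = *rhs;
                new = set()
                for t in list(pts[rhs]):
                    new |= pts[t]
                updates = [(lhs, new)]
            elif kind == "store":       # *lhs = rhs;
                updates = [(t, set(pts[rhs])) for t in list(pts[lhs])]
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            for var, new in updates:
                if not new <= pts[var]:
                    pts[var] |= new
                    changed = True
    return {var: s for var, s in pts.items() if s}


program = [
    ("addr", "a", "x"),    # a = &x;
    ("addr", "b", "y"),    # b = &y;
    ("addr", "p", "a"),    # p = &a;
    ("copy", "q", "p"),    # q = p;
    ("store", "q", "b"),   # *q = b;
    ("load", "r", "p"),    # r = *p;
]
result = analyze(program)
print(result)
```

Because updates here only ever add targets, the store `*q = b;` leaves `a` with two possible targets, and the load `r = *p;` inherits both. A flow-sensitive analysis could instead replace the old target of `a` at that point.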

Formal Representation
Where:
Definitely points-to: R = {(p, v)}: only p = &v
Possibly points-to: R = {(p, v), (p, z)}: either p = &v or p = &z
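A minimal sketch of how the two cases arise, assuming an if/else in which each branch assigns a different address to `p`. The helpers `classify` and `merge` and the label "none" for a pointer without targets are assumptions made for this example.

```python
def classify(relation, pointer):
    """Label a pointer by the number of targets it has in the relation."""
    found = {v for p, v in relation if p == pointer}
    if not found:
        return "none"
    return "definitely" if len(found) == 1 else "possibly"


def merge(*branches):
    """Join at a control-flow merge point: the union of the relations reaching it."""
    merged = set()
    for relation in branches:
        merged |= relation
    return merged


then_branch = {("p", "v")}   # if (c) p = &v;
else_branch = {("p", "z")}   # else   p = &z;
after_if = merge(then_branch, else_branch)

print(classify(then_branch, "p"))   # inside one branch
print(classify(after_if, "p"))      # after the branches join
```

Inside either branch `p` definitely points to one location set. After the join the analysis can only say that `p` possibly points to `v` or to `z`.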

Causes of Uncertainty in Pointer Analysis
Control flow
Pointer arithmetic
Unavailable procedure code
Recursive data structures
Aggregate data structures
Dynamically allocated objects

Static Source Code Analysis
An extension of Rugina and Rinard’s context- and flow-sensitive pointer analysis algorithm with the following new features:
– Number of accesses with pointer de-reference
– Number of used and modified locsets, computed just before each:
    Indirect use of a variable: ... = *p;
    Indirect modification of a variable: *p = ...;
    Multi-level indirect use of a variable: ... = **p;
    Multi-level indirect modification of a variable: **p = ...;
    Procedure call: foo(..., *p, ...);
– Loops: one instance of the cases above per pointer de-reference
– Procedures: one instance of each pointer de-reference per calling context

Run-time Statistics Collection
Our tool inserts additional profiling code that:
– Records all different run-time memory addresses
– Counts the number of accesses to each different address
Each run-time access has a unique identifier (source code line number) that matches the run-time access to the static access
Problem:
– Possible mismatches between static and dynamic:
    Multiple static accesses may map to the same source code line with the same run-time counter:
    – The pointer analysis algorithm separates static accesses according to their context
    Not all static accesses may appear at run time:
    – Portion of the code not executed due to input data
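The bookkeeping described on this slide can be sketched as one counter per access identifier. This models the idea only and is not the instrumentation the tool emits: the class `AccessProfile`, its methods, the identifiers and the addresses are all assumptions made for the example. The bucket names follow the run-time tables in this talk, with NE taken to mean "not executed".

```python
from collections import Counter, defaultdict


class AccessProfile:
    """Per-access record of distinct run-time addresses and how often each was hit."""

    def __init__(self, static_ids):
        self.static_ids = set(static_ids)    # identifiers known from static analysis
        self.counts = defaultdict(Counter)   # access id -> Counter of addresses

    def record(self, access_id, address):
        """Called by the (hypothetical) profiling code on every de-reference."""
        self.counts[access_id][address] += 1

    def bucket(self, access_id):
        """Classify an access by its number of distinct run-time targets."""
        n = len(self.counts.get(access_id, ()))
        if n == 0:
            return "NE"                      # never reached with this input
        if n <= 2:
            return f"N = {n}"
        return "N > 2"

    def summary(self):
        return Counter(self.bucket(i) for i in self.static_ids)


profile = AccessProfile(static_ids=[101, 102, 103])
for addr in (0x1000, 0x1000, 0x1000):
    profile.record(101, addr)                # always the same address
for addr in (0x2000, 0x2008, 0x2010):
    profile.record(102, addr)                # three different addresses
summary = profile.summary()                  # 103 is never executed
print(summary)
```

Keying the counters by source line alone would merge static accesses that the analysis keeps apart by calling context, which is the first mismatch the slide lists.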

Experimental Setup
Applications:
– SPEC2000 integer
    Except gcc, gap, vortex and eon
– MediaBench
– SPEC2000 fp tried but found to be not interesting as a pointer analysis problem
Standard input set used with run-time experiments

Applications Characteristics
Application   Suite      Lines of Code (KLOC)   Total Location Sets (Pointer)   Pointer Uses   Pointer Modifications
164.gzip      SPEC Int   9.1                    1,750 (246)                     113            43
175.vpr       SPEC Int   17                     3,959 (649)                     960            428
181.mcf       SPEC Int   1.9                    506 (194)                       16             13
186.crafty    SPEC Int   12                     4,920 (469)                     4,716          672
197.parser    SPEC Int   12                     3,631 (917)                     10,587         83
256.bzip2     SPEC Int   2.9                    887 (85)                        4              0
300.twolf     SPEC Int   17.5                   5,262 (950)                     751            79

Applications Characteristics
Application   Suite        Lines of Code (KLOC)   Total Location Sets (Pointer)   Pointer Uses   Pointer Modifications
epic          MediaBench   7.6                    397 (105)                       37             13
unepic        MediaBench   7.6                    531 (242)                       18             6
mpeg2enc      MediaBench   8.5                    2,179 (455)                     116            276
mpeg2dec      MediaBench   4.9                    1,605 (295)                     140            85
g721-enc      MediaBench   1                      393 (68)                        2              0
g721-dec      MediaBench   1                      122 (36)                        4              2
gsmencode     MediaBench   5.8                    448 (133)                       22             0
gsmdecode     MediaBench   5.8                    1,566 (599)                     168            31

Static Analysis Tool
Extension of SPAN package that:
– Records all instances of pointer de-references + number of possible targets + source code line number
– Uses and modifications via pointer de-references counted separately
– Static de-references to potentially uninitialized pointers use a special location set (unk) and are counted separately
– Static de-references to dynamically allocated memory use a special location set (heap.X, where X is context id) and are counted separately
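The counting scheme can be sketched as a tally of de-references by their number of possible targets, with the unk and heap cases tracked alongside. This is an illustration under assumptions, not the SPAN extension: the input encoding, the helper names and the sample data are invented, and each de-reference is assumed to have at least one target. The output tuple is laid out as count (including unk, including heap, number of source code lines), the layout used by the result tables in this talk.

```python
from collections import defaultdict


def bucket(n):
    """Column label for a de-reference with n possible targets (n >= 1 assumed)."""
    return f"N = {n}" if n <= 3 else "N > 3"


def tally(derefs):
    """derefs: iterable of (source_line, set_of_target_names), one per static de-reference."""
    table = defaultdict(lambda: {"count": 0, "unk": 0, "heap": 0, "lines": set()})
    for line, found in derefs:
        row = table[bucket(len(found))]
        row["count"] += 1
        row["unk"] += "unk" in found                              # True adds 1
        row["heap"] += any(t.startswith("heap.") for t in found)  # True adds 1
        row["lines"].add(line)
    return {
        label: (r["count"], r["unk"], r["heap"], len(r["lines"]))
        for label, r in table.items()
    }


derefs = [
    (10, {"a"}),
    (12, {"a", "unk"}),
    (12, {"b", "unk"}),                     # same line, different calling context
    (20, {"heap.1", "heap.2", "c", "d"}),
]
static_counts = tally(derefs)
print(static_counts)
```

Line 12 contributes two de-references but only one source line, which shows how a per-context count can exceed the number of source code lines.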

Static Analysis Results
Uses (u) and Modifications (m) with N possible targets: count (including unk target, including heap target, number of source code lines)
Application   Kind   N = 1    N = 2                N = 3               N > 3
gzip          u      277      none                 none                none
gzip          m      43       none                 none                none
vpr           u      2488     none                 none                none
vpr           m      428      none                 none                none
mcf           u      67       none                 none                6 (0, 0, 3)
mcf           m      13       none                 none                0 (0, 0, 0)
crafty        u      4970     542 (534, 67, 59)    2 (2, 2, 1)         119 (0, 26, 24)
crafty        m      479      47 (45, 11, 9)       146 (146, 66, 13)   0 (0, 0, 0)
parser        u      25178    241 (241, 241, 35)   36 (0, 0, 11)       7841 (181, 230, 259)
parser        m      20       32 (32, 32, 6)       0 (0, 0, 0)         31 (9, 4, 9)
bzip2         u      119      none                 none                none
bzip2         m      0        none                 none                none
twolf         u      3687     6 (6, 0, 6)          none                none
twolf         m      77       2 (2, 0, 2)          none                none

Static Analysis Results
Uses (u) and Modifications (m) with N possible targets: count (including unk target, including heap target, number of source code lines)
Application   Kind   N = 1   N = 2         N = 3   N > 3
epic          u      156     none          none    none
epic          m      13      none          none    none
unepic        u      59      none          none    none
unepic        m      6       none          none    none
mpeg2enc      u      395     none          none    none
mpeg2enc      m      279     none          none    none
mpeg2dec      u      499     8 (8, 8, 2)   none    6 (6, 6, 1)
mpeg2dec      m      75      0 (0, 0, 0)   none    10 (10, 10, 2)
g721-enc      u      22      none          none    none
g721-enc      m      0       none          none    none
g721-dec      u      6       none          none    none
g721-dec      m      2       none          none    none
gsmencode     u      154     none          none    none
gsmencode     m      0       none          none    none
gsmdecode     u      346     none          none    9 (0, 0, 9)
gsmdecode     m      31      none          none    0 (0, 0, 0)

Profiling Environment
Monitor the actual run-time behaviour of static pointer de-references with multiple possible targets
SPAN extension inserts profiling code that:
– Where a static de-reference has multiple targets, records the actual address accessed + a counter per address
Instrumented code is converted from SUIF format (.spd) to C code
Compiled on an Intel x86 platform with gcc 3.4.4 at the -O2 optimization level

Run-time Uncertainty
Uses and Modifications with N actual targets (NE: not executed)
Application   Uses: NE   N = 1   N = 2   N > 2   Modifications: NE   N = 1   N = 2   N > 2
mcf           1          2       0       0       -                   -       -       -
crafty        59         1       1       23      17                  0       0       5
parser        193        27      0       85      6                   1       0       8
twolf         1          5       0       0       2                   0       0       0
mpeg2dec      2          0       0       1       1                   0       0       1
gsmdecode     0          9       0       0       -                   -       -       -
Crafty uses, by source code lines:
– Static (N = 2, N = 3, N > 3): 59 + 1 + 24 = 84
– Run time (NE, N = 1, N = 2, N > 2): 59 + 1 + 1 + 23 = 84

Causes of Uncertainty
Behaviour difference
Static                      Actual              Cause                                                Number of cases
2 or more targets           Not executed        -                                                    282
2 targets (inclusive unk)   Single target       Pointer turns out to be always initialised           6
2 targets                   3 or more targets   Pointer arithmetic to index into array-like object   22
                                                Use of arrays                                        2
                                                Use of recursive data structures                     5
3 or more targets           Single target       Use of structure fields                              2
                                                Pointer arithmetic to index into array-like object   9
                                                Control path alternative never taken                 28
No change                                       -                                                    95
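The case counts in this table can be cross-checked with a little arithmetic. The sketch assumes the grouping of causes by behaviour difference read from the table above (6 cases for "always initialised", 22 + 2 + 5 for static 2 targets becoming 3 or more, 2 + 9 + 28 for static 3 or more targets becoming a single target). The dictionary keys are labels invented for this example.

```python
# Number of cases per behaviour difference, taken from the table above.
cases = {
    "not executed": 282,
    "2 targets incl. unk -> single target": 6,
    "2 targets -> 3 or more targets": 22 + 2 + 5,
    "3 or more targets -> single target": 2 + 9 + 28,
    "no change": 95,
}

total = sum(cases.values())
executed = total - cases["not executed"]
single_target = (
    cases["2 targets incl. unk -> single target"]
    + cases["3 or more targets -> single target"]
)
share = single_target / executed

print(total, executed, single_target)
print(f"{share:.1%} of executed ambiguous de-references hit a single target")
```

Under this reading, 45 of the 169 executed cases resolve to a single target at run time, which rounds to 27% and appears to be the figure quoted in the Conclusions.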

Related Work
Algorithms:
– The basic SUIF1 package used in our study (SPAN) was introduced by R. Rugina and M. Rinard (PLDI ‘99)
– E. M. Nystrom et al. proposed a fast and efficient summary-based pointer analysis algorithm (SAS ‘04)
– M. Hind surveyed the main pointer analysis research and discussed unsolved questions (PASTE ‘01)
Quantification of run-time behavior:
– Few works have investigated the impact of pointer analysis on overall compiler optimization, among them B. Cheng and W. M. Hwu, M. Das et al., and R. Ghiya et al. (SIGPLAN ‘00 – PLDI, SAS ‘04, SIGPLAN ‘01 – PLDI)
– An attempt to quantify the run-time behavior of points-to sets was made by M. Mock et al. (PASTE ‘01)
– D. Liang et al. is similar to the previous work but uses Java programs (ISSTA ‘02)

Related Work
Speculative probabilistic analysis:
– A quantitative computation of static points-to results against run-time behavior in a probabilistic framework was proposed by Y. S. Hwang et al. (LCPC ‘01)
– Support for speculative analysis of points-to was proposed by J. Lin, T. Chen et al. (PLDI ‘03)
– G. Ramalingam proposed to extend static analysis with probabilistic information reflecting the actual run-time behavior (SIGPLAN ‘01 – PLDI)

Conclusions
For most of the benchmarks static pointer analysis is very accurate
For some benchmarks up to 25% of the de-references cannot be statically fully disambiguated
27% of these de-references access a single memory location at run time, but many do access several different memory locations
Results suggest further compiler optimizations exploiting cases where the uncertainty does not appear at run time
– We need to improve the handling of pointer arithmetic
– New probabilistic approaches that capture actual control flow behavior