Efficient and Precise Points-toAnalysis2017.pdf · findbugs pmd Analysis Time (sec.) MAHJONG...
Transcript of Efficient and Precise Points-toAnalysis2017.pdf · findbugs pmd Analysis Time (sec.) MAHJONG...
Efficient and Precise Points-to Analysis:Modeling the Heap by Merging Equivalent Automata
Tian Tan, Yue Li and Jingling Xue
PLDI 2017
June, 2017
1
A New
Points-to Analysis Technique
for
Object-Oriented Programs
2
Points-to Analysis
Determines
◦ “which objects a variable can point to?”
3
Uses of Points-to Analysis
Clients Tools
Security analysis
Bug detection
Compiler optimization
Program verification
Program understanding
…
Chord
4
…
Uses of Points-to Analysis
Clients Tools
Security analysis
Bug detection
Compiler optimization
Program verification
Program understanding
…
Chord
5
…
Call Graph
Existing Call Graph Construction
6
On-the-fly construction
(run with points-to analysis)
◦ Precise
◦ Inefficient
Existing Call Graph Construction
7
On-the-fly construction
(run with points-to analysis)
◦ Precise
◦ Inefficient
3-object-sensitive points-to analysis
◦ Very precise
◦ Adopted by, e.g.,
7
Chord
3-Object-Sensitive Points-to Analysis
Analyze Java programs
◦ Intel Xeon E5 3.70GHz,128GB of memory
◦ Time budget: 5 hours (18000 secs)
8
3-Object-Sensitive Points-to Analysis
Analyze Java programs
◦ Intel Xeon E5 3.70GHz,128GB of memory
◦ Time budget: 5 hours (18000 secs)
9
Unscalable
(> 5 hours)
14469
(4 hours)
0 5000 10000 15000
findbugs
pmd
Analysis time (sec.)
Two Mainstreams of
Points-to Analysis Techniques Model control-flow
Model data-flow
10
Two Mainstreams of
Points-to Analysis Techniques Model control-flow
◦ Context-sensitivity
Call-site-sensitivity (PLDI’04, PLDI’06)
Object-sensitivity (ISSTA’02, TOSEM’05, SAS’16)
Type-sensitivity (POPL’11)
…
Model data-flow
11
Two Mainstreams of
Points-to Analysis Techniques Model control-flow
◦ Context-sensitivity
Call-site-sensitivity (PLDI’04, PLDI’06)
Object-sensitivity (ISSTA’02, TOSEM’05, SAS’16)
Type-sensitivity (POPL’11)
…
Model data-flow
◦ Heap abstraction
Allocation-site abstraction
Type-based abstraction
…12
Two Mainstreams of
Points-to Analysis Techniques Model control-flow
◦ Context-sensitivity
Call-site-sensitivity (PLDI’04, PLDI’06)
Object-sensitivity (ISSTA’02, TOSEM’05, SAS’16)
Type-sensitivity (POPL’11)
…
Model data-flow
◦ Heap abstraction
Allocation-site abstraction
Type-based abstraction
…13
Heap Abstraction
14
Infinite-size
heap
Finite
(abstract)
objects
Dynamic
execution
Static
analysis
abstracted
or
partitioned… …
Allocation-Site Abstraction
One object per allocation site
15
1 A a1 = new A();
2 A a2 = new A();
3 B b = new B();
Allocation-Site Abstraction
One object per allocation site
16
1 A a1 = new A();
2 A a2 = new A();
3 B b = new B();
o1
A
o2
A
o3
B
Allocation-Site Abstraction
One object per allocation site
◦ Adopted by all mainstream points-to analyses
17
1 A a1 = new A();
2 A a2 = new A();
3 B b = new B();
o1
A
o2
A
o3
B
Allocation-Site Abstraction
Over-partition for call graph construction
18
A::toString()
o1
A o2
A
void foo(Object o) {
o.toString();
}
1 A a1 = new A();
2 A a2 = new A();
3 foo(a1);
4 foo(a2);
o1
A
o2
A
Allocation-Site Abstraction
Over-partition for type-dependent clients
◦ Call graph construction
◦ Devirtualization
◦ May-fail casting
◦ …
19
1 A a1 = new A();
2 A a2 = new A();
3 foo(a1);
4 foo(a2);
o1
A o2
A
void foo(Object o) {
o.toString();
A a = (A) o;
}
o1
A
o2
A
Type-Based Abstraction
One object per type
20
1 A a1 = new A();
2 A a2 = new A();
3 B b = new B();
Type-Based Abstraction
One object per type
21
oB
oA
1 A a1 = new A();
2 A a2 = new A();
3 B b = new B();
Type-Based Abstraction
Precision loss for type-dependent clients
22
A a1 = new A();
A a2 = new A();
B b = new B();
C c = new C();
a1.f = b;
a2.f = c;
Object o = a1.f;
o.toString();
oB
oA
oC
Type-Based Abstraction
Precision loss for type-dependent clients
23
A a1 = new A();
A a2 = new A();
B b = new B();
C c = new C();
a1.f = b;
a2.f = c;
Object o = a1.f;
o.toString();
oA
oB
oA
oC
oB
oC
Type-Based Abstraction
Precision loss for type-dependent clients
24
A a1 = new A();
A a2 = new A();
B b = new B();
C c = new C();
a1.f = b;
a2.f = c;
Object o = a1.f;
o.toString();
oA
oB
oC
oB
oA
oC
oB
oC
Type-Based Abstraction
Precision loss for type-dependent clients
25
A a1 = new A();
A a2 = new A();
B b = new B();
C c = new C();
a1.f = b;
a2.f = c;
Object o = a1.f;
o.toString();B::toString()
C::toString()
oA
oB
oC
oB
oA
oC
oB
oC
Type-Based Abstraction
Precision loss for type-dependent clients
26
A a1 = new A();
A a2 = new A();
B b = new B();
C c = new C();
a1.f = b;
a2.f = c;
Object o = a1.f;
o.toString();B::toString()
C::toString()
oA
oB
oC
oB
oA
oC
oB
oC
False positive
Our Goal:
Improve Efficiency
Preserve Precision
27
MAHJONG:
A New Heap Abstraction
28
Improve Efficiency Adopted by
all mainstream
points-to analyses
Unscalable
(> 5 hours)
14469
(4 fours)
524
128
findbugs
pmd
Analysis Time (sec.)
MAHJONG Allocation-site abstraction
MAHJONG:
A New Heap Abstraction
29
Unscalable
(> 5 hours)
14469
(4 fours)
524
128
findbugs
pmd
Analysis Time (sec.)
MAHJONG Allocation-site abstraction
4400444016
pmd
#call graph edges
MAHJONG Allocation-site abstraction
Improve Efficiency
Preserve Precision
Adopted by
all mainstream
points-to analyses
MAHJONG:
A New Heap Abstraction
30
4400444016
pmd
#call graph edges
MAHJONG Allocation-site abstraction
Improve Efficiency
Preserve Precision How?
Adopted by
all mainstream
points-to analyses
Unscalable
(> 5 hours)
14469
(4 fours)
524
128
findbugs
pmd
Analysis Time (sec.)
MAHJONG Allocation-site abstraction
31
Merging Objects Over-Partition
Blindly Merging Objects Precision Loss
alleviate
cause
32
Merging Objects Over-Partition
Blindly Merging Objects Precision Loss
alleviate
cause
o1
A
o2
A
o3
B
o4
C
f
f
inconsistent
types
inconsistent
types
33
Merging Objects Over-Partition
Blindly Merging Objects Precision Loss
alleviate
cause
o1
A
o2
A
o3
B
o4
C
f
fo
A
oB
oC
f
f
inconsistent
types
Definition
OiT and Oj
T are type-consistent objects,
if for every sequence of field names,
= f1. f2. ... . fn :
OiT. and Oj
T. point to the objects of the
same types.
Type-Consistent Objects
34
f
ff
Definition
OiT and Oj
T are type-consistent objects,
if for every sequence of field names,
= f1. f2. ... . fn :
OiT. and Oj
T. point to the objects of the
same types.
Type-Consistent Objects
35
f
ff
MAHJONG only merges type-consistent objects
Type-Consistent Objects
Example
36
o2
T
o3
U
o6
X
o1
Tf
f
o7
Y
o9
Y
o5
X o11
Y
o4
U
g
h
h
k
o8
Y
g
h
k
Type-Consistent Objects
Example
37
o2
T
o3
U
o6
X
o1
Tf
f
o7
Y
o9
Y
o5
X o11
Y
o4
U
g
h
h
k
o8
Y
g
h
k
O1T O2
T
.f U U
.f.h Y Y
.g X X
.g.k Y Y
Type-Consistent Objects
Example
38
o2
T
o3
U
o6
X
o1
Tf
f
o7
Y
o9
Y
o5
X o11
Y
o4
U
g
h
h
k
o8
Y
g
h
k
O1T O2
T
.f U U
.f.h Y Y
.g X X
.g.k Y Y
∵
∴O1
T and O2T are
type-consistent objects
How to Check Type-Consistency?
39
Our Solution: Sequential Automata
40
Check
Type-Consistency
of Objects
Test
Equivalence
of Automata
Sequential Automata
6-tuple (Q, Σ, δ, q0, Γ, γ), where:
◦ Q is a set of states
◦ Σ is a set of input symbols
◦ δ is the next-state map: Q ×Σ P(Q)
◦ q0 is the initial state
◦ Γ is a set of output symbols
◦ γ is the output map: Q Γ
41
42
Check
Type-Consistency
of Objects
Test
Equivalence
of Automata
How?
A set of objects
A set of field names
The field points-to map
The object to be checked
A set of types
The object-to-type map
Q: a set of states
Σ: a set of input symbols
δ: the next-state map
q0: the initial state
Γ: a set of output symbols
γ: the output map
43
Objects Automata
o2
T
o6
X
f o4
U
o8
Y
g
h
k
44
objects ↔ states
O2T, O4
U, O6X, O8
Y
A set of objects
A set of field names
The field points-to map
The object to be checked
A set of types
The object-to-type map
Q: a set of states
Σ: a set of input symbols
δ: the next-state map
q0: the initial state
Γ: a set of output symbols
γ: the output map
Objects Automata
o2
T
o6
X
f o4
U
o8
Y
g
h
k
45
f, g, h, k
field names ↔ input symbols
A set of objects
A set of field names
The field points-to map
The object to be checked
A set of types
The object-to-type map
Q: a set of states
Σ: a set of input symbols
δ: the next-state map
q0: the initial state
Γ: a set of output symbols
γ: the output map
Objects Automata
o2
T
o6
X
f o4
U
o8
Y
g
h
k
46
o2
T
o6
X
f o4
U
o8
Y
g
h
k
field points-to map ↔ next-state map
O2T f O4
U
O2T g O6
X
O4U h O8
Y
O6X k O8
Y
A set of objects
A set of field names
The field points-to map
The object to be checked
A set of types
The object-to-type map
Q: a set of states
Σ: a set of input symbols
δ: the next-state map
q0: the initial state
Γ: a set of output symbols
γ: the output map
Objects Automata
47
O2T
checked object ↔ initial state
A set of objects
A set of field names
The field points-to map
The object to be checked
A set of types
The object-to-type map
Q: a set of states
Σ: a set of input symbols
δ: the next-state map
q0: the initial state
Γ: a set of output symbols
γ: the output map
Objects Automata
o2
T
o6
X
f o4
U
o8
Y
g
h
k
48
T, U, X, Y
types ↔ output symbols
A set of objects
A set of field names
The field points-to map
The object to be checked
A set of types
The object-to-type map
Q: a set of states
Σ: a set of input symbols
δ: the next-state map
q0: the initial state
Γ: a set of output symbols
γ: the output map
Objects Automata
o2
T
o6
X
f o4
U
o8
Y
g
h
k
49
object-to-type map ↔ output map
O2T T
O4U U
O6X X
O8Y Y
A set of objects
A set of field names
The field points-to map
The object to be checked
A set of types
The object-to-type map
Q: a set of states
Σ: a set of input symbols
δ: the next-state map
q0: the initial state
Γ: a set of output symbols
γ: the output map
Objects Automata
o2
T
o6
X
f o4
U
o8
Y
g
h
k
50
Test
Equivalence
of Automata
Objects Automata
A set of objects
A set of field names
The field points-to map
The object to be checked
A set of types
The object-to-type map
Q: a set of states
Σ: a set of input symbols
δ: the next-state map
q0: the initial state
Γ: a set of output symbols
γ: the output map
Check
Type-Consistency
of Objects
Test Equivalence of Automata
Hopcroft-Karp algorithm*
◦ Almost linear in terms of |Qlarger|
◦ Qlarger: set of states of the larger automaton
51
* J. E. Hopcroft and R. M. Karp, A linear algorithm for testing
equivalence of finite automata, Technical Report 71-114, 1971
Methodology
(MAHJONG)
52
53
Pre-Analysis NFA
Builder
Automata
Equivalence
Checker
Field Points-to
Graph (FPG)
Heap
Abstraction
DFAOiT≡ DFAOjT
?
Points-to AnalysisHeap
Modeler
∀ OiT, Oj
T
in FPG
MAHJONG
fast but imprecise
e.g., context-insensitive
precise but expensive
e.g., 3-object-sensitive
DFA
Converter
NFAOiT
NFAOjT
DFAOiTDFAOjT
Overview
Working with Points-to Analysis
Original New
Allocation-site heap
abstraction
MAHJONG heap
abstraction
54
… …
type-consistent objects
Implementation
1500 LOC of Java in total
Integrated with
Can also be easily integrated to other
points-to analysis frameworks
55
Evaluation
56
Evaluation - Research Questions
RQ1: MAHJONG’s effectiveness as a pre-analysis
RQ2: MAHJONG-based points-to analysis
57
RQ1: MAHJONG’s Effectiveness as A Pre-Analysis
Efficiency
◦ Is MAHJONG lightweight for large programs?
Heap partitioning
◦ Can MAHJONG avoid heap over-partition?
58
antlr fop luindex pmd chart checkstyle xalan bloat lusearch JPC findbugs eclipse
CI 44.1 34.7 26.2 44.8 37.7 89.6 66.6 38.7 41.4 58.9 90.6 174.1
FPG 1.3 0.7 0.8 1.4 2.4 2.3 3.0 1.2 0.8 2.1 4.6 15.5
MAHJONG 1.3 1.1 1.1 1.5 1.9 4.0 3.1 1.7 1.0 4.5 3.2 21.4
Total 46.7 36.5 28.1 47.7 42.0 95.9 72.7 41.6 43.2 65.5 98.4 211.0
59
In total: 1 minute
Each program (on average)
MAHJONG itself: 3.8 seconds
Pre-Analysis: Efficiency
CI: Context-Insensitive points-to analysis
FPG: Read Field Points-to Graph
MAHJONG: Check automata equivalence, build heap abstraction
60
77297159
6190
73638106
14337
6523
7807
10888 11181
14063
19529
2228 2474 21082727 3107
5285
22292942
4028 41425233
9414
0
5000
10000
15000
20000
Number of abstract objects created by the
allocation-site abstraction and MAHJONG
Allocation-Site Abstraction
MAHJONG
Average reduction: 62%
Pre-Analysis: Heap Partition
RQ2:
MAHJONG-Based Points-to Analysis Efficiency
◦ Can MAHJONG accelerate points-to analysis?
Precision
◦ Can MAHJONG preserve precision for
type-dependent clients?
61
Evaluated Points-to Analyses
5 mainstream context-sensitive
points-to analyses:
1. 2-call-site-sensitive analysis
2. 2-type-sensitive analysis
3. 3-type-sensitive analysis
4. 2-object-sensitive analysis
5. 3-object-sensitive analysis
Time budget: 5 hours
62
Evaluated Clients
Call graph construction
Devirtualization
May-fail casting
63
64
MAHJONG-Base Points-to Analysis: Results
Efficiency
Most precise
(3-object-sensitive)
Speedup: 131X
Call graph: -0.02%
Devirtualization: -0.29%
May-fail casting: -0%
Precision
65
MAHJONG-Base Points-to Analysis: Results
Efficiency
Most precise
(3-object-sensitive)
Speedup: 131X
Call graph: -0.02%
Devirtualization: -0.29%
May-fail casting: -0%
On average
Speedup: 15X
Call graph: -0.02%
Devirtualization: -0.18%
May-fail casting: -0.03%
Precision
66
MAHJONG-Base Points-to Analysis: Results
Efficiency
Most precise
(3-object-sensitive)
Speedup: 131X
Call graph: -0.02%
Devirtualization: -0.29%
May-fail casting: -0%
On average
Speedup: 15X
Call graph: -0.02%
Devirtualization: -0.18%
May-fail casting: -0.03%
Precision
For checkstyle, xalan, lusearch, JPC, findbugs
3-object-sensitive analysis:
• without MAHJONG, unscalable (> 5 hours)
• with MAHJONG, finish in 1min ~ 84 mins (33 minutes on average)
Conclusion
MAHJONG
◦ Improve significantly the efficiency of different
point-to analyses
Call-site-, object- and type-sensitivity
◦ Preserve almost the same precision for type-
dependent clients
Direct impact◦ Benefit many program analyses where call graphs are required
67
Thank you!
68