Yoyak ScalaDays 2015
Transcript of Yoyak ScalaDays 2015
Yoyak: static analysis framework
Heejong Lee
ScalaDays 2015
Speaker Introduction
• Has worked in the static analysis industry since 2008
• Studied programming language theory in graduate school
• Has developed several static analyzers, most of them commercial
• Began using Scala six years ago and still uses it actively in everyday development
Agenda
• Static analysis
• Theory of abstract interpretation
• Yoyak framework: implementation highlights
• Yoyak framework: Scala experience
• Yoyak framework: Roadmap
Static Analysis
What is Static Analysis?
• Analyzes source code without actually running it
• Some people prefer to call it white-box testing
• Used for finding bugs, optimizing a compiled binary, calculating a software metric, proving safety properties, etc.
Examples of Static Analysis
• Finding bugs : symbolic execution
• Optimizing a compiled binary: data flow analysis
• Calculating a software metric: syntactic analysis
• Proving safety properties: model checking, abstract interpretation, type system
Two important terms in Static Analysis
• Soundness
• The analysis result should contain all possibilities that can happen at runtime
• If the analysis uses an over-approximation, it is sound
• Completeness
• The analysis result should not contain any possibility that cannot happen at runtime
• If the analysis uses an under-approximation, it is complete
[Figure: three nested regions, from outermost to innermost: Over-approximation of Semantics, Program Semantics, Under-approximation of Semantics]
Abstract Interpretation
The beauty of abstraction
http://cargocollective.com/carlyfox/Design
What is the result of this expression?
$19224 \times 7483919 \times (11952 - 20392)$
$= -1214270048744640$
How long does it take without a calculator?
What if we are not interested in the exact number and only want to know whether it is positive or negative?
Abstract each factor to its sign with $\alpha$, multiply the signs, and concretize the result with $\gamma$:
$(+) \times (+) \times (-) = (-)$
$= n \quad (n \in \mathbb{Z} \wedge n < 0)$
The exact calculation ($= -1214270048744640$) takes 30 seconds
The sign calculation ($= n$ with $n \in \mathbb{Z} \wedge n < 0$) takes 3 seconds
• inaccurate but not incorrect
• accurate enough for a specific purpose
• much faster than a real calculation
This is abstract interpretation
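The sign calculation above can be sketched in Scala. This is only an illustration, not part of Yoyak: the names `Sign`, `alpha`, `times`, and `minus` are hypothetical. Like the slide, it abstracts the parenthesized difference directly to its sign; `minus` is included to show that subtracting two positive signs alone would yield `Top` (sign unknown).

```scala
object SignAbstraction {
  // Abstract values: the sign of an integer, plus Top for "sign unknown".
  sealed trait Sign
  case object Neg extends Sign
  case object Zero extends Sign
  case object Pos extends Sign
  case object Top extends Sign

  // Abstraction function: map a concrete integer to its sign.
  def alpha(n: BigInt): Sign = n.signum match {
    case -1 => Neg
    case 0  => Zero
    case _  => Pos
  }

  // Abstract multiplication (rule of signs).
  def times(a: Sign, b: Sign): Sign = (a, b) match {
    case (Zero, _) | (_, Zero) => Zero
    case (Top, _) | (_, Top)   => Top
    case (x, y) if x == y      => Pos
    case _                     => Neg
  }

  // Abstract subtraction: the sign is known only in some cases.
  def minus(a: Sign, b: Sign): Sign = (a, b) match {
    case (x, Zero)   => x
    case (Zero, Pos) => Neg
    case (Zero, Neg) => Pos
    case (Pos, Neg)  => Pos
    case (Neg, Pos)  => Neg
    case _           => Top // e.g. Pos - Pos may be negative, zero, or positive
  }

  // As on the slide, the difference is abstracted to its sign as a whole.
  val difference: Sign = alpha(BigInt(11952) - BigInt(20392))
  val abstractResult: Sign = times(times(alpha(19224), alpha(7483919)), difference)

  // The exact result, for comparison.
  val concreteResult: BigInt =
    BigInt(19224) * BigInt(7483919) * (BigInt(11952) - BigInt(20392))
}
```

The abstract result agrees with the sign of the exact result, which is the "inaccurate but not incorrect" property the slide describes.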
Is this program safe from buffer overruns?
void foo(int x) {
    String[] strs = new String[10];
    int index = 0;
    while(x > 0) {
        index = index + 1;
        x = x - 1;
    }
    strs[index] = "hello!";
}
No, ArrayIndexOutOfBoundsException may occur at the last line
Abstract value of index at each program point of foo:
after int index = 0: index = [0,0]
inside the loop, after index = index + 1: index = [1,∞]
after the loop, at strs[index] = "hello!": index = [0,∞]
• Roughly but soundly execute the program
Abstract interpretation for dummies
?
Abstract interpretation for brains
First, we need to define precisely what “domain” and “semantics” mean in a mathematical way
Let me introduce the Javar language
1
What does this program mean?
Javar-1
$C \to n \quad (n \in \mathbb{Z})$
Javar-1 semantic domain
$n \in Value = \mathbb{Z}$
$[\![C]\!] \in Value$
Javar-1 semantics
$[\![n]\!] = n$
1+1
Javar-2
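$C \to n \ op \ n \quad (n \in \mathbb{Z},\ op \in \{+, -, *, /\})$
Javar-{1,2} semantic domain
$n \in Value = \mathbb{Z}$
$[\![C]\!] \in Value$
Javar-2 semantics
$[\![n]\!] = n$
$[\![n_1 + n_2]\!] = [\![n_1]\!] + [\![n_2]\!]$
$[\![n_1 - n_2]\!] = [\![n_1]\!] - [\![n_2]\!]$
$[\![n_1 * n_2]\!] = [\![n_1]\!] \times [\![n_2]\!]$
$[\![n_1 / n_2]\!] = [\![n_1]\!] \div [\![n_2]\!]$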
x := x + 1
Javar-3
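$C \to x := E$
$E \to n \ (n \in \mathbb{Z}) \mid x \mid E \ op \ E \quad (op \in \{+, -, *, /\})$
Javar-3 semantic domain
$M \in Memory = Var \to Value$
$n \in Value = \mathbb{Z}$
$x \in Var = Variables$
$[\![C]\!] \in Memory \to Memory$
$[\![E]\!] \in Memory \to \mathbb{Z}$
Javar-3 semantics
$[\![x := E]\!]M = M\{x \to [\![E]\!]M\}$
$[\![n]\!]M = n$
$[\![x]\!]M = M(x)$
$[\![E_1 \ \{+, -, *, /\} \ E_2]\!]M = [\![E_1]\!]M \ \{+, -, \times, \div\} \ [\![E_2]\!]M$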
x := 100 + 2; if(x) x := x * 10 else x := x / 2; while(x) x := x - 1
Javar-4
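$C \to x := E \mid \text{if } (E) \ C \text{ else } C \mid \text{while } (E) \ C \mid C;C$
$E \to n \ (n \in \mathbb{Z}) \mid x \mid E \ op \ E \quad (op \in \{+, -, *, /\})$
Javar-{3,4} semantic domain
$M \in Memory = Var \to Value$
$n \in Value = \mathbb{Z}$
$x \in Var = Variables$
$[\![C]\!] \in Memory \to Memory$
$[\![E]\!] \in Memory \to \mathbb{Z}$
Javar-4 semantics
$[\![x := E]\!]M = M\{x \to [\![E]\!]M\}$
$[\![\text{if}(E) \ C_1 \text{ else } C_2]\!]M = \text{if } [\![E]\!]M \neq 0 \text{ then } [\![C_1]\!]M \text{ else } [\![C_2]\!]M$
$[\![\text{while}(E) \ C]\!]M = \text{if } [\![E]\!]M \neq 0 \text{ then } [\![\text{while}(E) \ C]\!]([\![C]\!]M) \text{ else } M$
$[\![n]\!]M = n$
$[\![x]\!]M = M(x)$
$[\![E_1 \ \{+, -, *, /\} \ E_2]\!]M = [\![E_1]\!]M \ \{+, -, \times, \div\} \ [\![E_2]\!]M$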
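The Javar-4 semantics can be transcribed almost line by line into a small interpreter. The sketch below is illustrative only: the data type names (`Num`, `Var`, `BinOp`, `Assign`, `If`, `While`, `Sequence`) are hypothetical, sequencing is given the usual meaning of running the first command and then the second, and values are Scala `Int` rather than unbounded integers. It runs the sample Javar-4 program from the slide.

```scala
object Javar4 {
  // E -> n | x | E op E
  sealed trait Expr
  case class Num(n: Int) extends Expr
  case class Var(x: String) extends Expr
  case class BinOp(op: Char, l: Expr, r: Expr) extends Expr

  // C -> x := E | if (E) C else C | while (E) C | C;C
  sealed trait Cmd
  case class Assign(x: String, e: Expr) extends Cmd
  case class If(cond: Expr, thenC: Cmd, elseC: Cmd) extends Cmd
  case class While(cond: Expr, body: Cmd) extends Cmd
  case class Sequence(first: Cmd, second: Cmd) extends Cmd

  type Memory = Map[String, Int]

  // [[E]] : Memory -> Z
  def eval(e: Expr, m: Memory): Int = e match {
    case Num(n) => n
    case Var(x) => m(x)
    case BinOp(op, l, r) =>
      val (a, b) = (eval(l, m), eval(r, m))
      op match {
        case '+'   => a + b
        case '-'   => a - b
        case '*'   => a * b
        case '/'   => a / b
        case other => sys.error(s"unknown operator: $other")
      }
  }

  // [[C]] : Memory -> Memory
  def exec(c: Cmd, m: Memory): Memory = c match {
    case Assign(x, e)     => m.updated(x, eval(e, m))
    case If(e, c1, c2)    => if (eval(e, m) != 0) exec(c1, m) else exec(c2, m)
    case While(e, body)   => if (eval(e, m) != 0) exec(c, exec(body, m)) else m
    case Sequence(c1, c2) => exec(c2, exec(c1, m))
  }

  // x := 100 + 2; if(x) x := x * 10 else x := x / 2; while(x) x := x - 1
  val x = Var("x")
  val program: Cmd = Sequence(
    Assign("x", BinOp('+', Num(100), Num(2))),
    Sequence(
      If(x, Assign("x", BinOp('*', x, Num(10))), Assign("x", BinOp('/', x, Num(2)))),
      While(x, Assign("x", BinOp('-', x, Num(1))))))

  val finalMemory: Memory = exec(program, Map.empty)
}
```

The `While` case calls `exec` on the same command again, mirroring the self-referential equation for while.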
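This is not a definition
$[\![\text{while}(E) \ C]\!]M = \text{if } [\![E]\!]M \neq 0 \text{ then } [\![\text{while}(E) \ C]\!]([\![C]\!]M) \text{ else } M$
GNU = GNU’s Not Unix
The existence and uniqueness of the fixed-point is guaranteed by domain theory
$[\![\text{while}(E) \ C]\!]M = \text{if } [\![E]\!]M \neq 0 \text{ then } [\![\text{while}(E) \ C]\!]([\![C]\!]M) \text{ else } M$
$[\![\text{while}(E) \ C]\!] = \lambda M. \text{if } [\![E]\!]M \neq 0 \text{ then } [\![\text{while}(E) \ C]\!]([\![C]\!]M) \text{ else } M$
$F = \lambda M. \text{if } [\![E]\!]M \neq 0 \text{ then } F([\![C]\!]M) \text{ else } M$
$F = H(F)$
$[\![\text{while}(E) \ C]\!] = fix(\lambda F. \lambda M. \text{if } [\![E]\!]M \neq 0 \text{ then } F([\![C]\!]M) \text{ else } M)$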
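As an operational analogy only, the step from $F = H(F)$ to $fix(H)$ can be written with a fixed-point combinator. The sketch below assumes the loop `while (x) x := x - 1` over a memory that holds just the value of x, so that a memory is a single `Int`; `fix`, `h`, and `whileLoop` are hypothetical names. It does not model the domain-theoretic least fixed point itself.

```scala
object FixedPoint {
  // fix(h) returns an F with F = h(F); the recursion is unfolded one step per call.
  def fix[A, B](h: (A => B) => (A => B)): A => B =
    (a: A) => h(fix(h))(a)

  // H = λF. λM. if M ≠ 0 then F(M - 1) else M
  // (the memory M is just the value of x here)
  val h: (Int => Int) => (Int => Int) =
    f => m => if (m != 0) f(m - 1) else m

  // The meaning of the loop is the fixed point of H.
  val whileLoop: Int => Int = fix(h)
}
```

For a non-negative start value the loop counts down to 0; for a negative one it does not terminate, just like the program it models.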
Abstract interpretation revisited
• Safely estimates program semantics in finite time
• Abstraction is not omission: it guarantees soundness
• Most static analysis techniques can be defined as a form of abstract interpretation
Key Elements of Abstract Interpretation
• Domain : concrete domain, abstract domain
• Semantics : concrete semantics, abstract semantics
• Galois connection : pair of abstraction and concretization functions
• CPO : complete partial order
• Continuous function : preserves least upper bounds of chains
Galois Connection
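$\forall x \in D,\ \hat{x} \in \hat{D} : \alpha(x) \sqsubseteq \hat{x} \iff x \sqsubseteq \gamma(\hat{x})$
[Figure: $\alpha$ maps $x \in D$ into $\hat{D}$, and $\gamma$ maps $\hat{x} \in \hat{D}$ back into $D$]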
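The Galois connection condition can be checked on small examples. The sketch below is an assumption-laden illustration: the concrete domain is restricted to non-empty finite sets of integers ordered by inclusion, the abstract domain to finite intervals, and `Itv`, `alpha`, `gamma`, `leq`, and `connected` are hypothetical names.

```scala
object GaloisCheck {
  // A finite interval [lo, hi] with lo <= hi.
  case class Itv(lo: Int, hi: Int)

  // alpha: the smallest interval containing a non-empty finite set.
  def alpha(s: Set[Int]): Itv = Itv(s.min, s.max)

  // gamma: all integers described by the interval.
  def gamma(a: Itv): Set[Int] = (a.lo to a.hi).toSet

  // Abstract order: interval inclusion.
  def leq(a: Itv, b: Itv): Boolean = b.lo <= a.lo && a.hi <= b.hi

  // The Galois condition for one pair: alpha(s) ⊑ a  <=>  s ⊆ gamma(a).
  def connected(s: Set[Int], a: Itv): Boolean =
    leq(alpha(s), a) == s.subsetOf(gamma(a))
}
```

Note that `gamma(alpha(Set(1, 10)))` contains every integer from 1 to 10, which is the over-approximation that later shows up as false positives.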
CPO
a partial order ⊑ exists on D
a least element x exists, where x ⊑ y (for all y ∈ D)
every totally ordered subset (chain) of D has a least upper bound x, where x ∈ D
Lattices
Partially ordered set in which every two elements have a unique LUB(⊔)
and a unique GLB(⊓)
Continuous Function
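$\forall \text{ordered subset } S \subseteq D,\ F\left(\bigsqcup_{x \in S} x\right) = \bigsqcup_{x \in S} F(x)$
[Figure: a chain $x, y, z$ in $D$ and its image $F(x), F(y), F(z)$ in $D$]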
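Abstract Interpretation in a Nutshell
Program Semantics: Concrete vs. Abstract
• Domain: concrete $D$ should be a CPO; abstract $\hat{D}$ should be a CPO
• Galois Connection: $\alpha : D \to \hat{D}$, $\gamma : \hat{D} \to D$
• Semantic Function: concrete $F : D \to D$ should be continuous; abstract $\hat{F} : \hat{D} \to \hat{D}$ should be monotonic
• Program Execution: concrete $lfp\ F = \bigsqcup_{i \in \mathbb{N}} F^i(\bot)$; abstract $\bigsqcup_{i \in \mathbb{N}} \hat{F}^i(\hat{\bot}) \sqsubseteq \hat{X}$
Performing analysis using abstract interpretation = calculating $\hat{X}$ in a finite time
And the following formula is always satisfied (soundness guarantee)
$lfp\ F \sqsubseteq \gamma \hat{X}$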
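Abstract Interpretation in a Nutshell
$lfp\ F \sqsubseteq \gamma \hat{X}$
$\alpha \circ F \sqsubseteq \hat{F} \circ \alpha$
[Figure: $lfp\ F$ and $\hat{X}$ shown in $D$ and $\hat{D}$; the region between them is labeled false positives]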
Is this program safe from buffer overruns?
void foo(int x) {
    String[] strs = new String[10];
    int index = 0;
    if(x > 0) {
        index = 1;
    } else {
        index = 10;
    }
    strs[index] = "hello!";
}
Abstract value of index at each program point of foo:
after int index = 0: index = [0,0]
in the then branch, after index = 1: index = [1,1]
in the else branch, after index = 10: index = [10,10]
after the if/else, at strs[index] = "hello!": index = [1,10]
Interval analysis based on abstract interpretation
• Concrete domain: the domain in the real world
$Memory = Var \to Value$
$Value = 2^{\mathbb{Z}}$
$\mathcal{C} \in C \to Memory \to Memory$
$\mathcal{V} \in E \to Memory \to Value$
Interval analysis based on abstract interpretation
• Concrete semantics: the semantics in the real world
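$\mathcal{C}\ x := E\ m = m\{x \mapsto \mathcal{V}\ E\ m\}$
$\mathcal{C}\ \text{if}(E)\ C_1\ C_2\ m = \mathcal{V}\ E\ m\ ?\ \mathcal{C}\ C_1\ m : \mathcal{C}\ C_2\ m$
$\mathcal{C}\ \text{while}(E)\ C\ m = \mathcal{V}\ E\ m\ ?\ \mathcal{C}\ \text{while}(E)\ C\ (\mathcal{C}\ C\ m) : m$
$\mathcal{C}\ C_1;C_2\ m = \mathcal{C}\ C_2\ (\mathcal{C}\ C_1\ m)$
$\mathcal{V}\ x\ m = m\ x$
$\mathcal{V}\ n\ m = \{n\}$
$\mathcal{V}\ E_1 + E_2\ m = (\mathcal{V}\ E_1\ m) + (\mathcal{V}\ E_2\ m)$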
Interval analysis based on abstract interpretation
• Concrete execution of a program
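$\bot \sqsubset F(\bot) \sqsubset F(F(\bot)) \sqsubset F(F(F(\bot))) \ldots \sqsubset F^i(\bot) = F^{i+1}(\bot)$
$F^i(\bot) \in Memory$ is the execution result of a program
$F = \lambda m. \mathcal{C}\ C\ m$
$lfp\ F = \bigsqcup_{i \in \mathbb{N}} F^i(\{\})$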
Interval analysis based on abstract interpretation
• Abstract domain: the domain we will use in an analysis
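$\widehat{Memory} = Var \to \widehat{Value}$
$\widehat{Value} = \hat{\mathbb{Z}} \cup \{\bot\}$
$\hat{\mathbb{Z}} = \{[a, b] \mid a \in \mathbb{Z} \cup \{-\infty\},\ b \in \mathbb{Z} \cup \{\infty\},\ a \le b\}$
$\hat{\mathcal{C}} \in C \to \widehat{Memory} \to \widehat{Memory}$
$\hat{\mathcal{V}} \in E \to \widehat{Memory} \to \widehat{Value}$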
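[Figure: Hasse diagram of the interval lattice. ⊥ is at the bottom; directly above it are the singleton intervals …, [-3,-3], [-2,-2], [-1,-1], [0,0], [1,1], [2,2], …; each higher row holds wider intervals such as [-1,0] and [0,1], then [-2,0], [-1,1] and [0,2], and so on; half-infinite intervals such as [0,∞], [-1,∞], [-2,∞] and [-∞,0], [-∞,1], [-∞,2] sit near the top; [-∞,∞] is the top element]
Lattice of Interval Domain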
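The order and join of this lattice can be sketched as follows. This is a minimal illustration, not Yoyak's `Interval` implementation; `Bound`, `Itv`, `order`, and `join` are hypothetical names.

```scala
object IntervalDomain {
  // A bound is an integer or one of the two infinities.
  sealed trait Bound
  case object MinusInf extends Bound
  case class Fin(n: Long) extends Bound
  case object PlusInf extends Bound

  def leq(a: Bound, b: Bound): Boolean = (a, b) match {
    case (MinusInf, _) | (_, PlusInf) => true
    case (Fin(x), Fin(y))             => x <= y
    case _                            => false
  }
  def min(a: Bound, b: Bound): Bound = if (leq(a, b)) a else b
  def max(a: Bound, b: Bound): Bound = if (leq(a, b)) b else a

  // Abstract values: bottom or an interval [lo, hi] with lo <= hi.
  sealed trait Interval
  case object Bottom extends Interval
  case class Itv(lo: Bound, hi: Bound) extends Interval

  val top: Interval = Itv(MinusInf, PlusInf)
  def const(n: Long): Interval = Itv(Fin(n), Fin(n))

  // Partial order: a ⊑ b when a is contained in b.
  def order(a: Interval, b: Interval): Boolean = (a, b) match {
    case (Bottom, _)                  => true
    case (_, Bottom)                  => false
    case (Itv(l1, h1), Itv(l2, h2))   => leq(l2, l1) && leq(h1, h2)
  }

  // Least upper bound: the smallest interval containing both.
  def join(a: Interval, b: Interval): Interval = (a, b) match {
    case (Bottom, x)                  => x
    case (x, Bottom)                  => x
    case (Itv(l1, h1), Itv(l2, h2))   => Itv(min(l1, l2), max(h1, h2))
  }
}
```

Joining the singletons [1,1] and [10,10] gives [1,10], one of the wider intervals higher up in the diagram.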
Interval analysis based on abstract interpretation
• Abstract semantics: the semantics we will use in an analysis
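$\hat{\mathcal{C}}\ x := E\ \hat{m} = \hat{m}\{x \mapsto \hat{\mathcal{V}}\ E\ \hat{m}\}$
$\hat{\mathcal{C}}\ \text{if}(E)\ C_1\ C_2\ \hat{m} = \hat{\mathcal{C}}\ C_1\ \hat{m} \sqcup \hat{\mathcal{C}}\ C_2\ \hat{m}$
$\hat{\mathcal{C}}\ \text{while}(E)\ C\ \hat{m} = \hat{m} \sqcup \hat{\mathcal{C}}\ \text{while}(E)\ C\ (\hat{\mathcal{C}}\ C\ \hat{m})$
$\hat{\mathcal{C}}\ C_1;C_2\ \hat{m} = \hat{\mathcal{C}}\ C_2\ (\hat{\mathcal{C}}\ C_1\ \hat{m})$
$\hat{\mathcal{V}}\ x\ \hat{m} = \hat{m}\ x$
$\hat{\mathcal{V}}\ n\ \hat{m} = \alpha\{n\}$
$\hat{\mathcal{V}}\ E_1 + E_2\ \hat{m} = (\hat{\mathcal{V}}\ E_1\ \hat{m}) \mathbin{\hat{+}} (\hat{\mathcal{V}}\ E_2\ \hat{m})$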
Interval analysis based on abstract interpretation
• Abstract execution of a program
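$\hat{F} = \lambda \hat{m}. \hat{\mathcal{C}}\ C\ \hat{m}$
$\bigsqcup_{i \in \mathbb{N}} \hat{F}^i(\{\}) \sqsubseteq \hat{X}$
$\bot \sqsubset \hat{F}(\bot) \sqsubset \hat{F}(\hat{F}(\bot)) \sqsubset \hat{F}(\hat{F}(\hat{F}(\bot))) \ldots \sqsubset \hat{F}^i(\bot) \sqsubseteq \hat{X}$
$\hat{X}$ is the analysis result of a program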
Interval analysis based on abstract interpretation
• Widening
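What if this chain has infinite length?
$\bot \sqsubset \hat{F}(\bot) \sqsubset \hat{F}(\hat{F}(\bot)) \sqsubset \hat{F}(\hat{F}(\hat{F}(\bot))) \ldots \sqsubset \hat{F}^i(\bot) \sqsubseteq \hat{X}$
We need a widening operator $\nabla$
$\bot \sqsubset \hat{F}(\bot) \sqsubset \hat{F}(\hat{F}(\bot)) \sqsubset \hat{F}(\hat{F}(\hat{F}(\bot))) \ldots \sqsubset \hat{F}^{i-1}(\bot) \nabla \hat{F}^i(\bot) \sqsubseteq \hat{X}$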
Interval analysis based on abstract interpretation
• Widening
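$\bot \sqsubset [0, 0] \sqsubset [0, 1] \sqsubset [0, 2] \ldots \sqsubset [0, i-1] \nabla [0, i] \sqsubseteq [0, \infty]$
void foo(int x) {
    String[] strs = new String[10];
    int index = 0;
    while(x > 0) {
        index = index + 1;
        x = x - 1;
    }
    strs[index] = "hello!";
}
after int index = 0: index = [0,0]
inside the loop, after index = index + 1: index = [1,∞]
after the loop, at strs[index] = "hello!": index = [0,∞]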
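The effect of widening on this loop can be reproduced with a few lines of Scala. The sketch is a simplification under stated assumptions: only the variable index is tracked, the loop condition on x is ignored, and the standard interval widening (a bound that moves jumps to infinity) stands in for whatever operator an analyzer actually uses. `Itv`, `join`, `widen`, and `iterate` are hypothetical names.

```scala
object WideningDemo {
  // Interval [lo, hi]; None stands for -∞ in lo and for +∞ in hi.
  case class Itv(lo: Option[Int], hi: Option[Int])

  // Least upper bound: an infinite bound on either side wins.
  def join(a: Itv, b: Itv): Itv = Itv(
    for (x <- a.lo; y <- b.lo) yield math.min(x, y),
    for (x <- a.hi; y <- b.hi) yield math.max(x, y))

  // Widening: keep a bound that is stable, push a moving bound to infinity.
  def widen(old: Itv, next: Itv): Itv = {
    val lo = (old.lo, next.lo) match {
      case (Some(a), Some(b)) if a <= b => old.lo
      case _                            => None
    }
    val hi = (old.hi, next.hi) match {
      case (Some(a), Some(b)) if b <= a => old.hi
      case _                            => None
    }
    Itv(lo, hi)
  }

  // Abstract effect of the loop body: index = index + 1.
  def plusOne(a: Itv): Itv = Itv(a.lo.map(_ + 1), a.hi.map(_ + 1))

  // Value of index on loop entry: int index = 0.
  val init: Itv = Itv(Some(0), Some(0))

  // Iterate at the loop head until the value is stable, widening each step.
  @annotation.tailrec
  def iterate(current: Itv, steps: Int = 0): (Itv, Int) = {
    val next = widen(current, join(init, plusOne(current)))
    if (next == current) (current, steps) else iterate(next, steps + 1)
  }

  val (loopHead, steps) = iterate(init)

  // strs has 10 elements, so the access is safe only if 0 <= index <= 9.
  val safe: Boolean = loopHead.lo.exists(_ >= 0) && loopHead.hi.exists(_ <= 9)
}
```

Without `widen`, the iteration would climb through [0,1], [0,2], ... indefinitely; with it, the value stabilizes at [0,∞] after a single widening step, and the bounds check fails as the slide reports.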
Is this program safe from buffer overruns?
void foo(int x) {
    String[] strs = new String[10];
    int index = 0;
    if(x > 0) {
        index = 1;
    } else {
        index = 10;
    }
    strs[index] = "hello!";
}
Interval analysis based on abstract interpretation
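[Figure: syntax tree with nodes numbered 0 to 6: node 0 has children 1 and 2, node 2 has children 3 and 4, node 3 has children 5 and 6]
index = 0; if(x > 0) index = 1 else index = 10; result = index
$\hat{\mathcal{C}}\ C_0\ \hat{m} = \hat{\mathcal{C}}\ C_2\ (\hat{\mathcal{C}}\ C_1\ \hat{m})$
$\hat{\mathcal{C}}\ C_1\ \hat{m} = \hat{m}\{index \mapsto \alpha\{0\}\}$
$\hat{\mathcal{C}}\ C_2\ \hat{m} = \hat{\mathcal{C}}\ C_4\ (\hat{\mathcal{C}}\ C_3\ \hat{m})$
$\hat{\mathcal{C}}\ C_3\ \hat{m} = \hat{\mathcal{C}}\ C_5\ \hat{m} \sqcup \hat{\mathcal{C}}\ C_6\ \hat{m}$
$\hat{\mathcal{C}}\ C_4\ \hat{m} = \hat{m}\{result \mapsto \hat{m}\ index\}$
$\hat{\mathcal{C}}\ C_5\ \hat{m} = \hat{m}\{index \mapsto \alpha\{1\}\}$
$\hat{\mathcal{C}}\ C_6\ \hat{m} = \hat{m}\{index \mapsto \alpha\{10\}\}$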
Interval analysis based on abstract interpretation
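$\hat{\mathcal{C}}\ C_0\ \{\} = \hat{\mathcal{C}}\ C_2\ (\hat{\mathcal{C}}\ C_1\ \{\})$
$\hat{\mathcal{C}}\ C_1\ \{\} = \{index \mapsto [0, 0]\}$
$\hat{\mathcal{C}}\ C_2\ \{index \mapsto [0, 0]\} = \hat{\mathcal{C}}\ C_4\ (\hat{\mathcal{C}}\ C_3\ \{index \mapsto [0, 0]\})$
$\hat{\mathcal{C}}\ C_3\ \{index \mapsto [0, 0]\} = \hat{\mathcal{C}}\ C_5\ \{index \mapsto [0, 0]\} \sqcup \hat{\mathcal{C}}\ C_6\ \{index \mapsto [0, 0]\}$
$\hat{\mathcal{C}}\ C_4\ \{index \mapsto [1, 10]\} = \{index \mapsto [1, 10], result \mapsto [1, 10]\}$
$\hat{\mathcal{C}}\ C_5\ \{index \mapsto [0, 0]\} = \{index \mapsto [1, 1]\}$
$\hat{\mathcal{C}}\ C_6\ \{index \mapsto [0, 0]\} = \{index \mapsto [10, 10]\}$
$\hat{\mathcal{C}}\ C_0\ \{\} = \{index \mapsto [1, 10], result \mapsto [1, 10]\}$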
void foo(int x) {
    String[] strs = new String[10];
    int index = 0;
    if(x > 0) {
        index = 1;
    } else {
        index = 10;
    }
    strs[index] = "hello!";
}
At strs[index] = "hello!", index may hold any integer between 1 and 10
Since the size of the buffer strs is 10, ArrayIndexOutOfBoundsException may occur here
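The evaluation above can be replayed with a small abstract interpreter. This is a sketch restricted to what the example needs (constant assignment, variable copy, if, and sequencing); it ignores the branch condition as the abstract semantics for if does, and the names `AssignConst`, `AssignVar`, `joinMem`, and `inBounds` are hypothetical.

```scala
object IfExample {
  // Finite interval [lo, hi]; enough for this example.
  case class Itv(lo: Int, hi: Int)
  def join(a: Itv, b: Itv): Itv = Itv(math.min(a.lo, b.lo), math.max(a.hi, b.hi))

  type AbsMemory = Map[String, Itv]

  // Pointwise join; a variable missing on one side is treated as bottom.
  def joinMem(a: AbsMemory, b: AbsMemory): AbsMemory =
    b.foldLeft(a) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).fold(v)(join(_, v)))
    }

  sealed trait Cmd
  case class AssignConst(x: String, n: Int) extends Cmd
  case class AssignVar(x: String, y: String) extends Cmd
  case class If(thenC: Cmd, elseC: Cmd) extends Cmd // condition is ignored
  case class Sequence(c1: Cmd, c2: Cmd) extends Cmd

  // Abstract semantics: if joins both branches, sequencing composes.
  def exec(c: Cmd, m: AbsMemory): AbsMemory = c match {
    case AssignConst(x, n) => m + (x -> Itv(n, n))
    case AssignVar(x, y)   => m + (x -> m(y))
    case If(t, e)          => joinMem(exec(t, m), exec(e, m))
    case Sequence(c1, c2)  => exec(c2, exec(c1, m))
  }

  // index = 0; if(x > 0) index = 1 else index = 10; result = index
  val program: Cmd = Sequence(
    AssignConst("index", 0),
    Sequence(
      If(AssignConst("index", 1), AssignConst("index", 10)),
      AssignVar("result", "index")))

  val result: AbsMemory = exec(program, Map.empty)

  // An access is proven safe only if every value in the interval is a valid index.
  def inBounds(i: Itv, size: Int): Boolean = 0 <= i.lo && i.hi < size
}
```

The computed interval for index is [1,10], so `inBounds` rejects an array of size 10, matching the warning above. The interval also contains values such as 5 that never occur at runtime, which is where false positives come from.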
Is this program safe from buffer overruns?
Yoyak
Do not reinvent the wheel
https://trimaps.com/assets/website/dontreinventthemap-6ba62b8ba05d4957d2ed772584d7e4cd.png
Motivation
• Do not reinvent the wheel : many components that static analyzers often use are reusable
• CFG data types : construction, optimization, visualization
• Graph algorithms : unrolling loops, finding loop heads, finding topological order
• Intermediate language data types : construction, optimization, pretty printing
• Common abstract domains : integer interval, abstract object, abstract memory
• Common abstract semantics : assignment, invoking methods, evaluating binary expressions
Motivation
• Perfect to be a framework : the theory of abstract interpretation guarantees soundness and termination of the analysis if a user supplies valid abstract domain and semantics
Generic fixed point computation engine
Abstract domain D Abstract semantics F
Fixed point x = F(x) (x∈D)
Overview
Yoyak
[Figure: Yoyak architecture in four groups: Abstract Domain, Fixed Point Computation, Abstract Semantics, and IL. Component names shown: MapDom, MemDom, Interval, ArithmeticOps, LatticeOps, Widening, Galois, StdSemantics, ForwardAnalysis, AbstractTransferable, FlowSensitiveFixedPointComputation, Worklist, WideningAtLoopHeads, InterproceduralIteration, DoWidening, CommonIL, Attachable, Typable]
Fixed-point Computation in Yoyak
Built-in work-list algorithm
[Figure: control-flow graph from ENTRY to EXIT. Basic blocks: x := 10; Assume (y == 0) println("0"); Assume (y != 0); Assume (y == 1) println("0"); Assume (y != 1); println("2"); Assume (z) throw new Ex(); Assume (!z) println("done") return;]
def computeFixedPoint(startNodes: List[BasicBlock])(implicit widening: Option[Widening[D]] = None) : MapDom[BasicBlock,D] = {
  worklist.add(startNodes:_*)
  var map = MapDom.empty[BasicBlock,D]
  while(worklist.size() > 0) {
    val bb = worklist.pop().get
    // collect the abstract input of this block and apply its transfer function
    val prevInputs = memoryFetcher(map,bb)
    val prev = getInput(map,prevInputs)
    val (mapOut,next) = work(map,prev,bb)
    val orig = map.get(bb)
    // the block is stable when the new value is below the stored one
    val isStableOpt = ops.<=(next,orig)
    if(isStableOpt.isEmpty) {
      println("error: abs. transfer func. is not distributive")
    }
    if(!isStableOpt.get) {
      // not stable yet: widen if requested, store, and revisit the successors
      val widened =
        if(widening.nonEmpty) {
          doWidening(widening.get)(orig,next,bb)
        } else next
      map = mapOut.update(bb->widened)
      val nextWork = getNextBlocks(bb)
      worklist.add(nextWork:_*)
    }
  }
  map
}
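The same work-list idea can be shown in a self-contained form. The sketch below is not Yoyak's implementation: it omits widening (so it assumes a domain without infinite ascending chains), stores the abstract input of each node, and uses a hypothetical `fixpoint` function over a hand-written four-node graph that mirrors the if/else example, with sets of integers as the abstract domain.

```scala
object WorklistSketch {
  // Generic forward fixed point; join and leq come from the abstract domain.
  def fixpoint[N, D](entry: N, init: D, bottom: D)(
      succ: N => List[N],
      transfer: (N, D) => D,
      join: (D, D) => D,
      leq: (D, D) => Boolean): Map[N, D] = {
    var input = Map(entry -> init) // abstract input of each visited node
    var worklist = List(entry)
    while (worklist.nonEmpty) {
      val n = worklist.head
      worklist = worklist.tail
      val out = transfer(n, input(n))
      for (s <- succ(n)) {
        val old = input.getOrElse(s, bottom)
        if (!leq(out, old)) { // successor input not stable yet
          input = input.updated(s, join(old, out))
          worklist = worklist :+ s
        }
      }
    }
    input
  }

  // Node 0: index = 0; node 1: index = 1; node 2: index = 10; node 3: use of index.
  val graph: Map[Int, List[Int]] =
    Map(0 -> List(1, 2), 1 -> List(3), 2 -> List(3), 3 -> Nil)
  val constants: Map[Int, Int] = Map(0 -> 0, 1 -> 1, 2 -> 10)

  // Domain: the set of values index may hold, ordered by inclusion.
  val inputs: Map[Int, Set[Int]] =
    fixpoint[Int, Set[Int]](0, Set.empty[Int], Set.empty[Int])(
      n => graph(n),
      (n, d) => constants.get(n).map(Set(_)).getOrElse(d),
      _ union _,
      _ subsetOf _)
}
```

Node 3 is put on the work list twice, once per predecessor, and its input grows from {1} to {1, 10} before the iteration stabilizes.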
Fixed-point Computation in Yoyak
Built-in work-list algorithm
trait FlowSensitiveFixedPointComputation[D<:Galois]
    extends FlowSensitiveIteration[D] with CfgNavigator[D] with DoWidening[D] {
  def computeFixedPoint(startNodes: List[BasicBlock])(implicit widening: Option[Widening[D]] = None) : MapDom[BasicBlock,D] = {

class FlowSensitiveForwardAnalysis[D<:Galois](val cfg: CFG)(
    implicit val ops: LatticeOps[D],
    val absTransfer: AbstractTransferable[D],
    val widening: Option[Widening[D]] = None)
    extends FlowSensitiveFixedPointComputation[D] with WideningAtLoopHeads[D] {
Abstract Semantics in Yoyak
Built-in work-list algorithm
trait AbstractTransferable[D<:Galois] {
  protected def transferIdentity(stmt: Identity, input: D#Abst)(
      implicit context: Context) : D#Abst = input
  protected def transferAssign(stmt: Assign, input: D#Abst)(
      implicit context: Context) : D#Abst = input
  protected def transferInvoke(stmt: Invoke, input: D#Abst)(
      implicit context: Context) : D#Abst = input
  protected def transferIf(stmt: If, input: D#Abst)(
      implicit context: Context) : D#Abst = input
  protected def transferAssume(stmt: Assume, input: D#Abst)(
      implicit context: Context) : D#Abst = input
  // so on
Abstract Semantics in Yoyak
Built-in standard semantic
trait StdSemantics[A<:Galois,D,Mem<:MemDomLike[A,D,Mem]]
    extends AbstractTransferable[GaloisIdentity[Mem]] {
  val arithOps : ArithmeticOps[A]

  override protected def transferAssign(stmt: Assign, input: Mem)(
      implicit context: Context) : Mem = {
    val (rv,output) = eval(stmt.rv,input)
    output.update(stmt.lv,rv)
  }
Abstract Domain in Yoyak
Composable abstract domains
class MapDom[K,V <: Galois : LatticeOps] {
trait LatticeOps[D <: Galois] extends ParOrdOps[D] {
  def \/(lhs: D#Abst, rhs: D#Abst) : D#Abst
  def bottom : D#Abst
trait ParOrdOps[D <: Galois] {
  def <=(lhs: D#Abst, rhs: D#Abst) : Option[Boolean]
trait Galois {
  type Conc
  type Abst
Abstract Domain in Yoyak
Built-in Interval Domain
scala> import com.simplytyped.yoyak.framework.domain.arith._
import com.simplytyped.yoyak.framework.domain.arith._
scala> import com.simplytyped.yoyak.framework.domain.arith.Interval._
import com.simplytyped.yoyak.framework.domain.arith.Interval._
scala> val intv1 = Interv.of(10)
intv1: com.simplytyped.yoyak.framework.domain.arith.Interval = Interv(IInt(10),IInt(10))
scala> val intv2 = Interv.in(IInt(-10),IInt(10))
intv2: com.simplytyped.yoyak.framework.domain.arith.Interval = Interv(IInt(-10),IInt(10))
scala> val intv3 = Interv.in(IInfMinus,IInf)
intv3: com.simplytyped.yoyak.framework.domain.arith.Interval = IntervTop
scala> val intv4 = Interv.in(IInt(-10),IInf)
intv4: com.simplytyped.yoyak.framework.domain.arith.Interval = Interv(IInt(-10),IInf)
Abstract Domain in Yoyak
Built-in Interval Domain
scala> import IntervalInt.arithOps
import IntervalInt.arithOps
scala> arithOps.+(intv1,intv2) // [10,10] + [-10,10]
res1: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = Interv(IInt(0),IInt(20))
scala> arithOps.-(intv1,intv2) // [10,10] - [-10,10]
res2: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = Interv(IInt(0),IInt(20))
scala> arithOps.+(intv2,intv3) // [-10,10] + [-∞,∞]
res3: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = IntervTop
scala> arithOps.*(intv2,intv4) // [-10,10] * [-10,∞]
res4: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = IntervTop
scala> arithOps.*(intv1,intv4) // [10,10] * [-10,∞]
res5: com.simplytyped.yoyak.framework.domain.arith.IntervalInt#Abst = Interv(IInt(-100),IInf)
Abstract Domain in Yoyak
Built-in Standard Object Model
trait StdObjectModel[A<:Galois,D<:Galois,This<:StdObjectModel[A,D,This]]
    extends MemDomLike[A,D,This] with ArrayJoinModel[A,D,This] {

  implicit val arithOps : ArithmeticOps[A]
  implicit val boxedOps : LatticeWithTopOps[D]

  def update(kv: (Loc,AbsValue[A,D])) : This
  def remove(loc: Local) : This
  def alloc(from: Stmt) : (AbsRef,This)
  def get(k: Loc) : AbsValue[A,D]
  def isStaticAddr(addr: AbsAddr) : Boolean
  def isDynamicAddr(addr: AbsAddr) : Boolean

class MemDom[A <: Galois : ArithmeticOps, D <: Galois : LatticeWithTopOps]
    extends StdObjectModel[A,D,MemDom[A,D]] {
Abstract Domain in Yoyak
Built-in Memory Domain
scala> import com.simplytyped.yoyak.framework.domain.mem.MemDom
scala> import com.simplytyped.yoyak.framework.domain.mem.MemElems._
scala> import com.simplytyped.yoyak.framework.domain.Galois._
scala> import com.simplytyped.yoyak.framework.domain.arith.Interv
scala> import com.simplytyped.yoyak.framework.domain.arith.IntervalInt
scala> import com.simplytyped.yoyak.il.CommonIL.Value._
scala> val memory = new MemDom[IntervalInt,SetAbstraction[String]]
memory: com.simplytyped.yoyak.framework.domain.mem.MemDom[com.simplytyped.yoyak.framework.domain.arith.IntervalInt,com.simplytyped.yoyak.framework.domain.Galois.SetAbstraction[String]] = com.simplytyped.yoyak.framework.domain.mem.MemDom@8443a1
Abstract Domain in Yoyak
scala> val memory2 = memory.update(Local("x") -> AbsArith[IntervalInt](Interv.of(1)))
scala> val memory3 = memory.update(Local("x") -> AbsArith[IntervalInt](Interv.of(10)))
scala> val memory4 = MemDom.ops[IntervalInt,SetAbstraction[String]].\/(memory2,memory3)
scala> memory4.get(Local("x"))
res1: com.simplytyped.yoyak.framework.domain.mem.MemElems.AbsValue[com.simplytyped.yoyak.framework.domain.arith.IntervalInt,com.simplytyped.yoyak.framework.domain.Galois.SetAbstraction[String]] = AbsArith(Interv(IInt(1),IInt(10)))
Built-in Memory Domain
IL in Yoyak
CommonIL
abstract class Stmt extends Attachable {
  override def equals(that: Any): Boolean = this eq that.asInstanceOf[AnyRef]
  override def hashCode() : Int = System.identityHashCode(this)

  private[Stmt] def copyAttr(stmt: Stmt) : this.type = {sourcePos = stmt.pos; this}
}
IL in Yoyak
CommonIL
case class Block(stmts: StatementContainer) extends Stmt
case class Switch(v: Value.Loc, keys: List[Value.t], targets: List[Target]) extends Stmt
case class Placeholder(x: AnyRef) extends Stmt

sealed trait CoreStmt extends Stmt
case class If(cond: Value.CondBinExp, target: Target) extends CoreStmt
case class Goto(target: Target) extends CoreStmt

sealed trait CfgStmt extends CoreStmt
case class Identity(lv: Value.Local, rv: Value.Param) extends CfgStmt
case class Assign(lv: Value.Loc, rv: Value.t) extends CfgStmt
case class Invoke(ret: Option[Value.Local], callee: Type.InvokeType) extends CfgStmt
case class Assume(cond: Value.CondBinExp) extends CfgStmt
case class Return(v: Option[Value.Loc]) extends CfgStmt
case class Nop() extends CfgStmt
case class EnterMonitor(v: Value.Loc) extends CfgStmt
case class ExitMonitor(v: Value.Loc) extends CfgStmt
case class Throw(v: Value.Loc) extends CfgStmt
IL in Yoyak
Stmt
x := 10;
switch (y) {
  case 0: println("0"); break;
  case 1: println("1");
  default: println("2");
}
if(z) { throw new Exception(); } else { println("done"); }
return 0;
CoreStmt
x := 10;
if(y == 0) { println("0"); goto D; }
if(y == 1) { println("1"); }
D: println("2");
if(z) { throw new Exception(); } else { println("done"); }
return 0;
CfgStmt
[Figure: control-flow graph from ENTRY to EXIT. Basic blocks: x := 10; Assume (y == 0) println("0"); Assume (y != 0); Assume (y == 1) println("0"); Assume (y != 1); println("2"); Assume (z) throw new Ex(); Assume (!z) println("done") return;]
Simple Interval Analysis in Yoyak
class IntervalAnalysis(cfg: CFG) {
  def run() = {
    import IntervalAnalysis.{memDomOps,absTransfer,widening}
    val analysis = new FlowSensitiveForwardAnalysis[GMemory](cfg)
    val output = analysis.compute
    output
  }
}

object IntervalAnalysis {
  type Memory = MemDom[IntervalInt,SetAbstraction[Any]]
  type GMemory = GaloisIdentity[Memory]
  implicit val absTransfer : AbstractTransferable[GMemory] =
    new StdSemantics[IntervalInt,SetAbstraction[Any],Memory] {
      val arithOps: ArithmeticOps[IntervalInt] = IntervalInt.arithOps
    }

  implicit val memDomOps : LatticeOps[GMemory] = MemDom.ops[IntervalInt,SetAbstraction[Any]]
  implicit val widening : Option[Widening[GMemory]] = {
    implicit val NoWideningForSetAbstraction = Widening.NoWidening[SetAbstraction[Any]]
    Some(MemDom.widening[IntervalInt,SetAbstraction[Any]])
  }
}
Simple Interval Analysis in Yoyak
[Figure: component diagram of the interval analysis. Domain side: MemDom, StdObjectModel, MapDom, AbsValue, AbsRef, AbsArith (IntervalInt), AbsBox (SetAb[Any]), AbsBottom, AbsTop, AbsObject, AbsAddr. Analysis side: IntervalAnalysis, FlowSensitiveForwardAnalysis, FlowSensitiveFixedPointComputation, Worklist, LatticeOps, FlowSensitiveIteration, AbstractTransferable, CfgNavigator, WideningAtLoopHeads, Widening, MapDom, BasicBlock, MemDom, MemDom.op, IntervalInt.widening, IntervalAnalysisTransferFunction, CFG, Fixed-point result, StdSemantics, ArithmeticOps, IntervalInt.arithOps]
Yoyak : Scala Experience
• Scala is a very good language to implement a static analyzer
• Function is a first class citizen
• Type class support
• Algebraic data type support
• Native support for mutable and immutable values
• Excellent support for parallelization
Yoyak : Scala Experience
• Function is a first class citizen
Natural way to express mathematical logic
// optimize Cfg
(insertAssume _ andThen removeIfandGoto) apply rawCfg
Yoyak : Scala Experience
• Type class support
Can avoid F-bounded polymorphism, which is the fast lane to overworking
• F-bounded polymorphism
• Commonly happens when inheritance meets immutability
• Seriously deteriorates code readability
Yoyak : Scala Experience
• F-bounded polymorphism
trait Queue[T, This <: Queue[T, This]] {
  def push(elem: T) : This
}
trait GoodQueue[T, This <: GoodQueue[T, This]] extends Queue[T, This] {
  def pop : (T, This)
}
trait BetterQueue[T, R, This <: BetterQueue[T, R, This]] extends GoodQueue[T, This] {
  def giveMeSomethingNew : R
}
trait QueueUnited[T, R, Q <: Queue[T, Q], G <: GoodQueue[T, G], B <: BetterQueue[T, R, B], This <: QueueUnited[T, R, Q, G, B, This]] extends BetterQueue[T, R, This] {
  def giveUp : Unit
}
• Always need the type of concrete subclass
• Reiterate all type variables again in subclass reference
• Type class liberates methods from inheritance
Yoyak : Scala Experience
• Type class
trait QueueLike[T,This] {
  def push(elem: T) : This
}
trait GoodQueueLike[T,This] {
  implicit val queueLike : QueueLike[T,This]
  def push(elem: T) : This = queueLike.push(elem)
  def pop(q: This) : (T,This)
}
trait BetterQueueLike[T,R,This] {
  implicit val goodQueueLike : GoodQueueLike[T,This]
  def push(elem: T) : This = goodQueueLike.push(elem)
  def pop(q: This) : (T,This) = goodQueueLike.pop(q)
  def giveMeSomethingNew : R
}
class QueueUnited[T,R,This](implicit val q : QueueLike[T,This], g : GoodQueueLike[T,This], b : BetterQueueLike[T,R,This]) {
  def push(elem: T) : This = b.push(elem)
  def pop(q: This) : (T,This) = b.pop(q)
  def giveMeSomethingNew : R = b.giveMeSomethingNew
  def giveUp : Unit = {}
}
Yoyak : Scala Experience
• Type class in Yoyak
trait StdObjectModel[A<:Galois,D<:Galois,This<:StdObjectModel[A,D,This]]
    extends MemDomLike[A,D,This] with ArrayJoinModel[A,D,This] {
  implicit val arithOps : ArithmeticOps[A]
  implicit val boxedOps : LatticeWithTopOps[D]
Use both methods in an appropriate place
Yoyak : Scala Experience
• Algebraic data type support
Natural way to express an abstract syntax tree of a program
[Figure: abstract syntax tree with root ";" whose children are if(x), with branches a = 1 and a = 2, and println(a)]
Seq(
  If("x", Assign("a",1), Assign("a",2)),
  Invoke("println", List("a")))
Yoyak : Scala Experience
• Algebraic data type support
Easy to navigate the abstract syntax tree
def eval(v: Value.t, input: Mem)(implicit context: Context) : (AbsValue[A,D],Mem) = {
  v match {
    case x : Value.Constant => evalConstant(x,input)
    case x : Value.Loc => evalLoc(x,input)
    case x : Value.BinExp => evalBinExp(x,input)
    case Value.This => (AbsRef(Set("$this")),input)
    case Value.CaughtExceptionRef => (AbsRef(Set("$caughtex")),input)
    case Value.CastExp(v, ofTy) => evalLoc(v,input)
    case Value.InstanceOfExp(v, ofTy) => (AbsTop,input)
    case Value.LengthExp(v) => (AbsTop,input)
    case Value.NewExp(ofTy) => input.alloc(context.stmt)
    case Value.NewArrayExp(ofTy, size) => input.alloc(context.stmt)
Yoyak : Scala Experience
• Native support for mutable and immutable values
[Figure: Memory with variables x, y, z pointing to an Object with fields f = 1 and g = "A"]
In some cases, mutability is more important than immutability
Yoyak : Scala Experience
• Native support for mutable and immutable values
[Figure: Memory with variables x, y, z; the old Object with fields f = 1, g = "A"; a NewObject with fields f = 2, g = "A"]
memory.filter{_._2 == object}.foldLeft(memory) { case (m,(k,_)) => m + (k -> newObject)}
O(n)
Yoyak : Scala Experience
• Native support for mutable and immutable values
[Figure: Memory with variables x, y, z pointing to NewObject with fields f = 2, g = "A"]
object.update(newObject) O(1)
Yoyak : Scala Experience
• Native support for mutable and immutable values
[Figure: Memory with variables x, y, z; the old Object with fields f = 1, g = "A"; a NewObject with fields f = 2, g = "A"]
If we frequently update immutable objects in a big memory, it may result in severe inefficiency
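The cost difference can be illustrated with a small sketch. It assumes, as the diagrams suggest, that x, y, and z all refer to the same object; `Obj`, `Cell`, and the value names are hypothetical and not taken from Yoyak.

```scala
object SharedObjectUpdate {
  case class Obj(f: Int, g: String)
  val oldObj = Obj(1, "A")
  val newObj = Obj(2, "A")

  // Immutable style: the memory maps each variable directly to a value,
  // so every entry holding the old object has to be rewritten: O(n).
  val memory: Map[String, Obj] = Map("x" -> oldObj, "y" -> oldObj, "z" -> oldObj)
  val rewritten: Map[String, Obj] =
    memory.filter(_._2 == oldObj).foldLeft(memory) { case (m, (k, _)) =>
      m + (k -> newObj)
    }

  // Mutable style: the variables share one mutable cell,
  // so a single assignment updates what all of them see: O(1).
  final class Cell(var content: Obj)
  val shared = new Cell(oldObj)
  val cellMemory: Map[String, Cell] = Map("x" -> shared, "y" -> shared, "z" -> shared)
  shared.content = newObj
}
```

The trade-off is the usual one: the mutable cell is cheap to update but makes earlier snapshots of the memory change too, which matters when an analysis needs to compare old and new states.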
Yoyak : Scala Experience
• Excellent support for parallelization
• Static analysis does not sufficiently utilize today’s advancement of computing scalability (multicore machines, big data technologies, cloud computing)
• Scala has an excellent platform for experimenting with parallelization, called Akka
• Many fun things to try with Yoyak powered by Akka
Yoyak : Scala Experience
• Excellent support for parallelization
Worklist parallelization can be naturally implemented by Akka’s Actor model
Yoyak : Roadmap
• Add more built-in abstract domains
• Optimize analysis performance
• Visualize analysis details
• Build Scala compiler plug-in
Yoyak : Roadmap
• Add more built-in abstract domains
Interval domain cannot represent the relation between two variables
x = [2,8], y = [1,7] produce 49 combinations of (x,y) pairs
[Figure: plot on a 0 to 10 grid, X Axis vs. Y Axis]
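A short sketch makes the loss of precision concrete. The relation y = x - 1 used below is a made-up example, not from the talk; `Itv` and `minus` are hypothetical names.

```scala
object RelationLoss {
  case class Itv(lo: Int, hi: Int)

  // Interval subtraction: [a,b] - [c,d] = [a - d, b - c]
  def minus(a: Itv, b: Itv): Itv = Itv(a.lo - b.hi, a.hi - b.lo)

  val x = Itv(2, 8)
  val y = Itv(1, 7)

  // Every (x, y) pair the two intervals allow: 7 * 7 = 49 combinations.
  val combos = for (a <- x.lo to x.hi; b <- y.lo to y.hi) yield (a, b)

  // What the interval domain can say about x - y.
  val diff: Itv = minus(x, y)

  // Hypothetical concrete program in which y = x - 1 always holds:
  // only 7 pairs are reachable, and x - y is always exactly 1.
  val reachable = (x.lo to x.hi).map(a => (a, a - 1))
}
```

Both intervals describe the reachable pairs correctly, but the interval result for x - y is [-5,7] rather than the exact value 1, because the link between the two variables is not represented.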
Yoyak : Roadmap
• Add more built-in abstract domains
Octagon domain can represent the relation between two variables
[Figure: plot on a 0 to 10 grid, X Axis vs. Y Axis]
http://www.di.ens.fr/~mine/publi/article-mine-HOSC06.pdf
Yoyak : Roadmap
• Add more built-in abstract domains
2-interval domain is more precise than interval domain
[Figure: plot on a 0 to 10 grid, X Axis vs. Y Axis]
Yoyak : Roadmap
• Optimize analysis performance
• {Worklist, Method, Class}-level parallelization
• Reduce abstract memory size by removing unused variables (faster join operation for abstract memory)
• Optional faster but unsound analysis
Yoyak : Roadmap
• Visualize analysis details
It is hard to know what a static analyzer is doing at a specific moment because…
• Static analyzer’s behavior is very different for each input program
• Often need to inspect and compare a map with thousands of entries
• Ordinary Java debuggers cannot show the big picture
Yoyak : Roadmap
• Visualize analysis details
Example from SAT solvers
Visualization of the search tree generated by a basic DPLL algorithm
DPVis
Yoyak : Roadmap
• Build Scala compiler plug-in
• Programming language researchers foresee that the semantic program analyzer will be merged with compiler systems in the near future as the type system did
Syntactic Analysis Grammar Checking Type System Semantic Analysis
Yoyak : Roadmap
• Build Scala compiler plug-in
• The Scala compiler is well modularized and cleanly coded (compared to other compiler systems), so it is an excellent platform for experimenting with new ideas
• Pure Scala code is safe from null; linked Java libraries, however, are not
• It would be great if the Scala compiler could detect possible null dereferences at compile time and issue a warning