Introduction Modeling Uncertainty Multi-Compilation Conclusion SymPy.stats: Uncertainty Modeling Matthew Rocklin University of Chicago July, 2012


Introduction Modeling Uncertainty Multi-Compilation Conclusion

SymPy.stats: Uncertainty Modeling

Matthew Rocklin

University of Chicago

July, 2012


Topics

Uncertainty

Modeling

Symbolics

Multi-Compilation


Data-Driven Statistics vs. Uncertainty Modeling

Last Talk - Pandas: Data Analysis

Figure: Historical CO2 data from Mauna Loa observatory (CO2 in ppm, roughly 310 to 400, plotted against year, 1950 to 2020)

This Talk - SymPy.stats: Mathematical Modeling with Uncertainty

Figure: CO2 predictions with uncertainty bounds


Uncertainty

Windspeed: 30 km/hr

Wind Power: 50 kW

Electricity: 40 kW

Lights are on!


Uncertainty

Measurements are Distributions, not Scalars

Wind Speed: around 30 km/hr

Wind Power: [figure]

Electricity: [figure]

Lights stay on?


Kinematics - Direct Solution

from math import cos, pi, sin

# Define inputs
x0 = 0
y0 = 0
yf = -30        # target height
g = -10         # gravity
v = 30          # m/s
theta = pi/4

# Solve: step forward in time until the projectile drops to the target height
t, dt = 0, 0.001    # start time and time step (the step size is an assumed value)
y = y0
while y > yf:
    t += dt
    y = y0 + v*sin(theta)*t + g*t**2 / 2

x = x0 + v*cos(theta)*t


Kinematics - Modeling

from sympy import Symbol, cos, pi, sin, solve

# Define inputs
x0 = 0
y0 = 0
yf = -30        # target height
g = -10         # gravity
v = 30          # m/s
theta = pi/4

# Define Model
t = Symbol('t')
x = x0 + v*cos(theta)*t
y = y0 + v*sin(theta)*t + g*t**2 / 2
impact_time = max(solve(y - yf, t))    # solve returns both roots; keep the positive one
xf = x0 + v*cos(theta)*impact_time


Kinematics - Full SymPy

from sympy import Symbol, cos, sin, solve

# Define inputs
x0 = Symbol('x0')
y0 = Symbol('y0')
yf = Symbol('yf')
g = Symbol('g')
v = Symbol('v')
theta = Symbol('theta')

# Define Model
t = Symbol('t')
x = x0 + v*cos(theta)*t
y = y0 + v*sin(theta)*t + g*t**2 / 2
impact_time = solve(y - yf, t)[0]    # solve returns a list with both roots; [0] selects one of them
xf = x0 + v*cos(theta)*impact_time


Kinematics - Graph

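SymPy generates a graph

Figure: Expression graph. x depends on x0, v, theta and t; y depends on y0, v, theta, t and g; impact_time depends on y and yf; xf depends on x0, v, theta and impact_time.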
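A minimal sketch of how the dependencies in such a graph can be read off a SymPy expression. It assumes the symbolic inputs of the Full SymPy slide; `free_symbols` lists the inputs an expression depends on, and `dotprint` emits Graphviz DOT text for the expression tree of a single expression, which is related to, but not the same as, the multi-expression graph drawn on the slide. The name `dot_source` is illustrative.

```python
from sympy import Symbol, cos, sin
from sympy.printing.dot import dotprint

x0, y0, g, v, theta, t = [Symbol(s) for s in ("x0", "y0", "g", "v", "theta", "t")]
x = x0 + v*cos(theta)*t
y = y0 + v*sin(theta)*t + g*t**2 / 2

# Inputs each expression depends on
print(sorted(s.name for s in x.free_symbols))
print(sorted(s.name for s in y.free_symbols))

# Graphviz DOT source for the expression tree of y (render with `dot -Tpng`)
dot_source = dotprint(y)
print(dot_source[:60])
```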


Kinematics - Computing on a Graph

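Solving y = yf for t and keeping the root that is positive when g < 0 and yf < y0:

$$\text{impact\_time} = \frac{-v\sin(\theta) - \sqrt{-2 g y_0 + 2 g y_f + v^2\sin^2(\theta)}}{g}$$

$$x_f = x_0 + \frac{v\left(-v\sin(\theta) - \sqrt{-2 g y_0 + 2 g y_f + v^2\sin^2(\theta)}\right)\cos(\theta)}{g}$$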
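As a sketch of how the symbolic solution can be used, the model can be solved once in symbols and then evaluated at the numbers from the earlier kinematics slides. The names `roots` and `values` are illustrative, and picking the root with `max` after substitution is an assumption that suits these particular inputs (it keeps the root with t > 0).

```python
from sympy import N, Symbol, cos, pi, sin, solve

x0, y0, yf, g, v, theta, t = [
    Symbol(s) for s in ("x0", "y0", "yf", "g", "v", "theta", "t")
]

y = y0 + v*sin(theta)*t + g*t**2 / 2
roots = solve(y - yf, t)            # both roots of the quadratic in t

values = {x0: 0, y0: 0, yf: -30, g: -10, v: 30, theta: pi/4}

# Substitute numbers, then keep the later (positive) root
impact_time = max(r.subs(values) for r in roots)
xf = (x0 + v*cos(theta)*impact_time).subs(values)

print(N(impact_time))   # about 5.36 seconds
print(N(xf))            # about 113.7 metres
```

Because the solve step is done once in symbols, changing an input only requires a new `values` dictionary.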


Random Variables

Introducing SymPy.stats

>>> from sympy import Symbol, plot
>>> from sympy.stats import *
>>> # v = Symbol('v')
>>> v = Normal('v', 30, 1)

>>> # Plot the density
>>> z = Symbol('z')
>>> pdf = density(v)
>>> plot(pdf(z), (z, 27, 33))

$$\frac{\sqrt{2}\, e^{-\frac{1}{2}(z-30)^2}}{2\sqrt{\pi}}$$

Figure: Distribution of velocity (density plotted for z from 27 to 33)

>>> P(v > 31)

$$-\frac{1}{2}\operatorname{erf}\left(\frac{\sqrt{2}}{2}\right) + \frac{1}{2} = 0.1586...$$
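A short, self-contained sketch of the same queries without the plot, using explicit imports instead of `import *`. The variable name `p_fast` is illustrative.

```python
from sympy import Symbol
from sympy.stats import E, Normal, P, density, variance

v = Normal('v', 30, 1)   # velocity: mean 30 m/s, standard deviation 1 m/s
z = Symbol('z')

pdf = density(v)(z)      # symbolic density evaluated at z
p_fast = P(v > 31)       # exact expression involving erf

print(pdf)
print(p_fast, "=", float(p_fast))
print(E(v), variance(v))
```

The probability stays symbolic until `float` is applied, which mirrors the exact-then-numeric result on the slide.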


Graph with Random Expressions

Random inputs make the expressions that depend on them random

Figure: The same expression graph, with inputs x0, y0, v, theta, g, yf and t feeding x and y, then impact_time, then xf.


Querying Random Expressions

a, b = symbols('a, b')
density(x)(a) * density(y)(b)

$$\frac{e^{-\frac{a^2}{t^2}}\; e^{-\frac{(b+5t^2)^2}{t^2}}\; e^{\frac{30\sqrt{2}\,a}{t}}\; e^{\frac{30\sqrt{2}\,(b+5t^2)}{t}}}{\pi t^2 e^{900}}$$

plot(P(y > yf), (t, 4.5, 6.5))

Figure: Probability that the cannon ball has not yet landed, plotted against time from 4.5 to 6.5

E(impact_time)

$$\int_{-\infty}^{\infty} \frac{\left(v + \sqrt{v^2 + 1200}\right) e^{-\frac{1}{2}(v-30)^2}}{20\sqrt{\pi}}\, dv$$
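A sketch of how single points on the probability curve can be checked by hand. It assumes the numeric inputs from the kinematics slides (y0 = 0, yf = -30, g = -10, theta = pi/4) and v ~ Normal(30, 1). Rather than asking `P` to handle the event on y, the event y(t) > yf is rewritten as an event on v and evaluated with the normal CDF. The helper `prob_airborne` is hypothetical, not part of SymPy.

```python
from sympy import Rational, pi, sin
from sympy.stats import Normal, cdf

y0, yf, g, theta = 0, -30, -10, pi/4
v = Normal('v', 30, 1)

def prob_airborne(t):
    """P(y(t) > yf): the ball is still above the target height at time t > 0."""
    # y0 + v*sin(theta)*t + g*t**2/2 > yf  <=>  v > v_min, since sin(theta)*t > 0
    v_min = (yf - y0 - g*t**2 / 2) / (sin(theta)*t)
    return 1 - cdf(v)(v_min)

for t in (Rational(9, 2), Rational(27, 5), Rational(13, 2)):
    print(float(t), float(prob_airborne(t)))
```

The three times span the time axis of the plot: the probability is close to 1 at 4.5, drops through the middle of the range, and is close to 0 at 6.5.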


Random and Computational expressions

SymPy.stats does two things

1. Models uncertain systems
2. Reduces uncertain expressions to computational ones

Functions P, E, sample, density, variance :: Random Expression → Computational Expression

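$$P(v > 31) \;\to\; \int_{31}^{\infty} \frac{\sqrt{2}\, e^{-\frac{1}{2}(z-30)^2}}{2\sqrt{\pi}}\, dz \;\to\; -\frac{1}{2}\operatorname{erf}\left(\frac{\sqrt{2}}{2}\right) + \frac{1}{2}$$

$$E(\text{impact\_time}) \;\to\; \int_{-\infty}^{\infty} \frac{\left(v + \sqrt{v^2 + 1200}\right) e^{-\frac{1}{2}(v-30)^2}}{20\sqrt{\pi}}\, dv \;\to\; ?$$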

But there are other ways to compute integrals

1. scipy.integrate.quad (uses FORTRAN library)
2. Monte Carlo (E(impact_time, numsamples=10000))
3. Code Generation
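A sketch of the first two options applied to the E(impact_time) integral, written directly in NumPy and SciPy rather than through SymPy.stats. The integrand is copied from the integral above. Dividing it by the Normal(30, 1) density gives the per-sample impact time (v + sqrt(v^2 + 1200)) / (10 sqrt(2)) used in the Monte Carlo part. The finite bounds 20 to 40 are an assumption: they cover ten standard deviations either side of the mean, where the neglected tails are negligible, and keep the quadrature from missing the narrow peak.

```python
import numpy as np
from scipy.integrate import quad

def integrand(v):
    """Impact time for velocity v, weighted by the Normal(30, 1) density of v."""
    return (v + np.sqrt(v**2 + 1200)) * np.exp(-0.5*(v - 30)**2) / (20*np.sqrt(np.pi))

# 1. Adaptive quadrature over mean +/- 10 standard deviations
quad_value, quad_error = quad(integrand, 20, 40)

# 2. Monte Carlo: sample v, compute the impact time of each sample, average
rng = np.random.default_rng(0)
samples = rng.normal(loc=30, scale=1, size=10_000)
impact = (samples + np.sqrt(samples**2 + 1200)) / (10*np.sqrt(2))
mc_value = impact.mean()

print(quad_value, quad_error)
print(mc_value)
```

The two estimates should agree to within the Monte Carlo sampling error, which shrinks like one over the square root of the sample count.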


Other kinds of expressions

RV Type                          Computational Type
Continuous                       SymPy Integral
Discrete - Finite (dice)         Python iterators / generators
Discrete - Infinite (Poisson)    SymPy Summation
Multivariate Normal              SymPy Matrix Expression
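A brief sketch of the two discrete rows, assuming the `Die` and `Poisson` constructors from `sympy.stats`. The symbol name `lambda` and the variable names are illustrative.

```python
from sympy import Symbol
from sympy.stats import Die, E, P, Poisson, density

# Finite discrete: a fair six-sided die
die = Die('d', 6)
print(density(die).dict)       # maps each face to probability 1/6
print(E(die), P(die > 4))      # 7/2 and 1/3

# Infinite discrete: a Poisson count with symbolic rate
lam = Symbol('lambda', positive=True)
n = Poisson('n', lam)
print(E(n))                    # expectation computed from an infinite sum
```

The finite case is handled by enumerating outcomes, while the Poisson expectation requires evaluating a sum over all non-negative integers.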


Kalman Filter

mu = MatrixSymbol('mu', n, 1)          # n by 1 mean vector
Sigma = MatrixSymbol('Sigma', n, n)    # covariance matrix
X = MVNormal('X', mu, Sigma)

H = MatrixSymbol('H', k, n)            # An observation operator
data = MatrixSymbol('data', k, 1)

R = MatrixSymbol('R', k, k)            # covariance matrix for noise
noise = MVNormal('eta', ZeroMatrix(k, 1), R)

# Conditional density of X given HX+noise==data
density(X, Eq(H*X + noise, data))

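Conditional mean and covariance:

$$\begin{bmatrix} I & 0 \end{bmatrix}\left(\begin{bmatrix}\Sigma & 0\\ 0 & R\end{bmatrix}\begin{bmatrix}H^T\\ I\end{bmatrix}\left(\begin{bmatrix}H & I\end{bmatrix}\begin{bmatrix}\Sigma & 0\\ 0 & R\end{bmatrix}\begin{bmatrix}H^T\\ I\end{bmatrix}\right)^{-1}\left(\text{data} - \begin{bmatrix}H & I\end{bmatrix}\begin{bmatrix}\mu\\ 0\end{bmatrix}\right) + \begin{bmatrix}\mu\\ 0\end{bmatrix}\right)$$

$$\begin{bmatrix} I & 0 \end{bmatrix}\left(I - \begin{bmatrix}\Sigma & 0\\ 0 & R\end{bmatrix}\begin{bmatrix}H^T\\ I\end{bmatrix}\left(\begin{bmatrix}H & I\end{bmatrix}\begin{bmatrix}\Sigma & 0\\ 0 & R\end{bmatrix}\begin{bmatrix}H^T\\ I\end{bmatrix}\right)^{-1}\begin{bmatrix}H & I\end{bmatrix}\right)\begin{bmatrix}\Sigma & 0\\ 0 & R\end{bmatrix}\begin{bmatrix}I\\ 0\end{bmatrix}$$

which simplify to:

$$\mu + \Sigma H^T \left(R + H\Sigma H^T\right)^{-1}\left(\text{data} - H\mu\right)$$

$$\left(I - \Sigma H^T \left(R + H\Sigma H^T\right)^{-1} H\right)\Sigma$$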
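A numeric sketch of the simplified update in NumPy, offered as an illustration rather than as generated code. It uses the standard form of the conditional mean, in which the correction is driven by data - H mu, and the covariance (I - Sigma H^T (R + H Sigma H^T)^{-1} H) Sigma. The function name `kalman_update` and the small test system are hypothetical.

```python
import numpy as np

def kalman_update(mu, Sigma, H, R, data):
    """Condition X ~ N(mu, Sigma) on H @ X + noise == data, with noise ~ N(0, R)."""
    S = R + H @ Sigma @ H.T                # covariance of the observation
    # Gain K = Sigma H^T S^{-1}; solve() avoids forming the inverse explicitly
    # and the transpose is valid because S and Sigma are symmetric.
    K = np.linalg.solve(S, H @ Sigma).T
    new_mu = mu + K @ (data - H @ mu)
    new_Sigma = (np.eye(len(mu)) - K @ H) @ Sigma
    return new_mu, new_Sigma

# Two-dimensional state, one noisy observation of the first component
mu = np.zeros((2, 1))
Sigma = np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
data = np.array([[2.0]])

new_mu, new_Sigma = kalman_update(mu, Sigma, H, R, data)
print(new_mu.ravel())
print(new_Sigma)
```

With equal prior and noise variance the observed component moves halfway toward the data and its variance halves, while the unobserved component is unchanged.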


Scientific Computing Technology Stack

Figure: Layers of the stack, with the tools that implement them.

Layers shown: Scientific description; Uncertainty; Math / PDE description; Linear Algebra / Matrix Expressions; Sparse matrix algorithms; Numerical Linear Algebra; Parallel solution / scheduler; C/FORTRAN and CUDA; x86, PowerPC, GPU, SoC

Tools shown alongside: SymPy.stats, FEniCS, PETSc/Trilinos, BLAS/LAPACK, gcc/nvcc


End

This work was a Google Summer of Code project.

Contributors: Raoul Bourquin, Nathan Alison

GSoC Mentor: Andy Terrel

SymPy: http://github.com/sympy/sympy

me: Matthew Rocklin http://matthewrocklin.com

http://github.com/mrocklin