Introduction Modeling Uncertainty Multi-Compilation Conclusion
SymPy.stats: Uncertainty Modeling
Matthew Rocklin
University of Chicago
July, 2012
Topics
Uncertainty
Modeling
Symbolics
Multi-Compilation
Data-Driven Statistics vs. Uncertainty Modeling
Last Talk - Pandas: Data Analysis
Figure: Historical CO2 data from Mauna Kea observatory (plot titled "CO2 levels observed at Mauna Kea"; CO2 in ppm, 310 to 400, against year, 1950 to 2020)
This Talk - SymPy.stats: Mathematical Modeling with Uncertainty
Figure: CO2 predictions with uncertainty bounds
Uncertainty
Windspeed: 30 km/hr
Wind Power: 50 kW
Electricity: 40 kW
Lights are on!
Uncertainty
Measurements are Distributions, not Scalars
Windspeed: around 30 km/hr
Wind Power:
Electricity:
Lights stay on?
Kinematics - Direct Solution
from math import sin, cos, pi

# Define inputs
x0 = 0
y0 = 0
yf = -30        # target height
g = -10         # gravity
v = 30          # m/s
theta = pi/4

# Solve: step time forward until the projectile falls to yf
t, dt = 0.0, 0.001   # start time and time step
y = y0
while y > yf:
    t += dt
    y = y0 + v*sin(theta)*t + g*t**2 / 2
x = x0 + v*cos(theta)*t
Figure: diagram labelled g and v
Kinematics - Modeling
from sympy import Symbol, sin, cos, pi, solve

# Define inputs
x0 = 0
y0 = 0
yf = -30        # target height
g = -10         # gravity
v = 30          # m/s
theta = pi/4

# Define Model
t = Symbol('t')
x = x0 + v*cos(theta)*t
y = y0 + v*sin(theta)*t + g*t**2 / 2
impact_time = max(solve(y - yf, t))   # solve returns both roots; the later one is the impact
xf = x0 + v*cos(theta)*impact_time
Kinematics - Full SymPy
from sympy import Symbol, sin, cos, solve

# Define inputs
x0 = Symbol('x0')
y0 = Symbol('y0')
yf = Symbol('yf')
g = Symbol('g')
v = Symbol('v')
theta = Symbol('theta')

# Define Model
t = Symbol('t')
x = x0 + v*cos(theta)*t
y = y0 + v*sin(theta)*t + g*t**2 / 2
roots = solve(y - yf, t)   # both roots of the quadratic, as a list
impact_time = roots[0]     # keep whichever root is positive for your parameters (index 0 shown here)
xf = x0 + v*cos(theta)*impact_time
Kinematics - Graph
SymPy generates a graph
Figure: expression graph with nodes x0, y0, yf, g, v, theta, t, x, y, impact_time and xf
Kinematics - Computing on a Graph
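Solving y0 + v sin(θ) t + g t²/2 = yf for t and keeping the root that is positive when g < 0 and yf < y0:
$$\text{impact\_time} = \frac{-v \sin\theta - \sqrt{v^2 \sin^2\theta - 2 g y_0 + 2 g y_f}}{g}$$
$$x_f = x_0 + \frac{v \left(-v \sin\theta - \sqrt{v^2 \sin^2\theta - 2 g y_0 + 2 g y_f}\right) \cos\theta}{g}$$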
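As a quick numeric check on the closed form, the sketch below uses only the Python standard library (an assumption made for portability; it is not SymPy code). It solves the model's quadratic y0 + v sin(θ) t + g t²/2 = yf for the inputs used on the earlier slides and confirms that the later root lands on the target height. The function name `impact_time_closed_form` is hypothetical.

```python
from math import sin, cos, sqrt, pi


def impact_time_closed_form(y0, yf, g, v, theta):
    """Later root of y0 + v*sin(theta)*t + g*t**2/2 == yf (assumes g < 0)."""
    disc = (v * sin(theta)) ** 2 - 2 * g * (y0 - yf)
    return (-v * sin(theta) - sqrt(disc)) / g


# Inputs from the "Define inputs" slides
x0, y0, yf, g, v, theta = 0, 0, -30, -10, 30, pi / 4

t_hit = impact_time_closed_form(y0, yf, g, v, theta)
y_hit = y0 + v * sin(theta) * t_hit + g * t_hit ** 2 / 2   # should equal yf
x_hit = x0 + v * cos(theta) * t_hit                         # horizontal distance at impact

print(round(t_hit, 3), round(y_hit, 6), round(x_hit, 1))
```

Substituting the root back into y(t) is a cheap way to catch a wrong sign or a dropped factor of 2 in a hand-copied formula.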
Random Variables
Introducing SymPy.stats
>>> from sympy import Symbol, plot
>>> from sympy.stats import *
>>> z = Symbol('z')
>>> # v = Symbol('v')
>>> v = Normal('v', 30, 1)
>>> # Plot the density
>>> pdf = density(v)
>>> plot(pdf(z), (z, 27, 33))
$$\frac{\sqrt{2}\, e^{-\frac{1}{2}(z-30)^2}}{2\sqrt{\pi}}$$
Figure: Distribution of velocity (density plotted for z from 27 to 33)
>>> P(v > 31)
$$-\frac{1}{2}\operatorname{erf}\left(\frac{\sqrt{2}}{2}\right) + \frac{1}{2} = 0.1586...$$
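The closed form for P(v > 31) can be cross-checked numerically. This is a minimal sketch using only the Python standard library (`math.erf` and `statistics.NormalDist`), chosen here as an independent reference rather than as part of SymPy.stats; the variable names are illustrative.

```python
from math import erf, sqrt
from statistics import NormalDist

# Closed form shown on the slide: -erf(sqrt(2)/2)/2 + 1/2
p_closed_form = -0.5 * erf(sqrt(2) / 2) + 0.5

# Independent reference: upper tail of a Normal(30, 1) beyond 31
p_reference = 1 - NormalDist(mu=30, sigma=1).cdf(31)

print(p_closed_form, p_reference)
```

Both values are the one-standard-deviation upper tail of a normal distribution, so they should agree to floating-point precision.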
Graph with Random Expressions
Random inputs cause other random expressions
Figure: the same expression graph (nodes x0, y0, v, theta, g, yf, t, x, y, impact_time, xf), now with random expressions
Querying Random Expressions
a, b = symbols('a,b')
density(x)(a) * density(y)(b)
$$\frac{e^{-a^2/t^2}\, e^{-(b+5t^2)^2/t^2}\, e^{30\sqrt{2}\,a/t}\, e^{30\sqrt{2}\,(b+5t^2)/t}}{\pi t^2 e^{900}}$$
plot(P(y > yf), (t, 4.5, 6.5))
Figure: Probability that the cannon ball has not yet landed, plotted against time from 4.5 to 6.5
E(impact_time)
$$\int_{-\infty}^{\infty} \frac{\left(v + \sqrt{v^2 + 1200}\right) e^{-\frac{1}{2}(v-30)^2}}{20\sqrt{\pi}}\, dv$$
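The probability that the cannon ball has not yet landed can also be worked out by hand, because y(t) is linear in the random velocity v. The sketch below is a standard-library stand-in for the P(y > yf) query, not the SymPy.stats implementation; it assumes the numeric inputs from the earlier slides (x0 = y0 = 0, yf = -30, g = -10, theta = pi/4) and v distributed as Normal(30, 1). The function name `prob_still_airborne` is hypothetical.

```python
from math import sin, sqrt, pi
from statistics import NormalDist

y0, yf, g, theta = 0, -30, -10, pi / 4
velocity = NormalDist(mu=30, sigma=1)


def prob_still_airborne(t):
    """P(y(t) > yf) for t > 0, where y(t) = y0 + v*sin(theta)*t + g*t**2/2."""
    # y(t) > yf  <=>  v > (yf - y0 - g*t**2/2) / (sin(theta)*t)
    v_needed = (yf - y0 - g * t ** 2 / 2) / (sin(theta) * t)
    return 1 - velocity.cdf(v_needed)


# Impact time for the mean velocity v = 30; the probability there is one half.
t_nominal = (30 * sin(theta) + sqrt((30 * sin(theta)) ** 2 - 2 * g * (y0 - yf))) / -g

for t in (4.5, 5.0, 5.5, 6.0, 6.5):
    print(t, round(prob_still_airborne(t), 4))
print(round(t_nominal, 3), prob_still_airborne(t_nominal))
```

The probability falls from nearly 1 to nearly 0 across a narrow window around the nominal impact time, which is the shape the plot shows.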
Random and Computational expressions
SymPy.stats does two things
1. Models uncertain systems
2. Reduces uncertain expressions to computational ones
Functions P, E, sample, density, variance :: Random Expression → Computational Expression
$$P(v > 31) \to \int_{31}^{\infty} \frac{\sqrt{2}\, e^{-\frac{1}{2}(z-30)^2}}{2\sqrt{\pi}}\, dz \to -\frac{1}{2}\operatorname{erf}\left(\frac{\sqrt{2}}{2}\right) + \frac{1}{2}$$
$$E(\text{impact\_time}) \to \int_{-\infty}^{\infty} \frac{\left(v + \sqrt{v^2 + 1200}\right) e^{-\frac{1}{2}(v-30)^2}}{20\sqrt{\pi}}\, dv \to\ ?$$
But there are other ways to compute integrals
1. scipy.integrate.quad (uses FORTRAN library)
2. Monte Carlo (E(impact_time, numsamples=10000))
3. Code Generation
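To make the first two alternatives concrete, here is a minimal sketch that evaluates E(impact_time) both by numerical quadrature and by Monte Carlo. It deliberately uses only the Python standard library: a hand-written composite Simpson's rule stands in for `scipy.integrate.quad`, and `random.gauss` stands in for SymPy.stats sampling. The integrand is the one shown above; the helper names, the integration window of 20 to 40, and the seed are assumptions made for illustration.

```python
import random
from math import exp, sqrt, pi


def impact_time(v):
    """Impact time for launch speed v (theta = pi/4, g = -10, y0 = 0, yf = -30)."""
    return sqrt(2) * (v + sqrt(v * v + 1200)) / 20


def integrand(v):
    """impact_time(v) times the Normal(30, 1) density, as in the integral above."""
    return (v + sqrt(v * v + 1200)) * exp(-0.5 * (v - 30) ** 2) / (20 * sqrt(pi))


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Quadrature: the density is negligible more than 10 standard deviations from the mean.
expectation_quadrature = simpson(integrand, 20.0, 40.0)

# Monte Carlo: average the impact time over sampled velocities.
rng = random.Random(0)  # fixed seed so the run is repeatable
num_samples = 10000
expectation_monte_carlo = sum(
    impact_time(rng.gauss(30, 1)) for _ in range(num_samples)
) / num_samples

print(expectation_quadrature, expectation_monte_carlo)
```

Quadrature gives a tight answer for this one-dimensional integral, while the Monte Carlo estimate carries sampling noise that shrinks with the square root of the sample count. The trade-off tends to reverse as the number of random inputs grows.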
Other kinds of expressions
RV Type | Computational Type
Continuous | SymPy Integral
Discrete - Finite (dice) | Python iterators / generators
Discrete - Infinite (Poisson) | SymPy Summation
Multivariate Normal | SymPy Matrix Expression
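For the finite discrete row, the idea of reducing a random expression to plain Python iteration can be illustrated with two dice. This is a sketch of the general technique using `itertools` and `fractions`; it is not SymPy's internal implementation, and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
outcomes = list(product(faces, repeat=2))   # 36 equally likely (die1, die2) pairs
weight = Fraction(1, len(outcomes))         # probability of each pair

# Expectation and probability become sums over the enumerated outcomes.
expected_sum = sum(weight * (a + b) for a, b in outcomes)
prob_sum_over_10 = sum(weight for a, b in outcomes if a + b > 10)

print(expected_sum, prob_sum_over_10)
```

Because the sample space is finite, exact rational arithmetic is possible and no integration or summation machinery is needed.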
Kalman Filter
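mu = MatrixSymbol('mu', n, 1)        # n by 1 mean vector
Sigma = MatrixSymbol('Sigma', n, n)  # covariance matrix
X = MVNormal('X', mu, Sigma)
H = MatrixSymbol('H', k, n)          # An observation operator
data = MatrixSymbol('data', k, 1)
R = MatrixSymbol('R', k, k)          # covariance matrix for noise
noise = MVNormal('eta', ZeroMatrix(k, 1), R)
# Conditional density of X given HX+noise==data
density(X, Eq(H*X + noise, data))
$$\begin{bmatrix} I & 0 \end{bmatrix} \left( \begin{bmatrix} \Sigma & 0 \\ 0 & R \end{bmatrix} \begin{bmatrix} H^T \\ I \end{bmatrix} \left( \begin{bmatrix} H & I \end{bmatrix} \begin{bmatrix} \Sigma & 0 \\ 0 & R \end{bmatrix} \begin{bmatrix} H^T \\ I \end{bmatrix} \right)^{-1} \left( \begin{bmatrix} H & I \end{bmatrix} \begin{bmatrix} \mu \\ 0 \end{bmatrix} - data \right) + \begin{bmatrix} \mu \\ 0 \end{bmatrix} \right)$$
$$\begin{bmatrix} I & 0 \end{bmatrix} \left( I - \begin{bmatrix} \Sigma & 0 \\ 0 & R \end{bmatrix} \begin{bmatrix} H^T \\ I \end{bmatrix} \left( \begin{bmatrix} H & I \end{bmatrix} \begin{bmatrix} \Sigma & 0 \\ 0 & R \end{bmatrix} \begin{bmatrix} H^T \\ I \end{bmatrix} \right)^{-1} \begin{bmatrix} H & I \end{bmatrix} \right) \begin{bmatrix} \Sigma & 0 \\ 0 & R \end{bmatrix} \begin{bmatrix} I \\ 0 \end{bmatrix}$$
$$\mu + \Sigma H^T \left(R + H \Sigma H^T\right)^{-1} \left(H\mu - data\right)$$
$$\left(I - \Sigma H^T \left(R + H \Sigma H^T\right)^{-1} H\right) \Sigma$$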
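The simplified covariance expression (I − ΣHᵀ(R + HΣHᵀ)⁻¹H)Σ can be sanity-checked in the scalar case, where every matrix is a single number. The sketch below uses plain Python floats with hypothetical values for Sigma, H and R. It compares the expression against the equivalent information form 1/(1/Σ + H²/R), a standard identity for Gaussian conditioning that is brought in here as an outside reference and does not appear on the slide.

```python
# Scalar stand-ins for the matrices on the slide (values are hypothetical).
Sigma = 4.0   # prior variance of X
H = 2.0       # observation operator
R = 1.0       # observation noise variance

# Slide expression: (I - Sigma*H^T*(R + H*Sigma*H^T)^-1*H) * Sigma
gain = Sigma * H / (R + H * Sigma * H)
posterior_variance = (1 - gain * H) * Sigma

# Information form of the same quantity: precisions add.
information_form = 1 / (1 / Sigma + H * H / R)

print(posterior_variance, information_form)
```

Observing the data can only reduce uncertainty, so the posterior variance should come out smaller than the prior variance Sigma.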
Scientific Computing Technology Stack
Figure: technology stack diagram
Layers: Scientific description, Uncertainty, Math / PDE description, Linear Algebra / Matrix Expressions, Numerical Linear Algebra, Sparse matrix algorithms, Parallel solution / scheduler
Languages and compilers: C/FORTRAN, CUDA, gcc/nvcc
Hardware: x86, PowerPC, GPU, SoC
Tools shown alongside the layers: SymPy.stats, FEniCS, PETSc/Trilinos, BLAS/LAPACK
End
This work was a Google Summer of Code project.
Contributors: Raoul Bourquin, Nathan Alison
GSoC Mentor: Andy Terrel
SymPy: http://github.com/sympy/sympy
me: Matthew Rocklin http://matthewrocklin.com
http://github.com/mrocklin