
Python Analysis

PHYS 224, U of T Physics, September 25/26, 2014

Fitting Experimental Data

The goal of the lab experiments is to determine a physical quantity y (the dependent variable) as a function of x (the independent variable). How?

• Measure the pair (xi, yi) a number of times (N)
• Find a fit function y = y(x) that describes the relationship between these two quantities

The Linear Case

• The simplest function relating the two variables is the linear function f(x) = y = ax + b
• This is valid for any (xi, yi) combination
• If a and b are known, the true value of yi can be calculated for any xi:

yi,true = a*xi + b

Linear Regression

• Linear regression calculates the most probable values of a and b such that the linear equation yi,true = a*xi + b is valid

• The measurements of yi usually follow Gauss’ distribution

An Example

• Ideal Gas Law: P*V = n*R*T
• Pressure * Volume = n * R * Temperature
• P = [(n*R)/V]*T, which is linear in the temperature T
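Since P = [(n*R)/V]*T is linear in T, the fit below needs (T, P) data pairs. A minimal sketch of what such data could look like (hypothetical values, assuming n = 1 mol in a V = 1 m³ volume so the slope is simply R, with made-up Gaussian noise):

```python
import numpy

# Hypothetical setup: n = 1 mol in V = 1 m^3, so P = (n*R/V)*T = R*T
R = 8.314                             # gas constant, J/(mol K)
temp = numpy.arange(270., 355., 5.)   # temperatures in K

# Ideal pressures plus made-up Gaussian measurement noise (sigma = 20)
rng = numpy.random.default_rng(0)
pressure = R * temp + rng.normal(0., 20., size=len(temp))
```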


Fitting in Python

• We’re going to use the curve_fit function, which is part of the scipy.optimize package

• The usage is as follows:

fit_parameters, fit_covariance = scipy.optimize.curve_fit(fit_function, x_data, y_data, p0=guess, sigma=uncertainty)

fit_parameters - an array of the output fit parameters
fit_covariance - an array of the covariance of the output fit parameters
fit_function - the function used to do the fit
p0 (guess) - the initial guess input to the fit
sigma (uncertainty) - the uncertainty associated with the data


Fitting with curve_fit


import numpy
import scipy.optimize
from matplotlib import pyplot

#define the function to be used in the fitting
def linearFit(x,*p):
    return p[0]+p[1]*x

#read in the data (currently only located on my hard drive...)
temp_data, vol_data = numpy.loadtxt('ideal_gas_law.txt', unpack=True)

#add an uncertainty to each measurement point
uncertainty = numpy.empty(len(vol_data))
uncertainty.fill(20.)

#do the fit
fit_parameters, fit_covariance = scipy.optimize.curve_fit(linearFit, temp_data, vol_data, p0=(1.0,8.0), sigma=uncertainty)

Fitting with curve_fit

The slide annotates the same code as above, mapping the arguments onto the call: linearFit is the fit function, temp_data is the x data, vol_data is the y data, p0=(1.0,8.0) is the initial guess for the parameters, and sigma=uncertainty is the uncertainty on the data.

Results


fit parameters = [0.21617647 8.33058824]

fit covariance = [[ 2.16490542e+04 -6.89053501e+01]
                  [-6.89053501e+01  2.20507375e-01]]

So what does this mean?

We set up the function for the fit to be:

y = p[0] + p[1]*x

So with the fit parameters, the function is:

y = 0.216 + 8.33*x

Full Probability

• For a set of N measurements of the dependent variable y: y1, y2, y3, ..., yN
• The probability of obtaining these values is the product of the individual probabilities:

P_{a,b}(y_1, y_2, \ldots, y_N) = P_{a,b}(y_1) P_{a,b}(y_2) \cdots P_{a,b}(y_N) = \frac{1}{\sigma_y^N} \exp\left[ -\sum_{i=1}^{N} \frac{(y_i - a - b x_i)^2}{2\sigma_y^2} \right]

The sum appearing in the exponent is called the chi-squared (χ2).

Chi-Squared

• The numerator, yi − (a + b*xi), is the residual: the true data (yi) minus the fit data (a + b*xi)

• Dividing this by the standard deviation (σ) tells us how many standard deviations the data point is away from the fit at that x

• The square ensures each term is always positive

\chi^2 = \sum_{i=1}^{N} \frac{(y_i - a - b x_i)^2}{\sigma_y^2}

Plotting the Residuals


#read in the data (currently only located on my hard drive...)
temp_data, vol_data = numpy.loadtxt('/Users/kclark/Desktop/Teaching/phys224/weather_data/ideal_gas_law.txt', unpack=True)

#add an uncertainty to each measurement point
uncertainty = numpy.empty(len(vol_data))
uncertainty.fill(20.)

#do the fit
fit_parameters, fit_covariance = scipy.optimize.curve_fit(linearFit, temp_data, vol_data, p0=(1.0,8.0), sigma=uncertainty)

#now generate the line of the best fit
#set up the temperature points for the full array
fit_temp = numpy.arange(270,355,5)
#make the data for the best fit values
fit_answer = linearFit(fit_temp,*fit_parameters)
#calculate the residuals
fit_resid = vol_data-linearFit(temp_data,*fit_parameters)
#make a line at zero
zero_line = numpy.zeros(len(vol_data))

How do the Residuals Look?

• The residuals are obviously a large component of the χ2 value used by the minimizer

• They can be plotted to look for trends and see if the fit function is appropriate


Interpreting the Covariance

• Elements in the covariance matrix represent the relationship between the two variables
• The diagonals are the squares of the standard deviations
• We will use this in our interpretation of the answer

Covariance Matrix Elements

• Diagonal elements are the square of the standard deviation for that parameter

• The non-diagonal elements show the relationship between the parameters


fit parameters = [0.21617647 8.33058824]

fit covariance = [[ 2.16490542e+04 -6.89053501e+01]
                  [-6.89053501e+01  2.20507375e-01]]

\mathrm{cov}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})
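As a quick numerical check of this definition (made-up numbers), the same 1/N normalization is available from numpy.cov with bias=True:

```python
import numpy

x = numpy.array([1., 2., 3., 4.])
y = numpy.array([2., 4., 5., 8.])

# Direct implementation of cov(x, y) = (1/N) * sum((x_i - xbar)*(y_i - ybar))
by_hand = numpy.mean((x - x.mean()) * (y - y.mean()))

# numpy.cov with bias=True uses the same 1/N normalization;
# the off-diagonal [0, 1] element is cov(x, y)
by_numpy = numpy.cov(x, y, bias=True)[0, 1]

print(by_hand, by_numpy)  # the two values match
```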

Fit Results


import numpy
import scipy.optimize
from matplotlib import pyplot

#define the function to be used in the fitting, which is linear in this case
def linearFit(x,*p):
    return p[0]+p[1]*x

#read in the data (currently only located on my hard drive...)
temp_data, vol_data = numpy.loadtxt('/Users/kclark/Desktop/Teaching/phys224/weather_data/ideal_gas_law.txt', unpack=True)

#add an uncertainty to each measurement point
uncertainty = numpy.empty(len(vol_data))
uncertainty.fill(20.)

#do the fit
fit_parameters, fit_covariance = scipy.optimize.curve_fit(linearFit, temp_data, vol_data, p0=(1.0,8.0), sigma=uncertainty)

#determine the standard deviations for each parameter
sigma0 = numpy.sqrt(fit_covariance[0,0])
sigma1 = numpy.sqrt(fit_covariance[1,1])

Fit Results

• Calculate the standard deviation on the slope (p[1])

• This is the square root of the [1,1] entry of the covariance matrix


fit parameters = [0.21617647 8.33058824]

fit covariance = [[ 2.16490542e+04 -6.89053501e+01]
                  [-6.89053501e+01  2.20507375e-01]]

Fit Results

• Show the p[1] parameter with the standard deviation:

p1 = 8.33 ± 0.470


fit parameters = [0.21617647 8.33058824]

fit covariance = [[ 2.16490542e+04 -6.89053501e+01]
                  [-6.89053501e+01  2.20507375e-01]]

Comparison to Accepted Values

• We obtained the result p[1] = 8.33 ± 0.47
• We assume that there is 1 mole in a 1 m³ volume, so that n = V = 1
• The accepted value (currently) is 8.3144621 ± 0.0000075
• The accepted value IS contained within our uncertainty (our one-sigma range is from 7.86 to 8.80)
• These values agree “within their error”
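The comparison can be made quantitative by expressing the separation between the two values in units of our uncertainty (numbers taken from the fit above):

```python
# Our fit result for the slope and its standard deviation
fit_value = 8.33
fit_sigma = 0.47

# Accepted value of the gas constant R, J/(mol K)
accepted = 8.3144621

# Separation in units of our (much larger) uncertainty
n_sigma = abs(fit_value - accepted) / fit_sigma
print(n_sigma)  # well below 1, so the values agree within error
```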

Application to Non-linear Examples

• This method can also be applied to other examples
• Powers: y = b*√x can be linearized as y² = b²*x
• Polynomials: y = a + b*x + c*x² + d*x³ is just a case of using multiple regression, since the equation is linear in the coefficients
• Exponentials: y = a*e^(b*x) can be linearized as ln(y) = ln(a) + b*x
• There are many other examples
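As a sketch of the exponential case (entirely made-up, noiseless data, with numpy.polyfit standing in for the linear fit), fitting ln(y) against x returns b as the slope and ln(a) as the intercept:

```python
import numpy

# Made-up noiseless data from y = a*exp(b*x) with a = 2.0, b = 0.5
a_true, b_true = 2.0, 0.5
x = numpy.linspace(0., 4., 20)
y = a_true * numpy.exp(b_true * x)

# Linearize: ln(y) = ln(a) + b*x, then do an ordinary linear fit
slope, intercept = numpy.polyfit(x, numpy.log(y), 1)

b_fit = slope                 # recovers b
a_fit = numpy.exp(intercept)  # recovers a
print(a_fit, b_fit)
```

Note that linearizing also transforms the measurement uncertainties, so for real data the sigma passed to the fit should be adjusted accordingly.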

Return to Chi-Squared

• Here the definition of the residual has changed
• Instead of yi − a − b*xi, a more general term has been used
• yi is still the data
• y(xi) is the fit function evaluated at xi

\chi^2 = \sum_{i=1}^{N} \frac{(y_i - y(x_i))^2}{\sigma_y^2}

Gauss’ Distribution

• The probability is described by

P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \bar{x})^2}{2\sigma^2} \right]

• where the average (mean) value is \bar{x} and the spread in values is σ

Gauss’ Distribution

• We use the probabilities shown above to determine how probable a value is in this distribution

• When we take a measurement, we expect that 68.2% of the time it will be within 1σ from the mean value

• Another way of phrasing this is that we expect a value to be more than 3σ above the mean value only 0.1% of the time

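These percentages can be checked directly with scipy.stats.norm; the slide's 68.2% and 0.1% are rounded values of 68.27% and about 0.13%:

```python
from scipy.stats import norm

# Probability of a measurement landing within 1 sigma of the mean
within_1sigma = norm.cdf(1) - norm.cdf(-1)

# Probability of a measurement more than 3 sigma ABOVE the mean
# (one-sided tail, via the survival function sf = 1 - cdf)
above_3sigma = norm.sf(3)

print(within_1sigma)  # ~0.6827
print(above_3sigma)   # ~0.00135
```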

Another example


Fitting the Gaussian


import numpy
import scipy.optimize
import matplotlib.pyplot as pyplot
import pylab as py

#define the function to be used in the fitting, which is a Gaussian in this case
def gaussFit(x,*p):
    return p[0]+p[1]*numpy.exp(-1*(x-p[2])**2/(2*p[3]**2))

#read in the data (currently only located on my hard drive...)
day_num, rain_data = numpy.loadtxt('/Users/kclark/Desktop/Teaching/phys224/weather_data/precip_2013.txt', unpack=True)

#get some (pretty good) guesses for the fitting parameters
data_mean = rain_data.mean()
data_std = rain_data.std()

#set up the histogram so that it can be fit
data_plot = py.hist(rain_data, range=(0.1,90), bins=100)
histx = [0.5 * (data_plot[1][i] + data_plot[1][i + 1]) for i in range(100)]
histy = data_plot[0]

#actually do the fitting
fit_parameters, fit_covariance = scipy.optimize.curve_fit(gaussFit, histx, histy, p0=(5.0,10.0,data_mean,data_std))

Another example

Fit mean: 7.06 mm
Fit standard deviation: 10.13 mm

[Figure: histogram of the daily rainfall data with the fitted Gaussian; callouts mark the mean and the standard deviation.]

Rainfall of 85.5 mm is 7.74 standard deviations above the mean (from this data), which is extremely unlikely.

Chi-Squared and Goodness of Fit

• This can then be used as a “goodness of fit” test
• If the function is a good approximation, then each residual will typically be within one standard deviation, so this sum will be approximately N

\chi^2 = \sum_{i=1}^{N} \frac{(y_i - y(x_i))^2}{\sigma_y^2}

Chi-Squared

• We normally use the number of degrees of freedom (DOF) of the experiment to determine the fit quality
• The number of DOF is the number of data points in the sample minus the number of parameters in the fit
• For a sample with 20 data points and a linear fit (2 parameters), DOF = 18
• This is used as the goodness of fit, since χ²/DOF ≅ 1 for a good fit

\chi^2 = \sum_{i=1}^{N} \frac{(y_i - y(x_i))^2}{\sigma_y^2}

Revisit the First Example


import numpy
import scipy.optimize
from matplotlib import pyplot

#define the function to be used in the fitting, which is linear in this case
def linearFit(x,*p):
    return p[0]+p[1]*x

#read in the data (currently only located on my hard drive...)
temp_data, vol_data = numpy.loadtxt('/Users/kclark/Desktop/Teaching/phys224/weather_data/ideal_gas_law.txt', unpack=True)

#add an uncertainty to each measurement point
uncertainty = numpy.empty(len(vol_data))
uncertainty.fill(20.)

#do the fit
fit_parameters, fit_covariance = scipy.optimize.curve_fit(linearFit, temp_data, vol_data, p0=(1.0,8.0), sigma=uncertainty)

#calculate the chi-squared value
chisq = sum(((vol_data-linearFit(temp_data,*fit_parameters))/uncertainty)**2)
print(chisq)

#calculate the number of degrees of freedom
dof = len(temp_data)-len(fit_parameters)
print(dof)

Revisit the First Example

• Is this a good fit?

\chi^2 = \sum_{i=1}^{16} \left( \frac{\text{presData}_i - \text{fit}_i}{\text{uncertainty}} \right)^2 = 65.6

• Divide this by the DOF
• We have 16 data points and 2 parameters:

\frac{\chi^2}{\text{DOF}} = \frac{65.6}{16 - 2} = 4.68

• This may not be a great fit...

Goodness of Fit

• The previous statements are only mostly true. More accurately:
• χ²/DOF ≫ 1 is a very poor fit, maybe even a fit model which doesn’t match
• χ²/DOF > 1 is not a good fit, or the uncertainty is underestimated
• χ²/DOF ≪ 1 means the uncertainty could be overestimated
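These rules of thumb can be backed by a probability: scipy.stats.chi2 gives the chance of obtaining a χ² at least this large if the model and uncertainties were correct (a sketch using the numbers from the example above):

```python
from scipy.stats import chi2

# Values from the example fit above
chisq = 65.6
dof = 14

# Survival function: probability of a chi-squared this large or larger,
# assuming the model is correct and the uncertainties are right
p_value = chi2.sf(chisq, dof)
print(p_value)  # a very small probability, consistent with a poor fit
```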

Summary

• You should now be well prepared to use Python to fit data

• Your practice with this starts with the next pendulum exercise, which you can begin now!