Python Analysis - U of T Physicsphy225h/python... · 2014-09-24 · Fitting Experimental Data The...

Python Analysis PHYS 224 September 25/26, 2014


Python Analysis

PHYS 224, September 25/26, 2014


Fitting Experimental Data

The goal of the lab experiments is to determine a physical quantity y (the dependent variable) as a function of x (the independent variable).

How?

• Measure the pair (xi, yi) a number (N) of times
• Find a fit function y = y(x) that describes the relationship between these two quantities


The Linear Case

• The simplest function relating the two variables is the linear function
f(x) = y = a*x + b
• This is valid for any (xi, yi) combination
• If a and b are known, the true value of yi can be calculated for any xi:
yi,true = a*xi + b


Linear Regression

• Linear regression calculates the most probable values of a and b such that the linear equation is valid:
yi,true = a*xi + b

• When taking measurements of yi, these usually obey Gauss’ distribution


An Example

• Ideal Gas Law: P*V = n*R*T
• Pressure * Volume = n * R * Temperature
• P = [(n*R)/V]*T, so the pressure is linear in the temperature with slope (n*R)/V


Fitting in Python

• We’re going to use the curve_fit function, which is part of the scipy.optimize package

• The usage is as follows:
fit_parameters, fit_covariance = scipy.optimize.curve_fit(fit_function, x_data, y_data, p0=guess, sigma=sigma)
(guess and sigma are passed by keyword, since the positional order in curve_fit is p0 before sigma)

fit_parameters - an array of the output fit parameters
fit_covariance - an array holding the covariance matrix of the output fit parameters
fit_function - the function used to do the fit
x_data, y_data - the measured data
sigma - the uncertainty associated with the data
guess - the initial guess input to the fit
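The call pattern can be tried without any data file. The following is a minimal sketch on made-up data: the name linear_model, the true line y = 2 + 3x, the noise level and the starting guess are all assumptions chosen for illustration.

```python
import numpy
import scipy.optimize

# straight-line model: p[0] is the intercept, p[1] is the slope
def linear_model(x, *p):
    return p[0] + p[1] * x

# made-up data scattered around y = 2 + 3x (not lab data)
rng = numpy.random.RandomState(0)
x_data = numpy.linspace(0.0, 10.0, 11)
sigma = numpy.full(len(x_data), 0.5)          # same uncertainty on every point
y_data = 2.0 + 3.0 * x_data + rng.normal(0.0, 0.5, len(x_data))

# p0 is required here: with a *p signature curve_fit cannot count the parameters
guess = (1.0, 1.0)

fit_parameters, fit_covariance = scipy.optimize.curve_fit(
    linear_model, x_data, y_data, p0=guess, sigma=sigma)

print(fit_parameters)      # intercept and slope, close to 2 and 3
print(fit_covariance)      # 2x2 covariance matrix of the two parameters
```

Passing p0 and sigma by keyword keeps the call readable and avoids depending on their positional order.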


Fitting with curve_fit

import numpy
import scipy.optimize
from matplotlib import pyplot

#define the function to be used in the fitting
def linearFit(x,*p):
    return p[0]+p[1]*x

#read in the data (currently only located on my hard drive...)
temp_data, vol_data = numpy.loadtxt('ideal_gas_law.txt',unpack=True)

#add an uncertainty to each measurement point
uncertainty = numpy.empty(len(vol_data))
uncertainty.fill(20.)

#do the fit
fit_parameters,fit_covariance = scipy.optimize.curve_fit(linearFit, temp_data, vol_data, p0=(1.0,8.0),sigma=uncertainty)

The arguments in the curve_fit call are:

• Function: linearFit
• X data: temp_data
• Y data: vol_data
• Initial guess for parameters: p0=(1.0,8.0)
• Uncertainty on data: sigma=uncertainty


Results

fit parameters = [0.21617647 8.33058824]

fit covariance = [[ 2.16490542e+04 -6.89053501e+01]
 [-6.89053501e+01  2.20507375e-01]]

So what does this mean?

We set up the function for the fit to be:

y = p[0] + p[1]*x

So with the fit parameters, the function is:

y = 0.216 + 8.33*x


Full Probability

• For a set of N measurements of the dependent variable y:
y1, y2, y3, … yN

The probability of obtaining these values is the product of the individual probabilities:

\[
P_{a,b}(y_1, y_2, y_3, \ldots, y_N) = P_{a,b}(y_1)\,P_{a,b}(y_2)\,P_{a,b}(y_3)\cdots P_{a,b}(y_N)
\propto \frac{1}{\sigma_y^N}\,\exp\!\left(-\frac{1}{2}\sum_{i=1}^{N}\frac{(y_i - a - b x_i)^2}{\sigma_y^2}\right)
\]

The sum in the exponent is called the chi-squared (χ2)
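Taking the logarithm of this probability shows why fitting comes down to minimizing χ2. This is a standard step, sketched here under the assumption that every measurement has the same uncertainty σy:

\[
\ln P_{a,b}(y_1, \ldots, y_N) = \text{const} - N \ln \sigma_y - \frac{1}{2}\sum_{i=1}^{N}\frac{(y_i - a - b x_i)^2}{\sigma_y^2}
\]

Since σy does not depend on a or b, the most probable a and b are the ones that make the sum as small as possible.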


Chi-Squared

\[
\chi^2 = \sum_{i=1}^{N} \frac{(y_i - a - b x_i)^2}{\sigma_y^2}
\]

• The term (yi - a - b*xi) is the definition of the residual, i.e. the measured data (yi) minus the fit value (a + b*xi)
• Dividing this by the standard deviation (σy) tells us how many standard deviations the data point is away from the fit at that x
• The square ensures this is always positive
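As a minimal sketch of this sum, the following computes χ2 for a handful of made-up points. The function name chi_squared and all the numbers are assumptions chosen for illustration, not values from the lab.

```python
import numpy

def chi_squared(y_data, y_fit, sigma_y):
    """Sum of squared residuals, each measured in units of sigma_y."""
    residuals = numpy.asarray(y_data) - numpy.asarray(y_fit)
    return numpy.sum((residuals / sigma_y) ** 2)

# made-up numbers: a line a + b*x with a = 1, b = 2
a, b = 1.0, 2.0
x = numpy.array([0.0, 1.0, 2.0, 3.0])
y = numpy.array([1.5, 2.5, 5.5, 7.0])   # "measured" values
sigma_y = 0.5

# three points sit exactly 1 sigma from the line and one sits on it
chisq = chi_squared(y, a + b * x, sigma_y)
print(chisq)   # 3.0
```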


Plotting the Residuals

#read in the data (currently only located on my hard drive...)
temp_data,vol_data = numpy.loadtxt('/Users/kclark/Desktop/Teaching/phys224/weather_data/ideal_gas_law.txt',unpack=True)

#add an uncertainty to each measurement point
uncertainty = numpy.empty(len(vol_data))
uncertainty.fill(20.)

#do the fit
fit_parameters,fit_covariance = scipy.optimize.curve_fit(linearFit,temp_data,vol_data,p0=(1.0,8.0),sigma=uncertainty)

#now generate the line of the best fit
#set up the temperature points for the full array
fit_temp = numpy.arange(270,355,5)
#make the data for the best fit values
fit_answer = linearFit(fit_temp,*fit_parameters)
#calculate the residuals
fit_resid = vol_data-linearFit(temp_data,*fit_parameters)
#make a line at zero
zero_line = numpy.zeros(len(vol_data))


How do the Residuals Look?

• The residuals are the main ingredient of the χ2 value used by the minimizer

• They can be plotted to look for trends and see if the fit function is appropriate
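To illustrate what a trend in the residuals looks like, here is a small sketch that fits a straight line to made-up data that actually follow a parabola. The data, the use of numpy.polyfit for the fit, and the helper name plot_residuals are all assumptions for illustration.

```python
import numpy

# made-up data that actually follow a parabola, y = x**2
x = numpy.arange(0.0, 6.0)
y = x ** 2

# fit a straight line anyway (numpy.polyfit returns the slope first)
slope, intercept = numpy.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# the signs run +, -, -, -, -, + : a trend rather than random scatter
print(numpy.sign(residuals))

def plot_residuals(x, residuals):
    # matplotlib is imported here so the numbers above can be checked without it
    from matplotlib import pyplot
    pyplot.plot(x, residuals, 'o')
    pyplot.axhline(0.0)
    pyplot.xlabel('x')
    pyplot.ylabel('data - fit')
    pyplot.show()
```

Calling plot_residuals(x, residuals) shows the points curving away from the zero line at both ends, which suggests the straight line is the wrong fit function for this data.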


Interpreting the Covariance

• Elements in the covariance matrix represent the relationship between the two fit parameters
• The diagonals are the squares of the standard deviations
  • we will use this in our interpretation of the answer


Covariance Matrix Elements

• Diagonal elements are the square of the standard deviation for that parameter

• The non-diagonal elements show the relationship between the parameters

fit parameters = [0.21617647 8.33058824]

fit covariance = [[ 2.16490542e+04 -6.89053501e+01]
 [-6.89053501e+01  2.20507375e-01]]

\[
\mathrm{cov}(x, y) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})
\]
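One common way to read the non-diagonal element is to convert it into a correlation coefficient by dividing by both standard deviations. The sketch below does this for the covariance matrix printed above. The correlation coefficient is an extra step not taken in these slides, and the variable names are assumptions.

```python
import numpy

# covariance matrix copied from the fit output
fit_covariance = numpy.array([[2.16490542e+04, -6.89053501e+01],
                              [-6.89053501e+01, 2.20507375e-01]])

# standard deviations of the parameters: square roots of the diagonal
sigmas = numpy.sqrt(numpy.diag(fit_covariance))

# correlation coefficient: off-diagonal element divided by both sigmas
correlation = fit_covariance[0, 1] / (sigmas[0] * sigmas[1])

print(sigmas)        # uncertainty on p[0] and on p[1]
print(correlation)   # close to -1: intercept and slope are strongly anticorrelated
```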


Fit Results

import numpy
import scipy.optimize
from matplotlib import pyplot

#define the function to be used in the fitting, which is linear in this case
def linearFit(x,*p):
    return p[0]+p[1]*x

#read in the data (currently only located on my hard drive...)
temp_data,vol_data = numpy.loadtxt('/Users/kclark/Desktop/Teaching/phys224/weather_data/ideal_gas_law.txt',unpack=True)

#add an uncertainty to each measurement point
uncertainty = numpy.empty(len(vol_data))
uncertainty.fill(20.)

#do the fit
fit_parameters,fit_covariance = scipy.optimize.curve_fit(linearFit,temp_data,vol_data,p0=(1.0,8.0),sigma=uncertainty)

#determine the standard deviations for each parameter
sigma0 = numpy.sqrt(fit_covariance[0,0])
sigma1 = numpy.sqrt(fit_covariance[1,1])


Fit Results

• Calculate the standard deviation on the slope (p[1])

• This is the square root of the [1,1] entry of the covariance matrix

fit parameters = [0.21617647 8.33058824]

fit covariance = [[ 2.16490542e+04 -6.89053501e+01]
 [-6.89053501e+01  2.20507375e-01]]


Fit Results

• Show the p[1] parameter with the standard deviation:

p1 = 8.33 ± 0.47

fit parameters = [0.21617647 8.33058824]

fit covariance = [[ 2.16490542e+04 -6.89053501e+01]
 [-6.89053501e+01  2.20507375e-01]]


Comparison to Accepted Values

• We obtained the result p[1] = 8.33±0.47

• We assume that there is 1 mole in a 1 m3 volume so that n=V=1, which makes the slope (n*R)/V equal to R

• The accepted value of R (currently) is 8.3144621±0.0000075 J/(mol·K)

• The accepted value IS contained within our uncertainty (our one sigma range is from 7.86 to 8.80)

• These values agree “within their error”
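The comparison can be written as a short calculation: how many standard deviations separate the fitted slope from the accepted value? This is a sketch using the numbers quoted in these slides. The variable names are assumptions, and the uncertainty on the accepted value is neglected because it is tiny compared to ours.

```python
import math

slope = 8.33058824            # p[1] from the fit output
variance = 2.20507375e-01     # [1,1] entry of the covariance matrix
accepted = 8.3144621          # accepted value quoted in the slides

sigma = math.sqrt(variance)
n_sigma = abs(slope - accepted) / sigma

print(sigma)            # about 0.47
print(n_sigma)          # much less than 1
print(n_sigma < 1.0)    # True: the values agree within one sigma
```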


Application to Non-linear Examples

• This method can also be applied to other examples
• Powers: y = b*√x
  • Can be linearized as y^2 = b^2*x
• Polynomials: y = a + b*x + c*x^2 + d*x^3
  • This is just a case of using multiple regression since the equation is linear in the coefficients
• Exponentials: y = a*e^(b*x)
  • Can be linearized as ln(y) = ln(a) + b*x
• There are many other examples
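As a sketch of the exponential case, the following fits a straight line to ln(y) against x and then recovers a and b. The data are made up and noise-free, and numpy.polyfit is used here only as a convenient straight-line fitter; neither comes from the slides.

```python
import numpy

# made-up, noise-free data following y = a*exp(b*x) with a = 2.0, b = 0.5
x = numpy.linspace(0.0, 4.0, 9)
y = 2.0 * numpy.exp(0.5 * x)

# fit a straight line to (x, ln y); numpy.polyfit returns the slope first
b_fit, ln_a_fit = numpy.polyfit(x, numpy.log(y), 1)
a_fit = numpy.exp(ln_a_fit)

print(a_fit, b_fit)   # recovers a = 2.0 and b = 0.5
```

With real data the uncertainty on ln(y) is σy/y rather than σy, which this sketch ignores.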


Return to Chi-Squared

\[
\chi^2 = \sum_{i=1}^{N} \frac{(y_i - y(x_i))^2}{\sigma_y^2}
\]

• Here the definition of the residual has changed
• Instead of yi - a - b*xi a more general term has been used
  • yi is still the data
  • y(xi) is the fit function evaluated at xi


Gauss’ Distribution

• The probability is described by

\[
P(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \bar{x})^2}{2\sigma^2}}
\]

• where the average (mean) value is x̄ and the spread in values is σ


Gauss’ Distribution

• We use the probabilities shown above to determine how probable a value is in this distribution

• When we take a measurement, we expect that about 68.3% of the time it will be within 1σ of the mean value

• Similarly, we expect a value to be more than 3σ above the mean value only about 0.1% of the time
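These percentages follow from integrating the Gaussian, which can be done with the error function from the standard library. The helper names prob_within and prob_above below are assumptions for illustration.

```python
import math

def prob_within(n_sigma):
    """Probability that a Gaussian value lies within n_sigma of the mean."""
    return math.erf(n_sigma / math.sqrt(2.0))

def prob_above(n_sigma):
    """Probability that a Gaussian value lies more than n_sigma above the mean."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

print(prob_within(1.0))   # about 0.683
print(prob_above(3.0))    # about 0.0013
```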


Another example


Fitting the Gaussian

import numpy
import scipy.optimize
import matplotlib.pyplot as pyplot
import pylab as py

#define the function to be used in the fitting, which is a Gaussian plus a constant offset in this case
def gaussFit(x,*p):
    return p[0]+p[1]*numpy.exp(-1*(x-p[2])**2/(2*p[3]**2))

#read in the data (currently only located on my hard drive...)
day_num,rain_data = numpy.loadtxt('/Users/kclark/Desktop/Teaching/phys224/weather_data/precip_2013.txt', unpack=True)

#get some (pretty good) guesses for the fitting parameters
data_mean = rain_data.mean()
data_std = rain_data.std()

#set up the histogram so that it can be fit
data_plot = py.hist(rain_data,range=(0.1,90),bins=100)
#use the centre of each bin as the x value
histx = [0.5 * (data_plot[1][i] + data_plot[1][i + 1]) for i in range(100)]
histy = data_plot[0]

#actually do the fitting
fit_parameters,fit_covariance = scipy.optimize.curve_fit(gaussFit,histx,histy,p0=(5.0,10.0,data_mean,data_std))
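Since the rainfall file is not available here, the same kind of fit can be tried on made-up numbers. This is a sketch only: the "histogram" below is generated from the model itself with assumed parameters (offset 1, height 10, mean 7, width 3), and the name gauss_model is hypothetical.

```python
import numpy
import scipy.optimize

# same form as gaussFit: constant offset plus a Gaussian peak
def gauss_model(x, *p):
    return p[0] + p[1] * numpy.exp(-(x - p[2]) ** 2 / (2 * p[3] ** 2))

# made-up, noise-free "histogram": offset 1, height 10, mean 7, width 3
true_params = (1.0, 10.0, 7.0, 3.0)
bin_centres = numpy.linspace(0.0, 20.0, 41)
counts = gauss_model(bin_centres, *true_params)

# start from a deliberately imperfect guess
fit_parameters, fit_covariance = scipy.optimize.curve_fit(
    gauss_model, bin_centres, counts, p0=(0.5, 8.0, 6.0, 2.0))

fit_mean = fit_parameters[2]
fit_width = abs(fit_parameters[3])   # the sign of p[3] does not affect the model

print(fit_mean, fit_width)
```

The mean and width are read from p[2] and p[3], in the same order as the initial guess.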


Another example

Fit mean: 7.06 mm
Fit standard deviation: 10.13 mm

Rainfall of 85.5 mm is 7.74 standard deviations above the mean (from this data), which is extremely unlikely

[Figure: the fit, with the mean and standard deviation marked]
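The 7.74 figure is just the distance from the fit mean in units of the fit standard deviation. The sketch below repeats that arithmetic and, taking the fitted Gaussian at face value, estimates how unlikely such a value would be. The variable names are assumptions.

```python
import math

fit_mean = 7.06     # mm, from the fit
fit_std = 10.13     # mm, from the fit
rainfall = 85.5     # mm, the day in question

n_sigma = (rainfall - fit_mean) / fit_std

# one-sided Gaussian tail probability for a value this far above the mean
tail_probability = 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

print(round(n_sigma, 2))    # 7.74
print(tail_probability)     # a very small number
```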


Chi-Squared and Goodness of Fit

\[
\chi^2 = \sum_{i=1}^{N} \frac{(y_i - y(x_i))^2}{\sigma_y^2}
\]

• The χ2 value can then be used as a “goodness of fit” test

• If the function is a good approximation, then each residual will typically be about one standard deviation, so each term is about 1 and the sum is approximately N


Chi-Squared

• We normally use the number of degrees of freedom of the experiment to determine the fit quality

• The number of DOF is the number of data points in the sample minus the number of parameters in the fit

• For a sample with 20 data points and a linear fit (2 parameters), DOF = 18

• This is used as the goodness of fit since χ2/DOF≅1 for a good fit

\[
\chi^2 = \sum_{i=1}^{N} \frac{(y_i - y(x_i))^2}{\sigma_y^2}
\]
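To see χ2/DOF come out near 1, the following sketch fits a straight line to made-up data whose scatter matches the quoted uncertainty. The data, the random seed and the name linear_model are assumptions for illustration.

```python
import numpy
import scipy.optimize

def linear_model(x, *p):
    return p[0] + p[1] * x

# made-up data: 20 points scattered around y = 1 + 2x with sigma = 0.5
rng = numpy.random.RandomState(1)
x = numpy.linspace(0.0, 10.0, 20)
sigma = numpy.full(len(x), 0.5)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, len(x))

params, cov = scipy.optimize.curve_fit(linear_model, x, y, p0=(0.0, 1.0), sigma=sigma)

chisq = numpy.sum(((y - linear_model(x, *params)) / sigma) ** 2)
dof = len(x) - len(params)      # 20 points - 2 parameters = 18

print(chisq / dof)              # of order 1 when model and uncertainties are right
```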


Revisit the First Example

import numpy
import scipy.optimize
from matplotlib import pyplot

#define the function to be used in the fitting, which is linear in this case
def linearFit(x,*p):
    return p[0]+p[1]*x

#read in the data (currently only located on my hard drive...)
temp_data,vol_data = numpy.loadtxt('/Users/kclark/Desktop/Teaching/phys224/weather_data/ideal_gas_law.txt',unpack=True)

#add an uncertainty to each measurement point
uncertainty = numpy.empty(len(vol_data))
uncertainty.fill(20.)

#do the fit
fit_parameters,fit_covariance = scipy.optimize.curve_fit(linearFit,temp_data,vol_data,p0=(1.0,8.0),sigma=uncertainty)

#calculate the chi-squared value
chisq = sum(((vol_data-linearFit(temp_data,*fit_parameters))/uncertainty)**2)
print(chisq)

#calculate the number of degrees of freedom
dof = len(temp_data)-len(fit_parameters)
print(dof)


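Revisit the First Example

• Is this a good fit?

\[
\chi^2 = \sum_{i=1}^{16} \left(\frac{\mathrm{presData}_i - \mathrm{fit}_i}{\mathrm{uncertainty}}\right)^2 = 65.6
\]

• Divide this by the DOF
• We have 16 data points, 2 parameters

\[
\frac{\chi^2}{\mathrm{DOF}} = \frac{65.6}{16 - 2} \approx 4.68
\]

• This may not be a great fit...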
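One optional way to put a number on "not a great fit" is the probability of getting a χ2 at least this large by chance. This check is not part of the slides; the sketch below uses the rounded χ2 quoted above and assumes the model and the uncertainties are both correct.

```python
from scipy import stats

chisq = 65.6        # rounded value quoted in the slides
dof = 16 - 2        # 16 data points, 2 fit parameters

reduced_chisq = chisq / dof

# probability of a chi-squared at least this large by chance alone
p_value = stats.chi2.sf(chisq, dof)

print(reduced_chisq)
print(p_value)      # very small, so chance alone is an unlikely explanation
```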


Goodness of Fit

• Previous statements only mostly true
• More accurately:
  • χ2/DOF >> 1 is a very poor fit, maybe even a fit model which doesn’t match
  • χ2/DOF > 1 is not a good fit, or the uncertainty is underestimated
  • χ2/DOF << 1 means the uncertainty could be overestimated


Summary

• You should now be well prepared to use Python to fit the data

• Your practice with this starts with the next pendulum exercise, which you can begin now!
