Lecture Notes for Phys 114 "Introduction to data reduction"


Vitaly A. Shneidman

Department of Physics, New Jersey Institute of Technology

(Dated: April 13, 2014)

Abstract

These lecture notes will contain some theoretical material related to Bevington, 3rd ed. (abbreviated BEV.), which is the main textbook. Notes for all lectures will be kept in a single file and the table of contents will be automatically updated, so that each time you can print out only the updated part.

The Mathematica appendix is included for reference; you do not have to print it out since there will be other files which illustrate how this program works.

Please report any typos to [email protected]


Page 2: Lecture Notes for Phys 114 ”Introduction to data reduction”

notes114.tex

Contents

I. Introduction 2

II. Fundamental limitations of measurements 2

A. Diffraction 2

B. Quantization of light 2

C. deBroglie wavelength 3

D. Thermal noise 3

E. Charge quantization 3

F. Decay of metastable states 3

III. The Mathematica program and review of introductory math 6

A. Error function 6

B. Gamma function and Stirling formula 7

C. Incomplete gamma function 8

IV. Introduction to Mathematica 9

A. Overview 9

B. Main commands 10

C. Lists 11

V. Data and Histograms 12

VI. Errors, Parent and Sample distributions 13

A. Parent and Sample distributions 14

B. Mean and variance 15

C. Alternatives to mean and variance 15

VII. Discrete and continuos distributions 17

1. Mean and variance 17

A. Change of variables in continuos distributions 18

VIII. Binomial distribution 18

A. Normal distribution 19

0

Page 3: Lecture Notes for Phys 114 ”Introduction to data reduction”

IX. Limits of the binomial distribution 20

A. Poisson 20

1. Advanced: sum of two Poisson distributions 21

B. Normal 21

C. Advanced: Central limit theorem 22

D. Physical example: Fluctuations of density in ideal gas 23

X. Other continuous distributions 24

1. Exponential 24

2. Gamma and χ2 25

A. Lorentz (Cauchy) 25

XI. ADVANCED: Expectation and moments, addition for Gauss 26

A. Adding two normal distributions (ND) 26

XII. Distrubution of several variables 27

1. Multivariate Gaussian distribution 27

2. Covariance and correlation 28

3. Statistical analog of covariance and correlation 28

4. Independent random variables 28

XIII. Measurements and propagation of errors 29

A. The coin experiment 30

1. Fair 30

2. Unfair 31

B. Propagation of small errors 31

1. x = f(u) 31

2. x = f(u, v) 32

C. Designing an experiment: Example 33

XIV. Estimators, χ2 and Kolmogorov-Smirnov tests 34

A. Maximum likelihood 34

1. Weighted average 35

B. χ2 35

1

Page 4: Lecture Notes for Phys 114 ”Introduction to data reduction”

1. Where does it come from? 35

C. Comparing two distributions 37

1. χ2-test for data vs theoretical 37

2. χ2-test for data1 vs data2 37

D. Kolmogorov-Smirnov 38

1. Kolmogorov-Smirnov test 39

XV. Monte Carlo integration 41

A. Buffon’s needle 41

B. General MC and example 41

XVI. Generation of random number for different distributions 43

A. exponential distribution 43

B. Poisson 43

C. Gauss 44

XVII. LSA 45

A. Fitting of data and geometric LSA 45

B. ”Physical” LSA 45

1. Errors in a and b 47

XVIII. Trigonometric, polynomial and nonlinear fits 48

A. a bit of Mathematica 49

1. Basic elementary commands 49

2. NUMBERS 50

3. SYMBOLIC MATH 51

4. DEFINING YOUR OWN FUNCTIONS 52

5. Graphics (2D) 53

6. Modeling and analysis of data 56

a. Filtering 56

2


Dr. Vitaly A. Shneidman, Phys 114, 1st Lecture

I. INTRODUCTION

Homeworks are an important part of the course; if done by hand they must be clearly written in pen (black or blue), and if done using Mathematica they must be printed out as a clear hard copy (adding neat hand-written corrections on this hard copy is ok). No electronic submissions of Mathematica notebooks will be accepted.

II. FUNDAMENTAL LIMITATIONS OF MEASUREMENTS

A. Diffraction


FIG. 1: Fraunhofer diffraction from a circular aperture with diameter D. The arguments are απD/λ with α ≪ 1 being the angle and λ the wavelength. The angle to see the first minimum is called the angular resolution, θ ≈ 1.21967 λ/D. Note that the intensity is very small beyond the first minimum (right figure), but still can be easily picked up by the eye (left figure). [Also, note: in the optics literature this circular pattern is sometimes called the "Airy disk".]

B. Quantization of light

E = ℏω , ℏ ≈ 10⁻³⁴ J·s (1)


C. deBroglie wavelength

λ_dB = h/p , h = 2πℏ (2)

D. Thermal noise

E ∼ k_B T , k_B ≃ 1.38 · 10⁻²³ J/K (3)

Einstein (1905) – relation between noise intensity and dissipation rate

Nyquist noise:

V² ∼ k_B T · R

E. Charge quantization

Q/e = 0, ±1, ±2, . . . , e = −1.6 · 10⁻¹⁹ C (4)

F. Decay of metastable states

Quantum:


(Figure: potential energy vs. distance for a metastable state; tunneling occurs between the turning points x₁ and x₂.)

P ∼ exp(−2|S|/ℏ)

|S| = ∫_{x₁}^{x₂} |p(x)| dx , p²/2m = E − U(x)


Thermal:

(Figure: thermal activation over a barrier of height W as a function of the coordinate r.)

P ∼ exp(−W∗/k_B T)

HW: Give a crude estimate of the resolution limit (in m) when using

• 1 MHz radio waves

• 10¹⁵ Hz light

• 10¹⁸ Hz X-rays

• 1 keV electrons

• thermal neutrons at about 1000 K

Arrange the results in a neat Table; a minimal sketch of the approach follows.
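A possible Mathematica sketch (assuming, crudely, resolution ∼ wavelength; constants in SI units; the function names here are just for illustration):

c = 3.0*10^8; h = 6.63*10^-34; kB = 1.38*10^-23;
me = 9.11*10^-31; mn = 1.67*10^-27; eV = 1.6*10^-19;
emWave[f_] := c/f                          (* EM waves: lambda = c/f *)
deBroglie[p_] := h/p                       (* matter waves: lambda = h/p *)
TableForm[{
  {"1 MHz radio",       emWave[10^6]},
  {"10^15 Hz light",    emWave[10^15]},
  {"10^18 Hz X-rays",   emWave[10^18]},
  {"1 keV electrons",   deBroglie[Sqrt[2 me 10^3 eV]]},
  {"neutrons, 1000 K",  deBroglie[Sqrt[3 mn kB 1000]]}}]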


Dr. Vitaly A. Shneidman, Phys 114, 2nd Lecture

III. THE MATHEMATICA PROGRAM AND REVIEW OF INTRODUCTORY MATH

Introduction to Mathematica is described in a separate file "IntroToMathematica.doc" (see also Appendix A). The examples refer to numbers and elementary functions, which should not cause any difficulties. Non-elementary functions used in data analysis are described below. See also 114−functions.pdf

HW: reproduce the Mathematica output described in the file "IntroToMathematica.doc", slightly changing the input functions compared to what we discussed in class. Print out your notebook.

A. Error function

erf(x) = (2/√π) ∫_0^x dt exp(−t²) , erf(±∞) = ±1 (5)

erfc(x) = 1 − erf(x) = (2/√π) ∫_x^∞ dt exp(−t²) (6)

erfc(−∞) = 2 , erfc(x → +∞) ∼ (1/(√π x)) exp(−x²) (7)

(Figure: erf(x) and erfc(x) for −3 ≤ x ≤ 3.)
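The figure is easy to reproduce with the built-in functions (a one-line sketch; PlotLegends needs Mathematica 9 or later):

Plot[{Erf[x], Erfc[x]}, {x, -3, 3}, AxesLabel -> {"x", "y"},
 PlotLegends -> {"erf(x)", "erfc(x)"}]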


B. Gamma function and Stirling formula

Γ(n + 1) = nΓ(n) = ∫_0^∞ dt tⁿ e^(−t) ≡ n! (8)

n! ≃ √(2πn) (n/e)ⁿ (9)

Γ(1) = Γ(2) = 1 , Γ(0) = ∞ , Γ(1/2) = √π

(Figure: log Γ(1 + x) compared with the logarithm of the Stirling approximation.)

ln n! = ln 1 + ln 2 + . . . + ln n ≃ ∫ ln n dn = n ln n − n = n ln(n/e)
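A quick numerical check of the Stirling formula (a sketch; the ratio tends to 1):

stirling[n_] := Sqrt[2 Pi n] (n/E)^n
Table[{n, N[n!/stirling[n]]}, {n, {5, 10, 50}}]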


C. Incomplete gamma function

Γ(x + 1, m) = ∫_m^∞ dt tˣ e^(−t) , Γ(x, 0) ≡ Γ(x) (10)

Let us plot a few g[m](x) = Γ(1 + x,m)/Γ(1 + x):

(Figure: g(1), g(2), g(3), g(4) as functions of x.)
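In Mathematica the incomplete gamma function is Gamma[a, z]; a minimal sketch reproducing the plot:

g[m_][x_] := Gamma[1 + x, m]/Gamma[1 + x]
Plot[Evaluate[Table[g[m][x], {m, 1, 4}]], {x, 0, 4},
 AxesLabel -> {"x", "y"}]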


Dr. Vitaly A. Shneidman, Phys114

IV. INTRODUCTION TO MATHEMATICA

A. Overview

TOPICS for several lectures (0.-III. in IntroToMathematica.doc; also see Appendix A):

0. GETTING STARTED

entering a command

help

useful type- and space-saving commands

saving files

brackets

equality signs

I. NUMBERS:

exact

approximate

complex (optional)

numbers with dimensions

II. SYMBOLIC MATH

sums and series

integration

algebra

trigonometry

Taylor expansions and limits (optional)

III. GRAPHICS

2D

3D (optional)

IV. FUNCTIONS - see 114−functions.pdf

V. LISTS

see 114−intro2.pdf

VI. RANDOM NUMBERS and HISTOGRAMS - see 114−hist.pdf


reading external data (Excel and txt)

reading images

B. Main commands

To know by heart (with options):

(space=multiplication)

(*...*) - comment

;

%

/. - replacement, with x -> ... or {x -> ..., y -> ...}

. - dot product

{..., ..., ...} - list ("vector")

{{...}, {...}} - nested list (matrix)

=

:=

==

...//f - equiv. to f[...] with f any pure function or operation

N

Integrate

D

Solve

Table , Do , If

...[[i]] - i-th element of list ... (or, row of a matrix if nested list); [[i,k]] - element of a matrix

Transpose , Append , Drop , Select

Plot , ListPlot , Show, Export

Graphics , Point , Line , Text

Timing , RandomReal , RandomInteger

Histogram , Mean , StandardDeviation

FindFit

Import
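A few of these commands in action (a minimal sketch):

v = {1, 2, 3}; m = {{1, 2}, {3, 4}};
v[[2]]                      (* 2nd element of the list: 2 *)
m[[2, 1]]                   (* matrix element: 3 *)
x^2 + 1 /. x -> 3           (* replacement: 10 *)
Select[Range[10], EvenQ]    (* {2, 4, 6, 8, 10} *)
m . {1, 2}                  (* dot product: {5, 11} *)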


Dr. Vitaly A. Shneidman, Phys114

C. Lists

see 114−intro2.pdf/nb

1. How to generate : Table command; nested lists

2. Length and Dimensions; Lists as vectors and matrices

extracting elements of lists - the [[...]]

3. REARRANGING LISTS : Sort, Rotate, etc.

4. Restructuring Lists : Transpose, Flatten, Drop, Select, etc.

5. Combining Lists : Join, Union, Intersection

6. Operating on lists : the Map (or, /@) command

7. Pure functions

8. Graphics
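A short demonstration of Map and pure functions (a sketch):

squares = #^2 & /@ Range[5]        (* Map with a pure function: {1, 4, 9, 16, 25} *)
Transpose[{Range[5], squares}]     (* restructuring: {x, x^2} pairs *)
Sort[%, #1[[2]] > #2[[2]] &]       (* sort the pairs by the second element, descending *)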

HW: Select any simple function f. Construct mylist - a reasonably long list of values of f plus random corrections. Make a good plot which shows f(x) and mylist together. The plot must include your name and labels. Create a high-quality plot outside of Mathematica; print it out.



FIG. 2: A typical histogram after 20 rolls of a single die.

Dr. Vitaly A. Shneidman, Phys114

V. DATA AND HISTOGRAMS

see 114−histogram.nb

Suppose you have a long list of data x_k , 1 ≤ k ≤ N, with

x_min ≤ x_k ≤ x_max

Select a "binSize" ∆x and group the data into n bins with

n ≃ (x_max − x_min)/∆x

Each bin will be distinguished by its index i and content b_i. E.g., an element x_k belongs to bin i if

i = [x_k/∆x]

Here [...] is the "Floor" function. A plot of b_i as a function of x will then represent a "Histogram"; a plot normalized to 1 corresponds to the "Probability" option; b_i/(N · binSize) will approximate the PDF.

HW:

1. write a function which gives the sum (between 3 and 18) for a single random roll of 3 fair dice

2. repeat the experiment 500 times (not more)

3. use the "Commonest" command to find the most frequent outcome ("mode")


FIG. 3: Unscaled histogram (red) and scaled (blue) after 1 million rolls of 2 dice. The blue histogram represents the experimental probabilities (which are already close to theoretical expectations) with a total area of 1.

4. plot histograms similar to those in fig. 3; you do not have to use colors and, in fact, the do-it-yourself alternative based on "myBins" (see notebook) could be better (a possible sketch follows).
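One possible sketch for items 1-3 (not the only way):

roll3[] := Total[RandomInteger[{1, 6}, 3]]   (* sum of 3 fair dice *)
data = Table[roll3[], {500}];
Commonest[data]                              (* the mode *)
Histogram[data, {1}]                         (* bin size 1 *)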

Dr. Vitaly A. Shneidman, Phys 114

VI. ERRORS, PARENT AND SAMPLE DISTRIBUTIONS

• Accuracy - How close the measurements are to the true value (note that we may not always know the true value).

• Precision - How close repeated measurements are to each other. A measure of the spread of data points. One can make measurements that are highly accurate (their mean is close to the true value) even though they may not be very precise (large spread of measurements). Conversely, one can make very precise measurements that are not accurate.

• Errors - Deviations of measurements from the true value. Error here does not mean a blunder! Also referred to as uncertainties.

• Systematic Errors - deviations from the true value that are very reproducible, generally due to some uncorrected effect of an instrument or measurement technique. An example is reading a scale slightly off the vertical, which may systematically give a too-high or too-low reading.

• Statistical, or Random Errors - fluctuations in measurements that result in their being both too high and too low, due to how precisely the measurement can be made, and which are amenable to reduction by doing repeated measurements.

A. Parent and Sample distributions

Suppose we measure a sample with discrete values of x

x₁ , x₂ , . . . , xₙ

in N measurements with n ≤ N. Let all xᵢ be distinct, repeated nᵢ ≥ 1 times. Define the frequency

fᵢ = nᵢ/N , Σᵢ fᵢ = 1

Then the probability

pᵢ = lim_{N→∞} fᵢ (11)

provided the limit exists. The set pᵢ ≥ 0 with

Σᵢⁿ pᵢ = 1 (12)

determines the discrete Parent probability distribution. The n can be finite or not. EXAMPLE: coin (in class).

If x is continuous, break the x-interval into bins

x₀ < x₁ < x₂ < . . . < xₙ

and define fᵢ for each bin with xᵢ ≤ x < xᵢ₊₁ (a "histogram"). The parent distribution now is a continuous probability density function ("PDF") p(x) ≥ 0 with

∫_{−∞}^∞ p(x) dx = 1 , ∫_{xᵢ}^{xᵢ₊₁} p(x) dx = pᵢ (13)

the rest is the same. Note the units:

[pᵢ] = 1 , [p(x)] = 1/[x]


B. Mean and variance

Sample:

N - number of experiments (large); n - number of possible outcomes of a single experiment, can be large or small (e.g. n = 2 for a single coin).

x̄ = (1/N) Σᵢᴺ xᵢ = Σᵢⁿ fᵢ xᵢ (14)

s² = (1/(N − 1)) Σᵢᴺ (xᵢ − x̄)² ≈ ⟨(x − x̄)²⟩ (15)

Equivalently,

s² ≈ Σᵢⁿ fᵢ (xᵢ − x̄)² = ⟨x²⟩ − x̄²

Parent: N → ∞ , n can remain finite (or not).

pᵢ = lim_{N→∞} fᵢ (16)

μ = lim_{N→∞} x̄ = Σᵢⁿ pᵢ xᵢ (17)

σ² = lim_{N→∞} s² = Σᵢⁿ pᵢ (xᵢ − μ)² (18)

for discrete, or for continuous:

μ = ∫_{−∞}^∞ dx x p(x)

σ² = ∫_{−∞}^∞ dx (x − μ)² p(x)

In both cases

σ² = ⟨(x − μ)²⟩ = ⟨x²⟩ − μ² (19)

σ - "standard deviation"; usually "error" ≃ ±σ.

C. Alternatives to mean and variance

"Mode" M:

x = M , p(x) = max

(command "Commonest" in Mathematica).

"Median" m (also, μ_{1/2}):

∫_{−∞}^m dx p(x) = ∫_m^∞ dx p(x) = 1/2

(less sensitive to "outliers" than μ).

"Deviation":

d = lim_{N→∞} (1/N) Σᵢᴺ |xᵢ − μ| = ⟨|x − μ|⟩

(less sensitive to "outliers" than σ).

HW: Consider an "experiment": a coin is dropped twice, with heads = 0 and tails = 1 (i.e. with outcome a number 0, 1 or 2). The experiment is repeated a large number of times, N.

1. find the "parent" values of n, all xᵢ and pᵢ

2. find μ and σ

3. find M, m, d

4. write a Mathematica code to simulate one experiment

5. repeat N = 10,000 times, creating a sample list of data (don't forget the ";" or your screen will be full).

6. find x̄, s for the list, and M, m and d

7. plot a histogram and compare with the parent distribution. (A sketch for items 4-7 follows.)
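A minimal sketch for items 4-7 (one possible approach):

sample = Table[Total[RandomInteger[{0, 1}, 2]], {10^4}];  (* one experiment = 2 coin drops *)
{Mean[sample], StandardDeviation[sample]}
{Commonest[sample], Median[sample]}
Histogram[sample, {1}, "Probability"]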


Dr. Vitaly A. Shneidman, Phys114

VII. DISCRETE AND CONTINUOUS DISTRIBUTIONS

READING: Ch.2 + these notes

• Discrete: assume the elements of the sample space can be numbered by an integer j; xⱼ are their values while pⱼ are the probabilities. Cumulative distribution:

F(x) = Σ_{xⱼ≤x} pⱼ (20)

• Continuous: assume the elements of the sample space can be identified by a continuous variable x with a probability density p(x). The cumulative distribution is given by

F(x) = ∫_{−∞}^x dx′ p(x′) (21)

Primitive examples:

Discrete: "binary"

p₀ = p₁ = 1/2 ; F(x) = 0 for x < 0 , F(x) = 1/2 for 0 ≤ x < 1 , F(x) = 1 for x ≥ 1

Continuous: "uniform"

p(x) = 1/L for 0 ≤ x ≤ L ; F(x) = 0 for x < 0 , F(x) = x/L for 0 ≤ x ≤ L , F(x) = 1 for x > L

1. Mean and variance

We use equivalently a "bar" for averages and ⟨. . .⟩ for longer expressions.

x̄ = Σ pᵢxᵢ → ∫_{−∞}^∞ x p(x) dx (22)

σ² = ⟨(x − x̄)²⟩ = ⟨x²⟩ − ⟨x⟩² (23)


FIG. 4: Binomial probability function and approximation of the results by a gaussian curve for n = 20, p = 1/2 (left, unbiased), n = 20, p = 0.3 (middle, biased) and n = 3, p = 0.6 (right). The approximation becomes exact for n → ∞ ("Limit theorem of de Moivre and Laplace") but in practice is good starting from very modest n.

A. Change of variables in continuous distributions

x → y(x) , F[y(x)] = F(x)

Thus the new probability density P(y) is derived from

P(y) dy = p(x) dx (24)

Primitive example: p(x) = 1 , 0 ≤ x ≤ 1, y = x/L
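Indeed, with y = x/L one has dx = L dy, so P(y) = p(x) dx/dy = L for 0 ≤ y ≤ 1/L, and the normalization ∫ P(y) dy = 1 is preserved.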

VIII. BINOMIAL DISTRIBUTION

see 114−BinPoiGa.nb

Consider a "loaded" coin with unequal probabilities of heads and tails, p and q = 1 − p. Then the probability to get m heads in n flips is

P^bin_m = C_n^m pᵐ q^(n−m) , C_n^m = n!/(m!(n − m)!) (25)

see Fig. 4.

HW: (a) verify the normalization

Σ_{m=0}^n P^bin_m = 1

(b) show that the following is true:

m̄ ≡ Σ_{m=0}^n m P^bin_m = pn (26)

σ² = np(1 − p) (27)
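These sums can be checked symbolically in Mathematica (a sketch; Simplify should reduce them to 1 and n p):

Simplify[Sum[Binomial[n, m] p^m (1 - p)^(n - m), {m, 0, n}]]     (* 1 *)
Simplify[Sum[m Binomial[n, m] p^m (1 - p)^(n - m), {m, 0, n}]]   (* n p *)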



FIG. 5: Normal probability density (red) and cumulative distribution (blue).

HW: use the Stirling formula to approximate the binomial coefficient if n, m and n − m are large

A. Normal distribution

see Fig. 5 and the end of 114−BinPoiGa.nb.

For

Z = (x − x̄)/σ

P(Z) = (1/√(2π)) e^(−Z²/2) (28)

and

F(Z) ≡ ∫_{−∞}^Z P(u) du = (1/2)[1 + erf(Z/√2)] = (1/2) erfc(−Z/√2) (29)

with

erf(z) = (2/√π) ∫_0^z e^(−x²) dx ≡ 1 − erfc(z) (30)

F(1) − F(−1) ≈ 68% (31)

F(2) − F(−2) ≈ 95% (32)

HW: Write the distribution in terms of x; find ⟨x⟩ and ⟨x²⟩


FIG. 6: Poisson distribution P_m (black) for m̄ = 10 and binomial distributions P^bin_m (red) with different n and p = m̄/n. From top to bottom: n = 20, n = 100 and n = 400 (which practically blends with the Poisson curve).

IX. LIMITS OF THE BINOMIAL DISTRIBUTION

A. Poisson

(note: typos in eq. (2.9) in textbook).

Consider n → ∞, p → 0 with fixed m̄ = pn. Then,

P^bin_m ≈ [n!/((n − m)! nᵐ)] (m̄ᵐ/m!) (1 − m̄/n)ⁿ = (m̄ᵐ/m!) e^(−m̄) (33)

which is the Poisson distribution, P_m. See Fig. 6 and 114−BinPoiGa.nb.

HW: (a) verify the normalization

Σ_{m=0}^∞ P_m = 1

(b) verify

m̄ ≡ Σ_{m=0}^∞ m P_m

(c) find

⟨m²⟩ ≡ Σ_{m=0}^∞ m² P_m
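Again, Mathematica can do these sums symbolically (a sketch, with mu standing for m̄):

Sum[mu^m/m! Exp[-mu], {m, 0, Infinity}]       (* 1 *)
Sum[m mu^m/m! Exp[-mu], {m, 0, Infinity}]     (* mu *)
Sum[m^2 mu^m/m! Exp[-mu], {m, 0, Infinity}]   (* mu + mu^2 *)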


1. Advanced: sum of two Poisson distributions

see Fig. 7.

Consider P₁(m) with m̄ = μ₁ and P₂(m) with m̄ = μ₂. Then

P_{1+2}(n) = Σ_{m=0}^n P₁(m) P₂(n − m) = (34)

= Σ_{m=0}^n (μ₁ᵐ/m!) e^(−μ₁) (μ₂^(n−m)/(n − m)!) e^(−μ₂) = (35)

= ((μ₁ + μ₂)ⁿ/n!) e^(−(μ₁+μ₂)) Σ_{m=0}^n C_n^m (μ₁/(μ₁ + μ₂))ᵐ (μ₂/(μ₁ + μ₂))^(n−m) = (36)

= ((μ₁ + μ₂)ⁿ/n!) e^(−(μ₁+μ₂)) (37)

since the last sum evaluates to 1.

FIG. 7: The sum of two Poisson distributions with m̄₁ = 5 (orange) and m̄₂ = 10 (white) results in another Poisson distribution (blue) with m̄ = m̄₁ + m̄₂. ("Experiment" with 10⁶ random numbers in each distribution.)

B. Normal

Alternatively, let m be close to the average n/2 for p = q = 1/2. We will use

x = 2m − n

and further switch to the scaled variable

y = x/√n ∼ 1

(with this the distribution is multiplied by √n to ensure normalization). This leads to the Gaussian. Major steps:

• use the Stirling approximation

n! ≃ √(2πn) (n/e)ⁿ , n ≫ 1 (38)

for both n! and (n − m)!

• replace m by (n + y√n)/2 (and multiply by √n/2 to ensure normalization).

• Take the limit n → ∞:

P^gauss(y) = (1/√(2π)) e^(−y²/2) (39)

See the end of 114−BinPoiGa.nb; also see this notebook for some Advanced topics: Gauss from Poisson for m̄ ≫ 1, the cumulative distribution (CDF) and the Addition Theorem for Poisson, etc.

HW: Homework for Ch.2, pp. 34, 35 (part of Exam 1): 2(d), 3-5, 11-18

C. Advanced: Central limit theorem

Why are normal distributions so typical?

Let

y_n = X₁ + . . . + X_n

with

μ = ⟨Xᵢ⟩ , σ² = ⟨(Xᵢ − μ)²⟩

the same for each term. Then, for n → ∞ the distribution of y_n is asymptotically normal with

μ_y = nμ , σ²_y = nσ² (40)


FIG. 8: Illustration of the central limit theorem. A sum of a large number (100) of independent random variables has a gaussian distribution, while each individual variable can have an arbitrary non-gaussian distribution. In the example the individual random variables were uniformly distributed with an average μ = 1/2 and variance σ² = 1/4. Points (blue) correspond to an experimental histogram after 10000 runs; the line (red) is the normal distribution with average μ₁₀₀ = 100μ − 1/2 and standard deviation σ₁₀₀ = σ√100. (The −1/2 is due to introducing a discrete histogram.)

Note that we do not need a gaussian X, only large n(!)

For a "binary" distribution X = {0, 1} we could see it above (in that case y_n is binomial and is known exactly). For a different X consider a uniform distribution

p(X) = 1/√3 , 1/2 − √3/2 ≤ X ≤ 1/2 + √3/2

HW: find μ and σ²

see Fig. 8.
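A minimal "experiment" illustrating the theorem (a sketch; 100 uniform variables per sum, 10000 runs, so μ_y = 50 and σ_y = 5):

sums = Table[Total[RandomReal[{1/2 - Sqrt[3]/2, 1/2 + Sqrt[3]/2}, 100]], {10^4}];
Show[Histogram[sums, Automatic, "PDF"],
 Plot[PDF[NormalDistribution[50, 5], x], {x, 30, 70}]]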

D. Physical example: Fluctuations of density in ideal gas

see Fig. 9


FIG. 9: 500 non-interacting particles randomly distributed in a large (blue) box. The probability to find exactly m particles in a selected red box is given by a binomial distribution (exactly) with n = 500 and p = 1/50 (and with m̄ = np = 10). Once the number of red boxes is large, this probability becomes Poissonian. If, in addition, the number of molecules in a red box is still large, the probability becomes Gaussian with the same m̄ and with σ² = m̄.

Dr. Vitaly A. Shneidman, Phys114

X. OTHER CONTINUOUS DISTRIBUTIONS

1. Exponential

p(x) = (1/μ) exp(−x/μ) (41)

with

x̄ = μ .

HW: (a) calculate ⟨x²⟩ ; (b) find F(x)


2. Gamma and χ²

f(x) = λ exp(−λx) (λx)^(t−1)/Γ(t) (42)

With λ = 1/2 and t = n/2 (integer n) this is the χ² distribution.

F(x) = . . . (43)

HW: (a) calculate ⟨x⟩ ; (b) find σ² ; (c) find the above F(x) using Mathematica
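A hint for (c), via the built-in distribution (a sketch; here t is the shape parameter and 1/λ the scale):

CDF[GammaDistribution[t, 1/lambda], x]   (* returns a regularized incomplete gamma function *)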

A. Lorentz (Cauchy)

p(x) = (1/π) γ/(γ² + (x − μ)²) (44)

Note: the mean is undefined (error in textbook).


Dr. Vitaly A. Shneidman, Phys114

XI. ADVANCED: EXPECTATION AND MOMENTS, ADDITION FOR GAUSS

E(g) ≡ ⟨g(X)⟩ = Σᵢ g(xᵢ) p(xᵢ) → ∫_{−∞}^∞ g(x) p(x) dx (45)

kth moment:

μ_k = E(X^k) , μ₁ ≡ μ (46)

kth central moment:

M_k = E((X − μ)^k) , M₂ ≡ σ² (47)

HW: Show that μ₀ = M₀ = 1

Dimensionless central moments:

γ_k = M_k/σ^k (48)

with γ₃ - "skewness" and γ₄ - "kurtosis".

HW: show that for the normal distribution γ₃ = 0 and γ₄ = 3 (for which reason γ₄ − 3 is called "excess kurtosis")

A. Adding two normal distributions (ND)

Let p₁(x), p₂(x) be NDs with means μ₁, μ₂ and SDs σ₁, σ₂. Then

P_{1+2}(x) = ∫_{−∞}^∞ dy p₁(y) p₂(x − y) (49)

is also a ND with

μ = μ₁ + μ₂ (50)

σ² = σ₁² + σ₂² (51)

(Proof in class). See the next figure, with μ_{1,2} = 3 and 7, and σ_{1,2} = 3 and 4 (green and white). Solid line - ND with μ = 10 and σ = 5. (Histograms were obtained from "experiment", as before.)



XII. DISTRIBUTION OF SEVERAL VARIABLES

READING: these notes

Probability density

p(x, y)

which satisfies all axioms of probability. (In fact, for 2 variables Venn diagrams are the most instructive.)

1. Multivariate Gaussian distribution

With

~r = (x, y, . . .)

p(~r) = C exp(−(1/2) ~r · A · ~r) , C = |det A|^(1/2)/(2π)^(d/2) (52)

where d is the dimension of ~r (2 in our case). [Proof in class.]

(3D plot removed)


2. Covariance and correlation

Cov[X, Y] = σ²_xy = ⟨(X − μ_x)(Y − μ_y)⟩ (53)

with

Cov[X, X] = σ²_x (54)

Correlation:

Corr[X, Y] = Cov[X, Y]/(σ_x σ_y) = σ²_xy/(σ_x σ_y) (55)

and

Corr[X, X] = Corr[Y, Y] = 1

(thus, one can introduce a correlation matrix, V.)

3. Statistical analog of covariance and correlation

Covariance:

s_xy = (1/(n − 1)) Σ_{i=1}^n (xᵢ − x̄)(yᵢ − ȳ) (56)

The rest is similar. Note: a good random number generator should give practically uncorrelated results - see 114−2dGa.nb

4. Independent random variables

p(x, y) = p₁(x) p₂(y)

or

F(x, y) = F₁(x) F₂(y)

Then

E(XY) = E(X) E(Y)
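A quick check of the last two subsections with built-in commands (a sketch; for a good RNG both numbers are close to 0):

{x, y} = Transpose[Table[{RandomReal[], RandomReal[]}, {10^4}]];
{Covariance[x, y], Correlation[x, y]}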


FIG. 10: Change of pattern from non-correlated data (left) to strongly correlated data (right). Upper row: uniform distribution in x. Lower: from ND with the indicated covariance matrix (and determinant) - {{1, 0}, {0, 4}} (det 4), {{1, 1}, {1, 4}} and {{1, 1.95}, {1.95, 4}} (det 0.1975).

Dr. Vitaly A. Shneidman, Phys114

XIII. MEASUREMENTS AND PROPAGATION OF ERRORS

READING: Ch.3 + these notes

HW: pp. 49, 50: 1, 2, 5, 9-11


A. The coin experiment

(Figure: coin-flip random walks x_n vs. the number of tosses n, up to 100 000 tosses (left), and the growing spread of x_n (right).)

1. Fair

heads - +1, tails - −1; ξ - random ("noise"):

ξ = ±1 , ξ̄ = 0 , ⟨ξ²⟩ = (1/2)·1² + (1/2)·(−1)² = 1

x_n = x_{n−1} + ξ (57)

x²_n = x²_{n−1} + ξ² + 2ξ x_{n−1} (58)

⟨x²_n⟩ = ⟨x²_{n−1}⟩ + ⟨ξ²⟩ = ⟨x²_{n−1}⟩ + 1 (59)

⟨x²_n⟩ = n (60)

or

x_RMS = √n

2. Unfair

p - probability to get heads

ξ = ±1 , ξ̄ = μ = 2p − 1 , ⟨ξ²⟩ = p·1² + (1 − p)·(−1)² = 1 = σ² + μ²

with σ² = 1 − μ² ≈ 1 for small μ.

x_n = x_{n−1} + ξ (61)

⟨x_n⟩ = ⟨x_{n−1}⟩ + μ = μn (62)

x²_n = x²_{n−1} + ξ² + 2ξ x_{n−1} (63)

⟨x²_n⟩ = ⟨x²_{n−1}⟩ + ⟨ξ²⟩ + 2⟨x_{n−1}⟩μ = ⟨x²_{n−1}⟩ + 1 + 2(n − 1)μ² (64)

σ²_n = ⟨x²_n⟩ − μ²n² = ⟨x²_{n−1}⟩ − ⟨x_{n−1}⟩² + σ² = σ²_{n−1} + σ²

or

σ_n = σ√n , μ_n = μn

Thus, for any μ ≠ 0, for large enough n the "signal-to-noise ratio" becomes large,

μ_n/σ_n ∼ √n ≫ 1

allowing one to properly measure μ.
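A simulation sketch of the unfair coin (the bias emerges from the noise as n grows):

p = 0.51; n = 10^5;
steps = 2 RandomVariate[BernoulliDistribution[p], n] - 1;   (* +1 / -1 steps *)
{Last[Accumulate[steps]], (2 p - 1) n, Sqrt[n]}             (* x_n vs mu*n vs noise scale *)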

B. Propagation of small errors

1. x = f(u)

SNR: ū/σ_u ≫ 1

x = f(u) , u = ū + ξ , ξ̄ = 0 , ⟨ξ²⟩ = σ²_u (65)

x ≃ f(ū) + f′_u ξ + (1/2) f″_uu ξ² + . . . (66)

x̄ = f(ū) + (1/2) f″_uu σ²_u (67)

x² = f² + 2f f′_u ξ + f′²_u ξ² + f f″_uu ξ² + . . . (68)

(the term linear in ξ averages to zero), so

⟨x²⟩ = f² + (f′²_u + f f″) σ²_u (69)

x̄² = f² + f f″ σ²_u + . . .

⟨x²⟩ − x̄² = f′²_u σ²_u (70)

or

σ_x = |f′_u| σ_u (71)

2. x = f(u, v)

u = ū + ξ_u , v = v̄ + ξ_v

⟨ξ²_u⟩ = σ²_u , ⟨ξ²_v⟩ = σ²_v , ⟨ξ_u ξ_v⟩ = σ²_uv

x = f(ū, v̄) + ξ_u f′_u + ξ_v f′_v + (1/2) ξ²_u f″_uu + (1/2) ξ²_v f″_vv + ξ_u ξ_v f″_uv (72)

x̄ = f(ū, v̄) + (1/2) σ²_u f″_uu + (1/2) σ²_v f″_vv + σ²_uv f″_uv ≈ f(ū, v̄) (73)

and

σ²_x = σ²_u f′²_u + σ²_v f′²_v + 2σ²_uv f′_u f′_v (74)
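A numerical check of eq. (74) for, e.g., x = u·v with independent u and v (a sketch; the two numbers should agree):

uS = RandomVariate[NormalDistribution[2, 0.05], 10^5];
vS = RandomVariate[NormalDistribution[3, 0.04], 10^5];
{StandardDeviation[uS vS], Sqrt[(0.05*3)^2 + (0.04*2)^2]}   (* MC vs formula *)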


C. Designing an experiment: Example

T ≃ 2π√(L/g)

g = 4π² L/T²

σ²_g = g² (σ²_L/L² + 4σ²_T/T²)
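For example, with σ_L/L = 1% and σ_T/T = 0.5%, one gets σ_g/g = √(0.01² + 4 · 0.005²) ≈ 1.4%; improving the timing much below σ_T/T ≈ 0.5% no longer helps, since the length error then dominates.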


Dr. Vitaly A. Shneidman, Phys114

XIV. ESTIMATORS, χ² AND KOLMOGOROV-SMIRNOV TESTS

READING: Ch.4 + a bit of Ch.11 + these notes

HW: 4.5, 4.6, 4.9; apply the χ²-test to your data on radioactive decay

A. Maximum likelihood

P(xᵢ) = p(xᵢ ; α) ∆x

L = P(x₁) P(x₂) . . . P(xₙ) = max (75)

dL/dα = 0 (76)

Let, e.g.,

p(xᵢ) ∼ exp[−(xᵢ − μ)²/(2σ²)]

with fixed σ and adjustable μ. One has

L ∼ exp[−Σ (xᵢ − μ)²/(2σ²)] (77)

0 = d ln L/dμ = Σᵢ (xᵢ − μ)/σ² (78)

μ = (1/n) Σ xᵢ (79)

as expected.

Error in μ:

σ²_μ = σ²(dμ/dx₁)² + σ²(dμ/dx₂)² + . . . = (1/n) σ² (80)

σ_μ = σ/√n (81)


1. Weighted average

Let σ → σᵢ:

p(xᵢ) ∼ exp[−(xᵢ − μ)²/(2σᵢ²)]

with a fixed set of σᵢ and adjustable μ. One has

L ∼ exp[−Σ (xᵢ − μ)²/(2σᵢ²)] (82)

0 = d ln L/dμ = Σᵢ (xᵢ − μ)/σᵢ² (83)

μ = [1/Σᵢ(1/σᵢ²)] Σ_k (1/σ_k²) x_k (84)

Note that the best measurements, with σᵢ → 0, win!

Error in μ:

σ²_μ = σ₁²(dμ/dx₁)² + σ₂²(dμ/dx₂)² + . . . = [1/Σᵢ(1/σᵢ²)]² Σ_k (1/σ_k²) = (85)

= 1/Σᵢ(1/σᵢ²) (86)
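A two-line implementation of eqs. (84), (86) (a sketch with made-up numbers):

weightedMean[x_, sig_] := Total[x/sig^2]/Total[1/sig^2]
x = {10.1, 9.8, 10.5}; sig = {0.1, 0.2, 0.5};
{weightedMean[x, sig], 1/Sqrt[Total[1/sig^2]]}   (* mu and sigma_mu *)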

B. χ²

1. Where does it come from?

Let μ = 0 , σ = 1 and

p(x) = (1/√(2π)) e^(−x²/2)

Introduce

y_n = Σᵢ xᵢ² (87)

What is P(y_n)? E.g., for n = 1: y₁ = x² , x = ±√y₁,

P(y₁) = p[x(y₁)] (dx/dy₁) · 2 ∼ e^(−y₁/2)/√y₁ (88)

etc.

General (χ² = x in the PDF):

PDF: 2^(−n/2) e^(−x/2) x^(n/2−1)/Γ(n/2) (89)

CDF: [1/Γ(n/2)] Γ(n/2, χ²/2) (90)

x_mode = n − 2

For n ≫ 1

PDF → (1/√(2π)) e^(−y²/2) , y = (x − x_mode)/√(2x_mode) (91)

FIG. 11: "Experimental" histograms of y_n (eq. 87) for n = 1, 2, 3 and 4.


C. Comparing two distributions

1. χ²-test for data vs theoretical

Let X (data) be tested vs. theory with a known PDF p(x) and CDF F(x). Let the data be grouped in N bins, with 1 ≤ i ≤ N the number of the bin, and Rᵢ and Sᵢ the number of events in the corresponding bin for data and theory, respectively. The theoretical Sᵢ are normalized to the total number of data events, N_ev:

Sᵢ = N_ev ∫_{xᵢ}^{xᵢ₊₁} dx p(x) = N_ev (F(xᵢ₊₁) − F(xᵢ))

Then,

χ² = Σᵢ (Rᵢ − Sᵢ)²/Sᵢ (92)

Evaluate

[1/Γ(N/2)] Γ(N/2, χ²/2) (93)

If this is close to 1 the two distributions are close. - see chi2.nb.

2. χ²-test for data1 vs data2

Let X and Y be grouped in N bins each, with 1 ≤ i ≤ N the number of the bin, and Rᵢ and Sᵢ the number of events in the corresponding bin for X and Y, respectively. Then,

χ² = Σᵢ (Rᵢ − Sᵢ)²/(Rᵢ + Sᵢ) (94)

Evaluate

[1/Γ(N/2)] Γ(N/2, χ²/2) (95)

If this is close to 1 the two distributions are close. Note:

• the total number of events for X and Y is not required to be the same (otherwise N → N − 1).

• bins should be filled up; 1-2 empty bins can be ok, but a bin which is empty for both X and Y will not work.

• works well if the number of bins is large

• can be used to compare with a known distribution, with Sᵢ in the numerator replaced by a known nᵢ and the entire denominator replaced by the same nᵢ
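A sketch of the test of eqs. (94), (95) (the bin counts here are hypothetical; Gamma[a, z] is the upper incomplete gamma function):

chi2[r_, s_] := Total[(r - s)^2/(r + s)]
q[nBins_, c2_] := Gamma[nBins/2, c2/2]/Gamma[nBins/2]   (* eq. (95) *)
r = {10., 22., 31., 20., 9.}; s = {12., 20., 28., 24., 8.};
q[Length[r], chi2[r, s]]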

D. Kolmogorov-Smirnov

Q_ks(λ) = 2 Σ_{i=1}^∞ (−1)^(i−1) exp(−2i²λ²) = 1 − θ₄(0, e^(−2λ²)) (96)

FIG. 12: The function Q_ks(λ).


FIG. 13: Two cumulative distributions F(x) with a maximum distance d = 0.064.

1. Kolmogorov-Smirnov test

Consider two unbinned distributions with N points in each.

• Construct the cumulative distributions S₁(x) and S₂(x)

• find the maximum distance

D = max_{−∞<x<∞} |S₁(x) − S₂(x)|

• Evaluate

Q_ks(√(N/2) D)

• If the number is close to 1 the distributions are similar.

Note:

• distributions must be one-dimensional each

• can be used for a different number of points, with N/2 → N₁N₂/(N₁ + N₂)


FIG. 14: Cumulative distributions used in the Kolmogorov-Smirnov test for comparing two distributions. Each set of data was generated using the standard uniform RNG (so that the distributions are expected to be identical if the number of points is large). Left - 20 points in each distribution, right - 200 points. The KS test identifies the distributions with each other with confidence 0.33 for the 1st case and confidence 0.987 for the 2nd.

• can be used for comparison with a known distribution with N/2 → N
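A minimal sketch of the whole procedure (empirical CDFs evaluated on a grid; equal sample sizes assumed):

ksQ[lam_] := 2 Sum[(-1)^(i - 1) Exp[-2 i^2 lam^2], {i, 1, 100}]
s1 = RandomReal[1, 200]; s2 = RandomReal[1, 200]; n = 200;
emp[s_, x_] := Mean[UnitStep[x - s]]   (* empirical cumulative distribution *)
d = Max[Table[Abs[emp[s1, x] - emp[s2, x]], {x, 0, 1, 1/500}]];
ksQ[Sqrt[n/2.] d]                      (* close to 1 => similar *)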


Dr. Vitaly A. Shneidman, Phys114

XV. MONTE CARLO INTEGRATION

READING: Ch.5 + these notes + 114−MonteCarlo.pdf

HW: 5.6, 7, 9. Using MC, find the area of a triangle with vertices (−1,0), (1,0), (0,1).

A. Buffon’s needle

FIG. 15: Buffon’s needle. Chance to cross a line is 1/π.

Mario Lazzarini, an Italian mathematician, performed Buffon's needle experiment in 1901. Tossing a needle 3408 times, he attained the well-known estimate 355/113 for π, which is a very accurate value, differing from π by about 10⁻⁷. What was wrong? - see "MonteCarlo.nb"

B. General MC and example

file MonteCarlo.nb

When to use?

• d ≥ 2


• complicated (”bad”) boundary

• more-or-less smooth integrand (no peaks in small areas)

• not too high accuracy is ok

Note, sometimes you may have a "good" boundary but a "bad" integrand. If a change of variables can reverse this, MC will work much better.

Ideas of MC - see Fig. 16. We want to find the area under the black arc (semicircle in this case) and to locate its center of gravity. Steps:

• surround it by a simple boundary (blue box)

• define functions "ar" (area) and "mom" (moment) with zero initial values

• generate N points inside the box randomly

• if a point falls under the arc, increase "ar" and "mom" accordingly

• calculate the averages ar/N , mom/N

The error decays as 1/√N - not too fast, but the algorithm is very simple (a sketch follows after Fig. 16).


FIG. 16: Ideas of Monte Carlo integration
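A sketch implementing the steps above (semicircle of radius 1; exact area π/2 ≈ 1.571, centroid height 4/(3π) ≈ 0.424):

n = 10^5; ar = 0; mom = 0.;
Do[Module[{x = RandomReal[{-1, 1}], y = RandomReal[]},
   If[y < Sqrt[1 - x^2], ar++; mom += y]], {n}];
{2. ar/n, mom/ar}   (* area (box area = 2) and center-of-gravity height *)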


XVI. GENERATION OF RANDOM NUMBERS FOR DIFFERENT DISTRIBUTIONS

Usually, the standard RNG gives a uniform density

p(x) = 1 , 0 < x < 1

with the cumulative distribution being

F(x) = x , 0 ≤ x ≤ 1

Then, for another distribution with a cumulative F(y) one has

y = F⁻¹(x)

Example:

A. Exponential distribution

file: 114−MonteCarlo.nb

Transformation method:

P(y) = λ exp(−λy) (97)

F(y) = 1 − exp(−λy) = x (98)

y = −(1/λ) ln(1 − x) (99)
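A sketch of the transformation method (compare the histogram with λe^(−λy)):

lambda = 2.;
ys = -Log[1 - RandomReal[1, 10^4]]/lambda;
Show[Histogram[ys, Automatic, "PDF"],
 Plot[lambda Exp[-lambda y], {y, 0, 4}]]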

B. Poisson

The rejection method plus look-up table - see file: 114−MonteCarlo.nb


FIG. 17: "Experimental" studies of the exponential distribution. 10000 data points were produced by a do-it-yourself RNG obtained from a modified built-in RNG for uniform distributions (see text) and grouped into different bins. From left to right: bin size 1, 5 and 0.2. In each case the solid line is the exponential approximation obtained from a non-linear fit.

C. Gauss

The transformation method and the Box-Muller algorithm. See file: 114−MonteCarlo.nb
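A sketch of Box-Muller (two uniform numbers give two independent standard normal ones):

boxMuller[] := Module[{u1 = RandomReal[], u2 = RandomReal[]},
  Sqrt[-2 Log[u1]] {Cos[2 Pi u2], Sin[2 Pi u2]}]
zs = Flatten[Table[boxMuller[], {5000}]];
{Mean[zs], StandardDeviation[zs]}   (* close to 0 and 1 *)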


Dr. Vitaly A. Shneidman, Phys114

XVII. LSA

READING: Ch.6 + notes

A. Fitting of data and geometric LSA

Fitting to a straight line:

y = a + bx

with

b = s_xy/σ²_x , a = ȳ − b x̄ (100)

and s_xy being the "sample covariance".

In Mathematica a linear fit is achieved using the "FindFit" command.

HW: create a list of 20 points of the type y = ax + b + noise. Use the "FindFit" command to find a linear fit; compare the coefficients to a and b.
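A sketch of such a fit (a0, b0 are the "true" values to recover):

a0 = 2; b0 = -1;
pts = Table[{x, a0 x + b0 + RandomReal[{-0.3, 0.3}]}, {x, 0., 5., 0.25}];
FindFit[pts, a x + b, {a, b}, x]   (* compare with a0, b0 *)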

B. ”Physical” LSA

Notations:

X = (x₁ , x₂ , . . . , xₙ) , Y = (y₁ , . . . , yₙ) , S = (σ₁ , . . . , σₙ) (101)

σ² = 1/Σᵢ₌₁ⁿ σᵢ⁻² , ⟨f⟩ = σ² Σᵢ₌₁ⁿ fᵢ/σᵢ² for any vector f (102)

⟨X⟩ = σ² Σᵢ₌₁ⁿ xᵢ/σᵢ² , ⟨Y⟩ = σ² Σᵢ₌₁ⁿ yᵢ/σᵢ² (103)

⟨XY⟩ = σ² Σᵢ₌₁ⁿ xᵢyᵢ/σᵢ² , ⟨X²⟩ = σ² Σᵢ₌₁ⁿ xᵢ²/σᵢ² (104)


Units:

[σ] = [Y] = [σᵢ] , [⟨f⟩] = [f]

y(x) = a + bx (105)

Σᵢ₌₁ⁿ [yᵢ − y(xᵢ)]²/σᵢ² = min (106)

∂a:

Σᵢ₌₁ⁿ (yᵢ − y(xᵢ))/σᵢ² = 0

⟨Y⟩ − a − b⟨X⟩ = 0 (107)

∂b:

Σᵢ₌₁ⁿ [yᵢ − y(xᵢ)] xᵢ/σᵢ² = 0

⟨XY⟩ − a⟨X⟩ − b⟨X²⟩ = 0 (108)

Thus,

a = (⟨X²⟩⟨Y⟩ − ⟨X⟩⟨XY⟩)/(⟨X²⟩ − ⟨X⟩²) (109)

b = (⟨XY⟩ − ⟨X⟩⟨Y⟩)/(⟨X²⟩ − ⟨X⟩²) (110)

Poisson:

σᵢ² = yᵢ > 0 (111)

σ² = 1/Σ(1/yᵢ) (112)

⟨Y⟩ = nσ² (113)

⟨X⟩ = σ² Σ xᵢ/yᵢ (114)

⟨XY⟩ = σ² Σ xᵢ (115)


1. Errors in a and b

σ²_a = Σ σᵢ² (da/dyᵢ)² ∝ Σ σᵢ² (⟨X²⟩ σ²/σᵢ² − ⟨X⟩ xᵢ σ²/σᵢ²)² ∝ (116)

∝ σ⁴ Σ (⟨X²⟩² + ⟨X⟩²xᵢ² − 2⟨X²⟩⟨X⟩xᵢ)/σᵢ² ∝ σ⁴ ⟨X²⟩(⟨X²⟩ − ⟨X⟩²) ⇒ (117)

σ²_a = σ² ⟨X²⟩/(⟨X²⟩ − ⟨X⟩²) (118)

Similarly,

σ²_b = Σ σᵢ² (db/dyᵢ)² = σ²/(⟨X²⟩ − ⟨X⟩²) (119)
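A sketch implementing eqs. (102)-(110) and (118), (119) directly:

wavg[f_, sig_] := Total[f/sig^2]/Total[1/sig^2]        (* <f>, eq. (102) *)
lsa[xs_, ys_, sig_] := Module[{mX, mY, mXY, mX2, den, s2},
  {mX, mY, mXY, mX2} = wavg[#, sig] & /@ {xs, ys, xs ys, xs^2};
  den = mX2 - mX^2; s2 = 1/Total[1/sig^2];
  {(mX2 mY - mX mXY)/den, (mXY - mX mY)/den,           (* a, b *)
   Sqrt[s2 mX2/den], Sqrt[s2/den]}]                    (* sigma_a, sigma_b *)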


Dr. Vitaly A. Shneidman, Phys114

XVIII. TRIGONOMETRIC, POLYNOMIAL AND NONLINEAR FITS

READING: Ch.7, 8 + orthog.pdf

HW: reproduce the examples from orthog.pdf, and in each case find χ² and χ²/dof.

Theory: in class.

=============================== (updated till here)


Dr. Vitaly A. Shneidman, Phys114


APPENDIX A: A BIT OF MATHEMATICA

1. Basic elementary commands

HELP:

1) if you know the exact command, but want to refresh what

argument it requires, use ?. E.g.

?Sin

2) if you approximately know the spelling, use ? with * for the

unknown part, e.g.

?*Plot*

gives all commands which have Plot in them

Frequent type- and space-saving commands:

1) % uses the last output as input.

Similarly, %% uses the one before last output, etc. Or, %12

2) space - can be used instead of * for multiplication:

3) ; will not produce an output on the screen (but you can
work with it further!)

(main typesaving - defining your own functions, etc. - will study

later).

4) /.x->.... replacement. E.g.

3x^3/.x->2 gives

24

Sin[x/y]/.{x->Pi,y->2.} gives 1.


Saving your work:

There are two ways:

1) Save["filename", symbol] appends definitions associated with

the specified symbol to a file.

if symbol includes previous definitions, will save everything

which is required! "filename" usually includes .m at the end (for

convenience), but you can be creative. Graphics cannot be saved

this way, but you can save the last command used to generate it,

and then recreate the picture upon restarting Mathematica. Files

are in plain text and relatively small.

Example:

In[1]:= fig:=Plot[Sin[x]/x, {x,-8,8}] (we defined a plot function,

fig)

In[2]:= Save["figSinc.m", fig] (saved this function in a file

figSinc.m)

In[3]:= !!figSinc.m (this shows the contents of the file)

fig := Plot[Sin[x]/x, {x, -8, 8}]

Now, if you start a new Mathematica session, you can type

<< figSinc.m

and you will have all saved definitions. Command fig will plot

your picture. Note: can require a full path on your PC.

2) you save as a notebook, with all graphics you created (and all

the junk). Saved files are BIG, and can quickly overflow your

directory if caution is not used. Use sparingly, and only for work

you feel you really need and which you cannot save using the Save

command.

2. NUMBERS

1) Integer


2) Exact - 1/2, 10^-10, Pi, E, Sqrt[2], EulerGamma, etc.

3) approximate - 2. , 10.^-10, pi= N[Pi,15], e=N[E,7], etc.

4) Complex numbers:

I represents the imaginary unit Sqrt[-1], e.g.

z = 2+3I and then Abs[z]=..., Arg[z]=..., etc.

5) Random numbers, e.g.

Random[]

Note: Updated in Mathematica9

3. SYMBOLIC MATH

Sum, e.g.:

In[74]:= Sum[i^2, {i,1,n}]

or

In[74]:= Sum[i^-2, {i, 1, Infinity}]

Derivatives and integration:

In[75]:= D[x^n,x]

Out[75]= n x^(-1 + n)

In[76]:= D[%,x]

Out[76]= (-1 + n) n x^(-2 + n)

In[77]:= Integrate[%,x]

Out[77]= n x^(-1 + n)

Algebraic operations:

Expand, Factor, Collect, Simplify, etc.

Trigonometry:

TrigExpand and TrigReduce


Connection with exponential notations:

In[8]:= ExpToTrig[Exp[I x]]

Out[8]= Cos[x] + I Sin[x]

or

In[9]:= TrigToExp[Cos[x]+I Sin[x]]

Out[9]= E^(I x)

Power series:

In[118]:= Series[Exp[a x], {x, 0, 5}]

To make a polynomial by truncating a series:

In[119]:= Normal[%]

Will give a series even if there is a simple singularity (Laurent

series):

In[1]:= Series[1/Sin[t], {t,0,2}]

Out[1]= 1/t + t/6 + O[t]^3

Limit:

In[123]:= Limit[(1+x/n)^n, n->Infinity]

Out[123]= E^x

4. DEFINING YOUR OWN FUNCTIONS

In[1]:= f[x_]:=Sin[x]

In[3]:=Plot[f[x]/x, {x,-6,6}] (*will give a plot*)


Can define and save a plotting function:

In[23]:= plotf:=Plot[f[x]/x, {x,-6,6}]

Difference between := and =

In[19]:= r=Random[];

In[20]:= Table[r, {i,5}]

Out[20]= {0.307826, 0.307826, 0.307826, 0.307826, 0.307826}

Gives identical numbers since r was assigned a fixed value

but

In[21]:= Clear[r]; r:=Random[]

In[22]:= Table[r, {i,5}]

Out[22]= {0.0592439, 0.981402, 0.944823, 0.0902293, 0.598816}

gives different values each time r is evaluated

A third assignment (dangerous!):

Clear[x]; f[x_]=Sin[x]

Must use Clear (!)

5. Graphics (2D)

Main functions: Plot, Show, ListPlot

Options: PlotStyle, AxesLabel, etc.

Text, arrows, etc.

Plot[f, {x, xmin, xmax}] generates a plot of f as a function of x

from xmin to xmax. Plot[{f1, f2, ... }, {x, xmin, xmax}] plots

several functions.

PlotRange is an option for graphics functions that specifies what

points to include in a plot.

Show[graphics, options] displays two- and three-dimensional

graphics, using the options specified. Show[g1, g2, ... ] shows


several plots combined.

Examples:

In[12]:= Clear[plo]

In[13]:= plo[n_]:=Plot[Sin[n x]/x, {x,-2,2}, PlotRange -> {-1,2},

PlotStyle -> Dashing[{0.01*n, 0.02}]]

In[15]:= sho:=Show[Table[plo[n], {n,1,3}]]

In[16]:= sho (*will give graphics*)

AxesLabel -> {"x", "y"} will label each axes.

Plotting discrete data points:

ListPlot[{y1, y2, ... }] plots a list of values. The x coordinates

for each point are taken to be 1, 2, ... .

ListPlot[{{x1, y1}, {x2, y2}, ... }]

plots a list of values with specified x and y coordinates.

Example:

In[21]:= list=Table[Sin[i/100.]+.1*Random[], {i,100}];

In[22]:= ListPlot[list]

Out[22]= -Graphics-

Main extra options: e.g., PlotStyle -> PointSize[0.02],
or Joined -> True

Parametric plot:

ParametricPlot[{fx, fy}, {t, tmin, tmax}] produces a parametric

plot with x and y coordinates fx and fy generated as a function

of t. Example:

In[2]:=Clear[x,y,phi]; x[phi_]=Cos[phi];

In[3]:= y[phi_]=Sin[phi];

In[4]:= ParametricPlot[{x[phi],y[phi]}, {phi, 0, 2Pi}]


-Graphics- (not a circle on the screen)

(*by default, AspectRatio -> 1/GoldenRatio; try
to use Show[%, AspectRatio -> 1]*)

PlotLabel:

can be simple text, PlotLabel -> "mypicture" or

Labels as parameters:

plo[n_] := Plot[x^n, {x, 0, 1}, PlotLabel ->n] (*no quotes

now!!!*)

(*suppose we like what we see and want to create, a

postscript file, e.g. t.ps *)

disp:=Export["t.ps", #, "EPS"] &

(*now disp[plo[3]] will create t.ps, as a GOOD postscript

which is outside of Mathematica and can be further used

independently*)

Graphics Primitives:

Line[{pt1, pt2, ... }] is a graphics primitive which represents a

line joining a sequence of points.

Point[coords] is a graphics primitive that represents a point.

Circle[{x, y}, r] is a two-dimensional graphics primitive that

represents a circle of radius r centered at the point x, y.

Polygon, etc. use with Graphics, similar to Arrow

=============================================================


FIG. 18: Experimental studies of a normal distribution. 10000 data points were produced using a RNG from the standard package in Mathematica and grouped into bins. The solid line is the best non-linear fit by a gaussian curve.

FIG. 19: A do-it-yourself Fourier filter which eliminates "noise" (in fact, any signal) with a Fourier component below a selected level. Red line - hidden deterministic signal, black dots - full signal after filtering: left - noise cut-off 0.1, middle - noise cut-off 0.6, and right - noise cut-off 0.7.

6. Modeling and analysis of data

a. Filtering

will be discussed in class. See −stat.nb and Figs. 19 and 20

HW: LAST HW: (a) create 2 lists of 40 random points each with an exponential distribution (see −stat.nb). (b) perform the χ²-test (c) perform the KS test

FIG. 20: The same signal as in Fig. 19 processed using the built-in moving average filter.
