880.P20 Winter 2006 Richard Kass
Binomial Probability Distribution
P(m, N, p) = \frac{N!}{m!\,(N-m)!}\, p^m q^{N-m}
For the binomial distribution, P is the probability of m successes out of N trials. Here p is the probability of a success and q = 1 - p is the probability of a failure (there are only two choices in a binomial process). Tossing a coin N times and asking for m heads is a binomial process. The binomial coefficient keeps track of the number of ways ("combinations") we can get the desired outcome, e.g. 2 heads in 4 tosses: HHTT, HTHT, HTTH, THHT, THTH, TTHH.
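As a quick numerical sketch (not part of the original notes), the formula can be coded directly; the function name `binom_pmf` is chosen here for illustration:

```python
from math import comb

def binom_pmf(m, N, p):
    """P(m, N, p): probability of m successes in N trials with success probability p."""
    q = 1 - p  # probability of a failure
    return comb(N, m) * p**m * q**(N - m)

# 2 heads in 4 fair-coin tosses: comb(4, 2) = 6 combinations (HHTT, HTHT, ...),
# each with probability (1/2)^4, so P = 6/16 = 0.375
print(binom_pmf(2, 4, 0.5))
```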
Does this formula make sense, e.g. if we sum over all possibilities do we get 1? To show that this distribution is normalized properly, first remember the Binomial Theorem:

(a + b)^k = \sum_{l=0}^{k} \binom{k}{l} a^{k-l} b^l

For this example a = q = 1 - p and b = p, and (by definition) a + b = 1.
\sum_{m=0}^{N} P(m, N, p) = \sum_{m=0}^{N} \binom{N}{m} p^m q^{N-m} = (p + q)^N = 1

Thus the distribution is normalized properly.
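The normalization can be checked numerically for any N and p (a sketch, with N = 10 and p = 0.3 chosen arbitrarily):

```python
from math import comb

# sum of P(m, N, p) over m = 0..N should equal (p + q)^N = 1
N, p = 10, 0.3
total = sum(comb(N, m) * p**m * (1 - p)**(N - m) for m in range(N + 1))
print(total)  # should be 1 up to floating-point rounding
```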
What is the mean of this distribution?
\mu = \frac{\sum_{m=0}^{N} m\,P(m,N,p)}{\sum_{m=0}^{N} P(m,N,p)} = \sum_{m=0}^{N} m\,P(m,N,p) = \sum_{m=0}^{N} m \binom{N}{m} p^m q^{N-m}

(The denominator is 1 by the normalization just shown.)
A cute way of evaluating the above sum is to take the derivative:
\frac{\partial}{\partial p} \left[\sum_{m=0}^{N} \binom{N}{m} p^m q^{N-m}\right] = \frac{\partial}{\partial p}(1) = 0

Carrying out the derivative (with q = 1 - p):

0 = \sum_{m=0}^{N} \binom{N}{m} m p^{m-1} q^{N-m} - \sum_{m=0}^{N} \binom{N}{m} p^m (N-m)(1-p)^{N-m-1}

\sum_{m=0}^{N} \binom{N}{m} m p^{m-1} q^{N-m} = \sum_{m=0}^{N} \binom{N}{m} p^m (N-m)(1-p)^{N-m-1}

p^{-1} \sum_{m=0}^{N} m \binom{N}{m} p^m q^{N-m} = (1-p)^{-1} N \sum_{m=0}^{N} \binom{N}{m} p^m (1-p)^{N-m} - (1-p)^{-1} \sum_{m=0}^{N} m \binom{N}{m} p^m (1-p)^{N-m}

p^{-1}\mu = (1-p)^{-1} N(1) - (1-p)^{-1}\mu

Solving for \mu: (1-p)\mu = pN - p\mu, so

\mu = Np

Here \binom{N}{m} = {}_{N}C_{m} = \frac{N!}{m!(N-m)!} is the binomial coefficient.
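The result μ = Np can be verified by brute force (a sketch; N = 20, p = 0.25 are arbitrary illustrative values):

```python
from math import comb

# direct evaluation of the mean sum(m * P(m, N, p)) should reproduce N*p
N, p = 20, 0.25
mean = sum(m * comb(N, m) * p**m * (1 - p)**(N - m) for m in range(N + 1))
print(mean)  # N*p = 5.0, up to floating-point rounding
```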
Detection efficiency and its “error”:
Suppose you observed m special events (or successes) in a sample of N events. The measured probability (sometimes called the “efficiency”) for a special event to occur is ε = m/N. What is the error (standard deviation, σ_ε) in ε? Since N is a fixed quantity, it is plausible (we will show it soon) that the error in ε is related to the error (standard deviation, σ_m) in m by σ_ε = σ_m / N. This leads to:

\sigma_\varepsilon = \sigma_m / N = \sqrt{Npq}/N = \sqrt{N\varepsilon(1-\varepsilon)}/N = \sqrt{\varepsilon(1-\varepsilon)/N}

This is sometimes called the "error on the efficiency". Thus you want to have as large a sample (N) as possible to reduce the uncertainty in the probability measurement!
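A small sketch of this formula (the numbers m = 80, N = 100 are made up for illustration, not from the notes):

```python
from math import sqrt

# illustrative numbers: m = 80 special events observed in N = 100 events
m, N = 80, 100
eps = m / N                            # measured efficiency
sigma_eps = sqrt(eps * (1 - eps) / N)  # error on the efficiency
print(eps, sigma_eps)                  # 0.8 +/- 0.04
```

Note how quadrupling N would halve σ_ε, which is the point made above about wanting a large sample.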
What's the variance of a binomial distribution? Using a trick similar to the one used for the average we find:

\sigma^2 = \frac{\sum_{m=0}^{N} (m-\mu)^2 P(m,N,p)}{\sum_{m=0}^{N} P(m,N,p)} = Npq
Note: σ_ε, the “error in the efficiency”, → 0 as ε → 0 or 1. (This is NOT a Gaussian, so don't stick it into a Gaussian pdf to calculate a probability.)
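The variance result σ² = Npq can also be brute-forced numerically (a sketch, with the same illustrative N = 20, p = 0.25 as before):

```python
from math import comb

# brute-force the variance sum((m - mu)^2 * P(m, N, p)) and compare with N*p*q
N, p = 20, 0.25
mu = N * p
var = sum((m - mu)**2 * comb(N, m) * p**m * (1 - p)**(N - m)
          for m in range(N + 1))
print(var)  # N*p*q = 20 * 0.25 * 0.75 = 3.75, up to rounding
```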
When a γ-ray goes through material there is a chance that it will convert into an electron-positron pair, e+e-. Let's assume the probability for conversion is 10%. If 100 γ's go through this material, on average how many will convert to e+e-? μ = Np = 100(0.1) = 10 conversions. Consider the case where the γ's come from π0's, which decay to γγ most (98.8%) of the time. We can ask the following:
What is the probability that both γ's will convert? P(2) = probability of 2/2 = (0.1)^2 = 0.01 = 1%
What is the probability that exactly one will convert? P(1) = probability of 1/2 = [2!/(1!1!)](0.1)^1(0.9)^1 = 18%
What is the probability that both γ's will not convert? P(0) = probability of 0/2 = [2!/(0!2!)](0.1)^0(0.9)^2 = 81%
Note: P(2) + P(1) + P(0) = 100%.
Finally, the probability of at least one conversion is: P(≥1) = 1 - P(0) = 19%.
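These four numbers fall straight out of the binomial formula with N = 2 (a sketch of the example above):

```python
from math import comb

p = 0.1  # assumed conversion probability per photon
P = {k: comb(2, k) * p**k * (1 - p)**(2 - k) for k in (0, 1, 2)}
print(P[2])      # both photons convert:   0.01 -> 1%
print(P[1])      # exactly one converts:   0.18 -> 18%
print(P[0])      # neither converts:       0.81 -> 81%
print(1 - P[0])  # at least one conversion: 0.19 -> 19%
```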
Poisson Probability Distribution

Another important discrete distribution is the Poisson distribution. Consider the following conditions:
a) p is very small and approaches 0. For example, suppose we had a 100-sided die instead of a 6-sided die. Here p = 1/100 instead of 1/6. Suppose we had a 1000-sided die, p = 1/1000... etc.
b) N is very large, it approaches ∞. For example, instead of throwing 2 dice we could throw 100 or 1000 dice.
c) The product Np is finite.
A good example of the above conditions occurs when one considers radioactive decay. Suppose we have 25 mg of an element. This is ~10^20 atoms. Suppose the lifetime (τ) of this element is 10^12 years ≈ 5x10^19 seconds. The probability of a given nucleus decaying in one second is 1/τ = 2x10^-20/sec. For this example: N = 10^20 (very large), p = 2x10^-20 (very small), Np = 2 (finite!). We can derive an expression for the Poisson distribution by taking the appropriate limits of the binomial distribution.
P(m, N, p) = \frac{N!}{m!(N-m)!} p^m q^{N-m}

Using condition b) we obtain:

\frac{N!}{(N-m)!} = \frac{N(N-1)\cdots(N-m+1)(N-m)!}{(N-m)!} \approx N^m

q^{N-m} = (1-p)^{N-m} = 1 - p(N-m) + \frac{p^2(N-m)(N-m-1)}{2!} - \cdots \approx 1 - pN + \frac{(pN)^2}{2!} - \cdots \approx e^{-pN}

Putting this all together we obtain:

P(m, N, p) \approx \frac{N^m p^m e^{-pN}}{m!} = \frac{e^{-\mu}\mu^m}{m!}

Here we've let μ = pN. It is easy to show that:
μ = Np = mean of a Poisson distribution
σ² = Np = μ = variance of a Poisson distribution.
Note: m is always an integer ≥ 0; however, μ does not have to be an integer.
The Poisson distribution applies to many situations: radioactive decay, the number of Prussian soldiers kicked to death by horses per year per army corps(!), quality control, failure rate predictions.
The limit assumes N >> m. In a counting experiment, if you observe m events the uncertainty assigned to m is σ = √m.
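The limiting procedure above can be watched numerically: a sketch comparing the binomial with fixed μ = Np to the Poisson as N grows (μ = 2 and m = 3 are arbitrary illustrative choices):

```python
from math import comb, exp, factorial

def poisson_pmf(m, mu):
    """Poisson probability e^-mu * mu^m / m!."""
    return mu**m * exp(-mu) / factorial(m)

# binomial with large N, small p, and fixed mu = N*p approaches the Poisson
mu, m = 2.0, 3
for N in (10, 100, 10000):
    p = mu / N
    binom = comb(N, m) * p**m * (1 - p)**(N - m)
    print(N, binom, poisson_pmf(m, mu))
```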
Radioactivity Example:
a) What's the probability of zero decays in one second if the average is μ = 2 decays/sec?

P(0, 2) = \frac{e^{-2}\,2^0}{0!} = \frac{e^{-2}\cdot 1}{1} = e^{-2} = 0.135 = 13.5\%
b) What's the probability of more than one decay in one second if the average is μ = 2 decays/sec?

P(>1, 2) = 1 - P(0, 2) - P(1, 2) = 1 - \frac{e^{-2}\,2^0}{0!} - \frac{e^{-2}\,2^1}{1!} = 1 - e^{-2} - 2e^{-2} = 0.594 = 59.4\%
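Parts a) and b) as a quick sketch in code:

```python
from math import exp, factorial

def poisson_pmf(m, mu):
    """Poisson probability e^-mu * mu^m / m!."""
    return mu**m * exp(-mu) / factorial(m)

mu = 2.0
p0 = poisson_pmf(0, mu)                              # a) zero decays
p_gt1 = 1 - poisson_pmf(0, mu) - poisson_pmf(1, mu)  # b) more than one decay
print(p0)     # ~0.135
print(p_gt1)  # ~0.594
```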
c) Estimate the most probable number of decays/sec.
We want:

\left.\frac{\partial}{\partial m} P(m, \mu)\right|_{m=m^*} = 0

To solve this problem it's convenient to maximize ln P(m, μ) instead of P(m, μ):

\ln P(m, \mu) = \ln\left(\frac{e^{-\mu}\mu^m}{m!}\right) = -\mu + m\ln\mu - \ln m!

In order to handle the factorial when we take the derivative we use Stirling's approximation: ln m! ≈ m ln m - m.

\frac{\partial}{\partial m}\ln P(m, \mu) = \frac{\partial}{\partial m}\left(-\mu + m\ln\mu - m\ln m + m\right) = \ln\mu - \ln m^* - 1 + 1 = 0

m* = μ. In this example the most probable value for m is just the average of the distribution. Therefore if you observe m events in an experiment, the error on m is √m. Caution: the above derivation is only approximate since we used Stirling's approximation, which is only valid for large m. Another subtle point is that, strictly speaking, m can only take on integer values while μ is not restricted to be an integer.
[Figure: comparison of binomial and Poisson distributions with mean = 1. One panel shows a binomial with N = 3, p = 1/3, the other a binomial with N = 10, p = 0.1, each overlaid with a Poisson of the same mean. Not much difference between them here!]
Accuracy of Stirling's approximation:
ln 10! = 15.10, while 10 ln 10 - 10 = 13.03 (off by ~14%)
ln 50! = 148.48, while 50 ln 50 - 50 = 145.60 (off by ~1.9%)
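These accuracy figures can be reproduced directly (a sketch using only the standard library):

```python
from math import log, factorial

# Stirling's approximation ln m! ~ m*ln(m) - m improves as m grows
for m in (10, 50):
    exact = log(factorial(m))
    approx = m * log(m) - m
    rel_err = 100 * (exact - approx) / exact  # percent error
    print(m, round(exact, 2), round(approx, 2), round(rel_err, 1))
```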
[Figure: histogram of the number of cosmic rays passing through a detector in 15 sec intervals (number of occurrences vs. number of cosmic rays), overlaid with a Poisson with μ = 5.4.]

Counting the number of cosmic rays that pass through a detector in a 15 sec interval:
counts occurrences
0 0
1 2
2 9
3 11
4 8
5 10
6 17
7 6
8 8
9 6
10 3
11 0
12 0
13 1
The data are compared with a Poisson using the measured average number of cosmic rays passing through the detector in eighty-one 15 sec intervals (μ = 5.4).
Error bars are (usually) calculated using √n_i (n_i = number of entries in bin i). Why? Assume we have N total counts and the probability to fall in bin i is p_i. For a given bin we have a binomial distribution (you're either in or out). The expected average number in a given bin is Np_i and the variance is Np_i(1 - p_i) ≈ n_i(1 - p_i). If we have a lot of bins then the probability of an event falling into a given bin is small, so (1 - p_i) ≈ 1.
In our example the largest p_i = 17/81 = 0.21, so the largest correction factor is (1 - 0.21)^(1/2) = 0.89.
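The size of the correction for the fullest bin of the example can be sketched in a few lines:

```python
from math import sqrt

n_i, N_tot = 17, 81  # fullest bin of the cosmic-ray histogram, 81 intervals total
p_i = n_i / N_tot
naive = sqrt(n_i)                 # the usual sqrt(n) error bar
binomial = sqrt(n_i * (1 - p_i))  # full binomial error, sqrt(N*p_i*(1 - p_i))
print(p_i)               # ~0.21
print(binomial / naive)  # correction factor ~0.89
```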
Gaussian Probability Distribution

The Gaussian probability distribution (or “bell shaped curve” or Normal distribution) is perhaps the most used distribution in all of science. Unlike the binomial and Poisson distributions, the Gaussian is a continuous distribution. It is given by:

p(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}

with μ = mean of the distribution (also at the same place as the mode and median), σ² = variance of the distribution, and y a continuous variable (-∞ < y < ∞). The probability (P) of y being in the range [a, b] is given by an integral:

P(a \le y \le b) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b e^{-\frac{(y-\mu)^2}{2\sigma^2}}\, dy
Since this integral cannot be evaluated in closed form for arbitrary a and b (at least no one's figured out how to do it in the last couple of hundred years), the values of the integral have to be looked up in a table.
It is very unlikely (<0.3%) that a measurement taken at random from a Gaussian pdf will be more than ±3σ from the true mean of the distribution.
The total area under the curve is normalized to one. In terms of the probability integral we have:

P(-\infty \le y \le \infty) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(y-\mu)^2}{2\sigma^2}}\, dy = 1

Quite often we talk about a measurement being a certain number of standard deviations (σ) away from the mean (μ) of the Gaussian. We can associate a probability for a measurement to be |y - μ| ≥ nσ from the mean just by calculating the area outside of this region:

n     Prob. of exceeding ±nσ
0.67  0.5
1     0.32
2     0.05
3     0.003
4     0.00006
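The table values can be reproduced with the error function (a sketch; `prob_exceeding` is an illustrative name):

```python
from math import erf, sqrt

def prob_exceeding(n):
    """Two-sided probability that |y - mu| > n*sigma for a Gaussian."""
    return 1 - erf(n / sqrt(2))

for n in (0.67, 1, 2, 3, 4):
    print(n, prob_exceeding(n))
```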
Central Limit Theorem
Why is the Gaussian pdf so important? “Things that are the result of the addition of lots of small effects tend to become Gaussian.” The above is a crude statement of the Central Limit Theorem. A more exact statement is: Let Y1, Y2, ... Yn be an infinite sequence of independent random variables, each with the same probability distribution. Suppose that the mean (μ) and variance (σ²) of this distribution are both finite. Then for any numbers a and b:

\lim_{n\to\infty} P\left(a \le \frac{Y_1 + Y_2 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} \le b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-y^2/2}\, dy
Thus the C.L.T. tells us that under a wide range of circumstances the probability distribution that describes the sum of random variables tends towards a Gaussian distribution as the number of terms in the sum → ∞. Alternatively,

\lim_{n\to\infty} P\left(a \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le b\right) = \lim_{n\to\infty} P\left(a \le \frac{\bar{Y} - \mu}{\sigma_m} \le b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-y^2/2}\, dy

Note: σ_m = σ/√n is sometimes called “the error in the mean” (more on that later).
For the CLT to be valid: μ and σ of the pdf must be finite, and no one term in the sum should dominate the sum.
Actually, the Y's can be from different pdf's!
Best illustration of the CLT:
a) Take 12 numbers (r_i) from your computer's random number generator.
b) Add them together.
c) Subtract 6.
d) You get a number that is distributed as if it came from a Gaussian pdf!
Your computer's random number generator gives numbers distributed uniformly in the interval [0,1]. A uniform pdf in the interval [0,1] has μ = 1/2 and σ² = 1/12.
With n = 12, μ = 1/2, and σ² = 1/12 we have σ√n = \sqrt{(1/12)\cdot 12} = 1, so the CLT variable is just the sum minus 6:

P\left(a \le \frac{Y_1 + Y_2 + \cdots + Y_{12} - n\mu}{\sigma\sqrt{n}} \le b\right) = P\left(-6 \le \sum_{i=1}^{12} r_i - 6 \le 6\right) = \frac{1}{\sqrt{2\pi}} \int_{-6}^{6} e^{-y^2/2}\, dy \approx 1
Thus the sum of 12 uniform random numbers minus 6 is distributed as if it came from a Gaussian pdf with μ = 0 and σ = 1.
[Figure: histograms of A) 5000 random numbers; B) 5000 pairs (r1 + r2) of random numbers; C) 5000 triplets (r1 + r2 + r3) of random numbers; D) 5000 12-plets (r1 + ... + r12) of random numbers; E) 5000 12-plets (r1 + ... + r12 - 6) of random numbers compared with a Gaussian with μ = 0 and σ = 1. In this case n = 12 is already close to ∞.]
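The recipe above is easy to try yourself (a sketch; the seed and sample count are arbitrary):

```python
import random

random.seed(12345)  # fixed seed so the sketch is reproducible
n_trials = 100000
# each sample: 12 uniform [0,1] numbers, summed, minus 6
samples = [sum(random.random() for _ in range(12)) - 6 for _ in range(n_trials)]
mean = sum(samples) / n_trials
var = sum((x - mean)**2 for x in samples) / n_trials
print(mean, var)  # should be close to mu = 0 and sigma^2 = 1
```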
Example: An electromagnetic calorimeter is being made out of a sandwich of lead and plastic scintillator. There are 25 pieces of lead and 25 pieces of plastic, and each piece is nominally 1 cm thick. The spec on the thickness is ±0.5 mm, uniform in [-0.5, 0.5] mm. The calorimeter has to fit inside an opening of 50.1 cm. What is the probability that it won't fit? Since the machining errors come from a uniform distribution with a well defined mean and variance, the Central Limit Theorem is applicable.
The lower limit corresponds to a summed machining error of 1 mm, the slack between the 50 cm nominal stack and the 50.1 cm opening. The upper limit corresponds to all 50 machining errors being large and positive, +0.5 mm each. The probability for the stack to be thicker than 50.1 cm is 31%: there's a 31% chance the calorimeter won't fit inside the box!
Working in cm (each error is uniform over a 0.1 cm window, so σ = (0.1/√12) cm per piece):

a = \frac{Y_1 + \cdots + Y_n - n\mu}{\sigma\sqrt{n}} = \frac{50.1 - 50(1)}{(0.1/\sqrt{12})\sqrt{50}} = 0.49

b = \frac{50(1 + 0.05) - 50(1)}{(0.1/\sqrt{12})\sqrt{50}} = 12.2

P = \frac{1}{\sqrt{2\pi}} \int_{0.49}^{12.2} e^{-y^2/2}\, dy = 0.31
(and a 100% chance someone will get fired if it doesn’t fit inside the box…)
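The 31% figure can be checked with the normal CDF (a sketch; `phi` is an illustrative helper name):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# 50 machining errors, each uniform over a 0.1 cm window
sigma_sum = (0.1 / sqrt(12)) * sqrt(50)  # cm
a = 0.1 / sigma_sum   # stack exceeds the 50.1 cm opening: total error > 0.1 cm
b = 2.5 / sigma_sum   # all 50 errors at +0.05 cm
print(round(a, 2), round(b, 1))  # ~0.49 and ~12.2
print(phi(b) - phi(a))           # ~0.31
```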
When Doesn't the Central Limit Theorem Apply?

Case I) The pdf does not have a well defined mean or variance. The Breit-Wigner distribution does not have a well defined variance!
BW(m) = \frac{1}{2\pi}\,\frac{\Gamma}{(m - m_0)^2 + (\Gamma/2)^2}

normalized: \int_{-\infty}^{\infty} BW(m)\, dm = 1

well defined average: \int_{-\infty}^{\infty} m\, BW(m)\, dm = m_0

undefined variance, since: \int_{-\infty}^{\infty} m^2\, BW(m)\, dm \to \infty
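The divergence of the variance integral can be seen numerically: truncating the integral at ±L and letting L grow, the result never settles down (a sketch with m0 = 0, Γ = 1 chosen for illustration):

```python
from math import pi

def bw(m, m0=0.0, gamma=1.0):
    """Breit-Wigner (Cauchy) pdf with pole m0 and width gamma."""
    return (gamma / (2 * pi)) / ((m - m0)**2 + (gamma / 2)**2)

def second_moment(L, steps=100000):
    """Integral of m^2 * BW(m) over [-L, L] by the midpoint rule."""
    dm = 2 * L / steps
    return sum(((-L + (i + 0.5) * dm)**2) * bw(-L + (i + 0.5) * dm) * dm
               for i in range(steps))

# the truncated "variance" keeps growing roughly like L/pi, without bound
for L in (10, 100, 1000):
    print(L, second_moment(L))
```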
Case II) Physical processes where one term in the sum dominates the sum.
i) Multiple scattering: as a charged particle moves through material it undergoes many elastic (“Rutherford”) scatterings. Most scatterings produce small angular deflections (dσ/dΩ ~ θ^-4) but every once in a while a scattering produces a very large deflection. If we neglect the large scatterings, the scattering angle in a plane is Gaussian distributed, with a width that depends on the material thickness and the particle's charge and momentum.
ii) The spread in range of a stopping particle (straggling): a small number of collisions where the particle loses a lot of its energy dominates the sum.
iii) Energy loss of a charged particle going through a gas: described by a “Landau” distribution (very long “tail”).
(The Breit-Wigner describes the shape of a resonance, e.g. the K*.)