Unit 2: Review of Probability - University of...
Transcript of Unit 2: Review of Probability - University of...
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 1
Unit 2: Review of Probability
Statistics 571: Statistical MethodsRamón V. León
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 2
Approaches to Probability• Approaches to probability
– Classical approach– Frequentist– Personal or subjective approach– Axiomatic approach
• Basic ideas of axiomatic approach– Sample space– Events– Union– Intersection– Complement– Disjoint or mutually exclusive events– Inclusion
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 3
Axioms of Probability• Axioms:
– P(A) ≥0– P(S) = 1 where S is the sample space– P(A ∪ B) = P(A) + P(B) if A and B are mutually
exclusive events• Theorems about probability can be proved using these
axioms• These theorems can be used in probability calculations
– E.g. assuming all elements of the sample space are equally likely
– Counting arguments used. (Take a look at Birthday Problem on Page 13.)
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 4
Conditional Probability and Independence• Conditional probability
– P(A | B) = P (A ∩ B) / P(B)• Events A and B are mutually independent if P (A | B) = P(A)
– Implies P (A ∩ B) = P(A)P(B)
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 5
Tossing Two Dice
(6,6)(6,5)(6,4)(6,3)(6,2)(6,1)6(5,6)(5,5)(5,4)(5,3)(5,2)(5,1)5(4,6)(4,5)(4,4)(4,3)(4,2)(4,1)4(3,6)(3,5)(3,4)(3,3)(3,2)(3,1)3
(2,6)(2,5)(2,4)(2,3)(2,2)(2,1)2(1,6)(1,5)(1,4)(1,3)(1,2)(1,1)1
654321
First Die Outcome
Second Die Outcome
Sample space has 6 x 6 = 36 outcomes
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 7
AIDS Example
100009900100
941094055TestNegative
59049595Test positive
Not AIDSAIDS
P(A) = 100/10000 =.01 P(+|A) = 95/100 =.95 P(-|~A) = 9405/9900 =.95P(A|+) = 95/590 =.16
The usual way of solving this problem uses Bayes Theorem
Given
Conclude
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 9
Bayes Theorem Consequences
( ) ( | )P A P A B⎯⎯→
( | ) ( | )P A B P B A≠
( ) ( | )P A P A Data⎯⎯→
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 11
Random Variables
• A random variable (r.v.) associates a unique numerical value with each outcome in the sample space
• Example:
• Discrete random variables: number of possible values is finite or countably infinite: x1, x2, x3, x4, x5, x6, …
• Probability mass (density) function (p.m.f. or p.d.f.)– f(x) = P(X= x )
• Cumulative distribution function (c.d.f.)
– F(x) = P (X ≤ x) =
10
X ⎧= ⎨
⎩
if coin toss results in a head
if coin toss results in a tail
( )k x
f k≤∑
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 13
Graphs of Probability Mass (Density) Function and Probability Distribution Function
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 14
Continuous Random Variables
( ) 0f x ≥
•An r.v. is continuous if it can assume any value from one or more intervals of real numbers•Probability density function f(x) :
( ) 1f x dx∞
−∞
=∫
( ) ( )b
a
P a X b f x dx≤ ≤ = ∫ for any a b≤
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 15
Cumulative Distribution Function
The cumulative distribution function (c.d.f.), denoted by F(x) , for a continuous random variable is given by:
( ) ( ) ( )x
F x P X x f y dy−∞
= ≤ = ∫
It follows that ( )( ) dF xf x
dx=
x
f(x)F(x)
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 17
Mean and Variance of Random Variables: Discrete Case
2( ) ( ), ( ) ( ( )) ( )E X xf x Var X x E X f x= = −∑ ∑
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 19
Expected Value or Mean of Random Variables
The expected value or mean of a discrete r. v. X denoted by E(X), μX , or simply μ, is defined as:
1 1 2 2( ) ( ) ( ) ( ) ...x
E X xf x x f x x f xμ= = = + +∑The expected value of a continuous r. v. is defined as:
( ) ( )E X xf x dxμ= = ∫
0
Mean of Exponetial Distribution 1( ) xE X x e dxλλλ
∞ −= =∫
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 20
Variance and Standard DeviationThe variance of an r.v. X, denoted by Var(X), 2
Xσ , or simply 2σis defined as
2 2( ) ( )Var X E Xσ μ= = −We can show that
( )22( ) ( ) ( )Var X E X E X= −The standard deviation (SD) is the square root of the variance
Challenge exercise: Show that for the exponential distribution the standard deviation is 1/λ
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 21
Variance of the Mean of independent, Identically Distributed Random Variables
( )( )
1
2 1
2 1
22
2
( )
1
1
1
nii
nii
nii
XVar X Var
n
Var Xn
Var Xn
nn
n
σ
σ
=
=
=
⎛ ⎞⎜ ⎟=⎜ ⎟⎝ ⎠
⎛ ⎞= ⎜ ⎟⎝ ⎠⎛ ⎞= ⎜ ⎟⎝ ⎠⎛ ⎞= ⎜ ⎟⎝ ⎠
=
∑
∑
∑ by independence
since the r.v.’s are identically distributed
1 2
We often refer to , ,..., as
a randon samplewith replacement orfrom a very large population
nX X X
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 22
Quantiles and PercentilesFor 0 1p≤ ≤ the pth quantile (or the 100pth percentile), denoted by
pθ ,of a continuous r.v. X is defined by the following equation:
( ) ( )p pP X F pθ θ≤ = =
.5θ is called the median
pF(x)
θp
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 24
Jointly Distributed Random Variables
( , ) joint probability mass function
f x y =
32 0.16200
=
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 25
Marginal Distribution
Discrete: ( ) ( ) ( , )
Continuous: ( ) ( ) ( , )
y
X
g x P X x f x y
g x f x f x y dy∞
−∞
= = =
= =
∑
∫
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 26
Conditional Distribution
( , )( | ) ( | )( )
f x yf y x P Y y X xg x
= = = =
Conditional probability mass function (p.m.f.):( 4, 1) 0.005( 1| 4)
( 4) 0.315P X YP Y X
P X= =
= = = ==
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 27
Independent Random Variables and are independent r.v.'s if ( , ) ( ) ( )
( , )Note that ( | ) ( )( )
X Y f x y g x h yf x yf y x h yg x
=
= =
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 28
Covariance and Correlation
If X and Y are independent then E(XY)=E(X)E(Y) so the covarianceis zero. The other direction is not true; zero covariance does not imply independence.
( , ) ( )( ) ( ) ( ) ( )XY X YCov X Y E X Y E XY E X E Yσ μ μ= = − − = −
Note that: ( ) ( , )E XY xyf x y dxdy∞ ∞
−∞ −∞= ∫ ∫
( , )( , )var( ) var( )
XYXY
X Y
Cov X Ycorr X YX Y
σρσ σ
= = =
Measures strength of linear association
1 1XYρ− ≤ ≤
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 29
Covariance Example
•A positive covariance indicates positive dependence•A negative covariance indicates negative dependence
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 31
Chebyshev’s InequalityLet c > 0 be a constant. Then irrespective of thedistribution of X
2
( )( ( ) ) Var XP X E X cc
− ≥ ≤
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 32
Weak Law of Large Numbers
Let X be the sample mean of n i.i.d. observations from a
population with finite mean μ and variance 2σ . Then forany fixed c > 0
2
2( ) 0P X cncσμ− ≥ ≤ → as n→∞
We see that X approaches μ as n gets large.
This follows from Chebyshev’s inequality and the fact that2
( ) and ( )E X Var Xnσμ= =
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 33
Selected Discrete Distributions
Bernoulli distribution:
( ) ( )1p
f x P X xp
⎧= = = ⎨ −⎩
if x = 1if x = 0
( ) ,E X p= ( ) (1 )V ar X p p= −
Binomial distribution:
( ) ( ) (1 )x n xnf x P X x p p
x−⎛ ⎞
= = = −⎜ ⎟⎝ ⎠
for x = 0, 1, …,n
( ) ,E X np= ( ) (1 )Var X np p= −
( )5! 5!, e.g., 103! ! 3!2!
n nk k n k⎛ ⎞ ⎛ ⎞
= = =⎜ ⎟ ⎜ ⎟−⎝ ⎠ ⎝ ⎠
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 34
Binomial Distribution Example
Suppose that the probability of a thumbtack landing with the pin up is 0.9. If we toss the thumbtack ten times what is the probability that it lands with the pin up exactly 7 times?
7 3 7 310( 7) (.9) (1 .9) 120(.9) (.1) .057
7P X ⎛ ⎞
= = − = =⎜ ⎟⎝ ⎠
Answer:
See Example 2.30, Page 43 for another application of the Binomial distribution
( ) 10 .9 9, ( ) (1 ) 10 .9 .1 .9E X np Var X np p= = × = = − = × × =
Also note:
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 35
Hypergeometric Distribution(Sampling without replacement from a small population)
A lot of 50 tables has two defective tables. A sample of five tables are selected without replacement. What is the probabilitythat none of these five tables is defective?
2 480 5
( 0) .8082505
P X
⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠= = =⎛ ⎞⎜ ⎟⎝ ⎠
Suppose the five tables had been selected with replacement? What would then be the probability?
0 55 2 48( 0) .81540 50 50
P X ⎛ ⎞⎛ ⎞ ⎛ ⎞= = =⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 36
( ) ( ) ,!
xef x P X xx
λλ−
= = = for x = 0, 1, 2, …
( ) ,E X λ= ( )Var X λ=
Poisson Distribution:
Example: On the average five Prussian soldiers die from horsekicks in a year. What is the probability that exactly four soldiers are killed this way in a given year?
5 4(5)( 4) .1754!
eP X−
= = =
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 37
Geometric Distribution
Probability of waiting time to an event in independent trials
1
2
( ) (1 ) , 1, 2,...1 1( ) and ( )
xP X x p p xpE X Var X
p p
−= = − =−
= =
Suppose the probability of winning the jackpot in a slot machineis .01. What is the expected number of tries to win the jackpot?What the is the probability that you hit the jackpot for the first time on your fifth try?
41( ) 100, ( 5) (.99) (.01) .0096.01
E X P X= = = = =
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 38
Uniform DistributionDistribution when all values in an interval are equally likely
Suppose that you select a real number at random in the interval [1,5]. What is the probability that it turns out to be between 2and 4?
4 2(2 4) 0.55 1
P X −≤ ≤ = =
−Proportion of the lengths of the intervals [2,4] to the length of the interval [1,5]
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 39
Exponential DistributionDistribution of waiting time when arrivals occur at random
0( ) , ( ) 1 for 0
xx t xf x e F x e dt e xλ λ λλ λ− − −= = = − ≥∫2
1 1( ) and ( )E X Var Xλ λ
= =
( ) 1 ( ) 1 ( ) xP X x P X x F x e λ−> = − ≤ = − =
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 40
Memoryless Property of the Exponential Distribution
( ) ( )( ) ( )
|
5 3 | 5 3
P X s t X s P X t
P X X P X
> + > = >
> + > = >
The probability of having to wait t additional minutes after havingwaited s minutes is the same as the probability of having to wait t minutes to begin with.
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 41
Normal DistributionA continuous r.v. X has a normal distribution with parameter μand 2σ if its p.d.f. is given by
2
2( )
21( ) for 2
x
f x e xμσ
σ π
−−
= −∞< <∞
( )E X μ= 2( )Var X σ=and
2~ ( , )X N μ σNotation:
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 42
Standard Normal Distribution = N(0,1)2If ~ ( , ) then ~ (0,1)XX N Z Nμμ σ
σ−
=
( ) X x xP X x P Z μ μ μσ σ σ− − −⎛ ⎞ ⎛ ⎞≤ = = ≤ = Φ⎜ ⎟ ⎜ ⎟
⎝ ⎠ ⎝ ⎠
2~ (205,5 )( 200)
205 200 2055 5
( 1) ( 1) 0.1587
X NP X
XP Z
P Z
< =
− −⎛ ⎞= < =⎜ ⎟⎝ ⎠
< − = Φ − =
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 45
Mean of i.i.d. Normal Random Variable
1 2
21 2
2
21
Let , ,..., be independent, indentically
distributed ( , ). We say that , ,...,
is a random sample from a ( , ) population.
Then for we have ~ , .
n
n
nii
X X X
N X X X
N
XX X N
n n
μ σ
μ σ
σμ= ⎛ ⎞= ⎜ ⎟
⎝ ⎠
∑
Hint: Use this result to do homework problem 2.83
6/7/2007 Unit 2 - Stat 571 - Ramón V. León 46
Percentiles of the Normal Distribution
Suppose that the scores on a standardized test are normally distributed with mean 500 and standard deviation 100. Whatis the 75th percentile score of this test?
500 500 500 500( ) .75100 100 100 100
X x x xP X x P P Z− − − −⎛ ⎞ ⎛ ⎞ ⎛ ⎞≤ = ≤ = ≤ = Φ =⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠
500 0.675 500 (0.675)(100) 567.5100
x x−= ⇒ = + =
From Table A.3 (0.675) 0.75.Φ = So
For 75 percentile means that ( ) .75. Sothx P X x= ≤ =