1 Intro to Probability Zhi Wei 2 Outline Basic concepts in probability theory Random variable and...
-
Upload
octavia-butler -
Category
Documents
-
view
227 -
download
3
Transcript of 1 Intro to Probability Zhi Wei 2 Outline Basic concepts in probability theory Random variable and...
1
Intro to Probability
Zhi Wei
2
Outline Basic concepts in probability theory Random variable and probability
distribution Bayes’ rule
3
Introduction
Probability is the study of randomness and uncertainty.
In the early days, probability was associated with games of chance (gambling).
4
Simple Games Involving Probability
Game: A fair die is rolled. If the result is 2, 3, or 4, you win $1; if it is 5, you win $2; but if it is 1 or 6, you lose $3.
Should you play this game?
5
Random Experiment
a random experiment is a process whose outcome is uncertain.
Examples: Tossing a coin once or several times Picking a card or cards from a deck Measuring temperature of patients ...
6
Sample SpaceThe sample space is the set of all possible outcomes.
Simple EventsThe individual outcomes are called simple events.
EventAn event is any subset of the whole sample space
Events & Sample Spaces
7
Example
Experiment: Toss a coin 3 times.
Sample space = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Examples of events include A = {at least two heads}
B = {exactly two tails.}
= {HHH, HHT,HTH, THH}
= {HTT, THT,TTH}
8
Basic Concepts (from Set Theory)
A B, the union of two events A and B, is the event consisting of all outcomes that are either in A or in B or in both events.
A B (AB), the intersection of two events A and B, is the event consisting of all outcomes that are in both events.
Ac, the complement of an event A, is the set of all outcomes in that are not in A.
A-B, the set difference, is the event consisting of all outcomes that in A but not in B
When two events A and B have no outcomes in common, they are said to be mutually exclusive, or disjoint, events.
9
Example
Experiment: toss a coin 10 times and the number of heads is observed.
Let A = { 0, 2, 4, 6, 8, 10}.
B = { 1, 3, 5, 7, 9}, C = {0, 1, 2, 3, 4, 5}.
A B= {0, 1, …, 10} = .
A B contains no outcomes. So A and B are mutually exclusive.
Cc = {6, 7, 8, 9, 10}, A C = {0, 2, 4}, A-C={6,8,10}
10
Rules Commutative Laws:
A B = B A, A B = B A Associative Laws:
(A B) C = A (B C ) (A B) C = A (B C) .
Distributive Laws: (A B) C = (A C) (B C) (A B) C = (A C) (B C)
DeMorgan’s Laws:
. ,1111
ci
n
i
c
i
n
i
ci
n
i
c
i
n
i
AAAA
11
Venn Diagram
A BA∩BA BA∩B
12
Probability
A Probability is a number assigned to each subset (events) of a sample space .
Probability distributions satisfy the following rules:
13
Axioms of Probability
For any event A, 0 P(A) 1.
P() =1.
If A1, A2, … An is a partition of A, then
P(A) = P(A1) + P(A2) + ...+ P(An)
(A1, A2, … An is called a partition of A if A1 A2 … An = A and A1, A2, … An are mutually exclusive.)
14
Properties of Probability
For any event A, P(Ac) = 1 - P(A).
P(A-B)=P(A) – P(A B) If A B, then P(A - B) = P(A) – P(B)
For any two events A and B, P(A B) = P(A) + P(B) - P(A B). For three events, A, B, and C, P(ABC) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(AB C).
15
Example
In a certain population, 10% of the people are rich, 5% are famous, and 3% are both rich and famous. A person is randomly selected from this population. What is the chance that the person is
not rich? rich but not famous? either rich or famous?
16
Joint Probability For events A and B, joint probability
Pr(AB) stands for the probability that both events happen.
Example: A={HT}, B={HT, TH}, what is the joint probability Pr(AB)?
17
Independence
Two events A and B are independent in casePr(AB) = Pr(A)Pr(B)
18
Independence
Two events A and B are independent in casePr(AB) = Pr(A)Pr(B)
Example 1: Drug test
Women Men
Success 200 1800
Failure 1800 200
A = {A patient is a Women}
B = {Drug fails}
Are A and B independent?
19
Independence
Two events A and B are independent in casePr(AB) = Pr(A)Pr(B)
Example 1: Drug test
Example 2: toss a coin 3 times, Let = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.A={having both T and H}, B={at most one T}Are A and B independent? How about toss 2 times?
Women Men
Success 200 1800
Failure 1800 200
A = {A patient is a Women}
B = {Drug fails}
Are A and B independent?
20
Independence If A and B independent, then A and Bc, B and Ac, Ac and Bc
independent
Consider the experiment of tossing a coin twice Example I:
A = {HT, HH}, B = {HT} Will event A independent from event B?
Example II: A = {HT}, B = {TH} Is event A independent from event B?
Disjoint Independence
If A is independent from B, B is independent from C, will A be independent from C?
21
If A and B are events with Pr(A) > 0, the conditional probability of B given A is
Conditioning
Pr( )Pr( | )
Pr( )
ABB A
A
22
If A and B are events with Pr(A) > 0, the conditional probability of B given A is
Example: Drug test
Conditioning
Pr( )Pr( | )
Pr( )
ABB A
A
Women Men
Success 200 1800
Failure 1800 200
A = {Patient is a Women}
B = {Drug fails}
Pr(B|A) = ?
Pr(A|B) = ?
23
If A and B are events with Pr(A) > 0, the conditional probability of B given A is
Example: Drug test
Given A is independent from B, what is the relationship between Pr(A|B) and Pr(A)?
Conditioning
Pr( )Pr( | )
Pr( )
ABB A
A
Women Men
Success 200 1800
Failure 1800 200
A = {Patient is a Women}
B = {Drug fails}
Pr(B|A) = ?
Pr(A|B) = ?
24
Which Drug is Better ?
25
Simpson’s Paradox: View I
Drug I Drug II
Success 219 1010
Failure 1801 1190
A = {Using Drug I}
B = {Using Drug II}
C = {Drug succeeds}
Pr(C|A) ~ 10%
Pr(C|B) ~ 50%
Drug II is better than Drug I
26
Simpson’s Paradox: View II
Female Patient
A = {Using Drug I}
B = {Using Drug II}
C = {Drug succeeds}
Pr(C|A) ~ 10%
Pr(C|B) ~ 5%
27
Simpson’s Paradox: View II
Female Patient
A = {Using Drug I}
B = {Using Drug II}
C = {Drug succeeds}
Pr(C|A) ~ 10%
Pr(C|B) ~ 5%
Male Patient
A = {Using Drug I}
B = {Using Drug II}
C = {Drug succeeds}
Pr(C|A) ~ 100%
Pr(C|B) ~ 50%
28
Simpson’s Paradox: View II
Female Patient
A = {Using Drug I}
B = {Using Drug II}
C = {Drug succeeds}
Pr(C|A) ~ 10%
Pr(C|B) ~ 5%
Male Patient
A = {Using Drug I}
B = {Using Drug II}
C = {Drug succeeds}
Pr(C|A) ~ 100%
Pr(C|B) ~ 50%
Drug I is better than Drug II
29
Conditional Independence
Event A and B are conditionally independent given C in case
Pr(AB|C)=Pr(A|C)Pr(B|C) A set of events {Ai} is conditionally independent
given C in case
11
Pr( ... | ) Pr( | )n
n ii
A A C A C
30
Conditional Independence (cont’d) Example: There are three events: A, B, C
Pr(A) = Pr(B) = Pr(C) = 1/5 Pr(AC) = Pr(BC) = 1/25, Pr(AB) = 1/10 Pr(ABC) = 1/125 Whether A, B are independent? Whether A, B are conditionally independent
given C? A and B are independent A and B are
conditionally independent
31
Outline Basic concepts in probability theory Random variable and probability
distribution Bayes’ rule
32
Random Variable and Distribution
A random variable X is a numerical outcome of a random experiment
The distribution of a random variable is the collection of possible outcomes along with their probabilities: Categorical case: Numerical case:
Pr( ) ( )X x p x
Pr( ) ( )b
aa X b p x dx
33
Random Variables Distributions Cumulative Probability Distribution (CDF):
• Probability Density Function (PDF):Probability Density Function (PDF):
34
Random Variable: Example Let S be the set of all sequences of three rolls of
a die. Let X be the sum of the number of dots on the three rolls.
What are the possible values for X? Pr(X = 5) = ?, Pr(X = 10) = ?
35
Expectation value Division of the stakes problem
Henry and Tony play a game. They toss a fair coin, if get a Head, Henry wins; Tail, Tony wins. They contribute equally to a prize pot of $100, and agree in advance that the first player who has won 3 rounds will collect the entire prize. However, the game is interrupted for some reason after 3 rounds. They got 2 H and 1 T. How should they divide the pot fairly?
a) It seems unfair to divide the pot equally Since Henry has won 2 out of 3 rounds. Then, how about Henry gets 2/3 of $100?
b) Other thoughts?
X 0 100
P 0.25 0.75
X is what Henry will win if the game not interrupted
36
Expectation Definition: the expectation of a random
variable is , discrete case
, continuous case Properties
Summation: For any n≥1, and any constants k1,…,kn
Product: If X1, X2, …, Xn are independent
[ ] Pr( )x
E X x X x [ ] ( )E X xp x dx
1 1
[ ] ( )n n
i ii i
E X E X
1 1
[ ] ( )n n
i i i ii i
E k X k E X
37
Expectation: Example Let S be the set of all sequence of three rolls of a
die. Let X be the sum of the number of dots on the three rolls.
What is E(X)?
Let S be the set of all sequence of three rolls of a die. Let X be the product of the number of dots on the three rolls.
What is E(X)?
38
Variance Definition: the variance of a random variable X
is the expectation of (X-E[x])2 :
Properties For any constant C, Var(CX)=C2Var(X) If X1, X2, …, Xn are independent
2
2 2
2 2
2 2
( ) (( [ ]) )
( [ ] 2 [ ])
( ) [ ] 2 ( [ ])
[ ] [ ]
Var X E X E X
E X E X XE X
E X E X E XE X
E X E X
1 2 1 2( ... ) ( ) ( ) ... ( )n nVar X X X Var X Var X Var X
39
Bernoulli Distribution
The outcome of an experiment can either be success (i.e., 1) and failure (i.e., 0).
Pr(X=1) = p, Pr(X=0) = 1-p, or
E[X] = p, Var(X) = p(1-p) Using sample() to generate Bernoulli samples> n = 10; p = 1/4; > sample(0:1, size=n, replace=TRUE, prob=c(1-p, p))[1] 0 1 0 0 0 1 0 0 0 1
1( ) (1 )x xp x p p
40
Binomial Distribution n draws of a Bernoulli distribution
Xi~Bernoulli(p), X=i=1n Xi, X~Bin(p, n)
Random variable X stands for the number of times that experiments are successful.
n = the number of trials x = the number of successes p = the probability of success
E[X] = np, Var(X) = np(1-p)
otherwise0
,...,2,1,0)1()()Pr(
nxppx
n
xpxXxnx
the binomial distribution in R
dbinom(x, size, prob) Try 7 times, equally
likely succeed or fail> dbinom(3,7,0.5)[1] 0.2734375
>barplot(dbinom(0:7,7,0.5),names.arg=0:7)
0 1 2 3 4 5 6 7
0.0
00
.05
0.1
00
.15
0.2
00
.25
what if p ≠ 0.5? > barplot(dbinom(0:7,7,0.1),names.arg=0:7)
0 1 2 3 4 5 6 7
0.0
0.1
0.2
0.3
0.4
Which distribution has greater variance?
0 1 2 3 4 5 6 7
0.0
00
.05
0.1
00
.15
0.2
00
.25
0 1 2 3 4 5 6 7
0.0
0.1
0.2
0.3
0.4
p = 0.5 p = 0.1
var = n*p*(1-p) = 7*0.1*0.9=7*0.09var = n*p*(1-p) = 7*0.5*0.5 = 7*0.25
briefly comparing an experiment to a distribution
nExpr = 1000tosses = 7; y=rep(0,nExpr);for (i in 1:nExpr) {
x = sample(c("H","T"), tosses, replace = T)y[i] = sum(x=="H")
}hist(y,breaks=-0.5:7.5)lines(0:7,dbinom(0:7,7,0.5)*n
Expr) points(0:7,dbinom(0:7,7,0.5)
*nExpr)
theoreticaldistribution
result of 1000 trials
Histogram of y
y
Freq
uen
cy
0 2 4 6
05
01
00
15
02
00
25
03
00
Cumulative distribution
P(X=x)
0 1 2 3 4 5 6 7
cum
ula
tive
dis
trib
utio
n
0.0
0.2
0.4
0.6
0.8
1.0
0 1 2 3 4 5 6 7
pro
ba
bili
ty d
istr
ibu
tion
0.0
0.2
0.4
0.6
0.8
1.0
P(X≤x)
> barplot(dbinom(0:7,7,0.5),names.arg=0:7) > barplot(pbinom(0:7,7,0.5),names.arg=0:7)
cumulative distribution
0 1 2 3 4 5 6 7
pro
bab
ility
dis
trib
uti
on
0.0
0.2
0.4
0.6
0.8
1.0
0 1 2 3 4 5 6 7
cum
ula
tive d
istr
ibu
tion
0.0
0.2
0.4
0.6
0.8
1.0
P(X=x) P(X≤x)
example: surfers on a website Your site has a lot of visitors 45% of whom
are female You’ve created a new section on
gardening Out of the first 100 visitors, 55 are female. What is the probability that this many or
more of the visitors are female? P(X≥55) = 1 – P(X≤54) = 1-
pbinom(54,100,0.45)
Another way to calculate cumulative probabilities ?pbinom P(X≤x) = pbinom(x, size, prob, lower.tail = T) P(X>x) = pbinom(x, size, prob, lower.tail = F)
> 1-pbinom(54,100,0.45)[1] 0.02839342
> pbinom(54,100,0.45,lower.tail=F)[1] 0.02839342
Female surfers visiting a section of a website
0 6 13 21 29 37 45 53 61 69 77 85 93
pro
ba
bili
ty d
istr
ibu
tion
0.0
00
.02
0.0
40
.06
what is the area under the curve?
Cumulative distribution
0 6 13 21 29 37 45 53 61 69 77 85 93
cum
ula
tive
dis
trib
utio
n
0.0
0.2
0.4
0.6
0.8
1.0
<3 %
> 1-pbinom(54,100,0.45)[1] 0.02839342
51
Plots of Binomial Distribution
Another discrete distribution: hypergeometric
Randomly draw n elements without replacement from a set of N elements, r of which are S’s (successes) and (N-r) of which are F’s (failures)
hypergeometric random variable x is the number of S’s in the draw of n elements
n
N
xn
rN
x
r
xp )(
hypergeometric example fortune cookies there are N = 20 fortune cookies r = 18 have a fortune, N-r = 2 are empty What is the probability that out of n = 5 cookies,
s=5 have a fortune (that is we don’t notice that some cookies are empty)
> dhyper(5, 18, 2, 5) [1] 0.5526316
So there is a greater than 50% chance that we won’t notice.
Gene Set Enrichment Analysis
hypergeometric and binomial When the population N is (very) big, whether one
samples with or without replacement is pretty much the same
100 cookies, 10 of which are empty
1 2 3 4 5
0.0
0.1
0.2
0.3
0.4
0.5
binomial
hypergeometric
number of full cookies out of 5
code aside> x = 1:5> y1 = dhyper(1:5,90,10,5)> y2 = dbinom(1:5,5,0.9)> tmp = as.matrix(t(cbind(y1,y2)))> barplot(tmp,beside=T,names.arg=x)
hypergeometric probability
binomial probability
Poisson distribution # of events in a given interval
e.g. number of light bulbs burning out in a building in a year # of people arriving in a queue per minute
= mean # of events in a given interval
E[X] = , Var(X) = !
)(x
exp
x
Example: Poisson distribution You got a box of 1,000 widgets. The manufacturer says that the failure
rate is 5 per box on average. Your box contains 10 defective widgets.
What are the odds?> ppois(9,5,lower.tail=F)[1] 0.03182806 Less than 3%, maybe the manufacturer is
not quite honest. Or the distribution is not Poisson?
Poisson approximation to binomial If n is large (e.g. > 100) and n*p is moderate (p
should be small) (e.g. < 10), the Poisson is a good approximation to the binomial with = n*p
0 1 2 3 4 5 6 7 8 9 11 13 15
binomialPoisson
0.0
00
.05
0.1
00
.15
59
Plots of Poisson Distribution
Normal (Gaussian) Distribution Normal distribution (aka “bell curve”) fits many biological data well
e.g. height, weight serves as an approximation to binomial,
hypergeometric, Poisson because of the Central Limit Theorem
Well studied
61
Normal (Gaussian) Distribution X~N(,)
E[X]= , Var(X)= 2
If X1~N(1,1), X2~N(2,2), and X1, X2 are independent X= X1+ X2 ?
X= X1- X2 ?
2
22
2
22
1 ( )( ) exp
22
1 ( )Pr( ) ( ) exp
22
b b
a a
xp x
xa X b p x dx dx
sampling from a normal distributionx <- rnorm(1000)h <- hist(x, plot=F)ylim <-
range(0,h$density,dnorm(0))
hist(x,freq=F,ylim=ylim)curve(dnorm(x),add=T)
Histogram of x
x
De
nsi
ty
-4 -2 0 2 4
0.0
0.1
0.2
0.3
0.4
Normal Approximation based on Central Limit Theorem
Central Limit Theorem If xi ~i.i.d with (μ, σ2) and when n is large, then
(x1+…+xn)/n ~ N(μ, σ2/n)
Or (x1+…+xn) ~ N(nμ, nσ2)
Example A population is evenly divided on an issue (p=0.5). For a random
sample of size 1000, what is the probability of having ≥550 in favor of it?
n=1000, xi~Bernoulli (p=0.5), i.e. E(xi)=p; V(xi)=p(1-p)
(x1+…+xn) ~ Binomial(n=1000, p=0.5)
Pr((x1+…+xn)>=550) =1-pbinom(550,1000,0.5)
Normal Approximation:(x1+…+xn) ~ N(np, np(1-p))=N(500, 250)
Pr((x1+…+xn)>=550) =1-pnorm(550, mean=500, sd=sqrt(250))63
d, p, q, and r functions in R In R, a set of functions have been implemented
for each of almost all known distributions. r<distname>(n,<parameters>)
Possible distributions: binom, pois, hyper, norm, beta, chisq, f, gamma, t, unif, etc
You find other characteristics of distributions as well d<dist>(x,<parameters>): density at x p<dist>(x,<parameters>): cumulative distribution
function to x q<dist>(p,<parameters>): inverse cdf
64
Example: Uniform Distribution The uniform distr. On [a,b] has two
parameter. The family name is unif. In R, the parameters are named min and max
> dunif(x=1, min=0, max=3)[1] 0.3333333> punif(q=2, min=0, max=3)[1] 0.6666667> qunif(p=0.5, min=0, max=3)[1] 1.5> runif(n=5, min=0, max=3)[1] 2.7866852 1.9627136 0.9594195 2.6293273 1.6277597
65
Lab Exercise Using R for Introductory StatisticsPage 39: 2.4, 2.8-2.10Page 54: 2.16, 2.23, 2.35, 2.26Page 66: 2.30, 2.32, 2.34-2.36, 2.39, 2.41,
2.42, 2.43-2.46
66
67
Outline Basic concepts in probability theory Random variable and probability
distribution Bayes’ rule
68
Given two events A and B and suppose that Pr(A) > 0. Then
Example:
Bayes’ Rule
Pr(W|R) R R
W 0.7 0.4
W 0.3 0.6
R: It is a rainy day
W: The grass is wet
Pr(R|W) = ?
Pr(R) = 0.8
)Pr(
)Pr()|Pr(
)Pr(
)Pr()|Pr(
A
BBA
A
ABAB
69
Bayes’ Rule
R R
W 0.7 0.4
W 0.3 0.6
R: It rains
W: The grass is wet
R W
Information
Pr(W|R)
Inference
Pr(R|W)
70
Pr( | ) Pr( )Pr( | )
Pr( )
E H HH E
E
Bayes’ Rule
R R
W 0.7 0.4
W 0.3 0.6
R: The weather rains
W: The grass is wet
Hypothesis H Evidence EInformation: Pr(E|H)
Inference: Pr(H|E) PriorLikelihoodPosterior
71
Summation (Integration) out tip Suppose that B1, B2, … Bk form a partition of :
B1 B2 … Bk = and B1, B2, … Bk are mutually exclusive
Suppose that Pr(Bi) > 0 and Pr(A) > 0. Then
1
1
Pr( | ) Pr( )Pr( | )
Pr( )
Pr( | ) Pr( )
Pr( )
Pr( | ) Pr( )
Pr( ) Pr( | )
i ii
i ik
jj
i ik
j jj
A B BB A
A
A B B
AB
A B B
B A B
72
Summation (Integration) out tip Suppose that B1, B2, … Bk form a partition of :
B1 B2 … Bk = and B1, B2, … Bk are mutually exclusive
Suppose that Pr(Bi) > 0 and Pr(A) > 0. Then
1
1
Pr( | ) Pr( )Pr( | )
Pr( )
Pr( | ) Pr( )
Pr( )
Pr( | ) Pr( )
Pr( ) Pr( | )
i ii
i ik
jj
i ik
j jj
A B BB A
A
A B B
AB
A B B
B A B
73
Summation (Integration) out tip Suppose that B1, B2, … Bk form a partition of :
B1 B2 … Bk = and B1, B2, … Bk are mutually exclusive
Suppose that Pr(Bi) > 0 and Pr(A) > 0. Then
1
1
Pr( | ) Pr( )Pr( | )
Pr( )
Pr( | ) Pr( )
Pr( )
Pr( | ) Pr( )
Pr( ) Pr( | )
i ii
i ik
jj
i ik
j jj
A B BB A
A
A B B
AB
A B B
B A B
Key: Joint distribution!
74
Application of Bayes’ Rule
R It rains
W The grass is wet
U People bring umbrella
Pr(UW|R)=Pr(U|R)Pr(W|R)Pr(UW| R)=Pr(U| R)Pr(W| R)
R
W U
Pr(W|R) R R
W 0.7 0.4
W 0.3 0.6
Pr(U|R) R R
U 0.9 0.2
U 0.1 0.8
Pr(U|W) = ?
Pr(R) = 0.8
75
A More Complicated Example
R It rains
W The grass is wet
U People bring umbrella
Pr(UW|R)=Pr(U|R)Pr(W|R)Pr(UW| R)=Pr(U| R)Pr(W| R)
R
W U
Pr(W|R) R R
W 0.7 0.4
W 0.3 0.6
Pr(U|R) R R
U 0.9 0.2
U 0.1 0.8
Pr(U|W) = ?
Pr(R) = 0.8
76
A More Complicated Example
R It rains
W The grass is wet
U People bring umbrella
Pr(UW|R)=Pr(U|R)Pr(W|R)Pr(UW| R)=Pr(U| R)Pr(W| R)
R
W U
Pr(W|R) R R
W 0.7 0.4
W 0.3 0.6
Pr(U|R) R R
U 0.9 0.2
U 0.1 0.8
Pr(U|W) = ?
Pr(R) = 0.8
77
Acknowledgments Peter N. Belhumeur: for some of the slides adapted or
modified from his lecture slides at Columbia University Rong Jin: for some of the slides adapted or modified from
his lecture slides at Michigan State University Jeff Solka: for some of the slides adapted or modified from
his lecture slides at George Mason University Brian Healy: for some of the slides adapted or modified
from his lecture slides at Harvard University