Introduction to Data Analysis Sampling and Probability Distributions.
![Page 1: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/1.jpg)
Probability Distributions and Introduction to Statistical Inference
BIO5312 FALL 2017
STEPHANIE J. SPIELMAN, PHD
![Page 2: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/2.jpg)
Random variable
Random processes produce numerical outcomes:
◦ Number of tails in 50 coin flips
◦ The sum of everyone's heights
Definition: a random variable is a function that maps outcomes of a random process to a numeric value
◦ X is a function (rule) that assigns a number X(s) to each outcome s ∈ S (where s is an outcome in the sample space S)
◦ r.v.'s are technically neither random nor variables…
◦ But you can think of them roughly as numerical outcomes of random processes
![Page 3: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/3.jpg)
Discrete vs continuous RV
Discrete random variables can take on (map to) a finite or countably infinite number of values
Continuous random variables can take on (map to) uncountably many values, such as any value in an interval
![Page 4: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/4.jpg)
Expressing discrete random variables
Probability mass function (PMF)
◦ Describes the values taken by a discrete r.v. X and its associated probabilities
◦ Function that assigns, to any possible value x of a discrete r.v. X, the probability P(X = x)
[Figure: PMF for rolling a fair die. x-axis: Event x (1 to 6); y-axis: Event probability P(X = x).]
![Page 5: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/5.jpg)
PMF properties
$0 \le P(X = x) \le 1$
$\sum_{x} P(X = x) = 1$
PMF is simply a fancier term for a discrete probability distribution
![Page 6: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/6.jpg)
Expressing discrete random variables
Cumulative distribution function (CDF)
◦ Function defined, for a specific value x of a discrete r.v. X, as F(x) = P(X ≤ x)
[Figure: CDF for rolling a fair die, with P(X ≤ 1) and P(X ≤ 4) marked. x-axis: Event (1 to 6); y-axis: Cumulative probability.]
![Page 7: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/7.jpg)
CDF properties
$0 \le F(x) \le 1$
CDF functions are non-decreasing
![Page 8: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/8.jpg)
PMF vs CDF
PMF: What is the probability that X equals x?
CDF: What is the sum of probabilities for all values ≤ x?
![Page 9: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/9.jpg)
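Expectation and spread of random variables
The expectation of a r.v. is the probability-weighted average of all possible values (i.e., mean)
◦ $\mathbb{E}[X] = \mu = \sum_{i} x_i \, p(x_i)$
The variance of a r.v. is defined as
◦ $\mathrm{Var}(X) = \sigma^2 = \mathbb{E}[(X - \mu)^2] = \sum_{i} [x_i^2 \, p(x_i)] - \mu^2$
◦ $\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2$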
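As a quick check of the expectation and variance definitions, here is a minimal sketch for the fair die used on the earlier PMF slide. The variable names (`x`, `p`, `mu`, `sigma2`, `sigma2_alt`) are illustrative choices, not part of the original slides.

```r
# Fair six-sided die: each face has probability 1/6
x <- 1:6
p <- rep(1 / 6, 6)

# Expectation: probability-weighted average of the possible values
mu <- sum(x * p)

# Variance, computed two equivalent ways
sigma2     <- sum(x^2 * p) - mu^2      # E[X^2] - E[X]^2
sigma2_alt <- sum((x - mu)^2 * p)      # E[(X - mu)^2]

c(mean = mu, variance = sigma2)
```

Both variance formulas should agree; for the fair die the mean is 3.5 and the variance is 35/12.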
![Page 10: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/10.jpg)
Example: The Binomial distribution
The binomial distribution describes the probability of obtaining k successes in n Bernoulli trials, where the probability of success for each trial is constant at p
A Bernoulli trial has a binary outcome (success/fail, true/false, yes/no), and P(success) = p is the same for all realizations of the trial
![Page 11: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/11.jpg)
The BInS conditions
To be binomially distributed, a random variable must satisfy the following:
Binary outcomes
Independent trials (outcomes do not influence each other)
n is fixed before the trials begin
Same probability of success, p, for all trials
![Page 12: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/12.jpg)
Is it binomial?
A bag contains 10 balls, 7 red and 3 green.
Situation 1: You draw 5 balls from the bag, noting the ball color each time and then returning it to the bag. Yes!
Situation 2: You draw 5 balls from the bag, retaining each drawn ball for safe-keeping so you can play catch at any moment. No (without replacement, the draws are not independent)
Situation 3: You keep drawing balls, with replacement, until you have drawn 4 red balls. No (n is not fixed before the trials begin)
![Page 13: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/13.jpg)
The binomial distribution
The PMF (probability distribution) for a binomially-distributed random variable:
$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} = \binom{n}{k} p^k q^{n - k}$, where $q = 1 - p$
The binomial coefficient: $\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$
◦ read as "n choose k"
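The PMF formula can be typed out directly and compared against R's built-in `dbinom()`. This is a minimal sketch: `binom_pmf` is a hypothetical helper name, and the values n = 5, p = 0.7 are borrowed from Situation 1 of the ball-drawing example (5 draws with replacement, 7 of 10 balls red).

```r
# Hand-written binomial PMF: choose(n, k) * p^k * (1 - p)^(n - k)
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

k <- 0:5
manual  <- binom_pmf(k, n = 5, p = 0.7)       # formula from the slide
builtin <- dbinom(k, size = 5, prob = 0.7)    # R's built-in PMF

all.equal(manual, builtin)   # the two should agree
sum(manual)                  # a PMF sums to 1
```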
![Page 14: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/14.jpg)
Wikipedia weighs in
![Page 15: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/15.jpg)
The binomial distribution
The expectation for a binomial r.v.
◦ $\mathbb{E}[X] = \mu = np$
The variance for a binomial r.v.
◦ $\mathrm{Var}(X) = \sigma^2 = npq = np(1 - p)$
We write binomially distributed r.v.'s as $X \sim B(n, p)$
![Page 16: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/16.jpg)
Example: Playing with a binomial rv
Each child born to a particular set of parents has a 25% probability of having blood type O. Assume the parents had five children.
Here, n = 5 and p = 0.25, meaning we define Type O as "success", and not Type O as "failure". → X ~ B(5, 0.25)
Tasks:
◦ Compute expectation and variance
◦ Visualize PMF
◦ Visualize CDF
◦ Make some calculations…
![Page 17: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/17.jpg)
Expectation and variance
Each child born to a particular set of parents has a 25% probability of having blood type O. Assume the parents had five children. B(5, 0.25)
$\mathbb{E}[X] = \mu = np = 5 \times 0.25 = 1.25$
$\mathrm{Var}(X) = \sigma^2 = npq = np(1 - p) = 5 \times 0.25 \times 0.75 = 0.9375$
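The shortcut formulas np and np(1 − p) can be cross-checked against the general definitions of expectation and variance by summing over the PMF. A minimal sketch, with illustrative variable names:

```r
n <- 5
p <- 0.25
k <- 0:n

pmf <- dbinom(k, size = n, prob = p)   # P(X = k) for k = 0, ..., 5

mu_from_pmf     <- sum(k * pmf)                     # E[X] = sum of k * P(X = k)
sigma2_from_pmf <- sum(k^2 * pmf) - mu_from_pmf^2   # E[X^2] - E[X]^2

# Compare with the closed-form binomial results
c(mu_from_pmf, n * p)                 # both 1.25
c(sigma2_from_pmf, n * p * (1 - p))   # both 0.9375
```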
![Page 18: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/18.jpg)
Visualize the PMF
[Figure: PMF of B(5, 0.25). x-axis: Number of kids (0 to 5); y-axis: Probability Type O. Bar labels: 0.2373046875, 0.3955078125, 0.263671875, 0.087890625, 0.0146484375, 0.0009765625.]
![Page 19: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/19.jpg)
?distributions
Distributions in the stats package
Description: Density, cumulative distribution function, quantile function and random variate generation for many standard probability distributions are available in the ‘stats’ package.
Details: The functions for the density/mass function, cumulative distribution function, quantile function and random variate generation are named in the form ‘dxxx’, ‘pxxx’, ‘qxxx’ and ‘rxxx’ respectively.
For the beta distribution see ‘dbeta’.
For the binomial (including Bernoulli) distribution see ‘dbinom’.
For the Cauchy distribution see ‘dcauchy’.
For the chi-squared distribution see ‘dchisq’.
![Page 20: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/20.jpg)
Distribution functions, generally

| Function | Purpose | Binomial version |
|---|---|---|
| dxxx() | Probability distribution | dbinom(x, size, prob) |
| pxxx() | CDF | pbinom(q, size, prob) |
| rxxx() | Generate random numbers from given distribution | rbinom(n, size, prob) |
| qxxx() | Quantile: Inverse of pxxx() | qbinom(p, size, prob) |
![Page 21: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/21.jpg)
Binomial distribution functions

| Binomial function | Example | Output |
|---|---|---|
| dbinom(x, size, prob) | dbinom(2, 5, 0.25) | Prob of obtaining 2 successes in 5 trials, where p = 0.25 → 0.264 |
| pbinom(q, size, prob) | pbinom(2, 5, 0.25) | Prob of obtaining ≤ 2 successes in 5 trials, where p = 0.25 → 0.896 |
| rbinom(n, size, prob) | rbinom(100, 5, 0.25) | Generate 100 k values from this binomial dist. → 100 values from {0, 1, 2, 3, 4, 5} |
| qbinom(p, size, prob) | qbinom(0.896, 5, 0.25) | Smallest value x where F(x) >= p* → 2 |

*not prob success, just a prob
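The four functions in the table can be tried side by side. The sketch below uses the B(5, 0.25) example; the seed value and variable names are arbitrary illustrative choices. It also shows why `qbinom()` is only an "inverse" of `pbinom()` in the smallest-x sense: a probability that falls between two CDF steps maps to the next value up.

```r
n <- 5
p <- 0.25

d2 <- dbinom(2, size = n, prob = p)   # P(X = 2)
p2 <- pbinom(2, size = n, prob = p)   # P(X <= 2)

# Quantile: smallest x such that F(x) >= the given probability
q_a <- qbinom(0.896, size = n, prob = p)   # F(2) is just above 0.896, so this is 2
q_b <- qbinom(0.90,  size = n, prob = p)   # F(2) < 0.90 <= F(3), so this is 3

# Random draws: 100 values, each the number of successes in 5 trials
set.seed(1)   # arbitrary seed, only for reproducibility
draws <- rbinom(100, size = n, prob = p)
table(draws)
```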
![Page 22: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/22.jpg)
Making the PMF
[Figure: PMF of B(5, 0.25) with each bar labelled with its probability. x-axis: Number of kids (0 to 5); y-axis: Probability Type O.]
> ## Use dbinom() to get the PMF values
> p <- 0.25
> n <- 5
> k0 <- dbinom(0, n, p) ## Prob of 0 successes, aka no children are Type O
> k1 <- dbinom(1, n, p) ## Prob of 1 success, aka only 1 child is Type O
> ## Advanced:
> library(purrr)
> map_dbl(0:5, dbinom, 5, 0.25)
[1] 0.2373046875 0.3955078125 0.2636718750 0.0878906250 0.0146484375
[6] 0.0009765625
![Page 23: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/23.jpg)
Making the PMF
> ## data frame (tibble) of probabilities for PMF
> library(tibble)
> data.pmf <- tibble(k = 0:5, prob = c(0.237305, 0.396, 0.264, 0.0879, 0.0146, 0.000977))
> data.pmf
# A tibble: 6 x 2
      k     prob
  <int>    <dbl>
1     0 0.237305
2     1 0.396000
3     2 0.264000
4     3 0.087900
5     4 0.014600
6     5 0.000977
> ## Equivalent (up to rounding)
> library(purrr)
> data.pmf <- tibble(k = 0:5, prob = map_dbl(0:5, dbinom, 5, 0.25))
![Page 24: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/24.jpg)
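Making the PMF
> library(ggplot2)
> ggplot(data.pmf, aes(x = k, y = prob)) + geom_bar(stat = "identity") +
    xlab("Number of kids") + ylab("Probability Type O")
Note: geom_bar(stat = "identity") uses a different *stat*, so bar heights come from the y values rather than from counts.
[Figure: bar plot of the PMF. x-axis: Number of kids (ticks at 0, 2, 4); y-axis: Probability Type O.]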
![Page 25: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/25.jpg)
Tweaking the x-axis
> ggplot(data.pmf, aes(x = k, y = prob)) + geom_bar(stat = "identity") +
    ylab("Probability Type O") + scale_x_continuous(name = "Number of kids", breaks = 0:5)
[Figure: bar plot of the PMF with x-axis ticks at 0, 1, 2, 3, 4, 5. x-axis: Number of kids; y-axis: Probability Type O.]
![Page 26: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/26.jpg)
Adding some text
> ggplot(data.pmf, aes(x = k, y = prob)) + geom_bar(stat = "identity") +
    ylab("Probability Type O") + scale_x_continuous(name = "Number of kids", breaks = 0:5) +
    geom_text(aes(x = k, y = prob + 0.01, label = prob))
[Figure: bar plot of the PMF with each bar labelled with its prob value from data.pmf. x-axis: Number of kids (0 to 5); y-axis: Probability Type O.]
![Page 27: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/27.jpg)
Visualize the CDF
> binom.sample <- tibble(x = rbinom(1000, 5, 0.25))
> ggplot(binom.sample, aes(x = x)) + stat_ecdf() +
    xlab("# Type O kids") + ylab("Cumulative probability")
[Figure: empirical CDF of 1000 draws from B(5, 0.25). x-axis: # Type O kids (0 to 5); y-axis: Cumulative probability.]
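The plot on this slide is an empirical CDF built from random draws, so it only approximates the true CDF. As a hedged sketch (variable names and the seed are illustrative assumptions), the exact CDF values can be obtained from `pbinom()` or by accumulating the PMF, and compared with base R's `ecdf()` on a sample:

```r
k <- 0:5

# Exact CDF, two equivalent ways
cdf_exact    <- pbinom(k, size = 5, prob = 0.25)
cdf_from_pmf <- cumsum(dbinom(k, size = 5, prob = 0.25))

# Empirical CDF from a random sample, as stat_ecdf() would draw it
set.seed(42)   # arbitrary seed, only for reproducibility
sample_x      <- rbinom(1000, size = 5, prob = 0.25)
cdf_empirical <- ecdf(sample_x)(k)

round(cbind(k, cdf_exact, cdf_empirical), 3)
```

The empirical column should track the exact one closely, with small differences that shrink as the sample size grows.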
![Page 28: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/28.jpg)
Solving for probabilities
Each child born to a particular set of parents has a 25% probability of having blood type O. Assume the parents had five children. B(5, 0.25)
What is the probability that exactly 2 children were Type O?
> dbinom(2, 5, 0.25)
[1] 0.2636719
[Figure: PMF of B(5, 0.25). x-axis: Number of kids (0 to 5); y-axis: Probability Type O.]
![Page 29: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/29.jpg)
Solving for probabilities
Each child born to a particular set of parents has a 25% probability of having blood type O. Assume the parents had five children. B(5, 0.25)
What is the probability that exactly 2 children were Type O?
$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} = \binom{n}{k} p^k q^{n - k}$
$P(X = 2) = \binom{5}{2} \, 0.25^2 \, 0.75^{5 - 2} = 10 \times 0.0625 \times 0.421875 \approx 0.2637$
[Figure: PMF of B(5, 0.25). x-axis: Number of kids (0 to 5); y-axis: Probability Type O.]
![Page 30: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/30.jpg)
Solving for probabilities
What is the probability that 2 or fewer children were Type O? B(5, 0.25)
> pbinom(2, 5, 0.25)
[1] 0.8964844
[Figure: CDF of B(5, 0.25). x-axis: # Type O kids (0 to 5); y-axis: Cumulative probability.]
![Page 31: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/31.jpg)
Solving for probabilities
What is the probability that 2 or fewer children were Type O? B(5, 0.25)
> dbinom(0, 5, 0.25) + dbinom(1, 5, 0.25) + dbinom(2, 5, 0.25)
[1] 0.8964844
[Figure: PMF of B(5, 0.25). x-axis: Number of kids (0 to 5); y-axis: Probability Type O.]
![Page 32: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/32.jpg)
Solving for probabilities
What is the probability that more than 2 children (i.e., either 3, 4, or 5) were Type O? B(5, 0.25)
> 1 - pbinom(2, 5, 0.25)
[1] 0.1035156
[Figure: CDF of B(5, 0.25). x-axis: # Type O kids (0 to 5); y-axis: Cumulative probability.]
![Page 33: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/33.jpg)
Solving for probabilities
What is the probability that more than 2 children (i.e., either 3, 4, or 5) were Type O? B(5, 0.25)
> dbinom(3, 5, 0.25) + dbinom(4, 5, 0.25) + dbinom(5, 5, 0.25)
[1] 0.1035156
[Figure: PMF of B(5, 0.25). x-axis: Number of kids (0 to 5); y-axis: Probability Type O.]
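The upper-tail probability P(X > 2) can be written a few equivalent ways. This sketch adds one option the slides do not show, the `lower.tail = FALSE` argument of `pbinom()`; variable names are illustrative.

```r
# P(X > 2) for X ~ B(5, 0.25), three equivalent routes
upper_complement <- 1 - pbinom(2, size = 5, prob = 0.25)          # 1 - P(X <= 2)
upper_direct     <- pbinom(2, size = 5, prob = 0.25,
                           lower.tail = FALSE)                    # P(X > 2) directly
upper_sum        <- sum(dbinom(3:5, size = 5, prob = 0.25))       # P(3) + P(4) + P(5)

c(upper_complement, upper_direct, upper_sum)
```

Passing a vector such as `3:5` to `dbinom()` and summing avoids typing one call per value.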
![Page 34: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/34.jpg)
BREATHE
![Page 35: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/35.jpg)
Expressing continuous random variables
Probability density function (PDF)
◦ Describes the values taken by a continuous r.v. X and its associated probabilities
◦ Function such that the area under the curve between any two points a, b corresponds to the probability that the r.v. falls between a, b
◦ → $P(a \le X \le b) = \int_a^b f(x)\,dx$
![Page 36: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/36.jpg)
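[Figure: two panels of a probability density curve over Y with points a and b marked on the x-axis; the area under the curve between a and b is $P(a \le X \le b) = \int_a^b f(x)\,dx$. y-axis: Probability density.]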
![Page 37: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/37.jpg)
PDF properties
Continuous r.v.'s are infinitely precise: $P(X = x) = P(x \le X \le x) = 0$
◦ Exactly unlike PMFs
Total area under the PDF equals 1: $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Densities aren't negative: $f(x) \ge 0$
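These properties can be checked numerically. The sketch below uses the standard normal density `dnorm()` as an example PDF (an assumption made here for illustration; any valid density would do) and base R's `integrate()` for the areas.

```r
# Total area under the PDF is 1
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# P(a <= X <= b) is the area between a and b; here a = -1, b = 1
area_ab <- integrate(dnorm, lower = -1, upper = 1)$value
cdf_ab  <- pnorm(1) - pnorm(-1)          # same probability via the CDF

# P(X = x) = P(x <= X <= x) = 0 for a continuous r.v.
point_prob <- pnorm(0.5) - pnorm(0.5)

# A density is not a probability: it can exceed 1
tall_density <- dnorm(0, mean = 0, sd = 0.1)

c(total_area, area_ab, cdf_ab, point_prob, tall_density)
```

The last value is why the y-axis of a PDF is labelled "density" rather than "probability": only areas under the curve are probabilities.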
![Page 38: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/38.jpg)
Expressing continuous random variables
Cumulative distribution function (CDF)
◦ Function defined, for a specific value x of a continuous r.v. X, as F(x) = P(X ≤ x)
◦ (mostly) the same as for discrete
[Figure: two example CDF curves, one over x from 0 to 10 and one over x from 5 to 25; y-axis from 0.00 to 1.00.]
![Page 39: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/39.jpg)
Relationship between PDF and CDF
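The slide's content here was an image, so the relationship is restated below as a short worked summary using the definitions from the preceding slides (f is the PDF, F is the CDF). This is the standard relationship for a continuous r.v. with a density, not a transcription of the missing figure.

```latex
\begin{align*}
F(x) &= P(X \le x) = \int_{-\infty}^{x} f(t)\,dt
  && \text{(the CDF accumulates area under the PDF)} \\
f(x) &= \frac{d}{dx} F(x)
  && \text{(the PDF is the slope of the CDF)} \\
P(a \le X \le b) &= \int_{a}^{b} f(x)\,dx = F(b) - F(a)
  && \text{(interval probabilities are CDF differences)}
\end{align*}
```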
![Page 40: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/40.jpg)
Jumping right in: Normal distribution
The PDF (probability distribution) for a normally-distributed random variable:
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
We write normally distributed r.v.'s as $X \sim N(\mu, \sigma^2)$
It's gross, everyone knows it, and you will be neither plugging nor chugging with this equation
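Although no plugging or chugging is required, typing the formula out once confirms that it matches R's `dnorm()`. In this sketch `normal_pdf` is a hypothetical helper, and μ = 165, σ² = 64 are illustrative values taken from the heights example in this deck. Note that `dnorm()` takes the standard deviation, not the variance.

```r
# Normal PDF written directly from the formula, parameterised by variance
normal_pdf <- function(x, mu, sigma2) {
  1 / sqrt(2 * pi * sigma2) * exp(-(x - mu)^2 / (2 * sigma2))
}

x <- c(150, 165, 180)
manual  <- normal_pdf(x, mu = 165, sigma2 = 64)
builtin <- dnorm(x, mean = 165, sd = sqrt(64))   # sd = sqrt(variance)

all.equal(manual, builtin)   # the two should agree
```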
![Page 41: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/41.jpg)
PDF of normal distribution
Example, let's say women's heights (cm) are normally distributed according to $N(165, 64)$
◦ Pop quiz: what is the standard deviation of this distribution?
[Figure: PDF of N(165, 64). x-axis: Value (140 to 180); y-axis: Probability (0.00 to 0.05).]
![Page 42: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/42.jpg)
Wikipedia weighs in
![Page 43: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/43.jpg)
Making the PDF
Another "interesting" hack:
> plot.range <- tibble(x = c(165 - 32, 165 + 32))
> ggplot(plot.range, aes(x = x)) +
    stat_function(fun = dnorm, args = list(mean = 165, sd = 8))
[Figure: PDF of $N(165, 64)$. x-axis: Value (140 to 180); y-axis: Probability (0.00 to 0.05).]
![Page 44: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/44.jpg)
Making the CDF
> data.cdf <- tibble(x = rnorm(10000, 165, 8))
> ggplot(data.cdf, aes(x = x)) + stat_ecdf()
[Figure: empirical CDF of the 10000 draws. x-axis: x (140 to 180); y-axis: y (0.00 to 1.00).]
![Page 45: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/45.jpg)
Expectation and variance
Any guesses?
It's in the definition: $X \sim N(\mu, \sigma^2)$, so the expectation is $\mu$ and the variance is $\sigma^2$
![Page 46: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/46.jpg)
Working with the normal distribution
Types of questions one can ask:
◦ What is the probability that a randomly-chosen woman is taller than 158 cm?
◦ What is the probability that a randomly-chosen woman is between 163 and 170 cm tall?
◦ What is the probability that a randomly-chosen woman is shorter than 167 cm?
◦ What is the probability that a randomly-chosen woman is 168 cm tall?
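One way these questions could be answered in R is with `pnorm()`, assuming heights follow the N(165, 64) distribution from the earlier example (so the standard deviation is 8). Variable names are illustrative.

```r
mu    <- 165
sigma <- 8   # N(165, 64) has variance 64, so sd = sqrt(64) = 8

# Taller than 158 cm: upper tail
p_taller_158 <- pnorm(158, mean = mu, sd = sigma, lower.tail = FALSE)

# Between 163 and 170 cm: difference of two CDF values
p_between <- pnorm(170, mean = mu, sd = sigma) - pnorm(163, mean = mu, sd = sigma)

# Shorter than 167 cm: the CDF itself
p_shorter_167 <- pnorm(167, mean = mu, sd = sigma)

# Exactly 168 cm: zero for a continuous r.v., since P(X = x) = 0
p_exactly_168 <- 0

round(c(p_taller_158, p_between, p_shorter_167, p_exactly_168), 4)
```

The last question is the trick one: `dnorm(168, mu, sigma)` would return a density, not a probability.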
![Page 47: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/47.jpg)
![Page 48: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/48.jpg)
Properties of the normal distribution
Symmetric around the mean
Mean = median = mode
Inflection points at μ ± σ
![Page 49: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/49.jpg)
Introducing the standard normal: $X \sim N(0, 1)$
◦ $\mu = 0$
◦ $\sigma = 1$
[Figure: PDF of the standard normal. x-axis: x (−5 to 5), annotated "These are Z-scores"; y-axis: Probability (0.0 to 0.4).]
![Page 50: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/50.jpg)
Standard Normal $X \sim N(0, 1)$
The cumulative-distribution function (cdf) for a standard normal distribution is denoted by
$\Phi(x) = Pr(X \le x)$
where X follows an N(0,1) distribution. This function is shown in Figure 5.10.
The symbol ~ is used as shorthand for the phrase “is distributed as.” Thus X ~ N(0,1) means that the random variable X is distributed as an N(0,1) distribution.
Column A in Table 3 of the Appendix presents Φ(x) for various positive values of x for a standard normal distribution. This cumulative distribution function is illustrated in Figure 5.11. Notice that the area to the left of 0 is .5.
[Figure: PDF of the standard normal, $f(x) = \frac{1}{\sqrt{2\pi}} e^{-(1/2)x^2}$.]
[Figure: standard normal PDF showing 68% of the area within ±1.00, 95% within ±1.96, and 99% within ±2.58.]
Copyright 2010 Cengage Learning, Inc. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part.
![Page 51: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/51.jpg)
PDF and CDF of $X \sim N(0, 1)$
Furthermore, the area to the left of x approaches 0 as x becomes small and approaches 1 as x becomes large.
The right-hand tail of the standard normal distribution, Pr(X ≥ x), is given in column B of Appendix Table 3.
If X ~ N(0,1), then find Pr(X ≤ 1.96) and Pr(X ≤ 1).
From the Appendix, Table 3, column A,
Φ(1.96) = .975 and Φ(1) = .8413
From the symmetry properties of the standard normal distribution,
$\Phi(-x) = Pr(X \le -x) = Pr(X \ge x) = 1 - Pr(X \le x) = 1 - \Phi(x)$
This symmetry property is depicted in Figure 5.12 for x = 1.
[Figure: standard normal PDF f(x); Pr(X ≤ x) = Φ(x) = area to the left of x.]
[Figure: standard normal CDF Φ(x), with Φ(−3) = .0013, Φ(−2) = .023, Φ(−1) = .16, Φ(0) = .50, Φ(1) = .84, Φ(2) = .977, Φ(3) = .9987.]
If the shaded grey area = 0.977, what is x?
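Going from an area back to x is a quantile lookup. Assuming the shaded area is the area to the left of x under the standard normal, a minimal sketch with `qnorm()`:

```r
# Find x such that P(X <= x) = 0.977 for X ~ N(0, 1)
x <- qnorm(0.977)
x              # very close to 2, matching the CDF figure's value at x = 2

# Round trip: pnorm() undoes qnorm()
pnorm(x)
```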
![Page 52: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/52.jpg)
Standard Normal $X \sim N(0, 1)$
Due to symmetry, P(X ≤ −x) = 1 − P(X ≤ x)
Calculate Pr(X ≤ −1.96) if X ~ N(0,1).
Pr(X ≤ −1.96) = Pr(X ≥ 1.96) = .0250 from column B of Table 3.
Furthermore, for any numbers a, b we have Pr(a ≤ X ≤ b) = Pr(X ≤ b) − Pr(X ≤ a) and thus we can evaluate Pr(a ≤ X ≤ b) for any a, b from Table 3.
Compute Pr(−1 ≤ X ≤ 1.5) if X ~ N(0,1).
Pr(−1 ≤ X ≤ 1.5) = Pr(X ≤ 1.5) − Pr(X ≤ −1)
= Pr(X ≤ 1.5) − Pr(X ≥ 1) = .9332 − .1587
= .7745
Forced vital capacity (FVC), a standard measure of pulmonary function, is the volume of air a person can expel in 6 seconds. Current research looks at potential risk factors, such as cigarette smoking, air pollution, indoor allergies, or the type of stove used in the home, that may affect FVC in grade-school children. One problem is that age, sex, and height affect pulmonary function, and these variables must be corrected for before considering other risk factors. One way to make these adjustments for a particular child is to find the mean μ and standard deviation σ for children of the same age (in 1-year age groups), sex, and height (in 2-in. height groups) from large national surveys and compute a standardized FVC, which is defined as (X − μ)/σ, where X is the original FVC. The standardized FVC then approximately follows an N(0,1) distribution, if the distribution of the original FVC values was bell-shaped. Suppose a child is considered in poor pulmonary health if his or her standardized FVC < −1.5. What percentage of children are in poor pulmonary health?
Pr(X < −1.5) = Pr(X > 1.5) = .0668
Thus about 7% of children are in poor pulmonary health.
[Figure: standard normal PDF f(x) with −1 and 1 marked on the x-axis, showing that Φ(−1) = 1 − Φ(1).]
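The table lookups in the excerpt above can be reproduced with `pnorm()`, which plays the role of Φ. A minimal sketch with illustrative variable names:

```r
# Pr(X <= -1.96): left tail, equal to the right tail beyond 1.96 by symmetry
p_left_tail  <- pnorm(-1.96)
p_right_tail <- pnorm(1.96, lower.tail = FALSE)

# Pr(-1 <= X <= 1.5) = Pr(X <= 1.5) - Pr(X <= -1)
p_interval <- pnorm(1.5) - pnorm(-1)

# Standardized FVC below -1.5: share of children in poor pulmonary health
p_poor_health <- pnorm(-1.5)

round(c(p_left_tail, p_interval, p_poor_health), 4)
```

Rounded to four decimals these match the tabled values .0250, .7745, and .0668.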
![Page 53: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/53.jpg)
For X ~ N(0,1), what is the probability P(X ≤ 0.47)?
[Figure: standard normal density, x-axis Z from −5 to 5, y-axis Probability, with 0.47 marked]
# CDF: P(X <= 0.47)
> pnorm(0.47)
[1] 0.6808225
![Page 54: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/54.jpg)
Normal distribution functions
Normal function    Meaning
dnorm(x)           Density at X = x
pnorm(q)           P(X <= q)
rnorm(n)           Generate n random draws from N(0,1)
qnorm(p)           Obtain x from a given CDF area: qnorm(0.6808225) = 0.47
[Figure: standard normal density, x-axis Z from −5 to 5, y-axis Probability, with 0.47 marked]
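As a quick illustration of how the four functions in the table relate, here is a small sketch. The seed and variable names are arbitrary choices made for this example.

```r
set.seed(1)                           # arbitrary seed so the random draws are reproducible

density_at_0 <- dnorm(0)              # height of the N(0,1) curve at x = 0
area_below   <- pnorm(0.47)           # P(X <= 0.47)
cutoff       <- qnorm(area_below)     # qnorm inverts pnorm, so this recovers 0.47
draws        <- rnorm(5)              # five random draws from N(0,1)

print(c(density_at_0 = density_at_0, area_below = area_below, cutoff = cutoff))
print(draws)
```

Note that dnorm returns a density (a curve height), not a probability; only pnorm returns an area under the curve.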
![Page 55: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/55.jpg)
For X ~ N(0,1), what is the probability P(−1.32 ≤ X ≤ 0.47)?
[Figure: standard normal density, x-axis Z from −5 to 5, y-axis Probability, with −1.32 and 0.47 marked]
![Page 56: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/56.jpg)
For X ~ N(0,1), what is the probability P(−1.32 ≤ X ≤ 0.47)?
[Figure: the area between −1.32 and 0.47 = the area below 0.47 − the area below −1.32]
# P(X <= 0.47)
> pnorm(0.47)
[1] 0.6808225
# P(X <= -1.32)
> pnorm(-1.32)
[1] 0.09341751
P(−1.32 ≤ X ≤ 0.47) = 0.6808225 − 0.09341751 = 0.587405
![Page 57: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/57.jpg)
For X ~ N(0,1), what is the probability P(−1 ≤ X ≤ 1)? AKA the probability of being within 1 standard deviation of the mean?
~0.68
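The ~0.68 figure follows from the same "subtract two CDF values" approach. The sketch below uses a helper named within_sd, which is a hypothetical name introduced here, and extends the calculation to 2 and 3 standard deviations.

```r
# Probability that a standard normal value lies within k standard deviations of the mean
within_sd <- function(k) pnorm(k) - pnorm(-k)

probs <- sapply(1:3, within_sd)
names(probs) <- c("1 sd", "2 sd", "3 sd")
print(round(probs, 4))
```

The three results are roughly 68%, 95%, and 99.7%.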
![Page 58: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/58.jpg)
For X ~ N(0,1), what is the probability P(X ≥ 2.14)?
## Two approaches:
> 1 - pnorm(2.14)
[1] 0.01617738
> pnorm(-2.14)
[1] 0.01617738
[Figure: standard normal density, x-axis Z from −5 to 5, y-axis Probability, with 2.14 marked]
![Page 59: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/59.jpg)
For X ~ N(0,1), the top 8% of the distribution falls above what number?
[Figure: standard normal density, x-axis Z from −5 to 5, y-axis Probability, with upper-tail Area = 0.08 above an unknown cutoff (???)]
> qnorm(1 - 0.08)
[1] 1.405072
> -1 * qnorm(0.08)
[1] 1.405072
![Page 60: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/60.jpg)
Historical consideration of z-scores
![Page 61: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/61.jpg)
Re-scaling to standard normal to compare distributions
Z = (x − μ) / σ
◦ x = distribution value of interest ("raw score")
◦ μ, σ = r.v./population mean, standard deviation
![Page 62: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/62.jpg)
Example: Weight for a population of rabbits follows a normal distribution N(2.6, 1.1), i.e. mean μ = 2.6 and variance σ² = 1.1
What is the Z-score for a 3 pound rabbit?
Z = (x − μ) / σ = (3 − 2.6) / √1.1 = 0.381
Does it make sense that this number is positive?
What is the probability a rabbit weighs less than 3 pounds?
pnorm(0.381) = 0.648
pnorm(3, 2.6, sqrt(1.1)) = 0.648 THE FUTURE IS NOW
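To make the equivalence between the two pnorm calls concrete, here is a minimal sketch. The helper z_score is a hypothetical name introduced for illustration, and the second parameter of N(2.6, 1.1) is treated as a variance, as the sqrt() call on the slide implies.

```r
# Z-score: how many standard deviations x lies from the mean
z_score <- function(x, mu, sigma) (x - mu) / sigma

mu    <- 2.6
sigma <- sqrt(1.1)   # N(2.6, 1.1) gives the variance, so take the square root

z        <- z_score(3, mu, sigma)
p_via_z  <- pnorm(z)                         # standard normal CDF at the Z-score
p_direct <- pnorm(3, mean = mu, sd = sigma)  # same probability without rescaling

print(c(z = z, p_via_z = p_via_z, p_direct = p_direct))
```

The two probabilities are identical here because z is not rounded before being passed to pnorm.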
![Page 63: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/63.jpg)
Normal distribution functions, revisited
All functions assume the standard normal by default. Provide additional arguments for other normals:
Standard normal               Any normal
pnorm(q) = pnorm(q, 0, 1)     pnorm(q, mean, sd)
![Page 64: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/64.jpg)
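Z-scores are most useful for comparing different distributions
Weight for rabbit pop A is distributed N(2.6, 1.1)
Weight for rabbit pop B is distributed N(2.9, 0.17)
Which of these two rabbits is bigger? Pop A rabbit weighing 2.95 lbs, or pop B rabbit weighing 3.1 lbs?
Population A: Z = (x − μ) / σ = (2.95 − 2.6) / √1.1 = 0.334
Population B: Z = (x − μ) / σ = (3.1 − 2.9) / √0.17 = 0.485
The pop B rabbit has the larger Z-score, so it is the bigger rabbit relative to its own population.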
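The comparison can be laid out as a small table in R. This is only a sketch; the data frame and its column names are made up for this example, and the second parameter of each N(·, ·) is again treated as a variance.

```r
# One row per rabbit: its weight and the parameters of its own population
rabbits <- data.frame(
  pop      = c("A", "B"),
  weight   = c(2.95, 3.1),
  mean     = c(2.6, 2.9),
  variance = c(1.1, 0.17),
  stringsAsFactors = FALSE
)

# Z-score of each rabbit within its own population, and the matching percentile
rabbits$z          <- (rabbits$weight - rabbits$mean) / sqrt(rabbits$variance)
rabbits$percentile <- pnorm(rabbits$z)

bigger <- rabbits$pop[which.max(rabbits$z)]  # population of the relatively bigger rabbit
print(rabbits)
print(bigger)
```

Comparing Z-scores rather than raw weights is what makes rabbits from differently distributed populations comparable.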
![Page 65: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/65.jpg)
Putting it all together
The height of European men is distributed as N(175, 53.3)
The height of European women is distributed as N(162.5, 34.8)
What proportion of men is shorter than 150 cm, aka P(man < 150)?
Using Z-scores
Z = (x − μ) / σ = (150 − 175) / √53.3 = −3.424
> pnorm(-3.424)
[1] 0.0003085331
Skipping Z-scores
> pnorm(150, 175, sqrt(53.3))
[1] 0.0003081516
The two answers differ slightly only because Z was rounded to three decimals.
![Page 66: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/66.jpg)
Putting it all together
What proportion of women is taller than 162.5 cm?
Men: N(175, 53.3)   Women: N(162.5, 34.8)
50% (162.5 cm is the mean, and the normal distribution is symmetric about its mean)
![Page 67: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/67.jpg)
Putting it all together
What proportion of women is taller than 170 cm?
Men: N(175, 53.3)   Women: N(162.5, 34.8)
Using Z-scores
Z = (x − μ) / σ = (170 − 162.5) / √34.8 = 1.2713
> 1 - pnorm(1.2713)
[1] 0.101811
Skipping Z-scores
> 1 - pnorm(170, 162.5, sqrt(34.8))
[1] 0.1017987
![Page 68: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/68.jpg)
Putting it all together
What is the tallest a woman can be and still be in the bottom 22%?
Men: N(175, 53.3)   Women: N(162.5, 34.8)
Using Z-scores
> qnorm(0.22)
[1] -0.7721932
Z = (x − μ) / σ → x = Zσ + μ
= −0.7722 × √34.8 + 162.5 = 157.9 cm
Skipping Z-scores
> qnorm(0.22, 162.5, sqrt(34.8))
[1] 157.9447
![Page 69: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/69.jpg)
Putting it all together
What is the shortest a woman can be and still be in the top 22%?
Men: N(175, 53.3)   Women: N(162.5, 34.8)
Using Z-scores
> -1 * qnorm(0.22)
[1] 0.7721932
Z = (x − μ) / σ → x = Zσ + μ
= 0.7722 × √34.8 + 162.5 = 167.06 cm
Skipping Z-scores
> qnorm(1-0.22, 162.5, sqrt(34.8))
[1] 167.0553
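Both cutoffs can also be obtained with the lower.tail argument of qnorm, which avoids the 1 − p step for upper-tail questions. A brief sketch with illustrative variable names:

```r
mu    <- 162.5
sigma <- sqrt(34.8)   # 34.8 is the variance of women's height

# Tallest height still in the bottom 22%
bottom_cut <- qnorm(0.22, mean = mu, sd = sigma)

# Shortest height already in the top 22%: ask for the upper tail directly
top_cut <- qnorm(0.22, mean = mu, sd = sigma, lower.tail = FALSE)

# Same upper cutoff by hand: x = Z * sigma + mu with Z = -qnorm(0.22)
top_cut_manual <- -qnorm(0.22) * sigma + mu

print(c(bottom_cut = bottom_cut, top_cut = top_cut, top_cut_manual = top_cut_manual))
```

Because the normal distribution is symmetric, the two cutoffs sit at equal distances on either side of the mean.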
![Page 70: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/70.jpg)
Putting it all together
What is the probability a randomly chosen man is between 175 and 182 cm tall?
→ P(X < 182) − P(X < 175) = P(X < 182) − 0.5, since 175 cm is the mean
Men: N(175, 53.3)   Women: N(162.5, 34.8)
> pnorm(182, 175, sqrt(53.3)) - 0.5
[1] 0.3311738
![Page 71: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/71.jpg)
Putting it all together
What is the probability a randomly chosen man is either between 175 and 182 cm tall or between 150 and 160 cm tall?
→ P(175 < X < 182) + P(150 < X < 160), since the two ranges do not overlap
Men: N(175, 53.3)   Women: N(162.5, 34.8)
### First probability
> pnorm(182, 175, sqrt(53.3)) - 0.5
[1] 0.3311738
### Second prob.
> pnorm(160, 175, sqrt(53.3)) - pnorm(150, 175, sqrt(53.3))
[1] 0.01965059
> 0.3311738 + 0.01965059
[1] 0.3508244
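Interval probabilities like these all follow one pattern, so they can be wrapped in a small helper. The function prob_between below is a hypothetical convenience written for this sketch, not something provided by R.

```r
# P(lo < X < hi) for a normal distribution with the given mean and sd
prob_between <- function(lo, hi, mean, sd) {
  pnorm(hi, mean, sd) - pnorm(lo, mean, sd)
}

mu    <- 175
sigma <- sqrt(53.3)   # 53.3 is the variance of men's height

p_tall  <- prob_between(175, 182, mu, sigma)
p_short <- prob_between(150, 160, mu, sigma)

# Adding the two is valid only because the intervals are disjoint
p_either <- p_tall + p_short
print(c(p_tall = p_tall, p_short = p_short, p_either = p_either))
```

If the two intervals overlapped, the overlap would have to be subtracted once to avoid double counting.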
![Page 72: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/72.jpg)
Putting it all together
I have two randomly-chosen European friends, one man and one woman. What is the probability the man is at least 180 cm and the woman is between 163 and 170 cm?
→ P(man ≥ 180) × P(163 < woman < 170), since the two heights are independent
Men: N(175, 53.3)   Women: N(162.5, 34.8)
### First probability
> 1 - pnorm(180, 175, sqrt(53.3))
[1] 0.2467138
### Second prob.
> pnorm(170, 162.5, sqrt(34.8)) - pnorm(163, 162.5, sqrt(34.8))
[1] 0.3644282
> 0.246713*0.3644282
[1] 0.08990917
![Page 73: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/73.jpg)
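Putting it all together
I have two new randomly-chosen European friends, one man and one woman. What is the probability the man is 180 cm and the woman is 163 cm?
→ P(man = 180) × P(woman = 163)
→ 0, because a continuous random variable takes any single exact value with probability 0
Men: N(175, 53.3)   Women: N(162.5, 34.8)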
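One way to see why the answer is 0 is to compute the probability of landing in an ever narrower window around 180 cm. The sketch below is illustrative only; the window widths are arbitrary choices.

```r
mu    <- 175
sigma <- sqrt(53.3)   # men's height, variance 53.3

# Probability that a man's height falls within +/- w cm of 180, for shrinking w
half_widths <- c(1, 0.1, 0.01, 0.001)
p_near_180 <- sapply(half_widths, function(w) {
  pnorm(180 + w, mu, sigma) - pnorm(180 - w, mu, sigma)
})

print(data.frame(half_width = half_widths, probability = p_near_180))
```

The probability shrinks toward 0 as the window narrows, which is why only ranges of a continuous variable carry nonzero probability.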
![Page 74: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/74.jpg)
Putting it all together
Assume 50.8% of Europeans are women. If a randomly-chosen person is shorter than 155 cm tall, what is the probability the person is a woman?
→ P(woman | <155) = P(<155 | woman) × P(woman) / P(<155)
with P(<155 | woman) = 0.102 and P(woman) = 0.508
Men: N(175, 53.3)   Women: N(162.5, 34.8)
### P(<155 | woman)
> pnorm(155, 162.5, sqrt(34.8))
[1] 0.1017987
![Page 75: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/75.jpg)
Solving the denominator
P(<155) = P(<155 and man) + P(<155 and woman)
= P(<155 | man) × P(man) + P(<155 | woman) × P(woman)
= 0.0031 × 0.492 + 0.102 × 0.508
= 0.0533
### P(<155 | man)
> pnorm(155, 175, sqrt(53.3))
[1] 0.003076926
![Page 76: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/76.jpg)
Putting it all together
Assume 50.8% of Europeans are women. If a randomly-chosen person is shorter than 155 cm, what is the probability the person is a woman?
→ P(woman | <155) = P(<155 | woman) × P(woman) / P(<155)
= 0.102 × 0.508 / 0.0533
= 0.972
Men: N(175, 53.3)   Women: N(162.5, 34.8)
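The whole Bayes' rule calculation can be carried out in one place without rounding the intermediate values. This is a sketch with variable names invented for the example; it assumes, as the slides do, that everyone who is not a woman is a man (49.2%).

```r
p_woman <- 0.508
p_man   <- 1 - p_woman   # assumption carried over from the slides: 49.2% men

# Likelihoods: probability of being shorter than 155 cm within each group
p_short_given_woman <- pnorm(155, 162.5, sqrt(34.8))
p_short_given_man   <- pnorm(155, 175, sqrt(53.3))

# Law of total probability gives the denominator P(<155)
p_short <- p_short_given_man * p_man + p_short_given_woman * p_woman

# Bayes' rule: P(woman | <155)
p_woman_given_short <- p_short_given_woman * p_woman / p_short
print(c(p_short = p_short, p_woman_given_short = p_woman_given_short))
```

Small differences from the hand calculation in the third decimal place can arise because the slides round each term before multiplying.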
![Page 77: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/77.jpg)
BREAK
![Page 78: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/78.jpg)
Statistical inference
Population → (random sampling) → Sample
Sample → (statistical inference) → Population
Population parameters: μ, σ
Sample estimates: x̄, s
![Page 79: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/79.jpg)
Two main flavors of statistical inference
Estimation
◦ Estimate a population parameter from sample data
◦ Point estimates: What is the population mean?
◦ Interval estimates: In what range of values is the population mean likely to fall?
Hypothesis testing
◦ Test whether the value of a population parameter is equal to some specific value
◦ Is there evidence that my sample differs from some underlying population?
![Page 80: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/80.jpg)
The sampling distribution
The probability distribution of values for an estimate that we obtain under sampling
![Page 81: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/81.jpg)
Obtaining a sampling distribution
# assumes library(ggplot2) has been loaded
> genes <- read.csv("genes.csv")
> head(genes)
  nucleotides
1        3785
2        7416
3        2135
4        7682
5        5766
6       11079
> mean(genes$nucleotides)
[1] 2761.039
> sd(genes$nucleotides)
[1] 2037.645
> ggplot(genes, aes(x=nucleotides)) + geom_histogram(fill="white", color="black")
[Figure: histogram of nucleotides (x-axis, 0 to 15000) against count (y-axis, 0 to 3000)]
![Page 82: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/82.jpg)
Obtaining a sampling distribution
### the function sample_n draws a random sample of rows
# assumes library(dplyr) has been loaded for %>% and sample_n
> small.sample <- genes %>% sample_n(25)
> mean(small.sample$nucleotides)
[1] 2151.8
> ggplot(small.sample, aes(x = nucleotides)) + geom_histogram() + geom_vline(xintercept=2151.8, color="blue") + geom_vline(xintercept=2761.039, color="red")
[Figure: histogram of nucleotides for the 25 sampled genes against count; blue line = sample mean (2151.8), red line = population mean (2761.039)]
Reference-line geoms: geom_vline(xintercept=…), geom_hline(yintercept=…), geom_abline(intercept=…, slope=…)
The sample mean for a random sample of n=25 is x̄ = 2151.8
![Page 83: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/83.jpg)
Obtaining a sampling distribution
Now imagine we draw 20 samples, each of size n=25, and compute each of their means:
> head(n20.means)
  sample.mean
1     2584.84
2     2574.12
3     2382.64
4     3143.68
5     2252.56
6     2368.44
[Figure: Sampling distribution of the mean: histogram of sample.mean (x-axis, 2000 to 3500) against count]
![Page 84: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/84.jpg)
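Quantifying the sampling distribution
The standard error is the standard deviation of the sampling distribution of an estimate
◦ Standard error of the mean: SE_x̄ = σ / √n, approximate with s / √n
◦ SE is not the standard deviation of a sample
◦ Here, n represents the number of samples (not the sample size): the sample means are treated as the observations, so s is the standard deviation of the sample means
It also quantifies the precision of our estimate, i.e. how far from the population parameter we are likely to be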
![Page 85: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/85.jpg)
Computing the standard error of the mean
> head(n20.means)
  sample.mean
1     2584.84
2     2574.12
3     2382.64
4     3143.68
5     2252.56
6     2368.44
> sd(n20.means$sample.mean) / sqrt(20)
[1] 93.11888
[Figure: Sampling distribution of the mean: histogram of sample.mean (x-axis, 2000 to 3500) against count]
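The slides do not show how n20.means was built, so the following is only a sketch of one way to produce such a table in base R. Since genes.csv is not available here, a simulated right-skewed population stands in for genes$nucleotides; the population, its size, the seed, and all variable names are assumptions made for illustration.

```r
set.seed(42)   # arbitrary seed, used only so the sketch is reproducible

# Hypothetical stand-in for genes$nucleotides: a right-skewed population
population <- round(rexp(20000, rate = 1 / 2761))

n_size    <- 25   # size of each sample
n_samples <- 20   # number of samples drawn

# Draw n_samples random samples and record each sample mean
sample_means <- replicate(n_samples, mean(sample(population, n_size)))

# Same calculation as on the slide: sd of the sample means over sqrt(number of samples)
sd_of_means <- sd(sample_means)
se_slides   <- sd_of_means / sqrt(n_samples)

print(head(sample_means))
print(se_slides)
```

The numbers will not match the slide output, because both the population and the random draws differ.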
![Page 86: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/86.jpg)
Several sampling distributions composed of N samples, each of n=25
[Figure: five histograms of sample.mean against count, one panel each for N=20, N=50, N=100, N=1000, and N=10000]
![Page 87: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/87.jpg)
Standard error decreases as N increases
N        SE
20       93.1
50       58.1
100      37.9
1000     13.3
10000    4.02
[Figure: five histograms of sample.mean against count, one panel each for N=20, N=50, N=100, N=1000, and N=10000]
![Page 88: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/88.jpg)
Therefore, mean of sampling distribution approaches population mean 2761.039
N        SE      x̄
20       93.1    2780.89
50       58.1    2753.91
100      37.9    2781.51
1000     13.3    2777.02
10000    4.02    2763.82
[Figure: five histograms of sample.mean against count, one panel each for N=20, N=50, N=100, N=1000, and N=10000]
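A table of this kind can be generated by repeating the sampling for each N. The sketch below again uses a simulated stand-in population, since genes.csv is not available; summarise_means is a hypothetical helper, and the SE column follows the slides' calculation (sd of the sample means divided by the square root of the number of samples).

```r
set.seed(42)   # arbitrary seed for reproducibility

# Hypothetical stand-in for genes$nucleotides
population <- round(rexp(20000, rate = 1 / 2761))

# Draw n_samples samples of size n_size; summarise their means as on the slides
summarise_means <- function(n_samples, n_size = 25) {
  means <- replicate(n_samples, mean(sample(population, n_size)))
  c(N = n_samples,
    SE = sd(means) / sqrt(n_samples),
    mean_of_means = mean(means))
}

results <- t(sapply(c(20, 50, 100, 1000, 10000), summarise_means))
print(round(results, 2))
print(mean(population))   # population mean, for comparison with mean_of_means
```

With more samples, the SE column shrinks and the mean of the sample means settles near the population mean, which is the pattern the table above reports.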
![Page 89: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/89.jpg)
The Central Limit Theorem
As sample size increases, the sampling distribution of the mean will be approximately normal regardless of the true population distribution
[Figure: left panel, Population distribution (histogram of nucleotides, 0 to 15000, against count); right panel, N=1e4 sampling distribution (histogram of sample.mean, 2000 to 4000, against count)]
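The contrast between the two panels can be checked numerically by comparing how skewed each distribution is. This is a rough sketch under stated assumptions: the population is a simulated exponential stand-in for the gene lengths, and skewness is a small helper defined here, not a base R function.

```r
set.seed(42)   # arbitrary seed for reproducibility

# Hypothetical, strongly right-skewed population (stand-in for gene lengths)
population <- rexp(20000, rate = 1 / 2761)

# Simple moment-based skewness: roughly 0 for a symmetric, normal-like shape
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Sampling distribution of the mean: 10000 samples, each of size 25
sample_means <- replicate(10000, mean(sample(population, 25)))

skew_population <- skewness(population)
skew_means      <- skewness(sample_means)
print(c(population = skew_population, sample_means = skew_means))
```

The sample means are far less skewed than the population they were drawn from, in line with the theorem; a histogram of sample_means would show the bell shape directly.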
![Page 90: Probability Distributions and Introduction to …sjspielman.org/bio5312_fall2017/slides/day4_distributions.pdfProbability Distributions and Introduction to Statistical Inference BIO5312](https://reader036.fdocuments.in/reader036/viewer/2022062504/5ac850207f8b9a6b578bfd08/html5/thumbnails/90.jpg)
Next week..
Introduction to hypothesis testing and comparing means
More fun facts on estimation will come later in the semester, to be bundled with *likelihood*