STAT 231 MIDTERM 2
Transcript of STAT 231 MIDTERM 2
Introduction
• Niall MacGillivray
• 2B Actuarial Science
Agenda
• 6:05 – 6:25 Likelihood Functions and MLEs
• 6:25 – 6:35 Regression Model
• 6:35 – 6:50 Gaussian, Chi-Square, t RVs
• 6:50 – 7:00 Sampling and Estimators
• 7:00 – 7:30 Confidence Intervals
• 7:30 – 8:00 Hypothesis Testing
Probability Models
• Random Variables – represent what we’re going to measure in our experiment
• Realizations – represent the actual data we’ve collected from our experiment
Binomial Model
• Problem: what is π, the proportion of the target population that possesses a certain characteristic?
• We will use our data to estimate π
• Let X be a random variable that represents the number of people in your sample (of size n) that possess the characteristic
– X ~ Bin(n, π)
• A realization of X will give us the number of people in our sample that possess the characteristic
Response Model
• Problem: what is μ, the average variate of the target population?
• We will use our collected data to estimate μ
• Let Y be a random variable that represents the measured response variate
– Y = μ + R, where R ~ G(0, σ)
– Y ~ G(μ, σ)
• A realization of Y is the measured attribute of one unit in the sample
Maximum Likelihood Estimation
• Binomial: π̂ = x/n, where x = # of successes
• Response: μ̂ = (1/n) Σᵢ₌₁ⁿ yᵢ, where yᵢ is the ith realization
• Maximum Likelihood Estimation – a procedure used to determine a parameter estimate given any model
Maximum Likelihood Estimation
• First, we assume our data collected will follow a distribution
• Before we collect the sample: random variables – {Y1, Y2, …, Yn}
• After we collect the sample: realizations – {y1, y2, …, yn}
• We know the distribution of Yi (with unknown parameters), hence we know the PDF/PMF
Likelihood Function
• The Likelihood Function:
– Discrete (with sample): L(θ) = ∏ᵢ₌₁ⁿ P(Yᵢ = yᵢ; θ), θ ∈ Ω
– Discrete (no sample): L(θ) = P(Y = y; θ), θ ∈ Ω
– Continuous: L(θ) = ∏ᵢ₌₁ⁿ f(yᵢ; θ), θ ∈ Ω
• Likelihood: the probability of observing the dataset you have
– We want to choose an estimate of the parameter θ that gives the largest such probability
– Ω is the parameter space, the set of possible values for θ
MLE Process
• Step One: Define the likelihood function
• Step Two: Define the log likelihood function ℓ(θ) = ln[L(θ)]
• Step Three: Take the derivative with respect to θ
– If there are multiple parameters to estimate: take partial derivatives with respect to each parameter
• Step Four: Set the derivative equal to zero and solve to arrive at the maximum likelihood estimate
• Step Five: Plug in data values (if given) to arrive at a numerical maximum likelihood estimate (or values of other MLEs if multiple parameters)
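The five steps can be checked numerically for the binomial model. A minimal sketch, assuming hypothetical data (x = 30 successes in n = 100 trials) and using a grid search in place of Step Three’s calculus:

```python
import math

def binomial_log_likelihood(pi, n, x):
    """Step Two: l(pi) = ln[L(pi)] for x successes in n trials.
    The binomial coefficient is a constant in pi, so it is dropped."""
    return x * math.log(pi) + (n - x) * math.log(1 - pi)

# Steps Three and Four give the closed form pi_hat = x / n.
# Check it numerically by scanning a grid of candidate values.
n, x = 100, 30                       # hypothetical data
grid = [k / 1000 for k in range(1, 1000)]
pi_hat_numeric = max(grid, key=lambda p: binomial_log_likelihood(p, n, x))
print(pi_hat_numeric)                # lands on x / n = 0.3
```

The grid maximizer agrees with the calculus answer because the log likelihood is concave in π.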
Example 1
Derive the MLEs of the Gaussian distribution with parameters μ and σ, for a sample of data of size n.

f(yᵢ; μ, σ) = (1/(σ√(2π))) exp(−(yᵢ − μ)²/(2σ²))
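The derivation yields μ̂ = ȳ and σ̂² = (1/n) Σᵢ₌₁ⁿ (yᵢ − ȳ)² (note the 1/n, not 1/(n−1)). A quick numerical check with a made-up sample:

```python
import math

def gaussian_log_likelihood(mu, sigma, ys):
    """l(mu, sigma) = sum of ln f(y_i; mu, sigma) for the Gaussian pdf."""
    return sum(-math.log(sigma * math.sqrt(2 * math.pi))
               - (y - mu) ** 2 / (2 * sigma ** 2) for y in ys)

ys = [4.1, 5.3, 6.0, 4.8, 5.6]       # hypothetical sample
n = len(ys)
mu_hat = sum(ys) / n                  # sample mean
sigma_hat = math.sqrt(sum((y - mu_hat) ** 2 for y in ys) / n)

# The MLE pair should beat any nearby parameter values
best = gaussian_log_likelihood(mu_hat, sigma_hat, ys)
for d_mu in (-0.1, 0.0, 0.1):
    for d_sigma in (-0.1, 0.0, 0.1):
        assert best >= gaussian_log_likelihood(mu_hat + d_mu, sigma_hat + d_sigma, ys)
print(round(mu_hat, 2), round(sigma_hat, 3))
```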
Regression Model
• Let Y|{X=x} be a random variable that represents the measured response variate, given a certain value of the explanatory variate
• We can define a distribution: Y|{X=x} ~ G(μ(x), σ)
– Simple Linear Regression: Y|{X=x} = α + βx + R, where R ~ G(0, σ)
– Y|{X=x} ~ G(α + βx, σ)
• Response Model: μ is the average response variate of the target population
• Regression Model: α + βx is the average response variate of the subgroup in the target population, as specified by the value of the explanatory variate x
– Allows us to look at subgroups within the target population
Regression Model
• Problem: what is α, the average response variate of the subgroup in the target population where the explanatory variate, x, is equal to 0?
• Problem: what is β, the change in the average value of the response variate, given a one unit change in x, our explanatory variate?
• We will use our collected data to estimate α or β using the MLE method
Example 2
Derive the MLEs of the simple linear regression model with parameters α, β, and σ given a sample of size n.
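Solving Example 2 gives the closed forms β̂ = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)², α̂ = ȳ − β̂x̄, and σ̂² = (1/n) Σ residuals². A sketch with hypothetical data:

```python
# Closed-form MLEs for simple linear regression (hypothetical data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
alpha_hat = y_bar - beta_hat * x_bar

# The MLE of sigma^2 divides the squared residuals by n (not n - 2)
sigma_hat_sq = sum((y - alpha_hat - beta_hat * x) ** 2
                   for x, y in zip(xs, ys)) / n
print(round(beta_hat, 2), round(alpha_hat, 2))
```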
Gaussian Distribution
• f(x; μ, σ) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²))
• If Y ~ G(μ, σ), then Z = (Y − μ)/σ ~ G(0, 1)
• If Y1, …, Yn are independent G(μ1, σ1), …, G(μn, σn):
– Σᵢ₌₁ⁿ bᵢYᵢ ~ G(Σᵢ₌₁ⁿ bᵢμᵢ, √(Σᵢ₌₁ⁿ bᵢ²σᵢ²))
• If Y1, …, Yn are iid G(μ, σ):
– Σᵢ₌₁ⁿ Yᵢ ~ G(nμ, σ√n)
– Ȳ = (1/n) Σᵢ₌₁ⁿ Yᵢ ~ G(μ, σ/√n)
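The last fact (Ȳ ~ G(μ, σ/√n)) can be checked by simulation; the parameter values here are arbitrary:

```python
import random
import statistics

# If Y_i are iid G(mu, sigma), the sample mean Y_bar ~ G(mu, sigma / sqrt(n)).
random.seed(1)
mu, sigma, n = 10.0, 2.0, 25
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20000)]

# Empirical mean and sd of Y_bar vs. theory: mu = 10, sigma/sqrt(n) = 0.4
print(round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))
```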
Chi-Squared
• Y, a random variable following a χ²(n) distribution, is defined as
Y = X1² + X2² + … + Xn², Y > 0, where Xᵢ ~ G(0, 1)
– We say Y ~ χ²(n), where n = degrees of freedom
– E(Y) = n
– If Y1 ~ χ²(n1) and Y2 ~ χ²(n2) are independent, then Y1 + Y2 ~ χ²(n1 + n2)
– Can use tables to get quantiles based on degrees of freedom
Chi-Squared Table
Example 3
• Prove that, if X ~ χ²(m) and Y ~ χ²(p) are independent:
a) E(X) = m
b) X + Y ~ χ²(m + p)
• If m = 10, estimate P(X > 20)
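For P(X > 20) with m = 10: when the degrees of freedom are even, the χ² tail probability reduces to a finite Poisson sum, P(X > x) = e^(−x/2) Σₖ₌₀^(m/2−1) (x/2)ᵏ/k!. A simulation of sums of squared G(0, 1) draws cross-checks it:

```python
import math
import random

def chi2_tail_even_df(x, m):
    """P(X > x) for X ~ chi-squared(m), valid when m is even."""
    return math.exp(-x / 2) * sum((x / 2) ** k / math.factorial(k)
                                  for k in range(m // 2))

exact = chi2_tail_even_df(20, 10)     # about 0.0293

# Monte Carlo check: a chi-squared(10) RV is a sum of 10 squared G(0,1) RVs
random.seed(2)
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(10)) for _ in range(50000)]
mc = sum(d > 20 for d in draws) / len(draws)
print(round(exact, 4), round(mc, 4))
```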
t Distribution
• T, a random variable following a t distribution, is defined as
T = Z / √(S²/m)
where Z ~ G(0, 1), S² ~ χ²(m), and Z and S² are independent
• We say T ~ t(m), where m is the degrees of freedom
T Table
Estimators
• In the data stage, we assume the data follows a certain distribution
– Population Distribution
• From MLE (as in Example 1), we obtained an estimate θ̂
– This is a number
• The corresponding estimator is θ̃
– This is a random variable
– Replace all instances of realizations (xᵢ) with RVs (Xᵢ)
– If θ̂ = g(y1, y2, …, yn), then θ̃ = g(Y1, Y2, …, Yn)
• We can then look at the distribution of these parameter estimators, aka the sampling distribution of θ̃
• We can make probability statements about the accuracy of our parameter estimates
Response Model Estimators
• In the response model Y = μ + R ~ G(μ, σ):
– For a sample y1, y2, …, yn: μ̂ = (1/n) Σᵢ₌₁ⁿ yᵢ
• The corresponding estimator is μ̃ = (1/n) Σᵢ₌₁ⁿ Yᵢ
• The distribution of μ̃: μ̃ ~ G(μ, σ/√n)
• The sample error, μ̃ − μ, ~ G(0, σ/√n)
– For a sample y1, y2, …, yn: σ̂ = √((1/(n − 1)) Σᵢ₌₁ⁿ (yᵢ − μ̂)²)
• The corresponding estimator is σ̃ = √((1/(n − 1)) Σᵢ₌₁ⁿ (Yᵢ − μ̃)²)
– In this case, we call σ̃² the sample variance
Confidence Intervals
• In estimation problems, we use collected data to determine estimates for model parameters
• Confidence intervals are statements about the true model parameter being in between two values: a ≤ μ ≤ b
– We can make a statement about our ‘confidence’ that μ is located somewhere between a and b
– The confidence is measured by probability statements
• We will use sampling distributions to make probability statements as a starting point in determining the end values of the confidence interval
Confidence Intervals
• A confidence interval helps answer the question: “What is the probability that L(D) ≤ θ ≤ U(D)?”
– C(θ) = P[L(D) ≤ θ ≤ U(D)] is the coverage probability
– Confidence interval = [l(D), u(D)]
– Interpretation: the true value of the parameter θ will fall in the confidence interval [l(D), u(D)] in proportion C(θ) of all cases
Confidence Intervals for the Response Model
• Sampling distribution for the response model: μ̃ ~ G(μ, σ/√n)
• But we want a distribution we can work with (we want to use our probability tables), so standardizing gives
(μ̃ − μ) / (σ/√n) ~ G(0, 1)
• For now, we will assume the true value of σ (population standard deviation) is known
Confidence Intervals for the Response Model
• Our goal: find (a, b) such that P(a ≤ μ ≤ b) = 0.95
• Method: construct a 95% interval estimator (coverage interval) such that
P(−c ≤ (μ̃ − μ)/(σ/√n) ≤ c) = 0.95
or equivalently
P(μ̃ − cσ/√n ≤ μ ≤ μ̃ + cσ/√n) = 0.95
• What are a and b? Use μ̃ = μ̂ to get a confidence interval
Example 4
• Coverage Interval: (μ̃ − cσ/√n, μ̃ + cσ/√n)
• Confidence Interval: (μ̂ − cσ/√n, μ̂ + cσ/√n)
• σ/√n is called the Standard Error
Confidence Intervals for the Response Model
• Often, we don’t know the value of σ
• So we need to use the sample standard deviation as an estimator for σ:
(μ̃ − μ)/(σ/√n) becomes (μ̃ − μ)/(σ̃/√n)
where σ̃ = √((1/(n − 1)) Σᵢ₌₁ⁿ (Yᵢ − μ̃)²)
Confidence Intervals for the Response Model
• (μ̃ − μ)/(σ̃/√n) no longer follows a G(0,1) distribution!
– (μ̃ − μ)/(σ̃/√n) ~ t(n−1)
– P(μ̃ − cσ̃/√n ≤ μ ≤ μ̃ + cσ̃/√n) = 0.95
– New 95% CI for μ: (μ̂ − cσ̂/√n, μ̂ + cσ̂/√n), with c from the t(n−1) table
Example 5
Confidence Interval: (μ̂ − cσ̂/√n, μ̂ + cσ̂/√n), with c from the t table with (n − 1) degrees of freedom
T Table
Confidence Intervals for the Binomial Model
• Population Distribution: Y ~ Bin(n, π)
• The parameter we want to estimate is π
• Using MLE, we get an estimate π̂ = y/n
– This is a number
• The corresponding estimator is π̃ = Y/n
– This is a random variable
– What is the sampling distribution?
Confidence Intervals for the Binomial Model
• To derive the sampling distribution of π̃ = Y/n, consider the expectation and variance of Y:
– E(Y) = nπ
– Var(Y) = nπ(1 − π)
• The CLT tells us that, for large n, Y is well approximated as a Gaussian:
Y ~ Bin(n, π) ≈ G(nπ, √(nπ(1 − π)))
• Then π̃ = Y/n will also be approximately Gaussian:
π̃ ~ G(π, √(π(1 − π)/n))
Confidence Intervals for the Binomial Model
Standardizing gives
(π̃ − π) / √(π(1 − π)/n) ~ G(0, 1) (approximately)
We will use √(π̃(1 − π̃)/n) as an approximation instead of √(π(1 − π)/n)
Confidence Interval: (π̂ − c√(π̂(1 − π̂)/n), π̂ + c√(π̂(1 − π̂)/n))
Example 6
Confidence Interval: (π̂ − c√(π̂(1 − π̂)/n), π̂ + c√(π̂(1 − π̂)/n))
Confidence Intervals for the Regression Model
• Population Distribution: Y|{X=x} ~ G(α + βx, σ)
• Using MLE, we obtain:
β̃ = Σᵢ₌₁ⁿ (xᵢ − x̄)Yᵢ / Σᵢ₌₁ⁿ (xᵢ − x̄)²
• Your course notes simplify this by defining cᵢ = (xᵢ − x̄) / Σᵢ₌₁ⁿ (xᵢ − x̄)², so that
β̃ = Σᵢ₌₁ⁿ cᵢYᵢ
Confidence Intervals for the Regression Model
• What is the sampling distribution of β̃ = Σᵢ₌₁ⁿ cᵢYᵢ?
– β̃ is a linear combination of independent Gaussians, and thus is Gaussian itself
– β̃ ~ G(β, σ√(Σᵢ₌₁ⁿ cᵢ²))
– Standardizing gives (β̃ − β) / (σ√(Σᵢ₌₁ⁿ cᵢ²)) ~ G(0, 1)
– If σ is unknown, then (β̃ − β) / (σ̃√(Σᵢ₌₁ⁿ cᵢ²)) ~ t(n−2)
Confidence Intervals for the Regression Model
• Confidence Interval: (β̂ − cσ̂√(Σᵢ₌₁ⁿ cᵢ²), β̂ + cσ̂√(Σᵢ₌₁ⁿ cᵢ²))
– Assuming σ is unknown, we will get c from the t table with (n − 2) degrees of freedom
Terminology
The random variables that we’ve used to construct confidence intervals are called pivotal quantities
– Their distribution does not depend on the choice of parameters:
(μ̃ − μ)/(σ/√n) ~ G(0, 1)
(μ̃ − μ)/(σ̃/√n) ~ t(n−1)
(π̃ − π)/√(π̃(1 − π̃)/n) ~ G(0, 1) (approximately)
(β̃ − β)/(σ̃√(Σᵢ₌₁ⁿ cᵢ²)) ~ t(n−2)
Confidence intervals are often written in the form
– Point Estimate ± c × Standard Error (SE)
– Point Estimate: the MLE for the parameter
– c: found using probability tables depending on the distribution of the pivotal quantity
Terminology
Standard Error (SE): square root of the variance of our sampling distribution (replace all unknown parameters (i.e. σ) with estimates)
• Response (σ known): σ/√n
• Response (σ unknown): σ̂/√n
• Binomial: √(π̂(1 − π̂)/n)
• Regression: σ̂√(Σᵢ₌₁ⁿ cᵢ²)
Confidence Interval Recap
Response Model (σ known): (μ̂ − cσ/√n, μ̂ + cσ/√n)
Response Model (σ unknown): (μ̂ − cσ̂/√n, μ̂ + cσ̂/√n), c from t(n−1)
Binomial Model: (π̂ − c√(π̂(1 − π̂)/n), π̂ + c√(π̂(1 − π̂)/n))
Regression Model: (β̂ − cσ̂√(Σᵢ₌₁ⁿ cᵢ²), β̂ + cσ̂√(Σᵢ₌₁ⁿ cᵢ²)), c from t(n−2)
Interpretation of the Confidence Interval
• Does NOT mean there’s a 95% chance our true parameter will be between a and b
• 95% confidence interval: after repeatedly collecting data and calculating lots of confidence intervals, around 95% of them will contain the actual parameter
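The repeated-sampling interpretation can be demonstrated directly; the true μ, σ, and sample size here are arbitrary:

```python
import math
import random
from statistics import NormalDist

random.seed(3)
mu, sigma, n = 5.0, 1.0, 20
c = NormalDist().inv_cdf(0.975)
trials = 5000
covered = 0
for _ in range(trials):
    ys = [random.gauss(mu, sigma) for _ in range(n)]
    mu_hat = sum(ys) / n
    se = sigma / math.sqrt(n)
    # does this interval contain the true mu?
    if mu_hat - c * se <= mu <= mu_hat + c * se:
        covered += 1
print(covered / trials)    # close to 0.95
```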
Hypothesis Testing
1) Define the null hypothesis, define the alternate hypothesis
2) Define the test statistic, identify its distribution, calculate the observed value
3) Calculate the p-value
4) Make a conclusion about your hypothesis
Hypothesis Testing
1) Define the null hypothesis, define the alternate hypothesis:

H0: θ = θ0 vs Ha: θ ≠ θ0
H0: θ = θ0 vs Ha: θ > θ0
H0: θ = θ0 vs Ha: θ < θ0

The null hypothesis always contains an “=” sign!
Hypothesis Testing
2) Define the test statistic, identify the distribution; calculate the observed value

Assume that H0 will be tested using some random data

Test Statistic: random variable, denoted D
Distribution: of the test statistic, the standardized sampling distribution of the model based on H0
Observed Value: a realization of the test statistic from our data
Hypothesis Testing
Test Statistics:

Response (σ known): D = (μ̃ − μ0)/(σ/√n) ~ G(0, 1)
Response (σ unknown): D = (μ̃ − μ0)/(σ̃/√n) ~ t(n−1)
Binomial: D = (π̃ − π0)/√(π0(1 − π0)/n) ~ G(0, 1) (approximately)

These distributions only hold because of the null hypothesis: θ = θ0
Hypothesis Testing
Calculate the observed value:

Response (σ known): d_obs = (μ̂ − μ0)/(σ/√n)
Response (σ unknown): d_obs = (μ̂ − μ0)/(σ̂/√n)
Binomial: d_obs = (π̂ − π0)/√(π0(1 − π0)/n)
Hypothesis Testing
3) Calculate the p-value

p-value = P(|D| ≥ |d_obs|) for Ha: θ ≠ θ0
p-value = P(D ≥ d_obs) for Ha: θ > θ0
p-value = P(D ≤ d_obs) for Ha: θ < θ0
– p-value (aka observed significance level) is the tail probability of observing a dataset more extreme than our sample data, given H0 is true
Hypothesis Testing
4) Make a conclusion about your hypothesis

General Rule of Thumb
• If the p-value > 0.05, do not reject the null hypothesis
• If the p-value < 0.05, reject the null hypothesis
Example 7
What if we want to test if the average weight is less than 18 ounces?
T Table
Example 8
Yao Ming is assumed to shoot free throws at an 80% success rate. In a sample of 50 free throws, Yao Ming makes 45. Test the hypothesis that Yao is an 80% free throw shooter.
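A sketch of the test using the binomial test statistic from the earlier slide, assuming a two-sided alternative Ha: π ≠ 0.8:

```python
import math
from statistics import NormalDist

pi0, n, y = 0.8, 50, 45
pi_hat = y / n                                        # 0.9
d_obs = (pi_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(d_obs)))      # two-sided p-value
print(round(d_obs, 3), round(p_value, 3))
```

Since the p-value comes out around 0.077 > 0.05, we do not reject H0 at the 5% level.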
T Table
Example 9
Professor Banerjee models the relationship between Stat 230 marks (X) and Stat 231 marks (Y) using a simple linear regression model and a sample of size 102. He obtains the following results:
MLE for Alpha: 99%      Standard Error for Alpha: 10%
MLE for Beta: -0.4      Standard Error for Beta: 0.20
MLE for Sigma: 0.21     Standard Error for Sigma: 0.04
Test the hypothesis that Beta = 0.
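A sketch of the test of H0: β = 0. With n − 2 = 100 degrees of freedom, the t distribution is very close to G(0, 1), so the normal cdf is used here as an approximation:

```python
from statistics import NormalDist

beta_hat, se_beta = -0.4, 0.20          # from the regression output
d_obs = (beta_hat - 0) / se_beta        # -2.0
p_value = 2 * (1 - NormalDist().cdf(abs(d_obs)))
print(round(p_value, 3))                # about 0.046
```

With p ≈ 0.046 < 0.05, we reject H0: β = 0 at the 5% level (the exact t(100) p-value is slightly larger, about 0.048, but leads to the same conclusion).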
Questions
Questions???