Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions,...
Transcript of Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions,...
Lecture 3
Sampling, sampling
distributions, and
parameter estimation
Sampling
Definition
• Population is defined as the collection
of all the possible observations of
interest.
• The collection of observations we take
from the population is called a sample.
• The number of observations in the
sample is called the sample size.
Sampling
• When we are interested in a
population, we typically study a
sample of that population rather than
attempt to study the entire population.
• The sample should ideally be a
representation of the population with
similar characteristics.
Principles of sampling
• 1. Same distribution. All variables
in the sample X1, …, Xn have the
same distribution as in the entire
population.
• 2. Independence. X1, …, Xn are
independent. In other words, each
observation has no relationship
with others.
Simple random sampling
• Simple random sampling is the most
straightforward of the random sampling
strategies. We use this strategy when we
believe that the population is relatively
homogeneous for the characteristic of
interest. i.e. no population structure
Simple random sampling • For example, let's say you were surveying
first-time parents about their attitudes toward mandatory seat belt laws. You might expect that their status as new parents might lead to similar concerns about safety.
• On campus, those who share a major might also have similar interests and values; we might expect psychology majors to share concerns about access to mental health services on campus.
Simple Random Sampling
Other sampling methods
• Systematic sampling
• Stratified sampling
• Proportionate sampling
• Cluster sampling
• Multistage sampling
• And so on
Sample statistic and
distribution
Sample mean and sample variance
• Let X1, …, Xn be a random sample
• Sample mean
• Sample variance
• Sample standard error (deviation)
n
i
iXn
X1
1
n
i
i XXn
S1
22
1
1
n
i
i XXn
S1
2
1
1
2
1
22 )(1
1XnX
nS
n
i
i
For frequency or grouping data
• Let X1, …, Xk be group values, f1, …, fk
be the frequency of each group,
f1+ …+fk=1
• Sample mean
• Sample variance
k
i
ii XfX1
2
1
2
1
22 XXfXXfSk
i
ii
k
i
ii
Properties of sample
mean and variance • Let X1, . . . ,Xn be a random sample from a normal
distribution , and let
• Then we have
• (1) and are independent random variables.
• (2) has a normal distribution, i.e.
• (3) has a chi-squared distribution with
n − 1 degrees of freedom.
),( 2N
n
i
iXn
X1
1
n
i
i XXn
S1
22
1
1
X 2S
X ),( 2 nN 22)1( Sn
2
1
2
1
22 )()1( XnXXXSSSnn
i
i
n
i
i
distribution
• If Xi~N(0,1), i=1,…,n,
and Xi are
independent
Define
The obeys
distribution with n
degrees of freedom,
denoted by
2
n
i
iX1
22
2
)(~ 22 n
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
0 2 4 6 8
p(x)
x
n=1
n=2
n=3
n=4
n=6
n=9
)(2 n
2
Student’s t distribution • If X~N(0,1), Y~ , X
and Y are independent
Define
Then t obeys t
distribution with n
degrees of freedom,
denoted by
t~t(n)
)(2 n
nY
Xt
t(n)
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
-6 -4 -2 0 2 4 6
p(x)
x
n=1
n=2
n=3
n=4
n=6
n=9
F distribution
• X and Y are independent,
Define
Then F obeys F distribution
with degrees of freedom n1
and n2, denoted by
F~F(n1, n2)
)(~ 1
2 nX )(~ 2
2 nY
2
1
nY
nXF
F(d1,d2)
0
0.1
0.2
0.3
0.4
0.5
0.6
0 1 2 3 4 5 6
p(x)
x
d1=1,d2=1
d1=2,d2=1
d1=5,d2=2
d1=100,d2=1
Relationship between different distributions
Binomial
distribution
Two-point
distribution
n=1
nxxx 21
p is large;
np is moderate
(Unit time)
Poisson
distribution p(λ)
(Time interval Δt)
Poisson
distribution p(λΔt)
Time for the first jump
Exponential
distribution
exp(λ)
Time for the αth jump
n 21
Erlang dis-
tribution Γ(α, λ)
α>0
λ= β
Gamma
distribution Γ(α, β)
n is large
Standardized
Normal
distribution N(0,1)
F(x) n is
large
Uniform
distribution
U(0,1)
Х2(n)
222
21 nXXX
nZ
XT 1
)1,0(~1 NX )(~ 2 nZ
t distribution
t(n)
α=n/2, β =2
X1 and Z are independent
21
1
21
2
1
B
tindependen are and
)1,(~
)1,(~
xx
x
xx
x
x
Beta distribution
Beta(α, β)
F distribution
F(m, n)
n
x
m
xF
tindependen are and
)(~
)(~
21
21
22
21
xx
nx
mx
n is large
α=1, β=1
Statistical inference
• Statistical inference: Drawing
conclusions about the whole
population on the basis of a sample
• Precondition for statistical inference:
A sample is randomly selected from
the population (=probability sample)
Parameter estimation
Parameter estimation • Parameter estimation is an important
problem in statistics. It can divided into two
types:
– 1. Point estimation: it involves the use of
sample data to calculate a single value (known
as a statistic) which is to serve as a "best
guess" or "best estimate" of an unknown (fixed
or random) population parameter.
– 2. Interval estimation: it is the use of sample
data to calculate an interval of possible (or
probable) values of an unknown population
parameter.
Point estimation
• X~F(x, θ), θ is unknown. The target of
point estimation is to give a statistic and
there is a group of observations X1, X2, …,
Xn. The estimator of θ denotes as
• For example, when θ=E(X), we can use
mean of samples as the estimator of θ, i.e.
),,,(ˆˆ21 nXXX
n
i
iXn 1
1̂
Two commonly used
point estimation methods
• Maximum likelihood method
• Moment method
Maximum likelihood
estimate (MLE)
• This method is to maximize the
likelihood function for getting the
estimator of parameters.
• The probability density function of X
is p(x; θ), and θ is unknown.
Suppose there is a sample
observations X1, X2, …, Xn for X.
Maximum likelihood method
• Then the combined probability function is
• We call the above function the likelihood
function. Define the logarithm of likelihood
as
• Let , then we can calculate the
maximum likelihood estimator (MLE) of θ.
);(x);(x);(x);(x;,,,x)(1
22 in11 )( ppppxxLLn
in
n
i
in xpxxLL1
21 ;ln;,,,xlnln )()()(
0ln
d
Ld )(
Maximum likelihood method • When the likelihood function contains k
parameters , then
• The maximum likelihood estimator of
: , i=1,…,k
are the solution of k equations
),,,;p(x,,, 211
21 k
n
ikL i)(
kiL
i
k ,,2 ,1,0,,ln ,21
)(
k ,,, 21
k ,,, 21 nii xxx• ,,,ˆˆ21
Example
• Assume X1 , X2 , … , Xn are random
samples from a normal distribution ,
how to get the maximum likelihood
estimator of parameters .
• Solution: The likelihood function is
),( 2N
2 and
n
i
i
n
n
i
i
x
xL
1
2
2
2
2
12
22
2
1exp
2
1
]2
exp[2
1,
)()(
Example
• Then
• Then the derivative equations are
n
i
ixnn
L1
2
2
22
2
1ln
22ln
2ln )()(),(
n
i
n
i
xn
L
xL
1
2
42
2
2
12
2
02
1
2ln
01
ln
)(),(
)(),(
Example
• So the solutions are
• The maximum likelihood estimator of is
n
i
i
n
i
i
xn
xxn
1
22
1
1
1
)(
n
i
i xxn 1
1̂
Example • When is known, MLE of is
• If is unknown, MLE of is
• MLE of is not equal to the sample
variance! Which one is better?
2
n
i
i xxn 1
22 1ˆ )(
n
i
ixn 1
22 1ˆ )(
2
2
Moment method
• Basic idea: equating sample moments with
unobservable population moments and
then solving those equations for the
quantities to be estimated.
• Suppose the probability density function of
X is p(x; ), then the r th moment
of X is
dxxpxXE k
rr
r ),, ,;()( 2-
1
k ,,, 21
Moment method
• Suppose there is a sample observations X1, X2, …, Xn for X. Then the r th moment of samples are
• Equate the j th (j=1, …, k) sample moments with unobservable population moments
n
i
r
ir Xn
a1
1
kkk
k
k
a
a
a
),,,(
),,,(
),,,(
21
2212
1211
Moment method
• Solve the equations, then we can get the
estimator of θ:
• We call them the moment estimators of θ.
k ˆ,,ˆ,ˆ 21
Example • X1, X2, …, Xn are samples from Uniform
distribution
• Then
• So
• The moment estimator of θ is
otherwise,0
0,1
)(x,
x
p
2
1),(
0-1
xdxdxxxp
n
i
ixn
a1
1
1
xxn
n
i
i 1
1ˆ
2
x2ˆ
Desirable properties of estimator
• Since estimator gives an estimate that
depends on sample points (X1, X2,…, Xn),
estimate is a function of sample points.
Sample points are random variable,
therefore estimate is random variable and
has probability distribution.
• We want that estimator to have several
desirable properties like
• 1. Unbiasedness
• 2. Effectiveness
• 3. Minimum mean square error
Unbiasedness • An estimator is said to be unbiased if the
expected value of the estimator is equal to
true value of the parameter being
estimated, or
• Example: sample proportion is the
unbiased estimator of population proportion
)̂(E
Effectiveness
• The most efficient estimator among a
group of unbiased estimators is the one
with the smallest variance.
• Generally speaking, assuming
and are two unbiased
estimators of θ, and , then is
said to be more effective than .
)X,,X,(Xˆ211 n
)X,,X,(Xˆ212 n
)ˆ()ˆ( 21 VV 1̂
2̂
Minimum mean square error (MSE)
• Basic idea: minimize the average
deviation between the estimation and
true value.
• We call the estimator which minimize
as the minimum mean square error
estimator of θ.
}]-)X,,X,(Xˆ{[ 2
21 nE
Interval estimation • Estimation of the parameter is not
sufficient. It is necessary to analyze and see how confident we can be about this particular estimation.
• One way of doing it is defining confidence intervals. If we have estimated θ we want to know if the “true” parameter is close to our estimate. In other words we want to find an interval that satisfies following relation:
1)( UL GGP
Interval estimation • i.e. probability that “true” parameter
θ is in the interval (GL, GU) is greater
than 1-.
• Actual realization of this interval - (gL,
gU) is called a 100(1- )% of
confidence interval, limits of the
interval are called lower and upper
confidence limits. 1- is called
confidence level.
Example • If population variance is known (2)
and we estimate population mean then
• We can find from the table that
probability of Z is more than 1 is equal
to 0.1587. Probability of Z is less than
-1 is again 0.1587. These values
comes from the table of the standard
normal distribution.
)1 ,0( ~ /
Nn
xZ
Example • Now we can find confidence interval for the
sample mean. Since:
• Then for we can write
• Confidence level that “true” value is within 1 standard error (standard deviation of sampling distribution) from the sample mean is 0.6826. Probability that “true” value is within 2 standard error from the sample mean is 0.9545.
6826.01587.0*21
)1()1(1)1()1()11(
ZPZPZPZPZP
6826.0)//()1/
1(
nxnxPn
xP
Interval estimation • Above we considered the case when
population variance is known in advance. It is rarely the case in real life. When both population mean and variance are unknown we can still find confidence intervals. In this case we calculate population mean and variance and then consider distribution of the statistic:
• Here S2 is the sample variance.
nS
xZ
/
Interval estimation • Since it is the ratio of the standard normal
random variable to square root of 2 random
variable with n-1 degrees of freedom, Z has
Student’s t distribution with n-1 degrees of
freedom. In this case we can use table of t
distribution to find confidence levels.
• It is not surprising that when we do not know
sample variance confidence intervals for the
same confidence levels becomes larger. That is price we pay for what we do not know.
Interval estimation
• If number of degrees of freedom
becomes large, then t distribution is
approximated well with normal
distribution. For n>100 we can use
normal distribution to find confidence
levels, intervals.
The Law of Large Numbers
and Central Limit Theorem
The Law of Large Numbers
• Assume X1 , X2 , … , Xn are random
samples of X. and V(X) exist. Let , then for any given ,
XE
n
i
iXn
X1
10ε
1 lim
XPn
The Central Limit Theorem
Let be the mean of a random
sample X1, X2, …, Xn, of size n from a
distribution with a finite mean and a
finite positive variance 2. Then
X
)1,0( /
Nn
XY
Small probability event
• A small probability event is an event
that has a low probability of occurring.
• The small probability event will hardly
happen in one experiment. This
principle is used for hypothesis and
tests.
• An event is a small probability event,
so it will hardly happen in theory. But if
it happens actually, then we reject H0.
Experiments on the distribution of
sample mean and sample variance
• Use RAND() in EXCEL to generate pseudo-
random numbers X1 and X2 of U(0,1):
uniform distribution on the interval [0, 1]
• Use transformation to generate random
numbers Y1 and Y2 of N(0, 1)
• Use transformation to generate random
numbers Z1 and Z2 of N(μ, σ2)
)2cos()ln(2 ),2sin()ln(2 212211 XXYXXY
YZ
Let’s do some exercises together
• Draw 100 random samples from U(0, 1)
• Draw the frequency distribution of the 100
samples
• Draw 100 sets of 5 samples from N(10, 10)
• Draw the frequency distribution of the 500
samples, and compare it with N(10, 10)
• Draw the frequency distribution of the 100
sample means and 100 sample variances
• Compare the distribution of with N(10, 2)
• Compare the distribution of with
X
104 2S )4(2