Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions,...

Lecture 3

Sampling, sampling

distributions, and

parameter estimation

Sampling

Definition

• Population is defined as the collection

of all the possible observations of

interest.

• The collection of observations we take

from the population is called a sample.

• The number of observations in the

sample is called the sample size.

Sampling

• When we are interested in a

population, we typically study a

sample of that population rather than

attempt to study the entire population.

• The sample should ideally be a

representation of the population with

similar characteristics.

Principles of sampling

• 1. Same distribution. All variables

in the sample X1, …, Xn have the

same distribution as in the entire

population.

• 2. Independence. X1, …, Xn are

independent. In other words, each

observation has no relationship

with others.

Simple random sampling

• Simple random sampling is the most

straightforward of the random sampling

strategies. We use this strategy when we

believe that the population is relatively

homogeneous for the characteristic of

interest. i.e. no population structure

Simple random sampling • For example, let's say you were surveying

first-time parents about their attitudes toward mandatory seat belt laws. You might expect that their status as new parents might lead to similar concerns about safety.

• On campus, those who share a major might also have similar interests and values; we might expect psychology majors to share concerns about access to mental health services on campus.

Simple Random Sampling

Other sampling methods

• Systematic sampling

• Stratified sampling

• Proportionate sampling

• Cluster sampling

• Multistage sampling

• And so on

Sample statistic and

distribution

Sample mean and sample variance

• Let X1, …, Xn be a random sample

• Sample mean

• Sample variance

• Sample standard error (deviation)

n

i

iXn

X1

1

n

i

i XXn

S1

22

1

1

n

i

i XXn

S1

2

1

1

2

1

22 )(1

1XnX

nS

n

i

i

For frequency or grouping data

• Let X1, …, Xk be group values, f1, …, fk

be the frequency of each group,

f1+ …+fk=1

• Sample mean

• Sample variance

k

i

ii XfX1

2

1

2

1

22 XXfXXfSk

i

ii

k

i

ii

Properties of sample

mean and variance • Let X1, . . . ,Xn be a random sample from a normal

distribution , and let

• Then we have

• (1) and are independent random variables.

• (2) has a normal distribution, i.e.

• (3) has a chi-squared distribution with

n − 1 degrees of freedom.

),( 2N

n

i

iXn

X1

1

n

i

i XXn

S1

22

1

1

X 2S

X ),( 2 nN 22)1( Sn

2

1

2

1

22 )()1( XnXXXSSSnn

i

i

n

i

i

distribution

• If Xi~N(0,1), i=1,…,n,

and Xi are

independent

Define

The obeys

distribution with n

degrees of freedom,

denoted by

2

n

i

iX1

22

2

)(~ 22 n

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0 2 4 6 8

p(x)

x

n=1

n=2

n=3

n=4

n=6

n=9

)(2 n

2

Student’s t distribution • If X~N(0,1), Y~ , X

and Y are independent

Define

Then t obeys t

distribution with n

degrees of freedom,

denoted by

t~t(n)

)(2 n

nY

Xt

t(n)

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

-6 -4 -2 0 2 4 6

p(x)

x

n=1

n=2

n=3

n=4

n=6

n=9

F distribution

• X and Y are independent,

Define

Then F obeys F distribution

with degrees of freedom n1

and n2, denoted by

F~F(n1, n2)

)(~ 1

2 nX )(~ 2

2 nY

2

1

nY

nXF

F(d1,d2)

0

0.1

0.2

0.3

0.4

0.5

0.6

0 1 2 3 4 5 6

p(x)

x

d1=1,d2=1

d1=2,d2=1

d1=5,d2=2

d1=100,d2=1

Relationship between different distributions

Binomial

distribution

Two-point

distribution

n=1

nxxx 21

p is large;

np is moderate

(Unit time)

Poisson

distribution p(λ)

(Time interval Δt)

Poisson

distribution p(λΔt)

Time for the first jump

Exponential

distribution

exp(λ)

Time for the αth jump

n 21

Erlang dis-

tribution Γ(α, λ)

α>0

λ= β

Gamma

distribution Γ(α, β)

n is large

Standardized

Normal

distribution N(0,1)

F(x) n is

large

Uniform

distribution

U(0,1)

Х2(n)

222

21 nXXX

nZ

XT 1

)1,0(~1 NX )(~ 2 nZ

t distribution

t(n)

α=n/2, β =2

X1 and Z are independent

21

1

21

2

1

B

tindependen are and

)1,(~

)1,(~

xx

x

xx

x

x

Beta distribution

Beta(α, β)

F distribution

F(m, n)

n

x

m

xF

tindependen are and

)(~

)(~

21

21

22

21

xx

nx

mx

n is large

α=1, β=1

Statistical inference

• Statistical inference: Drawing

conclusions about the whole

population on the basis of a sample

• Precondition for statistical inference:

A sample is randomly selected from

the population (=probability sample)

Parameter estimation

Parameter estimation • Parameter estimation is an important

problem in statistics. It can divided into two

types:

– 1. Point estimation: it involves the use of

sample data to calculate a single value (known

as a statistic) which is to serve as a "best

guess" or "best estimate" of an unknown (fixed

or random) population parameter.

– 2. Interval estimation: it is the use of sample

data to calculate an interval of possible (or

probable) values of an unknown population

parameter.

Point estimation

• X~F(x, θ), θ is unknown. The target of

point estimation is to give a statistic and

there is a group of observations X1, X2, …,

Xn. The estimator of θ denotes as

• For example, when θ=E(X), we can use

mean of samples as the estimator of θ, i.e.

),,,(ˆˆ21 nXXX

n

i

iXn 1

1̂

Two commonly used

point estimation methods

• Maximum likelihood method

• Moment method

Maximum likelihood

estimate (MLE)

• This method is to maximize the

likelihood function for getting the

estimator of parameters.

• The probability density function of X

is p(x; θ), and θ is unknown.

Suppose there is a sample

observations X1, X2, …, Xn for X.

Maximum likelihood method

• Then the combined probability function is

• We call the above function the likelihood

function. Define the logarithm of likelihood

as

• Let , then we can calculate the

maximum likelihood estimator (MLE) of θ.

);(x);(x);(x);(x;,,,x)(1

22 in11 )( ppppxxLLn

in

n

i

in xpxxLL1

21 ;ln;,,,xlnln )()()(

0ln

d

Ld )(

Maximum likelihood method • When the likelihood function contains k

parameters , then

• The maximum likelihood estimator of

: , i=1,…,k

are the solution of k equations

),,,;p(x,,, 211

21 k

n

ikL i)(

kiL

i

k ,,2 ,1,0,,ln ,21

)(

k ,,, 21

k ,,, 21 nii xxx• ,,,ˆˆ21

Example

• Assume X1 , X2 , … , Xn are random

samples from a normal distribution ,

how to get the maximum likelihood

estimator of parameters .

• Solution: The likelihood function is

)，( 2N

2 and

n

i

i

n

n

i

i

x

xL

1

2

2

2

2

12

22

2

1exp

2

1

]2

exp[2

1,

)()(

Example

• Then

• Then the derivative equations are

n

i

ixnn

L1

2

2

22

2

1ln

22ln

2ln )()()，(

n

i

n

i

xn

L

xL

1

2

42

2

2

12

2

02

1

2ln

01

ln

)()，(

)()，(

Example

• So the solutions are

• The maximum likelihood estimator of is

n

i

i

n

i

i

xn

xxn

1

22

1

1

1

)(

n

i

i xxn 1

1̂

Example • When is known, MLE of is

• If is unknown, MLE of is

• MLE of is not equal to the sample

variance! Which one is better?

2

n

i

i xxn 1

22 1ˆ )(

n

i

ixn 1

22 1ˆ )(

2

2

Moment method

• Basic idea: equating sample moments with

unobservable population moments and

then solving those equations for the

quantities to be estimated.

• Suppose the probability density function of

X is p(x; ), then the r th moment

of X is

dxxpxXE k

rr

r ),, ,;()( 2-

1

k ,,, 21

Moment method

• Suppose there is a sample observations X1, X2, …, Xn for X. Then the r th moment of samples are

• Equate the j th (j=1, …, k) sample moments with unobservable population moments

n

i

r

ir Xn

a1

1

kkk

k

k

a

a

a

),,,(

),,,(

),,,(

21

2212

1211

Moment method

• Solve the equations, then we can get the

estimator of θ:

• We call them the moment estimators of θ.

k ˆ,,ˆ,ˆ 21

Example • X1, X2, …, Xn are samples from Uniform

distribution

• Then

• So

• The moment estimator of θ is

otherwise,0

0,1

)(x,

x

p

2

1),(

0-1

xdxdxxxp

n

i

ixn

a1

1

1

xxn

n

i

i 1

1ˆ

2

x2ˆ

Desirable properties of estimator

• Since estimator gives an estimate that

depends on sample points (X1, X2,…, Xn),

estimate is a function of sample points.

Sample points are random variable,

therefore estimate is random variable and

has probability distribution.

• We want that estimator to have several

desirable properties like

• 1. Unbiasedness

• 2. Effectiveness

• 3. Minimum mean square error

Unbiasedness • An estimator is said to be unbiased if the

expected value of the estimator is equal to

true value of the parameter being

estimated, or

• Example: sample proportion is the

unbiased estimator of population proportion

)̂(E

Effectiveness

• The most efficient estimator among a

group of unbiased estimators is the one

with the smallest variance.

• Generally speaking, assuming

and are two unbiased

estimators of θ, and , then is

said to be more effective than .

)X,,X,(Xˆ211 n

)X,,X,(Xˆ212 n

)ˆ()ˆ( 21 VV 1̂

2̂

Minimum mean square error (MSE)

• Basic idea: minimize the average

deviation between the estimation and

true value.

• We call the estimator which minimize

as the minimum mean square error

estimator of θ.

}]-)X,,X,(Xˆ{[ 2

21 nE

Interval estimation • Estimation of the parameter is not

sufficient. It is necessary to analyze and see how confident we can be about this particular estimation.

• One way of doing it is defining confidence intervals. If we have estimated θ we want to know if the “true” parameter is close to our estimate. In other words we want to find an interval that satisfies following relation:

1)( UL GGP

Interval estimation • i.e. probability that “true” parameter

θ is in the interval (GL, GU) is greater

than 1-.

• Actual realization of this interval - (gL,

gU) is called a 100(1- )% of

confidence interval, limits of the

interval are called lower and upper

confidence limits. 1- is called

confidence level.

Example • If population variance is known (2)

and we estimate population mean then

• We can find from the table that

probability of Z is more than 1 is equal

to 0.1587. Probability of Z is less than

-1 is again 0.1587. These values

comes from the table of the standard

normal distribution.

)1 ,0( ~ /

Nn

xZ

Example • Now we can find confidence interval for the

sample mean. Since:

• Then for we can write

• Confidence level that “true” value is within 1 standard error (standard deviation of sampling distribution) from the sample mean is 0.6826. Probability that “true” value is within 2 standard error from the sample mean is 0.9545.

6826.01587.0*21

)1()1(1)1()1()11(

ZPZPZPZPZP

6826.0)//()1/

1(

nxnxPn

xP

Interval estimation • Above we considered the case when

population variance is known in advance. It is rarely the case in real life. When both population mean and variance are unknown we can still find confidence intervals. In this case we calculate population mean and variance and then consider distribution of the statistic:

• Here S2 is the sample variance.

nS

xZ

/

Interval estimation • Since it is the ratio of the standard normal

random variable to square root of 2 random

variable with n-1 degrees of freedom, Z has

Student’s t distribution with n-1 degrees of

freedom. In this case we can use table of t

distribution to find confidence levels.

• It is not surprising that when we do not know

sample variance confidence intervals for the

same confidence levels becomes larger. That is price we pay for what we do not know.

Interval estimation

• If number of degrees of freedom

becomes large, then t distribution is

approximated well with normal

distribution. For n>100 we can use

normal distribution to find confidence

levels, intervals.

The Law of Large Numbers

and Central Limit Theorem

The Law of Large Numbers

• Assume X1 , X2 , … , Xn are random

samples of X. and V(X) exist. Let , then for any given ,

XE

n

i

iXn

X1

10ε

1 lim

XPn

The Central Limit Theorem

Let be the mean of a random

sample X1, X2, …, Xn, of size n from a

distribution with a finite mean and a

finite positive variance 2. Then

X

)1,0( /

Nn

XY

Small probability event

• A small probability event is an event

that has a low probability of occurring.

• The small probability event will hardly

happen in one experiment. This

principle is used for hypothesis and

tests.

• An event is a small probability event,

so it will hardly happen in theory. But if

it happens actually, then we reject H0.

Experiments on the distribution of

sample mean and sample variance

• Use RAND() in EXCEL to generate pseudo-

random numbers X1 and X2 of U(0,1):

uniform distribution on the interval [0, 1]

• Use transformation to generate random

numbers Y1 and Y2 of N(0, 1)

• Use transformation to generate random

numbers Z1 and Z2 of N(μ, σ2)

)2cos()ln(2 ),2sin()ln(2 212211 XXYXXY

YZ

Let’s do some exercises together

• Draw 100 random samples from U(0, 1)

• Draw the frequency distribution of the 100

samples

• Draw 100 sets of 5 samples from N(10, 10)


samples, and compare it with N(10, 10)


sample means and 100 sample variances

• Compare the distribution of with N(10, 2)

• Compare the distribution of with

X

104 2S )4(2

Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions,...

Documents

Transcript of Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions,...