Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions,...

50
Lecture 3 Sampling, sampling distributions, and parameter estimation

Transcript of Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions,...

Page 1: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Lecture 3

Sampling, sampling

distributions, and

parameter estimation

Page 2: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Sampling

Page 3: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Definition

• Population is defined as the collection

of all the possible observations of

interest.

• The collection of observations we take

from the population is called a sample.

• The number of observations in the

sample is called the sample size.

Page 4: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Sampling

• When we are interested in a

population, we typically study a

sample of that population rather than

attempt to study the entire population.

• The sample should ideally be a

representation of the population with

similar characteristics.

Page 5: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Principles of sampling

• 1. Same distribution. All variables

in the sample X1, …, Xn have the

same distribution as in the entire

population.

• 2. Independence. X1, …, Xn are

independent. In other words, each

observation has no relationship

with others.

Page 6: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Simple random sampling

• Simple random sampling is the most

straightforward of the random sampling

strategies. We use this strategy when we

believe that the population is relatively

homogeneous for the characteristic of

interest. i.e. no population structure

Page 7: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Simple random sampling • For example, let's say you were surveying

first-time parents about their attitudes toward mandatory seat belt laws. You might expect that their status as new parents might lead to similar concerns about safety.

• On campus, those who share a major might also have similar interests and values; we might expect psychology majors to share concerns about access to mental health services on campus.

Page 8: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Simple Random Sampling

Page 9: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Other sampling methods

• Systematic sampling

• Stratified sampling

• Proportionate sampling

• Cluster sampling

• Multistage sampling

• And so on

Page 10: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Sample statistic and

distribution

Page 11: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Sample mean and sample variance

• Let X1, …, Xn be a random sample

• Sample mean

• Sample variance

• Sample standard error (deviation)

n

i

iXn

X1

1

n

i

i XXn

S1

22

1

1

n

i

i XXn

S1

2

1

1

2

1

22 )(1

1XnX

nS

n

i

i

Page 12: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

For frequency or grouping data

• Let X1, …, Xk be group values, f1, …, fk

be the frequency of each group,

f1+ …+fk=1

• Sample mean

• Sample variance

k

i

ii XfX1

2

1

2

1

22 XXfXXfSk

i

ii

k

i

ii

Page 13: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Properties of sample

mean and variance • Let X1, . . . ,Xn be a random sample from a normal

distribution , and let

• Then we have

• (1) and are independent random variables.

• (2) has a normal distribution, i.e.

• (3) has a chi-squared distribution with

n − 1 degrees of freedom.

),( 2N

n

i

iXn

X1

1

n

i

i XXn

S1

22

1

1

X 2S

X ),( 2 nN 22)1( Sn

2

1

2

1

22 )()1( XnXXXSSSnn

i

i

n

i

i

Page 14: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

distribution

• If Xi~N(0,1), i=1,…,n,

and Xi are

independent

Define

The obeys

distribution with n

degrees of freedom,

denoted by

2

n

i

iX1

22

2

)(~ 22 n

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

0 2 4 6 8

p(x)

x

n=1

n=2

n=3

n=4

n=6

n=9

)(2 n

2

Page 15: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Student’s t distribution • If X~N(0,1), Y~ , X

and Y are independent

Define

Then t obeys t

distribution with n

degrees of freedom,

denoted by

t~t(n)

)(2 n

nY

Xt

t(n)

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

-6 -4 -2 0 2 4 6

p(x)

x

n=1

n=2

n=3

n=4

n=6

n=9

Page 16: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

F distribution

• X and Y are independent,

Define

Then F obeys F distribution

with degrees of freedom n1

and n2, denoted by

F~F(n1, n2)

)(~ 1

2 nX )(~ 2

2 nY

2

1

nY

nXF

F(d1,d2)

0

0.1

0.2

0.3

0.4

0.5

0.6

0 1 2 3 4 5 6

p(x)

x

d1=1,d2=1

d1=2,d2=1

d1=5,d2=2

d1=100,d2=1

Page 17: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Relationship between different distributions

Binomial

distribution

Two-point

distribution

n=1

nxxx 21

p is large;

np is moderate

(Unit time)

Poisson

distribution p(λ)

(Time interval Δt)

Poisson

distribution p(λΔt)

Time for the first jump

Exponential

distribution

exp(λ)

Time for the αth jump

n 21

Erlang dis-

tribution Γ(α, λ)

α>0

λ= β

Gamma

distribution Γ(α, β)

n is large

Standardized

Normal

distribution N(0,1)

F(x) n is

large

Uniform

distribution

U(0,1)

Х2(n)

222

21 nXXX

nZ

XT 1

)1,0(~1 NX )(~ 2 nZ

t distribution

t(n)

α=n/2, β =2

X1 and Z are independent

21

1

21

2

1

B

tindependen are and

)1,(~

)1,(~

xx

x

xx

x

x

Beta distribution

Beta(α, β)

F distribution

F(m, n)

n

x

m

xF

tindependen are and

)(~

)(~

21

21

22

21

xx

nx

mx

n is large

α=1, β=1

Page 18: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Statistical inference

• Statistical inference: Drawing

conclusions about the whole

population on the basis of a sample

• Precondition for statistical inference:

A sample is randomly selected from

the population (=probability sample)

Page 19: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Parameter estimation

Page 20: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Parameter estimation • Parameter estimation is an important

problem in statistics. It can divided into two

types:

– 1. Point estimation: it involves the use of

sample data to calculate a single value (known

as a statistic) which is to serve as a "best

guess" or "best estimate" of an unknown (fixed

or random) population parameter.

– 2. Interval estimation: it is the use of sample

data to calculate an interval of possible (or

probable) values of an unknown population

parameter.

Page 21: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Point estimation

• X~F(x, θ), θ is unknown. The target of

point estimation is to give a statistic and

there is a group of observations X1, X2, …,

Xn. The estimator of θ denotes as

• For example, when θ=E(X), we can use

mean of samples as the estimator of θ, i.e.

),,,(ˆˆ21 nXXX

n

i

iXn 1

Page 22: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Two commonly used

point estimation methods

• Maximum likelihood method

• Moment method

Page 23: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Maximum likelihood

estimate (MLE)

• This method is to maximize the

likelihood function for getting the

estimator of parameters.

• The probability density function of X

is p(x; θ), and θ is unknown.

Suppose there is a sample

observations X1, X2, …, Xn for X.

Page 24: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Maximum likelihood method

• Then the combined probability function is

• We call the above function the likelihood

function. Define the logarithm of likelihood

as

• Let , then we can calculate the

maximum likelihood estimator (MLE) of θ.

);(x);(x);(x);(x;,,,x)(1

22 in11 )( ppppxxLLn

in

n

i

in xpxxLL1

21 ;ln;,,,xlnln )()()(

0ln

d

Ld )(

Page 25: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Maximum likelihood method • When the likelihood function contains k

parameters , then

• The maximum likelihood estimator of

: , i=1,…,k

are the solution of k equations

),,,;p(x,,, 211

21 k

n

ikL i)(

kiL

i

k ,,2 ,1,0,,ln ,21

)(

k ,,, 21

k ,,, 21 nii xxx• ,,,ˆˆ21

Page 26: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Example

• Assume X1 , X2 , … , Xn are random

samples from a normal distribution ,

how to get the maximum likelihood

estimator of parameters .

• Solution: The likelihood function is

),( 2N

2 and

n

i

i

n

n

i

i

x

xL

1

2

2

2

2

12

22

2

1exp

2

1

]2

exp[2

1,

)()(

Page 27: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Example

• Then

• Then the derivative equations are

n

i

ixnn

L1

2

2

22

2

1ln

22ln

2ln )()(),(

n

i

n

i

xn

L

xL

1

2

42

2

2

12

2

02

1

2ln

01

ln

)(),(

)(),(

Page 28: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Example

• So the solutions are

• The maximum likelihood estimator of is

n

i

i

n

i

i

xn

xxn

1

22

1

1

1

)(

n

i

i xxn 1

Page 29: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Example • When is known, MLE of is

• If is unknown, MLE of is

• MLE of is not equal to the sample

variance! Which one is better?

2

n

i

i xxn 1

22 1ˆ )(

n

i

ixn 1

22 1ˆ )(

2

2

Page 30: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Moment method

• Basic idea: equating sample moments with

unobservable population moments and

then solving those equations for the

quantities to be estimated.

• Suppose the probability density function of

X is p(x; ), then the r th moment

of X is

dxxpxXE k

rr

r ),, ,;()( 2-

1

k ,,, 21

Page 31: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Moment method

• Suppose there is a sample observations X1, X2, …, Xn for X. Then the r th moment of samples are

• Equate the j th (j=1, …, k) sample moments with unobservable population moments

n

i

r

ir Xn

a1

1

kkk

k

k

a

a

a

),,,(

),,,(

),,,(

21

2212

1211

Page 32: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Moment method

• Solve the equations, then we can get the

estimator of θ:

• We call them the moment estimators of θ.

k ˆ,,ˆ,ˆ 21

Page 33: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Example • X1, X2, …, Xn are samples from Uniform

distribution

• Then

• So

• The moment estimator of θ is

otherwise,0

0,1

)(x,

x

p

2

1),(

0-1

xdxdxxxp

n

i

ixn

a1

1

1

xxn

n

i

i 1

2

x2ˆ

Page 34: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Desirable properties of estimator

• Since estimator gives an estimate that

depends on sample points (X1, X2,…, Xn),

estimate is a function of sample points.

Sample points are random variable,

therefore estimate is random variable and

has probability distribution.

• We want that estimator to have several

desirable properties like

• 1. Unbiasedness

• 2. Effectiveness

• 3. Minimum mean square error

Page 35: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Unbiasedness • An estimator is said to be unbiased if the

expected value of the estimator is equal to

true value of the parameter being

estimated, or

• Example: sample proportion is the

unbiased estimator of population proportion

)̂(E

Page 36: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Effectiveness

• The most efficient estimator among a

group of unbiased estimators is the one

with the smallest variance.

• Generally speaking, assuming

and are two unbiased

estimators of θ, and , then is

said to be more effective than .

)X,,X,(Xˆ211 n

)X,,X,(Xˆ212 n

)ˆ()ˆ( 21 VV 1̂

Page 37: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Minimum mean square error (MSE)

• Basic idea: minimize the average

deviation between the estimation and

true value.

• We call the estimator which minimize

as the minimum mean square error

estimator of θ.

}]-)X,,X,(Xˆ{[ 2

21 nE

Page 38: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Interval estimation • Estimation of the parameter is not

sufficient. It is necessary to analyze and see how confident we can be about this particular estimation.

• One way of doing it is defining confidence intervals. If we have estimated θ we want to know if the “true” parameter is close to our estimate. In other words we want to find an interval that satisfies following relation:

1)( UL GGP

Page 39: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Interval estimation • i.e. probability that “true” parameter

θ is in the interval (GL, GU) is greater

than 1-.

• Actual realization of this interval - (gL,

gU) is called a 100(1- )% of

confidence interval, limits of the

interval are called lower and upper

confidence limits. 1- is called

confidence level.

Page 40: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Example • If population variance is known (2)

and we estimate population mean then

• We can find from the table that

probability of Z is more than 1 is equal

to 0.1587. Probability of Z is less than

-1 is again 0.1587. These values

comes from the table of the standard

normal distribution.

)1 ,0( ~ /

Nn

xZ

Page 41: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Example • Now we can find confidence interval for the

sample mean. Since:

• Then for we can write

• Confidence level that “true” value is within 1 standard error (standard deviation of sampling distribution) from the sample mean is 0.6826. Probability that “true” value is within 2 standard error from the sample mean is 0.9545.

6826.01587.0*21

)1()1(1)1()1()11(

ZPZPZPZPZP

6826.0)//()1/

1(

nxnxPn

xP

Page 42: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Interval estimation • Above we considered the case when

population variance is known in advance. It is rarely the case in real life. When both population mean and variance are unknown we can still find confidence intervals. In this case we calculate population mean and variance and then consider distribution of the statistic:

• Here S2 is the sample variance.

nS

xZ

/

Page 43: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Interval estimation • Since it is the ratio of the standard normal

random variable to square root of 2 random

variable with n-1 degrees of freedom, Z has

Student’s t distribution with n-1 degrees of

freedom. In this case we can use table of t

distribution to find confidence levels.

• It is not surprising that when we do not know

sample variance confidence intervals for the

same confidence levels becomes larger. That is price we pay for what we do not know.

Page 44: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Interval estimation

• If number of degrees of freedom

becomes large, then t distribution is

approximated well with normal

distribution. For n>100 we can use

normal distribution to find confidence

levels, intervals.

Page 45: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

The Law of Large Numbers

and Central Limit Theorem

Page 46: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

The Law of Large Numbers

• Assume X1 , X2 , … , Xn are random

samples of X. and V(X) exist. Let , then for any given ,

XE

n

i

iXn

X1

10ε

1 lim

XPn

Page 47: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

The Central Limit Theorem

Let be the mean of a random

sample X1, X2, …, Xn, of size n from a

distribution with a finite mean and a

finite positive variance 2. Then

X

)1,0( /

Nn

XY

Page 48: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Small probability event

• A small probability event is an event

that has a low probability of occurring.

• The small probability event will hardly

happen in one experiment. This

principle is used for hypothesis and

tests.

• An event is a small probability event,

so it will hardly happen in theory. But if

it happens actually, then we reject H0.

Page 49: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Experiments on the distribution of

sample mean and sample variance

• Use RAND() in EXCEL to generate pseudo-

random numbers X1 and X2 of U(0,1):

uniform distribution on the interval [0, 1]

• Use transformation to generate random

numbers Y1 and Y2 of N(0, 1)

• Use transformation to generate random

numbers Z1 and Z2 of N(μ, σ2)

)2cos()ln(2 ),2sin()ln(2 212211 XXYXXY

YZ

Page 50: Lecture 2 Population and sampling - isbreeding.net · Lecture 3 Sampling, sampling distributions, and parameter estimation . Sampling . Definition •Population is defined as the

Let’s do some exercises together

• Draw 100 random samples from U(0, 1)

• Draw the frequency distribution of the 100

samples

• Draw 100 sets of 5 samples from N(10, 10)

• Draw the frequency distribution of the 500

samples, and compare it with N(10, 10)

• Draw the frequency distribution of the 100

sample means and 100 sample variances

• Compare the distribution of with N(10, 2)

• Compare the distribution of with

X

104 2S )4(2