3 2 Review Sampling, CI

Multivariate Random Variables

• Consider two random variables X and Y

• Joint distribution

• Marginal distribution

• Conditional distribution

Joint Probability Mass Function

Joint Probability Density Function

Example of joint probability density

Conditional distributions

Independent Random Variables

Expected value

Correlation

Random samples

Linear Combinations and their means

Variances of linear combinations

The difference between random variables

The Case of Normal Random Variables


Sampling and Confidence Interval

Statistical Inference

• Statistical inference: study the population based on the data obtained from a sample from the population

• Random variable $X$ vs. pdf $f(x)$ vs. cdf $F(x)$ vs. parameters $(\mu, \sigma^2)$ vs. sample $(X_1, X_2, \ldots, X_n)$ vs. estimate ($\bar{X}$) or any other statistic ($h(X_k)$)

• Estimates of parameters

– Point estimate of $\mu$: $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$

– Point estimate of $\sigma^2$: $S^2 = \left((X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2\right)/(n - 1)$

• TAMU student height: estimate the average height of students from a random sample of 30 students in the ISEN department

• Problem: variability; in general $\bar{x} \neq \mu$.

• A point estimate says nothing about how close it might be to $\mu$
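The two point estimates can be computed directly from a sample. The sketch below uses a small hypothetical list of heights (illustrative values, not data from the slides) and checks the standard-library results against the formulas written out by hand.

```python
from statistics import mean, variance

# Hypothetical heights in inches (illustrative only, not from the slides).
heights = [66.0, 70.0, 68.0, 72.0, 69.0]

x_bar = mean(heights)      # point estimate of mu
s2 = variance(heights)     # point estimate of sigma^2 (divides by n - 1)

# The same estimates written out as in the formulas above.
n = len(heights)
x_bar_manual = sum(heights) / n
s2_manual = sum((x - x_bar_manual) ** 2 for x in heights) / (n - 1)

print(x_bar, s2)
print(x_bar_manual, s2_manual)
```

A different sample of the same size would give different values of both estimates, which is the variability problem noted above.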


Definition of a statistic


Interval Estimate

• Interval estimate: an entire interval of plausible values

– More information about a population characteristic than does a point estimate

– A confidence level for the estimate

– Such interval estimates are called confidence intervals (CI)

• Confidence level: a measure of degree of reliability of the interval (95%, 99%, 90%)

• Significance level ($\alpha$): 1 − confidence level

• Width of CI: for a given confidence level, a narrow interval means our knowledge of the value of the parameter is reasonably precise; a very wide CI indicates a large amount of uncertainty.


[Figure: a confidence interval around the point estimate, marking the lower confidence limit, the upper confidence limit, and the width of the confidence interval]


CI of Normal Distribution

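• A $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by

$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$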


[Figure: Z curve with central area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$]

$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$
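As a rough sketch of how the known-$\sigma$ interval could be computed, the code below looks up $z_{\alpha/2}$ with the Python standard library. The function names `z_critical` and `z_interval` are hypothetical helpers introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(alpha):
    """Return z_{alpha/2}: the point with area alpha/2 to its right under the Z curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(x_bar, sigma, n, alpha=0.05):
    """100(1 - alpha)% confidence interval for mu when sigma is known."""
    half_width = z_critical(alpha) * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Critical values for the usual confidence levels.
for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(1 - level), 3))
```

A higher confidence level gives a larger critical value and therefore a wider interval for the same $\sigma$ and $n$.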


Example

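• TAMU student height:

--- Suppose students' height follows a normal distribution with unknown mean $\mu$ and known $\sigma$ ($\sigma = 2$).

--- We have observations from the sample of IE students; each observation is a random draw from $N(\mu, \sigma^2)$.

--- The sample mean follows a normal distribution: $\bar{X} \sim N(\mu, \sigma^2/n)$.

--- Normalize $\bar{X}$: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$. Under the standard normal curve,

$P(-1.96 < Z < 1.96) = 0.95$

→ $P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$

→ $P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$

95% confidence interval: $\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$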


Example

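• 95% confidence interval: $\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$

• The CI is a random interval because $\bar{X}$ is a random variable

• The interval length $2 \times 1.96 \times \frac{\sigma}{\sqrt{n}}$ is not random; only the location (center $\bar{X}$) is random.

• The CI can be interpreted as follows: the probability is 0.95 that the random interval includes the true value of $\mu$.

• TAMU students' height: $\left(68.46 - 1.96\frac{2}{\sqrt{24}},\ 68.46 + 1.96\frac{2}{\sqrt{24}}\right) = (67.66, 69.26)$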
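The arithmetic behind the TAMU interval can be checked in a few lines. This sketch only plugs in the numbers that appear in the example (sample mean 68.46, $\sigma = 2$, $n = 24$); the variable names are illustrative.

```python
from math import sqrt

x_bar, sigma, n = 68.46, 2.0, 24   # values used in the example above
z = 1.96                           # z_{0.025} for a 95% interval

half_width = z * sigma / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(round(lower, 2), round(upper, 2))   # 67.66 69.26
print(round(2 * half_width, 2))           # interval length, fixed once sigma and n are fixed
```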


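[Figure: interval centered at $\bar{X}$ with lower limit $\bar{X} - 1.96\,\sigma/\sqrt{n}$ and width $2 \times 1.96\,\sigma/\sqrt{n}$]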


Interpret CI

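• Confidence level: 95% means that, before the sample is drawn, the probability is 0.95 that the random interval will contain $\mu$

• WRONG: having calculated the CI (67.66, 69.26), it is wrong to say that $\mu$ lies within this fixed interval with probability 0.95. Both $\mu$ and the calculated endpoints are fixed numbers, so nothing random remains.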

• We should use long-run relative frequency interpretation of probability to explain CI.

• We can take various samples (e.g., students from the business school, students enrolled in a certain class, or students currently at the library), and

then we get large enough intervals as 𝑋 −
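The long-run relative frequency interpretation can be illustrated with a small simulation: draw many samples, build a 95% interval from each, and count how often the interval contains $\mu$. This is only a sketch under stated assumptions. The "true" mean of 68.0 is a hypothetical value chosen for the simulation, and `coverage` is an illustrative helper name.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, trials=10_000, z=1.96, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / sqrt(n)   # the same for every sample
    hits = 0
    for _ in range(trials):
        # One sample of size n from N(mu, sigma^2) and its sample mean.
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / trials

# Hypothetical true mean; sigma and n as in the height example.
print(coverage(mu=68.0, sigma=2.0, n=24))
```

The printed fraction should be close to 0.95. Any single interval either contains $\mu$ or does not; the 95% describes the procedure over many repetitions.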


Large Sample CI For Mean

• In reality, TAMU students’ height may not follow normal distribution.

• However, if the sample size is large enough, then according to the central limit theorem $\bar{X}$ tends to follow a normal distribution, whatever the population distribution.

• Thus $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has approximately a standard normal distribution.

• When $n$ is large, the CI $\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$ remains valid whatever the population distribution.

• Even if the height follows some non-Gaussian distribution with known 𝜎, we can still use CI, provided we have large enough sample size 𝑛.


$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) \approx 1 - \alpha$
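To see the large-sample claim at work, the sketch below repeats the interval construction for a clearly non-normal population. The Exponential(1) population (for which $\mu = \sigma = 1$) and the sample size $n = 100$ are assumptions chosen for illustration, not values from the slides.

```python
import random
from math import sqrt

def coverage_exponential(n, trials=5_000, z=1.96, seed=1):
    """Fraction of z-intervals covering mu for an Exponential(1) population."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0               # mean and standard deviation of Exponential(1)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(x_bar - mu) < half_width:
            hits += 1
    return hits / trials

print(coverage_exponential(100))
```

With $n = 100$ the observed coverage should land near the nominal 0.95 even though the population is strongly skewed.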


CI When Variance Unknown

• Assumption: population is normal, and random samples are from a normal distribution with both 𝜇 and 𝜎 unknown.

• Theorem: when $\bar{X}$ is the mean of a random sample of size $n$ from a normal distribution with mean $\mu$, and $S$ is the sample standard deviation, the random variable $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ follows a t distribution with $n - 1$ degrees of freedom.

• Properties: let $T_\nu$ denote a t statistic with $\nu$ degrees of freedom (DoF).

--- Each 𝑇𝜈 pdf (𝑡𝜈 curve) is bell shaped and centered at 0.

--- Each 𝑡𝜈 curve is more spread out than the standard normal (z) curve.

--- As 𝜈 increases, the spread of the corresponding 𝑡𝜈 curves decreases.

--- As 𝜈 → ∞, the 𝑡𝜈 curve approaches the standard normal curve.

• Critical value: let $t_{\alpha,\nu}$ denote the number on the measurement axis for which the area under the t curve with $\nu$ DoF to the right of $t_{\alpha,\nu}$ is $\alpha$; $t_{\alpha,\nu}$ is called a t critical value.
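The definition of $t_{\alpha,\nu}$ can be turned into a small numerical sketch using only the standard library: integrate the t density with Simpson's rule, then search for the point whose right-tail area equals $\alpha$. The helper names are hypothetical. In practice a t table or a statistics package would normally be used instead.

```python
from math import exp, lgamma, log, pi

def t_pdf(t, nu):
    """Density of the t distribution with nu degrees of freedom."""
    log_const = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_const - (nu + 1) / 2 * log(1 + t * t / nu))

def t_right_tail(t, nu, steps=2000):
    """Area to the right of t >= 0, via Simpson's rule on [0, t] (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3   # the curve is symmetric about 0

def t_critical(alpha, nu):
    """Bisection for t_{alpha,nu}; assumes alpha < 0.5 and t_{alpha,nu} < 100."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_right_tail(mid, nu) > alpha:
            lo = mid    # too much area to the right: move right
        else:
            hi = mid
    return (lo + hi) / 2

for nu in (5, 10, 30, 1000):
    print(nu, round(t_critical(0.025, nu), 3))
```

The printed values decrease as $\nu$ grows and approach $z_{0.025} \approx 1.96$, in line with the properties listed above. For $\nu = 10$ the result should agree with the tabulated 2.228.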


CI When Variance Unknown

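• Let $\bar{x}$ and $s$ be the sample mean and sample standard deviation from a normal population with mean $\mu$. Then the $100(1 - \alpha)\%$ confidence interval for $\mu$ is

$\left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$

• An upper confidence bound for $\mu$ is $\bar{x} + t_{\alpha,\,n-1}\frac{s}{\sqrt{n}}$, with confidence level $100(1 - \alpha)\%$.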
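A minimal sketch of the unknown-variance interval follows. The sample of ten heights is hypothetical (not data from the slides), and the critical values are passed in as tabulated constants, $t_{0.025,9} = 2.262$ and $t_{0.05,9} = 1.833$, rather than computed. `t_interval` and `t_upper_bound` are illustrative helper names.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """Two-sided CI for mu; t_crit is t_{alpha/2, n-1} taken from a t table."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                    # sample standard deviation (n - 1 denominator)
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

def t_upper_bound(data, t_crit):
    """Upper confidence bound for mu; t_crit is t_{alpha, n-1} from a t table."""
    return mean(data) + t_crit * stdev(data) / sqrt(len(data))

# Hypothetical sample of n = 10 heights in inches (illustrative only).
heights = [66.0, 70.0, 68.0, 72.0, 69.0, 67.0, 71.0, 68.0, 70.0, 69.0]

T_025_9 = 2.262   # tabulated t_{0.025, 9}
T_05_9 = 1.833    # tabulated t_{0.05, 9}

print(t_interval(heights, T_025_9))     # 95% two-sided interval
print(t_upper_bound(heights, T_05_9))   # 95% upper confidence bound
```

The upper bound uses $t_{\alpha,\,n-1}$ rather than $t_{\alpha/2,\,n-1}$ because all of $\alpha$ is placed in one tail.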
