3 2 Review Sampling, CI


Page 1: 3 2 Review Sampling, CI

Multivariate Random Variables

• Consider two random variables X and Y

• Joint distribution

• Marginal distribution

• Conditional distribution

Page 2: 3 2 Review Sampling, CI

Joint Probability Mass Function

Page 3: 3 2 Review Sampling, CI

Joint Probability Density Function

Page 4: 3 2 Review Sampling, CI

Example of joint probability density

Page 5: 3 2 Review Sampling, CI

Conditional distributions

Page 6: 3 2 Review Sampling, CI

Independent Random Variables

Page 7: 3 2 Review Sampling, CI

Expected value

Page 8: 3 2 Review Sampling, CI

Correlation

Page 9: 3 2 Review Sampling, CI

Random samples

Page 10: 3 2 Review Sampling, CI

Linear Combinations and their means

Page 11: 3 2 Review Sampling, CI

Variances of linear combinations

Page 12: 3 2 Review Sampling, CI

The difference between random variables

Page 13: 3 2 Review Sampling, CI

The Case of Normal Random Variables

Page 14: 3 2 Review Sampling, CI


Sampling and Confidence Interval

Page 15: 3 2 Review Sampling, CI

Statistical Inference

• Statistical inference: study the population based on the data obtained from a sample from the population

• Random variable $X$ vs. pdf $f(x)$ vs. cdf $F(x)$ vs. parameter $(\mu, \sigma^2)$ vs. sample $(X_1, X_2, \ldots, X_n)$ vs. estimate $(\bar{X})$ or any other statistic $(h(X_k))$

• Estimates of parameters

– Point estimate of $\mu$: $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$

– Point estimate of $\sigma^2$: $S^2 = \left((X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2\right)/(n - 1)$

• TAMU student height: estimate the average height of students from a random sample of 30 students in the ISEN department

• Problem: variability; in general $\mu \neq \bar{x}$.

• A point estimate says nothing about how close it might be to $\mu$
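
A minimal sketch of the two point estimates, computed by hand and checked against Python's standard library. The `heights` values are made up for illustration and are not data from the slides.

```python
import math
import statistics

# Hypothetical heights in inches (illustrative values, not from the slides).
heights = [66.0, 70.5, 68.0, 71.0, 67.5, 69.0]

n = len(heights)
x_bar = sum(heights) / n                               # point estimate of mu
s2 = sum((x - x_bar) ** 2 for x in heights) / (n - 1)  # point estimate of sigma^2

# statistics.variance() uses the same n - 1 denominator as S^2.
assert math.isclose(x_bar, statistics.mean(heights))
assert math.isclose(s2, statistics.variance(heights))

print(f"x_bar = {x_bar:.3f}, s^2 = {s2:.3f}")
```

A different sample would give a different $\bar{x}$ and $s^2$, which is the variability problem noted above.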

Page 16: 3 2 Review Sampling, CI

Definition of a statistic

Page 17: 3 2 Review Sampling, CI

Interval Estimate

• Interval estimate: an entire interval of plausible values

– More information about a population characteristic than does a point estimate

– A confidence level for the estimate

– Such interval estimates are called confidence intervals (CI)

• Confidence level: a measure of degree of reliability of the interval (95%, 99%, 90%)

• Significance level ($\alpha$): 1 − confidence level

• Width of CI: for a given confidence level, a narrow interval means our knowledge of the parameter's value is reasonably precise; a very wide CI indicates a large amount of uncertainty.

[Figure: a confidence interval centered on the point estimate, running from the lower confidence limit to the upper confidence limit; the distance between the two limits is the width of the confidence interval]
Page 18: 3 2 Review Sampling, CI

CI of Normal Distribution

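• A $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by

$\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$

[Figure: Z curve with central area $1 - \alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$]

$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$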
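
As a rough illustration of the critical value $z_{\alpha/2}$, the sketch below uses `statistics.NormalDist` from the Python standard library. The helper name `z_critical` is an assumption introduced here, not something from the slides.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Return z_{alpha/2}: the point with upper-tail area alpha/2 under the Z curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for conf_level in (0.90, 0.95, 0.99):
    alpha = 1 - conf_level
    z = z_critical(alpha)
    # Area between -z and z under the standard normal curve.
    central_area = NormalDist().cdf(z) - NormalDist().cdf(-z)
    print(f"{conf_level:.0%}: z = {z:.3f}, P(-z < Z < z) = {central_area:.3f}")
```

For the 95% level this gives a critical value of about 1.96.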
Page 19: 3 2 Review Sampling, CI

Example

• TAMU student height:

--- Suppose students' height follows a normal distribution with unknown mean $\mu$ and known $\sigma$ ($\sigma = 2$).

--- We have observations from the sample of IE students; each observation is a random draw from $N(\mu, \sigma^2)$.

--- The sample mean follows a normal distribution: $\bar{X} \sim N(\mu, \sigma^2/n)$.

--- Normalize $\bar{X}$: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$. Under the standard normal curve,

$P(-1.96 < Z < 1.96) = 0.95$

→ $P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$

→ $P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$

95% confidence interval: $\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$
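
The claim that the sample mean has standard deviation $\sigma/\sqrt{n}$, so that the normalized $Z$ is standard normal, can be checked by simulation. This is only a sketch: the true mean `mu = 68.0`, the sample size `n = 24`, and the number of repetitions are assumptions chosen for illustration.

```python
import random
import statistics
from math import sqrt

random.seed(0)
mu, sigma, n = 68.0, 2.0, 24   # mu and n are assumed; sigma = 2 as in the example
reps = 20000

# Draw many samples of size n and record each sample mean.
means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]

sd_of_means = statistics.stdev(means)
theory = sigma / sqrt(n)

# Normalize each sample mean; the result should look standard normal.
z_values = [(m - mu) / theory for m in means]
mean_z = statistics.fmean(z_values)
sd_z = statistics.stdev(z_values)

print(f"sd of sample means: {sd_of_means:.3f} (theory: {theory:.3f})")
print(f"Z: mean {mean_z:.3f}, sd {sd_z:.3f}")
```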
Page 20: 3 2 Review Sampling, CI

Example

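• 95% confidence interval: $\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$

• The CI is random because $\bar{X}$ is a random variable

• Interval length $2 \times 1.96 \times \frac{\sigma}{\sqrt{n}}$ is not random; only the location (center $\bar{X}$) is random.

• The CI can be explained as: the probability is 0.95 that the random interval includes the true value of $\mu$.

• TAMU students' height: $\left(68.46 - 1.96\,\frac{2}{\sqrt{24}},\ 68.46 + 1.96\,\frac{2}{\sqrt{24}}\right) = (67.66, 69.26)$

[Figure: an interval centered at $\bar{X}$, starting at $\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}}$, with total length $2 \times 1.96\,\frac{\sigma}{\sqrt{n}}$]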
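
The height interval can be reproduced with a few lines of Python. The function name `z_interval` and its parameters are assumptions introduced for this sketch; the inputs (68.46, $\sigma = 2$, $n = 24$) are the ones used in the example.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, conf_level=0.95):
    """Two-sided CI for the mean of a normal population with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

lower, upper = z_interval(x_bar=68.46, sigma=2, n=24)
print(f"({lower:.2f}, {upper:.2f})")  # (67.66, 69.26)
```

Raising `conf_level` or lowering `n` widens the interval, since the half-width is $z_{\alpha/2}\,\sigma/\sqrt{n}$.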
Page 21: 3 2 Review Sampling, CI

Interpret CI

• Confidence level: 95% means the probability is 0.95 that the random interval, before the sample is observed, contains $\mu$.

• WRONG: having calculated the CI (67.66, 69.26), it is wrong to say that $\mu$ is within this fixed interval with probability 0.95; $\mu$ is a fixed constant, so a computed interval either contains it or does not.

• We should use the long-run relative frequency interpretation of probability to explain the CI.

• We can take various samples (e.g., students from the business school, students enrolled in a certain class, or students currently at the library), and then we get a large number of intervals as $\bar{X} -$ …
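
The long-run relative frequency interpretation can be illustrated with a small simulation: draw many samples, build a 95% interval from each, and count how often the interval contains $\mu$. This is a sketch under assumed values; the true mean `mu = 68.0`, `n = 24`, and the number of repetitions are hypothetical choices.

```python
import random
from math import sqrt

random.seed(1)
mu, sigma, n = 68.0, 2.0, 24          # mu is an assumed "true" mean for the simulation
half_width = 1.96 * sigma / sqrt(n)   # fixed, because sigma is known

def interval_covers_mu():
    """Draw one sample, build its 95% CI, and report whether it contains mu."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    return x_bar - half_width < mu < x_bar + half_width

reps = 10000
coverage = sum(interval_covers_mu() for _ in range(reps)) / reps
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

Each individual interval either contains `mu` or does not; the 0.95 describes the fraction of intervals that do over many repetitions.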
Page 22: 3 2 Review Sampling, CI

Large Sample CI For Mean

• In reality, TAMU students' height may not follow a normal distribution.

• However, if the sample size is large enough, according to the central limit theorem, $\bar{X}$ tends to follow a normal distribution, whatever the population distribution.

• Thus $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has approximately a standard normal distribution:

$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) \approx 1 - \alpha$

• When $n$ is large, the CI $\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$ remains approximately valid whatever the population distribution.

• Even if the height follows some non-Gaussian distribution with known $\sigma$, we can still use this CI, provided we have a large enough sample size $n$.
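
A simulation sketch of the large-sample claim, using a clearly non-normal population. The exponential distribution with rate 1 (mean 1, standard deviation 1) is an assumption chosen for illustration, as are the sample sizes and the repetition count.

```python
import random
from math import sqrt

random.seed(2)
mu = sigma = 1.0   # Exponential(rate=1) has mean 1 and standard deviation 1

def coverage(n, reps=5000):
    """Fraction of known-sigma 95% intervals that contain mu for samples of size n."""
    half_width = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = sum(random.expovariate(1.0) for _ in range(n)) / n
        hits += x_bar - half_width < mu < x_bar + half_width
    return hits / reps

for n in (5, 30, 100):
    print(f"n = {n:3d}: estimated coverage = {coverage(n):.3f}")
```

The estimated coverage should be close to the nominal 0.95 when `n` is large, even though the individual observations are skewed.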
Page 23: 3 2 Review Sampling, CI

CI When Variance Unknown

• Assumption: the population is normal, and the random sample is from a normal distribution with both $\mu$ and $\sigma$ unknown.

• Theorem: when $\bar{X}$ is the mean of a random sample of size $n$ from a normal distribution with mean $\mu$, the random variable $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ follows a t distribution with $n - 1$ degrees of freedom.

• Properties: let $T_\nu$ denote a t statistic with $\nu$ DoF.

--- Each $T_\nu$ pdf ($t_\nu$ curve) is bell shaped and centered at 0.

--- Each $t_\nu$ curve is more spread out than the standard normal (z) curve.

--- As $\nu$ increases, the spread of the corresponding $t_\nu$ curves decreases.

--- As $\nu \to \infty$, the $t_\nu$ curve approaches the standard normal curve.

• Critical value: let $t_{\alpha,\nu}$ denote the number on the measurement axis for which the area under the t curve with $\nu$ DoF to the right of $t_{\alpha,\nu}$ is $\alpha$; $t_{\alpha,\nu}$ is called a t critical value.
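
The spread of the t curves can be seen in a simulation: for small $n$, the statistic $T$ falls outside $\pm 1.96$ more often than the 5% expected of a standard normal variable. This is only a sketch; the population parameters, sample sizes, and repetition count are assumptions.

```python
import random
from math import sqrt

random.seed(3)
mu, sigma = 0.0, 1.0   # assumed normal population; T does not depend on these values

def t_statistic(n):
    """Compute T = (x_bar - mu) / (s / sqrt(n)) for one simulated normal sample."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    return (x_bar - mu) / (s / sqrt(n))

def tail_fraction(n, cutoff=1.96, reps=10000):
    """Estimate P(|T| > cutoff) for samples of size n."""
    return sum(abs(t_statistic(n)) > cutoff for _ in range(reps)) / reps

for n in (5, 15, 50):
    print(f"n = {n:2d} (nu = {n - 1:2d}): P(|T| > 1.96) ~ {tail_fraction(n):.3f}")
```

The estimated tail fraction shrinks toward 0.05 as $\nu = n - 1$ grows, matching the property that the $t_\nu$ curve approaches the z curve.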
Page 24: 3 2 Review Sampling, CI

CI When Variance Unknown

• Let $\bar{x}$ and $s$ be the sample mean and sample standard deviation computed from a random sample from a normal population with mean $\mu$. Then the $100(1 - \alpha)\%$ confidence interval for $\mu$ is

$\left(\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right)$

• An upper confidence bound for $\mu$ is $\bar{x} + t_{\alpha,\,n-1}\,\frac{s}{\sqrt{n}}$, with confidence level $100(1 - \alpha)\%$.
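
A standard-library sketch of the t interval and the upper confidence bound. The t critical value is found numerically (Simpson's rule on the t density plus bisection) rather than from a table or a statistics package, so treat it as an illustration for moderate $\alpha$ and not a replacement for a library routine. All function names, and the inputs `s = 2.1` and `n = 24`, are assumptions introduced here.

```python
import math

def t_pdf(t, nu):
    """Density of the t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(t * t / nu))

def upper_tail(t, nu, steps=2000):
    """Area to the right of t >= 0: 0.5 minus Simpson's-rule area on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

def t_critical(alpha, nu):
    """Find t_{alpha,nu} by bisection; assumes 0 < alpha < 0.5."""
    lo, hi = 0.0, 1.0
    while upper_tail(hi, nu) > alpha:   # expand until the tail area drops below alpha
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(x_bar, s, n, alpha=0.05):
    """Two-sided 100(1 - alpha)% CI for mu when sigma is unknown."""
    half = t_critical(alpha / 2, n - 1) * s / math.sqrt(n)
    return x_bar - half, x_bar + half

def upper_bound(x_bar, s, n, alpha=0.05):
    """One-sided upper confidence bound for mu at level 100(1 - alpha)%."""
    return x_bar + t_critical(alpha, n - 1) * s / math.sqrt(n)

# Hypothetical summary statistics, for illustration only.
low, high = t_interval(x_bar=68.46, s=2.1, n=24)
print(f"95% t interval: ({low:.2f}, {high:.2f})")
print(f"95% upper bound: {upper_bound(68.46, 2.1, 24):.2f}")
```

The one-sided bound uses $t_{\alpha,\,n-1}$ instead of $t_{\alpha/2,\,n-1}$, so it sits below the upper limit of the two-sided interval at the same confidence level.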