Copyright (c) Bani Mallick. Lecture 4, Stat 651.

Date posted: 21-Dec-2015

Copyright (c) Bani Mallick 1

Lecture 4

Stat 651

Topics in Lecture #4

Probability

The bell-shaped (normal) curve

Normal probability plots (the q-q plot) to check for normality of continuous data

Use of Table 1 in the back of the book

Topics in Lecture #4

Normal probability calculations

Data Transformations

Sampling distributions: sample means are random variables!

Standard error of the sample mean

Central Limit Theorem

A simple confidence interval

Book Sections Covered in Lecture #4

Chapter 4.10, in detail

Chapter 4.11 (read on your own)

Chapter 4.12, in detail

Chapter 5.1

Chapter 5.2

Lecture 3 Review

Box plots are probably the best way to compare populations graphically

You can detect shifts and changes in variation

They also identify outliers

Lecture 3 Review

q-q plots are a simple way to understand whether the data are approximately bell-shaped

[Figure: population relative frequency histogram, a bell-shaped curve. x-axis: X, from -4 to 4; y-axis: Normal Density.]

Lecture 3 Review

q-q plots are a simple way to understand whether the data are approximately bell-shaped

If the points fall roughly on a straight line, then treating the population relative frequency histogram as normal is not too far off

q-q plot for the healthy women

[Figure: Normal Q-Q Plot of Log(Saturated Fat). x-axis: Observed Value; y-axis: Expected Normal Value.]

Lecture 3 Review

For bell-shaped populations, we have empirical rules

Approximately 68% (90%) (95%) of the population lies within 1 (1.645) (1.96) population standard deviations of the population mean
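
The three percentages in the empirical rule can be checked numerically. The sketch below is not part of the lecture; it assumes Python's standard-library `statistics.NormalDist`, and the helper name `coverage` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k: float) -> float:
    """Fraction of a normal population within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1.0, 1.645, 1.96):
    print(f"within {k} SD of the mean: {coverage(k):.4f}")
```

The printed fractions should be close to 0.68, 0.90 and 0.95, matching the rule.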

Lecture 3 Review

In many of our examples, we have seen that there look to be differences among populations. How can we tell if the differences are real?

We will say that populations are different if the differences we observe are more than can be expected by sample-to-sample variability.

Lecture 3 Review

A random variable is any outcome (qualitative or numerical) of an experiment involving random sampling from a population

The idea of a model is to write down a formula for the population histogram as a function of one or two parameters, which are estimated from the data.

If you know the parameters of the model, then you know everything about probabilities in that population

Using the Normal Model

The entire point of the normal model is to make probability statements

In practice, we estimate the population mean by the sample mean

We estimate the population standard deviation by the sample standard deviation

Then we estimate probabilities by pretending that the sample quantities equal the population ones

Various Cases

Suppose we want to know what % of a population lies below a specified value, c

We write this by asking: what is

Pr(X < c)

The value c is any arbitrary value, e.g., 6

X is any random variable with a population mean and a population standard deviation

Pr(X < c) for Normal Populations

Compute the z-score:

z = (c - μ)/σ

Look up the value of z in Table 1, page 1091

(white board explanation)

Mechanics

NHANES: suppose healthy women’s ages are normally distributed with mean = 40 and standard deviation = 6

What is the chance that a randomly selected person from this population is aged c = 43.3 or less?

We write this in symbols as pr(X < 43.3)

Mechanics

μ = 40, σ = 6

pr(X < 43.3) is what we want

z = (43.3 - μ)/σ = (43.3 - 40)/6 = 0.55 is the z-score

Look up in Table 1:

The value 0.55 is on page 1092: first column is 0.5, first row is 0.05: add them to get 0.55, and look up the value

Pr(X < 43.3) = 0.7088
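
The table lookup can be cross-checked in software. This is a minimal sketch, not the course's method: it assumes Python's standard-library `statistics.NormalDist` in place of Table 1, and the function name `prob_below` is hypothetical.

```python
from statistics import NormalDist

def prob_below(c: float, mu: float, sigma: float) -> float:
    """Pr(X < c) for a normal population with mean mu and standard deviation sigma."""
    z = (c - mu) / sigma          # z-score, as on the slide
    return NormalDist().cdf(z)    # area to the left of z under the standard normal curve

# Ages of healthy women: mean 40, standard deviation 6 (values from the slide).
p = prob_below(43.3, mu=40, sigma=6)
print(f"Pr(X < 43.3) = {p:.4f}")
```

The result should agree with the Table 1 value of 0.7088 to four decimal places.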

Various Cases

Suppose we want to know what % of a population lies above a specified value, c

We write this by asking: what is

Pr(X > c)

The value c is any arbitrary value, e.g., 6

X is any random variable with a population mean and a population standard deviation

Pr(X > c) for Normal Populations

This is simply 1 – Pr(X <= c).

Compute the z-score z = (c - μ)/σ

Look up the value for z in Table 1

Subtract this value from 1.0

Mechanics

μ = 40, σ = 6

Chance that a randomly selected person from this population is aged 46 or more

pr(X > 46)

z = (46 - μ)/σ = (46 - 40)/6 = 1

Look up in Table 1 for 1.00: get 0.8413

Because you are asking for > 46, subtract from 1 to get pr(X > 46) = 1 – 0.8413 = .1587
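
The "subtract from 1" step for an upper-tail probability can be sketched the same way. The helper `prob_above` is a made-up name, and `statistics.NormalDist` again stands in for Table 1 as an assumption of this example.

```python
from statistics import NormalDist

def prob_above(c: float, mu: float, sigma: float) -> float:
    """Pr(X > c) for a normal population, computed as 1 - Pr(X <= c)."""
    z = (c - mu) / sigma
    return 1.0 - NormalDist().cdf(z)

# Same population as the slide: mean 40, standard deviation 6.
p_above = prob_above(46, mu=40, sigma=6)
print(f"Pr(X > 46) = {p_above:.4f}")
```

This should reproduce the slide's 0.1587 up to rounding.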

Mechanics

μ = 40, σ = 6

Chance that a randomly selected person from this population is aged 46 or less

pr(X <= 46)

z = (46 - μ)/σ = (46 - 40)/6 = 1

Look up in Table 1: chance is 84.13%

Mechanics

μ = 40, σ = 6

Chance that a randomly selected person from this population is aged 34 or less

pr(X <= 34)

z = (34 - μ)/σ = (34 - 40)/6 = -1

Look up in Table 1: chance is 0.1587 = 15.87%
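
It is no accident that this matches the chance of being aged 46 or more: the bell curve is symmetric about the mean, and 34 and 46 are each one standard deviation away from 40. Writing Φ(z) for the Table 1 entry at z (Φ is notation introduced here for illustration, not used on the slides), the chain of equalities is:

```latex
\Pr(X \le 34) = \Phi(-1) = 1 - \Phi(1) = 1 - 0.8413 = 0.1587 = \Pr(X > 46)
```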

Aortic Stenosis Data

Two populations: healthy kids and kids with aortic stenosis

Two outcomes: body surface area (BSA) and aortic valve area (AVA)

Size-adjusted aortic valve area is the ratio of aortic valve area to body surface area

Stenosis Data, AVA to BSA Ratio: Note the huge outlier in the stenotic kids.

He/she has a huge aortic valve area relative to his/her body size

[Figure: box plots of AVA to BSA Ratio by Health Status (Healthy, Stenotic); the N row under the boxes reads 56 and 70.]

Aortic Stenosis Data

Healthy kids and AVA/BSA Ratio

Sample mean = 1.38, s = 0.51

Let’s pretend the population has μ = 1.4, σ = 0.5

As it turns out, the sample mean of stenotic kids is 0.7

So, let’s ask: for healthy kids, what is

pr(X < 0.7)?

Aortic Stenosis Data

Healthy kids and AVA/BSA Ratio

μ = 1.4, σ = 0.5

For healthy kids, what is pr(X <= 0.7)?

z = (0.7 - μ)/σ = (0.7 - 1.4)/0.5 = -1.4

Look up in Table 1

You should get 0.0808

Aortic Stenosis Data

For healthy kids, pr(X <= 0.7) = 0.0808

Stenotic kids have a mean AVA/BSA ratio of 0.7

Thus, the average stenotic kid has a lower AVA/BSA ratio than 91.92% of healthy kids

91.92% = 100% - 8.08%
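
The 8.08% figure depends on pretending that μ = 1.4 and σ = 0.5 exactly. A rough way to see how much that rounding matters is to redo the calculation with the reported sample values (mean 1.38, s = 0.51). The sketch below assumes Python's standard-library `statistics.NormalDist`; the function name `prob_below` is hypothetical.

```python
from statistics import NormalDist

def prob_below(c: float, mu: float, sigma: float) -> float:
    """Pr(X <= c) under a normal model with the given mean and standard deviation."""
    return NormalDist(mu, sigma).cdf(c)

rounded = prob_below(0.7, mu=1.4, sigma=0.5)      # values used on the slide
unrounded = prob_below(0.7, mu=1.38, sigma=0.51)  # sample mean and s as reported

print(f"rounded parameters:   {rounded:.4f}")
print(f"unrounded parameters: {unrounded:.4f}")
```

Both versions put the average stenotic kid in roughly the lowest tenth of healthy kids, so the conclusion does not hinge on the rounding.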

Not all Data are Normally Distributed

“Time to an event”, e.g., time to a heart attack

Number of things that happen, e.g., number of heart attacks

These typically have a skewed shape

[Figure: a skewed density curve. x-axis: X, from -1 to 6; y-axis: DENSITY.]

Not all Data are Normally Distributed

These typically have a skewed shape

Statisticians have special models to handle this (Gamma, Poisson)

You will usually try to eliminate some of the skewness by data transformation

[Figure: a skewed density curve. x-axis: X, from -1 to 6; y-axis: DENSITY.]

Not all Data are Normally Distributed

The standard data transformations are

Square root

Logarithm: but if you have zeros in the data set, you have to add a small constant, since log(0) = -∞
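
To see what these transformations do, here is a small sketch on made-up data. The counts, the constant `shift = 1.0` and the helper `skewness` are all assumptions of this example, not course material, and the skewness formula is the simple moment-based one, which may differ slightly from what statistical packages report.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment divided by (variance ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up, right-skewed counts that include a zero (illustrative only).
counts = [0, 1, 1, 2, 2, 3, 5, 8, 20, 60]

shift = 1.0  # assumed small constant, added so that log(0) is never evaluated
sqrt_counts = [math.sqrt(v) for v in counts]
log_counts = [math.log(v + shift) for v in counts]

for name, data in (("raw", counts), ("sqrt", sqrt_counts), ("log", log_counts)):
    print(f"{name:>4}: skewness = {skewness(data):.2f}")
```

On this data the square root removes some of the skewness and the logarithm removes more. The choice of constant is a judgment call and changes the result.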

Inference

The basic building blocks for inference are statistics

Let’s start with the population mean μ, the sample mean, and the sample standard deviation s

Standard error (of the mean) is s/√n

Inference

The sample mean is a random variable

This means that it varies from sample to sample

Of course, if we were able to “sample” the entire population, the sample mean would equal the population mean

Inference

The sample mean is a random variable

Its own “population” mean is μ

Its standard deviation is σ/√n

Note how the standard deviation of the sample mean becomes smaller as the sample size becomes larger

Why does this make sense?
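
A quick numerical illustration of the formula σ/√n: because n sits under a square root, quadrupling the sample size halves the standard deviation of the sample mean. The value σ = 6 is borrowed from the age example purely for illustration, and the function name `standard_error` is made up.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard deviation of the sample mean for a sample of size n."""
    return sd / math.sqrt(n)

sigma = 6.0  # population standard deviation, borrowed from the age example
for n in (9, 36, 144):
    print(n, standard_error(sigma, n))
```

The three values are 2.0, 1.0 and 0.5: each fourfold increase in n cuts the spread of the sample mean in half.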

Central Limit Theorem

The sample mean is a random variable

Its own “population” mean is μ

Its standard deviation is σ/√n

In “large enough” samples, the sample mean is very nearly normally distributed, i.e., has a bell-shaped histogram

What does this mean?
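
One way to get a feel for the Central Limit Theorem is by simulation. The sketch below is an illustration only, with every setting assumed: it draws repeated samples of size 50 from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen for convenience) and looks at how the sample means behave.

```python
import random
import statistics

def simulate_sample_means(n: int, reps: int, seed: int = 0) -> list:
    """Draw `reps` samples of size n from a skewed population; return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

n = 50
means = simulate_sample_means(n=n, reps=2000)
theoretical_se = 1.0 / n ** 0.5  # sigma / sqrt(n) with sigma = 1

# Fraction of sample means within 1.96 standard errors of the population mean.
within = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in means) / len(means)

print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(theoretical_se, 3))
print("fraction within 1.96 SE:", round(within, 3))
```

Even though individual observations are far from bell-shaped, the sample means should center near 1, have a spread near σ/√n, and fall within 1.96 standard errors of the population mean roughly 95% of the time.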

Warning

It is very easy to lose sight of the fact that the sample mean is itself a random variable

But it is the crucial concept

If I take repeated samples and compute the sample mean each time, I will not get the same number.

Thus, the sample mean is a random variable

Women’s Interview Survey of Health

Funny case-control study

Seemed to indicate that those women who ate a lot of non-chocolate sweets were at higher risk of breast cancer

271 women controls were interviewed about their diets

They completed six 24-hour recalls

Women’s Interview Survey of Health

271 women controls were interviewed about their diets and completed six 24-hour recalls

Hawthorne effect: the more you ask people about their lives, the more they will change

Does this happen here?

If so, we’d expect that their caloric intake decreased the more they were asked about their diet

Women’s Interview Survey of Health

To test the Hawthorne effect, we took the average caloric intake from the first two interviews, and subtracted it from the average caloric intake from the last 2 interviews

X = (average of 5 & 6) – (average of 1 & 2)

Do you think the population mean of X is positive or negative?

WOMEN’S INTERVIEW SURVEY OF HEALTH (WISH)

My guess was that because of various factors (societal pressure, awareness of diet, Hawthorne effect), they would report fewer calories at the second time period

My hypothesis is that the population mean of X is < 0.

WISH: Change in Caloric Intake

[Figure: box plot of Change in mean Energy, N = 271; y-axis from -3000 to 2000.]

Does it look like a big change?

WISH: Change in Calories

[Figure: Normal Q-Q Plot of Change in mean Energy. x-axis: Observed Value; y-axis: Expected Normal Value.]

Does this look straight enough to be happy thinking that X is approximately normally distributed?

WISH

Descriptives: Change in mean Energy (last 2 recalls minus first 2 recalls)

                                      Statistic   Std. Error
Mean                                  -180.1262      37.2202
95% Confidence Interval for Mean
  Lower Bound                         -253.4050
  Upper Bound                         -106.8474
5% Trimmed Mean                       -171.6543
Median                                -128.2150
Variance                               375428.7
Std. Deviation                         612.7223
Minimum                                -2235.00
Maximum                                 1567.96
Range                                   3802.96
Interquartile Range                    838.1900
Skewness                                  -.253         .148
Kurtosis                                   .608         .295

What does an IQR of 838 mean?

WISH

The sample size is n = 271

The sample mean change = -180 calories!

The sample standard deviation = 612

The sample standard error = 37

By the empirical rule, the chance is 95% that the population mean is within 1.96 × 37 ≈ 73 of -180, i.e., roughly between -254 and -106 (the Descriptives output gives -253.4 to -106.8)
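
The interval can be recomputed from the unrounded Descriptives values. This is a sketch of the empirical-rule calculation only; it assumes Python's standard library, and the variable names are made up for illustration.

```python
import math
from statistics import NormalDist

n = 271
mean_change = -180.1262   # sample mean from the Descriptives output
sd_change = 612.7223      # sample standard deviation from the Descriptives output

se = sd_change / math.sqrt(n)        # standard error of the mean, s / sqrt(n)
z = NormalDist().inv_cdf(0.975)      # about 1.96
lower = mean_change - z * se
upper = mean_change + z * se

print(f"SE = {se:.2f}, 95% interval = ({lower:.1f}, {upper:.1f})")
```

This gives a standard error of about 37.22 and an interval of about -253.1 to -107.2, slightly narrower than the -253.4 to -106.8 in the Descriptives output. That small gap would be consistent with the software using a t multiplier rather than 1.96, but that is an assumption about the output, not something the slides state.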

WISH

By the empirical rule, the chance is 95% that the population mean is between -254 and -106

What does this mean?

Is there a Hawthorne effect going on?

Can you attach a probability to this?