Empirical Loop Z-scorescreel/COGS14/Weekly_schedule_files/Week5.pdf · Standard normal curve!...

Post on 11-Oct-2020


5/6/12

Empirical Loop

Hypothesis

Research Design

Collect Data

Descriptive Statistics

Inferential Statistics

Probability

Z-scores: apples vs. oranges

[Figure: distributions of apple sizes and orange sizes. Axis label: fruit diameter in decimeters.]

Z-scores: apples vs. oranges

[Figure: standard normal curve. Axis label: fruit diameter in z-scores, from -4 to 4. 68% of data points fall within 1 SD on either side of the mean.]

Mathematically:

$z = \frac{X - \mu}{\sigma}$

Result: the orange is bigger! (Well, it’s bigger relative to other oranges than the apple is relative to other apples.)

Z-scores: generally speaking

[Figure: standard normal curve over z-scores from -4 to 4. 95% of data points fall between z = -1.96 and z = +1.96. Values beyond those points are really unlikely. Other labels on the slide: “Really rare”, “New York”.]

Mathematically:

$z = \frac{X - \mu}{\sigma}$

So you can calculate a z-score for individual data points: the mean-centered score divided by the standard deviation.

Mathematically:

$z = \frac{X - \mu}{\sigma}$

You can also calculate a z-distribution for a sample, but it means something different: the mean-centered sample average divided by the standard error.

Mathematically:

$z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$

Z-distribution of samples

[Figure: standard normal curve over z-scores from -4 to 4, showing the sampling distribution of the mean. Z-score: # of heads in samples of 100 coin tosses. 95% of data points fall in the central region. Values in the tails are really unlikely to occur for this distribution.]

Mathematically:

$z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$

IMPORTANT: We can do this because we know what the mean and standard error should be for 100 tosses of a fair coin.
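As a rough illustration of the coin-toss case, the sketch below converts a number of heads into a z-score. The function name `heads_z_score` and the example counts are made up for illustration. It works with the count of heads (mean N·p, standard deviation √(N·p·(1-p))) rather than the proportion of heads; both give the same z.

```python
import math

def heads_z_score(heads, n_tosses=100, p_heads=0.5):
    """z-score of an observed number of heads, assuming P(heads) = p_heads."""
    mean = n_tosses * p_heads                             # expected number of heads
    sd = math.sqrt(n_tosses * p_heads * (1 - p_heads))    # spread of the head count
    return (heads - mean) / sd

# Hypothetical samples of 100 tosses of a fair coin
for observed in (50, 55, 60):
    print(observed, heads_z_score(observed))
```

For a fair coin and 100 tosses the mean is 50 heads and the standard deviation is 5 heads, so 60 heads sits 2 standard deviations above the mean, just outside the central 95% region.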

Z-test

Z-value for z-test:

$z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$

Standard error of the mean:

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$

For the same population distribution, a larger sample size results in a smaller standard error. (The more observations, the more accurate your estimate is.)
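To see how the standard error shrinks as the sample grows, here is a minimal sketch. The population standard deviation of 10 and the sample sizes are arbitrary values chosen for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by sqrt(sample size)."""
    return sigma / math.sqrt(n)

SIGMA = 10.0  # hypothetical population standard deviation
for n in (4, 25, 100, 400):
    print(n, standard_error(SIGMA, n))
```

Quadrupling the sample size only halves the standard error, because N enters through its square root.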

When you might use a z-test

•  I’m a farmer. I have developed a new breed of Granny Smith apples, the Great Granny. I want to be able to say that my apples are notably bigger than regular Granny Smiths.

•  You want to know whether UCSD undergrads’ GRE scores are higher than the national average.

•  You have to know sigma (σ)!

•  Agricultural data: probably

•  Standardized tests: definitely

•  Reaction times in a lexical decision experiment? …

•  Spatial frames of reference in residents of Papua New Guinea? …

•  BOLD activation when looking at faces, houses, robots

[Figure: z-table with column labels “area from 0 to z” and “area from z to ∞”.]

What would B be when z=0?

What would C be when z=0?
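The two areas labeled on the table can also be computed directly instead of looked up. This is a sketch for z ≥ 0 using the error function from Python's standard library; the function names are made up here and do not come from the slides.

```python
import math

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2))

def area_z_to_inf(z):
    """Area under the standard normal curve from z out to infinity (for z >= 0)."""
    return 0.5 - area_0_to_z(z)

for z in (0.0, 1.0, 1.96):
    print(z, round(area_0_to_z(z), 4), round(area_z_to_inf(z), 4))
```

The two areas always add up to 0.5, since together they cover the upper half of the curve.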

What if you don’t know sigma?

•  That is the situation most of the time.

•  Statistics to the rescue!

•  If you don’t know sigma, you can estimate it from your own sample.

•  You have to correct for the estimate, of course, using degrees of freedom (df).

•  There’s a sampling distribution like the z distribution, but for unknown population σ: the t-distribution!

•  For extremely large samples, it is essentially the z-distribution.

•  Usually, we don’t have samples big enough to get to the z-distribution, so we use t.

The t distribution

•  Unlike z, there is a different t-distribution for each sample size.

[Figure: t-distributions for several values of df]

Use it just like the z-distribution, but it has heavier tails. More on this next week.
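One way to see the heavier tails is to compare curve heights far from the center. The sketch below uses the standard density formula for Student's t, which the slides do not show, and evaluates it at a point chosen here for illustration (3 units from the mean).

```python
import math

def normal_pdf(x):
    """Height of the standard normal curve at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Height of Student's t curve with df degrees of freedom at x."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

X = 3.0  # a point out in the tail
for df in (2, 5, 30, 1000):
    print(df, round(t_pdf(X, df), 5))
print("z", round(normal_pdf(X), 5))
```

The smaller the df, the more probability sits in the tail; as df grows, the t curve settles onto the z curve.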

Hypothesis testing

[Figure: standard normal curve over z-scores from -4 to 4, with regions labeled “Way.” and “No way.” Captions: some probability distribution; some probability distribution of the mean. Green = your sample. Blue line = your “baseline” µ & σ.]

Empirical Loop

Hypothesis

Research Design

Collect Data

Descriptive Statistics

Inferential Statistics

Probability

Introduction to Hypothesis Testing: The Binomial Test

Binomial Distribution

How many outcomes of “heads” will we get from N flips of a coin with weighting p?

Let’s try it ourselves…
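A minimal sketch of the binomial distribution for this question, using the standard formula C(N, k) · p^k · (1-p)^(N-k). The choice of N = 10 flips and p = 0.5 is arbitrary and only for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

N_FLIPS, P_HEADS = 10, 0.5  # hypothetical values
for k in range(N_FLIPS + 1):
    print(k, round(binomial_pmf(k, N_FLIPS, P_HEADS), 4))
```

Changing `P_HEADS` shifts the peak of the distribution; changing `N_FLIPS` changes how many outcomes are possible.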

Binomial Distribution

Try it yourself at: http://www.adsciengineering.com/bpdcalc/index.php

Hypothesis Testing:

Neyman-Pearson paradigm for hypothesis testing:

1. Assume a probabilistic model for the data (the “null hypothesis”, H0)
2. Define an “alternative hypothesis” H1 (this can be very vague)
3. Define a “decision rule” which specifies which future observations will lead you to reject the null hypothesis
4. Collect observations (data)
5. If the data are highly unlikely in a way that favors your alternative hypothesis, then reject the null hypothesis
6. Otherwise, “retain” the null hypothesis

Hypothesis Testing, translated:

Neyman-Pearson paradigm for hypothesis testing:

1. Null hypothesis H0: What would be the situation if there’s no difference?
2. Alternative hypothesis H1: What would be the situation if there is a difference?
3. Define what numeric outcome would convince you that there is a difference
4. Collect observations (data)
5. If data are highly unlikely given the no-difference scenario, reject the null hypothesis (Yay! Usually.)
6. Otherwise, “retain” the null hypothesis
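The six steps can be sketched for a coin. Every number below is an assumption made up for illustration: 20 flips, a rule of “reject if 15 or more heads”, and an observed count of 16 heads. None of these come from the slides.

```python
from math import comb

def prob_at_least(k_min, n, p):
    """P(k_min or more heads in n flips) for a coin with P(heads) = p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

# Steps 1-2: H0 says the coin is fair; H1 says it favors heads.
NULL_P = 0.5
# Step 3: decision rule, fixed before seeing data (hypothetical cutoff).
N_FLIPS, REJECT_AT = 20, 15
alpha = prob_at_least(REJECT_AT, N_FLIPS, NULL_P)  # chance the rule rejects a fair coin
# Step 4: collect data (made-up observation).
observed_heads = 16
# Steps 5-6: apply the rule.
decision = "reject H0" if observed_heads >= REJECT_AT else "retain H0"
print(round(alpha, 4), decision)
```

The point of fixing the rule first is that alpha is then a property of the rule, not of whatever data happen to come in.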

Hypothesis Testing:

NOTE: Retaining the null hypothesis H0 is NOT the same as proving that H0 is true. It simply means that we didn’t have enough evidence to reject it (e.g., we might have, given more data)

This is analogous to when a jury declares someone “not guilty.” It does not mean that the person is innocent, only that there is not enough evidence to show that she/he is guilty.

Hypothesis Testing:

Possible outcomes:

Type I Error: We reject the null hypothesis even though it’s true

Type II Error: We don’t reject the null hypothesis even though it is NOT true

                 True state of the world
Your decision    H0 is CORRECT             H0 is INCORRECT
Accept H0        Correct decision (1-α)    Type II Error (β)
Reject H0        Type I Error (α)          Correct decision (1-β)

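Hypothesis testing

[Figure: z-scores on some probability distribution. The null hypothesis (H0) curve is defined by µ & σ; green = your sample. The central region is labeled “Way” and both tails “No way”. A criterion line is marked, along with the Type I Error (α) and Type II Error (β) regions.]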

Type I Error: We reject the null hypothesis even though it’s true (“False positive”)

Alpha (α): P(Type I Error)

p-Value: The smallest α you could have used and rejected the null hypothesis given your data

P is for probability

Type II Error: We retain the null hypothesis even though it’s false (“False negative”)

Beta (β): P(Type II Error)

Power: 1-β

We often don’t know what beta is because our alternative hypotheses are too vague.

Hypothesis testing

[Figure: z-scores on some probability distribution. The null hypothesis (H0) curve is defined by µ & σ; green = your sample. A criterion line divides the curves into regions labeled α, 1-α, β, and 1-β (power).]

Hypothesis Testing:

Trade-off between Type I & Type II Error: For a given sample size, the smaller your α, the larger your β

[You can achieve more power by accepting a greater chance of making a false positive]

Effect of Sample Size on Type I and Type II Error: The bigger your sample size, the more power you can achieve w/ fixed α

Question: Is the simulated coin toss at an online casino biased?

A Test:

1. Flip the coin twice
2. If the coin comes up heads both times, we decide it’s a biased magic store coin.

Assume a Fair Coin:

[Figure: probability of each # of heads in two flips, split into Decide “Biased” and Decide “Fair” regions]

α = P(Type I error) = .25

A Better Test:

1. Flip the coin four times
2. If the coin comes up heads all four times, we decide it’s a biased coin.

Assume a Fair Coin:

[Figure: probability of each # of heads in four flips, split into Decide “Cheat” and Decide “Fair” regions]

α = P(Type I error) = .06
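The two α values follow from the “all heads” decision rule: a fair coin lands heads on every one of n flips with probability 0.5^n. A short sketch (the function name is made up here):

```python
def alpha_all_heads(n_flips, p_heads=0.5):
    """P(Type I error) for the rule 'decide biased if every flip is heads',
    i.e. the chance a coin with P(heads) = p_heads gives n_flips heads in a row."""
    return p_heads ** n_flips

print(alpha_all_heads(2))  # two-flip test
print(alpha_all_heads(4))  # four-flip test
```

The four-flip value is 0.0625, which the slide rounds to .06.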

Assessing power:

We can only assess the actual power of a statistical test by imagining what the world might be like (other than like H0)

Assume an unfair coin: suppose P(Heads) = 0.6

[Figure: probability of each # of heads in four flips of the unfair coin, split into Decide “Cheat” and Decide “Fair” regions]

α = P(Type I error) = .06
β = P(Type II error) = .87
Power = 1-β = .13
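These numbers can be reproduced for the four-flip, all-heads rule: power is the chance that the unfair coin (P(heads) = 0.6) actually produces four heads. A minimal sketch, with variable names chosen here for illustration:

```python
N_FLIPS = 4

def prob_all_heads(n_flips, p_heads):
    """Chance that every one of n_flips comes up heads."""
    return p_heads ** n_flips

alpha = prob_all_heads(N_FLIPS, 0.5)   # rule fires although the coin is fair
power = prob_all_heads(N_FLIPS, 0.6)   # rule fires when the coin really is unfair
beta = 1 - power                       # rule misses the unfair coin

print(round(alpha, 2), round(beta, 2), round(power, 2))
```

Power can only be computed because a specific alternative (0.6) was imagined; a vague H1 such as “not fair” gives no single number.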

Hypothesis Testing:

Effect of Increasing Sample Size: By increasing your sample size you can decrease beta without reducing alpha.

A Less Conservative Decision Rule:

[Figure: probability of each # of heads, split into Decide “Cheat” and Decide “Fair” regions]

α = P(Type I error) = .31
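The slide text does not spell out the less conservative rule. One rule that reproduces α = .31 is “decide Cheat if at least 3 of 4 flips are heads”; the sketch below assumes that rule, and also assumes the P(heads) = 0.6 alternative when computing power.

```python
from math import comb

def prob_at_least(k_min, n, p):
    """P(k_min or more heads in n flips) for a coin with P(heads) = p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

# Assumed rule (not stated on the slide): 'Cheat' if 3 or 4 heads out of 4 flips
alpha_loose = prob_at_least(3, 4, 0.5)   # fair coin wrongly called a cheat
power_loose = prob_at_least(3, 4, 0.6)   # unfair coin correctly caught

print(round(alpha_loose, 4), round(power_loose, 4))
```

Under these assumptions the looser rule raises power well above the all-heads rule, at the cost of a much larger α.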

Hypothesis Testing:

Trade off between Type I and Type II Error: You can adjust your decision rule such that it decreases alpha but it will increase beta (and vice-versa).

Empirical Loop

Hypothesis

Research Design

Collect Data

Descriptive Statistics

Inferential Statistics

Probability

Hypothesis Testing

Inferential Statistics

binomial test z-test

Lecture Outline

•  z-test Review & Cohen’s d

•  How many samples do we need?

•  How good is my estimate of the mean?

• What do we do when we don’t know the standard deviation of the null hypothesis?

http://www.news.com.au/perthnow/story/0,21598,22492511-5005375,00.html

Is there a reason why some people perceive clockwise vs. counterclockwise rotation?

Hypothesis Testing Form (n=56)

Null Hyp. (H0):     Data come from a normal distribution with μ=0.5, σ=0.5
Alt. Hyp. (H1):     μ≠0.5
Tail of Test:       two-tailed
Type of Test:       z-test
Alpha Level:        α=0.05
Critical Value(s):  mean S to N > .631 or mean S to N < .369
Observed Value:     37/56 = .661 S to N
Decision:           Reject H0
p-value:            p=.0164
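The entries on the form can be recomputed from the null hypothesis and the observed proportion. This sketch uses `statistics.NormalDist` from Python's standard library (3.8+); the variable names are made up here.

```python
import math
from statistics import NormalDist

MU0, SIGMA0 = 0.5, 0.5        # null hypothesis mean and SD (from the form)
N, ALPHA = 56, 0.05
observed = 37 / 56            # proportion seeing south-to-north rotation

se = SIGMA0 / math.sqrt(N)                      # standard error of the mean
z = (observed - MU0) / se                       # z-score of the sample mean
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)    # two-tailed critical z
lower, upper = MU0 - z_crit * se, MU0 + z_crit * se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p-value

print(round(lower, 3), round(upper, 3))
print(round(z, 2), round(p_value, 4))
print("reject H0" if abs(z) > z_crit else "retain H0")
```

Computed without rounding z first, the p-value comes out slightly below the .0164 on the form, which corresponds to looking up z = 2.40 in a table.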

Cohen’s d: A unitless measure of effect size

see pg. 299

$d = \frac{\bar{x} - \mu}{\sigma}$

$\mu$ = mean of H0

$\sigma$ = standard deviation of H0

$\bar{x}$ = sample mean (i.e., estimate of population mean)

0.20 small

0.50 medium

0.80 large
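As a quick check of the formula, the sketch below applies it to the spinning-dancer data from the hypothesis testing form (37 of 56, with H0 mean and SD both 0.5). The function name is made up here.

```python
def cohens_d(sample_mean, null_mean, null_sd):
    """Effect size: distance of the sample mean from the H0 mean, in H0 SD units."""
    return (sample_mean - null_mean) / null_sd

d = cohens_d(37 / 56, 0.5, 0.5)
print(round(d, 3))
```

The result falls between the “small” and “medium” benchmarks. It differs in the third decimal from a value computed with the observed proportion rounded to .661 first.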

Reporting our results:

We found that participants were significantly more likely to perceive the dancer as spinning south to north (z(n=56)=2.40, p=.0164, d=.322).

Central Limit Theorem

For large n (e.g., 25-100), the sum (or mean) of n independent samples of random variable X is approximately normally distributed.

[Figure: distribution of outcomes for 1 flip]
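The theorem can be checked by simulation. The sketch below draws many samples of fair coin flips and looks at how the sample means are spread; the sample size (100), number of samples (2000), and seed are arbitrary choices for illustration.

```python
import random
import statistics

def sample_means(n_flips, n_samples, seed=0):
    """Proportion of heads in each of n_samples simulated runs of n_flips fair flips."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.random() < 0.5 for _ in range(n_flips)) / n_flips
        for _ in range(n_samples)
    ]

means = sample_means(n_flips=100, n_samples=2000)
print(round(statistics.mean(means), 3))    # should land near 0.5
print(round(statistics.stdev(means), 3))   # should land near 0.5 / sqrt(100) = 0.05

# Share of sample means within 1.96 standard errors of 0.5
within = sum(abs(m - 0.5) <= 1.96 * 0.05 for m in means) / len(means)
print(round(within, 3))
```

A single flip can only be 0 or 1, yet the means of 100 flips pile up in a bell shape around 0.5, with roughly 95% of them inside the ±1.96 standard error band.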

Do UCSD students do better on the SAT than average?

The mean and standard deviation of all SAT scores in the USA for 2007 are 1050 and 70, respectively. To find out if UCSD students do better than average on the SAT, I randomly select 49 students and collect their SAT scores. The mean of those 49 scores is 1090. Can I be 95% sure that UCSD students really are better?
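A worked version of the arithmetic, assuming “95% sure” is read as an upper-tailed z-test with α = 0.05 (critical z ≈ 1.645); that reading is an interpretation, not something the slide states:

```latex
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{70}{\sqrt{49}} = 10
\qquad
z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{1090 - 1050}{10} = 4.0
```

Since 4.0 is far beyond 1.645, the null hypothesis would be rejected under that reading.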

z-Test

Useful for testing hypotheses when:

•  you know the mean and the standard deviation of the null hypothesis

•  your data are normally distributed OR you have a sufficiently large sample size (e.g., 25-100)

z-Test

1.  Decide if you need to do a two-tailed, upper-tailed, or lower-tailed test.

2.  Compute the mean of your data, $\bar{X}$.

3.  Compute the standard error of the mean of the distribution of the null hypothesis.

4.  Convert $\bar{X}$ into a z-score.

5.  If that z-score is more extreme than your critical z-score (in the direction of your test), then reject the null hypothesis.
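The five steps can be sketched as one function. The function name, its parameters, and the `tail` labels are made up for illustration; the critical value comes from `statistics.NormalDist` in Python's standard library (3.8+).

```python
import math
from statistics import NormalDist

def z_test(sample_mean, n, null_mean, null_sd, tail="two", alpha=0.05):
    """Return (z, critical z, reject?) for a z-test of a sample mean.

    tail is 'two', 'upper', or 'lower' (step 1); sample_mean is step 2.
    """
    se = null_sd / math.sqrt(n)              # step 3: standard error of the mean
    z = (sample_mean - null_mean) / se       # step 4: z-score of the sample mean
    std_normal = NormalDist()
    if tail == "two":                        # step 5: compare with the critical z
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "upper":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "lower":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("tail must be 'two', 'upper', or 'lower'")
    return z, critical, reject

# Usage with the SAT example: mean 1090 from 49 students, H0 mean 1050, SD 70
z, critical, reject = z_test(1090, 49, 1050, 70, tail="upper")
print(round(z, 2), round(critical, 3), reject)
```

Keeping the tail as an explicit argument makes step 1 a decision taken before looking at the data, rather than something chosen after seeing which way the mean fell.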

Cohen’s d

A unitless measure of effect size:

•  magnitude of d doesn’t depend on sample size (unlike p-values)

•  useful for getting a sense of how big an effect is, whereas p-values give you a sense of how reliable an effect is

Each of the following statements could inspire a hypothesis test. For each statement, would you use a two-tailed, upper-tailed, or lower-tailed test? State H0 and H1.

a) To increase rainfall, extensive cloud-seeding experiments are to be conducted and the results are to be compared with a baseline figure of 0.54 inches (SD=.11) of rainfall (the amount of rain when cloud seeding wasn’t done).

b) Public health statistics indicate that American males gain an average of 23 pounds (SD=10) during the 20 year period after age 40. An ambitious weight-loss program, spanning 20 years, is being tested with a random sample of 40 year old men.

c) A basketball coach wonders if listening to CDs of positive comments during sleep will affect a player’s performance. On the one hand, it may boost self-confidence and subsequently boost performance. On the other hand it may disturb their sleep and hinder their performance.