Empirical Loop Z-scorescreel/COGS14/Weekly_schedule_files/Week5.pdf · Standard normal curve!...


5/6/12


Empirical Loop

Hypothesis

Research Design

Collect Data

Descriptive Statistics

Inferential Statistics

Probability

[Figure: “Z-scores: apples vs. oranges”. Distributions of apple sizes and orange sizes; x-axis: fruit diameter in decimeters.]

[Figure: “Z-scores: apples vs. oranges” on the standard normal curve. 68% of data points fall within 1 SD on either side of the mean; x-axis: fruit diameter in z-scores.]

Mathematically:

$z = \frac{X - \mu}{\sigma}$

Result: the orange is bigger! (Well, it’s bigger relative to other oranges than the apple is relative to other apples.)

Really rare

New York

[Figure: “Z-scores: generally speaking” on the standard normal curve. 95% of data points fall between z = -1.96 and z = +1.96; the regions beyond are labeled “Really unlikely”. x-axis: z-scores.]

Mathematically:

$z = \frac{X - \mu}{\sigma}$


So you can calculate a z-score for individual data points.

Mean-centered score divided by standard deviation

Mathematically:

$z = \frac{X - \mu}{\sigma}$

You can also calculate a z-distribution for a sample, but it means something different.

Mean-centered sample average divided by standard error

Mathematically:

$z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$

[Figure: “Z-distribution of samples” on the standard normal curve, captioned “Sampling distribution of the mean”. x-axis: z-score for the # of heads in samples of 100 coin tosses. 95% of data points fall in the central region; the tails are labeled “Really unlikely to occur for this distribution”.]

Mathematically:

$z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$

IMPORTANT: We can do this because we know what the mean and standard error should be for 100 tosses of a fair coin.
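
As a rough illustration of the point about 100 tosses of a fair coin, the sketch below computes a z-score for a count of heads. The mean (N·p = 50) and standard deviation (√(N·p·(1−p)) = 5) come from the binomial distribution; the function name `heads_z_score` and the example count of 60 heads are assumptions made for illustration and do not appear in the slides.

```python
import math

def heads_z_score(heads, n_tosses=100, p_heads=0.5):
    """z-score of an observed number of heads, given a coin with P(heads) = p_heads."""
    mean = n_tosses * p_heads                            # expected number of heads
    sd = math.sqrt(n_tosses * p_heads * (1 - p_heads))   # spread of the head count
    return (heads - mean) / sd

# Hypothetical observation: 60 heads in 100 tosses of a supposedly fair coin.
z = heads_z_score(60)
print(z)
```

A value beyond ±1.96 would land in the “really unlikely” tails of the sampling distribution.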

Z-test

Z-value for z-test:

$z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$

For the same population distribution, a larger sample size results in a smaller standard error. (The more observations, the more accurate your estimate is.)

Standard error of estimate:

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$
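
To make the effect of sample size concrete, here is a small sketch of the standard error formula. The population standard deviation of 10 and the sample sizes are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (4, 25, 100):
    # Larger samples give a smaller standard error for the same sigma.
    print(n, standard_error(sigma, n))
```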

When you might use a z-test

•  I’m a farmer. I have developed a new breed of Granny Smith apples, the Great Granny. I want to be able to say that my apples are notably bigger than regular Granny Smiths.

•  You want to know whether UCSD undergrads’ GRE scores are higher than the national average.

•  You have to know sigma (σ)!

•  Agricultural data: probably

•  Standardized tests: definitely

•  Reaction times in a lexical decision experiment? …

•  Spatial frames of reference in residents of Papua New Guinea? …

•  BOLD activation when looking at faces, houses, robots


area from 0 to z

area from z to ∞

What would B be when z=0?

What would C be when z=0?
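
The two areas named above can be computed directly rather than looked up. This is a minimal sketch using the error function from Python's standard library; the function names are made up for illustration, and whether B and C correspond to these two areas is an assumption, since the table itself did not survive in this transcript.

```python
import math

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2))

def area_z_to_infinity(z):
    """Area under the standard normal curve from z out to infinity."""
    return 0.5 - area_zero_to_z(z)

for z in (0.0, 1.0, 1.96):
    print(z, round(area_zero_to_z(z), 4), round(area_z_to_infinity(z), 4))
```

At z = 0 the first area is 0 and the second is 0.5, since half of the curve lies above the mean.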

What if you don’t know sigma?

•  Most of the time.

•  Statistics to the rescue!

•  If you don’t know sigma, you can estimate it from your own sample.

•  You have to correct for it, of course. (df)

•  There’s a sampling distribution like the z distribution, but for unknown population σ: the t-distribution!

•  For extremely large samples, the t-distribution is essentially the z-distribution.

•  Usually, we don’t have samples big enough to get to the z-distribution, so we use t.

The t distribution

•  Unlike z, there is a different t-distribution for each sample size.

[Figure: t-distribution curves, each labeled with its df.]

Use just like the z-distribution, but it has heavier tails. More on this next week.
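
The “heavier tails” claim can be checked numerically. The sketch below compares the height of the t density with the standard normal density at x = 3, using the textbook density formulas; the choice of x = 3 and of df = 2, 10, 100 is arbitrary and only for illustration.

```python
import math

def normal_pdf(x):
    """Density of the standard normal (z) distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom at x."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

# Far from the center, t has more density than z, and the gap shrinks as df grows.
for df in (2, 10, 100):
    print(df, t_pdf(3.0, df), normal_pdf(3.0))
```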


[Figure: “Hypothesis testing” on the standard normal curve (x-axis: z-scores), shown for some probability distribution and for some probability distribution of the mean. Green = your sample; blue line = your “baseline” µ & σ. Regions are labeled “Way.” and “No way.”]

Empirical Loop

Hypothesis

Research Design

Collect Data

Descriptive Statistics

Inferential Statistics

Probability

Introduction to Hypothesis Testing: The Binomial Test

Binomial Distribution

How many outcomes of “heads” will we get from N flips of a coin with weighting p?

Let’s try it ourselves…
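
One way to try it is to compute the binomial probabilities directly. The sketch below uses the standard binomial formula; the function name `binomial_pmf` and the choice of N = 10 flips with p = 0.5 are assumptions for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # hypothetical example: 10 flips of a fair coin
for k in range(n + 1):
    print(k, round(binomial_pmf(k, n, p), 4))
```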


Binomial Distribution

Try it yourself at: http://www.adsciengineering.com/bpdcalc/index.php

Hypothesis Testing:

Neyman-Pearson paradigm for hypothesis testing:

1. Assume a probabilistic model for the data (the “null hypothesis”, H0)

2. Define an “alternative hypothesis” H1 (this can be very vague)

3. Define a “decision rule” which specifies which future observations will lead you to reject the null hypothesis

4. Collect observations (data)

5. If the data are highly unlikely in a way that favors your alternative hypothesis, then reject the null hypothesis

6. Otherwise, “retain” the null hypothesis

Hypothesis Testing, translated:

Neyman-Pearson paradigm for hypothesis testing:

1. Null hypothesis H0: What would be the situation if there’s no difference?

2. Alternative hypothesis H1: What would be the situation if there is a difference?

3. Define what numeric outcome would convince you that there is a difference

4. Collect observations (data)

5. If data are highly unlikely given the no-difference scenario, reject the null hypothesis (Yay! Usually.)

6. Otherwise, “retain” the null hypothesis


Hypothesis Testing:

NOTE: Retaining the null hypothesis H0 is NOT the same as proving that H0 is true. It simply means that we didn’t have enough evidence to reject it (e.g., we might have, given more data)

This is analogous to when a jury declares someone “not guilty.” It does not mean that the person is innocent, only that there is not enough evidence to show that she/he is guilty.


Possible outcomes:

Type I Error: We reject the null hypothesis even though it’s true

Type II Error: We don’t reject the null hypothesis even though it is NOT true

Hypothesis Testing:

                 True state of the world
Your decision    H0 is CORRECT             H0 is INCORRECT
Accept H0        Correct decision (1-α)    Type II Error (β)
Reject H0        Type I Error (α)          Correct decision (1-β)

[Figure: “Hypothesis testing”. Some probability distribution plotted in z-scores: the null hypothesis (H0) with its µ & σ, with your sample shown in green. The central region is labeled “Way” and both tails “No way”. A criterion is marked, along with Type I Error (α) and Type II Error (β).]


Type I Error: We reject the null hypothesis even though it’s true (“False positive”)

Alpha (α): P(Type I Error)

p-Value: The smallest α you could have used and still rejected the null hypothesis given your data

Hypothesis Testing:

                 True state of the world
Your decision    H0 is CORRECT             H0 is INCORRECT
Accept H0        Correct decision (1-α)    Type II Error (β)
Reject H0        Type I Error (α)          Correct decision (1-β)

P is for probability


Type II Error: We retain the null hypothesis even though it’s false (“False negative”)

Beta (β): P(Type II Error)

Power: 1-β

We often don’t know what beta is because our alternative hypotheses are too vague.

Hypothesis Testing:

                 True state of the world
Your decision    H0 is CORRECT             H0 is INCORRECT
Accept H0        Correct decision (1-α)    Type II Error (β)
Reject H0        Type I Error (α)          Correct decision (1-β)

[Figure: “Hypothesis testing”. Some probability distribution plotted in z-scores: the null hypothesis (H0) with its µ & σ, with your sample shown in green. A criterion divides the areas labeled 1-α, α, β, and 1-β (power).]

Trade off between Type I & Type II Error: The smaller your α, the larger your β

[You can achieve more power by accepting a greater chance of making a false positive]

Hypothesis Testing:

Effect of Sample Size on Type I and Type II Error: The bigger your sample size, the more power you can achieve with a fixed α


Question: Is the simulated coin toss at an online casino biased?


A Test:

1. Flip the coin twice

2. If the coin comes up heads both times, we decide it’s a biased magic store coin.

Assume a Fair Coin:

[Figure: probability of each number of heads (x-axis: # of heads; y-axis: probability), split into regions Decide “Biased” and Decide “Fair”.]

α = P(Type I error) = .25

A Better Test:

1. Flip the coin four times

2. If the coin comes up heads all four times, we decide it’s a biased coin.

Assume a Fair Coin:

[Figure: probability of each number of heads (x-axis: # of heads; y-axis: probability), split into regions Decide “Cheat” and Decide “Fair”.]

α = P(Type I error) = .06
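
The α values for the two-flip and four-flip tests can be reproduced in a couple of lines. This is a minimal sketch; the function name `alpha_all_heads` is hypothetical. Under a fair coin, the chance of heads on every flip is 0.5 raised to the number of flips.

```python
def alpha_all_heads(n_flips, p_heads=0.5):
    """P(Type I error) for the rule 'decide biased if every flip is heads'."""
    return p_heads ** n_flips

print(alpha_all_heads(2))  # 0.25
print(alpha_all_heads(4))  # 0.0625, shown as .06 after rounding
```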


Assessing power:

We can only assess the actual power of a statistical test by imagining what the world might be like (other than like H0)

Assume an unfair coin: suppose P(Heads) = 0.6

[Figure: probability of each number of heads (x-axis: # of heads; y-axis: probability), split into regions Decide “Cheat” and Decide “Fair”.]

α = P(Type I error) = .06

β = P(Type II error) = .87

Power = 1-β = .13
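
The β and power figures follow from the same all-heads rule applied to the imagined unfair coin. A short sketch, assuming the four-flip test and P(Heads) = 0.6 as stated above; the function name `prob_all_heads` is hypothetical.

```python
def prob_all_heads(n_flips, p_heads):
    """Chance that every one of n_flips comes up heads, i.e. that the rule rejects H0."""
    return p_heads ** n_flips

alpha = prob_all_heads(4, 0.5)  # rejecting H0 when the coin is actually fair
power = prob_all_heads(4, 0.6)  # rejecting H0 when the coin has P(heads) = 0.6
beta = 1 - power                # failing to reject H0 for that unfair coin
print(round(alpha, 2), round(beta, 2), round(power, 2))
```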


Hypothesis Testing:

Effect of Increasing Sample Size: By increasing your sample size you can decrease beta without reducing alpha.

A Less Conservative Decision Rule:

[Figure: probability of each number of heads (x-axis: # of heads; y-axis: probability), split into regions Decide “Cheat” and Decide “Fair”.]

α = P(Type I error) = .31
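
The slide does not spell out the less conservative rule. One rule consistent with α = .31 is “decide cheat if 3 or more of 4 flips are heads”, and the sketch below assumes it. The function name and the reuse of P(Heads) = 0.6 for the unfair coin are likewise assumptions for illustration.

```python
from math import comb

def rejection_probability(n_flips, min_heads, p_heads):
    """P(at least min_heads heads in n_flips): the chance the rule decides 'cheat'."""
    return sum(
        comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(min_heads, n_flips + 1)
    )

# ASSUMPTION: the less conservative rule is "3 or more heads out of 4 flips".
alpha = rejection_probability(4, 3, 0.5)  # false-positive rate under a fair coin
power = rejection_probability(4, 3, 0.6)  # hypothetical unfair coin, P(heads) = 0.6
beta = 1 - power
print(round(alpha, 4), round(power, 4), round(beta, 4))
```

Under these assumptions α rises from about .06 to about .31 while β falls, which is the trade-off between the two error types.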


Hypothesis Testing:

Trade off between Type I and Type II Error: You can adjust your decision rule such that it decreases alpha but it will increase beta (and vice-versa).


Empirical Loop

Hypothesis

Research Design

Collect Data

Descriptive Statistics

Inferential Statistics

Probability

Hypothesis Testing

Inferential Statistics

binomial test z-test

Lecture Outline

•  z-test Review & Cohen’s d

•  How many samples do we need?

•  How good is my estimate of the mean?

• What do we do when we don’t know the standard deviation of the null hypothesis?


http://www.news.com.au/perthnow/story/0,21598,22492511-5005375,00.html

Is there a reason why some people perceive clockwise vs. counterclockwise rotation?

Hypothesis Testing Form (n=56)

Null Hyp. (H0)       Data come from a normal distribution with μ=0.5, σ=0.5
Alt. Hyp. (H1)       μ≠0.5
Tail of Test         two-tailed
Type of Test         z-test
Alpha Level          α=0.05
Critical Value(s)    mean S to N>.631 or mean S to N<.369
Observed Value       37/56=.661 S to N
Decision             Reject H0
p-value              p=.0164
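
The entries on the form can be checked with a short calculation. The sketch below assumes the standard two-tailed critical z of 1.96 for α = 0.05 and computes the normal CDF from the error function; the variable names are illustrative. The p-value it prints may differ from the form in the last digit because the form appears to round z to 2.40 first.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution up to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n = 56
mu0, sigma0 = 0.5, 0.5   # mean and standard deviation under H0
observed = 37 / 56       # proportion reporting south-to-north (S to N)
z_crit = 1.96            # ASSUMPTION: two-tailed critical z for alpha = 0.05

se = sigma0 / math.sqrt(n)             # standard error of the mean under H0
lower = mu0 - z_crit * se              # lower critical value for the sample mean
upper = mu0 + z_crit * se              # upper critical value for the sample mean
z = (observed - mu0) / se              # observed z-score
p_value = 2 * (1 - normal_cdf(abs(z))) # two-tailed p-value
reject = abs(z) > z_crit

print(round(lower, 3), round(upper, 3), round(z, 2), round(p_value, 4), reject)
```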


Cohen’s d: A unitless measure of effect size

see pg. 299

$d = \frac{\bar{x} - \mu}{\sigma}$

µ = mean of H0

σ = standard deviation of H0

x̄ = sample mean (i.e., estimate of population mean)

0.20 small

0.50 medium

0.80 large
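
As a quick check of the formula, the sketch below applies it to the spinning-dancer data from the hypothesis testing form (37 of 56 responses, with H0 mean and standard deviation both 0.5). The function name `cohens_d` is an illustrative choice.

```python
def cohens_d(sample_mean, mu0, sigma0):
    """Effect size: distance of the sample mean from the H0 mean, in H0 standard deviations."""
    return (sample_mean - mu0) / sigma0

d = cohens_d(37 / 56, 0.5, 0.5)
print(round(d, 3))
```

The result falls between the “small” and “medium” benchmarks, and it would stay the same if the identical proportion came from a larger sample.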


Reporting our results:

We found that participants were significantly more likely to perceive the dancer as spinning south to north (z(n=56)=2.40, p=.0164, d=.322).

Central Limit Theorem

For large n (e.g., 25-100), the sum (or mean) of n independent samples of random variable X is approximately normally distributed.

[Figure: panel labeled “1 Flip”.]
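
A small simulation can make the theorem tangible. The sketch below draws many samples of 100 fair coin flips and summarizes the sample means; the seed, the number of samples (2000), and the helper name are arbitrary choices for illustration. If the theorem holds, the means should cluster around 0.5 with a spread near σ/√n = 0.5/√100 = 0.05.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def mean_of_flips(n_flips):
    """Proportion of heads in n_flips tosses of a simulated fair coin."""
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

means = [mean_of_flips(100) for _ in range(2000)]
mean_of_means = statistics.mean(means)
sd_of_means = statistics.stdev(means)
print(round(mean_of_means, 3), round(sd_of_means, 3))
```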

Do UCSD students do better on the SAT than average?

The mean and standard deviation of all SAT scores in the USA for 2007 are 1050 and 70, respectively. To find out if UCSD students do better than average on the SAT, I randomly select 49 students and collect their SAT scores. The mean of those 49 scores is 1090. Can I be 95% sure that UCSD students really are better?


z-Test

Useful for testing hypotheses when:

•  you know the mean and the standard deviation of the null hypothesis

•  your data are normally distributed OR you have a sufficiently large sample size (e.g., 25-100)

z-Test

1.  Decide if you need to do a two-tailed, upper-tailed, or lower-tailed test.

2.  Compute the mean of your data, X̄.

3.  Compute the standard error of the mean of the distribution of the null hypothesis.

4.  Convert X̄ into a z-score.

5.  If the z-score of X̄ is more extreme than your critical z-score, then reject the null hypothesis.
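
The steps above can be sketched with the SAT question posed earlier (H0 mean 1050, standard deviation 70, a sample of 49 students with mean 1090). Treating “95% sure” as an upper-tailed test with α = 0.05 and a critical z of 1.645 is an interpretation made for this sketch, not something the slides state.

```python
import math

# Step 1: upper-tailed test, since the question is whether UCSD students score higher.
mu0, sigma0 = 1050, 70   # mean and standard deviation under H0
n = 49
z_crit = 1.645           # ASSUMPTION: upper-tailed critical z for alpha = 0.05

# Step 2: mean of the data (given in the example rather than computed from raw scores).
sample_mean = 1090

# Step 3: standard error of the mean under H0.
se = sigma0 / math.sqrt(n)

# Step 4: convert the sample mean into a z-score.
z = (sample_mean - mu0) / se

# Step 5: compare the z-score with the critical z-score.
reject = z > z_crit
print(se, z, reject)
```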


Cohen’s d

A unitless measure of effect size:

•  magnitude of d doesn’t depend on sample size (unlike p-values)

•  useful for getting a sense of how big an effect is, whereas p-values give you a sense of how reliable an effect is

Each of the following statements could inspire a hypothesis test. For each statement, would you use a two-tailed, upper-tailed, or lower-tailed test? State H0 and H1.

a) To increase rainfall, extensive cloud-seeding experiments are to be conducted and the results are to be compared with a baseline figure of 0.54 inches (SD=.11) of rainfall (the amount of rain when cloud seeding wasn’t done).

b) Public health statistics indicate that American males gain an average of 23 pounds (SD=10) during the 20 year period after age 40. An ambitious weight-loss program, spanning 20 years, is being tested with a random sample of 40 year old men.

c) A basketball coach wonders if listening to CDs of positive comments during sleep will affect a player’s performance. On the one hand, it may boost self-confidence and subsequently boost performance. On the other hand it may disturb their sleep and hinder their performance.