
Last Time

• Hypothesis Testing
– Yes – No Questions
– Assess with p-value P[what saw or m.c. | Boundary]
– Interpretation
– Small is conclusive
– 1-sided vs. 2-sided

Administrative Matters

Midterm I, coming Tuesday, Feb. 24

• Numerical answers:
– No computers, no calculators
– Handwrite Excel formulas (e.g. =9+4^2)
– Don’t do arithmetic (e.g. use such formulas)

• Bring with you:
– 8.5 x 11 inch sheet of paper
– With your favorite info (formulas, Excel, etc.)

• Course in Concepts, not Memorization

Administrative Matters

State of BlackBoard Discussion Board

• Generally happy with result

• But think carefully about “where to post”
– Look at current Thread HW 4
– Note “diffusion of questions”
– Hard to find what you want

• Suggest keep HW problems all together
– i.e. One “Root node” per HW problem

Administrative Matters

State of BlackBoard Discussion Board

• Suggest keep HW problems all together
– i.e. One “Root node” per HW problem

• Choose where to post (in tree) carefully

• Use better “Subject Lines”
– Not just dumb “Replies”
– You can enter anything you want
– Try to make it clear to readers…
– Especially when “not following current line”

Reading In Textbook

Approximate Reading for Today’s Material:

Pages 261-262, 9-14

Approximate Reading for Next Class:

270-276, 30-34

Hypothesis Testing

In General:

p-value = P[what was seen, or more conclusive | at boundary between H0 & H1]

Caution: more conclusive requires careful interpretation

Reason: Need to decide between

1-sided Hypotheses, like H0 : p < vs. H1: p ≥

And 2-sided Hypotheses, like H0 : p = vs. H1: p ≠

Hypothesis Testing

e.g. a slot machine bears a sign which says “Win 30% of the time”

In 10 plays, I don’t win any.

Can I conclude sign is false? (& thus have grounds for complaint, or is this a reasonable occurrence?)

Let p = P[win], let X = # wins in 10 plays

Model: X ~ Bi(10, p)

Test: H0: p = 0.3 vs. H1: p ≠ 0.3

Hypothesis Testing

Test: H0: p = 0.3 vs. H1: p ≠ 0.3

p-value = P[X = 0 or more conclusive | p = 0.3]

(understand this by visualizing the number line)

[Number line: 0 1 2 3 4 5 6]

3 = 30% of 10 is most likely when p = 0.3, i.e. least conclusive

so more conclusive includes values at least as far below 3 as the observed 0

but since 2-sided, also include values equally far above 3

Hypothesis Testing

Generally how to calculate?

[Number line: 0 1 2 3 4 5 6, marking the Observed Value and the Most Likely Value]

# spaces = 3 (between Observed Value and Most Likely Value)

so go 3 spaces in other direction

Hypothesis Testing

Result: More conclusive means X ≤ 0 or X ≥ 6

p-value = P[X = 0 or more conclusive | p = 0.3]

= P[X ≤ 0 or X ≥ 6 | p = 0.3]

= P[X ≤ 0] + (1 – P[X ≤ 5])

= 0.076

Excel result from: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg4.xls
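The number-line rule can also be checked outside Excel. The following is a minimal Python sketch, not the course's spreadsheet: the function names (`binom_pmf`, `binom_cdf`, `two_sided_p_value`) are hypothetical, and it assumes that n × p is a whole number so that it can serve as the "most likely value".

```python
from math import comb


def binom_pmf(k, n, p):
    """P[X = k] for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Bi(n, p), the quantity a cumulative BINOMDIST call returns."""
    if k < 0:
        return 0.0
    return sum(binom_pmf(j, n, p) for j in range(min(k, n) + 1))


def two_sided_p_value(observed, n, p0):
    """Two-sided p-value via the 'count the spaces' rule on the number line.

    Assumption: n * p0 is a whole number (e.g. n = 10, p0 = 0.3 gives 3),
    so it is taken as the most likely value.
    """
    center = round(n * p0)            # most likely value under H0
    spaces = abs(observed - center)   # distance from observed to most likely
    if spaces == 0:
        return 1.0                    # every outcome is "at least as conclusive"
    lower = center - spaces           # go 'spaces' steps down ...
    upper = center + spaces           # ... and the same number of steps up
    # P[X <= lower] + (1 - P[X <= upper - 1])
    return binom_cdf(lower, n, p0) + (1.0 - binom_cdf(upper - 1, n, p0))


# Slot machine: 0 wins in 10 plays, H0: p = 0.3
print(round(two_sided_p_value(0, 10, 0.3), 3))  # 0.076
```

For the slot machine this reproduces P[X ≤ 0] + (1 – P[X ≤ 5]) = 0.076.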

Hypothesis Testing

Test: H0: p = 0.3 vs. H1: p ≠ 0.3

p-value = 0.076

Yes-No Conclusion: 0.076 > 0.05, so not safe to conclude that the “P[win] = 0.3” sign is wrong, at level 0.05

(10 straight losses is reasonably likely)

Gray Level Conclusion: in “fuzzy zone”, some evidence, but not too strong

Hypothesis Testing

Alternate Question: Same setup, can we conclude: P[win] < 30% ???

• Seems like same question?

• Careful, “≠” became “<”

• I.e. 2-sided hypo became 1-sided hypo

• Difference can have major impact

Hypothesis Testing

Alternate Question: Same setup, can we conclude: P[win] < 30% ???

Test: H0: p ≥ 0.3 vs. H1: p < 0.3

p-value = P[X = 0 or m. c. | p = 0.3] (same boundary between H0 & H1)

= P[X ≤ 0 | p = 0.3] = 0.028

Excel result from: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg4.xls
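For the 1-sided test only the lower tail counts, so the calculation is shorter. A small Python sketch of the same number (the helper name `binom_cdf` is an illustrative choice, not taken from the course spreadsheet):

```python
from math import comb


def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Bi(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


n, p0, observed = 10, 0.3, 0

# H1: p < 0.3, so "more conclusive" means even fewer wins: X <= observed
p_one_sided = binom_cdf(observed, n, p0)
print(round(p_one_sided, 3))  # 0.028
```

With 0 wins observed this is just P[X = 0] = 0.7^10.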

Hypothesis Testing

Alternate Question: Same setup, can we conclude: P[win] < 30% ???

p-value = 0.028

Yes-No: Now can conclude P[win] < 30%

Paradox of Yes-No Approach:

• Have strong evidence that P[win] < 30%

• But cannot conclude P[win] diff’t from 30%

• Different from Common Sense

• I.e. “logic of statistical significance” different from “ordinary logic”

• Reason: for 2-sided, uncertainty comes from both sides, just adds to gray level

Hypothesis Testing

Alternate Question: Same setup, can we conclude: P[win] < 30% ???

p-value = 0.028

Yes-No: Now can conclude P[win] < 30%

Gray Level: Evidence still flaky, but stronger

• Note: No gray level paradox

• Since no cutoff, just “somewhat stronger…”

• This is why I recommend gray level

Hypothesis Testing

Lessons: 1-sided vs. 2-sided issues need:

1. Careful Implementation (strongly affects answer)

2. Careful Interpretation (notion of “P[win] ≠ 30%” being tested is different from usual)

But not so bad with Gray Level interpretation:

“very strong”: p-val < 0.01

“marginal” – “flaky”: 0.01 ≤ p-val ≤ 0.1

“very weak”: 0.1 < p-val
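The two viewpoints can be put side by side in a few lines of Python. This is only a sketch: the function names `gray_level` and `yes_no` are made up for illustration, the cutoffs are the ones listed above, and the default level 0.05 is the one used in the slot machine example.

```python
def gray_level(p_value):
    """Gray-level reading of a p-value, using the cutoffs from the slide."""
    if p_value < 0.01:
        return "very strong"
    if p_value <= 0.1:
        return "marginal / flaky"
    return "very weak"


def yes_no(p_value, level=0.05):
    """Yes-no reading: True means 'safe to conclude H1' at the given level."""
    return p_value < level


for label, p_value in [("2-sided", 0.076), ("1-sided", 0.028)]:
    print(label, p_value, yes_no(p_value), gray_level(p_value))
```

Both slot machine p-values land in the same gray-level band, while the yes-no answers differ, which is the paradox discussed earlier.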

Hypothesis Testing

HW C14: Answer from both gray-level and yes-no viewpoints:

(c) A TV ad claims that 30% of people prefer Brand X. Should we dispute this claim if a random sample of 10 people shows:

(i) 2 people who prefer Brand X (p-val = 0.733)

(ii) 3 people who prefer Brand X (p-val = 1)

(iii) 6 people who prefer Brand X (p-val = 0.076)

(iv) 10 people who prefer Brand X (p-val = 5.9e-6)

Hypothesis Testing

HW C14: Answer from both gray-level and yes-no viewpoints:

(d) A manager asks 12 workers, of whom 7 say they are satisfied with working conditions. Does this contradict the CEO’s claim that ¾ of the workers are satisfied?

(p-val = 0.316)

Hypothesis Testing

HW:

8.22a, ignore “z statistic” (p-val = 0.006)

8.29a, ignore “sketch …” (p-val = 0.184)

And now for something completely different

Coin tossing & die rolling:

• Useful thought models in this course

• We’ve calculated various probabilities

• Model for “randomness”…

• But how random are they really?

And now for something completely different

Randomness in coin tossing:

• Excellent source

• Prof. Persi Diaconis (Stanford U.)

http://www-stat.stanford.edu/~cgates/PERSI/

• Trained as performing magician

• Legendary Trick:

– He tosses coin, you call it, he catches it!

• Coin tosses not really random

And now for something completely different

Randomness in die rolling?

Big Picture

• Hypothesis Testing (Given dist’n, answer “yes-no”)

Can solve using BINOMDIST

• Margin of Error (Find dist’n, use to measure error)

• Choose Sample Size (for given amount of error)

Margin of Error and Choose Sample Size: Need better prob. tools

Start with visualizing probability distributions (key to “alternate representation”)

Visualization

Idea: Visually represent “distributions” (2 types)

a) Probability Distributions (e.g. Binomial)

Summarized by f(x)

b) Lists of numbers, x1, x2, …, xn

Use subscripts to index different ones

Visualization

Examples of lists: (will often use below)

1. Collection of “#’s of Males”, from HW ???

2. 2.3, 4.5, 4.7, 4.8, 5.1

… (there are many others)

Visualization

Connections between prob. dist’ns and lists:

(i) Given dist’n, can construct a related list by drawing sample values from dist’n

e.g. Bi(1,0.5) (toss coins, count H’s)

1, 1, 1, 0, 0, 0, 1

Visualization

Connections between prob. dist’ns and lists:

(ii) Given a list, x1, x2, …, xn, (not thinking of these as random, so use lower case)

can construct a dist’n:

$\hat{f}(x) = \frac{\#\{i : x_i = x\}}{n}$

Use different symbol, to distinguish from f

Use “hat” to indicate “estimate”

E.g. For above list: 1, 1, 1, 0, 0, 0, 1

$\hat{f}(x) = \begin{cases} 4/7 & x = 1 \\ 3/7 & x = 0 \\ 0 & \text{otherwise} \end{cases}$

Called the “empirical prob. dist’n” or “frequency distribution”

Provides probability model for: choose random number from list
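The empirical distribution of a list is just "count how often each value appears, divide by n". A short Python sketch for the coin-toss list, where the function name `empirical_dist` is a hypothetical choice and exact fractions are used to match the 4/7 and 3/7 of the example:

```python
from collections import Counter
from fractions import Fraction


def empirical_dist(values):
    """Map each distinct value x to (number of list entries equal to x) / n."""
    n = len(values)
    counts = Counter(values)
    return {x: Fraction(count, n) for x, count in counts.items()}


f_hat = empirical_dist([1, 1, 1, 0, 0, 0, 1])
print(f_hat[1], f_hat[0])  # 4/7 3/7
print(f_hat.get(2, 0))     # 0 (any value not in the list)
```

The probabilities always add up to 1, so this is a genuine probability distribution: the one for choosing an entry of the list at random.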

Visualization

Note: if start with f(x), and draw random sample, X1, X2, …, Xn, (as in (i))

(random, so use capitals)

And construct frequency distribution, $\hat{f}(x)$

Then for n large, $\hat{f}(x) \approx f(x)$

(so “hat” notation is sensible)

– Recall “frequentist interpretation” of probability

– Can make precise, using $\lim_{n \to \infty}$
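A quick simulation illustrates the large-n claim. This is a sketch under assumptions not in the slides: the distribution Bi(10, 0.3), the sample sizes, the seed, and the names `binom_pmf` and `max_gap` are all chosen here for illustration.

```python
import random
from collections import Counter
from math import comb


def binom_pmf(k, n, p):
    """f(k) = P[X = k] for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_gap(num_draws, trials=10, p=0.3, seed=0):
    """Largest |f_hat(k) - f(k)| over k, for a random sample of size num_draws."""
    rng = random.Random(seed)
    # Each draw counts successes in 'trials' independent plays, i.e. one Bi(trials, p) value
    draws = [sum(rng.random() < p for _ in range(trials)) for _ in range(num_draws)]
    counts = Counter(draws)
    return max(
        abs(counts[k] / num_draws - binom_pmf(k, trials, p))
        for k in range(trials + 1)
    )


for num_draws in (10, 100, 1000, 10000):
    print(num_draws, round(max_gap(num_draws), 4))
```

The printed gap should tend to shrink as the sample size grows, which is the sense in which the frequency distribution estimates f(x).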

Visualization

Simple visual representation for lists:

Use number line, put x’s

E.g. 2 (above): 2.3, 4.5, 4.7, 4.8, 5.1

[Number line from 2 to 6, with an x marked at each data value]

• Picture already gives better impression than list of numbers

• Will be much better when lists become “too long to comprehend”

Visualization

Drawbacks of: Number line, & x’s

When have many data points:

• Hard to construct

• Can’t see all (overplotting)

• Hard to interpret

Visualization

Alternatives (Text, Sec. 1.1):

• Stem and leaf plots

– Clever visualization, for only pencil & paper

– But we have computers

– So won’t study further

• Histograms

– Will study carefully

Statistical Folklore

Graphical Displays:

• Important Topic in Statistics

• Has large impact

• Need to think carefully to do this

• Watch for attempts to fool you

Statistical Folklore

Graphical Displays: Interesting Article:

“How to Display Data Badly”

Howard Wainer

The American Statistician, 38, 137-147.

Internet Available:

http://links.jstor.org

Statistical Folklore

Main Idea:

• Point out 12 types of bad displays

• With reasons behind

• Here are some favorites…

Statistical Folklore

Hiding the data in the scale

Statistical Folklore

The eye perceives areas as “size”:

Statistical Folklore

Change of Scales in Mid-Axis

Really trust the Post???

Histograms

Idea: show rectangles, where area represents:

(a) Distributions: probabilities

(b) Lists (of numbers): # of observations

Note: will study these in parallel for a while (several concepts apply to both)

Caution: There are variations not based on areas, see bar graphs in text

But eye perceives area, so sensible to use it

Histograms

Steps for Constructing Histograms:

1. Pick class intervals that contain full dist’n

a. Prob. dist’ns:

If possible values are: x = 0, 1, … , n, get good picture from choice:

[-½, ½), [½, 1.5), [1.5, 2.5), … , [n-½, n+½)

where [1.5, 2.5) is “all #s ≥ 1.5 and < 2.5” (called a “half open interval”)

b. Lists: e.g. 2.3, 4.5, 4.7, 4.8, 5.1 (same e.g. as above)

Start with [1,3), [3,7)

• As above use half open intervals (to break ties)

• Note: These contain full data set

• Can use anything for class intervals

• But some choices better than others…

Histograms

Steps for Constructing Histograms:

1. Pick class intervals that contain full dist’n

2. Find “probabilities” or “relative frequencies” for each class

(a) Probs: use f(x) for [x-½, x+½), etc.

(b) Lists: [1,3): rel. freq. = 1/5 = 20%

[3,7): rel. freq. = 4/5 = 80%

3. Above each interval, draw rectangle where area represents class frequency
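For the list example, the three steps can be sketched in Python. The function name `histogram_bars` is hypothetical, and the sketch assumes each rectangle's area is set equal to the class's relative frequency, so that height = relative frequency / width.

```python
def histogram_bars(values, edges):
    """For half-open classes [edges[i], edges[i+1]), return
    (left, right, relative frequency, rectangle height) per class."""
    n = len(values)
    bars = []
    for left, right in zip(edges, edges[1:]):
        # Half-open interval: left endpoint included, right endpoint excluded
        count = sum(left <= v < right for v in values)
        rel_freq = count / n
        height = rel_freq / (right - left)  # area = width x height = rel_freq
        bars.append((left, right, rel_freq, height))
    return bars


data = [2.3, 4.5, 4.7, 4.8, 5.1]
for left, right, rel_freq, height in histogram_bars(data, [1, 3, 7]):
    print(f"[{left}, {right}): rel. freq. = {rel_freq:.0%}, height = {height:.2f}")
```

Because the two classes have different widths (2 and 4), the heights (0.10 and 0.20) are not in the same 1:4 ratio as the relative frequencies; only the areas are.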

Histograms

3. Above each interval, draw rectangle where area represents class frequency

(a) Probs: If width = 1, then area = width x height = height

So get area = f(x), by taking height = f(x)

E.g. Binomial Distribution

Binomial Prob. Histograms

From Class Example 5: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg5.xls

Construct Prob. Histo:

• Create column of x values (do 1st two, and drag box)

• Compute f(x) values (create 1st one, and drag twice)

• Make bar plot
– “Insert” tab
– Choose “Column”
– Right Click – Select Data (Horizontal – x’s, “Add series”, Probs)
– Resize, and move by dragging
– Delete legend
– Click and change title
– Right Click on Bars, Format Data Series:
  • Border Color, Solid Line, Black
  • Series Options, Gap Width = 0

• Make several, for interesting comparison
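The same construction (column of x values, column of f(x) values, bars with no gaps) can be roughed out as a text picture in Python. This is a sketch only: the parameter choices Bi(10, 0.3) and Bi(10, 0.5), the `scale` setting and the function names are assumptions for illustration, not the contents of Class Example 5.

```python
from math import comb


def binom_pmf(x, n, p):
    """f(x) = P[X = x] for X ~ Bi(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def text_histogram(n, p, scale=50):
    """One line per x = 0, ..., n: the x value, f(x), and a bar of '#' marks.

    Each class [x - 1/2, x + 1/2) has width 1, so bar length is
    proportional to f(x).
    """
    lines = []
    for x in range(n + 1):
        fx = binom_pmf(x, n, p)
        lines.append(f"{x:2d} {fx:.3f} " + "#" * round(fx * scale))
    return lines


# Make several, for comparison
for n, p in [(10, 0.3), (10, 0.5)]:
    print(f"Bi({n}, {p})")
    print("\n".join(text_histogram(n, p)))
```

Comparing the two pictures shows the peak moving from x = 3 to x = 5 as p changes.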