Page 1:

Mar. 15 Statistics for the day:

Highest temperature ever recorded in State College: 102 degrees (July 9, 1936 and July 17, 1988)

Lowest temperature ever recorded in State College: -18 degrees (January 19-20, 1994)

These slides were created by Tom Hettmansperger and in some cases modified by David Hunter

Review

Exam Friday, March 19

Chapters 10, 11, 12, 15, 16, 17

Source: http://pasc.met.psu.edu

Page 2:

[Scatterplot: Arby's sandwiches, calories (vertical axis, 200 to 700) versus serving size (horizontal axis, 3 to 11)]

Page 3:

[Scatterplot with fitted line: Arby's sandwiches, calories (vertical axis, 200 to 700) versus serving size (horizontal axis, 3 to 11)]

calories = -10.2 + 60.5 × (serving)

S = 78.5202   R-Sq = 72.2%   R-Sq(adj) = 69.8%

Correlation = .83

Best fitting line through the data: called the REGRESSION LINE

Strength of relationship: measured by CORRELATION

Page 4:

calories = -10 + 60(serving size in oz)

For example, if you have a 6 oz sandwich, on the average you expect to get about:

-10 + 60(6) = -10 + 360 = 350 calories

For a 10 oz sandwich:

-10 + 60(10) = -10 + 600 = 590 calories
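The two predictions above can be reproduced with a short sketch. The function name `predicted_calories` and its default arguments are illustrative choices, not part of the slides; the intercept and slope are the rounded values from the slide.

```python
def predicted_calories(serving_oz, intercept=-10.0, slope=60.0):
    """Average calories predicted by the rounded regression line."""
    return intercept + slope * serving_oz


# Predictions for the two sandwich sizes used on the slide.
for oz in (6, 10):
    print(oz, "oz ->", predicted_calories(oz), "calories")
```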

Page 5:

calories = -10 + 60(serving size in oz)

• -10 is called the intercept

• 60 is called the slope

• One way to interpret slope: For every extra oz of serving size, you expect an increase of 60 calories on the average

Page 6:

Facts about correlation, measured for two quantitative variables:

• +1 means perfect increasing linear relationship

• -1 means perfect decreasing linear relationship

• 0 means no linear relationship

• + means one increases as the other increases

• - means one increases as the other decreases

Page 7:

Outliers

Outliers are data that are not compatible with the bulk of the data.

They show up in graphical displays as detached or stray points.

Sometimes they indicate errors in data input. Experts estimate that roughly 5% of all data entered is in error.

Sometimes they are the most important data points.

Page 8:

Example

[Plot: Hours per day typically spent studying, Stat 100.2 Spring 2004; hours axis from 0 to 35]

Page 9:

How many pairs of jeans do you own?

[Two plots, Stat 100.2 Spring 2004: one showing jeans on an axis from 0 to 30, the other showing frequency (0 to 50) versus jeans (0 to 30)]

Page 10:

A bad outlier:

[Scatterplot with fitted line: Stat 100.2 Combined, height (vertical axis, 60 to 100) versus weight (horizontal axis, 100 to 300)]

height = 58.2 + 0.06 weight

S = 4.77319   R-Sq = 14.7%   R-Sq(adj) = 13.9%

Correlation = .38

Page 11:

Another bad outlier:

[Regression plot: text cost (vertical axis, 0 to 1500) versus cds (horizontal axis, 0 to 1500)]

text cost = 220.5 + 0.58 cds

S = 141.673   R-Sq = 31.2%   R-Sq(adj) = 30.5%

Correlation = .56

Page 12:

The Moral:

There can be good outliers, for example in detecting election fraud. We use them to identify important parts of the data. They also matter in analyzing put options for extreme cases.

More often the outliers are bad. They can depress the correlation and make you think the relationship is weaker than it really is.

They can increase the correlation and make it appear that the relationship is stronger than it really is.

IMPORTANT: Always look at a scatter plot as well as compute the correlation.

Page 13:

Another problem:

Sometimes we see a strong relationship in absurd examples.

Two seemingly unrelated variables have a high correlation.

This signals the presence of a third variable that is highly correlated with the other two.

(Confounding or interaction)

Page 14:

A third variable: vocabulary vs shoe size

[Regression plot: known words (vertical axis, 0 to 2500) versus shoe size (horizontal axis, 2 to 6)]

Y = -806 + 555 X

S = 158.602   R-Sq = 97.1%   R-Sq(adj) = 96.6%

Correlation = .985

Page 15:

How can we have such high correlation between shoe size and vocabulary?

Easy: Both increase with age and hence age is a hidden variable.

Age is positively correlated with both shoe size and with vocabulary.

Page 16:

Two categorical variables:

Explanatory variable: Gender

Response variable: Body Pierced or Not

Survey question: Have you pierced any other part of your body? (Except for ears)

Research Question: Is there a significant difference between women and men in terms of body piercings?

Page 17:

Data (rows: Gender, the explanatory variable; columns: Pierced?, the response variable):

            No     Yes    Total
Women       84      51      135
Men         96       3       99
Total      180      54      234

From Stat 100.2, spring 2004 (missing responses omitted)

Page 18:

Percentages

Response: body pierced?

             no      yes      All
female    62.22    37.78   100.00
male      96.97     3.03   100.00
All       76.92    23.08   100.00

Research question: Is there a significant difference between women and men? (i.e., between 62.22% and 96.97%)

62.22% = 84/135

96.97% = 96/99

Page 19:

The Debate:

The research advocate claims that there is a significant difference.

The skeptic claims there is no real difference. The data differences simply happen by chance.

Page 20:

The strategy for determining statistical significance:

• First, figure out what you expect to see if there is no difference between females and males.

• Second, figure out how far the data is from what is expected.

• Third, decide if the distance in the second step is large.

• Fourth, if large then claim there is a statistically significant difference.

Page 21:

Research Advocate: OK. Suppose there is really no difference in the population as you, the Skeptic, claim. We will compare what you, the Skeptic, expect to see and what you actually do see in the data.

Skeptic: How do we figure out what we expect to see?

Page 22:

            No     Yes    Total
Women                       135
Men                          99
Total      180      54      234

Expected count for the Women/No cell if there is no difference:

$$\frac{180 \times 135}{234} = 103.85$$

Page 23:

Rows: gender   Columns: body pierces

Top lines of numbers are observed; bottom lines are expected (by skeptic).

              no      yes      All
female        84       51      135
          103.85    31.15   135.00
male          96        3       99
           76.15    22.85    99.00
All          180       54      234
          180.00    54.00   234.00

Page 24:

How to measure the distance between what the research advocate observes in the table and what the skeptic expects:

Add up the following for each cell:

$$\frac{(\text{obs} - \text{exp})^2}{\text{exp}}$$

$$\chi^2 = \frac{(84 - 103.85)^2}{103.85} + \frac{(51 - 31.15)^2}{31.15} + \frac{(96 - 76.15)^2}{76.15} + \frac{(3 - 22.85)^2}{22.85} = 38.85$$

Now how do we decide if 38.85 is large or not? If it is large enough the skeptic concedes to the research advocate and agrees there is a statistically significant difference. How large is enough?
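The expected counts and the chi-squared sum can be checked with a small sketch in plain Python. The function name `chi_squared` is an illustrative choice; the expected count in each cell is (row total × column total) / grand total, as in the Women/No calculation.

```python
def chi_squared(table):
    """Return (expected counts, chi-squared statistic) for observed counts.

    `table` is a list of rows of observed counts, without margins.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under "no difference": row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Add up (obs - exp)^2 / exp over every cell.
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    return expected, statistic


# Rows: women, men. Columns: no, yes (body pierced).
observed = [[84, 51], [96, 3]]
expected, statistic = chi_squared(observed)
print([[round(e, 2) for e in row] for row in expected])
print(round(statistic, 2))
```

The same function accepts any table of counts with two or more rows and columns.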

Page 25:

Chi-squared distribution with 1 degree of freedom:

[Plot: chi-squared density with 1 degree of freedom, horizontal axis 0 to 6; cutoff = 3.84, with 95% of the area on one side of the cutoff and 5% on the other]

If chi-squared statistic is larger than 3.84, it is declared large and the research advocate wins.

But our chi-squared is 38.85 so the research advocate easily wins! There is a statistically significant difference between men and women.

Page 26:

Why 1 degree of freedom?

            No     Yes    Total
Women                       136
Men                         101
Total       26     211      237

(In the slide, one interior cell is marked with a black box and the other interior cells are blank.)

Note that black box is the ONLY one we can fill arbitrarily. Once that box is filled, all others are determined by margins!

Page 27:

How many degrees of freedom?

            Always   Sometimes   Never   Total
Women       One df   Two df                136
Men                                        101
Total          106         105      26     237

Degrees of freedom (df) always equal

(Number of rows – 1) times (Number of columns – 1)

Page 28:

Health studies and risk

Research question: Do strong electromagnetic fields cause cancer?

50 dogs randomly split into two groups: no field, yes field. The response is whether they get lymphoma.

Rows: mag field   Columns: cancer

          no    yes    All
no        20      5     25
yes       10     15     25
All       30     20     50

Page 29:

Rows: mag field   Columns: cancer

Observed counts are printed above the expected counts.

            no      yes      All
no          20        5       25
         15.00    10.00    25.00
yes         10       15       25
         15.00    10.00    25.00
All         30       20       50
         30.00    20.00    50.00

Chi-Square = 8.333 (compare to 3.84)

Research advocate wins!

Page 30:

Terminology and jargon:

1. Identify the ‘bad’ response category: yes cancer

2. Risk for categories of explanatory variable

• Identify treatment category

• Identify baseline (control) category

3. Treatment risk: 15/25 or .60 or 60%

4. Baseline risk: 5/25 or .20 or 20%

5. Relative risk: Treatment risk over Baseline risk = .60/.20 = 3

So risk due to mag field is 3 times higher than baseline risk.

One more on the next page:

Page 31:

Increased risk (percentage change in risk):

$$\frac{\text{Treatment} - \text{Baseline}}{\text{Baseline}} = \frac{.60 - .20}{.20} = \frac{.40}{.20} = 2$$

So the percentage change is 200%

A 200% increase in treatment risk over baseline risk for getting cancer.
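The risk statements for the dog study can be tied together in a short sketch. The variable names are illustrative; the counts (15 of 25 with the field, 5 of 25 without) are the ones from the study table.

```python
# Risk = count in the 'bad' response category / group size.
treatment_risk = 15 / 25  # dogs exposed to the field that got lymphoma
baseline_risk = 5 / 25    # unexposed dogs that got lymphoma

relative_risk = treatment_risk / baseline_risk
increased_risk = (treatment_risk - baseline_risk) / baseline_risk

print("relative risk:", round(relative_risk, 2))
print("increased risk:", f"{increased_risk:.0%}")
```

Note that increased risk is always relative risk minus 1, so a relative risk of 3 and a 200% increase are two ways of stating the same comparison.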

Page 32:

Final note:

When the chi-squared test is statistically significant then it makes sense to compute the various risk statements.

If there is no statistical significance then the skeptic wins.

There is no evidence in the data for differences in risk for the categories of the explanatory variable.

Page 33:

Research question: Is ghost sighting related to age? Do young and old people differ in ghost sighting?

The skeptic responds by saying he doesn’t believe that there is any difference between the age groups.

We need to see the data to resolve the debate. Then we can consider assessing the risk.

Exercise 9, p219 of the text.

Page 34:

Expected counts are printed below observed

            yes        no     Total
young       212      1313      1525
          174.9    1350.1
old         465      3913      4378
          502.1    3875.9
Total       677      5226      5903

Chi-Sq = 7.870 + 1.020 + 2.742 + 0.355 = 11.987

The research advocate wins and skeptic loses. There is evidence in the data that there are differences in the population.

Page 35:

The percent of young who saw a ghost: 212/1525 = .139

Answer: 13.9%

The proportion of old who saw a ghost: 465/4378 = .106

Answer: .106

The risk of young seeing ghost:

Answer: 212/1525 or .139 or 13.9%

Odds ratio?

Page 36:

Odds

The odds of something happening are given by a ratio:

$$\text{odds} = \frac{\text{Proportion of time it happens}}{\text{Proportion of time it doesn't happen}}$$

For example, if you flip a fair coin, the odds of heads are 1 (or sometimes “1 to 1”).

An odds ratio is the ratio of two odds!

Page 37:

The odds that a young person saw a ghost: 212/1313 = .161

The odds that an older person saw a ghost: 465/3913 = .119

The odds ratio:

Answer: .161/.119 = 1.35
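As a check on the odds calculations, here is a minimal sketch using the ghost-sighting counts (212 yes and 1313 no for young; 465 yes and 3913 no for old). The helper name `odds` is an illustrative choice. Dividing the unrounded odds gives a ratio of about 1.36; the slide's 1.35 comes from dividing the rounded values.

```python
def odds(yes_count, no_count):
    """Odds = proportion of time it happens / proportion of time it doesn't.

    The group size cancels, so this reduces to yes_count / no_count.
    """
    return yes_count / no_count


odds_young = odds(212, 1313)
odds_old = odds(465, 3913)
odds_ratio = odds_young / odds_old

print(round(odds_young, 3), round(odds_old, 3), round(odds_ratio, 2))
```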

Page 38:

Relative risk of young person seeing a ghost compared to older person:

Answer: .139/.106 = 1.31

We would say that the risk that a younger person sees a ghost is 1.31 times higher than the risk that an older person sees a ghost.

The increased risk that a young person sees a ghost over that of an older person:

Answer: (.139 - .106)/.106 = .31

Hence we would say that young people have a 31% higher risk of seeing a ghost than older people.

Page 39:

Statistical significance

Statistical significance is related to the size of the sample. But that makes sense. More data, more information, more precise inference.

So statistical significance is related to two things:

1. The size of the difference between the percentages. Big differences are more likely to show statistical significance.

2. The size of the sample. Bigger samples are more likely to show statistical significance irrespective of the size of the difference in percentages.

Page 40:

Practical significance

Even if the difference in percentages is uninteresting and of no practical interest, the difference may be statistically significant because we have a large sample.

Hence, in the interpretation of statistical significance, we must also address the issue of practical significance.

In other words, you must answer the skeptic’s second question: WHO CARES?

Page 41:

Probability

[Diagram with two branches under Probability. Relative Frequency: Experiment, Repeated Sampling; Physical World Assumptions, Check by Repeated Sampling; Estimate Probability, Repeated Sampling. Personal Opinion: Experience, Non-repeatable Event.]

Page 42:

Rules: For combining probabilities

0 ≤ Probability ≤ 1

1. If there are only two possible outcomes, then their probabilities must sum to 1.

2. If two events cannot happen at the same time, they are called mutually exclusive. The probability of at least one happening (one or the other) is the sum of their probabilities. [1. is a special case of this.]

3. If two events do not influence each other, they are called independent. The probability that they happen at the same time is the product of their probabilities.

4. If the occurrence of one event forces the occurrence of another event, then the probability of the second event is always at least as large as the probability of the first event.

Page 43:

Are mutually exclusive events independent or dependent?

Remember the tests:

1. Two events are mutually exclusive if they cannot happen at the same time.

2. Two events are independent if the occurrence of one does not alter the probability of the other occurring.

Or, another way: two events are dependent if the probability of the occurrence of one event changes when we find out whether the other event occurred or not.

Page 44:

New Rule:

Suppose we are considering a series of events. The probability of at least one of the events occurring is:

Pr( at least one ) = 1 – Pr( none )

This follows directly from Rule 1 since ‘at least one’ or ‘none’ has to occur.

Page 45:

Long Run Behavior

We CANNOT predict individual outcomes.

BUT

We CAN predict long run behavior quite accurately.

Standard example:

We cannot predict the outcome of a single toss of a coin very precisely: Pr(head) = .50

But in the long run we expect about 50% heads and 50% tails.

Page 46:

Toss a fair coin 1000 times and keep track of the proportion of heads.

[Plot: running proportion of heads (vertical axis, 0.0 to 0.8) versus number of tosses (horizontal axis, 0 to 1000)]
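The coin-tossing experiment can be imitated with a short simulation. This is a sketch under stated assumptions: the function name and the fixed seed are illustrative choices, and a different seed gives a different path, although the proportion should still settle near .50.

```python
import random


def running_proportion_of_heads(n_tosses, seed=0):
    """Simulate fair coin tosses; return the proportion of heads after each toss."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    proportions = []
    for toss_number in range(1, n_tosses + 1):
        if rng.random() < 0.5:  # treat this as "heads"
            heads += 1
        proportions.append(heads / toss_number)
    return proportions


proportions = running_proportion_of_heads(1000)
for n in (10, 100, 1000):
    print(n, "tosses:", round(proportions[n - 1], 3))
```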

Page 47:

Two laws (only one of them valid):

• Law of large numbers: Over the long haul, we expect about 50% heads (this is true).

• “Law of small numbers”: If we’ve seen a lot of tails in a row, we’re more likely to see heads on the next flip (this is completely bogus).

Remember: The law of large numbers OVERWHELMS; it does not COMPENSATE.

Page 48:

When will it happen? (p264 text)

Odd Man

Consider the odd man game. Three people toss a coin. The odd man has to pay for the drinks.

You are the odd man if you get a head and the other two have tails or if you get a tail and the other two have heads.

Pr(no odd man) = Pr(HHH or TTT)

= Pr(HHH) + Pr(TTT)   (Rule 2)

= (1/2)^3 + (1/2)^3   (Rule 3)

= 1/8 + 1/8 = 1/4 = .25

Pr( odd man ) = 1 – Pr(no odd man) = 1 - .25 = .75   (Rule 1)

Page 49:

Pr( odd man occurs on the third try)

= Pr(miss, miss, hit)

= Pr(miss) Pr(miss) Pr(hit)   (Rule 3)

= [Pr(miss)]^2 Pr(hit)

= (.25)^2 (.75)

= .047
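The odd man calculation can be verified by listing all eight outcomes of three coin tosses. This is a minimal sketch; the function name `first_odd_man_on_try` is an illustrative choice, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes for three people tossing a coin.
outcomes = list(product("HT", repeat=3))

# A "miss" (no odd man) is HHH or TTT, i.e. all three tosses agree.
misses = [o for o in outcomes if len(set(o)) == 1]
pr_miss = Fraction(len(misses), len(outcomes))
pr_hit = 1 - pr_miss


def first_odd_man_on_try(k):
    """Pr(the first odd man occurs on try k): k - 1 misses, then a hit."""
    return pr_miss ** (k - 1) * pr_hit


print(pr_miss, pr_hit)
print(first_odd_man_on_try(3), float(first_odd_man_on_try(3)))
```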

Page 50:

Expectation

Insurance

Example 14 p267 extended.

Suppose my insurance company has 10,000 policy holders and they are all skateboarders.

I collect a $500 premium each year.

I pay off $1500 for a claim of a skateboard accident.

From past experience I know 10% (i.e., 1000) will file a claim.

How much do I expect to make per customer?

Page 51:

Pr(claim) = .10; loss is $1500 - $500 = $1000, recorded as -$1000

Pr(no claim) = .90; gain is $500

Expected value = .10 × (-1000) + .90 × (500) = -100 + 450 = 350 dollars per customer

Expected value for the 10,000 customers = 10,000 × 350 = 3,500,000 dollars per year
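The expected value is a probability-weighted sum of the possible gains, which a few lines of code make explicit. The names below are illustrative; the probabilities and dollar amounts are the ones from the insurance example.

```python
# (probability, gain in dollars) for each possible outcome per customer.
outcomes = [
    (0.10, 500 - 1500),  # claim filed: premium collected minus payoff
    (0.90, 500),         # no claim: keep the premium
]

expected_per_customer = sum(p * gain for p, gain in outcomes)
expected_total = 10_000 * expected_per_customer

print(round(expected_per_customer, 2), "dollars per customer")
print(round(expected_total, 2), "dollars per year")
```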

Page 52:

Efron Dice

Die    Side values
A      4 4 4 4 0 0
B      5 5 5 1 1 1
C      6 6 2 2 2 2
D      3 3 3 3 3 3

Pr( B beats A ) = 2/3

Pr( C beats B ) = 2/3

Pr( D beats C ) = 2/3

Pr( A beats D ) = 2/3

Hence, there is NO best die! You can always pick a winner.
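The four probabilities for the Efron dice can be confirmed by brute force over all 36 pairs of faces. This is a sketch; the dictionary `dice` and the function `pr_beats` are illustrative names, and the side values are the ones listed for dice A through D.

```python
from fractions import Fraction
from itertools import product

dice = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [5, 5, 5, 1, 1, 1],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [3, 3, 3, 3, 3, 3],
}


def pr_beats(x, y):
    """Pr(die x shows a higher value than die y) over all 36 face pairs."""
    wins = sum(a > b for a, b in product(dice[x], dice[y]))
    return Fraction(wins, 36)


for winner, loser in [("B", "A"), ("C", "B"), ("D", "C"), ("A", "D")]:
    print(f"Pr({winner} beats {loser}) =", pr_beats(winner, loser))
```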

Page 53:

Cancer testing: confusion of the inverse

Suppose we have a cancer test for a certain type of cancer.

Sensitivity of the test: If you have cancer then the probability of a positive test is .98. Pr(+ given you have C) = .98

Specificity of the test: If you do not have cancer then the probability of a negative test is .95. Pr(- given you do not have C) = .95

Base rate: The percent of the population who has the cancer. This is the probability that someone has C. Suppose for our example it is 1%. Hence, Pr(C) = .01.

Page 54:

Percent table

                     + (Positive)    - (Negative)    Base rate
C (Cancer)               .98             .02            .01
no C (no Cancer)         .05             .95            .99

Sensitivity is the .98 entry and specificity is the .95 entry. The .05 entry is the false positive rate and the .02 entry is the false negative rate.

Suppose you go in for a test and it comes back positive. What is the probability that you have cancer?

Page 55:

Count table from a percent table

Percent table:

           +       -     Base rate
C         .98     .02       .01
no C      .05     .95       .99

Count table (for 10,000 people):

            +        -      Total
C          98        2        100
no C      495     9405      9,900
Total     593     9407     10,000

Pr(C given a + test) = 98/593 = .165
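The step from the percent table to the count table can be written out as a short sketch. The function name and the default population of 10,000 are illustrative choices; the sensitivity, specificity, and base rate are the values from the cancer test example.

```python
def pr_cancer_given_positive(sensitivity, specificity, base_rate, population=10_000):
    """Build the count table for `population` people and read off Pr(C given +)."""
    with_cancer = population * base_rate
    without_cancer = population - with_cancer
    true_positives = with_cancer * sensitivity             # C and +
    false_positives = without_cancer * (1 - specificity)   # no C and +
    return true_positives / (true_positives + false_positives)


answer = pr_cancer_given_positive(sensitivity=0.98, specificity=0.95, base_rate=0.01)
print(round(answer, 3))
```

The population size cancels out of the final ratio, so any convenient number of people gives the same answer.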

Page 56:

Tree diagrams: A possible tool for solving problems like the “rare disease” problem

[Tree diagram: "All people like you" splits into With disease (.01) and Without disease (.99). With disease splits into Positive (.98) and Negative (.02). Without disease splits into Positive (.05) and Negative (.95).]

Pr (With disease and Positive) = .01 × .98 = .0098

Pr (Without disease and Positive) = .99 × .05 = .0495

Pr (Positive) = .0098 + .0495 = .0593

Pr (Disease given Positive) = .0098/.0593 = .165

Page 57:

Recall earlier quiz we didn’t have:

Mary likes earrings and spends time at festivals shopping for jewelry. Her boyfriend and several of her close girlfriends have tattoos. They have encouraged her to also get a tattoo.

Unknown to you, Mary will be sitting next to you in the next Stat 100.2 class.

Which of the following do you think is more likely and why?

A. Mary is a physics major.

B. Mary is a physics major with pierced ears.

Page 58:

An answer of B (Mary is a physics major with pierced ears) cannot be right, because B can never be more likely than A. It illustrates the

Conjunction fallacy: assigning higher probability to a detailed scenario involving the conjunction of events than to one of the simple events that make up the conjunction.

A possible cause of this fallacy is the

Representativeness heuristic: leads people to assign higher probabilities than are warranted to scenarios that are representative of how we imagine things would happen.

Page 59:

Exercise 1, page 309 (sort of):

Suppose you flip four coins.

• Which is more likely, HHHH or HTTH?

• Which is more likely, four total heads or two total heads?

Note: These questions are not the same! One of these questions is often mistakenly answered due to belief in the “Law of small numbers” (also known as the Gambler’s Fallacy).
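One way to see that the two questions differ is to list all 16 equally likely sequences of four flips. This is a minimal sketch; the helper name `pr` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# All 16 equally likely sequences of four fair coin flips.
sequences = ["".join(flips) for flips in product("HT", repeat=4)]


def pr(event):
    """Probability of an event, given as a true/false test on a sequence."""
    return Fraction(sum(1 for s in sequences if event(s)), len(sequences))


print("Pr(HHHH) =", pr(lambda s: s == "HHHH"))
print("Pr(HTTH) =", pr(lambda s: s == "HTTH"))
print("Pr(four total heads) =", pr(lambda s: s.count("H") == 4))
print("Pr(two total heads) =", pr(lambda s: s.count("H") == 2))
```

Any one specific sequence has the same probability as any other, but "two total heads" covers six different sequences while "four total heads" covers only one.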

Page 60:

Flip a coin repeatedly. Which of the following is more likely?

• Your first seven flips are HHTHTTH

• Your first six flips are all heads

(By the way, how do you calculate the exact probability of each of these events?)

Page 61:

Exercise 15, page 311. What’s the difference between these two statements?

• “I’m confident that there is at least one set of matching birthdays in this room”

• “I’m confident that there is at least one person in this room whose birthday matches my birthday”

Which statement is more likely to be true?

How many possible pairs of people are eligible for matching in each case? Assume 50 people are in the room.

Page 62:

With 50 people in the room…

• There are 49 possible pairs with me.

• There are 49+48+47+…+1 = 1225 total possible pairs.

• Pr (No match with my birthday) = (364/365)^49 = .874

• Pr (No match at all) = .030 (and we can estimate by (364/365)^1225 = .035)
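The birthday numbers for a room of 50 can be reproduced with a short sketch. It assumes 365 equally likely birthdays and ignores leap years; the function names are illustrative.

```python
def pr_no_match_with_me(n_people, days=365):
    """Pr(none of the other n_people - 1 shares my birthday)."""
    return ((days - 1) / days) ** (n_people - 1)


def pr_no_match_at_all(n_people, days=365):
    """Exact Pr(all n_people have different birthdays)."""
    probability = 1.0
    for already_taken in range(n_people):
        probability *= (days - already_taken) / days
    return probability


n = 50
total_pairs = n * (n - 1) // 2  # 49 + 48 + ... + 1
pairs_estimate = (364 / 365) ** total_pairs  # treats the pairs as independent

print(total_pairs)
print(round(pr_no_match_with_me(n), 3))
print(round(pr_no_match_at_all(n), 3))
print(round(pairs_estimate, 3))
```

The pairs-based estimate treats the 1225 pairs as if they were independent, which is why it comes out close to the exact value but not equal to it.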

Page 63:

Randomized Response: A technique for asking sensitive questions

Question 1: Have you ever smoked marijuana?

Question 2: Is your mother’s birthday in Jan through May?

If your father’s birthday is in July through Dec, answer question 1. Otherwise answer question 2.

Page 64:

Conditional Probabilities

         no       yes     Base rate
Q1      1-p        p        6/12
Q2      7/12      5/12      6/12

Unconditional Probabilities

         no          yes
Q1      .5(1-p)     .5p
Q2      .292        .208
Total               .208+.5p

Solve for p: .208 + .5p = proportion of observed yeses in sample
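Solving the last equation for p is a one-line rearrangement, sketched below. The function name `estimate_p` is an illustrative choice, and the observed proportion of 0.40 is a made-up value used only to show the arithmetic, not data from the class.

```python
def estimate_p(observed_yes_proportion, pr_answer_q1=6 / 12, pr_yes_q2=5 / 12):
    """Solve Pr(yes) = Pr(Q2) * Pr(yes | Q2) + Pr(Q1) * p for p."""
    yes_from_q2 = (1 - pr_answer_q1) * pr_yes_q2  # about .208
    return (observed_yes_proportion - yes_from_q2) / pr_answer_q1


# Hypothetical sample in which 40% of all answers were "yes".
print(round(estimate_p(0.40), 3))
```

With a small sample the estimate can fall below 0 or above 1, since the observed proportion of yeses is itself subject to chance.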