Unit 2: Probability and distributions Lecture 1 ...tjl13/s101/slides/unit2lec1H.pdf · Unit 2:...

Unit 2: Probability and distributions Lecture 1: Probability and conditional probability Statistics 101 Thomas Leininger May 21, 2013

Unit 2: Probability and distributions

Lecture 1: Probability and conditional probability

Statistics 101

Thomas Leininger

May 21, 2013

Announcements

Announcements

PS #1 due today

PS #2 assigned (due Friday)

Statistics 101 (Thomas Leininger) U2 - L1: Probability May 21, 2013 2 / 30

Announcements

Visualization of the day

[Figure: bar chart with categories "NA", "not wrong, don't overturn", "not wrong, overturn", "wrong, don't overturn", and "wrong, overturn"; axis from 0 to 40]

http://www.washingtonpost.com/blogs/the-fix/wp/2013/01/22/why-republicans-should-stop-talking-about-roe-v-wade/

Probability Randomness

Random processes

A random process is a situation in which we know what outcomes could happen, but we don't know which particular outcome will happen.

Examples: coin tosses, die rolls, iTunes shuffle, whether the stock market goes up or down tomorrow, etc.

It can be helpful to model a process as random even if it is not truly random.

http://www.cnet.com.au/itunes-just-how-random-is-random-339274094.htm

Probability Defining probability

Probability

There are several possible interpretations of probability, but they (almost) completely agree on the mathematical rules probability must follow.

P(A) = Probability of event A

0 ≤ P(A) ≤ 1

Frequentist interpretation:

The probability of an outcome is the proportion of times the outcome would occur if we observed the random process an infinite number of times.

This was the single mainstream school until recently.

Bayesian interpretation:

A Bayesian interprets probability as a subjective degree of belief: for the same event, two separate people could have differing probabilities.

Largely popularized by revolutionary advances in computational technology and methods during the last twenty years.

Probability Law of large numbers

Question

Which of the following events would you be most surprised by?

(a) 3 heads in 10 coin flips

(b) 3 heads in 100 coin flips

(c) 3 heads in 1000 coin flips

Probability Law of large numbers

Law of large numbers

The law of large numbers states that as more observations are collected, the proportion of occurrences with a particular outcome, p̂_n, converges to the probability of that outcome, p.
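
The convergence described above can be illustrated with a small simulation. This is only a sketch: the function name `running_proportion`, the seed, and the number of flips are assumptions chosen for illustration, and a simulated fair coin stands in for the random process.

```python
import random

def running_proportion(n_flips, p=0.5, seed=0):
    """Return the proportion of heads observed after each of n_flips simulated tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    proportions = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < p  # True counts as 1, False as 0
        proportions.append(heads / i)
    return proportions

props = running_proportion(100_000)
# The proportion of heads should drift toward p = 0.5 as n grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(props[n - 1], 4))
```

With few flips the proportion can sit far from 0.5; with many flips it typically settles close to it, which is why 3 heads in 1000 flips would be far more surprising than 3 heads in 10.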

Probability Law of large numbers

Law of large numbers (cont.)

When tossing a fair coin, if heads comes up on each of the first 10 tosses, what do you think the chance is that another head will come up on the next toss? 0.5, less than 0.5, or more than 0.5?

H H H H H H H H H H ?

The probability is still 0.5, or there is still a 50% chance that another head will come up on the next toss.

P(H on 11th toss) = P(T on 11th toss) = 0.5

The coin is not due for a tail.

The common (mis)understanding of the LLN is that random processes are supposed to compensate for whatever happened in the past; this is just not true and is also called the gambler's fallacy (or law of averages).

Probability Disjoint and non-disjoint outcomes

Disjoint and non-disjoint outcomes

Disjoint (mutually exclusive) outcomes: Cannot happen at the same time.

The outcome of a single coin toss cannot be a head and a tail.

A student cannot fail and pass a class.

A card drawn from a deck cannot be an ace and a queen.

Non-disjoint outcomes: Can happen at the same time.

A student can get an A in Stats and an A in Econ in the same semester.

Probability Disjoint and non-disjoint outcomes

Union of non-disjoint events

What is the probability of drawing a jack or a red card from a well-shuffled full deck?

Figure from http://www.milefoot.com/math/discrete/counting/cardfreq.htm.

Probability Disjoint and non-disjoint outcomes

Recap

General addition rule

P(A or B) = P(A) + P(B) − P(A and B)

Note: For disjoint events P(A and B) = 0, hence the above formula simplifies to

P(A or B) = P(A) + P(B).
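
As a check on the general addition rule, the jack-or-red-card question can be worked by enumerating a standard 52-card deck. This is a minimal sketch; the helper `prob` and the event names are illustrative, and every card is assumed equally likely.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """Probability of an event when every card is equally likely."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_jack(card):
    return card[0] == "J"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

p_jack = prob(is_jack)                              # 4/52
p_red = prob(is_red)                                # 26/52
p_both = prob(lambda c: is_jack(c) and is_red(c))   # 2/52, the two red jacks
p_either = prob(lambda c: is_jack(c) or is_red(c))  # counted directly

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert p_either == p_jack + p_red - p_both
print(p_either)  # 7/13, i.e. 28/52
```

Subtracting P(A and B) removes the two red jacks, which would otherwise be counted once as jacks and once as red cards.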

Probability Disjoint and non-disjoint outcomes

Question

What is the probability that a randomly sampled STA 101 student thinks marijuana should be legalized or they agree with their parents' political views?

                 Parent Politics
Legalize MJ      No     Yes    Total
No               11     40     51
Yes              36     78     114
Total            47     118    165

(a) (40 + 36 − 78) / 165

(b) (114 + 118 − 78) / 165

(c) 78 / 165

(d) 78 / 188

(e) 11 / 47

* Data from a previous semester.
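
One way to work this question is to apply the general addition rule to the counts in the table. The sketch below assumes the "Parent Politics" columns record whether the student agrees with their parents (No/Yes), as the question wording suggests; the variable names are illustrative.

```python
from fractions import Fraction

# Counts copied from the table: key = (legalize MJ, agrees with parents).
counts = {
    ("no", "no"): 11, ("no", "yes"): 40,
    ("yes", "no"): 36, ("yes", "yes"): 78,
}

total = sum(counts.values())
legalize_yes = sum(v for (leg, _), v in counts.items() if leg == "yes")
agree_yes = sum(v for (_, agr), v in counts.items() if agr == "yes")
both = counts[("yes", "yes")]

# P(legalize or agree) = P(legalize) + P(agree) - P(legalize and agree)
p_or = Fraction(legalize_yes + agree_yes - both, total)
print(legalize_yes, agree_yes, both, total)
print(p_or, float(p_or))
```

The same value follows from the complement: only the 11 students in the (No, No) cell satisfy neither condition.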

Probability Probability distributions

Probability distributions

A probability distribution lists all possible events and the probabilities with which they occur.

The probability distribution for the gender of one kid:

Event          B      G
Probability    0.5    0.5

Rules for probability distributions:

1. The events listed must be disjoint
2. Each probability must be between 0 and 1
3. The probabilities must total 1

The probability distribution for the genders of two kids:
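
A minimal Python sketch of how this distribution could be built, assuming the two children's genders are independent and each follows the one-kid distribution above (an assumption made for illustration, not stated on the slide):

```python
from fractions import Fraction
from itertools import product

# One-kid distribution from the slide: P(B) = P(G) = 0.5.
one_kid = {"B": Fraction(1, 2), "G": Fraction(1, 2)}

# Assumed independence: multiply the probabilities of the first and second child.
two_kids = {
    first + second: one_kid[first] * one_kid[second]
    for first, second in product(one_kid, repeat=2)
}

# Check the rules for probability distributions.
assert all(0 <= p <= 1 for p in two_kids.values())  # each probability is in [0, 1]
assert sum(two_kids.values()) == 1                  # probabilities total 1
print(two_kids)
```

The four events BB, BG, GB, and GG are disjoint by construction, since each ordered pair of genders appears exactly once.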

Probability Probability distributions

Question

In a survey, 52% of respondents said they are Democrats. What is the probability that a randomly selected respondent from this sample is a Republican?

(a) 0.48

(b) more than 0.48

(c) less than 0.48

(d) cannot calculate using only the information given

Probability Independence

Independence

Two processes are independent if knowing the outcome of one provides no useful information about the outcome of the other.

Knowing that the coin landed on a head on the first toss does not provide any useful information for determining what the coin will land on in the second toss since coin tosses are independent. → Outcomes of two tosses of a coin are independent.

Knowing that the first card drawn from a deck is an ace does provide useful information for determining the probability of drawing an ace in the second draw. → Outcomes of two draws from a deck of cards (without replacement) are dependent.

Probability Independence

Question

Between January 9-12, 2013, SurveyUSA interviewed a random sample of 500 NC residents asking them whether they think widespread gun ownership protects law abiding citizens from crime, or makes society more dangerous. 58% of all respondents said it protects citizens. 67% of White respondents, 28% of Black respondents, and 64% of Hispanic respondents shared this view. Which of the below is true?

Opinion on gun ownership and race/ethnicity are most likely

(a) complementary

(b) mutually exclusive

(c) independent

(d) dependent

(e) disjoint

http://www.surveyusa.com/client/PollReport.aspx?g=a5f460ef-bba9-484b-8579-1101ea26421b

Probability Independence

Checking for independence

If P(A | B) = P(A), then A and B are independent.
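
The check can be tried on an example where exact probabilities are known. The sketch below uses a standard 52-card deck (an illustrative choice, not the survey data from the previous slide); the helper `prob` and its `given` parameter are hypothetical names.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))

def prob(event, given=lambda card: True):
    """P(event | given) when every card is equally likely."""
    condition = [c for c in deck if given(c)]
    return Fraction(sum(1 for c in condition if event(c)), len(condition))

def is_jack(card):
    return card[0] == "J"

def is_heart(card):
    return card[1] == "hearts"

def is_red(card):
    return card[1] in ("hearts", "diamonds")

# Jack vs. red: P(jack | red) = 2/26 equals P(jack) = 4/52, so independent.
print(prob(is_jack, given=is_red) == prob(is_jack))

# Heart vs. red: P(heart | red) = 13/26 differs from P(heart) = 13/52, so dependent.
print(prob(is_heart, given=is_red) == prob(is_heart))
```

Knowing a card is red says nothing about whether it is a jack, but it does change the chance that it is a heart.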

Probability Independence

Determining dependence based on sample data

If conditional probabilities calculated based on sample data suggest dependence between two variables, the next step is to conduct a hypothesis test to determine if the observed difference between the probabilities is likely or unlikely to have happened by chance.

If the observed difference between the conditional probabilities is large, then the hypothesis test will likely be significant.

If the observed difference between the conditional probabilities is small, and the sample is large as well, the hypothesis test may be significant. If the sample is small, then it likely will not be.

We have seen that P(protects citizens | White) = 0.67 and P(protects citizens | Hispanic) = 0.64. Under which condition would you be more convinced of a real difference between the proportions of Whites and Hispanics who think widespread gun ownership protects citizens? n = 500 or n = 50,000

Probability Independence

Product rule for independent events

P(A and B) = P(A) × P(B)

Or more generally, P(A1 and ··· and Ak) = P(A1) × ··· × P(Ak)

You toss a coin twice. What is the probability of getting two tails in a row?

P(T on the first toss) × P(T on the second toss) = 1/2 × 1/2 = 1/4
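
The two-tails result can be cross-checked by listing all equally likely outcomes of two tosses and comparing against the product rule. This is a small sketch assuming a fair coin; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All outcomes of two tosses: HH, HT, TH, TT (equally likely for a fair coin).
outcomes = list(product("HT", repeat=2))

# Direct count of the outcome (T, T).
p_enumerated = Fraction(sum(1 for o in outcomes if o == ("T", "T")), len(outcomes))

# Product rule for independent events: P(T and T) = P(T) * P(T).
p_tail = Fraction(1, 2)
p_product = p_tail * p_tail

assert p_enumerated == p_product
print(p_enumerated)  # 1/4
```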

Probability Independence

Question

A recent Gallup poll suggests that 25.5% of Texans are uninsured as of June 2012. Assuming that the uninsured rate stayed constant, what is the probability that two randomly selected Texans are both uninsured?

(a) 25.5^2

(b) 0.255^2

(c) 0.255 × 2

(d) (1 − 0.255)^2

http://www.gallup.com/poll/156851/uninsured-rate-stable-across-states-far-2012.aspx

Probability Recap

Disjoint vs. complementary

Do the probabilities of two disjoint events always add up to 1?

Do the probabilities of two complementary events always add up to 1?

Probability Recap

Putting everything together...

If we were to randomly select 5 Texans, what is the probability that at least one is uninsured?

If we were to randomly select 5 Texans, the sample space for the number of Texans who are uninsured would be:

S = {0, 1, 2, 3, 4, 5}

We are interested in instances where at least one person is uninsured, that is, the outcomes 1 through 5:

S = {0, 1, 2, 3, 4, 5}

So we can divide up the sample space into two categories:

S = {0, at least one}

Probability Recap

Putting everything together...

Since the probability of the sample space must add up to 1:

Prob(at least 1 uninsured) = 1 − Prob(none uninsured)
                           = 1 − [(1 − 0.255)^5]
                           = 1 − 0.745^5
                           ≈ 1 − 0.23
                           = 0.77

At least 1

P(at least one) = 1 − P(none)
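
The complement calculation can be written as a short function. This is a sketch: the name `p_at_least_one` is illustrative, and it assumes the n selections are independent with the same probability p, as in the Texans example.

```python
def p_at_least_one(p, n):
    """P(at least one success in n independent trials) = 1 - (1 - p)**n."""
    return 1 - (1 - p) ** n

p_uninsured = 0.255  # uninsured rate from the Gallup poll cited earlier
result = p_at_least_one(p_uninsured, 5)
print(round(result, 2))  # 0.77
```

Working through the complement needs only one term, P(none), instead of summing the probabilities of exactly 1, 2, 3, 4, and 5 uninsured people.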

Probability Recap

Question

Roughly 20% of Duke undergraduates are vegetarian or vegan (estimate based on a past class survey). What is the probability that, among a random sample of 3 Duke undergraduates, at least one is vegetarian or vegan?

(a) 1 − 0.2 × 3

(b) 1 − 0.2^3

(c) 0.8^3

(d) 1 − 0.8 × 3

(e) 1 − 0.8^3

Marginal, joint, conditional

Relapse

Researchers randomly assigned 72 chronic users of cocaine into three groups: desipramine (antidepressant), lithium (standard treatment for cocaine) and placebo. Results of the study are summarized below.

               relapse   no relapse   total
desipramine    10        14           24
lithium        18        6            24
placebo        20        4            24
total          48        24           72

http://www.oswego.edu/~srp/stats/2_way_tbl_1.htm

Marginal, joint, conditional

Marginal probability

What is the probability that a patient relapsed?

               relapse   no relapse   total
desipramine    10        14           24
lithium        18        6            24
placebo        20        4            24
total          48        24           72

P(relapsed) = 48 / 72 ≈ 0.67

Marginal, joint, conditional

Joint probability

What is the probability that a patient received the antidepressant (desipramine) and relapsed?

               relapse   no relapse   total
desipramine    10        14           24
lithium        18        6            24
placebo        20        4            24
total          48        24           72

P(relapsed and desipramine) = 10 / 72 ≈ 0.14

Marginal, joint, conditional

Conditional probability

If we know that a patient received the antidepressant (desipramine), what is the probability that they relapsed?

               relapse   no relapse   total
desipramine    10        14           24
lithium        18        6            24
placebo        20        4            24
total          48        24           72

P(relapsed | desipramine) = 10 / 24 ≈ 0.42

P(relapsed | lithium) = 18 / 24 = 0.75

P(relapsed | placebo) = 20 / 24 ≈ 0.83

Marginal, joint, conditional

Conditional probability

If we know that a patient relapsed, what is the probability that they received the antidepressant (desipramine)?

               relapse   no relapse   total
desipramine    10        14           24
lithium        18        6            24
placebo        20        4            24
total          48        24           72

P(desipramine | relapsed) = 10 / 48 ≈ 0.21

P(lithium | relapsed) = 18 / 48 = 0.375

P(placebo | relapsed) = 20 / 48 ≈ 0.42
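
The marginal, joint, and conditional probabilities from the relapse table can be computed together. This is a sketch under one assumption: the first count in each row is read as the number who relapsed, which is the reading consistent with the formulas on these slides (for example, P(relapsed) = 48/72). The variable names are illustrative.

```python
from fractions import Fraction

# Counts from the relapse table: treatment -> (relapse, no relapse).
table = {
    "desipramine": (10, 14),
    "lithium": (18, 6),
    "placebo": (20, 4),
}

n = sum(r + nr for r, nr in table.values())    # all 72 patients
n_relapse = sum(r for r, _ in table.values())  # 48 patients who relapsed

# Marginal: a column total over the grand total.
p_relapse = Fraction(n_relapse, n)

# Joint: a single cell over the grand total.
p_relapse_and_desipramine = Fraction(table["desipramine"][0], n)

# Conditional on treatment: a cell over its row total.
p_relapse_given = {t: Fraction(r, r + nr) for t, (r, nr) in table.items()}

# Conditional on relapse: a cell over its column total.
p_treatment_given_relapse = {t: Fraction(r, n_relapse) for t, (r, _) in table.items()}

print(float(p_relapse), float(p_relapse_and_desipramine))
print({t: float(p) for t, p in p_relapse_given.items()})
print({t: float(p) for t, p in p_treatment_given_relapse.items()})
```

The only difference between the three kinds of probability is the denominator: the grand total for marginal and joint probabilities, and the total of the conditioning row or column for conditional ones.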

Marginal, joint, conditional

What if we don't have counts to create a contingency table? ... next time