Probability and Probability Distributions ASW, Chapter 4-5 Skip sections 4.5, 5.5, 5.6 September 24,...

Probability and Probability Distributions

ASW, Chapter 4-5

Skip sections 4.5, 5.5, 5.6

September 24, 2008

Conditional probabilities (ASW, 162-167)

• A conditional probability refers to the probability of an event A occurring, given that another event B has occurred.

• Notation: P(A | B)

• Read this as the “conditional probability of A given B” or the “probability of A given B.”

• Conditional probabilities are especially useful in economic analysis because probabilities of an event differ, depending on other events occurring.

Formulae for conditional probabilities

• The conditional probability of A given B is

P(A | B) = P(A ∩ B) / P(B)

• The conditional probability of B given A is

P(B | A) = P(A ∩ B) / P(A)

Number of students by major and Excel skill level

                    Excel skill level
Major of student    None (N)   Low (L)   Medium (M)   High (H)   Total
Math (MA)                  0         2            4          0       6
Business (B)               1         3            6          3      13
Economics (E)              2        12            8          2      24
Other (O)                  0         1            1          1       3
Total                      3        18           19          6      46

This table contains the same data as examined earlier, but reorganized as a table rather than in a tree diagram.

Examples of conditional probabilities from student survey

• What is the probability that a student in each major has a low skill level?

P(L | MA) = P(L ∩ MA) / P(MA) = (2/46) / (6/46) = 2/6 = 0.333

P(L | B) = 3/13 = 0.231

P(L | E) = 12/24 = 0.500

P(L | O) = 1/3 = 0.333

• If a student has a high skill level in Excel, what is the probability that his or her major is Business? Other?

P(B | H) = P(B ∩ H) / P(H) = (3/46) / (6/46) = 3/6 = 0.500

P(O | H) = 1/6 = 0.167
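The calculations from the student survey can be reproduced directly from the counts in the table. The following is a minimal Python sketch; the dictionary layout and the function names `skill_given_major` and `major_given_skill` are illustrative choices, not part of ASW.

```python
# Counts from the table "Number of students by major and Excel skill level".
# Keys are (major, skill level); this layout is an illustrative assumption.
counts = {
    ("MA", "N"): 0, ("MA", "L"): 2,  ("MA", "M"): 4, ("MA", "H"): 0,
    ("B", "N"): 1,  ("B", "L"): 3,   ("B", "M"): 6,  ("B", "H"): 3,
    ("E", "N"): 2,  ("E", "L"): 12,  ("E", "M"): 8,  ("E", "H"): 2,
    ("O", "N"): 0,  ("O", "L"): 1,   ("O", "M"): 1,  ("O", "H"): 1,
}

def skill_given_major(skill, major):
    """P(skill | major) = n(skill and major) / n(major)."""
    n_major = sum(n for (m, _), n in counts.items() if m == major)
    return counts[(major, skill)] / n_major

def major_given_skill(major, skill):
    """P(major | skill) = n(major and skill) / n(skill)."""
    n_skill = sum(n for (_, s), n in counts.items() if s == skill)
    return counts[(major, skill)] / n_skill

for major in ("MA", "B", "E", "O"):
    print(f"P(L | {major}) = {skill_given_major('L', major):.3f}")
for major in ("B", "O"):
    print(f"P({major} | H) = {major_given_skill(major, 'H'):.3f}")
```

Because the sample size of 46 cancels in the ratio, the sketch divides counts directly rather than first converting them to probabilities.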

Using conditional probabilities

• “While four-day workweeks make some sense for the manufacturing sector, it’s much more challenging for service-based companies that have to be available for clients’ questions. Even on Fridays.” Source: The Globe and Mail, September 20, 2008, B18.

• Parents who resided in the largest census metropolitan areas were more likely to have an adult child at home. For example, 41% of parents in Vancouver but only 17% of parents living in rural areas or small towns shared their house with at least one adult child. Source: “Parents with adult children living at home.” Statistics Canada, Canadian Social Trends, Spring 2006.

Sample of Saskatchewan residents

• Random sample of 2,500 Saskatchewan residents from the Census of Canada, 2001 Public Use Microdata File, Individuals File. Obtained from the Internet Data Library System, through the University of Regina Data Library Services.

• Subgroup selected was those with ages 30-64 years, wages and salaries greater than zero, and full-time jobs in the year 2000.

• This resulted in a sample of 700 individuals.

Number of Saskatchewan residents with various levels of wages and salaries and schooling, 2000

                        Years of schooling
Wages and salaries    <12    12-13    14-17    18+    Total
<$20,000               38       84       47      8      177
$20-45,000             69      135      101     20      325
$45,000+               21       72       82     23      198
Total                 128      291      230     51      700

Some conditional probabilities

What is the conditional probability of $45,000 or more in wages and salaries, given less than twelve years of schooling? Given 14-17 years of schooling? Given 18+ years?

P(45+ | <12) = P(45+ ∩ <12) / P(<12) = 21/128 = 0.164

P(45+ | 14-17) = 82/230 = 0.357

P(45+ | 18+) = 23/51 = 0.451

That is, chances of a high income increase with each higher level of schooling.

What is the probability that someone with a middle level of income has 12-13 years of schooling?

P(12-13 | 20-45) = P(12-13 ∩ 20-45) / P(20-45) = 135/325 = 0.415

Conditional probabilities of various levels of wages and salaries, given years of schooling, n=700 Saskatchewan residents, 2000

                        Years of schooling
Wages and salaries    <12      12-13    14-17    18+      Total
<$20,000              0.297    0.289    0.204    0.157    0.253
$20-45,000            0.539    0.464    0.439    0.392    0.464
$45,000+              0.164    0.247    0.357    0.451    0.283
Total                 1.000    1.000    1.000    1.000    1.000
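Each entry in the table of conditional probabilities is a cell count divided by its column total, and the Total column holds the marginal probabilities. A minimal Python sketch of that calculation follows; the variable names are illustrative, and the counts are copied from the table of Saskatchewan residents.

```python
# Counts from the wages-and-salaries by years-of-schooling table (n = 700).
schooling = ["<12", "12-13", "14-17", "18+"]
counts = {
    "<$20,000":   [38, 84, 47, 8],
    "$20-45,000": [69, 135, 101, 20],
    "$45,000+":   [21, 72, 82, 23],
}

# Column totals: number of people at each level of schooling.
col_totals = [sum(row[j] for row in counts.values()) for j in range(len(schooling))]
n = sum(col_totals)

# P(wage level | schooling) = cell count / column total.
conditional = {
    wage: [row[j] / col_totals[j] for j in range(len(schooling))]
    for wage, row in counts.items()
}

# Marginal P(wage level) = row total / n, shown in the "Total" column.
marginal = {wage: sum(row) / n for wage, row in counts.items()}

for wage in counts:
    cells = "  ".join(f"{p:.3f}" for p in conditional[wage])
    print(f"{wage:<11} {cells}  {marginal[wage]:.3f}")
```

Conditioning on schooling means each column, not each row, sums to 1.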

Organizing cross-classification tables

ASW use joint probability tables with joint and marginal probabilities. Study the example on pages 163-164.

– Joint probabilities are the probabilities of the intersection of each pair of events in a cross-classification table.

– Marginal probabilities are the probabilities of each of the events in the rows and columns of the table.

Conditional probabilities can be computed from the numbers of cases, as reported in the cross-classification table, as in the examples shown above. I find this method more useful for the following analysis of independence and dependence.

Independent and dependent events (ASW, 166)

Two events A and B are independent if

P(A | B) = P(A) or P(B | A) = P(B).

That is, the probability of one event is not altered by whether or not the other event occurs.

If P(A | B) = P(A), then P(B | A) = P(B), and vice-versa.

Two events A and B are dependent if

P(A | B) ≠ P(A) or P(B | A) ≠ P(B).

In this case, the occurrence of one event affects the probability of the other event.

Example of dependence

Does the event of having low wages and salaries depend on having few years of schooling?

If A is the event of having a low salary (<$20,000) and B is the event of having less than twelve years of schooling, then

P(A | B) = 38/128 = 0.297

P(A) = 177/700 = 0.253

P(A | B) > P(A), so the chance of having low wages and salaries is greater for those with the least amount of schooling, as compared with the whole sample.

Also note in this case that P(B | A) = 38/177 = 0.215 > 0.183 = P(B). This is an alternative way of checking whether the events are dependent or independent.

Example of independence

Are the event of having 12-13 years of schooling (A) and the event of having wages and salaries of $20-45,000 (B) dependent or independent?

P(A | B) = 135/325 = 0.415

P(A) = 291/700 = 0.416

So these two events are essentially independent of each other. Also note that

P(B | A) = 135/291 = 0.464

P(B) = 325/700 = 0.464

In this case, those with a middle level of schooling (12-13 years) and the middle category of income are similar to a cross-section of the whole sample.
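The two checks above compare a conditional probability with the corresponding unconditional one. The sketch below repeats both comparisons in Python. The helper `compare` and the 0.01 tolerance are assumptions made for illustration only. With sample data the two probabilities are rarely exactly equal, and the tolerance is not a formal statistical test.

```python
# Counts taken from the Saskatchewan wages-and-schooling table (n = 700).
n = 700

def compare(n_a_and_b, n_a, n_b):
    """Return (P(A | B), P(A)) computed from counts."""
    return n_a_and_b / n_b, n_a / n

# Dependence example: A = wages < $20,000 (177), B = < 12 years of schooling (128).
p_a_given_b, p_a = compare(38, 177, 128)
print(f"P(A | B) = {p_a_given_b:.3f}, P(A) = {p_a:.3f}")

# Independence example: A = 12-13 years of schooling (291), B = wages $20-45,000 (325).
p_a_given_b2, p_a2 = compare(135, 291, 325)
print(f"P(A | B) = {p_a_given_b2:.3f}, P(A) = {p_a2:.3f}")

# The 0.01 tolerance is an arbitrary illustrative threshold, not a statistical test.
TOLERANCE = 0.01
first_independent = abs(p_a_given_b - p_a) < TOLERANCE
second_independent = abs(p_a_given_b2 - p_a2) < TOLERANCE
print("first pair roughly independent: ", first_independent)
print("second pair roughly independent:", second_independent)
```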

Using dependence and independence

Some authors have argued that parents in higher socio-economic positions may have a greater tendency to expect their children to be independent earlier than those with less education and income….However, the analysis…does not show support for these interpretations. Parents with a higher level of education were neither more nor less likely than less well-educated parents to live with their adult children. Nor were parents with high personal income any less likely than those with lower personal income to provide accommodation for their children. Source: “Parents with adult children living at home.” Statistics Canada, Canadian Social Trends, Spring 2006.

Independence and dependence in economic analysis

Is the price of wheat received by Saskatchewan farmers dependent on the weather in Russia?

Is the chance of NAFTA being renegotiated dependent on the result of the U.S. presidential election?

Is the consumption of table salt dependent on interest rates? Is it dependent on health fads?

Multiplication rule (ASW, 165)

The multiplication rule can be used to compute the probability of the intersection of two events.

P(A ∩ B) = P(A) P(B | A)

P(A ∩ B) = P(B) P(A | B)

But note that if events A and B are independent of each other, then P(B | A) = P(B) and P(A | B) = P(A), so that

P(A ∩ B) = P(A) P(B)

Example of multiplication rule

What is the probability of wages and salaries of $20-45,000 (A) and having 12-13 years of schooling (B)?

Since we have already seen that A and B are essentially independent,

P(A ∩ B) ≈ P(A) × P(B) = (325/700) × (291/700) = 0.193.

Note that P(A ∩ B) = 135/700 = 0.193 from the table.

What is the probability of wages and salaries of $45,000+ (C) and having 14-17 years of schooling (D)?

In this case, we have not checked for independence, so use the full formula:

P(C ∩ D) = P(C) P(D | C) = (198/700) × (82/198) = 82/700 = 0.117
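The second example can be checked in both directions, and compared with what the shortcut for independent events would give. The following Python sketch uses exact fractions; the variable names are illustrative, and the count of 230 people with 14-17 years of schooling comes from the column total of the Saskatchewan table.

```python
from fractions import Fraction

n = 700

# C = wages of $45,000+ (198 people), D = 14-17 years of schooling (230 people),
# and 82 people are in both categories.
p_c = Fraction(198, n)
p_d = Fraction(230, n)
p_d_given_c = Fraction(82, 198)
p_c_given_d = Fraction(82, 230)

# General multiplication rule, applied both ways.
p_c_and_d = p_c * p_d_given_c
assert p_c_and_d == p_d * p_c_given_d == Fraction(82, n)
print(f"P(C and D) = {float(p_c_and_d):.3f}")

# Treating C and D as independent would give a different answer here.
p_if_independent = p_c * p_d
print(f"P(C) P(D)  = {float(p_if_independent):.3f}")
```

The product P(C) P(D) comes to about 0.093, below the 0.117 obtained from the full formula. This suggests that C and D are dependent, so the shortcut should not be used for this pair.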

Using independence

• Independent trials of an experiment:

– Successive flips of a coin.

– Many rolls of a die or a pair of dice.

– Sale of a product to customers arriving at a retail store.

• If a population is small and a case that is selected is not replaced before the next case is drawn, then successive drawings are dependent on each other. But if the population is large, successive draws do not alter the composition of the population. Thus, random selection of respondents from a large population produces independence of successive selections.
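To see why population size matters, consider drawing two people without replacement from a population that is half female. The Python sketch below compares a small and a large population. Both population sizes (10 and 10,000) and the function name are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def second_female_given_first_female(population, females):
    """P(second draw is female | first draw was female), drawing without replacement."""
    return Fraction(females - 1, population - 1)

# Hypothetical populations, each half female; the sizes are illustrative only.
for population in (10, 10_000):
    females = population // 2
    unconditional = Fraction(females, population)
    conditional = second_female_given_first_female(population, females)
    print(f"N = {population:>6}: P(F2) = {float(unconditional):.4f}, "
          f"P(F2 | F1) = {float(conditional):.4f}")
```

In the small population the conditional probability (4/9) differs noticeably from 1/2, so the draws are dependent. In the large population it is 4999/9999, so close to 1/2 that successive draws can be treated as independent.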

When trials of an experiment are independent of each other, then the binomial probability distribution can be used to determine the probability of several occurrences of an event in many trials (ASW, section 5.4).

Random variables (ASW, 185)

• A random variable is a numerical description of the outcome of an experiment.

• Or, a random variable attaches a numerical value to each possible experimental outcome.

• A random variable is often assigned an algebraic symbol such as x.

• A random variable can be either discrete (a countable number of possible values) or continuous (not countable, taking any numerical value within an interval).

• Chapter 5 deals with discrete random variables.

• Chapter 6 deals with continuous random variables.

Discrete random variables

• A discrete random variable is any random variable that has a finite number of possible values or a countably infinite number of possible values.

• Examples:

– The number of females in a sample of 3 persons selected from a large population that is half female and half male (x = 0, 1, 2, 3).

– The sum of the faces shown when a pair of dice is rolled (x = 2, 3, 4, … , 12).

– The number of customers at a restaurant at lunch, up to the maximum of the number of seats (x = 0, 1, 2, 3, 4, 5, … , 45).

– The number of unemployed workers in Saskatchewan reported by Statistics Canada each month (x = 0, 1, 2, … , 29,800).

– The number of homeowners who have defaulted on mortgages in the United States during the last year.

Continuous random variables

• Any random variable whose possible values cannot be counted is termed continuous. Alternatively, if the possible outcomes can take on any numerical value in an interval or set of intervals, the random variable is continuous.

• Examples:

– Number of kilometres goods are transported from a manufacturing plant to a warehouse.

– Time taken to ship the goods.

– Exchange rates for currencies.

– Household income.

• We will study the continuous uniform distribution and the normal distribution (bell curve) next week.

Probability distributions

• A probability distribution describes the values of a random variable, along with the associated probabilities of occurrence of those values.

– Discrete – probabilities of each value of the random variable.

– Continuous – probability that the random variable is within a particular interval.

Discrete probability distribution. (ASW, 189-192)

For a discrete random variable x, the probability distribution is the set of values of x, along with f(x), the function that gives the probability for each value of x.

For each value of x, f(x) is no less than 0 and no greater than 1. The sum of the probabilities for all values of x equals 1. Symbolically,

0 ≤ f(x) ≤ 1

∑ f(x) = 1

Probability distribution for sex of person selected

x        f(x)
0        1/8 = 0.125
1        3/8 = 0.375
2        3/8 = 0.375
3        1/8 = 0.125
Total    8/8 = 1.000

Equally likely outcomes for experiment of randomly selecting 3 persons from a large population of half males and half females:

FFF, FFM, FMF, FMM, MFF, MFM, MMF, MMM

In this example, the values of f(x) are obtained using the classical interpretation of probability.

Let the random variable x be the number of females selected and f(x) the probabilities for each value of x.
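The classical calculation can be sketched in Python by listing the eight equally likely outcomes and counting the females in each. The variable names are illustrative. The sketch assumes, as the slide does, that each selection is female or male with probability one half, independently of the others.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes when 3 persons are selected: FFF, FFM, ..., MMM.
outcomes = ["".join(o) for o in product("FM", repeat=3)]

# x = number of females in each outcome; f(x) = (outcomes with x females) / 8.
tally = Counter(o.count("F") for o in outcomes)
f = {x: Fraction(tally[x], len(outcomes)) for x in sorted(tally)}

for x, p in f.items():
    print(f"x = {x}: f(x) = {p} = {float(p):.3f}")

# The probabilities of a valid discrete distribution sum to 1.
total = sum(f.values())
print("sum of f(x) =", total)
```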

Responses to “Would you like to lower tuition, even if it meant larger class sizes?”

Response       Numerical value    Number of respondents    Relative frequency
Strong no      1                   2                       0.044
Weak no        2                   5                       0.111
Indifferent    3                  10                       0.222
Weak yes       4                   8                       0.178
Strong yes     5                  20                       0.444
Total                             45                       0.999

Probability distribution and expected value for lower tuition question

x        f(x)     x f(x)
1        0.044    0.044
2        0.111    0.222
3        0.222    0.666
4        0.178    0.712
5        0.445    2.225
Total    1.000    E(x) = 3.869

If a student is randomly selected, let x be the response to the lower tuition question. In this case, the values of the probability function f(x) are the relative frequencies of occurrence of the responses to the question.

Graphing discrete probability distributions

• Use a line chart as in Figure 5.1 of ASW. Or it could be a bar chart with spaces left between the bars, to visually indicate that it is a discrete distribution.

• Convention is to place the values of the random variable x on the horizontal axis and values of the probability function f(x) on the vertical axis.

• Examples that follow illustrate these methods.

[Figure: Probability of Statistics Courses Completed. Horizontal axis: Number of Courses Completed (0 to 7). Vertical axis: Probability (0.00 to 0.50).]

Source: Fall 2005 Survey, prepared by Harvey King

Expected values (ASW, 195)

The expected value E(x) of a random variable x is the mean of the probability distribution. Symbolically,

E(x) = μ = ∑ x f(x)

where μ (pronounced something like “mu”) is a Greek symbol used to indicate mean.

The concept of expected value is more general than just referring to the mean, in that the expected value can be obtained for other expressions – see later notes on the variance. However, in this course, it will be used to denote the expected value of x, or the mean.

Expected value for x, number of females selected

x        f(x)           x f(x)
0        1/8 = 0.125    0.000
1        3/8 = 0.375    0.375
2        3/8 = 0.375    0.750
3        1/8 = 0.125    0.375
Total    8/8 = 1.000    1.500

E(x) = μ = ∑ x f(x) = 1.500

If a random sample of 3 persons is obtained from a large population composed of half females and half males, the expected number of females selected is 1.5. If there are many samples of 3 persons each time, the mean number of females across the samples is 1.5.

Expected value for responses to lower tuition question

x        f(x)     x f(x)
1        0.044    0.044
2        0.111    0.222
3        0.222    0.666
4        0.178    0.712
5        0.445    2.225
Total    1.000    E(x) = 3.869

The expected value of the responses is 3.869, or 3.9. Recall that a response of 3 was “indifferent” and a response of 4 was “weak yes” so, in this sample, the expected value or mean is just below “weak yes.”
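The expected value for the lower tuition question is the sum of each response value times its probability. The Python sketch below repeats that sum with the rounded f(x) values from the table, where the last value appears as 0.445, apparently so that the total is 1.000. As a comparison that is not in the original notes, it also computes the mean directly from the numbers of respondents. The variable names are illustrative.

```python
# f(x) as used in the expected-value table: relative frequencies rounded to
# three decimals, with the value for x = 5 shown as 0.445.
f = {1: 0.044, 2: 0.111, 3: 0.222, 4: 0.178, 5: 0.445}

expected = sum(x * p for x, p in f.items())
print(f"E(x) from rounded f(x): {expected:.3f}")

# For comparison, the mean computed from the raw numbers of respondents.
respondents = {1: 2, 2: 5, 3: 10, 4: 8, 5: 20}
n = sum(respondents.values())
mean_from_counts = sum(x * k for x, k in respondents.items()) / n
print(f"Mean from counts (n = {n}): {mean_from_counts:.3f}")
```

The two results differ slightly in the third decimal place because of rounding in f(x); both round to 3.9.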

Variance (ASW, 195)

The variance of a probability distribution is the expected value of the squares of the differences of the random variable x from the mean μ. Symbolically,

Var(x) = σ² = ∑ (x – μ)² f(x)

The Greek symbol σ is “sigma.”

The variance can be difficult to calculate and interpret. It is in units that are the square of the units of the random variable x. Partly because of this, in statistical work it is more common to use the square root of the variance, σ, which is called the standard deviation. The standard deviation has the same units as x.

Variance of x, number of females selected

x        f(x)           x f(x)    x – μ    (x – μ)²    (x – μ)² f(x)
0        1/8 = 0.125    0.000     -1.5     2.25        0.28125
1        3/8 = 0.375    0.375     -0.5     0.25        0.09375
2        3/8 = 0.375    0.750      0.5     0.25        0.09375
3        1/8 = 0.125    0.375      1.5     2.25        0.28125
Total    8/8 = 1.000    1.500                          0.75000

If a random sample of 3 persons is obtained from a large population composed of half females and half males, the expected number of females selected is μ = 1.5. The variance of the number of females selected is Var(x) = σ² = ∑ (x – μ)² f(x) = 0.75. The standard deviation is the square root of 0.75, so that σ = 0.866.
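The mean, variance and standard deviation for the number of females selected can be verified with a short Python sketch. Exact fractions are used so that the intermediate values match the table; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Probability distribution of x, the number of females in a sample of 3.
f = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# Expected value: sum of x f(x).
mu = sum(x * p for x, p in f.items())

# Variance: sum of squared deviations from the mean, weighted by f(x).
variance = sum((x - mu) ** 2 * p for x, p in f.items())

# Standard deviation: square root of the variance.
sigma = sqrt(float(variance))

print(f"mu = {float(mu):.3f}")
print(f"variance = {float(variance):.3f}")
print(f"sigma = {sigma:.3f}")
```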

Later this class or next day

• Binomial probability distribution

• Continuous probability distributions

• Bring along copies of the Normal Distribution for Monday and Wednesday, Sept. 29 and October 1. This is Table 1 of Appendix B of ASW.
