nossi ch 9

Chapter 9: Collecting and Interpreting Data

Transcript of nossi ch 9

Page 1: nossi ch 9

Chapter 9

Collecting and Interpreting Data

Page 2: nossi ch 9

Section 9.1
Populations, Samples, and Data
• Goals
  • Study populations and samples
  • Study data
    • Quantitative data
    • Qualitative data
  • Study bias
  • Study simple random sampling

Page 3: nossi ch 9

Populations and Samples
• The entire set of objects being studied is called the population.
• The members of a population are called elements.

Page 4: nossi ch 9

Populations and Samples, cont’d
• Any characteristic of elements of the population is called a variable.
• Quantitative variables can be expressed as numbers.
• Qualitative variables cannot be expressed as numbers and are usually expressed as categories.

Page 5: nossi ch 9

Populations and Samples
• A census measures the variable for every element of the population.
  • A census is time-consuming and expensive, unless the population is very small.
• Instead of dealing with the entire population, a subset, called a sample, is usually selected for study.

Page 6: nossi ch 9

Example 1
• Suppose you want to determine voter opinion on a ballot measure. You survey potential voters among pedestrians on Main Street during lunch.
a) What is the population?
b) What is the sample?
c) What is the variable being measured?

Page 7: nossi ch 9

Example 1
a) Solution: The population consists of all the people who intend to vote on the ballot measure.

Page 8: nossi ch 9

Example 1
b) Solution: The sample consists of all the people you interviewed on Main Street who intend to vote on the ballot measure.

Page 9: nossi ch 9

Example 1
c) Solution: The variable being measured is the voter’s intent to vote “yes” or “no” on the ballot measure.

Page 10: nossi ch 9

• Qualitative data with a natural ordering is called ordinal.
  • For example, a ranking of a pizza on a scale of “Excellent” to “Poor” is ordinal.
• Qualitative data without a natural ordering is called nominal.
  • For example, eye color is nominal.

Page 11: nossi ch 9

Example 2
• Suppose you survey potential voters among the people on Main Street during lunch to determine their political affiliation and age, as well as their opinion on the ballot measure.
• Classify the variables as quantitative or qualitative.

Page 12: nossi ch 9

Example 2
• Solution:
  • Political affiliation is a qualitative variable (categories).
  • Age is a quantitative variable (numbers).
  • Opinion on the ballot measure is a qualitative variable (categories).

Page 13: nossi ch 9

Common Sources of Bias
• Faulty sampling: The sample is not representative.
• Faulty questions: The questions are worded to influence the answers.
• Faulty interviewing: Interviewers fail to survey the entire sample, misread questions, and/or misinterpret answers.

Page 14: nossi ch 9

Common Sources of Bias, cont’d
• Lack of understanding or knowledge: The person being interviewed does not understand the question or needs more information.
• False answers: The person being interviewed intentionally gives incorrect information.

Page 15: nossi ch 9

Example 3
• Suppose you wish to determine voter opinion regarding eliminating the capital gains tax. You survey potential voters on a street corner near Wall Street in New York City.
• Identify a source of bias in this poll.

Page 16: nossi ch 9

Example 3
• Solution: One source of bias in choosing the sample is that people who work on Wall Street would benefit from the elimination of the tax and are more likely to favor it than the average voter.
  • This is faulty sampling.

Page 17: nossi ch 9

Example 4
• Suppose a car manufacturer wants to test the reliability of 1000 alternators. They will test the first 30 from the lot for defects.
• Identify any potential sources of bias.

Page 18: nossi ch 9

Example 4
• Solution: One source of bias could be that the first 30 alternators are chosen for the sample. It may be that defects are either much more likely at the beginning of a production run or much less likely at the beginning. In either case, the sample would not be representative.
  • This is potentially faulty sampling.

Page 19: nossi ch 9

Simple Random Samples
Given a population and a desired sample size, a simple random sample is any sample chosen in such a way that all samples of the same size are equally likely to be chosen.

Page 20: nossi ch 9

Simple Random Samples, cont’d
• One way to choose a simple random sample is to use a random number generator or table.
  • A random number generator is a computer or calculator program designed to produce numbers with no apparent pattern.
  • A random number table is a table produced with a random number generator.
• An example of the first few rows of a random number table is shown on the next slide.

Page 21: nossi ch 9

Random Number Table

Page 22: nossi ch 9

Example 5
• Choose a simple random sample of size 5 from 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex.

Page 23: nossi ch 9

Example 5, cont’d
• Solution: Assign numerical labels to the population elements, in any order, as shown below:

Page 24: nossi ch 9

Example 5
• Choose a random spot in the table to begin.
  • One option is to start at the top of the third column and to read down, looking at the last 2 digits in each number. This choice is arbitrary. There are many ways to use this table.
• Numbers that correspond to population labels are recorded, ignoring duplicates, until 5 such numbers have been found.

Page 25: nossi ch 9

Example 5

Page 26: nossi ch 9

Example 5
• The numbers located are 01, 06, 10, 11, and 07.
• The simple random sample consists of Beatrix, Gaston, Heidi, Kirsten, and Lex.
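
With a random number generator instead of a table, the same kind of sample can be drawn in a few lines. This is a minimal sketch, not part of the original example: the function name `simple_random_sample` and the seed value are illustrative assumptions, and the names it picks will generally differ from the table-based result above.

```python
import random

semifinalists = [
    "Astoria", "Beatrix", "Charles", "Delila", "Elsie", "Frank",
    "Gaston", "Heidi", "Ian", "Jose", "Kirsten", "Lex",
]

def simple_random_sample(population, size, seed=None):
    """Return `size` distinct elements; every subset of that size is equally likely."""
    rng = random.Random(seed)  # the seed is optional and only makes a run repeatable
    return rng.sample(population, size)

# Hypothetical seed, chosen only for repeatability.
sample = simple_random_sample(semifinalists, 5, seed=9)
print(sample)
```

`random.sample` draws without replacement, so no semifinalist can appear twice, which matches the "ignoring duplicates" rule used with the table.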

Page 27: nossi ch 9

Example 6
• Choose a simple random sample of size 8 from the states of the United States of America.

Page 28: nossi ch 9

Example 6

Page 29: nossi ch 9

Example 6
• We randomly choose to start at the top row, left column of the number table and read the last 2 digits of each entry across the row.
  • The entries are 03918 77195 47772 21870 87122 99445 10041 31795 63857 64569 34893 20429 43537 25368 95237 17707 34280 04755 64301 66836 12201…

Page 30: nossi ch 9

Example 6
• The numbers obtained from the table are 18, 22, 45, 41, 29, 37, 07, 01.
• The states selected for the sample are Washington, Florida, Vermont, West Virginia, Arkansas, Kentucky, Nevada, and Alaska.
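
The table-reading procedure can be sketched in code. This assumes the states are labeled 01 through 50 (the labeling slide is not reproduced in this transcript) and uses the entries as read across the top row; `pick_labels` is a hypothetical helper name.

```python
entries = [
    "03918", "77195", "47772", "21870", "87122", "99445", "10041",
    "31795", "63857", "64569", "34893", "20429", "43537", "25368",
    "95237", "17707", "34280", "04755", "64301", "66836", "12201",
]

def pick_labels(entries, sample_size, lowest, highest):
    """Read the last 2 digits of each entry, keeping valid labels and skipping duplicates."""
    chosen = []
    for entry in entries:
        label = int(entry[-2:])
        if lowest <= label <= highest and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break  # stop as soon as enough labels are found
    return chosen

# Assumed labeling: states numbered 01 to 50.
labels = pick_labels(entries, sample_size=8, lowest=1, highest=50)
print(labels)  # [18, 22, 45, 41, 29, 37, 7, 1]
```

The second 37 (from 95237) is skipped as a duplicate, and values such as 95 or 72 are skipped because no state carries that label.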

Page 31: nossi ch 9

Section 9.2
Survey Sampling Methods
• Goals
  • Study sampling methods
    • Independent sampling
    • Systematic sampling
    • Quota sampling
    • Stratified sampling
    • Cluster sampling

Page 32: nossi ch 9

9.2 Initial Problem
• You need to interview at least 800 people nationwide.
  • You need a different interviewer for each county.
  • Each interviewer costs $50 plus $10 per interview.
  • Your budget is $15,000.
• Which is better, a simple random sample of all adults in the U.S. or a simple random sample of adults in randomly-selected counties?
  • The solution will be given at the end of the section.

Page 33: nossi ch 9

Sample Survey Design
• Simple random sampling can be expensive and time-consuming in practice.
• Statisticians have developed sample survey design to provide less expensive alternatives to simple random sampling.

Page 34: nossi ch 9

Independent Sampling
• In independent sampling, each member of the population has the same fixed chance of being selected for the sample.
  • The size of the sample is not fixed ahead of time.
  • For example, in a 50% independent sample, each element of the population has a 50% chance of being selected.

Page 35: nossi ch 9

Example 1
• Find a 50% independent sample of the 12 semifinalists: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex.

Page 36: nossi ch 9

Example 1
• One suggestion is to let the digits 0, 1, 2, 3, or 4 represent “select this contestant” and let the remaining digits represent “do not select this contestant”.
• We randomly choose column 6 in the random number table and look at the first 12 digits: 99445 20429 04.
• Contestants: Astoria, Beatrix, Charles, Delila, Elsie, Frank, Gaston, Heidi, Ian, Jose, Kirsten, and Lex
  • The first 9 indicates that Astoria is not selected.
  • The second 9 indicates that Beatrix is not selected.
  • The 4 indicates that Charles is selected, and so on…
• The 50% independent sample is Charles, Delila, Frank, Gaston, Heidi, Ian, Kirsten, and Lex.
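
The digit rule above can be checked mechanically. The sketch below first replays the 12 table digits from the example, then shows a generator-based variant; `independent_sample` and its seed are illustrative assumptions, not part of the original slides.

```python
import random

contestants = [
    "Astoria", "Beatrix", "Charles", "Delila", "Elsie", "Frank",
    "Gaston", "Heidi", "Ian", "Jose", "Kirsten", "Lex",
]
table_digits = "994452042904"  # first 12 digits read from column 6 in the example

# Digits 0-4 mean "select this contestant"; digits 5-9 mean "do not select".
table_sample = [
    name for name, digit in zip(contestants, table_digits) if digit in "01234"
]
print(table_sample)

def independent_sample(population, chance, seed=None):
    """Include each element independently with probability `chance`."""
    rng = random.Random(seed)
    return [element for element in population if rng.random() < chance]

# Hypothetical seed; the sample size is not fixed ahead of time.
random_sample = independent_sample(contestants, 0.5, seed=1)
print(len(random_sample))
```

Note that the table-based sample has 8 members, not 6: a 50% independent sample only averages half the population over many repetitions.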

Page 37: nossi ch 9

Systematic Sampling
• In systematic sampling, we decide ahead of time what proportion of the population we wish to sample.
• For a 1-in-k systematic sample:
  • List the population elements in some order.
  • Randomly choose a number, r, from 1 to k.
  • The elements selected are those labeled r, r + k, r + 2k, r + 3k, …

Page 38: nossi ch 9

Example 3
• Use systematic sampling to select a 1-in-10 systematic sample of the 100 automobiles produced in one day at a factory.

Page 39: nossi ch 9

Example 3
• Solution: List the automobiles in some order.
• Suppose we randomly choose r = 5.
  • Since r = 5 and k = 10, the automobiles selected for the sample are those labeled 5, 15, 25, 35, 45, 55, 65, 75, 85, and 95.
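
The 1-in-k rule translates directly into a stepped range. This is a small sketch assuming the elements are labeled 1 through n; the function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(n, k, r=None, seed=None):
    """Return the labels r, r + k, r + 2k, ... that do not exceed n."""
    if r is None:
        r = random.Random(seed).randint(1, k)  # random start from 1 to k, inclusive
    return list(range(r, n + 1, k))

# The example above: 100 automobiles, k = 10, and the random start r = 5.
cars = systematic_sample(n=100, k=10, r=5)
print(cars)  # [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
```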

Page 40: nossi ch 9

Example 3
• A systematic sample is easier to choose than an independent sample.
• However, the regularity in the selection of a systematic sample can sometimes be a source of bias.

Page 41: nossi ch 9

Quota Sampling
• In quota sampling, the sample is chosen to be representative for known important variables.
  • Quotas may be set for age groups, genders, ethnicities, occupations, and so on.
  • There is no way to know ahead of time which variables are important enough to require quotas.
• Quota sampling is not always reliable.

Page 42: nossi ch 9

Stratified Sampling
• In stratified sampling, the population is subdivided into 2 or more nonoverlapping subsets, each of which is called a stratum. Examples of strata are:
  • Men and women
  • Children, working adults, retired adults

Page 43: nossi ch 9

Example 4
Select a stratified random sample of 10 men and 10 women from a population of 200 (100 men and 100 women).
Solution: The 2 strata are men and women.
Choose a simple random sample from the men.
Number the 100 men with labels 00 through 99.
Use the random number table to choose 10 men.
Repeat for the women.
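
The same procedure can be sketched with a random number generator standing in for the table. The labels `M00`..`M99` and `W00`..`W99`, the function name `stratified_sample`, and the seed are all assumptions made for illustration.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of `per_stratum` elements from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

# Hypothetical labels 00 through 99 within each stratum.
strata = {
    "men": [f"M{i:02d}" for i in range(100)],
    "women": [f"W{i:02d}" for i in range(100)],
}

chosen = stratified_sample(strata, per_stratum=10, seed=2)
print(chosen["men"])
print(chosen["women"])
```

Sampling each stratum separately guarantees exactly 10 men and 10 women, which a single simple random sample of 20 from all 200 people would not.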

Page 44: nossi ch 9

Example 4
The stratified random sample is represented below.

Page 45: nossi ch 9

Cluster Sampling
• In cluster sampling, the population is divided into nonoverlapping subsets called sampling units or clusters.
  • Clusters may vary in size.
• A frame is a complete list of the sampling units.
• A sample is a collection of sampling units selected from the frame.
• Examples:
  • Counties
  • Cities
  • Colleges

Page 46: nossi ch 9

Sampling Summary

Page 47: nossi ch 9

9.2 Initial Problem Solution
• You need to interview at least 800 people nationwide.
  • You need a different interviewer for each county.
  • Each interviewer costs $50 plus $10 per interview.
  • Your budget is $15,000.
• Which is better, a simple random sample of all adults in the U.S. or a simple random sample of adults in randomly-selected counties?

Page 48: nossi ch 9

Initial Problem Solution, cont’d
• A simple random sample is unbiased, so this might seem to be the best choice.
• However, there are 3130 counties in the U.S.
  • If, for example, the people in your sample come from only 400 of the counties, it would cost you 400($50) + 800($10) = $28,000.
• With a $15,000 budget, you cannot afford to choose a simple random sample.

Page 49: nossi ch 9

Initial Problem Solution, cont’d
• The second type of sample is a much less expensive choice.
  • You must pay 800($10) = $8000 for the interviews, which leaves $7000 for hiring interviewers.
  • You can select a simple random sample of up to 140 counties.
  • Then select a simple random sample of people from each selected county, for a total of 800 people.
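
The budget arithmetic in this solution can be reproduced with two small helpers. The function names are hypothetical; the dollar amounts are the ones stated in the problem.

```python
COST_PER_INTERVIEWER = 50   # dollars, one interviewer per county
COST_PER_INTERVIEW = 10     # dollars
BUDGET = 15_000
INTERVIEWS = 800

def survey_cost(counties, interviews):
    """Total cost when `counties` interviewers conduct `interviews` interviews."""
    return counties * COST_PER_INTERVIEWER + interviews * COST_PER_INTERVIEW

def max_counties(budget, interviews):
    """Largest number of counties affordable after paying for the interviews."""
    remaining = budget - interviews * COST_PER_INTERVIEW
    return remaining // COST_PER_INTERVIEWER

print(survey_cost(400, INTERVIEWS))      # 28000, well over the 15000 budget
print(max_counties(BUDGET, INTERVIEWS))  # 140
print(survey_cost(140, INTERVIEWS))      # 15000, exactly the budget
```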

Page 50: nossi ch 9

Section 9.3
Central Tendency and Variability
• Goals
  • Study measures of central tendency
    • Mean
    • Median
    • Mode
  • Study measures of dispersion (spread of the data)
    • Range
    • Quartiles
    • Standard deviation

Page 51: nossi ch 9

The Mean
• The mean is the most common type of average.
  • This is an arithmetic mean.
• If there are N numbers in a data set, the mean is:
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_N}{N} \]

Page 52: nossi ch 9

Example 1
• Find the mean of the data set 1, 1, 2, 2, 3.
• Solution: The mean is
\[ \bar{x} = \frac{1 + 1 + 2 + 2 + 3}{5} = \frac{9}{5} = 1\tfrac{4}{5} \]

Page 53: nossi ch 9

Example 2
• A college graduate reads that a company with 5 employees has a mean salary of $48,000.
• How might this be misleading?

Page 54: nossi ch 9

Example 2
• One possibility is that every employee earns a salary of $48,000.
\[ \frac{48000 + 48000 + 48000 + 48000 + 48000}{5} = \frac{240000}{5} = 48000 \]
• Another possibility is that the owner makes $120,000, while the other 4 employees each earn $30,000.
\[ \frac{120000 + 30000 + 30000 + 30000 + 30000}{5} = \frac{240000}{5} = 48000 \]
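
A quick check with Python's standard `statistics` module confirms that both salary lists share the same mean even though they look very different. The variable names are illustrative only.

```python
from statistics import mean

equal_salaries = [48000, 48000, 48000, 48000, 48000]
owner_heavy_salaries = [120000, 30000, 30000, 30000, 30000]

print(mean(equal_salaries))        # 48000
print(mean(owner_heavy_salaries))  # 48000

# The mean alone hides the gap between the owner and everyone else.
print(max(owner_heavy_salaries) - min(owner_heavy_salaries))  # 90000
```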

Page 55: nossi ch 9

The Median
• The median is the “middle number” of a data set when the values are arranged from smallest to largest.
  • If there is an odd number of data points, the data point exactly in the middle of the list is the median.
  • If there is an even number of data points, the mean of the two data points in the middle of the list is the median.

Page 56: nossi ch 9

Example 3
• Find the mean and median of each data set.
a) 0, 2, 4
b) 0, 2, 4, 10
c) 0, 2, 4, 10, 1000

Page 57: nossi ch 9

Example 3, cont’d
a) Solution for 0, 2, 4
• The median is 2.
• The mean is:
\[ \frac{0 + 2 + 4}{3} = \frac{6}{3} = 2 \]

Page 58: nossi ch 9

Example 3, cont’d
b) Solution for 0, 2, 4, 10
• The median is:
\[ \frac{2 + 4}{2} = \frac{6}{2} = 3 \]
• The mean is:
\[ \frac{0 + 2 + 4 + 10}{4} = \frac{16}{4} = 4 \]

Page 59: nossi ch 9

Example 3, cont’d
c) Solution for 0, 2, 4, 10, 1000
• The median is 4.
• The mean is:
\[ \frac{0 + 2 + 4 + 10 + 1000}{5} = \frac{1016}{5} = 203.2 \]

Page 60: nossi ch 9

Example 3, cont’d
• One very large or very small data value can change the mean dramatically.
• Very large or very small data values do not have much of an effect on the median.
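
The three data sets from this example can be run through Python's `statistics` module to see the effect of the single large value. This is only a convenience check of the hand calculations above.

```python
from statistics import mean, median

data_sets = {
    "a": [0, 2, 4],
    "b": [0, 2, 4, 10],
    "c": [0, 2, 4, 10, 1000],
}

results = {label: (mean(values), median(values)) for label, values in data_sets.items()}

for label, (m, med) in results.items():
    print(f"{label}) mean = {m}, median = {med}")
```

Adding 1000 to the data moves the mean from 4 to 203.2, while the median only moves from 3 to 4.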

Page 61: nossi ch 9

Symmetric Distributions
• If the mean and median of a data set are equal, the data distribution is called symmetric.
  • An example of a symmetric data set is shown below.

Page 62: nossi ch 9

Skewed Distributions
• A distribution is skewed left if the mean is less than the median.
• A distribution is skewed right if the mean is greater than the median.
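
The mean-versus-median rule used on these slides can be written as a small classifier. `describe_shape` is a hypothetical name, and the exact equality test is an assumption that suits small whole-number data; with measured data a tolerance would be more sensible.

```python
from statistics import mean, median

def describe_shape(data):
    """Classify a data set by comparing its mean with its median."""
    m, med = mean(data), median(data)
    if m == med:
        return "symmetric"
    return "skewed left" if m < med else "skewed right"

print(describe_shape([0, 2, 4]))              # symmetric
print(describe_shape([0, 2, 4, 10, 1000]))    # skewed right
print(describe_shape([0, 996, 998, 1000]))    # skewed left
```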

Page 63: nossi ch 9

The Mode
• The mode is the most commonly-occurring value in a data set.
• A data set may have:
  • No mode.
  • One mode.
  • Multiple modes.

Page 64: nossi ch 9

Example 5
• Find the mode(s) of the following set of test scores: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.
• Solution: The value 87 occurs more times than any other score. The mode is 87.
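
Counting occurrences makes the mode easy to find, including the "no mode" and "multiple modes" cases. The sketch below assumes one common convention, namely that a data set in which every value occurs exactly once has no mode; the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(data):
    """Return all most-frequent values, or [] when no value repeats."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: all values distinct means no mode
    return sorted(value for value, count in counts.items() if count == highest)

scores = [26, 32, 54, 62, 67, 70, 71, 71, 74, 76,
          80, 81, 84, 87, 87, 87, 89, 93, 95, 96]

print(modes(scores))        # [87]
print(modes([1, 2, 3]))     # []
print(modes([1, 1, 2, 2]))  # [1, 2]
```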

Page 65: nossi ch 9

Example 5, cont’d

Page 66: nossi ch 9

The Weighted Mean
• A weighted mean is calculated when different data points have different levels of importance, called weights.
• If the numbers in a data set, \( x_1, x_2, \ldots, x_N \), have weights \( w_1, w_2, \ldots, w_N \), then the weighted mean is:
\[ \frac{w_1 x_1 + w_2 x_2 + \cdots + w_N x_N}{w_1 + w_2 + \cdots + w_N} \]

Page 67: nossi ch 9

Example 6
• Suppose your grades one semester are:
  • An A in a 5-credit course
  • A B in a 4-credit course
  • A C in two 3-credit courses
• What is your GPA that semester?

Page 68: nossi ch 9

Example 6
• Solution: A grade of A is worth 4 points, a B 3 points, and a C 2 points.
  • The weights are the number of credits.
• Your GPA is the weighted mean of your grades:
\[ \frac{4(5) + 3(4) + 2(3) + 2(3)}{5 + 4 + 3 + 3} \approx 2.93 \]
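
The GPA calculation is a direct use of the weighted mean formula, with grade points as the values and credits as the weights. The function name `weighted_mean` is an illustrative choice.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = [4, 3, 2, 2]  # A, B, C, C
credits = [5, 4, 3, 3]       # weights

gpa = weighted_mean(grade_points, credits)
print(round(gpa, 2))  # 2.93
```

The numerator is 20 + 12 + 6 + 6 = 44 and the denominator is 15 credits, so the GPA is 44/15.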

Page 69: nossi ch 9

Measures of Variability
• The measures of central tendency describe only part of the behavior of a data set.
• Statistics that tell us how the data vary from the center are called measures of variability or measures of spread.
• The measures of variability studied here are:
  • Range
  • Quartiles
  • Standard deviation

Page 70: nossi ch 9

The Range
• The range of a data set is the difference between the largest data value and the smallest data value.

Page 71: nossi ch 9

Example 8
• Compute the mean and the range for each data set.
a) 3, 4, 5, 6, 7, 8
b) 0, 2, 5, 7, 8, 11

Page 72: nossi ch 9

Example 8, cont’d
• Solution:
a) 3, 4, 5, 6, 7, 8
  • The mean is 5.5.
  • The range is 8 - 3 = 5.
b) 0, 2, 5, 7, 8, 11
  • The mean is 5.5.
  • The range is 11 - 0 = 11.
• The two data sets have the same mean, but the difference in ranges shows that the second data set is more spread out.
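
Both results can be confirmed in a few lines. `data_range` is a hypothetical helper name (chosen to avoid shadowing Python's built-in `range`).

```python
from statistics import mean

def data_range(data):
    """Largest data value minus smallest data value."""
    return max(data) - min(data)

data_a = [3, 4, 5, 6, 7, 8]
data_b = [0, 2, 5, 7, 8, 11]

print(mean(data_a), data_range(data_a))  # 5.5 5
print(mean(data_b), data_range(data_b))  # 5.5 11
```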

Page 73: nossi ch 9

Quartiles
• Quartiles are measures of location that divide a data set approximately into fourths.
• The quartiles are labeled as the
  • first quartile, \( q_1 \)
  • second quartile, \( q_2 \)
    • The second quartile is the same as the median.
  • third quartile, \( q_3 \)

Page 74: nossi ch 9

Quartiles
• To find the quartiles, arrange the data values in order from smallest to largest.
1) Find the median. This is also the second quartile.
2) If the number of data points is even, go to Step 3. If the number of data points is odd, remove the median from the list before going to Step 3.

Page 75: nossi ch 9

Quartiles
3) Divide the remaining data points into a lower half and an upper half.
4) The first quartile, \( q_1 \), is the median of the lower half of the data.
5) The third quartile, \( q_3 \), is the median of the upper half of the data.

Page 76: nossi ch 9

Quartiles, cont’d
• The interquartile range, IQR, is the difference between the first and third quartiles.
  • IQR = q3 – q1
• The IQR is a measure of variability.
• About half of the data points lie between q1 and q3.

Page 77: nossi ch 9

Example 10
• Find the median, the first and third quartiles, and the interquartile range for the test scores:
• 26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.

Page 78: nossi ch 9

Example 10
26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.
• The median is $m = \frac{76 + 80}{2} = 78$.
• Since there is an even number of data points, we do not remove the median from the list.
• The first quartile is the median of the lower half of the list: 26, 32, 54, 62, 67, 70, 71, 71, 74, 76.
• The first quartile is $q_1 = \frac{67 + 70}{2} = 68.5$.

Page 79: nossi ch 9

Example 10
• The third quartile is the median of the upper half of the list: 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.
• The third quartile is $q_3 = \frac{87 + 87}{2} = 87$.
• The IQR is $q_3 - q_1$ = 87 – 68.5 = 18.5.

Page 80: nossi ch 9

The Five-Number Summary
• The five-number summary of a data set is a list of 5 informative numbers related to that set:
  • The smallest value, s
  • The first quartile, q1
  • The median, m
  • The third quartile, q3
  • The largest value, L
• The numbers are always written in this order.

Page 81: nossi ch 9

Example 11
• Consider the set of test scores from the previous example:
26, 32, 54, 62, 67, 70, 71, 71, 74, 76, 80, 81, 84, 87, 87, 87, 89, 93, 95, 96.
• The five-number summary for this data set is 26, 68.5, 78, 87, 96.
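As a rough check, the five-number summary for these scores can be computed with a short Python sketch. The function name `five_number_summary` is an illustrative assumption, and the quartiles follow the median-of-halves procedure from the earlier slides.

```python
from statistics import median


def five_number_summary(values):
    """Return (s, q1, m, q3, L): smallest, first quartile, median, third quartile, largest."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd number of points the median is left out of both halves
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


scores = [26, 32, 54, 62, 67, 70, 71, 71, 74, 76,
          80, 81, 84, 87, 87, 87, 89, 93, 95, 96]
print(five_number_summary(scores))
```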

Page 82: nossi ch 9

Box-and-Whisker Plot
• The box-and-whisker plot, also called a box plot, is a graphical representation of the five-number summary of a data set.
  • The box (rectangle) represents the IQR.
  • The location of the median is marked within the box.
  • The whiskers (lines) represent the lower and upper 25% of the data.

Page 83: nossi ch 9

Box-and-Whisker Plot

Page 84: nossi ch 9

Example 12
• The list of test scores from the previous example had a five-number summary of 26, 68.5, 78, 87, 96.
• The box-and-whisker plot for this data set is shown below.

Page 85: nossi ch 9

Example 13
• The monthly rainfall for 2 cities is shown below.
• Use box-and-whisker plots to compare the rainfall amounts.

Page 86: nossi ch 9

Example 13, cont’d
• Solution: In St. Louis, MO, the rainfalls were: 2.21, 2.23, 2.31, 2.64, 2.96, 3.20, 3.26, 3.29, 3.74, 4.10, 4.12.
  • The median is 3.08.
  • The first quartile is 2.475.
  • The third quartile is 3.515.
• The five-number summary for St. Louis is 2.21, 2.475, 3.08, 3.515, 4.12.

Page 87: nossi ch 9

Example 13, cont’d
• Solution, cont’d: In Portland, OR, the rainfalls were: 0.46, 1.13, 1.47, 1.61, 2.08, 2.31, 3.05, 3.61, 3.93, 5.17, 6.14, 6.16.
  • The median is 2.68.
  • The first quartile is 1.54.
  • The third quartile is 4.55.
• The five-number summary for Portland is 0.46, 1.54, 2.68, 4.55, 6.16.

Page 88: nossi ch 9

Example 13
• Solution, cont’d: The 2 box-and-whisker plots are shown above.
• Note that the amount of rainfall in Portland, OR, varies much more from month to month than it does in St. Louis, MO.
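The difference in variability can also be read directly from the two five-number summaries stated on the earlier slides. The sketch below uses those stated summaries as given; the dictionary layout and the `spread` helper are illustrative assumptions.

```python
# Five-number summaries (s, q1, m, q3, L) as stated in Example 13
summaries = {
    "St. Louis, MO": (2.21, 2.475, 3.08, 3.515, 4.12),
    "Portland, OR": (0.46, 1.54, 2.68, 4.55, 6.16),
}


def spread(summary):
    """Return the range (L - s) and the interquartile range (q3 - q1)."""
    s, q1, m, q3, L = summary
    return {"range": L - s, "iqr": q3 - q1}


for city, summary in summaries.items():
    result = spread(summary)
    print(f"{city}: range = {result['range']:.2f}, IQR = {result['iqr']:.2f}")
```

Both the range and the IQR are larger for Portland, which matches the longer box and whiskers in its plot.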

Page 89: nossi ch 9

Standard Deviation
• The standard deviation is a widely used measure of variability.
• Calculating the standard deviation requires several intermediate steps, which will be illustrated using the data set of incomes shown below.

Page 90: nossi ch 9

Deviation From The Mean
• The difference between a data point and the mean of the data set is called the deviation from the mean of that data point.

Page 91: nossi ch 9

Deviation From The Mean, cont’d
• The mean income is $35,800.

Page 92: nossi ch 9

Sample Variance
• The variance of the incomes is calculated by first squaring all the deviations.

Page 93: nossi ch 9

Sample Variance, cont’d
• The squared deviations are added and then divided by n – 1 = 9 – 1 = 8.
• $s^2 = \frac{2,465,560,000}{8} = 308,195,000$

Page 94: nossi ch 9

Standard Deviation
• Standard deviation is the square root of the variance.
• The standard deviation of the incomes is: $s = \sqrt{s^2} = \sqrt{308,195,000} \approx 17,555$ dollars.
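In symbols, the steps on the preceding slides can be summarized as below. The notation is an assumption added here for illustration: $x_i$ stands for a data point, $\bar{x}$ for the sample mean, and $n$ for the number of data points. The last line plugs in the income totals stated on the slides.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i
  && \text{(sample mean)} \\
s^2 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
  && \text{(sample variance)} \\
s &= \sqrt{s^2}
  && \text{(sample standard deviation)} \\
s &= \sqrt{\frac{2{,}465{,}560{,}000}{9 - 1}} = \sqrt{308{,}195{,}000} \approx 17{,}555
  && \text{(incomes, in dollars)}
\end{align*}
```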

Page 95: nossi ch 9

Example 14
• Find the sample standard deviation of the weights (in pounds) in the 2 data sets.
  • Turkeys: 17, 18, 19, 20, 21
  • Dogs: 13, 16, 19, 22, 25

Page 96: nossi ch 9

Example 14
• Solution:
• The sample mean for the turkeys is 19 pounds.
• The sample mean for the dogs is also 19 pounds.
• We note that although the means are the same, the standard deviations should reflect the amount of variability in the data values.

Page 97: nossi ch 9

Example 14
• The deviations from the mean for the turkey weights are found: –2, –1, 0, 1, 2. Their squares, 4, 1, 0, 1, 4, add up to 10.

Page 98: nossi ch 9

Example 14
• The sample variance of the turkey weights is 2.5 square pounds.
  $s^2 = \frac{10}{5 - 1} = \frac{10}{4} = 2.5$
• The sample standard deviation of the turkey weights is 1.58 pounds.
  $s = \sqrt{s^2} = \sqrt{2.5} \approx 1.58$

Page 99: nossi ch 9

Example 14
• The deviations from the mean for the dog weights are found: –6, –3, 0, 3, 6. Their squares, 36, 9, 0, 9, 36, add up to 90.

Page 100: nossi ch 9

Example 14
• The sample variance of the dog weights is 22.5 square pounds.
  $s^2 = \frac{90}{5 - 1} = \frac{90}{4} = 22.5$
• The sample standard deviation of the dog weights is 4.74 pounds.
  $s = \sqrt{s^2} = \sqrt{22.5} \approx 4.74$

Page 101: nossi ch 9

Example 14
• The sample standard deviation of the turkey weights is 1.58 pounds.
• The sample standard deviation of the dog weights is 4.74 pounds.
• The standard deviation of the sample of dog weights is larger than the standard deviation of the sample of turkey weights because there was a much wider spread among the dog weights.
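The calculation in Example 14 can be reproduced step by step in Python. This is a minimal sketch that mirrors the slides (deviations, squared deviations, division by n – 1, square root); the function names are hypothetical.

```python
from math import sqrt


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance."""
    return sqrt(sample_variance(values))


turkeys = [17, 18, 19, 20, 21]
dogs = [13, 16, 19, 22, 25]

for name, weights in (("Turkeys", turkeys), ("Dogs", dogs)):
    print(f"{name}: variance = {sample_variance(weights)}, "
          f"standard deviation = {sample_std_dev(weights):.2f}")
```

Python's standard library offers `statistics.variance` and `statistics.stdev`, which use the same n – 1 divisor and could be used instead of the hand-written helpers.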

Page 102: nossi ch 9

9.3 Initial Problem Solution
• Which stockbroker should you choose if you want to minimize risk while maintaining a steady rate of growth?
  • One stockbroker’s recommendations had percentage gains of 21%, -3%, 16%, 27%, 9%, 11%, 13%, 6%, and 17%.
  • The other’s recommendations had percentage gains of 11%, 13%, 16%, 8%, 5%, 14%, 15%, 17%, and 18%.

Page 103: nossi ch 9

Initial Problem Solution
• First you could calculate the mean rate of return for each stockbroker.
  • Both stockbrokers have a mean rate of return of 13%.
• Since the average growth rates are the same, you can measure the variability to determine which stockbroker’s recommendations have the least variability.

Page 104: nossi ch 9

Initial Problem Solution, cont’d
• First stockbroker:

Page 105: nossi ch 9

Initial Problem Solution, cont’d
• Second stockbroker:

Page 106: nossi ch 9

Initial Problem Solution, cont’d
• The standard deviation of the second stockbroker’s portfolio, 4.30, is much smaller than the standard deviation of the first stockbroker’s portfolio, 8.73.
• Since the mean growth rates were the same, the second stockbroker should be chosen in order to minimize risk.
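The comparison of the two stockbrokers can be verified with Python's standard `statistics` module, whose `stdev` function computes the sample standard deviation (dividing by n – 1). The variable names below are illustrative assumptions.

```python
from statistics import mean, stdev

# Percentage gains of each stockbroker's recommendations
first_broker = [21, -3, 16, 27, 9, 11, 13, 6, 17]
second_broker = [11, 13, 16, 8, 5, 14, 15, 17, 18]

for name, gains in (("First", first_broker), ("Second", second_broker)):
    print(f"{name} stockbroker: mean = {mean(gains)}%, "
          f"sample standard deviation = {stdev(gains):.2f}")
```

Both means come out to 13%, so the smaller standard deviation is what separates the two portfolios.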

Page 107: nossi ch 9

Ch 9 Assignment
• You must show some work for calculations to receive full credit.
• Section 9.1 pg 573 (1, 3, 4, 13, 14, 19, 23, 25, 27)
• Section 9.2 pg 586 (1, 2, 21, 27, 33, 39)
• Section 9.3 pg 614 (1, 5, 15, 16, 19, 21, 33 and find standard deviation = square root of the variance, 35)
• I will also be giving an extra credit assignment. You will review an article from the Tennessean. This assignment can count as a homework assignment.