Chapter 1
Getting Started
Understanding Basic Statistics Fifth Edition
By Brase and Brase Prepared by Jon Booze
© Cengage Learning. All rights reserved. 1 | 2
What is Statistics?
• Collecting data
• Organizing data
• Analyzing data
• Interpreting data
Individuals and Variables
• Individuals are people or objects included in the study.
• Variables are characteristics of the individual to be measured or observed.
Variables
• Quantitative Variable – The variable is numerical, so operations such as adding and averaging make sense.
• Qualitative Variable – The variable describes an individual through grouping or categorization.
Data
• Population Data – The data are from every individual of interest.
• Sample Data – The data are from only some of the individuals of interest.
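Data
Which of the following Venn diagrams shows the relationship between population data and sample data?
[Figure: four Venn diagrams, labeled a) through d), each showing a different arrangement of the sets S (sample) and P (population).]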
Levels of Measurement
• Nominal Level – The data consist of names, labels, or categories.
• Ordinal Level – The data can be ordered, but the differences between data values are meaningless.
Levels of Measurement
• Interval Level – The data can be ordered and the differences between data values are meaningful.
• Ratio Level – The data can be ordered, differences and ratios are meaningful, and there is a meaningful zero value.
Levels of Measurement
The freezing points of four liquids are 32°F, 6°F, 13°F, and 20°F. What is the level of these measurements?
a). Nominal b). Ordinal c). Interval d). Ratio
Two Branches of Statistics
• Descriptive Statistics: Organizing, summarizing, and graphing information from populations or samples.
• Inferential Statistics: Using information from a sample to draw conclusions about a population.
Sampling Techniques
• Simple Random Sampling, Sample size = n
– Each member of the population has an equal chance of being selected.
– Each sample of size n has an equal chance of being selected.
• Stratified sampling
[Figure: a population divided into Subgroups 1 through 4, with the sample drawn from the subgroups.]
Sampling Techniques
• Systematic sampling
– Number every member of the population.
– Select every kth member.
• Cluster sampling
– Population is naturally divided into pre-existing segments.
– Make a random selection of clusters, then select all members of each cluster.
• Convenience sampling
– Collect sample data from a readily-available population database.
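The sampling techniques listed on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the population of 100 numbered members, the two strata, and the ten clusters are all hypothetical, and the random starting point in the systematic sample is an added convention that the slide does not mention.

```python
import random

def simple_random_sample(population, n, rng):
    # Every sample of size n is equally likely to be drawn.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Assumption: start at a random position among the first k members,
    # then select every kth member after that.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # strata maps a subgroup name to its members; sample within each subgroup.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Choose whole clusters at random, then keep every member of each one.
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)                      # fixed seed so runs are repeatable
population = list(range(1, 101))            # hypothetical members numbered 1..100
srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
strata = {"A": list(range(1, 51)), "B": list(range(51, 101))}
stratified = stratified_sample(strata, 5, rng)
clusters = {c: list(range(c * 10 + 1, c * 10 + 11)) for c in range(10)}
cluster = cluster_sample(clusters, 2, rng)
print(srs, systematic, stratified, cluster, sep="\n")
```

Convenience sampling is left out on purpose: it involves no random selection, so there is nothing to simulate.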
Critical Thinking
Which of the following sampling strategies is likely to lead to a non-sampling error?
Individuals are selected at random from…
a). A database of social security numbers.
b). A cluster of phone books.
c). A collection of birth certificates.
d). None of these is likely to introduce non-sampling error.
Not everyone has a phone. Sampling from phone books may introduce bias.
Guidelines For Planning a Statistical Study
1. Identify individuals or objects of interest.
2. Specify the variables.
3. Determine if you will use the entire population. If not, determine an appropriate sampling method.
4. Determine a data collection plan, addressing privacy, ethics, and confidentiality if necessary.
5. Collect data.
6. Analyze the data using appropriate statistical methods.
7. Note any concerns about the data and recommend any remedies for further studies.
Census vs. Sample
• In a census, measurements or observations are obtained from the entire population (uncommon and often impractical).
• In a sample, measurements or observations are obtained from part of the population (common).
Observational Studies and Experiments
• Observational Study – Measurements are obtained in a way that does not change the response or the variable being measured. (No treatment is applied.)
• Experiment – A treatment is applied in order to observe its effect on the variable being measured.
Experiment
• Used to determine the effect of a treatment.
• Experimental design needs to control for other possible causes of the effect.
– Placebo effect.
– Lurking variables.
• To minimize these confounds, create one or more control groups that receive no treatment.
Experiment Designs
• Double-Blinding – minimizes the unintentional transfer of bias between researcher and subject.
Surveys
• Collecting data from respondents by asking them questions.
Survey Pitfalls
• Nonresponse → undercoverage of population.
• Truthfulness – respondents sometimes lie.
• Faulty recall of respondent.
• Hidden bias – due to poor question wording.
• Vague wording – “sometimes”, “often”, “seldom”.
• Interviewer influence – who is asking the questions and in what manner.
• Voluntary response – relatively interested individuals are more likely to participate.
Chapter 2
Organizing Data
Understanding Basic Statistics Fifth Edition
By Brase and Brase Prepared by Jon Booze
Frequency Tables
• A frequency table
– organizes quantitative data.
– partitions data into classes (intervals).
– shows how many data values are in each class.
Test Score Number of Students
61-70 4
71-80 8
81-90 15
91-100 7
Data Classes and Class Frequency
• Class: an interval of values.
– Example: 61 ≤ x ≤ 70
• Frequency: the number of data values that fall within a class.
– “Five data fall within the class 61 ≤ x ≤ 70”.
• Relative Frequency: the proportion of data values that fall within a class.
– “18% of the data fall within the class 61 ≤ x ≤ 70”.
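As a small sketch of frequency versus relative frequency, the counts below are copied from the test-score table on the Frequency Tables slide (not from the quoted examples above, which use different numbers). The function name and the dictionary layout are hypothetical choices for illustration.

```python
# Class limits and counts copied from the test-score frequency table.
frequency_table = {(61, 70): 4, (71, 80): 8, (81, 90): 15, (91, 100): 7}

def relative_frequencies(table):
    # Relative frequency = class frequency / total number of data values.
    total = sum(table.values())
    return {cls: count / total for cls, count in table.items()}

rel = relative_frequencies(frequency_table)
for (low, high), share in rel.items():
    print(f"{low}-{high}: {frequency_table[(low, high)]} values, {share:.1%}")
```

The relative frequencies always sum to 1, which is a quick check that no class was missed.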
Structure of a Data Class
A “data class” is basically an interval on a number line.
It has:
• A lower limit a and an upper limit b.
• A width.
• A lower boundary and an upper boundary (integer data).
• A midpoint.
Structure of a Data Class
A “data class” is basically an interval on a number line.
If a = 60 and b = 69 for integer data, what is the value of the lower boundary?
a). 60 b). 59.5
c). 9 d). 64.5
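One way to check the question above is to compute the boundaries and midpoint directly. This is a minimal sketch that assumes integer data, where the deck's rule is to subtract 0.5 from the lower limit and add 0.5 to the upper limit; the function names are hypothetical.

```python
def class_boundaries(lower_limit, upper_limit):
    # For integer data, boundaries sit half a unit outside the class limits.
    return lower_limit - 0.5, upper_limit + 0.5

def class_midpoint(lower_limit, upper_limit):
    # The midpoint is the average of the two class limits.
    return (lower_limit + upper_limit) / 2

lower_boundary, upper_boundary = class_boundaries(60, 69)
midpoint = class_midpoint(60, 69)
print(lower_boundary, upper_boundary, midpoint)  # 59.5 69.5 64.5
```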
Constructing Data Classes
• Find the class width.
– Compute (Largest data value – smallest data value) / (Desired number of classes).
– Increase the computed value to the next higher whole number.
• Find the class limits.
– The lower limit of the “leftmost” class is set equal to the smallest value in the data set.
Constructing Data Classes, cont’d
• Find the class boundaries (integer data).
– Subtract 0.5 from the lower class limit and add 0.5 to the upper class limit.
For a certain data set, the minimum value is 25 and the maximum value is 58. If you wish to partition the data into 5 classes, what would be the class width?
a). 5 b). 6 c). 7 d). 8
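The class-construction steps can be worked through for the data set in the question (minimum 25, maximum 58, 5 classes). The sketch below makes two assumptions that the slides do not spell out: "next higher whole number" is read literally, so a quotient that is already whole is also bumped up by one, and each upper limit is taken to be one less than the next lower limit, which fits integer data such as the 61-70, 71-80 table. The function names are hypothetical.

```python
import math

def class_width(smallest, largest, n_classes):
    raw = (largest - smallest) / n_classes
    # Assumption: always move to the next higher whole number,
    # even when raw is already whole (use math.ceil for the other reading).
    return math.floor(raw) + 1

def class_limits(smallest, width, n_classes):
    # The leftmost lower limit is the smallest data value; each later
    # lower limit is one class width further along.
    return [(smallest + i * width, smallest + (i + 1) * width - 1)
            for i in range(n_classes)]

width = class_width(25, 58, 5)          # (58 - 25) / 5 = 6.6, increased to 7
limits = class_limits(25, width, 5)
print(width, limits)
```

With a width of 7 the five classes run from 25 to 59, so the largest value, 58, falls inside the last class.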
Histograms
• Histogram – graphical summary of a frequency table.
• Uses bars to plot the data classes versus the class frequencies.
Making a Histogram
• Make a frequency table.
• Place class boundaries on horizontal axis. Place frequencies on vertical axis.
• For each class, draw a bar with height equal to the class frequency and width equal to the class width plus 1.
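As a rough illustration of the steps above, the sketch below turns the test-score frequency table from the Frequency Tables slide into a text histogram. It is a simplification: the bars are drawn horizontally with one `#` per data value, rather than vertically over class boundaries as the slide describes, and the function name is hypothetical.

```python
# Class labels and frequencies copied from the test-score frequency table.
frequency_table = [("61-70", 4), ("71-80", 8), ("81-90", 15), ("91-100", 7)]

def text_histogram(table):
    # One row per class; bar length equals the class frequency.
    lines = []
    for label, count in table:
        lines.append(f"{label:>6} | {'#' * count} ({count})")
    return lines

rows = text_histogram(frequency_table)
print("\n".join(rows))
```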
[Figure: Making a Histogram]
Distribution Shapes
[Figure: example histograms of five distribution shapes: Symmetric, Uniform, Skewed Left, Skewed Right, and Bimodal.]
Graphical Displays…
• … represent the data.
• … induce the viewer to think about the substance of the graphic.
• … should avoid distorting the message of the data.
Bar Graphs
• Used for qualitative or quantitative data.
• Can be vertical or horizontal.
• Bars are uniformly spaced and have equal widths.
• Length/height of each bar indicates a count or percentage for the variable.
• “Good practice” requires including titles and units and labeling axes.
[Figure: Bar Graphs, example]
Pareto Charts
• A bar chart with two specific features:
– Heights of bars represent frequencies.
– Bars are vertical and are ordered from tallest to shortest.
Circle Graphs/Pie Charts
• Used for qualitative data.
• Wedges of the circle represent proportions of the data that share a common characteristic.
• “Good practice” requires including a title and either wedge labels or legend.
Time-Series
• Shows data measurements in chronological order.
• Data are plotted in order of occurrence at regular intervals over a period of time.
Critical Thinking – which type of graph to use?
• Bar graphs are useful for quantitative or qualitative data.
• Pareto charts identify the frequency in decreasing order.
• Circle graphs display how a total is dispersed into several categories.
• Time-series graphs display how data change over time.
Critical Thinking – which type of graph to use?
What type of graph would be best for showing the ice cream flavor preferences of a group of 100 children?
a). Histogram b). Pareto graph c). Time series graph d). Circle graph
Stem and Leaf Plots
• Displays the distribution of the data while maintaining the actual data values.
• Each data value is split into a stem and a leaf.
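A stem-and-leaf split can be sketched as follows. The data are hypothetical two-digit scores, and the choice of tens digit as stem and ones digit as leaf is an assumption that suits such data; other data may call for a different split.

```python
from collections import defaultdict

def stem_and_leaf(values):
    # Stem = tens part, leaf = ones digit (assumes non-negative integers).
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

data = [61, 68, 72, 75, 75, 83, 84, 89, 90, 97]  # hypothetical scores
plot = stem_and_leaf(data)
# Print every stem in range, including empty ones, so gaps stay visible.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Because each leaf is kept, the original values can be read straight back off the plot.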
[Figure: Stem and Leaf Plot Construction]
Critical Thinking
• Large gaps between stems containing leaves, especially at the top or bottom, suggest the existence of outliers.
• Watch the outliers – are they data errors or simply unusual data values?
Chapter 3
Averages and Variation
Understanding Basic Statistics Fifth Edition
By Brase and Brase Prepared by Jon Booze
Measures of Central Tendency
• Average – a measure of the center value or central tendency of a distribution of values.
• Three types of average:
– Mode
– Median
– Mean
Mode
The mode is the most frequently occurring value in a data set.
Example: Sixteen students are asked how many college math classes they have completed.
{0, 3, 2, 2, 1, 1, 0, 5, 1, 1, 0, 2, 2, 7, 1, 3}
The mode is 1.
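The example can be checked by counting how often each value occurs. The sketch below uses the sixteen values from the slide; the helper name `modes` is hypothetical, and it returns a list because a data set can have more than one most-frequent value.

```python
from collections import Counter

# Number of college math classes completed, copied from the slide.
classes_completed = [0, 3, 2, 2, 1, 1, 0, 5, 1, 1, 0, 2, 2, 7, 1, 3]

def modes(values):
    # Return every value that shares the highest frequency.
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes(classes_completed))  # [1]
```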
Median
Finding the median:
1). Order the data from smallest to largest.
2). For an odd number of data values: Median = Middle data value
3). For an even number of data values: Median = (Sum of middle two values) / 2
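The three steps for the median translate directly into code. This sketch reuses the sixteen values from the Mode example as sample input; the function name is an illustrative choice.

```python
def median(values):
    ordered = sorted(values)                 # step 1: order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                  # step 2: odd count, middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # step 3: average middle two

classes_completed = [0, 3, 2, 2, 1, 1, 0, 5, 1, 1, 0, 2, 2, 7, 1, 3]
print(median(classes_completed))  # 1.5
```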
Mean
Sample mean: x̄ = Σx / n
Population mean: μ = Σx / N
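Both mean formulas perform the same arithmetic: add the values and divide by how many there are. Only the interpretation differs (sample versus population). A minimal sketch, again using the sixteen values from the Mode example as input:

```python
def mean(values):
    # Sum of the data values divided by the number of values.
    return sum(values) / len(values)

classes_completed = [0, 3, 2, 2, 1, 1, 0, 5, 1, 1, 0, 2, 2, 7, 1, 3]
print(mean(classes_completed))  # 31 / 16 = 1.9375
```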
Resistant Measures of Central Tendency
• A resistant measure will not be affected by extreme values in the data set.
• The mean is not resistant to extreme values.
• The median is resistant to extreme values.
• A trimmed mean is also resistant.
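Resistance is easy to see numerically. The sketch below compares the mean, median, and a 10% trimmed mean before and after one value is replaced by an extreme one. The data are hypothetical, and the trimming rule (drop the same fraction of values from each end of the ordered data) is one common convention stated here as an assumption.

```python
import statistics

def trimmed_mean(values, fraction):
    # Drop the lowest and highest `fraction` of the ordered data,
    # then average what is left.
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7]      # hypothetical values
with_outlier = data[:-1] + [70]            # largest value replaced by an extreme one

for label, values in (("original", data), ("with outlier", with_outlier)):
    print(label,
          statistics.mean(values),
          statistics.median(values),
          trimmed_mean(values, 0.10))
```

Only the mean moves when the extreme value is introduced; the median and the trimmed mean stay where they were.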
Critical Thinking
• Four levels of data – nominal, ordinal, interval, ratio (Chapter 1)
• Mode – can be used with all four levels.
• Median – may be used with ordinal, interval, or ratio level.
• Mean – may be used with interval or ratio level.
Critical Thinking
• Mound-shaped data – values of mean, median and mode are nearly equal.
Measures of Variation
• Range = Largest value – smallest value
Only two data values are used in the computation, so much of the information in the data is lost.
Three measures of variation: range, variance, standard deviation
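The three measures can be computed side by side. The slide gives only the range formula, so the sample variance used below, the sum of squared deviations from the mean divided by n − 1, is the standard definition supplied here as an assumption; the data and function names are hypothetical.

```python
import math

def value_range(values):
    # Uses only the two extreme data values.
    return max(values) - min(values)

def sample_variance(values):
    # Sum of squared deviations from the sample mean, divided by n - 1.
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    # The standard deviation is the square root of the variance.
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 5, 7, 8]  # hypothetical values
print(value_range(data), sample_variance(data), sample_std_dev(data))
```

Unlike the range, the variance and standard deviation use every data value.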
The Coefficient of Variation
For Samples: CV = (s / x̄) · 100
For Populations: CV = (σ / μ) · 100
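The coefficient of variation expresses the standard deviation as a percentage of the mean. A minimal sketch using Python's standard `statistics` module, where `stdev` is the sample standard deviation and `pstdev` the population one; the data and the `population` flag are hypothetical.

```python
import statistics

def coefficient_of_variation(values, population=False):
    # Standard deviation as a percentage of the mean.
    spread = statistics.pstdev(values) if population else statistics.stdev(values)
    return spread / statistics.mean(values) * 100

data = [2, 4, 4, 5, 7, 8]  # hypothetical values
cv_sample = coefficient_of_variation(data)
cv_population = coefficient_of_variation(data, population=True)
print(round(cv_sample, 2), round(cv_population, 2))
```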
Percentiles and Quartiles
• For whole numbers P, 1 ≤ P ≤ 99, the Pth percentile of a distribution is a value such that P% of the data fall below it, and (100-P)% of the data fall at or above it.
• Q1 = 25th Percentile
• Q2 = 50th Percentile = The Median
• Q3 = 75th Percentile
[Figure: Quartiles and Interquartile Range (IQR)]
[Figure: Computing Quartiles]
Five Number Summary
• A listing of the following statistics:
– Minimum, Q1, Median, Q3, Maximum
• Box-and-Whisker plot – represents the five-number summary graphically.
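A five-number summary can be sketched as below. Quartile conventions vary between textbooks and software; this version assumes the median-of-halves rule, in which Q1 and Q3 are the medians of the values below and above the median position. The data and function names are hypothetical.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]         # values below the median position
    upper_half = ordered[(n + 1) // 2:]    # values above the median position
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

data = [2, 4, 4, 5, 7, 8, 9, 11, 12]  # hypothetical values
minimum, q1, med, q3, maximum = five_number_summary(data)
print(minimum, q1, med, q3, maximum, "IQR =", q3 - q1)
```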
[Figure: Box-and-Whisker Plot Construction]
Critical Thinking
• Box-and-whisker plots display the spread of data about the median.
• If the median is centered and the whiskers are about the same length, then the data distribution is symmetric around the median.
• Fences – may be placed on either side of the box. Values that lie beyond the fences are outliers. (See problem 10)
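The slides do not say where the fences go. A widely used convention, assumed here, places them 1.5 × IQR below Q1 and 1.5 × IQR above Q3. The sketch below applies that rule to hypothetical data whose quartiles (Q1 = 4, Q3 = 10) come from the median-of-halves method; the function names and the multiplier parameter `k` are illustrative.

```python
def fences(q1, q3, k=1.5):
    # Assumption: fences sit k * IQR outside the quartiles (k = 1.5 by convention).
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def outliers(values, q1, q3):
    # Any value beyond either fence is flagged as an outlier.
    low, high = fences(q1, q3)
    return [v for v in values if v < low or v > high]

data = [2, 4, 4, 5, 7, 8, 9, 11, 30]   # hypothetical; Q1 = 4, Q3 = 10
print(fences(4, 10))                    # (-5.0, 19.0)
print(outliers(data, 4, 10))            # [30]
```

A flagged value still needs a judgment call: it may be a data error or simply an unusual observation.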
Problems
• Pg. 109 #4, 5