Post on 01-Jan-2016
Chapter 8
Quantitative Data Analysis
Quantitative Analysis
Quantitative analysis is a scientific approach to answering questions.
Raw data are processed and manipulated, resulting in meaningful information.
[Diagram: Raw Data → Quantitative Analysis → Meaningful Information]
What is Quantitative Analysis?
Quantitative data analysis
Making sense of the numbers for meaningful interpretation
It involves:
1. Organizing the data
2. Doing the calculations
3. Interpreting the information
4. Explaining limitations
What are the Options for Summarizing Distributions?
• Measures of Central Tendency:
  • Mode
  • Median
  • Mean
• Measures of Variation:
  • Range
  • Interquartile range
  • Variance
  • Standard deviation
The Mode
The most frequent value in a distribution.
In a distribution of Americans’ religious affiliations, Protestant Christian is the most frequently occurring value, that is, the largest single group.
[Figure: Respondent's Religious Preference (GSS94). Bar chart of count (axis from 0 to 2000) by RS RELIGIOUS PREFERENCE: Protestant, Catholic, Jewish, None, Other.]
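Finding the mode amounts to counting how often each value occurs and picking the largest count. The sketch below shows the idea on a short list of responses that is invented purely for illustration; it is not GSS data, and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical responses, invented for illustration only (not GSS data).
affiliations = ["Protestant", "Catholic", "Protestant", "None",
                "Jewish", "Protestant", "Catholic", "Other"]

# Count how many times each category occurs.
counts = Counter(affiliations)

# most_common(1) returns a one-item list holding the (value, count)
# pair with the highest count, i.e. the mode and its frequency.
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)
```

Because the mode only needs counts, it works for nominal categories such as religious affiliation, where a mean or median would have no meaning.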
The Median
The point that divides the distribution in half (the 50th percentile).
The median in a frequency distribution is determined by identifying the value corresponding to a cumulative percentage of 50.
HIGHEST YEAR OF SCHOOL COMPLETED

                Frequency  Percent  Valid Percent  Cumulative Percent
Valid    0              4      0.1            0.1                 0.1
         1              1      0.0            0.0                 0.2
         2              3      0.1            0.1                 0.3
         3              6      0.2            0.2                 0.5
         4             12      0.4            0.4                 0.9
         5             15      0.5            0.5                 1.4
         6             19      0.6            0.6                 2.0
         7             29      1.0            1.0                 3.0
         8            109      3.6            3.6                 6.6
         9             85      2.8            2.8                 9.5
         10           102      3.4            3.4                12.9
         11           168      5.6            5.6                18.5
         12           929     31.0           31.1                49.6
         13           277      9.3            9.3                58.9
         14           321     10.7           10.7                69.6
         15           146      4.9            4.9                74.5
         16           433     14.5           14.5                89.0
         17            97      3.2            3.2                92.2
         18           119      4.0            4.0                96.2
         19            46      1.5            1.5                97.8
         20            64      2.1            2.1                99.9
         DK             3      0.1            0.1               100.0
         Total       2988     99.9          100.0
Missing  NAP            4      0.1
Total                2992    100.0
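The cumulative-percentage rule for the median can be sketched in a few lines of Python. The frequencies below are copied from the table for years of school completed; the function name `median_category` is an assumption made for this example, and DK is kept as the last valid category only because the table lists it that way.

```python
# Frequencies for years of school completed (0 through 20, then DK),
# copied from the frequency table.
frequencies = [4, 1, 3, 6, 12, 15, 19, 29, 109, 85, 102, 168,
               929, 277, 321, 146, 433, 97, 119, 46, 64, 3]
values = list(range(21)) + ["DK"]


def median_category(values, frequencies):
    """Return the first value whose cumulative percentage reaches 50."""
    total = sum(frequencies)
    running = 0
    for value, freq in zip(values, frequencies):
        running += freq
        if 100 * running / total >= 50:
            return value
    raise ValueError("empty distribution")


print(median_category(values, frequencies))
```

The cumulative percentage is still 49.6 at 12 years and first passes 50 at 13 years, so the sketch returns 13.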
The Mean
The mean is just the arithmetic average.
Mean = Sum of the values of all cases / Number of cases
The Mean, cont’d
For example, to calculate the mean value of eight cases, add the values of all cases and divide by the number of cases (N):
(28 + 117 + 42 + 10 + 77 + 51 + 64 + 55) /8 = 444/8 = 55.5
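The same calculation can be checked in code. This is a minimal sketch using the eight values from the example; the variable names are chosen here for illustration.

```python
# The eight case values from the worked example.
values = [28, 117, 42, 10, 77, 51, 64, 55]

# Mean = sum of the values of all cases / number of cases (N).
mean = sum(values) / len(values)
print(mean)  # 55.5
```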
Measures of Variation
It is important to know that the median household income in the United States is a bit over $40,000 a year, but that alone is not enough.
We also need to know the variation in income: the fact that incomes range from zero up to hundreds of millions of dollars.
Measures of variation capture how widely or densely spread a variable (income, for instance) is.
• Four measures of variation for quantitative variables:
1. Range
2. Interquartile range
3. Variance
4. Standard deviation
The Range
Simplest measure of variation
Calculated as highest value in a distribution minus lowest value, plus 1:
Range = Highest value – Lowest value + 1
It often is important to report the range of a distribution, to identify the whole range of possible values that might be encountered.
The Range, cont’d.
Say that you surveyed 10 people and asked them how many times they saw the movie Star Wars, and their answers looked like this:
Number of times respondent saw Star Wars: 0, 2, 2, 3, 4, 4, 5, 20, 2, 1
The range for “times respondent saw Star Wars” is 20 – 0 + 1 = 21.
Because the range can be drastically altered by one exceptionally high or low value (termed an outlier), it is not a good summary measure for most purposes.
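A short sketch of the range calculation, using the ten Star Wars answers and the "plus 1" convention from these slides, also shows how much a single outlier matters. The function name `inclusive_range` is an assumption made for this example.

```python
# Number of times each of the 10 respondents saw Star Wars.
times_seen = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]


def inclusive_range(data):
    """Highest value minus lowest value, plus 1 (the convention used here)."""
    return max(data) - min(data) + 1


full_range = inclusive_range(times_seen)

# Drop the single largest answer (20) to see the effect of the outlier.
range_without_outlier = inclusive_range(sorted(times_seen)[:-1])

print(full_range)             # 21
print(range_without_outlier)  # 6
```

Removing one respondent shrinks the range from 21 to 6, which is why the range alone is a fragile summary.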
Interquartile Range
The interquartile range avoids the problem created by outliers by showing the range where most cases lie.
Quartiles are the points in a distribution corresponding to the first 25% of cases, the first 50% of cases, and the first 75% of cases.
Interquartile Range
• Star Wars example: number of times respondents saw Star Wars.
• The first 25% of cases fall within the range of 0 to 1.75 times.
• The second quartile falls within the range of 1.75 to 2.5 times.
• The third quartile falls within the range of 2.5 to 4.25 times.
• The last quartile falls within the range of 4.25 to 20 times.
Interquartile Range, cont’d
The interquartile range is the difference between the third quartile and the first quartile (plus 1).
In the Star Wars example, the interquartile range is
4.25 – 1.75 + 1 = 3.50
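The quartile cut points can be reproduced with Python's standard library. Statistical packages differ in how they interpolate quartiles; the sketch below assumes the default `"exclusive"` method of `statistics.quantiles`, which happens to match the cut points quoted for the Star Wars example. Other methods can give slightly different values.

```python
import statistics

# Number of times each of the 10 respondents saw Star Wars.
times_seen = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]

# n=4 asks for the three cut points that split the data into quartiles.
# The default method is "exclusive"; the data are sorted internally.
q1, q2, q3 = statistics.quantiles(times_seen, n=4)

# Interquartile range with the "plus 1" convention used in these slides.
iqr = q3 - q1 + 1

print(q1, q2, q3)  # 1.75 2.5 4.25
print(iqr)         # 3.5
```

The respondent who answered 20 has no effect on these cut points, which is the sense in which the interquartile range resists outliers.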
Variance
Statistical definition: the average squared deviation of each case from the mean.
• Take each case’s distance from the mean,
• Square that number, and
• Take the average of all such numbers.
The variance takes into account the amount by which each case differs from the mean.
It is affected by outliers, such as the person who saw Star Wars 20 times.
It is mainly useful for computing the standard deviation, which comes next.
Standard Deviation
The standard deviation is the square root of the variance, that is, the square root of the average squared deviation of each case from the mean:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N}} \]
Symbol key: \sigma = standard deviation; \bar{Y} = mean; N = number of cases; \sum = sum over all cases; Y_i = value of case i on variable Y; \sqrt{\ } = square root.
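The verbal definitions of the variance and the standard deviation translate directly into code. The sketch below applies them to the Star Wars answers and divides by N, as the definition here does; many statistical packages divide by N - 1 when working with a sample, which gives a slightly larger result. Variable names are chosen for illustration.

```python
import math

# Number of times each of the 10 respondents saw Star Wars.
times_seen = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]

n = len(times_seen)
mean = sum(times_seen) / n

# Step 1 and 2: each case's distance from the mean, squared.
squared_deviations = [(y - mean) ** 2 for y in times_seen]

# Step 3: the average of the squared deviations is the variance.
variance = sum(squared_deviations) / n

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(round(mean, 2), round(variance, 2), round(std_dev, 2))  # 4.3 29.41 5.42
```

Most of the variance comes from the one respondent who answered 20: that case alone contributes about 246 of the roughly 294 total squared deviations.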
Standard Deviation
Standard deviation has mathematical properties that make it the preferred measure of variability, particularly when a variable is normally distributed.
The graph of a normal distribution looks like a bell, with one “hump” in the middle, centered around the population mean, and the number of cases tapering off on both sides of the mean.
[Figure: Histogram of Scores (horizontal axis ticks at 45.0, 65.0, 85.0; vertical axis from 0 to 10). Std. Dev = 12.67, Mean = 75.0, N = 25.]
Normal Distribution
A normal distribution is symmetric: if you folded it in half at its center (at the population mean), the two halves would match perfectly.
If a variable is normally distributed, about 68% of the cases (roughly two-thirds) lie within 1 standard deviation of the distribution’s mean, and 95% of the cases lie within 1.96 standard deviations of the mean.
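The 68% and 95% figures can be checked against the standard normal curve. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `share_within` is an assumption made for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(z):
    """Share of cases lying within z standard deviations of the mean."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


print(round(share_within(1), 4))     # 0.6827
print(round(share_within(1.96), 4))  # 0.95
```

Because every normal distribution has the same shape once it is rescaled by its own mean and standard deviation, these shares hold for any normally distributed variable.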
Normal Distribution
• The normal curve is a tool used to tell how far a sample result is likely to be off from the true population value.
• It indicates how big a "margin of error" there is likely to be in poll results.
Margin of error
• The price that researchers pay for not talking to everyone in the population.
• Describes the range within which the true answer would likely fall if the researcher had talked to everyone instead of just a sample.
• http://www.custominsight.com/articles/random-sample-calculator.asp
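The slides do not give a formula for the margin of error. One common textbook approximation for a proportion from a simple random sample, at the 95% level, is 1.96 times the square root of p(1 - p)/n. The sketch below assumes that formula; the function name, the sample size of 1,000 and the 50% proportion are all chosen here for illustration.

```python
import math


def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and uses z = 1.96 for 95% confidence.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    return z * standard_error


# Illustrative poll: 1,000 respondents, 50% giving a particular answer.
moe = margin_of_error(0.5, 1000)
print(f"{moe:.3f}")  # 0.031
```

Under these assumptions, a poll of 1,000 people carries a margin of error of about plus or minus 3 percentage points. Quadrupling the sample size would only halve it.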
Different Statistics for Different Data
                      Nominal   Ordinal   Interval/Ratio
Mode                     X         X            X
Median                             X            X
Mean                               X            X
Range                              X            X
Interquartile Range                             X
Variance                                        X
Standard Deviation                              X
Relationships between variables
Crosstabulations (cross tabs) display the distribution of one variable for each category of another variable.
A crosstabulation is also known as a bivariate distribution.
Cross tabs are presented first with frequencies and then with percentages.
Crosstabulation of Voting in 2000 by Family Income: Cell Counts and Percentages
FAMILY INCOME: CELL COUNTS
Voting          <$20,000   $20,000-$34,999   $35,000-$59,999   $60,000+
Voted                178               239               364        761
Did not vote         182               135               168        193
Total (n)          (360)             (374)             (532)      (954)

FAMILY INCOME: PERCENTAGES
Voting          <$20,000   $20,000-$34,999   $35,000-$59,999   $60,000+
Voted                49%               64%               68%        80%
Did not vote         51%               36%               32%        20%
Total               100%              100%              100%       100%
Source: General Social Survey, 2004. Weighted.
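The percentage table is derived from the cell counts by dividing each cell by its column total. The sketch below reproduces that step with the counts from the crosstabulation; the variable names are chosen for illustration.

```python
# Cell counts from the crosstabulation of voting by family income.
income_groups = ["<$20,000", "$20,000-$34,999", "$35,000-$59,999", "$60,000+"]
voted = [178, 239, 364, 761]
did_not_vote = [182, 135, 168, 193]

# Column totals: everyone in each income group.
totals = [yes + no for yes, no in zip(voted, did_not_vote)]

# Percentages are computed within each income group (column percentages).
voted_pct = [round(100 * yes / total) for yes, total in zip(voted, totals)]
not_voted_pct = [round(100 * no / total) for no, total in zip(did_not_vote, totals)]

for group, yes_pct, no_pct in zip(income_groups, voted_pct, not_voted_pct):
    print(f"{group}: voted {yes_pct}%, did not vote {no_pct}%")
```

Percentaging within income groups is what makes the columns comparable: the groups differ in size, so raw counts alone would hide the rise in turnout from 49% to 80%.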
Summary statistics describe particular features of a distribution and facilitate comparison among distributions.
The next step is to test for associations . . .
Which calculation do I use? It depends on what you want to know.
Do you want to know how many individuals checked each answer? Frequency
Do you want the proportion of people who answered in a certain way? Percentage
Do you want the average number or average score? Mean
Do you want the middle value in a range of values or scores? Median
Do you want to show the range in answers or scores? Range
Do you want to compare one group to another? Cross tab
Do you want to report changes from pre to post? Change score
Do you want to show the degree to which a response varies from the mean? Standard deviation