Chapter 8 Quantitative Data Analysis. Meaningful Information Quantitative Analysis Quantitative...

Posted on 01-Jan-2016


Chapter 8

Quantitative Data Analysis

Quantitative Analysis

Quantitative analysis is a scientific approach to answering questions. Raw data are processed and manipulated, resulting in meaningful information.

(Diagram: Raw Data → Quantitative Analysis → Meaningful Information)

What is Quantitative Analysis?

Quantitative data analysis means making sense of the numbers for meaningful interpretation.

It involves:
1. Organizing the data
2. Doing the calculations
3. Interpreting the information
4. Explaining limitations

What are the Options for Summarizing Distributions?

• Measures of Central Tendency:
  • Mode
  • Median
  • Mean

• Measures of Variation:
  • Range
  • Interquartile range
  • Variance
  • Standard deviation

The Mode

The most frequent value in a distribution.

In a distribution of Americans' religious affiliations, Protestant Christian is the most frequently occurring value: the largest single group.

(Figure: bar chart, Respondent's Religious Preference (GSS94); x-axis: RS RELIGIOUS PREFERENCE, with categories Protestant, Catholic, Jewish, None, Other; y-axis: Count, 0 to 2000)
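A minimal Python sketch of finding the mode of a nominal variable. The responses below are invented for illustration (they are not the GSS94 data shown in the chart); the code tallies each category and keeps every category tied for the highest count, since a distribution can have more than one mode.

```python
from collections import Counter

# Hypothetical responses invented for illustration; these are NOT the GSS94 data.
responses = ["Protestant", "Catholic", "Protestant", "None",
             "Jewish", "Protestant", "Catholic", "Other"]

counts = Counter(responses)   # frequency of each category
top = max(counts.values())    # the highest frequency
# Keep every category tied for the highest frequency (a distribution can be multimodal).
modes = [value for value, n in counts.items() if n == top]

print(counts.most_common())  # categories from most to least frequent
print("Mode:", modes)
```

Because the mode only needs counts, this works for nominal data, where a median or mean would be meaningless.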

The Median

The point that divides the distribution in half (the 50th percentile).

The median in a frequency distribution is determined by identifying the value corresponding to a cumulative percentage of 50.

HIGHEST YEAR OF SCHOOL COMPLETED

                 Frequency  Percent  Valid Percent  Cumulative Percent
Valid     0              4       .1             .1                  .1
          1              1       .0             .0                  .2
          2              3       .1             .1                  .3
          3              6       .2             .2                  .5
          4             12       .4             .4                  .9
          5             15       .5             .5                 1.4
          6             19       .6             .6                 2.0
          7             29      1.0            1.0                 3.0
          8            109      3.6            3.6                 6.6
          9             85      2.8            2.8                 9.5
          10           102      3.4            3.4                12.9
          11           168      5.6            5.6                18.5
          12           929     31.0           31.1                49.6
          13           277      9.3            9.3                58.9
          14           321     10.7           10.7                69.6
          15           146      4.9            4.9                74.5
          16           433     14.5           14.5                89.0
          17            97      3.2            3.2                92.2
          18           119      4.0            4.0                96.2
          19            46      1.5            1.5                97.8
          20            64      2.1            2.1                99.9
          DK             3       .1             .1               100.0
          Total       2988     99.9          100.0
Missing   NAP            4       .1
Total                 2992    100.0
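The cumulative-percentage rule can be sketched in Python using the frequencies for years 0 through 20 from the table above. This sketch leaves out the DK and NAP rows (an assumption; the table itself counts DK among the valid cases), and `median_from_frequencies` is a hypothetical helper name, not part of any library.

```python
# Frequencies for HIGHEST YEAR OF SCHOOL COMPLETED, years 0 through 20,
# taken from the frequency table; the DK and NAP rows are left out (assumption).
freqs = [4, 1, 3, 6, 12, 15, 19, 29, 109, 85, 102,
         168, 929, 277, 321, 146, 433, 97, 119, 46, 64]


def median_from_frequencies(freqs):
    """Return the first value whose cumulative percentage reaches 50.

    freqs[v] is the number of cases with value v (hypothetical helper).
    """
    total = sum(freqs)
    running = 0
    for value, count in enumerate(freqs):
        running += count
        if 100 * running / total >= 50:
            return value
    raise ValueError("no cases in the distribution")


print(median_from_frequencies(freqs))  # 13
```

With these counts the function returns 13: the cumulative percentage stays just under 50 through 12 years and passes 50 at 13 years, matching the table's Cumulative Percent column (49.6, then 58.9). This simple rule ignores the edge case where the cumulative percentage lands exactly on 50; there the median is conventionally taken halfway between that value and the next one.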

The Mean

The mean is just the arithmetic average.

Mean = Sum of the values of all cases / Number of cases

The Mean, cont’d

For example, to calculate the mean value of eight cases, add the values of all cases and divide by the number of cases (N):

(28 + 117 + 42 + 10 + 77 + 51 + 64 + 55) / 8 = 444 / 8 = 55.5
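The same arithmetic in a short Python sketch, using the eight values from the example; `statistics.mean` from the standard library gives the same result.

```python
from statistics import mean

values = [28, 117, 42, 10, 77, 51, 64, 55]  # the eight cases in the example

total = sum(values)  # 444
n = len(values)      # N = 8
print(total / n)     # 55.5
print(mean(values))  # 55.5, the same result from the standard library
```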

It is important to know that the median household income in the United States is a bit over $40,000 a year, but we also need to know the variation in income: the fact that incomes range from zero up to hundreds of millions of dollars.

Measures of variation capture how widely or tightly the values of a variable (income, for instance) are spread.

Measures of Variation

• Four measures of variation for quantitative variables:

1. Range

2. Interquartile range

3. Variance

4. Standard deviation

The Range

Simplest measure of variation

Calculated as the highest value in a distribution minus the lowest value, plus 1:

Range = Highest value – Lowest value + 1

It is often important to report the range of a distribution to identify the whole range of possible values that might be encountered.

The Range, cont’d.

Say that you surveyed 10 people and asked them how many times they saw the movie Star Wars, and their answers looked like this:

Number of times respondent saw Star Wars: 0, 2, 2, 3, 4, 4, 5, 20, 2, 1

The range for "times respondent saw Star Wars" is 20 – 0 + 1 = 21.

Because the range can be drastically altered by one exceptionally high or low value (termed an outlier), it is not a good summary measure for most purposes.
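A small Python sketch of the range as defined on these slides (highest minus lowest, plus 1), applied to the Star Wars answers. Note that many statistics texts and software packages report the range as simply highest minus lowest (20 here); the "+ 1" follows this chapter's convention for whole-number counts. The helper name `range_plus_one` is hypothetical.

```python
# Answers from the 10 respondents: times each one saw Star Wars.
star_wars = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]


def range_plus_one(data):
    """Range as defined in this chapter: highest value - lowest value + 1."""
    return max(data) - min(data) + 1


print(range_plus_one(star_wars))  # 21

# Sensitivity to the outlier: without the respondent who answered 20,
# the range shrinks from 21 to 6.
without_outlier = [x for x in star_wars if x != 20]
print(range_plus_one(without_outlier))  # 6
```

Dropping one case cuts the range by more than two-thirds, which is the outlier problem described above.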

Interquartile Range

The interquartile range avoids the problem created by outliers by showing the range where most cases lie.

Quartiles are the points in a distribution corresponding to the first 25% of cases, the first 50% of cases, and the first 75% of cases.

Interquartile Range

• Star Wars example: number of times respondents saw Star Wars.

• The first 25% of cases fall between 0 and 1.75 times.

• The second quartile falls between 1.75 and 2.5 times.

• The third quartile falls between 2.5 and 4.25 times.

• The last quartile falls between 4.25 and 20 times.

Interquartile Range, cont’d

The interquartile range is the difference between the third quartile and the first quartile (plus 1).

In the Star Wars example, the interquartile range is

4.25 – 1.75 + 1 = 3.50
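The quartiles above can be reproduced with Python's `statistics.quantiles`. This sketch assumes the slides used the common p × (N + 1) positioning rule, which is what the `"exclusive"` method does; other methods give slightly different cut points (the `"inclusive"` method, for instance, gives 2.0 and 4.0 for Q1 and Q3 with these data).

```python
from statistics import quantiles

star_wars = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]

# quantiles() sorts the data itself. The "exclusive" method (the default)
# places cut points at positions p * (N + 1) in the sorted data.
q1, q2, q3 = quantiles(star_wars, n=4, method="exclusive")
print(q1, q2, q3)  # 1.75 2.5 4.25

iqr = q3 - q1 + 1  # this chapter's convention adds 1
print(iqr)         # 3.5
```

The outlier (20) only affects the top quartile's upper end, so the interquartile range stays small.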

Variance

Statistical definition: the average squared deviation of each case from the mean:
• Take each case's distance from the mean,
• Square that number, and
• Take the average of all such numbers.

Variance, cont'd

Takes into account the amount by which each case differs from the mean.

It is affected by outliers, such as the person who saw Star Wars 20 times.

Mainly useful for computing the standard deviation, which comes next.
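The three steps of the definition can be sketched in Python with the Star Wars answers. `statistics.pvariance` divides by N, matching the definition here; `statistics.variance` divides by N − 1 instead (the sample variance, about 32.68 for these data).

```python
from statistics import pvariance

star_wars = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]

n = len(star_wars)
y_bar = sum(star_wars) / n                            # mean = 4.3
squared_devs = [(y - y_bar) ** 2 for y in star_wars]  # distance from mean, squared
variance = sum(squared_devs) / n                      # average of the squares (divide by N)

print(round(variance, 2))              # 29.41
print(round(pvariance(star_wars), 2))  # 29.41 from the standard library

# The single outlier contributes (20 - 4.3) ** 2 = 246.49 of the 294.1 total.
print(round(max(squared_devs), 2))     # 246.49
```

Squaring makes large deviations dominate, which is why the variance is so sensitive to the respondent who saw the movie 20 times.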

Standard Deviation

The standard deviation is the square root of the variance. It is the square root of the average squared deviation of each case from the mean:

$$\text{Standard deviation} = \sqrt{\frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}{N}}$$

Symbol key: $\bar{Y}$ = mean; $N$ = number of cases; $\Sigma$ = sum over all cases; $Y_i$ = value of case $i$ on variable $Y$; $\sqrt{\ }$ = square root.
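A Python sketch of the definition above applied to the Star Wars answers: the square root of the average squared deviation, dividing by N. `statistics.pstdev` uses the same N denominator; `statistics.stdev` would divide by N − 1 instead.

```python
import math
from statistics import pstdev

star_wars = [0, 2, 2, 3, 4, 4, 5, 20, 2, 1]

n = len(star_wars)
y_bar = sum(star_wars) / n
# Square root of the average squared deviation from the mean (divide by N).
sd = math.sqrt(sum((y - y_bar) ** 2 for y in star_wars) / n)

print(round(sd, 2))                 # 5.42
print(round(pstdev(star_wars), 2))  # 5.42, same definition in the standard library
```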

Standard Deviation

The standard deviation has mathematical properties that make it the preferred measure of variability, particularly when a variable is normally distributed.

The graph of a normal distribution looks like a bell, with one "hump" in the middle, centered on the population mean, and the number of cases tapering off on both sides of the mean.

(Figure: histogram of Scores; Std. Dev = 12.67, Mean = 75.0, N = 25)

Normal Distribution

A normal distribution is symmetric: if you folded it in half at its center (at the population mean), the two halves would match perfectly.

If a variable is normally distributed, about 68% of the cases (roughly two-thirds) lie within plus or minus 1 standard deviation of the distribution's mean, and 95% of the cases lie within 1.96 standard deviations above and below the mean.
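The 68% and 95% figures can be checked with Python's `statistics.NormalDist`. The second half of this sketch applies them to the Scores histogram (Mean = 75.0, Std. Dev = 12.67), assuming for illustration only that those scores were exactly normal; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def share_within(k):
    """Proportion of a normal distribution within +/- k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)


print(f"within 1 SD:    {share_within(1):.3f}")     # 0.683
print(f"within 1.96 SD: {share_within(1.96):.3f}")  # 0.950

# Applied to the Scores histogram (Mean = 75.0, Std. Dev = 12.67),
# assuming for illustration that the scores were exactly normal:
mean, sd = 75.0, 12.67
print(f"about 68% between {mean - sd:.2f} and {mean + sd:.2f}")               # 62.33 and 87.67
print(f"about 95% between {mean - 1.96 * sd:.2f} and {mean + 1.96 * sd:.2f}")  # 50.17 and 99.83
```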


Normal Distribution

• The normal curve is a tool for estimating how far a sample result is likely to be from the true population value.

• It indicates how big a "margin of error" there is likely to be in poll results.

Margin of error

• The price that researchers pay for not talking to everyone in the population.

• Describes the range within which the true answer would likely fall if the researcher had talked to everyone instead of just a sample.

• http://www.custominsight.com/articles/random-sample-calculator.asp
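As a sketch of how the normal curve yields a margin of error, here is the standard formula for a sample proportion, z × √(p(1 − p)/n). It assumes a simple random sample, the normal approximation, and a population much larger than the sample; the poll numbers are hypothetical, and this is not necessarily the calculation used by the linked calculator.

```python
import math
from statistics import NormalDist


def margin_of_error(p, n, confidence=0.95):
    """Margin of error for a sample proportion p from a sample of size n.

    Assumes a simple random sample, the normal approximation, and a
    population much larger than the sample.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical poll: 50% answer "yes" in a random sample of 1,000 people.
print(f"{margin_of_error(0.5, 1000):.1%}")  # 3.1%
# Quadrupling the sample size halves the margin of error.
print(f"{margin_of_error(0.5, 4000):.1%}")  # 1.5%
```

Because n sits under a square root, cutting the margin of error in half requires four times as many respondents.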

Different Statistics for Different Data

                      Nominal   Ordinal   Interval/Ratio
Mode                     X         X            X
Median                             X            X
Mean                               X            X
Range                              X            X
Interquartile Range                             X
Variance                                        X
Standard Deviation                              X

Relationships between variables

A crosstabulation (cross tab) displays the distribution of one variable for each category of another variable.

Also known as a bivariate distribution.

Cross tabs are presented first with frequencies and then with percentages.

Crosstabulation of Voting in 2000 by Family Income: Cell Counts and Percentages

FAMILY INCOME: CELL COUNTS

Voting           <$20,000   $20,000-$34,999   $35,000-$59,999   $60,000+
Voted                 178               239               364        761
Did not vote          182               135               168        193
Total (n)           (360)             (374)             (532)      (954)

FAMILY INCOME: PERCENTAGES

Voting           <$20,000   $20,000-$34,999   $35,000-$59,999   $60,000+
Voted                 49%               64%               68%        80%
Did not vote          51%               36%               32%        20%
Total                100%              100%              100%       100%

Source: General Social Survey, 2004. Weighted.
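The percentage table is derived from the cell counts by dividing each count by its income column's total, so every column sums to 100% and income groups of different sizes can be compared directly. A minimal Python sketch using the cell counts from the crosstab; `column_percentages` is a hypothetical helper name.

```python
# Cell counts from the crosstab: income category -> (voted, did not vote)
counts = {
    "<$20,000": (178, 182),
    "$20,000-$34,999": (239, 135),
    "$35,000-$59,999": (364, 168),
    "$60,000+": (761, 193),
}


def column_percentages(voted, not_voted):
    """Percentage voting / not voting within one income column (each column sums to 100%)."""
    total = voted + not_voted
    return round(100 * voted / total), round(100 * not_voted / total)


for income, (voted, not_voted) in counts.items():
    pct_voted, pct_not = column_percentages(voted, not_voted)
    print(f"{income:<16} n={voted + not_voted:<4} voted {pct_voted}%, did not vote {pct_not}%")
```

Recomputing from the counts reproduces the published percentages (49%, 64%, 68%, 80% voted), showing that voting rises with family income.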

Summary statistics describe particular features of a distribution and facilitate comparison among distributions.

The next step is to test for associations . . .

Which calculation do I use? It depends on what you want to know.

Do you want to know how many individuals checked each answer?

Frequency

Do you want the proportion of people who answered in a certain way?

Percentage

Do you want the average number or average score?

Mean

Do you want the middle value in a range of values or scores?

Median

Do you want to show the range in answers or scores?

Range

Do you want to compare one group to another?

Cross tab

Do you want to report changes from pre to post?

Change score

Do you want to show the degree to which a response varies from the mean?

Standard deviation
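A change score is commonly computed for each person as the post score minus the pre score and then summarized, for example with the mean. A minimal Python sketch; the pre/post scores are hypothetical, invented only to show the calculation.

```python
from statistics import mean

# Hypothetical pre- and post-test scores for five participants (invented data).
pre = [60, 72, 55, 80, 68]
post = [70, 75, 65, 78, 74]

# Change score for each participant: post minus pre (positive = improvement).
changes = [after - before for before, after in zip(pre, post)]

print(changes)        # [10, 3, 10, -2, 6]
print(mean(changes))  # 5.4
```

Keeping the per-person changes (rather than only comparing the two group means) shows that one participant's score went down even though the average improved.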