Descriptive statistics I Distributions, summary statistics.

Frequency distributions

• Frequency means the number of cases at a single value of a variable

• A “distribution” depicts the frequency (number of cases) at every value of a variable

– Frequency distributions illustrate how values disperse

– For categorical variables use a BAR graph

– For continuous variables use a HISTOGRAM (also try AREA)

• Open DEMO PLUS.SAV

• For categorical choose variable SEX (1=Male, 2=Female)

• For continuous choose variable AGE

• Open Height weight gender age.sav (or .xls), choose a categorical and continuous variable, display their distributions as above
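
Outside SPSS, the same idea can be sketched in a few lines of Python. This is only an illustration: the `sex_codes` and `ages` values are made up (they are not taken from DEMO PLUS.SAV), and the 10-year bin width is an arbitrary choice.

```python
from collections import Counter

# Hypothetical data for illustration only (not from DEMO PLUS.SAV).
sex_codes = [1, 2, 2, 1, 2, 1, 2, 2]          # 1=Male, 2=Female
ages = [19, 23, 23, 31, 35, 38, 42, 47, 58]   # continuous variable

# Categorical: frequency = number of cases at each value (bar graph).
sex_freq = Counter(sex_codes)

# Continuous: group values into 10-year bins, as a histogram does.
bin_width = 10
age_bins = Counter((age // bin_width) * bin_width for age in ages)

labels = {1: "Male", 2: "Female"}
for code in sorted(sex_freq):
    print(f"{labels[code]:<8}{'#' * sex_freq[code]} ({sex_freq[code]})")

for start in sorted(age_bins):
    end = start + bin_width - 1
    print(f"{start}-{end:<6}{'#' * age_bins[start]} ({age_bins[start]})")
```

Each `#` stands for one case, so the printed rows work as a rough text version of the bar graph and the histogram.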

Summarizing distributions

• Producing a single statistic that best depicts a distribution

• For categorical variables, use the statistic “proportion”

– Proportions with a base 100 are called a “percentage” (per 100)

• For continuous variables, use a measure of central tendency

– The statistic “mean” (arithmetic average)

– The statistic “median” (midpoint value – half of cases above, half below)

– The statistic “mode” (most frequent value – can be more than one)

• Open DEMO PLUS.SAV

– For categorical choose variable SEX (1=Male, 2=Female)

• Analyze|Descriptive Statistics|Frequencies

• Ask for a Bar Chart

– For continuous choose variable AGE

• Analyze|Descriptive Statistics|Frequencies

• Ask for a Histogram

• Open Height weight gender age.sav (or .xls), choose a categorical and continuous variable, proceed as above

Categorical variables

• “Percent” is a summary statistic – it summarizes a distribution

• “Percent” – per cent – per hundred. 100 is always the denominator

• Increases in percentage are computed off the base amount:

Example: a jail population of 100 prisoners (the base amount)

• 100 percent increase – 100 percent of 100 is 100; 100 + 100 = 200 (2 times the base amount)

• 150 percent increase – 150 percent of 100 is 150; 150 + 100 = 250 (2.5 times the base amount)

• 200 percent increase – 200 percent of 100 is 200, 200 plus 100= 300 (3 times the base amount)
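
The arithmetic above can be checked with a short Python sketch. The function name `apply_percent_increase` is made up for this illustration.

```python
def apply_percent_increase(base, percent):
    """Return the new amount after increasing `base` by `percent` percent."""
    increase = base * percent / 100   # the increase is computed off the base
    return base + increase

base_population = 100  # prisoners, as in the example above
for pct in (100, 150, 200):
    new_population = apply_percent_increase(base_population, pct)
    print(f"{pct} percent increase: {base_population} -> {new_population:.0f}")
```

Note that a 100 percent increase doubles the base; it does not leave it unchanged.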

• Percentages of less than 1 percent are described as a fraction

– Example: 0.2 percent is 2/10ths of 1 percent

– Do not confuse decimals and percentages

• Decimal .20 = 20/100 = 20 percent

• Decimal .0020 = 20/10,000 = .20 percent
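
A small Python sketch of the two conversions above. It uses exact fractions so that no rounding gets in the way; the helper name `as_percent` is an assumption made for this example.

```python
from fractions import Fraction

def as_percent(proportion):
    """Convert a proportion (decimal) to a percentage by multiplying by 100."""
    return proportion * 100

one_fifth = Fraction(20, 100)    # decimal .20
tiny = Fraction(20, 10_000)      # decimal .0020

print(float(as_percent(one_fifth)))   # prints 20.0 (20 percent)
print(float(as_percent(tiny)))        # prints 0.2 (2/10ths of 1 percent)
```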

• Percentages (proportions) are usually the best way to summarize datasets using categorical variables

– 70 percent of students are employed

– 60 percent of parolees recidivate

• Percentages can be used to summarize findings when large numbers are involved

– 50,000 persons were asked whether crime is a serious problem: 32,700 said “yes”

Compute…

Divide 32,700 by 50,000 and multiply by 100

32,700 / 50,000 = .654

.654 X 100 = 65.4% (about 65%)

• Percentages can be used to compare datasets

– This year, 65% of 10,000 people polled said crime is a serious problem

– Last year, 12,000 people were polled and 9,000 said crime is a serious problem

Compute…

9,000 / 12,000 = .75

.75 X 100 = 75%

• Because both samples were standardized (responses per 100 persons) they are directly comparable even though different numbers of persons were polled

– 65% v. 75%
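
The standardizing step can be sketched in Python as follows. The helper name `percent` is made up for this illustration; the figures are the ones used in the polling examples.

```python
def percent(part, whole):
    """Standardize a count to a per-100 basis."""
    return 100 * part / whole

# Last year: raw counts are given, so compute the percentage.
last_year_pct = percent(9_000, 12_000)

# This year: the percentage is given; back out the raw count for reference.
this_year_pct = 65
this_year_yes = this_year_pct * 10_000 / 100

print(f"This year: {this_year_yes:.0f} of 10000 = {this_year_pct}%")
print(f"Last year: 9000 of 12000 = {last_year_pct:.0f}%")
print(f"Difference: {last_year_pct - this_year_pct:.0f} percentage points")
```

The raw "yes" counts (6,500 v. 9,000) cannot be compared directly because the sample sizes differ; the percentages can.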

• Percentages can magnify differences when raw numbers are small

• Percentages can deflate differences when numbers are large

– Increase from 1 to 3 convictions is …

– Increase from 5,000 to 6,000 convictions is …

Compute both...

• Increase from 1 to 3 convictions is 200 percent

– 3 - 1 = 2

– 2/1 (base) X 100 = 200%

• Increase from 5,000 to 6,000 convictions is 20 percent

– 6,000 - 5,000 = 1,000

– 1000/5000 (base) X 100= 20%
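
Both computations follow the same rule: subtract, divide by the base, multiply by 100. A minimal Python sketch (the function name `percent_change` is an assumption for this example):

```python
def percent_change(old, new):
    """Percent change from `old` to `new`, computed off the base (old) amount."""
    return 100 * (new - old) / old

small = percent_change(1, 3)            # raw increase of 2 convictions
large = percent_change(5_000, 6_000)    # raw increase of 1,000 convictions

print(f"1 to 3: {small:.0f} percent")
print(f"5,000 to 6,000: {large:.0f} percent")
```

The raw increase of 2 yields 200 percent while the raw increase of 1,000 yields only 20 percent, which is why the raw numbers should be reported alongside the percentages.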

Summarizing a distribution for ordinal variables

• Ordinal variables – categories reflect an inherent rank or order

• Can summarize the distribution of an ordinal variable two ways:

– As a categorical variable, using proportions / percentages

– As a continuous variable, treating categories as points on a scale

• Assign a numerical value to each category and calculate a mean

• Open DEMO PLUS.SAV

– Variable “class” is ordinal

– Display and summarize the distribution both ways...

• As a categorical/ordinal variable

• As a continuous variable
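
A hedged Python sketch of summarizing one ordinal variable both ways. The category names, their 1 to 3 codes, and the responses are all assumptions made for illustration; they are not the coding of “class” in DEMO PLUS.SAV.

```python
from collections import Counter
from statistics import mean

# Hypothetical ordinal categories and numeric codes (illustration only).
codes = {"lower": 1, "middle": 2, "upper": 3}
responses = ["lower", "middle", "middle", "upper", "middle",
             "lower", "middle", "upper", "middle", "middle"]

# Way 1: treat as categorical and report percentages.
counts = Counter(responses)
percentages = {cat: 100 * counts[cat] / len(responses) for cat in codes}

# Way 2: treat as continuous by scoring each category and taking a mean.
mean_score = mean(codes[r] for r in responses)

for cat, pct in percentages.items():
    print(f"{cat}: {pct:.0f}%")
print(f"Mean score on the 1-3 scale: {mean_score}")
```

The mean depends on the numbers assigned to the categories, so the scoring should always be reported with it.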

Continuous variables

• If variables are continuous, can summarize a distribution with one or more measures of “central tendency”

– Mean, median, mode

• Mean: arithmetic average of scores

– Pulled in the direction of extreme scores

– Experiment with Height weight gender age.sav

• Median: Middle score – half higher, half lower

– If there is an even number of scores, average the two center scores

– If there is an odd number of scores, use the center score

• Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21

• Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21

Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21

Answer: 8

Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21

Answer: 10

The two center scores are 8 and 12: 12 - 8 = 4; 4/2 = 2; 8 + 2 (or 12 - 2) = 10

• Median is a useful summary statistic when there are extreme scores

– Extreme scores make the mean a misleading summary measure of a distribution

• Median can be used with continuous or ordinal variables
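
The two exercises, and the effect of an extreme score, can be checked with Python's standard `statistics` module. The value 210 below is a made-up extreme score added only to show the effect.

```python
from statistics import mean, median

exercise_1 = [2, 3, 5, 5, 8, 12, 17, 19, 21]        # 9 scores (odd)
exercise_2 = [2, 3, 5, 5, 8, 12, 17, 19, 21, 21]    # 10 scores (even)

print(median(exercise_1))   # the center score
print(median(exercise_2))   # average of the two center scores, 8 and 12

# Swap the top score for a hypothetical extreme value.
with_outlier = exercise_1[:-1] + [210]

print(f"Mean before: {mean(exercise_1):.1f}, after: {mean(with_outlier):.1f}")
print(f"Median before: {median(exercise_1)}, after: {median(with_outlier)}")
```

The mean is pulled toward the extreme score while the median does not move.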

• Mode: Score that occurs most often (with the greatest frequency)

– There can be more than one mode (bi-modal, tri-modal, etc.)

• Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21

• Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21

Exercise 1: 2, 3, 5, 5, 8, 12, 17, 19, 21

Mode = 5 (uni-modal)

Exercise 2: 2, 3, 5, 5, 8, 12, 17, 19, 21, 21

Modes = 5, 21 (bi-modal)

• Modes are a useful summary statistic for distributions where cases cluster at particular scores – an interesting condition that would be missed by the mean or median
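
A short Python check of the two mode exercises. It assumes Python 3.8 or later, where `statistics.multimode` is available and returns every most-frequent value.

```python
from statistics import multimode

exercise_1 = [2, 3, 5, 5, 8, 12, 17, 19, 21]
exercise_2 = [2, 3, 5, 5, 8, 12, 17, 19, 21, 21]

modes_1 = multimode(exercise_1)   # one mode: uni-modal
modes_2 = multimode(exercise_2)   # two modes: bi-modal

print(modes_1)
print(modes_2)
```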

Range

• Another way to describe a distribution of a continuous variable

– Not a measure of central tendency

• Range depicts the lowest and highest scores in a distribution

2, 3, 5, 5, 8, 12, 17, 19, 21

Range is 2 to 21, or 19 (21 - 2)
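
The same result in a minimal Python sketch, using the scores listed above:

```python
scores = [2, 3, 5, 5, 8, 12, 17, 19, 21]

lowest, highest = min(scores), max(scores)
score_range = highest - lowest   # distance between the extremes

print(f"Range is {lowest} to {highest}, or {score_range}")
```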