Chapter 5 Exploring Data: Distributions

Chapter 5 Exploring Data: Distributions February 9, 2010 Brandon Groeger


Transcript of Chapter 5 Exploring Data: Distributions

Page 1: Chapter 5 Exploring Data: Distributions

Chapter 5
Exploring Data: Distributions
February 9, 2010
Brandon Groeger

Page 2: Chapter 5 Exploring Data: Distributions

Outline
1. What is Statistics?
2. Data
3. Distributions
4. Histograms
5. Stemplots
6. Mean, Median, and Quartiles
7. Standard Deviation and Variance
8. Normal Distribution
9. Extensions and Applications
10. Discussion

Page 3: Chapter 5 Exploring Data: Distributions

What is Statistics?
• “Statistics is the science of collecting, organizing and interpreting data.”
• Statistical inference is drawing conclusions from data.

Page 4: Chapter 5 Exploring Data: Distributions

Data
• Data is information about an individual or a group of individuals (a population).
• “A variable is any characteristic of an individual.”

Name   Height   Weight
John   71 in.   160 lbs
Bob    67 in.   150 lbs
Jane   64 in.   130 lbs
Fred   78 in.   180 lbs

Page 5: Chapter 5 Exploring Data: Distributions

Distribution
• “The distribution of a variable tells us what values the variable takes and how often it takes these values.”
• Graphical representations of data make seeing patterns easier.

Page 6: Chapter 5 Exploring Data: Distributions

Histograms

[Figure: histogram of Weight (lbs), classes 130-149, 150-169, 170-189; vertical axis 0 to 3]

[Figure: histogram of 100 Die Rolls, outcomes 1 to 6; vertical axis 0 to 25]

Page 7: Chapter 5 Exploring Data: Distributions

Making a Histogram
1. Define a set of equally sized classes.
2. Determine the number of individuals in each class.
3. Draw the histogram.

Name   Height   Weight
John   71 in.   160 lbs
Bob    67 in.   150 lbs
Jane   64 in.   130 lbs
Fred   78 in.   180 lbs

[Figure: histogram of Height (in.), classes 60-64, 65-69, 70-74, 75-79; vertical axis 0 to 2]
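The three steps above can be sketched in Python for the four heights in the table. This is a minimal illustration, not part of the original slides; the class width of 5 inches matches the classes shown on the slide, and the variable names are chosen here for illustration only.

```python
# Heights (in inches) from the table on this slide.
heights = [71, 67, 64, 78]

# Step 1: define equally sized classes, each 5 inches wide
# (60-64, 65-69, 70-74, 75-79), matching the slide's histogram.
class_width = 5
classes = [(low, low + class_width - 1) for low in range(60, 80, class_width)]

# Step 2: count the individuals that fall in each class.
counts = {c: 0 for c in classes}
for h in heights:
    for low, high in classes:
        if low <= h <= high:
            counts[(low, high)] += 1

# Step 3: draw a text histogram, one '#' per individual.
for (low, high), count in counts.items():
    print(f"{low}-{high}: {'#' * count}")
```

With only four individuals, each class ends up holding exactly one height, so every bar has the same length.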

Page 8: Chapter 5 Exploring Data: Distributions

Interpreting Histograms
• Look for patterns, shape, the center, and spread.
• Distributions can be symmetric or skewed.
• An outlier is “an individual value that falls outside the overall pattern.”

[Figure: histogram of Height (in.), horizontal axis 49 to 89 in steps of 5; vertical axis 0 to 9]

Page 9: Chapter 5 Exploring Data: Distributions

Stemplots
• 30 Test Scores (41, 52, 58, 63, 64, 65, 68, 70, 71, 71, 72, 75, 79, 82, 82, 83, 84, 85, 88, 89, 89, 90, 91, 92, 94, 98, 99, 100, 100, 100)
• In this stemplot the left column (the stem) represents the “tens place” of each test score and the right column (the leaf) represents the “ones place”.
• Stemplots can be easier to read and more detailed than histograms for small amounts of data.

Test Scores
Stem | Leaf
   0 |
   1 |
   2 |
   3 |
   4 | 1
   5 | 2 8
   6 | 3 4 5 8
   7 | 0 1 1 2 5 9
   8 | 2 2 3 4 5 8 9 9
   9 | 0 1 2 4 8 9
  10 | 0 0 0
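A stemplot like this one can be built by splitting each score into its tens part (the stem) and its ones digit (the leaf). The sketch below is an illustration under that assumption; the function name `stemplot` is hypothetical and it starts at the smallest stem that actually occurs rather than at 0.

```python
from collections import defaultdict

# The 30 test scores listed on this slide.
scores = [41, 52, 58, 63, 64, 65, 68, 70, 71, 71, 72, 75, 79, 82, 82,
          83, 84, 85, 88, 89, 89, 90, 91, 92, 94, 98, 99, 100, 100, 100]

def stemplot(values):
    """Return (stem, leaves) rows: stem = value // 10, leaf = value % 10."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    # Include every stem between the smallest and largest, even if empty.
    return [(stem, leaves.get(stem, []))
            for stem in range(min(leaves), max(leaves) + 1)]

rows = stemplot(scores)
for stem, leaf_list in rows:
    print(f"{stem:>2} | {' '.join(str(d) for d in leaf_list)}")
```

Because every leaf is kept, the original scores can be read back from the plot, which is the extra detail a histogram loses.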

Page 10: Chapter 5 Exploring Data: Distributions

Describing the Center: Mean
• The mean of a set of data is the sum of the data divided by the number of data points.
• Mean = $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
• Example: Heights (64, 67, 71, 78)
• Mean = (64 + 67 + 71 + 78)/4 = 280/4 = 70
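The mean calculation for the four heights can be checked with a few lines of Python. This is only a sketch of the definition on this slide; the function name `mean` is chosen here for illustration.

```python
def mean(values):
    """Sum of the data divided by the number of data points."""
    return sum(values) / len(values)

heights = [64, 67, 71, 78]
height_mean = mean(heights)
print(height_mean)  # 280 / 4 = 70.0
```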

Page 11: Chapter 5 Exploring Data: Distributions

Describing the Center: Median
• “The median is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger.”
• Finding the median:
1. Arrange the data in order from smallest to largest.
2. If the number of data points (n) is odd: median = entry (n+1)/2.
3. If n is even: median = the average of entries n/2 and (n/2)+1.
• Example: 30 Test Scores (41, 52, 58, 63, 64, 65, 68, 70, 71, 71, 72, 75, 79, 82, 82, 83, 84, 85, 88, 89, 89, 90, 91, 92, 94, 98, 99, 100, 100, 100)
• Median = Average(82, 83) = 82.5 (entries 15 and 16)
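The procedure for finding the median translates directly into code. The sketch below is illustrative; note that Python lists are indexed from 0, so the 1-based entries n/2 and (n/2)+1 become indices n/2 - 1 and n/2.

```python
def median(values):
    """Median by the sort-and-pick-the-middle procedure."""
    data = sorted(values)          # Step 1: order from smallest to largest
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]           # odd n: entry (n+1)/2, 1-based
    # even n: average of entries n/2 and (n/2)+1, 1-based
    return (data[mid - 1] + data[mid]) / 2

scores = [41, 52, 58, 63, 64, 65, 68, 70, 71, 71, 72, 75, 79, 82, 82,
          83, 84, 85, 88, 89, 89, 90, 91, 92, 94, 98, 99, 100, 100, 100]
score_median = median(scores)
print(score_median)  # average of the 15th and 16th scores, 82 and 83
```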

Page 12: Chapter 5 Exploring Data: Distributions

Describing Spread: Quartiles
• Quartiles divide a data set into four pieces, each containing about one quarter of the data points.
• Finding the quartiles of a data set:
1. Find the median of the set. This is the halfway point (1/2), which is the 2nd quartile (2/4).
2. Take all of the data points smaller than the median and find their median. This is the 1st quartile.
3. Take all of the data points larger than the median and find their median. This is the 3rd quartile.

Page 13: Chapter 5 Exploring Data: Distributions

Five Number Summary
• The five number summary of a distribution is the minimum, the 3 quartiles, and the maximum written in order.
• Example: 30 Test Scores (41, 52, 58, 63, 64, 65, 68, 70, 71, 71, 72, 75, 79, 82, 82, 83, 84, 85, 88, 89, 89, 90, 91, 92, 94, 98, 99, 100, 100, 100)
• Minimum = 41, 1st Quartile = 70, Median = 2nd Quartile = 82.5, 3rd Quartile = 91, Maximum = 100
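The five number summary for the test scores can be reproduced with the median-of-halves method for quartiles. The sketch below makes one assumption worth flagging: the lower and upper halves are taken by position in the sorted list, and the middle entry is left out of both halves when n is odd. Statistical software often uses other quartile conventions and may give slightly different values. The function names are illustrative.

```python
def median(values):
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); assumes at least two data points."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                  # points below the median
    upper = data[half:] if n % 2 == 0 else data[half + 1:]  # points above it
    return (data[0], median(lower), median(data), median(upper), data[-1])

scores = [41, 52, 58, 63, 64, 65, 68, 70, 71, 71, 72, 75, 79, 82, 82,
          83, 84, 85, 88, 89, 89, 90, 91, 92, 94, 98, 99, 100, 100, 100]
summary = five_number_summary(scores)
print(summary)  # (41, 70, 82.5, 91, 100)
```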

Page 14: Chapter 5 Exploring Data: Distributions

Boxplots
• “A boxplot is a graph of the five number summary.”

[Figure: boxplot of Test Scores, vertical axis 0 to 100, with the maximum, 3rd quartile, median, and 1st quartile labeled]

Page 15: Chapter 5 Exploring Data: Distributions

Practice
• Make a boxplot for the following set of monthly S&P 500 returns (-3.5%, -0.6%, 4.8%, 1.1%, -8.6%, -1.0%, 1.2%, -9.1%, -16.9%, -7.5%, 0.8%, -8.6%, -11.0%, 8.5%, 9.4%, 5.3%, 0.0%, 7.4%, 3.4%, 3.6%, -2.0%, 5.7%, 1.8%)

• Minimum: -16.9%
• 1st Quartile: -7.5%
• Median: 0.8%
• 3rd Quartile: 4.8%
• Maximum: 9.4%

[Figure: boxplot of the monthly returns, vertical axis -20.0% to 10.0%]

Page 16: Chapter 5 Exploring Data: Distributions

Describing Spread: Standard Deviation & Variance
• “The variance ($s^2$) of a set of observations is an average of the squares of the deviations of the observations from their mean.”
• “The standard deviation (s) is the square root of the variance.”
• Note: Standard deviation is often calculated using n-1 as the denominator instead of n. This is called Bessel’s correction, which corrects for bias when the variance is estimated from a sample.

$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$

Page 17: Chapter 5 Exploring Data: Distributions

Standard Deviation Example
• Weights in lbs: (130, 150, 160, 180)
• Mean = 155 lbs
• Variance = $s^2$ = ((130-155)^2 + (150-155)^2 + (160-155)^2 + (180-155)^2) / (4-1) = 433.33
• Standard deviation = s = (433.33)^(1/2) = 20.82 lbs
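The worked example can be verified in Python using the n-1 denominator. This is a minimal sketch of the calculation on this slide; the function name `sample_variance` is chosen here for illustration.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

weights = [130, 150, 160, 180]
variance = sample_variance(weights)   # (625 + 25 + 25 + 625) / 3
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))  # 433.33 20.82
```

Dividing by n instead of n-1 would give 1300/4 = 325, a noticeably smaller variance for a data set this small.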

Page 18: Chapter 5 Exploring Data: Distributions

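Normal Distributions
• A normal curve is the graph of a normal distribution, which is one of many types of distributions.
• Many data sets, including the height of humans, roughly follow a normal distribution.
• 68-95-99.7 rule: in a normal distribution, about 68% of observations fall within one standard deviation of the mean, 95% within two, and 99.7% within three.

[Figure: A Normal Curve]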
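The 68-95-99.7 rule gives the approximate share of a normal distribution that lies within 1, 2, and 3 standard deviations of the mean. As a quick check, not taken from the slides, that share can be computed from the error function in Python's standard library; the function name `prob_within` is illustrative.

```python
import math

def prob_within(k):
    """Probability that a normal value lies within k standard deviations
    of the mean, computed as erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * prob_within(k):.2f}%")
```

The computed percentages round to the 68, 95, and 99.7 figures named in the rule.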

Page 19: Chapter 5 Exploring Data: Distributions

Extensions
• Other distributions
  ▫ Uniform, Exponential, Gamma
• Regression analysis and fitting a trend line
• Other statistics
  ▫ Geometric mean, Mode, Kurtosis

Page 20: Chapter 5 Exploring Data: Distributions

Applications
• Manufacturing
• Insurance
• Investment/Banking
• Marketing
• Biology
• Business Management
• The Census

Page 21: Chapter 5 Exploring Data: Distributions

Trivia
• Abraham Wald (1902-1950): Where should extra armor be added to WWII combat aircraft?
• 1999 Mars Climate Orbiter crash
• 22% of American high school students reported that they smoke, but only 9.7% said that they smoked on 20 of the past 30 days.

Page 22: Chapter 5 Exploring Data: Distributions

Discussion
• Questions?
• Can you think of other extensions or applications?
• How can you use statistics in everyday life?
• Homework: (7th edition) #9, 30a-b