Page 1:

Stats Review Chapters 3-4

Created by Teri Johnson Math Coordinator, Mary Stangler Center for Academic Success

Examples are taken from Statistics 4 E by Michael Sullivan, III

And the corresponding Test Generator from Pearson

Revised 12/13

Page 2:

Note:

This review is composed of questions from the textbook and the test generator. It is meant to highlight basic concepts from the course and does not cover all concepts presented by your instructor. Refer back to your notes, unit objectives, handouts, etc. to further prepare for your exam. A copy of this review can be found at www.sctcc.edu/cas.

The final answers are displayed in red and the chapter/section number is in the corner.

Page 3:

Find the Mean, Median, Mode

Data Set: 71, 74, 67, 64, 72, 71, 65, 66, 69, 70

Mean: add up all the numbers and divide by how many numbers there are

(71 + 74 + 67 + 64 + 72 + 71 + 65 + 66 + 69 + 70) / 10 = 689 / 10 = 68.9

Median: Arrange the numbers from smallest to largest, then find the middle number

64, 65, 66, 67, 69, 70, 71, 71, 72, 74

With ten values, the middle falls between the 5th and 6th values, 69 and 70. The mean of these two numbers is the median: (69 + 70) / 2 = 69.5.

Mode: The most repeated number (can have none, one, or more than one)

The mode is 71.

3.1
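As a quick check on the three measures of center, here is a minimal Python sketch using the standard `statistics` module. The variable names are illustrative only and are not part of the review.

```python
from statistics import mean, median, multimode

data = [71, 74, 67, 64, 72, 71, 65, 66, 69, 70]

data_mean = mean(data)        # sum of the values divided by how many there are
data_median = median(data)    # averages the two middle values when the count is even
data_modes = multimode(data)  # list of the most frequent value(s)

print(data_mean, data_median, data_modes)
```

Note that `multimode` returns every value when nothing repeats, whereas the slide's convention would call that "no mode".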

Page 4:

Find the Range and Sample Standard Deviation

Data Set: 71, 74, 67, 64, 72, 71, 65, 66, 69,70

Range: The highest number – the smallest number

74-64=10

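Sample Standard Deviation: Use Excel or a calculator

You get s = 3.28. By hand, the squared deviations from the mean of 68.9 add up to 96.9, so s = √(96.9 / (10 − 1)) = √10.767 ≈ 3.28.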
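If Excel or a calculator is not at hand, the same two numbers can be reproduced with a short Python sketch. It assumes the standard `statistics.stdev`, which divides by n − 1 (the sample formula); the variable names are made up for illustration.

```python
from statistics import stdev

data = [71, 74, 67, 64, 72, 71, 65, 66, 69, 70]

data_range = max(data) - min(data)  # highest value minus smallest value
sample_sd = stdev(data)             # sample standard deviation (n - 1 in the denominator)

print(data_range, round(sample_sd, 2))
```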

3.2

Page 5:

Find the Five-Number Summary: Part 1 of 2

Data Set: 64, 65, 66, 67, 69, 70, 71, 71, 72, 74

Min: the smallest number 64

Max: the largest number 74

Median: the middle number 69.5

Q1: Find the median of the numbers between the minimum and the median (do not include the median)

These numbers are 64, 65, 66, 67, 69. The median of these numbers is 66. So Q1 is 66.

3.4

Page 6:

Find the Five-Number Summary: Part 2 of 2

Q3: Find the median of the numbers from the median (not included) to the maximum.

These numbers are 70, 71, 71, 72, 74. The median of these numbers is 71. So Q3 is 71.

The five-number summary is

64 66 69.5 71 74
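The quartile method used on these slides (median of each half, leaving out the middle value when the count is odd) can be sketched in Python as follows. The function name `five_number_summary` is hypothetical, and other tools such as spreadsheets may use a different quartile convention and give slightly different Q1 and Q3.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[half + n % 2:]  # values above it (skips the middle value if n is odd)
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

summary = five_number_summary([71, 74, 67, 64, 72, 71, 65, 66, 69, 70])
print(summary)
```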

3.4

Page 7:

Find the Upper and Lower Fences

Data Set: 71, 74, 67, 64, 72, 71, 65, 66, 69, 70

1) Find the IQR. IQR=Q3-Q1=71-66=5

2) Multiply the IQR by 1.5: 5*1.5=7.5

3) Upper Fence: Add 7.5 to Q3; 7.5+71=78.5

4) Lower Fence: Subtract 7.5 from Q1; 66-7.5=58.5

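Any numbers outside this range (below 58.5 or above 78.5) would be considered outliers. Every value in this data set lies between 64 and 74, so there are no outliers.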
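The four fence steps can be sketched in Python like this. The quartiles are taken as given from the five-number summary (Q1 = 66, Q3 = 71), and the variable names are illustrative.

```python
data = [71, 74, 67, 64, 72, 71, 65, 66, 69, 70]
q1, q3 = 66, 71  # quartiles from the five-number summary

iqr = q3 - q1                   # step 1: interquartile range
step = 1.5 * iqr                # step 2: 1.5 times the IQR
upper_fence = q3 + step         # step 3
lower_fence = q1 - step         # step 4

# Values beyond either fence are flagged as outliers
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```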

3.4

Page 8:

Construct a Boxplot

Construct a box plot given the five-number summary: 64 66 69.5 71 74

[Figure: boxplot with whiskers from 64 to 74, box from 66 to 71, and median line at 69.5]

3.5

Page 9:

Relationship between Mean, Median and Shape

Match the shape to the relationship between mean and median.

Choices: Mean > Median, Mean = Median, Mean < Median

Skewed left: Mean < Median

Symmetric: Mean = Median

Skewed right: Mean > Median

If data is skewed, the median is more representative of the typical observation.

3.1

Page 10:

Empirical Rule to Find the Probability: Part 1 of 2

At a tennis tournament, a statistician keeps track of every serve. She reported that the mean serve speed of a particular player was 104 mph and the standard deviation of the serve speeds was 8 mph. Assume that the statistician also gave us the information that the distribution of the serve speeds was bell shaped. Using the Empirical Rule, what proportion of the player's serves are expected to be between 112 mph and 120 mph?

3.2

Page 11:

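Empirical Rule to Find the Probability: Part 2 of 2

Start by drawing the bell curve, centered at the mean of 104 mph and marked off in steps of one standard deviation (8 mph). On this scale, 112 mph is one standard deviation above the mean and 120 mph is two standard deviations above it.

Compare this to the empirical rule curve on page 149, or find the percents as described on page 149. About 68% of serves fall within one standard deviation of the mean and about 95% within two, so the band between one and two standard deviations above the mean holds (95% − 68%) / 2 = 13.5% of serves. The probability is 13.5%, or .135.

*You could also find the z-scores, then find the probability.

3.2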
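The reasoning can be sketched numerically in Python. The 68% and 95% figures are the Empirical Rule approximations, not exact normal-curve areas, and the variable names are illustrative.

```python
mean_speed, sd = 104, 8
within_1_sd = 0.68  # Empirical Rule: share within 1 SD of the mean
within_2_sd = 0.95  # Empirical Rule: share within 2 SDs of the mean

# z-scores locate the endpoints in standard-deviation units
z_low = (112 - mean_speed) / sd
z_high = (120 - mean_speed) / sd

# The band from +1 SD to +2 SD is half of what the 2-SD range adds to the 1-SD range
proportion = (within_2_sd - within_1_sd) / 2
print(z_low, z_high, round(proportion, 3))
```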

Page 12:

Find the Mean: Part 1 of 2

Days    Frequency
1-2     2
3-4     21
5-6     20
7-8     10
9-10    30

Step 1: Find the midpoint of each class (days)

Days    Class Midpoint, xi     Frequency, fi
1-2     (1 + 2)/2 = 1.5        2
3-4     3.5                    21
5-6     5.5                    20
7-8     7.5                    10
9-10    9.5                    30

3.3

Page 13:

Find the Mean: Part 2 of 2

Step 2: Multiply the class midpoint by frequency

Days    Class Midpoint, xi     Frequency, fi    xi·fi
1-2     (1 + 2)/2 = 1.5        2                1.5*2 = 3
3-4     3.5                    21               73.5
5-6     5.5                    20               110
7-8     7.5                    10               75
9-10    9.5                    30               285

Step 3: Find the sum of the xi·fi column: 3 + 73.5 + 110 + 75 + 285 = 546.5

Step 4: Find the sum of the frequency column: 2 + 21 + 20 + 10 + 30 = 83

Step 5: Divide the number from step 3 by the number from step 4: 546.5/83 = 6.6

The mean is 6.6.
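Steps 1 through 5 can be sketched in Python as follows. The `(lower, upper, frequency)` tuple layout is an assumption made for this illustration only.

```python
# Each class as (lower limit, upper limit, frequency)
classes = [(1, 2, 2), (3, 4, 21), (5, 6, 20), (7, 8, 10), (9, 10, 30)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]  # step 1
freqs = [f for _, _, f in classes]

sum_xf = sum(x * f for x, f in zip(midpoints, freqs))  # steps 2 and 3
n = sum(freqs)                                         # step 4
grouped_mean = sum_xf / n                              # step 5

print(sum_xf, n, round(grouped_mean, 1))
```

Because every value in a class is treated as if it sat at the midpoint, the result is an approximation of the mean of the raw data.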

3.3

Page 14:

Find the Sample Standard Deviation: Part 1 of 2 (from previous problem)

Step 1: Find the mean (see previous 2 slides). Mean = 6.6

Step 2: Complete the following table (the first three columns were done in the previous 2 slides and column 4 is the mean)

Days    Class Midpoint, xi    Frequency, fi    x̄      xi − x̄               (xi − x̄)²·fi
1-2     (1 + 2)/2 = 1.5       2                6.6    1.5 − 6.6 = −5.1     (−5.1)² * 2 = 52.02
3-4     3.5                   21               6.6    −3.1                 201.81
5-6     5.5                   20               6.6    −1.1                 24.2
7-8     7.5                   10               6.6    .9                   8.1
9-10    9.5                   30               6.6    2.9                  252.3

3.3

Page 15:

Find the Sample Standard Deviation: Part 2 of 2 (from previous problem)

The table is the one completed in Part 1. The entries in its (xi − x̄)²·fi column are 52.02, 201.81, 24.2, 8.1, and 252.3.

Step 3: Find the total of the (xi − x̄)²·fi column. Total = 52.02 + 201.81 + 24.2 + 8.1 + 252.3 = 538.43

Step 4: Find the total of the frequency column. Total = 83. Take this number and subtract 1: 83 − 1 = 82

Step 5: Take the number from step 3 and divide by the number from step 4: 538.43/82 = 6.56622

Step 6: Take the square root of the number from step 5: √6.56622 = 2.6

3.3
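A minimal Python sketch of steps 2 through 6 follows. It deliberately plugs in the rounded mean of 6.6, as the worked table does, so the column total matches 538.43; using the unrounded mean changes the total slightly but still rounds to 2.6. Variable names are illustrative.

```python
from math import sqrt

midpoints = [1.5, 3.5, 5.5, 7.5, 9.5]
freqs = [2, 21, 20, 10, 30]
mean_rounded = 6.6  # rounded mean from the previous problem (assumption: matches the table)

# Step 3: total of the (xi - mean)^2 * fi column
sum_sq = sum(f * (x - mean_rounded) ** 2 for x, f in zip(midpoints, freqs))

# Steps 4 and 5: divide by (total frequency - 1)
variance = sum_sq / (sum(freqs) - 1)

# Step 6: square root
grouped_sd = sqrt(variance)
print(round(sum_sq, 2), round(variance, 5), round(grouped_sd, 1))
```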

Page 16:

Correlation Coefficient

Match the Correlation with the Graph

Choices: r=-.6, r=0, r=.99, should not use correlation

[Figure: four scatter plots. Answer labels shown with the plots: r=.999, r=0, r=-.6, and "Should not use correlation"]

4.1

Page 17:

Least-Squares Regression Line: Part 1 of 3

The data are the average one-way commute times (in minutes) for selected students and the number of absences for those students during the term.

a) Find the equation of the regression line for the given data. Round the regression line values to the nearest hundredth.

b) What would be the predicted number of absences if the commute time was 40 minutes? Is this a reasonable question?

c) Interpret the slope

Commute time (x)    Number of absences (y)
72                  3
85                  7
91                  10
90                  10
88                  8
98                  15
75                  4
100                 15
80                  5

4.2

Page 18:

Least-Squares Regression Line: Part 2 of 3

a) Find the regression line

With a calculator: ŷ = .45x − 30.3

By hand: First find the mean of x and y (labeled x̄ and ȳ), the sample standard deviation of x and y (labeled sx and sy), and the correlation (r).

x̄ = 86.556, ȳ = 8.556, sx = 9.593, sy = 4.39, r = .98

Slope: b1 = r·(sy/sx) = .98(4.39/9.593) = .4485, which rounds to .45

Y-intercept: b0 = ȳ − b1·x̄ = 8.556 − .4485(86.556) = −30.3 (carry the unrounded slope here; plugging in the rounded .45 gives about −30.39)

The regression line is ŷ = b1x + b0, so ours is ŷ = .45x − 30.3
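The by-hand route can be sketched in Python with the standard `statistics` module. The correlation is computed here from its definition (sum of products of deviations divided by (n − 1)·sx·sy); the variable names are illustrative, not from the review.

```python
from statistics import mean, stdev

x = [72, 85, 91, 90, 88, 98, 75, 100, 80]  # commute time (minutes)
y = [3, 7, 10, 10, 8, 15, 4, 15, 5]        # number of absences

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations
n = len(x)

# Linear correlation coefficient from its definition
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

b1 = r * s_y / s_x       # slope
b0 = y_bar - b1 * x_bar  # y-intercept, using the unrounded slope

print(round(r, 2), round(b1, 2), round(b0, 1))
```

Keeping full precision until the last step avoids the drift that comes from reusing a rounded slope in the intercept calculation.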

4.2

Page 19:

Least-Squares Regression Line: Part 3 of 3

b) What would be the predicted number of absences if the commute time was 40 minutes? Is this a reasonable question?

Time is 40 minutes, or x = 40. Put this value into our least-squares regression line: ŷ = .45(40) − 30.3 = −12.3. The line predicts −12.3 absences for a 40-minute commute, which is impossible. This is not a reasonable question since 40 is outside the scope of the model (i.e. 40 is not within the given range of x values, 72 to 100).

c) Interpret the slope

The slope is .45. This means that for every additional minute of commute time, the predicted number of absences increases by .45, on average.
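The prediction and the scope check can be sketched together in Python. The helper names `predict` and `in_scope` are hypothetical, and the coefficients are the rounded ones from the regression line above.

```python
x = [72, 85, 91, 90, 88, 98, 75, 100, 80]  # observed commute times (minutes)
b1, b0 = 0.45, -30.3                       # rounded slope and intercept

def predict(commute_minutes):
    """Predicted number of absences from the least-squares line."""
    return b1 * commute_minutes + b0

def in_scope(commute_minutes):
    """True only if the value lies within the observed range of x."""
    return min(x) <= commute_minutes <= max(x)

prediction = predict(40)
print(round(prediction, 1), in_scope(40))
```

Checking `in_scope` before trusting `predict` is one way to guard against extrapolating beyond the data.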

4.2

Page 20:

Sum of Squared Residuals: Part 1 of 2

Find the sum of squared residuals

Step 1: Find the least-squares regression line (see previous slides)

Step 2: Find the predicted y values (ŷ) for each x

Commute time (x)    Number of absences (y)    Predicted ŷ
72                  3                         .45(72) − 30.3 = 2.1
85                  7                         7.95
91                  10                        10.65
90                  10                        10.2
88                  8                         9.3
98                  15                        13.8
75                  4                         3.45
100                 15                        14.7
80                  5                         5.7

4.3

Page 21:

Sum of Squared Residuals: Part 2 of 2

Step 3: Calculate the residuals: observed − predicted, or y − ŷ

Step 4: Calculate the residuals squared: (observed − predicted)², or (y − ŷ)²

Step 5: Find the sum of the numbers in the column from step 4. This is the sum of squared residuals, which equals 6.1875.

Commute time (x)    Number of absences (y)    Predicted ŷ    Step 3: y − ŷ    Step 4: (y − ŷ)²
72                  3                         2.1            .9               .81
85                  7                         7.95           −.95             .9025
91                  10                        10.65          −.65             .4225
90                  10                        10.2           −.2              .04
88                  8                         9.3            −1.3             1.69
98                  15                        13.8           1.2              1.44
75                  4                         3.45           .55              .3025
100                 15                        14.7           .3               .09
80                  5                         5.7            −.7              .49
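Steps 2 through 5 can be sketched in Python as follows, assuming the rounded line ŷ = .45x − 30.3 from the earlier slides. Variable names are illustrative.

```python
x = [72, 85, 91, 90, 88, 98, 75, 100, 80]  # commute time (minutes)
y = [3, 7, 10, 10, 8, 15, 4, 15, 5]        # number of absences

predicted = [0.45 * xi - 30.3 for xi in x]                # step 2
residuals = [yi - pi for yi, pi in zip(y, predicted)]     # step 3: observed - predicted
squared = [e ** 2 for e in residuals]                     # step 4
sum_squared_residuals = sum(squared)                      # step 5

print(round(sum_squared_residuals, 4))
```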

4.3

Page 22:

Coefficient of Determination (R²)

If the coefficient of determination (R²) is 86.44% and the data shows a negative association, what is the linear correlation coefficient (r)?

|r| = √R² = √.8644 = .9297

Since the association is negative, r = −.9297.

Interpret R² = 86.44%

86.44% of the variability in y (the response variable) is explained by the least-squares regression line.
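The sign rule can be made explicit in a few lines of Python. This is only a sketch of the arithmetic; the `association` flag is a made-up variable standing in for what the scatter plot shows.

```python
from math import sqrt

r_squared = 0.8644
association = "negative"  # direction read from the data, not from R² itself

magnitude = sqrt(r_squared)  # R² only gives the size of r
r = -magnitude if association == "negative" else magnitude

print(round(r, 4))
```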

4.3

Page 23:

Residual Plots

• What does the residual plot to the right suggest?

– There is an outlier

• Removing the outlier, what does the residual plot suggest?

– No pattern, linear model is appropriate

4.3

[Figure: residual plot; horizontal axis from 0 to 10, vertical axis (residuals) from −14 to 6]

Page 24:

Contingency Tables: Part 1 of 3

Is there an association between party affiliation and gender? The following table represents the gender and party affiliation of registered voters, based on a random sample of 802 adults.

               Female    Male
Republican     105       115
Democratic     150       103
Independent    150       179

a) Construct a frequency marginal distribution

b) Construct a relative frequency marginal distribution

c) Construct a conditional distribution of party affiliation by gender

d) Is gender associated with party affiliation? If so, how?

4.4

Page 25:

Contingency Tables: Part 2 of 3

a) Construct a frequency marginal distribution

Party                                     Female                 Male    Frequency marginal distribution
Republican                                105                    115     105 + 115 = 220
Democratic                                150                    103     253
Independent                               150                    179     329
Gender frequency marginal distribution    105 + 150 + 150 = 405  397     802

To do: Find the total for each row and column

b) Construct a relative frequency marginal distribution

Party                                              Female            Male    Relative frequency marginal distribution
Republican                                         105               115     220/802 = .274
Democratic                                         150               103     .315
Independent                                        150               179     .41
Gender relative frequency marginal distribution    405/802 = .505    .495    1

To do: Divide the row/column total by the sample size
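Both marginal distributions can be sketched in Python with plain dictionaries. The nested-dictionary layout is an assumption made for this illustration only.

```python
# Counts as table[party][gender]
table = {
    "Republican": {"Female": 105, "Male": 115},
    "Democratic": {"Female": 150, "Male": 103},
    "Independent": {"Female": 150, "Male": 179},
}

# a) Frequency marginal distributions: row and column totals
row_totals = {party: sum(counts.values()) for party, counts in table.items()}
col_totals = {g: sum(table[p][g] for p in table) for g in ("Female", "Male")}
n = sum(row_totals.values())  # sample size

# b) Relative frequency marginal distributions: totals divided by the sample size
row_rel = {party: total / n for party, total in row_totals.items()}
col_rel = {g: total / n for g, total in col_totals.items()}

print(row_totals, col_totals, n)
```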

4.4

Page 26:

Contingency Tables: Part 3 of 3

c) Construct a conditional distribution of party affiliation by gender

Party          Female             Male
Republican     105/405 = .259     115/397 = .290
Democratic     .370               .259
Independent    .370               .451
Total          1                  1

To do: Divide each cell by its column total

d) Is gender associated with party affiliation? If so, how?

Yes; compared with females, males are more likely to be Independents (.451 vs. .370) and less likely to be Democrats (.259 vs. .370).
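The conditional distribution of party by gender can be sketched in Python as follows. The dictionary layout is an illustrative assumption, and the counts are repeated so the block runs on its own.

```python
# Counts as table[party][gender]
table = {
    "Republican": {"Female": 105, "Male": 115},
    "Democratic": {"Female": 150, "Male": 103},
    "Independent": {"Female": 150, "Male": 179},
}

genders = ("Female", "Male")
col_totals = {g: sum(table[p][g] for p in table) for g in genders}

# Divide each cell by its column (gender) total
conditional = {
    g: {party: table[party][g] / col_totals[g] for party in table}
    for g in genders
}

for g in genders:
    print(g, {party: round(share, 3) for party, share in conditional[g].items()})
```

If the columns of shares looked roughly the same for both genders, that would suggest no association; here they differ noticeably for Democratic and Independent.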

4.4