Statistics trinity college

Statistics Stacy Cater

Transcript of Statistics trinity college

Page 1: Statistics trinity college

Statistics

Stacy Cater

Page 2: Statistics trinity college

Question 1

Page 3: Statistics trinity college

11 31 18 13 11 3 1 1 6 1 4

4 - 6 hours

6.5 hours

Page 4: Statistics trinity college

Histogram

A histogram is chosen to represent “continuous numerical data”, that is, data representing a quantity whose values can fall anywhere in a certain range.

Page 5: Statistics trinity college
Page 6: Statistics trinity college

Distribution of Data

Page 7: Statistics trinity college

Positively Skewed Distribution

Also known as a right-skewed distribution.

Page 8: Statistics trinity college

Negatively Skewed Distribution

Also known as a left-skewed distribution.

Page 9: Statistics trinity college

Symmetric Distribution

A distribution is symmetric if the values smaller and larger than its midpoint are mirror images of each other.

Page 10: Statistics trinity college

Question 2

Page 11: Statistics trinity college

Standard Deviation

Page 12: Statistics trinity college

Two classes took a recent test. There were 10 students in each class, and each class had an average score of 81.5%.

Page 13: Statistics trinity college

Since the averages are the same, can we assume that the students in both classes all did pretty much the same on the exam?

Page 14: Statistics trinity college

The answer is… No.

The average (mean) does not tell us anything about the distribution or variation in the grades.

Page 15: Statistics trinity college

Here are Dot-Plots of the grades in each class:

Page 16: Statistics trinity college

Mean

Page 17: Statistics trinity college

So, we need to come up with some way of measuring not just the average, but also the spread of the distribution of our data.

Page 18: Statistics trinity college

Why not just give an average and the range of data (the highest and lowest values) to describe the distribution of the data?

Page 19: Statistics trinity college

Well, for example, let's say that for a set of data the average is 17.95 and the range is 23.

But what if the data looked like this:

Page 20: Statistics trinity college

Here is the average

And here is the range

But really, most of the numbers are in this area, and are not evenly distributed throughout the range.

Page 21: Statistics trinity college

The Standard Deviation is a number that measures how far, on average, the numbers in a set of data are from their mean.

Page 22: Statistics trinity college

If the Standard Deviation is large, it means the numbers are spread out from their mean.

If the Standard Deviation is small, it means the numbers are close to their mean.
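Written as a formula, the idea looks like this. This is a sketch of the sample form, which divides by n - 1 as the worked example in this deck does; the symbols s (standard deviation), x_i (each value), x̄ (the mean) and n (the number of values) are notation assumed here, not taken from the slides.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```

A large s means the values x_i sit far from the mean on the whole; a small s means they cluster near it.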

Page 23: Statistics trinity college

Here are the scores on the math test for Team A:

72, 76, 80, 80, 81, 83, 84, 85, 85, 89

Average: 81.5

Page 24: Statistics trinity college

The Standard Deviation measures how far away each number in a set of data is from their mean.

For example, start with the lowest score, 72. How far away is 72 from the mean of 81.5?

72 - 81.5 = -9.5

Page 25: Statistics trinity college

Or, start with the highest score, 89. How far away is 89 from the mean of 81.5?

89 - 81.5 = 7.5

Page 26: Statistics trinity college

So, the first step to finding the Standard Deviation is to find all the distances from the mean.

Score   Distance from Mean
72      -9.5
76
80
80
81
83
84
85
85
89      7.5

Page 27: Statistics trinity college

So, the first step to finding the Standard Deviation is to find all the distances from the mean.

Score   Distance from Mean
72      -9.5
76      -5.5
80      -1.5
80      -1.5
81      -0.5
83      1.5
84      2.5
85      3.5
85      3.5
89      7.5

Page 28: Statistics trinity college

Next, you need to square each of the distances to turn them all into positive numbers.

Score   Distance from Mean   Distance Squared
72      -9.5                 90.25
76      -5.5                 30.25
80      -1.5
80      -1.5
81      -0.5
83      1.5
84      2.5
85      3.5
85      3.5
89      7.5

Page 29: Statistics trinity college

Next, you need to square each of the distances to turn them all into positive numbers.

Score   Distance from Mean   Distance Squared
72      -9.5                 90.25
76      -5.5                 30.25
80      -1.5                 2.25
80      -1.5                 2.25
81      -0.5                 0.25
83      1.5                  2.25
84      2.5                  6.25
85      3.5                  12.25
85      3.5                  12.25
89      7.5                  56.25

Page 30: Statistics trinity college

Add up all of the squared distances.

Sum: 90.25 + 30.25 + 2.25 + 2.25 + 0.25 + 2.25 + 6.25 + 12.25 + 12.25 + 56.25 = 214.5

Page 31: Statistics trinity college

Divide by (n - 1), where n is the number of values you have.

Sum / (n - 1) = 214.5 / (10 - 1) ≈ 23.8

Page 32: Statistics trinity college

Finally, take the square root of the average squared distance.

√(214.5 / (10 - 1)) ≈ √23.8 ≈ 4.88

Page 33: Statistics trinity college

This is the Standard Deviation.

Standard Deviation = √(214.5 / (10 - 1)) ≈ 4.88
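The steps on the preceding slides can be traced in a few lines of Python. This is only a minimal sketch using the Team A scores; the function name `sample_standard_deviation` is made up for illustration and is not part of the original slides.

```python
import math


def sample_standard_deviation(scores):
    """Follow the slide steps: distances, squares, sum, divide by n - 1, square root."""
    n = len(scores)
    mean = sum(scores) / n
    distances = [x - mean for x in scores]   # distance of each score from the mean
    squared = [d ** 2 for d in distances]    # squaring makes every distance positive
    total = sum(squared)                     # sum of the squared distances
    variance = total / (n - 1)               # divide by (n - 1)
    return math.sqrt(variance)               # the square root is the standard deviation


team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
print(round(sample_standard_deviation(team_a), 2))  # 4.88
```

Each intermediate list (`distances`, `squared`) corresponds to one column of the table built up on the slides, so printing them is a quick way to check hand calculations.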

Page 34: Statistics trinity college

Now find the Standard Deviation for the other class grades.

Score   Distance from Mean   Distance Squared
57      -24.5                600.25
65      -16.5                272.25
83      1.5                  2.25
94      12.5                 156.25
95      13.5                 182.25
96      14.5                 210.25
98      16.5                 272.25
93      11.5                 132.25
71      -10.5                110.25
63      -18.5                342.25

Sum: 2280.5

2280.5 / (10 - 1) ≈ 253.4

√253.4 ≈ 15.92

Page 35: Statistics trinity college

Now, let's compare the two classes again.

                      Team A   Team B
Average on the Test   81.5     81.5
Standard Deviation    4.88     15.92
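The comparison above can be checked with Python's standard `statistics` module, whose `stdev` function computes the sample standard deviation (dividing by n - 1). This is a sketch for checking the slide values, not part of the original deck; the variable names are chosen here for illustration.

```python
import statistics

# Scores as listed on the slides for each class.
team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
team_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

for name, scores in (("Team A", team_a), ("Team B", team_b)):
    mean = statistics.mean(scores)
    spread = statistics.stdev(scores)  # sample standard deviation (n - 1 in the denominator)
    print(f"{name}: mean = {mean}, standard deviation = {spread:.2f}")
```

Both classes have the same mean, yet Team B's standard deviation is more than three times Team A's, which is exactly the difference in spread that the average alone hides.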

Page 36: Statistics trinity college

You have to be able to calculate the standard deviation using your calculator! Try!

Page 37: Statistics trinity college

Try using the scores for Team A:

72, 76, 80, 80, 81, 83, 84, 85, 85, 89

ANS: 4.88

Page 38: Statistics trinity college

Note:

Measures of central tendency (mean, mode and median) and variability are known as SUMMARY STATISTICS.

Page 39: Statistics trinity college

Question 3

Page 40: Statistics trinity college

Solution

93.725360

X 3601

= 824 schools

Page 41: Statistics trinity college

Try Some Questions

Page 42: Statistics trinity college

2011 Paper - Q 7 (i)

Page 43: Statistics trinity college

Q7 (ii)

Page 44: Statistics trinity college

Q 7 (b) (i)

Page 45: Statistics trinity college

Q 7 (c)

Page 46: Statistics trinity college
Page 47: Statistics trinity college
Page 48: Statistics trinity college

Say Bye to Univariate & Hello to Bivariate Data

Page 49: Statistics trinity college

Bivariate Data

Each observation is composed of two variables, tied or paired together.

Two-dimensional data

Deals with causes or relationships

The major purpose of bivariate analysis is to determine whether relationships exist.

Page 50: Statistics trinity college

National Institutes of Health (NIH)

Sedentary activities (like TV watching) are associated with an increase in obesity and an increase in the risk of diabetes in women.

Anger expression may be inversely related to the risk of heart attack and stroke. (Those who express anger may have a decreased risk).

Light to moderate drinking reduces the risk of heart disease in men.

Page 51: Statistics trinity college

News Reporters love to tell stories about the latest links!

Such as..

Does having her first baby later in life cause a woman to live longer? (New York Times)

Page 52: Statistics trinity college

‘Count Cricket Chirps to Gauge Temperature’ (Garden Gate)

What you have to do!

1. Find a cricket.
2. Count the number of times it chirps in 15 seconds.
3. Add 40.

You’ve just predicted the temperature in degrees Fahrenheit!

Page 53: Statistics trinity college
Page 54: Statistics trinity college

No. of Chirps in 15 sec Temperature (in degrees Fahrenheit)

18 57

20 60

21 64

23 65

27 68

30 71

34 74

39 77

Table 18-1 Cricket Chirps and Temperature Data (Excerpt)
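To see how the "count the chirps and add 40" rule of thumb fares against Table 18-1, the sketch below applies it to each row. The function name `predict_temperature` is hypothetical, and the data are only the eight rows of the excerpt shown above.

```python
# (chirps in 15 seconds, measured temperature in degrees Fahrenheit) from Table 18-1
observations = [(18, 57), (20, 60), (21, 64), (23, 65),
                (27, 68), (30, 71), (34, 74), (39, 77)]


def predict_temperature(chirps_in_15_seconds):
    """Rule of thumb from the slide: count the chirps in 15 seconds and add 40."""
    return chirps_in_15_seconds + 40


for chirps, measured in observations:
    predicted = predict_temperature(chirps)
    error = predicted - measured
    print(f"{chirps} chirps: predicted {predicted} F, measured {measured} F, error {error:+d}")
```

On these eight rows the rule stays within 3 degrees Fahrenheit of the measured temperature.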

Page 55: Statistics trinity college

Let's see another example!

Page 56: Statistics trinity college

A Press Release by Ohio State University Medical Center

The headline says that...

“aspirin can prevent polyps in colon cancer patients”

Page 57: Statistics trinity college

Raw Data for this Study

ID NO. 22292 GROUP=ASPIRIN DEVELOPED POLYPS=NO

(635 LINES)

Table 18-2 Summary of Aspirin vs. Polyps Study Results

Group % Developing Polyps*

Aspirin 17

Non-aspirin 27

*Total sample size = 635 (approximately half were randomly assigned to each group)

Page 58: Statistics trinity college

Scatter Plots

Bivariate Numerical Data

Two Dimensions

Horizontal dimension (x-axis)

Vertical dimension (y-axis)

Page 59: Statistics trinity college

Scatter Plot of cricket chirps versus outdoor temperature.

Page 60: Statistics trinity college

Interpreting a Scatterplot

You do this by looking for trends in the data as you go from left to right.

Page 61: Statistics trinity college

Positive linear relationship

Proportional relationship

As x increases (moves right one unit), y increases (moves up) a certain amount.

Page 62: Statistics trinity college

Negative linear relationship

Inverse relationship

As x increases, y decreases (moves down) a certain amount.

Page 63: Statistics trinity college

If the data don’t seem to resemble any kind of line (even a vague one) this means that no linear relationship exists.

Page 64: Statistics trinity college

Positive Linear Relationship

As the number of cricket chirps increases, so does the temperature.

Page 65: Statistics trinity college

Example

Age of Car

Value of Car (£)

Page 66: Statistics trinity college

Quantifying the Relationship

Quantify or measure the extent and nature of the relationship.

Page 67: Statistics trinity college

We have already seen how to describe the direction of a linear relationship, BUT you will also have to decide on the STRENGTH of the relationship!!

Introducing the...

Page 68: Statistics trinity college

Correlation Coefficient

Measures the strength and direction of the linear relationship between x and y (or the horizontal and vertical dimensions).

Page 69: Statistics trinity college

Calculating the C.C.

It is represented by the letter r

It has a value between - 1 and 1

You only have to be able to calculate it using your calculator, luckily for you!

Page 70: Statistics trinity college

If r is close to 1, then there is a strong positive correlation between two sets of data.

If r is close to -1, we say there is a strong negative correlation between the two sets.

If r is close to 0, then there is little or no linear correlation between the two sets.

Most statisticians like to see correlations above 0.6 or below -0.6.
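The guidelines above can be turned into a small helper that states both the direction and the strength of a correlation. This is only a sketch: the 0.6 cutoff comes from the slide, while the 0.2 cutoff for "little or no" correlation and the function name `describe_correlation` are assumptions made for illustration.

```python
def describe_correlation(r, strong_cutoff=0.6, none_cutoff=0.2):
    """Describe r in words. strong_cutoff follows the slide; none_cutoff is an assumed value."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if abs(r) < none_cutoff:
        return "little or no linear correlation"
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if abs(r) > strong_cutoff else "weak"
    return f"{strength} {direction}"


for r in (0.99, 0.5, -0.99, -0.5, -0.16):
    print(r, "->", describe_correlation(r))
```

The cutoffs are conventions rather than rules, so they are left as parameters that can be adjusted.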

Page 71: Statistics trinity college

Types of Correlation

Page 72: Statistics trinity college

It is important you state the Direction and the Strength of a Correlation

Correlation Coefficient = 0.99 Correlation coefficient = 0.5

Page 73: Statistics trinity college

A positive correlation means that high values of one variable are associated with high values of a second variable. The relationship between height and weight, between IQ scores and achievement test scores, and between self-concept and grades are examples of positive correlation.

Page 74: Statistics trinity college

Correlation Coefficient = - 0.99 Correlation Coefficient = - 0.5

Page 75: Statistics trinity college

A negative correlation or relationship means that high values of one variable are associated with low values of a second variable. Examples of negative correlations include those between exercise and heart failure, between successful test performance and feelings of incompetence, and between absence from school and school achievement.

Page 76: Statistics trinity college

No CORRELATION

Correlation Coefficient = -0.16

Page 77: Statistics trinity college

Scatter Plot of cricket chirps versus outdoor temperature.

Page 78: Statistics trinity college

Correlation of 0.98!
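In the exam the correlation coefficient comes from a calculator, but it can also be checked in code. The sketch below computes Pearson's r by hand for the eight rows of Table 18-1; since that table is labelled an excerpt, agreement with the slide's value is only shown for these rows. The function name `correlation_coefficient` is chosen here for illustration.

```python
import math

chirps = [18, 20, 21, 23, 27, 30, 34, 39]   # chirps in 15 seconds
temps = [57, 60, 64, 65, 68, 71, 74, 77]    # temperature in degrees Fahrenheit


def correlation_coefficient(xs, ys):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(round(correlation_coefficient(chirps, temps), 2))  # 0.98
```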

Page 79: Statistics trinity college

Correlation versus Causation

Page 80: Statistics trinity college

The amount of fuel burned by a car depends on the size of its engine, since bigger engines burn more petrol. We say there is a CAUSAL RELATIONSHIP between the amount of petrol used and the size of the car's engine.

Page 81: Statistics trinity college

If two variables are found to be either associated or correlated, that doesn’t necessarily mean that a cause-and-effect relationship exists between the two variables.

If we find a statistical relationship between two variables, then we cannot always conclude that one of the variables is the cause of the other, i.e. correlation does not always imply causality.

Page 82: Statistics trinity college
Page 83: Statistics trinity college

For example:

Between 1980 and 2000 there was a large increase in sales of calculators and computers!

There was a strong positive correlation between the sales of computers and the sales of calculators!

Did the increase in sales of calculators cause an increase in the sales of computers?

Page 84: Statistics trinity college

NO!!!!

Production Costs Decreased

Cost of Production was a third variable causing the other two to increase.

We call this third variable a LURKING VARIABLE.

Page 85: Statistics trinity college

Linear Regression

Line of Best Fit

Page 86: Statistics trinity college

After you’ve found a relationship between two variables and you have some way of quantifying this relationship,

you can create a model that allows you to use one variable to predict another.

Page 87: Statistics trinity college

1. Draw a Scatter Plot.
2. Check whether the graph suggests a linear relationship.
3. Calculate the Correlation Coefficient.
4. Find the equation of the Line that best fits the data.

- We draw this by eye, and then find its equation.
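The slides draw the line of best fit by eye. As a cross-check, the sketch below uses a different technique, ordinary least squares, which picks the slope and intercept that minimise the squared vertical distances to the line. It is not the method the slides ask for; the function name `least_squares_line` is an assumption, and the data are the Table 18-1 excerpt.

```python
chirps = [18, 20, 21, 23, 27, 30, 34, 39]   # x: chirps in 15 seconds
temps = [57, 60, 64, 65, 68, 71, 74, 77]    # y: temperature in degrees Fahrenheit


def least_squares_line(xs, ys):
    """Return (m, c) for the least-squares line y = mx + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope
    c = mean_y - m * mean_x    # the line passes through (mean_x, mean_y)
    return m, c


m, c = least_squares_line(chirps, temps)
print(f"y = {m:.2f}x + {c:.2f}")
```

A line drawn by eye will not match these values exactly, but its slope and intercept should land in the same neighbourhood.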

Page 88: Statistics trinity college

Because you have a strong correlation, be it positive or negative, you know that x is correlated with y.

If you know the slope and the y-intercept of that line, then you can plug in a value for x and predict the average value for y.

In other words, you can predict y from x.

You should never do a regression analysis unless you’ve already found a strong correlation (either pos. or neg.) between the two variables!

Page 89: Statistics trinity college
Page 90: Statistics trinity college

Now Calculate Line!

Page 91: Statistics trinity college

Equation: y = mx + c

m = slope = (y2 - y1)/(x2 - x1), where (x1, y1) and (x2, y2) are points on the line of best fit.

Substitute m and one point into y - y1 = m(x - x1).
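The two formulas above translate directly into code. In this sketch the points (20, 60) and (30, 71) are hypothetical: they are two rows of the cricket table, used on the assumption that a by-eye line of best fit happens to pass through them. The function name `line_through` is also made up for illustration.

```python
def line_through(p1, p2):
    """Return (m, c) for the line y = mx + c through two points on the line of best fit."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different x values")
    m = (y2 - y1) / (x2 - x1)   # slope
    c = y1 - m * x1             # from y - y1 = m(x - x1), rearranged to y = mx + (y1 - m*x1)
    return m, c


m, c = line_through((20, 60), (30, 71))   # hypothetical points read off the line
print(f"y = {m:.1f}x + {c:.1f}")
print(f"predicted temperature at 25 chirps: {m * 25 + c:.1f}")
```

Choosing two points that are far apart on the line generally makes the slope less sensitive to small reading errors.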

Page 92: Statistics trinity college
Page 93: Statistics trinity college

0.98

Page 94: Statistics trinity college
Page 95: Statistics trinity college

Let’s Sum up!

Populations and Samples

Types of Sampling

Bias in Sampling

Reliability of Data

Collecting Data

Page 96: Statistics trinity college

Frequency Tables

Stem-and-Leaf Diagram

Back-to-Back Stem-and-Leaf Diagram

Histograms

Distribution of Data

Page 97: Statistics trinity college

Scatter Graph

Correlation

Correlation Coefficient

Causality

Linear Regression

Page 98: Statistics trinity college

2011 paper 2 Q 2

Page 99: Statistics trinity college
Page 100: Statistics trinity college

2013 paper 2 Q 7

Page 101: Statistics trinity college

1st= run 2nd= cycle 3rd=swim

25 mins

3.17 mins

no modal time but modal class.

Page 102: Statistics trinity college
Page 103: Statistics trinity college
Page 104: Statistics trinity college

2012 paper 2 Q 7

Page 105: Statistics trinity college
Page 106: Statistics trinity college