Statistics trinity college

Post on 14-Jul-2015

Statistics

Stacy Cater

Question 1

11 31 18 13 11 3 1 1 6 1 4

4 - 6 hours

6.5 hours

Histogram

A histogram is chosen to represent “continuous numerical data”. That is, data that represent a quantity whose values can be any number in a certain range.

Distribution of Data

Positively Skewed Distribution

Also known as a right-skewed distribution.

Negatively Skewed Distribution

Also known as a left-skewed distribution.

Symmetric Distribution

A distribution is symmetric if the values smaller and larger than its midpoint are mirror images of each other.

Question 2

Standard Deviation

Two classes took a recent test. There were 10 students in each class, and each class had an average score of 81.5%

Since the averages are the same, can we assume that the students in both classes all did pretty much the same on the exam?

The answer is… No.

The average (mean) does not tell us anything about the distribution or variation in the grades.
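As a quick check of this point, here is a minimal Python sketch using the two sets of class scores that appear in the tables of this transcript (labelled Team A and Team B in the comparison slide). The variable names are illustrative only.

```python
from statistics import mean

# Scores for the two classes, copied from the tables in this transcript.
team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
team_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

mean_a = mean(team_a)
mean_b = mean(team_b)

# The range (highest minus lowest) already hints that the spreads differ.
range_a = max(team_a) - min(team_a)
range_b = max(team_b) - min(team_b)

print(mean_a, mean_b)    # both 81.5
print(range_a, range_b)  # 17 and 41
```

The means are identical, yet the scores in the second class are far more spread out.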

Here are Dot-Plots of the grades in each class:

Mean

So, we need to come up with some way of measuring not just the average, but also the spread of the distribution of our data.

Why not just give an average and the range of data (the highest and lowest values) to describe the distribution of the data?

Well, for example, let's say that from a set of data, the average is 17.95 and the range is 23.

But what if the data looked like this:

Here is the average

And here is the range

But really, most of the numbers are in this area, and are not evenly distributed throughout the range.

The Standard Deviation is a number that measures how far, on average, the numbers in a set of data are from their mean.

If the Standard Deviation is large, it means the numbers are spread out from their mean.

If the Standard Deviation is small, it means the numbers are close to their mean.
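In symbols, one common way to write the version of the standard deviation used in these slides (the one that divides by n - 1) is shown below. The symbols are notation assumed for this note, not taken from the slides: x_i is each value, x-bar is the mean, and n is the number of values.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
```

Reading from the inside out: find each distance from the mean, square it, add the squares, divide by n - 1, then take the square root.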

Here are the scores on the math test for Team A:

72, 76, 80, 80, 81, 83, 84, 85, 85, 89

Average: 81.5

The Standard Deviation measures how far, on average, the numbers in a set of data are from their mean.

For example, start with the lowest score, 72. How far away is 72 from the mean of 81.5?

72 - 81.5 = -9.5

Or, start with the highest score, 89. How far away is 89 from the mean of 81.5?

89 - 81.5 = 7.5

So, the first step to finding the Standard Deviation is to find all the distances from the mean.

Score   Distance from Mean
72      -9.5
76      -5.5
80      -1.5
80      -1.5
81      -0.5
83      1.5
84      2.5
85      3.5
85      3.5
89      7.5
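The same first step can be sketched in a few lines of Python. This is only an illustration using the Team A scores; the variable names are made up for the example.

```python
# Team A scores from the slide.
team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]

mean_a = sum(team_a) / len(team_a)  # 81.5

# Distance of each score from the mean (negative means below the mean).
distances = [score - mean_a for score in team_a]

for score, distance in zip(team_a, distances):
    print(score, distance)

# The distances always cancel out when added together.
print(sum(distances))  # 0.0
```

Because the positive and negative distances cancel to zero, simply averaging them would tell us nothing about the spread.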

Next, you need to square each of the distances to turn them all into positive numbers.

Score   Distance from Mean   Distances Squared
72      -9.5                 90.25
76      -5.5                 30.25
80      -1.5                 2.25
80      -1.5                 2.25
81      -0.5                 0.25
83      1.5                  2.25
84      2.5                  6.25
85      3.5                  12.25
85      3.5                  12.25
89      7.5                  56.25

Add up all of the squared distances.

90.25 + 30.25 + 2.25 + 2.25 + 0.25 + 2.25 + 6.25 + 12.25 + 12.25 + 56.25 = 214.5

Sum: 214.5
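As a check on the arithmetic, the squaring and adding steps might look like this in Python. The list of distances is copied from the Team A table; the variable names are illustrative.

```python
# Distances from the mean for Team A, as listed in the table.
distances = [-9.5, -5.5, -1.5, -1.5, -0.5, 1.5, 2.5, 3.5, 3.5, 7.5]

# Squaring makes every value positive.
squared = [d ** 2 for d in distances]

total = sum(squared)

print(squared)
print(total)  # 214.5
```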

Divide by (n - 1), where n represents the number of values you have.

214.5 / (10 - 1) = 23.8

Finally, take the square root of the average squared distance.

√23.8 = 4.88

This is the Standard Deviation.
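Putting the steps together, a minimal sketch of the whole calculation could look like the function below. The function name `sample_std_dev` is invented for this example; it follows the slides in dividing by n - 1.

```python
import math


def sample_std_dev(values):
    """Standard deviation computed step by step, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    distances = [v - mean for v in values]   # step 1: distances from the mean
    squared = [d ** 2 for d in distances]    # step 2: square each distance
    total = sum(squared)                     # step 3: add the squares
    variance = total / (n - 1)               # step 4: divide by n - 1
    return math.sqrt(variance)               # step 5: take the square root


team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
sd_a = sample_std_dev(team_a)
print(round(sd_a, 2))  # 4.88
```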

Now find the Standard Deviation for the other class grades.

Score   Distance from Mean   Distances Squared
57      -24.5                600.25
65      -16.5                272.25
83      1.5                  2.25
94      12.5                 156.25
95      13.5                 182.25
96      14.5                 210.25
98      16.5                 272.25
93      11.5                 132.25
71      -10.5                110.25
63      -18.5                342.25

Sum: 2280.5

2280.5 / (10 - 1) = 253.4

√253.4 = 15.92
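For comparison, Python's standard library can do the same calculation in one call. The sketch below assumes `statistics.stdev`, which divides by n - 1 just as the slides do, and applies it to the second class's scores.

```python
from statistics import mean, stdev

# Scores for the other class, from the table above.
team_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

mean_b = mean(team_b)
sd_b = stdev(team_b)  # sample standard deviation (divides by n - 1)

print(mean_b)          # 81.5
print(round(sd_b, 3))  # about 15.918
```

The result is roughly 15.9, more than three times the spread of the first class.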

Now, let's compare the two classes again.

                      Team A   Team B
Average on the Test   81.5     81.5
Standard Deviation    4.88     15.92

You have to be able to calculate standard deviation using your calculator! Try!

Try using the scores for Team A:

72, 76, 80, 80, 81, 83, 84, 85, 85, 89

ANS: 4.88
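If the calculator gives a slightly different answer, it may be using the version that divides by n instead of n - 1. Many calculators offer both results; that is a general observation and not something stated in the slides. The sketch below shows the two versions side by side using Python's `statistics` module.

```python
from statistics import pstdev, stdev

team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]

sample_sd = stdev(team_a)       # divides by n - 1 (the version in these slides)
population_sd = pstdev(team_a)  # divides by n

print(round(sample_sd, 2))      # 4.88
print(round(population_sd, 2))  # 4.63
```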

Note:

Measures of central tendency (mean, mode & median) and variability are known as SUMMARY STATISTICS.
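As an illustration, the summary statistics for Team A could be gathered in one place as follows. This is a sketch using Python's `statistics` module; the dictionary layout is just one possible choice.

```python
from statistics import mean, median, multimode, stdev

team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]

summary = {
    "mean": mean(team_a),
    "median": median(team_a),           # middle of the sorted scores
    "modes": multimode(team_a),         # every value tied for most frequent
    "range": max(team_a) - min(team_a),
    "std_dev": round(stdev(team_a), 2),
}

for name, value in summary.items():
    print(name, value)
```

Note that Team A has two modes, 80 and 85, because each appears twice.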

Question 3

Solution

93.725360

X 3601

= 824 schools

Try Some Questions

2011 Paper - Q 7 (i)

Q7 (ii)

Q 7 (b) (i)

Q 7 (c)

Say Bye to Univariate & Hello to Bivariate Data

Two variables

Tied or paired together

Two-dimensional data

Bivariate Data

Deals with causes or relationships

The major purpose of bivariate analysis is to determine whether relationships exist.

Each observation is composed of..

National Institutes of Health (NIH)

Sedentary activities (like TV watching) are associated with an increase in obesity and an increase in the risk of diabetes in women.

Anger expression may be inversely related to the risk of heart attack and stroke. (Those who express anger may have a decreased risk).

Light to moderate drinking reduces the risk of heart disease in men.

News Reporters love to tell stories about the latest links!

Such as..

Does having her first baby later in life cause a woman to live longer? (New York Times)

‘Count Cricket Chirps to Gauge Temperature’ (Garden Gate)

What you have to do!

1. Find a cricket.

2. Count the number of times it chirps in 15 seconds.

3. Add 40.

You’ve just predicted the temperature in degrees Fahrenheit!

No. of Chirps in 15 sec   Temperature (in degrees Fahrenheit)
18                        57
20                        60
21                        64
23                        65
27                        68
30                        71
34                        74
39                        77

Table 18-1 Cricket Chirps and Temperature Data (Excerpt)
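To see how well the "add 40" rule matches the data in Table 18-1, here is a small Python sketch. The function name `predict_temperature` is made up for this example.

```python
def predict_temperature(chirps_in_15_sec):
    """Rule of thumb from the slide: chirps in 15 seconds plus 40."""
    return chirps_in_15_sec + 40


# (chirps in 15 sec, observed temperature in degrees Fahrenheit) from Table 18-1.
observations = [
    (18, 57), (20, 60), (21, 64), (23, 65),
    (27, 68), (30, 71), (34, 74), (39, 77),
]

# Prediction minus observation for each row.
errors = [predict_temperature(chirps) - temp for chirps, temp in observations]

print(errors)                       # [1, 0, -3, -2, -1, -1, 0, 2]
print(max(abs(e) for e in errors))  # 3
```

For these eight observations the rule is never off by more than 3 degrees.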

Let's see another example!

A Press Release by Ohio State University Medical Center

The headline says that...

“aspirin can prevent polyps in colon cancer patients”

Raw Data for this Study

ID NO. 22292 GROUP=ASPIRIN DEVELOPED POLYPS=NO

(635 LINES)

Table 18-2 Summary of Aspirin vs. Polyps Study Results

Group         % Developing Polyps*
Aspirin       17
Non-aspirin   27

*Total sample size = 635 (approximately half were randomly assigned to each group)

Scatter Plots

Bivariate Numerical Data

Two Dimensions

Horizontal dimension (x-axis)

Vertical dimension (y-axis)

Scatter Plot of cricket chirps versus outdoor temperature.

Interpreting a Scatterplot

You interpret a scatterplot by looking for trends in the data as you go from left to right.

Positive linear relationship

Proportional relationship

As x increases (moves right one unit), y increases (moves up) a certain amount.

Negative linear relationship

Inverse relationship

As x increases, y decreases (moves down) a certain amount.

If the data don’t seem to resemble any kind of line (even a vague one) this means that no linear relationship exists.

Positive Linear Relationship

As the number of cricket chirps increases, so does the temperature.

Example

Age of Car

Value of Car (£)

Quantifying the Relationship

Quantify or measure the extent and nature of the relationship.

We have already seen how to measure the direction of a linear relationship BUT you will also have to decide on the STRENGTH of the relationship!!

Introduce the...

Correlation Coefficient

Measures the strength and direction of the linear relationship between x and y (or the horizontal and vertical dimension).

Calculating the C.C.

It is represented by the letter r

It has a value between - 1 and 1

You only have to be able to calculate it using your calculator, luckily for you!

If r is close to 1, then there is a strong positive correlation between two sets of data.

If r is close to -1, we say there is a strong negative correlation between the two sets.

If r is close to 0, then there is little or no linear correlation between the two sets.

Most statisticians like to see correlations above 0.6 or below -0.6.
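These rules of thumb can be turned into a small helper that states both direction and strength. This is only a sketch: the 0.6 cut-off comes from the slide, but the 0.2 cut-off for "no clear correlation" and the wording of the labels are assumptions made for this example.

```python
def describe_correlation(r, strong=0.6, negligible=0.2):
    """Describe direction and strength of r.

    strong=0.6 follows the slide; negligible=0.2 is an assumed cut-off.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if abs(r) < negligible:
        return "no clear linear correlation"
    strength = "strong" if abs(r) > strong else "weak to moderate"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.99))   # strong positive correlation
print(describe_correlation(-0.5))   # weak to moderate negative correlation
print(describe_correlation(-0.16))  # no clear linear correlation
```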

Types of Correlation

It is important you state the Direction and the Strength of a Correlation

Correlation Coefficient = 0.99 Correlation coefficient = 0.5

A positive correlation means that high values of one variable are associated with high values of a second variable. The relationship between height and weight, between IQ scores and achievement test scores, and between self-concept and grades are examples of positive correlation.

Correlation Coefficient = - 0.99 Correlation Coefficient = - 0.5

A negative correlation or relationship means that high values of one variable are associated with low values of a second variable. Examples of negative correlations include those between exercise and heart failure, between successful test performance and feelings of incompetence, and between absence from school and school achievement.

No Correlation

Correlation Coefficient = -0.16

Scatter Plot of cricket chirps versus outdoor temperature.

Correlation of 0.98!
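The slides expect a calculator to be used for r, but the value can be checked with a short Python sketch. The formula below is the standard Pearson correlation coefficient, which the slides do not spell out; the function name `pearson_r` is illustrative. The data are the eight rows of Table 18-1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


chirps = [18, 20, 21, 23, 27, 30, 34, 39]
temps = [57, 60, 64, 65, 68, 71, 74, 77]

r = pearson_r(chirps, temps)
print(round(r, 2))  # 0.98
```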

Correlation versus Causation

The amount of fuel burned by a car depends on the size of its engine, since bigger engines burn more petrol. We say there is a CAUSAL RELATIONSHIP between the amount of petrol used and the size of the car's engine.

If two variables are found to be either associated or correlated, that doesn’t necessarily mean that a cause-and-effect relationship exists between the two variables.

If we find a statistical relationship between two variables, then we cannot always conclude that one of the variables is the cause of the other, i.e. correlation does not always imply causality.

Between 1980 and 2000 there was a large increase in sales of calculators and computers!

There was a strong positive correlation between the sales of computers and the sales of calculators!

For Example..

Did the increase of sales of calculators cause an increase in the sale of computers??

NO!!!!

Production Costs Decreased

Cost of Production was a third variable causing the other two to increase.

We call this third variable a LURKING VARIABLE.
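A lurking variable can be illustrated with a small Python sketch. All of the numbers below are made up purely for illustration and are not real sales or cost figures. A falling production cost is paired with two sets of rising sales, and the two sales series end up strongly correlated with each other even though neither causes the other.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, invented figures (not from the slides or any real source).
production_cost = [100, 90, 80, 70, 60, 50]  # lurking variable, falling over time
calculator_sales = [10, 22, 29, 41, 50, 62]
computer_sales = [5, 9, 16, 19, 26, 30]

r_sales = pearson_r(calculator_sales, computer_sales)
r_cost_calc = pearson_r(production_cost, calculator_sales)
r_cost_comp = pearson_r(production_cost, computer_sales)

print(round(r_sales, 2))      # strong positive
print(round(r_cost_calc, 2))  # strong negative
print(round(r_cost_comp, 2))  # strong negative
```

The correlation alone cannot tell us which explanation is right; it only shows that the pattern is consistent with a third variable driving both.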

Linear Regression

Line of Best Fit

After you’ve found a relationship between two variables and you have some way of quantifying this relationship, you can create a model that allows you to use one variable to predict another.

1. Draw a Scatter Plot.

2. If the graph suggests a linear relationship...

3. Calculate the Correlation Coefficient.

4. Find the equation of the Line that best fits the data.

- We draw this by eye, and then find its equation.

Because you have a strong correlation, be it positive or negative, you know that x is correlated with y.

If you know the slope and the y-intercept of that line, then you can plug in a value for x and predict the average value for y.

In other words, you can predict y from x.

You should never do a regression analysis unless you’ve already found a strong correlation (either pos. or neg.) between the two variables!

Now Calculate Line!

Equation: y = mx + c

m = slope = (y2 - y1)/(x2 - x1), where (x1, y1) and (x2, y2) are points on the line of best fit.

Substitute m and one point into y - y1 = m(x - x1).
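Here is a minimal Python sketch of that calculation. It assumes, purely for illustration, that a line drawn by eye through the cricket data passes through the points (20, 60) and (34, 74); a different by-eye line would give a slightly different equation. The function name `line_through` is made up for this example.

```python
def line_through(p1, p2):
    """Return slope m and intercept c of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no defined slope")
    m = (y2 - y1) / (x2 - x1)
    # From y - y1 = m(x - x1): y = m*x + (y1 - m*x1), so c = y1 - m*x1.
    c = y1 - m * x1
    return m, c


# Two points assumed to lie on a by-eye line of best fit.
m, c = line_through((20, 60), (34, 74))
print(m, c)  # 1.0 40.0

# Predict the temperature for 25 chirps in 15 seconds.
predicted = m * 25 + c
print(predicted)  # 65.0
```

With these two points the line is y = x + 40, which is the same as the "count the chirps and add 40" rule.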

0.98

Let’s Sum up!

Populations and Samples

Types of Sampling

Bias in Sampling

Reliability of Data

Collecting Data

Frequency Tables

Stem-and-Leaf Diagram

Back-to-Back Stem-and-Leaf

Histograms

Distribution of Data

Scatter Graph

Correlation

Correlation Coefficient

Causality

Linear Regression

2011 paper 2 Q 2

2013 paper 2 Q 7

1st= run 2nd= cycle 3rd=swim

25 mins

3.17 mins

no modal time but modal class.

2012 paper 2 Q 7