Statistics trinity college
Statistics
Stacy Cater
Question 1
11 31 18 13 11 3 1 1 6 1 4
4 - 6 hours
6.5 hours
Histogram
A histogram is chosen to represent “continuous numerical data”. That is data that represents a quantity where the numbers can take on any value in a certain range.
Distribution of Data
Positively Skewed Distribution
Also known as a right-skewed distribution: the longer tail is on the right.
Negatively Skewed Distribution
Also known as a left-skewed distribution: the longer tail is on the left.
Symmetric Distribution
A distribution is symmetric if the values smaller and larger than its midpoint are mirror images of each other.
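A rough way to check the shape of a data set without drawing it is to compare the mean with the median: a long right tail tends to pull the mean above the median, and a long left tail tends to pull it below. The sketch below is only a rule of thumb, not a formal test. The function name `describe_shape`, the `tolerance` value and the sample lists are all made up for illustration.

```python
from statistics import mean, median

def describe_shape(values, tolerance=0.05):
    """Rule-of-thumb shape check: compare the mean with the median.

    The gap between mean and median is measured as a fraction of the range,
    and `tolerance` (an arbitrary choice) decides what counts as "close".
    """
    spread = max(values) - min(values)
    if spread == 0:
        return "roughly symmetric"
    gap = (mean(values) - median(values)) / spread
    if gap > tolerance:
        return "positively skewed (skewed right)"
    if gap < -tolerance:
        return "negatively skewed (skewed left)"
    return "roughly symmetric"

# Hypothetical data sets, invented for this example.
print(describe_shape([1, 2, 2, 3, 3, 3, 4, 12]))       # long tail on the right
print(describe_shape([1, 9, 10, 10, 11, 11, 12, 12]))  # long tail on the left
print(describe_shape([1, 2, 3, 4, 5]))                 # mirror image about 3
```

A histogram is still the better check, since the mean-versus-median rule can be fooled by unusual data.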
Question 2
Standard Deviation
Two classes took a recent test. There were 10 students in each class, and each class had an average score of 81.5%
Since the averages are the same, can we assume that the students in both classes all did pretty much the same on the exam?
The answer is… No.
The average (mean) does not tell us anything about the distribution or variation in the grades.
Here are Dot-Plots of the grades in each class:
Mean
So, we need to come up with some way of measuring not just the average, but also the spread of the distribution of
our data.
Why not just give an average and the range of data (the
highest and lowest values) to describe the distribution of
the data?
Well, for example, let's say that for a set of data the
average is 17.95 and the range is 23.
But what if the data looked like this:
Here is the average
And here is the range
But really, most of the numbers are in this area, and are
not evenly distributed
throughout the range.
The Standard Deviation is a number that measures how far away each number in a
set of data is from their mean.
If the Standard Deviation is large, it means the numbers are spread
out from their mean.
If the Standard Deviation is small, it means the numbers are close to
their mean.
Here are the scores on the math test for Team A:
72, 76, 80, 80, 81, 83, 84, 85, 85, 89
Average: 81.5
The Standard Deviation measures how far away each number in a set of data is from their mean.
For example, start with the lowest score, 72. How far away is 72 from the mean of 81.5?
72 - 81.5 = -9.5
Or, start with the highest score, 89. How far away is 89 from the mean of 81.5?
89 - 81.5 = 7.5
So, the first step to finding the Standard Deviation is to find all the distances from the mean.
Score  Distance from Mean
72     -9.5
76     -5.5
80     -1.5
80     -1.5
81     -0.5
83     1.5
84     2.5
85     3.5
85     3.5
89     7.5
Next, you need to square each of the distances to turn them all into positive numbers.
Score  Distance from Mean  Distance Squared
72     -9.5                90.25
76     -5.5                30.25
80     -1.5                2.25
80     -1.5                2.25
81     -0.5                0.25
83     1.5                 2.25
84     2.5                 6.25
85     3.5                 12.25
85     3.5                 12.25
89     7.5                 56.25
Add up all of the squared distances.
Sum: 214.5
Divide by (n - 1), where n is how many numbers you have.
214.5 / (10 - 1) = 23.8
Finally, take the square root of this average squared distance.
√23.8 = 4.88
This is the Standard Deviation: 4.88
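The steps for Team A can also be followed in a few lines of Python. This is just a sketch of the same hand calculation (distances, squares, sum, divide by n - 1, square root). The variable names are made up for illustration.

```python
import math

team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]

n = len(team_a)
mean = sum(team_a) / n                       # the average, 81.5

distances = [x - mean for x in team_a]       # step 1: distance of each score from the mean
squared = [d ** 2 for d in distances]        # step 2: square each distance
total = sum(squared)                         # step 3: add up the squared distances
variance = total / (n - 1)                   # step 4: divide by (n - 1)
std_dev = math.sqrt(variance)                # step 5: take the square root

print(round(mean, 1), total, round(variance, 1), round(std_dev, 2))
# prints: 81.5 214.5 23.8 4.88
```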
Now find the Standard Deviation for the other class's grades (Team B).
Score  Distance from Mean  Distance Squared
57     -24.5               600.25
65     -16.5               272.25
83     1.5                 2.25
94     12.5                156.25
95     13.5                182.25
96     14.5                210.25
98     16.5                272.25
93     11.5                132.25
71     -10.5               110.25
63     -18.5               342.25
Sum of squared distances: 2280.5
2280.5 / (10 - 1) = 253.4
√253.4 = 15.92
Now, let's compare the two classes again.
                     Team A  Team B
Average on the Test  81.5    81.5
Standard Deviation   4.88    15.92
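The comparison can be checked with Python's built-in `statistics` module. Its `stdev` function divides by (n - 1), the same as the hand method in these slides. The sketch below uses the two lists of scores from the slides and makes no other assumptions.

```python
from statistics import mean, stdev

team_a = [72, 76, 80, 80, 81, 83, 84, 85, 85, 89]
team_b = [57, 65, 83, 94, 95, 96, 98, 93, 71, 63]

# Same average, very different spread.
for name, scores in (("Team A", team_a), ("Team B", team_b)):
    print(name, "average:", mean(scores), "standard deviation:", round(stdev(scores), 2))
```

The larger standard deviation for Team B shows that its scores are spread much further from 81.5 than Team A's.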
You have to be able to calculate the standard deviation using your calculator! Try!
Try using the scores for Team A:
72, 76, 80, 80, 81, 83, 84, 85, 85, 89
ANS: 4.88
Note:
Measures of central tendency (mean, mode and median) and measures of variability are known as SUMMARY STATISTICS.
Question 3
Solution
93.725360
X 3601
= 824 schools
Try Some Questions
2011 Paper - Q 7 (i)
Q7 (ii)
Q 7 (b) (i)
Q 7 (c)
Say Bye to Univariate & Hello to Bivariate Data
Two variables
Tied or paired together
Two - dimensional data
Bivariate Data
Deals with causes or relationships
The major purpose of bivariate analysis is to determine whether relationships exist.
Each observation is composed of..
National Institutes of Health (NIH)
Sedentary activities (like TV watching) are associated with an increase in obesity and an increase in the risk of diabetes in women.
Anger expression may be inversely related to the risk of heart attack and stroke. (Those who express anger may have a decreased risk).
Light to moderate drinking reduces the risk of heart disease in men.
News Reporters love to tell stories about the latest links!
Such as..
Does having her first baby later in life cause a woman to live longer? (New York Times)
‘Count Cricket Chirps to Gauge Temperature’ (Garden Gate)
What you have to do!
1. Find a cricket.
2. Count the number of times it chirps in 15 seconds.
3. Add 40.
You’ve just predicted the temperature in degrees Fahrenheit!
No. of Chirps in 15 sec Temperature (in degrees Fahrenheit)
18 57
20 60
21 64
23 65
27 68
30 71
34 74
39 77
Table 18-1 Cricket Chirps and Temperature Data (Excerpt)
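To see how well the "add 40" rule works, it can be tried on the rows of Table 18-1. The sketch below applies the rule to each chirp count and lists how far each prediction is from the recorded temperature. The function name `predict_temperature` is made up for illustration.

```python
# (chirps in 15 seconds, temperature in degrees Fahrenheit) from Table 18-1
chirps_and_temps = [
    (18, 57), (20, 60), (21, 64), (23, 65),
    (27, 68), (30, 71), (34, 74), (39, 77),
]

def predict_temperature(chirps_in_15_seconds):
    """The rule of thumb from the slides: count the chirps, then add 40."""
    return chirps_in_15_seconds + 40

# Prediction minus recorded temperature, one entry per row of the table.
errors = [predict_temperature(c) - t for c, t in chirps_and_temps]

for (chirps, temp), error in zip(chirps_and_temps, errors):
    print(chirps, temp, predict_temperature(chirps), error)

print("largest miss:", max(abs(e) for e in errors), "degrees")
# prints: largest miss: 3 degrees
```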
Let's see another example!
A Press Release by Ohio State University Medical Center
The headline says that...
“aspirin can prevent polyps in colon cancer patients”
Raw Data for this Study
ID NO. 22292 GROUP=ASPIRIN DEVELOPED POLYPS=NO
(635 LINES)
Table 18-2 Summary of Aspirin vs. Polyps Study Results
Group        % Developing Polyps*
Aspirin      17
Non-aspirin  27
*Total sample size = 635 (approximately half were randomly assigned to each group)
Scatter Plots
Bivariate Numerical Data
Two Dimensions
Horizontal dimension (x-axis)
Vertical dimension (y-axis)
Scatter Plot of cricket chirps versus outdoor temperature.
Interpreting a Scatterplot
You do this by looking for trends in the data as you go from left to right.
Positive linear relationship
Proportional relationship
As x increases (moves right one unit), y increases (moves up) a certain amount.
Negative linear relationship
Inverse relationship
As x increases, y decreases (moves down) a certain amount.
If the data don’t seem to resemble any kind of line (even a vague one) this means that no linear relationship exists.
Positive Linear Relationship
As the number of cricket chirps increases, so does the temperature.
Example
Age of Car
Value of Car (£)
Quantifying the Relationship
Quantify or measure the extent and nature of the relationship.
We have already seen how to describe the direction of a linear relationship, BUT you will also have to decide on
the STRENGTH of the relationship!
Introduce the...
Correlation Coefficient
Measures the strength and direction of the linear relationship between x and y (the horizontal and
vertical dimensions).
Calculating the C.C.
It is represented by the letter r
It has a value between -1 and 1.
You only have to be able to calculate it using your calculator, luckily for you!
If r is close to 1, then there is a strong positive correlation between the two sets of data.
If r is close to -1, we say there is a strong negative correlation between the two sets.
If r is close to 0, then there is little or no linear correlation between the two sets.
Most statisticians like to see correlations above 0.6 or below -0.6.
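You only need the calculator for the exam, but it can help to see roughly what the calculator is doing. The sketch below works out r for the cricket data in Table 18-1 using the standard Pearson formula, which the slides themselves do not show. The function name `correlation_coefficient` is made up for illustration.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson's r: how closely the points (x, y) follow a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of distances from the means, and the sums of squared distances.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Cricket data from Table 18-1.
chirps = [18, 20, 21, 23, 27, 30, 34, 39]
temps = [57, 60, 64, 65, 68, 71, 74, 77]

r = correlation_coefficient(chirps, temps)
print(round(r, 2))
# prints: 0.98
```

A value this close to 1 is a strong positive correlation.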
Types of Correlation
It is important you state the Direction and the Strength of a Correlation
Correlation Coefficient = 0.99 Correlation coefficient = 0.5
A positive correlation means that high values of one variable are associated with high values of a second variable. The relationship between height and weight, between IQ scores and achievement test scores, and between self-concept and grades are examples of positive correlation.
Correlation Coefficient = - 0.99 Correlation Coefficient = - 0.5
A negative correlation or relationship means that high values of one variable are associated with low values of a second variable. Examples of negative correlations include those between exercise and heart failure, between successful test performance and feelings of incompetence, and between absence from school and school achievement.
No CORRELATION
Correlation Coefficient = -0.16
Scatter Plot of cricket chirps versus outdoor temperature.
Correlation of 0.98!
Correlation versus Causation
The amount of fuel burned by a car depends on the size of its engine, since bigger engines burn more petrol. We say there is a CAUSAL RELATIONSHIP between the amount of petrol used and the size of the car's engine.
If two variables are found to be either associated or correlated, that doesn’t necessarily mean that a cause-and-effect relationship exists between the two variables.
If we find a statistical relationship between two variables, then we cannot always conclude that one of the variables is the cause of the other, i.e. correlation does not always imply causality.
Between 1980 and 2000 there was a large increase in sales of calculators and computers!
There was a strong positive correlation between the sales of computers and the sales of calculators!
For Example..
Did the increase of sales of calculators cause an increase in the sale of computers??
NO!!!!
Production Costs Decreased
Cost of Production was a third variable causing the other two to increase.
We call this third variable a LURKING VARIABLE.
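A lurking variable can be imitated with a few made-up numbers. In the sketch below, every figure is invented purely for illustration (none of it is real sales data): a falling production cost pushes both sets of sales up, so the two sales columns end up strongly correlated with each other even though neither causes the other.

```python
import math

def correlation_coefficient(xs, ys):
    """Pearson's r for two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented figures for illustration only (not real data).
production_cost = [100, 90, 80, 70, 60, 50, 40, 30]           # the lurking variable
calculator_sales = [210, 290, 420, 480, 630, 690, 810, 900]
computer_sales = [120, 150, 230, 260, 300, 370, 390, 460]

r_sales = correlation_coefficient(calculator_sales, computer_sales)
r_cost_calc = correlation_coefficient(production_cost, calculator_sales)
r_cost_comp = correlation_coefficient(production_cost, computer_sales)

print("calculators vs computers:", round(r_sales, 2))
print("cost vs calculators:", round(r_cost_calc, 2))
print("cost vs computers:", round(r_cost_comp, 2))
```

The two sales columns are strongly positively correlated, and each one is strongly negatively correlated with cost, which is the pattern a lurking variable produces.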
Linear Regression
Line of Best Fit
After you’ve found a relationship between two variables and you have some way of quantifying this relationship,
you can create a model that allows you to use one variable to predict another.
1. Draw a Scatter Plot.
2. If the graph suggests a linear relationship...
3. Calculate the Correlation Coefficient.
4. Find the equation of the Line that best fits the data.
- We draw this by eye, and then find its equation.
Because you have a strong correlation, be it positive or negative, you know that x is correlated with y.
If you know the slope and the y-intercept of that line, then you can plug in a value for x and predict the average value for y.
In other words, you can predict y from x.
You should never do a regression analysis unless you’ve already found a strong correlation (either pos. or neg.) between the two variables!
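In these slides the line of best fit is drawn by eye. A calculator or computer usually finds it with the least-squares method instead, which is a different technique but a useful cross-check on a hand-drawn line. The sketch below applies least squares to the cricket data from Table 18-1. The function name `least_squares_line` is made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx              # slope
    c = mean_y - m * mean_x    # y-intercept: the line passes through (mean_x, mean_y)
    return m, c

# Cricket data from Table 18-1.
chirps = [18, 20, 21, 23, 27, 30, 34, 39]
temps = [57, 60, 64, 65, 68, 71, 74, 77]

m, c = least_squares_line(chirps, temps)
print(round(m, 2), round(c, 1))
# prints: 0.91 43.0
```

So the least-squares line is roughly y = 0.91x + 43, which is close to the "add 40" rule of thumb.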
Now Calculate Line!
Equation: y = mx + c
m = slope = (y2 - y1)/(x2 - x1), where (x1, y1) and (x2, y2) are points on the line of best fit.
Substitute m and one point into y - y1 = m(x - x1).
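Here is a short sketch of the same calculation in Python. It assumes the line drawn by eye passes through the points (20, 60) and (34, 74). These two points are taken from the cricket table purely as an illustration, and your own line may pass through different points. The function name `line_through` is also made up.

```python
def line_through(p1, p2):
    """Return (m, c) for the line y = m*x + c through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)   # slope: rise over run
    c = y1 - m * x1             # from y - y1 = m(x - x1), rearranged to y = m*x + c
    return m, c

# Assumed points on the line of best fit (illustration only).
m, c = line_through((20, 60), (34, 74))
print(m, c)
# prints: 1.0 40.0

# Use the line to predict y from x: the temperature for 25 chirps.
print(m * 25 + c)
# prints: 65.0
```

With these two points the equation comes out as y = x + 40, which is exactly the "count the chirps and add 40" rule.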
0.98
Let’s Sum up!
Populations and Samples
Types of Sampling
Bias in Sampling
Reliability of Data
Collecting Data
Frequency Tables
Stem-and-Leaf Diagram
Back-to-Back Stem-and-Leaf
Histograms
Distribution of Data
Scatter Graph
Correlation
Correlation Coefficient
Causality
Linear Regression
2011 paper 2 Q 2
2013 paper 2 Q 7
1st = run, 2nd = cycle, 3rd = swim
25 mins
3.17 mins
no modal time but modal class.
2012 paper 2 Q 7