Section 6.7- Scatter Diagrams and Linear Correlation IB Mathematical Studies SL


Page 2:

Syllabus reference

Content Detail

Page 3:

Inferential Statistics

Two-variable statistics involves discovering whether two variables are related or linked to each other in some way, e.g.

- Does IQ determine income?
- Is there a link between foot size and the height of a person?

One variable is independent (x-axis) whilst the other is dependent (y-axis).

This section of statistics involves drawing conclusions about data that has not been collected, using data that has been collected.

Hence we can infer or predict certain points based on the data collected. Often this involves sampling, as analysing an entire population can be difficult.

Page 4:

Methods

A scatter plot is a quick way to see whether the variables are related. More formally, however, we may need to measure:

(1) Correlation – initially it may be necessary to determine if a relationship exists between two or more variables (Pearson’s product-moment correlation coefficient)

(2) Regression analysis – if a relationship appears to exist we can then conduct further analysis to determine the type and strength of the relationship (Linear Regression)

Page 5:

Correlation

Correlation refers to the relationship or association between two variables.

Correlations are classified qualitatively in three ways:
› Direction – positive, negative, none
› Strength – weak, moderate, strong
› Type – linear or non-linear

They are classified quantitatively by Pearson’s product-moment correlation coefficient.

Outliers must also be considered; they usually appear as isolated points away from the main body (group) of data.

Dancing Statistics: https://www.youtube.com/watch?v=VFjaBh12C6s

Exam hint - use this language!

Page 6:

Correlation Scatter graphs

Positive linear correlations

Negative linear correlations

Page 7:

CAUTION! - Causation

Be careful not to jump to conclusions when you determine a strong correlation between two variables – why?

› It does not mean that a causal relationship exists, i.e. one variable does not necessarily cause the other.

› e.g. there is a strong correlation between arm length and running speed; does that mean that short arms cause a reduction in running speed?

How Ice Cream Kills! https://www.youtube.com/watch?v=VMUQSMFGBDo

Page 8:

Pearson’s product-moment correlation coefficient

Formula for “r”

r tells how strong the correlation is between two variables.

There are several formulae for calculating “r”, but the one given and used in the IB course is:

$$r = \frac{s_{xy}}{s_x s_y}$$

› sxy is the covariance (it will always be given if required). If it is not given, use the calculator method.
› sx is the standard deviation of the x data values
› sy is the standard deviation of the y data values

Exam hint – make sure you know that sx is σx and sy is σy on your GDC!
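As a rough check on the formula above, here is a minimal Python sketch that computes sxy, sx, sy and r by hand. It assumes the population (divide-by-n) versions of the covariance and standard deviation, which is what the σx and σy in the exam hint suggest. The function name `pearson_r` and the four data points are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Return (s_xy, s_x, s_y, r) using population (divide-by-n) statistics."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance: average product of the deviations from each mean
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Standard deviations of x and y
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return s_xy, s_x, s_y, s_xy / (s_x * s_y)

# Hypothetical data lying exactly on the line y = 2x + 1,
# so r should come out as a perfect positive correlation.
s_xy, s_x, s_y, r = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
print(f"s_xy = {s_xy:.3f}, s_x = {s_x:.3f}, s_y = {s_y:.3f}, r = {r:.3f}")
```

Because the same n appears in the covariance and in both standard deviations, dividing by n - 1 throughout would give the same r.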

Page 9:

Using GDC to find “r”

Read p. 44 in your book and follow along with the example to practice using the GDC to calculate r.

r lies between -1 and 1
› The closer the r-value is to 1, the stronger the (positive) correlation
› The closer the r-value is to 0, the weaker the correlation
› The closer the r-value is to -1, the stronger the (negative) correlation

Page 10:

Linking “r” to terminology

Correlation coefficient value | Description of strength & direction
1            | Perfect positive
0.8 to 1     | Strong positive
0.6 to 0.8   | Moderate positive
0.4 to 0.6   | Weak positive
0 to 0.4     | No correlation
-0.4 to 0    | No correlation
-0.6 to -0.4 | Weak negative
-0.8 to -0.6 | Moderate negative
-1 to -0.8   | Strong negative
-1           | Perfect negative

Note: These are only guideline values; there are no specific division points where the description has to change from strong to moderate, etc.
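The guideline table can be sketched as a small lookup function. The table does not say which description applies exactly at a boundary such as 0.8, so this sketch assumes the stronger description wins there. The function name `describe_r` is hypothetical.

```python
def describe_r(r):
    """Map an r-value to the guideline description of strength and direction."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < 0.4:
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    # Assumption: a value exactly on a boundary takes the stronger label
    if size == 1:
        strength = "Perfect"
    elif size >= 0.8:
        strength = "Strong"
    elif size >= 0.6:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction}"

for value in (1, 0.73, 0.2, -0.5, -0.85):
    print(value, "->", describe_r(value))
```

Since these are guideline values only, a borderline result is better described in words than forced into one category.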

Page 11:

Another chart

Page 12:

Correlation Scatter graphs with “r” values

Page 13:

Example – correlation coefficient

Use the data in the table below to calculate the r-value, given sxy = 7.92

› Calculate the standard deviation of x
› Calculate the standard deviation of y
› Evaluate “r” using the IB formula and compare it to your calculator.

# | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10
x | 72 | 85 | 94 | 92 | 73 | 81 | 86 | 95 | 78 | 72
y | 6  | 5  | 6  | 7  | 4  | 4  | 6  | 7  | 5  | 3
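One way to check your working for this example is the Python sketch below, which assumes population (divide-by-n) standard deviations. It evaluates r with the given sxy = 7.92 and then recomputes the covariance directly from the table. The direct calculation gives roughly 7.96 rather than 7.92, so the two r-values (about 0.731 and 0.734) differ slightly in the third decimal place. The variable names are illustrative only.

```python
import math

x = [72, 85, 94, 92, 73, 81, 86, 95, 78, 72]
y = [6, 5, 6, 7, 4, 4, 6, 7, 5, 3]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Population standard deviations (sigma-x and sigma-y on a GDC)
s_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
s_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)

# r from the IB formula, using the covariance given in the question
s_xy_given = 7.92
r_given = s_xy_given / (s_x * s_y)

# r with the covariance computed directly from the table, for comparison
s_xy_data = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
r_data = s_xy_data / (s_x * s_y)

print(f"s_x = {s_x:.3f}, s_y = {s_y:.3f}")
print(f"r with given s_xy = {r_given:.3f}")
print(f"s_xy from data = {s_xy_data:.2f}, r from data = {r_data:.3f}")
```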

Page 14:

Line of Best Fit & Linear Regression

The line of best fit is the “quick and easy” way of finding the trend of the data.
› By eye, it should have approximately the same number of data points above the line as below.
› A more accurate method is to calculate the mean of the x data and of the y data and ensure the trend line passes through this point, called the mean point $(\bar{x}, \bar{y})$.

Linear regression is the most accurate process for determining the trend line, as the process takes every data point into account via a formula.

Page 15:

GDC: LinReg, r, and r²

o You can find the linear regression equation by using the process you read about on p. 44 to find r
o You can also use this process to find r² (coefficient of determination): a number that indicates how well the given data fits the line of regression
o Usually, the greater r² is, the better the regression line fits the data
o The greater r² is, the closer the data points are to the regression line
o If r² = 0.5643, then we say 56.43% of the variation in set y is explained by the variation in set x.
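To see why r² measures how well the line fits, the sketch below computes it as one minus the ratio of leftover variation (around the regression line) to total variation (around the mean of y). It then checks that this matches the square of r. It uses the x and y data from the correlation coefficient example and a least-squares line. The variable names are illustrative.

```python
x = [72, 85, 94, 92, 73, 81, 86, 95, 78, 72]
y = [6, 5, 6, 7, 4, 4, 6, 7, 5, 3]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
var_x = sum((a - mean_x) ** 2 for a in x) / n
var_y = sum((b - mean_y) ** 2 for b in y) / n

# Least-squares regression line y = slope * x + intercept
slope = s_xy / var_x
intercept = mean_y - slope * mean_x

# Variation left over around the line vs. total variation around the mean
ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
ss_tot = sum((b - mean_y) ** 2 for b in y)
r_squared = 1 - ss_res / ss_tot

r = s_xy / (var_x * var_y) ** 0.5
print(f"r = {r:.4f}, r squared = {r_squared:.4f}")
print(f"{100 * r_squared:.2f}% of the variation in y is explained by x")
```

For this data r² comes out at roughly 0.54, so a little over half of the variation in y is accounted for by the line.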

Page 16:

Syllabus reference

Content Detail

Page 17:

Example – line of best fit

A statistician wants to know if there is a correlation between HSC maths scores and the Math Studies IB exam scores. She collected the following data from 10 randomly selected students.

› Is there a correlation? If so, what kind?
› Draw the scatter plot of IB vs HSC
› Draw the line of best fit by finding the mean of each variable.
› If an HSC score is 77, predict the corresponding IB score.
› If an IB score was a 2, predict the corresponding HSC score.

#   | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10
HSC | 72 | 85 | 94 | 92 | 73 | 81 | 86 | 95 | 78 | 72
IB  | 6  | 5  | 6  | 7  | 4  | 4  | 6  | 7  | 5  | 3
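The predictions in this example can be checked with the sketch below. It finds the mean point and then assumes a least-squares slope, so a line drawn by eye through the mean point will give slightly different answers. The helper names `predict_ib` and `predict_hsc` are hypothetical. Predicting HSC from IB by rearranging the same line is a simplification made here.

```python
hsc = [72, 85, 94, 92, 73, 81, 86, 95, 78, 72]
ib = [6, 5, 6, 7, 4, 4, 6, 7, 5, 3]
n = len(hsc)

# Mean point: the line of best fit passes through (mean_hsc, mean_ib)
mean_hsc = sum(hsc) / n
mean_ib = sum(ib) / n

# Assumption: least-squares slope, covariance divided by variance of x
s_xy = sum((a - mean_hsc) * (b - mean_ib) for a, b in zip(hsc, ib)) / n
var_x = sum((a - mean_hsc) ** 2 for a in hsc) / n
slope = s_xy / var_x
intercept = mean_ib - slope * mean_hsc

def predict_ib(hsc_score):
    """Read an IB score off the line for a given HSC score."""
    return slope * hsc_score + intercept

def predict_hsc(ib_score):
    """Rearrange the same line to read off an HSC score for a given IB score."""
    return (ib_score - intercept) / slope

print(f"Mean point: ({mean_hsc}, {mean_ib})")
print(f"Line: IB = {slope:.4f} * HSC + ({intercept:.4f})")
print(f"HSC 77 -> IB of about {predict_ib(77):.2f}")
print(f"IB 2 -> HSC of about {predict_hsc(2):.1f}")
```

An IB score of 2 lies below every IB score in the table, so the second prediction should be treated with more caution than the first.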

Page 18:

Formula for Linear Regression

The line of best fit by the process of linear regression can be found using the given IB formula:

$$y - \bar{y} = \frac{s_{xy}}{s_x^2}\,(x - \bar{x})$$

› sxy is the covariance (it will always be given if required).
› sx is the standard deviation of the x data values
› sx² = (sx)², i.e. the standard deviation of x, then squared

Page 19:

Example – regression line

Find the equation of the line of best fit in y = ax + b form using the linear regression formula, if:
sxy = 9.23
sx = 3.46
$(\bar{x}, \bar{y})$ = (14.4, 35.2)
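A possible way to check this example: rearranging the regression formula gives a = sxy / sx² and b = ȳ - a·x̄. The sketch below simply evaluates those two expressions with the given values. The variable names are illustrative.

```python
s_xy = 9.23            # covariance (given)
s_x = 3.46             # standard deviation of x (given)
mean_x, mean_y = 14.4, 35.2   # mean point (given)

# y - mean_y = (s_xy / s_x^2) * (x - mean_x), rearranged to y = a*x + b
a = s_xy / s_x ** 2
b = mean_y - a * mean_x

print(f"y = {a:.3f}x + {b:.1f}")
```

This works out to approximately y = 0.771x + 24.1, and substituting x = 14.4 returns y = 35.2, confirming that the line passes through the mean point.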

Page 20:

Extrapolation vs Interpolation

Once you have a line of best fit you can use that equation to infer or predict what would happen to one variable if the other changes.

If you are predicting values within the range of your current data then you are said to be “interpolating”.
› The accuracy of interpolation depends on the accuracy of your line of best fit and your r-value.

If you are predicting values outside the range of your data then you are “extrapolating”.
› The accuracy of extrapolation depends not only on the accuracy of your line of best fit but also on whether it is reasonable to assume that the same trend will continue outside your range of data.
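The distinction can be sketched as a simple range check. The function name `prediction_type` is hypothetical, and the data is taken from the HSC/IB example.

```python
def prediction_type(value, data):
    """Classify a prediction by whether the input lies inside the data range."""
    lowest, highest = min(data), max(data)
    return "interpolation" if lowest <= value <= highest else "extrapolation"

hsc = [72, 85, 94, 92, 73, 81, 86, 95, 78, 72]
ib = [6, 5, 6, 7, 4, 4, 6, 7, 5, 3]

# HSC scores range from 72 to 95, so 77 is inside the data
print("HSC 77:", prediction_type(77, hsc))
# IB scores range from 3 to 7, so 2 is outside the data
print("IB 2:", prediction_type(2, ib))
```

The check only tells you which case you are in. Whether an extrapolated value is trustworthy still depends on whether the trend can reasonably be assumed to continue.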

Page 21:

Graph of extrapolation and interpolation ranges