3 - Correlation, Covariance and Scatter Plots

  • 8/12/2019 3 - Correlation, Covariance and Scatter Plots

    ETC1000 / ETC9000 Business and Economic Statistics

    Semester 1, 2010

    Demonstration Lecture Week 3: Correlation, Covariance and Scatter Plots

    This lecture provides examples of the material taught in this week's lectures, to help

    you see its potential for real world application, and to reinforce the ideas being

    communicated.

    Case Study: Attendance at lectures and your final mark!

    There is a wide range of views among lecturers and students about the benefits of

    attending lectures. Some say that with good quality, complete lecture notes and the

    availability of audio lectures, there is little need to actually physically attend. Others

    argue that a face-to-face lecture is a much better way to learn than simply reading or

    listening on-line. Mostly it is lecturers who take this view; after all, who wants to stand up and talk to a half-empty lecture theatre?

    Is there evidence to support either view?

    In 2009 one of the lecturers in Business and Economics decided to get some data to

    try and address the question. She hoped that it would show that attendance at lectures

    helps a lot, and that she could use this evidence to persuade students to attend lectures

    more regularly.

    Let's see whether the results worked out as she hoped.

    Students were asked to record attendance at lectures throughout the semester, and

    these will be compared with their final mark in the unit. Data is available for 100

    students. The maximum number of lectures they could attend is 24 (12 weeks x 2

    lectures per week).

    Look first at some descriptive statistics on number of lectures attended.

    Number of Lectures attended

    Mean                  15.62
    Standard Error        0.481743
    Median                16
    Mode                  18
    Standard Deviation    4.817435
    Sample Variance       23.20768
    Kurtosis              -0.736
    Skewness              -0.08733
    Range                 18
    Minimum               6
    Maximum               24
    Sum                   1562
    Count                 100
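
    Several rows of this summary are linked by simple formulas. As a minimal sketch, the Python below recomputes the mean, sample variance, standard error and range from the other reported values; the variable names are illustrative and not part of the original handout.

```python
import math

# Values copied from the summary table for number of lectures attended.
count = 100
total = 1562
sample_sd = 4.817435
minimum, maximum = 6, 24

mean = total / count                           # Sum / Count
sample_variance = sample_sd ** 2               # variance is the standard deviation squared
standard_error = sample_sd / math.sqrt(count)  # standard deviation / sqrt(Count)
value_range = maximum - minimum                # Maximum - Minimum

print(f"mean={mean:.2f}  variance={sample_variance:.4f}  "
      f"standard error={standard_error:.5f}  range={value_range}")
```

    The recomputed figures should agree with the table up to rounding, which is a quick way to check that a summary has been copied correctly.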

    Now let's move to the potential relationship between attendance at class and the final

    mark a student achieves. We look at this first with descriptive statistics for those who

    attended 15 or fewer lectures, and compare with those who attended more than 15

    lectures.

    Final Mark            Attended 15 or less    Attended 16 or more

    Mean                  58.1087                68.11111
    Standard Error        1.601397               1.32325
    Median                59.5                   68
    Mode                  59                     64
    Standard Deviation    10.8612                9.72386
    Sample Variance       117.9657               94.55346
    Kurtosis              3.820277               1.105246
    Skewness              -1.37546               -0.36536
    Range                 62                     53
    Minimum               16                     39
    Maximum               78                     92
    Sum                   2673                   3678
    Count                 46                     54

    Note the much higher mean (68.1 versus 58.1) and median (68 versus 59.5) for those who

    attended more lectures.
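
    The size of the gap between the two groups can be recovered from the Sum and Count rows alone. The sketch below does this in Python; the dictionary layout and names are illustrative assumptions, and only the sums and counts come from the table.

```python
# Sum and Count of final marks for each group, copied from the table above.
groups = {
    "15 or fewer lectures": {"sum": 2673, "count": 46},
    "16 or more lectures": {"sum": 3678, "count": 54},
}

# Each group mean is Sum / Count.
group_means = {name: g["sum"] / g["count"] for name, g in groups.items()}
gap = group_means["16 or more lectures"] - group_means["15 or fewer lectures"]

# The overall mean weights each group by its size; it is not the simple
# average of the two group means.
overall_mean = (sum(g["sum"] for g in groups.values())
                / sum(g["count"] for g in groups.values()))

for name, value in group_means.items():
    print(f"{name}: mean mark {value:.2f}")
print(f"gap between groups: {gap:.2f} marks, overall mean: {overall_mean:.2f}")
```

    The gap works out to roughly ten marks. Comparing it with the standard errors in the table (about 1.6 and 1.3) gives a rough sense of how large it is relative to sampling variation.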

    Now consider a scatter plot.

    [Scatter plot: Final Mark (vertical axis, 0 to 100) against number of lectures attended (horizontal axis, 0 to 30)]

    What do we learn from this scatter plot?

    There does appear to be a relationship (the dots are not totally random), it is positive

    (higher attendance is associated with a higher mark), and a linear model would do a

    reasonable job of summarising the relationship.

    The message is that attending more lectures is associated with a higher mark in the

    unit.

    To get a handle on the strength of the relationship, let's calculate the correlation:

                                   Number of Lectures attended    Final Mark

    Number of Lectures attended    1
    Final Mark                     0.612081                       1

    A correlation of 0.612 indicates a reasonably clear relationship, but not a really strong

    one. There are obviously a number of other factors that contribute to a student's final

    mark other than attending lectures. Some of these might be measurable things like

    mathematical ability, VCE ENTER score, who the tutor is, etc. Others might be

    individual-specific unmeasurable things like basic intelligence / ability, motivation,

    etc.
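
    For readers who want to see how a correlation coefficient is built up, here is a minimal Python sketch that computes the sample covariance and then divides by the two standard deviations. The seven (lectures, mark) pairs are hypothetical values made up for illustration; they are not the 100 students in the case study, so the result will not equal 0.612.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Correlation: covariance divided by the product of the standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical illustrative data, NOT the lecture's data set.
lectures = [6, 10, 12, 15, 18, 20, 24]
marks = [48, 61, 55, 59, 72, 66, 80]

cov = sample_covariance(lectures, marks)
r = correlation(lectures, marks)
print(f"covariance = {cov:.2f}, correlation = {r:.3f}")
```

    The covariance depends on the units of measurement (lectures times marks), whereas the correlation is unit-free and always lies between -1 and 1, which is why it is the more convenient measure of strength.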

    There are a couple of provisos:

    1. Correlation is not Causality

    All we have shown is that attendance at lectures is associated with better final marks.

    But we cannot prove with this analysis that attending more lectures will lead to an

    improved mark. Maybe this correlation between these two factors is due to some

    other common causal factor.

    E.g., one scenario: students who attend lectures are generally more diligent and

    organised, and hence work harder on their assignments and exam preparation, thus

    performing well in their final mark. So it is not attending lectures per se that

    improves your final mark, it is your diligence and organisational skills.

    Determining causality is difficult!
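
    The diligence scenario can be illustrated with a small simulation. In the Python sketch below, a hidden "diligence" score drives both attendance and final mark, and attendance has no direct effect on the mark at all. Everything here is a hypothetical assumption for illustration: the variables are standardised random scores, not real attendance counts or marks.

```python
import math
import random

def correlation(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical model: diligence is the common cause of both variables.
diligence = [random.gauss(0, 1) for _ in range(n)]
attendance = [d + random.gauss(0, 1) for d in diligence]  # no link to mark except via diligence
mark = [d + random.gauss(0, 1) for d in diligence]

r_raw = correlation(attendance, mark)

# Remove the common cause from both variables and correlate what is left.
r_adjusted = correlation(
    [a - d for a, d in zip(attendance, diligence)],
    [m - d for m, d in zip(mark, diligence)],
)
print(f"raw correlation: {r_raw:.2f}, after removing diligence: {r_adjusted:.2f}")
```

    In this setup the raw correlation comes out at around 0.5 even though attendance does nothing for the mark, and it largely disappears once diligence is accounted for. In real data the common cause is usually not observed, which is what makes causality hard to establish.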

    2. Outliers

    What if we had one freak student who was brilliant and hard working, but almost never

    attended lectures because they listened online instead? Suppose they attended just one

    lecture, but scored 98%.

    Let's change person number one in this way, and see what happens.

    [Scatter plot with the outlier included: Final Mark (vertical axis, 0 to 120) against number of lectures attended (horizontal axis, 0 to 30)]

                                   Number of Lectures attended    Final Mark

    Number of Lectures attended    1
    Final Mark                     0.483083                       1

    Now we have a correlation of only 0.483. Even though we have 100 students in the sample, this one outlier person can have quite a big impact on the correlation value.
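
    The same experiment is easy to repeat in code. The Python sketch below uses ten hypothetical students (made-up values, not the case-study data), computes the correlation, then replaces the first student with one who attended a single lecture and scored 98, as in the scenario above.

```python
import math

def correlation(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data: ten students with a clear positive relationship.
lectures = [6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
marks = [45, 52, 50, 58, 60, 63, 61, 70, 72, 78]
r_before = correlation(lectures, marks)

# Replace person number one with the outlier: one lecture attended, mark of 98.
lectures_outlier = [1] + lectures[1:]
marks_outlier = [98] + marks[1:]
r_after = correlation(lectures_outlier, marks_outlier)

print(f"without outlier: {r_before:.3f}, with outlier: {r_after:.3f}")
```

    With only ten points the single outlier changes the correlation far more than it does in the lecture's sample of 100, but the direction of the effect is the same. It is a good reason to look at the scatter plot before relying on the correlation value.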