Lecture 1 - Correlation

27
1 Correlation Correlation measures numerically the relationship between two variables X and Y (e.g. market capitalization and debt).

description

Artsfin

Transcript of Lecture 1 - Correlation

  • 1

    Correlation Correlation measures numerically the relationship between two variables X and Y (e.g. market capitalization and debt).

  • 2

    Correlation between X and Y is symbolised by r or rXY.

    =

    2)Xi

    (X2)Yi

    (Y

    )Xi

    )(XYi

    (Yr

  • 3

    Properties of Correlation r lies between 1 and +1. Positive values of r indicate positive correlation between X and Y, negative

    values indicate negative correlation, r = 0 implies X and Y are uncorrelated. Larger positive values of r indicate stronger positive correlation. r = 1 indicates

    perfect positive correlation. More negative values of r indicate stronger negative correlation. r = -1 indicates

    perfect negative correlation. The correlation between Y and X is the same as the correlation between X and

    Y. The correlation between any variable and itself is 1.

  • 4

    Understanding Correlation

    Example: Y = executive compensation, X = profit we obtain r = .66. What does the fact r is positive mean? There is a positive relationship between executive compensation and profit. Companies with high profits tend to pay their executives well. Companies with low profits tend to have low executive compensation. Executive compensation and profits both vary across companies. The high/low

    variation in executive compensation tends to match up with the high/low variation in profits.

  • 5

    Understanding Correlation (cont.)

    What does the magnitude of r mean? r2 measures the proportion of the variance in executive compensation that matches up with (or is explained by) the variance in profits.

    r2 = .662 = .44. 44% of the cross-company variation in executive compensation can be explained by the cross-country variation in profits.

  • 6

    Why are Variables Correlated?

    Correlation does not necessarily imply causality. Example: The correlation between workers education levels and wages is strongly positive. Does this mean education causes higher wages? We cant know for sure.

  • 7

    Possibility 1: Education improves skills and more skilled workers get better

    paying jobs.

    Education causes wages to increase. Possibility 2: Individuals are born with quality A which is relevant for success

    in education and on the job (e.g. intelligence or talent or determination, etc.).

    Quality A (NOT education) causes wages to increase.

  • 8

    Why are Variables Correlated? (cont.)

    Example: Data on N=546 houses sold in Windsor, Canada Y = sales price of a house X = the size of the lot the house is on rXY = .54 Houses with large lots tend to be worth more than houses with small lots. Economic theory tells us that the price of a house should depend on its

    characteristics.

  • 9

    Economic theory suggests X is causing Y. Here economic theory suggests that correlation does imply causality.

  • 10

    Why are Variables Correlated? (cont.)

    Example: Assume: Cigarette smoking causes cancer. Drinking alcohol does not cause cancer. Smokers tend to drink more alcohol. Suppose we collected data on many elderly people on how much they smoked (X), whether they had cancer (Y) and how much they drank (Z). What correlations would we find? rXY>0 Direct causality. rYZ>0 This is does NOT reflect causality.

  • 11

    Why are Variables Correlated? Summary

    Correlations can be very suggestive, but cannot on their own establish causality.

    Correlation + a sensible theory suggests (but does not prove) causality

  • 12

    Understanding Correlation through XY-plots Figure 3.1: House price versus lot size Think of drawing a straight line that best fits the points in the XY-plot. It will have positive slope. Positive slope=positive relationship=positive correlation.

  • 13

    Figure 3.1: XY Plot of House Price vs. Lot Size

    0

    20000

    40000

    60000

    80000

    100000

    120000

    140000

    160000

    180000

    200000

    0 2000 4000 6000 8000 10000 12000 14000 16000 18000

    Lot Size (square feet)

    Hou

    se P

    rice

    (Can

    adia

    n do

    llars

    )

  • 14

    Figure 3.2: XY plot with rXY = 1 All points fit on a straight upward sloping line.

  • 15

    Figure 3.2: XY-plot of Two Perfectly Correlated Variables

    -2.5

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    2

    2.5

    -1.5 -1 -0.5 0 0.5 1 1.5

  • 16

    Understanding Correlation through XY-plots (continued)

    Figure 3.3: XY plot with rXY = .51 Points still exhibit an upward sloping pattern, but much more scattered.

  • 17

    Figure 3.3: XY-Plot of Two Positively Correlated Variables (r=.51)

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    -1 -0.5 0 0.5 1 1.5 2

  • 18

    Figure 3.4: XY plot with rXY = 0 Completely random scattering of points.

  • 19

    Figure 3.4: XY-plot of Two Uncorrelated Variables

    -2

    -1.5

    -1

    -0.5

    0

    0.5

    1

    1.5

    2

    -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5

  • 20

    Figure 3.5: XY plot with rXY = -.51 Think of drawing a straight line that best fits the points in the XY-plot. It will have negative slope. Negative slope=negative relationship=negative correlation.

  • 21

    Figure 3.5: XY-plot of Two Negatively Correlated Variables (r=-.58)

    -0.3

    -0.2

    -0.1

    0

    0.1

    0.2

    0.3

    -0.3 -0.2 -0.1 0 0.1 0.2 0.3

  • 22

    Correlation Among Several Variables Correlation relates precisely two variables. What to do with three or more? Usually use regression (next chapter). Or you can calculate the correlation between every possible pair of variables.

  • 23

    Example: Three variables: X, Y and Z. Can calculate three correlations: rxy, rxz and ryz. Four variables: X,Y, Z and W. Can calculate six correlation: rxy, rxz, rxw, ryz, ryw and rzw. M variables: M(M-1)/2 correlations.

  • 24

    A Correlation Matrix Example: Column 1 = X, Column 2 = Y, Column 3 = Z

    Column 1 Column 2 Column 3Column 1 1Column 2 0.3182369 1Column 3 -0.130974 0.0969959 1 rxy = 0.3182369 rxz = -0.1309744 ryz = 0.0969959

  • 25

    Covariances and Population Correlations

    In basic statistics, you distinguished between sample and population means and variances. Remember sample mean was denoted by Y and was the average calculated using the data at hand. Population mean denoted by E(Y) and called the expected value. Population variance: var(Y).

    The same sample/population distinction holds with correlations. Let be population correlation (remember r is the sample correlation).

  • 26

    Chapter Summary 1. Correlation is a common way of measuring the relationship between two

    variables. 2. Correlation can be interpreted in a common sense way as a numerical measure

    of a relationship or association between two variables. 3. Correlation can also be interpreted graphically by means of XY-plots. That is,

    the sign of the correlation relates to the slope of a best fitting line through an XY-plot. The magnitude of the correlation relates to how scattered the data points are around the best fitting line.

    4. Correlations can arise for many reasons. However, correlation does not

    necessarily imply causality between two variables.

  • 27

    5. The population correlation, , is a useful concept when talking about many issues in finance (e.g. portfolio management).