CORRELATIONAL RESEARCH Assoc. Prof. Dr. Şehnaz Şahinkarakaş.

NATURE OF CORRELATIONAL RESEARCH

In correlational studies, we have 2 or more quantitative variables from the same subjects (at least 30) and we try to see if there is a relationship between these variables

There is no manipulation of variables

The degree to which the quantitative variables are related is measured using a correlation coefficient (r)

A perfect correlation would be r = +1.0 or r = -1.0, while no correlation would be r = 0

Correlation can't prove a causal relationship: it can be used for prediction, to support a theory, to measure test-retest reliability, etc.

When two things consistently occur together according to a predictable pattern, the two things are said to be "correlated"

E.g. Age and sense of humor: The older a person gets, the more subtle his sense of humor becomes

Intelligence and watching TV: Less intelligent persons watch more TV than do more intelligent persons

Smoking and cancer: The more a person smokes, the more likely he will be to contract cancer.

CORRELATION OR EXPERIMENTAL

Read this hypothesis:

“Children who have at least one personal computer in their house will do better on the SAT than those who do not have at least one computer in the house.”

There are two distinctly different ways this hypothesis could be tested:

If, on the day before they took the SAT, the researcher asked the students whether they had a computer in their house and then compared the SAT scores of those who said yes to those who said no, this would be a correlational design

If, at the beginning of the freshman year, the researcher identified 50 students who did not have computers and gave computers to 25 of those students, and then compared the SAT scores of those who received the computers to those who did not, this would be an experimental design.

CORRELATION COEFFICIENTS

A correlation coefficient is one way to state the presence of a correlation.

A correlation coefficient describes the strength of the relationship between two variables.

It states the presence or absence of a relationship.

In itself, a correlation coefficient says nothing about whether or not the relationship is a causal relationship.

A correlation coefficient would be used to test the following hypotheses:

a. As intelligence increases, reading ability increases.

b. The more time a high school student spends on studying, the less time he will spend dating.

A correlation coefficient would not be used to test the following hypotheses (because each describes a causal relationship):

a. Phonetic reading techniques do not teach children with auditory learning disabilities to read as well as whole language methods.

b. If overweight volleyball players lose weight, their games will improve.

POSITIVE OR NEGATIVE CORRELATION?

Positive correlation: high scores on one variable are associated with high scores on the other; or low scores on one variable are associated with low scores on the other

Negative correlation: high scores on one variable are associated with low scores on the other.

CONFUSION ABOUT THE MAGNITUDE OF CORRELATION COEFFICIENTS:

A common misconception is that positive correlation coefficients indicate a stronger relationship than negative coefficients.

In fact, it is the absolute value of the coefficient (that is, ignoring the plus or minus sign) that describes the strength of the relationship

Correlations of .80 and -.80 describe equally strong relationships - one positive and one negative.
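
A minimal Python sketch of this point: strength is read from the absolute value and direction from the sign. The function names `strongest` and `direction` and the sample list are illustrative assumptions, not part of the original slides.

```python
def strongest(coefficients):
    """Return the coefficient with the largest absolute value."""
    return max(coefficients, key=abs)


def direction(r):
    """Describe the sign of a coefficient."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Hypothetical set of coefficients to compare.
sample = [0.15, 0.70, -0.15, -0.80]
winner = strongest(sample)

print(winner, direction(winner))  # the strongest relationship here is a negative one
print(abs(0.80) == abs(-0.80))    # .80 and -.80 are equally strong
```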

EXERCISES ON COEFFICIENTS

Which of the following hypotheses would adequately be tested by a correlation coefficient?

a. Frequent reading improves a person's sense of humor.

b. People who spend a larger amount of time reading have a better sense of humor than those who read less.

b is correct. ‘a’ goes beyond a correlation and states a causal relationship.

Which of the following hypotheses would adequately be tested by a correlation coefficient?

a. Children who spend a lot of time watching Sesame Street learn to read faster than those who watch it rarely or not at all.

b. Sesame Street prepares children to learn how to read.

a is correct. ‘b’ states a causal relationship.

Can the following hypotheses adequately be tested by a correlation coefficient?

People who sing well have better self-concepts than those who sing poorly. YES

Wealthy persons play tennis more than poor persons. YES

Frequent exercise increases life span. NO

WHAT DO CORRELATION COEFFICIENTS TELL US?

Correlation coefficients indicate the strength of the relationship between two sets of scores acquired by the same group of individuals.

Correlation coefficients theoretically range from +1.00 to -1.00

A value of 1.00 would signify a perfect positive relationship.

A value of -1.00 would signify a perfect negative (or inverse) relationship
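
As a rough illustration of the two extremes, the sketch below computes Pearson r by hand for data that fall exactly on a rising line and exactly on a falling line. The helper `pearson_r` and the toy data are assumptions made for this example only.

```python
import math


def pearson_r(xs, ys):
    """Pearson r: sum of cross-products of deviations over the root of the
    product of the two sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]   # Y goes up whenever X goes up
falling = [-v for v in x]         # Y goes down whenever X goes up

print(round(pearson_r(x, rising), 6))   # perfect positive relationship
print(round(pearson_r(x, falling), 6))  # perfect negative (inverse) relationship
```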

Correlation coefficients below .35 show only a slight relationship: such values have little or no predictive value, though they can show that no meaningful relationship exists

Correlation coefficients between .40 and .65 may have a theoretical or practical value in education: in such a case, you should also consider the size of your samples

Correlation coefficients higher than .65 can be used for individual predictions (with smaller size)

Correlation coefficients above .85 show a close relationship (not very usual in educational research)
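
The bands above can be sketched as a small lookup. This is only an illustration: the function name `describe` is hypothetical, applying the bands to the absolute value is an assumption (consistent with the earlier point about ignoring the sign), and values from .35 up to .40 are reported as unclassified because the guidelines above do not assign them.

```python
def describe(r):
    """Map a coefficient to the descriptive bands listed above."""
    size = abs(r)  # assumption: the bands refer to magnitude, not sign
    if size > 0.85:
        return "close relationship"
    if size > 0.65:
        return "usable for individual predictions"
    if 0.40 <= size <= 0.65:
        return "may have theoretical or practical value"
    if size < 0.35:
        return "slight relationship"
    return "not covered by the guidelines"  # .35 up to .40 is left unassigned


for value in (0.15, 0.37, 0.50, -0.80, 0.90):
    print(value, "->", describe(value))
```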

SCATTERPLOT

Correlation coefficients can be supported with scatterplots

It gives a good visual picture of the relationship and helps to interpret the correlation coefficient.

It is generally employed to identify potential associations between two quantitative variables.

SAMPLE SCATTERPLOTS (STRONG POSITIVE RELATIONSHIP)

Small values of X correspond to small values of Y. (e.g. 200 in X corresponds to 2 in Y)

STRONG NEGATIVE RELATIONSHIP

Small values of X correspond to large values of Y. (e.g. 200 in X corresponds to -2 in Y)

NO RELATIONSHIP

For a given value of X (e.g. 0.5), the corresponding values of Y range all over the place (from +2 to -2).

CORRELATION COEFFICIENTS AND RELIABILITY

Correlation coefficients are also used to check the reliability and validity of scores obtained from tests and other instruments; they are then called reliability/validity coefficients. A reliability coefficient should be .70 or higher.

In case of correlation between scorers, it should be at least .90 (or .85)

For validity of scores, it should be at least .50
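
A small sketch of these rules of thumb as a check in Python. The names `MINIMUMS` and `is_acceptable` are hypothetical, and the stricter .90 bound is used for agreement between scorers.

```python
# Minimum acceptable coefficients, taken from the rules of thumb above.
MINIMUMS = {
    "reliability": 0.70,
    "inter_scorer": 0.90,  # the slides also mention .85 as a looser bound
    "validity": 0.50,
}


def is_acceptable(kind, r):
    """Return True if coefficient r meets the minimum for the given kind."""
    return r >= MINIMUMS[kind]


print(is_acceptable("reliability", 0.75))   # meets the .70 minimum
print(is_acceptable("inter_scorer", 0.87))  # falls short of .90
print(is_acceptable("validity", 0.50))      # exactly at the .50 minimum
```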

EXERCISES

Which of the following coefficients indicates the strongest relationship? A) .15 B) .70 C) -.15 D) -.80

(D)

There is a very low relationship between age and IQ. Which of the following correlation coefficients would best represent this relationship? A) .80 B) -.10 C) -.80

(B)

PARAPHRASE THE FOLLOWING STATEMENTS

There is a correlation of .85 between reading ability and mathematical ability.

E.g. As reading ability increases, mathematical ability also increases.

What other ways can we use to paraphrase this statement?

People who read well also do well in maths.

People who read poorly do poorly in maths.

People average in reading also tend to be average in maths.

There is a correlation of .09 between reading ability and mathematical ability.

There is no relationship between the two variables.

By knowing pupils' reading ability, we can make no prediction about their mathematical ability.

The correlation between IQ and creativity was .60, whereas the correlation between self-concept and creativity was .75.

Self-concept was more strongly related to creativity than was IQ.

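Not an accurate paraphrase: "Creativity had better self-concept scores than IQ scores." (The coefficients compare the strength of two relationships, not the level of scores.)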

TYPES OF CORRELATION COEFFICIENTS

Pearson r: can be used for any two variables with ordinal or interval data, but not nominal data.

E.g.

There was a correlation of .80 between pretest and posttest scores in the science class.

There's a correlation of -.65 between the number of lies a child tells and his popularity among his peers

Partial r: used when measuring what the relationship would be between two variables if the influence of some third variable were eliminated.

E.g. There's a partial r of .60 between family size and self-concept when social economic status (SES) is controlled.

How can we interpret this statement in different situations?

THERE'S A PARTIAL R OF .60 BETWEEN FAMILY SIZE AND SELF-CONCEPT WHEN SOCIAL ECONOMIC STATUS (SES) IS CONTROLLED.

Before the application of the partial r, if the Pearson r had been .85, then the following paraphrasing would be accurate:

When the influence of SES is eliminated, the strong relationship between family size and self-concept is somewhat weakened.

On the other hand, if the original Pearson r had been .60, then the following statement would be accurate:

Even when the influence of SES is eliminated, the moderately strong relationship between family size and self-concept persists; or the size of the relationship between family size and self-concept is not affected by controlling the influence of SES
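
For readers who want to see the arithmetic, the sketch below uses the standard first-order partial correlation formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The slides do not give the correlations of SES with family size or self-concept, so the values .79 and 0 used here are purely hypothetical, chosen to reproduce the two situations described above.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Situation 1 (hypothetical inputs): Pearson r is .85 and SES correlates
# .79 with both family size and self-concept. Controlling SES weakens it.
weakened = partial_r(0.85, 0.79, 0.79)
print(round(weakened, 2))  # about 0.60

# Situation 2 (hypothetical inputs): Pearson r is .60 and SES is unrelated
# to both variables, so controlling SES changes nothing.
unchanged = partial_r(0.60, 0.0, 0.0)
print(round(unchanged, 2))
```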

Multiple R: states what the relationship would be between one variable and two or more others if they were combined into a single variable.

E.g. The Multiple R between teacher competence and the combination of undergraduate GPA and student-teaching rating is .76.

This means: “The combination of GPA and student-teaching rating gives a pretty accurate prediction of future teaching competency.”
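
With exactly two predictors, Multiple R can be obtained from the three pairwise Pearson correlations using the standard formula R^2 = (r_y1^2 + r_y2^2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12^2). The sketch below applies it to made-up correlations; the slides report only the final value of .76, so none of these inputs come from the original example.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple R for one outcome y and two predictors (1 and 2)."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical inputs: each predictor correlates .60 with the outcome,
# and the two predictors correlate .30 with each other.
combined = multiple_r(0.60, 0.60, 0.30)
print(round(combined, 2))  # larger than either predictor's own r of .60
```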

Analyzing the Data Using Excel

ENTERING DATA

Participants     Hours Studying   Exam Score
Participant 1    2                80
Participant 2    3                85
Participant 3    1                70
Participant 4    8                90
Participant 5    9                95

CALCULATING THE CORRELATION COEFFICIENT (EXCEL)

Click in the cell where you want the correlation coefficient to be displayed.

Click the fx to open up the Functions box.

Select ALL from the SELECT a CATEGORY Dropdown Menu

Select CORREL/KORELASYON from the SELECT a FUNCTION list (list is in alphabetical order)

Select the cells you want to include in the correlation calculation. (Array 1 = 1st group of numbers, the number of hours studying; Array 2 = 2nd group of numbers, the exam scores)

Click OK.
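
For comparison, the same calculation can be sketched in Python using the five participants entered above. The helper `pearson_r` is written out by hand for this example and is not part of the original slides; it should agree with Excel's CORREL on the same two arrays.

```python
import math


def pearson_r(xs, ys):
    """Pearson r computed from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


hours_studying = [2, 3, 1, 8, 9]   # Array 1 in the Excel steps
exam_scores = [80, 85, 70, 90, 95]  # Array 2 in the Excel steps

r = pearson_r(hours_studying, exam_scores)
print(round(r, 3))  # 0.912
```

A value this high indicates a strong positive relationship between hours of study and exam score in this small sample.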

THREATS TO INTERNAL VALIDITY

As in the experimental research design, there are some threats to internal validity in correlational studies.

Some possible threats: subject characteristics, location, instrumentation, testing, and mortality

Keep in mind: you should give alternative explanations for relationships found in the data.

EXERCISES

A) EXAMINE EACH OF THE FOLLOWING BRIEF DESCRIPTIONS OF RESEARCH STUDIES. THEN CLASSIFY EACH AS EITHER EXPERIMENTAL RESEARCH OR CORRELATIONAL RESEARCH.

1. Mr. X wants to find out whether or not watching television has an impact on the performance of the students in his Reading course. He lists the number of hours each of his students spends watching TV each week, and then he compares the amount of TV watched to the scores on his weekly test.

a. Experimental research. b. Correlational research

Answer to Q1: b

There was no assignment of subjects to groups followed by the administration of a treatment. Subjects simply watched television and the researcher recorded the number of hours per week, and this rate of watching television was compared to performance on the reading test. This was a correlational study. Mr. X would compute the correlation coefficient between the number of hours spent watching television each week and performance on the reading test to determine the strength of the relationship.

2. Mrs. X wants to find out whether watching a series of shows on public television which dealt with science actually resulted in improved attitudes toward science. She therefore finds a group of students who watched most of the shows in this series and compares the attitudes of these students with the attitudes of a group of students who had not watched the shows.

a. Experimental research. b. Correlational research.

Answer to Q2: b

There was no assignment of subjects to groups followed by the administration of a treatment. Students either watched or did not watch the television shows before Mrs. X even questioned them about this topic. Their viewing habits provided the criterion by which they were classified as either watchers or non-watchers of the series. This can also be called ‘criterion-group research’ (assigning a group of subjects in which the variable is absent).

3. Mr. Y is looking for a good way to help his high school students develop their vocabulary skills. He feels that a certain paperback book will accomplish this. He wants to find out whether the book actually works. He asks two groups of his English class to read the book and two groups not to read the book. He then compares the scores of those who read the book to the scores of those who did not read the book.

a. Experimental research. b. Correlational research.

Answer to Q3: a

This is experimental research - specifically, it is an example of quasi-experimental design. Mr. Y has assigned the paperback book to an experimental group (two of his classes), while withholding the book from a control group (the other two classes). If Mr. Y had simply asked who had read the book and then compared the performance of readers to non-readers, this would be an example of correlational (criterion-group) research.

4. Mr. XY wants to find out if the type of undergraduate college education a person had influences that person's performance in law school. He goes through the law school's files and finds at random 50 students who had majored in the humanities, 50 students who had majored in a science program, and 50 who had majored in a combined science and humanities program. He then compares the performance of all three groups with regard to their achievement in law school.

a. Experimental research. b. Correlational research

Answer to Q4: b

This is a tricky question, because it uses the word random, which superficially suggests a true experimental research design. However, Mr. XY randomly selected his subjects; he did not randomly assign them to the treatments. (This would have been the case if he had an initial pool of 150 students, from which he randomly assigned 50 to major in the humanities, 50 to major in the sciences, and 50 in the combined program.) Rather, college major was the criterion by which the subjects were classified as belonging to one of the various groups.
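
The difference between random selection and random assignment can be sketched in a few lines of Python. Everything here is hypothetical: the student labels, the 200 files per major, and the fixed seed are assumptions made only to keep the example concrete and repeatable.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Random SELECTION (what Mr. XY did): each student's major already exists
# in the files; the researcher only draws 50 names from each existing group.
files = {
    major: [f"{major}-{i}" for i in range(200)]  # hypothetical 200 files per major
    for major in ("humanities", "science", "combined")
}
selected = {major: rng.sample(students, 50) for major, students in files.items()}

# Random ASSIGNMENT (what a true experiment would require): start from one
# pool of 150 students and let chance decide who goes into which program.
pool = [f"student-{i}" for i in range(150)]
rng.shuffle(pool)
assigned = {
    "humanities": pool[:50],
    "science": pool[50:100],
    "combined": pool[100:],
}

print({major: len(group) for major, group in selected.items()})
print({major: len(group) for major, group in assigned.items()})
```

In the first case the grouping variable was fixed before the researcher arrived; in the second the researcher controls it, which is what allows a causal interpretation.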

B) ANSWER THE FOLLOWING QUESTIONS

5. If a researcher wants to find out how well a combination of several different variables can predict a single outcome, which correlation coefficient would be employed?

a. Pearson correlation coefficient. b. Partial correlation coefficient. c. Multiple correlation coefficient.

(c) The multiple correlation coefficient determines the strength of the relationship between a weighted combination of variables and some outcome that this combination of variables may predict

6. If a researcher wants to find out how strongly two variables are related when the influence of some third variable is eliminated, what correlation coefficient would the researcher use?

a. Pearson correlation coefficient. b. Partial correlation coefficient. c. Multiple correlation coefficient.

(b) The partial correlation coefficient mathematically controls the influence of one variable while measuring the strength of the relationship between two other variables.

7. If a researcher has two sets of exact measurements on the same group of people and wants to find out how closely these two sets of measurements are related, what correlation coefficient would the researcher use?

a. Pearson correlation coefficient. b. Partial correlation coefficient. c. Multiple correlation coefficient.

(a) The Pearson correlation coefficient is used when the scores are interval/ratio or ordinal data. This statement describes interval/ratio data for both variables.

EXPLAIN WHAT IS MEANT BY EACH OF THE FOLLOWING STATISTICAL STATEMENTS.

8. There is a Pearson r of -.90 between the number of books borrowed from the library and performance on the Graduate Record Exam.

a. Students who borrowed a large number of books from the library did better on the exam than those who borrowed a low number of books.

b. Students who did well on the exam also borrowed a large number of books from the library.

c. Borrowing books from the library caused a person to do well on the exam.

d. There was very little (if any) relationship between the number of books the person borrowed from the library and that person's performance on the Graduate Record Exam.

e. None of the above is a correct interpretation.

Answer: e

This statement indicates an inverse relationship (negative correlation).

Statements (a) and (b) describe a positive correlation. If (a) were reversed to say "Students who borrowed a large number of books from the library did worse on the exam than those who borrowed a low number of books," this would be an accurate statement of this relationship.

Response (c) not only states a positive relationship, it also states causality - which is something a correlation coefficient does not do.

Response (d) describes a near zero correlation coefficient.

9. The Pearson correlation between ranking in the prom competition and performance in the science class was .15.

a. People who ranked high in science class were also likely to rank high in the prom competition.

b. People who ranked high in the science class were likely to rank low in the prom competition.

c. Successful performance in the science class caused a person to do well in the prom competition.

d. None of the above is an acceptable interpretation of this statement.

Answer: d

Response (a) states a high positive correlation (perhaps .85).

Response (b) states a high negative correlation (perhaps -.85).

Response (c) not only describes a positive relationship, it also states causality - which is something a simple correlation coefficient does not permit.

An accurate statement would be, "It was impossible to make accurate predictions of performance in science class based on performance in the prom competition."