The nature of the data

Post on 22-Jun-2015

The Nature of Your Data

The purpose of this presentation is to help you determine if the two data sets you are working with in this problem are:

Dichotomous by Scaled

Ordinal by Another Variable

or

Dichotomous by Dichotomous

First, let's define what each of these means.

Beginning with Dichotomous by Dichotomous

What is dichotomous data?

The "Di" in Dichotomous means "two"

. . . and "tomous" or "tomy," as in "appendec-tomy," means to cut or divide.

So, dichotomous means divided in two.

In this case, a variable is divided in two; specifically, it can take on only two values.

For example:

Gender is a good example of a dichotomous variable. It generally takes on two values: (1) male and (2) female.

In some cases, individuals are divided into (1) those who received a treatment and (2) those who did not.

For example:

You have been asked to determine if those who eat asparagus score higher on a well-being scale (1-10) than those who do not.

In this case, we are dealing with those (1) who eat asparagus and those (2) who do not.

With dichotomous by dichotomous data you are examining the relationship between two dichotomous variables.

Here is an example:

It has been purported that females prefer artichokes more than do males.

Dichotomous variable 1: Gender (1) Male (2) Female

Dichotomous variable 2: Artichoke Preference (1) Prefer Artichokes (2) Do not prefer Artichokes

Here is what the data set looks like:

Study Participant    Gender                    Artichoke Preference
                     (1 = Male, 2 = Female)    (1 = Prefer Artichokes, 2 = Don’t Prefer Artichokes)
A                    1                         2
B                    2                         1
C                    1                         2
D                    2                         1
E                    2                         1
F                    1                         2
G                    1                         2

This is an example of:

Dichotomous Data (Gender) by Dichotomous Data (Artichoke Preference)

As you will learn, there is a specific statistical method used to calculate the relationship between two dichotomous variables. It is called the Phi-coefficient.
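To make the idea concrete, here is a minimal sketch of one common way to compute the Phi-coefficient from a 2 x 2 table of counts, using the seven artichoke participants. The function name `phi_coefficient` and the cell labels a, b, c, d are assumptions made for this illustration; they are not part of the original presentation.

```python
from math import sqrt

# Data from the artichoke example (participants A through G).
gender = [1, 2, 1, 2, 2, 1, 1]        # 1 = Male, 2 = Female
preference = [2, 1, 2, 1, 1, 2, 2]    # 1 = Prefer, 2 = Don't prefer


def phi_coefficient(x, y):
    """Phi for two dichotomous variables coded 1 and 2 (illustrative helper)."""
    pairs = list(zip(x, y))
    # Cell counts of the 2 x 2 table.
    a = pairs.count((1, 1))
    b = pairs.count((1, 2))
    c = pairs.count((2, 1))
    d = pairs.count((2, 2))
    # Product of the two row totals and the two column totals.
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


phi = phi_coefficient(gender, preference)
print(phi)
```

In this small data set every male is coded 2 and every female is coded 1 on preference, so the coefficient comes out at -1, a perfect relationship. The sign depends only on how the two codes happen to be assigned.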

Note - a dichotomous variable is also a nominal variable. However, nominal variables can also take on more than two values, like so:

1 = American
2 = Canadian
3 = Mexican

Dichotomous nominal variables can only take on two values (e.g., 1 = Male, 2 = Female).

The next type of relationship involves dichotomous by scaled variables.

Now you already know what a dichotomous variable is, but what is a scaled variable?

A scaled variable is a variable that theoretically can take on an infinite number of values.

For example,

Let's say a car can go as slow as 0 miles per hour and as fast as 130 miles per hour.

Within those two points (0 and 130mph) it could go 30 mph, 60 mph, 23 mph, 120 mph, 33.2 mph, 44.302 mph, or even 88.00000000001 mph.

The point is that between these two points (0 and 130mph) there are an infinite number of values that the speed could take.

Scaled data also has what are called equal intervals. This means that the basic units of measurement (e.g., inches, miles per hour, pounds) are the same across the scale:

40° - 41°
70° - 71°
100° - 101°

Each pair of readings is the same distance apart: 1°

Here is an example of a word problem with scaled by dichotomous variables:

You have been asked to determine the relationship between age and hours of sleep. Age is divided into two groups: Middle Age (45-64) and Old Age (65-94).

The Scaled Variable is hours of sleep, which can take on values from 0 to 8+ hours.

The Dichotomous Variable is age, which in this case can take on two values: (1) middle age and (2) old age.

Here is what the data set might look like:

Study Participant    Age                                      Hours of Sleep
                     (1 = 45-64 years, 2 = 65-94 years)
A                    1                                        6.2
B                    2                                        9.1
C                    1                                        5.8
D                    2                                        8.2
E                    2                                        7.4
F                    1                                        4.9
G                    1                                        6.8

Dichotomous Data (Age) by Scaled Data (Hours of Sleep)

Note, in the strictest sense scaled data should be like the car example (values are infinite between 0 and 130 mph).

However, in the social sciences, data that is technically not scaled (e.g., on a scale of 1-10, how would you rate the ballerina's performance?) is often still treated as scaled data.

Yes, it is true there are only 10 values that the variable can take on, but many researchers will treat it as scaled data. For the purposes of this class we will treat variables such as these as scaled data as well.

However, if we were rating on a scale of 1-2, 1-3 or 1-4 we most likely would not treat such variables as scaled.

As you will learn, there is a specific statistical method used to calculate the relationship between scaled and dichotomous variables. It is called the Point Biserial Correlation.
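As a rough sketch of what that calculation involves, the code below applies one standard form of the point-biserial formula (difference between the two group means, divided by the population standard deviation of all scores, times the square root of the two group proportions) to the age and sleep data. The helper name `point_biserial` and its `low` and `high` parameters are assumptions introduced for this illustration.

```python
from math import sqrt
from statistics import mean, pstdev

# Data from the age and sleep example (participants A through G).
age_group = [1, 2, 1, 2, 2, 1, 1]                    # 1 = 45-64, 2 = 65-94
hours_of_sleep = [6.2, 9.1, 5.8, 8.2, 7.4, 4.9, 6.8]


def point_biserial(groups, scores, low=1, high=2):
    """Point-biserial correlation (illustrative helper, not a library call)."""
    low_scores = [s for g, s in zip(groups, scores) if g == low]
    high_scores = [s for g, s in zip(groups, scores) if g == high]
    p = len(high_scores) / len(scores)   # proportion in the "high" group
    q = len(low_scores) / len(scores)    # proportion in the "low" group
    mean_gap = mean(high_scores) - mean(low_scores)
    # pstdev is the population standard deviation of all scores together.
    return mean_gap / pstdev(scores) * sqrt(p * q)


r_pb = point_biserial(age_group, hours_of_sleep)
print(round(r_pb, 3))
```

For these seven made-up participants the result is strongly positive, because the older group reports more sleep on average than the middle-aged group.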

Lastly let's consider the relationship involving ordinal data by another variable.

An ordinal variable is a variable where the numbers represent relative amounts of an attribute. However, they do not have equal intervals.

For example,

In this pole vaulting example you will notice that 1st and 2nd place are closer to each other:

3rd Place    15’ 2”
2nd Place    18’ 1”
1st Place    18’ 3”

1st and 2nd place: 2 inches apart

. . . than 2nd and 3rd place, which are much further apart:

2nd and 3rd place: 2 feet 11 inches apart

Rank ordered or ordinal data such as these do not have equal intervals.
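A quick way to see the unequal intervals is to convert the three vault heights to inches and subtract. This is only a small arithmetic sketch; the variable names are made up for the illustration.

```python
# Vault heights from the example, converted to inches.
heights_in_inches = {
    "1st": 18 * 12 + 3,   # 18 feet 3 inches
    "2nd": 18 * 12 + 1,   # 18 feet 1 inch
    "3rd": 15 * 12 + 2,   # 15 feet 2 inches
}

# Each step in rank is "one place," but the underlying gaps differ.
gap_1st_2nd = heights_in_inches["1st"] - heights_in_inches["2nd"]
gap_2nd_3rd = heights_in_inches["2nd"] - heights_in_inches["3rd"]

print(gap_1st_2nd, gap_2nd_3rd)
```

The gap between 1st and 2nd place is 2 inches, while the gap between 2nd and 3rd place is 35 inches (2 feet 11 inches), even though each is a difference of exactly one rank.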

Here is what an ordinal by ordinal problem looks like:

In a study, researchers rank order different breeds of dog based on how high they can jump. They then rank order them based on the length of their hind legs. They wish to determine if a relationship exists between jumping height and hind leg length.

Here’s the data set:

Breed    Jumping Rank    Hind-Leg Length Rank
A        1st             2nd
B        3rd             6th
C        6th             4th
D        4th             3rd
E        7th             7th
F        2nd             1st
G        5th             5th

Ordinal or Ranked Data (Jumping Rank) by Ordinal or Ranked Data (Hind-Leg Length Rank)

Rank ordered data can also take the form of percentiles.

Percentiles communicate the percentage of observations or values below a certain point.

If my score on the ACT is at the 35th percentile, that means that 35% of ACT takers scored below me.
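The definition above can be sketched in a few lines. The helper `percentile_rank` and the list of twenty scores are hypothetical, chosen only so that the arithmetic lands on the 35th percentile; real testing programs may compute percentiles slightly differently.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores strictly below the given score (illustrative)."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)


# Hypothetical group of 20 test takers with scores 11 through 30.
act_scores = list(range(11, 31))

# Seven of the twenty scores (11 through 17) fall below a score of 18.
my_percentile = percentile_rank(18, act_scores)
print(my_percentile)
```

Here 7 of the 20 scores are below 18, so a score of 18 sits at the 35th percentile of this made-up group.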

A data set taken from the dog jumping question might look like this:

Breed    Jumping Percentile Rank    Hind-Leg Percentile Rank
A        99%                        85%
B        78%                        33%
C        54%                        64%
D        69%                        73%
E        34%                        28%
F        84%                        97%
G        61%                        54%

Ordinal or Percentile Ranked Data (Jumping) by Ordinal or Percentile Ranked Data (Hind-Leg)

The next example is that of a relationship between an ordinal variable and a scaled variable.

You have been asked to determine if there is a relationship between the height of marathon runners and their final ranking in a race.

Here’s the data set:

Marathon Runner    Height in inches    Order of Finish
A                  73                  6th
B                  67                  4th
C                  69                  5th
D                  64                  2nd
E                  71                  7th
F                  62                  1st
G                  66                  3rd

Scaled Data (Height) by Ordinal/Ranked Data (Order of Finish)

The final example is that of a relationship between an ordinal variable and a nominal variable.

You have been asked to determine if there is a relationship between gender and spelling bee competition rankings.

Here’s the data set:

Participant    Gender    Spelling Bee Rank
A              1         6th
B              2         4th
C              2         5th
D              2         2nd
E              1         7th
F              1         1st
G              2         3rd

Dichotomous/Nominal Data (Gender) by Ordinal/Ranked Data (Spelling Bee Rank)

In summary, when at least one variable in the relationship is ordinal or rank ordered, then you choose the final option:

Dichotomous by Scaled

Dichotomous by Dichotomous

Ordinal by Another Variable

As you will learn, there are specific statistical methods used to calculate the relationship between ordinal by ordinal or ordinal by other variables. They are the Spearman Rho and Kendall Tau. We'll explain their difference in another presentation.
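As a preview, here is a minimal sketch of both statistics applied to the dog-breed ranks from the ordinal by ordinal example. It uses the textbook no-ties formulas (Spearman from squared rank differences, Kendall from concordant and discordant pairs); the function names are assumptions for this illustration, and data with tied ranks would need an adjusted formula.

```python
from itertools import combinations

# Ranks from the dog-breed example (breeds A through G), no ties.
jumping_rank = [1, 3, 6, 4, 7, 2, 5]
hind_leg_rank = [2, 6, 4, 3, 7, 1, 5]


def spearman_rho(rank_x, rank_y):
    """Spearman rho from squared rank differences (no-ties formula)."""
    n = len(rank_x)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


def kendall_tau(rank_x, rank_y):
    """Kendall tau from concordant and discordant pairs (no-ties formula)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(rank_x, rank_y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables order this pair the same way
        elif direction < 0:
            discordant += 1      # the two variables disagree on this pair
    total_pairs = len(rank_x) * (len(rank_x) - 1) / 2
    return (concordant - discordant) / total_pairs


rho = spearman_rho(jumping_rank, hind_leg_rank)
tau = kendall_tau(jumping_rank, hind_leg_rank)
print(round(rho, 3), round(tau, 3))
```

Both values are positive for these seven breeds, which suggests that breeds ranked higher on jumping also tend to be ranked higher on hind-leg length. The two statistics usually differ in size because they summarize rank agreement in different ways.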

A final note:

Dichotomous data like this:

1 = Catholic
2 = Mormon

Study Participant    Religious Affiliation
                     (1 = Catholic, 2 = Mormon)
A                    1
B                    1
C                    1
D                    2
E                    1
F                    2

Nominal Dichotomous Data

. . . can become scaled if we are talking about the number of Catholics or Mormons.

Event    Number of Catholics in attendance    Number of Mormons in attendance
A        120                                  22
B        322                                  34
C        401                                  78
D        73                                   55
E        80                                   3
F        392                                  102

Scaled Data
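The shift from codes to counts can be sketched by tallying the six study participants from the religious affiliation table. The variable names here are made up for the illustration; in the attendance table each event would contribute one such pair of counts.

```python
from collections import Counter

# Codes for study participants A through F: 1 = Catholic, 2 = Mormon.
affiliation_codes = [1, 1, 1, 2, 1, 2]
labels = {1: "Catholic", 2: "Mormon"}

# Each code is only a category label, but the tally of each label is a number.
counts = Counter(labels[code] for code in affiliation_codes)

print(counts["Catholic"], counts["Mormon"])
```

Each individual code is nominal, but the resulting counts (4 Catholics and 2 Mormons in this small group) are numbers that can be treated as scaled data when collected across many groups or events.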

Which option is most appropriate for the problem you are working with:

Dichotomous by Scaled

Dichotomous by Dichotomous

Ordinal by Another Variable