Variable Coding

A Short Introduction Prepared by Mirya Holman

Transcript of Variable Coding

Page 1: Variable Coding

A Short Introduction
Prepared by Mirya Holman

Page 2: Variable Coding

There are three kinds of data:
◦ Qualitative
◦ Quantitative
◦ Ordinal

Page 3: Variable Coding

◦ Qualitative (also called nominal) data is a set of unordered categories.

Qualitative variables differ in quality, not in quantity or magnitude.
Examples: race, gender

Page 4: Variable Coding

◦ Quantitative (or interval) data varies in magnitude.
Each possible value of a quantitative variable is greater than or smaller than any other possible value.
Examples: education, income
Quantitative data can be either…

discrete, if it can take on only a finite (or countable) number of distinct values
Example: the number of visits to the dentist last year

or continuous, if it can take on any value along a continuum of real numbers

Example: the time, in minutes, it takes to finish a book

Page 5: Variable Coding

Ordinal data consists of categorical scales that have a natural ordering of values.
◦ It does not have defined interval distances between the values.
◦ Ordinal data is usually transformed into interval data, that is, data on a categorical scale with defined interval distances between the values.
◦ Examples: political identification (Strong Democrat to Strong Republican) or class (low, middle, high).
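As a minimal sketch of how an ordinal scale might be given numbers, the snippet below codes the class example. The codes 1 to 3 and the helper name `code_class` are assumptions made for illustration; any increasing sequence would preserve the ordering equally well.

```python
# Hypothetical ordinal coding for the "class" example. The numbers 1-3 are
# an assumed choice that simply preserves the ordering low < middle < high.
CLASS_CODES = {"low": 1, "middle": 2, "high": 3}


def code_class(label: str) -> int:
    """Return the ordinal code for a class label, ignoring case and spaces."""
    return CLASS_CODES[label.strip().lower()]


# Made-up responses, used only to show the mapping in action.
respondents = ["middle", "High", "low", "middle"]
coded = [code_class(r) for r in respondents]
print(coded)

# The ordering is meaningful, so comparisons between codes make sense.
print(code_class("low") < code_class("high"))
```

Treating these codes as interval data adds the further assumption that the step from low to middle is the same size as the step from middle to high.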

Page 6: Variable Coding

Coding variables is a way to change qualitative data into quantitative data.

We normally do this so that we can perform statistical analysis on the qualitative data.

Coding a variable consistently assigns a numerical value to each category of a qualitative trait.
◦ Example: gender is a qualitative trait (a variable without a natural ordering).
◦ We can assign male and female each a numerical value (say, zero and one). Now we have numbers to do statistics with!
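The gender example can be sketched in a few lines of Python. The 0/1 assignment follows the slide; the sample responses and the helper name `code_gender` are invented for illustration.

```python
# Coding scheme from the slide: male = 0, female = 1.
GENDER_CODES = {"male": 0, "female": 1}


def code_gender(value: str) -> int:
    """Return the numeric code for a gender label, ignoring case and spaces."""
    return GENDER_CODES[value.strip().lower()]


# Made-up responses, used only to demonstrate the coding.
sample = ["Female", "male", "female", "female"]
coded = [code_gender(v) for v in sample]
print(coded)

# With a 0/1 code, the mean of the column is the share of cases coded 1.
share_female = sum(coded) / len(coded)
print(share_female)
```

Because the same dictionary is used for every row, the assignment stays consistent across the whole data set.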

Page 7: Variable Coding

We code variables for three primary reasons:
◦ 1: We can run statistical models.
◦ 2: Our computer programs will understand the variables.
◦ 3: Accountability: we can run models “blind,” or without knowing what the variables stand for, in order to reduce programming / author bias.

Page 8: Variable Coding

Say that we want to look at employment discrimination settlements.
◦ We are interested in whether the type of representation has an effect on the outcome of the case.
◦ We look at four types: pro se, EEOC, appointed counsel, and other. These are qualitative data.
◦ But! We want to know what effect the type of representation has on the amount received in a settlement.

Page 9: Variable Coding

So… we assign a consistent numerical value to each type of representation:
◦ Pro se = 1
◦ EEOC = 2
◦ Appointed counsel = 3
◦ Other = 4
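One way to keep this assignment consistent is to store it once and look it up for every case. The sketch below uses the four codes from the slide; the function names and the sample cases are hypothetical.

```python
# Codes from the slide: one fixed number per type of representation.
REPRESENTATION_CODES = {
    "pro se": 1,
    "eeoc": 2,
    "appointed counsel": 3,
    "other": 4,
}

# Reverse lookup, so a code can be turned back into its label.
REPRESENTATION_LABELS = {code: label for label, code in REPRESENTATION_CODES.items()}


def code_representation(label: str) -> int:
    """Return the numeric code for a representation type (case-insensitive)."""
    key = label.strip().lower()
    if key not in REPRESENTATION_CODES:
        # Fail loudly rather than silently inventing a new code.
        raise KeyError(f"No code defined for representation type: {label!r}")
    return REPRESENTATION_CODES[key]


# Made-up cases, used only to demonstrate the lookup.
cases = ["Pro se", "EEOC", "Other", "Appointed counsel", "EEOC"]
coded_cases = [code_representation(c) for c in cases]
print(coded_cases)
```

Raising an error for an unknown label is a design choice: it forces a decision about where the case belongs instead of letting an ad hoc code slip into the data.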

Page 10: Variable Coding

Now we can run an ANOVA test, which compares the mean settlement amount for each type of representation and determines whether the differences are statistically significant. The codes act only as group labels here; their numerical size carries no meaning.
◦ NOTE: Statistically significant, in this and many other applications, means that a difference as large as the one you find would be unlikely to arise from chance alone, so it is attributed to real differences in the data.
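To make the comparison concrete, here is a minimal sketch of the F statistic behind a one-way ANOVA, written with only the Python standard library. The settlement amounts are invented for illustration and are not taken from any real data set; the function name `one_way_anova_f` is likewise an assumption.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a dict of group code -> values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical settlement amounts (in thousands), keyed by representation code:
# 1 = pro se, 2 = EEOC, 3 = appointed counsel, 4 = other.
settlements = {
    1: [12.0, 15.0, 9.0],
    2: [30.0, 28.0, 35.0],
    3: [22.0, 25.0, 19.0],
    4: [14.0, 18.0, 16.0],
}
f_stat, df_b, df_w = one_way_anova_f(settlements)
print(f_stat, df_b, df_w)
```

A large F means the group means differ by more than the spread within groups would suggest. Turning F into a p-value requires the F distribution; in practice a statistics package (for example `scipy.stats.f_oneway`, if SciPy is available) would normally do both steps.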

Page 11: Variable Coding

Asbestos cases: I want to investigate whether the nature of asbestos litigation changed between 1992 and 2001. How? By Coding!

Page 12: Variable Coding

Example 2… What is the process?
◦ Step 1: Each case is entered into a spreadsheet, including information on the number of plaintiffs, the number of defendants, the award amount (if any), the type of award(s), the claim, etc.
◦ Step 2: Each time we deal with a qualitative element of the case, we transform it into a quantitative descriptor.
◦ Step 3: We can run statistical analysis on the data.

Page 13: Variable Coding

Example 2… A how-to: this is what the data looks like when we enter it:

This is in qualitative form!

Case # | Plaintiff | Defendant | Award | Type of Award | Claim
320278 | DAVID and Susan TAYLOR | JOHN CRANE INC | 3029849 | compensatory, loss of consortium | mesothelioma
01L781 | James and Terry Crawford | ACandS Inc., et al | 16000000 | compensatory, punitive, loss of consortium | Mesothelioma
98-1386 | Andrew and Marietta Prebehall | Harbison & Walker Co | 593000 | wrongful death, loss of consortium | Lung cancer

Page 14: Variable Coding

Example 2… We want to code the data, to transform it into quantitative data… so, let’s start with the claim. We decide that we are going to consistently assign each type of claim a numerical identifier:

The number we assign does not matter as much as the consistency with which we assign the code.

Case # | Plaintiff | Defendant | Award | Type of Award | Claim | Claim2
320278 | DAVID | JOHN C | 3029849 | compensatory, loss of consortium | mesothelioma | 1
01L781 | James an | ACandS | 16000000 | compensatory, punitive, loss of consortium | Mesothelioma | 1
98-1386 | Andrew | Harbiso | 593000 | wrongful death, loss of consortium | Lung cancer | 3
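The Claim2 column could be produced with a lookup like the sketch below. Only the two codes visible in the table (mesothelioma = 1, lung cancer = 3) are included; the slide does not show which claim receives code 2, so it is left out rather than guessed. The function name `code_claim` is hypothetical.

```python
# Only the codes visible in the slide's table. The claim coded 2 is not
# shown in the source, so it is deliberately not included here.
CLAIM_CODES = {"mesothelioma": 1, "lung cancer": 3}


def code_claim(claim: str) -> int:
    """Return the numeric claim code, treating 'Mesothelioma' and
    'mesothelioma' as the same claim."""
    key = claim.strip().lower()
    if key not in CLAIM_CODES:
        raise KeyError(f"No code defined for claim: {claim!r}")
    return CLAIM_CODES[key]


# Claim values as they appear in the three rows of the table.
claims = ["mesothelioma", "Mesothelioma", "Lung cancer"]
claim2 = [code_claim(c) for c in claims]
print(claim2)
```

Normalizing case before the lookup matters here because the spreadsheet spells the same claim two ways; without it, the two mesothelioma rows could end up with different codes.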

Page 15: Variable Coding

Example 2… Next, we tackle damages. Here it is easier to make a separate column for each type of damage, and then indicate with a 0/1 whether that damage was awarded:

Award | Type of Award | Compensatory | Punitive | Loss of consortium | Wrongful death
3029849 | compensatory, loss of consortium | 1 | 0 | 1 | 0
16000000 | compensatory, punitive, loss of consortium | 1 | 1 | 1 | 0
593000 | wrongful death, loss of consortium | 0 | 0 | 1 | 1
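The 0/1 columns can be derived mechanically from the comma-separated "Type of Award" text. This is a minimal sketch; the function name `code_damages` is an assumption, and the three input strings are the rows shown in the table.

```python
# The four damage types that get their own 0/1 column.
DAMAGE_TYPES = ["compensatory", "punitive", "loss of consortium", "wrongful death"]


def code_damages(type_of_award: str) -> dict:
    """Split a comma-separated award description into 0/1 indicators,
    one per damage type."""
    awarded = {part.strip().lower() for part in type_of_award.split(",")}
    return {damage: int(damage in awarded) for damage in DAMAGE_TYPES}


rows = [
    "compensatory, loss of consortium",
    "compensatory, punitive, loss of consortium",
    "wrongful death, loss of consortium",
]
for row in rows:
    indicators = code_damages(row)
    print([indicators[d] for d in DAMAGE_TYPES])
```

Separate indicator columns work better than a single code here because one case can receive several types of damages at once.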

Page 16: Variable Coding

Example 2… We can leave the damages amount alone, since it is already in numerical form.
We can transform the plaintiffs by coding the number of plaintiffs or the type of plaintiffs.

Here, all our plaintiffs are married couples, so there are 2 plaintiffs, and we give them a type code of “2.” We could, for example, give a single plaintiff a code of “1” and a surviving spouse, who is suing for the estate, a code of “3.”

Case # | Plaintiff | Num_plt | Type_plt | Defendant | Award
320278 | DAVID and Susan TAYLOR | 2 | 2 | JOHN CRANE | 3029849
01L781 | James and Terry Crawford | 2 | 2 | ACandS Inc., et | 16000000
98-1386 | Andrew and Marietta Prebehalla | 2 | 2 | Harbison & Wal | 593000

Page 17: Variable Coding

Example 2… Codebook!
When we are coding, it is important to keep track of what we code and how we code it. This is usually kept in a codebook, which documents what each variable means.
So, for the asbestos cases, our codebook would include:
◦ Type_plt = Type of plaintiff. 1 = single plaintiff. 2 = married plaintiffs. 3 = surviving spouse, suing on behalf of the estate.
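A codebook can also live next to the data in machine-readable form. The sketch below stores the Type_plt entry from the slide in a dictionary; the layout of the dictionary and the helper names `describe` and `find_uncoded` are assumptions, not a standard format.

```python
# Codebook entry taken from the slide; the dictionary layout is an assumption.
CODEBOOK = {
    "Type_plt": {
        "description": "Type of plaintiff",
        "values": {
            1: "single plaintiff",
            2: "married plaintiffs",
            3: "surviving spouse, suing on behalf of the estate",
        },
    },
}


def describe(variable: str, code: int) -> str:
    """Look up what a numeric code means for a given variable."""
    return CODEBOOK[variable]["values"][code]


def find_uncoded(variable: str, codes) -> list:
    """Return any codes in the data that the codebook does not define."""
    known = CODEBOOK[variable]["values"]
    return sorted({c for c in codes if c not in known})


# The Type_plt column from the three asbestos cases in the table.
type_plt_column = [2, 2, 2]
print(describe("Type_plt", 2))
print(find_uncoded("Type_plt", type_plt_column))
```

Checking the data against the codebook is a cheap way to catch typos or codes that were used but never documented.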

Page 18: Variable Coding

Example 2… Now we have the data in a form that allows us to model or manipulate it, in order to better understand trends and relationships.

Page 19: Variable Coding

Final thoughts
In order to code correctly, we MUST:
◦ Be consistent in our coding
  i.e., if female = 1 once, female = 1 always
◦ Know what you are coding!
  Coding is NOT an exact science in most circumstances. Knowing the context can help you determine where to put a case / plaintiff / award when it does not exactly fit your categories.
◦ When in doubt, have someone else code a sample of your data, and see the level of consistency.
◦ Keep track of what you do! Use a codebook!
◦ This is an intuitive process, and everyone makes mistakes! Take your time!
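The "level of consistency" between two coders can be put into numbers. The sketch below computes simple percent agreement and Cohen's kappa, a common chance-corrected agreement measure. The slides do not name a specific measure, so kappa is an assumed choice, and the two coders' codes are made up.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b) -> float:
    """Share of cases where both coders assigned the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must code the same cases")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b) -> float:
    """Agreement between two coders, corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    # Chance agreement: probability both coders pick the same code at random,
    # given how often each coder used each code.
    expected = sum(
        counts_a[code] * counts_b[code] for code in set(coder_a) | set(coder_b)
    ) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Made-up codes from two coders for the same four cases.
coder_1 = [1, 1, 2, 2]
coder_2 = [1, 1, 2, 1]
print(percent_agreement(coder_1, coder_2))
print(cohens_kappa(coder_1, coder_2))
```

Kappa is lower than raw agreement because some matches would happen by chance alone; a low value on a sample is a signal to revisit the category definitions in the codebook.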