1
Lab 2 and Merging Data (with SQL)
HRP223 – 2009
October 19, 2009
Copyright © 1999-2009 Leland Stanford Junior University. All rights reserved.
Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.
2
Deferred Credit
• If you are taking the class for deferred credit or if you will want class credit later, please tell Kameelah.
3
From Lab 2
• You saw how to create data.
– Use loops.
– Be sure to include an end with every do.
– Include an output inside the innermost loop.
• If you forget the output, the only time it will write a record to the new dataset is at the end of the data step.
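The effect of a missing output can be sketched outside SAS. The block below is only a Python analogy of a data step with nested do loops, not SAS syntax; the function name make_rows and the loop bounds are made up for illustration.

```python
# Python analogy only; this is not SAS syntax.
def make_rows(output_inside_loop):
    rows = []
    subject = visit = None
    for subject in range(1, 4):        # outer "do" loop
        for visit in range(1, 3):      # innermost "do" loop
            if output_inside_loop:
                rows.append((subject, visit))   # plays the role of "output"
    if not output_inside_loop:
        # Mimics the single write at the end of the data step.
        # (In SAS the loop counters would have stepped past their limits by then.)
        rows.append((subject, visit))
    return rows

with_output = make_rows(True)       # one row per pass through the inner loop
without_output = make_rows(False)   # a single row, written once at the end
print(len(with_output), len(without_output))
```

With the write inside the innermost loop there is one record per subject-visit pair; without it there is only one record in total.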
4
From Lab 2
• Structure your projects!
– Include notes and comments in the code.
– Have no data nodes against the left margin.
– Split projects into logical units.
• Include a flowchart for importing and cleaning.
• Have a separate flowchart of analysis.
5
From Lab 2
• You can add new variables using functions and simple assignment statements inside case-when-else-end phrases within the SQL.
6
From Lab 2
Be sure to specify a character column if you are making strings of characters.
Remember the quotes around the new character strings.
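A case-when-else-end expression that builds a character column might look roughly like the sketch below. It runs the SQL through Python's sqlite3 module as a stand-in for the query EG generates; the table scores, its columns, and the cutoff of 7 are assumptions made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table and values, for illustration only.
con.execute("CREATE TABLE scores (id INTEGER, score REAL)")
con.executemany("INSERT INTO scores VALUES (?, ?)",
                [(1, 3.0), (2, 7.5), (3, 9.0)])

labeled = con.execute(
    """
    SELECT id,
           score,
           CASE WHEN score >= 7 THEN 'High'   -- quotes around each new string
                ELSE 'Low'
           END AS score_group                 -- the new character column
    FROM scores
    ORDER BY id
    """
).fetchall()
print(labeled)
```

Note the quotes around 'High' and 'Low'. Without them SQL would look for columns with those names.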
7
From Lab 2
You can find functions here. Use OnLineDoc to find more information.
You can double click variable names here instead of typing them.
8
Fixing Bad Values
• You will eventually need to fix bad data.
– Say you want to set Placebo5 to be a score of 10.
Name the node and output.
Select the variables that are not modified.
9
Fixing Bad Values
• Tell it to compute a column and choose either Recode column or do a case-when-else-end statement in an Advanced expression.
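As a rough sketch of the case-when-else-end route, the query below overwrites one bad score and passes every other value through unchanged. It uses Python's sqlite3 module as a stand-in for the SQL that EG writes. The table trial, its columns, the other rows, and the reading of Placebo5 as a subject label are all assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical data; 999 plays the role of the bad value.
con.execute("CREATE TABLE trial (subject TEXT, score INTEGER)")
con.executemany("INSERT INTO trial VALUES (?, ?)",
                [("Placebo4", 8), ("Placebo5", 999), ("Drug1", 6)])

fixed = con.execute(
    """
    SELECT subject,
           CASE WHEN subject = 'Placebo5' THEN 10   -- the corrected score
                ELSE score                          -- everything else unchanged
           END AS score
    FROM trial
    ORDER BY subject
    """
).fetchall()
print(fixed)
```

The original table is left alone; the corrected values live in the query's output, which matches the idea of naming a new node and output.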
10
11
To get a better look, click Validate.
12
Collapsing Groups
• Often you will have a categorical variable and you will want to reduce the number of groups.
– High Dose and Low Dose are the same as being on a drug.
• You can create a new variable or just use a custom format to change how the values appear.
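The new-variable route could be sketched as follows, again with Python's sqlite3 module standing in for the SQL that EG builds. The table arms, the column names, and the label 'Drug' are assumptions chosen for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical treatment assignments.
con.execute("CREATE TABLE arms (id INTEGER, arm TEXT)")
con.executemany("INSERT INTO arms VALUES (?, ?)",
                [(1, "High Dose"), (2, "Low Dose"), (3, "Placebo")])

collapsed = con.execute(
    """
    SELECT id,
           arm,
           CASE WHEN arm IN ('High Dose', 'Low Dose') THEN 'Drug'
                ELSE arm
           END AS arm_group      -- new variable with fewer groups
    FROM arms
    ORDER BY id
    """
).fetchall()
print(collapsed)
```

The original arm column is kept next to the collapsed one, so the detailed grouping is still available for later analyses.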
13
Adding a New Column
• Choose Computed Column and recode a column.
14
Adding a User Defined Format
Here we are changing characters to appear as other characters.
15
Repeat until you have filled in all the values you want to appear differently.
16
Using Formats
• The formats are not automatically associated with any variables. You need to tell SAS to apply the format when it is creating a dataset or when it is processing a variable.
• Some processing nodes do better if you have assigned the format in a previous step.
17
Select the variable that needs the format and click properties.
Click Change… and then pick the User Defined format.
18
Same Information Formatted
19
Combining
• When you have data in two tables, you need to tell SQL how the two tables are related to each other.
– Typically you have a subject ID number in both files. The variable that can be used to link information is called the key.
20
Demographics
Response to Treatment
Here the two tables have different variables (except ID) and they are in a different sort order.
We want the favorite color merged in to see if it is related to response to treatment.
21
Merging
• Merging is trivially easy with EG. Choose a table, open the Query Builder…, and push the Join Tables button.
22
Double click on the dividing lines to make the columns wide enough to read.
23
Notice the name t1. In the SQL statements, variables from this table will have the prefix t1.
This table will be referred to as t2.
It noticed that the two tables have the common variable ID. Therefore it is going to match records that have a common value in ID.
Double click the link for details.
24
Joins
• You will typically do inner joins and left joins.
– Inner Joins: select the matching records.
– Left Joins: select all records on the left side and any records that match on the right.
25
Inner Joins
• Inner Joins are useful when you want to keep the information from the tables if, and only if, there are matches in both tables.
– Here you keep the records where you have demographic and response to treatment information on people.
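An inner join on the key could be sketched like this, with Python's sqlite3 module standing in for the SQL that the Query Builder writes. The table names, column names, and rows are made up for the example. The aliases t1 and t2 follow the naming EG shows in the join window.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables; note the different sort orders.
con.execute("CREATE TABLE demographics (id INTEGER, favorite_color TEXT)")
con.execute("CREATE TABLE response (id INTEGER, improved INTEGER)")
con.executemany("INSERT INTO demographics VALUES (?, ?)",
                [(1, "red"), (2, "blue"), (3, "green")])
con.executemany("INSERT INTO response VALUES (?, ?)",
                [(3, 1), (1, 0), (4, 1)])

inner_rows = con.execute(
    """
    SELECT t1.id, t1.favorite_color, t2.improved
    FROM demographics AS t1
    INNER JOIN response AS t2
        ON t1.id = t2.id          -- ID is the key
    ORDER BY t1.id
    """
).fetchall()
print(inner_rows)
```

Subject 2 has no response record and subject 4 has no demographics, so neither appears in the result. The tables do not need to be sorted first.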
26
Left Joins
• Left joins are useful when you have a table with everybody on the left side of the join and not everyone has records in the right table.
– A typical example has the left side with the IDs of everyone in a family and the right table has information on diagnoses. Not everyone is sick so you want to keep all the IDs on the left and add in diagnoses where you can.
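The family example might be sketched as below, using Python's sqlite3 module in place of the SQL that EG generates. The tables family and diagnoses, their columns, and the rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables: everyone is in family, only the sick are in diagnoses.
con.execute("CREATE TABLE family (id INTEGER)")
con.execute("CREATE TABLE diagnoses (id INTEGER, diagnosis TEXT)")
con.executemany("INSERT INTO family VALUES (?)", [(1,), (2,), (3,)])
con.executemany("INSERT INTO diagnoses VALUES (?, ?)", [(2, "Asthma")])

left_rows = con.execute(
    """
    SELECT t1.id, t2.diagnosis
    FROM family AS t1
    LEFT JOIN diagnoses AS t2
        ON t1.id = t2.id
    ORDER BY t1.id
    """
).fetchall()
print(left_rows)
```

Every ID from the left table is kept. People with no match on the right get a missing diagnosis, which Python shows as None.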
27
Typical Left Join
Notice the numeric variable is formatted to display with words.
28
29
Coalesce
• The previous example leaves NULL for the people who are disease free. You probably want to list the rest as healthy.
• The coalesce function returns the first non-missing value.
– Coalesce works on numeric lists.
– Coalesce works on character lists.
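A small sketch of both uses follows, run through Python's sqlite3 module as a stand-in for SAS. The standard SQL COALESCE is used here, the tables and values are invented, and the label 'Healthy' is taken from the slide text.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Character list and numeric list: the first non-missing value wins.
first_text, first_number = con.execute(
    "SELECT COALESCE(NULL, 'Asthma', 'Healthy'), COALESCE(NULL, NULL, 0)"
).fetchone()

# Hypothetical left join where the disease-free people come back as NULL.
con.execute("CREATE TABLE family (id INTEGER)")
con.execute("CREATE TABLE diagnoses (id INTEGER, diagnosis TEXT)")
con.executemany("INSERT INTO family VALUES (?)", [(1,), (2,), (3,)])
con.executemany("INSERT INTO diagnoses VALUES (?, ?)", [(2, "Asthma")])

status_rows = con.execute(
    """
    SELECT t1.id, COALESCE(t2.diagnosis, 'Healthy') AS status
    FROM family AS t1
    LEFT JOIN diagnoses AS t2 ON t1.id = t2.id
    ORDER BY t1.id
    """
).fetchall()
print(first_text, first_number, status_rows)
```

The fallback value goes last in the list, so it is used only when everything before it is missing.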
30
31
Coalesce
• If you are using left joins from multiple tables, coalesce can be really useful.
– Say you have people who have reported disease, other people have verified disease and the rest are assumed to be healthy. You can coalesce an indicator variable from the verified table and reported table and call everybody else healthy.
32
If the tables have indicator variables, once the tables are linked, the coalesce function is easy:
COALESCEC(t3.status2, t2.status1, "Healthy")
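In a full query, that expression might sit on top of two left joins, roughly as sketched here. Python's sqlite3 module stands in for SAS, so the standard COALESCE replaces COALESCEC. The table names people, reported, and verified and all the rows are assumptions. The aliases and the columns status1 and status2 follow the expression above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical tables: t1 = everyone, t2 = reported disease, t3 = verified disease.
con.execute("CREATE TABLE people (id INTEGER)")
con.execute("CREATE TABLE reported (id INTEGER, status1 TEXT)")
con.execute("CREATE TABLE verified (id INTEGER, status2 TEXT)")
con.executemany("INSERT INTO people VALUES (?)", [(1,), (2,), (3,), (4,)])
con.executemany("INSERT INTO reported VALUES (?, ?)",
                [(2, "Reported"), (3, "Reported")])
con.executemany("INSERT INTO verified VALUES (?, ?)", [(3, "Verified")])

disease_rows = con.execute(
    """
    SELECT t1.id,
           COALESCE(t3.status2, t2.status1, 'Healthy') AS status
    FROM people AS t1
    LEFT JOIN reported AS t2 ON t1.id = t2.id
    LEFT JOIN verified AS t3 ON t1.id = t3.id
    ORDER BY t1.id
    """
).fetchall()
print(disease_rows)
```

The order of the arguments sets the priority: a verified status beats a reported one, and 'Healthy' is used only when both are missing.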
33
No indicator variables?
• If the tables you are coalescing do not have indicator variables, just make them as part of the query by adding a column which has the ID in the child tables (e.g., reported and verified) recoded to a word like “reported” or “verified”.
34
The two new indicator columns.
35
Coalesce the new columns
• Once the new columns are created, create a new variable using the Advanced expression option for a new computed column. Then do coalesce on the new variables. Double click on the new variables and it will insert the code.
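The two steps (make the indicator columns, then coalesce them) could be sketched in one query, using Python's sqlite3 module as a stand-in for the SQL that EG writes. The table names and rows are hypothetical. The column names rep and ver match the variables shown on the slides. A subquery is used here because standard SQL cannot reuse a computed column by name inside the same SELECT.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical child tables that hold only IDs, with no indicator variable.
con.execute("CREATE TABLE people (id INTEGER)")
con.execute("CREATE TABLE reported (id INTEGER)")
con.execute("CREATE TABLE verified (id INTEGER)")
con.executemany("INSERT INTO people VALUES (?)", [(1,), (2,), (3,), (4,)])
con.executemany("INSERT INTO reported VALUES (?)", [(2,), (3,)])
con.executemany("INSERT INTO verified VALUES (?)", [(3,)])

indicator_rows = con.execute(
    """
    SELECT id, COALESCE(ver, rep, 'healthy') AS status
    FROM (
        SELECT t1.id AS id,
               -- recode the child table's ID into a word when it is present
               CASE WHEN t2.id IS NOT NULL THEN 'reported' END AS rep,
               CASE WHEN t3.id IS NOT NULL THEN 'verified' END AS ver
        FROM people AS t1
        LEFT JOIN reported AS t2 ON t1.id = t2.id
        LEFT JOIN verified AS t3 ON t1.id = t3.id
    )
    ORDER BY id
    """
).fetchall()
print(indicator_rows)
```

A CASE with no ELSE returns a missing value when the condition fails, which is what lets coalesce fall through to the next item in the list.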
36
After double clicking the ver variable the code is inserted.
Don’t forget the comma before double clicking the rep variable.
After inserting reported and verified, put in another comma and the “healthy” option.
37