Multiple Uses for a Simple SQL Procedure Rebecca Larsen University of South Florida.

Multiple Uses for a Simple SQL Procedure

Rebecca Larsen

University of South Florida

Introduction

• The SQL Procedure…Equivalent or advantageous to other SAS data steps and procedures?

• Handy SQL Procedure:CREATE TABLE…

INSERT INTO…SELECT

• Several brief examples of using the above procedure will be presented

SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are registered trademarks or Trademarks of their respective companies.

Overview of Topics

1. Inserting Rows from Multiple Tables into One Table

2. Inserting Rows from Multiple Queries into One Table

3. Changing the Order of Variables in a SAS Data Set

4. Changing Variable Attributes and/or Names in a SAS Data Set

1. Inserting Rows from Multiple Tables into One Table

• Why would you want to insert rows from multiple tables into one table?– Information related to one

customer/patient/client is stored in multiple data sets and linked by a common identifier

– Information related to one issue/problem you would like to study is stored in multiple data sets and needs to be combined

Health Care Claims Example• In this first example, there are multiple source files with

different record layouts– HomeHealth_clm – record layout 1– Hospice_clm – record layout 1– SkilledNursing_clm – record layout 1– Inpatient_clm – record layout 1– Outpatient_clm – record layout 2– PartB_clm – record layout 2

• All files contain information needed to complete your research study

• You need to combine information from all source data sets into one “master” analysis data set

Record Layouts of Health Care Claims Example Data Sets

• Record Layout 1:– person_id

– health_status_cd

– service_category_cd

– race_cd

– procedure_amt*

– procedure_dt*

• Record Layout 2:– person_id

– health_status_cd

– service_category_cd

– race_cd

– service_amt*

– service_dt*

*In this example, source variables have different names in the different record layouts, although they contain analogous information and should be analyzed as one variable in the “master” data set

Create a new, blank table using the following code:

PROC SQL;CREATE TABLE allclaims(person_id char(11),health_status_cd char(2),service_category_cd char(2),race_cd char(1),serv_amt_cu num,service_dt num);

QUIT;

Use a macro to insert values from multiple source files into one data set

/** 1. Define macro **/%MACRO insert(amount, date, file); PROC SQL; INSERT INTO allclaims SELECT person_id, status_cd, service_category_cd, race_cd, &amount., &date. FROM &file.; QUIT;%MEND insert;

(continued)

/** 2. Call macro **/%insert(service_amt, service_dt, HomeHealth_clm);%insert(service_amt, service_dt, Hospice_clm);%insert(service_amt, service_dt, Inpatient_clm); %insert(service_amt, service_dt, SkilledNursing_clm);%insert(procedure_amt, procedure_dt, Outpatient_clm);%insert(procedure_amt, procedure_dt, PartB_clm);

Notice…

• …that different source variables for the original files are inserted into a single variable in the new file– Both variables must be of the same type in the original

file as being created in the new file (i.e. numeric into numeric)

• service_amt (num) vs. procedure_amt (num)• service_dt (num) vs. procedure_dt (num)

• One master data set was created for the purpose of completing analysis on information related to health care claims data from multiple source files

2. Inserting Rows from Multiple Queries into One Table

• Why would you want to insert rows from multiple queries into one table?– You want to create summary records from your

master data set based on different “where clause” conditions in separate queries and insert them into one summary data set

Health Care Claims Example, Part 2

• Continuing with the health care claims example provided above, a summary table with one record per person is desired to be used in statistical analyses, i.e. ANOVA, PROC LOGISTIC, etc.

• You want the summary data set to be created with the following characteristics:– One record per person,– “Race_cd” (5 class levels) to be categorized to

create the new variable, “minority_cd” (2 class levels),

(continued)– Summary variable “mean_cu” to be created as

the mean amount paid per person,– Records with missing service amount to be

excluded,– Persons with more than one value for

minority_cd were excluded (to eliminate persons with indistinguishable minority status)

• For statistical tests with a CLASS variable such as minority status, you want to make sure the classes are mutually exclusive, or you will compromise the validity of your results

First, create the blank summary table using the following code:

PROC SQL;CREATE TABLE minority(person_id char(11), health_status_cd char(2),service_category_cd char(2),minority_cd char(4),mean_cu num);

Next, Insert values from multiple queries into one summary data set (Query 1)

/** Query 1 **/INSERT INTO minority SELECT distinct person_id, health_status_cd, service_category_cd, case when race_cd = ‘1’ then ‘1’ else ‘0’ end as minority_cd, mean(serv_amt_cu) as mean_cu FROM allclaims WHERE serv_amt_cu is not missing and service_category_cd = ‘mh’ and count(distinct minority_cd) = 1 GROUP BY person_id;

Continue inserting values into the summary data set (Query 2)

/** Query 2 **/INSERT INTO minority SELECT distinct person_id, health_status_cd, service_category_cd, case when race_cd = ‘1’ then ‘1’ else ‘0’ end as minority_cd, mean(serv_amt_cu) as mean_cu FROM allclaims WHERE serv_amt_cu is not missing and service_category_cd = ‘ph’ and count(distinct minority_cd) = 1 GROUP BY person_id; QUIT;

Cool Code…• The count function in this WHERE clause is an example of

PROC SQL code that can accomplish in one step, what would take more than one step and several more lines of regular SAS codewhere serv_amt_cu is not missing and service_category_cd = ‘ph’ and count(distinct minority_cd) = 1group by person_id

• The case expression can be used to create a new variable that is a “re-categorization” of the values of another variablecase when race_cd = ‘1’ then ‘1’ else ‘0’end as minority_cd

3. Changing the Order of Variables in a SAS Data Set

• Why would you want to change the order of the variables in your data set?– Perhaps your data step manipulations have

altered the order of the variables from the original, and you need to export the data in the same format you received it

– Or you wish to arrange certain variable types together for coding processes to be run on sequential variables in your data set

Example of Changing the Order of Variables in Your Data Set

• In the following example, oldfile (Table 1) has variables in a certain order that you would like to change, and newfile (Table 2) has variables in the order to which you would like to change

Before changing

Table 1. Order of variables before changing (oldfile)

pin

sex_cd

race_cd

diagnosis_cd

procedure_cd

DOB

enrollment_dt

termination_dt

death_dt

age

Table 2. Order of variables after changing (newfile)

pin

sex_cd

race_cd

DOB

age

diagnosis_cd

procedure_cd

enrollment_dt

termination_dt

death_dt

…After changing

The following code was used to change the variable order:/** Step 1 **/

PROC SQL;CREATE TABLE newfile(pin char(11),sex_cd char(1),race_cd char(1),DOB num,age num,diagnosis_cd char(5),procedure_cd char(5),enrollment_dt num,termination_dt num,death_dt num);

/** Step 2 **/INSERT INTO newfile SELECT pin, sex_cd, race_cd, DOB, age, diagnosis_cd, procedure_cd, enrollment_dt, termination_dt, death_dt FROM oldfile;QUIT;

Remember…

• …the order in which you specify the variables in the INSERT INTO…SELECT statement must be the exact order of the variables that are created in the blank data set, NOT the order of the data set from which they came.

4. Changing Variable Attributes and/or Names in a SAS Data Set • The names of the variables do not have to be

identical in the newly created table as the table from which you are selecting the records

• Other attributes (i.e. formats, informats, labels, character length, etc.) can be easily added or changed

• For example, we will create a new file called, “newfile2”, with different variable names and attributes from the data set in the previous example called, “newfile”

Use the following code to change the name of variables and add a date formats

in your data set:/** Step 1 **/

PROC SQL;CREATE TABLE newfile2(pin_id char(11),sex_cd char(1),race_cd char(1),DOB_dt num format=date9.,age_iv num,diagnosis_cd char(5),procedure_cd char(5),enrollment_dt num format=date9.,termination_dt num format=date9.,death_dt num format=date9.);

/** Step 2 **/INSERT INTO newfile2 SELECT pin, sex_cd, race_cd, DOB, age, diagnosis_cd, procedure_cd, enrollment_dt, termination_dt, death_dt FROM newfile;QUIT;

Notice…

• If all you want to do is change names or attributes of variables in a data set, then PROC DATASETS is a much more efficient tool; however,

• If you have other steps that need to be accomplished by writing a new file, then the ability to change names and attributes is available in PROC SQL

Conclusion

• There are many more applications for the CREATE TABLE…INSERT INTO…SELECT statements in PROC SQL than the few examples shown– Many times it is just another way to do the same thing

– Other times it may cut down the number of sorts, data steps & procedures or lines of code required

– Try it out for yourself…you may find something you will wish you had always known!

• My preferred SQL coding resource is the following user’s guide:– SAS Institute Inc. (2000), SAS® SQL Procedure User’s Guide, Version 8, Cary, NC: SAS Institute Inc.

About the SpeakerSpeaker:

Location of Company:

Telephone:Fax:

Email:

Rebecca Larsen,Assistant in Research

13301 Bruce B. Downs BoulevardMHC2617University of South FloridaTampa, Florida 33612-3807

(813) 974-7206(813) 974-6411

[email protected]

To view the presentation slides online, go to http://psrdc.fmhi.usf.edu/SQL_Savannah_2002.ppt

Multiple Uses for a Simple SQL Procedure Rebecca Larsen University of South Florida.

Documents

Transcript of Multiple Uses for a Simple SQL Procedure Rebecca Larsen University of South Florida.