Cristina Murray-Krezan Research Assistant Professor of ...james/STAT579-F18/intro_to_sas.pdf ·...

Introduction to SAS® Cristina Murray-Krezan Research Assistant Professor of Internal Medicine Biostatistician, CTSC [email protected] 20 August 2018


Introduction to SAS®

Cristina Murray-Krezan

Research Assistant Professor of Internal Medicine

Biostatistician, CTSC

[email protected]

20 August 2018


What is SAS®?

• “Statistical Analysis System”, developed at NC State from 1966 until 1976 (when SAS Institute was founded) for agricultural data analysis.

• A consortium of eight universities with major research funding from the USDA realized the importance of such software. They obtained a grant from NIH to further develop the software, and SAS was born.

• Widely used in many disciplines including statistics, health sciences, business, and economics.


SAS vs. Other Software

• SAS is primarily command-driven rather than menu-driven.

• Its flexibility comes from using the SAS language to write programs.

• Other software you may use:
  – SPSS
  – Stata
  – Minitab
  – Matlab


Components of SAS Programs

• DATA steps, where you can:
  – Read in data
  – Manipulate data

• PROC steps, where you can:
  – Analyze the data
  – Create tables of output
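A minimal sketch of a complete program, using only base SAS: the DATA step reads in a few values and derives a new variable, and the PROC step prints the result. The data set name demo and its values are made up for illustration.

```sas
/* DATA step: read in data and manipulate it. */
data demo;
   input patid $ age;
   age_months = age * 12;   /* derive a new variable from an existing one */
   datalines;
A1001 27
A1002 32
;
run;

/* PROC step: print the data set to the output. */
proc print data=demo;
run;
```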


The SAS Environment

• Five windows:
  1. Editor – where you write your program (commands).
  2. Log – a record of each submitted program, with notes, warnings, and errors that tell you whether it ran successfully.
  3. Output – display of your statistical results.
  4. Explorer – a view of your libraries and the data sets in them.
  5. Results – a listing of the output from each submitted PROC step.


Where Your Data Will Live

• Library
  – A library is created to refer to permanent data sets (such as your Excel file or other permanent data set).
  – You specify the directory, and SAS then knows where to get the data or where to put permanent data sets.
  – Use the LIBNAME statement to name your library and specify the directory.


Types of Data Sets

• Temporary data sets
  – Stored in the “Work” library.
  – Created while running your program.
  – Cease to exist when you close SAS.

• Permanent data sets
  – Stored in a library that you define.
  – Continue to exist after SAS is closed.
  – A data set that you read into SAS can be almost any file type.
  – A data set that you export out of SAS can be written to almost any file type.


The LIBNAME Statement

• Example syntax:

    libname sasdata "C:\cristina\Pharm547";

• Notes:
  – Your library name (called a “libref” in SAS syntax) must be ≤ 8 characters in length.
  – All SAS statements must end with “;”.
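A short sketch of assigning a libref and confirming it in the log. The folder is the example path above and serves only as a placeholder; replace it with a directory that exists on your computer, and check the log for any note saying the library does not exist.

```sas
/* Placeholder folder: replace with a directory on your computer. */
libname sasdata "C:\cristina\Pharm547";

/* The LIBREF function returns 0 when the libref is assigned. */
%put libref(sasdata) returned %sysfunc(libref(sasdata));

/* Write the library's engine and physical path to the log. */
libname sasdata list;
```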


Ways to Read Your Data into SAS

• Import Wizard from the drop-down menu:
  – Go to File > Import Data…
  – Select your data file type.
  – Select your data set.
  – Give your temporary data set a name.
  – SAS can generate the code used to perform the import; just select the location where the code should be saved. The generated code uses PROC IMPORT. I recommend doing this.
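A sketch of the kind of PROC IMPORT code the wizard produces, here for an Excel workbook. The file name dataset.xlsx and the sheet name Sheet1 are hypothetical, and DBMS=XLSX requires a SAS/ACCESS Interface to PC Files license; for a .csv file, DBMS=CSV works with base SAS.

```sas
/* Hypothetical workbook and sheet name; adjust to your own file. */
proc import datafile="C:\cristina\Pharm547\dataset.xlsx"
            out=work.mydata   /* temporary data set created by the import */
            dbms=xlsx         /* type of file being read */
            replace;          /* overwrite work.mydata if it already exists */
   sheet="Sheet1";            /* worksheet to read */
   getnames=yes;              /* first row holds the variable names */
run;
```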


Ways to Read Your Data into SAS (continued)

• For a very small dataset, or test data, you can input the data in the DATA step using the “datalines” statement (a.k.a. “cards”).

• Example:

    data mydata;
       input patid $ age gender $;
       datalines;
    A1001 27 F
    A1002 32 M
    A1003 29 M
    A1004 29 F
    ;
    run;


Ways to Read Your Data into SAS (continued)

• External data sets
  – In practice, you will most likely be reading Excel or Microsoft Access files into SAS.
  – Use the Import Wizard, PROC IMPORT, or the INFILE and INPUT statements.
  – Example:

    data mydata;
       infile "C:\cristina\Pharm547\dataset.csv" dlm=",";
       input patid $ age gender $;
    run;
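If the .csv file has a header row, or some values are missing or enclosed in quotation marks, a slightly more robust version of the example above might look like this. It assumes, hypothetically, that the first line of the file holds column names and that patient IDs are at most 10 characters long.

```sas
data mydata;
   /* Explicit lengths for character variables (the default is 8). */
   length patid $ 10 gender $ 1;
   /* DSD: consecutive commas mean a missing value; quotes around values are removed. */
   /* FIRSTOBS=2: skip the assumed header row. TRUNCOVER: allow short lines. */
   infile "C:\cristina\Pharm547\dataset.csv" dlm="," dsd firstobs=2 truncover;
   input patid $ age gender $;
run;
```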


Ways to Read Your Data into SAS (continued)

• Large data sets obtained from national databases/registries very often come with programs you can use to read in the data to SAS.

• BRFSS: you can use the following files to create a permanent SAS data set:
  – SASOUT11_LLCP.SAS (this program converts the data from ASCII to SAS7BDAT format)
  – LLCP2011.ASC (the actual data, in ASCII format)
  – Formas11.sas (this formats the data and can attach labels to the variable names)


From a Temporary Data Set to a Permanent Data Set

• All of the previous examples, except BRFSS, created temporary data sets (which will not exist after SAS is closed).

• Create a permanent data set for BP_Example, which will be stored in the directory assigned to your library (in this case, “sasdata”):

    data sasdata.bp_example;
       set bp_example;
    run;
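Putting the pieces together, a sketch of the whole round trip might look like this. The folder path is a placeholder, and the variables sbp and dbp and their values in bp_example are invented for illustration.

```sas
/* Placeholder folder for the permanent library. */
libname sasdata "C:\cristina\Pharm547";

/* Temporary data set: stored in WORK and deleted when SAS closes. */
data bp_example;
   input patid $ sbp dbp;   /* hypothetical systolic/diastolic readings */
   datalines;
A1001 120 80
A1002 135 85
;
run;

/* Permanent copy: saved as bp_example.sas7bdat in the sasdata folder. */
data sasdata.bp_example;
   set bp_example;
run;
```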


Vice Versa: From a Permanent Data Set to a Temporary Data Set

• Create a temporary (or “working”) copy of the BRFSS data set. The data will then exist both in the “sasdata” library and in the “Work” library:

    data brfss;
       set sasdata.brfss2001;
    run;


Accessing Your Data in SAS

• To access a temporary data set, refer to it in the DATA step without a library name in front.

• SAS stores temporary data sets in the library “Work”.

• You can refer to a temporary data set as “Work.dataset”, but SAS assumes the “Work.” prefix unless you specify a different library, so you don’t have to add it.


Accessing Your Data in SAS (continued)

• Example:

    data brfss2;
       set brfss;
    run;

• SAS reads this as:

    data work.brfss2;
       set work.brfss;
    run;


The DATA Step

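• A DATA step that reads an existing data set uses the following syntax:

    data <new dataset name>;
       set <dataset name>;
    run;

• NOTE:
  – Every statement ends with a “;”.
  – Every step ends with a “run;” (a few interactive procedures, such as PROC SQL, end with “quit;” instead).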


Things You Can Do with the DATA Step

• Create new variables.

• Change the variable type
  – e.g., from numeric to character or vice versa.

• Drop, keep, or rename variables.

• Output to a new temporary or permanent data set.

• Format the data.
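A sketch that does several of these things in one DATA step, using the small patient data set from the datalines example. The new variable names age_months, age_char, and id_num are made up for illustration.

```sas
/* Small test data set (same layout as the datalines example). */
data mydata;
   input patid $ age gender $;
   datalines;
A1001 27 F
A1002 32 M
A1003 29 M
A1004 29 F
;
run;

data mydata2;
   set mydata;
   length age_char $ 3;
   age_months = age * 12;                     /* create a new variable */
   age_char   = left(put(age, 3.));           /* numeric -> character */
   id_num     = input(substr(patid, 2), 4.);  /* character -> numeric ("A1001" -> 1001) */
   rename gender = sex;                       /* rename a variable */
   drop patid;                                /* drop a variable from the output */
   format id_num z5.;                         /* display id_num with leading zeros */
run;
```

PUT converts a number to text using a format, and INPUT converts text to a number using an informat; the original variable keeps its type, so a new variable is created for the converted value.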


The PROC Step

• DATA steps are used to read and modify data, whereas PROC steps are used to analyze data.

• All PROC steps use the following basic syntax:

    proc <procname> data=<dataset>;
    run;


Commonly Used PROC Steps

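• CONTENTS – lists the contents of your data set, such as all the variables, whether they are character or numeric, their assigned formats, etc.

• SORT – sorts your data by the variable(s) that you specify.

• SUMMARY – computes basic summary statistics for your data, such as n, means, and standard deviations. Unlike PROC MEANS, it prints results only if you add the PRINT option; it is mainly used with an OUTPUT statement to save the statistics to a data set.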
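A sketch of these three procedures on SASHELP.CLASS, a small data set (19 students with Name, Sex, Age, Height, and Weight) that ships with SAS. The output data set names class_sorted and class_stats are made up for illustration.

```sas
/* List variables, types, lengths, and formats. */
proc contents data=sashelp.class;
run;

/* Sort into a new temporary data set rather than overwriting the original. */
proc sort data=sashelp.class out=class_sorted;
   by sex age;
run;

/* PRINT displays the statistics; OUTPUT also saves the means to a data set. */
proc summary data=class_sorted print n mean std;
   var height weight;
   output out=class_stats mean=mean_height mean_weight;
run;
```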


Commonly Used PROC Steps (continued)

• FREQ – creates counts of observations for each value of a categorical variable and contingency tables (2x2 or larger). It also calculates associated statistics (e.g., chi-square).

• MEANS – calculates means, CIs, etc. of continuous variables and associated statistics.

• TTEST – conducts t-tests, e.g., a two-sample t-test comparing the mean of a continuous variable between two groups.

• REG – performs simple or multiple linear regression.
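A sketch of each procedure on data sets that ship with SAS: SASHELP.CLASS (student heights and weights) and SASHELP.HEART (a larger study data set that includes the categorical variables Sex and Status). The choice of variables is only for illustration.

```sas
/* FREQ: one-way counts, then a two-way table with a chi-square test. */
proc freq data=sashelp.heart;
   tables sex;
   tables sex*status / chisq;
run;

/* MEANS: n, mean, standard deviation, and 95% confidence limits for the mean. */
proc means data=sashelp.class n mean std clm;
   var height weight;
run;

/* TTEST: compare mean height between the two levels of sex. */
proc ttest data=sashelp.class;
   class sex;
   var height;
run;

/* REG: multiple linear regression; REG is interactive, so end it with QUIT. */
proc reg data=sashelp.class;
   model weight = height age;
run;
quit;
```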


Many PROCS Use the Following Statements

• by – performs the analysis separately for each group, such as calculating the mean age by gender. The data must first be sorted by the BY variable(s), e.g., with PROC SORT.

• class – tells SAS to treat the variables listed in the class statement as categorical.

• var – tells SAS on which variables to perform the requested calculations.

• output – saves the data set that SAS creates in the background, which may contain calculations of interest (e.g., means), so you can use it later.
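A sketch contrasting BY and CLASS for the mean-age-by-gender example, using SASHELP.CLASS as stand-in data. The data set names class_by_sex and mean_age_by_sex are made up for illustration.

```sas
/* BY-group processing needs the data sorted by the BY variable first. */
proc sort data=sashelp.class out=class_by_sex;
   by sex;
run;

/* BY: a separate analysis (and table) for each sex. */
proc means data=class_by_sex mean;
   by sex;
   var age;
run;

/* CLASS: groups by sex without sorting first. NOPRINT suppresses the table,
   OUTPUT saves the means, and NWAY keeps only the per-sex rows. */
proc means data=sashelp.class noprint nway;
   class sex;
   var age;
   output out=mean_age_by_sex mean=mean_age;
run;
```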


More about the PROC Steps

• There are many specific statements for each PROC step; they are not all the same, nor are they always consistent.

• Don’t forget that each statement must end with a semicolon.


Outputting Permanent Data Sets

• You may want to create a new permanent data set from your original.

• For example, you may want a subset of variables from the BRFSS data set for your project.

• You can use the DATA step:

    data sasdata.mydata;
       set mydata;
    run;
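To save only a subset of variables, a KEEP= option on the input data set can be added. Since BRFSS variable names are not given here, this sketch uses SASHELP.CLASS as a stand-in, and the folder path is a placeholder.

```sas
/* Placeholder folder for the permanent library. */
libname sasdata "C:\cristina\Pharm547";

/* KEEP= reads only the listed variables; the result is saved permanently. */
data sasdata.mydata;
   set sashelp.class(keep=name sex age);
run;
```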


Outputting Permanent Data Sets (continued)

• Use PROC EXPORT (similar to PROC IMPORT).

• Use the Export Wizard in the drop-down menu under “File”.

• NOTE:
  – The DATA step only outputs a SAS data set (in the way I’ve shown you).
  – PROC EXPORT or the Export Wizard can output to just about any file type.
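A sketch of PROC EXPORT writing a SAS data set to a .csv file, which base SAS supports without extra licenses. The output path and file name class.csv are placeholders, and SASHELP.CLASS stands in for your own data.

```sas
/* Write a data set to a .csv file; the output path is a placeholder. */
proc export data=sashelp.class
            outfile="C:\cristina\Pharm547\class.csv"
            dbms=csv
            replace;   /* overwrite the file if it already exists */
run;
```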


A Few More Things about SAS before You Jump in…

• SAS Help is your friend!!

• Access by either clicking on the book with the question mark or on the Help link and selecting “SAS Help and Documentation”.


A Few More Things about SAS before You Jump in… (continued)

• The documentation contains almost everything (and often more) that you may want to know, such as all the statements and syntax particular to a given PROC. It also provides detailed discussions about the statistical procedures it uses and how they are implemented.

• It contains a plethora of information, though it may be a bit terse for some.


Good SAS Resources

• UCLA’s Statistical Computing website: https://stats.idre.ucla.edu/

• Delwiche & Slaughter, The Little SAS Book: A Primer, 5th Ed. (2012).

• Cody & Smith, Applied Statistics and the SAS Programming Language, 5th Ed. (2005).

• The internet!


Now you are ready to program!