HRP223 - 2008

HRP223 2008 Copyright © 1999-2008 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law. HRP223 - 2008 Topic 2 – Using EG


Transcript of HRP223 - 2008

Page 1: HRP223 - 2008

HRP223 2008


Topic 2 – Using EG

Page 2: HRP223 - 2008

At this point you can:
Start up a project
Use SAS as a calculator
Set some configuration options
– Remember to work in WORK, rather than SASUSER
Create a library
Import a dataset
– into WORK or your custom library
Subset a dataset
– You can use data steps, or write or point/click your way to SQL

Page 3: HRP223 - 2008

Working on a Project

Set up a library to hold your permanent data. Import data into that library. Look at what you’ve got. Check for bad data. Subset the data to keep the data you want. Make a report.

Page 4: HRP223 - 2008

Make the Library

Tools menu > Assign Library…
Review the code (if you want)
Check the log
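For reference, the Assign Library wizard boils down to a single libname statement. This is a minimal sketch, not the exact code EG writes: the library name day2 matches the one used later in this deck, but the folder path is a placeholder you would replace with your own.

```sas
/* Assign a library named day2 to a folder (the path is a placeholder) */
libname day2 "C:\Projects\HRP223\day2";

/* List the members of the library as a quick check that it was assigned */
proc datasets library=day2;
quit;
```

After running it, check the log for a note saying the libref was assigned.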

Page 5: HRP223 - 2008

Write the Import Code

Where is the dataset node in the flowchart?

The log is good. It is a bug… they forgot to draw the dataset if you use proc import.

Page 6: HRP223 - 2008

You really want to put the source file in the library.
– Tweak the code and link the import node to the library.
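The tweak amounts to pointing the out= option at the library instead of WORK. The sketch below assumes a library called day2 and an Excel file called source.xls; both paths are hypothetical, and the right dbms= value depends on your SAS version and on which file-access products are installed.

```sas
/* Hypothetical folder and file names; change these to match your machine */
libname day2 "C:\Projects\HRP223\day2";

proc import
    datafile = "C:\Projects\HRP223\raw\source.xls"
    out      = day2.source   /* write to the library, not WORK */
    dbms     = xls           /* assumption: the XLS engine is available */
    replace;
run;
```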

Page 7: HRP223 - 2008

Page 8: HRP223 - 2008

Files in a Library

Once a file is in a library, you can access it just like any other file on your computer.

Page 9: HRP223 - 2008

Structure

If you have a dataset on the left margin of a process flow, you will have a problem in your future.
– Put every dataset into a library.
– If your datasets move across machines you just need to change the one library reference path.
Add a note (File > New > Note) with information on the origin of every data file and connect it.
– Include the time, date, and source of the file (email titles help also).

Page 10: HRP223 - 2008

Add a Variable

To add a variable with EG:
– Select the dataset
– Choose Filter and Query… from the Data menu
– Name the query and new dataset
– Select the current variables (drag and drop to select data)
– Click Computed Columns
– Click New, then click Build Expression
– Fill in the expression and click OK
– Select the new variable and give it a good name
– Select the new variable (drag and drop to select data)
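Behind the scenes, a Filter and Query task with a computed column is a proc sql step. The sketch below shows the general shape only; the stand-in dataset, its columns (price, discount), and the made-up values are assumptions for illustration, not the course data.

```sas
/* Small stand-in dataset with made-up values (illustration only) */
data work.tours;
    input tour $ price discount;
    datalines;
FJ12 1200 0.10
PS27 800 0.25
SH43 1500 0.00
;
run;

/* Keep every existing column and add one computed column */
proc sql;
    create table work.tours_plus as
    select t.*,
           t.price * (1 - t.discount) as discounted_price
    from work.tours as t;
quit;
```

The expression after select is what you would type into the Build Expression window, and the name after as is the "good name" you give the new variable.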

Page 11: HRP223 - 2008

Page 12: HRP223 - 2008

Page 13: HRP223 - 2008

Page 14: HRP223 - 2008

Page 15: HRP223 - 2008

Calculate Stuff

Calculate the discounted price and then get some descriptive statistics on the new values.
– Either reopen the previous filter and add in the formula there or just make a new data set by filtering the previously created data set.

Page 16: HRP223 - 2008

Click on the data set to analyze or choose it from the list

Proc Means

Proc Univariate
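If you would rather type than click, the two procedures look roughly like this. It is a sketch under assumptions: the dataset work.discounted, the variable discounted_price, and the values are all made up for illustration.

```sas
/* Stand-in data with made-up values (illustration only) */
data work.discounted;
    input tour $ discounted_price;
    datalines;
FJ12 1080
PS27 600
SH43 1500
;
run;

/* Basic descriptive statistics */
proc means data=work.discounted n mean std min max;
    var discounted_price;
run;

/* More detail: quantiles, extreme values, and a histogram */
proc univariate data=work.discounted;
    var discounted_price;
    histogram discounted_price;
run;
```

Proc means gives a compact summary table; proc univariate produces much more output for the same variable.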

Page 17: HRP223 - 2008

Procs or Menu Items

Use the task list (right side of the screen), organized by task name, to look up the procedure that goes with a menu item. If you are told to use a procedure, you can find the corresponding menu item the same way.

Page 18: HRP223 - 2008

Page 19: HRP223 - 2008

Not enough data for a useful histogram

Be glad you did not need to memorize this stuff.

Page 20: HRP223 - 2008

Looking at Categorical Data

In this source file we have a categorical “tour” variable. What are its values?
Use the Describe > One-Way Frequencies menu option to see the categories.

Drag Tour from the left pane and drop it into the Analysis variables group.

Page 21: HRP223 - 2008

Proc Freak

The procedure that does frequency counts is proc freq (pronounced freak). It is very important to learn because it does the core categorical analysis for basic epidemiological studies. The EG code is:

This could be simplified:

PROC FREQ DATA=day2.source;
  TABLES Tour;
RUN;

Page 22: HRP223 - 2008

The Levels

You have already seen how to subset a dataset using the GUI and SQL.

What if you want to subset into 3 different data sets? You could do a lot of pointing and clicking or write a little program.

Page 23: HRP223 - 2008

Page 24: HRP223 - 2008

Page 25: HRP223 - 2008

That gets you only 1 of 3.

That technique is not fun if you need to split into many subgroups. If you do need many subgroups, use code instead.
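For comparison, one point-and-click subset corresponds to one proc sql step like the sketch below. It assumes the day2 library is already assigned and that day2.source has a character variable tour, as used elsewhere in this deck; the output table name work.fj12 is an arbitrary choice.

```sas
/* One query makes one subset */
proc sql;
    create table work.fj12 as
    select *
    from day2.source
    where tour = "FJ12";
quit;
```

You would need a second and third query, with a different where clause each time, to get the other two tours.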

Page 26: HRP223 - 2008

Splitting in a Data Step

All data steps begin with the data statement. Most have a set statement saying where the data is coming from, and they should end with a run statement.

* A list of what data sets to make;
data fj12 ps27 sh43;
  * based on what file? ;
  set day2.source;
  * Check the value of tour and if TRUE output;
  if tour = "FJ12" then output fj12;
  if tour = "PS27" then output ps27;
  if tour = "SH43" then output sh43;
  return; * This line is optional;
run;

Page 27: HRP223 - 2008

What is a statement?

A statement is a single instruction beginning with a keyword and ending in a semicolon.
You can use white space and new lines to make them easier to read.
– Look back at the proc sql statements you have seen and notice where the semicolons are.
• SQL create table statements are LONG.
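A small illustration of where statements start and stop. It assumes the day2.source dataset and its tour variable from earlier in this deck; the table name work.tour_list is made up.

```sas
/* Three statements crammed onto one line: legal, but hard to read */
proc print data=day2.source; var tour; run;

/* The same three statements, one per line */
proc print data=day2.source;
    var tour;
run;

/* Inside proc sql, the whole create table query is ONE statement.
   It spans four lines and ends at the single semicolon after "tour". */
proc sql;
    create table work.tour_list as
    select distinct tour
    from day2.source
    order by tour;
quit;
```

SAS only cares about the semicolons; the line breaks and indentation are for the human reader.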

Page 28: HRP223 - 2008

Parts of a SAS Dataset

You have seen how to browse a SAS dataset like a spreadsheet. There are two parts of a dataset which you do not see when you browse the data.
– There is a section that acts like a dictionary which has a description of the data set, including among other things, the types of variables (character or numeric) and when the data set was created.
– There is sometimes a section that has “index” information. You can create an index to help speed up processing of huge files.

Page 29: HRP223 - 2008

Seeing the Details with EG

Page 30: HRP223 - 2008

By Position

Knowing the variables’ order can help you do complex things.

Page 31: HRP223 - 2008

If you want to code…

You can see the dictionary of attributes by typing a proc contents step in a code window:

proc contents data=teletubbies; run;

To get the variables in their stored order, use:

proc contents data=teletubbies position; run;

Page 32: HRP223 - 2008

Formats and Labels

Formats and labels change the appearance of data but do not change the values.
– Labels
• Column headings in summaries
• A way to deal with the fact that variables do not have spaces in the names
– Formats
• Add printing niceties like $ or , or Monday
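In code, labels and formats are attached with the label and format statements. This sketch uses a made-up dataset; deposit_date is borrowed from this deck, while amount and all the values are assumptions for illustration.

```sas
/* Stand-in data with made-up values (illustration only) */
data work.deposits;
    input deposit_date :date9. amount;
    datalines;
15JAN2008 1250
03MAR2008 980.5
;
run;

/* The label option tells proc print to show labels as column headings */
proc print data=work.deposits label;
    label  deposit_date = "Deposit Date"
           amount       = "Amount Paid";
    format deposit_date weekdate.     /* day of week plus full date */
           amount       dollar10.2;   /* dollar sign and commas */
run;
```

Because the label and format statements sit inside the proc step, they affect only this printout; the stored values and the dataset itself are unchanged.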

Page 33: HRP223 - 2008

Add Labels

Deposit_date looks bad

Page 34: HRP223 - 2008

Page 35: HRP223 - 2008

Page 36: HRP223 - 2008

Notice dates in Excel are actually the number of days since the start of 1900 (in Windows).

Dates in SAS are the number of days since January 1, 1960.
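You can see the stored numbers for yourself with a data _null_ step that writes to the log. This is a sketch: the Excel serial number below is a hypothetical example, and the offset of 21916 assumes Excel's Windows (1900) date system.

```sas
data _null_;
    sas_zero = '01JAN1960'd;           /* date literal for SAS day zero */
    put sas_zero=;                     /* the raw number SAS stores */
    put sas_zero= date9.;              /* the same number shown with a date format */

    excel_serial = 39448;              /* hypothetical serial number copied from Excel */
    sas_date = excel_serial - 21916;   /* 21916 is the Excel serial for 01JAN1960 */
    put sas_date= date9.;
run;
```

Proc import normally does this conversion for you; the arithmetic only matters when a date arrives as a bare number.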

Page 37: HRP223 - 2008

If you want to permanently change the format, use Filter and Query.

Page 38: HRP223 - 2008

Page 39: HRP223 - 2008

Custom Formats

You can write your own formats easily.
– Say you want to show the size of the rental car needed to get people on a tour.
• <=4 they need a normal car
• 5-7 they need a minivan
• 8+ they need a bus
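Written as code, the rental-car rule could look like the proc format step below. It is a sketch, not the format EG generates: the format name vehicle is made up, and the other= line is an added catch-all for values that fit none of the ranges.

```sas
proc format;
    value vehicle
        low - 4  = "Normal car"
        5 - 7    = "Minivan"
        8 - high = "Bus"
        other    = "Bad data";   /* missing values and anything between the ranges */
run;
```

The keywords low and high stand for the smallest and largest non-missing values, so you do not have to guess the limits.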

Page 40: HRP223 - 2008

Page 41: HRP223 - 2008

Do not forget bad data.

Page 42: HRP223 - 2008

Made but Useless

The format is made but is not associated with any variable.

Page 43: HRP223 - 2008

Page 44: HRP223 - 2008

Make it look good.

Be sure to label the format node in the flowchart and also link it up graphically to show where it is used.

Page 45: HRP223 - 2008

Analysis of formatted data

You can then use the formatted data for a categorical analysis without having to make new variables.
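Here is one way that could look in code. Everything in the sketch is an assumption made so the block runs on its own: the format name vehicle, the stand-in dataset, the variable group_size, and the made-up values.

```sas
/* Define the format here so this block runs on its own */
proc format;
    value vehicle
        low - 4  = "Normal car"
        5 - 7    = "Minivan"
        8 - high = "Bus"
        other    = "Bad data";
run;

/* Stand-in data with made-up group sizes (illustration only) */
data work.groups;
    input tour $ group_size;
    datalines;
FJ12 2
FJ12 6
PS27 9
SH43 3
;
run;

/* The format statement makes proc freq count by category, not by raw number */
proc freq data=work.groups;
    tables group_size;
    format group_size vehicle.;
run;
```

The format statement is what associates the format with the variable. The counts are grouped by formatted value, so no new variable is needed.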

Page 46: HRP223 - 2008

Diabetes Example

Import an Excel file
Describe the data
Calculate BMI
Do a t-test vs. a population BMI of 24.8
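A code version of the last three steps might look like the sketch below. The dataset, variable names, units (kilograms and centimeters), and values are all made up for illustration; in the real exercise the data come from the imported Excel file.

```sas
/* Stand-in for the imported Excel data (made-up values) */
data work.diabetes;
    input id weight_kg height_cm;
    /* BMI is weight in kilograms divided by height in meters squared */
    bmi = weight_kg / (height_cm / 100)**2;
    datalines;
1 82.0 175
2 95.5 168
3 70.2 180
4 88.9 172
5 77.3 165
;
run;

/* Describe the data */
proc means data=work.diabetes n mean std min max;
    var weight_kg height_cm bmi;
run;

/* One-sample t-test of mean BMI against the population value 24.8 */
proc ttest data=work.diabetes h0=24.8;
    var bmi;
run;
```

The h0= option sets the null hypothesis value; without it, proc ttest tests against a mean of zero.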