HRP 223 – 2008

Day 0 – Computing at Stanford and Introduction to SAS

Copyright © 1999-2008 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law.

Administrivia - General

The course website has critical details: www.stanford.edu/class/hrp223/

You will fail the course if you send me a document that contains a virus or other malicious code. There is no forgiveness for this offense and this is not open to debate.

Administrivia – Goals

This course will provide practical solutions to problems that arise before doing analyses as well as the final push toward getting the results.

I will talk about issues like security, finding unruly data, massaging data into a useful format, building datasets of valid data and choosing statistics.

Administrivia – Getting Help

Lamiya Sheikh [email protected] is the TA for the course.

Our office hours will be announced weekly. I will be available for online Q&A at [email protected] or preferably, on the class newsgroup. I will answer questions every morning around dawn.

Things labeled “Assignment”, but not “Homework”, can be done with the help of classmates.

You are strongly encouraged to discuss your problems up until you start writing your answers to the homework problems.

Administrivia – Real Data!

There will be almost no toy data sets in this class. Your solutions will work on huge datasets.

You will use generic (ungrouped) data. The data will be very close to reality. I will not invent any problems.

Because most of the data is “live”, I will introduce small changes to the data to prevent you from beating the authors to press.

Administrivia – Which SAS?

The homework problems will require you to work with SAS. I will be showing you SAS Enterprise Guide, which runs only on Windows.

Getting a Computer

If you want to get a new computer, you can get one at a very good price through Stanford. You can get ideas on what is an acceptable computer here:

www.stanford.edu/dept/itss/ess/adminapps/recommended.html

Free Stanford Tools

You can get access to free software from Stanford by going here: www.stanford.edu/dept/itss/ess/

You must use antivirus software to use a computer these days. The Symantec/Norton Antivirus, which has been used for years, is going away at the end of the month. Upgrade now.

Get the Sophos Scanner

Virus and Worm Issues (3)

Virus scan before you email me anything!

Right click on the file you want to scan and then pick Scan with Sophos Anti-Virus….

Stanford Desktop Tools

This allows you to install and update BigFix, Security Self-Help and Open AFS and other tools.

BigFix automatically checks for important software updates.

Security Self-Help checks and allows you to fix security weaknesses on your machine.

Open AFS lets you have access to your UNIX account like it is just another Windows hard drive.


AFS

You have a website made for you already: www.stanford.edu/~YOUR_SUNET_ID

UNIX stuff

You can use Stanford Desktop Tools to mount your UNIX drive and get stuff on the web quickly with Open AFS:

www.stanford.edu/services/afs/intro/index.html

www.stanford.edu/services/web/howto.leland.html

Do NOT put confidential/HIPAA sensitive stuff out there.

After AFS is Installed

My UNIX Space

Passwords

The Leland system places restrictions on passwords. You should set your passwords on other machines to be just as hard to crack.

www.stanford.edu/services/unix/passwords.html

You can use Stanford’s Security Self-Help Tool, which comes with Stanford Desktop Tools, to check your passwords.

If you do not know how to set or change your password look here:

www.stanford.edu/group/security/securecomputing/setpass.html

Security

Every year I get viruses and worms sent to me unwittingly.

Four years ago the department had half a dozen machines “hacked-into” by an unknown assailant, giving the person total control over the machines.

Every day I get dozens or hundreds of hacker/cracker “probes” looking for weaknesses in my Windows XP machine’s security.

Assume that somebody is always looking over your shoulder on the web and people are reading your email.

Security (2)

The biggest weaknesses in computer security are the legal users of the system:

Walking away from a terminal

Using passwords that are easy to crack

Taking data off of restricted machines

Viruses and Trojan horses will kill you if you let them!

Microsoft’s Critical Mistakes

Microsoft is notorious for producing programs with security problems. The latest operating systems have built-in tools to apply the fixes once Microsoft releases them.

With Windows XP with SP 2 or SP 3, you can easily set your machine to update itself. You should run Windows Update and download and apply all critical security updates often.

Security - Email

Email provides all the confidentiality of a postcard.

Secure your email: There are programs that will scramble your email while it is en route, effectively making it impossible for people to read it without your permission.

Ask your security professional for help.

Security – Unsolicited Email

Spam™, Spam™, Spam™, wonderful Spam™, yes wonderful Spam™

You may get unsolicited commercial solicitations, advertisements, chain letters, or pornography through your Stanford email account.

NEVER respond to these messages, and never use the REMOVE option provided in the email.

NEVER put your email address on a web page.

SPAM Filter and Malicious Emails

You can tell the Stanford mail system to filter your mail and automatically remove things that are probably junk mail. Go to tools.stanford.edu to set your mail to be filtered. Definitely have it remove spam marked with SPAM: #####

A fairly new attack is to embed database access code in the body of an email. When your virus scanner notices this it will treat your entire inbox as if it has a virus in it. This can be very bad if your virus scanner is set to delete all files with viruses.

Back up your work!

Each year, on average, one student in five loses all their work. Plan on your computer being destroyed at the worst possible time this year.

Coffee, computer worm or virus, small child with refrigerator magnet, physical hard drive failure, theft, bicycle crash, etc.

Every day back up your work to more than one location.

Where to Backup

PLEASE use removable media if you have no network access – Floppy disk, CD, DVD, flash media

NEVER back up confidential data (HIPAA sensitive data) to mobile media without talking to security experts first.

How to Backup

You will forget to back up your work. If you can, use a program to do the backup automatically.

I use an inexpensive program called Second Copy 2000 by Centered Software.

www.centered.com

It copies all my work to the department’s server and even keeps the old version of my work.

Talk to your security expert.

Other Tools I Use

I keep a list of useful links here: www.stanford.edu/class/hrp223/2008/usefulLinks.html

UltraEdit

If you work with text files, get UltraEdit and buy the perpetual license.

www.ultraedit.com

UltraCompare

To track changes in code or other text files: www.ultraedit.com/products/ultracompare.html

FileLocator Pro

If you can’t find files on your machine, get FileLocator Pro.

www.mythicsoft.com/default.aspx

MyInfo

If you need to keep track of tons of random facts (like code snippets) get MyInfo

www.milenix.com

Data Management and Analysis

Use the software which has handy support.

SAS, Stata, SPSS and S-Plus (but not R) are fairly user-friendly.

The strengths of each:

R is free. Install the Rcmdr package, then type library(Rcmdr)

S-Plus is wonderful if you are going to invent statistics.

SAS is strong for major data manipulation and database processing.

Use SPSS if you want a clean graphical user interface (GUI) or if you are statistics phobic.

SAS vs. S-Plus

I believe that SAS is the de facto standard for biological, clinical, and medical research in the USA (as well as the rest of the world). R and S-Plus are very popular with the statisticians on campus.

Virtually all pharmaceutical companies use SAS for analysis of clinical trial data for assessment of safety and efficacy of drugs. 

S-Plus’ strengths are in graphics and developing new statistics (and perhaps its object-oriented model). Its weakness is poor usability for non-programmers. However, it is making huge gains in usability.

I find S-Plus relatively difficult to use for data management.

R/S-Plus

If you would like to learn R or S-Plus, I strongly recommend that you go with S-Plus for Windows.

Come talk to me for reference books.

Where can I get SAS?

If you have $60 for the year: www.stanford.edu/services/softwarelic/sas/

If you want to use the computer lab: med.stanford.edu/irt/classrooms/features/computer_labs.html

Which Parts to Install

During the install it will ask you what components to install. DO NOT USE THE DEFAULT ACADEMIC INSTALL. It is bugged and will not give you stuff you need. Check on everything listed on the next slide:

Install these

SAS/ACCESS Interface to DB2

SAS/ACCESS Interface to MySQL

SAS/ACCESS Interface to Netezza

SAS/ACCESS Interface to ODBC

SAS/ACCESS Interface to OLE DB

SAS/ACCESS Interface to ORACLE

SAS/ACCESS Interface to PC Files

SAS/ACCESS Interface to SYBASE

SAS/AF Software

SAS/ASSIST Software

SAS/CONNECT Software

SAS/EIS Software

SAS Bridge for ESRI

SAS/ETS Software

SAS/FSP Software

SAS/GRAPH Software

SAS/IML Software

SAS/INSIGHT Software

SAS/LAB Software

SAS/OR Software

SAS/QC Software

SAS/SECURE

SAS/SHARE

SAS/STAT

SAS/ACCESS Enterprise Miner Client Solution

SAS/Genetics

SAS Text Miner Client

SAS Text Miner for Spanish

SAS

SAS Enterprise Guide

Modern SAS

Enterprise Guide organizes work into projects and uses a flowchart analogy to show what is done.

Enterprise Guide builds code for you and it is very good for building analyses, but data management is still best done with some code.

A Real Project

Things You Do With SAS

Use it as an overpriced calculator … or

Get data into the system.

Get to know your data.

Find subsets of your data.

Perform analyses.

Visualize your results.

Share the information.

All these things can be done by typing code or using point-and-click tools. Some things are best done with code.

Basics

While most people use SAS for processing complex collections of data, it can be used for simple math. The techniques that you use for simple math are also used to make complex changes to data sets of any size.

Basic Math

To do a simple calculation you do the following:

1. Give SAS a formula: 1+1

2. Tell it what to call the results: theAnswer

3. Print the results out: putlog _____

4. Tell it you are done giving it instructions.

5. Tell it to carry out the instructions.

Tell it to create a code object in the flowchart.

Basic Math

You put the instructions together by typing a program into the code window, like this:

data _null_;             /* Don’t bother to store the results. */
  theAnswer = 1 + 1;
  putlog theAnswer;
run;                     /* Run it. */

Basic Math

The log window shows you SAS’s thoughts about your code.

When you make a mistake in your code, the line numbers can point you toward the answer.

The count of how many lines have been submitted

The Answer

Basic Math

If you want to save the results into a table that looks like a spreadsheet, provide the name of the dataset on the line that has the keyword data, like this:

data someData;           /* Save the results in a dataset called someData. */
  theAnswer = 1 + 1;
run;

Viewing the Table

You see the content of a table displayed automatically when it is created or you can double click on it.
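If you prefer code to double clicking, a table can also be listed with a procedure. This is only a minimal sketch: it rebuilds the someData table from the earlier slide and then prints it with proc print, which is a standard SAS procedure but is not the method shown on the slide.

```sas
/* Rebuild the one-row table from the earlier example. */
data someData;
  theAnswer = 1 + 1;
run;

/* List every row and column of the table in the output. */
proc print data = someData;
run;
```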

Datasets

Since the introduction of SAS 7, dataset names can be from 1 to 32 characters. Prior to SAS 7, they had to be 8 characters or less.

The names can begin with a letter or an underscore (i.e., _ ). They can contain letters, numbers or underscores.

Capitalization does not matter to SAS but mixed case can make your names easier to read.

Make your dataset names meaningful. “Demographics” or “demo” are much better names than “d”.
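The naming rules can be tried out directly. The sketch below is illustrative only: the dataset names (Demographics, _scratch, visit_2008, demoCopy) and the age variable are made up for this example.

```sas
/* Hypothetical names chosen only to illustrate the naming rules. */
data Demographics;       /* begins with a letter */
  age = 42;
run;

data _scratch;           /* begins with an underscore */
  age = 42;
run;

data visit_2008;         /* letters, numbers and underscores */
  age = 42;
run;

/* Capitalization does not matter: this reads the Demographics table. */
data demoCopy;
  set DEMOGRAPHICS;
run;
```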

Basic Math

The note in the log which appears after you push the run button tells you that SAS successfully created a new dataset. While I specified the name of the data set as “someData”, SAS uses “work.someData”. This work ‘library’ is just shorthand notation referring to a folder on your hard drive that is emptied and deleted every time you quit SAS. So, the dataset “someData” is stored in the work folder and it will be destroyed when you quit.
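To see that the one-part and two-part names refer to the same table, you can create the dataset with the short name and read it back with the full name. This is a small sketch; copyOfData is a made-up name used only for illustration.

```sas
/* One-level name: SAS stores the table in the work library. */
data someData;
  theAnswer = 1 + 1;
run;

/* Two-level name: the same table, referred to as work.someData. */
data copyOfData;
  set work.someData;
run;
```

Both tables live in the work library, so both disappear when you quit SAS.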

Notes on Notes

The note says that the dataset has 1 observation. That means that the table has just one row. The 1 variable statement means that the table has only one column. SAS datasets can contain millions of observations and can contain tens of thousands of variables.
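A table with more than one row and column makes the note easier to read. The sketch below uses a do loop and an output statement, which have not been introduced yet, to write five rows; the dataset name squares and its variables are assumptions made for this illustration.

```sas
/* Build a table with 5 observations (rows) and 2 variables (columns). */
data squares;
  do x = 1 to 5;
    square = x ** 2;
    output;              /* write one observation per pass through the loop */
  end;
run;
```

After this runs, the note in the log should report 5 observations and 2 variables for work.squares.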

Not So Basic Math

You can use the same trickery to do more complex math:

data mathStuff;
  x = 24;
  square = x ** 2;
  poly = (x**3/3) - (x**2/2) - 6*x - 4;
  putlog square= poly=;
run;

A Calculator with Functions

If you remember your calculator from when you learned trigonometry, you will recall that it had function buttons to do things like calculate a cosine or sine. SAS has those functions and hundreds more. You tell SAS to do a function by typing the code word for the function you want done, followed by some details in parentheses.

Function Example

data _null_;
  someTrigThing = sin(1);
  putlog someTrigThing;
run;

Function Example (2)

You can use variables with functions like this:

data _null_;
  numberOne = 1;
  someTrigThing = sin(numberOne);
  putlog someTrigThing;
run;

Functions

I will introduce you to dozens of functions later. The important thing to remember is that they all work the same way. You type the function name with “arguments” in parentheses. You probably will never need trig functions, but other functions are extremely useful when you are taking statistics classes. Rather than looking up “density functions” in tables, you can get SAS to give you the values.
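As a taste of what replaces a printed table, the sketch below asks SAS for two standard normal values. It uses probnorm and probit, which are SAS functions for the normal distribution; the dataset and variable names are made up for this illustration, and this is only one of several ways to get these numbers.

```sas
data normalValues;
  /* Area under the standard normal curve below 1.96 (about 0.975). */
  areaBelow = probnorm(1.96);

  /* The z value with 97.5% of the curve below it (about 1.96). */
  zCutoff = probit(0.975);

  putlog areaBelow= zCutoff=;
run;
```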

How can you find a function?

Say you need a function to compute some crazy thing like factorial. 5! = 5*4*3*2*1

You can write the math yourself.

data _null_;
  fiveFactorial = 5*4*3*2;
  putlog fiveFactorial=;
run;

How can you find a function? (2)

Or you can look up the function in the SAS online documentation.

OnLineDoc is: support.sas.com/onlinedoc/913/docMainpage.jsp

How can you find a function? (3)

Once you are at the documentation, you can search using keywords. Pick search for words then select all documentation.

In this case I looked for “factorial function” and one of the results was this:

Functions and CALL Routines : FACT Function

That link gave me the syntax for factorial:

Fact(n)

So my code can be simplified to

data _null_;
  fiveFactorial = fact(5);
  putlog fiveFactorial=;
run;

The link also gave me several other related functions that math people seem to obsess over like comb and perm…. You can read what those are at your leisure.
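For the curious, comb and perm follow the same pattern as fact: a function name with arguments in parentheses. This is a brief sketch; the dataset and variable names are invented for the example.

```sas
data counting;
  /* Ways to choose 2 items out of 5 when order does not matter: 10 */
  waysToChoose = comb(5, 2);

  /* Ways to arrange 2 items out of 5 when order matters: 20 */
  waysToArrange = perm(5, 2);

  putlog waysToChoose= waysToArrange=;
run;
```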

SAS Programming

As you will discover, SAS programming involves mastering only five things:

Descriptive comments: notes to you and other programmers

SAS options: where your data is and how pages are formatted

Data steps: manipulate data

Procedures: summarize data

Macro commands: automate repetitive tasks
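The five pieces can appear together in one very short program. The sketch below is only an illustration: the names powers and howMany are hypothetical, the options shown affect page formatting only, and the "where your data is" part is left out because it depends on the folders on your own machine.

```sas
/* 1. Descriptive comment: build a small table of squares and summarize it. */

/* 2. SAS options: leave the date and page number off the output pages. */
options nodate nonumber;

/* 5. Macro command: store a value once so it can be reused below. */
%let howMany = 5;

/* 3. Data step: manipulate data. */
data powers;
  do x = 1 to &howMany;
    square = x ** 2;
    output;
  end;
run;

/* 4. Procedure: summarize data. */
proc means data = powers;
  var square;
run;
```

Changing the value on the %let line changes how many rows are built without touching the rest of the program.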