Computing at Stanford and Introduction to SAS

HRP223 – Topic 0

Sept 21st, 2009

Copyright © 1999-2009 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.


Objectives

• Administrivia

• Software tools at Stanford
– Security at Stanford

• Software tools not endorsed by Stanford

• Data

• SAS

General

• The course website has critical details:

www.stanford.edu/class/hrp223/

• If you can, please print the slides just before the start of class.

Administrivia

Goals

• This course will provide practical solutions to problems that arise before doing analyses as well as the final push toward getting the results.

• I will talk about issues like finding unruly data, massaging data into a useful format, building datasets of valid data and choosing statistics.


Getting Help

• Kameelah Abdullah [email protected] is the TA for the course.

• Her office hours will be announced weekly. I will be available for online Q&A at [email protected] or, preferably, on the class newsgroup. I will answer questions every morning around dawn. If you post to the newsgroup and do not hear back quickly, please email me.

• Things labeled “Assignment”, but not “Homework”, can be done with the help of classmates.

• You are strongly encouraged to discuss your problems up until you start writing your answers to the homework problems.


Preliminaries

• I assume you know how to use Windows or Mac OS.

• For this class you need access to a machine with Windows XP Pro or Vista Business/Ultimate. XP Home Edition or Vista Home Edition will not work with the software in this class.

• I use: XP Pro, Vista Ultimate, and XP Pro running in Parallels on the Mac.


Getting a Computer

• If you want to get a new computer, you can get one at a very good price through Stanford. You can get ideas on what is an acceptable computer here:

www.stanford.edu/dept/itss/ess/adminapps/recommended.html

• You want to have XP Pro or the Business or Ultimate version of Vista.


Free Stanford Tools

• You can get access to free software from Stanford by going here:

www.stanford.edu/dept/itss/ess/

• You must use antivirus software.

• You will fail the course if you send me a document that contains a virus or other malicious code. There is no forgiveness for this offense and this is not open to debate.

Stanford Software

Get the Sophos Scanner

Virus and Worm Issues (3)

• Virus scan before you email me anything!

• Right click on the file you want to scan and then pick Scan with Sophos Anti-Virus

• Sophos keeps itself updated constantly.


– Sophos Anti-Virus (for both Windows & Mac OS)

• Watches for suspicious things and stops them until you authorize the software

If your quarantine has a file, get help.

You can submit suspicious files.

Stanford Desktop Tools

• This allows you to install and update BigFix, Security Self-Help, Open AFS, and other tools.
– BigFix automatically checks for important software updates.
– Security Self-Help checks and allows you to fix security weaknesses on your machine.
– Open AFS lets you have access to your UNIX account like it is just another Windows hard drive.

Stanford Desktop Tools

Your UNIX Account

• You have a website made for you already:
– www.stanford.edu/~YOUR_SUNET_ID

• UNIX stuff

www.stanford.edu/services/afs/intro/index.html

www.stanford.edu/services/web/howto.leland.html
– You can use Stanford Desktop Tools to mount your UNIX drive just like another hard drive. I get stuff on the web quickly with Open AFS.
– If you do not want AFS, you can also use SecureFX, which you can get from ESS, or just go to afs.stanford.edu.
– Do NOT put confidential/HIPAA sensitive stuff out there.

After AFS is Installed

My UNIX Space

SecureFX

Passwords

• The Leland system places restrictions on passwords. You should set your passwords on other machines to be just as hard to crack.

www.stanford.edu/services/unix/passwords.html

• You can use Stanford’s Security Self-Help Tool, which comes with Stanford Desktop Tools, to check your passwords.

• If you do not know how to set or change your password, look here:

www.stanford.edu/group/security/securecomputing/setpass.html

Security

General Security

• The biggest weaknesses in computer security are the legal users of the system.
– Walking away from a terminal
– Using passwords that are easy to crack
– Taking data off of restricted machines
– Viruses and Trojan horses will kill you if you let them!

Email

• Email provides all the confidentiality of a postcard.

• If you are sending HIPAA sensitive information you can secure your email:

www.stanford.edu/services/secureemail/


Unsolicited Email

• Spam™, Spam™, Spam™, wonderful Spam™, yes wonderful Spam™

• You may get unsolicited commercial solicitations, advertisements, chain letters, or pornography through your Stanford email account.
– NEVER respond to these messages, and never use the REMOVE option provided in the email.
– NEVER put your email address on a web page.

• At webmail.stanford.edu you can choose the Preferences tab and Mail Filters tab to automatically sack repeat offenders.


Back up your work!

• Each year, on average, one student in five loses all their work. Plan on your computer being destroyed at the worst possible time this year.
– Coffee, computer worm or virus, small child with refrigerator magnet, physical hard drive failure, theft, bicycle crash, etc.

• Every day, back up your work to more than one location.

Where to Backup

• PLEASE use removable media if you have no network access
– Floppy disk, CD, DVD, flash media

• NEVER back up or share confidential data (HIPAA sensitive protected health information) on mobile media without talking to security experts first.

• Ask your tech support person for recommendations.

How to Backup

• You will forget to back up your work. If you can, use a program to do the backup automatically.

• I use an inexpensive program called Second Copy 2000 by Centered Software.

www.centered.com

• It copies all my work to the department’s server and even keeps the old version of my work.

• Talk to your security expert.

Encrypted USB drives

• USB drives (also called thumb drives) are a very convenient way to keep backups and allow you to move your data around.

• However, they are very easy to lose! NEVER store unencrypted, restricted data on a USB drive.

• You can encrypt at the file level (Excel, WinZip) – OK.

• You can encrypt the whole drive (PGP Disk, TrueCrypt) – Better.

• You can have a hardware encrypted USB drive – BEST!
– There are many manufacturers; however, most are Windows only.
– IronKey supports both Windows and Mac and is highly recommended.

Data Management and Analysis Tools of the Trade

• Containers to hold data
– Microsoft Excel
– REDCap

• Analysis tools
– SAS with Enterprise Guide
– R with Rcmdr

Other Software

Excel

• is not a good place for HIPAA sensitive (PHI) material

• makes it easy to enter bad data

• can be a huge headache to import


REDCap

• is a good place for HIPAA sensitive (PHI) material

• makes it hard to enter bad data

• is mostly painless to import for analysis

SAS 9.2 TS2

SAS is an old programming language where you type commands and run a bunch of things at once.


Enterprise Guide 4.2

EG is a newish programming environment where you type commands or point and click.


R 2.9

R is a modern programming language with user-hostile help files…

R Commander 1.4

Rcmdr is a friendly, but incomplete, graphical user interface (GUI) for R.


Getting SAS

• If you have a machine with XP Pro or Vista Business/Ultimate and more than 3 GB of extra hard drive space, you can get SAS for $65 per year. Place the order here:

www.stanford.edu/services/softwarelic/sas/order/index.fft
– There is a digital download that is HUGE (3 GB, not MB). If you have a wired connection on campus, consider it. Otherwise, ask me for the DVDs.

• The instructions for installing it can be found here:

www.stanford.edu/class/hrp223/2009/install/

Updating SAS

• Make sure to patch your version of SAS with all Alert status patches.

ftp.sas.com/techsup/download/hotfix/hotfix.html

• Also patch Enterprise Guide 4.2.

None yet…

• Sign up for email notification of new patches (called TSNEWS-L):

support.sas.com/techsup/news/tsnews.html


SAS for Free on Campus

• If you don’t mind working in a public place, SAS is in the M202 lounge.

med.stanford.edu/irt/classrooms/features/computer_labs.html


Other Tools I Regularly Use

• File manipulation
– UltraEdit
– UltraCompare
– Tinn-R (a great editor for R)

• Info Management
– FileLocator Pro
– MyInfo

UltraEdit

• If you work with text files, get UltraEdit and buy the perpetual license.

www.ultraedit.com


UltraCompare

• A tool to track changes in code or other text files

www.ultraedit.com/products/ultracompare.html

FileLocator Pro

• If you can’t find files on your machine, consider FileLocator Pro.

www.mythicsoft.com/default.aspx


MyInfo

• If you need to keep track of tons of random facts (like code snippets) consider MyInfo.

www.milenix.com


What is Data?

• Stuff that …
– will make you famous or cry
– you want to pull from the electronic medical record
– the information you will need to store if it is not in the medical record

Data

Structured vs. Unstructured

• Unstructured data
– Text like dictations, operation notes, data entry comments
– Difficult to process

• Structured data
– Afford the ability to build ontologies
– Dates
– Pick lists (multiple choice)
– Relatively easy to process

Structuring Biomedical Data

• RxNORM for drug ingredients / brand names

• ICD-9 for billing diagnostic and procedure codes
– Fairly coarse but nicely hierarchical

• ICD-O for detailed cancer pathology

• CPT for procedures
– No hierarchical structure, difficult to search

• SNOMED-CT for general purpose clinical terms
– Hierarchical, detailed and vast but with some gaps

What is structured data?

• All pieces of information that you collect and calculate as part of a study are data. Every person’s response to a questionnaire is called a data point.

• There are two fundamentally different types of data: numeric and character.
– Numeric data is always … numeric. Information that you could want to do math on is numeric data.
– Character data is alphanumeric. It includes the obvious things like names and addresses, but it also includes numbers that you should not do math on.

• Some systems, like R, make finer distinctions and let you set data so they are forced to be factors.
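The numeric versus character distinction can be sketched in a short SAS program. This is a minimal illustration, not course material: the data set name work.people, the variable names, and the two made-up records are all assumptions. The ZIP code is read as character because nobody should do math on it, and storing it as text also keeps its leading zero.

```sas
/* Hypothetical example: data set name, variables, and values are made up. */
data work.people;
   length name $ 20 zip $ 5;   /* character variables */
   input name $ zip $ age;     /* age has no $, so it is numeric */
   ageNextYear = age + 1;      /* math is fine on numeric data */
   datalines;
Ann 02134 34
Raj 94305 51
;
run;

/* List each variable with its type (Char or Num). */
proc contents data=work.people;
run;
```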


What is data coding?

• A question such as, “What is your current age in years?” is going to generate numeric data.

• A question such as, “At what age did you first contract a sexually transmitted disease?” is going to generate numeric data ….

But you are going to need to allow for the possibility that somebody has never contracted a sexually transmitted disease.

… and you always need to allow for people who never knew or do not remember information or who may be dishonest in their answers.


What is data coding? (2)

• When you have a question that generates numeric data and your subject’s response is not a “real number” you can code a bogus value.
– “Not applicable” can be coded as age -1000000.
– “Do not know” can be coded as -2000000.

• The better way to deal with this problem is to use the value “NULL.”
– SAS allows you to code 27 special types of NULL (._ and .A through .Z) in addition to the ordinary one (.).
– Null values make your job easier when you try to do math on the values.
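Here is a rough sketch of why null values beat bogus values when you do math. The data set work.std, the variable names, and the four records are hypothetical, and the choice of .N for "not applicable" and .D for "do not know" is just one possible convention.

```sas
/* Hypothetical data: two real ages and two bogus sentinel codes. */
data work.std;
   input id ageCoded;
   /* Recode the sentinel values to special missing values. */
   if ageCoded = -1000000 then ageClean = .N;        /* not applicable */
   else if ageCoded = -2000000 then ageClean = .D;   /* do not know */
   else ageClean = ageCoded;
   datalines;
1 19
2 -1000000
3 23
4 -2000000
;
run;

/* The mean of ageCoded is dragged far below zero by the sentinels. */
/* The mean of ageClean uses only the two real ages (19 and 23).    */
proc means data=work.std n mean;
   var ageCoded ageClean;
run;
```

With the bogus values left in, every summary statistic needs a filter to stay honest. With missing values, SAS procedures leave those records out of the calculation on their own.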


Missing Data

• SAS represents missing character data as a blank (in code, a pair of quotes with nothing or only a space between them) and missing numbers are shown as a single period (.).

• You can also use .A, .B, etc. to code for missing numbers but you can’t enter them directly.
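A small sketch of how these missing values look inside a program. The variable names are made up for illustration. The MISSING function returns 1 for any kind of missing value, character or numeric, and the SUM function skips missing values.

```sas
data _null_;
   length city $ 10;
   city   = ' ';    /* missing character value: a blank */
   weight = .;      /* ordinary missing numeric value */
   height = .A;     /* special missing numeric value */
   cityIsMissing   = missing(city);
   weightIsMissing = missing(weight);
   heightIsMissing = missing(height);
   total = sum(weight, height, 10);   /* SUM ignores the missing values */
   putlog cityIsMissing= weightIsMissing= heightIsMissing= total=;
run;
```

The log should show all three flags equal to 1 and total=10.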


What is data coding? (3)

• Questions that generate alphanumeric data are always more complex than questions that generate numeric data.

• “Where were you born?” can be coded as a string of letters from a fill-in-the-blank question or coded as letters or numbers from a multiple choice format question.
– Do not use null in fill-in-the-blanks.
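One way to handle the multiple choice version is to store a number and attach labels with PROC FORMAT. The sketch below is only an illustration: the format name birthplaceFmt, the three categories, and the data set work.born are hypothetical.

```sas
/* Hypothetical coding scheme for a multiple choice birthplace question. */
proc format;
   value birthplaceFmt
      1 = 'California'
      2 = 'Other US state'
      3 = 'Outside the US';
run;

data work.born;
   input id birthplace;
   format birthplace birthplaceFmt.;   /* show labels instead of codes */
   datalines;
1 1
2 3
3 2
;
run;

/* Frequency table that prints the labels, not the numbers. */
proc freq data=work.born;
   tables birthplace;
run;
```

Storing the code and displaying the label keeps data entry constrained to valid choices while the output stays readable.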


Typical Tasks

• Importing data

• Cleaning

• Making a subset

• Numeric and graphical summaries

• Analyses with graphics

• Summary reports

or

• Doing simple math
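A few of these tasks can be sketched with sashelp.class, a small sample data set that ships with SAS. This is only an assumed illustration of the workflow: the output data set name work.teens and the age cutoff are made up.

```sas
/* Making a subset: keep students aged 13 or older. */
data work.teens;
   set sashelp.class;
   where age >= 13;
run;

/* Numeric summary of two variables. */
proc means data=work.teens n mean std min max;
   var height weight;
run;

/* Graphical summary: a scatter plot of weight against height. */
proc sgplot data=work.teens;
   scatter x=height y=weight;
run;
```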


Basics

• While most people use SAS for processing complex collections of data, it can be used for simple math. The techniques that you use for simple math are also used to make complex changes to any size data sets.

• I hope this stuff will make your lives easier in statistics classes…

SAS

Using EG for Math

A data set is shown in the flowchart. Its contents are displayed in the programming window pane. You can see it stored in the temporary “work library” by browsing the Server List.

The Log tab gives you feedback on what SAS did.


No Need for a Data set

For a simple calculation you do not need to make a dataset to hold a single number. You can have the number show up in the log window.

1. Give SAS a formula.
1+1

2. Tell it what to call the results.
theAnswer

3. Print the results out.
putlog theAnswer =

4. Tell it you are done giving it instructions.

Use short meaningful names that do not include spaces, punctuation characters, or leading numbers.

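Basic Math

• You put the instructions together by typing a program into the code window, like this:

   data _null_;             /* _null_ means no data set is stored */
      theAnswer = 1 + 1;    /* the formula and the name for the result */
      putlog theAnswer =;   /* print the result in the log */
   run;                     /* done giving instructions */

• Run it.

Don’t bother to store the results.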
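The same kind of assignment statement works on a whole data set. The sketch below first does slightly more involved arithmetic in the log, then applies a formula to every row of sashelp.class (a sample data set that ships with SAS). The output name work.classBmi is made up, and the BMI formula assumes height is in inches and weight is in pounds.

```sas
/* Simple math, result only in the log. */
data _null_;
   theAnswer = (3 + 4) * 2 ** 2;   /* ** is exponentiation */
   root = sqrt(16);
   putlog theAnswer= root=;
run;

/* Same technique on a data set: one new value per row. */
data work.classBmi;
   set sashelp.class;
   bmi = 703 * weight / height ** 2;   /* assumes inches and pounds */
run;
```

The first step should write theAnswer=28 root=4 to the log. The second step stores its results because it names a data set instead of using _null_.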


The count of how many lines have been submitted

The Answer
