Computing at Stanford and Introduction to SAS HRP223 – Topic 0 Sept 23 rd, 2013 Copyright ©...

Post on 24-Dec-2015


Computing at Stanford and Introduction to SAS

HRP223 – Topic 0, Sept 23rd, 2013

Copyright © 1999-2013 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.

Plan for Diving Molokini

0) Talk to a statistician
1) Outline tables and sketch graphics
2) Collect structured data
3) Do reproducible research
4) Finish on Day 0
5) Visualize Everything

Plan for Prospective Trials

0) Talk to a Statistician Early

• I hate doing post mortems
• I will ask you about …
  – quantifying the endpoint
  – meaningful differences
  – the number of people you can actually assess
  – the next study

1) Outline Tables and Sketch Graphics

Draw a paper’s tables and figures before day 0 of the study.

2) Collect Structured Data

3) Infrastructure for Reproducible Research

• Web compendium with final data

• Happy medium
  – nearly raw data
  – programs/scripts
  – description of the exact version of the software

• The complete computing environment
  – raw data
  – programs/scripts to clean the data
  – programs/scripts to analyze the data
  – exact programming environment that did the analysis
  – ability to push “run”
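One piece of the "happy medium" is recording the exact software version alongside the results. A minimal sketch, assuming a standard SAS 9.x session, writes that description to the log using automatic macro variables; the wording of the messages is only illustrative.

```sas
/* Write the software and platform details to the log so they travel with the results */
%put SAS version: &sysvlong;
%put Operating system: &sysscp &sysscpl;
%put Session started on &sysdate9 at &systime;

/* List the installed SAS products and their versions in the log */
proc product_status;
run;
```

Saving the log next to the programs and the nearly raw data gives a later reader the version details without having to ask.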

4) Before Day 0 ….

• Case report forms with outer bounds
• Realistic, fake patients
• Write code to:
  – make a warning report
  – make the graphics
  – make the tables
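The fake-patient and warning-report steps can be sketched in SAS as follows. Everything here is hypothetical: the variable names, the seed, the distributions, and the outer bounds (18 to 100 years, 70 to 220 mmHg) are assumptions chosen only to illustrate the idea, not values from the course.

```sas
/* Simulate 20 fake patients; the distributions are deliberately wide so some values break the bounds */
data fakePatients;
    call streaminit(223);                      /* fixed seed so the fake data are reproducible */
    do id = 1 to 20;
        age = round(rand('normal', 55, 25));   /* hypothetical age in years */
        sbp = round(rand('normal', 125, 40));  /* hypothetical systolic blood pressure */
        output;
    end;
run;

/* Warning report: one row per value outside the (assumed) outer bounds on the case report form */
/* Missing values sort below every number in SAS, so they are flagged by the lower bound too */
data warnings;
    set fakePatients;
    length problem $ 40;
    if age < 18 or age > 100 then do;
        problem = 'age outside 18-100';
        output;
    end;
    if sbp < 70 or sbp > 220 then do;
        problem = 'systolic BP outside 70-220';
        output;
    end;
run;

proc print data=warnings noobs;
run;
```

Writing this before day 0 means the same program can be pointed at the real data as soon as the first patients arrive.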

5) Visualize Everything

• Traditional plots
  – Bar charts
  – Histograms
  – Boxplots

• Sets of categorical variables
  – Jitter plots
  – Sunflower plots
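Several of these plot types can be sketched with PROC SGPLOT. The example below assumes the SASHELP.CLASS sample data set that ships with SAS and a release that supports the JITTER option on the SCATTER statement (SAS 9.4); sunflower plots are not shown because SGPLOT has no dedicated statement for them.

```sas
/* Bar chart of a categorical variable */
proc sgplot data=sashelp.class;
    vbar sex;
run;

/* Histogram of a numeric variable */
proc sgplot data=sashelp.class;
    histogram height;
run;

/* Boxplots of height, one box per sex */
proc sgplot data=sashelp.class;
    vbox height / category=sex;
run;

/* Jitter plot: overlapping points are nudged apart so each one is visible */
proc sgplot data=sashelp.class;
    scatter x=sex y=age / jitter;
run;
```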

Objectives for HRP 223

• Administrivia
• Software tools at Stanford
  – Security at Stanford
• Software tools not endorsed by Stanford
• Data
• SAS

General

• The course website has critical details: www.stanford.edu/class/hrp223/

• If you can, please print the slides just before the start of class.

Administrivia

Goals

• This course will provide practical solutions to problems that arise before doing analyses as well as the final push toward getting the results.

• I will talk about issues like finding unruly data, massaging data into a useful format, building datasets of valid data and choosing statistics.


Getting Help

• Mike Hurley (mphurley@stanford.edu) is the TA for the course. His office hours will be by appointment. I will be available for online Q&A at balise@stanford.edu or, preferably, on the class forum on Coursework: https://coursework.stanford.edu/portal/site/F13-HRP-223-01

• I will answer questions every morning around dawn. If you post to the forum and do not hear back quickly, please email me.

• Things labeled “Assignment”, but not “Homework”, can be done with the help of classmates.

• You are strongly encouraged to discuss your problems up until you start writing your answers to the homework problems.


Preliminaries

• I assume you know how to use Windows or Mac OS.
• For this class you need access to a machine with:
  – Windows XP Pro or Vista Business/Ultimate
  – Windows 7/8 Professional/Business/Ultimate

• XP Home Edition and Vista Home Edition will not work. Windows 7 Home Premium may work with the software in this class.

• I use: Windows 7 Ultimate, and Windows 7 running in Parallels on the Mac.


Getting a Computer

• If you want to get a new computer, you can get one at a very good price through Stanford. You can get ideas on what is an acceptable computer here:

itservices.stanford.edu/service/helpdesk/recommended

• You want 64-bit Windows 7 Professional/Enterprise/Ultimate.


Use Lane

This is useful!

http://lane.stanford.edu/help/cool-tools-proxyBookmarklet.html

This too!

Bookmark at Lane.

The outside book is now bookmarked on all my machines.

Not All Books are Indexed

• Stanford has many online tech books which are not indexed at Lane or the main campus index, searchworks: searchworks.stanford.edu

Free Stanford Tools

• You can get access to free software from Stanford by going here: https://itservices.stanford.edu/service/ess

• You must use antivirus software.

• You will fail the course if you send me a document that contains a virus or other malicious code. There is no forgiveness for this offense and this is not open to debate.

Stanford Software

Get the Sophos Scanner

Virus and Worm Issues

• Virus scan before you email me anything!

• Right click on the file you want to scan and then pick Scan with Sophos Anti-Virus

• Sophos keeps itself updated constantly.


• Sophos Anti-Virus (for both Windows & Mac OS)
  – Watches for suspicious things and stops them until you authorize the software


If your quarantine contains a file, get help.

You can submit suspicious files

Stanford Desktop Tools

• This allows you to install and update BigFix, Security Self-Help, Open AFS and other tools.
  – BigFix automatically checks for important software updates.
  – Security Self-Help checks for security weaknesses on your machine and allows you to fix them.
  – Open AFS lets you have access to your UNIX account as if it were just another Windows hard drive.

Stanford Desktop Tools

Your UNIX Account

• You have a website made for you already:– www.stanford.edu/~YOUR_SUNET_ID

• UNIX stuff: https://itservices.stanford.edu/service/afs

– If you do not want AFS, you can also use SecureFX, which you can get from ESS, or just go to afs.stanford.edu

– You can use Stanford Desktop Tools to mount your UNIX drive just like another hard drive. I get stuff on the web quickly with Open AFS.

– Do NOT put confidential/HIPAA sensitive stuff out there.


afs.stanford.edu is the easy way to move files to your UNIX space.

Use Your Website

Use AFS and your Website

Mount your drive, then you can put stuff in the WWW folder!

Install OpenAFS

My UNIX Space

After AFS is Installed

SecureFX

Secure AFS

• You can make a space that can hold PHI and be shared by anybody with a SUNet ID.
1. Set up the workgroup that will serve as your access control list: http://workgroup.stanford.edu
2. Request the Secure AFS space: https://tools.stanford.edu/cgi-bin/secure-group-request

To access it you need the OpenAFS client installed: http://www.stanford.edu/service/openafs

as well as Kerberos installed (Windows): https://itservices.stanford.edu/service/ess/pc/kfw

or configured (Mac): https://itservices.stanford.edu/service/ess/mac/kfm


Passwords

• The Leland system places restrictions on passwords. You should set your passwords on other machines to be just as hard to crack. https://itservices.stanford.edu/service/unixcomputing/unix/passwords

• You can use Stanford’s Security Self-Help Tool, which comes with Stanford Desktop Tools, to check your passwords.

Security

Protect Your Data

Two-Step Authentication!

https://accounts.stanford.edu/

• Click Manage then turn on Two-Step Auth

Data Management and Analysis

Tools of the Trade

• Containers to hold data
  – Microsoft Excel
  – REDCap

• Analysis tools
  – SAS with Enterprise Guide
  – R with Rcmdr or Deducer

Other Software

Excel

• is not a good place for HIPAA sensitive (PHI) material

• makes it easy to enter bad data

• can be a huge headache to import

REDCap

• is a good place for HIPAA sensitive (PHI) material
• makes it hard to enter bad data
• is mostly painless to import for analysis

SAS 9.4

SAS is an old programming language where you type commands and run a bunch of things at once.

Enterprise Guide 6.1

EG is a newish programming environment where you type commands or point and click.

R 2.15.3 http://cran.cnr.berkeley.edu/bin/windows/base/old/2.15.3

R is a modern programming language with user hostile help files….

RStudio http://www.rstudio.org/

RStudio is an Integrated Development Environment (IDE) for R.

R with R Commander

Rcmdr is a friendly, but incomplete, graphical user interface (GUI) for R.


R with Deducer

Deducer is another friendly, but incomplete, graphical user interface (GUI) for R.

Getting SAS

• If you have a machine with XP, Vista or Windows 7 Pro, Business or Ultimate and more than 30 Gig of extra hard drive space, you can get SAS for $45 per year. Place the order here: https://itservices.stanford.edu/service/softwarelic/sas
  – There is a digital download that is HUGE (12+ Gig, not Meg). If you have a wired connection on campus, use it.

• The instructions for installing it can be found on the class website:


SAS for Free on Campus

• If you don’t mind working in a public place, SAS is in the Lane library.


Other Tools I Regularly Use

• File manipulation
  – NotePad++
  – UltraEdit
  – UltraCompare

• Info Management
  – FileLocator Pro

NotePad++

• Excellent free text editor

UltraEdit

• If you work with huge text files, get UltraEdit and buy the perpetual license.

www.ultraedit.com


UltraCompare

• A tool to track changes in code or other text files

www.ultraedit.com/products/ultracompare.html

FileLocator Pro

• If you can’t find files on your machine, consider FileLocator Pro.

www.mythicsoft.com/default.aspx


What is Data?

• Stuff that …
  – will make you famous or cry
  – you want to pull from the electronic medical record
  – you will need to store if it is not in the medical record

Data

Structured vs. Unstructured

• Unstructured data
  – Text like dictations, operation notes, data entry comments
  – Difficult to process

• Structured data
  – Affords the ability to build ontologies
  – Dates
  – Pick lists (multiple choice)
  – Relatively easy to process

Structuring Biomedical Data

• RxNORM for drug ingredients / brand names
• ICD-9 for billing diagnostic and procedure codes
  – Fairly coarse but nicely hierarchical
• ICD-O for detailed cancer pathology
• CPT for procedures
  – No hierarchical structure, difficult to search
• SNOMED-CT for general purpose clinical terms
  – Hierarchical, detailed and vast but with some gaps

What is structured data?

• All pieces of information that you collect and calculate as part of a study are data. Every person’s response to a questionnaire is called a data point.

• There are two fundamentally different types of data: numeric and character.
  – Numeric data is always … numeric. Information that you could want to do math on is numeric data.
  – Character data is alphanumeric. It includes the obvious things like names and addresses, but it also includes numbers that you should not do math on.

• Some systems, like R, make finer distinctions and let you set data so they are forced to be factors.
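The numeric versus character distinction shows up as soon as data are read into SAS. A minimal sketch with made-up people: the variable names and values are hypothetical, and the ZIP code is read as character (the `$` in the INPUT statement) because nobody should do math on it.

```sas
data people;
    length name $ 12 zip $ 5;   /* character variables with room for the full values */
    input name $ zip $ age;     /* $ marks character variables; age is numeric */
    datalines;
Ann 94305 34
Raj 02139 51
;
run;

/* Lists each variable with its type (Char or Num) and length */
proc contents data=people;
run;

proc print data=people;
run;
```

Stored as character, the ZIP code 02139 keeps its leading zero; read as a number it would become 2139.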


What is data coding?

• A question such as, “What is your current age in years?” is going to generate numeric data.

• A question such as, “At what age did you first contract a sexually transmitted disease?” is going to generate numeric data ….

But you are going to need to allow for the possibility that somebody has never contracted a sexually transmitted disease.

… and you always need to allow for people who never knew or do not remember information or who may be dishonest in their answers.


What is data coding? (2)

• When you have a question that generates numeric data and your subject’s response is not a “real number” you can code a bogus value.
  – “Not applicable” can be coded as age –1000000.
  – “Do not know” can be coded as –2000000.

• The better way to deal with this problem is to use the value “NULL.”
  – SAS calls these missing values. It allows you to code 27 special missing values (._ and .A through .Z) in addition to the ordinary missing value (.).
  – Null values make your job easier when you try to do math on the values.
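A minimal sketch of why missing values are easier to work with than bogus codes. The data set, the variable names, and the choice of .N for "not applicable" and .D for "do not know" are hypothetical; the two bogus codes are the ones named above.

```sas
data firstSTD;
    input id ageAtFirstSTD;
    /* Recode the two bogus values into special missing values */
    if ageAtFirstSTD = -1000000 then ageRecoded = .N;       /* not applicable */
    else if ageAtFirstSTD = -2000000 then ageRecoded = .D;  /* do not know */
    else ageRecoded = ageAtFirstSTD;
    datalines;
1 19
2 -1000000
3 24
4 -2000000
;
run;

/* Compare the summary statistics of the raw and recoded columns */
proc means data=firstSTD n nmiss mean;
    var ageAtFirstSTD ageRecoded;
run;
```

The bogus codes drag the mean of the raw column far below zero, while the recoded column averages only the two real ages and reports the other two rows as missing.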

Missing Data

• SAS represents missing character data as a blank, written in code as a pair of quotes with nothing (or a space) between them. A missing number is stored as a special value that is displayed as a single period (.).

• You can also use .A, .B, etc. to code for missing numbers but you can’t enter them directly.
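One way to get letter-coded missing values into a numeric variable when reading raw data is the MISSING statement. This is a sketch with hypothetical data, assuming that A stands for "not applicable" and B for "do not know"; it illustrates one reading of the point above and is not necessarily the approach used in the course.

```sas
data ages;
    missing A B;             /* treat the letters A and B in numeric fields as .A and .B */
    input id ageAtFirstSTD;
    datalines;
1 23
2 A
3 B
4 .
5 31
;
run;

/* The three kinds of missing value are tabulated separately when the MISSING option is used */
proc freq data=ages;
    tables ageAtFirstSTD / missing;
run;

/* Statistics skip every kind of missing value */
proc means data=ages n nmiss mean;
    var ageAtFirstSTD;
run;
```

Without the MISSING statement, the letters A and B in a numeric field would be treated as invalid data and stored as the ordinary missing value.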


What is data coding? (3)

• Questions that generate alphanumeric data are always more complex to code than questions that generate numeric data.

• “Where were you born?” can be coded as a string of letters from a fill-in-the-blank question or coded as letters or numbers from a multiple choice format question.
  – Do not use null in fill-in-the-blanks.
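When a multiple choice question is stored as numbers, a format can attach the labels for reporting. A minimal sketch: the format name `birthplaceF`, the three categories, and the data are all hypothetical.

```sas
/* Map the numeric codes to readable labels */
proc format;
    value birthplaceF
        1 = 'California'
        2 = 'Other US state'
        3 = 'Outside the US';
run;

data births;
    input id birthplace;
    format birthplace birthplaceF.;   /* display the label, store the number */
    datalines;
1 1
2 3
3 2
4 1
;
run;

/* Frequencies are reported with the labels rather than the raw codes */
proc freq data=births;
    tables birthplace;
run;
```

Storing the code and displaying the label keeps data entry constrained to a short pick list while the reports stay readable.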

Typical Tasks

• Importing data
• Cleaning
• Making a subset
• Numeric and graphical summaries
• Analyses with graphics
• Summary reports

or

• Doing simple math
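A few of these tasks can be sketched end to end with the SASHELP.CLASS sample data set that ships with SAS. Importing is skipped because it depends on a real file, and the age cutoff and plausible height range (in inches) are assumptions made only for illustration.

```sas
/* Cleaning and making a subset: keep ages 13 and up, blank out implausible heights */
data teens;
    set sashelp.class;
    where age >= 13;
    if height < 40 or height > 80 then height = .;
run;

/* Numeric summary by sex */
proc means data=teens n mean std min max;
    class sex;
    var height weight;
run;

/* Graphical summary: boxplots of height, one box per sex */
proc sgplot data=teens;
    vbox height / category=sex;
run;
```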

Basics

• While most people use SAS for processing complex collections of data, it can be used for simple math. The techniques that you use for simple math are also used to make complex changes to any size data sets.

• I hope this stuff will make your lives easier in statistics classes…

SAS

Using EG for Math

A data set is shown in the flowchart. Its contents are displayed in the programming window pane. You can see it stored in the temporary “work library” by browsing the Server List.

Make a temporary dataset to hold the answer.

The Log tab gives you feedback on what SAS did.
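A minimal sketch of a program that makes a temporary dataset to hold an answer. The dataset name `answer` is hypothetical; the variable name `theAnswer` matches the one used later in these slides.

```sas
/* Create a one-row dataset in the temporary WORK library */
data work.answer;
    theAnswer = 1 + 1;
run;

/* Show the contents of the dataset in the results */
proc print data=work.answer;
run;
```

After it runs, the log reports how many observations and variables the dataset has, and the dataset itself disappears when the SAS session ends.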


No Need for a Data Set

For a simple calculation you do not need to make a dataset to hold a single number. You can have the number show up in the log window.

1. Give SAS a formula: 1 + 1

2. Tell it what to call the results: theAnswer

3. Print the results out: putlog theAnswer =

4. Tell it you are done giving it instructions.

Use short meaningful names that do not include spaces, punctuation characters, or leading numbers.


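Basic Math

• You put the instructions together by typing a program into the code window, like this:

data _null_;               /* do the work without creating a dataset */
    theAnswer = 1 + 1;     /* the formula and the name for its result */
    putlog theAnswer =;    /* write theAnswer=2 to the log */
run;                       /* done giving instructions */

• Run it.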

Don’t bother to store the results in a dataset.
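The same pattern works for any other simple calculation. A short sketch with made-up numbers: the variable names and values are hypothetical, `**` is the SAS exponent operator, and `sqrt` is a built-in function.

```sas
data _null_;
    weightKg = 70;
    heightM  = 1.75;
    bmi      = weightKg / (heightM ** 2);   /* ** raises to a power */
    root     = sqrt(16);                    /* built-in square root function */
    putlog bmi= root=;                      /* write both results to the log */
run;
```

Each `name=` in the PUTLOG statement prints the variable name followed by its value, so several results can be checked in one line of the log.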


The count of how many lines have been submitted

The Answer


Don’t panic….

• The help that ships with SAS is good.

• It is its own program hidden inside the SAS folder off the Windows start button.

Search for functions and call routines by category

Final Administrivia

• Please save a table for the people who are officially enrolled (or are taking the class for deferred credit).

• Bring a laptop with SAS if possible.
• Grades (pass/fail only)
  – Pass 4 of 4 homework assignments for 3 units
  – Pass 3 of 4 homework assignments for 2 units