Working with Data in Windows and Descriptive Statistics. HRP223 – Topic 2, October 5th, 2011

  • Slide 1
  • Working with Data in Windows and Descriptive Statistics. HRP223 Topic 2, October 5th, 2011. Copyright © 1999-2011 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.
  • Slide 2
  • In this lecture:
    • How SAS works in Windows
    • SAS vs. EG files
    • Libraries vs. folders
    • Importing data
    • Subsets and creating new variables
    • Describing data
    • Making better summary tables
  • Slide 3
  • Sources of Data:
    • Small data sets (aka toy data): You may be able to type the data directly into a SAS code file with EG, as in The Little SAS Book for EG.
    • Excel: For small amounts of HIPAA-safe data you can use Excel with validation.
    • Text files with columns of numbers and text: Exports created by databases frequently provide a text file full of data and a program for loading it into SAS (like REDCap). Data from the CDC Wonder database is another example.
    • SAS: Native SAS datasets created by somebody else.
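As a minimal sketch of typing toy data directly into a SAS code file, the DATA step below reads three rows with DATALINES. The dataset name (work.toy) and its columns are made up for illustration and are not from the lecture.

```sas
/* Hypothetical toy data: the dataset name and columns are invented for this sketch. */
data work.toy;
    input id gender $ age;   /* the $ marks gender as a character variable */
    datalines;
1 F 34
2 M 41
3 F 29
;
run;

/* List the rows to confirm they were read as expected. */
proc print data=work.toy;
run;
```

This approach only scales to a handful of rows; anything larger is better kept in Excel, a text file, or a database export.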
  • Slide 4
  • Types of Files (suffix: file type)
    • .pdf: Adobe portable document format
    • .zip: Archives full of compressed data
    • .xls: Excel prior to 2007
    • .xlsx: Excel 2007 and later
    • .csv: Comma separated values (text which Excel likes)
    • .txt: Text files (letters, numbers, and punctuation without formats)
    • .sas: SAS code files
    • .egp: Enterprise Guide projects
    • .sas7bdat: SAS data files
    • .htm or .html: Web pages
    • .css: Cascading style sheets for web pages
  • Slide 5
  • Slide 6
  • SAS and EG files: .sas files are text files full of instructions that a programmer can easily write and/or edit. .egp files are not.
  • Slide 7
  • What is an EGP file? EGP files are actually zip archives (with a .egp suffix instead of .zip) which contain XML text and other text files.
  • Slide 8
  • Searching: Because the contents of .egp files are compressed, the built-in Windows file finder cannot find files by searching for keywords inside the projects. This affects me when I can't remember the file name for a project and want to find it by searching for key words in the code (like the principal investigator's name or the name of the source data file).
  • Slide 9
  • Searching Inside .egp files: File Locator Pro can search inside the .egp files. Use Tools menu > Configuration. Callouts: Add egp here, without the dot. Click here.
  • Slide 10
  • Files in Enterprise Guide: Alternatively, you can save SAS code files outside of the EG project. Most people create EG projects that reference data files that live outside of EG: SAS datasets, Excel files, and text files full of data. Callouts: Converted to SAS format. Native Excel format.
  • Slide 11
  • How SAS EG works (diagram labels: SAS EG; Saved output; SAS; Data (.xls, .sas7bdat, etc.))
  • Slide 12
  • Shortcuts: Windows indicates a shortcut to a file that lives elsewhere with an arrow in the bottom left corner of an icon. EG uses the same symbol to denote a shortcut to a file outside of the project.
  • Slide 13
  • What is in an EGP file? An EG project file (a file with an .egp suffix) contains information and instructions, but it will also have links to a lot of external files. Callouts: Shortcut to a file NOT in the project (shown twice). This is part of the project.
  • Slide 14
  • EG and Code: Most of the time you will point and click to build an analysis, but you can write and store your code instructions to SAS inside of the EG project, or you can create a shortcut to a code file which lives outside of EG. Callouts: Right click and choose New > Program. Look at the process flow. No shortcut icon.
  • Slide 15
  • External SAS files: You can easily save a code file outside of the project by choosing Save Program As from the File menu or by clicking Save or Save As on the program tab (when the code is open). Callout: Shortcut.
  • Slide 16
  • Slide 17
  • Where are SAS data sets stored? While SAS can refer to files using their Windows path, it is easier to type a short name instead of a long path. SAS calls the short names libraries. EG automatically knows about a couple of places where data can be stored:
    • It creates a temporary work folder whenever EG starts.
    • It creates a permanent sasuser folder when EG is installed.
  • Slide 18
  • Where are those folders? Look at the servers list and expand the tree to show Servers > Local > Libraries > WORK. Right click on WORK and choose Properties. If the Server List display is not showing, use the View menu.
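The same information is available from code. As a small sketch, the PATHNAME function returns the Windows folder behind a library name, and %PUT writes it to the log; the wording of the messages is arbitrary.

```sas
/* PATHNAME returns the physical folder that a library name points to. */
%put WORK lives in: %sysfunc(pathname(work));
%put SASUSER lives in: %sysfunc(pathname(sasuser));
```

Run this in a program node and look at the Log tab; the WORK path changes every time a new SAS session starts.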
  • Slide 19
  • Slide 20
  • Importing the Easy Way: The most bulletproof way to import with EG 4.3 is to use the import wizard and save into the Work library.
  • Slide 21
  • Always check this on.
  • Slide 22
  • Double check that it guesses the right Type, especially for dates.
  • Slide 23
  • Check this on. By default you don't see the library or path to the Excel file.
  • Slide 24
  • Libraries: Prior to the version of EG that shipped with SAS 9.3, the default behavior was for EG to save all data into the same folder/library, sasuser. This is a very bad idea. Naive students would end up with every SAS data set in one folder. Anybody using SAS can access that folder, so there are significant HIPAA issues. You can right click on a file and pick Properties to see where it is stored.
  • Slide 25
  • Change the Default File Location: If you are working with an old SAS install, change the default file location to the work library. Do this once per machine.
  • Slide 26
  • Click 1st Click 2x
  • Slide 27
  • Permanent Store: I suggest that you save your data into the temporary work library by default. If you have a huge file which you only want to import once, or if you want to keep a permanent copy of a SAS data file, you will want to set up a permanent library. A library reference is just a fancy way of specifying what folder SAS should use to save the .sas7bdat data files.
  • Slide 28
  • Fix the Registry (Once) then Make a Library: First fix the problematic registry entries that are described in my instructions on installing SAS. If you do not do this and a column from Excel mixes character and numeric values, programs reading the data (including SAS) can drop the cells that have character data without warning. Using Windows, make a folder c:\blah\libraryDemo to hold the data set. Using SAS, make a library to point to the folder where your data should be stored.
  • Slide 29
  • Tell SAS that there is a folder which can hold data by creating a library. This only makes SAS aware of the folder. It does not automatically put stuff into the folder.
  • Slide 30
  • It's just a folder! When the library is created it is just a pointer to a preexisting folder. That folder can contain anything. When you want to use the folder you need to explicitly tell EG to store data in the folder. First rename the node and draw an arrow to indicate where the library is used. These changes are mostly just aesthetic.
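In code, the two steps described above (make SAS aware of the folder, then explicitly store data there) can be sketched as follows. The libref libDemo and the dataset scores are hypothetical names chosen for this sketch; the folder c:\blah\libraryDemo is the one from the slides and must already exist.

```sas
/* LIBNAME only points at a preexisting folder; it does not create or fill it. */
/* libDemo is a hypothetical libref (librefs are limited to 8 characters).     */
libname libDemo "c:\blah\libraryDemo";

/* Hypothetical toy data in the temporary WORK library. */
data work.scores;
    input id score;
    datalines;
1 90
2 85
;
run;

/* Nothing lands in the folder until a step writes there explicitly. */
data libDemo.scores;
    set work.scores;
run;
```

After the last step runs, a file named scores.sas7bdat should be visible in the folder from Windows.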
  • Slide 31
  • Now it looks good but the import is still into work. 1st: rename the node to match the library name. 2nd: add a line to the flowchart connecting the library to the import. It just looks good.
  • Slide 32
  • Find your library here.
  • Slide 33
  • Notice it is in the library. A design feature is that you have to Refresh the library to see the freshly added file. You can see it in Windows.
  • Slide 34
  • Slide 35
  • Playing with Data: Once the data is imported you can add code nodes to the flowchart or use the graphical user interface to tweak the data and do analyses. Callouts: Complex changes. Quick and easy subset and sorting. Select all variables for the new dataset.
  • Slide 36
  • Slide 37
  • Slide 38
  • Push Validate to see the SQL code. Notice the tabs in the output.
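The code that EG shows is a PROC SQL query. The sketch below only resembles what a subset-and-sort query tends to look like; the table names, the age filter, and the toy data are invented for illustration and are not the output of the lecture's project.

```sas
/* Hypothetical toy data so the query has something to read. */
data work.toy;
    input id gender $ age;
    datalines;
1 F 34
2 M 41
3 F 29
;
run;

proc sql;
    /* Keep every column, subset the rows, and sort the result. */
    create table work.query_for_toy as
    select t1.*
    from work.toy t1
    where t1.age >= 30
    order by t1.age;
quit;
```

The t1 prefix is a table alias; it appears in front of each variable name to say which table the variable comes from.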
  • Slide 39
  • Notice Analysis.css hidden in the voodoo. It has the appearance scheme (color, bold, etc.)
  • Slide 40
  • Convert From a Character to a Number: Remember that page I told you to bookmark in OnlineDoc? Hold the control key and type f to bring up the find box.
  • Slide 41
  • Slide 42
  • Callouts: 2nd. 3rd: Click New. 4th: Click Advanced expression. 5th: Click Next.
  • Slide 43
  • Convert to a 4 digit number with the input function: input( t1.score, 4. )
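To see the expression in context, here is a small sketch that applies input( t1.score, 4. ) inside a PROC SQL query. The dataset work.raw, its values, and the new column name scoreNum are assumptions made for this example.

```sas
/* Hypothetical data where score was read in as text. */
data work.raw;
    input id score $;   /* the $ makes score a character variable on purpose */
    datalines;
1 0095
2 100
3 87
;
run;

proc sql;
    create table work.converted as
    select t1.id,
           t1.score,                          /* original character column */
           input(t1.score, 4.) as scoreNum    /* numeric copy, read with the 4. informat */
    from work.raw t1;
quit;
```

The 4. informat tells SAS to read up to four characters as a number, so it is wide enough only for values of four digits or fewer.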
  • Slide 44
  • Before / After. Context-sensitive menus help you describe the data you are browsing.
  • Slide 45
  • Slide 46
  • Descriptive Statistics (callout: drag)
  • Slide 47
  • Slide 48
  • Turn on Higher Quality Graphics: Tools > Options > Tasks > Custom Code
  • Slide 49
  • Slide 50
  • Slide 51
  • This is SAS code that can be cut and pasted into a .sas file and run outside of EG.
  • Slide 52
  • Slide 53
  • Slide 54
  • I like this color scheme.
  • Slide 55
  • Fixing the title is too advanced for now but it is trivial to cut it in Illustrator or to mask it in PowerPoint.
  • Slide 56
  • Clean the Project: 1st: Right click and rename it. 2nd: Right click and rename. 3rd: Right click and link it to the code.
  • Slide 57
  • Slide 58
  • Table 1: Table 1 in a manuscript describes data grouped by something, typically a treatment. Examples: frequency count by gender; means for age.
  • Slide 59
  • Drowning is bad: SCUBA divers practically never drown. Can I find any patterns in who dies? Load the fakeDrowningData Excel file. It is real data based on the CDC's mortality data from 1999-2007: wonder.cdc.gov/controller/datarequest/D53. The actual ages are sampled from the age bins the CDC gives, and the SCUBA rate is simulated.
  • Slide 60
  • For each treatment, Table 1 always has:
    • For continuous data, a measure of central tendency and variability: number of people; mean and standard deviation; median, min, max, 25th and 75th percentiles.
    • For categorical data: frequency counts and percentages.
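The statistics listed above map onto two procedures. This is a minimal sketch, assuming a dataset with a treatment group, a categorical variable (gender), and a continuous variable (age); the dataset work.trial and every value in it are invented.

```sas
/* Hypothetical data: two treatment groups, one categorical and one continuous variable. */
data work.trial;
    input treatment $ gender $ age;
    datalines;
A F 34
A M 41
A F 29
B M 52
B F 47
B M 38
;
run;

/* Continuous data: N, mean, SD, median, min, max, 25th and 75th percentiles by treatment. */
proc means data=work.trial n mean std median min max q1 q3;
    class treatment;
    var age;
run;

/* Categorical data: frequency counts and percentages by treatment. */
proc freq data=work.trial;
    tables treatment*gender;
run;
```

The point-and-click tasks in EG write code along these lines for you; typing it by hand is mainly useful when you want one node to cover several variables.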
  • Slide 61
  • Too Many Nodes. Continuous: You can request lots of tables. Typically people do one node per variable.
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Slide 66
  • Slide 67
  • .M (dot M)
  • Slide 68
  • Add ageFixed
  • Slide 69
  • Now there is a useful dataset. Now the analysis is running on the wrong data. Select the new input data and modify the node to run on the new variable.
  • Slide 70
  • The new variable: N is not the number of observations. The minimum is not -1.
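The slides for this step are mostly screenshots, so the following is only a guess at the idea: it assumes that -1 was used as a code for an unknown age and that ageFixed replaces it with the special missing value .M. The dataset names and values are invented.

```sas
/* Hypothetical data where -1 stands for an unknown age (an assumption, not from the slides). */
data work.ages;
    input id age;
    datalines;
1 34
2 -1
3 52
;
run;

data work.agesFixed;
    set work.ages;
    if age = -1 then ageFixed = .M;   /* .M is a special missing value */
    else ageFixed = age;
run;

/* N counts only non-missing values, so N and the minimum differ between age and ageFixed. */
proc means data=work.agesFixed n nmiss min max;
    var age ageFixed;
run;
```

Because missing values are left out of the statistics, N for ageFixed is smaller than the number of rows and the -1 no longer drags down the minimum.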
  • Slide 71
  • Notice the bug: it lost the 5-year bins. Right click the node and reset it.
  • Slide 72
  • Slide 73
  • Categorical: several variables cross tabulated
  • Slide 74
  • Slide 75
  • Callouts: Exposure. Outcome. Notice the table request.
  • Slide 76
  • Typically I want row, not column, percentages. Watch the code change as you click.
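As a sketch of what the code looks like once only row percentages are requested, the NOCOL and NOPERCENT options on the TABLES statement suppress the column and overall percentages. The variable names exposure and outcome follow the callouts above, while the dataset and the counts are made up.

```sas
/* Hypothetical counts for an exposure by outcome table; the numbers are invented. */
data work.crosstab;
    input exposure $ outcome $ count;
    datalines;
No No 40
No Yes 10
Yes No 30
Yes Yes 20
;
run;

proc freq data=work.crosstab;
    weight count;                                  /* each row stands for count people */
    tables exposure*outcome / nocol nopercent;     /* keep frequencies and row percentages */
run;
```

Putting the exposure first in the table request makes it the row variable, so each row percentage reads as the share of that exposure group with each outcome.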
  • Slide 77
  • Women don't drown while diving, and there is no evidence of a SCUBA effect. You can rinse and repeat building this table, but then you need to copy and paste a LOT for your paper.
  • Slide 78
  • Bug with Reports: If your table has missing data you may get an "Unable to read SAS Report file" error. Use the Tools > Options menu to turn on the procedure titles in the output.
  • Slide 79
  • Categorical and continuous pretty tables: I am going to want to count people. The easiest way to do this is to add a new column. Every person should have the value 1; then I can count or sum that variable. I am going to write a program to do this. Add a programming node to the project by right clicking on the process flow and choosing New > Program.
  • Slide 80
  • Make a new dataset called analysisFinal. Base the new dataset on everything in the analysis dataset. Make a new variable, call it one, and have it contain the number one. What library will the new dataset live in? Is the variable one character or numeric? Rename and link the program, then use Describe > Summary Tables.
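A minimal sketch of such a program follows. The names analysis, analysisFinal, and one come from the slide; the toy rows, the gender and age columns, and the PROC TABULATE request are assumptions added so that the sketch runs on its own.

```sas
/* Hypothetical stand-in for the analysis dataset from the slides. */
data analysis;
    input gender $ age;
    datalines;
F 34
M 41
F 29
;
run;

data analysisFinal;    /* a one-level name is stored in WORK by default */
    set analysis;      /* keep everything from the analysis dataset */
    one = 1;           /* numeric, because the 1 is not in quotes */
run;

/* Summing the column of ones counts people; age gets a mean and SD. */
proc tabulate data=analysisFinal;
    class gender;
    var one age;
    table gender, one*sum age*(mean std);
run;
```

The comments answer the two questions on the slide: with no library prefix the dataset lands in WORK, and assigning an unquoted 1 makes the variable numeric.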
  • Slide 81
  • Slide 82
  • Slide 83
  • Slide 84
  • Add Race then State.
  • Slide 85
  • Slide 86
  • This is too confusing with row and column percentages.
  • Slide 87
  • Slide 88
  • Slide 89
  • Slide 90
  • Slide 91
  • It is too advanced for now but you can do fancy formatting like using colors for big or impossible values/patterns. You can save this as HTML and open it in Excel to do final touches.