1 How to Navigate the Guide To navigate this SAS Guide, use the PageDown and PageUp buttons on the keyboard. A copy of this PowerPoint document can be downloaded from http://www.biostat.ku.dk/~lts/varians_regression/sasguide.ppt
  • date post

    21-Dec-2015


  • Slide 1
  • 1 How to Navigate the Guide To navigate this SAS Guide, use the PageDown and PageUp buttons on the keyboard. A copy of this PowerPoint document can be downloaded from http://www.biostat.ku.dk/~lts/varians_regression/sasguide.ppt
  • Slide 2
  • 2 Preface This is The Beginners Guide To SAS. The document was originally written by Anna Johansson, MEP, Stockholm. It has been lightly edited by Peter Dalgaard and Lene Theil Skovgaard for the Ph.D. course on SAS at the Faculty of Health Sciences, University of Copenhagen, May 2002, and later by LTS for the Ph.D. Course in Analysis of Variance and Regression.
  • Slide 3
  • 3 Introduction What is SAS? SAS is a software package for managing large amounts of data and performing statistical analyses. It was created in the late 1960s at the Department of Statistics at North Carolina State University. Today SAS is developed and marketed by SAS Institute Inc., with its head office in Cary, North Carolina, U.S.A.
  • Slide 4
  • 4 Introduction (cont.) SAS in Denmark The Danish subdivision of SAS Institute provides consulting and a wide range of courses. It is located in Copenhagen. SAS Institute A/S, Købmagergade 7-9, 1150 Kbh. K. Tel: 70 28 28 70 Fax: 70 28 29 91 Email: [email protected]
  • Slide 5
  • 5 Introduction (cont.) The SAS System The SAS System is mainly used for -Data Management (about 80% of all users) -Statistical Analysis (about 20% of all users) The power of SAS lies in its ability to manage large data sets. It is fast and has many statistical and non-statistical features. The disadvantage of SAS is its steep learning curve: it takes quite a bit of effort to get started. User-friendly interfaces do exist, though.
  • Slide 6
  • 6 Introduction (cont.) Starting SAS in the course room: -Move the mouse (or turn on the computer) -The login is kursusxx -The password changes -Choose START, followed by STATISTIK and SAS 8.2
  • Slide 7
  • 7 Introduction (cont.) Getting Started A very good start is to enter the SAS Online Training. Choose in the menu Help + Getting Started with the SAS Software, then click on the book.
  • Slide 8
  • 8 Introduction (cont.) SAS Files If your data is not yet in a SAS data set, you access the raw data by creating a SAS data set from it. Once you have made the SAS data set, you use SAS programs to analyse, manage and/or present the data. SAS data sets can be permanent or temporary. A special library called WORK is created on start-up and deleted on exit.
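The difference between temporary and permanent data sets can be sketched as follows. This is only an illustration: the folder path and the data set names scratch and saved are hypothetical, and the libref course is borrowed from the examples later in the guide.

```sas
/* Hypothetical folder: replace with an existing directory on your machine */
libname course 'C:\sasdata';

/* Temporary: stored in the WORK library and deleted when SAS exits */
data work.scratch;
    x = 1;
run;

/* Permanent: stored as a file in the folder behind the COURSE libref */
data course.saved;
    set work.scratch;
run;
```

After restarting SAS and reissuing the LIBNAME statement, course.saved should still be available, whereas work.scratch is gone.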
  • Slide 9
  • 9 Introduction (cont.) SAS Programming SAS programming works in two steps: Data Step 1. reads data from file 2. makes transformations and adds new variables 3. creates SAS Data Set Proc Step 4. uses the SAS Data Set 5. produces the information we want, such as tables, statistics, graphs, web pages
  • Slide 10
  • 10 Introduction (cont.) Data and Proc Steps Example of a SAS program: Data Step: data work.main; set work.original; age=1997-birthyr; bmi=weight/(height*height); run; Proc Steps: proc print data=work.main; var id age bmi; run; proc means data=work.main; var age bmi; run;
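To try the program above without the course data, one possible self-contained sketch is shown here. The three observations in work.original are invented purely for illustration, and height is assumed to be recorded in metres.

```sas
/* Hypothetical input data: the guide's work.original is not shown */
data work.original;
    input id birthyr weight height;
    datalines;
1 1954 70 1.75
2 1956 82 1.80
3 1962 58 1.62
;
run;

/* Data step: read the data set and add new variables */
data work.main;
    set work.original;
    age = 1997 - birthyr;
    bmi = weight / (height*height);
run;

/* Proc steps: use the data set to produce a listing and summary statistics */
proc print data=work.main;
    var id age bmi;
run;

proc means data=work.main;
    var age bmi;
run;
```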
  • Slide 11
  • 11 Introduction (cont.) SAS Modules The SAS system is made up of several modules, each used for different purposes. This Guide deals only with the SAS BASE and the GRAPH modules, giving knowledge on basic data management and simple statistical analyses. Other modules are SAS/Stat (statistical analyses), SAS/Access (data base applications), SAS/Graph, SAS/Assist (menu-driven info system), SAS/FSP (data entry and retrieval), SAS/Connect (remote submit), etc.
  • Slide 12
  • 12 Introduction (cont.) SAS at Biostat Dept. We primarily use SAS on a Unix server, whereas these notes assume that the programs are run locally on a PC. The basic programming is the same regardless of what platform you use. This is one of the big advantages of SAS. We do tend to prefer running SAS non-interactively, though.
  • Slide 13
  • 13 The SAS Environment Windows The main feature of SAS is its division of the main window into two halves. The left part is a navigator of SAS libraries and Results (from the Output window). The right part is divided into three separate windows: -Program window or Enhanced Editor -Log window -Output window
  • Slide 14
  • 14 The SAS Environment (cont.) Windows The log and output windows are always opened by default when you start SAS (although they may be hidden behind each other). The program window and the Enhanced Editor are two different windows but they are used for the same purpose, i.e. writing code and executing it. One of them will open by default. Other windows are also available and are opened on request (use View), for instance the Graphics window.
  • Slide 15
  • 15 The SAS Environment (cont.) Windows (The program window is a remnant of the older SAS version 6. The Enhanced Editor is a new feature of version 8, and is more user-friendly, since it colours the code and works more like an ordinary text editor.)
  • Slide 16
  • 16 The SAS Environment (cont.) Windows To check which windows are open, choose Window in the menu. At the bottom there is a list of open windows. The active window is indicated by a check mark. A star * after the window name indicates that the file has not been saved since its latest alteration. If you are missing any of the windows (Enhanced Editor, Log, Output), you can open it by choosing in the menu View + window-name
  • Slide 17
  • 17 The SAS Environment (cont.) Windows You switch between the windows by choosing Window + ENHANCED EDITOR Window + OUTPUT Window + LOG in the menu.
  • Slide 18
  • 18 The SAS Environment (cont.) Windows The window location on the screen can be changed by choosing Window + Tile Window + Cascade or by pulling the lower right corner of the window with the mouse. When you exit SAS, the window setting will be kept for the next session (unless someone else...).
  • Slide 19
  • 19 The SAS Environment (cont.) Enhanced Editor / Program Window In the Enhanced Editor you write the SAS programs. The programs tell SAS to produce the data sets, tables, statistics, etc. A program consists of data steps and proc steps. A SAS program is executed (submitted) by choosing Run + Submit in the menu (or by clicking on the Running Man icon, fourth from the right in the menu).
  • Slide 20
  • 20 The SAS Environment (cont.) Output and Log Windows The result of a program execution is printed to the Output window. There you will find the prints, tables and reports, etc. A log file is printed to the Log window. The log file contains information about the execution, whether it was successful or not. It usually points out your mistakes with warning and error messages so that you can correct them.
  • Slide 21
  • 21 The SAS Environment (cont.) Example: SAS Log 65 proc gplot data=work.influnce; 66 plot di*pred / vaxis=axis1 haxis=axis1; ERROR: Variable DI not found. NOTE: The previous statement has been deleted. 67 run; Make a habit of checking the Log window after every execution. Even if SAS has accepted and executed the program, you may have made a methodological error. Check the note on how many observations were read, and if there were any missing values.
  • Slide 22
  • 22 The SAS Environment (cont.) Example: SAS Output

    patientens alder (the patient's age)

    ALDER      Frequency   Cumulative Frequency
    0 - 24            41                     41
    25 - 44          176                    217
    45 - 64           77                    294
    65-               25                    319
  • Slide 23
  • 23 The SAS Environment (cont.) File Types These files are created by SAS: -.sas file (SAS program) -.log file (Log) -.lst file (Output) The SAS data sets are saved as .sd7 or .sas7bdat files. (Other file types, e.g. catalogs, are also used and created by SAS, but we will not pursue this any further.)
  • Slide 24
  • 24 The SAS Environment (cont.) Using the SAS System You work with SAS using -Menus and Toolbar -Command Line -Key Functions F1-F12
  • Slide 25
  • 25 The SAS Environment (cont.) Example Three different ways to Open a File in the Enhanced Editor: 1. Menus: choose File + Open 2. Toolbar: press the icon for Open 3. Command line: write include N:\temp\bp.sas and press Enter.
  • Slide 26
  • 26 The SAS Environment (cont.) Commands and Keys
  • Slide 27
  • 27 The SAS Environment (cont.) Write and Read In the Enhanced Editor you can -create new, or edit existing, programs -submit programs -save programs (an unsaved file is marked with * after the file name) You can NOT edit the log file or the output file in their windows. They are only readable. If you wish to edit these files, save them and use the Enhanced Editor or Word.
  • Slide 28
  • 28 SAS syntax Statements The SAS code (syntax) consists of statements (Danish: sætninger). Statements mostly begin with a keyword (Danish: nøgleord), and they ALWAYS end with a SEMICOLON. data work.cohort; set course.males98; run; proc print data=work.cohort; run; Examples of keywords: data, set, run, proc.
  • Slide 29
  • 29 SAS syntax (cont.) Statements SAS statements can begin and end anywhere on a line. data work.cohort; One or several blanks can be used between words. data work.cohort; One or several semicolons can be used between statements. data work.cohort;;; ;
  • Slide 30
  • 30 SAS syntax (cont.) Statements The statement can begin and end on different lines. data work.cohort; SAS will not object to several statements on the same line. However, it is not considered good programming to have more than one statement per line. It makes the code difficult to read. Avoid this! data work.cohort; set course.males98; run;
  • Slide 31
  • 31 SAS syntax (cont.) Indenting to improve readability Improve the readability of your program by adding more space to the code (= indenting). Begin data steps and proc steps in the first position, as far left as possible. The ending run statement should also be in the first position. All statements in between should start a few blanks in from the left margin. This creates blocks of data steps and proc steps, and you can easily see where one ends and another begins.
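The indenting convention described above could look like this, using the data set names from the earlier syntax slides. It assumes that the libref course has already been assigned and that course.males98 exists.

```sas
/* Step keywords and RUN start in the first position;
   everything in between is indented a few blanks */
data work.cohort;
    set course.males98;
run;

proc print data=work.cohort;
run;

/* The same data step on one line is legal, but harder to read:
   data work.cohort; set course.males98; run;                  */
```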
  • Slide 32
  • 32 SAS syntax (cont.) Example of Indenting data work.height; infile 'h:\mep\rawdata_height.txt'; input name $ 1-20 kon 21 alder 22-23 height 24-30; if kon=0 and (height ne.) then do; if 0
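  • 104 The Online HELP (cont.) Example PROC MEANS: Syntax
    PROC MEANS <option(s)> <statistic-keyword(s)>;
    BY <DESCENDING> variable-1 <... <DESCENDING> variable-n> <NOTSORTED>;
    CLASS variable(s) </ option(s)>;
    FREQ variable;
    ID variable(s);
    OUTPUT <OUT=SAS-data-set> <output-statistic-specification(s)> <id-group-specification(s)> <maximum-id-specification(s)> <minimum-id-specification(s)> </ option(s)>;
    TYPES request(s);
    VAR variable(s) </ WEIGHT=weight-variable>;
    WAYS list;
    WEIGHT variable;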
  • Slide 105
  • 105 The Online HELP (cont.) Explanation to the Online Help Text -underlined word = keyword referring to a statement (statements within a procedure are optional, the PROC and the RUN statements are required) -black word = required if the corresponding keyword is used -words within < > = optional, not required -words separated by | = possible choices of values for a specific option
  • Slide 106
  • 106 The Online HELP (cont.) Example If you click on the PROC MEANS, a list of possible options will be displayed. Among them is the MAXDEC= option which we have already used. The equal sign is required. Next to MAXDEC= is the black word number. If you use the MAXDEC option you are required to fill in a number corresponding to the maximum number of decimals to be displayed. (The exact conventions depend on which version of the help you use. pd)
  • Slide 107
  • 107 Labels What are Labels? Each variable has a variable name (e.g. birthyr) and a LABEL (e.g. 'Year of Birth'). The label is how the variable is written on the output. By default the label = variable name unless you specify it. To define and assign a label, use the LABEL statement. label variable1 = 'label-name1' variable2 = 'label-name2' ... ;
  • Slide 108
  • 108 Labels (cont.) What are Labels? Labels can be 256 characters long at most. The output from proc CONTENTS includes a column with labels for all the variables in the data set. To delete a label, simply define the label equal to a blank: label variable1 = ' ';
  • Slide 109
  • 109 Labels (cont.) Permanent or Temporary Labels Labels can be assigned inside a data step or a proc step. Labels assigned in a data step are permanent. They are also transferred to new data sets. Labels assigned in a proc step are temporary. A temporary label replaces a permanent label throughout the execution of the procedure step. Most common are permanent labels defined in the data step.
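A minimal sketch of the difference between permanent and temporary labels is given below. The two observations and the label 'Weight (kg)' are made up for illustration. Note that PROC PRINT only displays labels when its LABEL option is specified.

```sas
/* Hypothetical data for illustration */
data work.original;
    input birthyr weight;
    datalines;
1954 70
1956 82
;
run;

/* Permanent label: stored in the data set */
data work.main;
    set work.original;
    label birthyr = 'Year of Birth';
run;

/* Temporary label: valid only during this proc step */
proc print data=work.main label;
    var birthyr weight;
    label weight = 'Weight (kg)';
run;

/* The label column should show a label for birthyr only */
proc contents data=work.main;
run;
```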
  • Slide 110
  • 110 Labels (cont.) Example Assigning permanent labels in a data step: data course.main; set course.original; age=1997-birthyr; height=height/100; bmi=weight/(height*height); label birthyr='Year of Birth' age='Alder' height='Højde' bmi='BMI'; run;
  • Slide 111
  • 111 Labels (cont.) Example With label: Year of OBS Birth 1 1954 2 1956 3 1956 4 1962 5 1954 6 1953 7 1955... 18 1957 Without label: OBS BIRTHYR 1 1954 2 1956 3 1956 4 1962 5 1954 6 1953 7 1955... 18 1957
  • Slide 112
  • 112 Formats What are Formats? Formats are used on variable values to -display the values differently from the raw values (e.g. with fewer decimals, or as dates) -group the values (values 0-25=low, values 26-100=high) There are predefined formats in SAS which you may use, but you can also create your own formats. The procedures are designed to handle formats and use them accordingly.
  • Slide 113
  • 113 Formats (cont.) Assign Formats To assign formats you use the FORMAT statement inside a data step (permanently) or a proc step (temporarily). The general form of the FORMAT statement is format variable1 format1.; The following yields a value with two digits, a decimal point and two decimals (5 positions, of which two are decimals): format bmi 5.2;
  • Slide 114
  • 114 Formats (cont.) Example Permanent Assignment data course.main; set course.original; age=1997-birthyr; height=height/100; bmi=weight/(height*height); format age 4.0 bmi 4.2 birthyr best4.; run;
  • Slide 115
  • 115 Formats (cont.) Example Temporary Assignment proc print data=course.main; var birthyr age bmi; format age 4.0 bmi 4.2 birthyr best5.; run; Usually, the format statement is at the end of the data or proc step together with the label statement.
  • Slide 116
  • 116 Formats (cont.) Predefined SAS Formats Formats are all of the form (where < > indicates optional and is not to be typed in): format-name<w>.<d> w indicates the maximum number of positions used to display the value; d indicates the optional number of decimals in a numeric format
  • Slide 117
  • 117 Formats (cont.) Predefined SAS Formats Formats for character variables need a $ sign in the first position: $format-name<w>. All formats, numeric or character, MUST contain a period (.), either at the end or before the d value. See examples.
  • Slide 118
  • 118 Formats (cont.) Predefined SAS Formats -w.d = numeric values at most w positions long, and d of these positions are decimals -$w. = character values w positions long -COMMAw.d = numeric values with commas and decimal points: 12,345.67 -BESTw. = chooses the best notation with w positions for numeric values The period (.) occupies one position in all of these formats.
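One way to see what these predefined formats do is to write a value to the Log window with the PUT statement. The values below are arbitrary and chosen only for illustration.

```sas
data _null_;
    x = 12345.678;
    name = 'Copenhagen';
    put x 8.2;        /* w.d: 8 positions, 2 of them decimals */
    put x comma10.2;  /* COMMAw.d: thousands separator added */
    put x best5.;     /* BESTw.: best notation within 5 positions */
    put name $4.;     /* $w.: first 4 characters only */
run;
```

Since data _null_ creates no data set, the step only produces lines in the log.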
  • Slide 119
  • 119 Formats (cont.) Example
  • Slide 120
  • 120 Formats (cont.) User-defined Formats There are situations when the predefined formats do not suffice. For example, you may wish to group the BMI values into three categories: underweight, normal weight, overweight. There is no predefined format to meet your demands in this situation. The solution is to create your own format.
  • Slide 121
  • 121 Formats (cont.) User-defined Formats To use your own formats you must -define the format -assign the format The same format may be assigned to several variables, and a variable may be assigned different formats in different procedures.
  • Slide 122
  • 122 Formats (cont.) Proc FORMAT defines formats Formats are defined through the FORMAT procedure. proc format; value format-name range1 = 'label' range2 = 'label' ... ; run; The labels must be inside quotes (' ').
  • Slide 123
  • 123 Formats (cont.) User-defined Formats Format names are like any other SAS names, however they must not end in a number. A format for a character variable must have a dollar sign $ as its first character. Format names do NOT end with a period (.) in proc FORMAT. The period is only used when assigning the format in a data or proc step.
  • Slide 124
  • 124 Formats (cont.) Example: User-defined Formats A case/control format (case_1f) and a BMI format (bmif). proc format; value case_1f 0='Case' 1='Control' other='Other'; value bmif low-20.0='Underweight' 20.0-25.0='Normal weight' 25.0-high='Overweight' other='Other'; run;
  • Slide 125
  • 125 Formats (cont.) Example: User-defined Formats Above, a value of 20.0000 would fall into Underweight, but 20.0001 would fall into Normal weight. When two ranges share an endpoint, a value equal to that endpoint is assigned to the first range that contains it.
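If the shared endpoints are unwanted, the range exclusion operator < can be used to leave an endpoint out of a range. The sketch below is a variation on the bmif format, not part of the original guide. The format name bmixf and the test values are hypothetical.

```sas
proc format;
    value bmixf
        low-20.0   = 'Underweight'
        20.0<-25.0 = 'Normal weight'   /* 20.0 itself is excluded here */
        25.0<-high = 'Overweight';     /* 25.0 itself is excluded here */
run;

/* Write a few test values and their formatted versions to the log */
data _null_;
    do bmi = 19.5, 20.0, 20.0001, 25.0, 31.2;
        put bmi= bmi bmixf.;
    end;
run;
```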
  • Slide 126
  • 126 Formats (cont.) Special Format Values other = all other values, including missing values (unless a range explicitly includes them) low = the lowest value (minimum) of the variable assigned to the format. For numeric formats low does not include missing values; for character formats it does. high = the highest value (maximum) of the variable assigned to the format
  • Slide 127
  • 127 Formats (cont.) Assigning User-defined Formats User-defined formats are assigned by a FORMAT statement, exactly as with the predefined formats. proc freq data=course.main; tables bmi; format bmi bmif.; run;

    BMI             Frequency   Percent   Cumulative Frequency   Cumulative Percent
    Underweight             2      11.1                      2                 11.1
    Normal weight          14      77.8                     16                 88.9
    Overweight              2      11.1                     18                100.0
  • Slide 128
  • 128 Formats (cont.) Assigning User-defined Formats proc means data=course.main maxdec=1; class bmi; var age; format bmi bmif.; run;

    The MEANS Procedure
    Analysis Variable : age

    bmi             N Obs    N   Mean   Std Dev   Minimum   Maximum
    Underweight         7    7   37.9       3.3      35.0      44.0
    Normal weight      51   49   37.6       3.4      30.0      44.0
    Overweight          5    5   38.0       3.3      35.0      42.0
  • Slide 129
  • 129 Formats (cont.) Assigning User-defined Formats As shown above, user-defined formats are assigned in the exact same way as the SAS formats: format variable1 format1.;
  • Slide 130
  • 130 Titles and Footnotes Titles You can add titles to the output with a TITLE statement. A TITLE statement is one of the global statements which do not have to be included in a data step or a proc step (other global statements are the LIBNAME and OPTIONS statements). The form of the TITLE statement is title 'here-you-write-the-title'; The title must be surrounded by quotes (' ').
  • Slide 131
  • 131 Titles and Footnotes (cont.) Example title 'BMI Body Mass Index'; proc freq data=course.main; tables bmi; format bmi bmif.; run;

    BMI Body Mass Index

    BMI             Frequency   Percent   Cumulative Frequency   Cumulative Percent
    Underweight             2      11.1                      2                 11.1
    Normal weight          14      77.8                     16                 88.9
    Overweight              2      11.1                     18                100.0
  • Slide 132
  • 132 Titles and Footnotes (cont.) Delete Titles A title will stay defined, and be printed to all output, until it is changed, or deleted. To delete a title simply write title;
  • Slide 133
  • 133 Titles and Footnotes (cont.) Several Titles It is also possible to have secondary titles below the main title. A maximum of 10 titles can be used simultaneously. title1 'here-you-write-the-first-title'; title2 'here-you-write-the-second-title'; ... title10 'here-you-write-the-tenth-title';
  • Slide 134
  • 134 Titles and Footnotes (cont.) Several Titles The unnumbered title statement is equivalent to the title1 statement. It is possible to have, for example, title2 undefined or deleted while title3 is defined. This results in a blank line between title1 and title3 on the printout, in place of title2. However, when you delete, say, title3, all titles beneath it (title4-title10) are also deleted. title3;
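The deletion rule can be tried out with a sketch like the one below. It assumes the sample data set sashelp.class, which ships with standard SAS installations, and the third title text is invented for illustration.

```sas
title1 'BMI Body Mass Index';
title2 'Women 35-45 yrs';
title3 'Preliminary results';   /* hypothetical third title */

/* Printed with all three titles */
proc print data=sashelp.class(obs=3);
run;

title2;   /* deletes title2 and everything beneath it; title1 remains */

/* Printed with title1 only */
proc print data=sashelp.class(obs=3);
run;

title;    /* same as title1; deletes all titles */
```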
  • Slide 135
  • 135 Titles and Footnotes (cont.) Example title 'BMI Body Mass Index'; title2 'Women 35-45 yrs'; proc freq data=course.main; tables bmi; format bmi bmif.; run;
  • Slide 136
  • 136 Titles and Footnotes (cont.) Example

    BMI Body Mass Index
    Women 35-45 yrs

    BMI             Frequency   Percent   Cumulative Frequency   Cumulative Percent
    Underweight             2      11.1                      2                 11.1
    Normal weight          14      77.8                     16                 88.9
    Overweight              2      11.1                     18                100.0
  • Slide 137
  • 137 Titles and Footnotes (cont.) Titles Window A shortcut to defining titles is the Titles window. Issue the title command in the Command line. The Titles window will open, with all your current title definitions. From here the titles can be changed directly by editing. The disadvantage of this shortcut is that the title definitions are NOT saved, as they would be if you had written them in code. When a program is rerun later, after many title changes, the titles will not be as originally defined.
  • Slide 138
  • 138 Titles and Footnotes (cont.) Footnotes Footnotes work in exactly the same way as titles. The only difference is that footnotes are written at the bottom of the printout. footnote 'here-you-write-the-footnote'; To delete a footnote write footnote;
  • Slide 139
  • 139 Titles and Footnotes (cont.) Example footnote 'BMI Body Mass Index'; footnote2 'Women 35-45 yrs'; proc freq data=course.main; tables bmi; format bmi bmif.; run;
  • Slide 140
  • 140 Titles and Footnotes (cont.) Example

    BMI             Frequency   Percent   Cumulative Frequency   Cumulative Percent
    Underweight             2      11.1                      2                 11.1
    Normal weight          14      77.8                     16                 88.9
    Overweight              2      11.1                     18                100.0

    BMI Body Mass Index
    Women 35-45 yrs
  • Slide 141
  • 141 Titles and Footnotes (cont.) Footnotes Window To open the Footnotes window and edit footnotes directly, issue the command footnote in the Command line. There are many additional features for titles and footnotes, such as fonts, sizes and orientation. See the manual SAS/GRAPH Software Vol. I.
  • Slide 142
  • 142 Subsetting a Data Set Subsets of Data Often one wants to use only a subset of a data set, e.g. persons older than 60 years, women, cases etc. This is particularly useful when performing data cleaning, and you only want to print the observations with extreme values of a variable, say blood pressure > 200.
  • Slide 143
  • 143 Subsetting a Data Set (cont.) WHERE option In procedures you use the WHERE data set option to subset the data set. proc print data=SAS-data-set(where=(expression)); run; The WHERE data set option may be used in any procedure. It can also be used in data steps, although it is less usual. data course.cases; set course.main(where=(case_1=1)); run;
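As a self-contained illustration of the WHERE data set option in both a proc step and a data step, the sketch below uses the sample data set sashelp.class (assumed to be available, as in standard SAS installations) rather than the course data. The data set name work.older is hypothetical.

```sas
/* Proc step: print only the observations with age 14 or above */
proc print data=sashelp.class(where=(age ge 14));
run;

/* Data step: keep only girls aged 14 or above in a new data set */
data work.older;
    set sashelp.class(where=(age ge 14 and sex = 'F'));
run;

proc means data=work.older;
    var height;
run;
```

The log note on the number of observations read shows how many observations passed the condition.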
  • Slide 144
  • 144 Subsetting a Data Set (cont.) WHERE option The expression must be a logical one, resulting in true or false. Only observations for which the expression is true will be used in the proc step. Examples of expressions are: where=(birthyr gt 1950) where=(1947
  • 145 Subsetting a Data Set (cont.) Conditional Operators Possible conditional operators (use sign or abbreviation):
    =    eq   equal to
    ^=   ne   not equal to
    >    gt   greater than
    <    lt   less than
    >=   ge   greater than or equal to
  • 147 Subsetting a Data Set (cont.) Examples proc freq data=course.main(where=(birthyr