Chapter 4 concerns various SAS procedures (PROCs). Every PROC operates on: –the most recently...


• Chapter 4 concerns various SAS procedures (PROCs). By default, every PROC operates on:
– the most recently created dataset
– all the observations
– all the appropriate variables

unless you tell SAS otherwise by:
– specifying the DATA= option
– using the WHERE statement or the BY statement
– using the VAR statement

• All PROCs begin with the keyword PROC, followed by the name of the PROC, followed by additional options or statements required by the specific PROC.
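
• A minimal sketch of the defaults and how to override them. The dataset name plants, the variable names plant, height, and mass, the marsh code m2, and all the data values are invented for illustration; they are not Dr. Padgett's actual data.

```sas
/* Toy dataset: names and values are hypothetical, for illustration only */
DATA plants;
  INPUT plant marsh $ flower $ lleaves height mass;
  DATALINES;
1 si yes 4 52.0 3.1
2 si no 3 40.5 2.2
3 si yes 5 61.2 4.0
4 m2 yes 4 47.3 2.9
5 m2 no 2 33.8 1.7
6 m2 yes 5 58.9 3.6
;
RUN;

/* Defaults: most recently created dataset, all observations,
   all numeric variables */
PROC MEANS;
RUN;

/* Overrides: DATA= names the dataset, WHERE restricts the observations,
   VAR restricts the variables */
PROC MEANS DATA=plants;
  WHERE marsh = "si";
  VAR height mass;
RUN;
```

Naming the dataset with DATA= in every PROC is a common habit, since "most recently created" changes as soon as another DATA step or an OUT= dataset is added to the program.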


• Some PROCs we’ve already seen are:
– PROC CONTENTS;
– PROC PRINT;
– PROC SORT; BY sortingvariablename;
– PROC MEANS;
– PROC FREQ; TABLES list_of_variables;

• The BY statement is required with PROC SORT and optional with other PROCs. When used with other PROCs, it tells SAS to perform a separate analysis for each value (or combination of values) of the BY variables, instead of keeping all the observations together in one group.


• Let’s do an example with Dr. Padgett’s dataset. Suppose you wanted to analyze the flowering and the number of live leaves of the plants separately for each marsh:

PROC SORT; BY marsh;
PROC FREQ; TABLES flower*lleaves; BY marsh;
RUN;

• NOTE that I have SORTed the data prior to doing the PROC FREQ BY marsh. When you run a PROC with a BY statement, SAS assumes that the dataset is already SORTed BY that variable, so if it is not already sorted, use PROC SORT to do the sorting first.


• NOTE also that you may use TITLE, FOOTNOTE, and LABEL statements to enhance your output from any PROC. The syntax is TITLEn '...'; and FOOTNOTEn '...'; where n runs from 1 to 10, so up to 10 titles and 10 footnotes are allowed. The LABEL statement lets you give your variables descriptive labels up to 256 characters long. If you want your labels to be used throughout a dataset, use the LABEL statement in the DATA step; otherwise use it within the PROC step, where it applies only to that PROC.
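
• A short sketch of TITLE, FOOTNOTE, and LABEL used with PROC PRINT. The dataset plants, the variable name mass, the marsh code m2, the title and label wording, and all data values are made up for illustration.

```sas
/* Toy dataset: names and values are hypothetical, for illustration only */
DATA plants;
  INPUT plant marsh $ flower $ lleaves mass;
  DATALINES;
1 si yes 4 3.1
2 si no 3 2.2
3 m2 yes 5 3.6
;
RUN;

TITLE1 'Marsh plant survey';
TITLE2 'Toy data for illustration';
FOOTNOTE1 'All values are invented';

/* The LABEL option asks PROC PRINT to show labels as column headings;
   the LABEL statement here applies to this PROC step only */
PROC PRINT DATA=plants LABEL;
  LABEL lleaves = 'Number of live leaves'
        mass    = 'Total mass';
RUN;

/* Titles and footnotes stay in effect until changed, so clear them */
TITLE;
FOOTNOTE;
```

Moving the same LABEL statement into the DATA step would store the labels with the dataset, so every later PROC could use them.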


• The WHERE statement may be used with PROCs to perform the procedure only on certain observations in the dataset, namely those satisfying the conditions given in the WHERE statement. For example:
– in Dr. Padgett’s data, try to PRINT only those flowering plants from the Shell Island marsh:

PROC PRINT; WHERE flower="yes" AND marsh="si";

• Note the similarity with the subsetting IF statements we saw earlier. The same comparison, logical, and arithmetic symbols may be used to construct the condition. See the examples on page 102...
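
• To make the comparison with the subsetting IF concrete, here is a sketch that applies one condition both ways. The dataset names plants and flowering, the variable name height, the marsh code m2, the cutoffs, and the data values are all assumptions chosen for illustration.

```sas
/* Toy dataset: names and values are hypothetical, for illustration only */
DATA plants;
  INPUT plant marsh $ flower $ lleaves height;
  DATALINES;
1 si yes 4 52.0
2 si no 3 40.5
3 si yes 2 61.2
4 m2 yes 5 47.3
5 m2 no 2 33.8
;
RUN;

/* WHERE inside a PROC: the dataset itself is left unchanged */
PROC PRINT DATA=plants;
  WHERE flower = "yes" AND (lleaves >= 4 OR height > 60);
RUN;

/* The same condition as a subsetting IF, which builds a new,
   smaller dataset in a DATA step */
DATA flowering;
  SET plants;
  IF flower = "yes" AND (lleaves >= 4 OR height > 60);
RUN;

PROC PRINT DATA=flowering;
RUN;
```

Both printouts should list the same observations; the difference is that WHERE filters on the fly, while the subsetting IF leaves behind a dataset you can reuse.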


• PROC SORT is one of the most useful PROCs in SAS. Besides being required to sort data prior to doing other PROCs BY the sorting variable, SORTing is useful in its own right.

• You may specify as many sorting variables as you like, but be careful to note how the multiple sorts are done: the data are sorted by the first variable, and later variables only order the observations within ties on the earlier ones. Try a SORT of Dr. Padgett’s data first by marsh and then by plant height. What happens if you reverse the order of the sorting variables?

• Note that you may save the SORTed data in its own dataset using the OUT= option; otherwise the original dataset is overwritten with the SORTed data.
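
• One way to try the two sort orders side by side without overwriting anything. The dataset names, the variable name height, the marsh code m2, and the data values are hypothetical.

```sas
/* Toy dataset: names and values are hypothetical, for illustration only */
DATA plants;
  INPUT plant marsh $ height;
  DATALINES;
1 si 52.0
2 m2 47.3
3 si 40.5
4 m2 58.9
5 si 61.2
;
RUN;

/* Sort by marsh first; height only orders plants within each marsh */
PROC SORT DATA=plants OUT=by_marsh_height;
  BY marsh height;
RUN;

/* Reverse the keys: height now drives the order, and marsh
   only matters when two plants have the same height */
PROC SORT DATA=plants OUT=by_height_marsh;
  BY height marsh;
RUN;

PROC PRINT DATA=by_marsh_height;
RUN;

PROC PRINT DATA=by_height_marsh;
RUN;
```

Because both sorts use OUT=, the original plants dataset keeps its input order and the two results can be compared directly.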


• PROC PRINT is one of the most widely used PROCs and it has great flexibility

• Some of the options and statements used with PROC PRINT are:

– LABEL - to print w/labels previously assigned

– NOOBS - to not print the observation number

– ID - to use another variable instead of the OBS column to identify the observations

– SUM - to sum up particular variables; this one is helpful for financial printouts

– VAR - to print specific variables in a certain order

– BY - to print the data in sections defined by the BY variable in a previous PROC SORT

• Example: Get a printout of Dr. Padgett’s data that prints only the total mass of the plants, grouped by the number of live leaves, with the mass summed within each group and the plant number used to identify the observations.
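
• One possible way to set up this printout, offered as a sketch rather than the official solution. Only lleaves appears as a variable name in these slides; the names plants, plant, and mass, and all the data values, are assumptions.

```sas
/* Toy dataset: names and values are hypothetical, for illustration only */
DATA plants;
  INPUT plant lleaves mass;
  DATALINES;
1 4 3.1
2 3 2.2
3 5 4.0
4 4 2.9
5 2 1.7
6 5 3.6
;
RUN;

/* BY-group printing needs the data sorted by the BY variable first */
PROC SORT DATA=plants;
  BY lleaves;
RUN;

PROC PRINT DATA=plants;
  BY lleaves;   /* one section per number of live leaves */
  ID plant;     /* plant number replaces the OBS column */
  VAR mass;     /* print only the mass */
  SUM mass;     /* total the mass */
RUN;
```

With a BY statement present, SUM reports totals for the BY groups as well as an overall total at the end.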


• For Monday:
– Make sure you understand the material in Chapters 2 and 3 in preparation for the midterm - ask questions if you have them…
– Read Chapter 4 up through 4.4 - we’ll discuss FORMATs next time along with writing our own formats with PROC FORMAT