Sorting, Printing, and Summarizing Your Data (Chapter 4 in The Little SAS Book)


Page 1: Sorting, Printing, and Summarizing Your Data (Chapter 4 in The Little SAS Book)

IOWA STATE UNIVERSITY, Department of Animal Science

Animal Science 500, Lecture No. 5

September 14, 2010

Page 2: Using a Procedure (PROC)

- Using a procedure statement is like filling out a form
  - You fill in the blanks
  - You choose from the list of options
- PROC statement
  - All procedures start with this statement
  - It is followed by the name of the procedure desired (PRINT, MEANS, TABULATE, etc.)

Page 3: PROC statements

- If you are using many SAS data sets, you can specify which data set you want the procedure applied to
  - PROC CONTENTS DATA = Pig12;
- The DATA= option is optional
  - If the DATA= option is not present, the procedure is applied to the most recently created data set (not necessarily the most recently used).
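As a minimal sketch of the DATA= behavior described above, the program below builds a small data set and then runs one procedure with DATA= and one without. The variable names and values in Pig12 are hypothetical and only serve the illustration.

```sas
/* Hypothetical pig records; the values are made up for illustration */
data Pig12;
  input ID Pen Sex $ ADG Backfat;
  datalines;
101 1 B 1.85 0.62
102 1 G 1.72 0.48
103 2 B 1.91 0.77
;
run;

/* DATA= names the data set explicitly */
proc contents data=Pig12;
run;

/* No DATA= option: SAS uses the most recently created data set (Pig12 here) */
proc print;
run;
```

Naming the data set explicitly is usually the safer habit once a program creates more than one data set.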

Page 4: BY statement

- The BY statement is required in only one procedure, PROC SORT
  - Obviously, if you are sorting your data set you want to sort it by some variable contained in your data.
- In all other procedures the BY statement is optional (and the data must already be sorted by the BY variables).

Page 5: Title and Footnote statements

- TITLE prints at the top of your output
  - 10-line limit (TITLE1 through TITLE10)
- FOOTNOTE prints at the bottom of your output
  - 10-line limit (FOOTNOTE1 through FOOTNOTE10)
- You can place these anywhere in your program
  - They apply to procedure output, so it usually makes sense to put them with the procedure they describe

Page 6: Title and Footnote statements

- In TITLE and FOOTNOTE statements SAS does not care whether you use single quotes 'test' or double quotes "test", as long as the opening and closing quotes match.
- Do not mix types of quotes around one piece of text
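The following sketch shows TITLE and FOOTNOTE statements with both quote styles. The data set and the title text are hypothetical; the final two statements clear the titles and footnotes so they do not carry over to later output.

```sas
/* Hypothetical data for illustration */
data Pig12;
  input ID ADG;
  datalines;
101 1.85
102 1.72
;
run;

title1 'Pig Growth Trial';                      /* single quotes */
title2 "Summer group's average daily gain";     /* double quotes allow an apostrophe inside */
footnote 'Hypothetical data for illustration';

proc print data=Pig12;
run;

/* A TITLE or FOOTNOTE statement with no text cancels the current ones */
title;
footnote;
```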

Page 7: Label statement

- By default SAS uses the variable names to label your output
- You can create more descriptive labels
  - Labels can be up to 256 characters long
  - Example:
      LABEL DOT = 'Date on test'
            ADG = 'Average Daily Gain';

Page 8: Subsetting in Procedures with the WHERE Statement

- The WHERE statement can be used with almost any procedure
- The WHERE statement can be used to subset the data much like an IF statement
- Advantage of the WHERE statement
  - The IF statement works only in the DATA step
  - The WHERE statement works in both the DATA step and the PROC step.

Page 9: Subsetting in Procedures with the WHERE Statement

- Examples using the WHERE statement
- In the DATA step
  - WHERE only selects observations; it cannot assign values, so the backfat groups themselves are created with IF-THEN/ELSE:
      IF Backfat LE .50 THEN BackfatGroup = 1;
      ELSE IF Backfat LE .75 THEN BackfatGroup = 2;
      ELSE IF Backfat LE 1.00 THEN BackfatGroup = 3;
      ELSE BackfatGroup = .;
  - A WHERE statement in a DATA step that reads an existing data set keeps only the observations you want:
      WHERE Backfat LE .50;
      RUN;
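A minimal sketch of grouping and subsetting in a DATA step is shown below. The data values, the data set names, and the variable name BackfatGroup are assumptions made for the illustration. WHERE filters observations as they are read from Pig12, and IF-THEN/ELSE assigns the group.

```sas
/* Hypothetical backfat measurements (inches); one value is missing */
data Pig12;
  input ID Backfat;
  datalines;
101 0.45
102 0.62
103 0.90
104 1.20
105 .
;
run;

data Grouped;
  set Pig12;
  where Backfat is not missing;   /* WHERE drops rows before they enter the step */
  if Backfat le .50 then BackfatGroup = 1;
  else if Backfat le .75 then BackfatGroup = 2;
  else if Backfat le 1.00 then BackfatGroup = 3;
  else BackfatGroup = .;          /* backfat above 1.00 is left ungrouped */
run;

proc print data=Grouped;
run;
```

Using an ELSE IF chain avoids the overlapping boundaries (.50 and .75) that separate range tests would create.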

Page 10: Subsetting in Procedures with the WHERE Statement

- Examples using the WHERE statement
- In the PROC step (placed inside a procedure such as PROC PRINT)
    WHERE BackfatGroup = 1;
    TITLE 'Leanest Pigs From Experiment 1';
    FOOTNOTE 'Experimental pigs with .50 inches of backfat or less';
    RUN;
    QUIT;
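Put together as a complete step, the PROC-step subsetting might look like the sketch below. The data set and its values are hypothetical; BackfatGroup is assumed to have been created earlier.

```sas
/* Hypothetical data with a group variable already assigned */
data Pig12;
  input ID Backfat BackfatGroup;
  datalines;
101 0.45 1
102 0.62 2
103 0.50 1
;
run;

proc print data=Pig12;
  where BackfatGroup = 1;   /* only group 1 is printed; Pig12 itself is unchanged */
  title 'Leanest Pigs From Experiment 1';
  footnote 'Experimental pigs with .50 inches of backfat or less';
run;

title;
footnote;
```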

Page 11: Sorting Your Data with PROC SORT

- Many reasons to sort your data
  - Think of examples from labs
  - Use PROC SORT with a BY statement
    - In the BY statement you can include as many variables as you want
    - Example: PROC SORT; BY pen sex trt;
    - This sorts by pen, then by sex within pen, and then by treatment within pen and sex
    - It is often more useful to sort by one variable so that other procedures can be performed.
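The multi-variable sort above can be sketched as follows. The data values and the output data set name Sorted are hypothetical; OUT= is used so the original data set keeps its order.

```sas
/* Hypothetical records: pen, sex, and treatment for each pig */
data Pig12;
  input ID Pen Sex $ Trt;
  datalines;
101 2 G 1
102 1 B 2
103 1 G 1
104 2 B 2
105 1 B 1
;
run;

/* Sort by pen, then sex within pen, then treatment within pen and sex */
proc sort data=Pig12 out=Sorted;
  by Pen Sex Trt;
run;

proc print data=Sorted;
run;
```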

Page 12: Removing Duplicates with PROC SORT

- The NODUPKEY option eliminates duplicate observations that have the same values for the BY variables
- Use this with the DUPOUT= option
  - This option puts the deleted observations in a new data set
- Example
  - PROC SORT OUT = Pig13 NODUPKEY DUPOUT = duplicates; BY pen;

Page 13: Removing Duplicates with PROC SORT

- Example
  - PROC SORT DATA = new OUT = Pig13 NODUPKEY DUPOUT = duplicates; BY pen;
- This is useful for working with field data
  - DHIA records
  - Records from breed associations (Yorkshire, Landrace, Angus, Hereford, etc.)
  - How might you find duplicate records?
  - PROC SORT DATA = name OUT = cleandata NODUPKEY DUPOUT = duplicates; BY herd id;
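A small runnable sketch of NODUPKEY with DUPOUT= follows. The data set FieldRecords and its values are hypothetical; one record is entered twice on purpose so that it lands in the Duplicates data set.

```sas
/* Hypothetical field records; herd 1, animal 101 appears twice */
data FieldRecords;
  input Herd ID Weight;
  datalines;
1 101 250
1 102 245
1 101 250
2 201 260
;
run;

/* Keep the first record for each Herd/ID combination; send the rest to Duplicates */
proc sort data=FieldRecords out=CleanData nodupkey dupout=Duplicates;
  by Herd ID;
run;

/* Inspect what was removed */
proc print data=Duplicates;
run;
```

Printing the DUPOUT= data set is a quick way to review what was dropped before trusting the cleaned file.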

Page 14: Printing Your Data with PROC PRINT

- We have used PROC PRINT many times in lab
- What is new is that there are several options in SAS associated with PROC PRINT
- As with other procedures, you can tell SAS which data set to use for printing
  - PROC PRINT DATA = PIG12;

Page 15: Printing Your Data with PROC PRINT

- PROC PRINT DATA = PIG12 NOOBS LABEL;
- The NOOBS option tells SAS not to print the observation number on each line
  - You might want the observation number to check counts, in which case you would omit the NOOBS option
  - Otherwise use NOOBS, because the observation numbers just take up space
- The LABEL option prints the variable labels instead of the variable names
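The sketch below combines a LABEL statement with the NOOBS and LABEL options of PROC PRINT. The dates and gains are hypothetical values chosen for the illustration.

```sas
/* Hypothetical data; labels are stored with the data set */
data Pig12;
  input ID DOT :mmddyy10. ADG;
  format DOT mmddyy10.;
  label DOT = 'Date on test'
        ADG = 'Average Daily Gain';
  datalines;
101 06/01/2010 1.85
102 06/01/2010 1.72
103 06/08/2010 1.91
;
run;

/* NOOBS drops the Obs column; LABEL prints labels as column headings */
proc print data=Pig12 noobs label;
run;
```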

Page 16: Printing Your Data with PROC PRINT

- Optional statements used with PROC PRINT
  - The ID statement prints the ID variables on the left rather than the observation number.
  - This is useful for comparing a printed list from SAS output to your original data that you have stored electronically or on paper.
- The SUM statement
  - This statement sums the variables specified
  - It could be particularly useful when looking at pen weights rather than individual weights, when pen is the experimental unit.

Page 17: Printing Your Data with PROC PRINT

- The SUM statement
    PROC PRINT DATA = Pig12;
      BY pen;
      SUM wtgain;
      VAR ID Pen Sex ADG wtgain;
      TITLE 'Weight Gain Summed by Pens';
    RUN;
    QUIT;
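A runnable sketch of the SUM statement with BY groups is given below. The data are hypothetical, and the PROC SORT step is added because a BY statement in PROC PRINT expects the data to be sorted by pen.

```sas
/* Hypothetical weight gains for pigs in two pens */
data Pig12;
  input ID Pen Sex $ wtgain;
  datalines;
101 2 B 182
102 1 G 175
103 1 B 190
104 2 G 168
;
run;

proc sort data=Pig12;
  by Pen;
run;

/* One block of output per pen, with wtgain totalled for each pen and overall */
proc print data=Pig12 noobs;
  by Pen;
  var ID Sex wtgain;
  sum wtgain;
  title 'Weight Gain Summed by Pens';
run;

title;
```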

Page 18: SAS Can Be Used to Write Simple Reports

- For example, suppose you just want to summarize the data from Pig12
- Example
    DATA _NULL_;
      INFILE 'some data file source';
      INPUT variable-names;
      (create variables such as ADG here)
      FILE 'c:\somefile\data.txt' PRINT;    (tells SAS where to put the report)
      TITLE;
      PUT @5 'Sale Report for ' Name 'from classroom ' Class
        // @5 'Congratulations! You sold ' Quantity 'boxes of candy'
        / @5 'and earned ' Profit DOLLAR6.2 ' for our field trip.';
      PUT _PAGE_;
    RUN;
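The report-writing pattern above can be tried without external files, as in this sketch. The names, classrooms, and amounts are hypothetical, DOLLAR7.2 is used only so the sample amounts fit, and FILE PRINT sends the report to the normal output destination rather than to a file on disk.

```sas
data _null_;
  input Name $ Class Quantity Profit;
  file print notitles;   /* write to the output window, without titles */
  /* @5 starts at column 5, / moves to the next line, // skips a line */
  put @5 'Sale Report for ' Name 'from classroom ' Class
    // @5 'Congratulations! You sold ' Quantity 'boxes of candy'
    / @5 'and earned ' Profit dollar7.2 ' for our field trip.';
  put _page_;            /* start a new page for the next student */
  datalines;
Adriana 21 3 3.75
Nathan 14 5 6.25
;
run;
```

DATA _NULL_ runs the step without creating a data set, which suits report writing where only the PUT output matters.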

Page 19: Summarizing Your Data Using PROC MEANS

- We have started to use PROC MEANS in lab
- Numerous options can be used with PROC MEANS
  - Include them after the procedure name: PROC MEANS options;
- By default, PROC MEANS prints the number of non-missing values, the mean, the standard deviation, and the minimum and maximum values for each numeric variable.
- The options available include:
  - MEDIAN: the value separating the higher half of a sample from the lower half
  - MODE: the value occurring most frequently

Page 20: Summarizing Your Data Using PROC MEANS

- Options, continued
  - NMISS: number of missing values for each variable
  - RANGE: range of values
  - SUM: the sum
- PROC MEANS; without a VAR statement summarizes every numeric variable
  - If you do not wish to obtain means for every trait, you can list the variables you want in a VAR statement
  - PROC MEANS; VAR ADG Backfat LMA;

Page 21: Summarizing Your Data Using PROC MEANS

- You can subset the data and obtain means for each group using the BY statement
  - You get one set of means for each level of the BY variable, and the data must first be sorted by that variable
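A short sketch of PROC MEANS with statistic options, a VAR statement, and a BY statement follows. The data are hypothetical. When statistics are named on the PROC MEANS statement, only those statistics are printed.

```sas
/* Hypothetical growth data; one backfat value is missing */
data Pig12;
  input ID Sex $ ADG Backfat;
  datalines;
101 B 1.85 0.62
102 G 1.72 0.48
103 B 1.91 .
104 G 1.66 0.55
105 B 1.78 0.70
;
run;

/* BY-group processing requires data sorted by the BY variable */
proc sort data=Pig12;
  by Sex;
run;

/* Requested statistics replace the default set; MAXDEC= limits decimal places */
proc means data=Pig12 n nmiss mean median range maxdec=2;
  var ADG Backfat;
  by Sex;
run;
```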

Page 22: Counting Your Data Using PROC FREQ

- PROC FREQ
  - A frequency table gives you simple counts of the number of observations at each level of a variable
  - Counts for one variable are called one-way frequencies
  - Counts for two or more variables are called two-way, three-way, and so forth
    - Alternatively, tables for multiple variables are called cross-tabulations
  - Used most frequently to show the distribution of categorical variables
  - Can be used to identify data irregularities

Page 23: Counting Your Data Using PROC FREQ

- Basic form:
    PROC FREQ;
      TABLES variable-combinations;
- Options, if any, follow a slash (/) at the end of the TABLES statement, e.g.
    TABLES variable-combinations / LIST;
  - The options available are:
    - LIST: prints cross-tabulations in list format rather than grid
    - MISSING: includes missing values in frequency statistics
    - NOCOL: suppresses printing of column percentages in cross-tabulations
    - NOPERCENT: suppresses printing of percentages
    - NOROW: suppresses printing of row percentages in cross-tabulations
    - OUT = data-set: writes a data set containing frequencies

Page 24: Counting Your Data Using PROC FREQ

- Options, if any, follow a slash (/) at the end of the TABLES statement, e.g.
    TABLES variable-combinations / LIST;
  - The options, continued:
    - CHISQ: performs the standard Pearson chi-square test on the table(s) requested.
    - EXPECTED: prints the expected number of observations in each cell under the null hypothesis.
    - EXACT: requests Fisher's exact test for the table(s). For 2 x 2 tables this is computed automatically when CHISQ is specified.
    - SPARSE: produces a full table, even if the table has many cells containing no observations.
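The TABLES options above can be tried with a sketch like the following. The data are hypothetical and far too small for a meaningful chi-square test; the point is only where the options go.

```sas
/* Hypothetical sex and treatment codes */
data Pig12;
  input ID Sex $ Trt;
  datalines;
101 B 1
102 G 1
103 B 2
104 G 2
105 B 1
106 G 2
;
run;

proc freq data=Pig12;
  /* One-way frequency table */
  tables Sex;
  /* Two-way table: options follow the slash */
  tables Sex*Trt / chisq expected norow nocol;
run;
```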

Page 25: Example of One-Way Contingency Table

CATEGORY    Frequency   Percent   Cumulative   Cumulative
                                  Frequency    Percentage
JAZZ              273     61.21          273        61.21
CLASSICAL          59     13.23          332        74.44
POP                49     10.99          381        85.43
GOSPEL             44      9.87          425        95.29
RAP                21      4.71          446       100.00

Page 26: Example of Two-Way Contingency Table

Table of hichol1 by hichol2

hichol1     hichol2

Frequency |
Percent   |       1 |       2 |  Total
----------+---------+---------+-------
        1 |      21 |      21 |     42
          |   22.83 |   22.83 |  45.65
----------+---------+---------+-------
        2 |      23 |      27 |     50
          |   25.00 |   29.35 |  54.35
----------+---------+---------+-------
Total            44        48       92
              47.83     52.17   100.00

Frequency Missing = 2

Page 27: Using PROC TABULATE

- Every summary statistic that PROC TABULATE computes can also be produced by other procedures
  - PRINT
  - MEANS, and
  - FREQ
- But some people think PROC TABULATE output is much "prettier".

Page 28: Using PROC TABULATE

- General form
    PROC TABULATE;
      CLASS classification-variable-list;
      TABLE page-dimension, row-dimension, column-dimension;
- The CLASS statement tells SAS which variables contain categorical data to be used for dividing the observations into groups
  - In our example data set this would mean things like test, sex, pen, treatment, breed

Page 29: Using PROC TABULATE

- General form
    PROC TABULATE;
      CLASS classification-variable-list;
      TABLE page-dimension, row-dimension, column-dimension;
- Each TABLE statement defines only one table, but you can have multiple TABLE statements
- If a variable is listed in the CLASS statement, then, by default, PROC TABULATE produces simple counts of the number of observations in each category of that variable.
- PROC TABULATE offers many other statistics too.

Page 30: Using PROC TABULATE

- General form
    PROC TABULATE;
      CLASS classification-variable-list;
      TABLE page-dimension, row-dimension, column-dimension;
- Each TABLE statement can specify up to 3 dimensions
- The dimensions are separated by commas and tell SAS which variables are used for the pages, rows, and columns
  - Specify one dimension and it becomes the column dimension
  - Specify two dimensions and they become rows and columns
  - Specify three dimensions and they become pages, rows, and columns

Page 31: Using PROC TABULATE

- General form
    PROC TABULATE;
      CLASS classification-variable-list;
      TABLE page-dimension, row-dimension, column-dimension;
- By default, observations with missing values for a CLASS variable are not included in the tables
  - To keep the missing values in the report, use PROC TABULATE MISSING;
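The dimension rules and the MISSING option can be seen in a sketch such as the one below. The breeds, sexes, and treatments are hypothetical, and one pig has a missing sex code so that the effect of MISSING shows up in the counts.

```sas
/* Hypothetical records; the period in the last row is a missing sex code */
data Pig12;
  input Breed $ Sex $ Trt;
  datalines;
York B 1
York G 2
Land B 1
Land . 2
;
run;

/* MISSING keeps observations with missing CLASS values as their own category */
proc tabulate data=Pig12 missing;
  class Breed Sex Trt;
  table Sex;               /* one dimension: columns */
  table Sex, Trt;          /* two dimensions: rows, columns */
  table Breed, Sex, Trt;   /* three dimensions: pages, rows, columns */
run;
```

With only CLASS variables in the tables, each cell holds a simple count of observations.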

Page 32: Example PROC TABULATE Code

    PROC TABULATE;
      CLASS GENDER;
      VAR AGE INCOME EDUC;
      TABLE (AGE INCOME EDUC)*MEAN, GENDER ALL;
    RUN;
    QUIT;
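To run the example above without the original survey data, a small stand-in data set can be used, as in this sketch. The six records are hypothetical and will not reproduce the numbers on the output slides.

```sas
/* Hypothetical stand-in for the survey data */
data Survey;
  input Gender $ Age Income Educ;
  datalines;
Female 34 21000 12
Female 51 18500 14
Female 47 30200 16
Male 29 27500 12
Male 62 41000 13
Male 45 36800 18
;
run;

proc tabulate data=Survey;
  class Gender;                               /* grouping variable */
  var Age Income Educ;                        /* analysis variables */
  /* rows: the mean of each analysis variable; columns: each gender plus ALL */
  table (Age Income Educ)*mean, Gender all;
run;
```

The VAR statement is what allows a statistic such as MEAN to be requested; CLASS variables alone only produce counts.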


Page 34: Output from a PROC TABULATE example

Output 1.1

Variable  Label          N        Mean     Std Dev   Minimum     Maximum
------------------------------------------------------------------------
AGE       Age         6639      48.614      16.598    25.000      90.000
INCOME    Income      6639   25065.797   23850.488     0.000  263253.000
EDUC      Education   6639      13.040       2.953     4.000      19.000
------------------------------------------------------------------------

Page 35: Output from a PROC TABULATE example

Output 1.2

GENDER = Female

Variable  Label          N        Mean     Std Dev   Minimum     Maximum
------------------------------------------------------------------------
AGE       Age         3559      49.528      17.158    25.000      90.000
INCOME    Income      3559   17780.087   17070.596     0.000  263253.000
EDUC      Education   3559      12.932       2.899     4.000      19.000
------------------------------------------------------------------------

GENDER = Male

Variable  Label          N        Mean     Std Dev   Minimum     Maximum
------------------------------------------------------------------------
AGE       Age         3080      47.558      15.864    25.000      90.000
INCOME    Income      3080   33484.577   27520.481     0.000  251998.000
EDUC      Education   3080      13.165       3.011     4.000      19.000
------------------------------------------------------------------------

Page 36: Output from a PROC TABULATE example

Output 1.3

-----------------------------------------------------
|                  |       GENDER        |          |
|                  |---------------------|          |
|                  |  Female  |   Male   |   ALL    |
|------------------+----------+----------+----------|
|Age        MEAN   |    49.53 |    47.56 |    48.61 |
|------------------+----------+----------+----------|
|Income     MEAN   | 17780.09 | 33484.58 | 25065.80 |
|------------------+----------+----------+----------|
|Education  MEAN   |    12.93 |    13.17 |    13.04 |
-----------------------------------------------------