A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2...

22
1 A 701 SAS Primer Introduction Introduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides extensive capabilities to both process and analyze data. This document attempts to provide some initial structure so that a user who is unfamiliar with SAS can get started. Overview Overview SAS processing of input is essentially divided into two steps: (1) the DATA step; and, (2) the PROC step. The DATA step reads in the names of variables, and the values for each observation on those variables. Data can be dropped, selected, and/or transformed in the DATA step. The PROC step represents one or more executions of a defined procedure intended to manipulate and/or analyze data. Note that within a single execution of SAS it is possible to have both multiple DATA steps - with multiple sets of data - and/or, multiple PROC steps. Indeed, it is possible to have a SAS PROC create data which will become available for subsequent analyses. The SAS language The SAS language SAS expects to process any of a variety of keywords that serve as commands. Many commands require the user provide additional information. All SAS commands end in a semi-colon! All commands and user-supplied information can be entered either in upper- or lower-case. SAS does not (typically) distinguish between the two cases. In the presentation below, the basic portion of a given SAS command is presented in all CAPITAL letters, while information to be provided by the user to the SAS command is in lower-case letters. Alphameric means any combination of alphabetic and numeric characters, in which the first character is alphabetic.

Transcript of A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2...

Page 1: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

1

A 701 SAS Primer

IntroductionIntroduction

SAS Version 9.2 is a powerful, very general purpose, 'high-level' programminglanguage. It provides extensive capabilities to both process and analyze data. Thisdocument attempts to provide some initial structure so that a user who is unfamiliar withSAS can get started.

OverviewOverview

SAS processing of input is essentially divided into two steps: (1) the DATA step;and, (2) the PROC step. The DATA step reads in the names of variables, and thevalues for each observation on those variables. Data can be dropped, selected, and/ortransformed in the DATA step. The PROC step represents one or more executions of adefined procedure intended to manipulate and/or analyze data.

Note that within a single execution of SAS it is possible to have both multipleDATA steps - with multiple sets of data - and/or, multiple PROC steps. Indeed, it ispossible to have a SAS PROC create data which will become available for subsequentanalyses.

The SAS languageThe SAS language

SAS expects to process any of a variety of keywords that serve as commands. Many commands require the user provide additional information. All SAS commandsend in a semi-colon!

All commands and user-supplied information can be entered either in upper- orlower-case. SAS does not (typically) distinguish between the two cases.

In the presentation below, the basic portion of a given SAS command ispresented in all CAPITAL letters, while information to be provided by the user to theSAS command is in lower-case letters.

Alphameric means any combination of alphabetic and numeric characters, inwhich the first character is alphabetic.

Page 2: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

2

The DATA step:

At a minimum, the DATA step requires 3 commands: DATA, INPUT, andDATALINES.

DATA dataset ;

where

DATA indicates to SAS the beginning of the DATA step; and,

`dataset` is a name of 8 or less alphameric characters that SAS will use to create andkeep track of a data set during its execution.

INPUT varname1 ... varname_ ;

where

INPUT tells SAS the names of each of the variables in the data set; and,

`varname1 ... varname_` are the names of the variables, each of which are 8 or lessalphameric characters.

DATALINES ;

where DATALINES tell SAS that the data for each observation follow immediately. (Asynonym for DATALINES is CARDS.)

The data can be most easily input in what is known as free-field format, that is,with one or more blank spaces separating each value. For each observation, its`varname1` value comes first, then its `varname2` value, and so forth. The values foreach observation begin on a new line.

The PROC step:

There exist a variety of PROCs available for statistical analysis. Each procedurebegins with the command PROC, followed by the name of the procedure. This initialcommand is often followed by any of a number of subcommands, which allow for usercontrol of the PROC.

Page 3: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

3

SAS GraphicsSAS Graphics

SAS procedures CORR, REG, and GLM support Output Delivery System (ODS)graphics functionality in SAS 9.2. Therefore, these procedures will automaticallyproduce high-quality graphics using SAS/GRAPH. Moreover, graphics output is alsounder control of the user.

In order to produce graphs and plots, (1) an output location for the graphs mustbe specified, and (2) ODS Graphics must be enabled:

ODS destination ;ODS GRAPHICS ON ;PROC name ;...RUN ;QUIT ;ODS GRAPHICS OFF ;ODS destination CLOSE ;

`destination` can be either HTML or RTF, which means that SAS will create an html orrtf file you can save and later edit;

`name` is CORR, REG, or GLM.

SAS ProceduresSAS Procedures

For our purposes, there are seven PROCs we will use: PRINT, PLOT, CHART,CORR, REG, GLM, and IML. A brief introduction to each of these PROCs follows.

PROC PRINT:

PRINT can be used to list data values for each observation, as:

PROC PRINT DATA=dataset options ;VAR varname1 ... varname_ ;

where

`dataset` is the name of the SAS dataset, created in a DATA step, or by a previously-executed PROC;

`options` include:

Page 4: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

4

LABEL uses the variable labels as the column headings;

N prints a line at the end of the output giving the number of observationsin the dataset; and,

NOOBS omits printing of the observation number preceding eachobservation in the dataset.

`varname1 ... varname_` are the names of (one or more) variables whose values are tobe printed. (Omitting the VAR subcommand will produce a listing of all variables for allcases.)

PROC PLOT:

PLOT produces any of a variety of Euclidean (i.e., X-Y) plots, as:

PROC PLOT DATA=dataset ;PLOT yvarname*xvarname / options ; and/orPLOT yvarname*xvarname = 'symbol'/ options ; and/orPLOT yvarname*xvarname = zvarname / options ;

where

`dataset` is the name of the SAS dataset, created in a DATA step, or by a previously-executed PROC; and,

`yvarname` is the name of the variable to be plotted on the Y-axis;

`xvarname` is the name of the variable to be plotted on the X-axis;

`'symbol'` is a single alphameric character, enclosed in single quotes, with which eachpoint will be represented in the plot; and,

`zvarname` is the name of a variable whose values will mark each point in the plot; and,

`options` include:

OVERLAY indicates that all the plots requested with a PLOTsubcommand are to be printed on a single page.

Note that it is possible to use multiple PLOT subcommands in a single executionof PLOT.

Note that PLOT produces low-resolution, dot-matrix-style plots. The

Page 5: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

5

SAS/GRAPH module contains PROC GPLOT, which produces the same plots asPLOT, in high-resolution, presentation-quality format. PROC GPLOT is extremelypowerful and flexible, and control of the features of the plot are under direct control ofthe programmer.

PROC CHART:

CHART produces a variety of bar graphs and charts, both vertical and horizontal,as:

PROC CHART DATA=dataset ;VBAR effect / options ; and/orHBAR effect / options ;

where

`dataset` is the name of the SAS dataset, created in a DATA step, or by a previously-executed PROC;

`effect` is the name of (one or more) variables whose values are to be plotted; and,

`options` include:

DISCRETE indicates the variable is discrete rather than continuous;

TYPE=MEAN indicates that each bar should be drawn at the mean of thevarname specified in the SUMVAR subcommand;

SUMVAR=varname

where

`varname` is the name of the variable to be plotted in the bar chart;

MIDPOINTS=values

where

`values` define midpoints of the range of values for the chart variable overwhich each bar will be drawn;

REF=value

where

Page 6: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

6

`value` is a number, at which point a reference line will be drawn on theresponse axis.

Note that it is possible to use multiple VBAR and/or HBAR subcommands in asingle execution of CHART.

Note that CHART produces low-resolution, dot-matrix-style charts and graphs. The SAS/GRAPH module contains PROC GCHART, which produces the same chartsand graphs as CHART, in high-resolution, presentation-quality format. PROC GCHARTis extremely powerful and flexible, and control of the features of the chart are underdirect control of the programmer.

PROC CORR:

CORR will provide Pearson product-moment correlations among a set ofvariables, together with some descriptive statistics and probability values, as:

PROC CORR DATA=dataset options ;VAR varname1 ... varname_ ;PARTIAL pvar1 ... pvar_ ;

where

`dataset` is the name of the SAS dataset, created in a DATA step, or by a previously-executed PROC; and,

`options` include:

PLOT=MATRIX(moptions) prints a scatterplot matrix for variables.

`moptions` include:

HISTOGRAM displays histograms of the variables in thematrix plot.

PLOT=SCATTER(soptions) prints scatterplots for pairs of variables.

`soptions` include:

ALPHA=n

where

`n` is a value between 0 and 1, and defines the alpha level

Page 7: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

7

for the confidence or prediction ellipses to be displayed inthe scatterplots.

ELLIPSE=eoptions

where

`eoptions` include:

PREDICTION to request prediction ellipses fornew observations, and/or;

CONFIDENCE to request confidence ellipsesfor the mean to be displayed in thescatterplots.

COV prints the dispersion matrix for the variables specified on the VARstatement (If a PARTIAL subcommand is used, the partialled dispersionmatrix is printed);

CSSCP prints the corrected sums-of-squares-and-cross-products matrixfor the variables specified on the VAR statement (If a PARTIALsubcommand is used, both the unpartialled and partialled matrices areprinted); and,

SSCP prints the uncorrected sums-of-squares-and-cross-products matrixfor the variables specified on the VAR statement (If a PARTIALsubcommand is used, only the unpartialled matrix is printed).

`varname1 ... varname_` are the names of (two or more) variables whoseintercorrelations are to be determined. (Omitting the VAR subcommand will produce acorrelation matrix of all variables.)

`pvar1 ... pvar_` are the names of one or more variables whose effects will be removedfrom the relationships among the variables specified on the VAR statement.

PROC REG:

REG is a very general-purpose ordinary least squares regression procedure,which provides a wide choice of analysis and output options. The basic approach wouldbe:

PROC REG DATA=dataset poptions ;label: MODEL crit1 ... crit_ = pred1 ... pred_ / options ;

Page 8: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

8

tlabel: TEST effects / toption ;mlabel: MTEST equations / moptions ;

where

`dataset` is the name of the SAS dataset, created in a DATA step, or by a previously-executed PROC;

`poptions` include:

CORR prints the intercorrelations of the variables in the MODELstatement;

SIMPLE prints various descriptive statistics for the variables in theMODEL statement;

USSCP prints the uncorrected sums-of-squares-and-cross-productsmatrix for the variables specified on the VAR statement; and,

LINEPRINTER specifies that the PLOT subcommand (see below) producelow-resolution, dot-matrix-type scatterplots, rather than high-resolutionscatterplots. [This option does not affect the PLOT= option, which willalways produce high-resolution graphics using ODS GRAPHICS.]

`label` is a name of 8 or less alphameric characters that yields an output label for themodel tested;

`crit1 ... crit_` is the name of one or more criterion variables;

`pred1 ... pred_` is the name of none or more predictor variables. (Note that it ispossible to specify no effects, in which case only an intercept term is included in themodel. In this case, the equal sign must still be included on the MODEL statement. Note that INTERCEPT cannot be included as the name of a predictor variable.)

`options` include:

NOINT excludes the intercept term from the model;

XPX prints the sums-of-squares-and-cross-products matrix;

I prints the inverse of sums-of-squares-and-cross-products of the predictors;

COLLIN prints collinearity diagnostics;

Page 9: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

9

CORRB prints the correlations of the estimates;

COVB prints the dispersion matrix of the estimates;

PCORR2 prints the squared partial correlation coefficients;

SCORR2 prints the squared semi-partial correlation coefficients;

SS2 prints the unique sums of squares associated with each predictor;

STB prints the standardized partial regression coefficients;

TOL prints the tolerance values for the estimates;

VIF prints the variance inflation factors for the estimates;

P prints the predicted and residual value for each observation;

R prints an analysis of the residuals (subsumes P);

INFLUENCE prints a detailed summary of the influence of each observation onboth the estimates and the predicted values;

CLI prints prediction intervals for individual predicted values for eachobservation; and,

CLM prints confidence intervals for a mean predicted value for eachobservation.

`tlabel` is a name of 8 or less alphameric characters that yields an output label for aneffect tested;

`effects` is the name of one or more predictor variables, followed by an equal sign, anda hypothesized value [for example: if X1 and X2 were predictors, TEST X1 = 0 wouldtest the hypothesis that the regression coefficient associated with the predictor variableX1 equals zero; or TEST X1 - X2 = 0 would test the hypothesis that the regressioncoefficient associated with X1 equals the regression coefficient associated with X2].(Note that if an intercept term is included in the model, then it may be specified as apredictor variable using the name INTERCEPT.)

`toption` is:

PRINT prints intermediate calculations.

`mlabel` is a name of 8 or less alphameric characters that yields an output label for the

Page 10: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

10

equation tested;

`equation` specifies a linear function of the criteria and/or the predictors suitable fortesting [for example: if Y1 and Y2 were criteria, and X1 and X2 were predictors, MTESTX1, X2 would test that the regression coefficients for X1 and X2 are zero for Y1 and Y2;or, MTEST Y1 - Y2, X1 would test the hypothesis that the regression coefficientassociated with X1 is the same for both Y1 and Y2]. (Note that if an intercept term isincluded in the model, then it may be specified as a predictor variable using the nameINTERCEPT.); and,

`moptions` include:

CANPRINT prints the canonical correlation(s) for the linear function;

DETAILS prints various intermediate calculations;

MSTAT=EXACT produces exact multivariate test statistics; and,

PRINT prints the hypothesis and error sums-of-squares-and-cross-products matrices.

Note that it is also possible to use REG to plot variables, simply by including thefollowing command:

PLOT yvarname*xvarname='symbol ' ;

where

`yvarname` is the name of the variable to be plotted on the Y-axis;

`xvarname` is the name of the variable to be plotted on the X-axis; and,

`'symbol'` is a single alphameric character, enclosed in single quotes, with which eachpoint will be represented in the plot;

Note that it is possible to include the observation number, predicted values, orresiduals in a PLOT request by using the special `xvarname` or `yvarname` of obs., p.and/or r., respectively.

Note that it is also possible to use REG to create a dataset containing the valuesof one or more variables, simply by including the following command:

OUTPUT OUT=regdata P=predname R=residname ;

Page 11: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

11

where

`regdata` is the name of a SAS data set containing all the variables in the input data set,plus the predicted values for each observation, `predname`, and the residual values foreach observation, `residname`. This data set may then be referenced later, and its dataprinted, plotted, and so forth.

`predname` is an 8 character alphameric variable name for the predicted values foreach subject; and,

`residname` is an 8 character alphameric variable name for the residual values for eachsubject.

Note that it is possible to use multiple TEST, MTEST, and PLOT subcommandsin a single execution of REG.

PROC GLM:

GLM represents an exceedingly general implementation of the General LinearModel (hence its name), and thus, has both considerable power and flexibility. It willconduct, among other analyses, simple and multiple regression, univariate andmultivariate analysis of variance and covariance, and canonical correlation:

PROC GLM DATA=dataset options ;CLASS factor(s) ;MODEL crit1 ... crit_ = effect1 ... effect_ / doptions ;RANDOM reffect(s) / roptions ;TEST H=teffects E=eeffect ;ESTIMATE 'label' ceffect(s) coefficients / coptions ;CONTRAST 'label' ceffect(s) coefficients / coptions ;LSMEANS leffect(s) / loptions ;MANOVA H=heffects E=eeffect M=mtransformation / moptions ;REPEATED weffect wlevels wtransformation / woptions ;

where

`dataset` is the name of the SAS dataset, created in a DATA step, or by a previously-executed PROC;

`options` include:

ORDER=INTERNAL, which indicates that the levels of the factor(s) areorganized according to the values read in the input data. Otherwise, ifPROC FORMAT is used, levels of the factor(s) are organized according to

Page 12: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

12

the alphameric order of the formatted values.

PLOT=plot request (options), which indicates any of a variety of plots thatare available, depending on the statistical model. Plot requests includeANCOVAPLOT, ANOMPLOT, BOXPLOT, COUNTOURFIT,CONTROLPLOT, DIAGNOSTICS, DIFFPLOT, FITPLOT, INTPLOT,MEANPLOT, and RESIDUALS. [Note that if ODS Graphics are enabled,GLM will automatically print what it considers the most appropriate plotsfor the specified statistical model.]

`factors` are the names of (one or more) variables whose values represent levels of thefactor(s);

`crit1 ... crit_` is the name of one or more criterion variables;

`effect1 ... effect_` is the name of none or more effects to be included in the model. These effects can include: main effects – specified as the names of the factors [forexample: A]; interactions – specified with a asterisk between the factor names [forexample: A*B]; nested factors – specified as the name of the factor, followedimmediately by parentheses around the name the factor within which it is nested [forexample: B(A), meaning B is nested within A]; and, continuous variables [for example:X1, X2]. (Note that it is possible to specify no effects, in which case only an interceptterm is included in the model. In this case, the equal sign must still be included on theMODEL statement. Note that INTERCEPT cannot be included as the name of apredictor variable.)

`doptions` include:

INT prints the tests of the intercept term in the model;

NOINT excludes the intercept term from the model (default is to includethe intercept term, but not to print any tests of significance associated withit);

NOUNI suppresses printing of the univariate tests of significance (oftenuseful when the REPEATED subcommand is used);

SOLUTION prints the parameter estimates;

EFFECTSIZE produces effect size information, including the estimates forthe semipartial-ω2 and the partial-ω2 and estimates and confidenceintervals for the semipartial-η2 and the partial-η2 for each fixed effect andspecified CONTRASTs on fixed effects in the model;

E prints the general form of contrast matrix that is defined by the design

Page 13: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

13

specified by the terms on the right-hand-side of the equation on theMODEL statement;

E1, E2, and/or E3 print the corresponding contrast matrix;

SS1, SS2, and/or SS3 print the corresponding sums of squares and Fvalue for each effect in the model;

P prints the predicted and residual value for each observation;

R prints an analysis of the residuals (subsumes P);

CLI prints prediction intervals for individual predicted values for eachobservation;

CLM prints confidence intervals for a mean predicted value for eachobservation;

XPX prints the sums-of-squares-and-cross-products matrix; and,

I prints the inverse of sums-of-squares-and-cross-products of the predictors.

`reffect(s)` is the name of one or more effects – main, interaction, or nested – includedin the MODEL statement;

`roptions` include:

Q prints all quadratic forms in the fixed effects that appear in the expectedmean squares; and,

TEST requests tests of significance for the random effects in the model.

`teffects` is the name of one or more effects which will be tested against `eeffect`;

`eeffect` is the name of an effect which will serve as the error term only for the `heffects`given on the TEST or MANOVA subcommand;

`'label'` is an alphameric label of 20 or less characters, used on the output to identify the estimate or contrast;

`ceffect(s)` is the name of one or more effects that appear in the MODEL statement,upon which an estimate will be obtained and/or a contrast will be drawn. (Note that ifan intercept term is included in the model, it may be specified as an effect using thename INTERCEPT.);

Page 14: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

14

`coefficients` are numbers that indicate exactly what means are to be compared;

`coptions` include:

E prints the entire contrast matrix implicitly defined by the CONTRAST orESTIMATE.

`leffect(s)` is the name of one or more effects in the model for which least-squaresmeans are to be calculated;

`loptions` include:

STDERR prints the standard error of the least squares means and the pvalues for the hypothesis that the least squares mean equals zero;

CL prints confidence limits for each of the least squares means;

ALPHA=n

where

`n` is a value between 0 and 1, and defines the alpha level for theconfidence limits for the least squares means; and,

PDIFF prints the p values for all possible simple comparisons of themeans for the `leffect(s)`;

TDIFF prints the t values and corresponding (adjusted) p values for allpossible simple comparisons of the means for the `leffect(s)`;

ADJUST=amethod

where

`amethod` can include either BON, SIDAK, TUKEY, SCHEFFE, and yieldsp values associated with the tests of the least squares means adjusted forthe chosen multiple comparison procedure. [Note that the PDIFF orTDIFF option must be specified along with ADJUST.]

SLICE=seffect or SLICE=(seffects)

where

`seffect` is the name of a fixed effect or effects by which the interaction`leffect(s)` are divided, yielding simple effects tests for the `leffect(s)` as a

Page 15: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

15

function of the `seffect`. [for example: if A*B is an effect in the model, thenSLICE=A would test the test the simple main effects of B within each levelof A, while SLICE=(A B) would test the simple main effects of B withineach level of A, and the simple main effects of A within each level of B]. [Note that ADJUST does not adjust the p values for any tests associatedwith SLICE.]

OUTPUT OUT=lsmeans

where

`lsmeans` is the name of a SAS dataset containing the means of thecriteria by the `leffect(s)` . This data set may then be referenced later,and its data printed, plotted, and so forth.

`heffects` is the name of one or more effects which will be tested against `eeffect`. (Note that if an intercept term is included in the model, it may be specified as an effectusing the name INTERCEPT.)

`mtransformation` defines a transformation of the criterion variables. It can be specifiedin one of two ways. Either a set of equations is given, each equation separated bycommas, where the equations represent linear combinations of the criteria [for example:if Y1, Y2, and Y3 were the criteria, M = -Y1 + Y3, Y1 - 2*Y2 + Y3 would yield two newvariables representing the linear and quadratic terms associated with the criteria]; or atransformation matrix, enclosed in parentheses, is specified directly [for example: M = ( -1 0 1 , 1 -2 1) yields the same two transformed variables representing the linear andquadratic terms as the previous example].

`moptions` include:

CANONICAL prints the canonical decomposition of the effect;

MSTAT=EXACT produces exact multivariate test statistics;

PRINTE prints the error sums-of-squares-and-cross-products matrix, andif, this matrix is the residual matrix from the full model, then the partialcorrelations among the criteria is also printed;

PRINTH prints the hypothesis sums-of-squares-and-cross-productsmatrix; and,

SUMMARY prints an ANOVA table for each criterion variable (Note that ifa transformation of the criteria have been specified with the Msubcommand, the tables apply to the transformed variables).

Page 16: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

16

`weffect` is the name to be given to a within-subject factor;

`wlevels` indicates the number of levels associated with the within-subject factor; and,

`wtransformation` defines the type of contrasts used to define the transformed variablesassociated with the within-subjects effect. Methods include:

CONTRAST(n) generates dummy-coded transformed variables, where thereference level is defined by n;

POLYNOMIAL generates variables transformed on the basis of orthogonalpolynomials; and,

PROFILE generates transformed variables based on adjacent differences.

`woptions` include:

CANONICAL prints the canonical decomposition of the transformedvariables associated with each within-subject effect;

MSTAT=EXACT produces exact multivariate test statistics;

NOM suppresses printing of the multivariate tests of significance;

NOU suppresses printing of the univariate tests of significance;

PRINTE prints the error sums-of-squares-and-cross-products matrix andpartial correlation matrix for both the untransformed and transformedvariables for each within-subject effect, as well as sphericity tests for thetransformed variables;

PRINTH prints the hypothesis sums-of-squares-and-cross-products matrixfor the tranformed variables associated with each within-subjects effect;

PRINTM prints the transformation matrix that defines each within-subjectseffect; and,

SUMMARY prints an ANOVA table for each transformed variable.

Note that it is also possible to use GLM to create a dataset containing the valuesof one or more variables, simply by including the following command:

OUTPUT OUT=glmdata P=predname R=residname ;

where

Page 17: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

17

`glmdata` is the name of a SAS data set containing all the variables in the input dataset, plus the predicted values for each observation, `predname`, and the residual valuesfor each observation, `residname`. This data set may then be referenced later, and itsdata printed, plotted, and so forth.

`predname` is an 8 character alphameric variable name for the predicted values foreach subject; and,

`residname` is an 8 character alphameric variable name for the residual values for eachsubject.

Note that it is possible to use multiple ESTIMATE and CONTRASTsubcommands in a single execution of GLM.

PROC IML:

IML is an abbreviation for Interactive Matrix Language. IML is an exceedinglypowerful and flexible procedure that can perform a variety of matrix operations. Generalrules are:

Matrix names follow the same naming convention as do variables in SAS – 8 orless alphameric characters. And, the semi-colon is still required at the end of eachstatement.

Matrices can be read directly into the program, within braces, rows demarked bya comma, as:

A = {1 1, 2 2, 3 3} ;

which specifies the element values for a 3 x 2 matrix.

Matrix operations include:

= assignment

(..) precedence [order of operation]

+ add two matrices

- subtract two matrices

* multiply two matrices

Page 18: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

18

# multiply the elements in the first matrix by the corresponding elements inthe second matrix

/ divide the elements in the first matrix by the corresponding elements in thesecond matrix

` transpose a matrix (Note this is the grave accent, not the single quote)

@ direct or Kronecker product operator

|| join two matrices horizontally

// join two matrices vertically

Matrix functions include:

DET(A) which returns the determinant of square A

DIAG(A) which sets all off-diagonal elements of square A to zero

DIAG(a) which yields a diagonal matrix with elements equal to vector a

EIGVAL(A) which yields the eigenvalues of symmetric A

EIGVEC(A) which yields the eigenvectors of symmetric A

INV(A) which inverts square A

GINV(A) which yields the generalized inverse of A

NCOL(A) which returns the number of columns in A

NROW(A) which returns the number of rows in A

ROOT(A) which yields the Cholesky decomposition of symmetric A

SQRT(A) which takes the square root of each element in A

SWEEP(A) which sweeps the matrix A on the main diagonal

TRACE(A) which returns the trace of A

VECDIAG(A) creates a column vector of the diagonal elements of square A

Page 19: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

19

Finally, results for any operation can be printed:

PRINT ‘label’ ,, A ;

where `’label’` is the output label associated with the matrix A.

Page 20: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

20

In closing, please recognize that this introduction only presents the barest ofessentials. Hence, many of the capabilities of a given PROC are ignored. And, thereare better (where better means both faster and more efficient) ways of accomplishingmany of the tasks required throughout the semester. However, what's presented hereinwill get you started – and will also get you finished – without a degree in statistics orcomputer science.

Want to know even more? SAS is very glad you asked. At last count, the SASInstitute, marketers of SAS, was selling in excess of 100 different manuals for the SASSystem. Perhaps the first several to purchase might be:

SAS Introductory Guide

SAS Language: Reference

SAS Language and Procedures: Introduction

SAS Language and Procedures: Usage 1 and 2

SAS Procedures Guide

SAS/STAT User's Guide, Volume 1 and 2

A Step-by-Step Approach to Using the SAS System for Univariate andMultivariate Statistics

Applied Multivariate Statistics with SAS Software

SAS/IML User’s Guide

All are typically on sale in the UMCP Bookstore.

Want it all? Go to: www.sas.com

Page 21: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

21

Postscript

Like all statistical programs, SAS makes certain decisions in what its outputshould be, and how it should be labeled. This section briefly highlights some of theseassumptions which may be either non-standard or atypical in their usage.

In PROC IML:

Note that EIGEN and EIGVEC return the normalized eigenvectors, not the non-normalized eigenvectors.

In PROC REG:

In the MTEST command, the “Eigenvectors” are transposed, that is, the rows arethe variates and the columns are the variables; and, they differ from UT, the total sampleunstandardized canonical variate coefficients, by a factor of sqrt(N-1). The“Eigenvalues” equal θ, that is, canonical R2.

In PROC GLM:

The use of the SS2 option in the MODEL statement when there is an interactionbetween a factor and a covariate does not produce the sum of squares (and hence, testof significance) one might expect for the factor. Rather than estimating and testing thefactor adjusted for the covariate and ignoring the presence of the interaction effect, thefactor is adjusted for both the covariate and the interaction effect. Thus, the test of thefactor in this circumstance is the same as its test were SS3 specified. In contrast, thetest of the covariate is, in fact, adjusted for the factor but not for the interaction effect.

In the presence of covariate(s), the LSMEANS command has as its default thecalculation of estimated means at the mean of the covariate(s). This is a frequentapproach to estimating adjusted means, and is often appropriate. However, in certaincircumstances, particularly involving interactions between a factor and a covariate, thisassumption produces what might best be termed ‘Type I means’, that is, the means forthe effect unadjusted for the covariate(s). To get ‘Type III means’, the covariates mustbe evaluated with the option “AT covariate = 0".

Page 22: A 701 SAS Primer - Kevin E. O'Grady, PhD1 A 701 SAS Primer IntroductionIntroduction SAS Version 9.2 is a powerful, very general purpose, 'high-level' programming language. It provides

22

Author:

Kevin E. O’Grady, PhDDepartment of PsychologyUniversity of MarylandCollege Park, MD301-405-5927 (voice)[email protected]

Date Prepared:

28 December 2009