BIOMETRICS INFORMATION - British Columbia

BIOMETRICS INFORMATION HANDBOOK NO. 3 MARCH 1992 Guidelines for the Statistical Analysis of Forest Vegetation Management Data Biometrics Information Handbook Series BC Ministry of Forests


Guidelines for the Statistical Analysis of Forest Vegetation Management Data

by
Amanda F. Linnell Nemec

International Statistics and Research Corporation
P.O. Box 496
Brentwood Bay, B.C.
V0S 1A0

March 1992

Series Editor
Wendy Bergerud

Forest Science Research Branch
B.C. Ministry of Forests
31 Bastion Square
Victoria, B.C.
V8W 3E7

Partial funding for this research project and the cost of printing this publication were provided by the Canada-British Columbia Partnership Agreement on Forest Resource Development: FRDA II.

Canadian Cataloguing in Publication Data

Nemec, Amanda F. Linnell (Amanda Frances Linnell)
Guidelines for the statistical analysis of forest vegetation management data

(Biometrics information handbook series, ISSN 1183-9759 ; no. 3)

Includes bibliographical references: p.
ISBN 0-7718-9138-5

1. Forest management - Statistical methods. 2. Forests and forestry - Statistical methods. 3. Forest site quality - Statistical methods. I. British Columbia. Ministry of Forests. II. Title. III. Series.

SD387.S73N45 1992 634.9’2’072 C92-092015-2

1992 Province of British Columbia
Published by the
Forest Science Research Branch
Ministry of Forests
31 Bastion Square
Victoria, B.C. V8W 3E7

Copies of this and other Ministry of Forests titles are available from Crown Publications Inc., 546 Yates Street, Victoria, B.C. V8W 1K8.

SUMMARY

This report presents a set of guidelines for the statistical analysis of forest vegetation management data. Such data typically comprise repeated (over time) measurements of both crop trees and non-crop vegetation. The response variables include continuous (e.g., height, diameter, cover, etc.) and categorical (e.g., survival, species occurrence, condition, etc.) variables. The recommended methods of analysis are: graphical methods; analysis of variance methods, in particular, multivariate methods for the analysis of repeated measurements (or growth curves); multiple comparisons methods; and methods based on log-linear models. An analysis of power is also recommended to assist in the interpretation of the results. Generic SAS programs for carrying out these analyses are provided.

TABLE OF CONTENTS

SUMMARY

1 INTRODUCTION

2 DESIGN OF FOREST VEGETATION MANAGEMENT TRIALS

3 TREATMENT RESPONSE VARIABLES

4 STATISTICAL ANALYSIS OF FOREST VEGETATION MANAGEMENT DATA

4.1 Summary Statistics and Exploratory Methods

4.2 Model Selection and Model Fitting
    4.2.1 Analysis of response of non-crop vegetation
    4.2.2 Analysis of response of crop trees

4.3 Model Verification

4.4 Power Analysis and Sample Size Calculations
    4.4.1 Example

4.5 Interpretation and Summary of Results

4.6 SAS Programs
    4.6.1 Summary statistics and exploratory data analysis
    4.6.2 Repeated measures analysis and univariate ANOVA
    4.6.3 Log-linear analysis
    4.6.4 Power analysis

4.7 Example
    4.7.1 Univariate ANOVA of pre-treatment heights of crop trees (1985)
    4.7.2 Repeated measures ANOVA of heights of crop trees (1985–1988)
    4.7.3 Log-linear analysis of frequency of occurrence of target species

5 CONCLUSIONS

APPENDIX 1 SAS programs for the analysis of forest vegetation management data

A Univariate ANOVA and repeated measures analysis of continuous response variables

B Log-linear (logistic) analysis of categorical response variables

C FPOWTAB: power analysis for general linear models

D CPOWTAB: power analysis for log-linear models

REFERENCES

TABLES

1 Response variables for forest vegetation management trials

2 ANOVA for a single response variable (Y)

3 Comparison of six multiple comparisons methods

4 Univariate repeated measures ANOVA (completely randomized design)

5 Hypothetical frequencies (number of subplots out of 60) of occurrence of a target species

6 Hypothesis testing: types of errors

7 Hypothetical treatment effects for power calculation

8 Data for herbicide trial

FIGURES

1 Comparison of completely randomized and randomized block designs

2 Plot of mean cover of vegetation versus time for four treatments (A,B,C,D) and a control group

3 Plot of frequency of occurrence of target species versus time for four treatments (A,B,C,D) and a control group

4 Hypothetical plot of treatment mean versus time

5 Hypothetical plot of vegetation cover versus time

6 Hypothetical (exponential) height growth curves for crop trees

7 Output from FPOWTAB

8 SAS output for herbicide trial: univariate ANOVA of heights of crop trees, 1985

9 SAS output for herbicide trial: analysis of 1985 height residuals

10 SAS output for herbicide trial: repeated measures ANOVA of heights of crop trees

11 SAS output for herbicide trial: repeated measures ANOVA of plot averages

12 SAS output for herbicide trial: log-linear analysis of frequency of occurrence of target species

1 INTRODUCTION

Pollack and Herring (1985) and Herring and Pollack (1985) provide a detailed set of guidelines for the design and analysis of Level A and Level B vegetation management trials. Both types of trials are used to investigate the effects of various treatments in the implementation of one of the three basic competition control strategies: site preparation, stand establishment, and stand release. Level A trials are less intensive than Level B trials and are generally used for screening new treatments, and for confirming previous treatment recommendations. Level B trials are used to carry out more comprehensive assessments of potentially useful treatments. The purpose of this report, which is intended as a supplement to Pollack and Herring (1985) and Herring and Pollack (1985), is to provide a general set of guidelines for the statistical analysis of Level A and Level B trials.

The design of a vegetation management study (i.e., the method of randomization and replication, the type of measurements, etc.) is obviously important in determining the appropriate method of analysis. The guidelines given in this report are based on the assumption that the design conforms to the recommendations set forth by Pollack and Herring (1985) and Herring and Pollack (1985). The features of the design that affect the statistical analysis are discussed briefly in Section 2.

In addition to the design, the types of variables to be analysed must be considered in the selection of the statistical methods. The variables that are commonly encountered in vegetation management trials are summarized in Section 3. These include both continuous variables (e.g., height, diameter, cover) and categorical (or coded) variables (e.g., condition). In general, different methods must be employed for the statistical analysis of the two types of data.

The single most important consideration in any statistical analysis is the goal of the study. Pollack and Herring (1985) state that all forest vegetation management trials have the same basic objectives:

• “to identify site and stand treatments which manipulate vegetation on forest sites to favour the establishment and growth of commercial tree species, and to determine further research needs associated with their use” and

• “to determine the impact of these treatments on growth of commercial tree species and competing vegetation species or communities.”

For the purposes of this report, a narrower objective will be defined. It will be assumed that the primary objective is to compare several treatments (including a control) in terms of their effect on the height, cover, condition, and occurrence of non-crop vegetation, and in terms of the direct (e.g., herbicide damage) and indirect (e.g., changes in the growth rate as a result of reduced competition) effects on crop trees. This objective is limited in that it does not require an investigation of the (causative) relationship between the response of the non-crop vegetation and that of the crop trees, or such other issues as the competition between the various non-crop species. Consequently, the general approach described here includes a separate analysis of the response of the non-crop vegetation and that of the crop trees. The recommended methods of analysis for the two responses are discussed in Section 4. A collection of generic SAS programs for performing the analyses is given in Appendix 1.

2 DESIGN OF FOREST VEGETATION MANAGEMENT TRIALS

Throughout this report, it will be assumed that a completely randomized design or a randomized block design, as prescribed by Pollack and Herring (1985) and Herring and Pollack (1985), is used. The two designs are depicted in Figure 1 for the case of three treatments (A,B,C), each of which is applied to three plots (nine plots in total). In the completely randomized design (Figure 1A), the treatments are assigned at random to any of the nine plots. In the randomized block design (Figure 1B), the plots are grouped into three homogeneous blocks (three plots each) and the randomization is constrained so that each treatment is assigned to one plot within each block. The blocks need not be physically contiguous. In both designs, the plots are typically divided into an equal number of subplots (usually 20–30) and the sampling unit is either a single specimen (e.g., a crop tree or a non-crop tree) within each subplot, or, in the case of herbaceous vegetation, the subplot itself (i.e., the measurements are averages for the subplots). In either case, the sampling unit will be referred to as a “subplot.” For an overview of the design of forestry trials, the reader is referred to Stafford (1985), who discusses the completely randomized and randomized block designs, as well as the split-plot design.

(A) Completely randomized design

A B B
C C A
B A C

(B) Randomized block design

C A B   Block 1
B C A   Block 2
B A C   Block 3

FIGURE 1. Comparison of completely randomized and randomized block designs.
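The two randomization schemes can be illustrated with a short sketch. The Python code below is not part of the handbook (which supplies SAS programs); the function names and the seed value are hypothetical, and the layout sizes simply mirror the three-treatment, nine-plot case of Figure 1.

```python
import random

def completely_randomized(treatments, plots_per_treatment, seed=None):
    """Assign each treatment to `plots_per_treatment` plots chosen at random."""
    rng = random.Random(seed)
    labels = [t for t in treatments for _ in range(plots_per_treatment)]
    rng.shuffle(labels)  # any treatment may land on any plot
    return labels

def randomized_block(treatments, n_blocks, seed=None):
    """Assign every treatment exactly once within each block, in random order."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)
        rng.shuffle(block)  # randomization is constrained to the block
        layout.append(block)
    return layout

# Hypothetical example: three treatments, three plots per treatment.
crd = completely_randomized("ABC", 3, seed=1)
rbd = randomized_block("ABC", 3, seed=1)
print("Completely randomized:", crd)
for number, block in enumerate(rbd, start=1):
    print("Block", number, block)
```

In the completely randomized layout a treatment may, by chance, cluster in one part of the study area; in the blocked layout each block is guaranteed to contain every treatment once.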

The completely randomized design has three sources of variation in the sampling units: 1) differences due to the treatments; 2) random variation between plots receiving the same treatment; and 3) random variation between subplots nested within a given plot. The randomized block design has five potential sources of variation: 1) differences due to the treatments; 2) inherent differences between blocks; 3) differences due to the interactions between treatments and blocks; 4) random variation between plots in the same block and receiving the same treatment; and 5) random variation between subplots nested within a given plot. If there is only one plot per treatment per block in the randomized block design, it is generally assumed that there is no interaction, that is, the treatment effect is assumed to be the same in all blocks.

A randomized block design will usually be preferable to a completely randomized design whenever a large proportion of the overall variability in the response variable (which is not due to the treatments) can be attributed to the variation between blocks. This means that a substantial part of the variability that would remain unexplained in the completely randomized design is explained in the randomized block design. Consequently, there is a reduction in the error mean square, which makes it easier to detect a treatment effect without increasing the sample size. For example, blocking is likely to be useful when much of the variability in the response variable of interest (e.g., height or diameter growth of crop trees) is due to a gradient in the moisture or fertility of the soil. In such cases, it will probably be advantageous to divide the study area into blocks that have a similar moisture content or fertility. On the other hand, sub-dividing a homogeneous population into arbitrary blocks serves no useful purpose.

According to the recommendations of Pollack and Herring (1985) and Herring and Pollack (1985), the treatments in a vegetation management trial are assessed by measuring the same subplots on several different occasions (e.g., before the treatments are applied, one growing season later, two growing seasons later, etc.). Therefore, in addition to the treatment and plot effects, the effect of time must be considered. This type of design is often referred to as a “repeated measures design.” Although it may be informative to make a separate comparison of the treatments for each assessment (e.g., using the analyses of variance suggested by Herring and Pollack [1985]), questions related to the changes over time (e.g., Does the relative ranking of the treatments change with time? Is there a difference in the growth rates of the crop trees for the different treatment groups?) are obviously of interest as well. To answer these questions, special methods of analysis are required which take into account that the measurements are likely to be correlated over time.

It is important to note that the designs described by Pollack and Herring (1985) and Herring and Pollack (1985) are balanced. That is, there is an equal number of subplots per plot and an equal number of plots per treatment (or per treatment × block combination). In general, the analysis for balanced designs is simpler than it is for unbalanced designs. Therefore, special care must be taken when missing data renders the design unbalanced, or the sample sizes are unequal for some other reason. For example, in an analysis of variance with nested random effects, it is necessary to check that the usual F-tests are applicable when the sample sizes are unequal (e.g., see Dunn and Clark [1974] for a discussion of this point). The advice of a statistician should be sought in such cases.

Sample size is an important consideration in the design and analysis of vegetation management trials. The sample size must be sufficiently large to ensure that any treatment effect of practical significance has a high probability of detection. Sample size and power calculations should obviously be carried out at the planning stages of an analysis. However, they can also be valuable when one is interpreting the results of a data analysis. This will be discussed in more detail in “Power Analysis and Sample Size Calculations” (Section 4.4).

3 TREATMENT RESPONSE VARIABLES

Vegetation management trials typically involve the measurement of a relatively large number of variables. These can be separated into two groups: 1) variables that are used to assess the treatment response of the non-crop vegetation; and 2) variables that are used to assess the response of the crop trees. Both groups include variables that can reasonably be considered continuous (e.g., height, diameter, and cover), as well as categorical variables (e.g., the condition of the foliage, leader, and stem). Binary variables are a special type of categorical variable. They have only two values (e.g., dead or alive; target species present or absent) and arise in connection with the analysis of survival and species occurrence, or when a variable is re-coded by combining categories (e.g., damage cause might be coded simply as herbicide damage or no herbicide damage).

A list of the standard response variables, together with the names that will be used in the SAS programs (Section 4.6), is given in Table 1. The suffixes 1, 2, . . . , T are used to identify the assessment times. That is, HT_1 denotes the pre-treatment assessment of height, and HT_2 through HT_T denote the consecutive post-treatment assessments (usually T is 2 or 3). The arrays HT_1, HT_2, . . . , HT_T; DIAM_1, DIAM_2, . . . , DIAM_T, etc., are denoted HT_1–HT_T, DIAM_1–DIAM_T, etc.

In the case of the non-crop vegetation, the response variables are often evaluated for the overall vegetation, and for a small number of target species (three to five) identified at the outset of the study. For simplicity, and because the same general methods of analysis can be applied to both, no distinction is made here between the two sets of measurements. However, since the target species will not necessarily occur in every subplot, particularly after the treatments have been applied, the height and condition variables for the individual species are expected to have a number of missing values. The appropriate treatment of these missing values should be given careful consideration. For example, the usual problems of interpretation (e.g., the possibility of bias), and such technical difficulties as those associated with small or unequal sample sizes, must be tackled. On the other hand, the analysis of the cover for the individual species presents no special problems because the absence of a species implies that the cover is zero; that is, it is not a missing value.

Detailed definitions of all the variables, as well as the codes for the categorical variables, can be found in Herring and Pollack (1985).

TABLE 1. Response variables for forest vegetation management trials

(A) Non-crop vegetation response variables

Variable name        Definition            Type
HT_1 – HT_T          Plant height          Numeric (continuous)
COV_1 – COV_T        Plant cover           Numeric (continuous)
FOLI_1 – FOLI_T      Foliage condition     Character (categorical)
LEAD_1 – LEAD_T      Leader condition      Character (categorical)
STEM_1 – STEM_T      Stem condition        Character (categorical)
CAUS_1 – CAUS_T      Damage cause code     Character (categorical)
ECW_1 – ECW_T        ECW condition code    Numeric (categorical)
VIG_1 – VIG_T        Vigour code           Numeric (categorical)

(B) Crop tree response variables

Variable name        Definition            Type
HT_1 – HT_T          Tree height           Numeric (continuous)
DIAM_1 – DIAM_T      Tree diameter         Numeric (continuous)
FOLI_1 – FOLI_T      Foliage condition     Character (categorical)
LEAD_1 – LEAD_T      Leader condition      Character (categorical)
STEM_1 – STEM_T      Stem condition        Character (categorical)
CAUS_1 – CAUS_T      Damage code           Character (categorical)
ECW_1 – ECW_T        ECW condition code    Numeric (categorical)
TOP_1 – TOP_T        Overtopping code      Numeric (categorical)
VIG_1 – VIG_T        Vigour code           Numeric (categorical)

4 STATISTICAL ANALYSIS OF FOREST VEGETATION MANAGEMENT DATA

Before carrying out a statistical analysis, it is essential to have a precise statement of the objectives of the study or the hypotheses to be tested. In addition, the relevant details of the design should be reviewed and the variables of interest identified. Once these preliminaries have been completed, there are four main steps in a statistical analysis: 1) an exploratory analysis of the data, which usually consists of the computation of various summary statistics and graphical displays of the data; 2) the selection and fitting of a model on which statistical inferences (e.g., confidence intervals, tests of hypotheses) are to be based; 3) the verification of the model assumptions; and 4) the interpretation and summary of the results of the analysis.

4.1 Summary Statistics and Exploratory Methods

Summary statistics and graphical displays of the data are usually employed in the initial exploratory stages of a statistical analysis, where they are valuable for examining the distributional properties of the data. They are also useful for describing time trends and relationships between the variables (for numerous illustrations, see Tukey 1977; Chambers et al. 1983; Wilkinson 1988). The information derived from summary statistics and plots can be very helpful in determining an appropriate method of analysis.

In the case of the continuous response variables (see Table 1), it is useful to tabulate the mean, standard deviation, skewness, kurtosis, and sample size, by treatment (plot or block) and by assessment time. It is also useful to prepare boxplots (of the subplot values) for the corresponding subsets of the data. A subjective evaluation of the summary statistics and boxplots can reveal the presence of gross outliers (e.g., coding errors, keypunching errors, etc.), heterogeneity of the variances for the treatment groups, and departures from a normal (or symmetric) distribution. A plot of the treatment group means versus time (see Figure 2) is a convenient way of examining the effects of time. Such a time-plot, for example, can be helpful in determining whether or not there is an interaction between treatment and time (as indicated by a crossing over of the trend lines), and in examining the differences between the treatments, as well as overall time trends.
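As an illustration of the tabulation described above, the following Python sketch computes the five summary statistics by treatment and assessment time. It is not the handbook's SAS program: the height records are invented, the function names are hypothetical, and skewness and kurtosis use the simple moment definitions (excess kurtosis, no small-sample correction), which may differ slightly from the values a statistical package reports.

```python
from collections import defaultdict
from statistics import mean, stdev

def summarize(values):
    """Return n, mean, sd, moment skewness, and excess kurtosis of a sample."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - m) ** 4 for v in values) / n  # fourth central moment
    return {
        "n": n,
        "mean": m,
        "sd": stdev(values),          # sample standard deviation (n - 1)
        "skewness": m3 / m2 ** 1.5,   # 0 for a symmetric sample
        "kurtosis": m4 / m2 ** 2 - 3, # 0 for a normal distribution
    }

# Hypothetical subplot heights (cm): (treatment, assessment time, height).
records = [
    ("A", 1, 31.0), ("A", 1, 35.5), ("A", 1, 29.0), ("A", 1, 40.0),
    ("A", 2, 42.0), ("A", 2, 47.5), ("A", 2, 39.0), ("A", 2, 55.0),
    ("Control", 1, 30.0), ("Control", 1, 33.0), ("Control", 1, 36.5), ("Control", 1, 28.5),
    ("Control", 2, 34.0), ("Control", 2, 37.0), ("Control", 2, 41.0), ("Control", 2, 31.5),
]

groups = defaultdict(list)
for treatment, time, height in records:
    groups[(treatment, time)].append(height)

for key in sorted(groups):
    stats = summarize(groups[key])
    print(key, {name: round(value, 3) for name, value in stats.items()})
```

Each group must contain at least two values that are not all identical, otherwise the standard deviation or the moment ratios are undefined.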

FIGURE 2. Plot of mean cover of vegetation versus time for four treatments (A,B,C,D) and a control group.

Categorical variables (see Table 1) can be summarized by tabulating (in a multi-way contingency table with cells defined by treatment, time, etc.) the absolute or relative (percent) frequencies for the various response categories. If relative frequencies are used, the sample size should be specified. For clarity, it is sometimes advisable to combine several response categories. When cause of damage is being tabulated, for example, it may be more revealing to record the proportion of trees with herbicide damage, the proportion with some other type of damage, and the proportion that is healthy, rather than to tabulate frequencies for all 18 types of damage (see Herring and Pollack 1985). Frequency bar charts and time-plots of the proportions can be useful for depicting the differences between the treatment groups and the effects of time.

To tabulate response frequencies, it may be helpful to use the baseline values to correct for initial disparities between the treatment groups. For example, when the frequency of occurrence of a particular species is being examined, a more informative approach might be to tabulate separately the frequency for those subplots in which the species was initially present and for those subplots in which it was absent (see Figure 3).
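A minimal Python sketch of such a baseline-stratified tabulation is given here. The subplot records and the function name are hypothetical and serve only to show the bookkeeping; the handbook itself relies on SAS for this step.

```python
from collections import defaultdict

# Hypothetical subplot records: (treatment, present before treatment, present after).
records = [
    ("A", True, False), ("A", True, True), ("A", False, False), ("A", False, False),
    ("Control", True, True), ("Control", True, True),
    ("Control", False, True), ("Control", False, False),
]

def occurrence_by_baseline(records):
    """Count post-treatment occurrences separately for each baseline status."""
    counts = defaultdict(lambda: [0, 0])  # key -> [number present after, total]
    for treatment, pre, post in records:
        cell = counts[(treatment, pre)]
        cell[0] += int(post)
        cell[1] += 1
    return {key: tuple(cell) for key, cell in counts.items()}

table = occurrence_by_baseline(records)
for (treatment, pre), (present, total) in sorted(table.items()):
    status = "present" if pre else "absent"
    print(f"{treatment:8s} initially {status:8s} {present}/{total} subplots occupied")
```

Reporting the counts together with the totals keeps the sample size visible when the frequencies are later converted to proportions.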

4.2 Model Selection and Model Fitting

To make statistical inferences concerning the efficacy of the treatments, an appropriate probability model must be selected and fitted to the data. In general, the models that are likely to be most useful in the analysis of forest vegetation management data are analysis of variance models, including repeated measures models, and log-linear models. Analysis of variance (repeated measures) models can be used to analyse the continuous response variables; log-linear models can be used to analyse the categorical variables. Some general guidelines for the application of these models are given below. Although there is considerable overlap of the methodology, the non-crop vegetation and the crop trees are discussed separately, since different issues arise in connection with the analysis of the responses of each type of vegetation.

4.2.1 Analysis of response of non-crop vegetation

Height and cover

For the completely randomized design, a suitable model for the analysis of the continuous variables, height and cover, is:

$$y_{ijk}(t) = u(t) + a_i(t) + b_{j(i)}(t) + e_{ijk}(t) \qquad (1)$$

in which $y_{ijk}(t)$ is the measured height, or cover (for all species combined, or for a single target species) for treatment i, plot j, subplot k, and assessment time t;

$u(t)$ is the overall mean for time t;
$a_i(t)$ is the fixed effect of the ith treatment at time t;
$b_{j(i)}(t)$ is the random effect of plot j (nested within treatment i) at time t; and
$e_{ijk}(t)$ is the residual random error.

If the data for each assessment are analysed separately, then the model given by equation 1 reduces to the usual univariate, one-way nested analysis of variance (ANOVA) model. On the other hand, to examine the effects of time, the model can be interpreted as a multivariate model. That is, equation 1 becomes a system of equations (one for each assessment time t). The latter model is usually referred to as a multivariate repeated measures model.
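To make the structure of equation 1 concrete, the sketch below generates data from the model for a single assessment time. It is an illustration only: the function name, the parameter values, and the choice of normally distributed plot effects and errors are assumptions, and the correlation between assessment times that the multivariate model allows for is not simulated.

```python
import random

def simulate_crd(mu, treat_effects, n_plots, n_subplots, sd_plot, sd_error, seed=None):
    """Simulate y[i][j][k] = mu + a_i + b_j(i) + e_ijk for one assessment time."""
    rng = random.Random(seed)
    data = []
    for a_i in treat_effects:                 # fixed treatment effect
        plots = []
        for _ in range(n_plots):
            b_ji = rng.gauss(0.0, sd_plot)    # random plot effect, nested in treatment
            plots.append([
                mu + a_i + b_ji + rng.gauss(0.0, sd_error)  # subplot-level error
                for _ in range(n_subplots)
            ])
        data.append(plots)
    return data

# Hypothetical values: overall mean 50 cm, three treatments, 3 plots of 20 subplots each.
heights = simulate_crd(50.0, [0.0, -5.0, 8.0], n_plots=3, n_subplots=20,
                       sd_plot=2.0, sd_error=6.0, seed=42)
print(len(heights), len(heights[0]), len(heights[0][0]))
```

All subplots in a plot share the same plot effect, which is why subplot values within a plot are more alike than values from different plots receiving the same treatment.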

The univariate or multivariate model given by equation 1 must be modified slightly when a randomized block design is used instead of a completely randomized design. For the randomized block design, the model becomes:

$$y_{ijk}(t) = u(t) + a_i(t) + b_j(t) + (ab)_{ij}(t) + e_{ijk}(t) \qquad (2)$$

in which $y_{ijk}(t)$, $u(t)$, and $e_{ijk}(t)$ are defined as before;
$a_i(t)$ is the fixed main effect of the ith treatment at time t;
$b_j(t)$ is the random (or fixed) effect of block j at time t; and
$(ab)_{ij}(t)$ is the (random or fixed) interaction between the treatment and the block effects, at time t.

Statistical inferences based on either of the preceding univariate or multivariate models (equation 1 or 2) assume implicitly that the model provides an adequate representation of the data.

FIGURE 3. Plot of frequency of occurrence of target species versus time for four treatments (A,B,C,D) and a control group: (A) subplots in which the species was absent for the 1985 pre-treatment assessment and (B) subplots in which the species was present for the 1985 pre-treatment assessment.

In particular, it is assumed that the model includes all the relevant factors and that these factors enter in a linear fashion. Various assumptions about the random effects, $b_{j(i)}(t)$ (or $b_j(t)$ and $(ab)_{ij}(t)$), and the random error, $e_{ijk}(t)$, are also required. These include the assumption of a common variance (or covariance matrix, in the case of the multivariate repeated measures model) for all the treatment groups, plots, and subplots, (multivariate) normality, and independence of the random terms (for each assessment time). Dunn and Clark (1974) and Hand and Taylor (1987) give a more detailed discussion of the assumptions of the univariate ANOVA model and the multivariate repeated measures model.

Analysis of variance (ANOVA)

A univariate ANOVA based on the model given by equations 1 or 2 can be used to compare the treatments for each assessment time. There are various reasons why it might be appropriate to do this. For example, it is obviously advisable to carry out an ANOVA of the pre-treatment data to determine whether or not there were any significant differences between the groups before the treatments were applied (since such differences might affect the analysis and interpretation of the post-treatment measurements). In addition, it is sometimes appropriate to carry out a separate ANOVA of each assessment when the treatment effects are found to vary with time (see “Analysis of repeated measures,” below).

Table 2 summarizes the ANOVA for a single response variable for a single assessment time (e.g., Y = HT_1). The table gives the sources of variation, degrees of freedom, and F-ratios (MS denotes the mean squares) for the completely randomized design (Table 2A) and for the randomized block design (Table 2B). The corresponding models are given in SAS notation at the top of each sub-table. In the completely randomized design, it is assumed that each of K treatments is randomly assigned to M plots. In the randomized block design, it is assumed that each of K treatments is randomly assigned to one plot within each of M blocks. In both cases, the number of subplots per plot is N.

The model for the randomized block design assumes there is no interaction between treatment and block. Thus, the sum of squares and degrees of freedom for what would otherwise be the “interaction” term is used to estimate the variation between plots, or the “experimental error.” If the design includes more than one plot per treatment per block, then both the interaction and the variation between plots can be estimated. In that case, the appropriate model is Y = TREAT BLOCK TREAT*BLOCK PLOT(TREAT*BLOCK) and the analysis is more complicated. When the interaction term is omitted, the analysis is the same regardless of whether or not block is a random effect.
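The degrees of freedom and F-ratios for the completely randomized design (Table 2A) can be worked through numerically. The Python sketch below is an illustration under stated assumptions rather than the handbook's SAS code: it requires a balanced design, the function name and the tiny data set (K = 2, M = 2, N = 2) are hypothetical, and p-values are omitted because the standard library has no F distribution.

```python
def nested_anova(data):
    """One-way nested ANOVA for balanced data[i][j][k] (treatment, plot, subplot)."""
    K, M, N = len(data), len(data[0]), len(data[0][0])
    plot_means = [[sum(plot) / N for plot in plots] for plots in data]
    treat_means = [sum(means) / M for means in plot_means]
    grand_mean = sum(treat_means) / K

    ss_treat = M * N * sum((tm - grand_mean) ** 2 for tm in treat_means)
    ss_plot = N * sum((pm - tm) ** 2
                      for means, tm in zip(plot_means, treat_means) for pm in means)
    ss_error = sum((y - pm) ** 2
                   for plots, means in zip(data, plot_means)
                   for plot, pm in zip(plots, means) for y in plot)

    df_treat, df_plot, df_error = K - 1, K * (M - 1), K * M * (N - 1)
    ms_treat = ss_treat / df_treat
    ms_plot = ss_plot / df_plot
    ms_error = ss_error / df_error
    return {
        "df": (df_treat, df_plot, df_error),
        "MS": (ms_treat, ms_plot, ms_error),
        "F_treat": ms_treat / ms_plot,  # treatments are tested against plots
        "F_plot": ms_plot / ms_error,   # plots are tested against subplot error
    }

# Hypothetical heights: 2 treatments x 2 plots x 2 subplots.
example = [[[1, 3], [5, 7]], [[9, 11], [13, 15]]]
result = nested_anova(example)
print(result)
```

The treatment F-ratio uses the plot mean square as its denominator because the plot, not the subplot, is the unit to which a treatment is applied.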

Multiple comparisons methods (simultaneous statistical inference)

The F-tests in an ANOVA are used to detect an overall effect (e.g., a difference between treatments). To investigate these differences, appropriate multiple comparisons (such as a comparison of all pairs of treatments) can be made. When multiple comparisons are made, the collective probability of making a Type I error (see Section 4.4) in connection with at least one comparison is greater than the probability of making an error for a single comparison. The former probability is known as the experimentwise error rate (EER) and the latter is the comparisonwise error rate (CER).
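The gap between the two error rates can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that the comparisons are independent, in which case EER = 1 − (1 − CER)^c for c comparisons. Comparisons among treatment means share data and are not independent, so the values are indicative only. The function name is hypothetical.

```python
def experimentwise_error_rate(cer, n_comparisons):
    """EER for independent comparisons, each tested at level `cer` (an assumption)."""
    return 1.0 - (1.0 - cer) ** n_comparisons


# All pairwise comparisons among K treatments give K(K-1)/2 comparisons.
for k in (3, 4, 6):
    c = k * (k - 1) // 2
    eer = experimentwise_error_rate(0.05, c)
    print(f"K={k}: {c} comparisons, EER = {eer:.3f}")
```

Under this simplified assumption, the EER rises quickly with the number of treatments even though each comparison is tested at 5%.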

A number of methods are commonly used for multiple comparisons. These so-called multiple comparisons methods control the EER and CER to varying degrees and include: 1) methods based on the t-test (e.g., Fisher’s protected or unprotected least significant difference [LSD], the Bonferroni method and Sidak’s method); 2) Scheffe’s method, which is derived from the ANOVA F-test; and 3) tests based on the studentized range distribution (e.g., Tukey’s honestly significant difference method and Duncan’s multiple range test). In the case of Fisher’s LSD, “protected” means that multiple comparisons are carried out only if the corresponding F-test is significant. Otherwise, it is called “unprotected.” The methods based on the studentized range are sometimes referred to as multiple range tests.


TABLE 2. ANOVA for a single response variable (Y)

(A) Completely randomized design

Model: Y = TREAT PLOT(TREAT)

Source of variation            Type of factor   Degrees of freedom   F-ratio
Treatments: TREAT              Fixed            K−1                  MS(TREAT)/MS(PLOT)
Plots: PLOT(TREAT)             Random           K(M−1)               MS(PLOT)/MS(Error)
Subplots: Error                Random           KM(N−1)
Total                                           KMN−1

(B) Randomized block design

Model: Y = TREAT BLOCK PLOT(TREAT BLOCK)

Source of variation            Type of factor   Degrees of freedom   F-ratio
Treatments: TREAT              Fixed            K−1                  MS(TREAT)/MS(PLOT)
Blocks: BLOCK                  Random           M−1                  MS(BLOCK)/MS(PLOT)
Plots: PLOT(TREAT BLOCK)       Random           (K−1)(M−1)           MS(PLOT)/MS(Error)
Subplots: Error                Random           KM(N−1)
Total                                           KMN−1
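The degrees of freedom in Table 2 can be checked numerically for any design. The following is a minimal sketch; the function names and the example values of K, M, and N are hypothetical.

```python
def df_completely_randomized(K, M, N):
    """Degrees of freedom for Table 2A (K treatments, M plots each, N subplots)."""
    return {
        "TREAT": K - 1,
        "PLOT(TREAT)": K * (M - 1),
        "Error": K * M * (N - 1),
    }


def df_randomized_block(K, M, N):
    """Degrees of freedom for Table 2B (K treatments, M blocks, N subplots)."""
    return {
        "TREAT": K - 1,
        "BLOCK": M - 1,
        "PLOT(TREAT BLOCK)": (K - 1) * (M - 1),
        "Error": K * M * (N - 1),
    }


K, M, N = 3, 4, 5  # hypothetical design
for name, table in (("CRD", df_completely_randomized(K, M, N)),
                    ("RBD", df_randomized_block(K, M, N))):
    # In both designs the components must add up to KMN - 1.
    print(name, table, "total =", sum(table.values()))
```

Note that the randomized block design leaves fewer degrees of freedom for the plot term, (K−1)(M−1) instead of K(M−1), because M−1 are used for blocks.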

Much has been written about the preceding and other multiple comparisons methods (e.g., Miller 1981; Milliken and Johnson 1984; SAS Institute Inc. 1985; Fuchs and Sampson 1987). (See Hochberg and Tamhane 1987 for a mathematical treatment of the subject.) Unfortunately, no one method emerges as the “best” in all situations. Selection should therefore be guided by an understanding of the differences between the methods, and by the objectives of the study. Regardless of the method, multiple comparisons should not be made indiscriminately. Careful thought must be given to the types of comparisons that are appropriate for the study at hand (see Mize and Schultz 1985; Warren 1986).

Table 3 is a comparison of six common multiple comparisons procedures. The first column gives the name of the method; the names are consistent with those in the SAS User’s Guide (SAS Institute Inc. 1985). The second column gives the actual EER when an alpha level (α) is specified; the third column lists any restrictions on the number or type of comparisons; the last column specifies whether or not the method can be used to construct simultaneous confidence intervals, such that all the intervals are “on target” with an overall confidence level equal to the specified value. The exact EER for Fisher’s unprotected or protected LSD and Duncan’s method is unknown.

The first four methods in Table 3 can be used to compare groups with unequal sample sizes. Tukey’s method and Duncan’s method were originally developed for equal sample sizes. However, modifications to handle the unequal sample size case have been proposed for both, although these tend to be less reliable than the other methods when there is a wide range in sample sizes.


TABLE 3. Comparison of six multiple comparisons methods

Method             EER       Type of comparisons (a)   Simultaneous confidence intervals
Fisher’s LSD (b)   Unknown   Planned                   Yes
Bonferroni         ≤ α       Planned                   Yes
Sidak              ≤ α       Planned                   Yes
Scheffe            α         Unplanned, planned        Yes
Tukey              ≤ α       Unplanned, planned        Yes
Duncan             Unknown   Pairwise                  No

(a) All comparisons are assumed to be expressed in terms of a contrast, although most methods can be generalized to any linear combination of the means.

(b) Fisher’s protected or unprotected LSD.

An important disadvantage of Duncan’s method is the failure to provide confidence intervals. Therefore, although the procedure can be used to test for significant differences for all pairs of means, it cannot estimate the size of those differences. The remaining five methods can be used to construct confidence intervals. Since Duncan’s method is thought to have similar properties to Fisher’s unprotected LSD, there does not appear to be any good reason to use Duncan’s method (see Milliken and Johnson 1984; SAS Institute Inc. 1985). Despite this, the method is popular in the forestry literature.

Another factor governing the selection of a method is the number and nature (e.g., planned, unplanned, pairwise, and contrasts) of the comparisons. Fisher’s protected or unprotected LSD is useful for a relatively small number of comparisons involving any type of contrast (or linear combination of means). The advantage of Fisher’s LSD over some of the other methods is that, like the Bonferroni method and Sidak’s method, it is easy to use since it is nothing more than a repeated t-test. For example, to compare all pairs of treatment means with a CER of 5%, the LSD for the equal sample size (n) case is computed as:

$\mathrm{LSD} = t_{0.025,v}\, s \sqrt{2/n}$

where $t_{0.025,v}$ is the upper 2.5 percentile of the t-distribution with v degrees of freedom, and $s\sqrt{2/n}$ is an estimate of the standard deviation of the difference between the sample means for two treatments. The standard deviation s and the degrees of freedom v will depend on the design. In some cases, s is simply the square root of the error mean square. According to Fisher’s method, two treatment means are declared to be significantly different if the difference between the sample means exceeds the LSD. If the sample sizes are equal, it is sometimes instructive to include the LSD in a plot of the treatment means (versus time).
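The LSD rule can be sketched in a few lines. The example below is illustrative only: the treatment means, s, n, and v are hypothetical, the function names are not from the handbook, and the critical value t(0.025, 20) = 2.086 is taken from a standard t-table and supplied as an input.

```python
import math
from itertools import combinations


def fisher_lsd(t_crit, s, n):
    """LSD = t(0.025,v) * s * sqrt(2/n) for equal sample sizes n."""
    return t_crit * s * math.sqrt(2.0 / n)


def significant_pairs(means, lsd):
    """Pairs of treatments whose sample means differ by more than the LSD."""
    return [
        (a, b)
        for a, b in combinations(means, 2)
        if abs(means[a] - means[b]) > lsd
    ]


# Hypothetical inputs: v = 20 degrees of freedom, s = 5.0, n = 8 per treatment.
means = {"Control": 52.0, "Treatment 1": 45.5, "Treatment 2": 41.0}
lsd = fisher_lsd(t_crit=2.086, s=5.0, n=8)
pairs = significant_pairs(means, lsd)
print(f"LSD = {lsd:.3f}; significantly different pairs: {pairs}")
```

Because each pair is tested at the same CER, this sketch corresponds to the unprotected LSD; the protected version would run the comparisons only after a significant F-test.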

The main drawback of Fisher’s LSD is that it does not control the EER, which can become unacceptably large if the number of comparisons is large (see Table 3.1 of Milliken and Johnson 1984). One solution is to decrease the CER of the individual t-tests as in Bonferroni’s and Sidak’s methods. Since the size of the adjustment in these two methods depends on the number of comparisons, the comparisons must be planned. In general, the adjustment in the Bonferroni procedure is too stringent, resulting in a loss of power and overly wide confidence intervals relative to those in Sidak’s method. Therefore, Sidak’s method is usually to be preferred.
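The two adjustments can be compared directly. In their usual forms, the Bonferroni method tests each of c planned comparisons at α/c, while Sidak’s method uses 1 − (1 − α)^(1/c). The sketch below uses hypothetical function names and an arbitrary choice of c = 6 comparisons.

```python
def bonferroni_cer(eer, n_comparisons):
    """Per-comparison level under the Bonferroni adjustment."""
    return eer / n_comparisons


def sidak_cer(eer, n_comparisons):
    """Per-comparison level under Sidak's adjustment."""
    return 1.0 - (1.0 - eer) ** (1.0 / n_comparisons)


c = 6  # e.g., all pairs among four treatments
print(f"Bonferroni: {bonferroni_cer(0.05, c):.5f}")
print(f"Sidak:      {sidak_cer(0.05, c):.5f}")
```

Sidak’s per-comparison level is always slightly larger than Bonferroni’s, which is the sense in which the Bonferroni adjustment is the more stringent of the two.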

When a large number of unplanned (or planned) comparisons is anticipated, Scheffe’s or Tukey’s method should be used since these control the EER without placing any restriction on the number of comparisons. The choice between these two methods is not clear, although Scheffe’s method tends to be more conservative (i.e., less likely to detect a significant difference) than Tukey’s method. However, Scheffe’s method is generally preferable for unequal sample sizes.

Analysis of repeated measures

Separate analyses of variance for each assessment time do not provide information about the effects of time, and are subject to the usual problems associated with univariate analyses of multivariate data (e.g., an increase in the probability of a Type I error due to the application of many tests, unless an adjustment is made to the individual significance levels). A more appropriate approach is repeated measures analysis of variance.

To analyse the effects of treatment and time, it is important to consider the various situations that might arise in a vegetation management trial. Figure 4 illustrates the four patterns that are usually envisaged in a repeated measures analysis. The graphs show the “population” means, or expected values, for the treatment groups versus time. In Figure 4A, the time-plots for the treatment groups are parallel; that is, there is no treatment × time interaction. This implies that the mean growth increments for each time (the mean at time t minus the mean at time (t−1), with t=2,3,4) are equal across groups. Figure 4B is a special case of 4A, in which there is no difference between the groups: the time-plots are coincident. Another special case occurs when there is no time effect for any treatment group (Figure 4C): the time-plots are horizontal for all treatments. Finally, Figure 4D shows a general situation in which none of the preceding cases pertains and there is a treatment × time interaction.

Figures 4A–C correspond to the three hypotheses that are routinely tested in a repeated measures analysis:

H01: There is no treatment × time interaction (parallelism, Figure 4A).

H02: There is no treatment effect (Figure 4B).

H03: There is no time effect (Figure 4C).

The hypothesis of parallelism must be tested first, since the main effects of treatment and time have a simple interpretation only when the time-plots are parallel. If H01 is retained, then it is appropriate to test for a treatment effect (H02) and for a time effect (H03).
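The link between parallelism and equal growth increments can be made concrete. The sketch below works with hypothetical population means (not sample data, for which a formal test of H01 is required) and checks whether the increments are the same in every group. The function names are illustrative.

```python
def increments(means):
    """Mean growth increments: mean at time t minus mean at time t-1."""
    return [later - earlier for earlier, later in zip(means, means[1:])]


def profiles_parallel(profiles, tol=1e-9):
    """True if every group has the same increments (no treatment x time interaction)."""
    all_increments = [increments(m) for m in profiles.values()]
    reference = all_increments[0]
    return all(
        abs(a - b) <= tol
        for inc in all_increments[1:]
        for a, b in zip(inc, reference)
    )


# Hypothetical population means at four assessment times.
parallel = {"Control": [30, 34, 39, 45], "Treatment 1": [22, 26, 31, 37]}
interaction = {"Control": [30, 34, 39, 45], "Treatment 1": [30, 31, 33, 34]}
print(profiles_parallel(parallel), profiles_parallel(interaction))
```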

When the hypothesis of no treatment × time interaction is rejected and no particular pattern of interaction is expected, then it may be desirable to compare the treatments by carrying out a simple univariate ANOVA for each assessment time (as discussed before). Similarly, the effects of time can be assessed separately for each treatment (e.g., using a repeated measures analysis). This general approach is recommended by Looney and Stanley (1989). Alternatively, it may be more appropriate to test for a specific type of treatment × time interaction. For example, the treatment × time interaction might be adequately described by a set of lines (polynomials) with different slopes (coefficients), in which case it is appropriate to assume a linear (polynomial) time trend for each treatment group and to test for a significant difference in the slopes (coefficients). This is equivalent to carrying out an ANOVA of the values obtained by computing orthogonal polynomial contrasts of the repeated measurements. This is easily done in SAS by specifying the POLYNOMIAL option in the REPEATED statement in PROC GLM. For more information on the analysis of polynomial trends, refer to Littell (1989). Also see Meredith and Stehman (1991).
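To show what the orthogonal polynomial contrasts look like, the sketch below applies the standard (unnormalized) coefficients for four equally spaced assessment times to the repeated measurements of a single sampling unit. It is a Python illustration rather than SAS, the data are hypothetical, and equal spacing of the assessments is an assumption; unequal spacing requires different coefficients.

```python
# Standard orthogonal polynomial coefficients for T = 4 equally spaced times.
LINEAR = (-3, -1, 1, 3)
QUADRATIC = (1, -1, -1, 1)
CUBIC = (-1, 3, -3, 1)


def contrast(values, coeffs):
    """Contrast score: sum of coefficient times measurement over the T times."""
    return sum(c * y for c, y in zip(coeffs, values))


# Hypothetical heights of one sampling unit at times 1..4 (a straight-line trend).
heights = [10.0, 12.0, 14.0, 16.0]
scores = {
    "linear": contrast(heights, LINEAR),
    "quadratic": contrast(heights, QUADRATIC),
    "cubic": contrast(heights, CUBIC),
}
print(scores)
```

Computing such scores for every sampling unit and comparing, say, the linear scores across treatments with an ANOVA is the test for equal slopes described above.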

There are two distinct approaches to testing the three hypotheses H01, H02, and H03 in a repeated measures analysis. One is based on a univariate ANOVA model, in which treatment and time (i.e., date of assessment) are included as factors. The second is based on a multivariate ANOVA model, in which treatment is a factor and the repeated measurements for each sampling unit are treated as a multivariate observation (i.e., a vector with elements indexed by time). The two types of analyses require different assumptions. In the univariate approach, it is assumed that the variances and covariances of the repeated measurements are such that all orthogonal contrasts have the same variance, and the covariance between any pair of orthogonal contrasts is zero. This is called the Huynh-Feldt condition, and can be tested with a “sphericity test.” The multivariate model makes no assumptions about the variances or covariances of the repeated measurements. However, because the multivariate model is general, the ANOVA tests can be less powerful (for detecting departures from the null hypotheses) than the corresponding univariate tests. As a compromise, various adjustments to the univariate tests have been proposed, which allow a relaxation of the Huynh-Feldt condition without causing a substantial loss of power.

FIGURE 4. Hypothetical plot of treatment mean versus time: (A) no treatment × time interaction, (B) no treatment effect (lines coincident), (C) no time effect, and (D) treatment × time interaction.

The sources of variation, degrees of freedom, and error terms (i.e., the denominators for the F-tests) for each factor in a univariate repeated measures ANOVA are summarized in Table 4, for the case of a completely randomized design. The notation is the same as in Table 2A, except that the repeated factor (TIME with T levels) has been added.

TABLE 4. Univariate repeated measures ANOVA (completely randomized design)

Source                  Degrees of freedom   Error term for testing effect

Between subplots
  TREAT                 K−1                  PLOT(TREAT)
  PLOT(TREAT)           K(M−1)               Error
  Error                 KM(N−1)
  Total                 KMN−1

Within subplots
  TIME                  T−1                  TIME*PLOT(TREAT)
  TIME*TREAT            (T−1)(K−1)           TIME*PLOT(TREAT)
  TIME*PLOT(TREAT)      (T−1)K(M−1)          Error (TIME)
  Error (TIME)          (T−1)KM(N−1)
  Total                 (T−1)KMN

Total                   TKMN−1
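The partition of the degrees of freedom in Table 4 can be verified numerically. In the sketch below (hypothetical function name and design sizes), the between-subplot lines sum to KMN − 1, the within-subplot lines sum to (T − 1)KMN, and together they give the overall total of TKMN − 1.

```python
def df_repeated_measures(K, M, N, T):
    """Degrees of freedom for the univariate repeated measures ANOVA (Table 4)."""
    between = {
        "TREAT": K - 1,
        "PLOT(TREAT)": K * (M - 1),
        "Error": K * M * (N - 1),
    }
    within = {
        "TIME": T - 1,
        "TIME*TREAT": (T - 1) * (K - 1),
        "TIME*PLOT(TREAT)": (T - 1) * K * (M - 1),
        "Error(TIME)": (T - 1) * K * M * (N - 1),
    }
    return between, within


K, M, N, T = 3, 4, 5, 4  # hypothetical design
between, within = df_repeated_measures(K, M, N, T)
print("between:", sum(between.values()))   # KMN - 1
print("within: ", sum(within.values()))    # (T - 1)KMN
print("total:  ", sum(between.values()) + sum(within.values()))  # TKMN - 1
```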

For the corresponding multivariate ANOVA, the “within subplots” F-tests are replaced by suitable multivariate tests. These are generalizations of the F-tests specified in Table 2A. Each test requires the computation of an appropriate “error matrix,” which is a multivariate version of the mean square that appears in the denominator of the corresponding F-test in Table 2A. For more information on both univariate and multivariate testing procedures for the ANOVA of repeated measures, the reader should refer to Hand and Taylor (1987), Littell (1989), Looney and Stanley (1989), or Meredith and Stehman (1991).

In vegetation management trials, hypotheses other than H01, H02, and H03 might also be of interest. For example, since the treatments are usually expected to produce an initial reduction in competitor vegetation height or cover (Figure 5A), the hypothesis of parallelism may be unrealistic. In such cases, it is often possible to manipulate the data so that H01, H02, and H03 become relevant, by replacing the raw data with the post-treatment minus pre-treatment differences (see Figure 5B). Alternatively, the null hypotheses can be redefined and the (multivariate) ANOVA tests adjusted accordingly.

An adjustment to the repeated measures analysis might also be required if there are significant pre-treatment differences among the groups. In that case, it may be appropriate to use the pre-treatment measurement as a covariate (provided the linearity assumption is valid) when the post-treatment differences are being analysed.

FIGURE 5. Hypothetical plot of vegetation cover versus time: (A) pre-treatment (time=1) and post-treatment (time=2,3,4) means and (B) post-treatment minus pre-treatment differences.

Species occurrence, condition, and other categorical variables

In addition to height and cover, a variety of categorical variables are used to measure the response of the non-crop vegetation. These variables take on a relatively small number of values (usually 10 or fewer, although there are 18 damage codes listed in Herring and Pollack 1985), which are not necessarily ordinal, so the ANOVA and repeated measures analysis described in the previous section are not directly applicable. Analogous methods, based on a log-linear (logistic) model, have instead been developed for the analysis of categorical data.

Species Occurrence

Treatments are often evaluated by examining the frequency of occurrence of the target species. Consider, for example, the hypothetical data in Table 5, which gives, for each treatment i and assessment time t, the number of subplots $y_{ij}(t)$ in which a particular target species occurs (j=1) and does not occur (j=0). Notice that, for simplicity, the plot and block structure of the design has been ignored. That is, the frequencies for each treatment and year have been pooled over plots or blocks. This presupposes that there is no inherent difference between the plots or blocks. The model can readily be generalized to include fixed block or other effects, by incorporating the relevant factors into the classification. However, the sample size must be large enough to support the increased number of classes. For example, the large sample approximations for the goodness-of-fit tests must be applicable (see below).

TABLE 5. Hypothetical frequencies (number of subplots out of 60) of occurrence of a target species

Assessment time:                 Pre-treatment   Post-1      Post-2      Post-3
Species absent/present (0/1):    0      1        0     1     0     1     0     1
Control                          30     30       28    32    20    40    35    25
Treatment 1                      25     35       50    10    37    23    36    24
Treatment 2                      38     22       55    5     50    10    42    18

Log-linear analysis

For a given assessment time t, an appropriate (saturated) log-linear model for analysing the observed frequencies $y_{ij}(t)$ (Table 5) is:

$\log[m_{ij}(t)] = u(t) + a_i(t) + b_j(t) + (ab)_{ij}(t) \qquad (3)$

in which $m_{ij}(t)$ is the expected number of subplots receiving the ith treatment, which contain the species (j=1), or do not contain the species (j=0). Equation 3 is analogous to equation 1 or 2, except the terms involving the plot or block effects have been omitted.

An equivalent and simpler representation of the preceding model is the logistic model:

$\log\{p_i(t)/[1 - p_i(t)]\} = v(t) + c_i(t) \qquad (4)$

in which $p_i(t)$ is the probability that the species of interest occurs (at time t) in a randomly selected subplot receiving the ith treatment. The left-hand side of equation 4 is the log-odds (logarithm of the odds) that the species of interest will occur when treatment i is applied. For example, if the odds that the species will occur is two to one, then $p_i(t) = 2/3$ and the log-odds is log(2). The overall log-odds is v(t) while $c_i(t)$ measures the effect of the treatment i.
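The relationship between probability, odds, and log-odds can be sketched as follows. The function names are hypothetical; the counts are the Post-1 frequencies from Table 5, and the resulting log-odds are simple empirical values, not fitted model parameters.

```python
import math


def log_odds(p):
    """Log-odds (logit) of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def prob_from_odds(odds):
    """Probability corresponding to given odds (e.g., 2 for 'two to one')."""
    return odds / (1.0 + odds)


# Odds of two to one correspond to p = 2/3 and log-odds log(2).
p = prob_from_odds(2.0)
print(p, log_odds(p), math.log(2.0))

# Empirical log-odds of occurrence at Post-1 (Table 5), 60 subplots per treatment.
present = {"Control": 32, "Treatment 1": 10, "Treatment 2": 5}
empirical = {name: log_odds(count / 60) for name, count in present.items()}
print(empirical)
```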

In a logistic or log-linear model, various comparisons can be made by testing the significance of log-odds ratios. A log-odds ratio is the logarithm of the ratio of the odds for two sets of conditions. These are easily expressed in terms of the model parameters. For example, for the model given by equation 4, the log-odds ratio comparing treatments i and i′ is:

$\log\{[p_i/(1 - p_i)]/[p_{i'}/(1 - p_{i'})]\} = c_i - c_{i'}$

where t has been dropped for simplicity. If there is no difference between the two treatments, then $p_i = p_{i'}$ (i.e., the odds ratio is 1) and $c_i = c_{i'}$ (since log(1) = 0).
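As an illustration, the log-odds ratio can be estimated directly from the observed frequencies. The sketch below uses the Post-1 counts from Table 5; the function name is hypothetical, and the result is an estimate of the difference in treatment parameters, with no standard error or significance test attached.

```python
import math


def log_odds_ratio(present_a, absent_a, present_b, absent_b):
    """Estimated log-odds ratio of occurrence for condition a relative to b."""
    odds_a = present_a / absent_a
    odds_b = present_b / absent_b
    return math.log(odds_a / odds_b)


# Control versus Treatment 1 at Post-1 (Table 5).
lor = log_odds_ratio(present_a=32, absent_a=28, present_b=10, absent_b=50)
print(f"log-odds ratio = {lor:.3f}, odds ratio = {math.exp(lor):.3f}")

# Identical frequencies give an odds ratio of 1 and a log-odds ratio of 0.
print(log_odds_ratio(10, 50, 10, 50))
```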

The null hypothesis that none of the treatments has an effect is, for the logistic model of equation 4:

$H_0: c_1(t) = c_2(t) = \cdots = 0$

This null hypothesis can be tested by computing a “goodness-of-fit” statistic that measures the agreement between the observed frequencies and the frequencies expected under H0. Two goodness-of-fit statistics are commonly used: the chi-squared statistic and the likelihood-ratio statistic (deviance). These are approximately equivalent when the sample size (i.e., the total number of subplots) is large. If the null hypothesis is rejected, the effects of individual treatments can be investigated by testing the significance of (or constructing confidence intervals for) appropriate contrasts of the model parameters. For example, for the logistic model (equation 4), a series of pairwise comparisons of the treatments can be conducted by testing the significance of $c_i - c_{i'}$ for all pairs of treatments (i,i′).
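Both goodness-of-fit statistics are simple to compute for a single assessment time. The sketch below applies them to the Post-1 frequencies in Table 5, with expected frequencies under H0 obtained from the row and column totals. The function name is hypothetical. The p-value line relies on the fact that a chi-squared variable with 2 degrees of freedom has upper-tail probability exp(−x/2); it does not apply to other degrees of freedom.

```python
import math


def goodness_of_fit(table):
    """Pearson chi-squared and likelihood-ratio statistics for a two-way table.

    `table` holds one [absent, present] row per treatment.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    x2 = 0.0
    g2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / total
            x2 += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to G2
                g2 += 2.0 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return x2, g2, df


post1 = [[28, 32], [50, 10], [55, 5]]  # Control, Treatment 1, Treatment 2
x2, g2, df = goodness_of_fit(post1)
p_value = math.exp(-x2 / 2.0)  # valid only because df == 2
print(f"X2 = {x2:.2f}, G2 = {g2:.2f}, df = {df}, p = {p_value:.2e}")
```

The two statistics are close to each other here, as expected with 180 subplots and no small expected frequencies.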

The validity of statistical inferences based on a log-linear model depends on the adequacy of the model assumptions. In general, the model should match the sampling design. For example, if a fixed number of subplots is assigned to each level of a particular factor, then that factor must be included in the model (see Fingleton 1984). The assumption that the observations are independent must also be satisfied. Furthermore, since the chi-squared and likelihood-ratio tests are based on large sample theory, the expected frequencies must be sufficiently large (see SAS Institute Inc. 1985: p. 205 for some guidelines). If some of the frequencies are small, because events occur rarely, it may be necessary to assume that there are no differences between certain classes and to pool the frequencies accordingly.

A generalized version of the simple log-linear model (given by equation 3 or 4) can be used to carry out more complicated analyses. In particular, the effects of time can be evaluated with a log-linear model analogous to the repeated measures model discussed previously. Here the frequencies for the assessment times t=1,2, . . . ,T are analysed simultaneously. Table 5, for example, is analysed in its entirety, instead of analysing the sub-tables that correspond to each t separately. As in the case of the continuous variables, it is important to consider carefully the hypotheses that should be tested, and to determine whether or not an adjustment should be made for possible pre-treatment differences. A detailed exposition of analysis of repeated measures for categorical data is beyond the scope of this report. The interested reader is referred to the SAS User’s Guide (SAS Institute Inc. 1985) and Stanek (1988). For more information on the analysis of categorical data using log-linear (logistic) models, see Cox (1970), Plackett (1981), Fingleton (1984), and Bergerud (1989).

Condition and other categorical variables

Log-linear models can be used to analyse any type of categorical variable (i.e., the response variable need not be binary), provided the underlying assumptions can be justified. For example, the log-linear methodology described above can be used to analyse the treatment effects on the condition of the foliage of a target species. There are, however, several points to remember when analysing the responses of individual species. First, the sampling design is not the same as before. In the analysis of species occurrence, the sample size was fixed for each treatment, plot or block. In the case of the condition, etc., of individual species, the sample sizes vary because specimens occur in only a fraction of the subplots. Second, the total number of variables is large when the responses of all the target species are considered collectively. Therefore, any log-linear analysis is likely to become unwieldy, unless it is restricted in a sensible fashion. In many cases, simple summary statistics are likely to be more appropriate than a formal analysis of statistical significance.

4.2.2 Analysis of response of crop trees

The analysis of the treatment response of the crop trees is generally simpler than that of the non-crop vegetation, because the problems that arise in connection with the latter (e.g., missing data) do not usually occur. Furthermore, there are generally fewer variables because the number of species of crop trees is generally smaller than the number of species of vegetation, and there is no “overall” measurement. Since most of the response variables for the crop trees are the same as the response variables for the non-crop vegetation (see Table 1), methods similar to those described in Section 4.2.1 can be applied. Therefore, the discussion in this section will be limited to those issues that relate specifically to the analysis of the crop tree response.

Height and diameter

The height and diameter measurements for the crop trees are analogous to the height and cover measurements for the non-crop vegetation, in that they are continuous variables and the design is the same for both sets of data (see equations 1 and 2). Therefore, the ANOVA methods discussed in Section 4.2.1 can be used to evaluate the differences between the treatment groups for each time. Likewise, a repeated measures approach can be used to evaluate the effects of time. However, in the case of the crop trees, the patterns of growth are likely to be much more regular, and a special type of repeated measures analysis known as the analysis of growth curves may be applicable (see, for example, Potthoff and Roy 1964; Rao 1965; Meredith and Stehman 1991).

Analysis of growth curves

For crop trees, a plot of height or diameter versus time is expected to produce a more or less well-defined curve called a “growth curve.” That is:

$y(t) = W(t) + e(t) \qquad (t = 1, 2, \ldots, T) \qquad (5)$

in which y(t) is the measured height or diameter at time t; W(t) is the mean height or diameter at time t; and e(t) is the unexplained error.

The function W(t) defines the growth curve. The instantaneous growth rate is the derivative dW/dt and the relative growth rate (RGR) is:

$\mathrm{RGR} = W^{-1}\, dW/dt = d[\log(W)]/dt \qquad (6)$

See Coombs et al. (1985) for a discussion of growth curves and growth rates. Since the errors e(t) are assumed to be correlated over time because the same trees are measured on several occasions, standard least squares regression methods (which assume independent errors) cannot be applied. However, a repeated measures analysis approach can be used (see Morrison 1976).
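Because the RGR is the derivative of log(W), its average over an interval follows from two measurements alone. The sketch below is illustrative: the function name and the numerical values are hypothetical, and the calculation gives the mean RGR between two assessment times rather than the instantaneous rate.

```python
import math


def mean_rgr(w1, w2, t1, t2):
    """Mean relative growth rate between times t1 and t2.

    Integrating d[log(W)]/dt over the interval gives
    (log W2 - log W1) / (t2 - t1).
    """
    return (math.log(w2) - math.log(w1)) / (t2 - t1)


# Hypothetical exponential growth with W0 = 20 cm and R = 0.3 per year.
w_year1 = 20.0 * math.exp(0.3 * 1)
w_year3 = 20.0 * math.exp(0.3 * 3)
print(mean_rgr(w_year1, w_year3, 1, 3))  # recovers R for exponential growth
```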

To compare the height or diameter growth of several treatment groups, it is necessary to consider the shape of the growth curves. If there are significant pre-treatment differences, corrective action may be required before a repeated measures analysis is carried out. For example, Figure 6A shows hypothetical height growth curves for three treatment groups. Each growth curve is exponential. That is:

$W_i(t) = W_{0i} \exp(R_i t) \qquad (7)$

in which $W_{0i}$ is the initial height for the ith treatment group and RGR = $R_i$ is the relative growth rate, which is independent of t for exponential growth. Obviously, it would be inappropriate to test the hypotheses H01, H02, and H03 (see Section 4.2.1) using the raw data, since the mean heights are initially different for the three groups. However, if a log transformation is applied, then $\log[W_i(t)] = \log(W_{0i}) + R_i t$, so the initial differences in height become an additive effect and a repeated measures analysis should lead to the correct conclusion that $R_C = R_1 = R_2$; that is, the transformed growth curves are parallel (Figure 6B).
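The effect of the log transformation can be checked with a small numerical sketch. All values below are hypothetical (three groups with different initial heights and a common relative growth rate of 0.25), as are the function names. On the raw scale the growth increments differ between groups, while on the log scale every increment equals the common rate.

```python
import math


def exponential_height(w0, r, t):
    """Exponential growth curve W(t) = W0 * exp(R * t), as in equation 7."""
    return w0 * math.exp(r * t)


def increments(values):
    """Differences between successive values."""
    return [b - a for a, b in zip(values, values[1:])]


TIMES = [1, 2, 3, 4]
INITIAL = {"Control": 20.0, "Treatment 1": 30.0, "Treatment 2": 40.0}
R = 0.25  # common relative growth rate (assumed)

raw = {g: [exponential_height(w0, R, t) for t in TIMES] for g, w0 in INITIAL.items()}
logged = {g: [math.log(h) for h in heights] for g, heights in raw.items()}

raw_increments = {g: increments(v) for g, v in raw.items()}
log_increments = {g: increments(v) for g, v in logged.items()}

for group in INITIAL:
    print(group,
          [round(x, 2) for x in raw_increments[group]],
          [round(x, 2) for x in log_increments[group]])
```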

When there are no pre-treatment differences between the groups or a corrective transformation has already been applied, and the transformed growth curves are adequately described by polynomials, polynomial contrasts can be used to assess the statistical significance of the differences between the linear, quadratic, and higher order terms (see Hand and Taylor 1987; Littell 1989). This is equivalent to fitting a line (polynomial) to each group and then testing for equal slopes (coefficients). If the growth curves cannot be described by polynomials, then they can be tested for the general form of parallelism described previously (see Figure 4A), using a repeated measures analysis.

Survival, condition, and other categorical variables

The log-linear methodology discussed in Section 4.2.1 can be used to analyse categorical response variables for the crop trees. The only new variable to consider is the binary (dead or alive) variable describing tree survival. Provided the sample size is sufficiently large, this can be analysed with the same general methods as were described for the analysis of species occurrence. If mortality is rare, then the large sample theory for log-linear models is unlikely to hold, in which case, summary statistics may have to suffice.

4.3 Model Verification

As mentioned before, all statistical inference depends on the adequacy of the underlying probability model. Therefore, it is important to review the model assumptions and, where possible, verify that the data are consistent with them. Standard methods for checking the validity of a model include various diagnostic residual plots, e.g., boxplots, probability plots, and residuals versus fitted values, as well as formal tests for normality and for a homogeneous variance. The latter should be used with caution since such tests may be unnecessarily stringent, given that ANOVA methods are robust against certain model departures. Most of the reference books mentioned in Section 4.2 contain a discussion of the model assumptions and methods for their verification.

The effects of model departures can sometimes be overcome through the prudent use of data transformations (e.g., a log transformation is often helpful in reducing skewness) and data reductions (e.g., averaging over subplots can help induce normality).

4.4 Power Analysis and Sample Size Calculations

Sample size and power calculations should be made during the planning stages of a vegetation management trial. However, they can also be a valuable adjunct to the data analysis, providing insight into the interpretation of the results and the adequacy of the design.

FIGURE 6. Hypothetical (exponential) height growth curves for crop trees: (A) means for treatment group and (B) log transformed group means.

The significance level of a test is the probability (α) that the null hypothesis (H0) will be rejected when it is true. The power of a test is the probability (1−β) that H0 will be rejected when it is false. These and some other common terms used in hypothesis testing are summarized in Table 6.

TABLE 6. Hypothesis testing: types of errors

                    Researcher’s decision
State of nature     Reject H0                 Do not reject H0

H0 true             Type I error              Correct decision
                    probability = α           probability = 1−α
                    (significance level)      (confidence level)

H0 false            Correct decision          Type II error
                    probability = 1−β         probability = β
                    (power)

Ideally, trials should be designed so that tests can be carried out at a reasonable level of significance (e.g., 5%), while maintaining adequate power (e.g., 80%) against departures from H0 that are of practical concern. In general, this can be achieved only if the sample size is sufficiently large, which may not always be possible because of restrictions on the time and cost of the trial.

Since the probability of making a Type I error is the significance level of the test, it can easily be adjusted. For this reason, H0 is usually defined so that the Type I error is the most serious error. For example, when testing the efficacy of a herbicide, H0 is routinely taken to be the hypothesis that the herbicide has no effect (e.g., the expected diameter growth of treated crop trees is the same as that of untreated trees). In that case, a Type I error is committed if it is incorrectly concluded that the herbicide has a significant effect on growth, when it actually has none. Obviously, this could have serious consequences. On the other hand, it is also important to guard against incorrectly concluding that the null hypothesis is true simply because it is not rejected (e.g., concluding that a potentially useful herbicide has no effect). Such a conclusion is implicitly based on the assumption that there is only a small probability of making a Type II error. Unless the appropriate power calculations have been made, this assumption cannot be justified.

The power of an F-test based on the general linear model (including multivariate and univariate ANOVA models) depends on: 1) the significance level; 2) the type of departure from H0 (e.g., the sizes of the various treatment effects); 3) the sample size; and 4) the error variance (covariance matrix). The power of other types of tests (e.g., tests based on log-linear models) also depends on (1), (2) and (3), and in some cases, on auxiliary information similar to (4). While (1), (3) and (4) are straightforward, (2) requires careful consideration, since in most cases, various types of departures from H0 are possible. For example, in a vegetation management trial, one or several treatments may differ from the control, and the size of the differences may or may not depend on such other factors as species and time. To limit the possibilities, certain simplifying assumptions (e.g., no interactions) must be made.

Power and sample size calculations for the t-test are discussed in most introductory textbooks on statistics (e.g., Devore 1987; Marshall 1987). O’Brien (1987), Sanders (1989) and Nemec (1991) discuss the general linear model and describe how PROC GLM of SAS can be used to compute the power of the associated F-tests (see also Keppel 1973). In addition, O’Brien (1987) includes a brief discussion of power calculations for log-linear models. A simple example using FPOWTAB is given below.

4.4.1 Example

Suppose a trial is conducted to evaluate three new fertilizers (A, B, C) by comparing the diameter growth of three species of crop trees, 1 year after the trees have been treated with the following: (1) no treatment (control), (2) fertilizer A, (3) fertilizer B, or (4) fertilizer C. Each treatment is randomly assigned to N trees of each species, giving a total sample size of 12N trees. Suppose the researcher wishes to be sure that the sample size is large enough to detect the treatment effects given in Table 7 (this alternative hypothesis is referred to as a "scenario" in FPOWTAB). The SAS program FPOWTAB (see "Power Analysis" in Section 4.6) was used to compute the power of the F-test for the treatment main effect in a two-factor (treatment and species), fixed effects ANOVA model. The computations were made for 1 and 5% significance levels, sample sizes N = 10, 25, and 50 (i.e., total sample sizes of 120, 300, and 600 trees), and assuming an error standard deviation of 5 and 10 mm. In practice, suitable values for the error standard deviation must be based on estimates from previous studies or a pilot study. The output from FPOWTAB is displayed in Figure 7.

TABLE 7. Hypothetical treatment effects for power calculation. Expected diameter growth for first year (mm).

                              Treatment
Species    Control    Fertilizer A    Fertilizer B    Fertilizer C
   1          1            3               2               4
   2          3            5               4               6
   3          5            7               6               8

Figure 7 shows that, if the treatment effects are as in Table 7 and the error standard deviation is 5 mm/year, then for a total sample size of 300 trees and a significance level of 5%, there is a 91% chance of detecting the treatment main effect. This drops to 50% if the sample size is decreased to 120 trees. The power of the F-test is reduced substantially if the error standard deviation is 10 mm/year. For example, if the sample size is 120 and the standard deviation is 10 mm/year, the power is only 15%, so it would be unwise to conclude that there is no treatment effect even if the F-test fails to reject the null hypothesis at the 5% level of significance. The computation could be repeated for various other treatment scenarios and contrasts (e.g., fertilizer A versus the control). A power analysis could also be carried out for the species main effect and for the species × treatment interaction.
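The value SSH(POPULATION) = 15 reported in the header of Figure 7 can be checked by hand from Table 7. The following is a minimal sketch in Python (it is not part of FPOWTAB, and the function names are illustrative only). It assumes the usual definition of the noncentrality parameter for a fixed effects F-test, namely the hypothesis sum of squares scaled up from the basis sample size to the total sample size and divided by the error variance; how FPOWTAB organizes this calculation internally is not described here.

```python
# Hypothetical cell means from Table 7 (expected first-year diameter growth, mm).
# Rows are species 1-3; columns are Control, Fertilizer A, Fertilizer B, Fertilizer C.
cell_means = [
    [1, 3, 2, 4],
    [3, 5, 4, 6],
    [5, 7, 6, 8],
]


def treatment_ssh(means):
    """Hypothesis sum of squares for the treatment main effect,
    computed for the basis design of one tree per cell (12 trees)."""
    n_species = len(means)
    n_treat = len(means[0])
    treat_means = [sum(row[j] for row in means) / n_species for j in range(n_treat)]
    grand_mean = sum(treat_means) / n_treat
    return n_species * sum((m - grand_mean) ** 2 for m in treat_means)


def noncentrality(ssh_basis, basis_n, total_n, sigma):
    """Noncentrality parameter (assumed form): SSH scaled to the total
    sample size, divided by the error variance sigma**2."""
    return (total_n / basis_n) * ssh_basis / sigma ** 2


ssh = treatment_ssh(cell_means)
print("SSH for basis sample size 12:", ssh)

for sigma in (5, 10):
    for total_n in (120, 300, 600):
        lam = noncentrality(ssh, 12, total_n, sigma)
        print(f"sigma={sigma:2d}  total N={total_n:3d}  noncentrality={lam:.2f}")
```

Doubling the error standard deviation divides the noncentrality parameter by four, which is why the powers in the right half of Figure 7 are so much lower than those in the left half.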

4.5 Interpretation and Summary of Results

The final step in a statistical analysis is the interpretation and summary of the results. Completing this requires a good understanding of the model, the model parameters, in particular, and the various hypotheses that were tested. If certain hypotheses were not rejected, the adequacy of the design should be evaluated, either informally (e.g., by estimating the variance components), or by making relevant power calculations, as described in the previous section.

Warren (1986) gives some practical guidelines for the presentation of the results of a statistical analysis, with an emphasis on ANOVA and multiple comparisons. The guidelines can be summarized as follows:

1. The assumptions on which the analysis is based should be stated explicitly. For example, if an ANOVA is used, the underlying model should be described in detail by identifying the main effects, interactions, and error terms, and by stating whether each is to be interpreted as fixed, random, or nested.

2. Sufficient quantitative detail should be provided to allow an independent assessment of the conclusions. For example, the degrees of freedom, sums of squares, and F-ratios should be provided when reporting the results of an ANOVA.


EFFECT: TREATMENT MAIN EFFECT,
DEGREES OF FREEDOM HYPOTHESIS: 3,
SCENARIO: SEE TABLE 5,
POWERS COMPUTED FROM SSH(POPULATION): 15,
USING THE BASIS TOTAL SAMPLE SIZE: 12,
AND TOTAL NONREDUNDANT PARAMETERS IN MODEL: 11

                                      STD DEV
                              5                      10
                           TOTAL N                TOTAL N
                       120    300    600      120    300    600
TEST TYPE    ALPHA    POWER  POWER  POWER    POWER  POWER  POWER
REGULAR F    0.01      .27    .77    .99      .05    .15    .38
             0.05      .50    .91    .99      .15    .33    .62

FIGURE 7. Output from FPOWTAB.

3. The statistical analysis should match the experimental design. For example, multiple comparisons (which are appropriate only when no particular ordering of the treatment groups is anticipated) should not be used to report the results of an experiment designed to investigate a specific trend.

4. Results should not be presented unless the validity of the underlying assumptions and the extent to which any violations would invalidate the conclusions have been considered. The effect of unequal variances on ANOVA F-tests, for example, should be assessed.

4.6 SAS Programs

4.6.1 Summary statistics and exploratory data analysis

Exploratory data analysis is straightforward in SAS (see PROC MEANS, PROC SUMMARY, PROC UNIVARIATE, PROC FREQ, PROC CHART, PROC PLOT). An introduction to the subject can be found in Chapter 2 of Cody and Smith (1987) and Schlotzhauer and Little (1987). For most purposes, the low resolution graphics that are available in SAS/Base are adequate. Otherwise, high resolution plots can be made with SAS/Graph. (SYSTAT/SYGRAPH is also recommended for high resolution graphics. It is easy to use and has excellent documentation.)

4.6.2 Repeated measures analysis and univariate ANOVA

The main procedure for carrying out an ANOVA or a repeated measures analysis in SAS is PROC GLM. Alternatively, PROC ANOVA can be used if the design is balanced (although the residuals cannot be extracted). The procedures NESTED and VARCOMP can be used to estimate variance components. A generic SAS program for carrying out a repeated measures analysis of continuous response variables is given in Appendix 1A. The program also performs a univariate ANOVA for each assessment. In addition, Cody and Smith (1987) and Hand and Taylor (1987) give several examples of ANOVA and repeated measures analysis in SAS. They provide listings of the corresponding SAS programs, excerpts from the output, and explanatory notes.


4.6.3 Log-linear analysis

Log-linear analyses can be performed with the SAS procedure PROC CATMOD. A generic program for carrying out a log-linear analysis of categorical response variables is provided in Appendix 1B.

4.6.4 Power analysis

O'Brien (1987) has written two programs for performing power analyses. The program FPOWTAB is applicable to the general linear model (ANOVA, etc.), while CPOWTAB is applicable to log-linear models. Program listings, which include detailed instructions, are provided in Appendices 1C and 1D.

4.7 Example

A hypothetical trial was conducted to assess the effects of five treatments on a single crop tree species and on a particular species of competing vegetation. The treatments included three herbicide treatments (SULFMET, HEXASPT, HEXABRD), manual removal of vegetation (MANUAL), and no treatment (CONTROL). A completely randomized design was used, in which each treatment was randomly assigned to three plots. A random sample of 10 crop trees was selected from each plot, and the height and cover of the target vegetation were measured for subplots centred on each crop tree. Assessments of the target vegetation and crop trees were made once in the year prior to treatment (1985) and once during each of the following 3 years (1986–1988).

A partial listing of the data is given in Table 8, where columns 1–3 give the plot, subplot, and treatment; columns 4–7 give the percentage cover of the target species for 1985–1988; and columns 8–11 give the heights of the crop trees for 1985–1988.

The following analyses illustrate some of the methods described in the preceding sections.

4.7.1 Univariate ANOVA of pre-treatment heights of crop trees (1985)

To determine whether or not there were significant pre-treatment differences in the heights of the crop trees for the five treatment groups, a univariate ANOVA was carried out with the following SAS commands:

    DATA SUBPLOTS;
      INFILE 'A:EXAMPLE.DAT';
      INPUT PLOT $ 1-3 SUBPLOT 5-7 TREAT $ 9-15
            (COV85-COV88) (4*5.0) (HT85-HT88) (4*5.0);

    PROC GLM DATA=SUBPLOTS;
      TITLE 'UNIVARIATE ANOVA OF HEIGHTS OF CROP TREES - 1985';
      CLASS TREAT PLOT;
      MODEL HT85=TREAT PLOT(TREAT);
      RANDOM PLOT(TREAT);
      TEST H=TREAT E=PLOT(TREAT);
      MEANS TREAT/T CLDIFF E=PLOT(TREAT);
      OUTPUT OUT=RESID85 R=RHT85;

An edited version of the SAS output is shown in Figure 8. Since the default error term (i.e., the denominator) for all F-ratios is the mean square due to variation among subplots, a TEST statement must be used to test the effect due to TREAT (see Table 2A). Thus, the appropriate F-ratio is 0.54 (Figure 8A) and not 8.37 (Figure 8B); and it may be concluded that there is no evidence of a pre-treatment difference in the heights of the crop trees. If differences were detected, then the pairwise comparisons of the treatments (Figure 8C) could have been used to locate and quantify the differences.
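The two F-ratios, and the least significant difference reported with the pairwise comparisons, can be reproduced from the mean squares printed in Figure 8. The sketch below is an illustrative Python check, not part of the SAS analysis; the variable names are hypothetical, the critical t value is copied from the output rather than computed, and the LSD uses the standard formula for equal group sizes (30 trees per treatment).

```python
import math

# Mean squares copied from Figure 8
ms_treat = 4927.32333     # TREAT
ms_plot = 9050.06667      # PLOT(TREAT): correct error term for TREAT
ms_subplot = 588.4570     # residual (among subplots): default error term in PROC GLM

f_correct = ms_treat / ms_plot      # tested against plots within treatments
f_default = ms_treat / ms_subplot   # tested against subplots (not appropriate here)

# Least significant difference for comparing two treatment means,
# each based on 3 plots x 10 subplots = 30 trees.
t_critical = 2.22814                # from Figure 8 (alpha = 0.05, df = 10)
n_per_treatment = 30
lsd = t_critical * math.sqrt(2 * ms_plot / n_per_treatment)

print(f"F with PLOT(TREAT) as error: {f_correct:.2f}")
print(f"F with default error:        {f_default:.2f}")
print(f"Least significant difference: {lsd:.2f}")
```

The large gap between the two ratios shows how strongly the conclusion depends on using the plot, rather than the subplot, as the experimental unit.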


TABLE 8. Data for herbicide trial

                        Cover of target species (%)   Crop tree height (mm)
Plot Subplot  Treatment   1985  1986  1987  1988      1985  1986  1987  1988
T          1  HEXASPT       55     5    10    10        95   126   178   245
T          2  HEXASPT        0     0    10    25       138   184   268   363
T          3  HEXASPT        0     0     0     5        91   121   173   260
T          4  HEXASPT       10    10    40    95        88   102   155   253
T          5  HEXASPT       10    10    45   100        62    79   139   181
T          7  HEXASPT        0     0     5    15        99   135   185   285
T          8  HEXASPT        0     0     5     5       110   150   189   240
T          9  HEXASPT       10    15    30    70        75    89   140   225
T         10  HEXASPT       30    25    50    70       120   175   250   345
T         11  HEXASPT       40    25     0     0       102   140   200   305
5         31  HEXASPT       25    25   100   100        94   112   133   133
5         32  HEXASPT       30    30   100   100       100   127   178   209
5         33  HEXASPT       20    25    75   100       123   142   155   156
5         34  HEXASPT       70    70   100   100       114   137   172   187
5         35  HEXASPT      100    95   100   100       121   147   176   209
5         36  HEXASPT       20     5    20    20        75    86   123   150
5         37  HEXASPT       15    15    30    10       123   161   217   258
5         38  HEXASPT       20    20    60   100       120   146   192   255
5         39  HEXASPT       15    15    70   100        74    81    92    94
5         40  HEXASPT      100   100   100   100        82    92   102    97
6        102  SULFMET       30    10    30    70        31    38    75    98
6        105  SULFMET       80    20   100   100        49    53    71    69
6        106  SULFMET       50    20    75   100        53    59    63    86
6        107  SULFMET       70    15   100   100        39    49    57    67
6        108  SULFMET       40     5    75    50        51    84   105   128
6        109  SULFMET       10     5     0     0        43    55    90   126
6        112  SULFMET        5     0     0     0        55    63    87   121
6        113  SULFMET       10     5     5    25        73    81   101   133
6        114  SULFMET       30     5    10    40        46    51    84   106
6        115  SULFMET       40    10    50    35        43    45    60    86
8        143  HEXABRD       75    75   100   100        95   115   140   164
8        144  HEXABRD       15    15    30    20       108   130   162   203
8        146  HEXABRD        0     0     0     0       112   131   107   201
8        147  HEXABRD       10    10    60    70        58    57    84   113
8        148  HEXABRD        0     0     0     0        95   115   153   192
8        149  HEXABRD        0     0     0     0        85    96   133   161
8        150  HEXABRD        0     0    20    25        63    76   104   119
8        151  HEXABRD        0     0     0     0        55    60    77   100
8        153  HEXABRD        0     0     0     0       101   124   170   235
8        159  HEXABRD        5    10    20    10       108   140   198   284
X        261  HEXAGRN      100   100   100   100       141   180   211   290
X        262  HEXAGRN       60    50    90   100       105   134   184   246
X        264  HEXAGRN        5     5    10    15       175   198   265   300
X        265  HEXAGRN        5     5    10    20       139   158   252   285
X        268  HEXAGRN       80    85   100   100        90   112   144   197
X        269  HEXAGRN       40    30    75   100       215   268   350   425


Class    Levels   Values
TREAT         5   CONTROL HEXABRD HEXAGRN HEXASPT SULFMET
PLOT         15   1 3 4 5 6 8 C D E F G H J T X

Number of observations in data set = 150

Dependent Variable: HT85   (B)

                               Sum of          Mean
Source               DF       Squares        Square   F Value   Pr > F
Model                14   110209.9600     7872.1400     13.38   0.0001
Error               135    79441.7000      588.4570
Corrected Total     149   189651.6600

R-Square        C.V.    Root MSE     HT85 Mean
0.581118    21.72500    24.25813    111.660000

Source               DF     Type I SS   Mean Square   F Value   Pr > F
TREAT                 4   19709.29333    4927.32333      8.37   0.0001
PLOT(TREAT)          10   90500.66667    9050.06667     15.38   0.0001

Source               DF   Type III SS   Mean Square   F Value   Pr > F
TREAT                 4   19709.29333    4927.32333      8.37   0.0001
PLOT(TREAT)          10   90500.66667    9050.06667     15.38   0.0001

Source               Type III Expected Mean Square
TREAT                Var(Error) + 10 Var(PLOT(TREAT)) + Q(TREAT)
PLOT(TREAT)          Var(Error) + 10 Var(PLOT(TREAT))

Tests of Hypotheses using the Type III MS for PLOT(TREAT) as an error term

Source               DF   Type III SS   Mean Square   F Value   Pr > F
TREAT                 4   19709.29333    4927.32333      0.54   0.7073   (A)

FIGURE 8. SAS output for herbicide trial: univariate ANOVA of heights of crop trees – 1985.


T tests (LSD) for variable: HT85

NOTE: This test controls the type I comparisonwise error rate not the experimentwise error rate.

Alpha = 0.05   Confidence = 0.95   df = 10   MSE = 9050.067
Critical Value of T = 2.22814
Least Significant Difference = 54.73

Comparisons significant at the 0.05 level are indicated by '***'.

                          Lower       Difference      Upper
TREAT                  Confidence      Between     Confidence
Comparison                Limit         Means         Limit

CONTROL - HEXAGRN        -42.26         12.47         67.20    (C)
CONTROL - HEXABRD        -34.10         20.63         75.36
CONTROL - HEXASPT        -31.26         23.47         78.20
CONTROL - SULFMET        -20.43         34.30         89.03

HEXAGRN - CONTROL        -67.20        -12.47         42.26
HEXAGRN - HEXABRD        -46.56          8.17         62.90
HEXAGRN - HEXASPT        -43.73         11.00         65.73
HEXAGRN - SULFMET        -32.90         21.83         76.56

HEXABRD - CONTROL        -75.36        -20.63         34.10
HEXABRD - HEXAGRN        -62.90         -8.17         46.56
HEXABRD - HEXASPT        -51.90          2.83         57.56
HEXABRD - SULFMET        -41.06         13.67         68.40

HEXASPT - CONTROL        -78.20        -23.47         31.26
HEXASPT - HEXAGRN        -65.73        -11.00         43.73
HEXASPT - HEXABRD        -57.56         -2.83         51.90
HEXASPT - SULFMET        -43.90         10.83         65.56

SULFMET - CONTROL        -89.03        -34.30         20.43
SULFMET - HEXAGRN        -76.56        -21.83         32.90
SULFMET - HEXABRD        -68.40        -13.67         41.06
SULFMET - HEXASPT        -65.56        -10.83         43.90

FIGURE 8. (Concluded).


The residuals from the preceding ANOVA, which were saved by including the OUTPUT statement, were examined for outliers and for other evidence of non-normality, using PROC UNIVARIATE as shown below:

    PROC UNIVARIATE DATA=RESID85 PLOT NORMAL;
      TITLE 'ANALYSIS OF RESIDUALS';
      VAR RHT85;

The condensed SAS output is displayed in Figure 9. Some noteworthy points are:

• The summary statistics and test results (Figure 9A) show no evidence of a departure from normality.

• There is no serious lack of symmetry and there are no apparent outliers in the histogram or in the boxplot of the residuals (Figure 9B).

• The probability plot (Figure 9C) is approximately linear, which is consistent with a normal distribution.

4.7.2 Repeated measures ANOVA of heights of crop trees (1985–1988)

Both univariate and multivariate repeated measures analyses of variance of the crop tree heights were performed as follows:

    PROC GLM DATA=SUBPLOTS;
      TITLE 'REPEATED MEASURES ANOVA OF HEIGHTS OF CROP TREES';
      CLASS TREAT PLOT;
      MODEL HT85-HT88 = TREAT PLOT(TREAT);
      REPEATED YEAR 4 (1 2 3 4) PROFILE/SUMMARY PRINTE;
      MANOVA H=TREAT M=HT85-HT86,HT86-HT87,HT87-HT88 E=PLOT(TREAT);
      MANOVA H=TREAT E=PLOT(TREAT);
      OUTPUT OUT=RESIDUAL R=RHT85-RHT88;

Some highlights of the results are displayed in Figure 10.


Variable = RHT85

Moments

N             150         Sum Wgts      150
Mean          0           Sum           0
Std Dev       23.09038    Variance      533.1658
Skewness      -0.07924    Kurtosis      0.64884     (A)
USS           79441.7     CSS           79441.7
CV            .           Std Mean      1.885322
T:Mean = 0    0           Prob>|T|      1.0000
Sgn Rank      20.5        Prob>|S|      0.9695
Num ^= 0      150
W:Normal      0.987094    Prob<W        0.8382

Quantiles (Def = 5)

100% Max       62.2       99%         51.9
 75% Q3        16.4       95%         40
 50% Med        1.6       90%         27.3
 25% Q1       -15.6       10%        -28.65
  0% Min      -62.8        5%        -39.8

Range         125
Q3 - Q1       32
Mode          6.4

Extremes

Lowest     Obs        Highest    Obs
 -62.8  (  45)          44.2  (  63)
 -59.8  (  61)          46.2  (  62)
 -51.7  ( 100)          46.2  (  69)
 -47.8  (  42)          51.9  (  86)
 -43.9  (  56)          62.2  (  46)

      Histogram                         #     Boxplot
  70+*                                  1
    .****                               7
    .**********                        20
    .**************************        52     +--+--+     (B)
    .*********************             41     +-----+
    .***********                       22
    .***                                6
 -70+*                                  1
     ----+----+----+----+----+-
     * may represent up to 2 counts

FIGURE 9. SAS output for herbicide trial: analysis of 1985 height residuals.


[Normal Probability Plot of the residuals (C): vertical axis from -70 to 70, horizontal axis from -2 to +2. The plotted points are not reproduced here.]

FIGURE 9. (Concluded).


Manova Test Criteria and Exact F Statistics for
the Hypothesis of no YEAR Effect

H = Type III SS&CP Matrix for YEAR   E = Error SS&CP Matrix

S=1   M=0.5   N=65.5

Statistic                       Value          F    Num DF     Den DF   Pr > F
Wilks' Lambda              0.08585085   472.0661         3        133   0.0001
Pillai's Trace             0.91414915   472.0661         3        133   0.0001
Hotelling-Lawley Trace    10.64810835   472.0661         3        133   0.0001
Roy's Greatest Root       10.64810835   472.0661         3        133   0.0001

Manova Test Criteria and F Approximations for
the Hypothesis of no YEAR*TREAT Effect   (A)

H = Type III SS&CP Matrix for YEAR*TREAT   E = Error SS&CP Matrix

S=3   M=0   N=65.5

Statistic                       Value          F    Num DF     Den DF   Pr > F
Wilks' Lambda              0.52774085     8.0195        12   352.1764   0.0001
Pillai's Trace             0.54168007     7.4367        12        405   0.0001
Hotelling-Lawley Trace     0.76370073     8.3795        12        395   0.0001
Roy's Greatest Root        0.50736842    17.1237         4        135   0.0001

Manova Test Criteria and F Approximations for
the Hypothesis of no YEAR*PLOT(TREAT) Effect

H = Type III SS&CP Matrix for YEAR*PLOT(TREAT)   E = Error SS&CP Matrix

S=3   M=3   N=65.5

Statistic                       Value          F    Num DF     Den DF   Pr > F
Wilks' Lambda              0.41462444     4.5594        30   391.0573   0.0001
Pillai's Trace             0.73298233     4.3649        30        405   0.0001
Hotelling-Lawley Trace     1.07695801     4.7266        30        395   0.0001
Roy's Greatest Root        0.63583745     8.5838        10        135   0.0001

NOTE: F Statistic for Roy's Greatest Root is an upper bound.

FIGURE 10. SAS output for herbicide trial: repeated measures ANOVA of heights of crop trees.


Tests of Hypotheses for Between Subjects Effects   (F)

Source           DF    Type III SS   Mean Square   F Value   Pr > F
TREAT             4     206239.893     51559.973      9.86   0.0001
PLOT(TREAT)      10     700425.367     70042.537     13.40   0.0001
Error           135     705704.600      5227.441

Univariate Tests of Hypotheses for Within Subject Effects

Source: YEAR   (G)
                                                                   Adj Pr > F
 DF       Type III SS      Mean Square   F Value   Pr > F      G - G     H - F
  3    1257151.633333    419050.544444   1050.21   0.0001     0.0001    0.0001

Source: YEAR*TREAT   (C)
                                                                   Adj Pr > F
 DF       Type III SS      Mean Square   F Value   Pr > F      G - G     H - F
 12      43823.333333      3651.944444      9.15   0.0001     0.0001    0.0001

Source: YEAR*PLOT(TREAT)   (D)
                                                                   Adj Pr > F
 DF       Type III SS      Mean Square   F Value   Pr > F      G - G     H - F
 30      93775.633333      3125.854444      7.83   0.0001     0.0001    0.0001

Source: Error (YEAR)

 DF       Type III SS      Mean Square
405     161601.400000       399.015802

Greenhouse-Geisser Epsilon = 0.4172
Huynh-Feldt Epsilon = 0.4629

Analysis of Variance of Contrast Variables   (H)
YEAR.N represents the nth successive difference in YEAR

Contrast Variable: YEAR.1

Source           DF     Type III SS     Mean Square   F Value   Pr > F
MEAN              1    94451.306667    94451.306667    869.43   0.0001
TREAT             4     5218.760000     1304.690000     12.01   0.0001
PLOT(TREAT)      10     5424.133333      542.413333      4.99   0.0001
Error           135    14665.800000      108.635556

Contrast Variable: YEAR.2

Source           DF     Type III SS     Mean Square   F Value   Pr > F
MEAN              1    279331.52667    279331.52667   1055.28   0.0001
TREAT             4     12454.10667      3113.52667     11.76   0.0001
PLOT(TREAT)      10     19362.86667      1936.28667      7.32   0.0001
Error           135     35734.50000       264.70000

FIGURE 10. (Continued).


Contrast Variable: YEAR.3

Source           DF     Type III SS     Mean Square   F Value   Pr > F
MEAN              1    410188.90667    410188.90667    576.56   0.0001
TREAT             4     16531.96000      4132.99000      5.81   0.0002
PLOT(TREAT)      10     47252.53333      4725.25333      6.64   0.0001
Error           135     96044.60000       711.44148

M Matrix Describing Transformed Variables

           HT85   HT86   HT87   HT88
MVAR1         1     -1      0      0
MVAR2         0      1     -1      0
MVAR3         0      0      1     -1

MANOVA Test Criteria and F Approximations for
the Hypothesis of no Overall TREAT Effect   (B)
on the variables defined by the M Matrix Transformation

H = Type III SS&CP Matrix for TREAT
E = Type III SS&CP Matrix for PLOT(TREAT)

S=3   M=0   N=3

Statistic                       Value          F    Num DF     Den DF   Pr > F
Wilks' Lambda              0.26447097     1.1680        12   21.45751   0.3630
Pillai's Trace             0.96402873     1.1837        12         30   0.3379
Hotelling-Lawley Trace     1.93488089     1.0749        12         20   0.4280
Roy's Greatest Root        1.31516053     3.2879         4         10   0.0577

NOTE: F Statistic for Roy's Greatest Root is an upper bound.

MANOVA Test Criteria and F Approximations for
the Hypothesis of no Overall TREAT Effect   (E)

H = Type III SS&CP Matrix for TREAT
E = Type III SS&CP Matrix for PLOT(TREAT)

S=4   M=-0.5   N=2.5

Statistic                       Value          F    Num DF     Den DF   Pr > F
Wilks' Lambda              0.21241138     0.9091        16   22.02298   0.5703
Pillai's Trace             1.07167001     0.9149        16         40   0.5592
Hotelling-Lawley Trace     2.41897543     0.8315        16         22   0.6425
Roy's Greatest Root        1.70977545     4.2744         4         10   0.0284

NOTE: F Statistic for Roy's Greatest Root is an upper bound.

FIGURE 10. (Concluded).


The second multivariate ANOVA (Manova) test (Figure 10A) is a test of H01: no time × treatment interaction (see Figure 4A). However, since this test uses the default error matrix (SUBPLOTS), it is not applicable. The correct test is performed by including the statement

    MANOVA H=TREAT M=HT85-HT86,HT86-HT87,HT87-HT88 E=PLOT(TREAT);

the results of which are given in Figure 10B. To obtain the corresponding univariate test for parallelism, the univariate mean square for YEAR*TREAT (Figure 10C) is divided by the univariate mean square for YEAR*PLOT(TREAT) (Figure 10D), which gives an F-ratio of 1.17 with 12 and 30 degrees of freedom (p=0.35). Thus it can be concluded that there is no evidence that the treatment effects, if any, vary with time. Consequently, it is appropriate to test for an overall treatment effect (H02) and an overall time effect (H03).

A Manova test of the null hypothesis H02 (no treatment effect; see Figure 4B) is requested by the statement

    MANOVA H=TREAT E=PLOT(TREAT);

the results of which are given in Figure 10E. A univariate F-test for an overall treatment effect (i.e., a comparison of the average treatment effects over all years) is obtained by computing F = 51559.973/70042.537 = 0.74, which has 4 and 10 degrees of freedom (see Figure 10F). Both tests suggest there is no difference between the treatments. However, the trees exhibit a time effect, which can be attributed to significant height growth over the 4 years. The F-ratio for this effect is F = 419050.54444/3125.85444 = 134.06, which has 3 and 30 degrees of freedom (Figure 10G).
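The three univariate F-ratios quoted above (1.17, 0.74, and 134.06) all follow the same rule: the mean square for the effect is divided by the mean square of the corresponding plot-level term rather than by the subplot-level residual. A minimal Python sketch of the arithmetic is given below, using mean squares copied from Figure 10; the dictionary and function names are illustrative only.

```python
# Mean squares copied from the subplot-level output in Figure 10
mean_square = {
    "TREAT": 51559.973,
    "PLOT(TREAT)": 70042.537,
    "YEAR": 419050.544444,
    "YEAR*TREAT": 3651.944444,
    "YEAR*PLOT(TREAT)": 3125.854444,
}

# (effect, correct error term, numerator df, denominator df)
tests = [
    ("YEAR*TREAT", "YEAR*PLOT(TREAT)", 12, 30),   # H01: parallelism
    ("TREAT", "PLOT(TREAT)", 4, 10),              # H02: overall treatment effect
    ("YEAR", "YEAR*PLOT(TREAT)", 3, 30),          # H03: overall time effect
]


def f_ratio(effect, error):
    """F-ratio of an effect against a user-specified error term."""
    return mean_square[effect] / mean_square[error]


results = {}
for effect, error, df_num, df_den in tests:
    results[effect] = f_ratio(effect, error)
    print(f"{effect:<12} vs {error:<18} F = {results[effect]:8.2f}  "
          f"(df = {df_num}, {df_den})")
```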

The PROFILE option in the REPEATED statement produces three separate ANOVAs of the annual growth increments: HT86-HT85, HT87-HT86, HT88-HT87 (Figure 10H). Here, as in the preceding analyses, the F-ratios for comparing the treatment groups must be re-computed with PLOT(TREAT) as the error term.
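As an illustration of that re-computation, the sketch below (Python, with hypothetical variable names) takes the TREAT and PLOT(TREAT) mean squares for each contrast variable in Figure 10H and forms the F-ratio with 4 and 10 degrees of freedom. The p-values are not computed here, since they require the F distribution, which is not available in the Python standard library.

```python
# For each growth increment (contrast variable), the mean squares from Figure 10H:
# (TREAT mean square, PLOT(TREAT) mean square, default F printed by PROC GLM)
contrast_tables = {
    "YEAR.1 (1985-1986)": (1304.69000, 542.413333, 12.01),
    "YEAR.2 (1986-1987)": (3113.52667, 1936.28667, 11.76),
    "YEAR.3 (1987-1988)": (4132.99000, 4725.25333, 5.81),
}

recomputed = {}
for name, (ms_treat, ms_plot, f_default) in contrast_tables.items():
    # Treatments were assigned to plots, so PLOT(TREAT) is the error term.
    recomputed[name] = ms_treat / ms_plot
    print(f"{name}: default F = {f_default:5.2f}, "
          f"F with PLOT(TREAT) as error = {recomputed[name]:.2f} (df = 4, 10)")
```

In each case the re-computed ratio is much smaller than the default value printed in the output.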

The above results could also have been obtained with a repeated measures analysis of the plot averages (i.e., the averages of the subplot measurements within each plot). The relevant SAS commands are:

    PROC SORT DATA=SUBPLOTS;
      BY TREAT PLOT;

    PROC SUMMARY;
      BY TREAT PLOT;
      VAR HT85-HT88;
      OUTPUT OUT=PLOTS MEAN=MHT85-MHT88;

    PROC GLM DATA=PLOTS;
      TITLE 'REPEATED MEASURES ANALYSIS OF PLOT AVERAGES';
      CLASS TREAT;
      MODEL MHT85-MHT88 = TREAT;
      REPEATED YEAR 4 (1 2 3 4) PROFILE/SUMMARY PRINTE;

The results are displayed in Figure 11. In this case, the default tests of H01 (Figure 11A,B), H02 (Figure 11C), and H03 (Figure 11D,E) are correct (cf. Figure 10). In addition to the univariate and multivariate tests of the effects of treatment and time, a test of sphericity to assess the validity of the Huynh-Feldt condition was applied by specifying the PRINTE option in the REPEATED statement. The results (Figure 11F) suggest that the condition does not hold. The adjusted univariate tests or the multivariate tests should therefore be used.
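Two points in this comparison can be checked numerically. First, because each plot average is based on 10 subplots, every mean square in the plot-level analysis (Figure 11) is one-tenth of the corresponding mean square in the subplot-level analysis (Figure 10), so the ratios of interest are unchanged. Second, the adjusted univariate tests are commonly described as multiplying both degrees of freedom by the estimated epsilon; that description is assumed in the sketch below, which is illustrative Python with hypothetical names and is not part of the SAS output.

```python
SUBPLOTS_PER_PLOT = 10

# (mean square in Figure 10, mean square in Figure 11)
paired_mean_squares = {
    "TREAT": (51559.973, 5155.9973),
    "PLOT(TREAT) vs Error": (70042.537, 7004.2537),
    "YEAR": (419050.544444, 41905.0544444),
    "YEAR*TREAT": (3651.944444, 365.1944444),
    "YEAR*PLOT(TREAT) vs Error(YEAR)": (3125.854444, 312.5854444),
}

scaling = {name: sub / plot for name, (sub, plot) in paired_mean_squares.items()}
for name, ratio in scaling.items():
    print(f"{name:<34} subplot MS / plot MS = {ratio:.4f}")


def adjusted_df(df_num, df_den, epsilon):
    """Degrees of freedom after an epsilon adjustment (assumed form:
    both degrees of freedom are multiplied by epsilon)."""
    return df_num * epsilon, df_den * epsilon


# Epsilon estimates from Figure 11
gg_epsilon = 0.3708   # Greenhouse-Geisser
hf_epsilon = 0.5507   # Huynh-Feldt

for label, eps in (("G-G", gg_epsilon), ("H-F", hf_epsilon)):
    num, den = adjusted_df(12, 30, eps)   # YEAR*TREAT test
    print(f"{label}: YEAR*TREAT tested on {num:.2f} and {den:.2f} df")
```

With four assessments the smallest possible value of epsilon is 1/3, so the Greenhouse-Geisser estimate of 0.3708 is close to its lower bound, in agreement with the outcome of the sphericity test.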


Class    Levels   Values
TREAT         5   CONTROL HEXABRD HEXAGRN HEXASPT SULFMET

Number of observations in data set = 15

Repeated Measures Analysis of Variance
Repeated Measures Level Information

Dependent Variable    MHT85   MHT86   MHT87   MHT88
Level of YEAR             1       2       3       4

Partial Correlation Coefficients from the Error SS&CP Matrix / Prob > |r|

DF = 9        MHT85       MHT86       MHT87       MHT88

MHT85      1.000000    0.992714    0.947376    0.834764
           0.0         0.0001      0.0001      0.0014

MHT86      0.992714    1.000000    0.971185    0.866719
           0.0001      0.0         0.0001      0.0006

MHT87      0.947376    0.971185    1.000000    0.949028
           0.0001      0.0001      0.0         0.0001

MHT88      0.834764    0.866719    0.949028    1.000000
           0.0014      0.0006      0.0001      0.0

E = Error SS&CP Matrix

YEAR.N represents the nth successive difference in YEAR

               YEAR.1         YEAR.2         YEAR.3
YEAR.1     542.413333     779.020000     688.326667
YEAR.2     779.020000    1936.286667    2367.343333
YEAR.3     688.326667    2367.343333    4725.253333

FIGURE 11. SAS output for herbicide trial: repeated measures ANOVA of plot averages.


Partial Correlation Coefficients from the Error SS&CP Matrix
of the Variables Defined by the Specified Transformation / Prob > |r|

DF = 9       YEAR.1      YEAR.2      YEAR.3

YEAR.1     1.000000    0.760149    0.429949
           0.0         0.0066      0.1869

YEAR.2     0.760149    1.000000    0.782643
           0.0066      0.0         0.0044

YEAR.3     0.429949    0.782643    1.000000
           0.1869      0.0044      0.0

Test for Sphericity: Mauchly's Criterion = 0.0488732   (F)
Chisquare Approximation = 26.328264 with 5 df   Prob > Chisquare = 0.0001

Applied to Orthogonal Components:
Test for Sphericity: Mauchly's Criterion = 0.0055393
Chisquare Approximation = 45.31972 with 5 df   Prob > Chisquare = 0.0000

Manova Test Criteria and Exact F Statistics for
the Hypothesis of no YEAR Effect   (E)

H = Type III SS&CP Matrix for YEAR   E = Error SS&CP Matrix

S=1   M=0.5   N=3

Statistic                       Value          F    Num DF     Den DF   Pr > F
Wilks' Lambda              0.04989100    50.7832         3          8   0.0001
Pillai's Trace             0.95010900    50.7832         3          8   0.0001
Hotelling-Lawley Trace    19.04369532    50.7832         3          8   0.0001
Roy's Greatest Root       19.04369532    50.7832         3          8   0.0001

Manova Test Criteria and F Approximations for
the Hypothesis of no YEAR*TREAT Effect   (A)

H = Type III SS&CP Matrix for YEAR*TREAT   E = Error SS&CP Matrix

S=3   M=0   N=3

Statistic                       Value          F    Num DF     Den DF   Pr > F
Wilks' Lambda              0.26447097     1.1680        12   21.45751   0.3630
Pillai's Trace             0.96402873     1.1837        12         30   0.3379
Hotelling-Lawley Trace     1.93488089     1.0749        12         20   0.4280
Roy's Greatest Root        1.31516053     3.2879         4         10   0.0577

NOTE: F Statistic for Roy's Greatest Root is an upper bound.

Tests of Hypotheses for Between Subjects Effects   (C)

Source     DF   Type III SS   Mean Square   F Value   Pr > F
TREAT       4    20623.9893     5155.9973      0.74   0.5881
Error      10    70042.5367     7004.2537

FIGURE 11. (Continued).


Univariate Tests of Hypotheses for Within Subject Effects

Source: YEAR   (D)
                                                                   Adj Pr > F
 DF       Type III SS      Mean Square   F Value   Pr > F      G - G     H - F
  3    125715.1633333    41905.0544444    134.06   0.0001     0.0001    0.0001

Source: YEAR*TREAT   (B)
                                                                   Adj Pr > F
 DF       Type III SS      Mean Square   F Value   Pr > F      G - G     H - F
 12      4382.3333333      365.1944444      1.17   0.3480     0.3796    0.3707

Source: Error (YEAR)

 DF       Type III SS      Mean Square
 30      9377.5633333      312.5854444

Greenhouse-Geisser Epsilon = 0.3708
Huynh-Feldt Epsilon = 0.5507

FIGURE 11. (Concluded).


4.7.3 Log-linear analysis of frequency of occurrence of target species

A log-linear model was used to test whether or not the presence or absence of the target competitor species differed, in 1988, for the five treatment groups. The pre-treatment presence or absence of the species was included in the model to adjust for initial differences. The SAS code for tabulating the frequencies and for fitting the relevant models is listed below:

    DATA SUBPLOTS;
      INFILE 'A:EXAMPLE.DAT';
      INPUT PLOT $ 1-3 SUBPLOT 5-7 TREAT $ 9-15
            (COV85-COV88) (4*5.0) (HT85-HT88) (4*5.0);
      IF COV85>0 THEN VEG85=1; ELSE VEG85=0;
      IF COV88>0 THEN VEG88=1; ELSE VEG88=0;

    PROC TABULATE DATA=SUBPLOTS;
      TITLE 'FREQUENCY OF OCCURRENCE OF VEGETATION';
      CLASS TREAT VEG85 VEG88;
      TABLE (TREAT ALL),(VEG85 ALL)*(VEG88*N*F=4. ALL*N*F=4.);

    PROC CATMOD DATA=SUBPLOTS;
      TITLE 'LOGISTIC ANALYSIS OF VEG88: TREATMENT EFFECT';
      POPULATION TREAT VEG85;
      MODEL VEG88=VEG85 TREAT/ML NOGLS;
      CONTRAST 'HEXABRD VS CONTROL' TREAT -1 1 0 0;
      CONTRAST 'HEXAGRN VS CONTROL' TREAT -1 0 1 0;
      CONTRAST 'HEXASPT VS CONTROL' TREAT -1 0 0 1;
      CONTRAST 'SULFMET VS CONTROL' TREAT -2 -1 -1 -1;

    PROC CATMOD DATA=SUBPLOTS;
      TITLE 'LOGISTIC ANALYSIS OF VEG88: NO TREATMENT EFFECT';
      POPULATION TREAT VEG85;
      MODEL VEG88=VEG85/ML NOGLS;

The output from the above program is presented in Figure 12.
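The coefficients of the last CONTRAST statement (-2 -1 -1 -1) may look different from the other three. One explanation, assumed here, is that the four TREAT parameters are deviations from the overall mean for the first four treatment levels (CONTROL, HEXABRD, HEXAGRN, HEXASPT, in sorted order), with the effect of the last level (SULFMET) equal to minus their sum. The Python sketch below, with illustrative names only, applies that assumption to the TREAT estimates printed in Figure 12 and confirms that the contrast reproduces the SULFMET minus CONTROL difference.

```python
# Estimates of the four TREAT parameters from Figure 12 (parameters 3 to 6),
# assumed to correspond to CONTROL, HEXABRD, HEXAGRN, HEXASPT in that order.
estimates = {
    "CONTROL": -0.5054,
    "HEXABRD": 0.4049,
    "HEXAGRN": -0.2217,
    "HEXASPT": -1.3909,
}

# Under deviation-from-mean coding the effects sum to zero, so the effect of
# the last level is minus the sum of the estimated parameters.
sulfmet_effect = -sum(estimates.values())
direct_difference = sulfmet_effect - estimates["CONTROL"]

# The same quantity written as a linear contrast of the four parameters
contrast = [-2, -1, -1, -1]
contrast_value = sum(c * b for c, b in zip(contrast, estimates.values()))

print(f"Implied SULFMET effect:       {sulfmet_effect:.4f}")
print(f"SULFMET - CONTROL (direct):   {direct_difference:.4f}")
print(f"SULFMET - CONTROL (contrast): {contrast_value:.4f}")
```

The other three contrasts compare two of the estimated parameters directly, which is why their coefficients are simply -1 and 1.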

The frequencies of occurrence for the five treatment groups, controlling for the initial presence or absence of the species, are given in Figure 12A. Comparison of the likelihood ratio statistics for the model that includes treatment (Figure 12B) and the model that does not include treatment (Figure 12C) suggests there is a significant difference between treatments. That is, the former model provides a significantly better fit than the latter. One difference is evident from the comparison of each treatment with the control (Figure 12D). The herbicide SULFMET differs significantly from the control. It seems to deter the target species from becoming established in subplots in which it was initially absent (see Figure 12A).
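The comparison of the two likelihood ratio statistics can be made explicit. Because the model without TREAT is nested in the model with TREAT, the difference between the two statistics can be referred to a chi-square distribution whose degrees of freedom equal the difference in their degrees of freedom. The Python sketch below is illustrative only (the function name is hypothetical); it uses the closed-form upper tail of the chi-square distribution, which is available when the degrees of freedom are even.

```python
import math


def chi_square_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees
    of freedom: exp(-x/2) * sum_{k < df/2} (x/2)**k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Likelihood ratio (goodness-of-fit) statistics from Figure 12
lr_without_treat, df_without_treat = 19.38, 8   # model VEG88 = VEG85
lr_with_treat, df_with_treat = 6.08, 4          # model VEG88 = VEG85 TREAT

lr_difference = lr_without_treat - lr_with_treat
df_difference = df_without_treat - df_with_treat
p_value = chi_square_upper_tail_even_df(lr_difference, df_difference)

print(f"Difference = {lr_difference:.2f} on {df_difference} df, p = {p_value:.4f}")
```

The resulting p-value is close to 0.01, which supports the statement that including treatment significantly improves the fit.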


FREQUENCY OF OCCURRENCE OF VEGETATION   (A)

                                VEG85
                   0               1              ALL
                 VEG88           VEG88           VEG88
               0    1  ALL     0    1  ALL     0    1  ALL
               N    N    N     N    N    N     N    N    N
TREAT
CONTROL        6    5   11     .   19   19     6   24   30
HEXABRD        8    2   10     .   20   20     8   22   30
HEXAGRN        8    5   13     .   17   17     8   22   30
HEXASPT        1    5    6     1   23   24     2   28   30
SULFMET       15    2   17     2   11   13    17   13   30
ALL           38   19   57     3   90   93    41  109  150

LOGISTIC ANALYSIS OF VEG88: TREATMENT EFFECT

Response: VEG88            Response Levels (R) = 2
Weight Variable: None      Populations     (S) = 10
Data Set: SUBPLOTS         Total Frequency (N) = 150
                           Observations  (Obs) = 150

RESPONSE PROFILES

Response   VEG88
   1         0
   2         1

MAXIMUM LIKELIHOOD ANALYSIS OF VARIANCE TABLE

Source              DF   Chi-Square     Prob
INTERCEPT            1        19.01   0.0000
VEG85                1        34.51   0.0000
TREAT                4        10.48   0.0331

LIKELIHOOD RATIO     4         6.08   0.1935   (B)

FIGURE 12. SAS output for herbicide trial: log-linear analysis of frequency of occurrence of target species.


ANALYSIS OF MAXIMUM LIKELIHOOD ESTIMATES

                                     Standard     Chi-
Effect       Parameter   Estimate       Error   Square     Prob
INTERCEPT            1    -1.5634      0.3586    19.01   0.0000
VEG85                2     2.1504      0.3660    34.51   0.0000
TREAT                3    -0.5054      0.5371     0.89   0.3468
                     4     0.4049      0.5566     0.53   0.4670
                     5    -0.2217      0.5124     0.19   0.6652
                     6    -1.3909      0.7136     3.80   0.0513

CONTRASTS OF MAXIMUM LIKELIHOOD ESTIMATES

Contrast               DF   Chi-Square     Prob
HEXABRD VS CONTROL      1         1.15   0.2844
HEXAGRN VS CONTROL      1         0.13   0.7187
HEXASPT VS CONTROL      1         0.76   0.3836
SULFMET VS CONTROL      1         6.32   0.0119   (D)

LOGISTIC ANALYSIS OF VEG88: NO TREATMENT EFFECT

MAXIMUM LIKELIHOOD ANALYSIS OF VARIANCE TABLE

Source              DF   Chi-Square     Prob
INTERCEPT            1        17.32   0.0000
VEG85                1        39.60   0.0000

LIKELIHOOD RATIO     8        19.38   0.0129   (C)

ANALYSIS OF MAXIMUM LIKELIHOOD ESTIMATES

                                     Standard     Chi-
Effect       Parameter   Estimate       Error   Square     Prob
INTERCEPT            1    -1.3540      0.3253    17.32   0.0000
VEG85                2     2.0472      0.3253    39.60   0.0000

FIGURE 12. (Concluded).


5 CONCLUSIONS

The objectives and design of a vegetation management trial are the two most important factors governing the choice of suitable statistical methods of analysis. The nature of the response variables (for example, continuous or categorical) must also be taken into consideration. Exploratory and diagnostic methods, especially such graphical methods as probability plots and boxplots, are valuable for suggesting an appropriate model or data transformation, and for verifying model assumptions. Analysis of variance methods, in particular the analysis of repeated measures, in conjunction with multiple comparisons to identify effective treatments and to estimate the sizes of the treatment effects, are generally suitable for the analysis of continuous response variables such as height and diameter. Log-linear methods, including tests for significant contrasts among the treatments, are often appropriate for the analysis of categorical response variables (e.g., survival and species occurrence). Finally, an analysis of power is recommended to assist in the interpretation of the results of an analysis and in the improvement of the design of future trials.


APPENDIX 1: SAS programs for the analysis of forest vegetation management data

A. Univariate ANOVA and repeated measures analysis for each assessment of continuous response variables

/**************************************************************************************************/
/* REPEATED MEASURES ANALYSIS AND UNIVARIATE ANOVA */
/* OF CONTINUOUS RESPONSE VARIABLES */
/**************************************************************************************************/
/* */
/* The data is read from the file INPUT.DAT (substitute desired name). */
/* */
/* The data file has the following form: */
/* */
/* Completely Randomized Design */
/* */
/*    TREAT     PLOT    SUBPLOT    Y1      Y2      Y3      Y4   . . . */
/*    CONTROL   1       1          23.1    23.8    29.1    29.6 */
/*    CONTROL   1       2          19.8    19.9    20.2    21.3 */
/*    .         .       .          .       .       .       . */
/*    .         .       .          .       .       .       . */
/*    .         .       .          .       .       .       . */
/* */
/* Randomized Block Design */
/* */
/*    TREAT     BLOCK   SUBPLOT    Y1      Y2      Y3      Y4   . . . */
/*    CONTROL   1       1          23.1    23.8    29.1    29.6 */
/*    CONTROL   1       2          19.8    19.9    20.2    21.3 */
/*    .         .       .          .       .       .       . */
/*    .         .       .          .       .       .       . */
/*    .         .       .          .       .       .       . */
/* */
/* The response variable Y (height, diameter, cover, etc.) is measured before the treatments are */
/* applied (Y1) and on one or more post-treatment occasions (Y2,Y3,Y4). For illustration, it is */
/* assumed that there are three post-treatment assessments. This should be adjusted as required. */
/* */
/* The analysis is for the completely randomized design. */


In the case of a randomized block design (1 plot per treatment per block) substitute BLOCK for

PLOT and change the statements in PROC GLM to:

PROC GLM DATA=SUBPLOTS;
TITLE 'REPEATED MEASURES ANALYSIS';
CLASS TREAT BLOCK;
MODEL Y1-Y4=TREAT BLOCK TREAT*BLOCK;
REPEATED YEAR 4 (1 2 3 4) PROFILE/SUMMARY PRINTM;
MANOVA H=TREAT M=Y2-Y1,Y3-Y2,Y4-Y3 E=TREAT*BLOCK;
TEST H=TREAT E=TREAT*BLOCK;
MANOVA H=TREAT E=TREAT*BLOCK;
RANDOM BLOCK TREAT*BLOCK;
MEANS TREAT/T E=TREAT*BLOCK;
OUTPUT OUT=RESIDUAL R=R1-R4 P=P1-P4;

The blocking factor is assumed to be random. If it is fixed the TEST statement and the

RANDOM statement should be omitted; and E=TREAT*BLOCK should be deleted from the

MEANS statement.

/* Create data set SUBPLOTS containing raw data. */
/* TREAT is read as a character variable ($) because its values are labels such as CONTROL. */
DATA SUBPLOTS;
INFILE 'C:INPUT.DAT' MISSOVER;
INPUT TREAT $ PLOT SUBPLOT Y1-Y4;

/* Sort by treatment and plot. */
PROC SORT;
BY TREAT PLOT;

/* Compute summary statistics for plots (blocks). */
PROC MEANS DATA=SUBPLOTS MEAN STD MIN MAX;
CLASS TREAT PLOT;
VAR Y1-Y4;
OUTPUT OUT=PLOTS N=N1-N4 MEAN=M1-M4 STD=S1-S4;

/* Compute summary statistics for treatment groups. */
PROC MEANS DATA=SUBPLOTS MEAN STD MIN MAX;
CLASS TREAT;
VAR Y1-Y4;
OUTPUT OUT=TREATS N=N1-N4 MEAN=M1-M4 STD=S1-S4;


/* Perform repeated measures analysis and univariate ANOVA of Y1-Y4. */
PROC GLM DATA=SUBPLOTS;
TITLE 'REPEATED MEASURES ANALYSIS';
CLASS TREAT PLOT;
MODEL Y1-Y4=TREAT PLOT(TREAT);

/* The PROFILE option tests for parallelism. To test the significance of */
/* polynomial contrasts, substitute POLYNOMIAL. */
REPEATED YEAR 4 (1 2 3 4) PROFILE/SUMMARY PRINTM;

/* Multivariate test for treatment-by-time interaction (parallelism). */
MANOVA H=TREAT M=Y2-Y1,Y3-Y2,Y4-Y3 E=PLOT(TREAT);

/* Test for a treatment effect (for each year). */
TEST H=TREAT E=PLOT(TREAT);

/* Multivariate test for a treatment effect (i.e., H0: group means are equal for all assessments). */
MANOVA H=TREAT E=PLOT(TREAT);

/* Compute expected mean squares to check the validity of F-tests, especially when the sample sizes */
/* are unequal. */
RANDOM PLOT(TREAT);

/* Perform multiple comparisons using Fisher's LSD (T). Substitute another option as desired. */
MEANS TREAT/T E=PLOT(TREAT);

/* Save residuals and predicted values for diagnostic analysis. */
OUTPUT OUT=RESIDUAL R=R1-R4 P=P1-P4;

/* Carry out diagnostic analysis of residuals. */
PROC UNIVARIATE DATA=RESIDUAL PLOT NORMAL;
VAR R1-R4;
RUN;


B. Log-linear (logistic) analysis of categorical response variables


LOG-LINEAR (LOGISTIC) ANALYSIS OF CATEGORICAL

(CODED) RESPONSE VARIABLES

The data is read from the file INPUT.DAT (substitute desired name).

The data file has the following form:

TREAT FACTOR SUBPLOT Y

CONTROL 1 1 0

CONTROL 1 2 0

CONTROL 1 3 1

. . . .

. . . .

. . . .

The variable Y is a binary (categorical) response variable for a single assessment, e.g., Y=0/1 if

a particular target species is absent/present, or Y=0/1 if crop tree is dead/alive. The variable

FACTOR is an explanatory factor other than TREAT, e.g., location, absence/presence of a

competing species, etc. FACTOR may be omitted or additional factors added as required.

/* Create data set SUBPLOTS containing raw data. */
/* The INPUT statement follows the file layout above: TREAT (character), FACTOR, SUBPLOT, Y. */
DATA SUBPLOTS;
INFILE 'C:INPUT.DAT' MISSOVER;
INPUT TREAT $ FACTOR SUBPLOT Y;

/* Compile frequency (contingency) table. This table is unformatted. */
/* Formats, labels, etc. should be added as desired. */
PROC TABULATE DATA=SUBPLOTS;
TITLE 'OBSERVED FREQUENCIES';
CLASS TREAT FACTOR Y;
TABLE (TREAT ALL),(FACTOR ALL)*(Y*N*F=4. ALL*N*F=4.);

/* Carry out logistic analysis. Test various hypotheses by fitting appropriate models. */
PROC CATMOD DATA=SUBPLOTS;
TITLE 'LOGISTIC ANALYSIS OF Y: SATURATED MODEL';
POPULATION TREAT FACTOR;


MODEL Y=FACTOR TREAT FACTOR*TREAT/ML NOGLS;

PROC CATMOD DATA=SUBPLOTS;
TITLE 'LOGISTIC ANALYSIS OF Y: REDUCED MODEL WITH TREATMENT EFFECT';
POPULATION TREAT FACTOR;
MODEL Y=TREAT FACTOR/ML NOGLS;

/* Compare treatments by testing various contrasts. Be sure to check that the contrasts are */
/* consistent with the model parameterization. */
CONTRAST 'TREATMENT A VS CONTROL' TREAT -1 1 0;
CONTRAST 'TREATMENT B VS CONTROL' TREAT -1 0 1;
CONTRAST 'TREATMENT C VS CONTROL' TREAT -2 -1 -1;

/* Save fitted frequencies for examination. */
RESPONSE/OUT=FITTED1;

PROC CATMOD DATA=SUBPLOTS;
TITLE 'LOGISTIC ANALYSIS OF Y: REDUCED MODEL, NO TREATMENT EFFECT';
POPULATION TREAT FACTOR;
MODEL Y=FACTOR/ML NOGLS;
RESPONSE/OUT=FITTED2;

RUN;
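The caution about matching contrasts to the model parameterization can be checked numerically. The following is a minimal sketch in Python (not SAS), under three assumptions that are not stated in the listing: TREAT has four levels, the control is the first level in sort order, and the treatment effects use sum-to-zero (deviation) coding, in which the effect of the last level equals minus the sum of the others. The function names are hypothetical. Under those assumptions the sketch reproduces the three coefficient rows used in the CONTRAST statements of program B, including the less obvious −2 −1 −1 row for the last treatment.

```python
def effect_row(level, n_levels):
    """Design-matrix row for one level under sum-to-zero (deviation) coding.

    Levels are numbered 1..n_levels; the last level is coded -1 in every column.
    """
    if level == n_levels:
        return [-1] * (n_levels - 1)
    return [1 if j == level else 0 for j in range(1, n_levels)]


def contrast_vs_reference(level, reference, n_levels):
    """Coefficients on the free parameters for (level effect) - (reference effect)."""
    row = effect_row(level, n_levels)
    ref = effect_row(reference, n_levels)
    return [a - b for a, b in zip(row, ref)]


# Assumed ordering: level 1 = control, levels 2-4 = treatments A, B, C.
for name, level in (("A", 2), ("B", 3), ("C", 4)):
    print("TREATMENT", name, "VS CONTROL:", contrast_vs_reference(level, 1, 4))
```

If the control is not the first level, or the number of treatments differs, the coefficients change accordingly, which is why the parameterization should be checked before the contrasts are interpreted.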


C. FPOWTAB: power analysis for general linear models (O’Brien 1987)

OPTIONS LS=70 DQUOTE NOSOURCE;

***************************** FPOWTAB *****************************;
*************************** SAS VERSION ***************************;

JUNE 1988

THIS PROGRAM WAS DEVELOPED UNDER THE CMS VERSION OF SAS RELEASE 5.16.
IT SHOULD ALSO RUN UNDER OTHER RELEASES OF SAS VERSION 5 AND OTHER
OPERATING SYSTEMS. IT IS NOT COMPATIBLE WITH SAS-PC (VERSION 6).

IT CALLS THE MACRO FACILITY, WHICH MAY NEED TO BE TURNED ON WITH AN
OPTION. FOR EXAMPLE, IN IBM-OS:

// EXEC SAS,OPTIONS='MACRO'

THIS IS PUBLIC-DOMAIN SOFTWARE: PLEASE DISTRIBUTE FREELY (AND FREE) TO
OTHERS. EVEN THOUGH I HAVE CHECKED THE PROGRAM THOROUGHLY, I CAN MAKE
NO GUARANTEE OF ANY SORT REGARDING ITS PERFORMANCE.
BUT IF YOU HAPPEN TO FIND A PROBLEM, PLEASE TELL ME.

NON-SAS USERS SHOULD NOTE THAT THERE IS A STAND-ALONE ANSI FORTRAN 77
VERSION OF THIS PROGRAM.

THIS WORK WAS SUPPORTED BY A FACULTY RESEARCH FELLOWSHIP FROM THE
UNIVERSITY OF TENNESSEE COLLEGE OF BUSINESS.

RALPH O'BRIEN, STATISTICS DEPT. AND COMPUTING CENTER
UNIV. OF TENNESSEE, KNOXVILLE, TN 37996-0532
615-974-2556, BITNET: PA87458 @ UTKVM1

******************** BEFORE USING THIS PROGRAM ********************;

THIS PROGRAM IS DESIGNED TO MAKE USE OF RESULTS PRODUCED BY ORDINARY
GENERAL LINEAR MODEL ROUTINES, SUCH AS PROC GLM. YOU MUST FIRST OBTAIN
SSH "STATISTICS" USING ARTIFICIAL DATA THAT HAS CELL MEANS THAT EQUAL YOUR
SCENARIO OF THE POPULATION MEANS. HERE IS THE SAS CODE THAT PRODUCED
THE SSH(POPULATION) VALUES USED FOR INPUT.

OPTIONS LINESIZE=72;
TITLE 'GET SSH(POPULATION) VALUES FOR HYPOTHETICAL LDL CHOLEST STUDY';
DATA;
INPUT DIEX DRUG SCNARIO1 SCNARIO2 BASE_N;
LIST;
CARDS;
1 1 -.05 -.05 2
1 2 -.10 -.12 1
1 3 -.13 -.18 1
2 1 -.10 -.12 2
2 2 -.12 -.15 1
2 3 -.16 -.20 1
PROC GLM;
CLASS DIEX DRUG;
FREQ BASE_N;
MODEL SCNARIO1 SCNARIO2 = DIEX DRUG DIEX*DRUG/SS3;
CONTRAST 'DRUG(2 -1 -1)' DRUG -2 1 1;
CONTRAST 'DRUG(0 1 -1)' DRUG 0 1 -1;
PROC GLM; *SPECIAL CONTRASTS USING THE CELL-MEANS MODEL;
CLASS DIEX DRUG;
FREQ BASE_N;
MODEL SCNARIO1 SCNARIO2 = DIEX*DRUG/NOINT SOLUTION;
CONTRAST 'DRUG(2 -1 -1) IN DIEX(2)' DIEX*DRUG 0 0 0 2 -1 -1;

************************ END OF SAS CODE ************************;

********************************** INSTRUCTIONS FOR INPUT TO FPOWTAB **********************************;

THIS IS A BATCH-VERSION PROGRAM, BUT IT COULD BE MODIFIED QUICKLY TO

ACCEPT INSTRUCTIONS INTERACTIVELY IN A QUESTION-ANSWER FORMAT. ALL

SPECIFICATION RECORDS ARE INPUT FROM A FILE SPECIFIED TO BE CALLED

‘POWTABS’. FOR EXAMPLE, THE IBM-OS JCL COMMAND MIGHT BE

//POWTABS DD DSN=DIEXDRUG.POWSPECS,DISP=OLD

OR THE CMS COMMAND MIGHT BE

FILEDEF POWTABS DISK DIEXDRUG POWSPECS A

ALL RECORDS ARE GIVEN IN LIST-DIRECTED MODE: THERE IS NO COLUMN

DEPENDENCY AND RECORDS CAN SPILL OVER ONTO TWO OR MORE LINES.

BUT, EACH RECORD MUST BEGIN ON A NEW LINE.

BY CHANGING THE STATEMENT

INFILE POWSPECS EOF=LAST

TO

INFILE CARDS EOF=LAST

ONE CAN USE THE INTERNAL FILE ("CARDS") FOR THE SPECIFICATIONS, BUT ONE
RUNS THE RISK THAT THE PROGRAMMING STATEMENTS MIGHT BE ACCIDENTALLY
MODIFIED.

RECORD 1: MAIN TITLE, UP TO 78 CHARACTERS.

(2 CONSECUTIVE SPACES MARK THE END OF TITLE)

EXAMPLE:

INPUT==> EFFECT OF DIET/EXERCISE AND DRUG THERAPY ON LDL CHOLESTEROL

RECORD 2: TOTAL SAMPLE SIZE USED AS BASIS FOR SSH(POPULATION) VALUES,

TOTAL NUMBER OF NONREDUNDANT PARAMETERS IN MODEL

(I.E. COLUMN RANK OF DESIGN MATRIX)

EXAMPLE: FOR A 2×3 FACTORIAL WITH FOUR N=1 CELLS AND TWO N=2 CELLS THE

BASIS TOTAL SAMPLE SIZE IS 8. THE FULL INTERACTION MODEL HAS 6

NONREDUNDANT PARAMETERS. THUS

INPUT==> 8 6

RECORD 3: NUMBER OF SCENARIOS FOR POPULATION MEANS, UP TO 5

TITLE OF FIRST SCENARIO, UP TO 78 CHARACTERS

TITLE OF SECOND SCENARIO, IF ANY

ETC.

TITLE OF LAST SCENARIO

(2 OR MORE CONSECUTIVE SPACES MARK THE ENDS OF TITLES)

EXAMPLE:

INPUT==> 2  LOW EFFECT  BIG EFFECT


RECORD 4: NUMBER OF TYPE I ERROR RATES (ALPHA LEVELS), UP TO 3

FIRST VALUE FOR ALPHA

SECOND VALUE, IF ANY

THIRD VALUE, IF ANY

EXAMPLE:

INPUT==> 2 .01 .05

RECORD 5: NUMBER OF STANDARD DEVIATIONS FOR ERROR TERM, UP TO 3

FIRST VALUE OF STANDARD DEVIATION

SECOND VALUE, IF ANY

THIRD VALUE, IF ANY

EXAMPLE: IF THE WITHIN-CELL STD. DEVS. FOR THIS 2 × 3 ANOVA ARE TO BE .1

AND .2, THEN

INPUT==> 2 .1 .2

RECORD 6: NUMBER OF TOTAL SAMPLE SIZES, UP TO 5

FIRST TOTAL SAMPLE SIZE

SECOND TOTAL SAMPLE SIZE, IF ANY

ETC.

LAST TOTAL SAMPLE SIZE

EXAMPLE: IN ORDER TO GET POWER FOR TOTAL SAMPLE SIZES OF 104, 208 AND

416 USE:

INPUT==> 3 104 208 416

EFFECTS RECORDS. ONE RECORD FOR EACH HYPOTHESIS (EFFECT) CONSIDERED,

UNLIMITED NUMBER. EACH RECORD HAS THE FOLLOWING FORMAT:

TITLE, UP TO 78 CHARACTERS, FOLLOWED BY AT LEAST 2 SPACES

DEGREES OF FREEDOM (NUMERATOR) FOR EFFECT

SSH(POPULATION) VALUE FOR FIRST SCENARIO OF MEANS

SSH(POPULATION) VALUE FOR SECOND SCENARIO OF MEANS, IF ANY

ETC.

SSH(POPULATION) VALUE FOR LAST SCENARIO OF MEANS

EXAMPLE: SIX EFFECTS AND TWO SCENARIOS. (THE SAS GLM CODE THAT

PRODUCED THESE SSH(POPULATION) VALUES IS REPRODUCED ABOVE.)

INPUT==> DIET/EXERCISE MAIN EFFECT  1 .002 .00288
INPUT==> DRUG MAIN EFFECT  2 .00674 .01504
INPUT==> DIET/EXERCISE BY DRUG INTERACTION  2 .00034 .00104
INPUT==> DRUG(2 -1 -1)  1 .00551 .01201
INPUT==> DRUG(0 1 -1)  1 .00123 .00303
INPUT==> DRUG(2 -1 -1) IN DIET/EXERCISE(2)  1 .0016 .00303

*********************** END OF INSTRUCTIONS ***********************;

************ OUTPUT PRODUCED BY THE ABOVE "INPUT==>" STATEMENTS ************;

********* FIRST PAGE *********;

EFFECT OF DIET/EXERCISE AND DRUG THERAPY ON LDL CHOLESTEROL              1

EFFECT: DIET/EXERCISE MAIN EFFECT,
DEGREES OF FREEDOM HYPOTHESIS: 1,
SCENARIO: LOW EFFECT,
POWERS COMPUTED FROM SSH(POPULATION): 0.002,
USING THE BASIS TOTAL SAMPLE SIZE: 8,
AND TOTAL NONREDUNDANT PARAMETERS IN MODEL: 6

                                   STD DEV
                           0.1                0.2
                         TOTAL N            TOTAL N
                      104  208  416      104  208  416
TEST TYPE    ALPHA               POWER
REGULAR F    0.01     .16  .38  .74      .04  .07  .17
             0.05     .36  .62  .90      .13  .21  .36
1-TAILED T   0.01     .23  .48  .81      .06  .12  .24
             0.05     .48  .73  .94      .20  .31  .49

********* LAST PAGE *********;

EFFECT OF DIET/EXERCISE AND DRUG THERAPY ON LDL CHOLESTEROL             12

EFFECT: DRUG(2 -1 -1) IN DIET/EXERCISE(2),
DEGREES OF FREEDOM HYPOTHESIS: 1,
SCENARIO: BIG EFFECT,
POWERS COMPUTED FROM SSH(POPULATION): 0.00303,
USING THE BASIS TOTAL SAMPLE SIZE: 8,
AND TOTAL NONREDUNDANT PARAMETERS IN MODEL: 6

                                   STD DEV
                           0.1                0.2
                         TOTAL N            TOTAL N
                      104  208  416      104  208  416
TEST TYPE    ALPHA               POWER
REGULAR F    0.01     .27  .58  .92      .05  .12  .27
             0.05     .50  .80  .98      .17  .29  .51
1-TAILED T   0.01     .36  .68  .95      .09  .18  .36
             0.05     .63  .88  .99      .25  .40  .63

******************** END OF OUTPUT FOR EXAMPLE ********************;

******************* SAS VERSION 5 CODE FOR FPOWTAB *******************;
OPTIONS LS=70 DQUOTE NOSOURCE; *ADD NOSOURCE AFTER CHECK OUT;
DATA POWDATA;
INFILE POWSPECS EOF=LAST; *IF NEEDED, CHANGE POWSPECS TO OTHER FILENAME;
FILE PRINT;
KEEP SCENARIO ALPHA POWER STDDEV TOTALN EFF_TITL DF_HYPTH TESTTYPE
     BASETOTN RANKX SSHPOP;
ARRAY ALPHAV{3} ALPHA1-ALPHA3;
ARRAY SCNARIOV{5} $ 78 SCNARIO1-SCNARIO5;
ARRAY SSHPOPV{5} SSHPOP1-SSHPOP5;
ARRAY STDDEVV{3} STDDEV1-STDDEV3;
ARRAY TOTALNV{5} TOTALN1-TOTALN5;
*;
************ INPUT TITLE AND ALL PARAMETER VALUES ************;
INPUT TITLEVAR & $78.;
CALL SYMPUT('MCRTITL',TITLEVAR);
INPUT BASETOTN RANKX;
INPUT NUM_SCN @; DO I = 1 TO NUM_SCN; INPUT SCNARIOV{I} & @; END;
INPUT NUM_ALPH @; DO I = 1 TO NUM_ALPH; INPUT ALPHAV{I} @; END;
INPUT NUM_SD @; DO I = 1 TO NUM_SD; INPUT STDDEVV{I} @; END;
INPUT NUM_N @; DO I = 1 TO NUM_N; INPUT TOTALNV{I} @; END;
*;
NXTEFFCT:; * NEXT EFFECT ;
INPUT EFF_TITL & $78. DF_HYPTH @;
DO I = 1 TO NUM_SCN; INPUT SSHPOPV{I} @; END;
*;
* LOOPS TO COMPUTE ALL ENTRIES FOR TABLE;
DO I_ALPHA = 1 TO NUM_ALPH; ALPHA = ALPHAV{I_ALPHA};
  DO I_SCN = 1 TO NUM_SCN; SCENARIO = SCNARIOV{I_SCN};
    SSHPOP = SSHPOPV{I_SCN};
    DO I_SD = 1 TO NUM_SD; STDDEV = STDDEVV{I_SD};
      DO I_N = 1 TO NUM_N; TOTALN = TOTALNV{I_N};
        LAMBDA = TOTALN*SSHPOP/(BASETOTN*STDDEV**2);
        DF_ERROR = TOTALN - RANKX;

        TESTTYPE = 'REGULAR F';
        FCRIT = FINV(1-ALPHA, DF_HYPTH, DF_ERROR, 0.0);
        POWER = 1 - FPROB(FCRIT, DF_HYPTH, DF_ERROR, LAMBDA);
        *FOR NON-IBM USE;
        *POWER = 1 - PROBF(FCRIT, DF_HYPTH, DF_ERROR, LAMBDA);
        POWER = ROUND(POWER,.01);
        IF POWER GT .99 THEN POWER = .99;
        OUTPUT;
        *;
        *COMPUTE POWER FOR ONE-TAILED T TESTS FOR SINGLE DF HYPOTHESES;
        IF DF_HYPTH = 1 THEN DO;
          TESTTYPE = '1-TAILED T';
          TCRIT = TINV(1-ALPHA, DF_ERROR, 0);
          POWER = 1 - TPROB(TCRIT, DF_ERROR, SQRT(LAMBDA));
          *FOR NON-IBM USE;
          *POWER = 1 - PROBT(TCRIT, DF_ERROR, SQRT(LAMBDA));
          POWER = ROUND(POWER,.01);
          IF POWER GT .99 THEN POWER = .99;
          OUTPUT;
        END;
        *;
      END;
    END;
  END;
END;
GO TO NXTEFFCT;
*;
LAST:; * NO MORE EFFECTS TO INPUT;
PROC TABULATE FORMAT=3.2 ORDER=DATA;
CLASS SCENARIO DF_HYPTH EFF_TITL TESTTYPE STDDEV TOTALN ALPHA
      BASETOTN RANKX SSHPOP;
VAR POWER;
TABLE EFF_TITL='EFFECT:' * DF_HYPTH='DEGREES OF FREEDOM HYPOTHESIS:' *
      SCENARIO='SCENARIO:' * SSHPOP='POWERS COMPUTED FROM SSH(POPULATION):'
      * BASETOTN='USING THE BASIS TOTAL SAMPLE SIZE:'
      * RANKX='TOTAL NONREDUNDANT PARAMETERS IN MODEL:',

      TESTTYPE='TEST TYPE' * ALPHA,
      STDDEV='STD DEV'*TOTALN='TOTAL N'*POWER*SUM=' '
      /RTSPACE=25;
TITLE1 "&MCRTITL";
*;
/* THE "POWSPECS" DATA FILE IS LISTED BELOW:
EFFECT OF DIET/EXERCISE AND DRUG THERAPY ON LDL CHOLESTEROL
8 6
2  LOW EFFECT  BIG EFFECT
2 .01 .05
2 .1 .2
3 104 208 416
DIET/EXERCISE MAIN EFFECT  1 .002 .00288
DRUG MAIN EFFECT  2 .00674 .01504
DIET/EXERCISE BY DRUG INTERACTION  2 .00034 .00104
DRUG(2 -1 -1)  1 .00551 .01201
DRUG(0 1 -1)  1 .00123 .00303
DRUG(2 -1 -1) IN DIET/EXERCISE(2)  1 .0016 .00303

*/
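The two quantities that drive every entry of the FPOWTAB tables are the noncentrality parameter (LAMBDA) and the error degrees of freedom (DF_ERROR). As a rough cross-check, the short Python sketch below recomputes them for the first example page (SSH(population) = 0.002, basis total sample size 8, six nonredundant parameters, standard deviation 0.1). The function names are hypothetical, and the sketch stops short of the power itself because the noncentral F distribution is not available in the Python standard library.

```python
def noncentrality(ssh_pop, base_n, total_n, std_dev):
    """Mirror of the LAMBDA statement: scale SSH(population) from the basis
    sample size to the planned total sample size and divide by the error variance."""
    return total_n * ssh_pop / (base_n * std_dev ** 2)


def error_df(total_n, rank_x):
    """Mirror of the DF_ERROR statement: total N minus the rank of the design matrix."""
    return total_n - rank_x


# First example page: diet/exercise main effect, low-effect scenario, std dev 0.1.
for total_n in (104, 208, 416):
    lam = noncentrality(0.002, 8, total_n, 0.1)
    print(total_n, round(lam, 3), error_df(total_n, 6))
```

Doubling the total sample size doubles the noncentrality, while doubling the standard deviation divides it by four, which is consistent with the much lower powers in the 0.2 columns of the example output.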


D. CPOWTAB: power analysis for log-linear models (O’Brien 1987)

OPTIONS LS=70 DQUOTE NOSOURCE;

***************************** CPOWTAB *****************************;
*************************** SAS VERSION ***************************;

JUNE 1988

THIS PROGRAM WAS DEVELOPED UNDER THE CMS VERSION OF SAS RELEASE 5.16.
IT SHOULD ALSO RUN UNDER OTHER RELEASES OF SAS VERSION 5 AND OTHER
OPERATING SYSTEMS. IT IS NOT COMPATIBLE WITH SAS-PC (VERSION 6).

IT CALLS THE MACRO FACILITY, WHICH MAY NEED TO BE TURNED ON WITH AN
OPTION. FOR EXAMPLE, IN IBM-OS:

// EXEC SAS,OPTIONS='MACRO'

THIS IS PUBLIC-DOMAIN SOFTWARE: PLEASE DISTRIBUTE FREELY (AND FREE) TO
OTHERS. EVEN THOUGH I HAVE CHECKED THE PROGRAM THOROUGHLY, I CAN MAKE
NO GUARANTEE OF ANY SORT REGARDING ITS PERFORMANCE.
BUT IF YOU HAPPEN TO FIND A PROBLEM, PLEASE TELL ME.

NON-SAS USERS SHOULD NOTE THAT THERE IS A STAND-ALONE ANSI FORTRAN
VERSION OF THIS PROGRAM.

THIS WORK WAS SUPPORTED BY A FACULTY RESEARCH FELLOWSHIP FROM THE
UNIVERSITY OF TENNESSEE COLLEGE OF BUSINESS.

RALPH O'BRIEN, STATISTICS DEPT. AND COMPUTING CENTER
UNIV. OF TENNESSEE, KNOXVILLE, TN 37996-0532
615-974-2556, BITNET: PA87458 @ UTKVM1

******************** BEFORE USING THIS PROGRAM ********************;

THIS PROGRAM IS DESIGNED TO MAKE USE OF RESULTS PRODUCED BY ORDINARY
LOG-LINEAR/LOGIT MODEL ROUTINES, SUCH AS PROC CATMOD IN SAS, OR BY THE
APPROPRIATE USE OF GLIM. ONE MUST FIRST OBTAIN LIKELIHOOD RATIO G**2
"STATISTICS" USING ARTIFICIAL DATA THAT HAS CELL FREQUENCIES CONFORMING TO
THE POPULATION FREQUENCIES DERIVED FROM THE CONJECTURED SCENARIOS AND
A CONVENIENT BASE-LEVEL TOTAL SAMPLE SIZE.

OFTEN ONE MUST SUBTRACT (BY HAND!) LOG-LIKELIHOOD STATISTICS OBTAINED
FROM "FULL" AND "REDUCED" MODELS. GIVEN BELOW IS A SAS APPLICATION FOR A
2 × 2 × 3 TABLE, OUTCOME × STANDARD SCORE × LYONS SCORE. OUTCOME IS THE
DEPENDENT VARIABLE. THE BASE TOTAL SAMPLE SIZE IS 1000. THE TABLE IS FIT
TRADITIONALLY AS WELL AS USING LINEAR-ONLY TERMS FOR THE LYONS
PREDICTOR, WHICH HAS THREE ORDINAL CATEGORIES.

THE SAS COMMANDS ARE:

OPTIONS LS=72;
TITLE1 'GET G2(POPLTN) VALUES FOR NONSTRESS TESTS COMPARISON:';
TITLE2 'STANDARD METHOD VERSUS LYONS METHOD';
* DIRECTIONS: NONREACTIVE STANDARD, LOW LYONS==ABNORMAL OUTCOME;
PROC FORMAT;
VALUE STNDFMT 1 = 'NONREACTIVE' 2 = 'REACTIVE';
VALUE LYONSFMT 1 = 'LOW' 2 = 'INTERMEDIATE' 3 = 'HIGH';
VALUE OUTCMFMT 0 = 'NORMAL' 1 = 'ABNORMAL';
DATA;
INPUT STANDARD LYONS PR_ABN1 PR_ABN2 CELLSIZE;
* LABEL STATEMENTS BELONG IN THE DATA STEP, NOT IN PROC FORMAT;
LABEL PR_ABN1 = 'PROB OF ABNORMAL OUTCOME (SCENARIO 1)';
LABEL PR_ABN2 = 'PROB OF ABNORMAL OUTCOME (SCENARIO 2)';
LABEL CELLSIZE = 'PROB OF THIS STANDARD/LYONS COMBO';
FORMAT STANDARD STNDFMT. LYONS LYONSFMT.
       PR_ABN1 OUTCMFMT. PR_ABN2 OUTCMFMT.;
* COMPUTE EXPECTED FREQUENCIES FOR NORMAL AND PROBLEM BABIES;
OUTCOME=0; * COMPUTE EXPECTED NUMBER OF NORMAL OUTCOMES;
POPFREQ1 = (1-PR_ABN1)*CELLSIZE*1000; * EXP FREQ(NRML), SCENARIO 1;
POPFREQ2 = (1-PR_ABN2)*CELLSIZE*1000; * EXP FREQ(NRML), SCENARIO 2;
OUTPUT;
OUTCOME=1; * COMPUTE EXPECTED NUMBER OF ABNORMAL OUTCOMES;
POPFREQ1 = PR_ABN1*CELLSIZE*1000; * EXP FREQ(ABNORM) FOR SCENARIO 1;
POPFREQ2 = PR_ABN2*CELLSIZE*1000; * EXP FREQ(ABNORM) FOR SCENARIO 2;
OUTPUT;
CARDS;
1 1 .40 .50 .04
1 2 .32 .45 .08
1 3 .27 .40 .04
2 1 .30 .45 .02
2 2 .22 .25 .18
2 3 .15 .15 .64
PROC PRINT;
PROC CATMOD;
WEIGHT POPFREQ1;
MODEL OUTCOME = STANDARD LYONS STANDARD*LYONS/ML NOGLS;
PROC CATMOD;
WEIGHT POPFREQ1;
MODEL OUTCOME = STANDARD LYONS/ML NOGLS;
PROC CATMOD;
WEIGHT POPFREQ1;
DIRECT LYONS;
MODEL OUTCOME = STANDARD LYONS/ML NOGLS;
PROC CATMOD;
WEIGHT POPFREQ1;
MODEL OUTCOME = STANDARD/ML NOGLS;
PROC CATMOD;
WEIGHT POPFREQ1;
MODEL OUTCOME = LYONS/ML NOGLS;
PROC CATMOD;
WEIGHT POPFREQ1;
DIRECT LYONS;
MODEL OUTCOME = LYONS/ML NOGLS;
PROC CATMOD;
WEIGHT POPFREQ2;
MODEL OUTCOME = STANDARD LYONS STANDARD*LYONS/ML NOGLS;

******** REPEAT OF FIVE REMAINING MODELS USING WEIGHT POPFREQ2 ********;

******************************************* END OF SAS CODE *******************************************
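The expected ("population") frequencies that the DATA step feeds to PROC CATMOD can be recomputed independently as a sanity check. The sketch below is Python rather than SAS and uses hypothetical names; it takes the six CARDS lines as given and applies the same arithmetic as the POPFREQ1 and POPFREQ2 statements, with the base total sample size of 1000.

```python
BASE_N = 1000

# (STANDARD, LYONS, PR_ABN1, PR_ABN2, CELLSIZE), copied from the CARDS lines.
CELLS = [
    (1, 1, 0.40, 0.50, 0.04),
    (1, 2, 0.32, 0.45, 0.08),
    (1, 3, 0.27, 0.40, 0.04),
    (2, 1, 0.30, 0.45, 0.02),
    (2, 2, 0.22, 0.25, 0.18),
    (2, 3, 0.15, 0.15, 0.64),
]


def expected_frequencies(cells, scenario, base_n=BASE_N):
    """Return rows (standard, lyons, outcome, frequency) for one scenario.

    outcome 0 = normal, 1 = abnormal, as in the OUTCMFMT format.
    """
    rows = []
    for standard, lyons, pr_abn1, pr_abn2, cellsize in cells:
        pr_abn = pr_abn1 if scenario == 1 else pr_abn2
        rows.append((standard, lyons, 0, (1 - pr_abn) * cellsize * base_n))
        rows.append((standard, lyons, 1, pr_abn * cellsize * base_n))
    return rows


for row in expected_frequencies(CELLS, scenario=1):
    print(row)
```

Because the CELLSIZE proportions sum to one, the twelve frequencies for each scenario add up to the base sample size, which is a quick way to catch a mistyped data line before the models are fitted.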


***************************** INSTRUCTIONS FOR INPUT TO CPOWTAB *****************************

THIS IS A BATCH-VERSION PROGRAM, BUT IT COULD BE MODIFIED QUICKLY TO

ACCEPT INSTRUCTIONS INTERACTIVELY IN A QUESTION-ANSWER FORMAT. ALL

SPECIFICATION RECORDS ARE INPUT FROM A FILE SPECIFIED TO BE CALLED

‘POWTABS’. FOR EXAMPLE, THE IBM-OS JCL COMMAND MIGHT BE

//POWTABS DD DSN=NONSTRES.POWSPECS,DISP=OLD

OR THE CMS COMMAND MIGHT BE

FILEDEF POWTABS DISK NONSTRES POWSPECS A

ALL RECORDS ARE GIVEN IN LIST-DIRECTED MODE: THERE IS NO COLUMN

DEPENDENCY AND RECORDS CAN SPILL OVER ONTO TWO OR MORE LINES. BUT, EACH

RECORD MUST BEGIN ON A NEW LINE.

BY CHANGING THE STATEMENT

INFILE POWSPECS EOF=LAST

TO

INFILE CARDS EOF=LAST

ONE CAN USE THE INTERNAL FILE ("CARDS") FOR THE SPECIFICATIONS, BUT ONE
RUNS THE RISK THAT THE PROGRAMMING STATEMENTS MIGHT BE ACCIDENTALLY
MODIFIED.

RECORD 1: MAIN TITLE, UP TO 78 CHARACTERS.

(2 OR MORE BLANK SPACES MARK END OF TITLE.)

EXAMPLE:

INPUT==> STANDARD VS LYONS SCORING OF NONSTRESS TEST

RECORD 2: SAMPLE SIZE USED AS BASIS FOR CHI-SQ(POPULATION) VALUES.

EXAMPLE:

INPUT==> 1000

RECORD 3: NUMBER OF SCENARIOS FOR POPULATION PARAMETERS, UP TO 5.

TITLE OF FIRST SCENARIO, UP TO 78 CHARS.

(2 OR MORE BLANK SPACES MARK END OF TITLE.)

TITLE OF SECOND SCENARIO, IF ANY. (2 OR MORE BLANK SPACES.)

ETC.

TITLE OF LAST SCENARIO. (2 OR MORE BLANK SPACES.)


EXAMPLE:

INPUT==> 2

INPUT==> MODERATE UNIQUE PREDICTABILITY FOR BOTH SCORINGS

INPUT==> STRONG UNIQUE PREDICTABILITY FOR BOTH SCORINGS

RECORD 4: NUMBER OF TYPE I ERROR RATES (ALPHA LEVELS), UP TO 3.

FIRST VALUE FOR ALPHA.

SECOND VALUE, IF ANY.

THIRD VALUE, IF ANY.

EXAMPLE:

INPUT==> 2 .05 .01

RECORD 5: NUMBER OF TOTAL SAMPLE SIZES, UP TO 5

FIRST TOTAL SAMPLE SIZE

SECOND TOTAL SAMPLE SIZE, IF ANY

ETC.

LAST TOTAL SAMPLE SIZE

EXAMPLE: IN ORDER TO GET POWER FOR TOTAL SAMPLE SIZES OF 400, 600, 800,

AND 1000, USE

INPUT==> 4 400 600 800 1000

EFFECTS RECORDS. ONE RECORD FOR EACH HYPOTHESIS (EFFECT) CONSIDERED.

UNLIMITED NUMBER. EACH RECORD HAS THE FOLLOWING FORMAT:

TITLE, UP TO 78 CHARACTERS (ENDS WITH 2 OR MORE BLANKS).

DEGREES OF FREEDOM HYPOTHESIS FOR EFFECT.

CHI-SQ(POPULATION) VALUE FOR FIRST SCENARIO OF PARAMETERS.

CHI-SQ(POPULATION) FOR SECOND SCENARIO, IF ANY.

ETC.

CHI-SQ(POPULATION) VALUE FOR LAST SCENARIO.


EXAMPLE: 3 EFFECTS FOR TRADITIONAL MODEL, 3 FOR LYONS(LINEAR) MODEL

INPUT==> STANDARD GIVEN LYONS  1 6.709 20.637
INPUT==> LYONS GIVEN STANDARD  2 8.160 14.997
INPUT==> LYONS BY STANDARD INTERACTION  2 .287 3.064
INPUT==> LYONS(LINEAR) MODEL: LYONS(LINEAR) GIVEN STANDARD
INPUT==> 1 8.118 14.823
INPUT==> LYONS(LINEAR) MODEL: STANDARD GIVEN LYONS(LINEAR)
INPUT==> 1 6.676 20.543
INPUT==> LYONS(LINEAR) MODEL: LACK OF FIT  3 .329 3.238

**************************************** END OF INSTRUCTIONS ****************************************

***************************** OUTPUT *****************************;
*************** PRODUCED FROM ABOVE INPUT==> STATEMENTS ***************;

STANDARD VS LYONS SCORING OF NONSTRESS TEST                          1

EFFECT: STANDARD GIVEN LYONS,
DEGREES OF FREEDOM HYPOTHESIS: 1,
SCENARIO: MODERATE UNIQUE PREDICTABILITY FOR BOTH SCORINGS,
POWERS COMPUTED FROM G**2(POPULATION): 6.709,
AND USING THE BASIS TOTAL SAMPLE SIZE: 1000

                               ALPHA
                     0.05                  0.01
                   TOTAL N               TOTAL N
               400  600  800 1000    400  600  800 1000
TEST TYPE                      POWER
REGULAR        .37  .52  .64  .74    .17  .28  .40  .51
1-TAILED Z     .50  .64  .75  .83    .25  .37  .50  .60

NEW PAGE

STANDARD VS LYONS SCORING OF NONSTRESS TEST                          2

EFFECT: STANDARD GIVEN LYONS,
DEGREES OF FREEDOM HYPOTHESIS: 1,
SCENARIO: STRONG UNIQUE PREDICTABILITY FOR BOTH SCORINGS,
POWERS COMPUTED FROM G**2(POPULATION): 20.637,
AND USING THE BASIS TOTAL SAMPLE SIZE: 1000

                               ALPHA
                     0.05                  0.01
                   TOTAL N               TOTAL N
               400  600  800 1000    400  600  800 1000
TEST TYPE                      POWER
REGULAR        .82  .94  .98  .99    .62  .83  .93  .98
1-TAILED Z     .89  .97  .99  .99    .71  .88  .96  .99

NEW PAGE

STANDARD VS LYONS SCORING OF NONSTRESS TEST                          3

EFFECT: LYONS GIVEN STANDARD,
DEGREES OF FREEDOM HYPOTHESIS: 2,
SCENARIO: MODERATE UNIQUE PREDICTABILITY FOR BOTH SCORINGS,
POWERS COMPUTED FROM G**2(POPULATION): 8.16,
AND USING THE BASIS TOTAL SAMPLE SIZE: 1000

                               ALPHA
                     0.05                  0.01
                   TOTAL N               TOTAL N
               400  600  800 1000    400  600  800 1000
TEST TYPE                      POWER
REGULAR        .35  .49  .62  .73    .16  .27  .38  .50

NEW PAGE

STANDARD VS LYONS SCORING OF NONSTRESS TEST                          4

EFFECT: LYONS GIVEN STANDARD,
DEGREES OF FREEDOM HYPOTHESIS: 2,
SCENARIO: STRONG UNIQUE PREDICTABILITY FOR BOTH SCORINGS,
POWERS COMPUTED FROM G**2(POPULATION): 14.997,
AND USING THE BASIS TOTAL SAMPLE SIZE: 1000

                               ALPHA
                     0.05                  0.01
                   TOTAL N               TOTAL N
               400  600  800 1000    400  600  800 1000
TEST TYPE                      POWER
REGULAR        .58  .77  .88  .94    .35  .55  .72  .84

NEW PAGE

STANDARD VS LYONS SCORING OF NONSTRESS TEST                          5

EFFECT: LYONS BY STANDARD INTERACTION,
DEGREES OF FREEDOM HYPOTHESIS: 2,
SCENARIO: MODERATE UNIQUE PREDICTABILITY FOR BOTH SCORINGS,
POWERS COMPUTED FROM G**2(POPULATION): 0.287,
AND USING THE BASIS TOTAL SAMPLE SIZE: 1000

                               ALPHA
                     0.05                  0.01
                   TOTAL N               TOTAL N
               400  600  800 1000    400  600  800 1000
TEST TYPE                      POWER
REGULAR        .06  .06  .07  .07    .01  .01  .02  .02

NEW PAGE

STANDARD VS LYONS SCORING OF NONSTRESS TEST                          6

EFFECT: LYONS BY STANDARD INTERACTION,
DEGREES OF FREEDOM HYPOTHESIS: 2,
SCENARIO: STRONG UNIQUE PREDICTABILITY FOR BOTH SCORINGS,
POWERS COMPUTED FROM G**2(POPULATION): 3.064,
AND USING THE BASIS TOTAL SAMPLE SIZE: 1000

                               ALPHA
                     0.05                  0.01
                   TOTAL N               TOTAL N
               400  600  800 1000    400  600  800 1000
TEST TYPE                      POWER
REGULAR        .15  .21  .27  .33    .05  .08  .11  .14

NEW PAGE

STANDARD VS LYONS SCORING OF NONSTRESS TEST                          7

EFFECT: LYONS(LINEAR) MODEL: LYONS(LINEAR) GIVEN STANDARD,
DEGREES OF FREEDOM HYPOTHESIS: 1,
SCENARIO: MODERATE UNIQUE PREDICTABILITY FOR BOTH SCORINGS,
POWERS COMPUTED FROM G**2(POPULATION): 8.118,
AND USING THE BASIS TOTAL SAMPLE SIZE: 1000

                               ALPHA
                     0.05                  0.01
                   TOTAL N               TOTAL N
               400  600  800 1000    400  600  800 1000
TEST TYPE                      POWER
REGULAR        .44  .60  .72  .81    .22  .36  .49  .61
1-TAILED Z     .56  .71  .82  .89    .30  .45  .59  .70

NEW PAGE

Page 75: BIOMETRICS INFORMATION - British Columbia

69

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

* ;

STANDARD VS LYONS SCORING OF NONSTRESS TEST 8

EFFECT: LYONS(LINEAR) MODEL: LYONS(LINEAR) GIVEN STANDARD,

DEGREES OF FREEDOM HYPOTHESIS: 1,

SCENARIO: STRONG UNIQUE PREDICTABILITY FOR BOTH SCORINGS,

POWERS COMPUTED FROM G**2(POPULATION): 14.823,

AND USING THE BASIS TOTAL SAMPLE SIZE: 1000

ALPHA                    0.05                    0.01
TOTAL N        400   600   800  1000   400   600   800  1000
TEST TYPE    POWER POWER POWER POWER POWER POWER POWER POWER
REGULAR        .68   .85   .93   .97   .44   .66   .81   .90
1-TAILED Z     .79   .91   .96   .99   .54   .74   .87   .94


STANDARD VS LYONS SCORING OF NONSTRESS TEST 9

EFFECT: LYONS(LINEAR) MODEL: STANDARD GIVEN LYONS(LINEAR),

DEGREES OF FREEDOM HYPOTHESIS: 1,

SCENARIO: MODERATE UNIQUE PREDICTABILITY FOR BOTH SCORINGS,

POWERS COMPUTED FROM G**2(POPULATION): 6.676,

AND USING THE BASIS TOTAL SAMPLE SIZE: 1000

ALPHA                    0.05                    0.01
TOTAL N        400   600   800  1000   400   600   800  1000
TEST TYPE    POWER POWER POWER POWER POWER POWER POWER POWER
REGULAR        .37   .52   .64   .73   .17   .28   .40   .50
1-TAILED Z     .50   .64   .75   .83   .24   .37   .49   .60


STANDARD VS LYONS SCORING OF NONSTRESS TEST 10

EFFECT: LYONS(LINEAR) MODEL: STANDARD GIVEN LYONS(LINEAR),

DEGREES OF FREEDOM HYPOTHESIS: 1,

SCENARIO: STRONG UNIQUE PREDICTABILITY FOR BOTH SCORINGS,

POWERS COMPUTED FROM G**2(POPULATION): 20.543,

AND USING THE BASIS TOTAL SAMPLE SIZE: 1000

ALPHA                    0.05                    0.01
TOTAL N        400   600   800  1000   400   600   800  1000
TEST TYPE    POWER POWER POWER POWER POWER POWER POWER POWER
REGULAR        .82   .94   .98   .99   .61   .83   .93   .97
1-TAILED Z     .89   .97   .99   .99   .71   .88   .96   .99


STANDARD VS LYONS SCORING OF NONSTRESS TEST 11

EFFECT: LYONS(LINEAR) MODEL: LACK OF FIT,

DEGREES OF FREEDOM HYPOTHESIS: 3,

SCENARIO: MODERATE UNIQUE PREDICTABILITY FOR BOTH SCORINGS,

POWERS COMPUTED FROM G**2(POPULATION): 0.329,

AND USING THE BASIS TOTAL SAMPLE SIZE: 1000

ALPHA                    0.05                    0.01
TOTAL N        400   600   800  1000   400   600   800  1000
TEST TYPE    POWER POWER POWER POWER POWER POWER POWER POWER
REGULAR        .06   .06   .07   .07   .01   .01   .01   .02


STANDARD VS LYONS SCORING OF NONSTRESS TEST 12

EFFECT: LYONS(LINEAR) MODEL: LACK OF FIT,

DEGREES OF FREEDOM HYPOTHESIS: 3,

SCENARIO: STRONG UNIQUE PREDICTABILITY FOR BOTH SCORINGS,

POWERS COMPUTED FROM G**2(POPULATION): 3.238,

AND USING THE BASIS TOTAL SAMPLE SIZE: 1000

ALPHA                    0.05                    0.01
TOTAL N        400   600   800  1000   400   600   800  1000
TEST TYPE    POWER POWER POWER POWER POWER POWER POWER POWER
REGULAR        .14   .19   .24   .29   .04   .06   .09   .12
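Because the noncentrality parameter in these tables scales linearly with the total sample size (lambda = TOTAL N x G**2(POPULATION) / basis N), the one-tailed Z rows can be inverted to give the smallest total N that reaches a target power. The Python sketch below illustrates this under that assumption only; it is not part of the original listing, and the function names `z_power` and `required_total_n` are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def z_power(g2pop, total_n, basis_n, alpha):
    """Power of the one-tailed Z test for a 1-df hypothesis.

    Mirrors the tabulated computation: lambda = total_n * g2pop / basis_n,
    power = 1 - Phi(z_crit - sqrt(lambda)).
    """
    lam = total_n * g2pop / basis_n
    z_crit = _Z.inv_cdf(1 - alpha)
    return 1 - _Z.cdf(z_crit - math.sqrt(lam))


def required_total_n(g2pop, basis_n, alpha, target_power):
    """Smallest integer total N whose one-tailed Z power reaches target_power.

    Solves sqrt(lambda) >= z_(1-alpha) + z_(target) for N.
    Assumes target_power > alpha, so the right-hand side is positive.
    """
    if target_power <= alpha:
        raise ValueError("target_power must exceed alpha")
    root_lambda = _Z.inv_cdf(1 - alpha) + _Z.inv_cdf(target_power)
    return math.ceil(basis_n * root_lambda ** 2 / g2pop)


if __name__ == "__main__":
    # LYONS(LINEAR) GIVEN STANDARD, moderate scenario: G**2 = 8.118 at N = 1000
    n = required_total_n(8.118, 1000, alpha=0.05, target_power=0.80)
    print(n, round(z_power(8.118, n, 1000, 0.05), 3))
```

For the moderate scenario with G**2(POPULATION) = 8.118 and ALPHA = 0.05, this gives a total N of 762 for 80% power, which falls between the tabulated powers of .71 at N = 600 and .82 at N = 800.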

*;

*;


OPTIONS LS=70 DQUOTE NOSOURCE;
DATA POWDATA;
  INFILE POWSPECS EOF=LAST;  *IF DESIRED, CHANGE POWSPECS TO CARDS;
  KEEP SCENARIO ALPHA POWER TOTALN EFF_TITL DF_HYPTH TESTTYPE
       BASETOTN G2POP;
  ARRAY ALPHAV(3) ALPHA1-ALPHA3;
  ARRAY SCNARIOV(5) $ 78 SCNARIO1-SCNARIO5;
  ARRAY G2POPV(5) G2POP1-G2POP5;
  ARRAY TOTALNV(5) TOTALN1-TOTALN5;
  *;
  * ******************************* INPUT TITLE AND ALL PARAMETER VALUES *******************************;
  INPUT TITLEVAR & $78.;
  CALL SYMPUT('MCRTITL', TITLEVAR);
  INPUT BASETOTN;
  INPUT NUM_SCN @;  DO I = 1 TO NUM_SCN;  INPUT SCNARIOV{I} & @;  END;
  INPUT NUM_ALPH @;  DO I = 1 TO NUM_ALPH;  INPUT ALPHAV{I} @;  END;
  INPUT NUM_N @;  DO I = 1 TO NUM_N;  INPUT TOTALNV{I} @;  END;
  *;
  NXTEFFCT:;  * NEXT EFFECT ;
  INPUT EFF_TITL & $78. DF_HYPTH @;
  DO I = 1 TO NUM_SCN;  INPUT G2POPV(I) @;  END;
  DO I_ALPHA = 1 TO NUM_ALPH;  ALPHA = ALPHAV(I_ALPHA);
    DO I_SCN = 1 TO NUM_SCN;  SCENARIO = SCNARIOV(I_SCN);
      G2POP = G2POPV(I_SCN);
      DO I_N = 1 TO NUM_N;  TOTALN = TOTALNV(I_N);
        * NONCENTRALITY PARAMETER: RESCALE G2POP FROM THE BASIS N TO TOTALN;
        LAMBDA = TOTALN*G2POP/BASETOTN;
        * PADDED TO 10 CHARACTERS SO THAT 1-TAILED Z IS NOT TRUNCATED LATER;
        TESTTYPE = 'REGULAR   ';
        * CRITICAL VALUE OF THE CENTRAL CHI-SQUARE DISTRIBUTION;
        CCRIT = CINV(1-ALPHA, DF_HYPTH, 0);
        /* THE NEXT LINE ONCE USED THE CPROB FUNCTION, BUT I FOUND THAT IT GAVE
           ERRONEOUS RESULTS FOR VERY LARGE LAMBDA. BUT THE PROBCHI ROUTINE GIVES
           ERROR MESSAGES AND MISSING VALUES FOR VERY LARGE LAMBDA. I HAVE REPORTED
           THESE THINGS TO SAS. I FEEL RELUCTANT AT THIS POINT TO TRAP LARGE LAMBDAS AND
           RETURN POWER OF .99
           22 JUNE 1988 */


        *POWER = 1 - CPROB(CCRIT, DF_HYPTH, LAMBDA );
        * POWER IS THE UPPER TAIL OF THE NONCENTRAL CHI-SQUARE BEYOND CCRIT;
        POWER = 1 - PROBCHI(CCRIT, DF_HYPTH, LAMBDA );
        POWER = ROUND(POWER,.01);  IF POWER GT .99 THEN POWER = .99;
        OUTPUT;
        *;
        * COMPUTE POWER FOR ONE-TAILED Z TESTS OF SINGLE DF HYPOTHESES;
        IF DF_HYPTH = 1 THEN DO;
          TESTTYPE = '1-TAILED Z';
          ZCRIT = PROBIT(1-ALPHA);
          POWER = 1 - PROBNORM(ZCRIT-SQRT(LAMBDA));
          POWER = ROUND(POWER,.01);  IF POWER GT .99 THEN POWER = .99;
          OUTPUT;
        END;
        *;
      END;
    END;
  END;
  GO TO NXTEFFCT;
  *;
  LAST:;  * NO MORE EFFECTS TO INPUT;
PROC TABULATE FORMAT=3.2 ORDER=DATA;
  CLASS SCENARIO DF_HYPTH EFF_TITL TESTTYPE TOTALN ALPHA
        BASETOTN G2POP;
  VAR POWER;
  TABLE EFF_TITL='EFFECT:' * DF_HYPTH='DEGREES OF FREEDOM HYPOTHESIS:' *
        SCENARIO='SCENARIO:' *
        G2POP='POWERS COMPUTED FROM G**2(POPULATION):'
        * BASETOTN='USING THE BASIS TOTAL SAMPLE SIZE:',
        TESTTYPE='TEST TYPE',
        ALPHA*TOTALN='TOTAL N'*POWER*SUM=' '
        /RTSPACE=25;
  TITLE1 "&MCRTITL";

/* THE POWER SPECIFICATIONS FILE . . .


STANDARD VS LYONS SCORING OF NONSTRESS TEST
1000
2
MODERATE UNIQUE PREDICTABILITY FOR BOTH SCORINGS
STRONG UNIQUE PREDICTABILITY FOR BOTH SCORINGS
2 .05 .01
4 400 600 800 1000
STANDARD GIVEN LYONS  1 6.709 20.637
LYONS GIVEN STANDARD  2 8.160 14.997
LYONS BY STANDARD INTERACTION  2 .287 3.064
LYONS(LINEAR) MODEL: LYONS(LINEAR) GIVEN STANDARD
1 8.118 14.823
LYONS(LINEAR) MODEL: STANDARD GIVEN LYONS(LINEAR)
1 6.676 20.543
LYONS(LINEAR) MODEL: LACK OF FIT  3 .329 3.238

*/
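As a cross-check on the DATA step above, the following Python sketch recomputes a few of the tabulated powers using only the standard library. It is an illustration, not part of the original program: the function names are hypothetical, the critical value is found by bisection rather than by CINV, and the noncentral chi-square distribution is evaluated as a Poisson mixture of central chi-square distributions, which is adequate only for moderate values of LAMBDA such as those in the specifications file.

```python
import math
from statistics import NormalDist


def chi2_cdf(x, df):
    """Central chi-square CDF via the power series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_ppf(p, df):
    """Central chi-square quantile by bisection (stands in for CINV)."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def ncx2_cdf(x, df, lam):
    """Noncentral chi-square CDF as a Poisson(lam/2) mixture of
    central chi-square CDFs with df + 2j degrees of freedom.
    Intended for moderate lam only (exp(-lam/2) must not underflow)."""
    half = lam / 2.0
    weight = math.exp(-half)  # Poisson probability of j = 0
    total = 0.0
    j = 0
    while j <= half or weight > 1e-16:
        total += weight * chi2_cdf(x, df + 2 * j)
        j += 1
        weight *= half / j
    return total


def power_regular(g2pop, total_n, basis_n, df, alpha):
    """Power of the regular chi-square test, as in the REGULAR rows."""
    lam = total_n * g2pop / basis_n
    crit = chi2_ppf(1 - alpha, df)
    return 1 - ncx2_cdf(crit, df, lam)


def power_one_tailed_z(g2pop, total_n, basis_n, alpha):
    """Power of the one-tailed Z test for a single-df hypothesis."""
    lam = total_n * g2pop / basis_n
    z = NormalDist()
    return 1 - z.cdf(z.inv_cdf(1 - alpha) - math.sqrt(lam))


if __name__ == "__main__":
    # LYONS(LINEAR) GIVEN STANDARD, moderate scenario, N = 400, alpha = .05
    print(round(power_regular(8.118, 400, 1000, 1, 0.05), 2))
    print(round(power_one_tailed_z(8.118, 400, 1000, 0.05), 2))
```

Run as a script, the two printed values correspond to the .44 (REGULAR) and .56 (1-TAILED Z) entries tabulated for G**2(POPULATION) = 8.118 at TOTAL N = 400 and ALPHA = 0.05.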


REFERENCES

Bergerud, W. 1989. Logistic regression methods with some forestry applications. B.C. Min. For., Victoria, B.C. Biometrics Inf.

Chambers, J.M., W.S. Cleveland, B. Kleiner and P.A. Tukey. 1983. Graphical methods for data analysis. Duxbury Press, Boston, Mass.

Cody, R.P. and J.K. Smith. 1987. Applied statistics and the SAS programming language. 2nd ed. North-Holland, New York, N.Y.

Coombs, J., D.O. Hall, S.P. Long and J.M.O. Scurlock. 1985. Techniques in bioproductivity and photosynthesis. 2nd ed. Pergamon, Oxford.

Cox, D.R. 1970. The analysis of binary data. Methuen, London.

Devore, J.L. 1987. Probability and statistics for engineering and the sciences. 2nd ed. Brooks/Cole, Belmont, Cal.

Dunn, O.J. and V.A. Clark. 1974. Applied statistics: analysis of variance and regression. Wiley & Sons, New York, N.Y.

Fingleton, B. 1984. Models of category counts. Cambridge Univ. Press, Cambridge, Mass.

Fuchs, C. and A.R. Sampson. 1987. Simultaneous confidence intervals for the general linear model. Biometrics 43:457–469.

Hand, D.J. and C.C. Taylor. 1987. Multivariate analysis of variance and repeated measures. Chapman and Hall, London.

Herring, L.J. and J.C. Pollack. 1985. Experimental design protocol for forest vegetation management research: Level B trials. First approximation. B.C. Min. For., Victoria, B.C. Research Rep. RR84013-HQ.

Hochberg, Y. and A.C. Tamhane. 1987. Multiple comparison procedures. Wiley & Sons, New York, N.Y.

Keppel, G. 1973. Design and analysis: a researcher’s handbook. Prentice-Hall, Englewood Cliffs, N.J.

Littell, R.C. 1989. Statistical analysis of experiments with repeated measures. HortScience 24:37–40.

Looney, S.W. and W.B. Stanley. 1989. Exploratory repeated measures analysis for two or more groups. The American Statistician 43:220–225.

Marshall, P.L. 1987. A microcomputer program for assisting in the design of simple random samples. For. Chron. 63:422–425.

Meredith, M.P. and S.V. Stehman. 1991. Repeated measures experiments in forestry: focus on analysis of response curves. Can. J. For. Res. 21:957–965.

Miller, R.G., Jr. 1981. Simultaneous statistical inference. Springer-Verlag, New York, N.Y.

Milliken, G.A. and D.E. Johnson. 1984. Analysis of messy data. Vol. I: Designed experiments. Lifetime Learning Public., Belmont, Cal.

Mize, C.W. and R.C. Schultz. 1985. Comparing treatment means correctly and appropriately. Can. J. For. Res. 15:1142–1148.

Morrison, D.F. 1976. Multivariate statistical methods. 2nd ed. McGraw-Hill, New York, N.Y.

Nemec, Amanda F. Linnell. 1991. Power analysis handbook for the design and analysis of forestry trials. B.C. Min. For., Victoria, B.C. Biometrics Info. Hand. No. 2.

O’Brien, R.G. 1987. Teaching power analysis using regular statistical software. Proc. 2nd International Conf. on Teaching Statistics. August 1986, Univ. Victoria, Victoria, B.C., pp. 204–211.

Plackett, R.L. 1981. The analysis of categorical data. 2nd ed. MacMillan (Griffin), New York, N.Y.

Pollack, J.C. and L.J. Herring. 1985. Experimental design protocol for forest vegetation management research: Level A trials. First approximation. B.C. Min. For., Victoria, B.C. Research Rep. RR84012-HQ.

Potthoff, R.R. and S.N. Roy. 1964. A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika 51:313–326.

Rao, C.R. 1965. The theory of least squares when the parameters are stochastic and its application to the analysis of growth curves. Biometrika 52:447–478.

Sanders, W.L. 1989. Use of PROC GLM of SAS (or a similar linear model computing tool) in research planning. HortScience 24:40–45.


SAS Institute Inc. 1985. SAS user’s guide: statistics. Version 5 ed. SAS Institute Inc., Cary, N.C.

Schlotzhauer, S.D. and R.C. Littell. 1987. SAS system for elementary statistical analysis. SAS Inst. Inc., Cary, N.C.

Stafford, S.G. 1985. A statistics primer for foresters. J. For. 83:148–157.

Stanek, E.J., III. 1988. Growth curve methods of repeated binary response. Biometrics 44:973–983.

Tukey, J.W. 1977. Exploratory data analysis. Addison-Wesley, Reading, Mass.

Warren, W.G. 1986. On the presentation of statistical analysis: reason or ritual. Can. J. For. Res. 16:1185–1191.

Wilkinson, L. 1988. SYGRAPH. SYSTAT, Inc., Evanston, Ill.