ASReml User Guide - uncronopio.orguncronopio.org/uploads/ASReml/ASReml_User_Guide.pdf · ASReml...

ASRemlUser Guide

Release 1.0

A R GilmourNSW Agriculture, Orange, Australia

B J GogelDepartment of Primary Industries, Brisbane, Australia

B R CullisNSW Agriculture, Wagga Wagga, Australia

S J WelhamRothamsted Research, Harpenden, United Kingdom

R ThompsonRothamsted Research, Harpenden, United Kingdom

ASReml User Guide Release 1.0

A R Gilmour, B J Gogel, B R Cullis, S J Welham and R Thompson

Published by:

VSN International Ltd,5 The Waterhouse,Waterhouse Street,Hemel Hempstead,HP1 1ES, UK

E-mail: [email protected]: http://www.vsn-intl.com/

Copyright Notice

Copyright c© 2002, NSW Agriculture. All rights reserved.

Except as permitted under the Copyright Act 1968 (Commonwealth of Aus-tralia), no part of the publication may be reproduced by any process, electronicor otherwise, without specific written permission of the copyright owner. Nei-ther may information be stored electronically in any form whatever without suchpermission.

The correct bibliographical reference for this document is:

Gilmour, A.R., Gogel, B.J., Cullis, B.R., Welham, S.J. and Thompson, R. 2002ASReml User Guide Release 1.0 VSN International Ltd, Hemel Hempstead, HP11ES, UK

ISBN 1-904375-07-3

Author email addresses

[email protected]@[email protected]@[email protected]

Preface

ASReml is a statistical package that fits linear mixed models using Residual Maxi-mum Likelihood (REML). It has been under development since 1993 and is a jointventure between the Biometrics Program of NSW Agriculture and the Biomath-ematics Unit previously the Statistics Department of Rothamsted ExperimentStation. This guide relates to the August 2002 version of ASReml.

Linear mixed effects models provide a rich and flexible tool for the analysis ofmany data sets commonly arising in the agricultural, biological, medical and en-vironmental sciences. Typical applications include the analysis of (un)balancedlongitudinal data, repeated measures analysis, the analysis of (un)balanced de-signed experiments, the analysis of multi-environment trials, the analysis of bothunivariate and multivariate animal breeding and genetics data and the analysisof regular or irregular spatial data.

One of the strengths of ASReml is the wide range of possible variance modelsfor the random effects in the linear mixed model. There is a potential cost forthis wide choice. Users should be aware of the dangers of either overfitting orattempting to fit inappropriate variance models to small or highly unbalanceddata sets. We stress the importance of the use of data-driven diagnostics andencourage the user to read the examples chapter, in which we have attempted tonot only present the syntax of ASReml in the context of real analyses but also toindicate some of the modelling approaches we have found useful.

Another strength of ASReml is the use of the Average Information (AI) algorithmand sparse matrix methods for fitting the linear mixed model. This enablesASReml to analyse large and complex data sets quite efficiently. We have concen-trated on expanding the range of variance models and improving the speed andreliability of the engine rather than providing comprehensive reporting facilities.However, the facilities are adequate for most situations. The engine of ASRemlis used by the REML directive in GENSTAT and by the samm class of functionsin S-PLUS. Both of these have good data manipulation and graphical facilities.

i

Preface ii

The guide has 15 chapters. Chapter 1 introduces ASReml and describes the con-ventions used in this guide. Chapter 2 outlines some basic theory while Chapter3 presents an overview of the syntax of ASReml through a simple example. Datafile preparation is described in Chapter 4 and Chapter 5 describes how to inputdata into ASReml. Chapters 6 and 7 are key chapters which present the syntax forspecifying the linear model and the variance models for the random effects in thelinear mixed model. Chapters 8 and 9 describe special commands for multivari-ate and genetic analyses respectively. Chapter 10 deals with prediction of linearfunctions of fixed and random effects in the linear mixed model and Chapter 11presents the syntax for forming functions of variance components. Chapter 12demonstrates running an ASReml job and explains the interactive menu featuresavailable and Chapter 13 gives a detailed explanation of the output files. Chapter14 gives an overview of the error messages generated in ASReml and some guid-ance as to their probable cause. The guide concludes with the most extensivechapter which presents the examples.

The data sets and ASReml input files used in this guide are available for electronicdistribution from http://vsn-intl.com/asreml. They remain the property ofthe authors or of the original source but may be freely distributed provided thesource is acknowledged. The authors would appreciate feedback and suggestionsfor improvements to the program and this guide.

Upgrades

ASReml is being continually upgraded to implement new developments in theapplication of linear mixed models. The release version will be distributed byVSN on CD to licensed users. A developmental version is available to licenseesof the current release version via the ASReml website. This User Guide and aReference Manual are also available on that site. Most users will not need toaccess the developmental version unless they are actively involved in testing anew development.

Acknowledgements

We gratefully acknowledge the Grains Research and Development Corporation ofAustralia for their financial support for our research since 1988. Brian Cullis andArthur Gilmour wish to thank NSW Agriculture, and in particular Alan Gleesonfor having faith in the project and for providing a stimulating and exciting envi-ronment for applied biometrical research and consulting. Rothamsted Researchreceives grant-aided support from the Biotechnology and Biological Sciences Re-search Council of the United Kingdom.

Preface iii

We sincerely thank Ari Verbyla, Dave Butler and Alison Smith, the other mem-bers of the ASReml ‘team’. Ari is thanked for his contribution to the cubicsmoothing splines technology, on-going testing of the software and numeroushelpful discussions and insight. Dave is thanked for his support of the projectand in the latter years the development of the samm class of functions. Ali-son is thanked for her contribution to Chapter 15, the development of many ofthe approaches for the analysis of multi-section trials, and constant and unde-terred commitment to ASReml. We also thank Ian White for his contribution tothe spline methodology and Simon Harding and Megan Cherry for licensing andinstallation software.

ASReml was available free of charge for a number of years and we acknowledgeour users who have supported the package through its early development. Welook forward to their continued support and interaction. ASReml is now dis-tributed by VSN-International on behalf of NSW Agriculture and RothamstedResearch. Proceeds are used to support continued development. Please visithttp://www.VSN-Intl.com or email [email protected] for further informa-tion.

Arthur Gilmour acknowledges the grace of God through Jesus Christ who enablesthis research to proceed. Be exalted O God, above the heavens: and Thy gloryabove all the earth. Psalm 108:5.

Contents

Preface i

List of Tables xvi

List of Figures xviii

1 Introduction 1

1.1 What ASReml can do . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 How to use this guide . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.4 Help and discussion list . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.5 Typographic conventions . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Some theory 5

2.1 The linear mixed model . . . . . . . . . . . . . . . . . . . . . . . . . 6

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Direct product structures . . . . . . . . . . . . . . . . . . . . . . . 6

Variance structures for the errors: R structures . . . . . . . . . . . 8

iv

Contents v

Variance structures for the random effects: G structures . . . . . . 9

2.2 Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Estimation of the variance parameters . . . . . . . . . . . . . . . . 10

Estimation/prediction of the fixed and random effects . . . . . . . 13

2.3 What are BLUPs? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.4 Combining variance models . . . . . . . . . . . . . . . . . . . . . . . 15

2.5 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Tests of hypotheses: variance parameters . . . . . . . . . . . . . . 16

Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Tests of hypotheses: fixed effects . . . . . . . . . . . . . . . . . . 18

3 A guided tour 19

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.2 Nebraska Intrastate Nursery (NIN) field experiment . . . . . . . . . . 20

3.3 The ASReml data file . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.4 The ASReml command file . . . . . . . . . . . . . . . . . . . . . . . . 23

The title line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Reading the data . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

The data file line . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Specifying the terms in the mixed model . . . . . . . . . . . . . . 25

Tabulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Contents vi

Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Variance structures . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.5 Running the job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.6 Description of output files . . . . . . . . . . . . . . . . . . . . . . . . 27

The .asr file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

The .sln file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

The .yht file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.7 Tabulation, predicted values and functions of the variance components 31

4 Data file preparation 33

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.2 The data file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

Free format data files . . . . . . . . . . . . . . . . . . . . . . . . . 34

Fixed format data files . . . . . . . . . . . . . . . . . . . . . . . . 36

Binary format data files . . . . . . . . . . . . . . . . . . . . . . . 36

5 Command file: Reading the data 37

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

5.2 Important rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

5.3 Title line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

5.4 Specifying and reading the data . . . . . . . . . . . . . . . . . . . . . 39

Data field definition syntax . . . . . . . . . . . . . . . . . . . . . . 39

Contents vii

5.5 Transforming the data . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Transformation syntax . . . . . . . . . . . . . . . . . . . . . . . . 41

Other rules and examples . . . . . . . . . . . . . . . . . . . . . . 46

Special note on covariates . . . . . . . . . . . . . . . . . . . . . . 47

5.6 Data file line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Data line syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

5.7 Data file qualifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

5.8 Job control qualifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

6 Command file: Specifying the terms in the mixed model 61

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

6.2 Specifying model formulae in ASReml . . . . . . . . . . . . . . . . . 62

General rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

6.3 Fixed terms in the model . . . . . . . . . . . . . . . . . . . . . . . . . 66

Primary fixed terms . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Sparse fixed terms . . . . . . . . . . . . . . . . . . . . . . . . . . 67

6.4 Random terms in the model . . . . . . . . . . . . . . . . . . . . . . . 67

6.5 Interactions and conditional factors . . . . . . . . . . . . . . . . . . . 68

Interactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

Conditional factors . . . . . . . . . . . . . . . . . . . . . . . . . . 68

Contents viii

6.6 Alphabetic list of model functions . . . . . . . . . . . . . . . . . . . . 69

6.7 Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

6.8 Missing values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

Missing values in the response . . . . . . . . . . . . . . . . . . . . 73

Missing values in the explanatory variables . . . . . . . . . . . . . 73

6.9 Some technical details about model fitting in ASReml . . . . . . . . . 74

Sparse versus dense . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Ordering of terms in ASReml . . . . . . . . . . . . . . . . . . . . 74

Aliasing and singularities . . . . . . . . . . . . . . . . . . . . . . . 74

Examples of aliasing . . . . . . . . . . . . . . . . . . . . . . . . . 75

6.10 Testing of terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

7 Command file: Specifying the variance structures 77

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

Non singular variance matrices . . . . . . . . . . . . . . . . . . . . 78

7.2 Variance model specification in ASReml . . . . . . . . . . . . . . . . 79

7.3 A sequence of structures for the NIN data . . . . . . . . . . . . . . . 79

7.4 Variance structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

General syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

Variance header line . . . . . . . . . . . . . . . . . . . . . . . . . 89

R structure definition . . . . . . . . . . . . . . . . . . . . . . . . 90

Contents ix

G structure header and definition lines . . . . . . . . . . . . . . . 92

7.5 Variance model description . . . . . . . . . . . . . . . . . . . . . . . . 92

Forming variance models from correlation models . . . . . . . . . . 93

Additional notes on variance models . . . . . . . . . . . . . . . . . 94

7.6 Variance structure qualifiers . . . . . . . . . . . . . . . . . . . . . . . 103

7.7 Rules for combining variance models . . . . . . . . . . . . . . . . . . 104

7.8 G structures involving more than one random term . . . . . . . . . . 105

7.9 Constraining variance parameters . . . . . . . . . . . . . . . . . . . . 106

Parameter constraints within a variance model . . . . . . . . . . . 106

Constraints between and within variance models . . . . . . . . . . 107

7.10 Model building using the !CONTINUE qualifier . . . . . . . . . . . . . 109

8 Command file: Multivariate analysis 110

8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

Wether trial data . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

8.2 Model specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

8.3 Variance structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

Specifying multivariate variance structures in ASReml . . . . . . . 113

8.4 The output for a multivariate analysis . . . . . . . . . . . . . . . . . . 114

8.5 More than two response variates . . . . . . . . . . . . . . . . . . . . . 116

9 Command file: Genetic analysis 117

Contents x

9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

9.2 The command file . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

9.3 The pedigree file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

9.4 Reading in the pedigree file . . . . . . . . . . . . . . . . . . . . . . . 120

9.5 Genetic groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

9.6 Reading a user defined inverse relationship matrix . . . . . . . . . . . 123

The example continued . . . . . . . . . . . . . . . . . . . . . . . . 124

10 Tabulation of the data and prediction from the model 125

10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

10.2 Tabulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

10.3 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

11 Functions of variance components 134

11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

11.2 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

Linear combinations of components . . . . . . . . . . . . . . . . . 135

Heritability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

A more detailed example . . . . . . . . . . . . . . . . . . . . . . . 138

Contents xi

12 Command file: Running the job 140

12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

12.2 The command line . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

12.3 Command line options . . . . . . . . . . . . . . . . . . . . . . . . . . 142

Debug command line options (D, E) . . . . . . . . . . . . . . . . . 143

Graphics command line options (G, I, N) . . . . . . . . . . . . . . 143

Job control command line options (C, F, R) . . . . . . . . . . . . . 144

Memory command line options (S) . . . . . . . . . . . . . . . . . 144

Menu command line options (M) . . . . . . . . . . . . . . . . . . 145

Log file (L) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

Post-processing calculation of variance components (P) . . . . . . 146

Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

12.4 Advanced processing arguments . . . . . . . . . . . . . . . . . . . . . 147

Standard use of arguments . . . . . . . . . . . . . . . . . . . . . . 147

Recycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

Entering command line arguments interactively . . . . . . . . . . 148

13 Description of output files 149

13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

13.2 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

13.3 Key output files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

Contents xii

The .asr file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

The .sln file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

The .yht file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

13.4 Other ASReml output files . . . . . . . . . . . . . . . . . . . . . . . . 156

The .res file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

The .vrb file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

The .vvp file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

The .rsv file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

The .dpr file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

The .pvc file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

The .pvs file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

The .tab file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

13.5 ASReml output objects and where to find them . . . . . . . . . . . . 165

14 Error messages 168

14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

14.2 Common problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

14.3 Things to check in the .asr file . . . . . . . . . . . . . . . . . . . . . 171

14.4 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

14.5 Information, Warning and Error messages . . . . . . . . . . . . . . . . 184

15 Examples 193

Contents xiii

15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

15.2 Split plot design - Oats . . . . . . . . . . . . . . . . . . . . . . . . . 194

15.3 Unbalanced nested design - Rats . . . . . . . . . . . . . . . . . . . . 198

15.4 Source of variability in unbalanced data - Volts . . . . . . . . . . . . . 202

15.5 Balanced repeated measures - Height . . . . . . . . . . . . . . . . . . 204

15.6 Spatial analysis of a field experiment - Barley . . . . . . . . . . . . . . 213

15.7 Unreplicated early generation variety trial - Wheat . . . . . . . . . . . 219

15.8 Paired Case-Control study - Rice . . . . . . . . . . . . . . . . . . . . 225

Standard analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

A multivariate approach . . . . . . . . . . . . . . . . . . . . . . . 231

Interpretation of results . . . . . . . . . . . . . . . . . . . . . . . . 235

15.9 Balanced longitudinal data - Random coefficients and cubic smoothingsplines - Oranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237

15.10Multivariate animal genetics data - Sheep . . . . . . . . . . . . . . . . 245

Half-sib analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

Animal model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253

Bibliography 257

Index 260

List of Tables

2.1 Combination of models for G and R structures . . . . . . . . . . . . 15

3.1 Trial layout and allocation of varieties to plots in the NIN field trial . 21

5.1 List of transformation qualifiers and their actions with examples . . . 43

5.2 Qualifiers relating to data input and output . . . . . . . . . . . . . . 48

5.3 List of frequently used job control qualifiers . . . . . . . . . . . . . . 49

5.4 List of occasionally used job control qualifiers . . . . . . . . . . . . . 50

5.5 List of rarely used job control qualifiers . . . . . . . . . . . . . . . . 55

5.6 List of very rarely used job control qualifiers . . . . . . . . . . . . . 59

6.1 Summary of reserved words, allowable operators and functions . . . . 64

6.2 Alphabetic list of model functions and descriptions . . . . . . . . . . 69

6.3 Examples of aliasing in ASReml . . . . . . . . . . . . . . . . . . . . 75

6.4 Schematic analysis of variance table from fitting the modelyield ∼ mu A B X A.B A.X B.X A.B.X !r rep . . . . . . . . . 76

7.1 Sequence of variance structures for the NIN field trial data . . . . . 86

7.2 Schematic outline of variance model specification in ASReml . . . . 88

xiv

List of Tables xv

7.3 Details of the variance models available in ASReml . . . . . . . . . 99

7.4 List of R and G structure qualifiers . . . . . . . . . . . . . . . . . . 103

7.5 Examples of constraining variance parameters in ASReml . . . . . . 107

9.1 List of pedigree file qualifiers . . . . . . . . . . . . . . . . . . . . . 121

10.1 List of prediction qualifiers . . . . . . . . . . . . . . . . . . . . . . . 131

12.1 Command line options . . . . . . . . . . . . . . . . . . . . . . . . . 142

12.2 The use of arguments in ASReml . . . . . . . . . . . . . . . . . . . 148

13.1 Summary of ASReml output files . . . . . . . . . . . . . . . . . . . 150

13.2 ASReml output objects and where to find them . . . . . . . . . . . 165

14.1 Some information messages and comments . . . . . . . . . . . . . . 184

14.2 List of warning messages and likely meaning(s) . . . . . . . . . . . . 185

14.3 Alphabetical list of error messages and probable cause(s) . . . . . . 188

15.1 A split-plot field trial of oat varieties and nitrogen application . . . . 194

15.2 Rat data: AOV decomposition . . . . . . . . . . . . . . . . . . . . . 199

15.3 Summary of F-tests of fixed effects adjusted for the other factors forthe rat data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

15.4 REML log-likelihood ratio for the variance components in the volt-age data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

15.5 Summary of variance models fitted to the plant data . . . . . . . . . 206

List of Tables xvi

15.6 Summary of Wald test for fixed effects for variance models fitted tothe plant data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

15.7 Field layout of Slate Hall Farm experiment . . . . . . . . . . . . . . 214

15.8 Summary of models for the Slate Hall data . . . . . . . . . . . . . . 219

15.9 Estimated variance components from univariate analyses of blood-worm data. (a) Model with homogeneous variance for all terms and(b) Model with heterogeneous variance for interactions involving tmt 229

15.10 Equivalence of random effects in bivariate and univariate analyses . . 231

15.11 Estimated variance parameters from bivariate analysis of bloodwormdata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

15.12 Orange data: AOV decomposition . . . . . . . . . . . . . . . . . . . 241

15.13 Sequence of models fitted to the Orange data . . . . . . . . . . . . 242

15.14 REML estimates of a subset of the variance parameters for each traitfor the genetic example, expressed as a ratio to their asymptotic s.e. 247

15.15 Wald tests of the fixed effects for each trait for the genetic example 247

15.16 Variance models fitted for each part of the ASReml job in the analysisof the genetic example . . . . . . . . . . . . . . . . . . . . . . . . . 250

List of Figures

13.1 Variogram of residuals . . . . . . . . . . . . . . . . . . . . . . . . . 158

13.2 Plot of residuals in field plan order . . . . . . . . . . . . . . . . . . 159

13.3 Plot of the marginal means of the residuals . . . . . . . . . . . . . . 160

13.4 Histogram of residuals . . . . . . . . . . . . . . . . . . . . . . . . . 160

15.1 Residual plot for the rat data . . . . . . . . . . . . . . . . . . . . . 201

15.2 Residual plot for the voltage data . . . . . . . . . . . . . . . . . . . 204

15.3 Trellis plot of the height for each of 14 plants . . . . . . . . . . . . 205

15.4 Residual plots for the EXP variance model for the plant data . . . . . 209

15.5 Sample variogram of the AR1×AR1 model for the Slate Hall data . . 216

15.6 Sample variogram of the AR1×AR1 model for the Tullibigeal data . 222

15.7 Sample variogram of the AR1×AR1 + pol(column,-1) model forthe Tullibigeal data . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

15.8 Rice bloodworm data: Plot of square root of root weight for treatedversus control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

15.9 BLUPs for treated for each variety plotted against BLUPs for control 234

15.10 Estimated deviations from regression of treated on control for eachvariety plotted against estimate for control . . . . . . . . . . . . . . 235

xvii

List of Figures xviii

15.11 Estimated difference between control and treated for each varietyplotted against estimate for control . . . . . . . . . . . . . . . . . . 236

15.12 Trellis plot of trunk circumference for each tree . . . . . . . . . . . . 238

15.13 Fitted cubic smoothing spline for tree 1 . . . . . . . . . . . . . . . . 240

15.14 Plot of fitted cubic smoothing spline for model 1 . . . . . . . . . . . 243

15.15 Trellis plot of trunk circumference for each tree at sample dates(adjusted for season effects), with fitted profiles across time andconfidence intervals . . . . . . . . . . . . . . . . . . . . . . . . . . 244

15.16 Plot of the residuals from the nonlinear model of Pinheiro and Bates 245

1 Introduction

What ASReml can do

Installation

How to use the guide

Help and discussion list

Typographic conventions

1

1 Introduction 2

1.1 What ASReml can do

ASReml (pronounced A S Remel) is used to fit linear mixed models to quite largedata sets with complex variance models. It extends the range of variance modelsavailable for the analysis of experimental data. ASReml has application in theanalysis of

• (un)balanced longitudinal data,

• repeated measures data (multivariate analysis of variance and spline type mod-els),

• (un)balanced designed experiments,

• multi-environment trials and meta analysis,

• univariate and multivariate animal breeding and genetics data (involving arelationship matrix for correlated effects),

• regular or irregular spatial data.

The engine of ASReml forms the basis of the REML procedure in GENSTAT. Aninterface for SPLUS called samm has also been developed. While these interfaceswill be adequate for many analyses, some large problems will need to use ASReml.The ASReml user interface is concise. Most effort has been directed towardsefficiency of the engine. It normally operates in a batch mode.

Problem size depends on the sparsity of the mixed model equations and the sizeof your computer. However, models with 100,000 effects have been fitted suc-cessfully. The computational efficiency of ASReml arises from using the AverageInformation REML procedure (giving quadratic convergence) and sparse matrixoperations. ASReml has been operational since March 1996 and is updated peri-odically.

1.2 Installation

Installation instructions vary for the various platforms and are distributed withthe program. If you find the instructions inadequate for some reason, please [email protected].

1 Introduction 3

1.3 How to use this guide

The guide consists of 15 chapters. Chapter 1 introduces ASReml and describesthe conventions used in the guide. Chapter 2 outlines some basic theory whichTheory

you may need to come back to.

New ASReml users are advised to read Chapter 3 before attempting to code theirGetting started

first job. It presents an overview of basic ASReml coding demonstrated on a realdata example. Chapter 15 presents a range of examples to assist users further.Examples

When coding you first job, look for an example to use as a template.

Data file preparation is described in Chapter 4, while Chapter 5 describes howData file

to input data into ASReml. Chapters 6 and 7 are key chapters which present thesyntax for specifying the linear model and the variance models for the randomeffects in the linear mixed model. Variance modelling is a complex aspect ofLinear model

analysis. We introduce variance modelling in ASReml by example in Chapter 7.Variance model

Chapters 8 and 9 describe special commands for multivariate and genetic analysesrespectively. Chapter 10 deals with prediction of fixed and random effects fromPrediction

the linear mixed model and Chapter 11 presents the syntax for forming functionsof variance components.

Chapter 12 discusses running an ASReml job and explains the interactive menufeatures available. Chapter 13 gives a detailed explanation of the output files.Output

Chapter 14 gives an overview of the error messages generated in ASReml andsome guidance as to their probable cause.

1.4 Help and discussion list

Supported users of ASReml may email [email protected] for assistence.When requesting help for a job that is not working as you expect, please sendthe input command file, the data file and the corresponding primary output filealong with a description of the problem.

There is also an ASReml discussion list. To join (leave) the list, send an email [email protected] the word subscribe (unsubscribe) in the body of the message.

The address for messages to the list is [email protected].

1 Introduction 4

1.5 Typographic conventions

A hands on approach is the best way to develop a working understanding of a newcomputing package. We therefore begin by presenting a guided tour of ASRemlusing a sample data set for demonstration (see Chapter 3). Throughout the guidenew concepts are demonstrated by example wherever possible.

An example ASReml code box

bold type highlights sections

of code currently under

discussion

remaining code is not

highlighted... indicates that some of the

original code is omitted from

the display

In this guide you will often find framed sampleboxes to the right of the page as shown here.These contain ASReml command file (sample)code. Note that

– the code under discussion is highlighted inbold type for easy identification,

– the continuation symbol (... ) is used to

indicate that some of the original code isomitted.

Data examples are displayed in larger boxes in the body of the text, see, forexample, page 34. Other conventions are as follows:

• keyboard key names appear in smallcaps, for example, tab and esc,

• example code within the body of the text is in this size and font and ishighlighted in bold type, see pages 27 and 40,

• in the presentation of general ASReml syntax, for example

[path] asreml basename[.as] [arguments]

– typewriter font is used for text that must be typed verbatim, for example,asreml and .as after basename in the example,

– italic font is used to name information to be supplied by the user, for exam-ple, basename stands for the name of a file with an .as filename extension,

– square brackets indicate that the enclosed text and/or arguments are notalways required.

• ASReml output is in this size and font, see page 28,

• this font is used for all other code.

2 Some theory

The linear mixed model

IntroductionDirect product structures

Variance structures for the errors: R structuresVariance structures for the random effects: G structures

Estimation

Estimation of the variance parametersEstimation/prediction of fixed and random effects

What are BLUPs?

Combining variance models

Inference

Tests of hypotheses: variance parametersTests of hypotheses: fixed effects

5

2 Some theory 6

2.1 The linear mixed model

Introduction

If y denotes the n × 1 vector of observations, the linear mixed model can bewritten as

y = Xτ + Zu + e (2.1)

where τ is the p × 1 vector of fixed effects, X is an n × p design matrix of fullcolumn rank which associates observations with the appropriate combination offixed effects, u is the q× 1 vector of random effects, Z is the n× q design matrixwhich associates observations with the appropriate combination of random effects,and e is the n× 1 vector of residual errors.

The model (2.1) is called a linear mixed model or linear mixed effects model. Itis assumed [

ue

]∼ N

([00

], θ

[G(γ) 0

0 R(φ)

])(2.2)

where the matrices G and R are functions of parameters γ and φ, respectively.The parameter θ is a variance parameter which we will refer to as the scaleparameter. In mixed effects models with more than one residual variance, arisingfor example in the analysis of data with more than one section (see below) orvariate, the parameter θ is fixed to one. In mixed effects models with a singleresidual variance then θ is equal to the residual variance (σ2). In this case Rmust be a correlation matrix (see Table 2.1 for a discussion).

Direct product structures

To undertake variance modelling in ASReml you need to understand the formationof variance structures via direct products (⊗). The direct product of two matricesA (m×p) and B (n×q) is

a11B . . . a1pB

.... . .

...

am1B. . . ampB

.

Direct products in R structures

Consider a vector of common errors associated with an experiment. The usualleast squares assumption (and the default in ASReml) is that these are indepen-dently and identically distributed (IID). However, if e was from a field experiment

2 Some theory 7

laid out in a rectangular array of r rows by c columns, we could arrange the resid-uals as a matrix and might consider that they were autocorrelated within rowsand columns. Writing the residuals as a vector in field order, that is, by sort-ing the residuals rows within columns (plots within blocks) the variance of theresiduals might then be

σ2e Σc(ρc)⊗Σr(ρr)

where Σc(ρc) and Σr(ρr) are correlation matrices for the row model (order r, auto-correlation parameter ρr) and column model (order c, autocorrelation parameterρc) respectively. More specifically, a two-dimensional separable autoregressivespatial structure (AR1 ⊗ AR1) is sometimes assumed for the common errors ina field trial analysis (see Gogel (1997) and Cullis et al. (1998) for examples). Inthis case

Σr =

1ρr 1ρ

2

r ρr 1...

......

. . .ρr−1

r ρr−2r ρr−3

r . . . 1

and Σc =

1ρc 1ρ2

c ρc 1...

......

. . .ρc−1

c ρc−2c ρc−3

c . . . 1

.

Alternatively, the residuals might relate to a multivariate analysis with nt traitsSee Chapter 8

for further de-

tails

and n units and be ordered traits within units. In this case an appropriatevariance structure might be

In ⊗Σ

where Σ (nt×nt) is a general or unstructured variance matrix.

Direct products in G structures

Likewise, the random terms in u in the model may have a direct product struc-ture. For example, for a field trial with s sites, g varieties and the effects orderedvarieties within sites, the model term site.variety may have the variance structure

Σ⊗ Ig

where Σ is the variance matrix for sites. This would imply that the varieties areindependent random effects within each site, have different variances at each site,and are correlated across sites. Important Whenever a random term is formedas the interaction of two factors you should consider whether the IID assumptionis sufficient or if a direct product structure might be more appropriate.

2 Some theory 8

Variance structures for the errors: R structures

The vector e will in some situations be a series of vectors indexed by a fac-tor or factors. The convention we adopt is to refer to these as sections. Thuse = [e′1,e′2, . . . , e′s]′ and the ej represent the errors of sections of the data. For ex-ample, these sections may represent different experiments in a multi-environmenttrial (MET), or different trials in a meta analysis. It is assumed that R is thedirect sum of s matrices Rj , j = 1 . . . s, that is,

R = ⊕sj=1Rj =

R1 0 . . . 0 00 R2 . . . 0 0...

.... . .

......

0 0 . . . Rs−1 00 0 . . . 0 Rs

,

so that each section has its own variance structure which is assumed to be inde-pendent of the structures in other sections.

A structure for the residual variance for the spatial analysis of multi-environmenttrials (Cullis et al., 1998) is given by

Rj = Rj(φj)= σ2

j (Σj(ρj)).

Each section represents a trial and we note that this model accounts for betweentrial error variance heterogeneity (σ2

j ) and possibly a different spatial variancemodel for each trial.

In the simplest case the matrix R could be known and proportional to an identitymatrix. Each component matrix, Rj (or R itself for one section) is assumed tobe the direct product (see Searle, 1982) of one, two or three component matrices.The component matrices are related to the underlying structure of the data. If thestructure is defined by factors, for example, replicates, rows and columns, thenthe matrix R can be constructed as a direct product of three matrices describingthe nature of the correlation across replicates, rows and columns. These factorsmust completely describe the structure of the data, which means that

1. the number of combined levels of the factors must equal the number of datapoints,

2. each factor combination must uniquely specify a single data point.

These conditions are necessary to ensure the expression var (e) = θR is valid.The assumption that the overall variance structure can be constructed as a direct

2 Some theory 9

product of matrices corresponding to underlying factors is called the assumptionof separability and assumes that any correlation process across levels of a factoris independent of any other factors in the term. Multivariate data and repeatedmeasures data usually satisfy the assumption of separability. In particular, if thedata are indexed by factors units and traits (for multivariate data) or times(for repeated measures data), then the R structure may be written as units ⊗traits or units ⊗ times. This assumption is sometimes required to make theestimation process computationally feasible, though it can be relaxed, for certainapplications, for example fitting isotropic covariance models to irregularly spacedspatial data.

Variance structures for the random effects: G structures

The q × 1 vector of random effects is often composed of b subvectors u =[u′1 u′2 . . . u′b]

′ where the subvectors ui are of length qi and these subvectorsare usually assumed independent normally distributed with variance matricesθGi. Thus just like R we have

G = ⊕bi=1Gi =

G1 0 . . . 0 00 G2 . . . 0 0...

.... . .

......

0 0 . . . Gb−1 00 0 . . . 0 Gb

.

There is a corresponding partition in Z, Z = [Z1 Z2 . . . Zb]. As before eachsubmatrix, Gi, is assumed to be the direct product of one, two or three componentmatrices. These matrices are indexed for each of the factors constituting the termin the linear model. For example, the term site.genotype has two factors and sothe matrix Gi is comprised of two component matrices defining the variancestructure for each factor in the term.

Models for the component matrices Gi include the standard model for whichGi = γiIqi and direct product models for correlated random factors given by

Gi = Gi1 ⊗Gi2 ⊗Gi3

for three component factors. The vector ui is therefore assumed to be the vectorrepresentation of a 3-way array. For two factors the vector ui is simply the vecof a matrix with rows and columns indexed by the component factors in theterm, where vec of a matrix is a function which stacks the columns of its matrixargument below each other.

A range of models are available for the components of both R and G. Theyinclude correlation (C) models (that is, where the diagonals are 1), or covariance

2 Some theory 10

(V ) models and are discussed in detail in Chapter 7. Some correlation modelsinclude

• autoregressive (order 1 or 2)

• moving average (order 1 or 2)

• ARMA(1,1)

• uniform

• banded

• general correlation.

Some of the covariance models include

• diagonal (that is, independent with heterogeneous variances)

• antedependence

• unstructured

• factor analytic.

There is the facility within ASReml to allow for a nonzero covariance between thesubvectors of u, for example in random regression models. In this setting theintercept and say the slope for each unit are assumed to be correlated and it ismore natural to consider the the two component terms as a single term, whichgives rise to a single G structure. This concept is discussed later.

2.2 Estimation

Estimation involves two processes that are closely linked. They are performedwithin the ‘engine’ of ASReml. One process involves estimation of τ and predic-tion of u (although the latter may not always be of interest) for given θ, φ and γ.The other process involves estimation of these variance parameters. Note that inthe following sections we have set θ = 1 to simplify the presentation of results.

Estimation of the variance parameters

Estimation of the variance parameters is carried out using residual or restrictedmaximum likelihood (REML), developed by Patterson and Thompson (1971). Anhistorical development of the theory can be found in Searle et al. (1992). Notefirstly that

y ∼ N(Xτ , H). (2.3)

2 Some theory 11

where H = R + ZGZ ′. REML does not use (2.3) for estimation of varianceparameters, but rather uses a distribution free of τ , essentially based on errorcontrasts or residuals. The derivation given below is presented in Verbyla (1990).

We transform y using a non-singular matrix L = [L1 L2] such that

L′1X = Ip, L′2X = 0.

If yj = L′jy, j = 1, 2,

[y1

y2

]∼ N

([τ0

],

[L′1HL1 L′1HL2

L′2HL1 L′2HL2

]).

The full distribution of L′y can be partitioned into a conditional distribution,namely y1|y2, for estimation of τ , and a marginal distribution based on y2 forestimation of γ and φ; the latter is the basis of the residual likelihood.

The estimate of τ is found by equating y1 to its conditional expectation, andafter some algebra we find,

τ = (X ′H−1X)−1X ′H−1y

Estimation of κ = [γ ′ φ′]′ is based on the log residual likelihood,

`R = −12(log detL′2H

−1L2 + y′2(L′2HL2)−1y2)

= −12(log detX ′H−1X + log detH + y′Py2) (2.4)

whereP = H−1 −H−1X(X ′H−1X)−1X ′H−1.

Note that y′Py = (y−Xτ )′H−1(y−Xτ ). The log-likelihood (2.4) depends onX and not on the particular non-unique transformation defined by L.

The log residual likelihood (ignoring constants) can be written as

`R = −12(log detC + log detR + log detG + y′Py). (2.5)

We can also write

P = R−1 −R−1WC−1W ′R−1

2 Some theory 12

with W = [X Z] . Letting κ = (γ, φ), the REML estimates of κi are found bycalculating the score

U(κi) = ∂`R/∂κi = −12[tr (PH i)− y′PH iPy] (2.6)

and equating to zero. Note that H i = ∂H/∂κi.

The elements of the observed information matrix are

− ∂2`R

∂κi∂κj=

12tr (PH ij)− 1

2tr (PH iPHj)

+ y′PH iPHjPy − 12y′PH ijPy (2.7)

where H ij = ∂2H/∂κi∂κj .

The elements of the expected information matrix are

E

(− ∂2`R

∂κi∂κj

)=

12tr (PH iPHj) . (2.8)

Given an initial estimate κ(0), an update of κ, κ(1) using the Fisher-scoring (FS)algorithm is

κ(1) = κ(0) + I(κ(0), κ(0))−1U(κ(0)) (2.9)

where U(κ(0)) is the score vector (2.6) and I(κ(0), κ(0)) is the expected infor-mation matrix (2.8) of κ evaluated at κ(0).

For large models or large data sets, the evaluation of the trace terms in either(2.7) or (2.8) is either not feasible or is very computer intensive. To overcomethis problem ASReml uses the AI algorithm (Gilmour, Thompson and Cullis,1995). The matrix denoted by IA is obtained by averaging (2.7) and (2.8) andapproximating y′PH ijPy by its expectation, tr (PH ij) in those cases whenH ij 6= 0. For variance components models (that is those linear with respect tovariances in H), the terms in IA are exact averages of those in (2.7) and (2.8).The basic idea is to use IA(κi, κj) in place of the expected information matrix in(2.9) to update κ.

The elements of IA are

IA(κi, κj) =12y′PH iPHjPy. (2.10)

2 Some theory 13

The IA matrix is the (scaled) residual sums of squares and products matrix of

y = [y1, . . . ,yk]

where yi is the ‘working’ variate for κi and is given by

yi = H iPy

= H iR−1e

= RiR−1e, κi ∈ φ

= ZGiG−1u, κi ∈ γ

where e = y −Xτ − Zu, τ and u are solutions to (2.11). In this form the AImatrix is relatively straightforward to calculate.

The combination of the AI algorithm with sparse matrix methods, in which onlynon-zero values are stored, gives an efficient algorithm in terms of both computingtime and workspace.

Estimation/prediction of the fixed and random effects

To estimate τ and predict u the objective function

log fY (y | u ; τ ,R) + log fU (u ; G)

is used. The is the log-joint distribution of (Y , u).

Differentiating with respect to τ and u leads to the mixed model equations(Robinson, 1991) which are given by

[X ′R−1X X ′R−1ZZ ′R−1X Z ′R−1Z + G−1

] [τu

]=

[X ′R−1yZ ′R−1y

]. (2.11)

These can be written asCβ = WR−1y

where C = W ′R−1W + G∗, β = [τ ′ u′]′ and

G∗ =

[0 00 G−1

].

The solution of (2.11) requires values for γ and φ. In practice we replace γ andφ by their REML estimates γ and φ.

2 Some theory 14

Note that τ is the best linear unbiased estimator (BLUE) of τ , while u is the bestlinear unbiased predictor (BLUP) of u for known γ and φ. We also note that

β − β =[

τ − τu− u

]∼ N

([00

], C−1

).

2.3 What are BLUPs?

Consider a balanced one-way classification. For data records ordered r repeatswithin b treatments, the linear mixed model is y = Xτ + Zu + e where X =1b ⊗ 1r is the design matrix for τ , Z = Ib ⊗ 1r is the design matrix for the btreatment effects ui and e is the error vector. In the following we assume, thatthe treatment effects are random. That is, u ∼ N(Aψ, σ2

bIb), for some designmatrix A and parameter vector ψ. It can be shown that

u =bσ2

b

bσ2b + σ2

(y − 1y··) +σ2

bσ2b + σ2

Aψ (2.12)

where y is the vector of treatment means, y·· is the grand mean. The differencesof the treatment means and the grand mean are the estimates of treatment effectsif treatment effects are fixed. The BLUP is therefore a weighted mean of the databased estimate and the ‘prior’ mean Aψ. If ψ = 0, the BLUP in (2.12) becomes

u =bσ2

b

bσ2b + σ2

(y − 1y··) (2.13)

and the BLUP is a so-called shrinkage estimate. As σ2b becomes large relative to

σ2, the BLUP tends to the fixed effect solution, while for small σ2b relative to σ2

the BLUP tends towards zero, the assumed initial mean. Thus (2.13) represents aweighted mean which involves the prior assumption that the ui have zero mean.

Note also that the BLUPs in this simple case are constrained to sum to zero. Thisis essentially because the unit vector defining X can be found by summing thecolumns of the Z matrix. This linear dependence of the matrices translates todependence of the BLUPs and hence constraints. This aspect occurs wheneverthe column space of X is contained in the column space of Z. The dependenceis slightly more complex with correlated random effects.

2 Some theory 15

2.4 Combining variance models

The combination of variance models within G structures and R structures andbetween G structures and R structures is a difficult and important concept. Theunderlying principle is that each Ri and Gi variance model can only have a singlescaling variance parameter associated with it. If there is more than one scalingvariance parameter for any Ri or Gi then this results in the variance model beingoverspecified, or nonidentifiable. Some variance models are presented in Table 2.1to illustrate this principle.

While all 9 forms of model in Table 2.1 can be specified within ASReml onlymodels of forms 1 and 2 are recommended. Models 4-6 have too few variance pa-rameters and are likely to cause serious estimation problems. For model 3, wherethe scale parameter θ has been fitted (univariate single site analysis), it becomesthe scale for G. This parameterisation is bizarre and is not recommended. Mod-els 7-9 have too many variance parameters and ASReml will arbitrarily fix one ofthe variance parameters leading to possible confusion for the user. If you fix thevariance parameter to a particular value then it does not count for the purposesof applying the principle that there be only one scaling variance parameter. Thatis, models 7-9 can be made identifiable by fixing all but one of the nonidentifiablescaling parameters in each of G and R to a particular value.

Table 2.1 Combination of models for G and R structures

model G1 G2 R1 R2 θ comment

1. V C C C y valid2. V C V C n valid3. C C V C y valid, but not recommended4. * * C C n inappropriate as R is a correlation model5. C C C C y inappropriate, same scale for R and G6. C C V C n inappropriate, no scaling parameter for G7. V V * * * nonidentifiable, 2 scaling parameters for G8. V C V C y nonidentifiable, scale for R and overall scale9. * * V V * nonidentifiable, 2 scaling parameters for R

* indicates the entry is not relevant in this caseNote that G1 and G2 are interchangeable in this table, as are R1 and R2

2 Some theory 16

2.5 Inference

Tests of hypotheses: variance parameters

Inference concerning variance parameters of a linear mixed effects model usu-ally relies on approximate distributions for the (RE)ML estimates derived fromasymptotic results.

It can be shown that the approximate variance matrix for the REML estimates isgiven by the inverse of the expected information matrix (Cox and Hinkley, 1974,section 4.8). Since this matrix is not available in ASReml we replace the expectedinformation matrix by the AI matrix. Furthermore the REML estimates are con-sistent and asymptotically normal, though in small samples this approximationappears to be unreliable (see later).

A general method for comparing the fit of nested models fitted by REML is theREML likelihood ratio test, or REMLRT. The REMLRT is only valid if the fixedeffects are the same for both models. In ASReml this requires not only the samefixed effects model, but also the same parameterisation.

If `R2 is the REML log-likelihood of the more general model and `R1 is the REMLlog-likelihood of the restricted model (that is, the REML log-likelihood under thenull hypothesis), then the REMLRT is given by

D = 2 log(`R2/`R1) = 2 [log(`R2)− log(`R1)] (2.14)

which is strictly positive. If ri is the number of parameters estimated in modeli, then the asymptotic distribution of the REMLRT, under the restricted modelis χ2

r2−r1.

The REMLRT is implicitly two-sided, and must be adjusted when the test involvesan hypothesis with the parameter on the boundary of the parameter space. Infact, theoretically it can be shown that for a single variance component, say, theasymptotic distribution of the REMLRT is a mixture of χ2 variates, where themixing probabilities are 0.5, one with 0 degrees of freedom (spike at 0) and theother with 1 degree of freedom. The distribution of the REMLRT for the testthat k variance components are zero, or tests involved in random regressions,which involve both variance and covariance components, involves a mixture of χ2

variates from 0 to k degrees of freedom. See Self and Liang (1987) for details.

Tests concerning variance components in generally balanced designs, such as thebalanced one-way classification, can be derived from the usual analysis of vari-ance. It can be shown that the REMLRT for a variance component being zero is

2 Some theory 17

a monotone function of the F-statistic for the associated term.

To compare two (or more) non-nested models we can evaluate the Akaike Infor-mation Criteria (AIC) or the Bayesian Information Criteria (BIC) for each model.These are given by

AIC = −2`Ri + 2ti

BIC = −2`Ri + ti log ν (2.15)

where ti is the number of variance parameters in model i and ν = n − p is theresidual degrees of freedom. AIC and BIC are calculated for each model and themodel with the smallest value is chosen as the preferred model.

Diagnostics

In this section we will briefly review some of the diagnostics that have been im-plemented in ASReml for examining the adequacy of the assumed variance matrixfor either R or G structures, or for examining the distributional assumptions re-garding e or u. Firstly we note that the BLUP of the residual vector is givenby

e = y −Wβ

= RPy (2.16)

It follows that

E (e) = 0

var (e) = R−WC−1W ′

The matrix WC−1W ′ is the so-called ‘extended hat’ matrix. It is the linearmixed effects model analogue of X(X ′X)−1X ′ for ordinary linear models. Thediagonal elements are returned in the .yht file by ASReml .

The variogram has been suggested as a useful diagnostic for assisting with theidentification of appropriate variance models for spatial data (Cressie, 1991).Gilmour et al. (1997) demonstrate its usefulness for the identification of thesources of variation in the analysis of field experiments. If the elements of thedata vector (and hence the residual vector) are indexed by a vector of spatialcoordinates, si, i = 1, . . . , n, then the ordinates of the sample variogram aregiven by

vij =12

[ei(si)− ej(sj)] , i, j = 1, . . . , n; i 6= j

The sample variogram is the triple (lij1, lij2, vij) where lij1 = |si1 − sj1| andlij2 = |si2 − sj2| are the absolute displacements. If the data arise from a regular

2 Some theory 18

array there will be many vij with the same absolute displacements, in which caseASReml displays the vector (lij1, lij2, vij) as a perspective plot and returns it inthe .res file.

Tests of hypotheses: fixed effects

Testing the significance of fixed effects under REML does introduce a difficulty.The standard approach is to use likelihood ratio methods. Unfortunately, if thefixed effects part of the model is changed, the residual likelihood changes. Thusthe maximized residual likelihood under the null and alternative hypotheses arenot comparable. Alternative methods are required.

An obvious approach is to use the Wald statistic. To test the hypothesis H0 :Lτ = l for given L, r × p, and l, r × 1, the Wald statistic is given by

W = (Lτ − l)′{L(X ′H−1X)−1L′}−1(Lτ − l) (2.17)

and asymptotically, this statistic has a chi-square distribution on r degrees offreedom. These are marginal tests, so that there is an adjustment for all otherterms in the fixed part of the model.

It is also possible to carry out a sequential test so that higher order terms, forexample interactions, are adjusted for lower order terms in the model. The lattertests conform more to the analysis of variance philosophy. The test statisticimplicit in this sequential approach is the change in residual sum of squaresbetween the model not containing the term and that containing the term.

ASReml produces an F-statistic by dividing the Wald test by r. In this form itis possible to perform an approximate F test. The numerator degrees of freedomfor this test is r. The denominator degrees of freedom is generally not knownexactly. It is possible, though anti-conservative for most situations to use theresidual degrees of freedom for the denominator. For balanced designs the Fstatistics produced by ASReml are numerically identical to the F-tests obtainedfrom the standard analysis of variance.

3 A guided tour

Introduction

Nebraska Intrastate Nursery (NIN) field experiment

The ASReml data file

The ASReml command file

The title lineReading the dataThe data file line

Specifying the terms in the mixed modelTabulationPrediction

Variance structures

Running the job

Description of output files

The .asr fileThe .sln fileThe .yht file

Tabulation, predicted values and functions of the variancecomponents

19

3 A guided tour 20

3.1 Introduction

This chapter presents a guided tour of ASReml, from data file preparation andbasic aspects of the ASReml command file, to running an ASReml job and inter-preting the output files. You are encouraged to read this chapter before movingto the later chapters;

• a real data example is used in this chapter for demonstration, see below,

• the same data are also used in later chapters,

• links to the formal discussion of topics are clearly signposted by margin notes.

Note that some aspects of ASReml, in particular, pedigree files (see Chapter 9)and multivariate analysis (see Chapter 8) are only covered in later chapters.

ASReml is usually run in batch mode from a command prompt. You will need afile editor to create the command file and to view the various output files. On unixsystems, vi and emacs are commonly used. Under Windows, there are severalprogram editors available such as Brief and Winedt. You can use NOTEPAD orWORDPAD. ASReml has a built in editor which is accessed through menu mode (page).

3.2 Nebraska Intrastate Nursery (NIN) field experiment

The yield data from an advanced Nebraska Intrastate Nursery (NIN) breedingtrial conducted at Alliance in 1988/89 will be used for demonstration, see Stroupet al. (1994) for details. Four replicates of 19 released cultivars, 35 experimen-tal wheat lines and 2 additional triticale lines were laid out in a 22 row by 11column rectangular array of plots; the varieties were allocated to the plots usinga randomised complete block (RCB) design. In field trials, complete replicatesare typically allocated to consecutive groups of whole columns or rows. In thistrial the replicates were not allocated to groups of whole columns, but rather,overlapped columns. Table 3.1 gives the allocation of varieties to plots in fieldplan order with replicates 1 and 3 in ITALICS and replicates 2 and 4 in BOLD.

Tab

le3.

1:Trial

layo

ut

and

allo

cation

ofva

riet

ies

toplo

tsin

the

NIN

fiel

dtr

ial colu

mn

row

12

34

56

78

910

11

1-

NE83407

BU

CK

SK

INN

E87612

VO

NA

NE87512

NE87408

CO

DY

BU

CK

SK

INN

E87612

KS831374

2-

CEN

TU

RA

NE86527

NE87613

NE87463

NE83407

NE83407

NE87612

NE83406

BU

CK

SK

IN

NE86482

3-

SCO

UT66

NE86582

NE87615

NE86507

NE87403

NO

RK

AN

NE87457

NE87409

NE85556

NE85623

4-

CO

LT

NE86606

NE87619

BU

CK

SK

IN

NE87457

RED

LA

ND

NE84557

NE87499

BR

ULE

NE86527

5-

NE83498

NE86607

NE87627

RO

UG

HR

ID

ER

NE83406

KS831374

NE83T12

CEN

TU

RA

NE86507

NE87451

6-

NE84557

RO

UG

HR

IDER

-N

E86527

CO

LT

CO

LT

NE86507

NE83432

RO

UG

HR

ID

ER

NE87409

7-

NE83432

VO

NA

CEN

TU

RA

SC

OU

T66

NE87522

NE86527

TA

M200

NE87512

VO

NA

GA

GE

8-

NE85556

SIO

UX

LA

ND

NE85623

NE86509

NO

RK

AN

VO

NA

NE87613

RO

UG

HR

IDER

NE83404

NE83407

9-

NE85623

GAG

EC

OD

YN

E86606

NE87615

TA

M107

AR

APA

HO

EN

E83498

CO

DY

NE87615

10

-CEN

TU

RA

K78

NE83T12

NE86582

NE84557

NE85556

CEN

TU

RA

K78

SCO

UT66

-N

E87463

AR

APA

HO

E

11

-N

OR

KA

NN

E86T666

NE87408

KS831374

TA

M200

NE87627

NE87403

NE86T

666

NE86582

CH

EY

EN

NE

12

-K

S831374

NE87403

NE87451

GA

GE

LA

NC

OTA

NE86T666

NE85623

NE87403

NE87499

RED

LA

ND

13

-TA

M200

NE87408

NE83432

NE87619

NE86503

NE87615

NE86509

NE87512

NO

RK

AN

NE83432

14

-N

E86482

NE87409

CEN

TU

RA

K78

NE87499

NE86482

NE86501

NE85556

NE87446

SC

OU

T66

NE87619

15

-H

OM

ESTEA

DN

E87446

NE83T

12

CH

EY

EN

NE

BR

ULE

NE87522

HO

MESTEA

DC

EN

TU

RA

NE87513

NE83498

16

LA

NCER

LA

NCO

TA

NE87451

NE87409

NE86607

NE87612

CH

EY

EN

NE

NE83404

NE86503

NE83T

12

NE87613

17

BRU

LE

NE86501

NE87457

NE87513

NE83498

NE87613

SIO

UX

LA

ND

NE86503

NE87408

CEN

TU

RA

K78

NE86501

18

RED

LA

ND

NE86503

NE87463

NE87627

NE83404

NE86T

666

NE87451

NE86582

CO

LT

NE87627

TA

M200

19

CO

DY

NE86507

NE87499

AR

APA

HO

EN

E87446

-G

AG

EN

E87619

LA

NC

ER

NE86606

NE87522

20

AR

APA

HO

EN

E86509

NE87512

LA

NC

ER

SIO

UX

LA

ND

NE86607

LA

NCER

NE87463

NE83406

NE87457

NE84557

21

NE83404

TA

M107

NE87513

TA

M107

HO

MEST

EA

DLA

NCO

TA

NE87446

NE86606

NE86607

NE86509

TA

M107

22

NE83406

CH

EY

EN

NE

NE87522

RED

LA

ND

NE86501

NE87513

NE86482

BRU

LE

SIO

UX

LA

ND

LA

NC

OTA

HO

MEST

EA

D

3 A guided tour 22

3.3 The ASReml data file

The standard format of an ASReml data file is to have the data arranged in space,See Chapter 4

for details TAB or comma separated columns/fields with a line for each sampling unit. Thecolumns contain covariates, factors, response variates (traits) and weight variablesin any convenient order. This is the first 30 lines of the file nin89.asd containingthe data for the NIN variety trial. The data are in field order (rows withincolumns) and an optional row of field labels (first row) has been included. Inthis case there are 11 space separated data fields (variety. . . column) and thecomplete file has 224 data rows, one for each variety in each replicate.

variety id pid raw repl nloc yield lat long row column

LANCER 1 1101 585 1 4 29.25 4.3 19.2 16 1

BRULE 2 1102 631 1 4 31.55 4.3 20.4 17 1

REDLAND 3 1103 701 1 4 35.05 4.3 21.6 18 1

CODY 4 1104 602 1 4 30.1 4.3 22.8 19 1

ARAPAHOE 5 1105 661 1 4 33.05 4.3 24 20 1

NE83404 6 1106 605 1 4 30.25 4.3 25.2 21 1

NE83406 7 1107 704 1 4 35.2 4.3 26.4 22 1

NE83407 8 1108 388 1 4 19.4 8.6 1.2 1 2

CENTURA 9 1109 487 1 4 24.35 8.6 2.4 2 2

SCOUT66 10 1110 511 1 4 25.55 8.6 3.6 3 2

COLT 11 1111 502 1 4 25.1 8.6 4.8 4 2

NE83498 12 1112 492 1 4 24.6 8.6 6 5 2

NE84557 13 1113 509 1 4 25.45 8.6 7.2 6 2

NE83432 14 1114 268 1 4 13.4 8.6 8.4 7 2

NE85556 15 1115 633 1 4 31.65 8.6 9.6 8 2

NE85623 16 1116 513 1 4 25.65 8.6 10.8 9 2

CENTURAK78 17 1117 632 1 4 31.6 8.6 12 10 2

NORKAN 18 1118 446 1 4 22.3 8.6 13.2 11 2

KS831374 19 1119 684 1 4 34.2 8.6 14.4 12 2

TAM200 20 1120 422 1 4 21.1 8.6 15.6 13 2

NE86482 21 1121 560 1 4 28 8.6 16.8 14 2

HOMESTEAD 22 1122 566 1 4 28.3 8.6 18 15 2

LANCOTA 23 1123 514 1 4 25.7 8.6 19.2 16 2

NE86501 24 1124 635 1 4 31.75 8.6 20.4 17 2

NE86503 25 1125 840 1 4 42 8.6 21.6 18 2

NE86507 26 1126 618 1 4 30.9 8.6 22.8 19 2

NE86509 27 1127 658 1 4 32.9 8.6 24 20 2

TAM107 28 1128 481 1 4 24.05 8.6 25.2 21 2

CHEYENNE 29 1129 564 1 4 28.2 8.6 26.4 22 2...

optional field labelsdata line sampling unit 1data line sampling unit 2

.

.

.

Note that in Chapter 7 these data are analysed using spatial methods of analysis,see model 3a in Section 7.3. For spatial analysis using a separable error structure(see Chapter 2) the data file must first be augmented to specify the complete 22

3 A guided tour 23

row × 11 column array of plots. These are the first 20 lines of the augmenteddata file nin89aug.asd with 242 data rows.


LANCER 1 NA NA 1 4 NA 4.3 1.2 1 1

LANCER 1 NA NA 1 4 NA 4.3 2.4 2 1

LANCER 1 NA NA 1 4 NA 4.3 3.6 3 1

LANCER 1 NA NA 1 4 NA 4.3 4.8 4 1

LANCER 1 NA NA 1 4 NA 4.3 6 5 1

LANCER 1 NA NA 1 4 NA 4.3 7.2 6 1

LANCER 1 NA NA 1 4 NA 4.3 8.4 7 1

LANCER 1 NA NA 1 4 NA 4.3 9.6 8 1

LANCER 1 NA NA 1 4 NA 4.3 10.8 9 1

LANCER 1 NA NA 1 4 NA 4.3 12 10 1

LANCER 1 NA NA 1 4 NA 4.3 13.2 11 1

LANCER 1 NA NA 1 4 NA 4.3 14.4 12 1

LANCER 1 NA NA 1 4 NA 4.3 15.6 13 1

LANCER 1 NA NA 1 4 NA 4.3 16.8 14 1

LANCER 1 NA NA 1 4 NA 4.3 18 15 1

LANCER 1 NA NA 2 4 NA 17.2 7.2 6 4

LANCER 1 NA NA 3 4 NA 25.8 22.8 19 6

LANCER 1 NA NA 4 4 NA 38.7 12.0 10 9

LANCER 1 1101 585 1 4 29.25 4.3 19.2 16 1

BRULE 2 1102 631 1 4 31.55 4.3 20.4 17 1

REDLAND 3 1103 701 1 4 35.05 4.3 21.6 18 1

CODY 4 1104 602 1 4 30.1 4.3 22.8 19 1...

optional field labelsfile augmented bymissing values for first15 plots and 3 bufferplots and variety codedLANCER to complete22×11 array...

buffer plotsbetween reps

original data...

Note that

• the pid, raw, repl and yield data for the missing plots have all been madeNA (one of the three missing value indicators in ASReml, see Section 4.2),

• variety is coded LANCER for all missing plots; one of the variety names mustbe used but the particular choice is arbitrary.

3.4 The ASReml command file

By convention an ASReml command file has a .as extension. The file definesSee Chapters 5,

6 and 7 for de-

tails• a title line to describe the job,

• labels for the data fields in the data file and the name of the data file,

• the linear mixed model and the variance model(s) if required,

• output options including directives for tabulation and prediction.

3 A guided tour 24

Below is the ASReml command file for an RCB analysis of the NIN field trial datahighlighting the main sections. Notice the order of the main sections.

title line −→data field definition−→

.

.

.

data field definition −→data file name and qualifiers−→

linear mixed model definition−→tabulate statement−→predict statement−→

variance model specification−→..

NIN Alliance trial 1989

variety !A

id

pid

raw

repl 4

nloc

yield

lat

long

row 22

column 11

nin89.asd !skip 1

yield ∼ mu variety !r repl

tabulate yield ∼ variety

predict variety

0 0 1

repl 1

repl 0 IDV 0.1

The title line


variety !A

id...

The first non-blank line in an ASReml com-mand file is taken as the title for the job andis used to identify the analysis for future ref-erence.

Reading the data


variety !A

id

pid

raw

repl 4

nloc

yield

lat

long

row 22

column 11

nin89.asd !skip 1...

The data fields are defined prior to the datafile name to tell ASReml how many fields toexpect in the data file and what the fields are.Field definitions must be given for all fields inthe data file and in the order in which theyappear in the data file. Data field defini-tions must be indented. In this case thereare 11 data fields in the data file nin89.asd,see Section 3.3. They are specified in the com-mand file in the order they appear in the datafile, that is, variety . . . column. The !A aftervariety tells ASReml that the first field is al-

3 A guided tour 25

phanumeric and the 4 after repl tells ASReml that the field called repl (the fifthfield read) is a numeric factor with 4 levels. Similarly for row and column. Theother data fields include variates (yield) and various other variables.

The data file line


variety !A

id

pid...

row 22

column 11

nin89.asd !skip 1



predict variety

0 0 1

repl 1

repl 0 IDV 0.1

The data file name is specified immediatelyafter the last data field definition. Data filequalifiers that relate to data input and outputare also placed on this line if they are required.In this example, !skip 1 tells ASReml to readSee Section 5.7

the data from nin89.asd but to ignore thefirst line in this file, the line containing thefield labels.

The data file line can also contain qualifiersSee Section 5.8

that control other aspects of the analysis.These qualifiers are presented in Section 5.8.

Specifying the terms in the mixed model


variety !A...

column 11

nin89.asd !skip 1



predict variety

0 0 1

repl 1

repl 0 IDV 0.1

The linear mixed model is specified as a list ofSee Chapter 6

model terms and qualifiers. All elements mustbe space separated. ASReml accommodates awide range of analyses. See Section 2.1 for abrief discussion and general algebraic formu-lation of the linear mixed model. The modelspecified here for the NIN data is a simple ran-dom effects RCB model including fixed vari-ety effects and random replicate effects. Thereserved word mu fits a constant term (inter-cept), variety fits a fixed variety effect and repl fits a random replicate effect.The !r qualifier tells ASReml to fit the terms that appear after this qualifier asrandom effects.

3 A guided tour 26

Tabulation


variety !A...

column 11

nin89.asd !skip 1



predict variety

0 0 1

repl 1

repl 0 IDV 0.1

Tabulate statements provide a simple way ofSee Chapter 10

exploring the structure of a data file. They ap-pear after the model line and before any vari-ance structure definition lines. In this case the56 simple variety means for yield are formedand written to a .tab output file. See Chapter10 for a discussion of tabulation.

Prediction


variety !A...

column 11

nin89.asd !skip 1



predict variety

0 0 1

repl 1

repl 0 IDV 0.1

Prediction statements also appear after theSee Chapter 10

model statement and before any variancestructure lines. In this case the 56 varietymeans for yield would be formed and returnedin the .pvs output file. See Chapter 10 for adetailed discussion of prediction in ASReml.

Variance structures


variety !A...

column 11

nin89.asd !skip 1



predict variety

0 0 1

repl 1

repl 0 IDV 0.1

An extensive range of variance structures canSee Chapter 7

be fitted in ASReml. See Chapter 7 for alengthy discussion of variance modelling inASReml. In this case independent and identi-cally distributed random replicate effects arespecified using the identifier IDV in a G struc-ture. G structures are described in Section2.1 and the list of available variance struc-tures/models is presented in Table 7.3. Notethat IDV is the default variance structure forrandom effects so that the last 3 lines presented here are purely for example: thesame analysis would be performed if these lines were omitted.

3 A guided tour 27

3.5 Running the job

An ASReml job is often run from a command line. The basic command to runSee Chapter 12

an ASReml job is normally

asreml basename[.as]

where basename[.as] is the name of the command file. For example, the commandto run nin89.as is

asreml nin89.as

However, if the path to program asreml.exe is not specified in your system’sPath environment variable, a path to this program must also be given. In thiscase the command to run the ASReml job in a PC/Windows environment is

[path]asreml basename[.as]

where path might be c:\asreml\ pointing to the location of the ASReml program.ASReml is always run using an ASCII text command file. In this guide we assumeMake a habit of

giving command

files the .as ex-

tension

the command file has a filename extension .as. ASReml also recognises thefilename extension .asc as an ASReml command file. When these are used, theextension (.as or .asc) may be omitted from basename.as in the command lineif there is no file in the same sub-directory with the name basename. There areoptions and arguments that can be supplied on the command line to modify ajob at run time. These are described in Chapter 12.

3.6 Description of output files

A series of output files are produced with each ASReml run. Running the exam-ple as above, the primary output is written to nin89.asr. This file contains asummary of the data, the iteration sequence, estimates of the variance param-eters, an analysis of variance (ANOVA) table and estimates of the fixed effects.The estimates of all the fixed and random effects are written to nin89.sln. Theresiduals, predicted values of the observations and the diagonal elements of thehat matrix (see Chapter 2) are returned in nin89.yht, see Section 13.3. Otherfiles include the .res, .asl, .asp, .pin, .pvc, .pvs, .tab, .vvp and .vrb files,see Section 13.4.

The .asr file

Below is nin89.asr with pointers to the main sections. The title line gives theversion of ASReml used (in square brackets) and the title of the job. The second

3 A guided tour 28

line gives the date and time that the job was run. The following output presentsa data summary, the iteration sequence, the estimated variance parameters, anANOVA and estimates of the fixed effects. The second to last line notifies the userof possible outliers as recorded in the .res file and the final line gives the dateand time that the job was completed and a statement about convergence. Thegeneral announcements box (outlined in asterisks) at the top of the file notifiesthe user of current release features.

ASReml 1.0 [31 Aug 2002] nin alliance trialtitle line13 Sep 2002 12:55:17.777 8.00 Mbyte Windows nin89date and time* ASReml 2002 *********************************************release features* Contact [email protected] for licensing and support *

* Use !{ and !} when fitting covariance across terms *

***************************************************** ARG *

QUALIFIERS: !SKIP 1

Reading nin89.asd FREE FORMAT skipping 1 linesdata summaryFolder: C:\asr\ex\manex

Univariate analysis of yield

Using 224 records of 224 read

Model term Size Type COL Minimum Mean Maximum #zero #miss

1 variety 56 Factor 1 1 28.5000 56 0 0

2 id 2 1.000 28.50 56.00 0 0

3 pid 3 1101. 2629. 4156. 0 0

4 raw 4 21.00 510.5 840.0 0 0

5 repl 4 Factor 5 1 2.5000 4 0 0

6 nloc 6 4.000 4.000 4.000 0 0

7 yield 1 Variate 7 1.050 25.53 42.00 0 0

8 lat 8 4.300 27.22 47.30 0 0

9 long 9 1.200 14.08 26.40 0 0

10 row 22 Factor 10 1 11.7321 22 0 0

11 column 11 Factor 11 1 6.3304 11 0 0

12 mu 1 Constant Term

Forming 61 equations: 57 dense

Initial updates will be shrunk by factor 0.316

NOTICE: 1 (more) singularities,

1 LogL=-454.807 S2= 50.329 168 df 0.1000 1.000iteration2 LogL=-454.663 S2= 50.120 168 df 0.1173 1.000sequence3 LogL=-454.532 S2= 49.868 168 df 0.1463 1.000

4 LogL=-454.472 S2= 49.637 168 df 0.1866 1.000

5 LogL=-454.469 S2= 49.585 168 df 0.1986 1.000

6 LogL=-454.469 S2= 49.582 168 df 0.1993 1.000

Final parameter values 0.19932 1.0000

Source Model terms Gamma Component Comp/SE % Cestimatedrepl 4 4 0.199323 9.88291 1.12 0 PvarianceVariance 224 168 1.00000 49.5824 9.08 0 Pparameters

3 A guided tour 29

Analysis of Variance DF F-incrANOVA

estimated effects

1 variety 55 0.88

NOTICE: ASREML cannot yet determine the correct denominator DFs for

these F statistics. They will not exceed the Residual DF

which applies only if all components are KNOWN.

Estimate Standard Error T-value T-prev

1 variety

BRULE -2.48750 4.97908 -0.50

REDLAND 1.93750 4.97908 0.39 0.89

CODY -7.35000 4.97908 -1.48 -1.87

ARAPAHOE 0.875000 4.97908 0.18 1.65

NE83404 -1.17500 4.97908 -0.24 -0.41...

NE87619 2.70000 4.97908 0.54 1.12

NE87627 -5.33750 4.97908 -1.07 -1.61

12 mu

57 28.5625 3.85568 7.41

5 repl 4 effects fitted

1 possible outliers: see .res fileoutlier info

completion info Finished: 13 Sep 2002 12:55:22.133 LogL Converged

The .sln file

The following is an extract from nin89.sln containing the estimated varietyeffects, intercept and random replicate effects in this order (column 3) with stan-dard errors (column 4). Note that the variety effects are returned in the order oftheir first appearance in the data file, see replicate 1 in Table 3.1. Note also thatLANCER did not appear in the list in the .asr file because it was aliased with mu(see 6.9) but it does appear here.

variety LANCER 0.000 0.000

variety BRULE -2.487 4.979

variety REDLAND 1.938 4.979

variety CODY -7.350 4.979

variety ARAPAHOE 0.8750 4.979

variety NE83404 -1.175 4.979

variety NE83406 -4.287 4.979

variety NE83407 -5.875 4.979

3 A guided tour 30

variety CENTURA -6.912 4.979

variety SCOUT66 -1.037 4.979

variety COLT -1.562 4.979

variety NE83498 1.563 4.979

variety NE84557 -8.037 4.979

variety NE83432 -8.837 4.979...

variety NE87615 -2.875 4.979

variety NE87619 2.700 4.979

variety NE87627 -5.337 4.979

mu 1 28.56 3.856

repl 1 1.880 1.755

repl 2 2.843 1.755

repl 3 -0.8713 1.755

repl 4 -3.852 1.755

The .yht file

The following is an extract from nin89.yht containing the predicted values ofthe observations (column 2), the residuals (column 3) and the diagonal elementsof the hat matrix. This final column can be used in tests involving the residuals,see Section 2.5 under Diagnostics.

Record Yhat Residual Hat

1 30.442 -1.192 13.01

2 27.955 3.595 13.01

3 32.380 2.670 13.01

4 23.092 7.008 13.01

5 31.317 1.733 13.01

6 29.267 0.9829 13.01

7 26.155 9.045 13.01

8 24.567 -5.167 13.01

9 23.530 0.8204 13.01...

222 16.673 9.877 13.01

223 24.548 1.052 13.01

224 23.786 3.114 13.01

3 A guided tour 31

3.7 Tabulation, predicted values and functions of the variance com-ponents

It may take several runs of ASReml to determine an appropriate model for thedata, that is, the fixed and random effects that are important. During this processyou may wish to explore the data by simple tabulation. Having identified anappropriate model, you may then wish to form predicted values or functions of thevariance components. The facilities available in ASReml to determine predictedvalues and functions of the variance components are described in Chapters 10and 11 respectively. Our example only includes tabulation and prediction.

The statement


in nin89.as results in nin89.tab as follows:

nin alliance trial 13 Sep 2002 12:35:40

Simple tabulation of yield

variety

LANCER 28.56

BRULE 26.07

REDLAND 30.50

CODY 21.21

ARAPAHOE 29.44

NE83404 27.39

NE83406 24.28

NE83407 22.69

CENTURA 21.65

SCOUT66 27.52

COLT 27.00...

NE87522 25.00

NE87612 21.80

NE87613 29.40

NE87615 25.69

NE87619 31.26

NE87627 23.23

3 A guided tour 32

The

predict variety

statement after the model statement in nin89.as results in the nin89.pvs filedisplayed below (some output omitted) containing the 56 predicted variety means,also in the order in which they first appear in the data file (column 2), togetherwith standard errors (column 3). An average standard error of difference amongthe predicted variety means is displayed immediately after the list of predictedvalues. As in the .asr file, date, time and trial information are given the title line.The Ecode for each prediction (column 4) is usually E indicating the prediction isof an estimable function. Predictions of non-estimable functions are usually notprinted, see Chapter 10.

nin alliance trial 13 Sep 2002 12:35:40title line

nin893

Ecode is E for Estimable, * for Not Estimable

---- ---- ---- ---- ---- ---- ---- ---- 1 ---- ---- ---- ---- ---- ---- ----

Predicted values of yield

repl is ignored in the prediction (except where specifically included

variety Predicted_Value Standard_Error Ecode

LANCER 28.5625 3.8557 Epredicted varietyeffects BRULE 26.0750 3.8557 E

REDLAND 30.5000 3.8557 E

CODY 21.2125 3.8557 E

ARAPAHOE 29.4375 3.8557 E

NE83404 27.3875 3.8557 E

NE83406 24.2750 3.8557 E

NE83407 22.6875 3.8557 E

CENTURA 21.6500 3.8557 E

SCOUT66 27.5250 3.8557 E

COLT 27.0000 3.8557 E...

NE87613 29.4000 3.8557 E

NE87615 25.6875 3.8557 E

NE87619 31.2625 3.8557 E

NE87627 23.2250 3.8557 E

SED: Overall Standard Error of Difference 4.979

4 Data file preparation

Introduction

The data file

Free format data filesBinary format data files

33


4.1 Introduction

The first step in an ASReml analysis is to prepare the data file. Data file prepara-tion is described in this chapter using the NIN example of Chapter 3 for demon-stration. The first 25 lines of the data file are as follows:


BRULE 2 1102 631 1 4 31.55 4.3 20.4 17 1

REDLAND 3 1103 701 1 4 35.05 4.3 21.6 18 1

CODY 4 1104 602 1 4 30.1 4.3 22.8 19 1

ARAPAHOE 5 1105 661 1 4 33.05 4.3 24 20 1

NE83404 6 1106 605 1 4 30.25 4.3 25.2 21 1

NE83406 7 1107 704 1 4 35.2 4.3 26.4 22 1

NE83407 8 1108 388 1 4 19.4 8.6 1.2 1 2

CENTURA 9 1109 487 1 4 24.35 8.6 2.4 2 2

SCOUT66 10 1110 511 1 4 25.55 8.6 3.6 3 2

COLT 11 1111 502 1 4 25.1 8.6 4.8 4 2

NE83498 12 1112 492 1 4 24.6 8.6 6 5 2

NE84557 13 1113 509 1 4 25.45 8.6 7.2 6 2

NE83432 14 1114 268 1 4 13.4 8.6 8.4 7 2

NE85556 15 1115 633 1 4 31.65 8.6 9.6 8 2

NE85623 16 1116 513 1 4 25.65 8.6 10.8 9 2

CENTURK78 17 1117 632 1 4 31.6 8.6 12 10 2

NORKAN 18 1118 446 1 4 22.3 8.6 13.2 11 2

KS831374 19 1119 684 1 4 34.2 8.6 14.4 12 2

TAM200 20 1120 422 1 4 21.1 8.6 15.6 13 2

NE86482 21 1121 560 1 4 28 8.6 16.8 14 2

HOMESTEAD 22 1122 566 1 4 28.3 8.6 18 15 2

LANCOTA 23 1123 514 1 4 25.7 8.6 19.2 16 2

NE86501 24 1124 635 1 4 31.75 8.6 20.4 17 2

NE86503 25 1125 840 1 4 42 8.6 21.6 18 2...

4.2 The data file

The standard format of an ASReml data file is to have the data arranged incolumns/fields with a single line for each sampling unit. The columns containvariates and covariates (numeric), factors (alphanumeric), traits (response vari-ables) and weight variables in any order that is convenient to the user. The datafile may be free format, fixed format or a binary file.

Free format data files

The data are read free format (space, comma or TAB separated) unless the filename has extension .bin for real binary, or .dbl for double precision binary (see


below). Important points to note are as follows:

• blank lines are ignored,

• column headings, field labels or comments may be present at the top of thefile provided that the !skip qualifier (Table 5.2) is used to skip over them,

• NA,* and . are accepted as coding for missing values in free format data files;

– missing values can also be coded with a unique value (for example, 0 or -9)and then using the !M or !D qualifiers to declare them as missing values(see Table 5.1),

• comma delimited files whose file name ends in .csv or for which the !CSVqualifier is set recognise empty fields as missing values,

– a line beginning with a comma implies a preceding missing value,– consecutive commas imply a missing value,– a line ending with a comma implies a trailing missing value,– if the filename does not end in .csv or the !CSV qualifier is not set, commas

are treated as white space,

• characters following # on a line are ignored so this character may not be usedin alphanumeric fields,

• blank spaces, tabs and commas must not be used (embedded) in alphanumericfields unless the label is enclosed in quotes, for example, the name WillowCreek would need to be appear in the data file as ‘Willow Creek’ to avoiderror,

• the $ symbol must not be used in the data file,

• alphanumeric fields are truncated to 20 characters in length,

• extra data fields on a line are ignored,

• if there are fewer data items on a line than ASReml expects the remainder aretaken from the following line(s) except in .csv files were they are taken asmissing. If you end up with half the number of records you expected, this isprobably the reason,

• all lines beginning with ! followed by a blank are copied to the .asr file ascomments for the output; their contents are ignored,

• a data file line may not exceed 418 characters; if the data fields will not fit in418 characters, put some on the next line.


Fixed format data files

The format must be supplied with the !FORMAT qualifier which is described in(Table 5.5). However, if all fields are present and are separated, the file can beread free format.

Binary format data files

Conventions for binary files are as follows:

• binary files are read as unformatted Fortran binary in single precision if thefilename has a .bin or .BIN extension,

• fortran binary data files are read in double precision if the filename has a .dblor .DBL extension,

• ASReml recognises the value -1e37 as a missing value in binary files,

• fortran binary in the above means all real (.bin) or all double precision (.dbl)variables; mixed types, that is, integer and alphabetic binary representation ofvariables is not allowed in binary files,

• binary files can only be used in conjunction with a pedigree file if the pedigreefields are coded in the binary file so that they correspond with the pedigree file(this can be done using the !SAVE option in ASReml to form the binary file,see Table 5.5), or the identifiers are whole numbers less than 9,999,999 and the!RECODE qualifier is specified (see Table 5.5).

5 Command file: Reading the data

Introduction

Important rules

Title line

Specifying and reading the data

Data field definition syntax

Transforming the data

Transformation syntaxOther rules and examplesSpecial note on covariates

Other examples

Data file line

Data line syntax

Data file qualifiers

Job control qualifiers

37


5.1 Introduction

NIN Alliance Trial 1989

variety !A

id

pid

raw

repl 4

nloc

yield

lat

long

row 22

column 11

alliance89.asd !skip 1

yield ∼ mu variety

1 2

11 column AR1 .424

22 row AR1 .904

In the code box to the right is the ASReml

command file nin89.as for the Nebraska In-trastate Nursery (NIN) field experiment intro-duced Chapter 3. The lines that are high-lighted in bold/blue type relate to reading inthe data. In this chapter we use this exampleto discuss reading in the data in detail.

5.2 Important rules

In the ASReml command file

• all characters following a # symbol on aline are ignored,

• all blank lines are ignored,

• lines beginning with ! followed by a blank are copied to the .asr file ascomments for the output,

• a blank is the usual separator; tab is also a separator,

• maximum line length is 418 characters but it is usually best to limit lines to80 characters for ease of editing,

• a comma as the last character on the line is used to indicate that the currentlist is continued on the next line; a comma is not needed when ASReml knowshow many values to read,

• reserved words used in specifying the linear model (Table 6.1) are case sensitive;they need to be typed exactly as defined: they may not be abbreviated.

• a qualifier is a particular letter sequence beginning with an ! which sets anoption or changes some aspect of ASReml;

– some qualifiers require arguments,– qualifiers must appear on the correct line,– qualifier identifiers are not case sensitive,– qualifier identifiers may be truncated to 3 characters.


5.3 Title line

The first 40 characters of the first nonblankline in an ASReml command file are taken asa title for the job. Use this to identify theanalysis for future reference.


variety !A

id

pid...

5.4 Specifying and reading the data


variety !A

id

pid

raw

repl 4

nloc

yield

lat

long

row 22

column 11


yield ∼ mu variety...

The data fields are defined immediately afterthe job title. They tell ASReml how manyfields to expect in the data file and what theyare. Data field definitions

• should be given for all fields in the data file;data fields on the end of a data line thatdo not have a corresponding field definitionwill be ignored,

• must be presented in the order in whichthey appear in the data file,

• must be indented one or more spaces,Important

• can appear with other definitions on thesame line,

• data fields can be transformed as they are defined (see below),

• additional data fields can be created by transformation; these should be listedafter the data fields read from the data file.

Data field definition syntax

Data field definitions appear in the ASReml command file in the form

space label [field type ] [transformations ]

• space

– is a required space

• label

– is an alphanumeric string to identify the field,


– has a maximum of 31 characters of which only 20 are printed; the remainingcharacters are not displayed,

– must begin with a letter,– must not contain the special characters ., *, :, /, !, #, — or ( ,– reserved words (Table 6.1 and Table 7.3) must not be used,

• field type defines how a variable is interpreted if specified in the linear model,

– for a variate, leave field type blank or specify 1,– for a model factor, various qualifiers are required depending on the form of

the factor coding where n is the number of levels of the factor and s is a listof labels to be assigned to the levels:

* or n is used when the data field has values 1. . .n directly coding forthe factor unless the levels are to be labelled (see !L),

!A [n] is required if the data field is alphanumeric,!I [n] is required if the data is numeric but not 1. . .n; !I must be

followed by n if more than 1000 codes are present,!L s is used when the data field is numeric with values 1. . .n and

labels are to be assigned to the n levels, for exampleSex !L Male FemaleIf there are many labels, they may be written over several linesby using a trailing comma to indicate continuation of the list.

!P indicates the special case of a pedigree factor; ASReml will de-termine the levels from the pedigree file, see Section 9.3,

A warning is printed if the nominated value for n does not agree with theactual number of levels found in the data and if the nominated value is toosmall the correct value is used.

– for a group of m variates!G m is used when m contiguous data fields are to be

treated as a factor or group of variates. For examplelonghand...X1 X2 X3 X4 X5 ydata.daty ∼ mu X1 X2 X3 X4 X5

shorthand...X !G 5 ydata.daty ∼ mu X

so that the 5 variates can be referred to in the model as X byusing X !G 5

• transformations

– see below


5.5 Transforming the data

Transformation is the process of modifying the data (for example, dividing allof the data values in a field by 10) or deriving new data for analysis (for ex-ample, summing the data in two fields). Sometimes it will be easier to use aspreadsheet to calculate derived variables than to modify variables using ASReml

transformations.

Note that in ASReml

• alpha and integer fields are converted to numbers before transformations areapplied,

• transformations may be applied to any data field since all data fields are storedas real numbers, but it may not be sensible to change factor level codes,

• all fields in a record are read before transformations are performed,

• transformations are performed in the order of appearance.

Transformation syntax

Transformations have one of six forms, namely

!operator to perform an operation on the current field,for example, !ABS to take absolute values,

!operator value to perform an operation involving an argu-ment on the current field, for example, !+3to add 3 to all elements in the field,

!operator Vfield to perform an operation on the current fieldusing the data in another field, for example,!-V2 to subtract field 2 from the current field

!V target to change the field modified by subsequenttransformations,

!V target = value to change all of the data in a target field toa given value,

!V target = Vfield to overwrite the data in a target field by thedata values of another field; a special case iswhen field is 0 instructing ASReml to put therecord number into the target field.

• ! flags the presence of the transformation


• operator is one of the symbols outlined in Table 5.1,

• value is the argument required by the transformation,

• V is a literal character and is followed by the number (target or field) of a datafield; the data field is used or modified depending on the context,

• Vfield may be replaced by the name of the field if the field is already named.

• in the first three forms the operation is performed on the current field; thiswill be the field associated with the label unless this field has been reset byspecifying a new target in a preceding transformation,

• the fourth form changes the focus for subsequent transformations to the fieldtarget,

• in the last two forms the operation is performed on the target field which ismodified as a consequence, for example, to create a copy of field 11, which isan existing data field, we would use !V22=V11 which copies field 11 into field22; if field 22 does not exist in the data it is created; note that subsequenttransformations on the line will also apply to the target,

• ASReml expects to read values for each variable from the data file unless thevariable and all subsequent ones are created using !=. The number of variablesread can be explicitly set using the !READ qualifier described in Table 5.5.


Table 5.1: List of transformation qualifiers and their actions with examples

qualifier argument action examples

!=, !+, !-, !*,!/

v usual arithmetic meaning; note that,0/0 gives 0 but v/0 gives a missingvalue where v is any number otherthan 0.

yield !/10

!ˆ v raises the data (which must be pos-itive) to the power v.

yield

SQRyld !=yield !^0.5

!ˆ 0 takes natural logarithms of the data(which must be positive).

yield

LNyield !=yield !^0

!ˆ −1 takes reciprocal of data (data mustbe positive).

yield

INVyield !=yield !^-1

!>, !<, !<>,!==, !<=,!>=

v logical operators forming 1 if true, 0if false.

yield

high !=yield !>10

!ABS takes absolute values - no argumentrequired.

yield

ABSyield !=yield !ABS

!ARCSIN v forms an ArcSin transformation us-ing the sample size specified in theargument which may be a numberor another field. In the side exam-ple, for two existing fields Germ andTotal containing counts, we formthe ArcSin for their ratio (ASG) bycopying the Germ field and applyingthe ArcSin transformation using theTotal field as sample size.

Germ Total

ASG !=Germ !ARCSIN Total

!COS, !SIN s takes cosine and sine of the datavariable with period s having default2π; omit s if data is in radians, sets to 360 if data is in degrees.

Day

CosDay !=Day !COS 365

!D v !Dv removes records with v or ’miss-ing value’ in the field; if !D is usedafter !A or !I, v should refer to theencoded factor level rather than thevalue in the data file (see also Sec-tion 4.2).

yield !D-9

!EXP takes antilog base e - no argumentrequired.

Rate !EXP


List of transformation qualifiers and their actions with examples


!Jddm, !Jmmd

!Jyyd

!Jddm converts a date in the formddmmccyy, ddmmyy or ddmm intodays. !Jmmd converts a date in theform ccyymmdd, yymmdd or mmddinto days. !Jyyd converts a datein the form ccyyddd or yyddd intodays. These calculate the number ofdays since December 31 1900 and arevalid for dates from January 1 1900to December 31 2099; note thatif cc is omitted it is taken as 19 if yy> 50 and 20 if yy < 50,the date string must be entirely nu-meric: characters such as / may notbe present.

!M v !Mv converts data values of v tomissing; if !M is used after !A or !I,v should refer to the encoded fac-tor level rather than the value in thedata file (see also Section 4.2).

yield !M-9

!MAX, !MIN,

!MOD

v the maximum, minimum and modu-lus of the field values and the valuev.

yield !MAX 9

!NA v replaces any missing values in thevariate with the value v.

Rate !NA 0

!SET vlist for vlist, a list of n values, the datavalues 1 . . . n are replaced by the cor-responding element from vlist; datavalues that are < 1 or > n are re-placed by zero. vlist may run overseveral lines provided each incom-plete line ends with a comma, i.e.,a comma is used as a continuationsymbol (see Other examples below).

treat !L C A B

CvR !=treat !SET 1 -1 -1

group !=treat !SET 1,

2 2 3 3 4


List of transformation qualifiers and their actions with examples


!SUB vlist replaces data values = vi with theirindex i where vlist is a vector of nvalues. Data values not found invlist are set to 0. vlist may runover several lines if necessary pro-vided each incomplete line ends witha comma. ASReml allows for a smallrounding error when matching. Itmay not distinguish properly if val-ues in vlist only differ in the sixthdecimal place (see Other examplesbelow).

year 3 !SUB 66 67 68

!SEQ replaces the data values with a se-quential number starting at 1 whichincrements whenever the data valuechanges between successive records;the current field is presumed to de-fine a factor and the number of levelsin the new factor is set to the num-ber of levels identified in this sequen-tial process (see Other examples be-low). Missing values remain miss-ing.

plot !=V3 !SEQ

!Vtarget= value assigns value to data field tar-get overwriting previous contents;subsequent transformation qualifierswill operate on data field target.

!V3=2.5

Vfield assigns the contents of data fieldfield to data field target overwritingprevious contents; subsequent trans-formation qualifiers will operate ondata field target. If field is 0 thenumber of the data record is in-serted.

!V10=V3

!V11=block

!V12=V0


Other rules and examples

Other rules include the following

• missing values are unaffected by arithmetic operations, that is, missing valuesin the current or target column remain missing after the transformation hasbeen performed except in assignment

– !+3 will leave missing values (NA, * and .) as missing,– !=3 will change missing values to 3,

• multiple arithmetic operations cannot be expressed in a complex expressionbut must be given as separate operations that are performed in sequence asthey appear, for example, yield !-120 !*0.0333 would calculate 0.0333 *(yield - 120).

ASReml code action

yield !M0 changes the zero entries in yield to missing values

yield !^0 takes natural logarithms of the yield data

score !-5 subtracts 5 from all values in score

score !SET -0.5 1.5 2.5 replaces data values of 1, 2 and 3 with -0.5, 1.5 and2.5 respectively

score !SUB -0.5 1.5 2.5 replaces data values of -0.5, 1.5 and 2.5 with 1, 2and 3 respectively; a data value of 1.51 would bereplaced by 0 since it is not in the list or very closeto a number in the list

block 8

variety 20

yield

plot * !=variety !SEQ

in the case where

– there are multiple units per plot,

– contiguous plots have different treatments, and

– the records are sorted units within plots withinblocks,

this code generates a plot factor assuming a newplot whenever the code in V2 (variety) changes; nofield would be read unless there were later definitionswhich are not created by transformation,

Var 3

Nit 4

VxN 12 !=Var !-1 !*4 !+Nit

assuming Var is coded 1:3 and Nit is coded 1:4, thissyntax could be used to create a new factor VxN withthe 12 levels of the composite Var by Nit factor.


Special note on covariates

Covariates are variates that appear as independent variables in the model. It isrecommended that covariates be centred and scaled to have a mean of zero anda variance of approximately one to avoid failure to detect singularities. This canbe achieved either

• externally to ASReml in data file preparation,

• using !- mean !/ scale where mean and scale are user supplied values, forexample, age !-200 !/10.

5.6 Data file line

The purpose of the data file line is to

• nominate the data file,

• specify qualifiers to modify

– the reading of the data,– the output produced,– the operation of ASReml , see Table 5.2.


variety !A...

row 22

column 11



Data line syntax

The data file line appears in the ASReml command file in the form

datafile [qualifiers]

• datafile is the path name of the file that contains the variates, factors, covari-ates, traits (response variates) and weight variables represented as data fields,see Chapter 4; the path name may not include embedded blanks,

• the qualifiers tell ASReml to modify either

– the reading of the data and/or the output produced, see Table 5.2 below fora list of data file related qualifiers,

– the operation of ASReml, see Tables 5.3 to 5.6 for a list of job control qual-ifiers

• the data file related qualifiers must appear on the data file line,

• the job control qualifiers may appear on the data file line or on the followingline,


• the arguments to qualifiers are represented by the following symbols

f — a filename,n — an integer number, typically a count,p — a vector of real numbers, typically in increasing order,r — a real number,s — a character string,t — a model term label,v — the number or label of a data variable,vlist — a list of variable labels.

5.7 Data file qualifiers

Table 5.2 lists the qualifiers relating to data input and output. These include!SKIP, !CSV, !FILTER, !SELECT and !PRINT. These must occur on the data filename line. Use the Index to check for examples or further discussion of thesequalifiers.

Table 5.2: Qualifiers relating to data input and output

qualifier action

Frequently used data file qualifier

!SKIP n causes the first n records of the (non-binary) data file to beignored. Typically these lines contain column headings forthe data fields.

Other data file qualifiers

!CSV used to make consecutive commas imply a missing value; thisis automatically set if the file name ends with .csv or .CSV

(see also Section 4.2)

,

!FILTER v !SELECT n enables a subset of the data to be analysed; v is the numberor name of a data field. When reading data, the value infield v is checked after any transformations are performed. If!select is omitted, records with zero in field v are omittedfrom the analysis. Otherwise, records with n in field v areretained and all other records are omitted. Warning If thefilter column contains a missing value, the value from theprevious non-missing record is assumed in that position.


5.8 Job control qualifiers

The following table provides a comprehensive list of the job control qualifiers.These change or control various aspects of the analysis. Job control qualifiersmay be placed on the data file line and the following line. Use the Index to checkfor examples or further discussion of these qualifiers.

Important Many of these are only required in very special circumstances andnew users should not attempt to understand all of them. You do need to under-stand that all general qualifiers are specified here. Many of these qualifiers arereferenced in other chapters where their purpose will be more evident.

Table 5.3: List of frequently used job control qualifiers

qualifier action

!MAXIT n sets the maximum number of iterations; the default is 10 iter-ations. ASReml iterates for n iterations unless convergence isachieved first. Convergence is presumed when the REML log-likelihood changes less than 0.002* current iteration numberand the individual variance parameter estimates change lessthan 1%.

To abort the job at the end of the current iteration, createa file named ABORTASR.NOW in the directory in which the jobis running. At the end of each iteration, ASReml checks forthis file and if present, stops the job, writing out the normalresults. On case sensitive operating systems (eg. Unix), thefilename must be upper case. Note that the ABORTASR.NOW

file is deleted so nothing of importance should be placed init.

If the job has not converged in n iterations, use the !CONTINUEqualifier to resume iterating from the current point.

If you perform a system level abort (CTRL C or close theprogram window) output files other than the .rsv file willbe incomplete. The .rsv file should still be functional forresuming iteration at the most recent parameter estimates(see !CONTINUE).

Use !MAXIT 1 where you want estimates of fixed effects andpredictions of random effects for the particular set of varianceparameters supplied as initial values. Otherwise the estimatesand predictions will be for the updated variance parameters(see also the !BLUP qualifier below).


Table 5.4: List of occasionally used job control qualifiers

qualifier action

!ASMV n indicates a multivariate analysis is required although thedata is presented in a univariate form. ’Multivariate Anal-ysis’ is used in the narrow sense where an unstructurederror variance matrix is fitted across traits, records areindependent, and observations may be missing for particulartraits, see Chapter 8 for a complete discussion.

The data is presumed arranged in lots of n records where n isthe number of traits. It may be necessary to expand the datafile to achieve this structure, inserting a missing value NA onthe additional records. This option is sometimes relevant forsome forms of repeated measures analysis. There will need tobe a factor in the data to code for trait as the intrinsic Trait

factor is undefined when the data is presented in a univariatemanner.

!ASUV indicates that a univariate analysis is required although thedata is presented in a multivariate form. Specifically, itallows you to have an error variance other than I ⊗Σ whereΣ is the unstructured variance structure (with the US vari-ance model in ASReml, see Table 7.3). If there are missingvalues in the data, you must include !f mv on the end of thelinear model. The intrinsic Trait factor is defined and maybe used in the model, see Chapter 8 for a complete discussion.

This option is used for repeated measures analysis when thevariance structure required is not the standard multivariateunstructured matrix.

!COLFAC v is used with !SECTION v and !ROWFAC v to instruct ASReml toset up R structures for analysing a multi-environment trialwith a separable first order autoregressive model for each site(environment). v is the name of a factor or variate containingcolumn numbers (1 . . . nc where nc is the number of columns)on which the data is to be sorted. See !SECTION for moredetail.


List of occasionally used job control qualifiers

qualifier action

!CONTINUE is used to restart/resume iterations from the point reachedin a previous iteration. This qualifier can alternately be setfrom the command line using the option letters C (continue)or F (final) (see Section 12.3 on job control command lineoptions). After each iteration, ASReml writes the currentvalues of the variance parameters to a file with extension.rsv (re-start values). The .rsv file contains information toidentify individual variance parameters.

The !CONTINUE qualifier causes ASReml to scan the .rsv filefor parameter values related to the current model replacingthe values obtained from the .as file before iteration resumes.If the model has changed, ASReml will pick up the values itrecognises as being for the same terms. Furthermore, AS-Reml will use estimates in the .rsv file for certain models toprovide starting values for certain more general models, in-serting reasonable defaults where necessary. The transitionsrecognised are listed and discussed in 7.10.

DIAG to FA1

DIAG to CORUH (uniform heterogeneous)CORUH to FA1

FAi to FAi+1FAi to CORGH (full heterogeneous)FAi to US (full heterogeneous)CORGH (heterogeneous) to US

See the !DOPART qualifier for how this may be exploited.

!DISPLAY n is used to select particular graphic displays.In spatial analysis, four graphic displays are possible. Codingthese1=variogram2=histogram4=row and column trends8=perspective plot of residualsset n to the sum of the codes for the desired graphics. Thedefault is 9. Note that the graphics are only displayed incompiled versions of ASReml linked with Interacter (that is,Sun and PC) versions. However, line printer versions of thesegraphics are written to the .res file. See the G commandline option (Section 12.3 on graphics) for how to save thegraphs in a file for printing.Use !NODISPLAY to suppress graphic displays.



qualifier action

!DOPART n is provided so that several analyses can be coded and runsequentially without having to edit the .as file betweenruns. Sections of the .as file can be delineated by !PART

n lines. These lines begin with the word !PART followedby a number. The lines after the !PART line are ignoredunless n on the !DOPART qualifier matches n on the pre-ceding !PART statement or there is no n on the !PART

line which means always include. By setting the !DOPART

argument to $1, the part processed can be specified at runtime from the command line (see Section 12.4). See Sec-tions 15.7 and 15.10 for examples. The default value of n is 1.

One situation where this might be useful is where it is neces-sary to run simpler models to get reasonable starting valuesfor more complex variance models. The more complex mod-els are specified in later parts and the !CONTINUE commandis used to pick up the previous estimates.

!MVINCLUDE when ASReml detects missing values in the design it will re-port this fact and abort the job unless !MVINCLUDE is specified.Otherwise, use the !D transformation to drop the records withthe missing values.

!MVREMOVE instructs ASReml to discard records which have missing valuesin the design matrix.

!NODISPLAY used to suppress the graphic display of the variogram andresiduals which is otherwise produced for spatial analyses inthe PC and SUN versions. This option is usually set on thecommand line using the option letter N (see Section 12.3 ongraphics). The display is dependent on the program beingcompiled with the Interacter library. The text version of thegraphics is still written to the .res file.

!PVAL v p provides a mechanism for nominating the particular pointsto be predicted for covariates including fac(), leg(), spl()and pol(). The points are specified here so that they can beincluded in the appropriate design matrices. v is the nameof a data field. p is the list of values at which prediction isrequired.

!PVAL f vlist is used to read predict points for several variables from a file.vlist is the names of the variables having values defined. Ifthe file contains unwanted fields, put the pseudo variate labelskip in the appropriate position in vlist to ignore them. Thefile should only have numeric values. predict points cannotbe specified for design factors.



qualifier action

!ROWFAC v is used with !SECTION v and !COLFAC v to instruct ASReml tosetup the R structures for multi-environment spatial analysis.n is the name of a factor or variate containing row numbers(1 . . . nr where nr is the number of rows) on which the datais to be sorted. See !SECTION for more detail.

!SECTION v specifies the factor in the data that defines the data sections.This qualifier enables ASReml to check that sections havebeen correctly dimensioned but does not cause ASReml tosort the data unless !ROWFAC and !COLFAC are also specified.Data is assumed to be sorted by section. The following is abasic example assuming 5 sites (sections).

When !ROWFAC v and !COLFAC v are both specified ASRemlgenerates the R structures for a standard AR ⊗ AR spatialanalysis. The R structure lines that a user would normallybe required to work out and type into the .as file (see theexample of Section 15.6) are written to the .res file. Theuser may then cut and paste them into the .as file for a laterrun if the structures need to be modified.

Basic multi-environment trial analysis

site 5 # sites coded 1 ... 5

column * # columns coded 1 ...

row * # rows coded 1 ...

variety !A # variety names

yield

met.dat !SECTION site !ROWFAC row !COLFAC col

yield ∼ site !r variety site.variety !f mv

site 2 0 # variance header line

# asreml inserts the 10 lines required to define

# the R structure lines for the five sites here



qualifier action

!SPLINE t p defines a spline model term with an explicit set of knotpoints. The basic form of the spline model term, spl(v), isdefined in Table 6.1 where v is the underlying variate. Thebasic form uses the unique data values as the knot points.The extended form is spl(v,n) which uses n knot points.Use this !SPLINE qualifier where t is spl(v,n) to supplyan explicit set of n knot points (p) for the model term t.Using the extended form without using this qualifier resultsin n equally spaced knot points being used. The !SPLINE

qualifier may only be used on a line by itself after the datafile line and before the model line.

When knot points are explicitly supplied they should be inincreasing order and adequately cover the range of the dataor ASReml will modify them before they are applied. If youchoose to spread them over several lines use a comma at theend of incomplete lines so that ASReml will to continue read-ing values from the next line of input. If the explicit pointsdo not adequately cover the range, a message is printed andthe values are rescaled unless !NOCHECK is also specified. In-adequate coverage is when the explicit range does not coverthe midpoint of the actual range. See also !KNOTS, !PVAL and!SCALE.

!STEP r reduces the update step sizes of the variance parameters. Thedefault value is the reciprocal of the square root of !MAXIT. Itmay be set between 0.01 and 1.0. The step size is increasedtowards 1 each iteration. Starting at 0.1, the sequence wouldbe 0.1, 0.32, 0.56, 1. This option is useful when you do nothave good starting values, especially in multivariate analyses.


Table 5.5: List of rarely used job control qualifiers

qualifier action

!BLUP n restricts ASReml to performing only part of the first iteration.The estimation routine is aborted aftern = 1: forming the estimates of the vector of fixed andrandom effects,n = 2: forming the estimates of the vector of fixed andrandom effects, REML log-likelihood and residuals (this isthe default),n = 3: forming the estimates of the vector of fixed andrandom effects, REML log-likelihood, residuals and inversecoefficient matrix.

ASReml prints its standard reports as if it had completed theiteration normally, but since it has not completed it, someof the information printed will be incorrect. In particular,variance information on the variance parameters will alwaysbe wrong. Standard errors on the estimates will be wrongunless n=3. Residuals are not available if n=1. Use of n=3or n=2 will halve the processing time when compared to thealternative of using !MAXIT 1. However, !MAXIT 1 does resultin a complete and correct output report.

!DATAFILE f specifies the datafile name replacing the one obtained fromthe datafile name line. It is required when different !PARTS

(see !DOPART in Table 5.4) of a job must read different files.The !SKIP qualifier, if specified, will be applied when read-ing the file.

!DENSE n reduces the number of equations solved densely. By default,sparse matrix methods are applied to the random effects andany fixed effects listed after random factors. Use !DENSE n toapply sparse methods to effects listed before !r. This qualifierwill not increase the number of dense equations (maximum of800) and individual model terms will not be split so that onlypart is in the dense section. n should be kept small (<100)for faster processing.

!DF n alters the error degrees of freedom from ν to ν − n. Thisqualifier might be used when analysing pre-adjusted data toreduce the degrees of freedom (n negative) or when weightsare used in lieu of actual data records to supply error infor-mation (n positive). The degrees of freedom is only used inthe calculation of the residual variance in a univariate singlesite analysis. The option will have no effect in analyses withmultiple error variances (for sites or traits) other than in thereported degrees of freedom. Use !ADJUST r rather than !DF

n if r is not a whole number.


List of rarely used job control qualifiers

qualifier action

!FORMAT s supplies a FORTRAN like FORMAT statement for reading fixedformat files. A simple example is !FORMAT(3I4,5F6.2) whichreads 3 integer fields and 5 floating point fields from the first42 characters of each data line. A format statement is en-closed in parentheses and may include 1 level of nested paren-theses, for example, e.g. !FORMAT(4x,3(I4,f8.2)). Fielddescriptors are

• rX to skip r character positions,

• rAw to define r consecutive fields of w characters width,

• rIw to define r consecutive fields of w characters width,and

• rFw.d to define r consecutive fields of w characters width;d indicates where to insert the decimal point if it is notexplicitly present in the field,

where r is an optional repeat count.

In ASReml, the A and I field descriptors are treated identi-cally and simply set the field width. Whether the field isinterpreted alphabetically or as a number is controlled by the!A qualifier.Other legal components of a format statement are

• the , character; required to separate fields - blanks arenot permitted in the format.

• the / character; indicates the next field is to be read fromthe next line. However a / on the end of a format to skipa line is not honoured.

• BZ; the default action is to read blank fields as missingvalues. * and NA are also honoured as missing values. Ifyou wish to read blank fields as zeros, include the stringBZ.

• the string BM; switches back to ’blank missing’ mode.

• the string Tc; moves the ’last character read’ pointer to lineposition c so that the next field starts at position c + 1.For example T0 goes back to the beginning of the line.

• the string D; invokes debug mode.

A format showing these components is!FORMAT(D,3I4,8X,A6,3(2x,F5.2)/4x,BZ,20I1) and issuitable for reading 27 fields from 2 data records such as111122223333xxxxxxxxALPHAFxx 4.12xx 5.32xx 6.32

xxxx123 567 901 345 7890

!G v is used to set a grouping variable for plotting, see !X.



qualifier action

!JOIN is used to join lines in plots, see !X.

!PRINT n causes ASReml to print the transformed data file to base-name.asp. Ifn < 0, data fields 1...mod(n) are written to the file,n = 0, nothing is written,n = 1, all data fields are written to the file if it does not exist,n = 2, all data fields are written to the file overwriting anyprevious contents,n > 2, data fields n. . . t are written to the file where t is thelast defined column.

!READ n formally instructs ASReml to read n data fields from the datafile. It is needed when there are extra columns in the datafile that are only required for combination into earlier fieldsin transformations.

!RECODE required when reading a binary data file with pedigree iden-tifiers that have not been recoded according to the pedigreefile. It is not needed when the file was formed using the !SAVEoption but will be needed if formed in some other way (seealso Section 4.2).

!REPORT forces ASReml to attempt to produce the standard output re-port when there is a failure of the iteration algorithm. Usu-ally no report is produced unless the algorithm has at leastproduced estimates for the fixed and random effects in themodel. Note that residuals are not included in the outputforced by this qualifier. This option is primarily intended tohelp debugging a job that is not converging properly.

!RESIDUALS [2] on the data file line or following lines instructs ASReml towrite the transformed data and the residuals to a binary file.The residual is the last field. The file basename.srs is writtenin single precision unless the argument is 2 in which case base-name.drs is written in double precision. The file will not bewritten from a spatial analysis (two-dimensional error) whenthe data records have been sorted into field order because theresiduals are not in the same order that the data is stored.The residual from a spatial analysis will have the units partadded to it when units is also fitted. The .drs file couldbe renamed (with extension .dbl) and used for input in asubsequent run.



qualifier action

!SAVE n on the data file line or following lines instructs ASReml towrite the data to a binary file. The file asrdata.bin is writtenin single precision if the argument n is 1 or 3; asrdata.dblis written in double precision if the argument n is 2 or 4; thedata values are written before transformation if the argumentis 1 or 2 and after transformation if the argument is 3 or 4.The default is single precision after transformation (see alsoSection 4.2).

!VCC n specifies that n constraints are to be applied to the varianceparameters. The constraint lines occur after the G structuresare defined. The constraints are described in Section 7.9.The variance header line (Section 7.4) must be present, evenif only 0 0 0 indicating there are no explicit R or G structures(see also Section 7.9).

!X v !Y v !G v !JOIN is used to plot the (transformed) data. Use !X to specifythe x variable, !Y to specify the y variable and !G to specifya grouping variable. !JOIN joins the points when the xvalue increases between consecutive records. The groupingvariable may be omitted for a simple scatter plot. Omit !Y

y produce a histogram of the x variable.

For example,

!X age !Y height !G sex

Note that the graphs are only produced in the graphicsversions of ASReml (Section 12.3).

For multivariate repeated measures data, ASReml can plotthe response profiles if the first response is nominated withthe !Y qualifier and the following analysis is of the multi-variate data. ASReml assumes the response variables are incontiguous fields and are equally spaced. For exampleResponse profiles

Treatment !A

Y1 Y2 Y3 Y4 Y5

rm.asd !Y Y1 !G Treatment !JOIN

Y1 Y2 Y3 Y4 Y5 ∼ Trait Treatment Trait.Treatment


Table 5.6: List of very rarely used job control qualifiers

qualifier action

!CINV n prints the portion of the inverse of the coefficient matrix per-taining to the nth term in the linear model. Because themodel has not been defined when ASReml reads this line, itis up to the user to count the terms in the model to iden-tify the portion of the inverse of the coefficient matrix to beprinted. The option is ignored if the portion is not whollyin the SPARSE stored equations. The portion of the inverse isprinted to a file with extension .cii The sparse form of thematrix only is printed in the form i j Cij , that is, elementsof Cij that were not needed in the estimation process are notincluded in the file.

!EXTRA n forces another mod(n,10) rounds of iteration after apparentconvergence. The default for n is 1. This qualifier has lowerpriority than !MAXIT and ABORTASR.NOW (see !MAXIT for de-tails).Convergence is judged by changes in the REML log-likelihoodvalue and variance parameters. However, sometimes the vari-ance parameter convergence criteria has not been satisfied.

!FACPOINTS n affects the number of distinct points recognised by the fac()

model function (Table 6.1). The default value of n is 1000 sothat points closer than 0.1% of the range are regarded as thesame point.

!KNOTS n changes the default knot points used when fitting a spline todata with more than n different values of the spline variable.When there are more than n (default 50) points, ASReml willdefault to using n equally spaced knot points.

!NOCHECK forces ASReml to use any explicitly set spline knot points (see!SPLINE) even if they do not appear to adequately cover thedata values.

!NOREORDER prevents the automatic reversal of the order of the fixed terms(in the dense equations) and possible reordering of terms inthe sparse equations.

!NOSCRATCH forces ASReml to hold the data in memory. ASReml will usu-ally hold the data on a scratch file rather than in memory. Inlarge jobs, the system area where scratch files are held maynot be large enough. A Unix system may put this file in the/tmp directory which may not have enough space to hold it.

!POLPOINTS n affects the number of distinct points recognised by the pol()

model function (Table 6.1). The default value of n is 1000 sothat points closer than 0.1% of the range are regarded as thesame point.


List of very rarely used job control qualifiers

qualifier action

!PPOINTS n influences the number of points used when predicting splinesand polynomials. The design matrix generated by the leg(),pol() and spl() functions are modified to include extra rowsthat are accessed by the PREDICT directive. The default valueof n is 21 if there is no !PPOINTS qualifier. The range of thedata is divided by n-1 to give a step size i. For each point pin the list, a predict point is inserted at p + i if there is nodata value in the interval [p,p+1.1×i]. !PPOINTS is ignored if!PVAL is specified for the variable. This process also effectsthe number of levels identified by the fac() model term.

!SCALE 1 When forming a design matrix for the spl() model term,ASReml uses a standardized scale (independent of the actualscale of the variable). The qualifier !SCALE 1 forces ASRemlto use the scale of the variable. The default standardisedscale is appropriate in most circumstances.

!SLOW n reduces the update step sizes of the variance parameters morepersistently than the !STEP r qualifier. If specified, ASRemllooks at the potential size of the updates and if any are large,it reduces the size of r. If n is greater than 10 ASReml alsomodifies the Information matrix by multiplying the diagonalelements by n. This has the effect of further reducing theupdates. This option may help when you do not have goodstarting values, especially in multivariate analyses.

6 Command file: Specifying theterms in the mixed model

Introduction

Specifying model formulae in ASRemlGeneral rules

Examples

Fixed terms in the modelPrimary fixed termsSparse fixed terms

Random terms in the model

Interactions and conditional factorsInteractions

Conditional factors

Alphabetic list of model functions

Weights

Missing valuesMissing values in the response

Missing values in the explanatory variables

Some technical details about model fitting in ASRemlSparse versus dense

Ordering of terms in ASRemlAliasing and singularities

Examples of aliasing

Testing of terms61


6.1 Introduction

The linear mixed model is specified in ASReml as a series of model terms andqualifiers. In this chapter the model formula syntax is described.

6.2 Specifying model formulae in ASReml

The linear mixed model is specified in AS-

Reml as a series of model terms and qualifiers.Model terms include factor and variate labels(Section 5.4), functions of labels, special termsand interactions of these. The model is speci-fied immediately after the data file and any jobcontrol qualifier lines. The syntax for specify-ing the model is


variety...

column 11


yield ∼ mu variety !r repl,

!f mv

1 2

11 column AR1 .3

22 row AR1 .3

response [!wt weight] ∼ fixed [!r random] [!f sparse fixed]

• response is the label for the response variable(s) to be analysed; multivariateanalysis is discussed in Chapter 8,

• weight is a label of a variable containing weights; weighted analysis is discussedin Section 6.7,

• ∼ separates response from the list of fixed and random terms,

• fixed represents the list of primary fixed explanatory terms, that is, variates,factors, interactions and special terms for which analysis of variance (ANOVA)type tests are required. See Table 6.1 for a brief definition of model terms,allowable operators and commonly used functions. The full definition is inSection 6.6,

• random represents the list of explanatory terms to be fitted as random effects,see Table 6.1,

• sparse fixed are additional fixed terms not included in the ANOVA table.

General rules

The following general rules apply in specifying the linear mixed model

• all elements in the model must be space separated,

• the character ∼ separates the response variables(s) from the explanatory vari-ables in the model,


• data fields are identified in the model by their labelsChoose labels

that will avoid

confusion– labels are case sensitive,– labels may be abbreviated (truncated) when used in the model line but care

must be taken that the truncated form is not ambiguous. If the truncatedform matches more than one label, the term associated with the first matchis assumed,

– model terms may only appear once in the model line; repeated occurrencesare ignored,

– model terms other than the original data fields are defined the first time theyappear on the model line. They may be abbreviated (truncated) if they arereferred to again provided no ambiguity is introduced.Important It is often clearer if labels are not abbreviated. If abbreviationsare used then they need to be chosen to avoid confusion.

• if the model is written over several lines, all but the final line must end with acomma to indicate that the list is continued.

In Tables 6.1 and 6.2, the arguments in model term functions are represented bythe following symbols

f — the label of a data variable defined as a model factor,

k, n — an integer number,

r — a real number,

t — a model term label (includes data variables),

v, y — the label of a data variable,

Parsing of model terms in ASReml is not very sophisticated. Where a model termtakes another model term as an argument, the argument must be predefined. Ifnecessary, include the argument in the model line with a leading ’-’ which willcause the term to be defined but not fitted.


Table 6.1: Summary of reserved words, allowable operators and functions

model term brief description common usage

fixed random

reservedwords

mu the constant term or intercept√

mv a term to estimate missing values√

Trait multivariate counterpart to mu√

units forms a factor with a level for eachexperimental unit

√

operators . or : placed between labels to specify aninteraction

√ √

/ placed between labels to create con-ditional terms (Section 6.5)

√ √

- placed before model terms to ex-clude them from the model

√ √

, placed at the end of a line to in-dicate that the model specificationcontinues on the next line

+ treated as a space√ √

!{ ... !} placed around some model termswhen it is important the terms notbe reordered (Section 6.4)

√

commonlyusedfunctions

at(f,n) condition on level n of factor f√ √

fac(v ) forms a factor from v with a levelfor each unique value in v

√

fac(v,y ) forms a factor with a level for eachcombination of values in v and y

lin(f ) forms a variable from the factorf with values equal to 1. . . n cor-responding to level(1). . . level(n) ofthe factor

√

spl(v [,k ]) forms the design matrix for the ran-dom component of a cubic spline forvariable v

√

otherfunctions

and(t[,r]) adds r times the design matrix formodel term t to the previous designmatrix; r has a default value of 1. Ift is complex if may be necessary topredefine it by saying -t and(t,r)


Summary of reserved words, allowable operators and functions

model term brief description common usage

fixed random

cos(v,r) forms cosine from v with period r√

giv(f,n) associates the nth .giv G-inversewith the factor f

√

ide(f) fits pedigree factor f without rela-tionship matrix

√

inv(v[,r]) forms reciprocal of v + r√

leg(v,[-]n) forms n+1 legendre polynomials oforder 0 (intercept), 1 (linear). . . nfrom the values in v; the interceptpolynomial is omitted if v is pre-ceded by the negative sign.

√

log(v[,r]) forms natural logarithm of v + r√

ma1(f) constructs MA1 design matrix forfactor f

√

ma1 forms an MA1 design matrix fromplot numbers

√

pol(v,[-]n) forms n+1 orthogonal polynomialsof order 0 (intercept), 1 (linear). . . nfrom the values in v; the interceptpolynomial is omitted if n is pre-ceded by the negative sign.

√

sin(v,r) forms sine from v with period r√

sqrt(v[,r]) forms square root of v + r√

uni(f) forms a factor with a level for eachrecord where factor f is non-zero

√

uni(f,n) forms a factor with a level for eachrecord where factor f has level n

√

xfa(f,k) is formally a copy of factor f with kextra levels. This is used when fit-ting extended factor analytic mod-els (XFA, Table 7.3) of order k in asparser form.

√


Examples

ASReml code action

yield ∼ mu variety fits a model with a constant andfixed variety effects

yield ∼ mu variety !r block fits a model with a constant term,fixed variety effects and randomblock effects

yield ∼ mu time variety time.variety fits a saturated model with fixedtime and variety main effects andtime by variety interaction effects

livewt ∼ mu breed sex breed.sex !r sire fits a model with fixed breed, sexand breed by sex interaction effectsand random sire effects

6.3 Fixed terms in the model

Primary fixed terms

The fixed list in the model formula

• describes the fixed covariates, factors andinteractions including special functions tobe included in the ANOVA table,

• generally begins with the reserved word muwhich fits a constant term, mean or inter-cept, see Table 6.1.


variety...

row 22

column 11


!mvinclude


!f mv

1 2

11 column AR1 .3

22 row AR1 .3

Table 6.1 includes the reserved words and special functions used in the fixed list.


Sparse fixed terms

The !f sparse fixed terms in model formula

• are the fixed covariates (for example, thefixed lin(row) covariate now included inthe model formula), factors and interac-tions including special functions and re-served words (for example mv, see Table6.1) for which ANOVA type tests are notrequired,

• include large (>100 levels) terms.


variety...

row 22

column 11



!f mv lin(row)

1 2

11 column AR1 .424

22 row AR1 .904

6.4 Random terms in the model


variety...

row 22

column 11



!f mv 1 2

11 column AR1 .424

22 row AR1 .904

The !r random terms in the model formula

• comprise random covariates, factors and in-teractions including special functions andreserved words, see Table 6.1,

• involve an initial non-zero variance compo-nent or ratio (relative to the residual vari-ance) default 0.1; the initial value can bespecified after the model term or if the vari-ance structure is not scaled identity, by syn-tax described in detail in Chapter 7,

• an initial value of its variance (ratio) may be followed by a !GP (restricted tobe positive, the default), !GU (unrestricted) or !GF (fixed) qualifier, see Table7.4,

• use !{ and !} to group model terms that may not be reordered. NormallyASReml will reorder the model terms in the sparse equations - putting smallerterms first to speed up calculations. However, the order must be preserved if theuser defines a structure for a term which also covers the following term(s) (a wayof defining a covariance structure across model terms). Grouping is specificallyrequired if the model terms are of differing sizes (number of effects). Forexample, for traits weaning weight and yearling weight, an animal modelwith maternal weaning weight should specify model terms!{ Trait.animal at(Trait,1).dam !}when fitting a genetic covariance between the direct and maternal effects.


6.5 Interactions and conditional factors

Interactions

• interactions are formed by joining two or more terms with a ‘.’ or a ‘:’, forexample, a.b is the interaction of factors a and b,

• interaction levels are arranged with the levels of the second factor nested withinthe levels of the first,

• labels of factors including interactions are restricted to 31 characters of whichonly the first 20 are ever displayed. Thus for interaction terms it is oftennecessary to shorten the names of the component factors in a systematic way,for example, if Time and Treatment are defined in this order, the interactionbetween Time and Treatment could be specified in the model as Time.Treat;remember that the first match is taken so that if the label of each field beginswith a different letter, the first letter is sufficient to identify the term,

• interactions can involve model functions,

• a*b is expanded to a b a.b

• a*b*c*d is expanded to a b c d a.b a.c a.d b.c b.d c.d a.b.c a.b.da.c.d b.c.d a.b.c.d

• a.(b c d) e is expanded to a.b a.c a.d e. This syntax is detected by thestring ‘.(’ and the closing parenthesis must occur on the same line and beforeany comma indicating continuation. Any number of terms may be enclosed.Each may have ‘-’ prepended to suppress it from the model but insert an extraspace before the first term if this facility is required. Each enclosed term mayhave initial values and qualifiers following. For example,

yield∼site site.(lin(row) !r variety) at(site,1).(row .3 col .2)

expands to

yield∼site site.lin(row) !r site.variety at(site,1).row .3,

at(site,1).col .2

Conditional factors

A conditional factor is a factor that is present only when another factor has aparticular level.

• individual components can be specified using the at(f,n) function (see Table6.2), for example, at(site,1).row will fit row as a factor only for site 1,

• a complete set of conditional terms are specified by joining two terms with a /


– a/b creates a series of factors representing b nested within a. b may be anypreviously defined model term. A factor is created for each level of a; eachhas the size of b. For example, if site and geno are factors with 3 and10 levels respectively, then for site/geno ASReml constructs 3 new factorssite 1.geno, site 2.geno and site 3.geno, each with 10 levels,

– the first term must be a factor,– this is similar to forming an interaction except that a separate model term is

created for each level of the first factor; this is useful for random terms wheneach component can have a different variance. The same effect is achievedby using an interaction and associating a DIAG variance structure with thefirst component (see Section 7.5).

6.6 Alphabetic list of model functions

Table 6.2 presents detailed descriptions of the model functions discussed above.Note that some three letter function names may be abbreviated to the first letter.

Table 6.2: Alphabetic list of model functions and descriptions

model function action

and(t,r)

a(t,r)

overlays (adds) r times the design matrix for model term t to the existingdesign matrix. Specifically, if the model up to this point has p effects andt has a effects, the a columns of the design matrix for t are multiplied byr and added to the last a of the p columns already defined. This can beused to force a correlation of 1 between two terms with different variancesas in a diallel analysismale and(female)

Note that if the overlaid term is complex, it must be predefined.

at(f,n) defines a binary variable which is 1 if the factor f has level n for therecord. For example, to fit a row factor only for site 3, use the expressionat(site,3).row.

cos(v,r) forms cosine from v with period r

fac(v)

fac(v,y)

fac(v) forms a factor with a level for each value of x and any addi-tional points inserted as discussed with the qualifiers !PPOINTS and !PVAL.fac(v,y) forms a factor with a level for each combination of values fromv and y. The values are reported in the .res file.

giv(f,n)

g(f,n)

associates the nth .giv G-inverse with the factor. This is used whenthere is a known (except for scale) G-structure other than the additiveinverse genetic relationship matrix. The G-inverse is supplied in a filewhose name has the file extension .giv described in Section 9.6


Alphabetic list of model functions and descriptions


ide(f)

i(f)

is used to take a copy of a pedigree factor f and fit it without the ge-netic relationship covariance. This facilitates fitting a second animal ef-fect. Thus, to form a direct, maternal genetic and maternal environmentmodel, the maternal environment is defined as a second animal effectcoded the same as dams.

inv(v[,r]) forms the reciprocal of v + r. This may also be used to transform theresponse variable.

leg(v,[-]n) forms n+1 legendre polynomials of order 0 (intercept), 1 (linear). . . n fromthe values in v; the intercept polynomial is omitted if n is preceded bythe negative sign.

lin(f)

l(f)

takes the coding of factor f as a covariate. The function is defined for fbeing a simple factor, Trait and units. The lin(f) function does notcentre or scale the variable. Motivation: Sometimes you may wish to fit acovariate as a random factor as well. If the coding is say 1. . .n, then youshould define the field as a factor in the field definition and use the lin()

function to include it as a covariate in the model. Do not centre the fieldin this case. If the covariate values are irregular, you would leave the fieldas a covariate and use the fac() function to derive a factor version.

log(v[,r]) forms the natural log of v + r. This may also be used to transform theresponse variable.

ma1

ma1(f)

creates a first-differenced (by rows) design matrix which, when defining arandom effect, is equivalent to fitting a moving average variance structurein one dimension. In the ma1 form, the first-difference operator is codedacross all data points (assuming they are in time/space order). Otherwisethe coding is based on the codes in the field indicated.

mu is used to fit the intercept/constant term. It is normally present andlisted first in the model. It should be present in the model if there are noother fixed factors or if all fixed terms are covariates or contrasts exceptin the special case of regression through the origin.




mv is used to estimate missing values in the response variable. Formallythis creates a model term with a column for each missing value. Eachcolumn contains zeros except for a solitary -1 in the record containingthe corresponding missing value. This is used in spatial analyses sothat computing advantages arising from a balanced spatial layout canbe exploited. The equations for mv and any terms that follow are alwaysincluded in the sparse set of equations.

Missing values are handled in three possible ways during analysis. Inthe simplest case, records containing missing values in the response vari-able are deleted. For multivariate (including some repeated measures)analysis, records with missing values are not deleted but ASReml dropsthe missing observation and uses the appropriate unstructured R-inversematrix. For regular spatial analysis, we prefer to retain separability andtherefore estimate the missing value(s) by including the special term mv

in the model.

pol(v,n)

p(v,n)

forms a set of orthogonal polynomials of order |n| based on the uniquevalues in variate (or factor) v and any additional interpolated points,see !PPOINTS and !PVAL in Table 5.4. It includes the intercept if n ispositive, omits it if n is negative. For example, pol(time,2) forms adesign matrix with three columns of the orthogonal polynomial of degree2 from the variable time. Alternatively, pol(time,-2) is a term with twocolumns having centred and scaled linear coefficients in the first columnand centred and scaled quadratic coefficients in the second column. Theactual values of the coefficients are written to the .res file. This factorcould be interacted with a design factor to fit random regression models.

sin(v,r) forms sine from v with period r

spl(v [,k])

s(v [,k ])

In order to fit spline models associated with a variate v and k knot pointsin ASReml, v is included as a covariate in the model and spl(v,k) asa random term. The knot points can be explicitly specified using the!SPLINE qualifier (Table 5.4). If k is specified but !SPLINE is not specified,equally spaced points are used. If k is not specified and there are less than50 unique data values, they are used as knot points. If there are morethan 50 unique points then 50 equally spaced points will be used. Thespline design matrix formed is written to the .res file. An example ofthe use of spl() is

price ∼ mu week !r spl(week)

sqrt(v[,r]) forms the square root of v + r. This may also be used to transform theresponse variable.




Trait is used with multivariate data to fit the individual trait means. It isformally equivalent to mu but Trait is a more natural label for use withmultivariate data. It is interacted with other factors to estimate theireffects for all traits.

units creates a factor with a level for every record in the data file. This is usedto fit the ’nugget’ variance when a correlation structure is applied to theresidual.

uni(f[,0[,n]]) creates a factor with a new level whenever there is a level present for thefactor f. Levels (effects) are not created if the level of factor f is 0, missingor negative. The size may be set in the third argument by setting thesecond argument to zero.

uni(f,k[,n]) creates a factor with a level for every record subject to the factor levelof f equalling k, i.e. a new level is created for the factor whenever anew record is encountered whose integer truncated data value from datafield f is k. Thus uni(site,2) would be used to create an independenterror term for site 2 in a multi-environment trial and is equivalent toat(site,2).units. The default size of this model term is the number ofdata records. The user may specify a lower number as the third argument.There is little computational penalty from the default but the .sln filemay be substantially larger than needed.

xfa(f,k) Factor analytic models are discussed in Chapter 7. There are three forms,FAk, FACVk and XFAk where k is the number of factors. The XFAk form is asparse formulation that requires an extra k levels to be inserted into themixed model equations for the k factors. This is achieved by the xfa(f,k)model function which defines a design matrix based on the design matrixfor f augmented with k columns of zeros for the k factors.

6.7 Weights

Weighted analyses are achieved by using !WT weight as a qualifier to the responsevariable. An example of this is y !WT wt ∼ mu A X where y is the name of theresponse variable and wt is the name of a variate in the data containing weights.If these are relative weights (to be scaled by the units variance) then this is allthat is required. If they are absolute weights, that is, the reciprocal of knownvariances, use the !S2==1 qualifier described in Table 7.4 to fix the unit variance.When a structure is present in the residuals the weights are applied as a matrix


product. If Σ is the structure and W is the diagonal matrix constructed from thesquare root of the values of the variate weight, then R−1 = WΣ−1W . Negativeweights are treated as zeros.

6.8 Missing values

Missing values in the response

mv controls the estimation of missing values.It is sometimes computationally convenient toestimate missing values, for example, in spa-tial analysis of regular arrays, see example 3ain Section 7.3. The default action is to excludeunits with missing values, and in multivariateanalyses, the R matrix is modified to reflectthe missing value pattern.Formally, mv creates a factor with a covari-ate for each missing value. The covariates arecoded 0 except in the record where the par-ticular missing value occurs, where it is coded-1.


variety...

row 22

column 11



!f mv

1 2

11 column AR1 .424

22 row AR1 .904

Missing values in the explanatory variables

ASReml will abort the analysis if it finds missing values in the design matrix unless!MVINCLUDE or !MVREMOVE is specified, see Section 5.8. !MVINCLUDE causes themissing value to be treated as a zero. !MVREMOVE causes ASReml to discard thewhole record. Records with missing values in particular fields can be explicitlydropped using the !D transformation, Table 5.1.

Covariates: Treating missing values as zero in covariates is usually only sensibleif the covariate is centred (has mean of zero).

Design factors: Where the factor level is zero (or missing and the !MVINCLUDEqualifier is specified), no level is assigned to the factor for that record.


6.9 Some technical details about model fitting in ASReml

Sparse versus dense

ASReml partitions the terms in the linear model into two parts: a dense set and asparse set. The partition is at the !r point unless explicitly set with the !DENSEdata line qualifier or mv is included before !r, see Table 5.5. The special term mvis always included in sparse. Thus random and sparse terms are estimated usingsparse matrix methods which result in faster processing. The inverse coefficientmatrix is fully formed for the terms in the dense set. The inverse coefficientmatrix is only partially formed for terms in the sparse set. Typically, the sparseset is large resulting in savings in memory and computing. A consequence isthat the variance matrix for estimates is only available for equations in the denseportion.

Ordering of terms in ASReml

The order in which estimates for the fixed and random effects in linear mixedmodel are reported will usually differ from the order the model terms are speci-fied. Estimates are obtained by solving the corresponding mixed model equations(Gilmour et al., 1995). ASReml partitions the equations into a dense set whichtypically includes the primary fixed factors, and a sparse set which includes therest including any missing value (mv) covariates. ASReml first orders the equationsin the sparse part to maintain as much sparcity as it can during the estimation.After absorbing them, it absorbs the model terms associated with the dense equa-tions in the order specified. Within each term it absorbs the terms starting withthe highest level. In the dense equations terms should be specified in the ‘natu-ral’ order of main effects before interactions so that the ANOVA table has correctmarginalization.

Aliasing and singularities

A singularity is reported in ASReml when there is either

• a linear dependence in the design matrix and therefore no information left toestimate the corresponding effect, or

• no data for that fixed effect.

ASReml handles singularities by setting to zero and ignoring equations detectedas aliasses. Since ASReml solves from the bottom up, the first level (and hencethe last level processed) of a factor is the one that will be declared singular and


dropped from the model. The number of singularities is reported in the .asrfile immediately prior to the REML log-likelihood (LogL) line for that iteration(see Section 13.3). The singular effects can be identified in the .sln file (see alsoSection 13.3). Estimates that are zero and have zero for their standard error arethe effects reported as singular.

Singularities in the sparse fixed terms of the model may change with changesWarning

in the random terms included in the model. If this happens it will mean thatchanges in the REML log-likelihood are not valid for testing the changes made tothe random model. This situation is not easily detected as the only evidence willbe in the .sln file where different fixed effects are singular. A likelihood ratiotest is not valid if the fixed model has changed.

Examples of aliasing

The sequence of models in Table 6.3 are presented to facilitate an understandingof over-parameterised models. It is assumed that var is a factor with 4 levels,trt with 3 levels and rep with 3 levels and that all var.trt combinations arepresent in the data.

Table 6.3: Examples of aliasing in ASReml

model number of sin-gularities

order of fitting

yield ∼ var !r rep 0 rep var

yield ∼ mu var !r rep 1 rep mu var

first level of var is aliassed and set tozero

yield ∼ var trt !r rep 1 rep var trt

var fully fitted, first level of trt isaliassed and set to zero

yield ∼ mu var trt,

var.trt !r rep

8 rep mu var trt var.trt

first levels of both var and trt arealiassed and set to zero, together withsubsequent interactions

yield ∼ mu var trt !r rep,

!f var.trt

8 [ var.trt rep ] mu var trt

var.trt fitted before mu, var and trt,var.trt fully fitted; mu, var and trt

are completely singular and set to zero.The order within [ var.trt rep ] is de-termined internally.


6.10 Testing of terms

An analysis of variance table of F ratio statistics labelled F-incr is displayed.It allows approximate testing of the term adjusted for terms not in the table orhigher in the table as portrayed in Table 6.4.

No denominator degrees of freedom is supplied since the denominator degrees offreedom is not easy to deduce in a mixed model. The Error degrees of freedomis a maximum value for the denominator but sometimes a much smaller value isappropriate (Kenward and Roger, 1997).

Table 6.4: Schematic analysis of variance table from fitting the modelyield ∼ mu A B X A.B A.X B.X A.B.X !r rep

source F-incremental

mu mu | repA A | mu rep

B B | A mu rep

X X | B A mu rep

A.B A.B | X B A mu rep

A.X A.X |A.B X B A mu rep

B.X B.X | A.X A.B X B A mu rep

A.B.X A.B.X | B.X A.X A.B X B A mu rep

7 Command file: Specifying thevariance structures

Introduction

Non-singular variance matrices

Variance model specification in ASReml

A sequence of structures for the NIN data

Variance structures

General syntaxVariance header line

R structure definitionG structure header and definition lines

Variance model descriptionForming the variance models from correlation models

Additional notes of variance models

Variance structure qualifiers

Rules for combining variance models

G structures involving more than one random term

Constraining variance parametersParameter constraint within a variance model

Constraints between and within variance models

Model building using the !CONTINUE qualifier77


7.1 Introduction

The subject of this chapter is variance model specification in ASReml. ASReml

allows a wide range of models to be fitted. The key concepts you need to under-stand are

• the mixed linear model y = Xτ + Zu + e has a residual term e ∼ N(0, R)and random effects u ∼ N(0, G),

• we use the terms R structure and G structure to refer to the independentblocks of R and G respectively,

• R and G structures are typically formed as a direct product of particularvariance models,

• variance models may be correlation matrices or variance matrices with equal orunequal variances on the diagonal. A model for a correlation matrix (eg. AR1)can be converted to an equal variance form (eg. AR1V) and to a heterogeneousvariance form (eg. AR1H),

• variances are sometimes estimated as variance ratios (relative to the residualvariance).

These issues are fully discussed in Chapter 2. In this chapter we begin by con-sidering an ordered sequence of variance structures for the NIN variety trial (seeSection 7.3). This is to introduce variance modelling in practice. We then presentthe topics in detail.

Non singular variance matrices

When undertaking the REML estimation, ASReml needs to invert each variancematrix. For this it requires that the matrices be negative definite or positivedefinite. They must not be singular. Negative definite matrices will have negativediagonal elements on the diagonal of the matrix and/or its inverse. The exceptionis the XFA model which has been specifically designed to fit singular matrices(Thompson et al. 2003).

Let x′Ax represent an arbitrary quadratic form for x = (x1, . . . , xn)′. Thequadratic form is said to be nonnegative definite if x′Ax ≥ 0 for all x ∈ Rn. Ifx′Ax is nonnegative definite and in addition the null vector 0 is the only valueof x for which x′Ax = 0, then the quadratic form is said to be positive definite.Hence the matrix A is said to be positive definite if x′Ax is positive definite, seeHarville (1997), pp 211.


7.2 Variance model specification in ASReml


variety !A...

column 11

nin89.asd !skip 1


0 0 1

repl 1

repl 0 IDV 0.1

The variance models are specified in the AS-

Reml command file after the model line, asshown in the code box. In this case just onevariance model is specified (for replicates, seemodel 2b below for details). predict andtabulate lines may appear after the modelline and before the first variance structureline. These are described in Chapter 10.

Table 7.3 presents the full range of variance models available in ASReml. Theidentifiers for specifying the individual variance models in the command file aredescribed in Section 7.5 under Specifying the variance models in ASReml. Manyof the models are correlation models. However, these are generalised to homoge-neous variance models by appending V to the base identifier. They are generalisedto heterogeneous variance models by appending H to the base identifier.

7.3 A sequence of structures for the NIN data

Eight variance structures of increasing complexity are now considered for the NIN

field trial data (see Chapter 3 for an introduction to these data). This is to giveyou a feel for variance modelling in ASReml and some of the models that arepossible. Table 7.3 lists the variance models that are available in ASReml.

Before proceeding, it is useful to link this section to the algebra of Chapter 2. InSee Section 2.1

this case the mixed linear model is

y = Xτ + Zu + e

where y is the vector of yield data, τ is a vector of fixed variety effects but wouldalso include fixed replicate effects in a simple RCB analysis and might also includefixed missing value effects when spatial models are considered, u ∼ N(0, G) isa vector of random effects (for example, random replicate effects) and the errorsare in e ∼ N(0, R). The focus of this discussion is on

• changes to u and e and the assumptions about these terms,

• the impact this has on the specification of the G structures for u and the Rstructures for e.


1 Traditional randomised complete block (RCB) analysis


variety !A

id

pid

raw

repl 4...

row 22

column 11

nin89.asd !skip 1

yield ∼ mu variety repl

The only random term in a traditional RCB

analysis of these data is the (residual) errorterm e ∼ N(0, σ2

eI224). The model thereforeinvolves just one R structure and no G struc-tures (u = 0). In ASReml

• the error term is implicit in the model andis not formally specified on the model line,

• the IID variance structure (R = σ2eI224) is

the default for error.

Important The error term is always present in the model but does not need tobe formally declared when it has the default IID structure.

2a Random effects RCB analysis


variety !A

id

pid

raw

repl 4...

row 22

column 11

nin89.asd !skip 1


The random effects RCB model has 2 randomterms to indicate that the total variation inthe data is comprised of 2 components, a ran-dom replicate effect ur ∼ N(0, γrσ

2eI4) where

γr = σ2r/σ2

e , and error as in 1. This model in-volves both the original implicit IID R struc-ture and an implicit IID G structure for therandom replicates. In ASReml

• IID variance structure is the default for theextra random terms in the model.

For this reason the only change to the original command file is the !r beforerepl. Important All random terms (other than error which is implicit) must beSee Section 6.4

written after !r in the model specification line(s).


2b Random effects RCB analysis with a G structure specified


variety !A

id

pid

raw

repl 4...

row 22

column 11

nin89.asd !skip 1


0 0 1

repl 1

4 0 IDV 0.1

This model is equivalent to 2a but we explic-itly specify the G structure for repl, that is,ur ∼ N(0, γrσ

2eI4), to introduce the syntax.

The 0 0 1 line is called the variance headerSee Section 7.4

line. In general, the first two elements of thisline refer to the R structures and the third el-ement is the number of G structures. In thiscase 0 0 tells ASReml that there are no ex-plicit R structures but there is one G structure(1). The next two lines define the G structure.See page 92

The first line, a G structure header line, linksthe structure that follows to a term in the lin-ear model (repl) and indicates that it involvesone variance model (1) (a 2 would mean that the structure was the direct productof two variance models). The second line tells ASReml that the variance modelfor replicates is IDV of order 4 (σ2

rI4). The 0.1 is a starting value for γr = σ2r/σ2

e ;a starting value must be specified. Finally, the second element (0) on the lastline of the file indicates that the effects are in standard order. There is almostalways a 0 (no sorting) in this position for G structures. The following pointsshould be noted:

• the 4 on the final line could have been written as repl to give

repl 0 IDV 0.1

This would tell ASReml that the order or dimension of the IDV variance modelis equal to the number of levels in repl (4 in this case),

• when specifying G structures, the user should ensure that one scale parameteris present. ASReml does not automatically include and estimate a scale param-eter for a G structure when the explicit G structure does not include one. Forthis reason

– the model supplied when the G structure involves just one variance modelmust not be a correlation model (all diagonal elements equal 1),See Sections 2.1

and 7.5 – all but one of the models supplied when the G structure involves more thanone variance model must be correlation models; the other must be eitheran homogeneous or a heterogeneous variance model (see Section 7.5 for thedistinction between these models; see also 5 for an example),

• an initial value must be supplied for all parameters in G structure definitions.ASReml expects initial values immediately after the variance model identifier


or on the next line (0.1 directly after IDV in this case),

– 0 is ignored as an initial value on the model line,– if ASReml does not find an initial value after the identifier, it will look on

the next line,– if ASReml does not find an initial value it will stop and give an error messageSee Chapter 14

in the .asr file,

• in this case V = σ2rZrZ

′r +σ2

eI224 which is fitted as σ2e (γrZrZ

′r + I224) where

γr is a variance ratio (γr = σ2r/σ2

e) and σ2e is the scale parameter. Thus 0.1 is

a reasonable initial value for γr regardless of the scale of the data.

3a Two-dimensional spatial model with spatial correlation in one direc-tion


variety !A

id

pid

raw

repl 4...

row 22

column 11

nin89aug.asd !skip 1

yield ∼ mu variety !f mv

1 2 0

11 column ID

22 row AR1 0.3

This syntax specifies a two-dimensional spa-tial structure for error but with spatial cor-relation in the row direction only, that is,e ∼ N(0, σ2

eI11 ⊗ Σr(ρr)). The varianceheader line tells ASReml that there is one Rstructure (1) which is a direct product of twovariance models (2); there are no G structures(0). The next two lines define the two modelsSee page 90

that form the R structure. A structure defi-nition line must be specified for each model.For V = σ2

eI11 ⊗Σ(ρr), the first matrix is anidentity matrix of order 11 for columns (ID),the second matrix is a first order autoregres-sive correlation matrix of order 22 for rows (AR1) and the variance scale parameterσ2

e is implicit. Note the following:

• placing column and row in the second position on lines 1 and 2 respectivelytells ASReml to internally sort the data rows within columns before processingthe job. This is to ensure that the data matches the direct product structurespecified. If column and row were replaced with 0 in these two lines, ASReml

would assume that the data were already sorted in this order (which is nottrue in this case),

• the 0.3 on line 2 specifies a starting autoregressive row correlation of 0.3. Notethat for spatial analysis in two dimensions using a separable model, a completematrix or array of plots must be present. To achieve this we augmented thedata with the 18 records for the missing yields as shown on page 23. In the


augmented data file the yield data for the missing plots have all been madeNA (one of the missing value indicators in ASReml) and variety has beenarbitrarily coded LANCER for all of the missing plots (any of the variety namescould have been used),

• !f mv is now included in the model specification. This tells ASReml to estimatethe missing values. The !f before mv indicates that the missing values are fixedSee Chapter 13

effects in the sparse set of terms,See Sections 6.3

and 6.9 • unlike the case with G structures, ASReml automatically includes and estimatesa scale parameter for R structures (σ2

e for V = σ2e (I11 ⊗Σ(ρr)) in this case).

This is why the variance models specified for row (AR1) and column (ID) arecorrelation models. The user could specify a non-correlation model (diagonalSee Sections 2.1

and 7.5 elements 6= 1) in the R structure definition, for example, ID could be replacedby IDV to represent V = σ2

e(σ2cI11)⊗Σ(ρr). However, IDV would then need to

be followed by !S2==1 to fix σ2e at 1 and prevent ASReml trying (unsuccessfully)

to estimate two confounded parameters: the scale parameter associated withSee Section 7.7

IDV and the implicit error variance parameter, see Section 2.1 under Combining

variance models. Specifically, the code

11 column IDV 48 !S2==1

would be required in this case, where 48 is the starting value for the variances.This complexity allows for heterogeneous error variance.

3b Two-dimensional separable autoregressive spatial model


variety !A

id...

row 22

column 11



1 2 0

11 column AR1 0.3

22 row AR1 0.3

This model extends 3a by specifying a firstorder autoregressive correlation model of or-der 11 for columns (AR1). The R structurein this case is therefore the direct product oftwo autoregressive correlation matrices thatis, V = σ2

eΣc(ρc) ⊗ Σr(ρr), giving a two-dimensional first order separable autoregres-sive spatial structure for error. The startingcolumn correlation in this case is also 0.3.Again note that σ2

e is implicit.


3c Two-dimensional separable autoregressive spatial model with mea-surement error


variety !A

id...

row 22

column 11


yield ∼ mu variety !r units,

!f mv

1 2 0

11 column AR1 0.3

22 row AR1 0.3

This model extends 3b by adding a randomunits term. ThusV = σ2

e (γηI242 + Σc(ρc)⊗Σr(ρr)) . The re-served word units tells ASReml to constructan additional random term with one level foreach experimental unit so that a second (in-dependent) error term can be fitted. A unitsterm is fitted in the model in cases like this,where a variance structure is applied to theerrors. Because a G structure is not explic-itly specified here for units, the default IDVstructure is assumed. The units term is often fitted in spatial models for fieldtrial data to allow for a nugget effect.

4 Two-dimensional separable autoregressive spatial model with randomreplicate effects


variety !A

id...

row 22

column 11



!f mv

1 2 1

11 column AR1 0.3

22 row AR1 0.3

repl 1

repl 0 IDV 0.1

This is essentially a combination of 2b and 3cto demonstrate specifying an R structure anda G structure in the same model. The varianceSee Section 7.4

header line 1 2 1 indicates that there is one Rstructure (1) that involves two variance mod-els (2) and is therefore the direct product oftwo matrices, and there is one G structure (1).The R structures are defined first so the nexttwo lines are the R structure definition linesfor e, as in 3b. The last two lines are the Gstructure definition lines for repl, as in 2b. Inthis case V = σ2

e (γrI242 + Σc(ρc)⊗Σr(ρr)) .


5 Two-dimensional separable autore-gressive spatial model defined as a G structure


variety !A

id...

row 22

column 11

nin89.asd !skip 1

yield ∼ mu variety,

!r row.column

0 0 1

row.column 2

row 0 AR1V 0.3 0.1

column 0 AR1 0.3

This model is equivalent to 3c but with thespatial model defined as a G structure ratherthan an R structure. As discussed in 2b,

• all but one of the models supplied when theSee Section 7.7

G structure involves more than one variancemodel must be correlation models,

• the other model must not be a correlationmodel, that is, the other model must beeither an homogeneous or a heterogeneousvariance model, and an initial value for thescale parameter must be supplied.

For this reason, the model for rows is nowAR1V and an initial value of 0.1 has been supplied for the scale parameter. In thiscase V = σ2

e (γrcΣc(ρc)⊗Σr(ρr) + I224) . Use of row.column as a G structure isa useful approach for analysing incomplete spatial arrays. This approach willoften run faster for large trials but requires more memory.

Note that we have used the original version of the data and !f mv is omittedImportant

from this analysis since row.column is fitted as a G structure. If we had usedthe augmented data nin89aug.asd we would still omit !f mv and ASReml woulddiscard the records with missing yield.


Table 7.1: Sequence of variance structures for the NIN field trial data

ASReml syntax extra random terms residual error term

term G structure term R structure

models models

1 2 1 2

1 yield ∼ mu variety repl - - error ID -

2a yield ∼ mu variety,

!r repl

repl IDV error ID -

2b yield ∼ mu variety,

!r repl

0 0 1

repl 1

4 0 IDV 0.1

repl IDV error ID -

3a yield ∼ mu variety,

!f mv

1 2 0

11 column ID

22 row AR1 0.3

- - column.row ID AR1

3b yield ∼ mu variety,

!f mv

1 2 0

11 column AR1 0.3

22 row AR1 0.3

- - column.row AR1 AR1

3c yield ∼ mu variety,

!r units !f mv

1 2 0

11 column AR1 0.3

22 row AR1 0.3

units IDV column.row AR1 AR1

4 yield ∼ mu variety,

!r repl !f mv

1 2 1

11 column AR1 0.3

22 row AR1 0.3

repl 1

4 0 IDV 0.1

repl IDV column.row AR1 AR1

5 yield ∼ mu variety,

!r column.row

0 0 1

column.row 2

column 0 AR1 .3

row 0 AR1V 0.3 0.1

column.row AR1 AR1V error ID -


7.4 Variance structures

The previous sections have introduced variance modelling in ASReml using theNIN data for demonstration. In this and the remaining sections the syntax isdescribed formally. However, where appropriate we continue to reference theexample.

General syntax

Variance model specification in ASReml has the following general form[variance header line[R structure definition lines ][G structure header and definition lines ][variance parameter constraints ]]

• variance header line specifies the number of R and G structures,

• R structure definition lines define the R structures (variance models for error)as specified in the variance header line,

• G structure header and definition lines define the G structures (variance modelsfor the additional random terms in the model) as specified in the varianceheader line; these lines are always placed after any R structure definition lines,

• variance parameter constraints are included if parameter constraints are to beimposed, see the !VCC c qualifier in Table 5.5 and Section 7.9 on constraintsbetween and within variance structures.

A schematic outline of the variance model specification lines (variance headerline, and R and G structure definition lines) is presented in Table 7.2 using thevariance model of 4 for demonstration.


Table 7.2: Schematic outline of variance model specification in ASReml

general syntax model 4

variance header line [s [c [g]]] 1 2 1

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -R structure definition lines S 1 C 1

C 2...C c

11 column AR1 0.3

22 row AR1 0.3...

-

S 2 C 1...C c

-...

-

......

...

S s C 1...C c

-...

-

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -G structure definition lines G 1 repl 1

4 0 IDV 0.1

G 2 -

......

G g -


Variance header line


variety !A

id...

row 22

column 11



!f mv

1 2 1

22 row AR1 0.3

11 column AR1 0.3

repl 1

repl 0 IDV 0.1

The variance header line is of the form[s [c [g]]]

• s and c relate to the R structures, g is thenumber of G structures,

• the variance header line may be omittedif the default IID R structure is required,no G structures are being explicitly definedand there are no parameter constraints (see!VCC and examples 1 and 2a),

• s is used to code the number of independentsections in the error term

– if s = 0, the default IID R structure is as-sumed and no R structure definition linesare required (as in examples 2b and 5),

– if s > 0, s R structure definitions are required, one for each of the s sections(as in examples 3a, 3b, 3c and 4),

– for the analysis of multi-section data s can be replaced by the name of afactor with the appropriate number of levels, one for each section,

• c is the number of component variance models involved in the variance struc-ture for the error term for each section; for example, 3a, 3b and 3c havecolumn.row as the error term and the variance structure for column.row in-volves 2 variance models, the first for column and the second for row,

– c has a default value of 2 when s is not specified as zero,

• g is the number of variance structures (G structures) that will be explicitlyspecified for the random terms in the model.

R and G structures are now discussed with reference to s, c and g. As alreadynoted, each variance structure may involve several variance models which relateto the individual terms involved in the random effect or error. For example, atwo factor interaction may have a variance model for each of the two factorsinvolved in the interaction. Variance models are listed in Table 7.3. As indicatedSee Table 7.3

in the discussion of 2b, care must be taken with respect to scale parameters whenSee Section 7.7

combining variance models (see also Section 7.7).


R structure definition

For each of the s sections there must be c R structure definitions. Each definitionmay take several lines. Each R structure definition specifies a variance model andhas the form


variety !A...

row 22

column 11



!f mv

1 2 1

11 column AR1 0.3

22 row AR1 0.3

repl 1

repl 0 IDV 0.1

order [field model [initial values] [qualifiers][additional initial values]]

• order is either the number of levels in thecorresponding term or the name of a factorthat has the same number of levels as theterm, for example,

11 column AR1 0.5

is equivalent to

column column AR1 0.5

when column is a factor with 11 levels,

• field is the name of the data field (variateor factor) that corresponds to the term andtherefore indexes the levels of the term;

– ASReml uses this field to sort the units so they match the R structure,– in the example the data will be sorted internally rows within columns for

the analysis but the residuals will be printed in the .yht file in the originalorder (which is actually rows within columns in this case).Important It is assumed that the joint indexing of the components uniquelydefines the experimental units,

– if field is a variable, it can be plot coordinates provided the plots are in aregular grid. Thus in this example

11 lat AR1 0.322 long AR1 0.3

is valid because lat gives column position and long gives row position, andthe positions are on a regular grid. The autoregressive correlation values willstill be on an plot index basis (1, 2, 3, . . . ), not on a distance basis (10m,20m, 30m, . . . ),

– if the data is sorted appropriately for the order the models are specified, setfield to 0,

• model specifies the variance model for theterm, for example,


22 row AR1 0.3

chooses a first order autoregressive model for the row error process,

– all the variance models available in ASReml are listed in Table 7.3,– these models have associated variance parameters,– a error variance component (σ2

e for the example, see Section 7.3) is auto-matically estimated for each section,

– the default model is ID,

• initial values are initial or starting values for the variance parameters and mustbe supplied, for example,

22 row AR1 0.3

chooses an autoregressive model for the row error process (see Table 7.1) witha starting value of 0.3 for the row correlation,

• qualifiers tell ASReml to modify the variance model in some way; the qualifiersare described in Table 7.4,

• additional initial values are read from the following lines if there are not enoughinitial values on the model line. Each variance model has a certain number ofparameters. If insufficient non zero values are found on the model line ASReml

expects to find them on the following line(s),

– initial values of 0.0 will be ignored if they are on the model line but areaccepted on subsequent lines,

– the notation n*v (for example, 5 * 0.1) is permitted on subsequent lines(but not the model line) when there are n repeats of a particular initial valuev,

– only in a few specified cases is 0 permitted as an initial value of a non-zeroparameter.


G structure header and definition lines

There are g sets of G structure definition lines and each set is of the form


variety !A

id...

row 22

column 11



!f mv

1 2 1

22 row AR1 0.3

11 column AR1 0.3

repl 1

repl 0 IDV 0.1

model term dorder [key model [initial values] [qualifier][additional initial values]]order [key model [initial values] [qualifier][additional initial values]]...order [key model [initial values] [qualifier][additional initial values]]

• model term is the term from the linearmodel to which the variance structure ap-plies; the variance structure may cover ad-ditional terms in the linear model, see Sec-tion 7.8

• d is the number of variance models andhence direct product matrices involved inthe G structure; the following lines definethe d variance models,

• order is either the number of levels in theterm or the name of a factor that has thesame number of levels as the component,

• key is usually zero but for power models (EXP, GAU,. . . ) provides the distancedata needed to construct the model,

• model is the ASReml variance model identifier/acronym selected for the term,

– variance models are listed in Table 7.3,– these models have associated variance parameters,

• initial values are initial or starting values for the variance parameters, thevalues for initial values are as described above for R structure definition lines,

• qualifier tells ASReml to modify the variance model in some way; the qualifiersare described in Table 7.4.

7.5 Variance model description

Table 7.3 presents the full range of variance models, that is, correlation, homo-geneous variance and heterogeneous variance models available in ASReml. The


table contains the model identifier, a brief description, its algebraic form and thenumber of parameters. The first section defines (BASE) correlation models andin the next section we show how to extend them to form variance models. Thesecond section defines some models parameterized as variance/covariance matri-ces rather than as correlation matrices. The third section covers some specialcases where the covariance structure is known except for the scale.

Forming variance models from correlation models

The base identifiers presented in the first part of Table 7.3 are used to specify thecorrelation models. The corresponding homogeneous and heterogeneous variancemodels are specified by appending V and H to the base identifiers respectively.This convention holds for most models. However, no V or H should be appendedto the base identifiers for the heterogeneous variance models at the end of thetable (from DIAG on).

In summary, to specify

• a correlation model, provide the base identifier given in Table 7.3, for example

EXP .1

is an exponential correlation model,

• an homogeneous variance model, append a V to the base identifier and providean additional initial value for the variance, for example,

EXPV .1 .3

is an exponential variance model,

• a heterogeneous variance model, append an H to the base identifier and provideadditional initial values for the diagonal variances, for example,

CORUH .1 .3 .4 .2

is a 3-dimensional uniform correlation variance model, heterogeneous form.

Important See Section 7.7 for rules on combining variance models and importantnotes regarding initial values.

The algebraic forms of the homogeneous and heterogeneous variance models aredetermined as follows. Let C (ω×ω) = [Cij ] denote the correlation matrix fora particular correlation model. If Σ (ω×ω) is the corresponding homogeneousvariance matrix then

Σ = σ2C.


It has just one more parameter than the correlation model. For example, thehomogeneous variance model corresponding to the ID correlation model has vari-ance matrix Σ = σ2Iω (specified IDV in the ASReml command file, see below)and one parameter. The initial values for the variance parameters are listed afterthe initial values for the correlation parameters. For example, in

AR1V 0.3 0.5

0.3 is the initial spatial correlation parameter and 0.5 is the initial varianceparameter value.

Similarly, if Σ (ω×ω)h is the heterogeneous variance matrix corresponding to C,

then

Σh = DCD

where D (ω×ω) = diag (σi) . In this case there are an additional ω parameters.For example, the heterogeneous variance model corresponding to ID has variancematrix

Σh =

σ21

0 . . . 00 σ2

2. . . 0

......

. . ....

0 0 . . . σ2ω

(specified IDH in the ASReml command file, see below) and involves the ω parametersσ2

1. . . σ2

ω.

Additional notes on variance models

These additional notes are to assist in interpreting Table 7.3.

• the IDH and DIAG models fit the same diagonal variance structure,

• the CORGH and US models fit the same completely general variance structureparameterized differently,

• in CHOLk models Σ = LDL′ where L is lower triangular with ones on thediagonal, D is diagonal and k is the number of non-zero off diagonals in L,

• in ANTEk models Σ−1 = UDU ′ where U is upper triangular with ones on thediagonal, D is diagonal and k is the number of non-zero off diagonals in U ,

• the CHOLk and ANTEk models are equivalent to the US structure, that is, thefull variance structure, when k is ω − 1,


• initial values for US, CHOL and ANTE structures are given in the form of a USmatrix which is specified lower triangle row-wise, viz

σ11

σ21 σ22

σ31 σ32 σ33

,

that is, initial values are given in the order, 1 = σ11 , 2 = σ21 , 3 = σ22 , . . .

• the US model is associated with several special features of ASReml. Whenused in the R structure for multivariate data, ASReml automatically recognisespatterns of missing values in the responses (see Chapter 8). Also, there is anoption to update its values by EM rather than AI when its AI updates makethe matrix non positive definite.

• power models rely on the definition of distance for the associated term, forexample,

– the distance between time points in a one-dimensional longitudinal analysis,– the spatial distance between plot coordinates in a two-dimensional field trial

analysis.

Information for determining distances is supplied by the key argument on thestructure line.

– For one dimensional cases, key may be* the name of a data field containing the coordinate values* 0 in which case a vector of coordinates of length order must be supplied

after all R and G structure lines.

– In two directions (IEXP, IGAU, IEUC, AEXP, AGAU) the key argument in theG structure definition has either* the form rrcc where rr is the number of a data field containing the coordi-

nates for the first dimension and cc is the number of a data field containingthe coordinates for the second direction. For example, in the analysis ofspatial data, if the x coordinate was in field 3 and the y coordinate was infield 4, the second argument would be 304.

* or when used with a factor created by the fac() model term, the syntaxwould be as demonstrated in the following example...y ∼ mu ...!r fac(x,y) ......fac(x,y) 1fac(x,y) fac(x,y) EUCV .7 1.3


• FAk, FACVk and XFAk are different parameterizations of the factor analyticmodel in which Σ is modelled as Σ = ΓΓ′ + Ψ where Γ (ω×k) is a matrix ofloadings on the covariance scale and Ψ is a diagonal vector of specific variances.See Smith et al. (2001) and Thompson et al. (2003) for examples of factoranalytic models in multi-environment trials. The general limitations are

– that Ψ may not include zeros except in the XFAk formulation– constraints are required in Γ for k > 1 for identifiability. Typically, one zero

is placed in the second column, two zeros in the third column, etc.– The total number of parameters fitted (kω+ω−k(k−1)/2) may not exceed

ω(ω + 1)/2.

• in FAk models the variance-covariance matrix Σ (ω×ω) is modelled on the cor-relation scale as Σ = DCD, where

– D (ω×ω) is diagonal,– DD = diag (Σ) ,

– C (ω×ω) is a correlation matrix of the form FF ′ + E where F (ω×k) is amatrix of loadings on the correlation scale and E is diagonal and is definedby difference so that (FF ′ + E) is a correlation matrix (diagonal elementsequal 1),

– the parameters are specified in the order loadings for each factor (F ) fol-lowed by the variances (diag (Σ)); when k is greater than 1, constraints onthe elements of F are required, see Table 7.5,

• FACVk models (CV for covariance) are an alternative formulation of FA modelsin which Σ is modelled as Σ = ΓΓ′ + Ψ where Γ (ω×k) is a matrix of loadingson the covariance scale and Ψ is diagonal

– the parameters are specified in the order loadings (Γ) followed by variances(Ψ); when k is greater than 1, constraints on the elements of Γ are required,see Table 7.5,

– the parameters in FACV are related to those in FA by Γ = DF and Ψ =DED,

• XFAk (X for extended) is the third form of the factor analytic model and hasthe same parameterisation as for FACV, that is, Σ = ΓΓ′ + Ψ. However, XFAmodels

– the parameters are specified in the order diag (Ψ) and vec(Γ); when k isgreater than 1, constraints on the elements of Γ are required, see Table 7.5,

– can only be used in G structures in combination with the xfa(f,k) modelterm,


– return the factors as well as the effects.– permit some elements of Ψ to be fixed to zero,– may not be used in R structures,– are computationally faster than the FACV formulation for large problems

when k is much smaller than ω,

Special consideration is required when using the XFAk model. The SSP mustbe expanded to have room to hold the k factors. This is achieved by using thexfa(f,k) model term in place of f in the model. For example,y ∼ site !r geno.xfa(site,2)0 0 1geno.xfa(site,2) 2genoxfa(site,2) 0 XFA2

• the OWN variance structure is a facility whereby users may specify their own vari-ance structure. This facility requires the user to supply a program MYOWNGDGthat reads the current set of parameters, forms the G matrix and a full set ofderivative matrices, and writes these to disk. Before each iteration, ASReml

writes the OWN parameters to a file, runs MYOWNGDG (which it presumes formsthe G and derivative matrix) and then reads the matrices back in. An exampleof MYOWNGDG.f90 is distributed with ASReml. It duplicates the AR1 and AR2structures. The following job fits an AR2 structure using this program.

Example of using the OWN structurerepblcolblrowvariety 25yield

shf.asd !skip 1y ~ va1 210 0 AR1 .115 0 OWN2 .2 .1 !TRR

The file written by ASReml has extension .own and looks like

15 2 10.6025860D+000.1164403D+00This file was written by asreml for reading by yourprogram MYOWNGDG


asreml writes this file, runs your program and then readsshfown.gdgwhich it presumes has the following format:The first lines should agree with the top of this filespecifying the order of the matrices ( 15)

the number of variance parameters ( 2)and a control parameter you can specify ( 1).

These are written in (3I5) format. They are followed bythe list of variance parameters written in (6D13.7) format.Follow this with 3 matrices written in (6D13.7) format.These are to be each of 120 elements being lower trianglerow-wise of the G matrix and its derivatives with respectto the parameters in turn.

This file contains details about what is expected in the file written by yourprogram. The filename used has the same basename as the job you are run-ning with extension .own for the file written by ASReml and .gdg for the fileyour program writes. The type of the parameters is set with the !T qualifierdescribed below. The control parameter is set using the !F qualifier.

– !F2 applies to OWN models. With OWN, the argument of !F is passed to theMYOWNGDG program as an argument the program can access. This is themechanism that allows several OWN models to be fitted in a single run.

– !Ts is used to set the type of the parameters. It is primarily used in con-junction with the OWN structure as ASReml knows the type in other cases.The valid type codes are as follows:

code description action if !GP is set

V variance forced positiveG variance ratio forced positiveR correlation −1 < r < 1C covarianceP positive correlation 0 < r < 1L loading

This coding also affects whether the parameter is scaled by σ2 in the outputreport.


Table 7.3: Details of the variance models available in ASReml

baseidentifier

description algebraicform

number of parameters†

corr homo’s hetero’svariance variance

Correlation models

ID identity Cii = 1, Cij = 0, i 6= j 0 1 ω

AR[1] 1st

orderautoregressive

Cii = 1, Ci+1,i = φ1

Cij = φ1Ci−1,j , i > j + 1

|φ1 | < 1

1 2 1 + ω

SAR symmetricautoregressive

Cii = 1,

Ci+1,i = φ1/(1 + φ21/4)

Cij = φ1Ci−1,j − φ21/4 Ci−2,j ,

i > j + 1

|φ1 | < 1

1 2 1 + ω

AR2 2nd

orderautoregressive

Cii = 1,

Ci+1,i = φ1/(1− φ2)

Cij = φ1Ci−1,j +φ2Ci−2,j , i > j +1

|φ1 | < (1− φ2), |φ2 | < 1

2 3 2 + ω

MA[1] 1st

ordermoving average

Cii = 1,

Ci+1,i = −θ1/(1 + θ21)

Cji = 0, j > i + 2

|θ1 | < 1

1 2 1 + ω

MA2 2nd

ordermoving average

Cii = 1,

Ci+1,i = −θ1(1− θ2)/(1 + θ21 + θ2

2)

Ci+2,i = −θ2/(1 + θ21 + θ2

2)

Cji = 0, j > i + 2

θ2 ± θ1 < 1

|θ1 | < 1, |θ2 | < 1

2 3 2 + ω

ARMA autoregressivemoving average

Cii = 1,

Ci+1,i = (θ − φ)(1− θφ)/(1 +

θ2 − 2θφ)

Cji = φCj−1,i , j > i + 1

|θ| < 1, |φ| < 1

2 3 2 + ω

CORU uniformcorrelation

Cii = 1, Cij = φ, i 6= j 1 2 1 + ω


Details of the variance models available in ASReml

baseidentifier




CORB bandedcorrelation

Cii = 1

Ci+j,i = φj , 1 ≤ j ≤ ω − 1

|φj | < 1

ω − 1 ω 2ω − 1

CORG generalcorrelationCORGH = US

Cii = 1

Cij = φij , i 6= j

|φij | < 1

ω(ω−1)2

ω(ω−1)2

+1 ω(ω−1)2

+ ω

One-dimensional power models

EXP exponential Cii = 1

Cij = φ|xi−xj |, i 6= j

xi are coordinates0 < φ < 1

1 2 1 + ω

GAU gaussian Cii = 1

Cij = φ(xi−xj)2

xi are coordinates

0 < φ < 1

1 2 1 + ω

Two-dimensional power models

IEXP isotropic expo-nential

Cii = 1

Cij = φ|xi−xj |+|yi−yj |

x and y vectors of coordinates

0 < φ < 1

1 2 1 + ω

IGAU isotropic gaus-sian

Cii = 1

Cij = φ(xi−xj)2+(yi−yj)2


0 < φ < 1

1 2 1 + ω

IEUC isotropic eu-clidean

Cii = 1

Cij = φ√

(xi−xj)2+(yi−yj)2


0 < φ < 1

1 2 1 + ω



baseidentifier




AEXP anisotropic ex-ponential

Cii = 1

Cij = φ|xi−xj |1 φ

|yi−yj |2


0 < φ1 < 1, 0 < φ2 < 1

2 3 2+ω

AGAU anisotropicgaussian

Cii = 1

Cij = φ(xi−xj)2

1 φ(yi−yj)2

2


0 < φ1 < 1, 0 < φ2 < 1

2 3 2 + ω

Additional heterogeneous variance models

DIAG diagonal = IDH Σii = φi Σij = 0, i 6= j - - ω

US unstructuredgeneral covari-ance matrix

Σij = φij - -ω(ω+1)

2

OWNk user explicitlyforms V and∂V

- - k

ANTE[1]

ANTEk1

st

kth

k orderantede-pendence

1 ≤ k ≤ ω − 1

Σ−1

= UDU ′

Dii = di , Dij = 0, i 6= j

Uii = 1, Uij = uij , 1 ≤ j − i ≤ k

U ij = 0, i > j

- -ω(ω+1)

2

CHOL[1]

CHOLk1

st

kth

k ordercholesky

1 ≤ k ≤ ω − 1

Σ = LDL′

Dii = di , Dij = 0, i 6= j

Lii = 1, Lij = lij , 1 ≤ i− j ≤ k

- -ω(ω+1)

2

FA[1]

FAk1

st

kth

k orderfactoranalytic

Σ = DCD,C = FF ′ +E,F contains k correlation factorsE diagonalDD = diag (Σ)

- - ω + ωkω + ω



baseidentifier




FACV[1]

FACVk1

st

kth

k orderfactoranalyticcovarianceform

Σ = ΓΓ′ + Ψ,Γ contains covariance factorsΨ contains specific variance

- - ω + ωkω + ω

XFA[1]

XFAk1

st

kth

k orderextendedfactoranalyticcovarianceform

Σ = ΓΓ′ + Ψ,Γ contains covariance factorsΨ contains specific variance

- - ω + ωkω + ω

Inverse relationship matrices‡

AINV inverse relationship matrix derived from pedigree 0 1 -

GIV1 generalised inverse number 1 0 1 -

......

......

...

GIV6 generalised inverse number 6 0 1 -

† This is the number of values the user must supply as initial values where ω is the dimension of thematrix. The homogeneous variance form is specified by appending V to the correlation basename; theheterogeneous variance form is specified by appending H to the correlation basename‡ These must be associated with 1 variance parameter unless used in direct product with another structurewhich provides the variance.


7.6 Variance structure qualifiers

Table 7.4 describes the R and G structure line qualifiers.

Table 7.4: List of R and G structure qualifiers

qualifier action

!=s used to constrain parameters within variance structures, see Section7.9.

!GP, !GU, !GF,

!GZ

modify the updating of the variance parameters. The exact action ofthese codes in setting bounds for parameters depends on the particularmodel.

!GP (the default in most cases) attempts to keep the parameter in thetheoretical parameter space and is activated when the update of aparameter would take it outside its space. For example, if an updatewould make a variance negative, the negative value is replaced bya small positive value. Under the !GP condition, repeated attemptsto make a variance negative are detected and the value is then fixedat a small positive value. This is shown in the output in that theparameter will have the code B rather than P appended to the value inthe variance component table.

!GU (unrestricted) does not limit the updates to the parameter. Thisallows variance parameters to go negative and correlation parametersto exceed ±1. Negative variance components may lead to problems;the mixed model coefficient matrix may become non-positive definite.In this case the sequence of REML log-likelihoods may be erratic andyou may need to experiment with starting values.

!GF fixes the parameter at its starting value

!GZ only applies to FA and FACV models and fixes the parameter in anFA factor to ZERO

For multiple parameters, the form !GXXXX can be used to specify F,

P, U or Z for the parameters individually. A shorthand notation allowsa repeat count before a code letter. Thus !GPPPPPPPPPPPPPPZPPPZP

could be written as !G14PZ3PZP.

For a US model, !GP forces ASReml to attempt to keep the matrixpositive definite. After each AI update, it extracts the eigenvalues ofthe updated matrix. If any are negative or zero, the AI update isdiscarded and an EM update is performed. Notice that the EM updateis applied to all of the variance parameters in the particular US modeland cannot be applied to only a subset of them.


List of R and G variance structure definition line qualifiers

qualifier action

!S2=value sets the value of the error variance within the section: the !S2= qualifiermay be used in R structure definitions to represent the residual vari-ance associated with the particular section. There is always an errorvariance parameter associated with a section R structure. In multiplesection analyses the !S2= qualifier is used on the first of the c lines foreach section to set the initial error variance for that section. If this isnot supplied, ASReml calculates an initial value that is half the simplevariance of the data in the site. To fix the variance !S2==v is used.

!S2==value is used in a similar manner to !S2= except that the variance is fixed atthe value.

!S2==1 is often used in the analysis of multiple section data and in multivariateanalyses and when variance parameters are included in the R structuremodels.

7.7 Rules for combining variance models

As noted in Section 2.1 under Combining variance models, variance structures aresometimes formed by combining variance models. For example, a two factor in-teraction may involve two variance models, one for each of the two factors inthe interaction. Some of the rules for combining variance models differ for Rstructures and G structures. A summary of the rules is as follows:


variety !A

id...

row 22

column 11

nin89.asd !skip 1


!f mv

0 0 1

column.row 2

column 0 AR1 0.4

row 0 ARV1 0.3 0.1

• when combining variance models in both Rand G structures, the resulting direct prod-uct structure must match the ordered ef-fects with the outer factor first, for example,the G structure in the example opposite isfor column.row which tells ASReml that thedirect product structure matches the effectsordered rows within columns. This is whythe G structure definition line for column isspecified first,

• ASReml automatically includes and esti-mates an error variance parameter for eachsection of an R structure. The variance structures defined by the user shouldtherefore normally be correlation matrices. A variance model can be specified


but the !S2==1 qualifier would then be required to fix the error variance at1 and prevent ASReml trying to estimate two confounded parameters (errorvariance and the parameter corresponding to the variance model specified, see3a on page 83),

• ASReml does not have an implicit scale parameter for G structures that aredefined explicitly. For this reason the model supplied when the G structureinvolves just one variance model must be a variance model; an initial valueSee Sections 2.1

and 7.5 must be supplied for this associated scale parameter; this is discussed underadditional initial values on page 91,

• when the G structure involves more than one variance model, one must beeither a homogeneous or a heterogeneous variance model and the rest shouldbe correlation models; if more than one are non-correlation models then the!GF qualifier should be used to avoid identifiability problems, that is, ASReml

trying to estimate confounded parameters.

7.8 G structures involving more than one random term

The usual case is that a variance structure applies to a particular term in thelinear model and that there is no covariance between model terms. Sometimes itis appropriate to include a covariance. Then, it is essential that the model termsbe listed together and that the variance structure defined for the first term be thestructure required for both terms. When the terms are of different size, the termsshould be linked together with the !{ and !} qualifiers (Table 6.1). While ASReml

will check the overall size, it does not check that the order of effects matches thestructure definition so the user must be careful to get this right. Check that theterms are conformable by considering the order of the fitted effects and ensuringthe first term of the direct product corresponds to the outer factor in the nestingof the effects. Two examples are

• random regressions where we want a covariance between intercept and slope...!r animal animal.time...animal 22 0 US 3 -.5 2animal

is equivalent (though not identical because of the scaling changes) to...


!r pol(time,1).animal...pol(time,1).animal 2pol(time,1) 0 US 1 -.1 .2animal

• maternal/direct genetic covariance

lambid !Psireid !Pdamid !P...wwt ywt ∼ Trait Trait.sex !r !{ Trait.lambid at(Trait,2).damid !}...Trait.lambid 23 0 US1.3 # Var(wwt D)1.0 2.2 # Cov(wwt D,ywt D) Var(ywt D)-.1 -.2 0.8 # Cov(wwt D,wwt M) Cov(ywt D,wwt M) Var(wwt M)lambid 0 AINV # AINV explicitly requests to use A inverse

7.9 Constraining variance parameters

Parameter constraints within a variance model

Equality of parameters in a variance model can be specified using the !=s qualifierwhere s is a string of letters and/or zeros (see Table 7.4). Positions in the stringcorrespond to the parameters of the variance model:

• all parameters with the same letter in the structure are treated as the sameparameter,

• A to Z matches a to z, so that 26 equalities at most can be formed,

• all parameters corresponding to 0 in the string are unconstrained.

Examples are presented in Table 7.5.


Table 7.5: Examples of constraining variance parameters in ASReml

ASReml code action

!=ABACBA0CBA constrain all parameters correspondingto A to be equal, similarly for B and C.The 7th parameter would be left uncon-strained. This sequence applied to anunstructured 4 × 4 matrix would makeit banded, that isAB AC B A0 C B A

site.gen 2 # G header line

site 0 US .3 !=0A0AA0 !GPUPUUP

.1 .4 .1 .1 .3

gen

this example defines a structure for thegenotype by site interaction effects in aMET in which the genotypes are inde-pendent random effects within sites butare correlated across sites with equal co-variance.

site 0 FA2 !G4PZ3P4P !=00000000VVVV

4*.9 # initial values for 1st factor0 3*.1 # initial values for 2nd factor

# first fixed at 04*.2 # init values for site variances

a 2 factor Factor Analytic model for 4sites with equal variance is specified us-ing this syntax. The first loading in thesecond factor is constrained equal to 0for identifiability. P places restrictionson the magnitude of the loadings andthe variances to be positive.

xfa(site,2) 0 XFA2 !=VVVV0 !4P4PZ3P

4*.2 # initial specific variances4*1.2 # initial loadings for 1st factor0 3*.3 # initial loadings for 2nd factor

a 2 factor Factor analytic model inwhich the specific variances are allequal.

Constraints between and within variance models

More general relationships between variance parameters can be defined using the!VCCc qualifier placed on the data file definition line.

• !VCC c specifies that there are c constraint lines defining constraints to beapplied,

• the constraint lines occur after the variance header line and any R and Gstructure lines, that is, there must be a variance header line,

• each constraint is specified in a separate line in the form


P1 ∗ V1 P2 ∗ V2 ...

– Pi is the number of a parameter and Vi is a coefficient,– P1 is the primary parameter number,– ∗ indicates that the next values (Vi) is a weighting coefficient,– if the coefficient is 1 you may omit the * 1,– if the coefficient is -1 you may write −Pi instead of Pi ∗ −1,– the meaning of the coefficients is as follows P2 = V2 ∗ P1/V1 ,

– typically V1 = 1,

– a variance parameter may only be included in constraints once,

• the Pi terms refer to positions in the full variance parameter vector. Thismay change if the model is changed. The full vector includes a term for eachfactor in the model and then a term for each parameter defined in the R andG structures. A list of Pi numbers and their initial values is returned in the.asl file to help you to check the numbers. Alternatively, examine the .asrfile from an initial run with !VCC included but no arguments supplied. Thejob will terminate but ASReml will provide the Pi values associated with eachvariance component. Otherwise the numbers are given in the .res file.

The following are examples:

ASReml code action

5 7 * .1 parameter 7 is a tenth of parameter 5

5 -7 parameter 7 is the negative of parameter 5

32 34 35 37 38 39 for a (4× 4) US matrix given by parameters 31 . . . 40,the covariances are forced to be equal.


7.10 Model building using the !CONTINUE qualifier

In complex models, the Average Information algorithm can have difficulty max-imising the REML log-likelihood when starting values are not reasonably close.ASReml has several internal strategies to cope with this problem but these arenot always successful.

When the user needs to provide better starting values, one method is to fit asimpler variance model. For example, it can be difficult to guess reasonablestarting values for an unstructured variance matrix. A first step might be toassume independence and just estimate the variances. If all the variances are notpositive, there is little point proceeding to try and estimate the covariances.

The !CONTINUE qualifier instructs ASReml to retrieve variance parameters fromthe .rsv file if it exists rather than using the values in the .asr file. When readingthe .rsv file, it will take results from some matrices as supplying starting valuesfor other matrices. The transitions recognised are

DIAG to CORUHDIAG to FA1CORUH to FA1FAi to FAi+1FAi to CORGHFAi to USCORGH to US

The use of the .rsv file with !CONTINUE in this way reduces the need for the userto type in the updated starting values.

The various models may be written in various !PART s of the job and controlledby the !DOPART qualifier. When used with the -r qualifier on the command line(see Chapter 12), the output from the various parts has the partnumber appendedto the filename. In this case it would be necessary to copy the .rsv files with thenew part number before running the next part to take advantage of this facility.

8 Command file: Multivariateanalysis

Introduction

Wether trial data

Model specification

Variance structures

Specifying multivariate variance structures in ASReml

The output for a multivariate analysis

More than two response variates

110


8.1 Introduction

Multivariate analysis is used here in the narrow sense of a multivariate mixedmodel. There are many other multivariate analysis techniques which are notcovered by ASReml. Multivariate analysis is used when we are interested inestimating the correlations between distinct traits (for example, fleece weightand fibre diameter in sheep) and for repeated measures of a single trait.

Wether trial data

Orange Wether Trial 1984-8

SheepID !I

TRIAL

BloodLine !I

TEAM *

YEAR *

GFW YLD FDIAM

wether.dat !skip 1

GFW FDIAM ∼ Trait Trait.YEAR,

!r Trait.TEAM Trait.SheepID

1 2 2

1485 0 ID

Trait 0 US .2 .2 .4

Trait.TEAM 2

Trait 0 US

0.4

0.3 1.3

TEAM 0 ID

Trait.SheepID 2

Trait 0 US

0.2 0.2 2

SheepID 0 ID

predict YEAR Trait

Three important traits for the Australian woolindustry are the weight of wool grown peryear, the cleanness and the diameter of thatwool. Much of the wool is produced fromwethers and most major producers have tradi-tionally used a particular strain or bloodline.In an effort to assess the importance of blood-line differences, many wether trials have nowbeen conducted. The trial analysed here wasconducted from 1984 to 1988. It involves 35teams of wethers representing 27 bloodlines.The teams were run together for three yearsat Borenore near Orange. The file containsgreasy fleece weight (kg), yield (percentage ofclean fleece weight to greasy fleece weight) andfibre diameter (microns).

The command file wether.as for a basic bi-variate analysis of this data is presented to theright. Below is an extract of the first few linesof the data file wether.dat.


SheepID Site Bloodline Team Year GFW Yield FD

0101 3 21 1 1 5.6 74.3 18.5

0101 3 21 1 2 6.0 71.2 19.6

0101 3 21 1 3 8.0 75.7 21.5

0102 3 21 1 1 5.3 70.9 20.8

0102 3 21 1 2 5.7 66.1 20.9

0102 3 21 1 3 6.8 70.3 22.1

0103 3 21 1 1 5.0 80.7 18.9

0103 3 21 1 2 5.5 75.5 19.9

0103 3 21 1 3 7.0 76.6 21.9...

4013 3 43 35 1 7.9 75.9 22.6

4013 3 43 35 2 7.8 70.3 23.9

4013 3 43 35 3 9.0 76.2 25.4

4014 3 43 35 1 8.3 66.5 22.2

4014 3 43 35 2 7.8 63.9 23.3

4014 3 43 35 3 9.9 69.8 25.5

4015 3 43 35 1 6.9 75.1 20.0

4015 3 43 35 2 7.6 71.2 20.3

4015 3 43 35 3 8.5 78.1 21.7

8.2 Model specification

The syntax for specifying a multivariate linear model in ASReml is

Y-variates ∼ fixed [!r random ] [!f sparse fixed ]

• Y-variates is a list of traits,

• fixed, random and sparse fixed are as in the univariate case (see Chapter 6) butinvolve the special term Trait and interactions with Trait.

The design matrix for Trait is has a level (column) for each trait.

– Trait by itself fits the mean for each variate,– In an interaction Trait.Fac fits the factor Fac for each variate andTrait.Cov fits the covariate Cov for each variate.

ASReml internally rearranges the data so that n data records containing t traitseach becomes n sets of t analysis records indexed by the internal factor Trait i.e.nt analysis records ordered Trait within data record. If the data is already inthis long form, use the !ASMV t qualifier to indicate that a multivariate analysisis required.


8.3 Variance structures

Using the notation of Chapter 7, consider a multivariate analysis with t traits andn units in which the data are ordered traits within units. An algebraic expressionfor the variance matrix in this case is

In ⊗Σ

where Σ (t×t) is an unstructured variance matrix.

Specifying multivariate variance structures in ASReml

Orange Wether Trial 1984-8

SheepID !I

TRIAL

BloodLine !I

TEAM *

YEAR *

GFW YLD FDIAM

wether.dat !skip 1

GFW FDIAM ∼ Trait Trait.YEAR,

!r Trait.TEAM Trait.SheepID

1 2 2

1485 0 ID

Trait 0 US

3*0

Trait.TEAM 2

Trait 0 US

3*0

TEAM 0 ID

Trait.SheepID 2

Trait 0 US

3*0

SheepID 0 ID

predict YEAR Trait

A more sophisticated error structure is re-quired for multivariate analysis. For a stan-dard multivariate analysis

• the error structure for the residual must bespecified as two-dimensional with indepen-dent records and an unstructured variancematrix across traits; records may have ob-servations missing in different patterns andthese are handled internally during analy-sis,

• the R structure must be ordered traitswithin units, that is, the R structure defini-tion line for units must be specified beforethe line for Trait,

• variance parameters are variances not vari-ance ratios,

• the R structure definition line for units,that is, 1485 0 ID, could be replaced by0 or 0 0 ID; this tells ASReml to fill in thenumber of units and is a useful option when the exact number of units in thedata is not known to the user,

• the error variance matrix is specified by the model Trait 0 US

– the initial values are for the lower triangle of the (symmetric) matrix speci-fied row-wise,

– finding reasonable initial values can be a problem. If initial values are writtenon the next line in the form q * 0 where q is t(t + 1)/2 and t is the numberof traits, ASReml will take half of the phenotypic variance matrix of the dataas an initial value, see .as file in code box for example,


• the special qualifiers relating to multivariate analysis are !ASUV and !ASMV t,see Table 5.4 for detail

– to use an error structure other than US for the residual stratum you mustalso specify !ASUV (see Table 5.4) and include mv in the model if there aremissing values,

– to perform a multivariate analysis when the data have already been ex-panded use !ASMV t (see Table 5.4)

– t is the number of traits that ASReml should expect,– the data file must have t records for each multivariate record althoughsome may be coded missing.

8.4 The output for a multivariate analysis

Below is the output returned in the .asr file for this analysis.

ASReml 1.0 [31 Aug 2002] Orange Wether Trial 1984-88

16 Sep 2002 16:32:50.370 64.00 Mbyte MSWindow wether

* ASReml 2002 *********************************************

* Contact [email protected] for licensing and support *


***************************************************** ARG *

Folder: C:\asr\ex\manex

BloodLine !I

QUALIFIERS: !SKIP 1

Reading wether.dat FREE FORMAT skipping 1 lines

Bivariate analysis of GFW and FDIAMtraits



1 SheepID 521 Factor 1 1 261.0956 521 0 0

2 TRIAL 2 3.000 3.000 3.000 0 0

3 BloodLine 27 Factor 3 1 13.4323 27 0 0

4 TEAM 35 Factor 4 1 18.0067 35 0 0

5 YEAR 3 Factor 5 1 2.0391 3 0 0

6 GFW 1 Variate 6 4.100 7.478 11.20 0 0

7 YLD 7 60.30 75.11 88.60 0 0

8 FDIAM 1 Variate 8 15.90 22.29 30.60 0 0

9 Trait 2 Traits/Variat

10 Trait.YEAR 6 Interaction 9 Trait : 2 5 YEAR : 3

11 Trait.TEAM 70 Interaction 9 Trait : 2 4 TEAM : 35


12 Trait.SheepID1042 Interaction 9 Trait : 2 1 SheepID : 521

1485 identity

2 UnStructure 0.2000 0.2000 0.4000

2970 records assumed sorted 2 within 1485

2 UnStructure 0.4000 0.3000 1.3000

35 identity

Structure for Trait.TEAM has 70 levels defined

2 UnStructure 0.2000 0.2000 2.0000

521 identity

Structure for Trait.SheepID has 1042 levels defined

Forming 1120 equations: 8 dense.



1 LogL=-886.521 S2= 1.0000 2964 dfconvergence

2 LogL=-818.508 S2= 1.0000 2964 df

3 LogL=-755.911 S2= 1.0000 2964 df

4 LogL=-725.374 S2= 1.0000 2964 df

5 LogL=-723.475 S2= 1.0000 2964 df

6 LogL=-723.462 S2= 1.0000 2964 df

7 LogL=-723.462 S2= 1.0000 2964 df

8 LogL=-723.462 S2= 1.0000 2964 df

Source Model terms Gamma Component Comp/SE % C

Residual UnStruct 1 0.198351 0.198351 21.94 0 Ufinal

Residual UnStruct 1 0.128890 0.128890 12.40 0 Ucomponents

Residual UnStruct 2 0.440601 0.440601 21.93 0 U

Trait.TEAM UnStruct 1 0.374493 0.374493 3.89 0 U



Trait.SheepID UnStruct 1 0.257159 0.257159 12.09 0 U



Covariance/Variance/Correlation Matrix UnStructuredResidual

0.1984 0.43600.4360 is the

0.1289 0.4406correlation

Covariance/Variance/Correlation Matrix UnStructuredTrait.TEAM

0.3745 0.5436

0.3887 1.365

Covariance/Variance/Correlation Matrix UnStructuredTrait.SheepID

0.2572 0.3124

0.2196 1.921


Analysis of Variance DF F-incr

9 Trait 2 5936.27

10 Trait.YEAR 4 1096.41

NOTICE: ASReml cannot yet determine the correct denominator DFs for




10 Trait.YEAR

2 -0.102262 0.290190E-01 -3.52

3 1.06636 0.290831E-01 36.67 42.07

5 1.17407 0.433905E-01 27.06

6 2.53439 0.434880E-01 58.28 32.85

9 Trait

7 7.13717 0.107933 66.13

8 21.0569 0.209095 100.71 78.16

11 Trait.TEAM 70 effects fitted

12 Trait.SheepID 1042 effects fitted

SLOPES FOR LOG(ABS(RES)) on LOG(PV) for Section 1

1.00 1.54

17 possible outliers: see .res file

Finished: 16 Sep 2002 16:32:54.710 LogL Converged

8.5 More than two response variates

Wolfinger rat data

treat !A

wt0 wt1 wt2 wt3 wt4

rat.dat

wt0 wt1 wt2 wt3 wt4 ∼ Trait,

treat Trait.treat

1 2 0

27 0 ID #error variance

Trait 0 US

15 * 0

Wolfinger (1996) summarises a range of vari-ance structures that can be fitted to repeatedmeasures data and demonstrates the modelsusing five weights taken weekly on 27 ratssubjected to 3 treatments. The command filefor this analysis is presented to demonstrate amultivariate analysis involving more than tworesponse variables (five in this case). Notethat the two dimensional structure for com-mon error meets the requirement of independent units and is correctly orderedtraits with units. Note also the use of 15 * 0 for the error variance matrix (15= 5(5+1)/2, see Section 8.3).

9 Command file: Genetic analysis

Introduction

The command file

The pedigree file

Reading in the pedigree file

Genetic groups

GIV files

The example

Multivariate genetic analysis

117


9.1 Introduction

In an ‘animal model’ or ‘sire model’ genetic analysis we have data on a set ofanimals that are genetically linked via a pedigree. The genetic effects are there-fore correlated and, assuming normal modes of inheritance, the correlation ex-pected from additive genetic effects can be derived from the pedigree providedall the genetic links are in the pedigree. The additive genetic relationship matrix(sometimes called the numerator relationship matrix) can be calculated from thepedigree. It is actually the inverse relationship matrix that is formed by ASReml

for analysis.

For the more general situation where the pedigree based inverse relationshipmatrix is not the appropriate/required matrix, the user can provide a particulargeneral inverse variance (GIV) matrix explicitly in a .giv file.

In this chapter we consider data presented in Harvey (1977) using the commandfile harvey.as.

9.2 The command file

Pedigree file example

animal !P

sire !A

dam

lines 2

damage

adailygain

harvey.ped !ALPHA

harvey.dat

adailygain mu lines, !r

animal 0.25

In ASReml the !P data field qualifier indicatesthat the corresponding data field has an asso-ciated pedigree. The file containing the pedi-gree (harvey.ped in the example) for animalis specified immediately prior to the data filedefinition. See below for the first 20 lines ofharvey.ped together with the correspondinglines of the data file harvey.dat. All individ-uals appearing in the data file must appear inthe pedigree file. When all the pedigree infor-mation (individual, male parent, female parent) appears as the first three fieldsof the data file, the data file can double as the pedigree file. In this examplethe line harvey.ped !ALPHA could be replaced with harvey.dat !ALPHA. Typi-cally additional individuals providing additional genetic links are present in thepedigree file.


9.3 The pedigree file

The pedigree file is used to define the genetic relationships for fitting a geneticanimal model and is required if the !P qualifier is associated with a data field.The pedigree file

• has three fields; the identity of an individual (name or tag), its sire and itsdam (or maternal grand sire, defined with the !MGS qualifier, Table 9.1); theidentity for the individual must come first in the file,

• is sorted so that the line giving the pedigree of an individual appears beforeany line where that individual appears as a parent,

• is read free format; it may be the same file as the data file if the data file isfree format and has the necessary identities in the first three fields, see below,

• is specified on the line immediately preceding the data file line in the commandfile,

• use identity 0 or * for unknown parents.

harvey.ped harvey.dat

101 SIRE 1 0

102 SIRE 1 0

103 SIRE 1 0

104 SIRE 1 0

105 SIRE 1 0

106 SIRE 1 0

107 SIRE 1 0

108 SIRE 1 0

109 SIRE 2 0

110 SIRE 2 0

111 SIRE 2 0

112 SIRE 2 0

113 SIRE 2 0

114 SIRE 2 0

115 SIRE 2 0

116 SIRE 2 0

117 SIRE 3 0

118 SIRE 3 0

119 SIRE 3 0

120 SIRE 3 0...

101 SIRE 1 0 1 3 192 390 2241

102 SIRE 1 0 1 3 154 403 2651

103 SIRE 1 0 1 4 185 432 2411

104 SIRE 1 0 1 4 183 457 2251

105 SIRE 1 0 1 5 186 483 2581

106 SIRE 1 0 1 5 177 469 2671

107 SIRE 1 0 1 5 177 428 2711

108 SIRE 1 0 1 5 163 439 2471

109 SIRE 2 0 1 4 188 439 2292

110 SIRE 2 0 1 4 178 407 2262

111 SIRE 2 0 1 5 198 498 1972

112 SIRE 2 0 1 5 193 459 2142

113 SIRE 2 0 1 5 186 459 2442

114 SIRE 2 0 1 5 175 375 2522

115 SIRE 2 0 1 5 171 382 1722

116 SIRE 2 0 1 5 168 417 2752

117 SIRE 3 0 1 3 154 389 2383

118 SIRE 3 0 1 4 184 414 2463

119 SIRE 3 0 1 5 174 483 2293

120 SIRE 3 0 1 5 170 430 2303...


9.4 Reading in the pedigree file

The syntax for specifying a pedigree file in the ASReml command file is

pedigree file [qualifiers]

• the qualifiers are listed in Table 9.1,

• the identities (individual, male parent, female parent) are merged into a singlelist and the inverse relationship is formed before the data file is read,

• when the data file is read, data fields with the !P qualifier are recoded accordingto the combined identity list,

• the inverse relationship matrix is automatically associated with factors codedfrom the pedigree file unless some other covariance structure is specified. Theinverse relationship matrix is specified with the variance model name AINV,

• the inverse relationship matrix is written to ainverse.bin,

– if ainverse.bin already exists ASReml assumes it was formed in a previousrun and has the correct inverse

– ainverse.bin is read, rather than the inverse being reformed (unless !MAKEis specified); this saves time when performing repeated analyses based on aparticular pedigree,

– delete ainverse.bin or specify !MAKE if the pedigree is changed betweenruns,

• identities are printed in the .sln file,

– identities should be whole numbers less than 200,000,000 unless ALPHA isspecified,

– pedigree lines for parents must precede their progeny,– unknown parents should be given the identity number 0,– if an individual appearing as a parent does not appear in the first column, it

is assumed to have unknown parents, that is, parents with unknown parent-age do not need their own line in the file,

– identities may appear as both male and female parents, for example, inforestry.


9.5 Genetic groups

If all individuals belong to one genetic group, then use 0 as the identity of theparents of base individuals. However, if base individuals belong to various geneticgroups this is indicated by the !GROUP qualifier and the pedigree file must beginby identifying these groups. All base individuals should have group identifiers asparents. In this case the identity 0 will only appear on the group identity lines,as in the following genetic group example.

Genetic group example

animal !P

sire 9 !A

dam

lines 2

damage

adailygain

harveyg.ped !ALPHA !MAKE !GROUP 3

harvey.dat

adailygain ∼ mu

!r animal 02.5 !GU

G1 0 0

G2 0 0

G3 0 0

SIRE 1 G1 G1

SIRE 2 G1 G1

SIRE 3 G1 G1

SIRE 4 G2 G2

SIRE 5 G2 G2

SIRE 6 G3 G3

SIRE 7 G3 G3

SIRE 8 G3 G3

SIRE 9 G3 G3

101 SIRE 1 G1

102 SIRE 1 G1

103 SIRE 1 G1...

163 SIRE 9 G3

164 SIRE 9 G3

165 SIRE 9 G3

Note that the sires in this example were from three lines. The same model canbe fitted treating these as genetic groups, see Chapter 15.

It is usually appropriate to allocate a genetic group identifier where the parentImportant

is unknown.

Table 9.1: List of pedigree file qualifiers

qualifier description

!ALPHA indicates that the identities are alphanumeric with up to 20 characters;otherwise by default they are numeric whole numbers < 200,000,000.

!GIV instructs ASReml to write out the A-inverse in the format of .giv files.


List of pedigree file qualifiers

qualifier description

!GROUPS g includes genetic groups in the pedigree. The first g lines of the pedigreeidentify genetic groups (with zero in both the sire and dam fields). Allother lines must specify one of the genetic groups as sire or dam if theactual parent is unknown.

!MAKE tells ASReml to make the A-inverse (rather than trying to retrieve it fromthe ainverse.bin file).

!MGS indicates that the third identity is the sire of the dam rather than the dam;a dam identity is created and interposed into the pedigree so that a pedigreelineanimal sire mgs

is interpreted asdam mgs 0

animal sire dam

with a new dam being constructed for each such animal.

!REPEAT tells ASReml to ignore repeat occurrences of lines in the pedigree file.Warning Use of this option will avoid the check that animals occur inchronological order, but chronological order is still required.

!SKIP n allows you to skip n header lines at the top of the file.


9.6 Reading a user defined inverse relationship matrix

Sometimes an inverse relationship matrix is required other than the one ASReml

can produce from the pedigree file. We call this a GIV (G inverse) matrix. Theuser can prepare a .giv file containing this matrix and use it in the analysis. Thesyntax for specifying a G inverse file named name.giv is

name.giv [!SKIP n ]

1 1 1

2 2 1

3 3 1

4 4 1

5 5 1.0666667

6 5 -0.2666667

6 6 1.0666667

7 7 1.0666667

8 7 -0.2666667

8 8 1.0666667

9 9 1.0666667

10 9 -0.2666667

10 10 1.0666667

11 11 1.0666667

12 11 -0.2666667

12 12 1.0666667

• the named file must have a .giv extension,

• the G inverse files must be specified on theline(s) immediately prior to the data file lineafter any pedigree file,

• up to 6 G inverse matrices may be defined,

• the file must be free format with three num-bers per line, namely

row column value

defining the lower triangle row-wise of thematrix,

• the file must be sorted column within row,

• every diagonal element must be repre-sented; missing off-diagonal elements are assumed to be zero cells,

• the file is used by associating it with a factor in the model. The number andorder of the rows must agree with the size and order of the associated factor,

• the !SKIP n qualifier tells ASReml to skip n header lines in the file.

The .giv file presented in the code box gives the following G inverse matrix

I4 0

0 I4 ⊗[

1.067 −0.267−0.267 1.067

]

The .giv file can be associated with a factor in two ways:

• the first is to declare a G structure for the model term and to refer to the .givfile with the corresponding identifier GIV1, GIV2, GIV3, GIV4, GIV5, GIV6; forexample,


animal 1animal 0 GIV1 0.12

for a one-dimensional structure put the scale pa-rameter (0.12 in this case) after the GIVg identifier,

site.variety 2site 0 CORUH 0.58*1.5variety 0 GIV1

for a two-dimensional structure.

• the second is for one-dimensional structures; in this case the .giv structurecan be directly associated with the term using the giv(f,i) model functionwhich associates the ith .giv file with factor f, for example,

giv(animal,1) 0.12

is equivalent to the first of the preceding examples.

The example continued

Below is an extension of harvey.as to include a .giv file. A portion of harvey.givis presented to the right. In this case the G inverse matrix is simply an identitymatrix of order 74 scaled by 0.5, that is, 0.5I74 . This model is simply an examplewhich is easy to verify. Note that harvey.giv is specified on the line immediatelypreceding harvey.dat.

Model term specification associating the harvey.giv structure to the coding ofsire takes precedence over the relationship matrix structure implied by the !Pqualifier for sire. In this case, the !P is being used to amalgamate animals andsires into a single list.

command file .giv file

GIV file example

animal !P

sire !P

dam

lines 2

damage

adailygain

harvey.ped !ALPHA

harvey.giv # giv structure file

harvey.dat

adailygain ∼ mu line, !r giv(sire,1) .25

01 01 .5

02 02 .5

03 03 .5

04 04 .5

05 05 .5

.

.

.

72 72 .5

73 73 .5

74 74 .5

10 Tabulation of the data andprediction from the model

Introduction

Tabulation

Prediction

Examples

125


10.1 Introduction

Tabulation is the process of forming simple tables of averages and counts fromthe data. Such tables are useful for looking at the structure of the data andnumbers of observations associated with factor combinations. Multiple tabulatedirectives may be specified in a job.

Prediction is the process of forming a linear function of the vector of fixed andrandom effects in the linear model to obtain an estimated or predicted value for aquantity of interest. It is primarily used for predicting tables of adjusted means.If a table is based on a subset of the explanatory variables then the other variablesneed to be accounted for. It is usual to form a predicted value either at specifiedvalues of the remaining variables, or averaging over them in some way.

This chapter describes the tabulate directive and the predict directive intro-duced in Section 3.4 under Prediction. However, we first describe the process andexplain some of the concepts.

Our approach to prediction is a generalisation of that of Lane and Nelder (1982)who consider fixed effects models. They form fitted values for all combinationsof the explanatory variables in the model, then take marginal means across theexplanatory variables not relevent to the current prediction. Our case is moregeneral in that random effects can be fitted in our (mixed) models. A full de-scription can be found in Gilmour et al. (2002) and Welham et al. (2003).

Random factor terms may contribute to predictions in several ways. They maybe evaluated at a values specified by the user, they may be averaged over, or theymay be omitted from the fitted values used to form the prediction. Averagingover the set of random effects gives a prediction specific to the random effectsobserved. We describe this as a ‘conditional’ prediction. Omitting the termfrom the model produces a prediction at the population average (zero), that is,substituting the assumed population mean for an unknown random effect. Wecall this a ‘marginal’ prediction. Note that in any prediction, some terms may beevaluated as conditional and others at marginal values, depending on the aim ofprediction.

For fixed factors there is no pre-defined population average, so there is no naturalinterpretation for a prediction derived by omitting a fixed term from the fittedvalues. Averages must therefore be taken over all the levels present to give asample specific average, or value(s) must be specified by the user.


For covariate terms (fixed or random) the associated effect represents the coef-ficient of a linear trend in the data with respect to the covariate values. Theseterms should be evaluated at a given value of the covariate, or averaged over sev-eral given values. Omission of a covariate from the predictive model is equivalentto predicting at a zero covariate value, which is often inappropriate.

Interaction terms constructed from factors generate an effect for each combinationof the factor levels, and behave like single factor terms in prediction. Interactionsconstructed from covariates fit a linear trend for the product of the covariatevalues and behave like a single covariate term. An interaction of a factor and acovariate fits a linear trend for the covariate for each level of the factor. For bothfixed and random terms, a value for the covariate must be given, but the factorlevels may be evaluated at a given level, averaged over or (for random terms only)omitted.

Before considering some examples in detail, it is useful to consider the conceptualsteps involved in the prediction process. Given the explanatory variables used todefine the linear (mixed) model, the four main steps are

(a) Choose the explanatory variable(s) and their respective value(s) for whichpredictions are required; the variables involved will be referred to as the classifyset and together define the multiway table to be predicted.

(b) Determine which variables should be averaged over to form predictions. Thevalues to be averaged over must also be defined for each variable; the variablesinvolved will be referred to as the averaging set. The combination of the classifyset with these averaging variables defines a multiway hyper-table. Note thatvariables evaluated at only one value, for example, a covariate at its mean value,can be formally introduced as part of the classifying or averaging set.

(c) Determine which terms from the linear mixed model are to be used in formingpredictions for each cell in the multiway hyper-table in order to give appropriateconditional or marginal prediction.

(d) Choose the weights to be used when averaging cells in the hyper-table toproduce the multiway table to be reported.

Note that after steps (a) and (b) there may be some explanatory variables in thefitted model that do not classify the hyper-table. These variables occur in termsthat are ignored when forming the predicted values. It was deduced above that


fixed terms could not sensibly be ignored in forming predictions, so that variablesshould only be omitted from the hyper-table when they only appear in randomterms. Whether terms derived from these variables should be used when formingpredictions depends on the application and aim of prediction.

The main difference in this prediction process compared to that described by Laneand Nelder (1982) is the choice of whether to include or exclude model terms whenforming predictions. In linear models, since all terms are fixed, terms not in theclassify set must be in the averaging set.

10.2 Tabulation

A tabulate directive is provided to enable simple summaries of the data to beformed for the purpose of checking the structure of the data. Multiple tabulatestatements are permitted and they must appear immediately after the linearmodel. The tabulate statement has the form

tabulate response variables [!Wt weight !count !filter filter !select value]∼ factors

• tabulate is the directive name and must begin in column 1,

• response variables is a list of variates for which means are required,

• !Wt weight nominates a variable containing weights,

• !count requests counts as well as means to be reported,

• !filter filter nominates a factor for selecting a portion of the data,

• !select value indicates that only records with value in the filter column areto be included,

• ∼ factors identifies the factors to be used for classifying the data. Only factors(not covariates) may be nominated and no more than six may be nominated.

ASReml prints the multiway table of means omitting empty cells to a file withextension .tab.


10.3 Prediction


variety !A...

column 11

nin89.asd !skip 1


predict variety

0 0 1

repl 1

repl 0 IDV 0.1

The first step is to specify the classify set ofexplanatory variables after the predict direc-tive. The predict statement(s) may appearimmediately after the model line (before orafter any tabulate statements) or after the Rand G structure lines. The syntax is

predict factors [qualifiers]

• predict must be the first element of thepredict statement, commencing in column 1 in upper or lower case,

• factors is a list of the variables defining a multiway table to be predicted; eachvariable may be followed by a list of specific values at which the prediction isto occur

• the qualifiers instruct ASReml to modify the predictions in some way, see Table10.1 for a list of the prediction qualifiers,

• several predict statements may be specified.

ASReml parses the predict statement before fitting the model. If any syntaxproblems are encountered, these are reported in the .pvs file after which thestatement is ignored: the job is completed as if the erroneous prediction statementdid not exist. The predictions are formed as an extra process in the final iterationand are reported to the .pvs file. Consequently, aborting a run by creating theABORTASR.NOW file will cause any predict statements to be ignored.

By default, factors are predicted at each level, simple covariates are predictedat their overall mean and covariates used as a basis for splines or orthogonalpolynomials are predicted at their design points. Model terms mv and units arealways ignored.

Prediction at particular values of a covariate or particular levels of a factor isachieved by listing the values after the variate/factor name. Where there is asequence of values, use the notation a b ... n to represent the sequence of valuesfrom a to n with step size b − a. The default stepsize is 1 (in which case b maybe omitted). A colon (:) may replace the ellipsis (...). An increasing sequenceis assumed. When giving particular values for factors, use the coded level (1:n)rather than the label (alphabetical or integer).


The second step is to specify the averaging set. The default averaging set isthose explanatory variables involved in fixed effect model terms that are not inthe classifying set. By default variables that only define random model terms areignored. The qualifier !AVERAGE allows these variables to be added to the defaultaveraging set.

The third step is to determine the linear model terms to use in prediction. Thedefault rule is that all model terms based entirely on the classifying and averagingset are used. Two qualifiers allow this default set of model terms to be modifiedby adding (!USE) or removing (!IGNORE) terms. The qualifier !ONLYUSE explicitlyspecifies the model terms to use, ignoring all others. The qualifier !EXCEPT ex-plicitly specifies the model terms not to use, including all others. These qualifiersmay implicitly modify the averaging set by including variables defining terms inthe predicted model not in the classify set. It is sometimes easier to specify theclassify set and the prediction linear model and allow ASReml to construct theaveraging set.

The fourth step is to choose the weighting for forming means over dimensionsin the hyper-table. The default is to average over the specified levels but thequalifier !AVERAGE factor weights can be used to specify weights to be used inaveraging over a factor.

There are often situations in which the fixed effects design matrix X is not of fullcolumn rank. These can be classified according to the cause of aliasing.

1. linear dependencies among the model terms due to over-parameterisation ofthe model,

2. no data present for some factor combinations so that the corresponding effectscannot be estimated,

3. linear dependencies due to other, usually unexpected, structure in the data.

The first type of aliasing is imposed by the parameterisation chosen and canbe determined from the model. The second type of aliasing can be detectedwhen setting up the design matrix for parameter estimation (which may requirerevision of imposed constraints). The third type can then be detected duringthe absorption of the mixed model equations. Dependencies (aliasing) can bedealt with in several ways and ASReml checks that predictions are of estimablefunctions in the sense defined by Searle (1971, p160) and are invariant to the


constraint method used.

Normally ASReml does not print predictions of non-estimable functions but the!PRINTALL qualifier instructs ASReml to print them.

If the data has missing combinations of factors there is the possibility that thecorresponding cells of the hyper-table are not estimable and hence margins of thetable are not estimable. In some cases it is useful to construct margins of tablesbased only on those cells that have data present. The qualifier !PRESENT doesthis.

The predict statement may be continued on subsequent lines by terminating thecurrent line with a comma.

Table 10.1 is a list of the prediction qualifiers with the following syntax:

• f is an explanatory variable which is a factor,

• t is a list of terms in the fitted model,

• v is a list of explanatory variables.

Table 10.1: List of prediction qualifiers

qualifier action

Controlling formation of tables

!AVERAGE f [weights] is used to formally include a variable in the averaging set andto explicitly set the weights for averaging. Variables that onlyappear in random model terms are not included in the averagingset unless included with this qualifier. The default for weightsis equal weights. The list weights can have syntax like {3*1 0

2*1}/5 to represent the sequence 0.2 0.2 0.2 0 0.2 0.2. Thestring inside the curly brace is expanded first and the expressionn*v means n occurrences of v. A separate !AVERAGE qualifieris required for each variable requiring explicit weights or to beadded to the default averaging set.

!PARALLEL [v] without arguments means all classify variables are expanded inparallel. Otherwise up to four variables from the classify set maybe specified.


List of prediction qualifiers

qualifier action

!PRESENT v is used when averaging is to be based only on cells with data. vis a list of variables and may include variables in the classify set.v may not include variables with an explicit !AVERAGE qualifier.ASReml works out what combinations are present from the designmatrix W. At present, the model must contain the factor as asimple main effect.

Controlling inclusion of model terms

!EXCEPT t causes the prediction to include all fitted model terms not in t.

!IGNORE t causes ASReml to set up a prediction model based on the defaultrules and then removes the terms in t. This might be used to omitthe spline Lack of fit term (!IGNORE fac(x)) from predictions asin

yield ∼ mu x variety !r spl(x) fac(x)

predict x !IGNORE fac(x)

which would predict points on the spline curve averaging overvariety.

!ONLYUSE t causes the prediction to include only model terms in t. It can beused for example to form a table of slopes as in

HI ∼ mu X variety X.variety

predict variety X 1 !onlyuse X X.variety

!USE t causes ASReml to set up a prediction model based on the defaultrules and then adds the terms listed in t.

Printing

!PLOT [x] instructs ASReml to attempt a plot of the predicted values. Thisqualifier is only applicable in versions of ASReml linked with theInteracter Graphics library. If there is no argument, ASReml usesthe variable with the largest number of discrete values as the xvariable in the graph. It draws a line for each combination of theother variables.

!PRINTALL instructs ASReml to print the predicted value, even if it is notof an estimable function. By default, ASReml only prints predic-tions that are of estimable functions.

!SED is specified to request all standard errors of difference be printed.Normally only an average value is printed.

!VPV is specified to request that the variance matrix of predicted valuesbe printed to the .pvs file.


Examples

Examples are as follows:

yield ∼ mu variety !r replpredict variety

is used to predict variety means in the NIN field trial analysis. Random repl isignored in the prediction.

yield ∼ mu x variety !r replpredict variety

predicts variety means at the average of x ignoring random repl.

yield ∼ mu x variety replpredict variety x 2

forms the hyper-table based on variety and repl at the covariate value of 2 andthen averages across repl to produce variety predictions.

GFW Fdiam ∼ Trait Trait.Year !r Trait.Teampredict Trait Team

forms the hyper-table for each trait based on Year and Team with each linearcombination in each cell of the hyper-table for each trait using Team and Yeareffects. Team predictions are produced by averaging over years.

yield ∼ variety !r site.varietypredict variety

will ignore the site.variety term in forming the predictions while

predict variety !AVERAGE site

forms the hyper-table based on site and variety with each linear combinationin each cell using variety and site.variety effects and then forms averagesacross sites to produce variety predictions.

11 Functions of variancecomponents

Introduction

Syntax

Linear combinations of componentsHeritabilityCorrelation

A more detailed example

134


11.1 Introduction

F phenvar 1 + 2 # pheno var

F genvar 1 * 4 # geno var

H herit 4 3 # heritability

ASReml includes a post-analysis procedure tocalculate functions of variance components.Its intended use is when the variance compo-nents are either simple variances or are vari-ances and covariances in an unstructured matrix. The functions covered are linearcombinations of the variance components (for example, phenotypic variance), aratio of two components (for example, heritabilities) and the correlation basedon three components (for example, genetic correlation). The user must preparea .pin file. A simple sample .pin file is shown in the ASReml code box above.The .pin file specifies the functions to be calculated. The user re-runs ASReml

with the -P command line option specifying the .pin file as the input file. AS-

Reml reads the model information from the .asr and .vvp files and calculatesthe requested functions. These are reported in the .pvc file.

11.2 Syntax

Functions of the variance components are specified in the .pin file in lines of theformletter label coefficients

• letter is either F, H or R as follows

– F is for linear combinations of variance components,– H is for forming the ratio of two components,– R is for forming the correlation based on three components,

• label names the result,

• coefficients is the list of coefficients for the linear function.

Linear combinations of components




First ASReml extracts the variance compo-nents from the .asr file and their variancematrix from the .vvp file. Each linear func-tion formed by an F line is added to the list ofcomponents. Thus, the number of coefficients increases by one each line. We seekto calculate k + c′v, cov (c′v, v) and var (c′v) where v is the vector of existingvariance components, c is the vector of coefficients for the linear function andk is an optional offset which is usually omitted but would be 1 to represent the


residual variance in a probit analysis and 3.289 to represent the residual variancein a logit analysis. The general form of the directive is

F label a + b ∗ cb + c + d + m ∗ k

where a, b, c and d are subscripts to existing components va, vb, vc and vd andcb is a multiplier for vb. m is a number bigger than the current length of v to flagthe special case of adding the offset k. Where matrices are to be combined theform

F label a:b * k + c:d

can be used, as in the Coopworth data example, see page 252.

Assuming that the .pin file in the ASReml code box corresponds to a simple siremodel and that variance component 1 is the sire variance and variance component2 is the residual variance, then

F phenvar 1 + 2

gives a third component which is the sum of the variance components, that is,the phenotypic variance, and

F genvar 1 * 4

gives a fourth component which is the sire variance component multiplied by 4,that is, the genotypic variance.

Heritability




Heritabilities are requested by lines in the.pin file beginning with an H. The specificform of the directive in this case is

H label n d

This calculates σ2n/σ2

d and se[σ2n/σ2

d] where n and d are integers pointing to com-ponents vn and vd that are to be used as the numerator and denominator respec-tively in the heritability calculation.

Var(σ2n

σ2d) = (σ2

n

σ2d)2(Var(σ2

n)σ4

n+ Var(σ2

d)

σ4d

− 2Cov(σ2n,σ2

d)

σ2nσ2

d)

In the example

H herit 4 3

calculates the heritability by calculating component 4 (from second line of .pin)


/ component 3 (from first line of .pin), that is, genetic variance / phenotypicvariance.

Correlation

F phenvar 1:3 + 4:6

R phencorr 7 8 9

R gencorr 4:6

Correlations are requested by lines in the .pinfile beginning with an R. The specific form ofthe directive is

R label a ab b

This calculates the correlation r = σab/√

σ2aσ

2b and the associated standard error.

a, b and ab are integers indicating the position of the components to be used.Alternatively,

R label aa:nn

calculates the correlation r = σab/√

σ2aσ

2b for all correlations in the lower trian-

gular row-wise matrix represented by components aa to nn and the associatedstandard errors.

var (r) = r2[var

(σ2

a

)

4σ2a2 +

var(σ2

b

)

4σ2b2 +

var (σab)σ2

ab

+2cov

(σ2

a

)σ2

b

4σ2aσ

2b

− 2cov(σ2

a

)σab

2σ2aσab

− 2cov (σab) σ2b

2σabσ2b

]

In the example

R phencorr 7 8 9

calculates the phenotypic covariance by calculatingcomponent 8 /

√component 7 × component 9 where components 7, 8 and 9 are

created with the first line of the .pin file, and

R gencorr 4:6

calculates the genotypic covariance by calculatingcomponent 5 /

√component 4 × component 6 where components 4, 5 and 6 are

variance components from the analysis.


A more detailed example

F phenvar 1:3 + 4:6

F addvar 4:6 * 4

H heritA 10 7

H heritB 12 9

R phencorr 7 8 9

R gencorr 4:6

The example .pin file in this code box is a lit-tle more complicated. It relates to the bivari-ate sire model in bsiremod.as shown below.

...

Source Model terms Gamma Component Comp/SE % C.asr file




Tr.sire UnStruct 1 16.5262 16.5262 2.69 0 U

Tr.sire UnStruct 1 1.14422 1.14422 1.94 0 U

Tr.sire UnStruct 2 0.132847 0.132847 1.88 0 U...

Bivariate sire model

sire !I

ywt fat

bsiremod.asd

ywt fat ∼ Trait !r Trait.sire

1 2 1

0 # ASReml will count units

Trait 0 US

3*0

Trait.sire 2

Trait 0 US

3*0

sire

The ASReml file bsiremod.as is presented inthe code box to the right. Above is the sectionof the .asr file that displays the variance com-ponent table on which the .pin file is based.


Numbering the parameters reported in bsiremod.asr (and bsiremod.vvp)123456

error variance for ywterror covariance for ywt and faterror variance for fatsire variance component for ywtsire covariance for ywt and fatsire covariance for fat

then

F phenvar 1:3 + 4:6

creates new components 7 = 1+4, 8 = 2+5and 9 = 3+6,

F addvar 4:6 * 4

creates new components 10 = 4 × 4, 11 = 5 × 4 and 12 = 3 × 4,

H heritA 10 7

forms 10 / 7 to give the heritability for ywt,

H heritB 12 9

forms 12 / 9 to give the heritability for fat,

R phencorr 7 8 9

forms 8 /√

7 × 9, that is, the phenotypic correlation between ywt and fat,

R gencorr 4:6

forms 5 /√

4:6, that is, the genetic correlation between ywt and fat.

The result is:7 phenvar 1 42.75 6.297.pvc file

8 phenvar 2 3.995 0.6761

9 phenvar 3 1.848 0.1183

10 addvar 4 66.10 24.58

11 addvar 5 4.577 2.354

12 addvar 6 0.5314 0.2831

h2ywt = addvar 10/phenvar 7= 1.5465 0.3574

h2fat = addvar 12/phenvar 9= 0.2875 0.1430

phencorr = phenvar /SQR[phenvar *phenvar ]= 0.4495 0.0483

gencor 2 1 = Tr.si 5/SQR[Tr.si 4*Tr.si 6]= 0.7722 0.1537

12 Command file: Running the job

Introduction

The command line

Command line options

Debug command line optionsGraphics command line options

Job control command line optionsMemory command line options

Menu command line optionsNon-graphics command line options

Post-processing calculation of variance componentsExamples

Advanced processing arguments

Standard use of argumentsEntering command line arguments interactively

140


12.1 Introduction

The command line is discussed in this chapter. Command line options enablemore memory to be accessed to run the job, control some graphics output, enableuse of a simple menu and control advanced processing options.

12.2 The command line

The basic command to run ASReml is

[path]ASReml basename[.as[c]]

• path provides the path to the ASReml program which is usually called asreml.exein a PC environment. In a UNIX environment, ASReml is usually run trough ashell script called ASReml.

– if the ASReml program is in the search path then path is not required andthe word ASReml will suffice; for example

ASReml nin89.as

will run the NIN analysis,– if asreml.exe(ASReml) is not in the search path then path is required, for

example, if asreml.exe is in the subdirectory c:\Asreml then

c:\Asreml\asreml nin89.as

will run nin89.as,

• ASReml invokes the ASReml program,

• basename is the name of the .as[c] command file.

The basic command line can be extended with options and arguments to

[path] ASReml [options] basename[.as[c]] [arguments]

• options is a string preceded by a - (minus) sign. Its components control sev-eral operations (batch, graphic, memory, . . . ) at run time; for example, thecommand line

ASReml -s4 rat.as

tells ASReml to run the job rat.as with memory allocation increased to 128mb,

• arguments provide a mechanism (mostly for advanced users) to modify a jobat run time; for example, the command line


ASReml rat.as alpha beta

tells ASReml to process the job in rat.as as if it read alpha wherever $1 ap-pears in the file rat.as, beta wherever $2 appears and 0 wherever $3 appears(see below).

12.3 Command line options

Table 12.1 presents the command line options available in ASReml with briefdescriptions. Detailed descriptions follow. Command line options are not casesensitive.

Table 12.1: Command line options

option type action

Frequently used command line options

C job control continue iterations using previous estimates asinitial values

F job control continue for one more iteration using previousestimates as initial values

L non-graphics output direct screen output to file basename.asl

N graphics suppress interactive graphics

Ss memory control available memory

Other command line options

D debug invoke debug mode

E debug invoke extended debug mode

Gg graphics set interactive graphics device

I graphics permit interactive graphics to be displayed

M menu request menu mode

P post-processing calculation of function of variance components

R job control repeat run for each argument

Command line options should be combined in a single string preceded by a -


(minus) sign. For example-LNS3

could be used for this combination of options.

Debug command line options (D, E)

D and E invoke debug mode and increase the information written to the screenor .asl file. This information is not useful to most users. On Unix systems,if ASReml is crashing use the system script command to capture the screenoutput rather than using the L option, as the .asl file is not properly closedafter a crash.

Graphics command line options (G, I, N)

The I option permits the variogram and residual graphics to be displayed. Thisis the default unless the L option is specified.

The N option prevents any graphics from being displayed. This is also the defaultwhen the L option is specified.

The Gg option sets the file type for hard copy versions of the variogram andresidual graphics. Hard copy is only formed if the graphics are displayed.

Graphics are produced in the PC and SUN versions of ASReml using the Interactergraphics library. ASReml writes the graphics to file aspi.e where p indicates thepart of the job, i indicates the contents of the figure (as given in the followingtable) and e indicates the file type, for example ps for postscript:

i file contents

1 R marginal means of residuals from spatial analysis for section1

2 V variogram of residuals from spatial analysis for section 21 S residuals in field planA H histogram of residualsRv E residuals plotted against expected valuesXY G figure produced by !X, !Y and !G qualifiersPVa Predicted values plotted

The file type can be specified by following the G option by a number g, as follows:


g description

1 HP-GL2 Postscript (default for Unix)6 BMP10 Windows Print Manager11 WinMetaFile (default for Windows)12 HP-GL222 EncapsulatedPostScript

Not all of these file types are available on all systems but 2, 12 and 22 should allbe available. The file type may also be set from the user menu by selecting Opt

Form from the menu.

Job control command line options (C, F, R)

C indicates that the job is to continue iterating from the values in the .rsv file.This is equivalent to setting the !CONTINUE data file line qualifier, see Table 5.4for details.

F indicates that the job is to continue for one more iteration from the values inthe .rsv file. This is useful when using predict, see Chapter 10.

R runs ASReml once for each argument on the command line (see page 148)and includes the argument in the output file name, see the multivariate animalgenetics example of Section 15.10 at page 250.

Memory command line options (S)

Ss is used to increase memory allocation to the program. The base memoryallocated is approximately 8mb (1 million double precision words). s is a digit.The memory allocated is 2s times the base amount. The sequence is 8, 32, 64,128, 256, 512, 1024,. . .

If your system cannot provide the requested memory, the request will be dimin-ished until it can be satisfied. On multi-user systems, do not make s larger thannecessary or other users may complain.

The memory requirements depend on problem size and may be quite large. IfSs is not active (it is compiler dependent) you will need to request a version ofASReml compiled with a larger workspace.


Menu command line options (M)

M runs ASReml in menu mode. This mode is only available for the Microsoft

Windows, Intel Linux and Sun Solaris versions of ASReml as these are linked withthe Interacter graphics library. Menu mode is particularly intended to makeASReml easier to use under Windows. The menu enables you to edit the commandfile, run the job and view the output from within ASReml. Otherwise these areseparate operations from the command line. Menu mode is suitable for regularanalyses but large/long runs are better run from the command line as informationreporting the progress of the job is not readily visible when running in menu mode.The primary menu options are

• File

– when menu mode is invoked without a filename argument, the File optionis invoked to input a file name,

– normally you will open an existing .as command file but you may nominatea .dat, .asd or .csv file in which case ASReml will create a .as templatefile for you to edit

– this is helpful if the data file contains field names on the first line,– the .as file is displayed in an edit window,

• Edit

– invokes a primitive editor for you to edit the .as file and then run the job;you must explicitly save the file before running it,

• Run

– this runs the current job and returns the .asl (if L option is active) or .asrfile to a scroll screen for the user to scan or print,

• View

– this invokes a submenu which allows you to browse the .asr, .asl, .sln,.res or .pvs files,

– the browse menu allows you to search for particular text strings,

• Opt

– displays the current state of the C, D, E, G, L and R command line optionsand allows you to toggle their state,

– the graphics line shows the current hard copy graphics file type,

• Help

– displays a syntax summary in a browse window,


• Exit

– closes ASReml down,

• Quit

– returns to the previous menu,

• Info

– describes the main menu.

Log file (L)

L causes most of the screen output to be directed to a file with the file extension.asl. This is desirable for batch processing. It also suppresses the interactivegraphics (available for spatial analyses) unless the I option is also specified.

Post-processing calculation of variance components (P)

P invokes the post-processing sub-program, see Chapter 11; this sub-program isused to calculate functions of variance components from the information in the.asr and .vvp files and it is run after the model has been fitted.


Examples

ASReml code action

asreml -L rat.as send screen output to rat.asl and suppress interactive graphics

asreml -LS3 rat.as increase memory to 8 (= 23) times the base amount

asreml -IL rat.as send screen output to rat.asl but display interactive graphics

asreml -N rat.as allow screen output but suppress interactive graphics

asreml -ILS2 rat.as increase memory to 4 times the base amount, send screen outputto rat.asl but display interactive graphics

asreml -MS2 rat.as invoke Menu model to edit and run the rat.as job using 32Mbworkspace

12.4 Advanced processing arguments

Standard use of arguments

Command line arguments are intended to facilitate the running of a sequence ofjobs that require small changes to the command file between runs. The outputfile name is modified by the use of this feature if the -R option is specified. Thisuse is demonstrated in the Coopworth example of Section 15.10, see page 250.

Command line arguments are strings listed on the command line after basename,the command file name. These strings are inserted into the command file atrun time. When the input routine finds a $n in the command file it substitutesthe nth argument string from the command line. n may take the values 1. . .9to indicate up to 9 strings after the command file name. If the argument has1 character a trailing blank is attached to the character and inserted into thecommand file. If no argument exists, a zero is inserted. For example,

asreml rat.as alpha beta

tells ASReml to process the job in rat.as as if it read alpha wherever $1 appearsin the command file, beta wherever $2 appears and 0 wherever $3 appears.


Table 12.2: The use of arguments in ASReml

in command file on command line becomes in ASReml run

abc$1def no argument abc0 def

abc$1def with argument X abcX def

abc$1def with argument XY abcXYdef

abc$1def with argument XYZ abcXYZdef

abc$1 def with argument XX abcXX def

abc$1 def with argument XXX abcXXX def

abc$1 def

(multiple spaces)with argument XXX abcXXX def

Recycle

Recycle mode is invoked with the R option. Assuming there are say 4 argu-ments 11 22 33 44, the job is run 4 times: in the first run the arguments are11 22 33 44; in the second run the arguments are 22 33 44; in the third runthe arguments are 33 44; in the fourth run the argument is 44. Usually thismode is used with $1 in the command file so that only the first of the argumentsis honoured in a particular run. This is mainly used in conjunction with the!DOPART qualifier. Note that the names of the output files will be formed usinga root based on basename and the current first argument so that they are notoverwritten.

Entering command line arguments interactively

Another way to gain some interactive control of a job in the PC environment isto insert !?{text} in the .as file where you want to specify the rest of the line atrun time. ASReml prompts with text and waits for a response which is used tocompete the line. The !? qualifier may be used anywhere in the job and the lineis modified from that point.

Unfortunately the prompt may not appear on the top screen under some windowsWarningoperating systems in which case it may not be obvious that ASReml is waitingfor a keyboard response.

13 Description of output files

Introduction

An example

Key output files

The .asr fileThe .sln fileThe .yht file

Other ASReml output files

The .res fileThe .vrb fileThe .vvp fileThe .rsv fileThe .dpr fileThe .pvc fileThe .pvs fileThe .tab file

ASReml output objects and where to find them

149


13.1 Introduction

With each ASReml run a number of output files are produced. ASReml generatesthe output files by appending various filename extensions to basename. A briefdescription of the filename extensions is presented in Table 13.1. The .sln and.yht files, in particular, are often quite large and could fill up your disk space.You should therefore regularly tidy your working directories.

Table 13.1: Summary of ASReml output files

file description

Key output files

.asr contains a summary of the data and analysis results.

.sln contains the estimates of the fixed and random effects and theircorresponding standard errors.

.yht contains the predicted values, residuals and diagonal elementsof the hat matrix for each data point.

Other output files

.asl contains a progress log and error messages if the L commandline options is specified.

.asp contains transformed data, see !PRINT in Table 5.2.

.dbr/.dpr/.spr contains the data and residuals in a binary form for furtheranalysis (see !RESIDUALS, Table 5.5).

.pvc contains the report produced with the P option.

.pvs contains predictions formed by the predict directive.

.res contains information from using the pol(), spl() and fac()

functions, the iteration sequence for the variance componentsand some statistics derived from the residuals to assist withmodel selection.

.rsv contains the final parameter values for reading back if the!CONTINUE qualifier is invoked, see Table 5.4.

.tab contains tables formed by the tabulate directive.

.veo holds the equation order to speed up re-running big jobs whenthe model is unchanged. This is a binary file of no use to theuser.

.vrb contains the estimates of the fixed effects and their variance.

.vvp contains the approximate variances of the variance parameters.It is designed to be read back with the P option for calculatingfunctions of the variance parameters.


Summary of ASReml output files

file description

13.2 An example


variety !A

id

raw

repl 4

nloc

yield

lat

long

row 22

column 11

nin89.asd !skip 1


predict variety

1 2 0

22 row AR1 0.3

11 column AR1 0.3

In this chapter the ASReml output files arediscussed with reference to a two-dimensionalseparable autoregressive spatial analysis of theNIN field trial data, see model 3b on page 83 ofChapter 7 for details. The ASReml commandfile for this analysis is presented to the right.Recall that this model specifies a separable au-toregressive correlation structure for residualor plot errors that is the direct product of anautoregressive correlation matrix of order 22for rows and an autoregressive correlation ma-trix of order 11 for columns. In this case 0.3 isthe starting correlation for both columns androws.

13.3 Key output files

The key ASReml output files are the .asr, .sln and .yht files.

The .asr file

This file contains

• a general announcements box (outlined in asterisks) containing current mes-sages,

• a summary of the data to validate the specification of the model,

• a summary of the fitting process to check convergence,

• a summary of the variance parameters:

– The Gamma column reports the actual parameter fitted,


– the Component column reports the gamma converted to a variance scale ifappropriate,

– Comp/SE is the ratio of the component relative to the square root of thediagonal element of the inverse of the average information matrix Warning

Comp/SE should not be used for formal testing,– The % shows the percentage change in the parameter at the last iteration,– use the .pin file described Chapter 11 to calculate meaningful functions of

the variance components,

• an analysis of variance (ANOVA) table (Section 6.10). The table contains thenumerator degrees of freedom for the term and F-tests for approximate testingof effects. This is incremental test adjusted for preceding terms only.

• estimated effects, their standard errors and t values for equations in the DENSEportion of the SSP matrix; the T-prev column tests difference between succes-sive coefficients in the same factor.

The following is a truncated version of nin89.asr.

ASReml 1.0 [31 Aug 2002] nin alliance trialversion & title10 Sep 2002 04:20:15.350 8.00 Mbyte Windows nin89date & timeLicensed to: Arthur Gilmourlicense* ASReml 2002 *********************************************announcements* Contact [email protected] for licensing and support *box* Use !{ and !} when fitting covariance across terms *

***************************************************** ARG *


QUALIFIERS: !SKIP 1

Reading nin.asd FREE FORMAT skipping 1 lines


Using 242 records of 242 readdata summaryModel term Size Type COL Minimum Mean Maximum #zero #miss

1 variety 56 Factor 1 1 26.4545 56 0 0

2 id 2 1.000 26.45 56.00 0 0

3 pid 3 1101. 2628. 4156. 0 18

4 raw 4 21.00 510.5 840.0 0 18

5 repl 4 Factor 5 1 2.4132 4 0 0

6 nloc 6 4.000 4.000 4.000 0 0

7 yield 1 Variate 7 1.050 25.53 42.00 0 18

8 lat 8 4.300 25.80 47.30 0 0


9 long 9 1.200 13.80 26.40 0 0

10 row 22 Factor 10 1 11.5000 22 0 0

11 column 11 Factor 11 1 6.0000 11 0 0


13 mv_estimates 18 Missing value

22 AR=AutoReg 0.3000





1 LogL=-419.949 S2= 39.387 168 df 1.000 0.3000 0.3000iterations2 LogL=-410.420 S2= 38.350 168 df 1.000 0.4030 0.3461sequence3 LogL=-402.549 S2= 40.253 168 df 1.000 0.5202 0.3999

4 LogL=-399.465 S2= 46.228 168 df 1.000 0.6233 0.4423

5 LogL=-399.331 S2= 48.139 168 df 1.000 0.6475 0.4414

6 LogL=-399.324 S2= 48.580 168 df 1.000 0.6535 0.4389

7 LogL=-399.324 S2= 48.680 168 df 1.000 0.6550 0.4379

8 LogL=-399.324 S2= 48.705 168 df 1.000 0.6554 0.4376

Final parameter values 1.0000 0.65549 0.43750


Variance 242 168 1.00000 48.7045 6.81 0 PvarianceResidual AR=AutoR 22 0.655488 0.655488 11.63 0 UparametersResidual AR=AutoR 11 0.437497 0.437497 5.43 0 U

Analysis of Variance DF F-incrANOVA12 mu 1 331.90

1 variety 55 2.23





1 variety

BRULE 2.98387 2.84137 1.05estimatedREDLAND 4.70621 2.97738 1.58 0.62effectsCODY -0.316046 2.96066 -0.11 -1.76

ARAPAHOE 2.95377 2.72658 1.08 1.16

NE83404 1.63070 2.89322 0.56 -0.48

NE83406 1.29029 2.77143 0.47 -0.12

NE83407 0.308992 3.13838 0.10 -0.31

CENTURA 2.26393 3.00167 0.75 0.66...NE87615 1.03251 2.93379 0.35 -0.40

NE87619 5.93687 2.84941 2.08 1.82

NE87627 -4.37848 2.99740 -1.46 -3.72

12 mu

57 24.0892 2.46480 9.77

13 mv_estimates 18 effects fitted

6 possible outliers: see .res fileoutlier infoFinished: 10 Sep 2002 04:20:34.508 LogL Convergedcompletion info


The .sln file

The .sln file contains estimates of the fixed and random effects with their stan-dard errors in an array with four columns ordered as

factor name level estimate standard error

Note that the error presented for the estimate of a random effect is the squareroot of the prediction error variance. The .sln file can easily be read into aGENSTAT spreadsheet or an S-PLUS data frame. This is a reduced version ofnin89.sln. Note that

• the order of some terms may differ from the order in which those terms werespecified in the model statement,

• the missing value estimates appear at the end of the file in this example.

variety LANCER 0.000 0.000variety estimates

variety BRULE 2.987 2.842

variety REDLAND 4.707 2.978

variety CODY -0.3131 2.961

variety ARAPAHOE 2.954 2.727...

variety NE87615 1.035 2.934

variety NE87619 5.939 2.850

variety NE87627 -4.376 2.998

mu 1 24.09 2.465intercept

mv_estimates 1 21.91 6.729missing value

mv_estimates 2 23.22 6.721estimates

mv_estimates 3 22.52 6.708









mv_estimates 12 27.78 6.492...


The .yht file

The .yht file contains the predicted values of the data in the original order (thisis not changed by supplying row/column order in spatial analyses), the residualsand the diagonal elements of the hat matrix. Where an observation is missing, theresidual, missing values predicted value and Hat value are also declared missing.The missing value estimates with standard errors are reported in the .sln file.

This is the first 20 lines of nin89.yht. Note that the values corresponding to themissing data (first 15 records) are all -0.1000E-36 which is the internal valueused for missing values.

Record Yhat Residual Hat

1 -0.10000E-36 -0.1000E-36 -0.1000E-36

2 -0.10000E-36 -0.1000E-36 -0.1000E-36

3 -0.10000E-36 -0.1000E-36 -0.1000E-36

4 -0.10000E-36 -0.1000E-36 -0.1000E-36

5 -0.10000E-36 -0.1000E-36 -0.1000E-36

6 -0.10000E-36 -0.1000E-36 -0.1000E-36

7 -0.10000E-36 -0.1000E-36 -0.1000E-36

8 -0.10000E-36 -0.1000E-36 -0.1000E-36

9 -0.10000E-36 -0.1000E-36 -0.1000E-36

10 -0.10000E-36 -0.1000E-36 -0.1000E-36

11 -0.10000E-36 -0.1000E-36 -0.1000E-36

12 -0.10000E-36 -0.1000E-36 -0.1000E-36

13 -0.10000E-36 -0.1000E-36 -0.1000E-36

14 -0.10000E-36 -0.1000E-36 -0.1000E-36

15 -0.10000E-36 -0.1000E-36 -0.1000E-36

16 24.088 5.162 6.074

17 27.074 4.476 6.222

18 28.795 6.255 6.282

19 23.775 6.325 6.235

20 27.042 6.008 5.962...

240 24.695 1.855 6.114

241 25.452 0.1475 6.158

242 22.465 4.435 6.604


13.4 Other ASReml output files

The .res file

The .res file contains miscellaneous supplementary information including

• a list of unique values of x formed by using the fac() model term,

• a list of unique (x, y) combinations formed by using the fac(x,y) model term,

• orthogonal polynomials produced by pol() model term,

• the design matrix formed for the spl() model term,

• predicted values of the curvature component of cubic smoothing splines,

• the empirical variance-covariance matrix based on the BLUPs when a Σ⊗I orI ⊗Σ structure is used; this may be used to obtain starting values for anotherrun of ASReml,

• a table showing the variance components for each iteration,

• some statistics derived from the residuals from two-dimensional data (multi-variate, repeated measures or spatial)

– the residuals from a spatial analysis will have the units part added tothem (defined as the combined residual) unless the data records were sorted(within ASReml ) in which case the units and the correlated residuals arein different orders (data file order and field order respectively),

– the residuals are printed in the .yht file but the statistics in the .res fileare calculated from the combined residual,

– the Covariance/Variance/Correlation (C/V/C)matrix calculated directlyfrom the residuals; it contains the covariance below the diagonals, the vari-ances on the diagonal and the correlations above the diagonal - these mayhelp with finding starting values,

– relevant portions of the estimated variance matrix for each term for whichan R structure or a G structure has been associated,

• a variogram and spatial correlations for spatial analysis; the spatial correlationsare based on distance between data points (see Gilmour et al., 1997),

• the slope of the log(absolute residual) on log(predicted value) for assessing pos-sible mean-variance relationships and the location of large residuals. For ex-ample,

SLOPES FOR LOG(ABS(RES)) ON LOG(PV) for section 1

0.99 2.01 4.34

produced from a trivariate analysis reports the slopes. A slope of b suggests


that y1−b might have less mean variance relationship. If there is no meanvariance relation, a slope of zero is expected. A slope of 1

2 suggests a SQRTtransformation might resolve the dependence; a slope of 1 means a LOG trans-formation might be appropriate. So, for the 3 traits, log(y1), y−1

2 and y−33 are

indicated. This diagnostic strategy works better when based on grouped dataregressing log(standard deviation) on log(mean).

Also,

STND RES 16 -2.35 6.58 5.64

indicates that for the 16th data record, the residuals are -2.35, 6.58 and 5.64times the respective standard deviations. The standard deviation used in thistest is calculated directly from the residuals rather than from the analysis.They are intended to flag the records with large residuals rather than to pre-cisely quantify their relative size. They are not studentised residuals and aregenerally not relevant when the user has fitted heterogeneous variances.

This is nin89.res.

Covariance/Variance/Correlation matrix of Residuals Section 1

18.261 0.522 0.688 0.684 0.819 0.496 0.445 0.499C/V/C matrix0.053 0.083 0.180

14.887 44.523 0.710 0.550 0.517 0.215 0.568 0.550

0.439 0.263 0.318

14.934 24.080 25.811 0.626 0.727 0.494 0.534 0.587

0.332 0.185 0.329

14.193 17.818 15.433 23.546 0.735 0.479 0.587 0.303

0.055 -0.059 0.059

21.332 21.041 22.519 21.751 37.185 0.345 0.501 0.540

0.212 0.236 0.279

14.092 9.539 16.664 15.431 13.984 44.158 0.614 0.033

-0.314 -0.482 -0.417

13.350 26.637 19.067 20.022 21.461 28.647 49.371 0.435

0.173 -0.037 -0.053

22.425 38.622 31.364 15.478 34.628 2.300 32.134 110.587

0.762 0.721 0.778

3.040 39.560 22.759 3.613 17.429 -28.133 16.449 108.223

182.312 0.887 0.885

4.373 21.628 11.579 -3.524 17.764 -39.584 -3.237 93.638

147.844 152.427 0.921

10.074 27.827 21.927 3.767 22.284 -36.277 -4.919 107.276

156.540 148.997 171.732

Fitted Covariance/Variance matrix of Residuals is AR=AutoReg

1.000 0.438 0.192 0.084 0.037 0.016 0.007 0.003

0.001 0.001 0.000


*see histogram*plot in

* ** ***Figure 13.4* ** *** *

** *** *** *** *

********** *****

******************

* * * * ** ******************

* ** ** ** ** ***** ** ************************ ** *

Min Mean Max -24.872 0.27975 15.915 omitting 18 zeros

Figures 13.1 to 13.4 show the graphics derived from the residuals when the!DISPLAY 15 qualifier is specified and which are written to .eps files by run-ning

ASReml -g22 nin89.as

The graphs are a variogram of the residuals from the spatial analysis for site 1(Figure 13.1), a plot of the residuals in field plan order (Figure 13.2), plots ofthe marginal means of the residuals (Figure 13.3) and a histogram of the resid-uals (Figure 13.4). The selection of which plots are displayed is controlled bythe !DISPLAY qualifier (Table 5.4). By default, the variogram and field plan aredisplayed.

nin alliance trial 89 1 Variogram of residuals 14 Aug 2002 13:03:23

0

3.964908

Outer displacement Inner displacement

Figure 13.1 Variogram of residuals


The sample variogram is a plot of the semi-variances of differences of residuals atparticular distances. The (0,0) position is zero because the difference is identicallyzero. ASReml displays the plot for distances 0, 1, 2, ..., 8, 9-10, 11-14, 15-20, . . . .

The plot of residuals in field plan order (Figure 13.2) contains in its top and rightmargins a diamond showing the minimum, mean and maximum residual for thatrow or column. Note that a gap identifies where the missing values occur.

The plot of marginal means of residuals shows residuals for each row/column aswell as the trend in their means.

nin alliance trial 89 1 Field plot of residuals 14 Aug 2002 13:03:23

Range: −24.87 15.91

Figure 13.2 Plot of residuals in field plan order

The .vrb file

The .vrb file contains the estimates of the effects together with their approxi-mate prediction variance matrix corresponding to the dense portion. The file isformatted for reading back for post processing. The number of equations in thedense portion can be increased (to a maximum of 800) using the !DENSE option(Table 5.5) but not to include random effects. The matrix is lower triangularrow-wise in the order that the parameters are printed in the .sln file. It can bethought of as a partitioned lower triangular matrix,

σ2 .

βD

σ2CDD

nin alliance trial 89 1Residuals V Row and Column position: 14 Aug 2002 13:03:23Range: −24.87 15.91

Figure 13.3 Plot of the marginal means of the residuals

nin alliance trial 1_AHistogram of residuals14 Aug 2002 13:03:23

Range: −24.87 15.91Peak Count: 17

0

Figure 13.4 Histogram of residuals


where βD

is the dense portion of β and CDD

is the dense portion of C−1. Thisis the first 20 rows of nin89.vrb. Note that the first element is the estimatederror variance, that is, 48.6802, see the variance component estimates in the .asroutput.

0.486802E + 02 0.000000E + 00 0.000000E + 00 0.298660E + 01 0.000000E + 00

0.807551E + 01 0.470711E + 01 0.000000E + 00 0.456648E + 01 0.886687E + 01

−0.313123E + 00 0.000000E + 00 0.410031E + 01 0.476546E + 01 0.876708E + 01

0.295404E + 01 0.000000E + 00 0.343331E + 01 0.389620E + 01 0.416124E + 01

0.743616E + 01 0.163302E + 01 0.000000E + 00 0.377176E + 01 0.428109E + 01

0.472519E + 01 0.402696E + 01 0.837281E + 01 0.129013E + 01 0.000000E + 00

0.330076E + 01 0.347471E + 01 0.357605E + 01 0.316915E + 01 0.412130E + 01

0.768275E + 01 0.310018E + 00 0.000000E + 00 0.376637E + 01 0.419780E + 01

0.395693E + 01 0.383429E + 01 0.458492E + 01 0.378585E + 01 0.985202E + 01

0.226478E + 01 0.000000E + 00 0.379286E + 01 0.442457E + 01 0.439485E + 01

0.402503E + 01 0.440539E + 01 0.362391E + 01 0.502071E + 01 0.901191E + 01

0.508553E + 01 0.000000E + 00 0.393626E + 01 0.430512E + 01 0.423753E + 01

0.428826E + 01 0.417864E + 01 0.363341E + 01 0.444776E + 01 0.527289E + 01

0.855241E + 01 0.243687E + 01 0.000000E + 00 0.351386E + 01 0.369983E + 01

0.384055E + 01 0.330171E + 01 0.362019E + 01 0.352370E + 01 0.359516E + 01

0.392097E + 01 0.406762E + 01 0.801579E + 01 0.475935E + 01 0.000000E + 00

...

The first 5 rows of the lower triangular matrix in this case are

48.68020 0

2.98660 0 8.075514.70711 0 4.56648 8.86687

−0.313123 0 4.10031 4.76546 8.76708...

......

......

. . .

The .vvp file

The .vvp file contains the inverse of the average information matrix on the com-ponents scale. The file is formatted for reading back under the control of the.pin file described in Chapter 11. The matrix is lower triangular row-wise in theorder the parameters are printed in the .asr file. This is nin89.vvp with theparameter estimates in the order error variance, spatial row correlation, spatialcolumn correlation.

Variance of Variance components 3

51.0852

0.217089 0.318058E-02

0.677748E-01 -0.201181E-02 0.649355E-02


The .rsv file

The .rsv file contains the variance parameters from the most recent iterationof a model. The primary use of the .rsv file is to supply the values for the!CONTINUE qualifier (see Table 5.4) and the C command line option (see Table12.1). It contains sufficient information to match terms so that it can be usedwhen the variance model has been changed. This is nin89.rsv.

76 6

0.000000 0.000000 0.000000 1.000000 0.6553855 0.4375849

RSTRUCTURE 1 2

VARIANCE 1 1 0 1.00000

STRUCTURE 22 1 10.655385

STRUCTURE 11 1 10.437585

The .dpr file

The .dpr file contains the data and residuals from the analysis in double pre-cision binary form. The file is produced when the !RES qualifier (Table 4.3) isinvoked. The file could be renamed with filename extension .dbl and used forinput to another run of ASReml. Alternatively, it could be used by another For-tran program or package. Factors will have level codes if they were coded using!A or !I. All the data from the run plus an extra column of residuals is in thefile. Records omitted from the analysis are omitted from the file.

The .pvc file

The .pvc file contains functions of the variance components produced by runninga .pin file on the results of an ASReml run as described in Chapter 11. The .pinand .pvc files for a half-sib analysis of the Coopworth data are presented inSection 15.10.

The .pvs file

The .pvs file contains the predicted values formed when a predict statement isincluded in the job. Below is an edited version of nin89.pvs. See Section 3.6for the .pvs file for the simple RCB analysis of the NIN data considered in thatchapter.

nin alliance trial 10 Sep 2002 04:20:15title line

nin89



Warning: mv_estimates is ignored for prediction

---- ---- ---- ---- ---- ---- ---- ---- 1 ---- ---- ---- ---- ---- ---- ---- ----



LANCER 24.0875 2.4645 Epredicted variety

BRULE 27.0749 2.4944 Emeans

REDLAND 28.7949 2.5064 E

CODY 23.7752 2.4970 E

ARAPAHOE 27.0416 2.4417 E

NE83404 25.7212 2.4424 E

NE83406 25.3776 2.5028 E

NE83407 24.3978 2.6882 E

CENTURA 26.3525 2.4763 E

SCOUT66 29.1732 2.4361 E...

NE87615 25.1238 2.4434 E

NE87619 30.0267 2.4666 E

NE87627 19.7126 2.4833 E

SED: Overall Standard Error of Difference 2.925SED summary


The .tab file

The .tab file contains the simple variety means and cell frequencies. Below is anedited version of nin89.tab.

nin alliance trial 10 Sep 2002 04:20:15

Simple tabulation of yield

variety

LANCER 28.56 4

BRULE 26.07 4

REDLAND 30.50 4

CODY 21.21 4

ARAPAHOE 29.44 4

NE83404 27.39 4

NE83406 24.28 4

NE83407 22.69 4

CENTURA 21.65 4

SCOUT66 27.52 4

COLT 27.00 4...

NE87615 25.69 4

NE87619 31.26 4

NE87627 23.23 4


13.5 ASReml output objects and where to find them

Table 13.2 presents a list of objects produced with each ASReml run and whereto find them in the output files.

Table 13.2: ASReml output objects and where to find them

output object found in comment

analysis of variancetable

.asr file the analysis of variance table contains F-statisticsfor each term in the fixed part of the model. Theseprovide for an incremental test of significance (seeSection 6.10).

data summary .asr file includes the number of records read and retainedfor analysis, the minimum, mean, maximum,number of zeros, number of missing values perdata field, factor/variate field distinction

elapsed time .asr file this can be determined by comparing the starttime with the finishing time.

fixed and random ef-fects

.sln file the effects that were included in the dense portionof the solution are also printed in the .asr file withtheir standard error, a t-statistic for testing thateffect and a t-statistic for testing it against thepreceding effect in that factor.

heritability .pvc file placed in the .pvc file when postprocessing witha .pin file

histogram of residu-als

.res file and graphics file

intermediate results .asl file given if the -DL command line option is used.

mean/variance rela-tionship

.res file for non-spatial analyses ASReml prints the slopeof the regression of log(abs(residual)) againstlog(predicted value). This regression is ex-pected to be near zero if the variance is inde-pendent of the mean. A power of the mean datatransformation might be indicated otherwise. Thesuggested power is approximately (1-b) where b isthe slope. A slope of 1 suggests a log transfor-mation. This is indicative only and should notbe blindly applied. Weighted analysis or identi-fying the cause of the heterogeneity should alsobe considered. This statistic is not reliable in ge-netic animal models or when units is included inthe linear model because then the predicted valueincludes some of the residual.


Table of output objects and where to find them ASReml


observed variance/covariance matrixformed from BLUPsand residuals

.res file for an interaction fitted as random effects, whenthe first [outer] dimension is smaller than the in-ner dimension less 10, ASReml prints an observedvariance matrix calculated from the BLUPs. Theobserved correlations are printed in the upper tri-angle. Since this matrix is not well scaled as an es-timate of the underlying variance component ma-trix, a rescaled version is also printed, scaled ac-cording to the fitted variance parameters. Theprimary purpose for this output is to provide rea-sonable starting values for fitting more complexvariance structure. The correlations may also beof interest. After a multivariate analysis, a sim-ilar matrix is also provided, calculated from theresiduals.

phenotypic variance .pvc file placed in the .pvc file when postprocessing witha .pin file

plot of residualsagainst field position

graphics file

possible outliers .res file these are residuals that are more than 3.5 stan-dard deviations in magnitude

predicted (fitted) val-ues at the data points

.yht file these in the are printed in the second column

predicted values .pvs file given if a predict statement is supplied in the .asfile.

REML log-likelihood .asr file the REML log-likelihood is given for each iteration.The REML log-likelihood should have converged

residuals .yht file and in binary form in .dpr file; these are printed incolumn 3. Furthermore, for multivariate analysesthe residuals will be in data order (traits withinrecords). However, in a univariate analysis withmissing values that are not fitted, there will befewer residuals than data records - there will beno residual where the data was missing so this canmake it difficult to line up the values unless youcan manipulate them in another program (spread-sheet).

score .asl file given if the -DL command line option is used.


Table of output objects and where to find them ASReml


tables of means .tab file.pvs file

simple averages of cross classified data are pro-duced by the tabulate directive to the .tab file.Adjusted means predicted from the fitted modelare written to the .pvs file by the predict direc-tive.

variance of varianceparameters

.vvp file based on the inverse of the average informationmatrix

variance parameters .asr file.res file

the values at each iteration are printed in the.res file. The final values are arranged in a table,printed with labels and converted if necessary tovariances.

variogram graphics file

14 Error messages

Introduction

Common problems

Things to check in the .asr file

An example

Error messages

Warning messages

168


14.1 Introduction

When ASReml finds an inconsistency it prints an error message to the screen (orthe .asl file) and dumps the current information to the .asr file. Below is thescreen output for a job that has been terminated due to an error. If a job has anerror you should

• try and identify the problem from the error message in the Fault: line and thetext of the Last line read: (this appears twice in the file to make it easier tofind),

• check that all labels have been defined and are in the correct case,

• some errors arise from conflicting information; the error may point to some-thing that appears valid but is inconsistent with something that is wrong earlierin the file,

• reduce to a simpler model and gradually build up to the desired analysis - thisshould help to identify the exact location of the problem.

If the problem is not resolved from the above list, you may need to email theHelp Desk at [email protected]. Please send the .as file, (a sample of) thedata, the .asr file and the .asl file produced by running

asreml -dl basename.as

The -dl command line option invokes debug mode and sends all non-graphicsSee Chapter 12

screen output to the .asl file.

In this chapter we show some of the common typographical problems. Errorsarising from attempts to fit an inappropriate model are often harder to resolve.Following is an example of output when the data file is not correctly named or isnot present (ASReml tries to interpret the filename as a variable name when thefile is not found).

ASReml 1.0 [31 Aug 2002] nin alliance trial

10 Sep 2002 05:13:15.253 8.00 Mbyte Windows ninerr1memory info

Licensed to: Arthur Gilmour

* ASReml 2002 *********************************************



***************************************************** ARG *

Folder: C:\asr\ex\manexworking direc-

tory


Warning: FIELD DEFINITION lines should be INDENTEDmessages There is no file called nine.asd

Invalid label for data field: ’nine.asd’ contains a reserved character

or may get confused with a previous label or reserved word

Fault 12 FORMAT error reading data structures

Last line read was: nine.asd !slip 1

Currently defined structures, COLS and LEVELS

1 variety -1 56 0 0 0 0

2 id -2 1 0 0 0 0

3 pid -3 1 0 0 0 0

4 raw -4 1 0 0 0 0

5 repl -5 4 0 0 0 0

6 nloc -6 1 0 0 0 0

7 yield -7 1 0 0 0 0

8 lat -8 1 0 0 0 0

9 long -9 1 0 0 0 0

10 row -10 22 0 0 0 0

11 column -11 11 0 0 0 0

12 nine.asd 0 0 0 0 0 0filename!

ninerr1 C:\asr\ex\manex

12 factors defined [max 500].

0 variance parameters [max 900]. 2 special structures

Last line read was: nine.asd !slip 1last line read

12 12 -1 0 1000

Finished: 10 Sep 2002 05:13:16.745 FORMAT error reading data structures :fault message

14.2 Common problems

Common problems in coding ASReml are as follows:

• a variable name has been misspelt; variable names are case sensitive,

• a model term has been misspelt; model term funcions and reserved words (mu,Trait, mv, units) are case sensitive,

• the data file name is misspelt or the wrong path has been given - the pathnamemay not include embedded blanks,

• a qualifier has been misspelt or is in the wrong place,

• there is an inconsistency between the variance header line and the structuredefinition lines presented,


• failure to use commas appropriately in model definition lines,

• there is an error in the R structure definition lines,

• there is an error in the G structure definition lines,

– there is a factor name error,– there is a missing parameter,

• there is an error in the predict statement ,

• model term mv not included in the model when there are missing values in thedata when the user wishes to retain their records in the analysis.

The most common problem in running ASReml is that a variable label is misspelt.

14.3 Things to check in the .asr file

The information that ASReml dumps in the .asr file when an error is encounteredis intended to give you some idea of the particular error:

• if there is no data summary ASReml has failed before or while reading themodel line,

• if ASReml has completed one iteration the problem is probably associated withstarting values of the variance parameters or the logic of the model rather thanthe syntax per se.

Part of the file nin89.asr presented in Chapter 13 is displayed below to indicatethe lines of the .asr file that should be checked. You should check that

• sufficient workspace has been obtained,

• the records read/lines read/records used are correct,

• mean min max information is correct for each variable,

• ANOVA table has the expected degrees of freedom.


10 Sep 2002 04:20:15.350 \textcolor{blue} 8.00 Mbyte \textcolor{black} Windows nin89workspace


* ASReml 2002 *********************************************




***************************************************** ARG *

Folder: C:\asr\ex\manexworking direc-

tory QUALIFIERS: !SKIP 1



Using 242 records of 242 readrecords read


1 variety 56 Factor 1 1 26.4545 56 0 0data summary

2 id 2 1.000 26.45 56.00 0 0

3 pid 3 1101. 2628. 4156. 0 18

4 raw 4 21.00 510.5 840.0 0 18

5 repl 4 Factor 5 1 2.4132 4 0 0

6 nloc 6 4.000 4.000 4.000 0 0

7 yield 1 Variate 7 1.050 25.53 42.00 0 18

8 lat 8 4.300 25.80 47.30 0 0

9 long 9 1.200 13.80 26.40 0 0

10 row 22 Factor 10 1 11.5000 22 0 0

11 column 11 Factor 11 1 6.0000 11 0 0








1 LogL=-419.949 S2= 39.387 168 df 1.000 0.3000 0.3000iteration

2 LogL=-410.420 S2= 38.350 168 df 1.000 0.4030 0.3461sequence

3 LogL=-402.549 S2= 40.253 168 df 1.000 0.5202 0.3999

4 LogL=-399.465 S2= 46.228 168 df 1.000 0.6233 0.4423

5 LogL=-399.331 S2= 48.139 168 df 1.000 0.6475 0.4414

6 LogL=-399.324 S2= 48.580 168 df 1.000 0.6535 0.4389

7 LogL=-399.324 S2= 48.680 168 df 1.000 0.6550 0.4379

8 LogL=-399.324 S2= 48.705 168 df 1.000 0.6554 0.4376



Variance 242 168 1.00000 48.7045 6.81 0 P

Residual AR=AutoR 22 0.655488 0.655488 11.63 0 U



Analysis of Variance DF F-incrANOVA

12 mu 1 331.90

1 variety 55 2.23





1 variety

BRULE 2.98387 2.84137 1.05

REDLAND 4.70621 2.97738 1.58 0.62

CODY -0.316046 2.96066 -0.11 -1.76

: : : : :

NE87615 1.03251 2.93379 0.35 -0.40

NE87619 5.93687 2.84941 2.08 1.82

NE87627 -4.37848 2.99740 -1.46 -3.72

12 mu

57 24.0892 2.46480 9.77

13 mv_estimates 18 effects fitted

6 possible outliers: see .res fileoutliers?


14.4 An example

nin alliance trial

variety 56 # 3.

id

pid

raw

repl 4

nloc

yield

lat

long

row 22

column 11

nin.asd !slip 1 !dopart $1

# 1. & 2.

!part 1

yield∼mu variety # 4.

!r Repl # 5.

0 0 1

Repl 1 # 6.

2 0 IDV 0.1 # 7.

!part 2

yield∼mu variety # 9.

1 2

11 row AR1 .1 #10.

22 col AR1 .1

!part

predict voriety # 8.

This is the command file for a simple RCBSee 2a in Sec-

tion 7.3 analysis of the NIN variety trial data in thefirst part. However, this file contains eightcommon mistakes in coding ASReml. We alsoshow two common mistakes associated withspatial analyses in the second part. The errorsare highlighted and the numbers indicate theorder in which they are detected. Each error isdiscussed with reference to the output writtento the .asr file. A summary of the errors isas follows:

1. data file not found,

2. unrecognised qualifier,

3. incorrectly defined alphanumeric factor,


4. comma missing from first line of model,

5. misspelt variable label in linear model,

6. misspelt variable label in G structureheader line,

7. wrong levels declared in G structure modelline,

8. misspelt variable label in predict statement.

9. mv omitted from spatial model

10. wrong levels declared in R structure model lines

1. Data file not found

nin alliance trial...

nine.asd !slip 1


Running this job produces the .asr file in Sec-tion 14.1. The first problem is that ASReml

cannot find the data file nine.asd as indi-cated in the error message above the Faultline. ASReml reports the last line read beforethe job was terminated, an error messageFORMAT error reading data structuresand other information obtained to that point. In this case the program only madeit to the data file definition line in the command file. Since nine.asd commencesin column 1, ASReml checks for a file of this name (in the working directory sinceno path is supplied). Since ASReml did not find the data file it tried to interpretthe line as a variable definition but . is not permitted in a variable label. Theproblem is either that the filename is misspelt or a pathname is required. In thiscase the data file was given as nine.asd rather than nin.asd.

2. An unrecognised qualifier and 3. An incorrectly defined factor

After supplying the correct pathname and re-running the job, ASReml producesthe warning message

WARNING: Unrecognised qualifier at character 22 !slip 1

followed by the fault message

ERROR Reading the data.The warning does not cause the job to terminate immediately but arises because


!slip is not a recognised data file line qualifier; the correct qualifier is !skip.The job terminates when reading the header line of the nin.asd file which isalphabetic when it is expecting numeric values. The following output displaysthe error message produced.


10 Sep 2002 05:13:58.815 8.00 Mbyte Windows ninerr2


* ASReml 2002 *********************************************



***************************************************** ARG *


QUALIFIERS: !SLIP 1

Warning: Unrecognised qualifier at character 9 !SLIP 1



Field 1 [variety] of record 1 [line 1] is not valid.error

Increase the number of levels in factor varietyhint

Fault 0 ERROR Reading the data

Last line read was: variety id pid raw rep nloc yield lat long row column


1 variety 1 56 56 0 0 0

:

12 mu 0 1 -8 0 -1 0

ninerr2 nin.asd

Model specification: TERM LEVELS GAMMAS

mu 0 0.000

variety 0 0.000



Last line read was: variety id pid raw rep nloc yield lat long row column

12 0 0 0 1000

Finished: 10 Sep 2002 05:14:00.348 ERROR Reading the data

Fixing the error by changing !slip to !skip however still produces the faultmessage

ERROR Reading the data.The portion of output given below shows that ASReml has baulked at the name


LANCER in the first field on the first data line. This alphabetic data field is notdeclared as alphabetic. The correct data field definition for variety is

variety !A

to indicate that variety is a character field.




* ASReml 2002 *********************************************



***************************************************** ARG *


QUALIFIERS: !SKIP 1



Field 1 [LANCER] of record 1 [line 1] is not valid.

Increase the number of levels in factor variety

Fault 0 ERROR Reading the data

Last line read was: LANCER 1 NA NA 1 4 NA 4.3 1.2 1 1


1 variety 1 56 56 0 0 0

:

12 mu 0 1 -8 0 -1 0

ninerr3 variety id pid raw rep nloc yield lat


mu 0 0.000

variety 0 0.000



Last line read was: LANCER 1 NA NA 1 4 NA 4.3 1.2 1 1

12 0 0 0 1000

Finished: 10 Sep 2002 05:14:02.000 ERROR Reading the data

4. A missing comma and 5. A misspelt factor name in linear model

nin alliance trial

variety !A...

repl 4...

nin.asd !skip 1


!r Repl...

The model has been written over two linesbut ASReml does not realise this becausethe first line does not end with a comma.


The missing comma causes the error mes-sage

Variance header: SECTIONS DIMNS GSTRUCT

as ASReml tries to interpret the second line ofthe model (see Last line read) as the vari-ance header line. The .asr file is displayedbelow. Note that the data has now been successfully read as indicated by thedata summary. You should always check the data summary to ensure that thecorrect number of records have been detected and the data values match thenames appropriately.




* ASReml 2002 *********************************************



***************************************************** ARG *


QUALIFIERS: !SKIP 1





1 variety 56 Factor 1 1 28.5000 56 0 0

2 id 2 1.000 28.50 56.00 0 0

3 pid 3 1101. 2628. 4156. 0 0

4 raw 4 21.00 510.5 840.0 0 0

5 repl 4 Factor 5 1 2.5000 4 0 0

6 nloc 6 4.000 4.000 4.000 0 0

7 yield 1 Variate 7 1.050 25.53 42.00 0 0

8 lat 8 4.300 27.22 47.30 0 0

9 long 9 1.200 14.08 26.40 0 0

10 row 22 Factor 10 1 11.7321 22 0 0

11 column 11 Factor 11 1 6.3304 11 0 0


Fault 0 Variance header: SECTIONS DIMNS GSTRUCT

Last line read was: !r Repl 0 0 0 0




variety 56 0.000

mu 1 0.000



Final parameter values

Last line read was: !r Repl 0 0 0 0

12 0 242 224 1000

Finished: 10 Sep 2002 05:14:03.552 Variance header: SECTIONS DIMNS GSTRUCT

Inserting a comma on the end of the first line of the model to give

yield ∼ mu variety,!r Repl

solves that problem but produces the error message

Error reading model factor list

because Repl should have been spelt repl. Portion of the output is displayed.Since the model line is parsed before the data is read, this run failed beforereading the data.




* ASReml 2002 *********************************************



***************************************************** ARG *


QUALIFIERS: !SKIP 1


Fault 0 Error reading model factor list

Last line read was: Repl


1 variety 1 2 2 0 0 0

:

12 mu 0 1 -8 0 -1 0




Last line read was: Repl


12 0 -1 0 1000

Finished: 10 Sep 2002 05:14:05.034 Error reading model factor list

6. Misspelt factor name and 7. Wrong levels declaration in the G struc-ture definition lines

nin alliance trial...

nin.asd !skip 1


!r Repl

0 0 1

Repl 1

2 0 IDV 0.1

The next error message that ASReml presentsis

G structure header: Factor, order

indicating that there is something wrong inthe G structure definition lines. In this casethe replicate term in the first G structure def-inition line has been spelt incorrectly. To cor-rect this error replace Repl with repl.




* ASReml 2002 *********************************************



***************************************************** ARG *


QUALIFIERS: !SKIP 1





1 variety 56 Factor 1 1 28.5000 56 0 0

2 id 2 1.000 28.50 56.00 0 0

3 pid 3 1101. 2628. 4156. 0 0

4 raw 4 21.00 510.5 840.0 0 0

5 repl 4 Factor 5 1 2.5000 4 0 0

6 nloc 6 4.000 4.000 4.000 0 0

7 yield 1 Variate 7 1.050 25.53 42.00 0 0

8 lat 8 4.300 27.22 47.30 0 0

9 long 9 1.200 14.08 26.40 0 0

10 row 22 Factor 10 1 11.7321 22 0 0

11 column 11 Factor 11 1 6.3304 11 0 0



Fault 1 G structure header: Factor, order

Last line read was: Repl 1 0 0 0 0



variety 56 0.000

mu 1 0.000

repl 4 0.100

SECTIONS 224 4 1

TYPE 0 0 0

STRUCT 224 0 0 0 0 0 0




Last line read was: Repl 1 0 0 0 0

12 1 242 224 1000

Finished: 10 Sep 2002 05:14:06.556 G structure header: Factor, order

Fixing the header line, we then get the error messageStructure / Factor mismatchThis arose because repl has 4 levels but we have only declared 2 in the G structuremodel line. The G structure should readrepl 14 0 IDV 0.1The last lines of the output with this error are displayed below.

11 column 11 Factor 11 1 6.3304 11 0 0


2 identity 0.1000

Structure for repl has 2 levels defined

Fault 1 Structure / Factor mismatch

Last line read was: 2 0 IDV 0.1 0 0 0 0 0



variety 56 0.000

mu 1 0.000

repl 4 0.100

SECTIONS 224 4 1

TYPE 0 0 1002


STRUCT 224 0 0 0 0 0 0

2 1 0 5 0 1 0




Last line read was: 2 0 IDV 0.1 0 0 0 0 0

12 1 242 224 1000

Finished: 10 Sep 2002 05:14:08.069 Structure / Factor mismatch

8. A misspelt factor name in the predict statement

The final error in the job is that a factor name is misspelt in the predict statement.This is a non-fatal error. The faulty statement is simply ignored by ASReml andno .pvs file is produced. To rectify this statement correct voriety to variety.see Chapter 13

9. Forgetting mv in a spatial analysis

The first error message from running part 2 of the job is

R structures imply 0 + 242 records: only 224 exist

Checking the seventh line of the output below, we see that there were 242 recordsread but only 224 were retained for analysis. There are three reasons records aredropped.1. the !FILTER qualifier has been specified,2. the !D transformation qualifier has been specified and3. there are missing values in the response variable and the user has not specifiedthat they be estimated.The last applies here so we must change the model line to read yield ∼ muvariety mv.

:





1 variety 56 Factor 1 1 28.5000 56 0 0

:





Maybe you need to include ’mv’ in the model

Fault 1 R structures imply 0 242 records: only 224 e+

Last line read was: 22 column AR1 0.1 0 0 0 0 0



variety 56 0.000

mu 1 0.000

SECTIONS 242 3 1

STRUCT 11 1 1 4 0 1 10

22 1 1 5 0 1 11



Final parameter values 0.0000 -.10000E-360.10000

0.10000


12 1 242 224 1000

Finished: 10 Sep 2002 05:14:11.143 R structures imply 0 242 records: only

224 exist +

10. Field layout error in a spatial analysis

The final common error we highlight is the misspecification of the field layout. Inthis case we have ’accidently’ switched the levels in rows and columns. However,ASReml can detect this error because we have also asked it to sort the data intofield order. Had sorting not been requested, ASReml would not have been ableto detect that the lines of the data file were not sorted into the appropriate fieldorder and spatial analysis would be wrong.

:





1 variety 56 Factor 1 1 26.4545 56 0 0

:

10 row 22 Factor 10 1 11.5000 22 0 0

11 column 11 Factor 11 1 6.0000 11 0 0






Warning: Spatial mapping information for side 1 of order 11

ranges from 1.0 to 22.0

Warning: Spatial mapping information for side 2 of order 22

ranges from 1.0 to 11.0

Fault 2 Sorting data into field order




variety 56 0.000

mu 1 0.000

mv_estimates 18 0.000

SECTIONS 242 4 1

STRUCT 11 1 1 5 0 1 10

22 1 1 6 0 1 11



Final parameter values 0.0000 -.10000E-360.10000

0.10000


13 2 242 242 1000

Finished: 10 Sep 2002 05:14:12.655 Sorting data into field order


14.5 Information, Warning and Error messages

ASReml prints information, warning and error messages in the .asr file. Themajor information messages are in Table 14.2. A list of warning messages togetherwith the likely meaning(s) is presented in Table 14.1. Error messages with theirprobable cause(s) is presented in Table 14.3.

Table 14.1: Some information messages and comments

information message comment

Logl converged the REML log-likelihood changes between iter-ations were less than 0.002 * iteration number.

Logl converged, parameters not

converged

the change in REML log-likelihood was smalland convergence was assumed but the param-eters are, in fact, still changing.

Logl not converged the maximum number of iterations wasreached before the REML log-likelihood con-verged. Examine the sequence of estimates.You may need more iterations in which caserestart with C command line option (see Sec-tion 12.3 on job control). Otherwise restartwith more appropriate initial variances. Itmay be necessary to simplify the model andestimate the dominant components before es-timating other terms.


Table 14.2: List of warning messages and likely meaning(s)

warning message likely meaning

Warning: e missing values

generated by !^ transformation

data values should be positive.

Warning: i singularities in AI

matrix

usually means variance model is overparame-terized.

Warning: m variance structures

were modified

the structures are probably at the boundaryof the parameter space.

Warning: n missing values were

detected in the design

either use !MVINCLUDE or delete the records.

Warning: n negative weights it is better to avoid negative weights unless youcan check ASReml is doing the correct thingwith them.

Warning: r records were read

from multiple lines

check you have the intended number of fieldsper line.

WARNING term has more levels [ ##

] than expected [ ## ]:

You have probably mis-specified the numberof levels in the factor or omitted the !I qual-ifier (see Section 5.4 on data field definitionsyntax). ASReml corrects the number of lev-els.

Warning: term in the predict

!IGNORE list

the term did not appear in the model.

Warning: term in the predict

!USE list

the term did not appear in the model.

Warning: term is ignored for

prediction

terms like units and mv cannot be included inprediction.

Warning: Check if you need the

!RECODE qualifier

!RECODE may be needed if the binary file wasnot prepared with ASReml.

Warning: Code B - fixed at a

boundary (!GP)

suggest drop the term and refit the model.

Warning: Eigen analysis check of

US matrix skipped

matrix is probably OK.

WARNING: Extra lines on the end

of the input file ...:

this indicates that there are some lines on theend of the .as file that were not used. Thefirst ”extra” line is displayed. This is only aproblem if you intended ASReml to read theselines.

Warning: Fewer levels found in

termASReml increases to the correct value.

Warning: FIELD DEFINITION lines

should be INDENTED

indent them to avert this message.


List of warning messages and likely meaning(s)


Warning: Fixed levels for factor user nominated more levels than are permit-ted.

Warning: Initial gamma value is

zero

constraint parameter is probably wrongly as-signed.

Warning: Invalid argument. fix the argument.

Warning: It is usual to include

Trait

in multivariate analysis model.

Warning: LogL Converged;

Parameters Not Converged

you may need more iterations.

Warning: LogL not converged restart to do more iterations (see !CONTINUE).

Warning: Missing cells in table missing cells are normally not reported.

Warning: More levels found in

termconsider setting levels correctly.

Warning: PREDICT LINE IGNORED -

TOO MANY

limit is 100.

Warning: PREDICT statement is

being ignored

because it contains errors.

Warning: Second occurrence of

term dropped

if you really want to fit this term twice, createa copy with another name.

Warning: Spatial mapping

information for side

gives details so you can check ASReml is doingwhat you intend.

Warning: Standard errors that is, these standard errors are approximate.

Warning: SYNTAX CHANGE: text may

be invalid

use the correct syntax.

Warning: The !A qualifier

ignored when reading BINARY data

the !A fields will be treated as factors but arenot encoded.

Warning: The !SPLINE qualifier

has been redefined.

use correct syntax.

Warning: The !X !Y !G qualifiers

are ignored. There is no data to

plot

revise the qualifier arguments.

Warning: The estimation was

ABORTED

do not accept the estimates printed.

Warning: The labels for

predictions are erroneous

the labels for predicted terms are probably outof kilter. Try a simpler predict statement. Ifthe problem persists, send for help.


List of warning messages and likely meaning(s)


Warning: This US structure is

not positive definite

check the initial values.

Warning: Unrecognised qualifier

at character

the qualifier either is misspelt or is in thewrong place.

Warning: US matrix was not

positive definite: MODIFIED

the initial values were modified.

Warning: User specified spline

points

the points have been rescaled to suit the datavalues.

Warning: Variance parameters

were modified by BENDing

ASReml may not have converged to the bestestimate.

WARNING Likelihood decreased.

Check gammas and singularities.:

a common reason is that some constraints haverestricted the gammas. Add the !GU qualifierto any factor definition whose gamma valueis approaching zero (or the correlation is ap-proaching (-)1. Alternatively, more singular-ities may have been detected. You shouldidentify where the singularities are expectedand modify the data so that they are omittedor consistently detected. One possibility is tocentre and scale covariates involved in interac-tions so that their standard deviation is closeto 1.


Table 14.3: Alphabetical list of error messages and probable cause(s)

error message probable cause

Convergence failure: the program did not proceed to convergencebecause the REML log-likelihood was fluctuat-ing wildly. One possible reason is that somesingular terms in the model are not being de-tected consistently. Otherwise, the updated Gstructures are not positive definite. There aresome things to try:

– define US structures as positive definite byusing !GP,

– supply better starting values,

– fix parameters that you are confident ofwhile getting better estimates for others(that is, fix variances when estimating co-variances),

– fit a simpler model,

– reorganise the model to reduce covarianceterms (for example, use CORUH instead ofUS.)

Error reading the model factor

list:

the model specification line is in error: a vari-able is probably misnamed.

Error reading the data: the data file could not be interpreted: al-phanumeric fields need the !A qualifier.

Error reading the data file name: data file name may be wrong

Error opening the file: the named file did not exist or was of the wrongfile type (binary = unformatted, sequential).

Error structures are wrong size: the declared size of the error structures doesnot match the actual number of data records.

Error in R structure: the error model is not correctly specified.

FORMAT error reading factor

Definitions:

Likely causes are

– bad syntax or invalid characters in the vari-able labels; variable labels must not includeany of these symbols; !|-+(:#$ and .,

– the data file name is misspelt,

– there are too many variables declared orthere is no valid value supplied with anarithmetic transformation option.

Getting Pedigree: an error occurred processing the pedigree. Thepedigree file must be ascii, free format withANIMAL, SIRE and DAM as the first three fields.


Alphabetical list of error messages and probable cause(s)


G-structure header: Factor

order:

there is a problem reading G structure headerline. The line must contain the name of a termin the linear model spelt exactly as it appearsin the model

G structure: ORDER 0 MODEL

GAMMAS:

a G structure line cannot be interpreted.

Insufficient knots for : there must be at least 3 distinct data values

Invalid factor in model: a term in the model specification is not amongthe terms that have been defined. Check thespelling.

Invalid model factor ... : there is a problem with the named variable.

Invalid definition of a factor: there is a problem with forming one of thegenerated factors. The most probable causeis that an interaction cannot be formed.

Invalid weight/filter column

number:

the weight and filter columns must be datafields. Check the data summary.

Invalid factor definition: a term in the model has no levels

Missing values not allowed here: missing observations have been dropped sothat multivariate structure is messed up.

Maximum number of special

structures exceeded

special structures are weights, the Ainverseand GIV structures. The limit is 8 and so nomore than 6 GIV structures can be defined.

Negative Sum of Squares: This is typically caused by negative varianceparameters; try changing the starting valuesor using the !STEP option. If the problem oc-curs after several iterations it is likely that thevariance components are very small. Try sim-plifying the model. In multivariate analyses itarises if the error variance is (becomes) nega-tive definite. Try specifying !GP on the struc-ture line for the error variance.

NFACT out of range: too many terms are being defined.

Increase workspace with -Ss

command line option:

on some systems the command line option -S3

may help. If the data set is not extremely big,check the data summary.

Out of memory absorbing: use -S option to increase the memory alloca-tion. It may be possible to revise the modelsto increase sparsity.




Out of memory: forming design: factors are probably not declared properly.Check the number of levels. Possibly use the-S command line option.

Overflow structure table: occurs when space allocated for the structuretable is exceeded. There is room for threestructures for each model term for wich Gstructures are explicitly declared. The errormight occur when ASReml needs to constructrows of the table for structured terms whenthe user has not formally declared the struc-tures. Increasing g on the variance header linefor the number of G structures (see page 89)will increase the space allocated for the table.You will need to add extra explicit declara-tions also.

No residual variation: after fitting the model, the residual variationis essentially zero, that is, the model fully ex-plains the data. If this is intended, use the!BLUP 1 qualifier so that you can see the es-timates. Otherwise check that the dependentvalues are what you intend and then identifywhich variables explain it. Again, the !BLUP

1 qualifier might help.

PROGRAM failed in AIABS: increase memory.

Pedigree coding errors: check the pedigree file and see any messages inthe output. Check that identifiers and pedi-grees are in chronological order.

Pedigree factor has wrong size: the A-inverse factors are not the same size asthe A-inverse. Delete the ainverse.bin file andrerun the job.

Pedigree too big: program will need to be recompiled if file iscorrect.

Reading factor names: something is wrong in the terms definitions. Itcould also be that the data file is misnamed.

Reading the data: if you read less data than you expect, thereare two likely explanations. First, the datafile has less fields than implied by the datastructure definitions (you will probably readhalf the expected number). Second, there isan alphanumeric field where a numeric field isexpected.




Reading Update step size: check the !STEP qualifier argument.

Variance header: SEC DIM

GSTRUCT:

error with the variance header line. Often,some other error has meant that the wrongline is being interpreted as the variance headerline. Commonly, the model is written over sev-eral lines but the incomplete lines do not allend with a comma.

R structure error ORDER SORTCOL

MODEL GAMMAS:

an error reading the error model.

Segmentation fault: this is a Unix memory error. The first thing totry is to increase the memory workspace usingthe -S (see Section 12.3 on memory) commandline option. Otherwise you may need to sendyour data and the .as files to the Help Deskfor debugging.

Sorting the data into field

order:

the field order coding in the spatial errormodel does not generate a complete grid withone observation in each cell; missing valuesmay be deleted: they should be fitted. Alsomay be due to incorrect specification of num-ber of rows or columns.

STOP SCRATCH FILE DATA STORAGE

ERROR:

ASReml attempts to hold the data on a scratchfile. Check that the disk partition where thescratch files might be written is not too full;use the !NOSCRATCH qualifier to avoid thesescratch files.

Structure/ Factor mismatch: the declared size of a variance structure doesnot match the size of the model term that itis associated with.

Too BIG! Use -S option: memory overflow. Simplify the model, checkfactor definitions and increase memory usingthe -S command line option (see Section 12.3on memory).

Too many alphanumeric factor

level labels:

if the factor level labels are actually all inte-gers, use the !I option instead. Otherwise,you will have to convert a factor with alphanu-meric labels to numeric sequential codes ex-ternal to ASReml so that an !A option can beavoided.




Unable to invert R or G [US?]

matrix:

this message occurs when there is an errorforming the inverse of a variance structure.The probable cause is a non positive definite(initial) variance structure (US, CHOL and ANTE

models). It may also occur if an identity by un-structured (ID⊗US) error variance model is notspecified in a multivariate analysis (including!ASMV), see Chapter 8. If the failure is on thefirst iteration, the problem is with the startingvalues. If on a subsequent iteration, the up-dates have caused the problem. You could tryreducing the updates by using the !STEP qual-ifier. Otherwise, you could try fitting an al-ternative parameterisation. The CORGH modelmay be more stable than the US model.

Unable to invert R or G [CORR?]

matrix:

generally refers to a problem setting up themixed model equations. Most commonly, it iscaused by a non positive definite matrix.

15 Examples

Introduction

Split plot design - Oats

Unbalanced nested design - Rats

Source of variability in unbalanced data - Volts

Balanced repeated measures - Height

Spatial analysis of a field experiment - Barley

Unreplicated early generation variety trial - Wheat

Paired Case-Control study - Rice

Standard analysisA multivariate approachInterpretation of results

Balanced longitudinal data - Oranges

Initial analysesRandom coefficients and cubic smoothing splines

Multivariate animal genetics data - Sheep

Half-sib analysisAnimal model

193

15 Examples 194

15.1 Introduction

In this chapter we present the analysis of a variety of examples. The primaryaim is to illustrate the capabilities of ASReml in the context of analysing realdata sets. We also discuss the output produced by ASReml and indicate whenproblems may occur. Statistical concepts and issues are discussed as necessarybut we stress that the analyses are illustrative, not prescriptive.

15.2 Split plot design - Oats

The first example involves the analysis of a split plot design originally presentedby Yates (1935). The experiment was conducted to assess the effects on yieldof three oat varieties (Golden Rain, Marvellous and Victory) with four levels ofnitrogen application (0, 0.2, 0.4 and 0.6 cwt/acre). The field layout consisted ofsix blocks (labelled I, II, III, IV, V and VI) with three whole-plots per block eachsplit into four sub-plots. The three varieties were randomly allocated to the threewhole-plots while the four levels of nitrogen application were randomly assignedto the four sub-plots within each whole-plot. The data is presented in Table 15.1.

Table 15.1 A split-plot field trial of oat varieties and nitrogen application

nitrogenblock variety 0.0cwt 0.2cwt 0.4cwt 0.6cwt

GR 111 130 157 174I M 117 114 161 141

V 105 140 118 156GR 61 91 97 100

II M 70 108 126 149V 96 124 121 144GR 68 64 112 86

III M 60 102 89 96V 89 129 132 124GR 74 89 81 122

IV M 64 103 132 133V 70 89 104 117GR 62 90 100 116

V M 80 82 94 126V 63 70 109 99GR 53 74 118 113

VI M 89 82 86 104V 97 99 119 121

A standard analysis of these data recognises the two basic elements inherent inthe experiment. These two aspects are firstly the stratification of the experimentunits, that is the blocks, whole-plots and sub-plots, and secondly, the treatment

15 Examples 195

structure that is superimposed on the experimental material. The latter is ofprime interest, in the presence of stratification. Thus the aim of the analysisis to examine the importance of the treatment effects while accounting for thestratification and restricted randomisation of the treatments to the experimentalunits. The ASReml input file is presented below.

split plot example

blocks 6 # Coded 1...6 in first data field of oats.asd

nitrogen !A 4 # Coded alphabetically

subplots * # Coded 1...4

variety !A 3 # Coded alphabetically

wplots * # Coded 1...3

yield

oats.asd !SKIP 2

yield ~ mu variety nitrogen variety.nitrogen !r blocks blocks.wplots

predict nitrogen # Print table of predicted nitrogen means

predict variety

predict variety nitrogen !SED

The data fields were blocks, wplots, subplots, variety, nitrogen and yield.The first five variables are factors that describe the stratification or experi-ment design and treatments. The standard split plot analysis is achieved byfitting the model terms blocks and blocks.wplots as random effects. Theblocks.wplots.subplots term is not listed in the model because this interac-tion corresponds to the experimental units and is automatically included as theresidual term. The fixed effects include the main effects of both variety andnitrogen and their interaction. The tables of predicted means and associatedstandard errors of differences (SEDs) have been requested. These are reported inthe .pvs file. Abbreviated output is shown below.


blocks 6 6 1.21116 214.477 1.27 0 P

blocks.wplots 18 18 0.598937 106.062 1.56 0 P

Variance 72 60 1.00000 177.083 4.74 0 P


4 variety 2 1.49

2 nitrogen 3 37.69

8 variety.nitrogen 6 0.30

For simple variance component models such as the above, the default parame-terisation for the variance component parameters is as the ratio to the residual

15 Examples 196

variance. Thus ASReml prints the variance component ratio and variance com-ponent for each term in the random model in the columns labelled Gamma andComponent respectively.

The analysis of variance (ANOVA) is printed below this summary. The usualdecomposition has three strata, with treatment effects separating into differentstrata as a consequence of the balanced design and the allocation of varietyto whole-plots. In this balanced case, it is straightforward to derive the ANOVA

estimates of the stratum variances from the above REML estimates of the variancecomponents. That is

blocks = 12σ2b + 4σ2

w + σ2 = 3175.1

blocks.wplots = 4σ2w + σ2 = 601.3

residual = σ2 = 177.1

The default output for testing fixed effects used by ASReml is a table of so-called incremental F-tests. These F-tests are described in Section 6.10. Thestatistics are simply the appropriate Wald test statistics divided by the numberof estimable effects for that term. In this example there are three terms includedin the summary. The overall mean (denoted by mu) is of no interest for these data.The tests are sequential, that is the effect of each term is assessed by the changein sums of squares achieved by adding the term to the current model, defined bythe model which includes those terms appearing above the current term giventhe variance parameters. For example, the test of nitrogen is calculated fromthe change in sums of squares for the two models mu variety nitrogen and muvariety. No refitting occurs, that is the variance parameters are held constantat the REML estimates obtained from the currently specified fixed model.

The incremental Wald statistics have an asymptotic χ2 distribution, with degreesof freedom (df) given by the number of estimable effects (the number in the DFcolumn). In this example, the incremental F-tests are numerically the same asthe ANOVA F-tests, however ASReml currently has no facility for automaticallydetermining the appropriate denominator df for testing fixed effects. This is asimple problem for balanced designs, such as the split plot design, but it is notstraightforward to determine the relevant denominator df in unbalanced designs,such as the rat data set described in the next section.

Tables of predicted means are presented for the nitrogen, variety, and variety bynitrogen tables in the .pvs file. The qualifier !SED has been used on the thirdpredict statement and so the matrix of SEDs for the variety by nitrogen table

15 Examples 197

is printed. For the first two predictions, the average SED is calculated from theaverage variance of differences. Note also that the order of the predictions (e.g.0.6 cwt, 0.4 cwt 0.2 cwt 0 cwt for nitrogen) is simply the order those treatmentlabels were discovered in the data file.

Split plot analysis - oat Variety.Nitrogen 26 Aug 2002 00:26:33

oats


---- ---- ---- ---- ---- ---- ---- 1 ---- ---- ---- ---- ---- ---- ---- ----

variety is averaged over fixed levels


blocks is ignored in the prediction (except where specifically included

wplots is ignored in the prediction (except where specifically included

nitrogen Predicted_Value Standard_Error Ecode

0.6_cwt 123.3889 7.1747 E

0.4_cwt 114.2222 7.1747 E

0.2_cwt 98.8889 7.1747 E

0_cwt 79.3889 7.1747 E


---- ---- ---- ---- ---- ---- ---- 2 ---- ---- ---- ---- ---- ---- ---- ----

nitrogen is averaged over fixed levels





Marvellous 109.7917 7.7975 E

Victory 97.6250 7.7975 E

Golden_rain 104.5000 7.7975 E


---- ---- ---- ---- ---- ---- ---- 3 ---- ---- ---- ---- ---- ---- ---- ----




variety nitrogen Predicted_Value Standard_Error Ecode

Marvellous 0.6_cwt 126.8333 9.1070 E

Victory 0.6_cwt 118.5000 9.1070 E

Golden_rain 0.6_cwt 124.8333 9.1070 E

Marvellous 0.4_cwt 117.1667 9.1070 E

Victory 0.4_cwt 110.8333 9.1070 E

Golden_rain 0.4_cwt 114.6667 9.1070 E

Marvellous 0.2_cwt 108.5000 9.1070 E

Victory 0.2_cwt 89.6667 9.1070 E

15 Examples 198

Golden_rain 0.2_cwt 98.5000 9.1070 E

Marvellous 0_cwt 86.6667 9.1070 E

Victory 0_cwt 71.5000 9.1070 E

Golden_rain 0_cwt 80.0000 9.1070 E

Predicted values with SED(PV)

126.833

118.500 9.71503

124.833 9.71503 9.71503

117.167 7.68295 9.71503 9.71503

110.833 9.71503 7.68295 9.71503 9.71503

114.667 9.71503 9.71503 7.68295 9.71503

9.71503

108.500 7.68295 9.71503 9.71503 7.68295

9.71503 9.71503

89.6667 9.71503 7.68295 9.71503 9.71503

7.68295 9.71503 9.71503

98.5000 9.71503 9.71503 7.68295 9.71503

9.71503 7.68295 9.71503 9.71503

86.6667 7.68295 9.71503 9.71503 7.68295

9.71503 9.71503 7.68295 9.71503 9.71503

71.5000 9.71503 7.68295 9.71503 9.71503

7.68295 9.71503 9.71503 7.68295 9.71503

9.71503

80.0000 9.71503 9.71503 7.68295 9.71503

9.71503 7.68295 9.71503 9.71503 7.68295

9.71503 9.71503

SED: Min 7.6830 Mean 9.1608 Max 9.7150

15.3 Unbalanced nested design - Rats

The second example we consider is a data set which illustrates some furtheraspects of testing fixed effects in linear mixed models. This example differs fromthe split plot example, as it is unbalanced and so more care is required in assessingthe significance of fixed effects.

The experiment was reported by Dempster et al. (1984) and was designed tocompare the effect of three doses of an experimental compound (control, lowand high) on the maternal performance of rats. Thirty female rats (dams) wererandomly split into three groups of 10 and each group randomly assigned to thethree different doses. All pups in each litter were weighed. The litters differed intotal size and in the numbers of males and females. Thus the additional covariate,littersize was included in the analysis. The differential effect of the compoundon male and female pups was also of interest. Three litters had to be droppedfrom experiment, which meant that one dose had only 7 dams. The analysis

15 Examples 199

must account for the presence of between dam variation, but must also recognisethe stratification of the experimental units (pups within litters) and that dosesand littersize belong to the dam stratum. Table 15.2 presents an indicative AOV

decomposition for this experiment.

Table 15.2 Rat data: AOV decomposition

stratum decomposition type df or ne

constant 1 F 1dams

dose F 2littersize F 1dam R 27

dams.pupssex F 1dose.sex F 2

error R

The dose and littersize effects are tested against the residual dam variation, whilethe remaining effects are tested against the residual within litter variation. TheASReml input to achieve this analysis is presented below.

Rats example

dose 3 !A

sex 2 !A

littersize

dam 27

pup 18

weight

rats.asd !DOPART 1 # Change DOPART argument to select each PART

!PART 1

weight ~ mu littersize dose sex dose.sex !r dam

!PART 2

weight ~ mu littersize sex dose !r dam

!PART 3

weight ~ mu sex dose littersize !r dam

!PART 4

weight ~ mu littersize dose sex !r dam

15 Examples 200

The input file contains an example of the use of the !DOPART qualifier. Its ar-gument specifies which part to execute. We will discuss the models in the fourparts. The full model is fitted in part 1. Abbreviated output from this analysisis presented below.

LogL= 74.2174 S2= 0.19670 315 df 0.1000 1.000

LogL= 79.1577 S2= 0.18751 315 df 0.1488 1.000

LogL= 83.9407 S2= 0.17755 315 df 0.2446 1.000

LogL= 86.8093 S2= 0.16903 315 df 0.4254 1.000

LogL= 87.2249 S2= 0.16594 315 df 0.5521 1.000

LogL= 87.2398 S2= 0.16532 315 df 0.5854 1.000

LogL= 87.2398 S2= 0.16530 315 df 0.5867 1.000



dam 27 27 0.586674 0.969769E-01 2.92 0 P

Variance 322 315 1.00000 0.165299 12.09 0 P


3 littersize 1 27.99

1 dose 2 12.15

2 sex 1 57.96

8 dose.sex 2 0.40

The iterative sequence has converged and the variance component parameterfor dam hasn’t changed for the last two iterations. The incremental Wald testsindicate that the interaction between dose and sex is not significant. Sincethese tests are sequential, the test for the dose.sex term is appropriate as itrespects marginality with both the main effects of dose and sex fitted before theinteraction.

Before proceeding we note the .res file reports several possible outliers, in partic-ular unit 66 has a standardised residual of -8.80 (see Figure 15.1). The weight ofthis female rat, within litter 9 is only 3.68, compared to weights of 7.26 and 6.58for two other female sibling pups. This weight appears erroneous, but withoutknowledge of the actual experiment we retain the observation in the following.

15 Examples 201

Rats example Residuals vs Fitted values Residuals (Y) −3.02: 1.22 Fitted values (X) 5.04: 7.63

Figure 15.1 Residual plot for the rat data

We refit the model without the dose.sex term. Note that the variance param-eters are re-estimated, though there is little change from the previous analysis.The output below presents the incremental Wald tests for this analysis.


dam 27 27 0.595157 0.979179E-01 2.93 0 P

Variance 322 317 1.00000 0.164524 12.13 0 P


7 mu 1 8981.48

3 littersize 1 27.85

2 sex 1 59.54

1 dose 2 11.42

The three remaining fixed effects appear significant, though an additional twomodels (part 3 and part 4) must be fitted to properly test the fixed effects in themodel. Table 15.3 presents the summary of these analyses.

The second column of F-tests pertains to a model which (wrongly) excludes thedam term. The impact of deleting the dam term on the F-tests for the fixed effectsis substantial and not surprising. This example reinforces the importance ofpreserving the strata of the design when assessing the significance of fixed effects.

15 Examples 202

Table 15.3: Summary of F-tests of fixed effects adjusted for the other factors for therat data

F-testterm df with dam without dam

littersize 1 46.43 146.50dose 1 11.42 58.43sex 1 58.27 24.52

15.4 Source of variability in unbalanced data - Volts

In this example we illustrate an analysis of unbalanced data in which the mainaim is to determine the sources of variation rather than assess the significance ofimposed treatments. The data are taken from Cox and Snell (1981) and involve anexperiment to examine the variability in the production of car voltage regulators.Standard production of regulators involves two steps. Regulators are taken fromthe production line and passed to a setting station which adjusts the regulator tooperate within a specified range of voltage. From the setting station the regulatoris then passed to a testing station where it is tested and returned if outside therequired range.

A total of 64 regulators were tested at four testing stations (teststat). Thevoltage for individual regulators was set at a total of 10 setting stations (setstat).A variable number of regulators (between 4 to 8) were set at each station, howevereach regulator was tested at every testing station. The ASReml input file ispresented below

Voltage data

teststat 4 # 4 testing stations tested each regulator

setstat !A # 10 setting stations each set 4-8 regulators

regulatr 8 # regulators numbered within setting stations

voltage

5-3-6.asd !skip 1

voltage ~ mu !r setstat setstat.regulatr teststat setstat.teststat

0 0 0

The factor regulatr numbers the regulators within each setting station. Thusthe term setstat.regulatr allows for differential effects of each regulator, while

15 Examples 203

the other terms examine the effects of the setting and testing stations and possibleinteraction. The abbreviated output is given below

LogL= 188.604 S2= 0.67074E-01 255 df

LogL= 199.530 S2= 0.59303E-01 255 df

LogL= 203.007 S2= 0.52814E-01 255 df

LogL= 203.240 S2= 0.51278E-01 255 df

LogL= 203.242 S2= 0.51141E-01 255 df

LogL= 203.242 S2= 0.51140E-01 255 df


setstat 10 10 0.233418 0.119371E-01 1.35 0 P

setstat.regulatr 80 64 0.601817 0.307771E-01 3.64 0 P

teststat 4 4 0.642752E-01 0.328706E-02 0.98 0 P

setstat.teststat 40 40 0.100000E-08 0.511404E-10 0.00 0 B

Variance 256 255 1.00000 0.511404E-01 9.72 0 P

Warning: Code B - fixed at a boundary (!GP) F - fixed by user

? - liable to change from P to B P - positive definite

C - Constrained by user (!VCC) U - unbounded

S - Singular Information matrix

The convergence criteria has been satisfied after six iterations. A warning messagein printed below the summary of the variance components because the variancecomponent for the setstat.teststat term has been fixed near the boundary.The default constraint for variance components (!GP) is to ensure that the REML

estimate remains positive. Under this constraint, if an update for any variancecomponent results in a negative value then ASReml sets that variance componentto a small positive value. If this occurs in subsequent iterations the parameter isfixed to a small positive value and the code B replaces P in the C column of thesummary table. The default constraint can be overridden using the !GU qualifier,but it is not generally recommended for standard analyses.

Figure 15.2 presents the residual plot which indicates two unusual data values.These values are successive observations, namely observation 210 and 211, beingtesting stations 2 and 3 for setting station 9(J), regulator 2. These observa-tions will not be dropped from the following analyses for consistency with otheranalyses conducted by Cox and Snell (1981) and in the GENSTAT manual.

The REML log-likelihood from the model without the setstat.teststat termwas 203.242, the same as the REML log-likelihood for the previous model. Ta-ble 15.4 presents a summary of the REML log-likelihood ratio for the remainingterms in the model. The summary of the ASReml output for the current model isgiven below. The column labelled Comp/SE is printed by ASReml to give a guide

15 Examples 204

ltage example 5−3−6 from the GENSTAT REML manual Residuals vs Fitted valu Residuals (Y) −1.08: 1.45 Fitted values (X) 15.56: 16.81

Figure 15.2 Residual plot for the voltage data

as to the significance of the variance component for each term in the model. Thestatistic is simply the REML estimate of the variance component divided by thesquare root of the diagonal element (for each component) of the inverse of theaverage information matrix. The diagonal elements of the expected (not the av-erage) information matrix are the asymptotic variances of the REML estimatesof the variance parameters. These Comp/SE statistics cannot be used to test thenull hypothesis that the variance component is zero. If we had used this crudemeasure then the conclusions would have been inconsistent with the conclusionsobtained from the REML log-likelihood ratio (see Table 15.4).


setstat 10 10 0.233417 0.119370E-01 1.35 0 P

setstat.regulatr 80 64 0.601817 0.307771E-01 3.64 0 P

teststat 4 4 0.642752E-01 0.328705E-02 0.98 0 P

Variance 256 255 1.00000 0.511402E-01 9.72 0 P

15.5 Balanced repeated measures - Height

The data for this example is taken from the GENSTAT manual. It consists ofa total of 5 measurements of height (cm) taken on 14 plants. The 14 plantswere either diseased or healthy and were arranged in a glasshouse in a completely

15 Examples 205

Table 15.4: REML log-likelihood ratio for the variance components in the voltagedata

REML −2×terms log-likelihood difference P-value

− setstat 200.31 5.864 .0077− setstat.regulatr 184.15 38.19 .0000− teststat 199.71 7.064 .0039

random design. The heights were measured 1, 3, 5, 7 and 10 weeks after theplants were placed in the glasshouse. There were 7 plants in each treatment. Thedata are depicted in Figure 15.3 obtained by qualifier line!Y y1 !G tmt !JOINin the following multivariate ASReml job.

This is plant data multivariate Y=y1 X=Traits G=tmt

1 2

Y−axis: 21.0000 130.5000 X−axis: 0.5000 5.5000

Figure 15.3 Trellis plot of the height for each of 14 plants

In the following we illustrate how various repeated measures analyses can beconducted in ASReml. For these analyses it is convenient to arrange the datain a multivariate form, with 7 fields representing the plant number, treatmentidentification and the 5 heights. The ASReml input file, up to the specificationof the R structure is given by

15 Examples 206

This is plant data multivariate

tmt !A # Diseased Healthy

plant 14

y1 y3 y5 y7 y10

plantmult.asd !skip 1 !ASUV

The focus is modelling of the error variance for the data. Specifically we fit themultivariate regression model given by

Y = DT + E (15.1)

where Y 14×5 is the matrix of heights, D14×2 is the design matrix, T 2×5 is thematrix of fixed effects and E14×5 is the matrix of errors. The heights taken onthe same plants will be correlated and so we assume that

var (vec(E)) = I14 ⊗Σ (15.2)

where Σ5×5 is a symmetric positive definite matrix.

The variance models used for Σ are given in Table 15.5. These represent somecommonly used models for the analysis of repeated measures data (see Wolfinger,1986). The variance models are fitted by changing the last four lines of the inputfile. The sequence of commands for the first model fitted is given below

y1 y3 y5 y7 y10 ~ Trait tmt Tr.tmt !r units

1 2 0

14

Tr

Table 15.5 Summary of variance models fitted to the plant data

number of REMLmodel parameters log-likelihood BIC

Uniform 2 -196.88 401.95Power 2 -182.98 374.15Heterogeneous Power 6 -171.50 367.57Antedependence (order 1) 9 -160.37 357.51Unstructured 15 -158.04 377.50

15 Examples 207

The split plot in time model can be fitted in two ways, either by fitting a unitsterm plus an independent residual as above, or by specifying a CORU variancemodel for the R-structure as follows

y1 y3 y5 y7 y10 ~ Trait tmt Tr.tmt

1 2 0

14

Tr 0 CORU .5

The two forms for Σ are given by

Σ = σ21J + σ2

2I, unitsΣ = σ2

eI + σ2eρ(J − I), CORU

(15.3)

It follows that

σ2e = σ2

1 + σ22

ρ = σ21

σ21+σ2

2

(15.4)

Portions of the two outputs are given below. The REML log-likelihoods for thetwo models are the same and it is easy to verify that the REML estimates ofthe variance parameters satisfy (15.4), viz. σ2

e = 286.310 ≈ 159.858 + 126.528 =286.386; 159.858/286.386 = 0.558191.

#

#

# !r units

#

LogL=-204.593 S2= 224.61 60 df 0.1000 1.000

LogL=-201.233 S2= 186.52 60 df 0.2339 1.000

LogL=-198.453 S2= 155.09 60 df 0.4870 1.000

LogL=-197.041 S2= 133.85 60 df 0.9339 1.000

LogL=-196.881 S2= 127.56 60 df 1.204 1.000

LogL=-196.877 S2= 126.53 60 df 1.261 1.000



units 14 14 1.26342 159.858 2.11 0 P

Variance 70 60 1.00000 126.528 4.90 0 P

#

# CORU

#

LogL=-196.975 S2= 264.10 60 df 1.000 0.5000

LogL=-196.924 S2= 270.14 60 df 1.000 0.5178

LogL=-196.886 S2= 278.58 60 df 1.000 0.5400

15 Examples 208

LogL=-196.877 S2= 286.23 60 df 1.000 0.5580

LogL=-196.877 S2= 286.31 60 df 1.000 0.5582



Variance 70 60 1.00000 286.310 3.65 0 P

Residual CORRelat 5 0.558191 0.558191 4.28 0 U

A more realistic model for repeated measures data would allow the correlationsto decrease as the lag increases such as occurs with the first order autoregressivemodel. However, since the heights are not measured at equally spaced time pointswe use the EXP model. The correlation function is given by

ρ(u) = φu

where u is the time lag is weeks. The coding for this is


1 2 0 # One error structure in two dimensions

14 # Outer dimension: 14 plants

Tr 0 EXP .5

1 3 5 7 10 # Time coordinates

A portion of the output is given below

LogL=-183.734 S2= 435.58 60 df 1.000 0.9500

LogL=-183.255 S2= 370.40 60 df 1.000 0.9388

LogL=-183.010 S2= 321.50 60 df 1.000 0.9260

LogL=-182.980 S2= 298.84 60 df 1.000 0.9179

LogL=-182.979 S2= 302.02 60 df 1.000 0.9192



Variance 70 60 1.00000 302.021 3.11 0 P

Residual POW-EXP 5 0.918971 0.918971 29.53 0 U

When fitting power models be careful to ensure the scale of the defining variate,here time, does not result in an estimate of φ too close to 1. For example, use ofdays in this example would result in an estimate for φ of about .993.

The residual plot from this analysis is presented in Figure 15.4. This suggestsincreasing variance over time. This can be modelled by using the EXPH model,

15 Examples 209

Residuals plotted against Row and Column position: 1Range: −45.11 34.86

Figure 15.4 Residual plots for the EXP variance model for the plant data

which models Σ byΣ = D0.5CD0.5

where D is a diagonal matrix of variances and C is a correlation matrix withelements given by cij = φ|ti−tj |. The coding for this is


1 2 0

14 !S2==1

Tr 0 EXPH .5 100 200 300 300 300

1 3 5 7 10

Note that it is necessary to fix the scale parameter to 1 (!S2==1) to ensure thatthe elements of D are identifiable. Abbreviated output from this analysis is givenbelow.

1 LogL=-195.598 S2= 1.0000 60 df : 1 components constrained

2 LogL=-179.036 S2= 1.0000 60 df

3 LogL=-175.483 S2= 1.0000 60 df

4 LogL=-173.128 S2= 1.0000 60 df

5 LogL=-171.980 S2= 1.0000 60 df

6 LogL=-171.615 S2= 1.0000 60 df

7 LogL=-171.527 S2= 1.0000 60 df

15 Examples 210

8 LogL=-171.504 S2= 1.0000 60 df

9 LogL=-171.498 S2= 1.0000 60 df

10 LogL=-171.496 S2= 1.0000 60 df








Covariance/Variance/Correlation Matrix POWER

61.11 0.8227 0.6769 0.5569 0.4156

54.88 72.80 0.8227 0.6769 0.5051

93.12 123.5 309.7 0.8227 0.6140

91.02 120.7 302.7 437.1 0.7462

63.57 84.34 211.4 305.3 382.9


8 Trait 5 127.95

1 tmt 1 0.00

9 Tr.tmt 4 4.75

The last two models we fit are the antedependence model of order 1 and theunstructured model. These require, as starting values the lower triangle of thefull variance matrix. We use the REML estimate of Σ from the heterogeneouspower model shown in the previous output. The antedependence model modelsΣ by the inverse cholesky decomposition

Σ−1 = UDU ′

where D is a diagonal matrix and U is a unit upper triangular matrix. Foran antedependence model of order q, then uij = 0 for j > i + q − 1. Theantedependence model of order 1 has 9 parameters for these data, 5 in D and 4in U . The input is given by


1 2 0

14 !S2==1

Tr 0 ANTE

60.16

54.65 73.65

91.50 123.3 306.4

89.17 120.2 298.6 431.8

62.21 83.85 208.3 301.2 379.8

15 Examples 211

The abbreviated output file is given below

1 LogL=-171.501 S2= 1.0000 60 df

2 LogL=-170.097 S2= 1.0000 60 df

3 LogL=-166.085 S2= 1.0000 60 df

4 LogL=-161.335 S2= 1.0000 60 df

5 LogL=-160.407 S2= 1.0000 60 df

6 LogL=-160.370 S2= 1.0000 60 df

7 LogL=-160.369 S2= 1.0000 60 df


Residual ANTE=UDU 1 0.268657E-01 0.268657E-01 2.44 0 U

Residual ANTE=UDU 1 -0.628413 -0.628413 -2.55 0 U








Covariance/Variance/Correlation Matrix ANTE=UDU’

37.20 0.5946 0.3549 0.3114 0.3040

23.38 41.55 0.5968 0.5237 0.5112

34.83 61.89 258.9 0.8775 0.8565

44.58 79.22 331.4 550.8 0.9761

43.14 76.67 320.7 533.0 541.4


8 Trait 5 188.84

1 tmt 1 4.14

9 Tr.tmt 4 3.91

The iterative sequence converged and the antedependence parameter estimatesare printed successively for each time, the element of D and then the row of U .I.e.

D = diag

0.02690.03730.00600.00790.0391

,U =

1 −0.6284 0 0 00 1 −1.4911 0 00 0 1 −1.2804 00 0 0 1 −0.96780 0 0 0 1

.

Finally the input and output files for the unstructured model are presented below.The REML estimate of Σ from the ANTE model is used to provide starting values.

15 Examples 212

###########

#input

###########


1 2 0

14 !S2==1

Tr 0 US

37.20

23.38 41.55

34.83 61.89 258.9

44.58 79.22 331.4 550.8

43.14 76.67 320.7 533.0 541.4

############

# output

############

1 LogL=-160.368 S2= 1.0000 60 df

2 LogL=-159.027 S2= 1.0000 60 df

3 LogL=-158.247 S2= 1.0000 60 df

4 LogL=-158.040 S2= 1.0000 60 df

5 LogL=-158.036 S2= 1.0000 60 df


Residual US=UnStr 1 37.2262 37.2262 2.45 0 U















Covariance/Variance/Correlation Matrix US=UnStructu

37.23 0.5950 0.5259 0.4942 0.5194

23.39 41.52 0.5969 0.3807 0.4170

51.65 61.92 259.1 0.8777 0.8827

70.81 57.61 331.8 551.5 0.9761

73.79 62.57 330.9 533.8 542.2

The antedependence model of order 1 is clearly more parsimonious than theunstructured model. Table 15.6 presents the incremental Wald tests for each ofthe variance models. There is a surprising level of discrepancy between modelsfor the Wald tests. The main effect of treatment is significant for the uniform,power and antedependence models.

15 Examples 213

Table 15.6: Summary of Wald test for fixed effects for variance models fitted to theplant data

treatment treatment.timemodel (df=1) (df=4)

Uniform 9.41 5.10Power 6.86 6.13Heterogeneous power 0.00 4.81Antedependence (order 1) 4.14 3.96Unstructured 1.71 4.46

15.6 Spatial analysis of a field experiment - Barley

In this section we illustrate the ASReml syntax for performing spatial and in-complete block analysis of a field experiment. There has been a large amountof interest in developing techniques for the analysis of spatial data both in thecontext of field experiments and geostatistical data (see for example, Cullis andGleeson, 1991; Cressie, 1991; Gilmour et al., 1997). This example illustrates theanalysis of ’so-called’ regular spatial data, in which the data is observed on alattice or regular grid. This is typical of most small plot designed field exper-iments. Spatial data is often irregularly spaced, either by design or because ofthe observational nature of the study. The techniques we present in the followingcan be extended for the analysis of irregularly spaced spatial data, though, largerspatial data sets may be computationally challenging, depending on the degreeof irregularity or models fitted.

The data we consider is taken from Gilmour et al. (1995) and involves a fieldexperiment designed to compare the performance of 25 varieties of barley. Theexperiment was conducted at Slate Hall Farm, UK in 1976, and was designed asa balanced lattice square with replicates laid out as shown in Table 15.7. Thedata fields were Rep, RowBlk, ColBlk, row, column and yield. Lattice rowand column numbering is typically within replicates and so the terms specified inthe linear model to account for the lattice row and lattice column effects wouldbe Rep.latticerow Rep.latticecolumn. However, in this example lattice rowsand columns are both numbered from 1 to 30 across replicates (see Table 15.7).The terms in the linear model are therefore simply RowBlk ColBlk. Additionalfields row and column indicate the spatial layout of the plots.

15 Examples 214

The ASReml input file is presented below. Three models have been fitted to thesedata. The lattice analysis is included for comparison in PART 3. In PART 1 weuse the separable first order autoregressive model to model the variance structureof the plot errors. Gilmour et al. (1997) suggest this is often a useful model tocommence the spatial modelling process. The form of the variance matrix for theplot errors (R structure) is given by

σ2Σ = σ2(Σc ⊗Σr) (15.5)

where Σc and Σr are 15 × 15 and 10 × 10 matrix functions of the column (φc)and row (φr) autoregressive parameters respectively.

Table 15.7 Field layout of Slate Hall Farm experiment

Column - Replicate levelsRow 1 2 3 4 5 6 7 8 9 10 11 12 13 14 151 1 1 1 1 1 2 2 2 2 2 3 3 3 3 32 1 1 1 1 1 2 2 2 2 2 3 3 3 3 33 1 1 1 1 1 2 2 2 2 2 3 3 3 3 34 1 1 1 1 1 2 2 2 2 2 3 3 3 3 35 1 1 1 1 1 2 2 2 2 2 3 3 3 3 36 4 4 4 4 4 5 5 5 5 5 6 6 6 6 67 4 4 4 4 4 5 5 5 5 5 6 6 6 6 68 4 4 4 4 4 5 5 5 5 5 6 6 6 6 69 4 4 4 4 4 5 5 5 5 5 6 6 6 6 610 4 4 4 4 4 5 5 5 5 5 6 6 6 6 6

Column - Rowblk levelsRow 1 2 3 4 5 6 7 8 9 10 11 12 13 14 151 1 1 1 1 1 11 11 11 11 11 21 21 21 21 212 2 2 2 2 2 12 12 12 12 12 22 22 22 22 223 3 3 3 3 3 13 13 13 13 13 23 23 23 23 234 4 4 4 4 4 14 14 14 14 14 24 24 24 24 245 5 5 5 5 5 15 15 15 15 15 25 25 25 25 256 6 6 6 6 6 16 16 16 16 16 26 26 26 26 267 7 7 7 7 7 17 17 17 17 17 27 27 27 27 278 8 8 8 8 8 18 18 18 18 18 28 28 28 28 289 9 9 9 9 9 19 19 19 19 19 29 29 29 29 2910 10 10 10 10 10 20 20 20 20 20 30 30 30 30 30

Column - Colblk levelsRow 1 2 3 4 5 6 7 8 9 10 11 12 13 14 151 1 2 3 4 5 6 7 8 9 10 11 12 13 14 152 1 2 3 4 5 6 7 8 9 10 11 12 13 14 153 1 2 3 4 5 6 7 8 9 10 11 12 13 14 154 1 2 3 4 5 6 7 8 9 10 11 12 13 14 155 1 2 3 4 5 6 7 8 9 10 11 12 13 14 156 16 17 18 19 20 21 22 23 24 25 26 27 28 29 307 16 17 18 19 20 21 22 23 24 25 26 27 28 29 308 16 17 18 19 20 21 22 23 24 25 26 27 28 29 309 16 17 18 19 20 21 22 23 24 25 26 27 28 29 3010 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

15 Examples 215

Gilmour et al. (1997) recommend revision of the current spatial model basedon the use of diagnostics such as the sample variogram of the residuals (fromthe current model). This diagnostic and a summary of row and column residualtrends are produced by default with graphical versions of ASReml when a spatialmodel has been fitted to the errors. It can be suppressed, by the use of the -noption on the command line. We have produced the following plots by use of the-g22 option.

Slate Hall example

Rep 6 # Six replicates of 5x5 plots in 2x3 arrangement

RowBlk 30 # Rows within replicates numbered across replicates

ColBlk 30 # Columns within replicates numbered across replicates

row 10 # Field row

column 15 # Field column

variety 25

yield

shf.asd !skip 1 !DOPART 1

!PART 1 # AR1 x AR1

y ~ mu var

predict var

1 2

15 column AR1 0.1 # Second field is specified so ASReml can sort

10 row AR1 0.1 # records properly into field order

!PART 2 # AR1 x AR1 + units

y ~ mu var !r units

predict var

1 2

15 column AR1 0.1

10 row AR1 0.1

!PART 3 # incomplete blocks

y ~ mu var !r Rep Rowblk Colblk

predict var

Abbreviated ASReml output file is presented below. The iterative sequence hasconverged to column and row correlation parameters of (.68377,.45859) respec-tively. The plot size and orientation is not known and so it is not possible toascertain whether these values are spatially sensible. It is generally found thatthe closer the plot centroids, the higher the spatial correlation. This is not alwaysthe case and if the highest between plot correlation relates to the larger spatialdistance then this may suggest the presence of extraneous variation (see Gilmouret al., 1997), for example. Figure 15.5 presents a plot of the sample variogramof the residuals from this model. The plot appears in reasonable agreement withthe model.

15 Examples 216

The next model includes a measurement error or nugget effect component. Thatis the variance model for the plot errors is now given by

σ2Σ = σ2(Σc ⊗Σr) + ψI150 (15.6)

where ψ is the ratio of nugget variance to error variance (σ2). The abbreviatedoutput for this model is given below. There is a significant improvement in theREML log-likelihood with the inclusion of the nugget effect (see Table 15.8).

Slate Hall example f1 1 Variogram of residuals 26 Aug 2002 17:08:51

0

1.569649


Figure 15.5 Sample variogram of the AR1×AR1 model for the Slate Hall data

# AR1 x AR1

#

1 LogL=-739.681 S2= 36034. 125 df 1.000 0.1000 0.1000

2 LogL=-714.340 S2= 28109. 125 df 1.000 0.4049 0.1870

3 LogL=-703.338 S2= 29914. 125 df 1.000 0.5737 0.3122

4 LogL=-700.371 S2= 37464. 125 df 1.000 0.6789 0.4320

5 LogL=-700.324 S2= 38602. 125 df 1.000 0.6838 0.4542

6 LogL=-700.322 S2= 38735. 125 df 1.000 0.6838 0.4579

7 LogL=-700.322 S2= 38754. 125 df 1.000 0.6838 0.4585

8 LogL=-700.322 S2= 38757. 125 df 1.000 0.6838 0.4586



Variance 150 125 1.00000 38756.6 5.00 0 P



15 Examples 217


8 mu 1 850.71

6 variety 24 13.04

# AR1 x AR1 + units

1 LogL=-740.735 S2= 33225. 125 df : 2 components constrained


3 LogL=-698.498 S2= 46239. 125 df

4 LogL=-696.847 S2= 44725. 125 df

5 LogL=-696.823 S2= 45563. 125 df

6 LogL=-696.823 S2= 45753. 125 df

7 LogL=-696.823 S2= 45796. 125 df


units 150 150 0.106154 4861.48 2.72 0 P

Variance 150 125 1.00000 45796.3 2.74 0 P




8 mu 1 259.82

6 variety 24 10.22

The lattice analysis (with recovery of between block information) is presentedbelow. This variance model is not competitive with the preceding spatial models.The models can be formally compared using the BIC values for example.

# IB analysis

1 LogL=-734.184 S2= 26778. 125 df

2 LogL=-720.060 S2= 16591. 125 df

3 LogL=-711.119 S2= 11173. 125 df

4 LogL=-707.937 S2= 8562.4 125 df

5 LogL=-707.786 S2= 8091.2 125 df

6 LogL=-707.786 S2= 8061.8 125 df

7 LogL=-707.786 S2= 8061.8 125 df


Rep 6 6 0.528714 4262.39 0.62 0 P

RowBlk 30 30 1.93444 15595.1 3.06 0 P

ColBlk 30 30 1.83725 14811.6 3.04 0 P

Variance 150 125 1.00000 8061.81 6.01 0 P


8 mu 1 1216.29

6 variety 24 8.84

15 Examples 218

Finally, we present portions of the .pvs files to illustrate the prediction facility ofASReml . The first five and last three variety means are presented for illustration.The overall SED printed is the square root of the average variance of differencebetween the variety means. The two spatial analyses have a range of SEDs whichare available if the !SED qualifier is used. All variety comparisons have the sameSED from the third analysis as the design is a balanced lattice square. The F-teststatistics for the spatial models are greater than for the lattice analysis. We notethe F-test for the AR1×AR1 + units model is smaller than the F-test for theAR1×AR1.


#AR1 x AR1


1.0000 1257.9763 64.6146 E

2.0000 1501.4483 64.9783 E

3.0000 1404.9874 64.6260 E

4.0000 1412.5674 64.9027 E

5.0000 1514.4764 65.5889 E

. . .

23.0000 1311.4888 64.0767 E

24.0000 1586.7840 64.7043 E

25.0000 1592.0204 63.5939 E


#AR1 x AR1 + units


1.0000 1245.5843 97.8591 E

2.0000 1516.2331 97.8473 E

3.0000 1403.9863 98.2398 E

4.0000 1404.9202 97.9875 E

5.0000 1471.6197 98.3607 E

. . .

23.0000 1316.8726 98.0402 E

24.0000 1557.5278 98.1272 E

25.0000 1573.8920 97.9803 E


# IB

Rep is ignored in the prediction

RowBlk is ignored in the prediction

ColBlk is ignored in the prediction


1.0000 1283.5870 60.1994 E

2.0000 1549.0133 60.1994 E

3.0000 1420.9307 60.1994 E

4.0000 1451.8554 60.1994 E

5.0000 1533.2749 60.1994 E

15 Examples 219

. . .

23.0000 1329.1088 60.1994 E

24.0000 1546.4699 60.1994 E

25.0000 1630.6285 60.1994 E


Notice the differences in SE and SED associated with the various models. Choos-ing a model on the basis of smallest SE or SED is not recommended because themodel is not necessarily fitting the variability present in the data.

Table 15.8 Summary of models for the Slate Hall data

REML number ofmodel log-likelihood parameters F-test SED

AR1×AR1 -700.32 3 13.04 59.0AR1×AR1 + units -696.82 4 10.22 60.5IB -707.79 4 8.84 62.0

15.7 Unreplicated early generation variety trial - Wheat

To further illustrate the approaches presented in the previous section, we con-sider an unreplicated field experiment conducted at Tullibigeal situated in south-western NSW. The trial was an S1 (early stage) wheat variety evaluation trialand consisted of 525 test lines which were randomly assigned to plots in a 67by 10 array. There was a check plot variety every 6 plots within each column.That is the check variety was sown on rows 1,7,13,. . . ,67 of each column. Thisvariety was numbered 526. A further 6 replicated commercially available varieties(numbered 527 to 532) were also randomly assigned to plots with between 3 to5 plots of each. The aim of these trials is to identify and retain the top, say 20%of lines for further testing. Cullis et al. (1989) considered the analysis of earlygeneration variety trials, and presented a one-dimensional spatial analysis whichwas an extension of the approach developed by Gleeson and Cullis (1987). Thetest line effects are assumed random, while the check variety effects are consid-ered fixed. This may not be sensible or justifiable for most trials and can lead toinconsistent comparisons between check varieties and test lines. Given the largeamount of replication afforded to check varieties there will be very little shrinkageirrespective of the realised heritability.

15 Examples 220

In the following we assume that the variety effects (including both check, repli-cated and unreplicated lines) is random. We consider an initial analysis withspatial correlation in one direction and present three further spatial models forcomparison. The ASReml input file is presented below.

Tullibigeal trial

linenum

yield

weed

column 10

row 67

variety 532 # testlines 1:525, check lines 526:532

tulli.asd !skip 1 !DOPART 1

!PART 1 # AR1 x I

y ~ mu weed mv !r variety

1 2

67 row AR1 0.1

10 column I 0

!PART 2 # AR1 x AR1

y ~ mu weed mv !r variety

1 2

67 row AR1 0.1

10 column AR1 0.1

!PART 3 # AR1 x AR1 + column trend

y ~ mu weed pol(column,-1) mv !r variety

1 2

67 row AR1 0.1

10 column AR1 0.1

!PART 4 # AR1 x AR1 + Nugget + column trend

y ~ mu weed pol(column,-1) mv !r variety units

1 2

67 row AR1 0.1

10 column AR1 0.1

predict var

The data fields represent the factors variety, row and column, a covariate weedand the plot yield (yield). There are three parts in the ASReml file. We beginwith the one-dimensional spatial model, which assumes the variance model forthe plot effects within columns is described by a first order autoregressive process.The abbreviated output file is presented below.

1 LogL=-4280.75 S2= 0.12850E+06 666 df 0.1000 1.000 0.1000

2 LogL=-4268.57 S2= 0.12138E+06 666 df 0.1516 1.000 0.1798

15 Examples 221

3 LogL=-4255.89 S2= 0.10968E+06 666 df 0.2977 1.000 0.2980

4 LogL=-4243.76 S2= 88033. 666 df 0.7398 1.000 0.4939

5 LogL=-4240.59 S2= 84420. 666 df 0.9125 1.000 0.6016

6 LogL=-4240.01 S2= 85617. 666 df 0.9344 1.000 0.6428

7 LogL=-4239.91 S2= 86032. 666 df 0.9474 1.000 0.6596

8 LogL=-4239.88 S2= 86189. 666 df 0.9540 1.000 0.6668

9 LogL=-4239.88 S2= 86253. 666 df 0.9571 1.000 0.6700

10 LogL=-4239.88 S2= 86280. 666 df 0.9585 1.000 0.6714



variety 532 532 0.959184 82758.6 8.98 0 P

Variance 670 666 1.00000 86280.2 9.12 0 P


Analysis of Variance DF F-incr F-adj

7 mu 1 9799.18 9336.79

3 weed 1 109.33 109.33

The iterative sequence converged, the REML estimate of the autoregressive pa-rameter indicating substantial within column heterogeneity.

The abbreviated output from the two-dimensional AR1×AR1 spatial model isnow presented.

1 LogL=-4277.99 S2= 0.12850E+06 666 df

2 LogL=-4266.13 S2= 0.12097E+06 666 df

3 LogL=-4253.05 S2= 0.10777E+06 666 df

4 LogL=-4238.72 S2= 83156. 666 df

5 LogL=-4234.53 S2= 79868. 666 df

6 LogL=-4233.78 S2= 82024. 666 df

7 LogL=-4233.67 S2= 82725. 666 df

8 LogL=-4233.65 S2= 82975. 666 df

9 LogL=-4233.65 S2= 83065. 666 df

10 LogL=-4233.65 S2= 83100. 666 df


variety 532 532 1.06038 88117.5 9.92 0 P

Variance 670 666 1.00000 83100.1 8.90 0 P




7 mu 1 6248.65 6117.37

3 weed 1 85.84 85.84

The change in REML log-likelihood is significant (χ21 = 12.46, p < .001) with

15 Examples 222

the inclusion of the autoregressive parameter for columns. Figure 15.6 presentsthe sample variogram of the residuals for the AR1×AR1 model. There is anindication that a linear drift from column 1 to column 10 is present. We includea linear regression coefficient pol(column,-1) in the model to account for this.Note we use the ’-1’ option in the pol term to exclude the overall constant inthe regression, as it is already fitted. The linear regression of column number onyield is significant (t = −2.96). The sample variogram ( 15.7) is more satisfactory,though interpretation of variograms is often difficult, particularly for unreplicatedtrials. This is an issue for further research.

Tullibigeal trial i2 1 Variogram of residuals 26 Aug 2002 19:03:11

0

1.130514


Figure 15.6 Sample variogram of the AR1×AR1 model for the Tullibigeal data

The abbreviated output for this model and the final model in which a nuggeteffect has been included is presented below.

#AR1xAR1 + pol(column,-1)

1 LogL=-4270.99 S2= 0.12730E+06 665 df

2 LogL=-4258.95 S2= 0.11961E+06 665 df

3 LogL=-4245.27 S2= 0.10545E+06 665 df

4 LogL=-4229.50 S2= 78387. 665 df

5 LogL=-4226.02 S2= 75375. 665 df

6 LogL=-4225.64 S2= 77373. 665 df

7 LogL=-4225.60 S2= 77710. 665 df

8 LogL=-4225.60 S2= 77786. 665 df

9 LogL=-4225.60 S2= 77806. 665 df


15 Examples 223

variety 532 532 1.14370 88986.3 9.91 0 P

Variance 670 665 1.00000 77806.0 8.79 0 P




7 mu 1 7082.49 6726.77

3 weed 1 91.96 70.00

8 pol(column,-1) 1 8.74 8.74

#

#AR1xAR1 + units + pol(column,-1)

#

1 LogL=-4272.74 S2= 0.11683E+06 665 df : 1 components constrained


3 LogL=-4228.96 S2= 76724. 665 df

4 LogL=-4220.63 S2= 55858. 665 df

5 LogL=-4220.19 S2= 54431. 665 df

6 LogL=-4220.18 S2= 54732. 665 df

7 LogL=-4220.18 S2= 54717. 665 df

8 LogL=-4220.18 S2= 54715. 665 df


variety 532 532 1.34824 73769.0 7.08 0 P

units 670 670 0.556400 30443.6 3.77 0 P

Variance 670 665 1.00000 54715.2 5.15 0 P




7 mu 1 4241.39 4259.90

3 weed 1 86.39 72.73

8 pol(column,-1) 1 4.84 4.84

The increase in REML log-likelihood is significant. The predicted means for thevarieties can be produced and printed in the .pvs file as shown below

Warning: mv_estimates is ignored for prediction

Warning: units is ignored for prediction

---- ---- ---- ---- ---- ---- ---- 1 ---- ---- ---- ---- ---- ---- ---- ----

column evaluated at 5.5000

weed is evaluated at average value of 0.4597



1.0000 2917.1782 179.2881 E

2.0000 2957.7405 178.7688 E

3.0000 2872.7615 176.9880 E

15 Examples 224

Tullibigeal trial i3 1 Variogram of residuals 26 Aug 2002 19:03:22

0

0.955297


Figure 15.7: Sample variogram of the AR1×AR1 + pol(column,-1) model for theTullibigeal data

4.0000 2986.4725 178.7424 E

. . .

522.0000 2784.7683 179.1541 E

523.0000 2904.9421 179.5383 E

524.0000 2740.0330 178.8465 E

525.0000 2669.9565 179.2444 E

526.0000 2385.9806 44.2159 E

527.0000 2697.0670 133.4406 E

528.0000 2727.0324 112.2650 E

529.0000 2699.8243 103.9062 E

530.0000 3010.3907 112.3080 E

531.0000 3020.0720 112.2553 E

532.0000 3067.4479 112.6645 E


Note that the (replicated) check lines have lower SE than the (unreplicated) testlines. There will also be large diffeneces in SEDs. Rather than obtaining the largetable of all SEDs, you could do the prediction in partspredict var 1:525 column 5.5predict var 526:532 column 5.5 !SEDto examine the matrix of pairwise prediction errors of variety differences.

15 Examples 225

15.8 Paired Case-Control study - Rice

This data is concerned with an experiment conducted to investigate the toleranceof rice varieties to attack by the larvae of bloodworms. The data have been kindlyprovided by Dr. Mark Stevens, Yanco Agricultural Institute. A full descriptionof the experiment is given by Stevens et al. (1999). Bloodworms are a significantpest of rice in the Murray and Murrumbidgee irrigation areas where they cancause poor establishment and substantial yield loss.

The experiment commenced with the transplanting of rice seedlings into trays.Each tray contained 32 seedlings and the trays were paired so that a controltray (no bloodworms) and a treated tray (bloodworms added) were grown in acontrolled environment room for the duration of the experiment. At the endof this time rice plants were carefully extracted, the root system washed androot area determined for the tray using an image analysis system described byStevens et al. (1999). Two pairs of trays, each pair corresponding to a differentvariety, were included in each run. A new batch of bloodworm larvae was usedfor each run. A total of 44 varieties was investigated with three replicates of each.Unfortunately the variety concurrence within runs was less than optimal. Eightvarieties occurred with only one other variety, 22 with two other varieties andthe remaining 14 with three different varieties.

In the next three sections we present an exhaustive analysis of these data usingequivalent univariate and multivariate techniques. It is convenient to use two datafiles one for each approach. The univariate data file consists of factors pair, run,variety, tmt, unit and variate rootwt. The factor unit labels the individualtrays, pair labels pairs of trays (to which varieties are allocated) and tmt is thetwo level bloodworm treatment factor (control/treated). The multivariate datafile consists of factors variety and run and variates for root weight of both thecontrol and exposed treatments (labelled yc and ye respectively).

Figure 15.8 presents a plot of the treated and the control root area (on thesquare root scale) for each variety. There is a strong dependence between thetreated and control root area, which is not surprising. The aim of the experimentwas to determine the tolerance of varieties to bloodworms thence identify themost tolerant varieties. The definition of tolerance should allow for the factthat varieties differ in their inherent seedling vigour (Figure 15.8). The originalapproach of the scientist was to regress the treated root area against the controlroot area and define the index of vigour as the residual from this regression. Thisapproach is clearly inefficient since there is error in both variables. We seek to

15 Examples 226

determine an index of tolerance from the joint analysis of treated and controlroot area.

this is for the paired data Y=sye X=syc

Y−axis: 1.8957 14.8835 X−axis: 8.2675 23.5051

Figure 15.8: Rice bloodworm data: Plot of square root of root weight for treatedversus control

Standard analysis

Preliminary analyses indicated variance heterogeneity so that subsequent analyseswere conducted on the square root scale. The allocation of bloodworm treatmentswithin varieties and varieties within runs defines a nested block structure of theform

run/variety/tmt = run + run.variety + run.variety.tmt

( = run + pair + pair.tmt )

( = run + run.variety + units )

There is an additional blocking term, however, due to the fact that the blood-worms within a run are derived from the same batch of larvae whereas betweenruns the bloodworms come from different sources. This defines a block structureof the form

run/tmt/variety = run + run.tmt + run.tmt.variety

( = run + run.tmt + pair.tmt )

15 Examples 227

Combining the two provides the full block structure for the design, namely

run + run.variety + run.tmt + run.tmt.variety

= run + run.variety + run.tmt + units

= run + pair + run.tmt + pair.tmt

In line with the aims of the experiment the treatment structure comprises va-riety and treatment main effects and treatment by variety interactions. In thetraditional approach the terms in the block structure are regarded as randomand the treatment terms as fixed. The choice of treatment terms as fixed orrandom depends largely on the aims of the experiment. The aim of this exampleis to select the ”best” varieties. The definition of best is somewhat more com-plex since it does not involve the single trait sqrt(rootwt) but rather two traits,namely sqrt(rootwt) in the presence/absence of bloodworms. Thus to minimiseselection bias the variety main effects and thence the tmt.variety interactionsare taken as random. The main effect of treatment is fitted as fixed to allow forthe likely scenario that rather than a single population of treatment by varietyeffects there are in fact two populations (control and treated) with a differentmean for each. There is evidence of this prior to analysis with the large differ-ence in mean sqrt(rootwt) for the two groups (14.93 and 8.23 for control andtreated respectively). The inclusion of tmt as a fixed effect ensures that BLUPs

of tmt.variety effects are shrunk to the correct mean (treatment means ratherthan an overall mean).

The model for the data is given by

y = Xτ + Z1u1 + Z2u2 + Z3u3 + Z4u4 + Z5u5 + e (15.7)

where y is a vector of length n = 264 containing the sqrt(rootwt) values, τ

corresponds to a constant term and the fixed treatment contrast and u1 . . .u5

correspond to random variety, treatment by variety, run, treatment by run andvariety by run effects. The random effects and error are assumed to be indepen-dent Gaussian variables with zero means and variance structures var (ui) = σ2

i Ibi

(where bi is the length of ui, i = 1 . . . 5) and var (e) = σ2In.

The ASReml code for this analysis is given by

Bloodworm data Dr M Stevens

pair 132

rootwt

15 Examples 228

run 66

tmt 2 !A

id

variety 44 !A

worm.txt !skip 1 !DOPART 1

!PART 1

sqrt(rootwt) ~ mu tmt !r variety variety.tmt run pair run.tmt

0 0 0

!PART 2

sqrt(rootwt) ~ mu tmt !r variety tmt.variety run pair tmt.run,

uni(tmt,2)

0 0 2

tmt.variety 2

2 0 DIAG .1 .1 !GU

44 0 0

tmt.run 2

2 0 DIAG .1 .1 !GU

66 0 0

The two parts in the input file define the two univariate analyses we will conduct.We consider the results from the analysis defined in PART 1 first. A portion ofthe output file is presented below.

1 LogL=-369.089 S2= 4.1533 262 df

2 LogL=-357.871 S2= 2.6834 262 df

3 LogL=-349.520 S2= 1.7986 262 df

4 LogL=-345.739 S2= 1.3719 262 df

5 LogL=-345.306 S2= 1.3216 262 df

6 LogL=-345.267 S2= 1.3155 262 df

7 LogL=-345.264 S2= 1.3149 262 df

8 LogL=-345.263 S2= 1.3149 262 df


variety 44 44 1.80947 2.37920 3.01 0 P

run 66 66 0.244243 0.321144 0.59 0 P

variety.tmt 88 88 0.374220 0.492047 1.78 0 P

pair 132 132 0.742328 0.976057 2.51 0 P

run.tmt 132 132 1.32973 1.74841 3.65 0 P

Variance 264 262 1.00000 1.31486 4.42 0 P


7 mu 1 1484.96

4 tmt 1 469.35

The estimated variance components from this analysis are given in column (a) oftable 15.9. The variance component for the variety main effects is large. There

15 Examples 229

Table 15.9: Estimated variance components from univariate analyses of bloodwormdata. (a) Model with homogeneous variance for all terms and (b) Model with het-erogeneous variance for interactions involving tmt

(a) (b)source control treated

variety 2.378 2.334tmt.variety 0.492 1.505 -0.372run 0.321 0.319tmt.run 1.748 1.388 2.223variety.run (pair) 0.976 0.987tmt.pair 1.315 1.156 1.359

REML log-likelihood -345.256 -343.22

is evidence of tmt.variety interactions so we may expect some discriminationbetween varieties in terms of tolerance to bloodworms.

Given the large difference (p < 0.001) between tmt means we may wish to allowfor heterogeneity of variance associated with tmt. Thus we fit a separate varietyvariance for each level of tmt so that instead of assuming var (u2) = σ2

2I88 weassume

var (u2) =

[σ2

2c 00 σ2

2t

]⊗ I44

where σ22c and σ2

2t are the tmt.variety interaction variances for control andtreated respectively. This model can be achieved using a diagonal variance struc-ture for the treatment part of the interaction. We also fit a separate run variancefor each level of tmt and heterogeneity at the residual level, by including theuni(tmt,2) term. We have chosen level 2 of tmt as we expect more variationfor the exposed treatment and thus the extra variance component for this termshould be positive. Had we mistakenly specified level 1 then ASReml would haveestimated a negative component by setting the !GU option for this term. Theportion of the ASReml output for this analysis is given below.

1 LogL=-369.991 S2= 4.0377 262 df

2 LogL=-357.468 S2= 2.5831 262 df


15 Examples 230




7 LogL=-343.234 S2= 1.1531 262 df

8 LogL=-343.228 S2= 1.1572 262 df

9 LogL=-343.228 S2= 1.1563 262 df


variety 44 44 2.01903 2.33451 3.01 0 P

run 66 66 0.276045 0.319178 0.59 0 P

pair 132 132 0.853941 0.987372 2.59 0 P

uni(tmt,2) 264 264 0.176158 0.203684 0.32 0 P

Variance 264 262 1.00000 1.15625 2.77 0 P

tmt.variety DIAGonal 1 1.30142 1.50477 2.26 0 U

tmt.variety DIAGonal 2 -0.321901 -0.372199 -0.82 0 U

tmt.run DIAGonal 1 1.20098 1.38864 2.18 0 U

tmt.run DIAGonal 2 1.92457 2.22530 3.07 0 U


7 mu 1 1276.91

4 tmt 1 448.86

The estimated variance components from this analysis are given in column (b)of table 15.9. There is no significant variance heterogeneity at the residual ortmt.run level. This indicates that the square root transformation of the data hassuccessfully stabilised the error variance. There is, however, significant varianceheterogeneity for tmt.variety interactions with the variance being much greaterfor the control group. This reflects the fact that in the absence of bloodworms thepotential maximum root area is greater. Note that the tmt.variety interactionvariance for the treated group is negative. The negative component is meaningful(and in fact necessary and obtained by use of the !GU option) in this context sinceit should be considered as part of the variance structure for the combined varietymain effects and treatment by variety interactions. That is,

var (12 ⊗ u1 + u2) =

[σ2

1 + σ22c σ2

1

σ21 σ2

1 + σ22t

]⊗ I44 (15.8)

Using the estimates from table 15.9 this structure is estimated as[

3.84 2.332.33 1.96

]⊗ I44

Thus the variance of the variety effects in the control group (also known as thegenetic variance for this group) is 3.84. The genetic variance for the treated groupis much lower (1.96). The genetic correlation is 2.33/

√3.84 ∗ 1.96 = 0.85 which

is strong, supporting earlier indications of the dependence between the treatedand control root area (Figure 15.8).

15 Examples 231

Table 15.10 Equivalence of random effects in bivariate and univariate analyses

bivariate univariateeffects (model 15.10) (model 15.7)

trait.variety uv 12 ⊗ u1 + u2

trait.run ur 12 ⊗ u3 + u4

trait.pair e∗ 12 ⊗ u5 + e

A multivariate approach

In this simple case in which the variance heterogeneity is associated with the twolevel factor tmt, the analysis is equivalent to a bivariate analysis in which the twotraits correspond to the two levels of tmt, namely sqrt(rootwt) for control andtreated. The model for each trait is given by

yj = Xτ j + Zvuvj + Zrurj + ej (j = c, t) (15.9)

where yj is a vector of length n = 132 containing the sqrtroot values for variatej (j = c for control and j = t for treated), τ j corresponds to a constant termand uvj and urj correspond to random variety and run effects. The design ma-trices are the same for both traits. The random effects and error are assumedto be independent Gaussian variables with zero means and variance structuresvar

(uvj

)= σ2

vjI44, var

(urj

)= σ2

rjI66 and var (ej) = σ2

j I132. The bivariatemodel can be written as a direct extension of (15.9), namely

y = (I2 ⊗X) τ + (I2 ⊗Zv) uv + (I2 ⊗Zr) ur + e∗ (15.10)

where y = (y′c, y′t)′, uv =

(u′vc

, u′vt

)′, ur =(u′rc

, u′rt

)′ and e∗ = (e′c,e′t)′.

There is an equivalence between the effects in this bivariate model and the uni-variate model of (15.7). The variety effects for each trait (uv in the bivariatemodel) are partitioned in (15.7) into variety main effects and tmt.variety in-teractions so that uv = 12 ⊗ u1 + u2. There is a similar partitioning for the runeffects and the errors (see table 15.10).

In addition to the assumptions in the models for individual traits (15.9) the bi-variate analysis involves the assumptions cov (uvc) u′vt

= σvctI44, cov (urc) u′rt=

σrctI66 and cov (ec) e′t = σctI132. Thus random effects and errors are correlatedbetween traits. So, for example, the variance matrix for the variety effects for

15 Examples 232

each trait is given by

var (uv) =

[σ2

vcσvct

σvct σ2vt

]⊗ I44

This unstructured form for trait.variety in the bivariate analysis is equiv-alent to the variety main effect plus heterogeneous tmt.variety interactionvariance structure (15.8) in the univariate analysis. Similarly the unstructuredform for trait.run is equivalent to the run main effect plus heterogeneoustmt.run interaction variance structure. The unstructured form for the errors(trait.pair) in the bivariate analysis is equivalent to the pair plus heteroge-neous error (tmt.pair) variance in the univariate analysis.

This bivariate analysis is achieved in ASReml as follows, noting that the tmtfactor here is equivalent to traits

this is for the paired data

id

pair 132

run 66

variety 44 !A

yc ye

wormm.asd !skip 1 !X syc !Y sye

sqrt(yc) sqrt(ye) ~ Trait !r Tr.variety Tr.run

1 2 2

132 !S2==1

Tr 0 US 2.21 1.1 2.427

Tr.variety 2

2 0 US 1.401 1 1.477

44 0 0

Tr.run 2

2 0 US .79 .5 2.887

66 0 0

predict variety

A portion of the output from this analysis is presented below.

1 LogL=-354.998 S2= 1.0000 262 df

2 LogL=-350.056 S2= 1.0000 262 df

3 LogL=-345.557 S2= 1.0000 262 df

4 LogL=-343.386 S2= 1.0000 262 df

5 LogL=-343.226 S2= 1.0000 262 df

6 LogL=-343.220 S2= 1.0000 262 df

7 LogL=-343.220 S2= 1.0000 262 df

8 LogL=-343.220 S2= 1.0000 262 df

15 Examples 233





Tr.variety UnStruct 1 3.83959 3.83959 3.47 0 U



Tr.run UnStruct 1 1.70788 1.70788 2.62 0 U

Tr.run UnStruct 1 0.319145 0.319145 0.59 0 U

Tr.run UnStruct 2 2.54326 2.54326 3.20 0 U

Covariance/Variance/Correlation Matrix UnStructured

2.144 0.4402

0.9874 2.348


3.840 0.8504

2.334 1.962


1.708 0.1531

0.3191 2.543

The resultant REML log-likelihood is identical to that of the heterogeneous uni-variate analysis (column (b) of table 15.9). The estimated variance parametersare given in Table 15.11.

Predicted variety means are produced in the .pvs. These are used in the followingsection on interpretation of results. A portion of the file is presented below. Thereis a wide range in SED reflecting the imbalance of the variety concurrence withinruns.

Assuming Power transformation was (Y+ 0.000)^ 0.500

Table 15.11: Estimated variance parameters from bivariate analysis of bloodwormdata

control treatedsource variance variance covariance

us(trait).variety 3.84 1.96 2.33us(trait).run 1.71 2.54 0.32us(trait).pair 2.14 2.35 0.99

15 Examples 234

run is ignored in the prediction (except where specifically included

Trait variety Power_value Stand_Error Ecode Retransformed approx_SE

sqrt(yc) AliCombo 14.9532 0.9181 E 223.5982 27.4571

sqrt(ye) AliCombo 7.9941 0.7993 E 63.9054 12.7790

sqrt(yc) Bluebelle 13.1033 0.9310 E 171.6969 24.3980

sqrt(ye) Bluebelle 6.6299 0.8062 E 43.9559 10.6901

sqrt(yc) C22 16.6679 0.9181 E 277.8192 30.6057

sqrt(ye) C22 8.9543 0.7993 E 80.1798 14.3140

sqrt(yc) Chiyohikari 17.3393 0.9301 E 300.6510 32.2538

sqrt(ye) Chiyohikari 9.6433 0.8057 E 92.9927 15.5393

. . . . . . .

sqrt(yc) YRK1 15.1859 0.9549 E 230.6103 29.0012

sqrt(ye) YRK1 8.3356 0.8190 E 69.4817 13.6534

sqrt(yc) YRK3 13.3057 0.9549 E 177.0428 25.4106

sqrt(ye) YRK3 8.1133 0.8190 E 65.8264 13.2894


-2

-1

0

1

2

-2 -1 0 1 2 3

control BLUP

expo

sed

BLUP

Figure 15.9 BLUPs for treated for each variety plotted against BLUPs for control

15 Examples 235

Interpretation of results

Recall that the researcher is interested in varietal tolerance to bloodworms. Thiscould be defined in various ways. One option is to consider the regression implicitin the variance structure for the trait by variety effects. The variance structurecan arise from a regression of treated variety effects on control effects, namely

uvt = βuvc + ε

where the slope β = σvct/σ2vc

. Tolerance can be defined in terms of the deviationsfrom regression, ε. Varieties with large positive deviations have greatest toleranceto bloodworms. Note that this is similar to the researcher’s original intentionsexcept that the regression has been conducted at the genotypic rather than thephenotypic level. In Figure 15.9 the BLUPs for treated have been plotted againstthe BLUPs for control for each variety and the fitted regression line (slope = 0.61)has been drawn. Varieties with large positive deviations from the regression lineinclude YRK3, Calrose, HR19 and WC1403.

-0.5

0.0

0.5

-2 -1 0 1 2 3

control BLUP

BLUP

regr

essio

n re

sidua

l

Figure 15.10: Estimated deviations from regression of treated on control for eachvariety plotted against estimate for control

15 Examples 236

An alternative definition of tolerance is the simple difference between treatedand control BLUPs for each variety, namely δ = uvc −uvt . Unless β = 1 the twomeasures ε and δ have very different interpretations. The key difference is thatε is a measure which is independent of inherent vigour whereas δ is not. To seethis consider

cov (ε) u′vc= cov (uvt − βuvc) u′vc

=

(σvct −

σvct

σ2vc

σ2vc

)I44

= 0

whereas

cov (δ)u′vc= cov (uvc − uvt) u′vc

=(σ2

vc− σvct

)I44

-1

0

1

-2 -1 0 1 2 3

control BLUP

cont

rol B

LUP

- exp

osed

BLU

P

Figure 15.11: Estimated difference between control and treated for each varietyplotted against estimate for control

15 Examples 237

The independence of ε and uvc and dependence between δ and uvc is clearlyillustrated in Figures 15.10 and 15.11. In this example the two measures haveprovided very different rankings of the varieties. The choice of tolerance mea-sure depends on the aim of the experiment. In this experiment the aim was toidentify tolerance which is independent of inherent vigour so the deviations fromregression measure is preferred.

15.9 Balanced longitudinal data - Random coefficients and cubicsmoothing splines - Oranges

We now illustrate the use of random coefficients and cubic smoothing splinesfor the analysis of balanced longitudinal data. The implementation of cubicsmoothing splines in ASReml was originally based on the mixed model formulationpresented by Verbyla et al. (1999). More recently the technology has beenenhanced so that the user can specify knot points; in the original approach theknot points were taken to be the ordered set of unique values of the explanatoryvariable. The specification of knot points is particularly useful if the number ofunique values in the explanatory variable is large, or if units are measured atdifferent times.

The data we use was originally reported by Draper and Smith (1998, ex24N, p559)and has recently been reanalysed by Pinheiro and Bates (2000, p338). The dataare displayed in Figure 15.12 and are the trunk circumferences (in millimetres) ofeach of 5 trees taken at 7 times. All trees were measured at the same time so thatthe data are balanced. The aim of the study is unclear, though, both previousanalyses involved modelling the overall ‘growth’ curve, accounting for the obviousvariation in both level and shape between trees. Pinheiro and Bates (2000) useda nonlinear mixed effects modelling approach, in which they modelled the growthcurves by a three parameter logistic function of age, given by

y =φ1

1 + exp [−(x− φ2)/φ3](15.11)

where y is the trunk circumference, x is the tree age in days since December 311968, φ1 is the asymptotic height, φ2 is the inflection point or the time at whichthe tree reaches 0.5φ1, φ3 is the time elapsed between trees reaching half andabout 3/4 of φ1.

The datafile consists of 5 columns viz, Tree, a factor with 5 levels, x, tree age indays since 31st December 1968, circ the trunk circumference and season. Thelast column season was added after noting that tree age spans several years and if

15 Examples 238

this is the orange data circ age Tree

1 2 3

4 5

Figure 15.12 Trellis plot of trunk circumference for each tree

converted to day of year, measurements were taken in either Spring (April/May)or Autumn (September/October).

First we demonstrate the fitting of a cubic spline in ASReml by restricting thedataset to tree 1 only. The model includes the intercept and linear regressionof trunk circumference on x and an additional random term spl(x,7) whichinstructs ASReml to include a random term with a special design matrix with7 − 2 = 5 columns which relate to the vector, δ whose elements δi, i = 2, . . . , 6are the second differentials of the cubic spline at the knot points. The seconddifferentials of a natural cubic spline are zero at the first and last knot points(Green and Silverman, 1994). The ASReml job is

this is the orange data, for tree 1

seq # record number is not used

Tree 5

x # 118 484 664 1004 1231 1372 1582

circ

season !L Spring Autumn

orange.asd !skip 1 !filter 2 !select 1

!SPLINE spl(x,7) 118 484 664 1004 1231 1372 1582

!PVAL x 150 200:1500

circ ~ mu x !r spl(x,7)

predict x

15 Examples 239

Note that the data for tree 1 has been selected by use of the !filter and !selectqualifiers. Also note the use of !PVAL so that the spline curve is properly predictedat the additional nominated points. These additional data points are required forASReml to form the design matrix to properly interpolate the cubic smoothingspline between knot points in the prediction process. Since the spline knot pointsare specifically nominated in the !SPLINE line, these extra points have no effecton the analysis run time. The !SPLINE line does not modify the analysis in thisexample since it simply nominates the 7 ages in the data file. The same analysiswould result if the !SPLINE line was omitted and spl(x,7) in the model wasreplaced with spl(x).

An extract of the output file is presented below

1 LogL=-20.9043 S2= 48.470 5 df 0.1000 1.000

2 LogL=-20.9017 S2= 49.022 5 df 0.9266E-01 1.000

3 LogL=-20.8999 S2= 49.774 5 df 0.8356E-01 1.000

4 LogL=-20.8996 S2= 50.148 5 df 0.7937E-01 1.000

5 LogL=-20.8996 S2= 50.213 5 df 0.7866E-01 1.000

Final parameter values 0.78798E-01 1.0000


spl(x,7) 5 5 0.787982E-01 3.95671 0.40 0 P

Variance 7 5 1.00000 50.2133 1.32 0 P


7 mu 1 1382.13 18.03

3 x 1 217.50 217.50





3 x

1 -0.126873 0.552471E-02 -22.96

7 mu

2 -0.397376 5.75569 -0.07

6 spl(x,7) 5 effects fitted


The REML estimate of the smoothing constant indicates that there is some non-linearity. The fitted cubic smoothing spline is presented in Figure 15.13. Thefitted values were obtained from the .pvs file. The four points below the linewere the spring measurements.

15 Examples 240

o

o

o

o

o

oo

20

40

60

80

100

120

140

160

200 400 600 800 1000 1200 1400 1600

age

circu

mfe

renc

e

Figure 15.13 Fitted cubic smoothing spline for tree 1

We now consider the analysis of the full dataset. Following Verbyla et al. (1999)we consider the analysis of variance decomposition (see Table 15.12) which modelsthe overall and individual curves.

An overall spline is fitted as well as tree deviation splines. We note however,that the intercept and slope for the tree deviation splines are assumed to berandom effects. This is consistent with Verbyla et al. (1999). In this sense thetree deviation splines play a role in modelling the conditional curves for each treeand variance modelling. The intercept and slope for each tree are included asrandom coefficients (denoted by RC in Table 15.12). Thus, if U5×2 is the matrixof intercepts (column 1) and slopes (column 2) for each tree, then we assume that

var (vec(U)) = Σ⊗ I5

where Σ is a 2 × 2 symmetric positive definite matrix. Non smooth variationcan be modelled at the overall mean (across trees) level and this is achieved inASReml by inclusion of fac(x) as a random term.

15 Examples 241

Table 15.12 Orange data: AOV decomposition

stratum decomposition type df or ne

constant 1 F 1age

x F 1spl(x,7) R 5fac(x) R 7

treetree RC 5

age.treex.tree RC 5spl(x,7).tree R 25

error R

An extract of the ASReml input file is presented below.

circ ~ mu x !r Tree 4.6 Tree.x .000094 spl(x,7) .1,

spl(x,7).Tree 2.3 fac(x) 13.9

0 0 1

Tree 2

2 0 US 4.6 .00001 .000094

5 0 0

predict x Tree !IGNORE fac(x)

We stress the importance of model building in these settings, where we generallycommence with relatively simple variance models and update to more complexvariance models if appropriate. Table 15.13 presents the sequence of fitted mod-els we have used. Note that the REML log-likelihoods for models 1 and 2 arecomparable and likewise for models 3 to 6. The REML log-likelihoods are notcomparable between these groups due to the inclusion of the fixed season termin the second set of models.

We begin by modelling the variance matrix for the intercept and slope for eachtree, Σ, as a diagonal matrix as there is no point including a covariance com-ponent between the intercept and slope if the variance component(s) for one (or

15 Examples 242

Table 15.13 Sequence of models fitted to the Orange data

model

term 1 2 3 4 5 6

tree y y y y y yx.tree y y y y y y(covariance) n n n n n yspl(x,7) y y y y n ytree.spl(x,7) y y y n y yfac(x) n y y n n nseason n n y y y y

REML log-likelihood -97.78 -94.07 -87.95 -91.22 -90.18 -87.43

both) is zero. Model 1 also does not include a non-smooth component at theoverall level (that is, fac(x)). Abbreviated output is shown below.

11 LogL=-97.7788 S2= 6.3702 33 df


Tree 5 5 4.78452 30.4782 1.24 0 P

Tree.x 5 5 0.938442E-04 0.597805E-03 1.41 0 P

spl(x,7) 5 5 100.403 639.587 1.55 0 P

spl(x,7).Tree 25 25 1.11475 7.10114 1.44 0 P

Variance 35 33 1.00000 6.37018 1.74 0 P


8 mu 1 47.06 43.77

5 x 1 95.00 95.00

A quick look suggests this is fine until we look at the predicted curves in Fig-ure 15.14. The fit is unacceptable because the spline has picked up too muchcurvature, and suggests that there may be systematic non-smooth variation atthe overall level. This can be formally examined by including the fac(x) termas a random effect. This increased the log-likelihood 3.71 (P < 0.05) with thespl(x,7) smoothing constants heading to the boundary. There is a possible ex-planation in the season factor. When this is added (Model 3) it has an F ratio

15 Examples 243

Predicted values of circ

118 1582x 30

217

1 1

2 2

3 3

4 4

5 5

Figure 15.14 Plot of fitted cubic smoothing spline for model 1

of 107.5 (P < 0.01) while the fac(x) term goes to the boundry. Notice thatthe inclusion of the fixed term season in models 3 to 6 means that comparisonswith models 1 and 2 on the basis of the log-likelihood are not valid. The springmeasurements are lower than the autumn measurements so growth is slower inwinter. Models 4 and 5 successively examined each term, indicating that bothsmoothing constants are significant (P < 0.05). Lastly we add the covarianceparameter between the intercept and slope for each tree in model 6. This ensuresthat the covariance model will be translation invariant. A portion of the outputfile for model 6 is presented below.

11 LogL=-87.4291 S2= 5.6329 32 df


spl(x,7) 5 5 2.17207 12.2350 1.09 0 P

spl(x,7).Tree 25 25 1.38506 7.80187 1.47 0 P

Variance 35 32 1.00000 5.63286 1.72 0 P

Tree UnStruct 1 5.62102 31.6624 1.26 0 U

Tree UnStruct 1 -0.124177E-01 -0.699474E-01 -0.85 0 U

Tree UnStruct 2 0.108357E-03 0.610357E-03 1.40 0 U


5.621 -0.5032

-0.1242E-01 0.1084E-03


8 mu 1 169.87

15 Examples 244

5 x 1 92.78

6 Season 1 108.57

o

oo

oo

oo

50

100

150

200

1

200 600 1000 1400

o

o

o

o

o

o o

2o

oo

oo

oo

3

o

o

o

oo

oo

50

100

150

200

4o

o

o

o

o

oo

50

100

150

200

5 Marginal

200 600 1000 1400

Time since December 31, 1968 (Days)

Tru

nk c

ircum

fere

nce

(mm

)

Figure 15.15: Trellis plot of trunk circumference for each tree at sample dates (ad-justed for season effects), with fitted profiles across time and confidence intervals

Figure 15.15 presents the predicted growth over time for individual trees and amarginal prediction for trees with approximate confidence intervals (2±× stan-dard error of prediction). Within this figure, the data is adjusted to remove theestimated seasonal effect. The conclusions from this analysis are quite differ-ent from those obtained by the nonlinear mixed effects analysis. The individualcurves for each tree are not convincingly modelled by a logistic function. Fig-ure 15.16 presents a plot of the residuals from the nonlinear model fitted on p340of Pinheiro and Bates (2000). The distinct pattern in the residuals, which is thesame for all trees is taken up in our analysis by the season term.

15 Examples 245

-10

-5

0

5

10

200 400 600 800 1000 1200 1400 1600

age

Resid

ual

Figure 15.16 Plot of the residuals from the nonlinear model of Pinheiro and Bates

15.10 Multivariate animal genetics data - Sheep

The analysis of incomplete or unbalanced multivariate data often presents com-putational difficulties. These difficulties are exacerbated by either the number ofrandom effects in the linear mixed model, the number of traits, the complexity ofthe variance models being fitted to the random effects or the size of the problem.In this section we illustrate two approaches to the analysis of a complex set ofincomplete multivariate data.

Much of the difficulty in conducting these types of analyses in ASReml is centredon obtaining good starting values. Derivative based algorithms such as the AI

algorithm can be unreliable when fitting complex variance structures unless goodstarting values are available. Poor starting values may result in divergence of thealgorithm or slow convergence. A particular problem with fitting unstructuredvariance models is keeping the estimated variance matrix positive definite. Theseare not simple issues and in the following we present a pragmatic approach which

15 Examples 246

attempts to overcome some of these problems.

The data are taken from a large genetic study on Coopworth lambs. A total of 5traits, namely weaning weight (wwt), yearling weight (ywt), greasy fleece weight(gfw), fibre diameter (fdm) and ultrasound fat depth at the C site (fat) weremeasured on 7043 lambs. The lambs were the progeny of 92 sires and 3561 dams,produced from 4871 litters over 49 flock-year combinations. Not all traits weremeasured on each group. No pedigree data was available for either sires or dams.

The aim of the analysis is to estimate heritability (h2) of each trait and to estimatethe genetic correlations between the five traits. We will present two approaches,a half-sib analysis and an analysis based on the use of an animal model, whichdirectly defines the genetic covariance between the progeny and sires and dams.

The data fields included factors defining sire, dam and lamb (tag), covariates suchas age, the age of the lamb at a set time, brr the birth rearing rank (1 = bornsingle raised single, 2 = born twin raised single, 3 = born twin raised twin and 4= other), sex (M, F) and grp a factor indicating the flock-year combination.

Half-sib analysis

In the half-sib analysis we include terms for the random effects of sires, dams andlitters. In univariate analyses the variance component for sires is denoted byσ2

s = 14σ2

A where σ2A is the additive genetic variance, the variance component for

dams is denoted by σ2d = 1

4σ2A +σ2

m where σ2m is the maternal variance component

and the variance component for litters is denoted by σ2l and represents variation

attributable to the particular mating.

For a multivariate analysis these variance components for sires, dams andlitters are, in theory replaced by unstructured matrices, one for each term.Additionally we assume the residuals for each trait may be correlated. Thus forthis example we would like to fit a total of 4 unstructured variance models. Forsuch a situation, it is sensible to commence the modelling process with a series ofunivariate analyses. These give starting values for the diagonals of the variancematrices, but also indicate what variance components are estimable. The ASReml

job for the univariate analyses is given below

Multivariate Sire & Dam model

tag

sire 92 !I

dam 3561 !I

15 Examples 247

Table 15.14: REML estimates of a subset of the variance parameters for each traitfor the genetic example, expressed as a ratio to their asymptotic s.e.

term wwt ywt gfw fdm fat

sire 3.68 3.57 3.95 1.92 1.92dam 6.25 4.93 2.78 0.37 0.05

litter 8.79 0.99 2.23 1.91 0.00age.grp 2.29 1.39 0.31 1.15 1.74sex.grp 2.90 3.43 3.70 - 1.83

Table 15.15 Wald tests of the fixed effects for each trait for the genetic example

term wwt ywt gfw fdm fat

age 331.3 67.1 52.4 2.6 7.5brr 554.6 73.4 14.9 0.3 13.9sex 196.1 123.3 0.2 2.9 0.6

age.sex 10.3 1.7 1.9 - 5.0

grp 49

sex

brr 4

litter 4871

age wwt !M0 ywt !M0 # !M0 recodes zeros as missing values

gfw !M0 fdm !M0 fat !M0

./wwt/coop.bin

wwt ~ mu age brr sex age.sex !r sire dam lit age.grp sex.grp !f grp

Tables 15.14 and 15.15 present the summary of these analyses. Fibre diameterwas measured on only 2 female lambs and so interactions with sex were notfitted. The dam variance component was quite small for both fibre diameter andfat. The REML estimate of the variance component associated with litters was

15 Examples 248

effectively zero for fat.

Thus in the multivariate analysis we consider fitting the following models to thesire, dam and litter effects,

var (us) = Σs ⊗ I92

var (ud) = Σd ⊗ I3561

var (ul) = Σl ⊗ I4891

where Σ5×5s ,Σ3×3

d and Σ4×4l are positive definite symmetric matrices correspond-

ing to the between traits variance matrices for sires, dams and litters respectively.The variance matrix for dams does not involve fibre diameter and fat depth, whilethe variance matrix for litters does not involve fat depth. The effects in each ofthe above vectors are ordered levels within traits. Lastly we assume that theresidual variance matrix is given by

Σe ⊗ I7043

Table 15.16 presents the sequence variance models fitted to each of the fourrandom terms sire, dam, litter and error in the ASReml job presented below.

Multivariate Sire & Dam model

tag

sire 92 !I

dam 3561 !I

grp 49

sex

brr 4

litter 4871

age wwt !m0 ywt !m0 # !M0 identifies missing values

gfw !m0 fdm !m0 fat !m0

./wwt/coop.bin !DOPART $1 !CONTINUE !MAXIT 20

!PART 3

!EXTRA 4

!PART

wwt ywt gfw fdm fat ~ Trait Tr.age Tr.brr Tr.sex Tr.age.sex,

!r Tr.sire,

at(Tr,1).dam at(Tr,2).dam at(Tr,3).dam,

at(Tr,1).lit at(Tr,2).lit at(Tr,3).lit at(Tr,4).lit,

at(Trait,1).age.grp .0024,




at(Trait,1).sex.grp .93,

at(Trait,2).sex.grp 16.0,

15 Examples 249



!f Tr.grp

1 2 3 #1 R structure with 2 dimensions and 3 G structures

0 0 0 #Independent across animals

Tr 0 US #General structure across traits

15*0. #Asreml will estimate some starting values

Tr.sire 2 #Sire effects.

!PART 1 #Initial analysis ignoring genetic correlations

Tr 0 DIAG #Specified diagonal variance structure

0.608 1.298 0.015 0.197 0.035 #Initial sire variances

!PART 2 #Factor Analytic model

Tr 0 FA1 !GP

0.5 0.5 -.01 -.01 0.1 #Correlation factors

0.608 1.298 0.015 0.197 0.035 #Variances

!PART 3 #Unstructured variance model

Tr 0 US

0.6199 #Lower triangle row-wise

0.6939 1.602

0.003219 0.007424 0.01509

-0.02532 -0.05840 -0.0002709 0.1807

0.06013 0.1387 0.0006433 -0.005061 0.03487

!PART

sire

#Maternal structure covers the 3 model terms

# at(Tr,1).dam at(Tr,2).dam at(Tr,3).dam

at(Tr,1).dam 2 # Maternal effects.

!PART 1

3 0 CORGH !GU # Equivalent to Unstructured

.9

.1 .1

2.2 4.14 0.018

!PART 2

3 0 CORGH !GU

.9

.1 .1

2.2 4.14 0.018

!PART 3

3 0 US !GU

.9

.1 .1

2.2 4.14 0.018

15 Examples 250

!PART

dam

#Litter structure covers the 4 model terms at(Tr,1).lit at(Tr,2).lit

#at(Tr,3).lit at(Tr,4).lit

at(Tr,1).lit 2 # Litter effects.

!PART 1

4 0 DIAG # Diagonal structure

3.74 0.97 0.019 0.941

!PART 2

4 0 FA1 !GP # Factor Analytic 1

.5 .5 .01 .1

4.95 4.63 0.037 0.941

!PART 3

4 0 US # Unstructured

5.073

3.545 3.914

0.1274 0.08909 0.02865

0.07277 0.05090 0.001829 1.019

!PART

lit

Table 15.16: Variance models fitted for each part of the ASReml job in the analysisof the genetic example

term matrix !Part 1 !Part 2 !Part 3

sire Σs DIAG FA1 USdam Σd CORGH CORGH US

litter Σl DIAG FA1 USerror Σe US US US

In !PART 1, the error variance model is taken to be unstructured, but the startingvalues are set to zero. This instructs ASReml to obtain starting values from thesample covariance matrix of the data. For incomplete data the matrix so obtainedmay not, in general be positive definite. Care should be taken when using thisoption for incomplete multivariate data. The command to run this file for !PART1 is

asreml -nrs3 mt 1

15 Examples 251

The results from this analysis can be automatically used by ASReml for the nextpart, if the .rsv is copied prior to running the next part. That is, we add the!PART 2 coding to the job, copy mt1.rsv to mt2.rsv so that when we run !PART2 it starts from where !PART 1 finished, and run the job using

asreml -cnrs3 mt 2

Finally, we use the !PART 3 coding to obtain the final analysis, copy mt2.rsv tomt3.rsv and run the final stage starting from the stage 2 results. Note that weare using the automatic updating associated with !CONTINUE.

A portion of the final output file is presented below.


LogL=-21425.9 S2= 1.0000 35006 df

LogL=-21423.7 S2= 1.0000 35006 df

LogL=-21420.9 S2= 1.0000 35006 df

LogL=-21420.1 S2= 1.0000 35006 df

LogL=-21419.9 S2= 1.0000 35006 df

LogL=-21419.9 S2= 1.0000 35006 df

LogL=-21419.9 S2= 1.0000 35006 df

LogL=-21419.9 S2= 1.0000 35006 df

LogL=-21419.9 S2= 1.0000 35006 df

LogL=-21419.9 S2= 1.0000 35006 df


at(Tr,1).age.grp 49 49 0.135358E-02 0.135358E-02 2.03 0 P

at(Tr,2).age.grp 49 46 0.101560E-02 0.101560E-02 1.24 0 P

at(Tr,4).age.grp 49 11 0.176513E-02 0.176513E-02 1.13 0 P

at(Tr,5).age.grp 49 28 0.209278E-03 0.209278E-03 1.68 0 P

at(Tr,1).sex.grp 49 49 0.919610 0.919610 2.89 0 P

at(Tr,2).sex.grp 49 46 15.3912 15.3912 3.50 0 P

at(Tr,3).sex.grp 49 46 0.279496 0.279496 3.71 0 P

at(Tr,5).sex.grp 49 28 1.44032 1.44032 1.80 0 P

. . . . . .

. . . . . .


9.462 0.5691 0.2356 0.1640 0.2183

7.332 17.54 0.4241 0.2494 0.4639

0.2728 0.6686 0.1417 0.3994 0.1679

0.9625 1.994 0.2870 3.642 0.4875E-01

0.8336 2.412 0.7846E-01 0.1155 1.541


0.5941 0.7044 0.2966 0.2028 0.2703

0.6745 1.544 0.1364E-01-0.1227 0.5726

0.2800E-01 0.2076E-02 0.1500E-01 0.1114 -0.4818E-02

15 Examples 252

0.6224E-01-0.6072E-01 0.5433E-02 0.1586 -0.6334

0.3789E-01 0.1294 -0.1073E-03-0.4588E-01 0.3308E-01


2.161 1.010 0.7663

2.196 2.186 0.8301

0.1577 0.1718 0.1959E-01


3.547 0.5065 -0.1099 -0.4097E-01

1.555 2.657 0.1740 -0.5150

-0.2787E-01 0.3822E-01 0.1815E-01-0.3281

-0.7313E-01-0.7957 -0.4190E-01 0.8984


15 Tr.age 5 98.95

16 Tr.brr 15 116.72

17 Tr.sex 5 59.78

19 Tr.age.sex 4 4.90

The REML estimates of all the variance matrices are positive definite. Heritabil-ities for each trait can be calculated using the .pin file facility of ASReml. Theheritability is given by

h2 =σ2

A

σ2P

where σ2P is the phenotypic variance and is given by

σ2P = σ2

s + σ2d + σ2

l + σ2e

recalling that

σ2s =

14σ2

A

σ2d =

14σ2

A + σ2m

In the half-sib analysis we only use the estimate of additive genetic variance fromthe sire variance component. The ASReml .pin file is presented below along withthe output from the following command

asreml -p mt3

F phenWYG 9:14 + 24:29 + 39:44 + 45:50 # defines 55:60

F phenD 15:18 + 30:33 + 51:54 # defines 61:64

F phenF 19:23 + 34:38 # defines 65:69

15 Examples 253

F Direct 24:38 * 4. # defines 70:84

F Maternal 39:44 - 24:29 # defines 85:90

H WWTh2 70 55

H YWTh2 72 57

H GFWh2 75 60

H FDMh2 79 64

H FATh2 84 69

R GenCor 24:38

R MatCor 85:90

55 phenWYG 9 15.76 0.3130

56 phenWYG 10 11.76 0.3749

57 phenWYG 11 23.92 0.6313

. . . . .

70 Direct 24 2.376 0.6458

71 Direct 25 2.698 0.8487

72 Direct 26 6.174 1.585

73 Direct 27 0.1120 0.7330E-01

. . . . .

85 Maternal 39 1.567 0.3788

86 Maternal 40 1.521 0.4368

87 Maternal 41 0.6419 0.7797

. . . . .

WWTh2 = Direct 2 70/phenWYG 55= 0.1507 0.0396

YWTh2 = Direct 2 72/phenWYG 57= 0.2581 0.0624

GFWh2 = Direct 2 75/phenWYG 60= 0.3084 0.0716

FDMh2 = Direct 3 79/phenD 18 64= 0.1350 0.0717

FATh2 = Direct 3 84/phenF 23 69= 0.0841 0.0402

GenCor 2 1 = Tr.si 25/SQR[Tr.si 24*Tr.si 26]= 0.7044 0.1025




GenCor 4 2 = Tr.si 31/SQR[Tr.si 26*Tr.si 33]= -0.1227 0.3247






MatCor 2 1 = Mater 86/SQR[Mater 85*Mater 87]= 1.5168 0.7131



Animal model

In this section we will illustrate the use of .ped files to define the genetic pedigreestructure between animals. This is an alternate method of estimating additivegenetic variance for these data. The variance matrix of the animals (sires, dams

15 Examples 254

and lambs) for which we only have data on lambs is given by

var (uA) = ΣA ⊗A−1

where A−1 is the inverse of the genetic relationship matrix. There are a totalof 10696 = 92 + 3561 + 7043 animals in the pedigree. The ASReml input file ispresented below. Note that this model is not equivalent to the sire/dam/littermodel with respect to the animal/litter components for gfw, fd and fat.

Multivariate Animal model

tag !P

sire

dam !P

grp 49

sex

brr 4

litter 4871

age wwt !m0 ywt !m0 # !M0 identifies missing values


./wwt/coop.ped

./wwt/coop.bin !DOPART $1 !CONTINUE !MAXIT 20 !RECODE !STEP 0.01

# $1 allows selection of PART as a command line argument

!PART 3

!EXTRA 4 # Force 4 more iterations after convergence criterion met

!PART

wwt ywt gfw fdm fat ~ Trait Tr.age Tr.brr Tr.sex Tr.age.sex,

!r Tr.tag ,

at(Tr,1).dam, at(Tr,2).dam, -at(Tr,3).dam .003,

at(Tr,1).lit, at(Tr,2).lit, at(Tr,3).lit, at(Tr,4).lit,









!f Tr.grp

1 2 3 # One multivariate R structure, 3 G structures

0 0 0 # No structure across lamb records

# First zero lets ASReml count te number of records

Tr 0 US #General structure across traits

7.66

5.33 13

.18 .66 .10

.78 2.1 .27 3.2

.73 2.02 .08 .20 1.44

15 Examples 255

Tr.tag 2 # Direct animal effects.

!PART 2

Tr 0 FA1 !GP

0.5 0.5 -.01 -.01 0.1

2.4 5.2 0.06 .8 .14

!PART 3

Tr 0 US

2.4800

2.8 6.4

0.0128 0.03 0.06

-.1 -.22 -.0011 0.72

0.24 0.55 0.0026 -0.0202 0.14

!PART

tag

at(Tr,1).dam 2 # Maternal effects.

!PART 2

2 0 CORGH !GFU

.99

1.6 2.54

!PART 3

2 0 US !GU

1.1 .58 .31

!PART

dam

at(Tr,1).lit 2 # Litter effects.

!PART 2

4 0 FA1 !GP # Factor Analytic

.5 .5 .01 .1 .01

4.95 4.63 0.037 0.941 0.102

!PART 3

4 0 US # Unstructured

5.073

3.545 3.914

0.1274 0.08909 0.02865

0.07277 0.05090 0.001829 1.019

!PART

lit

The term Tr.tag now replaces the sire (and part of dam) terms in the half-sibanalysis. This analysis uses information from both sires and dams to estimateadditive genetic variance. The dam variance component is this analysis only es-timates the maternal variance component. It is only significant for the weaningand yearling weights. The litter variation remains unchanged. The ASReml input

15 Examples 256

file again consists of several parts, which progressively build up to fitting unstruc-tured variance models to Tr.tag, Tr.dam, Tr.litter and error. A portion of theoutput file is presented below.

dam !P

age wwt !m0 ywt !m0


A-inverse retrieved from ainverse.bin

PEDIGREE [../wwt/coop.ped ] has 10696 identities, 29474 Non zero elements

QUALIFIERS: !DOPART 3 !CONTINUE !MAXIT 20 !RECODE !STEP 0.01

Reading ../wwt/coop.bin as BINARY SINGLE PRECISION


1 tag 10696 Direct 1 3.000 5380. 0.1070E+05 0 0

2 sire 1 Covariat 2 1.000 48.06 92.00 0 0

3 dam 10696 Direct 3 1.000 5197. 0.1070E+05 0 0

. . .

Forming 95033 equations: 40 dense



LogL=-21432.6 S2= 1.0000 35006 df

LogL=-21432.3 S2= 1.0000 35006 df

LogL=-21429.6 S2= 1.0000 35006 df

LogL=-21423.2 S2= 1.0000 35006 df

LogL=-21420.1 S2= 1.0000 35006 df

LogL=-21420.1 S2= 1.0000 35006 df

LogL=-21417.6 S2= 1.0000 35006 df

LogL=-21417.2 S2= 1.0000 35006 df

LogL=-21417.2 S2= 1.0000 35006 df

LogL=-21417.2 S2= 1.0000 35006 df

LogL=-21417.2 S2= 1.0000 35006 df

LogL=-21417.2 S2= 1.0000 35006 df

LogL=-21417.2 S2= 1.0000 35006 df

LogL=-21417.2 S2= 1.0000 35006 df

LogL=-21417.2 S2= 1.0000 35006 df


at(Tr,1).age.grp 49 49 0.132682E-02 0.132682E-02 2.02 0 P

at(Tr,2).age.grp 49 46 0.908219E-03 0.908219E-03 1.15 0 P

at(Tr,4).age.grp 49 11 0.175614E-02 0.175614E-02 1.13 0 P

at(Tr,5).age.grp 49 28 0.223617E-03 0.223617E-03 1.73 0 P

at(Tr,1).sex.grp 49 49 0.902586 0.902586 2.88 0 P

at(Tr,2).sex.grp 49 46 15.3623 15.3623 3.50 0 P

at(Tr,3).sex.grp 49 46 0.280673 0.280673 3.71 0 P

at(Tr,5).sex.grp 49 28 1.42136 1.42136 1.80 0 P

. . . .


7.476 0.4918 0.1339 0.1875 0.1333

4.768 12.57 0.4381 0.3425 0.3938

15 Examples 257

0.1189 0.5049 0.1056 0.4864 0.1298

0.9377 2.221 0.2891 3.345 0.1171

0.4208 1.612 0.4869E-01 0.2473 1.333


3.898 0.8164 0.5763 0.3898E-01 0.6148

4.877 9.154 0.3689 -0.1849 0.7217

0.3029 0.2971 0.7085E-01-0.2416E-01 0.3041

0.6019E-01-0.4375 -0.5030E-02 0.6117 -0.4672

0.6154 1.107 0.4104E-01-0.1853 0.2570


0.9988 0.7024

0.5881 -0.7018


3.714 0.5511 0.1635 -0.6157E-01

2.019 3.614 0.5176 -0.4380

0.4506E-01 0.1407 0.2045E-01-0.3338

-0.1021 -0.7166 -0.4108E-01 0.7407


15 Tr.age 5 99.16

16 Tr.brr 15 116.52

17 Tr.sex 5 59.94

19 Tr.age.sex 4 5.10

There is no guarantee that variance component matrices will be positive definiteand this example highlights this issue. We used the !GU qualifier on the maternalcomponent to obtain the matrix

[0.9988 0.58810.5881 −0.7018

].

ASReml reports the correlation as 0.7024 which it obtains by ignoring the sign in-0.7018. This is the maternal component for ywt. Since it is entirely reasonableto expect maternal influences on growth to have dissipated at 12 months of age,it would be reasonable to refit the model omitting at(Tr,2).dam and changingthe dimension of the G structure.

Bibliography

Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics, London: Chapmanand Hall.

Cox, D. R. and Snell, E. J. (1981). Applied Statistics; Principles and Examples,London: Chapman and Hall.

Cressie, N. A. C. (1991). Statistics for spatial data, New York: John Wiley andSons, Inc.

Cullis, B. R. and Gleeson, A. C. (1991). Spatial analysis of field experiments -an extension to two dimensions, Biometrics 47: 1449–1460.

Cullis, B. R., Gleeson, A. C., Lill, W. J., Fisher, J. A. and Read, B. J. (1989).A new procedure for the analysis of early generation variety trials, AppliedStatistics 38: 361–375.

Cullis, B. R., Gogel, B. J., Verbyla, A. P. and Thompson, R. (1998). Spatialanalysis of multi-environment early generation trials, Biometrics 54: 1–18.

Dempster, A. P., Selwyn, M. R., Patel, C. M. and Roth, A. J. (1984). Statisti-cal and computational aspects of mixed model analysis, Applied Statistics33: 203–214.

Draper, N. R. and Smith, H. (1998). Applied Regression Analysis, John Wileyand Sons, New York, 3rd Edition.

Gilmour, A. R., Cullis, B. R. and Verbyla, A. P. (1997). Accounting for naturaland extraneous variation in the analysis of field experiments, Journal ofAgricultural, Biological, and Environmental Statistics 2: 269–273.

Gilmour, A. R., Cullis, B. R., Welham, S. J., Gogel, B. J. and Thompson, R.(2002). An efficient computing strategy for prediction in mixed linear mod-els, Computational Statistics and Data Analysis.

Gilmour, A. R., Thompson, R. and Cullis, B. R. (1995). AI, an efficient algorithmfor REML estimation in linear mixed models, Biometrics 51: 1440–1450.

258

Bibliography 259

Gleeson, A. C. and Cullis, B. R. (1987). Residual maximum likelihood (REML)estimation of a neighbour model for field experiments, Biometrics 43: 277–288.

Gogel, B. J. (1997). Spatial analysis of multi-environment variety trials, PhDthesis, Department of Statistics, University of Adelaide.

Green, P. J. and Silverman, B. W. (1994). Nonparametric regression and gener-alized linear models, London: Chapman and Hall.

Harvey, W. R. (1977). Users’ guide to LSML76, The Ohio State University,Columbus.

Harville, D. A. (1997). Matrix algebra from a statisticians perspective, Springer-Verlag, New York.

Kenward, M. G. and Roger, J. H. (1997). The precision of fixed effects estimatesfrom restricted maximum likelihood, Biometrics 53: 983–997.

Lane, P. W. and Nelder, J. A. (1982). Analysis of covariance and standardisationas instances of predicton, Biometrics 38: 613–621.

Patterson, H. D. and Nabugoomu, F. (1992). REML and the analysis of seriesof crop variety trials, Proceedings from the 25th International BiometricConference, pp. 77–93.

Patterson, H. D. and Thompson, R. (1971). Recovery of interblock informationwhen block sizes are unequal, Biometrika 58: 545–54.

Piepho, H.-P., Denis, J.-B. and van Eeuwijk, F. A. (1998). Mixed biadditivemodels, Proceedings of the 28th International Biometrics Conference.

Pinheiro, J. C. and Bates, D. M. (2000). Mixed-Effects Models in S and S-PLUS,Berlin: Springer-Verlag.

Robinson, G. K. (1991). That BLUP is a good thing: The estimation of randomeffects, Statistical Science 6: 15–51.

Searle, S. R. (1971). Linear Models, New York: John Wiley and Sons, Inc.

Searle, S. R. (1982). Matrix algebra useful for statistics, New York: John Wileyand Sons, Inc.

Searle, S. R., Casella, G. and McCulloch, C. E. (1992). Variance Components,New York: John Wiley and Sons, Inc.

Bibliography 260

Self, S. C. and Liang, K. Y. (1987). Asymptotic properties of maximum likelihoodestimators and likelihood ratio tests under non-standard conditions, Journalof the American Statistical Society 82: 605–610.

Smith, A. B., Cullis, B. R., Gilmour, A. R. and Thompson, R. (1998). Multi-plicative models for interaction in spatial mixed model analyses of multi-environment trial data, Proceedings of the 28th International BiometricsConference.

Smith, A., Cullis, B. R. and Thompson, R. (2001). Analysing variety by environ-ment data using multiplicative mixed models and adjustments for spatialfield trend, Biometrics 57: 1138–1147.

Stevens, M. M., Fox, K. M., Warren, G. N., Cullis, B. R., Coombes, N. E. andLewin, L. G. (1999). An image analysis technique for assessing resistancein rice cultivars to root-feeding chironomid midge larvae (diptera: Chirono-midae), Field Crops Research 66: 25–26.

Stroup, W. W., Baenziger, P. S. and Mulitze, D. K. (1994). Removing spatialvariation from wheat yield trials: a comparison of methods, Crop Science86: 62–66.

Thompson, R., Cullis, B. R., Smith, A. B. and Gilmour, A. R. (2003). A sparseimplementation of the average information algorithm for factor analytic andreduced rank variance models. (submitted).

Verbyla, A. P., Cullis, B. R., Kenward, M. G. and Welham, S. J. (1999). Theanalysis of designed experiments and longitudinal data using smoothingsplines, Journal of the Royal Statistical Society, Series C 48: 269–311.

White, I. H., Thompson, R. and Brotherstone, S. (1999). Genetic and environ-mental smoothing of lactation curves with cubic splines, Journal of DairyScience 82: 632–638.

Wilkinson, G. N. and Rogers, C. E. (1973). Symbolic description of factorialmodels for analysis of variance, Applied Statistics 22: 392–399.

Wolfinger, R. D. (1996). Heterogeneous variance-covariance structures for re-peated measures, Journal of Agricultural, Biological, and EnvironmentalStatistics 1: 362–389.

Yates, F. (1935). Complex experiments, Journal of the Royal Statistical Society,Series B 2: 181–247.

Index

advanced processing arguments, 147

AI algorithm, 12

AIC, 17

Akaike Information Criteria, 17

aliasing, 74

animal breeding data, 2

arguments, 4

ASReml symbols

∼, 62

*, 35

., 35

#, 35

$, 35

!{, 64

!}, 64

+, 64

,, 64

-, 64

/, 64

:, 64

autoregressive, 83

Average Information, 2

balanced repeated measures, 204

Bayesian Information Criteria, 17

BIC, 17

binary files, 36

BLUE, 14

BLUP, 14

case, 63

combining variance models, 15

command file, 23

genetic analysis, 118

multivariate, 111

command line options, 142

commonly used functions, 64

conditional distribution, 11

conditional factors, 68

constraining variance parameters, 106

constraints

on variance parameters, 87

Convergence criterion, 49

correlated random effects, 14

correlation, 137

between traits, 111

model, 9

covariance model, 10

isotropic, 9

covariates, 34, 47, 73

cubic splines, 71

data field syntax, 39

data file, 22, 33, 34

binary format, 36

fixed format, 36

free format, 34

line, 47

data file line, 25

qualifiers, 48

syntax, 47

debug options, 143

dense, 74

design factors, 73

diagnostics, 17

diallal analysis, 69

direct product, 6, 8, 78

direct sum, 8

discussion list, 3

distribution

conditional, 11

marginal, 11

261

Index 262

Ecode, 32

EM update, 103

equations

mixed model, 13

error variance

heterogeneity, 8

errors, 169

expected information matrix, 12

F-statistic, 18

factors, 34

file

GIV, 123

pedigree, 119

Fisher-scoring algorithm, 12

fixed effects, 6

Fixed format files, 56

fixed terms, 62, 66

multivariate, 112

primary, 66

sparse, 67

free format, 34

functions of variance components, 31,

134

correlation, 137

heritability, 136

linear combinations, 135

syntax, 135

G structure, 78

definition lines, 87, 92

header, 92

more than one term, 105

genetic

data, 2

groups, 121

links, 118

models, 118

qualifiers, 118

relationships, 119

GIV, 123

graphics options, 143

half-sib analysis, 246

help, 3

heritability, 136, 165

heterogeneity

error variance, 8

hypothesis testing, 18

identifiable, 15

IID, 6

information matrix

expected, 12

observed, 12

initial values, 91

input file extension

.BIN, 36

.DBL, 36

.bin, 34, 36

.csv, 35

.dbl, 34, 36

.pin, 135

interactions, 68

isotropic

covariance model, 9

job control

options, 144

qualifiers, 49

key output files, 151

likelihood

log residual, 11

residual, 11

Log file, 146

longitudinal data, 2

balanced example, 237

marginal distribution, 11

measurement error, 84

memory options, 144

Index 263

menu options, 145

MET, 8

meta analysis, 2, 8

missing values, 35, 71, 73, 155

NA, 35

in explanatory variables, 73

in response, 73

mixed

effects, 6

model, 6

mixed model, 6

equations, 13

multivariate, 112

specifying, 25

model

animal, 118, 253

correlation, 9

covariance, 10

formulae, 62

random regression, 10

sire, 118

model building, 109

moving average, 70

multi-environment trial, 2, 8

multivariate analysis, 111, 231

example, 245

half-sib analysis, 246

Nebraska Intrastate Nursery, 20

non singular matrices, 78

nonidentifiable, 15

objective function, 13

observed information matrix, 12

operators, 64

options

command line, 142

ordering of terms, 74

orthogonal polynomials, 71

outliers, 166

output

files, 27

multivariate analysis, 114

objects, 165

output file extension

.asl, 150

.asp, 150

.asr, 27, 150, 151

.dbr, 150

.dpr, 150, 162

.pvc, 150

.pvs, 150, 162

.res, 150, 156

.rsv, 150, 162

.sln, 29, 150, 154

.spr, 150

.tab, 150, 164

.veo, 150

.vrb, 159

.vvp, 150, 161

.yht, 30, 150, 155

overspecified, 15

own models, 98

OWN variance structure, 97

!F2, 98

!T, 98

parameter

scale, 6

variance, 6

PC environment, 141

pedigree, 118

file, 119

post-processing

variance components, 146

power, 95, 100

predicted values, 31

prediction, 26, 126

qualifiers, 131

predictions

estimable, 32

prior mean, 14

Index 264

product

direct, 8

qualifier

!<, 43

!<=, 43

!<>, 43

!==, 43

!>, 43

!>=, 43

!E, 43

!*, 43

!+, 43

!-, 43

!/, 43

!=s, 103

!=, 43

!ABS, 43

!ADJUST, 55

!ALPHA, 121

!ARCSIN , 43

!ASMV, 50

!ASUV, 50

!A, 40

!BLUP, 55

!CINV, 59

!COLFAC, 50

!CONTINUE, 51, 109

!COS, 43

!CSV, 48

!DATAFILE, 55

!DENSE, 55

!DF, 55

!DISPLAY, 51

!DOPART, 52

!D, 43

!EXP, 43

!EXTRA, 59

!FACPOINTS, 59

!FILTER, 48

!FORMAT, 56

!GF, 103

!GIV, 121

!GP, 103

!GROUPS, 122

!GU, 103

!GZ, 103

!G, 40, 56, 58

!I, 40

!JOIN, 57, 58

!Jddm, 44

!Jmmd, 44

!Jyyd, 44

!KNOTS, 59

!L, 40

!MAKE, 122

!MAXIT, 49

!MAX, 44

!MGS, 122

!MIN, 44

!MOD, 44

!MVREMOVE, 52

!M, 44

!NA, 44

!NOCHECK, 59

!NOREORDER, 59

!NOSCRATCH, 59

!POLPOINTS, 59

!PPOINTS, 60

!PRINT, 57

!PVAL, 52

!P, 40

!READ, 57

!RECODE, 57

!REPEAT, 122

!REPORT, 57

!RESIDUALS, 57

!ROWFAC, 50, 53

!S2==1, 104

!S2==value, 104

!S2=, 104

Index 265

!SAVE, 58

!SCALE, 60

!SECTION, 53

!SELECT, 48

!SEQ, 45

!SET, 44

!SIN, 43

!SKIP, 48, 122

!SLOW, 60

!SPLINE, 54

!STEP, 54

!SUB, 45

!VCC, 58

!V, 45

!X, 58

!Y, 58

qualifiers

data file line, 48

genetic, 118

job control, 49

variance model, 103

R structure, 78

definition, 90

definition lines, 87

random

effects, 6

correlated, 14

regressions

model , 10

terms

multivariate, 112

random regressions, 105

random terms, 62, 67

RCB, 24

analysis, 79

design, 20

reading the data, 24, 39

REML, i, 2, 10, 16

REMLRT, 16

repeated measures, 2, 204

reserved words, 64

AEXP, 101

AGAU, 101

AINV, 102

ANTE[1], 101

AR2, 99

ARMA, 99

AR[1], 99

CHOL[1], 101

CORB, 100

CORGB, 100

CORGH, 100

CORU, 99

C, 142

DIAG, 101

D, 142

EXP, 100

E, 142

FACV[1], 102

FA[1], 101

F, 142

GAU, 100

GIV, 102

Gg, 142

IDH, 101

ID, 99

IEUC, 100

IEXP, 100

IGAU, 100

I, 142

L, 142

MA2, 99

MA[1], 99

M, 142

N, 142

OWN, 101

P, 142

R, 142

SAR, 99

Ss, 142

Index 266

Trait, 64, 72

US, 101

XFA[1], 102

a(t,r), 69

and(t,r), 64, 69

at(f,n), 64, 69

cos(v,r), 65, 69

fac(v,y), 64, 69

fac(v), 64, 69

g(f,n), 69

giv(f,n), 65, 69

i(f), 70

ide(f), 65, 70

inv(v,r), 65, 70

l(f), 70

leg(v,n), 65, 70

lin(f), 64, 70

log(v,r), 65, 70

ma1(f), 65, 70

ma1, 65, 70

mu, 64, 70

mv, 64, 71

p(v,n), 71

pol(v,n), 65, 71

s(v [,k ]), 71

sin(v,r), 65, 71

spl(v [,k ]), 64, 71

sqrt(v,r), 65, 71

uni(f,k), 72

uni(f,n), 65

uni(f), 65

units, 64, 72

xfa(f,k), 65, 72

residual

error, 6

likelihood, 11

response, 62

running the job, 27

scale parameter, 6

score, 12

section, 8

separability, 9

separable, 83

singularities, 74

sparse, 74

sparse fixed, 62

spatial

analysis, 213

data, 2

model, 82

specifying the data, 39

split plot design, 194

tabulation, 26

syntax, 128

testing of terms, 76

tests of hypotheses, 18

title line, 24, 39

trait, 34, 111

transformation, 41

syntax, 41

typographic conventions, 4

unbalanced

data, 202

nested design, 198

UNIX, 141

unreplicated trial, 219

variance

parameter, 6

variance components

functions of, 134

variance header line, 87, 89

variance model

combining, 15, 104

description, 92

forming from correlation models,

93

qualifiers, 103

specification, 78

Index 267

specifying, 79

variance parameters, 10

constraining, 87, 106

between structures , 107

within a model , 106

variance structures, 26, 87

multivariate, 113

Wald test, 18

weight, 62, 72

weights, 34

ASReml User Guide - uncronopio.orguncronopio.org/uploads/ASReml/ASReml_User_Guide.pdf · ASReml...

Documents

Transcript of ASReml User Guide - uncronopio.orguncronopio.org/uploads/ASReml/ASReml_User_Guide.pdf · ASReml...