Multivariate Statistics for Wildlife and Ecology Research

Page 1: Multivariate Statistics for Wildlife and Ecology Research978-1-4612-1288-1/1.pdf · Multivariate Statistics for Wildlife and Ecology Research. Springer Science+Business Media, LLC

Multivariate Statistics for Wildlife and Ecology Research

Springer Science+Business Media, LLC

Kevin McGarigal, Sam Cushman, Susan Stafford

Multivariate Statistics for Wildlife and Ecology Research

With 57 Figures

Springer

Kevin McGarigal, Department of Natural Resources Conservation, University of Massachusetts, Amherst, MA 01003-5810, USA, [email protected]

Susan Stafford, Department of Forest Sciences, Colorado State University, Fort Collins, CO 80523-1470, USA

Sam Cushman, Organismic and Evolutionary Biology, University of Massachusetts, Amherst, MA 01003-5810, USA, [email protected]

Cover illustration: Trivariate probability surfaces showing the distribution of three rodent species on three environmental variables (see Figure 1.4). Photos of iguana, sulphur-crested cockatoo, and mongoose by David Alexander.

Library of Congress Cataloging-in-Publication Data

McGarigal, Kevin.
Multivariate statistics for wildlife and ecology research / Kevin McGarigal, Sam Cushman, Susan Stafford.
p. cm.
Includes bibliographical references.
ISBN 978-0-387-98642-5 ISBN 978-1-4612-1288-1 (eBook) DOI 10.1007/978-1-4612-1288-1
1. Animal ecology-Research-Statistical methods. 2. Multivariate analysis. I. Cushman, Sam. II. Stafford, Susan G., 1952- III. Title.
QH541.2.M39 2000 577'.07'27-dc21 99-16036

Printed on acid-free paper.

© 2000 Springer Science+Business Media New York. Originally published by Springer-Verlag New York, Inc. in 2000. Softcover reprint of the hardcover 1st edition 2000. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Production managed by Francine McNeill; manufacturing supervised by Jeffrey Taub. Photocomposed copy prepared from the authors' WordPerfect and Microsoft Word files using FrameMaker.

9 8 7 6 5 4 3 2 1

ISBN 978-0-387-98642-5

Preface

This book is intended to serve as an introduction to the use and interpretation of the most common multivariate statistical techniques used in wildlife ecological research. Specifically, this book is designed with three major functions in mind: (1) to provide a practical guide to the use and interpretation of selected multivariate techniques; (2) to serve as a textbook for a graduate-level course on the subject for students in wildlife ecology programs; and (3) to provide the background necessary for further study of multivariate statistics.

It is important to acknowledge up front that it was not our intention to provide any "new" information in terms of multivariate statistics. Our primary purpose was to synthesize and summarize the current body of literature on multivariate statistics and to present it in a simplified form that could be understood by most wildlife researchers and graduate students. Consequently, we have drawn heavily upon an excellent array of more comprehensive books on the subject (e.g., Morrison 1967; Harris 1975; Gauch 1982; Dillon and Goldstein 1984; Digby and Kempton 1987; Hair, Anderson, and Tatham 1987; see Chapter 1 for citations).

So why do we need another textbook on the subject? First, most of the comprehensive books are so laden with mathematical and theoretical detail that most wildlife researchers, particularly graduate students, find these books difficult, if not impossible, to read and understand. It was our intention to draw from these sources only the particularly relevant material pertaining to the application of these procedures in wildlife ecological research. Second, few of these books focus on ecological applications, and those that do generally focus on applications in community ecology or are limited in scope (e.g., focus on a single family of techniques). None of the treatments that we are aware of focuses on wildlife habitat relationships.

In developing this book, our intent was to present multivariate statistical techniques in a manner that could be understood by wildlife researchers with an intermediate knowledge of statistics. Because this book is targeted for wildlife researchers, the discussions are focused on applications in wildlife research, but the comments are equally valid for most other ecological applications. In addition, this book emphasizes the practical aspects of each technique (that is, what each technique is designed to accomplish and how it is used and interpreted) rather than the detailed mathematics and underlying theory. Nevertheless, some background in statistics is necessary in order to understand all the material presented here. Because this book is intended to serve as an introduction to selected multivariate techniques, we strongly recommend that other texts be consulted (see Chapter 1 bibliography) to obtain additional insight on the performance of these techniques under varying conditions.

It is beyond the scope of this (or any) book to provide an exhaustive and in-depth review of all multivariate statistical techniques. Therefore, we include only those techniques commonly used in wildlife ecology research. Even so, it is not possible to present all the techniques used by wildlife researchers. For example, there are several prominent ordination techniques used in ecological studies, but we focus on principal components analysis because it is the best understood and most widely used ordination method in wildlife ecology, and provide a brief overview of other common ordination techniques for comparison. This does not mean that principal components analysis is the "best" choice of method in all, or even most, cases, but rather that it is one of the more commonly used methods. Furthermore, we include only the true or classical multivariate techniques in this book; methods such as multiple regression, which involve multiple variables, but not multiple dependent variables, are not included.

This book is divided into three parts. In the first part (Chapter 1), we introduce the field of multivariate data analysis common in wildlife ecological applications and clarify some of the terminology in order to avoid confusion in subsequent chapters. Each of the multivariate techniques covered in the text is succinctly described here.

In the second part (Chapters 2 to 5), we focus on each of four techniques (or families of techniques), emphasizing the general concepts and methods involved in the proper application of each technique. Here we focus on how to use each technique and how to interpret the results. Each chapter is divided into sections of practical importance, including a conceptual and geometric overview of the technique, types of suitable data, assumptions and diagnostics for testing the assumptions, sample size requirements, and important output resulting from the procedure. An illustrative wildlife example using the SAS system for personal computers is incorporated into each chapter. The example is intended to represent the kind of real-world data sets that most wildlife ecologists collect. In other words, the assumptions are not always met, and the results are not always straightforward and easy to interpret. Each chapter in the second part is intended to stand alone without too much dependence on the material in other chapters. For a researcher interested in a specific technique, this has the advantage that each chapter can be read and understood without reading the entire book. Unfortunately, this also results in some unavoidable redundancy, especially in the case of common statistical assumptions associated with the techniques.

In the third and final part of the book (Chapter 6), we summarize and compare the various multivariate techniques. The focus here is on when to use each technique. We compare the various techniques with respect to the types of research questions and data sets that are appropriate for each technique. This part is intended to establish a conceptual relationship among multivariate techniques and thereby serve as a guide for choosing the appropriate technique(s) in any particular application.

Each chapter ends with a bibliography of selected publications. References on each technique (Chapters 2 to 5) are divided into two groups: (1) those related to the statistical procedure, and (2) those on applications of the technique from the wildlife literature. In this format, the bibliography can serve those who desire additional technical information, as well as those who wish to review how others have applied the techniques to answer wildlife research questions.

Finally, although there remains much within the field of multivariate data analysis that is not included here, we hope this book is effective in portraying the types of wildlife research questions that can be addressed using multivariate techniques. Moreover, while statistics is often a daunting subject for many wildlife researchers (multivariate statistics especially so), we hope that our treatment provides enough of a conceptual framework so that books more technical than this one can be approached without apprehension.

Acknowledgments. We are grateful to the many quantitative wildlife ecology students who provided myriad comments and useful suggestions during the development of this book. We thank Robert G. Anthony for providing the initial impetus for developing this book, and several anonymous reviewers for providing useful suggestions.

Kevin McGarigal
Sam Cushman
Susan Stafford

Contents

Preface

1 Introduction and Overview
  1.1 Objectives
  1.2 Multivariate Statistics: An Ecological Perspective
  1.3 Multivariate Description and Inference
  1.4 Multivariate Confusion!
  1.5 Types of Multivariate Techniques
    1.5.1 Ordination
    1.5.2 Cluster Analysis
    1.5.3 Discriminant Analysis
    1.5.4 Canonical Correlation Analysis
  Bibliography

2 Ordination: Principal Components Analysis
  2.1 Objectives
  2.2 Conceptual Overview
    2.2.1 Ordination
    2.2.2 Principal Components Analysis (PCA)
  2.3 Geometric Overview
  2.4 The Data Set
  2.5 Assumptions
    2.5.1 Multivariate Normality
    2.5.2 Independent Random Sample and the Effects of Outliers
    2.5.3 Linearity
  2.6 Sample Size Requirements
    2.6.1 General Rules
    2.6.2 Specific Rules
  2.7 Deriving the Principal Components
    2.7.1 The Use of Correlation and Covariance Matrices
    2.7.2 Eigenvalues and Associated Statistics
    2.7.3 Eigenvectors and Scoring Coefficients
  2.8 Assessing the Importance of the Principal Components
    2.8.1 Latent Root Criterion
    2.8.2 Scree Plot Criterion
    2.8.3 Broken Stick Criterion
    2.8.4 Relative Percent Variance Criterion
    2.8.5 Significance Tests
  2.9 Interpreting the Principal Components
    2.9.1 Principal Component Structure
    2.9.2 Significance of Principal Component Loadings
    2.9.3 Interpreting the Principal Component Structure
    2.9.4 Communality
    2.9.5 Principal Component Scores and Associated Plots
  2.10 Rotating the Principal Components
  2.11 Limitations of Principal Components Analysis
  2.12 R-Factor Versus Q-Factor Ordination
  2.13 Other Ordination Techniques
    2.13.1 Polar Ordination
    2.13.2 Factor Analysis
    2.13.3 Nonmetric Multidimensional Scaling
    2.13.4 Reciprocal Averaging
    2.13.5 Detrended Correspondence Analysis
    2.13.6 Canonical Correspondence Analysis
  Bibliography
  Appendix 2.1

3 Cluster Analysis
  3.1 Objectives
  3.2 Conceptual Overview
  3.3 The Definition of Cluster
  3.4 The Data Set
  3.5 Clustering Techniques
  3.6 Nonhierarchical Clustering
    3.6.1 Polythetic Agglomerative Nonhierarchical Clustering
    3.6.2 Polythetic Divisive Nonhierarchical Clustering
  3.7 Hierarchical Clustering
    3.7.1 Polythetic Agglomerative Hierarchical Clustering
    3.7.2 Polythetic Divisive Hierarchical Clustering
  3.8 Evaluating the Stability of the Cluster Solution
  3.9 Complementary Use of Ordination and Cluster Analysis
  3.10 Limitations of Cluster Analysis
  Bibliography
  Appendix 3.1

4 Discriminant Analysis
  4.1 Objectives
  4.2 Conceptual Overview
    4.2.1 Overview of Canonical Analysis of Discriminance
    4.2.2 Overview of Classification
    4.2.3 Analogy with Multiple Regression Analysis and Multivariate Analysis of Variance
  4.3 Geometric Overview
  4.4 The Data Set
  4.5 Assumptions
    4.5.1 Equality of Variance-Covariance Matrices
    4.5.2 Multivariate Normality
    4.5.3 Singularities and Multicollinearity
    4.5.4 Independent Random Sample and the Effects of Outliers
    4.5.5 Prior Probabilities Are Identifiable
    4.5.6 Linearity
  4.6 Sample Size Requirements
    4.6.1 General Rules
    4.6.2 Specific Rules
  4.7 Deriving the Canonical Functions
    4.7.1 Stepwise Selection of Variables
    4.7.2 Eigenvalues and Associated Statistics
    4.7.3 Eigenvectors and Canonical Coefficients
  4.8 Assessing the Importance of the Canonical Functions
    4.8.1 Relative Percent Variance Criterion
    4.8.2 Canonical Correlation Criterion
    4.8.3 Classification Accuracy
    4.8.4 Significance Tests
    4.8.5 Canonical Scores and Associated Plots
  4.9 Interpreting the Canonical Functions
    4.9.1 Standardized Canonical Coefficients
    4.9.2 Total Structure Coefficients
    4.9.3 Covariance-Controlled Partial F-Ratios
    4.9.4 Significance Tests Based on Resampling Procedures
    4.9.5 Potency Index
  4.10 Validating the Canonical Functions
    4.10.1 Split-Sample Validation
    4.10.2 Validation Using Resampling Procedures
  4.11 Limitations of Discriminant Analysis
  Bibliography
  Appendix 4.1

5 Canonical Correlation Analysis
  5.1 Objectives
  5.2 Conceptual Overview
  5.3 Geometric Overview
  5.4 The Data Set
  5.5 Assumptions
    5.5.1 Multivariate Normality
    5.5.2 Singularities and Multicollinearity
    5.5.3 Independent Random Sample and the Effects of Outliers
    5.5.4 Linearity
  5.6 Sample Size Requirements
    5.6.1 General Rules
    5.6.2 Specific Rules
  5.7 Deriving the Canonical Variates
    5.7.1 The Use of Covariance and Correlation Matrices
    5.7.2 Eigenvalues and Associated Statistics
    5.7.3 Eigenvectors and Canonical Coefficients
  5.8 Assessing the Importance of the Canonical Variates
    5.8.1 Canonical Correlation Criterion
    5.8.2 Canonical Redundancy Criterion
    5.8.3 Significance Tests
    5.8.4 Canonical Scores and Associated Plots
  5.9 Interpreting the Canonical Variates
    5.9.1 Standardized Canonical Coefficients
    5.9.2 Structure Coefficients
    5.9.3 Canonical Cross-Loadings
    5.9.4 Significance Tests Based on Resampling Procedures
  5.10 Validating the Canonical Variates
    5.10.1 Split-Sample Validation
    5.10.2 Validation Using Resampling Procedures
  5.11 Limitations of Canonical Correlation Analysis
  Bibliography
  Appendix 5.1

6 Summary and Comparison
  6.1 Objectives
  6.2 Relationship Among Techniques
    6.2.1 Purpose and Source of Variation Emphasized
    6.2.2 Statistical Procedure
    6.2.3 Type of Statistical Technique and Variable Set Characteristics
    6.2.4 Data Structure
    6.2.5 Sampling Design
  6.3 Complementary Use of Techniques

Appendix: Acronyms Used in This Book

Glossary

Index