Multivariate Ordination Analyses: Principal Component Analysis

Dilys Vela

Tatiana Boza

Multivariate Analyses

A multivariate data set includes more than one variable recorded from a number of replicate sampling or experimental units, sometimes referred to as objects.

If these objects are organisms, the variables might be morphological or physiological measurements.

If the objects are ecological sampling units, the variables might be physicochemical measurements or species abundances.

What are ordination analyses?

Ordination is arranging items along a scale (axis) or multiple axes. The purpose of ordination is to summarize complex relationships graphically, extracting one or a few dominant patterns from an infinite number of possible patterns.

The placement of variables along an axis is possible because the ordination is based on the correlations among the variables.

What do ordination analyses help us to see?

Select the most important variables from multiple variables imagined or hypothesized.

Reveal unforeseen patterns and suggest unforeseen processes.

What type of question can we answer with ordination analysis?

In ecology, to seek and describe patterns of process.

In community ecology, to describe the strongest patterns in species composition.

In systematics, to recognize and to define species boundaries.

Multivariate Analysis

[Diagram: multivariate analysis divides into ordination analysis and classification (or clustering) analysis. Ordination divides into direct and indirect gradient analysis. Direct gradient analysis: linear regression (few species); canonical correspondence analysis (CCA) and redundancy analysis (RDA) (many species). Indirect gradient analysis, from distance values: principal coordinate analysis (PCoA) and non-metric multidimensional scaling (NMDS). Indirect gradient analysis, when raw data are available: principal components analysis (PCA), correspondence analysis (CA) and detrended correspondence analysis (DCA).]

Principal Components Analysis

Principal component analysis (PCA) is a statistical technique that has been specifically developed to address data reduction.

In general terms, the major aim of PCA is to reduce the complexity of the interrelationships among a potentially large number of observed variables to a relatively small number of linear combinations of them, which are referred to as principal components.

Principal components analysis finds a set of orthogonal standardized linear combinations which together explain all of the variation in the original data.
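
The definition above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the procedure used in the study: the function name `pca`, the simulated data and the choice to standardize are all assumptions made for the example.

```python
import numpy as np

def pca(X, standardize=True):
    """Hypothetical helper: PCA by eigen-decomposition of the covariance
    (or, if standardize=True, the correlation) matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each variable
    if standardize:
        Xc = Xc / X.std(axis=0, ddof=1)        # unit standard deviation
    S = np.cov(Xc, rowvar=False)               # p x p matrix
    eigvals, eigvecs = np.linalg.eigh(S)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # sort from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                      # objects in the new coordinates
    return eigvals, eigvecs, scores

# Simulated data (an assumption): four variables driven by one latent factor.
rng = np.random.default_rng(42)
latent = rng.normal(size=(100, 1))
X = latent @ np.array([[1.0, 0.8, 0.6, 0.4]]) + 0.3 * rng.normal(size=(100, 4))

eigvals, eigvecs, scores = pca(X)
explained = eigvals / eigvals.sum()            # proportion of variance per PC
print(np.round(explained, 3))
```

The proportions in `explained` sum to 1, which is the sense in which the full set of components explains all of the variation. Data reduction consists of keeping only the first few columns of `scores`.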

What are the assumptions of PCA?

PCA assumes linear relationships among variables: the cloud of points in p-dimensional space has linear dimensions that can be effectively summarized by the principal axes.

If the structure in the data is NONLINEAR (the cloud of points twists and curves its way through p-dimensional space), the principal axes will not be an efficient and informative summary of the data.

Considerations before running a PCA

Normal Distributions

Data Outliers

Transformations

Standardization

Data Matrix

Normal Distributions

• When using PCA, data normality is not essential. However, these methods are based on the correlation or covariance matrix, which is strongly affected by non-normally distributed data and the presence of outliers.

Data outliers

• Extreme values as well as outliers can have a severe influence on PCA, since it is based on the correlation or covariance matrix (Pison et al., 2003).

• Outliers should thus be removed prior to the statistical analysis, or statistical methods able to handle outliers should be employed, and the influence of extreme values needs to be reduced (e.g., via a suitable transformation).
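
A small numerical illustration of why outliers matter. The numbers are made up for the example and do not come from the study. A single extreme point is enough to change the correlation that PCA would work from.

```python
import numpy as np

# Hypothetical data: ten points lying close to the line y = x.
x = np.arange(1.0, 11.0)
y = x + np.tile([0.5, -0.5], 5)
r_clean = np.corrcoef(x, y)[0, 1]

# Add one outlier far from the trend.
x_out = np.append(x, 20.0)
y_out = np.append(y, 0.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print(round(r_clean, 3), round(r_outlier, 3))
```

The correlation is close to 1 for the clean data and close to 0 once the outlier is included, so the principal axes extracted from the two matrices would differ substantially.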

Transformations

Transformations change the scale of measurement of the data. They are used to meet the normality assumption of parametric analyses and the homogeneity of variance assumption of most of these analyses.

Transformations are particularly important for multivariate procedures based on eigenanalysis (e.g. principal components analysis) because covariances and correlations measure linear relationships between variables.

Transformations that improve linearity will increase the efficiency with which the eigenanalysis extracts the eigenvectors.
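
As a hedged illustration of how a transformation can improve linearity, the sketch below uses an invented power-law relationship (y = 2x^3, an assumption chosen for the example) and compares the Pearson correlation before and after a log transformation of both variables.

```python
import numpy as np

# Hypothetical allometric relationship: y grows as a power of x.
x = np.linspace(1.0, 10.0, 50)
y = 2.0 * x**3

r_raw = np.corrcoef(x, y)[0, 1]                   # curved relationship
r_log = np.corrcoef(np.log(x), np.log(y))[0, 1]   # linear after log-log

print(round(r_raw, 3), round(r_log, 3))
```

On the log-log scale the relationship is exactly linear, so the correlation reaches 1, whereas the raw correlation understates how tightly the two variables are related.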

Standardization

The first stage in rotating the data cloud is to standardize the data by subtracting the mean and dividing by the standard deviation.

It may be argued that we should not divide by the standard deviation. By standardizing, we are giving all species the same variation, i.e. a standard deviation of 1.
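
The standardization step can be written directly. The small matrix below is hypothetical (objects in rows, variables in columns), and the sample standard deviation (`ddof=1`) is an assumed convention.

```python
import numpy as np

# Hypothetical measurements: 5 objects (rows) x 2 variables (columns).
X = np.array([[12.0, 3.1],
              [15.0, 2.8],
              [ 9.0, 3.9],
              [20.0, 3.3],
              [14.0, 2.5]])

# Subtract the column mean and divide by the column standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(Z.mean(axis=0))          # approximately 0 for every variable
print(Z.std(axis=0, ddof=1))   # exactly 1 for every variable
```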

Data Matrix

We actually can have it both ways: a PCA without dividing by the standard deviation is an analysis of the covariance matrix. A PCA in which you do indeed divide by the standard deviation is an analysis of the correlation matrix.

When using species/variables measured in different units, you must use a correlation matrix.
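
The following sketch, on simulated data with invented variable names and units, illustrates both statements. The covariance matrix of standardized data equals the correlation matrix of the raw data. With mixed units, the first covariance-based component is dominated by the variable with the largest variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical variables in different units: a length in mm and a width in cm.
length_mm = rng.normal(500.0, 100.0, 200)
width_cm = 2.0 + 0.005 * (length_mm - 500.0) + rng.normal(0.0, 0.5, 200)
X = np.column_stack([length_mm, width_cm])

S = np.cov(X, rowvar=False)          # covariance matrix (raw units)
R = np.corrcoef(X, rowvar=False)     # correlation matrix

# Covariance of standardized data is the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
same = np.allclose(np.cov(Z, rowvar=False), R)

# First principal axis from each matrix (eigh sorts ascending, so take the last).
pc1_cov = np.linalg.eigh(S)[1][:, -1]
pc1_corr = np.linalg.eigh(R)[1][:, -1]

print(same)
print(np.round(np.abs(pc1_cov), 3))    # almost entirely the mm variable
print(np.round(np.abs(pc1_corr), 3))   # both variables weighted equally
```

With only two variables the correlation-based PC1 always weights them equally in absolute value, which makes the contrast with the covariance-based result easy to see.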

Look at the descriptors:

• Homogeneous nature (all the same kind, same units, same order of magnitude): use the S matrix (covariance).

• Heterogeneous nature (different kinds, different units, different orders of magnitude): use the R matrix (correlation).

Advantages and disadvantages

Correlation matrix

Advantages:
• The results of analyses for different sets of random variables are more directly comparable.

Disadvantages:
• There are considerable differences in the standard deviations, caused mainly by differences in scale.
• None of the correlations is particularly large in absolute value.
• PCs have moderate-sized coefficients for several of the variables.
• PCs give coefficients for standardized variables and are therefore less easy to interpret directly.

Covariance matrix

Advantages:
• PCs for the covariance matrix are each dominated by a single variable.
• The variances and total variance are more meaningful indices for measuring variability in data sets that are symmetric.

Disadvantages:
• The PCs are sensitive to the units of measurement used for each element of the variables.
• If there are large differences between the variances of the elements of the variables, then those variables whose variances are largest will tend to dominate the first few PCs.

Eigenvalues & Eigenvectors

The eigenvectors are the loadings of the principal components spanning the new PCA coordinate system.

The amount of variability contained in each principal component is expressed by the eigenvalues, which are simply the variances of the scores.

PCA searches for the direction in the multivariate space that contains the maximum variability. This is the direction of the first principal component (PC1). The second principal component (PC2) has to be orthogonal (perpendicular) to PC1 and will contain the maximum amount of the remaining data variability. Subsequent principal components are found by the same principle.
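
These three properties can be checked numerically. The sketch below uses simulated data (an assumption) and compares the variance along PC1 with the variance along 1000 random unit directions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated data: 200 objects, 3 correlated variables.
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X = rng.normal(size=(200, 3)) @ mixing
Xc = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 1. Eigenvalues are the variances of the scores.
scores = Xc @ eigvecs
score_var = scores.var(axis=0, ddof=1)

# 2. PC1 and PC2 are orthogonal.
dot_12 = eigvecs[:, 0] @ eigvecs[:, 1]

# 3. No direction carries more variance than PC1.
dirs = rng.normal(size=(1000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
proj_var = (Xc @ dirs.T).var(axis=0, ddof=1)

print(np.round(eigvals, 3), np.round(score_var, 3))
print(abs(dot_12) < 1e-10, proj_var.max() <= eigvals[0])
```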

Biplots

A biplot is a visualization tool to present the results of a PCA. The PCA biplot is called the scaling process.

The loadings (arrows) represent the elements. The lengths of the arrows in the plot are directly proportional to the variability included in the two components (PC1 and PC2) displayed, and the angle between any two arrows is a measure of the correlation between those variables.
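
The geometry of the arrows can be sketched without drawing the plot. The example assumes one common biplot scaling (eigenvectors of the correlation matrix multiplied by the square root of their eigenvalues). Other scalings exist, and the slides do not say which one was used. The data are simulated.

```python
import numpy as np

rng = np.random.default_rng(7)
base = rng.normal(size=(150, 1))
# Simulated variables: v0 and v1 are strongly correlated, v2 is independent.
X = np.hstack([base + 0.3 * rng.normal(size=(150, 1)),
               base + 0.3 * rng.normal(size=(150, 1)),
               rng.normal(size=(150, 1))])

R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Arrow coordinates: one row per variable, one column per component.
arrows = eigvecs * np.sqrt(eigvals)
arrows2d = arrows[:, :2]                 # what a PC1-PC2 biplot displays

def cosine(u, v):
    """Cosine of the angle between two arrows."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Squared arrow length: share of each variable's variance shown by PC1 and PC2.
shown = (arrows2d ** 2).sum(axis=1)

print(np.round(shown, 3))
print(round(cosine(arrows2d[0], arrows2d[1]), 3), round(R[0, 1], 3))
```

Using all components, the inner products of the arrows reproduce the correlation matrix exactly. The two-dimensional biplot is an approximation of that, which is why a small angle between two arrows indicates a strong positive correlation.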

Misconceptions

PCA cannot cope with missing values (but neither can most other statistical methods).

It does not require normality.

It is not a hypothesis test.

There are no clear distinctions between response variables and explanatory variables.

When should PCA be used?

In community ecology, PCA is useful for summarizing variables whose relationships are approximately linear or at least monotonic.

e.g. A PCA of many soil properties might be used to extract a few components that summarize the main dimensions of soil variation.

PCA is generally NOT useful for ordinating community data. Why? Because relationships among species are highly nonlinear.

Community trends along environmental gradients appear as “horseshoes” in PCA ordinations.

None of the PC axes effectively summarizes the trend in species composition along the gradient.

[Figure: PCA ordination, beta diversity 2R, covariance matrix; horizontal axis labeled Axis 1]

The “Horseshoe” Effect

Curvature of the gradient and the degree of infolding of the extremes increase with beta diversity.

PCA ordinations are not useful summaries of community data except when beta diversity is very low.

Using correlation generally does better than covariance. This is because standardization by species improves the correlation between Euclidean distance and environmental distance.
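
The horseshoe can be reproduced with a simulated community, sketched here under stated assumptions: one environmental gradient, evenly spaced sites, and species with Gaussian (unimodal) responses of equal tolerance. None of the numbers come from the slides.

```python
import numpy as np

# Hypothetical coenocline: 51 sites along one gradient (0 to 100) and
# 41 species with evenly spaced optima and equal tolerance.
sites = np.linspace(0.0, 100.0, 51)
optima = np.linspace(-30.0, 130.0, 41)
tolerance = 15.0
abundance = np.exp(-((sites[:, None] - optima[None, :]) ** 2)
                   / (2.0 * tolerance ** 2))       # sites x species

# PCA on the covariance matrix of species abundances.
Xc = abundance - abundance.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
scores = Xc @ eigvecs[:, order[:2]]
pc1, pc2 = scores[:, 0], scores[:, 1]

# PC1 tracks position along the gradient; PC2 is an arch (roughly quadratic).
r_linear = np.corrcoef(pc1, sites)[0, 1]
r_arch = np.corrcoef(pc2, (sites - sites.mean()) ** 2)[0, 1]

print(round(abs(r_linear), 3), round(abs(r_arch), 3))
```

Although the data contain a single gradient, PC2 is strongly related to the squared distance from the middle of that gradient, so plotting PC1 against PC2 gives a curve rather than a straight line. Making `tolerance` smaller relative to the gradient length raises beta diversity and should make the curvature stronger.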

What if there’s more than one underlying ecological gradient?

When there are two or more underlying gradients with high beta diversity, a “horseshoe” is usually not detectable.

Interpretation problems are more severe.

Data Set

Morphological and anatomical variation of Calophyllum L. (Calophyllaceae) in South America.

D. Vela

Kielmeyeroideae

Calophylleae
• Calophyllum
• Neotatea
• Marila
• Mahurea
• Clusiella
• Kielmeyera
• Caraipa
• Haploclathra
• Poeciloneuron
• Mesua
• Kayea
• Mammea

Endodesmieae
• Endodesmia
• Lebrunia

Stevens, 2006

Wurdack & Davis (2009)

Distribution of Calophyllaceae

[Map: species counts shown by region: 144 species; 259 species; 10 species; 176 species]

Stevens, 2006. http://www.mobot.org/MOBOT/research/APweb/

www.wikimedia.org

[Figure labels: Vein; Resin canal]

http://www.botany.hawaii.edu/faculty/carr/images/cal_ino.jpg

http://pakuwon.wordpress.com/2009/02/13/nyamplung‐calophyllum‐inophyllum/

http://www.flickr.com/photos/mauroguanandi

Calophyllum brasiliense

• There is infraspecific variation in tepal number between individuals of the same species, and between flowers from the same inflorescence.

Stevens (1974,1980)

Calophyllum brasiliense

http://www.nationaalherbarium.nl/sungaiwain/

Calophyllum pisiferum

Calophyllum lanigerum

1. Main objective

1.A To distinguish species limits of Calophyllum in South America.

2. Specific objectives

2.A To analyze morphological and anatomical variation.

Data collection for morphological observations

Herbarium and personal collections.

Collection sort: qualitative characteristics (Systematics Association Committee for Descriptive Biological Terminology, cited by Stearn 2006).

Measurement: ruler and a digital caliper.

Excel data matrix: specimen collections in rows and variables in columns.

Leaf characters
• Petiole length mm (PTL)
• Leaf length cm (LL)
• Leaf length at widest part cm (LWWP)
• Leaf width cm (LW)
• Apex length mm (PL)
• Midrib width at abaxial side mm (MW)
• Vein angle degree (VA)
• Venation density (VD)

Flower characters
• Pedicel length mm (PDL)
• Perianth width mm (PW)
• Perianth length mm (PRL)
• Anther length mm (AL)
• Anther width mm (AW)
• Stamen length mm (STL)
• Filament length mm (FL)
• Style length mm (STYL)
• Gynoecium length mm (GL)
• Ovary length mm (OL)
• Stigma width mm (SL)

Fruit characters
• External fruit length mm (FrLEx)
• External fruit width mm (FrWEx)
• Internal fruit length mm (FrLIn)
• Internal fruit width mm (FrWIn)
• Stigma remained mm (StygR)
• Basal discoloration mm (BsDis)
• Stone mm (Stn)
• Corky mm (CRK)
