A tutorial for the spatial Analysis of Principal Components (sPCA ...
Transcript of A tutorial for the spatial Analysis of Principal Components (sPCA ...
![Page 1: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/1.jpg)
A tutorial for the spatial Analysis of PrincipalComponents (sPCA) using adegenet 2.0.0
Thibaut Jombart ∗
Imperial College London
MRC Centre for Outbreak Analysis and Modelling
June 23, 2015
Abstract
This vignette provides a tutorial for the spatial analysis of principal components(sPCA, [1]) using the adegenet package [2] for the R software [3]. sPCA is firstillustrated using a simple simulated dataset, and then using empirical data of Chamois(Rupicapra rupicapra) from the Bauges mountains (France). In particular, we illustratehow sPCA complements classical PCA by being more powerful for retrieving non-trivialspatial genetic patterns.
1
![Page 2: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/2.jpg)
Contents
1 Introduction 31.1 Rationale of sPCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.2 The spca function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.3 Contents of a spca object . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.4 Graphical display of spca results . . . . . . . . . . . . . . . . . . . . . . . . 13
2 Case study: spatial genetic structure of the chamois in the Baugesmountains 252.1 An overview of the data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262.2 Summarising the genetic diversity . . . . . . . . . . . . . . . . . . . . . . . . 282.3 Mapping and testing PCA results . . . . . . . . . . . . . . . . . . . . . . . . 352.4 Multivariate tests of spatial structure . . . . . . . . . . . . . . . . . . . . . . 422.5 Spatial Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . 43
2
![Page 3: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/3.jpg)
1 Introduction
This tutorial goes through the spatial Principal Component Analysis (sPCA, [1]), amultivariate method devoted to the identification of spatial genetic patterns. The purposeof this tutorial is to provide guidelines for the application of sPCA as well as to illustrateits usefulness for the investigation of spatial genetic patterns. After briefly going throughthe rationale of the method, we introduce the different tools implemented for sPCA inadegenet. This technical overview is then followed by the analysis of an empirical datasetwhich illustrates the advantage of sPCA over classical PCA for investigating spatial patterns.
1.1 Rationale of sPCA
Mathematical notations used in this tutorial are identical to the original publication [1].The sPCA analyses a matrix of relative allele frequencies X which contains genotypes orpopulations (later refered to as ’entities’) in rows and alleles in columns. Spatial informationis stored inside a spatial weighting matrix L which contains positive terms correspondingto some measurement (often binary) of spatial proximity among entities. Most often, theseterms can be derived from a connection network built upon a given algorithm (for instance,pp.752-756 in [4]). This matrix is row-standardized (i.e., each of its rows sums to one), andall its diagonal terms are zero. L can be used to compute the spatial autocorrelation of agiven centred variable x (i.e., with mean zero) with n observations (x ∈ Rn) using Moran’sI [5, 6, 7]:
I(x) =xTLx
xTx(1)
In the case of genetic data, x contains frequencies of an allele. Moran’s I can be used tomeasure spatial structure in the values of x: it is highly positive when values of x observedat neighbouring sites tend to be similar (positive spatial autocorrelation, referred to as globalstructures), while it is strongly negative when values of x observed at neighbouring sitestend to be dissimilar (negative spatial autocorrelation, referred to as local structures).
However, since it is standardized by the variance of x, Moran’s index measures only spatialstructures and not genetic variability. The sPCA defines the following function to measureboth spatial structure and variability in x:
C(x) = var(x)I(x) =1
nxTLx (2)
C(x) is highly positive when x has a large variance and exhibits a global structure;conversely, it is largely negative when x has a high variance and displays a local structure.This function is the criterion used in sPCA, which finds linear combinations of the alleles ofX (denoted Xv) decomposing C from its maximum to its minimum value. Because C(Xv)is a product of variance and autocorrelation, it is important, when interpreting the results, todetail both components and to compare their value with their range of variation (maximumattainable variance, as well as maximum and minimum I are known analytically). A structurewith a low spatial autocorrelation can barely be interpreted as a spatial pattern; similarly, astructure with a low variance would likely not reflect any genetic structure. We will later seehow these information can be retrieved from spca results.
3
![Page 4: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/4.jpg)
1.2 The spca function
The simulated dataset used to illustrate this section has been analyzed in [1], and correspondsto Figure 2A of the article. In adegenet, the matrix of alleles frequencies previously denotedX exactly corresponds to the @tab slot of genind or genpop objects:
library(adegenet)
library(spdep)
library(adehabitat)
data(spcaIllus)
obj <- spcaIllus$dat2A
obj
## /// GENIND OBJECT /////////
##
## // 80 individuals; 20 loci; 192 alleles; size: 110.2 Kb
##
## // Basic content
## @tab: 80 x 192 matrix of allele counts
## @loc.n.all: number of alleles per locus (range: 6-14)
## @loc.fac: locus factor for the 192 columns of @tab
## @all.names: list of allele names for each locus
## @ploidy: ploidy of each individual (range: 2-2)
## @type: codom
## @call: old2new(object = obj)
##
## // Optional content
## @pop: population of each individual (group size range: 10-35)
## @other: a list containing: xy
head(tab(obj[loc=1]))
## L01.1 L01.2 L01.3 L01.4 L01.5 L01.6 L01.7 L01.8 L01.9
## 0035 0 0 0 0 1 1 0 0 0
## 0352 0 0 1 0 1 0 0 0 0
## 0423 0 0 0 0 1 0 0 0 1
## 0289 0 0 0 0 0 1 0 0 1
## 0487 0 0 0 0 0 1 0 1 0
## 0053 0 0 0 0 1 1 0 0 0
The object obj is a genind object; note that here, we only displayed the table for the firstlocus (loc=1).
The function performing the sPCA is spca; it accepts a bunch of arguments, but onlythe first two are mandatory to perform the analysis (see ?spca for further information):
4
![Page 5: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/5.jpg)
args(spca)
## function (obj, xy = NULL, cn = NULL, matWeight = NULL, scale = FALSE,
## scannf = TRUE, nfposi = 1, nfnega = 1, type = NULL, ask = TRUE,
## plot.nb = TRUE, edit.nb = FALSE, truenames = TRUE, d1 = NULL,
## d2 = NULL, k = NULL, a = NULL, dmin = NULL)
## NULL
The argument obj is a genind/genpop object. By definition in sPCA, the studied entitiesare georeferenced. The spatial information can be provided to the function spca in severalways, the first being through the xy argument, which is a matrix of spatial coordinates with’x’ and ’y’ coordinates in columns. Alternatively, these coordinates can be stored inside thegenind/genpop object, preferably as @other$xy, in which case the spca function will detectand use this information, and not request an xy argument. Note that obj already containsspatial coordinates at the appropriate place. Hence, we can use the following command torun the sPCA (ask and scannf are set to FALSE to avoid interactivity):
mySpca <- spca(obj, ask=FALSE, type=1, scannf=FALSE)
##
## PLEASE NOTE: The components "delsgs" and "summary" of the
## object returned by deldir() are now DATA FRAMES rather than
## matrices (as they were prior to release 0.0-18).
## See help("deldir").
##
## PLEASE NOTE: The process that deldir() uses for determining
## duplicated points has changed from that used in version
## 0.0-9 of this package (and previously). See help("deldir").
5
![Page 6: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/6.jpg)
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
Note, however, that spatial coordinates are not directly used in sPCA: the spatialinformation is included in the analysis by the spatial weighting matrix L derived from aconnection network (eq. 1 and 2). Technically, the spca function can incorporate spatialweightings as a matrix (argument matWeight), as a connection network with the classes nb
or listw (argument cn), both implemented in the spdep package. The function chooseCN
is a wrapper for different functions scattered across several packages implementing a varietyof connection networks. If only spatial coordinates are provided to spca, chooseCN is calledto construct an appropriate graph. See ?chooseCN for more information. Note that many ofthe spca arguments are in fact arguments for chooseCN: type, ask, plot.nb, edit.nb, d1,d2, k, a, and dmin. For instance, the command:
mySpca <- spca(obj,type=1,ask=FALSE,scannf=FALSE)
performs a sPCA using the Delaunay triangulation as connection network (type=1, see?chooseCN), while the command:
6
![Page 7: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/7.jpg)
mySpca <- spca(obj,type=5,d1=0,d2=2,scannf=FALSE)
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
computes a sPCA using a connection network which defines neighbouring entities basedon pairwise geographic distances (type=5), considering as neighbours two entities whosedistance between 0 (d1=0) and 2 (d2=2).
Another possibility is of course to provide directly a connection network (nb object) ora list of spatial weights (listw object) to the spca function; this can be done via the cn
argument. For instance:
myCn <- chooseCN(obj$other$xy, type=6, k=10, plot=FALSE)
myCn
## Neighbour list object:
## Number of regions: 80
## Number of nonzero links: 932
## Percentage nonzero weights: 14.5625
## Average number of links: 11.65
7
![Page 8: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/8.jpg)
class(myCn)
## [1] "nb"
mySpca2 <- spca(obj,cn=myCn,scannf=FALSE)
produces a sPCA using myCn (k = 10 nearest neighbours) as a connection network.
When used interactively (scannf=TRUE), spca displays a barplot of eigenvalues and asksthe user for a number of positive axes (’first number of axes’) and negative axes (’secondnumber of axes’) to be retained. For the object mySpca, this barplot would be (here weindicate in red the retained eigenvalue):
barplot(mySpca$eig,main="Eigenvalues of sPCA", col=rep(c("red","grey"),c(1,100)))
Eigenvalues of sPCA
−0.
050.
000.
050.
100.
150.
20
Positive eigenvalues (on the left) correspond to global structures, while negative eigenvalues(on the right) indicate local patterns. Actual structures should result in more extreme(positive or negative) eigenvalues; for instance, the object mySpca likely contains one singleglobal structure, and no local structure. If one does not want to choose the number of
8
![Page 9: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/9.jpg)
retained axes interactively, the arguments nfposi (number of retained factors with positiveeigenvalues) and nfnega (number of retained factors with negative eigenvalues) can be used.Once this information has been provided to spca, the analysis is computed and stored insidean object with the class spca.
1.3 Contents of a spca object
Let us consider a spca object resulting from the analysis of the object obj, using a Delaunaytriangulation (type=1) as connection network:
mySpca <- spca(obj,type=1,scannf=FALSE,plot.nb=FALSE,nfposi=1,nfnega=0)
class(mySpca)
## [1] "spca"
mySpca
## ########################################
## # spatial Principal Component Analysis #
## ########################################
## class: spca
## $call: spca(obj = obj, scannf = FALSE, nfposi = 1, nfnega = 0, type = 1,
## plot.nb = FALSE)
##
## $nfposi: 1 axis-components saved
## $nfnega: 0 axis-components saved
## Positive eigenvalues: 0.2309 0.1118 0.09379 0.07817 0.06911 ...
## Negative eigenvalues: -0.08421 -0.07376 -0.06978 -0.06648 -0.06279 ...
##
## vector length mode content
## 1 $eig 79 numeric eigenvalues
##
## data.frame nrow ncol
## 1 $c1 192 1
## 2 $li 80 1
## 3 $ls 80 1
## 4 $as 2 1
## content
## 1 principal axes: scaled vectors of alleles loadings
## 2 principal components: coordinates of entities ('scores')
## 3 lag vector of principal components
## 4 pca axes onto spca axes
##
## $xy: matrix of spatial coordinates
## $lw: a list of spatial weights (class 'listw')
##
## other elements: NULL
9
![Page 10: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/10.jpg)
An spca object is a list containing all required information about a performed sPCA. Detailsabout the different components of such a list can be found in the spca documentation (?spca).The purpose of this section is to explicit how the elements described in [1] are stored insidea spca object.
First, eigenvalues of the analysis are stored inside the $eig component as a numeric vectorstored in decreasing order:
head(mySpca$eig)
## [1] 0.23087862 0.11184721 0.09378750 0.07816561 0.06910536 0.06429596
tail(mySpca$eig)
## [1] -0.05480010 -0.06279067 -0.06647896 -0.06978457 -0.07375563 -0.08421213
length(mySpca$eig)
## [1] 79
barplot(mySpca$eig, main="A variant of the plot\n of sPCA eigenvalues",
col=spectral(length(mySpca$eig)))
legend("topright", fill=spectral(2),
leg=c("Global structures", "Local structures"))
abline(h=0,col="grey")
10
![Page 11: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/11.jpg)
A variant of the plot of sPCA eigenvalues
−0.
050.
000.
050.
100.
150.
20
Global structuresLocal structures
The axes of the analysis, denoted v in eq. (4) [1] are stored as columns inside the $c1
component. Each column contains loadings for all the alleles:
head(mySpca$c1)
## Axis 1
## L01.1 1.268838e-02
## L01.2 1.665335e-16
## L01.3 -1.119979e-01
## L01.4 -4.440892e-16
## L01.5 -2.766095e-02
## L01.6 -4.477031e-02
tail(mySpca$c1)
## Axis 1
## L20.3 0.28715850
## L20.4 0.01485180
## L20.5 -0.01500353
11
![Page 12: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/12.jpg)
## L20.6 0.01659481
## L20.7 -0.14260743
## L20.8 -0.15388988
dim(mySpca$c1)
## [1] 192 1
The entity scores, denoted ψ = Xv in the article, are stored in columns in the $li component:
head(mySpca$li)
## Axis 1
## 0035 -0.4367748
## 0352 -0.8052723
## 0423 -0.4337114
## 0289 0.1434650
## 0487 -0.4802931
## 0053 -0.5421831
tail(mySpca$li)
## Axis 1
## 1074 -0.06178196
## 1187 -0.08144162
## 1260 0.41491795
## 1038 0.25643986
## 1434 0.35618737
## 1218 0.21433977
dim(mySpca$li)
## [1] 80 1
The lag vectors of the scores can be used to better perceive global structures. Lag vectorsare stored in the $ls component:
head(mySpca$ls)
## Axis 1
## 0035 -0.7076732
## 0352 -0.6321654
## 0423 -0.4822952
## 0289 0.3947791
## 0487 -0.2803381
12
![Page 13: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/13.jpg)
## 0053 -0.4848376
tail(mySpca$ls)
## Axis 1
## 1074 0.4930238
## 1187 -0.8384871
## 1260 0.6887072
## 1038 0.3665794
## 1434 0.3109197
## 1218 0.3329688
dim(mySpca$ls)
## [1] 80 1
Lastly, we can compare the axes of an classical PCA (denoted u in the paper) to the axes ofthe sPCA (v). This is achieved by projecting u onto v, but this projection is a particularone: because both u and v are centred to mean zero and scaled to unit variance, the value ofthe projection simply is the correlation between both axes. This information is stored insidethe $as component:
mySpca$as
## Axis 1
## PCA Axis1 -0.7363595
## PCA Axis2 0.3395674
1.4 Graphical display of spca results
The information contained inside a spca object can be displayed in several ways. While wehave seen that a simple barplot of sPCA eigenvalues can give a first idea of the global and localstructures to be retained, we have also seen that each eigenvalue can be decomposed into avariance and a spatial autocorrelation (Moran’s I) component. This information is providedby the summary function, but it can also be represented graphically. The correspondingfunction is screeplot, and can be used on any spca object:
screeplot(mySpca)
13
![Page 14: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/14.jpg)
0.0 0.5 1.0 1.5 2.0
Variance
Spa
tial a
utoc
orre
latio
n (I
) λ1
λ2λ3λ4λ5λ6λ7λ8λ9λ10λ11λ12
λ13λ14λ15λ16λ17λ18λ19λ20λ21
λ22λ23
λ24λ25λ26λ27λ28λ29λ30λ31λ32λ33λ34λ35λ36λ37λ38λ39λ40λ41
λ42λ43λ44λ45
λ46λ47λ48λ49λ50λ51λ52
λ53λ54λ55λ56λ57
λ58λ59λ60λ61
λ62λ63λ64λ65λ66λ67λ68λ69λ70λ71 λ72λ73λ74 λ75
λ76λ77λ78λ79
−0.5
−0.1
0
0.3
0.6
1
Spatial and variance components of the eigenvalues
The resulting figure represents eigenvalues of sPCA (denoted λi with i = 1, . . . , r, where λ1is the highest positive eigenvalue, and λr is the highest negative eigenvalue) according thetheir variance and Moran’s I components. These eigenvalues are contained inside a rectangleindicated in dashed lines. The maximum attainable variance by a linear combination ofalleles is the one from an ordinary PCA, indicated by the vertical dashed line on the right.The two horizontal dashed lines indicate the range of variation of Moran’s I, given thespatial weighting matrix that was used. This figure is useful to assess whether a given scoreof entities contains relatively enough variability and spatial structuring to be interpreted.For instance, here, λ1 clearly is the largest eigenvalue in terms of variance and of spatialautocorrelation, and can be well distinguished from all the other eigenvalues. Hence, onlythe first global structure, associated to λ1, should be interpreted.
The global and local tests proposed in [1] can be used to reinforce the decision ofinterpreting or not interpreting global and local structures. Each test can detect the presenceof one kind of structure. We can apply them to the object obj, used in our sPCA:
myGtest <- global.rtest(obj$tab,mySpca$lw,nperm=99)
myGtest
14
![Page 15: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/15.jpg)
## Monte-Carlo test
## Call: global.rtest(X = obj$tab, listw = mySpca$lw, nperm = 99)
##
## Observation: 0.01658103
##
## Based on 99 replicates
## Simulated p-value: 0.01
## Alternative hypothesis: greater
##
## Std.Obs Expectation Variance
## 4.723216e+00 1.298668e-02 5.791148e-07
plot(myGtest)
Histogram of sim
sim
Fre
quen
cy
0.011 0.012 0.013 0.014 0.015 0.016 0.017
05
1015
2025
The produced object is a randtest object (see ?randtest), which is the class of objects forMonte-Carlo tests in the ade4 package. As shown, such object can be plotted using a plot
function: the resulting figure shows an histogram of permuted test statistics and indicatesthe observed statistics by a black dot and a segment. Here, the plot clearly shows that theoberved test statistic is larger than most simulated values, leading to a likely rejection of
15
![Page 16: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/16.jpg)
the null hypothesis of absence of spatial structure. Note that because 99 permutations wereused, the p-value cannot be lower than 0.01. In practice, more permutations should be used(like 999 or 9999 for results intended to be published).
The same can be done with the local test, which here we do not expect to be significant:
myLtest <- local.rtest(obj$tab,mySpca$lw,nperm=99)
myLtest
## Monte-Carlo test
## Call: local.rtest(X = obj$tab, listw = mySpca$lw, nperm = 99)
##
## Observation: 0.01397349
##
## Based on 99 replicates
## Simulated p-value: 0.16
## Alternative hypothesis: greater
##
## Std.Obs Expectation Variance
## 1.131138e+00 1.311777e-02 5.723089e-07
plot(myLtest)
16
![Page 17: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/17.jpg)
Histogram of sim
sim
Fre
quen
cy
0.011 0.012 0.013 0.014 0.015 0.016
05
1015
2025
30
Once we have an idea of which structures shall be interpreted, we can try to visualizespatial genetic patterns. There are several ways to do so. The first, most simple approach isthrough the function plot (see ?plot.spca):
plot(mySpca)
17
![Page 18: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/18.jpg)
This figure displays various information, that we detail from the top to bottom and fromleft to right (also see ?plot.spca). The first plot shows the connection network thatwas used to define spatial weightings. The second, third, and fourth plots are differentrepresentations of a score of entities in space, the first global score being the default(argument axis). In each, the values of scores ($li[,axis] component of the spca object)are represented using black and white symbols (a variant being grey levels): white fornegative values, and black for positive values. The second plot is a local interpolation ofscores (function s.image in ade4 ), using grey levels, with contour lines. The closer thecontour lines are from each other, the stepest the genetic differentiation is. The third plotuses different sizes of squares to represent different absolute values (s.value in ade4 ): largeblack squares are well differentiated from large white squares, but small squares are lessdifferentiated. The fourth plot is a variant using grey levels (s.value in ade4, with ’greylevel’method). Here, all the three representations of the first global score show that genotypesare splitted in two genetical clusters, one in the west (or left) and one in the east (right).The last two plots of the plot.spca function are the two already seen displays of eigenvalues.
While the default plot function for spca objects provides a useful summary of the results,more flexible tools are needed e.g. to map the principal components onto the geographic
18
![Page 19: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/19.jpg)
space. This can be achieved using the colorplot function. This function can summarize upto three scores at the same time by translating each score into a channel of color (red, green,and blue). The obtained values are used to compose a color using the RGB system. See?colorplot for details about this function. The original idea of such representation is dueto [8]. Despite the colorplot clearly is more powerful to represent more than one score on asingle map, we can use it to represent the first global structure that was retained in mySpca:
colorplot(mySpca,cex=3,main="colorplot of mySpca, first global score")
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●●
●
● ●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
● ●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
0 2 4 6 8 10
02
46
810
colorplot of mySpca, first global score
x
y
See examples in ?colorplot and ?spca for more examples of applications of colorplot torepresent sPCA scores.
Another common practice is interpolating principal components to get maps of geneticclines. Note that it is crucial to perform this interpolation after the analysis, and not before,which would add artefactual structures to the data. Interpolation is easy to realize usinginterp from the akima package, and image, or filled.contour to display the results:
19
![Page 20: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/20.jpg)
library(akima)
x <- other(obj)$xy[,1]
y <- other(obj)$xy[,2]
temp <- interp(x, y, mySpca$li[,1])
image(temp, col=azur(100))
points(x,y)
Note that for better clarity, we can use the lagged principal scores ($ls) rather thanthe original scores ($li); we also achieve a better resolution using specific interpolatedcoordinates:
interpX <- seq(min(x),max(x),le=200)
interpY <- seq(min(y),max(y),le=200)
temp <- interp(x, y, mySpca$ls[,1], xo=interpX, yo=interpY)
image(temp, col=azur(100))
points(x,y)
20
![Page 21: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/21.jpg)
Alternatively, filled.contour can be used for the display, and a customized color palettecan be specified:
myPal <- colorRampPalette(c("firebrick2", "white", "lightslateblue"))
annot <- function(){title("sPCA - interpolated map of individual scores")
points(x,y)
}filled.contour(temp, color.pal=myPal, nlev=50,
key.title=title("lagged \nscore 1"), plot.title=annot())
21
![Page 22: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/22.jpg)
Besides assessing spatial patterns, it is sometimes valuable to assess which alleles actuallyexhibit the structure of interest. In sPCA, the contribution of alleles to a specific structureis given by the corresponding squared loading. We can look for the alleles contributing mostto e.g. the first axis of sPCA, using the function loadingplot (see ?loadingplot for adescription of the arguments):
myLoadings <- mySpca$c1[,1]^2
names(myLoadings) <- rownames(mySpca$c1)
loadingplot(myLoadings, xlab="Alleles",
ylab="Weight of the alleles",
main="Contribution of alleles \n to the first sPCA axis")
22
![Page 23: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/23.jpg)
0.00
0.02
0.04
0.06
0.08
0.10
Contribution of alleles to the first sPCA axis
Alleles
Wei
ght o
f the
alle
les
L01.3
L01.8
L01.9L02.05
L02.09
L03.4
L03.5L04.1L04.2
L05.8
L05.9L06.07L06.08
L07.3
L08.06L08.07
L09.01
L09.05L09.06
L10.5
L11.4
L11.5L11.6
L12.4
L12.7
L12.8
L13.05
L13.06L14.03L14.05
L14.11
L15.03L15.09
L16.02L16.10
L17.1
L17.2
L17.4
L17.6
L17.7
L18.05L18.06
L19.04
L19.05
L19.12
L20.3
L20.7
L20.8
See ?loadingplot for more information about this function, in particular for the definitionof the threshold value above which alleles are annotated. Note that it is possible to alsoseparate the alleles by markers, using the fac argument, to assess if all markers havecomparable contributions to a given structure. In our case, we would only have to [email protected]; also note that loadingplot invisibly returns information about the alleleswhose contribution is above the threshold. For instance, to identify the 5% of alleles withthe greatest contributions to the first global structure in mySpca, we need:
temp <- loadingplot(myLoadings, threshold=quantile(myLoadings, 0.95),
xlab="Alleles",ylab="Weight of the alleles",
main="Contribution of alleles \n to the first sPCA axis",
fac=obj$loc.fac, cex.fac=0.6)
23
![Page 24: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/24.jpg)
0.00
0.02
0.04
0.06
0.08
0.10
Contribution of alleles to the first sPCA axis
Alleles
Wei
ght o
f the
alle
les
L01 L02 L03 L04 L05 L06 L07 L08 L09 L10 L11 L12 L13 L14 L15 L16 L17 L18 L19 L20
L08.06L08.07
L11.4
L12.4
L14.11 L16.02L16.10
L17.2
L20.3
L20.8
temp
## $threshold
## 95%
## 0.02345973
##
## $var.names
## [1] "L08.06" "L08.07" "L11.4" "L12.4" "L14.11" "L16.02" "L16.10"
## [8] "L17.2" "L20.3" "L20.8"
##
## $var.idx
## L08.06 L08.07 L11.4 L12.4 L14.11 L16.02 L16.10 L17.2 L20.3 L20.8
## 71 72 99 105 130 146 154 157 187 192
##
## $var.values
## L08.06 L08.07 L11.4 L12.4 L14.11 L16.02
## 0.03044687 0.03037709 0.06111338 0.03199067 0.02799529 0.02873923
## L16.10 L17.2 L20.3 L20.8
24
![Page 25: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/25.jpg)
## 0.02806079 0.05793290 0.08246000 0.02368209
But to assess the average contribution of each marker, the boxplot probably is a bettertool:
boxplot(myLoadings~obj$loc.fac, las=3, ylab="Contribution", xlab="Marker",
main="Contributions by markers \nto the first global score", col="grey")
●
●●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
L01
L02
L03
L04
L05
L06
L07
L08
L09
L10
L11
L12
L13
L14
L15
L16
L17
L18
L19
L20
0.00
0.02
0.04
0.06
0.08
Contributions by markers to the first global score
Marker
Con
trib
utio
n
2 Case study: spatial genetic structure of the chamois
in the Bauges mountains
The chamois (Rupicapra rupicapra) is a conserved species in France. The Bauges mountainsis a protected area in which the species has been recently studied. One of the most importantquestions for conservation purposes relates to whether individuals from this area form a singlereproductive unit, or whether they are structured into sub-groups, and if so, what causes arelikely to induce this structuring.
25
![Page 26: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/26.jpg)
While field observations are very scarce and do not allow to answer this question, geneticdata can be used to tackle the issue, as departure from panmixia should result in geneticstructuring. The dataset rupica contains 335 georeferenced genotypes of Chamois from theBauges mountains for 9 microsatellite markers, which we propose to analyse.
2.1 An overview of the data
We first load the data:
data(rupica)
rupica
## /// GENIND OBJECT /////////
##
## // 335 individuals; 9 loci; 55 alleles; size: 235.7 Kb
##
## // Basic content
## @tab: 335 x 55 matrix of allele counts
## @loc.n.all: number of alleles per locus (range: 4-10)
## @loc.fac: locus factor for the 55 columns of @tab
## @all.names: list of allele names for each locus
## @ploidy: ploidy of each individual (range: 2-2)
## @type: codom
## @call: NULL
##
## // Optional content
## @other: a list containing: xy mnt showBauges
rupica is a genind object, that is, the class of objects storing genotypes (as opposedto population data) in adegenet. rupica also contains topographic information about thesampled area, which can be displayed by calling rupica$other$showBauges. Altitude mapsare displayed using the adehabitat package [9]. The spatial distribution of the sampling canbe displayed as follows:
rupica$other$showBauges()
points(rupica$other$xy, col="red",pch=20)
26
![Page 27: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/27.jpg)
This spatial distribution is clearly not random, but seems arranged into loose clusters.However, superimposed samples can bias our visual assessment of the spatial clustering. Usea two-dimensional kernel density estimation (function s.kde2d) to overcome this possibleissue.
rupica$other$showBauges()
s.kde2d(rupica$other$xy,add.plot=TRUE)
points(rupica$other$xy, col="red",pch=20)
27
![Page 28: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/28.jpg)
Unfortunately, geographical clustering is not strong enough to assign unambiguously eachindividual to a group. Therefore, we need to carry all analyses at the individual level, whichprecludes the use of most population genetics tools.
2.2 Summarising the genetic diversity
As a prior clustering of genotypes is not known, we cannot employ usual FST -basedapproaches to detect genetic structuring. However, genetic structure could still result ina deficit of heterozygosity. Use the summary of genind objects to compare expected andobserved heterozygosity:
rupica.smry <- summary(rupica)
##
## # Total number of genotypes: 335
##
## # Population sample sizes:
## 1
## 335
28
![Page 29: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/29.jpg)
##
## # Number of alleles per locus:
## Bm1818 Bm203 Eth225 Hel1 Bm4505 Bmc1009 Ilst030 Maf70 Sr
## 7 10 7 6 5 5 6 4 5
##
## # Number of alleles per population:
## 1
## 55
##
## # Percentage of missing data:
## [1] 0
##
## # Observed heterozygosity:
## Bm1818 Bm203 Eth225 Hel1 Bm4505 Bmc1009 Ilst030
## 0.5880597 0.6208955 0.5253731 0.7492537 0.6537313 0.5283582 0.6298507
## Maf70 Sr
## 0.5552239 0.4149254
##
## # Expected heterozygosity:
## Bm1818 Bm203 Eth225 Hel1 Bm4505 Bmc1009 Ilst030
## 0.6072158 0.6532517 0.5314591 0.7259626 0.6603405 0.5706082 0.6393235
## Maf70 Sr
## 0.5473112 0.4039120
plot(rupica.smry$Hobs, rupica.smry$Hexp, main="Observed vs expected heterozygosity")
abline(0,1,col="red")
29
![Page 30: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/30.jpg)
●
●
●
●
●
●
●
●
●
0.45 0.50 0.55 0.60 0.65 0.70 0.75
0.40
0.45
0.50
0.55
0.60
0.65
0.70
Observed vs expected heterozygosity
rupica.smry$Hobs
rupi
ca.s
mry
$Hex
p
The red line indicate identity between both quantities. Observed heterozygosity do not seemto deviate massively from theoretical expectations. This is confirmed by a classical pairwiset-test::
t.test(rupica.smry$Hexp, rupica.smry$Hobs,paired=TRUE,var.equal=TRUE)
##
## Paired t-test
##
## data: rupica.smry$Hexp and rupica.smry$Hobs
## t = 1.1761, df = 8, p-value = 0.2734
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.007869215 0.024249885
## sample estimates:
## mean of the differences
## 0.008190335
30
![Page 31: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/31.jpg)
We can seek a global picture of the genetic diversity among genotypes using a PrincipalComponent Analysis (PCA, function dudi.pca in the ade4 package). The analysis isperformed on a table of standardized alleles frequencies, obtained by scaleGen. Note thatwe disable the scaling option when performing the PCA, which would otherwise re-scale thedata and therefore erase the previous scaling of scaleGen. The function dudi.pca displaysa barplot of eigenvalues and asks for a number of retained principal components:
rupica.X <- scaleGen(rupica)
rupica.pca1 <- dudi.pca(rupica.X, cent=FALSE, scale=FALSE, scannf=FALSE, nf=2)
barplot(rupica.pca1$eig, main="Rupica dataset - PCA eigenvalues",
col=heat.colors(length(rupica.pca1$eig)))
Rupica dataset − PCA eigenvalues
0.0
0.5
1.0
1.5
2.0
2.5
3.0
The output produced by dudi.pca is a dudi object. A dudi object contains variousinformation; in the case of PCA, principal axes (loadings), principal components (syntheticvariable), and eigenvalues are respectively stored in $c1, $li, and $eig slots. Here is thecontent of the PCA:
31
![Page 32: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/32.jpg)
rupica.pca1
## Duality diagramm
## class: pca dudi
## $call: dudi.pca(df = rupica.X, center = FALSE, scale = FALSE, scannf = FALSE,
## nf = 2)
##
## $nf: 2 axis-components saved
## $rank: 51
## eigen values: 3.004 2.555 2.264 2.101 2.093 ...
## vector length mode content
## 1 $cw 55 numeric column weights
## 2 $lw 335 numeric row weights
## 3 $eig 51 numeric eigen values
##
## data.frame nrow ncol content
## 1 $tab 335 55 modified array
## 2 $li 335 2 row coordinates
## 3 $l1 335 2 row normed scores
## 4 $co 55 2 column coordinates
## 5 $c1 55 2 column normed scores
## other elements: cent norm
In general, eigenvalues represent the amount of genetic diversity — as measured bythe multivariate method being used — represented by each principal component (PC). Anabrupt decrease in eigenvalues is likely to indicate the boundary between true patterns andnon-interpretable structures. In this case, the first two PCs may contain some relevantbiological signal.
We can use s.label to display the two first components of the analysis. Kernel densityestimation (s.kde2d) is used for a better assessment of the distribution of the genotypes ontothe principal axes:
s.label(rupica.pca1$li)
s.kde2d(rupica.pca1$li, add.p=TRUE, cpoint=0)
add.scatter.eig(rupica.pca1$eig,2,1,2)
32
![Page 33: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/33.jpg)
d = 5
8
63 64 66 67 68 69
70
71 72
74 76
77 78 81
82
83
85 86
162 163
164
165
169
170
545 556 599 600
616
633 636
664
665
669
734
735
862 863 1170
1171
1429 1474 2000 2002
2004
2051
2053
2057
2158
2183
2198
2307
2309
2317
2319
2320 2321
2323
2329
2332 2333 2340
2344 2347
2350
2353
2355
2357
2358
2360
2372 2373 2374
2379 2401
2402
2406
2408
2412
2418 2443 2454
2459
2469
2485
2498 2503
2508
2514 2542 2546
2562 2564
2569
2577
2581 2588
2590
2596 2604
2609 2614
2615
2626 2628
2632
2638
2643
2645
2672
2680 2682
2689 2695
2698
2721 5572
5575
5576
5578 5579 5580 5583
5584
5852 5853
5855
5856 5862
5864
5866
5868
5869 5895
5896
5899
5904
5911 5912
5915 5919
5922 5924
5938
5939
5948
5951
5954
5956 5961 5963
5966 5969
5971 5972
5980
5983 5988
5991
5994 5995
5996
5997
6001
6004
6007
6008 6013
6014 6016 6019 6020
6021
6023
6024
6027
6029 6030
6031 6033
6034
6035
6036
6037
6043
6045
6071
6087
6100 6102
6111
6113 6132
6133
6142 6146
6147
6148
6149
6150
6151
6152 6153 6154
6155 6156 6157 6159
6161
6162 6163
6165
6166
6168 6184
6214
6216
6217 6218
6219
6220
6222 6223
6224 7072
7077 7090 7091
7094
7100
7101
7102
7104
7110
7112
7114
7120
7128
7136
7137
7141
7143 7161
7173
7174 7175
7176
7178
7179
7185
7186
7188 7189 7191
7192 7193 7194
7197
7199
7203 7206
7207
7209 7222 7224 7230
7234 7237
7239
7241
7244
7379
7380
7382 7385
7402
7435
7438
7443
7451
7456
7466
7469
7473
7474 7476 7477
7478
7480
7483
7484
7485
7496
7497
7500 7501 7502 7503 7505 7514
7516
7523 7524
7526
7535
7561
7562 7563
7566
7567 7570
7630
7634
7635
7636 7639 7641 7642 7643 7644 7647 7648
7652
7659
7666
7667
7668
6000b
6160b
6161b 7101b
7477b
7480b 7641b
Eigenvalues
This scatterplot shows that the only structure identified by PCA points to a few outliers.loadingplot confirms that this corresponds to the possession of a few original alleles:
loadingplot(rupica.pca1$c1^2)
33
![Page 34: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/34.jpg)
0.00
0.05
0.10
0.15
0.20
0.25
0.30
Loading plot
Variables
Load
ings
Bm203.221
Eth225.136Eth225.142
Eth225.146
Eth225.148
Eth225.151Hel1.120Bm4505.260
Bm4505.266Bmc1009.230
Ilst030.161
Maf70.134Maf70.139 Sr.231
We can go back to the genotypes for the concerned markers (e.g., Bm203) to check whetherthe highlighted genotypes are uncommon. truenames extracts the table of allele frequenciesfrom a genind object (restoring original labels for markers, alleles, and individuals):
X <- truenames(rupica)
## This accessor is now deprecated. Please use ’tab’ instead.
class(X)
## [1] "matrix"
dim(X)
## [1] 335 55
bm203.221 <- X[,"Bm203.221"]
table(bm203.221)
## bm203.221
## 0 1
## 331 4
34
![Page 35: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/35.jpg)
Only 4 genotypes possess one copy of the allele 221 of marker bm203 (the second resultcorresponds to a replaced missing data). Which individuals are they?
rownames(X)[bm203.221==0.5]
## named character(0)
These are indeed our outliers. From the point of view of PCA, this would be the onlystructure in the data. However, further analyses show that more is to be seen...
2.3 Mapping and testing PCA results
A frequent practice in spatial genetics is mapping the first principal components (PCs) ontothe geographic space. ade4 ’s function s.value is well-suited to do so, using black and whitesquares of variable size for positive and negative values. To give a legend for this type ofrepresentation:
s.value(cbind(1:11,rep(1,11)), -5:5, cleg=0)
text(1:11,rep(1,11), -5:5, col="red",cex=1.5)
35
![Page 36: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/36.jpg)
d = 2
−5 −4 −3 −2 −1 0 1 2 3 4 5
We apply this graphical representation to the first two PCs of the PCA:
showBauges <- rupica$other$showBauges
showBauges()
s.value(rupica$other$xy, rupica.pca1$li[,1], add.p=TRUE, cleg=0.5)
title("PCA - first PC",col.main="yellow" ,line=-2, cex.main=2)
36
![Page 37: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/37.jpg)
showBauges()
s.value(rupica$other$xy, rupica.pca1$li[,2], add.p=TRUE, csize=0.7)
title("PCA - second PC",col.main="yellow" ,line=-2, cex.main=2)
37
![Page 38: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/38.jpg)
As we can see, none of these PCs seems to display a particular spatial pattern. This visualassessment can be complemented by a test of spatial autocorrelation in these PCs. Thiscan be achieved using Moran’s I test. We use spdep’s function moran.mc to perform thesetwo tests. We first need to define the spatial connectivity between the sampled individuals.For these data, spatial connectivity is best defined as the overlap between home rangesof individuals, modelled as disks with a radius of 1150m. chooseCN is used to create thecorresponding connection network:
rupica.graph <- chooseCN(rupica$other$xy,type=5,d1=0,d2=2300, plot=FALSE,
res="listw")
The connection network should ressemble this:
rupica.graph
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 335
## Number of nonzero links: 18018
38
![Page 39: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/39.jpg)
## Percentage nonzero weights: 16.05525
## Average number of links: 53.78507
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 335 112225 335 15.04311 1352.07
plot(rupica.graph, rupica$other$xy)
title("rupica.graph")
We perform Moran’s test for the first two PCs, and plot the results.
pc1.mctest <- moran.mc(rupica.pca1$li[,1], rupica.graph, 999)
plot(pc1.mctest)
39
![Page 40: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/40.jpg)
−0.02 0.00 0.02 0.04
010
2030
4050
60
Density plot of permutation outcomes
Monte−Carlo simulation of Moran's Irupica.pca1$li[, 1]
Den
sity
This result is surprisingly significant. Why is this? Moran’s plot (moran.plot) representsthe tested variable against its lagged vector; we apply it to the first PC:
moran.plot(rupica.pca1$li[,1], rupica.graph)
40
![Page 41: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/41.jpg)
●
●
●
●
●●
●
●
●
●
●
●
●●
●●●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●●●
●●
●●
●
●
●
●
●●●● ●●
●●●●
●●●
●● ●●
●
●●●●●●
●●●●●
●●
● ●●
●
●●
●
●●●●
●
●
●●●● ●●● ●●●●●● ●●● ●●●● ●●●
●●●●
●●●
●●●●●●●
● ●
●
●●
●
●●●●
●●
●
●
●● ●●
●
●
●●●● ●●
●
●●
●●●
●●
●
●
●
●
●
●
●
●
●● ●●●●●●●●●
●
●●
●
●
●●●
●●●
●●
●
●●
●
●
●●
●
●
●●
●
●●●●●●●●●●●●●●
● ●●
● ●● ●
●●
●●●●●
●●
●
●
●●
●●●●
●
● ●●●●●●
●●
●●
●
●●●●●●●
●
●
●●●
●
●● ●●●●
●
●●
●
●●
●●●●●
●●●●●● ●● ●
●
●●●
●
●●
●●●●● ●●
●●●●
●●●●
●●
●
●●
●
●●
●●●●
●
●●●●●●
●
●
−20 −15 −10 −5 0
−2.
0−
1.5
−1.
0−
0.5
0.0
rupica.pca1$li[, 1]
spat
ially
lagg
ed r
upic
a.pc
a1$l
i[, 1
]1
2
3
4
56
7
9
1011
12151617
19
20
21
22
23
24
25
29
43
273
274275
276
Positive autocorrelation corresponds to a positive correlation between a variable and its lagvector. Here, we can see that this relation is entirely driven by the previously identifiedoutliers, which turn out to be neighbours. This is therefore a fairly trivial and uninterestingpattern. Results obtained on the second PC are less surprisingly non-significant:
pc2.mctest <- moran.mc(rupica.pca1$li[,2], rupica.graph, 999)
plot(pc2.mctest)
41
![Page 42: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/42.jpg)
−0.02 0.00 0.02 0.04
010
2030
40
Density plot of permutation outcomes
Monte−Carlo simulation of Moran's Irupica.pca1$li[, 2]
Den
sity
2.4 Multivariate tests of spatial structure
So far, we have only tested the existence of spatial structures in the first two principalcomponents of a PCA of the data. Therefore, these tests only describe one fragment of thedata, and do not encompass the whole diversity in the data. As a complement, we can useMantel test (mantel.randtest) to test spatial structures in the whole data, by assessing thecorrelation between genetic distances and geographic distances. Pairwise Euclidean distancesare computed using dist. Perform Mantel test, using the scaled genetic data you used beforein PCA, and the geographic coordinates.
mtest <- mantel.randtest(dist(rupica.X), dist(rupica$other$xy))
plot(mtest, nclass=30)
42
![Page 43: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/43.jpg)
Histogram of sim
sim
Fre
quen
cy
−0.10 −0.05 0.00 0.05 0.10 0.15
020
4060
80
Interestingly, this test turns out to be marginally significant, and would encourage us to lookfor spatial patterns. This is the role of the spatial Principal Component Analysis.
2.5 Spatial Principal Component Analysis
We apply an sPCA to the rupica dataset, using the connection network used previously inMoran’s I tests:
rupica.spca1 <- spca(rupica, cn=rupica.graph,scannf=FALSE,
nfposi=2,nfnega=0)
barplot(rupica.spca1$eig, col=rep(c("red","grey"), c(2,1000)),
main="rupica dataset - sPCA eigenvalues")
43
![Page 44: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/44.jpg)
rupica dataset − sPCA eigenvalues
−0.
005
0.00
00.
005
0.01
00.
015
0.02
00.
025
0.03
0
The principal components associated with the first two positive eigenvalues (in red) shall beretained. The printing of spca objects is more explicit than dudi objects, but named withthe same conventions:
rupica.spca1
## ########################################
## # spatial Principal Component Analysis #
## ########################################
## class: spca
## $call: spca(obj = rupica, cn = rupica.graph, scannf = FALSE, nfposi = 2,
## nfnega = 0)
##
## $nfposi: 2 axis-components saved
## $nfnega: 0 axis-components saved
## Positive eigenvalues: 0.03001 0.01445 0.00924 0.006851 0.004485 ...
## Negative eigenvalues: -0.008643 -0.006407 -0.004488 -0.003977 -0.003248 ...
##
## vector length mode content
44
![Page 45: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/45.jpg)
## 1 $eig 51 numeric eigenvalues
##
## data.frame nrow ncol
## 1 $c1 55 2
## 2 $li 335 2
## 3 $ls 335 2
## 4 $as 2 2
## content
## 1 principal axes: scaled vectors of alleles loadings
## 2 principal components: coordinates of entities ('scores')
## 3 lag vector of principal components
## 4 pca axes onto spca axes
##
## $xy: matrix of spatial coordinates
## $lw: a list of spatial weights (class 'listw')
##
## other elements: NULL
Unlike usual multivariate analyses, eigenvalues of sPCA are composite: they measure boththe genetic diversity (variance) and the spatial structure (spatial autocorrelation measuredby Moran’s I). This decomposition can also be used to choose which principal componentto interprete. The function screeplot allows to display this information graphically:
screeplot(rupica.spca1)
45
![Page 46: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/46.jpg)
While λ1 indicates with no doubt a structure, the second eigenvalue, λ2 is less clearly distinctfrom the successive values. Thus, we shall keep in mind this uncertainty when interpretingthe second principal component of the analysis.
We map the sPCA results using s.value and lagged scores ($ls) instead of the PC ($li),which are a ’denoisified’ version of the PCs.
showBauges()
s.value(rupica$other$xy, rupica.spca1$ls[,1], add.p=TRUE, csize=0.7)
title("sPCA - first PC",col.main="yellow" ,line=-2, cex.main=2)
46
![Page 47: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/47.jpg)
This first PC shows a remarkably clear structure opposing two high-altitude areas separatedby a valley, which is thought to be an obstacle to the dispersal of Chamois (due to higherexposition to predation in low-altitude areas).
The second PC of sPCA shows an equally interesting structure:
showBauges()
s.value(rupica$other$xy, rupica.spca1$ls[,2], add.p=TRUE, csize=0.7)
title("sPCA - second PC",col.main="yellow" ,line=-2, cex.main=2)
47
![Page 48: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/48.jpg)
The smaller clusters appearing on this map correspond to social units identified by directobservation in the field. Therefore, this genetic structure is merely a reflect of the socialbehaviour of these individuals.
Both genetic structures can be represented altogether using colorplot. The final figureshould ressemble this (although colors may change from one computer to another):
showBauges()
colorplot(rupica$other$xy, rupica.spca1$ls, axes=1:2, transp=TRUE, add=TRUE,
cex=3)
title("sPCA - colorplot of PC 1 and 2\n(lagged scores)", col.main="yellow",
line=-2, cex=2)
48
![Page 49: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/49.jpg)
49
![Page 50: A tutorial for the spatial Analysis of Principal Components (sPCA ...](https://reader031.fdocuments.in/reader031/viewer/2022022415/584d21b91a28ab8573902d5d/html5/thumbnails/50.jpg)
References
[1] Jombart T, Devillard S, Dufour A-B and Pontier D (2008) Revealing cryptic spatialpatterns in genetic variability by a new multivariate method. Heredity 101: 92-103.
[2] Jombart, T. (2008) adegenet: a R package for the multivariate analysis of genetic markers.Bioinformatics 24: 1403-1405.
[3] R Development Core Team (2011) R: A language and environment for statisticalcomputing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
[4] Legendre P and Legendre L (1998) Numerical ecology. Elsevier Science B. V., Amsterdam.
[5] Moran P (1948). The interpretation of statistical maps. Journal of the Royal StatisticalSociety, B 10: 243–251.
[6] Moran, P. (1950) Notes on continuous stochastic phenomena. Biometrika 37: 17–23.
[7] Cliff A and Ord J (1981) Spatial Processes. Model & Applications. London: Pion.
[8] Menozzi P, Piazza A and Cavalli-Sforza LL (1978) Synthetic maps of human genefrequencies in Europeans. Science 201: 786–792.
[9] Callenge C (2006) The package ”adehabitat” for the R software: a tool for the analysisof space and habitat use by animals. Ecological Modelling 197: 516–519.
50