Introduction to multivariate analysis applications in...
Multivariate analysis in a nutshell Applications to genomic data Genetic diversity of pathogen populations
Introduction to multivariate analysis: applications in genomics
Thibaut Jombart ([email protected])
MRC Centre for Outbreak Analysis and Modelling, Imperial College London
MSc “Modern epidemiology”, 22-03-2013
Outline
Multivariate analysis in a nutshell
Applications to genomic data
Genetic diversity of pathogen populations
Multivariate data: some examples
Association between individuals? Correlations between variables?
Multivariate analysis to summarize diversity
Multivariate analysis: an overview
Multivariate analysis, a.k.a.:
• “dimension reduction techniques”
• “ordinations in reduced space”
• “factorial methods”
Purposes:
• summarize diversity amongst observations
• summarize correlations between variables
Most common methods
Differences lie in input data:
• quantitative/binary variables: Principal Component Analysis (PCA)
• 2 categorical variables: Correspondence Analysis (CA)
• >2 categorical variables: Multiple Correspondence Analysis (MCA)
• Euclidean distance matrix: Principal Coordinates Analysis (PCoA) / Metric Multidimensional Scaling (MDS)
Many other methods for ≥ 2 data tables, spatial analysis, phylogenetic analysis, etc.
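As a rough illustration of the distance-matrix input, the sketch below runs a PCoA (classical metric MDS) by double-centring the squared distances and taking an eigendecomposition. The function name `pcoa`, the toy data and the use of NumPy are assumptions made for this example; they are not part of the lecture.

```python
import numpy as np

def pcoa(dist, n_axes=2):
    """Classical PCoA / metric MDS on a symmetric distance matrix (sketch)."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances (Gower's transformation).
    centring = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centring @ (dist ** 2) @ centring
    # Eigendecomposition of the symmetric matrix, largest eigenvalues first.
    eigval, eigvec = np.linalg.eigh(gram)
    order = np.argsort(eigval)[::-1][:n_axes]
    eigval, eigvec = eigval[order], eigvec[:, order]
    # Coordinates: eigenvectors scaled by the square root of their eigenvalue.
    coords = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
    return coords, eigval

# Hypothetical toy data: 5 observations described by 3 quantitative variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

coords, eigval = pcoa(D, n_axes=3)
# Distances between the PCoA coordinates, to compare with the input distances.
D_hat = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
print(np.allclose(D, D_hat))
```

Because the input distances here are Euclidean and all three axes are kept, the coordinates reproduce the distances; with non-Euclidean distances some eigenvalues can be negative, which this sketch simply clips to zero.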
1 dimension, 2 dimensions, P dimensions
Need to find the most informative directions in a P-dimensional space.
Reducing P dimensions into 1
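• $\mathbf{X} \in \mathbb{R}^{N \times P}$; $\mathbf{X} = [\mathbf{x}_1 | \dots | \mathbf{x}_P]$: data matrix
• $\mathbf{u} \in \mathbb{R}^{P}$; $\mathbf{u} = [u_1, \dots, u_P]$: principal axis ($\|\mathbf{u}\|^2 = \sum_{j=1}^{P} u_j^2 = 1$)
• $\mathbf{v} \in \mathbb{R}^{N}$; $\mathbf{v} = \mathbf{X}\mathbf{u} = \sum_{j=1}^{P} u_j \mathbf{x}_j$: principal component
→ find $\mathbf{u}$ so that $\frac{1}{N} \|\mathbf{v}\|^2 = \mathrm{var}(\mathbf{v})$ is maximum.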
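A minimal sketch of this optimisation, assuming the columns of X are centred (so that the squared norm of v divided by N is its variance) and using the standard result that the maximising unit vector is the leading eigenvector of the covariance matrix. The function name `first_principal_axis`, the toy data and the use of NumPy are assumptions for illustration only.

```python
import numpy as np

def first_principal_axis(X):
    """Return (u, v, variance) for the first principal axis of X (sketch)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Centre the columns so that (1/N) * ||v||^2 is the variance of v.
    Xc = X - X.mean(axis=0)
    # Covariance matrix with the 1/N convention used on the slide.
    cov = Xc.T @ Xc / n
    # eigh returns eigenvalues in ascending order; the last one is the largest.
    eigval, eigvec = np.linalg.eigh(cov)
    u = eigvec[:, -1]   # principal axis, unit norm
    v = Xc @ u          # principal component (one score per observation)
    return u, v, eigval[-1]

# Hypothetical toy data: N = 200 observations, P = 3 variables, two correlated.
rng = np.random.default_rng(1)
latent = rng.normal(size=200)
X = np.column_stack([
    latent + 0.1 * rng.normal(size=200),
    2.0 * latent + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

u, v, lam = first_principal_axis(X)
print("||u||^2 =", float(u @ u))
print("var(v)  =", float(v @ v) / len(v))
print("largest eigenvalue =", float(lam))
```

The variance of the principal component equals the largest eigenvalue, which is why eigenvalues are read as the magnitude of each structure.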
Keeping more than one principal component
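• $\mathbf{u}_1$ and $\mathbf{v}_1$: 1st principal axis and component
• $\mathbf{u}_2$ and $\mathbf{v}_2$: 2nd principal axis and component
→ constraint: $\mathbf{u}_1 \perp \mathbf{u}_2$ ($\iff \mathrm{cor}(\mathbf{v}_1, \mathbf{v}_2) = 0$)
→ find $\mathbf{u}_2$ so that $\frac{1}{N} \|\mathbf{v}_2\|^2 = \mathrm{var}(\mathbf{v}_2)$ is maximum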
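The constraint can be checked numerically. The sketch below, which assumes centred data and takes the axes as the eigenvectors of the covariance matrix, extracts two axes and verifies that they are orthogonal and that the corresponding components are uncorrelated. The name `principal_components` and the toy data are hypothetical.

```python
import numpy as np

def principal_components(X, n_axes=2):
    """Return axes U (P x n_axes), components V (N x n_axes) and eigenvalues."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / X.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)
    # Keep the n_axes largest eigenvalues, in decreasing order.
    order = np.argsort(eigval)[::-1][:n_axes]
    U = eigvec[:, order]
    V = Xc @ U
    return U, V, eigval[order]

# Hypothetical toy data: 100 observations, 4 correlated variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))

U, V, eigval = principal_components(X, n_axes=2)
u1, u2 = U[:, 0], U[:, 1]
v1, v2 = V[:, 0], V[:, 1]
print("u1 . u2     =", float(u1 @ u2))
print("cor(v1, v2) =", float(np.corrcoef(v1, v2)[0, 1]))
print("var(v1) >= var(v2):", bool(eigval[0] >= eigval[1]))
```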
How many principal components to retain?
Choice based on “screeplot”: barplot of eigenvalues
Retain only “significant” structures... but not trivial ones.
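A text-only screeplot can be sketched as follows, assuming the eigenvalues are those of the covariance matrix of centred data. The function name `eigenvalue_summary` and the simulated data (two strong structures plus noise) are assumptions; where to cut remains a judgement call that the code does not make.

```python
import numpy as np

def eigenvalue_summary(X):
    """Eigenvalues of the covariance matrix, largest first, with proportions."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / X.shape[0]
    eigval = np.linalg.eigvalsh(cov)[::-1]
    eigval = np.clip(eigval, 0.0, None)   # guard against tiny negative round-off
    proportion = eigval / eigval.sum()
    return eigval, proportion

# Hypothetical toy data: two strong structures plus noise, P = 6 variables.
rng = np.random.default_rng(3)
scores = rng.normal(size=(150, 2)) * np.array([3.0, 2.0])
X = scores @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(150, 6))

eigval, proportion = eigenvalue_summary(X)
for k, (lam, prop) in enumerate(zip(eigval, proportion), start=1):
    bar = "#" * int(round(50 * prop))   # bar length proportional to variance share
    print(f"PC{k}: {lam:8.3f} {prop:6.1%} {bar}")
```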
Outputs of multivariate analyses: an overview
Main outputs:
• principal components: diversity amongst individuals
• principal axes: nature of the structures
• eigenvalues: magnitude of structures
Usual summary of an analysis: the biplot
Biplot: principal components (points) + loadings (arrows)
• groups of individuals
• discriminating variables (longest arrows)
• magnitude of the structures
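The ingredients of a biplot can be computed without any plotting library. In the sketch below, points are the scores on the first two principal axes and arrows are the loadings scaled by the square root of the eigenvalues, which is one common convention and an assumption here, as are the function name `biplot_coordinates` and the simulated two-group data.

```python
import numpy as np

def biplot_coordinates(X, names):
    """Points (scores) and arrows (scaled loadings) on the first two axes."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / X.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1][:2]
    loadings = eigvec[:, order]                    # P x 2, unit-norm axes
    scores = Xc @ loadings                         # N x 2, one point per individual
    arrows = loadings * np.sqrt(eigval[order])     # assumed arrow scaling
    arrow_length = np.linalg.norm(arrows, axis=1)
    ranking = [names[i] for i in np.argsort(arrow_length)[::-1]]
    return scores, arrows, ranking

# Hypothetical toy data: two groups of individuals differing mainly on "var_a".
rng = np.random.default_rng(4)
group = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 3))
X[:, 0] += 6.0 * group
names = ["var_a", "var_b", "var_c"]

scores, arrows, ranking = biplot_coordinates(X, names)
print("variables by arrow length:", ranking)
print("group means on PC1:",
      float(scores[group == 0, 0].mean()),
      float(scores[group == 1, 0].mean()))
```

The sign of each axis is arbitrary, so only the separation between groups along PC1 is meaningful, not which group falls on the positive side.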
Multivariate analysis in a nutshell
• variety of methods for different types of variables
• principal components (PCs) summarize diversity
• variable loadings identify discriminating variables
• other uses of PCs: maps (spatial structures), models (response variables or predictors), ...
Outline
Multivariate analysis in a nutshell
Applications to genomic data
Genetic diversity of pathogen populations
From DNA sequences to patterns of biological diversity
DNA sequences: a rich source of information
• hundreds to thousands of individuals
• up to millions of single nucleotide polymorphisms (SNPs)
⇒ Multivariate analysis is used to summarize genetic diversity.
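To make the link between SNP data and the data matrix concrete, here is a small sketch that encodes diploid genotypes as counts of an alternate allele (individuals in rows, SNPs in columns) and then obtains principal components from a singular value decomposition of the centred matrix, an equivalent route to the eigendecomposition of the covariance matrix. The genotypes, the choice of alternate alleles and the name `allele_counts` are invented for illustration.

```python
import numpy as np

def allele_counts(genotypes, alt_alleles):
    """Encode diploid genotypes as counts (0, 1, 2) of the alternate allele."""
    counts = np.zeros((len(genotypes), len(alt_alleles)))
    for i, individual in enumerate(genotypes):
        for j, (genotype, alt) in enumerate(zip(individual, alt_alleles)):
            counts[i, j] = genotype.count(alt)
    return counts

# Hypothetical toy data: 4 individuals typed at 3 biallelic SNPs.
genotypes = [
    ["AA", "CT", "GG"],
    ["AG", "CC", "GG"],
    ["GG", "TT", "GT"],
    ["AG", "CT", "TT"],
]
alt_alleles = ["G", "T", "T"]   # assumed alternate allele at each SNP

X = allele_counts(genotypes, alt_alleles)
print(X)

# PCA via SVD of the centred matrix: rows of `scores` are the individuals.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                      # principal components
eigval = S ** 2 / X.shape[0]        # variances, 1/N convention
print(scores[:, :2])
```

With real data the matrix would have far more columns than rows, which is the situation where a low-dimensional summary is most useful.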
First application of multivariate analysis in genetics
PCA of genetic data, native human populations (Cavalli-Sforza 1966, Proc B)
First 2 principal components separate populations into continents.
Applications: some examples
PCA of genetic data + colored maps of principal components (Cavalli-Sforza et al. 1993, Science)
Signatures of human expansion out of Africa.
19/33
![Page 54: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/54.jpg)
Since then...
Multivariate methods used in genetics
• Principal Component Analysis (PCA)
• Principal Coordinates Analysis (PCoA) / Metric Multidimensional Scaling (MDS)
• Correspondence Analysis (CA)
• Discriminant Analysis (DA)
• Canonical Correlation Analysis (CCA)
• ...
20/33
![Page 55: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/55.jpg)
Since then...
Applications
• reveal spatial structures (historical spread)
• explore genetic diversity
• identify cryptic species
• discover genotype-phenotype association
• ...
• review in Jombart et al. 2009, Heredity 102: 330-341
Applications in genetics of pathogen populations.
21/33
![Page 57: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/57.jpg)
Outline
Multivariate analysis in a nutshell
Applications to genomic data
Genetic diversity of pathogen populations
22/33
![Page 58: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/58.jpg)
Why investigate the diversity of pathogen populations?
Genetic data: increasingly important in infectious disease epidemiology
Purposes
• classify pathogens, describe their relationships
• assess the spatio-temporal dynamics of infectious diseases
• reconstruct epidemiological processes (transmission)
23/33
![Page 62: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/62.jpg)
Different questions at different scales
Where and how can multivariate analysis of pathogen genetic data be useful?
24/33
![Page 64: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/64.jpg)
Describing pathogen populations
Population genetics: identify populations of organisms and describe their relationships
What is a population?
• Usual definition: set of organisms mating at random
• Problem: no “mating” in most pathogens (e.g. viruses, bacteria)
• Genetic clusters: set of genetically related pathogens (e.g. same outbreak, same epidemic).
⇒ aim: identify and describe genetic clusters
25/33
![Page 69: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/69.jpg)
Genetic clustering using K-means & BIC (Jombart et al. 2010, BMC Genetics)
Variance partitioning model (ANOVA):
total variance = (variance between groups) + (variance within groups)
Performance:
• K-means performs at least as well as STRUCTURE on simulated data (various island and stepping-stone models)
• orders of magnitude faster (seconds vs hours/days)
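The variance partition behind K-means can be checked numerically. The sketch below is only an illustration in plain Python, not the implementation used in the cited paper: the toy 2-D points, the deterministic starting centres, and the BIC form n·log(WSS/n) + k·log(n) are all assumptions made for this example.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm; starting centres are evenly spaced
    input points (an assumption that keeps the example deterministic)."""
    n = len(points)
    centres = [list(points[(i * n) // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(p, centres[c]))
                  for p in points]
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = centroid(members)
    return labels

def partition_sums_of_squares(points, labels):
    """Return (total, between, within) sums of squares for a grouping."""
    grand = centroid(points)
    total = sum(sq_dist(p, grand) for p in points)
    between = within = 0.0
    for g in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == g]
        centre = centroid(members)
        between += len(members) * sq_dist(centre, grand)
        within += sum(sq_dist(p, centre) for p in members)
    return total, between, within

def bic_score(points, labels):
    """One simple BIC form for K-means (assumed here, lower is better)."""
    n = len(points)
    k = len(set(labels))
    _, _, within = partition_sums_of_squares(points, labels)
    return n * math.log(within / n) + k * math.log(n)

# Hypothetical toy data: two well-separated groups of four points.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.1, 5.0)]

for k in (1, 2, 3):
    labels = kmeans(points, k)
    total, between, within = partition_sums_of_squares(points, labels)
    print(k, round(total, 4), round(between + within, 4),
          round(bic_score(points, labels), 2))
```

For every k, the total sum of squares equals the between-group plus within-group sums, and K-means looks for the grouping that makes the within-group part as small as possible.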
26/33
![Page 71: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/71.jpg)
PCA of seasonal influenza (A/H3N2) data
Data: seasonal influenza (A/H3N2), 500 HA segments.
Little temporal evolution, burst of diversity in 2002??
27/33
![Page 73: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/73.jpg)
Which diversity to represent?
Total diversity not relevant to analyse clusters.
Discriminant Analysis of Principal Components (DAPC) (Jombart et al. 2010, BMC Genetics):
• maximizes group discrimination (“between/within” ratio)
• provides group membership probabilities (prediction possible)
• as computationally efficient as PCA
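Why total diversity can hide clusters is easy to see in two dimensions. The sketch below is not DAPC itself (which first reduces the data with PCA and then applies discriminant analysis); it only compares the between/within ratio along the first principal axis with the ratio along a two-group discriminant axis. The toy coordinates and all function names are assumptions made for this illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def scatter(points):
    """Sums of squares and cross-products (sxx, sxy, syy) about the mean."""
    mx = mean([p[0] for p in points])
    my = mean([p[1] for p in points])
    sxx = sum((p[0] - mx) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    return sxx, sxy, syy

def first_pc(points):
    """Unit vector of maximum total variance (closed form for 2-D data)."""
    sxx, sxy, syy = scatter(points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))

def discriminant_axis(points, groups):
    """Two-group discriminant direction: inverse pooled within-group
    scatter matrix applied to the difference of group means."""
    first, second = sorted(set(groups))
    a = [p for p, g in zip(points, groups) if g == first]
    b = [p for p, g in zip(points, groups) if g == second]
    sxx, sxy, syy = (u + v for u, v in zip(scatter(a), scatter(b)))
    dx = mean([p[0] for p in b]) - mean([p[0] for p in a])
    dy = mean([p[1] for p in b]) - mean([p[1] for p in a])
    det = sxx * syy - sxy ** 2
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)

def between_within_ratio(points, groups, axis):
    """Project points on an axis, then divide the between-group sum of
    squares by the within-group sum of squares of the scores."""
    scores = [p[0] * axis[0] + p[1] * axis[1] for p in points]
    grand = mean(scores)
    between = within = 0.0
    for g in set(groups):
        members = [s for s, lab in zip(scores, groups) if lab == g]
        m = mean(members)
        between += len(members) * (m - grand) ** 2
        within += sum((s - m) ** 2 for s in members)
    return between / within

# Hypothetical data: both groups vary widely along x, differ only along y.
points = [(-3, 0.1), (-1, -0.1), (1, 0.1), (3, -0.1),
          (-3, 1.1), (-1, 0.9), (1, 1.1), (3, 0.9)]
groups = ["A"] * 4 + ["B"] * 4

ratio_pc = between_within_ratio(points, groups, first_pc(points))
ratio_da = between_within_ratio(points, groups,
                                discriminant_axis(points, groups))
print("PC1 ratio:", ratio_pc)
print("discriminant ratio:", ratio_da)
```

In this toy case the first principal axis follows the large within-group spread and barely separates the groups, while the discriminant axis gives a much larger between/within ratio.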
28/33
![Page 75: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/75.jpg)
DAPC of seasonal influenza (A/H3N2) data
Strong temporal signal, originality of 2006 isolates (new alleles).
29/33
![Page 77: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/77.jpg)
Identifying antigenic clusters in influenza (A/H3N2)
Antigenic clusters identified directly from AA sequences.
30/33
![Page 79: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/79.jpg)
DAPC to identify structuring alleles
DAPC finds combinations of alleles most differing between groups.
Simulated data (Jombart & Ahmed 2011, Bioinformatics):
• 2 clusters, 50 isolates each
• 1,000,000 non-structured SNPs
• 1,000 structured SNPs (i.e. different frequencies between groups)
Possible applications to pathogen GWAS (e.g. SNPs related to antibiotic resistance in bacteria).
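A scaled-down version of this kind of simulation can be sketched in plain Python. This is a simplified stand-in, not the DAPC loadings used in the cited work: SNPs are ranked one at a time by the absolute difference in allele frequency between the two groups. The haploid 0/1 coding, the allele frequencies (0.5 for non-structured SNPs, 0.1 vs 0.9 for structured ones), the reduced SNP counts and the seed are all assumptions chosen so the example runs quickly.

```python
import random

def simulate_group(n_isolates, freqs, rng):
    """Draw haploid 0/1 genotypes, one row per isolate, one column per SNP."""
    return [[1 if rng.random() < f else 0 for f in freqs]
            for _ in range(n_isolates)]

def allele_frequencies(genotypes):
    """Frequency of allele '1' at each SNP within one group."""
    n = len(genotypes)
    return [sum(column) / n for column in zip(*genotypes)]

rng = random.Random(42)          # fixed seed, for reproducibility only
n_unstructured = 1000            # scaled down from the slide's 1,000,000
n_structured = 10                # scaled down from the slide's 1,000

# Structured SNPs are placed last: indices 1000..1009.
freqs_a = [0.5] * n_unstructured + [0.1] * n_structured
freqs_b = [0.5] * n_unstructured + [0.9] * n_structured

group_a = simulate_group(50, freqs_a, rng)   # 2 clusters, 50 isolates each
group_b = simulate_group(50, freqs_b, rng)

freq_a = allele_frequencies(group_a)
freq_b = allele_frequencies(group_b)

# Rank SNPs by how much their frequency differs between the two groups.
difference = [abs(x - y) for x, y in zip(freq_a, freq_b)]
ranked = sorted(range(len(difference)),
                key=lambda j: difference[j], reverse=True)
top = ranked[:n_structured]
print(sorted(top))
```

Unlike this per-SNP ranking, DAPC weighs combinations of alleles jointly, which is what makes it a candidate for the GWAS-type applications mentioned above.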
31/33
![Page 81: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/81.jpg)
Limits of multivariate analysis
Methicillin-resistant Staphylococcus aureus (MRSA) outbreak within a hospital, Thailand. ∼200 full-genome sequences. ∼1,000 SNPs.
Observations:
• greater diversity than expected
• genetic clusters can be defined
• transmissions at within-cluster level
• multivariate analysis = loss of information
Multivariate analysis usually not informative on small-scale processes.
32/33
![Page 86: Introduction to multivariate analysis applications in genomicsadegenet.r-forge.r-project.org/files/simGWAS/lecture-MSc-MVA.1.2.pdf · Multivariate analysis in a nutshellApplications](https://reader031.fdocuments.in/reader031/viewer/2022022117/5ca9e0e288c993130d8c8829/html5/thumbnails/86.jpg)
Summary
• multivariate analysis used for ∼50 years in genetics, still an active field for methodological development
• increasingly useful as datasets grow
• specific application to pathogen genetic data
• limits reached when reconstructing fine-scale processes
33/33