0 introduction
![Page 1: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/1.jpg)
Introduction to Metabolomic Data Analysis
Dmitry Grapov, PhD
![Page 2: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/2.jpg)
Important
• This is an introduction to a series of 8 tutorials for metabolomic data analysis
• Download all the required files and software here:
https://sourceforge.net/projects/teachingdemos/files/Winter%202014%20LC-MS%20and%20Statistics%20Course/
• Then follow the directions in software/startup.R to launch all accompanying software
![Page 3: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/3.jpg)
Goals?
![Page 4: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/4.jpg)
Analysis at the Metabolomic Scale
![Page 5: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/5.jpg)
Cycle of Scientific Discovery
[Figure: cycle linking Hypothesis, Data Acquisition, Data Processing, Data, Data Analysis, and Hypothesis Generation]
![Page 6: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/6.jpg)
Univariate vs. Multivariate
[Figure panels: Univariate (Group 1 vs. Group 2), Multivariate, Predictive Modeling]
Univariate: hypothesis testing (t-Test, ANOVA, etc.)
Multivariate: PCA
Predictive Modeling: O-/PLS/-DA
![Page 7: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/7.jpg)
Univariate vs. Multivariate
univariate/bivariate vs. multivariate
mixed-up samples? outliers?
![Page 8: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/8.jpg)
Data Analysis Goals
Exploration
• Are there any trends in my data?
– analytical sources
– meta data/covariates
• Useful Methods
– matrix decomposition (PCA, ICA, NMF)
– cluster analysis
Classification
• Differences/similarities between groups?
– discrimination, classification, significant changes
• Useful Methods
– analysis of variance (ANOVA), mixed effects models
– partial least squares discriminant analysis (O-/PLS-DA)
– Others: random forest, CART, SVM, ANN
Prediction
• What is related or predictive of my variable(s) of interest?
– Regression, correlation
• Useful Methods
– correlation
– partial least squares (O-/PLS)
![Page 9: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/9.jpg)
Data Complexity
[Figure: data matrix of n samples by m variables, shown as 1-D, 2-D, and m-D data, with accompanying Meta Data]
Experimental Design = complexity
Variable # = dimensionality
![Page 10: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/10.jpg)
Univariate Qualities
• length (sample size)
• center (mean, median, geometric mean)
• dispersion (variance, standard deviation)
• range (min / max)
• quantiles
• shape (skewness, kurtosis, normality, etc.)
[Figure: distribution annotated with mean and standard deviation]
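The qualities listed above can be computed with Python's standard `statistics` module. This is a minimal sketch, not part of the course material: the function name `describe` and the example values are illustrative assumptions, and shape measures (skewness, kurtosis) are left out.

```python
import statistics


def describe(values):
    """Return basic univariate summaries for a list of positive numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "length": len(values),                                # sample size
        "mean": statistics.mean(values),                      # center
        "median": statistics.median(values),
        "geometric_mean": statistics.geometric_mean(values),  # needs values > 0
        "variance": statistics.variance(values),              # sample variance (n - 1)
        "sd": statistics.stdev(values),                       # dispersion
        "min": min(values),                                   # range
        "max": max(values),
        "quartiles": (q1, q2, q3),
    }


# Illustrative values only
summary = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```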
![Page 11: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/11.jpg)
Data Quality
Metrics
• Precision
• Accuracy
Remedies
• normalization
• outlier detection
*Start lab 1-statistical analysis
![Page 12: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/12.jpg)
Univariate Analyses
• Identify differences in sample population means
• sensitive to distribution shape
• parametric = assumes normality
• error in Y, not in X (Y = mX + error)
• optimal for long data
• assumed independence
• false discovery rate (FDR)
[Figure: data shapes labeled long, wide, and n-of-one]
![Page 13: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/13.jpg)
False Discovery Rate (FDR)
• Type I Error: False Positives
• Type II Error: False Negatives
• Type I risk = $1 - (1 - \text{p.value})^{m}$, where m = number of variables tested
FDR correction
• p-value adjustment or estimate of FDR (Fdr, q-value)
Bioinformatics (2008) 24 (12):1461-1462
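To make the Type I risk formula concrete, the sketch below evaluates it for m tests and applies one common p-value adjustment, the Benjamini-Hochberg procedure. The slide does not say which adjustment the labs use, so Benjamini-Hochberg is an assumption here, and the function names are illustrative.

```python
def type_one_risk(p_value, m):
    """Chance of at least one false positive across m independent tests."""
    return 1.0 - (1.0 - p_value) ** m


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


risk_single = type_one_risk(0.05, 1)     # one test at the 0.05 level
risk_hundred = type_one_risk(0.05, 100)  # 100 tests at the 0.05 level
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

With 100 variables tested at 0.05, the risk of at least one false positive is above 99%, which is why some correction is needed for wide data.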
![Page 14: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/14.jpg)
Achieving “significance” is a function of:
significance level (α) and power (1-β)
effect size (standardized difference in means)
sample size (n)
*finish lab 1-statistical analysis
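The relationship between these four quantities can be sketched with the usual normal approximation for a two-sided, two-group comparison of means. This is a rough illustration under that assumption, not the lab's method; an exact t-test calculation gives slightly larger sample sizes, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference in means.
    Uses n = 2 * ((z_(1 - alpha/2) + z_(power)) / effect_size) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) / effect_size) ** 2)


n_medium = samples_per_group(0.5)  # medium standardized effect
n_large = samples_per_group(1.0)   # large standardized effect
```

Halving the effect size roughly quadruples the required sample size, because n scales with the inverse square of the effect size.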
![Page 15: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/15.jpg)
Clustering
Identify
• patterns
• group structure
• relationships
• Evaluate/refine hypothesis
• Reduce complexity
Artist: Chuck Close
![Page 16: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/16.jpg)
Cluster Analysis
Use the concept of similarity/dissimilarity to group a collection of samples or variables
Approaches
• hierarchical (HCA)
• non-hierarchical (k-NN, k-means)
• distribution (mixture models)
• density (DBSCAN)
• self-organizing maps (SOM)
[Figure panels: Linkage, k-means, Distribution, Density]
![Page 17: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/17.jpg)
Hierarchical Cluster Analysis
• similarity/dissimilarity defines “nearness” or distance
[Figure: X-Y panels illustrating euclidean, manhattan, Mahalanobis, and non-euclidean distance]
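Two of the distances named on this slide are easy to write out directly. The sketch below covers euclidean and manhattan distance only; Mahalanobis distance also needs the inverse covariance matrix of the data and is omitted. Function names are illustrative.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


d_euclidean = euclidean((0, 0), (3, 4))
d_manhattan = manhattan((0, 0), (3, 4))
```

The same pair of points is 5 units apart under euclidean distance and 7 under manhattan distance, so the choice of metric changes which samples count as "near".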
![Page 18: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/18.jpg)
Hierarchical Cluster Analysis
Agglomerative/linkage algorithm defines how points are grouped
[Figure panels: single, complete, centroid, average]
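A small sketch of agglomerative clustering shows where the linkage rule enters: it decides how the distance between two clusters is summarized from the pairwise point distances. The code assumes euclidean distance, implements single, complete, and average linkage (centroid linkage is omitted), and uses a brute-force search that only suits small examples. The function name is hypothetical.

```python
import math


def agglomerate(points, n_clusters, linkage="single"):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of point indices.
    """
    combine = {
        "single": min,                            # nearest pair of members
        "complete": max,                          # farthest pair of members
        "average": lambda d: sum(d) / len(d),     # mean over all pairs
    }[linkage]
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        return combine([math.dist(points[i], points[j]) for i in a for j in b])

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under the chosen linkage.
        _, a, b = min(
            (cluster_distance(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Illustrative points: two well-separated pairs
example_points = [(0, 0), (0, 1), (5, 5), (5, 6)]
groups = agglomerate(example_points, n_clusters=2, linkage="average")
```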
![Page 19: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/19.jpg)
Dendrograms
[Figure: dendrogram with a Similarity axis]
![Page 20: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/20.jpg)
Hierarchical Cluster Analysis
[Figure panels: Exploration, Confirmation]
How does my metadata match my data structure?
*finish lab 2-Cluster Analysis
![Page 21: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/21.jpg)
Projection of Data
The algorithm defines the position of the light source
Principal Components Analysis (PCA)
• unsupervised
• maximize variance (X)
Partial Least Squares Projection to Latent Structures (PLS)
• supervised
• maximize covariance (Y ~ X)
James X. Li, 2009, VisuMap Tech.
![Page 22: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/22.jpg)
Interpreting PCA Results
Variance explained (eigenvalues)
Row (sample) scores and column (variable) loadings
![Page 23: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/23.jpg)
How are scores and loadings related?
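One way to see the relationship: loadings are the direction of a component in variable space, and scores are the mean-centered samples projected onto that direction, so centered data ≈ scores × loadings. The sketch below finds only the first component by power iteration on the covariance matrix. It is an illustration, not the lab's implementation, and it assumes the starting vector is not orthogonal to the first component.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores, variance_explained) for PC1 of mean-centered data."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    X = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix (m x m)
    cov = [[sum(x[j] * x[k] for x in X) / (n - 1) for k in range(m)] for j in range(m)]
    v = [1.0] + [0.0] * (m - 1)  # starting direction (assumed not orthogonal to PC1)
    for _ in range(iterations):
        w = [sum(cov[j][k] * v[k] for k in range(m)) for j in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Eigenvalue = variance along the loading direction
    eigenvalue = sum(v[j] * sum(cov[j][k] * v[k] for k in range(m)) for j in range(m))
    total_variance = sum(cov[j][j] for j in range(m))
    scores = [sum(x[j] * v[j] for j in range(m)) for x in X]  # one score per sample
    return v, scores, eigenvalue / total_variance


# Illustrative data: two perfectly correlated variables
loadings, scores, explained = first_principal_component([(1, 1), (2, 2), (3, 3), (4, 4)])
```

In this toy case both variables load equally and the first component explains all of the variance, so each sample can be rebuilt as mean + score × loadings.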
![Page 24: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/24.jpg)
Centering and Scaling
PMID: 16762068
*finish lab 3-Principal Components Analysis
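Centering and scaling are applied per variable (column) before PCA. The sketch below shows mean centering, autoscaling (unit variance), and Pareto scaling. The slide does not list which methods it covers, so this choice of three is an assumption, and the function names are illustrative.

```python
import statistics


def center(column):
    """Subtract the column mean so the variable is centered at zero."""
    mean = statistics.mean(column)
    return [x - mean for x in column]


def autoscale(column):
    """Center, then divide by the sample standard deviation (unit variance)."""
    sd = statistics.stdev(column)
    return [x / sd for x in center(column)]


def pareto_scale(column):
    """Center, then divide by the square root of the sample standard deviation."""
    root_sd = statistics.stdev(column) ** 0.5
    return [x / root_sd for x in center(column)]


# Illustrative values only
auto = autoscale([2.0, 4.0, 6.0])
pareto = pareto_scale([2.0, 4.0, 6.0])
```

Autoscaling gives every variable equal weight, while Pareto scaling shrinks large variables less aggressively and so keeps more of the original data structure.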
![Page 25: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/25.jpg)
Use PLS to test a hypothesis
Partial Least Squares (PLS) is used to identify planes of maximum correlation between X measurements and Y (hypothesis)
[Figure: PCA and PLS projections of samples at time = 0 and 120 min.]
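For a single response Y, the first PLS latent variable has a simple form: the weight vector is proportional to X'y on mean-centered data, which is the direction in X with maximum covariance with Y. The sketch below computes only that first component. It is not a full PLS fit (no deflation, further components, or validation), and it assumes at least one variable covaries with Y.

```python
import math


def first_pls_component(X, y):
    """Return (weights, scores) for the first latent variable of single-response PLS."""
    n, m = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight of each variable is proportional to its covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(c * c for c in w))  # zero if nothing covaries with y
    w = [c / norm for c in w]
    # Scores are the centered samples projected onto the weight vector.
    scores = [sum(Xc[i][j] * w[j] for j in range(m)) for i in range(n)]
    return w, scores


# Illustrative data: 3 samples, 2 variables, one response
weights, pls_scores = first_pls_component([[1, 3], [2, 1], [3, 2]], [1, 2, 3])
```

Unlike PCA, the direction here depends on Y, which is what makes PLS a supervised projection.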
![Page 26: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/26.jpg)
Modeling multifactorial relationships
dynamic changes among groups ~ two-way ANOVA
![Page 27: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/27.jpg)
PLS Related Objects
Model
• dimensions, latent variables (LV)
• performance metrics (Q2, RMSEP, etc.)
• validation (training/testing, permutation, cross-validation)
• orthogonal correction
Samples
• scores
• predicted values
• residuals
Variables
• Loadings
• Coefficients, summary of loadings based on all LVs
• VIP, variable importance in projection
• Feature selection
![Page 28: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/28.jpg)
“goodness” of the model is all about the perspective
Determine in-sample (Q2) and out-of-sample error (RMSEP) and compare to a random model
• permutation tests
• training/testing
*finish lab 4-Partial Least Squares and lab 5-Data Analysis Case Study
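The comparison to a random model can be sketched as a permutation test: shuffle Y many times, recompute the performance metric each time, and ask how often a shuffled model does at least as well as the real one. In the sketch below, squared Pearson correlation stands in for the model's metric. That stand-in, the function names, and the fixed seed are assumptions for illustration; the labs would use Q2 or RMSEP from a fitted PLS model instead.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, used here as a stand-in performance metric."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def permutation_p_value(x, y, statistic, n_permutations=999, seed=0):
    """Fraction of shuffled-y models that score at least as well as the real one."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    observed = statistic(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if statistic(x, shuffled) >= observed:
            hits += 1
    # Count the observed model itself so the p-value is never zero.
    return (hits + 1) / (n_permutations + 1)


# Illustrative data: y depends exactly on x
x_values = list(range(10))
y_values = [2 * v + 1 for v in x_values]
p_perm = permutation_p_value(x_values, y_values, r_squared)
```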
![Page 29: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/29.jpg)
Biological Interpretation
• Visualization
• Enrichment
• Networks
– biochemical
– structural
– spectral
– empirical
Projection or mapping of analysis results into a biological context.
![Page 30: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/30.jpg)
Organism specific biochemical relationships and information
Multiple organism DBs
• KEGG
• BioCyc
• Reactome
Human
• HMDB
• SMPDB
Identification of alterations in biochemical domains
*finish lab 6-Metabolite Enrichment Analysis
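A common way to score whether a biochemical domain is over-represented among changed metabolites is a one-sided hypergeometric test. The slide does not state which test lab 6 uses, so this is an assumed example, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_pathway, pathway_size, n_selected, n_background):
    """P(at least hits_in_pathway pathway members among n_selected metabolites).

    n_background: all measured metabolites
    pathway_size: measured metabolites that belong to the pathway
    n_selected:   metabolites flagged as changed
    """
    total = comb(n_background, n_selected)
    upper = min(pathway_size, n_selected)
    favorable = sum(
        comb(pathway_size, k) * comb(n_background - pathway_size, n_selected - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return favorable / total


# Illustrative numbers: all 5 changed metabolites fall in a 5-member pathway,
# out of 10 measured metabolites
p_enriched = enrichment_p_value(5, 5, 5, 10)
```

Because one p-value is produced per pathway, the results are subject to the same multiple-testing concerns as the univariate analyses.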
![Page 31: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/31.jpg)
Network Mapping
1. Generate Connections
2. Calculate Mappings
3. Create Network
Grapov D., Fiehn O., Multivariate and network tools for analysis and visualization of metabolomic data, ASMS, June 08, 2013, Minneapolis, MN
![Page 32: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/32.jpg)
Connections and Contexts
Biochemical (substrate/product)
• Database lookup
• Web query
Chemical (structural or spectral similarity)
• fingerprint generation
Empirical (dependency)
• correlation, partial-correlation
BMC Bioinformatics 2012, 13:99 doi:10.1186/1471-2105-13-99
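Empirical connections can be sketched as an edge list: compute the correlation for every pair of metabolites and keep the pairs above a threshold. The example uses plain Pearson correlation; partial correlation is not shown. The threshold, metabolite names, and function names are illustrative assumptions, and every profile is assumed to have non-zero variance.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(profiles, threshold=0.8):
    """Return (name_a, name_b, r) for each pair with |r| at or above the threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical metabolite profiles across four samples
example_profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [1.0, -1.0, 1.0, -1.0],
}
edges = correlation_edges(example_profiles)
```

Each edge tuple can then be loaded into a network tool, with the correlation value mapped to edge width or color.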
![Page 33: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/33.jpg)
Mapping Analysis Results
[Figure panels: Analysis results, Network Annotation, Mapped Network]
*finish lab 7-Network Mapping I
![Page 34: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/34.jpg)
Biochemical Relationships
http://www.genome.jp/dbget-bin/www_bget?rn:R00975
![Page 35: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/35.jpg)
Structural Similarity
http://pubchem.ncbi.nlm.nih.gov//score_matrix/score_matrix.cgi
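Structural similarity between two compounds is often scored with the Tanimoto coefficient on their substructure fingerprints: shared bits divided by bits present in either compound. The sketch below assumes fingerprints are given as sets of "on" bit positions. The bit values are made up for illustration and are not real PubChem fingerprints.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Made-up bit positions for two hypothetical compounds
similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```

Pairs scoring above a chosen cutoff become structural-similarity edges in the network, in the same way correlations become empirical edges.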
![Page 36: 0 introduction](https://reader036.fdocuments.in/reader036/viewer/2022062319/554e8570b4c90573338b4686/html5/thumbnails/36.jpg)
Mass Spectral Connections
Watrous J et al. PNAS 2012;109:E1743-E1752
*finish lab 8-Network Mapping II