Metabolite Profiling and Identification Employing High Resolution MS Strategies
David Portwood
Syngenta, Jealotts Hill International Research Centre, UK
Classification: PUBLIC
We bring plant potential to life
Syngenta is one of the world’s leading companies
with more than 24,000 employees in over 90
countries dedicated to our purpose: Bringing plant
potential to life.
Our Crop Protection and Seeds products help
growers increase crop yields and productivity. We
contribute to meeting the growing global demand
for food, feed and fuel and are committed to
protecting the environment, promoting health and
improving the quality of life.
Metabolomics
● Metabolomics applications within Syngenta
- Plant breeding
- Early detection of desirable traits is very valuable
- Sensory and nutritional profiling, relating chemical composition to
desirable traits
- Fundamental research
- Drought tolerance
- Ripening processes
- Effect of genetic manipulation
- New agrochemical products
- Mode of action studies
Data acquisition workflow
Extract → GC-MS or LC-MS
3D dataset (time vs. m/z vs. intensity)
Deconvolution
Compound identification
Data compilation
Normalisation
QC and Data analysis
Data combination
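As an illustration of the 3D dataset, the Python sketch below stores each scan as a retention time plus paired m/z and intensity arrays, and sums the signal inside a ppm window around one m/z to give an extracted (single-ion) chromatogram, the kind of trace that deconvolution works from. The data layout, the function name `extract_ion_chromatogram` and the 5 ppm default window are assumptions for illustration, not the internal representation used by Refiner MS or mzMine.

```python
import numpy as np

def extract_ion_chromatogram(scans, target_mz, ppm_tol=5.0):
    """Extracted-ion chromatogram from a list of centroided scans.

    scans: iterable of (retention_time, mz_array, intensity_array);
    together these form the (time, m/z, intensity) dataset.
    Returns (times, intensities) with one value per scan.
    """
    half_width = target_mz * ppm_tol * 1e-6      # ppm window -> absolute m/z
    times, xic = [], []
    for rt, mz, intensity in scans:
        in_window = np.abs(mz - target_mz) <= half_width
        times.append(rt)
        xic.append(intensity[in_window].sum())   # 0.0 if nothing in window
    return np.array(times), np.array(xic)

# Three toy scans; only the first two contain the ion of interest
scans = [
    (1.0, np.array([205.0972, 300.0]), np.array([10.0, 5.0])),
    (1.1, np.array([205.0975, 300.0]), np.array([20.0, 5.0])),
    (1.2, np.array([205.2000, 300.0]), np.array([30.0, 5.0])),
]
times, xic = extract_ion_chromatogram(scans, 205.09715)
```

Real deconvolution software works on profile or centroided data with its own binning and smoothing; this only shows how one m/z slice is pulled out of the 3D data.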
Study Design
● Interested in metabolic content of fruit
and vegetables in support of breeding
programs
● Early detection of desirable traits is very
valuable
● Also interested in metabolic and genetic
basis of ripening
● Here we looked at 4 different varieties
of tomato as part of a European
Systems Biology project*
● The study compared three non-ripening
varieties with a normal (wild-type)
tomato
* A collaboration between Nottingham University and Syngenta
(Charlie Hodgman and Graham Seymour; Charlie Baxter, Mark Earll,
Dave Portwood, Mark Seymour). Funded by the BBSRC.
(Figure: ripening by genotype: AC++, CNR, NOR, RIN)
Study Design
● Four different cultivars:
- WT: wild type (Ailsa Craig or AC++)
- NOR: as WT except at the non-ripening (nor) locus
- RIN: ripening-inhibitor mutant
- CNR: colourless non-ripe
● Sampling: post-anthesis (after flowering), breaker (fruit starts to change
colour) and post-breaker
● Extracts were analysed by GC-MS (Trace DSQII), UPLC-MS/MS (LTQ)
and accurate mass UPLC-MS/MS (Orbitrap Velos)
● Data processed using either RefinerMS or mzMine, and SIMCA-P (PCA +
OPLS)
Deconvolution of 3D MS data
● Methods we use:
- Genedata MS refiner (Commercial)
- Re-samples 3D MS data onto a common grid and
applies alignment
- Filtering and feature detection produce a peak list
across all samples
- mzMine (Open source)
- Detects features in the mass dimension to extract
single-ion chromatograms
- Peak detection, alignment by mass and time
- Identification by database searching
● In both cases "missing peaks" are back-filled with baseline
levels to ensure correct statistical treatment of the data
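A minimal sketch of alignment by mass and time followed by back-filling, assuming peaks have already been picked per sample as (m/z, retention time, intensity) tuples. The greedy first-match strategy, the 5 ppm and 0.1 min tolerances, and the use of one user-supplied baseline value for every missing cell are illustrative assumptions; Refiner MS and mzMine use their own, more elaborate algorithms.

```python
import numpy as np

def align_features(peak_lists, ppm_tol=5.0, rt_tol=0.1):
    """Greedy alignment of per-sample peak lists into one feature table.

    peak_lists: one list per sample of (mz, rt, intensity) tuples.
    Returns (features, table): features holds the (mz, rt) of each aligned
    feature; table[i, j] is the intensity of feature j in sample i, or NaN
    where the feature was not detected (a "missing peak").
    """
    features = []          # reference (mz, rt) taken from first occurrence
    cells = {}             # (sample, feature) -> intensity
    for i, peaks in enumerate(peak_lists):
        for mz, rt, intensity in peaks:
            match = None
            for j, (f_mz, f_rt) in enumerate(features):
                if (abs(mz - f_mz) <= f_mz * ppm_tol * 1e-6
                        and abs(rt - f_rt) <= rt_tol):
                    match = j
                    break
            if match is None:                 # start a new feature
                features.append((mz, rt))
                match = len(features) - 1
            cells[(i, match)] = intensity     # a repeat match overwrites
    table = np.full((len(peak_lists), len(features)), np.nan)
    for (i, j), intensity in cells.items():
        table[i, j] = intensity
    return features, table

def backfill_missing(table, baseline):
    """Replace missing cells with a baseline level so every sample has a
    value for every feature before statistical analysis."""
    filled = table.copy()
    filled[np.isnan(filled)] = baseline
    return filled

peak_lists = [
    [(205.0972, 5.00, 1000.0), (273.0761, 7.20, 500.0)],   # sample 0
    [(205.0973, 5.02, 1100.0)],                            # sample 1
]
features, table = align_features(peak_lists)
filled = backfill_missing(table, baseline=10.0)
```

Filling with a baseline rather than zero avoids artificially large fold changes and keeps log transforms defined; in practice the baseline would be estimated from the data rather than fixed by hand.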
Identification
● Identification of peaks is the single most challenging aspect
● UPLC-MS:
- Spectra show little fragmentation, and ions are often adducts
- Peak recognition by retention time and accurate mass of the parent ion and/or adducts
- Hierarchical identification process
- In-house library of ~300 compounds (automated in mzMine)
- In-house NIST library search (automated in mzMine)
- Public databases
- MS/MS fragments (manual)
● GC-MS:
- Samples derivatised (MOX + TMS); spectra are highly fragmented
- Peak recognition by NIST search and retention index
- Some overlap occurs, particularly with sugars
● Although partly automated, careful manual curation of data is essential to prevent
misidentification. Even then the possibility of wrongly identified peaks remains.
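The first tier of the UPLC-MS hierarchy (accurate mass plus retention time against an in-house library) and the GC retention index can be sketched as below. The library entries are hypothetical placeholders (their m/z values are theoretical [M+H]+ masses, their retention times invented), the tolerances are assumptions, and the retention index uses the linear van den Dool and Kratz form for temperature-programmed GC with an assumed n-alkane ladder. None of this is presented as the in-house implementation.

```python
def ppm_difference(measured, reference):
    """Signed mass difference in parts per million."""
    return (measured - reference) / reference * 1e6

def match_library(mz, rt, library, ppm_tol=5.0, rt_tol=0.2):
    """Tier 1: names of library entries matching both accurate mass
    (within ppm_tol) and retention time (within rt_tol minutes)."""
    return [name for name, lib_mz, lib_rt in library
            if abs(ppm_difference(mz, lib_mz)) <= ppm_tol
            and abs(rt - lib_rt) <= rt_tol]

def linear_retention_index(rt, alkane_rts):
    """Linear retention index for temperature-programmed GC.

    alkane_rts maps n-alkane carbon number -> retention time; the analyte
    must elute between two of the alkanes in the ladder.
    """
    carbons = sorted(alkane_rts)
    for n, n_next in zip(carbons, carbons[1:]):
        t_n, t_next = alkane_rts[n], alkane_rts[n_next]
        if t_n <= rt <= t_next:
            return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))
    raise ValueError("retention time lies outside the alkane ladder")

# Hypothetical in-house entries: (name, theoretical [M+H]+ m/z, RT in min)
library = [("tryptophan", 205.09715, 3.10), ("naringenin", 273.07575, 8.40)]
print(match_library(205.09718, 3.15, library))
# Hypothetical alkane ladder (carbon number -> RT in min)
print(linear_retention_index(5.5, {10: 5.0, 11: 6.0, 12: 7.5}))
```

Requiring both mass and retention time to agree is what makes the library tier more reliable than a mass-only database search, which is why it sits at the top of the hierarchy.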
Quality control of data
● Principal Components Analysis is used as a
first pass quality assurance tool
● All instruments show drift; provided it is
consistent and minor, it may be corrected by
normalisation
● Composite mixed samples are repeatedly
injected to check that results are consistent
● Occasionally instrument problems occur which
compromise the data
● Example of GC-MS data with analytical batch-to-batch
variation
(Figure: PCA scores plot marking the composite mixed sample, technical replicates and an injection failure)
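One way to reproduce this first-pass check is sketched below with NumPy: PCA scores from a singular value decomposition of the autoscaled data, then a crude rule that flags injections lying far from the bulk of the scores (such as the injection failure above). The autoscaling, the function names and the 3 × median-distance cut-off are assumptions; packages such as SIMCA-P provide formal limits such as Hotelling's T².

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA scores from the SVD of autoscaled data
    (rows = injections, columns = features)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                  # avoid dividing constant features by 0
    U, S, _ = np.linalg.svd(Xc / sd, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def flag_outliers(scores, k=3.0):
    """Indices of injections whose distance from the median score exceeds
    k times the median distance (a crude, illustrative cut-off)."""
    centre = np.median(scores, axis=0)
    dist = np.linalg.norm(scores - centre, axis=1)
    return np.where(dist > k * np.median(dist))[0]

# Toy batch: 10 injections x 50 features; injection 7 has almost no signal
rng = np.random.default_rng(0)
X = 100 + rng.normal(0, 1, size=(10, 50))
X[7] = 1.0
print(flag_outliers(pca_scores(X)))
```

Repeated injections of the composite mixed sample should cluster tightly in such a plot; a spread along one component over run order is the drift that normalisation may correct.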
Normalisation
● Normalisation is often required to reduce overall amplitude variations caused by spectrometer response or sample dilution factors
● Normalisation has to be applied with caution as it can also remove useful variation
● The plot shows QC, blank and amino acid standards repeatedly injected during a batch of LC-MS experiments. In this case, total signal normalisation was effective in removing baseline variation.
● Internal standard normalisation works very well with accurate mass data, but less well with low-resolution MS, where interfering ions at the same nominal mass can overlap the internal standard signal
● Syngenta commissioned Umetrics to add normalisation filters within SIMCA-P
(Plot: total signal normalisation)
Minimising the effects of closure on analytical data. Erik Johansson, Svante Wold, Kristina Sjodin. Analytical Chemistry 1984, 56, 1685.
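The two normalisation schemes mentioned above can be sketched as follows, assuming a samples × features intensity matrix and a known internal-standard column (both hypothetical here, as are the function names). Note that total signal normalisation forces every row to the same sum, which is exactly the closure constraint discussed in the Johansson, Wold and Sjodin reference: after it, an increase in one metabolite necessarily appears as a relative decrease in the others.

```python
import numpy as np

def total_signal_normalise(X, target=None):
    """Scale each sample (row) so its summed intensity equals `target`
    (default: the median total signal across samples)."""
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)
    if target is None:
        target = np.median(totals)
    return X / totals * target

def internal_standard_normalise(X, is_col):
    """Express every feature relative to the internal-standard response
    (column `is_col`) measured in the same sample."""
    X = np.asarray(X, dtype=float)
    return X / X[:, [is_col]]

# Second sample is the first one at twice the concentration (dilution effect)
X = np.array([[10.0, 30.0, 60.0],
              [20.0, 60.0, 120.0]])
tsn = total_signal_normalise(X)          # both rows become 15, 45, 90
isn = internal_standard_normalise(X, 0)  # both rows become 1, 3, 6
```

Both methods remove the dilution factor in this toy case; they differ when a single large metabolite change dominates the total signal, which is where internal standards are safer.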
OPLS model of combined Polar and Apolar DSQ-GC-MS
● OPLS model built against development time
● OPLS separates systematic variation that is uncorrelated with the response (orthogonal) from variation related to it (predictive)
● Scores plot on left: the 1st (predictive) component separates pre-breaker (green) from post-breaker (orange/red) ripening
● Scores plot on right: analytical variation is captured in the 2nd (orthogonal) component
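To show how OPLS partitions the variation, the sketch below follows the published single-response OPLS (NIPALS-style) algorithm of Trygg and Wold: the predictive weight vector comes from the covariance of X with y, and each orthogonal component is built from the part of the X loading that is orthogonal to it, then deflated from X. It is a minimal NumPy illustration with centring only and no cross-validation, not the SIMCA-P implementation; the function name and the toy data are hypothetical.

```python
import numpy as np

def opls_single_y(X, y, n_orth=1):
    """Single-response OPLS. Returns predictive scores, orthogonal scores
    (one column per component, n_orth >= 1) and the filtered X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)                 # centring only, no scaling
    y = y - y.mean()
    w = X.T @ y / (y @ y)                  # covariance of each variable with y
    w /= np.linalg.norm(w)                 # predictive weight vector
    orth_scores = []
    for _ in range(n_orth):
        t = X @ w                          # current predictive scores
        p = X.T @ t / (t @ t)              # their loadings
        w_o = p - (w @ p) * w              # part of p orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = X @ w_o                      # scores uncorrelated with y
        p_o = X.T @ t_o / (t_o @ t_o)
        X = X - np.outer(t_o, p_o)         # remove the orthogonal variation
        orth_scores.append(t_o)
    t_pred = X @ w                         # predictive scores from filtered X
    return t_pred, np.column_stack(orth_scores), X

# Toy data: 20 samples over "development time" plus an alternating batch effect
rng = np.random.default_rng(1)
time = np.arange(20, dtype=float)
batch = np.tile([0.0, 1.0], 10)
X = (np.outer(time, rng.normal(size=30))
     + 5 * np.outer(batch, rng.normal(size=30))
     + 0.1 * rng.normal(size=(20, 30)))
t_pred, T_orth, X_filtered = opls_single_y(X, time)
```

Because each orthogonal weight vector is orthogonal to the predictive one, the orthogonal scores are uncorrelated with y by construction, which is what allows analytical or batch variation to be shown on a separate axis from ripening.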
3-D Plot of Combined Positive and Negative Ion LC-MS (LTQ)
● OPLS model built against development time
● Three orthogonal components were
obtained
● 1st related to AC++ ripening
● 2nd related to divergence of CNR
Stability of metabolic trajectory – excellent agreement
● Thermo LTQ (nominal mass), Genedata MS refiner processed, “fresh” samples
● Thermo LTQ Velos Orbitrap (high resolution accurate mass), mzMine processed, samples stored for 6 months at -18 °C
Gene Expression vs Metabolite data
● OPLS vs time models for both gene and metabolite data show strong similarities
● CNR diverges from an early point; AC, RIN and NOR develop together up to the breaker stage, and AC develops further post-breaker
Comparison of genotypes
● Comparing the OPLS loadings of each genotype vs time, we can observe metabolite changes that are unique to one genotype and those shared between genotypes
● Here the wild type (AC) is compared with the colourless non-ripe tomato (CNR)
(Figure label: δ-tomatine)
Visualization of GC-TOF/MS Based Metabolomics Data for Identification of Biochemically Interesting Compounds Using OPLS Class Models. Susanne Wiklund, Erik Johansson, Lina Sjöström, Ewa J, Anal. Chem. 2008, 80 (1), pp 115–122.
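The reference above introduces the SUS (shared and unique structures) plot, in which the correlation loadings p(corr) of two OPLS models (the correlation of each variable with the predictive score) are plotted against each other. The sketch below simplifies this by correlating each metabolite directly with development time within each genotype rather than with a model score; the 0.5 threshold, the category labels and the function names are illustrative assumptions.

```python
import numpy as np

def correlation_loadings(X, y):
    """Pearson correlation of each column of X (metabolites) with y
    (here development time): a simplified stand-in for p(corr)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

def shared_and_unique(corr_a, corr_b, threshold=0.5):
    """Label each metabolite by comparing its correlation in genotype A
    with its correlation in genotype B (SUS-plot style)."""
    labels = []
    for ca, cb in zip(corr_a, corr_b):
        strong_a, strong_b = abs(ca) >= threshold, abs(cb) >= threshold
        if strong_a and strong_b:
            labels.append("shared" if np.sign(ca) == np.sign(cb) else "opposite")
        elif strong_a:
            labels.append("unique to A")
        elif strong_b:
            labels.append("unique to B")
        else:
            labels.append("neither")
    return labels

# Toy data: four metabolites sampled at six time points in two genotypes
time = np.arange(6.0)
alt = np.array([1.0, 2.0, 1.0, 2.0, 1.0, 2.0])   # weakly time-related
X_a = np.column_stack([time, time, alt, time])
X_b = np.column_stack([time, alt, -time, -time])
labels = shared_and_unique(correlation_loadings(X_a, time),
                           correlation_loadings(X_b, time))
```

In a SUS plot, "shared" variables fall on the diagonal and "unique" ones near the axes, so a metabolite such as one rising only in AC would sit along the AC axis.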
Scenarios in Annotation
● Unequivocal assignments
- mass spectrometry data alone is able to assign the identity of an
unknown component
● Ambiguous assignments
- mass spectrometry data can lead to more than one potential identity,
such as class assignments (e.g. flavonoids)
● Equivocal assignments
- mass spectrometry data alone cannot assign an identity wholly or
even in part
New and Developing Technologies
● Advances in technologies
- High resolution, high mass accuracy instrumentation
- e.g. new TOFs and Orbitraps
● On-line databases and library tools
- e.g. ChemSpider, KEGG, HMDB, MassBank, ChEBI
● Development of mass spectrometry based library utilities
- Existing instrument vendor offerings
- In-house specific offerings
- Fragmentation predictors (e.g. MassFragment, Mass Frontier)
Advances in Technologies – Mass Accuracy
● Accurate mass measurement is used to determine elemental formulae
● The better the accuracy, the lower the ambiguity
● Mass accuracy is defined as the ratio of the m/z measurement error to
the true m/z, usually expressed in parts per million (ppm)
● External mass calibration is less accurate than internal
calibration
- e.g. LTQ-Orbitrap:
- External calibration <3 ppm
- Internal calibration <1 ppm
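The ppm definition can be checked against the tomato metabolite example later in the deck. The sketch below computes the theoretical m/z of a singly charged cation from its elemental formula using monoisotopic masses (with the electron mass subtracted) and returns the signed error in ppm. The restriction to C, H, N and O and the function names are assumptions. For the tryptophan [M+H]+ ion (C11H13O2N2, measured 205.09718) it gives roughly 0.126 ppm, and about 1.32 ppm for naringenin (C15H13O5, measured 273.07611), in line with the values quoted for components A and B.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, plus the electron
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
ELECTRON = 0.00054857990946

def parse_formula(formula):
    """'C11H13O2N2' -> {'C': 11, 'H': 13, 'O': 2, 'N': 2}"""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def cation_mz(formula):
    """Theoretical m/z of a singly charged cation (e.g. [M+H]+) whose
    elemental formula, including the added proton, is `formula`."""
    atoms = sum(MONO[el] * n for el, n in parse_formula(formula).items())
    return atoms - ELECTRON

def ppm_error(measured_mz, formula):
    """Signed mass error: (measured - theoretical) / theoretical x 1e6."""
    theoretical = cation_mz(formula)
    return (measured_mz - theoretical) / theoretical * 1e6

print(round(ppm_error(205.09718, "C11H13O2N2"), 3))  # component A, tryptophan
```

At m/z 205, 1 ppm corresponds to only about 0.0002 u, which is why sub-ppm internal calibration narrows the list of candidate formulae so sharply.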
Understanding what accurate mass measurement gives us...
(Figure: mass resolution vs. mass accuracy)
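The resolution axis can be made concrete with the FWHM definition R = m/Δm. The sketch below gives the peak width implied by a resolving power and the resolving power needed to separate two ions. It treats R as constant across the m/z range, which is a simplification: Orbitrap resolving power, for example, is quoted at a reference m/z and falls at higher m/z. The function names and the pair of ions 0.01 u apart are hypothetical.

```python
def fwhm_peak_width(mz, resolving_power):
    """Peak width at half height (u) implied by R = m / delta_m (FWHM)."""
    return mz / resolving_power

def required_resolving_power(mz1, mz2):
    """Resolving power needed to separate two ions of similar m/z,
    using their mean m/z and their m/z difference (FWHM definition)."""
    return ((mz1 + mz2) / 2) / abs(mz1 - mz2)

# Peak width for tryptophan [M+H]+ at the 30,000 resolution of the example
width = fwhm_peak_width(205.09715, 30_000)       # about 0.0068 u
# Two hypothetical ions 0.01 u apart at m/z 400
r_needed = required_resolving_power(400.0, 400.01)
```

Resolution decides whether two ions appear as separate peaks; mass accuracy then decides how closely the centroid of each resolved peak matches its true m/z. Both are needed for confident formula assignment.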
Tomato Metabolite Example (resolution 30,000)
Component | Measured m/z [M+H]+ | Proposed formula (ion) | Mass error (ppm) | No. of hits within 2 ppm | Proposed annotation
A | 205.09718 | C11H13O2N2 | 0.126 | 1 | Tryptophan
B | 273.07611 | C15H13O5 | 1.318 | 1 | Naringenin
C | 355.10245 | C16H19O9 | 0.257 | 1 | Chlorogenic acid
D | 414.33694 | C27H44O2N | 0.685 | 1 | Tomatidinol
E | 416.35260 | C27H46O2N | 0.490 | 1 | Tomatidine
F | 578.40503 | C33H56NO7 | 0.172 | 1 | δ-Tomatine
G | 740.45850 | C39H66NO12 | 0.739 | 2 | γ-Tomatine
Assumption: nitrogen rule only
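The "No. of hits within 2 ppm" column can be approximated with a brute-force search, sketched below under stated assumptions: only C, H, N and O are considered, the ion is [M+H]+, the element limits are arbitrary, and the only chemical filter is an integer, non-negative ring-plus-double-bond count for the neutral molecule (for CHNO compounds an integer value is equivalent to the nitrogen rule). Because the element set and limits are guesses, and the function names are hypothetical, its hit counts need not match the table exactly.

```python
# Monoisotopic masses (u) and electron mass
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
ELECTRON = 0.00054857990946

def format_formula(c, h, n, o):
    """Hill-order string, omitting absent elements and counts of 1."""
    parts = []
    for element, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count > 0:
            parts.append(element + (str(count) if count > 1 else ""))
    return "".join(parts)

def candidate_formulas(measured_mz, ppm_tol=2.0, max_n=10, max_o=20):
    """CHNO formulas for an [M+H]+ ion within ppm_tol of measured_mz whose
    neutral molecule M has an integer, non-negative RDB (nitrogen rule)."""
    atom_mass = measured_mz + ELECTRON        # summed mass of the ion's atoms
    hits = []
    for c in range(1, int(atom_mass // MONO["C"]) + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = atom_mass - c * MONO["C"] - n * MONO["N"] - o * MONO["O"]
                if rest < 0:
                    break                     # more O only lowers the remainder
                h = round(rest / MONO["H"])   # only one H count can fit 2 ppm
                if h < 1:
                    continue
                theo = (c * MONO["C"] + h * MONO["H"] + n * MONO["N"]
                        + o * MONO["O"] - ELECTRON)
                ppm = (measured_mz - theo) / theo * 1e6
                if abs(ppm) > ppm_tol:
                    continue
                # Neutral M has h - 1 hydrogens; 2 x RDB = 2C - H + N + 2
                twice_rdb = 2 * c - (h - 1) + n + 2
                if twice_rdb >= 0 and twice_rdb % 2 == 0:
                    hits.append((format_formula(c, h, n, o), round(ppm, 3)))
    return hits

# Component A (tryptophan) from the table above
for formula, ppm in candidate_formulas(205.09718):
    print(formula, ppm)
```

As mass increases, the number of CHNO combinations inside a fixed ppm window grows, which is consistent with the larger glycoalkaloids being the first rows to return more than one hit.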
On-line Database Hits – “Proposed Formula” Search
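Number of hits returned by each database:
Component | Measured m/z [M+H]+ | Proposed formula (neutral) | ChEBI | MassBank | HMDB | ChemSpider | KEGG
A | 205.09718 | C11H12O2N2 | 90 | 24 | 1 | 124 | 8
B | 273.07611 | C15H12O5 | 82 | 4 | 2 | 459 | 15
C | 355.10245 | C16H18O9 | 16 | 8 | 2 | 68 | 4
D | 414.33694 | C27H43O2N | 19 | 0 | 0 | 187 | 1
E | 416.35260 | C27H45O2N | 18 | 0 | 0 | 206 | 1
F | 578.40503 | C33H55NO7 | 2 | 0 | 0 | 30 | 0
G | 740.45850 | C39H65NO12 | 1 | 0 | 0 | 8 | 0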
How plant (tomato) specific are these databases?
Conclusions
● Analysed the metabolome of fruit from wild type and tomato ripening
mutants using high resolution chromatography mass spectrometry
methods
- GC-MS, UPLC-MS/MS
- accurate mass high resolution UPLC-MS/MS
● Data were processed using a combination of RefinerMS and Sieve
followed by multivariate statistics using SIMCA-P (PCA and OPLS-DA)
● Annotation of components using a combination of retention indices and
mass spectra (accurate mass, etc.)
● Observed distinct metabolic differences between genotypes associated
with ability to develop ripening competency
Conclusions
● Comparison with authentic reference material offers the best way to
obtain an unequivocal identification
● But a number of procedures can be utilised to improve confidence in
annotation of components detected in metabolomics studies
● High resolution and high mass accuracy are essential in obtaining
elemental formulae
- The higher the mass accuracy the better!
● On-line database searches are very useful but in many cases do not
contain “plant” specific metabolites
Conclusions
● Further corroborative information can be obtained using MS/MS,
fragmentation predictor software or in-house libraries
● No excuse for not reading the literature!
● Where no reference material is available many assignments are likely to
remain ambiguous, however, this may be adequate for many
applications
● And ultimately, where MS does fail to provide the answer then
preparative LC and NMR may be required
Acknowledgments
● Nottingham University
- Charlie Hodgman
- Graham Seymour
● Genedata
- Peter Haberl
- Mike Bowyer
● Funding
- ESB-link
- ERA-Net post genomics:
TOMQML
● Syngenta
- Charles Baxter
- Mark Seymour
- Mark Earll
- Zsuzsanna Ament