

Introducing Undergraduate Students to Metabolomics Using a NMR-Based Analysis of Coffee Beans

Peter Olaf Sandusky*

Department of Chemistry, Wellesley College, Wellesley, Massachusetts 02481, United States

*S Supporting Information

ABSTRACT: Metabolomics applies multivariate statistical analysis to sets of high-resolution spectra taken over a population of biologically derived samples. The objective is to distinguish subpopulations within the overall sample population, and possibly also to identify biomarkers. While metabolomics has become part of the standard analytical toolbox in many areas of chemical research, its principles and methods have not yet been generally incorporated into the undergraduate chemistry curriculum. Identification of the arabica and robusta varieties of green coffee beans using 1H NMR-based principal component analysis provides an inexpensive teaching laboratory experiment that introduces students to the methods of metabolomics. The experiment does not require any expensive chemicals, or unique equipment or software, or access to higher-field instruments. Because there is a general curiosity among students about the chemical composition of coffee, the experiment is also particularly engaging to the students’ interest and imagination.

KEYWORDS: Upper-Division Undergraduate, Analytical Chemistry, Agricultural Chemistry, Bioanalytical Chemistry, Chemometrics, Food Science, NMR Spectroscopy, Hands-On Learning/Manipulatives

Received: July 25, 2016
Revised: June 3, 2017
Published: July 26, 2017

Laboratory Experiment
pubs.acs.org/jchemeduc
© 2017 American Chemical Society and Division of Chemical Education, Inc.
DOI: 10.1021/acs.jchemed.6b00559
J. Chem. Educ. 2017, 94, 1324−1328

■ INTRODUCTION

Metabolomics is a statistical approach to understanding the complex organic chemistry of samples derived from biological sources, including blood, urine, animal and plant tissue extracts, and microbial cultures. A set of high-resolution spectra is taken over a population of samples. Multivariate statistics is then used to analyze the set of spectra in order to detect subpopulations within the parent sample population, and identify the variations in chemistry responsible for the subpopulations.

Metabolomics has found a wide and growing application in a number of areas of chemical research. In the past ten years ACS journals have published 1121 metabolomics papers, of which 808 were published in the last five years (Table S1). However, despite this, the incorporation of metabolomics into the undergraduate chemistry curriculum has been limited. While the topic is treated in some undergraduate chemistry programs, it is ignored in most.

This Journal has published a number of useful articles describing ways in which multivariate statistics may be incorporated into undergraduate chemistry curricula.1−10 However, few of these papers capture the essential features of metabolomics (Instructors’ Note 1). This paper describes a laboratory experiment used as part of an upper-level course in advanced analytical chemistry at Pomona College, in which students applied the methods of NMR-based metabolomics to distinguish between the arabica and robusta varieties of unroasted coffee beans.

Almost all commercially cultivated coffee belongs to one of two species: Coffea arabica, and Coffea canephora, commonly called “robusta”. These two species differ statistically in the quantities of the various metabolites found in the beans.11,12 In this laboratory experiment, the students are provided with a set of authentic unroasted coffee bean samples representing both arabica and robusta coffees from a number of different countries, and one sample of unknown species. The students characterize the water-extractable organic components in these samples using 1H NMR spectroscopy, and then analyze the population of 1H NMR spectra using principal component analysis13,14 (PCA) to distinguish the subpopulations representing the arabica and robusta varieties. They can then assign the unknown sample into either the arabica or robusta subpopulations.

PCA is a basic statistical tool used in metabolomics. Consider a PCA calculation on a data set where m data variables were measured over a population of n samples. (In the case of this experiment each data variable would be a 1H NMR integral.) The data input to the PCA computer program would be an n × m matrix in which each row corresponds to one sample, and each column corresponds to one data variable. The PCA calculation then determines a new set of variables, the principal components (PCs). After the PCA calculation each sample, rather than being described as it originally was by a set of m data values, d, is now described by a set of “scores”, s. Each sample will have one score for each of the principal components calculated. For sample j and principal component k,

$$s_{jk} = l_{1k} d_{j1} + l_{2k} d_{j2} + l_{3k} d_{j3} + \cdots + l_{mk} d_{jm} \qquad (1)$$

Here $l_{ik}$ is the “loading” coefficient linking the data variable i with the score in PC k (Instructors’ Note 2).

The principal components are determined such that the variance of the sample population in PC 1 is greater than the variance of the sample population in PC 2, and in turn the variance of the sample population in PC 2 is greater than the variance of the sample population in PC 3, and so on. Because most of the variance of the sample population in principal component space is concentrated in the first few principal components, a plot of the PC 1 versus PC 2 scores will often reveal subpopulations within the parent sample population. Likewise, a plot of PC 1 versus PC 2 loadings will indicate which data variables, or, in the case of this experiment, which coffee metabolites, are significantly responsible for differences between the subpopulations.
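
The relationship between the data matrix, the loadings, and the scores in eq 1 can be sketched in a few lines of Python. This is an illustration only, not the PCA program used in the course. It assumes mean centering without unit variance weighting, and it finds the loadings by power iteration with deflation, a simple stand-in for the eigendecomposition or SVD routines that a statistics package would normally use. All function names are hypothetical.

```python
import math


def mean_center(data):
    """Subtract each column mean so every data variable averages to zero."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - mu for x, mu in zip(row, means)] for row in data]


def covariance(centered):
    """Sample covariance matrix (m x m) of mean-centered data (n x m)."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration for the largest eigenvalue and its unit eigenvector."""
    m = len(matrix)
    # Uneven start vector; the method fails only if the start happens to be
    # exactly orthogonal to the eigenvector being sought.
    v = [float(i + 1) for i in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return sum(a * b for a, b in zip(v, mv)), v


def pca(data, n_components=2):
    """Return (loadings, variances, scores) for the first n_components PCs."""
    centered = mean_center(data)
    cov = covariance(centered)
    loadings, variances = [], []
    for _ in range(n_components):
        eigval, vec = leading_eigenpair(cov)
        loadings.append(vec)
        variances.append(eigval)
        # Deflation: remove the component just found before seeking the next.
        cov = [[c - eigval * vi * vj for c, vj in zip(row, vec)]
               for row, vi in zip(cov, vec)]
    # Eq 1: s_jk = l_1k*d_j1 + l_2k*d_j2 + ... + l_mk*d_jm
    scores = [[sum(l * d for l, d in zip(vec, row)) for vec in loadings]
              for row in centered]
    return loadings, variances, scores
```

Each entry of `variances` is the variance of the sample population along the corresponding PC, so the list comes out in decreasing order, as described above. The sign of each loading vector is arbitrary, which is why a scores plot may appear mirrored between programs.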

■ THE EXPERIMENT

A detailed description of the experimental procedure, and the student handout used in the advanced analytical course at Pomona College in the Fall semester of 2013, are included in the Supporting Information.

Materials

Samples of unroasted arabica and robusta coffee beans from a variety of different countries were purchased from various vendors as detailed in Tables S2A and S2B. (Also see Instructors’ Note 3.)

Coffee Extraction

Steps 1−3 below were performed by the students during the first laboratory period.

1. Each team of students was provided with ten samples of unroasted coffee beans, including four or five samples of authentic arabica beans, four or five samples of authentic robusta beans, and one sample of beans of unknown type. Samples were ground using an electric coffee bean grinder. (See Instructors’ Note 3 on coffee bean grinders.)

2. A weighed portion of unroasted ground coffee beans from each sample, approximately 0.15 g contained in a 2 mL Eppendorf tube, was incubated at 95 °C in 1.5 mL of D2O for 1 h. Samples were cooled on ice for 15 min, and coffee solids were pelleted by centrifugation.

3. Supernatants were immediately lifted off the pellets and transferred to fresh 2 mL Eppendorf tubes. Samples were stored at −4 °C until the second laboratory period (usually 2 days later).

Acquiring and Processing NMR Spectra

Steps 4−7 below were performed by the students during the second laboratory period.

4. NMR tubes containing 30 mM phosphate buffer (pH 6) and 0.46 mM TMSP [3-(trimethylsilyl)propionic-2,2,3,3-d4-acid] were prepared from the coffee extract supernatants from step 3.

5. 1H NMR spectra were acquired and processed on the complete set of samples. (Instrumental parameters and representative spectra can be found in Instructors’ Note 4 and Figures S1 and S2.)

6. The spectra were aligned by assigning the TMSP methyl peak to 0.000 ppm, and the entire population of spectra was overlaid. The integral regions, or “buckets”, were chosen so as to include all the major peaks observed in the downfield region of the spectra from 9.5 to 5.0 ppm (Figure 1 and Instructors’ Note 5).

7. Each individual spectrum was integrated using the integral regions determined in step 6. The resulting integral text files were then e-mailed or transferred by USB stick to a student’s own personal computer.
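
Steps 6 and 7 are normally carried out in the spectrometer software, but the underlying arithmetic can be sketched as follows. This is a minimal sketch under stated assumptions: the spectrum is available as two equal-length lists of chemical shifts and intensities, the tallest point near 0 ppm is taken to be the TMSP methyl peak, and integration is by the trapezoidal rule. The function names and the search window are hypothetical, and any bucket limits passed in are the user's own choice rather than values from this paper.

```python
def reference_to_tmsp(ppm, intensity, window=(-0.5, 0.5)):
    """Shift the ppm axis so the tallest point inside `window`
    (assumed to be the TMSP methyl peak) sits at 0.000 ppm."""
    candidates = [(y, x) for x, y in zip(ppm, intensity)
                  if window[0] <= x <= window[1]]
    peak_ppm = max(candidates)[1]
    return [x - peak_ppm for x in ppm]


def integrate_bucket(ppm, intensity, low, high):
    """Trapezoidal integral of the intensity over low <= ppm <= high."""
    # Sorting lets the function accept the descending axis NMR data often use.
    pts = sorted((x, y) for x, y in zip(ppm, intensity) if low <= x <= high)
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def bucket_table(ppm, intensity, buckets):
    """Integrate every named (low, high) bucket; returns {name: integral}."""
    return {name: integrate_bucket(ppm, intensity, low, high)
            for name, (low, high) in buckets.items()}


def calibrate(integrals, tmsp_integral):
    """Express each integral relative to the TMSP integral (TMSP = 1.00)."""
    return {name: value / tmsp_integral for name, value in integrals.items()}
```

Using the same bucket limits for every spectrum is what makes the integrals comparable across samples; the calibration step corresponds to assigning the TMSP methyl integral a value of 1.00 in each spectrum.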

Principal Component Analysis

Steps 8 and 9 below were performed by the students at some time of their own choosing following the second laboratory period.

8. The integral text files were read into a spreadsheet program, and the integrals were arranged into the format of a PCA data input matrix, so that each row corresponded to one sample and each column corresponded to one NMR integral-bucket region. (Alternative procedures for constructing, calibrating and normalizing the PCA data input matrix are described in Instructors’ Note 6 in the Supporting Information.)

9. The spreadsheet data matrix from step 8 was read into a PCA program, and PCA was performed on the data set with mean centering and unit variance weighting (Instructors’ Notes 2 and 7).
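
The matrix layout of step 8 and the preprocessing named in step 9 can be illustrated with a short Python sketch. It is not the spreadsheet or PCA program used by the students. It assumes that the integrals for each sample are held in a dictionary keyed by bucket name, and that "unit variance weighting" means dividing each mean-centered column by its sample standard deviation. The function names are hypothetical.

```python
import statistics


def build_matrix(integrals_by_sample, bucket_names):
    """Arrange integrals so each row is a sample and each column a bucket."""
    sample_names = sorted(integrals_by_sample)
    matrix = [[integrals_by_sample[s][b] for b in bucket_names]
              for s in sample_names]
    return sample_names, matrix


def autoscale(matrix):
    """Mean-center each column, then divide by its sample standard deviation.

    A column with zero variance would cause a division by zero and should
    be dropped before scaling.
    """
    columns = list(zip(*matrix))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.stdev(c) for c in columns]
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, stdevs)]
            for row in matrix]
```

After autoscaling, every bucket has a mean of zero and a variance of one, so an intense signal does not dominate the principal components simply because its integrals are numerically larger.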

Figure 1. 400 MHz 1H NMR spectra of D2O extracts of unroasted coffee beans. The population of spectra used in the PCA calculation whose results are presented in Figures 2 and 3 is shown here. Spectra are aligned with the TMSP methyl peak assigned to 0.000 ppm. Arabica and robusta spectra are overlaid separately, and the scaling in this figure is adjusted so that the height of the sucrose anomeric proton peak at 5.43 ppm is the same in all spectra. Brackets correspond to one possible set of integral-buckets. Peak assignments are based on Wei et al.15


■ HAZARDS

Water extracts of coffee beans will initially be hot and can burn.

■ RESULTS

The downfield half of the 1H NMR spectrum of water extracts of unroasted coffee beans is dominated by just six species: caffeine, sucrose, trigonelline, and three isomers of caffeoylquinic acid (CQA). All unroasted coffee extracts, both those of the arabica and the robusta varieties, contain all these metabolites, but in subtly different, though statistically distinguishable, relative amounts (Figure 1).

The 1H and 13C NMR spectra of water extracts of unroasted coffee beans have been rigorously assigned using COSY and HSQC by Wei et al.15 However, stacking of aromatic metabolites, particularly caffeine and the aromatic rings of the CQAs, causes variations in chemical shifts due to ring current effects,16 and this complicates the comparison of 1H spectra from different samples. Nonetheless, integral-bucket regions can be defined on the downfield half of the overlaid population of spectra such that each integral-bucket represents the relative concentration of one metabolite, or in the case of the CQAs, the concentration of a mixture of three isomers of CQA (Figure 1). Thus, no specialized software for bucket integration is needed.

The subpopulations of arabica and robusta samples can be viewed as discrete clusters in the PC 1 versus PC 2 scores plots (Figure 2, Figure S3, and Instructors’ Note 8). The discrimination between arabica and robusta sample clusters is for the most part along the PC 1 coordinate (Figure 2 and Table 1). This allows for a direct reading of the PC 1 versus PC 2 loadings plots to mean that, statistically, hot water extracts of arabica samples have higher concentrations of sucrose and trigonelline, and lower concentrations of caffeine and CQAs, than those of robusta samples (Figure 3). In contrast, GABA (γ-aminobutyric acid) does not appear to differ much statistically between the two groups (Instructors’ Note 5).
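
In the experiment the unknown sample is assigned by inspecting the scores plot. For readers who want a numerical check, one simple option is a nearest-centroid rule in the PC 1 versus PC 2 plane, sketched below. This rule is an assumption introduced here for illustration and is not the procedure described in this paper; the function names are hypothetical, and any scores passed in must come from the reader's own PCA calculation.

```python
import math


def centroid(points):
    """Mean position of a cluster of (PC 1, PC 2) score pairs."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def assign_unknown(scores_by_class, unknown):
    """Assign `unknown` to the class whose centroid is closest.

    scores_by_class maps a label such as "arabica" to a list of
    (PC 1, PC 2) scores for the authentic samples of that class.
    Returns the chosen label and the distance to every centroid.
    """
    distances = {label: math.dist(unknown, centroid(points))
                 for label, points in scores_by_class.items()}
    return min(distances, key=distances.get), distances
```

Because the two clusters separate mainly along PC 1, the PC 1 score usually decides the outcome. Which cluster falls on the positive side of PC 1 is arbitrary and can differ between PCA programs.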

Figure 2. PC 2 vs PC 1 scores plot from PCA calculation on 400 MHz NMR integral data from the spectra shown in Figure 1. Integral data from two experiments done on the same coffee bean sample set, but performed on different days, were combined in one PCA calculation. (See Table 1 for sample key.) The “b” label is used to indicate samples run in the second experiment. The integrals were calibrated relative to the TMSP methyl peak integral, which was assigned to a value of 1.00 in each spectrum. Percent numbers on axes indicate sample population variance in the corresponding PC.

Table 1. Coffee Sample Key for Figure 2

Sample    Country of Origin    Coffee Variety
A11       Tanzania             Arabica
A12       Ethiopia             Arabica
A13       Guatemala            Arabica
A14       Brazil               Arabica
A15       Mexico               Arabica
R10       Mexico               Robusta
R11       Vietnam              Robusta (1)
R13       Vietnam              Robusta (2)
R14       Philippines          Robusta
R15       a                    Robusta

a Country of origin unknown.

Figure 3. PC 2 vs PC 1 loadings plot from PCA calculation described in Figure 2.

■ DISCUSSION

This experiment was originally developed and used in a course in advanced analytical chemistry taught at Pomona College during the Fall semester of 2013. Subsequently the experiment was repeated several times by the author, without student involvement, first at California State University Bakersfield, in order to determine whether the results could be reproduced using a different sample set, and then at Eckerd College, in order to confirm that the experiment could be performed on a 300 MHz instrument (Instructors’ Note 9).

The Pomona course enrolled 22 students, all upper-division undergraduate chemistry or biochemistry majors. The laboratory sections of the course met twice a week for 3 h each. The students organized themselves into “project teams” for the course, with four or five students in each team (Instructors’ Note 10).17 Each team rotated through the course experiment schedule independently, so that only one team would be doing the coffee bean NMR metabolomics experiment each week. At the beginning of the first laboratory session of the experiment the instructor gave a 20 min overview to all members of the project team (Instructors’ Note 11). Subsequently, at each step in the experiment the instructor demonstrated any novel procedures on the first sample, and the students then performed the experiment on the remaining nine samples independently (Instructors’ Note 12).

All five student teams were successful in producing PC 1 versus PC 2 scores plots showing resolved clustering of arabica and robusta samples, and correctly identified the “unknown” sample’s membership (Figure 2 and Instructors’ Note 13).

An excellent paper by Wei et al. describes the use of NMR-based metabolomics to distinguish between arabica and robusta samples of unroasted coffee beans.18 Their method began sample preparation with frozen beans, used both 13C and 1H spectra taken on a 500 MHz instrument, employed specialized software for rigorous bucket integration of the spectra, and invoked advanced statistical methods such as orthogonal projections to latent structures discriminant analysis (OPLS-DA). These are cutting-edge techniques within the context of NMR-based metabolomics, and allow not only for the discrimination between coffee bean types, but to a degree determination of the place of origin as well. However, the methods described by Wei et al. are too time-consuming, far too expensive, and, possibly, too mathematically sophisticated, to be used in an undergraduate laboratory course curriculum.

In contrast, the experiment described in this paper was designed to be used as part of an upper-level undergraduate course in analytical chemistry or instrumental analysis. The experiment uses only 1H spectra taken at 300 or 400 MHz. This significantly cuts down on the instrument time required, and makes the experiment potentially available at schools without access to higher-field instruments. The experiment can be easily performed in two 3 h laboratory periods. A population of unroasted coffee bean samples, adequate to supply samples for the experiment for several years, can be collected for less than $100. The only piece of specialized equipment required, an electric coffee bean grinder, can be purchased for $18. All other pieces of equipment needed, including the statistics software, are generally available at any school with a 300 or 400 MHz NMR instrument. PCA, which is often taught to undergraduate students in their biology and social science courses, is readily explained to upper-level chemistry students. Finally, since most chemistry students drink coffee, an experiment examining the chemical composition of coffee beans is particularly engaging to their imagination.

■ ASSOCIATED CONTENT

*S Supporting Information

The Supporting Information is available on the ACS Publications website at DOI: 10.1021/acs.jchemed.6b00559.

Table listing recent metabolomics papers published in ACS journals, tables of coffee bean samples and vendor sources, detailed experiment procedure, student laboratory handout, typical arabica and robusta NMR spectra at 400 and 300 MHz, typical instrument parameter sets, and instructors’ notes (PDF, DOCX)

■ AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Peter Olaf Sandusky: 0000-0002-9514-241X

Notes

The author declares no competing financial interest.

■ ACKNOWLEDGMENTS

The author thanks the Departments of Chemistry at Pomona College (Claremont, CA) and California State University Bakersfield (Bakersfield, CA) for funds used to support the development of this experiment. The author thanks David Grove of the Eckerd College Department of Chemistry (St. Petersburg, FL) for the use of the department’s NMR instrument during the development of this experiment.

■ REFERENCES

(1) Howery, D. G.; Hirsch, R. F. Chemometrics in the Chemistry Curriculum. J. Chem. Educ. 1983, 60 (8), 656−659.
(2) Chau, F. T.; Chung, W. H. Using Matlab to Assist Undergraduates in Learning Chemometrics. J. Chem. Educ. 1995, 72 (4), A84−A85.
(3) Ribone, M. E.; Pagani, A. P.; Olivieri, A. C.; Goicoechea, H. C. Determination of the Active Principal in a Syrup by Spectrophotometry and Principal Component Regression Analysis: An Advanced Undergraduate Experiment Involving Chemometrics. J. Chem. Educ. 2000, 77 (10), 1330−1333.
(4) Cazar, R. A. An Exercise on Chemometrics for a Quantitative Analysis Course. J. Chem. Educ. 2003, 80 (9), 1026−1029.
(5) Wanke, R.; Stauffer, J. An Advanced Undergraduate Chemistry Laboratory Experiment Exploring NIR Spectroscopy and Chemometrics. J. Chem. Educ. 2007, 84 (7), 1171−1173.
(6) Gilbert, M. K.; Luttrell, R. D.; Stout, D.; Vogt, F. Introducing Chemometrics to the Analytical Curriculum: Combining Theory and Lab Experience. J. Chem. Educ. 2008, 85 (1), 135−137.
(7) Pierce, K. M.; Schale, S. P.; Le, T. M.; Larson, J. C. An Advanced Analytical Chemistry Experiment Using Gas Chromatography-Mass Spectrometry, MATLAB, and Chemometrics To Predict Biodiesel Blend Percent Composition. J. Chem. Educ. 2011, 88 (6), 806−810.
(8) Pezzolo, A. D. L. To See the World in a Grain of Sand: Recognizing the Origin of Sand Specimens by Diffuse Reflectance Infrared Fourier Transform Spectroscopy and Multivariate Exploratory Data Analysis. J. Chem. Educ. 2011, 88 (9), 1304−1308.
(9) de Oliveira, R. R.; das Neves, L. S.; de Lima, K. M. G. Experimental Design, Near-Infrared Spectroscopy, and Multivariate Calibration: An Advanced Project in a Chemometrics Course. J. Chem. Educ. 2012, 89 (12), 1566−1571.
(10) Stitzel, S. E.; Sours, R. E. High-Performance Liquid Chromatography Analysis of Single-Origin Chocolates for Methylxanthine Composition and Provenance Determination. J. Chem. Educ. 2013, 90 (9), 1227−1230.
(11) Petracco, M. Our Everyday Cup of Coffee: The Chemistry Behind Its Magic. J. Chem. Educ. 2005, 82 (8), 1161−1167.
(12) Coleman, W. F. The Chemistry of Coffee. J. Chem. Educ. 2005, 82 (8), 1167.
(13) Basilevsky, A. Applied Matrix Algebra in the Statistical Sciences; Dover Publications, Inc.: Mineola, NY, 2005; pp 248−264.
(14) Miller, J. N.; Miller, J. C. Statistics and Chemometrics for Analytical Chemistry, 4th ed.; Pearson-Prentice Hall: New York, 2000; pp 217−221.
(15) Wei, F.; Furihata, K.; Hu, F.; Miyakawa, T.; Tanokura, M. Complex Mixture Analysis of Organic Compounds in Green Coffee Bean Extract by Two-Dimensional NMR Spectroscopy. Magn. Reson. Chem. 2010, 48, 857−865.
(16) D’Amelio, N.; Fontanive, L.; Uggeri, F.; Suggi-Liverani, F.; Navarini, L. NMR Reinvestigation of the Caffeine−Chlorogenate Complex in Aqueous Solution and in Coffee Brews. Food Biophysics 2009, 4, 321−330.
(17) Walters, P. J. Role-Playing in Analytical Chemistry Laboratories: Part I. Anal. Chem. 1991, 63 (20), 977A−985A.
(18) Wei, F.; Furihata, K.; Koda, M.; Hu, F.; Kato, R.; Miyakawa, T.; Tanokura, M. 13C NMR-Based Metabolomics for the Classification of Green Coffee Beans According to Variety and Origin. J. Agric. Food Chem. 2012, 60, 10118−10125.