Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and...

12

Click here to load reader

Transcript of Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and...

Page 1: Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids

Journal of Magnetic Resonance 242 (2014) 67–78

Contents lists available at ScienceDirect

Journal of Magnetic Resonance

journal homepage: www.elsevier .com/locate / jmr

Quantitative Quantum Mechanical Spectral Analysis (qQMSA)of 1H NMR spectra of complex mixtures and biofluids

http://dx.doi.org/10.1016/j.jmr.2014.02.0081090-7807/� 2014 Elsevier Inc. All rights reserved.

⇑ Corresponding author.E-mail address: [email protected] (R. Laatikainen).

Mika Tiainen, Pasi Soininen, Reino Laatikainen ⇑School of Pharmacy, University of Eastern Finland, P.O. Box 1627, FIN-70211 Kuopio, Finland

a r t i c l e i n f o

Article history:Received 1 October 2013Revised 24 December 2013Available online 18 February 2014

Keywords:Quantum mechanicalQuantitative NMRSpectral analysisQMSASerumMixturesBiofluids

a b s t r a c t

The quantitative interpretation of 1H NMR spectra of mixtures like the biofluids is a demanding task dueto spectral complexity and overlap. Complications may arise also from water suppression, T2-editing, pro-tein interactions, relaxation differences of the species, experimental artifacts and, furthermore, the spec-tra may contain unknown components and macromolecular background which cannot be easilyseparated from baseline. In this work, tools and strategies for quantitative Quantum Mechanical SpectralAnalysis (qQMSA) of 1H NMR spectra from complex mixtures were developed and systematicallyassessed. In the present approach, the signals of well-defined, stoichiometric components are describedby a QM model, while the background is described by a multiterm baseline function and the unknownsignals using optimizable and adjustable lines, regular multiplets or any spectral structures which canbe composed from spectral lines. Any prior knowledge available from the spectrum can also be addedto the model. Fitting strategies for weak and strongly overlapping spectral systems were developedand assessed using two basic model systems, the metabolite mixtures without and with macromolecular(serum) background. The analyses show that if the spectra are measured in high-throughput manner, theconsistent absolute quantification demands some calibration to compensate the different response fac-tors of the protons and compounds. On the other hand, the results show that also the T2-edited spectracan be measured so that they obey well the QM rules. In general, qQMSA exploits and interprets the spec-tral information in maximal way taking full advantage from the QM properties of the spectra and, at thesame time, offers chemical confidence which means that individual components can be identified withhigh confidence on the basis of their accurate spectral parameters.

� 2014 Elsevier Inc. All rights reserved.

1. Introduction

Quantitative 1H NMR (qHNMR) spectroscopy is widely used tonumerous applications as recently reviewed by Pauli et al. [1], fromthe composition of natural products and drug impurity analysis [2]to metabolomics of complex biological samples such as urine, cere-brospinal fluid, serum, fecal extracts and tissue extracts [3–9]. Theadvantages of NMR are its quantitative and nondestructive nature,high reproducibility and that measurements can be done withminimal sample preparation and calibration. Thus, the cost ofone such analysis is very low when the number of quantifiedmetabolites is taken into account – a single NMR sample measuredin high-throughput manner, can be transformed into more than200 metabolic descriptors [10]. In a recent review, the authorsended to conclude also that NMR is the best method for identifyingand quantifying urinary compounds and that it permits measure-

ment of the largest number of metabolites (209) and also yieldsthe greatest chemical diversity [9].

The information content of 1H NMR spectra of biological sam-ples is high, but transformation of this complex information intoaccurate concentrations of individual compounds is not necessarilystraightforward. Each compound may contribute up to tens of indi-vidual signals and the signals may overlap seriously with eachother. In addition, the spectrum may contain signals from un-known components and macromolecular background. Furthercomplications may arise from water suppression, T2-editing, pro-tein interactions, and any instrumental or experimental artifacts.Furthermore, pH and ionic strength may cause variations to thechemical shifts and line widths which complicate automatedquantification.

There are two fundamentally different approaches for NMR met-abolomic analysis. In the chemometric approaches (non-targetedmetabolomics), the pattern-recognition methods are used toanalyze whole spectra and only the signals that seem to be respon-sible for the selected patterns are identified [11,12], whereas in

Page 2: Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids

68 M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78

quantitative metabolomics (targeted metabolomics) the concentra-tions of known metabolites are determined [13,14]. An advantageof the chemometric methods is that spectra can be analyzed with-out any prior knowledge of metabolites present in the sample andthus these methods can be used even for wholly novel compoundsor sample materials. Nonetheless, underlying any statistical treat-ment of NMR spectra in metabolomics is the basic notion thatmetabolites are the actual variables of interest, since they representthe underlying physical model that generated the observed data,and that is why the quantitative data on metabolites is invaluable.

In principle, a compound can be quantified from the 1H NMRspectrum of a complex mixture whenever even a single line ofthe compound can be identified. The separation of the signalscan be further improved by expanding the spectrum to more thanone dimension but, unfortunately, the 2D NMR spectra are usuallylower in sensitivity. Also, the relationship between the signalintensity and concentration is not so straightforward in 2D spectraand needs calibration, whereas in 1D 1H spectra the signal area isusually nearly proportional to the concentration even in very lowconcentrations [2]. The use of the selective total correlation spec-troscopy (TOCSY) experiments [15,16] is an alternative, but theseexperiments are typically meant for detection of only a few tar-geted metabolites. Nevertheless, the multidimensional methodscan be invaluable in screening the composition of the sample asbased on diagnostic correlations. The strategy combining 2D and1D suits well for applications in which the variation of the chemi-cal diversity is broad, like in plant metabolomics [17]. However, ifthe number of the samples is large and the concentrations are inthe micromolar range, as common with, e.g. biological samples,the final quantification based on a 1D spectrum is desirable.

A distinctive feature of high-resolution 1D NMR spectra is thateven the most complex spectrum of a compound, composed ofthousands of individual spectral lines, can be described by a fewspectral parameters within experimental accuracy, employing aquantum mechanical (QM) theory [18]. Thus, even in the case ofthe 1H NMR spectrum of a complex mixture, there are strict QMrules between the lines of individual compounds. The utilizationof this feature in structural analysis has a long tradition since thelegendary LAOCOON3 program [19] was published in 1964. How-ever, none of the presently available quantitative analysis methodslike the spectral binning [20], peak alignment algorithms andcurve-fitting methods [21–24], singular value decomposition [25]or targeted profiling [13,26] fully utilizes these rules.

Our objective was to build up and assess a spectral analyticaltool that uses and interprets the spectral information in maximalway taking also maximal advantage from any prior knowledgeavailable from the sample and its components. The approach,called herein quantitative Quantum Mechanical Spectral Analysis(qQMSA), is based on the strict QM rules between the molecularstructure of a compound and the positions and intensities of its ob-served NMR spectral lines and merely means the complete itera-tive analysis of the spectra, based on the QM spin Hamiltonian[27]. In our approach, the signals of well-defined, stoichiometric,components can be described by QM model while the backgroundis described by a multiterm baseline function [28] and the un-known signals using optimizable and adjustable lines, regular mul-tiplets or any spectral structures which can be composed fromspectral lines, taking into account also the available prior knowl-edge about the sample. The program qQMTLS (quantitative Quan-tum Mechanical Total-Line-Shape) combines our previousconstrained total-line-shape (CTLS) approach [2,29] and the itera-tive QM spectral analysis [30]. In this work, we developed and as-sessed strategies for qQMSA of biological samples using humanserum as the model system. We also tried to find answers to suchfundamental questions like ‘‘how strictly the NMR signal areas areproportional to the concentrations if the spectra are measured with

high-throughput manner’’ and ‘‘do the T2-edited and water sup-pressed spectra really obey exactly the QM rules’’. qQMSA is in factthe method to provide the answers to these questions which areusually left unanswered.

2. Materials and methods

2.1. Sample preparation

To avoid the problems arising from unknown samples, theassessment was done by using a series of samples containing com-mon metabolites listed in Table 1 (+ urea and acetate) and back-ground of human serum from five individuals. The metabolitesincluded to the samples are those that are abundant in human ser-um samples and visible in the high-throughput manner measuredT2-edited spectra [4]. The backgrounds are actually human serumprotein and lipoprotein samples with the small molecules dialyzedout. All the metabolites used in this work were their naturalenantiomers.

Dialyses cassettes (PIERCE, Slide-A-Lyzer Dialyses Cassette, 10 KMWCO) were washed in 300 ml dialyses buffer (10 mM Na2HPO4,150 mM NaCl and 2.7 mM KCl, pH 7.4) to remove glycerol fromthe membrane (5 min magnetic stirring). Five 3 ml aliquots of hu-man serum from five different individuals were dialyzed in threephases with magnetic stirring: (1) 1.5 l dialysis buffer, 1 h at roomtemperature; (2) 1.5 l dialysis buffer, 2 h at 4 �C; (3) 1.5 l dialysisbuffer, overnight (20 h) at 4 �C. Protein concentrations of the serawere measured before and after dialyses using the biuret method[31,32]. After dialyses, the sera were concentrated by ultrafiltrationin order to obtain the original protein concentrations. The centrif-ugal filters (VWR Centrifugal Filter, 10 K MWCO) were washed sev-eral times with water to remove glycerol from the filter membrane.The sera were filtered at 10,000g and 4 �C.

Samples with physiological metabolite concentrations of hu-man serum (Table S-1) were prepared by using sodium phosphatebuffer (75 mM Na2HPO4 in 80%/20% H2O/D2O, pH 7.4; includingalso 0.08% sodium 3-(trimethylsilyl)propionate-2,2,3,3-d4 and0.04% sodium azide) as a solvent. The NMR samples without andwith human serum background (pairs of parallel samples withthe same metabolite concentrations) were prepared by adding300 ll the above mentioned metabolite mixtures to 300 ll dialysisbuffer or dialyzed serum, respectively.

2.2. NMR Spectroscopy

All the measurements were performed on Bruker AVANCE III500 (Bruker, Karlsruhe, Germany) NMR spectrometer operatingat 500.36 MHz equipped with a 5 mm selective inverse probe.Shimming and tuning of the samples were performed automati-cally. The baseopt digital filter, which yields the perfectly flat base-line, was used. The spectra were measured at the physiologicaltemperature of 310 K.

The 1H NMR spectra of the mixtures without serum backgroundwere recorded with 160k data points after 4 dummy scans using 8transients acquired with an automatically calibrated 90� pulse andapplying a Bruker noesypresat pulse sequence with mixing time of10 ms and irradiation field of 25 Hz to suppress the water peak.The acquisition time was 5.5 s, the relaxation delay (d1) 3.0 s andthe additional relaxation delay before suppression (d2) 51.5 s.

The T2-edited spectra of the mixtures with serum backgroundwere measured in the high-throughput manner [4]. The spectrawere collected with 64k data points using 24 transients acquiredafter 4 dummy scans with a Bruker 1D CPMG pulse sequence withwater peak suppression and a 78 ms T2-filter with a fixed echo

Page 3: Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids

Table 1Concentration ranges (lM) of metabolites in metabolite mixtures.

Metabolite Concentration range (lM) Metabolite Concentration range (lM)

3-Hydroxybutyrate 60–271 Glycine 169–286Acetoacetate 24–118 Histidine 39–83Alanine 312–491 Isoleucine 36–86Citrate 56–130 Lactate 555–1680Creatine 20–80 Leucine 64–126Creatinine 42–93 Phenylalanine 46–100Glucose 3880–6580 Pyruvate 35–99Glutamine 361–586 Tyrosine 38–76Glycerol 26–104 Valine 155–262

Table 2Response factors of glucose determined from the spectra measured with differentsettings.

qHa Hb qpresatc presatd qpresatc presatd

D2O D2O D2O D2O H2O + D2O H2O + D2O

a-Glucosea-H1 0.962 0.875 0.960 0.880 0.950 0.924a-H2 0.974 0.993 0.965 0.993 0.904 0.909a-H3 1.000 0.910 1.000 0.920 0.969 1.000a-H4 0.978 0.953 0.990 0.990 1.000 0.978a-H5 0.965 0.997 0.975 1.000 0.850 0.885a-H6A 0.977 0.997 0.953 0.994 0.884 0.868a-H6B 0.975 1.000 0.955 0.981 0.811 0.840

b-Glucoseb-H1 nd nd nd nd nd ndb-H2 0.988 0.869 0.949 0.840 1.000 0.993b-H3 0.996 0.955 0.978 0.945 0.986 1.000b-H4 0.986 0.959 0.951 0.926 0.952 0.954b-H5 0.989 0.993 1.000 1.000 0.974 0.989b-H6A 1.000 1.000 0.913 0.914 0.870 0.881b-H6B 0.982 0.987 0.904 0.908 0.845 0.863

a Basic proton spectrum (zg): 128k data points (td), 4 dummy scans (ds), 8transients (ns), acquisition time (aq) 7.7 s, relaxation delay (d1) 52.3 s and 90�pulse.

b Basic proton spectrum (zg): td = 128k, ds = 4, ns = 32, aq = 7.7 s, d1 = 2.3 s and90� pulse.

c Noesypresat pulse sequence (noesygppr1d): 10 ms mixing time, td = 128k,ds = 4, ns = 8, aq = 7.7 s, d1 = 3.0 s, additional relaxation delay before suppression(d2) 49.3 s and 90� pulse.

d As in c, but d2 = 0.

M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78 69

delay of 403 ls to minimize diffusion and J-modulation effects. Theacquisition time was 3.3 s and the relaxation delay 3 s.

T2 times of the metabolites were estimated by measuring T2-edited spectra of a metabolite mixture with different T2-filters(60–4000 ms). The spectra were recorded with 64k data pointsusing 128 transients acquired after 4 dummy scans. The acquisi-tion time and the relaxation delay were 3.3 s and 60 s, respectively.

2.3. Program qQMTLS

The program qQMTLS was written by employing Compaq VisualFORTRAN under Microsoft� Developer Studio and the programruns in standard desktop PC. The graphical interface allows allthe spectral preparations possibly useful for qQMSA, although nor-mally the analysis can be started without baseline corrections orany other manipulations from the spectrum imported from a mod-ern spectrometer where it has been prepared as described above.The experimental spectra can be imported into the program inthe JDX or PERCH formats. The program and the templates forthe serum analyses are freely available for non-commercial aca-demic purposes from the corresponding author. However, the fileand case management of the present version demands the supportof PERCH NMR Software (http://www.perchsolutions.com).

The CPU time for analysis does not form a limitation for themethod: a simulation of the spectrum or one iteration cycle ofthe case corresponding to the model J (Table 6) takes ca. one sec-ond in a microcomputer equipped with Intel�Xeon� CPU E3-1245 V2 (3.40 GHz) processor. This means that the CPU time bur-den for a typical analysis is less than 1 min. In practice, in the man-ual analysis one needs to reserve a few minutes of human time.Tools for automation, model building and peak alignment are un-der development.

Example of the spectral parameter file and the correspondingoutput file are included in the supporting material (Tables S-2and S-3).

2.4. Line Shape function

The goal of qQMSA is to minimize the residual root mean-square (rrms) between the observed and calculated spectral inten-sities I(mi)

rrms ¼X

IobsðmiÞ � IcalðmiÞ½ �2 ð1Þ

at spectral points i. In qQMTLS, the total line shape I(m) of an NMRfrequency domain spectrum (intensity I vs. spectral frequency m) isexpressed as a sum of the QM systems Qn(m), the signals Sm(m) whichare not described using QM model, and the baseline B(m) which rep-resents the macromolecular background and instrumental artifacts(Fig. 1):

IðmÞ ¼X

XnQ nðmÞ þX

xmSmðmÞ þ BðmÞ ð2Þ

The concentrations Xn and xm are the parameters whose valuesare wanted. The QM spectra Qn(m), can be described by theequation

QnðmÞ ¼ Fnðm;w; J;R;D; line shapeÞ ð3Þ

where the function Fn is a non-explicit function defined by the spinHamiltonian and the vectors w, J, R and D contain the chemicalshifts, coupling constants, response factors and line widths neededto describe the spin system, respectively. The chemical shifts, cou-pling constants and line widths depend on the chemical structureof the component and may also depend on the chemical composi-tion of the sample. The response factors, line widths and the lineshape parameters depend also on experimental conditions andinstrumentation. The response factors account for the small differ-ences caused mainly by differential relaxation of different speciesand they should ideally be close to 1.0. All these parameters reflectthe physical and chemical conditions of the sample and can betherefore worth an inspection after spectral analysis. The line shapeis expressed as an optimizable sum of Lorentzian, Gaussian and dis-persion functions [29].

The non-quantum mechanical signals Sm(m) can be used to de-scribe the signals which arise from unknown or chemically non-stoichiometric components. Examples of the latter are the broad

Page 4: Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids

Table 3The mean absolute error (MAE) and the linear regression calibrated MAE percentages of the concentrations (calculated for 20 metabolite mixtures) in quantification of the 18metabolites using 5 different analysis protocols.

MAE % Calibrated MAE %f

Aa Bb Cc Dd Ee Aa Bb Cc Dd Ee

3-Hydroxybutyrate 4.2 7.4 3.7 5.5 5.3 3.8 4.3 3.2 2.7 2.6Acetoacetate 17.4 4.2 11.5 6.8 8.2 4.2 3.7 4.8 4.6 3.0Alanine 5.3 3.7 3.8 1.9 2.8 1.4 1.8 1.5 1.5 1.3Citrate 6.3 6.1 11.3 4.5 1.6 3.9 6.1 2.8 4.5 1.6Creatine 7.3 7.3 7.9 7.0 5.1 6.3 7.1 7.0 7.1 5.0Creatinine 11.4 7.1 10.5 5.5 2.2 4.9 7.1 3.7 5.6 2.0Glucose 9.8 1.6 2.1 1.8 2.8 1.3 1.2 1.3 1.0 0.5Glutamine 16.5 8.2 6.0 4.8 2.5 3.7 3.6 2.9 2.3 2.5Glycerol 23.9 37.2 16.8 15.5 9.0 24.3 29.4 10.3 9.3 8.9Glycine 1.7 2.2 2.8 2.4 1.7 1.4 2.0 1.4 2.4 1.6Histidine 21.5 21.2 14.1 3.4 2.6 11.4 14.1 2.4 3.5 2.3Isoleucine 4.4 6.7 5.1 3.5 2.7 4.4 5.6 3.2 2.7 2.7Lactate 1.0 2.4 2.1 1.4 2.2 0.9 1.1 1.0 1.1 0.7Leucine 20.1 9.4 10.5 5.5 6.6 3.1 2.5 1.6 1.4 1.9Phenylalanine 12.6 7.3 3.1 5.5 4.3 6.6 6.5 2.7 4.0 2.1Pyruvate 4.0 1.7 4.8 2.3 1.2 1.1 1.7 1.1 2.0 1.2Tyrosine 3.1 3.5 2.3 6.1 5.7 3.3 3.5 2.4 3.6 2.2Valine 5.7 1.4 7.1 1.4 1.9 1.1 1.4 1.4 1.2 0.9Average 9.8 7.7 7.0 4.7 3.8 4.8 5.7 3.0 3.4 2.4

a Constant line width (1.5 Hz) with 100% Lorenzian line shape, linear baseline, no weighting.b Line width optimized for all species independently, 100% Lorenzian line shape, linear baseline, no weighting.c Same line width for all species and Gaussian contribution to line shape optimized, intensity overweighting for weak signals, the regions overweighted: 0.90–1.23, 3.52–

3.69 and 6.80–7.90 ppm, 256 baseline terms.d Line width and Gaussian contribution to line shape optimized, intensity overweighting for weak signals, 100 baseline terms, overweighting as in the protocol C, constant

line width (1.2 Hz) for glycerol.e Line width and Gaussian contribution to line shape optimized, 50 baseline terms, overweighting as in the protocol C, fit-and-lock protocol for pyruvate; acetoacetate;

glycerol and aromatic metabolites, coupling constants and response factors optimized for glucose and lactate, template used for line widths.f See Eq. (7). The calibrated MAE takes into account the bias arising mainly from the differences between the response factors of different compounds. The linear regression

coefficients (k) are given in Table 4.

Table 4The results of the best fitting protocol (E in Table 3). The slopes k and R2 values arefrom the linear regression analysis of the theoretical and raw concentrations (see Eq.(7)) for data of 20 samples.

Metabolite k a R2 MAE % Calibrated MAE %

3-Hydroxybutyrate 0.959(0.006) 0.994 5.3 2.6Acetoacetate 0.929(0.007) 0.992 8.2 3.0Alanine 0.977(0.004) 0.982 2.8 1.3Citrate 1.006(0.004) 0.994 1.6 1.6Creatine 1.024(0.097) 0.983 5.1 5.0Creatinine 0.991(0.007) 0.984 2.2 2.0Glucose 0.972(0.002) 0.999 2.8 0.5Glutamine 1.010(0.007) 0.955 2.5 2.5Glycerol 1.013(0.019) 0.916 9.0 8.9Glycine 1.007(0.004) 0.985 1.7 1.6Histidine 1.018(0.006) 0.984 2.6 2.3Isoleucine 1.005(0.007) 0.982 2.7 2.7Lactate 0.981(0.002) 0.999 2.2 0.7Leucine 1.067(0.006) 0.988 6.6 1.9Phenylalanine 0.960(0.007) 0.985 4.3 2.1Pyruvate 1.002(0.004) 0.997 1.2 1.2Tyrosine 0.945(0.006) 0.984 5.7 2.2Valine 1.019(0.003) 0.995 1.9 0.9

a The numbers in parentheses give the standard errors of the regression coeffi-cients k.

Table 5The determined T2 times and the recovery percentages for two different T2-filters.

Metabolite T2/s Recovery-% with given T2-filter

80 ms 400 ms

3-Hydroxybutyrate 1.2 95 79Acetate 4.3 99 94Alanine 1.0 95 76Citrate 0.4 88 53Creatine 1.2 96 80Creatinine 2.3 98 89Glucose 0.8 93 70Glutamine 0.8 93 71Histidine 1.3 96 80Lactate 1.5 96 83Leucine 1.0 94 75Phenylalanine 2.1 97 87Pyruvate 3.8 99 93Threonine 1.0 95 76Tyrosine 1.3 96 81Valine 1.0 95 77

70 M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78

signals in the T2-edited serum spectra arising from the CH3- andCH2-protons of lipids in lipoproteins (Fig. 1). The general formulaof Sm(m) is

SmðmÞ ¼ Gmðm;w; J;D; line shapeÞ ð4Þ

where G is a multiplet function (singlet, doublet, triplet, etc.) whichcan be regular (1:1 doublet, 1:2:1 triplet, etc.) with a splitting de-scribed by J or irregular one for which the relative positions, inten-sities and line widths are freely optimizable or constrained to atemplate model. To rationalize the output, the lines can be arranged

into ‘‘complexes’’ so that their total areas are reported at the end ofthe analysis. For example, the CH3 signal of lipoproteins in T2-editedserum spectra can be described well using from 6 to 9 lines formingthe ‘‘complex’’. Also the baseline B(m) can be optimized during iter-ation. Modern NMR spectrometers have a very linear baseline, thusB(m) describes mostly broad signals originating from macromole-cules, not instrumental artifacts. In qQMTLS, an optimizable multi-term baseline function

BðmÞ ¼X

n

bnbðmÞ ð5Þ

is employed [28]. The b(m) terms are cos2-functions

bðmÞ ¼ cos2ðm� mnÞ=dm ð6Þ

Page 5: Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids

Table 6The non-calibrated and calibrated mean absolute errors (MAEs) of the concentrations, calculated for 23 metabolite mixtures with serum background, using different analysisprotocols.

MAE % Calibrated MAE %f

F (%)a G (%)b H (%)c I (%)d J (%)e Fa Gb Hc Id Je

3-Hydroxybutyrate 40.0 21.3 22.0 10.7 17.3 31.1 20.8 20.7 11.3 11.7Acetoacetate 28.7 27.9 23.1 37.9 32.6 27.5 25.3 16.5 12.3 13.6Alanine 6.6 7.2 6.2 6.4 6.0 3.3 3.6 3.1 3.8 2.8Citrate 5.2 4.8 4.6 9.9 3.0 3.7 3.8 3.1 7.9 2.7Creatine 27.9 24.0 31.4 9.0 7.3 23.2 18.2 24.0 8.9 5.8Creatinine 12.3 15.2 10.2 7.3 7.6 12.0 10.2 8.5 7.6 6.5Glucose 2.6 1.3 1.7 1.2 1.7 1.7 1.3 1.5 0.9 1.1Glutamine 7.8 10.9 9.1 6.8 2.5 3.6 3.4 4.7 6.4 2.6Glycine 10.1 9.1 11.1 3.6 7.1 5.1 6.1 5.0 3.4 4.1Histidine 48.2 15.0 9.6 13.0 12.5 13.7 13.5 9.7 13.1 10.4Isoleucine 55.7 16.9 23.7 16.0 11.5 25.3 17.1 15.1 8.1 11.4Lactate 9.4 9.7 7.7 7.7 2.4 5.1 4.8 5.8 2.8 2.2Leucine 21.8 18.7 24.8 25.1 18.3 14.1 13.6 11.5 13.1 8.4Phenylalanine 7.6 13.2 8.4 4.6 6.4 7.5 11.5 6.7 4.5 3.4Pyruvate 28.2 26.9 18.2 22.8 22.4 6.6 10.4 10.9 5.7 7.2Tyrosine 21.9 13.4 7.0 6.4 7.3 10.3 13.2 7.2 6.0 5.9Valine 4.1 12.8 4.4 6.4 9.0 3.2 8.6 3.8 2.8 4.5Average 19.9 14.6 13.1 11.5 10.3 11.6 10.9 9.3 7.0 6.1

a Same line width for all species optimized, regions overweighted: 0.90–1.23, and 6.80–7.90 ppm, intensity overweighting for weak signals.b Otherwise like F but instead of overweighting, fit-and-lock protocol was used.c Same line width for all species optimized, fit-and-lock protocol for pyruvate; acetoacetate; isoleucine; leucine; 3-hydroxybutyrate; valine and the aromatic metabolites.d Template used for line widths, fit-and-lock as in H, the response factors from the T2-edited spectra of individual metabolites applied.e Otherwise like I but the line widths were more strongly (doubly) forced to the template values and the values of 3-hydroxybutyrate were obtained as described later on in

the text.f See footnote f in Table 3. The linear regression coefficients (k) are given in Table 7.

Fig. 1. A part of 500 MHz T2-edited (T2-filter time 80 ms) 1H NMR spectra of humanserum (black = observed spectrum, red = calculated spectrum, blue = QM lines,violet = fitted lines, grey = baseline, and green = difference between the calculatedand the observed spectrum). The calculated spectrum is the sum of the QM lines,fitted lines and baseline. Spectral insert shows the complex transition pattern ofleucine methyls. The lipid signals can be automatically created in qQMTLS bydefining the number of lines in the particular range and allowing the program toconstruct and optimize the multiplet. One should note that a good fit of thesesignals can be obtained with many line combinations.

M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78 71

where b(m) = 0 when m > mn + dm or m < mn � dm and they are set toevenly distributed positions (mn + 1 = mn + dm) so that their sumB(m) = 1.0 at every point when all bn = 1. When compared to the‘‘global’’ Fourier expansion used in the PERCHit iterator [30], theadvantage of the ‘‘localized’’ function is that any part of the spec-trum can be fitted independently of the rest of the spectrum, pre-serving the exact form of the baseline function in memory (Oneshould note that a linear baseline is obtained by setting all b equaland that, in order to prevent explosion of the coefficients, one has toadd a constraint to keep them as small as possible).

2.5. Iterative solution of the non-linear equations

Eq. (1) defines a set of linear equations for Xn, xm and coeffi-cients bn. If the model spectra (Qn(m) in Eq. (1)) are available forthe components and their chemical shifts match the observedspectrum, the concentrations Xn can be obtained by solving theequations. If the model spectra are experimental, they may containinstrumental artifacts and impurity signals. More serious is thatthe spectral parameters, especially the chemical shifts, and lineshapes may differ from those of the sample to analyzed so that asignificant bias is induced to the regression results. We have re-cently outlined an approach based on the adaptive spectral li-braries (ASLs) in which the model spectra are obtained usingQMSA and stored in the form of spectral parameters, which canbe used to simulate the model spectra in any magnetic field andwith any line width and shape [33]. In addition, the ASLs offer anefficient and convenient way to store the spectra.

If also the chemical shifts, coupling constants, line widths andline shape parameters need to be adjusted, the Eqs. (1)–(6) definea set of nonlinear equations and, for their iterative solution, thetrial values of the spectral descriptors need to be given prior tothe fitting. These trial values can be obtained from ASL. The iteratoralgorithm has been outlined before [30] and not discussed here.Because the concentration of a component may be defined by somevery weak but otherwise well-defined and separated signal, theprincipal component regression (PCR) is not useful in qQMSA, con-trary to the single compound applications. However, PCR can beused if an integral transform broadening [27,30] is used in an earlyphase of the analysis.

The number of the terms that are optimizable in qQMSA growswhen the number of components grows. This problem can be han-dled by separating the non-correlated parameters like the baselineterms and chemical shifts. Moreover, most of the coupling con-stants can be kept fixed since they are insensitive to experimentalconditions. Those coupling constants that may need to be adjustedcan be defined case-specifically. The dimension of the largest Hes-sian matrix needed in the present application is only ca. 200 and,

Page 6: Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids

72 M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78

thus, the number of metabolites could be easily multiplied withoutcomputational resource problems. Fortunately, for the biofluidslike serum and cerebrospinal fluid the number of common low-molecular-weight metabolites analyzable or to be analyzed fromtheir 1H NMR spectrum is usually at most ca. 50 [9].

2.6. Dynamic range of the populations: weighting, locking and ignore

The large dynamic range of the populations forms a challengefor the qQMSA of complex mixtures. Our objective that even thesmallest populations of the spectral metabolites varying from 1to 100 M units should be quantified with relative standard errorsmaller than 10%, demands special solutions for problems arisingfrom signal overlap. The typical rrms-error of fitting is ca. 0.2% ofthe maximum intensity, which means that weak signals at the rootof the intensive signal can be poorly defined and the line shapeartifacts may hinder the accurate estimation of their populations.In applications like the drug impurity analysis 13C decoupling canbe useful in quantification of very weak signals [2].

A typical spectrum of a biological sample consists of an inten-sive high field region and a weak low field region. The latter regioncontains, for example, signals from aromatic amino acids whichhave signals also in the more crowded area at the high field. If boththe regions are fitted simultaneously, the high field spectral fea-tures, such as intensive signals of glucose or protein background,may disturb the analysis so that the fit for the aromatic region suf-fers. qQMTLS offers two solutions to this problem: (i) the aromaticregion is strongly overweighted in the fitting or (ii) the aromaticregion is fitted first and the populations and spectral parametersdefined by that region are then locked for the analysis of the highfield region (called later fit-and-lock). In the case of T2-edited serumspectra, another example of information rich regions with weaksignals is the region between 0.75 and 1.10 ppm.

It can be also worthwhile to use a weighting protocol in whichthe strong intensities obtain lower weights in the least-square fit-ting. In the case of serum, this suits well for the range dominatedby strong glucose signals. The weight can also be bound to theoverlap of the signals or to the rrms-fit, so that the crowded areasor the areas with poor consensus between the observed and calcu-lated spectra obtain lower weights in the fitting. In addition, thepoorly defined ranges, for example around the water signal, canbe completely ignored from the analysis. Also, when a proton givesa well-defined signal and the other signals of the compound arecomplex and composed of numerous weak lines, the latter can beignored from the analysis. For example the leucine CH- and CH2-signals, which are complex and overlap with protein backgroundbut not with any invaluable metabolite information, can be ignoredfrom the analysis because the methyl signals (which have strongand well-defined lines, although quantum mechanically rathercomplex, see Fig. 1) contain the quantitative information in moreeasily digestible form.

2.7. Prior knowledge, constraints and templates

If a spectral parameter, like the chemical shifts of some minorcomponents in the glucose range of serum spectrum, is not welldefined, one can also define the range within which a parameteris constrained to stay during the iteration. Also the linear con-straints, similar to those used in CTLS analysis [2,29], are usefulwhen a parameter is poorly defined by the spectrum or it corre-lates with others. In qQMTLS one can do all this also by a template:the spectral parameters, for example chemical shifts and linewidths, typical for similar samples are defined in a template file,which is an ordinary spectral parameter file (see S-2), and thenthe least-square forces are used to constraint the parameters to

these target values. Instead of a template, when the trial valuesare known to be good, they can be used as the target values.

2.8. Alignment problem, second-order effects and different fieldstrengths

The basic protocol of qQMTLS demands that the correspondingcalculated and observed signals overlap in the beginning of theiteration [27], which means that the trial chemical shifts shouldrather be within 1–2 times the line width from the correct ones.The problem is common with other protocols and many effortshave been used to develop strategies [26,34,35] for the peak align-ment and its automation. The property that the exact structures ofmultiplets from each nucleus can be simulated in any field offersspecial tools also for peak alignment in qQMSA.

In the case of serum spectra, aligning the alanine methyl dou-blets is usually enough to align the signals of the other compoundsin the samples. If there are larger differences for certain signals, inqQMTLS one can use an interactive graphical tool to overlap indi-vidual multiplets. Alternatively, one can apply a broadening basedon Bartlett window, which allows an artificial overlap of the ob-served and calculated signals [27] and works well if the signalsare well defined but not so well with the very complex signalsof, for example, the CH protons of some amino acids. However, be-cause the alignment problem is not an issue in this work, it is notdiscussed widely here.

The second-order effects are related to the alignment problemand can therefore be discussed here. For example, the chemicalshifts of metabolites like glutamine and glutamic acid are sensitiveto cations and may sometimes overlap with each others, too. Thisforms a challenge especially in analysis of urine but may disturbalso the analyses of cerebrospinal fluid and cell extracts. In ourcase, the chemical shifts of the diastereotopic CH2 protons of gluta-mine differ only by 11–13 Hz (at 500 MHz) while the geminal cou-pling constants are ca. �15 Hz. This leads to strong second-ordereffects and make the outlook of the signals sensitive to small vari-ations of their relative chemical shifts. In QMSA the signals can besimulated with any chemical shift difference on fly during the iter-ative process.

The second-order effects depend also on field strengths andtherefore the same experimental model spectra cannot be usedfor the spectra measured at different field strengths. In qQMSAthe same model can be used; changing the field demands justthe change of the FIELD parameter in the spectral parameter file(see Table S-2).

3. Results and discussion

The performance of qQMSA was assessed with metabolite mix-tures without and with a human serum background (pairs of par-allel samples with the same metabolite concentrations). The intentof the first assessment was to test the performance of program, dif-ferent fitting protocols and parameters with known metabolitemixtures without complex background. These samples correspondfor example to the situation in which the proteins are removed byfiltration of serum or protein content of the samples is low, e.g. inthe case of urine. The intent of the second assessment was to testthe performance of qQMSA with serum spectra in the presence ofprotein background and as measured in the high-throughput man-ner [4].

Because all the spectra were measured with automatic shim-ming and identical settings with those the routine spectra mea-sured in the metabolomics projects [4], our statistics should beclose to the statistics obtainable with corresponding real samples.One might say that the spectra without the macromolecular back-

Page 7: Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids

M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78 73

ground represent almost ideal ones for quantification while thespectrum with the background is opposite, suffering about almostall the possible complications mentioned above.

The tools for model building and for automation are underdevelopment and therefore not discussed here. Anyhow, the com-bination of QMSA and ASL’s offers special features also for imple-mentation of this phase of the analysis.

3.1. Response factors and linear regression calibration

The area of NMR signal per proton is ideally the same for all theprotons of the same compound and qQMSA should give the samevalue (1.0) for their response factors. However, there are reasonswhy this is not fulfilled in practice: different relaxation times,water suppression, macromolecular interactions (with T2-editing),exchangeable protons, imperfect pulse calibration and proton-13Cisotope couplings. In order to explore some of these effects, weanalyzed the spectra of glucose measured with different settings(Table 2). If the spectrum was measured quantitatively with a longrelaxation delay, the response factors were almost identical ascould be presumed. However, in the other cases significant effectsare visible so that the response factors vary from 0.81 to 1.00,which means that corresponding bias (up to 19%) could be inducedto the integrals. These effects have a special importance in theaccurate quantification of low concentration components overlap-ping with high concentration components, like glucose in manybiological samples.

In the case of T2-edited spectra measured in the high-through-put manner, some response factors need special care. For example,the response factor of lactate CH proton can be as low as 0.70 whencompared to that of the lactate CH3 protons. Other examples ofsuch lowered response factors are the aromatic protons. On the ba-sis of these examples, we ended up to the practice that the largestresponse factor of a compound is set to 1.0. In cases, where the re-sponse factors cannot be obtained accurately or the concentrationof the component is low (so that its response factor has not mucheffect on the total fit), they can be fixed to values determinedelsewhere.

The response factors (when defined as above, 1.0 for the largestvalue of each compound) take into account the differences be-tween different protons of a compound but the response differ-ences between compounds are not taken into account. This maylead to bias in the concentrations if no calibration is done. In the

Table 7The results of the best fitting protocol (J in Table 6) for metabolite mixtures with human sedata from 23 samples.

Metabolite ka R2b

3-Hydroxybutyrate 0.882(0.022) 0.903 (I: 0Acetoacetate 1.592(0.045) 0.887 (I: 0Alanine 1.055(0.008) 0.961 (I: 0Citrate 1.015(0.007) 0.989Creatine 0.953(0.013) 0.969Creatinine 0.965(0.016) 0.897Glucose 1.015(0.003) 0.973 (I: 0Glutamine 1.005(0.008) 0.952Glycine 0.942(0.010) 0.924 (I: 0Histidine 1.088(0.026) 0.780Isoleucine 0.999(0.026) 0.774 (I: 0Lactate 1.010(0.005) 0.992Leucine 0.855(0.017) 0.804Phenylalanine 1.071(0.010) 0.963Pyruvate 1.312(0.021) 0.927 (I: 0Tyrosine 1.056(0.015) 0.891Valine 0.914(0.011) 0.915 (I: 0

a The numbers in parentheses give the standard errors of the regression coefficients kb The numbers in parentheses give the best value and the corresponding protocol (fro

following (Tables 3, 4, 6 and 7), we report the statistics assuming,first, that the ratio between the signal area and the concentration isthe same for all the compounds, and second, the statistics wherethe differences between compounds are taken into accountthrough calibration based on linear regression analysis using theequation

concentration ¼ k� raw concentration ð7Þ

where the raw concentration is obtained by comparing the signalarea to that of the reference compound, given by qQMTLS whenthe molarity of the reference compound is given. The calibratedMAE (Mean Absolute Error) is calculated by using these concentra-tions instead of the raw ones. The inverse of the slope (1/k) gives therelative response factor of the compound as compared to that of thereference compound. The calibration may remove also bias arisingfrom the sample preparation.

3.2. Quantification of metabolite mixtures without macromolecularbackground

The performance of qQMTLS was tested with 20 metabolitemixtures each containing 18 metabolites (+ acetate and urea) withphysiological concentrations of human serum (Fig. 2 and Table 1).Numerous different protocols were tested (selected protocols inthe Table 3), starting from fitting with constant line width andno weightings or constraints ending up to fitting with optimizedline widths and shapes, weighted areas, locked integrals, optimiza-tion of some response factors, and use of a template.

A majority of the metabolites could be quantified with an aver-age error <10% (Table 3) no matter which protocol was used, eventhough the signal-to-noise ratio (S/N) for low concentration metab-olites, like the aromatic amino acids, was only ca. 10. When moresophisticated fitting protocols were used, glycerol was the onlymetabolite causing problems (because its concentrations are lowand for both of its signals are partly overlaid by bigger signals).When glycerol was excluded from the results, the calibrated MAEdropped from 2.4% to 2.0% for the best model E (Table 3).

The fitting of the low concentration metabolites giving bothwell-defined signals and signals that are hardly visible (swampedwith stronger lines and/or spectral background) form a specialchallenge. The fit-and-lock, fitting of specific areas separately, forexample aromatic region of aromatic amino acids, followed bylocking of the corresponding parameters yielded clearly better re-

rum background. The slopes k and R2 values are from the linear regression analysis of

MAE %b Calibrated MAE %b

.904) 17.3 (I: 10.7) 11.7 (I: 11.3)

.916) 32.6 (H: 23.1) 13.6 (I: 12.3)

.963) 6.0 2.83.0 2.77.3 5.87.6 (I: 7.3) 6.5

.976) 1.7 (I: 1.2) 1.1 (I: 0.9)2.5 2.6

.936) 7.1 (I: 3.6) 4.1 (I: 3.4)12.5 (H: 9.6) 10.4 (H: 9.7)

.879) 11.5 11.4 (I: 8.1)2.4 2.218.4 8.46.4 (I: 4.6) 3.4

.950) 22.4 (H: 18.2) 7.2 (I: 5.7)7.3 (I: 6.4) 5.9

.965) 9.0 (F: 4.1) 4.5 (I: 2.8)

.m Table 6) if not protocol J.

Page 8: Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids

Fig. 2. 500 MHz 1H NMR spectrum of a typical metabolite mixture (without serum background) used in the testing.

Fig. 3. Correlations between the theoretical and quantified metaboliteconcentrations.

74 M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78

sults than an overweighting of the same areas. This indicates that a‘‘leak’’ of spectral artifacts, intensity of stronger signals and back-ground cannot be completely prevented with the overweighting.

Another topic demanding a special care is formed by the linewidths of weak and complex signals. The correlation coefficientsof the low concentration metabolites, like glycerol, were improvedwhen a constant or the same line width or template for line widthswere used, as compared to the fitting protocols in which the linewidths were freely optimized. In the case of mixtures with low dy-namic range, simple fitting with line width optimization yields rea-sonable results. Even in the case of these metabolite mixtures(with large dynamic range) all the other metabolites excluding his-tidine and glycerol could be quantified with average errors smallerthan 10% when using such a protocol. On the other hand, fittingwith same, optimized line width for all species combined withoverweighting of low concentration metabolites, forms a goodstarting point for high dynamic range mixtures, especially if thereis no prior knowledge about line widths.

In the best fitting protocol (E in Tables 3 and 4, and Fig. 3), theaverage errors of all the metabolite concentrations were smallerthan 10% and the average of all the metabolites average errorswas 3.8% or for the calibrated values 2.4%. In this protocol the linewidths and shapes were optimized, overweighting and locking ofthe populations were used, some response factors and couplingconstants (glucose and lactate) were optimized, and a templatewas used. Without the calibration the MAE was 3.8% and the con-centrations deviated from the real ones so that the average bias(representing also the deviation of the response factors from one)was 2.5% (average 100 � |k � 1| from Table 4) and the largest biaswas 7.1% for acetoacetate.

3.3. T2 editing

T2-editing is commonly used for protein background removal.In principle, the T2 effects on signal intensities can be divided into(i) the direct effects via different T2 times of the metabolites andinto (ii) the effects arising from the metabolites interactions withproteins. In the present experiments the former contribution wassounded, while, the latter is one of the origins of the systematic dif-ferences between the estimated and theoretical concentrations inthe samples with serum background.

In order to estimate the bias that T2-editing causes to metabo-lite concentrations, T2 times of metabolites were estimated fromT2-edited spectra of a metabolite mixture measured with differentT2-filters (60-4000 ms). The spectra were analyzed using qQMTLS

with response factor value one for all the protons and so the esti-mates are average T2 times of the corresponding molecule (not de-fined for a specific proton of the molecule). Fig. 4 presents therecoveries of six different metabolites as a function of T2-filtertime. Nonlinear one phase exponential decay (GraphPad Prism ver-sion 5.03 for Windows) was applied to the T2 time analysis. Table 5gives the T2 times and the calculated recovery percentages withtwo different T2-filter times for 16 common metabolites. T2-filterof 80 ms causes on average 5% errors to the concentrations,whereas 400 ms T2-filter time causes severe bias of average 21%.The effects of the long T2-filter are even bigger, if the concentrationratios are monitored. For example, the ratio of alanine to citrate,changes from 1.08 to 1.43 when T2-filter of 400 ms is used insteadof 80 ms. In practice, this has no serious implications if spectrameasured with different T2-filter times are not compared.

The experiment shows that even the cautious T2-editing yieldsa small but significant systematic bias in the absolute concentra-tions of the metabolites. However, the concentrations can be con-verted to absolute ones if the magnitudes of the effects are known.In addition, T2-editing may distort signals. In our experiments a

Page 9: Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids

Fig. 4. The recovery of six common metabolites as a function of T2-filter time.Concentrations obtained from 1H NMR spectra (noesypresat) were used as the zeropoints (concentration 100%).

M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78 75

visible J-modulation was seen in the 3-hydroxybutyrate CH2-signal(see Fig. S-1) and, hardly, in the alanine and lactate CH3-signals.From the point of the quantitativity, the effects are insignificantand blended in analysis with the response factor effects. One cansay fairly that the moderately T2-edited serum spectra can be mea-sured so that they obey well the QM rules of the non-edited sys-tem, which is a prerequisite if the QM rules are wanted to befully exploited in their analysis.

3.4. Quantification of serum samples with macromolecularbackground

The performance of qQMTLS was tested with 23 human serummimics. The aim was to have as natural samples as possible withknown metabolite concentrations. Five different typical serumbackgrounds (which vary significantly, see Fig. 5) were used andthe metabolite concentrations were randomized within physiolog-ical range. Additionally, the protein concentrations of the sampleswere adjusted to match the original protein concentrations of thesera. Fig. 6 shows T2-edited spectra of the original (real) serumand the sample (serum mimic) prepared from serum by removingthe small molecule components with dialysis and replacing themwith the known amounts of common metabolites. Glycerol was ex-cluded from the results because there were also minute amounts ofit originating from sample preparation (from dialyses cassettes de-spite careful washing).

In the case of spectra with complex background, like serum,straightforward fitting with freely optimized line widths does notwork. Also here the use of the same, optimized line width for all

Fig. 5. 500 MHz T2-edited (T2-filter time 80 ms) 1H NMR spectra of the fi

species, with overweighting the signals of low concentrationmetabolites, is a good starting point for fitting, although the bestresults are obtained when the fit-and-lock protocol and a templatefor line widths are used (Table 6).

Also the T2-edited spectra of the pure individual metaboliteswere analyzed in order to determine the effects of spectral editingto the response factors of the metabolites, and when these re-sponse factors (for example for glucose values between 0.915and 1.000) were applied in the final protocol, slightly better resultswere obtained (especially for the slopes of the aromatic com-pounds). It should be mentioned that the metabolites with the big-gest errors and the lowest slopes (Table 7), acetoacetate andpyruvate, suffered most from the high-throughput manner per-formed measurements. However, when the regression calibratedMAE percentages are observed, the effects are partly compensated(Table 7).

3-Hydroxybutyrate, the third metabolite with poor results, hasonly one truly visible signal, methyl signal, which is on the steepside of the strong lipid CH2 signal (Figs. 1 and 6). This makes thefitting of 3-hydroxybutyrate sensitive to the line width constraintsand the serum background which varies from case to case (Fig. 5).The good results for 3-hydroxybutyrate in protocol I can bethought to be fortuitous and too sensitive to the line width con-straints: the doubling of the line width constraints weight in-creased the calibrated MAE to 21.2%. In protocol J, fit-and-lock ona narrow range (20 Hz) was used for 3-hydroxybutyrate. This ap-proach is not sensitive to the template constraints and allows amore effective description of the background in the fitting rangeand invocation of the line shape information of the doublet.

In these protocols (Table 6), also the baseline (see Eq. (5)) wasoptimized. It was found useful to force the baseline lightly towardlinear and positive applying baseline forcing parameters. In addi-tion to the instrumental background, which is almost linear andinsignificant in the modern NMR spectrometers, the baseline de-scribes mostly the macromolecular background. Since loose (un-forced) baseline can describe also the macromolecularbackground, these two contributions can be combined in estimat-ing the area of the background. Also fully linear baseline can beused, if the instrumentation allows it and if unambiguous informa-tion about the non-quantum mechanically described signals of thespectrum is wanted.

Our results show that, in the case of spectra with complex back-ground, like serum, the best results demand a tailored protocol.Fig. 7 and Table 7 give the results of the best protocol (J in Table 6)based on fit-and-lock of certain metabolites (3-hydroxybutyrate,acetoacetate, aromatic amino acids, isoleucine, leucine, pyruvateand valine), template constraints for line widths and the responsefactors obtained from the T2-edited spectra of individual metabo-lites. In the case of the moderately T2-edited serum spectra, mea-

ve different serum backgrounds used in the human serum mimics.

Page 10: Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids

Fig. 6. 500 MHz T2-edited 1H NMR spectra of real human serum (blue) and human serum mimic (red). The human serum background is the same in both samples butmetabolite concentrations are randomized.

Fig. 7. Correlations between the theoretical and quantified metabolite populations.A includes all the metabolites. For a more detailed view, glucose and lactate, whichboth have high concentrations when compared to the others, are excluded from theinsert B.

76 M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78

sured in the high-throughput manner, the slopes (from Table 7) ofacetoacetate and pyruvate deviated 59% and 31% from one, respec-tively, and for the others the deviation (bias) was 5.4%. A part ofthese deviations, like the increased calibration coefficients (k) ofthe aromatic amino acids, may reflect the metabolite-protein inter-actions [36,37], although in our samples these interactions weredecreased by dilution. Because the fraction of the free metabolitesdepends also the lipid composition of the lipoproteins [37] (we hadfive different plasma backgrounds), a part of the above bias mayarise from the personals variations. One should also bear in mindthat if the absolute concentrations are not needed, the concentra-tions obtained for the same kind of samples measured and ana-lyzed in the same way, can be compared to each other withoutany case-specific calibration.

3.5. General protocol

The model building forms the first phase of the analysis. Theeasiest way is to use a parameter file which is selected so thatthe composition is close to the current case. For setting up acompletely new system, tools based on the ASL’s are under

development. Next, prior to the iterative analysis, it is advisableto minimize the chemical shift effects arising from the concentra-tion and solute-reference effects. In the case of serum spectra onecan align the observed spectra according to the alanine methyl sig-nal, see the discussion above. It is also recommendable to presetthe response factors of the major compounds like glucose and lac-tate, to the ‘‘consensus’’ values obtained from a careful analysis of agood quality spectrum. If the slope parameters (k in Eq. (7)) and areknown, also they can be given at this stage (see S-2).

A special tailored iterative protocol is normally necessary inquantification of low concentration compounds. One can alsouse a template (the spectral parameter file obtained from theanalysis of the good quality spectrum) to guide the responsesor any spectral parameters to reasonable values. Normally thecoupling constants can be fixed, but their optimization can bedone for high concentration components for which the goodvalues are not available or if the conditions of the sample differgreatly from those used to determine the trial coupling valuesin ASL.

The basic iterative protocol for biofluids may consist of the fol-lowing phases: (i) fit the aromatic region and lock the populationsof the aromatic compounds, (ii) fit the high field methyl region andlock the populations, (iii) fit certain metabolites on the basis oftheir well resolved signals and lock and, finally, (iv) fit whole thespectrum (keeping the lockings). As an example of the phase (iii)one could mention threonine (not present in these samples) whichhas one signal under lactate methyl signal which can be verystrong in some samples, and therefore the bias (arising from thevery different response factors of the two lactate signals) may in-duce a bias also to the threonine population if threonine is notoptimized and locked before on the basis of its other signals. An-other type of example was offered by 3-hydroxybuturate whichhas four signals all of which are disturbed by the serum back-ground or some other components, see above.

3.6. Terminology and perspectives

We suggest here that all the NMR spectra analysis methodsbased on QM spin-Hamiltonian, from LAOCOON to qQMTLS, wouldbe called with the abbreviations QMSA or, in the case of the quan-titative applications, qQMSA. By using the term QMSA, we want tounderline the theoretical superiority of QMSA to the more approx-imate methods developed for quantitative 1D NMR. The term HiF-SA (H iterative Full Spin Analysis), promoted by Pauli et al. [38],tackle to the iterativity of the QMSA, which is not the most essen-tial feature of QMSA and should be used when the iterativity isstressed.

Page 11: Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids

M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78 77

One might say that qQMSA and the program qQMTLS representa landmark in the history of QMSA, started by the line frequencybased programs LAOCOON3 [19] and NMREN/NMRIT [39]. As thesecond landmark one can consider the changeover to thetotal-line-shape principle, first by the program DAVINS [40] andDAISY [41] , and then with a different algorithm by the PERCHititerator [30]. The third major landmark was achieved when thespin-system size limitations were (greatly) solved so that, forexample, the spectra of strychnine or steroids can be simulatedin a few seconds [27]. As a landmark one can consider also thestructure verification based on the symbiosis of QMSA and molec-ular modeling [27]. The history will not end to qQMSA: there arestill many challenges in the structure verification and in thecombining multidimensional and spectral editing information intoQMSA.

4. Conclusions

This is the first systematic study of the quantitative QuantumMechanical Spectral Analysis (qQMSA) of complex mixtures. Thecombination qQMSA and the ASL’s offers tools and strategies whichare impossible in the other approaches developed for quantitativeanalysis of 1D NMR spectra: if the quantitative model for a systemis build for a certain field, the same model can be used for spectrameasured at some other field by just changing one value in the filewhich defines the spin system. QMSA allows also the handling thesecond-order effects which complicate the analyses of the metab-olites like glutamine. In addition to the quantitative information,qQMSA offers the chemical confidence, which means that individ-ual components can be identified with a high confidence on the ba-sis of the accurate coupling constants; this can be an invaluableproperty in many applications of qQMSA.

The qQMSA approach, including models describing unknowncomponents, background and prior knowledge from the samplesenables modeling of even the smallest details of the spectrumand the maximal quantitative NMR information analysis. Natu-rally, the fitting model and an appropriate protocol must be setup for the system prior to analysis. 2D spectra can be used to pickup the potential compounds from ASL in the model building phase,whereas qQMSA is used to obtain consistent populations (evenwithout any calibration) and the final chemical confidence.

For mixtures without a complex macromolecular background,with an appropriate fitting protocol and when the bias arising fromthe response factors was removed by calibration, MAE was only2.0% (2.4% with glycerol), which includes also the standard devia-tion (<1%) arising from the preparation of the test samples. With-out the calibration the bias increased the MAE to 3.8%, withmaximum of 7% for acetoacetate. Such systematic biases are toler-able in normal applications where the absolute concentrations arenot needed. On the other hand, the above MAE’s should be close tothe accuracy which is obtained with serum samples after removalof the macromolecular background.

Although the T2-edited spectra obey otherwise fairly well theQM rules, the response factors of some compounds (such as lactate,acetoacetate and pyruvate) in serum deviate significantly from oneand demand calibration if absolute concentrations are wanted.Also the effects of the T2-editing, high-throughput mannerperformed measurements and serum background (protein interac-tions) complicate qQMSA and, therefore, appropriate fitting proto-cols are necessary in the quantification of low concentrationcompounds. Anyhow, the average after-calibration MAE of 6.1%allows us to conclude that with an appropriate protocol, qQMTLSallows a fair quantification of the most common metabolitesalso in human serum without removal of the macromolecularbackground.

Acknowledgments

This work was supported by the Academy of Finland and theDoctoral Program of Organic Chemistry and Chemical Biology.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, inthe online version, at http://dx.doi.org/10.1016/j.jmr.2014.02.008.

References

[1] G.F. Pauli, T. Gödecke, B.U. Jaki, D.C. Lankin, Quantitative 1H NMR.Development and potential of an analytical method: an update, J. Nat. Prod.75 (2012) 834–851.

[2] P. Soininen, J. Haarala, J. Vepsäläinen, M. Niemitz, R. Laatikainen, Strategies fororganic impurity quantification by H-1 NMR spectroscopy: constrained total-line-shape fitting, Anal. Chim. Acta 542 (2005) 178–185.

[3] O. Beckonert, H.C. Keun, T.M.D. Ebbels, J. Bundy, E. Holmes, J.C. Lindon, J.K.Nicholson, Metabolic profiling, metabolomic and metabonomic procedures forNMR spectroscopy of urine, plasma, serum and tissue extracts, Nat. Protoc. 2(2007) 2692–2703.

[4] P. Soininen, A.J. Kangas, P. Würtz, T. Tukiainen, T. Tynkkynen, R. Laatikainen,M.R. Järvelin, M. Kähonen, T. Lehtimäki, J. Viikari, O.T. Raitakari, M.J.Savolainen, M. Ala-Korpela, High-throughput serum NMR metabonomics forcost-effective holistic studies on systemic metabolism, Analyst 134 (2009)1781–1785.

[5] J.F. Wu, Y.P. An, J.W. Yao, Y.L. Wang, H.R. Tang, An optimised samplepreparation method for NMR-based faecal metabonomic analysis, Analyst135 (2010) 1023–1030.

[6] N. Psychogios, D.D. Hau, J. Peng, A.C. Guo, R. Mandal, S. Bouatra, I. Sinelnikov, R.Krishnamurthy, R. Eisner, B. Gautam, N. Young, J.G. Xia, C. Knox, E. Dong, P.Huang, Z. Hollander, T.L. Pedersen, S.R. Smith, F. Bamforth, R. Greiner, B.McManus, J.W. Newman, T. Goodfriend, D.S. Wishart, The human serummetabolome, PLoS ONE 6 (2011) e16957.

[7] K. Suhre, H. Wallaschofski, J. Raffler, N. Friedrich, R. Haring, K. Michael, C.Wasner, A. Krebs, F. Kronenberg, D. Chang, C. Meisinger, H.E. Wichmann, W.Hoffmann, H. Völzke, U. Völker, A. Teumer, R. Biffar, T. Kocher, S.B. Felix, T. Illig,H.K. Kroemer, C. Gieger, W. Römisch-Margl, M. Nauck, A genome-wideassociation study of metabolic traits in human urine, Nat. Genet. 43 (2011)565–569.

[8] G. Nicholson, M. Rantalainen, J.V. Li, A.D. Maher, D. Malmodin, K.R. Ahmadi, J.H.Faber, A. Barrett, J.L. Min, N.W. Rayner, H. Toft, M. Krestyaninova, J. Viksna, S.G.Neogi, M.E. Dumas, U. Sarkans, P. Donnelly, T. Illig, J. Adamski, K. Suhre, M.Allen, K.T. Zondervan, T.D. Spector, J.K. Nicholson, J.C. Lindon, D. Baunsgaard, E.Holmes, M.I. McCarthy, C.C. Holmes, The MolPAGE consortium, a genome-widemetabolic QTL analysis in Europeans implicates two loci shaped by recentpositive selection, PLoS Genet. 7 (2011) e1002270.

[9] S. Bouatra1, F. Aziat, R. Mandal, A.C. Guo, M.R. Wilson, C. Knox, T.C. Bjorndahl,R. Krishnamurthy, F. Saleem, P. Liu, Z.T.Dame1, J. Poelzer, J. Huynh, F.S. Yallou1,N. Psychogios, E. Dong, R. Bogumil, C. Roehring, D.S. Wishart, The Human UrineMetabolome, Plos One, 8 (2013) e73076.

[10] J. Kettunen, T. Tukiainen, A.P. Sarin, A. Ortega-Alonso, E. Tikkanen, L.P.Lyytikäinen, A.J. Kangas, P. Soininen, P. Würtz, K. Silander, D.M. Dick, R.J.Rose, M.J. Savolainen, J. Viikari, M. Kähonen, T. Lehtimäki, K.H. Pietiläinen, M.Inouye, M.I. McCarthy, A. Jula, J. Eriksson, O.T. Raitakari, V. Salomaa, J. Kaprio,M.R. Järvelin, L. Peltonen, M. Perola, N.B. Freimer, M. Ala-Korpela, A. Palotie, S.Ripatti, Genome-wide association study identifies multiple loci influencinghuman serum metabolite levels, Nat. Genet. 44 (2012) 269–276.

[11] T.M.D. Ebbels, R. Cavill, Bioinformatic methods in NMR-based metabolicprofiling, Prog. Nucl. Magn. Reson. Spectrosc. 55 (2009) 361–374.

[12] J. Trygg, E. Holmes, T. Lundstedt, Chemometrics in Metabonomics, J. ProteomeRes. 6 (2006) 469–479.

[13] A.M. Weljie, J. Newton, P. Mercier, E. Carlson, C.M. Slupsky, Targeted profiling:quantitative analysis of H-1 NMR metabolomics data, Anal. Chem. 78 (2006)4430–4442.

[14] D.S. Wishart, Quantitative metabolomics using NMR, TrAC, Trends Anal. Chem.27 (2008) 228–237.

[15] C. Ludwig, D.G. Ward, A. Martin, M.R. Viant, T. Ismail, P.J. Johnson, M.J.O.Wakelam, U.L. Günther, Fast targeted multidimensional NMR metabolomics ofcolorectal cancer, Magn. Reson. Chem. 47 (2009) S68–S73.

[16] P. Sandusky, E. ppiah-Amponsah, D. Raftery, Use of optimized 1D TOCSY NMRfor improved quantitation and metabolomic analysis of biofluids, J. Biomol.NMR 49 (2011) 281–290.

[17] H.K. Kim, Y.H. Choi, R. Verpoorte, NMR-based metabolomic analysis of plants,Nat. Protoc. 5 (2010) 536–549.

[18] R.J. Abraham, Analysis of High Resolution NMR Spectra, Elsevier Publishing Co,New York, 1971.

[19] S. Castellano, A.A. Bothner-By, Analysis of NMR spectra by least squares, J.Chem. Phys. 41 (1964) 3863–3869.

[20] M.L. Anthony, B.C. Sweatman, C.R. Beddell, J.C. Lindon, J.K. Nicholson, Patternrecognition classification of the site of nephrotoxicity based on metabolic data

Page 12: Quantitative Quantum Mechanical Spectral Analysis (qQMSA) of 1H NMR spectra of complex mixtures and biofluids

78 M. Tiainen et al. / Journal of Magnetic Resonance 242 (2014) 67–78

derived from proton nuclear magnetic resonance spectra of urine, Mol.Pharmacol. 46 (1994) 199–211.

[21] R.A. Davis, A.J. Charlton, J. Godward, S.A. Jones, M. Harrison, J.C. Wilson,Adaptive binning: an improved binning method for metabolomics data usingthe undecimated wavelet transform, Chemom. Intel. Lab. Syst. 85 (2007) 144–154.

[22] D.J. Crockford, H. Keun, L.M. Smith, E. Holmes, J.K. Nicholson, Curve-fittingmethod for direct quantification of compounds in complex biological mixturesusing 1H NMR: application in metabonomic toxicology studies, Anal. Chem. 77(2005) 4556–4562.

[23] R. Stoyanova, A.W. Nicholls, J.K. Nicholson, J.C. Lindon, T.M. Brown, Automaticalignment of individual peaks in large high-resolution spectral data sets, J.Magn. Reson. 170 (2004) 329–335.

[24] R.J.O. Torgrip, J. Lindberg, M. Linder, B. Karlberg, S.P. Jacobsson, J. Kolmert, I.Gustafsson, I. Schuppe-Koistinen, New modes of data partitioning based onPARS peak alignment for improved multivariate biomarker/biopatterndetection in 1H NMR spectroscopic metabolic profiling of urine,Metabolomics 2 (2006) 1–19.

[25] Q.W. Xu, J.R. Sachs, T.C. Wang, W.H. Schaefer, Quantification and identificationof components in solution mixtures from 1D proton NMR spectra usingsingular value decomposition, Anal. Chem. 78 (2006) 7175–7185.

[26] P. Mercier, M. Lewis, D. Chang, D. Baker, D. Wishart, Towards automaticmetabolomic profiling of high-resolution one-dimensional proton NMRspectra, J. Biomol. NMR 49 (2011) 307–323.

[27] R. Laatikainen, M. Tiainen, S.P. Korhonen, M. Niemitz, Computerized Analysisof High-resolution Solution-state Spectra, Encyclopedia of MagneticResonance, John Wiley & Sons Ltd., 2011.

[28] T. Tynkkynen, T. Hassinen, M. Tiainen, P. Soininen, R. Laatikainen, 1H NMRspectral analysis and conformational behavior of n-alkanes in differentchemical environments, Magn. Reson. Chem. 50 (2012) 598–607.

[29] R. Laatikainen, M. Niemitz, W.J. Malaisse, M. Biesemans, R. Willem, Acomputational strategy for the deconvolution of NMR spectra with multipletstructures and constraints: analysis of overlapping C-13-H-2 multiplets of C-13 enriched metabolites from cell suspensions incubated in deuterated media,Magn. Reson. Med. 36 (1996) 359–365.

[30] R. Laatikainen, M. Niemitz, U. Weber, J. Sundelin, T. Hassinen, J. Vepsäläinen,General strategies for total-lineshape-type spectral analysis of NMR spectrausing integral-transform iterator, J. Magn. Reson., Ser. A 120 (1996) 1–10.

[31] A.G. Cornall, C.J. Bardawill, M.M. David, Determination of serum proteins bymeans of the biuret reaction, J. Biol. Chem. 177 (1949) 751–766.

[32] B.T. Doumas, D.D. Bayse, R.J. Carter, T. Peters, R. Schaffer, A candidate referencemethod for determination of total protein in serum. 1. Development andvalidation, Clin. Chem. 27 (1981) 1642–1650.

[33] M. Tiainen, H. Maaheimo, M. Niemitz, P. Soininen, R. Laatikainen, Spectralanalysis of 1H coupled 13C spectra of the amino acids: adaptive spectral libraryof amino acid 13C isotopomers and positional fractional 13C enrichments,Magn. Reson. Chem. 46 (2008) 125–137.

[34] E. Alm, T. Slagbrand, K.M. Åber, E. Wahlström, I. Gustafsson, J. Lindberg,Automated annotation and quantification of metabolites in 1H NMR data ofbiological origin, Anal. Bioanal. Chem. 403 (2012) 443–455.

[35] A. Alonso, M.A. Rodríguez, M. Vinaixa, R. Tortosa, X. Correig, A. Julià, S. Marsal,FOCUS: a Robust Workflow for Onedimensional NMR Spectral Analysis, Anal.Chem. 86 (2014) 1160–1169.

[36] M. Jupin, P.J. Michiels, F.C. Girard, M. Spraul, S.S. Wijmenga, NMR identificationof endogenous metabolites interacting with fatted and non-fatted humanserum albumin in blood plasma: fatty acids influence the HSA-metaboliteinteraction, J. Magn. Reson. 228 (2013) 81–94.

[37] M.D. Jupin, P.J. Michiels, F.C. Girard, M. Spraul, S.S. Wijmenga, NMRMetabolomics profiling of blood plasma mimics shows that medium- andlong-chain fatty acids differently release metabolites from human serumalbumin, J. Magn. Reson. 239 (2014) 34–43.

[38] J.G. Napolitano, D.C. Lankin, S.-N. Chen, G.F. Pauli, Complete 1H NMR spectralanalysis of ten chemical markers of Ginkgo biloba, Magn. Reson. Chem. 50(2012) 569–575.

[39] J.D. Swalen, C.A. Reilly, Analysis of complex NMR spectra. An iterative method,J. Chem. Phys. 37 (1962) 21–29.

[40] G. Hägele, M. Engelhardt, W. Boenigk, Simulation und Automatisierte Analysevon Kernresonanzspektren, VCH, Weinheim, 1987.

[41] U. Weber, H. Thiele, NMR-Spectroscopy: Modern Spectral Analysis, Wiley-VCH, Weinheim, 1998.