mzMatch Excel Template - University of Strathclyde · • V: Relation.id (from mzMatch) • W: Peak...

mzMatch Excel Template Tutorial




Installation & Requirements

• Installation
  • The template can process mzMatch output text files without any additional installations or add-ins.
  • Microsoft Excel 2007 is required (2003 is not sufficient; 2010 has not been tested).
• Requirements for full functionality
  • R Statistical Software: for mzMatch pre-processing
    R packages: XCMS (Bioconductor), mzMatch.R (R-Forge), rJava and XML (CRAN)
    R package: rcdk, for FormulaGenerator
  • Firefox or Internet Explorer: for hyperlinks to online databases
  • Thermo Xcalibur: for EIC lookup
  • ReAdW: for conversion of .RAW files to .mzXML files
• If you wish to use the R and Xcalibur links, open the template and update cells D44 and D45 (on the Settings sheet) to the relevant paths on your computer.


Data Pre-processing

• Step 1 - Setup
  • Open “mzMatch_Template.xltm” and use Save As to save it as “yourfile.xlsm” (macro-enabled workbook).
  • Go to the “Settings” sheet.
  • Update cells D44 and D45 (on the Settings sheet) to the relevant paths on your computer.
• Step 2 - Convert RAW files to centroided mzXML files
  • Save a copy of ReAdW.exe into the folder containing your RAW data.
  • Click ‘Convert RAW to mzXML files’ to run the conversion.


Data Pre-processing

• Step 3
  • If the files are from an Exactive, split them into Pos and Neg files using the blue button.
• Step 4
  • Select ‘positive’ or ‘negative’ mode in cell K1 (only process one mode at a time).
  • For each polarity, sort the replicate .mzXML files into folders according to their experimental groups (sets).
  • Check the blue-shaded settings for the mass, RT and RelatedPeaks windows, and for the RSD filter (xcms parameters can be changed in the macro).
  • Run xcms/mzMatch with the purple ‘Combined Button’.
    NOTE: Files must be sorted into sets (folders) for the RSD filter to run.
    NOTE: If xcms crashes in negative mode, try selecting ‘mzData alt method’ in cell K2.
  • mzMatch output files will be saved in the folder with your files.
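The RSD filter works on replicate intensities within each set, which is why the files must first be sorted into set folders. The sketch below shows the standard relative-standard-deviation calculation; the "keep the peak if at least one set passes" rule and the 50% default threshold are assumptions for illustration, not necessarily the rule mzMatch.R applies.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (%) = sample SD / mean * 100."""
    return stdev(values) / mean(values) * 100


def passes_rsd_filter(intensities_by_set, max_rsd=50.0):
    """Keep a peak if at least one set has an RSD within max_rsd.

    Assumed (hypothetical) rule: sets with fewer than two replicates are
    ignored, because a sample standard deviation needs two or more values.
    """
    return any(
        len(values) > 1 and rsd_percent(values) <= max_rsd
        for values in intensities_by_set.values()
    )


# Hypothetical replicate intensities for one peak, grouped by set (folder)
peak = {"Control": [9.0e5, 1.0e6, 1.1e6], "Treated": [1.0e5, 1.0e6, 3.0e6]}
print(passes_rsd_filter(peak))
```

Here the Control set has an RSD of 10%, so the peak is kept even though the Treated replicates are highly variable.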


Peak data import

• Step 1
  • Import the mzMatch output file “combined_related.txt” using the big red button (Settings sheet).
  • Manually check that replicate samples are in adjacent columns (if not, cut and paste them into place).
• Step 2
  • On the “Settings” sheet, enter the number of replicates in each set (column F).
    NOTE: If you have named samples with set prefixes, the next green button will do this for you.
  • Choose the Set-Type for each set using the drop-down options in column C.
    NOTE: Hover the mouse over cell C8 for more information.
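When samples carry set prefixes, both the adjacency check and the replicate count reduce to grouping column names by prefix. A minimal sketch, assuming a hypothetical naming convention in which the set name is the text before the first underscore (e.g. "Control_1"); the template's green button may parse names differently.

```python
from collections import Counter
from itertools import groupby


def set_prefix(sample_name, sep="_"):
    """Hypothetical convention: the set name is the text before the first sep."""
    return sample_name.split(sep, 1)[0]


def count_replicates(sample_names):
    """Number of replicates per set, in order of first appearance."""
    return dict(Counter(set_prefix(name) for name in sample_names))


def replicates_adjacent(sample_names):
    """True if every set's columns form a single contiguous block."""
    runs = [key for key, _ in groupby(sample_names, key=set_prefix)]
    return len(runs) == len(set(runs))


columns = ["Control_1", "Control_2", "Control_3", "Treated_1", "Treated_2", "QC_1"]
print(count_replicates(columns))
print(replicates_adjacent(columns))
```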


Update metabolite DB

• Step 1
  • Externally, prepare a list of actual retention times for authentic standards analysed under your current chromatographic conditions. Any Excel-readable file with name, RT and (optionally) mass in columns can be imported directly. ToxID is good for this.
    NOTE: Names must exactly match those in the DB (except that “,” can be replaced by “_”).
• Step 2
  • Select the ‘Rtcalculator’ sheet.
  • Enter the dead-volume time for your chromatographic column (cell O9).
  • Scroll to the right and manually update the expected retention times for given Pathways, Maps and Properties (if known).
  • (Optional) Enter metabolite names and RTs for authentic standards in columns A:B and W:X.
    NOTE: These can be entered automatically from an external Excel/TSV/CSV file in Step 3.


Update metabolite DB

• Step 3
  • Run the ‘Update Retention Times in DB’ macro from either the ‘Settings’ or the ‘Rtcalculator’ sheet.
  • If the prediction model looks good (i.e. r² > 0.6), agree to update the RTs in the DB; otherwise, try altering the variables (cells E1:J1) to suit your chromatography and re-run the macro.
• Step 4 (optional)
  • If you have a species-specific database (e.g. from MetaCyc or KEGG), enter these annotations in column G (“PreferredDB”) of the DB sheet.
    NOTE: This can be simplified by matching database identifiers with Excel’s ‘VLOOKUP’ function.
  • Select the entire database and use Custom Sort: sort by ‘searchmass’ (ascending), then by ‘PreferredDB’ (ascending), so that annotated metabolites are at the top of each group of isomers.
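The r² > 0.6 check in Step 3 compares predicted retention times against the authentic-standard RTs. The template's exact model and statistic are not described here, so this sketch assumes r² is the squared Pearson correlation between predicted and observed RTs (the quantity Excel's RSQ() returns); the RT values are hypothetical.

```python
from statistics import mean


def r_squared(predicted, observed):
    """Squared Pearson correlation between predicted and observed RTs."""
    mp, mo = mean(predicted), mean(observed)
    sxy = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sxx = sum((p - mp) ** 2 for p in predicted)
    syy = sum((o - mo) ** 2 for o in observed)
    return sxy * sxy / (sxx * syy)


# Hypothetical predicted vs. authentic-standard retention times (minutes)
predicted = [4.1, 6.0, 8.2, 11.5, 14.9]
observed = [4.3, 5.7, 8.9, 11.0, 15.2]
r2 = r_squared(predicted, observed)
print(f"r2 = {r2:.3f}", "-> update DB" if r2 > 0.6 else "-> adjust E1:J1 and re-run")
```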


Run Metabolite Identification

• Step 1
  • On the “Settings” sheet, check that the settings in columns F and I are suitable. The most commonly changed settings are:
    • Identification RT windows (F3 and F4) and mass window (F6)
    • RT window for duplicate peaks (I9)
    • MaxIntensity cutoff (I10)
  • Select the adducts (cells K15:K21) that you wish to include in the identification search.
• Step 2
  • Click ‘Run Identification Macro’ on the Settings sheet.
  • This can take from 2 to 20 minutes.
  • Save the file as soon as the macro has finished.


Metabolite Identification: Process

• Metabolite Identification Macro
  • This macro annotates every peak in the ‘alldata’ sheet with additional information.
  • Upon completion, all basepeaks are copied to the ‘allBasePeaks’ sheet.
  • All identifications with confidence < 5 are copied to the ‘notlikely’ sheet.
  • All identifications with confidence ≥ 5 are copied to the ‘identification’ sheet.
• The ‘identification’ sheet is then checked for duplicates and shoulder peaks, and these are moved to the ‘notlikely’ sheet.
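The confidence split described above can be sketched as follows. The threshold of 5 comes from the text; the duplicate rule (same metabolite name within an RT window, keep the most intense peak) is a hypothetical stand-in for the macro's duplicate and shoulder-peak checks, loosely mirroring the ‘RT window for duplicate peaks’ setting (I9). The `Peak` record and the 0.5 min default are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    name: str          # best-matching metabolite name
    rt: float          # retention time, minutes
    intensity: float   # maximum intensity
    confidence: int    # identification confidence level


def triage(peaks, min_conf=5, dup_rt_window=0.5):
    """Split peaks into ('identification', 'notlikely') lists.

    Hypothetical duplicate rule: a peak whose name matches an already-kept,
    more intense peak within dup_rt_window minutes is moved to 'notlikely'.
    """
    identification, notlikely = [], []
    for p in peaks:
        (identification if p.confidence >= min_conf else notlikely).append(p)
    kept = []
    for p in sorted(identification, key=lambda q: q.intensity, reverse=True):
        if any(k.name == p.name and abs(k.rt - p.rt) <= dup_rt_window for k in kept):
            notlikely.append(p)  # duplicate or shoulder of a stronger peak
        else:
            kept.append(p)
    return kept, notlikely


peaks = [
    Peak("alanine", 5.0, 1e6, 7),
    Peak("alanine", 5.2, 2e5, 6),   # shoulder of the peak above
    Peak("glycine", 4.0, 5e5, 3),   # confidence below 5
    Peak("serine", 6.0, 3e5, 5),
]
kept, rejected = triage(peaks)
print([p.name for p in kept], [p.name for p in rejected])
```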


Metabolite Identification: Process

• Peak information columns
  • A: neutral exact mass (from mzMatch)
  • B: retention time (from mzMatch), in minutes
  • C: formula from the DB with the closest match to the mass (if within the ppm window)
  • D: number of isomers in the DB with this exact formula
  • E: metabolite name: best match from the DB for this mass and RT
  • F: confidence level, according to the parameters on the ‘settings’ sheet
  • G: records whether the metabolite is in a ‘preferred database’ (from DB)
  • H: Map: the general area of metabolism for this metabolite (usually from KEGG)
    NOTE: Column H can be changed by choosing a different header in cell H1.
  • I: mass error (in ppm) from the nearest match in the DB (if within 2 × the ppm window)
  • J: RT error relative to the authentic standard (white) or predicted RT (grey), as % of RT
  • K: altppm: mass error for the next closest mass in the DB (if within the ppm window)
  • L: Sig: records which sample sets are significant (peaks > blank and RSD < window)
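Columns I, K and J are relative errors. A minimal sketch of the usual definitions, assuming the sign convention "observed minus reference" (the template's own sign convention is not stated here); the glucose mass is its monoisotopic neutral mass, and the observed values are hypothetical.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Mass error in parts per million relative to the theoretical mass."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def rt_error_percent(observed_rt, reference_rt):
    """RT error as a percentage of the reference (standard or predicted) RT."""
    return (observed_rt - reference_rt) / reference_rt * 100


# Glucose, C6H12O6: monoisotopic neutral mass 180.06339 (rounded)
print(round(ppm_error(180.06375, 180.06339), 2))
print(round(rt_error_percent(10.5, 10.0), 1))
```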


Metabolite Identification: Process

• Peak information (continued)
  • M: BP: basepeak for that peak
  • N: Mzdiff: mass difference between this peak and the basepeak
    For basepeaks, this column records the common adducts/fragments/isotopes that were found.
  • O: relation.ship: relationship to the basepeak (according to mzMatch)
  • P: addfrag: common adduct, fragment or neutral loss
  • Q: % error of the C13-isotope intensity from theoretical
  • R: % error of the isotope intensity from theoretical for Cl, S, N, O or H
  • S: RSD for QC samples (or for Treatment if there is no QC)
  • T: minimum RSD for all included sample sets
  • U: maximum intensity from all included sets
  • V: Relation.id (from mzMatch)
  • W: peak intensity ratio for the mean of ‘treatments’ vs the mean of ‘controls’
  • X: p-value for an unpaired t-test between ‘treatments’ and ‘controls’
  • Y: adduct of the formula match to the mass (e.g. H, Na, double charge)
  • Z: polarity
  • AA: number of detected peaks in included sets
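Column Q compares the observed C13 isotope peak with its theoretical intensity. A sketch under a simple binomial model that considers 13C only (ignoring 2H, 15N and 17O contributions) and assumes a natural 13C abundance of about 1.07%; how the template itself computes the theoretical value is not stated here.

```python
P13C = 0.0107  # approximate natural abundance of 13C (assumption)


def theoretical_c13_ratio(n_carbons, p=P13C):
    """Expected (M+1)/M intensity ratio from 13C only, binomial model.

    P(one 13C) / P(no 13C) = n*p*(1-p)**(n-1) / (1-p)**n = n*p / (1-p)
    """
    return n_carbons * p / (1 - p)


def c13_error_percent(mono_intensity, m1_intensity, n_carbons):
    """% error of the observed 13C isotope ratio from the theoretical one."""
    observed = m1_intensity / mono_intensity
    expected = theoretical_c13_ratio(n_carbons)
    return (observed - expected) / expected * 100


# Hypothetical six-carbon metabolite: M at 1.0e6 counts, M+1 at 7.0e4 counts
print(round(c13_error_percent(1.0e6, 7.0e4, 6), 1))
```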


Re-calibrate mass accuracy

• Step 1
  • On the “Settings” or ‘Identification’ sheet, click the “ppm check” button.
  • If the polynomial curve looks like a good fit, agree to re-calibrate the masses; otherwise, investigate the mass calibration manually.
• Step 2
  • Sort the ‘identification’ sheet by ppm error (use the blue ‘sort’ button).
  • Remove metabolites with large errors (> 1.5 ppm) by cutting and pasting them to the ‘notlikely’ sheet.
    NOTE: It is easiest to manually annotate all mis-annotated peaks (in column F), re-sort, and move them all at once.
    NOTE: Delete rows that have been removed (even if they appear empty) to speed up processing.
  • Double-check the ‘altppm’ column for alternative identifications before you remove peaks.
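The "ppm check" fits a polynomial to the mass errors. A minimal sketch of that idea, assuming a quadratic least-squares fit (via NumPy) of ppm error against observed mass for confidently identified peaks, followed by dividing out the fitted error; the template's own polynomial degree and fitting procedure may differ, and the data are synthetic.

```python
import numpy as np


def fit_ppm_curve(observed_masses, ppm_errors, degree=2):
    """Least-squares polynomial of ppm error versus observed mass."""
    return np.polyfit(observed_masses, ppm_errors, degree)


def recalibrate(observed_masses, coeffs):
    """Remove the fitted error: ppm = (obs - true) / true * 1e6,
    so true = obs / (1 + ppm / 1e6)."""
    obs = np.asarray(observed_masses, dtype=float)
    return obs / (1 + np.polyval(coeffs, obs) / 1e6)


def large_errors(ppm_errors, limit=1.5):
    """Boolean mask of identifications whose |ppm error| exceeds the limit."""
    return np.abs(np.asarray(ppm_errors, dtype=float)) > limit


# Synthetic confident IDs: true masses with a mass-dependent 1-2 ppm drift
true = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
obs = true * (1 + (1.0 + 0.002 * true) / 1e6)
ppm = (obs - true) / true * 1e6
coeffs = fit_ppm_curve(obs, ppm)
print(recalibrate(obs, coeffs) - true)
```

Fitting only on confident identifications matters: mis-annotated peaks with large errors would otherwise pull the curve away from the true calibration drift.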


Manual Data Filtration

• Step 1 – recover false rejections
  • Go to the ‘notlikely’ sheet and check for ‘false rejections’, particularly those with a confidence of 4 (technical judgement required).
  • Cut and paste false rejections onto the ‘identification’ sheet.
• Step 2 – manual filtration
  • On the ‘Identification’ sheet, check for ‘false positives’ and move them to the ‘notlikely’ sheet by cut/paste, or with the ‘remove row’ button.
  • Press the ‘colouring’ button to make interpretation easier.
  • Press the ‘hyperlink’ button to activate web links.
  • Use the sort functions, info-boxes, graphs and hyperlinks (columns B, D, K, L, W) to assist.
• Step 3 – manual identification
  • On the ‘Identification’ sheet, check for duplicate identifications, and choose alternative isomers where appropriate.


Manual Data Filtration

• Manual filtration: suggested process
  1. Related peaks (mass difference, neutral loss)
  2. Retention time limits (min, max, % error)
  3. Adduct likelihood (2+ or Na+)
  4. Isomers (split peaks, duplicate identifications)
  5. Isotopic abundance (C13 isotope, other unique isotopes)
  6. Peak shape (check the chromatogram if codadw < 0.95)
  7. Biological likelihood (related pathways, common contaminants)
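Checking related peaks (item 1) amounts to comparing a peak's mass difference from its basepeak with known exact mass differences. A sketch using a small, illustrative table of monoisotopic differences; the template's own adduct/fragment list and tolerance are not reproduced here, and the 3 ppm window is an assumption.

```python
# Exact mass differences (Da) for a few common relationships (illustrative subset)
MASS_DIFFS = {
    "13C isotope": 1.003355,       # 13C - 12C
    "NH3 loss": 17.026549,
    "H2O loss": 18.010565,
    "Na+ vs H+ adduct": 21.981944,  # Na - H
}


def annotate_mass_diff(peak_mass, basepeak_mass, ppm_window=3.0):
    """Labels whose exact difference matches |peak - basepeak| within ppm_window."""
    diff = abs(peak_mass - basepeak_mass)
    tol = basepeak_mass * ppm_window / 1e6
    return [label for label, d in MASS_DIFFS.items() if abs(diff - d) <= tol]


# Basepeak: glucose neutral mass 180.06339; related peak after water loss
print(annotate_mass_diff(162.05283, 180.06339))
```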


Biological data analysis

• Step 1
  • If you have Exactive pos/neg data, run the ‘Combine Pos/Neg’ function after processing each mode individually.
• Step 2
  • Run the intensity comparison macro from the ‘Identification’ or ‘settings’ sheet by clicking ‘Compare All Sets’. This calculates the mean and SD for each set and compares each set to the designated ‘control’ group (relative intensity and t-test).
• Step 3
  • In the ‘Comparison’ sheet, sort the data by your column of interest:
    • Relative intensity vs control
    • P-value (t-test) vs control
    • Metabolite Map or KEGG Pathway
  • Use the buttons at the top to plot graphs or export to motif/metexplore.
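The comparison against the control group combines a ratio of means with an unpaired t-test. This sketch computes the ratio and Student's t statistic with pooled variance; it assumes an equal-variance test, which the template may or may not use. Turning t into a p-value needs the t distribution, e.g. `scipy.stats.ttest_ind(treatment, control)` in Python or Excel's `TTEST(array1, array2, 2, 2)`.

```python
from math import sqrt
from statistics import mean, stdev


def compare_to_control(treatment, control):
    """Return (ratio of means, Student's t statistic with pooled variance)."""
    n1, n2 = len(treatment), len(control)
    m1, m2 = mean(treatment), mean(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return m1 / m2, t


# Hypothetical intensities for one metabolite
ratio, t = compare_to_control([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(ratio, round(t, 3))
```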


Multivariate analysis

• Step 1
  • This template does not include functionality for multivariate analysis. Use the light blue Export button to export either ‘allBasePeaks’ or ‘Identifications’ to MetaboAnalyst, or to R/MATLAB/etc., for further analysis.
• Step 2
  • If you wish to analyse all basepeaks, run the ‘assign Basepeaks’ macro to help with annotation.
• Step 3
  • Unidentified masses can be investigated by clicking the empty ‘formula’ (C) cell; this will run FormulaGenerator in R.
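FormulaGenerator searches for elemental formulae whose exact mass matches an unknown mass. The template delegates this to R (rcdk); the plain-Python sketch below only illustrates the idea with a brute-force CHNO enumeration. The element limits and 5 ppm tolerance are hypothetical, and unlike a real generator it applies no valence or plausibility rules.

```python
# Monoisotopic masses of the most abundant isotopes (Da)
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def _fmt(counts):
    """Hill-style string, omitting zero counts and '1' subscripts."""
    return "".join(el + (str(k) if k > 1 else "") for el, k in counts if k)


def candidate_formulas(neutral_mass, ppm=5.0, max_c=30, max_n=5, max_o=10):
    """Enumerate CcHhNnOo formulas within `ppm` of a neutral mass, best first."""
    tol = neutral_mass * ppm / 1e6
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = neutral_mass - c * MASS["C"] - n * MASS["N"] - o * MASS["O"]
                h = round(rest / MASS["H"])  # only the nearest H count can fit
                if h < 0:
                    continue
                mass = c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"]
                if abs(mass - neutral_mass) <= tol:
                    formula = _fmt((("C", c), ("H", h), ("N", n), ("O", o)))
                    hits.append((formula, mass))
    return sorted(hits, key=lambda hit: abs(hit[1] - neutral_mass))


for formula, mass in candidate_formulas(180.06339)[:5]:
    print(formula, round(mass, 5))
```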


Other Features

• Additional macros:
  • Isotope Search: for untargeted metabolic labelling studies
    • C13, N15 and O18 supported
  • Combine Datasets: combines negative and positive data (from the same column)
  • Formula Generator: identifies formulae for unknown masses (uses rcdk)
    • Checks the validity of formulae against “Fiehn’s Golden Rules”
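Two of Fiehn's Golden Rules restrict element ratios relative to carbon. This sketch checks only those ratio rules, using the "common range" limits as we understand them from Kind and Fiehn's Seven Golden Rules (verify the values against the original publication before relying on them); it is not the template's FormulaValid implementation, which applies five rules.

```python
import re

# Assumed "common range" element/carbon ratio limits (rules 4 and 5)
RATIO_LIMITS = {
    "H": (0.2, 3.1),
    "N": (0.0, 1.3),
    "O": (0.0, 1.2),
    "P": (0.0, 0.3),
    "S": (0.0, 0.8),
}


def atom_counts(formula):
    """Parse a simple formula such as 'C8H10N4O2' into {element: count}."""
    counts = {}
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def passes_ratio_rules(formula):
    """True if every element/C ratio lies inside its assumed limits."""
    counts = atom_counts(formula)
    c = counts.get("C", 0)
    if c == 0:
        return False
    return all(lo <= counts.get(el, 0) / c <= hi for el, (lo, hi) in RATIO_LIMITS.items())


print(passes_ratio_rules("C8H10N4O2"), passes_ratio_rules("CH4O4"))
```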


Other Features

• Additional functions (Excel formulas):
  • FormulaMatch: looks up a mass in the database
  • ExactMass: calculates the exact mass of a formula
  • PPMcalc: calculates the mass error from a given mass or formula
  • IsotopeAbundance: calculates the theoretical isotopic abundance for a given atom in a formula
  • FormulaValid: checks formula validity against 5 Golden Rules
  • AtomCount: returns the number of specified atoms in a formula
  • Pos: calculates the positive charge at a given pH (given # cations & basic pKas)
  • Neg: calculates the negative charge at a given pH (given # anions & acidic pKas)
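Python analogues of ExactMass, PPMcalc, Pos and Neg are sketched below. These are not the template's VBA functions; the charge functions assume the Henderson-Hasselbalch relationship, treat the "# cations/anions" argument as permanently charged groups, and use hypothetical names and a small element table.

```python
import re

# Monoisotopic masses (Da) for a few common elements
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}


def exact_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C6H12O6'."""
    total = 0.0
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(n) if n else 1)
    return total


def ppm_calc(observed_mass, formula):
    """Mass error (ppm) of an observed neutral mass against a formula."""
    theo = exact_mass(formula)
    return (observed_mass - theo) / theo * 1e6


def positive_charge(pH, basic_pkas=(), n_cations=0):
    """Permanent cations plus protonated fraction of each basic group."""
    return n_cations + sum(1 / (1 + 10 ** (pH - pka)) for pka in basic_pkas)


def negative_charge(pH, acidic_pkas=(), n_anions=0):
    """Permanent anions plus deprotonated fraction of each acidic group."""
    return n_anions + sum(1 / (1 + 10 ** (pka - pH)) for pka in acidic_pkas)


print(round(exact_mass("C6H12O6"), 5), round(ppm_calc(180.06375, "C6H12O6"), 2))
print(positive_charge(7.0, [7.0]), round(negative_charge(7.4, [2.3, 9.7]), 3))
```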


FAQ

• WHERE TO START... which sheet?

All automated functions can be run from the Settings sheet.
After automated filtration and identification, you can do manual curation, including mass re-calibration, on the ‘identification’ sheet. Additional metabolites can be retrieved from the ‘notlikely’ (or ‘allBasePeaks’) sheet simply by using Excel’s cut/paste functions; it is recommended to cut/paste whole rows rather than individual cells. The easiest approach to meaningful biochemical analysis is to run the ‘Compare all’ function and sort the ‘Comparison’ sheet according to your interests. Additional columns (e.g. stats, normalised intensities, other information) can always be added to the right of the existing data without affecting macro performance.

• POLARITY:

The polarity is automatically corrected by mzMatch.R during the peak-picking process, and all masses that appear in the Template are corrected neutral masses. Ensure that you set the correct ‘polarity’ option on the ‘settings’ sheet before running anything. The polarity setting is also used when combining positive and negative mode data, and for the quick link to Xcalibur Qual Browser EICs (i.e. to decide whether to add or subtract a proton to get from the neutral mass back to m/z).

Note: Because of the automatic polarity correction by mzMatch, the masses of cations in the database have been corrected by one proton (e.g. the mass of choline in the DB is 103, rather than its actual mass of 104).
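The polarity correction above amounts to adding or subtracting one proton mass. A sketch assuming singly charged [M+H]+ and [M−H]− ions only (adducts and multiple charges are ignored); the choline figures use the exact mass of the choline cation C5H14NO+.

```python
PROTON = 1.007276467  # proton mass, Da


def neutral_to_mz(neutral_mass, polarity):
    """m/z of [M+H]+ (positive) or [M-H]- (negative) for a neutral mass."""
    if polarity == "positive":
        return neutral_mass + PROTON
    if polarity == "negative":
        return neutral_mass - PROTON
    raise ValueError("polarity must be 'positive' or 'negative'")


def mz_to_neutral(mz, polarity):
    """Inverse of neutral_to_mz: the 'corrected neutral mass' for an m/z."""
    if polarity == "positive":
        return mz - PROTON
    if polarity == "negative":
        return mz + PROTON
    raise ValueError("polarity must be 'positive' or 'negative'")


# Choline is already a cation (m/z 104.10699), so its DB entry is ~103.09971
print(round(mz_to_neutral(104.10699, "positive"), 5))
```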


FAQ

• WHICH FILE TO USE FOR THE RETENTION TIME UPDATER?

You need to manually generate a list of retention times for authentic standards under the current LC conditions. The simplest way is to use ToxID (or similar); otherwise, do it manually from the raw data.
The retention time updater has been tested on ToxID .csv output files. However, it should work for any Excel-readable file that has a column for metabolite names and a column for retention times. (Note: the metabolite name must be identical to the name in the database; the only exception is that an underscore "_" may be used in place of a comma "," to avoid issues with .csv files.)

• WHAT IF IT RUNS SLOWLY?

The peak-picking process in XCMS is quite slow; it can be left to run overnight if you have many samples. The speed of mzmatch.R functions and Excel macros depends on the number of samples, the number of detected peaks, and your computer's speed. Speed can be improved by applying tighter filters earlier in the process (e.g. peak-picking parameters and the RSD filter), although this may cause the loss of some peaks of interest.
Visualisation of results in Excel can be slow if there are many active formulas. Try turning automatic calculation off, de-activating hyperlinks, or running the ‘Trim file size’ macro.
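For the retention time updater, names must match the database exactly, except that "_" may stand in for ",". A minimal sketch of that matching rule, assuming an exact match is tried first and the underscore-to-comma substitution only as a fallback; the function names and example data are hypothetical.

```python
def match_standards(standards, db_names):
    """Map DB names to RTs from (name, rt) pairs; return (matched, unmatched).

    Names must match exactly, except that '_' may replace ',' (to survive
    .csv export). The exact name is tried first, then the substituted one.
    """
    db = set(db_names)
    matched, unmatched = {}, []
    for name, rt in standards:
        if name in db:
            matched[name] = rt
        elif name.replace("_", ",") in db:
            matched[name.replace("_", ",")] = rt
        else:
            unmatched.append(name)
    return matched, unmatched


db_names = ["1,3-diaminopropane", "L-alanine"]
standards = [("1_3-diaminopropane", 5.2), ("L-alanine", 3.1), ("alanine", 3.1)]
print(match_standards(standards, db_names))
```

Reporting the unmatched names makes it easy to spot standards whose spelling differs from the database.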


Any Further Questions/Ideas

mzMatch information available at:

Mzmatch.sourceforge.net

Xcms information available at:

metlin.scripps.edu/xcms/

Information about this mzMatch template available directly from:

Dr Darren Creek

University of Glasgow

[email protected]

[email protected]