Welcome! Mass Spectrometry meets ChemInformatics WCMC Metabolomics Course 2013 Tobias Kind


Page 1

Welcome!

Mass Spectrometry meets ChemInformatics
WCMC Metabolomics Course 2013

Tobias Kind

Course 1: General Introduction

http://fiehnlab.ucdavis.edu/staff/kind
CC-BY License

Page 2

What is ChemInformatics?

Chemistry

Statistics

Informatics

Mathematics

Chemometrics est. 1975
Cheminformatics est. 1998

Page 3

Who uses Cheminformatics?

All parts of chemistry heavily depend on cheminformatics.
Life sciences, biochemistry and the drug industry use cheminformatics.

20 years ago: 80% in lab – 20% in front of computer
Now: 20% in lab – 70% in front of computer (*)

Examples:

• Organic chemistry – automated reaction planning, Beilstein search
• Physical chemistry – modeling of structure properties (boiling points)
• Inorganic chemistry – ligand bond interactions
• Analytical chemistry – structure elucidation of small compounds
• Biochemistry – protein/small molecule interaction networks

(*) 10% fixing and installing new programs

Page 4

Motivation for Mass Spectrometry meets ChemInformatics

To be a master of spectra you need to be a master of structures in the first place.

[Figure: mass spectrum (nist_msms) and structure of vincristine; m/z axis 260-810, intensity axis 0-100]

Complex MS data interpretations are only possible with software
MS data are obtained by hyphenated techniques (GC-MS, LC-MS)
Mass spectral database search and structure search are routinely used
Mass spectrometers deliver multidimensional data

Page 5

Computer Illiteracy – a threat to your research

Your computer is your friend.
You don’t have a computer? You don’t have a friend (just kidding).

• Assume you have a computer: please step forward and name its CPU, speed, memory, hard disk, OS

• You are a chemist, biochemist, biologist: please step forward and name a computer language or DB you know

OS = operating system; DB = database, CPU = central processing unit

PDP-11 www.bell-labs.com

Page 6

Fighting Computer Illiteracy - name your PC

CPU: INTEL, AMD, IBM, HP; Pentium, Opteron, Xeon; 12-20 cores

Memory: DDR, DDR2; GEIL, KINGSTON; 16-128 GByte

Hard disk: SEAGATE, WD; Raptor, Barracuda, Cheetah; 100-1000 GByte

OS: MICROSOFT, LINUX; Windows, Linux, OSX, Virtual OS

Language: C, Basic, Perl, JAVA

Bit < Byte < kByte < MByte < GByte < TByte

Single Core < Dual Core < QuadCore < MultiCore

MFLOP/s < GFLOP/s < TFLOP/s < PFLOP/s

1 Thread < Dual Thread < MultiThreaded

Image: Cray-2 in red, Nixdorf Museum, 2004
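A first step towards "naming your PC" can be scripted. The following is a minimal sketch in Python (the language and the function name `describe_machine` are choices made for this illustration, not part of the course material). It uses only the standard library, which does not report installed memory in a portable way, so that item is left out.

```python
import os
import platform
import shutil


def describe_machine(path="."):
    """Collect basic facts about the computer this script runs on."""
    usage = shutil.disk_usage(path)  # total/used/free bytes of the disk holding `path`
    return {
        "os": f"{platform.system()} {platform.release()}",
        # processor() can be empty on some systems, so fall back to the architecture
        "cpu": platform.processor() or platform.machine(),
        "logical_cores": os.cpu_count(),
        "disk_total_gbyte": round(usage.total / 1000**3, 1),
        "disk_free_gbyte": round(usage.free / 1000**3, 1),
    }


if __name__ == "__main__":
    for key, value in describe_machine().items():
        print(f"{key}: {value}")
```

The disk sizes are reported in decimal GByte (10^9 bytes), the unit disk vendors use.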

Page 7

The free lunch is over – multithreading needed

Herb Sutter (MS): http://www.gotw.ca/publications/concurrency-ddj.htm

NO YES

Can your metabolomics software use multiple CPUs?

Page 8

The free lunch is over – multithreading needed

Herb Sutter (MS): http://www.gotw.ca/publications/concurrency-ddj.htm

Course example: MZMINE alignment (7 files, 18 min LC-MS), single core vs. multi-core

50 seconds

3:29 minutes

Mors certa, hora incerta! (Death is certain, its hour is uncertain.)
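What "using multiple CPUs" looks like in code can be sketched with Python's standard library. This is an illustration only, not MZMINE code: `process_file` is a made-up stand-in for one CPU-heavy step, and the file names are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import math


def process_file(name):
    """Stand-in for one CPU-heavy processing step (hypothetical workload)."""
    total = 0.0
    for i in range(1, 200_000):
        total += math.sqrt(i)
    return name, round(total, 3)


def process_all(names, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Run process_file on every name, spreading the work over several workers."""
    with executor_cls(max_workers=max_workers) as pool:
        # map() returns the results in the same order as the input names
        return list(pool.map(process_file, names))


if __name__ == "__main__":
    files = [f"sample_{i}.mzXML" for i in range(7)]  # made-up file names
    for name, value in process_all(files):
        print(name, value)
```

Worker processes are the default here because in standard CPython threads do not run pure-Python computations in parallel; passing `executor_cls=ThreadPoolExecutor` is the lighter option when the work is mostly waiting for disk or network.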

Page 9

Best recommendation ever for slow computers: install an SSD!

Single hard disk: Seagate 750 GB

SSD RAID10: Samsung 830 (2 TB)

RAMDISK: OSFMount

SSDs and RAM disks have 200- to 1000-fold(!) higher 4k speed (random access in 4 kByte blocks) than a single hard disk. 4k speed matters.

Page 10

Computer Illiteracy – learn a programming language

Why should you?

20% lab time – 80% computer time
Mass spectrometers deliver data – not results

Why shouldn't you? (fake reasons)

You are too old to learn…
You are not good with computers…
You have more important research to do…
You are so rich you have programmers who work for you…

Picture Source: WIKI James Manners from Genova, Italia

Page 11

Computer Illiteracy – learn a programming language

• Learn any language which has a large code and user base (JAVA, Perl, Visual Basic)
• Use IDEs with automatic code completion like MS Visual Express or Eclipse
• Don’t re-invent code - use (and document) code search engines like http://code.ohloh.net/ (formerly koders, now ohloh), google.com/codesearch, http://krugle.org

moOMoOMoOMoOMoOmoOMoOMoOMoOMoOMoOMoOMoOMo

OMoOMoOMMMmoOMMMMoOMoOMoOMoOMoOMoO MoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMMMmoOMMMommMoOMoOMoOMoOMoO MoOMoOMoOMoOMoOMoOMoOMMMmoOMMMMoOMoOMMMmoOMMMMoOMoOMoOMoOMoOMoOMoOMoOMoOMoO MoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoO

Language “cow” (above), language “brainfuck” (below)

Do *not* learn these working but esoteric languages.
There are 1123 programming languages (http://99-bottles-of-beer.net/)

>>++++++++[<++++>-] >++++++++++++++[<+++++++>-] +>+++++++++++[<++++++++++>-] ++>+++++++++++++++++++[<++++++>-] ++>+++++++++++++++++++[<++++++>-] >++++++++++++[<+++++++++>-]

Page 12

Program development – Eclipse for JAVA example

[Screenshot: Eclipse IDE with labeled areas for projects, JAVA or C code, and text output]

Page 13

Your computer illiteracy – your emergency helpers

Regular expressions, SQL database requests, EXCEL VBA scripts and Perl scripts are special tools for data handling (Swiss army knives).

Regular expressions (RegEx) are used for finding and replacing text:

[0-9] – represents any digit
[a-z] – represents any lowercase letter
\n – represents a new line (CR/LF)
\t – represents TAB

Examples:
\n\n – find two consecutive line breaks (an empty line)
find \t and replace with spaces “ “
([0-9][0-9]) – find two digits in brackets
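The same patterns can be tried outside a text editor. Below is a minimal sketch using Python's `re` module; the sample text is made up for illustration. One difference from some editors is worth noting: in Python, literal brackets must be escaped as `\(` and `\)`, because unescaped brackets form a group.

```python
import re

# Made-up sample text with a TAB, an empty line and numbers in brackets
text = "sample\tA (12)\n\nsample\tB (7)\n"

# [a-z] matches one lowercase letter; '+' repeats it to catch whole words
words = re.findall(r"[a-z]+", text)

# \n\n matches two consecutive line breaks, i.e. an empty line
no_blank_lines = re.sub(r"\n\n", "\n", text)

# \t matches a TAB; replace each one with a space
no_tabs = re.sub(r"\t", " ", text)

# two digits in brackets (brackets escaped for Python)
in_brackets = re.findall(r"\([0-9][0-9]\)", text)

print(words)
print(repr(no_blank_lines))
print(repr(no_tabs))
print(in_brackets)
```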

Large database table:

yr    subject    winner
1901  Chemistry  Jacobus H. van 't Hoff
1902  Chemistry  Emil Fischer
1903  Chemistry  Svante Arrhenius
1904  Chemistry  Sir William Ramsay
1905  Chemistry  Adolf von Baeyer
1906  Chemistry  Henri Moissan
1907  Chemistry  Eduard Buchner
1908  Chemistry  Ernest Rutherford
1909  Chemistry  Wilhelm Ostwald
1910  Chemistry  Otto Wallach
1913  …

SQL query:

SELECT yr, subject, winner FROM nobel WHERE yr = 1909 AND subject = 'Chemistry'

Result:

yr    subject    winner
1909  Chemistry  Wilhelm Ostwald
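The query can be tried locally without installing a database server. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; only four rows of the table are loaded, and the choice of SQLite is an assumption made for this illustration. SQLite compares text case-sensitively by default, so the subject is spelled exactly as stored.

```python
import sqlite3

rows = [
    (1907, "Chemistry", "Eduard Buchner"),
    (1908, "Chemistry", "Ernest Rutherford"),
    (1909, "Chemistry", "Wilhelm Ostwald"),
    (1910, "Chemistry", "Otto Wallach"),
]

con = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
con.execute("CREATE TABLE nobel (yr INTEGER, subject TEXT, winner TEXT)")
con.executemany("INSERT INTO nobel VALUES (?, ?, ?)", rows)

# The '?' placeholders keep the values separate from the SQL text
query = "SELECT yr, subject, winner FROM nobel WHERE yr = ? AND subject = ?"
result = con.execute(query, (1909, "Chemistry")).fetchall()
con.close()

print(result)
```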

Visit the SQL Zoo

SQL is used for programming databases

Learn about RegEx

Page 14

Regular Expressions – example MS data

Task: create a list of 4 columns with names, formulas, CAS numbers and peaks
Problem: 24,000 lines of mass spectral data (*.msp)
Program: Textpad (WIN), Smultron (Mac)

[Figure: *.msp file in the text editor, annotated with the number of lines in the text, the (m/z - intensity) pairs, and Enter (CR/LF) shown in gray; inset: mass spectrum (mainlib) and structure of 2,5-Pyrrolidinedione, 1-methyl-3-phenyl-]

Page 15

Regular Expressions – example MS data

Solution: replace Enter (\n) with TAB (\t) and use Replace ALL

Result: metadata in one line

[Screenshot with steps marked 1 to 3]

Page 16

Regular Expressions – example MS data

Solution: copy only lines of interest (Mark ALL – Copy Bookmarked Lines)

Page 17

Regular Expressions – Result for MS data

Solution: replace redundant code with nothing, copy tab-separated file to EXCEL

Result: 1:30 min for RegEx job (1 hour manually?)

Average spectrum size: 70 peaks
Minimum size: 5 peaks
Maximum size: 439 peaks
Most spectra have between 35 and 45 peaks
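The editor workflow above can also be written as a small script. The sketch below is one possible way to do it in Python, not the procedure used in the course. The two records are made up, and the field labels (`Name:`, `Formula:`, `CAS#:`, `Num Peaks:`) are an assumption about the *.msp layout that should be checked against the actual file.

```python
import re

# Made-up two-record sample in an assumed MSP-like layout
MSP_TEXT = """Name: Compound A
Formula: C2H6O
CAS#: 00-00-0
Num Peaks: 3
15 120; 31 999; 46 250;

Name: Compound B
Formula: C3H8O
CAS#: 11-11-1
Num Peaks: 2
45 999; 60 15;
"""

# One pattern per record: (?s) lets '.' cross line breaks and the
# non-greedy '.*?' skips the text between the fields of interest
RECORD = re.compile(
    r"(?s)Name: ([^\n]*)\n.*?Formula: ([^\n]*)\n.*?CAS#: ([^\n;]*).*?Num Peaks: ([0-9]+)"
)


def msp_to_rows(text):
    """Return (name, formula, CAS number, number of peaks) for each record."""
    return [
        (name.strip(), formula.strip(), cas.strip(), int(peaks))
        for name, formula, cas, peaks in RECORD.findall(text)
    ]


rows = msp_to_rows(MSP_TEXT)
for row in rows:
    print("\t".join(str(field) for field in row))  # tab-separated, ready for a spreadsheet

peak_counts = [row[3] for row in rows]
print("min:", min(peak_counts), "max:", max(peak_counts),
      "average:", sum(peak_counts) / len(peak_counts))
```

The pattern assumes every record contains all four fields in this order; a record with a missing field would be merged with the next one, so real data needs a check of the row count.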

Page 18

Be prepared – visualize your structures

Try Marvin Space via Webstart

Page 19

Calculation of tetrahedral and double bond stereoisomers

How many stereoisomers can you expect from glucose (KEGG)?
Example: separation of species with ion mobility MS (FAIMS)

Example calculated with MarvinView (via JAVA Webstart)

[Structure: glucose]
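A quick upper bound can be estimated before opening any software: n independent stereo elements (tetrahedral centers plus stereo double bonds) allow at most 2^n stereoisomers, and molecular symmetry can lower the real count. The sketch below assumes 4 stereocenters for open-chain glucose and 5 for the ring (pyranose) form; the function name is made up for this illustration, and it does not reproduce the MarvinView calculation.

```python
def max_stereoisomers(stereocenters, stereo_double_bonds=0):
    """Upper bound 2**n on the stereoisomer count for n independent stereo elements."""
    return 2 ** (stereocenters + stereo_double_bonds)


# Assumed counts: 4 stereocenters in open-chain glucose, 5 in the pyranose ring form
print(max_stereoisomers(4))  # 16
print(max_stereoisomers(5))  # 32
```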

Page 20

Computation of resonance forms (electron shifts)

What are possible resonance structures?
Important for mass spectral interpretation (electron impact, electrospray)

[Structure: phenol]

Example calculated with MarvinView (start via WebStart)

Page 21

Generation of tautomers using MSketch

How many tautomers can you expect?
Important for mass spectral interpretations and LC-MS.

[Structure: methyl acetate]

Example calculated with MarvinView (start via WebStart)

Derivatization in GC-MS and LC-MS solves the tautomer problem
Common tautomerisms: Enol/Keto, Lactams, Amines/Imines, Amides/Imides

Page 22

Property calculations on chemicalize.org

Page 23

Mass spectral database search – know what exists

How many mass spectra with formula C11H8O3 in NIST DB?

Result: 19 for C11H8O3 in NIST05 DB

Download NIST-MS-Search

Page 24

Mass spectral interpretation

Assign structural elements to mass spectral peaks

Download Mass Spectrum Interpreter Version 2

Page 25

Mass Spec Scissors (ACDLabs Free)

http://www.hmdb.ca/metabolites/HMDB09837

Q: What is peak m/z 281 in negative mode?

Page 26

Molecular Weight Calculator

[Figure: calculated isotopic pattern; m/z axis 522-532, intensity axis 0-100]

Calculate isotopic masses
Find formulas from masses
Calculate isotopic patterns

Download MWTWIN
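The first of these calculations is simple enough to sketch by hand. The Python example below sums monoisotopic masses for a plain formula such as C11H8O3. It is an illustration, not the algorithm of the Molecular Weight Calculator: the element table is deliberately short, the masses are rounded, and brackets, charges and isotope labels are not handled.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes, rounded
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.0078250,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
    "S": 31.9720707,
}


def parse_formula(formula):
    """Turn 'C11H8O3' into {'C': 11, 'H': 8, 'O': 3} (no brackets or charges)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)([0-9]*)", formula):
        # a missing number means one atom, e.g. the C in CH4
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms; unknown elements raise KeyError."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


print(round(monoisotopic_mass("C11H8O3"), 4))  # 188.0473
```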

Page 27

Structure search – know what could be possible

How many compounds (isomer structures) are found in public databases?

Result: 272 for C11H8O3

http://www.chemspider.com/

Page 28

Stay tuned – new mass spectrometry publications via Yahoo Pipes

Page 29

Be open minded – NMR can do some things better

ChenomX Profiler – with 312 pH and frequency tuned reference spectra

2D-NMR needed for de-novo structure elucidation
NMR metabolic profiling is highly reproducible with low variance

Page 30

NMR prediction with ChemAxon MSketch

Page 31

The Last Page - What is important to remember:

Learn about CPU type, memory, hard disks, bits and bytes;
shock your colleagues with random questions about their computer

Think about automation, things you would like to do (even if you can’t);
shock your colleagues with a small computer script

Use regular expressions for stupid or boring jobs;
if you delete/replace data more than 3x, remember RegEx, RegEx, RegEx

Use scripting languages for small problems (EXCEL VBA, PERL);
steal some small examples and color your EXCEL data in rainbow colors

Build yourself a collection of programs and databases for MS;
try such programs in a Virtual Machine without messing up your system