Welcome! Mass Spectrometry meets ChemInformatics WCMC Metabolomics Course 2013 Tobias Kind

Post on 04-Feb-2016

53 views 0 download

Tags:

description

Chemistry. Biology. Welcome! Mass Spectrometry meets ChemInformatics WCMC Metabolomics Course 2013 Tobias Kind Course 1: General Introduction. Informatics. http://fiehnlab.ucdavis.edu/staff/kind. CC-BY License. What is ChemInformatics?. Chemometrics est. 1975 Cheminformatics est. 1998. - PowerPoint PPT Presentation

Transcript of Welcome! Mass Spectrometry meets ChemInformatics WCMC Metabolomics Course 2013 Tobias Kind

1

Welcome!

Mass Spectrometry meets ChemInformaticsWCMC Metabolomics Course 2013

Tobias Kind

Course 1: General Introduction

http://fiehnlab.ucdavis.edu/staff/kindCC-BY License

2

What is ChemInformatics?

Chemistry

Statistics

Informatics

Mathematics

Chemometrics est. 1975Cheminformatics est. 1998

3

Who uses Cheminformatics?All parts of chemistry heavily depend on cheminformatics.Life sciences, biochemistry, drug industries use cheminformatics.

20 years ago: 80% in lab – 20% in front of computerNow: 20% in lab - 70% in front of computer (*)

Examples:

• Organic chemistry – automated reaction planning, Beilstein search• Physical chemistry – modeling of structure properties (boiling points)• Inorganic chemistry – ligand bond interactions• Analytical chemistry – structure elucidation of small compounds• Biochemistry – protein/small molecule interaction networks

PhD(*) 10% fixing and installing new programs

4

Motivation for Mass Spectrometry meets ChemInformatics

To be a master of spectra you need to be a master of structures in the first place.

(nist_m sm s) V inc ristine260 310 360 410 460 510 560 610 660 710 760 810

0

50

100

265 353 395 455 513 538604

636

676

705

723

747

765

807

NHO

O

NO H

HOON

OO

N

O

O

O

Complex MS data interpretations only possible with software MS data obtained by hyphenated techniques (GC-MS, LC-MS) Mass spectral database search and structure search routinely are used Mass spectrometers deliver multidimensional data

5

Computer Illiteracy – a threat to your researchYour computer is your friendYou don’t have a computer? You don’t have a friend (just kidding)

• Assume you have a computer:Please step forward name: CPU, speed, memory, hard disk, OS

• You are a chemist, biochemist, biologist: Please step forward name: Computer language or DB you know

OS = operating system; DB = database, CPU = central processing unit

PDP-11 www.bell-labs.com

6

Fighting Computer Illiteracy - name your PC

CPU INTEL,AMD,IBM,HP Pentium, Opteron, Xeon 12-20 Core

Memory DDR, DDR2 GEIL, KINGSTON 16-128 GByte

Hard disk SEAGATE, WD Raptor, Barracuda, Cheetah100-1000 GByte

OS MICROSOFT, LINUX Windows, Linux, OSX, Virtual OS

Language C, Basic, Perl, JAVA

Bit < Byte < kByte < MByte < GByte < TByte

Single Core < Dual Core < QuadCore < MultiCore

MFLOP/s < GFLOP/s < TFLOP/s < PFLOP/s

1 Thread < Dual Thread < MultiThreaded

Cray 2 in rot, Nixdorfmuseum, 2004,

7

The free lunch is over – multithreading needed

Herb Sutter (MS): http://www.gotw.ca/publications/concurrency-ddj.htm

NO YES

Can your metabolomics software use multiple CPUs?

8

The free lunch is over – multithreading needed

Herb Sutter (MS): http://www.gotw.ca/publications/concurrency-ddj.htm

Course example MZMINE alignment (7 files -18 min LC-MS) Single core vs. multi-core

50 seconds

3:29 minutes

Mors certa, hora incerta!

9

Best recommendation ever for slow computersInstall an SSD!

Single hard diskSeagate 750 GB

SSD RAID10Samsung 830 (2 TB)

RAMDISKOSFMount

SSDs and Ramdisks have 200 to 1000-fold(!) 4k speed. 4k speed matters.

10

Computer Illiteracy – learn a programming language

Why should you?

20% lab time – 80% computer timeMass spectrometers deliver data – not results

Why shouldn't you? (fake reasons)

You are too old to learn…You are not good with computers…Your have more important research to do…You are so rich you have programmers who work for you…

Picture Source: WIKI James Manners from Genova, Italia

11

Computer Illiteracy – learn a programming language

• Learn any language which has a large code and user base (JAVA, Perl, Visual Basic)• Use IDEs with automatic code completion like MS Visual Express or Eclipse• Don’t re-invent code - use (and document) code search engines like

http://code.ohloh.net/ (formerly koders now ohloh); google.com/codesearchhttp://krugle.orgmoOMoOMoOMoOMoOmoOMoOMoOMoOMoOMoOMoOMoOMo

OMoOMoOMMMmoOMMMMoOMoOMoOMoOMoOMoO MoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMMMmoOMMMommMoOMoOMoOMoOMoO MoOMoOMoOMoOMoOMoOMoOMMMmoOMMMMoOMoOMMMmoOMMMMoOMoOMoOMoOMoOMoOMoOMoOMoOMoO MoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoOMoO

Language “cow” Language “brainfuck”

Do *not* learn these working but esoteric languagesThere are 1123 programming languages http://99-bottles-of-beer.net/

>>++++++++[<++++>-] >++++++++++++++[<+++++++>-] +>+++++++++++[<++++++++++>-] ++>+++++++++++++++++++[<++++++>-] ++>+++++++++++++++++++[<++++++>-] >++++++++++++[<+++++++++>-]

12

Program development – Eclipse for JAVA example

Projects

JAVA or C code

Text output

13

Your computer Illiteracy – your emergency helpersRegular expressions; SQL database requests; EXCEL VBA scripts or Perl scripts are special tools for data handling (Swiss army knifes) Regular expressions (RegEx) are used for finding and replacing text

[0-9] – represents all numbers Examples: \n\n – find double empty lines[a-z] – represents all small letters find \t replace with spaces “ “\n – represents new line (CR/LF) find two numbers in brackets ([0-9][0-9])\t – represents TAB

yr subject winner1901 Chemistry

Jacobus H. van 't Hoff

1902 Chemistry Emil

Fischer1903 Chemistry

Svante Arrhenius

1904 Chemistry Sir William

Ramsay1905 Chemistry

Adolf von Baeyer

1906 Chemistry Henri

Moissan1907 Chemistry

Eduard Buchner

1908 Chemistry Ernest

Rutherford1909 Chemistry

Wilhelm Ostwald

1910 Chemistry Otto

Wallach1913…

SELECT yr, subject, winner FROM nobel WHERE yr = 1909 and subject = 'chemistry'

yr subject winner1909 Chemistry Wilhelm Ostwald

Large Database Table SQL query Result

Visit the SQL Zoo

SQL is used for programming databases

Learn about RegEx

14

Regular Expressions – example MS dataTask: create a list of 4 columns with names, formulas, CAS numbers and peaksProblem: 24,000 lines of mass spectral data (*.msp)Program: Textpad (WIN), Smultron (Mac)

Number of lines in text

(mainlib ) 2,5-P yrro lid ined ione, 1-methyl-3-phenyl-10 30 50 70 90 110 130 150 170 190

0

50

100

14 28 39 51 6378

89

104

117 131 160

189

ON

O

(m/z - intensity pair)

Enter (CR/LF) in gray

15

Regular Expressions – example MS data Solution: replace Enter (\n) with TAB (\t) and use Replace ALL

Result: Metadata in one line

1

2

3

16

Regular Expressions – example MS data Solution: copy only lines of interest (Mark ALL – Copy Bookmarked Lines)

17

Regular Expressions – Result for MS data Solution: Replace redundant code with nothing, copy tab separated file to EXCEL

Result: 1:30 min for RegEx job(1 hour manually?)

Average spectrum size: 70 peaksMinimum size: 5 peaksMaximum size: 439 peaksMost spectra have 35 and 45 peaks

18Try Marvin Space via Webstart

Be prepared – visualize your structures

19

Calculation of tetrahedral and double bond stereoisomersHow many stereoisomers can you expect from glucose (KEGG)?Example: separation of species with ion mobility MS (FAIMS)

Example calculated with MarvinView (via JAVA Webstart)

O

HO

HO

OH

OH

OH

Glucose

20

Computation of resonance forms (electron shifts)What are possible resonant structures?Important for mass spectral interpretation (electron impact, electrospray)

OH

Phenol

Example calculated with MarvinView Start via WebStart

21

Generation of tautomers using MSketchHow many tautomers can you expect?Important for mass spectral interpretations and LC-MS.

H3C O

O

CH3

Methyl acetate

Example calculated with MarvinView Start via WebStart

Derivatization in GC-MS and LC-MS solves the tautomer problemCommon tautomerisms: Enol/Keto, Lactams, Amines/Imines, Amides/Imides

22

Property calculations on chemicalize.org

23

Mass spectral database search – know what existsHow many mass spectra with formula C11H8O3 in NIST DB?

Result: 19 for C11H8O3 in NIST05 DBDownload NIST-MS-Search

24

Mass spectral interpretationAssign structural elements to mass spectral peaks

Download Mass Spectrum Interpreter Version 2

25http://www.hmdb.ca/metabolites/HMDB09837

Mass Spec Scissors (ACDLabs Free)Q: What is peak m/z 281 in negative mode?

26

Molecular Weight Calculator

522.00 524.00 526.00 528.00 530.00 532.000.0

20.0

40.0

60.0

80.0

100.0

Calculate isotopic massesFind formulas from massesCalculate isotopic patterns

Download MWTWIN

27

Structure search – know what could be possibleHow many compounds (isomer structures) are found in public databases?

Result:272 for C11H8O3

http://www.chemspider.com/

28

Stay tuned – new mass spectrometry publicationsvia Yahoo Pipes

[LINK][RSS]

29

Be open minded – NMR can do some things better

ChenomX Profiler – with 312 pH and frequency tuned reference spectra

2D-NMR needed for de-novo structure elucidationNMR metabolic profiling is highly reproducible with low variance

30

NMR prediction with ChemAxon Msketch

31

The Last Page - What is important to remember:

Learn about CPU type, memory, hard disks, bits and bytes; shock you colleagues with random questions about their computer

Think about automation, thinks you would like to do (even if you can’t) shock you colleagues with a small computer script

Use regular expressions for stupid or boring jobs you delete/replace data more than 3x - remember RegEx, RegEx, Regex

Use scripting languages for small problems (EXCEL VBA, PERL) steal some small examples and color your EXCEL data in rainbow color

Generate yourself a collection of programs and databases for MS try such programs in a Virtual Machine without messing up your system