Computational analysis of metagenomic data: delineation of compositional features and screens for desirable enzymes

Konrad U. Förstner, Bork Group, EMBL. Doctoral colloquium (Promotionskolloquium), 4 February 2009

PhD defense (Promotionsverteidigung), 4 February 2009, Würzburg, Germany

Page 1: Computational analysis of metagenomic data: delineation of compositional features and screens for desirable enzymes

Intro GC Protein detection Nitrilases NHases PKS THM

Computational analysis of metagenomic data: delineation of compositional features and screens for desirable enzymes

Konrad U. Förstner

Bork Group, EMBL

Doctoral colloquium (Promotionskolloquium)

4 February 2009

Page 2

Table of contents

1 Introduction

2 GC content analysis

3 Protein detection workflow

4 Nitrilases

5 Nitrile hydratases

6 Polyketide synthases I

7 Take home messages

Page 4

For the microbial ecologist, what can be cultured is the basis of his conception of what exists. This is exactly like learning about animals from visiting zoos.

Carl Woese

Page 9

Great plate count anomaly

Less than 1% of the microbes can be cultured under standard conditions.

Page 10

Metagenomics = culture-independent approaches

Page 11

Workflow of metagenomics sequencing

Page 12

Selected metagenomic data sets

Page 13

Challenges

Usually a low coverage

Dominant species

Short sequences

Data size

⇒ storage/memory/CPU intensive

⇒ software not developed for that

No standard protocols

⇒ hard to compare
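Why low coverage and dominant species go together can be illustrated with a small calculation. The sketch below is not from the talk: it assumes that a species receives a share of the reads equal to its relative abundance and uses the Lander-Waterman approximation for the fraction of a genome hit by at least one read. All numbers (read count, read length, genome size, abundances) are hypothetical.

```python
import math


def expected_coverage(n_reads, read_length, genome_size, abundance=1.0):
    """Mean per-base coverage of one genome.

    Assumption: the genome receives a share of all reads equal to its
    relative abundance in the sample.
    """
    return n_reads * abundance * read_length / genome_size


def fraction_covered(coverage):
    """Lander-Waterman estimate of the fraction of bases hit by >= 1 read."""
    return 1.0 - math.exp(-coverage)


# Hypothetical sample, chosen only for illustration.
n_reads = 1_000_000
read_length = 800
genome_size = 4_000_000

dominant = expected_coverage(n_reads, read_length, genome_size, abundance=0.20)
rare = expected_coverage(n_reads, read_length, genome_size, abundance=0.001)

print(f"dominant species: {dominant:.1f}x, {fraction_covered(dominant):.0%} covered")
print(f"rare species:     {rare:.1f}x, {fraction_covered(rare):.0%} covered")
```

Under these assumptions the dominant species is sequenced many times over while most of the rare genome is never seen, which is one reason assembly of metagenomic data is difficult.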

Page 15

GC analysis

GC content = percentage of guanine-cytosine base pairs in the DNA/RNA

Influences, among other things:

Melting temperature of DNA/RNA

Codon usage
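The definition above can be written as a short function. This is a minimal sketch, not the code used in the thesis: the function name `gc_content`, the choice to ignore ambiguous bases such as `N`, and the example reads are assumptions made for illustration.

```python
def gc_content(sequence):
    """Return the GC percentage of a nucleotide sequence.

    Ambiguous symbols (e.g. N) are ignored, so the percentage refers only
    to positions that are clearly A, C, G, T or U.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Hypothetical reads, for illustration only.
reads = {"read_1": "ATGCGCGC", "read_2": "ATATATGC", "read_3": "GGCCNNAT"}
for name, seq in reads.items():
    print(name, round(gc_content(seq), 1))
```

Applied per read, a function like this yields a GC distribution for each sample, which is the kind of quantity compared between environments on the following slides.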

Page 16

GC analysis - huge difference between soil and ocean water

Foerstner et al., 2005

Page 17

GC analysis - further data confirm this difference

Raes et al., 2007

Page 18

GC analysis - possible influencing factors

Nitrogen availability

Genome size

Ultraviolet light exposure and repair mechanism

Codon usage of pioneers

Page 20

Metagenomic data sets as resources for biotech enzymes

Many microbial enzymes are essential tools in, e.g., the chemical, pharma and food industries

Searching in metagenomic data sets might reveal new potent members of known enzyme classes

Page 21

Protein detection and classification workflow

Page 22

Nitrilases

Nitrile + water → carboxylic acid + ammonia

One protein

Application in the chemical industry

Stereo- and regio-specific conversion of nitriles

Page 23

Nitrilases - new members and subfamilies found

Raes et al., 2007

Page 25

NHases

Nitrile hydratases (NHases)

Nitrile + water → amide

Two domains

Application in the chemical industry

Acrylamide >30,000 tons/year

Nicotinamide >3,500 tons/year

Waste water treatment

Page 26

NHases - tree of the α domain

Foerstner et al., 2008

Page 27

NHases - taxonomy of Monosiga brevicollis

Page 28

NHases - in Monosiga brevicollis

Foerstner et al., 2008

Page 30

PKS I

Polyketide synthases (PKS) create a heterogeneous group of secondary metabolites

The synthesis is similar to the fatty acid synthesis

Multiple domains

We focused on polyketide synthases type I (PKS I)

Page 31

PKS I - polyketide synthesis steps

The picture on this slide was removed due to copyright restrictions.

Jenke-Kodama et al., 2005

Page 32

PKS I - examples of polyketides

Erythromycin Oleandomycin Aflatoxin B1

Page 33

PKS I - tree of the AT domain HMM hits

Foerstner et al., 2008
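The slides do not say which software produced the AT domain HMM hits. As a rough sketch, assuming the hits come from HMMER's `hmmsearch` run with the `--tblout` option, the following parses that whitespace-separated table and keeps hits at or below an E-value cutoff. The cutoff `1e-5`, the model name `AT_domain`, and the read names in the sample table are hypothetical.

```python
def parse_tblout(text, max_evalue=1e-5):
    """Return (target, query, evalue, score) tuples for hits passing the cutoff.

    Assumed column order of `hmmsearch --tblout`: target name, accession,
    query name, accession, full-sequence E-value, score, bias, ...
    """
    hits = []
    for line in text.splitlines():
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 18)
        target, query = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, query, evalue, score))
    return hits


# Made-up table in the assumed format, for illustration only.
sample = "\n".join([
    "# target name  accession  query name  accession  E-value  score  bias",
    "read_0001 - AT_domain - 1.2e-30 105.3 0.1 2.0e-30 104.6 0.1 1.0 1 0 0 1 1 1 1 -",
    "read_0002 - AT_domain - 0.5 8.2 0.0 0.7 7.8 0.0 1.0 1 0 0 1 1 1 0 -",
])

hits = parse_tblout(sample)
for target, query, evalue, score in hits:
    print(target, query, evalue, score)
```

The sequences behind the retained hits would then be aligned and used to build a tree such as the one on this slide; those steps are not shown here.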

Page 34

PKS I - tree overview

Foerstner et al., 2008

Page 35

PKS I - hit distribution

Foerstner et al., 2008

Page 36

PKS I - PKSs per genome

Foerstner et al., 2008

Page 38

Take home messages

Metagenomics ...

... might help us to explore the complete microbial world

... still has many technical challenges

... can reveal the environmental influence on genomic features

... can help discover new enzymes

Page 39

Acknowledgements

Peer Bork

Thomas Dandekar

Lars Steinmetz

Toby Gibson

The whole Bork group, esp. Jeroen Raes and Takuji Yamada

Christian von Mering

Melly

My friends and family

Page 40

Image sources/attribution - part 1/2

Orangutan Houston Zoo http://flickr.com/photos/billtex48/2178056762/ by (Bill and Mavis) -B&M

Opel Zoo 07.07.2007 http://flickr.com/photos/lamberty/754218458 by frijolito75

Giraffe http://flickr.com/photos/abelle/280246250/ by A.Bell

Snuggling http://flickr.com/photos/buckwoo/2421562192/ by Ken W!

Delicious Dead Bee and Hungry Ants http://flickr.com/photos/hamed/176176998/ by Hamed Saber

hundreds of fish swarm a soft coral head http://flickr.com/photos/g-na/370131126/ by g-na

hunt is on http://flickr.com/photos/doug88888/2930690305/ by doug88888

Long-billed Curlew http://flickr.com/photos/mikebaird/3011987508/ by mikebaird

145ps 01087.jpg http://flickr.com/photos/ricephotos/2679758872/ by IRRI Images

Polymicrobic biofilm epifluorescence http://commons.wikimedia.org/wiki/File:Polymicrobic_biofilm_epifluorescence.jpg

The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, et al. PLoS Biology Vol. 5, No. 3, e77. doi:10.1371/journal.pbio.0050077

green farm http://flickr.com/photos/nakae/204037619/ by nakae

Acid Mine Drainage http://flickr.com/photos/savethewildup/400614071/ by savethewildup

blue ocean http://flickr.com/photos/coolskipper/27242821/ by coolskipper

Digestive system http://commons.wikimedia.org/wiki/File:Digestive_system_whitout_labels.svg by Mariana Ruiz Villarreal

Pg166 bioreactor http://commons.wikimedia.org/wiki/File:Pg166_bioreactor.jpg

Page 41

Image sources/attribution - part 2/2

Big Drop-Off [...] http://flickr.com/photos/ctsnow/113339176/ by ctsnow

Sphaeroeca-colony http://commons.wikimedia.org/wiki/File:Sphaeroeca-colony.jpg by Dhzanette

Ocean view http://flickr.com/photos/provoost/399669002/ by Sjors Provoost

The hurdles http://flickr.com/photos/29621494N02/3060466344/ by paula fisher

Erythromycin http://de.wikipedia.org/w/index.php?title=Datei:Erythrommycin_A_B_C.svg by Yikrazuul

Aflatoxin B1 http://de.wikipedia.org/w/index.php?title=Datei:Aflatoxin_B1.svg&filetimestamp=20070113042046 by Bryan Derksen

Oleandomycin http://en.wikipedia.org/wiki/File:Oleandomycin.png by Edgar181

Tool rack http://en.wikipedia.org/wiki/File:Oleandomycin.png by L. Marie

Collaboration http://flickr.com/photos/fncll/145149313/ by ChrisL AK

Base pair AT http://commons.wikimedia.org/wiki/File:Base_pair_AT.svg

Base pair GC http://commons.wikimedia.org/wiki/File:Base_pair_GC.svg

Page 42

About this document

Created in LaTeX using the beamer class, TeX Live and Emacs.

All these programs run on OpenBSD.

http://www.latex-project.org

http://latex-beamer.sourceforge.net

http://www.tug.org/texlive/

http://www.gnu.org/software/emacs

http://www.gimp.org/

http://www.openbsd.org

Published under the Creative Commons Attribution 3.0 License

http://creativecommons.org/licenses/by/3.0/

Document version 1.0 2009/02/04