The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Page 1: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

The Microbiome and Metagenomics

Catherine Lozupone, CPBS 7711

September 19, 2013

Page 2: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

What is the microbiome?
• “The ecological community of commensal, symbiotic, and pathogenic microorganisms that share our body space”
• Microbiota: “collection of organisms”; Microbiome: “collection of genes”
• Bacteria, Archaea, microbial eukaryotes (e.g. fungi or protists) and viruses.
• Body Sites
  – Important roles in health and disease: Gut, Mouth, Vagina, Skin (diverse sites: Nasal epithelial)
  – Important roles in disease: Lung, blood, liver, urine

Page 3: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

The big tree

Pace, N.R., The Universal Nature of Biochemistry. PNAS Vol 98(3) pp 805-808.

• Majority of life’s diversity is microbial

• Majority of microbial life cannot be grown in pure culture

Page 4: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

The Human Gut Microbiota
• 100 trillion microbial cells: outnumber human cells 10 to 1!
• Most gut microbes are harmless or beneficial.
  – Protect against enteropathogens
  – Extract dietary calories and vitamins
  – Prevent immune disorders
• List of diseases associated with dysbiosis ever growing
  – Inflammatory Diseases: IBD, IBS
  – Metabolic Diseases: Obesity, Malnutrition
  – Neurological Disorders
  – Cancer

Page 5: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.
Page 6: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

What do we want to understand?
• What does a healthy microbiome look like?
  – How diverse is it?
  – What types of bacteria are there?
  – What is their function?
• How variable is the microbiome?
  – Over time within an individual?
  – Across individuals?
  – Functionally?
• What are driving factors of variability?
  – Age, culture, physiological state (pregnancy)
• How do changes affect disease?
  – What properties (taxa, amount of diversity) change with disease?
  – Cause or effect?
  – Functional consequences of dysbiosis
• Host Interactions
  – Evolution/adaptation to the host over time.
  – Immune system

Page 7: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Culture-independent studies revolutionized our understanding of gut bacteria

• Culture-based studies over-emphasized the importance of easily culturable organisms (e.g. E. coli).

Culture-independent surveys
1. Extract DNA from environmental samples.
2. PCR amplify SSU rRNA gene (which species?) or sequence random fragments (which function?)
3. Evaluate sequences

Page 8: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Gut microbiota has simple composition at the phylum level

Data from: Yatsunenko et al. 2012. Nature.

Different phyla: Animals and plants

Page 9: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Diversity of Firmicutes in 2 healthy adults
• Each person harbors > 1000 species.
• Some species are unique (red and blue)
• Some shared (purple)
• We know very little about what most of these species do!

Page 10: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Sequencing technology renaissance enabled more complex study designs

• Sanger Sequencing (thousands)
• Pyrosequencing (millions)
• Illumina (billions!)

Page 11: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Metagenomics

• The study of metagenomes, genetic material recovered directly from environmental samples.

• Marker gene
  – PCR amplify a gene of interest
  – Tells you what types of organisms are there
  – Bacteria/Archaea (16S rRNA), Microbial Euks (18S rRNA), Fungi (ITS), Virus (no good marker)
• Shotgun
  – Fragment DNA and sequence randomly.
  – Tells you what kind of functions are there.

Page 12: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Small Subunit Ribosomal RNA

• Present in all known life forms

• Highly conserved
• Resistant to horizontal transfer events

16S rRNA secondary structure

Page 13: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Other ‘Omics
• MetaTranscriptomics (sequence version of microarray)
  – Isolate all RNA
  – Deplete rRNA
  – Sequence all transcripts
  – Sometimes phenotype only seen in activity of the microbiota
• Metabolomics
  – What metabolites does a community produce?
  – E.g. in feces or urine
• MetaProteomics
  – What proteins does a community produce?

Page 14: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Integrating Data Types
• 16S rRNA -> shotgun metagenomics
  – What gene differences cannot be explained by 16S?
  – Selection by HGT
• 16S/genomics -> transcriptomics -> metabolomics
  – What species or genes (or combination of species or genes), when expressed, are responsible for producing a given metabolite?

Page 15: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.
Page 16: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.
Page 17: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Sequencing Technologies
• Sanger -> 454 Pyrosequencing -> Illumina

Page 18: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Short reads (pyrosequencing) can recapture the result.

• UW UniFrac clustering with Arb parsimony insertion of 100 bp reads extending from primer R357.

• Assignment of short reads to an existing phylogeny (e.g. greengenes coreset) allows for the analysis of very large datasets.

Liu Z, Lozupone C, Hamady M, Bushman FD & Knight R (2007) Short pyrosequencing reads suffice for accurate microbial community analysis. Nucleic Acids Res 35: e120.

Page 19: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.
Page 20: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Preprocessing pyrosequencing datasets

• Quality filtering: Discard sequences that:
  – Are too short or too long (200-1000 range)
  – Have low quality scores
  – Have long homopolymers
  – Alternatively, trim poor quality regions from the ends
• PyroNoise and Chimeras
  – Sequencing noise and chimeras can greatly inflate OTU counts
  – PyroNoise algorithm uses SFF files to fix noisy sequences

• Use barcodes to assign sequences to samples
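
The filtering and barcode steps above can be sketched in a few lines of Python. This is a minimal illustration, not the QIIME implementation: the function names, the mean-quality cutoff of 25, and the homopolymer limit of 6 are assumptions chosen for the example; only the 200-1000 length range comes from the slide.

```python
def max_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def passes_quality(seq, quals, min_len=200, max_len=1000,
                   min_mean_qual=25, max_homop=6):
    """Apply the length, mean-quality, and homopolymer filters.

    min_mean_qual and max_homop are illustrative thresholds (assumed).
    """
    if not (min_len <= len(seq) <= max_len):
        return False
    if sum(quals) / len(quals) < min_mean_qual:
        return False
    return max_homopolymer(seq) <= max_homop


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample using the barcode at the start of each read."""
    by_sample = {}
    for seq in reads:
        sample = barcode_to_sample.get(seq[:barcode_len])
        if sample is not None:  # reads with unknown barcodes are dropped
            by_sample.setdefault(sample, []).append(seq[barcode_len:])
    return by_sample
```

Denoising and chimera removal are separate steps and are not covered by this sketch.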

Page 21: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Defining species: OTU picking

• Cluster sequences based on % identity
  – 97% ID typical for species
  – CD-HIT, UCLUST
• For phylogenetic diversity measures need to make a tree
  – Align sequences: NAST, PyNAST
  – De novo tree building: FastTree
  – Or assign reads to sequences in a pre-defined reference tree
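
Clustering by % identity can be illustrated with a simple greedy scheme. This is a toy sketch, not the CD-HIT or UCLUST algorithm: it assumes the sequences are already aligned to equal length and non-empty, and `identity` and `pick_otus` are hypothetical names.

```python
def identity(a, b):
    """Fraction of matching positions between two aligned sequences."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def pick_otus(seqs, threshold=0.97):
    """Greedy clustering: each sequence joins the first OTU whose seed it
    matches at >= threshold identity; otherwise it seeds a new OTU."""
    otus = []  # list of (seed_sequence, [member indices])
    for i, seq in enumerate(seqs):
        for seed, members in otus:
            if identity(seed, seq) >= threshold:
                members.append(i)
                break
        else:
            otus.append((seq, [i]))
    return [members for _, members in otus]
```

Lowering `threshold` (e.g. to 0.90) merges more sequences into each OTU, which is how the phylogenetic depth of OTUs is varied.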

Page 22: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Comparing Diversity

• Overview of methods for evaluating/comparing microbial diversity across samples using 16S rRNA
  – α diversity: Measures how much is there?
  – β diversity: How much is shared?
• Phylogenetic versus taxon-based diversity.
• Quantitative versus qualitative diversity.
• What types of taxa are driving the patterns? Which species are associated with measured properties?
• Tools: UniFrac/QIIME/Topiary Explorer
• Lozupone, C.A. and R. Knight (2008) Species divergence and the measurement of microbial diversity. FEMS Microbiol Rev. 1-22.

Page 23: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

How do we describe and compare diversity?

• α Diversity:
  – “How many species are in a sample?”
      • (e.g. 6 colors in A and 6 in B)
  – e.g.: Are polluted environments less diverse than pristine?
• β Diversity:
  – “How many species are shared between samples?”
      • (e.g. 2 shared colors between A and B)
  – e.g.: Does the microbiota differ with different disease states?

[Figure: example communities A and B]

Page 24: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Quantitative versus Qualitative measures

• Qualitative: Considers presence/absence only
  – α: How many species are in a sample?
      • e.g.: 6 colors in both A and B.
  – β: How many species are shared between samples?
      • e.g.: A and B are identical because the same colors are present in both.
• Quantitative: Also considers relative abundance.
  – α: Accounts for “evenness”:
      • e.g. B, where the population is evenly distributed across the 6 species, is more diverse than A, where all species are present but red dominates.
  – β: Samples will be considered more similar if the same species are numerically dominant versus rare.
      • e.g. B and A no longer look identical because of differences in abundance.

[Figure: example communities A and B]
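
A small numeric sketch of the qualitative/quantitative distinction for α diversity. The counts for A and B below are invented to mimic the figure (six species, one dominant in A, even in B); richness is the qualitative measure and the Shannon index is used here as one example of a quantitative measure.

```python
import math


def richness(counts):
    """Qualitative alpha diversity: number of species observed."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Quantitative alpha diversity: Shannon index (natural log)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical counts per species (color) in each community.
A = [95, 1, 1, 1, 1, 1]        # all six present, but one dominates
B = [10, 10, 10, 10, 10, 10]   # evenly distributed

same_richness = richness(A) == richness(B)   # qualitative: A and B tie
b_more_even = shannon(B) > shannon(A)        # quantitative: B is more diverse
```

For a perfectly even community the Shannon index equals ln(number of species), which is its maximum for that richness.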

Page 25: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

What is a phylogenetic diversity measure?

• α Diversity:
  – Taxon: “How many species are in a sample?”
  – Phylogenetic: “How much phylogenetic divergence is in a sample?”
      • (e.g. B more individually diverse than A - more divergent colors)
• β Diversity:
  – Taxon: “How many species are shared between samples?”
  – Phylogenetic: “How much phylogenetic distance is shared between samples?”
      • (only related colors from B are in A)

[Figure: example communities A and B]

Page 26: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Advantages of phylogenetic techniques.

• Phylogenetically related organisms are more likely to have similar roles in a community.

• Taxon-based methods assume a “star phylogeny,” where all relationships between taxa are ignored.

• Phylogeny and Taxon-based methods can be complementary.

Page 27: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Diversity Measures
• α Diversity
  – Phylogenetic Diversity: PD
  – Taxon-based:
      • Observed # species (richness)
      • Correct for undersampling (Chao1, ACE)
      • Richness + evenness (Shannon-Weaver index)
• β Diversity
  – Test if samples have significantly different membership.
      • UniFrac Significance, P test, Libshuff (Phylogenetic)
  – Identify environmental variables associated with differences between many samples.
      • Phylogenetic
          – Unweighted and Weighted UniFrac
          – DPCoA
      • Taxon-based: Jaccard/Sorensen indices
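
Some of the taxon-based measures listed above are short enough to write out. The sketch below uses the bias-corrected form of Chao1 and expresses Jaccard and Sorensen as distances (1 minus the index); the function names are illustrative and the code assumes non-empty samples.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-OTU counts.

    Adds an estimate of unseen OTUs based on singletons and doubletons.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def jaccard_distance(a, b):
    """1 - (shared taxa / taxa present in either sample)."""
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)


def sorensen_distance(a, b):
    """1 - (2 * shared taxa / sum of the two richness values)."""
    a, b = set(a), set(b)
    return 1 - 2 * len(a & b) / (len(a) + len(b))
```

Both β measures here are qualitative: they use presence/absence only and treat all taxa as equally related.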

Page 28: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Phylogenetic Diversity (PD)
• Sum of branches leading to sequences in a sample.
• Sample with taxa spanning the most branch length in this tree represents the most phylogenetically and perhaps functionally divergent community.

Faith, D.P. (1992) Conservation evaluation and phylogenetic diversity. Biological Conservation 61, 1-10.
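
PD as a sum of branch lengths can be sketched on a toy tree. The tree encoding (a dict mapping each node to its parent and the length of the branch above it) and the example tree are assumptions made for illustration; this version counts branches all the way to the root.

```python
def faith_pd(tree, tips):
    """Sum the lengths of all branches on the paths from the given tips to the root."""
    seen = set()
    total = 0.0
    for node in tips:
        # Walk toward the root, counting each branch only once.
        while node in tree and node not in seen:
            seen.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# Hypothetical tree: node -> (parent, branch length); "root" has no entry.
TREE = {
    "a": ("x", 1.0), "b": ("x", 1.0),
    "c": ("y", 2.0), "d": ("y", 2.0),
    "x": ("root", 0.5), "y": ("root", 0.5),
}

pd_close = faith_pd(TREE, ["a", "b"])    # two closely related tips
pd_distant = faith_pd(TREE, ["a", "c"])  # two distantly related tips
```

Both samples contain two taxa, yet the second spans more branch length, which is the information a taxon-based count ignores.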

Page 29: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

PD Rarefaction

• Plot the amount of branch length against the # of observations.
• Shape of curve allows for estimating how far we are from sampling all of the phylogenetic diversity.
• Allows for comparison of phylogenetic diversity between samples.

Eckburg, P.B., et al. (2005) Diversity of the human intestinal microbial flora. Science 308, 1635-1638.

Page 30: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Phylogenetic and OTU based techniques can be complementary

• Results of analyzing the same data with Chao1 and PD.

• Samples from stool, mouth, lung, plasma, and negative controls.

• Differentiation between the stool/mouth and negative controls greater with Chao1 than with PD

• The negative controls have few OTUs but they are phylogenetically diverse

• Chao1 estimates go up with sampling effort.

Page 31: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Phylogenetic diversity: How is diversity partitioned across samples?
• Do two samples contain significantly different microbial populations?
• Can we see broad trends that relate many samples and explain them in terms of environmental factors?

Page 32: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Unique Fraction (UniFrac) metric
• Qualitative phylogenetic β diversity.
• Distance = fraction of the total branch length that is unique to any particular environment.

Lozupone and Knight, 2005, Appl Environ Microbiol 71:8228
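
The "unique fraction" definition can be sketched on a toy tree. The tree encoding (node -> parent and branch length) and the example tree are assumptions for illustration; this is not the UniFrac or Fast UniFrac implementation.

```python
def covered_branches(tree, tips):
    """Set of branches (named by their child node) leading to the given tips."""
    covered = set()
    for node in tips:
        while node in tree and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(tree, tips_a, tips_b):
    """Branch length unique to one sample divided by total branch length."""
    a = covered_branches(tree, tips_a)
    b = covered_branches(tree, tips_b)

    def length(nodes):
        return sum(tree[n][1] for n in nodes)

    return length(a ^ b) / length(a | b)


# Hypothetical tree: node -> (parent, branch length); "root" has no entry.
TREE = {
    "a": ("x", 1.0), "b": ("x", 1.0),
    "c": ("y", 2.0), "d": ("y", 2.0),
    "x": ("root", 0.5), "y": ("root", 0.5),
}
```

Two samples drawn from entirely separate lineages, such as ["a", "b"] and ["c", "d"], share no branches and get the maximum distance of 1; identical samples get 0.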

Page 33: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Clustering with the UniFrac Algorithm
Can we see broad trends that relate many samples and explain them in terms of environmental factors?

Page 34: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

What types of environments have similar phylogenetic diversity?

• Temperature: 0-100°C
• pH: 1-12
• Nutrient availability: oligotrophic to eutrophic
• Pressure: 1-200 atm

Lozupone CA & Knight R (2007) Global patterns in bacterial diversity. Proc Natl Acad Sci U S A 104: 11436-11440.

Page 35: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Salinity is the most important factor

PCoA of UniFrac distance matrix

Page 36: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Hierarchical clustering (UPGMA) of the same UniFrac distance matrix

Page 37: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Qualitative vs Quantitative measures of Phylogenetic Diversity

• Qualitative:
  – Unweighted UniFrac
  – Detects factors restrictive for microbial growth.
  – High temperature, low pH, founder effects.
• Quantitative:
  – Weighted UniFrac, DPCoA.
  – Detects transient changes.
  – Seasonal changes, nutrient availability, response to pollution.
• Yield different, complementary results, and applying both to the same data can provide insight into the nature of community changes.

Page 38: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Weighted UniFrac

Lozupone et al., 2007. Appl Environ Microbiol 73:1576

[Figure panels: Qualitative (unweighted) and Quantitative (weighted)]
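
Weighted UniFrac weights each branch by the difference in the proportion of sequences descending from it in the two samples. The sketch below computes the raw (unnormalized) value, sum over branches of length × |A_i/A_T − B_i/B_T|; the tree encoding, the example tree, and the counts are assumptions for illustration.

```python
def descendant_counts(tree, tip_counts):
    """Number of sequences descending from each branch (named by child node)."""
    totals = {}
    for tip, count in tip_counts.items():
        node = tip
        while node in tree:
            totals[node] = totals.get(node, 0) + count
            node = tree[node][0]
    return totals


def weighted_unifrac(tree, counts_a, counts_b):
    """Raw weighted UniFrac: branch lengths weighted by abundance differences."""
    a = descendant_counts(tree, counts_a)
    b = descendant_counts(tree, counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    return sum(length * abs(a.get(node, 0) / total_a - b.get(node, 0) / total_b)
               for node, (parent, length) in tree.items())


# Hypothetical tree: node -> (parent, branch length); "root" has no entry.
TREE = {
    "a": ("x", 1.0), "b": ("x", 1.0),
    "c": ("y", 2.0), "d": ("y", 2.0),
    "x": ("root", 0.5), "y": ("root", 0.5),
}

# Same membership, opposite dominance: an unweighted measure sees no difference.
d = weighted_unifrac(TREE, {"a": 9, "c": 1}, {"a": 1, "c": 9})
```

Here both samples contain the same two taxa, so a qualitative measure would call them identical, while the weighted value is well above zero.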

Page 39: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Obesity and Gut Microbiota

• Mice heterozygous for mutation in Leptin gene interbreed.

• 16S gene sequenced for bacteria in gut of mothers and offspring.

Ley et al. (2005) Obesity Alters Gut Microbiota, PNAS Vol 102: pp 11070-11075

Page 40: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

So how about the obese mice?

Mice cluster perfectly by mother

Ley et al. (2005) Obesity Alters Gut Microbiota, PNAS Vol 102: pp 11070-11075

Page 41: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Stronger clustering with obesity with Weighted UniFrac

Page 42: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Comparison of human stool and mucosal microbes

[Figure panels: Unweighted UniFrac; Weighted UniFrac]

• Unweighted: all samples cluster by individual.
• Weighted: stool looks different.

Eckburg, P.B., et al. (2005) Diversity of the human intestinal microbial flora. Science 308, 1635-1638.

Page 43: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Measures in the same class cluster the data similarly

• Double principal coordinates analysis (DPCoA)
  – Another quantitative β diversity measure.
  – A matrix of species distances is first used to ordinate the species using PCoA.
  – The position of the communities in coordinate space is the average position of the species that they contain, weighted by relative abundances.
• Produces results similar to weighted UniFrac.

Page 44: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Fast UniFrac
• Computational enhancements give order-of-magnitude increases in speed and reduce memory requirements.

Hamady, Lozupone and Knight, The ISME Journal. 2009. Epub ahead of print.

Page 45: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Avoiding bias
• Pyrosequencing often produces high variability in the number of sequences per sample.
• This can introduce bias because undersampling creates inflated beta diversity values

Lozupone et al. 2011. ISME. 5:169-72

• Randomly resampled a dataset at different depths and calculated the average UniFrac distance.

• Samples with fewer sequences look artificially different.

• Rarefaction: randomly select an even number of sequences from each sample
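
Rarefaction to an even depth amounts to subsampling without replacement. A minimal sketch, assuming each sample is a dict of OTU -> sequence count; the function name and the `seed` parameter are illustrative.

```python
import random


def rarefy(counts, depth, seed=None):
    """Randomly subsample `depth` sequences from a sample without replacement."""
    # Expand the counts into one entry per sequence.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer sequences than the requested depth")
    rng = random.Random(seed)
    rarefied = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied
```

Applying the same `depth` to every sample before computing UniFrac removes the artificial differences caused by uneven sequencing effort; samples below the chosen depth have to be dropped.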

Page 46: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Web interfaces have >2200 registered users.
UniFrac papers have collectively 1250 citations.

461 citations

Page 47: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

www.microbio.me/qiime

Page 48: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Study effects drive clustering of Western adults

Lozupone et al. Genome Research. 2013

Page 49: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Age and culture drive differences

Page 50: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Supervised learning, classical statistics, taxonomic classification, and phylogenetic trees: how can we use these tools to understand which microbial taxa change across treatments?

Page 51: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Identifying compositional changes that drive diversity patterns

• Histograms

Page 52: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Histograms and trees can paint a different picture

• 16S rRNA gene tree of OTUs prevalent in 2 studies of diet/obesity
  – Turnbaugh 2009 Sci Transl Med. 1:6ra14
  – Ley 2006. Nature. 444:1022-3
• Clostridia clusters XIVa and IV are the most abundant in the healthy gut.
  – Cluster XIVa ~43% of the total bacteria in the stool of healthy individuals (Maukonen 2006. J Med Microbiol. 55:625-33.)

[Figure: Firmicutes tree. Peterson 2008 Cell Host Microbe 3:417-27]

Page 53: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Identifying taxonomic determinants

• Which taxa are significantly different between health and disease?
  – Using OTUs versus classifier-derived taxa.

• PCoA Biplots: Which taxa are correlated with overall clustering patterns?

• Finding discriminatory OTUs with Supervised Learning.

• Applying classical statistical tests with otu_category_significance.py

• Exploring relationships in trees.

Page 54: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Defining Taxa
• 2 methods
  – OTUs
  – Classifiers (e.g. the RDP classifier)
• For both methods phylogenetic depth of the taxa can be varied.
  – OTUs: different %IDs (97%, 95%, 90%)
  – Classifiers: different levels (species, genus, family)
• Advantage of using OTUs
  – Can evaluate phylotypes not related to known species or in taxonomic groups with poorly defined systematics.
  – Each OTU represents an equal amount of phylogenetic divergence.
• Advantage of using Classifiers
  – Can more easily relate results to other published results.
  – Fewer taxa than OTUs.

Page 55: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

At what level should I classify?
• Shallow
  – 97% ID OTU or species-level taxonomy assignments
  – Advantage
      • Biological properties of taxa have the potential to be more strictly defined
  – Disadvantage
      • Can lose power to find associations in broader lineages in which a trait is conserved
• Broad
  – 90% ID OTUs or family-level taxonomic assignments
  – Advantage
      • More powerful for conserved traits
  – Disadvantage
      • Association in a broader group is often driven by only a subset of its members (i.e. if you detect that Gammaproteobacteria go up you cannot say that E. coli did it!)

Page 56: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

When ill-defined systematics can cause trouble

Lozupone et al. 2012. Genome Research

[Figure: tree of Clostridium cluster XIVa (Lachnospiraceae); tip labels include Clostridium, Eubacterium, Ruminococcus, and Blautia]

Page 57: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

PCoA Bi-plots
• Allows visualization of taxa and samples in the same PCoA space

Page 58: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Finding discriminative OTUs
• 2 methods
  – Supervised learning
  – Classical statistics
• Supervised learning
  – Evaluates how well OTUs/taxa can be used to classify by treatment.
  – Discriminative OTUs are those for which classification power is reduced when they are removed from the set
  – Advantage:
      • evaluates OTUs contextually rather than independently
  – Disadvantage:
      • only works with discrete sample groupings (i.e. will not handle correlations with disease severity or changes over time)
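
The idea that a discriminative OTU is one whose removal reduces classification power can be sketched with any classifier. The example below uses a nearest-centroid classifier with leave-one-out accuracy purely as a simple stand-in; it is not the classifier used in the lecture's analyses, and all names are hypothetical. It assumes every class has at least two samples.

```python
def nearest_centroid_accuracy(samples, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given feature (OTU) indices."""
    correct = 0
    for i, (sample, label) in enumerate(zip(samples, labels)):
        # Build class centroids from every sample except the held-out one.
        sums, sizes = {}, {}
        for j, (other, other_label) in enumerate(zip(samples, labels)):
            if j == i:
                continue
            acc = sums.setdefault(other_label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += other[f]
            sizes[other_label] = sizes.get(other_label, 0) + 1

        def sq_dist(lab):
            return sum((sample[f] - sums[lab][k] / sizes[lab]) ** 2
                       for k, f in enumerate(features))

        if min(sums, key=sq_dist) == label:
            correct += 1
    return correct / len(samples)


def feature_importance(samples, labels):
    """Drop in accuracy when each OTU is removed from the feature set."""
    all_features = list(range(len(samples[0])))
    baseline = nearest_centroid_accuracy(samples, labels, all_features)
    return [baseline - nearest_centroid_accuracy(
                samples, labels, [f for f in all_features if f != removed])
            for removed in all_features]
```

Because each score is measured against a model that still contains all the other OTUs, the evaluation is contextual: an OTU that is redundant with another may score low even if it differs between groups.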

Page 59: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Feature importance scores

• All OTUs with scores > 0.001 considered ‘important’
  – Yatsunenko et al. Nature 2012
• Problem: We do not know the direction of change.

• With only two categories – compare the means.

Page 60: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Classical Statistics Tests in QIIME
• otu_category_significance.py
  – -i: OTU table
  – -m: category mapping
  – -c: category (e.g. health status)
  – -s: statistical test
      • ANOVA
      • Pearson correlation
      • Paired T test
      • G-test of independence
  – -f: minimum number of samples an OTU must be found in to be considered
  – Removes OTUs that don’t pass the filter, performs a statistical test on each OTU, corrects for multiple comparisons with FDR and Bonferroni correction.
  – Can also be run on taxa summary table files if in BIOM format.
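
The test-then-correct workflow can be sketched independently of QIIME. The code below is not the script's implementation: it shows a G-test of independence on a 2x2 presence/absence table (OTU present or absent versus two sample categories), followed by Bonferroni and Benjamini-Hochberg FDR corrections. Function names are hypothetical.

```python
import math


def g_test_2x2(table):
    """G-test of independence on a 2x2 table [[a, b], [c, d]]; returns (G, p)."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_sums[i] * col_sums[j] / total
                g += 2 * observed * math.log(observed / expected)
    # With 1 degree of freedom the chi-square tail is erfc(sqrt(G / 2)).
    return g, math.erfc(math.sqrt(max(g, 0.0) / 2))


def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    return [min(1.0, p * len(pvals)) for p in pvals]


def benjamini_hochberg(pvals):
    """FDR-adjusted p-values (Benjamini-Hochberg step-up procedure)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one p-value would be computed per OTU that passes the minimum-sample filter, and the corrections applied to the full list of p-values at once.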

Page 61: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Assign statistical significance values to bar charts

Page 62: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

ANOVA output

• I use these means and their significance to assess direction of change in Supervised learning results.

Page 63: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.
Page 64: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Are discriminatory OTUs related to each other and to type strains?

• Relate them in a tree.
• ARB to make the tree using parsimony insertion.
  – http://www.mpi-bremen.de/ARBSILVA.html
• Topiary Explorer to visualize/color the tree and make publication quality graphics
  – http://topiaryexplorer.sourceforge.net

Page 65: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.
Page 66: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Sometimes associations are phylogenetically shallow

Erysipelotrichales with HIV infection

Page 67: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Genomics
• Genomics: Thousands of complete and draft genome sequences for human commensals publicly available
  – Promise: translate 16S into functional predictions (PICRUSt)
  – Challenges: no genomes for unculturable microbes
  – Genes with high HGT

[Figure: Distribution (16S rRNA); Comparative genomics (complete genomes); Experimental confirmation (anaerobic culture)]

Page 68: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Annotating genes to functions
• Based on similarity to genes of known function.

NCBI genomes have functions listed for predicted proteins

Page 69: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Databases for functional assignments

• COGs (Clusters of Orthologous Groups; http://www.ncbi.nlm.nih.gov/COG/)

• KEGG (Kyoto Encyclopedia of Genes and Genomes; http://www.genome.jp/kegg/)

• CAZy (Carbohydrate Active Enzymes database; http://www.cazy.org/)

• Pfam (protein family database; http://www.sanger.ac.uk/resources/databases/pfam.html)

Page 70: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

COG database
• Orthologous groups
  – A group of proteins that are expected to perform the same function in the different organisms in which they are found.
  – Function is inferred for the whole group based on experimental work with one of its members.
  – COGs are grouped into larger functional groups.

Page 71: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

KEGG database
• Orthologous groups (assigned KO numbers)
• Metabolic pathways.
  – Boxes contain enzyme commission database (EC) numbers.
• Each EC is associated with KO numbers (a protein family that is known to perform that reaction).

Page 72: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Shotgun metagenomics

Page 73: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

KEGG pathway Ontology

Page 74: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Glycoside Hydrolases (GH)
Degradation: hydrolyze glycosidic bonds between two carbs or between a carb and a non-carb.
Important for degradation of plant polysaccharides.

GlycosylTransferases (GT)
Biosynthesis: catalyze the transfer of sugar moieties.
Important for communication with host immune system.

• Database describing protein families predicted to be carbohydrate active based on homology
• Uses HMMs
• Exact reaction performed does not need to be known.

Page 75: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

• Similar to CAZy but with a broader scope.
• Hidden Markov Models that describe sequence motifs of a known function

Page 76: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Annotating genes to taxonomic groups

• Based on similarity to genes in a database of reference genomes.
  – http://www.genomesonline.org/cgi-bin/GOLD/index.cgi
• MG-RAST uses best BLAST hit: M5NR

Page 77: The Microbiome and Metagenomics Catherine Lozupone CPBS 7711 September 19, 2013.

Annotating metagenomes
• MG-RAST
  – http://metagenomics.anl.gov/metagenomics.cgi?page=Analysis
• Produces a table mapping samples to annotations that can be further processed in QIIME