A Data Analysis and Coordination Center for the Human Microbiome Project
Owen White
Institute for Genome Sciences
University of Maryland School of Medicine
HMP Initiatives
Initiative 1: Data Resource Generation - sequencing of 400 strains of prokaryotic microbes from different body regions; recruitment of donors; collection of samples; metagenomic sequence analysis;
Initiative 2: Demonstration Projects - relationship between changes in the human microbiome and health or disease onset;
Initiative 3: Technology Development - development of improved culturing techniques; individual microbe sequencing;
Initiative 4: Ethical, Legal, and Social Implications Research - clinical and health; forensics; uses of new technologies; ownership of microbiome;
Initiative 5: Data Analysis and Coordinating Center - tracking, storing and distributing data; data retrieval tools; coordination of analyses and metadata standards; creation of a portal for international activities; and
Initiative 6: Computational Tool Development - new tool development; next generation sequencing platforms; large, complex sequence data; functional data and metadata.
DACC Roles and Responsibilities
Tracking, storing and distributing data
Data and metadata standardization
Distribution of software tools and pipelines
Support for data analysis
Providing a repository of protocols and SOPs
Development of a comprehensive web portal
DACC Collaborators
The Institute for Genome Sciences: Project Coordination; Web Portal; Core Pipelines; Data and Metadata Management
The Joint Genome Institute: HMP Project Catalog (GOLD); Metagenome Analysis Strategies
Lawrence Berkeley National Lab: 16S Data Management (greengenes); HMP Data Analysis System (IMG)
University of Colorado at Boulder: Metadata Standards; Statistical and Analytical Tools
In partnership with.
www.hmpdacc.org
1482
Reference Genome Sequence & Annotation Download
HMP Project Catalog
Relational data model
Tracks project status
Stores comprehensive metadata
Links to public data resources
Provides search/filtering options
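As a rough illustration of what a relational catalog with status tracking, links to public resources, and search/filtering could look like, here is a minimal sketch using Python's built-in sqlite3 module. The schema, the table and column names, and the sample rows are all hypothetical; nothing here is taken from the actual HMP Project Catalog.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative.
SCHEMA = """
CREATE TABLE project (
    project_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    body_site  TEXT NOT NULL,
    status     TEXT NOT NULL
);
CREATE TABLE external_link (
    project_id INTEGER NOT NULL REFERENCES project(project_id),
    resource   TEXT NOT NULL,
    accession  TEXT NOT NULL
);
"""


def build_catalog(projects, links):
    """Create an in-memory catalog and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO project VALUES (?, ?, ?, ?)", projects)
    conn.executemany("INSERT INTO external_link VALUES (?, ?, ?)", links)
    conn.commit()
    return conn


def search_projects(conn, body_site=None, status=None):
    """Return (project_id, name) rows matching the optional filters."""
    clauses, params = [], []
    if body_site is not None:
        clauses.append("body_site = ?")
        params.append(body_site)
    if status is not None:
        clauses.append("status = ?")
        params.append(status)
    sql = "SELECT project_id, name FROM project"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY project_id"
    return conn.execute(sql, params).fetchall()


def links_for(conn, project_id):
    """Return (resource, accession) pairs recorded for one project."""
    sql = ("SELECT resource, accession FROM external_link "
           "WHERE project_id = ? ORDER BY resource")
    return conn.execute(sql, (project_id,)).fetchall()


# Made-up rows, for illustration only.
PROJECTS = [
    (1, "example strain A", "oral", "complete"),
    (2, "example strain B", "skin", "in progress"),
    (3, "example strain C", "oral", "targeted"),
]
LINKS = [(1, "example resource", "EXAMPLE0001")]

catalog = build_catalog(PROJECTS, LINKS)
oral_projects = search_projects(catalog, body_site="oral")
```

Filters are passed as bound parameters rather than formatted into the SQL string, which keeps the search function safe to expose behind a web form.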
HMP Project Catalog
1570
* Includes active and targeted projects
Breakdown by Primary Body Site *
* Includes active and targeted projects
Contains a complete list of all Reference Strains along with detailed metadata about each. Provides both quick and advanced search and download options.
Reference Genomes: MIGS Compliance
DACC Management Web Interface
Enforces the population of required fields
Restricts contents of fields with controlled vocabularies
Provides both individual and bulk update options
Followed by QC steps prior to incorporation into the Catalog
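The two checks described for the management interface, required fields and controlled vocabularies, can be sketched in a few lines of Python. This is only an illustration: the field names and the vocabulary values below are placeholders, not the MIGS checklist and not the DACC's actual configuration.

```python
# Hypothetical field names and vocabulary; the real lists would come from
# the MIGS checklist and the DACC's own configuration.
REQUIRED_FIELDS = ("strain_name", "body_site", "sequencing_center")
CONTROLLED_VOCABULARIES = {
    "body_site": {"airways", "gastrointestinal tract", "oral", "skin",
                  "urogenital tract"},
}


def _is_blank(value):
    """Treat None and whitespace-only strings as unpopulated."""
    return value is None or str(value).strip() == ""


def validate_record(record):
    """Return a list of problems found in one metadata record."""
    errors = []
    for name in REQUIRED_FIELDS:
        if _is_blank(record.get(name)):
            errors.append(f"missing required field: {name}")
    for name, allowed in CONTROLLED_VOCABULARIES.items():
        value = record.get(name)
        if not _is_blank(value) and value not in allowed:
            errors.append(f"value {value!r} not allowed for field: {name}")
    return errors


def validate_bulk(records):
    """Validate many records; map record index to its list of problems."""
    report = {}
    for index, record in enumerate(records):
        errors = validate_record(record)
        if errors:
            report[index] = errors
    return report
```

A bulk update would call validate_bulk on the whole batch and hold back any record that appears in the report until it is corrected, which mirrors the QC step that precedes incorporation into the Catalog.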
Genome Analysis at IMG
Metagenomic WGS Data
Sarah Young, John Martin
HMP Data Processing Working Group
Reference context for metagenome analysis
This coming year
Victor Markowitz, Nikos Kyrpides
JGI:
WGS submission to NCBI
Centers and the DACC are working with NCBI to use a common schema and relevant metadata.
A submission guide and usage notes for the Aspera client and QIIME are available.
NCBI Projects
HMP Top Level (43021)
Characterizing microbiome of healthy individuals (Source = HMP Centers): 16S (48489), WGS (43017)
Associating microbiome with disease (Source = Demo Projects) (46305): WGS, 16S
Reference Genome Top Level (28331)
HMP-Wide Patient Phenotype
Columns: IHMC Variable | Total | Fraction Identical | Mappable | Not mappable | Not present | P1 | P2 | PN
SUBJID: 1.00, 0.88, 0.13; SUBJID SUBJID
Gender: 0.94, 0.94, 0.06; Gender
Age: 0.88, 0.81, 0.06, 0.06, 0.06; Age_at_first_visit AgeAtEnrollment
Race: 0.81, 0.44, 0.38, 0.19; Race Race_Other_Text
Other Race: 0.56, 0.31, 0.25, 0.44; Other Race Race_Other
Smoking: 0.38, 0.31, 0.06, 0.63; Smoking_status
Lab: 0.31, 0.19, 0.13, 0.06, 0.63; Diagnosis TID
Smoking_duration: 0.31, 0.19, 0.13, 0.69; Smoking_status
Drugs: 0.31, 0.19, 0.13, 0.69; Antacids, Steroids, Antibiotics
Weight_kg: 0.25, 0.25, 0.06, 0.69
BP: 0.19, 0.19, 0.81
Height: 0.19, 0.19, 0.81
Disease: 0.19, 0.06, 0.13, 0.81
Institution: 0.13, 0.00, 0.13, 0.88
Dose: 0.13, 0.06, 0.06, 0.88
Duration: 0.13, 0.06, 0.06, 0.88
Start_date: 0.13, 0.13, 0.88; TID
Finish_date: 0.13, 0.13, 0.88; TID
Location: 0.13, 0.13, 0.06, 0.81; Other Country
Drug_name: 0.06, 0.06, 0.06, 0.88
HIV/AIDS: 0.00, 1.00
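One way a per-variable summary like the rows above could be derived is to record, for each project, whether its version of a variable is identical, mappable, not mappable, or not present, and then take fractions across projects. The sketch below does that in Python. It rests on two assumptions that the slide does not state: that "Total" is the sum of the identical and mappable fractions, and that the input has 16 projects, a count chosen only so that the example fractions round to the Age row.

```python
from collections import Counter

STATUSES = ("identical", "mappable", "not_mappable", "not_present")


def summarize_variable(statuses):
    """Turn one status per project into the fractions for one table row."""
    if not statuses:
        raise ValueError("need at least one project")
    counts = Counter(statuses)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown status: {sorted(unknown)}")
    n = len(statuses)
    row = {status: counts[status] / n for status in STATUSES}
    # Assumption: "Total" counts projects whose variable is usable,
    # that is, identical or mappable.
    row["total"] = (counts["identical"] + counts["mappable"]) / n
    return row


# Hypothetical input: 16 projects reporting an age-like variable.
age_statuses = ["identical"] * 13 + ["mappable", "not_mappable", "not_present"]
age_row = summarize_variable(age_statuses)
```

Unrounded, the example yields 0.8125 identical, 0.0625 for each of the other three statuses, and 0.875 in total.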
Dirk Gevers & Ashlee Earl
Broad Institute
CloVR - Cloud Virtual Resource
[Workflow diagram: Virtual Machine; Local Computer; Compute Cloud]
Raw Sequence Data: 16S PCR or RT-PCR products; total metagenomic DNA or RNA; single-genomic or pan-genomic DNA; eukaryotic DNA or RNA; eukaryotic DNA; reference
Tracks: Metagenomics; Prokaryotes; Eukaryotes
Steps: trimming, filtering; tree generation; ORF prediction; phylogenetic diversity; assembly; CDS, tRNA, rRNA prediction; auto-annotation; functional diversity; sequence mapping; SNP identification; quantitative analysis; community comparison; alignment; classification; gene prediction
Annotated Sequence Data: standardized nomenclature suitable for publication
PI: Florian Fricke
Technical lead: Sam Angiuoli
Large-scale Amazon Deployment
Florian Fricke, Sam Angiuoli
Institute for Genome Sciences
Phase 2: More Access
Open access data
Annotated data sets, aggregated, searchable
Some pre-computes
Reference data sets
Research network
Processed files
Aggregated datasets
Metadata
We are surveying the community now!
See: Heather Huot Creasy, Catherine Jordan
Phase 2
External users will:
Select data sets/results for download
Search for specific data
Access data archives (some may have controlled access)
See data reports, statistics about the data, the validation process, etc.
See information about metadata
Phase 3: Analysis Tools
Annotation pipelines:
RAMMCAP - Rapid Analysis of Multiple Metagenomes with Clustering and Annotation Pipeline
ShotgunFunctionalizeR
Binning:
SOrt-ITEMS - Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences
Community composition, comparative metagenomics:
MEGAN (MEtaGenome ANalyzer)
CARMA
GAAS (Genome relative Abundance and Average Size)
Galaxy
GINKGO
Metarep - Suite of web based tools
Metastats - compare clinical metagenomic samples from two treatment populations
RAMMCAP - Statistical metagenome comparison
ShotgunFunctionalizeR - R-package for functional comparison
Visualization:
Invue - API and software suite for large scale data visualization
Online resources:
My IMG/M - tools for analyzing microbiome functional capability
MG-RAST - variety of comparative and visualization tools
HMP DACC Team
IGS: Jennifer Wortman, Michelle Gwinn Giglio, Heather Huot Creasy, Brandi Cantarel, Jonathan Crabtree, Joshua Orvis, Cesar Arze, Mark Mazaitis, Victor Felix, Catherine Jordan, Anup Mahurkar
LBL: Gary Andersen, Todd DeSantis, Navjeet Singh, Victor Markowitz, Amy Chen
JGI: Nikos Kyrpides, Konstantinos Liolios
Univ. of Colorado: Rob Knight, Dan Knights, Justin Kuczynski
Cornell University: Ruth Ley
San Diego State: Scott Kelley
Argonne National Lab: Folker Meyer
[SRA data model diagram: SRA STUDY; SRA SAMPLE; SRA EXPERIMENT; SRA RUN; SFF FILE; FASTQ File; relationships labeled 1 and *]
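To make the relationships between the SRA objects named above easier to picture, here is a small Python sketch using dataclasses. The cardinalities are an assumption based on the general SRA submission model (an experiment refers to one study and one sample, an experiment has one or more runs, and a run carries one or more data files); the diagram's own labels did not survive extraction. All accessions and file names are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    name: str
    file_format: str  # for example "SFF" or "FASTQ"


@dataclass
class Run:
    accession: str
    files: List[DataFile] = field(default_factory=list)


@dataclass
class Experiment:
    accession: str
    study_accession: str   # assumed: exactly one study per experiment
    sample_accession: str  # assumed: exactly one sample per experiment
    runs: List[Run] = field(default_factory=list)


def files_for_sample(experiments, sample_accession):
    """List the file names reachable from one sample through its runs."""
    return [
        data_file.name
        for experiment in experiments
        if experiment.sample_accession == sample_accession
        for run in experiment.runs
        for data_file in run.files
    ]


# Placeholder objects, for illustration only.
EXPERIMENTS = [
    Experiment(
        accession="EXPERIMENT_1",
        study_accession="STUDY_1",
        sample_accession="SAMPLE_A",
        runs=[
            Run("RUN_1", [DataFile("run1.sff", "SFF"),
                          DataFile("run1.fastq", "FASTQ")]),
            Run("RUN_2", [DataFile("run2.sff", "SFF")]),
        ],
    ),
    Experiment(
        accession="EXPERIMENT_2",
        study_accession="STUDY_1",
        sample_accession="SAMPLE_B",
        runs=[Run("RUN_3", [DataFile("run3.sff", "SFF")])],
    ),
]

sample_a_files = files_for_sample(EXPERIMENTS, "SAMPLE_A")
```

Keeping the study and sample as accession references on the experiment, rather than nesting objects, reflects the assumption that many experiments can point at the same study or sample.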