Efforts to Link Ecological Metadata with Bacterial Gene Sequences at the
Sapelo Island Microbial Observatory
Wade M. Sheldon
Mary Ann Moran
James T. Hollibaugh
Genetic Sequence Databases
- Major informatics success story
- Large repositories for nucleotide sequences (e.g. GenBank/EMBL/DDBJ ~16M)
- Automated and web-based data submission, required as part of the publication process
- Standardized alignment/search tools support use for classification
- Numerous ‘environmental sequences’, which ecologists now use to study biogeography, community structure, eco-physiology
Problems with GenBank
- Metadata voluntary – limited in scope
  - Title (definition), authors, key words, comments, literature citation
- Many sequences unpublished, undescribed
- Quality control standards poorly enforced
- No direct way to provide links to ancillary data (URLs not officially supported, often removed)
- Very inefficient and often impossible for investigators to obtain ecological context information, even from journals
- Comparisons of matched taxa by traits not possible
Consequence
- Tremendous amount of bacterial sequence data relevant to microbial ecologists
- No established interface
Example – Insufficient Metadata
Sapelo Island Microbial Observatory (http://simo.marsci.uga.edu)
- MObs – NSF-funded network of sites or "microbial observatories" established to discover novel microorganisms, microbial consortia, communities, activities and other novel properties, and to study their roles in diverse environments
- Projects supported are expected to establish or participate in an established, Internet-accessible knowledge network to disseminate the information resulting from these activities
- SIMO – Investigating the diversity of prokaryotes, their physiological and genetic characteristics, and their biogeochemical activities in a salt marsh/estuarine ecosystem in the southeastern U.S.
- Knowledge networks:
  - GenBank
  - GCE-LTER IS
  - SIMO 16S rRNA Database
SIMO 16S rRNA Database
- Purpose: LIMS, research tool, data dissemination
- Designed to store sequence data and all supporting SIMO research information
- Hierarchical structure modeled after research workflow
- Metadata on site geography, sample collection, all methodology, personnel, ancillary measurements
- Extensive content control, error checking
- Links to information in external databases (RDP II, GenBank, GCE-LTER)
- Queries by phylogenetic and/or ecological characteristics
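The hierarchical structure and the combined phylogenetic/ecological queries listed above can be illustrated with a minimal Python sketch. It is not the SIMO implementation: the class names, the fields (`habitat`, `phylo_group`), and the sample values are all hypothetical, chosen only to show how a site, sample, sequence hierarchy lets one query filter on both kinds of characteristic at once.

```python
from dataclasses import dataclass, field


@dataclass
class Sequence:
    accession: str    # hypothetical local identifier
    phylo_group: str  # phylogenetic characteristic (assumed field)


@dataclass
class Sample:
    sample_id: str
    habitat: str      # ecological characteristic (assumed field)
    sequences: list[Sequence] = field(default_factory=list)


@dataclass
class StudySite:
    name: str
    samples: list[Sample] = field(default_factory=list)


def find_sequences(sites, phylo_group=None, habitat=None):
    """Return (site, sample, accession) triples matching the optional filters."""
    hits = []
    for site in sites:
        for sample in site.samples:
            if habitat is not None and sample.habitat != habitat:
                continue
            for seq in sample.sequences:
                if phylo_group is None or seq.phylo_group == phylo_group:
                    hits.append((site.name, sample.sample_id, seq.accession))
    return hits


# Made-up example data, for illustration only.
sites = [
    StudySite("Example Marsh", [
        Sample("S1", "sediment", [
            Sequence("SEQ001", "Alphaproteobacteria"),
            Sequence("SEQ002", "Bacteroidetes"),
        ]),
        Sample("S2", "water column", [
            Sequence("SEQ003", "Alphaproteobacteria"),
        ]),
    ]),
]

matches = find_sequences(sites, phylo_group="Alphaproteobacteria", habitat="sediment")
```

Because each sequence is reached through its sample and site, the ecological context is available to every query without being copied onto the sequence record itself.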
Conceptual Diagram of the SIMO Database
[Diagram. Column headings: Secondary/External Data, Primary Data, Metadata. Box labels: Samples, Organisms, Sequences, Phylogenetic Comparisons, Environment, Study Site, Methodology (four boxes), Phylogenetic Groups, Other Analyses, GenBank, GCE-LTER, Ancillary Data, RDP II.]
List-based data entry linked to metadata tables
Controlled vocabulary supports finely-targeted queries
Automatic hyperlinks provide links to tasks
List-based queries also simplify public interface
Phylogenetic and ecological characteristics combined dynamically to create overview and query interface
SIMO Metadata
- Metadata primarily stored in managed lists, linked to records by foreign key fields
- Scalable design – details can be added independently without altering data records
- Complete metadata for sequences generated by relational joins
- Links to external metadata in GCE-LTER database add site geography, research history, long-term environmental characteristics
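As a rough illustration of the managed-list design described above, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`site`, `method`, `sequence`, `habitat`) and all values are assumptions made for the example; the actual SIMO schema is not given in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

# Two hypothetical managed lists plus a data table that points at them.
conn.executescript("""
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    habitat TEXT
);
CREATE TABLE method (
    method_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE sequence (
    seq_id    INTEGER PRIMARY KEY,
    label     TEXT NOT NULL,
    site_id   INTEGER NOT NULL REFERENCES site(site_id),
    method_id INTEGER NOT NULL REFERENCES method(method_id)
);
""")

conn.execute("INSERT INTO site VALUES (1, 'Example Marsh', NULL)")
conn.execute("INSERT INTO method VALUES (1, 'Example extraction protocol')")
conn.execute("INSERT INTO sequence VALUES (1, 'SEQ001', 1, 1)")


def full_metadata(conn, label):
    """Assemble one sequence's metadata by joining it to the managed lists."""
    return conn.execute(
        """
        SELECT s.label, si.name, si.habitat, m.name
        FROM sequence AS s
        JOIN site   AS si ON si.site_id  = s.site_id
        JOIN method AS m  ON m.method_id = s.method_id
        WHERE s.label = ?
        """,
        (label,),
    ).fetchone()


# Detail added to the managed list later; the sequence row itself is untouched.
conn.execute("UPDATE site SET habitat = 'salt marsh' WHERE site_id = 1")
record = full_metadata(conn, "SEQ001")
```

The update touches only the `site` list, yet the joined record picks up the new detail, which is the sense in which details can be added without altering data records. With foreign keys enabled, an insert that references a site missing from the managed list is rejected, which is one simple form of content control.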
Metadata Standards
- No existing standard for environmental sequence metadata
- Sequence formats (FASTA, BIOML, BSML) designed for data parsing, sequence annotation
- SIMO metadata currently displayed in summary form on sequence detail pages
- Exploring adopting emerging standards like EML
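To see why a sequence format such as FASTA leaves little room for ecological metadata, consider the minimal parser sketched below. It assumes the common convention that each record starts with a `>` line holding an identifier followed by free text; the example records are made up.

```python
def _split_record(header, chunks):
    """Split a header into identifier and free-text description; join the sequence lines."""
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)


def parse_fasta(text):
    """Yield (identifier, description, sequence) for each FASTA record in text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _split_record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield _split_record(header, chunks)


# Made-up records, for illustration only.
example = (
    ">seq1 uncultured bacterium clone X 16S rRNA gene\n"
    "ACGT\n"
    "TTGA\n"
    ">seq2\n"
    "GGCC\n"
)
records = list(parse_fasta(example))
```

Everything other than the identifier and the residues arrives as a single unstructured description string, so site, sample, and method information would have to be encoded by private convention.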
Sequence Details
Future Directions
- Incorporating batch upload features for library submissions
- Integrating database with ‘RDP SeqMatch Agent’ programs for automatic phylogenetic analysis, sequence annotation
- Providing full metadata in formatted/printable and parsable ASCII formats (XML)
- Participating in Entrez Link-Out to provide links to SIMO sequence entries from GenBank
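The parsable XML output mentioned above might look something like the following sketch, which uses Python's standard `xml.etree.ElementTree` module. The root element name `sequenceMetadata`, the field names, and the values are hypothetical and do not follow EML or any other published schema.

```python
import xml.etree.ElementTree as ET


def metadata_to_xml(record):
    """Serialize a flat metadata dict to an XML string (keys must be valid XML names)."""
    root = ET.Element("sequenceMetadata")  # hypothetical root element
    for key, value in record.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up record, for illustration only.
record = {
    "label": "SEQ001",
    "site": "Example Marsh",
    "habitat": "salt marsh",
}
xml_text = metadata_to_xml(record)

# Round trip: any XML parser can recover the fields without screen scraping.
parsed = ET.fromstring(xml_text)
recovered = {child.tag: child.text for child in parsed}
```

A real export would more likely map onto an established schema such as EML, which the slides name as a standard under consideration.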