Ben Busby, Ph.D., Genomics Outreach Coordinator
Recording Informative Metadata for Nutrigenomics!
Environmental Variation in the Rising Era of Individual Genome Sequence
Review of terminology and concepts: Next Generation Sequencing
Graphic Credit: Spencer Martin, UBC
Review of terminology and concepts: How Genomes are Mapped and Assembled
© Martine Zilversmit 2013
http://1.usa.gov/1J1xmYs
NCBI NGS Online Workshop – Available on the NCBI YouTube Channel!
My View of Data Transfer Principles
• Metadata Search
• Rapid NoSQL (for now)
• Integration
• Non-ambiguous identifiers
• Transferring Small Amounts of Data
• Data still gets transferred in the cloud
• Underlying structure
• Finding specific data from validated formats
• Democratization of Data
• Rapid comparison by domain experts
• Reporting
• Metrics to report data upload and [unique IP] download of datasets
• Post-publication User Review
• The NCBI LinkOut Mechanism as a test suite
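The "[unique IP] download" reporting metric can be sketched with awk: count each client IP at most once per dataset. This is a minimal sketch that assumes a hypothetical, simplified log with one download per line (client IP, then dataset accession); real web server logs have richer formats and would need their own field parsing. The function name is also hypothetical.

```shell
#!/bin/sh
# Count distinct downloading IPs per dataset.
# ASSUMPTION: hypothetical log format, one download per line:
#   <client_ip> <dataset_accession>
# Real access logs (Apache, nginx, ...) need different field extraction.
unique_ip_downloads() {
    # $1: path to the log file ("-" reads stdin)
    awk '
        # Count a dataset only the first time a given IP fetches it.
        NF >= 2 && !seen[$2 SUBSEP $1]++ { count[$2]++ }
        END { for (ds in count) print ds "\t" count[ds] }
    ' "$1" | sort
}
```

Usage: `unique_ip_downloads access.log` prints one tab-separated line per dataset. Keying on the (dataset, IP) pair keeps repeated downloads by the same client from inflating the count.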
BioProject
BioSample
SRA
dbGaP
Subjects by year (chart data):
2007: 14,201
2008: 53,216
2009: 139,311
2010: 374,464
2011: 485,727
2012: 566,181
2013: 660,665
2014: 876,849
2015: 1,002,935
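BioProject, BioSample, SRA, and dbGaP records are tied together by accession, so telling accession types apart is a first step in linking them. The sketch below classifies an accession by prefix. The patterns are simplified assumptions covering common prefixes (PRJNA/PRJEB/PRJDB, SAMN/SAMEA/SAMD, SRR/ERR/DRR and related SRA prefixes, phs); they are not NCBI's complete validation rules, and the function name is hypothetical.

```shell
#!/bin/sh
# Classify an archive accession by its prefix.
# ASSUMPTION: simplified patterns for common prefixes only. Each pattern
# requires at least one digit after the prefix; trailing characters
# (version suffixes etc.) are not checked. Not an authoritative validator.
accession_type() {
    case "$1" in
        PRJNA[0-9]*|PRJEB[0-9]*|PRJDB[0-9]*) echo "BioProject" ;;
        SAMN[0-9]*|SAMEA[0-9]*|SAMD[0-9]*)   echo "BioSample" ;;
        [SED]RR[0-9]*)                       echo "SRA run" ;;
        [SED]RX[0-9]*)                       echo "SRA experiment" ;;
        [SED]RP[0-9]*)                       echo "SRA study" ;;
        phs[0-9]*)                           echo "dbGaP study" ;;
        *)                                   echo "unknown" ;;
    esac
}
```

Usage: `accession_type SRR000001` prints `SRA run`. A shell `case` keeps the rules readable and dependency-free; a real validator would check exact digit counts per archive.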
Investigation of NGS: SRA BLAST!
Investigation of NGS: Magic-BLAST!
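Magic-BLAST can align reads pulled directly from SRA by run accession against a BLAST database. The wrapper below is a minimal sketch that assumes the BLAST+ `makeblastdb` program and `magicblast` are installed and on PATH; the function names and the `DRY_RUN` switch are hypothetical conveniences added for illustration.

```shell
#!/bin/sh
# Sketch: align an SRA run against a reference with Magic-BLAST.
# ASSUMPTIONS: makeblastdb (BLAST+) and magicblast are on PATH; the
# function names and DRY_RUN switch are hypothetical.
run_or_echo() {
    # Print the command instead of executing it when DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

align_sra_run() {
    # $1: SRA run accession, $2: reference FASTA, $3: BLAST database name
    run_or_echo makeblastdb -in "$2" -dbtype nucl -out "$3" &&
    run_or_echo magicblast -sra "$1" -db "$3" -out "$1.sam"  # SAM is the default output format
}
```

Usage: `align_sra_run SRR000001 reference.fa my_ref` builds the database, then writes `SRR000001.sam`. Setting `DRY_RUN=1` first prints both commands without running them, which is handy for checking a batch script.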
Domain-specific SRA and BioSample Submission Templates
Domain-specific Bulk SRA and BioSample Submission
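Bulk submission sheets are tab-delimited with one row per sample, so a completeness check before upload catches empty required fields early. This sketch assumes a hypothetical sheet whose first row is a header; the column names in the usage example (sample_name, organism, collection_date, geo_loc_name) are illustrations, and the attributes actually required depend on the BioSample package chosen. The function name is hypothetical.

```shell
#!/bin/sh
# Report missing or empty required fields in a tab-delimited sheet.
# ASSUMPTION: hypothetical layout with a header row first; required
# columns are passed as a comma-separated list and vary by package.
check_required() {
    # $1: sheet path, $2: comma-separated required column names
    awk -F'\t' -v req="$2" '
        NR == 1 {
            n = split(req, want, ",")
            for (i = 1; i <= NF; i++) col[$i] = i   # header name -> column index
            for (j = 1; j <= n; j++)
                if (!(want[j] in col)) print "missing column: " want[j]
            next
        }
        {
            for (j = 1; j <= n; j++)
                if ((want[j] in col) && $(col[want[j]]) == "")
                    print "row " NR ": empty " want[j]
        }
    ' "$1"
}
```

Usage: `check_required samples.tsv sample_name,organism,collection_date,geo_loc_name` prints nothing when every required value is present.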
GenBank and RefSeq!
Submission to GenBank!
Superbankit!
Reporting
Foodborne Pathogens
Where to Get More Information!
The Future (in my opinion)… is already here
Ontological Standardization
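Ontological standardization maps free-text metadata values onto controlled terms. A minimal sketch, assuming a small hand-made synonym table (hypothetical here) that maps organism names to NCBI Taxonomy identifiers; a production pipeline would draw its mappings from the ontology itself rather than a hard-coded list. The function name is hypothetical.

```shell
#!/bin/sh
# Normalize free-text organism names to NCBI Taxonomy IDs.
# ASSUMPTION: the synonym table is a hypothetical, hand-made example.
normalize_organism() {
    # Lowercase and trim so "  Human " and "human" hit the same entry.
    key=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' |
          sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    case "$key" in
        "homo sapiens"|"human"|"h. sapiens")  echo "NCBITaxon:9606" ;;
        "mus musculus"|"mouse"|"m. musculus") echo "NCBITaxon:10090" ;;
        *)                                    echo "UNMAPPED:$1" ;;
    esac
}
```

Unmapped values are flagged rather than dropped, so a curator can review them and extend the table.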
Integration into a Larger Data Discovery Framework
BD2K - bioCADDIE
Example: GOLD (JGI)
E-Utilities (Eutils)
Video available at: http://www.ncbi.nlm.nih.gov/education/webinars/
Introducing… Entrez Direct: The E-utilities on the UNIX command line
esearch -db gene -query "foxp2[gene] AND human[orgn]" | \
elink -target protein -name gene_protein_refseq | \
efetch -format fasta
ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/
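The FOXP2 pipeline can be wrapped in a small shell function so the same search runs for any gene and organism. This is a sketch: the function names are hypothetical, and it assumes the Entrez Direct tools (`esearch`, `elink`, `efetch`) from the FTP site are installed and on PATH.

```shell
#!/bin/sh
# Sketch: parameterized esearch | elink | efetch pipeline.
# ASSUMPTION: Entrez Direct is installed; function names are hypothetical.
gene_query() {
    # $1: gene symbol, $2: organism -> Entrez query string
    printf '%s[gene] AND %s[orgn]' "$1" "$2"
}

refseq_proteins_fasta() {
    # Search Gene, follow the RefSeq protein links, and fetch FASTA.
    esearch -db gene -query "$(gene_query "$1" "$2")" |
        elink -target protein -name gene_protein_refseq |
        efetch -format fasta
}
```

Usage: `refseq_proteins_fasta foxp2 human > foxp2_human.fa`. Keeping query construction in its own function makes it easy to test without touching the network.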
Moving from FTP-scraping cron jobs to on-demand APIs
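Rather than mirroring an FTP directory on a schedule, a client can request exactly the records it needs when it needs them. The sketch below builds an ESearch request URL; `db`, `term`, `retmax`, and `retmode` are standard ESearch parameters, while the helper names and the minimal URL encoding (spaces and square brackets only) are simplifying assumptions.

```shell
#!/bin/sh
# Sketch: build an on-demand ESearch request instead of scraping FTP.
# ASSUMPTION: minimal encoding handles only spaces and square brackets
# (characters such as & or % are NOT encoded); helper names are hypothetical.
EUTILS_BASE="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

encode_term() {
    printf '%s' "$1" | sed 's/ /%20/g; s/\[/%5B/g; s/]/%5D/g'
}

esearch_url() {
    # $1: database, $2: query term, $3: max IDs to return (default 20 here)
    printf '%s/esearch.fcgi?db=%s&term=%s&retmax=%s&retmode=json' \
        "$EUTILS_BASE" "$1" "$(encode_term "$2")" "${3:-20}"
}
```

Usage: `curl -s "$(esearch_url gene 'foxp2[gene] AND human[orgn]')"` returns a JSON list of matching Gene IDs, fetched on demand rather than from a nightly mirror.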
EDirect Cookbook (DRAFT)
Generating apps that work with our APIs and data structures, and improve metadata:
NCBI Hackathons!
2015-2016, 8 Hackathons
Many functional software products in 3 days
An Educational Resource for RNA-seq
Available to anyone on AWS
Part of an online workshop: the first 5 lectures are now available
Hackathons: January 2016, 6 functional software products in 3 days
In April, July, August, and October 2016, we built on those projects.
Finding immunogenic peptides from single RNA-seq samples
DangerTrack: Difficult-to-assess regions
The combined score is the average of the SV, mappability, GC, … scores.
NCBI region list
ENCODE blacklist
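The combined score is an average of per-region component scores, which awk computes in one pass. A sketch under stated assumptions: a hypothetical tab-separated file with chromosome, start, end, then one column per component score (for example SV, mappability, GC), each already scaled to a comparable range. DangerTrack's actual inputs and scaling may differ, and the function name is hypothetical.

```shell
#!/bin/sh
# Average per-bin component scores into one combined score.
# ASSUMPTION: hypothetical columns chrom, start, end, score1..scoreN,
# with every component already normalized to the same scale.
combined_score() {
    awk -F'\t' '
        NF > 3 {
            sum = 0
            for (i = 4; i <= NF; i++) sum += $i   # add each component score
            printf "%s\t%s\t%s\t%.3f\n", $1, $2, $3, sum / (NF - 3)
        }
    ' "$1"
}
```

An unweighted mean treats every component equally; if one component should dominate, a weighted sum would be the natural variant.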
Get More Info!
On Twitter: @NCBI, @DCGenomics
In 2017 we will Build on Those Projects!
Biomedical Informatics Hackathon: January 9-11, NIH Campus, Bethesda!
NCBI Genomics Hackathon: March 20-22, NIH Campus, Bethesda