Bioinformatics-in-a-Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive...
Bioinformatics-in-a-Box 04/18/15
Vermont Genetics Network
Professional Development Event
Pomeroy Alumni Center, St. Michael’s College, Colchester, VT
Faye D. Schilkey, National Center for Genome Resources
NCGR: National Center for Genome Resources
• Not-for-profit research organization
• Formed: 1994 in Santa Fe, NM
• Expertise: Bioinformatics (21 yrs) and Next Gen Sequencing (8 yrs)
• Applies bioinformatics, software engineering and next-generation sequencing to solve the -omic challenges of the 21st century
• Collaborative research and services
Faye D. Schilkey, BS Computer Engineering
• First Career: Software engineering in automotive (robotics) and aerospace (guidance and autopilot systems).
• Second Career (Big Data):
  – IT/Software Engineering/Database Development in Genomics/Bioinformatics (> 15 yrs)
  – Genome Sequencing Center Operations & Services (8 yrs)
  – Director, NM INBRE Bioinformatics Core (9 yrs)
  – Director, NM INBRE Sequencing & Bioinformatics Core (8 yrs)
  – Founding Steering Committee Member of Network of IDeA-funded Core Laboratories NICL (6 yrs)
Agenda
• NCGR
• NM-INBRE Sequencing & Bioinformatics Core (SBC) and IDeA research advancement
• Sequencing and bioinformatics technologies
• Bioinformatics-in-a-Box
• Collaboration/education avenues
• Summer Bioinformatics Intensive Internship
• NM Bioinformatics, Science and Technology (NMBIST) conference
• Sequencing and bioinformatics project ideas
• Conclusion and discussion
Research at NCGR
• Focus
  – Human health and nutrition
  – Plant science
• > 200 publications
AJ Brass Foundation
Human Health Research Publications at NCGR
Dengue virus infection (Virology 2015)
Vibrio cholerae (Genomics Discovery 2014)
Guinea Pig (Genome Announc 2013)
Eyeless Hedgehog (PLoS One 2012)
Carrier Screening (Beyond Batten - Sci Transl Med 2011)
Multiple Sclerosis (Twins study – Nature April 29, 2010 cover)
Sepsis (J Clin Microbiol. 2010)
Korean Genome (Nature 2009)
Mesothelioma (Proc Natl Acad Sci 2008)
Schizophrenia (PLoS One 2008)
Plant/Animal/Fungus/Bacteria Science
• Medicago truncatula (Barrel clover) HapMap (500 Mb) – Cornell, UVM, JCVI, NSF, UCSC, INRA-Montpellier, ENSAT-Toulouse, Boyce Thompson Inst. – Samuel Roberts Noble Foundation
• Medicago sativa (Alfalfa) Genome (860 Mb) – Samuel Roberts Noble Foundation
• Theobroma cacao (Chocolate) Genome (330 Mb) – USDA-ARS & Mars, Inc., Washington State University, JGI, USDA-ARS, IBM, PIPRA, CUGI
• Glycine max (Soybean) (1 Gbp) and Zea mays (Maize) (2 Gb) Genetic Diversity – Syngenta
• Sorghum Transcriptome – USDA-ARS
• Gossypium arboreum (Cotton) Genome (1.7 Gbp) – Texas Tech University & Bayer Crop Sciences
• Phytophthora capsici (100 Mbp) – Univ. of Tennessee, Ohio State Univ., USDA/NSF
• Legume Disease Resistance – National Science Foundation, University of California – Davis
Plant/Animal/Fungus/Bacteria Science
• Chickpea & Pigeon Pea Diversity – CIMMYT - Generation Challenge Program, ICRISAT
• Andean Birds (Hummingbird) Transcriptome (1 Gbp) – UNM, NSF
• Green Microalga (85 Mbp) and Diatom strain RGd-1 (25 Mbp) Genomes – Center for Biofilm Engineering, Montana State University
• Staphylococcus aureus strains (3 Mbp) – NMSU, OSU, NIH, NM-INBRE
• Burkholderia glumae (rice blight) genome (7.3 Mbp) – Louisiana State University
• Bacteroides xylanisolvens strains (6 Mbp) – USDA-ARS, DARPA, Vital Probes
• Polaromonas sp. strain CG9_12 (pollutant degradation) Genome (5 Mbp) – Center for Biofilm Engineering, Montana State University
• Kibdelosporangium sp. MJ126-NF4 (Actinobacteria having natural products: anti-bacteria/viral/cancer) Genome (11 Mbp) – UNM
NM-INBRE Sequencing and Bioinformatics Core (SBC) research advancement
[Bar chart: number of projects, pubs, and grants per year, 2008-2014 (scale 0-20). Legend: Projects (66), Grants Awarded/Continued (30), Pubs in press (31)]
Serving to date > 160 researchers/postdocs/students
NM INBRE SBC Collaborations > 65 projects
[Map: number of projects per collaborating site. Legend entries: INBRE 40, HHMI-SEA Phage 9, INBRE 17; periods 2008-2013 and 2014-2015]
2014-2015 pilot awardees
• Dr. Charles “Chad” Melancon - “De Novo Genome Sequencing of Novel Bacterial Isolates from Cave Environments.” - UNM
• Dr. Douglas J. Perkins - “Discovery of Genetic Biomarkers for Severe Malaria” - UNM
• Dr. Rebecca A. Reiss - “Nanoinformatics: Characterizing Cell Proliferation on Nanostructured Titanium” - NM Institute of Mining & Tech
• Dr. Travis R. Robbins - “Comparing genomic variation caused by invasion of a novel threat versus geographic separation of populations” - NNMC
• Dr. Alvaro Romero - “Study of transcriptional changes upon dengue virus infection in the Asian tiger mosquito, Aedes albopictus” - NMSU
• Dr. Hitoshi Tsujimoto - “Study of transcriptional changes upon dengue virus infection in the Asian tiger mosquito, Aedes albopictus” - NMSU
• Drs. Ben Wheaton & Rob Miller - “The role of the immune system in spinal cord injury and recovery.” - UNM
• Dr. Tim Wright - “Genomic Approaches to Detecting Evolutionary Responses in Biological Invaders” - NMSU
2014-2015 pilot awardees (cont.)
• Dr. Colleen Fordyce - “Cellular pH during carcinogenesis and how pH can be exploited for therapeutic benefit” - UNM
• Dr. Michael Franklin - “Epigenetics of Pseudomonas aeruginosa during biofilm growth” - Montana State
• Dr. Kathryn A. Hanley - “Quasispecies Dynamics of West Nile Virus in Avian Reservoir Hosts” - NMSU
• Dr. Zoe Harrold - “Fire and Ice: metagenomic investigations of a unique sub-glacial ice cave system” - Montana State
• Dr. Mario Izaquierre-Sierra - “Transposable Element Regulation in Land Plants: Arabidopsis coilin and Cajal bodies, a case study.” - NNMC
• Dr. Thomas L. Kieft - “Metagenomic Sequencing of U-Contaminated Soils and Sediments” - NMTech
• Dr. Samuel A. Lee - “Illumina RNA-seq expression analysis of cranberry-derived proanthocyanidins for the prevention of Candida albicans urinary biofilms” - UNM
• Dr. Nora Perrone-Bizzozero - “Identification of KSRP neuronal RNA targets by RIP-Seq” - UNM
• Dr. Giancarlo Lopez-Martinez - “The transcriptomics of low-oxygen hormesis and irradiation: What drives the strong organismal performance improvement?” - NMSU
Next Gen Sequencing Applications
Digital Transcript Expression
Small RNA Discovery & Expression
ChIP-SEQ
Genome Structural Variation
Mutation Frequencies
DNAse1 HS Sites
Genetic Association
De novo genome Sequencing
DNA Methylation
Metagenomics
Exome Sequencing
Splice Isoform Abundance
SBC technologies accelerate IDeA research: Sequencing
Illumina HiSeq2000:
• RNA, DNA, microRNA, and ChIP seq
• 1x and 2x 50/100bp read lengths, ~300Gb yield/10-day run
PacBio RS II: Single Molecule Real-Time observation of DNA synthesis
• No PCR bias, faster and accurate P6 polymerase
• ~8000bp average read lengths
• > 40kb read lengths
• > 500Mb per v3 SMRT Cell
• 8-16Gb yield per 16 cell run in 48 hours
• DNA, De novo assembly, Base modification detection
• IsoSeq: Determine the transcript landscape of your organism by sequencing full-length transcripts and gene isoforms. No assembly required!
Why Sequence mRNA?
1. Cost Effective: Transcriptome ≈ 2% Genome
2. Biologically relevant – active in affected cell or tissue
3. Enables genomic congruence analysis (gene expression, isoform usage and non-synonymous variant information)
4. Identifies mutations that are not apparent by genome sequencing (epigenetic silencing, RNA editing, allele-specific expression)
Drew Sheneman, New Jersey -- The Newark Star Ledger
de novo Assembly
Bioinformatics
1) New simple bioinformatics tool for “biologists”
Focused on the most popular Next Gen Sequencing experiments:
• RNA-Seq (expression analysis)
• DNA-Seq (mutation detection)
• microRNA seq analysis
2) Custom bioinformatics for de novo/hybrid assemblies, ChIP, metagenomics, etc.
RNA-Seq Analysis What’s involved?
QUALITY CHECK TOOLS
Bioinformatician obtains data and downloads, installs, updates and runs…
• FastQC: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
  – Evaluate data quality based on several benchmarks (seq quality, GC content)
  – Easy to read report
  – Important to verify that the samples have consistent quality
• BLAST: http://www.ncbi.nlm.nih.gov/books/NBK1762/
  – Verify species
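To make the two benchmarks named on this slide (sequence quality and GC content) concrete, here is a minimal Python sketch that computes them per read from FASTQ text. It is not FastQC and does not reproduce its report. It assumes four-line FASTQ records and Phred+33 quality encoding, and the function names and example reads are made up for illustration.

```python
from statistics import mean

def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from an iterable of FASTQ text lines."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    # Assumption: each read occupies exactly 4 lines (@id, sequence, +, quality).
    for i in range(0, len(rows) - 3, 4):
        header, seq, _plus, qual = rows[i:i + 4]
        yield header[1:], seq, qual

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return mean(ord(ch) - offset for ch in qual)

# Two invented reads: one high quality, one with the lowest possible scores.
EXAMPLE = """@read1
GATTACA
+
IIIIIII
@read2
GGCC
+
!!!!
"""

if __name__ == "__main__":
    for read_id, seq, qual in parse_fastq(EXAMPLE.splitlines()):
        print(read_id, round(gc_fraction(seq), 2), mean_phred(qual))
```

Comparing these per-read summaries across samples is one simple way to check the "consistent quality" point above before moving on to mapping.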
TOOLS TO ALIGN/MAP READS TO GENOME
Bioinformatician downloads, installs, updates and runs…
Popular alignment algorithm
• Tophat 2.0: http://ccb.jhu.edu/software/tophat/index.shtml
• Tophat 1.2/1.3
But what genome (and version) are you mapping against?
• UCSC: ftp://hgdownload.cse.ucsc.edu/goldenPath/
• NCBI: ftp://ftp.ncbi.nih.gov/genomes/
• Ensembl: ftp://ftp.ensembl.org/pub/
• Custom
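The question "what genome (and version) are you mapping against?" is easiest to answer later if the genome source and build are recorded next to the exact command that was run. The sketch below only builds a TopHat command line and a small provenance record; it does not execute anything. The index prefix, file names, and the `hg19` build are placeholders chosen for illustration, and `tophat_command` and `run_record` are hypothetical helper names. The options used (`-o` output directory, `-p` threads, `-G` annotation GTF) are standard TopHat options.

```python
import shlex

def tophat_command(index_prefix, reads, out_dir, threads=4, gtf=None):
    """Build a TopHat argument list: tophat [options] <index> <reads...>."""
    cmd = ["tophat", "-o", out_dir, "-p", str(threads)]
    if gtf is not None:
        cmd += ["-G", gtf]  # optional gene annotation to guide spliced alignment
    cmd.append(index_prefix)
    cmd.extend(reads)       # for paired-end data: mate 1 file, then mate 2 file
    return cmd

def run_record(genome_source, genome_build, cmd):
    """Keep the genome provenance together with the exact command string."""
    return {
        "genome_source": genome_source,
        "genome_build": genome_build,
        "command": shlex.join(cmd),
    }

if __name__ == "__main__":
    cmd = tophat_command("hg19_index", ["s_1.fastq", "s_2.fastq"],
                         "tophat_out", threads=8, gtf="genes.gtf")
    print(run_record("UCSC", "hg19", cmd))
```

Passing the returned list to `subprocess.run` would launch the alignment; storing the record alongside the output is one way to make a later "what if" rerun reproducible.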
READ QUANTIFICATION TOOLS
Bioinformatician downloads, installs, updates and runs…
• HTSeq-count: http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
  – Raw hit count
  – Transcript or Gene-based results
• Cufflinks: http://cufflinks.cbcb.umd.edu/
  – Normalizing, transcript-based quantification
  – FPKM/RPKM values
  – Gene-based values are aggregates
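The difference between a raw hit count and an FPKM value can be shown with the textbook formula: fragments per kilobase of transcript per million mapped fragments. The sketch below applies only that formula. Cufflinks itself estimates abundances with a statistical model, so this is not a reimplementation of it, and the gene names, lengths, and counts are invented.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments * 1e9 / (length in bp * total mapped fragments)."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (length_bp * total_mapped_fragments)

def fpkm_table(counts, lengths, total=None):
    """Convert a {gene: raw count} dict into {gene: FPKM}."""
    # If no library size is given, fall back to the sum of the supplied counts.
    total = sum(counts.values()) if total is None else total
    return {gene: fpkm(n, lengths[gene], total) for gene, n in counts.items()}

if __name__ == "__main__":
    counts = {"geneA": 500, "geneB": 500}      # same raw hit count
    lengths = {"geneA": 2000, "geneB": 1000}   # geneA is twice as long
    print(fpkm_table(counts, lengths, total=1_000_000))
```

With equal raw counts, the shorter gene gets the higher FPKM, which is why the hit-count tools and the FPKM tools listed above feed different downstream analyses.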
EXPRESSION ANALYSIS TOOLS
Bioinformatician downloads, installs, updates and runs…
• DESeq: http://bioconductor.org/packages/release/bioc/html/DESeq.html
  – Requires up-to-date R installation; works with raw-hit-count values
• edgeR: http://www.bioconductor.org/packages/release/bioc/html/edgeR.html
  – Requires up-to-date R installation; works with raw-hit-count values
• Cuffdiff: http://cufflinks.cbcb.umd.edu/
  – Part of Cufflinks, new version also works with CSV files; works with FPKM values
COLLECT AND INTEGRATE ANNOTATION
ENSEMBL: http://www.ensembl.org/info/docs/api/index.html
NCBI: http://www.ncbi.nlm.nih.gov/refseq/
GO Interactive: http://amigo.geneontology.org/amigo
KEGG Interactive: http://www.genome.jp/kegg/genes.html
PubMed: http://www.ncbi.nlm.nih.gov/pubmed
Bioinformatician downloads, installs, updates resources and writes scripts to….
Bioinformatician writes custom scripts in Perl, AWK and Python to
Find significant genes/ elements
Compare analysis results
RNA-Seq experiment and analysis ….. and analysis ….
Experimental design
Sequencing provider
Sequence files
Bioinformatician downloads, installs/updates various tools and performs:
• Quality Checks
• Read Mapping to Genome
• Quantification of reads
• Expression Analysis (e.g. DESeq)
• Annotation
• Significant gene discovery
• Result comparison
Results: 2 months later, after hard work by Bioinformatician
“What if?” requires analysis to be repeated
Enter Bioinformatics-in-a-Box™
Web-based tool for organized data management and analysis of NGS data
Bioinformatics-in-a-Box!
A Bioinformatics tool for “Biologists” and Bioinformaticians with large workloads!
• Securely share: results, analysis steps, work together!
• Collaborate
• 365x24 results
• Publish faster
• Easily execute “what if” questions
• Support provided every step of the way to ensure success
• Organized analysis artifacts
• Parameter tracking
• Computation power/disk is in the cloud or on your hardware
Example: Start with an RNA-Seq Data set
• Six Samples
  – 3 Normal Prostate and 3 Prostate Adenocarcinoma Samples
• SRA Project
  – SRP003611
• Publication
  – Nacu, S., et al., Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples. BMC Med Genomics, 2011. 4: p. 11.
Bioinformatics-in-a-Box™: Obtain the Data
RNA Seq Experiment
• Load your own data or from SRA
• Combine Technical Replicates
Bioinformatics-in-a-Box: Quality Check
• A click to run FastQC
• A click to run BLAST & align to NCBI All Genomes Database (nr)
Bioinformatics-in-a-Box: Read Quantification
Integrated Tools:
• Cufflinks: FPKM values
• HTSeq-count: Hit Count values
Bioinformatics-in-a-Box: Expression Analysis
Cuffdiff: takes FPKM
• Genes, isoforms, TSS
edgeR: takes Hit Count
• Genes or isoforms
DESeq: takes Hit Count
• Genes or isoforms
Bioinformatics-in-a-Box: Integrated Annotations
• ENSEMBL
• NCBI
• GO
• KEGG
• PubMed
Bioinformatics-in-a-Box: Find Significant Genes/Elements
Set filter criteria
• P-value
• Adjusted p-value
• Fold change
• Absolute expression
Save your subset of genes
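The filter criteria listed on this slide can be sketched in a few lines of Python. The example below is an illustration only, not the tool's implementation: it assumes the adjusted p-value is a Benjamini-Hochberg correction (the slide does not say which adjustment is used), and the row fields `pvalue`, `log2fc`, and `mean_expr`, the thresholds, and the gene names are all hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_significant(rows, max_padj=0.05, min_abs_log2fc=1.0, min_mean_expr=0.0):
    """Keep rows passing adjusted p-value, fold change and expression cutoffs."""
    padj = benjamini_hochberg([row["pvalue"] for row in rows])
    selected = []
    for row, q in zip(rows, padj):
        if (q <= max_padj
                and abs(row["log2fc"]) >= min_abs_log2fc
                and row["mean_expr"] >= min_mean_expr):
            selected.append({**row, "padj": q})
    return selected

if __name__ == "__main__":
    rows = [  # invented differential-expression results
        {"gene": "g1", "pvalue": 0.005, "log2fc": 2.0, "mean_expr": 100.0},
        {"gene": "g2", "pvalue": 0.01, "log2fc": 0.2, "mean_expr": 50.0},
        {"gene": "g3", "pvalue": 0.03, "log2fc": -1.5, "mean_expr": 5.0},
        {"gene": "g4", "pvalue": 0.04, "log2fc": 3.0, "mean_expr": 0.5},
    ]
    for hit in select_significant(rows, min_mean_expr=1.0):
        print(hit["gene"], round(hit["padj"], 3))
```

Saving the returned list corresponds to the "save your subset of genes" step; changing a threshold and rerunning is the kind of "what if" question the earlier slides describe.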
Bioinformatics-in-a-Box: Compare Results
Single Sample (T1 vs. C1) vs. Biological Replicates (T1,2,3 vs. C1,2,3)
[Venn diagram; region counts: 5369, 1317, 962]
Indicates too many false positives with single-sample comparisons
Conclusion: Biological replicates are preferable
Bioinformatics-in-a-Box: Compare DE Results
DESeq vs. edgeR vs. Cuffdiff
Indicates many differences between algorithms!
Conclusion: It is advisable to use multiple algorithms
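A comparison like this comes down to counting how many genes each combination of algorithms reports, which is what a Venn diagram displays. The sketch below does that for any number of named gene lists. The gene identifiers are invented, `overlap_counts` is a hypothetical helper, and no real DESeq, edgeR, or Cuffdiff output is used.

```python
from collections import Counter

def overlap_counts(named_sets):
    """Count genes by the exact combination of result sets that contain them."""
    sets = {name: set(genes) for name, genes in named_sets.items()}
    all_genes = set().union(*sets.values())
    counts = Counter()
    for gene in all_genes:
        # The key is the group of algorithms that called this gene.
        members = frozenset(name for name, genes in sets.items() if gene in genes)
        counts[members] += 1
    return counts

if __name__ == "__main__":
    calls = {  # invented significant-gene lists per algorithm
        "DESeq": ["A", "B", "C"],
        "edgeR": ["B", "C", "D"],
        "Cuffdiff": ["C", "E"],
    }
    for members, n in sorted(overlap_counts(calls).items(),
                             key=lambda kv: sorted(kv[0])):
        print(" & ".join(sorted(members)), n)
```

Genes reported by every algorithm are the most conservative candidates; genes reported by only one are worth a closer look before follow-up.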
Bioinformatics-in-a-Box: Compare Results
Limma vs. NGS Algorithms
Conclusion: Limma found genes undetected by NGS tools
Bioinformatics-in-a-Box: Compare Results
Limma vs. NGS Algorithms
Limma detects differential genes missed by edgeR & DESeq
Conclusion: Traditional algorithms can be useful for analyzing NGS data
DNA-Seq Mutation Analysis
DNA-Seq Mutation Analysis: Analysis steps
1. Obtain and load data
2. Quality check
3. Align to genome
   • Bowtie, Bowtie2, BWA
4. Check actual coverage (optional)
5. Mutation detection
   • GATK, samtools, pindel
6. Compare results
Start with Data set: Human Exome
• Enrichment: Agilent Sure Select v4
• Configuration: 2x100; Approximately 100 million reads
• Theoretical average coverage: ~130x
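Theoretical average coverage is usually estimated as total sequenced bases divided by the size of the targeted region. The slide does not say how its figure was derived, so the sketch below shows only the general formula. It assumes every read is 100 bp and lands on target, and the 75 Mb target size is a purely illustrative placeholder, not a value taken from the slide or from the enrichment kit.

```python
def theoretical_coverage(n_reads, read_length_bp, target_size_bp):
    """Average depth = total sequenced bases / target size (all reads on target)."""
    if target_size_bp <= 0:
        raise ValueError("target size must be positive")
    return n_reads * read_length_bp / target_size_bp

if __name__ == "__main__":
    # 100 million reads of 100 bp; the 75 Mb target is a hypothetical value.
    depth = theoretical_coverage(100_000_000, 100, 75_000_000)
    print(f"{depth:.0f}x")
```

Actual coverage falls below this estimate because some reads are duplicates, fail to map, or land off target, which is why a separate coverage check appears among the analysis steps.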
Bioinformatics-in-a-Box: Quality Check
• Note quality drop-off after base 60
Bioinformatics-in-a-Box: Align to Genome
Set mapping parameters, including trimming
Set pairing parameters
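Trimming before mapping typically means removing low-quality bases from the 3' end of each read. The slides do not describe the tool's trimming method, so the sketch below shows one simple variant under stated assumptions: Phred+33 encoding and a hypothetical threshold of Q20. It cuts trailing bases until it reaches one that meets the threshold.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose Phred score is below min_q (Phred+33 assumed)."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    end = len(seq)
    # Move the cut point left while the last kept base is below the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

if __name__ == "__main__":
    # 'I' encodes Q40, '5' encodes Q20, '!' encodes Q0.
    print(trim_3prime("ACGTACGT", "IIII5!!!"))
```

A fixed-position cut (for example, keeping only the first 60 bases of every read) is another common choice when the quality drop-off is consistent across reads.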
Bioinformatics-in-a-Box: Check actual coverage
Lower than theoretical, as expected
Bioinformatics-in-a-Box: Integrated tools for SNP Detection
• GATK: https://www.broadinstitute.org/gatk/
• Samtools: http://samtools.sourceforge.net/
• FreeBayes: https://github.com/ekg/freebayes
Longer INDELs (> ~10 bp) and other SV
• Pindel: http://gmt.genome.wustl.edu/pindel/current/
Bioinformatics-in-a-Box™: Mutation detection
Select an algorithm of choice
Set pre-processing options
SNP & INDEL Detection by hand
• Using scripts, Integrate Annotation
  – dbSNP, 1000genomes: URL API is slow, recommend local database installation
  – Classification snpEff: http://snpeff.sourceforge.net/
• Selection, result comparison
  – Algorithm-specific filtering
  – Perl, Python, etc.
• Using scripts, filter by location, coverage, quality, type of mutation, codon impact, protein impact, clinical impact
• Using scripts, compare results
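As an illustration of the kind of hand-written filtering script described here, the sketch below reads VCF data lines and keeps variants that pass quality, coverage, and optional location cutoffs. It is a simplified example, not a full VCF parser: it assumes one ALT allele per line and a `DP` (depth) entry in the INFO column, and the thresholds and the four example records are invented.

```python
def parse_vcf_line(line):
    """Parse the first 8 tab-separated VCF columns into a small dict."""
    chrom, pos, var_id, ref, alt, qual, _flt, info = line.rstrip("\n").split("\t")[:8]
    # INFO is a ';'-separated list of key=value pairs or bare flags.
    info_map = dict(item.partition("=")[::2] for item in info.split(";"))
    return {
        "chrom": chrom, "pos": int(pos), "id": var_id, "ref": ref, "alt": alt,
        "qual": None if qual == "." else float(qual),
        "depth": int(info_map["DP"]) if "DP" in info_map else None,
        # Simplification: single-base REF and ALT is a SNP, anything else an INDEL.
        "type": "SNP" if len(ref) == 1 and len(alt) == 1 else "INDEL",
    }

def filter_variants(lines, min_qual=30.0, min_depth=10, region=None):
    """Keep variants meeting quality/depth cutoffs, optionally within a region."""
    kept = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        v = parse_vcf_line(line)
        if v["qual"] is None or v["qual"] < min_qual:
            continue
        if v["depth"] is None or v["depth"] < min_depth:
            continue
        if region is not None:
            chrom, start, end = region
            if v["chrom"] != chrom or not (start <= v["pos"] <= end):
                continue
        kept.append(v)
    return kept

EXAMPLE_VCF = [  # invented records
    "##fileformat=VCFv4.1",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
    "\t".join(["chr1", "1000", ".", "A", "G", "50", "PASS", "DP=40"]),
    "\t".join(["chr1", "2000", ".", "C", "T", "12", "PASS", "DP=35"]),
    "\t".join(["chr2", "3000", ".", "AT", "A", "99", "PASS", "DP=8"]),
    "\t".join(["chr2", "4000", ".", "G", "GA", "80", "PASS", "DP=25;DB"]),
]

if __name__ == "__main__":
    for v in filter_variants(EXAMPLE_VCF):
        print(v["chrom"], v["pos"], v["type"], v["qual"], v["depth"])
```

Filters on codon, protein, or clinical impact would need annotation from a tool such as snpEff first; this sketch covers only the fields present in the raw call file.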
Bioinformatics-in-a-Box™: Integrated Annotations
• Known SNP
• Location, gene (if appropriate)
• Codon, amino-acid, protein impact
• Up-stream/down-stream sequences, quality, coverage, allele frequency
Bioinformatics-in-a-Box™: SNP Details
Bioinformatics-in-a-Box™: SNP Quality
Mutation Viewer
Bioinformatics-in-a-Box™: Insertion
Mutation Viewer
![Page 56: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/56.jpg)
Bioinformatics-in-a-Box™: Deletion
Mutation Viewer
![Page 57: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/57.jpg)
Bioinformatics-in-a-Box™: Integrated Annotations
• Known SNP
• Location, gene (if appropriate)
• Codon, amino-acid, protein impact
• Up-stream/down-stream sequences, quality, coverage, allele frequency
![Page 58: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/58.jpg)
Bioinformatics-in-a-Box: Selecting SNPs and INDELs
• Filter by quality, location, impact, etc.
• Save dataset
![Page 59: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/59.jpg)
Bioinformatics-in-a-Box: Compare SNP results GATK versus Sam
Different algorithms generate different results
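One simple way to quantify how much two call sets differ is to key each variant by (chrom, pos, ref, alt) and take set intersections and differences. The sketch below assumes the calls are already available as tuples; the function name, the caller labels, and the example data are illustrative only and are not results from the talk.

```python
def compare_call_sets(calls_a, calls_b):
    """Compare two iterables of (chrom, pos, ref, alt) variant keys."""
    set_a, set_b = set(calls_a), set(calls_b)
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "shared": sorted(shared),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
        # Fraction of all distinct calls that both callers agree on.
        "concordance": len(shared) / len(union) if union else 1.0,
    }


# Made-up calls standing in for the output of two different callers.
caller_a = [("chr1", 100, "A", "G"), ("chr1", 250, "T", "C"), ("chr2", 400, "C", "CTT")]
caller_b = [("chr1", 100, "A", "G"), ("chr2", 400, "C", "CTT"), ("chr3", 77, "G", "A")]

summary = compare_call_sets(caller_a, caller_b)
print("shared:", len(summary["shared"]))
print("only in A:", summary["only_a"])
print("only in B:", summary["only_b"])
print("concordance:", summary["concordance"])
```

Exact-match keys are a deliberate simplification: the same indel can be written at different positions by different tools, so a real comparison would usually normalize variant representation first.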
![Page 60: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/60.jpg)
Bioinformatics-in-a-Box: Compare SNP results using Samtools & BWA versus Bowtie2
Different ALIGNMENT algorithms generate different results
![Page 61: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/61.jpg)
Bioinformatics-in-a-Box™: Compare InDel results using Samtools & BWA vs. Bowtie2
Different ALIGNERS generate different results
![Page 62: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/62.jpg)
End of DNA Mutation Detection
![Page 63: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/63.jpg)
Bioinformatics-in-a-Box: 365x24
• Data analysis
  • Peer reviewed algorithms
  • RNA-Seq, SNP Detection and Genotyping and miRNA-Seq
  • What if? Scenarios
• Data management
  • Linking all primary data, algorithms, genome references, parameters with results
  • Breadcrumb trail of what has been done, with what settings and versions (algorithms and references)
• Secure worldwide collaboration
• Hands-on support (and documentation if you must…)
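To make the "breadcrumb trail" idea concrete, here is a minimal sketch of a provenance record that ties an input file, a tool and its version, a reference, and the parameters to one analysis run. The field names, the helper function, and every value in the example are hypothetical; this is not how Bioinformatics-in-a-Box stores its history.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_provenance_record(input_name, input_bytes, tool, tool_version,
                           reference, parameters):
    """Build a JSON-serializable record of what was run, on what, and how."""
    return {
        "input": {
            "name": input_name,
            # A checksum pins the exact data, not just the file name.
            "sha256": hashlib.sha256(input_bytes).hexdigest(),
        },
        "tool": {"name": tool, "version": tool_version},
        "reference": reference,
        "parameters": dict(sorted(parameters.items())),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# All names and versions below are placeholders.
record = make_provenance_record(
    input_name="sample_reads.fastq",
    input_bytes=b"@read1\nACGT\n+\nIIII\n",
    tool="example_aligner",
    tool_version="0.0.0-example",
    reference={"name": "example_genome", "version": "build-placeholder"},
    parameters={"threads": 4, "min_mapq": 20},
)
print(json.dumps(record, indent=2))
```

Storing one such record alongside each result file is enough to answer "which settings and versions produced this?" later, and it makes "what if?" reruns with changed parameters easy to tell apart.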
![Page 64: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/64.jpg)
NCGR/NM-INBRE Bioinformatics Internship
The National Center for Genome Resources & NM-INBRE Present
June 15, 2015 - July 31, 2015 (tentative dates)
7-Week Intensive Program
• June 15 – June 26: 2 weeks of instruction
• June 29 – July 31: 5 weeks of hands-on projects, including a presentation of your work
Deadline to apply: 11:59pm Thursday, April 30, 2015
SPACE IS LIMITED
Targeted towards: Grads and undergrads
PREREQUISITE: The program requires some knowledge of UNIX and includes prerequisite reading and understanding of chapters 4 and 5 of the following: http://my.safaribooksonline.com/book/bioinformatics/1565926641
![Page 65: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/65.jpg)
Annual Educational Symposia
New Mexico BioInformatics, Science and Technology (NMBIST) Symposium
“Transcriptional Control”
March 26-27, 2015, Drury Plaza Hotel, Santa Fe, NM
- Experts in the field
- Student poster session
- Student speaking slot competition
- Highlights: Dr. Klemens Hartel
![Page 66: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/66.jpg)
Sequencing and Bioinformatics project ideas
1) Small genome sequencing and de novo assembly: draft assemblies for genomes up to 100Mb in size.
  • PacBio-only sequencing and assembly
  • Illumina-only assembly
  • PacBio/Illumina/454 hybrid assembly approaches
2) PacBio sequencing and analysis; projects include:
  • IsoSeq pilot
  • Base Modification Detection
3) Illumina genomic sequencing and mutation detection
4) Illumina RNA-seq or miRNA-seq and expression analysis
5) Bioinformatics only
6) Custom
![Page 67: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/67.jpg)
Conclusion and discussion
The NM-INBRE SBC has the resources and track record to advance your research:
• Sequencing: Illumina and PacBio technologies
• Bioinformatics: Standard pipelines and custom analysis
Work with VGN to impact science!
Please contact us to find out more at [email protected]!
![Page 68: Bioinforma)cs,in,a,Box. 04/18/15. - Vermont Genetics Network · Summer Bioinformatics Intensive Internship ! NM Bioinformatics, Science and Technology (NMBIST) ... conference ! Sequencing](https://reader033.fdocuments.in/reader033/viewer/2022052722/5f0bcb6c7e708231d4323fbf/html5/thumbnails/68.jpg)
Acknowledgments
NCGR/NMINBRE Sequencing and Bioinformatics Core
Science/Bioinformatics Sequencing Lab *Anitha Sundararajan Peter Nagm *Johnny Sena Jennifer Jacobi Joann Mudge Pooja Umale Nico Devitt Thiru Ramaraj IT/Administration Stephanie Guida Forrest Black Connor Cameron Kathy Myers Andrew Farmer *Lisana Chavez Boris Umylny Callum Bell
NIH NIGMS (5P20GM103451)