ICAR Soybean Indore 2014
Surya Saha, Cornell University & Boyce Thompson Institute
[email protected] @SahaSurya
Directorate of Soybean Research, Indore, June 7, 2014
Slides: http://bit.ly/Soybean_Indore_2014
http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die
6/6/2014 Directorate of Soybean Research, Indore 2
You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;
Blog, live-blog, or post video of;
This presentation.
Provided that:
You attribute the work to its author and respect the rights and licenses associated with its components.
Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero. Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at
http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites
Sequencing over the Ages
Timeline of sequencing milestones:
1953: DNA structure discovery
1977: Sanger DNA sequencing by chain-terminating inhibitors
1984: Epstein-Barr virus (170 Kb)
1987: ABI 370 sequencer
1995: Haemophilus influenzae (1.83 Mb)
2001: Homo sapiens (3.0 Gb)
2014: Pinus taeda (24 Gb)
Platforms marked from 2005 onward (ticks at 2005, 2007, 2011, 2012, 2013, 2014): 454, Solexa, SOLiD, Illumina, Ion Torrent, PacBio, Illumina HiSeq X, MinION
Slide credit: Aureliano Bombarely
The Next Generation
It's all about the $£€¥
http://www.genome.gov/sequencingcosts/
First generation sequencing
Sanger method
Frederick Sanger (13 Aug 1918 – 19 Nov 2013)
Won the Nobel Prize for Chemistry in 1958 and 1980.
Published the dideoxy chain termination method, or "Sanger method", in 1977.
http://dailym.ai/1f1XeTB
Sanger method
http://bit.ly/1g6Cudq
http://bit.ly/1lcQO4J
First generation sequencing
• Very high quality sequences (99.999% per-base accuracy)
• Very low throughput
Platform | Run time | Read length | Reads / run | Total nucleotides sequenced | Cost / Mb
Capillary sequencing (ABI3730xl) | 20 min-3 h | 400-900 bp | 96 or 384 | 1.9-84 Kb | $2400
http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
Next generation sequencing
http://bit.ly/1keDtZQ
• Second generation
• Third generation
• Fourth generation
• Next-next-generation
• Next-next-next generation
http://www.acgt.me/blog/2014/3/10/next-generation-sequencing-must-diepart-2
Name the specific technology used to generate the data:
– Illumina HiSeq/MiSeq/NextSeq
– Pacific Biosciences RS I/RS II
– Ion Torrent Proton/PGM
– SOLiD
– 454
454 Pyrosequencing
One purified DNA fragment, to one bead, to one read.
http://bit.ly/1ehwxWN
GS FLX Titanium
http://bit.ly/1ehAcEh
Illumina
Specification | MiSeq | NextSeq 500 | HiSeq 2500 | HiSeq X Ten
Output | 15 Gb | 120 Gb | 1000 Gb | 1800 Gb
Number of reads | 25 million | 400 million | 4 billion | 6 billion
Read length | 2x300 bp | 2x150 bp | 2x125 bp (2x250 update mid-2014) | 2x150 bp
Cost | $99K | $250K | $740K | $10M
Source: Illumina
$1000 human genome??
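The output figures on the Illumina slide follow from read count times read length. A minimal sketch in Python, assuming that "Number of Reads" counts read pairs (an assumption that makes the arithmetic match the slide) and using the 3.0 Gb human genome size from the timeline slide. The function names are illustrative, and the instrument labels are inferred from the listed specifications rather than taken from the transcript.

```python
GENOME_SIZE_BP = 3_000_000_000  # Homo sapiens, 3.0 Gb (from the timeline slide)

# (read pairs per run, read length in bp); instrument labels are an assumption
RUNS = {
    "MiSeq": (25_000_000, 300),
    "NextSeq 500": (400_000_000, 150),
    "HiSeq 2500": (4_000_000_000, 125),
    "HiSeq X Ten": (6_000_000_000, 150),
}


def run_output_bp(read_pairs, read_length):
    """Total bases per run for paired-end reads (two reads per pair)."""
    return read_pairs * 2 * read_length


def fold_coverage(total_bp, genome_size_bp=GENOME_SIZE_BP):
    """Average depth if every base of the run went to a single genome."""
    return total_bp / genome_size_bp


for name, (pairs, length) in RUNS.items():
    total = run_output_bp(pairs, length)
    print(f"{name}: {total / 1e9:.0f} Gb per run, "
          f"{fold_coverage(total):.0f}x of one human genome")
```

Dividing the per-run coverage by a target depth (30x is a common choice, not stated on the slide) gives a rough count of human genomes per run.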
Illumina
http://1.usa.gov/1fP9ybl
Illumina: Moleculo
http://bit.ly/1aEPOBn
Pacific Biosciences SMRT sequencing
Single Molecule Real Time sequencing
http://bit.ly/1naxgTe
Pacific Biosciences SMRT sequencing: Error correction methods
Hierarchical genome-assembly process (HGAP)
PBJelly (English et al., PLoS ONE, 2012)
Pacific Biosciences SMRT sequencing: Read lengths
http://www.igs.umaryland.edu/labs/grc/
Mean read length: 8391 bp
Maximum subread length: 24585 bp
Oxford Nanopore
https://www.nanoporetech.com/
• No data yet
• Error model
http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion
Others
• Ion Torrent Proton/PGM
• Nabsys
• SOLiD
Comparison
Next generation sequencing
Platform | Run time | Read length | Quality | Total nucleotides sequenced | Cost / Mb
454 Pyrosequencing | 24 h | 700 bp | Q20-Q30 | 0.7 Gb | $10
Illumina MiSeq | 27 h | 2x250 bp | >Q30 | 15 Gb | $0.15
Illumina HiSeq 2500 | 11 days | 2x125 bp | >Q30 | 1000 Gb | $0.05
Ion Torrent | 2 h | 400 bp | >Q20 | 50 Mb-1 Gb | $1
Pacific Biosciences | 2 h | 5.5-8.5 kb | >Q30 consensus, >Q10 single | 400-800 Mb / SMRT cell | $0.33-$1
http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd
http://omicsmaps.com/
Next Generation Genomics: World Map of High-throughput Sequencers
Real cost of Sequencing!!
Sboner et al., Genome Biology, 2011
Library Types
Single end
Paired end (PE, 150-800 bp, Fwd: /1, Rev: /2)
Mate pair (MP, 2 Kb to 20 Kb)
[Figure: forward (F) and reverse (R) read layout for each library type, with 454/Roche and Illumina variants]
Slide credit: Aureliano Bombarely
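The /1 and /2 suffixes mark the forward and reverse read of the same fragment. A minimal sketch in Python of how read names could be matched up by those suffixes; the function name and the example read names are hypothetical, and real FASTQ headers from newer pipelines may encode the mate differently.

```python
def pair_reads(read_names):
    """Group read names ending in /1 (forward) and /2 (reverse) by fragment.

    Returns (complete, orphans): complete maps a fragment name to its
    (forward, reverse) read names; orphans lists reads whose mate is missing.
    Names without a /1 or /2 suffix are ignored.
    """
    mates = {}
    for name in read_names:
        fragment, sep, mate = name.rpartition("/")
        if sep and mate in ("1", "2"):
            mates.setdefault(fragment, {})[mate] = name

    complete = {
        fragment: (found["1"], found["2"])
        for fragment, found in mates.items()
        if "1" in found and "2" in found
    }
    orphans = sorted(
        read
        for fragment, found in mates.items()
        if fragment not in complete
        for read in found.values()
    )
    return complete, orphans


complete, orphans = pair_reads(["frag1/1", "frag1/2", "frag2/1"])
print(complete)
print(orphans)
```

Orphaned reads are usually set aside or treated as single-end data before assembly or mapping.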
Implications of Choice of Library
[Figure: assembly hierarchy]
Reads are assembled into a consensus sequence (contig).
Pair read information orders contigs into a scaffold (or supercontig), with gaps written as NNNNN.
Genetic information (markers) orders scaffolds into a pseudomolecule (or ultracontig), again with NNNNN gaps.
Slide credit: Aureliano Bombarely
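The scaffold step can be sketched as joining ordered contigs with runs of N. This is a simplified Python illustration, assuming the contig order and gap sizes are already known from paired-read distances; the function name and inputs are hypothetical, and real scaffolders also handle orientation and overlaps.

```python
def build_scaffold(contigs, gap_sizes):
    """Join ordered contigs, filling each estimated gap with N characters."""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need one gap size between each pair of adjacent contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * gap)  # unknown bases between the two contigs
        parts.append(contig)
    return "".join(parts)


print(build_scaffold(["ACGT", "GGCC", "TTAA"], [5, 2]))
```

The same joining idea applies one level up, where marker order places scaffolds into a pseudomolecule.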
Quality control: Encoding
http://bit.ly/N28yUd
The Phred score of a base is: Qphred = -10 log10(e),
where e is the estimated probability of the base call being incorrect.
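A short worked example of the Phred formula in Python. The conversion functions follow the definition above. The decoding helper assumes the Phred+33 ASCII offset used by Sanger-style FASTQ files (older Illumina pipelines used an offset of 64), and all function names are illustrative.

```python
import math


def phred_from_error(e):
    """Q = -10 * log10(e), with e the estimated probability of a wrong base call."""
    return -10 * math.log10(e)


def error_from_phred(q):
    """Invert the Phred formula: e = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    offset=33 is an assumption (Phred+33 encoding); pass offset=64 for
    files written with the older Phred+64 encoding.
    """
    return [ord(ch) - offset for ch in quality_string]


for q in (10, 20, 30):
    print(f"Q{q}: about 1 error in {round(1 / error_from_phred(q))} bases")
print(decode_qualities("I5!"))
```

Q20 therefore corresponds to a 1% error rate and Q30 to 0.1%, which is how the quality thresholds in the platform comparison can be read.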
Which technology to use??
• Microbial genomes
• Eukaryotic genomes
• Resequencing genomes
• RNAseq and other XXXseq methods
http://bit.ly/1ko9Kgh
SOL Genomics Network
The SGN Team!!
Surya Saha, Tom Fisher-York, Hartmut Foerster, Suzy Strickler, Jeremy Edwards,
Noe Fernandez, Naama Menda, Aure Bombarely, Aimin Yan, Isaak Tecle
What's new on SGN?
• Tomato genome release 2.5
– Incorporates results from FISH
• Nicotiana benthamiana genome sequence
– Genome sequence and annotation
• VIGS Tool
– Select specific probes for VIGS
• New BLAST interface
• New Breeder functions
• Later this year: Tomato genome release 3.0
SGN Website
http://solgenomics.net
Main web page (front page):
WEB ICONS
TOOL BAR
Main web page (front page):
TOOL BAR
(MENUS)
But the data can also be edited
Locus Locus Editor Data
Community Data Curation
You need:
• An SGN account
• Submitter / Locus Editor privileges activated by an SGN curator
Tools
Genome Browser
Genomes in SGN
CassavaBase
Cassava
● Tropical and subtropical regions
● Mainly grown for starchy roots
● Native to South America
● Major crop in Africa
● Food for 500 million people around the world
● Clonally propagated
● Accumulates toxic cyanogenic glucosides
● Requires processing before consumption
NextGen Cassava Project
● Project: Adapt SGN database for Cassava Breeding
● Goal: Apply Genomic Selection to cassava breeding
● Predict breeding values from genotype information
● Shorten the breeding cycle
● Massive amounts of genotypic data (GBS)
● Phenotypic data
● Data management challenge
● Improve flowering
● http://nextgencassava.org
CassavaBase
http://cassavabase.org/
SGN/Cassavabase behind the scenes
● Perl/Catalyst MVC Framework
● PostgreSQL Database
● Generic Model Organism Database (GMOD)
– Chado relational database schema
– GBrowse
– JBrowse
● R
– Experimental design
– QTL mapping
– Genomic selection
Objectives
Provide cassava breeders and researchers access to data and tools in a centralized, user-friendly and reliable database.
– Improve partner breeding program information tracking
– Streamline management of genotypic and phenotypic data
– Pipeline genotypic and phenotypic data through Genomic Selection prediction analyses
Genomic Selection
The 'training population' is genotyped and phenotyped to 'train' the genomic selection (GS) prediction model. Genotypic information from the breeding material is then fed into the model to calculate genomic estimated breeding values (GEBV) for these lines. From Heffner et al. 2009 Crop Sci. 49:1–12
Information from a majority of lines in the breeding population (the training set) is used to create the prediction model. The model is then used to predict the phenotypes of the remaining lines (the validation set), using genotypic information only. The results from the model are compared to the actual data to give the prediction accuracy. Image courtesy of Martha Hamblin, Cornell University
Flow diagram of a genomic selection breeding program. Breeding cycle time is shortened by removing phenotypic evaluation of lines before selection as parents for the next cycle. From Heffner et al. 2009 Crop Sci. 49:1–12
Slide credit: Jeremy Edwards
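The train, predict, and validate loop described above can be sketched in plain Python. This is a toy illustration only: ridge regression on marker genotypes is used here as a stand-in prediction model (the slides do not say which model is used), the genotype coding (-1, 0, 1), the toy data, and all function names are assumptions, and prediction accuracy is taken as the Pearson correlation between predicted and observed values in the validation set.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def train_ridge(genotypes, phenotypes, lam=1.0):
    """Fit marker effects on the training population (centered ridge regression)."""
    n_lines, n_markers = len(genotypes), len(genotypes[0])
    marker_means = [sum(g[j] for g in genotypes) / n_lines for j in range(n_markers)]
    mean_y = sum(phenotypes) / n_lines
    xc = [[g[j] - marker_means[j] for j in range(n_markers)] for g in genotypes]
    yc = [y - mean_y for y in phenotypes]
    # Normal equations with a ridge penalty: (X'X + lam * I) beta = X'y
    lhs = [[sum(row[i] * row[j] for row in xc) + (lam if i == j else 0.0)
            for j in range(n_markers)] for i in range(n_markers)]
    rhs = [sum(row[i] * y for row, y in zip(xc, yc)) for i in range(n_markers)]
    return marker_means, mean_y, solve_linear_system(lhs, rhs)


def predict(model, genotypes):
    """Predict values for lines from genotypic information only."""
    marker_means, mean_y, effects = model
    return [mean_y + sum(e * (x - mu) for e, x, mu in zip(effects, g, marker_means))
            for g in genotypes]


def pearson(a, b):
    """Pearson correlation, used here as the prediction accuracy."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Toy data (hypothetical): two markers, phenotype = 10 + 2*m1 - 1*m2
TRAIN_GENOTYPES = [[-1, -1], [-1, 1], [0, 0], [1, -1], [1, 1], [0, 1]]
TRAIN_PHENOTYPES = [9.0, 7.0, 10.0, 13.0, 11.0, 9.0]
VALIDATION_GENOTYPES = [[1, 0], [-1, 0], [0, -1]]
VALIDATION_PHENOTYPES = [12.0, 8.0, 11.0]

model = train_ridge(TRAIN_GENOTYPES, TRAIN_PHENOTYPES, lam=1.0)
predicted = predict(model, VALIDATION_GENOTYPES)
accuracy = pearson(predicted, VALIDATION_PHENOTYPES)
print(f"prediction accuracy (Pearson r): {accuracy:.3f}")
```

In a breeding program the same fitted model would then be applied to genotyped but unphenotyped lines to rank candidates for the next cycle.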
Data collection in the field
● Android tablets
● Field book app
– Jesse Poland's group at USDA-ARS / Kansas State University
Slide credit: Jeremy Edwards
Genotyping by sequencing (GBS)
● TASSEL 4 pipeline from Ed Buckler's group
● Discovery vs production
● Filtering
● Imputation
● Storing in Cassavabase
Slide credit: Jeremy Edwards
SolGS: A tool for genomic selection
Phenotyped & Genotyped Lines
Prediction Model
Predicted Breeding Values
Genotyped Lines
Slide credit: Jeremy Edwards
Cassava Trait Ontology
Kulakow et al. 2011
● Standard terminology
● Facilitate the sharing of information
● Allow users to query keywords related to traits
Slide credit: Jeremy Edwards
Position available at Solgenomics
Cassavabase project
Plant Breeding + Bioinformatician
● Familiar with breeding
● Programming in Perl, R, SQL, Hadoop
● Linux
● Africa
● Genius
http://www.cassavabase.org/forum/posts.pl?topic_id=9
Thank you!! Questions??