ICAR Soybean Indore 2014

Surya Saha
Cornell University & Boyce Thompson Institute
[email protected] @SahaSurya
Directorate of Soybean Research, Indore
June 7, 2014
Slides: http://bit.ly/Soybean_Indore_2014
http://www.acgt.me/blog/2014/3/7/next-generation-sequencing-must-die


Presented on June 7 at the Directorate of Soybean Research, Khandwa Road, Indore, M.P., India - 452 001. http://www.nrcsoya.nic.in/default.htm

Transcript of ICAR Soybean Indore 2014

Page 2: ICAR Soybean Indore 2014

6/6/2014 Directorate of Soybean Research, Indore 2

You are free to:

Copy, share, adapt, or re-mix;

Photograph, film, or broadcast;

Blog, live-blog, or post video of;

This presentation. Provided that:

You attribute the work to its author and respect the rights and licenses associated with its components.

Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero. Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites

Page 3: ICAR Soybean Indore 2014

Sequencing

Page 4: ICAR Soybean Indore 2014

Sequencing over the Ages

[Timeline figure, 1953 to 2014]

1953: DNA structure discovery
1977: Sanger DNA sequencing by chain-terminating inhibitors
1984: Epstein-Barr virus (170 Kb)
1987: ABI 370 sequencer
1995: Haemophilus influenzae (1.83 Mb)
2001: Homo sapiens (3.0 Gb)
2005-2007: 454, Solexa, SOLiD
2011: Ion Torrent, PacBio
2014: Pinus taeda (24 Gb), MinION, Illumina HiSeq X

The Next Generation

Slide credit: Aureliano Bombarely

Page 5: ICAR Soybean Indore 2014

It's all about the $£€¥

http://www.genome.gov/sequencingcosts/

Page 6: ICAR Soybean Indore 2014

First generation sequencing

Page 7: ICAR Soybean Indore 2014

Sanger method

Frederick Sanger (13 Aug 1918 – 19 Nov 2013)
Won the Nobel Prize for Chemistry in 1958 and 1980.
Published the dideoxy chain termination method, or “Sanger method”, in 1977.

http://dailym.ai/1f1XeTB

Page 8: ICAR Soybean Indore 2014

Sanger method

http://bit.ly/1g6Cudq

http://bit.ly/1lcQO4J

Page 9: ICAR Soybean Indore 2014

First generation sequencing

• Very high quality sequences (99.999%)

• Very low throughput

Platform                          Run Time  Read Length  Reads / Run  Total nucleotides sequenced  Cost / Mb
Capillary Sequencing (ABI3730xl)  20m-3h    400-900 bp   96 or 384    1.9-84 Kb                    $2400

http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd

Page 10: ICAR Soybean Indore 2014

Next generation sequencing

Page 13: ICAR Soybean Indore 2014

454 Pyrosequencing

One purified DNA fragment, to one bead, to one read.

http://bit.ly/1ehwxWN

GS FLX Titanium

http://bit.ly/1ehAcEh

Page 14: ICAR Soybean Indore 2014

Illumina

Output           15 Gb       120 Gb       1000 Gb                           1800 Gb
Number of Reads  25 Million  400 Million  4 Billion                         6 Billion
Read Length      2x300 bp    2x150 bp     2x125 bp (2x250 update mid-2014)  2x150 bp
Cost             $99K        $250K        $740K                             $10M

Source: Illumina
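The output figures in this table can be checked from the other two rows. The sketch below is only an illustration: it assumes that "Number of Reads" counts read pairs, so each one contributes two ends of the stated length, and it uses the first read length listed in each column. The function names are made up for this example.

```python
def total_yield_gb(read_pairs, read_length_bp, ends=2):
    """Total bases per run in gigabases (Gb): pairs x ends x read length."""
    return read_pairs * read_length_bp * ends / 1e9


def fold_coverage(yield_gb, genome_size_gb):
    """Average depth if every sequenced base mapped to the genome."""
    return yield_gb / genome_size_gb


# (read pairs per run, read length per end in bp) for the four table columns
columns = [(25e6, 300), (400e6, 150), (4e9, 125), (6e9, 150)]

yields = [total_yield_gb(pairs, length) for pairs, length in columns]
for (pairs, length), gb in zip(columns, yields):
    print(f"{pairs:.0e} pairs x 2x{length} bp = {gb:.0f} Gb")

# Depth over a 3.0 Gb genome (the Homo sapiens size quoted earlier in the deck)
print(f"{fold_coverage(yields[-1], 3.0):.0f}x")
```

Under that assumption the computed yields reproduce the Output row of the table, which suggests the three rows are consistent with one another.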

Page 15: ICAR Soybean Indore 2014

Illumina


$1000 human genome??

Page 16: ICAR Soybean Indore 2014

Illumina

http://1.usa.gov/1fP9ybl

Page 18: ICAR Soybean Indore 2014

Pacific Biosciences SMRT sequencing

Single Molecule Real Time sequencing

http://bit.ly/1naxgTe

Page 19: ICAR Soybean Indore 2014

Pacific Biosciences SMRT sequencing: Error correction methods

Hierarchical genome-assembly process (HGAP)

PBJelly (English et al., PLOS One, 2012)

Page 20: ICAR Soybean Indore 2014

Pacific Biosciences SMRT sequencing: Read Lengths

http://www.igs.umaryland.edu/labs/grc/

Mean Read Length: 8391 bp
Maximum Subread Length: 24585 bp

Page 21: ICAR Soybean Indore 2014

Oxford Nanopore

https://www.nanoporetech.com/

• No data yet

• Error model

http://erlichya.tumblr.com/post/66376172948/hands-on-experience-with-oxford-nanopore-minion

Page 22: ICAR Soybean Indore 2014

Others

• Ion Torrent Proton/PGM

• Nabsys

• SOLiD

Page 23: ICAR Soybean Indore 2014

Comparison

Page 24: ICAR Soybean Indore 2014

Next generation sequencing

Platform             Run Time  Read Length  Quality                      Total nucleotides sequenced  Cost / Mb
454 Pyrosequencing   24 h      700 bp       Q20-Q30                      0.7 Gb                       $10
Illumina MiSeq       27 h      2x250 bp     >Q30                         15 Gb                        $0.15
Illumina HiSeq 2500  11 days   2x125 bp     >Q30                         1000 Gb                      $0.05
Ion Torrent          2 h       400 bp       >Q20                         50 Mb-1 Gb                   $1
Pacific Biosciences  2 h       5.5-8.5 kb   >Q30 consensus, >Q10 single  400-800 Mb / SMRT cell       $0.33-$1

http://bit.ly/1clLps3 http://1.usa.gov/1cLqIRd

Page 25: ICAR Soybean Indore 2014

http://omicsmaps.com/

Next Generation Genomics: World Map of High-throughput Sequencers

Page 26: ICAR Soybean Indore 2014

http://bit.ly/18pfUId

Page 27: ICAR Soybean Indore 2014

Real cost of Sequencing!!

Sboner, Genome Biology, 2011

Page 28: ICAR Soybean Indore 2014

Library Types

Single end

Paired end (PE, 150-800 bp, Fwd:/1, Rev:/2)

Mate pair (MP, 2 Kb to 20 Kb)

[Diagram: forward (F) and reverse (R) read layouts for each library type, labelled 454/Roche and Illumina]

Slide credit: Aureliano Bombarely

Page 29: ICAR Soybean Indore 2014

Implications of Choice of Library

Slide credit: Aureliano Bombarely

Reads -> Consensus sequence (Contig)

Contigs + Pair Read information -> Scaffold (or Supercontig), with gaps shown as NNNNN

Scaffolds + Genetic information (markers) -> Pseudomolecule (or ultracontig)
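The contig-to-scaffold step on this slide can be illustrated with a toy example. This is a minimal sketch, not how any particular assembler works: it assumes the contigs are already ordered and oriented, and that read-pair information has given an estimated gap size between neighbours. The function name and the 100-N placeholder for unknown gaps are choices made for this illustration.

```python
def build_scaffold(contigs, gap_sizes, unknown_gap=100):
    """Join ordered contigs into one scaffold, filling each gap with Ns.

    gap_sizes[i] is the estimated gap (bp) between contigs[i] and
    contigs[i + 1], e.g. inferred from read pairs that span the two.
    None means the size is unknown; unknown_gap Ns are written instead
    (an arbitrary placeholder for this sketch).
    """
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        n_count = unknown_gap if gap is None else max(gap, 0)
        parts.append("N" * n_count)
        parts.append(contig)
    return "".join(parts)


contigs = ["ACGTACGT", "GGCCTTAA", "TTGACA"]
scaffold = build_scaffold(contigs, gap_sizes=[5, 2])
print(scaffold)  # ACGTACGTNNNNNGGCCTTAANNTTGACA
```

Longer-insert libraries (mate pairs) can link contigs that short-insert pairs cannot, which is why the choice of library affects how far the assembly gets along this chain.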

Page 30: ICAR Soybean Indore 2014

Quality control: Encoding

http://bit.ly/N28yUd

Phred score of a base is: Qphred = -10 log10 (e)

where e is the estimated probability of a base being incorrect
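As a small worked example of the formula, the sketch below converts between error probability and Phred score, and decodes a FASTQ quality string. The function names are invented for this illustration. The offsets are the commonly used ones: 33 for Sanger and Illumina 1.8+ files, 64 for Illumina 1.3 to 1.7 files.

```python
import math


def phred_from_error(e):
    """Q = -10 * log10(e), where e is the estimated error probability."""
    return -10 * math.log10(e)


def error_from_phred(q):
    """Invert the Phred formula: e = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_quality(qual, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    Each character encodes one score as chr(Q + offset).
    """
    return [ord(ch) - offset for ch in qual]


for e in (0.1, 0.01, 0.001):
    print(f"e = {e}: Q = {phred_from_error(e):.0f}")

print(decode_quality("I5+"))  # [40, 20, 10] with the default offset of 33
```

So Q10, Q20 and Q30 correspond to one expected error in 10, 100 and 1000 bases. Decoding with the wrong offset shifts every score by 31, which is why the encoding needs to be identified before quality filtering.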

Page 31: ICAR Soybean Indore 2014

Which technology to use??

• Microbial genomes

• Eukaryotic genomes

• Resequencing genomes

• RNAseq and other XXXseq methods

http://bit.ly/1ko9Kgh

Page 32: ICAR Soybean Indore 2014

SOL Genomics Network

Page 33: ICAR Soybean Indore 2014

Page 34: ICAR Soybean Indore 2014

The SGN Team!!

Surya Saha, Tom Fisher-York, Hartmut Foerster, Suzy Strickler, Jeremy Edwards,

Noe Fernandez, Naama Menda, Aure Bombarely, Aimin Yan, Isaak Tecle

Page 35: ICAR Soybean Indore 2014

What's new on SGN?

• Tomato genome release 2.5

• Incorporates results from FISH

• Nicotiana benthamiana genome sequence

• Genome sequence and annotation

• VIGS Tool

• Select specific probes for VIGS

• New BLAST interface

• New Breeder functions

• Later this year: Tomato genome release 3.0

Page 36: ICAR Soybean Indore 2014

SGN Website

http://solgenomics.net

Page 37: ICAR Soybean Indore 2014

Main web page (front page):

WEB ICONS

TOOL BAR

Page 38: ICAR Soybean Indore 2014

Main web page (front page):

TOOL BAR

(MENUS)

Page 39: ICAR Soybean Indore 2014

But the DATA can also be edited

Locus Locus Editor Data

Community Data Curation

Page 40: ICAR Soybean Indore 2014

You need:

• SGN account

• Submitter / Locus Editor privileges activated by an SGN curator

Locus Locus Editor Data

Page 41: ICAR Soybean Indore 2014

Tools

Page 42: ICAR Soybean Indore 2014

Genome Browser

Page 43: ICAR Soybean Indore 2014

Genomes in SGN

Page 44: ICAR Soybean Indore 2014

Page 45: ICAR Soybean Indore 2014

CassavaBase

Page 46: ICAR Soybean Indore 2014

Cassava

● Tropical and subtropical regions

● Mainly grown for starchy roots

● Native to South America

● Major crop in Africa

● Food for 500 million people around the world

● Clonally propagated

● Accumulates toxic cyanogenic glucosides

● Requires processing before consumption

Page 47: ICAR Soybean Indore 2014

NextGen Cassava Project

● Project: Adapt SGN database for Cassava Breeding

● Goal: Apply Genomic Selection to cassava breeding

● Predict breeding values from genotype information

● Shorten the breeding cycle

● Massive amounts of genotypic data (GBS)

● Phenotypic data

● Data management challenge

● Improve flowering

● http://nextgencassava.org

Page 48: ICAR Soybean Indore 2014

CassavaBase

http://cassavabase.org/

Page 49: ICAR Soybean Indore 2014

SGN/Cassavabase behind the scenes

● Perl/Catalyst MVC Framework

● PostgreSQL Database

● Generic Model Organism Database (GMOD)

– Chado relational database schema

– GBrowse

– JBrowse

● R

– Experimental design

– QTL mapping

– Genomic selection

Page 50: ICAR Soybean Indore 2014

Objectives

Provide cassava breeders and researchers access to data and tools in a centralized, user-friendly and reliable database.

– Improve partner breeding program information tracking

– Streamline management of genotypic and phenotypic data

– Pipeline genotypic and phenotypic data through Genomic Selection prediction analyses

Page 51: ICAR Soybean Indore 2014

Genomic Selection

The 'training population' is genotyped and phenotyped to 'train' the genomic selection (GS) prediction model. Genotypic information from the breeding material is then fed into the model to calculate genomic estimated breeding values (GEBV) for these lines. From Heffner et al. 2009 Crop Sci. 49:1–12

Information from a majority of lines in the breeding population (the training set) is used to create the prediction model. The model is then used to predict the phenotypes of the remaining lines (the validation set), using genotypic information only. The results from the model are compared to the actual data to give the prediction accuracy. Image courtesy of Martha Hamblin, Cornell University

Flow diagram of a genomic selection breeding program. Breeding cycle time is shortened by removing phenotypic evaluation of lines before selection as parents for the next cycle. From Heffner et al. 2009 Crop Sci. 49:1–12

Slide credit: Jeremy Edwards
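The training and validation workflow described in these captions can be sketched on simulated data. This is only an illustration under stated assumptions: ridge regression on marker genotypes coded 0/1/2 stands in for the prediction model (the slides do not say which model is used), the data are randomly generated rather than real breeding data, and all function names are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def train_model(genotypes, phenotypes, lam=1.0):
    """Ridge regression: solve (X'X + lam*I) beta = X'y on centred data."""
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    X = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    mean_y = sum(phenotypes) / n
    y = [v - mean_y for v in phenotypes]
    XtX = [[sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    return {"col_means": col_means, "mean_y": mean_y, "effects": solve(XtX, Xty)}


def predict_gebv(model, genotypes):
    """Predict breeding values from genotypes only."""
    return [model["mean_y"] + sum((g - m) * b for g, m, b in
                                  zip(row, model["col_means"], model["effects"]))
            for row in genotypes]


def pearson(a, b):
    """Pearson correlation, used here as the prediction accuracy."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


# Simulated population: 60 lines, 20 markers, additive effects plus noise
rng = random.Random(0)
true_effects = [rng.gauss(0, 1) for _ in range(20)]
genotypes = [[rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(60)]
phenotypes = [sum(g * e for g, e in zip(row, true_effects)) + rng.gauss(0, 0.5)
              for row in genotypes]

# Majority of lines train the model; the rest form the validation set
model = train_model(genotypes[:45], phenotypes[:45])
gebv = predict_gebv(model, genotypes[45:])
accuracy = pearson(gebv, phenotypes[45:])
print(f"prediction accuracy (r) = {accuracy:.2f}")
```

In a breeding program the same trained model would then be applied to lines that have been genotyped but not yet phenotyped, which is the step that shortens the breeding cycle.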

Page 52: ICAR Soybean Indore 2014

Data collection in the field

● Android tablets

● Field book app

– Jesse Poland's group at USDA-ARS / Kansas State University

Slide credit: Jeremy Edwards

Page 53: ICAR Soybean Indore 2014

Genotyping by sequencing (GBS)

● Tassel 4 pipeline from Ed Buckler's group

● Discovery vs production

● Filtering

● Imputation

● Storing in Cassavabase

Slide credit: Jeremy Edwards

Page 54: ICAR Soybean Indore 2014

Genotyping by sequencing (GBS)

Page 55: ICAR Soybean Indore 2014

SolGS: A tool for genomic selection

Phenotyped & Genotyped Lines -> Prediction Model

Genotyped Lines -> Prediction Model -> Predicted Breeding Values

Slide credit: Jeremy Edwards

Page 56: ICAR Soybean Indore 2014

Cassava Trait Ontology

Kulakow et al. 2011

● Standard terminology

● Facilitate the sharing of information

● Allow users to query keywords related to traits

Slide credit: Jeremy Edwards

Page 57: ICAR Soybean Indore 2014
Page 58: ICAR Soybean Indore 2014

Position available at Solgenomics

Cassavabase project

Plant Breeding + Bioinformatician

● Familiar with breeding

● Programming in Perl, R, SQL, Hadoop

● Linux

● Africa

● Genius

http://www.cassavabase.org/forum/posts.pl?topic_id=9

Page 59: ICAR Soybean Indore 2014

Thank you!! Questions??
