GRM 2011: ISMU pipeline for NGS data analysis and facilitating molecular breeding


Page 1

ISMU pipeline for NGS data analysis and facilitating molecular breeding

http://hpc.icrisat.cgiar.org/NGS/

Page 2

Challenges

• Short read length of sequences

• Availability of many tools

• Platform dependency and command-line-driven tools

• No direct way to predict SNPs between genotypes

• Quality scores vary depending on version and technology

Page 3

ISMU version 1

• SNP discovery from NGS data

– Pipeline for mapping / assembling

– Calling SNPs between genotypes

– Visualisation

Page 4

ISMU version 2

• Application of identified SNPs to breeding

Page 5

Objectives of NGS Pipeline

• Benchmark available open-source short-read assembly and downstream analysis programs/software.

• Assembly, polymorphism detection between genotypes, and visualization.

• Assay design (Illumina GoldenGate Assay), genotype calling, and visualization and analysis of SNP genotyping and haplotype data.

• Identify parental lines for use in MABC or MARS.

• Discovery of SNP markers for use in foreground and background selection in MABC or MARS.

• Documentation of the pipeline and the integrated software.

Page 6

Control Flowchart

Flowchart labels:

• ICRISAT crops? (Yes / No)

• Input data & validation

• Upload reference & data

• Mapping (Maq, Novo)

• Mapped reads

• Assembly visualization

• Consensus calling

• Report SNPs

• Extract sequences with SNPs

• Design primers

• In silico validation by SNP2CAPS

• Database

• ADT score

• GoldenGate (G.G) assay

• BeadStudio

• Flapjack

Page 7

Standard Methodology

Programme: Maq, Novo

Steps for each genotype (Genotype 1, Genotype 2):

• Mapping

• Assembly

• SNP calling against reference

SNPs between genotypes:

• Predicted SNPs against reference

• Base calling in 2nd genotype

• Compare allele between genotypes

• Check the inverse combination

• Remove duplicates

Final steps:

• ADT scoring

• Reporting

Example row:

Chrom1 Pos RefAllele Gtyp1 Gtyp2

5 303 A G ?
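The comparison steps listed on this slide can be sketched roughly as follows. This is only an illustration, not the pipeline's own code: the function name, the dictionary-based inputs, and the example sites are all hypothetical, and the reading of "check the inverse combination" as "also start from the second genotype's SNP sites" is an assumption.

```python
def snps_between_genotypes(calls_g1, calls_g2, snps_g1, snps_g2):
    """Report sites where two genotypes carry different called bases.

    calls_g1, calls_g2: dict mapping (chrom, pos) -> called base (hypothetical layout).
    snps_g1, snps_g2: sets of (chrom, pos) predicted as SNPs against the reference.
    """
    reported = {}
    # Forward combination (genotype 1 SNP sites) plus the inverse combination
    # (genotype 2 SNP sites); keying by site removes duplicates.
    for site in list(snps_g1) + list(snps_g2):
        base1 = calls_g1.get(site)
        base2 = calls_g2.get(site)
        if base1 is None or base2 is None:
            continue  # no base call in one genotype (the "?" case)
        if base1 != base2:
            reported[site] = (base1, base2)
    return sorted(reported.items())


# Hypothetical example data, not taken from the slides.
calls_g1 = {("chr5", 303): "G", ("chr5", 910): "T"}
calls_g2 = {("chr5", 303): "A", ("chr5", 910): "C"}
result = snps_between_genotypes(
    calls_g1, calls_g2,
    snps_g1={("chr5", 303), ("chr5", 910)},
    snps_g2={("chr5", 910)},
)
for (chrom, pos), (b1, b2) in result:
    print(chrom, pos, b1, b2)
```

The site at position 910 is predicted in both genotypes but is reported once, which is the effect the "remove duplicates" step is assumed to have.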

Page 8

Customized Methodology (Consensus Base Calling, cc)

Programme: ccMaq, ccNovo, in-house script

Diagram labels (Genotype 1, Genotype 2):

• Mapping

• SNP calling

• ADT scoring

Examples of fmaj:

• Genotype 1: fmaj = 38/40 = 0.95

• Genotype 2: fmaj = 21/28 = 0.75

Page 9

Consensus Base Calling Parameters (Default)

• Max number of mismatches <= 7

• Sum of mismatch scores <= 60

• Min mapping quality >= 0

• Read depth threshold >= 5

• Major base frequency threshold >= 0.75
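A minimal sketch of how the read depth and major base frequency thresholds could be applied at a single position. It assumes the bases covering a site are already available as a list; the function name and the example bases are hypothetical, and the mismatch and mapping-quality filters are not modelled here. Only the counts 38/40 and 21/28 come from the slides.

```python
from collections import Counter

MIN_DEPTH = 5          # read depth threshold (default from the slide)
MIN_MAJOR_FREQ = 0.75  # major base frequency threshold (default from the slide)


def consensus_base(bases, min_depth=MIN_DEPTH, min_major_freq=MIN_MAJOR_FREQ):
    """Return the consensus base at one site, or None if no confident call."""
    depth = len(bases)
    if depth < min_depth:
        return None  # too few reads cover this site
    base, count = Counter(bases).most_common(1)[0]
    fmaj = count / depth  # frequency of the most common base
    return base if fmaj >= min_major_freq else None


# Counts from the slides; the base letters themselves are made up.
genotype1 = ["G"] * 38 + ["A"] * 2   # fmaj = 38/40 = 0.95
genotype2 = ["A"] * 21 + ["G"] * 7   # fmaj = 21/28 = 0.75

print(consensus_base(genotype1))
print(consensus_base(genotype2))
```

With these defaults both example sites receive a call, since 0.75 meets the threshold exactly.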

Page 10

What if more than 2 genotypes?

Genotype1, Genotype2, Genotype3, Genotype4

     G1  G2  G3
G1   0   1   1
G2   0   0   1
G3   0   0   0

Number of genotype combinations = (n² - n)/2
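The pair count can be checked with a short sketch. The helper names are hypothetical; `itertools.combinations` is the standard-library way to enumerate unordered pairs, matching the upper triangle of the matrix.

```python
from itertools import combinations


def genotype_pairs(genotypes):
    """Enumerate every unordered pair of genotypes (upper triangle only)."""
    return list(combinations(genotypes, 2))


def pair_count(n):
    """Number of pairwise comparisons for n genotypes: (n^2 - n) / 2."""
    return (n * n - n) // 2


pairs = genotype_pairs(["G1", "G2", "G3"])
print(pairs)          # [('G1', 'G2'), ('G1', 'G3'), ('G2', 'G3')]
print(pair_count(4))  # 6
```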

Page 11

NGS pipeline input data

• Reads format

– fna and qual

– (Standard/Sanger) Fastq

– SCARF format

– Solexa fastq, Solexa export

– AB SOLiD read format

– FASTA

• Reference sequence

– Chickpea transcript assembly

– Pearl millet transcript assembly

– Pigeonpea transcript assembly

– Medicago genome

– Sorghum genome
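Because these read formats encode quality differently, a pipeline typically has to normalise scores before applying thresholds. The sketch below illustrates the general convention only (Sanger FASTQ stores Phred scores at ASCII offset 33; early Solexa FASTQ stores Solexa-scaled scores at offset 64). It does not describe how ISMU handles this, and the function names are hypothetical.

```python
import math


def sanger_to_phred(qual):
    """Decode a Sanger FASTQ quality string (Phred scores, ASCII offset 33)."""
    return [ord(c) - 33 for c in qual]


def solexa_to_phred(qual):
    """Decode an early Solexa quality string (offset 64) and convert to Phred.

    Solexa scores use -10*log10(p / (1 - p)); Phred uses -10*log10(p),
    so Q_phred = 10*log10(10**(Q_solexa / 10) + 1).
    """
    scores = []
    for c in qual:
        q_solexa = ord(c) - 64
        scores.append(round(10 * math.log10(10 ** (q_solexa / 10) + 1)))
    return scores


print(sanger_to_phred("I5+"))
print(solexa_to_phred("hT@"))
```

The two scales agree closely at high quality and diverge at low quality, which is one reason the same threshold cannot be applied blindly across formats.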

Page 12

NGS pipeline (Input 1)

http://hpc.icrisat.cgiar.org/NGS/

Page 13

NGS pipeline (Input 2)

Page 14

NGS pipeline (Help page)

Page 15

NGS pipeline (Results)

Page 16

NGS pipeline (Visualisation)

Page 17

Pipeline Editions

Available in 2 editions:

1. Server Edition

2. Desktop Edition

Page 18

Server Edition

• User-friendly web interface

– Installation on the following Linux platforms: Fedora 13, CentOS 5

• Clients can be any OS with a web browser

• Communication resources

– SMTP (Email)

• Session-specific job processing

– Avoids file overwriting
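One common way to get session-specific job processing is to give every job its own working directory, so concurrent users cannot overwrite each other's files. The sketch below shows that idea with the Python standard library; it is an assumption about the general approach, not the ISMU server's actual implementation, and `make_session_dir` and the `session_` prefix are hypothetical names.

```python
import tempfile
from pathlib import Path


def make_session_dir(base_dir):
    """Create and return a unique working directory for one job session."""
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    # mkdtemp guarantees a fresh, uniquely named directory on each call.
    return Path(tempfile.mkdtemp(prefix="session_", dir=base))


# Two sessions uploading a file with the same name do not collide.
with tempfile.TemporaryDirectory() as root:
    first = make_session_dir(root)
    second = make_session_dir(root)
    (first / "reads.fastq").write_text("session one\n")
    (second / "reads.fastq").write_text("session two\n")
    contents = [(d / "reads.fastq").read_text() for d in (first, second)]
    print(first != second, contents)
```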

Page 19

Desktop Edition

• All functionalities of Server Edition on a Desktop

• Supported OS

• Fedora 13

• RHEL 5

• Single command installation

• Available as an installable CD

Page 20

Future plans

• Consideration of new tools to integrate or update, e.g. BWA, Bowtie

• Implementation of the extension to the pipeline

• Evaluate cloud computing and high-performance computing cluster options

• Initiatives such as iPlant (discovery environment – genotype to phenotype)

Page 21

Future Plans: ISMU v 2

• Identification of appropriate modules for MARS, GWS and GBS

• Integration of MARS and GWS module

• Linking of ISMU pipeline with DMS of IBP

• Documentation & training of ISMU pipeline

Page 22

Architecture

NGS Data Analysis pipeline at ICRISAT

Diagram labels:

• Internet

• Apache server hosting web pages

• SMTP server

• CGI

• Perl programs

• Maq

• Novo

• Velvet

• Reference sequences

• SNP database

• Input data validation

• Assembly visualization

• Dynamic querying

• Files downloading

Page 23

Contributors

• Rajeev K. Varshney

• Abhishek Rathore

• Jayashree B

• Vivek Thakur

• R. Pradeep

• A. Bhanu Prakash

• Sarwar Azam

• G. Meenakshi

• David Marshall

• Iain Milne

• Jonathan Jones

• David Studholme

• Greg May

• Andrew Farmer

• Jimmy Woodward

• Dave Edwards