TLI 2012: Data flows in integrated breeding


Transcript of TLI 2012: Data flows in integrated breeding

Page 1: TLI 2012: Data flows in integrated breeding

Data Flows in Integrated Breeding

Graham McLaren

Page 2: TLI 2012: Data flows in integrated breeding

Principles of DM for Integrated Breeding (IB)

IB requires high standards of sample and pedigree identification and the integration of field and lab data; data quality is of paramount importance.

Data collected during breeding processes has immediate value for breeders, and it also has cumulative value over years and populations.

Page 3: TLI 2012: Data flows in integrated breeding

Information Cycle for Crop Improvement

Public Crop Information, accessible via internet

• Genetic Resources Information Systems
• Genomics and Genetics Databases

Crop Lead Centers: curation, integration and publication of Public Crop Information

Breeding Informatics Community of Practice

• Institutional CIS
• National CIS
• Project CIS
• Private CIS

• ARI Local CIS
• NARS Local CIS
• Networks Local CIS
• SMEs Local CIS

Shared Information Management Practices

Page 4: TLI 2012: Data flows in integrated breeding

Compatibility of DM Schemes

Users may have existing DM systems which need to be accommodated.

DM needs to be compatible across all members working on the same project.

Use of analysis and decision support tools and sharing of data with partners requires data to be formatted and stored in defined ways.

Training and support in DM and analysis are essential for IB projects.

Page 5: TLI 2012: Data flows in integrated breeding

Breeding Data Flows

Breeding data management (Breeding Partners 1, 2, 3 ... n)

Local Breeding Data; Copy of Project Database

Data manager (DM):
• Database management
• Breeding logistics
• Fieldbook preparation
• Data entry/checking
• Data management

Project data management (Breeding Projects 1, 2, 3 ... n)

Project Data; Project database (shared); Update to project database

Project data curator:
• QA for project data
• Curation and integration
• Distribution to partners
• Project Trait Dictionary
• Fieldbook Templates
• Update to public DB
• Download of public DB
• Training of partner DMs

Central database, shared and published (Crop Lead Center n)

Public Database; Public Crop Information; Public Crop Central Database

Central DB curator:
• QA for public data
• Curation and integration
• Distribution to projects
• Publication on Internet
• Global Trait Dictionary
• Catalogue of Templates
• Training of DMs and Curators

Page 6: TLI 2012: Data flows in integrated breeding

Interaction of breeding workflow and platform elements

Breeding workflow stages (n cycles of selection and recombination):
• Genetic Resources and Improved Lines
• Parental Material
• Crossing Block
• Nursery 1
• Nursery 2
• Evaluation Trials
• Cultivars and breeding lines

Activities:
• High density genotyping and phenotypic characterization
• High density genotyping and phenotypic evaluation
• Marker genotyping
• Multi-location testing

Analysis & Decision Support (A&DS) steps:
• Choose parental material based on haplotype values, known genes, traits and adaptation
• Develop crossing scheme based on genotype and phenotype compatibility
• Selection of lines based on QTL analysis / estimation of marker breeding values
• Selection on index of marker values
• Selection of improved lines based on trait improvement and adaptation

PIM: Pedigree information updated (shown at three points in the workflow)

Systems and services shown alongside the stages, grouped as on the slide:
• ST, LIMS, FDM, MSL, TSL (shown twice)
• ST, LIMS, MSL
• ST, FDM, TSL
• GRSS, MSL, TSL

Other labels: Breeding Information System; Public Crop Information

Key

Information System:
• ST: Sample Tracking
• PIM: Pedigree Information
• LIMS: Laboratory Information
• FDM: Field Data
• A&DS: Analysis & Decision Support

Platform Services:
• GRSS: Genetic Resource Service
• MSL: Marker Service
• TSL: Trait Service

Page 7: TLI 2012: Data flows in integrated breeding

The IBP Configurable Workflow System

Breeding Activities

• Project Planning: Open Project, Specify objectives, Identify team, Data resources, Define strategy
• Germplasm Management: Parental selection, Crossing, Population development
• Germplasm Evaluation: Experimental Design, Fieldbook production, Data collection, Data loading
• Molecular Analysis: Marker selection, Fingerprinting, Genotyping, Data loading
• Data Analysis: Quality Assurance, Trait analysis, Genetic Analysis, QTL Analysis, Index Analysis
• Breeding Decisions: Selected lines, Recombines, Recombination plans

Breeding Applications

• Breeding Project Planning: MB design tool, Cross prediction and Strategic simulation
• Breeding Management System: Breeding nursery and pedigree record management
• Field Trial Management System: Trial field book and environment characterization system
• Genotypic Data Management System: Lab book, quality assurance and diversity analysis
• Analytical Pipeline: Statistical analysis applications and selection indices
• Decision Support System: MABC, MAS, MARS, GWS

Page 8: TLI 2012: Data flows in integrated breeding

The Breeding Management System

Breeding Management System (ST)

• Nursery Management
• Characterization lists
• Pedigree maintenance
• Evaluation lists
• Seed Inventory

Linked systems: Genotypic Data Management System; Field Trial Management System

Page 9: TLI 2012: Data flows in integrated breeding

Sample Tracking (ST)

Page 10: TLI 2012: Data flows in integrated breeding

Genotyping Data Management System

Characterization lists (from the Breeding Management System)

Genotypic Data Management System (ST, LIMS):
• Planting list
• Sample list
• Genotyping Data
• Quality Assurance

Data Transformation:
- Genotyping Database
- Application file formats

Output to: Analytical Pipeline

Page 11: TLI 2012: Data flows in integrated breeding

Tracking Genotyping Samples (ST)

Page 12: TLI 2012: Data flows in integrated breeding

Genotyping order form (LIMS)

Page 13: TLI 2012: Data flows in integrated breeding

Genotyping results (LIMS)

Page 14: TLI 2012: Data flows in integrated breeding

Field Trial Management System

Evaluation lists (from the Breeding Management System)

CWS Configuration System: Trait templates

Experimental design and randomization

Field Trial Management System:
• Fieldbook preparation
• Environmental characterization
• Quality Assurance
• Phenotyping data

Data Collection:
- Hand-held devices
- Automatic measurement

Data Transformation:
- Phenotyping Database
- Application file formats

Output to: Analytical Pipeline

Page 15: TLI 2012: Data flows in integrated breeding

The Trial Template

Page 16: TLI 2012: Data flows in integrated breeding
Page 17: TLI 2012: Data flows in integrated breeding
Page 18: TLI 2012: Data flows in integrated breeding

Analytical Pipeline

Inputs:
• Genotyping data (Genotypic Data Management System)
• Phenotyping data (Field Trial Management System)

Analyses:
• Genotyping QA
• Diversity analysis
• Genetic mapping
• Phenotyping QA
• Single site analysis
• Multi site analysis
• GxE Analysis
• QTL Analysis
• QTLxE Analysis

Results passed to Decision Support Tools:
• Diversity scores
• Pedigree trees
• COP matrices
• Phenotype means
• Genotype BLUPs
• Stability measures
• Adaptation scores
• Marker scores
• Genetic distance
• Genetic maps
• QTL estimates

Page 19: TLI 2012: Data flows in integrated breeding

Genotyping scores (LIMS)

Page 20: TLI 2012: Data flows in integrated breeding

Decision Support and Simulation

Results from the Analytical Pipeline:
• Diversity scores
• Pedigree trees
• COP matrices
• Phenotype means
• Genotype BLUPs
• Stability measures
• Adaptation scores
• Marker scores
• Genetic distance
• Genetic maps
• QTL estimates

Decision Support Tools:
• MBDT
• Breeding indices
• OptiMas

Simulation Tools:
• QuLine
• QuHybrid
• QuMARS
• QuGene

Breeding Decisions:
• Germplasm lists for characterization
• Foreground markers
• Background markers
• Target genotypes
• Donor germplasm
• Recipient germplasm
• Ranked germplasm
• Selection lists
• Parental lists
• Crossing schemes

Further groups of labels on the slide:
• Population sizes, Selection intensity, Marker densities, Crossing schemes, Selection schemes
• Trait selection, GE targeting, Optimal breeding systems
• Genetic models, GE systems, Breeding methods

Page 21: TLI 2012: Data flows in integrated breeding

ICIS COP matrix

Lower triangular part of the Coefficient of Parentage matrix

ROWID  COLID  ROWNO  COLNO  COP         Optional Labels
50533  50533  1      1      0.9577      "IR 64" "IR 64"
70125  50533  2      1      0.2231      "IR 72" "IR 64"
70125  70125  2      2      0.9896      "IR 72" "IR 72"
11105  50533  3      1      0.1872      "IR 36" "IR 64"
11105  70125  3      2      0.5108      "IR 36" "IR 72"
11105  11105  3      3      0.9478      "IR 36" "IR 36"

Lower triangular part of the inverse Coefficient of Parentage matrix

ROWID  COLID  ROWNO  COLNO  INV-COP     Optional Labels
50533  50533  1      1      1.1113776   "IR 64" "IR 64"
70125  50533  2      1      -0.1900738  "IR 72" "IR 64"
70125  70125  2      2      1.4324875   "IR 72" "IR 72"
11105  50533  3      1      -0.1170834  "IR 36" "IR 64"
11105  70125  3      2      -0.7344297  "IR 36" "IR 72"
11105  11105  3      3      1.4739708   "IR 36" "IR 36"
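The two listings above can be cross-checked against each other. The following is a minimal sketch, not part of ICIS: it expands each lower-triangular listing into a full symmetric matrix and confirms that their product is close to the identity. The function names `to_symmetric` and `matmul` are hypothetical helpers introduced for illustration; the numbers are copied from the slide.

```python
# Records copied from the slide: (ROWID, COLID, ROWNO, COLNO, value).
COP_ROWS = [
    (50533, 50533, 1, 1, 0.9577),
    (70125, 50533, 2, 1, 0.2231),
    (70125, 70125, 2, 2, 0.9896),
    (11105, 50533, 3, 1, 0.1872),
    (11105, 70125, 3, 2, 0.5108),
    (11105, 11105, 3, 3, 0.9478),
]
INV_ROWS = [
    (50533, 50533, 1, 1, 1.1113776),
    (70125, 50533, 2, 1, -0.1900738),
    (70125, 70125, 2, 2, 1.4324875),
    (11105, 50533, 3, 1, -0.1170834),
    (11105, 70125, 3, 2, -0.7344297),
    (11105, 11105, 3, 3, 1.4739708),
]


def to_symmetric(rows):
    """Expand lower-triangular records into a full symmetric matrix."""
    n = max(rowno for _, _, rowno, _, _ in rows)
    full = [[0.0] * n for _ in range(n)]
    for _, _, rowno, colno, value in rows:
        full[rowno - 1][colno - 1] = value
        full[colno - 1][rowno - 1] = value  # mirror across the diagonal
    return full


def matmul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


cop = to_symmetric(COP_ROWS)
inv = to_symmetric(INV_ROWS)
product = matmul(cop, inv)

# Largest absolute deviation of COP x INV-COP from the identity matrix.
max_error = max(
    abs(product[i][j] - (1.0 if i == j else 0.0))
    for i in range(len(cop))
    for j in range(len(cop))
)
print(f"largest deviation from identity: {max_error:.6f}")
```

The deviation is small but not zero because the COP values on the slide are rounded to four decimals.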

Page 22: TLI 2012: Data flows in integrated breeding

Flapjack QTL Information File

Compulsory Fields
• QTL
• Chromosome
• Position
• Minimum
• Maximum
• Trait
• Experiment

Optional Fields
• AddEffects
• AddSE
• Minlog10(P)
• %VarExplained
• PosMinFM
• PosMaxFM
• LFM
• RFM
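A QTL table can be checked against the compulsory fields before it is handed to a viewer. The sketch below is illustrative only: it assumes a tab-delimited table whose header uses the field names exactly as listed on the slide, and it assumes that Minimum and Maximum bracket Position. Neither assumption is taken from the Flapjack documentation, and `check_qtl_table` is a hypothetical helper.

```python
import csv
import io

# Field names as listed on the slide; the tab-delimited layout is an assumption.
COMPULSORY = ["QTL", "Chromosome", "Position", "Minimum", "Maximum",
              "Trait", "Experiment"]


def check_qtl_table(text):
    """Return a list of problems found in a tab-delimited QTL table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in COMPULSORY if name not in header]
    if missing:
        return [f"missing column: {name}" for name in missing]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        empty = [name for name in COMPULSORY if not (row[name] or "").strip()]
        if empty:
            problems.append(f"line {line_no}: empty {', '.join(empty)}")
            continue
        try:
            low, pos, high = (float(row[key])
                              for key in ("Minimum", "Position", "Maximum"))
        except ValueError:
            problems.append(f"line {line_no}: non-numeric position")
            continue
        # Assumption: Minimum and Maximum bracket the QTL position.
        if not low <= pos <= high:
            problems.append(f"line {line_no}: position outside [Minimum, Maximum]")
    return problems


# Made-up rows for illustration only.
sample = (
    "QTL\tChromosome\tPosition\tMinimum\tMaximum\tTrait\tExperiment\n"
    "qtl_a\t1\t12.5\t10.0\t15.0\tyield\ttrial_1\n"
    "qtl_b\t2\t40.0\t42.0\t48.0\theight\ttrial_1\n"
)
print(check_qtl_table(sample))
```

The optional fields are deliberately ignored here, so a table may carry any of them without being flagged.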

Page 23: TLI 2012: Data flows in integrated breeding

Flapjack Map Data

The map file should contain information on the markers, the chromosome they are on, and their position within that chromosome. The markers do not need to be in any particular order as Flapjack will group and sort them by chromosome and distance once they are loaded.
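The grouping and sorting described above can be sketched as follows. This is not Flapjack's own code: it assumes one marker per line with tab-separated marker name, chromosome and position, and it skips blank lines and lines starting with "#". The function name `group_and_sort` and the marker names are hypothetical.

```python
from collections import defaultdict


def group_and_sort(map_text):
    """Group markers by chromosome, then order each group by position."""
    chromosomes = defaultdict(list)
    for line in map_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank and '#' lines carry no marker data
        marker, chromosome, position = line.split("\t")
        chromosomes[chromosome].append((float(position), marker))
    # Chromosome names are sorted as text, so "10" sorts before "2".
    return {
        chromosome: [marker for _, marker in sorted(entries)]
        for chromosome, entries in sorted(chromosomes.items())
    }


# Made-up markers, deliberately listed out of order.
example = "m3\t2\t5.0\nm1\t1\t10.2\nm2\t1\t3.4\n"
ordered = group_and_sort(example)
print(ordered)
```

Because the sort happens on load, the input file can list markers in whatever order they were produced.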

Page 24: TLI 2012: Data flows in integrated breeding

Breeding program designer

Colour key:
• Blue/gray: strategy
• Green: generation
• Yellow: selection round
• Pink/red: trait selection step

• To start, open ‘BreedingProgram.jar’
• Can create/drag/drop any new objects anywhere
• Use left mouse click to drag any piece and drop it on a higher level of the hierarchy
• Use centre mouse click to zoom
• Edit in list/value boxes to set parameters

Buttons:
• + add new object at next level
• X delete object
• clone object

Scott Chapman

Page 25: TLI 2012: Data flows in integrated breeding

Available breeding simulation tools

QuLine: software that simulates breeding programs for developing inbred lines

QuHybrid: software that simulates breeding programs for developing hybrids

QuMARS: software that simulates marker-assisted recurrent selection and genome-wide selection

Jiankang Wang

Page 26: TLI 2012: Data flows in integrated breeding

What can QuLine do?

• Comparison of genetic gains from different selection methods
  - Change in population mean
  - Change in gene frequency
  - Change in Hamming distance (distance of a selected genotype to the target genotype)
• Comparison of cross performance
  - Selection history
  - Rogers’ genetic distance
  - Number of lines retained from each cross
• Comparison of cost efficiency
  - Number of families
  - Individual plants per generation
• Validation of theories

Jiankang Wang
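The Hamming distance mentioned above can be illustrated with a short sketch. It assumes each genotype is recorded as one label per locus and counts the loci where a selected line differs from the target; QuLine's own encoding and scoring may differ. The function name, line names and genotypes are made up for illustration.

```python
def hamming_distance(genotype, target):
    """Count the loci at which a genotype differs from the target genotype."""
    if len(genotype) != len(target):
        raise ValueError("genotype and target must cover the same loci")
    return sum(1 for own, wanted in zip(genotype, target) if own != wanted)


# Hypothetical target and selected lines, one label per locus.
target = ["AA", "BB", "CC", "dd"]
lines = {
    "line_1": ["AA", "Bb", "cc", "dd"],
    "line_2": ["AA", "BB", "CC", "Dd"],
    "line_3": ["aa", "bb", "cc", "DD"],
}

# Rank lines from closest to furthest from the target genotype.
ranked = sorted(lines, key=lambda name: hamming_distance(lines[name], target))
for name in ranked:
    print(name, hamming_distance(lines[name], target))
```

A falling distance over cycles of selection indicates that the selected lines are converging on the target genotype.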

Page 27: TLI 2012: Data flows in integrated breeding

Integrating the applications of the Configurable Workflow System

Breeding Management System:
• Nursery Management
• Characterization lists
• Pedigree maintenance
• Evaluation lists
• Seed Inventory

Genotypic Data Management System (LIMS):
• Planting list
• Sample list
• Genotyping Data
• Quality Assurance

Field Trial Management System:
• Fieldbook preparation
• Environmental characterization
• Quality Assurance
• Phenotyping data

Data Collection:
- Hand-held devices
- Automatic measurement

Analytical Pipeline:
• Genotyping QA
• Diversity analysis
• Genetic mapping
• Phenotyping QA
• Single site analysis
• Multi site analysis
• GxE Analysis
• QTL Analysis
• QTLxE Analysis

Decision Support Tools:
• MBDT
• Breeding indices
• OptiMas

Simulation Tools:
• QuLine
• QuHybrid
• QuMARS
• QuGene

Labels: GMS, DMS, GDMS