Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

37
Predictive link between crop traits and ecoclimatic description of the original collecting site

description

Predictive link between crop traits and eco-climatic description of the original collecting site. Presented at the Botany Seminar series (http://botanyseminars.blogspot.com) on 25 February 2011 at the Department of Agriculture and Ecology, Faculty of Life Sciences, Copenhagen University, Denmark. URL: http://botanyseminars.blogspot.com/2011/01/25-february-dag-endresen.html Endresen, D.T.F. (2010). Predictive association between trait data and ecogeographic data for Nordic barley landraces. Crop Sci. 50(6):2418-2430. doi: 10.2135/cropsci2010.03.0174 Endresen, D.T.F., K. Street, M. Mackay, A. Bari, and E. De Pauw (2011). Predictive Association between Biotic Stress Traits and Eco-Geographic Data for Wheat and Barley Landraces. Crop Science 51 (5): 2036-2055. doi: 10.2135/cropsci2010.12.0717 Endresen, D.T.F., K. Street, M. Mackay, A. Bari, A. Amri, E. De Pauw, K. Nazari, and A. Yahyaoui (2012). Sources of Resistance to Stem Rust (Ug99) in Bread Wheat and Durum Wheat Identified Using Focused Identification of Germplasm Strategy (FIGS). Crop Science [Online first] doi: 10.2135/cropsci2011.08.0427; Published online 8 Dec. 2011. Endresen, D.T.F. (2011). Utilization of Plant Genetic Resources: A Lifeboat to the Gene Pool [PhD Thesis]. Copenhagen University, Faculty for Life Sciences, Department of Agriculture and Ecology. Available at: http://goo.gl/pYa9x (PDF 37 MB). ISBN: 978-91-628-8268-6.

Transcript of Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

Page 1: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

Predictive link between crop traits and ecoclimatic description of the original

collecting site

Page 2: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

TOPICS

• The Nordic Gene Bank (1979)– Nordic Genetic Resource Center

(2008)• Gap analysis

– To complete gene bank collections for maximum genetic diversity

• Trait mining with FIGS– Predictive link between climate data

and trait data– Case studies:

• Morphological traits in Nordic barley• Biotic stress traits in wheat and barley• Blind prediction of stem rust, Ug99 in

bread wheat landraces Wheat at Alnarp, June 2010

Page 3: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

Seed containers through the years

Nordic Gene Bank (1979-2007)

Seed store (with household freezers)

Page 4: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

NORDIC GENETIC RESOURCE CENTER (2008)

Nordic Genetic Resource Center, photo: January 2011

Seed store (with household freezers)Seed drying room

Page 5: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

SESTOOnline

database

http://sesto.nordgen.org

Page 6: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

SESTO distribution and georeferenced accessionsRed dots are the georeferenced collecting places Countries are colored by accessions DISTRIBUTED

Genebank material primarily originating from the Nordic region – seed requests primarily from the same region

ORIGIN VERSUS USE (SEED REQUESTS)

Page 7: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

DISTRIBUTED NETWORK OF

GENEBANK LOCATIONS

IN THE NORDIC REGION

map locations including clonal

archives

Page 8: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

Svalbard: 1984

Page 9: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

Svalbard: 2008

Statsbygg (2008)

Page 10: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

The Svalbard Global Seed Vault (2008) is operated by NordGen

Data portal online athttp://www.nordgen.org/sgsv

Inside the vault on 27 February 2008, Ola Westengen, Johan Bäckman and Simon Jeppson

Page 11: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

SVALBARD SEED VAULT BY COUNTRY OF ORIGIN (STATUS FEBRUARY 2011)

Status after three years of operation: February 2011

Page 12: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

GAP ANALYSIS

• Gap analysis• Identify gaps in the gene

bank collections• To maximize the conserved

genetic diversity

Page 13: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

WORMWOOD (ARTEMISIA ABSINTHIUM L.)NORDGEN: JUNE 2010

13

Maxentspecies distribution model(7 364 records)

Page 14: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

GAP ANALYSIS TO COMPLEMENT GENEBANK COLLECTIONS

Objectives of Gap analysis:

– Advice the planning of new collecting/gathering expeditions

– Identification of relevant areas were the crop species is predicted to be present

– Focus on areas least well represented in the genebank collection (maximize diversity)

14

See for example http://gisweb.ciat.cgiar.org/GapAnalysis/ for more information

Page 15: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

FIGS – FOCUSED IDENTIFICATION OF GERMPLASM STRATEGY

Climate layers from the ICARDA ecoclimatic database (De Pauw, 2003)

Page 16: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

Challenges for improved utilization of genetic resources for crop improvement :

* Large gene bank collections* Limited screening capacity

16

Page 17: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

A NEEDLE IN A HAY STACK

• Scientists and plant breeders want a few hundred germplasm accessions to evaluate for a particular trait.

• How does the scientist select a small subset likely to have the useful trait?

17

Slide modified from slides by Ken Street, ICARDA FIGS team

Page 18: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

– Using climate data for prediction of crop traits BEFORE the field trials.

– Identification of landraces with a higher probability of holding an interesting trait property.

18

OBJECTIVES OF FIGS

Page 19: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

19

ASSUMPTION:The climate at the original source location, where the crop landrace was developed during long-term traditional cultivation, is correlated to the trait property.

AIM: To build a computer model explaining the crop trait score from the climate data.

Page 20: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

CLIMATE EFFECT DURING THE CULTIVATION PROCESS

Wild relatives are shaped by the environment

Primitive cultivated crops are shaped by local climate and humans

Traditional cultivated crops (landraces) are shaped by climate and humans

Modern cultivated crops are mostly shaped by humans (plant breeders)

Perhaps future crops are shaped in the molecular laboratory…?

20

Page 21: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

PREDICTIVE LINK BETWEEN ECO-GEOGRAPHY AND TRAITS

It is possible that the human mediated selection of landraces will contribute to the link between ecogeography and traits.

During traditional cultivation the farmer will select for and introduce germplasm for improved suitability of the landrace to the local conditions.

21

Page 22: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

er

y

a

n

is

F I G SOCUSED DENTIFICATION OF ERMPLASM TRATEGY

Data la

yers sie

ve acce

ssions

ba

sed on

latitud

e &

lon

gitud

e

Illustration by Mackay (1995)

22

FIGS:

Origin of FIGS: Michael Mackay (1986, 1990, 1995)

Page 23: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

CLIMATE DATA – WORLDCLIMThe climate data can be extracted from the WorldClim dataset.http://www.worldclim.org/ (Hijmans et al., 2005)

Data from weather stations worldwide are combined to a continuous surface layer.

Climate data for each landrace is extracted from this surface layer.

Precipitation: 20 590 stations

Temperature: 7 280 stations23

Page 24: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

24

CLIMATE DATALayers used in this study:

• Precipitation (rainfall)• Maximum temperatures • Minimum temperatures

Some of the other layers available:

• Potential evapotranspiration (water-loss)• Agro-climatic Zone (UNESCO classification)• Soil classification (FAO Soil map)• Aridity (dryness)

(mean values for month and year)Eddy De Pauw (ICARDA, 2008)

Page 25: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

DATA FOR THE SIMULATION MODEL

• Training set– For the initial calibration or training step.

• Calibration set– Further calibration, tuning step– Often cross-validation on the training set

is used to reduce the consumption of raw data.

• Test set– For the model validation or goodness of

fit testing.– External data, not used in the model

calibration.25

Page 26: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

MORPHOLOGICAL TRAITS IN NORDIC BARLEY LANDRACES

26

Field observations by Agnese Kolodinska Brantestam (2002-2003)

Multi-way N-PLS data analysis, Dag Endresen (2009-2010)

Priekuli (LVA) Bjørke (NOR) Landskrona (SWE)

Page 27: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

2712 months (mode 2)

14 s

ampl

es (m

ode

1)

12 months (mode 2)

14 s

ampl

es (m

ode

1)(m

ode 3)3 cl

imate

varia

bles

Precipitation

Max temp

Min temp

Min. temperature14

sam

ples Jan, Feb, Mar, …

(mode 2)

Max. temperature Precipitation

36 variables

3rd level for mode 32nd level for mode 31st level for mode 3

mod

e 1

Jan, Feb, Mar, …(mode 2)

Jan, Feb, Mar, …(mode 2)

MULTI-WAY DATA STRUCTURE (FOR N-PLS)

Page 28: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

MULTI-WAY N-PLS RESULTSNORDIC BARLEY LANDRACES

ExperimentSite Year

Heading days

Ripening days

Length of plant

Harvest index

Volumetric weight

Thousand grain weight

LVA 20021 n.s. n.s. n.s. n.s. *** n.s.

LVA 2003 *** n.s. ** ** *** n.s.NOR 2002 - * ** *** ** n.s.NOR 2003 ** *** *** * * n.s.

SWE 2002 ** *** n.s. ** * n.s.SWE 20032 n.s. ** n.s. n.s. ** n.s.

*** Significant at the 0.001 level (p-value)** Significant at the 0.01 level

* Significant at the 0.05 leveln.s. Not significant (at the above levels)

1 LVA 2002 Germination on spikes (very wet June)2 SWE 2003 Incomplete grain filling (very dry June)

Endresen (2010). Predictive association between trait data and ecogeographic data for Nordic barley landraces. Crop Science 50: 2418-2430. DOI: 10.2135/cropsci2010.03.0174

Page 29: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

STEM RUST IN WHEAT LANDRACES

29USDA GRIN, trait data online: http://www.ars-grin.gov/cgi-bin/npgs/html/desc.pl?65049

Green dots indicate collecting sites for resistant wheat landraces and red dots collecting sites for susceptible landraces.

Page 30: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

Principal component 1

Princip

al co

mponent 2

Prin

cipa

l com

pone

nt 3

3 PCs

1 PC

2 PCs

SIMCA ANALYSIS (PCA MODEL FOR EACH CLASS)

Illustration modified from Wise et al., 2006:201 (PLS Toolbox software manual)

Resistant samples

Intermediate*

Susceptible

*

Example from the stem rust set:

Page 31: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

CLASSIFICATION PERFORMANCE

• Positive predictive value (PPV)• PPV = True positives / (True positives + False positives)• Classification performance for the identification of

resistant samples (positives)

• Positive diagnostic likelihood ratio (LR+)• LR+ = sensitivity / (1 – specificity)• Less sensitive to prevalence than PPV

Page 32: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

MULTIVARIATE SIMCA RESULTSSTEM RUST IN WHEAT

Dataset (unit) PPV LR+ Estimated gainStem rust (accession) 0.54 (0.50-0.59) 3.07 (2.66-3.54) 1.95 (1.79-2.09)

Random 0.29 (0.26-0.33) 1.04 (0.90-1.20) 1.03 (0.91-1.16)(28 % resistant samples)

Stem rust (site) 0.50 (0.40-0.60) 4.00 (2.85-5.66) 2.51 (2.02-2.98)Random 0.19 (0.13-0.26) 0.94 (0.63-1.39) 0.95 (0.66-1.33)(20 % resistant samples)

Endresen, D.T.F., K. Street, M. Mackay, A. Bari, E. De Pauw (submitted). Predictive association between biotic stress traits and ecogeographic data for wheat and barley landraces. Crop Science, conditionally accepted, revision underway.

Page 33: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

NET BLOTCH IN BARLEY LANDRACES

33

Green dots indicate collecting sites for resistant wheat landraces and red dots collecting sites for susceptible landraces.

USDA GRIN, trait data online: http://www.ars-grin.gov/cgi-bin/npgs/html/desc.pl?1041

Page 34: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

MULTIVARIATE SIMCA RESULTSNET BLOTCH IN BARLEY

Dataset (unit) PPV LR+ Estimated gainNet blotch (accession) 0.54 (0.48-0.60) 1.75 (1.42-2.17) 1.35 (1.19-1.50)

Random 0.40 (0.35-0.45) 0.99 (0.84-1.17) 0.99 (0.87-1.12)(40 % resistant samples)

Endresen, D.T.F., K. Street, M. Mackay, A. Bari, E. De Pauw (submitted). Predictive association between biotic stress traits and ecogeographic data for wheat and barley landraces. Crop Science, conditionally accepted, revision underway.

Page 35: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

MULTIVARIATE SIMCA RESULTSSTEM RUST (UG99) IN WHEAT

Endresen, D.T.F., K. Street, M. Mackay, A. Bari, E. De Pauw (draft manuscript). Sources of resistance in wheat to stem rust (Ug99) identified using Focused Identification of Germplasm Strategy (FIGS).

Ug99 set with 4563 wheat landraces screened for Ug99 in Yemen 2007, 10.2 % resistant accessions. The true trait scores for 20% of the accessions (825 samples) were revealed. We used trait mining with SIMCA to select 500 accessions more likely to be resistant from 3728 accession with true scores hidden (to the person making the analysis). The FIGS set was observed to hold 25.8 % resistant samples and thus 2.5 times higher than expected by chance.

Page 36: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

A LIFEBOAT TO THE GENE POOL

• PhD thesis: – A lifeboat to the gene pool

• Defense planned for 31 March– KU LIFE campus Fredriksberg– Auditorium: 3-13 / A2-70.3

Page 37: Botany Seminar on trait data mining using FIGS, Copenhagen (25 Feb 2011)

Thanks for listening!

Presented for the Botany Seminars 25 February 2011

Department of Ecology and Agriculture, Faculty of Life Sciences, Copenhagen University

Dag Terje Filip [email protected]