Bari A, 2nd IWSRS conference, Izmir, 29 April 2014



Page 1

2nd International Wheat Stripe Rust Symposium

Predicting and Locating Sources of Resistance to Stripe (Yellow) Rust in Durum Wheat Genetic Resources

Izmir, Turkey, 28 April to 1 May 2014

Grain Research & Development Corporation

Page 2

Outline

• Challenges and opportunities
• Sub-setting PGR: the FIGS approach
• Stripe rust resistance case
• Work ahead
• Partnership (new)

Page 3

Challenges and opportunities

• More than 7 million accessions
• More than 1,400 genebanks
• Data/concepts
• Search cost implications (1)
• Time lag implications (2)

(1) Gollin D, Smale M, Skovmand B (2000) Searching an ex situ collection of wheat genetic resources. Am J Agric Econ 82:812–827

(2) Koo B, Wright BD (2000) The optimal timing of evaluation of genebank accessions and the effects of biotechnology. Am J Agric Econ 82:797–811

Page 4

The utilisation of genebanks has not kept pace with their expansion! (Gollin et al. 2000)

Page 5

Challenges and opportunities

New trait variation from FIGS:
• Net blotch (barley)
• Powdery mildew
• Russian wheat aphid (RWA)
• Sunn pest

Braidotti G (2009) Partners in Research for Development

A wheat landrace from Turkey, collected in 1948, was found in the 1980s to carry genes for resistance to fungal diseases.

Atalan-Helicke N (2012) Conserving diversity at the dinner table: plants, food security and gene banks. Origins: Current Events in Historical Perspective. Accessed 5 April 2014

Page 6

PGR sub-setting: FIGS approach

Stratified sub-setting: PGR (biodiversity) -> stratification/multi-stage procedure -> sub-set
FIGS sub-setting: PGR (biodiversity) -> filtering/relationship (trait) -> FIGS set

Sub-setting is used to overcome the problem of the large size (search cost) of PGR collections. FIGS sub-sets by applying to plant genetic resources/agro-biodiversity the same selection pressure exerted on plants by evolution.

Page 7

Assessing PGR/agro-biodiversity for rust resistance

Environment (E): tmin, tmax, prec
Trait (T): resistance to stripe rust

• Detect the presence of patterns (environment x trait)
• Presence of patterns -> quantification and prediction (MacArthur 1972)

Two approaches:
• Bayes-Laplace approach (inverse probability)
• Learning-based approach (risk minimization) (Cherkassky & Mulier 2007)

The Bayes-Laplace inverse theorem focuses on the probability of causes given their effects, in contrast to the probability of effects given their causes (Fisher 1922, 1930).
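The inverse-probability idea can be made concrete with Bayes' theorem: the probability of the "cause" (an accession being resistant) given the "effect" (the climate class of its collecting site). The minimal Python sketch below uses purely hypothetical probabilities; treating a single climate class as the evidence is an assumption for illustration, not the study's model.

```python
# Minimal sketch of Bayes-Laplace "inverse probability" applied to FIGS-style reasoning.
# All probabilities are hypothetical illustration values, not study data.

def posterior_resistant(prior_r, p_env_given_r, p_env_given_s):
    """P(resistant | environment) from P(environment | trait state) and a prior.

    prior_r       -- prior probability that an accession is resistant
    p_env_given_r -- probability of the site environment among resistant accessions
    p_env_given_s -- probability of the site environment among susceptible accessions
    """
    evidence = p_env_given_r * prior_r + p_env_given_s * (1.0 - prior_r)
    return p_env_given_r * prior_r / evidence

# Hypothetical: 20% of accessions resistant overall; a given climate class
# occurs at 60% of resistant vs 30% of susceptible collecting sites.
p = posterior_resistant(0.2, 0.6, 0.3)
print(round(p, 3))
```

With these made-up numbers the posterior rises from the 20% prior to about 33%. If the environment were equally common among resistant and susceptible accessions, the posterior would equal the prior: no pattern to exploit.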

Page 8

FIGS powdery mildew set: results of screening

Starting with a total pool of 16,000 accessions collected from 6,159 sites, the FIGS process chose 1,320 accessions collected from 420 sites.

Accessions were inoculated with 4 powdery mildew isolates that were avirulent or virulent to the known Pm3 alleles.

Of these 420 sites, 40% yielded accessions that were resistant to the isolates used (211 accessions).

Kaur K, Street K, Mackay M, Yahiaoui N, Keller B (2008) Allele mining and sequence diversity at the wheat powdery mildew resistance locus Pm3. 11th IWGS, 24-29 Aug., Brisbane
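The size of the reduction is easier to see as ratios. This is simple arithmetic on the figures quoted above; the only assumption is that the rounded "40%" can be turned back into an approximate site count.

```python
# Back-of-envelope arithmetic on the Pm3 screening figures quoted on the slide.
pool_accessions, pool_sites = 16_000, 6_159
figs_accessions, figs_sites = 1_320, 420
resistant_accessions = 211
resistant_site_share = 0.40   # "40% of these 420 sites" (a rounded figure)

subset_fraction = figs_accessions / pool_accessions       # share of the pool actually screened
site_fraction = figs_sites / pool_sites                    # share of collecting sites covered
resistant_sites = round(resistant_site_share * figs_sites)  # approximate number of sites with resistance
hit_rate = resistant_accessions / figs_accessions          # resistant accessions per screened accession

print(f"screened {subset_fraction:.2%} of the pool from {site_fraction:.1%} of the sites")
print(f"~{resistant_sites} sites with resistance; {hit_rate:.1%} of screened accessions resistant")
```

In other words, screening roughly 8% of the pool found resistance in about 16% of the accessions screened, from roughly 168 sites.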

Page 9

FIGS powdery mildew set: distribution of new Pm3 alleles

Page 10

Mining natural variation

By linking traits (phenotype) and environments (and their associated selection pressures) with genebank accessions (e.g. landraces and crop relatives), FIGS can "focus" in on the accessions most likely to possess trait-specific genetic variation.

[Figure: map (longitude vs latitude) linking trait (disease score), environment and the FIGS subset; www.icarda.org/]

Page 11

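FIGS approach, summarized: Focused Identification of Germplasm Strategy

Diagram linking accession (G), environment (E) and trait (T):
• Geo-referencing of collecting sites links accessions (G) to their environment (E)
• Evaluation (phenotyping) links accessions (G) to the trait (T)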

Page 12

FIGS pathways, so far

Starting from a user-defined trait:
• Evaluation (limited) data available: identify the environment x trait relationship (model), then use the relationship to predict candidate sites
• No evaluation data: use specialised knowledge in an a priori filtering process
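When no evaluation data exist, the a priori pathway amounts to filtering accessions on collecting-site climate using expert criteria. A minimal sketch follows; the `Accession` fields, the thresholds and the one-accession-per-site rule are all hypothetical choices for illustration (e.g. keeping cool, moist sites where stripe rust pressure is plausible), not the criteria used in the study.

```python
from dataclasses import dataclass

@dataclass
class Accession:
    acc_id: str
    site: str
    tmin_winter: float   # hypothetical: mean winter minimum temperature at the site (deg C)
    prec_spring: float   # hypothetical: mean spring precipitation at the site (mm)

def a_priori_filter(accessions, tmin_range, min_prec, one_per_site=True):
    """Keep accessions whose collecting-site climate meets expert-defined criteria.

    tmin_range -- (low, high) bounds on tmin_winter, inclusive
    min_prec   -- minimum prec_spring
    """
    lo, hi = tmin_range
    kept, seen_sites = [], set()
    for acc in accessions:
        if not (lo <= acc.tmin_winter <= hi and acc.prec_spring >= min_prec):
            continue
        if one_per_site and acc.site in seen_sites:
            continue  # limit redundancy: one accession per collecting site
        seen_sites.add(acc.site)
        kept.append(acc)
    return kept

accs = [
    Accession("A1", "S1", 2.0, 120.0),
    Accession("A2", "S1", 2.0, 120.0),   # same site as A1
    Accession("A3", "S2", -8.0, 150.0),  # too cold
    Accession("A4", "S3", 4.0, 40.0),    # too dry
    Accession("A5", "S4", 1.0, 90.0),
]
subset = a_priori_filter(accs, (-2.0, 6.0), 80.0)
print([a.acc_id for a in subset])
```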

Page 13

Accuracy metrics

The ROC curve and the resulting probability density functions (pdfs) of the trait distribution (trait states).

[Figure, patterns present in data: ROC curve (true positive rate vs false positive rate) and pdfs of the trait distribution (frequency of predictions along the environment axis)]

High AUC (area under the ROC curve) values indicate a potential trait-environment relationship.
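AUC has a direct probabilistic reading: the chance that a randomly chosen resistant accession receives a higher predicted score than a randomly chosen susceptible one. The sketch below computes it by pairwise comparison (the Mann-Whitney formulation); the scores and labels are hypothetical.

```python
def auc_mann_whitney(scores, labels):
    """AUC as P(score of a resistant accession > score of a susceptible one).

    scores -- predicted probability of resistance for each accession
    labels -- 1 for observed resistant, 0 for observed susceptible
    Ties count as half a win. O(n_pos * n_neg), fine for a few thousand accessions.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one resistant and one susceptible accession")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predictions for three resistant and three susceptible accessions
auc = auc_mann_whitney([0.9, 0.8, 0.4, 0.7, 0.3, 0.2], [1, 1, 1, 0, 0, 0])
```

Here 8 of the 9 resistant/susceptible pairs are ranked correctly, so AUC = 8/9 (about 0.89). A model that gives every accession the same score lands at exactly 0.5, the "no pattern" baseline.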

Page 14

Accuracy metrics

Parameters that provide information on the accuracy of the predictions ("trait x agro-climate"):

Confusion matrix (2-by-2 contingency table)

                         Observed tolerant   Observed susceptible
Predicted tolerant       a                   b
Predicted susceptible    c                   d

Sensitivity = a / (a + c)    Specificity = d / (b + d)

• Sensitivity is the proportion of accessions with resistance that are scored as resistant.
• Specificity is the proportion of accessions without resistance that are scored as susceptible.

Both indicate the model's ability to classify observations correctly.
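The same confusion matrix also yields the proportion correct and Cohen's kappa reported in the results tables. A minimal sketch using the a/b/c/d layout above; kappa is computed with the standard chance-agreement formula, and the counts are hypothetical.

```python
def confusion_metrics(a, b, c, d):
    """Accuracy metrics from a 2-by-2 confusion matrix laid out as:
        a = predicted tolerant,    observed tolerant
        b = predicted tolerant,    observed susceptible
        c = predicted susceptible, observed tolerant
        d = predicted susceptible, observed susceptible
    """
    n = a + b + c + d
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    proportion_correct = (a + d) / n
    # Cohen's kappa: agreement beyond what the row/column totals would give by chance
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (proportion_correct - expected) / (1 - expected)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "proportion_correct": proportion_correct, "kappa": kappa}

# Hypothetical counts for 100 evaluated accessions
metrics = confusion_metrics(40, 10, 20, 30)
```

With these counts the model is right 70% of the time, but chance agreement alone would give 50%, so kappa is 0.40. This is why kappa is a stricter yardstick than proportion correct.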

Page 15

Accuracy metrics: randomness (no pattern)

[Figure: ROC curve (true positive rate vs false positive rate) and pdfs of the trait distribution (frequency of predictions) when no pattern is present]

Page 16

Stripe rust – search for resistance

Aim: predict accessions/areas likely to be resistant to stripe rust, or conducive to its appearance/presence

Hypothesis: a relationship exists between the geographic distribution of stripe rust resistance and collection-site climate descriptors

Page 17

Sub-setting procedure: a priori

ICARDA genebank: ~20,000 accessions of durum wheat

Entire collection used: 2,915 accessions
• Training set: ~725 accessions (before 2011), used for training and validation (actual evaluation)
• Test set: ~2,915 accessions (2011/12), used for prediction/location (in silico evaluation)
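Splitting by evaluation period keeps the later evaluations unseen by the model until the in silico predictions are checked against them. A minimal sketch, assuming each record carries a hypothetical `eval_year` field and that the two sets are disjoint (the slide's counts do not make this explicit):

```python
from typing import NamedTuple

class Record(NamedTuple):
    acc_id: str
    eval_year: int   # hypothetical: season in which the accession was scored
    resistant: int   # 1 = resistant, 0 = susceptible

def split_by_period(records, cutoff_year=2011):
    """Training set = accessions scored before cutoff_year; test set = the rest."""
    train = [r for r in records if r.eval_year < cutoff_year]
    test = [r for r in records if r.eval_year >= cutoff_year]
    return train, test

records = [Record("A1", 2009, 1), Record("A2", 2010, 0),
           Record("A3", 2011, 1), Record("A4", 2012, 0)]
train, test = split_by_period(records)
```

A time-based split, rather than a random one, mimics the real use case: a model built on past evaluations is asked to predict accessions that have not yet been scored.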

Page 18

Eco-climate data (X)

Layers used in the studies:
• Precipitation (rainfall)
• Maximum temperatures
• Minimum temperatures

plus derived GIS layers such as:
• Potential evapotranspiration (water loss)
• Moisture/aridity index

(mean values for month and year)

ICARDA Geo-Informatics: current ICARDA eco-climatic database, showing average annual temperature (front), annual precipitation (middle) and winter precipitation (back) (De Pauw 2008)

Climate data (X), example rows:

Site code   prec01  prec02  prec03  prec04  prec05  …  ari01  ari02  ari03  ari04  ari05
ETH-S893    25      36      72      154.22  148.88  …  0.167  0.246  0.439  1.098  1.169
NS_339      44      67      130.43  177.96  185.74  …  0.351  0.552  0.949  1.457  1.751
NS_559      23      40      61.89   129.04  102     …  0.226  0.397  0.511  1.206  0.998
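The moisture/aridity columns are derived from precipitation and potential evapotranspiration. The slide does not give the formula, so the sketch below assumes the common precipitation/PET ratio; the PET values are hypothetical.

```python
def aridity_index(prec_mm, pet_mm):
    """Monthly moisture/aridity index, assumed here to be precipitation / PET.

    Values below 1 mean PET exceeds rainfall (moisture deficit); values above 1
    mean a surplus. Months with zero PET return None instead of dividing by zero.
    """
    if len(prec_mm) != len(pet_mm):
        raise ValueError("precipitation and PET series must align month by month")
    return [p / e if e > 0 else None for p, e in zip(prec_mm, pet_mm)]

# Hypothetical monthly precipitation and PET (mm) for one collecting site
prec = [25, 36, 0]
pet = [150, 144, 0]
ai = aridity_index(prec, pet)
```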

Page 19

Trait data set (Y)

[Table: trait data (Y as the dependent variable); rows not recoverable]

http://www.icarda.org/striperust2014/2nd-international-wheat-stripe-rust-symposium-2014/

Genetic Resources - ICARDA

Page 20

Modeling framework

$Y \sim f(X)$: trait data (Y) modelled as a function of environmental data (X), with $Y_i \sim \mathrm{Bernoulli}$ (Bernoulli distribution).

• X is the set of explanatory variables or predictors (climate data), $X \in \mathbb{R}^m$.
• Y is the response, $Y \in \mathcal{Y}$, either categorical (a label) or numerical (trait descriptor states).

Conceptual framework at: Bari A. et al. (2011) Genetic Resources and Crop Evolution, http://www.springerlink.com/content/m7140x68v2065113/fulltext.pdf
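Written out, the Bernoulli reading of $Y \sim f(X)$ treats each accession's trait state as a binary outcome whose probability depends on the climate at its collecting site. The form below is a sketch: the log-likelihood is the standard one for Bernoulli responses, $f$ is left generic, and the logistic link is shown only as one common example, not as the model used in the study.

$$
Y_i \sim \mathrm{Bernoulli}(p_i), \qquad p_i = P(Y_i = 1 \mid X_i = x_i) = f(x_i), \qquad x_i \in \mathbb{R}^m
$$

$$
\log L(f) = \sum_{i=1}^{n} \Bigl[\, y_i \log f(x_i) + (1 - y_i) \log\bigl(1 - f(x_i)\bigr) \Bigr],
\qquad \text{e.g. } f(x) = \frac{1}{1 + e^{-(\beta_0 + \beta^\top x)}}
$$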

Page 21

Sub-setting: variables

• Response variable (Y): stripe rust resistance/trait states
• Climate variables (X)

Page 22

Platform: analysis

Generation of environmental data:
• Geographical Information System (GIS): ArcGIS, environmental data/layers (surfaces)
• Climate data used to generate the surfaces

Modeling purpose:
• R language: development of algorithms to search for a dependency, if one exists!

> Data transformation()
> Model <- model(trait ~ climate)
> Measuring accuracy metrics
> ….
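The three steps on the slide (transform, model, measure accuracy) can be sketched outside R as well. The plain-Python version below uses logistic regression fitted by gradient descent purely because it is short; it is a stand-in, not the SVM/RF/NN models used in the study, and the toy data are hypothetical.

```python
import math

def standardize(column):
    """Data transformation: centre and scale one climate variable (population SD)."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column)) or 1.0
    return [(v - mean) / sd for v in column]

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Model: trait ~ climate as logistic regression, fitted by batch gradient descent."""
    m, n = len(rows[0]), len(rows)
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - y
            grad_b += err
            for j in range(m):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b

def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical toy data: one climate variable per collecting site and the observed trait state
tmin = [1, 2, 3, 4, 5, 6, 7, 8]
resistant = [0, 0, 0, 0, 1, 1, 1, 1]

X = [[z] for z in standardize(tmin)]   # > Data transformation
w, b = fit_logistic(X, resistant)      # > Model <- model(trait ~ climate)
preds = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
proportion_correct = sum(p == y for p, y in zip(preds, resistant)) / len(resistant)  # > accuracy
```

On this separable toy set every site is classified correctly. Real trait-climate data are far noisier, which is why the study reports AUC and kappa on held-out accessions rather than training accuracy.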

Page 23

Machine learning classification (models) algorithms

• Support Vector Machines (SVM)
• Random Forest (RF)
• Neural Network (NN)

[Diagram: inputs x1, x2, …, xp mapped to an output F(x)]

Bari A, Street K, Mackay M, Endresen DTF, De Pauw E, Amri A (2012) Focused identification of germplasm strategy (FIGS) detects wheat stem rust resistance linked to environmental variables. Genet Resour Crop Evol 59:1465–1481

Page 24

Models used: non-linear

Assumptions of the linear model:
• Y normally distributed at each value of X
• Variance of Y constant for each value of xi (homogeneity of variance)
• No serial correlation: values of Y independent of one another
• A linear or curvilinear response

If these assumptions are violated, time-consuming transformations are needed, and linear models are limited in detecting relationships that are higher-dimensional or more complex.

Page 25

Results: accuracy metric values, training/validation set (defining the dependency "approximation" function)

Model  Statistic  AUC   Sensitivity  Specificity  Proportion correct  Kappa
SVM    mean       0.72  0.65         0.78         0.74                0.40
       lower      0.69  0.61         0.74         0.72                0.35
       upper      0.74  0.69         0.82         0.77                0.45
RF     mean       0.70  0.64         0.76         0.73                0.37
       lower      0.67  0.61         0.71         0.69                0.30
       upper      0.73  0.67         0.81         0.76                0.44
NN     mean       0.73  0.69         0.77         0.74                0.41
       lower      0.70  0.58         0.69         0.70                0.35
       upper      0.76  0.79         0.85         0.78                0.48
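The table reports each metric as a mean with lower and upper bounds. The slides do not say how the bounds were obtained; one common approach, assumed here, is to repeat the train/validate cycle many times and take a percentile interval of the resulting values. A minimal sketch with hypothetical AUC values:

```python
def percentile(sorted_vals, q):
    """Percentile by linear interpolation between order statistics (q in [0, 1])."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def summarize_runs(values, coverage=0.95):
    """Mean and central percentile interval of one accuracy metric over repeated runs."""
    vals = sorted(values)
    tail = (1.0 - coverage) / 2.0
    return {"mean": sum(vals) / len(vals),
            "lower": percentile(vals, tail),
            "upper": percentile(vals, 1.0 - tail)}

# Hypothetical AUC values from five resampling runs of one model
auc_runs = [0.70, 0.74, 0.71, 0.73, 0.72]
summary = summarize_runs(auc_runs)
```

In practice many more runs (hundreds rather than five) would be needed for the tail percentiles to be stable.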

Page 26

Results: accuracy metric values (Yr), test/unknown set, in silico evaluation vs actual evaluation

Model  Statistic  AUC   Sensitivity  Specificity  Proportion correct  Kappa
SVM    mean       0.72  0.67         0.78         0.75                0.41
       lower      0.71  0.64         0.74         0.73                0.36
       upper      0.74  0.70         0.81         0.76                0.45
RF     mean       0.71  0.63         0.80         0.75                0.40
       lower      0.70  0.58         0.77         0.73                0.36
       upper      0.73  0.67         0.84         0.77                0.45
NN     mean       0.74  0.74         0.74         0.73                0.41
       lower      0.72  0.65         0.67         0.69                0.37
       upper      0.76  0.83         0.81         0.76                0.46

Page 27

Results: accuracy metric values, stem rust (previous research)

Classifier method                      AUC                Cohen's kappa
Principal Component Regression (PCR)   0.69 (0.68-0.70)   0.40 (0.37-0.42)
Partial Least Squares (PLS)            0.69 (0.68-0.70)   0.41 (0.39-0.43)
Random Forest (RF)                     0.70 (0.69-0.71)   0.42 (0.40-0.44)
Support Vector Machines (SVM)          0.71 (0.70-0.72)   0.44 (0.42-0.45)
Artificial Neural Networks (ANN)       0.71 (0.70-0.72)   0.44 (0.42-0.46)

Page 28

Results: graphs, stripe rust

[Figure: ROC plots (left; true positive rate vs false positive rate) and density plots of class prediction (right; frequency vs predicted probability, distribution by trait state)]

Bari et al. (2014). Predicting resistance to stripe (yellow) rust in wheat genetic resources using Focused Identification of Germplasm Strategy (FIGS). Journal of Agricultural Science

Page 29

Results: model predictions

[Figure: ROC plots and prediction histograms for the SVM, RF and NN models]

Accuracy metrics (ROC) plots for the SVM, RF and NN models applied to evaluation data not made known to the models. The histograms show the predictions of resistance and susceptibility.

Page 30

Results: spatial patterns

[Figure: map (longitude vs latitude) showing the likelihood of an area yielding traits of resistance to stripe rust (yellow colour)]

Page 31

Sub-setting procedure: adjustment based on phenology

Alignment of data based on phenology, to reduce:
• The "out-of-phase" differences due to different growing seasons/periods

The daily data were derived using the model proposed by Epstein (1991), which represents the annual cycle as a sum of harmonic components.
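The harmonic idea can be sketched as follows: fit the 12 monthly means with a sum of harmonics, then evaluate the curve at daily resolution. The sketch assumes 12 equal-length months and inflates each harmonic by $(\pi k/12)/\sin(\pi k/12)$, the inverse of the damping that averaging over one month applies, so that each month's average of the daily curve equals its input mean. This is written in the spirit of Epstein (1991) but is not a reproduction of the exact procedure used in the study; the rainfall values are hypothetical.

```python
import math

N_MONTHS = 12

def harmonic_curve(monthly):
    """Return f(s), a smooth annual curve built from 12 monthly means.

    s is time in months measured from the centre of month 0, so month m
    covers s in [m - 0.5, m + 0.5]. Assumes equal-length months.
    """
    if len(monthly) != N_MONTHS:
        raise ValueError("expected 12 monthly values")
    mean = sum(monthly) / N_MONTHS
    terms = []  # (angular frequency, a_k, b_k) for harmonics k = 1..6
    for k in range(1, N_MONTHS // 2 + 1):
        w = 2 * math.pi * k / N_MONTHS
        a = sum(v * math.cos(w * m) for m, v in enumerate(monthly))
        b = sum(v * math.sin(w * m) for m, v in enumerate(monthly))
        weight = 1 / N_MONTHS if k == N_MONTHS // 2 else 2 / N_MONTHS  # Nyquist term counts once
        # Averaging a harmonic over one month damps it by sin(x)/x with x = pi*k/12;
        # inflating by the inverse makes each month's average equal its input mean.
        x = math.pi * k / N_MONTHS
        inflate = x / math.sin(x)
        b_k = 0.0 if k == N_MONTHS // 2 else b * weight * inflate
        terms.append((w, a * weight * inflate, b_k))

    def f(s):
        return mean + sum(a * math.cos(w * s) + b * math.sin(w * s) for w, a, b in terms)
    return f

def monthly_to_daily(monthly, days=365):
    """Evaluate the harmonic curve at the middle of each day of the year."""
    f = harmonic_curve(monthly)
    return [f((d + 0.5) * N_MONTHS / days - 0.5) for d in range(days)]

# Hypothetical monthly rainfall (mm) for one collecting site
monthly_rain = [25, 36, 72, 154, 149, 120, 60, 20, 5, 3, 10, 18]
rain_curve = harmonic_curve(monthly_rain)
daily_rain = monthly_to_daily(monthly_rain)
```

For rainfall, the smooth curve can dip slightly below zero in dry months, so a real pipeline would need to handle that (for example by clipping); the sketch leaves it as is.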

Page 32

Modelling/predictions: capturing the shift induced by climate

Alignment of data based on phenology

Based on estimating the duration of the period during the year in which neither moisture nor temperature limits plant growth, targeting a specific phase of crop development.

Bari et al. (in press). Searching for climate change related traits in plant genetic resources collections using Focused Identification of Germplasm Strategy (FIGS). Options Méditerranéennes.
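Estimating the period in which neither moisture nor temperature is limiting can be sketched as a count of qualifying days. The criteria below (rainfall of at least half the PET and mean temperature of at least 5 °C) are assumed, loosely following the FAO agro-ecological-zones convention; they are not necessarily the thresholds used in the study, and the daily series are hypothetical.

```python
def growing_period(rain_mm, pet_mm, tmean_c, moisture_ratio=0.5, t_threshold=5.0):
    """Longest run of consecutive days in which neither moisture nor temperature limits growth.

    A day qualifies if rainfall >= moisture_ratio * PET and mean temperature >= t_threshold.
    Returns (start_day, length); start_day is None if no day qualifies.
    Ignores soil-water storage and runs that wrap across the year end.
    """
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for day, (p, e, t) in enumerate(zip(rain_mm, pet_mm, tmean_c)):
        if p >= moisture_ratio * e and t >= t_threshold:
            if run_len == 0:
                run_start = day
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len

# Hypothetical ten-day series
rain = [0, 0, 3, 3, 3, 0, 3, 3, 3, 3]
pet = [4] * 10
temp = [10] * 10
period = growing_period(rain, pet, temp)
```

Aligning sites on a period defined this way, rather than on calendar months, is what reduces the "out-of-phase" differences between growing seasons.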

Page 33

Sub-setting procedure: adjustment based on phenology, results

Accuracy and agreement parameters of aligned data

Data type            AUC   Omission rate  Sensitivity  Specificity  Correct classification  Kappa
monthly              0.81  0.28           0.72         0.90         0.86                    0.61
daily data           0.82  0.30           0.70         0.93         0.88                    0.64
aligned daily data   0.83  0.28           0.72         0.95         0.90                    0.70

210 days

[Figure: ROC curve (true positive rate vs false positive rate)]

Page 34

Modelling/predictions: capturing the shift induced by climate, verification

Data alignment to growing season. Algorithms: separate phase variation from amplitude variation.

[Figure: smoothed daily rainfall curves (rainfall vs day) for site (i), Si(xi, yi), and site (j), Sj(xj, yj)]

http://mpe2013.org/

We are not there yet …
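A very simple way to separate phase from amplitude is to estimate one global time shift between two sites' daily curves and leave magnitude differences untouched. The sketch below picks the circular lag that maximises the cross-correlation. It is a stand-in for illustration only, far cruder than the registration algorithms the slide alludes to; the rainfall curves are synthetic.

```python
import math

def best_circular_shift(ref, other):
    """Lag (in days) that best aligns `other` to `ref` by circular cross-correlation.

    Estimates a single global phase shift; amplitude differences are left untouched.
    """
    n = len(ref)
    if len(other) != n:
        raise ValueError("series must have the same length")

    def score(lag):
        return sum(ref[i] * other[(i + lag) % n] for i in range(n))

    return max(range(n), key=score)

def advance(series, lag):
    """Circularly advance a daily series by `lag` days (undoes a delay of `lag`)."""
    n = len(series)
    return [series[(i + lag) % n] for i in range(n)]

# Synthetic rainfall curves: site j peaks 30 days later than site i, with twice the amplitude
site_i = [10 * math.exp(-((d - 100) / 20) ** 2) for d in range(365)]
site_j = [2 * site_i[(d - 30) % 365] for d in range(365)]

lag = best_circular_shift(site_i, site_j)
aligned_j = advance(site_j, lag)
```

After alignment the two curves differ only in amplitude (a factor of 2 here), which is the separation of phase from amplitude variation in its simplest form.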

Page 35

Future directions (in summary)

[Diagram: FIGS linking environmental data (X, x) and yet-unknown variables (u) to trait data (Y, y), with climate change (v) and dynamics Z(t)]

FIGS aims to deal with unobserved inputs (u, yet-unknown variables), uncertainty and ambiguity (v), and the climate-change (CC) induced shift (bias) Z, to eventually capture the dynamics (complexity).

Page 36

Future directions

"Applied Mathematics and Omics Technologies for Discovering Biodiversity and Genetic Resources for Climate Change Mitigation and Adaptation to Sustain Agriculture in Drylands"
http://mpe2013.org/

Explore the use of a variety of applied mathematics approaches in relation to the phenology of both the pathogen and the host.

Expected to appear also at MPE (summary proceedings).

Page 37

Teşekkür ederim / Thank you

Abdallah Bari, Kumarse Nazari, Miloudi Nachit, Ahmed Amri, Ken Street, Chandra Biradar, Amor Yahyaoui, Dag Endresen