Bari A - 2nd IWSRS conference - Izmir - 29 April 2014
2nd International Wheat Stripe Rust Symposium
Predicting and Locating Sources of Resistance to Stripe (Yellow) Rust in Durum Wheat Genetic Resources
Izmir, Turkey, 28 April – 1 May 2014
Grain Research & Development Corporation
Outline
• Challenges and opportunities
• Sub-setting PGR: the FIGS approach
• Stripe rust resistance case
• Work ahead
• Partnership (new)
Challenges - opportunities
• More than 7 million accessions
• More than 1,400 genebanks
• Data/concepts
• Search cost implications (1)
• Time lag implications (2)
(1) Gollin D, Smale M, Skovmand B (2000) Searching an ex situ collection of wheat genetic resources. Am J Agric Econ 82:812–827
(2) Koo B, Wright BD (2000) The optimal timing of evaluation of genebank accessions and the effects of biotechnology. Am J Agric Econ 82:797–811
The utilisation of genebanks has not kept pace with their expansion! (Gollin et al. 2000)
Challenges - opportunities
New trait variation - FIGS
• Net blotch (barley)
• Powdery mildew
• Russian wheat aphid (RWA)
• Sunn pest
(Braidotti G (2009) Partners in Research for Development)
A wheat landrace from Turkey, collected in 1948, was discovered in the 1980s to carry genes for resistance to fungal diseases.
(Atalan-Helicke N (2012) Conserving diversity at the dinner table: plants, food security and gene banks. Origins: Current Events in Historical Perspective. Accessed 5 April 2014)
PGR sub-setting: FIGS approach
Sub-setting is used to overcome the problem of the large size (search cost) of PGR collections.
• PGR (biodiversity) → stratification/multi-stage procedure → subset
• PGR (biodiversity) → filtering/relationship with the trait → FIGS set
FIGS works by applying to plant genetic resources/agro-biodiversity the same selection pressure that evolution exerted on the plants.
Assessing PGR/agro-biodiversity for rust resistance
• Detect the presence of patterns (environment × trait)
• Presence of patterns → quantification and prediction (MacArthur 1972)
Environment (E): minimum temperature (tmin), maximum temperature (tmax), precipitation (prec)
Trait (T): resistance to stripe rust
• Bayes-Laplace approach (inverse probability)
• Learning-based approach (risk minimization) (Cherkassky & Mulier 2007)
The Bayes-Laplace inverse theorem focuses on the probability of causes in relation to their effects, in contrast to the probability of effects in relation to their causes (Fisher 1922, 1930).
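The inverse-probability idea can be made concrete with Bayes' theorem: given how often an environment class E occurs among resistant (R) and susceptible (S) accessions, and the overall share of resistant accessions, it returns the probability that an accession collected in E is resistant. This Python sketch is illustrative only; the function name and all numbers are hypothetical and not taken from the study.

```python
def posterior_resistant(prior_r, p_e_given_r, p_e_given_s):
    """Bayes-Laplace inverse probability P(R | E) for a two-state trait.

    prior_r      -- P(R), share of resistant accessions (hypothetical input)
    p_e_given_r  -- P(E | R), frequency of environment class E among resistant accessions
    p_e_given_s  -- P(E | S), frequency of E among susceptible accessions
    """
    # Total probability of observing environment class E
    p_e = p_e_given_r * prior_r + p_e_given_s * (1.0 - prior_r)
    return p_e_given_r * prior_r / p_e


# Hypothetical example: 20% resistant overall; class E holds 60% of the
# resistant and 30% of the susceptible accessions.
print(round(posterior_resistant(0.2, 0.6, 0.3), 3))  # 0.333
```

Here, knowing that an accession comes from environment E raises the probability of resistance from the prior 0.20 to about 0.33, which is the sense in which the effect (environment) points back to the cause (trait).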
FIGS powdery mildew set: results of screening
Starting with a total pool of 16,000 accessions collected from 6,159 sites, the FIGS process chose 1,320 accessions collected from 420 sites.
The accessions were infected with 4 powdery mildew isolates that were avirulent or virulent to the known Pm3 alleles.
Of these 420 sites, 40% yielded accessions that were resistant to the isolates used (211 accessions).
Kaur K, Street K, Mackay M, Yahiaoui N, Keller B (2008) Allele mining and sequence diversity at the wheat powdery mildew resistance locus Pm3. 11th IWGS, 24-29 Aug., Brisbane
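The screening figures imply the following proportions. This short Python calculation only reworks the numbers given on the slide; the 40% is presumably rounded, so the derived site count is approximate.

```python
total_accessions, total_sites = 16_000, 6_159
figs_accessions, figs_sites = 1_320, 420
resistant_accessions = 211

share_accessions = figs_accessions / total_accessions  # fraction of the pool kept by FIGS
share_sites = figs_sites / total_sites                  # fraction of collecting sites kept
resistant_sites = 0.40 * figs_sites                     # approx. sites yielding resistance
resistant_share = resistant_accessions / figs_accessions

print(f"{share_accessions:.2%} of accessions, {share_sites:.1%} of sites")
print(f"about {resistant_sites:.0f} sites; {resistant_share:.1%} of the FIGS set resistant")
```

FIGS therefore screened 8.25% of the accessions from about 6.8% of the sites; roughly 168 sites yielded resistant accessions, and about 16% of the FIGS set was resistant.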
FIGS powdery mildew set: distribution of new Pm3 alleles (map)
Mining natural variation
By linking traits (phenotype) and environments (and their associated selection pressures) with genebank accessions (e.g. landraces and crop relatives), FIGS focuses in on those accessions most likely to possess trait-specific genetic variation.
[Maps by longitude and latitude: trait (disease score), environment, FIGS subset; source: www.icarda.org/]
FIGS approach – summarized
Focused Identification of Germplasm Strategy (FIGS)
• Accession (G)
• Environment (E), via geo-referencing of collecting sites
• Trait (T), via evaluation (phenotyping)
FIGS pathways – so far…
User-defined trait
• Evaluation (limited) data: identify the environment × trait relationship (model), then use the relationship to predict candidate sites
• No evaluation data: use specialised knowledge and a filtering process
Use a priori process
Accuracy metrics
The ROC curve and the resulting pdfs of the trait distribution (trait states)
[Figure: ROC curve (true positive rate vs false positive rate) and pdfs of the trait distribution (frequency vs predictions) for data in which patterns are present]
High AUC (area under the ROC curve) values indicate a potential trait-environment relationship.
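The AUC has a convenient interpretation: it equals the probability that a randomly chosen resistant accession receives a higher predicted score than a randomly chosen susceptible one (the Mann-Whitney statistic), with ties counted as one half. A minimal Python sketch of that computation; the function name, scores and labels (1 = resistant, 0 = susceptible) are hypothetical.

```python
def auc_mann_whitney(scores, labels):
    """AUC as P(score of a resistant accession > score of a susceptible one), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one resistant and one susceptible accession")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities of resistance and observed states
scores = [0.9, 0.8, 0.4, 0.35, 0.2]
labels = [1, 1, 0, 1, 0]
print(auc_mann_whitney(scores, labels))  # 5 of 6 pairs ranked correctly
```

Values near 1 indicate strong separation of the trait states and values near 0.5 indicate no pattern. The pairwise loop costs O(n_resistant × n_susceptible), which is acceptable for a few thousand accessions; a rank-based formula scales better.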
Accuracy metrics
Parameters that provide information on the accuracy of the predictions (“trait x agro-climate”)
Confusion matrix (2-by-2 contingency table):
                        Observed tolerant   Observed susceptible
Predicted tolerant      a                   b
Predicted susceptible   c                   d
Sensitivity = a/(a + c)     Specificity = d/(b + d)
• Sensitivity refers to the proportion of accessions with resistance that are scored as resistant, while
• Specificity refers to the proportion of accessions without resistance that are scored as susceptible.
Both are indicators of the model's ability to correctly classify observations.
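The four cells of the confusion matrix also give the other columns reported in the results tables (proportion correct and Cohen's kappa, which discounts the agreement expected by chance). A minimal Python sketch using the matrix layout of this slide (rows = predicted, columns = observed); the function name and the example counts are hypothetical.

```python
def confusion_metrics(a, b, c, d):
    """Accuracy metrics from a 2-by-2 table.

    a = predicted tolerant & observed tolerant,    b = predicted tolerant & observed susceptible
    c = predicted susceptible & observed tolerant, d = predicted susceptible & observed susceptible
    """
    n = a + b + c + d
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    p_observed = (a + d) / n  # proportion correct
    # Chance agreement: products of matching row (predicted) and column (observed) totals
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "proportion_correct": p_observed,
        "kappa": kappa,
    }


# Hypothetical counts for 100 accessions
print(confusion_metrics(40, 10, 20, 30))
```

With these counts the proportion correct is 0.70 but half of that agreement would be expected by chance from the margins, so kappa is 0.40.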
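Accuracy metrics
Randomness (no pattern)
[Figure: ROC curve (true positive rate vs false positive rate) and pdfs of the trait distribution (frequency vs predictions) for data without a pattern]
With no pattern, the ROC curve follows the diagonal (AUC ≈ 0.5) and the predicted distributions of the two trait states overlap.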
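The "no pattern" case can be checked directly with a permutation test: shuffle the trait labels many times, recompute the AUC each time, and count how often a random labelling does as well as the real one. This Python sketch is illustrative and not the study's procedure; the function names and the example data are hypothetical, and a small Mann-Whitney AUC helper is included so the block runs on its own.

```python
import random


def auc(scores, labels):
    """Mann-Whitney AUC: share of (resistant, susceptible) pairs ranked correctly, ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_permutation_test(scores, labels, n_perm=999, seed=0):
    """Observed AUC and one-sided permutation p-value against 'no pattern'."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between scores and trait states
        if auc(scores, shuffled) >= observed:
            as_good += 1
    return observed, (as_good + 1) / (n_perm + 1)


# Hypothetical predicted scores for 20 accessions; the last 10 are resistant
scores = [i / 20 for i in range(20)]
labels = [0] * 10 + [1] * 10
print(auc_permutation_test(scores, labels))
```

A small p-value means that an AUC as high as the observed one rarely arises from random labelling, i.e. the data are unlikely to fall in the "randomness" case.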
Stripe rust – search for resistance
Aim: predict accessions/areas likely to be resistant to stripe rust, or conducive to its appearance/presence.
Hypothesis: a relationship exists between the geographic distribution of stripe rust resistance and collection-site climate descriptors.
Sub-setting procedure – a priori
ICARDA genebank: ~20,000 accessions of durum wheat (entire collection)
• Training set: ~725 accessions (before 2011), used for training and validation (actual evaluation)
• Test set: ~2,915 accessions (2011/12), used for prediction/location (in silico evaluation)
Eco-climate data (X)
Layers used in the studies (mean values per month and per year):
• Precipitation (rainfall)
• Maximum temperatures
• Minimum temperatures
plus derived GIS layers such as:
• Potential evapotranspiration (water loss)
• Moisture/aridity index
Current ICARDA eco-climatic database (ICARDA Geo-Informatics; De Pauw 2008): average annual temperature (front), annual precipitation (middle) and winter precipitation (back).
Climate data (X), excerpt:
Site code  prec01  prec02  prec03  prec04  prec05  …  ari01  ari02  ari03  ari04  ari05
ETH-S893   25      36      72      154.22  148.88  …  0.167  0.246  0.439  1.098  1.169
NS_339     44      67      130.43  177.96  185.74  …  0.351  0.552  0.949  1.457  1.751
NS_559     23      40      61.89   129.04  102     …  0.226  0.397  0.511  1.206  0.998
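The slides do not define the ari columns. If they are monthly moisture/aridity indices computed as precipitation divided by potential evapotranspiration (P/PET, the common UNEP-style definition, assumed here, with precipitation in mm also assumed), the table also implies the PET of each site and month. A minimal Python sketch with hypothetical helper names:

```python
def aridity_index(precip_mm, pet_mm):
    """Moisture/aridity index as precipitation over potential evapotranspiration (P/PET, assumed)."""
    if pet_mm <= 0:
        raise ValueError("PET must be positive")
    return precip_mm / pet_mm


def implied_pet(precip_mm, index):
    """Back out PET from a precipitation value and its P/PET index."""
    return precip_mm / index


# ETH-S893, month 1 from the table: prec01 = 25, ari01 = 0.167
pet_01 = implied_pet(25, 0.167)
print(round(pet_01, 1))  # about 149.7 under the P/PET assumption
```

Under this reading, index values below 1 (e.g. ETH-S893 in months 1-3) mark months in which precipitation falls short of evaporative demand, and values above 1 mark a moisture surplus.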
Trait data set (Y)
Trait data (Y as the dependent variable)
Genetic Resources - ICARDA
http://www.icarda.org/striperust2014/2nd-international-wheat-stripe-rust-symposium-2014/
Modeling framework
Trait data (Y) as a function of environmental data (X):
$Y \sim f(X)$, with $X \in \mathbb{R}^m$ and $Y \in \mathcal{Y}$,
where X is the set of explanatory variables or predictors (climate data) and Y is either a categorical response (label) or a numerical response (trait descriptor states). For binary trait states, $Y_i \sim$ Bernoulli distribution.
Conceptual framework at: Bari A. et al. (2011) Genetic Resources and Crop Evolution http://www.springerlink.com/content/m7140x68v2065113/fulltext.pdf
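Written out for a binary trait (coded 1 = resistant, 0 = susceptible; the coding and the symbol p_i are assumptions, not taken from the slides), the Bernoulli form of the framework reads:

```latex
Y_i \mid x_i \sim \mathrm{Bernoulli}(p_i), \qquad
p_i = P(Y_i = 1 \mid x_i) = f(x_i), \qquad x_i \in \mathbb{R}^m

P(Y_i = y \mid x_i) = p_i^{\,y}\,(1 - p_i)^{\,1 - y}, \qquad y \in \{0, 1\}
```

A classifier then labels site i resistant when f(x_i) exceeds a chosen threshold; sweeping that threshold from 0 to 1 traces out the ROC curve used for the accuracy metrics.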
Sub-setting - variables
• Response variable (Y): stripe rust resistance/trait states
• Climate variables (X)
Platform - analysis
• Geographical Information System (GIS), ArcGIS: generation of environmental data/layers (surfaces) from climate data
• R language: development of algorithms for modeling purposes, to search for a dependency, if it exists!
> Data transformation()
> Model <- model(trait ~ climate)
> Measuring accuracy metrics
> …
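The analysis steps listed on this slide (data transformation, model fitting, accuracy measurement) can be sketched end to end. The example is in Python rather than R, and a nearest-centroid classifier stands in, as an assumption, for SVM, RF or NN so that the block needs only the standard library; all data and function names are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(rows):
    """Data transformation step: per-column mean and standard deviation."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]


def transform(rows, mus, sds):
    """Z-score each climate variable with the training-set statistics."""
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def fit_nearest_centroid(X, y):
    """Stand-in for model(trait ~ climate): one centroid per trait state."""
    return {
        label: [mean(col) for col in zip(*[x for x, t in zip(X, y) if t == label])]
        for label in set(y)
    }


def predict(centroids, X):
    """Assign each site to the trait state with the closest centroid."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda lab: dist2(x, centroids[lab])) for x in X]


def proportion_correct(y_true, y_pred):
    """Accuracy metric: share of sites whose trait state is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical sites: [tmin, tmax, prec]; trait 1 = resistant, 0 = susceptible
X_train = [[2.0, 14.0, 60.0], [3.0, 15.0, 70.0], [9.0, 25.0, 20.0], [10.0, 27.0, 15.0]]
y_train = [1, 1, 0, 0]
X_test = [[2.5, 14.5, 65.0], [9.5, 26.0, 18.0]]
y_test = [1, 0]

mus, sds = fit_scaler(X_train)
model = fit_nearest_centroid(transform(X_train, mus, sds), y_train)
y_pred = predict(model, transform(X_test, mus, sds))
print(y_pred, proportion_correct(y_test, y_pred))
```

Fitting the scaler on the training set only, and reusing its statistics for the test set, keeps information from the test accessions out of the model.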
Machine learning classification algorithms (models)
• Support Vector Machines (SVM)
• Random Forest (RF)
• Neural Network (NN)
Each maps the predictors x1, x2, …, xp to a predicted trait state F(x).
Bari A, Street K, Mackay M, Endresen DTF, De Pauw E, Amri A (2012) Focused identification of germplasm strategy (FIGS) detects wheat stem rust resistance linked to environmental variables. Genet Resour Crop Evol 59:1465–1481
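All three classifiers learn a function F(x) from the climate predictors x1, …, xp to a trait-state score. The neural network is the easiest to write out: this Python sketch is a forward pass of a one-hidden-layer network with logistic activations, using hypothetical, untrained weights (fitting them by backpropagation is omitted). It illustrates the model family, not the network configuration used in the study.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x, W1, b1, w2, b2):
    """F(x) for a one-hidden-layer network with logistic activations.

    x  -- predictors x1..xp
    W1 -- one weight row (length p) per hidden unit; b1 -- hidden-unit biases
    w2 -- output weights (one per hidden unit);       b2 -- output bias
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


# Hypothetical weights for p = 3 predictors and 2 hidden units (untrained)
W1 = [[0.8, -0.5, 0.02], [-0.3, 0.4, -0.01]]
b1 = [0.1, -0.2]
w2 = [1.5, -1.0]
b2 = -0.2
print(forward([2.0, 14.0, 60.0], W1, b1, w2, b2))
```

The output lies between 0 and 1 and can be read as a score for resistance; SVM and RF produce comparable scores by different means (a maximum-margin boundary and an average over decision trees).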
Models used – non-linear
Assumptions of the linear model:
• Y is normally distributed at each value of X
• The variance of Y is constant for each value of xi (homogeneity of variance)
• No serial correlation: values of Y are independent of one another
• A linear or curvilinear response
If these assumptions are violated, time-consuming transformations are needed, and the linear model is limited in detecting relationships that are higher-dimensional or more complex.
Results – accuracy metrics values
Training/validation set – defining the dependency (“approximation”) function
Model  Statistic  AUC   Sensitivity  Specificity  Proportion correct  Kappa
SVM    mean       0.72  0.65         0.78         0.74                0.40
       lower      0.69  0.61         0.74         0.72                0.35
       upper      0.74  0.69         0.82         0.77                0.45
RF     mean       0.70  0.64         0.76         0.73                0.37
       lower      0.67  0.61         0.71         0.69                0.30
       upper      0.73  0.67         0.81         0.76                0.44
NN     mean       0.73  0.69         0.77         0.74                0.41
       lower      0.70  0.58         0.69         0.70                0.35
       upper      0.76  0.79         0.85         0.78                0.48
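The tables report a mean with lower and upper values for each metric, but the slides do not say how the bounds were obtained. One common option, assumed here and not confirmed by the source, is a percentile bootstrap over metric values from repeated training/validation splits. A minimal Python sketch with a hypothetical function name and hypothetical AUC values:

```python
import random
from statistics import mean


def bootstrap_interval(values, n_boot=2000, alpha=0.05, seed=1):
    """Mean and percentile-bootstrap interval for a list of metric values."""
    rng = random.Random(seed)
    # Resample the values with replacement and record the mean of each resample
    boot_means = sorted(mean(rng.choices(values, k=len(values))) for _ in range(n_boot))
    lower = boot_means[int(alpha / 2 * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return mean(values), lower, upper


# Hypothetical AUC values from ten repeated training/validation splits
aucs = [0.70, 0.73, 0.71, 0.74, 0.69, 0.72, 0.75, 0.71, 0.73, 0.72]
print(bootstrap_interval(aucs))
```

The percentile method makes no normality assumption about the metric, which suits bounded quantities such as AUC and kappa.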
Results – accuracy metrics values (Yr)
Test/unknown set – in silico evaluation vs actual evaluation
Model  Statistic  AUC   Sensitivity  Specificity  Proportion correct  Kappa
SVM    mean       0.72  0.67         0.78         0.75                0.41
       lower      0.71  0.64         0.74         0.73                0.36
       upper      0.74  0.70         0.81         0.76                0.45
RF     mean       0.71  0.63         0.80         0.75                0.40
       lower      0.70  0.58         0.77         0.73                0.36
       upper      0.73  0.67         0.84         0.77                0.45
NN     mean       0.74  0.74         0.74         0.73                0.41
       lower      0.72  0.65         0.67         0.69                0.37
       upper      0.76  0.83         0.81         0.76                0.46
Results – accuracy metrics values
Stem rust – previous research
Classifier method                      AUC                Cohen’s kappa
Principal Component Regression (PCR)   0.69 (0.68-0.70)   0.40 (0.37-0.42)
Partial Least Squares (PLS)            0.69 (0.68-0.70)   0.41 (0.39-0.43)
Random Forest (RF)                     0.70 (0.69-0.71)   0.42 (0.40-0.44)
Support Vector Machines (SVM)          0.71 (0.70-0.72)   0.44 (0.42-0.45)
Artificial Neural Networks (ANN)       0.71 (0.70-0.72)   0.44 (0.42-0.46)
Results – Graphs - Stripe rust
[Figure: ROC plots (left; true positive rate vs false positive rate) and density plots of class prediction by trait state (right; frequency vs predicted probability)]
Bari et al. (2014). Predicting resistance to stripe (yellow) rust in wheat genetic resources using Focused Identification of Germplasm Strategy (FIGS). Journal of Agricultural Science
Results – Model predictions
[Figure: ROC plots and prediction histograms for the SVM, RF and NN models]
Accuracy metrics (ROC) plots for the SVM, RF and NN models applied to evaluation data not made known to the models. The histograms show the predictions of resistance and susceptibility.
Results – spatial patterns
Likelihood of an area yielding traits of resistance to stripe rust (shown in yellow), mapped by longitude and latitude.
Sub-setting procedure – adjustment based on phenology
Alignment of data based on phenology, to reduce the out-of-phase differences due to different growing seasons/periods.
The daily data were derived using the model proposed by Epstein (1991), which expresses the annual cycle as a sum of harmonic components.
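A harmonic model of this kind can be sketched as follows: a truncated Fourier series is fitted to 12 monthly means and evaluated for every day of the year. This Python sketch is a simplified reading, not the exact procedure used in the study; the function name, the choice of 3 harmonics and the example rainfall values are hypothetical. The optional amplitude correction compensates for the damping that monthly averaging applies to each harmonic.

```python
import math


def harmonic_daily(monthly, n_harmonics=3, days=365, correct_averaging=True):
    """Daily values from 12 monthly means as a sum of harmonic components.

    Month j is centred at t_j = (j + 0.5) / 12 (fraction of the year). With 12 equally
    spaced points the cosine/sine terms for k = 1..5 are orthogonal, so each
    coefficient is a simple weighted sum of the monthly values.
    """
    if len(monthly) != 12 or not 1 <= n_harmonics <= 5:
        raise ValueError("expected 12 monthly values and 1 to 5 harmonics")
    centres = [(j + 0.5) / 12 for j in range(12)]
    a0 = sum(monthly) / 12  # annual mean
    terms = []
    for k in range(1, n_harmonics + 1):
        a = sum(m * math.cos(2 * math.pi * k * t) for m, t in zip(monthly, centres)) / 6
        b = sum(m * math.sin(2 * math.pi * k * t) for m, t in zip(monthly, centres)) / 6
        if correct_averaging:
            # A monthly mean damps harmonic k by sin(x)/x with x = pi*k/12; undo that
            # so the retained harmonics keep their full amplitude.
            x = math.pi * k / 12
            a, b = a * x / math.sin(x), b * x / math.sin(x)
        terms.append((k, a, b))
    daily = []
    for d in range(days):
        t = (d + 0.5) / days
        daily.append(a0 + sum(a * math.cos(2 * math.pi * k * t) + b * math.sin(2 * math.pi * k * t)
                              for k, a, b in terms))
    return daily


# Hypothetical monthly rainfall means for a Mediterranean-type site (Jan..Dec)
monthly_rain = [60, 55, 50, 35, 15, 3, 0, 0, 5, 25, 45, 58]
daily = harmonic_daily(monthly_rain)
print(len(daily), round(min(daily), 1), round(max(daily), 1))
```

For rainfall with dry months the fitted curve can dip slightly below zero, so in practice negative values would be clipped or the series transformed before fitting.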
Alignment of data based on phenology
Modelling/predictions: capturing the shift induced by climate
• Based on estimating the duration of the period of the year in which neither moisture nor temperature is limiting to plants
• Targets a specific phase of crop development
Bari et al. (in press). Searching for climate change related traits in plant genetic resources collections using Focused Identification of Germplasm Strategy (FIGS). Options Méditerranéennes.
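The duration of the period in which neither moisture nor temperature is limiting can be estimated from daily data. This Python sketch assumes FAO agro-ecological-zone style thresholds (rainfall at least half of potential evapotranspiration, and mean temperature above 5 °C); these thresholds, the function name and the example series are assumptions, not taken from the slides.

```python
def growing_period_length(rain, pet, tmean, moisture_ratio=0.5, t_base=5.0):
    """Longest run of days in which neither moisture nor temperature is limiting.

    A day counts when rain >= moisture_ratio * PET and mean temperature > t_base
    (FAO agro-ecological-zone style thresholds, assumed here). The year is treated
    as circular so that a winter season spanning 31 December is not split in two.
    """
    ok = [p >= moisture_ratio * e and t > t_base for p, e, t in zip(rain, pet, tmean)]
    if all(ok):
        return len(ok)
    best = run = 0
    for flag in ok + ok:  # doubled list: runs may wrap around the year end
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


# Hypothetical daily series: moist from day 300 through day 144 of the next year
rain = [3.0 if (d >= 300 or d < 145) else 0.2 for d in range(365)]
pet = [4.0] * 365
tmean = [12.0] * 365
print(growing_period_length(rain, pet, tmean))
```

Daily series from different sites could then be aligned on the start of this period rather than on calendar dates.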
Sub-setting procedure – adjustment based on phenology - results
Accuracy and agreement parameters of aligned data
Data type            AUC   Omission rate  Sensitivity  Specificity  Correct classification  Kappa
monthly              0.81  0.28           0.72         0.90         0.86                    0.61
daily data           0.82  0.30           0.70         0.93         0.88                    0.64
aligned daily data   0.83  0.28           0.72         0.95         0.90                    0.70
[Figure: ROC curve (true positive rate vs false positive rate); annotation: 210 days]
Modelling/predictions: capturing the shift induced by climate - verification
Data alignment to growing season
Algorithms: separate phase variation from amplitude variation
[Figure: smoothed daily rainfall against day of year for site i, S_i(x_i, y_i), and site j, S_j(x_j, y_j)]
http://mpe2013.org/
We are not there yet …
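Separating phase variation from amplitude variation can be illustrated with the simplest form of registration: a single circular shift that lines up the seasonal curve of site j with that of site i. This Python sketch is an assumption, not the algorithm used in the study: it picks the shift that maximises the circular cross-correlation and then applies it, so that what remains between the aligned curves is amplitude difference. More flexible registration methods (for example, time warping) let the shift vary through the season. Function names and data are hypothetical.

```python
import math


def best_circular_shift(reference, series):
    """Shift s maximising the circular cross-correlation sum_t r[t] * s[(t + s) % n]."""
    n = len(reference)
    if len(series) != n:
        raise ValueError("series must have the same length")
    mr, ms = sum(reference) / n, sum(series) / n
    r = [v - mr for v in reference]  # remove means so the overall level does not bias the match
    s = [v - ms for v in series]
    return max(range(n), key=lambda shift: sum(r[t] * s[(t + shift) % n] for t in range(n)))


def align(series, shift):
    """Circularly shift a daily series so that it lines up with the reference."""
    n = len(series)
    return [series[(t + shift) % n] for t in range(n)]


# Hypothetical smoothed rainfall curves: site j peaks 30 days later than site i
site_i = [math.exp(-((d - 100) / 20) ** 2) for d in range(365)]
site_j = [site_i[(d - 30) % 365] for d in range(365)]
shift = best_circular_shift(site_i, site_j)
print(shift)
```

Because the shift is estimated from the whole curve, a single wet spell at one site has limited influence on the alignment.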
Future directions (in summary)
Diagram: environmental data (X, inputs x) and yet unknown variables (u) → trait data (Y, output y), under climate change and its induced shift (v), with dynamics Z(t).
FIGS aims to deal with unobserved inputs (u), uncertainty and ambiguity, and the climate change (CC) induced shift (bias) (v), to eventually capture the dynamics (complexity), Z(t).
Future directions
Explore the use of a variety of applied mathematics approaches in relation to the phenology of both the pathogen and the host.
“Applied Mathematics and Omics Technologies for Discovering Biodiversity and Genetic Resources for Climate Change Mitigation and Adaptation to Sustain Agriculture in Drylands” (http://mpe2013.org/)
Summary proceedings are expected to appear also at MPE.
Teşekkür ederim / Thank you
Abdallah Bari, Kumarse Nazari, Miloudi Nachit, Ahmed Amri, Ken Street, Chandra Biradar, Amor Yahyaoui, Dag Endresen