University of Helsinki Sakari Kuikka
One-out all-out principle or Bayesian integration?
Sakari Kuikka: University of Helsinki
Seppo Rekolainen: Finnish Environmental Institute
Mikko Mukula: University of Helsinki
Jouni Tammi: University of Helsinki
Laura Uusitalo: University of Helsinki
FEM research group at the University of Helsinki
• 1 professor, 3 postdoctoral researchers, 6 postgraduate researchers, 2 graduate students
• 2 locations: Helsinki and Kotka
• Research interests:
– Decision analysis of renewable resources
– Integrating different sources of data and other knowledge: Bayesian analysis
– Identification and quantification of risks in the use of natural resources
– Analysis of management of natural resources in the face of risks and uncertainty in the information and control
• => User of information in an essential role
Aim of data collection and data analysis
to increase the probability of correct decision making
Correct? = achieving the aim with high probability, or avoiding a problem with high probability (like ”points of no return”)
Objectives of the talk
1) To briefly discuss the sources of uncertainty
2) To briefly present Bayes' theorem
3) To present a classification model based on the Bayes rule
4) To compare the results to the “one-out all-out” principle
Number of elements and chance for misclassification
EU CIS Ecostat Guidance 2003
Risk: e.g. probability to be, or to go, above a critical threshold?
Probabilistic calculus may be needed for a correct decision (dioxin or P load ”of no return”)
[Figure: probability distributions A and B of the value of the variable of interest (horizontal axis 0 to 200, probability 0 to 0.04), with a critical value (e.g. dioxin) marked]
Risk
Risk = probability * loss
Two alternative coin games:
A) 0.5 * 1000 euros and 0.5 * (-1000 euros), or
B) 0.5 * 10 000 euros and 0.5 * (-10 000 euros)
I would pay at least 500 – 2 000 euros to get the first game instead of the second.
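The two games can be compared numerically. The sketch below computes the expected payoff and the standard deviation of each game; using the standard deviation as the measure of spread is an assumption of this example, not something stated on the slide. Both games have the same expected payoff, so the willingness to pay for game A comes from the difference in spread alone.

```python
import math

def expected_value(game):
    """Expected payoff of a game given as (probability, payoff) pairs."""
    return sum(p * x for p, x in game)

def std_dev(game):
    """Standard deviation of the payoff (illustrative measure of spread)."""
    mean = expected_value(game)
    return math.sqrt(sum(p * (x - mean) ** 2 for p, x in game))

# The two coin games from the slide, in euros.
game_a = [(0.5, 1000), (0.5, -1000)]
game_b = [(0.5, 10_000), (0.5, -10_000)]

for name, game in (("A", game_a), ("B", game_b)):
    print(name, "expected:", expected_value(game), "spread:", std_dev(game))
```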
Sources of uncertainty
• Variability over: time, space, measurements, uncertainty in model selection
• E.g. several visits to the same lake can produce different measurements/assessment values
• E.g. a lake can naturally have poor benthos (e.g. due to high fish predation?) => causalities are not always deterministic
Uncertainties
So, there are uncertainties:
1) In measurements (mostly this here)
2) In causal relationships of nature
It is difficult to separate these in a data analysis!
Bayes rule
P (b|a) = P (a|b) P (b) / P (a)
a: data, observations, etc.
b: probability of parameter value, or hypothesis
Note: all argumentation is based on probability distributions, not on single values!
Likelihood
• P (measurement | correct value)
• E.g. if correct value is 10, we may have:
Measurement   Probability
12            0.2
10            0.6
8             0.2
So, measurement 12 can be linked to several real values of the lake!
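The last point can be made concrete with a minimal sketch. It assumes that the same error pattern (measurement equal to the correct value with probability 0.6, and 2 above or 2 below with probability 0.2 each) holds for every correct value, and that the candidate correct values and their uniform prior are as listed in the code; the slide gives the pattern only for a correct value of 10, so both are assumptions of this example.

```python
# Measurement error model from the slide, written as
# (measurement - correct value) -> probability.
ERROR_MODEL = {+2: 0.2, 0: 0.6, -2: 0.2}

def likelihood(measurement, true_value):
    """P(measurement | correct value) under the assumed error model."""
    return ERROR_MODEL.get(measurement - true_value, 0.0)

def posterior(measurement, prior):
    """Bayes rule over a discrete set of candidate correct values."""
    unnormalized = {v: likelihood(measurement, v) * p for v, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("measurement is impossible under this prior and model")
    return {v: u / total for v, u in unnormalized.items()}

# Hypothetical candidate values with a uniform prior.
candidates = [8, 10, 12, 14, 16]
prior = {v: 1 / len(candidates) for v in candidates}

post = posterior(12, prior)
for value, prob in post.items():
    print(value, round(prob, 2))
```

Under these assumptions a measurement of 12 leaves three correct values possible (10, 12 and 14), which is the sense in which one measurement is linked to several real values.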
Bayes rule: probabilistic dependencies
[Diagram: real number of fish (B) linked to observations (data, A) by P (A|B); in the reverse direction, observations linked to the real number of fish by P (B|A)]
Bayesian inference: P(N | data) ∝ P(data | N) P(N)
[Figure: prior P(N), likelihood P(data|N) and posterior P(N|data) plotted as probability (0 to 0.25) against population size N (1 to 25)]
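The proportionality can be evaluated on a grid of population sizes. The sketch below is only an illustration: the uniform prior, the binomial observation model, the catch of 3 fish and the capture probability of 0.4 are all hypothetical choices, since the slide does not state which prior or data were used.

```python
from math import comb

def binomial_likelihood(catch, n, p_capture):
    """P(data | N): probability of catching `catch` fish out of n."""
    return comb(n, catch) * p_capture ** catch * (1 - p_capture) ** (n - catch)

sizes = range(1, 26)            # population sizes N = 1..25, as on the slide axis
catch, p_capture = 3, 0.4       # hypothetical data and capture probability

prior = {n: 1 / len(sizes) for n in sizes}          # hypothetical uniform prior
unnormalized = {n: binomial_likelihood(catch, n, p_capture) * prior[n] for n in sizes}
total = sum(unnormalized.values())
post = {n: u / total for n, u in unnormalized.items()}  # P(N | data)

most_probable = max(post, key=post.get)
print("most probable N:", most_probable)
```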
Discretization
[Figure: three panels showing pike yield divided into three classes (labelled ">0.015", "0.015-0.06" and "≤0.06" in the extracted text), vertical axis 0 to 60]
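Discretization means replacing a continuous variable by a small number of classes before it enters a Bayesian net. A minimal sketch, assuming class boundaries at 0.015 and 0.06 as suggested by the figure labels; the class names, the treatment of values exactly on a boundary and the sample values are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

EDGES = [0.015, 0.06]                 # class boundaries read from the figure labels
LABELS = ["low", "medium", "high"]    # hypothetical class names

def discretize(value):
    """Map a continuous pike yield to one of three classes.

    A value exactly on a boundary goes to the upper class (an assumption).
    """
    return LABELS[bisect_right(EDGES, value)]

# Hypothetical yields, only to show the mapping.
yields = [0.004, 0.012, 0.02, 0.05, 0.06, 0.11]
counts = Counter(discretize(y) for y in yields)
print(counts)
```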
Applying Bayes rule
Several uncertain, but supporting information sources increase the total evidence (= decrease uncertainty)
In WFD, the probability (posterior) of a certain classification result, obtained after the probabilistic assessment result of the first quality element (e.g. fish), could, and also should, be used as a prior for the analysis of the next element
= all quality elements have their own role
=> learning process of science
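The idea of using one element's posterior as the next element's prior can be sketched as follows. The likelihoods are the fish and phytoplankton values from the Results 1 table of this talk; treating the elements as conditionally independent given the true lake class, and the function name `update`, are assumptions of this example.

```python
# P(assessment = OK | true class), taken from the Results 1 table.
P_OK_GIVEN_OK = {"fish": 0.92, "phytoplankton": 0.91}
P_OK_GIVEN_RESTORE = {"fish": 0.23, "phytoplankton": 0.21}

def update(prior_ok, element, assessed_ok):
    """One Bayes rule step: returns P(lake = OK) after one element's assessment."""
    like_ok = P_OK_GIVEN_OK[element]
    like_restore = P_OK_GIVEN_RESTORE[element]
    if not assessed_ok:
        like_ok, like_restore = 1 - like_ok, 1 - like_restore
    numerator = like_ok * prior_ok
    return numerator / (numerator + like_restore * (1 - prior_ok))

p0 = 0.5                                                   # starting prior
p_after_fish = update(p0, "fish", True)                    # posterior after fish
p_after_both = update(p_after_fish, "phytoplankton", True) # fish posterior used as prior

# The order of the elements does not matter under this assumption.
p_other_order = update(update(p0, "phytoplankton", True), "fish", True)
print(round(p_after_fish, 2), round(p_after_both, 3), round(p_other_order, 3))
```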
Model structure
+ submodels (naive nets) under each element !
Submodels: naive Bayesian nets
[Diagram: class node linked to species nodes Sp_1, Sp_2, Sp_3, Sp_4 and Sp_5]
Generally speaking, the best methodology for classification
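In a naive Bayesian net the species variables are treated as conditionally independent given the class, so the class probability factorizes as below. The symbols follow the diagram labels (Class, Sp_1 to Sp_5); this is the standard naive Bayes form, written out here as a reading aid rather than taken from the slides.

```latex
P(\mathrm{Class} = c \mid \mathrm{Sp}_1, \dots, \mathrm{Sp}_5)
  \;\propto\; P(\mathrm{Class} = c) \prod_{i=1}^{5} P(\mathrm{Sp}_i \mid \mathrm{Class} = c)
```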
Data in this analysis: input to naive nets
Only one lake type
Fish stock data: 80 lakes, gillnet
Phytoplankton: 1330 samples
Benthos: 71 samples (22 lakes)
Macrophytes: 70 surveys (47 lakes)
”Truth” is needed to test the method = an arbitrary value of phosphorus was selected as the classifier for lake class
Analysis of data
Classes:
OK (high or good = < 30 µg TP/l)
Restore (moderate or less = > 30 µg TP/l)
Probability of correct classification: one data point at a time is left out of the parameter estimation, and the biological information of that data point is used to classify (the phosphorus of) that lake (Weka software)
[Diagram: data set with one data point left out]
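The leave-one-out procedure can be sketched in a few lines. The original analysis used the Weka software; the code below is a stand-in written for illustration, with a small naive Bayes classifier on presence/absence features, Laplace smoothing, and a made-up toy data set. None of these details come from the slides.

```python
from collections import defaultdict

def train(rows):
    """Count classes and, per class, how often each 0/1 feature equals 1."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(int)   # (label, feature index) -> count of 1s
    for features, label in rows:
        class_counts[label] += 1
        for i, f in enumerate(features):
            feature_counts[(label, i)] += f
    return class_counts, feature_counts

def predict(model, features):
    """Return the class with the highest naive Bayes score (Laplace smoothed)."""
    class_counts, feature_counts = model
    n = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = (count + 1) / (n + len(class_counts))
        for i, f in enumerate(features):
            p_one = (feature_counts[(label, i)] + 1) / (count + 2)
            score *= p_one if f else 1 - p_one
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def leave_one_out_accuracy(rows):
    """Leave each row out in turn, fit on the rest, classify the held-out row."""
    correct = 0
    for k, (features, label) in enumerate(rows):
        model = train(rows[:k] + rows[k + 1:])
        correct += predict(model, features) == label
    return correct / len(rows)

# Hypothetical toy data: species presence/absence per lake, with the "truth" class.
lakes = [((1, 1, 0), "OK")] * 4 + [((0, 0, 1), "Restore")] * 4
accuracy = leave_one_out_accuracy(lakes)
print("probability of correct classification:", accuracy)
```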
Model assumptions
• ”One out - all out”: total assessment is ”restore”, if one of the components goes to ”restore”
• Same model to test how Bayes rule works in classification
• Each element was analyzed with a separate, specific model (naive Bayes net). This ”meta-model” uses likelihoods estimated by those (also integrating) submodels
Results 1: Likelihoods (probabilities of correct/incorrect classifications)
Truth     Assessm.   Fish   Macroph.   Benthos   Phytopl.
OK        OK         0.92   0.93       0.75      0.91
OK        Restore    0.08   0.07       0.25      0.09
Restore   Restore    0.77   0.69       0.65      0.79
Restore   OK         0.23   0.31       0.35      0.21
Estimated by naive submodels for each element
The results of the last line are problematic!
Results 2: one-out all-out
Applying one-out, all-out:
If lake is restore, P(assessm. = restore) = 0.99
If lake is OK, P(assessm. = restore) = 0.37! (or even higher, depending on some details)
= Potential for misclassification, i.e. a lot of mismanagement!
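A rough check of these figures is possible from the Results 1 likelihoods. The sketch assumes that all four elements are assessed and that their errors are independent given the true lake class; both are assumptions of this example. Under them the detection probability comes to about 0.99, and the false alarm probability to about 0.42, which is in line with the remark that the figure can be higher than 0.37 depending on details.

```python
from math import prod

# P(assessment = restore | true class), from the Results 1 table.
P_RESTORE_GIVEN_OK = {
    "fish": 0.08, "macrophytes": 0.07, "benthos": 0.25, "phytoplankton": 0.09,
}
P_RESTORE_GIVEN_RESTORE = {
    "fish": 0.77, "macrophytes": 0.69, "benthos": 0.65, "phytoplankton": 0.79,
}

def p_any_restore(p_restore_by_element):
    """One-out all-out: total assessment is 'restore' if any element says so."""
    return 1 - prod(1 - p for p in p_restore_by_element.values())

p_detect = p_any_restore(P_RESTORE_GIVEN_RESTORE)   # lake is truly 'restore'
p_false_alarm = p_any_restore(P_RESTORE_GIVEN_OK)   # lake is truly 'OK'
print(round(p_detect, 2), round(p_false_alarm, 2))
```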
Results 3: Bayes rule/1
Applying Bayes rule for a single observation & naive net assessment (starting from prior = 0.5):
• obs: macr=OK; P (lake=OK) = 0.68
• obs: fish=OK; P (lake=OK) = 0.80
Bayes rule for 2 joined observations:
obs: macr=OK, fish=OK; P(lake=OK) = 0.89
obs: benth=OK, phyt=OK; P(lake=OK) = 0.87
obs: macr=resto, fish=resto; P (lake=resto) = 0.99
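As a check on two of these figures, Bayes rule can be applied directly to the likelihoods in the Results 1 table with prior 0.5. The second line assumes that the two elements are conditionally independent given the true lake class; that is an assumption of this sketch, and the remaining figures depend on details of the meta-model that are not given here.

```latex
P(\text{lake}=\text{OK} \mid \text{fish}=\text{OK})
  = \frac{0.92 \times 0.5}{0.92 \times 0.5 + 0.23 \times 0.5}
  = \frac{0.46}{0.575} = 0.80

P(\text{lake}=\text{resto} \mid \text{macr}=\text{resto}, \text{fish}=\text{resto})
  = \frac{0.69 \times 0.77 \times 0.5}{0.69 \times 0.77 \times 0.5 + 0.07 \times 0.08 \times 0.5}
  = \frac{0.5313}{0.5369} \approx 0.99
```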
Conclusions I: ”One out all out”
• The problem of the ”one out all out principle” is in the relatively high uncertainty between the real state of nature and the assessment result, i.e. in the likelihood functions (especially benthos in this data set)
• The more uncertain elements there are, the more likely a ”false alarm” is
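The second conclusion follows from simple probability. A minimal sketch, assuming every element has the same false alarm rate and that errors are independent between elements; the rate of 0.1 is hypothetical.

```python
def false_alarm_probability(per_element_rate, n_elements):
    """P(at least one element wrongly says 'restore') for an OK lake."""
    return 1 - (1 - per_element_rate) ** n_elements

RATE = 0.1  # hypothetical false alarm rate of a single element

table = {n: false_alarm_probability(RATE, n) for n in range(1, 7)}
for n, p in table.items():
    print(n, "elements:", round(p, 3))
```

Each added element can only raise the probability that at least one of them gives a false ”restore”.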
Conclusions II: Bayes model
• Bayes rule helps to integrate uncertain evidence from several sources
• Assessment result ”restore” is likely to be correct with a Bayesian model
• Assessment result ”OK” is more uncertain, as it may mean a ”restore” lake (see likelihood relationships)
• Bayesian models are an easier and cheaper way to decrease uncertainty than increased monitoring effort
Conclusions III: Management
• There is clearly a need to link management decisions (program of measures) to the classification: this would give content to the uncertainty in classification (= probability of misallocation of money?)
• We suggest that probability of misclassification is a policy issue, not a scientific issue
• Classification models may have an impact on interest to collect/improve data?
Way forward I: Risk assessment and risk management
[Figure: pressure, and CHL or ”P level of no return”, with three levels A, B and C marked]
A = point estimate level
B = risk averse attitude in threshold only
C = implementation uncertainty included
Conclusions IV
• Risk assessment and risk management must be separated (ref. to Scientific, Technical and Economic Committee for Fisheries)
• Framework directive = should risk attitude be country specific? On which values of society must it be based?
• Does the number of people per lake have an impact on management conclusions? (public participation = mechanism to bring in values)
Conclusions V
• Bayesian network methodology is easy: one week of education is enough to start with your own data
• The conceptual part is more difficult, but far easier than understanding the real information content of test statistics in ”classical statistics”
• Bayesian parameter estimation (in some areas ”the most correct way to do it”) with e.g. WinBUGS software is more difficult, but achievable in 6 – 8 months of work
• Education!!!! = Marie Curie activities, join with fisheries?
Way forward II: Multiobjective valuation
An example of the value-tree:
goals: Improved lake
objectives (weights 0 - 1): Ecolog. status, Fishing, Recr. inter.
criteria: Fish, Macrop., Swimming, Boating, Kg/ha, CPUE
alternatives: Lake 1, Lake 2, Lake 3
Anne-Marie Hagman, Mika Marttunen, SYKE
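One common way to turn such a value tree into a single preference value per lake is a weighted sum of scores. The sketch below assumes that additive form; the weights and the scores for Lake 1 to Lake 3 are hypothetical, and the slides do not state which aggregation rule the original work used.

```python
# Hypothetical objective weights (0 - 1), summing to 1.
WEIGHTS = {"ecological_status": 0.5, "fishing": 0.3, "recreation": 0.2}

# Hypothetical scores on a 0 - 1 scale for each alternative.
SCORES = {
    "Lake 1": {"ecological_status": 0.2, "fishing": 0.6, "recreation": 0.5},
    "Lake 2": {"ecological_status": 0.8, "fishing": 0.4, "recreation": 0.3},
    "Lake 3": {"ecological_status": 0.5, "fishing": 0.5, "recreation": 0.6},
}

def preference_value(scores, weights):
    """Weighted sum of objective scores (assumed additive value function)."""
    return sum(weights[objective] * scores[objective] for objective in weights)

values = {lake: preference_value(s, WEIGHTS) for lake, s in SCORES.items()}
ranking = sorted(values, key=values.get, reverse=True)
print(ranking)
```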
Way forward II: Example of ranking
The higher the preference value, the higher the lake's priority on the ”action list”
Several publications, work related to WFD starting
[Figure: preference values (axis 0 to 0.9) for Lake Hunttijärvi, Lake Isojärvi, Lake Sääksjärvi, Lake Ahvenlampi, Lake Sahajärvi, Lake Iso-Vuotava and Lake Venunjärvi; legend: number of inhabitants, attractiveness, attainment, ecolog. status, cottages, swimming]
By: Anne-Marie Hagman, Mika Marttunen, SYKE