8/13/2019 Wuhan Key
Decision Making With Uncertainty And Data Mining
David L. Olson
Department of Management
University of Nebraska
Lincoln, NE 68588-0491
Dolson@unl.edu
Desheng Wu (corresponding author)
School of Business
University of Science and Technology of China
Hefei, Anhui 230026, P.R. China
dash@ustc.edu
Keywords: Multiple attribute decision making (MADM); data mining; uncertainty; Fu[…]
Abstract.
Data mining is a newly developed and emerging area of computational intelligence that offers new theories, techniques, and tools for the analysis of large data sets. It is expected to offer more and more support to modern organi[…]
business processes. The field of data mining aims to improve decision making by focusing on discovering valid, comprehensible, and potentially useful knowledge from large data sets.
This paper presents a brief demonstration of the use of Monte Carlo simulation in grey related analysis. Simulation provides a means to describe expected results more completely, including identification of the probability that a particular option is best in a multiattribute setting. The next section describes a Monte Carlo simulation of the results of decision tree analysis of real credit card data. Monte Carlo simulation provides a means to assess the relative performance of alternative decision tree models more completely. Relative performance of crisp and fu[…]
reflect uncertainty as expressed by fu[…]
+ntFnio B".6 ".8C B".) ".$C B".' ".#C B".# ".8C B"." ".#C B".# ".)C B".) %.""C
*Gbio B".' ".#C B"." ".'C B".6 ".8C B"." ".6C B"." ".)C B"." ".'C B"." ".#C
+lberto B".# ".6C B".'" ".8"C B".6 ".8C B"." ".8"C B". ".$"C B".'" ".#C B".) %.""C
*ernando B".8 %.""C B". ".)C B".6 ".8C B".% ".6C B"." ".)"C B".# ".8"C B". ".)"C
?sabel B"." ".$C B".6 ".$C B".# ".6C B".6 ".$C B"." "."C B".# ".8"C B"." ".$"C
:afaela B".6 ".8C B".% ".C B".# ".6C B".' ".)C B"." ".#C B".# ".8"C B".%" ".C
All of these index values are positive. The next step of the grey related method is to standardi[…]ences based on the optimal weighted interval number value for every alternative. For each attribute, this is the interval number whose left end is the maximum left interval value over all alternatives and whose right end is the maximum right interval value over all alternatives. For C1, this yields the interval number in the C1 column of Table 3, which reflects the maximum weighted value obtained in the data set for attribute C1. Table 3 gives this vector, which reflects the range of value possibilities (entries are not rounded):
Table 3: Reference Number Vector
C1 C2 C3 C4 C5 C6 C7
Ma@&Min( ".%)"" ".''" "."' ".%6' "."' "."'' ".%8)
Ma@&Ma@( "."" ".'' ".'" ".#)" ".#"" ".'#"" ".""
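A minimal Python sketch of the reference-vector step, assuming each weighted entry is stored as a (lower, upper) tuple: for every attribute it takes the largest lower bound and the largest upper bound over all alternatives, which is how the Max(Min) and Max(Max) rows of Table 3 are described. The helper name `reference_vector` and the three-alternative data are hypothetical, not values from the paper.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def reference_vector(matrix: List[List[Interval]]) -> List[Interval]:
    """Return one interval per attribute: the maximum lower bound and the
    maximum upper bound taken over all alternatives (rows)."""
    n_attributes = len(matrix[0])
    return [
        (max(row[k][0] for row in matrix), max(row[k][1] for row in matrix))
        for k in range(n_attributes)
    ]


# Hypothetical weighted interval values: 3 alternatives x 2 attributes.
weighted = [
    [(0.06, 0.08), (0.07, 0.09)],
    [(0.02, 0.04), (0.10, 0.12)],
    [(0.04, 0.06), (0.08, 0.11)],
]
print(reference_vector(weighted))  # [(0.06, 0.08), (0.1, 0.12)]
```

Because the two maxima are taken independently, the left and right ends of a reference interval may come from different alternatives.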
Distances are defined as the maximum difference between each interval value and the generated extremes. Table 4 shows the calculated distances by alternative.
Table 4: Distances From Alternatives to Reference Number Vector
Distances "1 "2 "# "$ "% "& "'+ntFnio B."#, ."'C B", "C B."', .%'C B.", ."C B."#, .'"'C B", ."%C B", "C
*Gbio B.%', .%$'C B.'%, .8C B", "C B."8), .%C B.""), .
"6)C
B."', .%6C B.%), .
"'C
+lberto B."8, .%''C B.%6, ."8'C B", "C B."), .")C B", "C B."%', .
%"C
B", "C
*ernando B", "C B.%', .%%C B", "C B.%', .%C B.""), ."$C B", "C B.%, .%6C
?sabel B."), ."%)C B.", "C B."%, ."6C B", "C B."#, .%8C B", "C B."6', .
"C
:afaela B."#, ."'C B.%8, .C B."%, ."6C B.%, .%C B."#, .'"'C B", "C B.%6', .
'#)C
The maximum distance of each alternative from the ideal is identified as the larger of the distance calculations in each cell of Table 4. These maxima are shown in Table 5.
Table 5: Maximum Distances
Distances C1 C2 C3 C4 C5 C6 C7
+ntFnio "."' " ".%' "." ".'"' "."% "
*Gbio ".%$' ".8 " ".% "."6) ".%6 "."'
+lberto ".%'' ".%6 " ".") " ".%" "*ernando " ".%' " ".% "."$ " ".%6
?sabel ".") "." "."6 " ".%8 " "."6'
:afaela "."' ". "."6 ".% ".'"' " ".'#)
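The distance steps behind Tables 4 and 5 can be sketched the same way. This assumes that each cell of Table 4 records the absolute gap at the lower end and at the upper end of the interval, and that Table 5 keeps the larger of the two; the helper names `endpoint_gaps` and `max_distances` and the data below are illustrative only.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def endpoint_gaps(cell: Interval, ref: Interval) -> Tuple[float, float]:
    """Table 4 style: absolute gap at the lower end and at the upper end."""
    return (abs(ref[0] - cell[0]), abs(ref[1] - cell[1]))


def max_distances(matrix: List[List[Interval]],
                  reference: List[Interval]) -> List[List[float]]:
    """Table 5 style: keep the larger of the two gaps in every cell."""
    return [
        [max(endpoint_gaps(cell, ref)) for cell, ref in zip(row, reference)]
        for row in matrix
    ]


# Hypothetical data: 2 alternatives x 2 attributes, plus a reference vector.
weighted = [[(0.06, 0.08), (0.07, 0.09)],
            [(0.02, 0.04), (0.10, 0.12)]]
reference = [(0.06, 0.08), (0.10, 0.12)]
for row in max_distances(weighted, reference):
    print([round(d, 4) for d in row])  # [0.0, 0.03] then [0.04, 0.0]
```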
A reference point $U_0 = ([u_0^-(1), u_0^+(1)], [u_0^-(2), u_0^+(2)], \ldots, [u_0^-(n), u_0^+(n)])$ is established as the maximum of the entries in each column of Table 5. This point has a minimum of 0 and a maximum of 0.80, so the reference point is [0, 0.80]. Next, the method calculates the maximum distance between the reference point and each of the weighted matrix C values.
Based upon the weighted interval number standardi[…]ence $U_0 = ([u_0^-(1), u_0^+(1)], [u_0^-(2), u_0^+(2)], \ldots, [u_0^-(n), u_0^+(n)])$, the formula for this calculation is given as follows:
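$$
\xi_i(k) = \frac{\min\limits_{i}\min\limits_{k}\left|[c_i^-(k), c_i^+(k)] - [u_0^-(k), u_0^+(k)]\right| + \rho\,\max\limits_{i}\max\limits_{k}\left|[c_i^-(k), c_i^+(k)] - [u_0^-(k), u_0^+(k)]\right|}{\left|[c_i^-(k), c_i^+(k)] - [u_0^-(k), u_0^+(k)]\right| + \rho\,\max\limits_{i}\max\limits_{k}\left|[c_i^-(k), c_i^+(k)] - [u_0^-(k), u_0^+(k)]\right|}
$$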
where $\rho \in (0, 1)$ is called the resolving coefficient. The smaller $\rho$ is, the greater its resolving power. In general, $\rho \in [0, 1]$; the value of $\rho$ may change according to the practical situation.
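A sketch of the grey relational coefficient in Python, assuming the quantity inside the absolute-value bars is the larger endpoint gap between an interval and the reference (the reading used for Tables 4 and 5), so the function receives a plain matrix of those distances. The value ρ = 0.5 is only an illustrative default inside the stated range, and `grey_coefficients` is a hypothetical name.

```python
from typing import List


def grey_coefficients(delta: List[List[float]], rho: float = 0.5) -> List[List[float]]:
    """xi_i(k) = (min_i min_k delta + rho * max_i max_k delta)
               / (delta_ik + rho * max_i max_k delta)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("resolving coefficient rho must lie in (0, 1)")
    flat = [d for row in delta for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        raise ValueError("all distances are zero; coefficients are undefined")
    return [[(d_min + rho * d_max) / (d + rho * d_max) for d in row] for row in delta]


# Hypothetical maximum distances: 2 alternatives x 2 attributes.
delta = [[0.0, 0.2],
         [0.4, 0.1]]
for row in grey_coefficients(delta):
    print([round(x, 3) for x in row])  # [1.0, 0.5] then [0.333, 0.667]
```

A cell with zero distance gets coefficient 1 and larger distances give smaller coefficients. The guard against an all-zero matrix is added here because the formula would otherwise divide zero by zero.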
Results by alternative are given in Table 6:
Table 6: Weighted Distances to Reference Point
Distances C1 C2 C3 C4 C5 C6 C7 Averages
+ntFnio ".)8)%# % ".6%6""" ".)$8%# ".#8)#' ".$'))%% % ).*)1%12
*Gbio ".""""" ". % ".6'"## ".)#"8 ".8#6' ".8888$ ).%*)$$%
+lberto ".6%%%%% ".8#6' % ".)%$6'6 % ".6#)"$ % ).'**)#'
*ernando % ".6%6""" % ".6'"## ".68%#%6 % ".8#6' ).''11#2
?sabel ".) ".86%6$ ".)6')6 % ".%6))$ % ".)#$"' ).*)$&%1
:afaela ".)8)%# ".68#'% ".)6')6 ".68%'" ".#8)#' % ".#)"" ).&$2'*2
The average $r_i = \frac{1}{n}\sum_{k=1}^{n} \xi_i(k)$, $i = 1, 2, \ldots, m$, of these weighted distances is used as the reference number to order alternatives. These averages reflect how far each alternative is from the nadir, along with how close it is to the ideal, much as in TOPSIS. This set of numbers indicates that Isabel is the preferred alternative, although António is extremely close, with Alberto and Fernando close behind. This closeness demonstrates that the fu[…]
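The averaging and ranking step can be sketched as follows, assuming a larger average $r_i$ means an alternative sits closer to the reference point, which is the usual reading in grey relational analysis. The function name `rank_alternatives`, the alternative names, and the coefficient values are placeholders, not the data behind Table 6.

```python
from typing import Dict, List, Tuple


def rank_alternatives(names: List[str],
                      xi: List[List[float]]) -> List[Tuple[str, float]]:
    """Average each alternative's coefficients (r_i = (1/n) * sum_k xi_i(k))
    and sort from the largest average (closest to the reference) downward."""
    grades: Dict[str, float] = {name: sum(row) / len(row) for name, row in zip(names, xi)}
    return sorted(grades.items(), key=lambda item: item[1], reverse=True)


# Hypothetical coefficients for three alternatives over two attributes.
names = ["Alt A", "Alt B", "Alt C"]
xi = [[1.0, 0.5],
      [0.25, 0.75],
      [0.5, 0.25]]
print(rank_alternatives(names, xi))
# [('Alt A', 0.75), ('Alt B', 0.5), ('Alt C', 0.375)]
```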
X is the random number drawn (which is the area):
?fX!-
( ) ( )
"!
aaaaaaXaX
+
++=
%'#%'% &8(
?f! X!$K-
( )'' aaK
!XaX
+= &$(
?f!$KX-
( ) ( ) ( )"!
aaaaaaXaX
+
+=
%'##%# &%"(
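The case analysis above is an inverse-transform draw from a trapezoidal distribution. Because the printed equations are garbled, the sketch below is derived independently from the trapezoid's geometry rather than copied from equations (8) to (10): with corners a1 ≤ a2 ≤ a3 ≤ a4, the height is 2/((a4 + a3) − (a2 + a1)), and the breakpoints `k1` and `k2` are the areas to the left of a2 and a3. The function name, breakpoint names, and seed are assumptions, and Python's `random` module stands in for Crystal Ball.

```python
import math
import random


def trapezoid_inverse(x_area: float, a1: float, a2: float, a3: float, a4: float) -> float:
    """Map a cumulative area X in [0, 1] to a point on the x-axis of a
    trapezoidal density with corners a1 <= a2 <= a3 <= a4."""
    if not (a1 <= a2 <= a3 <= a4) or (a4 + a3) - (a2 + a1) <= 0:
        raise ValueError("need a1 <= a2 <= a3 <= a4 with a positive spread")
    spread = (a4 + a3) - (a2 + a1)      # equals 2 / height of the trapezoid
    height = 2.0 / spread
    k1 = height * (a2 - a1) / 2.0       # area to the left of a2
    k2 = k1 + height * (a3 - a2)        # area to the left of a3
    if x_area <= k1:                    # rising edge
        return a1 + math.sqrt(x_area * (a2 - a1) * spread)
    if x_area <= k2:                    # flat top
        return a2 + (x_area - k1) / height
    return a4 - math.sqrt((1.0 - x_area) * (a4 - a3) * spread)  # falling edge


rng = random.Random(2019)  # arbitrary illustrative seed
samples = [trapezoid_inverse(rng.random(), 0.0, 1.0, 2.0, 3.0) for _ in range(5)]
print([round(s, 3) for s in samples])
```

With corners (0, 1, 2, 3), an area of 0.25 maps to 1.0 and an area of 0.5 maps to 1.5, the midpoint of this symmetric trapezoid.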
Our calculation is based upon drawing a random number reflecting the area (starting on the left (a1) as 0 and ending on the right (a4) as 1) and calculating the corresponding distance on the x-axis. The simulation software Crystal Ball was used to replicate each model 1,000 times for each random number seed, and the software enabled counting the number of times each alternative won. Probabilities given in Table 7 are thus simply the number of times each alternative had the highest value score divided by 1,000. This was done ten times, using different seeds; therefore, mean probabilities and standard deviations (std) are based on 10,000 simulations. The Min and Max entries are the minimum and maximum probabilities over the ten replications shown in the table.
Table 7: Simulated Probabilities of Winning for Uniform Fu[…]
seed6789 0.381 0.000 0.179 0.046 0.394 0.000
seed7890 0.343 0.000 0.199 0.02 0.406 0.000
seed8901 0.328 0.000 0.201 0.04 0.426 0.000
seed9012 0.33 0.000 0.189 0.048 0.410 0.000
seed0123 0.360 0.000 0.183 0.03 0.404 0.000
Min 0.328 0.000 0.168 0.040 0.384 0.000
Mean 0.34 0.000 0.189 0.047 0.410 0.000
Max 0.381 0.000 0.210 0.03 0.429 0.000
std 0.017 0.000 0.012 0.004 0.01 0.000
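The counting procedure described for Table 7 can be sketched in Python under simplifying assumptions: each alternative's overall score is drawn uniformly from a single hypothetical interval (rather than being rebuilt from simulated attribute values), Python's `random` module replaces Crystal Ball, the seeds are arbitrary, and `std` is the sample (n − 1) standard deviation, which may differ from what the authors' software reported. The function names are illustrative.

```python
import random
import statistics
from typing import Dict, Tuple


def win_probabilities(intervals: Dict[str, Tuple[float, float]],
                      trials: int, seed: int) -> Dict[str, float]:
    """Draw one uniform score per alternative in each trial and return the
    share of trials in which each alternative had the highest score."""
    rng = random.Random(seed)
    wins = {name: 0 for name in intervals}
    for _ in range(trials):
        scores = {name: rng.uniform(lo, hi) for name, (lo, hi) in intervals.items()}
        wins[max(scores, key=scores.get)] += 1
    return {name: count / trials for name, count in wins.items()}


def replicate(intervals, trials=1000, seeds=range(10)):
    """Repeat the experiment once per seed and summarize in the style of Table 7."""
    runs = [win_probabilities(intervals, trials, seed) for seed in seeds]
    summary = {}
    for name in intervals:
        probs = [run[name] for run in runs]
        summary[name] = {
            "mean": statistics.mean(probs),
            "std": statistics.stdev(probs),  # sample standard deviation
            "min": min(probs),
            "max": max(probs),
        }
    return summary


# Hypothetical score intervals for three alternatives.
intervals = {"Alt A": (0.70, 0.80), "Alt B": (0.72, 0.78), "Alt C": (0.50, 0.60)}
for name, stats in replicate(intervals).items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

Here "Alt C" can never win because its whole interval lies below the others, so its row is all zeros, much like the zero columns in Table 7.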
3.1. Analysis of Results
The results for each system were very similar. Differences were tested by t-tests of the differences in means by alternative. None of these difference tests was significant at the 0.95 level (two-tailed tests). This establishes that no significant difference in interval or trape[…]
While this example is based on a small set of data, the intent was to demonstrate that what can be done in that context can also be applied to large-scale data sets. To our knowledge, our proposal is unique in using simulation to make fuller use of grey-related data that more accurately reflect the real problem. If this can be done with small-scale data sets, our contention is that it can also be done with large-scale data sets in a data mining context.
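The t-tests mentioned in Section 3.1 compare, for one alternative at a time, the mean win probability under one distribution assumption with that under another. A pooled-variance two-sample t statistic is sketched below; the text does not say whether pooled or Welch variances were used, so pooling is an assumption, and the replication values and the name `pooled_t` are invented for illustration.

```python
import math
import statistics
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2


# Hypothetical win probabilities for one alternative under two distributions,
# ten replications each (not the paper's data).
uniform_runs = [0.40, 0.42, 0.41, 0.39, 0.43, 0.40, 0.41, 0.42, 0.40, 0.41]
trapezoid_runs = [0.41, 0.40, 0.42, 0.41, 0.40, 0.43, 0.39, 0.41, 0.42, 0.40]
t, df = pooled_t(uniform_runs, trapezoid_runs)
# Two-tailed 5% critical value for 18 degrees of freedom is about 2.101.
print(round(t, 3), df, "significant" if abs(t) > 2.101 else "not significant")
```

With ten replications per system there are 18 degrees of freedom, so the critical value in the comment applies only to that case; for other sample sizes it has to be looked up in a t table.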
4. Grey Related Decision Tree Model
Grey related analysis is expected to provide improvement over crisp models by better reflecting the uncertainty inherent in many human analysts' minds. Data mining models based upon such data are expected to be less accurate, but hopefully not by very much. However, grey related model input would be expected to be more stable under conditions of uncertainty, as the degree of change in input data increases.
We applied decision tree analysis to a small set (1,000 observations total) of credit card data. Originally, there was one output variable (whether or not the account defaulted, a binary variable with 1 representing default and 0 representing no default) and 6 available explanatory variables. These variables were analy[…]
The explanatory variables included five binary variables and one categorical variable, with the remaining 20 being continuous. To reflect fu[…]e decision trees were obtained, with formulas again given below. A total of seven explanatory variables were used in these four categorical decision trees.
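The categorical trees in the appendix test attributes against labels such as high, mid and lo. The text does not say how continuous values were mapped to those labels, so the sketch below simply cuts a sample into terciles; the cut rule, the function names `tercile_cuts` and `to_category`, and the income values are hypothetical.

```python
from typing import Sequence, Tuple


def tercile_cuts(values: Sequence[float]) -> Tuple[float, float]:
    """Two cut points that split the sorted sample into roughly equal thirds."""
    s = sorted(values)
    n = len(s)
    return s[n // 3], s[(2 * n) // 3]


def to_category(value: float, cuts: Tuple[float, float]) -> str:
    """Map a continuous value to the labels 'lo', 'mid' or 'high'."""
    low_cut, high_cut = cuts
    if value < low_cut:
        return "lo"
    if value < high_cut:
        return "mid"
    return "high"


incomes = [12, 18, 25, 31, 40, 47, 55, 63, 80]  # hypothetical values
cuts = tercile_cuts(incomes)                       # (31, 55)
print([to_category(v, cuts) for v in incomes])
```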
These models were then entered into a Monte Carlo simulation (supported by Crystal Ball software). A perturbation of each input variable was generated, set at five different levels of perturbation. The intent was to measure the loss of accuracy for crisp and grey related models.
The model results are given in the seven model reports in the appendix. Since different variables were included in different models, it is not possible to compare relative accuracy directly as measured by fitting test data. However, the mean test-data accuracies for each model, given in Table 9, show that the crisp models declined in accuracy more than the categorical models. The column headings in Table 9 reflect the degree of perturbation simulated.
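The perturbation experiment can be sketched as follows. The additive uniform noise in [−width, width], the single-threshold `rule` standing in for a decision tree, and the synthetic 100-observation test set are all assumptions for illustration; nothing here reproduces the paper's models or results.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def accuracy(model: Callable[[Row], int], xs: List[Row], ys: List[int]) -> float:
    """Share of observations whose predicted class matches the actual class."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)


def perturbed_accuracy(model: Callable[[Row], int], xs: List[Row], ys: List[int],
                       width: float, runs: int = 1000,
                       seed: int = 0) -> Tuple[float, float, float]:
    """Add uniform noise in [-width, width] to every input, score the model,
    and return (min, mean, max) accuracy over the simulation runs."""
    rng = random.Random(seed)
    accs = []
    for _ in range(runs):
        noisy = [[v + rng.uniform(-width, width) for v in x] for x in xs]
        accs.append(accuracy(model, noisy, ys))
    return min(accs), sum(accs) / runs, max(accs)


def rule(x: Row) -> int:
    """Hypothetical one-variable threshold rule standing in for a tree."""
    return 1 if x[0] > 0.5 else 0


data_rng = random.Random(42)
xs = [[data_rng.random()] for _ in range(100)]   # 100 test observations
ys = [1 if x[0] > 0.5 else 0 for x in xs]
ys[:10] = [1 - y for y in ys[:10]]               # label noise: 10 errors
for width in (0.25, 0.5, 1.0, 2.0):
    lo, mean, hi = perturbed_accuracy(rule, xs, ys, width, runs=200)
    print(width, round(lo, 2), round(mean, 3), round(hi, 2))
```

Unperturbed, this rule scores 0.90 by construction; widening the noise shows how a crisp threshold loses accuracy as inputs drift across it.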
Table 9: Mean Model Accuracy
Model Crisp 0.25 0.50 1.00 2.00 3.00 4.00 0.25
7ontin0o0s % ".)" ".)" ".)" ."68 ".6) ".66 ".6 ".)"
7ontin0o0s ' ".6) ".6) ".6) ".6) ".6) ".66 ".66 ".6)
7ontin0o0s ".)% ".)% ".)" ".6$ ".6) ".6) ".66 ".)%"ontinuous ).&-# ).&-# ).&-) ).&*) .&') ).&&' ).&%' ).&-#
7ategorical % ".)" ".)" ".68 ".6) ".66 ".66 ".6 ".)"
7ategorical ' ".)" ".)" ".)" ".6$ ".68 ".6) ".6) ".)"
7ategorical ".)" ".)" ".)" ".6$ ".6$ ".68 ".6) ".)"
7ategorical # ".)" ".)" ".)" ".6$ ".68 ".6) ".6) ".)"
"ategorical ).')) ).')) ).&-% ).&** ).&'* ).&') ).&&% ).'))
The fu[…]
technique offers more insights to assist our decision making in fu[…]
APPENDIX: Models and their results
Continuous Model 1:
I!&$3&.$$454I!%*31.%$464I!&3#.-14546777
[Frequency chart: Forecast Cont 1 accuracy, 1000 trials, 994 displayed]
Test matrix:
Model 0 Model 1 Accuracy
Actual 0 43 16
Actual 1 14 27 0.70
Simulation accuracy of 100 observations, 1000 simulation runs
pert0rbation B!".',".'C ".6)!".)
pert0rbation B!".","."C ".6!".)#
pert0rbation B!%,%C ".6'!".)pert0rbation B!','C ".8!".)#
pert0rbation B!,C ".)!".)#pert0rbation B!#,#C ".6!".)
Continuous Model 2:
I!&$3&.$$45467
[Frequency chart: Forecast Cont 2 accuracy, 1000 trials, 991 displayed]
Test matrix:
Model 0 Model 1 Accuracy
Actual 0 40 19
Actual 1 14 27 0.67
Simulation accuracy of 100 observations, 1000 simulation runs
pert0rbation B!".',".'C ".6!".)%pert0rbation B!".","."C ".6!".)%
pert0rbation B!%,%C ".6"!".)#
pert0rbation B!','C ".8!".)
pert0rbation B!,C ".!".)8pert0rbation B!#,#C ".!".)6
Continuous Model 3:
I!&$3&.$$454I!%*31.%$464I! .2*4645777
[Frequency chart: Forecast Cont 3 accuracy, 1000 trials, 996 displayed]
Test matrix:
Model 0 Model 1 Accuracy
Actual 0 44 15
Actual 1 14 27 0.71
Simulation accuracy of 100 observations, 1000 simulation runs
pert0rbation B!".',".'C ".6!".)6pert0rbation B!".","."C ".6!".)6
pert0rbation B!%,%C ".$!".))
pert0rbation B!','C ".#!".)$
pert0rbation B!,C ".!".)8pert0rbation B!#,#C ".!".)6
Categorical Model 2:
I!&$89high94I!%$89lo94
I!"D:89mid94I!#'89lo9464574if!"D:89lo9454677
I!%$89high94if!$*89mid945467467457
[Frequency chart: Forecast Cat2 accuracy, 1000 trials, 997 displayed]
Test matrix:
Model 0 Model 1 Accuracy
Actual 0 42 17
Actual 1 13 28 0.70
Simulation accuracy of 100 observations, 1000 simulation runs
pert0rbation B!".',".'C ".6!".)
pert0rbation B!".","."C ".6#!".)6
pert0rbation B!%,%C ".6%!".)6pert0rbation B!','C ".8!".)6
pert0rbation B!,C ".)!".8"pert0rbation B!#,#C ".6!".)$
Categorical Model 3:
I!&$89high946457
[Frequency chart: Forecast Cat3 accuracy, 1000 trials, 982 displayed]
Test matrix:
Model " Model % +cc0racy+ct0al " '6
+ct0al % # ) ".)"
Simulation accuracy of 100 observations, 1000 simulation runs
pert0rbation B!".',".'C ".68!".)"pert0rbation B!".","."C ".6)!".)%
pert0rbation B!%,%C ".66!".)'
pert0rbation B!','C ".6'!".)
pert0rbation B!,C ".$!".)pert0rbation B!#,#C ".$!".)6
Categorical Model 4:
I!&$89high94
I!%$89lo94
I!"D:89mid94I!#'89lo9464574I!"D:89lo94I!&13.%464574677
I!%$89high94I!$*89mid945467467 457
[Frequency chart: Forecast Cat4 accuracy, 1000 trials, 998 displayed]
Test matrix:
Model 0 Model 1 Accuracy
Actual 0 41 18
Actual 1 12 29 0.70
Simulation accuracy of 100 observations, 1000 simulation runs
pert0rbation B!".',".'C ".6!".)6
pert0rbation B!".","."C ".6#!".))pert0rbation B!%,%C ".6%!".))
pert0rbation B!','C ".8!".))pert0rbation B!,C ".)!".))pert0rbation B!#,#C ".!".)8