Wuhan Key


  • 8/13/2019 Wuhan Key

    1/22

    Decision Making with Uncertainty and Data Mining

    David L. Olson
    Department of Management
    University of Nebraska
    Lincoln, NE 68588-0491
    (402) 472-4521
    FAX: (402) 472-5855
    Dolson3@unl.edu

    Desheng Wu (corresponding author)
    School of Business
    University of Science and Technology of China
    Hefei, Anhui 230026, P.R. China
    dash@ustc.edu

    Keywords: Multiple attribute decision making (MADM); data mining; uncertainty; fuzzy [...]


    Abstract.

    Data mining is a newly developed and emerging area of computational intelligence that offers new theories, techniques, and tools for analysis of large data sets. It is expected to offer more and more support to modern organizations [...]


    [...] business processes. The field of data mining aims to improve decision making by focusing on discovering valid, comprehensible, and potentially useful knowledge from large data sets.

    This paper presents a brief demonstration of the use of Monte Carlo simulation in grey related analysis. Simulation provides a means to describe expected results more completely, including identification of the probability of a particular option being best in a multiattribute setting. The next section describes a Monte Carlo simulation of results of decision tree analysis of real credit card data. Monte Carlo simulation provides a means to assess the relative performance of alternative decision tree models more completely. Relative performance of crisp and fuzzy [...]

    [...] reflect uncertainty as expressed by fuzzy [...]

    +ntFnio B".6 ".8C B".) ".$C B".' ".#C B".# ".8C B"." ".#C B".# ".)C B".) %.""C

    *Gbio B".' ".#C B"." ".'C B".6 ".8C B"." ".6C B"." ".)C B"." ".'C B"." ".#C

    +lberto B".# ".6C B".'" ".8"C B".6 ".8C B"." ".8"C B". ".$"C B".'" ".#C B".) %.""C

    *ernando B".8 %.""C B". ".)C B".6 ".8C B".% ".6C B"." ".)"C B".# ".8"C B". ".)"C

    ?sabel B"." ".$C B".6 ".$C B".# ".6C B".6 ".$C B"." "."C B".# ".8"C B"." ".$"C

    :afaela B".6 ".8C B".% ".C B".# ".6C B".' ".)C B"." ".#C B".# ".8"C B".%" ".C

    All of these index values are positive. The next step of the grey related method is to standardize [...] sequences based on the optimal weighted interval number value for every alternative. This is defined as the interval number for each attribute whose left endpoint is the maximum left interval value over all alternatives and whose right endpoint is the maximum right interval value over all alternatives. For C1, this would yield the interval number [0.17, 0.]. This reflects the maximum weighted value obtained in the data set for attribute C1. Table 3 gives this vector, which reflects the range of value possibilities (entries are not rounded):

    Table 3: Reference Number Vector

    C1 C2 C3 C4 C5 C6 C7

    Ma@&Min( ".%)"" ".''" "."' ".%6' "."' "."'' ".%8)

    Ma@&Ma@( "."" ".'' ".'" ".#)" ".#"" ".'#"" ".""
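    The construction of the reference number vector can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `reference_vector` and the demo values are hypothetical, and the demo matrix is not the paper's data.

    ```python
    from typing import List, Tuple

    Interval = Tuple[float, float]


    def reference_vector(weighted: List[List[Interval]]) -> List[Interval]:
        """Per attribute, take the largest left endpoint and the largest
        right endpoint over all alternatives (rows)."""
        n_attributes = len(weighted[0])
        return [
            (
                max(row[k][0] for row in weighted),  # Max(Min) entry
                max(row[k][1] for row in weighted),  # Max(Max) entry
            )
            for k in range(n_attributes)
        ]


    # Hypothetical weighted interval values: 3 alternatives x 2 attributes.
    demo_matrix = [
        [(0.10, 0.20), (0.05, 0.15)],
        [(0.12, 0.18), (0.02, 0.25)],
        [(0.08, 0.22), (0.04, 0.10)],
    ]
    reference = reference_vector(demo_matrix)
    print(reference)
    ```

    Note that the two endpoints of a reference interval may come from different alternatives, which is why the table reports a Max(Min) row and a Max(Max) row separately.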

    Distances are defined as the maximum between each interval value and the extremes generated. Table 4 shows the calculated distances by alternative.

    Table 4: Distances From Alternatives to Reference Number Vector

    Distances "1 "2 "# "$ "% "& "'+ntFnio B."#, ."'C B", "C B."', .%'C B.", ."C B."#, .'"'C B", ."%C B", "C

    *Gbio B.%', .%$'C B.'%, .8C B", "C B."8), .%C B.""), .

    "6)C

    B."', .%6C B.%), .

    "'C

    +lberto B."8, .%''C B.%6, ."8'C B", "C B."), .")C B", "C B."%', .

    %"C

    B", "C

    *ernando B", "C B.%', .%%C B", "C B.%', .%C B.""), ."$C B", "C B.%, .%6C

    ?sabel B."), ."%)C B.", "C B."%, ."6C B", "C B."#, .%8C B", "C B."6', .

    "C

    :afaela B."#, ."'C B.%8, .C B."%, ."6C B.%, .%C B."#, .'"'C B", "C B.%6', .

    '#)C

    The maximum distance for each alternative to the ideal is identified as the largest distance calculation in each cell of Table 4. These maxima are shown in Table 5.

    Table 5: Maximum Distances

    Distances C1 C2 C3 C4 C5 C6 C7

    +ntFnio "."' " ".%' "." ".'"' "."% "

    *Gbio ".%$' ".8 " ".% "."6) ".%6 "."'

    +lberto ".%'' ".%6 " ".") " ".%" "*ernando " ".%' " ".% "."$ " ".%6

    ?sabel ".") "." "."6 " ".%8 " "."6'

    :afaela "."' ". "."6 ".% ".'"' " ".'#)


    A reference point

    \[ U_0 = \bigl( [u_0^-(1), u_0^+(1)],\ [u_0^-(2), u_0^+(2)],\ \ldots,\ [u_0^-(n), u_0^+(n)] \bigr) \]

    is established as the maximum of entries in each column of Table 7. This point has a minimum of 0 and a maximum of 0.80. Thus the reference point is [0, 0.8]. Next the method calculates the maximum distance between the reference point and each of the Weighted Matrix C values. Based upon the weight interval number standardi[...] sequence \(U_0\), the formula for this calculation is given as follows.

    \[ \xi_i(k) = \frac{\min_i \min_k \bigl| [u_0^-(k), u_0^+(k)] - [c_{ik}^-, c_{ik}^+] \bigr| + \rho \max_i \max_k \bigl| [u_0^-(k), u_0^+(k)] - [c_{ik}^-, c_{ik}^+] \bigr|}{\bigl| [u_0^-(k), u_0^+(k)] - [c_{ik}^-, c_{ik}^+] \bigr| + \rho \max_i \max_k \bigl| [u_0^-(k), u_0^+(k)] - [c_{ik}^-, c_{ik}^+] \bigr|} \]

    Here \(\rho \in (0, +\infty)\) is called the resolving coefficient. The smaller \(\rho\) is, the greater its resolving power. In general, \(\rho \in [0, 1]\). The value of \(\rho\) may change according to the practical situation.
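    As a rough illustration of this calculation, the sketch below computes the coefficient for every alternative and attribute. It assumes the distance between two interval numbers is the larger of the two endpoint differences (as the text describes for the distance tables) and uses a resolving coefficient of 0.5, which is an assumption rather than a value stated here. The function names and demo data are hypothetical.

    ```python
    from typing import List, Tuple

    Interval = Tuple[float, float]


    def interval_distance(a: Interval, b: Interval) -> float:
        """Distance between interval numbers: the larger endpoint difference."""
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


    def grey_coefficients(
        weighted: List[List[Interval]],
        reference: List[Interval],
        rho: float = 0.5,  # assumed resolving coefficient
    ) -> List[List[float]]:
        """Coefficient of each alternative (row) on each attribute (column)."""
        dist = [
            [interval_distance(reference[k], row[k]) for k in range(len(reference))]
            for row in weighted
        ]
        d_min = min(min(r) for r in dist)
        d_max = max(max(r) for r in dist)
        if d_max == 0.0:  # every alternative coincides with the reference
            return [[1.0 for _ in r] for r in dist]
        return [[(d_min + rho * d_max) / (d + rho * d_max) for d in r] for r in dist]


    # Hypothetical data: the second alternative coincides with the reference.
    demo_reference = [(0.3, 0.4), (0.2, 0.5)]
    demo_weighted = [
        [(0.1, 0.2), (0.0, 0.1)],
        [(0.3, 0.4), (0.2, 0.5)],
    ]
    coefficients = grey_coefficients(demo_weighted, demo_reference)
    print(coefficients)
    ```

    A coefficient of 1 means the alternative matches the reference point on that attribute; with a resolving coefficient of 0.5, the alternative farthest from the reference receives 1/3.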

    Results by alternative are given in Table 6:

    Table 6: Weighted Distances to Reference Point

    Distances C1 C2 C3 C4 C5 C6 C7 Averages

    +ntFnio ".)8)%# % ".6%6""" ".)$8%# ".#8)#' ".$'))%% % ).*)1%12

    *Gbio ".""""" ". % ".6'"## ".)#"8 ".8#6' ".8888$ ).%*)$$%

    +lberto ".6%%%%% ".8#6' % ".)%$6'6 % ".6#)"$ % ).'**)#'

    *ernando % ".6%6""" % ".6'"## ".68%#%6 % ".8#6' ).''11#2

    ?sabel ".) ".86%6$ ".)6')6 % ".%6))$ % ".)#$"' ).*)$&%1

    :afaela ".)8)%# ".68#'% ".)6')6 ".68%'" ".#8)#' % ".#)"" ).&$2'*2
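    The Averages column is a row mean that is then used to order the alternatives. A minimal sketch, with hypothetical names and demo coefficients that are not the values of Table 6:

    ```python
    from typing import Dict, List, Tuple


    def rank_by_average(coefficients: Dict[str, List[float]]) -> List[Tuple[str, float]]:
        """Average each alternative's coefficients and sort from best to worst."""
        averages = {name: sum(values) / len(values) for name, values in coefficients.items()}
        return sorted(averages.items(), key=lambda pair: pair[1], reverse=True)


    # Hypothetical coefficients for three alternatives on three attributes.
    demo_coefficients = {
        "A": [1.0, 0.5, 0.75],
        "B": [0.6, 1.0, 0.8],
        "C": [0.4, 0.5, 0.6],
    }
    ranking = rank_by_average(demo_coefficients)
    print(ranking)
    ```

    Small gaps between the top averages are exactly the situation in which a single ranking hides how often each alternative would actually come out first.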


    The average

    \[ r_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k), \qquad i = 1, 2, \ldots, m, \]

    of these weighted distances \(\xi_i(k)\) is used as the reference number to order alternatives. These averages reflect how far away each alternative is from the nadir, along with how close they are to the ideal, much as in TOPSIS. This set of numbers indicates that Isabel (0.804651) is the preferred alternative, although António (0.801512) is extremely close, with Alberto (0.788037) and Fernando (0.771132) close behind. This closeness demonstrates that the fuzzy [...]


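    X is the random number drawn (which is the area), and x is the corresponding value. Write \(p_1 = \frac{a_2 - a_1}{a_4 + a_3 - a_2 - a_1}\) and \(p_2 = 1 - \frac{a_4 - a_3}{a_4 + a_3 - a_2 - a_1}\).

    If \(X \le p_1\):

    \[ x = a_1 + \sqrt{X (a_2 - a_1)(a_4 + a_3 - a_2 - a_1)} \qquad (8) \]

    If \(p_1 \le X \le p_2\):

    \[ x = a_2 + \frac{(X - p_1)(a_4 + a_3 - a_2 - a_1)}{2} \qquad (9) \]

    If \(p_2 \le X\):

    \[ x = a_4 - \sqrt{(1 - X)(a_4 - a_3)(a_4 + a_3 - a_2 - a_1)} \qquad (10) \]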
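    Treating the random draw as an area and converting it to a position can be sketched as an inverse cumulative distribution for a trapezoid with corners a1 <= a2 <= a3 <= a4. This is an illustrative reading of the three cases, not the authors' Crystal Ball setup; the function name and the demo corner values are hypothetical, and a1 < a4 is assumed.

    ```python
    import math
    import random


    def trapezoid_inverse_cdf(area: float, a1: float, a2: float, a3: float, a4: float) -> float:
        """Map an area in [0, 1] to the x position of a trapezoidal density
        rising on [a1, a2], flat on [a2, a3] and falling on [a3, a4]."""
        span = a4 + a3 - a2 - a1          # twice the area of the unit-height trapezoid
        left = (a2 - a1) / span           # probability mass of the rising part
        right = (a4 - a3) / span          # probability mass of the falling part
        if area <= left:
            return a1 + math.sqrt(area * (a2 - a1) * span)
        if area <= 1.0 - right:
            return a2 + (area - left) * span / 2.0
        return a4 - math.sqrt((1.0 - area) * (a4 - a3) * span)


    # Hypothetical trapezoid (0, 1, 2, 3): check the corner points.
    corners = [trapezoid_inverse_cdf(p, 0.0, 1.0, 2.0, 3.0) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print(corners)

    # Draw a few values with a fixed seed so the run is repeatable.
    rng = random.Random(6789)
    samples = [trapezoid_inverse_cdf(rng.random(), 0.0, 1.0, 2.0, 3.0) for _ in range(5)]
    print(samples)
    ```

    Setting a2 = a3 gives a triangular shape; the middle branch is then never used for any area strictly between the two thresholds.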

    Our calculation is based upon drawing a random number reflecting the area (starting on the left (a1) as 0, ending on the right (a4) as 1), and calculating the distance on the x-axis. The simulation software Crystal Ball was used to replicate each model 1,000 times for each random number seed. The software enabled counting the number of times each alternative won. Probabilities given in Table 7 are thus simply the number of times each alternative had the highest value score divided by 1,000. This was done ten times, using different seeds. Therefore, mean probabilities and standard deviations (std) are based on 10,000 simulations. The Min and Max entries are the minimum and maximum probabilities in the ten replications shown in the table.
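    The counting procedure can be sketched as follows. This is a simplified stand-in for the Crystal Ball model: it draws each alternative's overall score uniformly from an interval instead of simulating every attribute, and the alternative names, intervals and function name are hypothetical.

    ```python
    import random
    from typing import Dict, Tuple


    def win_probabilities(
        score_intervals: Dict[str, Tuple[float, float]],
        trials: int = 1000,
        seed: int = 6789,
    ) -> Dict[str, float]:
        """Share of trials in which each alternative has the highest drawn score."""
        rng = random.Random(seed)
        wins = {name: 0 for name in score_intervals}
        for _ in range(trials):
            draws = {
                name: rng.uniform(low, high)
                for name, (low, high) in score_intervals.items()
            }
            wins[max(draws, key=draws.get)] += 1
        return {name: count / trials for name, count in wins.items()}


    # Hypothetical score intervals; "C" can never beat "A" or "B".
    demo_intervals = {"A": (0.60, 0.90), "B": (0.55, 0.95), "C": (0.10, 0.30)}
    probabilities = win_probabilities(demo_intervals)
    print(probabilities)
    ```

    Repeating the call with different seeds and summarizing the results gives the kind of Min, Mean, Max and std rows reported for the replications.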

    Table 7: Simulated Probabilities of Winning for Uniform Fuzzy [...]

    seed6789   0.381   0.000   0.179   0.046   0.394   0.000
    seed7890   0.343   0.000   0.199   0.052   0.406   0.000
    seed8901   0.328   0.000   0.201   0.045   0.426   0.000
    seed9012   0.353   0.000   0.189   0.048   0.410   0.000
    seed0123   0.360   0.000   0.183   0.053   0.404   0.000
    Min        0.328   0.000   0.168   0.040   0.384   0.000
    Mean       0.354   0.000   0.189   0.047   0.410   0.000
    Max        0.381   0.000   0.210   0.053   0.429   0.000
    std        0.017   0.000   0.012   0.004   0.015   0.000

    3.1. Analysis of Results

    The results for each system were very similar. Differences were tested by t-test of differences in means by alternative. None of these difference tests were significant at the 0.95 level (two-tailed tests). This establishes that no significant difference in interval or trapezoidal [...]


    While this example uses a small set of data, the intent was to demonstrate that what can be done in that context can be applied to large-scale data sets as well. Our proposal is unique to our knowledge, proposing the use of simulation to more fully use grey-related data that more accurately reflects the real problem. If this can be done with small-scale data sets, our contention is that it can also be done with large-scale data sets in a data mining context.

    4. Grey Related Decision Tree Model

    Grey related analysis is expected to provide improvement over crisp models by better reflecting the uncertainty inherent in many human analysts' minds. Data mining models based upon such data are expected to be less accurate, but hopefully not by very much. However, grey related model input would be expected to be more stable under conditions of uncertainty where the degree of change in input data increased.

    We applied decision tree analysis to a small set (1,000 observations total) of credit card data. Originally, there was one output variable (whether or not the account defaulted, a binary variable with 1 representing default, 0 representing no default) and 6 available explanatory variables. These variables were analyzed [...]


    The explanatory variables included five binary variables and one categorical variable, with the remaining 20 being continuous. To reflect fuzzy [...] decision trees were obtained, with formulas again given below. A total of seven explanatory variables were used in these four categorical decision trees.

    These models were then entered into a Monte Carlo simulation (supported by Crystal Ball software). A perturbation of each input variable was generated, set at five different levels of perturbation. The intent was to measure the loss of accuracy for crisp and grey related models.
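    A perturbation experiment of this kind can be sketched as below. It is only an illustration under stated assumptions: the single-threshold rule, the threshold value 5.0, the four demo observations and the uniform noise are all hypothetical and are not the paper's decision trees or credit card data.

    ```python
    import random
    from typing import Callable, List, Tuple


    def accuracy_under_perturbation(
        rows: List[Tuple[float, int]],
        predict: Callable[[float], int],
        spread: float,
        runs: int = 1000,
        seed: int = 1,
    ) -> float:
        """Average accuracy when each input is shifted by uniform noise
        drawn from [-spread, spread] in every simulation run."""
        rng = random.Random(seed)
        correct = 0
        for _ in range(runs):
            for value, label in rows:
                noisy = value + rng.uniform(-spread, spread)
                correct += int(predict(noisy) == label)
        return correct / (runs * len(rows))


    def crisp_rule(value: float) -> int:
        """Hypothetical one-split tree: predict default (1) below the threshold."""
        return 1 if value < 5.0 else 0


    demo_rows = [(1.0, 1), (4.0, 1), (6.0, 0), (9.0, 0)]
    accuracy_small = accuracy_under_perturbation(demo_rows, crisp_rule, spread=0.25)
    accuracy_large = accuracy_under_perturbation(demo_rows, crisp_rule, spread=4.0)
    print(accuracy_small, accuracy_large)
    ```

    Running the same loop with a rule defined on categories (for example low, mid, high bins) instead of a raw threshold would give the comparison between crisp and categorical models that the simulation is meant to support.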

    The model results are given in the seven model reports in the appendix. Since different variables were included in different models, it is not possible to directly compare relative accuracy as measured by fitting test data. However, the means for the accuracy on test data for each model given in Table 9 show that the crisp models declined in accuracy more than the categorical models. The column headings in Table 9 reflect the degree of perturbation simulated.


    Table 9: Mean Model Accuracy

    Model                Crisp   0.25    0.50    1.00    2.00    3.00    4.00    0.25
    Continuous 1         0.70    0.70    0.70    0.68    0.67    0.66    0.65    0.70
    Continuous 2         0.67    0.67    0.67    0.67    0.67    0.66    0.66    0.67
    Continuous 3         0.71    0.71    0.70    0.69    0.67    0.67    0.66    0.71
    Continuous (mean)    0.693   0.693   0.690   0.680   0.670   0.667   0.657   0.693
    Categorical 1        0.70    0.70    0.68    0.67    0.66    0.66    0.65    0.70
    Categorical 2        0.70    0.70    0.70    0.69    0.68    0.67    0.67    0.70
    Categorical 3        0.70    0.70    0.70    0.69    0.69    0.68    0.67    0.70
    Categorical 4        0.70    0.70    0.70    0.69    0.68    0.67    0.67    0.70
    Categorical (mean)   0.700   0.700   0.695   0.688   0.678   0.670   0.665   0.700

    The fuzzy [...]

    [...] technique offers more insights to assist our decision making in fuzzy [...]


    APPENDIX: Models and their results

    Continuous Model 1:

    I!&$3&.$$454I!%*31.%$464I!&3#.-14546777

    [Figure: Frequency chart for forecast "Cont 1 accuracy" (1,000 trials, 994 displayed)]

    Test matrix:

                Model 0   Model 1   Accuracy
    Actual 0      43        16
    Actual 1      14        27       0.70
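    The accuracy figure reported with each test matrix is the share of correctly classified test observations. The short check below uses the counts 43, 16, 14 and 27 for Continuous Model 1; the first count is only partly legible in the source and is assumed here to be 43, the value that makes the matrix total 100 test observations and match the reported 0.70. The function name is hypothetical.

    ```python
    from typing import List


    def matrix_accuracy(matrix: List[List[int]]) -> float:
        """Correct classifications (the diagonal) divided by all observations."""
        correct = sum(matrix[i][i] for i in range(len(matrix)))
        total = sum(sum(row) for row in matrix)
        return correct / total


    # Rows are actual classes 0 and 1; columns are predicted classes 0 and 1.
    continuous_model_1 = [[43, 16], [14, 27]]  # 43 is an assumed reading
    accuracy = matrix_accuracy(continuous_model_1)
    print(round(accuracy, 2))
    ```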

    Simulation accuracy of 100 observations, 1000 simulation runs

    pert0rbation B!".',".'C ".6)!".)

    pert0rbation B!".","."C ".6!".)#

    pert0rbation B!%,%C ".6'!".)pert0rbation B!','C ".8!".)#

    pert0rbation B!,C ".)!".)#pert0rbation B!#,#C ".6!".)

    Continuous Model 2:

    I!&$3&.$$45467

    [Figure: Frequency chart for forecast "Cont 2 accuracy" (1,000 trials, 991 displayed)]

    Test matrix:

                Model 0   Model 1   Accuracy
    Actual 0      40        19
    Actual 1      14        27       0.67

    Simulation accuracy of 100 observations, 1000 simulation runs

    pert0rbation B!".',".'C ".6!".)%pert0rbation B!".","."C ".6!".)%

    pert0rbation B!%,%C ".6"!".)#

    pert0rbation B!','C ".8!".)

    pert0rbation B!,C ".!".)8pert0rbation B!#,#C ".!".)6

    Continuous Model 3:

    I!&$3&.$$454I!%*31.%$464I! .2*4645777

    [Figure: Frequency chart for forecast "Cont 3 accuracy" (1,000 trials, 996 displayed)]

    Test matrix:

                Model 0   Model 1   Accuracy
    Actual 0      44        15
    Actual 1      14        27       0.71

    Simulation accuracy of 100 observations, 1000 simulation runs

    pert0rbation B!".',".'C ".6!".)6pert0rbation B!".","."C ".6!".)6

    pert0rbation B!%,%C ".$!".))

    pert0rbation B!','C ".#!".)$

    pert0rbation B!,C ".!".)8pert0rbation B!#,#C ".!".)6

    Categorical Model 2:

    I!&$89high94I!%$89lo94

    I!"D:89mid94I!#'89lo9464574if!"D:89lo9454677

    I!%$89high94if!$*89mid945467467457

    [Figure: Frequency chart for forecast "Cat2 accuracy" (1,000 trials, 997 displayed)]

    Test matrix:

                Model 0   Model 1   Accuracy
    Actual 0      42        17
    Actual 1      13        28       0.70

    Simulation accuracy of 100 observations, 1000 simulation runs

    pert0rbation B!".',".'C ".6!".)

    pert0rbation B!".","."C ".6#!".)6

    pert0rbation B!%,%C ".6%!".)6pert0rbation B!','C ".8!".)6

    pert0rbation B!,C ".)!".8"pert0rbation B!#,#C ".6!".)$

    Categorical Model 3:

    I!&$89high946457

    [Figure: Frequency chart for forecast "Cat3 accuracy" (1,000 trials, 982 displayed)]

    Test matrix:

                Model 0   Model 1   Accuracy
    Actual 0      33        26
    Actual 1       4        37       0.70

    Simulation accuracy of 100 observations, 1000 simulation runs

    pert0rbation B!".',".'C ".68!".)"pert0rbation B!".","."C ".6)!".)%

    pert0rbation B!%,%C ".66!".)'

    pert0rbation B!','C ".6'!".)

    pert0rbation B!,C ".$!".)pert0rbation B!#,#C ".$!".)6

    Categorical Model 4:

    I!&$89high94

    I!%$89lo94

    I!"D:89mid94I!#'89lo9464574I!"D:89lo94I!&13.%464574677

    I!%$89high94I!$*89mid945467467 457

    [Figure: Frequency chart for forecast "Cat4 accuracy" (1,000 trials, 998 displayed)]

    Test matrix:

                Model 0   Model 1   Accuracy
    Actual 0      41        18
    Actual 1      12        29       0.70

    Simulation accuracy of 100 observations, 1000 simulation runs

    pert0rbation B!".',".'C ".6!".)6

    pert0rbation B!".","."C ".6#!".))pert0rbation B!%,%C ".6%!".))

    pert0rbation B!','C ".8!".))pert0rbation B!,C ".)!".))pert0rbation B!#,#C ".!".)8
