Bayesian Networks
(Lecture slides, Yonsei University: sclab.yonsei.ac.kr/courses/07AI/data/6)Bayesian Networks.pdf)

  • 1

    Bayesian Networks

  • 2

    Bayes’ Rule & Bayesian Inference

    • Bayesian inference is statistical inference in which probabilities are interpreted as degrees of belief

    • The name comes from the frequent use of Bayes’ rule

    P(H|E) = P(H,E) / P(E) = P(E|H) P(H) / P(E)
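    As a quick numerical check of the rule, here is a small worked example in Python (the prior and likelihood numbers below are made up for illustration, not from the slides):

        # Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E), with P(E) from total probability.
        p_h = 0.01               # P(H): prior degree of belief in the hypothesis
        p_e_given_h = 0.9        # P(E|H): probability of the evidence if H is true
        p_e_given_not_h = 0.05   # P(E|~H)

        # P(E) = P(E|H) P(H) + P(E|~H) P(~H)
        p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

        # Posterior degree of belief after observing the evidence
        p_h_given_e = p_e_given_h * p_h / p_e
        print(f"P(H|E) = {p_h_given_e:.3f}")   # ~0.154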

  • 3

    Bayesian Networks

    • Many variables exist, and the joint probability distribution is important
      – How to represent the joint probability distribution effectively

    • Bayesian networks: graphical representation (directed acyclic graph) of a joint probability distribution
      – Node: variable
      – Edge: probabilistic dependency

    • A node can represent any kind of variable, be it an observed measurement, a parameter, a latent variable, or a hypothesis

    • Nodes are not restricted to representing random variables

  • 4

    Why Use BNs?

    • Explicit management of uncertainty

    • Modularity (modular specification of a joint distribution) implies maintainability

    • Better, flexible and robust decision making – MEU (Maximization of Expected Utility), VOI (Value of Information)

    • Can be used to answer arbitrary queries - multiple fault problems (General purpose “inference” algorithm)

    • Easy to incorporate prior knowledge and to understand

  • 5

    P(A, S, T, L, B, C, D)
    = P(A) P(S) P(T|A) P(L|S) P(B|S) P(C|T,L) P(D|T,L,B)

    Conditional independencies → efficient representation

    T L B | D=0  D=1
    0 0 0 | 0.1  0.9
    0 0 1 | 0.7  0.3
    0 1 0 | 0.8  0.2
    0 1 1 | 0.9  0.1
    ...

    An Example of Bayesian Network (Asia Network)

    [Figure: the Asia network (Lauritzen & Spiegelhalter, 95). Nodes: Visit to Asia (A), Smoking (S), Tuberculosis (T, a kind of disease), Lung Cancer (L, a kind of disease), Bronchitis (B), Chest X-ray (C), Dyspnoea (D, difficult respiration). Each node carries its CPT: P(A), P(S), P(T|A), P(L|S), P(B|S), P(C|T,L), P(D|T,L,B).]
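    Evaluating the factorization amounts to multiplying one entry from each CPT. A minimal Python sketch (the CPT numbers below are placeholders, not the published Asia parameters):

        # P(A,S,T,L,B,C,D) = P(A) P(S) P(T|A) P(L|S) P(B|S) P(C|T,L) P(D|T,L,B)
        P_A = {1: 0.01, 0: 0.99}
        P_S = {1: 0.50, 0: 0.50}
        P_T = {1: {1: 0.05, 0: 0.95}, 0: {1: 0.01, 0: 0.99}}   # keyed by A
        P_L = {1: {1: 0.10, 0: 0.90}, 0: {1: 0.01, 0: 0.99}}   # keyed by S
        P_B = {1: {1: 0.60, 0: 0.40}, 0: {1: 0.30, 0: 0.70}}   # keyed by S
        P_C = {(t, l): {1: 0.98 if (t or l) else 0.05,
                        0: 0.02 if (t or l) else 0.95}
               for t in (0, 1) for l in (0, 1)}                 # keyed by (T, L)
        P_D = {(t, l, b): {1: min(0.9, 0.1 + 0.3 * (t + l + b)),
                           0: 1 - min(0.9, 0.1 + 0.3 * (t + l + b))}
               for t in (0, 1) for l in (0, 1) for b in (0, 1)} # keyed by (T, L, B)

        def joint(a, s, t, l, b, c, d):
            """One entry of the joint, instead of storing a 2^7-row table."""
            return (P_A[a] * P_S[s] * P_T[a][t] * P_L[s][l] * P_B[s][b]
                    * P_C[(t, l)][c] * P_D[(t, l, b)][d])

        print(joint(a=0, s=1, t=0, l=0, b=1, c=0, d=1))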

  • 6

    Bayesian Network Knowledge Engineering

    • Objective: Construct a model to perform a defined task

    • Participants: Collaboration between domain expert and BN modeling expert

    • Process: iterate until “done”
      – Define task objective
      – Construct model
      – Evaluate model

  • 7

    The Knowledge Acquisition Task

    • Variables:

    – Collectively exhaustive, mutually exclusive values

    – Clarity test: value should be knowable in principle

    • Structure

      – If data are available, it can be learned

      – Or constructed by hand (using “expert” knowledge)

      – Variable ordering matters: causal knowledge usually simplifies the structure

    • Probabilities

    – Can be learned from data

    – Sensitivity analysis

  • 8

    Learning Bayesian Network

    • Why learn a Bayesian network?
      – Gaining expert knowledge is expensive, but collecting data is cheap
      – It is difficult to encode expert knowledge directly into a BN, while applying a learning method to data is easy
      – Expert knowledge and data can be combined
      – Knowledge discovery

    [Figure: data and prior information are fed into a learning method, which outputs the network structure (here over nodes S, E, C, D) and the CPT parameters, e.g. a table for P(E|S,C).]
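    As a concrete instance of learning the parameters from data, here is a minimal sketch that estimates P(E|S,C) by counting, with a pseudocount standing in for prior information (the toy data and variable names are assumptions for illustration):

        from collections import Counter

        # Toy complete data over binary variables S, C, E: rows are (s, c, e).
        data = [(1, 1, 1), (1, 1, 1), (1, 0, 1), (0, 1, 0),
                (0, 0, 0), (1, 0, 0), (0, 1, 1)]

        def estimate_cpt(data, alpha=1.0):
            """Estimate P(E=1 | S, C) by counting, smoothed with pseudocount alpha
            (a simple way of blending prior information with the data)."""
            counts = Counter(data)
            cpt = {}
            for s in (0, 1):
                for c in (0, 1):
                    n1 = counts[(s, c, 1)] + alpha
                    n0 = counts[(s, c, 0)] + alpha
                    cpt[(s, c)] = n1 / (n0 + n1)
            return cpt

        for (s, c), p in sorted(estimate_cpt(data).items()):
            print(f"P(E=1 | S={s}, C={c}) = {p:.2f}")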

  • 9

    Compactness and Node Ordering

    • A compact BN has order k (number of edges) < N (number of nodes)
      – Sparsely connected

    • The optimal order in which to add nodes is to add the “root causes” first, then the variables they influence, and so on until the “leaves” are reached

    • Example of poor ordering (which still represents the same joint distribution)

  • 10

    Network Construction Algorithm

    1) Choose the set of relevant variables Xi that describe the domain

    2) Choose an ordering for the variables

    3) While there are variables left:

    1) Pick a variable Xi and add a node to the network for it

    2) Set Parents(Xi ) to some minimal set of nodes already in the net such that the conditional independence property is satisfied

    3) Define the CPT (Conditional Probability Table) for Xi

    P(Xi | X1, ..., Xi-1) = P(Xi | Parents(Xi))
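    One way to organize steps 2) and 3) in code is to store, per node, its parent list and a CPT keyed by the parents' joint assignment; a minimal sketch (node names and numbers are illustrative):

        # Per node: Parents(X_i) plus a CPT keyed by the parents' assignment.
        network = {
            "A": {"parents": [],    "cpt": {(): {0: 0.99, 1: 0.01}}},
            "T": {"parents": ["A"], "cpt": {(0,): {0: 0.99, 1: 0.01},
                                            (1,): {0: 0.95, 1: 0.05}}},
        }

        def p(node, value, assignment):
            """P(node = value | Parents(node)), read off the node's CPT."""
            spec = network[node]
            key = tuple(assignment[parent] for parent in spec["parents"])
            return spec["cpt"][key][value]

        print(p("T", 1, {"A": 1}))   # -> 0.05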

  • 11

    K2 Algorithm (1)

    • Assumes that the nodes are input in a fixed order
    • Input: a set of n nodes in order, an upper bound u on the number of parents a node may have, and a database D containing m cases

    procedure K2;
      for i := 1 to n do
        π_i := ∅;
        P_old := score(i, π_i);
        OKToProceed := true;
        while OKToProceed and |π_i| < u do
          let z be the node in Pred(x_i) − π_i that maximizes score(i, π_i ∪ {z});
          P_new := score(i, π_i ∪ {z});
          if P_new > P_old then
            P_old := P_new;
            π_i := π_i ∪ {z}
          else OKToProceed := false;
        end {while};
        write('Node: ', x_i, ', Parents of this node: ', π_i);
      end {for};
    end {K2};

    Pred(node) returns the set of nodes that precede node in the node ordering.
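    A runnable Python sketch of the same search, assuming discrete data as integer-coded tuples and using the Cooper-Herskovits score in log form for score(i, π_i), the score K2 is usually paired with (function and variable names here are our own):

        import math
        from collections import defaultdict

        def family_score(data, i, parents, r):
            """Log Cooper-Herskovits score of node i with the given parent set.
            data: list of integer-valued tuples; r[i]: number of values of node i."""
            counts = defaultdict(lambda: [0] * r[i])
            for row in data:
                counts[tuple(row[p] for p in parents)][row[i]] += 1
            score = 0.0
            for n_ijk in counts.values():
                n_ij = sum(n_ijk)
                score += math.lgamma(r[i]) - math.lgamma(r[i] + n_ij)
                score += sum(math.lgamma(n + 1) for n in n_ijk)
            return score

        def k2(data, order, u, r):
            """Greedy parent selection under a fixed node ordering (the K2 search)."""
            parents = {i: [] for i in order}
            for idx, i in enumerate(order):
                pred = order[:idx]                    # Pred(x_i)
                p_old = family_score(data, i, parents[i], r)
                ok = True
                while ok and len(parents[i]) < u:
                    cands = [z for z in pred if z not in parents[i]]
                    if not cands:
                        break
                    z = max(cands, key=lambda c: family_score(data, i, parents[i] + [c], r))
                    p_new = family_score(data, i, parents[i] + [z], r)
                    if p_new > p_old:
                        p_old, parents[i] = p_new, parents[i] + [z]
                    else:
                        ok = False
            return parents

        # Toy run: 3 binary nodes, ordering 0,1,2, at most u=2 parents each.
        data = [(0,0,0), (1,1,1), (1,1,0), (0,0,1), (1,0,1), (0,1,0)] * 5
        print(k2(data, order=[0, 1, 2], u=2, r=[2, 2, 2]))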

  • 12

    K2 Algorithm (2)

    • Example
      – Node ordering: A B C D E

    [Figure: successive networks built by K2 under the ordering A, B, C, D, E; nodes are added one at a time, and each new node's parents are chosen greedily from the nodes that precede it.]

  • 13

    Complex Structure

    • Theorem: finding the maximal-scoring network structure with at most k parents for each variable (node) is NP-hard for k > 1

    • We solve the problem by using heuristic search
      – Traverse the search space looking for high-scoring structures

    • Caching: to update the score after a local change, we only need to re-score the families that were changed in the last move (decomposability)

    [Figure: local search moves on a network over S, E, C, D: add C→D, delete C→E, reverse C→E.]

  • 14

    Algorithm B

    • A greedy construction heuristic, guided by a score metric
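    As a sketch of score-driven greedy search (not necessarily Algorithm B's exact procedure, which the slide does not spell out), the following hill-climber applies the add/delete/reverse moves from the previous slide, keeps the graph acyclic, and, thanks to decomposability, re-scores only the changed families; the toy score function is an assumption standing in for a real metric:

        def is_acyclic(parents):
            """Kahn-style check: repeatedly peel off nodes with no remaining parents."""
            remaining = {v: set(ps) for v, ps in parents.items()}
            while remaining:
                free = [v for v, ps in remaining.items() if not ps]
                if not free:
                    return False          # every remaining node has a parent: cycle
                for v in free:
                    del remaining[v]
                for ps in remaining.values():
                    ps.difference_update(free)
            return True

        TRUE_EDGES = {("S", "E"), ("C", "E"), ("E", "D")}   # toy "ground truth"

        def score_family(child, parent_tuple):
            """Stand-in for a real decomposable score (e.g., the CH score):
            +1 for each true edge recovered, -0.5 for each spurious parent."""
            return sum(1.0 if (p, child) in TRUE_EDGES else -0.5
                       for p in parent_tuple)

        def hill_climb(nodes, score_family, max_parents=2):
            """Apply the best add/delete/reverse move until none improves."""
            parents = {v: [] for v in nodes}
            cache = {v: score_family(v, ()) for v in nodes}  # per-family cache
            while True:
                best = None
                for x in nodes:
                    for y in nodes:
                        if x == y:
                            continue
                        moves = []
                        if x not in parents[y] and len(parents[y]) < max_parents:
                            moves.append({y: parents[y] + [x]})      # add x -> y
                        if x in parents[y]:
                            without = [p for p in parents[y] if p != x]
                            moves.append({y: without})               # delete x -> y
                            if len(parents[x]) < max_parents:
                                moves.append({y: without,
                                              x: parents[x] + [y]})  # reverse x -> y
                        for changed in moves:
                            if not is_acyclic({**parents, **changed}):
                                continue
                            # Decomposability: re-score only the changed families.
                            delta = sum(score_family(v, tuple(ps)) - cache[v]
                                        for v, ps in changed.items())
                            if delta > 1e-9 and (best is None or delta > best[0]):
                                best = (delta, changed)
                if best is None:
                    return parents
                for v, ps in best[1].items():
                    parents[v] = ps
                    cache[v] = score_family(v, tuple(ps))

        print(hill_climb(["S", "E", "C", "D"], score_family))
        # e.g. {'S': [], 'E': ['S', 'C'], 'C': [], 'D': ['E']}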

  • 15

    Decision Networks

    • A decision network represents information about
      – The agent’s current state
      – Its possible actions
      – The state that will result from the agent’s action
      – The utility of that state

    • Also called influence diagrams

    • Types of nodes
      – Chance nodes: represent random variables (same as in Bayesian networks)
      – Decision nodes (rectangles): represent points where the decision maker has a choice of actions
      – Utility nodes (diamonds): represent the agent’s utility function (also called value nodes)

  • 16

    Example: Studying

    • Trying to decide whether to study hard for the “Advanced Artificial Intelligence” exam

    • Your mark will depend on
      – How hard you study
      – How hard the exam is
      – How well you did in the “Artificial Intelligence” course (which indicates how well prepared you are)

    • Expected utility for the two possible actions (study hard or not), given that you received an HD (High Distinction) for AI but the exam is hard
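    The comparison boils down to EU(action) = Σ over outcomes of P(outcome | action, evidence) × U(outcome); a minimal sketch with invented probabilities, utilities, and study cost, since the slides' actual numbers are not shown:

        # EU(action) = sum over marks of P(mark | action, evidence) * U(mark).
        # Evidence: an HD in "Artificial Intelligence", and the exam is hard.
        # All probabilities, utilities, and the study cost are invented.
        p_mark = {  # P(mark | study_hard, prepared=HD, exam=hard)
            True:  {"high": 0.6, "pass": 0.3, "fail": 0.1},
            False: {"high": 0.2, "pass": 0.5, "fail": 0.3},
        }
        utility = {"high": 100, "pass": 60, "fail": 0}
        study_cost = {True: -15, False: 0}      # studying hard costs time

        for study_hard in (True, False):
            eu = sum(p * utility[m] for m, p in p_mark[study_hard].items())
            eu += study_cost[study_hard]
            print(f"EU(study_hard={study_hard}) = {eu:.1f}")
        # MEU: choose the action with the larger expected utility.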

  • 17

    Example: Studying

  • 18

    Research Overview

    [Figure: research map. Bayesian Networks branch into Learning (Structure, Design, Inference), Manual Construction, Ontology, Fuzzy, Hierarchical Structure, Evolutionary Learning, and Dynamic Bayesian Networks.]

  • 19

    BN+BN: Behavior Network with Bayesian Network for Intelligent Agent

  • 20

    Agenda

    • Motivation
    • Related Works
    • Backgrounds
      – Behavior network
      – Bayesian network
    • BN+BN
    • Experimental Results
    • Conclusions and Future Works

  • 21

    Motivation

    • ASM (action selection mechanism): combines behaviors to generate high-level behaviors
      – Impossible to insert global goals in an explicit or implicit manner

    • Behavior network
      – Inserts global goals into the ASM
      – Difficult to insert a human’s prior knowledge about goal activation, because there are many goals to be activated

    • Bayesian network
      – Computational method for representing human knowledge in a graph model with inference capability

    • BN+BN
      – Applies a Bayesian network to represent prior knowledge about goal activation in a behavior network

  • 22

    Related Works

    • Bayesian network structure learning from data
      – Conditional independence test
      – Scoring-based optimization
      – Hybrid of the two approaches

    • Agent architectures
      – Reactive control: don’t think, react
      – Deliberative control: think hard, then act
        • BDI (belief, desire, intention) model
      – Hybrid control: think and act independently, in parallel
      – Behavior-based control: think the way you act
      – Layered architecture
        • Brian Duffy’s “social robot”

  • 23

    Behavior-based AI

    • Behavior-based AI
      – The controller consists of a collection of “behaviors”

    • Coordinating multiple behaviors
      – Deciding what behavior to execute at each point in time

    [Figure: behavior-based robotics (Brooks, 1986): sensors feed parallel behavior layers (locomote, avoid hitting things, explore, manipulate the world, build maps) that drive the actuators.]

  • 24

    Layered Architecture

    [Figure: the social robot (Brian Duffy, 2000).]

  • 25

    Representation

    • Basic characteristic: competition of behaviors

    • Behavior
      – Precondition: a set of states that have to be true for the execution of the behavior
      – Add list: a set of states that are likely to become true through the execution of the behavior
      – Delete list: a set of states that are likely to become false through the execution of the behavior
      – Activation level

    • External links
      – From goals to behaviors
      – From environmental states to behaviors

    • Internal links

  • 26

    An Example

    [Figure: an example behavior network with behaviors A–H connected by predecessor and successor links.]

  • 27

    Spreading Activation

    State → Behavior
      If (state = true) and (state ∈ precondition of behavior)
      then activation(behavior) += activation(state)

    Goal → Behavior
      If (goal = true) and (goal ∈ add list of behavior)
      then activation(behavior) += activation(goal)

    Behavior1 → Behavior2 (predecessor link)
      If predecessor(behavior1, behavior2) = true
      then activation(behavior2) += activation(behavior1)

    Behavior1 → Behavior2 (successor link)
      If successor(behavior1, behavior2) = true
      then activation(behavior1) += activation(behavior2)

  • 28

    Action Selection

    /* Select one action among candidates */
    WHILE (1) {
        initialization();          // clear candidate list
        spreading_activation();    // update activation levels
        normalization();           // normalize activation levels of behaviors
        FOR all behaviors {
            IF (all preconditions are true && activation(behavior) > threshold) {
                candidate(behavior);   // register to candidate list
            }
        }
        /* select one candidate behavior with the highest activation */
        IF (candidate() == NULL) {        /* no candidate behavior in the list */
            threshold = 0.9 * threshold;  /* decrease threshold */
        } ELSE {
            select();
            break;
        }
    }
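    A runnable Python sketch of this loop, with the spreading-activation rules from the previous slide folded into spreading_activation() (the Behavior class and the toy setup are our own assumptions):

        class Behavior:
            def __init__(self, name, preconditions, add_list):
                self.name = name
                self.preconditions = set(preconditions)
                self.add_list = set(add_list)
                self.activation = 0.0

        def spreading_activation(behaviors, states, goals, links):
            """States, goals, and linked behaviors add their activation to a behavior."""
            for b in behaviors:
                for s, act in states.items():
                    if act > 0 and s in b.preconditions:   # state -> behavior
                        b.activation += act
                for g, act in goals.items():
                    if g in b.add_list:                    # goal -> behavior
                        b.activation += act
            for b1, b2 in links.get("predecessor", []):    # predecessor links
                b2.activation += b1.activation
            for b1, b2 in links.get("successor", []):      # successor links
                b1.activation += b2.activation

        def select_action(behaviors, states, goals, links, threshold=1.0):
            """Repeat the loop until some executable behavior beats the threshold."""
            true_states = {s for s, v in states.items() if v > 0}
            while True:
                for b in behaviors:                        # initialization
                    b.activation = 0.0
                spreading_activation(behaviors, states, goals, links)
                total = sum(b.activation for b in behaviors) or 1.0
                for b in behaviors:                        # normalization
                    b.activation /= total
                candidates = [b for b in behaviors
                              if b.preconditions <= true_states
                              and b.activation > threshold]
                if not candidates:
                    threshold *= 0.9                       # decrease threshold
                else:
                    return max(candidates, key=lambda b: b.activation)

        # Toy run: two behaviors competing for execution.
        b1 = Behavior("avoid_obstacle", {"obstacle_near"}, {"safe"})
        b2 = Behavior("follow_light", {"light_seen"}, {"at_light"})
        states = {"obstacle_near": 1.0, "light_seen": 0.5}
        goals = {"safe": 1.0, "at_light": 0.8}
        print(select_action([b1, b2], states, goals, links={}).name)  # -> avoid_obstacle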

  • 29

    Overview

    • Bayes’ rule
      – Pr(A|B) = Pr(B|A) × Pr(A) / Pr(B)

    • How to encode dependencies among variables?
      – Impossible to encode all dependencies
      – Bayesian network: approximates the real dependencies among variables with a graphical model

    • How to construct a Bayesian network?
      – By expert: manual construction
      – By learning from data
        • Structure learning
        • Parameter learning

    • Bayesian network = structure (dependencies among variables) + conditional probability table (parameters)

  • 30

    An Example

  • 31

    Overview

    [Figure: BN+BN architecture. Environmental states S1, S2, S3, ..., Sp feed behaviors B1, B2, ..., Bk; a Bayesian network over variables V1, V2, V3, ..., Vr (goal coordination) reads the environmental states and assigns weights w1, w2, ..., wn to the goals G1, G2, ..., Gn.]

  • 32

    Algorithm

    • Spreading activation

      Goal → Behavior
        If (goal = true) and (goal ∈ add list of behavior)
        then activation(behavior) += activation(goal) * weight(goal)

    • Action selection

      ...
      initialization();          // clear candidate list
      Bayesian();                // infer weights of goals
      spreading_activation();    // update activation levels
      normalization();           // normalize activation levels of behaviors
      ...

  • 33

    Environments

    [Figure: two maps of the arena divided into Area 1 – Area 4.]

    Area 1: many small obstacles
    Area 2: one light source
    Area 3: two light sources
    Area 4: long obstacles

    How to know the area without a map? → Bayesian network
    Coordinating the goals of the behavior network with area information

    Goals of the behavior network:
    - Minimizing bumping in two different obstacle styles
    - Going to a light source

  • 34

    Behavior Network Design

    [Figure: behavior network design.
      Sensors: Obstacle Is Near, Nothing Around Robot, Light Level I, Light Level II, No Light Source.
      Behaviors: Following Light, Going Straight, Avoiding Obstacle.
      Goals: Going to Light Source, Minimizing Bumping A, Minimizing Bumping B.]

  • 35

    Bayesian Network Learning

    [Figure: pipeline from sensor data generation to Bayesian network learning to area evaluation from sensor values.]

    Example of generated data (sensor values plus area label):

    Distance 1  Distance 2  Distance 3  Light 1  Light 2  Area
       500.0       400.0      1022.0     499.0    200.0     2
       498.0       300.0      1020.0     450.0   1220.0    ...

  • 36

    Bayesian Network Learning (2)

    [Figure: learned Bayesian network over sensor nodes Light2, Light4, Light7, Light8 and Distance1–Distance6, and area nodes Area1–Area4.]

  • 37

    Simulation Results

    [Figure: robot trajectories with only the behavior network vs. BN+BN.]

    • With only the behavior network: the robot ignores the light source even when it is near; minimizing bumping is well satisfied

    • With BN+BN: the robot does not ignore the light source, but it bumps many times in Area 1; controlling the degree of the Bayesian network’s incorporation into the behavior network is needed

  • 38

    An Efficient Attribute Ordering Optimization in Bayesian Networks for Prognostic Modeling

    of the Metabolic Syndrome

  • 39

    Outline

    • Motivation
      – Bayesian Networks
      – Why Is Attribute Ordering Optimization in BN Needed?

    • Backgrounds
      – Metabolic Syndrome
      – Bayesian Networks in Medical Domain

    • Proposed Method
      – Overall Flow
      – Preprocessing & Attribute Selection
      – Attribute Ordering Optimization
      – Structure & Parameter Learning

    • Experiments
      – Dataset
      – Parameter & Setting
      – Results & Analyses

    • Conclusions & Future Works

  • 40

    Bayesian Networks

    • Bayesian networks
      – Represented as a directed acyclic graph
        • Nodes → probabilistic variables
        • Arcs → dependencies between variables
      – A powerful technique for handling uncertainty

    • BN structure
      – Learned from the learning data
      – Designed by a domain expert

    • Several learning algorithms exist for BN structure → the K2 algorithm

  • 41

    Why Is Attribute Ordering in BN Needed?

    • The K2 algorithm
      – When the BN is learned, earlier attributes can be the parents of later attributes, but not vice versa
      – The attribute ordering therefore influences the BN structure

    • Different BN structures arise from the same attributes (example)

    [Figure: two different structures (1 and 2) learned from the same attributes under different orderings.]

    Attribute ordering matters!

  • 42

    Metabolic Syndrome

    • It requires the presence of three or more of the following (NCEP-ATP III):
      – Abdominal obesity:
        • waist circumference >= 102 cm in men
        • waist circumference >= 88 cm in women
      – Hypertriglyceridemia:
        • triglycerides >= 150 mg/dL
      – Low HDL cholesterol:
        • HDL cholesterol

  • 43

    Bayesian Networks in Medical Domain

    • Strengths of BNs
      – Allow researchers to use domain knowledge
      – Interpretable and easily understood
      – Superior in capturing interactions among input variables
      – Less influenced by small sample sizes

    • BN applications in the medical domain
      – Antal et al. used a BN to construct a diagnostic model of ovarian cancer and to classify its samples (2004)
      – Aronsky & Haug used a BN to diagnose pneumonia (2000)
      – Burnside used a BN to diagnose breast cancer (2000)
      – In addition, BNs are utilized for several other purposes, such as patient care and tuberculosis modeling

  • 44

    Overall Flow

    [Figure: overall flow. Metabolic syndrome data → pre-processing (with attribute selection) → attribute ordering (attribute grouping, ordering by attribute group, ordering in groups) → BN learning (structure learning, then parameter learning; loop back while the ordering changes) → prediction (metabolic syndrome vs. normal).]

  • 45

    Preprocessing & Attribute Selection

    • Attribute selection
      – 11 informative attributes are selected with medical knowledge
      – The 11 attributes comprise 8 attributes from the definition plus 3 attributes from the reference (Girod et al., 2003)

    • Preprocessing
      – BN requires discretized input
      – Medical knowledge guided the discretization process (Mykkanen et al., 2004)

  • 46

    Attribute Ordering Optimization

  • 47

    Representation & Fitness Evaluation

    • Chromosome representation

    [Figure: a chromosome is a sequence of attribute groups G1, G2, G3, ..., Gm-1, Gm; each group carries a group ID (GID), a group size (GSize), and its attributes A1, A2, A3, ..., An.]

    • Fitness evaluation: by the prediction rate on the learning data

    • Initialization: performed at random

    • Selection: rank-based selection
      – p(g,j), the probability that each individual I(g,j) is selected, is

        p(g,j) = (n − Rank(f(I(g,j))) + 1) / (n(n+1)/2)
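    A small sketch of this rank-based rule (the fitness values are arbitrary):

        import random

        def rank_selection_probs(fitnesses):
            """p_i = (n - rank_i + 1) / (n(n+1)/2), where rank 1 is the best fitness."""
            n = len(fitnesses)
            order = sorted(range(n), key=lambda i: fitnesses[i], reverse=True)
            rank = {i: r + 1 for r, i in enumerate(order)}   # 1 = highest fitness
            denom = n * (n + 1) / 2
            return [(n - rank[i] + 1) / denom for i in range(n)]

        fits = [0.71, 0.68, 0.73, 0.69]          # arbitrary fitness values
        probs = rank_selection_probs(fits)
        print(probs, sum(probs))                 # probabilities sum to 1
        print(random.choices(range(len(fits)), weights=probs, k=2))  # two parents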

  • 48

    Genetic Operations

    • Cycle crossover and displacement mutation operations are used

    [Figure: an example of the cycle crossover operation on two parent orderings of the groups G1–G7, and an example of the displacement mutation operation, in which a segment of the ordering is cut out and re-inserted at another position.]
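    A sketch of the two operators on an ordering of groups; these are standard textbook versions of cycle crossover and displacement mutation, not necessarily the paper's exact implementations:

        import random

        def cycle_crossover(p1, p2):
            """Cycle crossover: positions in the first cycle come from p1,
            all remaining positions come from p2."""
            child = [None] * len(p1)
            i = 0
            while child[i] is None:                   # trace one cycle
                child[i] = p1[i]
                i = p1.index(p2[i])
            return [p2[j] if c is None else c for j, c in enumerate(child)]

        def displacement_mutation(perm, rng=random):
            """Cut a random segment and re-insert it at a random position."""
            perm = list(perm)
            i, j = sorted(rng.sample(range(len(perm)), 2))
            segment = perm[i:j + 1]
            rest = perm[:i] + perm[j + 1:]
            k = rng.randrange(len(rest) + 1)
            return rest[:k] + segment + rest[k:]

        p1 = ["G1", "G2", "G3", "G4", "G5", "G6", "G7"]
        p2 = ["G5", "G3", "G7", "G1", "G6", "G4", "G2"]
        print(cycle_crossover(p1, p2))   # -> ['G1', 'G3', 'G7', 'G4', 'G5', 'G6', 'G2']
        print(displacement_mutation(p1))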

  • 49

    Structure & Parameter Learning

    • Structure learning: the K2 algorithm

    • Parameter learning: parameters are calculated from the learning data

  • 50

    Dataset

    • Dataset (Shin et al., 1996)
      – Surveys were conducted twice, in 1993 and 1995, in Yonchon County, Korea
      – The dataset contains
        • 1135 subjects with no missing values
        • 18 attributes that could influence the prediction of the metabolic syndrome

    • Problem
      – Data in 1993 → state in 1995 (normal vs. metabolic syndrome)

    [Figure: attributes measured in 1993 (BMI, BP, age, sex, ...) are used to predict and analyze each subject’s 1995 state: metabolic syndrome or normal.]

  • 51

    Parameter & Setting

    • GA
      – Population size: 20
      – Generation limit: 100
      – Selection rate: 0.8
      – Crossover rate: 1
      – Mutation rate: 0.02

    • Models for comparison
      – Neural network: 11 (input) - 20 (hidden) - 2 (output)
      – k-nearest neighbors: k = 3 (by preliminary experiment)

    • Data partition
      – For the evolution process: 3:1:1 (learning data : validation data : test data)
      – For the comparison experiment: 5-fold cross validation

  • 52

    Data Analysis by Age

    [Figure: rate of normal vs. rate of metabolic syndrome (MS) by age group (26-35, 36-45, 46-55, 56-65, 66-75); y-axis from 0 to 0.6.]

    • The rate of metabolic syndrome increases with age
    • The decrease in the last age group could be influenced by deaths from complications of MS

  • 53

    Comparison by Attribute Selection

    [Figure: prediction rate (0.65 to 0.74) versus the number of attributes (8, 11, 18).]

  • 54

    Ordering Optimization Process

    [Figure: average and highest fitness (0.67 to 0.74) over generations 1 to 100.]

    • The ordering evolves well
    • It converges after the 60th generation

  • 55

    Comparison Before and After Optimization

  • 56

    Comparison with Other Models
