Review for Finals 2011

159.302

SEARCH

CONSTRAINT SATISFACTION PROBLEMS

GAMES

FUZZY LOGIC

NEURAL NETWORKS

GENETIC ALGORITHM (not included in the exam)

LOGIC (not included in the exam)

Allotment of marks

FUNDAMENTALS (true or false) – 8 marks

SEARCH – 15 marks

CONSTRAINT SATISFACTION PROBLEMS – 10 marks

GAMES – 8 marks

FUZZY LOGIC – 7 marks

NEURAL NETWORKS – 12 marks

Total = 60 marks

Search

Robot Navigation

Input: Multiple obstacles: x, y, angle; target's x, y, angle

Output: Robot angle, speed

Obstacle Avoidance, Target Pursuit, Opponent Evasion

Cascade of Fuzzy Systems

Multiple fuzzy systems implement the various robot behaviours.

[Figure: flowchart. The Path Planning Layer (the A* algorithm) supplies the next waypoint to Central Control. Central Control tests whether ObstacleDistance < MaxDistanceTolerance and the obstacle is closer than the target (Y/N), and selects between Target Pursuit and Obstacle Avoidance. Fuzzy System 1: Target Pursuit and Fuzzy System 2: Speed Control for Target Pursuit form one branch; Fuzzy System 3: Obstacle Avoidance and Fuzzy System 4: Speed Control for Obstacle Avoidance form the other. Each branch outputs an adjusted angle and an adjusted speed to the actuators.]

SEARCH: Background and Motivation

General idea: search allows exploring alternatives.

Topics for discussion:
• Background
• Uninformed vs. informed searches
• Any path vs. optimal path
• Implementation and performance

These algorithms provide the conceptual backbone of almost every approach to the systematic exploration of alternatives.

SEARCH: Graph Search as Tree Search

• We can turn graph search problems (from S to G) into tree search problems by:
1. Replacing undirected links by 2 directed links
2. Avoiding loops in a path (or keeping track of visited nodes globally)

[Figure: example graph with states S, A, B, C, D and G, and the corresponding search tree rooted at S.]

SEARCH: More Abstract Example of Graphs

Planning actions (graph of possible states of the world)

[Figure: arrangements of three blocks A, B and C, linked by the actions "Put C on A", "Put C on B", "Put B on C" and "Put A on C".]

Here, the nodes denote descriptions of the state of the world.

Path = “plan of actions”

SEARCH: Classes of Search

Class                  Name                          Operation
Any Path, Uninformed   Depth-First, Breadth-First    Systematic exploration of the whole tree until a goal is found.
Any Path, Informed     Best-First                    Uses heuristic measure of goodness of a state (e.g. estimated distance to goal).
Optimal, Uninformed    Uniform-Cost                  Uses path-length measure. Finds “shortest” path.
Optimal, Informed      A*                            Uses path “length” measure and heuristic. Finds “shortest” path.

SEARCH: Simple Search Algorithm

A search node is a path from some state X to the start state, e.g. (X B A S).

The state of a search node is the most recent state of the path, e.g. X.

Let Q be a list of search nodes, e.g. ((X B A S) (C B A S) …).

Let S be the start state.

1. Initialise Q with search node (S) as only entry; set Visited = (S).

2. If Q is empty, fail. Else, pick some search node N from Q.

3. If state(N) is a goal, return N (we’ve reached the goal).

4. (Otherwise) Remove N from Q.

5. Find all the descendants of state(N) not in Visited and create all the one-step extensions of N to each descendant.

6. Add the extended paths to Q; add children of state(N) to Visited.

7. Go to Step 2.

Critical decisions:

Step 2: picking N from Q.

Step 6: adding extensions of N to Q.
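The seven steps above can be sketched in Python. This is a minimal illustration, not the course's reference code: the `GRAPH` successor lists are an assumption reconstructed from the example traces in these notes, and the `strategy` argument is a hypothetical name for the Step 6 decision (front of Q for depth-first, end of Q for breadth-first).

```python
from collections import deque

# Successor lists reconstructed from the example traces (an assumption).
GRAPH = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["D", "G"],
    "C": [],
    "D": ["C", "G"],
    "G": [],
}


def simple_search(graph, start, goal, strategy="depth"):
    """Any-path search with a Visited list.

    A search node is a tuple holding a path, newest state first,
    e.g. ("D", "A", "S"), matching the (X B A S) notation.
    """
    q = deque([(start,)])            # Step 1: Q holds only the node (S)
    visited = {start}
    while q:                         # Step 2: fail when Q is empty
        node = q[0]                  # pick the first element of Q
        if node[0] == goal:          # Step 3: goal test on state(N)
            return node
        q.popleft()                  # Step 4: remove N from Q
        # Step 5: one-step extensions to states not yet visited
        children = [s for s in graph[node[0]] if s not in visited]
        extensions = [(s,) + node for s in children]
        visited.update(children)     # Step 6: mark children as visited
        if strategy == "depth":
            q.extendleft(reversed(extensions))   # add to the front of Q
        else:
            q.extend(extensions)                 # add to the end of Q
    return None


print(simple_search(GRAPH, "S", "G", "depth"))    # ('G', 'D', 'A', 'S')
print(simple_search(GRAPH, "S", "G", "breadth"))  # ('G', 'B', 'S')
```

Only the insertion point differs between the two strategies, which is why Step 2 and Step 6 are the critical decisions.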

SEARCH: Implementing the Search Strategies

Depth-First (backtracking search)
1. Pick first element of Q.
2. Add path extensions to front of Q.

Breadth-First
1. Pick first element of Q.
2. Add path extensions to end of Q.

Best-First (greedy search)
1. Pick best (measured by heuristic value of state) element of Q.
2. Add path extensions anywhere in Q (it may be efficient to keep Q ordered in some way so as to make it easier to find the “best” element).

Heuristic functions are applied in the hope of completing the search more quickly or finding a relatively good goal state. They do not guarantee finding the “best” path, though.

SEARCH: Depth-First with Visited List

Pick first element of Q; add path extensions to front of Q.

Step   Q                   Visited
1      (S)                 S
2      (AS)(BS)            A, B, S
3      (CAS)(DAS)(BS)      C, D, B, A, S
4      (DAS)(BS)           C, D, B, A, S
5      (GDAS)(BS)          G, C, D, B, A, S

Sequence of state expansions: S-A-C-D-G

[Figure: example graph with states S, A, B, C, D and G.]

In DFS, Q behaves as a stack: nodes are both removed from and inserted at the front of Q.

SEARCH: Visited States

Keeping track of visited states generally improves time efficiency when searching graphs, without affecting correctness. Note, however, that substantial additional space may be required to keep track of visited states.

If all we want to do is find a path from the start to the goal, there is no advantage to adding a search node whose state is already the state of another search node.

Any state reachable from the node the second time would have been reachable from that node the first time.

Note that, when using Visited, each state will only ever have at most one path to it (search node) in Q.

We’ll have to revisit this issue when we look at optimal searching.

SEARCH: Worst Case Running Time

In the worst case, all the searches, with or without a Visited list, may have to visit each state at least once.

d is the depth; b is the branching factor.

Number of states in the tree: $b^d < (b^{d+1} - 1)/(b - 1) < b^{d+1}$

[Figure: tree with branching factor b = 2, levels d = 0 to d = 3.]

Max Time is proportional to the Max. number of Nodes visited.

So, all searches will have worst case running times that are at least proportional to the total number of states and therefore exponential in the “depth” parameter.

SEARCH: Worst Case Space

Max Q size = max(#visited – #expanded).

Depth-first maximum Q size: $(b - 1)d \approx bd$

Breadth-first maximum Q size: $b^d$

[Figure: tree with branching factor b = 2, levels d = 0 to d = 3, marking visited and expanded nodes.]

SEARCH: Cost and Performance of Any-Path Methods

Searching a tree with branching factor b and depth d (without using a Visited list)

Search Method    Worst Time        Worst Space   Fewest states?   Guaranteed to find a path?
Depth-First      $b^{d+1}$         $bd$          No               Yes*
Breadth-First    $b^{d+1}$         $b^d$         Yes              Yes
Best-First       $b^{d+1}$ **      $b^d$         No               Yes*

* If there are no indefinitely long paths in the search space.
** Best-First needs more time to locate the best node in Q.

Worst-case time is proportional to the number of nodes added to Q.
Worst-case space is proportional to the maximal length of Q.

SEARCH: Cost and Performance of Any-Path Methods

Searching a tree with branching factor b and depth d (with a Visited list)

Search Method    Worst Time        Worst Space                 Fewest states?   Guaranteed to find a path?
Depth-First      $b^{d+1}$         $b^{d+1}$ (was $bd$)        No               Yes*
Breadth-First    $b^{d+1}$         $b^{d+1}$ (was $b^d$)       Yes              Yes
Best-First       $b^{d+1}$ **      $b^{d+1}$ (was $b^d$)       No               Yes*

* If there are no indefinitely long paths in the search space.
** Best-First needs more time to locate the best node in Q.

Worst-case time is proportional to the number of nodes added to Q.
Worst-case space is proportional to the maximal length of Q and of the Visited list.

SEARCH: States vs. Paths

[Figure: tree with branching factor b = 2, levels d = 0 to d = 3.]

Using a Visited list helps prevent loops; that is, no path visits a state more than once. However, using the Visited list for very large spaces may not be appropriate as the space requirements would be prohibitive.

Using a Visited list, the worst-case time performance is limited by the number of states in the search space rather than the number of paths through the nodes in the space (which may be exponentially larger than the number of states).

SEARCH: Space (the final frontier)

In large search problems, memory is often the limiting factor.

• Imagine searching a tree with branching factor 8 and depth 10. Assume a node requires just 8 bytes of storage. Then, Breadth-First search might require up to:

$(2^3)^{10} \times 2^3 = 2^{33}$ bytes ≈ 8,000 Mbytes = 8 Gbytes

One strategy is to trade time for memory. For example, we can emulate Breadth-First search by repeated applications of Depth-First search, each up to a preset depth limit. This is called Progressive Deepening Search (PDS):

1. C = 1
2. Do DFS to max. depth C. If a path is found, return it.
3. Otherwise, increment C and go to Step 2.
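The three PDS steps can be sketched as a depth-limited DFS wrapped in a loop. This is an illustrative sketch under stated assumptions: `GRAPH` is reconstructed from the example traces in these notes, `max_depth` is a hypothetical safety cap that the slides do not mention, and paths are stored oldest state first for readability.

```python
# Successor lists reconstructed from the example traces (an assumption).
GRAPH = {
    "S": ["A", "B"],
    "A": ["C", "D"],
    "B": ["D", "G"],
    "C": [],
    "D": ["C", "G"],
    "G": [],
}


def depth_limited(graph, path, goal, limit):
    """Depth-first search that refuses to go deeper than `limit` links."""
    state = path[-1]
    if state == goal:
        return path
    if limit == 0:
        return None
    for child in graph[state]:
        if child in path:            # avoid loops within the current path
            continue
        found = depth_limited(graph, path + [child], goal, limit - 1)
        if found:
            return found
    return None


def progressive_deepening(graph, start, goal, max_depth=20):
    """Steps 1-3: run DFS with depth limit C = 1, 2, 3, ..."""
    for c in range(1, max_depth + 1):
        found = depth_limited(graph, [start], goal, c)
        if found:
            return found
    return None


print(progressive_deepening(GRAPH, "S", "G"))  # ['S', 'B', 'G']
```

Only the current path is held in memory at any time, which is the space saving; the price is that shallow levels are re-explored on every iteration.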

See Tutorial on Search

SEARCH: Simple Search Algorithm

Let Q be a list of search nodes, e.g. ((X B A S) (C B A S) …).

Let S be the start state.

1. Initialise Q with search node (S) as only entry; set Visited = (S).

2. If Q is empty, fail. Else, pick some search node N from Q.

3. If state(N) is a goal, return N (we’ve reached the goal).

4. (Otherwise) Remove N from Q.

5. Find all the descendants of state(N) not in Visited and create all the one-step extensions of N to each descendant.

6. Add the extended paths to Q; add children of state(N) to Visited.

7. Go to Step 2.

Do NOT use the Visited list for optimal searching!

Critical decisions:

Step 2: picking N from Q.

Step 6: adding extensions of N to Q.

SEARCH: Uniform Cost

Uniform Cost enumerates paths in order of total path cost!

[Figure: example graph with states S, A, B, C, D and G. Link costs, as implied by the trace below: S-A 2, S-B 5, A-C 2, A-D 4, B-D 1, B-G 5, D-C 3, D-G 2. A second panel shows the search tree with the cumulative cost of each path.]

Step   Q
1      (0 S)
2      (2 AS)(5 BS)
3      (4 CAS)(6 DAS)(5 BS)
4      (6 DAS)(5 BS)
5      (6 DBS)(10 GBS)(6 DAS)
6      (8 GDBS)(9 CDBS)(10 GBS)(6 DAS)
7      (8 GDAS)(9 CDAS)(8 GDBS)(9 CDBS)(10 GBS)

Sequence of state expansions (with path cost): S (0) – A (2) – C (4) – B (5) – D (6) – D (6) – G (8)

The sequence of path extensions corresponds precisely to path-length order, so it is not surprising we find the shortest path.

SEARCH: Simple Optimal Search Algorithm: Uniform Cost + Strict Expanded List

Let Q be a list of search nodes, e.g. ((X B A S) (C B A S) …).

Let S be the start state.

1. Initialise Q with search node (S) as only entry; set Expanded = {}.

2. If Q is empty, fail. Else, pick some search node N from Q.

3. If state(N) is a goal, return N (we’ve reached the goal).

4. (Otherwise) Remove N from Q.

5. If state(N) is in Expanded, go to Step 2 (in effect, discard that state); otherwise, add state(N) to the Expanded list.

6. Find all the children of state(N) (not in Expanded) and create all the one-step extensions of N to each descendant.

7. Add all the extended paths to Q. If a descendant state is already in Q, keep only the shorter path to that state in Q.

8. Go to Step 2.

Take note that we need to add some precautionary measures when adding extended paths to Q (Step 7).

SEARCH: Uniform Cost (with Strict Expanded List)

[Figure: example graph with states S, A, B, C, D and G and link costs 2, 2, 4, 5, 3, 2, 5, 1.]

Step   Q                             Expanded
1      (0 S)
2      (2 AS)(5 BS)                  S
3      (4 CAS)(6 DAS)(5 BS)          S, A
4      (6 DAS)(5 BS)                 S, A, C
5      (6 DBS)(10 GBS)(6 DAS)        S, A, C, B
6      (8 GDAS)(9 CDAS)(10 GBS)      S, A, C, B, D

Sequence of state expansions: S – A – C – B – D – G

Annotations on paths removed from Q:
• Remove because C was expanded already.
• Remove because there’s a new shorter path to G.
• Remove because D is already in Q (our convention: take the element at the front of Q if a path leading to the same state is in Q).
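A compact way to experiment with this trace is the sketch below. It is a simplified variant, not the slide algorithm verbatim: instead of pruning longer duplicates at insertion time (Step 7), it leaves them in the priority queue and discards them when they are popped with an already-expanded state, which gives the same result. The `COSTS` table is an assumption reconstructed from the trace, and paths are stored oldest state first.

```python
import heapq

# Link costs reconstructed from the uniform-cost trace (an assumption).
COSTS = {
    "S": {"A": 2, "B": 5},
    "A": {"C": 2, "D": 4},
    "B": {"D": 1, "G": 5},
    "C": {},
    "D": {"C": 3, "G": 2},
    "G": {},
}


def uniform_cost(costs, start, goal):
    """Uniform-cost search with a strict Expanded list."""
    q = [(0, [start])]               # entries are (path cost, path)
    expanded = set()
    while q:
        g, path = heapq.heappop(q)   # cheapest path in Q
        state = path[-1]
        if state == goal:
            return g, path
        if state in expanded:        # strict: never expand a state twice
            continue
        expanded.add(state)
        for child, step in costs[state].items():
            if child not in expanded:
                heapq.heappush(q, (g + step, path + [child]))
    return None


print(uniform_cost(COSTS, "S", "G"))  # (8, ['S', 'A', 'D', 'G'])
```

Ties between equal-cost paths are broken by comparing the paths themselves, so the order of expansions can differ from the table when costs are equal, but the returned cost is the same.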

SEARCH: Why use estimate of goal distance?

• Assume states are points in the Euclidean space.

• Order in which UC looks at states: A and B are the same distance from start S, so both will be looked at before any longer paths. No “bias” towards the goal.

• Order of examination using distance from S + estimate of distance to G.

• Note: “bias” toward the goal; the points away from G look worse.

[Figure: states S, A, B and G drawn as points in the plane.]

SEARCH: Goal Direction

• UC is really trying to identify the shortest path to every state in the graph in order. It has no particular bias to finding a path to a goal early in the search.

• We can introduce such a bias by means of a heuristic function h(N), which is an estimate (h) of the distance from a state to the goal.

• Instead of enumerating paths in order of just length (g), enumerate paths in terms of f = estimated total path length = g + h.

• An estimate that never overestimates the real path length to the goal is called admissible. For example, an estimate of 0 is admissible (but useless). Straight-line distance is an admissible estimate for path length in Euclidean space.

• Use of an admissible estimate guarantees that UC will still find the shortest path.

• UC with an admissible estimate is known as A* Search.

SEARCH: A*

• Pick best (by path length + heuristic) element of Q; add path extensions anywhere in Q.

[Figure: example graph with states S, A, B, C, D and G and link costs 2, 2, 4, 5, 3, 2, 5, 1.]

Heuristic values: A = 2, B = 3, C = 1, D = 1, S = 0, G = 0

Step   Q
1      (0 S)
2      (4 AS)(8 BS)
3      (5 CAS)(7 DAS)(8 BS)
4      (7 DAS)(8 BS)
5      (8 GDAS)(10 CDAS)(8 BS)

Sequence of state expansions: S – A – C – D – G
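The same example can be run with a short A* sketch (no Expanded list, as on this slide). It rests on assumptions: the `COSTS` table is reconstructed from the traces in these notes, `H` holds the heuristic values listed above, and paths are stored oldest state first.

```python
import heapq

# Link costs reconstructed from the traces (an assumption).
COSTS = {
    "S": {"A": 2, "B": 5},
    "A": {"C": 2, "D": 4},
    "B": {"D": 1, "G": 5},
    "C": {},
    "D": {"C": 3, "G": 2},
    "G": {},
}
# Heuristic values as listed on the slide.
H = {"S": 0, "A": 2, "B": 3, "C": 1, "D": 1, "G": 0}


def a_star(costs, h, start, goal):
    """A* without an Expanded list: order Q by f = g + h."""
    q = [(h[start], 0, [start])]     # entries are (f, g, path)
    while q:
        f, g, path = heapq.heappop(q)
        state = path[-1]
        if state == goal:
            return g, path
        for child, step in costs[state].items():
            if child in path:        # avoid loops within a path
                continue
            g2 = g + step
            heapq.heappush(q, (g2 + h[child], g2, path + [child]))
    return None


print(a_star(COSTS, H, "S", "G"))  # (8, ['S', 'A', 'D', 'G'])
```

At step 5 of the table, (8 GDAS) and (8 BS) tie on f. The sketch breaks that tie on g and so expands B before reaching G, whereas the slide picks GDAS first; both return the cost-8 path.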

SEARCH: States vs. Paths

We have ignored the issue of revisiting states in our discussion of Uniform-Cost Search and A*. We indicated that we could not use the Visited list and still preserve optimality, but can we use something else that will keep the worst-case cost of a search proportional to the number of states in a graph rather than to the number of non-looping paths?

[Figure: tree with branching factor b = 2, levels d = 0 to d = 3.]

SEARCH: Consistency

To enable implementing A* using the strict Expanded list, h needs to satisfy the following consistency (also known as monotonicity) conditions:

1. $h(s_i) = 0$, if $n_i$ is a goal

2. $h(s_i) - h(s_j) \le c(s_i, s_j)$, if $n_j$ is a child of $n_i$

That is, the heuristic cost in moving from one entry to the next cannot decrease by more than the arc cost between the states. This is a kind of triangle inequality. This condition is a highly desirable property of a heuristic function and is often simply assumed (more on this later).

[Figure: triangle formed by $n_i$, $n_j$ and the goal, with sides $h(s_i)$, $h(s_j)$ and $c(s_i, s_j)$.]
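The two conditions are easy to check mechanically. The sketch below is illustrative only: `is_consistent` is a hypothetical helper name, and the `COSTS` table is reconstructed from the earlier A* trace, so the directed links are an assumption.

```python
# Link costs reconstructed from the earlier A* example (an assumption).
COSTS = {
    "S": {"A": 2, "B": 5},
    "A": {"C": 2, "D": 4},
    "B": {"D": 1, "G": 5},
    "C": {},
    "D": {"C": 3, "G": 2},
    "G": {},
}
# Heuristic values from the earlier A* example.
H = {"S": 0, "A": 2, "B": 3, "C": 1, "D": 1, "G": 0}


def is_consistent(costs, h, goals):
    """Check both consistency conditions for heuristic h."""
    # Condition 1: h is zero at every goal state.
    if any(h[g] != 0 for g in goals):
        return False
    # Condition 2: h(s_i) - h(s_j) <= c(s_i, s_j) for every link.
    return all(
        h[parent] - h[child] <= step
        for parent, children in costs.items()
        for child, step in children.items()
    )


print(is_consistent(COSTS, H, ["G"]))  # False
```

Under these assumed link costs the check fails on the link B to D: h(B) - h(D) = 2, but the link costs only 1. The heuristic is still admissible, which shows that admissibility does not imply consistency.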

SEARCH: A* (with Strict Expanded List)

Note that the heuristic is admissible and consistent.

[Figure: example graph with states S, A, B, C and G. Link costs, as implied by the trace below: S-A 1, A-C 1, S-B 2, B-C 2, C-G 100.]

Heuristic values: A = 100, B = 88, C = 100, S = 90, G = 0

Step   Q                       Expanded List
1      (90 S)
2      (90 BS)(101 AS)         S
3      (101 AS)(104 CBS)       B, S
4      (102 CAS)(104 CBS)      A, B, S
5      (102 GCAS)              C, A, B, S

At each step the first path in Q is the one chosen for extension.

If we modify the heuristic in the example we have been considering so that it is consistent, as we have done here by increasing the value of h(B), then A* (with the Expanded list) will work.

SEARCH: Dealing with an inconsistent heuristic

What can we do if we have an inconsistent heuristic but we still want optimal paths?

Modify A* so that it detects and corrects when inconsistency has led us astray.

Assume we are adding node1 to Q and node2 is present in the Expanded list with node1.state = node2.state.

Strict Expanded List:
• Do NOT add node1 to Q.

Non-Strict Expanded List:
• If (node1.path_length < node2.path_length), then
1. Delete node2 from the Expanded list.
2. Add node1 to Q.

SEARCH: Optimality and Worst Case Complexity

Algorithm      Heuristic    Expanded List   Optimality Guaranteed?   Worst-Case Number of Expansions
Uniform Cost   None         Strict          Yes                      N
A*             Admissible   None            Yes                      >N
A*             Consistent   Strict          Yes                      N
A*             Admissible   Strict          No                       N
A*             Admissible   Non-Strict      Yes                      >N

Fuzzy Logic

Fuzzy Inference Process

Input (e.g. theta) → Fuzzification → Rule Evaluation → Defuzzification → Output (e.g. force)

Fuzzification: Translate input into truth values
Rule Evaluation: Compute output truth values
Defuzzification: Transfer truth values into output

Obstacle Avoidance Problem

[Figure: robot at (x, y) and an obstacle at (obsx, obsy).]

Can you describe how the robot should turn based on the position and angle of the obstacle?

Robot Navigation

Demonstration

Obstacle Avoidance & Target Pursuit

D:\Research\Conferences\Fuzzy Days - 9th 2006\Demo\Unmanned 30.7 - Scaled + Waypoint - Dynamic Obstacles - NZ\Unmanned.exe

Another example: Fuzzy Sets for Robot Navigation

Angle and Distance

Sub-ranges for angles and distances overlap.

SMALL

MEDIUM

LARGE

NEAR

FAR

VERY FAR

Fuzzy Systems for Obstacle Avoidance

Inputs come from the vision system: the nearest obstacle's distance (NEAR, FAR, VERY FAR) and angle (SMALL, MEDIUM, LARGE). Outputs: angle and speed.

Fuzzy System 3 (Steering)

          NEAR          FAR           VERY FAR
SMALL     Very Sharp    Sharp Turn    Med Turn
MEDIUM    Sharp Turn    Med Turn      Mild Turn
LARGE     Med Turn      Mild Turn     Zero Turn

e.g. If the Distance from the Obstacle is NEAR and the Angle from the Obstacle is SMALL, then turn Very Sharply.

Fuzzy System 4 (Speed Adjustment)

          NEAR          FAR           VERY FAR
SMALL     Very Slow     Slow Speed    Fast Speed
MEDIUM    Slow Speed    Fast Speed    Very Fast
LARGE     Fast Speed    Very Fast     Top Speed

e.g. If the Distance from the Obstacle is NEAR and the Angle from the Obstacle is SMALL, then move Very Slowly.

1. Fuzzification

Fuzzification Example

Fuzzy Sets = { Negative, Zero, Positive }

[Figure: trapezoidal membership functions NEGATIVE, ZERO and POSITIVE over the input axis, with ticks at -3.0, -2.5, -1.0, -0.5, 0.0, 0.5, 1.0, 2.5 and 3.0; the membership axis runs from 0.0 to 1.0.]

Crisp input: x = 0.25. What is the degree of membership of x in each of the fuzzy sets?

Assuming that we are using trapezoidal membership functions with corner points a, b, c, d:

$$F(x) = \max\left(\min\left(\frac{x-a}{b-a},\ 1,\ \frac{d-x}{d-c}\right),\ 0\right)$$

Sample calculations, crisp input x = 0.25:

$$F_{ZE}(0.25) = \max\left(\min\left(\frac{0.25-(-1)}{-0.25-(-1)},\ 1,\ \frac{1-0.25}{1-0.25}\right),\ 0\right) = \max(\min(1.67,\ 1,\ 1),\ 0) = 1$$

$$F_{P}(0.25) = \max\left(\min\left(\frac{0.25-(-0.5)}{0.5-(-0.5)},\ 1,\ \frac{3-0.25}{3-2.5}\right),\ 0\right) = \max(\min(0.75,\ 1,\ 5.5),\ 0) = 0.75$$

$$F_{N}(0.25) = \max\left(\min\left(\frac{0.25-(-3)}{-2.5-(-3)},\ 1,\ \frac{0.5-0.25}{0.5-(-0.5)}\right),\ 0\right) = \max(\min(6.5,\ 1,\ 0.25),\ 0) = 0.25$$

Sample calculations, crisp input y = -0.25:

$$F_{ZE}(-0.25) = \max\left(\min\left(\frac{-0.25-(-1)}{-0.25-(-1)},\ 1,\ \frac{1-(-0.25)}{1-0.25}\right),\ 0\right) = \max(\min(1,\ 1,\ 1.67),\ 0) = 1$$

$$F_{P}(-0.25) = \max\left(\min\left(\frac{-0.25-(-0.5)}{0.5-(-0.5)},\ 1,\ \frac{3-(-0.25)}{3-2.5}\right),\ 0\right) = \max(\min(0.25,\ 1,\ 6.5),\ 0) = 0.25$$

$$F_{N}(-0.25) = \max\left(\min\left(\frac{-0.25-(-3)}{-2.5-(-3)},\ 1,\ \frac{0.5-(-0.25)}{0.5-(-0.5)}\right),\ 0\right) = \max(\min(5.5,\ 1,\ 0.75),\ 0) = 0.75$$

Trapezoidal Membership Functions

LeftTrapezoid (corner points a, b)

Left_Slope = 0
Right_Slope = 1 / (a - b)

CASE 1: x <= a: Membership Value = 1
CASE 2: x >= b: Membership Value = 0
CASE 3: a < x < b: Membership Value = Right_Slope * (x - b)

RightTrapezoid (corner points a, b)

Left_Slope = 1 / (b - a)
Right_Slope = 0

CASE 1: x <= a: Membership Value = 0
CASE 2: x >= b: Membership Value = 1
CASE 3: a < x < b: Membership Value = Left_Slope * (x - a)

Regular Trapezoid (corner points a, b, c, d)

Left_Slope = 1 / (b - a)
Right_Slope = 1 / (c - d)

CASE 1: x <= a or x >= d: Membership Value = 0
CASE 2: x >= b and x <= c: Membership Value = 1
CASE 3: a < x < b: Membership Value = Left_Slope * (x - a)
CASE 4: c < x < d: Membership Value = Right_Slope * (x - d)
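The three case tables translate directly into code. The sketch below is one possible rendering; the function names are hypothetical, and the parameters a, b, c, d are the corner points used above.

```python
def left_trapezoid(x, a, b):
    """Membership is 1 up to a, then falls linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (x - b) / (a - b)         # Right_Slope * (x - b)


def right_trapezoid(x, a, b):
    """Membership is 0 up to a, then rises linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)         # Left_Slope * (x - a)


def regular_trapezoid(x, a, b, c, d):
    """Rises on [a, b], is 1 on [b, c], falls on [c, d], 0 elsewhere."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)     # Left_Slope * (x - a)
    return (x - d) / (c - d)         # Right_Slope * (x - d)


# POSITIVE set from the fuzzification example: a=-0.5, b=0.5, c=2.5, d=3
print(regular_trapezoid(0.25, -0.5, 0.5, 2.5, 3.0))  # 0.75
```

The case-based form avoids evaluating a slope where it is not needed, and it gives the same values as the max/min formula used in the sample calculations.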

2. Rule Evaluation

Inputs are applied to a set of if/then control rules.

e.g. IF temperature is very hot, THEN set fan speed very high.

The results of various rules are summed together to generate a set of “fuzzy outputs”.

Fuzzy Control: Rule Evaluation Example

Assuming that we are using the conjunction operator (AND) in the antecedents of the rules.

FAMM (columns: fuzzy set of x; rows: fuzzy set of y)

            x = N    x = ZE   x = P
y = N       NL       NS       NS
y = ZE      NS       ZE       PS
y = P       PS       PS       PL

Rule weights, in the same layout:

            x = N    x = ZE   x = P
y = N       W1       W4       W7
y = ZE      W2       W5       W8
y = P       W3       W6       W9

Outputs: NL = -5, NS = -2.5, ZE = 0, PS = 2.5, PL = 5.0

With x = 0.25 and y = -0.25:

$$W_1 = \min(F_N(0.25),\ F_N(-0.25)) = \min(0.25,\ 0.75) = 0.25$$
$$W_2 = \min(F_N(0.25),\ F_{ZE}(-0.25)) = \min(0.25,\ 1) = 0.25$$
$$W_3 = \min(F_N(0.25),\ F_P(-0.25)) = \min(0.25,\ 0.25) = 0.25$$
$$W_4 = \min(F_{ZE}(0.25),\ F_N(-0.25)) = \min(1,\ 0.75) = 0.75$$
$$W_5 = \min(F_{ZE}(0.25),\ F_{ZE}(-0.25)) = \min(1,\ 1) = 1$$
$$W_6 = \min(F_{ZE}(0.25),\ F_P(-0.25)) = \min(1,\ 0.25) = 0.25$$
$$W_7 = \min(F_P(0.25),\ F_N(-0.25)) = \min(0.75,\ 0.75) = 0.75$$
$$W_8 = \min(F_P(0.25),\ F_{ZE}(-0.25)) = \min(0.75,\ 1) = 0.75$$
$$W_9 = \min(F_P(0.25),\ F_P(-0.25)) = \min(0.75,\ 0.25) = 0.25$$

3. Defuzzification

Defuzzification Example

Assuming that we are using the center of mass defuzzification method.

$$OUTPUT = \frac{W_1 NL + W_2 NS + W_3 PS + W_4 NS + W_5 ZE + W_6 PS + W_7 NS + W_8 PS + W_9 PL}{\sum_{i=1}^{9} W_i}$$

$$= \frac{0.25(-5) + 0.25(-2.5) + 0.25(2.5) + 0.75(-2.5) + 1(0) + 0.25(2.5) + 0.75(-2.5) + 0.75(2.5) + 0.25(5)}{0.25 + 0.25 + 0.25 + 0.75 + 1 + 0.25 + 0.75 + 0.75 + 0.25}$$

$$= \frac{-1.25}{4.5} = -0.278$$
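The three stages of the worked example (fuzzification, rule evaluation with AND as min, centre-of-mass defuzzification) can be chained in a few lines. This is a sketch under assumptions: the trapezoid corner points in `SETS` are read off the sample calculations, `FAMM` is keyed by (set of x, set of y), and the function names are hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """F(x) = max(min((x-a)/(b-a), 1, (d-x)/(d-c)), 0)."""
    return max(min((x - a) / (b - a), 1.0, (d - x) / (d - c)), 0.0)


# Corner points (a, b, c, d) taken from the sample calculations.
SETS = {
    "N": (-3.0, -2.5, -0.5, 0.5),
    "ZE": (-1.0, -0.25, 0.25, 1.0),
    "P": (-0.5, 0.5, 2.5, 3.0),
}
# Crisp value attached to each output label.
OUTPUTS = {"NL": -5.0, "NS": -2.5, "ZE": 0.0, "PS": 2.5, "PL": 5.0}
# FAMM keyed by (fuzzy set of x, fuzzy set of y).
FAMM = {
    ("N", "N"): "NL", ("N", "ZE"): "NS", ("N", "P"): "PS",
    ("ZE", "N"): "NS", ("ZE", "ZE"): "ZE", ("ZE", "P"): "PS",
    ("P", "N"): "NS", ("P", "ZE"): "PS", ("P", "P"): "PL",
}


def infer(x, y):
    """Fuzzify x and y, fire all nine rules, defuzzify."""
    fx = {name: trapezoid(x, *pts) for name, pts in SETS.items()}
    fy = {name: trapezoid(y, *pts) for name, pts in SETS.items()}
    numerator = 0.0
    denominator = 0.0
    for (set_x, set_y), label in FAMM.items():
        w = min(fx[set_x], fy[set_y])      # AND of the two antecedents
        numerator += w * OUTPUTS[label]
        denominator += w
    return numerator / denominator         # centre of mass


print(round(infer(0.25, -0.25), 3))  # -0.278
```

The division assumes that at least one rule fires; inputs outside the range covered by the fuzzy sets would need separate handling.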

Neural Networks

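Training a Network: BACKPROPAGATION TRAINING

[Figure: example network with input units x and y, hidden unit h, output unit z and bias units bh and bz. Values shown in the figure: -4.95, -4.95, 7.1, 7.1, -2.76, 10.9, -3.29, 0.91, 0.98, 1 and 1.]

[Figure: unit i (INPUT) with output Oi, unit j (HIDDEN) with output Oj and unit k (OUTPUT) with output Ok, connected by weights Wij, Wjk and Wik.]

tk – target output, Ok – actual output

Output units:

$$\delta_k = (t_k - O_k)\, O_k (1 - O_k)$$

$$W_{jk} \leftarrow W_{jk} + \eta\, \delta_k\, O_j$$

$$W_{ik} \leftarrow W_{ik} + \eta\, \delta_k\, O_i$$

where $\eta$ is the learning rate, $\delta_k$ is the error signal and $O_j$ is the output of unit j.

Hidden units:

$$\delta_j = O_j (1 - O_j) \sum_k \delta_k W_{jk}$$

$$W_{ij} \leftarrow W_{ij} + \eta\, \delta_j\, O_i$$

where $\eta$ is the learning rate, $\delta_j$ is the error signal and $O_i$ is the output of unit i.

k – subscript for all output nodes that connect to node j

[Figure: hidden unit j, with output Oj, feeding output units Ok(0), Ok(1) and Ok(2).]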

BACKPROPAGATION TRAINING

We will now look at the formulas for adjusting the weights that lead into the output units of a backpropagation network. The actual activation value of an output unit, k, will be $O_k$ and the target for unit k will be $t_k$. First of all, there is a term in the formula for $\delta_k$, the error signal:

$$\delta_k = (t_k - O_k)\, f'(net_k)$$

where f′ is the derivative of the activation function, f, and $net_k$ is the net input to unit k. If we use the usual activation function:

$$f(net_k) = \frac{1}{1 + e^{-net_k}}$$

the derivative term is:

$$f'(net_k) = O_k (1 - O_k)$$


The formula to change the weight, $W_{jk}$, between the output unit, k, and unit j is:

$$W_{jk} \leftarrow W_{jk} + \eta\, \delta_k\, O_j$$

where $\eta$ is some relatively small positive constant called the learning rate. With the network given, assuming that all weights start with zero values, and with $\eta$ = 0.1 we have:

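[Figure: the same network (inputs x and y, hidden unit h, output unit z, bias units bh and bz) with every weight set to 0.0; the hidden and output units both output 0.5.]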

The formula for computing the error $\delta_j$ for a hidden unit, j, is:

$$\delta_j = O_j (1 - O_j) \sum_k \delta_k W_{jk}$$

The k subscript is for all the units in the output layer; however, in this example there is only one unit. In the example, then:

The weight change formula for a weight, $W_{ij}$, that goes between the hidden unit, j, and the input unit, i, is essentially the same as before:

$$W_{ij} \leftarrow W_{ij} + \eta\, \delta_j\, O_i$$

The new weights will be:


Iterative minimization of error over training set:

1. Put one of the training patterns to be learned on the input units.
2. Find the values for the hidden unit and output unit.
3. Find out how large the error is on the output unit.
4. Use one of the back-propagation formulas to adjust the weights leading into the output unit.
5. Use another formula to find out errors for the hidden layer unit.
6. Adjust the weights leading into the hidden layer unit via another formula.
7. Repeat steps 1 through 6 for the second, third patterns, etc.
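Steps 1 to 6 can be sketched for the small network used in these notes (inputs x and y, one hidden unit h, one output unit z, bias units, and direct input-to-output links). Several things here are assumptions made for illustration: the weight names in the dictionary, the training pattern (x = 1, y = 0, target 1), and the choice to compute the hidden error from the old value of the hidden-to-output weight.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def train_step(w, x, y, target, eta=0.1):
    """One backpropagation update for a single training pattern."""
    # Steps 1-2: forward pass (bias units always output 1).
    o_h = sigmoid(w["xh"] * x + w["yh"] * y + w["bh"])
    o_z = sigmoid(w["xz"] * x + w["yz"] * y + w["hz"] * o_h + w["bz"])
    # Step 3: error signal on the output unit.
    delta_z = (target - o_z) * o_z * (1 - o_z)
    # Step 5: error signal on the hidden unit (single output unit).
    delta_h = o_h * (1 - o_h) * delta_z * w["hz"]
    new = dict(w)
    # Step 4: weights leading into the output unit.
    new["hz"] += eta * delta_z * o_h
    new["xz"] += eta * delta_z * x
    new["yz"] += eta * delta_z * y
    new["bz"] += eta * delta_z * 1.0
    # Step 6: weights leading into the hidden unit.
    new["xh"] += eta * delta_h * x
    new["yh"] += eta * delta_h * y
    new["bh"] += eta * delta_h * 1.0
    return new, o_z


zero = {name: 0.0 for name in ("xh", "yh", "bh", "xz", "yz", "hz", "bz")}
updated, output = train_step(zero, x=1.0, y=0.0, target=1.0)
print(output, updated["hz"], updated["xz"])  # 0.5 0.00625 0.0125
```

With all weights at zero both units output 0.5, so the output error signal is 0.5 × 0.5 × 0.5 = 0.125, and the hidden error signal is zero on this first step because the hidden-to-output weight is still zero.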

Backpropagation Training

Sum of Squared Errors

$$E = \sum_{i=1}^{n} \sum_{k=1}^{m} (T_k - O_k)^2$$

where
$T_k$ is the target output;
$O_k$ is the actual output of the network;
m is the total number of output units;
n is the number of training exemplars.

• This error is minimised during training.

Training Neural Nets

• Given: data set, desired outputs and a neural net with m weights.
• Find a setting for the weights that will give good predictive performance on new data.
• Estimate expected performance on new data.

1. Split data set (randomly) into three subsets:
   Training set – used for picking weights
   Validation set – used to stop training
   Test set – used to evaluate performance
2. Pick random, small weights as initial values.
3. Perform iterative minimization of error over training set.
4. Stop when error on validation set reaches a minimum (to avoid over-fitting).
5. Repeat training (from Step 2) several times (to avoid local minima).
6. Use best weights to compute error on test set, which is the estimate of performance on new data. Do not repeat training to improve this.
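Step 1 of the procedure above, the random three-way split, might look like the sketch below. The 60/20/20 proportions, the seed and the function name are assumptions chosen for illustration; the notes do not prescribe them.

```python
import random


def split_dataset(examples, train_frac=0.6, val_frac=0.2, seed=0):
    """Randomly split examples into training, validation and test sets."""
    items = list(examples)
    random.Random(seed).shuffle(items)     # reproducible shuffle
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    training = items[:n_train]                    # used for picking weights
    validation = items[n_train:n_train + n_val]   # used to stop training
    test = items[n_train + n_val:]                # used once, at the end
    return training, validation, test


training, validation, test = split_dataset(range(10))
print(len(training), len(validation), len(test))  # 6 2 2
```

Keeping the test set untouched until Step 6 is what makes its error a fair estimate of performance on new data.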

Multi-Layer Feed-Forward Neural Network

Why do we need BIAS UNITS (or Threshold Nodes)?

Apart from improving the speed of learning for some problems (XOR problem), bias units or threshold nodes are required for universal approximation. Without them, the feedforward network always assigns 0 output to 0 input. Without thresholds, it would be impossible to approximate functions which assign nonzero output to zero input. Threshold nodes are needed in much the same way that the constant polynomial ‘1’ is required for approximation by polynomials.


Training

Data Sets

• Split data set (randomly) into three subsets:
1. Training set – used for picking weights
2. Validation set – used to stop training
3. Test set – used to evaluate performance

Input Representation

• All the signals in a neural net are in the range [0, 1]. Input values should also be scaled to this range (or approximately so) so as to speed training.

• If the input values are discrete, e.g. {A,B,C,D} or {1,2,3,4}, they need to be coded in unary form.

Output Representation

• A neural net with a single sigmoid output is aimed at binary classification. Class is 0 if y < 0.5 and 1 otherwise.

• For multi-class problems:
  • Can use one output per class (unary encoding).
  • There may be confusing outputs (two outputs > 0.5 in unary encoding).
  • A more sophisticated method is to use special softmax units, which force outputs to sum to 1.

Neural Networks

Regression Problems?

Classification Problems?

Backpropagation Learning

Overfitting problem in network training

Practice solving using the tutorials.

CSP

Constraint Propagation

Simple Backtracking (BT)

Simple Backtracking with Forward Checking (BT-FC)

Simple Backtracking with Forward Checking with Dynamic Variable and Value Ordering

Practice solving using the tutorials.

Games

Min-Max

Alpha-Beta

Practice solving using the tutorials.
