Slides are based on Negnevitsky, Pearson Education, 2005 1 Lecture 12 Hybrid intelligent systems:...


Lecture 12

Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems

Introduction
Evolutionary neural networks
Fuzzy evolutionary systems
Summary


Evolutionary neural networks

Although neural networks are used for solving a variety of problems, they still have some limitations.

One of the most common is associated with neural network training. The back-propagation learning algorithm cannot guarantee an optimal solution. In real-world applications, the back-propagation algorithm might converge to a set of sub-optimal weights from which it cannot escape. As a result, the neural network is often unable to find a desirable solution to the problem at hand.


Another difficulty is related to selecting an optimal topology for the neural network. The "right" network architecture for a particular problem is often chosen by means of heuristics, and designing a neural network topology is still more art than engineering.

Genetic algorithms are an effective optimisation technique that can guide both weight optimisation and topology selection.


Encoding a set of weights in a chromosome

[Figure: a feed-forward network with inputs x1, x2 and x3 (neurons 1 to 3), neurons 4 to 7 in the hidden layer and neuron 8 producing the output y, shown together with its weight matrix and chromosome.]

Weight matrix (rows: to neuron; columns: from neuron):

To \ From      1     2     3     4     5     6     7     8
1              0     0     0     0     0     0     0     0
2              0     0     0     0     0     0     0     0
3              0     0     0     0     0     0     0     0
4            0.9  -0.3  -0.7     0     0     0     0     0
5           -0.8   0.6   0.3     0     0     0     0     0
6            0.1  -0.2   0.2     0     0     0     0     0
7            0.4   0.5   0.8     0     0     0     0     0
8              0     0     0  -0.6   0.1  -0.2   0.9     0

Chromosome: 0.9 -0.3 -0.7 -0.8 0.6 0.3 0.1 -0.2 0.2 0.4 0.5 0.8 -0.6 0.1 -0.2 0.9
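The encoding in the figure can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a boolean connection mask; the names `MASK`, `encode` and `decode` are hypothetical and do not come from the slides.

```python
import numpy as np

# Connection mask for the 8-neuron network in the figure:
# rows are "to" neurons 1-8, columns are "from" neurons 1-8.
MASK = np.zeros((8, 8), dtype=bool)
MASK[3:7, 0:3] = True   # neurons 4-7 receive inputs from neurons 1-3
MASK[7, 3:7] = True     # neuron 8 receives inputs from neurons 4-7

def encode(weights, mask):
    """String the weights of existing connections together, row by row."""
    return weights[mask].tolist()

def decode(chromosome, mask):
    """Rebuild the full weight matrix from a chromosome."""
    weights = np.zeros(mask.shape)
    weights[mask] = chromosome  # filled in row-major order
    return weights

# The chromosome shown on the slide.
CHROMOSOME = [0.9, -0.3, -0.7, -0.8, 0.6, 0.3, 0.1, -0.2, 0.2,
              0.4, 0.5, 0.8, -0.6, 0.1, -0.2, 0.9]
W = decode(CHROMOSOME, MASK)
```

Because only connected entries are stored, the chromosome has 16 genes rather than 64, and `encode(decode(c, MASK), MASK)` returns `c` unchanged.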


The second step is to define a fitness function for evaluating the chromosome's performance. This function must estimate the performance of a given neural network. We can apply here a simple function defined by the sum of squared errors.

The training set of examples is presented to the network, and the sum of squared errors is calculated. The smaller the sum, the fitter the chromosome. The genetic algorithm attempts to find a set of weights that minimises the sum of squared errors.
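A sum-of-squared-errors evaluation might look as follows. The sketch assumes the 3-4-1 layout of the 16-gene chromosome above, a sigmoid activation and no thresholds; none of these details are stated on the slides, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(chromosome, x):
    """Output of a 3-4-1 network whose 16 weights are given by the chromosome."""
    genes = np.asarray(chromosome, dtype=float)
    w_hidden = genes[:12].reshape(4, 3)  # rows: neurons 4-7, columns: inputs 1-3
    w_out = genes[12:]                   # neuron 8 <- neurons 4-7
    hidden = sigmoid(w_hidden @ x)
    return sigmoid(w_out @ hidden)

def sum_squared_errors(chromosome, inputs, targets):
    """Present every training example and accumulate the squared error."""
    outputs = np.array([forward(chromosome, x) for x in inputs])
    return float(np.sum((targets - outputs) ** 2))
```

A genetic algorithm that maximises fitness can use, for example, the negative of this sum, so that a smaller error means a fitter chromosome.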


The third step is to choose the genetic operators: crossover and mutation. A crossover operator takes two parent chromosomes and creates a single child with genetic material from both parents. Each gene in the child's chromosome is represented by the corresponding gene of the randomly selected parent.

A mutation operator selects a gene in a chromosome and adds a small random value between -1 and 1 to each weight in this gene.


Crossover in weight optimisation

[Figure: two parent networks and their child, each with inputs x1 and x2 (neurons 1 and 2), neurons 3 to 5 in the hidden layer and neuron 6 producing the output y.]

Parent 1: 0.1 -0.7 -0.6 0.5 -0.8 -0.2 0.9 0.4 -0.3
Parent 2: 0.3 0.2 0.3 -0.9 0.6 0.9 -0.5 -0.8 -0.1
Child:    0.1 -0.7 -0.6 0.5 -0.8 0.9 -0.5 -0.8 -0.1
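The crossover operator described earlier (each gene of the child copied from a randomly selected parent) can be sketched as below. The grouping of weights into genes is an assumption: here one gene holds the incoming weights of one neuron, and the sizes `[2, 2, 2, 3]` for a 2-3-1 network are illustrative rather than read off the figure.

```python
import random

def crossover(parent1, parent2, gene_sizes, rng=random):
    """Build one child; each gene comes whole from a randomly chosen parent."""
    assert len(parent1) == len(parent2) == sum(gene_sizes)
    child, start = [], 0
    for size in gene_sizes:
        donor = parent1 if rng.random() < 0.5 else parent2
        child.extend(donor[start:start + size])
        start += size
    return child

parent1 = [0.1, -0.7, -0.6, 0.5, -0.8, -0.2, 0.9, 0.4, -0.3]
parent2 = [0.3, 0.2, 0.3, -0.9, 0.6, 0.9, -0.5, -0.8, -0.1]
child = crossover(parent1, parent2, [2, 2, 2, 3], random.Random(1))
```

Copying whole genes keeps the incoming weights of a neuron together, so a neuron that works well in one parent is passed on intact.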


Mutation in weight optimisation

[Figure: the original network and the mutated network, each with inputs x1 and x2 (neurons 1 and 2), neurons 3 to 5 in the hidden layer and neuron 6 producing the output y.]

Original network: 0.1 -0.7 -0.6 0.5 -0.8 -0.2 0.9 0.4 -0.3
Mutated network:  0.1 -0.7 -0.6 0.5 -0.8 -0.2 0.9 -0.1 0.2
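A mutation operator of the kind described earlier (pick one gene, add a small random value between -1 and 1 to each of its weights) might be written as follows. The gene sizes are again an assumption, and `mutate` is a hypothetical name.

```python
import random

def mutate(chromosome, gene_sizes, rng=random):
    """Return a copy in which every weight of one random gene is perturbed."""
    mutated = list(chromosome)
    gene = rng.randrange(len(gene_sizes))
    start = sum(gene_sizes[:gene])
    for k in range(start, start + gene_sizes[gene]):
        mutated[k] += rng.uniform(-1.0, 1.0)  # small random value in [-1, 1]
    return mutated

original = [0.1, -0.7, -0.6, 0.5, -0.8, -0.2, 0.9, 0.4, -0.3]
mutated = mutate(original, [2, 2, 2, 3], random.Random(0))
```

The input list is left untouched, which makes it easy to keep the parent in the population alongside its mutated copy.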


Can genetic algorithms help us in selecting the network architecture?

The architecture of the network (i.e. the number of neurons and their interconnections) often determines the success or failure of the application. Usually the network architecture is decided by trial and error; there is a great need for a method of automatically designing the architecture for a particular application. Genetic algorithms may well be suited for this task.

The basic idea behind evolving a suitable network architecture is to conduct a genetic search in a population of possible architectures.

We must first choose a method of encoding a network's architecture into a chromosome.


Encoding the network architecture

The connection topology of a neural network can be represented by a square connectivity matrix.

Each entry in the matrix defines the type of connection from one neuron (column) to another (row), where 0 means no connection and 1 denotes a connection for which the weight can be changed through learning.

To transform the connectivity matrix into a chromosome, we need only to string the rows of the matrix together.


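Encoding of the network topology

[Figure: a network with inputs x1 and x2 (neurons 1 and 2), neurons 3 to 5 and neuron 6 producing the output y, shown together with its connectivity matrix and chromosome.]

Connectivity matrix (rows: to neuron; columns: from neuron):

To \ From   1  2  3  4  5  6
1           0  0  0  0  0  0
2           0  0  0  0  0  0
3           1  1  0  0  0  0
4           1  0  0  0  0  0
5           0  1  0  0  0  0
6           0  1  1  1  1  0

Chromosome: 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 0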
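Stringing the rows of the connectivity matrix together, and reversing the operation, takes one line each. The sketch below assumes NumPy; `to_chromosome` and `to_connectivity` are hypothetical helper names.

```python
import numpy as np

# Connectivity matrix from the slide: rows are "to" neurons, columns "from" neurons.
C = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
])

def to_chromosome(connectivity):
    """String the rows of the matrix together."""
    return connectivity.flatten().tolist()

def to_connectivity(chromosome, n_neurons):
    """Decode a chromosome back into an n x n connectivity matrix."""
    return np.array(chromosome).reshape(n_neurons, n_neurons)

chromosome = to_chromosome(C)
```

For a network of n neurons the chromosome always has n x n bits, so standard binary crossover and bit-flip mutation apply directly.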


Binary string representation for the network architecture

10 steps as described on pages 222 and 289, respectively:

– Step 1: initial values
– Step 2: fitness function
– Step 3: generating an initial population
– Step 4: decoding an individual chromosome into a neural network
– Step 5: repeating step 4
– Step 6: selecting a pair of chromosomes
– Step 7: creating a pair of offspring chromosomes
– Step 8: placing the created offspring chromosomes in the new population
– Step 9: repeating step 6 to obtain more new chromosomes
– Step 10: go back to step 4, till the fixed number of generations
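The ten steps can be outlined as a generic loop over binary chromosomes. This is only a skeleton under several assumptions: the fitness function is supplied by the caller (in the lecture it would decode the chromosome into a network, train it and score it), selection is fitness-proportionate, crossover is single-point, and all parameter values and names are hypothetical.

```python
import random

def evolve(fitness, n_bits, pop_size=20, generations=50,
           p_cross=0.7, p_mut=0.005, seed=0):
    """Evolve binary chromosomes; fitness must return a non-negative number."""
    rng = random.Random(seed)                               # Steps 1-2: settings and fitness
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]                 # Step 3
    best = max(population, key=fitness)
    for _ in range(generations):                            # Step 10
        scores = [fitness(c) + 1e-9 for c in population]    # Steps 4-5: evaluate each one
        new_population = []
        while len(new_population) < pop_size:               # Step 9
            p1, p2 = rng.choices(population, weights=scores, k=2)  # Step 6
            c1, c2 = list(p1), list(p2)
            if rng.random() < p_cross:                      # Step 7: crossover
                cut = rng.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (c1, c2):
                for k in range(n_bits):                     # Step 7: bit-flip mutation
                    if rng.random() < p_mut:
                        child[k] = 1 - child[k]
                new_population.append(child)                # Step 8
        population = new_population[:pop_size]
        best = max(population + [best], key=fitness)        # bookkeeping only
    return best

# Toy stand-in fitness (number of connections + 1), NOT a trained network.
toy_fitness = lambda chromosome: sum(chromosome) + 1
best = evolve(toy_fitness, n_bits=36, pop_size=10, generations=5, seed=1)
```

The skeleton does not check that a decoded chromosome is a valid feed-forward network; a real fitness function would have to reject or repair chromosomes with feedback connections.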


The cycle of evolving a neural network topology

[Figure labels: Generation i; Neural Network j, Fitness = 117; Training Data Set; Parent 1, Parent 2; Crossover; Mutation; Child 1, Child 2; Generation (i + 1).]

Training Data Set:

0        0        1.0000
0.1000   0.0998   0.8869
0.2000   0.1987   0.7551
0.3000   0.2955   0.6142
0.4000   0.3894   0.4720
0.5000   0.4794   0.3345
0.6000   0.5646   0.2060
0.7000   0.6442   0.0892
0.8000   0.7174  -0.0143
0.9000   0.7833  -0.1038
1.0000   0.8415  -0.1794


Fuzzy evolutionary systems

Evolutionary computation is also used in the design of fuzzy systems, particularly for generating fuzzy rules and adjusting membership functions of fuzzy sets.

In this section, we introduce an application of genetic algorithms to select an appropriate set of fuzzy IF-THEN rules for a classification problem.

For a classification problem, a set of fuzzy IF-THEN rules is generated from numerical data.

First, we use a grid-type fuzzy partition of an input space.


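Fuzzy partition by a 3×3 fuzzy grid

[Figure: the input space X1 × X2, with both axes running from 0 to 1, partitioned by fuzzy sets A1, A2 and A3 on the x1 axis and B1, B2 and B3 on the x2 axis; membership functions μ(x1) and μ(x2) are drawn along the axes, and the training patterns of Class 1 and Class 2 are marked with dots.]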


Fuzzy partition

Black and white dots denote the training patterns of Class 1 and Class 2, respectively.

The grid-type fuzzy partition can be seen as a rule table.

The linguistic values of input x1 (A1, A2 and A3) form the horizontal axis, and the linguistic values of input x2 (B1, B2 and B3) form the vertical axis.

At the intersection of a row and a column lies the rule consequent.
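A grid-type partition can be sketched with a small membership function. The slides do not state the shape of the fuzzy sets; the code assumes K evenly spaced triangular sets on [0, 1], a common choice for this kind of grid, and the function names are hypothetical.

```python
def membership(x, i, K):
    """Degree to which x in [0, 1] belongs to the i-th of K triangular sets (i = 1..K)."""
    peak = (i - 1) / (K - 1)   # sets are centred at 0, 1/(K-1), ..., 1
    width = 1 / (K - 1)        # each triangle reaches zero at the neighbouring peaks
    return max(0.0, 1.0 - abs(x - peak) / width)

def subspace_compatibility(x1, x2, i, j, K):
    """Compatibility of the point (x1, x2) with fuzzy subspace A_i x B_j."""
    return membership(x1, i, K) * membership(x2, j, K)
```

With K = 3, the point x1 = 0.25 belongs to A1 and A2 with degree 0.5 each and to A3 with degree 0, so a pattern generally contributes to several neighbouring cells of the rule table.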


In the rule table, each fuzzy subspace can have only one fuzzy IF-THEN rule, and thus the total number of rules that can be generated in a K×K grid is equal to K×K.


Fuzzy rules that correspond to the K×K fuzzy partition can be represented in a general form as:

$$\text{Rule } R_{ij}:\quad \text{IF } x_{1p} \text{ is } A_i \text{ AND } x_{2p} \text{ is } B_j \text{ THEN } \mathbf{x}_p \in C_n \; \left\{ CF_{A_i B_j}^{C_n} \right\}$$

$$i = 1, 2, \ldots, K; \qquad j = 1, 2, \ldots, K; \qquad \mathbf{x}_p = (x_{1p}, x_{2p}), \quad p = 1, 2, \ldots, P$$

where $\mathbf{x}_p$ is a training pattern on input space $X_1 \times X_2$, $P$ is the total number of training patterns, $C_n$ is the rule consequent (either Class 1 or Class 2), and $CF_{A_i B_j}^{C_n}$ is the certainty factor that a pattern in fuzzy subspace $A_i B_j$ belongs to class $C_n$.


To determine the rule consequent and the certainty factor, we use the following procedure:

Step 1: Partition an input space into K×K fuzzy subspaces, and calculate the strength of each class of training patterns in every fuzzy subspace.

Each class in a given fuzzy subspace is represented by its training patterns. The more training patterns, the stronger the class. In other words, in a given fuzzy subspace, the rule consequent becomes more certain when patterns of one particular class appear more often than patterns of any other class.

Step 2: Determine the rule consequent and the certainty factor in each fuzzy subspace.
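The two steps can be illustrated with a short sketch. The slides give no formulas here, so the code makes explicit assumptions: triangular membership functions, class strength computed as the sum of membership products over the patterns of that class, and a certainty factor equal to the strongest class's margin over the average of the other classes, divided by the total strength. All names are hypothetical.

```python
def membership(x, i, K):
    """Assumed triangular membership of x in the i-th of K sets on [0, 1]."""
    return max(0.0, 1.0 - abs(x - (i - 1) / (K - 1)) * (K - 1))

def generate_rule(patterns, labels, i, j, K, classes=(1, 2)):
    """Return (consequent, certainty_factor) for subspace A_i x B_j, or None."""
    # Step 1: strength of each class in this fuzzy subspace.
    strength = {c: 0.0 for c in classes}
    for (x1, x2), label in zip(patterns, labels):
        strength[label] += membership(x1, i, K) * membership(x2, j, K)
    total = sum(strength.values())
    ranked = sorted(classes, key=lambda c: strength[c], reverse=True)
    # No training patterns in the subspace, or a tie: no rule can be generated.
    if total == 0.0 or strength[ranked[0]] == strength[ranked[1]]:
        return None
    # Step 2: the strongest class is the consequent; CF measures its margin.
    best = ranked[0]
    others = [strength[c] for c in classes if c != best]
    cf = (strength[best] - sum(others) / len(others)) / total
    return best, cf
```

Under these assumptions the certainty factor is 1 when every pattern in the subspace belongs to one class and approaches 0 as the class strengths become similar.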


The certainty factor can be interpreted as follows:

If all the training patterns in fuzzy subspace $A_i B_j$ belong to the same class, then the certainty factor is maximum and it is certain that any new pattern in this subspace will belong to this class.

If, however, training patterns belong to different classes and these classes have similar strengths, then the certainty factor is minimum and it is uncertain that a new pattern will belong to any particular class.


This means that patterns in a fuzzy subspace can be misclassified. Moreover, if a fuzzy subspace does not have any training patterns, we cannot determine the rule consequent at all.

If a fuzzy partition is too coarse, many patterns may be misclassified. On the other hand, if a fuzzy partition is too fine, many fuzzy rules cannot be obtained, because of the lack of training patterns in the corresponding fuzzy subspaces.


Training patterns are not necessarily distributed evenly in the input space. As a result, it is often difficult to choose an appropriate density for the fuzzy grid. To overcome this difficulty, we use multiple fuzzy rule tables.


Multiple fuzzy rule tables

[Figure: five fuzzy rule tables, with K = 2, K = 3, K = 4, K = 5 and K = 6.]

Fuzzy IF-THEN rules are generated for each fuzzy subspace of multiple fuzzy rule tables, and thus a complete set of rules for our case can be specified as:

2 × 2 + 3 × 3 + 4 × 4 + 5 × 5 + 6 × 6 = 90 rules.
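The count of 90 can be checked by enumerating every cell of every table. The tuple layout `(K, i, j)` is a hypothetical way of indexing the rules, chosen only for illustration.

```python
# One candidate rule per fuzzy subspace (i, j) of each K x K table, K = 2..6.
S_ALL = [(K, i, j)
         for K in range(2, 7)
         for i in range(1, K + 1)
         for j in range(1, K + 1)]

rules_per_table = {K: K * K for K in range(2, 7)}
total_rules = sum(rules_per_table.values())
```

The tables contribute 4, 9, 16, 25 and 36 rules, so the finest table alone accounts for 40% of the complete set.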


Once the set of rules $S_{ALL}$ is generated, a new pattern, x = (x1, x2), can be classified by the following procedure:

Step 1: In every fuzzy subspace of the multiple fuzzy rule tables, calculate the degree of compatibility of a new pattern with each class.

Step 2: Determine the maximum degree of compatibility of the new pattern with each class.

Step 3: Determine the class with which the new pattern has the highest degree of compatibility, and assign the pattern to this class.
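The three classification steps might be sketched as follows. The degree of compatibility is not defined on the slides; the code assumes it is the product of the two membership degrees and the rule's certainty factor, with triangular membership functions. Rules are hypothetical tuples `(K, i, j, class, certainty_factor)`.

```python
def membership(x, i, K):
    """Assumed triangular membership of x in the i-th of K sets on [0, 1]."""
    return max(0.0, 1.0 - abs(x - (i - 1) / (K - 1)) * (K - 1))

def classify(x, rules, classes=(1, 2)):
    """Assign pattern x = (x1, x2) to a class, or return None if undecidable."""
    x1, x2 = x
    best = {c: 0.0 for c in classes}
    for K, i, j, label, cf in rules:
        # Step 1: compatibility of x with the class of this rule.
        degree = membership(x1, i, K) * membership(x2, j, K) * cf
        # Step 2: keep the maximum degree per class.
        best[label] = max(best[label], degree)
    # Step 3: pick the class with the highest degree.
    winner = max(classes, key=lambda c: best[c])
    if best[winner] == 0.0 or list(best.values()).count(best[winner]) > 1:
        return None  # no compatible rule, or a tie between classes
    return winner

rules = [(2, 1, 1, 1, 1.0), (2, 2, 2, 2, 1.0)]
```

With these two rules, a pattern near the origin is assigned to Class 1, one near (1, 1) to Class 2, and the exact centre (0.5, 0.5) cannot be decided.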


The number of multiple fuzzy rule tables required for an accurate pattern classification may be large. Consequently, a complete set of rules can be enormous. Meanwhile, these rules have different classification abilities, and thus by selecting only rules with high potential for accurate classification, we reduce the number of rules.


Can we use genetic algorithms for selecting fuzzy IF-THEN rules?

The problem of selecting fuzzy IF-THEN rules can be seen as a combinatorial optimisation problem with two objectives.

The first, more important, objective is to maximise the number of correctly classified patterns.

The second objective is to minimise the number of rules.

Genetic algorithms can be applied to this problem.


A basic genetic algorithm for selecting fuzzy IF-THEN rules includes the following steps:

Step 1: Randomly generate an initial population of chromosomes. The population size may be relatively small, say 10 or 20 chromosomes. Each gene in a chromosome corresponds to a particular fuzzy IF-THEN rule in the rule set defined by $S_{ALL}$.

Step 2: Calculate the performance, or fitness, of each individual chromosome in the current population.


The problem of selecting fuzzy rules has two objectives: to maximise the accuracy of the pattern classification and to minimise the size of a rule set. The fitness function has to accommodate both these objectives. This can be achieved by introducing two respective weights, $w_P$ and $w_N$, in the fitness function:

$$f(S) = w_P \frac{P_s}{P_{ALL}} - w_N \frac{N_S}{N_{ALL}}$$

where $P_s$ is the number of patterns classified successfully, $P_{ALL}$ is the total number of patterns presented to the classification system, and $N_S$ and $N_{ALL}$ are the numbers of fuzzy IF-THEN rules in set $S$ and set $S_{ALL}$, respectively.


The classification accuracy is more important than the size of a rule set. That is,

$$f(S) = 10 \frac{P_s}{P_{ALL}} - \frac{N_S}{N_{ALL}}$$
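The fitness function translates directly into code. The function and parameter names below are hypothetical; the default weights 10 and 1 are the ones used in the formula above.

```python
def rule_set_fitness(n_correct, n_patterns, n_selected, n_rules,
                     w_p=10.0, w_n=1.0):
    """f(S) = w_P * P_s / P_ALL - w_N * N_S / N_ALL."""
    return w_p * n_correct / n_patterns - w_n * n_selected / n_rules

# Two illustrative rule sets drawn from 90 candidate rules (made-up numbers):
accurate = rule_set_fitness(n_correct=95, n_patterns=100, n_selected=5, n_rules=90)
compact = rule_set_fitness(n_correct=90, n_patterns=100, n_selected=1, n_rules=90)
```

Here the more accurate set scores higher even though it uses five times as many rules, which is the intended effect of weighting accuracy ten times more heavily than size.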


Step 3: Select a pair of chromosomes for mating. Parent chromosomes are selected with a probability associated with their fitness; a better fit chromosome has a higher probability of being selected.

Step 4: Create a pair of offspring chromosomes by applying a standard crossover operator. Parent chromosomes are crossed at the randomly selected crossover point.

Step 5: Perform mutation on each gene of the created offspring. The mutation probability is normally kept quite low, say 0.01. The mutation is done by multiplying the gene value by -1.


Step 6: Place the created offspring chromosomes in the new population.

Step 7: Repeat Step 3 until the size of the new population becomes equal to the size of the initial population, and then replace the initial (parent) population with the new (offspring) population.

Step 8: Go to Step 2, and repeat the process until a specified number of generations (typically several hundred) has been considered.

The number of rules can be cut down to less than 2% of the initially generated set of rules.
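The steps of the rule-selection algorithm can be put together in a compact sketch. Several details are assumptions rather than the lecture's method: a gene value of 1 means the rule is selected and -1 means it is not, the caller supplies a `count_correct` function that classifies the training patterns with the selected rules, fitness values are shifted to be positive before roulette-wheel selection, and the best chromosome seen so far is remembered for convenience. All names are hypothetical.

```python
import random

def select_rules(count_correct, n_rules, n_patterns,
                 pop_size=10, generations=200, p_mut=0.01, seed=0):
    """Return the indices of the rules kept by the fittest chromosome found."""
    rng = random.Random(seed)

    def fitness(chrom):
        selected = [k for k, g in enumerate(chrom) if g == 1]
        return 10.0 * count_correct(selected) / n_patterns - len(selected) / n_rules

    # Step 1: random initial population, one gene per rule in S_ALL.
    population = [[rng.choice((1, -1)) for _ in range(n_rules)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):                       # final step: repeat from Step 2
        scores = [fitness(c) for c in population]      # Step 2
        low = min(scores)
        weights = [s - low + 1e-6 for s in scores]     # shift so all weights are positive
        offspring = []
        while len(offspring) < pop_size:               # Step 7
            p1, p2 = rng.choices(population, weights=weights, k=2)  # Step 3
            cut = rng.randrange(1, n_rules)            # Step 4: single-point crossover
            for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                # Step 5: flip a gene by multiplying it by -1.
                child = [-g if rng.random() < p_mut else g for g in child]
                offspring.append(child)                # Step 6
        population = offspring[:pop_size]
        best = max(population + [best], key=fitness)   # bookkeeping, not in the steps
    return [k for k, g in enumerate(best) if g == 1]

# Toy stand-in for the classifier: three useful rules with made-up coverage.
coverage = {3: 40, 17: 30, 42: 20}
count = lambda selected: sum(coverage.get(k, 0) for k in selected)
chosen = select_rules(count, n_rules=90, n_patterns=90, generations=20, seed=1)
```

In a real application `count_correct` would run the fuzzy classification procedure on the training set using only the selected rules; the toy coverage table merely makes the example runnable.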
