
Statistical Model of Evolutionary Algorithm for Feed-Forward ANN Architecture Optimization

G.V.R. Sagar ([email protected]), Assoc. Professor, G.P.R. Engg. College, Kurnool, AP 518007, India
Dr. S. Venkata Chalam ([email protected]), Professor, CVR Engg. College, Hyderabad, AP, India

Journal: Journal of Experimental & Theoretical Artificial Intelligence
Manuscript ID: Draft
Manuscript Type: Original Article
Keywords: Artificial neural network, crossover, schema theory, topology mutation


ABSTRACT: The design of a near-optimal feed-forward Artificial Neural Network (ANN) architecture is approached through evolution. There is no systematic procedure to design a near-optimal architecture for a given application or task; pattern classification methods and constructive and destructive algorithms can be used for architecture design. The proposed work develops a statistical model of an Evolutionary Algorithm (EA) to optimize the architecture. Single-point crossover is applied with selective schemas on the network space, and evolution is introduced at the mutation stage, so that optimized ANNs are obtained.

Keywords: Artificial neural network, topology mutation, schema theory, crossover.

1 INTRODUCTION: Genetic algorithms were developed by John Holland [1], [2], [3], [4]. With a growing number of day-to-day applications, combined with hardware enhancements, a variety of EAs are becoming more and more popular. A family of subsets of the search space and an appropriate process of re-encoding are two notions analogous to familiar facts relating continuous maps to families of open sets or measurable functions. In order to apply an EA to a typical optimization problem, we need to model the problem in a suitable manner, i.e. to construct a search space Ω together with a positive-valued fitness function f and a family of mating and mutation transforms. An EA can therefore be represented by an ordered 4-tuple (Ω, M, F, f), where F is the family of mating transforms and M is the family of unary (mutation) transformations on Ω. The total search space is divided into invariant subsets [3] and a crossover operation is performed on Ω. The family of mutations M on Ω is ergodic, i.e. it ensures that the Markov process [5] modeling the algorithm is irreducible. Schemata correspond to invariant subsets of the search space, and the schema theorem can be reformulated in this general framework. The invariant subsets of the search space arise from an encoding process relating continuous maps to families of open sets, measurable functions and sigma-algebras. The classical Geiringer theorem is extended to represent a class of evolutionary computation techniques with crossover and mutation.

2.0 Representation of Evolutionary Algorithm:

Building on the mathematical foundation for representing evolutionary algorithms given in Section 1, we exploit the language of category theory [6]. To apply an evolutionary algorithm to a specific optimization problem, the problem must be modelled in an appropriate manner. This requires constructing a search space Ω containing all possible solutions to the problem, a computable positive-valued fitness function $f:\Omega\to(0,\infty)$, and a suitable family of 'mating' or 'crossover' and mutation transforms.
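To make this representation concrete, the following minimal Python sketch (all names are hypothetical illustrations, not the paper's implementation) bundles a search space, a family of mating transforms, a family of mutation transforms and a positive-valued fitness function into one structure, mirroring the tuple (Ω, F, M, f) described above:

    from dataclasses import dataclass
    from itertools import product
    from typing import Any, Callable, List

    @dataclass
    class HeuristicTuple:
        """Sketch of the tuple (Omega, F, M, f) used to model an EA."""
        omega: List[Any]                         # finite search space Omega
        mating: List[Callable[[Any, Any], Any]]  # family F of binary mating transforms
        mutation: List[Callable[[Any], Any]]     # family M of unary mutation transforms
        fitness: Callable[[Any], float]          # positive-valued fitness f

    # Example instance: 4-bit strings with one-point crossover and bit-flip mutation.
    def one_point(x, y, cut=2):
        return x[:cut] + y[cut:]

    def flip(x, pos=0):
        return x[:pos] + (1 - x[pos],) + x[pos + 1:]

    omega = list(product([0, 1], repeat=4))
    ea = HeuristicTuple(omega=omega,
                        mating=[one_point],
                        mutation=[flip],
                        fitness=lambda x: 1.0 + sum(x))  # strictly positive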

The category of heuristic 3-tuples: All the families F are invariant subsets [3] of Ω; we characterize all such families in set-theoretic and sigma-algebra terms. Let Γ denote a nonempty family of transforms from $\Omega^m$ to Ω for a fixed $m \ge 1$. The family of invariant subsets of Ω under the family Γ is denoted by $\Lambda_\Gamma$:

$$\Lambda_\Gamma = \{\, S \mid S \subseteq \Omega,\ T(S^m) \subseteq S\ \ \forall\, T \in \Gamma \,\} \qquad (3.18)$$

It follows that for every element $x \in \Omega$ there is a unique smallest element of $\Lambda_\Gamma$ containing x.

A heuristic 3-tuple is a triple $\Omega = (\Omega, F, M)$ such that $\Lambda_M = \{\emptyset, \Omega\}$. For $x \in \Omega$ and a heuristic 3-tuple $\Omega = (\Omega, F, M)$, we denote by $S_x^\Omega$ the smallest element of the family of invariant subsets $\Lambda_F$ that contains x.
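As a concrete (and purely illustrative) reading of these definitions, the sketch below computes the smallest subset of Ω containing a given x that is closed under a family Γ of binary transforms, i.e. the element $S_x^\Omega$ of $\Lambda_\Gamma$ containing x; the function names are hypothetical.

    from itertools import product

    def smallest_invariant_subset(x, omega, transforms):
        """Closure of {x} in omega under a family of binary transforms (m = 2)."""
        omega = set(omega)
        s = {x}
        changed = True
        while changed:
            changed = False
            for t in transforms:
                for a, b in product(s, repeat=2):   # apply T to S^m with m = 2
                    child = t(a, b)
                    if child in omega and child not in s:
                        s.add(child)
                        changed = True
        return s

    # Example: one-point crossover on 3-bit strings. Crossing a point with itself
    # reproduces that point, so the closure of a single individual is itself.
    def cross(a, b):
        return a[:1] + b[1:]

    omega = list(product([0, 1], repeat=3))
    print(smallest_invariant_subset((0, 1, 1), omega, [cross]))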

In a similar manner, given two heuristic 3-tuples $\Omega_1 = (\Omega_1, F_1, M_1)$ and $\Omega_2 = (\Omega_2, F_2, M_2)$, we define a function $\delta: \Omega_1 \to \Omega_2$ representing the reproduction transformation, called a morphism, as follows. For all $T \in F_1$ and for all $(x, y) \in \Omega_1^2$ there exists $F_{x,y} \in F_2$ such that

$$\delta(T(x, y)) = F_{x,y}(\delta(x), \delta(y)) \qquad (3.2)$$

Similarly, for all $M \in M_1$ and all $x \in \Omega_1$ there exists $H_x \in M_2$ such that $\delta(M(x)) = H_x(\delta(x))$. The collection of all morphisms from $\Omega_1$ into $\Omega_2$ is denoted by M.

A Generalization of Geiringer's Theorem for EAs

The family of recombination operators (see also [7]) of a given evolutionary algorithm changes the frequency with which various elements of the search space are sampled [1], [8]. To illustrate this point, let $\Omega = \prod_{i=1}^{n} A_i$ denote the search space of a given evolutionary algorithm, first discussed in [9].

Fix a population P consisting of m individuals, with m an even number. P can be thought of as an m by n matrix whose rows are the individuals of the population:

$$P = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \qquad (1.0)$$

The elements of the i-th column of P are members of $A_i$. The classical Geiringer theorem [10] tells us the limiting frequency with which certain elements of the search space are sampled in the long run, provided one uses the crossover operator [19] alone. Let $\Phi(h, P, i)$, with $h \in A_i$, denote the proportion of rows j of P for which $a_{ji} = h$. If one starts with a population P of individuals and runs the evolutionary algorithm in the absence of selection and mutation (crossover being the only operator involved), then, in the long run, the frequency of occurrence of the individual $(h_1, h_2, \ldots, h_n)$ before time t, denoted $\Phi(h_1, h_2, \ldots, h_n, t)$, satisfies

$$\lim_{t \to \infty} \Phi(h_1, h_2, \ldots, h_n, t) = \prod_{i=1}^{n} \Phi(h_i, P, i) \qquad (1.1)$$
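The following sketch (hypothetical, for illustration only) evaluates the right-hand side of (1.1) for a small population matrix: the limiting frequency of an individual $(h_1, \ldots, h_n)$ is the product of the column-wise proportions $\Phi(h_i, P, i)$.

    def phi(h, P, i):
        """Proportion of rows j of the population matrix P with P[j][i] == h."""
        return sum(1 for row in P if row[i] == h) / len(P)

    def geiringer_limit(h, P):
        """Limiting frequency of individual h = (h_1, ..., h_n) under crossover alone, eq. (1.1)."""
        freq = 1.0
        for i, allele in enumerate(h):
            freq *= phi(allele, P, i)
        return freq

    # Example: a population of m = 4 binary individuals of length n = 3.
    P = [(0, 1, 1),
         (1, 1, 0),
         (0, 0, 1),
         (1, 1, 1)]
    print(geiringer_limit((1, 1, 1), P))   # 0.5 * 0.75 * 0.75 = 0.28125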

The limiting distributions of the frequency of occurrence of individuals belonging to a certain schema under these algorithms have also been computed in [11], [12], [13]. The classical Geiringer theorem and the proposed or modified Geiringer algorithms are established from basic facts about Markov chains [5] and random walks on groups. This is mainly a matter of formulating the statement of the theorem in a slightly different manner. This new point of view not only covers the existing versions of Geiringer's theorem applied to EAs, but also extends the process to further evolutionary algorithms. Below we give a more formal description of an EA than the one given in Section 1.

Framework:

A population P of size m is simply an element of $\Omega^m$ (written as a column vector). An elementary step is a probabilistic rule which takes one population as input and produces another population of the same size as output. We shall consider the following types of elementary steps.

Selection: Consider a given population P as input,

$$P = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix}, \qquad x_i \in \Omega \qquad (1.2)$$

The individuals of P are evaluated:

$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix} \to \begin{pmatrix} f(x_1) \\ f(x_2) \\ \vdots \\ f(x_m) \end{pmatrix} \qquad (1.3)$$

A new population

$$P_1 = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} \qquad (1.4)$$

is obtained, where the $y_i$'s are chosen independently m times from the individuals of P, and $y_i = x_j$ with probability

$$p = \frac{f(x_j)}{\sum_{l=1}^{m} f(x_l)}$$

This means that the individuals of $P_1$ are among those of P, and the expected number of occurrences of any individual of P in $P_1$ is proportional to the number of occurrences of that individual in P times the individual's fitness value. In particular, the fitter an individual is, the more copies of that individual are likely to be present in $P_1$. On the other hand, individuals having relatively small fitness values are not likely to enter $P_1$ at all. This imitates the natural survival-of-the-fittest principle.
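A minimal sketch of this selection step (assuming Python, a strictly positive fitness function, and illustrative names) draws each $y_i$ independently from P with probability proportional to fitness:

    import random

    def proportional_selection(population, fitness, rng=random):
        """Return a new population P1 of the same size; y_i = x_j with prob f(x_j)/sum_l f(x_l)."""
        weights = [fitness(x) for x in population]   # strictly positive fitness values
        return rng.choices(population, weights=weights, k=len(population))

    # Example: fitter individuals (more ones) are sampled more often.
    pop = [(0, 0, 1), (1, 1, 0), (1, 1, 1), (0, 0, 0)]
    new_pop = proportional_selection(pop, fitness=lambda x: 1.0 + sum(x))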

Crossover: The population $P_1$ is the output of the selection process. Now consider the search space Ω and fix an ordered k-tuple of integers $q = (q_1, q_2, \ldots, q_k)$ with $q_1 \le q_2 \le \ldots \le q_k$. A partition $K = \{p_1, p_2, \ldots, p_k\}$ of the set $\{1, 2, \ldots, m\}$, $m \in \mathbb{N}$, is called q-fit if $|p_i| = q_i$; the family of all q-fit partitions of $\{1, 2, \ldots, m\}$ is denoted by $\Sigma_q^m$. Let $F_{q_1}, F_{q_2}, \ldots, F_{q_k}$ be fixed families of $q_i$-ary operations on Ω, and let $p_1, p_2, \ldots, p_k$ be probability distributions on $(F_{q_1})^{q_1}, (F_{q_2})^{q_2}, \ldots, (F_{q_k})^{q_k}$ respectively. Let $p_m$ be a probability distribution on the collection $\Sigma_q^m$ of q-fit partitions of $\{1, 2, \ldots, m\}$. This gives a $2(k+1)$-tuple $(\Omega, F_{q_1}, F_{q_2}, \ldots, F_{q_k}, p_1, p_2, \ldots, p_k, p_m)$, called the reproduction k-tuple. According to this reproduction k-tuple, the individuals of $P_1$ are partitioned into pairwise disjoint tuples for mating according to $p_m$: if the chosen partition is

$$K = \left\{ \left(i_1^1, i_2^1, \ldots, i_{q_1}^1\right), \left(i_1^2, i_2^2, \ldots, i_{q_2}^2\right), \ldots, \left(i_1^j, i_2^j, \ldots, i_{q_j}^j\right), \ldots \right\}$$

then the corresponding tuples are given by

$$Q_1 = \begin{pmatrix} x_{i_1^1} \\ x_{i_2^1} \\ \vdots \\ x_{i_{q_1}^1} \end{pmatrix}, \quad Q_2 = \begin{pmatrix} x_{i_1^2} \\ x_{i_2^2} \\ \vdots \\ x_{i_{q_2}^2} \end{pmatrix}, \ \ldots, \ Q_j = \begin{pmatrix} x_{i_1^j} \\ x_{i_2^j} \\ \vdots \\ x_{i_{q_j}^j} \end{pmatrix}, \ \ldots \qquad (1.5)$$

Having selected the partition, replace every one of the selected $q_j$-tuples

$$\begin{pmatrix} x_{i_1^j} \\ x_{i_2^j} \\ \vdots \\ x_{i_{q_j}^j} \end{pmatrix} \qquad (1.6)$$

with the $q_j$-tuple

$$\begin{pmatrix} T_1\!\left(x_{i_1^j}, x_{i_2^j}, \ldots, x_{i_{q_j}^j}\right) \\ T_2\!\left(x_{i_1^j}, x_{i_2^j}, \ldots, x_{i_{q_j}^j}\right) \\ \vdots \\ T_{q_j}\!\left(x_{i_1^j}, x_{i_2^j}, \ldots, x_{i_{q_j}^j}\right) \end{pmatrix} \qquad (1.7)$$

for a $q_j$-tuple of transformations $(T_1, T_2, \ldots, T_{q_j}) \in (F_{q_j})^{q_j}$ selected randomly according to the probability distribution $p_j$ on $(F_{q_j})^{q_j}$. This gives a new population

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} \qquad (1.8)$$

Notice that a single child does not have to be produced by exactly two parents; it is possible for a child to have more than two parents. Asexual reproduction (mutation) is also allowed.
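The sketch below (an illustrative reading of this step, not the paper's code) instantiates the simplest reproduction tuple: individuals are paired by a random 2-fit partition and each pair is replaced by the outputs of a randomly chosen one-point crossover transform.

    import random

    def one_point_pair(x, y, cut):
        """Two children of a one-point crossover at position `cut`."""
        return x[:cut] + y[cut:], y[:cut] + x[cut:]

    def crossover_step(population, rng=random):
        """Elementary recombination step: random pairwise partition, then mating transforms."""
        assert len(population) % 2 == 0, "population size m must be even"
        order = list(range(len(population)))
        rng.shuffle(order)                              # random q-fit partition with q = (2, ..., 2)
        new_pop = [None] * len(population)
        for a, b in zip(order[::2], order[1::2]):
            cut = rng.randrange(1, len(population[a]))  # transform chosen at random from F_2
            new_pop[a], new_pop[b] = one_point_pair(population[a], population[b], cut)
        return new_pop

    pop = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 1), (1, 0, 1, 0)]
    print(crossover_step(pop))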

A general evolutionary search algorithm works as follows. Fix a cycle, say $C = \{S_n\}_{n=1}^{j}$, where the $S_n$ form a finite sequence of elementary steps. Start the algorithm with an initial population P, which may be selected randomly. To run the algorithm with cycle C, simply input P into $S_1$, run $S_1$, input the output of $S_1$ into $S_2$, and so on, feeding the output of $S_{j-1}$ into $S_j$ to produce a new output, say $P_1$. Now take $P_1$ as the initial population and run the cycle C again. Continue this loop finitely many times, depending on the circumstances. A recombination sub-algorithm is defined by a sequence of elementary steps of reproduction only.
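A short sketch of this overall loop (hypothetical helper names; the elementary steps are any callables of the kind sketched above) composes the cycle $C = S_j \circ \ldots \circ S_1$ and repeats it for a fixed number of generations:

    def run_ea(initial_population, cycle, generations):
        """Run the cycle C = (S_1, ..., S_j) of elementary steps `generations` times."""
        population = initial_population
        for _ in range(generations):
            for step in cycle:          # feed the output of S_{n-1} into S_n
                population = step(population)
        return population

    # Example wiring, reusing the selection and crossover sketches above:
    # final = run_ea(pop,
    #                cycle=[lambda p: proportional_selection(p, lambda x: 1.0 + sum(x)),
    #                       crossover_step],
    #                generations=100)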

Modified Evolutionary Algorithm Model:

The general structure of the EA is proposed in [14]. The evolutionary algorithm uses the following operators:

a. Initialization
b. Recombination or crossover
c. Mutation
d. Selection

The framework of the EA approach requires a floating architecture and a fixed population size. The population size, the maximum size and structure of the network, and the genetic parameters are user specified. The weight population is initialized with a user-defined number of hidden nodes for each individual in order to create a new population, and the weights are generated randomly, one weight set for each member of the population.
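A possible initialization routine consistent with this description (the Gaussian weight draws and the genotype layout, a list of layer sizes plus per-layer weight matrices, are assumptions made for illustration) could look as follows:

    import random

    def init_population(pop_size, n_inputs, hidden_sizes, n_outputs, rng=random):
        """Create `pop_size` individuals with user-defined hidden nodes and random weights."""
        population = []
        for _ in range(pop_size):
            sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
            weights = [[[rng.gauss(0.0, 1.0) for _ in range(sizes[i])]
                        for _ in range(sizes[i + 1])]
                       for i in range(len(sizes) - 1)]   # one weight matrix per layer pair
            population.append({"sizes": sizes, "weights": weights})
        return population

    pop = init_population(pop_size=20, n_inputs=2, hidden_sizes=[3, 2], n_outputs=1)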


ANN Recombination or Crossover:

In the proposed method, following the above discussion, consider a search space Ω and a family of transformations $F_q$ from $\Omega^q$ into Ω, and fix an ordered q-tuple of transforms $(T_1, T_2, \ldots, T_q) \in F_q^q$. Now consider the transformation $(T_1, T_2, \ldots, T_q): \Omega^q \to \Omega^q$ sending any given element

$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_q \end{pmatrix} \in \Omega^q \quad \text{into} \quad \begin{pmatrix} T_1(x_1, x_2, \ldots, x_q) \\ T_2(x_1, x_2, \ldots, x_q) \\ \vdots \\ T_q(x_1, x_2, \ldots, x_q) \end{pmatrix} \in \Omega^q \qquad (1.9)$$

Let $C = \{S_n\}_{n=1}^{j}$ be the subsequence in which each elementary step $S_n$ is a recombination. The recombination sub-algorithm of the proposed EA uses the reproduction k-tuple $(\Omega, F_{q_1}, F_{q_2}, \ldots, F_{q_k}, p_1, p_2, \ldots, p_k, p_m)$; this heuristic search algorithm results in a Markov process whose state space is the set of populations P of fixed size m, i.e. $\Omega^m$ ($P \in \Omega^m$). The transition probability $p_{xy}$ is simply the probability that the population $y \in \Omega^m$ is obtained from the population x by going through the recombination cycle once. These transition probabilities have been computed, but the Markov chain obtained is difficult to analyze.

Fix an EA A, and let $p_{xy}^n > 0$ denote the probability that a population y is obtained from the population x upon completion of n complete cycles of recombination. We write $X \to_A Y$ for "X leads to Y", and for a population $P \in \Omega^m$ we write $[P]_A$ for the equivalence class of the population P under the equivalence relation $\to_A$. Therefore "the Markov chain initiated at some population $P \in \Omega^m$ is irreducible and its unique stationary distribution is the uniform distribution on $[P]_A$".

Now fix a partition $K = (P_1, P_2, \ldots, P_k) \in \Sigma_q^m$, where $q = (q_1, q_2, \ldots, q_k)$, and fix a particular choice of tuples of transformations

$$T_i = \left(T_1^i, T_2^i, \ldots, T_{q_i}^i\right) \in (F_{q_i})^{q_i} \qquad (2.0)$$

such that $p_i\!\left(T_1^i, T_2^i, \ldots, T_{q_i}^i\right) > 0$. First notice that we can identify $\Omega^m$ with the set $\Omega^{q_1} \times \Omega^{q_2} \times \ldots \times \Omega^{q_k}$ via the partition $K = (P_1, P_2, \ldots, P_k)$ as follows: given $x = (x_1, x_2, \ldots, x_m) \in \Omega^m$, identify x with the element $u^x = (u_1^x, u_2^x, \ldots, u_k^x)$, where $u_i^x = (x_{a_1}, x_{a_2}, \ldots, x_{a_{q_i}})$ with $a_1, a_2, \ldots, a_{q_i} \in P_i$ and $a_1 < a_2 < \ldots < a_{q_i}$. Now define the transformation $T_{K, T_1, \ldots, T_k}: \Omega^m \to \Omega^m$ as follows: the output of the elementary recombination step $S_n$ is $T_{K, T_1, \ldots, T_k}(X) = Y$, where $Y \in \Omega^m$ corresponds to $u^y = \left(T_1(u_1^x), T_2(u_2^x), \ldots, T_k(u_k^x)\right)$. The transform $T_{K, T_1, \ldots, T_k}$ is a bijection. Indeed, the two-sided inverse of $T_{K, T_1, \ldots, T_k}$ is the transformation $\left(T_{K, T_1, \ldots, T_k}\right)^{-1}$, which sends a given $x \in \Omega^m$ into the $y \in \Omega^m$ corresponding to the element

$$\left(T_1^{-1}(u_1^x), T_2^{-1}(u_2^x), \ldots, T_k^{-1}(u_k^x)\right) \in \Omega^{q_1} \times \Omega^{q_2} \times \ldots \times \Omega^{q_k} \qquad (2.1)$$


The set of all such transformations for the elementary recombination step $S_n$ is denoted by

$$H_n = \left\{ T_{K, T_1, \ldots, T_k} \ \middle|\ K \text{ is a partition in } \Sigma_q^m \text{ and } T_1, \ldots, T_k \text{ are chosen for recombination} \right\} \qquad (2.2)$$

Now consider the set of transformations H from $\Omega^m$ into itself defined as follows:

$$H = \left\{ T: \Omega^m \to \Omega^m \ \middle|\ T = F_j \circ F_{j-1} \circ \ldots \circ F_1,\ F_n \in H_n \right\} \qquad (2.3)$$

Therefore any transformation $T \in H$ is a composition of bijections and hence is itself a bijection, so that $H \subseteq S_{\Omega^m}$, where $S_{\Omega^m}$ is the group of permutations of $\Omega^m$. Let G denote the subgroup of $S_{\Omega^m}$ generated by H. Now, when an EA A runs a cycle on the input X, this amounts to selecting transformations from H independently and applying them consecutively, so that the output of the cycle C on the input X is T(X) for some $T \in H$ chosen with some positive probability.

We now proceed to define the random walk associated to a group action. Let X be a finite set and let G be a finite group generated by H ($H \subseteq G$), and let e denote the identity of the group G ($e \in H$). Let μ be a probability distribution on G which is concentrated on H, i.e. $\mu(g) > 0 \Leftrightarrow g \in H$, and let $p_{xy}^n$ denote the probability that a state $y \in X$ is reached from the state x in exactly n steps. The random walk of the action of the group G on the set X is the Markov process with transition probabilities

$$\forall\, x, y \in X, \qquad p_{xy} = \sum_{g\,:\, g \cdot x = y} \mu(g) \qquad (2.4)$$

Since H generates G, for every $g \in G$ there exists $n_g$ large enough that $p^{n_g}_{x,\, g \cdot x} > 0$, and we can write $g = m_1^g m_2^g \ldots m_{n_g}^g$ with $m_i^g \in H$. Now let $n = \max\{ n_g \mid g \in G \}$. Padding the word for g with the identity $e \in H$, we have $g \cdot x = (m_1^g m_2^g \ldots m_{n_g}^g\, e \ldots e) \cdot x$, so that, by the definition of the group action and equation (2.4), the n-step transition probability of the Markov chain satisfies

$$p^{n}_{x,\, g \cdot x} \;\ge\; \mu(e)^{\,n - n_g} \prod_{i=1}^{n_g} \mu(m_i^g) \;>\; 0 \qquad (2.5)$$

By (2.5) this is an irreducible Markov chain with a finite state space, and it has a unique stationary distribution, denoted by Π. Taking the uniform initial distribution on X,

$$\Pi(x) = \frac{1}{|X|} \qquad (2.6)$$

the distribution in the next generation, say ρ, is given by

$$\rho(x) = \sum_{m \in H} \Pi(m^{-1} \cdot x)\, \mu(m) \qquad (2.7)$$

$$= \sum_{m \in H} \frac{1}{|X|}\, \mu(m) \qquad (2.8)$$

$$= \frac{1}{|X|} \sum_{m \in H} \mu(m) = \frac{1}{|X|} = \Pi(x) \qquad (2.9)$$

since $\sum_{m \in H} \mu(m) = 1$ and μ is concentrated on H. The Markov chain modeling an EA A is therefore a random walk associated to the action of the finite group G on X; in the long run it samples the populations it can reach according to the uniform distribution Π.
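The claim that such a random walk settles on the uniform distribution over the reachable states can be checked empirically on a toy example (a hypothetical demonstration, unrelated to any specific dataset): take a small state set, a generating set H of permutations of it, and count the visit frequencies of the induced walk.

    import random
    from collections import Counter

    def random_walk(state, generators, mu_weights, steps, rng=random):
        """Random walk x -> g(x) with g drawn from H according to mu; returns visit counts."""
        counts = Counter()
        for _ in range(steps):
            g = rng.choices(generators, weights=mu_weights, k=1)[0]
            state = g(state)
            counts[state] += 1
        return counts

    # H = {identity, a 3-cycle on {0, 1, 2}}; the generated group acts transitively,
    # so the stationary distribution should be (close to) uniform on the orbit.
    identity = lambda x: x
    cycle = lambda x: (x + 1) % 3
    counts = random_walk(0, [identity, cycle], [0.5, 0.5], steps=30000)
    print({s: c / 30000 for s, c in sorted(counts.items())})   # roughly 1/3 each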

In the proposed evolutionary algorithm described above, crossover is used to improve the relationship between parents and offspring. Single-point crossover (Section xxx) uses different cutting points for each of the two parents in the population. The cutting points are extracted independently for each parent because the genotype lengths of individuals are variable. The cutting point is taken only between one layer and the next (for two hidden layers, between the second layers of the two network parents); this means that a new evolutionary weight matrix is created to make the connection between the two layers at the cutting points in the parents, producing two offspring, so that the population size is kept constant. In each offspring, node or layer creation and deletion is possible based on the predefined genetic parameters.
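A sketch of this layer-boundary crossover is given below; it assumes the genotype layout of the earlier initialization sketch (a list of layer sizes plus per-layer weight matrices), at least one hidden layer per parent, and a randomly generated bridging weight matrix at the cut, so it is an illustration rather than the paper's exact operator.

    import random

    def random_matrix(rows, cols, rng=random):
        return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    def layer_crossover(parent_a, parent_b, rng=random):
        """Cut each parent at an independently chosen hidden-layer boundary and swap tails;
        a freshly drawn weight matrix bridges the two halves at each cut."""
        cut_a = rng.randrange(1, len(parent_a["sizes"]) - 1)   # keep input layer, never cut after output
        cut_b = rng.randrange(1, len(parent_b["sizes"]) - 1)

        def child(head, head_cut, tail, tail_cut):
            sizes = head["sizes"][:head_cut + 1] + tail["sizes"][tail_cut + 1:]
            bridge = random_matrix(sizes[head_cut + 1], sizes[head_cut], rng)
            weights = head["weights"][:head_cut] + [bridge] + tail["weights"][tail_cut + 1:]
            return {"sizes": sizes, "weights": weights}

        return (child(parent_a, cut_a, parent_b, cut_b),
                child(parent_b, cut_b, parent_a, cut_a))

    # Example, reusing individuals from the initialization sketch above:
    # offspring_1, offspring_2 = layer_crossover(pop[0], pop[1])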

3.3 Topology Mutation:

The mutation transformations M consist of the transformations

$$M_{\vec{a}}: \Omega \to \Omega \qquad (3.0)$$

where $\vec{a} = (a_{i_1}, a_{i_2}, \ldots, a_{i_k})$ with $a_{i_j} \in A_{i_j}$, $i_1 \le i_2 \le \ldots \le i_k$ and $\{i_1, \ldots, i_k\} \subseteq S \subseteq \{1, 2, \ldots, n\}$, defined as follows: for every $x = (x_1, x_2, \ldots, x_n) \in \Omega$ we have

$$M_{\vec{a}}(x) = y = (y_1, y_2, \ldots, y_n) \qquad (3.1)$$

where $y_q = a_q$ if $q = i_j$ for some j, and $y_q = x_q$ otherwise.
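A hedged sketch of the topology-mutation operator, using the layer and neuron insertion and deletion probabilities listed in Table 5.1 and the genotype layout of the earlier sketches, is shown below; for simplicity the weight matrices are re-drawn after the structural change, whereas the paper mutates weights separately, so the exact behaviour may differ.

    import random

    def topology_mutate(ind, p_add_layer=0.1, p_del_layer=0.05,
                        p_add_node=0.05, p_del_node=0.05, rng=random):
        """Insert or delete a hidden layer or a hidden neuron with the given probabilities."""
        sizes = list(ind["sizes"])
        if rng.random() < p_add_layer:
            sizes.insert(rng.randrange(1, len(sizes)), 1)        # new hidden layer with one neuron
        if rng.random() < p_del_layer and len(sizes) > 3:
            sizes.pop(rng.randrange(1, len(sizes) - 1))          # delete one hidden layer
        if rng.random() < p_add_node and len(sizes) > 2:
            sizes[rng.randrange(1, len(sizes) - 1)] += 1         # add a neuron to a hidden layer
        if rng.random() < p_del_node and len(sizes) > 2:
            i = rng.randrange(1, len(sizes) - 1)
            sizes[i] = max(1, sizes[i] - 1)                      # remove a neuron, keep at least one
        weights = [[[rng.gauss(0.0, 1.0) for _ in range(sizes[i])]
                    for _ in range(sizes[i + 1])]
                   for i in range(len(sizes) - 1)]               # re-draw weights for the new shape
        return {"sizes": sizes, "weights": weights}

    # Example: mutant = topology_mutate(pop[0])   # pop from the initialization sketch above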

The global behavior of evolutionary algorithms is studied by considering a group or family of subsets of the search space and predicting which of these subsets (say Q) satisfy the property "the expected number of occurrences of elements of Q increases from one generation to the next". Each such subset is called a schema. If the chromosome length is fixed to n, the search space is $S = \prod_{i=1}^{n} A_i$, where $A_i$ is the set of all possible alleles which may occur at the i-th position in the chromosome. The next section gives the selection of offspring based on the fitness function.

3.4 Selection: A tournament is performed by choosing a group of offspring at random and reproducing the best individual from this group. Pick P challengers as a group, where P is 10% of the population size, arrange a tournament with respect to fitness between the P challengers and the r-th solution, and define the score of the r-th solution. The scores are determined by the minimum-distance method using the fitness function [18]. This is called P-tournament selection. Arrange the scores of all the solutions in ascending order and pick the best half of the score positions; the best half of the scores is carried to the next generation. Repeat the process r times, where r is twice the population size, to obtain the scores of r P-tournaments. The selection probabilities for P-tournament selection are given by (3.2); more selection pressures and their comparison are given in [15], [16].
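The following simplified single-pass sketch follows the description above under stated assumptions (the score of a solution is taken as the number of challengers it beats on fitness, the challenger group is fixed at 10% of the population, and the best-scoring half survives); the exact scoring by the minimum-distance method of [18] is not reproduced here.

    import random

    def p_tournament_selection(population, fitness, rng=random):
        """Score each solution against a random group of P challengers, keep the best half."""
        p = max(1, len(population) // 10)        # P = 10% of the population size
        scored = []
        for candidate in population:
            challengers = rng.sample(population, p)
            wins = sum(1 for c in challengers if fitness(candidate) >= fitness(c))
            scored.append((wins, candidate))
        scored.sort(key=lambda t: t[0], reverse=True)   # best scores first
        return [ind for _, ind in scored[:len(population) // 2]]

    # Example: survivors = p_tournament_selection(pop, fitness=lambda ind: -mse(ind))  # mse assumed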

6. EXPERIMENTAL SETUP

The idea proposed in this work emphasizes evolving ANNs: a new evolutionary system for evolving feed-forward ANNs from the architecture space. In this context, the evolutionary process attempts to crossover and mutate weights before performing any structural or topology crossover and mutation; that is, the evolutionary process involves mutation of both weights and topology, and weight mutation is carried out before structural or topology mutation. The population size in the EA is taken as 20, and 10 independent trials are run to obtain generalized behavior. The terminating criterion is a fixed number of iterations, equal to 100 for the EA. Table 5.1 gives all the parameters of the algorithm; default settings are taken for the considered problems. All the experiments are run by specifying the parameters and by tuning the genetic parameters to obtain the best solution.

Table 5.1 Default parameters.

Symbol | Parameter                                           | Default value
N      | Population size                                     | 20
Seed   | Previously saved population                         | none
       | Probability of inserting a hidden layer             | 0.1
       | Probability of deleting a hidden layer              | 0.05
       | Probability of inserting a neuron in a hidden layer | 0.05
       | Probability of deleting a neuron in a hidden layer  | 0.05
       | Probability of crossover                            | 0.1
       | Number of network inputs                            | Problem specific
       | Number of network outputs                           | Problem specific
K      | MSE in the range                                    | 10

In this work five benchmark problems are used to check the ANN optimization:

a) N-bit parity (even) classification, for N = 2 and N = 4
b) Pima Indians diabetes classification
c) SPECT heart disease classification
d) Breast cancer classification

Performance on the N-Bit Parity (XOR) Classification Problem:

In the simultaneous evolution of architecture and connection weights, only the 2-bit and 4-bit parity encoders with different network sizes are considered in this section.

FIGURE 5.8 Performance of Evolutionary ANN for 2-bit parity with initial size of [2 3 2 1 2].
FIGURE 5.8 Performance of Evolutionary ANN for 2-bit parity with initial size of [2 2 2 1 2].

For parity 2/4, all networks in the space have a maximum of 10 nodes, including the 2/4 inputs, the number of hidden nodes in layer one, the number of hidden nodes in layer two, 1 output node and two hidden layers, i.e. the size is [2/4 2/3 2 1 2]. This allows hidden-layer configurations of up to 5 nodes to be evolved. The average and best generation over all runs that found a solution for parity-2 using the accuracy fitness function, and the smallest architecture size found, are reported. The mean square error (MSE) for 10 trial runs is given in Table 5.2, and the performance of 5 runs is shown in Fig. 5.8; runs 3, 4 and 5 completed in 50 generations, and runs 1 and 2 completed in 20 generations. The average number of hidden nodes over 10 successful trial runs is 2.1 and the average number of connections is 7.9. For ten runs of the N-bit parity problems, the best individuals were found with genetic-parameter settings of 0.05, 0.05, 0.01 and 0.01.

FIGURE 5.10 Performance of Evolutionary ANN for 4-bit parity with initial size of [4 5 4 1 2].
FIGURE 5.10 Performance of Evolutionary ANN for 4-bit parity with initial size of [4 4 5 1 2].

Table 5.2 Performance of ANN shown by EA for different trials.

Trial No. | MSE ([2 3 2 1 2]) | MSE ([4 5 4 1 2])
1         | 9.0084e-003       | 3.2548e-006
2         | 2.1219e-026       | 1.3548e-002
3         | 2.0416e-014       | 6.3254e-011
4         | 1.3406e-003       | 5.4856e-019
5         | 2.1219e-026       | 9.2154e-026
6         | 9.0084e-003       | 9.3554e-004
7         | 2.1219e-026       | 2.8754e-014
8         | 2.0416e-014       | 9.2365e-013
9         | 1.3406e-003       | 3.4587e-001
10        | 3.2323e-022       | 8.2657e-016

Performance on Real-World Dataset Classification Problems:

For the real-world datasets, all the data used for the training and test sets are acquired from the UCI Machine Learning Repository [17]. Each input variable should be preprocessed so that its mean value, averaged over the entire training set, is close to zero, or else small compared to its standard deviation (see the sketch after the list below). The datasets are:

i) Pima Indians Diabetes dataset
ii) SPECT Heart Disease dataset
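A minimal sketch of this preprocessing, assuming a plain z-score standardization computed on the training set and applied to both sets (the paper does not spell out the exact scheme), is:

    def standardize(train, test):
        """Shift each input variable to near-zero mean and unit standard deviation,
        using statistics computed on the training set only."""
        n_vars = len(train[0])
        means = [sum(row[i] for row in train) / len(train) for i in range(n_vars)]
        stds = [(sum((row[i] - means[i]) ** 2 for row in train) / len(train)) ** 0.5 or 1.0
                for i in range(n_vars)]
        scale = lambda rows: [[(row[i] - means[i]) / stds[i] for i in range(n_vars)]
                              for row in rows]
        return scale(train), scale(test)

    # Example: train_std, test_std = standardize(train_rows, test_rows)  # rows of raw attribute values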

The Pima Indians Diabetes dataset is composed of 8 attributes plus a binary class value indicating signs of diabetes, which corresponds to the target classification value, and includes 768 instances, as shown in Table 4.8. The dataset is divided into two sets, using 500 instances for training and 268 for testing. For the Single Photon Emission Computed Tomography (SPECT) heart dataset, only 13 attributes are used as input parameters to classify the problem, and there are a total of 267 instances; the target value is stored as the 14th parameter in the dataset. These datasets are normalized before being applied to the network. The dataset is divided into two sets, using 200 instances for training and 67 for testing.

The evolutionary process is initialized with all the networks in the architecture space having a defined architecture size, for example [x y z 1 n], i.e. x inputs, y hidden nodes in the 1st hidden layer, z hidden nodes in the 2nd hidden layer, one output layer with one node, and n the number of layers. After the evolutionary ANN process, the optimized network consists of only 2 hidden nodes in a single hidden layer with a unimodal sigmoid activation function, and the results of the real-data classification problems are shown in the figures and tables below.


FIGURE 5.12 Performance of Evolutionary ANN for Pima Indians diabetes with initial size of [9 4 5 1 2].

Table 5.4 Results of Pima Indians Diabetes dataset.

Parameter                                            | Experimental Results
Number of runs                                       | 10        | 10
Number of generations                                | 40        | 61
Number of training patterns used                     | 500       | 500
Average training set accuracy                        | 76.0      | 76.5
Number of test patterns used                         | 268       | 268
Average test set accuracy                            | 81.5      | 83.5
Initial number of hidden layers / nodes              | 2 / [4 5] | 2 / [5 4]
Final number of hidden layers / nodes (resulting NN) | 1 / [2]   | 1 / [3]
Population size                                      | 50        | 50
Number of inputs                                     | 09        | 09
Number of outputs                                    | 01        | 01

FIGURE 5.13 Performance of Evolutionary ANN for SPECT Heart dataset with initial size of [14 4 5 1 2].
FIGURE 5.14 Performance of Evolutionary ANN for Breast Cancer dataset with initial size of [11 4 5 1 2].

Table 5.6 Results of SPECT Heart dataset.

Parameter                                            | Experimental Results
Number of runs                                       | 10        | 10
Number of generations                                | 90        | 103
Number of training patterns used                     | 200       | 200
Average training set accuracy                        | 86.0      | 87.2
Number of test patterns used                         | 67        | 67
Average test set accuracy                            | 85.2      | 86.5
Initial number of hidden layers / nodes              | 2 / [4 5] | 2 / [5 4]
Final number of hidden layers / nodes (resulting NN) | 1 / [3]   | 1 / [3]
Population size                                      | 50        | 50
Number of inputs                                     | 14        | 14
Number of outputs                                    | 01        | 01

For the Pima Indians classification the average mean square error is 8.6214e-3. During training the network is adjusted according to its error, whereas the test process provides an independent measure of network performance during and after training. The best solution was found in fewer than 50 generations with genetic-parameter settings of 0.1, 0.05, 0.1 and 0.1. Results for another network size, [9 5 4 1 2], are also shown in Table 5.4, with a minimum of 3 hidden nodes in a single hidden layer. The results of the heart dataset are shown in Table 5.6, and a comparison with the literature is shown in Table 5.7. Ten runs are executed; the average percentage error values of the training and test processes are summarized in Table 5.6, and 5 trial runs are shown in Fig. 5.13, with an average mean square error of 7.7264e-3. The best solutions were reached in fewer than 90 generations with genetic-parameter settings of 0.1, 0.1, 0.1 and 0.1. Results for another network size, [14 5 4 1 2], are also shown in the table, with a minimum of 3 hidden nodes in a single hidden layer. The results of the Breast Cancer dataset are shown in Table 5.8, and a comparison with the literature is shown in Table 5.9. Ten runs are executed; the average percentage error values of the training and test processes are summarized in Table 5.8, and 5 trial runs are shown in Fig. 5.14, with an average mean square error of 5.3614e-3. The best solution was reached in fewer than 45 generations with genetic-parameter settings of 0.05, 0.05, 0.05 and 0.05 in all runs. Results for another network size, [11 5 4 1 2], are also shown in the table, with a minimum of 3 hidden nodes in a single hidden layer.

Table 5.8 Results of Breast Cancer dataset.

Parameter                                            | Experimental Results
Number of runs                                       | 10        | 10
Number of generations                                | 45        | 52
Number of training patterns used                     | 400       | 400
Average training set accuracy                        | 97.0      | 97.0
Number of test patterns used                         | 240       | 240
Average test set accuracy                            | 98.5      | 98.5
Initial number of hidden layers / nodes              | 2 / [4 5] | 2 / [5 4]
Final number of hidden layers / nodes (resulting NN) | 1 / [2]   | 1 / [2]
Population size                                      | 50        | 50
Number of inputs                                     | 11        | 11
Number of outputs                                    | 01        | 01

CONCLUSION:

The optimal weights of an ANN in the learning phase have been obtained using the concept of an evolutionary genetic algorithm, and the optimal architecture has been determined within the same evolutionary framework. The proposed method of adjusting both architecture and weights outperforms the fixed-network back-propagation approach at every level for the 2-bit and 4-bit parity problems, and on the real dataset classification problems it reaches an excellent percentage of accuracy with an optimized network having fewer hidden nodes and layers.

REFERENCES:

[1] Michalewicz, Z. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, 1996.
[2] Mühlenbein, H. and Mahnig, T. Evolutionary computation and beyond. In Y. Uesaka, P. Kanerva, and H. Asoh, editors, Foundations of Real-World Intelligence, CSLI Publications, pp. 123-188, 2001.
[3] Mitavskiy, B. Crossover invariant subsets of the search space for evolutionary algorithms. Evolutionary Computation. http://www.math.lsa.umich.edu/vbmitavsk/
[4] Holland, J. H. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: Univ. of Michigan Press, 1975.
[5] Coffey, S. An Applied Probabilist's Guide to Genetic Algorithms. M.Sc. thesis, The University of Dublin, 1999.
[6] Mac Lane, S. Categories for the Working Mathematician. Graduate Texts in Mathematics 5, Springer-Verlag, 1971.
[7] Poli, R., Stephens, C., Wright, A., and Rowe, J. A schema-theory-based extension of Geiringer's theorem for linear GP and variable-length GAs under homologous crossover, 2002.
[8] Vose, M. Generalizing the notion of a schema in genetic algorithms. Artificial Intelligence, 50(3):385-396, 1991.


[9] Radcliffe, N. The algebra of genetic algorithms. Annals of Mathematics and Artificial Intelligence, 10:339-384, 1994. http://users.breathemail.net/njr/papers/amai94.pdf
[10] Geiringer, H. On the probability of linkage in Mendelian heredity. Annals of Mathematical Statistics, 15:25-57, 1944.
[11] Vose, M. and Wright, A. The simple genetic algorithm and the Walsh transform: Part II, the inverse. Evolutionary Computation, 6(3):275-289, 1998.
[12] Stephens, C. and Waelbroeck, H. Schemata evolution and building blocks. Evolutionary Computation, 7(2):109-124, 1999.
[13] Stephens, C. The renormalization group and the dynamics of genetic systems. To be published in Acta Physica Slovaca, 2002. http://arXiv.org/abs/cond-mat/0210271
[14] Wright, A., Rowe, J., Poli, R., and Stephens, C. A fixed point analysis of a gene pool GA with mutation. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Morgan Kaufmann, 2002. http://www.cs.umt.edu/u/wright/
[15] He, J. and Yao, X. Drift analysis and average time complexity of evolutionary algorithms. Artificial Intelligence, 127:57-85, 2001.
[16] Chen, T., He, J., Sun, G., Chen, G., and Yao, X. A new approach to analyzing average time complexity of population-based evolutionary algorithms on unimodal problems. IEEE Trans. Syst., Man, and Cybern., Part B, 39(5):1092-1106, 2009.
[17] Newman, D.J., Hettich, S., Blake, C.L., and Merz, C.J. UCI repository of machine learning databases, 1998.
[18] Hutter, M. and Legg, S. Fitness uniform optimization. IEEE Trans. Evol. Comput., 10(5):568-589, 2006.
[19] Liepins, G. and Vose, M. Characterizing crossover in genetic algorithms. Annals of Mathematics and Artificial Intelligence, 5:27-34, 1992.
