Introduction to Neural Networks
John Paxton
Montana State University
Summer 2003
Chapter 4: Competition
• Force a decision (yes, no, maybe) to be made.
• Winner take all is a common approach.
• Kohonen learning: wj(new) = wj(old) + α(x – wj(old))
• wj is the weight vector closest to x, determined by Euclidean distance.
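A minimal sketch of this winner-take-all update in Python (NumPy; the function name and array layout are my own, not from the slides):

```python
import numpy as np

def kohonen_update(weights, x, alpha):
    """One winner-take-all Kohonen learning step.

    weights -- (m, n) array, one weight vector per cluster unit
    x       -- (n,) input vector
    alpha   -- learning rate
    """
    # The winner is the weight vector closest to x in Euclidean distance.
    j = np.argmin(np.linalg.norm(weights - x, axis=1))
    # Move only the winner toward the input: wj(new) = wj(old) + alpha(x - wj(old))
    weights[j] += alpha * (x - weights[j])
    return j
```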
MaxNet
• Lippman, 1987
• Fixed-weight competitive net.
• Activation function f(x) = x if x > 0, else 0.
• Architecture: two units a1 and a2; each unit excites itself with weight 1 and inhibits the other with weight –ε.
Algorithm
1. wij = 1 if i = j, otherwise –ε (0 < ε < 1/m, where m is the number of nodes)
2. aj(0) = sj, t = 0.
3. aj(t+1) = f[aj(t) – ε Σk≠j ak(t)]
4. go to step 3 if more than one node has a non-zero activation
Special Case: More than one node has the same maximum activation.
Example
• s1 = .5, s2 = .1, ε = .1
• a1(0) = .5, a2(0) = .1
• a1(1) = .49, a2(1) = .05
• a1(2) = .485, a2(2) = .001
• a1(3) = .4849, a2(3) = 0
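A runnable sketch of MaxNet (NumPy; names mine) that reproduces the trace above:

```python
import numpy as np

def maxnet(s, eps=0.1, max_iter=100):
    """Iterate aj(t+1) = f[aj(t) - eps * sum of the other activations]
    until at most one node is non-zero."""
    a = np.array(s, dtype=float)
    for _ in range(max_iter):
        if np.count_nonzero(a) <= 1:
            break
        # a.sum() - a is, for each node, the total activation of the others.
        a = np.maximum(0.0, a - eps * (a.sum() - a))
    return a

print(maxnet([0.5, 0.1]))   # [0.4849 0.    ], as in the example
```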
Mexican Hat
• Kohonen, 1989
• Contrast enhancement
• Architecture (w0, w1, w2, w3)
• w0 (xi -> xi), w1 (xi+1 -> xi and xi-1 -> xi), and so on
• Connection sign as a function of position relative to xi:
  xi-3  xi-2  xi-1  xi  xi+1  xi+2  xi+3
   0     -     +    +    +     -     0
Algorithm
1. initialize weights
2. xi(0) = si
3. for some number of steps do
4. xi(t+1) = f[Σk wk xi+k(t)]
5. xi(t+1) = max(0, xi(t+1))
Example
• x1, x2, x3, x4, x5
• radius 0 weight = 1
• radius 1 weight = 1
• radius 2 weight = -.5
• all other radii weights = 0
• s = (0 .5 1 .5 0)
• f(x) = 0 if x < 0, x if 0 <= x <= 2, 2 otherwise
Example
• x(0) = (0 .5 1 .5 0)
• x1(1) = 1(0) + 1(.5) -.5(1) = 0
• x2(1) = 1(0) + 1(.5) + 1(1) -.5(.5) = 1.25
• x3(1) = -.5(0) + 1(.5) + 1(1) + 1(.5) - .5(0) = 2.0
• x4(1) = 1.25
• x5(1) = 0
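A sketch of the iteration (Python; the helper name is mine) that reproduces x(1) above:

```python
import numpy as np

def mexican_hat_step(x, w=(1.0, 1.0, -0.5), x_max=2.0):
    """One contrast-enhancement step; w[k] is the weight at radius k,
    and radii beyond len(w)-1 get weight 0."""
    n, r = len(x), len(w) - 1
    new = np.zeros(n)
    for i in range(n):
        for k in range(-r, r + 1):
            if 0 <= i + k < n:
                new[i] += w[abs(k)] * x[i + k]
    # f(x) = 0 if x < 0, x if 0 <= x <= 2, 2 otherwise
    return np.clip(new, 0.0, x_max)

x0 = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
print(mexican_hat_step(x0))   # [0.   1.25 2.   1.25 0.  ], as in the example
```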
Why the name?
• Plot x(0) vs. x(1)
[Figure: x(0) and x(1) plotted over units x1–x5 on an activation scale of 0 to 2; the enhanced profile resembles a Mexican hat.]
Hamming Net
• Lippman, 1987
• Maximum likelihood classifier
• The similarity of 2 vectors is taken to be n – H(v1, v2)
where H is the Hamming distance
• Uses MaxNet with similarity metric
Architecture
• Concrete example: 3 input units (x1, x2, x3) feed 2 cluster units (y1, y2), whose activations are passed to a MaxNet.
Algorithm
1. wij = si(j)/2, where s(j) is the jth stored exemplar
2. n is the dimensionality of a vector
3. yin.j = Σi xi wij + n/2
4. select max(yin.j) using MaxNet
Example
• Training examples: (1 1 1), (-1 -1 -1)
• n = 3
• Present x = (1 1 1):
• yin.1 = 1(.5) + 1(.5) + 1(.5) + 1.5 = 3
• yin.2 = 1(-.5) + 1(-.5) + 1(-.5) + 1.5 = 0
• These last 2 quantities are the similarities n – H(x, s(j)), not the Hamming distances themselves
• They are then fed into MaxNet, which selects unit 1.
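The same computation as a short sketch (NumPy; variable names mine):

```python
import numpy as np

exemplars = np.array([[1, 1, 1], [-1, -1, -1]], dtype=float)
n = exemplars.shape[1]
W = exemplars.T / 2                # wij = si(j)/2

def similarity(x):
    """yin.j = sum_i xi*wij + n/2 = n - HammingDistance(x, exemplar j)."""
    return x @ W + n / 2

print(similarity(np.array([1.0, 1.0, 1.0])))   # [3. 0.] -> MaxNet picks unit 1
```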
Kohonen Self-Organizing Maps
• Kohonen, 1989
• Maps inputs onto one of m clusters
• Human brains seem to be able to self organize.
Architecture
• n input units x1…xn, each connected to every cluster unit y1…ym
Neighborhoods
• Linear (digits give distance from the winner #):
  3 2 1 # 1 2 3
• Rectangular:
  2 2 2 2 2
  2 1 1 1 2
  2 1 # 1 2
  2 1 1 1 2
  2 2 2 2 2
Algorithm
1. initialize wij
2. select topology of yi
3. select learning rate parameters
4. while stopping criteria not reached
5. for each input vector do
6. compute D(j) = Σi (wij – xi)² for each j
Algorithm
7. select minimum D(j)
8. update neighborhood units: wij(new) = wij(old) + α[xi – wij(old)]
9. update α
10. reduce radius of neighborhood at specified times
Example
• Place (1 1 0 0), (0 0 0 1), (1 0 0 0), (0 0 1 1) into two clusters
• α(0) = .6, α(t+1) = .5 * α(t)
• random initial weights:
  .2 .8
  .6 .4
  .5 .7
  .9 .3
Example
• Present (1 1 0 0)
• D(1) = (.2 – 1)2 + (.6 – 1)2 + (.5 – 0)2 + (.9 – 0)2 = 1.86
• D(2) = .98
• D(2) wins!
Example
• wi2(new) = wi2(old) + .6[xi – wi2(old)]
.2 .92 (bigger)
.6 .76 (bigger)
.5 .28 (smaller)
.9 .12 (smaller)
• This example assumes no neighborhood
Example
• After many epochs, the weights converge to
  0 1
  0 .5
  .5 0
  1 0
• (1 1 0 0) -> category 2
• (0 0 0 1) -> category 1
• (1 0 0 0) -> category 2
• (0 0 1 1) -> category 1
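A sketch of the whole procedure (Python; no neighborhood, as in this example). The initial weights are the slides' matrix transposed so that each row is one cluster unit:

```python
import numpy as np

def train_som(data, W, alpha=0.6, decay=0.5, epochs=20):
    """Winner-take-all SOM training: D(j) = sum_i (wij - xi)^2, move the winner."""
    for _ in range(epochs):
        for x in data:
            j = np.argmin(((W - x) ** 2).sum(axis=1))   # minimum D(j)
            W[j] += alpha * (x - W[j])                   # update winner only
        alpha *= decay                                   # alpha(t+1) = .5 * alpha(t)
    return W

data = np.array([[1,1,0,0], [0,0,0,1], [1,0,0,0], [0,0,1,1]], dtype=float)
W = np.array([[.2, .6, .5, .9],     # cluster 1 weights
              [.8, .4, .7, .3]])    # cluster 2 weights
print(train_som(data, W).round(2))  # rows approach the two cluster centroids
```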
Applications
• Grouping characters
• Travelling Salesperson Problem
– Cluster units can be represented graphically by weight vectors
– Linear neighborhoods can be used, with the first and last cluster units connected
Learning Vector Quantization
• Kohonen, 1989
• Supervised learning
• There can be several output units per class
Architecture
• Like Kohonen nets, but no topology for output units
• Each yi represents a known class
• n input units x1…xn, fully connected to m output units y1…ym
Algorithm
1. initialize the weights
(e.g. from the first m training examples, or at random)
2. choose α
3. while stopping criteria not reached do
(e.g. a fixed number of iterations, or α has become very small)
4. for each training vector do
Algorithm
5. find the j that minimizes || x – wj ||
6. if wj has the target class
wj(new) = wj(old) + α[x – wj(old)]
else
wj(new) = wj(old) – α[x – wj(old)]
7. reduce α
Example
• (1 1 -1 -1) belongs to category 1
• (-1 -1 -1 1) belongs to category 2
• (-1 -1 1 1) belongs to category 2
• (1 -1 -1 -1) belongs to category 1
• (-1 1 1 -1) belongs to category 2
• 2 output units, y1 represents category 1 and y2 represents category 2
Example
• Initial weights (where did these come from? the first two training examples):
  1 -1
  1 -1
  -1 -1
  -1 1
• α = .1
Example
• Present training example 3, (-1 -1 1 1). It belongs to category 2.
• D(1) = (1 + 1)² + (1 + 1)² + (-1 - 1)² + (-1 - 1)² = 16
• D(2) = 4
• Category 2 wins. That is correct!
Example
• w2(new) = (-1 -1 -1 1) + .1[(-1 -1 1 1) - (-1 -1 -1 1)] =
(-1 -1 -.8 1)
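A sketch of one LVQ step (Python; names mine) that reproduces the update above:

```python
import numpy as np

def lvq_step(W, classes, x, target, alpha=0.1):
    """Find the closest reference vector; move it toward x if its class
    matches the target, away from x otherwise."""
    j = np.argmin(((W - x) ** 2).sum(axis=1))
    sign = 1.0 if classes[j] == target else -1.0
    W[j] += sign * alpha * (x - W[j])
    return j

W = np.array([[1, 1, -1, -1],                  # y1: category 1
              [-1, -1, -1, 1]], dtype=float)   # y2: category 2
lvq_step(W, classes=[1, 2], x=np.array([-1., -1., 1., 1.]), target=2)
print(W[1])   # [-1.  -1.  -0.8  1. ], as in the example
```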
Issues
• How many yi should be used?
• How should we choose the class that each yi should represent?
• LVQ2 and LVQ3 are enhancements to LVQ that sometimes also modify the runner-up unit
Counterpropagation
• Hecht-Nielsen, 1987
• There are input, output, and clustering layers
• Can be used to compress data
• Can be used to approximate functions
• Can be used to associate patterns
Stages
• Stage 1: Cluster input vectors
• Stage 2: Adapt weights from cluster units to output units
Stage 1 Architecture
• Input units x1…xn connect to the cluster units z1…zp with weights w (e.g. w11); output units y1…ym connect to the cluster units with weights v (e.g. v11).
Stage 2 Architecture
• The winning cluster unit zj drives the output units x*1…x*n through weights tj (e.g. tj1) and y*1…y*m through weights vj (e.g. vj1).
Full Counterpropagation
• Stage 1 Algorithm
1. initialize weights, α, β
2. while stopping criteria is false do
3. for each training vector pair do
4. find the j that minimizes ||x – wj|| + ||y – vj||
   wj(new) = wj(old) + α[x – wj(old)]
   vj(new) = vj(old) + β[y – vj(old)]
5. reduce α, β
Stage 2 Algorithm
1. while stopping criteria is false
2. for each training vector pair do
3. perform step 4 above
4. tj(new) = tj(old) + α[x – tj(old)]
vj(new) = vj(old) + β[y – vj(old)]
Partial Example
• Approximate y = 1/x over [0.1, 10.0]
• 1 x unit
• 1 y unit
• 10 z units
• 1 x* unit
• 1 y* unit
Partial Example
• v11 = .11, w11 = 9.0
• v12 = .14, w12 = 7.0
• …
• v10,1 = 9.0, w10,1 = .11
• test x = .12: the nearest cluster weight is .11, so the net predicts y = 9.0
• In this example, the output weights will converge to the cluster weights.
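A recall sketch for this net (Python). Only the endpoint weights come from the slides; the middle cluster weights are illustrative values with v ≈ 1/w:

```python
import numpy as np

# x-side cluster weights; the slides give 9.0, 7.0, ..., .14, .11 at the ends.
w = np.array([9.0, 7.0, 5.0, 3.0, 1.0, 0.8, 0.5, 0.3, 0.14, 0.11])
v = np.round(1.0 / w, 2)          # y-side weights, v ~ 1/w

def recall(x):
    """Find the cluster whose w is nearest to x; emit its v weight."""
    return v[np.argmin(np.abs(w - x))]

print(recall(0.12))   # 9.09 -- "test .12, predict 9.0" up to rounding
```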
Forward Only Counterpropagation
• Sometimes the function y = f(x) is not invertible.
• Architecture (only 1 z unit active)
• Input units x1…xn feed cluster units z1…zp, which feed output units y1…ym.
Stage 1 Algorithm
1. initialize weights, α (.1), β (.6)
2. while stopping criteria is false do
3. for each input vector do
4. find minimum || x – w||
w(new) = w(old) + α[x – w(old)]
5. reduce α
Stage 2 Algorithm
1. while stopping criteria is false do
2. for each training vector pair do
3. find minimum || x – w ||
   w(new) = w(old) + α[x – w(old)]
   v(new) = v(old) + β[y – v(old)]
4. reduce β
Note: interpolation is possible.
Example
• y = f(x) over [0.1, 10.0]
• 10 zi units
• After phase 1, the cluster weights are wi = 0.5, 1.5, …, 9.5
• After phase 2, the output weights are vi = 5.5, 0.75, …, 0.1
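A recall sketch with the interpolation noted above (Python). Only 5.5, 0.75, and 0.1 are from the slides; the middle v values are filled in for illustration, assuming f(x) = 1/x as in the earlier example:

```python
import numpy as np

w = np.arange(0.5, 10.0, 1.0)   # cluster weights after phase 1: 0.5, 1.5, ..., 9.5
v = np.array([5.5, 0.75, 0.42, 0.29, 0.22, 0.18, 0.15, 0.13, 0.12, 0.1])

def recall(x, k=1):
    """k = 1 is winner-take-all recall; k > 1 interpolates, here as a
    distance-weighted average of the k nearest clusters' output weights."""
    idx = np.argsort(np.abs(w - x))[:k]
    wts = 1.0 / (np.abs(w[idx] - x) + 1e-9)
    return float((wts * v[idx]).sum() / wts.sum())

print(recall(2.2))        # 0.42  (winner-take-all)
print(recall(2.2, k=2))   # ~0.52 (interpolated between clusters at 2.5 and 1.5)
```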