A class of pattern recognition machines


Transcript of A class of pattern recognition machines


Scholars' Mine

Masters Theses, Student Theses and Dissertations

1967

A class of pattern recognition machines

Shaw Yih Chung

Follow this and additional works at: https://scholarsmine.mst.edu/masters_theses

Part of the Electrical and Computer Engineering Commons

Department:

Recommended Citation
Chung, Shaw Yih, "A class of pattern recognition machines" (1967). Masters Theses. 5167.
https://scholarsmine.mst.edu/masters_theses/5167

This thesis is brought to you by Scholars' Mine, a service of the Missouri S&T Library and Learning Resources. This work is protected by U. S. Copyright Law. Unauthorized use including reproduction for redistribution requires the permission of the copyright holder. For more information, please contact [email protected].


A CLASS OF PATTERN RECOGNITION MACHINES

By

Shaw Yih Chung

A

THESIS

submitted to the faculty of the

UNIVERSITY OF MISSOURI AT ROLLA

in partial fulfillment of the work required for the

Degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

Rolla, Missouri

1967

Approved By

[Signatures]  (advisor)


ACKNOWLEDGEMENT

The author wishes to express his appreciation to his advisor, Dr. F. J. Kern, of the Electrical Engineering Department of the University of Missouri at Rolla, for his helpful guidance throughout this study.

Special thanks are due to Dr. J. R. Betten, Professor of

Electrical Engineering, for his encouragement in initiating

this research.

The author also wishes to express his thanks to Dr. T. L.

Noack for his valuable help and comments.


TABLE OF CONTENTS

ACKNOWLEDGEMENT ............................................... ii
LIST OF ILLUSTRATIONS ......................................... iv
ABSTRACT ...................................................... v
LIST OF SYMBOLS ............................................... vii
CHAPTER 1 - INTRODUCTION ...................................... 1
CHAPTER 2 - A SPECIAL CLASS OF PATTERN RECOGNITION ............ 8
CHAPTER 3 - THE STRUCTURE OF THE RECOGNITION MACHINE .......... 12
CHAPTER 4 - SIMULATION OF THE PATTERN RECOGNITION MACHINES .... 17
BIBLIOGRAPHY .................................................. 28
APPENDIX A - LEARNING BY THE RECOGNITION MACHINE .............. 30
APPENDIX B - CALCULATION OF MISRECOGNITION RATE ............... 36
APPENDIX C - FLOW CHART OF THE SIMULATION ..................... 44
VITA .......................................................... 50


ABSTRACT

The classification of signals through the use of pattern recognition techniques may be viewed as a statistical classification problem, since at least some pattern classes are analog signals which have been contaminated by additive noise. In order to represent these signals at the input of a pattern classifier they are sampled, and a vector in an m-dimensional vector space is used to describe the pattern. Consequently, these pattern samples are processed to determine the parameters of the statistics. In this study the additive noise was assumed to be Gaussian in nature.

This study was based on a priori knowledge of the learning samples; that is to say, learning with a teacher was investigated. The three classes of patterns used in the experimental program were generated through computer simulation by adding normally distributed random numbers to previously generated signal classes. The sampled signals thus generated were classified by means of maximum likelihood detectors.

The study was divided into two parts. In the first part the statistics of the noise were calculated and the patterns were classified on the basis of these statistics with no learning taking place. In phase two of the study, learning behavior was observed when a sequential calculation was made to determine the noise statistics. It was shown that the probability of misrecognition of a given pattern asymptotically approached the theoretical minimum as the number of learning samples increased. The probability of misrecognition of the patterns was also considered as a function of signal to noise ratio and correlation between patterns. A theoretical prediction of this probability of misrecognition was compared with the results of simulation and found to agree closely.

List of Symbols

A          Constant
c          Constant
a_ij       Member of a covariance matrix
d(S_0)     Decision rule
I          Identity matrix
K          Covariance matrix
m          Dimensionality of the vector space
M          Mean vector
M(k)       Scalar component of a mean vector
n          Number of pattern classes
N          Noise vector
L(X)       Likelihood function
p(X)       Probability function
P(S/X)     Conditional probability function
P_0        A priori probability of binary error
Q          Threshold value
P_e        Probability of error in recognition between two pattern classes, binary error rate
P_E        Overall probability of misrecognition
q_0        A priori probability of each pattern class
S(t)       Pattern sample in the form of a continuous time function
S          Pattern sample in vector form
t          Time variable
T          Observation time of one sample
u_i        Scalar component of a sample vector
V          Logarithm of the likelihood function
X          Unknown pattern sample, or input stimulus
α          Probability of error of the first kind
β          Probability of error of the second kind
δ(t-t_0)   Dirac delta function
Φ          Covariance matrix of a random signal
μ          Mean of a random signal
ρ          Correlation between patterns
σ²         Noise variance
∅          Null set
Λ(X)       Likelihood ratio function
C_α        Cost of making an error of the first kind
C_β        Cost of making an error of the second kind
θ          Random variable representing an unknown parameter or parameters
|A|        Determinant of a matrix


CHAPTER 1 - INTRODUCTION

The natural world, due to its random behavior, presents humans with a variety of phenomena which are well categorized and recognized by people as belonging to one class or to a set of different classes. This recognition is generally accomplished through a learning of the various patterns. Hence, one may tell the difference between "lion" and "tiger" or "A" and "B". Any object in the world may be described as a distinct pattern provided one is willing to assign an arbitrarily long description to it. Because of the size and complexity of this problem, one would expect the "hardware" which could do the job of pattern recognition to be large and complex. Furthermore, the realization of a machine which will exhibit the human behavior which is called "learning" is difficult to realize on other than a simulation basis. In the beginning, at least, no attempt will be made to discriminate between "automobile" and "tiger" but rather between "lion" and "tiger". The difference seen here would be that lion and tiger are in the same general class of natural objects whereas automobile and tiger are not. Consequently, this study will be concerned with the recognition of pattern samples coming from the same set of natural objects.

Other difficulties arise in the processing of patterns by pattern recognition machines. In order to be able to distinguish "tiger" from "lion" one must provide a proper encoding procedure so that a viable mathematical description of the pattern to be recognized is provided to the machine. A well designed pattern recognition machine can recognize different patterns only if the proper sensory equipment is available or, alternatively, the proper encoding procedure for the various patterns is done by a human prior to the presentation of the patterns to the input terminals of the machine. For the moment, suppose a pattern recognition machine has been built and is available. Further suppose that an unknown pattern named X is presented to it. The function of the machine is now to identify the pattern as belonging to one of the pattern classes which the machine is constructed to recognize.

X = S_i + N,    i = 1, 2, ..., n                                           (1)

where X is the unknown sample, N is the contaminating noise signal, n is the number of pattern categories, and S_i is the pattern class. Since equation (1) defines X as a random variable, the processing of the input signal done by the machine will be statistical in nature.

The development of a statistical decision theory and the advent of readily available large scale computing facilities have obviated the need to actually construct specialized "hardware" to deal with pattern recognition problems. It is much more economical and much simpler to simulate these machines on a general purpose digital computer. The algorithms necessary to implement this type of pattern recognition owe their development, by and large, to the recent interest in implementations of statistical decision theory. A large amount of literature, most of it since 1960, may be found which pertains to the statistical pattern recognition problem (9) and (10). Many investigators in this field (1), (2), and (3) adopted the idea of the sample as being represented by a vector in a multi-dimensional vector space. The classification problem then reduces to a partitioning problem in a multi-dimensional vector space (1), the dimension of the space being the same as the number of sampling points per pattern. These sampling points may be the result of a sampling process carried out on a continuous time function or might simply represent a proper encoding of features which go to make up a particular pattern. The classification of these patterns may then be done by partitioning the resultant finite dimensional vector space so that all members of one class fall inside a given boundary and the space is divided into disjoint subspaces. Each subspace will contain only members of one particular given class (1).

Pattern recognition schemes differ from one another depending on how one approaches the problem of partitioning the space. Many investigators have assumed optimal knowledge about the samples. Optimal schemes for space partitioning have been developed using hyperplanes (3). These schemes revolve around the use of variable threshold logic units as the basic building block of the pattern recognition machine and are preferred in situations where the signal is noiseless. That is to say, this type of machine is employed in instances where the pattern space is linearly separable. Alternate schemes along this same avenue of approach have been developed using hyperspheres instead of hyperplanes (3). When the pattern space is not linearly separable the threshold logic approach to pattern recognition may not result in machines which will perfectly classify a set of patterns. The problem then becomes one of maximizing the probability of classifying a given pattern correctly. Thus, when the patterns contain random noise or other contaminating components, pattern recognition may be viewed as a statistical classification problem in an m-dimensional vector space. The recognition problem is thus tied in with the existence of probability density functions which will describe the given patterns and their associated noise components. This is necessary so that one may build a statistical model which adequately represents the recognition machine to be simulated. It has been pointed out (10), (4) that the maximum likelihood ratio detection scheme using linear threshold detectors is an optimal classification procedure if the noise is uncorrelated with the signal and Gaussian in nature.


There are two important phases in pattern recognition problems, learning and decision. In the learning phase the pattern recognition machine must adapt itself to obtain the threshold weights for determining hyperplanes or, alternatively, the statistics of the pattern classes, so that the probability of classification may be raised to an acceptable level. Two types of learning may be simulated. The first is learning with a teacher and the second is learning without a teacher (7), (8). For the former, the decision boundaries are labeled so that one knows in advance what class a particular pattern belongs to during the learning phase. This is shown in figure 1A.

Fig. 1A  Learning with a teacher (labeled pattern classes in an m-dimensional space)

Fig. 1B  Learning without a teacher (unlabeled pattern classes in an m-dimensional space)

In learning without a teacher the recognizer is completely isolated from the source generating the pattern classes, so that the decision boundaries are not known and must be determined. This type of situation is shown in figure 1B. Comparing the two figures one notes the rather obvious advantages of learning with a teacher. A more complicated and adaptive scheme will be required for learning without a teacher, since the decision boundaries may or may not be hyperplanes. The hyperplane separation does not necessarily determine an optimal decision boundary. In figure 1C an optimal hyperplane may be found while in figure 1D one notices that it may not.

Fig. 1C  Linearly separable patterns (separating hyperplane in an m-dimensional space)

Fig. 1D  Linearly inseparable patterns (m-dimensional space)

For instance, if the patterns are a normal random process

with equal covariance matrices, an optimal recognition scheme

may be structured by finding a proper hyperplane separating

regions of these classes. The probability of error in recog-

nition of each class would be a minimum when the optimal

decision boundary were found.

Fig. 2A  An optimal decision boundary (hyperplane in an m-dimensional space)

Fig. 2B  A non-optimal decision boundary (hyperplane in an m-dimensional space)

As may be seen by comparing figures 2A and 2B, the optimal decision boundary may not always be a hyperplane, but may rather be a hyperbola or parabola, or even possibly a wavy line of some sort. In more complicated situations involving more than two classes, learning without a teacher may be accomplished provided one first determines the statistical properties of the patterns to be classified. Both parametric and non-parametric techniques have been developed whereby one may "learn" these statistics by sequentially processing a series of patterns during the learning phase.

Once the learning phase of the pattern recognition sequence has been completed the decision phase is entered. In this phase additional patterns from the same sample set may be classified while the structure of the machine is kept fixed. In the decision phase, then, the structured machine may be used to classify patterns from the same sample set even though they were not included in the set of patterns used during the learning phase of the recognition process. Numerous studies have shown that in practical situations the structured pattern recognizer possesses predictive properties which enable it to classify patterns quite reliably once the learning phase has been properly carried out.


CHAPTER 2

A Special Class of Pattern Recognition

In application the patterns to be classified are produced as the result of a proper encoding procedure. One practical example of this procedure would be the sampling of a continuous time function within a given time window of duration T. This study will concern itself with the classification of signals of this type. Pattern examples may be thought of as a sequence of m sample values expressible as:

S_1^j(t_1), S_1^j(t_2), ....., S_1^j(t_m)

S_2^j(t_1), S_2^j(t_2), ....., S_2^j(t_m)                                  (2)

. . . . . . . . . . . . . . . . . . . . .

S_n^j(t_1), S_n^j(t_2), ....., S_n^j(t_m)

0 < t < T,    j = 1, 2, ..., p

The number of pattern examples in each class need not be

the same; however, it is convenient if the same number of

examples are selected from each different pattern category.

The patterns represented in (2) may be treated as vectors in an m-dimensional vector space.

S_1^j = {u_11^j, u_12^j, ....., u_1m^j}

S_2^j = {u_21^j, u_22^j, ....., u_2m^j}                                    (3)

. . . . . . . . . . . . . . . . . . . .

S_n^j = {u_n1^j, u_n2^j, ....., u_nm^j}

For simplicity, the i-th sample in vector form will simply be denoted by the capital letter S_i^j. In equation (1) it was seen that the patterns were corrupted by noise, which will be presumed Gaussian in nature. In view of this, (3) may be rewritten as follows:

S_1^j = M_1^j + N

S_2^j = M_2^j + N                                                          (3-1)

. . . . . . . . . .

S_n^j = M_n^j + N

where M is the mean vector and N is the noise signal, which is presumed to be stationary and Gaussian with zero mean value. Thus, averaging over the ensemble (denoted here by angle brackets),

⟨S_i^j⟩ = ⟨M_i^j⟩ + ⟨N⟩,    i = 1, 2, ..., n

Since ⟨N⟩ = 0, and it is assumed that ⟨M_i^j N⟩ = 0 for all i, it follows that

⟨M_i^j⟩ = M_i                                                              (4)

⟨S_i^j⟩ = M_i,    i = 1, 2, ..., n

and M_i is seen to be the mean vector, with scalar components

M_ik = lim_{p→∞} (1/p) Σ_{j=1}^{p} u_ik^j,    k = 1, 2, ..., m

Fig. 3  An ensemble of sampled signals and their mean (plotted against t from 0 to T).

Likewise, N may also be denoted as an m-dimensional vector, N = {N_1, N_2, ..., N_m}. The covariance matrix may be found (9) from the relation

K = ⟨N^t N⟩

where each term in the covariance matrix is defined to be

a_pq = ⟨N_p N_q⟩ = ⟨(u_ip - M_ip)(u_iq - M_iq)⟩ = a_qp,    i = 1, 2, ..., n;    p, q = 1, 2, ..., m

Note that the covariance matrix is real symmetric and that a_ii > 0 for all i.

It can be shown (14) that the covariance matrix is positive definite. A real symmetric matrix A is said to be positive definite if there exists a nonsingular matrix X such that X^t A X = I or, equivalently, if x^t A x > 0. Furthermore, a matrix is positive definite if all of its leading principal minors are positive (15). It follows that the matrix A^{-1} is also positive definite: let

X = A^{-1} Y,    so that    X^t = Y^t A^{-1}

then

X^t A X = (Y^t A^{-1}) A (A^{-1} Y) = Y^t A^{-1} Y > 0                     (10)

It is clear that K^{-1} is also positive definite. This is essential to the maximum likelihood decision scheme which will be considered.
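As an illustration (not part of the thesis), the statistics defined above might be estimated from labeled pattern samples as in the following sketch, which assumes Python with numpy; the array shapes, sizes and names are illustrative. It forms the class mean vectors and the pooled covariance matrix and checks that the latter is indeed positive definite:

    import numpy as np

    def estimate_statistics(samples):
        # samples: array of shape (n_classes, p, m) -- p labeled examples of each
        # of the n pattern classes, each an m-dimensional sample vector.
        means = samples.mean(axis=1)                       # class mean vectors M_i
        deviations = samples - means[:, None, :]           # noise estimates N = S_i^j - M_i
        n_classes, p, m = samples.shape
        # pooled covariance: a_pq = average of N_p N_q over all classes and examples
        K = np.einsum('ijp,ijq->pq', deviations, deviations) / (n_classes * p)
        return means, K

    # illustrative use: three classes, 500 noisy examples each, m = 4 sample points
    rng = np.random.default_rng(0)
    samples = np.array([0.0, 10.0, 20.0])[:, None, None] + rng.normal(0.0, 2.0, size=(3, 500, 4))
    M, K = estimate_statistics(samples)
    print(np.all(np.linalg.eigvalsh(K) > 0))               # True: K is positive definite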


Chapter 3

The Structure of the Recognition Machine

It can be shown that for a Gaussian process, Bayes' rule is an optimal (4), (10) decision scheme which minimizes the probability of misrecognition. Bayes' rule is: d(X) = S_0, that is, the unknown pattern X is recognized as a member of pattern class S_0, if

P(S_0/X) > P(S_i/X)    for all i ≠ 0                                       (11)

Thus, the pattern S_0 is the pattern with the largest a posteriori probability from among all the a posteriori probabilities, in order to make the decision in (11). All patterns are compared with the unknown pattern X, so the a posteriori probability can be described in a more detailed form as

P(S_i/X, {S_j}),    i, j = 1, 2, ....., n                                  (12)

The same a posteriori probabilities are preserved when the sample means rather than the samples themselves are used in (12). In statistical terminology, the set of sample means M_j, j = 1, 2, ..., n, is a sufficient statistic (10) for the parameters. Equation (12) then becomes:

P(S_i/X, {M_j}) = P(S_i) P(X, {M_j}/S_i) / P(X, {M_j}),    i, j = 1, 2, ....., n        (13)

Suppose that the occurrence of the events S_1, S_2, ....., or S_n is equally likely, i.e. the a priori probability of appearance is the same for all patterns:

P(S_i) = A_1                                                               (14)

The joint probability density can be written as:

P(X, {M_j}) = Σ_{i=1}^{n} P(S_i) P(X, {M_j}/S_i)
            = Σ_{i=1}^{n} P(S_i) P({M_j}/S_i) P(X/{M_j}, S_i)              (15)

The first two terms are obviously not related to the occurrence of X. The third term may vary for different S_i, but the summation over all the pattern space will be simply the a priori probability of its occurrence. This indicates that the quantity on the left side of (15) is a constant. Let

P(X, {M_j}) = A_2                                                          (15-1)

Substituting (14) and (15-1) into (13), it follows that

P(S_i/X, {M_j}) = A_3 P(X, {M_j}/S_i),    where A_3 = A_1/A_2              (16)

Also,

P(X, {M_j}/S_i) = P({M_j}/S_i) P(X/{M_j}, S_i)                             (17)

Since M_i ∈ {M_j} and the probability function in (17) is conditional on S_i, only M_i will influence its formation. The first term on the right side of (17), which is the distribution of all the means, is presumed to be known a priori. Thus

P({M_j}/S_i) = P({M_j}) = a priori distribution of the means = A_4,    j = 1, 2, ..., n

This may be learned by following a scheme which is presented in Appendix A.

(16) can be rewritten as:

P(S_i/X, {M_j}) = A P(X/M_i, S_i),    where A = A_3 A_4                    (18)

Thus P(S_i/X, {M_j}) is a monotone function of P(X/M_i, S_i), which is called the likelihood probability function. The decision rule of (11) will be based on this likelihood function instead of (12), i.e.

d(X) = S_0    if    P(X/S_0, M_0) > P(X/S_i, M_i)    for all i ≠ 0         (18-1)

Now, X is a Gaussian vector in m-dimensional space; therefore, it will have a multi-dimensional probability distribution (4) which is given by the equation

P(X/M_i, S_i) = [1/((2π)^{m/2} |K|^{1/2})] exp[-1/2 (X - M_i)^t K^{-1} (X - M_i)]        (19)

The information regarding M_i is contained in S_i. Hence, the notation of the likelihood probability function will be simplified hereafter to the likelihood function L(X/S_i) of equation (20).


It is indicated that the maximum value of (20) occurs when the unknown pattern sample has exactly the same mean as the i-th pattern class, though this is not likely to happen. A prototype recognition scheme can be realized by using the decision rule outlined by (18-1) in connection with the likelihood function developed in Equ. (20). It is shown in Fig. 4.

In this study, three pattern classes were generated by the computer. First, implementation of the recognition machine was done on the assumption that the mean and covariance were known in advance. A complete block diagram of this recognition scheme with no learning taking place is shown in Fig. 5. Following this, an implementation of the learning algorithm discussed in Appendix A is investigated.

Fig. 4  Prototype recognition scheme: the unknown pattern X is fed to the a posteriori probability computers P(S_1/X), P(S_2/X), ....., P(S_n/X), whose outputs drive a decision element.

Fig. 5  Recognition scheme with mean and covariance calculated beforehand: a pattern generator supplies S_1, S_2, S_3; the stored means M_1, M_2, M_3 and covariance K are used to form P(X/S_1), P(X/S_2), P(X/S_3) for the unknown sample X; a decision selector picks the largest, and misrecognitions are counted.
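A compact sketch of the decision rule (18-1) with the Gaussian likelihood (19) is given below (not from the thesis; Python with numpy, illustrative names and values). Because the covariance matrix K is common to all classes, the class with the largest likelihood is the one with the smallest quadratic form, so the constant factor of (19) can be ignored:

    import numpy as np

    def classify(x, means, K):
        # Decision rule (18-1): pick the class whose likelihood (19) is largest.
        # With a covariance K common to all classes, this is the class with the
        # smallest quadratic form (x - M_i)^t K^{-1} (x - M_i).
        K_inv = np.linalg.inv(K)
        d = x - means                                   # one row of differences per class
        quad = np.einsum('ij,jk,ik->i', d, K_inv, d)    # (x - M_i)^t K^{-1} (x - M_i)
        return int(np.argmin(quad))

    # illustrative (hypothetical) means and covariance
    M = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
    K = np.eye(2)
    print(classify(np.array([2.8, 0.2]), M, K))          # prints 1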


Chapter 4

Simulation of the Pattern Recognition Machines

(a) Generation of the patterns

Imagine that the sets of pattern vectors in Fig. 4 are the outputs of a random channel whose input is the "TRUE" pattern. Assume that the non-dispersed pattern in its deterministic form can be described by an algebraic expression. A typical E.K.G. waveform (13), shown in Fig. 6, will serve as a convenient example for discussion.

Fig. 6  A typical E.K.G. waveform (with the P, R and T deflections labeled).

In usual E.K.G. measurement practice, the data measured from the human body, which can be viewed as a random channel, would tend to be normally distributed. For the convenience of simulation, only the QRS part of the wave will be of concern. This portion might be approximated by a triangular configuration which is expressible by the algebraic equations:

y_1 = x,           0 < x < 50
    = 100 - x,     50 < x < 100                                            (23)

This function is shown in Fig. 7.

Fig. 7  Pattern "1": the triangular waveform y_1 = x for 0 < x < 50 and y_1 = 100 - x for 50 < x < 100.

Physical abnormality or disease will distort the normal wave so that patterns are produced. These patterns are generally perceived as distortions of the E.K.G. pattern. For simulation purposes, these distortions may be represented by

y_2 = K_1 x²,             0 < x < 50
    = K_1 (100 - x)²,     50 < x < 100                                     (24)

as shown in Fig. 8,

Fig. 8  Pattern "2": the parabolic waveform y_2 over 0 < x < 100.

and also by

y_3 = K_2 sin(πx/100),    0 < x < 100                                      (25)

as is shown in Fig. 9.

Fig. 9  Pattern "3": the half-sine waveform y_3 over 0 < x < 100.

In Equ. (24) and (25), y_2 and y_3 must be normalized so that

∫_0^T y_1² dx = ∫_0^T y_2² dx = ∫_0^T y_3² dx,    T = 100                  (26)

where y_1, y_2 and y_3 are generated by (23), (24) and (25) respectively. Using Equ. (26), it was found that

K_1 = (5/3)^{1/2} / 50    and    K_2 = 50 (2/3)^{1/2}

satisfy (26), so that the patterns are of equal energy. Mathematical complications are reduced in calculating the probability of error (Appendix B) when the patterns are so normalized. The list of categories in this simulation program is assumed exhaustive, i.e., all possible outcomes are included.

Thus,

X ∈ {S_1, S_2, ..., S_n}                                                   (27)

and n equals 3 in the problem being studied. The patterns described by (23), (24), (25) are immersed in a random environment so that

S_1 = y_1 + N(0, σ²)

S_2 = y_2 + N(0, σ²)                                                       (28)

S_3 = y_3 + N(0, σ²)

The random function is Gaussian distributed with zero mean and variance σ². In order to simulate the patterns described by (28), a complete computer program conducting the simulation and recognition process, together with its flow chart, is included in Appendix C.
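A short sketch of this pattern generation step is given below (not the thesis program; Python with numpy). The normalization constants follow (26), the sampling points are chosen for illustration, and noise is added as in (28):

    import numpy as np

    T = 100.0
    K1 = np.sqrt(5.0 / 3.0) / 50.0          # normalization constant of pattern "2", from (26)
    K2 = np.sqrt(2.0 / 3.0) * 50.0          # normalization constant of pattern "3", from (26)

    def y1(x):                              # triangular pattern "1", equation (23)
        return np.where(x < 50.0, x, 100.0 - x)

    def y2(x):                              # parabolic pattern "2", equation (24)
        return K1 * np.where(x < 50.0, x**2, (100.0 - x)**2)

    def y3(x):                              # half-sine pattern "3", equation (25)
        return K2 * np.sin(np.pi * x / T)

    def noisy_examples(y, sigma, m=4, count=500, rng=np.random.default_rng(0)):
        # sample the pattern at m points and add N(0, sigma^2) noise, as in (28)
        x = np.linspace(20.0, 80.0, m)      # illustrative sampling points
        return y(x) + rng.normal(0.0, sigma, size=(count, m))

    # the three patterns have (approximately) equal energy, equation (26)
    x = np.linspace(0.0, T, 10001)
    print([round(np.trapz(y(x)**2, x)) for y in (y1, y2, y3)])   # all near 250000/3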

(b) Simulation Results

A normal distribution of the random function was assumed from the beginning of this study. Thus, the simulation program has to generate the patterns described by (28) with normal statistics. It has been shown (16) that the sum of any n independent random functions is approximately normally distributed when n is sufficiently large.

Let x_1, x_2, ..., x_n be a sequence of n independent random functions with mean μ and variance σ². Let N_0 be a new random function such that

x = x_1 + x_2 + ... + x_n                                                  (29)

N_0 = (x - nμ) / √(nσ²)                                                    (30)

Then N_0 has approximately a normal distribution N(0, 1), i.e., with mean zero and variance 1. To avoid unnecessary computations, a proper value of n should be used. For a uniform random function distributed between (0, 1),

p(x) = 1,    0 < x < 1
     = 0,    elsewhere

μ = ∫_0^1 x p(x) dx = ∫_0^1 x dx = 1/2

σ² = ∫_0^1 x² p(x) dx - μ² = 1/3 - 1/4 = 1/12

If one chooses n = 12, then (30) reduces to

N_0 = x - 6                                                                (31)

In general, only uniformly distributed random functions are available as a standard subroutine. Hence, by using (31), a normally distributed function with mean 0 and variance 1 can be achieved approximately. For different noise levels, (31) is to be multiplied by a factor:

N = a N_0,    a > 1                                                        (31-1)
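A direct transcription of this noise generator into code might look as follows (a sketch, not the thesis subroutine; Python with numpy):

    import numpy as np

    def approx_gaussian(rng, n_samples, a=1.0):
        # Sum 12 uniform (0, 1) variates and subtract 6, as in (29)-(31);
        # the factor 'a' sets the noise level as in (31-1).
        u = rng.random((n_samples, 12))
        return a * (u.sum(axis=1) - 6.0)     # mean 12*(1/2) - 6 = 0, variance 12*(1/12) = 1

    rng = np.random.default_rng(0)
    noise = approx_gaussian(rng, 100_000)
    print(noise.mean(), noise.var())         # both close to 0 and 1, respectively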

The two important phases of the study were, first, the recognition of patterns without learning taking place and, second, recognition with learning features. In the first phase of this study, both the mean vectors of the pattern classes and the covariance matrix were assumed to be known in advance. The likelihood probability or likelihood function of equation (20) was calculated for each different pattern class. When the largest P(X/S) or L(X/S) was found, a decision was made subject to the decision rule outlined in (18-1). In this phase of the study, no learning took place since the machine was allowed to know the pertinent statistics beforehand. For observing the behavior of the recognition machine, patterns with different signal to noise ratios were simulated. It was found that the misrecognition rate monotonically decreased when the signal to noise ratio increased, which agreed with the theoretical predictions, Equ. (16), developed in Appendix B. The result of the first phase, in which no learning took place, is shown in Fig. 10.

In the second phase of the study, the recognition machine was isolated from the generating source. In the beginning of this phase, the machine had no knowledge of the mean vectors of the pattern classes; hence, an initial guess of the mean vector was required for each pattern. The sequential learning procedure started when a set of learning samples was fed into the machine, which had a learning feature. The initial guess of the mean was taken as the real mean until the first learning sample was fed into the learning machine. The recognition procedure was essentially the same as that in the first phase. The learning in this phase was directed in so far as the learning samples fed into the machine were labeled; in other words, the pattern class of these learning samples was made known to the machine. By the use of equ. (20) in Appendix A, the mean vector for each pattern class was updated whenever a new learning sample was fed into the learning machine. The updated mean in turn would serve as the "real" mean. So long as the learning was directed, the mean vectors learned in this manner converged to an acceptable level after a few samples were fed in, so that the misrecognition of the system asymptotically approached the theoretical minimum, as set by equ. (19) in Appendix B, as the number of learning samples increased. The result of the second phase is shown in Fig. 11.

Fig. 10  Error rate as a function of signal to noise ratio: simulated P_E and theoretical P_E (in percent) versus S/N from 5 to 15, with a 90% confidence level indicated.

Fig. 11  Probability of misrecognition as a function of the number of learning samples (1 to 15), with variance = 2, initial guess {20, 20, 20, 20} for all patterns, and S/N = 15; the final P_E is 3.6%.

In the previous discussion, the covariance matrix was assumed fixed. Hence, channel characteristics were not changing during the sequential learning period. Also, the rate of change of the mean vectors of the pattern classes was assumed to be very slow or stationary, so that the "real" mean will finally be learned when a considerable number of learning samples are allowed.

(c) Conclusions and Further Recommendations for Study

The study in its entirety was on a simulation basis. One can imagine the approach to the problem to be realistic, subject to the restrictions posed throughout the study. The sequential learning procedure was much more like an on-line operation. This strongly indicated the possibility of utilizing this recognition scheme in a more general context. The removal of some of the restrictions in the theoretical derivation would present no little difficulty. Since most problems are more or less statistical in nature, the statistics of other pattern recognition problems might not be the same as the one studied here. An optimal scheme for one statistic may not be useful for the others. The development of a general optimal recognition scheme should deserve the attention of researchers. In the learning phase, the statistics were all assumed fixed or time stationary. It would be more realistic to remove this restriction, so that a machine which could track wandering patterns (9) might be of more value to the pattern recognition problem.

BIBLIOGRAPHY

1. Sebestyen, G. S., "Pattern Recognition by an Adaptive Process of Sample Set Construction", IRE Transactions on Information Theory, Vol. IT-8, April 1962.

2. Sebestyen, G. S., "Recognition of Membership in Classes", IRE Transactions on Information Theory, Vol. IT-7, January 1961.

3. Cooper, P. W., "The Hyperplane in Pattern Recognition", Information and Control, Vol. 5, 1962.

4. Wainstein, L. A., and V. D. Zubakov, Extraction of Signals from Noise, Prentice-Hall, Inc., N. J., 1962.

5. Spragins, J. D., Jr., "Reproducing Distributions for Machine Learning", Stanford Electronics Lab., Technical Report No. 6103-7, November 1963.

6. Patrick, E. A., and J. C. Hancock, "Non-supervised Learning of Probability Spaces and Recognition of Patterns", IEEE International Convention Record, April 1965.

7. Patrick, E. A., and J. C. Hancock, "Non-supervised Sequential Classification and Recognition of Patterns", IEEE Transactions on Information Theory, Vol. IT-12, July 1966.

8. Hancock, J. C., and P. A. Wintz, Signal Detection Theory, McGraw-Hill Book Company, 1966.

9. Abramson, N., and D. Braverman, "Learning to Recognize Patterns in a Random Environment", IRE Transactions on Information Theory, Vol. IT-8, April 1962.

10. Braverman, D., "Learning Filters for Optimum Pattern Recognition", IRE Transactions on Information Theory, Vol. IT-8, April 1962.

11. Lowitz, G. E., "Pattern Recognition Method Based on the Linear Separability of the Signal Space", Wescon 1963, Vol. 7, Part 6.

12. Fischler, M., R. L. Mattson, O. Firschein, and L. D. Healy, "An Approach to General Pattern Recognition", IRE Transactions on Information Theory, Vol. IT-8, April 1962.

13. Ackerman, E., Biophysical Science, Prentice-Hall, N. J., 1962.

14. Middleton, D., An Introduction to Statistical Communication Theory, McGraw-Hill, Inc., 1960.

15. DeRusso, P. M., State Variables for Engineers, John Wiley and Sons, Inc., 1965.

16. Meyer, P. L., Introductory Probability and Statistical Applications, Addison-Wesley Publishing Company, 1965.

17. Lowan, A. N., Tables of Probability Functions, National Bureau of Standards, 1941.


Appendix A

Learning by the Recognition Machine

(a) A recursion formula for machine learning.

Previously, in the initial description of the learning machine shown in Fig. 5, the mean vector and covariance matrix were assumed to be known. It would be more realistic to assume that neither the mean nor the covariance matrix is known, unless a prior learning procedure has been applied. Fortunately, it usually is the case that a set of learning samples, x_1, x_2, ..., x_k, is available from the teaching source. It is realized that the random environment in which the "patterns" are immersed is familiar to the machine; in other words, the covariance matrix is known a priori. The problem left is to acquire a knowledge of the mean by learning. Since all members of the learning samples are selected from the same random source, the probability distribution function is assumed to be of the same form, but with unknown parameters. An initial guess of these parameters will be required, and a distribution P(X) assumed, before these samples are exposed to the machine. A new p.d.f. P(X/x_1) will be formed after the first learning sample is used. Eventually, a final version of P(X/x_1, x_2, ..., x_k) can be derived when all these learning samples have been fed into the learning machine. Symbolically, it can be expressed as follows:

P(X) → P(X/x_1) → P(X/x_1, x_2) → ...... → P(X/x_1, x_2, ..., x_k)

or    L(X) → L(X/x_1, x_2, ..., x_k)                                       (2)

In its learning phase, the mean is denoted by the random vector M, where μ is the real mean of the sample mean and Φ is the variance of the random mean. The sample is

X = M + N                                                                  (3)

Hence

⟨X⟩ = ⟨M⟩ = μ                                                              (4)

Since the noise and sample mean are independent (9),

Covariance(X) = Covariance(M) + Covariance(N) = K + Φ                      (5)

In order to start the learning cycle, a guess (μ_0, Φ_0) on the mean must be made. Then

P(M/X_1) = P(X_1/M) P(M) / P(X_1)                                          (6)

where

P(X_1)   = c_3  (a priori probability)
P(X_1/M) = c_1 exp[-1/2 (X_1 - M)^t K^{-1} (X_1 - M)]
P(M)     = c_2 exp[-1/2 (M - μ_0)^t Φ_0^{-1} (M - μ_0)]

Collecting those terms involving M on the right side of equ. (6) and substituting, it becomes

P(M/X_1) = c exp{-1/2 [(X_1 - M)^t K^{-1} (X_1 - M) + (M - μ_0)^t Φ_0^{-1} (M - μ_0)]}
         = c exp{-1/2 [M^t (K^{-1} + Φ_0^{-1}) M - 2 (M^t K^{-1} X_1 + M^t Φ_0^{-1} μ_0)]}        (7)

where c = c_1 c_2 / c_3. Let

K^{-1} + Φ_0^{-1} = Φ_1^{-1}

(The learning of the mean is discussed for only one of the pattern classes.)

The iterated form of equ. (13) will assure the convergence of the learning process. A well known theorem (5) in statistics will provide a more powerful proof. The "zero-one law" says:

"The sequence P(B/Y_1, Y_2, ..., Y_n) of conditional probabilities of a property B of the sequence Y_1, Y_2, ..., Y_n, given the first n terms of the sequence, converges almost surely to 1 or 0 according as the sequence has or has not this property."

If t_0 is the true value of t, then

P(t/Y_1, Y_2, ..., Y_n) → δ(t - t_0)    as n → ∞                           (15)

where δ(t - t_0) is the Dirac delta function. In the event that the observations Y_1, Y_2, ... do have the properties of t, then the samples fed into the learning machine certainly will have the property of a specific mean. This was guaranteed by the fact that all the learning samples are selected from a certain pattern; in other words, the learning is directed. It is sufficient to assure the convergence of the learning process, though it is not necessary.

The convergence of the learning of the mean can be seen more clearly by rearranging equ. (13) in terms of the learning samples and then substituting (12) into (13),

Φ_2 = K (Φ_1 + K)^{-1} Φ_1
    = K [K (K + Φ_0)^{-1} Φ_0 + K]^{-1} K (K + Φ_0)^{-1} Φ_0
    = K [(K + Φ_0)^{-1} Φ_0 + I]^{-1} K^{-1} K (K + Φ_0)^{-1} Φ_0          (16)

also

Φ_2 = K [Φ_0 + (K + Φ_0)]^{-1} Φ_0 = K [K + 2Φ_0]^{-1} Φ_0

μ_2 = K (K + Φ_1)^{-1} μ_1 + Φ_1 (K + Φ_1)^{-1} X_2
    = K (K + Φ_1)^{-1} [K (K + Φ_0)^{-1} μ_0 + Φ_0 (K + Φ_0)^{-1} X_1] + Φ_1 (K + Φ_1)^{-1} X_2
    = K (K + Φ_1)^{-1} K (K + Φ_0)^{-1} μ_0 + K (K + Φ_1)^{-1} Φ_0 (K + Φ_0)^{-1} X_1 + Φ_1 (K + Φ_1)^{-1} X_2

now,

K (K + Φ_1)^{-1} K (K + Φ_0)^{-1} μ_0
    = K [K (K + Φ_0)^{-1} Φ_0 + K]^{-1} K (K + Φ_0)^{-1} μ_0
    = K [(K + Φ_0) {(K + Φ_0)^{-1} Φ_0 + I}]^{-1} μ_0
    = K [K + 2Φ_0]^{-1} μ_0

and

K (K + Φ_1)^{-1} Φ_0 (K + Φ_0)^{-1} X_1 + Φ_1 (K + Φ_1)^{-1} X_2
    = K (K + Φ_0)^{-1} Φ_0 (K + Φ_1)^{-1} X_1 + Φ_1 (K + Φ_1)^{-1} X_2
    = K (K + Φ_0)^{-1} Φ_0 (K + Φ_1)^{-1} (X_1 + X_2)
    = Φ_0 (K + Φ_0)^{-1} K [K + K (K + Φ_0)^{-1} Φ_0]^{-1} (X_1 + X_2)
    = Φ_0 [(K + Φ_0) {I + (K + Φ_0)^{-1} Φ_0}]^{-1} (X_1 + X_2)
    = Φ_0 [K + 2Φ_0]^{-1} (X_1 + X_2)

so that

μ_2 = K [K + 2Φ_0]^{-1} μ_0 + Φ_0 [K + 2Φ_0]^{-1} (X_1 + X_2)              (17)

It can similarly be shown by induction:

μ_3 = K [3Φ_0 + K]^{-1} μ_0 + Φ_0 [3Φ_0 + K]^{-1} (X_1 + X_2 + X_3)

Φ_3 = K [3Φ_0 + K]^{-1} Φ_0                                                (18)

Hence a recursion formula for adapting the random mean is found to be:

μ_{n+1} = K (K + nΦ_0)^{-1} μ_0 + Φ_0 (K + nΦ_0)^{-1} Σ_{i=1}^{n} X_i

Φ_{n+1} = K (K + nΦ_0)^{-1} Φ_0                                            (19)

or

μ_{n+1} = (1/n) K (K/n + Φ_0)^{-1} μ_0 + Φ_0 (Φ_0 + K/n)^{-1} (1/n) Σ_{i=1}^{n} X_i

Φ_{n+1} = (K/n) (Φ_0 + K/n)^{-1} Φ_0                                       (20)

It can be seen that

lim_{n→∞} μ_{n+1} = (1/n) Σ_{i=1}^{n} X_i                                  (21)

lim_{n→∞} Φ_{n+1} = ∅  (null set)                                          (22)

According to the statistical definition, (21) is just the real mean. (22) indicates that after a large number of samples have been processed, a constant mean vector would be updated and become the real mean vector. The learning behavior has been simulated on the computer; an apparent convergence property of the learning process was observed, though the learning samples were finite.
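A sketch of this directed learning step, using the closed form of (19) above, is given below (not the thesis program; Python with numpy; the covariance, initial-guess covariance, and sample sizes are illustrative, with the initial guess of 20 per component echoing Fig. 11):

    import numpy as np

    def learned_mean(mu0, phi0, K, samples):
        # Closed form of the recursion (19): after n labeled learning samples,
        # mu_{n+1} = K (K + n*Phi_0)^{-1} mu_0 + Phi_0 (K + n*Phi_0)^{-1} sum(X_i).
        n = len(samples)
        A = np.linalg.inv(K + n * phi0)
        return K @ A @ mu0 + phi0 @ A @ np.sum(samples, axis=0)

    # illustrative check: with many samples the learned mean approaches the sample mean
    rng = np.random.default_rng(1)
    true_mean = np.array([20.0, 40.0, 40.0, 20.0])
    K = 4.0 * np.eye(4)                                  # known noise covariance
    samples = rng.multivariate_normal(true_mean, K, size=200)
    mu0, phi0 = np.full(4, 20.0), 100.0 * np.eye(4)      # initial guess and its covariance
    print(learned_mean(mu0, phi0, K, samples))           # close to samples.mean(axis=0)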


Appendix B

1. Definition of error

When the learning samples were well selected, statistics were established. In the case where the random environment was known to the learning machine, a reasonable assumption was that the covariance matrix was known. The remaining job for the machine is to set up an adaptive scheme to update the random mean until the real mean, or an acceptable updated mean vector, has been found.

In a realistic situation, only a finite number of learning samples are available. Due to this physical limitation, termination of the machine learning within a short period of time is inevitable. Hence, errors due to misclassification are likely to be produced by the recognition machine. The asymptotic behavior of the learning phase was discussed in Appendix A. In this section the relation of the probability of error as a function of the signal to noise ratio and the cross-coefficient among patterns will be determined. The binary case is first considered using probability theory. A general formula for predicting the overall probability of error for the system can then be found.

Generally, patterns are in the form of continuous time functions,

S_1(t),    0 < t < T
S_2(t),    0 < t < T

In order to process those patterns in a digital type recognition machine, discrete sampling must be done before presenting them to the recognition machine. S_1(t), S_2(t) will be mapped into an m-dimensional vector space, i.e.

S_1 : {u_11^j, u_12^j, ....., u_1m^j}

S_2 : {u_21^j, u_22^j, ....., u_2m^j}                                      (1)

All are now in vector form. Using Bayes' rule, a likelihood function is found when an unknown sample, X, is presented to the recognizer. As previously obtained, the likelihood ratio function is L(X) = P(X/S_2)/P(X/S_1), where

P(X/S_2) = [1/((2π)^{m/2} |K|^{1/2})] exp[-1/2 (X - S_2)^t K^{-1} (X - S_2)]
                                                                           (2)
P(X/S_1) = [1/((2π)^{m/2} |K|^{1/2})] exp[-1/2 (X - S_1)^t K^{-1} (X - S_1)]

and K is the covariance matrix. The likelihood ratio is

L(X) = exp[1/2 (X - S_1)^t K^{-1} (X - S_1) - 1/2 (X - S_2)^t K^{-1} (X - S_2)]        (2-2)

It is to be compared with a threshold value Q (9), a decision being made as:

When L(X) > Q, classify X as S_2
When L(X) < Q, classify X as S_1

Errors occur when the machine recognizes X as being from S_1 when X is really from S_2, and vice versa.

For convenience in calculating the probability of error, let

α = false dismissal = probability of rejecting S_1 when it is true         (3)

β = false alarm     = probability of accepting S_1 when it is S_2          (4)

If the samples are partitioned so that S_1 is in region D_1 and S_2 is in region D_2, then

α = ∫_{D_2} P(X/S_1) dX                                                    (3-1)

β = ∫_{D_1} P(X/S_2) dX                                                    (4-1)

Assign q_1, q_2 to be the a priori probabilities of S_1, S_2 respectively. Then the probability of error will be

P_e = q_1 α + q_2 β                                                        (5)

For equally likely events, when q_1 = q_2 = 1/2, (5) is simply

P_e = α = β                                                                (6)

Equ. (6) implies the threshold value Q = 1. It can be shown (8) that the best choice for Q is

Q = q_2 C_β / (q_1 C_α)                                                    (6-1)

where C_α is the cost of making an error of the first kind (α) and C_β is the cost of making an error of the second kind (β).

Considering (2-2) and taking the logarithm of both sides, let

V = log L(X) = 1/2 (X - S_1)^t K^{-1} (X - S_1) - 1/2 (X - S_2)^t K^{-1} (X - S_2)        (7)

Since the right side is composed of X, V is also a random function with a Gaussian distribution. Choosing C_α = C_β, V is now to be compared with the threshold log Q = log 1 = 0:

X is recognized as S_2 when V > 0
X is recognized as S_1 when V < 0

According to (3-1) and (4-1),

α = ∫_0^∞ P(V/S_1) dV
                                                                           (8)
β = ∫_{-∞}^0 P(V/S_2) dV

The mean and variance of the random variable must be found before the computation of either α or β. To avoid unduly complicated manipulation, further assumptions are listed below:

(a) S_1 and S_2 are of equal energy, i.e.

    ∫_0^T S_1² dt = ∫_0^T S_2² dt = E,    ∫_0^T S_1 S_2 dt = ρE,    -1 < ρ < 1        (9)

    where ρ is the cross-coefficient between the two patterns, i.e., it is a measure of their similarity.

(b) The noise is white Gaussian, so that K = σ² I.

With these assumptions (7) is simplified to

V = X^t K^{-1} (S_2 - S_1)

when X = S_1 + N.*

* S_1 is equivalent to M_1; the notation is changed for clarity.

Since ⟨N⟩ = 0, the mean of V when X is S_1 is

⟨V⟩ = S_1^t K^{-1} (S_2 - S_1)

Hence, from the assumptions of (9), it becomes

⟨V⟩ = (S_1^t S_2 - S_1^t S_1)/σ² = E(ρ - 1)/σ²                             (10)

This is the mean when X is S_1. The variance has nothing to do with S_1 or S_2:

Variance of V = variance of V when X = N
             = ⟨[N^t K^{-1} (S_2 - S_1)] [N^t K^{-1} (S_2 - S_1)]⟩
             = (S_2 - S_1)^t K^{-1} ⟨N^t N⟩ K^{-1} (S_2 - S_1)

By definition K = ⟨N^t N⟩, so the variance of V is

(S_2 - S_1)^t K^{-1} (S_2 - S_1) = (S_2^t S_2 + S_1^t S_1 - 2 S_1^t S_2)/σ²
                                 = 2(E - ρE)/σ² = 2E(1 - ρ)/σ²             (11)

Substituting (9) and (10) into (8), then

α = ∫_0^∞ [1/(2π·Variance)^{1/2}] exp[-(V - ⟨V⟩)²/(2·Variance)] dV

  = ∫_0^∞ [1/(2π·Variance)^{1/2}] exp[-(V + E(1-ρ)/σ²)²/(4E(1-ρ)/σ²)] dV

Let

v' = V / (2√(E(1-ρ)/σ²))

Then

α = ∫_0^∞ (1/√π) exp[-(v' + √(E(1-ρ)/(4σ²)))²] dv'                         (12)

Again, let v" = v' + √(E(1-ρ)/(4σ²)), so that

α = ∫_{√(E(1-ρ)/(4σ²))}^∞ (1/√π) exp(-v"²) dv"                             (12-1)

(12-1) is of the error function form,

α = 1/2 [1 - erf(√(E(1-ρ)/(4σ²)))]                                         (13)

and from (6),

P_e = 1/2 [1 - erf(√(E(1-ρ)/(4σ²)))]                                       (14)

This is the probability of error of binary pattern recognition. The overall probability of error of the multiple pattern case can be derived on the binary basis.

Suppose n patterns are observed. There would then be n(n-1)/2 different probabilities of error, one per pair, treated as independent, mutually exclusive events. Let the overall probability of error be denoted by P_E. Exploiting a theorem developed in probability theory (16),

P_E = Σ_{i=1}^{n(n-1)/2} P_ei - Σ_{i<j=1}^{n(n-1)/2} P_ei P_ej + Σ P_ei P_ej P_ek - ...        (15)

For the three patterns of this study the cross-coefficients are

ρ_13 = ∫_0^T S_1 S_3 dt / ∫_0^T S_1² dt = K_2 ∫_0^{50} x sin(πx/100) dx / ∫_0^{50} x² dx = 0.99

ρ_12 = ∫_0^T S_1 S_2 dt / ∫_0^T S_1² dt = K_1 ∫_0^{50} x² · x dx / ∫_0^{50} x² dx = 0.968

where K_2 is the normalization constant of pattern "3".

Likewise, with K_1 the normalization constant of pattern "2",

ρ_23 = ∫_0^T S_2 S_3 dt / ∫_0^T S_1² dt = K_1 K_2 ∫_0^{50} x² sin(πx/100) dx / ∫_0^{50} x² dx        (16)

The binary error rates are then

P_e12 = 1/2 [1 - erf(√(E(1-ρ_12)/(4σ²)))]

P_e13 = 1/2 [1 - erf(√(E(1-ρ_13)/(4σ²)))]                                  (17)

P_e23 = 1/2 [1 - erf(√(E(1-ρ_23)/(4σ²)))]

P(A) = P_0 P_e12

P(B) = P_0 P_e23                                                           (18)

P(C) = P_0 P_e13

where P_0 = 1/3 is the a priori probability of misrecognition for each binary case, since the events A, B, and C are equally likely, and

A is an error between S_1 and S_2
B is an error between S_2 and S_3
C is an error between S_3 and S_1

Then

P_E = P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A∩B) - P(B∩C) - P(C∩A) + P(A∩B∩C)

or

P_E = P(A) + P(B) + P(C) - P(A)P(B) - P(B)P(C) - P(C)P(A) + P(A)P(B)P(C)   (19)

P(A), P(B) or P(C) can be calculated from equation (17); each represents the binary error between two patterns.

The theoretical prediction of misrecognition was shown in Fig. 10.
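For illustration, the sketch below (not from the thesis; Python with numpy and scipy, and it assumes that the signal to noise ratio is expressed as E/σ², which the thesis does not state explicitly) evaluates the cross-coefficients numerically and the theoretical overall misrecognition probability of (17)-(19):

    import numpy as np
    from scipy.special import erf
    from scipy.integrate import quad

    # equal-energy patterns of Chapter 4, equations (23)-(26)
    K1, K2 = np.sqrt(5.0 / 3.0) / 50.0, np.sqrt(2.0 / 3.0) * 50.0
    y1 = lambda x: np.where(x < 50.0, x, 100.0 - x)
    y2 = lambda x: K1 * np.where(x < 50.0, x**2, (100.0 - x)**2)
    y3 = lambda x: K2 * np.sin(np.pi * x / 100.0)

    E = quad(lambda x: y1(x)**2, 0.0, 100.0)[0]          # common pattern energy

    def rho(a, b):
        # cross-coefficient between two patterns, as defined in (9)
        return quad(lambda x: a(x) * b(x), 0.0, 100.0)[0] / E

    def pe_binary(r, snr):
        # binary error rate (14), with E/sigma^2 written as the assumed S/N ratio
        return 0.5 * (1.0 - erf(np.sqrt(snr * (1.0 - r) / 4.0)))

    def pe_overall(snr):
        # overall misrecognition probability of (18)-(19) for the three patterns
        a = pe_binary(rho(y1, y2), snr) / 3.0
        b = pe_binary(rho(y2, y3), snr) / 3.0
        c = pe_binary(rho(y3, y1), snr) / 3.0
        return a + b + c - a*b - b*c - c*a + a*b*c

    print(round(rho(y1, y2), 3), round(rho(y1, y3), 3))  # about 0.968 and 0.99
    print(pe_overall(15.0))                              # theoretical P_E at S/N = 15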

Appendix C

Flow Chart of the Simulation

[Flow chart, part 1: Gaussian function generator and pattern generator. Twelve uniformly distributed numbers RAND(0) are summed and 6 is subtracted to give E; the noisy samples are then formed as U(K,I,J) = M(K,J) + E*VAR for pattern K = 1, 2, 3, example I = 1 to 500, and sample point J = 1 to 4.]

[Flow chart, part 2: Calculation of the covariance matrix and recognition. The deviations D(K,I,J) = U(K,I,J) - M(K,J) are accumulated into AA(IR,IC) = Σ_K Σ_L D(K,L,IR)·D(K,L,IC)/1500 and AA is inverted by a subroutine. For recognition, a class KK is drawn at random, the sample XX(J) = U(KK,I,J) is formed, P(KX) = -1/2 [XX(J)-M(KX,J)]^t AA^{-1} [XX(J)-M(KX,J)] is computed for KX = 1, 2, 3, the largest value (PLIKELI) determines the decision, errors are counted over 500 trials, and the error rate = Error/500 is reported.]

[Pages 52 to 55: FORTRAN listing of the simulation program. The main program implements the flow chart above (pattern generation, covariance calculation and inversion, recognition, and error counting), followed by a matrix inversion subroutine.]


VITA

The author was born on February 21, 1940, in Kwangsi, China. He completed his primary and high school education in Taichung, Taiwan, China. He entered Taiwan Provincial Cheng Kung University in September, 1958 and received his Bachelor of Science Degree in Electrical Engineering in June, 1962.

He came to the United States in January, 1965 and enrolled at the University of Missouri at Rolla, working toward the Master of Science degree in the Electrical Engineering Department.