A class of pattern recognition machines
Scholars' Mine
Masters Theses, Student Theses and Dissertations
1967
A class of pattern recognition machines
Shaw Yih Chung
Follow this and additional works at: https://scholarsmine.mst.edu/masters_theses
Part of the Electrical and Computer Engineering Commons
Recommended Citation: Chung, Shaw Yih, "A class of pattern recognition machines" (1967). Masters Theses. 5167. https://scholarsmine.mst.edu/masters_theses/5167
This thesis is brought to you by Scholars' Mine, a service of the Missouri S&T Library and Learning Resources. This work is protected by U. S. Copyright Law. Unauthorized use including reproduction for redistribution requires the permission of the copyright holder. For more information, please contact [email protected].
A CLASS OF PATTERN RECOGNITION MACHINES
By
Shaw Yih Chung

A THESIS
submitted to the faculty of the
UNIVERSITY OF MISSOURI AT ROLLA
in partial fulfillment of the work required for the
Degree of
MASTER OF SCIENCE IN ELECTRICAL ENGINEERING
Rolla, Missouri
1967
Approved By
(advisor)
ACKNOWLEDGEMENT
The author wishes to express his appreciation to his
advisor, Dr. F. J. Kern, of the Electrical Engineering Department of the University of Missouri at Rolla, for his helpful guidance throughout this study.
Special thanks are due to Dr. J. R. Betten, Professor of
Electrical Engineering, for his encouragement in initiating
this research.
The author also wishes to express his thanks to Dr. T. L.
Noack for his valuable help and comments.
TABLE OF CONTENTS
ACKNOWLEDGEMENT
LIST OF ILLUSTRATIONS
ABSTRACT
LIST OF SYMBOLS
CHAPTER 1 - INTRODUCTION
CHAPTER 2 - A SPECIAL CLASS OF PATTERN
CHAPTER 3 - THE STRUCTURE OF RECOGNITION MACHINE
CHAPTER 4 - SIMULATION OF THE PATTERN RECOGNITION MACHINES
BIBLIOGRAPHY
APPENDIX A - LEARNING BY THE RECOGNITION MACHINE
APPENDIX B - CALCULATION OF MISRECOGNITION RATE
APPENDIX C - FLOW CHART OF THE SIMULATION
VITA
ABSTRACT
The classification of signals through the use of pattern
recognition techniques may be viewed as a statistical classification problem since at least some pattern classes are
analog signals which have been contaminated by additive noise.
In order to represent these signals at the input of a pattern classifier, they are sampled and a vector in an m-dimensional vector space is used to describe the pattern. Consequently,
these pattern samples are processed to determine the parameters
of the statistics. In this study the additive noise was assumed
to be gaussian in nature.
This study was based on a priori knowledge of the learning
samples. That is to say, learning with a teacher was investigated. The three classes of patterns used in the experimental
program were generated through computer simulation by adding
normally distributed random numbers to previously generated
signal classes. The sampled signals thus generated were
classified by means of maximum likelihood detectors.
The study was divided into two parts. In the first part
the statistics of the noise were calculated and the patterns
were classified on the basis of these statistics with no
learning taking place. In phase two of the study, learning
behavior was observed when a sequential calculation was made
to determine the noise statistics. It was shown that the
probability of mis-recognition of a given pattern asymptotically
approached the theoretical minimum as the number of learning
samples increased. The probability of the misrecognition of
the patterns was also considered as a function of signal to
noise ratio and correlation between patterns. A theoretical
prediction of this probability of misrecognition was compared
with the results of simulation and found to agree closely.
List of Symbols

A        Constant
c        Constant
a_ij     Member of a covariance matrix
d(X)     Decision rule
I        Identity matrix
K        Covariance matrix
m        Dimensionality of the vector space
M        Mean vector
M(k)     Scalar component of a mean vector
n        Number of pattern classes
N        Noise vector
L(X)     Likelihood function
p(X)     Probability function
P(S/X)   Conditional probability function
P_0      A priori probability of binary error
Q        Threshold value
P_e      Probability of error in recognition between two pattern classes, binary error rate
P_E      Overall probability of misrecognition
q_0      A priori probability of each pattern class
S(t)     Pattern sample in the form of a continuous time function
S        Pattern sample in vector form
t        Time variable
T        Observation time of one sample
U_i      Scalar component of a sample vector
V        Logarithm of the likelihood function
X        Unknown pattern sample, or input stimulus
α        Probability of error of the first kind
β        Probability of error of the second kind
δ(t − t_0)   Dirac delta function
φ        Covariance matrix of the random mean
μ        Mean of a random signal
ρ        Correlation between patterns
σ²       Noise variance
∅        Null set
Λ(X)     Likelihood ratio function
C_α      Cost of making an error of the first kind
C_β      Cost of making an error of the second kind
θ        Random variable representing an unknown parameter or parameters
|A|      Determinant of a matrix
CHAPTER 1 - INTRODUCTION
The natural world, due to its random behavior, presents
humans with a variety of phenomena which are well categorized
and recognized by people as belonging to one class or to a set
of different classes. This recognition is generally accomplished through a learning of the various patterns. Hence, one may
tell the difference between "lion" and "tiger" or "A" and "B".
Any object in the world may be described as a distinct pattern
provided one is willing to assign an arbitrarily long description to it. Because of the size and complexity of this problem,
one would expect the "hardware" which could do the job of
pattern recognition would be large and complex. Furthermore,
the realization of a machine which will exhibit the human behavior called "learning" is difficult on other than a simulation basis. In the beginning, at least,
no attempt will be made to discriminate between "automobile"
and "tiger" but rather between "lion" and "tiger". The differ
ence seen here would be that lion and tiger are in the same
general class of natural objects whereas automobile and tiger
are not. Consequently, this study will be concerned with the
recognition of pattern samples coming from the same set of
natural objects.
Other difficulties arise in the processing of patterns
by pattern recognition machines. In order to be able to distinguish "tiger" from "lion" one must provide a proper encoding procedure so that a viable mathematical description of the
pattern to be recognized is provided to the machine. A well
designed pattern recognition machine can recognize different
patterns only if the proper sensory equipment is available or,
alternatively, the proper encoding procedure for the various
patterns is done by a human prior to the presentation of the
patterns to the input terminals of the machine. For the moment,
suppose a pattern recognition machine has been built and is
available. Further suppose that an unknown pattern named X
is presented to it. The function of the machine is now to
identify the pattern as belonging to one of the pattern classes
which the machine is constructed to recognize.
Since

X = S_i + N,    i = 1, 2, ..., n                    (1)

where X is the unknown sample, N is the contaminating noise signal, n is the number of pattern categories, and S_i is the pattern class,
the processing of the input signal done by the machine will
be statistical in nature. This type of calculation arises
because equation (1) defines X as a random variable.
The development of a statistical decision theory and the
advent of the readily available large scale computing facili
ties have obviated the need to actually construct specialized "hardware" to deal with pattern recognition problems. It is
much more economical and much simpler to simulate these
machines on a general purpose digital computer. The
algorithms necessary to implement this type of pattern
recognition owe their development, by and large, to the
recent interest in implementations of statistical decision
theory. A large amount of literature, most of it since 1960, may be found which pertains to the statistical pattern recognition problem (9) and (10). Many investigators in this field (1), (2), and (3) adopted the idea of the sample as being represented by a vector in a multi-dimensional vector space. The
classification problem then reduces to a partitioning problem
in a multi-dimensional vector space; (1) the dimension of the
space being the same as the number of sampling points per
pattern. These sampling points may be the result of a sampling
process carried out on a continuous time function or might
simply represent a proper encoding of features which go to
make up a particular pattern. The classification of these
patterns may then be done by partitioning the resultant finite
dimensional vector space so that all members of one class fall
inside a given boundary and the space is divided into disjoint
subspaces. Each subspace will contain only members of one
particular given class. (1)
Pattern recognition schemes differ from one another de-
pending on how one approaches the problem of partitioning
the space. Many investigators have assumed optimal knowledge about the samples. Optimal schemes for space partitioning have been developed using hyperplanes (3). These schemes
revolve around the use of variable threshold logic units as
the basic building block of the pattern recognition machine
and are preferred in situations where the signal is noiseless. That is to say, this type of machine is employed in
instances where the pattern space is linearly separable.
Alternate schemes along this same avenue of approach have been developed using hyperspheres instead of hyperplanes (3).
When the pattern space is not linearly separable the threshold
logic approach to pattern recognition may not result in machines
which will perfectly classify a set of patterns. The problem
then becomes one of maximizing the probability of classifying a given pattern correctly. Thus, when the patterns
contain random noise or other contaminating components, pattern
recognition may be viewed as a statistical classification prob-
lem in m-dimensional vector space. The recognition problem is
thus tied in with the existence of probability density functions
which will describe the given patterns and their associated
noise components. This is necessary so that one may build a
statistical model which adequately represents the recognition machine to be simulated. It has been pointed out (10), (4) that
the maximum likelihood ratio detection scheme using linear
threshold detectors is an optimal classification procedure
if the noise is uncorrelated with the signal and gaussian in
nature.
There are two important phases in pattern recognition
problems, learning and decision. In the learning phase the
pattern recognition machine must adapt itself to obtain the
threshold weights for determining hyperplanes or alternatively,
the statistics of the pattern class so that the probability
of classification may be raised to an acceptable level. Two
types of learning may be simulated. The first is learning
with a teacher and the second is learning without a teacher.
For the former (7), (8), the decision boundaries are labeled
so that one knows in advance what class a particular pattern
belongs to during the learning phase. This is shown in
figure lA.
Fig. 1A  Learning with a teacher (m-dimensional space)

Fig. 1B  Learning without a teacher (m-dimensional space)
In learning without a teacher the recognizer is completely
isolated from the source generating the pattern classes so
that the decision boundaries are not known and must be
determined. This type of situation is shown in figure lB.
Comparing the two figures one notes the rather obvious advantages of learning with a teacher. A more complicated and
adaptive scheme will be required for learning without a
teacher, since the decision boundaries may or may not be hyperplanes. The hyperplane separation does not necessarily determine an optimal decision boundary. In figure 1C an optimal hyperplane may be found, while in figure 1D one notices that it may not.
Fig. 1C  Linearly separable patterns (hyperplane, m-dimensional space)

Fig. 1D  Linearly inseparable patterns (m-dimensional space)
For instance, if the patterns are a normal random process
with equal covariance matrices, an optimal recognition scheme
may be structured by finding a proper hyperplane separating
regions of these classes. The probability of error in recognition of each class would be a minimum when the optimal decision boundary is found.
Fig. 2A  An optimal decision boundary (hyperplane, m-dimensional space)

Fig. 2B  A non-optimal decision boundary (hyperplane, m-dimensional space)
As may be seen by comparing figures 2A and 2B, the optimal
decision boundary may not always be a hyperplane, but may
rather be a hyperbola or parabola or even possibly a wavy line of some sort. In more complicated situations involving more than two classes, learning without a teacher may be accomplished provided one first determines the statistical properties
of the patterns to be classified. Both parametric and nonparametric techniques have been developed whereby one may
"learn" these statistics by sequentially processing a series
of patterns during the learning phase.
Once the learning phase of the pattern recognition sequence has been completed the decision phase is entered. In
this phase additional patterns from the same sample set may
be classified while the structure of the machine is kept
fixed. In the decision phase then the structured machine may
be used to classify patterns from the same sample set even
though they were not included in the set of patterns used
during the learning phase of the recognition process. Numerous studies have shown that in practical situations the
structured pattern recognizer possesses predictive properties
which enable it to classify patterns quite reliably once the
learning phase has been properly carried out.
CHAPTER 2
A Special Class of Pattern Recognition
In application the patterns to be classified are produced as the result of a proper encoding procedure. One
practical example of this procedure would be the sampling
of a continuous time function within a given time window
duration T. This study will concern itself with the classification of signals of this type. Pattern examples may be
thought of as a sequence of m sample values expressible as:
S_1^j(t_1), S_1^j(t_2), ..., S_1^j(t_m)
S_2^j(t_1), S_2^j(t_2), ..., S_2^j(t_m)                    (2)
. . . . . . . . .
S_n^j(t_1), S_n^j(t_2), ..., S_n^j(t_m)

0 ≤ t ≤ T,    j = 1, 2, ..., p
The number of pattern examples in each class need not be
the same; however, it is convenient if the same number of
examples are selected from each different pattern category.
The patterns represented in (2) may be treated as vectors in
an m-dimensional vector space.
S_1^j = {u_11^j, u_12^j, ..., u_1m^j}
S_2^j = {u_21^j, u_22^j, ..., u_2m^j}                      (3)
. . . . . . . .
S_n^j = {u_n1^j, u_n2^j, ..., u_nm^j}
For simplicity, the i-th sample in vector form will be simply denoted by the capital letter S_i^j. In equation (1) it was seen that the patterns were corrupted by noise, which will be presumed Gaussian in nature. In view of this, (3) may be rewritten as follows:
S_1^j = M_1^j + N
S_2^j = M_2^j + N                                          (3-1)
. . . . . . . .
S_n^j = M_n^j + N

where M is the mean vector and N is the noise signal, which is presumed to be stationary and Gaussian with zero mean value. Thus,

S̄_i^j = M̄_i^j + N̄ = M_i^j + N̄,    i = 1, 2, ..., n

Since N̄ = 0, and it is assumed that

M_i^j N = 0    for all i    (the signal being uncorrelated with the noise),

it follows that

M̄_i^j = M_i                                                (4)
S̄_i^j = M_i,    i = 1, 2, ..., n

and M_i is seen to be the mean vector, with scalar components

M_ik = lim_{p→∞} (1/p) Σ_{j=1}^{p} u_ik^j,    k = 1, 2, ..., m

Fig. 3  An Ensemble of Sampled Signals and Their Mean.
Likewise, N may also be denoted as an m-dimensional vector, N = {N_1, N_2, ..., N_m}. The covariance matrix may be found (9) from the relation

K = N^t N    (ensemble average)

where each term in the covariance matrix is defined to be

a_pq = N_p N_q = (u_ip − M_ip)(u_iq − M_iq) = a_qp    (ensemble averages)
i = 1, 2, ..., n;    p, q = 1, 2, ..., m
Note that the covariance matrix is real symmetric and that a_ii > 0 for all i.
It can be shown (14) that the covariance matrix is positive definite. A real symmetric matrix A is said to be positive definite if there exists a nonsingular matrix X such that X^t A X = I or, equivalently, if x^t A x > 0. Furthermore, a matrix is positive definite if all leading principal minors are positive (15). It follows that the matrix A^{-1} is also positive definite:

let x = A^{-1} y,  so that  x^t = y^t A^{-1};

then x^t A x = (y^t A^{-1}) A (A^{-1} y) = y^t A^{-1} y > 0        (10)

It is clear that K^{-1} is also positive definite. This is essential to the maximum likelihood decision scheme which will be considered.
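As a numerical check of these properties, consider the following sketch (Python; the names and noise level are illustrative assumptions, not taken from the thesis). It forms a sample covariance matrix from an ensemble of zero-mean noise vectors, then verifies symmetry and positive definiteness, the latter through a Cholesky factorization, which exists only for positive definite matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4                                          # dimensionality of the vector space
noise = rng.normal(0.0, 2.0, size=(1500, m))   # ensemble of zero-mean noise vectors

# Sample covariance matrix: ensemble average of the products N_p N_q
K = noise.T @ noise / len(noise)

assert np.allclose(K, K.T)                     # K is real symmetric
np.linalg.cholesky(K)                          # raises LinAlgError unless K is positive definite
print(all(np.linalg.det(K[:k, :k]) > 0 for k in range(1, m + 1)))  # leading minors > 0
```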
Chapter 3
The Structure of the Recognition Machine
It can be shown (4), (10) that for a Gaussian process, Bayes' rule is an optimal decision scheme which minimizes the probability of misrecognition. Bayes' rule is:

d(X) = S_0;  that is, the unknown pattern X ∈ S is recognized as a member of pattern class S_0

if  P(S_0/X) > P(S_i/X)    for all i ≠ 0                    (11)
Thus, the pattern S_0 is the pattern with the largest a posteriori
probability from among all a posteriori probabilities in order
to make the decision in (11). All patterns are compared with
the unknown pattern X. So, the a posteriori probability can
be described in a more detailed form as:
P(S_i/X, {S_j}),    i, j = 1, 2, ..., n                    (12)
The same a posteriori probabilities are preserved when the sample means rather than the samples themselves are used in (12). In statistical terminology, the set of sample means M̄_j, j = 1, 2, ..., n, is a sufficient statistic (10) for the parameters.
Equation (12) then becomes:

P(S_i/X, {M_j}) = P(S_i) P(X, {M_j}/S_i) / P(X, {M_j})        (13)

i, j = 1, 2, ..., n
Suppose that the occurrence of the events S_1, S_2, ..., S_n is equally likely, i.e., the a priori probability of appearance is the same for all patterns:

P(S_i) = A_1                                               (14)
The joint probability density can be written as:

P(X, {M_j}) = Σ_{i=1}^{n} P(S_i) P(X, {M_j}/S_i)
            = Σ_{i=1}^{n} P(S_i) P({M_j}/S_i) P(X/{M_j}, S_i)        (15)

The first two terms are obviously not related to the occurrence of X. The third term may vary for different S_i, but the summation over all pattern space will be simply the a priori probability of its occurrence. This indicates that the quantity on the left side of (15) is a constant. Let

P(X, {M_j}) = A_2                                          (15-1)
Substituting (14) and (15-1) into (13), it follows that

P(S_i/X, {M_j}) = A_3 P(X, {M_j}/S_i),    A_3 = A_1/A_2         (16)

also

P(X, {M_j}/S_i) = P({M_j}/S_i) P(X/{M_j}, S_i)                  (17)

Since M_i ∈ {M_j} and the probability function in (17) is conditional on S_i, only M_i will influence its formation. The first term on the
right side of (17), which is the distribution of all the means, is presumed to be known a priori. Thus

P({M_j}/S_i) = P({M_j}),    j = 1, 2, ..., n
             = a priori distribution of the means
             = A_4

This may be learned by following a scheme which is presented in Appendix A. (16) can then be rewritten as:

P(S_i/X, {M_j}) = A P(X/M_i, S_i),    where A = A_3 A_4         (18)
Thus, P(S_i/X, {M_j}) is a monotone function of P(X/M_i, S_i), which is called the likelihood probability function. The decision rule of (11) will be based on this likelihood function instead of (12), i.e.,

d(X) = S_0    if    P(X/S_0, M_0) > P(X/S_i, M_i)    for all i ≠ 0        (18-1)

(i = 0 is a trivial exception.)
Now, X is a Gaussian vector in m-dimensional space; therefore, it will have a multi-dimensional probability distribution (4) which is given by the equation

P(X/M_i, S_i) = (2π)^{-m/2} |K|^{-1/2} exp[−(1/2)(X − M_i)^t K^{-1} (X − M_i)]        (19)

The information regarding M_i is contained in S_i. Hence, the notation of the likelihood probability function will be simplified hereafter to P(X/S_i)        (20)
It is indicated that the maximum value of (20) occurs when the unknown pattern sample has exactly the same mean as the i-th pattern class, though this is not likely to happen.
A prototype recognition scheme can be realized by using the decision rule outlined by (18-1) in connection with the likelihood function developed in Equ. (20). It is shown in Fig. 4.
In this study, three pattern classes were generated by
the computer. First, implementation of the recognition machine was done on the assumption that the mean and covariance were known in advance. A complete block diagram of this recognition scheme, with no learning taking place, is shown in Fig. 5.
Following this an implementation of the learning algorithm
discussed in Appendix A is investigated.
Fig. 4  Prototype recognition scheme.
Fig. 5  Recognition scheme: mean and covariance were calculated beforehand.
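A minimal sketch of the structure in Figs. 4 and 5 follows (Python; the class means, covariance, and names are illustrative assumptions, not values from the thesis). Because the factor in front of the exponential in (19) is common to every class, maximizing P(X/M_i, S_i) is equivalent to minimizing the quadratic form (X − M_i)^t K^{-1} (X − M_i), and the sketch compares the quadratic forms directly:

```python
import numpy as np

def recognize(x, means, K_inv):
    """Decision rule (18-1): assign x to the class whose quadratic form
    (x - M_i)^t K^{-1} (x - M_i) is smallest, i.e., whose Gaussian
    likelihood (19) is largest under the common covariance K."""
    forms = [(x - m_i) @ K_inv @ (x - m_i) for m_i in means]
    return int(np.argmin(forms))

# Illustrative two-class example in a 4-dimensional vector space
K_inv = np.linalg.inv(4.0 * np.eye(4))                  # white noise, sigma^2 = 4
means = [np.array([20.0, 40.0, 40.0, 20.0]),
         np.array([10.0, 41.0, 41.0, 10.0])]
x = means[0] + np.random.default_rng(1).normal(0.0, 2.0, size=4)
print("recognized as pattern", recognize(x, means, K_inv) + 1)
```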
Chapter 4
Simulation of the Pattern Recognition Machines
(a) Generation of the patterns
Imagine that the sets of pattern vectors in Fig. 4 are
the outputs of a random channel, whose input is the "TRUE"
pattern. Assume that the non-dispersed pattern in its
deterministic form can be described by an algebraic expression.
A typical E.K.G. waveform (13), shown in Fig. 6, will serve as a convenient example for discussion.

Fig. 6  A typical E.K.G. waveform (P, QRS, and T waves)
In usual E.K.G. measurement practice, the data measured from
the human body, which can be viewed as a random channel, would
tend to be normally distributed. For the convenience of simulation, only the QRS part of the wave will be of concern. This portion might be approximated by a triangular configuration which is expressible by the algebraic equations:
y_1 = x,           0 ≤ x ≤ 50
y_1 = 100 − x,    50 ≤ x ≤ 100                             (23)
This function is shown in Fig. 7
Fig. 7  Pattern "1"
Physical abnormality or disease will distort the normal wave so that patterns are produced. These patterns are generally perceived as distortions of the E.K.G. pattern. For simulation purposes, these distortions may be represented by
y_2 = K_1 x²,             0 < x < 50
y_2 = K_1 (100 − x)²,    50 < x < 100                      (24)

as shown in Fig. 8,

Fig. 8  Pattern "2"

and also

y_3 = K_2 sin(πx/100),    0 < x < 100                      (25)
as shown in Fig. 9.

Fig. 9  Pattern "3"
In Equ. (24), (25), y_2 and y_3 must be normalized so that

∫_0^T y_1² dx = ∫_0^T y_2² dx = ∫_0^T y_3² dx,    T = 100        (26)

where y_1, y_2 and y_3 are generated by (23), (24) and (25) respectively. Using Equ. (26), it was found that

K_1 = (5/3)^{1/2} / 50    and    K_2 = (2/3)^{1/2} · 50

satisfy (26), so that the patterns are of equal energy. Mathematical complications are reduced in calculating the probability of error (Appendix B) when the patterns are so normalized. The list of categories in this simulation program is assumed exhaustive, i.e., all possible outcomes are included.
Thus, if X ∈ S,                                            (27)

and n equals 3 in the problem being studied. The patterns described by (23), (24), (25) are immersed in a random environment so that
S_1 = y_1 + N(0, σ²)
S_2 = y_2 + N(0, σ²)                                       (28)
S_3 = y_3 + N(0, σ²)
The random function is Gaussian distributed with zero mean and variance σ². In order to simulate the patterns described by (28), a complete computer program conducting the simulation and recognition process, with its flow chart, is included in Appendix C.
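The patterns (23)-(25) and the normalization (26) can be written out compactly. The sketch below (Python; an illustration, not the thesis program) defines the three deterministic signals with the constants K_1 and K_2 found above and checks the equal-energy condition numerically:

```python
import numpy as np

K1 = np.sqrt(5.0 / 3.0) / 50.0     # normalization constant of pattern "2", eq. (26)
K2 = 50.0 * np.sqrt(2.0 / 3.0)     # normalization constant of pattern "3", eq. (26)

def y1(x):  # eq. (23): triangular approximation of the QRS complex
    return np.where(x < 50, x, 100.0 - x)

def y2(x):  # eq. (24): parabolic distortion
    return K1 * np.where(x < 50, x**2, (100.0 - x)**2)

def y3(x):  # eq. (25): sinusoidal distortion
    return K2 * np.sin(np.pi * x / 100.0)

# Equal-energy check (26): each integral should equal 2*50^3/3 = 83333.3
x = np.linspace(0.0, 100.0, 200001)
dx = x[1] - x[0]
print([round(float(np.sum(f(x) ** 2) * dx), 1) for f in (y1, y2, y3)])
```

Adding N(0, σ²) deviates to these functions at the sampling instants then yields the noisy pattern samples of (28).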
(b) Simulation Results
A normal distribution of the random function was assumed from the beginning of this study. Thus, the simulation pro-
gram has to generate these patterns described by (28) with
normal statistics. It has been shown (16) that the sum of n independent random functions is approximately normally distributed when n is sufficiently large.
Let x_1, x_2, ..., x_n be a sequence of n independent random functions with mean μ and variance σ². Let N_0 be a new random function such that

x = x_1 + x_2 + ... + x_n                                  (29)

N_0 = (x − nμ) / √(nσ²)                                    (30)

Then N_0 has approximately a normal distribution N(0, 1), i.e., with mean zero and variance 1.
To avoid unnecessary computations, a proper value of n should be used. For a uniform random function distributed between (0, 1),

p(x) = 1,    0 < x < 1
     = 0,    elsewhere

then

μ = ∫_0^1 x p(x) dx = ∫_0^1 x dx = 1/2

σ² = ∫_0^1 x² p(x) dx − μ² = 1/12

If one chooses n = 12, then (30) reduces to

N_0 = x − 6                                                (31)
In general, only uniformly distributed random functions are available as a standard subroutine. Hence, by using (31), a normally distributed function with mean 0 and variance 1 can be achieved approximately. For different noise levels, (31) is to be multiplied by a factor:

N = a N_0,    a > 1                                        (31-1)
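A sketch of the generator described by (29)-(31-1) follows (Python; in the thesis program the uniform deviates come from a library routine RAND(0), and the scale factor a plays the role of the noise standard deviation):

```python
import random

def gauss(a=1.0):
    """Approximate N(0, a^2) deviate by the central limit theorem:
    twelve independent uniform (0,1) deviates have mean 6 and total
    variance 12 * (1/12) = 1, so their sum minus 6 is approximately
    N(0, 1), eq. (31); multiplying by a sets the noise level, eq. (31-1)."""
    return a * (sum(random.random() for _ in range(12)) - 6.0)
```

The approximation truncates the distribution at ±6a, which is harmless at the error rates considered here.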
The two important phases of the study were, first, the recognition of patterns without learning taking place, and second, recognition with learning features. In the first phase of this study, both the mean vectors of the pattern classes and the covariance matrix were assumed to be known in advance. The likelihood probability, or likelihood function, of equation (20) was calculated for each different pattern class. When the largest P(X/S) or L(X/S) was found, a decision was made subject to the decision rule outlined in (18-1). In this phase of the study, no learning took place since the machine was allowed to know the pertinent statistics beforehand.
To observe the behavior of the recognition machine, patterns with different signal to noise ratios were simulated.
It was found that the misrecognition rate monotonically decreased as the signal to noise ratio increased, which agreed with the theoretical predictions, Equ. (16), developed in Appendix B. The result of the first phase, in which no learning took place, is shown in Fig. 10. In the second
phase of the study, the recognition machine was then isolated
from the generating source. In the beginning of this phase,
the machine had no knowledge of the mean vector of the pattern
classes; hence, an initial guess on the mean vector was
required for each pattern. The sequential learning procedure
started when a set of learning samples was fed into the
machine which had a learning feature. The initial guess on
the mean was taken as the real mean until the first learning
sample was fed into the learning machine. The recognition
procedure was essentially the same as that in the first phase.
The learning in this phase was directed insofar as the learning samples fed into the machine were labeled; in other words, the pattern class of each learning sample was made known. By the use of equ. (20) in Appendix A, the mean vector for
each pattern class was updated whenever a new learning sample
was fed into the learning machine. The updated mean in turn
would serve as the "real" mean. So long as the learning was directed, the mean vectors learned in this manner converged to an acceptable level after only a few samples were fed in, so that the misrecognition of the system asymptotically approached the theoretical minimum set by equ. (19) in Appendix B as the number of learning samples increased. The result of the second phase is shown in Fig. 11.
Fig. 10  Error rate as a function of signal to noise ratio (P_E %, showing the P_E of simulation, the theoretical P_E, and the 90% confidence level, for S/N from 5 to 15).
Fig. 11  Probability of misrecognition as a function of the number of learning samples (variance = 2, initial guess {20, 20, 20, 20} for all patterns, S/N = 15; final P_E = 3.6%).
In the previous discussion, the covariance matrix was assumed
fixed. Hence, channel characteristics were not changing
during the sequential learning period. Also, the rate of change of the mean vectors of the pattern classes was assumed very slow or stationary, so that the "real" mean will finally be learned when a considerable number of learning samples are allowed.
(c) Conclusions and Further Recommendations for Study
The study in its entirety was on a simulation basis. One
can imagine the approach to the problem to be realistic,
subject to the restrictions posed throughout the study. The
sequential learning procedure was much more like an on-line operation. This strongly indicated the possibility of utilizing this recognition scheme in a more general context. The
removal of some of the restrictions in the theoretical deri-
vation would present no little difficulty. Since most problems
are more or less statistical in nature, the statistics of
other pattern recognition problems might not be the same as
the one here being studied. An optimal scheme for one statistic
may not be useful for the others. The development of a general
optimal recognition scheme deserves the attention of researchers. In the learning phase, the statistics were all assumed fixed or time stationary. It would be more realistic to remove this restriction, so that a machine which could track wandering patterns (9) might be of more value to the pattern recognition problem.
BIBLIOGRAPHY

1. Sebestyen, G. S., "Pattern Recognition by an Adaptive Process of Sample Set Construction", IRE Transactions on Information Theory, Vol. IT-8, April, 1962.
2. Sebestyen, G. S., "Recognition of Membership in Classes", IRE Transactions on Information Theory, Vol. IT-7, January, 1961.
3. Cooper, P. W., "The Hyperplane in Pattern Recognition", Information and Control, Vol. 5, 1962.
4. Wainstein, L. A., and V. D. Zubakov, Extraction of Signals from Noise, Prentice-Hall, Inc., N. J., 1962.
5. Spragins, J. D., Jr., "Reproducing Distributions for Machine Learning", Stanford Electronics Lab., Technical Report No. 6103-7, November, 1963.
6. Patrick, E. A., and J. C. Hancock, "Non-supervised Learning of Probability Spaces and Recognition of Patterns", IEEE International Convention Record, April, 1965.
7. Patrick, E. A., and J. C. Hancock, "Non-supervised Sequential Classification and Recognition of Patterns", IEEE Transactions on Information Theory, Vol. IT-12, July, 1966.
8. Hancock, J. C., and P. A. Wintz, Signal Detection Theory, McGraw-Hill Book Company, 1966.
9. Abramson, N., and D. Braverman, "Learning to Recognize Patterns in a Random Environment", IRE Transactions on Information Theory, Vol. IT-8, April, 1962.
10. Braverman, D., "Learning Filters for Optimum Pattern Recognition", IRE Transactions on Information Theory, Vol. IT-8, April, 1962.
11. Lowitz, G. E., "Pattern Recognition Method Based on the Linear Separability of the Signal Space", Wescon 1963, Vol. 7, Part 6.
12. Fischler, M., R. L. Mattson, O. Firschein, and L. D. Healy, "An Approach to General Pattern Recognition", IRE Transactions on Information Theory, Vol. IT-8, April, 1962.
13. Ackerman, E., Biophysical Science, Prentice-Hall, N. J., 1962.
14. Middleton, D., An Introduction to Statistical Communication Theory, McGraw-Hill, Inc., 1960.
15. DeRusso, P. M., State Variables for Engineers, John Wiley and Sons, Inc., 1965.
16. Meyer, P. L., Introductory Probability and Statistical Applications, Addison-Wesley Publishing Company, 1965.
17. Lowan, A. N., Tables of Probability Functions, National Bureau of Standards, 1941.
Appendix A
Learning by the Recognition Machine
(a) A recursion formula for machine learning.
Previously, in the initial description of the learning
machine shown in fig. 5, the mean vector and covariance
matrix were assumed to be known. It would be more realistic
to assume that neither the mean nor the covariance matrix
is known, unless a prior learning procedure has been applied.
Fortunately, it usually is the case that a set of learning samples, x_1, x_2, ..., x_k, is available from the teaching source. It is realized that the random environment in which "patterns" are immersed is familiar to the machine. In other words, the covariance matrix is known a priori. The problem left is to acquire a knowledge of the mean by learning.
Since all members of the learning samples are selected from the same random source, the probability distribution function is assumed to be of the same form, but with unknown parameters. An initial guess of these parameters will be required, and a distribution P(X) assumed, before these samples are exposed to the machine. A new p.d.f. P(X/x_1) will be formed after the first learning sample is used. Eventually, a final version of P(X/x_1, x_2, ..., x_k) can be derived when all these learning samples have been fed into the learning machine. Symbolically, it can be expressed as follows:
P(X) → P(X/x_1) → P(X/x_1, x_2) → ... → P(X/x_1, x_2, ..., x_k)
or L(X) → L(X/x_1) → ... → L(X/x_1, x_2, ..., x_k)              (2)

In its learning phase, the mean is denoted by M, where μ is the real mean of the sample mean and φ is the variance of the random mean. Then

x = M + N                                                       (3)

Hence X̄ = M̄ = μ                                                 (4)

Since the noise and sample mean are independent (9), then

Covariance (X) = Covariance (M) + Covariance (N) = K + φ        (5)

In order to start the learning cycle, a guess on the mean must be made:

P(M/x_1) = P(M) P(x_1/M) / P(x_1)                               (6)

where
P(x_1) = c_3    (a priori probability)
P(x_1/M) = c_1 exp[−1/2 (x_1 − M)^t K^{-1} (x_1 − M)]
P(M) = c_2 exp[−1/2 (M − μ_0)^t φ_0^{-1} (M − μ_0)]
Collecting those terms involving M on the right side of equ. (6) and substituting, it becomes

P(M/x_1) = c exp{−1/2 [(x_1 − M)^t K^{-1} (x_1 − M) + (M − μ_0)^t φ_0^{-1} (M − μ_0)]}
         = c exp{−1/2 [M^t (K^{-1} + φ_0^{-1}) M − 2 (M^t K^{-1} x_1 + M^t φ_0^{-1} μ_0)]}

where c = c_1 c_2 / c_3 (the terms not involving M being absorbed into the constant). Let

K^{-1} + φ_0^{-1} = φ_1^{-1}                                    (7)

* The learning of the mean is discussed for only one of the pattern classes.
The iterated form of equ. (13) will assure the convergence of the learning process. A well known theorem (5) in statistics provides a more powerful proof. The "zero-one law" says:

"The sequence P(B/Y_1, Y_2, ..., Y_n) of conditional probabilities of a property B of the sequence Y_1, Y_2, ..., Y_n, given the first n terms of the sequence, converges almost surely to 1 or 0 according as the sequence has or has not this property."

If t_0 is the true value of t, then

P(t/Y_1, Y_2, ..., Y_n) → δ(t − t_0)    as n → ∞

where δ(t − t_0) is the Dirac delta function.
In the event that the observations Y_1, Y_2, ... do have the properties of t, then the samples fed into the learning machine certainly will have the property of a specific mean. This is guaranteed by the fact that all the learning samples are selected from a certain pattern; in other words, the learning is directed. This is sufficient to assure the convergence of the learning process, though it is not necessary. The convergence of the learning of the mean can be seen more clearly by rearranging equ. (13) in terms of the learning samples and then substituting (12) into (13):

φ_2 = K (φ_1 + K)^{-1} φ_1
    = K [K (K + φ_0)^{-1} φ_0 + K]^{-1} K (K + φ_0)^{-1} φ_0        (16)
also

φ_2 = K [(K + φ_0) + φ_0]^{-1} φ_0 = K [K + 2φ_0]^{-1} φ_0

and

μ_2 = K (K + φ_1)^{-1} μ_1 + φ_1 (K + φ_1)^{-1} x_2
    = K (K + φ_1)^{-1} [K (K + φ_0)^{-1} μ_0 + φ_0 (K + φ_0)^{-1} x_1] + φ_1 (K + φ_1)^{-1} x_2
    = K (K + φ_1)^{-1} K (K + φ_0)^{-1} μ_0 + K (K + φ_1)^{-1} φ_0 (K + φ_0)^{-1} x_1 + φ_1 (K + φ_1)^{-1} x_2

Now,

K (K + φ_1)^{-1} K (K + φ_0)^{-1} μ_0
    = K [K (K + φ_0)^{-1} φ_0 + K]^{-1} K (K + φ_0)^{-1} μ_0
    = K [(K + φ_0) {(K + φ_0)^{-1} φ_0 + I}]^{-1} μ_0
    = K [K + 2φ_0]^{-1} μ_0

and

K (K + φ_1)^{-1} φ_0 (K + φ_0)^{-1} x_1 + φ_1 (K + φ_1)^{-1} x_2
    = K (K + φ_0)^{-1} φ_0 (K + φ_1)^{-1} x_1 + φ_1 (K + φ_1)^{-1} x_2
    = K (K + φ_0)^{-1} φ_0 (K + φ_1)^{-1} (x_1 + x_2)
    = φ_0 (K + φ_0)^{-1} K [K + K (K + φ_0)^{-1} φ_0]^{-1} (x_1 + x_2)
    = φ_0 [(K + φ_0) {I + (K + φ_0)^{-1} φ_0}]^{-1} (x_1 + x_2)
    = φ_0 [K + 2φ_0]^{-1} (x_1 + x_2)

so that

μ_2 = K [K + 2φ_0]^{-1} μ_0 + φ_0 [K + 2φ_0]^{-1} (x_1 + x_2)

It can similarly be shown by induction:
μ_3 = K [3φ_0 + K]^{-1} μ_0 + φ_0 [3φ_0 + K]^{-1} (x_1 + x_2 + x_3)        (18)

φ_3 = K [3φ_0 + K]^{-1} φ_0

Hence a recursion formula for adapting the random mean is found to be:

μ_{n+1} = K [K + nφ_0]^{-1} μ_0 + φ_0 [K + nφ_0]^{-1} Σ_{i=1}^{n} x_i       (19)

φ_{n+1} = K [K + nφ_0]^{-1} φ_0

or

μ_{n+1} = (1/n) K [K/n + φ_0]^{-1} μ_0 + φ_0 [φ_0 + K/n]^{-1} (1/n) Σ_{i=1}^{n} x_i        (20)

φ_{n+1} = (K/n) [φ_0 + K/n]^{-1} φ_0

It can be seen that

Lim_{n→∞} (1/n) Σ_{i=1}^{n} x_i = μ                                         (21)

Lim_{n→∞} φ_{n+1} = ∅    (the null matrix)                                  (22)

According to the statistical definition, (21) is just the real mean. (22) indicates that, after a large number of samples have been processed, a constant mean vector results from the updating and becomes the real mean vector. The learning behavior has been simulated on the computer; an apparent convergence property of the learning process was observed, though the learning samples were finite.
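A sketch of the recursion in its closed form (19) follows (Python; the prior parameters and sample values are illustrative assumptions, not values from the thesis):

```python
import numpy as np

def learned_mean(mu0, phi0, K, samples):
    """Directed learning of the mean, eq. (19):
    mu_{n+1} = K (K + n phi_0)^{-1} mu_0 + phi_0 (K + n phi_0)^{-1} sum_i x_i."""
    n = len(samples)
    A = np.linalg.inv(K + n * phi0)
    return K @ A @ mu0 + phi0 @ A @ np.sum(samples, axis=0)

rng = np.random.default_rng(3)
true_mu = np.array([20.0, 40.0, 40.0, 20.0])
K = 4.0 * np.eye(4)                   # known noise covariance
phi0 = 100.0 * np.eye(4)              # variance of the initial guess on the mean
mu0 = np.full(4, 20.0)                # initial guess, as in Fig. 11
xs = true_mu + rng.normal(0.0, 2.0, size=(15, 4))   # 15 labeled learning samples
print(learned_mean(mu0, phi0, K, xs).round(2))      # approaches the real mean
```

As n grows, the first term of (19) vanishes and the second tends to the sample mean, which is the content of the limits (21) and (22).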
Appendix B
1. Definition of error
Once the learning samples have been well selected, the statistics are established. In the case where the random environment was known to the learning machine, a reasonable assumption was that the covariance matrix was known. The remaining job for the machine is to set up an adaptive scheme to update the random mean until the real mean, or an acceptable updated mean vector, has been found.
In a realistic situation, only a finite number of learning
samples are available. Due to this physical limitation,
termination of the machine learning within a short period of
time is inevitable. Hence, errors due to misclassification
are likely to be produced by the recognition machine. The
asymptotic behavior of the learning phase was discussed in
Appendix A. In this section the relation of probability of
error as a function of the signal to noise ratio and the cross-coefficient among patterns will be determined. The binary
case is first considered using probability theory. A general
formula for predicting the overall probability of error for
the system can be found.
Generally, patterns are in the form of continuous time functions:

S_1(t),    0 ≤ t ≤ T
S_2(t),    0 ≤ t ≤ T
In order to process those patterns in a digital type recognition machine, discrete sampling must be done before presenting them to the recognition machine. S_1(t), S_2(t) will be mapped into an m-dimensional vector space, i.e.,

S_1: {u_11^j, u_12^j, ..., u_1m^j}
S_2: {u_21^j, u_22^j, ..., u_2m^j}                         (1)

all are now in vector form. Using Bayes' rule, a likelihood
function is found when an unknown sample, X, is presented
to the recognizer. As previously obtained, the likelihood
ratio function is:

P(X/S_2) = (2π)^{-m/2} |K|^{-1/2} exp[−(1/2)(X − S_2)^t K^{-1} (X − S_2)]        (2)
P(X/S_1) = (2π)^{-m/2} |K|^{-1/2} exp[−(1/2)(X − S_1)^t K^{-1} (X − S_1)]

where K is the covariance matrix. The likelihood ratio is:

L(X) = exp{(1/2)(X − S_1)^t K^{-1} (X − S_1) − (1/2)(X − S_2)^t K^{-1} (X − S_2)}        (2-2)
It is to be compared with a threshold value Q (9). A decision is made as follows:
When L(X) > Q, classify X to be S_2; when L(X) < Q, classify X to be S_1. Errors occur when the machine recognizes X as being from S_1 when X is really from S_2, and vice versa.
For convenience in calculating the probability of error,
let:
α = False dismissal                                        (3)
  = Probability of rejecting S_1 when it is true

β = False alarm                                            (4)
  = Probability of accepting S_1 when S_2 is true

If the samples are partitioned so that S_1 is in region D_1 and S_2 is in region D_2, then

α = ∫_{D_2} P(X/S_1) dX                                    (3-1)

β = ∫_{D_1} P(X/S_2) dX                                    (4-1)

Assign q_1, q_2 to be the a priori probabilities of S_1, S_2 respectively. Then the probability of error will be

Pe = q_1 α + q_2 β                                         (5)
For equally likely events, when q_1 = q_2 = 1/2, (5) is simply

Pe = α = β                                                 (6)

Equ. (6) implies the threshold value Q = 1. It can be shown (8) that the best choice for Q is

Q = q_1 C_α / (q_2 C_β)                                    (6-1)

where C_α is the cost of making an error of the first kind (α) and C_β is the cost of making an error of the second kind (β). Instead of considering (2-2) directly, take the logarithm of both
sides; let

V = log L(X) = (1/2)(X − S_1)^t K^{-1} (X − S_1) − (1/2)(X − S_2)^t K^{-1} (X − S_2)        (7)

Since the right side is composed of X, V is also a random function with a Gaussian distribution. Choosing C_α = C_β, V is now to be compared with the threshold log Q = log 1 = 0:

X is recognized as S_2 when V > 0
X is recognized as S_1 when V < 0
According to (3-1) and (4-1),

α = ∫_0^∞ P(V/S_1) dV
                                                           (8)
β = ∫_{−∞}^0 P(V/S_2) dV
The mean and variance of the random variable must be found before the computation of either α or β. To avoid unduly complicated manipulation, further assumptions are listed below:

(a) S_1 and S_2 are of equal energy, i.e.,

∫_0^T S_1² dt = ∫_0^T S_2² dt = E,    ∫_0^T S_1 S_2 dt = ρE,    −1 ≤ ρ ≤ 1

where ρ is the cross-coefficient between the two patterns, i.e., it is a measurement of their similarity.

(b) The noise is white Gaussian; then K = σ² I.
(7) is then simplified to

V = X^t K^{-1} (S_2 − S_1)                                 (9)

when X = S_1 + N.*

* S_1 is equivalent to M_1; the notation is changed for clarity.
Since N̄ = 0,

V̄_1 = S_1^t K^{-1} (S_2 − S_1) = (S_1^t S_2 − S_1^t S_1)/σ²

Hence, from the assumptions and (9), it becomes

V̄_1 = E(ρ − 1)/σ²                                          (10)

This is the mean when X is S_1. The variance has nothing to do with S_1 or S_2; with X = S_1 + N,

Variance of V = N^t K^{-1} (S_2 − S_1) · N^t K^{-1} (S_2 − S_1)
              = (S_2 − S_1)^t K^{-1} N^t N K^{-1} (S_2 − S_1)

By definition K = N^t N, so

Variance of V = (S_2 − S_1)^t K^{-1} (S_2 − S_1)
              = (S_2^t S_2 + S_1^t S_1 − 2 S_1^t S_2)/σ²
              = 2(E − ρE)/σ² = 2E(1 − ρ)/σ²                (11)
Substituting (10) and (11) into (8),

α = ∫_0^∞ (2π · Variance)^{-1/2} exp[−(V − V̄_1)²/(2 · Variance)] dV        (12)

With the change of variable

v'' = (V − V̄_1)/√(2 · Variance) = [V + E(1 − ρ)/σ²] / (2 √(E(1 − ρ)/σ²))

this becomes

α = ∫_{√(E(1−ρ)/4σ²)}^∞ (1/√π) exp(−v''²) dv''                             (12-1)

which is of the error function form:

α = (1/2) [1 − erf(√(E(1 − ρ)/4σ²))]                                       (13)

From (6), then,

Pe = (1/2) [1 − erf(√(E(1 − ρ)/4σ²))]                                      (14)
This is the probability of error of binary pattern recognition. The overall probability of error of the multiple pattern case can be derived on the binary basis. Suppose n patterns are observed. There would then be n(n−1)/2 different binary error probabilities. Let the overall probability of error be denoted by P_E. Exploiting a theorem developed in probability theory (16),

P_E = Σ P_ei − Σ_{i<j} P_ei P_ej + Σ P_ei P_ej P_ek − ...                   (15)

the sums running over the n(n−1)/2 binary error events. For the patterns of this study, the cross-coefficients are:

ρ_13 = ∫_0^T S_1 S_3 dt / ∫_0^T S_1² dt = K_2 ∫_0^{50} x sin(πx/100) dx / ∫_0^{50} x² dx = 0.99        (16)

ρ_12 = ∫_0^T S_1 S_2 dx / ∫_0^T S_1² dx = K_1 ∫_0^{50} x³ dx / ∫_0^{50} x² dx = 0.968

ρ_23 = ∫_0^T S_2 S_3 dx / ∫_0^T S_1² dx = K_1 K_2 ∫_0^{50} x² sin(πx/100) dx / ∫_0^{50} x² dx ≈ 0.93

where K_1 = (5/3)^{1/2}/50 and K_2 = (2/3)^{1/2} · 50 are the normalization constants of patterns "2" and "3".
Pe_12 = (1/2)[1 − erf(√(E(1 − ρ_12)/4σ²))]

Pe_13 = (1/2)[1 − erf(√(E(1 − ρ_13)/4σ²))]                                  (17)

Pe_23 = (1/2)[1 − erf(√(E(1 − ρ_23)/4σ²))]

P(A) = P_0 Pe_12

P(B) = P_0 Pe_23                                                            (18)

P(C) = P_0 Pe_13

where P_0 = 1/3 is the a priori probability of misrecognition for each binary case, since events A, B, and C are equally likely:

A: error between S_1, S_2
B: error between S_2, S_3
C: error between S_3, S_1

P_E = P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(C ∩ A) + P(A ∩ B ∩ C)

or

P_E = P(A) + P(B) + P(C) − P(A)P(B) − P(B)P(C) − P(C)P(A) + P(A)P(B)P(C)    (19)

P(A), P(B) or P(C) can be calculated from equation (17); each represents the binary error between two patterns.
The theoretical prediction of misrecognition was shown in Fig. 10.
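The prediction plotted in Fig. 10 follows from (16)-(19). The sketch below (Python) evaluates the cross-coefficients numerically and combines the binary error rates; the ratio E/σ² is left as a free parameter because the precise definition of S/N used for the figure is not legible in the scan, so the value chosen here is only an assumed example:

```python
import numpy as np
from math import erf, sqrt

# Deterministic patterns (23)-(25) on a fine grid over 0 <= x <= T = 100
x = np.linspace(0.0, 100.0, 200001)
dx = x[1] - x[0]
K1, K2 = sqrt(5.0 / 3.0) / 50.0, 50.0 * sqrt(2.0 / 3.0)
y = [np.where(x < 50, x, 100.0 - x),
     K1 * np.where(x < 50, x**2, (100.0 - x)**2),
     K2 * np.sin(np.pi * x / 100.0)]

E = float(np.sum(y[0] ** 2) * dx)                  # common pattern energy (26)

def rho(a, b):                                     # cross-coefficients (16)
    return float(np.sum(y[a] * y[b]) * dx) / E

def pe(r, e_over_s2):                              # binary error rate (14)/(17)
    return 0.5 * (1.0 - erf(sqrt(e_over_s2 * (1.0 - r) / 4.0)))

e_over_s2 = 200.0      # assumed E/sigma^2; not a value taken from the thesis
P0 = 1.0 / 3.0         # a priori probability of each binary event (18)
pA = P0 * pe(rho(0, 1), e_over_s2)
pB = P0 * pe(rho(1, 2), e_over_s2)
pC = P0 * pe(rho(0, 2), e_over_s2)
PE = pA + pB + pC - pA * pB - pB * pC - pC * pA + pA * pB * pC   # eq. (19)
print(f"rho12={rho(0,1):.3f}  rho23={rho(1,2):.3f}  rho13={rho(0,2):.3f}  PE={PE:.4f}")
```

Running the sketch gives ρ_12 ≈ 0.968 and ρ_13 ≈ 0.99, matching the values quoted above.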
Appendix C Flow Chart of the Simulation
[The flow chart is largely illegible in the scanned source; its legible labels are summarized here. Gaussian Function Generator: generate RAND(0), uniformly distributed on (0, 1); sum twelve values and subtract 6 (E = E + RAND(0), i = 1, ..., 12). Pattern Generator: U(K, I, J) = M_KJ + E·VAR for class K = 1, 2, 3, sample I = 1, ..., 500, component J = 1, ..., 4. Calculation of Covariance Matrix: D(K, I, J) = U(K, I, J) − M(K, J); AA(IR, IC) = Σ_K Σ_L D(K, L, IR) D(K, L, IC) / 1500; then invert AA with a subroutine. Recognition: for Indx = 1, ..., 500, pick KK = RAND(0)·3 + 1, form XX(J) = U(KK, I, J), evaluate P(X/S_KX) from −(1/2)[XX − M(KX)]^t AA^{-1} [XX − M(KX)] for KX = 1, 2, 3, keep the largest PLIKELI, increment Error on a mismatch, and finally compute error rate = Error/500.]
[The FORTRAN listing of the simulation program, run at the University of Missouri at Rolla, and of the matrix inversion subroutine (arguments A, N, EPS, DELTA, ND; pivot selection followed by a Jordan step) appears here in the original; it is too badly degraded in this scan to transcribe.]
VITA
The author was born on February 21, 1940, in Kwangsi,
China. He completed his primary and high school education in Taichung, Taiwan, China. He
entered Taiwan Provincial Cheng Kung University in September,
1958 and received his Bachelor of Science Degree in Electrical
Engineering in June, 1962.
He came to the United States in January, 1965 and enrolled
at the University of Missouri at Rolla, working toward the
Master of Science degree in Electrical Engineering.