
ALGEBRAIC DERIVATION OF NEURAL NETWORKS

AND ITS APPLICATIONS IN IMAGE PROCESSING

by

PINGNAN SHI

B. A. Sc. (Electrical Engineering), Chongqing University, 1982

M. A. Sc. (Electrical Engineering), The University of British Columbia, 1987

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE STUDIES

THE DEPARTMENT OF ELECTRICAL ENGINEERING

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA April 1991

© Pingnan Shi, 1991


In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.


The University of British Columbia Vancouver, Canada



Abstract

Artificial neural networks are systems composed of interconnected simple computing units known as artificial neurons, which simulate some properties of their biological counterparts. They have been developed and studied for understanding how brains function, and for computational purposes.

In order to use a neural network for computation, the network has to be designed in such a way that it performs a useful function. Currently, the most popular method of designing a network to perform a function is to adjust the parameters of a specified network until the network approximates the input-output behaviour of the function. Although some analytical knowledge about the function is sometimes available or obtainable, it is usually not used. Some neural network paradigms exist where such knowledge is utilized; however, there is no systematic method for doing so. The objective of this research is to develop such a method.

A systematic method of neural network design, which we call the algebraic derivation methodology, is proposed and developed in this thesis. It is developed with an emphasis on designing neural networks to implement image processing algorithms. A key feature of this methodology is that neurons and neural networks are represented symbolically, such that a network can be algebraically derived from a given function and the resulting network can be simplified. By simplification we mean finding an equivalent network (i.e., one performing the same function) with fewer layers and fewer neurons. A type of neural network, which we call the LQT network, is chosen for implementing image processing algorithms. Theorems for simplifying such networks are developed. Procedures for deriving such networks to realize both single-input and multiple-input functions are given.


To show the merits of the algebraic derivation methodology, LQT networks for implementing some well-known algorithms in image processing and some other areas are developed by using the above-mentioned theorems and procedures. Most of these networks are the first known neural network models of their kind; in the cases where other network models are known, our networks have the same or better performance in terms of computation time.



Table of Contents

Abstract ii

List of Tables vii

List of Figures viii

Acknowledgement x

A Glossary of Symbols xi

1 INTRODUCTION 1

2 BACKGROUND 6

2.1 Artificial Neural Networks 6

2.1.1 Neuron Models 8

2.1.2 Network Models 12

2.2 Image Processing 17

2.2.1 Image Enhancement 17

2.2.2 Image Restoration 26

3 DEFINITIONS AND NOTATIONS 32

4 REASONS FOR ALGEBRAIC DERIVATION 40

4.1 Drawbacks of the Learning Approach 41

4.2 The Advantages of the Analytical Approach 46


4.2.1 Designing the Hamming Network 47

4.2.2 Designing the Parity Network 48

4.3 The Algebraic Derivation Methodology 50

5 SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS 52

5.1 Neuron Models 52

5.2 LQT Networks 55

6 COMPUTATIONAL PROPERTIES OF LQT NETWORKS 59

6.1 Network Equivalence 59

6.2 Network Quality 64

6.2.1 Network Depth 64

6.2.2 Network Size 65

6.3 Criterion for Network Simplification 66

7 DERIVATION PROCEDURES 67

7.1 Realization of SISO Functions 68

7.2 Realization of MISO Functions 80

8 NETWORK REALIZATIONS OF IE TECHNIQUES 88

8.1 Network Realizations of Linear Techniques 88

8.2 Network Realizations of Non-linear Filters 93

8.2.1 Dynamic Range Modification 93

8.2.2 Order Statistic Filtering 95

8.2.3 Directional Filtering 106

9 NETWORK REALIZATIONS OF IR TECHNIQUES 110

9.1 Network Realizations of Linear Filters 110

9.2 Network Realizations of Non-linear Filters 112

9.3 Comparison with the Hopfield Network 116

10 APPLICATIONS IN OTHER AREAS 119

10.1 Sorting 119

10.2 Communication 126

10.2.1 Improvement over Hamming Network 126

10.3 Optimization 130

10.3.1 Solving Simultaneous Equations 130

10.3.2 Matrix Inversion 132

11 CONCLUSIONS 134

11.1 Summary 134

11.2 Contributions 135

11.3 Future Work 137

Appendices 140

A THE GENERALIZED DELTA RULE 140

B HOPFIELD NETWORK ALGORITHM 142

Bibliography 143


List of Tables

4.1 The Exclusive OR 44

4.2 Network Parameters For Solving XOR Problem 44

4.3 Network Parameters For Solving Parity-3 Problem 45

4.4 Network Parameters For Solving Parity-4 Problem 45

4.5 Number of Learning Steps 46


List of Figures

2.1 A Biological Neuron 9

2.2 An Artificial Neuron 9

2.3 Typical Activation Functions: (a) Linear; (b) Quasi-linear; (c) Threshold-logic; and (d) Sigmoid 10

2.4 A Three-layer Back-propagation Network 13

2.5 The Hopfield Network 14

2.6 The Hamming Network 15

2.7 Directional Smoothing Filter 21

2.8 Color Image Enhancement 26

3.1 The Schematical Representation of a Neuron 33

3.2 The Schematical Representation of the Input Neuron 35

3.3 Three Ways of Setting Up an Input Neuron 35

3.4 The Back-propagation Network 36

3.5 The Hopfield Network 36

3.6 The Hamming Network 37

4.1 Networks for Solving: (a) XOR problem; (b) Parity-3 problem; and (c) Parity-4 problem 43

4.2 Realizing Function (4.4) 49

4.3 Realizing Function (4.5) 49

4.4 The Network for Solving the Parity Problem 50


5.1 The Contrast Stretching Function 53

5.2 Typical LQT Network Topologies 55

5.3 Representing Part of a Network 56

5.4 A LQT network 58

8.1 Network Architecture for Linear Filtering 89

8.2 Network Realization of the Order Statistic Function for n = 3 99

8.3 The Schematical Representation of OSnet 99

8.4 The Schematical Representation of Adaptive OSnet 100

8.5 A Max/Median Filter Realized by Using OSNets 101

8.6 A Max/median Filter Realized by Using OSNets 102

8.7 A Network Model of the CS Filter 106

8.8 A Network Realization of the DA Filter for the Case N = 3 109

9.1 A Network Realization of the Maximum Entropy Filter 114

10.1 A Neural Network for Sorting an Input Array of Three Elements 125

10.2 The Network Model for Implementing the HC Algorithm 129


Acknowledgement

First of all, I would like to acknowledge the indispensable support of my wife, Angela

Hui Peng. She has maintained an environment under which I have been able to devote

all my energy to this thesis. Moreover, she did all the drawings in this thesis, which are

far better than I could have done.

I would like to thank my supervisor, Professor Rabab K. Ward for her guidance and

support. I am grateful to her for introducing me to this challenging and fascinating world

of artificial neural networks, and providing me the opportunity to explore it.

Special thanks are due to Professor Peter Lawrence for his kindness and recommendation of a book written by P.M. Lewis, which is invaluable for my research.

Thanks are also due to Professor Geoffery Hoffmann, whose courses on neural networks and immune systems gave me a fundamental understanding of the two related fields, and the inspiration to continue my research on the former.

Finally, I would like to thank John Ip, Doris Metcalf, Kevin O'Donnell, Robert Ross,

Qiaobing Xie, Yanan Yin, and many others in this and other departments who have

made my study here pleasant and productive.


A Glossary of Symbols

∈, ∉ : Membership relation "belongs to" and its negation "does not belong to"
⊂, ⊃ : Strict containment relations
[ x1, ..., xn | w1, ..., wn+1 ] : Linear neuron
∀ : For all
∃ : There exists one element
∪, ∩ : Union, intersection
∅ : Empty set
− : Besides the usual "minus" operator, it also denotes set difference
⟨ x1, ..., xn | w1, ..., wn+1 ⟩ : Quasi-linear neuron
A × B : Cartesian product of sets A and B
( x1, ..., xn | w1, ..., wn+1 ) : Threshold-logic neuron
{| x1, ..., xn | w1, ..., wn+1 |} : Either one of the three neuron types
f : X → Y : f is a function from set X to Y
∘ : Composition of two functions
N, Z, R, R+, B : Sets of natural numbers, integers, real numbers, positive real numbers including zero, and binary numbers
|X|, |S|, |x| : Cardinality of the set X, length of the sequence S, absolute value of the real number x
□ : End of proof
■ : End of solution
⇒, ⇔ : Implication operators
∧, ∨, ⊕ : Logic AND, OR, XOR
x : Vector
||x|| : L2 norm of the vector x
X : Matrix
1 : Vector of ones
0 : Vector of zeros
0, I : Zero and unit matrices
{ } : Denotes a set; various forms are {x_i}, {x | P(x)}, {x_1, ..., x_n}, etc.
f, g : Original and distorted one-dimensional images
U, V : Original and distorted two-dimensional images
W : Window for processing images

Chapter 1

INTRODUCTION

In recent years artificial neural networks have been successfully applied to solve problems in many areas, such as pattern recognition/classification [60, 70], signal processing [24, 62], control [71, 9], and optimization [35, 37]. The essence of neural network problem

solving is that the network performs a function which is the solution to the problem.

Currently, the most popular method of designing a network to perform a function is

through a learning process, in which parameters of the network are adjusted until the

network approximates the input-output behaviour of the function. Although some analytical knowledge about the function is sometimes available or obtainable, it is usually

not used in the learning process. There are some neural network paradigms where such

knowledge is utilized; however, there is no systematic method to do so. It is the objective

of this thesis to develop such a method.

Many tasks, such as those in image processing, require high computational speed.

Examples are real-time TV signal processing and target recognition. Conventional von Neumann computing machines are not adequate to meet the ever increasing demand for

high computational speed [66]. Among non-conventional computing machines, artificial

neural networks provide an alternative means to satisfy this demand [19].

The study of artificial neural networks has been inspired by the capability of the

nervous system and motivated by the intellectual curiosity of human beings to understand

themselves.

The nervous system contains billions of neurons which are organized in networks.


A neuron receives stimuli from its neighboring neurons and sends out signals if the cumulative effect of the stimuli is strong enough. Although a great deal about individual neurons is known, little is known about the networks they form. In order to understand the nervous system, various network models, known as artificial neural networks,

have been proposed and studied [78, 52, 33, 12, 21]. These networks are composed of

simple computing units, referred to as artificial neurons since they approximate their

biological counterparts. Although these networks are extremely simple compared with

the complexity of the nervous system, some exhibit similar properties.

In addition to understanding the nervous system, artificial neural networks (referred

to as neural networks hereafter) have been suggested for computational purposes. Mc-

Culloch and Pitts in 1943 [63] showed that the McCulloch-Pitts neurons could be used

to realize arbitrary logic functions. The Perceptron, proposed by Rosenblatt in 1957

[79], was designed with the intention for computational purposes. Widrow proposed in

1960 [95] a network model called Madaline, which has been successfully used in adaptive

signal processing and pattern recognition [96]. In 1984, Hopfield demonstrated that his

network could be used to solve optimization problems [35, 36]. In 1986, Rumelhart and

his colleagues rediscovered the generalized delta rule [80], which enables a type of networks known as the back-propagation networks, or multi-layer perceptrons, to implement

any arbitrary function up to any specified accuracy [30]. These contributions have laid

down the foundation for applying neural networks to solve a wide range of computational

problems, especially those that digital computers take a long time to solve.

From the computational point of view, a neural network is a system composed of

interconnected simple computing units, each of which is identical to the others to some

extent. Here follows an important question: How can these simple computing units be

used collectively to perform complex computations which they cannot perform individually?


This problem can be attacked from two directions: (1) assuming a network model

(usually inspired by certain parts of the nervous system), adjusting its parameters until

it performs a given function; (2) assuming a set of simple computing units, constructing

a network in such a way that it performs a given function. The application of the back-propagation networks is an example of the former approach; the designing of the Hopfield

network is an example of the latter approach. Due to reasons given in Chapter Four, we

have chosen the second approach.

Most existing network models were designed for the purpose of understanding how

the brain works. As a result, these models are restricted by the characteristics of the

nervous system. We, however, are mainly concerned with applying some computing

principles believed to be employed in the nervous system, such as collective computing,

massive parallelism and distributed representation, to solve computational problems.

Consequently, the network models we are going to develop shall be restricted only by

today's hardware technology.

Therefore, a more precise definition of the task we are undertaking is due here. The

problem we are concerned with is, assuming a finite supply of simple computing units,

how do we organize them in a computationally efficient way such that a network so

formed can perform a given function. By simple, we mean that these units can be mass

produced with today's hardware technology. By computationally efficient, we mean that

the network so formed can perform the function in a reasonably short period of time. The

latter restriction is necessary not only because computational speed is our main concern

but also because there exist an infinite number of networks which can perform the same

function.

Although a lot can be learnt from the works of researchers such as Kohonen, Hopfield, and Lippmann on the construction processes of their network models, a systematic method of network design has not yet been developed. They relied mainly on their


imagination, intuition, as well as inspiration from their knowledge of the nervous system. However, to effectively use neural networks as computing machines, a systematic method of network design is very much desired. It is our objective to develop such a

method.

In this thesis, we propose a methodology of network design, which consists of the

following five stages:

1. Find the minimum set of neuron models for a given class of functions;

2. Devise symbolic representations for these neurons and their networks;

3. Establish theorems for manipulating these symbols based on the computational

properties of the neurons and their networks;

4. Establish procedures for deriving neural networks from functions;

5. Use these procedures and theorems to derive and simplify network models for specified functions.

We call this methodology algebraic derivation of neural networks.

This methodology is developed with an emphasis on deriving neural networks to

realize image processing techniques. Procedures and theorems for network derivation

are developed. They are used to construct network models for realizing some image

processing techniques and techniques in other areas. The results we have obtained show

that our methodology is effective.

This thesis consists of eleven chapters. Chapter Two provides some background information on neural networks and image processing. The materials in the rest of the

thesis are believed to be new and are organized as follows. Chapter Three gives formal

definitions of neurons, neural networks, and related concepts. Chapter Four elaborates


on the reasons for our approach; we shall show the limitations of learning as used in

back-propagation networks and the limitations of the conventional network design method

as exemplified by the Hamming network. Chapter Five finds a set of neuron models

which are used to form networks for realizing image processing techniques, and provides

symbolic representations for these neurons and their networks. Chapter Six develops

theorems for network simplification based on some basic computational properties of the

neurons and their networks. Chapter Seven gives procedures of algebraic derivation and

explains them through some examples, which in turn are useful in network derivations

in later chapters. Chapters Eight and Nine show examples of deriving neural networks

to realize image enhancement and restoration techniques, respectively. Chapter Ten contains examples of deriving neural networks for realizing some techniques in other areas. Chapter Eleven concludes our work and speculates on future work.


Chapter 2

BACKGROUND

This chapter gives necessary background information on neural networks and image processing. Section 2.1 gives an overview of neural networks with an emphasis on their applications for computing. Concepts of neurons and neural networks are introduced. Some

neuron and neural network models are described. Among the network models proposed

in the literature, the back-propagation network, the Hopfield network, and the Hamming

network are chosen to be described in more detail since they are good representatives of

their classes and also they will be referred to further in later chapters. Section 2.2 gives

an overview of problems in image enhancement and restoration and techniques used to

solve them.

2.1 Artificial Neural Networks

People have long been curious about how the brain works. The capabilities of the nervous

system in performing certain tasks such as pattern recognition are far more powerful than

today's most advanced digital computers. In addition to satisfying intellectual curiosity,

it is hoped that by understanding how the brain works we may be able to create machines

as powerful as, if not more powerful than, the brain.

The nervous system contains billions of neurons which are organized in networks.

A neuron receives stimuli from its neighboring neurons and sends out signals if the cu­

mulative effect of the stimuli is strong enough. Although a great deal about individual

neurons is known, little is known about the networks they form. In order to understand


the nervous system, various network models, known as artificial neural networks, have

been proposed and studied. These networks are composed of simple computing units,

referred to as artificial neurons since they approximate their biological counterparts.

Work on neural networks has a long history. Development of detailed mathematical

models began more than 40 years ago with the works of McCulloch and Pitts [63], Hebb

[29], Rosenblatt [78], Widrow [95] and others [74]. More recent works by Hopfield [33, 34,

35], Rumelhart and McClelland [80], Sejnowski [82], Feldman [18], Grossberg [27], and

others have led to a new resurgence of the field in the 80s. This new interest is due to the

development of new network topologies and algorithms [33, 34, 35, 80, 18], new analog

VLSI implementation techniques [64], and some interesting demonstrations [82, 35], as

well as by a growing fascination with the functioning of the human brain. Recent interest

is also driven by the realization that human-like performances in areas such as pattern

recognition will require enormous amounts of computations. Neural networks provide

an alternative means for obtaining the required computing capacity along with other

non-conventional parallel computing machines.

In addition to understanding the nervous system, neural networks have been suggested

for computational purposes ever since the beginning of their study. McCulloch and

Pitts in 1943 [63] showed that the McCulloch-Pitts neurons could be used to realize

arbitrary logic functions. The Perceptron, proposed by Rosenblatt in 1957 [79], was

designed with the intention for computational purposes. Widrow proposed in 1960 [95]

a network model called Madaline, which has been successfully used in adaptive signal

processing and pattern recognition [96]. In 1984, Hopfield demonstrated that his network

could be used to solve optimization problems [35, 36]. In 1986, Rumelhart and his

colleagues rediscovered the generalized delta rule [80], which enables a type of networks

known as the back-propagation networks to implement any arbitrary function up to any

specified accuracy. In Europe, researchers such as Aleksander, Caianiello, and Kohonen


have made conscious applications of computational principles believed to be employed

in the nervous system [3]. Some commercial successes have been achieved [4]. These

contributions have laid down the foundation for applying neural networks to solve a wide

range of computational problems, especially those that digital computers take a long time to solve.

2.1.1 Neuron Models

The human nervous system consists of 10^10 to 10^11 neurons, each of which has 10^3 to 10^4 connections with its neighboring neurons. A neuron cell shares many characteristics

with the other cells in the human body, but has unique capabilities to receive, process,

and transmit electro-chemical signals over the neural pathways that comprise the brain's

communication system.

Figure 2.1 shows the structure of a typical biological neuron. Dendrites extend from

the cell body to other neurons where they receive signals at a connection point called

a synapse. On the receiving side of the synapse, these inputs are conducted to the

body. There they are summed, some inputs tending to excite the cell, others tending to

inhibit it. When the cumulative excitation in the cell body exceeds a threshold, the cell

"fires"—sending a signal down the axon to other neurons. This basic functional outline

has many complexities and exceptions; nevertheless, most artificial neurons model the

above mentioned simple characteristics.

An artificial neuron is designed to mimic some characteristics of its biological counterpart. Figure 2.2 shows a typical artificial neuron model. Despite the diversity of network

paradigms, nearly all neuron models — except for a few models such as the sigma-pi neuron (see [15]) — are based upon this configuration.

In Figure 2.2, a set of inputs labeled (x1, x2, ..., xN) is applied to the neuron. These

inputs correspond to the signals into the synapses of a biological neuron.

Figure 2.1: A Biological Neuron

Figure 2.2: An Artificial Neuron

Each signal x_i is multiplied by an associated weight w_i ∈ {w_1, w_2, ..., w_N} before it is applied to the summation block, labeled Σ. Each weight corresponds to the "strength" of a single biological synapse. The summation block, corresponding roughly to the biological cell body, adds up all the weighted inputs, producing an output e = Σ_{i=1}^{N} w_i x_i. This e is then compared with a threshold t. The difference α = e − t is usually further processed by an activation function to produce the neuron's output. The activation function may be a simple linear function, a threshold-logic function, or a function which more accurately simulates the nonlinear transfer characteristic of the biological neuron and permits more


general network functions.

Most neuron models vary only in the forms of their activation functions. Some examples are the linear, the quasi-linear, the threshold-logic, and the sigmoid activation functions. The linear activation function is

f_L(α) = α    (2.1)

The quasi-linear activation function is

f_Q(α) = α if α > 0, and 0 otherwise    (2.2)

The threshold-logic activation function is

f_T(α) = 1 if α > 0, and 0 otherwise    (2.3)

And the sigmoid activation function is

f_S(α) = 1 / (1 + e^(−α))    (2.4)

Figure 2.3 shows these four activation functions.

Figure 2.3: Typical Activation Functions: (a) Linear; (b) Quasi-linear; (c) Threshold-logic; and (d) Sigmoid.
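As a concrete illustration of the neuron model of Figure 2.2 with the activation functions (2.1)-(2.4), the following Python sketch computes the output of a single artificial neuron; the weights, threshold, and input values are arbitrary examples rather than values taken from the thesis.

    import math

    def f_linear(a):            # Eq. (2.1)
        return a

    def f_quasi_linear(a):      # Eq. (2.2): pass positive activations, zero otherwise
        return a if a > 0 else 0.0

    def f_threshold_logic(a):   # Eq. (2.3): binary output
        return 1.0 if a > 0 else 0.0

    def f_sigmoid(a):           # Eq. (2.4)
        return 1.0 / (1.0 + math.exp(-a))

    def neuron(x, w, t, f):
        """Weighted sum e = sum_i w_i * x_i, offset by the threshold t, passed through activation f."""
        e = sum(wi * xi for wi, xi in zip(w, x))
        return f(e - t)

    # Example with arbitrary values: a threshold-logic neuron with three inputs.
    print(neuron(x=[1.0, 0.0, 1.0], w=[0.5, -0.3, 0.4], t=0.6, f=f_threshold_logic))   # prints 1.0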

The threshold-logic and the sigmoid neurons are the most commonly used neuron

models among all the models proposed in the literature. The linear neuron model was used


in the early days of neural network study. The quasi-linear neuron model is rarely seen in

the literature.

The threshold-logic neuron model was used by McCulloch and Pitts [63] to describe

neurons which were later known as McCulloch-Pitts neurons. Rosenblatt later used this

model to construct the well-known Perceptron [79]. Lewis and Coates also used this

model, which they called T-gate, to form networks to implement logic functions [57].

The Hopfield network proposed by Hopfield [33] also uses the threshold-logic neurons as

the primitive processing units.

The sigmoid neuron model was used out of necessity by Rumelhart and his colleagues

to construct what is later known as the back-propagation network [80]. This neuron

model is used for the convenience of back-propagating errors. Variants of this model are

used in networks which are trained by error back-propagation.

The linear neuron model was used by Kohonen and Anderson respectively to construct

networks known as linear associators [53, 5]. Since networks of linear neurons are easy

to analyze and their capacities won't increase as the number of layers increases, they are

not academically challenging. Nevertheless, the practical use of linear neurons cannot be

overlooked. They are important in performing certain tasks such as matrix computations.

Moreover, they are the basis of other more complex neuron models.

The quasi-linear neuron model is used in this thesis as a gating device (see Chapter

Seven). This model can be viewed as the neuron model, which is sometimes referred to as

threshold-logic node (see [58]), without saturation. The quasi-linear activation function

was used by Fukushima to construct S-cell which is one of the primitive processing units

of his network model (see [21]).


2.1.2 Network Models

Although a single neuron can perform certain pattern detection functions, the power of

neural computing comes from connecting neurons into networks. Larger, more complex

networks generally offer greater computational capabilities. According to their structures,

neural networks are classified into two categories: feedforward and feedback (recurrent)

networks.

Feedforward Networks

In feedforward networks, neurons are arranged in layers. There are connections between

neurons in different layers, but no connection between neurons in the same layer. The

connection is unidirectional. The output of a feedforward network at time k depends

only on the network input at time (k — d), where d is the processing time of the network.

In other words, a feedforward network is memoryless.

Many neural network models belong to this class; examples are the Perceptron, back-propagation network, self-organizing feature map [52], counter-propagation network [30], Neocognitron, and functional-link network [50]. In the following, the back-propagation network is described since it is the most popular one in neural network applications.

The back-propagation network was introduced by Rumelhart and his PDP group [80]. The neuron model used in this network is the sigmoid neuron. Figure 2.4 shows a

three-layer back-propagation network.

The back-propagation network is used to implement a mapping from a set of patterns

(input patterns) to another set of patterns (output patterns). The set composed of both

sets of input and output patterns is referred to as the training set. The implementation

is done by obtaining the parameters (weights and thresholds) of a network through an error-back-propagation training procedure known as the generalized delta rule (see Appendix A).

Figure 2.4: A Three-layer Back-propagation Network

The training begins by initializing the parameters with random values. Then an

input pattern is fed to the network. The stimulated output pattern is compared with

the desired pattern. If there is disagreement between these two patterns, the difference

is used to adjust the parameters until the output pattern matches the desired pattern.

This process is repeated for all the input patterns in the training set until a convergence

criterion is satisfied. After the training, the network is ready to work.
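The following Python (NumPy) sketch illustrates this training loop for a small two-layer network of sigmoid neurons. It folds the thresholds into an extra constant input of 1, and the learning rate, hidden-layer size, and XOR training set are illustrative choices, not values taken from the thesis or from Appendix A.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def train_backprop(X, D, hidden=4, eta=0.5, epochs=20000, tol=1e-3, seed=0):
        """Minimal generalized-delta-rule sketch: one hidden layer of sigmoid neurons."""
        rng = np.random.default_rng(seed)
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])          # append a 1 for the threshold term
        W1 = rng.uniform(-0.5, 0.5, (hidden, Xb.shape[1]))     # input -> hidden weights (+ thresholds)
        W2 = rng.uniform(-0.5, 0.5, (D.shape[1], hidden + 1))  # hidden -> output weights (+ thresholds)
        for _ in range(epochs):
            H = sigmoid(Xb @ W1.T)                             # hidden activations
            Hb = np.hstack([H, np.ones((H.shape[0], 1))])
            Y = sigmoid(Hb @ W2.T)                             # network outputs
            E = D - Y                                          # disagreement with the desired patterns
            if np.mean(E ** 2) < tol:                          # convergence criterion
                break
            d2 = E * Y * (1 - Y)                               # output-layer deltas
            d1 = (d2 @ W2[:, :-1]) * H * (1 - H)               # back-propagated hidden deltas
            W2 += eta * d2.T @ Hb
            W1 += eta * d1.T @ Xb
        return W1, W2

    # Illustrative training set: the XOR mapping.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    D = np.array([[0], [1], [1], [0]], dtype=float)
    W1, W2 = train_backprop(X, D)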

The back-propagation network has been applied in many areas including pattern

recognition [82, 72, 17, 80] and image processing [84]. It has been found to perform

well in most cases. Nevertheless, it has some practical problems, one of which is that it

usually takes a long period of time to train, especially if 100% accuracy is required.


Feedback Networks

Networks in which there are connections between neurons in the same layer and/or

from neurons in later layers to those of earlier layers are called feedback or recurrent

networks. Examples are the Hopfield network [33], the Hamming network [58], the ART

networks [12], and the Bidirectional Associative Memories (BAM) [54]. In the following

the Hopfield network and the Hamming network are described in more detail.

Figure 2.5 shows the structure of a discrete Hopfield network. The neurons in such

a network are threshold-logic neurons. The network parameters are obtained through a

simple procedure (see Appendix B).

Figure 2.5: The Hopfield Network

After all the parameters are set up, the network is initialized with an input pattern

(normally a binary pattern). Then the output of the network is fed back to its input, and

the network iterates until it converges. The output after the convergence is the "true"

network output.
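A minimal Python sketch of this recall process is given below. It assumes bipolar (+1/−1) patterns and builds the weights with the common Hebbian outer-product rule, which may differ in detail from the procedure of Appendix B; the asynchronous update of the threshold-logic neurons is iterated until the state stops changing.

    import numpy as np

    def hopfield_weights(patterns):
        """Outer-product (Hebbian) weight construction for bipolar exemplars."""
        P = np.array(patterns, dtype=float)          # each row is a +/-1 exemplar
        W = P.T @ P
        np.fill_diagonal(W, 0.0)                     # no self-connections
        return W

    def hopfield_recall(W, x, max_iters=100):
        """Asynchronously update threshold-logic neurons until convergence."""
        x = np.array(x, dtype=float)
        for _ in range(max_iters):
            changed = False
            for i in range(len(x)):
                new = 1.0 if W[i] @ x > 0 else -1.0
                if new != x[i]:
                    x[i] = new
                    changed = True
            if not changed:                          # converged: this is the "true" network output
                break
        return x

With a few stored exemplars, a noisy probe pattern presented to hopfield_recall typically converges to the stored exemplar it most resembles, which is the associative-memory behaviour described above.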

The Hopfield network was originally used as an associative memory [33]. The network

"memorizes" some patterns which are used as exemplars. When presented with a pattern,

the network will produce the exemplar which most resembles the input pattern.

The Hopfield network has also been applied to various areas including optimization


[35, 36, 89] and image processing [99, 83].

The Hopfield network is a "pure" feedback network in the sense that there is only

one layer of neurons and every neuron can feed its output to any other neuron. There are

other networks which are mixtures of the feedforward networks and the "pure" feedback

networks. The Hamming network proposed by Lippmann [58] is an example of such

networks.

Figure 2.6 shows the structure of a Hamming network. The upper sub-network has a

feedback structure, and the lower sub-network has a feedforward structure. The neuron

model used in this network is the threshold-logic node but all the neurons are operating

in the linear range. Hence, they are virtually linear neurons.

Figure 2.6: The Hamming Network (the upper subnet picks the maximum; the lower subnet calculates the matching scores)

The Hamming network is used on problems where inputs are generated by selecting an


exemplar and reversing bit values randomly and independently. This is a classic problem

in communication theory, which occurs when binary fixed-length signals are sent through

a memoryless binary symmetric channel. The optimum minimum error classifier in this

case calculates the Hamming distance to the exemplar for each class and selects that

class with the minimum Hamming distance [23]. The Hamming distance is the number

of bits in the input which do not match the corresponding exemplar bits. The Hamming

network implements this algorithm.

For the network shown in Figure 2.6, parameters in the lower subnet are set such

that the matching scores generated by the outputs of the lower subnet are equal to N

minus the Hamming distances to the exemplar patterns. These matching scores range

from 0 to the number of elements in the input. They are the highest for those neurons

corresponding to classes with exemplars that best match the input. Parameters in the

upper subnet are fixed. All the thresholds are set to zero. Weights from each neuron to

itself are 1; weights between neurons are set to a small negative value.

After all the parameters have been set, a binary pattern is presented at the bottom

of the Hamming network. It must be presented long enough to allow the outputs of the

lower subnet to settle, and initialize the output values of the upper subnet. The lower

subnet is then removed and the upper subnet iterates until the output of only one neuron

is positive. Classification is then complete and the selected class is that corresponding

to the neuron with a positive output.
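The following Python sketch mirrors this two-stage operation; the bipolar (+1/−1) coding of the exemplars and the size of the mutual-inhibition weight eps are illustrative assumptions, not parameters taken from the thesis.

    import numpy as np

    def hamming_classify(exemplars, x, eps=None, max_iters=1000):
        """Sketch of the Hamming network: matching scores, then a MAXNET-style competition."""
        E = np.array(exemplars, dtype=float)              # M exemplars of N bipolar bits each
        M, N = E.shape
        # Lower subnet: score_j = N - HammingDistance(x, exemplar_j) = (x . e_j + N) / 2.
        y = (E @ np.array(x, dtype=float) + N) / 2.0
        # Upper subnet: self-weight 1, small negative weights between neurons,
        # iterate until only one output remains positive.
        if eps is None:
            eps = 1.0 / (2.0 * M)
        for _ in range(max_iters):
            y = np.maximum(0.0, y - eps * (y.sum() - y))  # threshold-logic neurons in the linear range
            if np.count_nonzero(y) <= 1:
                break
        return int(np.argmax(y))                          # index of the selected class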

The Hamming network has been used as an associative memory, in which role it has some advantages over the Hopfield network [59].


2.2 Image Processing

Images constitute a large percentage of the information people receive daily. Since images are

often degraded due to the imperfection in the recording process, it is necessary to process

these images to reveal the true information they contain.

Image processing includes several classes of problems. Some basic classes are image

representation and modeling, image enhancement, image restoration, image reconstruction, and image data compression. In this thesis, only image enhancement and image

restoration are considered. Hence, the term image processing hereafter is used to refer

to both areas of image enhancement and image restoration unless specified otherwise.

Image processing techniques may be divided into two main categories: transform-domain methods and spatial-domain methods. Approaches based on the first category

basically consist of computing a two-dimensional transform (e.g. Fourier or Hadamard

transform) of the image to be processed, altering the transform, and computing the

inverse to yield an image that has been processed in some manner. Spatial-domain techniques consist of procedures that operate directly on the pixels of the image in question.

In this thesis, both categories are introduced with an emphasis on the latter one.

2.2.1 Image Enhancement

Image enhancement techniques are designed to improve image quality for human viewing.

This formulation tacitly implies that an intelligent human viewer is available to recognize

and extract useful information from an image. This viewpoint also defines the human as

a link in the image processing system. Image enhancement techniques include contrast

and edge enhancement, pseudocoloring, noise filtering, sharpening, magnifying, and so

forth. Image enhancement is useful in feature extraction, image analysis, and visual


information display. The enhancement process itself does not increase the inherent information content in the data, but rather emphasizes certain specified image characteristics.

Enhancement algorithms are generally interactive and application-dependent.

Before we start the overview of image enhancement techniques, some notations have

to be explained here. U represents the image to be processed and V represents the

enhanced image. The ij-th element of U is denoted as u_ij or u(i, j); similarly, v_ij or v(i, j) is the ij-th element of V. A window is denoted as W and the window size is denoted as |W|.

The following is an overview of some image enhancement techniques.

Point Operations

Point operations are zero-memory operations where a given gray level u ∈ [0, L] is mapped into a gray level v ∈ [0, L] according to a transformation, that is,

v = f(u) (2.5)

Contrast Stretching Low-contrast images occur often due to poor or nonuniform

light conditions or due to nonlinearity or small dynamic range of the imaging sensor. A typical contrast stretching transformation is

v = αu,                0 ≤ u < a
    β(u − a) + v_a,    a ≤ u < b    (2.6)
    γ(u − b) + v_b,    b ≤ u ≤ L

The slope of the transformation is chosen greater than unity in the region of stretch.

The parameters a and b can be obtained by examining the histogram of the image. For

example, the gray scale intervals where pixels occur most frequently would be stretched

most to improve the overall visibility of a scene.
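A short NumPy sketch of the pointwise transformation (2.6) is given below; the breakpoints a, b and the slopes alpha, beta, gamma are arbitrary illustrative values.

    import numpy as np

    def contrast_stretch(u, a, b, L, alpha, beta, gamma):
        """Pointwise contrast stretching, Eq. (2.6), with slopes alpha, beta, gamma."""
        u = np.asarray(u, dtype=float)
        va = alpha * a                       # value reached at u = a
        vb = va + beta * (b - a)             # value reached at u = b
        v = np.where(u < a, alpha * u,
            np.where(u < b, beta * (u - a) + va,
                     gamma * (u - b) + vb))
        return np.clip(v, 0, L)

    # Example: stretch the mid-range [50, 200) of an 8-bit image most strongly.
    img = np.random.randint(0, 256, (4, 4))
    out = contrast_stretch(img, a=50, b=200, L=255, alpha=0.5, beta=1.5, gamma=0.5)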



Clipping and Thresholding A special case of contrast stretching where α = γ = 0 is called clipping. This is useful for noise reduction when the input signal is known to lie in the range [a, b]. Thresholding is a special case of clipping where a = b = t and the output becomes binary. For example, a seemingly binary image, such as a printed page, does not give a binary output when scanned because of sensor noise and background illumination variations. Thresholding is used to make such an image binary.

Intensity Level Slicing This technique permits segmentation of certain gray level

regions from the rest of the image. It is useful when different features of an image are

contained in different gray levels.

If the background is not wanted, the technique is

v = L for a ≤ u ≤ b, and 0 otherwise    (2.7)

Otherwise, it is

v = L for a ≤ u ≤ b, and u otherwise    (2.8)

Histogram Modeling The histogram of an image represents the relative frequency

of occurrence of the various gray levels in the image. Histogram-modeling techniques

modify an image so that its histogram has a desired shape. This is useful in stretching

the low-contrast levels of images with narrow histograms. Histogram modeling has been

found to be a powerful technique for image enhancement [41, 20].

Spatial Operations

Many image enhancement techniques are based on spatial operations performed on local neighborhoods of input pixels. Often, the image is convolved with a finite impulse response filter called a spatial mask.



Noise Smoothing It is desirable to remove the noise from a noise-degraded picture. A typical smoothing technique is the so-called spatial averaging. Here each pixel is replaced by a weighted average of its neighborhood pixels, that is

v(m, n) = Σ_{(k,l)∈W} a(k, l) u(m − k, n − l)    (2.9)

W is a suitably chosen window, and a(k, l) are the filter weights. A common class of spatial averaging filters has all equal weights, giving

v(m, n) = (1/N_W) Σ_{(k,l)∈W} u(m − k, n − l)    (2.10)

where N_W is the number of pixels in the window W, i.e., N_W = |W|. Another spatial averaging filter used often is given by

v(m, n) = (1/2) ( u(m, n) + (1/4) ( u(m − 1, n) + u(m + 1, n) + u(m, n − 1) + u(m, n + 1) ) )    (2.11)

that is, each pixel is replaced by its average with the average of its nearest four pixels.

Although spatial averaging can smooth a picture, it also blurs the edges. To protect the edges from blurring while smoothing, a directional averaging filter can be useful [26]. Spatial averages v(m, n : θ) are calculated in several directions θ (see Figure 2.7) as

v(m, n : θ) = (1/N_θ) Σ_{(k,l)∈W_θ} u(m − k, n − l)    (2.12)

and a direction θ* is found such that |u(m, n) − v(m, n : θ*)| is minimum. Then

v(m, n) = v(m, n : θ*)    (2.13)

gives the desired result.
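The following NumPy sketch implements the equal-weight averaging of (2.10) and the directional averaging of (2.12)-(2.13); the 3-pixel directional windows and the edge-replication border handling are illustrative choices of W and W_θ, not the exact windows used in the thesis.

    import numpy as np

    def box_average(u, size=3):
        """Equal-weight spatial averaging, Eq. (2.10), over a size x size window."""
        u = np.asarray(u, dtype=float)
        pad = size // 2
        up = np.pad(u, pad, mode='edge')
        v = np.zeros_like(u)
        for k in range(-pad, pad + 1):
            for l in range(-pad, pad + 1):
                v += up[pad + k: pad + k + u.shape[0], pad + l: pad + l + u.shape[1]]
        return v / (size * size)

    def directional_average(u):
        """Directional smoothing, Eqs. (2.12)-(2.13): average along four directions and
        keep, per pixel, the direction whose average is closest to the centre pixel."""
        u = np.asarray(u, dtype=float)
        up = np.pad(u, 1, mode='edge')
        c = up[1:-1, 1:-1]
        dirs = [
            (up[1:-1, :-2] + c + up[1:-1, 2:]) / 3.0,     # horizontal
            (up[:-2, 1:-1] + c + up[2:, 1:-1]) / 3.0,     # vertical
            (up[:-2, :-2] + c + up[2:, 2:]) / 3.0,        # one diagonal
            (up[:-2, 2:] + c + up[2:, :-2]) / 3.0,        # the other diagonal
        ]
        stack = np.stack(dirs)                            # shape (4, M, N)
        best = np.argmin(np.abs(stack - c), axis=0)       # theta* for each pixel
        return np.take_along_axis(stack, best[None], axis=0)[0]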

Median Filtering Median filtering is a non-linear process useful in reducing impulse

or salt-and-pepper noise [90]. Here the input pixel is replaced by the median of the pixels

contained in a window around the pixel, that is

v(m, n) = median{ u(m − k, n − l) : (k, l) ∈ W }    (2.14)

Figure 2.7: Directional Smoothing Filter

where W is a suitably chosen window. The algorithm for median filtering requires arranging the pixel values in the window in increasing or decreasing order and picking the middle value. Generally the window size is chosen so that |W| is odd. If |W| is even,

then the median is taken as the average of the two values in the middle.
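A compact NumPy sketch of the median filter (2.14) over a square window follows; border pixels are handled here by edge replication, which is one of several reasonable conventions and not necessarily the one assumed in the thesis.

    import numpy as np

    def median_filter(u, size=3):
        """Median filtering, Eq. (2.14): replace each pixel by the median of its size x size window."""
        u = np.asarray(u, dtype=float)
        pad = size // 2
        up = np.pad(u, pad, mode='edge')
        windows = np.stack([up[pad + k: pad + k + u.shape[0], pad + l: pad + l + u.shape[1]]
                            for k in range(-pad, pad + 1)
                            for l in range(-pad, pad + 1)])
        # For an even |W|, numpy's median averages the two middle values, as described above.
        return np.median(windows, axis=0)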

Unsharp Masking The unsharp masking technique is used commonly in the printing industry for crispening the edges [75]. A signal proportional to the unsharp, or

low-pass filtered, version of the image is subtracted from the image. This is equivalent to

adding the gradient, or a high-pass signal, to the image. In general the unsharp masking

operation can be represented by

v(m, n) = u(m, n) + λ g(m, n)    (2.15)

where λ > 0 and g(m, n) is a suitably defined gradient at (m, n). A commonly used gradient function is the discrete Laplacian

g(m, n) = u(m, n) − (1/4) ( u(m − 1, n) + u(m, n − 1) + u(m + 1, n) + u(m, n + 1) )    (2.16)
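The two equations can be combined into a few lines of NumPy, as sketched below; the value of lambda is an arbitrary example.

    import numpy as np

    def unsharp_mask(u, lam=1.0):
        """Unsharp masking, Eqs. (2.15)-(2.16): add lambda times a discrete-Laplacian gradient."""
        u = np.asarray(u, dtype=float)
        up = np.pad(u, 1, mode='edge')
        neighbours = up[:-2, 1:-1] + up[1:-1, :-2] + up[2:, 1:-1] + up[1:-1, 2:]
        g = u - neighbours / 4.0            # Eq. (2.16)
        return u + lam * g                  # Eq. (2.15)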

Low-pass, Band-pass, and High-pass Filtering Low-pass filters are useful for

noise smoothing and interpolation. High-pass filters are useful in extracting edges and in


sharpening images. Band-pass filters are useful in the enhancement of edges and other

high-pass image characteristics in the presence of noise.

If h_LP(m, n) denotes a FIR low-pass filter, then a FIR high-pass filter, h_HP(m, n), can be defined as

h_HP(m, n) = δ(m, n) − h_LP(m, n)    (2.17)

where δ(m, n) is a two-dimensional delta function defined as

δ(m, n) = 1 if m = n = 0, and 0 otherwise    (2.18)

Such a filter can be implemented by simply subtracting the low-pass filter output from its

input. Typically, the low-pass filter would perform a relatively long-term spatial average

(for example, on a 5 x 5, 7 x 7, or larger window).

A spatial band-pass filter can be characterized as

h_BP(m, n) = h_L1(m, n) − h_L2(m, n)    (2.19)

where h_L1(m, n) and h_L2(m, n) denote the FIRs of low-pass filters. Typically, h_L1 and h_L2 would

represent short-term and long-term averages, respectively.

Zooming Often it is desired to zoom on a given region of an image. This requires

taking an image and displaying it as a larger one. Typical techniques of zooming are replication and linear interpolation [45].

Replication is a zero-order hold where each pixel along a scan line is repeated once

and then each scan line is repeated. This is equivalent to taking a M x N image and

interlacing it by rows and columns of zeros to obtain a 2M x 2N matrix and convolving

the result with an array H, defined as

H = [ 1  1
      1  1 ]    (2.20)

This gives

v(m, n) = u(k, l),  k = ⌊m/2⌋,  l = ⌊n/2⌋,  m, n = 0, 1, 2, ...    (2.21)

Linear interpolation is a first order hold where a straight line is first fitted in between

pixels along a row. Then pixels along each column are interpolated along a straight line.

For example, for a 2 x 2 magnification, linear interpolation along rows gives

v1(m, 2n) = u(m, n),                              0 ≤ m ≤ M − 1, 0 ≤ n ≤ N − 1
v1(m, 2n + 1) = (1/2)[u(m, n) + u(m, n + 1)],     0 ≤ m ≤ M − 1, 0 ≤ n ≤ N − 1    (2.22)

Linear interpolation of the preceding along columns gives the result as

v(2m, n) = v1(m, n),                              0 ≤ m ≤ M − 1, 0 ≤ n ≤ N − 1
v(2m + 1, n) = (1/2)[v1(m, n) + v1(m + 1, n)],    0 ≤ m ≤ M − 1, 0 ≤ n ≤ N − 1    (2.23)

Here it is assumed that the input image is zero outside [0,M — 1] x [0, N — 1]. The

above result can also be obtained by convolving the 2M x 2N zero interlaced image with

the array H

H = [ 1/4  1/2  1/4
      1/2   1   1/2
      1/4  1/2  1/4 ]    (2.24)

whose origin (m = 0, n = 0) is at the center of the array. In most of the image processing

applications, linear interpolation performs quite satisfactorily. High-order (say, p) inter­

polation is possible by padding each row and each column of the input image by p rows

and p columns of zeros, respectively, and convolving it p times with H. For example

p = 3 yields a cubic spline interpolation in between the pixels.
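The following NumPy sketch performs a 2 x 2 zoom by replication and by linear interpolation (zero-interlacing followed by convolution with the array H of (2.24)); it is an illustration rather than an optimized implementation, and it assumes the image is zero outside its borders, as in the text.

    import numpy as np

    def zoom_replicate(u):
        """2x zoom by replication (zero-order hold), Eq. (2.21)."""
        u = np.asarray(u, dtype=float)
        return np.repeat(np.repeat(u, 2, axis=0), 2, axis=1)

    def zoom_linear(u):
        """2x zoom by linear interpolation: zero-interlace, then convolve with H of Eq. (2.24)."""
        u = np.asarray(u, dtype=float)
        M, N = u.shape
        z = np.zeros((2 * M, 2 * N))
        z[::2, ::2] = u                                  # zero-interlaced image
        H = np.array([[0.25, 0.5, 0.25],
                      [0.5,  1.0, 0.5],
                      [0.25, 0.5, 0.25]])
        zp = np.pad(z, 1)                                # zero padding outside the image
        out = np.zeros_like(z)
        for i in range(3):                               # direct 3 x 3 convolution
            for j in range(3):
                out += H[i, j] * zp[i:i + 2 * M, j:j + 2 * N]
        return out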

Transform Operations

In the transform operation enhancement techniques, zero-memory operations are performed on a transformed image followed by the inverse transformation. We start with


the transformed image U′ = {u′(k, l)} as

U′ = A U A^T    (2.25)

where U = {u(m,n)} is the input image. Then the inverse transform of

v'(k,l) = f(u'(k,l)) (2.26)

gives the enhanced image as

V = A^{-1} V′ [A^T]^{-1}    (2.27)

The transform can be the Discrete Fourier Transform (DFT) or other orthogonal transforms.

Generalized Linear Filtering In generalized linear filtering, the zero-memory

transform domain operation is a pixel-by-pixel multiplication

v′(k, l) = g(k, l) u′(k, l)    (2.28)

where g(k, l) is called a zonal mask.

A filter of special interest is the inverse Gaussian filter, whose zonal mask for N × N images is defined as

g(k, l) = exp{ (k^2 + l^2) / (2σ^2) },  0 ≤ k, l ≤ N/2 − 1;  g(k, l) = g(N − k, N − l), otherwise    (2.29)

for the case when A in (2.25) is DFT. This is a high-frequency emphasis filter that restores

images blurred by atmospheric turbulence or other phenomena that can be modeled by

Gaussian PSFs.

Root Filtering The transform coefficients u′(k, l) can be written as

u′(k, l) = |u′(k, l)| e^{jθ(k,l)}    (2.30)

In root filtering, the α-root of the magnitude component of u′(k, l) is taken, while retaining the phase component, to yield

v′(k, l) = |u′(k, l)|^α e^{jθ(k,l)},  0 ≤ α ≤ 1    (2.31)
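A short NumPy sketch of this α-rooting, using the DFT as the transform A, is given below; the choice alpha = 0.5 is illustrative.

    import numpy as np

    def alpha_root_filter(u, alpha=0.5):
        """Root filtering, Eqs. (2.30)-(2.31): alpha-root the DFT magnitudes, keep the phases."""
        U = np.fft.fft2(np.asarray(u, dtype=float))
        V = (np.abs(U) ** alpha) * np.exp(1j * np.angle(U))
        return np.real(np.fft.ifft2(V))     # enhanced image (imaginary part is round-off)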


For common images, since the magnitude of u'(k, I) is relatively smaller at higher spatial

frequencies, the effect of a-rooting is to enhance higher spatial frequencies relative to

lower spatial frequencies.

Homomorphic Filtering If the magnitude term in (2.31) is replaced by the logarithm of |u′(k, l)|, defining

v′(k, l) = [log |u′(k, l)|] e^{jθ(k,l)}    (2.32)

then the inverse transform of v′(k, l), denoted by v(m, n), is called the generalized cepstrum of the image. In practice a positive constant is added to |u′(k, l)| to prevent the logarithm from going to negative infinity. The image v(m, n) is also called the generalized homomorphic transform, H, of the image u(m, n). The generalized homomorphic linear filter performs zero-memory operations on the H-transform of the image followed by the inverse H-transform. The homomorphic transformation reduces the dynamic range

of the image in the transform domain and increases it in the cepstral domain [39].

Pseudocoloring

In addition to the requirements of monochrome image enhancement, color image enhancement may require improvement of color balance or color contrast in a color image.

Enhancement of color images becomes a more difficult task not only because of the added

dimension of the data but also due to the added complexity of color perception [55].

A practical approach to developing color image enhancement algorithms is shown in

Figure 2.8. The input color coordinates of each pixel are independently transformed into

another set of color coordinates, where the image in each coordinate is enhanced by its

own (monochrome) image enhancement algorithm, which could be chosen suitably from

the foregoing set of algorithms. The enhanced image coordinates T′_1, T′_2, T′_3 are inverse transformed to R′, G′, B′ for display. Since each image plane T_k(m, n), k ∈ {1, 2, 3},


Figure 2.8: Color Image Enhancement

is enhanced independently, care has to be taken so that the enhanced coordinates T′_k are within the color gamut of the R-G-B system. The choice of the color coordinate system T_k, k ∈ {1, 2, 3}, in which enhancement algorithms are implemented may be problem-

dependent.

2.2.2 Image Restoration

Any image acquired by optical, electro-optical or electronic means is likely to be degraded

by the sensing environment. The degradation may be in the form of sensor noise, blur

due to camera misfocus, relative object-camera motion, random atmospheric turbulence,

and so on. Image restoration is concerned with filtering the observed image to minimize

the effect of degradations. The effectiveness of image restoration filters depends on the

extent and the accuracy of the knowledge of the degradation process as well as on the

filter design criterion. Image restoration techniques are classified according to the type

of criterion used.

Image restoration differs from image enhancement in that the latter is concerned more

with accentuation or extraction of image features rather than restoration of degradations.

Image restoration problems can be quantified precisely, whereas enhancement criteria are


difficult to represent mathematically. Consequently, restoration techniques often depend

only on the class or ensemble properties of a data set, whereas image enhancement

techniques are much more image dependent.

The most common degradation or observation model is

g = H f + n (2.33)

where g is the observed or degraded image, f is the original image and n is the noise

term. The objective of image restoration is to find the best estimate f̂ of the original

image f based on some criterion.

Unconstrained Least Squares Filters

From Equation (2.33), the noise term in the degradation model is given by

n = g - H f (2.34)

In the absence of any knowledge about n, a meaningful criterion is to seek an f̂ such that

H f̂ approximates g in a least-squares sense, that is,

J(f̂) = ||g − H f̂||^2   (2.35)

is minimum, where ||x|| is the L_2 norm of vector x.

Inverse Filter Solving Equation (2.35) for f̂ yields

f̂ = (H^T H)^{-1} H^T g   (2.36)

If H is a square matrix and assuming that H^{-1} exists, Equation (2.36) reduces to

f̂ = H^{-1} g   (2.37)

This filter is called the inverse filter [26].


Constrained Least-Squares Filters

In order that the restoration filters have more effect than simply inversions, a constrained

least-square filter might be developed in which the constraints allow the designer addi­

tional control over the restoration process.

Assuming the norm of the noise signal ||n||^2 is known or measurable a posteriori from

the image, the restoration problem can be reformulated as minimizing ||Q f̂||^2 subject

to ||g − H f̂||^2 = ||n||^2. By using the method of Lagrange multipliers, the restoration

problem becomes finding f̂ such that

J(f̂) = ||Q f̂||^2 + α(||g − H f̂||^2 − ||n||^2)   (2.38)

is minimum.

The solution to Equation (2.38) is

f̂ = (H^T H + γ Q^T Q)^{-1} H^T g   (2.39)

where γ = 1/α.

Pseudo-inverse Filter. If it is desired to minimize the norm of f̂, that is, Q = I,

then the estimate f̂ is given by

f̂ = (H^T H + γ I)^{-1} H^T g   (2.40)

In the limit as γ → 0, the resulting filter is known as the pseudo-inverse filter [2].
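For small problems the filters (2.36), (2.39) and (2.40) can be written down almost verbatim; the sketch below is a toy illustration with an assumed one-dimensional blur matrix, not a practical image restorer.

```python
import numpy as np

def least_squares_restore(H, g):
    """Unconstrained estimate (2.36): f = (H^T H)^{-1} H^T g."""
    return np.linalg.solve(H.T @ H, H.T @ g)

def constrained_restore(H, g, Q, gamma):
    """Constrained estimate (2.39): f = (H^T H + gamma Q^T Q)^{-1} H^T g.
    Q = I gives the pseudo-inverse filter (2.40); Q^T Q = Rf^{-1} Rn gives the Wiener filter (2.44)."""
    return np.linalg.solve(H.T @ H + gamma * (Q.T @ Q), H.T @ g)

# Toy example: a crude 1-D blur plus noise, restored with Q = I.
rng = np.random.default_rng(0)
N = 8
H = np.eye(N) + 0.5 * np.eye(N, k=1)
f = rng.random(N)
g = H @ f + 0.01 * rng.standard_normal(N)
f_hat = constrained_restore(H, g, np.eye(N), gamma=1e-2)
```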

Wiener Filter. Let R_f and R_n be the correlation matrices of f and n, defined

respectively as

R_f = E{f f^T}   (2.41)

and

R_n = E{n n^T}   (2.42)


where E{.} denotes the expected value operation.

By defining

Q^T Q ≜ R_f^{-1} R_n   (2.43)

and substituting this expression into Equation (2.39), we obtain

f̂ = (H^T H + γ R_f^{-1} R_n)^{-1} H^T g   (2.44)

which is known as the Wiener filter [32].

Maximum Entropy Filter. If the image f is normalized to unit energy, then each

scalar value f_i can be interpreted as a probability. Then the entropy of the image f would

be given by

Entropy = −f^T ln f   (2.45)

where ln f refers to componentwise natural logarithms, that is

ln f = (ln f_1, ln f_2, ..., ln f_N)^T   (2.46)

If the constrained least-squares approach is applied as before, then the negative of the entropy

could be minimized subject to the constraint that ||g − H f̂||^2 = ||n||^2. Thus the objective

function becomes

J(f̂) = f̂^T ln f̂ − α(||g − H f̂||^2 − ||n||^2)   (2.47)

The solution to this equation is

f̂ = exp{−1 − 2α H^T (g − H f̂)}   (2.48)
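Equation (2.48) only defines f̂ implicitly. One way to read it is as a fixed-point iteration, sketched below; the damping factor, the value of α and the starting point are assumptions of the sketch, and convergence is not guaranteed in general.

```python
import numpy as np

def max_entropy_restore(H, g, alpha=0.1, iters=200, damping=0.5):
    """Illustrative fixed-point iteration for (2.48): f <- exp(-1 - 2*alpha*H^T (g - H f))."""
    f = np.full(H.shape[1], 1.0 / H.shape[1])        # start from a uniform "probability" image
    for _ in range(iters):
        f_new = np.exp(-1.0 - 2.0 * alpha * (H.T @ (g - H @ f)))
        f = (1.0 - damping) * f + damping * f_new    # damped update for stability
    return f / f.sum()                               # renormalize to unit energy
```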

Bayesian Methods

In many imaging situations, for instance, image recording by film, the observation model

is non-linear as

g = s{Hf} + n (2.49)


where s{x} is a componentwise non-linear function of vector x. The Bayes estimation

problem associated with Equation (2.49) is to find an estimate f̂ such that

p(f̂ | g) is maximized   (2.50)

where p(f | g) is the density function of f given g. From Bayes' law, we have

p(f | g) = p(g | f) p(f) / p(g)   (2.51)

Maximizing the above equation requires that the a priori probability density on the right-

hand side be defined.

Maximum A Posteriori Estimate Under the assumption of Gaussian statistics

for f and n, with covariances R_f and R_n, the MAP estimate is the solution of minimizing

the following function [6]

ln p(f | g) = −(1/2)(g − s{Hf})^T R_n^{-1} (g − s{Hf})
              − (1/2)(f − f̄)^T R_f^{-1} (f − f̄) + ln p(g) + K   (2.52)

where K is a constant factor and f̄ is the mean of f. The solution is

f̂ = f̄ + R_f H^T D R_n^{-1} (g − s{H f̂})   (2.53)

where D is a diagonal matrix defined as

D ≜ Diag{ ds(x)/dx |_{x = b_i} }   (2.54)

and b_i are the elements of the vector b = H f̂. Equation (2.53) is a nonlinear matrix

equation for f̂, and since f̂ appears on both sides, there is a feedback structure as well [42].

Analogies in the estimation of continuous waveforms have been derived for communication

theory problems [91].


Maximum Likelihood Estimate Associated with the MAP estimate is the

maximum likelihood (ML) estimate, which is derived by assuming that p(f | g) = p(g | f);

that is, the vector f is a nonrandom quantity. Accordingly, Function (2.52) reduces to

ln p(f | g) = −(1/2)(g − s{Hf})^T R_n^{-1} (g − s{Hf}) + ln p(g) + K   (2.55)

The solution of minimizing the above function is

H^T D R_n^{-1} (g − s{H f̂}) = 0   (2.56)

That is,

g = s{H f̂}   (2.57)


Chapter 3

DEFINITIONS AND NOTATIONS

The field of neural networks has attracted many people from different disciplines. The

diversity of their backgrounds is reflected in the variety of terminologies. Although some

efforts are being made to address the terminology problem, a standard terminology is

still yet to come [16]. For the sake of clarity and some other reasons stated later in this

chapter, we give our definitions of neurons, neural networks, and other related concepts

here.

Definition 3.1 A neuron is a simple computing element with n inputs x_1, x_2, ..., x_n (n ≥

1) and one output y. It is characterized by n + 1 numbers, namely, its threshold t and

the weights w_1, w_2, ..., w_n, where w_i is associated with x_i. A neuron operates on a discrete

time scale k = 1, 2, 3, 4, ..., its output at time k + 1 being determined by its inputs at time k

according to the following rule

y(k + 1) = f_a( Σ_{i=1}^{n} w_i x_i(k) − t )   (3.1)

where f_a(·) is a monotonic function.

The function of a neuron is to map points in a multi-dimensional space X^n to points

in a one-dimensional space Y, that is,

f_a : X^n → Y   (3.2)

From the definition, f_a is a composite function

f_a = f_1 ∘ f_2   (3.3)


Figure 3.1: The Schematic Representation of a Neuron

where

f_1 : R → Y   (3.4)

and

f_2 : X^n → R   (3.5)

Function f_2 is a linear function in the sense that

f_2(λ_1 x_1 + λ_2 x_2) = λ_1 f_2(x_1) + λ_2 f_2(x_2)   ∀ λ_1, λ_2 ∈ R   (3.6)

A neuron is defined on the triple (X^n, Y, f), where f is an element of the set F of

activation functions. Let us denote the linear activation function as L, the threshold-logic

activation function as T, and the quasi-linear activation function as Q. A neuron is

schematically represented as shown in Figure 3.1, where A ∈ F.

Definition 3.2 A neural network is a collection of neurons, each with the same time

scale. Neurons are interconnected by splitting the output of any neuron into a number of

lines and connecting some or all of these to the inputs of other neurons. An output may

thus lead to any number of inputs, but an input may come from at most one output.

A neuron is denoted as p. The i-th neuron in a network is denoted as p_i. The set of

all the indices which specify the sequence of neurons is denoted as I. For a network of n neurons, I = {1, 2, ..., n}.


The output of a neuron is called the state of the neuron. The state of p_i at time

k is denoted as s_i(k). In this thesis, we assume that the states of all the neurons in a

network are updated at the same time instance. Such updating is known as synchronous

updating [25]. At k = 0, the network is initialized with appropriate values, and the

network then updates itself until it halts when a certain criterion is met. The function of

a neural network with N inputs and M outputs is to map points in a multi-dimensional

space to points in another multi-dimensional space, that is, it performs the following

function

f : X^N → Y^M   (3.7)

The input of a network is denoted as x or x̄, and the output of a network is denoted as

y or ȳ.

Neurons in a network are classified into three types: input neurons, output neurons,

and hidden neurons. An input neuron is a neuron whose initial state is an element of

the network input x; an output neuron is a neuron whose state at the time when the

network halts is an element of the network output y; a hidden neuron is a neuron which

does not belong to either of the first two classes. A neuron can be both an input and an

output neuron.

The input neuron is somewhat special. For example, in a feedforward network, an

input neuron has only one input line. If the input x ∈ B, then a threshold-logic neuron

can be used as the input neuron; if x ∈ V, then a quasi-linear neuron can be used as

the input neuron; if x ∈ R, then a linear neuron has to be used as the input neuron.

In all cases, a linear neuron can always be used as the input neuron. Therefore, for

the sake of simplicity, input neurons are always linear neurons unless specified otherwise.

An input neuron is schematically shown in Figure 3.2, which represents a linear neuron

with t = 0 and w = 1.


Figure 3.2: The Schematic Representation of the Input Neuron

Figure 3.3: Three Ways of Setting Up an Input Neuron

There are three ways to set up an input neuron: (1) s(0) = x, s(k > 0) = 0; (2)

s(k ≥ 0) = x; and (3) s(k) = x(k). These three cases are shown in Figure 3.3. To

represent any one of the three settings, an input neuron without the input line is used

(see Figure 3.3.d).

At time k = 0, the input neurons are loaded with the network input, and the rest are

usually loaded with zeros. The computation time T_p of a network is the time between

when the network is initialized and when the solution appears on the output neurons.

Our definitions of neuron and neural network are not arbitrarily chosen. They provide

a unified framework to describe many kinds of neural network models. For instance, a

2 x 2 x 1 back-propagation network can be implemented by our network as shown

in Figure 3.4, where S denotes the sigmoid activation function. At time k = 0, the

input neurons are loaded with network input x, and the rest of the neurons with zeros.

At k = 2, the state of the output neuron is the true output of the network—hence the

computation time of this network is 2. If the input neurons are set up as in Figure 3.3.b,

then s_5(k > 2) = s_5(2).

Feedback networks can also be implemented by our networks. For instance, a four-

neuron Hopfield network is implemented as shown in Figure 3.5. Note that here every


Figure 3.5: The Hopfield Network

neuron is both the input and output neuron. After the network is initialized, it updates

itself until s_i(h + 1) = s_i(h) ≠ s_i(h − 1)  ∀ i ∈ I = {1, 2, 3, 4}. The states of all the

neurons at k > h are the true network output—hence, h is considered the computation

time of this network.

Another example is the implementation of a 2 x 2 x 2 Hamming network as shown

in Figure 3.6. In Lippmann's original definition of Hamming network, there is a control

problem that the lower subnet has to be removed after the upper subnet is initialized

with the output of the lower subnet (see [58]). Since the network itself has no means

to do so, an external mechanism has to be employed, for example, a set of synchronized

switches between the lower subnet and upper subnet. In our networks, such a control

problem is easily solved by setting the input neurons as shown in Figure 3.3.a.


Figure 3.6: The Hamming Network

Neurons in our networks are usually numbered in a one-dimensional manner. Never-

theless, they can be spatially arranged in a two- or higher-dimensional manner as shown in

previous examples. In those cases, the neurons are numbered either from top-to-bottom

and left-to-right (see Figure 3.4), or from left-to-right and bottom-to-top (see Figure 3.5

and Figure 3.6).

There are two kinds of connections from p_i to p_j. One is from p_i directly to p_j, and

the other is from p_i to some other neurons and then to p_j. The former is called a direct

connection, and the latter an indirect connection.

There are also two distinct direct connections between two neurons, say, p_i and p_j,

in a network. One is from the output of p_i to the input of p_j, whose strength or weight

is denoted as w_ji. The other is from the output of p_j to the input of p_i, whose weight is

denoted as w_ij.

The matrix W whose ij-th element is w_ij is called the weight matrix of the network. A

vector whose i-th element is the threshold of p_i is called the threshold vector and is denoted as

t. A vector whose i-th element is the type of activation function of p_i is called the activation

vector and is denoted as a. A neural network is fully defined by the triple (W, t, a),


which specifies what function the network performs.
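Since the triple (W, t, a) fully defines a network, it also suffices to simulate one. The sketch below performs synchronous updating with the linear (L), quasi-linear (Q) and threshold-logic (T) activation functions used in this thesis; the encoding of a as a list of letters is an assumption of the sketch.

```python
import numpy as np

# Activation functions: 'L' linear, 'Q' quasi-linear, 'T' threshold-logic.
ACT = {
    "L": lambda z: z,
    "Q": lambda z: max(z, 0.0),
    "T": lambda z: 1.0 if z >= 0.0 else 0.0,
}

def synchronous_step(W, t, a, s):
    """One synchronous update: s_i(k+1) = f_{a_i}( sum_j W[i, j] s_j(k) - t_i )."""
    z = W @ s - t
    return np.array([ACT[a[i]](z[i]) for i in range(len(s))])

def run(W, t, a, s0, steps):
    """Initialize the states at k = 0 and iterate; the halting criterion is left to the caller."""
    s = np.asarray(s0, dtype=float)
    for _ in range(steps):
        s = synchronous_step(W, t, a, s)
    return s
```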

Neurons in a network are usually grouped in layers. A layer of neurons is a group of

neurons each of which has no direct connections with others in the same group. Symbol

C_h denotes the set of all neurons belonging to layer h, and C denotes the set of all the

possible C_h. From the definitions, we know that the following is true

Σ_h |C_h| = |I|   (3.8)

The communication time t_c(i, j) from p_i to p_j is defined as the shortest period of

time it takes for data in p_i to reach p_j. For example, in the Hamming network shown in

Figure 3.6, t_c(2, 3) = 1 and t_c(2, 5) = 2. If there is no connection between two neurons,

then t_c(i, j) = t_c(j, i) = ∞. The communication time t_l(k, h) from layer k to layer h is

defined as

t_l(k, h) = max{ t_c(i, j) : i ∈ C_k, j ∈ C_h }   (3.9)

A matrix C whose ij-th element is defined as

c_ij = 1 if w_ij ≠ 0, and c_ij = 0 otherwise   (3.10)

is called the connectivity matrix. The architecture A_p of a network is defined by the connec-

tivity matrix of the network. The configuration C_p of a network is defined by the triple

(C, t, a). The architecture of a network specifies the structure of the network; there

are two major classes of network architectures, namely, feedforward and feedback. The

configuration of a network specifies what class of functions this network can perform.

The connectivity matrix C of a network contains a great deal of information about

the properties of the network.

Fact 3.1 A neural network has a feedforward architecture iff c_ij ∧ c_ji = 0  ∀ i, j ∈ I.


Fact 3.2 A neural network has a feedback architecture iff there exist i, j ∈ I such that c_ij ∧ c_ji = 1.

Fact 3.3 There is a direct connection between p_i and p_j iff c_ij ∨ c_ji = 1.

Fact 3.4 There is an indirect connection between p_i and p_j iff there exists a non-empty

sequence (k_1, k_2, ..., k_n) such that either c_{k_1 j} ∧ c_{k_2 k_1} ∧ ... ∧ c_{k_n k_{n−1}} ∧ c_{i k_n} = 1 or

c_{k_1 i} ∧ c_{k_2 k_1} ∧ ... ∧ c_{k_n k_{n−1}} ∧ c_{j k_n} = 1.
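Facts 3.1–3.3 can be checked mechanically from the connectivity matrix; the 0/1 array representation below is an assumption of this small sketch.

```python
import numpy as np

def is_feedforward(C):
    """Fact 3.1: feedforward iff c_ij AND c_ji = 0 for every pair (i, j)."""
    C = np.asarray(C, dtype=bool)
    return not np.any(np.logical_and(C, C.T))

def directly_connected(C, i, j):
    """Fact 3.3: a direct connection between neuron i and neuron j exists iff c_ij OR c_ji = 1."""
    return bool(C[i, j] or C[j, i])

# Example: a chain 1 -> 2 -> 3 is feedforward; adding the reverse link 2 -> 1 creates feedback.
C = np.zeros((3, 3), dtype=int)
C[1, 0] = 1                  # w_21 nonzero: connection from neuron 1 to neuron 2
C[2, 1] = 1                  # connection from neuron 2 to neuron 3
print(is_feedforward(C))     # True
C[0, 1] = 1                  # reverse connection from neuron 2 to neuron 1
print(is_feedforward(C))     # False: neurons 1 and 2 now form a feedback pair
```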

The neural networks of our definition are, mathematically speaking, a class of au­

tomata networks [25]. Yet, our networks differ from the automata networks of the usual

sense in that the state space of our networks is inherently infinite. Moreover, our networks are

different from those defined in the field of automata networks, where neural networks are

defined as systems of McCulloch-Pitts (threshold-logic) neurons (see [7, 25]). Note that

such networks are only a subset of our networks. Nevertheless, analytical tools used in

the field of automata networks are useful in the analysis of our networks.

The analysis of our networks is certainly an interesting and challenging subject. How­

ever, the main interest of this thesis is in the synthesis part, that is, how to map a

function to a network such that the network can perform the function. In other words,

we are interested in finding procedure(s) P such that

P : A → N   (3.11)

where A is the set of functions and N is the set of triples (W, t, a).


Chapter 4

REASONS FOR ALGEBRAIC DERIVATION

The essence of neural network problem solving is that the network performs a function

which is a solution to the problem. The function can be viewed as a computing rule which

states how to compute a given input to get the correct output. This rule is coded in the

network architecture, the weights, the thresholds, as well as the activation functions.

A neural network is a collection of interconnected simple computing units, each of

which can be viewed as a computational instruction. Thus constructing a network to

solve a problem is equivalent to arranging these instructions such that a computing rule

is formed which solves the problem. This is similar to programming in digital computers.

There are basically two ways of "programming" a neural network: (1) analytically de­

riving the network architecture and the network parameters based on a given computing

rule; or (2) assuming a network architecture and adjusting network parameters until a

proper computing rule is formed. Adjusting network parameters in such a meaningful

way is referred to as learning or training in the field of neural networks [94].

Let us refer to the first approach as analytical approach, and the second as learning

approach. There are two typical network models which are representatives of the two

approaches respectively. One is the Hamming network [59], which represents the former

approach. The other is the back-propagation network [80], which represents the latter

approach.

The learning approach is necessary when no computing rule is available or at least not

completely available. The role of learning here is to find a computing rule, which solves


the given problem, based upon a partial solution of the problem. When the computing

rule is completely known, the learning approach is still useful in the sense that here the

role of learning is to automate the process of coding the computing rule.

However, there are some practical problems associated with the learning approach,

one of which is that it usually takes a very long period of time to train a network. This

problem is caused by several factors which shall be discussed in the next section.

When the computing rule is known, it is sensible to utilize it in constructing neural

networks. The analytical approach ensures proper network architectures and proper

network parameters.

However, the existing network models designed using the analytical approach are

only capable of solving a narrow range of problems. Moreover, these network models

are more or less hand crafted, heavily based upon the imagination and intuition of those

who developed these models. There is still no systematic method of network design. The

objective of this thesis is to develop such a method.

In the following sections, we shall demonstrate the drawbacks of the learning approach

with the example of training the back-propagation network to solve the parity problem;

we shall also describe the analytical approach in more detail, with examples of designing

the Hamming network, as well as designing a network model to solve the parity problem.

We then discuss the current situation of the analytical approach, and propose our method.

4.1 Drawbacks of the Learning Approach

The back-propagation network is the most popular network model used in neural network

applications because it can be trained to implement any function to any specified degree

of accuracy [30]. However, there are some practical problems with its learning process.

The first problem is that learning usually takes a long period of time even for a small


scale problem. One cause of this problem is the back-propagation algorithm itself.

Although many improved versions have been developed [69, 28, 44], the fastest version is

still time consuming. Another cause is that it is difficult to know in advance how many

neurons in each layer are required. These "magic" numbers have to be found, at least

for the time being, through trial-and-error. This means that after days, even weeks of

training, one has to start the training process all over again if the number of neurons in

the network is not sufficient.

The second problem is that after training a network to implement a certain function,

one rarely gains any insight that would help in training another network to implement a

similar function, e.g., a function which is in the same family as the previous function.

The nature of image enhancement and restoration problems makes the first problem

even more severe. Taking the example of implementing a median filter with the window

size of three (the simplest case), assuming that each pixel takes integer values ranging

from 0 to 255, then there are 256^3 = 16,777,216 possible input patterns. In order to

achieve zero error rate, the training set may have to consist of all the input patterns

and their corresponding correct outputs. Such a training set is astronomically enormous,

needless to say for the cases where the window size is bigger. To underscore the com­

plexity of learning, suppose we have the means to have neural networks learn one pattern

every 1 ms, then to implement the above mentioned median filter, it would take a net­

work about 5 hours to learn all the patterns. However, with the same learning rate, to

implement a median filter with a window size of five, it would take a network about 35

years to learn all the possible patterns.
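The arithmetic behind these estimates is easy to reproduce; one pattern per millisecond is, of course, only the learning rate assumed in the text.

```python
# 8-bit pixels, window of w pixels: 256**w input patterns, presented once at 1 ms each.
for w in (3, 5):
    patterns = 256 ** w
    seconds = patterns * 1e-3
    print(w, patterns, seconds / 3600.0, seconds / (3600.0 * 24 * 365))
# w = 3: 16,777,216 patterns, roughly 4.7 hours
# w = 5: about 1.1e12 patterns, roughly 35 years
```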

The first problem is well documented in the literature [92, 43, 81], hence we shall only

illustrate the second problem here through the example of training the back-propagation

network to solve the parity problem. This problem is impossible for the Perceptron

to solve [65], but is solvable by the back-propagation network [80], which is one of the


Figure 4.1: Networks for Solving: (a) the XOR problem; (b) the Parity-3 problem; and (c) the Parity-4 problem.

reasons for the fame of this network model. We shall start with the XOR problem, which

is a special case of the parity problem when the problem size n = 2, and continue with

cases when n = 3 and n = 4.

Example 4.1: The Parity Problem.

The truth table of the Exclusive OR (XOR) is shown in Table 4.1. The network

architecture we have chosen is shown in Figure 4.1. The generalized delta rule (see

Appendix A) is used to train the network. All the patterns in Table 4.1 are included in

the training set. The result after the network has learnt how to solve the XOR problem,

i.e., for every input pattern its output is the same as that in Table 4.1, is shown in

Table 4.2.

Similarly, we carried out the training for n = 3 and n = 4. The network architectures

for solving these problems are also shown in Figure 4.1. The network parameters after

training are shown in Table 4.3 and Table 4.4 respectively. The number which has p_j to


Table 4.1: The Exclusive OR

x_1  x_2 | z
 0    0  | 0
 0    1  | 1
 1    0  | 1
 1    1  | 0

Table 4.2: Network Parameters For Solving XOR Problem

        p_1         p_2
p_3    4.146451   -2.968075
p_4    1.379474   -2.464423

        t_3         t_4
      -1.889660    0.380817

        p_3         p_4
p_5   -3.351054    2.622306

        t_5
      -1.845504

the top and p_i to the left is the value of the weight of the connection from p_j to p_i. The

number which has t_i to the top is the value of the threshold of neuron p_i.

By looking at the three tables of network parameters, it is difficult to generalize what

the parameters of the network will be for cases where the problem size is five or larger.

Consequently, one has to start from scratch to train networks for these cases.

The number of learning steps required for the network to have learnt how to solve the

parity problem is shown in Table 4.5 for n = 2, 3, 4, 5 and η = 0.10, 0.15, 0.20, 0.25, where

η is the gain term of the generalized delta rule (see Appendix A). Here, the learning step

is defined as the interval during which all the network parameters are being updated


Table 4.3: Network Parameters For Solving Parity-3 Problem

        p_1         p_2         p_3
p_4   -4.396586    5.399827   -3.631473
p_5   -1.965704    2.695202   -0.726498
p_6    1.025570   -0.856012    2.323546

        t_4         t_5         t_6
      -1.486861   -1.788286   -0.119762

        p_4         p_5         p_6
p_7   -4.192873    4.991574   -2.708791

        t_7
       0.151018

Table 4.4: Network Parameters For Solving Parity-4 Problem

        p_1         p_2         p_3         p_4
p_5   -4.865602   -6.202526    5.522527   -6.278468
p_6   -3.045859    4.738367   -4.957426    4.048179
p_7   -6.845311    4.469164   -2.124128    4.499167
p_8   -3.836795   -2.465528    2.714290   -2.221879

        t_5         t_6         t_7         t_8
      -3.129540    3.778817    2.953945   -3.396108

        p_5         p_6         p_7         p_8
p_9   -4.868719    5.105030   -5.718770    6.778777

        t_9
       1.817817


Table 4.5: Number of Learning Steps


n    η = 0.10   η = 0.15   η = 0.20   η = 0.25
2      9830       6485       4885       4003
3     30652      20581      15490      12527
4     94317      80384      69852      75428
5    170604     120640      89585      67585

once. The learning step is assumed constant, which implies that the updating is done

in parallel. Note that the number of learning steps increases in such a manner that it soon

gets very large as the problem size increases. As a matter of fact, the so-called loading

problem (see [48]), which concerns how long it takes a neural network to have learnt every

pattern in the training set with respect to the set size, turns out to be NP-complete

[47, 48]. Moreover, for the back-propagation network to solve the parity problem, the

training set may have to include every possible pattern, which means that the size of

the training set is two to the power of the problem size, i.e., the size of the training

set increases exponentially as the problem size increases. Therefore, training a network

for the cases of larger problem size not only means starting all over again, but may be

infeasible when the problem size is fairly large.

Finally, the network architectures shown in Figure 4.1 were chosen because we knew

they work from experiments done by other researchers [80]. Otherwise, an extra amount of

work would have had to be done to find the right network architectures.

4.2 The Advantages of The Analytical Approach

The analytical approach is demonstrated with examples of designing the Hamming net­

work, as well as designing a network model to solve the parity problem.


4.2.1 Designing the Hamming Network

A classic problem in communication theory is that given a set of binary patterns, called

exemplars, for a given binary pattern, how to determine which exemplar is the closest to

the input pattern. This problem occurs when binary fixed-length signals are sent through

a memoryless binary symmetric channel. The optimum minimum error classifier in this

case calculates the Hamming distance to the exemplar for each class and selects that

class with the minimum Hamming distance [23]. The Hamming distance is the number

of bits in the input which do not match the corresponding exemplar bits. The Hamming

network developed by Lippmann [58] implements this algorithm.

Let us denote the input pattern and the exemplar as v and e respectively. Each

element of both vectors is binary and belongs to the set {+1, −1}. Suppose there are

M exemplars {e_1, e_2, ..., e_M}; then the Hamming distance between an input pattern and

the i-th exemplar is

d_i = (1/2)(N − v · e_i)   (4.1)

where N is the number of elements in the vector and v · e_i is the inner product of two

vectors.

Since minimizing d_i is equivalent to maximizing b_i = N − d_i = (1/2)(N + v · e_i), the

optimum minimum error algorithm can be restated as follows:

1. Calculate b_i  ∀ i ∈ {1, 2, ..., M};

2. Find the index k such that b_k = max{ b_i : i ∈ {1, 2, ..., M} };

3. Output the exemplar vector e_k.

To implement this algorithm, the Hamming network is structured in two subnets: the

lower subnet calculates b_i (the i-th matching score) and the upper subnet finds the index

k. Note that the Hamming network does not implement the third step.


In the lower subnet, one neuron is enough to calculate each b_i by setting the weights

to w_ij = e_ij / 2, the threshold to θ_i = −N/2, and the activation function to

f(a) = u, if a > u;   f(a) = a, if u ≥ a ≥ 0;   f(a) = 0, otherwise   (4.2)

It is assumed that this neuron operates in its linear range, that is, it is essentially a linear

neuron. The lower subnet consists of M such neurons.

For the upper subnet, Lippmann used a feedback network known as lateral inhibitive

network or winner-take-all network, which mimics the heavy use of lateral inhibition

evident in the biological neural networks of the brain [49]. The weights in this subnet

are set as follows

w_kl = 1, if k = l;   w_kl = −ε, if k ≠ l   (4.3)

where 0 < ε < 1/M. All thresholds are set to zero. Neurons in this subnet have the

same activation functions as in (4.2) and also operate in the linear range. After this

subnet converges, only one neuron's output is positive and the rest are zeros. The one

with non-zero output corresponds to the exemplar which has the minimum Hamming

distance to the input pattern.
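A numerical sketch of the two subnets is given below. The fixed number of iterations, the choice ε = 1/(2M) and the particular exemplars are assumptions of the sketch, and the upper saturation u of (4.2) is omitted because the neurons are assumed to stay in their linear range.

```python
import numpy as np

def hamming_net(v, exemplars, iters=100):
    """v: a +/-1 input vector; exemplars: an M x N array of +/-1 exemplars."""
    E = np.asarray(exemplars, dtype=float)
    M, N = E.shape
    eps = 1.0 / (2 * M)
    # Lower subnet: matching scores b_i = (N + v.e_i)/2, i.e. weights e_i/2 and threshold -N/2.
    y = (E @ v) / 2.0 + N / 2.0
    # Upper subnet (4.3): w_kk = 1, w_kl = -eps for k != l, all thresholds zero.
    W = (1.0 + eps) * np.eye(M) - eps * np.ones((M, M))
    for _ in range(iters):
        y = np.maximum(W @ y, 0.0)        # lateral inhibition, clipped at zero
    return int(np.argmax(y))              # index of the winning exemplar

E = np.array([[ 1,  1,  1,  1, -1, -1, -1, -1],
              [ 1, -1,  1, -1,  1, -1,  1, -1],
              [-1, -1, -1, -1,  1,  1,  1,  1]])
v = E[1].copy(); v[0] = -v[0]             # corrupt one bit of the second exemplar
print(hamming_net(v, E))                  # 1: the second exemplar is the closest
```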

4.2.2 Designing the Parity Network

Now let us use the analytical approach to construct a neural network which solves the

parity problem. A function which solves this problem is such that it outputs 1 if there

is an odd number of 1's in the input pattern, and 0 otherwise. This function is the so-called

parity function.


Let

z_i = 1 if Σ_{j=1}^{N} x_j ≥ i, and z_i = 0 otherwise,   ∀ i ∈ {1, 2, ..., N}   (4.4)

Then the following function

y = Σ_{i=1}^{N} (−1)^{i−1} z_i   (4.5)

is equivalent to the parity function in the sense that the input-output behaviours are the

same.
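The claimed equivalence can be verified exhaustively for small N; the sketch below evaluates the construction (4.4)–(4.5) directly, with the final threshold-logic neuron's threshold taken as 1, which is one consistent choice (the text does not state the value).

```python
from itertools import product

def parity_net(x):
    """z_i = 1 if x_1 + ... + x_N >= i (4.4); y = sum_i (-1)^(i-1) z_i (4.5), thresholded at 1."""
    N = len(x)
    z = [1 if sum(x) >= i else 0 for i in range(1, N + 1)]
    s = sum((-1) ** (i - 1) * z[i - 1] for i in range(1, N + 1))
    return 1 if s >= 1 else 0

for N in (2, 3, 4, 5):
    assert all(parity_net(x) == sum(x) % 2 for x in product((0, 1), repeat=N))
print("construction (4.4)-(4.5) matches the parity function for N = 2..5")
```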

Each z_i can be realized by a threshold-logic neuron as shown in Figure 4.2, and y can also be

realised by a threshold-logic neuron as shown in Figure 4.3. Therefore, the whole network

can be built as shown in Figure 4.4.

Figure 4.2: Realizing Function (4.4)

Figure 4.3: Realizing Function (4.5)


Figure 4.4: The Network for Solving the Parity Problem

As shown by this and previous examples, one can easily derive a network using the an­

alytical approach. Note that both the network architectures and the network parameters

are simultaneously derived.

The advantages offered by the analytical approach are as follows: (1) no training is

needed; (2) the network architecture derived is proper if not optimal; and (3) complete

understanding of the role of every individual neuron.

Although a lot can be learnt from the work of researchers such as Hopfield, Koho-

nen and Lippmann on the constructing processes of their network models, they have not

yet developed a systematic method of network design. They relied mainly on their

imagination, intuition, as well as inspiration from their knowledge of the nervous sys­

tem. However, to effectively use neural networks as computing machines, a systematic

method of network design is very much needed.

4.3 The Algebraic Derivation Methodology

Here we propose a methodology of network design, which consists of the following five

stages:

1. Find the minimum set of neuron models for a given class of functions;


2. Devise symbolic representations for these neurons and their networks;

3. Establish theorems for manipulating these symbols based on the computational

properties of the neurons and their networks;

4. Establish procedures for deriving neural networks from functions;

5. Use these procedures and theorems to derive and simplify network models for spec­

ified functions.

During our study on network design, we found that it is very useful to symbolically

represent these units and their networks. These symbols can then be manipulated to

yield proper network models. Because symbol manipulation is an important feature of

our methodology, we call it algebraic derivation of neural networks.

The approach of symbolically representing computing units and manipulating these

symbols was used before in [57]. However, they were only concerned with designing

networks of threshold-logic neurons to realize logical functions. Here, we are concerned

with designing networks of diverse neuron types to realize more complex functions.


Chapter 5

SYMBOLIC REPRESENTATION OF NEURONS AND THEIR NETWORKS

Our task is to map functions or algorithms to neural networks. As required by the

algebraic derivation methodology, the first step is to determine a minimum set of neuron

models; the second step is to devise symbolic representations for the chosen neuron

models and their networks. These two steps are completed in this chapter.

5.1 Neuron Models

Since we restrict the choice to the class of neuron models defined in Chapter Three, the

task in the first step is reduced to deciding what types of activation functions to use.

Because of practical restrictions, these functions have to be as simple as possible.

Many techniques in image enhancement and restoration involve a weighted summa­

tion of neighboring pixels and passing the sum to a function to determine the value of the

processed pixel (see Section 2.2). This function is usually piecewise. An example is the

function used in the contrast stretching technique, which is shown in Figure 5.1. Denote the contrast stretching function as f_c(x). We can write another function as

f(x) = α f_Q(x) + β f_Q(x − a) + γ f_Q(x − b)   (5.1)

where the functions f_Q(·) are shown in Figure 5.1 as dotted lines. By a proper choice of the

coefficients, we can ensure that

f(x) = f_c(x)   ∀ x ∈ [0, c]   (5.2)


Figure 5.1: The Contrast Stretching Function

Therefore, we can rewrite many functions used in image processing as sums of several

quasi-linear functions; hence, the quasi-linear activation function is chosen. Note also that

f(x) = f_l ∘ (f_Q(x), f_Q(x − a), f_Q(x − b))   (5.3)

where f_l is a linear function, so the linear activation function is also needed.
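A small numerical check of (5.1)–(5.2), with an assumed three-segment stretching curve; the breakpoints and slopes below are made up for the example, and the coefficients of the quasi-linear terms are the slope increments at the breakpoints.

```python
import numpy as np

def f_Q(x, t=0.0):
    """Quasi-linear activation: x - t if x > t, else 0."""
    return np.maximum(x - t, 0.0)

a, b, c = 0.3, 0.7, 1.0          # assumed breakpoints of the stretching curve
s1, s2, s3 = 0.5, 2.0, 0.5       # assumed slopes of its three segments

def f_c(x):
    """Direct piecewise definition of the contrast stretching function."""
    return np.where(x < a, s1 * x,
           np.where(x < b, s1 * a + s2 * (x - a),
                           s1 * a + s2 * (b - a) + s3 * (x - b)))

def f_sum(x):
    """The same curve written as a sum of quasi-linear functions, as in (5.1)."""
    return s1 * f_Q(x) + (s2 - s1) * f_Q(x, a) + (s3 - s2) * f_Q(x, b)

x = np.linspace(0.0, c, 101)
print(np.allclose(f_c(x), f_sum(x)))    # True
```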

Many logic operations are employed in image processing techniques; therefore, the

threshold-logic activation function is also required.

Since functions used in image processing techniques can be expressed as compositions

of these three functions, we choose them to form the minimum set of activation functions.

Because none of these functions can be realized by combinations of the other two,

this set is indeed minimum.

The neuron models we have just chosen are linear, quasi-linear, and threshold-logic

neurons, which are referred to as LQT neurons. They are the simplest among all the

neuron models proposed in the literature, hence should have minimum difficulty in their

hardware implementations. In Chapter Seven, a formal proof is given which shows that

any image processing technique can be realized by networks of these neurons.



For the convenience of algebraic derivation, symbols are used to represent LQT neu­

rons. The linear neuron is denoted as

y = ⊏ x_1, x_2, ..., x_n | w_1, w_2, ..., w_n ⊐^t   (5.4)

The quasi-linear neuron is denoted as

y = < x_1, x_2, ..., x_n | w_1, w_2, ..., w_n >^t   (5.5)

The threshold-logic neuron is denoted as

y = ⊑ x_1, x_2, ..., x_n | w_1, w_2, ..., w_n ⊒^t   (5.6)

where x_i represents the i-th input to the neuron, w_i the weight associated with x_i, and

t the threshold of the neuron. To represent any one of the LQT neurons, the following

notation is used:

y = {| x_1, x_2, ..., x_n | w_1, w_2, ..., w_n |}^t   (5.7)

If w_i = 1  ∀ i ∈ {1, 2, ..., n}, then the neuron is denoted as

y = {| x_1, x_2, ..., x_n |}^t   (5.8)

Sometimes, it is more convenient to use the following vector representations

y = {| x | w |}^t   (5.9)

or

ȳ = {| x̄ | w̄ |}^t   (5.10)

where x̄ = x = (x_1, x_2, ..., x_n) and w̄ = w = (w_1, w_2, ..., w_n).


5.2 LQT Networks

A network which is composed of LQT neurons is called an LQT network. All the networks

derived in this thesis are such networks.

Three typical topologies of LQT networks are illustrated in Figure 5.2. Both networks

in Figure 5.2.a and Figure 5.2.b are feedforward, and the network in Figure 5.2c is

feedback. Note that the network shown in Figure 5.2.b is different from conventional

feedforward neural networks in that some neurons in the third layer take inputs from the

input neurons.

Figure 5.2: Typical LQT Network Topologies

The notation of a network is similar to that of a neuron. The network input is denoted

by x and the network output is denoted by y. The output of the i-th neuron is denoted

as μ_i.


The notation of the network shown in Figure 5.2.a is

y = ⊏ < ⊑ x_1, x_2 | w_31, w_32 ⊒^{t_3}, ⊑ x_1, x_2 | w_41, w_42 ⊒^{t_4} | w_53, w_54 >^{t_5},
      < ⊑ x_1, x_2 | w_31, w_32 ⊒^{t_3}, ⊑ x_1, x_2 | w_41, w_42 ⊒^{t_4} | w_63, w_64 >^{t_6} | w_75, w_76 ⊐^{t_7}   (5.11)

Note that the input neurons are not explicitly expressed here. When all the variables

in the notation are independent, the notation represents a whole network. Part of the

network can also be represented, such as

y = ⊏ μ_5, μ_6 | w_75, w_76 ⊐^{t_7}   (5.12)

which represents the output neuron, and

y = ⊏ < μ_3, μ_4 | w_53, w_54 >^{t_5}, μ_6 | w_75, w_76 ⊐^{t_7}   (5.13)

which represents a part of the network, shown in dashed lines in Figure 5.3.

Figure 5.3: Representing Part of a Network

The notation of the network shown in Figure 5.2.b is

y = ⊏ < x_1, ⊑ x_1, x_2 | w_31, w_32 ⊒^{t_3}, ⊑ x_1, x_2 | w_41, w_42 ⊒^{t_4} | w_51, w_53, w_54 >^{t_5},
      < ⊑ x_1, x_2 | w_31, w_32 ⊒^{t_3}, ⊑ x_1, x_2 | w_41, w_42 ⊒^{t_4}, x_2 | w_63, w_64, w_62 >^{t_6}
      | w_75, w_76 ⊐^{t_7}   (5.14)


Another way of representing a whole network is

y   = ⊏ μ_5, μ_6 | w_75, w_76 ⊐^{t_7}

μ_5 = < x_1, μ_3, μ_4 | w_51, w_53, w_54 >^{t_5}
μ_6 = < μ_3, μ_4, x_2 | w_63, w_64, w_62 >^{t_6}   (5.15)

μ_3 = ⊑ x_1, x_2 | w_31, w_32 ⊒^{t_3}
μ_4 = ⊑ x_1, x_2 | w_41, w_42 ⊒^{t_4}

where symbol { represents one layer of neurons. Note that input neurons are not explicitly

expressed in this form either.

The form used in equations (5.11) and (5.14) is called compact form, and the form

used in (5.15) is called layered form. The layered form is easier to comprehend.

Sometimes, a neuron can be grouped to either one of several layers. An example is

shown in Figure 5.4, where the neuron with threshold t7 can be grouped either to the

second or to the third layer. To avoid ambiguity, in this thesis such neurons are always

grouped to the layer closest to the output. In this case, the neuron with the threshold t7

is grouped to the third layer, that is, the network's layered form representation is

y   = ⊏ μ_5, μ_6, μ_7 | w_85, w_86, w_87 ⊐^{t_8}

μ_5 = ⊑ μ_3, μ_4 | w_53, w_54 ⊒^{t_5}
μ_6 = ⊑ μ_3, μ_4 | w_63, w_64 ⊒^{t_6}
μ_7 = ⊑ x_2 | w_72 ⊒^{t_7}   (5.16)

μ_3 = ⊑ x_1, x_2 | w_31, w_32 ⊒^{t_3}
μ_4 = ⊑ x_1, x_2 | w_41, w_42 ⊒^{t_4}

The symbolic representation of a feedback network is different from the representation

of a feedforward network. Since feedback networks are dynamic in the sense that they have to

iterate many times, an indication of the iterations has to be included in the representation.


Figure 5.4: An LQT Network

The representation of the feedback network shown in Figure 5.2.c is

y_1(h + 1) = ⊑ y_1(h), y_2(h), y_3(h), y_4(h) | w_11, w_12, w_13, w_14 ⊒^{t_1}
y_2(h + 1) = ⊑ y_1(h), y_2(h), y_3(h), y_4(h) | w_21, w_22, w_23, w_24 ⊒^{t_2}
y_3(h + 1) = ⊑ y_1(h), y_2(h), y_3(h), y_4(h) | w_31, w_32, w_33, w_34 ⊒^{t_3}
y_4(h + 1) = ⊑ y_1(h), y_2(h), y_3(h), y_4(h) | w_41, w_42, w_43, w_44 ⊒^{t_4}   (5.17)

where y_i(h) is the i-th element of the network output at the h-th iteration. The vector y(0) =

(y_1(0), y_2(0), y_3(0), y_4(0))^T is the network input. Note that neurons in this network are

both input and output neurons.

Another representation of the above feedback network is as follows

y(h + 1) = ⊑ y(h) | W ⊒^t   (5.18)

where

y(h) = (y_1(h), y_2(h), y_3(h), y_4(h))^T   (5.19)

and

t = (t_1, t_2, t_3, t_4)^T   (5.20)

and W is the matrix composed of the w_ij's.


Chapter 6

COMPUTATIONAL PROPERTIES OF LQT NETWORKS

As required in the third step of the algebraic derivation methodology, this chapter es­

tablishes some theorems based on the computational properties of LQT networks. These

theorems are useful in simplifying and evaluating network realizations. Also, we shall

discuss some factors which affect the quality of an LQT network, and give a criterion for

network simplification.

6.1 Network Equivalence

Definition 6.1 If a function f(x) and a network N_A satisfy the following condition

f(x) = N_A(x)   ∀ x ∈ X   (6.1)

where X is the definition domain of f, then f is said to be realized by N_A, or N_A is said

to be the network realization of f.

A well-defined LQT network can only realize one function; but a function can be

realized by many different networks, and thus comes the concept of network equivalence.

Definition 6.2 If Network A and Network B satisfy the following condition

N_A(x) = N_B(x)   ∀ x ∈ X   (6.2)

where X is the definition domain of both N_A and N_B, then Network A and Network B

are said to be equivalent.


The following theorems concern the property of network equivalence. Theorems 6.1 to

6.8 concern a single neuron, or networks of a single neuron. The rest in this section concern

networks of multiple neurons.

Theorem 6.1 If N_A is

y_A = {| z_1, z_2, ..., z_n | w_1, w_2, ..., w_n |}^t   (6.3)

and N_B is

y_B = {| z_{i_1}, z_{i_2}, ..., z_{i_n} | w_{i_1}, w_{i_2}, ..., w_{i_n} |}^t   (6.4)

where the sequences {z_{i_1}, z_{i_2}, ..., z_{i_n}} and {w_{i_1}, w_{i_2}, ..., w_{i_n}} are permutations of {z_1, z_2, ..., z_n}

and {w_1, w_2, ..., w_n} respectively, then N_A and N_B are equivalent.

Theorem 6.2 Let N_A be

y_A = {| z_1, z_2, ..., z_n | w_1, w_2, ..., w_n |}^t   (6.5)

If w_i = 0, then N_A is equivalent to

y_B = {| z_1, ..., z_{i−1}, z_{i+1}, ..., z_n | w_1, ..., w_{i−1}, w_{i+1}, ..., w_n |}^t   (6.6)

Theorem 6.3 Let N_A be

y_A = {| z_1, z_2, ..., z_n | w_1, w_2, ..., w_n |}^t   (6.7)

If z_i ≡ a, where a is a constant, then N_A is equivalent to

N_B = {| z_1, ..., z_{i−1}, z_{i+1}, ..., z_n | w_1, ..., w_{i−1}, w_{i+1}, ..., w_n |}^{t − a w_i}   (6.8)

Theorem 6.4 Let N_A be

y_A = {| z_1, z_2, ..., z_n | w_1, w_2, ..., w_n |}^t   (6.9)


If z_i = z_j and i < j, then N_A is equivalent to

N_B = {| z_1, ..., z_i, ..., z_{j−1}, z_{j+1}, ..., z_n | w_1, ..., w_i + w_j, ..., w_{j−1}, w_{j+1}, ..., w_n |}^t   (6.10)

or

N_C = {| z_1, ..., z_{i−1}, z_{i+1}, ..., z_j, ..., z_n | w_1, ..., w_{i−1}, w_{i+1}, ..., w_i + w_j, ..., w_n |}^t   (6.11)

Theorem 6.5 If N_A is

y_A = ⊑ z | w ⊒^t   (6.12)

and N_B is

y_B = ⊑ z | βw ⊒^{βt}   (6.13)

where β > 0, then N_A and N_B are equivalent.

Theorem 6.6 Let N_A be

y_A = ⊏ z_1, z_2, ..., z_n | w_1, w_2, ..., w_n ⊐^t   (6.14)

and N_B be

y_B = < z_1, z_2, ..., z_n | w_1, w_2, ..., w_n >^t   (6.15)

If the output of N_A is always greater than or equal to zero, then N_A and N_B are equivalent.

Theorem 6.7 Let N_A be

y_A = ⊏ z_1, z_2, ..., z_n | w_1, w_2, ..., w_n ⊐^t   (6.16)

and N_B be

y_B = ⊑ z_1, z_2, ..., z_n | w_1, w_2, ..., w_n ⊒^{t+δ}   (6.17)

where 1 > δ > 0. If the output of N_A is either zero or one, then N_A and N_B are

equivalent.


Theorem 6.8 Let N_A be

y_A = < z_1, z_2, ..., z_n | w_1, w_2, ..., w_n >^t   (6.18)

and N_B be

y_B = ⊑ z_1, z_2, ..., z_n | w_1, w_2, ..., w_n ⊒^{t+δ}   (6.19)

where 1 > δ > 0. If the output of N_A is either zero or one, then N_A and N_B are

equivalent.

Theorem 6.9 Let N_A be

y_A = {| z_1, z_2, ..., z_n |}^t   (6.20)

and N_B be

y_B = {| z_1, ..., z_{i−1}, ⊏ z_i, ..., z_{i+m} ⊐^{t_1}, z_{i+m+1}, ..., z_n |}^{t_2}   (6.21)

where i ∈ {1, 2, ..., n} and n ≥ i + m ≥ 1. If t = t_1 + t_2, then N_A and N_B are equivalent.

Theorem 6.10 Let N_A be

y_A = {| z_1, z_2, ..., z_n | w_1, w_2, ..., w_n |}^t   (6.22)

and N_B be

y_B = {| ⊏ z_1 | w_1 ⊐^{t_1}, ⊏ z_2 | w_2 ⊐^{t_2}, ..., ⊏ z_n | w_n ⊐^{t_n} |}^{t_{n+1}}   (6.23)

If Σ_{i=1}^{n+1} t_i = t, then N_A and N_B are equivalent.

Theorem 6.11 Let N_A be

y_A = {| ⊏ z | w ⊐^{t_1} | a |}^{t_2}   (6.24)

and N_B be

y_B = {| z | a w |}^t   (6.25)

If t = a t_1 + t_2, then N_A and N_B are equivalent.


Theorem 6.12 If N_A is

y_A = {| ⊏ z_a, z_b ⊐^{t_1} | a |}^{t_2}   (6.26)

and N_B is

y_B = {| ⊏ z_a, z_b | a, a ⊐^{a t_1} |}^{t_2}   (6.27)

then N_A and N_B are equivalent.

Theorem 6.13 Let N_A be

y_A = {| < z_a, z_b >^{t_1} | a |}^{t_2}   (6.28)

and N_B be

y_B = {| < z_a, z_b | a, a >^{a t_1} |}^{t_2}   (6.29)

If a > 0, then N_A and N_B are equivalent.

Theorem 6.14 If N_A is

y_A = ⊏ {| z_1, z_2, ..., z_n | w_1, w_2, ..., w_n |}^{t_1} ⊐^{t_2}   (6.30)

and N_B is

y_B = {| z_1, z_2, ..., z_n | w_1, w_2, ..., w_n |}^t   (6.31)

where t = t_1 + t_2, then N_A and N_B are equivalent.

Theorem 6.15 Let N_A be

y_A = << z_1, z_2, ..., z_n | w_1, w_2, ..., w_n >^{t_1} >^{t_2}   (6.32)

and N_B be

y_B = < z_1, z_2, ..., z_n | w_1, w_2, ..., w_n >^t   (6.33)

If t = t_1 + t_2 and t_2 > 0, then N_A and N_B are equivalent.


The theorems mentioned so far can be easily verified by writing out the functions

realized by both networks NA and NB, and checking if the functions are equivalent.

Example 6.1 Prove that Theorem 6.15 is true.

SOLUTION The function realized by N_A is

y_A = f_Q(f_Q(zw − t_1) − t_2)   (6.34)

If zw − t_1 > 0, then

y_A = f_Q(zw − t_1 − t_2)   (6.35)

otherwise,

y_A = 0   (6.36)

This implies

y_A = y_B = f_Q(zw − t)   (6.37)

which is the function realized by N_B. Therefore, N_A = N_B. ∎
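The same argument can be checked numerically; the weight and thresholds below are arbitrary, with t_2 > 0 as the theorem requires.

```python
import numpy as np

f_Q = lambda u: np.maximum(u, 0.0)     # quasi-linear activation (threshold already subtracted)

w, t1, t2 = 1.7, 0.4, 0.3
z = np.linspace(-2.0, 2.0, 401)
y_A = f_Q(f_Q(z * w - t1) - t2)        # N_A: one quasi-linear neuron feeding another (6.34)
y_B = f_Q(z * w - (t1 + t2))           # N_B: a single quasi-linear neuron with t = t1 + t2
print(np.allclose(y_A, y_B))           # True
```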

6.2 Network Quality

Since a function can be realized by an infinite number of LQT networks, a criterion is

needed for us to choose a proper network among all its equivalents. Obviously, this choice

is based on the quality of the networks. There are several factors which affect the quality

of a network. They are introduced in the following sections.

6.2.1 Network Depth

The depth of an LQT network is defined as the longest path from the input neurons to the

output neurons. For the sake of simplicity, it is assumed that all the LQT neurons have

the same processing time, which is taken as one time unit. For a feedforward network,

its depth equals its computation time. For a feedback network, its computation time


depends not only on the convergence property of the algorithm it implements but also

on its depth. Since our objective of using neural networks is to achieve high computation

speed, the depth of a network is thus an important factor in choosing one network over

another.

For a feedforward network, its depth also corresponds to the number of layers it has.

Denoting the depth of a network N_A as D(N_A) and the number of layers as L(N_A), the

following relationship exists:

D(N_A) = L(N_A) − 1.   (6.38)

6.2.2 Network Size

Neural networks eventually have to be implemented in hardware, hence we should be

aware of their size. It is natural to think of the size of a network in terms of the number

of neurons in the network. However, since a neuron is allowed to have an indefinite number

of inputs, it is more proper to measure the size of a network in terms of the number

of connections. Moreover, in VLSI implementation, it is the connections which occupy

most of the space on a chip.

The number of connections in a neural network equals the number of weights. Since it

is easier to count the number of weights in the symbolic representation of an LQT network,

we define the network size as the number of weights in the network.

Let us denote the number of neurons in network NA as N(NA) and its network size

as S(NA). It is clear that

N(N_A) ≤ S(N_A).   (6.39)


6.3 Criterion for Network Simplification

Both network depth and network size have to be considered when choosing a network over

its equivalents. In this thesis, the following simple criterion is used: Given two networks

N_A and N_B which realize the same function, if D(N_A) < D(N_B) and S(N_A) < S(N_B),

or if D(N_A) = D(N_B) and S(N_A) < S(N_B), then N_A is chosen over N_B, and N_A is said

to be better than N_B. The process of choosing a network over another is referred to as

network simplification. The theorems in Section 6.1 are developed for such purpose.

Obviously, a network with the shortest depth and the minimum size is the best among

all the equivalent networks. Unfortunately, such a case is rare. The network with the

shortest depth often does not have the minimum size. How to effectively balance these

two factors is still an open research question. In the following chapters, we only give upper

bounds on network depth and size for some functions.

There are also some other factors which affect the quality of a network. For example,

since there is inevitably some noise in any hardware implementation, how tolerant a

network is to noise is certainly a good measure of network quality. However, due to time

constraints, these factors are not considered in this thesis.


Chapter 7

DERIVATION PROCEDURES

This chapter provides procedures of deriving LQT networks to realize functions. The

usage of these procedures is demonstrated through some examples, which in turn are

useful in deriving networks in later chapters.

A function which can be realized by a LQT network is referred to as a realizable

function; the network which realizes the function is referred to as the network realization

of the function. A function which has a known network realization is called a realized

function.

There are four classes of functions: single-input single-output (SISO) functions, single-

input multiple-output (SIMO) functions, multiple-input single-output (MISO) functions,

and multiple-input multiple-output (MIMO) functions. We shall only describe procedures

for realizing SISO and MISO functions since realizing SIMO or MIMO functions is just

multiple applications of these procedures.

From the definitions of LQT neurons, the following statements are true.

Fact 7.1 The network realization for the following function

y = Σ_{i=1}^n w_i·x_i − t    (7.1)

is

y = C x_1, x_2, ..., x_n | w_1, w_2, ..., w_n D^t    (7.2)


Fact 7.2 The network realization for the following function

y = Σ_{i=1}^n w_i·x_i − t  if Σ_{i=1}^n w_i·x_i ≥ t;  y = 0 otherwise    (7.3)

is

y = < x_1, x_2, ..., x_n | w_1, w_2, ..., w_n >^t    (7.4)

Fact 7.3 The network realization for the following function

y = 1  if Σ_{i=1}^n w_i·x_i ≥ t;  y = 0 otherwise    (7.5)

is

y = ⌐ x_1, x_2, ..., x_n | w_1, w_2, ..., w_n ¬^t    (7.6)

Now that we have established network realizations for the above three simple func­

tions, we can work on network realization of more complex functions.

7.1 Realization of SISO Functions

The procedure of deriving a network to realize a SISO function / is as follows:

step 1. Call decompose(/);

step 2. Compose all network realizations to form a network realization of / .

step 3. Apply theorems in Chapter Six to simplify the resulting network.

In the above procedure, decompose(/) is a recursive subroutine:

decompose(/):

1. Decompose the function / as

/ = fo 0 ( / 1 , / 2 , - , / n )


such that f0 has a known network realization;

2. For all i € { 0 , 1 , 2 , n } , if fi is a realized function, replace it with its network

realization; else, call decompose(/,)

3. Return.
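The recursive structure of decompose() can be summarized by the following Python sketch. It is only an illustration under the assumption that functions are represented symbolically, that realized is a table of already-realized functions, and that decompose_step supplies a decomposition f = f0 ∘ (f1, ..., fn) in which f0 is realized; none of these names come from the thesis.

def derive_network(f, realized, decompose_step):
    """Illustrative sketch of the SISO derivation procedure.
    realized       : dict mapping already-realized functions to their network realizations
    decompose_step : routine returning (f0, [f1, ..., fn]) with f = f0 o (f1, ..., fn)
                     and f0 a realized function
    Returns a nested structure pairing f0's realization with the realizations of f1..fn;
    the result would then be simplified using the theorems of Chapter Six."""
    if f in realized:                      # step 2 of decompose(): f is already realized
        return realized[f]
    f0, parts = decompose_step(f)          # step 1: f = f0 o (f1, ..., fn)
    return (realized[f0],
            [derive_network(fi, realized, decompose_step) for fi in parts])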

The derivation procedure may be viewed as consisting of two stages: step one is the

analysis stage, and steps two and three are the synthesis stage. The use of this procedure

is demonstrated through the following examples. Functions in these examples are real

unless otherwise specified.

Example 7.1.1 Given a function F whose output is either 0 or 1, find a network

realization of its complement F̄.

SOLUTION Since

F̄ = 1 − F    (7.7)

according to Fact 7.1, the network realization of F̄ is

F̄ = C F | −1 D^{−1}    (7.8)

Example 7.1.2 Find a network realization of the following function

y = α  if x ≥ t;  y = 0 otherwise    (7.9)

SOLUTION Let

f_1(x) = 1  if x ≥ t;  f_1(x) = 0 otherwise    (7.10)

then

y = α·f_1(x)    (7.11)


which can be decomposed to

y = f_0 ∘ f_1    (7.12)

where f_0 = α·f_1. From Fact 7.1 and Fact 7.3, we know both functions f_0 and f_1 have

known network realizations:

f_0 = C f_1 | α D^0    (7.13)

and

f_1(x) = ⌐ x ¬^t    (7.14)

Therefore, the network realization of function (7.9) is

y = C ⌐ x ¬^t | α D^0    (7.15)

Example 7.1.3 F i n d a network realization of the following function

S O L U T I O N Let

y = s a if t2 > x > ti

0 otherwise

a if x > t\

0 otherwise

(7.16)

(7-17)

and

a if x > t2

0 otherwise

then

which can be decomposed to

y(x) = fi(x)-f2(x)

y = fo° (/i,/2)

(7.18)

(7.19)

(7.20)


where / 0 = fi — f%. From Fact 7.1, fo can be realized by a linear neuron,

/ o = C / 1 , / 2 | l , - l D ° (7.21)

From previous example, both fx and / 2 can be realized as

h{x) = C C x3h | oc D° (7.22)

and

/ 2(x) = C C x3t2 \ a D° (7.23)

Therefore, the network realization of y(x) is

y(x) = C C C xl]h | a D°, C C xZ)'2 | a D° | 1, - 1 D° (7.24)

Applying Theorem 6.10 and Theorem 6.11, this network can be simplified to

y(x) =CC€1 xii* 1 | a D ° , c n xZi t 2 | - a D°D° (7.25)

Applying Theorem 6.10 again, this network can be further simplified to

y(x) =CO xZi u,\Zx-3t2 | a , - a D° (7.26)

tt

E x a m p l e 7 .1 .4 Find a network realization of the following function

a if x > t

0 otherwise (7.27)

SOLUTION Function y(x) can be decomposed to

y = foof1{x) (7.28)

where

fo = + <x (7.29)


and

/ i ( x ) = 1 if -x > -t

0 otherwise

F r o m Fact 7.1, we know that

F r o m Fact 7.3, we know that

fo =C fi | - a D'

fi(x) = C x | - 1 • -t

Substituting this to (7.31), we have

y(x) = f0(x) = C C x\ - 1 | - a D~ a

Example 7.1.5 F i n d a network realization of the following function

S O L U T I O N Let

and

then

a if x = t

0 otherwise

A W = { a if x > t

0 otherwise

a if x > t

0 otherwise

y(x) = /i(x) - / 2 ( x )


which can be decomposed to

V = / o ° ( / i , / 2 )

where fo = fi — / 2 , whose network realization according to Fact 7.1 is

/ o = C / i , / 2 | l , - l D°

(7.38)

(7.39)

We know that both /i(x) and /2(x) have network realizations (see Example 7.1.2 and

Example 7.1.4) as

/ i (x) =CC x •* |oO° (7.40)

and

/ 2 (x) =CC x | - 1 | - a D~ a (7.41)

Substituting (7.40) and (7.41) to (7.39), we have

y(x) = /o(x) = C C C x •* | a D°, C C x | - 1 | - a D " a | 1, - 1 D° (7.42)

Applying Theorem 6.10 and Theorem 6.11, we can simplify this network to

y(x) = C C x C x | - 1 | a,a D a . (7.43)

It can be rather tedious to follow the procedure explicitly, so from now on we follow

it only implicitly and put more emphasis on the simplification process.

Example 7.1.6 If function (7.34) is an integer function, find its network realization.

S O L U T I O N Let

a if x > t

0 otherwise (7.44)

and

a if x > t + 1

0 otherwise (7.45)


then

whose network realization, according to Fact 7.1, is

y=<Zfi(x\f2(x)\l,-l D°

Since, according to example 7.1.2,

fi(x) = C C x • * | o O °

and

/ , ( I ) = C C I : ' + 1 \aD°

substituting these two equation to equation (7.47) yields

2 / = C C C x = ] < | a D ° , C C x Zit+1 | a D ° [ 1 , - 1 D °

Applying Theorem 6.10 and Theorem 6.11, this network can be simplified to

y = C C x C x 3 t + 1 | a , - a D °

(7.46)

(7.47)

(7.48)

(7.49)

(7.50)

(7.51)

Example 7.1.7 Find a network realization of the following function

Oil x = t\

ct2 x = t2

y = \

S O L U T I O N Let

Mx)

Ctji X — tfi

a, if x = t{

0 otherwise

(7.52)

(7.53)


where i ∈ {1, 2, ..., n}, then

y(x) = Σ_{i=1}^n f_i(x)    (7.54)

Recall Fact 7.1 and Example 7.1.5, we know that

V = C / l ( x ) , / 2 ( x ) , . . . , / n ( x ) D° (7.55)

and

(7.56)

Substitute (7.56) to (7.55), we have

y = C C C X D ' ^ I Z x | - I D - * 1 I a i , a : i D 0 1 , . .

C C xD*",IZ x I - I a n , a n 3 Q " D °

(7.57)

Applying Theorem 6.10, we can simplify this network to

y = C C xi]* 1 , C x I - l D _ t , , . . . , C xD*",IZ x | - 1ZT*" | o ^ a i , . . . , a B , a B D a (7.58)

where a = 52 "=1 a t - tt

This example shows that any single-variable finite function can be realized by an LQT

network with a depth of at most 2 and a size of at most n. It leads to the following theorem.

Theorem 7.1 Any single-variable finite function is realizable.
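As an illustration of this construction (a sketch only, using the integer-argument version derived in Example 7.1.8 below), a finite single-variable function can be evaluated as a linear combination of differences of threshold units; the NumPy step function here merely stands in for an LQT threshold neuron.

import numpy as np

def step(x, t):
    # Stand-in for an LQT threshold neuron: 1 if x >= t, else 0.
    return (np.asarray(x, dtype=float) >= t).astype(float)

def finite_function(x, ts, alphas):
    """Evaluate y = alpha_i when x == t_i (0 elsewhere) for integer x, using the
    indicator [x == t] = step(x, t) - step(x, t + 1); the outer sum plays the role
    of the second-layer linear neuron."""
    y = np.zeros_like(np.asarray(x, dtype=float))
    for t, a in zip(ts, alphas):
        y += a * (step(x, t) - step(x, t + 1))
    return y

# Maps 1 -> 5, 3 -> 7, 4 -> 2, everything else -> 0.
print(finite_function(np.array([0, 1, 2, 3, 4, 5]), ts=[1, 3, 4], alphas=[5, 7, 2]))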

Example 7.1.8 If function (7.52) is an integer function, find its network realization.

SOLUTION Let

f_i(x) = α_i  if x = t_i;  f_i(x) = 0 otherwise    (7.59)


where i € {1,2, ...,n}. It can be shown that

Recall Fact 7.1, and Example 7.1.6, we know that

y = C / i ( x ) , / 2 ( x ) , . . . , / n ( x ) D°

(7.60)

and

fi(x) = C C x • x • < i + 1 | a,-, - a ; D°

(7.61)

(7.62)

Substituting equation (7.62) to equation (7.61) yields

y = C C C x Z) 1 1 , C x Z)11"1"1 l a ^ - a ! D " , c r Z x D ' 2 , r Z x D t 2 + 1 | a 2 , - a 2 3 u,

CIZ x rz x | ctn,-ctn D°D°

which can be simplified to

(7.63)

y = crz x z i ^ r z x z i ' i + 1 , r z x z i < 2 , r z x z i < 2 + 1 , r z xz\ t n, rz xz\ t n + 1

(7.64)

If = + 1, then the above network realization can be further simplified to

y =CIZ xD '^rZ xD t 2,...,IZ xD*",rZ x • < n + 1 | a i , a 2 - <*i , . . . , a„ - a „ _ i , - a „ D° (7.65)

t Example 7.1.9 Find a network realization of the absolute function

y = |x| (7.66)

S O L U T I O N Let

x if x > 0

0 otherwise (7.67)


anc —x if —x > 0

0 otherwise

then

y(x) = fi(x) + f2(x)

According to Fact 7.1 and Fact 7.3, the network realization is

y = C < x > ° , < x | - 1 > ° D °

Example 7.1.10 Find a network realization of the following function

ax + b if x > t

0 otherwise

S O L U T I O N Let

and

f2(x) = {

-t iix>t

otherwise

1 if x > t

0 otherwise

then

y(x) = afi(x) + (at + 6)/2(x)

Recall Fact 7.1, Fact 7.2, and Fact 7.3, we have

y = C < x >*, C x \a,at + b D°


Example 7.1.11 Find a network realization of the following function

y = x  if F = 1;  y = 0 otherwise    (7.76)

where F is a function whose output is either 0 or 1.

S O L U T I O N Let

x-a if F = 1

0 otherwise (7.77)

where a < x Vx £ X, and A" is the definition domain of function (7.76). Then

y(x) = f.ix) + aF

whose network realization, according to Fact 7.1, is

y=CMx),F\l,aD°

fi(x) can be expressed as

AGO =

where b > x — a Vx £ A\ Let

x - a - f e F iix-a-bF>0

0 otherwise

then according to Fact 7.2,

and, according to Fact 7.1,

/ u (x ) = x - a - bF

fi(x) =< fn(x) >°

(7.78)

(7.79)

(7.80)

(7.81)

(7.82)

/ n (x ) = C z , F | l , - & D ° (7.83)


Hence,

h{x) =< x,F\l,-b> a

The network realization of F, according to Example 7.1.1, is

F = c F | - l y 1

Substituting (7.85) to (7.84) yields

Mx) =<x,F\l,b>a+b

Substituting this to (7.79) yields

y=C<x,F\l,b> a+ b,F\l,aD°

tt

Example 7.1.12 Find a network realization of the following function

y = x  if F = 0;  y = 0 otherwise

where F is a function whose output is either 0 or 1.

SOLUTION According to the previous example, the network realization is

y =C< x,F | l , 6 > ° + b , F | l , a D°

and recall Example 7.1.1, we know that

F=cF\ - ID-1

Substituting this equation to (7.88) and after simplification, we have

y =C< x,F\ l , - f e> a , F | 1,-a D~ a



7.2 Realization of M I S O Functions

This section shows the network realization of MISO functions. The procedure of network

derivation of MISO functions is as follows.

step 1. Decompose function f as

f = f_1 ∘ f_L

where f_L is a linear function;

step 2. Proceed with the procedure for realizing SISO functions to find a network

realization of f_1;

step 3. Compose the network realization found in step 2 and the network realization

of f_L to form the network realization of f;

step 4. Use the theorems in Chapter Six to simplify the resulting network.

The use of this procedure is shown through the following examples.

Example 7.2.1 Find a network realization of the following function

f : D_1^n → D_2    (7.91)

where D_i is a finite set of numbers.

SOLUTION We consider the following two cases.

Case I: / is a linear function.

Since we can express / as

f = Y,biXi + b0 (7.92) t=i

where Xi £ X^. According to Fact 7.1.1, we have

f(x)=Cxux7,...,xn \bub2, ...A D b 0 (7.93)

Case II: / is a non-linear function.


Here we introduce a linear function as follows

fL : V3 (7.94)

and a finite non-linear function

fi : X>3 - V2 (7.95)

We now find the network realizations for both / i and fr,, and then compose them to form

the network realization of / .

Let

M_i ≜ |D_i|  ∀i ∈ {1, 2, 3}

and z_i denotes an element in D_3, and α_i = f_1(z_i). According to Example 7.1.7, the

network realization of fi is

fi{z) = C C 2 D * 1 , • 2 | - 1 I T * 1 , C z Z]ZM3, • 2 | - 1 • - * w 3

(7.96) | Ol, «1, % 3 ) Z M 3 ^ a

where a = a t - According to Fact 7.1,

z = JL(X) = C x1,x2,...,xn\bl,b2,...,bn Dbo (7.97)

Substituting (7.97) to (7.96) and after simplification, we have

y = C C x i , ...,xn\bi, ...,bn I f 1 , C X i , . . . ,x n| - 6 i , - b n Zi~ c i,

• xi , . . . ,x„|&i, ...,&„ U c « 3 , C xj , . . . , x „ | - & i , . . . , - & „ • " ^ (7.98)

where c, = z; + fe0- if

This example shows that any multi-variable finite function can be realized by an LQT

network with a depth of at most 2 and a size of at most 2M_3(n + 1). It leads to the

following theorem.


Theorem 7.2 Any multi-variable finite function is realizable.

Since a pixel takes integer value from 0 to 255, any image processing technique can

be expressed as finite functions. According to this theorem and Theorem 7.1, it is clear

that the LQT networks are capable of realizing any image processing technique.

In Example 7.2.1, we made the assumption f = f_1 ∘ f_L; here we show under what

condition this assumption is true. Denoting f_2 = f_1 ∘ f_L, we have the following theorem.

Theorem 7.3 A necessary condition for f = f_2 is that ∀x_1, x_2 ∈ D_1^n, if f(x_1) ≠ f(x_2),

then f_L(x_1) ≠ f_L(x_2).

PROOF If there exist x_1, x_2 ∈ D_1^n such that f(x_1) ≠ f(x_2) but f_L(x_1) = f_L(x_2), then

f_1 ∘ f_L(x_1) = f_1 ∘ f_L(x_2). This implies that f_2(x_1) = f_2(x_2), hence f_2 ≠ f. ∎

Corollary 7.3.1 A necessary condition for f = f_2 is M_3 ≥ M_2.

PROOF If M_3 < M_2, then there must exist a pair x_1, x_2 ∈ D_1^n such that f(x_1) ≠ f(x_2)

but f_L(x_1) = f_L(x_2). ∎

Theorem 7.4 Functions f_1 and f_L are not unique.

PROOF This theorem will be shown to be true in some of the following examples. ∎

As shown in Example 7.2.1, the size of the network realization of a non-linear multi-

variable finite function is proportional to M3. Therefore, according to Corollary 7.3.1, the

best projection function f_L is the one whose codomain size (M3) is equal to the codomain

size (M2) of the non-linear function itself.

Here, we introduce a projection function which is proper for all multi-variable finite

integer functions.

z = Σ_{i=1}^n x_i·M_1^{i−1}    (7.99)


Such a function is called a one-to-one projection since f_L(x_1) ≠ f_L(x_2) iff x_1 ≠ x_2

∀x_1, x_2 ∈ D_1^n, which implies M_3 = M_1^n.

Another projection function, which is proper only for a certain class of finite integer

functions, is as follows

z = Σ_{i=1}^n x_i    (7.100)

which has the property that M_3 = n(M_1 − 1) + 1. We will refer to functions (7.99) and

(7.100) as projection I and projection II.
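A small Python sketch of the two projection functions, assuming the inputs x_i take integer values in {0, 1, ..., M1 − 1}:

def projection_one(x, m1):
    """Projection I: z = sum_i x_i * M1^(i-1); one-to-one, codomain size M1^n."""
    return sum(xi * (m1 ** i) for i, xi in enumerate(x))

def projection_two(x):
    """Projection II: z = sum_i x_i; codomain size n*(M1 - 1) + 1."""
    return sum(x)

x = [1, 0, 1]                                   # e.g. binary inputs, M1 = 2
print(projection_one(x, 2), projection_two(x))  # 5 2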

The following examples show network derivation of some multi-variable finite integer

functions.

Example 7.2.2 Find a network realization of the AND function

y — Xi A x2 A . . . A xn (7.101)

SOLUTION Using projection I, f_L(x) = Σ_{i=1}^n 2^{i−1}·x_i, the AND function can be

expressed as

y = f_1 ∘ f_L(x)    (7.102)

where

f_1(z) = 1  if z ≥ 2^n − 1;  f_1(z) = 0 otherwise    (7.103)

which, according to Fact 7.3, has the following network realization

/ i ( z ) = C z D 2 " - 1 (7.104)

According to Fact 7.1, the network realization for J L { X ) is

h(x) = C xux2, xn I 1 , 2 , 2 " " 1 D ° (7.105)

Substituting it to (7.104) yields

y = C C x u x 2 , x n I 1 , 2 , . . , 2 " " 1 D ° D 2 " " 1 (7.106)


which, according to Theorem 6.10, can be further simplified to

y=CI :r i ,S2, - . . , *» | l , 2 , . . , 2 " - 1 z T - 1 (7.107)

The resulting network has a depth of 1 and a size of n.

If we use projection II, f_L(x) = Σ_{i=1}^n x_i, and choose

f_1(z) = 1  if z ≥ n;  f_1(z) = 0 otherwise    (7.108)

then

y = f_1 ∘ f_L(x)    (7.109)

Since

f_1(z) = ⌐ z ¬^n    (7.110)

and

f_L(x) = C x_1, x_2, ..., x_n D^0    (7.111)

hence

y = ⌐ x_1, x_2, ..., x_n ¬^n    (7.112)

The resulting network also has a depth of 1 and a size of n.

Example 7.2.3 Find a network realization of the OR function

y = x_1 ∨ x_2 ∨ ... ∨ x_n    (7.113)

SOLUTION Using projection I, f_L(x) = Σ_{i=1}^n 2^{i−1}·x_i, the OR function can be

expressed as

y = f_1 ∘ f_L(x)    (7.114)

where

f_1(z) = 1  if z ≥ 1;  f_1(z) = 0 otherwise    (7.115)


From the previous example, the network realization can be easily derived as

y = ⌐ x_1, x_2, ..., x_n | 1, 2, ..., 2^{n−1} ¬^1

If projection II is used, then we have the following network

y = ⌐ x_1, x_2, ..., x_n ¬^1

Note that both networks have a depth of 1 and a size of n.

Example 7.2.4 Find the network realization of an XOR function

y — X\ © x2 © ... 0 xn (7.116)

SOLUTION Using projection II, f_L(x) = Σ_{i=1}^n x_i, the XOR function can be expressed

as

w here

Let

then

Since

y = / i ° h(x)

In =

1 if z is odd

0 if z is even

1 if z > i

0 otherwise

/i(*) = E ( - i r 7 i , ( * ) t=i

hi = C z

(7.117)

(7.118)

(7.119)

(7.120)

(7.121)

hence

h(z) = C C z l l ' ^ z D 2 , C z | 1,-1, (-I)*- 1 3° (7.122)


5ince

h(x) —C xi,x2,xn D ° (7.123)

we have

(7.124) y = C C xi,x2,xn Z]1, C #i, x 2 , a ; n • 2 , • x l 5 x 2 , x n

I c — i ) " - 1 ^ °

The resulting network has a depth of 2 and a size of n² + n.

For the sake of comparison, let us use projection I. Since it is difficult to express the

non-linear function f_1 for an arbitrary n, we only show the special case n = 3.

Let

f_L(x) = Σ_{i=1}^3 2^{i−1}·x_i    (7.125)

and

f_1(z) = 0, 1, 1, 0, 1, 0, 0, 1  for z = 0, 1, 2, ..., 7 respectively    (7.126)

then

y(x) = f_1 ∘ f_L(x)    (7.127)

According to Example 7.1.8, we have

fi(z) = c r z z z i 1 , r z z z i 3 , r z z z i 4 , r z z z i 5 , r z z z i 7 |.i,—1,1,—1,1 D ° (7.128)


hence,

y = C C Xi,x2,x3 | 1,2,4 X i , x 2 , x 3 | 1,2,4 Z l 3 , C Xi, x 2 , x 3 | 1,2,4

C x ! , x 2 , x 3 | l , 2 , 4 D 5 , C I x 1 , x 2 , x 3 | 1,2,4 | 1 , - 1 , 1 , - 1 , 1 D°

Note that this network has a size of 20 while the network derived using projection II

has a size of 12 (n = 3).

Remark Example 7.2.4 shows that using projection II results in a smaller network

size; this is generally true. Although in Example 7.2.2 and Example 7.2.3 using

projection II has no advantage in terms of network depth and network size, the magnitude

of the weights is much smaller. This is important in hardware implementation, since the

value range of the weights has to be limited.
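The three Boolean examples above can be summarized in a few lines of Python; this is only an illustrative simulation in which an ordinary comparison stands in for an LQT threshold neuron, and the XOR realization follows the projection-II construction of Example 7.2.4.

def step(z, t):
    # Threshold unit: 1 if z >= t else 0.
    return 1 if z >= t else 0

def and_net(x):
    """AND via projection II: one threshold unit firing when sum(x) >= n."""
    return step(sum(x), len(x))

def or_net(x):
    """OR via projection II: one threshold unit firing when sum(x) >= 1."""
    return step(sum(x), 1)

def xor_net(x):
    """Parity via projection II: z = sum(x), y = sum_i (-1)^(i-1) * step(z, i)."""
    z = sum(x)
    return sum((-1) ** (i - 1) * step(z, i) for i in range(1, len(x) + 1))

for x in ([0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]):
    print(x, and_net(x), or_net(x), xor_net(x))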


Chapter 8

NETWORK REALIZATIONS OF IE TECHNIQUES

In this chapter, image enhancement (IE) techniques are re-categorized according to their

realization difficulty into two classes: linear and non-linear filtering techniques. Examples

of realizing some of the techniques are provided here to show the effectiveness of the

algebraic derivation approach. Since the linear techniques are easy to realize, they are

treated briefly with two examples; the rest of this chapter is devoted to the realization

of non-linear techniques.

8.1 Network Realizations of Linear Techniques

A two-dimensional image can be projected to a one-dimensional image. For instance, an

image U whose size is K x L can be mapped to a one-dimensional image g with the

following relationship

g_i = u_{k,l}  ∀k ∈ {1, 2, ..., K}, ∀l ∈ {1, 2, ..., L}    (8.1)

where i = (k − 1)L + l. This mapping is one-to-one, that is, U can be uniquely mapped

back from g using equation (8.1).

For all the linear filtering techniques, the enhancement process can be expressed, in

matrix form, as

f = E g + t (8.2)

where g is the image to be enhanced and f is the enhanced image. For an input image


of size n and an enhanced image of size m, the network realization of this function is

f_1 = C g_1, g_2, ..., g_n | e_11, e_12, ..., e_1n D^{t_1}

f_2 = C g_1, g_2, ..., g_n | e_21, e_22, ..., e_2n D^{t_2}

...

f_m = C g_1, g_2, ..., g_n | e_m1, e_m2, ..., e_mn D^{t_m}    (8.3)

A more compact notation for this network is

f = C g | E D^t    (8.4)

The architecture of (8.3) is shown in Figure 8.1.

Figure 8.1: Network Architecture for Linear Filtering

Now, for the linear filtering techniques, the task left is to map those techniques to

the form shown in (8.2). The following two examples illustrate such mapping.
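Computationally, the single-layer linear-filtering network of (8.4) amounts to one matrix-vector product plus a bias; the following NumPy sketch (illustrative only) makes this explicit.

import numpy as np

def linear_filter_layer(g, E, t):
    """Single-layer realization of a linear enhancement f = E g + t (equation (8.2)):
    output neuron i computes the weighted sum E[i, :] . g plus its bias t[i]."""
    g = np.asarray(g, dtype=float).ravel()
    return np.asarray(E, dtype=float) @ g + np.asarray(t, dtype=float).ravel()

# Tiny example: identity "enhancement" of a 2x2 image flattened row by row.
g = np.array([[10, 20], [30, 40]]).ravel()
print(linear_filter_layer(g, np.eye(4), np.zeros(4)))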

Example 8.1 Find a network realization of the averaging filter.

SOLUTION Suppose the following image

U = [ u_{1,1} u_{1,2} u_{1,3} u_{1,4} ; u_{2,1} u_{2,2} u_{2,3} u_{2,4} ; u_{3,1} u_{3,2} u_{3,3} u_{3,4} ; u_{4,1} u_{4,2} u_{4,3} u_{4,4} ]    (8.5)


is to be enhanced by the equally weighted averaging filter of (2.10) and the window of

the filter is 3 x 3, then the processing, in matrix form, is

f = A g    (8.6)

where

f = (f_1, f_2, ..., f_16)^T    (8.7)

g = (g_1, g_2, ..., g_16)^T    (8.8)

with g_i = u_{k,l} ∀k, l ∈ {1, 2, 3, 4}, and

A =

1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0
0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0
0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0
1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0
1 1 1 0 1 1 1 0 1 1 1 0 0 0 0 0
0 1 1 1 0 1 1 1 0 1 1 1 0 0 0 0
0 0 1 1 0 0 1 1 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0
0 0 0 0 1 1 1 0 1 1 1 0 1 1 1 0
0 0 0 0 0 1 1 1 0 1 1 1 0 1 1 1
0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1
0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0
0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1
0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1

Comparing (8.6) with (8.2), we obtain the weight matrix and the threshold vector as


follows

E = A,  t = 0    (8.9)

where 0 represents the zero vector whose elements are all zeros.
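For completeness, the 0/1 neighbourhood matrix A used above can be generated mechanically; the following NumPy sketch builds it for a K x L image with a 3 x 3 window (the optional normalisation by the neighbourhood count, as an equally weighted average would require, is included only as an assumption and is not part of the matrix printed above).

import numpy as np

def averaging_matrix(K, L, normalize=False):
    """Row i of the returned K*L x K*L matrix sums the pixels in the 3x3 neighbourhood
    of pixel i, with pixels ordered row by row (i = (k-1)*L + l).  With normalize=True
    each row is divided by its neighbourhood count."""
    n = K * L
    A = np.zeros((n, n))
    for k in range(K):
        for l in range(L):
            i = k * L + l
            for dk in (-1, 0, 1):
                for dl in (-1, 0, 1):
                    kk, ll = k + dk, l + dl
                    if 0 <= kk < K and 0 <= ll < L:
                        A[i, kk * L + ll] = 1.0
            if normalize:
                A[i] /= A[i].sum()
    return A

print(averaging_matrix(4, 4).astype(int))   # reproduces the 16 x 16 matrix A above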

Example 8.2 Find a network realization of the zooming technique.

S O L U T I O N Suppose the following image

U = [ u_{1,1} u_{1,2} ; u_{2,1} u_{2,2} ]    (8.10)

is to be magnified two times by the replication technique described in Section 2.2

such that the resulting image has the size of 4 x 4, then the zooming process, in the


matrix form, is

f = R g    (8.11)

where

f = (f_1, f_2, ..., f_16)^T    (8.12)

with

g_i = u_{k,l}  ∀k, l ∈ {1, 2}    (8.13)

and

R =

1 0 0 0
1 0 0 0
0 1 0 0
0 1 0 0
1 0 0 0
1 0 0 0
0 1 0 0
0 1 0 0
0 0 1 0
0 0 1 0
0 0 0 1
0 0 0 1
0 0 1 0
0 0 1 0
0 0 0 1
0 0 0 1

Comparing (8.11) with (8.2), we obtain the weight matrix and the threshold vector as


follows

E = R,  t = 0    (8.14)

8.2 Network Realizations of Non-linear Filters

There is no unified theory of non-linear filtering techniques. As a result, there is no

universal network structure which can be used for all of them as in the case of linear

filters.

8.2.1 Dynamic Range Modification

Contrast stretching, clipping, thresholding, and window slicing are techniques which

modify the dynamic range of a given picture. For this group of techniques, the processes

have a generalized form as follows.

y = a_1·x + b_1  if t_1 ≥ x > t_0;  y = a_2·x + b_2  if t_2 ≥ x > t_1;  ... ;  y = a_n·x + b_n  if t_n ≥ x > t_{n−1}    (8.15)

The derivation of the network realization of the above function is as follows.

Let

f_i(x) = a_i′·x + b_i′  if x > t_{i−1};  f_i(x) = 0 otherwise,  ∀i ∈ {1, 2, ..., n}    (8.16)

where


with

It can be verified that

«o = Oo = U

V = / o ° ( / i , / 2 , ...,/„)

where fo = YA=I fi, whose network realization is

fo = C / l , / 2 , —,fn 3°

According to Example 7.1.10, we know that

fi(x) = C < x >u~\ C X-3u~ l | Ci,di D°

(8.18)

(8.19)

(8.20)

(8.21)

where c, = a\ and di = (a(f,_i + &(•). Substitute this to equation (8.20) , we obtain the

whole network as follows

y = C / i , / 2 , ...,/„ Z>° {/,• = C M,1,M»2 I c « ' ^ « 3 °

= < x > t i~1

[ Mi2 = rz x z i ' - 1

(8.22)

Applying Theorem 6.10, the above network can be simplified to

y — C Mll,/i 2l,---,Mnl,Ml2,M22,---,Mn2 | Ci, C2, ..., C n, dX , d 2 , d n 3°

^ = < x >«••-! (8.23)

Hi2 = r z X D ' - 1

Based on the network model of (8.23), the network realization of the contrast stretch­

ing technique (see section 2.2.1) is

v = c < u >°, < u > a,< u > b, r z « z i a , r z u3 b \ a , (/? - a ) , ( a - fl + 7), (va - a/3), (vb - va + a/3 - 67) Z)°


Similarly, the network realization of the clipping is

v =C< u >°, < u > b, C C u3 b | /?, -/?, va, (vb - va) D° (8.24)

Finally, the network realization of the thresholding is

(8.25)
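The dynamic-range modifications of this section all instantiate the piecewise-linear mapping (8.15); the following NumPy sketch applies such a mapping directly (the particular breakpoints, slopes, and intercepts are made up for illustration and do not come from the thesis).

import numpy as np

def dynamic_range_map(x, breakpoints, slopes, intercepts):
    """Piecewise-linear mapping of (8.15): y = slopes[i]*x + intercepts[i]
    on the interval breakpoints[i] < x <= breakpoints[i+1]."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for i in range(len(slopes)):
        lo, hi = breakpoints[i], breakpoints[i + 1]
        mask = (x > lo) & (x <= hi)
        y[mask] = slopes[i] * x[mask] + intercepts[i]
    return y

# Illustrative contrast stretch on [0, 255]: compress the dark and bright ends,
# expand the mid-range (all numbers are invented for the example).
x = np.array([10, 80, 128, 200, 250], dtype=float)
print(dynamic_range_map(x, [0, 64, 192, 255],
                        slopes=[0.5, 1.5, 0.5],
                        intercepts=[0.0, -64.0, 128.0]))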

8.2.2 Order Statistic Filtering

Order statistic filtering is a simple yet powerful technique widely used in signal processing

[11, 56]. The core operation is to find the k-th largest element in an array. This element is

called the k-th order statistic of the array. An order statistic filter is a linear combination

of the order statistics. Variation in the weights results in various members of the order

statistic filter (OSF) family, some of which have been intensively studied and widely used

in signal processing. Median filters are one example [77, 46, 38, 68]. Order statistic filters

have the virtue of being very simple and having the edge-preserving property. Their

performance is, if not optimal, satisfactory in most cases.

To formulate the problem of order statistic filtering in such a way that a network

realization can be derived, some functions need to be introduced. They are the rank

function, the comparison function, and the order statistic function.

Definition 8.1 For X = (x_1, x_2, ..., x_n), the rank function of x_i is

rank(x_i : X) = k,  if x_i is the k-th largest element in X    (8.26)

Definition 8.2 For X = (x_1, x_2, ..., x_n), the comparison function for x_i and x_j is

c(x_i, x_j) = 0  if x_i > x_j;  0  if x_i = x_j and i > j;  1  if x_i = x_j and i < j;  1  if x_i < x_j    (8.27)


Definition 8.3 For X = (x_1, x_2, ..., x_n), the order statistic function is

os(k : X) = x_i  if rank(x_i : X) = k    (8.28)

From the above definitions, we have the following theorem.

Theorem 8.1 The rank function rank(x_i : X) and the sum of the comparison functions

related to x_i have the following relationship:

rank(x_i : X) = Σ_{j ≠ i} c(x_i, x_j) + 1    (8.29)
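The three functions and Theorem 8.1 translate directly into code; the following Python sketch (illustrative only) uses the index-based tie-breaking of Definition 8.2.

def comparison(x, i, j):
    """Comparison function c(x_i, x_j) of Definition 8.2 (ties broken by index)."""
    if x[i] > x[j]:
        return 0
    if x[i] < x[j]:
        return 1
    return 1 if i < j else 0          # x_i == x_j

def rank(x, i):
    """Theorem 8.1: rank(x_i : X) = sum_{j != i} c(x_i, x_j) + 1."""
    return sum(comparison(x, i, j) for j in range(len(x)) if j != i) + 1

def order_statistic(x, k):
    """os(k : X): the element of X whose rank is k (the k-th largest element)."""
    for i in range(len(x)):
        if rank(x, i) == k:
            return x[i]

X = [7, 3, 7, 1]
print([rank(X, i) for i in range(len(X))])   # [2, 3, 1, 4]
print(order_statistic(X, 2))                 # 7 (the second largest, ties split by index)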

The core operation in order statistic filtering is to find the k-th largest member of

an array of pixels. For instance, median filtering finds the ((n+1)/2)-th largest element of

the input array. Therefore, it is necessary to derive the network realization of the order

statistic function first.

Let

f_i = x_i  if rank(x_i : X) = k;  f_i = 0 otherwise,  ∀i ∈ {1, 2, ..., n}    (8.30)

then it is trivial to show that

os(k : X) = Σ_{i=1}^n f_i    (8.31)

that is,

os(k:X) = f0o(f1,f2,...,fn) (8.32)

where

/o = E / i (8-33)

which can be realized by a linear neuron as

/ o = C / i , / 2 , . . . , / „ D ° (8.34)


Let

F(x_i) ≜ 1  if rank(x_i : X) = k;  F(x_i) ≜ 0 otherwise,  ∀i ∈ {1, 2, ..., n}    (8.35)

then

f_i(x) = x_i  if F(x_i) = 1;  f_i(x) = 0 otherwise,  ∀i ∈ {1, 2, ..., n}    (8.36)

According to Example 7.1.11, function f_i(x) can be realized as

f_i(x) = C < x_i, F(x_i) | 1, b >^{a+b}, F(x_i) | 1, a D^0    (8.37)

where a < min{x_1, x_2, ..., x_n} and b > max{x_1, x_2, ..., x_n} − a. According to

Example 7.1.8, the network realization of F(x_i) is

F(xi) =CC rank(xt- : X) • f c , C rank(x,- : X) Zi k+1 | 1, -1 D°

According to Fact 7.1, the network realization of the rank function is

(8.38)

Since

rank(xi : X) =C c(xi,xi), ...,c(xi,xi_1),c(xi,xi+1), ...,c(xi,xn) D

1 «Z/j ^ x% c(xi,Xj) - \ if z > j

0 otherwise

- l (8.39)

(8.40)

and

1 Xj > Xi if i < j

0 otherwise

according to Example 7.1.4, and Fact 7.3, the network realization of c(xi,Xj) is

C C Xi, Xj | — 1,1Z3° j - 1 D - 1 if i > j

(8.41)

C Xi,Xj | - 1, I n 0 if i < j i € {l,2,...,n} (8.42)


Therefore, the network realization of the order statistic function is

V = C / i , / 2 , ...,/„ D°

{/,-=C Pi,F(Xi) | l ,<O 0

{pi =< xu F(xi) | 1,6 > a + 6

{F(xi) = C Hiufii2 I 1,-1 D°

pu = LZ rank(xj : X)Z3 k

pi2 = C rank(xj : X)3 k+1

{rank(xj : X) =C d x (x j , X i ) , ^ ( X J , X i_ j ) , C 2 (XJ, xi+1),C2(XJ, x„) D _ 1

{a\(xj, Xj) =C c^x^Xj) | - 1 D _ 1

Ci (x ,- ,X j ) = Cx,-,Xj | 1, —lZZf

c 2(x t",Xj) — LZXj, Xj | l , l d

where i £ {1,2, ...,n}. The above network can be simplified to

V —, C Pl,Pll, Hl2, A*2, H21, H22-, A*n, Pnl, Pn2 | 1, O, ~«, 1, «, ~ « , 1, «, ~« 3°

{/X,- =< X,-, / i j i , ^ j 2 | 1, 6, -6 > a + 6

/^ti — CZ CJI , . . . , C j j _ j , Cj j_|_i ,...,Cj > n | 1,..., 1,1,...,Id

fii2 — C CJI, C j ( j _ i , C t , i + i , C j j n I — 1 , — 1,1, lZZ\ k 1 + 1

(8.43)

(8.44)

id. CXi,Xj | l , - l n ° if i > j

C X J , X J | - 1,1D° if i < j

This network has a depth of 4 and a size of 2n(2n + 1). It is shown in Figure 8.2 for the

case n = 3.

This network is called OSnet and is denoted as OSnet(k,n), where n is the number

of inputs and k means that the network's output is the k t h largest element of the input

array. This network's schematical representation is shown in Figure 8.3.

The OSnet can be made adjustable by changing

Hij LZ Ci\, ..., Cj ti—\, , •• •, Q,n | 1,1,...,lZ] k-i+j-1 (8.45)


Figure 8.3: The Schematical Representation of OSnet


Figure 8.4: The Schematical Representation of Adaptive OSnet

where j € {1,2}, to

,%+\i ••••> c t , n , X s

(8.46) | - 1 , . . , - 1 , 1 , . . . , 1 , - 1 ^ ' — 1

where xs is used to change the value of k. This new network model is called adaptive

OSnet. Its schematical representation is shown in Figure 8.4

Median Filtering

Median filtering is a simple operation in which the median value of the data within a

sliding window centered at a point is taken as the output of that point. Median filtering

has been shown to be effective in suppressing impulsive noise components while preserving

edges of a signal. It has been applied in several areas of digital signal processing which

include speech processing and image enhancement.

For an array of (2m + 1) numbers, the median is the (m + 1)-th largest number.

Therefore, an OSnet(m + 1, 2m + 1) can implement the median filter. A detailed account

of this network can be found in [83].
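The behaviour of OSnet(m + 1, 2m + 1) on a sliding window can be simulated in a few lines; the sketch below simply sorts each window and takes its (m+1)-th largest element (border handling by edge replication is an arbitrary choice made here, not part of the thesis).

import numpy as np

def median_filter_1d(signal, m):
    """1-D median filter with window length 2m + 1: the output at each position is the
    (m+1)-th largest element of the window, i.e. what OSnet(m+1, 2m+1) computes."""
    signal = np.asarray(signal, dtype=float)
    padded = np.pad(signal, m, mode="edge")
    out = np.empty_like(signal)
    for i in range(len(signal)):
        window = padded[i:i + 2 * m + 1]
        out[i] = np.sort(window)[::-1][m]     # (m+1)-th largest
    return out

print(median_filter_1d([1, 9, 1, 1, 8, 1, 1], m=1))   # impulses suppressed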

Separable Median Filtering. When median filtering is extended to two or more

dimensions, signal features are generally distorted or deleted. Separable median filtering

proposed in [68] yields somewhat better results in preserving signal features.

Separable median filters consist of two types of one-dimensional median filters — one

oriented in the horizontal direction and the other in the vertical. More explicitly, the


output value, y(l, k), at position (l, k) is given by

y(l, k) = median{z(l − L, k), ..., z(l, k), ..., z(l + L, k)}    (8.47)

where z is defined as

z(p, q) = median{u(p, q − K), ..., u(p, q), ..., u(p, q + K)}    (8.48)

and u(l, k) are the sample values of the input signal.

A separable median filter with a window size of (2L + 1) x (2K + 1) can be implemented

by using (2L + 2) or (2K + 2) OSnets depending on whether the horizontal direction or

the vertical direction is median filtered first. The network of a separable median filter

with the horizontal direction filtered first is shown in Figure 8.5.


Figure 8.5: A Separable Median Filter Realized by Using OSnets

Max/Median Filtering. Max/Median filtering is proposed by Arce and McLoughlin

[8] in order to achieve better preservation of signal features. In the two-dimensional

case, if samples in lines separated by 45° are taken into account, the Max/Median filtering

is defined as

y{l, k) = max{z(s, 1), z(s, 2), z(s, 3), z{s,4)} (8.49)


where

z(s,1) = median{a(l, k − M), ..., a(l, k), ..., a(l, k + M)}

z(s,2) = median{a(l − M, k), ..., a(l, k), ..., a(l + M, k)}

z(s,3) = median{a(l + M, k − M), ..., a(l, k), ..., a(l − M, k + M)}

z(s,4) = median{a(l − M, k − M), ..., a(l, k), ..., a(l + M, k + M)}

where a(l, k) is the pixel's value of the input picture at position (l, k), and y(l, k) is the

output of the filter at the same position.
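A direct (non-network) simulation of the Max/Median filter (8.49) is sketched below; it is illustrative only, and simply leaves pixels within M of the border unchanged.

import numpy as np

def max_median_filter(img, M):
    """Max/Median filter of (8.49): at each pixel take the median of the 2M+1 samples
    along the horizontal, vertical and two diagonal lines through it, then output the
    maximum of the four medians."""
    img = np.asarray(img, dtype=float)
    out = img.copy()
    rows, cols = img.shape
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for l in range(M, rows - M):
        for k in range(M, cols - M):
            medians = []
            for dl, dk in directions:
                line = [img[l + t * dl, k + t * dk] for t in range(-M, M + 1)]
                medians.append(np.median(line))
            out[l, k] = max(medians)
    return out

print(max_median_filter(np.arange(25).reshape(5, 5), M=1))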

A Max/Median filter with a window size of (2M + 1) x (2M + 1) can be realized by

five OSNets as shown in Figure 8.6.


Figure 8.6: A Max/median Filter Realized by Using OSNets

Max/Min Filtering

Taking neighborhood min or max is a generalization of shrinking or expanding the l's

in a two-valued picture. Iterated local min followed by iterated max can be used to

remove high-valued blemishes on a low-valued background. Similarly, iterated local max


followed by iterated local min can be used to remove low-valued blemishes on a high-

valued background.

Finding the maximum of an array is equivalent to finding the 1st largest element,

therefore the OSnet(1, n) can be used. To find the minimum of an array of size n is

equivalent to finding the n-th largest element, thus the OSnet(n, n) can be used. There-

fore, Max/Min filtering can be implemented by using OSnet(1, n) and OSnet(n, n)

iteratively.

Adaptive Order Statistic Filtering

A type of adaptive order statistic filters which are called comparison and selection filters

(CS filters) is proposed by Lee and Fam [56]. The output y_l of the CS filter with parameter

h at position l over the input array X_l = (x_{l−m}, ..., x_{l+m}) is defined as

y_l = os(m + 1 + h : X_l)  if u_l > v_l;  y_l = os(m + 1 − h : X_l) otherwise    (8.50)

where u_l and v_l are the sample average and the sample median respectively, and h is an

integer satisfying 1 ≤ h ≤ m.

Let

F = 1  if u_l > v_l;  F = 0 otherwise    (8.51)

and

f_1 ≜ os(m + 1 + h : X_l)  if F = 1;  f_1 ≜ 0 otherwise    (8.52)

f_2 ≜ os(m + 1 − h : X_l)  if F = 0;  f_2 ≜ 0 otherwise    (8.53)


then, function (8.50) can be rewritten as

yi = fi + h (8-54)

which can be realized by a linear neuron as follows

yi=Ch,f2D° (8.55)

According to Example 7.1.11, we have

/ i =C< os(m + l + h: X{),F | 1,6 > ° + 6 , F\l,aD° (8.56)

and according to Example 7.1.12, we have

f2 =c< os(m + l-h: Xt), F | 1, -b > a, F | 1, - a D~ a (8.57)

In the above two equations, a < min{os(k : X_l) : k ∈ K} and b > max{os(k : X_l) : k ∈

K} − a, where K = {1, 2, ..., 2m + 1}. Since os(k : X_l) ≥ 0 ∀k ∈ K, we choose a = 0.

Consequently, network realizations of / i and f2 can be simplified to

/ i =<os(m + l + / i : X , ) , F | l , 6 > 6 (8.58)

and

/ 2=<os(m + l - / i : X j ) , F | l , - 6 > ° (8.59)

According to Fact 7.3, the network realization of F is

F=LZuhv, | 1,-1 13° (8.60)

Since the network realizations of ui and vi are already known, the network realization of

the CS filter is

y, = C / i , / 2 D °

/ i = <os(m + H - / i : X , ) , F | l , 6 > 6

<

/ 2 = <os(m + l - / i : X / ) , J P | l , - 6 > ° (8.61)

{F =C os(m + 1 : X,), | 1, -1 Z3° {vi =C X;_m, X/, X; + m | C, C, C D°


where c = 1/(2m + 1).
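Functionally, the CS filter of (8.50) can be simulated directly as below; this NumPy sketch is only an illustration (border samples are left unchanged, which is a choice made here, not in the thesis).

import numpy as np

def kth_largest(window, k):
    # os(k : X): the k-th largest element of the window.
    return np.sort(window)[::-1][k - 1]

def cs_filter(signal, m, h):
    """CS filter of (8.50): in each window X_l of length 2m+1, output os(m+1+h : X_l)
    if the window mean exceeds the window median, and os(m+1-h : X_l) otherwise."""
    signal = np.asarray(signal, dtype=float)
    out = signal.copy()
    for l in range(m, len(signal) - m):
        window = signal[l - m:l + m + 1]
        u, v = window.mean(), np.median(window)     # sample average and sample median
        k = m + 1 + h if u > v else m + 1 - h
        out[l] = kth_largest(window, k)
    return out

print(cs_filter([0, 0, 10, 0, 0, 0, 9, 0, 0], m=2, h=1))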

Three order statistic networks are needed to realize the above network. However, we

can combine all the three networks together into one network and reduce the number of

weights needed.

Let us define a function as follows

Z\ = os(m + 1 — h : X\)

z2 = F (8.62)

z3 = os(m + 1 + h : Xi)

where F is defined as in (8.51). Recall the network realization of the order statistics

function described earlier, we can now combine three such networks to form a network

realization for function defined in (8.62). After simplification, the network realization is

as follows.

Z\ — C Mn, M12, • Min D°

Z2 = (Z M21, M22, M2n, £/-m, X l + m | 1, 1, 1, — C, — C ZJ°

z3 = C M31) M32, M3n Z)°

{pki =< a;/_m+t_i,MfcoMfc,- I 1, 6, —6 >6 » G K and fc <G {1,2,3}

H\i = C c,i,c i 2, ...,cin

<

H2ki = ZZ qi,c<2,...,c,-n

• a;/_ m +t-i, x ( _ m + j _ i | 1,-1 Zf \ii> j

IZ x / _ m + , _ i , X ( _ m + j _ ! | - 1,1 if z < j

where dk{ = m + 1 + (fc — l)/i — i , and n = 2m + 1, i.e., the number of input elements.

This network model has a depth of 4 and a size of n(8n + 5).

(8.63)

i cij =

i e£ and fc € {1,2,3}

i,j <E IC



Figure 8.7: A Network Model of the CS Filter

Now, the CS filer has a network realization as follows:

y, = C / i , / 2 D °

(8.64) fi = <z1,z2\l,b> b

. /2 = < zz, z2 | 1, -b >°

The whole network has a depth of 5 and a size of n(8n + 5) + 6. This network is shown

in Figure 8.7.

8.2.3 Directional Filtering

Although smoothing techniques can clear a noisy image to certain degree, a common

side-effect they have is that edges get blurred. To alleviate the blurring effect, directional

filtering is sometimes used. An example is the directional averaging filter described in

Chapter Two.

Suppose the directional averaging is done along four different directions, say, 0°, 45°,

90°, 135°, then for each direction, an average is calculated as

v(m, n : θ_i) = (1/N) Σ_{(k,l) ∈ W_{θ_i}} u(m − k, n − l),   θ_i ∈ {(i − 1)·45° : i ∈ I = {1, 2, 3, 4}}    (8.65)

where W_{θ_i} is the window along the direction θ_i, and N is the window size, which is the

same for all the windows. An optimal direction θ* is chosen such that |u(m, n) − v(m, n :


θ*)| = min{|u(m, n) − v(m, n : θ_i)| : i ∈ I}. The desired result is

v(m, n) = v(m, n : θ*)    (8.66)

Let y = v(m, n), y_i = v(m, n : θ_i), x = u(m, n), and φ_i = |x − y_i|. We define the

following function

f_i = y_i  if rank(φ_i : Φ) = 4;  f_i = 0 otherwise    (8.67)

where Φ = (φ_1, φ_2, φ_3, φ_4). It is easy to show that

y = Σ_{i=1}^4 f_i    (8.68)

which can be realized by a linear neuron as follows.

y = C / i , / 2 , / 3 , / 4 D ° (8.69)

Let

F_i ≜ 1  if rank(φ_i : Φ) = 4;  F_i ≜ 0 otherwise    (8.70)

then

f_i = y_i  if F_i = 1;  f_i = 0 otherwise    (8.71)

which, according to Example 7.1.11, has a network realization

f_i = C < y_i, F_i | 1, b >^{a+b}, F_i | 1, a D^0    (8.72)

where a < min{y_i : i ∈ I} and b > max{y_i : i ∈ I} − a. Since y_i ≥ 0 ∀i ∈ I,

we choose a = 0. Consequently, the above network can be simplified to

fi=<yuFi\l,b> b (8.72)


(8.73)

(8.74)

i cij (8.75)

The network realization of Fi is

Ft =CC rank(& : $) ID4, C rank(fa : $) ZI5 | 1, - 1 Z>°

Because rank( >,- : $) < 4, this network can be simplified to

F{x{ : X) =LZ rank(& : *) ZI4

The rank function has a network realization as follows.

rank(<^ : $) - C c t l , c , - , , - - i , c,-4 | - 1 , - 1 , 1 , 1 D~*

LZ | 1,-1 Z)° iii>j <

- l , l Z ) ° if * < i

The network realization of fa is

fa =C< x, y{ | 1, -1 >°, < x, y{ | - 1 , 1 >°D°

The network realization of ?/,• is

Vi = C x-, x 2 , x f | d, d,d D°

where where cZ = 1/iV, and x is the j t h element of the sequence X , which is the set of

pixels in the window W6i. Note that if N = 2M + 1, x = xf+1. Composing all the network realizations together, we get the network model for the

directional averaging filter. After simplification, the network model is as follows.

V = C / i , / 2 , / 3 , / 4 D°

{/,- =< x], x 2 , ...x?, Fi | d, d , d , b >b i e l

{Fi =C C i i , . . . , C i ) , - _ i , C i i j + i , . . . , C , - 4 Z) 4

{cfj =LZ u},p2,n),n) ZZ\°

(8.76)

(8.77)

(8.78)

A f + l ..,X AT — d,—a", 1 — d,— c? >°

Mt2 = <x, 1 , . . ,xf ,x ,x^+ 1 , . . . ,xf |<*, . . . ,d,d-l , . . . ,d>°

where i, j ∈ I. This network model has a depth of 5 and a size of 4(3N + 17). For the

case N = 3, it is shown in Figure 8.8, where the connection weights are not shown.
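A direct simulation of the directional averaging filter is given below as an illustrative NumPy sketch (window length N = 2M + 1 along each of the four directions; border pixels are simply left unchanged here).

import numpy as np

def directional_average_filter(img, M):
    """For each pixel, compute the mean of the 2M+1 samples along the 0, 45, 90 and
    135 degree lines through it, and output the directional mean that is closest to
    the pixel's own value."""
    img = np.asarray(img, dtype=float)
    out = img.copy()
    rows, cols = img.shape
    directions = [(0, 1), (1, 1), (1, 0), (1, -1)]        # 0, 45, 90, 135 degrees
    for m in range(M, rows - M):
        for n in range(M, cols - M):
            means = [np.mean([img[m + t * dr, n + t * dc] for t in range(-M, M + 1)])
                     for dr, dc in directions]
            out[m, n] = min(means, key=lambda v: abs(img[m, n] - v))
    return out

print(directional_average_filter(np.eye(5) * 10.0, M=1))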


Figure 8.8: A Network Realization of the DA Filter for the Case N = 3


Chapter 9

NETWORK REALIZATIONS OF IR TECHNIQUES

Image restoration (IR) techniques are classified here, as in Chapter Eight, into two classes:

the linear and the non-linear filtering techniques. The inverse filter, the Wiener filter, and

so on belong to the first class; the Maximum Entropy filter and the Bayesian methods

belong to the second. This chapter shows the derivation of LQT networks for realizing

some linear and non-linear filters, and gives a comparison with the Hopfield network, which is

commonly used for solving optimization problems.

9.1 Network Realizations of Linear Filters

For most linear filters, the essential problem is to find the best estimate f such that the

following cost function

E(f) = α( ||g − Hf||² − ||n||² ) + ||Qf||²    (9.1)

is minimum (see Section 2.2). The solution to this problem is

f̂ = (H^T H + λ Q^T Q)^{−1} H^T g    (9.2)

where λ = 1/α. For function (9.2), the network realization, as shown in Chapter Eight, is

f̂ = C g | W D^0    (9.3)

where

W = (H^T H + λ Q^T Q)^{−1} H^T    (9.4)


and 0 is a row vector of zeros.

However, the complexity of image restoration stems from the difficulty of calculating

the inverse (H^T H + λ Q^T Q)^{−1}. This is the inversion of a large matrix. If we are

dealing with a picture of size 64 x 64 (a very small picture), the matrix (H^T H + λ Q^T Q)

has the size 4096 x 4096. To reduce the computational complexity, Discrete Fourier

Transform (DFT) techniques [26] or iterative methods [40] are used.

Here, we derive a network model which implements the steepest descent method, which

is the simplest iterative method.

The steepest descent method finds the optimal estimate by making an initial guess

and approaching the true optimal estimate step by step. Denoting the estimate at time

k as f̂(k), the difference (f̂(k + 1) − f̂(k)) is proportional to the opposite direction

of the gradient of the cost function E at time k. The gradient vector of E is usually denoted as S7E, which at time k is

VE(k) = -2(AQ TQ + H T H ) f + 2 H T g (9.5)

Therefore, the estimate at time fc -f 1 is

f(fc + l) = i(fc).+ £((AQTQ + HTH)f(fc) - H T g )

= (I + aQ TQ + 6 H T H ) f (fc) - 8H Tg (9.6)

where δ takes a small positive value. The optimal δ is a function of f̂(k). For the sake of

simplicity, it is usually chosen as a constant [93].

The network realization of function (9.6) is

f(fc + l )=Cf(fc )|W3 t (9.7)

where

W = I + £AQ TQ + 8H TH (9.8)


and

t = 6 H r g (9.9)

For an n x n image, this network has a depth of 1 and a size of n^4.
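The iteration implemented by this feedback network is ordinary steepest descent on the constrained least-squares cost; the following NumPy sketch uses the standard update f ← f − δ∇E(f) for E(f) = ||g − Hf||² + λ||Qf||² (the step size, stopping rule, and the tiny example matrices are illustrative assumptions, not values from the thesis).

import numpy as np

def restore_steepest_descent(g, H, Q, lam, delta, iters):
    """Steepest descent on E(f) = ||g - H f||^2 + lam * ||Q f||^2; each iteration
    corresponds to one pass through the single-layer feedback network."""
    f = np.zeros(H.shape[1])
    for _ in range(iters):
        grad = 2.0 * (H.T @ (H @ f - g) + lam * (Q.T @ (Q @ f)))
        f = f - delta * grad
    return f

# Tiny example: identity "blur" H and identity smoothness operator Q.
g = np.array([1.0, 2.0, 3.0])
print(restore_steepest_descent(g, np.eye(3), np.eye(3), lam=0.1, delta=0.1, iters=200))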

The task of finding the network realization of linear filters is now reduced to finding the

weight matrix W and the threshold vector t, which only involve simple matrix operations

such as transposition and multiplication. The following are network realizations of two

well-known linear filters.

Pseudo-inverse Filter. For the pseudo-inverse filter, the weight matrix and the threshold

vector are

W = I + 6H TH (9.10)

t = <5Hrg

Wiener Filter. For the Wiener filter, the weight matrix and the threshold vector

are

W = I + ffif-1^ + 6 H T H ^

t = SWg

The matrix Rf is usually assumed diagonal, hence its inverse can be easily calculated.

9.2 Network Realizations of Non-linear Filters

Similarly to the network realization of image enhancement techniques, there is no universal

network architecture for non-linear image restoration filters. Here, we show the derivation

of LQT networks for realizing some of these filters and the limitation of LQT networks.

Maximum Entropy Filter. The cost function of a maximum entropy filter is

J(f) = Flnf - A(||g-Hf|| 2 -||n|| 2 ) (9.12)


The gradient vector is

VJ(f) = lnf + 1 - A(2H r(g - Hf)) (9.13)

where 1 = ( 1 , 1 , 1 ) T . Based on the steepest decedent method, the estimate at time

(ib + 1) is

f(k + l) = f(fc)-^(lnf(fc) + l + A(2H T(g-Hf(fc))))

= (I-2A<5HTH)f(fc) + (2<5AHTg-l)-<51nf(fc) (9.14)

The realization of the term ln f by using LQT networks requires a huge number of neurons.

One way to reduce the network size is to include a new type of neuron which has

the logarithm as its activation function. Since the logarithm function can be easily

implemented in analog circuits [64], it is not unreasonable to include such a neuron

model.

Let us denote the natural logarithm activation function as N. Function (9.14) can be

realized by the network model shown in Figure 9.1, where

W = I − 2λδ H^T H    (9.15)

and

t = 2δλ H^T g − 1    (9.16)

Maximum Likelihood Filter. The cost function of a maximum likelihood filter is

A*) = " | ( g - *{Hf ^ R ^ g - s{Hi}) + lnp(g) + K (9.17)

The gradient vector is

VJ(f) = B ^ D R - ^ g - 5{Hf}) (9.18)


Figure 9.1: A Network Realization of the Maximum Entropy Filter

where

D ≜ Diag{ ds(x)/dx |_{x = b_i} }    (9.19)

and b_i are the elements of the vector b = Hf̂. The estimate at time (k + 1) is

i(k + l) = f(k) - 5 H r D R ^ 1 g + (5HTDR~15{Hf(A;)}

= Wf(fc) - t + Qs{Hf(fc)}

Assume the noise term n is independent, then ( R " 1 ) 7 = R^ 1 , hence we have

(9.20)

m r D R ~ 1 s { H f } - m ' R ^ D s j H f } (9.21)

The term

Ds{Hf} = Diag(s'(b1),s'(b2),..,s'(bn))(s(b1),s(b2),..,s(bn)) T

= (s'(b1)s(h),s'(b2)s(b2),...,s'(bn)s(K)) T (9.22)

where s'(i,) = x = bi. The term s'(6,-)s(6,-) is a multiplication of two variables,

whose network realization by LQT networks requires a huge amount of neurons. One

way to reduce the network size is to use the sigma-pi neurons mentioned in Section 2.1.


In the case when s(.) is linear, function (9.20) is reduced to

i(k + l) = i(k)-SR TR?g + 6H. TTL?s{H.i(k)}

= (I + H T R~ 1 H)f(fc) - ^ H T R z 1 g (9.23)

This filter then can be realized by the network model of (9.7) with

W ^ I + H 3 ! ^ 1 ! ! (9.24)

and

t = 6RTR?E (9.25)

Maximum A Posteriori (MAP) Filter. The cost function of a MAP filter is

J&) = -\(s-s{Bi})TR?te-8{Ht})

- f ) T R7 1 (? - f) + lnp(g) + K (9.26)

The gradient vector is

VJ(f) = H ^ D R - ^ g - s{Ui}) - Kfii - f) (9.27)

where D is defined as in (9.19). The estimate at time (k + 1) is

f( ib+l) = f(k) — H T D R ~ 1 g + 6H T DR^s{Hf(fc)} + ^ R j 1 ^ ^ ) - f)

= (I + ^Rj x ) f ( fc) - < 5R7 1 f -<5H T DRZ 1 g + (5HTDR^1s{Hi(fc)}(9.28)

Because of the term 6H rDR^ 15{Hf(fc)}, we have the same problem as in the implemen­

tation of the maximum likelihood filter.

In the case when s(.) is linear, the MAP filter is reduced to

i(k + !) = (! + SRJ1 + 6H TR^n)i(k) - ^ R j 1 ? + H T R^ 1 g) (9.29)


This filter can be realized by the network model of (9.7) with

W = I + SR-J1 + 6H TR^H (9.30)

and

t = ^Ry 1? + H ^ g ) (9.31)

9.3 Comparison with the Hopfield Network

Zhou, Chellappa, et al. applied the Hopfield network to solve an image restoration

problem [99]. The restoration criterion they chose is that of minimizing the following

cost function

E = (1/2) ||g − Hf||² + (λ/2) ||Df||²    (9.32)

where ||·|| represents the L2 norm, λ is the Lagrange multiplier, and f is the restored

image. Such a constrained cost function is widely used in image restoration problems

[6] and is also similar to the regularization techniques used in early vision problems [73].

The Hopfield network can minimize the following energy function:

E_H = −(1/2) v^T T v − θ^T v    (9.33)

where T is the connection matrix, and v_i and θ_i are the output and the threshold of the i-th

neuron, respectively. By choosing the right connection matrix T and threshold vector θ, the minimization problem of the restoration can then be solved by the Hopfield network.

The cost function (9.32) has to be modified to fit the Hopfield network model because
an image vector is not binary. Since each pixel takes an integer value (usually ranging from
0 to 255), it can be represented by a limited number of neurons, such as

    f_i = Σ_{j=1}^{L} c_j v_{i,j}    (9.34)


An example is c_j = 2^{j−1} and L = 8, which gives the binary representation of an integer
ranging from 0 to 255. Equation (9.34) can be expressed in matrix form as

    f = Cv    (9.35)

where

    C = [ c_1, ..., c_L    0, ..., 0      ...    0, ..., 0
          0, ..., 0        c_1, ..., c_L  ...    0, ..., 0
          ...
          0, ..., 0        0, ..., 0      ...    c_1, ..., c_L ]    (9.36)

Substituting (9.35) into (9.32), we get

    E = ½ (g − HCv)^T (g − HCv) + ½ λ v^T C^T D^T D C v
      = ½ (g^T g − 2 g^T H C v + v^T C^T (λ D^T D + H^T H) C v)

Minimizing E is equivalent to minimizing

    E' = ½ (v^T C^T (λ D^T D + H^T H) C v − 2 g^T H C v)    (9.37)

Comparing (9.37) with (9.33), it can be seen that a Hopfield network can solve the
restoration problem by setting its parameters as follows:

    T = C^T (λ D^T D + H^T H) C
    θ^T = g^T H C    (9.38)
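For a small numerical check of (9.34)-(9.38), the encoding matrix C and the resulting Hopfield parameters can be built directly. The sketch below is an illustration assumed in this transcript (numpy, the helper name, and the argument shapes are not from the thesis).

    import numpy as np

    def hopfield_restoration_params(H, D, g, lam, L=8):
        n = H.shape[1]                     # number of pixels in f
        c = 2.0 ** np.arange(L)            # c_j = 2^(j-1), j = 1, ..., L
        C = np.kron(np.eye(n), c)          # block structure of (9.36), shape (n, n*L)
        A = lam * D.T @ D + H.T @ H
        T = C.T @ A @ C                    # connection matrix of (9.38)
        theta = g @ H @ C                  # theta^T = g^T H C
        return C, T, theta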

The Hopfield network can solve any optimization problem whose cost function can

be modified to the form shown in (9.32), which is the case for most image restoration

problems. However, some practical problems arise when the Hopfield network is applied

to image restoration.


The first problem is that an enormously large number of neurons are needed for image
restoration even when the size of the picture is small. This is because several neurons
are needed to represent a pixel, since each neuron of a Hopfield network has only binary
output. Suppose eight neurons are used to represent a pixel; then for an n × n image,
8n² neurons are needed (e.g., if n = 64, then 32,768 neurons), that is, the size of the
Hopfield network is 64n⁴.

The second problem is that neurons in the Hopfield network have to be updated

sequentially, i.e., one at a time, to ensure the convergence of the network [33]. This

is a serious restriction, since it nullifies the advantage of parallelism offered by neural

networks. Moreover, the need for a huge number of neurons makes this problem much

worse.

Our network model has a size of n4 neurons, and, most importantly, is guaranteed to

work in parallel. A detailed comparison between our network model and the Hopfield
network in solving optimization problems in general is available in [87].


Chapter 10

APPLICATIONS IN OTHER AREAS

The purpose of this chapter is two-fold. First of all, we shall show that the applicability

of our methodology as well as the LQT networks is not limited to the area of image

processing. Secondly, we shall show that by using our methodology, better neural network

implementations can be easily developed. To serve this purpose, we have chosen some

algorithms which are used in areas other than image processing and have known neural

network implementations developed by other researchers.

10.1 Sorting

Sorting is one of the most important operations performed by computers. It is believed

that 25%-50% of all the work performed by computers consists of sorting data [1]. In
addition, there exist other techniques which are based on sorting, for instance, order
statistic filtering. Therefore, increasing the speed of sorting is of great importance and

interest.

Since the advent of digital computers, many sorting schemes have been proposed in

the literature [1, 51, 61]. Some of the sorting schemes are executed in serial or parallel

digital computers, while others are executed in special-purpose machines. For bounded

fan-in logic circuits (networks of AND, OR, NOT gates), the minimum sorting time

achievable is O(n log₂ n) for serial sorting schemes, and O(log₂ n) for parallel sorting
schemes (n here is the number of items to be sorted) [1]. For unbounded fan-in logic
circuits, the minimum sorting time is O(1); however, the circuit size has to be more than


polynomial [13, 22]. It is implicit in [13] that if threshold-logic gates (neurons) are used,

the circuit size can be reduced to polynomial.

It is certainly possible to achieve constant sorting time using a polynomial-size LQT network since it employs threshold-logic neurons as one of three kinds of basic elements.

Here, we derive such a network for sorting integers and, by doing so, give its exact size and
depth. This network implements the enumeration sorting algorithm, which is the fastest

sorting algorithm when implemented in parallel [1].

Given a sequence S = (x_1, x_2, ..., x_n), where x_i is an integer, it can be sorted in
two ways: (1) S is sorted to a sequence (y_1, y_2, ..., y_n), where y_i < y_j iff i < j; and (2)
S is sorted to a sequence (y_1, y_2, ..., y_n), where y_i > y_j iff i < j. Let us refer to the
first sequence as an increasingly sorted sequence and the second as a decreasingly sorted
sequence. Without loss of generality, we shall only consider the decreasingly sorted
sequence and denote it as S⁻. It can easily be shown that a sequence S of distinct numbers has the following
relationship with its decreasingly sorted sequence S⁻: y_k = x_i iff there are (k − 1) elements
in S which are greater than x_i.

To include cases where there are some elements in S which are equal, we introduce

the following definition and theorem.

Definition 10.1  x_i ▷ x_j means that x_i ≥ x_j if i > j, or x_i > x_j if i < j.

Define G_i = {x_j : x_j ▷ x_i, ∀x_j ∈ S − {x_i}}; we have the following theorem.

Theorem 10.1  Given a sequence S, if we arrange its elements to form a new sequence
S' according to the following rule:

    p(x_i) = k,  if |G_i| = k − 1

where p(x_i) denotes the position of x_i in S', then S' is a decreasingly sorted sequence of S.


PROOF  The validity of this theorem is obvious. ∎

To derive a LQT network for sorting, we need to restate the enumeration sorting

algorithm in a more mathematically rigorous way. This requires two special functions

introduced in Chapter Eight; they are the comparison function and the rank function.

The comparison function is defined as

    c(x_i, x_j) = { 0  if x_i > x_j
                    0  if x_i = x_j and i > j
                    1  if x_i = x_j and i < j
                    1  if x_i < x_j }    (10.1)

and the rank function is defined as

    rank(x_i : S) = k,  if x_i is the k-th largest element in S    (10.2)

According to Theorem 10.1, the following equation is true:

    rank(x_i : S) = Σ_{j=1, j≠i}^{n} c(x_i, x_j) + 1    (10.3)

Now the enumeration sorting algorithm can be rephrased as follows:

1. Calculate c(x_i, x_j) ∀i, j ∈ I = {1, 2, ..., n}, but i ≠ j;

2. Calculate rank(x_i : S) ∀i ∈ I;

3. ∀k ∈ I, y_k = x_i if rank(x_i : S) = k.
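Before the network realizations are derived, the three steps can be traced procedurally. The following Python sketch is an illustration assumed in this transcript, not the LQT network itself; it computes c(·,·) as in (10.1) and rank(·) as in (10.3) directly.

    def enumeration_sort(x):
        n = len(x)

        def c(i, j):
            # c(x_i, x_j) = 1 when x_j "beats" x_i, with ties broken by index (10.1)
            if x[i] > x[j]:
                return 0
            if x[i] < x[j]:
                return 1
            return 1 if i < j else 0

        # Step 2: rank(x_i : S) = sum_j c(x_i, x_j) + 1   (10.3)
        rank = [sum(c(i, j) for j in range(n) if j != i) + 1 for i in range(n)]

        # Step 3: y_k = x_i if rank(x_i : S) = k  (decreasingly sorted sequence)
        y = [0] * n
        for i in range(n):
            y[rank[i] - 1] = x[i]
        return y

    # e.g. enumeration_sort([3, 7, 7, 1]) -> [7, 7, 3, 1]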

We first derive network realizations for each step and then compose them together
to form the whole network.

For the case i > j,

    c(x_i, x_j) = { 0 if x_i ≥ x_j, 1 otherwise } = { 1 if x_j − x_i ≥ 1, 0 otherwise }    (10.4)


The network realization of this function is

    c(x_i, x_j) = ⌈ x_i, x_j | −1, 1 ⌉^1    (10.5)

For the case i < j,

    c(x_i, x_j) = { 0 if x_i > x_j, 1 otherwise } = { 1 if x_j − x_i ≥ 0, 0 otherwise }    (10.6)

The network realization of this function is

    c(x_i, x_j) = ⌈ x_i, x_j | −1, 1 ⌉^0    (10.7)

For the second step, the network realization of the rank function is

    rank(x_i : S) = ⌈ c(x_i, x_1), ..., c(x_i, x_{i−1}), c(x_i, x_{i+1}), ..., c(x_i, x_n) ⌉^{−1}    (10.8)

Substituting (10.5) and (10.7) into (10.8), we have

    rank(x_i : S) = ⌈ ⌈ x_i, x_1 | −1, 1 ⌉^1, ..., ⌈ x_i, x_{i−1} | −1, 1 ⌉^1,
                      ⌈ x_i, x_{i+1} | −1, 1 ⌉^0, ..., ⌈ x_i, x_n | −1, 1 ⌉^0 ⌉^{−1}    (10.9)

For the third step, let

    z(i, k) = { x_i if rank(x_i : S) = k, 0 otherwise }    (10.10)

Then it can be shown that

    y_k = Σ_{i=1}^{n} z(i, k)    (10.11)

which has the following network realization

    y_k = ⌈ z(1, k), z(2, k), ..., z(n, k) ⌉^0    (10.13)


Let

    F_i = { 1 if rank(x_i : S) = k, 0 otherwise }    (10.14)

then,

    z(i, k) = { x_i if F_i = 1, 0 otherwise }    (10.15)

According to Example 7.11.11, the network realization of this function is

    z(i, k) = ⌈ ⟨ x_i, F_i | 1, b ⟩^c, F_i | 1, a ⌉^0    (10.16)

where a < min{x_i : i ∈ I and x_i ∈ X}, b > max{x_i : i ∈ I and x_i ∈ X} − a, and
c = a + b. X is the set of all the possible values x_i may take. The network realization of
F_i, according to Example 7.1.6, is

    F_i = ⌈ ⌈ r_i ⌉^k, ⌈ r_i ⌉^{k+1} | 1, −1 ⌉^0    (10.17)

where r_i = rank(x_i : S). Substituting this equation into (10.16) and simplifying the
resulting network realization, we have

    z(i, k) = ⌈ ⟨ x_i, ⌈ r_i ⌉^k, ⌈ r_i ⌉^{k+1} | 1, b, −b ⟩^c, ⌈ r_i ⌉^k, ⌈ r_i ⌉^{k+1} | 1, a, −a ⌉^0    (10.18)

Composing all the network realizations together and after some simplification, we
have the network implementation of the enumeration sorting algorithm:

    y_k = ⌈ μ_1, μ_{1,k}, μ_{1,k+1}, μ_2, μ_{2,k}, μ_{2,k+1}, ..., μ_n, μ_{n,k}, μ_{n,k+1} | 1, a, −a, 1, a, −a, ..., 1, a, −a ⌉^0
      with  μ_i = ⟨ x_i, μ_{i,k}, μ_{i,k+1} | 1, b, −b ⟩^c
            μ_{i,l} = ⌈ c(x_i, x_1), ..., c(x_i, x_{i−1}), c(x_i, x_{i+1}), ..., c(x_i, x_n) ⌉^{l−1}    (10.19)
            c(x_i, x_j) = ⌈ x_i, x_j | −1, 1 ⌉^1 if i > j,   ⌈ x_i, x_j | −1, 1 ⌉^0 if i < j


where i, j, k ∈ I and l ∈ {k, k+1}.

There are two special cases in the above network realization, namely, when k = 1 and
k = n. When k = 1, μ_{i,1} = 1 ∀i ∈ I. Therefore, the network realization of y_1 can be
further simplified to

    y_1 = ⌈ μ_1, μ_{1,2}, μ_2, μ_{2,2}, ..., μ_n, μ_{n,2} | 1, −a, 1, −a, ..., 1, −a ⌉^{−na}
      with  μ_i = ⟨ x_i, μ_{i,2} | 1, −b ⟩^a
            μ_{i,2} = ⌈ c(x_i, x_1), ..., c(x_i, x_{i−1}), c(x_i, x_{i+1}), ..., c(x_i, x_n) ⌉^1    (10.20)
            c(x_i, x_j) = ⌈ x_i, x_j | −1, 1 ⌉^1 if i > j,   ⌈ x_i, x_j | −1, 1 ⌉^0 if i < j

where i, j ∈ I.

When k = n, μ_{i,n+1} = 0 ∀i ∈ I. Therefore, the network realization of y_n can be
further simplified to

    y_n = ⌈ μ_1, μ_{1,n}, μ_2, μ_{2,n}, ..., μ_n, μ_{n,n} | 1, a, 1, a, ..., 1, a ⌉^0
      with  μ_i = ⟨ x_i, μ_{i,n} | 1, b ⟩^c
            μ_{i,n} = ⌈ c(x_i, x_1), ..., c(x_i, x_{i−1}), c(x_i, x_{i+1}), ..., c(x_i, x_n) ⌉^{n−1}    (10.21)
            c(x_i, x_j) = ⌈ x_i, x_j | −1, 1 ⌉^1 if i > j,   ⌈ x_i, x_j | −1, 1 ⌉^0 if i < j

where i, j ∈ I.

For an input array of n elements, our network has a depth of 4 and a size of
n(n² + 6n − 5). The computation time of our network is constant irrespective of the size of
the input array; it is always four times the processing time of a single neuron. Our
network for n = 3 is shown in Figure 10.1, where the weights which are not shown have
the value of 1. To our knowledge, the aforementioned network is the first published neural network
which employs three kinds of neurons to implement the enumeration sorting algorithm [86].


Figure 10.1: A Neural Network for Sorting an Input Array of Three Elements


Later, Bruck and Siu proposed another neural network model which has an architecture

similar to that of ours [88]. It is composed of threshold-logic neurons and has a depth of

5.

Note that the size of our sorting network is polynomial only with respect to the

number of input elements. This is justifiable in applications where the value range of the

input elements is fixed; for instance, in image processing, the value range of a pixel is

restricted to [0,255].

10.2 Communication

Here, we choose an algorithm which is used in communication and has a well-known neural
network implementation called the Hamming network.

10.2.1 Improvement over Hamming Network

In Chapter Two, we introduced the Hamming network which implements a minimum er­

ror classification algorithm (referred to as Hamming Classification (HC) algorithm here).

The Hamming network has some shortcomings: (1) the network output is not the ex­

emplar; (2) the network has to iterate several times; (3) the matching scores have to be

distinct; and (4) the output dynamic range of neurons has to be very large. Here, we

derive a network which overcomes the above shortcomings.

Assume there are m exemplars, and the size of the input pattern is n. Denote the
input pattern as x, the output pattern as y, the i-th exemplar as e_i, and the j-th element
of e_i as e_i^j. The HC algorithm is reformulated as follows:

1. Calculate d_i = ½(n + x·e_i)  ∀i ∈ J = {1, 2, ..., m};

2. Find d_k = max(d_i : i ∈ J);


3. Assign y_i = e_k^i  ∀i ∈ {1, 2, ..., n}.
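The reformulated HC algorithm can be checked with a direct procedural sketch. The code below is an assumption for illustration only (numpy, bipolar ±1 patterns, and the function name are not from the thesis); the derived LQT network follows later in this section.

    import numpy as np

    def hamming_classify(x, exemplars):
        # Step 1: d_i = (n + x . e_i) / 2, the number of matching elements
        n = len(x)
        d = np.array([(n + x @ e) / 2 for e in exemplars])
        # Step 2: k = argmax d_i
        k = int(np.argmax(d))
        # Step 3: the output pattern is the winning exemplar itself
        return exemplars[k]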

The matching score d_i can be expressed as

    d_i = ½ x·e_i + ½ n    (10.22)

whose network realization is

    d_j = ⌈ x | ½ e_j ⌉^{−n/2}    (10.23)

From the definition of the rank function, we know that

    d_k = max(d_j : j ∈ J)  iff  rank(d_k : D) = 1

Define

    z_k = { 1 if rank(d_k : D) = 1, 0 otherwise }    (10.24)

then it can be shown that

    y_i = Σ_{k=1}^{m} e_k^i z_k    (10.25)

which has a network realization as

    y_i = ⌈ z_1, z_2, ..., z_m | e_1^i, e_2^i, ..., e_m^i ⌉^0    (10.26)

The network realization of z_k is (recall the example in the previous section)

    z_k = ⌈ ⌈ rank(d_k : D) ⌉^1, ⌈ rank(d_k : D) ⌉^2 | 1, −1 ⌉^0    (10.27)

Since rank(d_k : D) ≥ 1, ⌈ rank(d_k : D) ⌉^1 = 1. According to Theorem 6.3, the above
network can be simplified to

    z_k = ⌈ ⌈ rank(d_k : D) ⌉^2 | −1 ⌉^{−1}    (10.28)


The network realization of the rank function (see Section 10.1) is

    rank(d_k : D) = ⌈ c(d_k, d_1), ..., c(d_k, d_{k−1}), c(d_k, d_{k+1}), ..., c(d_k, d_m) ⌉^{−1}    (10.29)

where

    c(d_k, d_j) = ⌈ d_k, d_j | −1, 1 ⌉^1 if k > j,   ⌈ d_k, d_j | −1, 1 ⌉^0 if k < j    (10.30)

Let

    c_h = ⌈ d_k, d_j | −1, 1 ⌉^h    (10.31)

Substituting (10.23) into (10.31), we have

    c_h = ⌈ ⌈ x | e_k ⌉^{−n/2}, ⌈ x | e_j ⌉^{−n/2} | −1, 1 ⌉^h
        = ⌈ x, x | −e_k, e_j ⌉^h
        = ⌈ x | (e_j − e_k) ⌉^h    (10.32)

Substituting the above network realization into (10.29), we have

    rank(d_k : D) = ⌈ ⌈ x | e_{k,1} ⌉^1, ..., ⌈ x | e_{k,k−1} ⌉^1,
                      ⌈ x | e_{k,k+1} ⌉^0, ..., ⌈ x | e_{k,m} ⌉^0 ⌉^{−1}    (10.33)

where e_{k,j} = e_j − e_k.

Substitute (10.33) into (10.28), and then substitute the resulting network realization into
(10.26). After some simplification, we have the network realization of the HC algorithm
as follows:

    y_i = ⌈ μ_1, μ_2, ..., μ_m | −e_1^i, −e_2^i, ..., −e_m^i ⌉^{−a_i}
      with  μ_k = ⌈ μ_{k,1}, ..., μ_{k,k−1}, μ_{k,k+1}, ..., μ_{k,m} ⌉^1    (10.34)
            μ_{k,j} = ⌈ x | e_{k,j} ⌉^1 if k > j,   ⌈ x | e_{k,j} ⌉^0 if k < j


Figure 10.2: The Network Model for Implementing the HC Algorithm

where i ∈ {1, 2, ..., n}, k, j ∈ {1, 2, ..., m}, and a_i = Σ_{k=1}^{m} e_k^i. This network has a depth of
3 and a size of m²(n + 1) − m. It is shown in Figure 10.2 for the case m = n = 3. Our network overcomes the shortcomings of the Hamming network mentioned at the

beginning of this section. In particular, since the processing time of the Hamming network

is at least nine units of time, the processing speed of our network is at least three times

as fast as that of the Hamming network. Furthermore, since our network is perfectly

synchronized, input patterns can be fed to the network in a pipeline fashion. The only
disadvantage of our network is that the network size is bigger than that of the Hamming
network (its network size is m(m + n)). Whether this is negligible or tolerable
depends on the relative importance of the cost of hardware for particular applications.


10.3 Optimization

Optimization has been a major field of application for neural networks since the
publication of Hopfield's papers [35, 36]. In Chapter Nine, we have derived some network

models for solving image restoration problems, which are essentially optimization prob­

lems. Here, we show the derivation of some network models for solving other optimization

problems, and compare the performance of our networks with the Hopfield network which

is a popular neural network used for optimization.

10.3.1 Solving Simultaneous Equations

The matrix form of the simultaneous equations is expressed as

    Ax = c    (10.35)

where A is an n × m matrix, x is an m × 1 vector, and c is an n × 1 vector. The objective

here is to find the vector x which satisfies equation (10.35).

Let us define a computational energy function as follows

    E = (Ax − c)^T (Ax − c)    (10.36)

This energy function can be shown to be a convex function; therefore any local minimum
is also the global minimum [14]. Since E attains the value zero if and only if Ax = c,

the problem of solving the simultaneous equation is now transformed to the problem of

minimizing E. If equation (10.35) has a solution, by minimizing E this solution can

always be found. If equation (10.35) has no solution, by minimizing E a best fit solution

in the L2 sense is found.

To find the minimum of E, we use the method of steepest descent. Here the change

of x is along the best searching direction, which is the opposite direction of the gradient


of E. Denoting ∇E as the gradient vector, we get

    ∇E = 2 A^T A x − 2 A^T c
       = 2 A^T (Ax − c)    (10.37)

Therefore, the increment of x is

    Δx = −δ A^T (Ax − c)    (10.38)

where δ takes a small positive value. The optimum δ is a function of x. Here, for the sake
of simplicity, we choose δ as a constant.

We know that

    x(k+1) = x(k) + Δx(k)    (10.39)

This leads to

    x(k+1) = (I − δ A^T A) x(k) + δ A^T c
           = Wx(k) + t    (10.40)

The network realization of the above function is

    x(k+1) = ⌈ x(k) | W ⌉^{−t}    (10.41)

This network has a depth of 1 and a size of m².
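A small numerical sketch of the iteration (10.40)-(10.41) is given below. It is an assumption of this transcript (Python with numpy and illustrative names), with the constant step δ chosen below 2/λ_max(A^T A) so that the synchronous updates converge.

    import numpy as np

    def solve_linear_system(A, c, n_iter=2000):
        m = A.shape[1]
        # A safe constant step: delta < 2 / lambda_max(A^T A) guarantees convergence.
        delta = 1.0 / np.linalg.norm(A.T @ A, 2)
        W = np.eye(m) - delta * A.T @ A
        t = delta * A.T @ c
        x = np.zeros(m)                      # all neurons start at zero
        for _ in range(n_iter):
            x = W @ x + t                    # one synchronous network update (10.40)
        return x

    # e.g. A = np.array([[2., 1.], [1., 3.]]); c = np.array([3., 5.])
    # solve_linear_system(A, c) -> approximately [0.8, 1.4]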

Simultaneous equations, when transformed into an optimization problem, can also be solved
by the Hopfield network (for a detailed account of how the Hopfield network is used for
solving optimization problems, see Chapter Nine). The Hopfield network, or more
precisely the discrete version of the Hopfield network, can only take binary values. Assume
that l neurons are used to encode one number; then the Hopfield network has a depth of
1 and a size of l × m². That is, its network size is l times that of our network.


Another problem with the Hopfield network is its oscillatory behaviour, i.e., wandering

around a minimum. This phenomenon is due to the simultaneous updating of its neurons.

To avoid such behaviour, one way is to update its neurons sequentially, and as a matter

of fact, it is the only way to ensure the convergence of the network. However, by doing

so, the advantage of parallelism is lost. Our network uses simultaneous updating and

has no oscillatory behaviour. It always converges smoothly to the minimum if the search

step is properly chosen.

The precision of the solution obtained by using the Hopfield network is limited by the

number of neurons used. In our network, the precision is proportional to the number of

iterations.

Some results of using our networks to solve simultaneous equations and some detailed

comparison with the Hopfield network are available in [87].

10.3.2 Matrix Inversion

In image processing and some other areas, matrix inversions are often required. Matrix

inversion is a time-consuming task when performed using digital computers. Here, we

derive an LQT network for performing matrix inversion.

The objective of finding an inversion of some matrix A is to find another matrix X
such that

    AX = I    (10.42)

Equation (10.42) is a collection of n simultaneous equations (assuming A is an n × n
matrix) as follows:

    Ax_i = i_i  ∀i ∈ {1, 2, ..., n}    (10.43)

where x_i and i_i are the i-th column vectors of X and I respectively. We know from the


previous example that the network model for solving the simultaneous equations is

    x_i(k+1) = ⌈ x_i(k) | W ⌉^{−t_i}    (10.44)

where W = I − δA^T A and t_i = δA^T i_i. Therefore, we can solve the matrix inversion by
either using the above network n times, or using n such networks at the same time.
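As a sketch of this column-by-column use of the solver (an illustration assumed here, reusing the solve_linear_system function from the sketch in Section 10.3.1):

    import numpy as np

    def invert_matrix(A, n_iter=2000):
        # The i-th column x_i of X solves A x_i = i_i, as in (10.43).
        n = A.shape[0]
        cols = [solve_linear_system(A, np.eye(n)[:, i], n_iter) for i in range(n)]
        return np.column_stack(cols)         # X with A @ X approximately I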


Chapter 11

CONCLUSIONS

11.1 Summary

In this thesis, a systematic method of neural network design, which we call algebraic

derivation methodology, is proposed and developed. This work is motivated by the need

for a systematic method with which neural networks can be constructed for computational

purposes. The lack of such a method has limited the application of neural networks in

many areas, particularly the area of image processing.

The algebraic derivation methodology consists of the following stages:

1. Find the minimum set of neuron models for a given class of functions;

2. Devise symbolic representations for these neurons and their networks;

3. Establish theorems for manipulating these symbols based on the computational

properties of the neurons and their networks;

4. Establish procedures for deriving neural networks from functions;

5. Use these procedures and theorems to derive and simplify network models for spec­

ified functions.

This methodology is then applied and developed with an emphasis on deriving networks

to realize image processing techniques.

It is shown informally in Chapter Four and then formally in Chapter Seven that any

image processing technique can be realized by networks of LQT neurons. These neurons


can be easily implemented in hardware, hence the implementation of their networks

should be straightforward.

Symbolic representations of LQT neurons and their networks (LQT networks) are then

devised in Chapter Five. Such representations enable us to algebraically manipulate the

process of network design so as to derive and simplify LQT networks.

Theorems are established in Chapter Six for manipulating symbols mentioned above.

These theorems are like theorems of Boolean algebra and used for network simplification.

Procedures of deriving networks to realize both single-input and multiple-input func­

tions are given in Chapter Seven. Their usages are demonstrated through some simple

examples, which in turn are useful in deriving LQT networks in later chapters.

To demonstrate the merits of the algebraic derivation methodology, networks for

realizing some image enhancement and image restoration techniques, as well as techniques

in some other areas are derived in Chapters Eight, Nine, and Ten respectively. Most of

these networks are the first known neural network models; in cases where other
network models are known, our networks have the same or better performance in terms of

computation time.

11.2 Contributions

The main contribution of this thesis is the development of the algebraic derivation

methodology. Some important features of this methodology are as follows: 1) both

the network parameters and the network architectures are derived; 2) neurons and their

networks are represented in symbolic forms; and 3) these symbols are to be manipulated

so as to yield proper network models.

During the development of the algebraic derivation methodology, some specific con­

tributions have been made. They are listed as follows.


Formal definitions of neurons and neural networks have been given. Networks so

defined can simulate the computation process of any network composed of linear-sum

neurons, such as the back-propagation network and the Hopfield network. Therefore,

they provide a unified framework for studying the computational properties of neural

networks, and may also serve as a model of general purpose neural computers.

Symbols to represent LQT neurons and their networks have been devised. Such

symbolic representations enable us to algebraically manipulate the process of network

design so as to yield proper network models.

A type of neural networks called LQT networks have been established with the ob­

jective of realizing image processing techniques. These networks use the simplest neuron

models, and thus should encounter minimum difficulty when implemented in hardware.

The LQT networks are capable of realizing any image processing technique and some

techniques in other areas.

To simplify LQT networks, that is, to find networks with fewer layers and/or fewer

neurons, the concept of network equivalence has been introduced, and theorems of net­

work equivalence have been developed.

Procedures for deriving neural networks to realize both single-input and multiple-

input functions have been developed. They are effective in deriving LQT networks as

shown in this thesis. Moreover, they are general procedures, not just restricted to the

LQT networks.

These theorems and procedures also provide a tool to analyze the computation com­

plexity of neural networks. Consequently, not only can we evaluate different networks
which realize the same function, but we can also make concrete comparisons between

neural networks and other parallel computing machines.

Network models have been derived for implementing some well-known algorithms in

image processing and some other areas. For example, we have derived networks for


implementing the median filter and the Wiener filter, and a network for sorting. Most

of the networks we have derived are the first known such neural network models; in
cases where other network models are known, our networks have the same or better

performance in terms of computation time. For example, the computation time of our

sorting network is the same as that of the sorting network developed by Siu and Bruck

[88]; the computation time of our network which implements the Hamming Classification

algorithm is at least two times less than that of the Hamming network developed by

Lippmann [58].

Some interesting questions are raised during the development of the algebraic deriva­

tion methodology, one of which is as follows: what is the minimum number of weights a

network must have in order to realize a given function? This is an important question

with both theoretical and practical values. It is evident, (see [30] for example), that

any problem solvable by digital computers can be solved by neural networks in a fixed

amount of time irrespective of the problem size. However, the number of weights is pro­

portional to the problem size. Consequently, we may encounter such a situation that a

neural network may solve a given problem in, say, nanoseconds, but it may take millions

of years to construct such a network. Therefore, it is important to know the minimum

number of weights needed to solve a problem by neural networks. To our knowledge, this

important question has not been seriously considered in the field of neural networks.

11.3 Future Work

The algebraic derivation methodology is developed with an emphasis on deriving networks

to realize image processing techniques. It is desirable to expand this methodology to

form a unified theory of algebraic derivation. To do so, we may start with identifying

basic functions used in many areas such that a minimum set of neuron models can be


constructed, in much the same way as in Boolean algebra where the basic functions are

AND, OR, and NOT. Of course, the basic functions we are looking for are at a much

higher computation level so as to build more efficient computing machines.

As shown in Chapter Nine, some unconventional neuron models are more efficient for

realizing certain functions. One of such neuron models is the so-called sigma-pi neuron

(see [15]). Since it is not difficult to implement such neurons in hardware [64], there is

no reason to exclude them. We may also need to search for other neuron models which

are either computationally more efficient or easier to implement in hardware. That is,

we look for the basic neuron models from both perspectives of pure computation and

hardware implementation. This task is an essential part of the task mentioned in the

previous paragraph.

Another purpose of developing a theory of network design is to evaluate the efficiency

of neural networks as computing machines. A solid theory is needed for us to make

concrete comparisons between neural networks and other parallel computing machines.

Ultimately, neural networks have to be implemented in hardware for any practical

use. Since the neuron models used in our networks already have corresponding hard­

ware implementations, it is reasonable to expect easy implementations of these networks.

However, hardware problems, such as the effect of inaccuracy in setting up network pa­

rameters and thermal noise in analog circuits, have to be taken into consideration. Some

theoretical and experimental works are certainly needed.

Fault tolerance is one of the characteristics of many existing neural network paradigms,

and is considered a remedy to the inherent imperfection of analog implementations [64].

Fault tolerance is believed to be due to the distributed nature of rule representation in
neural networks and the redundancy of neurons. Our networks, however, are derived with

an emphasis on minimizing the number of neurons, thus eliminating such redundancy.

Hence, it is expected that our networks are less fault tolerant than networks having


component redundancy. This, however, may be solved by using some fault tolerance

techniques such as those already developed for digital computers. No matter what the

result will be, such an approach of explicitly building in fault tolerance ability may give

us some hints to better understanding of the fault tolerance ability of both artificial and

biological neural networks.

In this thesis, we are only concerned with deriving networks for solving problems

which have analytical solutions. There are also many interesting problems (e.g. pattern

recognition problems) which only have partial solutions. How to utilize this partial

knowledge is thus important. The underlining principle of our methodology may be

applied to help create better learning schemes with which not only network parameters

but also network architectures are adjusted in the learning process. Or we may derive

the network architecture based on the characteristics of the problem, and then let the

network adjust its parameters in the learning process.


Appendix A

THE GENERALIZED DELTA RULE

The generalized delta rule is an iterative gradient search algorithm. The objective of the

algorithm is to minimize the mean square error between the actual output of a back-

propagation network and the desired output.

Step 1 Initialize Weights and Thresholds:

Set all weights and neuron thresholds to small random values.

Step 2 Present Input and Desired Outputs:

Present a continuous-valued input vector x_0, x_1, ..., x_{n−1} and specify the desired
outputs d_0, d_1, ..., d_{m−1}. If the network is used as a classifier, then all desired outputs
are typically set to zero except for that corresponding to the class the input is
from. The input could be new on each trial, or samples from a training set could
be presented cyclically until the weights stabilize.

Step 3 Calculate Actual Outputs:

Calculate the outputs y_0, y_1, ..., y_{m−1}.

Step 4 Adapt Weights:

Use a recursive algorithm starting at the output neurons and working back to the

first hidden layer. Adjust weights by

    w_ji(h+1) = w_ji(h) + η δ_j x_i    (A.1)


In this equation w_ji(h) is the weight from the i-th hidden neuron or from the i-th
element of the input to the j-th neuron at time h, x_i is either the output of the i-th
neuron or the i-th element of the input, η is a gain term, and δ_j is an error term for
the j-th neuron. If the j-th neuron is an output neuron, then

    δ_j = y_j (1 − y_j)(d_j − y_j)    (A.2)

where d_j is the desired output of the j-th neuron and y_j is the actual output.

If the j-th neuron is a hidden neuron, then

    δ_j = x_j (1 − x_j) Σ_k δ_k w_kj    (A.3)

where k ranges over all neurons in the layers above neuron j. Thresholds are adapted in
a similar manner by assuming they are connection weights on links from auxiliary
constant-valued inputs. Convergence is sometimes faster if a momentum term is
added and weight changes are smoothed by

    w_ji(h+1) = w_ji(h) + η δ_j x_i + α (w_ji(h) − w_ji(h−1))    (A.4)

where 0 < α < 1.

Step 5 Repeat by Going to Step 2.
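A compact sketch of one training step for a single hidden layer is given below; it is an assumption of this transcript (numpy, logistic activations, thresholds omitted for brevity), following (A.1)-(A.3) with η as the gain term.

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def train_step(x, d, W1, W2, eta=0.5):
        # Step 3: forward pass
        h = sigmoid(W1 @ x)                  # hidden outputs x_i
        y = sigmoid(W2 @ h)                  # actual outputs y_j
        # Step 4: error terms (A.2) and (A.3)
        delta_out = y * (1 - y) * (d - y)
        delta_hid = h * (1 - h) * (W2.T @ delta_out)
        # Weight updates (A.1)
        W2 += eta * np.outer(delta_out, h)
        W1 += eta * np.outer(delta_hid, x)
        return W1, W2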


Appendix B

HOPFIELD NETWORK ALGORITHM

The Hopfield network algorithm is described as follows.

Step 1 Assign Connection Weights

    w_ij = { Σ_{s=1}^{m} x_i^s x_j^s   if i ≠ j
           { 0                         if i = j        ∀i, j ∈ {1, 2, ..., n}    (B.1)

where m is the total number of exemplars and n is the total number of neurons in
the network. In this formula w_ij is the connection weight from the j-th neuron to
the i-th neuron, and x_i^s ∈ {0, 1} is the i-th element of the exemplar for class s.

Step 2 Initialize with the Input Pattern

    y_i(0) = x_i   ∀i ∈ {1, 2, ..., n}    (B.2)

In this formula y_i(k) is the output of the i-th neuron at time k and x_i ∈ {0, 1} is the
i-th element of the input pattern.

Step 3 Iterate Until Convergence

    v_i(k+1) = Σ_{j=1}^{n} w_ij y_j(k)   ∀i ∈ {1, 2, ..., n}

    y_i(k+1) = f_T(v_i(k+1))   ∀i ∈ {1, 2, ..., n}

where the function f_T is the activation function of the threshold-logic neuron. The

process is repeated until the network output remains unchanged with further itera­

tions. The network output then represents the exemplar pattern that best matches

the input pattern.
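The algorithm can also be sketched directly in code. The version below is an assumption for illustration (numpy, bipolar ±1 states rather than the 0/1 elements stated above, and sequential updates as discussed in Section 9.3); it follows the structure of Steps 1-3.

    import numpy as np

    def hopfield_recall(x, exemplars, n_sweeps=20):
        n = len(x)
        W = sum(np.outer(e, e) for e in exemplars).astype(float)
        np.fill_diagonal(W, 0.0)             # w_ij = sum_s x_i^s x_j^s, w_ii = 0  (B.1)
        y = x.copy().astype(float)           # y_i(0) = x_i                        (B.2)
        for _ in range(n_sweeps):
            for i in range(n):               # sequential updates for convergence
                y[i] = 1.0 if W[i] @ y >= 0 else -1.0
        return y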


Bibliography

[1] Akl, S.G., Parallel Sorting Algorithms, Vol. 12 in Werner Rheinboldt (ed.), Notes and Reports in Computer Science and Applied Mathematics, Orlando: Academic Press, 1985.

[2] Albert, A., Regression and the Moore-Penrose Pseudoinverse, Academic Press: New York, 1972.

[3] Aleksander, I. (ed.), Neural Computing Architectures: The Design of Brain-like Machines, The MIT Press: Cambridge, Massachusetts, 1989.

[4] Aleksander, I., Thomas, W.V., and Bowden, P.A., "WISARD, A Radical Step Forward in Image Recognition," Sensor Review, Vol. 4, No. 3, pp. 120-124, 1984.

[5] Anderson, J.A., "A Simple Neural Network Generating An Interactive Memory," Mathematical Biosciences, Vol. 14, pp. 197-220, 1972.

[6] Andrews, H.C. and B.R. Hunt, Digital Image Restoration, Englewood Cliffs, NJ: Prentice-Hall, 1977.

[7] Arbib, M.A., Brains, Machines, and Mathematics, 1987.

[8] Arce, G.R. and M.P. McLoughlin, "Theoretical Analysis of the Max/Median Filter," IEEE Trans. on Acoust., Speech, Signal Processing, Vol. ASSP-34, No. 1, pp. 60-69, Jan. 1987.

[9] Barto, A.G., R.S. Sutton, and C.W. Anderson, "Neuronlike Adaptive Elements That Solve Difficult Learning Control Problems," IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-13, No. 5, Sept./Oct., 1983.

[10] Baum, E.B., J. Moody, and F. Wilczek, "Internal Representations for Associative Memory," NSF-ITP-86-138, Institute for Theoretical Physics, University of California, Santa Barbara, California, 1986.

[11] Bovik, A.C., T.S. Huang, and D.C. Munson, Jr., "A generalization of median filtering using linear combinations of order statistics," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-31, pp. 1342-1350, Dec. 1983.

[12] Carpenter, G.A. and S. Grossberg, "The ART of Adaptive Pattern Recognition by A Self-Organizing Neural Network," IEEE Computer, March 1988, pp. 77-88.


[13] Chandra, A.K., L. Stockmeyer, and U. Vishkin, "Constant Depth Reducibility," SIAM J. Comput., Vol. 13, pp. 423-439, 1984.

[14] Cooper, L. and D. Steinberg, Introduction to Methods of Optimization, Saunders: Philadelphia, 1970.

[15] Durbin, R. and D.E. Rumelhart, "Product Units: a Computationally Powerful and Biologically Plausible Extension to Back-propagation Networks," Neural Computation, Vol. 1, 1989, pp. 133-142.

[16] Eberhart, Russell C., "Standardization of Neural Network Terminology," Neural Networks, Vol. 1, No. 2, June, 1990, pp. 244-245.

[17] Elman, J.L., "Learning the Hidden Structure of Speech," Institute for Cognitive Science, University of California at San Diego, ICS Report 8701, Feb., 1987.

[18] Feldman, J.A. and D.H. Ballard, "Connectionist Models and Their Properties," Cognitive Science, Vol. 6, pp. 205-254, 1982.

[19] Fox, G.C. and P.C. Messina, "Advanced Computer Architectures," IEEE Computers Magazine, pp. 67-74, 1989.

[20] Frei, W., "Image Enhancement by Histogram Hyperbolization," Computer Graphics and Image Processing, Vol. 6, pp. 286-294, 1977.

[21] Fukushima, K. and S. Miyake, "Neocognitron: A New Algorithm For Pattern Recognition Tolerant of Deformations And Shifts In Position," Pattern Recognition, Vol. 15, No. 6, pp. 455-469, 1982.

[22] Furst, M., J.B. Saxe and M. Sipser, "Parity, Circuits and the Polynomial-time Hierarchy," Proc. 22nd IEEE Symposium on Foundations of Computer Science, 1981, pp. 260-270.

[23] Gallager, R.G., Information Theory and Reliable Communication, John Wiley & Sons, New York, 1968.

[24] Gevins, A.S. and N.H. Morgan, "Applications of Neural-Network (NN) Signal Processing in Brain Research," IEEE Trans. on Acoust., Speech, Signal Processing, Vol. 36, No. 7, July, 1988.

[25] Goles, Eric and Servet Martinez, Neural and Automata Networks: Dynamical Behaviour and Applications, Kluwer Academic Publishers: Boston, 1990.

[26] Gonzalez, R.C. and P. Wintz, Digital Image Processing, Reading, MA: Addison-Wesley, 1977.


[27] Grossberg, S., The Adaptive Brain I: Cognition, Learning, Reinforcement, and Rhythm, and The Adaptive Brain II: Vision, Speech, Language, and Motor Control, Elsevier/North-Holland, Amsterdam, 1986.

[28] Hagiwara, M., "Accelerated Back Propagation Using Unlearning Based on Hebb Rule," in Proc. of IJCNN'90, pp. I-617-620, Jan., 1990.

[29] Hebb, D.O., The Organization of Behavior, John Wiley & Sons: New York, 1949.

[30] Hecht-Nielsen, R., "Theory of the Back-propagation Neural Networks," in Proc. of the Intl. Joint Conf. on Neural Networks, Washington, D.C., New York: IEEE Press, pp. I-593-606, 1989.

[31] Hecht-Nielsen, R., "Counter-Propagation Networks," Proc. of IEEE First International Conference on Neural Networks, New York, 1987.

[32] Helstrom, C.W., "Image Restoration by the Method of Least Squares," J. Opt. Soc. Amer., Vol. 57, Mar. 1967, pp. 297-303.

[33] Hopfield, J.J., "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," Proc. Natl. Acad. Sci. USA, Vol. 79, 2554-2558, April 1982.

[34] Hopfield, J.J., "Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons," Proc. Natl. Acad. Sci. USA, Vol. 81, 3088-3092, May 1984.

[35] Hopfield, J.J., and D.W. Tank, "Computing with Neural Circuits: A Model," Science, Vol. 232, 625-633, August 1986.

[36] Hopfield, J.J., and D.W. Tank, "Simple 'Neural' Optimization Networks: An A/D Converter, Signal Decision Circuit, and a Linear Programming Circuit," IEEE Trans. on Circuits and Systems, Vol. CAS-33, No. 5, pp. 533-541, May 1986.

[37] Hopfield, J.J. and D.W. Tank, "Neural Computation of Decisions in Optimization Problems," Biol. Cybern., Vol. 52, pp. 141-152, 1985.

[38] Huang, T.S. (Ed.), Two-Dimensional Digital Signal Processing II: Transforms and Median Filters, New York: Springer-Verlag, 1981.

[39] Huang, T.S., W.F. Schreiber, and O.J. Tretiak, "Image Processing," Proc. IEEE, Vol. 59, pp. 1586-1609, 1971.

[40] Huang, T.S., D.A. Barker and S.P. Berger, "Iterative Image Restoration," Applied Optics, Vol. 14, No. 5, pp. 1165-1168, May, 1975.


[41] Hummel, R .A . , "Image Enhancement By Histogram Transformation," Comput. Graph. Image Pore. Vol. 6, pp. 184-195, 1977.

[42] Hunt, B.R. , "Digital Image Processing," I E E E Proc , Vol. 63, pp. 693-708, Apri l , 1975.

[43] Irie, B. and S. Miyake, "Capabilities of Three-layered Perceptron," in Proc. of IJCNN'89, pp. 1-641-648, June, 1989.

[44] Izui, Y . and A . Pentland, "Speed Up Back Propagation," in Proc. of IJCNN'90, pp. 1-639-642, Jan. 1990.

[45] Jain, A . K . , Fundamentals of Digital Image Processing, Prentice-Hall: New Jersey, 1988.

[46] Jayant, N.S., "Average and median-based smoothing techniques for improving speech quality in the presence of transmission errors," I E E E Trans. Commun., Vol. Com-24, pp. 1043-1045, Sept. 1976.

[47] Judd, J.S., "Learning in Networks Is Hard," Proc. of the First Intl. Conf. on Neural Networks, pp. 685-692, IEEE, San Diego, California, June, 1987.

[48] Judd, J.S., Neural Network Design and the Complexity of Learning, The MIT Press, Cambridge, Massachussetts, 1990.

[49] Kandel, E.R., and J . H . Schwartz, Principles of Neural Science, Elsevier, New York, 1985.

[50] Klassen, M.S. and Y . H . Pao, "Characteristics of the Functional-link Net: A Higher Order Delta Rule Net," Proc. of IEEE Second Annual International Conference on Neural Networks, June San Diago, C A , 1988.

[51] Knuth, D.E . , The Art of Computer Programming: Sorting and Searching, Vol. 3, Reading, M A : Addison-Wesley, 1973.

[52] Kohonen, T., "Self-organized Formation of Topologically Correct Feature Maps," Biological Cybernetics, Vol. 43, pp. 59-69, 1982.

[53] Kohonen, T., "Correlation Matrix Memories," IEEE Trans, on Computers, C-21, pp. 353-359, 1972.

[54] Kosko, B., "Adaptive Bidirectional Associative Memories," Applied Optics, Vol. 26, pp. 4947-4960, 1987.

[55] Kreins, E.R. and L . J . Allison, "Color Enhancement of Nimbus High Resolution Infrared Radiometer Data," Appl . Opt. Vol . 9, pp. 681, March 1970.


[56] Lee, Y . H . and A.T. Fam, "An edge gradient enhancing adaptive order statistic filter," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-35, No. 5, pp. 680-695, May 1987.

[57] Lewis, P.M. II, and C.L. Coates, Threshold Logic, John Wiley & Sons: New York, 1967.

[58] Lippmann, R.P., "An Introduction to Computing with Neural nets," IEEE ASSP Magazine pp. 4-22, April 1987.

[59] Lippmann, R.P., B. Gold, and M.L. Malpass, "A Comparison of Hamming and Hopfield Neural Nets for Pattern Classification," MIT Lincoln Laboratory Technical Report, TR-769, 1987.

[60] Lippmann, R.P., "Pattern Classification Using Neural Networks," IEEE Commu­nications Magazine, Vol. 27, No. 11, pp. 47-64, Nov., 1989.

[61] Lorin, H. , Sorting and Sort System, Reading, MA: Addison-Wesley, 1975.

[62] Matthews, M.B., E.S. Moschytz, "Neural Network Nonlinear Adaptive Filtering Using the Exteneded Kalman Filter Algorithm," in Proc. of the Intl. Neural Net­work Conf., July 9-13, 1990, Paris, pp. 115-118.

[63] McCulloch, W.S. and W. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity," Bulletin of Mathematical Biophysics, 5, pp. 115-133, 1943.

[64] Mead, C.A., Analog VLSI and Neural Systems, Addison-Wesley Publishing Com­pany, 1989.

[65] Minsky, M . and S. Papert, Perceptron: An Introduction to Computational Geome­try, Expanded Edition, The MIT Press, 1988.

[66] Mota-oka, T. and M . Kitsuregawa, The Fifth Generation Computer: The Japanese Challenge, John Wiley & Sons, 1985.

[67] Nakagawa, Y. and A. Rosenfeld, "A note on the use of local min and max operations in digital picture processing," IEEE Trans. Sys., Man, and Cybern. Vol. SMC-8, No. 8, pp. 632-635, Aug. 1978.

[68] Narendra, P.M. "A separable median filter for image noise smoothing," IEEE Trans. Pattern Anal. Mech. Intell., Vol. PAMI-3, No. 1, pp. 20-29, Jan. 1981.

[69] Owens, A.J . and D.L. Filkin, "Efficient Training of the Back-propagation Network by Solving a system of Stiff Ordinary Differential Equations," in Proc. of IJCNN'89, pp. 77-381-386, June, 1989.


Pao, Y . H . , Adaptive Pattern Recognition and Neural Networks, Addison-Wesley Publishing Company, 1989.

Parten, C.R., R .M. Pap, and C. Thomas, "Neurocontrol Applied to Telerobotics for the Space Shuttle," in Proc. of the Intl. Neural Network Conf., July 9-13, 1990, Paris,pp. 229-236.

Peeling, S.M., R.K. Moor, and M.J. Tomlinson, "The Multi-Layer Perceptron as a Tool for Speech Pattern Processing Research," in Proc. IoA Autumn Conf. on Speech and Hearing, 1986.

Poggio, T., V. Torre, and C. Koch, "Computational vision and regularization the­ory," Nature, Vol. 317, pp. 314-319, Sept. 1985.

Posch, T.E., "Models of the Generation and Processing of Signals by Nerve Cells: A Categorically Indexed Abridged Bibliography," USCEE Report 290, August 1968.

Powell, P.G. and B.E. Bayer, "A Method for the Digital Enhancement of Unsharp Grainy Photographic Images," Proc. Int. Conf. Electronic Image Processing, IEEE, UK, pp. 197-183, July 1982.

Pratt, W.K., "Median Filtering," in Semi-annual Report, Image Processing Insti­tute, Univ. of Southern California, Sept. 1975, pp. 116-123.

Rabiner, L.R., M.R. Sambur, and C.E. Schmidt, "Applications of a nonlinear smoothing algorithm to speech processing," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-23, No. 6, pp. 55-557, Dec. 1975.

Rosenblatt, R., Principles of Neurodynamics, New York, Spartan Books, 1959.

Rosenblatt, R., "The Perceptron: A Probabilistic Model for Information Storage and Organization In The Brain," Psychological Review 65, pp. 386-408, 1957.

Rumelhart, D.E., G.E. Hinton, and R.J. Williams, "Learning Internal Representa­tions by Error Propagation," in D.E. Rumelhart &; J.L. McClelland (Eds.), Paral­lel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1: Fundations, MIT Press, 1986.

Sawai, H. , A. Waibel, P. Haffner, M . Miyatake, and K. Shikano, "Paral­lelism, Hierarchy, Scaling in Time-delay Neural Networks for Spotting Japanese Phonemes/CV-syllables," in Proc. of IJCNN'89, pp. 77-81-88, June, 1989.

Sejnowski, T., and C R . Rosenberg, "NETtalk: A Parallel Network That Learns to Read Aloud," Johns Hopkins Univ. Technical Report JHU/EECS-86/01, 1986.


[83] Shi, Pingnan and Rabab K. Ward, "Using the Hopfield neural network to enhance binary noisy images," Proceedings of the Canadian Conference on Electrical and Computer Engineering, Vancouver, B.C. Canada, Nov. 3-4, 1988, pp. 760-763.

[84] Shi, Pingnan and Rabab K. Ward, "Using the perceptron to enhance binary noisy images," presented at the SCS Western Multiconference, San Diego, California, January 4-6, 1989.

[85] Shi, Pingnan and Rabab K. Ward, "A neural network implementation of the median filter," Proceedings of 1989 IEEE Pacific Rim Conference on Communications, Victoria, B.C., Canada, June 1-2, 1989, pp. 513-516.

[86] Shi, Pingnan and Rabab K. Ward, "A neural network structure for sorting non-negative integers in fixed time," Proceedings of 1989 Canadian Conference on Elec­trical and Computer Engineering, Montreal, Canada, Sept. 17-20, 1989, pp. 420-423.

[87] Shi, Pingnan and Rabab K. Ward, "The case for abandoning the biological resem­blance restriction: An example of neural network solution of simultaneous equa­tions," Proceedings of 1990 International Joint Conference on Neural Networks, San. Diego, California, June 17-21, 1990, pp. I l l 875-882.

[88] Siu, Kai-Yeung, and Jehoshua Bruck, "Neural Computation of Arithmetic Func­tions," IEEE Proceedings, Vol. 78, No. 10, Oct. 1990, pp. 1669-1675.

[89] Takeda M . and J.W. Goodman, "Neural Networks for Computation: Number Representations and Programming Complexity," Applied Optics, Vol. 25, No. 18, pp. 3033-3046, 15 Sept. 1986.

[90] Tukey, J.W., Exploratory Data Analysis, Addison-Wesley: Reading, Massachusetts, 1971.

[91] Van Trees, H.L., Detection, Estimation, and Modulation Theory, Vol. I, John Wiley & Sons: New York, 1967.

[92] Waibel, A., T. Hanazawa, G. Hinton, K. Shikano, and K. Lang, "Phoneme Recognition Using Time-delay Neural Networks," ATR Internal Report TR-1-0006, Oct. 30, 1987.

[93] Walsh, G.R., Methods of Optimization, John Wiley & Sons: New York, 1975.

[94] Wasserman, P.D., Neural Computing: Theory and Practice, Van Nostrand Reinhold: New York, 1989.


[95] Widrow, B. and M.E. Hoff, "Adaptive Switching Circuits", 1960 IRE WESCON Conv. Record, Part 4, 96-104, August 1960.

[96] Widrow, B. and R. Winter, "Neural Nets for Adaptive Filtering and Adaptive Pattern Recognition," IEEE Computer Magazine, pp. 25-39, March, 1988.

[97] Widrow, B. and S.D. Stearns, Adaptive Signal Processing, Prentice-Hall: New Jersey, 1985.

[98] Woods, R.E. and R.C. Gonzalez, "Real Time Digital Image Enhancement," Proc. IEEE 69, pp. 643-644, 1981.

[99] Zhou, Y.T., R. Chellappa, and et al., "Image Restoration Using a Neural Network," IEEE Trans, on Acoust., Speech, and Sig. Proc, pp. 1141-1151, Vol. 36, No. 7,1988.