TWO-DIMENSIONAL IMAGE CONVOLUTION
BY ANALOG COMPUTATION
by
DONALD ALLEN SYMES, B.S.T.
A THESIS
IN
COMPUTER SCIENCE
Submitted to the Graduate Faculty of Texas Tech University in
Partial Fulfillment of the Requirements for
the Degree of
MASTER OF SCIENCE
Approved
May, 1997
ACKNOWLEDGEMENTS
I wish to thank Dr. Li, Dr. Oldham, and Dr. Wunsch for their patience,
understanding and support. I also wish to thank the Office of Naval Research for
providing the funding for the research, of which the subject of this thesis is part.
Mostly, I wish to thank my beloved wife, Cathy, without whom my life would have
neither rhyme nor reason.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
I. INTRODUCTION
    1.1 Origins
    1.2 Current Work
        1.2.1 Neuromorphic systems
            1.2.1.1 Parallel Analog Computation
            1.2.1.2 Drawbacks to Analog
        1.2.2 Algorithmic systems
    1.3 This Work
II. MATHEMATICAL BASIS
    2.1 Convolution
        2.1.1 The Kernel
    2.2 Correlation
    2.3 Computational Expense
    2.4 Problem Statement
    2.5 Design
        2.5.1 Voltage Reference
        2.5.2 Multiplier Array
        2.5.3 Summing Amplifier
        2.5.4 Analog-to-Digital Converter
        2.5.5 ISA Interface
            2.5.5.1 Address Decoder
            2.5.5.2 Index Register/Counter
III. PROCESS
    3.1 Kernel
    3.2 Calibration
        3.2.1 Tables
        3.2.2 Adjustment
    3.3 Image
        3.3.1 Loading
        3.3.2 Border Fill
        3.3.3 Convolution
    3.4 Addressing constraints and penalties
    3.5 Autocorrelation
        3.5.1 Awkward Programming
        3.5.2 Bus Traffic
        3.5.3 Dynamic Range
IV. RESULTS
    4.1 Analog Performance
        4.1.1 Noise
        4.1.2 Linearity, Accuracy and Precision
    4.2 Multiplier Speed
    4.3 Convolution Result
    4.4 Computational Performance
    4.5 Comparison with a Digital Parallel System
V. CONCLUSION
    5.1 Future Work
REFERENCES
APPENDIX
A. SCHEMATICS
B. SOURCE CODE
LIST OF TABLES
3.1 Floating-Point Edge-Enhancing Kernel
3.2 Normalized Integer Edge-Enhancing Kernel
LIST OF FIGURES
2.1 Multiplier Schematic.
3.1 Graphic Representation of Convolution.
4.1 Graph of Linearity.
4.2 Test Image.
4.3 Software Convolution.
4.4 Convolution by Accelerator.
CHAPTER I
INTRODUCTION
Vision computing has been a subject of tremendous research interest since the
days of the original perceptron in the 1940's [McClelland 88]. The last decade or
so has shown a great deal of progress in our understanding of, at least, the early
stages of vision and of the underlying structures and processes that perform these
early processes in biological systems.
Despite the ever-increasing speed and power of digital computing systems, the
consensus appears to be that any practical vision system will, necessarily, be a
highly parallel, probably analog, computing system.
1.1 Origins
The earliest successful vision computer was the perceptron built by McCulloch
and Pitts in the nineteen forties. This system is more commonly viewed as one of
the early successes in machine intelligence and in neurocomputing. It was a large
array of "trainable" analog computing elements attached to a crude vision sensor.
The system was trained to distinguish between male and female faces and was
quite successful. The computation was performed by a number of simple parallel
analog computational elements arranged in highly interconnected layers.
Electronics technology was at the vacuum tube stage at that time, which
proved to be the main limiting factor in building computers. Digital computers
were a rare and expensive luxury at that time also, and were far too limited in
power and storage density to offer a practical alternative.
Since that time, the vacuum tube has been replaced by the transistor, and
VLSI (Very Large Scale Integration) techniques have allowed the reliable
placement of millions of transistors in an area the size of a thumbnail. Research
into the fundamentals of visual neuroanatomy has also dramatically improved our
understanding of early visual structures and the processes they perform. The
rather crude perceptron has been replaced by circuits that more closely mimic the
response of the real biological retina.
1.2 Current Work
Two approaches dominate current research. The first is what Carver Mead
calls the "neuromorphic" approach: circuits that integrate acquisition and
processing on a single VLSI device. The second, more algorithmic approach relies
on more readily available acquisition technology (typically CCD image sensors)
and processes that sensor's rasterized, serial output.
1.2.1 Neuromorphic systems
The neuromorphic vision chips, like the natural retina, are focal-plane
processors that integrate sensing and considerable processing in a highly
compacted and interconnected array. They are, by definition, massively parallel
analog processors. They exploit the nonlinear properties of transistors to perform,
in a single step, calculations that would require several steps of a digital
floating-point unit [Koch, Mathur 96]. Further, because each node on the device is
an independent sense-and-compute unit, the calculations proceed in parallel,
meaning that a complete result is available after only one step [Churchland 92].
Most of the work on silicon retinae relies on simple resistive grids to perform
the locally-weighted averaging that produces the desired gain adjustments in
response to global intensity, and also the antagonistic center-surround response
that enhances edges [Koch, Mathur 96]. The drawback to the simple resistive grid
is its linearity. The elements of the grid can be selected to perform simple linear
computations, and once chosen, are fixed [Mead 90].
Integrating concepts from the Cellular Nonlinear Network (CNN), or more
accurately, adding photosensing to a CNN, allows a much wider range of local
mathematical operations [Roska, Chua 93]. The main drawback to the CNN at
this point is its relatively large areal requirement compared to the simpler cells —
from 4 to 60 times larger according to comparisons by Koch and Mathur
[Koch, Mathur 96].
1.2.1.1 Parallel Analog Computation
Analog processing elements are far less flexible than general-purpose digital
processors, but the vision algorithms that model early visual processing are also
fairly fixed, so building single-purpose hardware for vision systems is reasonable.
The chief benefit of going to analog processing is the extreme computational
density it allows compared to digital computation.
An analog multiplier circuit requires as few as four transistors; an eight-bit
digital multiplier uses over two hundred. For roughly the same device count, one
can perform fifty multiplications in analog, in parallel, with the same precision and
accuracy as the digital part. This huge increase in computational density comes
from the fact that analog computation relies for its operation on the fundamental
laws of physics, rather than on some artificial algebra: addition and subtraction,
for example, are direct results of the law of conservation of charge [Elliott 92].
1.2.1.2 Drawbacks to Analog
The advantages of analog computation do not come free, however. Noise,
thermal drift, component matching, resistive and reactive loading and coupling are
also direct results of those same fundamental laws, and all adversely affect the
operation of analog computational circuits. The same VLSI techniques that allow
the placement of over one million devices on a die for digital applications allow
very high density analog circuits as well. Placing an entire complex analog
operation on a single die is also one of the best ways of minimizing the effects of
all of the drawbacks to analog computing.
Placing the entire analog computational circuit on a single die limits the effects
of thermal drift because all of the components are drifting in the same direction
and are at very nearly the same temperature. Resistive and reactive loading and
coupling are also reduced by the very short wiring runs between components.
Component matching is usually quite good, variations being more of a
wafer/batch/process issue than affecting individual devices on a single die.
Circuits for trimming and tuning are also more readily implemented directly on
the die and programmed during post-fab testing than by providing access for
possibly thousands of external trim adjustments. For an example, see [Mead 90].
Noise is a complex issue, but part of its solution lies in making the circuit as
compact as possible, and you cannot get a circuit much more compact than a
single VLSI device.
1.2.2 Algorithmic systems
The objective of neuromorphic systems is to provide a general purpose
artificial vision system that is as flexible and adaptable as natural systems. They
are, by definition, custom VLSI devices, often large, difficult and expensive to
produce. As applications for these devices appear, and they move closer to
commercial production, most of these drawbacks will recede. In the meantime
there are a number of applications with more constrained requirements, both in
terms of the visual task required, and the funds available for implementation.
As an example, consider a system to monitor motorist compliance with
regulations governing access to commuter lanes during high-traffic hours. As a
civic project, funds will be sharply constrained. On the other hand, the visual
environment is nicely constrained: moving vehicles in a well-defined lane with
nearly no background clutter. Careful selection of camera, angle and
region-of-interest greatly simplifies the remaining task of locating edges and
tracking moving objects against an uncluttered background. An algorithmic
system (a general purpose microcomputer with some additional hardware and
software) is adequate.
1.3 This Work
The top-level research in this group is the hardware implementation of a
"Gabor" filter, a motion-sensitive and directionally selective filter designed to
extract motion information such as direction and velocity from a sequence of
images. Work is proceeding in stages, this stage being the implementation of a
proof-of-concept model in support of the next stage: a VLSI implementation of
a parallel analog multiplier designed to accelerate image convolution.
CHAPTER II
MATHEMATICAL BASIS
Convolution is a simple and well understood, but computationally expensive
algorithm that can, with a well-chosen kernel function, enhance edges and make
motion estimation much easier. It is the most common process in image
processing and vision computing, whether explicitly defined as a subprocess, or
noted as an effect (antagonistic center-surround response) [Koch, Mathur 96].
2.1 Convolution
Convolution is the process of multiplying two functions against one another at
each point in both functions. In image processing this is a two-dimensional process
described by the following integral:
$$ g(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x - \xi,\, y - \eta)\, h(\xi, \eta)\, d\xi\, d\eta, \tag{2.1} $$
where $h(x, y)$ is the system characteristic equation (kernel) and $f(x, y)$ is the
input.
2.1.1 The Kernel
The kernel used in convolution determines its function. Kernels may be
selected to perform high-pass (sharpening), low-pass (blurring) or bandpass
functions, or other visual effects such as "embossing." For edge detection, the
kernel of choice is a Laplacian-of-Gaussian (LoG), which is created using Equation
2.2:
    k(i, j) = [expression illegible in the source scan]    (2.2)
where $\sigma$ determines the spread of the central positive lobe of the function.
Adjusting $\sigma$ adjusts the sensitivity and selectivity of the edge-detection response.
Vision sensors, however, are not continuous; the image is sampled in two
dimensions. If $\Delta x$ and $\Delta y$ are considered the sampling intervals for the image,
then the discrete version of the image can be sampled from the continuous image
by the following relation:
$$ f(n_1, n_2) = f_c(n_1 \Delta x,\, n_2 \Delta y), \tag{2.3} $$
where $n_1 \in [0, \ldots, N_1]$ and $n_2 \in [0, \ldots, N_2]$. The integral can then be recast as a
summation:
$$ g(n_1, n_2) = \sum_{k=0}^{N_1-1} \sum_{l=0}^{N_2-1} f(n_1 - k,\, n_2 - l)\, h(k, l), \tag{2.4} $$
where $h(k, l)$ is a 2-dimensional kernel and $f(n_1, n_2)$ is the sampled image.
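The double sum of Equation 2.4 can be evaluated directly. The following is a minimal sketch, not the thesis software: the function name `convolve2d` is hypothetical, the loops run only over the kernel's own extent (equivalent to treating the kernel as zero elsewhere), and image samples with negative or out-of-range indices are assumed to be zero.

```python
def convolve2d(image, kernel):
    """Direct evaluation of the double sum in Equation 2.4.

    Assumption: samples that fall outside the image are treated as zero.
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    result = [[0] * cols for _ in range(rows)]
    for n1 in range(rows):
        for n2 in range(cols):
            total = 0
            for k in range(k_rows):
                for l in range(k_cols):
                    i, j = n1 - k, n2 - l          # f(n1 - k, n2 - l)
                    if 0 <= i < rows and 0 <= j < cols:
                        total += image[i][j] * kernel[k][l]
            result[n1][n2] = total
    return result


if __name__ == "__main__":
    print(convolve2d([[1, 2], [3, 4]], [[1, 0], [0, 1]]))
```

Each output pixel costs one multiplication per kernel element, which is the source of the expense discussed in Section 2.3.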
2.2 Correlation
Correlation, a measure of the similarity of two images, is a very similar process
with a nearly identical integral:
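$$ g(x, y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\xi - x,\, \eta - y)\, h(\xi, \eta)\, d\xi\, d\eta, \tag{2.5} $$
which results in a very similar double sum:
$$ g(n_1, n_2) = \sum_{k=0}^{N_1-1} \sum_{l=0}^{N_2-1} f(k - n_1,\, l - n_2)\, h(k, l). \tag{2.6} $$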
Computationally, correlation amounts to convolution with a kernel that is the
same size as the image. Correlation is of interest in image and character
recognition, neural computing, and signal processing.
2.3 Computational Expense
The computational expense of both convolution and correlation is high. Both
are inner product matrix equations which require, for an $n^2$-pixel kernel and an
$N^2$-pixel image, a total of $n^2 N^2$ multiplications. To be considered real-time, a
system should accommodate an image rate of at least 15 frames per second. A
512 x 512 pixel monochrome system with a resolution of eight bits per pixel, at
15 frames per second, must handle a dataflow rate of $512^2 \times 15 = 3{,}932{,}160$ bytes
per second (3.75 Mbytes per second). A color system could easily triple that
requirement. This is the bandwidth consumed just getting the image into the
system. Processing the image once it is available consumes considerably more
bandwidth. To perform convolution on that 512 x 512 image using a 7 x 7 kernel
will require $512^2 \times 7^2$ products per frame at 15 frames per second, for a total of
$512^2 \times 7^2 \times 15 = 192{,}675{,}840$ products per second. Correlation, where $n = N$,
requires $N^4$ multiplications. For our example, a 512 x 512 image requires
68,719,476,736 multiplications per frame, or 1,030,792,151,040 per second (at 15
frames per second). Add to these figures the bandwidth consumed by the
necessary code fetches, I/O, and other data manipulations, and the minimum
required bandwidth goes far beyond the foreseeable capacity of any general-
purpose computing machine that can reasonably be made portable.
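The figures quoted above follow from the $n^2 N^2$ product count. A short sketch that reproduces the arithmetic (the function and variable names are illustrative only):

```python
def products_per_second(image_side, kernel_side, frames_per_second):
    """n^2 * N^2 multiplications per frame, times the frame rate."""
    return image_side ** 2 * kernel_side ** 2 * frames_per_second


N = 512          # image is N x N pixels
FPS = 15         # minimum real-time frame rate used in the text

input_bytes_per_second = N ** 2 * FPS              # one byte per 8-bit pixel
conv_products_per_second = products_per_second(N, 7, FPS)
corr_products_per_frame = N ** 4                   # kernel as large as the image
corr_products_per_second = products_per_second(N, N, FPS)

if __name__ == "__main__":
    print(input_bytes_per_second)
    print(conv_products_per_second)
    print(corr_products_per_frame)
    print(corr_products_per_second)
```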
2.4 Problem Statement
A hardware convolution accelerator supporting a 7 x 7 kernel was
implemented using standard, off-the-shelf components. The accelerator takes
advantage of the parallel-friendly nature of the convolution double sum (Equation
2.4) and performs the products-and-summation calculations in analog. The
accelerator resides as a peripheral on the ISA bus. Speed and accuracy of the
accelerator are compared to the host computer's CPU in Section 4.1.
2.5 Design
The accelerator design consists of a few simple blocks: a voltage reference, the
multiplier array and a summing amplifier, an analog-to-digital converter and its
calibration circuit and an ISA bus interface.
2.5.1 Voltage Reference
The system reference voltage is nominally 1.2V. Because both the DACs and
ADC use the same reference in a closed, ratiometric fashion, absolute accuracy
and precision of the reference are not critical.
2.5.2 Multiplier Array
The multiplier array consists of 49 instances of a DAC-based multiplier. Each
multiplier requires a pair of DACs, one for unsigned image data (DAC A) and one
for signed kernel data (DAC B). For this implementation, an Analog Devices
AD7528 dual 8-bit multiplying DAC was chosen for its low cost, good inherent
precision, and simple interface. The array is divided into six individually buffered
rows of 8 DAC devices. These rows are paired into a high byte and a low byte to
provide a 16-bit interface. The 49th DAC device resides in the low byte of the
interface, and does not have a redrive buffer. A schematic for a single multiplier
is shown in Figure 2.1. This circuit is an adaptation of a pair of multiplier circuits
found in the AD7528 data sheet [Analog Devices 92].
DAC A multiplies the reference voltage by the fraction $-pixel/256$, producing a
negative output voltage varying between 0 and -1.2V. DAC B uses this voltage as
its reference input, which it multiplies by the fraction $-kernel/128$, producing an
output voltage varying between -1.2V and 1.2V. Thus, the transfer function of the
multiplier is:
$$ V_{out} = V_{ref} \cdot \frac{-pixel}{256} \cdot \frac{-kernel}{128} = V_{ref}\, \frac{(pixel)(kernel)}{32768}. \tag{2.7} $$
2.5.3 Summing Amplifier
Summing 49 of these multiplier outputs with a summing amplifier produces
the overall transfer function:
$$ V_{x,y} = gain \left( offset + \frac{1.2}{32768} \sum_{x=0}^{6} \sum_{y=0}^{6} pixel_{x,y}\, kernel_{x,y} \right). \tag{2.8} $$
The double sum at the heart of this transfer function is exactly the function
required for a 7 x 7 convolution on a single pixel. The gain factor and the offset
term are set by the calibration circuit.
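As a numerical illustration of Equations 2.7 and 2.8, the sketch below models one multiplier as two cascaded inverting DAC stages and sums a 7 x 7 window. It assumes the reading in which DAC A scales the reference by -pixel/256 and DAC B scales that result by -kernel/128; the function names and the default `gain` and `offset` values are hypothetical, not taken from the board.

```python
V_REF = 1.2  # nominal reference voltage (Section 2.5.1)


def multiplier_output(pixel, kernel, v_ref=V_REF):
    """Equation 2.7: V_ref * (-pixel/256) * (-kernel/128)."""
    dac_a = v_ref * (-pixel / 256)       # unsigned image data, 0..255
    return dac_a * (-kernel / 128)       # signed kernel data, -128..127


def summed_output(pixels, kernels, gain=1.0, offset=0.0):
    """Equation 2.8 for one window; gain and offset are placeholder values."""
    total = sum(
        multiplier_output(p, k)
        for pixel_row, kernel_row in zip(pixels, kernels)
        for p, k in zip(pixel_row, kernel_row)
    )
    return gain * (offset + total)


if __name__ == "__main__":
    print(multiplier_output(255, 127))   # close to, but below, +1.2 V
```

With a full-scale pixel (255) and the largest positive kernel value (127) the single-multiplier output approaches, but does not reach, the reference voltage.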
2.5.4 Analog-to-Digital Converter
The result of the convolution sum-of-products is converted back to digital form
by an 8-bit ADC and read back into the host system's memory. Like any system
that interfaces between the analog and digital worlds, this one requires calibration.
Calibration is automated by providing a pair of digitally-controlled
potentiometers, one to set the summing amplifier's offset voltage, and another to
set its gain. With these two adjustments under software control, the system can
match the output of the multiplier array and summation function with the input
range of the ADC, thus reducing the effects of the inherent offset and gain errors
in both the DACs and the ADC. The calibration process is explained in Section 3.2.
2.5.5 ISA Interface
The ISA bus is a simple, moderate-speed I/O channel. At 16 bits wide, with a
125 ns clock and a minimum 3-cycle I/O transaction, the ISA bus has a theoretical
maximum transfer rate of 5.3 Mbytes per second. If this rate could be sustained in
a convolution with a 7 x 7 kernel, an image up to 85 x 85 pixels could be
processed at 15 frames per second ($5.3 \times 10^6 / (49 \times 15) \approx 85^2$). This rate cannot, of
course, be sustained. Code fetches and computation consume a substantial
fraction of the available bandwidth.
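The bus figures above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch; the constant and function names are illustrative, and it assumes one byte must cross the bus for every kernel-window pixel of every output pixel.

```python
BUS_WIDTH_BYTES = 2          # 16-bit transfers
CLOCK_PERIOD_S = 125e-9      # 125 ns bus clock
CYCLES_PER_TRANSFER = 3      # minimum I/O transaction length


def isa_peak_bytes_per_second():
    """Theoretical peak: one 16-bit word every three bus clocks."""
    return BUS_WIDTH_BYTES / (CYCLES_PER_TRANSFER * CLOCK_PERIOD_S)


def max_square_image_side(bytes_per_second, kernel_side=7, fps=15):
    """Largest square image whose window loads fit in the given bus rate."""
    pixels_per_frame = bytes_per_second / (kernel_side ** 2 * fps)
    return int(pixels_per_frame ** 0.5)


if __name__ == "__main__":
    peak = isa_peak_bytes_per_second()
    print(round(peak), max_square_image_side(peak))
```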
The ISA interface consists of two subsections: an address decoder and an index
register/counter.
2.5.5.1 Address Decoder
The ISA bus defines a 1024-byte I/O map distinct from the memory map. The
ISA standard also sets required locations for standard I/O functions like serial
channels and storage adapters [IBM 85]. To maximize compatibility with the
largest possible array of host systems, new hardware should be designed to occupy
one of the spaces in the I/O map that are defined as being otherwise vacant. With
the abundance of non-standard hardware in the marketplace, the prudent
designer also ensures that his hardware occupies the smallest possible footprint in the
I/O map, and that its base address is readily configurable. Our accelerator board
is equipped with a switch-programmable comparator that sets the base address of
the board. The largest contiguous vacant space is 80 bytes, which is too small to
accommodate the 98 DACs directly [Hogan 88]. Instead, we use a technique
common to complex adapters: indexing.
2.5.5.2 Index Register/Counter
Indexing is a technique that allows any number of data registers to be mapped
to a single I/O address and accessed by first writing the index number of the
register of interest to a separate index register. This board is equipped with an
index register/counter that maps sequential pairs from the array of 98 multiplier
DACs to a single 16-bit location. To minimize indexing overhead during
processing, the register/counter auto-increments following each write access to a
DAC, such that, on the next write access to the DAC data location, the next pair
of DACs in the sequence will automatically be selected.
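The behaviour of the auto-incrementing index register can be modelled in software. The class below is a toy model, not the board's actual logic: the class and method names are hypothetical, the 98 DACs are assumed to be addressed as 49 sixteen-bit pairs, and the wraparound at the end of the array is an assumption.

```python
class IndexedDacArray:
    """Toy model of an index register in front of 49 sixteen-bit DAC pairs."""

    PAIR_COUNT = 49  # 98 DACs, two per 16-bit data word

    def __init__(self):
        self.index = 0
        self.pairs = [0] * self.PAIR_COUNT

    def write_index(self, value):
        """Select the DAC pair that the next data write will reach."""
        self.index = value % self.PAIR_COUNT

    def write_data(self, word):
        """Write one 16-bit word to the selected pair, then auto-increment."""
        self.pairs[self.index] = word & 0xFFFF
        # Wrapping back to pair 0 is an assumption of this sketch.
        self.index = (self.index + 1) % self.PAIR_COUNT


if __name__ == "__main__":
    dacs = IndexedDacArray()
    dacs.write_index(3)
    for word in (0x1111, 0x2222, 0x3333):
        dacs.write_data(word)        # no index writes needed between words
    print(dacs.pairs[3:6], dacs.index)
```

The point of the auto-increment is visible in the usage: one index write is followed by a run of data writes to the same address.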
Figure 2.1: Multiplier Schematic.
CHAPTER III
PROCESS
This chapter describes the convolution process as it is performed using
this accelerator board.
3.1 Kernel
Processing an image begins with reading a kernel file, which is a text file
containing a header line with the dimensions of the kernel and the value of gamma
on which that kernel is based (if any). This header is followed by an array of
floating-point values for use in the kernel. A typical edge-enhancing kernel file is
shown in Table 3.1.
Before this kernel can be loaded into the multiplier array, these values must be
converted to normalized, signed 7-bit binary values by the formula
$$ k_{x,y} = \frac{127\, f_{x,y}}{\max(|kernel|)}, $$
where $f_{x,y}$ are the floating-point values from the kernel file,
$k_{x,y}$ are the binary values, and $\max(|kernel|)$ is the largest absolute value in the
kernel file. This yields the kernel shown in Table 3.2, which can then be loaded into the
kernel section of the multiplier array.
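The normalization step can be sketched as follows. This assumes a full-scale value of 127 and rounding to the nearest integer, which is consistent with the entries of Table 3.2 but is not stated explicitly in the text; the function name is hypothetical.

```python
def normalize_kernel(kernel, full_scale=127):
    """Scale floating-point kernel values so the largest magnitude maps to full_scale."""
    peak = max(abs(value) for row in kernel for value in row)
    return [[round(full_scale * value / peak) for value in row] for row in kernel]


if __name__ == "__main__":
    # First and fourth rows of the kernel in Table 3.1.
    rows = [
        [-0.91, -1.43, -1.71, -1.74, -1.71, -1.43, -0.91],
        [-1.74, 0.13, 6.10, 10.00, 6.10, 0.13, -1.74],
    ]
    for row in normalize_kernel(rows):
        print(row)
```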
3.2 Calibration
Once a kernel is loaded, the board must be calibrated. Calibration attempts to
minimize the inherent gain and offset errors of the DACs, multiplier array,
summing amplifier and ADC by adjusting the minimum and maximum outputs of
the array-summer combination for a given kernel such that it uses the entire input
range of the ADC. There are several steps to calibration.
3.2.1 Tables
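Two 7 x 7 tables must be constructed: one to produce maximum output, and
one to produce minimum output. The two tables are built by placing a value of
255 (the maximum possible pixel value) in the appropriate locations such that, for
the example kernel:
$$
max_{x,y} =
\begin{cases}
255 : kern_{x,y} > 0 \\
0 : kern_{x,y} < 0
\end{cases}
=
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 255 & 0 & 0 & 0 \\
0 & 0 & 255 & 255 & 255 & 0 & 0 \\
0 & 255 & 255 & 255 & 255 & 255 & 0 \\
0 & 0 & 255 & 255 & 255 & 0 & 0 \\
0 & 0 & 0 & 255 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\tag{3.1}
$$
and
$$
min_{x,y} =
\begin{cases}
255 : kern_{x,y} < 0 \\
0 : kern_{x,y} > 0
\end{cases}
=
\begin{bmatrix}
255 & 255 & 255 & 255 & 255 & 255 & 255 \\
255 & 255 & 255 & 0 & 255 & 255 & 255 \\
255 & 255 & 0 & 0 & 0 & 255 & 255 \\
255 & 0 & 0 & 0 & 0 & 0 & 255 \\
255 & 255 & 0 & 0 & 0 & 255 & 255 \\
255 & 255 & 255 & 0 & 255 & 255 & 255 \\
255 & 255 & 255 & 255 & 255 & 255 & 255
\end{bmatrix}
\tag{3.2}
$$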
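Both tables can be derived mechanically from the sign of each kernel entry. A minimal sketch follows; the function name is hypothetical, and kernel entries that are exactly zero are assumed to map to 0 in both tables, a case the text does not address.

```python
def calibration_tables(kernel, full_scale=255):
    """Return (max_table, min_table) for a kernel.

    max_table drives every positive kernel entry with a full-scale pixel;
    min_table does the same for every negative entry.
    """
    max_table = [[full_scale if k > 0 else 0 for k in row] for row in kernel]
    min_table = [[full_scale if k < 0 else 0 for k in row] for row in kernel]
    return max_table, min_table


if __name__ == "__main__":
    # Centre row of the normalized kernel in Table 3.2.
    max_table, min_table = calibration_tables([[-22, 2, 77, 127, 77, 2, -22]])
    print(max_table[0])
    print(min_table[0])
```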
3.2.2 Adjustment
By alternately loading these two tables into the image section of the multiplier
array, and adjusting the two digital potentiometers that control the summing
amplifier's gain and offset, the system is calibrated. First, the minimizing table is
loaded, and the offset pot is adjusted until the ADC produces the minimum
output. Then the maximizing table is loaded and gain is adjusted until the
maximum output is achieved. Then the minimum is checked and adjusted if
necessary, then the maximum is checked again. Iterating between the minimum
and maximum until either no more adjustment is required, or until some
maximum number of iterations is reached, the system settles on a combination
that is as well calibrated as the resolution of the digital pots allows.
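The iteration described above can be sketched against a simulated board. Everything about `SimulatedBoard` is hypothetical: the pot step sizes, the starting pot codes, the integer units of the array output, and the direction in which each pot is stepped are all assumptions chosen only to make the loop runnable. Only the alternating offset-then-gain structure comes from the text.

```python
class SimulatedBoard:
    """Hypothetical stand-in for the summing amplifier, digital pots and 8-bit ADC."""

    def __init__(self):
        self.offset_pot = 255    # digital pot codes, 0..255 (assumed start values)
        self.gain_pot = 0

    def read_adc(self, raw):
        """raw: summed array output for the loaded table, in arbitrary integer units."""
        offset = (self.offset_pot - 128) * 5
        gain_quarters = 4 + self.gain_pot            # gain = gain_quarters / 4
        code = (gain_quarters * (raw + offset)) // 400
        return max(0, min(255, code))                # ADC clips to 0..255


def calibrate(board, raw_min, raw_max, max_iterations=20):
    """Alternate offset and gain trims until neither needs further adjustment."""
    for iteration in range(1, max_iterations + 1):
        changed = False
        # Minimizing table loaded: trim offset until the ADC reads its minimum.
        while board.read_adc(raw_min) > 0 and board.offset_pot > 0:
            board.offset_pot -= 1
            changed = True
        # Maximizing table loaded: raise gain until the ADC reads its maximum.
        while board.read_adc(raw_max) < 255 and board.gain_pot < 255:
            board.gain_pot += 1
            changed = True
        if not changed:
            return iteration
    return max_iterations


if __name__ == "__main__":
    board = SimulatedBoard()
    passes = calibrate(board, raw_min=-300, raw_max=500)
    print(passes, board.read_adc(-300), board.read_adc(500))
```

Because changing the gain disturbs the minimum reading, and changing the offset disturbs the maximum reading, a single pass is generally not enough, which is why the procedure iterates.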
3.3 Image
Once the kernel is loaded and the board calibrated, the image file may be
loaded and processed. The image file has a format similar to the kernel file in that
it is a text file, and the first line is a header with the dimensions of the image and
the maximum pixel value in the file.
3.3.1 Loading
If the kernel is N x N and the image is S x T, allocate a memory array of
(S + N - 1) x (T + N - 1) bytes and read the image file into the S x T window at
the center of the memory array such that there is an (N - 1)/2 border surrounding the
image.
3.3.2 Border Fill
Because convolution exhibits an aliasing effect when the edge of the N x N
kernel moves beyond the edge of the image, yet we intend to process the entire
image, we must find a way to eliminate the edge effect. A simple way to eliminate
the edge effect is to create an (N - 1)/2 border around the image (described
above), and fill that space with repetitions of the pixels that make up the edge of
the image.
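Loading and border fill together amount to padding the image by edge replication. A minimal sketch, assuming an odd kernel size and a row-major list-of-lists image (the function name is hypothetical):

```python
def pad_with_edge_replication(image, kernel_size):
    """Surround the image with an (N - 1) / 2 border of repeated edge pixels."""
    border = (kernel_size - 1) // 2
    # Extend each row to the left and right with its own end pixels.
    padded_rows = [
        [row[0]] * border + list(row) + [row[-1]] * border for row in image
    ]
    # Repeat the first and last extended rows above and below the image.
    top = [list(padded_rows[0]) for _ in range(border)]
    bottom = [list(padded_rows[-1]) for _ in range(border)]
    return top + padded_rows + bottom


if __name__ == "__main__":
    for row in pad_with_edge_replication([[1, 2], [3, 4]], kernel_size=3):
        print(row)
```

For a 2 x 2 image and a 3 x 3 kernel the result is 4 x 4, matching the (S + N - 1) x (T + N - 1) allocation described in Section 3.3.1.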
3.3.3 Convolution
Convolution begins by copying a kernel-sized array of pixels from the image
into the image side of the multiplier array. After a brief settling time, the ADC is
started, the image data pointers are incremented, and the next block may be
loaded while conversion is proceeding. Incrementing the data pointers amounts, in
graphical terms, to moving the image data window one pixel to the right.
By the time the loading of block S + 1 is complete, analog-to-digital
conversion of the convolution of block S is also complete, and its result may be
placed in the appropriate pixel location of the target image. See Figure 3.1.
For an S x T image and an N x N kernel, after loading block S, the data
pointers are incremented not by 1, but by N - 1, bypassing the added border
pixels in order to move the data window to the beginning of the next row. After
processing T rows of S pixels, the image convolution is complete.
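A software reference for the window-by-window process is sketched below. It computes, for each output pixel, the plain sum of products over the kernel-sized window, as in the double sum of Equation 2.8 (no kernel flip, which makes no difference for a symmetric kernel). The function name and the row-major list-of-lists layout are assumptions; the overlap of loading with ADC conversion is not modelled.

```python
def convolve_padded(padded, kernel, width, height):
    """Slide an N x N window over a border-filled image.

    padded must be (height + N - 1) rows by (width + N - 1) columns.
    """
    n = len(kernel)
    target = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):             # window moves one pixel to the right
            acc = 0
            for ky in range(n):
                for kx in range(n):
                    acc += padded[row + ky][col + kx] * kernel[ky][kx]
            target[row][col] = acc
    return target


if __name__ == "__main__":
    padded = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
    identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    print(convolve_padded(padded, identity, width=2, height=2))
```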
3.4 Addressing constraints and penalties
Experimentation has shown that copying the 7 rows of 7 bytes from the image
data into a dedicated 50-byte buffer, and then using the following code fragment
to copy the resulting 25 words to the board yields the best performance:
MOV CX, 25 ; load counter with word count
MOV DX, DAC ; load I/O address register
MOV SI, BUFFER ; load pointer to data buffer (OUTSW reads through SI)
REPNZ OUTSW DX, [ES:SI] ; copy data to board
Figure 3.1: Graphic Representation of Convolution.
Copying the 49 bytes to a dedicated memory buffer also copies the data from
relatively slow DRAM system memory to the much faster SRAM cache, and also
allows sequential word access to a relatively long string of 25 words to maximize
efficiency of the string instruction OUTS. The prefix REPNZ signifies that the
CPU is to perform the indicated instruction, decrement the counter register CX
and, if CX is not now zero, repeat the operation.
There is another benefit to this two-stage copy operation that also contributes
to improved performance. There is a significant performance penalty in
attempting a 16-bit memory reference to a so-called odd address (LSB = 1).
This penalty would occur on every odd convolution cycle and seriously impact
performance if direct image-to-accelerator copies were attempted.
Intel processors running MS-DOS place a limit on image size, too: the
64 kByte segment size imposes a practical limit of 320 x 200 pixels for convenient
processing. A number of DOS extenders are available that claim to remedy this
limitation, and a number of 32-bit operating systems are available, but no
experiments were done with these products.
3.5 Autocorrelation
This architecture is not suited to accelerating autocorrelation. The small
computational array relative to the image size creates an awkward programming
task and drastically increases bus traffic, and the limited dynamic range of the board
complicates handling of processed pixels.
3.5.1 Awkward Programming
The prime requirement for accelerating any kind of array processing is a clean
and natural interface between hardware and software. The odd size of the array
demands that the image to be processed have a width in pixels that is some
integer multiple of 49. Without this limitation, the task of autocorrelation cannot
be structured into a simple set of nested loops in the manner of convolution.
Including excess pixels on one edge of the image as fill is a poor solution because
of the $O(N^4)$ nature of autocorrelation.
3.5.2 Bus Traffic
Even accepting the requirement of a match between image and hardware, the
small size of the array relative to the image means that there is no kernel that can
be made resident in the array, calibrated once, and left in place for the duration of
processing. Instead each pixel in the image must be read from memory and loaded
into one DAC or another $N^2$ times compared to the $N^2 + n^2$ (where $n$ is, at most,
7) required by convolution. This is an increase in traffic of at least an order of
magnitude. In fact, testing shows a nearly 4x slowdown over direct floating-point
computation.
3.5.3 Dynamic Range
The limited resolution, and consequently limited dynamic range, of the board
require recalibrating the system to each specific kernel. This is acceptable for
convolution, where a kernel is loaded once and kept resident during the processing
of the entire image. For autocorrelation, however, the kernel is the same size as
the image, and 7 x 7 images are seldom of interest. For maximum computational
performance it would be advantageous to recalibrate the board for each load into
the kernel side of the board, but the negative effect on time performance would be
extreme.
Table 3.1: Floating-Point Edge-Enhancing Kernel
7 7 4.5
-0.91  -1.42  -1.71  -1.74  -1.71  -1.43  -0.91
-1.43  -1.70  -0.72   0.13  -0.72  -1.70  -1.43
-1.71  -0.72   3.34   6.10   3.34  -0.72  -1.71
-1.74   0.13   6.10  10.00   6.10   0.13  -1.74
-1.71  -0.72   3.34   6.10   3.34  -0.72  -1.71
-1.43  -1.70  -0.72   0.13  -0.72  -1.70  -1.43
-0.91  -1.43  -1.71  -1.74  -1.70  -1.43  -0.91
Table 3.2: Normalized Integer Edge-Enhancing Kernel
-12  -18  -22  -22  -22  -18  -12
-18  -22   -9    2   -9  -22  -18
-22   -9   42   77   42   -9  -22
-22    2   77  127   77    2  -22
-22   -9   42   77   42   -9  -22
-18  -22   -9    2   -9  -22  -18
-12  -18  -22  -22  -22  -18  -12
CHAPTER IV
RESULTS
4.1 Analog Performance
Linearity, precision, accuracy and noise are the most important parameters of
the performance of any analog computational unit.
4.1.1 Noise
The effects of noise in an analog system can be minimized by providing
low-impedance ground and power planes, adequate decoupling capacitance, and by
using the maximum voltage swing available to the designer. It is also considered
good practice to isolate the circuit from noisy environments. Our board uses none
of these techniques.
The board is hosted in the electrically noisy environment of a microcomputer
chassis. There are no power or ground planes because the wire-wrap prototype
panels on which the board is constructed are not available with such planes. Such
decoupling is provided as space permits. An analog signal swing of ±1.2 Volts was
selected, not for performance, but for comparison with the eventual VLSI
implementation which would use the same range.
For all of these reasons, noise does intrude upon the overall performance of the
board.
4.1.2 Linearity, Accuracy and Precision
For an analog-to-digital or digital-to-analog converter, accuracy and precision
are combined to measure the performance of the converter in terms of linearity.
Linearity, or rather, non-linearity, in a converter has two forms: integral and
differential. Consider an n-bit digital-to-analog converter (DAC). In a DAC,
integral non-linearity (INL) is measured by calibrating the DAC such that its
zero-value output is exactly 0 Volts, programming the DAC to its maximum
output, and measuring the difference between the DAC output and the ideal
full-scale output ($2^n - 1$ LSB), in
units of LSB (the $\Delta V$ represented by the least significant bit). INL is also called
gain error and can be minimized by careful calibration. Differential non-linearity
(DNL) is a similar measurement that is concerned with error between adjacent
values. Taken together, INL and DNL determine the useable resolution of a
converter — the part may offer 12 bits, but if it has a DNL of ±2 LSB (±2
counts), the useful resolution is only 10 bits because the error band of 4 means
that the two bits required to cover the error band are not meaningful.
The DAC parts used on the accelerator board are rated at ±0.5 LSB INL and
DNL, and the ADC is rated at ±1 LSB. Adding the errors of the op-amps and
resistors in the rest of each multiplier circuit and the summing amplifier, the overall
resolution of the board is expected to be between four and five bits.
The actual linearity as measured is shown in Figure 4.1. This chart was
produced by calibrating the board with the kernel shown in Table 3.1, and
supplying a ramping input that looked very much like Equation (3.1), except that
where Equation (3.1) has the value 255, our input would have the ramp value.
The chart shows the board output and the calculated output for each ramp value,
and the error between calculated and actual outputs (biased by 100 to make it
visible in the chart). Mean error is calculated at 1.3 counts (out of 256) and
absolute error is calculated at 2.6 counts. These error figures yield a precision of 6
bits and an accuracy of 5 bits. The resolution of the system can then be
comfortably described as 5 bits.
4.2 Multiplier Speed
The DAC and op-amp parts selected are moderate-speed general purpose
devices suitable for the speed of the ISA bus. The DAC datasheet lists a settling
time of 200 ns, and the op-amp datasheet lists 2 µs to settle to 0.01%. This is close
to the value calculated from the output impedance and capacitance numbers given
in the datasheets, which yield τ = 2.64 µs, making the overall multiplier time
constant 6τ, or 15.84 µs (the multiplier has two amplifier stages).
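As a rough check on the figure above, the residual settling error after a given number of time constants can be computed directly. This is a sketch under a stated assumption, namely a single-pole (first-order) response; the thesis does not state the settling model, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Fraction of a step that is still unsettled after time t for a
// single-pole (first-order) response with time constant tau.
// The first-order model is an assumption made for illustration.
double residual_fraction(double t, double tau)
{
    assert(tau > 0.0 && t >= 0.0);
    return std::exp(-t / tau);
}

// Size of one LSB as a fraction of full scale for an n-bit converter.
double lsb_fraction(int bits)
{
    assert(bits > 0);
    return std::ldexp(1.0, -bits);   // 2^-bits
}
```

With τ = 2.64 µs, waiting 6τ = 15.84 µs leaves about 0.25% of the step unsettled under this model, which is a little under one LSB of an 8-bit conversion (about 0.39% of full scale).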
A higher-speed version of the board, probably hosted on the PCI bus, would
require significantly faster devices. The DACs would need a digital interface about
three times faster than the AD7528's, and the op-amps would need to be
comparably faster for the faster expansion bus to make sense.
Figure 4.1: Graph of Linearity. (Series plotted against input: Error + 100, Calculated Result, ADC Output, Input Ramp.)
4.3 Convolution Result
For the image shown in Figure 4.2, a floating-point convolution produced the
result in Figure 4.3. Compare this result with that of the hardware-accelerated
convolution in Figure 4.4.
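For reference, a floating-point software convolution of the kind used for the comparison can be sketched as follows. This is not the thesis code: the function name `convolve2d`, the row-major layout and the zero-valued border are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct 2-D convolution of a row-major image with a kw x kh kernel.
// Pixels outside the image are treated as zero (an assumed border policy).
std::vector<float> convolve2d(const std::vector<float>& image, int width, int height,
                              const std::vector<float>& kernel, int kw, int kh)
{
    assert(width > 0 && height > 0 && kw > 0 && kh > 0);
    assert(image.size() == static_cast<std::size_t>(width) * height);
    assert(kernel.size() == static_cast<std::size_t>(kw) * kh);

    const int cx = (kw - 1) / 2, cy = (kh - 1) / 2;   // kernel centre
    std::vector<float> out(image.size(), 0.0f);

    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int j = 0; j < kh; ++j)
                for (int i = 0; i < kw; ++i) {
                    // Convolution flips the kernel relative to the image.
                    const int sx = x - (i - cx);
                    const int sy = y - (j - cy);
                    if (sx < 0 || sx >= width || sy < 0 || sy >= height)
                        continue;                     // zero border
                    sum += kernel[j * kw + i] * image[sy * width + sx];
                }
            out[y * width + x] = sum;
        }
    return out;
}
```

Convolving with a kernel whose only non-zero entry is a 1 at the centre returns the image unchanged, which is a convenient sanity check before comparing against hardware output.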
4.4 Computational Performance
The performance of ISA machines is highly variable, dependent not only on
CPU speed, but on number and types of I/O channels and the chipset that drives
those I/O channels. The ISA bus was defined when the fastest compatible CPU
was the 8 MHz 80286. At the time this project began, the typical processor in an
ISA system was a 33 or 66 MHz 80486 (both have a 33 MHz external bus), and
high performance systems used either a 60 or 66 MHz Pentium CPU (30 or 33
MHz external bus, respectively).
There were also two high-speed I/O channels in use, the 486-based,
consortium-developed VESA bus and the Pentium-based, Intel-developed PCI
bus. Both VESA and PCI channels carry a bus clock in the range of 25 to 33 MHz
and can therefore support a relatively small number of expansion slots. These
channels were also overkill for low-performance I/O devices like modems and
floppy drive controllers. As a result, the chipsets that translate the CPU bus
signals and protocols to those of one of the high-speed channels also produce
signals and protocols for the ISA bus. These chipset translations vary widely in
efficiency and performance.
Figure 4.2: Test Image.
Figure 4.3: Software Convolution.
Performing 7 x 7 convolutions on a 150 x 100 pixel image on a few machines
available in the labs around the CS department at TTU produced an average 5x
speedup over software-only convolution.
A 90MHz Pentium machine provides approximately break-even performance,
with 1.29 seconds per frame for integer computation and 1.31 seconds per frame
using the accelerator. Microprocessors increasingly operate their
computational cores and caches at higher multiples of their external bus
speeds. The latest Intel Pentium Pro CPU runs internally at 200MHz
with multiple integer computation pipelines, with a 200MHz L1 cache, a 66MHz L2
cache and a 33MHz PCI expansion bus. A system based on one of these CPUs
would likely outperform this accelerator by a healthy margin.
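The per-frame timings can be broken down into a per-pixel budget. The sketch below is only arithmetic on the figures quoted in this section; it assumes that the 90MHz timings refer to the same 7 x 7 kernel and 150 x 100 image, and that every pixel produces an output. The names `FrameCost` and `frame_cost` are hypothetical.

```cpp
#include <cassert>

// Work and time budget for one convolved frame.
struct FrameCost {
    long pixels;          // output pixels per frame
    long macs;            // multiply-accumulate operations per frame
    double us_per_pixel;  // microseconds available per output pixel
};

// Assumes one output per input pixel (borders filled rather than skipped).
FrameCost frame_cost(int width, int height, int kw, int kh, double seconds_per_frame)
{
    assert(width > 0 && height > 0 && kw > 0 && kh > 0 && seconds_per_frame > 0.0);
    FrameCost c;
    c.pixels = static_cast<long>(width) * height;
    c.macs = c.pixels * kw * kh;
    c.us_per_pixel = seconds_per_frame * 1e6 / c.pixels;
    return c;
}
```

Under these assumptions a frame is 15,000 pixels and 735,000 multiply-accumulates, so 1.31 seconds per frame corresponds to roughly 87 µs per pixel for the accelerator and 1.29 seconds to about 86 µs per pixel in software.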
4.5 Comparison with a Digital Parallel System
Adaptive Solutions, Inc. produces a line of digital parallel coprocessors for the
ISA and PCI buses - the CNAPS series. This section will briefly compare their
products with this analog accelerator.
The CNAPS boards are processors in their own right and rely on the host
CPU solely for loading in the source image and storing the result
[Adaptive Solutions 95/3]. This fact alone significantly reduces bus traffic and the
computational load on the host CPU. The architecture is SIMD and has a
maximum of 64 processing elements, meaning that 64 pixels could be processed in
parallel [Adaptive Solutions 95/2]. Therefore the CNAPS boards offer a significant
speedup over this accelerator [Adaptive Solutions 95/1].
Our accelerator requires less than 5 watts, mainly because it is a discrete
implementation with a significant digital content; the VLSI version will use
significantly less. The CNAPS boards use up to 25 Watts. Neither board has a
direct interface to video input, but the computational section of the accelerator
board can be implemented in an analog VLSI device, leaving enough board space
to add such an interface and still draw far less than 25 watts.
Implementing the 7 x 7 array in an analog VLSI and adding a video interface
has the further advantage of entirely offloading the chore of image I/O from the
host CPU and requiring only enough bus bandwidth to read out the processed
images.
Figure 4.4: Convolution by Accelerator.
CHAPTER V
CONCLUSION
A fully parallel convolution system that performs an entire image convolution
in a single step would be the ideal. This board performs the convolution
summation-of-products in parallel for a given pixel, but processes each pixel
serially. Even given this limited parallelism, and given the low speed of the ISA
bus, and the fact that the board still relies on the host CPU for all of the data
handling, blocking and ordering, the board offers a speedup of a factor of five over
the standard desktop system available at its time of inception.
Implementation of this project shows that, even using a small array, parallel
analog computation is an effective way to accelerate image convolution. It also
demonstrates the limitations of the ISA bus as a host technology for
high-bandwidth computation.
5.1 Future Work
This project is a demonstration of the validity of the concept of performing the
multiplication/summation step of convolution using pixel-serial kernel-parallel
analog computational elements. The future direction of research on this topic was
stated before this proof-of-concept project was begun: integration of this
computational array onto a single analog VLSI die.
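The pixel-serial, kernel-parallel organisation can be modelled in software as an outer loop over pixels around a single multiply-and-sum step. The sketch below is a software stand-in, not the board's interface: the function names are hypothetical, the zero border is an assumption, and the window is gathered in kernel order (correlation form), which gives the same result as convolution when the kernel is symmetric.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the analog array: every kernel weight multiplies its window
// pixel and the products are summed, as the multiplier array and summing
// amplifier would do in one step. Illustrative only.
int multiply_and_sum(const std::vector<std::int8_t>& weights,
                     const std::vector<std::uint8_t>& window)
{
    assert(weights.size() == window.size());
    int sum = 0;
    for (std::size_t k = 0; k < weights.size(); ++k)
        sum += static_cast<int>(weights[k]) * static_cast<int>(window[k]);
    return sum;
}

// Pixel-serial outer loop: the host gathers one kernel-sized window per
// output pixel and hands it to the parallel multiply-and-sum step.
std::vector<int> convolve_pixel_serial(const std::vector<std::uint8_t>& image,
                                       int width, int height,
                                       const std::vector<std::int8_t>& weights,
                                       int kw, int kh)
{
    assert(width > 0 && height > 0 && kw > 0 && kh > 0);
    assert(image.size() == static_cast<std::size_t>(width) * height);
    assert(weights.size() == static_cast<std::size_t>(kw) * kh);

    const int cx = (kw - 1) / 2, cy = (kh - 1) / 2;   // kernel centre
    std::vector<std::uint8_t> window(weights.size());
    std::vector<int> out(image.size());

    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            for (int j = 0; j < kh; ++j)
                for (int i = 0; i < kw; ++i) {
                    const int sx = x + i - cx, sy = y + j - cy;
                    const bool inside = sx >= 0 && sx < width && sy >= 0 && sy < height;
                    // Pixels outside the image are taken as zero (assumed).
                    window[j * kw + i] = inside ? image[sy * width + sx] : std::uint8_t(0);
                }
            out[y * width + x] = multiply_and_sum(weights, window);
        }
    return out;
}
```

In this model the inner call is the only part the analog array would replace; the window gathering and ordering remain with the host, which is the data-handling burden described in the conclusion.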
The board on which that analog VLSI device is to be integrated should not be
an ISA bus expansion board. Waiting for an ISA transaction to complete wastes a
significant number of cycles in today's high-speed microprocessors. A PCI board
would be a much better choice.
If this board is re-implemented in its discrete form, PCI should be the bus of
choice. This implies that very-high-speed DACs and op-amps will be required,
which will cause power consumption to increase. It may also require that the array
be divided into interleaved banks to cope with the speed of the PCI bus - even the
highest-speed commercially available DACs have fairly slow digital interfaces.
REFERENCES
[Adaptive Solutions 95/1] Adaptive Solutions, CNAPS Data Book, 1995.
[Adaptive Solutions 95/2] Adaptive Solutions, CNAPS/PCI-DLX Board Reference Manual, 1995.
[Adaptive Solutions 95/3] Adaptive Solutions, CNAPS/PCI-PSP Board Reference Manual, 1995.
[Analog Devices 92] Analog Devices, "AD7528 CMOS Dual 8-Bit Buffered Multiplying DAC," Data Converter Reference Manual Volume I, 1992.
[Boahen and Andreou 1992] K. Boahen and A. Andreou, "A contrast sensitive silicon retina with reciprocal synapses." In: Advances in Neural Information Processing Systems, Vol. 4, Moody, J. E., Hanson, S. J., and Lippman, R. P., eds., pp. 764-772, Morgan Kaufmann, San Mateo, CA (1992).
[Burrus 85] C. S. Burrus and T. W. Parks, DFT/FFT and Convolution Algorithms, John Wiley & Sons, New York, 1985.
[Choudhary 90] Alok N. Choudhary and Janak H. Patel, Parallel Architectures and Parallel Algorithms for Integrated Vision Systems, Kluwer Academic Publishers, 1990.
[Chua, Yang 88] Leon O. Chua and Lin Yang, "Cellular Neural Networks." IEEE Transactions on Circuits and Systems, IEEE, CAS-35, 1257, 1988.
[Chua, Roska 93] Leon O. Chua and Tamas Roska, "The CNN Paradigm." IEEE Transactions on Circuits and Systems, IEEE, CAS-40, 147, 1993.
[Churchland 92] Patricia M. Churchland, The Computational Brain, MIT Press, 1992.
[Damalcheruvu 93] Srinivas Damalcheruvu, A Two-Dimensional Convolution Unit Suitable for Analog VLSI Implementation With Vision Applications, Thesis (M.S.), Texas Tech University, 1993.
[Elliott 92] William D. Elliott, Design and Performance Evaluation of a Real-Time Image Processing Chip for Computer Vision, Thesis (M.S.), Duke University, 1992.
[Hogan 88] Thom Hogan, The Programmer's PC Sourcebook, Microsoft Press, Redmond, WA, 1988.
[Horn 86] Berthold Klaus Paul Horn, Robot Vision, MIT Press/McGraw-Hill, Cambridge, 1986.
[IBM 85] IBM Corp., IBM PC/AT Technical Reference, IBM Corp., Boca Raton, 1985.
[Irvine 81] Robert G. Irvine, Operational Amplifier Characteristics and Applications, Prentice-Hall, New Jersey, 1981.
[Koch, Li 94] Christof Koch and Hua Li, Vision Chips: Implementing Vision Algorithms with Analog VLSI Circuits, IEEE Computer Society Press, 1994.
[Koch, Mathur 96] Christof Koch and Bimal Mathur, "Neuromorphic Vision Chips," IEEE Spectrum, May, 1996.
[Lee 86] Hua Lee and Glen Wade, Imaging Technology, IEEE Press, New York, 1986.
[Li 94] Hua Li and Srinivas Damalcheruvu, "Locally Connected CMOS VLSI Design for Image Convolution," SPIE International Symposium on Optical Engineering and Photonics in Aerospace Sensing, Orlando, FL, April 4-8, 1994.
[Mahowald 92] Misha Mahowald, An Analog VLSI System for Stereoscopic Vision, Kluwer Academic Publishers, Boston, 1994.
[McClelland 88] James L. McClelland, Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises, MIT Press, Boston, 1988.
[Mead 89] Carver Mead, "Adaptive retina," Analog VLSI Implementations of Neural Systems, Mead, C. and Ismail, M., eds., pp. 239-246, Kluwer, Norwell, MA (1989).
[Mead 90] Carver Mead, "Neuromorphic Electronic Systems," Neurovision Systems, Madan Gupta and George Knopf, eds., pp. 463-470, IEEE Press, Piscataway, NJ (1994).
[Nussbaumer 82] Henri J. Nussbaumer, Fast Fourier Transform and Convolution Algorithms (Second Edition), Springer-Verlag, Berlin, 1982.
[Overington 92] Ian Overington, Computer Vision - A Unified, Biologically-Inspired Approach, Elsevier, Amsterdam, 1992.
[Roska, Chua 93] Tamas Roska and Leon O. Chua, "The CNN Universal Machine," IEEE Transactions on Circuits and Systems, IEEE, CAS-40, 1993.
[Tolimieri 89] Richard Tolimieri, Myoung An, Chao Lu, Algorithms for Discrete Fourier Transform and Convolution, C. S. Burrus, Editor, Springer-Verlag, New York, 1989.
[Yang 89] Woodward Yang, "Analog CCD processors for image filtering." Visual Information Processing: From Neurons to Chips, SPIE Proc. 1473: 114-127 (1991).
APPENDIX A
SCHEMATICS
[Schematic drawings of the accelerator board, original pages 40-65; the drawings are images and are not reproduced in this text.]
APPENDIX B
SOURCE CODE
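The listing in this appendix targets a DOS-era compiler and depends on project headers that are not reproduced here. As a reading aid, the kernel formula and the scaling to signed 8-bit values that the kernel class documents can be sketched in self-contained modern C++. The function names `make_kernel` and `quantize` are hypothetical, the row-major layout is an assumption, and this is not a replacement for the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// k(i,j) = -[(i^2 + j^2) - 2*sigma^2] * exp(-(i^2 + j^2) / (2*sigma^2)) / (2*sigma^2)
// with i and j measured from the kernel centre. Stored row-major (assumed).
std::vector<float> make_kernel(float sigma, int w, int h)
{
    assert(sigma > 0.0f && w > 0 && h > 0);
    const int cx = (w - 1) / 2, cy = (h - 1) / 2;
    const float two_s2 = 2.0f * sigma * sigma;
    std::vector<float> k(static_cast<std::size_t>(w) * h);
    for (int j = 0; j < h; ++j)
        for (int i = 0; i < w; ++i) {
            const float r2 =
                static_cast<float>((i - cx) * (i - cx) + (j - cy) * (j - cy));
            k[j * w + i] = -(r2 - two_s2) * std::exp(-r2 / two_s2) / two_s2;
        }
    return k;
}

// Scale so the largest magnitude maps to +/-127, then truncate to 8 bits.
std::vector<std::int8_t> quantize(const std::vector<float>& k)
{
    float peak = 0.0f;
    for (float v : k)
        peak = std::max(peak, std::fabs(v));
    assert(peak > 0.0f);
    std::vector<std::int8_t> q(k.size());
    for (std::size_t n = 0; n < k.size(); ++n)
        q[n] = static_cast<std::int8_t>(127.0f * k[n] / peak);
    return q;
}
```

With sigma = 1 and a 7 x 7 kernel, the centre weight evaluates to exactly 1 and quantizes to 127, and the weights two pixels from the centre are negative, giving the centre-surround shape the formula describes.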
//////////////////////////////////////////////////////
//
// File:    KERNEL.H
// Author:  Donald A. Symes
// Purpose: Definition for Kernel class
//
// The kernel object can create a convolution kernel given
// a set of dimensions and a value for sigma. This floating-point
// kernel will then be converted to a form useful to the hardware
// convolution accelerator.
//
// Once created, the kernel may be stored to a disk file. A kernel
// stored in a file can then be read in to a kernel object rather
// than re-calculating.
//
// A kernel file is a text file that has the following format for a
// 7 X 7 kernel -
//
//   w h sigma        w and h are int values giving dimensions
//   f f f f f f f    of kernel
//   f f f f f f f    sigma is a float
//   f f f f f f f
//   f f f f f f f    f are floats that describe the kernel
//   f f f f f f f
//   f f f f f f f
//   f f f f f f f
//
// The data members consist of:
//
//   w and h - WORD - dimensions of the kernel in pixels
//   sigma - float - the sigma value used to calculate kernel
//   BYTE *maxa, *mina - pointers used in calibration
//   fdata - float* - pointer to the array of floating-point values
//                    that make up the kernel
//   bdata - char* - pointer to signed 8-bit binary array
//
// The member functions are all public. Several constructors
// are provided:
//
// kernel() - generic constructor. No mem allocated, no calculation.
//            Provided for completeness.
//
// kernel(float ssigma, WORD ww, WORD hh) - constructor calculates
//            a kernel of given dimensions from given sigma using
//            formula
//
//            -[(i^2 + j^2) - 2sigma^2] e^-[(i^2 + j^2)/(2sigma^2)]
//   k(x,y) = -----------------------------------------------------
//                                2sigma^2
//
// kernel(ifstream &f) - construct kernel object from descr. in file.
//
// kernel(kernel &k) - copy constructor.
//
// A simple destructor is provided to ensure that the data arrays are
// properly deallocated.
//
// ~kernel() - destructor. Deallocates data arrays.
//
// Several utility functions are provided:
//
// create(float ssigma, WORD ww, WORD hh) - calculate kernel.
//   arguments - ssigma (float) sigma value used to calculate kernel
//               ww, hh (WORD) dimensions of kernel array.
//   side effects - allocates w * h array of float
//                  allocates w * h array of char
//   return - 0 (int) on success
//           -1 (int) on memory allocation fail
//
// load(ifstream &f) - reads kernel from file f.
//   arguments - f (ifstream &) input file stream.
//   side effects - allocates w * h array of float
//                  allocates w * h array of char
//   return - 0 (int) on success
//           -1 (int) on memory allocation fail
//           -1 (int) on file access error
//
// save(ofstream &f) - save kernel to file.
//   arguments - f (ofstream &) output file stream.
//   return - 0 (int) on success
//           -1 (int) on file access error
//
// normalize() - normalize kernel from float to binary.
//   arguments - none.
//   return - 0 (int) in all cases.
//
// max_out() - returns sum of positive values in binary array
//   arguments - none.
//
// min_out() - returns sum of negative values in binary array
//   arguments - none.
//
// width() - return (WORD) dimension w
//   arguments - none.
//
// height() - return (WORD) dimension h
//   arguments - none.
//
// Sigma() - return (float) value of sigma
//   arguments - none.
//
// pfdata() - return pointer (float *) to float version of kernel
//            array
//   arguments - none.
//
// pbdata() - return pointer (char *) to binary version of kernel
//            array
//   arguments - none.
//
//
//////////////////////////////////////////////////////
#include <iostream.h>
#include <fstream.h>

class kernel
{
    WORD w, h;              // dimensions of kernel in pixels
    BYTE *maxa, *mina;      // pointers used in calibration
    float sigma;            // sigma used to calculate kernel
    float *fdata;           // pointer to array of float version
    char *bdata;            // pointer to array of binary version

public:
    // constructors & destructors
    kernel();
    kernel(float ssigma, WORD ww, WORD hh);
    kernel(ifstream &f);
    kernel(kernel &k);
    ~kernel();

    int create(float ssigma, WORD ww, WORD hh);  // calculate new kernel
    int load(ifstream &f);                       // read kernel file
    int save(ofstream &f);                       // save kernel file
    int normalize();        // normalize kernel from float to binary
    int max_out();          // return sum of positive values
    int min_out();          // return sum of negative values

    WORD width() { return w; }
    WORD height() { return h; }
    float Sigma() { return sigma; }
    float *pfdata() { return fdata; }
    char *pbdata() { return bdata; }

private:
    // functions called from constructors ONLY
    void calibrate();       // calibrates board to current kernel
    int seek_max();         // adjusts gain pot
    int seek_zero();        // adjusts zero pot
    void set_data(BYTE *);  // fills DACs
    BYTE read_adc();        // return current ADC input
    void settle();
};
//////////////////////////////////////////////////////////////////
//
// File:    KERNEL.CPP
// Author:  Donald A. Symes
// Purpose: Function definitions for kernel class
//
// The member functions are all public. Several constructors
// are provided:
//
// kernel() - generic constructor. No memory allocated, no
//            calculation.
//            Provided for completeness.
//
// kernel(float ssigma, WORD ww, WORD hh) - constructor that
//            calculates a kernel of given dimensions from given
//            sigma using formula
//
//            -[(i^2 + j^2) - 2sigma^2] e^-[(i^2 + j^2)/(2sigma^2)]
//   k(x,y) = -----------------------------------------------------
//                                2sigma^2
//
// kernel(ifstream &f) - construct kernel object from file desc.
//
// kernel(kernel &k) - copy constructor.
//
// A simple destructor is provided to ensure that the data arrays are
// properly deallocated.
//
// ~kernel() - destructor. Deallocates data arrays.
//
// Several utility functions are provided:
//
// create(float ssigma, WORD ww, WORD hh) - calculates kernel.
//   arguments - ssigma (float) sigma used to calculate kernel
//               ww, hh (WORD) dimensions of kernel array.
//   side effects - allocates w * h array of float
//                  allocates w * h array of char
//   return - 0 (int) on success
//           -1 (int) on memory allocation fail
//
// load(ifstream &f) - reads kernel from file f.
//   arguments - f (ifstream &) input file stream.
//   side effects - allocates w * h array of float
//                  allocates w * h array of char
//   return - 0 (int) on success
//           -1 (int) on memory allocation fail
//           -1 (int) on file access error
//
// save(ofstream &f) - save kernel to file.
//   arguments - f (ofstream &) output file stream.
//   return - 0 (int) on success
//           -1 (int) on file access error
//
// normalize() - normalize float values in kernel
//   arguments - none.
//   return - 0 (int) in all cases.
//
// max_out() - returns sum of positive values in binary array
//   arguments - none.
//
// min_out() - returns sum of negative values in binary array
//   arguments - none.
//
// width() - return (WORD) dimension w
//   arguments - none.
//
// height() - return (WORD) dimension h
//   arguments - none.
//
// Sigma() - return (float) value of sigma
//   arguments - none.
//
// pfdata() - return pointer (float *) to float kernel array
//   arguments - none.
//
// pbdata() - return pointer (char *) to binary kernel array
//   arguments - none.
//
// private functions called from constructors ONLY
//
// kernel::calibrate() calibrates board to current kernel
//   arguments - none.
//   returns - void.
//
// kernel::seek_max() adjusts gain pot
//   arguments - none.
//   returns - void.
//
// kernel::seek_zero() adjusts zero pot
//   arguments - none.
//   returns - void.
//
//////////////////////////////////////////////////////////////////
#include <math.h>
#include <float.h>
#include <string.h>     // memcpy, memset
#include <dos.h>

#include "always.h"
#include "kernel.h"
#include "board.h"
#include "pot.h"

pot gain_pot(GAIN);
pot zero_pot(ZERO);

// Constructors //////////////////////////////////////////
//
// kernel() - generic constructor. No memory allocated, no
//            calculation.
//            Provided for completeness.
//
kernel::kernel()
{
    fdata = NULL;
    bdata = NULL;
    mina = maxa = NULL;
    w = h = 0;
    sigma = 0.0;
}
//
// kernel(ifstream &f) - construct kernel object from file.
//
kernel::kernel(ifstream &f)
{
    mina = maxa = NULL;

    f >> w >> h >> sigma;                   // read header

    float max = -1000.0;
    int size = w * h;

    fdata = new float [size];               // allocate space
    bdata = new char [size];
    for (int i = 0; i < size && f.good() && !f.eof(); i++)
    {
        f >> fdata[i];                      // read float values from file
        if (max <= fabs(fdata[i]))
            max = fabs(fdata[i]);           // track abs max for normalization
    }

    for (int l = 0; l < size; l++)          // fill binary array with values
        bdata[l] = 127.0 * fdata[l]/max;    // normalized from float values
}
//
//
// kernel(float ssigma, WORD ww, WORD hh) - constructor that
//            calculates a kernel of given dimensions from given
//            sigma using the formula from Meng & Li
//
//            -[(x^2 + y^2) - 2sigma^2] e^-[(x^2 + y^2)/(2sigma^2)]
//   k(x,y) = -----------------------------------------------------
//                                2sigma^2
//
kernel::kernel(float ssigma, WORD ww, WORD hh)
{
    int xindex, yindex;
    float image, image1;

    mina = maxa = NULL;

    sigma = ssigma;                 // set object values from args
    w = ww;
    h = hh;

    fdata = new float[w * h];       // allocate data arrays
    bdata = new char[w * h];

    xindex = (w - 1)/2;             // create indexing centers
    yindex = (h - 1)/2;

    for (int i = -xindex; i < xindex + 1; i++)
    {
        for (int j = -yindex; j < yindex + 1; j++)
        {
            image = exp(-(float)(i * i + j * j)/(2 * sigma * sigma));
            image1 = ((float)(i * i + j * j) - 2 * sigma * sigma) * image;
            image = image1/(2 * sigma * sigma);
            // row-major storage: row j, column i (stays in bounds when w != h)
            fdata[((j + yindex) * w) + (i + xindex)] = -image;
        }
    }

    float max = -1000.0;            // find max absolute value in kernel
    xindex = w * h;                 // for normalization
    for (int k = 0; k < xindex; k++)
        if (max <= fabs(fdata[k]))
            max = fabs(fdata[k]);

    float factor = 127/max;         // fill binary array with values
    for (int l = 0; l < xindex; l++)    // normalized from float array
        bdata[l] = fdata[l] * factor;
}
//
// Copy constructor
//
kernel::kernel(kernel &k)
{
    mina = maxa = NULL;

    h = k.height();                 // set dimensions and sigma from source
    w = k.width();
    sigma = k.Sigma();

    fdata = new float [w * h];      // allocate data arrays
    bdata = new char [w * h];

    memcpy(fdata, k.pfdata(), (w * h) * sizeof(float));     // copy
    memcpy(bdata, k.pbdata(), (w * h) * sizeof(char));      // arrays
}
//
// Destructor ///////////////////////////////////////////////////
kernel::~kernel()
{
    if (fdata) delete [] fdata;     // deallocate data arrays
    if (bdata) delete [] bdata;     // (array form of delete matches new [])
    if (mina) delete [] mina;
    if (maxa) delete [] maxa;
}
//
// Utility functions /////////////////////////////////////////////
//
// create(float ssigma, WORD ww, WORD hh) - calculates a kernel
//            of given dimensions from given sigma using the
//            formula from Meng & Li
//
//            -[(x^2 + y^2) - 2sigma^2] e^-[(x^2 + y^2)/(2sigma^2)]
//   k(x,y) = -----------------------------------------------------
//                                2sigma^2
//
//   arguments - ssigma (float) sigma value used to calculate kernel
//               ww, hh (WORD) dimensions of kernel array.
//
// ***WARNING*** assumes data pointers are NULL! ***WARNING***
//
//   side effects - allocates w * h array of float
//                  allocates w * h array of char
//
// ***WARNING*** assumes data pointers are NULL! ***WARNING***
//
//   return - 0 (int) on success
//           -1 (int) on memory allocation fail
//
int kernel::create(float ssigma, WORD ww, WORD hh)
{
    int xindex, yindex;
    float image, image1;

    sigma = ssigma;
    w = ww;
    h = hh;

    fdata = new float[w * h];       // allocate space
    bdata = new char[w * h];

    if (!fdata || !bdata)           // check for allocation error
        return -1;

    // use calculation from Meng/Li routine
    xindex = (w - 1)/2;
    yindex = (h - 1)/2;

    for (int i = -xindex; i < xindex + 1; i++)
    {
        for (int j = -yindex; j < yindex + 1; j++)
        {
            image = exp(-(float)(i * i + j * j)/(2 * sigma * sigma));
            image1 = ((float)(i * i + j * j) - 2 * sigma * sigma) * image;
            image = image1/(2 * sigma * sigma);
            // row-major storage: row j, column i (stays in bounds when w != h)
            fdata[((j + yindex) * w) + (i + xindex)] = -image;
        }
    }

    float max = 0.0;                // find absolute max for normalization
    xindex = w * h;
    for (int k = 0; k < xindex; k++)
        if (max <= fabs(fdata[k]))
            max = fabs(fdata[k]);

    float factor = 127/max;         // fill binary array with values
    for (int l = 0; l < xindex; l++)    // normalized from float array
        bdata[l] = fdata[l] * factor;

    return 0;
}
//
//
// normalize() - normalize float values in kernel
//   arguments - none.
//   return - 0 (int) in all cases.
//
int kernel::normalize()
{
    float max = -1000.0, min = 1000.0;
    int size = w * h;

    for (int k = 0; k < size; k++)  // find min and max
    {
        if (max <= fdata[k])
            max = fdata[k];
        if (min >= fdata[k])
            min = fdata[k];
    }

    // normalize float data
    float span = max - min;
    for (int l = 0; l < size; l++)
        fdata[l] = (fdata[l] - min)/span + min/span;

    return 0;
}
//
// load(ifstream &f) - reads kernel from file f.
//   arguments - f (ifstream &) input file stream.
//   side effects - allocates w * h array of float
//                  allocates w * h array of char
//   return - 0 (int) on success
//           -1 (int) on memory allocation fail
//           -1 (int) on file access error
//
int kernel::load(ifstream &f)
{
    f >> w >> h >> sigma;           // read header

    int size = w * h;
    int i = 0;

    if (fdata) delete [] fdata;     // deallocate existing arrays
    if (bdata) delete [] bdata;     // if any

    fdata = new float [size];       // allocate new arrays
    bdata = new char [size];

    if (!fdata || !bdata)           // if allocate failed
        return -1;                  // return -1

    while (i < size && f.good() && !f.eof())    // read float values,
        f >> fdata[i++];                        // never past the arrays

    if (f.bad() && !f.eof())        // if there was a file error
        return -1;                  // return -1

    float max = 0.0;
    int xindex = w * h;

    for (int k = 0; k < xindex; k++)    // find absolute max kernel value
        if (max <= fabs(fdata[k]))      // for normalization
            max = fabs(fdata[k]);

    float factor = 127/max;         // fill binary array with values
    for (int l = 0; l < xindex; l++)    // normalized from float array
        bdata[l] = fdata[l] * factor;   // (same scaling as create())

    return 0;
}
//
// save(ofstream &f) - save kernel to file.
//   arguments - f (ofstream &) output file stream.
//   return - 0 (int) on success
//           -1 (int) on file access error
//
int kernel::save(ofstream &f)
{
    // write header; fields are separated so the file can be read back
    f << w << " " << h << " " << sigma << "\n";

    int size = w * h;               // set counter
    int i = 0;                      // and index

    while (size-- && f.good())      // write float values &
        f << fdata[i++] << " ";     // check for file errors

    if (f.bad())                    // if there was an error
        return -1;                  // return -1
    else
        return 0;                   // otherwise return 0;
}
//
// max_out() - returns sum of positive values in binary array
//   arguments - none.
//
int kernel::max_out()
{
    int retval = 0;

    for (int i = 0, x = w * h; i < x; i++)
        if (bdata[i] >= 0)
            retval += bdata[i];

    return (retval);
}
//
// min_out() - returns sum of negative values in binary array
//   arguments - none.
//
int kernel::min_out()
{
    int retval = 0;

    for (int i = 0, x = w * h; i < x; i++)
        if (bdata[i] <= 0)
            retval += bdata[i];

    return (retval);
}
//
// void kernel::calibrate()
//   args - none
//   return - void
//
// Purpose: calibrate convolution accelerator board for maximum
//          headroom for current kernel
//
void kernel::calibrate()
{
    cout << "Calibrating\n";

    int x = w * h;
    if (mina) delete [] mina;
    if (maxa) delete [] maxa;

    // create arrays that will produce min and max out for given kernel
    maxa = new BYTE [x]; mina = new BYTE [x];   // allocate
    memset(maxa, 0, x); memset(mina, 0, x);     // set to zero

    for (int i = 0; i < x; i++)
    {
        if (bdata[i] < 0)           // if kernel value negative
            mina[i] = 0xFF;         // set min array value to max pixel
        if (bdata[i] > 0)           // if kernel value positive
            maxa[i] = 0xFF;         // set max array value to max pixel
    }

    // make up to 512 passes at cal; repeat while either adjustment
    // reports failure (non-zero return), assuming seek_max() follows
    // the same 0-on-success convention as seek_zero()
    int limit = 512;
    while ((seek_zero() || seek_max()) && --limit);
}
//
// kernel::seek_zero() adjusts offset pot
// arguments - none.
// returns - 0 if achieves zero setting
//          -1 if zero setting not made in 384 tries
//
int kernel::seek_zero()
{
    cout << "Z";                        // report adjusting offset
    int z = 384;                        // set iteration limit
    static int lastmove = 0;            // store dir of last adjustment
    BYTE last_read = 0xFF, reading = 0xFF;  // set some starting values
    set_data(mina);             // fill image DACs with minimizing image frag
    while (z--)
    {
        settle();                       // let DAC array settle
        reading = read_adc();           // check value
        if (reading == 0 && last_read == 0 && lastmove == 1)
        {   // with these conditions, offset is BELOW 0
            ++zero_pot;                 // so move adjustment UP
            last_read = reading;
            continue;                   // and try again
        }
        if (reading == 0 && last_read != 0 && lastmove == 1)
        {   // with these conditions,
            ++zero_pot;
            last_read = reading;
            continue;
        }
        if (reading != 0 && last_read == 0 && lastmove == 1)
        {   // these conditions indicate last adjustment was it
            --zero_pot;                 // move back down by one
            last_read = reading;
            lastmove = -1;              // change direction indicator
            continue;                   // try again
        }
        if (reading != 0 && last_read != 0 && lastmove == 1)
        {   // moving in wrong direction!
            --zero_pot;                 // move down
            last_read = reading;
            lastmove = -1;              // change direction indicator
            continue;                   // try again
        }
        if (reading == 0 && last_read == 0 && lastmove == -1)
        {   // conditions indicate last adjustment was the one
            ++zero_pot;                 // move back up by one
            last_read = reading;
            lastmove = 1;               // reverse direction indicator
            continue;                   // try again
        }
        if (reading == 0 && last_read != 0 && lastmove == -1)
            return 0;                   // GOT IT!
        if (reading != 0 && last_read == 0 && lastmove == -1)
        {   // odd reading
            --zero_pot;                 // keep moving down
            last_read = reading;
            continue;                   // try again
        }
        if (reading != 0 && last_read != 0 && lastmove == -1)
        {   // Too high
            --zero_pot;                 // move down
            last_read = reading;
            continue;                   // try again
        }
    }
    return -1;
}
//
// kernel::seek_max() adjusts gain pot
// arguments - none.
// returns - 0 if achieves max setting
//          -1 if max setting not made in 384 tries
//
int kernel::seek_max()
{
    cout << "G";                        // report that we are adjusting gain
    int z = 384;                        // set iteration limit
    static int lastmove = 0;            // store direction of last adjustment
    BYTE last_read = 0, reading = 0;    // set some starting values
    set_data(maxa);             // fill image DACs with maximizing image fragment
    while (z--)
    {
        settle();                       // let DAC array settle
        reading = read_adc();           // check value
        if (reading == 0xFF && last_read == 0xFF && lastmove == 1)
        {   // Too high
            --gain_pot;                 // move down
            last_read = reading;
            continue;                   // try again
        }
        if (reading == 0xFF && last_read != 0xFF && lastmove == 1)
        {
            --gain_pot;
            last_read = reading;
            continue;
        }
        if (reading != 0xFF && last_read == 0xFF && lastmove == 1)
        {
            ++gain_pot;
            last_read = reading;
            lastmove = -1;
            continue;
        }
        if (reading != 0xFF && last_read != 0xFF && lastmove == 1)
        {
            ++gain_pot;
            last_read = reading;
            lastmove = -1;
            continue;
        }
        if (reading == 0xFF && last_read == 0xFF && lastmove == -1)
        {   // Too high
            --gain_pot;                 // keep moving down
            last_read = reading;
            lastmove = 1;
            continue;
        }
        if (reading == 0xFF && last_read != 0xFF && lastmove == -1)
            return 0;                   // GOT IT!
        if (reading != 0xFF && last_read == 0xFF && lastmove == -1)
        {   //
            ++gain_pot;
            last_read = reading;
            continue;
        }
        if (reading != 0xFF && last_read != 0xFF && lastmove == -1)
        {   //
            ++gain_pot;
            last_read = reading;
            continue;
        }
    }
    return -1;
}
//
// kernel::set_data(BYTE *d)
// args - d (BYTE *)
// return - void
//
// loads image DACs from array pointed to by arg
//
void kernel::set_data(BYTE *d)
{
    outp(COUNTER, 0x00);        // set counter to point to image DACs
    for (int i = 0, j = (w * h)/2; i < j; i++)
        outpw(DACS,((int*)d)[i]);
}
//
// kernel::read_adc()
// args - none
// return - ADC reading
//
// convert and return value on ADC input
//
BYTE kernel::read_adc()
{
    outp(ADC,0);                        // start conversion
    settle();
    while (inp(COUNTER) & 0x80);        // wait for ADC status bit to clear
    return inp(ADC);
}
void kernel::settle()
{
    int a = 10000;                      // waste time
    while(a--);
}
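// A simplified sketch of the hunting search that seek_zero() performs,
// run against a simulated board rather than real I/O ports. The function
// seek_zero_sim and the read_adc callback are hypothetical stand-ins; the
// eight-branch state table of the original is collapsed here into "raise
// the offset while the ADC reads 0, lower it otherwise, and stop on the
// first zero reading reached while moving down".

```cpp
#include <cassert>
#include <functional>

// read_adc(pot) models the ADC reading (0..255) that an offset-pot
// setting would produce. Returns the setting found, or -1 if none is
// found within `tries` adjustments.
int seek_zero_sim(const std::function<int(int)>& read_adc,
                  int pot, int tries = 384)
{
    assert(tries > 0);
    int lastmove = 1;                   // direction of last adjustment
    int last_read = 0xFF;               // previous ADC reading
    while (tries--) {
        int reading = read_adc(pot);
        // Zero reached from above: this is the lowest non-clipping setting.
        if (reading == 0 && last_read != 0 && lastmove == -1)
            return pot;
        if (reading == 0) { ++pot; lastmove = 1; }   // at or below zero: move up
        else              { --pot; lastmove = -1; }  // above zero: move down
        last_read = reading;
    }
    return -1;                          // no setting found
}
```

// With a model in which the reading is max(0, pot - 100), the search
// converges on 100 whether it starts below or above that setting.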
/////////////////////////////////////////////////////////////////////
//
// File: IMAGE.H
// Author: Donald A. Symes
// Purpose: Definition for image class
//
// The image object holds an image, creates an optional border
// and performs convolution from a given kernel and a source image.
// The border is required to prevent aliasing at edges of image. The
// border is filled with copies of the edge pixels.
//
// The image is read from or stored in a text file with the following
// format -
//
// w h z              w (int) width of image in pixels
// i i i i i i ...    h (int) height in pixels
// i i i i i          z (int) maximum pixel value in image
// :                    (defaulted to 255)
// :                  i (int) pixels
//
// The data members are:
//
// w and h - WORD - width and height of image in pixels (bytes)
// z - WORD - maximum pixel value (defaulted to 255)
// b - WORD - width of border around active portion of image
// bdata - BYTE* - pointer to image data array
//
// The member functions are all public. Several constructors are
// provided:
//
// image() - generic constructor. No allocation.
//
// image(WORD width, WORD height, WORD border)
// side effects - allocates data array from given dimensions
// and border, fills array with zeroes
//
// image(ifstream &f, WORD border = 0) - reads image from file
// side effects - allocates data array
//
// image(image &k) - copy constructor
//
// A simple destructor guarantees orderly deallocation of image array
//
// ~image();
//
// Several utility functions are provided:
//
// load(ifstream &f, WORD border = 0) - read image from file
// args - f (ifstream &) input file stream
// border (WORD) border size
// side effects - allocates data array (w + border) * (h + border)
// return - (int) -1 on allocation error
// 0 on no error
//
// save(ofstream &f) - write image to file
// args - f (ofstream &) output file stream
// return - (int) -1 on file error
// 0 on no error
//
// width() - return (WORD) width of active image (except border)
// height() - return (WORD) height of active image (except border)
// max() - return (WORD) maximum pixel value
// bord() - return (WORD) border width
// pbdata() - return (BYTE *) pointer to image array
//
// display(WORD x,WORD y) - display image (active area) at given
// coordinates. Does NOT perform bounds checking - image
// position is responsibility of programmer.
// args - x, y (WORD) pixel coordinates for upper left corner
// of image.
// return - void
//
// There are three versions of convolution routine.
//
// conv_float(const kernel& k, const image& i) - FP version
// performs all calculations in floating point, for comparison.
// conv_int (const kernel& k, const image& i) - integer version
// performs calculations using long integers, for comparison.
// conv_accel(const kernel& k, const image& i) - hardware accel
// version. Perform calculation using accelerator board.
//
// Args - k (kernel &) kernel for convolution
// i (image &) image to be convolved
// side effects - allocate and delete kernel-sized array and
// small array of pointers
// return - (int) -1 on allocation error
//
// For comparison of efficiency, several convolution routines
// use different methods to access the accelerator board.
//
// hw_conv(const image &I, BYTE *kernel) - 'C++' language version
// very slow, used DMA, obsolete.
//
// hw_conv_8_noblk(const image &I) - 8-bit transfers direct from
// image array. Slowest of ASM versions.
//
// hw_conv_8_block(const image &I) - 8-bit transfers using
// intermediate buffer.
//
// hw_conv_a(const image &I, BYTE *kernel) - Uses assembly code for
// bulk of convolution. 16-bit transfers direct from image array
//
// hw_conv_16_block(const image &I, const kernel &k) - 16-bit xfers
// using intermediate buffer. Fastest of these versions.
//
/////////////////////////////////////////////////////////////////////
#include <iostream.h>
#include <fstream.h>
#ifdef UICPP
extern int attached;
#endif
#ifndef UICPP
static int attached = 0;    // tracks state of display initialization
#endif
class image
{
    WORD w, h, z, b;        // width, height, maxpixel and border width
    BYTE *bdata;            // pointer to data array
public:
    // constructors
    image();                                        // generic
    image(WORD width, WORD height, WORD border);    // blank
    image(ifstream &f, WORD border = 0);            // read from file
    image(image &k);                                // copy
    ~image();                                       // destructor
    // Utility functions
    int load(ifstream &f, WORD border = 0);
    int save(ofstream &f);
    WORD width() { return w; }
    WORD height() { return h; }
    WORD max() { return z; }
    WORD bord() { return b; }
    BYTE *pbdata() { return bdata; }
    int conv_float(kernel& k, image& i);
    int conv_int (kernel& k, image& i);
    int conv_accel(kernel& k, image& i);
    void display(WORD x,WORD y);
    //
    // Obsolete (slow) convolution functions
    //
    // hw_conv(const image &I, BYTE *kernel);
    // hw_conv_8_noblk(const image &I);
    // hw_conv_8_block(const image &I);
    // hw_conv_a(const image &I, BYTE *kernel);
    // hw_conv_16_block(const image &I, const kernel &k);
};
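// A minimal sketch of a reader for the text image format described in
// the IMAGE.H header ("w h z" followed by w * h integer pixel values),
// written against standard C++ streams. TextImage and read_text_image
// are hypothetical names, and the failure checks are assumptions about
// reasonable behaviour rather than what load() does.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

struct TextImage {
    int w, h, z;                        // width, height, maximum pixel value
    std::vector<unsigned char> pixels;  // row-major, no border
};

// Returns true if a complete image was read, false on a short or bad file.
bool read_text_image(std::istream& in, TextImage& img)
{
    if (!(in >> img.w >> img.h >> img.z)) return false;     // header
    if (img.w <= 0 || img.h <= 0) return false;
    img.pixels.assign(static_cast<std::size_t>(img.w) * img.h, 0);
    for (std::size_t i = 0; i < img.pixels.size(); ++i) {
        int v;
        if (!(in >> v)) return false;                       // truncated data
        img.pixels[i] = static_cast<unsigned char>(v);
    }
    return true;
}
```

// Reading through an int, as load() does, matters here: extracting
// directly into a BYTE would read single characters instead of numbers.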
/////////////////////////////////////////////////////////////////////
//
// File: IMAGE.CPP
// Author: Donald A. Symes
// Purpose: Definition for image class member functions
//
// The member functions are all public. Several constructors are
// provided:
//
// image() - generic constructor. No allocation.
//
// image(WORD width, WORD height, WORD border)
// side effects - allocates data array from given dimensions
// and border, fills array with zeroes
//
// image(ifstream &f, WORD border = 0) - reads image from file
// side effects - allocates data array
//
// image(image &k) - copy constructor
//
// A simple destructor guarantees orderly deallocation of image array
//
// ~image();
//
// Several utility functions are provided:
//
// load(ifstream &f, WORD border = 0) - read image from file
// args - f (ifstream &) input file stream
// border (WORD) border size
// side effects - allocates data array (w + border) * (h + border)
// return - (int) -1 on allocation error
// 0 on no error
//
// save(ofstream &f) - write image to file
// args - f (ofstream &) output file stream
// return - (int) -1 on file error
// 0 on no error
//
// width() - return (WORD) width of active image (except border)
// height() - return (WORD) height of active image (except border)
// max() - return (WORD) maximum pixel value
// bord() - return (WORD) border width
// pbdata() - return (BYTE *) pointer to image array
//
// display(WORD x,WORD y) - display image (active area) at given
// coordinates
//
// There are three versions of convolution routine.
//
// conv_float(const kernel& k, const image& i) - FP version
// performs all calculations in floating point, for comparison.
// conv_int (const kernel& k, const image& i) - integer version
// performs calculations using long integers, for comparison.
// conv_accel(const kernel& k, const image& i) - hardware accel
// version. Perform calculation using accelerator board.
//
// Args - k (kernel &) kernel for convolution
// i (image &) image to be convolved
// side effects - allocate and delete kernel-sized array and
// small array of pointers
// return - (int) -1 on allocation error
//
//
/////////////////////////////////////////////////////////////////////
#include <iostream.h>
#include <math.h>
#include <float.h>
#include "always.h"
#include "kernel.h"
#include "image.h"
#include "board.h"
extern "C" {
#include "pcdip.h"
}
// Constructors /////////////////////////////////////////////////////
//
// image() - generic constructor. No allocation.
//
image::image()
{
    bdata = NULL;
    w = h = z = b = 0;
    // cout << "image::image()\n";
}
//
// image(ifstream &f, WORD border = 0) - reads image from file
// side effects - allocates data array
//
image::image(ifstream &f, WORD border)
{
    f >> w >> h >> z;                   // read header
    b = border;
    int skip = border * 2;              // set some indexing constants
    int height = h + border;
    int width = w + border;
    int row = w + skip;
    int read_data, i, j;
    bdata = new BYTE[(w + skip) * (h + skip)];  // alloc image space
    memset(bdata,0,(w+skip)*(h+skip));          // and set it to zero
    for (i = border; i < height; i++)
    {
        for (j = border; j < width; j++)
        {
            f >> read_data;                     // read pixels into active
            bdata[(i * row) + j] = read_data;   // part of array
        }
    }
    if (border) // if there is a border, copy pixels from edge of active
    {           // part of array to fill border
        height += border - 1;           // index of last row, border included
        for (int k = border; k > 0; k--)
        {
            memcpy(&(bdata[(k - 1) * row]), &(bdata[k * row]), row);
            memcpy(&(bdata[(height - k + 1) * row]),
                   &(bdata[(height - k) * row]), row);
        }
        int stop = h + skip;            // total rows, border included
        int ndx;
        for (int m = 0; m < stop; m++)
        {
            ndx = m * row;
            for (int n = 0; n < border; n++)
            {
                bdata[ndx + border - n - 1] = bdata[ndx + border];
                bdata[ndx + width + n] = bdata[ndx + width - 1];
            }
        }
    }
}
//
// image(WORD width, WORD height, WORD border)
// side effects - allocates data array from given dimensions
// and border, fills array with zeroes
//
image::image(WORD ww, WORD hh, WORD border)
{
    z = 255; w = ww; h = hh; b = border;
    // the border surrounds the image, so each dimension grows by 2 * border
    bdata = new BYTE [(w + 2 * border) * (h + 2 * border)];
    memset(bdata,0,(w + 2 * border) * (h + 2 * border));
}
//
// image(image &k) - copy constructor
//
//
image::image(image &k)
{
    h = k.height(); w = k.width(); z = k.max(); b = k.bord();
    bdata = new BYTE [(w + 2 * b) * (h + 2 * b)];
    memcpy(bdata,k.pbdata(),(w + 2 * b) * (h + 2 * b) * sizeof(char));
}
//
// destructor
//
image::~image()
{
    if (bdata) delete [] bdata;
}
//
//
///////////////////////////////////////////////////////////////////
//
// Utility functions
//
///////////////////////////////////////////////////////////////////
//
// load(ifstream &f, WORD border = 0) - read image from file
// args - f (ifstream &) input file stream
// border (WORD) border size
// side effects - allocates data array (w + border) * (h + border)
// return - (int) -1 on allocation error
// 0 on no error
//
int image::load(ifstream &f, WORD border)
{
    f >> w >> h >> z;                   // read header
    b = border;
    int skip = border * 2;              // set up some indexing constants
    int height = h + border;
    int width = w + border;
    int row = w + skip;
    int read_data;
    // allocate data array, return -1 if allocation fails
    if ((bdata = new BYTE[(w + skip) * (h + skip)]) == NULL)
        return -1;
    for (int i = border; i < height; i++)   // read data into active
    {                                       // part of array
        for (int j = border; j < width; j++)
        {
            f >> read_data;
            bdata[(i * row) + j] = read_data;
        }
    }
    if (border) // if there is a border, fill it with pixels from
    {           // edge of active part of array
        int last = height + border - 1; // index of last row, border included
        for (int k = border; k > 0; k--)
        {   // copies top & bottom row into top & bottom border
            memcpy(&(bdata[(k - 1) * row]), &(bdata[k * row]), row);
            memcpy(&(bdata[(last - k + 1) * row]),
                   &(bdata[(last - k) * row]), row);
        }
        int stop = h + skip;            // create indexing constants
        int ndx;
        for (int m = 0; m < stop; m++)  // now copy left & right edge
        {                               // columns to left & right borders
            ndx = m * row;
            for (int n = 0; n < border; n++)
            {
                bdata[ndx + border - n - 1] = bdata[ndx + border];
                bdata[ndx + width + n] = bdata[ndx + width - 1];
            }
        }
    }
    return 0;
}
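// A compact sketch of the border fill, assuming the goal stated in the
// IMAGE.H header: surround the active image with copies of its edge
// pixels. Instead of the row memcpy and column passes used in load(),
// this version clamps each output coordinate to the nearest active
// pixel, which gives the same replicated border. The name pad_replicate
// and the std::vector interface are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// src is a w x h row-major image; the result is (w + 2b) x (h + 2b).
std::vector<unsigned char> pad_replicate(const std::vector<unsigned char>& src,
                                         int w, int h, int b)
{
    assert(w > 0 && h > 0 && b >= 0);
    assert(src.size() == static_cast<std::size_t>(w) * h);
    const int row = w + 2 * b;
    std::vector<unsigned char> out(static_cast<std::size_t>(row) * (h + 2 * b));
    for (int y = 0; y < h + 2 * b; ++y) {
        const int sy = std::min(std::max(y - b, 0), h - 1);     // clamp row
        for (int x = 0; x < row; ++x) {
            const int sx = std::min(std::max(x - b, 0), w - 1); // clamp column
            out[y * row + x] = src[sy * w + sx];
        }
    }
    return out;
}
```

// For a 2 x 2 image {1, 2, 3, 4} and a border of 1, the 4 x 4 result has
// rows 1 1 2 2 / 1 1 2 2 / 3 3 4 4 / 3 3 4 4.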
//
// save(ofstream &f) - write image to file
// args - f (ofstream &) output file stream
// return - (int) -1 on file error
// 0 on no error
int image::save(ofstream &f)
{
    int x_ndx;
    int y_ndx = b * (w + (2 * b));      // set up indexing constants
    int x_max = w + b;                  // and variables
    int y_inc = x_max + b;
    int stop = y_inc * (h + b);
    f << w << ' ' << h << ' ' << z << '\n'; // header, as load() expects
    // write image to file
    for (; y_ndx < stop && !f.bad(); y_ndx += y_inc)
    {
        for (x_ndx = b; x_ndx < x_max && !f.bad(); x_ndx++)
            f << (int)bdata[y_ndx + x_ndx] << ' ';  // pixels as integers
        f << '\n';
    }
    if (f.bad())
    {
        cerr << "Error saving file\n";
        return -1;
    }
    return 0;
}
//
// conv_float(const kernel& k, const image& i)
// args - k (kernel &) kernel to be used in convolution
// i (image &) image for convolution source
//
// Perform convolution using floating point accumulator - allows
// maximum dynamic range so sum won't overflow
//
int image::conv_float(kernel& k, image& i)
{
    int ksize = k.width();              // set up indexing constants
    int width = i.width();
    int height = i.height();
    int skip = ksize - 1;
    float max = -1000.0, min = 1000.0;
    float *ker = k.pfdata();            // get pointer to FP kernel data
    float acc;                          // use float for accumulator
    int *idata = new int [(width + skip) * (height + skip)];
    if (idata == NULL)          // allocate int array for temp target image
        return -1;              // if allocation failed, return error
    BYTE **img = new BYTE *[ksize];
    if (img == NULL)            // allocate BYTE pointer array to point to source
        return -1;              // image conveniently (one pointer per row in
                                // kernel)
    img[0] = i.pbdata();        // point each at a row in source image
    for (int a = 1; a < ksize; a++)
        img[a] = img[a - 1] + width + skip;
    // this segment is the actual convolution
    for (int f = 0; f < height; f++)            // for each row
    {
        for (int c = 0; c < width; c++)         // for each column of that row
        {
            acc = 0.0;                          // clear accumulator, then
            for (int d = 0; d < ksize; d++)     // for each kernel row
            {
                for (int e = 0; e < ksize; e++) // for each kernel column
                {                               // accumulate products
                    acc = acc + (float)(ker[(d*ksize)+e] * img[d][e]);
                    e = e;                      // extra line for debug
                }
            }
            if (acc >= max)     // track maximum and minimum values
                max = acc;      // for normalization
            if (acc <= min)
                min = acc;
            // convert float result to int
            // and place in int temp image
            idata[((f + skip/2) * (width + skip)) + c + skip/2] = acc;
            for (int p = 0; p < ksize; p++)     // update source pointers
                img[p]++;
        }
        for (int q = 0; q < ksize; q++) // end of row - update source
            img[q] += skip;             // pointers to column past border
    }   // end of convolution section
    max = max - min;            // determine range of convolution output
    for (int g = 0, h = (width + skip) * (height + skip); g < h; g++)
        bdata[g] = (unsigned char)(255.0 * ((float)idata[g] - min)/max);
                                // normalize int array into BYTE array
    delete [] img;              // de-allocate local arrays
    delete [] idata;
    return 0;
}
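// A self-contained sketch of the computation conv_float() carries out:
// slide a k x k kernel over a bordered image, accumulate products in a
// float, then stretch the results onto 0..255. Like the listing, it
// applies the kernel without flipping it. The names correlate and
// to_bytes, and the std::vector interface, are assumptions made for
// illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// padded is (w + k - 1) x (h + k - 1), row-major; the result is w x h.
std::vector<float> correlate(const std::vector<unsigned char>& padded,
                             int w, int h,
                             const std::vector<float>& ker, int k)
{
    const int row = w + k - 1;
    assert(padded.size() == static_cast<std::size_t>(row) * (h + k - 1));
    assert(ker.size() == static_cast<std::size_t>(k) * k);
    std::vector<float> out(static_cast<std::size_t>(w) * h, 0.0f);
    for (int f = 0; f < h; ++f)                 // for each output row
        for (int c = 0; c < w; ++c) {           // for each output column
            float acc = 0.0f;                   // float accumulator
            for (int d = 0; d < k; ++d)         // kernel row
                for (int e = 0; e < k; ++e)     // kernel column
                    acc += ker[d * k + e] * padded[(f + d) * row + (c + e)];
            out[f * w + c] = acc;
        }
    return out;
}

// Stretch the observed [min, max] of v onto 0..255, truncating.
std::vector<unsigned char> to_bytes(const std::vector<float>& v)
{
    assert(!v.empty());
    float lo = v[0], hi = v[0];
    for (float x : v) { if (x < lo) lo = x; if (x > hi) hi = x; }
    std::vector<unsigned char> out(v.size(), 0);
    if (hi == lo) return out;                   // flat result: avoid divide by zero
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i] = static_cast<unsigned char>(255.0f * (v[i] - lo) / (hi - lo));
    return out;
}
```

// Only the w x h active results enter the normalization here, so border
// cells never influence the minimum or maximum.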
//
// conv_int (const kernel& k, const image& i)
// args - k (kernel &) kernel to be used in convolution
// i (image &) image for convolution source
//
// Perform convolution using floating point calculations.
// Convert source image and kernel data to fractions to most
// closely simulate board operation.
//
int image::conv_int (kernel& k, image& i)
{
    int ksize = k.width();              // make local indexing constants
    int width = i.width();
    int height = i.height();
    int skip = ksize - 1;
    int max = k.max_out(), min = k.min_out();   // get max range
    char *ker = k.pbdata();             // get pointer to kernel BYTE array
    float acc, kk, ii;
    BYTE **img = new BYTE *[ksize];     // allocate array of pointers
    if (img == NULL) return -1;         // if alloc fails, return error
    img[0] = i.pbdata();                // simplify addressing
    for (int a = 1; a < ksize; a++)
        img[a] = img[a - 1] + width + skip;
    // Convolution section
    for (int f = 0; f < height; f++)            // For each row,
    {
        for (int c = 0; c < width; c++)         // for each column of source image -
        {
            acc = 0.0;                          // clear accumulator
            for (int d = 0; d < ksize; d++)     // for each row of kernel
                for (int e = 0; e < ksize; e++) // for each col of kernel
                {   // convert kernel value to fraction
                    kk = (float)(ker[(d * ksize) + e])/127.0;
                    // convert image value to fraction
                    ii = (float)(img[d][e])/255.0;
                    // accumulate convolution value
                    acc = acc + (kk * ii);
                }
            // convert float convolution value to normalized BYTE value
            bdata[((f + skip/2) * (width + skip)) + c + skip/2] =
                255 * ((acc * 127.0) - min)/(max - min);
            for (int p = 0; p < ksize; p++)     // advance pointer set to next
                img[p]++;                       // source column
        }
        for (int q = 0; q < ksize; q++)         // at end of row, skip border
            img[q] += skip;
    }
    // end of convolution section
    delete [] img;                      // deallocate local array
    return 0;
}
//
// conv_accel(const kernel& k, const image& i)
//
// args - k (kernel &) kernel to be used in convolution
// i (image &) image for convolution source
//
// Perform convolution using hardware accelerator.
//
int image::conv_accel(kernel& k, image& i)
{
    int x = w, y = h, kd = k.width();   // Make local versions of
    int blk_sz = (kd * kd)/2 + 1;       // variables to simplify
    int inc_x = kd - 1;                 // (and speed up) addressing
    BYTE *dt = bdata;
    BYTE *blk = new BYTE [(kd * kd) + 1];   // Allocate buffer
    static BYTE **img = new BYTE *[kd];     // Allocate pointer array
                                            // source image
    if ((blk == NULL) || (img == NULL))     // if alloc fail, return
        return -1;                          // error
    img[0] = i.pbdata();                // set up pointer array
    for (int j = 1; j < kd; j++)        // one pointer per kern row
        img[j] = img[j - 1] + x + kd - 1;   // pointing to first N rows of image

    _asm push si                ;// preserve critical registers
    _asm push di
    _asm push es
    _asm push ds
    _asm mov cx, y              ;// set image row counter
y_loop:
    _asm push cx                ;// save image row counter
    _asm mov cx, x              ;// set image column counter
x_loop:
    _asm push cx                ;// save image column counter
    _asm mov cx, kd             ;// set buffer xfer outer loop counter
    _asm mov dx, cx             ;// make copy for inner loop counter
    _asm les di, [blk]          ;// set target pointer to buffer
    _asm mov bx, 0              ;// set buffer offset
b_loop:
    _asm push cx                ;// save outer loop counter
    _asm mov cx, dx             ;// move inner loop counter to count reg
    _asm push ds                ;// save data segment for string ops
    _asm lds si, [img]          ;// get source pointer for block row
    _asm lds si, [si + bx]      ;// from pointer array
    _asm repnz movsb            ;// copy current block row to buf
    _asm add bx, 4              ;// go to next pointer in pointer array
    _asm pop ds                 ;// get data segment back to get next ptr
    _asm pop cx                 ;// get outer loop counter
    _asm loop b_loop            ;// decrement counter, if != 0, loop

                                ;// Having copied data to buffer,
    _asm mov cx, blk_sz         ;// set buffer counter
    _asm mov dx, COUNTER        ;// point I/O address to COUNTER
    _asm xor al, al             ;// clear value in COUNTER
    _asm out dx, al

    _asm mov dx, DACS           ;// now point I/O address to DAC array
    _asm push ds                ;// save data segment for string operation
    _asm lds si, [blk]          ;// set up source pointer to buffer
    _asm repnz outsw            ;// string I/O copy buffer to board
    _asm pop ds                 ;// get data segment back

    _asm in al, dx              ;// let board settle
    _asm in al, dx
    _asm in al, dx
    _asm in al, dx
    _asm inc dx
    _asm out dx, al             ;// start converter

                                ;// increment row pointers
    _asm les di, [img]          ;// point to pointer array
    _asm mov bx, 0              ;// clear offset
    _asm mov cx, kd             ;// set counter
p_loop:
    _asm mov si, es:[di+bx]     ;// get pointer from array
    _asm inc si                 ;// increment it
    _asm mov es:[di+bx], si     ;// store it
    _asm add bx, 4              ;// go to next pointer
    _asm loop p_loop            ;// decrement counter, if != 0, loop

    _asm mov dx, COUNTER        ;// wait for ADC status bit in counter
not_done:
    _asm in al, dx              ;// get status
    _asm and al, 0x80           ;// isolate status
    _asm jnz not_done           ;// test it, and act

    _asm mov dx, ADC            ;// ADC is ready now
    _asm in al, dx              ;// read data
    _asm les di, [dt]           ;// get current image pointer
    _asm mov es:[di], al        ;// store data
    _asm inc di                 ;// increment image pointer
    _asm mov WORD PTR [dt], di  ;// and store it

    _asm pop cx                 ;// get column counter
    _asm loop x_loop            ;// decrement counter, if != 0, loop

    _asm mov cx, kd             ;// if at last column, need to skip
    _asm mov bx, 0              ;// pointer array past border
    _asm les di, [img]
p_loop2:
    _asm mov si, es:[di+bx]     ;// get pointer from array
    _asm add si, inc_x          ;// increment by 2*border to next row
    _asm mov es:[di+bx], si     ;// store updated pointer
    _asm add bx, 4              ;// point to next pointer
    _asm loop p_loop2           ;// decrement counter, if != 0, loop

    _asm mov di, WORD PTR [dt]  ;// and do the same for target image
    _asm add di, inc_x          ;// data pointer
    _asm mov WORD PTR [dt], di

    _asm pop cx                 ;// now get row counter
    _asm dec cx                 ;// decrement counter, if == 0, done
    _asm jz done                ;// this locution required because
    _asm jmp y_loop             ;// cond jumps limited to +/-128 bytes

done:                           ;// finished with convolution
    _asm pop ds                 ;// restore registers
    _asm pop es                 ;// (in reverse of the push order)
    _asm pop di
    _asm pop si

    return 0;                   // return OK
}
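// A portable sketch of the data movement inside the x_loop of
// conv_accel(): gather the kd x kd neighbourhood row by row into a
// contiguous buffer, then view that buffer as 16-bit words for the
// word-wide transfer to the DAC array. gather_block and pack_words are
// hypothetical helpers; little-endian packing and zero padding of an
// odd trailing byte are assumptions matching a 16-bit x86 host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy the kd x kd block whose top-left corner is (x, y) out of a
// row-major bordered image with row_len bytes per row.
std::vector<unsigned char> gather_block(const std::vector<unsigned char>& padded,
                                        int row_len, int x, int y, int kd)
{
    assert(kd > 0 && x >= 0 && y >= 0 && x + kd <= row_len);
    assert(static_cast<std::size_t>(y + kd) * row_len <= padded.size());
    std::vector<unsigned char> blk;
    blk.reserve(static_cast<std::size_t>(kd) * kd);
    for (int r = 0; r < kd; ++r)                // one source row per kernel row
        for (int c = 0; c < kd; ++c)
            blk.push_back(padded[(y + r) * row_len + (x + c)]);
    return blk;
}

// Pack bytes into 16-bit words, low byte first; an odd count leaves the
// high byte of the last word zero.
std::vector<std::uint16_t> pack_words(const std::vector<unsigned char>& bytes)
{
    std::vector<std::uint16_t> words((bytes.size() + 1) / 2, 0);
    for (std::size_t i = 0; i < bytes.size(); ++i)
        words[i / 2] |= static_cast<std::uint16_t>(bytes[i] << (8 * (i % 2)));
    return words;
}
```

// For an odd kernel dimension the word count (kd * kd + 1) / 2 equals the
// (kd * kd) / 2 + 1 used for blk_sz in the listing; a 3 x 3 block packs
// into 5 words.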
//
// display(WORD x,WORD y) - display image (active area) at given
// coordinates
// args - x, y (WORD) pixel coordinates for upper-left corner
// of image in this object
// return - void
//
// Displays active image area with upper left corner of image at
// coordinates (x, y). Bounds checking is NOT performed - position
// is responsibility of programmer.
//
void image::display(WORD x,WORD y)
{
    // int screen_x, screen_y;
    if (!attached)      // if image display library not initialized
    {
        asm push es                 // save critical register
        attached = pc_attach();     // initialize graphics library
        pc_background_black();      // to 320 x 200
        pc_gray_shades();           // with 64 level grey scale
        asm pop es                  // restore critical register
    }
    if (!attached)      // if graphics library init fails
        return;         // return without attempt to display
    int row = w + 2 * b;            // set an indexing constant
    for (int i = b, screen_y = y; i < h + b; i++, screen_y++)
    {   // starting from active corner
        for (int j = b, screen_x = x; j < w + b; j++, screen_x++)
        {   // display active area of image
            asm push es     // library does not preserve this register
            pc_write_pixel(screen_y,screen_x,bdata[(i*row)+j]/4);
            asm pop es      // scale pixel to 64-level grey
        }
    }
}
//
// hw_conv(const image &I, BYTE *kernel) - 'C++' language version
// very slow, used DMA, obsolete.
//
//void image::hw_conv(const image &I, BYTE *kernel)
//{
//  int block_size = kern_dim * kern_dim;
//  int counter = 0, stop = siz_x * siz_y;
//  BYTE *i_buf = new BYTE[block_size];
//  int x = kern_dim/2, y = kern_dim/2;
//
//  setup_dma(kernel,block_size,7);
//  outp(COUNTER,0x80);
//  dma_cycle_nr(7);
//
//  setup_dma(i_buf,block_size,7);
//  I.get_block(i_buf);
//
//  while(counter++ < stop)
//  {
//      dma_cycle(7);
//
//_asm mov cx, 0x00FF;   // need a short delay for board to settle
//here:
//_asm loop here;
//
//      outp(ADC,0);
//      I.get_block(NULL);
//      block_size = inp(ADC);
//      write_pixel(block_size,x,y);
//      if (++x > siz_x + kern_dim/2 - 1)
//      {
//          x = kern_dim/2;
//          y++;
//      }
//  }
//}
////
//// hw_conv_a(const image &I, BYTE *kernel) - Uses asm code for
//// bulk of convolution. 16-bit xfers direct from image array
////
//// *** OBSOLETE ***
////
//void image::hw_conv_a(const image &I, BYTE *kernel)
//{
//  int block_size = (kern_dim*kern_dim)/2+1;   // set up indexing
//  int p;                                      // constants
//  int x = (kern_dim/2)-1, y = kern_dim/2;
//  int w = siz_x + kern_dim - 1;
//  int sx = siz_x, sy = siz_y, kd = kern_dim;
//
//  static BYTE *i_buf = new BYTE [block_size]; // static arrays: transfer block
//  static BYTE **img = new BYTE *[kern_dim];   // source ptr array
//  static BYTE *dt;                            // target pointer
//
//  dt = data;                                  // set target pointer
//
//  if (!i_buf || !img)                         // if array alloc failed
//      abort();                                // die without explanation
//
//  img[0] = I.get_img_ptr();                   // set source pointers
//  for (int i = 1; i < kern_dim; i++)
//      img[i] = img[i - 1] + w;
//
//_asm push si              ;// save critical registers
//_asm push di
//_asm push bx
//
//_asm mov dx, COUNTER      ;// set counter to access kernel
//_asm mov al, 0x80
//_asm out dx, al
//
//_asm mov si, [kernel]     ;// put kernel in DACs
//_asm mov cx, block_size
//_asm mov dx, DACS
//_asm repnz outsw
//_asm mov dx, CLR_COUNT
//_asm out dx, al
//
//_asm mov cx, kd           ;// copy first block to transfer buffer
//_asm mov bx, [img]
//_asm mov di, [i_buf]
//_asm push es
//_asm push ds
//_asm pop es
//
//k_loop1:
//_asm push cx
//_asm mov cx, kd
//_asm mov si, [bx]
//_asm repnz movsb
//_asm mov [bx], si         ;// increment pointers
//_asm inc bx
//_asm inc bx
//_asm pop cx
//_asm loop k_loop1
//_asm pop es
//
//_asm mov cx, sy
//y_loop:
//_asm push cx
//_asm mov cx, sx
//
//x_loop1:
//_asm push cx
//
//_asm mov dx, ADC          ;// start ADC
//_asm out dx, al
//
//_asm mov si, [i_buf]      ;// during conversion, xfer next
//_asm mov cx, block_size   ;// block to board (overlap)
//_asm mov dx, DACS
//_asm repnz outsw
//
//_asm mov cx, kd           ;// copy next block to buffer
//_asm mov bx, [img]
//_asm mov di, [i_buf]
//_asm push es
//_asm push ds
//_asm pop es
//
//k_loop2:
//_asm push cx
//_asm mov cx, kd
//_asm mov si, [bx]
//_asm repnz movsb
//_asm mov [bx], si         ;// increment pointers
//_asm inc bx
//_asm inc bx
//_asm pop cx
//_asm loop k_loop2
//_asm pop es
//
//_asm mov dx, ADC          ;// conversion should be done now,
//_asm in al, dx            ;// get result
//
//_asm mov cl, al           ;// do some perverted and time-
//_asm mov ax, y            ;// consuming address calculation
//_asm imul w
//_asm add ax, x
//_asm mov bx, [dt]
//_asm add bx, ax
//_asm mov [bx], cl         ;// store result in target image
//
//_asm mov ax, x            ;// yet more addressing
//_asm inc ax
//_asm mov x, ax
//_asm cmp ax, w
//_asm jle same_row
//_asm mov ax, kd
//_asm shr ax, 1
//_asm mov x, ax
//_asm inc y
//
//_asm mov bx, [img]
//_asm mov cx, kd
//_asm mov ax, cx
//_asm dec ax
//k_loop3:
//_asm mov si, [bx]
//_asm add si, ax
//_asm mov [bx], si
//_asm inc bx
//_asm inc bx
//_asm loop k_loop3
//
//same_row:
//_asm pop cx
//_asm loop x_loop1
//_asm pop cx
//_asm loop y_loop
//
//_asm pop bx
//_asm pop di
//_asm pop si
//
//}
////
//// hw_conv_8_block(const image &I) - accel convolution using
//// 8-bit transfers and intermediate buffer.
////
//// *** OBSOLETE ***
////
//void image::hw_conv_8_block(const image &I)
//{
//  int blk_sz = kern_dim * kern_dim;
//  int x = siz_x, y = siz_y, kd = kern_dim;
//  int inc_x = kd - 1;
//
//  BYTE *dt = data;
//  BYTE *blk = new BYTE [blk_sz];
//  static BYTE **img = new BYTE *[kd];
//
//  img[0] = I.get_img_ptr();
//  for (int i = 1; i < kd; i++)
//      img[i] = img[i - 1] + siz_x + kd - 1;
//
//_asm push si
//_asm push di
//_asm push es
//_asm push ds
//_asm mov cx, y
//
//y_loop:
//_asm push cx
//_asm mov cx, x
//
//x_loop:
//_asm push cx
//_asm mov cx, kd
//_asm mov dx, cx
//_asm les di, [blk]
//_asm mov bx, 0
//
//b_loop:
//_asm push cx
//_asm mov cx, dx
//_asm push ds
//_asm lds si, [img]
//_asm lds si, [si + bx]
//_asm repnz movsb
//_asm add bx, 4
//_asm pop ds
//_asm pop cx
//_asm loop b_loop
//
//_asm mov cx, blk_sz
//_asm mov dx, COUNTER
//_asm xor al, al
//_asm out dx, al
//
//_asm mov dx, DACS
//_asm push ds
//_asm lds si, [blk]
//_asm repnz outsb
//_asm pop ds
//
//_asm in al, dx            ;// let board settle
//_asm in al, dx
//_asm in al, dx
//_asm in al, dx
//_asm inc dx
//_asm out dx, al           ;// start converter
//
//_asm les di, [img]
//_asm mov bx, 0
//_asm mov cx, kd
//
//p_loop:
//_asm mov si, es:[di+bx]
//_asm inc si
//_asm mov es:[di+bx], si
//_asm add bx, 4
//_asm loop p_loop
//
//_asm mov dx, COUNTER
//not_done:
//_asm in al, dx
//_asm and al, 0x80
//_asm jnz not_done
//
//_asm mov dx, ADC
//_asm in al, dx
//_asm les di, [dt]
//_asm mov es:[di], al
//_asm inc di
//_asm mov WORD PTR [dt], di
//
//_asm pop cx
//_asm loop x_loop
//
//_asm mov cx, kd
//_asm mov bx, 0
//_asm les di, [img]
//
//p_loop2:
//_asm mov si, es:[di+bx]
//_asm add si, inc_x
//_asm mov es:[di+bx], si
//_asm add bx, 4
//_asm loop p_loop2
//_asm mov di, WORD PTR [dt]
//_asm add di, inc_x
//_asm mov WORD PTR [dt], di
//
//_asm pop cx
//_asm dec cx
//_asm jz done
//_asm jmp y_loop
//
//done:
//_asm pop ds
//_asm pop es
//_asm pop si
//_asm pop di
//}
////
//// hw_conv_16_block(const image *I, const kernel *k) - conv using
//// 16-bit xfers and intermediate buffer. Fastest of these versions
////
//// *** OBSOLETE ***
////
//void image::hw_conv_16_block(const image &I)
//{
//    int blk_sz = (kern_dim * kern_dim)/2 + 1;
//    int x = siz_x, y = siz_y, kd = kern_dim;
//    int inc_x = kd - 1;
//
//    BYTE *dt = data;
//    BYTE *blk = new BYTE [(kern_dim * kern_dim) + 1];
//    static BYTE **img = new BYTE *[kd];
//
//    img[0] = I.get_img_ptr();
//    for (int i = 1; i < kd; i++)
//        img[i] = img[i - 1] + siz_x + kd - 1;
//
//_asm      push    si
//_asm      push    di
//_asm      push    es
//_asm      push    ds
//_asm      mov     cx, y
//
//y_loop:
//_asm      push    cx
//_asm      mov     cx, x
//
//x_loop:
//_asm      push    cx
//_asm      mov     cx, kd
//_asm      mov     dx, cx
//_asm      les     di, [blk]
//_asm      mov     bx, 0
//
//b_loop:
//_asm      push    cx
//_asm      mov     cx, dx
//_asm      push    ds
//_asm      lds     si, [img]
//_asm      lds     si, [si + bx]
//_asm      repnz   movsb
//_asm      add     bx, 4
//_asm      pop     ds
//_asm      pop     cx
//_asm      loop    b_loop
//
//_asm      mov     cx, blk_sz
//_asm      mov     dx, COUNTER
//_asm      xor     al, al
//_asm      out     dx, al
//
//_asm      mov     dx, DACS
//_asm      push    ds
//_asm      lds     si, [blk]
//_asm      repnz   outsw
//_asm      pop     ds
//
//_asm      in      al, dx          ;// let board settle
//_asm      in      al, dx
//_asm      in      al, dx
//_asm      in      al, dx
//_asm      inc     dx
//_asm      out     dx, al          ;// sta...
//
//_asm      les     di, [img]
//_asm      mov     bx, 0
//_asm      mov     cx, kd
//
//p_loop:
//_asm      mov     si, es:[di+bx]
//_asm      inc     si
//_asm      mov     es:[di+bx], si
//_asm      add     bx, 4
//_asm      loop    p_loop
//
//_asm      mov     dx, COUNTER
//not_done:
//_asm      in      al, dx
//_asm      and     al, 0x80
//_asm      jnz     not_done
//
//_asm      mov     dx, ADC
//_asm      in      al, dx
//_asm      cmp     al, 128
//_asm      jae     hi
//_asm      xor     al, al
//hi:
//_asm      les     di, [dt]
//_asm      mov     es:[di], al
//_asm      inc     di
//_asm      mov     WORD PTR [dt], di
//
//_asm      pop     cx
//_asm      loop    x_loop
//
//_asm      mov     cx, kd
//_asm      mov     bx, 0
//_asm      les     di, [img]
//
//p_loop2:
//_asm      mov     si, es:[di+bx]
//_asm      add     si, inc_x
//_asm      mov     es:[di+bx], si
//_asm      add     bx, 4
//_asm      loop    p_loop2
//_asm      mov     di, WORD PTR [dt]
//_asm      add     di, inc_x
//_asm      mov     WORD PTR [dt], di
//
//_asm      pop     cx
//_asm      dec     cx
//_asm      jz      done
//_asm      jmp     y_loop
//
//done:
//_asm      pop     ds
//_asm      pop     es
//_asm      pop     si
//_asm      pop     di
//}
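
The "intermediate buffer" named in the header comment of hw_conv_16_block amounts to gathering the kd x kd window under the kernel into one contiguous block before that block is sent to the board. The following is a minimal C++ sketch of that gather step only, assuming a row-major image padded to a row length of siz_x + kd - 1 (the stride implied by the img[] set-up in the listing). The function name gather_window and all of its parameters are hypothetical and do not appear in the thesis code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the thesis listing): copy the kd x kd window
// whose top-left corner is (x, y) out of a padded, row-major image into one
// contiguous block, one row segment at a time.
std::vector<std::uint8_t> gather_window(const std::vector<std::uint8_t>& padded,
                                        std::size_t stride,  // padded row length
                                        std::size_t x, std::size_t y,
                                        std::size_t kd)
{
    std::vector<std::uint8_t> blk;
    blk.reserve(kd * kd);
    for (std::size_t row = 0; row < kd; ++row) {
        // Offset of the first pixel of this window row in the padded image.
        const std::ptrdiff_t start =
            static_cast<std::ptrdiff_t>((y + row) * stride + x);
        blk.insert(blk.end(),
                   padded.begin() + start,
                   padded.begin() + start + static_cast<std::ptrdiff_t>(kd));
    }
    return blk;
}
```

In the listing the same effect appears to be obtained by keeping one pointer per kernel row and advancing every pointer after each output pixel, which avoids recomputing the row offsets.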
////
//// hw_conv_8_noblk(const image *I) - using 8-bit transfers
//// direct from image array. Slowest of ASM versions.
////
//void image::hw_conv_8_noblk(const image &I)
//{
//    int blk_sz = kern_dim * kern_dim;
//    int x = siz_x, y = siz_y, kd = kern_dim;
//    int inc_x = kd - 1;
//    int size = ((siz_x + kd - 1) * (siz_y + kd - 1))/2;
//
//    BYTE *dt = data;
//    BYTE *blk = new BYTE [blk_sz];
//    static BYTE **img = new BYTE *[kd];
//
//    img[0] = I.get_img_ptr();
//    for (int i = 1; i < kd; i++)
//        img[i] = img[i - 1] + siz_x + kd - 1;
//
//_asm      push    si
//_asm      push    di
//_asm      push    es
//_asm      push    ds
//
//_asm      les     di, [img]
//_asm      lds     si, es:[di]
//_asm      push    ds
//_asm      push    si
//_asm      pop     di
//_asm      pop     es
//_asm      mov     cx, size
//_asm      repnz   movsw
//
//_asm      pop     ds
//_asm      push    ds
//_asm      mov     cx, y
//_asm      dec     cx
//
//y_loop:
//_asm      push    cx
//_asm      mov     cx, x
//
//x_loop:
//_asm      push    cx
//_asm      mov     cx, kd
//_asm      mov     di, cx
//_asm      les     bx, [img]
//
//_asm      mov     dx, COUNTER
//_asm      xor     al, al
//_asm      out     dx, al
//_asm      mov     dx, DACS
//_asm      push    ds
//
//b_loop:
//_asm      push    cx
//_asm      mov     cx, di
//_asm      lds     si, es:[bx]
//_asm      repnz   outsb
//_asm      add     bx, 4
//_asm      pop     cx
//_asm      loop    b_loop
//
//_asm      pop     ds
//
//_asm      in      al, dx          ;// let board settle
//_asm      in      al, dx
//_asm      in      al, dx
//_asm      in      al, dx
//_asm      inc     dx
//_asm      out     dx, al          ;// sta...
//
//_asm      les     di, [img]
//_asm      mov     bx, 0
//_asm      mov     cx, kd
//
//p_loop:
//_asm      mov     si, es:[di+bx]
//_asm      inc     si
//_asm      mov     es:[di+bx], si
//_asm      add     bx, 4
//_asm      loop    p_loop
//
//_asm      mov     dx, COUNTER
//not_done:
//_asm      in      al, dx
//_asm      and     al, 0x80
//_asm      jnz     not_done
//
//_asm      mov     dx, ADC
//_asm      in      al, dx
//_asm      les     di, [dt]
//_asm      mov     es:[di], al
//_asm      inc     di
//_asm      mov     WORD PTR [dt], di
//
//_asm      pop     cx
//_asm      loop    x_loop
//
//_asm      mov     cx, kd
//_asm      mov     bx, 0
//_asm      les     di, [img]
//
//p_loop2:
//_asm      mov     si, es:[di+bx]
//_asm      add     si, inc_x
//_asm      mov     es:[di+bx], si
//_asm      add     bx, 4
//_asm      loop    p_loop2
//_asm      mov     di, WORD PTR [dt]
//_asm      add     di, inc_x
//_asm      mov     WORD PTR [dt], di
//
//_asm      pop     cx
//_asm      dec     cx
//_asm      jz      done
//_asm      jmp     y_loop
//
//done:
//_asm      pop     ds
//_asm      pop     es
//_asm      pop     si
//_asm      pop     di
//}

////////////////////////////////////////////////////////////////////
//
// File POT.H - define the class POT, which interfaces to the
//              Dallas Semiconductor Digital Potentiometers that
//              calibrate the board.
//

enum pot_t {ZERO = 0x40, GAIN = 0x80};

class pot
{
    long int value;
    int which;

    void shift_it();

public:
    pot(int which_one, long int init = 0x0080);

    void operator ++(void);
    void operator --(void);
};
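
The private shift_it() member declared above is defined in POT.CPP, where it writes the stored value to the board one bit per port write, 17 bits in all, most significant bit first. The sketch below reproduces only that bit ordering in portable C++ so that it can be checked without the hardware. It assumes each write carries a single bit obtained by masking with 0x01. The name serial_bits and its parameters are hypothetical and are not part of the thesis code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the thesis listing): list the bits in the
// order a loop of the form
//     for (i = nbits; i > 0; i--) write((value >> (i - 1)) & 0x01);
// would emit them, i.e. most significant bit first.
std::vector<int> serial_bits(std::uint32_t value, int nbits = 17)
{
    std::vector<int> bits;
    for (int i = nbits; i > 0; --i)
        bits.push_back(static_cast<int>((value >> (i - 1)) & 0x01u));
    return bits;
}
```

With the default initial value 0x0080 from the constructor declaration, the only set bit is bit 7, so it is the tenth of the 17 bits sent.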

#include <dos.h>
#include "always.h"
#include "board.h"
#include "pot.h"

pot::pot(int which_one, long int init)
{
    which = which_one;
    value = init;
    shift_it();
}

void pot::operator ++(void)
{
    if (value < 0x000FFL)
        value++;
    else
    {
        if ((value > 0x10000L) && (value < 0x1FF00L))
            value += 0x00100;
        else
        {
            if (value == 0x000FFL)
                value = 0x10000L;
            else
            {
                if (value == 0x1FF00L)
                    value = 0x00000L;
            }
        }
    }
    shift_it();
}

void pot::operator --(void)
{
    if (value < 0x000FFL)
        value--;
    else
    {
        if ((value > 0x10000L) && (value < 0x1FF00L))
            value -= 0x00100L;
        else
        {
            if (value == 0x10000L)
                value = 0x000FFL;
            else
            {
                if (value == 0x00000L)
                    value = 0x1FF00L;
            }
        }
    }
    shift_it();
}

void pot::shift_it()
{
    outp(COUNTER, which);

    for (int i = 17; i > 0; i--)
    {
        outp(POTS, ((BYTE)(value >> (i - 1)) & 0x01));
    }

    outp(COUNTER, 0);
}

////////////////////////////////////////////////////////////////////
//
// File:    UI.CPP
// Author:  Donald A. Symes
// Purpose: Operate image and kernel classes to perform convolution.
//
////////////////////////////////////////////////////////////////////

#define UICPP
#include <conio.h>
#include <iostream.h>

#include "always.h"
#include "kernel.h"
#include "image.h"

extern "C" {
#include "pcdip.h"
}

void main(int argc, char **argv)
{
    ifstream f(argv[1]);                    // open kernel file
    kernel k(f);                            // create object kernel from file
    f.close();

    f.open(argv[2]);                        // open image file
    image i(f, k.width()/2);                // create image object from file
    f.close();

    i.display(0,0);                         // display raw image

    image j(i.width(),i.height(),i.bord()); // create target image

    j.conv_float(k,i);                      // perform floating point convolution
    j.display(155,0);                       // display result

    j.conv_int(k,i);                        // use fractions this time
    j.display(0,101);                       // display that

    j.conv_accel(k,i);                      // use accelerator
    j.display(155,101);

    while(!kbhit());                        // wait for user to be done looking
    getch();
    pc_detach();                            // restore display to text mode
    argc = argc;                            // suppress warning for unused variable
}
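
The driver above runs the same kernel over the same image three ways (floating point, integer fractions, accelerator) so that the results can be compared side by side. For readers without the image class at hand, the following is a minimal, portable C++ sketch of a direct floating-point 2-D convolution that could serve as a software reference. It is not the thesis's conv_float; the name conv2d_ref, the row-major padded layout, and the flipped-kernel convention are all assumptions made for the illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference routine (not from the thesis listing): direct 2-D
// convolution of a padded, row-major image with a kd x kd kernel.
// The output is w x h; the input must be (w + kd - 1) x (h + kd - 1).
std::vector<double> conv2d_ref(const std::vector<double>& padded,
                               std::size_t w, std::size_t h,
                               const std::vector<double>& kernel,
                               std::size_t kd)
{
    const std::size_t stride = w + kd - 1;   // padded row length
    std::vector<double> out(w * h, 0.0);
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            double acc = 0.0;
            for (std::size_t j = 0; j < kd; ++j)
                for (std::size_t i = 0; i < kd; ++i)
                    // Kernel indices run backwards: convolution, not correlation.
                    acc += padded[(y + j) * stride + (x + i)]
                         * kernel[(kd - 1 - j) * kd + (kd - 1 - i)];
            out[y * w + x] = acc;
        }
    }
    return out;
}
```

Each output pixel costs kd * kd multiply-accumulate operations in this form, which is the work the multiplier array and summing amplifier are meant to perform in a single analog step.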
PERMISSION TO COPY
In presenting this thesis in partial fulfillment of the requirements for a
master's degree at Texas Tech University or Texas Tech University Health Sciences
Center, I agree that the Library and my major department shall make it freely
available for research purposes. Permission to copy this thesis for scholarly
purposes may be granted by the Director of the Library or my major professor.
It is understood that any copying or publication of this thesis for financial gain
shall not be allowed without my further written permission and that any user
may be liable for copyright infringement.
Agree (Permission is granted.)
Student's Signature Date
Disagree (Permission is not granted.)
Student's Signature Date