TWO-DIMENSIONAL IMAGE CONVOLUTION A THESIS IN …

TWO-DIMENSIONAL IMAGE CONVOLUTION

BY ANALOG COMPUTATION

by

DONALD ALLEN SYMES, B.S.T.

A THESIS

IN

COMPUTER SCIENCE

Submitted to the Graduate Faculty of Texas Tech University in

Partial Fulfillment of the Requirements for

the Degree of

MASTER OF SCIENCE

Approved

May, 1997


ACKNOWLEDGEMENTS

I wish to thank Dr. Li, Dr. Oldham, and Dr. Wunsch for their patience,

understanding and support. I also wish to thank the Office of Naval Research for

providing the funding for the research, of which the subject of this thesis is part.

Mostly, I wish to thank my beloved wife, Cathy, without whom my life would have

neither rhyme nor reason.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS ii

LIST OF TABLES v

LIST OF FIGURES vi

CHAPTER

I. INTRODUCTION 1
    1.1 Origins 1
    1.2 Current Work 2
        1.2.1 Neuromorphic systems 2
            1.2.1.1 Parallel Analog Computation 3
            1.2.1.2 Drawbacks to Analog 4
        1.2.2 Algorithmic systems 5
    1.3 This Work 6

II. MATHEMATICAL BASIS 7
    2.1 Convolution 7
        2.1.1 The Kernel 7
    2.2 Correlation 8
    2.3 Computational Expense 9
    2.4 Problem Statement 10
    2.5 Design 10
        2.5.1 Voltage Reference 10
        2.5.2 Multiplier Array 10
        2.5.3 Summing Amplifier 11
        2.5.4 Analog-to-Digital Converter 12
        2.5.5 ISA Interface 12
            2.5.5.1 Address Decoder 13
            2.5.5.2 Index Register/Counter 14

III. PROCESS 16
    3.1 Kernel 16
    3.2 Calibration 16
        3.2.1 Tables 17
        3.2.2 Adjustment 18
    3.3 Image 18
        3.3.1 Loading 19
        3.3.2 Border Fill 19
        3.3.3 Convolution 19
    3.4 Addressing constraints and penalties 20
    3.5 Autocorrelation 22
        3.5.1 Awkward Programming 23
        3.5.2 Bus Traffic 23
        3.5.3 Dynamic Range 23

IV. RESULTS 26
    4.1 Analog Performance 26
        4.1.1 Noise 26
        4.1.2 Linearity, Accuracy and Precision 26
    4.2 Multiplier Speed 28
    4.3 Convolution Result 28
    4.4 Computational Performance 30
    4.5 Comparison with a Digital Parallel System 32

V. CONCLUSION 35
    5.1 Future Work 35

REFERENCES 37

APPENDIX

A. SCHEMATICS 40

B. SOURCE CODE 66


LIST OF TABLES

3.1 Floating-Point Edge-Enhancing Kernel 25

3.2 Normalized Integer Edge-Enhancing Kernel 25

LIST OF FIGURES

2.1 Multiplier Schematic. 15

3.1 Graphic Representation of Convolution. 21

4.1 Graph of Linearity. 29

4.2 Test Image. 31

4.3 Software Convolution. 31

4.4 Convolution by Accelerator. 34


CHAPTER I

INTRODUCTION

Vision computing has been a subject of tremendous research interest since the

days of the original perceptron in the 1940's [McClelland 88]. The last decade or

so has shown a great deal of progress in our understanding of, at least, the early

stages of vision and of the underlying structures and processes that perform these

early processes in biological systems.

Despite the ever-increasing speed and power of digital computing systems, the

consensus appears to be that any practical vision system will, necessarily, be a

highly parallel, probably analog, computing system.

1.1 Origins

The earliest successful vision computer was the perceptron built by McCulloch

and Pitts in the nineteen forties. This system is more commonly viewed as one of

the early successes in machine intelligence and in neurocomputing. It was a large

array of "trainable" analog computing elements attached to a crude vision sensor.

The system was trained to distinguish between male and female faces and was

quite successful. The computation was performed by a number of simple parallel

analog computational elements arranged in highly interconnected layers.

Electronics technology was at the vacuum tube stage at that time, which

proved to be the main limiting factor in building computers. Digital computers

were a rare and expensive luxury at that time also, and were far too limited in

power and storage density to offer a practical alternative.

Since that time, the vacuum tube has been replaced by the transistor, and

VLSI (Very Large Scale Integration) techniques have allowed the reliable

placement of millions of transistors in an area the size of a thumbnail. Research

into the fundamentals of visual neuroanatomy has also dramatically improved our

understanding of early visual structures and the processes they perform. The

rather crude perceptron has been replaced by circuits that more closely mimic the

response of the real biological retina.

1.2 Current Work

Two approaches dominate current research. The first is what Carver Mead

calls the "neuromorphic" — circuits that integrate acquisition and processing on a

single VLSI device. The more algorithmic approach relies on more readily

available acquisition technology (typically CCD image sensors) and processes that

sensor's rasterized, serial output.

1.2.1 Neuromorphic systems

The neuromorphic vision chips, like the natural retina, are focal-plane

processors that integrate sensing and considerable processing in a highly

compacted and interconnected array. They are, by definition, massively parallel

analog processors. They exploit the nonlinear properties of transistors to perform,

in a single step, calculations that would require several steps of a digital

floating-point unit [Koch, Mathur 96]. Further, because each node on the device

is an independent sense-and-compute unit, the calculations proceed in parallel,

meaning that a complete result is available after only one step [Churchland 92].

Most of the work on silicon retinae relies on simple resistive grids to perform

the locally-weighted averaging that produces the desired gain adjustments in

response to global intensity, and also the antagonistic center-surround response

that enhances edges [Koch, Mathur 96]. The drawback to the simple resistive grid

is its linearity. The elements of the grid can be selected to perform simple linear

computations, and once chosen, are fixed [Mead 90].

Integrating concepts from the Cellular Nonlinear Network (CNN), or more

accurately, adding photosensing to a CNN, allows a much wider range of local

mathematical operations [Roska, Chua 93]. The main drawback to the CNN at

this point is its relatively large areal requirement compared to the simpler cells —

from 4 to 60 times larger according to comparisons by Koch and Mathur

[Koch, Mathur 96].

1.2.1.1 Parallel Analog Computation

Analog processing elements are far less flexible than general-purpose digital

processors, but the vision algorithms that model early visual processing are also

fairly fixed, so building single-purpose hardware for vision systems is reasonable.

The chief benefit of going to analog processing is the extreme computational

density it allows compared to digital computation.

An analog multiplier circuit requires as few as four transistors; an eight-bit

digital multiplier uses over two hundred. For roughly the same device count, one

can perform fifty multiplications in analog, in parallel, with the same precision and

accuracy as the digital part. This huge increase in computational density comes

from the fact that analog computation relies for its operation on the fundamental

laws of physics, rather than on some artificial algebra - addition and subtraction,

for example, are direct results of the law of conservation of charge [Elliott 92].

1.2.1.2 Drawbacks to Analog

The advantages of analog computation do not come free, however. Noise,

thermal drift, component matching, resistive and reactive loading and coupling are

also direct results of those same fundamental laws, and all adversely affect the

operation of analog computational circuits. The same VLSI techniques that allow

the placement of over one million devices on a die for digital applications allow

very high density analog circuits as well. Placing an entire complex analog

operation on a single die is also one of the best ways of minimizing the effects of

all of the drawbacks to analog computing.

Placing the entire analog computational circuit on a single die limits the effects

of thermal drift because all of the components are drifting in the same direction


and are at very nearly the same temperature. Resistive and reactive loading and

coupling are also reduced by the very short wiring runs between components.

Component matching is usually quite good, variations being more of a

wafer/batch/process issue than affecting individual devices on a single die.

Circuits for trimming and tuning are also more readily implemented directly on

the die and programmed during post-fab testing than by providing access for

possibly thousands of external trim adjustments. For an example, see [Mead 90].

Noise is a complex issue, but part of its solution lies in making the circuit as

compact as possible, and you cannot get a circuit much more compact than a

single VLSI device.

1.2.2 Algorithmic systems

The objective of neuromorphic systems is to provide a general purpose

artificial vision system that is as flexible and adaptable as natural systems. They

are, by definition, custom VLSI devices, often large, difficult and expensive to

produce. As applications for these devices appear, and they move closer to

commercial production, most of these drawbacks will recede. In the meantime

there are a number of applications with more constrained requirements, both in

terms of the visual task required, and the funds available for implementation.

As an example, consider a system to monitor motorist compliance with

regulations governing access to commuter lanes during high-traffic hours. As a

civic project, funds will be sharply constrained. On the other hand, the visual

environment is nicely constrained — moving vehicles in a well-defined lane with

nearly no background clutter. Careful selection of camera, angle and

region-of-interest greatly simplify the remaining task of locating edges and

tracking moving objects against an uncluttered background. An algorithmic

system — a general purpose microcomputer with some additional hardware and

software — is adequate.

1.3 This Work

The top-level research in this group is the hardware implementation of a

"Gabor" filter, a motion-sensitive and directionally selective filter designed to

extract motion information such as direction and velocity from a sequence of

images. Work is proceeding in stages, this stage being the implementation of a

proof-of-concept model in support of the next stage: a VLSI implementation of

a parallel analog multiplier designed to accelerate image convolution.

CHAPTER II

MATHEMATICAL BASIS

Convolution is a simple and well understood, but computationally expensive

algorithm that can, with a well-chosen kernel function, enhance edges and make

motion estimation much easier. It is the most common process in image

processing and vision computing, whether explicitly defined as a subprocess, or

noted as an effect (antagonistic center-surround response) [Koch, Mathur 96].

2.1 Convolution

Convolution is the process of multiplying two functions against one another at

each point in both functions. In image processing this is a two-dimensional process

described by the following integral:

$$ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x - \xi,\, y - \eta)\, h(\xi, \eta)\, d\xi\, d\eta, \tag{2.1} $$

where h(x, y) is the system characteristic equation (kernel) and f(x, y) is the

input.

2.1.1 The Kernel

The kernel used in convolution determines its function. Kernels may be

selected to perform high-pass (sharpening), low-pass (blurring) or bandpass

functions, or other visual effects such as "embossing." For edge detection, the

kernel of choice is a Laplacian-of-Gaussian (LoG), which is created using equation

2.2

k(i, j) = [expression illegible in source] (2.2)

where σ determines the spread of the central positive lobe of the function.

Adjusting σ adjusts the sensitivity and selectivity of the edge-detection response.
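As a rough illustration of how the spread parameter shapes such a kernel, the sketch below samples a common textbook form of the Laplacian-of-Gaussian with a positive central lobe. That functional form, the name `log_kernel`, and the peak value of 1 at the center are assumptions made for illustration; they are not taken from Equation 2.2.

```python
import math

def log_kernel(size, sigma):
    """Sample an assumed textbook LoG (positive center) on an odd size x size grid."""
    half = size // 2
    two_sigma_sq = 2.0 * sigma * sigma
    kernel = []
    for i in range(-half, half + 1):
        row = []
        for j in range(-half, half + 1):
            r_sq = i * i + j * j
            # (1 - r^2 / 2 sigma^2) * exp(-r^2 / 2 sigma^2):
            # 1 at the center, zero where r^2 = 2 sigma^2, negative beyond.
            row.append((1.0 - r_sq / two_sigma_sq) * math.exp(-r_sq / two_sigma_sq))
        kernel.append(row)
    return kernel

narrow = log_kernel(7, 1.0)  # tight positive lobe
wide = log_kernel(7, 2.0)    # broader positive lobe
```

With the smaller spread the value two pixels from the center is already negative, while with the larger spread it is still inside the positive lobe, which is the sensitivity and selectivity trade-off the text describes.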

Vision sensors, however, are not continuous; the image is sampled in two

dimensions. If Δx and Δy are considered the sampling intervals for the image,

then the discrete version of the image can be sampled from the continuous image

by the following relation:

$$ f(n_1, n_2) = f_c(n_1 \Delta x,\, n_2 \Delta y), \tag{2.3} $$

where $n_1 \in [0, \ldots, N_1]$ and $n_2 \in [0, \ldots, N_2]$. The integral can then be recast as a

summation:

$$ g(n_1, n_2) = \sum_{k=0}^{N_1-1} \sum_{l=0}^{N_2-1} f(n_1 - k,\, n_2 - l)\, h(k, l), \tag{2.4} $$

where h(k, l) is a 2-dimensional kernel and f(n_1, n_2) is the sampled image.
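The double sum of Equation 2.4 can be sketched directly in software. The version below is a minimal illustration, assuming that samples outside the image are treated as zero and that the sum only needs to run over the kernel's own extent; the function name `convolve2d` is hypothetical.

```python
def convolve2d(image, kernel):
    """Direct evaluation of g(n1, n2) = sum_k sum_l f(n1 - k, n2 - l) * h(k, l)."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    result = [[0] * cols for _ in range(rows)]
    for n1 in range(rows):
        for n2 in range(cols):
            total = 0
            for k in range(k_rows):
                for l in range(k_cols):
                    i, j = n1 - k, n2 - l
                    # Assumption: samples outside the image contribute zero.
                    if 0 <= i < rows and 0 <= j < cols:
                        total += image[i][j] * kernel[k][l]
            result[n1][n2] = total
    return result

# A unit impulse reproduces the kernel, shifted to the impulse position.
impulse = [[0, 0, 0],
           [0, 1, 0],
           [0, 0, 0]]
small_kernel = [[1, 2],
                [3, 4]]
response = convolve2d(impulse, small_kernel)
```

Every output pixel costs one multiplication per kernel element, which is the cost that the following sections set out to move into analog hardware.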

2.2 Correlation

Correlation, a measure of the similarity of two images, is a very similar process

with a nearly identical integral:

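$$ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\xi - x,\, \eta - y)\, h(\xi, \eta)\, d\xi\, d\eta, \tag{2.5} $$

which results in a very similar double sum:

$$ g(n_1, n_2) = \sum_{k=0}^{N_1-1} \sum_{l=0}^{N_2-1} f(k - n_1,\, l - n_2)\, h(k, l). \tag{2.6} $$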

Computationally, correlation amounts to convolution with a kernel that is the

same size as the image. Correlation is of interest in image and character

recognition, neural computing, and signal processing.

2.3 Computational Expense

The computational expense of both convolution and correlation is high. Both

are inner product matrix equations which require, for an n²-pixel kernel and an

N²-pixel image, a total of n²N² multiplications. To be considered real-time, a

system should accommodate an image rate of at least 15 frames per second. A

512 x 512-pixel monochrome system with a resolution of eight bits per pixel, at

15 frames per second, must handle a dataflow rate of 512² x 15 = 3,932,160 bytes

per second (3.75 Mbytes per second). A color system could easily triple that

requirement. This is the bandwidth consumed just getting the image into the

system. Processing the image once it is available consumes considerably more

bandwidth. To perform convolution on that 512 x 512 image using a 7 x 7 kernel

will require 512² x 7² products per frame at 15 frames per second, for a total of

512² x 7² x 15 = 192,675,840 products per second. Correlation, where n = N,

requires N⁴ multiplications. For our example, a 512 x 512 image requires

68,719,476,736 multiplications per frame, or 1,030,792,151,040 per second (at 15

frames per second). Add to these figures the bandwidth consumed by the

necessary code fetches, I/O, and other data manipulations, and the minimum

required bandwidth goes far beyond the foreseeable capacity of any general-

purpose computing machine that can reasonably be made portable.
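The figures quoted in this section follow from a few lines of arithmetic, reproduced here as a check. The variable names are illustrative only.

```python
N = 512      # image is N x N pixels, one byte per pixel
n = 7        # kernel is n x n
FPS = 15     # frames per second taken as "real-time"

input_bytes_per_second = N**2 * FPS             # just moving the image in
conv_products_per_second = N**2 * n**2 * FPS    # 7 x 7 convolution
corr_products_per_frame = N**4                  # correlation, where n = N
corr_products_per_second = corr_products_per_frame * FPS

print(input_bytes_per_second, input_bytes_per_second / 2**20)
print(conv_products_per_second)
print(corr_products_per_frame, corr_products_per_second)
```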

2.4 Problem Statement

A hardware convolution accelerator supporting a 7 x 7 kernel was

implemented using standard, off-the-shelf components. The accelerator takes

advantage of the parallel-friendly nature of the convolution double sum (Equation

2.4) and performs the products-and-summation calculations in analog. The

accelerator resides as a peripheral on the ISA bus. Speed and accuracy of the

accelerator are compared to the host computer's CPU in section 4.1.

2.5 Design

The accelerator design consists of a few simple blocks: a voltage reference, the

multiplier array and a summing amplifier, an analog-to-digital converter and its

calibration circuit and an ISA bus interface.

2.5.1 Voltage Reference

The system reference voltage is nominally 1.2V. Because both the DACs and

ADC use the same reference in a closed, ratiometric fashion, absolute accuracy

and precision of the reference are not critical.

2.5.2 Multiplier Array

The multiplier array consists of 49 instances of a DAC-based multiplier. The

multiplier requires a pair of DACs, one for unsigned image data (DAC A) and one

for signed kernel data (DAC B). For this implementation, an Analog Devices

AD7528 dual 8-bit multiplying DAC was chosen for its low cost, good inherent

precision, and simple interface. The array is divided into six individually buffered

rows of 8 DAC devices. These rows are paired into a high byte and a low byte to

provide a 16-bit interface. The 49th DAC device resides in the low byte of the

interface, and does not have a redrive buffer. A schematic for a single multiplier

is shown in Figure 2.1. This circuit is an adaptation of a pair of multiplier circuits

found in the AD7528 data sheet [Analog Devices 92].

DAC A multiplies the reference voltage by the fraction -pixel/256, producing a

negative output voltage varying between 0 and -1.2V. DAC B uses this voltage as

its reference input, which it multiplies by the fraction -kernel/128, producing an

output voltage varying between -1.2V and 1.2V. Thus, the transfer function of the

multiplier is:

$$ V_{out} = V_{ref} \cdot \frac{-pixel}{256} \cdot \frac{-kernel}{128} = V_{ref}\, \frac{(pixel)(kernel)}{32768}. \tag{2.7} $$

2.5.3 Summing Amplifier

Summing 49 of these multiplier outputs with a summing amplifier produces

the overall transfer function:

$$ V_{x,y} = gain \left( offset + \frac{1.2}{32768} \sum_{x=0}^{6} \sum_{y=0}^{6} pixel_{x,y}\, kernel_{x,y} \right). \tag{2.8} $$

The double sum at the heart of this transfer function is exactly the function

required for a 7 x 7 convolution on a single pixel. The gain factor and the offset

term are set by the calibration circuit.
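The two-DAC multiplier and the summing amplifier can be modelled numerically as an idealized sketch. The code below assumes perfect components (no noise, offset error, or settling), a 1.2 V reference, and hypothetical function names; it only mirrors the arithmetic of the transfer functions in Equations 2.7 and 2.8.

```python
V_REF = 1.2  # nominal reference voltage in volts

def multiplier_output(pixel, kernel, v_ref=V_REF):
    """Ideal two-stage DAC multiplier: pixel in 0..255, kernel in -128..127."""
    dac_a = v_ref * (-pixel / 256)   # first stage: 0 V down to about -1.2 V
    return dac_a * (-kernel / 128)   # second stage restores the sign of the kernel

def summed_output(pixels, kernel, gain=1.0, offset=0.0):
    """Ideal summing amplifier over a window of multiplier outputs."""
    total = sum(
        multiplier_output(p, k)
        for pixel_row, kernel_row in zip(pixels, kernel)
        for p, k in zip(pixel_row, kernel_row)
    )
    return gain * (offset + total)

full_scale = multiplier_output(255, 127)       # largest positive product
ones = [[1] * 7 for _ in range(7)]
window_sum = summed_output(ones, ones)         # 49 identical unit products
```

In this model a single product is at most about 1.19 V, so the sum of 49 products can exceed the ADC input range by a wide margin, which is why the gain and offset terms have to be set per kernel.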

2.5.4 Analog-to-Digital Converter

The result of the convolution sum-of-products is converted back to digital form

by an 8-bit ADC and read back into the host system's memory. Like any system

that interfaces between the analog and digital worlds, calibration is required.

Calibration is automated by providing a pair of digitally-controlled

potentiometers, one to set the summing amplifier's offset voltage, and another to

set its gain. With these two adjustments under software control, the system can

match the output of the multiplier array and summation function with the input

range of the ADC, thus reducing the effects of the inherent offset and gain errors

in both the DACs and the ADC. The calibration process is explained in 3.2.

2.5.5 ISA Interface

The ISA bus is a simple, moderate-speed I/O channel. At 16 bits wide, with a

125 ns clock and a minimum 3-cycle I/O transaction, the ISA bus has a theoretical

maximum transfer rate of 5.3 Mbytes per second. If this rate could be sustained in

a convolution with a 7 x 7 kernel, an image up to 85 x 85 pixels could be

processed at 15 frames per second (5.3e6/(49 x 15) ≈ 85²). This rate cannot, of

course, be sustained. Code fetches and computation consume a substantial

fraction of the available bandwidth.
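The bus figures above can be checked with a short calculation. This is a sketch under the same simplification the text makes, namely that every transferred byte is one pixel load and nothing else uses the bus; the variable names are illustrative.

```python
import math

BUS_WIDTH_BYTES = 2        # 16-bit ISA data path
CLOCK_PERIOD_S = 125e-9    # 125 ns bus clock
CYCLES_PER_TRANSFER = 3    # minimum I/O transaction length
KERNEL_PIXELS = 49         # 7 x 7 window loaded per output pixel
FPS = 15

peak_bytes_per_second = BUS_WIDTH_BYTES / (CLOCK_PERIOD_S * CYCLES_PER_TRANSFER)
pixels_per_frame = peak_bytes_per_second / (KERNEL_PIXELS * FPS)
max_side = math.isqrt(int(pixels_per_frame))   # side of the largest square image

print(round(peak_bytes_per_second), max_side)
```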

The ISA interface consists of two subsections: an address decoder and an index

register/counter.

2.5.5.1 Address Decoder

The ISA bus defines a 1024-byte I/O map distinct from the memory map. The

ISA standard also sets required locations for standard I/O functions like serial

channels and storage adapters [IBM 85]. To maximize compatibility with the

largest possible array of host systems, new hardware should be designed to occupy

one of the spaces in the I/O map that are defined as being otherwise vacant. With

the abundance of non-standard hardware in the marketplace, the prudent

designer also ensures that the hardware occupies the smallest possible footprint in the

I/O map, and that its base address is readily configurable. Our accelerator board

is equipped with a switch-programmable comparator that sets the base address of

the board. The largest contiguous vacant space is 80 bytes, which is too small to

accommodate the 98 DACs directly [Hogan 88]. Instead, we use a technique

common to complex adapters: indexing.

2.5.5.2 Index Register/Counter

Indexing is a technique that allows any number of data registers to be mapped

to a single I/O address and accessed by first writing the index number of the

register of interest to a separate index register. This board is equipped with an

index register/counter that maps sequential pairs from the array of 98 multiplier

DACs to a single 16-bit location. To minimize indexing overhead during

processing, the register/counter auto-increments following each write access to a

DAC, such that, on the next write access to the DAC data location, the next pair

of DACs in the sequence will automatically be selected.
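The behavior of an auto-incrementing index register can be illustrated with a small software model. This is a sketch only: the class name, the 49-pair layout, and the wrap-around at the end of the array are assumptions, not details of the board's logic.

```python
class IndexedDacPort:
    """Toy model of one data port that fronts many DAC pairs via an index register."""

    def __init__(self, pair_count=49):
        self.pairs = [0] * pair_count   # one 16-bit word per DAC pair
        self.index = 0

    def write_index(self, value):
        # Select which DAC pair the next data write will reach.
        self.index = value % len(self.pairs)

    def write_data(self, word):
        # Store the word, then advance so the next write hits the next pair.
        self.pairs[self.index] = word & 0xFFFF
        self.index = (self.index + 1) % len(self.pairs)   # wrap-around is assumed

port = IndexedDacPort()
port.write_index(3)
port.write_data(0x1234)
port.write_data(0xABCD)   # lands in pair 4 without another index write
```

The point of the auto-increment is visible in the usage: one index write is followed by any number of data writes to the same address, so a block load costs a single extra bus transaction.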

Figure 2.1: Multiplier Schematic.

CHAPTER III

PROCESS

This section describes the convolution process as it will be performed using

this accelerator board.

3.1 Kernel

Processing an image begins with reading a kernel file, which is a text file

containing a header line with the dimensions of the kernel and the value of gamma

on which that kernel is based (if any). This header is followed by an array of

floating-point values for use in the kernel. A typical edge-enhancing kernel file is

shown in Table 3.1.

Before this kernel can be loaded into the multiplier array, these values must be

converted to normalized, signed 7-bit binary values by the formula

$$ k_{x,y} = \frac{127\, f_{x,y}}{\max(|kernel|)}, $$

where f_{x,y} are the floating-point values from the kernel file, k_{x,y} are the

binary values, and max(|kernel|) is the largest absolute value in the kernel file. This

would yield the kernel shown in Table 3.2, which can then be loaded into the

kernel section of the multiplier array.
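The normalization step can be sketched as follows. The scale factor of 127 and rounding to the nearest integer are assumptions chosen because they reproduce the published integer kernel from the floating-point one; the function name `normalize_kernel` is hypothetical.

```python
def normalize_kernel(kernel, full_scale=127):
    """Scale floating-point kernel values so the largest magnitude maps to full_scale."""
    peak = max(abs(value) for row in kernel for value in row)
    # Assumption: values are rounded to the nearest integer after scaling.
    return [[round(full_scale * value / peak) for value in row] for row in kernel]

# Center row of the floating-point edge-enhancing kernel (Table 3.1).
center_row = [-1.74, 0.13, 6.10, 10.00, 6.10, 0.13, -1.74]
quantized = normalize_kernel([center_row])[0]
print(quantized)
```

The result matches the center row of the normalized integer kernel in Table 3.2, which supports the assumed scale and rounding.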

3.2 Calibration

Once a kernel is loaded, the board must be calibrated. Calibration attempts to

minimize the inherent gain and offset errors of the DACs, multiplier array,

summing amplifier and ADC by adjusting the minimum and maximum outputs of

the array-summer combination for a given kernel such that it uses the entire input

range of the ADC. There are several steps to calibration.

3.2.1 Tables

Two 7 x 7 tables must be constructed: one to produce maximum output, and

one to produce minimum output. The two tables are built by placing a value of

255 (the maximum possible pixel value) in the appropriate locations such that, for

the example kernel:

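$$
max_{x,y} =
\begin{cases}
255 : kern_{x,y} > 0 \\
0 : kern_{x,y} < 0
\end{cases}
=
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 255 & 0 & 0 & 0 \\
0 & 0 & 255 & 255 & 255 & 0 & 0 \\
0 & 255 & 255 & 255 & 255 & 255 & 0 \\
0 & 0 & 255 & 255 & 255 & 0 & 0 \\
0 & 0 & 0 & 255 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\tag{3.1}
$$

and

$$
min_{x,y} =
\begin{cases}
255 : kern_{x,y} < 0 \\
0 : kern_{x,y} > 0
\end{cases}
=
\begin{bmatrix}
255 & 255 & 255 & 255 & 255 & 255 & 255 \\
255 & 255 & 255 & 0 & 255 & 255 & 255 \\
255 & 255 & 0 & 0 & 0 & 255 & 255 \\
255 & 0 & 0 & 0 & 0 & 0 & 255 \\
255 & 255 & 0 & 0 & 0 & 255 & 255 \\
255 & 255 & 255 & 0 & 255 & 255 & 255 \\
255 & 255 & 255 & 255 & 255 & 255 & 255
\end{bmatrix}
\tag{3.2}
$$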
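Both calibration tables follow mechanically from the signs of the kernel entries, as the sketch below shows for the integer edge-enhancing kernel. The function name is hypothetical, and placing 0 in both tables for a kernel entry that is exactly zero is an assumption, since the example kernel has no zero entries.

```python
def calibration_tables(kernel, full_scale=255):
    """Build the pixel tables that drive the summed output to its extremes."""
    max_table = [[full_scale if k > 0 else 0 for k in row] for row in kernel]
    min_table = [[full_scale if k < 0 else 0 for k in row] for row in kernel]
    return max_table, min_table

# Normalized integer edge-enhancing kernel (Table 3.2).
KERNEL = [
    [-12, -18, -22, -22, -22, -18, -12],
    [-18, -22,  -9,   2,  -9, -22, -18],
    [-22,  -9,  42,  77,  42,  -9, -22],
    [-22,   2,  77, 127,  77,   2, -22],
    [-22,  -9,  42,  77,  42,  -9, -22],
    [-18, -22,  -9,   2,  -9, -22, -18],
    [-12, -18, -22, -22, -22, -18, -12],
]

max_table, min_table = calibration_tables(KERNEL)
```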

3.2.2 Adjustment

By alternately loading these two tables into the image section of the multiplier

array, and adjusting the two digital potentiometers that control the summing

amplifier's gain and offset, the system is calibrated. First, the minimizing table is

loaded, and the offset pot is adjusted until the ADC produces the minimum

output. Then the maximizing table is loaded and gain is adjusted until the

maximum output is achieved. Then the minimum is checked and adjusted if

necessary, then the maximum is checked again. Iterating between the minimum

and maximum until either no more adjustment is required, or until some

maximum number of iterations is reached, the system settles on a combination

that is as well calibrated as the resolution of the digital pots allows.


3.3 Image

Once the kernel is loaded and the board calibrated, the image file may be

loaded and processed. The image file has a format similar to the kernel file in that

it is a text file, and the first line is a header with the dimensions of the image and

the maximum pixel value in the file.

3.3.1 Loading

If the kernel is N x N and the image is S x T, allocate a memory array of

(S + N - 1) x (T + N - 1) bytes and read the image file into the S x T window at

the center of the memory array such that there is an (N - 1)/2 border surrounding the

image.

3.3.2 Border Fill

Because convolution exhibits an aliasing effect when the edge of the N x N

kernel moves beyond the edge of the image, yet we intend to process the entire

image, we must find a way to eliminate the edge effect. A simple way to eliminate

the edge effect is to create an (N - 1)/2 border around the image (described

above), and fill that space with repetitions of the pixels that make up the edge of

the image.
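Loading and border fill can be sketched together as a single padding step. The code below is a minimal illustration that assumes an odd kernel size and a list-of-rows image; the function name `pad_with_edge_fill` is hypothetical.

```python
def pad_with_edge_fill(image, kernel_size):
    """Surround the image with a (kernel_size - 1) / 2 border of repeated edge pixels."""
    border = (kernel_size - 1) // 2
    rows, cols = len(image), len(image[0])
    padded = []
    for r in range(-border, rows + border):
        src_r = min(max(r, 0), rows - 1)          # clamp to the nearest image row
        padded.append([
            image[src_r][min(max(c, 0), cols - 1)]  # clamp to the nearest column
            for c in range(-border, cols + border)
        ])
    return padded

tiny = [[1, 2],
        [3, 4]]
padded = pad_with_edge_fill(tiny, 3)
for row in padded:
    print(row)
```

For the 2 x 2 example and a 3 x 3 kernel the padded array is 4 x 4, in line with the (S + N - 1) x (T + N - 1) allocation described above.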


3.3.3 Convolution

Convolution begins by copying a kernel-sized array of pixels from the image

into the image side of the multiplier array. After a brief settling time, the ADC is

started, the image data pointers are incremented, and the next block may be

loaded while conversion is proceeding. Incrementing the data pointers amounts, in

graphical terms, to moving the image data window one pixel to the right.

By the time the loading of block S + 1 is complete, analog-to-digital

conversion of the convolution of block S is also complete, and its result may be

placed in the appropriate pixel location of the target image. See Figure 3.1.

For an S x T image and an N x N kernel, after loading block S, the data

pointers are incremented not by 1, but by N - 1, bypassing the added border

pixels in order to move the data window to the beginning of the next row. After

processing T rows of S pixels, the image convolution is complete.
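The scan over the padded image can be expressed as a software stand-in for the board. In this sketch the inner sum of products plays the role of the multiplier array and summing amplifier; gain, offset, and ADC quantization are deliberately left out, and the function name is hypothetical. As in the hardware transfer function, the window is multiplied element by element without flipping the kernel, which gives the same result as convolution for a symmetric kernel.

```python
def scan_convolve(padded, kernel, width, height):
    """Slide an n x n window over a border-filled image, one output per position."""
    n = len(kernel)
    result = []
    for row in range(height):            # one pass per image row
        out_row = []
        for col in range(width):         # move the window one pixel to the right
            acc = 0
            for y in range(n):
                for x in range(n):
                    acc += padded[row + y][col + x] * kernel[y][x]
            out_row.append(acc)
        result.append(out_row)
    return result

# 2 x 2 image with a one-pixel replicated border, as a 3 x 3 kernel requires.
padded = [[1, 1, 2, 2],
          [1, 1, 2, 2],
          [3, 3, 4, 4],
          [3, 3, 4, 4]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

same = scan_convolve(padded, identity, 2, 2)
blurred = scan_convolve(padded, box, 2, 2)
```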

3.4 Addressing constraints and penalties

Experimentation has shown that copying the 7 rows of 7 bytes from the image

data into a dedicated 50-byte buffer, and then using the following code fragment

to copy the resulting 25 words to the board yields the best performance:

    MOV CX, 25              ; load counter with word count
    MOV DX, DAC             ; load I/O address register
    MOV SI, BUFFER          ; load pointer to data buffer
    REPNZ OUTSW DX, [ES:SI] ; copy data to board

Figure 3.1: Graphic Representation of Convolution.

Copying the 49 bytes to a dedicated memory buffer also copies the data from

relatively slow DRAM system memory to the much faster SRAM cache, and also

allows sequential word access to a relatively long string of 25 words to maximize

efficiency of the string instruction OUTS. The prefix REPNZ signifies that the

CPU is to perform the indicated instruction, decrement the counter register CX

and, if CX is not now zero, repeat the operation.
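The staging step, gathering 7 rows of 7 bytes into a 50-byte buffer that is then sent as 25 words, can be sketched as follows. The little-endian byte order matches the host CPU, but the zero pad byte and the function name `pack_window` are assumptions for illustration.

```python
import struct

def pack_window(window_rows):
    """Flatten a 7 x 7 byte window and pack it into 16-bit little-endian words."""
    data = bytes(value for row in window_rows for value in row)   # 49 bytes
    if len(data) % 2:
        data += b"\x00"          # assumed pad byte to fill the 50-byte buffer
    word_count = len(data) // 2
    return list(struct.unpack("<%dH" % word_count, data))

window = [[r * 7 + c for c in range(7)] for r in range(7)]   # bytes 0..48
words = pack_window(window)
print(len(words), hex(words[0]), hex(words[-1]))
```

Each word carries two consecutive pixels, so the 25 writes issued by the string instruction cover the whole window with one word half unused.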

There is another benefit to this two-stage copy operation that also contributes

to improved performance. There is a significant performance penalty in

attempting a 16-bit memory reference to a so-called odd address (LSB == 1).

This penalty would occur on every odd convolution cycle and seriously impact

performance if direct image-to-accelerator copies were attempted.

Intel processors running MS-DOS place a limit on image size, too: the

64kByte segment size imposes a practical limit of 320 x 200 pixels for convenient

processing. A number of DOS extenders are available that claim to remedy this

limitation, and a number of 32-bit operating systems are available, but no

experiments were done with these products.

3.5 Autocorrelation

This architecture is not suited to accelerating autocorrelation. The small

computational array relative to the image size creates an awkward programming

task, drastically increases bus traffic, and the limited dynamic range of the board

complicates handling of processed pixels.

3.5.1 Awkward Programming

The prime requirement for accelerating any kind of array processing is a clean

and natural interface between hardware and software. The odd size of the array

demands that the image to be processed have a width in pixels that is some

integer multiple of 49. Without this limitation, the task of autocorrelation cannot

be structured into a simple set of nested loops in the manner of convolution.

Including excess pixels on one edge of the image as fill is a poor solution because

of the O(N⁴) nature of autocorrelation.

3.5.2 Bus Traffic

Even accepting the requirement of a match between image and hardware, the

small size of the array relative to the image means that there is no kernel that can

be made resident in the array, calibrated once, and left in place for the duration of

processing. Instead each pixel in the image must be read from memory and loaded

into one DAC or another N² times compared to the N² + n² (where n is, at most

7) required by convolution. This is an increase in traffic of at least an order of

magnitude. In fact, testing shows a nearly 4x slowdown over direct floating-point

computation.


3.5.3 Dynamic Range

The limited resolution, and consequently limited dynamic range, of the board

require recalibrating the system to each specific kernel. This is acceptable for

convolution, where a kernel is loaded once and kept resident during the processing

of the entire image. For autocorrelation, however, the kernel is the same size as

the image, and 7 x 7 images are seldom of interest. For maximum computational

performance it would be advantageous to recalibrate the board for each load into

the kernel side of the board, but the negative effect on time performance would be

extreme.

Table 3.1: Floating-Point Edge-Enhancing Kernel

    7 7 4.5
    -0.91  -1.42  -1.71  -1.74  -1.71  -1.43  -0.91
    -1.43  -1.70  -0.72   0.13  -0.72  -1.70  -1.43
    -1.71  -0.72   3.34   6.10   3.34  -0.72  -1.71
    -1.74   0.13   6.10  10.00   6.10   0.13  -1.74
    -1.71  -0.72   3.34   6.10   3.34  -0.72  -1.71
    -1.43  -1.70  -0.72   0.13  -0.72  -1.70  -1.43
    -0.91  -1.43  -1.71  -1.74  -1.70  -1.43  -0.91

Table 3.2: Normalized Integer Edge-Enhancing Kernel

    -12  -18  -22  -22  -22  -18  -12
    -18  -22   -9    2   -9  -22  -18
    -22   -9   42   77   42   -9  -22
    -22    2   77  127   77    2  -22
    -22   -9   42   77   42   -9  -22
    -18  -22   -9    2   -9  -22  -18
    -12  -18  -22  -22  -22  -18  -12

CHAPTER IV

RESULTS

4.1 Analog Performance

Linearity, precision, accuracy and noise are the most important parameters of

the performance of any analog computational unit.

4.1.1 Noise

The effects of noise in an analog system can be minimized by providing

low-impedance ground and power planes, adequate decoupling capacitance, and by

using the maximum voltage swing available to the designer. It is also considered

good practice to isolate the circuit from noisy environments. Our board uses none

of these techniques.

The board is hosted in the electrically noisy environment of a microcomputer

chassis. There are no power or ground planes because the wire-wrap prototype

panels on which the board is constructed are not available with such planes. Such

decoupling is provided as space permits. An analog signal swing of ±1.2 Volts was

selected, not for performance, but for comparison with the eventual VLSI

implementation which would use the same range.

For all of these reasons, noise does intrude upon the overall performance of the

board.


4.1.2 Linearity, Accuracy and Precision

For an analog-to-digital or digital-to-analog converter, accuracy and precision

are combined to measure the performance of the converter in terms of linearity.

Linearity, or rather, non-linearity, in a converter has two forms: integral and

differential. Consider an n-bit digital-to-analog converter (DAC). In a DAC,

integral non-linearity (INL) is measured by calibrating the DAC such that its

zero-value output is exactly 0 Volts, programming the DAC to its maximum

output, and measuring the difference between the DAC output and the ideal 2^n - 1, in

units of LSB (the ΔV represented by the least significant bit). INL is also called

gain error and can be minimized by careful calibration. Differential non-linearity

(DNL) is a similar measurement that is concerned with error between adjacent

values. Taken together, INL and DNL determine the useable resolution of a

converter — the part may offer 12 bits, but if it has a DNL of ±2 LSB (±2

counts), the useful resolution is only 10 bits because the error band of 4 means

that the two bits required to cover the error band are not meaningful.

The DAC parts used on the accelerator board are rated at ±0.5 LSB INL and

DNL, and the ADC is rated at ±1 LSB. Adding the errors of the op-amps and

resistors in the rest of each multiplier circuit and the summing amplifier, the overall

resolution of the board is expected to be between four and five bits.

The actual linearity as measured is shown in Figure 4.1. This chart was

produced by calibrating the board with the kernel shown in Table 3.1, and

supplying a ramping input that looked very much like Equation (3.1), except that

where Equation (3.1) has the value 255, our input has the ramp value.

The chart shows the board output and the calculated output for each ramp value,

and the error between calculated and actual outputs (biased by 100 to make it

visible in the chart). Mean error is calculated at 1.3 counts (out of 256) and

absolute error is calculated at 2.6 counts. These error figures yield a precision of 6

bits and an accuracy of 5 bits. The resolution of the system can then be

comfortably described as 5 bits.
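The bit-loss argument used above can be sketched in code. The helper below is illustrative only and is not part of the accelerator's source: the name usable_bits and the round-up rule (discard as many bits as are needed to span the full ± error band) are assumptions chosen to follow the DNL example in the text, and the measured errors are read as symmetric ± figures.

```cpp
#include <cmath>

// Hypothetical helper (not from the thesis source code): estimates how many
// bits of a converter remain meaningful given a symmetric error of
// +/- err_counts LSB. The full error band is 2 * err_counts counts wide, and
// the bits needed to span that band are treated as lost (assumed rule).
int usable_bits(int nominal_bits, double err_counts)
{
    double band = 2.0 * err_counts;      // total width of the error band
    if (band <= 1.0)
        return nominal_bits;             // error stays within one count
    int lost = (int)std::ceil(std::log2(band));
    return nominal_bits - lost;
}
```

Under these assumptions, a 12-bit part with ±2 LSB of error yields 10 usable bits, and errors of 1.3 and 2.6 counts on the 8-bit output yield 6 and 5 bits respectively, in line with the figures quoted above.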

4.2 Multiplier Speed

The DAC and op-amp parts selected are moderate-speed general purpose

devices suitable for the speed of the ISA bus. The DAC datasheet lists a settling

time of 200 ns, and the op-amp datasheet lists 2 µs to settle to 0.01%. This is close

to the value calculated from the output impedance and capacitance numbers given

in the datasheets, which give τ = 2.64 µs, making the overall multiplier time

constant 6τ, or 15.84 µs (the multiplier has two amplifier stages).
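The overall multiplier figure can be reproduced with a short calculation. The sketch below assumes that each of the two amplifier stages is allowed three time constants; this is one reading of the text, not something the datasheets state, and the function names are hypothetical.

```cpp
#include <cmath>

// Total settling allowance, in microseconds, for a chain of first-order
// stages. taus_per_stage is an assumed design allowance (3 in this sketch).
double chain_settle_us(double tau_us, int stages, int taus_per_stage)
{
    return tau_us * stages * taus_per_stage;
}

// Fraction of a step response still outstanding after n_tau time constants
// of a single first-order stage: exp(-n_tau).
double residual_after(double n_tau)
{
    return std::exp(-n_tau);
}
```

With a single-stage time constant of 2.64 microseconds, two stages and three time constants per stage, chain_settle_us returns 15.84 microseconds; residual_after(3.0) shows that a single first-order stage still has roughly 5% of its step outstanding at that point.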

A higher-speed version of the board, probably hosted on the PCI bus, would

require significantly faster devices. The DACs would need a digital interface about

three times faster than the AD7528's, and the op-amps would need to be

comparably faster for the faster expansion bus to make sense.

Figure 4.1: Graph of Linearity. [Plotted series: Error + 100, Calculated Result, ADC Output, Input Ramp; horizontal axis: Input.]

4.3 Convolution Result

For the image shown in Figure 4.2, a floating-point convolution produced the

result in Figure 4.3. Compare this result with that of the hardware-accelerated

convolution in Figure 4.4.
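For reference, a floating-point software convolution of the kind used for comparison might look like the following sketch. It is not the code that produced Figure 4.3: the function name, the row-major layout and the zero-valued border are all assumptions made for illustration. The inner double loop is the sum of products that the board evaluates in parallel for each pixel.

```cpp
#include <vector>

// Illustrative reference 2-D convolution in floating point.
// image: row-major, iw * ih values
// kern:  row-major, kw * kh values, kw and kh odd
// Pixels outside the image are treated as 0.0 (assumed border policy).
std::vector<float> convolve2d(const std::vector<float> &image, int iw, int ih,
                              const std::vector<float> &kern, int kw, int kh)
{
    std::vector<float> out(image.size(), 0.0f);
    const int cx = kw / 2;               // kernel center column
    const int cy = kh / 2;               // kernel center row
    for (int y = 0; y < ih; ++y) {
        for (int x = 0; x < iw; ++x) {
            float sum = 0.0f;
            // Sum of products over the kernel window for this pixel.
            for (int j = 0; j < kh; ++j) {
                for (int i = 0; i < kw; ++i) {
                    int sx = x + cx - i; // kernel is flipped: true convolution
                    int sy = y + cy - j;
                    if (sx < 0 || sx >= iw || sy < 0 || sy >= ih)
                        continue;        // outside the image: contributes 0
                    sum += kern[j * kw + i] * image[sy * iw + sx];
                }
            }
            out[y * iw + x] = sum;
        }
    }
    return out;
}
```

A kernel that is zero everywhere except for a 1 at its center returns the image unchanged, which is a convenient sanity check before comparing against hardware output.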

4.4 Computational Performance

The performance of ISA machines is highly variable, dependent not only on

CPU speed, but on number and types of I/O channels and the chipset that drives

those I/O channels. The ISA bus was defined when the fastest compatible CPU

was the 8 MHz 80286. At the time this project began, the typical processor in an

ISA system was a 33 or 66 MHz 80486 (both have a 33 MHz external bus), and

high performance systems used either a 60 or 66 MHz Pentium CPU (30 or 33

MHz external bus, respectively).

There were also two high-speed I/O channels in use, the 486-based,

consortium-developed VESA bus and the Pentium-based, Intel-developed PCI

bus. Both VESA and PCI channels carry a bus clock in the range of 25 to 33 MHz

and can therefore support a relatively small number of expansion slots. These

channels were also overkill for low-performance I/O devices like modems and

floppy drive controllers. As a result, the chipsets that translate the CPU bus

signals and protocols to those of one of the high-speed channels, also produce

signals and protocols for the ISA bus. These chipset translations vary widely in

efficiency and performance.

Figure 4.2: Test Image.

Figure 4.3: Software Convolution.

Performing 7 x 7 convolutions on a 150 x 100 pixel image on a few machines

available in the labs around the CS department at TTU produced an average 5x

speedup over software-only convolution.

A 90MHz Pentium machine provides approximately break-even performance

with 1.29 seconds per frame for integer computation and 1.31 seconds per frame

using the accelerator. Microprocessors increasingly operate their

computational cores and caches at ever higher multiples of their external bus

speeds. The latest Intel Pentium Pro CPU runs internally at 200 MHz

with multiple integer computation pipelines, with a 200 MHz L1 cache, a 66 MHz L2

cache and a 33MHz PCI expansion bus. A system based on one of these CPUs

would likely outperform this accelerator by a healthy margin.
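The size of the workload and the break-even comparison can be put in numbers with a small calculation. The sketch below is illustrative only; the function names are hypothetical, and the operation count assumes that every pixel of the image receives a full kernel window (border pixels included).

```cpp
// Multiply-accumulate operations needed to convolve one frame, assuming a
// full kw x kh window is evaluated for every pixel (assumed border policy).
long macs_per_frame(long width, long height, long kw, long kh)
{
    return width * height * kw * kh;
}

// Speedup of the accelerated path relative to software; values below 1.0
// mean the accelerator is slower than the host CPU alone.
double speedup(double software_seconds, double accelerated_seconds)
{
    return software_seconds / accelerated_seconds;
}
```

For the 7 x 7 kernel on a 150 x 100 pixel image, macs_per_frame gives 735,000 multiply-accumulates per frame. On the 90 MHz Pentium, speedup(1.29, 1.31) is slightly below 1.0, which is the break-even condition described above.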

4.5 Comparison with a Digital Parallel System

Adaptive Solutions, Inc. produces a line of digital parallel coprocessors for the

ISA and PCI buses, the CNAPS series. This section will briefly compare their

products with this analog accelerator.

The CNAPS boards are processors in their own right and rely on the host

CPU solely for loading in the source image and storing the result

[Adaptive Solutions 95/3]. This fact alone significantly reduces bus traffic and the

computational load on the host CPU. The architecture is SIMD and has a

maximum of 64 processing elements, meaning that 64 pixels could be processed in

parallel [Adaptive Solutions 95/2]. Therefore the CNAPS boards offer a significant

speedup over this accelerator [Adaptive Solutions 95/1].

Our accelerator requires less than 5 watts, most of which is attributable to its discrete

implementation with a significant digital content; the VLSI version will use

significantly less. The CNAPS boards use up to 25 watts. Neither board has a

direct interface to video input, but the computational section of the accelerator

board can be implemented in an analog VLSI device, leaving enough board space

to add such an interface and still draw far less than 25 watts.

Implementing the 7 x 7 array in an analog VLSI and adding a video interface

has the further advantage of entirely offloading the chore of image I/O from the

host CPU and requiring only enough bus bandwidth to read out the processed

images.

Figure 4.4: Convolution by Accelerator.

CHAPTER V

CONCLUSION

A fully parallel convolution system that performs an entire image convolution

in a single step would be the ideal. This board performs the convolution

sum of products in parallel for a given pixel, but processes each pixel

serially. Even given this limited parallelism, the low speed of the ISA

bus, and the fact that the board still relies on the host CPU for all of the data

handling, blocking and ordering, the board offers a speedup of a factor of five over

the standard desktop system available at its time of inception.

Implementation of this project shows that, even using a small array, parallel

analog computation is an effective way to accelerate image convolution. It also

demonstrates the limitations of the ISA bus as a host technology for

high-bandwidth computation.

5.1 Future Work

This project is a demonstration of the validity of the concept of performing the

multiplication/summation step of convolution using pixel-serial, kernel-parallel

analog computational elements. The future direction of research on this topic was

stated before this proof-of-concept project was begun: integration of this

computational array onto a single analog VLSI die.


The board on which that analog VLSI device is to be integrated should not be

an ISA bus expansion board. Waiting for an ISA transaction to complete wastes a

significant number of cycles in today's high-speed microprocessors. A PCI board

would be a much better choice.

If this board is re-implemented in its discrete form, PCI should be the bus of

choice. This implies that very-high-speed DACs and op-amps will be required,

which will cause power consumption to increase. It may also require that the array

be divided into interleaved banks to cope with the speed of the PCI bus - even the

highest-speed commercially available DACs have fairly slow digital interfaces.

REFERENCES

[Adaptive Solutions 95/1] Adaptive Solutions, CNAPS Data Book, 1995.

[Adaptive Solutions 95/2] Adaptive Solutions, CNAPS/PCI-DLX Board Reference Manual, 1995.

[Adaptive Solutions 95/3] Adaptive Solutions, CNAPS/PCI-PSP Board Reference Manual, 1995.

[Analog Devices 92] Analog Devices, "AD7528 CMOS Dual 8-Bit Buffered Multiplying DAC," Data Converter Reference Manual Volume I, 1992.

[Boahen and Andreou 1992] K. Boahen and A. Andreou, "A contrast sensitive silicon retina with reciprocal synapses," in: Advances in Neural Information Processing Systems, Vol. 4, Moody, J. E., Hanson, S. J., and Lippman, R. P., eds., pp. 764-772, Morgan Kaufmann, San Mateo, CA (1992).

[Burrus 85] C. S. Burrus and T. W. Parks, DFT/FFT and Convolution Algorithms, John Wiley & Sons, New York, 1985.

[Choudhary 90] Alok N. Choudhary and Janak H. Patel, Parallel Architectures and Parallel Algorithms for Integrated Vision Systems, Kluwer Academic Publishers, 1990.

[Chua, Yang 88] Leon O. Chua and Lin Yang, "Cellular Neural Networks." IEEE Transactions on Circuits and Systems, IEEE, CAS-35, 1257, 1988.

[Chua, Roska 93] Leon O. Chua and Tamas Roska, "The CNN Paradigm." IEEE Transactions on Circuits and Systems, IEEE, CAS-40, 147, 1993.

[Churchland 92] Patricia M. Churchland, The Computational Brain. MIT Press, 1992.

[Damalcheruvu 93] Srinivas Damalcheruvu, A Two-Dimensional Convolution Unit Suitable for Analog VLSI Implementation With Vision Applications, Thesis (M.S.), Texas Tech University, 1993.

[Elliott 92] William D. Elliott, Design and Performance Evaluation of a Real-Time Image Processing Chip for Computer Vision, Thesis (M.S.), Duke University, 1992.

[Hogan 88] Thom Hogan, The Programmer's PC Sourcebook, Microsoft Press, Redmond, WA, 1988.

[Horn 86] Berthold Klaus Paul Horn, Robot Vision, MIT Press/McGraw-Hill, Cambridge, 1986.

[IBM 85] IBM Corp., IBM PC/AT Technical Reference, IBM Corp., Boca Raton, 1985.

[Irvine 81] Robert G. Irvine, Operational Amplifier Characteristics and Applications, Prentice-Hall, New Jersey, 1981.

[Koch, Li 94] Christof Koch and Hua Li, Vision Chips: Implementing Vision Algorithms with Analog VLSI Circuits, IEEE Computer Society Press, 1994.

[Koch, Mathur 96] Christof Koch and Bimal Mathur, "Neuromorphic Vision Chips," IEEE Spectrum, May, 1996.

[Lee 86] Hua Lee and Glen Wade, Imaging Technology, IEEE Press, New York, 1986.

[Li 94] Hua Li and Srinivas Damalcheruvu, "Locally Connected CMOS VLSI Design for Image Convolution," SPIE International Symposium on Optical Engineering and Photonics in Aerospace Sensing, Orlando, FL, April 4-8, 1994.

[Mahowald 92] Misha Mahowald, An Analog VLSI System for Stereoscopic Vision, Kluwer Academic Publishers, Boston, 1994.

[McClelland 88] James L. McClelland, Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises, MIT Press, Boston, 1988.

[Mead 89] Carver Mead, "Adaptive retina," Analog VLSI Implementations of Neural Systems, Mead, C. and Ismail, M., eds., pp. 239-246, Kluwer, Norwell, MA (1989).

[Mead 90] Carver Mead, "Neuromorphic Electronic Systems," Neurovision Systems, Madan Gupta and George Knopf, eds., pp. 463-470, IEEE Press, Piscataway, NJ (1994).

[Nussbaumer 82] Henri J. Nussbaumer, Fast Fourier Transform and Convolution Algorithms (Second Edition), Springer-Verlag, Berlin, 1982.

[Overington 92] Ian Overington, Computer Vision - A Unified, Biologically-Inspired Approach, Elsevier, Amsterdam, 1992.

[Roska, Chua 93] Tamas Roska and Leon O. Chua, "The CNN Universal Machine," IEEE Transactions on Circuits and Systems, IEEE, CAS-40, 1993.

[Tolimieri 89] Richard Tolimieri, Myoung An, Chao Lu, Algorithms for Discrete Fourier Transform and Convolution, C. S. Burrus, Editor, Springer-Verlag, New York, 1989.

[Yang 89] Woodward Yang, "Analog CCD processors for image filtering," Visual Information Processing: From Neurons to Chips, SPIE Proc. 1473: 114-127 (1991).

APPENDIX A

SCHEMATICS

[Schematics: circuit diagrams of the accelerator board; the drawings are not reproduced in this text.]

APPENDIX B

SOURCE CODE

//////////////////////////////////////////////////////////////////
//
// File:    KERNEL.H
// Author:  Donald A. Symes
// Purpose: Definition for Kernel class
//
// The kernel object can create a convolution kernel given
// a set of dimensions and a value for sigma. This floating-point
// kernel will then be converted to a form useful to the hardware
// convolution accelerator.
//
// Once created, the kernel may be stored to a disk file. A kernel
// stored in a file can then be read in to a kernel object rather
// than re-calculating.
//
// A kernel file is a text file that has the following format for a
// 7 X 7 kernel -
//
//   w h sigma          w and h are int values giving dimensions
//   f f f f f f f          of kernel
//   f f f f f f f      sigma is a float
//   f f f f f f f
//   f f f f f f f      f are floats that describe the kernel
//   f f f f f f f
//   f f f f f f f
//   f f f f f f f
//
// The data members consist of:
//
// w and h - WORD - dimensions of the kernel in pixels
// sigma - float - the sigma value used to calculate kernel
// BYTE *maxa, *mina - pointers used in calibration
// fdata - float* - pointer to the array of floating-point values
//                  that make up the kernel
// bdata - char* - pointer to signed 8-bit binary array
//
// The member functions are all public. Several constructors
// are provided:
//
// kernel() - generic constructor. No mem allocated, no calculation.
//            Provided for completeness.
//
// kernel(float ssigma, WORD ww, WORD hh) - constructor calculates
//            a kernel of given dimensions from given sigma using
//            formula
//
//            -[(i^2 + j^2) - 2sigma^2] e^-[(i^2 + j^2)/(2sigma^2)]
//   k(i,j) = -----------------------------------------------------
//                                 2sigma^2
//
// kernel(ifstream &f) - construct kernel object from descr. in file.
//
// kernel(kernel &k) - copy constructor.
//
// A simple destructor is provided to ensure that the data arrays are
// properly deallocated.
//
// ~kernel() - destructor. Deallocates data arrays.
//
// Several utility functions are provided:
//
// create(float ssigma, WORD ww, WORD hh) - calculate kernel.
//   arguments - ssigma (float) sigma value used to calculate kernel
//               ww, hh (WORD) dimensions of kernel array.
//   side effects - allocates w * h array of float
//                  allocates w * h array of char
//   return -  0 (int) on success
//            -1 (int) on memory allocation fail
//
// load(ifstream &f) - reads kernel from file f.
//   arguments - f (ifstream &) input file stream.
//   side effects - allocates w * h array of float
//                  allocates w * h array of char
//   return -  0 (int) on success
//            -1 (int) on memory allocation fail
//            -1 (int) on file access error
//
// save(ofstream &f) - save kernel to file.
//   arguments - f (ofstream &) output file stream.
//   return -  0 (int) on success
//            -1 (int) on file access error
//
// normalize() - normalize kernel from float to binary.
//   arguments - none.
//   return - 0 (int) in all cases.
//
// max_out() - returns sum of positive values in binary array
//   arguments - none.
//
// min_out() - returns sum of negative values in binary array
//   arguments - none.
//
// width() - return (WORD) dimension w
//   arguments - none.
//
// height() - return (WORD) dimension h
//   arguments - none.
//
// Sigma() - return (float) value of sigma
//   arguments - none.
//
// pfdata() - return pointer (float *) to float version of kernel
//            array
//   arguments - none.
//
// pbdata() - return pointer (char *) to binary version of kernel
//            array
//   arguments - none.
//
//////////////////////////////////////////////////////////////////

#include <iostream.h>
#include <fstream.h>

class kernel
{
    WORD w, h;                  // dimensions of kernel in pixels
    BYTE *maxa, *mina;          // pointers used in calibration
    float sigma;                // sigma used to calculate kernel
    float *fdata;               // pointer to array of float version
    char *bdata;                // pointer to array of binary version

public:
    // constructors & destructors
    kernel();
    kernel(float ssigma, WORD ww, WORD hh);
    kernel(ifstream &f);
    kernel(kernel &k);
    ~kernel();

    int create(float ssigma, WORD ww, WORD hh); // calculate new kernel
    int load(ifstream &f);      // read kernel file
    int save(ofstream &f);      // save kernel file
    int normalize();            // normalize kernel from float to binary
    int max_out();              // return sum of positive values
    int min_out();              // return sum of negative values

    WORD width() { return w; }
    WORD height() { return h; }
    float Sigma() { return sigma; }
    float *pfdata() { return fdata; }
    char *pbdata() { return bdata; }

private:
    // functions called from constructors ONLY
    void calibrate();           // calibrates board to current kernel
    int seek_max();             // adjusts gain pot
    int seek_zero();            // adjusts zero pot
    void set_data(BYTE *);      // fills DACs
    BYTE read_adc();            // return current ADC input
    void settle();
};

//////////////////////////////////////////////////////////////////
//
// File:    KERNEL.CPP
// Author:  Donald A. Symes
// Purpose: Function definitions for kernel class
//
// The member functions are all public. Several constructors
// are provided:
//
// kernel() - generic constructor. No memory allocated, no
//            calculation.
//            Provided for completeness.
//
// kernel(float ssigma, WORD ww, WORD hh) - constructor that
//            calculates a kernel of given dimensions from given
//            sigma using formula
//
//            -[(i^2 + j^2) - 2sigma^2] e^-[(i^2 + j^2)/(2sigma^2)]
//   k(i,j) = -----------------------------------------------------
//                                 2sigma^2
//
// kernel(ifstream &f) - construct kernel object from file desc.
//
// kernel(kernel &k) - copy constructor.
//
// A simple destructor is provided to ensure that the data arrays are
// properly deallocated.
//
// ~kernel() - destructor. Deallocates data arrays.
//
// Several utility functions are provided:
//
// create(float ssigma, WORD ww, WORD hh) - calculates kernel.
//   arguments - ssigma (float) sigma used to calculate kernel
//               ww, hh (WORD) dimensions of kernel array.
//   side effects - allocates w * h array of float
//                  allocates w * h array of char
//   return -  0 (int) on success
//            -1 (int) on memory allocation fail
//
// load(ifstream &f) - reads kernel from file f.
//   arguments - f (ifstream &) input file stream.
//   side effects - allocates w * h array of float
//                  allocates w * h array of char
//   return -  0 (int) on success
//            -1 (int) on memory allocation fail
//            -1 (int) on file access error
//
// save(ofstream &f) - save kernel to file.
//   arguments - f (ofstream &) output file stream.
//   return -  0 (int) on success
//            -1 (int) on file access error
//
// normalize() - normalize float values in kernel
//   arguments - none.
//   return - 0 (int) in all cases.
//
// max_out() - returns sum of positive values in binary array
//   arguments - none.
//
// min_out() - returns sum of negative values in binary array
//   arguments - none.
//
// width() - return (WORD) dimension w
//   arguments - none.
//
// height() - return (WORD) dimension h
//   arguments - none.
//
// Sigma() - return (float) value of sigma
//   arguments - none.
//
// pfdata() - return pointer (float *) to float kernel array
//   arguments - none.
//
// pbdata() - return pointer (char *) to binary kernel array
//   arguments - none.
//
// private functions called from constructors ONLY
//
// kernel::calibrate() calibrates board to current kernel
//   arguments - none.
//   returns - void.
//
// kernel::seek_max() adjusts gain pot
//   arguments - none.
//   returns - void.
//
// kernel::seek_zero() adjusts zero pot
//   arguments - none.
//   returns - void.
//
//////////////////////////////////////////////////////////////////

#include <math.h>
#include <float.h>
#include <string.h>     // memcpy, memset
#include <dos.h>

#include "always.h"
#include "kernel.h"
#include "board.h"
#include "pot.h"

pot gain_pot(GAIN);
pot zero_pot(ZERO);

// Constructors //////////////////////////////////////////
//
// kernel() - generic constructor. No memory allocated, no
//            calculation.
//            Provided for completeness.
//
kernel::kernel()
{
    fdata = NULL;
    bdata = NULL;
    mina = maxa = NULL;
    w = h = 0;
    sigma = 0.0;
}

//
// kernel(ifstream &f) - construct kernel object from file.
//
kernel::kernel(ifstream &f)
{
    mina = maxa = NULL;

    f >> w >> h >> sigma;               // read header

    float max = -1000.0;
    int size = w * h;

    fdata = new float [size];           // allocate space
    bdata = new char [size];
    for (int i = 0; i < size && f.good() && !f.eof(); i++)
    {
        f >> fdata[i];                  // read float values from file
        if (max <= fabs(fdata[i]))
            max = fabs(fdata[i]);       // track abs max for normalization
    }

    for (int l = 0; l < size; l++)      // fill binary array with values
        bdata[l] = 127.0 * fdata[l]/max;    // normalized from float values
}
//

// kernel(float ssigma, WORD ww, WORD hh) - constructor that
//          calculates a kernel of given dimensions from given
//          sigma using the formula from Meng & Li
//
//            -[(x^2 + y^2) - 2sigma^2] e^-[(x^2 + y^2)/(2sigma^2)]
//   k(x,y) = -----------------------------------------------------
//                                 2sigma^2
//
kernel::kernel(float ssigma, WORD ww, WORD hh)
{
    int xindex, yindex;
    float image, image1;

    mina = maxa = NULL;

    sigma = ssigma;             // set object values from args
    w = ww;
    h = hh;

    fdata = new float[w * h];   // allocate data arrays
    bdata = new char[w * h];

    xindex = (w - 1)/2;         // create indexing centers
    yindex = (h - 1)/2;

    for (int i = -xindex; i < xindex + 1; i++)
    {
        for (int j = -yindex; j < yindex + 1; j++)
        {
            image = exp(-(float)(i * i + j * j)/(2 * sigma * sigma));
            image1 = ((float)(i * i + j * j) - 2 * sigma * sigma) * image;
            image = image1/(2 * sigma * sigma);
            // row-major store: row j, column i (w values per row);
            // identical to the previous indexing when w == h
            fdata[((j + yindex) * w) + (i + xindex)] = -image;
        }
    }

    float max = -1000.0;        // find max absolute value in kernel
    xindex = w * h;             // for normalization
    for (int k = 0; k < xindex; k++)
        if (max <= fabs(fdata[k]))
            max = fabs(fdata[k]);

    float factor = 127/max;     // fill binary array with values
    for (int l = 0; l < xindex; l++)    // normalized from float array
        bdata[l] = fdata[l] * factor;
}
//

// Copy constructor
//
kernel::kernel(kernel &k)
{
    mina = maxa = NULL;

    h = k.height();             // set dimensions and sigma from source
    w = k.width();
    sigma = k.Sigma();

    fdata = new float [w * h];  // allocate data arrays
    bdata = new char [w * h];

    memcpy(fdata, k.pfdata(), (w * h) * sizeof(float));   // copy
    memcpy(bdata, k.pbdata(), (w * h) * sizeof(char));    // arrays
}
//

// Destructor ////////////////////////////////////////////

kernel::~kernel()
{
    // the arrays were allocated with new[], so release with delete[]
    if (fdata) delete[] fdata;  // deallocate data arrays
    if (bdata) delete[] bdata;
    if (mina) delete[] mina;
    if (maxa) delete[] maxa;
}
//

// Utility functions /////////////////////////////////////
//
// create(float ssigma, WORD ww, WORD hh) - calculates a kernel
//          of given dimensions from given sigma using the
//          formula from Meng & Li
//
//            -[(x^2 + y^2) - 2sigma^2] e^-[(x^2 + y^2)/(2sigma^2)]
//   k(x,y) = -----------------------------------------------------
//                                 2sigma^2
//
// arguments - ssigma (float) sigma value used to calculate kernel
//             ww, hh (WORD) dimensions of kernel array.
//
// side effects - allocates w * h array of float
//                allocates w * h array of char
//
// ***WARNING*** assumes data pointers are NULL! ***WARNING***
//
// return -  0 (int) on success
//          -1 (int) on memory allocation fail
//
int kernel::create(float ssigma, WORD ww, WORD hh)
{
    int xindex, yindex;
    float image, image1;

    sigma = ssigma;
    w = ww;
    h = hh;

    fdata = new float[w * h];   // allocate space
    bdata = new char[w * h];

    if (!fdata || !bdata)       // check for allocation error
        return -1;

    // use calculation from Meng/Li routine
    xindex = (w - 1)/2;
    yindex = (h - 1)/2;

    for (int i = -xindex; i < xindex + 1; i++)
    {
        for (int j = -yindex; j < yindex + 1; j++)
        {
            image = exp(-(float)(i * i + j * j)/(2 * sigma * sigma));
            image1 = ((float)(i * i + j * j) - 2 * sigma * sigma) * image;
            image = image1/(2 * sigma * sigma);
            // row-major store: row j, column i (w values per row);
            // identical to the previous indexing when w == h
            fdata[((j + yindex) * w) + (i + xindex)] = -image;
        }
    }

    float max = 0.0;            // find absolute max for normalization
    xindex = w * h;
    for (int k = 0; k < xindex; k++)
        if (max <= fabs(fdata[k]))
            max = fabs(fdata[k]);

    float factor = 127/max;     // fill binary array with values
    for (int l = 0; l < xindex; l++)    // normalized from float array
        bdata[l] = fdata[l] * factor;

    return 0;
}
//

// normalize() - normalize float values in kernel
//   arguments - none.
//   return - 0 (int) in all cases.
//
int kernel::normalize()
{
    float max = -1000.0, min = 1000.0;
    int size = w * h;

    for (int k = 0; k < size; k++)      // find min and max
    {
        if (max <= fdata[k])
            max = fdata[k];
        if (min >= fdata[k])
            min = fdata[k];
    }

    // normalize float data
    float span = max - min;
    for (int l = 0; l < size; l++)
        fdata[l] = (fdata[l] - min)/span + min/span;

    return 0;
}
//

// load(if stream &f) - reads kernel from file f. // arguments - f (ifstream &) input file stream. // side effects - allocates w * h array of float // allocates w * h array of char // return - 0 (int) on success // -1 (int) on memory allocation fail // -1 (int) on file access error

//

int kernel::load(ifstream &f)

{ f » w » h » sigma; // read header

    int size = w * h;
    int i = 0;

    if (fdata) delete [] fdata;        // deallocate existing arrays
    if (bdata) delete [] bdata;        // if any

    fdata = new float [size];          // allocate new arrays
    bdata = new char [size];

    if (!fdata || !bdata)              // if allocate failed
        return -1;                     // return -1

    while (i < size && f.good() && !f.eof())   // read float values
        f >> fdata[i++];

    if (f.bad() && !f.eof())           // if there was a file error
        return -1;                     // return -1

    float max = 0.0;
    int xindex = w * h;

    for (int k = 0; k < xindex; k++)   // find absolute max kernel value
        if (max <= fabs(fdata[k]))     // for normalization
            max = fabs(fdata[k]);

    float factor = max/127;            // fill binary array with values
    for (int l = 0; l < xindex; l++)   // normalized from float array
        bdata[l] = (char)(fdata[l] / factor);

    return 0;

} //
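The kernel file read by load() is plain text: a header of w, h and sigma followed by w * h float values. A small round-trip sketch in standard C++ is given here to make that layout concrete. The struct KernelText and the functions write_kernel and read_kernel are hypothetical names, and the whitespace separators are an assumption, since the listing does not show the file itself.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical in-memory form of the kernel text file (not from the thesis).
struct KernelText {
    int w;
    int h;
    float sigma;
    std::vector<float> taps;
};

// Writes header and taps separated by whitespace so they can be read back.
bool write_kernel(std::ostream& os, const KernelText& k)
{
    os << k.w << ' ' << k.h << ' ' << k.sigma << '\n';
    for (std::size_t i = 0; i < k.taps.size(); i++)
        os << k.taps[i] << ' ';
    return !os.fail();
}

// Reads header, then exactly w * h taps; fails on a short or malformed file.
bool read_kernel(std::istream& is, KernelText& k)
{
    if (!(is >> k.w >> k.h >> k.sigma) || k.w <= 0 || k.h <= 0)
        return false;
    k.taps.assign(static_cast<std::size_t>(k.w) * k.h, 0.0f);
    for (std::size_t i = 0; i < k.taps.size(); i++)
        if (!(is >> k.taps[i]))
            return false;
    return true;
}
```

Bounding the read by w * h, rather than reading until end of file, keeps a file with trailing data from writing past the end of the array.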

// save(ofstream &f) - save kernel to file.
// arguments - f (ofstream &) output file stream.
// return -  0 (int) on success
//          -1 (int) on file access error
//
int kernel::save(ofstream &f)
{
    f << w << ' ' << h << ' ' << sigma << '\n';   // write header

    int size = w * h;                  // set counter
    int i = 0;                         // and index

    while (size-- && f.good())         // write float values &
        f << fdata[i++] << ' ';        // check for file errors

    if (f.bad())                       // if there was an error
        return -1;                     // return -1
    else
        return 0;                      // otherwise return 0;
}
//

// max_out() - returns sum of positive values in binary array
// arguments - none.
//
int kernel::max_out()
{
    int retval = 0;

    for (int i = 0, x = w * h; i < x; i++)
        if (bdata[i] >= 0)
            retval += bdata[i];

    return (retval);

} //

// min_out() - returns sum of negative values in binary array
// arguments - none.
//
int kernel::min_out()
{
    int retval = 0;

    for (int i = 0, x = w * h; i < x; i++)
        if (bdata[i] <= 0)
            retval += bdata[i];

    return (retval);

}

//
// void kernel::calibrate()
// args - none
// return - void
//
// Purpose: calibrate convolution accelerator board for maximum
//          headroom for current kernel
//
void kernel::calibrate()
{
    cout << "Calibrating\n";

    int x = w * h;
    if (mina) delete [] mina;
    if (maxa) delete [] maxa;

    // create arrays that will produce min and max out for given kernel
    maxa = new BYTE [x];
    mina = new BYTE [x];               // allocate
    memset(maxa,0,x);
    memset(mina,0,x);                  // set to zero

    for (int i = 0; i < x; i++)
    {
        if (bdata[i] < 0)              // if kernel value negative
            mina[i] = 0xFF;            // set min array value to max pixel
        if (bdata[i] > 0)              // if kernel value positive
            maxa[i] = 0xFF;            // set max array value to max pixel
    }

    // make up to 512 passes at cal
    int limit = 512;

    while (!seek_zero() || !seek_max() || limit--);

} //
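calibrate() builds two image fragments: one that drives the multiplier array to its most negative sum for the current kernel and one that drives it to its most positive sum. The sketch below restates that construction in standard C++ and checks it against the sums that max_out() and min_out() describe. CalImages, make_cal_images and weighted_sum are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the two calibration fragments.
struct CalImages {
    std::vector<std::uint8_t> mina;   // full-scale pixels under negative taps
    std::vector<std::uint8_t> maxa;   // full-scale pixels under positive taps
};

// Same rule as the listing: 0xFF where the tap has the wanted sign, else 0.
CalImages make_cal_images(const std::vector<signed char>& k)
{
    CalImages c;
    c.mina.assign(k.size(), 0);
    c.maxa.assign(k.size(), 0);
    for (std::size_t i = 0; i < k.size(); i++) {
        if (k[i] < 0) c.mina[i] = 0xFF;
        if (k[i] > 0) c.maxa[i] = 0xFF;
    }
    return c;
}

// Ideal (noise-free) sum of products for a kernel and an image fragment.
long weighted_sum(const std::vector<signed char>& k,
                  const std::vector<std::uint8_t>& img)
{
    long s = 0;
    for (std::size_t i = 0; i < k.size(); i++)
        s += static_cast<long>(k[i]) * img[i];
    return s;
}
```

For a kernel with taps {-1, 0, 2, -3} the two fragments give ideal sums of 255 * (-4) and 255 * 2, that is, 255 times the sums of the negative and positive taps.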

// kernel::seek_zero() adjusts offset pot
// arguments - none.
// returns -  0 if achieves zero setting
//           -1 if zero setting not made in 384 tries
//
int kernel::seek_zero()
{
    cout << "Z";                       // report adjusting offset
    int z = 384;                       // set iteration limit
    static int lastmove = 0;           // store dir of last adjustment
    BYTE last_read = 0xFF, reading = 0xFF;   // set some starting values

    set_data(mina);    // fill image DACs with minimizing image frag

    while (z--)
    {
        settle();                      // let DAC array settle
        reading = read_adc();          // check value

        if (reading == 0 && last_read == 0 && lastmove == 1)
        {   // with these conditions, offset is BELOW 0
            ++zero_pot;                // so move adjustment UP
            last_read = reading;
            continue;                  // and try again
        }
        if (reading == 0 && last_read != 0 && lastmove == 1)
        {   // with these conditions,
            ++zero_pot;
            last_read = reading;
            continue;
        }
        if (reading != 0 && last_read == 0 && lastmove == 1)
        {   // these conditions indicate last adjustment was it
            --zero_pot;                // move back down by one
            last_read = reading;
            lastmove = -1;             // change direction indicator
            continue;                  // try again
        }
        if (reading != 0 && last_read != 0 && lastmove == 1)
        {   // moving in wrong direction!
            --zero_pot;                // move down
            last_read = reading;
            lastmove = -1;             // change direction indicator
            continue;                  // try again
        }
        if (reading == 0 && last_read == 0 && lastmove == -1)
        {   // conditions indicate last adjustment was the one
            ++zero_pot;                // move back up by one
            last_read = reading;
            lastmove = 1;              // reverse direction indicator
            continue;                  // try again
        }
        if (reading == 0 && last_read != 0 && lastmove == -1)
            return 0;                  // GOT IT!
        if (reading != 0 && last_read == 0 && lastmove == -1)
        {   // odd reading
            --zero_pot;                // keep moving down
            last_read = reading;
            continue;                  // try again
        }
        if (reading != 0 && last_read != 0 && lastmove == -1)
        {   // Too high
            --zero_pot;                // move down
            last_read = reading;
            continue;                  // try again
        }

    }

    return -1;
}

//
// kernel::seek_max() adjusts gain pot
// arguments - none.
// returns -  0 if achieves max setting
//           -1 if max setting not made in 384 tries
//
int kernel::seek_max()

{
    cout << "G";                       // report that we are adjusting gain
    int z = 384;                       // set iteration limit
    static int lastmove = 0;           // store direction of last adjustment
    BYTE last_read = 0, reading = 0;   // set some starting values

    set_data(maxa);    // fill image DACs with maximizing image fragment

    while (z--)
    {
        settle();                      // let DAC array settle
        reading = read_adc();          // check value

        if (reading == 0xFF && last_read == 0xFF && lastmove == 1)
        {   // Too high
            --gain_pot;                // move down
            last_read = reading;
            continue;                  // try again
        }
        if (reading == 0xFF && last_read != 0xFF && lastmove == 1)
        {
            --gain_pot;
            last_read = reading;
            continue;
        }
        if (reading != 0xFF && last_read == 0xFF && lastmove == 1)
        {
            ++gain_pot;
            last_read = reading;
            lastmove = -1;
            continue;
        }
        if (reading != 0xFF && last_read != 0xFF && lastmove == 1)
        {
            ++gain_pot;
            last_read = reading;
            lastmove = -1;
            continue;
        }
        if (reading == 0xFF && last_read == 0xFF && lastmove == -1)
        {   // Too high
            --gain_pot;                // keep moving down
            last_read = reading;
            lastmove = 1;
            continue;
        }
        if (reading == 0xFF && last_read != 0xFF && lastmove == -1)
            return 0;                  // GOT IT!
        if (reading != 0xFF && last_read == 0xFF && lastmove == -1)
        {
            ++gain_pot;
            last_read = reading;
            continue;
        }
        if (reading != 0xFF && last_read != 0xFF && lastmove == -1)
        {
            ++gain_pot;
            last_read = reading;
            continue;
        }
    }

    return -1;

} //
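Both seek routines hunt for the pot setting at which the ADC reading just leaves a rail (0 for the offset pot, 0xFF for the gain pot), giving up after a fixed number of tries. The following sketch shows that idea in a simplified form against a simulated board. It is not the author's state machine: FakeBoard, its linear pot-to-reading model, and seek_zero_edge are all assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the board: the ADC reads (pot - threshold),
// clamped to the 8-bit range 0..255.
struct FakeBoard {
    int pot;         // current pot setting
    int threshold;   // pot setting at which the reading just reaches zero
    int read() const
    {
        int v = pot - threshold;
        return v < 0 ? 0 : (v > 255 ? 255 : v);
    }
};

// Steps the pot until it sits at the highest setting that still reads 0.
// Returns 0 on success, -1 if the edge is not found within 'tries' steps.
int seek_zero_edge(FakeBoard& b, int tries)
{
    while (tries--) {
        if (b.read() != 0) {   // above zero: step down and try again
            --b.pot;
            continue;
        }
        ++b.pot;               // reads zero: probe one step up
        if (b.read() != 0) {   // crossed the edge, so step back and stop
            --b.pot;
            return 0;
        }
    }
    return -1;
}
```

The same loop with the comparison against 0xFF and the step directions reversed would give the corresponding gain search under this model.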

// kernel::set_data(BYTE *d)
// args - d (BYTE *)
// return - void
//
// loads image DACs from array pointed to by arg
//
void kernel::set_data(BYTE *d)
{
    outp(COUNTER, 0x00);               // set counter to point to image DACs

    for (int i = 0, j = (w * h)/2; i < j; i++)
        outpw(DACS,((int*)d)[i]);

} //
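set_data() sends the byte array to the board as 16-bit words and uses (w * h)/2 as the word count. For an odd number of bytes, such as a 3 x 3 kernel, integer division leaves the last byte unsent, whereas conv_accel() uses (kd * kd)/2 + 1 words. The sketch below makes the two counts concrete. pack_words and its round_up flag are hypothetical; whether the final byte matters on the actual board is not stated in the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pairs bytes into little-endian 16-bit words, as a
// 16-bit port write of a byte array would on an x86 machine.
// round_up = false mimics n/2 words; round_up = true pads the odd byte.
std::vector<std::uint16_t> pack_words(const std::vector<std::uint8_t>& bytes,
                                      bool round_up)
{
    std::size_t n = bytes.size() / 2;
    if (round_up && bytes.size() % 2 != 0)
        n += 1;

    std::vector<std::uint16_t> words(n, 0);
    for (std::size_t i = 0; i < n; i++) {
        const std::uint16_t lo = bytes[2 * i];
        const std::uint16_t hi =
            (2 * i + 1 < bytes.size()) ? bytes[2 * i + 1] : 0;
        words[i] = static_cast<std::uint16_t>(lo | (hi << 8));
    }
    return words;
}
```

With nine bytes the truncating count yields four words (eight bytes), and the padded count yields five.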

// kernel::read_adc()
// args - none
// return - ADC reading
//
// convert and return value on ADC input
//
BYTE kernel::read_adc()
{
    outp(ADC,0);                       // start converter
    settle();
    while (inp(COUNTER) & 0x80);       // wait for ADC status bit in counter

    return inp(ADC);
}

void kernel::settle()
{
    int a = 10000;                     // waste time

    while (a--);
}

/////////////////////////////////////////////////////////////////////
//
// File: IMAGE.H
// Author: Donald A. Symes
// Purpose: Definition for image class
//
// The image object holds an image, creates an optional border
// and performs convolution from a given kernel and a source image.
// The border is required to prevent aliasing at edges of image. The
// border is filled with copies of the edge pixels.
//
// The image is read from or stored in a text file with the following
// format -
//
// w h z              w (int) width of image in pixels
// i i i i i i ...    h (int) height in pixels
// i i i i i          z (int) maximum pixel value in image
// :                          (defaulted to 255)
// :                  i (int) pixels
//
// The data members are:
//
// w and h - WORD - width and height of image in pixels (bytes)
// z - WORD - maximum pixel value (defaulted to 255)
// b - WORD - width of border around active portion of image
// bdata - BYTE* - pointer to image data array

//

// The member functions are all public. Several constructors are

// provided:

//

// image() - generic constructor. No allocation.
//
// image(WORD width, WORD height, WORD border)
//   side effects - allocates data array from given dimensions
//                  and border, fills array with zeroes
//
// image(ifstream &f, WORD border = 0) - reads image from file
//   side effects - allocates data array
//
// image(image &k) - copy constructor
//
// A simple destructor guarantees orderly deallocation of image array
//
// ~image();

//

// Several utility functions are provided:
//
// load(ifstream &f, WORD border = 0) - read image from file
//   args - f (ifstream &) input file stream
//          border (WORD) border size
//   side effects - allocates data array (w + border) * (h + border)
//   return - (int) -1 on allocation error
//                   0 on no error
//
// save(ofstream &f) - write image to file
//   args - f (ofstream &) output file stream
//   return - (int) -1 on file error
//                   0 on no error
//
// width() - return (WORD) width of active image (except border)
// height() - return (WORD) height of active image (except border)
// max() - return (WORD) maximum pixel value
// bord() - return (WORD) border width
// pbdata() - return (BYTE *) pointer to image array
//
// display(WORD x,WORD y) - display image (active area) at given
//   coordinates. Does NOT perform bounds checking - image
//   position is responsibility of programmer.
//   args - x, y (WORD) pixel coordinates for upper left corner
//          of image.
//   return - void
//
// There are three versions of convolution routine.
//
// conv_float(const kernel& k, const image& i) - FP version
//   performs all calculations in floating point, for comparison.
// conv_int (const kernel& k, const image& i) - integer version
//   performs calculations using long integers, for comparison.
// conv_accel(const kernel& k, const image& i) - hardware accel
//   version. Perform calculation using accelerator board.
//
// Args - k (kernel &) kernel for convolution
//        i (image &) image to be convolved
// side effects - allocate and delete kernel-sized array and
//                small array of pointers
// return - (int) -1 on allocation error


//

// For comparison of efficiency, some various convolution routines

// use different methods to access the accelerator board.

//

// hw_conv(const image &I, BYTE *kernel) - 'C++' language version

// very slow, used DMA, obsolete.

//

// hw_conv_8_noblk(const image &I) - 8-bit transfers direct from

// image array. Slowest of ASM versions.

//

// hw_conv_8_block(const image &I) - 8-bit transfers using

// intermediate buffer.

//

// hw_conv_a(const image &I, BYTE *kernel) - Uses assembly code for

// bulk of convolution. 16-bit transfers direct from image array

//

// hw_conv_16_block(const image &I, const kernel &k) - 16-bit xfers

// using intermediate buffer. Fastest of these versions.

//

/////////////////////////////////////////////////////////////////////

#include <iostream.h>

#include <fstream.h>

#ifdef UICPP

extern int attached;

#endif

#ifndef UICPP
static int attached = 0;   // tracks state of display initialization

#endif

class image

{
    WORD w, h, z, b;       // width, height, maxpixel and border width
    BYTE *bdata;           // pointer to data array

public:
    // constructors
    image();                                        // generic
    image(WORD width, WORD height, WORD border);    // blank
    image(ifstream &f, WORD border = 0);            // read from file
    image(image &k);                                // copy

    ~image();                                       // destructor

    // Utility functions
    int load(ifstream &f, WORD border = 0);
    int save(ofstream &f);

    WORD width()  { return w; }
    WORD height() { return h; }
    WORD max()    { return z; }
    WORD bord()   { return b; }

    BYTE *pbdata() { return bdata; }

    int conv_float(kernel& k, image& i);
    int conv_int  (kernel& k, image& i);
    int conv_accel(kernel& k, image& i);

    void display(WORD x,WORD y);

//

// Obsolete (slow) convolution functions

//

// hw_conv(const image &I, BYTE *kernel);
// hw_conv_8_noblk(const image &I);
// hw_conv_8_block(const image &I);
// hw_conv_a(const image &I, BYTE *kernel);
// hw_conv_16_block(const image &I, const kernel &k);

};

/////////////////////////////////////////////////////////////////////
//
// File: IMAGE.CPP
// Author: Donald A. Symes
// Purpose: Definition for image class member functions
//
// The member functions are all public. Several constructors are
// provided:
//
// image() - generic constructor. No allocation.
//
// image(WORD width, WORD height, WORD border)
//   side effects - allocates data array from given dimensions
//                  and border, fills array with zeroes
//
// image(ifstream &f, WORD border = 0) - reads image from file
//   side effects - allocates data array
//
// image(image &k) - copy constructor
//
// A simple destructor guarantees orderly deallocation of image array
//
// ~image();
//
// Several utility functions are provided:
//
// load(ifstream &f, WORD border = 0) - read image from file
//   args - f (ifstream &) input file stream
//          border (WORD) border size
//   side effects - allocates data array (w + border) * (h + border)
//   return - (int) -1 on allocation error
//                   0 on no error
//
// save(ofstream &f) - write image to file
//   args - f (ofstream &) output file stream
//   return - (int) -1 on file error
//                   0 on no error
//
// width() - return (WORD) width of active image (except border)
// height() - return (WORD) height of active image (except border)
// max() - return (WORD) maximum pixel value
// bord() - return (WORD) border width
// pbdata() - return (BYTE *) pointer to image array
//
// display(WORD x,WORD y) - display image (active area) at given
//   coordinates
//
// There are three versions of convolution routine.
//
// conv_float(const kernel& k, const image& i) - FP version
//   performs all calculations in floating point, for comparison.
// conv_int (const kernel& k, const image& i) - integer version
//   performs calculations using long integers, for comparison.
// conv_accel(const kernel& k, const image& i) - hardware accel
//   version. Perform calculation using accelerator board.
//
// Args - k (kernel &) kernel for convolution
//        i (image &) image to be convolved
// side effects - allocate and delete kernel-sized array and
//                small array of pointers
// return - (int) -1 on allocation error
//

/////////////////////////////////////////////////////////////////////

#include <iostream.h>
#include <math.h>
#include <float.h>

#include "always.h"
#include "kernel.h"
#include "image.h"
#include "board.h"
extern "C" {
#include "pcdip.h"
}

// Constructors /////////////////////////////////////////////

//
// image() - generic constructor. No allocation.
//
image::image()
{
    bdata = NULL;
    w = h = z = b = 0;
    // cout << "image::image()\n";
}

//
// image(ifstream &f, WORD border = 0) - reads image from file
// side effects - allocates data array
//
image::image(ifstream &f, WORD border)
{
    f >> w >> h >> z;                  // read header
    b = border;

    int skip = border * 2;             // set some indexing constants
    int height = h + border;
    int width = w + border;
    int row = w + skip;
    int read_data, i, j;

    bdata = new BYTE[(w + skip) * (h + skip)];   // alloc image space
    memset(bdata,0,(w+skip)*(h+skip));           // and set it to zero

    for (i = border; i < height; i++)
    {
        for (j = border; j < width; j++)
        {
            f >> read_data;                      // read pixels into active
            bdata[(i * row) + j] = read_data;    // part of array
        }
    }

    if (border)    // if there is a border, copy pixels from edge of active
    {              // part of array to fill border
        height += border - 1;
        for (int k = border; k > 0; k--)
        {
            memcpy(&(bdata[(k - 1) * row]), &(bdata[k * row]), row);
            memcpy(&(bdata[(height - k + 1) * row]),
                   &(bdata[(height - k) * row]), row);
        }

        int stop = height + border;
        int ndx;
        for (int m = 0; m < stop; m++)
        {
            ndx = m * row;
            for (int n = 0; n < border; n++)
            {
                bdata[ndx + border - n - 1] = bdata[ndx + border];
                bdata[ndx + width + n] = bdata[ndx + width - 1];
            }
        }
    }
}
//

// image(WORD width, WORD height, WORD border)
// side effects - allocates data array from given dimensions
//                and border, fills array with zeroes
//
image::image(WORD ww, WORD hh, WORD border)
{
    z = 255;
    w = ww;
    h = hh;
    b = border;
    bdata = new BYTE [(w + border) * (h + border)];
    memset(bdata,0,(w + border) * (h + border));

} //

// image(image &k) - copy constructor
//
image::image(image &k)
{
    h = k.height();
    w = k.width();
    z = k.max();
    b = k.bord();
    bdata = new BYTE [(w + b) * (h + b)];
    memcpy(bdata,k.pbdata(),(w * h) * sizeof(char));

} //

// destructor
//
image::~image()
{
    if (bdata) delete [] bdata;
}


///////////////////////////////////////////////////////////////////

//

// Utility functions

//

///////////////////////////////////////////////////////////////////

//

// load(ifstream &f, WORD border = 0) - read image from file
//   args - f (ifstream &) input file stream
//          border (WORD) border size
//   side effects - allocates data array (w + border) * (h + border)
//   return - (int) -1 on allocation error
//                   0 on no error
//
int image::load(ifstream &f, WORD border)
{
    f >> w >> h >> z;                  // read header
    b = border;

    int skip = border * 2;             // set up some indexing constants
    int height = h + border;
    int width = w + border;
    int row = w + skip;
    int read_data;

    // allocate data array, return -1 if allocation fails
    if ((bdata = new BYTE[(w + skip) * (h + skip)]) == NULL)
        return -1;

    for (int i = border; i < height; i++)    // read data into active
    {                                        // part of array
        for (int j = border; j < width; j++)
        {
            f >> read_data;
            bdata[(i * row) + j] = read_data;
        }
    }

    if (border)    // if there is a border, fill it with pixels from
    {              // edge of active part of array
        for (int k = border; k > 0; k--)
        {   // copies top & bottom row into top & bottom border
            memcpy(&(bdata[(k - 1) * row]), &(bdata[k * row]), row);
            memcpy(&(bdata[(height - k + 1) * row]),
                   &(bdata[(height - k) * row]), row);
        }

        int stop = height + border;          // create indexing constants
        int ndx;

        for (int m = 0; m < stop; m++)       // now copy left & right edge
        {                                    // columns to left & right borders
            ndx = m * row;
            for (int n = 0; n < border; n++)
            {
                bdata[ndx + border - n - 1] = bdata[ndx + border];
                bdata[ndx + width + n] = bdata[ndx + width - 1];
            }
        }
    }

    return 0;

} //
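The border fill described for the image class copies the nearest edge pixel of the active area into every border position. A compact way to express that rule, independent of the row-by-row memcpy approach in the listing, is to clamp each padded coordinate back into the active area. The sketch below is in standard C++; pad_replicate and its use of std::vector are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a (w + 2*border) x (h + 2*border) image whose
// border positions hold copies of the nearest edge pixel of src (w x h).
std::vector<std::uint8_t> pad_replicate(const std::vector<std::uint8_t>& src,
                                        int w, int h, int border)
{
    const int row = w + 2 * border;
    const int rows = h + 2 * border;
    std::vector<std::uint8_t> out(static_cast<std::size_t>(row) * rows, 0);

    for (int y = 0; y < rows; y++) {
        // clamp the source row into the active area
        const int sy = std::min(std::max(y - border, 0), h - 1);
        for (int x = 0; x < row; x++) {
            // clamp the source column into the active area
            const int sx = std::min(std::max(x - border, 0), w - 1);
            out[static_cast<std::size_t>(y) * row + x] =
                src[static_cast<std::size_t>(sy) * w + sx];
        }
    }
    return out;
}
```

Because every output pixel is computed from the source directly, the result does not depend on the order in which rows and columns are filled, and any border width is handled the same way.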

// save(ofstream &f) - write image to file
//   args - f (ofstream &) output file stream
//   return - (int) -1 on file error
//                   0 on no error
//
int image::save(ofstream &f)
{
    int x_ndx;
    int y_ndx = b * (w + (2 * b));     // set up indexing constants
    int x_max = w + b;                 // and variables
    int y_inc = x_max + b;
    int stop = y_inc * (h + b);

    // write image to file
    for (; y_ndx < stop && !f.bad(); y_ndx += y_inc)
        for (x_ndx = b; x_ndx < x_max && !f.bad(); x_ndx++)
            f << bdata[y_ndx + x_ndx];

    if (f.bad())
    {
        cerr << "Error saving file\n";
        return -1;
    }

    return 0;
}
//

// conv_float(const kernel& k, const image& i)
//   args - k (kernel &) kernel to be used in convolution
//          i (image &) image for convolution source
//
// Perform convolution using floating point accumulator - allows
// maximum dynamic range so sum won't overflow
//
int image::conv_float(kernel& k, image& i)
{
    int ksize = k.width();             // set up indexing constants
    int width = i.width();
    int height = i.height();

    int skip = ksize - 1;
    float max = -1000.0, min = 1000.0;
    float *ker = k.pfdata();           // get pointer to FP kernel data
    float acc;                         // use float for accumulator

    int *idata = new int [(width + skip) * (height + skip)];
    if (idata == NULL)     // allocate int array for temp target image
        return -1;         // if allocation failed, return error

    BYTE **img = new BYTE *[ksize];
    if (img == NULL)       // allocate BYTE pointer-pointer to source
        return -1;         // image conveniently (one pointer per row in
                           // kernel)

    img[0] = i.pbdata();   // point each at a row in source image
    for (int a = 1; a < ksize; a++)
        img[a] = img[a - 1] + width + skip;

    // this segment is the actual convolution
    for (int f = 0; f < height; f++)               // for each column
    {
        for (int c = 0; c < width; c++)            // of each row
        {
            acc = 0.0;                             // clear accumulator, then
            for (int d = 0; d < ksize; d++)        // for each kernel column
            {
                for (int e = 0; e < ksize; e++)    // for each kernel row
                {   // accumulate products
                    acc = acc + (float)(ker[(d*ksize)+e] * img[d][e]);
                    e = e;                         // extra line for debug
                }
            }
            if (acc >= max)    // track maximum and minimum values
                max = acc;     // for normalization
            if (acc <= min)
                min = acc;
            // convert float result to int
            // and place in int temp image
            idata[((f + skip/2) * (width + skip)) + c + skip/2] = acc;

            for (int p = 0; p < ksize; p++)        // update source pointers
                img[p]++;
        }
        for (int q = 0; q < ksize; q++)    // end of row - update source
            img[q] += skip;                // pointers to column past border
    }   // end of convolution section

    max = max - min;       // determine range of convolution output
    for (int g = 0, h = (width + skip) * (height + skip); g < h; g++)
        bdata[g] = (unsigned char)(255.0 * ((float)idata[g] - min)/max);
                           // normalize int array into BYTE array

    delete [] img;         // de-allocate local arrays
    delete [] idata;
    return 0;
}

//
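The floating-point routine slides the kernel over a bordered source image, accumulates the products for each output pixel, tracks the minimum and maximum sums, and finally rescales the sums to the range 0..255. The sketch below restates those steps in standard C++ as a reference against which other versions could be compared. convolve_norm, the rounding by adding 0.5, and the handling of a zero span are assumptions for illustration; as in the listing, the kernel is applied without flipping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference routine. 'padded' holds a (w + ksize - 1) x
// (h + ksize - 1) source image; the result is the w x h active area,
// rescaled so the smallest sum maps to 0 and the largest to 255.
std::vector<std::uint8_t> convolve_norm(const std::vector<std::uint8_t>& padded,
                                        int w, int h,
                                        const std::vector<float>& ker, int ksize)
{
    const int row = w + ksize - 1;                 // padded row length
    std::vector<float> acc(static_cast<std::size_t>(w) * h, 0.0f);
    float lo = 0.0f, hi = 0.0f;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            float s = 0.0f;                        // accumulator for this pixel
            for (int d = 0; d < ksize; d++)
                for (int e = 0; e < ksize; e++)
                    s += ker[d * ksize + e] *
                         padded[static_cast<std::size_t>(y + d) * row + (x + e)];
            acc[static_cast<std::size_t>(y) * w + x] = s;
            if (y == 0 && x == 0) { lo = s; hi = s; }   // seed the range
            if (s < lo) lo = s;
            if (s > hi) hi = s;
        }
    }

    std::vector<std::uint8_t> out(acc.size(), 0);
    const float span = hi - lo;
    if (span == 0.0f)
        return out;                                // flat result: all zeros
    for (std::size_t k = 0; k < acc.size(); k++)
        out[k] = static_cast<std::uint8_t>(255.0f * ((acc[k] - lo) / span) + 0.5f);
    return out;
}
```

Seeding the range from the first sum, instead of from fixed values such as -1000 and 1000, keeps the normalization valid even when every sum lies outside that interval.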

// conv_int (const kernel& k, const image& i)
//   args - k (kernel &) kernel to be used in convolution
//          i (image &) image for convolution source
//
// Perform convolution using floating point calculations.
// Convert source image and kernel data to fractions to most
// closely simulate board operation.
//
int image::conv_int (kernel& k, image& i)
{
    int ksize = k.width();             // make local indexing constants
    int width = i.width();
    int height = i.height();
    int skip = ksize - 1;
    int max = k.max_out(), min = k.min_out();   // get max range

    char *ker = k.pbdata();            // get pointer to kernel BYTE array
    float acc, kk, ii;

    BYTE **img = new BYTE *[ksize];    // allocate array of pointers
    if (img == NULL) return -1;        // if alloc fails, return error
    img[0] = i.pbdata();               // simplify addressing
    for (int a = 1; a < ksize; a++)
        img[a] = img[a - 1] + width + skip;

    // Convolution section
    for (int f = 0; f < height; f++)               // For each column,
    {
        for (int c = 0; c < width; c++)            // of each row of source image -
        {
            acc = 0.0;                             // clear accumulator
            for (int d = 0; d < ksize; d++)        // for each col of kernel
                for (int e = 0; e < ksize; e++)    // for each row of kernel
                {   // convert kernel value to fraction
                    kk = (float)(ker[(d * ksize) + e])/127.0;
                    // convert image value to fraction
                    ii = (float)(img[d][e])/255.0;
                    // accumulate convolution value
                    acc = acc + (kk * ii);
                }
            // convert float convolution value to normalized BYTE value
            bdata[((f + skip/2) * (width + skip)) + c + skip/2] =
                255 * ((acc * 127.0) - min)/(max - min);

            for (int p = 0; p < ksize; p++)        // advance pointer set to next
                img[p]++;                          // source column
        }
        for (int q = 0; q < ksize; q++)            // at end of row, skip border
            img[q] += skip;
    }
    // end of convolution section

    delete [] img;                                 // deallocate local array
    return 0;
}

//

// conv_accel(const kernel& k, const image& i)
//
//   args - k (kernel &) kernel to be used in convolution
//          i (image &) image for convolution source
//
// Perform convolution using hardware accelerator.
//
int image::conv_accel(kernel& k, image& i)
{
    int x = w, y = h, kd = k.width();  // Make local versions of
    int blk_sz = (kd * kd)/2 + 1;      // variables to simplify
    int inc_x = kd - 1;                // (and speed up) addressing
    BYTE *dt = bdata;

    BYTE *blk = new BYTE [(kd * kd) + 1];      // Allocate buffer
    static BYTE **img = new BYTE *[kd];        // Allocate pointer array
                                               // source image

    if ((blk == NULL) || (img == NULL))        // if alloc fail, return
        return -1;                             // error

    img[0] = i.pbdata();               // set up pointer array
    for (int j = 1; j < kd; j++)       // one pointer per kern row
        img[j] = img[j - 1] + x + kd - 1;      // pointing to first N rows of image

_asm    push si                 ;// preserve critical registers
_asm    push di
_asm    push es
_asm    push ds

_asm    mov  cx, y              ;// set image row counter
y_loop:
_asm    push cx                 ;// save image row counter
_asm    mov  cx, x              ;// set image column counter

x_loop:
_asm    push cx                 ;// save image column counter
_asm    mov  cx, kd             ;// set buffer xfer outer loop counter
_asm    mov  dx, cx             ;// make copy for inner loop counter
_asm    les  di, [blk]          ;// set target pointer to buffer
_asm    mov  bx, 0              ;// set buffer offset

b_loop:
_asm    push cx                 ;// save outer loop counter
_asm    mov  cx, dx             ;// move inner loop counter to count reg
_asm    push ds                 ;// save data segment for string ops
_asm    lds  si, [img]          ;// get source pointer for block row
_asm    lds  si, [si + bx]      ;// from pointer array
_asm    repnz movsb             ;// copy current block row to buf
_asm    add  bx, 4              ;// go to next pointer in pointer array
_asm    pop  ds                 ;// get data segment back to get next ptr
_asm    pop  cx                 ;// get outer loop counter
_asm    loop b_loop             ;// decrement counter, if != 0, loop

                                // Having copied data to buffer,
_asm    mov  cx, blk_sz         ;// set buffer counter
_asm    mov  dx, COUNTER        ;// point I/O address to COUNTER
_asm    xor  al, al             ;// clear value in COUNTER
_asm    out  dx, al

_asm    mov  dx, DACS           ;// now point I/O address to DAC array
_asm    push ds                 ;// save data segment for string operation
_asm    lds  si, [blk]          ;// set up source pointer to buffer
_asm    repnz outsw             ;// string I/O copy buffer to board
_asm    pop  ds                 ;// get data segment back

_asm    in   al, dx             ;// let board settle
_asm    in   al, dx
_asm    in   al, dx
_asm    in   al, dx
_asm    inc  dx
_asm    out  dx, al             ;// start converter

                                // increment row pointers
_asm    les  di, [img]          ;// point to pointer array
_asm    mov  bx, 0              ;// clear offset
_asm    mov  cx, kd             ;// set counter

p_loop:
_asm    mov  si, es:[di+bx]     ;// get pointer from array
_asm    inc  si                 ;// increment it
_asm    mov  es:[di+bx], si     ;// store it
_asm    add  bx, 4              ;// go to next pointer
_asm    loop p_loop             ;// decrement counter, if != 0, loop

_asm    mov  dx, COUNTER        ;// wait for ADC status bit in counter
not_done:
_asm    in   al, dx             ;// get status
_asm    and  al, 0x80           ;// isolate status
_asm    jnz  not_done           ;// test it, and act

_asm    mov  dx, ADC            ;// ADC is ready now
_asm    in   al, dx             ;// read data
_asm    les  di, [dt]           ;// get current image pointer
_asm    mov  es:[di], al        ;// store data
_asm    inc  di                 ;// increment image pointer
_asm    mov  WORD PTR [dt], di  ;// and store it

_asm    pop  cx                 ;// get column counter
_asm    loop x_loop             ;// decrement counter, if != 0, loop

_asm    mov  cx, kd             ;// if at last column, need to skip
_asm    mov  bx, 0              ;// pointer array past border
_asm    les  di, [img]

p_loop2:
_asm    mov  si, es:[di+bx]     ;// get pointer from array
_asm    add  si, inc_x          ;// increment by 2*border to next row
_asm    mov  es:[di+bx], si     ;// store updated pointer
_asm    add  bx, 4              ;// point to next pointer
_asm    loop p_loop2            ;// decrement counter, if != 0, loop

_asm    mov  di, WORD PTR [dt]  ;// and do the same for target image
_asm    add  di, inc_x          ;// data pointer
_asm    mov  WORD PTR [dt], di

_asm    pop  cx                 ;// now get row counter
_asm    dec  cx                 ;// decrement counter, if == 0, done
_asm    jz   done               ;// this locution required because
_asm    jmp  y_loop             ;// cond jumps limited to +/-128 bytes

done:                           // finished with convolution
_asm    pop  ds                 ;// restore registers
_asm    pop  es
_asm    pop  si
_asm    pop  di

    return 0;                   // return OK
}
//

//

// display(WORD x,WORD y) - display image (active area) at given
//   coordinates
//   args - x, y (WORD) pixel coordinates for upper-left corner
//          of image in this object
//   return - void
//
// Displays active image area with upper left corner of image at
// coordinates (x, y). Bounds checking is NOT performed - position
// is responsibility of programmer.
//
void image::display(WORD x,WORD y)
{
    // int screen_x, screen_y;

    if (!attached)     // if image display library not initialized
    {
        asm push es                    // save critical register
        attached = pc_attach();        // initialize graphics library
        pc_background_black();         // to 320 x 200
        pc_gray_shades();              // with 64 level grey scale
        asm pop es                     // restore critical register
    }

    if (!attached)     // if graphics library init fails
        return;        // return without attempt to display

    int row = w + 2 * b;               // set an indexing constant

    for (int i = b, screen_y = y; i < h + b; i++, screen_y++)
    {   // starting from active corner
        for (int j = b, screen_x = x; j < w + b; j++, screen_x++)
        {   // display active area of image
            asm push es    // library does not preserve this register
            pc_write_pixel(screen_y,screen_x,bdata[(i*row)+j]/4);
            asm pop es     // scale pixel to 64-level grey
        }
    }
}
//

// hw_conv(const image &I, BYTE *kernel) - 'C++' language version
// very slow, used DMA, obsolete.
//
//void image::hw_conv(const image &I, BYTE *kernel)
//{
//  int block_size = kern_dim * kern_dim;
//  int counter = 0, stop = siz_x * siz_y;
//  BYTE *i_buf = new BYTE[block_size];
//  int x = kern_dim/2, y = kern_dim/2;
//
//  setup_dma(kernel,block_size,7);
//  outp(COUNTER,0x80);
//  dma_cycle_nr(7);
//
//  setup_dma(i_buf,block_size,7);
//  I.get_block(i_buf);
//
//  while(counter++ < stop)
//  {
//      dma_cycle(7);
//
//_asm  mov  cx, 0x00FF;    // need a short delay for board to settle
//here:
//_asm  loop here;
//
//      outp(ADC,0);
//      I.get_block(NULL);
//      block_size = inp(ADC);
//      write_pixel(block_size,x,y);
//      if (++x > siz_x + kern_dim/2 - 1)
//      {
//          x = kern_dim/2;
//          y++;
//      }
//  }
//}

////
//// hw_conv_a(const image &I, BYTE *kernel) - Uses asm code for
//// bulk of convolution. 16-bit xfers direct from image array
////
//// *** OBSOLETE ***
////
//void image::hw_conv_a(const image &I, BYTE *kernel)
//{
//  int block_size = (kern_dim*kern_dim)/2+1;      // set up indexing
//  int p;                                         // constants
//  int x = (kern_dim/2)-1, y = kern_dim/2;
//  int w = siz_x + kern_dim - 1;
//  int sx = siz_x, sy = siz_y, kd = kern_dim;
//
//  // static arrays
//  static BYTE *i_buf = new BYTE [block_size];    // transfer block
//  static BYTE **img = new BYTE *[kern_dim];      // source ptr array
//  static BYTE *dt;                               // target pointer
//
//  dt = data;                                     // set target pointer
//
//  if (!i_buf || !img)                            // if array alloc failed
//      abort();                                   // die without explanation
//
//  img[0] = I.get_img_ptr();                      // set source pointers
//  for (int i = 1; i < kern_dim; i++)
//      img[i] = img[i - 1] + w;
//
//_asm  push si                 ;// save critical registers
//_asm  push di
//_asm  push bx
//
//_asm  mov  dx, COUNTER        ;// set counter to access kernel
//_asm  mov  al, 0x80
//_asm  out  dx, al
//
//_asm  mov  si, [kernel]       ;// put kernel in DACs
//_asm  mov  cx, block_size
//_asm  mov  dx, DACS
//_asm  repnz outsw
//_asm  mov  dx, CLR_COUNT
//_asm  out  dx, al
//
//_asm  mov  cx, kd             ;// copy first block to transfer buffer
//_asm  mov  bx, [img]
//_asm  mov  di, [i_buf]
//_asm  push es
//_asm  push ds
//_asm  pop  es
//
//k_loop1:
//_asm  push cx
//_asm  mov  cx, kd
//_asm  mov  si, [bx]
//_asm  repnz movsb
//_asm  mov  [bx], si           ;// increment pointers
//_asm  inc  bx
//_asm  inc  bx
//_asm  pop  cx
//_asm  loop k_loop1
//_asm  pop  es
//
//_asm  mov  cx, sy
//y_loop:
//_asm  push cx
//_asm  mov  cx, sx
//
//x_loop1:
//_asm  push cx
//
//_asm  mov  dx, ADC            ;// start ADC
//_asm  out  dx, al
//
//_asm  mov  si, [i_buf]        ;// during conversion, xfer next
//_asm  mov  cx, block_size     ;// block to board (overlap)
//_asm  mov  dx, DACS
//_asm  repnz outsw
//
//_asm  mov  cx, kd             ;// copy next block to buffer
//_asm  mov  bx, [img]
//_asm  mov  di, [i_buf]
//_asm  push es
//_asm  push ds
//_asm  pop  es
//
//k_loop2:
//_asm  push cx
//_asm  mov  cx, kd
//_asm  mov  si, [bx]
//_asm  repnz movsb
//_asm  mov  [bx], si           ;// increment pointers
//_asm  inc  bx
//_asm  inc  bx
//_asm  pop  cx
//_asm  loop k_loop2
//_asm  pop  es
//
//_asm  mov  dx, ADC            ;// conversion should be done now,
//_asm  in   al, dx             ;// get result
//
//_asm  mov  cl, al             ;// do some perverted and time-
//_asm  mov  ax, y              ;// consuming address calculation
//_asm  imul w
//_asm  add  ax, x
//_asm  mov  bx, [dt]
//_asm  add  bx, ax
//_asm  mov  [bx], cl           ;// store result in target image
//
//_asm  mov  ax, x              ;// yet more addressing
//_asm  inc  ax
//_asm  mov  x, ax
//_asm  cmp  ax, w
//_asm  jle  same_row
//_asm  mov  ax, kd
//_asm  shr  ax, 1
//_asm  mov  x, ax
//_asm  inc  y
//
//_asm  mov  bx, [img]
//_asm  mov  cx, kd
//_asm  mov  ax, cx
//_asm  dec  ax
//k_loop3:
//_asm  mov  si, [bx]
//_asm  add  si, ax
//_asm  mov  [bx], si
//_asm  inc  bx
//_asm  inc  bx
//_asm  loop k_loop3
//
//same_row:
//_asm  pop  cx
//_asm  loop x_loop1
//_asm  pop  cx
//_asm  loop y_loop
//
//_asm  pop  bx
//_asm  pop  di
//_asm  pop  si
//}
////

//// hw_conv_8_block(const image *I) - accel convolution using
////        8-bit transfers and intermediate buffer.
////
////        *** OBSOLETE ***
////
//void image::hw_conv_8_block(const image *I)
//{
//  int blk_sz = kern_dim * kern_dim;
//  int x = siz_x, y = siz_y, kd = kern_dim;
//  int inc_x = kd - 1;
//
//  BYTE *dt = data;
//  BYTE *blk = new BYTE[blk_sz];
//  static BYTE **img = new BYTE *[kd];
//
//  img[0] = I.get_img_ptr();
//  for (int i = 1; i < kd; i++)
//      img[i] = img[i - 1] + siz_x + kd - 1;
//
//_asm    push    si
//_asm    push    di
//_asm    push    es
//_asm    push    ds
//_asm    mov     cx, y
//
//y_loop:
//_asm    push    cx
//_asm    mov     cx, x
//
//x_loop:
//_asm    push    cx
//_asm    mov     cx, kd
//_asm    mov     dx, cx
//_asm    les     di, [blk]
//_asm    mov     bx, 0
//
//b_loop:
//_asm    push    cx
//_asm    mov     cx, dx
//_asm    push    ds
//_asm    lds     si, [img]
//_asm    lds     si, [si + bx]
//_asm    repnz   movsb
//_asm    add     bx, 4
//_asm    pop     ds
//_asm    pop     cx
//_asm    loop    b_loop
//
//_asm    mov     cx, blk_sz
//_asm    mov     dx, COUNTER
//_asm    xor     al, al
//_asm    out     dx, al
//
//_asm    mov     dx, DACS
//_asm    push    ds
//_asm    lds     si, [blk]
//_asm    repnz   outsb
//_asm    pop     ds
//
//_asm    in      al, dx              ;// let board settle
//_asm    in      al, dx
//_asm    in      al, dx
//_asm    in      al, dx
//_asm    inc     dx
//_asm    out     dx, al              ;// start ADC
//
//_asm    les     di, [img]
//_asm    mov     bx, 0
//_asm    mov     cx, kd
//
//p_loop:
//_asm    mov     si, es:[di+bx]
//_asm    inc     si
//_asm    mov     es:[di+bx], si
//_asm    add     bx, 4
//_asm    loop    p_loop
//
//_asm    mov     dx, COUNTER
//not_done:
//_asm    in      al, dx
//_asm    and     al, 0x80
//_asm    jnz     not_done
//
//_asm    mov     dx, ADC
//_asm    in      al, dx
//_asm    les     di, [dt]
//_asm    mov     es:[di], al
//_asm    inc     di
//_asm    mov     WORD PTR [dt], di
//
//_asm    pop     cx
//_asm    loop    x_loop
//
//_asm    mov     cx, kd
//_asm    mov     bx, 0
//_asm    les     di, [img]
//
//p_loop2:
//_asm    mov     si, es:[di+bx]
//_asm    add     si, inc_x
//_asm    mov     es:[di+bx], si
//_asm    add     bx, 4
//_asm    loop    p_loop2
//_asm    mov     di, WORD PTR [dt]
//_asm    add     di, inc_x
//_asm    mov     WORD PTR [dt], di
//
//_asm    pop     cx
//_asm    dec     cx
//_asm    jz      done
//_asm    jmp     y_loop
//
//done:
//_asm    pop     ds
//_asm    pop     es
//_asm    pop     si
//_asm    pop     di
//}
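The block-copy loop (b_loop) in the routine above gathers kd bytes from each of kd row pointers into one contiguous buffer before that buffer is streamed to the DACs. A minimal C++ sketch of the gathering step follows. It assumes a row-major image that already includes its border, with a padded row length equal to siz_x + kd - 1 as in the listing; the function name gather_window and its parameters are illustrative and do not appear in the thesis code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the thesis): copy the kd x kd window whose
// top-left corner is (x0, y0) out of a row-major image into one contiguous
// block, one row at a time, much as b_loop does with its row pointers.
// 'stride' is the padded row length (siz_x + kd - 1 in the listing).
std::vector<std::uint8_t> gather_window(const std::vector<std::uint8_t>& img,
                                        std::size_t stride,
                                        std::size_t x0, std::size_t y0,
                                        std::size_t kd)
{
    std::vector<std::uint8_t> blk;
    blk.reserve(kd * kd);
    for (std::size_t r = 0; r < kd; ++r) {
        const std::size_t row_start = (y0 + r) * stride + x0;
        for (std::size_t c = 0; c < kd; ++c)
            blk.push_back(img.at(row_start + c));   // at() throws if the window leaves the image
    }
    return blk;
}
```

The sketch recomputes each row offset from (x0, y0), whereas the listing keeps one pointer per row and advances all of them after every pixel; both visit the same bytes in the same order.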

////
//// hw_conv_16_block(const image *I, const kernel *k) - conv using
////        16-bit xfers and intermediate buffer. Fastest of these versions
////
////        *** OBSOLETE ***
////
//void image::hw_conv_16_block(const image &I)
//{
//  int blk_sz = (kern_dim * kern_dim)/2 + 1;
//  int x = siz_x, y = siz_y, kd = kern_dim;
//  int inc_x = kd - 1;
//
//  BYTE *dt = data;
//  BYTE *blk = new BYTE[(kern_dim * kern_dim) + 1];
//  static BYTE **img = new BYTE *[kd];
//
//  img[0] = I.get_img_ptr();
//  for (int i = 1; i < kd; i++)
//      img[i] = img[i - 1] + siz_x + kd - 1;
//
//_asm    push    si
//_asm    push    di
//_asm    push    es
//_asm    push    ds
//_asm    mov     cx, y
//
//y_loop:
//_asm    push    cx
//_asm    mov     cx, x
//
//x_loop:
//_asm    push    cx
//_asm    mov     cx, kd
//_asm    mov     dx, cx
//_asm    les     di, [blk]
//_asm    mov     bx, 0
//
//b_loop:
//_asm    push    cx
//_asm    mov     cx, dx
//_asm    push    ds
//_asm    lds     si, [img]
//_asm    lds     si, [si + bx]
//_asm    repnz   movsb
//_asm    add     bx, 4
//_asm    pop     ds
//_asm    pop     cx
//_asm    loop    b_loop
//
//_asm    mov     cx, blk_sz
//_asm    mov     dx, COUNTER
//_asm    xor     al, al
//_asm    out     dx, al
//
//_asm    mov     dx, DACS
//_asm    push    ds
//_asm    lds     si, [blk]
//_asm    repnz   outsw
//_asm    pop     ds
//
//_asm    in      al, dx              ;// let board settle
//_asm    in      al, dx
//_asm    in      al, dx
//_asm    in      al, dx
//_asm    inc     dx
//_asm    out     dx, al              ;// start ADC
//
//_asm    les     di, [img]
//_asm    mov     bx, 0
//_asm    mov     cx, kd
//
//p_loop:
//_asm    mov     si, es:[di+bx]
//_asm    inc     si
//_asm    mov     es:[di+bx], si
//_asm    add     bx, 4
//_asm    loop    p_loop
//
//_asm    mov     dx, COUNTER
//not_done:
//_asm    in      al, dx
//_asm    and     al, 0x80
//_asm    jnz     not_done
//
//_asm    mov     dx, ADC
//_asm    in      al, dx
//_asm    cmp     al, 128
//_asm    jae     hi
//_asm    xor     al, al
//hi:
//_asm    les     di, [dt]
//_asm    mov     es:[di], al
//_asm    inc     di
//_asm    mov     WORD PTR [dt], di
//
//_asm    pop     cx
//_asm    loop    x_loop
//
//_asm    mov     cx, kd
//_asm    mov     bx, 0
//_asm    les     di, [img]
//
//p_loop2:
//_asm    mov     si, es:[di+bx]
//_asm    add     si, inc_x
//_asm    mov     es:[di+bx], si
//_asm    add     bx, 4
//_asm    loop    p_loop2
//_asm    mov     di, WORD PTR [dt]
//_asm    add     di, inc_x
//_asm    mov     WORD PTR [dt], di
//
//_asm    pop     cx
//_asm    dec     cx
//_asm    jz      done
//_asm    jmp     y_loop
//
//done:
//_asm    pop     ds
//_asm    pop     es
//_asm    pop     si
//_asm    pop     di
//}

////
//// hw_conv_8_noblk(const image *I) - using 8-bit transfers
////        direct from image array. Slowest of ASM versions.
////
//void image::hw_conv_8_noblk(const image &I)
//{
//  int blk_sz = kern_dim * kern_dim;
//  int x = siz_x, y = siz_y, kd = kern_dim;
//  int inc_x = kd - 1;
//  int size = ((siz_x + kd - 1) * (siz_y + kd - 1))/2;
//
//  BYTE *dt = data;
//  BYTE *blk = new BYTE[blk_sz];
//  static BYTE **img = new BYTE *[kd];
//
//  img[0] = I.get_img_ptr();
//  for (int i = 1; i < kd; i++)
//      img[i] = img[i - 1] + siz_x + kd - 1;
//
//_asm    push    si
//_asm    push    di
//_asm    push    es
//_asm    push    ds
//
//_asm    les     di, [img]
//_asm    lds     si, es:[di]
//_asm    push    ds
//_asm    push    si
//_asm    pop     di
//_asm    pop     es
//_asm    mov     cx, size
//_asm    repnz   movsw
//
//_asm    pop     ds
//_asm    push    ds
//_asm    mov     cx, y
//_asm    dec     cx
//
//y_loop:
//_asm    push    cx
//_asm    mov     cx, x
//
//x_loop:
//_asm    push    cx
//_asm    mov     cx, kd
//_asm    mov     di, cx
//_asm    les     bx, [img]
//
//_asm    mov     dx, COUNTER
//_asm    xor     al, al
//_asm    out     dx, al
//_asm    mov     dx, DACS
//_asm    push    ds
//
//b_loop:
//_asm    push    cx
//_asm    mov     cx, di
//_asm    lds     si, es:[bx]
//_asm    repnz   outsb
//_asm    add     bx, 4
//_asm    pop     cx
//_asm    loop    b_loop
//
//_asm    pop     ds
//
//_asm    in      al, dx              ;// let board settle
//_asm    in      al, dx
//_asm    in      al, dx
//_asm    in      al, dx
//_asm    inc     dx
//_asm    out     dx, al              ;// start ADC
//
//_asm    les     di, [img]
//_asm    mov     bx, 0
//_asm    mov     cx, kd
//
//p_loop:
//_asm    mov     si, es:[di+bx]
//_asm    inc     si
//_asm    mov     es:[di+bx], si
//_asm    add     bx, 4
//_asm    loop    p_loop
//
//_asm    mov     dx, COUNTER
//not_done:
//_asm    in      al, dx
//_asm    and     al, 0x80
//_asm    jnz     not_done
//
//_asm    mov     dx, ADC
//_asm    in      al, dx
//_asm    les     di, [dt]
//_asm    mov     es:[di], al
//_asm    inc     di
//_asm    mov     WORD PTR [dt], di
//
//_asm    pop     cx
//_asm    loop    x_loop
//
//_asm    mov     cx, kd
//_asm    mov     bx, 0
//_asm    les     di, [img]
//
//p_loop2:
//_asm    mov     si, es:[di+bx]
//_asm    add     si, inc_x
//_asm    mov     es:[di+bx], si
//_asm    add     bx, 4
//_asm    loop    p_loop2
//_asm    mov     di, WORD PTR [dt]
//_asm    add     di, inc_x
//_asm    mov     WORD PTR [dt], di
//
//_asm    pop     cx
//_asm    dec     cx
//_asm    jz      done
//_asm    jmp     y_loop
//
//done:
//_asm    pop     ds
//_asm    pop     es
//_asm    pop     si
//_asm    pop     di

//}
////////////////////////////////////////////////////////////////////
//
// File POT.H - define the class POT, which interfaces to the
//              Dallas Semiconductor Digital Potentiometers that
//              calibrate the board.
//

enum pot_t {ZERO = 0x40, GAIN = 0x80};

class pot
{
    long int value;
    int which;

    void shift_it();

public:
    pot(int which_one, long int init = 0x0080);

    void operator ++(void);
    void operator --(void);
};

#include <dos.h>
#include "always.h"
#include "board.h"
#include "pot.h"

pot::pot(int which_one, long int init)
{
    which = which_one;
    value = init;
    shift_it();
}

void pot::operator ++(void)
{
    if (value < 0x000FFL)
        value++;
    else
    {
        if ((value > 0x10000L) && (value < 0x1FF00L))
            value += 0x00100;
        else
        {
            if (value == 0x000FFL)
                value = 0x10000L;
            else
            {
                if (value == 0x1FF00L)
                    value = 0x00000L;
            }
        }
    }

    shift_it();
}

void pot::operator --(void)
{
    if (value < 0x000FFL)
        value--;
    else
    {
        if ((value > 0x10000L) && (value < 0x1FF00L))
            value -= 0x00100L;
        else
        {
            if (value == 0x10000L)
                value = 0x000FFL;
            else
            {
                if (value == 0x00000L)
                    value = 0x1FF00L;
            }
        }
    }

    shift_it();
}
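The two operators above step a single long value through a low range in units of 1 and a high range in units of 0x100. The following is a minimal sketch of that stepping rule as two pure functions, with no port I/O, so that the sequence of values can be inspected off the hardware. The names step_up and step_down are illustrative and are not part of the pot class; the comparisons are copied as they read in the listing.

```cpp
// Sketch only: mirrors the branch structure of pot::operator++ and
// pot::operator-- as listed, without touching any hardware port.
// step_up and step_down are hypothetical names, not thesis code.
long step_up(long value)
{
    if (value < 0x000FFL)
        return value + 1;                  // step by 1 in the low range
    if (value > 0x10000L && value < 0x1FF00L)
        return value + 0x00100L;           // step by 0x100 in the high range
    if (value == 0x000FFL)
        return 0x10000L;                   // hand over from low range to high range
    if (value == 0x1FF00L)
        return 0x00000L;                   // wrap around to the bottom
    return value;                          // any other value is left unchanged
}

long step_down(long value)
{
    if (value < 0x000FFL)
        return value - 1;                  // step by 1 in the low range
    if (value > 0x10000L && value < 0x1FF00L)
        return value - 0x00100L;           // step by 0x100 in the high range
    if (value == 0x10000L)
        return 0x000FFL;                   // hand over from high range to low range
    if (value == 0x00000L)
        return 0x1FF00L;                   // never reached: the first test catches 0
    return value;                          // any other value is left unchanged
}
```

Because the sketch follows the comparisons exactly as they read in the listing, boundary values such as 0x10000 pass through step_up unchanged, which makes it a convenient way to check the calibration sequence before driving the potentiometers.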

void pot::shift_it()
{
    outp(COUNTER, which);

    for (int i = 17; i > 0; i--)
    {
        outp(POTS, ((BYTE)(value >> (i - 1)) & 0x01));
    }

    outp(COUNTER, 0);
}

////////////////////////////////////////////////////////////////////

//
// File:    UI.CPP
// Author:  Donald A. Symes
// Purpose: Operate image and kernel classes to perform convolution.
//
////////////////////////////////////////////////////////////////////

#define UICPP
#include <conio.h>
#include <iostream.h>

#include "always.h"
#include "kernel.h"
#include "image.h"

extern "C" {
#include "pcdip.h"
}

void main(int argc, char **argv)
{
    ifstream f(argv[1]);                        // open kernel file

    kernel k(f);                                // create object kernel from file
    f.close();

    f.open(argv[2]);                            // open image file
    image i(f, k.width()/2);                    // create image object from file
    f.close();

    i.display(0,0);                             // display raw image

    image j(i.width(),i.height(),i.bord());     // create target image

    j.conv_float(k,i);                          // perform floating point convolution
    j.display(155,0);                           // display result

    j.conv_int(k,i);                            // use fractions this time
    j.display(0,101);                           // display that

    j.conv_accel(k,i);                          // use accelerator
    j.display(155,101);

    while(!kbhit());                            // wait for user to be done looking
    getch();
    pc_detach();                                // restore display to text mode
    argc = argc;                                // suppress warning for unused variable
}
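The driver above runs one kernel over one image three ways: floating point, integer fractions, and the accelerator. As a point of reference for what each pass is meant to compute, here is a minimal sketch of a direct two-dimensional convolution in floating point. It assumes a square, odd-sized kernel and an image already padded with a border of half the kernel width, as suggested by the k.width()/2 argument passed to the image constructor. The name convolve2d and its parameters are illustrative; this is not the thesis's conv_float.

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: direct 2-D convolution of a padded, row-major image.
// 'w' and 'h' are the output dimensions and 'kd' is the (odd) kernel width.
// The padded image is assumed to have (w + kd - 1) columns and
// (h + kd - 1) rows.
std::vector<float> convolve2d(const std::vector<float>& padded,
                              std::size_t w, std::size_t h,
                              const std::vector<float>& kernel,
                              std::size_t kd)
{
    const std::size_t stride = w + kd - 1;
    std::vector<float> out(w * h, 0.0f);
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (std::size_t j = 0; j < kd; ++j) {
                for (std::size_t i = 0; i < kd; ++i) {
                    // Flip the kernel in both axes; without the flip this
                    // loop would compute a correlation instead.
                    const float kv = kernel[(kd - 1 - j) * kd + (kd - 1 - i)];
                    sum += kv * padded[(y + j) * stride + (x + i)];
                }
            }
            out[y * w + x] = sum;
        }
    }
    return out;
}
```

Each output pixel costs kd * kd multiplies and adds, which is the per-pixel work the multiplier array and summing amplifier are intended to perform in a single analog step.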


PERMISSION TO COPY

In presenting this thesis in partial fulfillment of the requirements for a

master's degree at Texas Tech University or Texas Tech University Health Sciences

Center, I agree that the Library and my major department shall make it freely

available for research purposes. Permission to copy this thesis for scholarly

purposes may be granted by the Director of the Library or my major professor.

It is understood that any copying or publication of this thesis for financial gain

shall not be allowed without my further written permission and that any user

may be liable for copyright infringement.

Agree (Permission is granted.)

Student's Signature Date

Disagree (Permission is not granted.)

Student's Signature Date
