EFFECTIVE COMPUTER ARCHITECTURE FOR A DISSERTATION IN ...

EFFECTIVE COMPUTER ARCHITECTURE FOR

MICROPROGRAMMED MACHINES

by

ROBERT WINSTON NOWLIN, B.S. IN E.E., M.S. IN E.E

A DISSERTATION

IN

ELECTRICAL ENGINEERING

Submitted to the Graduate Faculty of Texas Tech University in

Partial Fulfillment of the Requirements for

the Degree of

DOCTOR OF PHILOSOPHY

r. )9ceiTiber, 197o

ACKNOWLEDGMENTS

I am very grateful to all members of my committee,

Dr. Donald L. Gustafson, Dr. Russell H. Seacat, Dr. Darrell

L. Vines, Dr. John F. Walkup and Dr. Thomas G. Newman, for

taking the time required to serve as members and for their

advice and help. I am most grateful to Dr. Gustafson who

conceived this project, served as actual chairman, and who

encouraged, directed, and prodded me to its successful

completion.

I am also especially indebted to Dr. Seacat and

Colonel Travis Simpson for encouraging me to attend Texas

Tech University, and for providing the financial and moral

support necessary to continue my studies. Last but not

least I want to thank my wife, Donna, and my two children,

Nathan and Cynthia, for patiently enduring the extra

burdens and responsibilities placed on them during this

tenure. I am grateful to my wife who encouraged me

throughout and who served as typist and dissertation

preparation coordinator.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

Chapter

I. INTRODUCTION
    Tools For Preparing Correct Microprograms
    Optimization Of Microcode
    System Structures Based On Microprogrammed Processors
    Applications Of Microprogramming
    Outline Of The Dissertation

II. MICROPROGRAMMING
    Concepts And Terms
    Hewlett-Packard 2100 Microprogramming Characteristics
    HP-21MX Characteristics
    Microdata MICRO 1600 Characteristics

III. MATRIX-VECTOR MULTIPLICATIONS
    Assembly Language Routine
    Microcoded Full-Word Matrix-Vector Multiplication Routines
    Microcoded Half-Word Matrix-Vector Multiplication Routines
    Proposed Architecture

IV. GAUSSIAN ELIMINATION
    Machine Language Program
    Microcoded Program
    Proposed Architecture

V. KALMAN FILTERING
    On-Line Solutions
    Off-Line Implementation
    Example Second-Order System
    Proposed Architecture

VI. EFFECTS OF FINITE WORD LENGTHS
    FFT Algorithm
    Truncation Noise Effects
    Experimental Results

VII. CONCLUSIONS

LIST OF REFERENCES

APPENDIX
    A. COMPUTER PROGRAMS FOR CHAPTER III
    B. PROGRAMS AND FLOWCHARTS FOR CHAPTER IV
    C. PROGRAMS FOR CHAPTER V
    D. FFT PROGRAMS AND RUNS

LIST OF TABLES

Table

1. Microprogramming Facilities

2. Gaussian Elimination With Complete Pivoting Algorithm

3. Times Of Execution For Individual Subroutines, n = 5

4. Comparison Of Execution Times

5. Summary Of FFT Quantization Error

LIST OF FIGURES

Figure

1. Simplified block diagrams of computer systems

2. (a) Organization of the HP2100; (b) HP2100 microinstruction format

3. 21MX architecture

4. 21MX microinstruction formats

5. MICRO 1600 block diagram

6. Flowchart for assembly-language matrix-vector multiplication

7. Flowchart for an n x 2 matrix times a 2 x 1 vector

8. Microprogram flowchart for a 4 x 4 matrix times a 4 x 1 vector

9. Microprogram flowchart for an 8 x 8 matrix times an 8 x 1 vector

10. n x m matrix times m x 1 vector

11. n x 2 matrix times 2 x 1 vector, half-word

12. 4 x 4 matrix times 4 x 1 vector, half-word

13. Time of execution vs. matrix size

14. Proposed architecture for a microprogrammed machine

15. Gaussian elimination with complete pivoting

16. Time vs. matrix size (Gaussian elimination)

17. General block diagram of Kalman filter

18. 2 x 2 matrix multiplication

19. 4 x 4 matrix product

20. Second-order off-line Kalman filter

21. Straight line, off-line Kalman filter

22. Kalman filter example

23. FFT flow graph, N=8

24. Noisy butterfly computation (floating-point)

CHAPTER I

INTRODUCTION

Many techniques of modern control theory and signal

processing are especially well suited for adaptation to

digital computer solutions. Several of these techniques

are Kalman filtering, state estimation, optimal controller

implementation, signal smoothing, digital filtering, and

discrete Fourier transform computations using the fast

Fourier transform algorithm. These techniques use many

matrix routines such as matrix multiplication, matrix-vector

multiplication, and matrix inversion, for which the Gaussian

elimination algorithm is notably amenable. They also use

many special purpose algorithms which typically have a high

repetition rate, are relatively simple, have real-time

implications, and represent a significant percent of the

processing load in computer systems dedicated to such

purposes. Several of these modern techniques, i.e., Kalman

filtering, signal smoothing, and digital filtering, have

been digitally implemented in specialized hardware for such

esoteric projects as the Apollo re-entry control, space

shuttle guidance and control system, and others. However,

these specialized hardware implementations are very

expensive and too costly for most commercial and industrial

users to apply to the somewhat more mundane projects of

process control, power systems monitoring and control,

guidance and control of mass transportation vehicles, etc.

Since hardware implementations are too expensive, a

software solution to these problems must be considered.

There is little question that the aforementioned techniques

of modern control theory and signal processing can be

implemented on large computer systems such as the IBM 360 or

370, the UNIVAC 1108, the CDC 6600 or 7600, and others of

similar magnitude. But these machines are also too

expensive for most users to dedicate to such purposes.

Minicomputers, however, are relatively inexpensive, but are

normally slower and their software more cumbersome and

inefficient than large scale computers. Hence, minicomputers

are considered inappropriate for such problems either due to

size, speed or other requirements.

If both hardware and software implementations are dis-

carded due to economics, speed, or other requirements, how

can these problems be feasibly solved using digital systems?

The answer is that since many modern computers are micro-

program controlled (e.g., the IBM 360 and HP-2100), the

firmware of these machines may be used to effectively

implement solutions. Firmware is an intermediate level

between hardware and software offering the flexibility

afforded by software but at a speed approaching that of

hardware. For machines being designed, microprogramming is

a systematic approach for designing the control section of

the computer. But once the firmware has been designed, it

is burned into a ROM and becomes a permanent, unalterable

unit of the computer. Later generation machines have been

designed to incorporate advanced, state-of-the-art

architectures, but have continued to use the same firmware as their

predecessor. In machines using microprogram control, micro-

programming is a systematic approach to the design (or

redesign) of the instruction set of the machine or a

systematic approach for studying the architectural features

of such machines to effectively implement specialized

problems. In either case, the use of microprogramming or

firmware requires a thorough knowledge of the characteristics

(both hardware and software) of the computer. Most computers

which use microprogram control have instruction sets which

are general purpose and which are not designed to efficiently

implement specialized problems. The task of the micro-

programmer is to design an instruction set, within the

limits defined by the computer's architecture, or to

redesign the architecture to accommodate this redesigned

instruction set, to effectively implement specialized

techniques such as matrix or matrix-vector products,

Gaussian elimination or other related tasks of modern

control and signal processing. The research reported in

this dissertation is concerned with developing or

redesigning the architecture of a microprogram controlled

minicomputer to efficiently implement the algorithms of

modern control theory and signal processing, i.e., matrix-

vector products, Gaussian elimination, and Kalman filtering.

This architecture is designed to optimize usage of the

computer in terms of memory allocation and speed. It might

be assumed that an architecture redesigned for specific

tasks produces a specialized machine, but this is not the

case. As will be seen, with the proper microcode, the

developed architecture results in a more general purpose

machine. The primary interest is to determine relationships

and trade-offs between the computer's parameters, i.e., word

length (accuracy), speed of operation, control store size,

memory access time, and other architectural features such

as number and type of registers available. Microprogramming

is the vehicle used to arrive at these relationships.

The research reported in this dissertation contributes

to the research in microprogrammable computers since little,

if any, work has been concerned with redesigning the

architecture of an existing microprogram controlled machine.

Presently, the research effort in microprogramming can be

divided into four broad areas: development of tools to

facilitate the preparation of correct microprograms;

optimization of microprograms; fundamental studies on the

organization of computer systems utilizing microprogrammed

control; and demonstrations of the use of microprogrammable

processors in special purpose applications (26). A brief

discussion of the work being done in each area is presented

in the following paragraphs.

Tools For Preparing Correct Microprograms

Since a thorough knowledge of the organization and

peculiarities of the host computer is necessary for the

preparation of correct microprograms, microprogramming is

often viewed as a difficult assignment. The preparation

of correct microprograms also has considerable economic

importance, for if after many production runs, it is

discovered that the microprograms are in error, the cost

of reprogramming could be prohibitive. These considerations

have prompted much effort to be exerted in developing

machine independent, higher level languages suitable for

microprogramming. Tirrell (57) has investigated the design

of an optimizing compiler for translating higher level

languages into microcode. Mallet and Lewis (34) describe

various approaches to the design of high-level languages

for microprogramming. A system for generation of optimum

microcode has been described by Blain, et al., (5). A

system for computer aided microprogram design, MIKADO,

covering a broad range of aids has been developed and

discussed by Rottman (48).

Papers describing procedures for the proof of

correctness of microprograms are few. Ramamoorthy and

Shankar (46) developed and discussed procedures for the

proof of correctness and equivalence of loop-free

microprograms. Extensions to the inductive assertion method that

are necessary for the proof of correctness of

self-modifying microprograms are described by Mauer (36). A two

phase procedure for producing defect-free microcode has

also been described (6). Patterson (43) has used the

methodologies that have been beneficial in the generation

of correct programs to design a structured microprogramming

language (STRUM) for generating correct microprograms.

Optimization Of Microcode

Research activity in optimization of microcode is

concerned with producing microprograms designed for minimum

time or minimum use of control store memory space (i.e.,

both bit lengths and number of locations are minimized).

Since the format of vertical microinstructions resembles

that of machine language instructions, optimization

techniques used in translating programming languages are

applicable in the translation of higher level

microprogramming languages to vertical microcode (3). Kleir and

Ramamoorthy (29) provided some early comments on this

subject. Agerwala (2) described the various techniques that

have been used for microprogram minimization. Kleir (30)

developed a method for the representation of microprogram

activity in machines using either a horizontal or vertical

microinstruction. This method can be used to determine

resource conflicts and micro-operation interaction and

hence microprogram minimization.

Tsuchiya and Gonzalez (58) have derived a

graph-theoretic algorithm to detect microprogram parallelism in

a sequential microprogram. The problem of minimizing the

storage requirements of horizontal microprograms by optimal

packing of micro-operations into microinstructions has been

discussed by Yau, et al., (65). Dewitt (14) has developed

the Control Word Model for determining when two or more

micro-operations can be executed concurrently. The Control

Word Model is a machine independent model of the semantics

of the control words for microprogrammable computers. A

procedure for determining the minimum number of micro-

instructions of a given bit length B required to code a

microprogram has been developed by Tsuchiya and Jacobson (59).

System Structures Based On Microprogrammed Processors

In addition to the research discussed in the previous

two sections, which was mainly concerned with the application

of tools and techniques used in machine language programming

to microprogramming, research on fundamental aspects of the

organization of machines based on microprogrammed processors

has been conducted. Some of the early papers include those

of Thomas (55) and Hoeval (20). R. Burns and D. Savitt (7)

reported on the use of emulation to study block structured

architectures as did Lutz and Manthey (33). Hartenstein

(19) described the linkage mechanisms used in

microprogrammable systems, discussed the difference between

horizontal and vertical linkage mechanisms, and showed the

distinction between linkage implementation and the

execution mechanisms of semantics. A hierarchical

organization of interpretation levels based on a top-down process

extended to the level of microcode which is interpreted by

hardware has been described by Schoellkopf (52). Cohen and

Liu (9) have suggested the use of multiple microprogrammable

microprocessors as an economically feasible means of

studying large computer networks. A network of minicomputers

is currently being designed at the University of Illinois

(12). A two-dimensional address scheme for addressing a

large address space with a limited number of bits is

discussed by Wakerly, et al., (61).

Applications Of Microprogramming

Although most of the research reported in the three

previous sections may be considered applications of

microprogramming, this section is concerned with those that may

be labeled as microprogramming applications other than

instruction execution. Implementing various routines

directly in microcode, especially routines that are CPU

bound, that use many intermediate results which need not

be saved, routines that are highly repetitive, or are

awkward to implement with machine language instructions,

can result in significant performance improvement (3). The

literature describing applications of microprogramming has

been and continues to be dominated by research related to

the microcoded implementation of systems programs, for

example floating-point arithmetic, multiply routines, tree

searches, and fast FORTRAN processor microprograms for mini-

computers such as the Hewlett-Packard 21MX and Varian V-73

(3, 26). Considerable work has been directed toward

performance enhancement of higher level languages by

microcoding the basic functions of the compilation, translation,

and execution tasks. Habib (18) has given an overview of

some of the work in this area. An early paper by Weber (62)

on implementing EULER using the IBM System 360 static

microinstructions has shown that primarily logical programs

(e.g., compilers) can be an order of magnitude more

efficient if completely microprogrammed. Cook and Flynn (10)

developed a microprogrammed microprocessor to emulate the

IBM System 360 set of machine instructions to assess

relative efficiencies of dynamic microprogramming. Tucker

and Flynn (60) developed a horizontally microcoded

microprocessor and processor language to again emulate the

System 360 instruction set. Several other papers have

dealt with the problem of developing special microcoded

microprocessors to improve performance of programs of a

target machine (13, 42). Cox and Schneider (11) have

described work on a system designed to measure the improve-

ments in operating system efficiency that are obtainable

by implementing special functions (i.e., table manipulation

and indexing) in microcode. A general discussion of the

application of microprogramming to the tasks of emulation

and interpretation has been given by Fuller, et al., (15).

Thomas (56) reported the results of experiments conducted

by the IBM Corporation in which the execution times of

microcoded implementations of algorithms such as matrix

multiplication, polynomial evaluation, and table search were

compared with machine language versions of the same functions.

Other applications include a graphics system

developed at Brown University which utilizes three user

microprogrammable processors, two Digital Scientific Meta

4's and a locally developed processor called the SIMALE (3).

A lexical processor which accepts a string of characters and

delivers a string of symbols has been described by Chu (8).

Abd-Alla and Karlgaard (1) described an algorithm for the

synthesis of an applications oriented microcode. Kratz,

Sproul, and Walendziewicz (31) described a specific

microprogram-controlled signal processing unit designed to

perform a variety of signal processing algorithms at

extremely high speeds. A combination of software and hard-

ware (firmware) monitoring of the performance of computer

systems which combines the advantage of both software and

hardware monitors has been reported (49). Diagnostic

microprograms (microdiagnostics) provide capabilities

unattainable by software diagnostic programs as reported

by Ramamoorthy and Chang (45) and Johnson (25).

Outline Of The Dissertation

The research to date has been chiefly concerned with

developing specialized microprogram controlled

microprocessors and the appropriate microprogramming language

necessary for implementation of some or all of these tasks,

or with the development of microprogram

controlled microprocessors designed explicitly for the

emulation of an already existing machine (e.g., the IBM 360)

in which comparisons of performance between machine coded

routines and the microcoded microprocessors for the same

functions can be made. There has been very little, if any,

research performed which utilizes an already existing user

microprogrammable minicomputer to study the architectural

needs of machines for efficient implementation of special

purpose problems such as those of modern control theory and

signal processing.

Chapter II discusses several of the more important

concepts of modern firmware realizations, describes the

architecture of the microprogram controlled Hewlett-Packard

2100 (the minicomputer used in this research effort), and

gives a brief discussion of two other microprogram

controlled minicomputers for comparison. In Chapter III

the architecture necessary to effectively implement various

size matrix-vector product algorithms is developed.

Microroutines for both full- and half-word length algorithms

were coded to determine the best architecture for their

efficient execution and to assess architectural trade-offs

between accuracy and speed requirements. Algorithms which

can be used to calculate execution times of these product routines

are derived. These algorithms are amenable to any computer.

In Chapter Four, the most effective architecture for

implementing a Gaussian elimination algorithm is designed.

This architecture is very similar to that described in

Chapter Three with the major difference being that more

microstorage is required to implement the Gaussian elimina-

tion algorithm. A machine language program and micro-

programmed version of the algorithm were developed to

compare speed and to assess the requirements for the

architecture. Techniques developed in the work reported

in Chapter Three for half-word multiplies were used to

produce speed improvements, especially effective in the

microcoded array addressing routine. Algorithms similar to

those derived in Chapter Three for calculating execution

times are also produced. Chapter Five discusses the

architecture needed to implement the Kalman filter and

Chapter Six considers more fully the effects of finite

register lengths in the implementation of specialized

algorithms such as the fast Fourier transform. If a

computer can be designed with a small word length and still

retain necessary accuracy, a considerable savings will

result. Not only will the cost of registers be decreased,

but main and control store memory costs will also be

slashed.

CHAPTER II

MICROPROGRAMMING

Microprogramming is a relatively old concept for

computer design which has attained an unusual state of

maturity without having a formal definition. It was first

proposed by Wilkes in 1951 as a design technique to

"provide a systematic alternative to the somewhat ad hoc

procedure used for designing the control system" of a

computer (56, 47, 3). However, interest in microprogramming

remained minimal until the 1960's (56) when the availability

of solid-state read only memories (ROM's) made it

economically feasible to formalize these ideas into hard

designs. With an accelerating pace of microprogramming

activity, the term microprogramming has acquired a broader

meaning than in its original conception. Many definitions

such as "firmware" (40) and "the bridge between hardware

and software" (71) have been proposed. Possibly the ideas

of a more formal definition are generalized in the

following: "Microprogramming corrals key control functions

into a regular structure, isolating them from the data flow.

Microprogrammed control functions can be viewed as working

hypotheses while hardwired control functions are solid

commitments" (16). In the following paragraphs, the

concepts and terms of microprogramming are defined as they

are most commonly used.

Concepts And Terms

The architectural difference between a conventionally

controlled and microprogram controlled computer is

illustrated in Figure 1. Even though the block diagram of

a conventional system appears simpler, the control logic

is in fact comparatively more complex. All control

functions are implemented by means of a large number of

specialized combinatorial circuits scattered throughout

the computer. In addition, clock signals must be

distributed throughout to generate the controls in the

desired sequence. In such a system, the instruction set is

executed in a most efficient way but any function beyond

the original design would be very difficult to incorporate

due to the logic interdependency.

In the microcoded system, the control logic is

comparatively simple and new functions (e.g., more machine

instructions) may be implemented easily. The complexity

of the microprogrammed computer is in the coding of the

microprograms; but the systematic design of this approach

facilitates understanding and visualization of the various

control functions. Basically, microprogrammed control

consists of two parts: control store and control decode,

both of which are centrally located rather than distributed

as in the conventionally controlled system. The control

store may be a ROM (read-only memory) or a WCS (Writable

[Figure 1 panels: CONVENTIONAL CONTROL (memory, arithmetic logic, input/output, I/O devices); MICROPROGRAMMED CONTROL (control store holding the microprograms, control decode, memory, arithmetic logic, input/output, I/O devices).]

Fig. 1. Simplified block diagrams of computer systems

Control Store). The microprograms which control the

computer are stored in this memory. The control decode

accepts one microinstruction at a time from control store

and decodes this microinstruction to produce a given

function.
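
The division of labor between control store and control decode described above can be sketched in a few lines of Python. This is a toy model only, not the control section of any real machine: the micro-orders (LOAD, ADD, HALT), the register names A and B, and the tuple layout of a microinstruction are all assumptions made for illustration.

```python
# Toy model of microprogrammed control. Every name here is hypothetical:
# the micro-orders, the registers, and the word layout are illustrative only.

# Control store: an addressable memory holding the microprogram.
CONTROL_STORE = [
    ("LOAD", "A", 5),        # A <- 5
    ("LOAD", "B", 7),        # B <- 7
    ("ADD", "A", "B"),       # A <- A + B
    ("HALT", None, None),    # stop the microprogram
]


def decode_and_execute(microinstruction, registers):
    """Control decode: turn one microinstruction into a register action.

    Returns False when the microprogram should stop, True otherwise.
    """
    op, dst, src = microinstruction
    if op == "LOAD":
        registers[dst] = src
    elif op == "ADD":
        registers[dst] = registers[dst] + registers[src]
    elif op == "HALT":
        return False
    else:
        raise ValueError(f"unknown micro-order: {op}")
    return True


def run(control_store):
    """Fetch one microinstruction at a time and hand it to the decoder."""
    registers = {"A": 0, "B": 0}
    address = 0  # plays the role of a control store address register
    while decode_and_execute(control_store[address], registers):
        address += 1  # sequential microprogram; no jumps in this sketch
    return registers
```

Adding a new function to this model means adding entries to the control store (and, at most, one more branch in the decoder), which mirrors the flexibility argument made in the text.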

The microinstructions may be of two types, vertical or

horizontal, and are usually classified by the number of

operations each controls. (But this certainly is not the

only distinguishing feature and considerable overlapping

occurs between vertical and horizontal microinstructions.)

Vertical microinstructions control single or few operations

(load, add, store, etc.) and often resemble machine language

instructions. They typically range from 12 to 24 bits.

Horizontal microinstructions control many resources which

operate in parallel. Typical operations that might be

controlled by a single horizontal microinstruction are the

simultaneous and independent operation of one or more

arithmetic/logic units (ALU's), input and output to main

memory, etc. Because horizontal microinstructions control

multiple operations, they contain more information than

vertical microinstructions and hence have greater lengths—

64 bits and more are common.

The degree of encoding in a microinstruction word also

affects the length of a microinstruction. In single level

(or direct) encoding, bits that control mutually exclusive

resources are combined into fields. In two level (or

indirect) encoding, the meaning of a particular field depends

on the value of a control field in the microinstruction.

Typically vertical microinstructions use two level encoding

while horizontal microinstructions tend to use single level

encoding or no encoding at all (each bit controls one

resource or operation in the computer).
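
The effect of encoding on microinstruction length can be illustrated with a short Python sketch. The group sizes, the reserved "no operation" code in each field, and the 5-bit word used for the two-level example are assumptions chosen for illustration; they do not describe any particular machine.

```python
def unencoded_width(group_sizes):
    """No encoding: one bit per control signal."""
    return sum(group_sizes)


def direct_encoded_width(group_sizes):
    """Single-level (direct) encoding.

    Each group of n mutually exclusive signals becomes one field. The field
    must hold codes 0..n (code 0 is assumed to mean "no operation"), which
    takes n.bit_length() bits.
    """
    return sum(n.bit_length() for n in group_sizes)


def decode_two_level(word):
    """Two-level (indirect) encoding on a hypothetical 5-bit word.

    Bit 4 is a control field that selects how the low 4 bits are read.
    """
    mode = (word >> 4) & 0b1
    operand = word & 0b1111
    meaning = "shift_count" if mode else "register_select"
    return meaning, operand


# Hypothetical machine: four groups of mutually exclusive control signals.
GROUPS = [8, 8, 16, 4]
```

With these assumed groups, the unencoded word needs 36 bits and the directly encoded word needs 16, which is the kind of saving that makes vertical formats short. The two-level decoder shows the cost: the same four bits mean different things depending on the control field, so decoding requires an extra step.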

The serial-parallel characteristic measures the amount

of overlap between the execution of the current

microinstruction and the fetching of the next microinstruction

to be executed. In a serial implementation, fetching of the

next microinstruction to be executed is initiated when the

execution of the current microinstruction terminates. In a

parallel implementation, the next microinstruction to be

executed is fetched in parallel with the execution of the

current microinstruction. The serial implementation has

the advantage of simplicity of realization, the disadvantage

of slower speed. For the parallel implementation, these

roles are reversed; its advantage is a savings of time and

its disadvantage is difficulty of implementation.
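
The speed difference between the two implementations can be estimated with a simple timing model. The sketch below assumes fixed fetch and execute times, a straight-line microprogram with no jumps, and perfect overlap in the parallel case; the function names and the sample timings are hypothetical.

```python
def serial_time(n, fetch_ns, execute_ns):
    """Serial: each fetch starts only after the previous execution ends."""
    return n * (fetch_ns + execute_ns)


def parallel_time(n, fetch_ns, execute_ns):
    """Parallel: the next fetch overlaps the current execution.

    Only the first fetch and the last execution are not overlapped; every
    step in between costs the slower of the two activities.
    """
    if n == 0:
        return 0
    return fetch_ns + (n - 1) * max(fetch_ns, execute_ns) + execute_ns


# Hypothetical timings: 100 microinstructions, 100 ns fetch, 150 ns execute.
SERIAL_NS = serial_time(100, 100, 150)      # 25000 ns
PARALLEL_NS = parallel_time(100, 100, 150)  # 15100 ns
```

Under these assumptions the overlapped design approaches the cost of the slower activity alone. A real machine would lose part of that gain whenever a jump makes the prefetched microinstruction useless, which is one source of the implementation difficulty noted above.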

The monophase-polyphase characteristic of

microinstructions refers to the number of phases (minor cycles,

subcycles) used to execute each microinstruction, which

usually requires one major clock cycle. When monophase

implementation is used, there are no distinct subcycles of

the basic clock cycle and each microinstruction is executed

by a single simultaneous issue of control signals. A

polyphase implementation is characterized by having each

major clock cycle comprised of multiple minor clock cycles;

the hardware generates control signals at each minor clock

cycle.

The microprogrammability of a specific machine refers

to the degree of difficulty in microprogramming that machine

once the design and implementation of its microinstructions

are understood. The facility with which a machine can be

microprogrammed depends both on the realization of control

store and the availability of support software to aid in

microprogram preparation (i.e., both hardware and software

characteristics). Machines in which the control store can

be loaded under program control are called dynamically

microprogrammable. Software support for microprogrammable

computers which includes translators that transform higher

level representations into microcode ready for loading into

control store, simulators to assist in debugging of

microprograms, and online debugging programs to assist in the

monitoring of executing programs defines a system called

user microprogrammable. Systems which have both a

dynamically alterable control store and microprogramming

support software are considered dynamically user

microprogrammable computers.

The concepts and definitions of the preceding

paragraphs have acquired wide, but not universal, acceptance.

Various authors may define these terms slightly differently

and changing technologies may alter their usage even as

"microprogramming" has changed. Notwithstanding, the reader

should be sufficiently equipped to understand the succeeding

discussions on microprogramming and the characteristics of

various computers.

Hewlett-Packard 2100 Microprogramming Characteristics

The Hewlett-Packard (HP) 2100 is a general purpose mini-

computer which features a magnetic core main memory of 32K

words, 16 bit word length and 980 nanosecond memory read/

write cycle time. The 2100 has a standard machine language

instruction set of 80 instructions all of which are fully

executed in 1.96 microseconds except for the ISZ instruction

which executes in 2.94 microseconds and the extended

arithmetic instructions which execute in various times

longer than 1.96 microseconds. For each level of indirect

addressing, 0.98 raicrosecond is added to the execution time.

Figure 2a summarizes the organization of the 2100.
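The instruction timing rules just quoted can be collected into a small calculation. The following Python sketch is illustrative only: the function name and the use of LDA as a sample mnemonic are assumptions made here, and the extended arithmetic instructions, whose individual times are not given in the text, are deliberately left out.

```python
# Timing figures quoted in the text, in microseconds.
BASE_TIME_US = 1.96        # standard machine instruction
ISZ_TIME_US = 2.94         # ISZ instruction
INDIRECT_LEVEL_US = 0.98   # added for each level of indirect addressing


def instruction_time_us(mnemonic: str, indirect_levels: int = 0) -> float:
    """Estimate the execution time of one (non extended arithmetic) instruction.

    Hypothetical helper: only the ISZ exception and the indirect-addressing
    penalty described in the text are modelled.
    """
    if indirect_levels < 0:
        raise ValueError("indirect_levels must be non-negative")
    base = ISZ_TIME_US if mnemonic.upper() == "ISZ" else BASE_TIME_US
    return base + indirect_levels * INDIRECT_LEVEL_US


if __name__ == "__main__":
    # A sample instruction with two levels of indirection.
    print(f"{instruction_time_us('LDA', 2):.2f} us")
```

Under these assumptions an instruction with two indirect levels costs 1.96 + 2(0.98) = 3.92 microseconds.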

The microinstructions in the HP-2100 consist of 24 bits and are divided into six fields as shown in Figure 2b. The microinstructions consist basically of three addresses and an ALU function, and are vertical but do not normally use the two level encoding implementation which characterizes vertical microinstructions. All micro-orders in a given microinstruction are executed in a single control memory cycle time of 196 nanoseconds (with the exception of microinstructions containing jump or return jump micro-orders in the Function field). The main purpose of the R-bus, S-bus, and Store fields is to read the contents of the specified registers onto the R-, S- and T-buses, respectively. The Function field causes the function generator and shifter to perform the specified operation using the contents of the R- and S-buses as inputs and outputting onto the T-bus. The Special field initiates I/O operations, accesses core memory, manipulates the counter and the carry flip-flop, and specifies the direction of shift operations. The Skip field controls microprogram sequencing. The term "skip" is used in an unconventional way: if the skip condition is true, the next sequential microinstruction is not actually "jumped over", but is forced to be a do-nothing microinstruction (no operation, NOP). Most fields use single level encoding, but two level encoding is sometimes used. The two cases of two level encoding are: 1) when the Function field specifies that a jump be executed, the jump address is supplied in place of the Special and Skip fields; and 2) if the S-bus field specifies that a constant be read onto the S-bus, the constant is supplied in place of the Special and Skip fields.

Fig. 2. (a) Organization of the HP-2100; (b) HP-2100 microinstruction format (fields: R-bus, S-bus, Function, Store, Special, Skip).


In the HP-2100, control store consists of a 24-bit by 256-word ROM and an optional Writable Control Store (WCS) of 24 bits by 256 words, expandable to 24 bits by 768 words (in modules of 256 words each). The microprograms in the ROM are committed to the implementation of the standard instruction set. Each microinstruction is executed in 196 nanoseconds (ten times the speed of a standard machine instruction) with the exception of JMP, RSB, and main memory and I/O reference instructions. The JMP and RSB micro-orders cause the microinstructions in which they reside to execute in two control store cycles (392 nanoseconds). Microinstructions that reference main memory and I/O must be synchronized with the memory or I/O cycle. That is, the microprogram must be delayed at certain points until an appropriate point in the longer cycle is detected. The main memory and I/O cycle times are five times longer than the microinstruction cycle time (980 vs. 196 nanoseconds). Synchronization is provided either by a CPU freeze, which is automatic, or by inserting NOP microinstructions. The microprogrammer must know when and how to properly insert microinstructions that perform no operation (NOPs) except to produce synchronization.
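As a rough illustration of these cycle counts, the Python sketch below totals the time for a straight-line list of Function-field micro-orders. It is a simplification under stated assumptions: the function name is hypothetical, mnemonics other than JMP and RSB are placeholders, and memory or I/O synchronization delays (CPU freezes) are not modelled.

```python
CYCLE_NS = 196                       # control store cycle time from the text
MEMORY_CYCLE_NS = 980                # main memory and I/O cycle time
TWO_CYCLE_ORDERS = {"JMP", "RSB"}    # micro-orders that take two cycles


def microprogram_time_ns(function_orders) -> int:
    """Total time for a sequence of Function-field micro-orders.

    Each microinstruction takes one control store cycle, except those whose
    Function field holds JMP or RSB, which take two. Synchronization with
    memory or I/O cycles is ignored in this sketch.
    """
    cycles = sum(2 if order in TWO_CYCLE_ORDERS else 1 for order in function_orders)
    return cycles * CYCLE_NS


# Number of microinstruction cycles that fit in one memory cycle.
CYCLES_PER_MEMORY_CYCLE = MEMORY_CYCLE_NS // CYCLE_NS

if __name__ == "__main__":
    # "ADD" and "NOP" are placeholder names for single-cycle micro-orders.
    print(microprogram_time_ns(["ADD", "JMP", "NOP"]), "ns")
    print(CYCLES_PER_MEMORY_CYCLE, "microinstruction cycles per memory cycle")
```

The ratio of five cycles per memory cycle is what makes the placement of NOPs, or reliance on the automatic freeze, a concern for the microprogrammer.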

The Writable Control Store allows the user to tailor the 2100 to a particular need by building microprograms to be stored in and executed from the WCS. Microprograms may be loaded into WCS via a system editor, the Micro-Debug Editor (MDE), which contains aids to assist in the monitoring of executing programs. This last feature is very useful for debugging the microprograms. An assembler which translates the mnemonic code representations into microinstructions ready for loading into WCS is also available. Hence, with a WCS and these software capabilities (MDE and Micro-assembler), the HP-2100 is a dynamical user microprogrammable machine.

Table 1 lists the computer parameters that are available to the microprogrammer. The M- and T-registers are used in accessing core memory locations. The Central Interrupt Register is a read-only register which contains the information specifying which I/O device caused an interrupt. The CPU instruction register is used for performing I/O and special shift operations, and for passing parameters. All other registers are general purpose. For a more complete description of the HP-2100 and its microprogramming capabilities and characteristics, refer to references 21, 22, and 25.

TABLE 1

MICROPROGRAMMING FACILITIES

Thirteen registers
    * A-register (16 bits)
    * B-register (16 bits)
    * M-register (15 bits)
    * T-register (16 bits)
    * Q-register (16 bits)
    * F-register (16 bits)
    * P-register (16 bits)
    * Four Scratch Pad Registers (16 bits each)
    * Central Interrupt Register (6 bits)
    * CPU Instruction Register (16 bits)
A five-bit hardware counter
A function generator
A shifter
Five 16-bit data paths between the registers, the counter, the function generator, the shifter, and the I/O hardware
    * R-bus
    * S-bus
    * ALU-bus
    * T-bus
    * I/O-bus
Four flip-flops
    * Flag (not to be confused with the I/O Flag)
    * Overflow
    * Extend
    * Carry

HP-21MX Characteristics

The Hewlett-Packard 21MX is a new family of microprogram controlled and microprogrammable computers that is similar to the HP-2100. Both are 16-bit, word oriented, parallel processing machines and, with a WCS, user microprogrammable. The 21MX family emulates the 2100 line efficiently, has compatible software, has some differences in instruction timing, and has a more powerful microcode to allow ease of user microprogramming. Other differences exist, but only those that especially affect the microprogrammer will be considered.

The control section of the 21MX, the CPU, is microprogrammed with a microinstruction execution time of 324.3 nanoseconds. The control store is divided into sixteen modules of 256 words each (4K words total). Four modules (1K) are used to implement the basic instruction set; the other modules (3K) are available for user microprogramming. Memory cycles take two microinstruction cycles (650 nanoseconds); one non-memory type instruction may be executed after initiating the cycle without freezing the CPU. The memory address register (M) is unique to the CPU. Other memory referencing devices share the M-bus and have their own M-registers.

An I/O or Dual Channel Port Controller (DCPC) cycle takes five CPU cycles. This is done to make the 21MX I/O compatible with the 2100 and lets the CPU participate in I/O operations.

Figure 3 illustrates the basic features of the 21MX architecture and should be referenced for the following discussion. The M-bus is a 16-bit memory address bus which is controlled by the DCPC and shared by the DCPC and CPU. The IA-bus is the 6-bit interrupt address bus; the SC-bus is a 6-bit bus containing the select code of the I/O device being referenced. The L-register is a 16-bit register which acts as a second input to the ALU. Sixteen 16-bit scratch registers are provided, which include index registers (X, Y), program address (P), front panel display storage (S), and 12 scratch registers. The ALU performs operations on the S-bus and the L-register. The DSPL register is the 16-bit front panel display register. DSPI is the 6-bit front panel display indicator.

Fig. 3. Basic features of the 21MX architecture.

The microinstruction word for the 21MX is vertically implemented but, unlike the 2100, has only five fields and makes much more extensive use of two level encoding than does the 2100. Figure 4 shows the organization of each of the four microinstruction word types. The word type is specified by the coding in the OP field: IMM, JMP or JSB in this field defines word types 2, 3, and 4, and modifies the definition of the other fields. All other OP's define Word Type 1. MPY in the OP field will perform the operation B + A·L, storing the result in the B/A register combination. Word Type 1 is the most common and performs ALU operations, I/O sequences, memory references, and most control functions of the machine. Word Type 2 is defined by IMM in the OP field and results in an 8-bit operand in the RIR being gated onto the S-bus. Bit 19 specifies which byte of the S-bus the 8-bit operand is gated onto, and bit 18 specifies whether or not it is gated complemented. Word Type 3 is specified by JMP in the OP field and CNDX in the Special field and is used for conditional branching, the ALU field specifying one of 32 possible test conditions. The destination of the jump is determined by the Address field. Word Type 4 is determined by JMP or JSB in the OP field but not CNDX in the Special field. For more detailed descriptions of these features and other characteristics of the 21MX, see references 35 and 24.

Hardware formats in control store (the RIR bit ranges 23-20, 19-15, 14-10, 9-5, 4-0 apply to Word Type 1; the other types list their fields in left-to-right order):

    Word Type 1:  OP | ALU | S-Bus | Store | Special
    Word Type 2:  IMM | 0/1, 0/1 | Operand | Store | Special
    Word Type 3:  JMP | Condition | RJS | Address (9 bits) | CNDX
    Word Type 4:  JMP/JSB | not used | Address (12 bits) | any jump Special

Software formats as read left to right:

    Word Type 1:  OP | Special | ALU | Store | S-Bus
    Word Type 2:  IMM | Special | Modifiers | Store | Operand
    Word Type 3:  JMP | CNDX | Condition | (RJS) | Address
    Word Type 4:  JMP/JSB | Special | not used | not used | Address

Fig. 4. 21MX microinstruction formats.
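The word type rule described above, together with the Word Type 1 bit positions shown in Figure 4 (RIR bits 23-20, 19-15, 14-10, 9-5 and 4-0), can be sketched in Python as follows. This is an illustration only: the function and class names are assumptions made here, the numeric encodings of the mnemonics are not given in the text and so mnemonics are passed as strings, and the treatment of JSB combined with CNDX is a guess noted in the code.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    op: int       # RIR bits 23-20
    alu: int      # RIR bits 19-15
    s_bus: int    # RIR bits 14-10
    store: int    # RIR bits 9-5
    special: int  # RIR bits 4-0


def split_word_type_1(word: int) -> Fields:
    """Split a 24-bit microinstruction using the Word Type 1 layout."""
    if not 0 <= word < (1 << 24):
        raise ValueError("expected a 24-bit word")
    return Fields(
        op=(word >> 20) & 0xF,
        alu=(word >> 15) & 0x1F,
        s_bus=(word >> 10) & 0x1F,
        store=(word >> 5) & 0x1F,
        special=word & 0x1F,
    )


def word_type(op_mnemonic: str, special_mnemonic: str) -> int:
    """Classify a microinstruction by its OP and Special mnemonics."""
    if op_mnemonic == "IMM":
        return 2
    if op_mnemonic == "JMP" and special_mnemonic == "CNDX":
        return 3
    if op_mnemonic in ("JMP", "JSB"):
        # The text does not say how JSB with CNDX is treated;
        # this sketch assumes it falls under Word Type 4.
        return 4
    return 1
```

The five-bit width of the ALU field is consistent with the 32 test conditions mentioned for Word Type 3, since 2^5 = 32.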

Microdata MICRO 1600 Characteristics

The MICRO 1600 is a microprogrammable digital computer which makes use of both an expandable high-speed control memory and magnetic core main memory (see Figure 5). Control memory can be implemented in bipolar read-only memory (BROM), programmable read-only memory (PROM), or alterable read-only memory (AROM), expandable to 16K by 16-bit words. Main memory is expandable to 65K 8-bit words. Main memory cycle time is one microsecond and control memory has a 200 nanosecond microcommand execution time. There are thirty general purpose eight-bit file registers plus an eight-bit status register.

Fig. 5. MICRO 1600 block diagram.

Standard software support includes a micro language cross assembler written in FORTRAN; a micro language assembler written for use on the MICRO 1600/20 and 1600/21 computers; a MICRO 1600 simulator for use on the 1600/20 and 1600/21; an integrated circuit memory MAP generator which converts the outputs of the micro language cross assembler and micro language assembler to control memory bit patterns; pluggable diagnostic routines stored in ROM; and an alterable read-only memory operating system for control of the AROM used for firmware checkout and debug.

The microinstructions are of the vertical type and employ the two level encoding characteristic of vertical microinstructions. A minimum of decoding is used and each microinstruction is executed in a single clock cycle, unless delayed by a reference to main memory. A serial-parallel type of control is exercised: while one microcommand is being executed, the next command in sequence is being accessed. When the normal sequence is altered by a jump, a delay of 200 nanoseconds occurs to allow the first command of the new sequence to be read.
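The effect of this overlapped fetch on execution time can be estimated with a short calculation. The Python sketch below is a simplified model, not a description of the actual hardware timing: the function name is hypothetical, and delays caused by main memory references are ignored.

```python
MICROCOMMAND_NS = 200   # microcommand execution time from the text
JUMP_DELAY_NS = 200     # extra delay when a jump breaks the fetch overlap


def sequence_time_ns(microcommands: int, taken_jumps: int) -> int:
    """Estimate the time for a microcommand sequence with overlapped fetch.

    Each microcommand takes one 200 ns cycle; every jump that alters the
    normal sequence adds one further 200 ns while the first command of the
    new sequence is read. Main memory delays are not modelled.
    """
    if microcommands < 0 or not 0 <= taken_jumps <= microcommands:
        raise ValueError("need 0 <= taken_jumps <= microcommands")
    return microcommands * MICROCOMMAND_NS + taken_jumps * JUMP_DELAY_NS


if __name__ == "__main__":
    # Ten microcommands, two of which are jumps that alter the sequence.
    print(sequence_time_ns(10, 2), "ns")
```

In this model a jump costs the same as two ordinary microcommands, which is the same two-cycle penalty noted earlier for JMP and RSB on the HP-2100.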

When an AROM is incorporated along with a BROM which implements a standard instruction set, the MICRO 1600 becomes a dynamical user microprogrammable machine. Unlike the HP-2100 and HP-21MX, which discourage discarding of the basic instruction set, the MICRO 1600 may be ordered with a control store microprogrammed completely to user specifications. This gives it an inherent flexibility not found in either the HP-2100 or HP-21MX. The MICRO 1600 can be applied as a direct function processor, general purpose computer, special purpose computer, emulator, or language processor. This is but a brief description of the MICRO 1600, but it allows comparisons to be made with both the HP-2100 and HP-21MX. For more details see reference 38.

As has been discussed, microprogramming affords the computer user familiar with that computer's idiosyncrasies the capability to tailor the computer to a specific need. This tailoring is obtained by designing a specialized instruction set and redesigning the architecture to implement this instruction set. By thus designing the firmware of the computer to perform special functions efficiently, the computer usage is optimized in terms of speed and memory allocation. The process of firmware design is demonstrated in the succeeding chapters.


CHAPTER III

MATRIX-VECTOR MULTIPLICATIONS

Many equations of modern control, digital filtering, Kalman filtering and related problems involve matrix-vector multiplications which must be calculated to obtain their solution. These matrix-vector products have the general form shown in Equation 1,

$$
A x =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{bmatrix}
\begin{bmatrix}
x_1 \\ x_2 \\ \vdots \\ x_m
\end{bmatrix}
\tag{1}
$$

where A is an n x m matrix and x is an m x 1 column vector. In computing a solution to problems where the product Ax is needed, considerable time is involved for any reasonable size n and m. Hence, if a real-time, on-line solution to the problems of modern control and signal processing is to be computed, a method for calculating Ax efficiently must be developed. The method developed in this chapter is to design an appropriate architecture and firmware which will yield an efficient implementation of matrix-vector product algorithms. The procedure used to derive this appropriate architecture and firmware is outlined below.

First, a machine language implementation of the algorithm to compute Ax is produced so that comparisons of execution times with subsequently developed microcoded versions can be made. An algorithm is developed to calculate execution times for any n, m, and T, where n and m refer to the matrix size and T is the basic machine language instruction execution time (1.96 µs for the HP-2100). Next, various size matrix-vector products are microcoded. An n x 2 matrix times a 2 x 1 vector microprogram is implemented and run. With the techniques used and the limitations of the HP-2100, this is the largest matrix-vector product that can be implemented. Algorithms are developed to calculate execution times, and actual execution times for various size matrix-vector products are computed for comparison. Finally, half-word length (8-bit) versions of an n x 2 matrix times a 2 x 1 vector and a 4 x 4 matrix times a 4 x 1 vector are microcoded and run. Algorithms for execution time calculation are developed and execution times computed for various size products to compare with the aforementioned machine language and full-word length versions. This development of microroutines reveals the architectural characteristics necessary for an effective implementation of matrix-vector products on microprogram controlled computers.

Assembly Language Routine

The assembly language routine for an n x m matrix times an m x 1 vector is shown flowcharted in Figure 6. (The actual program, along with all others developed in this chapter, is given in Appendix A.) The routine was programmed to handle only integers that had been scaled three octal places, thus actually allowing three octal place accuracy. No checking for overflow or underflow was made. The multiplication of integers is certainly justifiable since the main concern is not just to demonstrate feasibility or viability of this approach but to develop routines amenable to real-time, on-line systems applications. In such applications, information into the computer derives from an analog to digital converter which produces integerized digital values of the analog signal. Scaling is normally employed between the computer and the system both on input and output. Checking for an overflow condition would also normally be carried out, but the action taken upon an overflow detection is system dependent, e.g., an abort might be necessary, further scaling might be sufficient, etc. Therefore, the routine developed is sufficiently general and adequate.
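To make the scaled-integer arithmetic concrete, the following Python sketch forms the product Ax on integers scaled by three octal places (a factor of 8^3 = 512). It is not the program of Appendix A: the function name is hypothetical, and the choice to rescale each product before accumulating it is an assumption made here for illustration.

```python
SCALE_SHIFT = 9  # three octal places = 3 * 3 bits, a factor of 8**3 = 512


def scaled_matvec(a, x):
    """Return A*x for integers scaled by 8**3, rescaling each product.

    `a` is a list of n rows of m integers and `x` is a list of m integers.
    No overflow or underflow checking is done, mirroring the routine
    described in the text. Python's >> floors negative values.
    """
    result = []
    for row in a:
        if len(row) != len(x):
            raise ValueError("row length must equal vector length")
        total = 0
        for a_ij, x_j in zip(row, x):
            total += (a_ij * x_j) >> SCALE_SHIFT  # multiply, then scale back
        result.append(total)
    return result


if __name__ == "__main__":
    one = 1 << SCALE_SHIFT  # the scaled representation of 1.0
    # [[1.0, 2.0], [0.0, 1.0]] times [1.0, 0.5] in scaled form
    print(scaled_matvec([[one, 2 * one], [0, one]], [one, one // 2]))
```

With these inputs the first row evaluates to the scaled value of 1.0(1.0) + 2.0(0.5) = 2.0, i.e. 1024, and the second row to the scaled value of 0.5, i.e. 256.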

Fig. 6. Flowchart for assembly-language matrix-vector multiplication.

The elements of A, the $a_{ij}$'s, were stored sequentially by rows, i.e., first $a_{11}$, then $a_{12}$ through the end of the first row, then $a_{21}$, etc. The algorithm used for accessing each array element is given in Equation 2.

$$ \mathrm{Address}(a_{IJ}) = (I-1)N + J + \mathrm{Address}(a_{11}) - 1 \tag{2} $$
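Equation 2 is the usual row-major address calculation with one-based indices. A minimal Python sketch follows; the function and parameter names are hypothetical, and it is assumed here that N in Equation 2 denotes the number of elements stored per row.

```python
def element_address(i: int, j: int, row_length: int, base_address: int) -> int:
    """Address of a_ij for a matrix stored sequentially by rows (Equation 2).

    `i` and `j` are one-based; `row_length` plays the role of N in
    Equation 2 and `base_address` is the address of a_11.
    """
    if i < 1 or not 1 <= j <= row_length:
        raise ValueError("indices out of range")
    return (i - 1) * row_length + j + base_address - 1


if __name__ == "__main__":
    base = 1000  # arbitrary example base address
    for i, j in [(1, 1), (1, 4), (2, 1), (4, 4)]:
        print(f"a_{i}{j} -> {element_address(i, j, 4, base)}")
```

For a 4 x 4 matrix based at address 1000, the first element lies at 1000, the first element of the second row at 1004, and the last element at 1015.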

The time of execution for this routine can be calculated from the following equation,

$$ t = 29.5\,mnT + 10.5\,nT + T \tag{3} $$

where n and m refer to the matrix size and T is the basic machine instruction execution time (1.96 µs for the HP-2100). Equation 3, along with all other equations in this chapter relating execution times, was determined by actually summing the times to execute each instruction and noting the dependency on m and n in processing the algorithm. The times for a 2 x 2 matrix times a 2 x 1 vector, a 4 x 4 matrix times a 4 x 1 vector, and an 8 x 8 matrix times an 8 x 1 vector, which were calculated for later comparisons, are, respectively,

$$
\begin{aligned}
t_{22} &= 274.40\ \mu\text{s} \\
t_{44} &= 974.12\ \mu\text{s} \\
t_{88} &= 3867.08\ \mu\text{s}.
\end{aligned}
\tag{4}
$$
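Equation 3 is easily evaluated for other matrix sizes. The short Python sketch below does so; the function name is hypothetical. Evaluated as written, the formula reproduces the quoted 2 x 2 and 8 x 8 times (274.40 µs and 3867.08 µs); for the 4 x 4 case it gives 515T = 1009.40 µs, which differs from the 974.12 µs quoted above.

```python
T_US = 1.96  # basic machine instruction time for the HP-2100, microseconds


def assembly_time_us(n: int, m: int, t_us: float = T_US) -> float:
    """Equation 3: t = 29.5*m*n*T + 10.5*n*T + T."""
    return (29.5 * m * n + 10.5 * n + 1) * t_us


if __name__ == "__main__":
    for size in (2, 4, 8):
        print(f"{size} x {size}: {assembly_time_us(size, size):.2f} us")
```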


Microcoded Full-Word Matrix-Vector Multiplication Routines

The flowchart for a microcoded multiplication of an n x 2 matrix times a 2 x 1 vector is shown in Figure 7. To save memory references and unnecessary programming, $x_1$ and $x_2$ are first read into WCS and stored in registers F and Q, respectively. All scratch-pad registers are used: S2 and S4 in the multiply subroutine to temporarily hold the multiplicand and multiplier, as are A and B to hold the results; S3 to contain the running sum; and S1 to count n. Each $a_{ij}$ is retrieved from memory as needed, and the result $a_{i1}x_1 + a_{i2}x_2$ is then stored in memory. The Flag flip-flop is used to determine when this result is completed.

The algorithm for calculation of time of execution was determined to be

$$ t_e = 13T + 110\,nT, \tag{5} $$

where n is the number of rows in the matrix and T is the microinstruction execution time (196 ns for the HP-2100). The time to execute this microprogram for a 2 x 2 matrix times a 2 x 1 vector was calculated from Equation 5 to be 45.688 µs. Comparing this with the time to perform the same operation in assembly language, a savings factor of six (6), or about 230 µs, is accomplished.
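The comparison can be reproduced with a few lines of Python. The sketch below is illustrative, with hypothetical names; the assembly language time is taken as the 274.40 µs quoted earlier rather than recomputed. Evaluated as written, Equation 5 gives 233T = 45.668 µs for n = 2, marginally different from the 45.688 µs quoted in the text, and the resulting ratio of about 6.0 and difference of about 229 µs agree with the stated savings.

```python
T_MICRO_US = 0.196            # microinstruction time (196 ns), microseconds
ASSEMBLY_2X2_US = 274.40      # assembly language 2 x 2 time quoted in the text


def microcoded_nx2_time_us(n: int, t_us: float = T_MICRO_US) -> float:
    """Equation 5: t_e = 13T + 110nT for an n x 2 matrix times a 2 x 1 vector."""
    return (13 + 110 * n) * t_us


if __name__ == "__main__":
    micro = microcoded_nx2_time_us(2)
    print(f"microcoded 2 x 2: {micro:.3f} us")
    print(f"savings factor:   {ASSEMBLY_2X2_US / micro:.2f}")
    print(f"time saved:       {ASSEMBLY_2X2_US - micro:.1f} us")
```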

Fig. 7. Flowchart of an n x 2 matrix times a 2 x 1 vector.

As was demonstrated, all available registers plus the Flag flip-flop were used to implement this microprogram. Since only microinstructions and no data (other than eight-bit constants stored in the least significant eight bits of certain microinstructions) can be stored in WCS, a severe limitation, this is the maximum size matrix-vector product that can be implemented on the HP-2100. However, microprograms of a 4 x 4 matrix times a 4 x 1 vector, an 8 x 8 matrix times an 8 x 1 vector, and an n x m matrix times an m x 1 vector were flowcharted and actually microcoded, but not implemented. These flowcharts are shown in Figures 8, 9, and 10, respectively. These routines require one storage register for each element of the vector (4, 8, and m); two scratch registers plus the A- and B-registers for the multiply subroutine; one scratch register for the counter for the square matrices and four counters for the n x m matrix; and one register for the running sum; or 8, 12, and m + 7 scratch registers, respectively. As seen, these are more than are available on the 2100, a limitation that could be overcome if data could be stored and retrieved directly from WCS (other than as eight-bit constants as previously discussed).
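The register tally in this paragraph can be written out as a small calculation. The Python sketch below uses hypothetical names and simply adds the categories listed in the text; the A- and B-registers are excluded, as they are in the totals of 8, 12, and m + 7.

```python
def scratch_registers_needed(vector_length: int, counters: int) -> int:
    """Scratch registers required, excluding the A- and B-registers.

    One register per vector element, two for the multiply subroutine
    (multiplicand and multiplier), the stated number of counters, and
    one for the running sum.
    """
    multiply_scratch = 2
    running_sum = 1
    return vector_length + multiply_scratch + counters + running_sum


if __name__ == "__main__":
    print("n x 2 :", scratch_registers_needed(2, 1))   # F, Q and S1 to S4
    print("4 x 4 :", scratch_registers_needed(4, 1))
    print("8 x 8 :", scratch_registers_needed(8, 1))
    m = 5  # arbitrary example column count for the n x m routine
    print("n x m :", scratch_registers_needed(m, 4), "= m + 7")
```

Applied to the n x 2 routine, the same tally gives six registers, which matches the F, Q, and four scratch-pad registers used in the implemented microprogram.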

Fig. 8. Microprogram flowchart for a 4 x 4 matrix times a 4 x 1 vector.

Fig. 9. Microprogram flowchart for an 8 x 8 matrix times an 8 x 1 vector.

Fig. 10. n x m matrix times m x 1 vector.

The time of execution algorithm for the first two of these microprograms is given as

$$ t_e = 5T + 7nT + 59n^2T + nT\sum_{d=1}^{n-2}(n-d) + 2nT\sum_{d=2}^{n-1}(n-d), \tag{6} $$

where n refers to the (square) matrix size, T is the microinstruction execution time, and d is a counter which depends on the number of decisions to be made (n-dependent). The times of execution for these programs are

$$
\begin{aligned}
(4 \times 4)\quad t_e &= 200.116\ \mu\text{s} \\
(8 \times 8)\quad t_e &= 860.244\ \mu\text{s}.
\end{aligned}
\tag{7}
$$
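Equation 6 can be checked numerically against the two quoted times. In the Python sketch below (hypothetical names), the first sum is taken over d = 1 to n - 2 and the second over d = 2 to n - 1; with these limits the formula gives 1021T and 4389T, i.e. 200.116 µs and 860.244 µs at T = 196 ns, in agreement with Equation 7.

```python
T_US = 0.196  # microinstruction time (196 ns), microseconds


def square_time_cycles(n: int) -> int:
    """Equation 6 in units of T for an n x n matrix times an n x 1 vector."""
    first_sum = sum(n - d for d in range(1, n - 1))   # d = 1 .. n-2
    second_sum = sum(n - d for d in range(2, n))      # d = 2 .. n-1
    return 5 + 7 * n + 59 * n * n + n * first_sum + 2 * n * second_sum


if __name__ == "__main__":
    for n in (4, 8):
        cycles = square_time_cycles(n)
        print(f"{n} x {n}: {cycles} T = {cycles * T_US:.3f} us")
```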

Comparing these times with the corresponding ones found using the assembly language routine (i.e., 200.116 µs vs. 974.12 µs and 860.244 µs vs. 3867.08 µs), savings ratios of about 4.8 and 4.5 are effected. The algorithm for calculating execution time for the n x m matrix times the m x 1 vector microprogram is

$$ t_e = 2T + 5mT + 58nmT + 4nT + nT\sum_{d=1}^{m-2}(m-d) + 2nT\sum_{d=2}^{m-1}(m-d), \tag{8} $$

where again n and m refer to the matrix dimensions (n rows and m columns), T is the basic microinstruction execution time (196 ns for the HP-2100), and d is an m-dependent counter. The time of execution for a 4 x 4 matrix times a 4 x 1 vector is approximately the same as the corresponding 4 x 4 time above.

Microcoded Half-Word Matrix-Vector Multiplication Routines

As was noted above, the largest full-word length matrix-vector product microroutine within the capabilities of the 2100 was an n x 2 matrix times a 2 x 1 vector. This was due to a shortage of scratch-pad registers for storage of vector elements. If the word lengths are halved to 8 bits (7 bits plus sign), then two vector elements can be stored in the same register and the product of a 4 x 4 matrix times a 4 x 1 vector comes within the capabilities of the machine. Also observe that approximately two-thirds of the execution time for the above routines was consumed in the multiply subroutine. Since in a half-word routine only 8-bit multiplies, not 16-bit multiplies, are needed, an initialization of the counter to eight will effectively reduce multiply times by half. This results in a considerable savings of time. The flowcharts for an n x 2 matrix times a 2 x 1 vector and a 4 x 4 matrix times a 4 x 1 vector are illustrated in Figures 11 and 12.
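The two ideas in this paragraph, packing two 8-bit elements into one 16-bit register and halving the number of multiply passes, can be sketched in Python as follows. All names are hypothetical, two's complement is assumed for the signed 8-bit values (the text says only "7 bits plus sign"), and the multiply is a generic shift-and-add loop rather than the microcoded subroutine of Appendix A.

```python
from typing import Tuple


def pack_pair(high: int, low: int) -> int:
    """Pack two signed 8-bit values into one 16-bit word (two's complement assumed)."""
    for value in (high, low):
        if not -128 <= value <= 127:
            raise ValueError("value does not fit in 8 bits")
    return ((high & 0xFF) << 8) | (low & 0xFF)


def unpack_pair(word: int) -> Tuple[int, int]:
    """Extract both elements, testing the sign bit of each byte."""
    def to_signed(byte: int) -> int:
        return byte - 256 if byte & 0x80 else byte
    return to_signed((word >> 8) & 0xFF), to_signed(word & 0xFF)


def shift_add_multiply(a: int, b: int, bits: int) -> Tuple[int, int]:
    """Multiply unsigned operands by shift-and-add; return (product, passes)."""
    product = 0
    passes = 0
    for i in range(bits):          # the counter is initialized to `bits`
        if (b >> i) & 1:
            product += a << i
        passes += 1
    return product, passes


if __name__ == "__main__":
    word = pack_pair(-3, 100)
    print(unpack_pair(word))
    print(shift_add_multiply(13, 11, 8))    # 8 passes
    print(shift_add_multiply(13, 11, 16))   # 16 passes for the same product
```

The extraction and sign test in `unpack_pair` correspond to the extra work a half-word routine must perform on each packed vector element, which offsets part of the saving from the shorter multiply loop.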

Fig. 11. n x 2 matrix times 2 x 1 vector, half-word.

Fig. 12. 4 x 4 matrix times 4 x 1 vector, half-word.

The algorithms for calculation of execution times for both worst case (all vector elements negative) and best case (no vector element negative) are given by Equations 9 and 10 for the n x 2 and the square (n x n) routines, respectively.

$$
\begin{aligned}
\text{Worst case:}\quad t_e &= 7T + 107nT \\
\text{Best case:}\quad t_e &= 7T + 101nT
\end{aligned}
\tag{9}
$$

$$
\begin{aligned}
\text{Worst case:}\quad t_e &= 5T + 4.5nT + 53.5n^2T + nT\sum_{d=1}^{n-2}(n-d) + 2nT\sum_{d=2}^{n-1}(n-d) \\
\text{Best case:}\quad t_e &= 5T + 4nT + 49.5n^2T + nT\sum_{d=1}^{n-2}(n-d) + 2nT\sum_{d=2}^{n-1}(n-d),
\end{aligned}
\tag{10}
$$

where n, T and d have the same meanings as in the previous equations. The respective times of execution for a 2 x 2 matrix times a 2 x 1 vector and a 4 x 4 matrix times a 4 x 1 vector are:

$$
\begin{aligned}
\text{Worst case:}\quad t_e &= 43.316\ \mu\text{s} \\
\text{Best case:}\quad t_e &= 40.914\ \mu\text{s}
\end{aligned}
\tag{11}
$$

$$
\begin{aligned}
\text{Worst case:}\quad t_e &= 180.908\ \mu\text{s} \\
\text{Best case:}\quad t_e &= 167.972\ \mu\text{s}.
\end{aligned}
\tag{12}
$$

Equation 10 can also be used to calculate the time of execution for an 8 x 8 matrix times an 8 x 1 vector. These times are:

$$
\begin{aligned}
\text{Worst case:}\quad t_e &= 787.332\ \mu\text{s} \\
\text{Best case:}\quad t_e &= 703.444\ \mu\text{s}.
\end{aligned}
\tag{13}
$$
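The half-word timing formulas can be evaluated in the same way as the full-word ones. The Python sketch below uses hypothetical names and reads Equation 9 as 7T + 107nT (worst) and 7T + 101nT (best), and Equation 10 with coefficients 4.5 and 53.5 (worst) or 4 and 49.5 (best) plus the two sums over d. Read this way, the formulas reproduce the quoted worst-case times (43.316, 180.908 and 787.332 µs) and the 4 x 4 best case (167.972 µs); the best-case values they give for the 2 x 2 and 8 x 8 products, 40.964 µs and 736.372 µs, differ from the quoted 40.914 µs and 703.444 µs.

```python
T_US = 0.196  # microinstruction time (196 ns), microseconds


def decision_terms(n: int) -> int:
    """The two summation terms shared by both cases of Equation 10, in units of T."""
    first_sum = sum(n - d for d in range(1, n - 1))   # d = 1 .. n-2
    second_sum = sum(n - d for d in range(2, n))      # d = 2 .. n-1
    return n * first_sum + 2 * n * second_sum


def halfword_nx2_cycles(n: int, worst: bool) -> int:
    """Equation 9 in units of T for the n x 2 half-word routine."""
    return 7 + (107 if worst else 101) * n


def halfword_square_cycles(n: int, worst: bool) -> float:
    """Equation 10 in units of T for the n x n half-word routine."""
    linear, quadratic = (4.5, 53.5) if worst else (4.0, 49.5)
    return 5 + linear * n + quadratic * n * n + decision_terms(n)


if __name__ == "__main__":
    for worst in (True, False):
        label = "worst" if worst else "best"
        print(label, "2 x 2:", f"{halfword_nx2_cycles(2, worst) * T_US:.3f} us")
        for n in (4, 8):
            t = halfword_square_cycles(n, worst) * T_US
            print(label, f"{n} x {n}:", f"{t:.3f} us")
```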

Even though considerable time is saved in these half-word matrix-vector product microroutines, the savings is not as great as anticipated. This is due to the extra programming necessary to extract each vector element from the scratch-pad registers, to detect and account for negative vector elements, and to scale each vector element to accomplish the 8-bit multiply routine. But as the value of n increases (see Equations 9 and 10), greater savings are realized. This becomes even more apparent by referring to Figure 13, where it is seen that as n increases, the graphs of execution times for the assembly language and microcoded full- and half-word routines all diverge. As also shown, microcoding 8-bit word length routines allows larger matrix-vector products to be implemented in the computer. This aspect is discussed more fully in the succeeding section on the architecture proposed to effectively implement these matrix-vector products.

Fig. 13. Time of execution vs. matrix size. (Curves: Assembly Language, Full-Word, Half-Word; horizontal axis: Matrix Size (n).)

Proposed Architecture

From the preceding discussion concerning implementation of microroutines for matrix-vector products, the limitations of the HP-2100 for efficiently microprogramming such problems are evident. A proposed architecture to alleviate these limitations and allow an efficient implementation of an n x m matrix times an m x 1 vector is shown in Figure 14 and described in the following paragraphs.

Assuming that the bus structure and microinstruction format remain fixed, the most severe limitation of the HP-2100 is a shortage of scratch-pad registers. By providing more scratch-pad registers, larger size matrix-vector products can be implemented, greater computational versatility results, and the execution speed of many programs can be increased considerably. To implement the n x m matrix times m x 1 vector product, m + 7 registers are needed. These m + 7 registers are shown in Figure 14, where m scratch-pad registers are shown as "S-bus" registers and the other seven are shown as "R-bus" registers. Four of the "R-bus" registers are labeled as in the HP-2100, i.e., A, B, Q, F. Providing more registers on the R-bus also results in greater versatility, a reduction in the number of microinstructions needed to implement a given function, and hence a reduction in execution time. All of these registers are also assumed to be general purpose registers and not latches, as are the scratch-pad registers in the HP-2100. The scratch-pad registers in the HP-2100 are likely to develop a "race" condition if loaded while being interrogated. This limitation prevents the microprogrammer from specifying the same scratch-pad register in both the S- and T-bus (in the same microinstruction), which leads to more microinstructions than would be necessary if these registers were made general purpose.

If it is desired to implement the half-word length (8-bit) version of this microroutine, the m scratch-pad registers could be made 8-bit registers. To efficiently incorporate these eight-bit registers, the capability to detect the most significant bit (sign bit) as on or off must also be implemented. With these features added, the half-word length microroutines can be much more efficiently executed, since each eight-bit vector element would not have to be extracted from sixteen-bit registers and a sign detection made.

Figure 14 also shows several additional five-bit counters on the S-bus. These counters are not necessary to implement this matrix-vector product since the m + 7 registers include the registers necessary for counting. The counters are shown to indicate that several of the m + 7 registers may be replaced by shorter length (5-bit) registers to be used as counters.

Fig. 14. Proposed architecture for a microprogrammed machine.

Another limitation of the HP-2100 is that only microinstructions can be stored in and executed from WCS; that is, no data can be directly accessed in WCS. The only data available in WCS is stored as eight-bit constants in the least significant eight bits of microinstructions containing a "CR" or "CL" micro-order in the S-bus field. The "CR" and "CL" micro-orders direct the computer to read the eight-bit constant stored in bits 0-7 of the microinstruction onto the least (CR) or most (CL) significant bits of the register specified in the T-bus field. The ability of microprograms resident in WCS to read and store data directly in WCS locations would greatly enhance the capabilities of the machine. The need for this feature is mitigated by the addition of sufficient scratch-pad registers, but if m is large, the cost of such registers might become prohibitive. So there is a trade-off here: either incorporate as many registers as necessary to implement a problem, or incorporate several additional registers and add the capability to read and store data in WCS. These features would significantly increase the speed of execution of many microprograms and greatly enhance the computer's overall capabilities, not only for matrix-vector products but for a wide class of problems.


CHAPTER IV

GAUSSIAN ELIMINATION

Solving a set of equations of the form Ax = c, where the matrix A and the vector c are given and the vector x is to be determined, is one of the foremost problems of numerical analysis and systems theory. A companion problem to obtaining the solution to this set of linear equations, one that also occurs in modern control theory, signal processing and other applications, is that of matrix inversion. Many algorithms have been developed for the solution of both of these problems. By far the most heavily used algorithm, at least for the solution of linear systems, is Gaussian elimination (51). The Gaussian elimination algorithm is applicable to both the solution of linear systems and the problem of matrix inversion. For a matrix of any appreciable size, this algorithm also consumes considerable time. Hence it is imperative to design an appropriate architecture and firmware for efficiently computing this algorithm.

An appropriate architecture and firmware is developed in this chapter by first coding an assembly language routine for the Gaussian elimination algorithm. This program is developed to test correctness of procedure and to compare its running time with the subsequently developed microcoded version. Certain portions of the assembly language program are then microcoded; the complete algorithm could not be microprogrammed due to the size of the WCS. Execution times of the machine language program and the microcoded version are calculated and compared. This procedure illuminates the limitations of the HP-2100 and suggests an appropriate architecture for efficiently implementing the Gaussian elimination algorithm. This architecture is described at the end of this chapter.

Machine Language Program

The Gaussian elimination algorithm used is a modification of that given by Stewart (54, p. 126). The algorithm is a Gaussian elimination with complete pivoting and overwrites A, an n x n matrix, with the Gaussian decomposition of A with its rows and columns permuted. The row and column interchange indices are $\rho_k$ and $\gamma_k$, respectively. This algorithm is given in Table 2. The algorithm developed by Stewart has the advantage of saving the multipliers $\mu_{ik}$, which must be used in a succeeding algorithm to finally obtain the solution to the linear system Ax = c. The algorithm actually programmed is somewhat faster, but cannot be used to solve the linear system of equations. However, it does serve to illustrate the features of Gaussian elimination with complete pivoting; very little additional programming is needed to secure the multipliers $\mu_{ik}$.

TABLE 2

GAUSSIAN ELIMINATION WITH COMPLETE PIVOTING ALGORITHM

For k = 1, 2, ..., n-1:

1) Find $\rho_k, \gamma_k \ge k$ such that $|a_{\rho_k \gamma_k}| = \max\{|a_{ij}| : i, j \ge k\}$.

2) If $a_{\rho_k \gamma_k} = 0$, set r = k-1 and halt.

3) $a_{kj} \leftrightarrow a_{\rho_k, j}$   (j = k, k+1, ..., n)

4) $a_{ik} \leftrightarrow a_{i, \gamma_k}$   (i = 1, 2, ..., n)

5) $a_{ij} \leftarrow a_{ij} - (a_{ik} a_{kj}) / a_{kk}$   (i = k+1, k+2, ..., n; j = k+1, k+2, ..., n)

6) $a_{ik} \leftarrow 0$   (i = k+1, k+2, ..., n)
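The six steps of Table 2 can be sketched in Python as follows. This is only an illustration of the algorithm's control flow, not the assembly or microcoded routine of the dissertation: the function name, the 0-based indexing, and the use of exact `Fraction` arithmetic (instead of the 16-bit integer arithmetic of the HP-2100) are assumptions made for readability.

```python
from fractions import Fraction


def complete_pivot_eliminate(a):
    """Reduce the square matrix `a` (list of lists) in place, following Table 2.

    Indices are 0-based here. Returns (row_idx, col_idx, r): the pivot row and
    column chosen at each step, and r = number of completed steps if a zero
    pivot forced a halt (None if the loop ran to completion).
    """
    n = len(a)
    row_idx, col_idx = [], []
    for k in range(n - 1):
        # Step 1: locate the largest-magnitude element of the trailing block.
        p, g = max(((i, j) for i in range(k, n) for j in range(k, n)),
                   key=lambda ij: abs(a[ij[0]][ij[1]]))
        # Step 2: a zero pivot means the whole trailing block is zero.
        if a[p][g] == 0:
            return row_idx, col_idx, k
        row_idx.append(p)
        col_idx.append(g)
        # Step 3: interchange rows k and p (columns k..n-1).
        for j in range(k, n):
            a[k][j], a[p][j] = a[p][j], a[k][j]
        # Step 4: interchange columns k and g (all rows).
        for i in range(n):
            a[i][k], a[i][g] = a[i][g], a[i][k]
        # Step 5: update the trailing block.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j] / a[k][k]
        # Step 6: zero the column below the pivot (multipliers are not saved).
        for i in range(k + 1, n):
            a[i][k] = 0
    return row_idx, col_idx, None


matrix = [[Fraction(2), Fraction(1)], [Fraction(4), Fraction(3)]]
rows, cols, r = complete_pivot_eliminate(matrix)
```

For the 2 x 2 input above, the largest element (4) is brought to the pivot position by a row interchange, and the remaining element becomes 1 - (2)(3)/4 = -1/2.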


This algorithm was written in subroutine format; that is, each step of the algorithm was programmed as a separate subroutine. This procedure has the advantage that each step (subroutine) can be programmed separately and then tested for correctness and completeness independently. Subroutines can later be replaced by an appropriately coded microprogram. The general flowchart for this algorithm is shown in Figure 15; the flowcharts for each subroutine and all programs are given in Appendix B. Only integer arithmetic is used, with no overflow or underflow checks made. Integer arithmetic and finite register length representations of real numbers introduce truncation errors which tend to accumulate (grow) in the Gaussian elimination routine. This "chopping" effect is not normally analyzed, but its consequences need to be considered.

[Fig. 15. Gaussian elimination with complete pivoting. Flowchart; recoverable labels: MAX VALUE, the test k = n - 1?, k + 1 -> k, HALT.]

Let the real number, N, be represented using a finite number of bits with a fixed point format and binary arithmetic. Further, assume that N has been normalized such that $|N| \le 1$. Then N has the binary expansion

$$N = \sum_{k=1}^{\infty} n_k 2^{-k}, \qquad n_k = 1 \text{ or } 0 \tag{14}$$

where N is also assumed to be a positive quantity. Now approximate N by a word of only b bits using truncation; that is, all bits beyond the most significant b bits are dropped. The error introduced by truncation is more serious than that introduced by rounding because of a bias. The error due to truncation, i.e., the value before truncation minus the value after truncation, will always be positive for positive numbers. That is, the effect of truncation is to reduce the value of the numbers. Let $N_b$ be the b-bit representation of N; then

$$0 \le e = N - N_b < 2^{-b} \tag{15}$$

where e is the error due to truncation.

Because of its convenience, requiring only a bit-by-bit complementation, the one's-complement representation of negative numbers is the representation used to insert negative matrix and vector elements into the computer. For a one's-complement negative number represented by the bit string $1, n_1, n_2, \ldots$, the magnitude is given by

$$N = 2.0 - x_1,$$

where

$$x_1 = 1 + \sum_{k=1}^{\infty} n_k 2^{-k} \tag{16}$$

Truncation to b bits produces the bit string $1, n_1, n_2, \ldots, n_b$, where now the magnitude is

$$N_b = 2.0 - 2^{-b} - x_2,$$

where

$$x_2 = 1 + \sum_{k=1}^{b} n_k 2^{-k}. \tag{17}$$

The change in magnitude is

$$\Delta N = N - N_b = 2^{-b} - \sum_{k=b+1}^{\infty} n_k 2^{-k} \tag{18}$$

so that

$$0 \le \Delta N < 2^{-b} \tag{19}$$

Hence the effect of truncation for one's-complement negative numbers is also to decrease the magnitude of the negative number.

For a two's-complement negative number (the HP-2100 uses two's-complement arithmetic) represented by the bit string $1, n_1, n_2, \ldots$, the magnitude is

$$N = 2.0 - x_1 \tag{20}$$

and truncation to b bits yields a magnitude of

$$N_b = 2.0 - x_2 \tag{21}$$

where $x_1$ and $x_2$ are as defined above. The change in magnitude is

$$\Delta N = N - N_b = -\sum_{k=b+1}^{\infty} n_k 2^{-k} \tag{22}$$

where it is easily seen that

$$0 \ge \Delta N > -2^{-b} \tag{23}$$

Hence the effect of truncation for two's-complement negative numbers is to increase the magnitude of the negative number. Thus inserting negative numbers in one's-complement form tends to offset the errors introduced by the two's-complement arithmetic in the HP-2100. However, it is this two's-complement error that tends to accumulate. The effects of this accumulation will now be determined.
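The direction of the truncation bias in each representation can be checked with a small Python sketch. It is an illustration under stated assumptions, not code from the dissertation: plain integers stand in for scaled fixed-point fractions, the function names are hypothetical, and a 16-bit word is assumed for the one's-complement pattern.

```python
def truncate_twos_complement(value, drop):
    """Drop the `drop` low-order bits of a two's-complement integer.

    Python's >> on a negative int is an arithmetic shift, which matches
    discarding the low-order bits of a two's-complement pattern.
    """
    return (value >> drop) << drop


def truncate_ones_complement(value, drop, width=16):
    """Drop the `drop` low-order bits of a one's-complement pattern."""
    mask = (1 << width) - 1
    # Encode: a negative value is the bitwise complement of its magnitude.
    pattern = value if value >= 0 else (~(-value)) & mask
    short_width = width - drop
    short_mask = (1 << short_width) - 1
    pattern >>= drop                       # discard the low-order bits
    # Decode the shortened pattern, then restore the original bit weight.
    if pattern >> (short_width - 1):       # sign bit set: negative number
        return -(((~pattern) & short_mask) << drop)
    return pattern << drop


for x in (5, -5):
    print(x, truncate_ones_complement(x, 2), truncate_twos_complement(x, 2))
```

Dropping two bits from +5 gives 4 in both representations (magnitude reduced). For -5, the one's-complement pattern truncates to -4 (magnitude reduced) while the two's-complement pattern truncates to -8 (magnitude increased), which is the behavior described by Equations 19 and 23.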

The Gaussian elimination routine only calls for the execution of the arithmetic operations of subtraction, multiplication and division. However, the 2100 has only an addition instruction; subtraction is performed by first complementing the subtrahend and then adding this to the minuend. Hence, the effects of errors on the processes of addition, multiplication and division will be considered. When two b-bit fixed point numbers are added, their sum will still have b bits, provided there is no overflow. Therefore, under the assumption of no overflow, fixed point addition causes no error. Another consideration enters: since the representation of positive numbers introduces positive errors and two's-complement arithmetic introduces negative errors, these errors tend to cancel (assuming positive numbers are as likely as negative numbers) and there would be no accumulation of error in this process. The only error is that which was introduced by the finite length representation of real numbers.

Now let A and c be the actual matrix and vector in the computer. Denote by $x_t$ and $x_c$ the true and computed solutions of Ax = c. A reasonable measure of the error is given by

$$E = \lVert x_t - x_c \rVert \tag{24}$$

The approach to be taken in an estimation of the error E is to assume that the computed solution $x_c$ is the true solution of some perturbed system which is written as

$$(A + \delta A)\,x_c = c + \delta c \tag{25}$$

then the problem is to find bounds on $\delta A$ and $\delta c$. The analysis to determine the size of these perturbations was carried out by Ralston (44) with the result that

$$|\delta a_{ij}| < 2^{-b}, \qquad i \le j$$

and

$$|\delta c_i| < 2^{-b} \tag{26}$$

where the errors have been given to reflect the "chopping" effect rather than the round-off effect. Thus all elements of $\delta A$ and $\delta c$ are less than $2^{-b}$ in magnitude. This says that the result of triangular decomposition (the Gaussian elimination routine) corresponds to an original matrix which differs from A by no more than the truncation required to insert A into the computer. That is, in the procedure for calculating the result

$$a_{ij} \leftarrow a_{ij} - (a_{ik} a_{kj} / a_{kk}) \tag{27}$$

where $a_{ik}$ is multiplied by $a_{kj}$, that double-length product divided by $a_{kk}$, and the single-length quotient then complemented and added to $a_{ij}$, there is no accumulation of error.

With an estimate of the errors incurred in computing the Gaussian elimination algorithm in hand, the problem of programming the algorithm can be more fully considered. The flowchart for the algorithm was shown in Figure 15. Formulas for computing times of execution for each of the subroutines of the Gaussian elimination routine are given in Equations 28 through 33.

Max Value Subroutine:

$$t_e = 22T + 36n^2T - 25nT \tag{28}$$

Interchange Rows Subroutine:

$$t_e = 44nT - 0.5T \tag{29}$$

Interchange Columns Subroutine:

$$t_e = 1.5T + 44nT \tag{30}$$

Elimination Subroutine:

$$t_e = 15T + 77.5n^2T + 5nT \tag{31}$$

Zero Elements Subroutine:

$$t_e = 2.5T + 20.5nT \tag{32}$$

Get Array Element Address Subroutine:

$$t_e = 11T \tag{33}$$

In each of these equations (28-33), n refers to the size of the matrix, and T is the instruction execution time. These may be combined into one equation to obtain an algorithm for computing the total time of execution of the Gaussian elimination routine. This equation is given as

$$t = 63nT + T \sum_{m=2}^{n} m(113.5m + 68) - 59T, \tag{34}$$

where n and T are as given above and m is a counter depending on n. For a 5 x 5 matrix, the total time of execution was calculated to be 14,379.52 µs.
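Equation 34 is simple to evaluate numerically. The sketch below is a hedged illustration: the function name is hypothetical, and the instruction time T = 1.96 µs is an assumption taken from the value the text gives later for the microcoded formulas.

```python
def assembly_total_time_us(n, T=1.96):
    """Equation 34: total assembly-language execution time in microseconds.

    T = 1.96 is an assumed instruction execution time in microseconds.
    """
    series = sum(m * (113.5 * m + 68) for m in range(2, n + 1))
    return 63 * n * T + T * series - 59 * T


for size in (4, 5, 6):
    print(size, round(assembly_total_time_us(size), 2))
```

Under that assumption the formula gives roughly 14.38 ms for n = 5, within about a microsecond of the figure quoted above.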

Microcoded Program

As previously stated, the HP-2100 WCS does not have enough storage to microcode the complete Gaussian elimination routine. Hence the subroutines that were microcoded are those which were most repetitive, needed fewer memory references and/or fewer scratch-pad registers, and were less complex to program. These are the get array element address, interchange rows, interchange columns, and set array elements to zero subroutines.

The initialization procedure for the microcoded version of the Gaussian elimination algorithm required more programming than did the assembly language version. This is because each macro-instruction which calls a microroutine was first initialized to take the value of n to the microprogram. The microprogram to get the array element address requires that the values of i and j be loaded into the B and A registers, respectively, before accessing. This microroutine also uses to advantage the techniques to shorten multiply time which were developed in the previous chapter. The microprograms to interchange rows and columns and zero elements have their own array element addressing schemes and do not use the microcoded subroutine to get addresses.

The formulas for calculating execution times of each subroutine are given in the following equations.

Max Value Subroutine:

$$t_e = 14.1T + 26.1n^2T - 16.1nT \tag{35}$$

Elimination Subroutine:

$$t_e = 8.1T + 40.2n^2T + 6nT \tag{36}$$

Interchange Rows Subroutine:

$$t_e = 21T + 36nT \tag{37}$$

Interchange Columns Subroutine:

$$t_e = 23T + 50nT \tag{38}$$

Zero Elements Subroutine:

$$t_e = 20T + 24nT \tag{39}$$

Get Array Element Address Subroutine:

$$t_e = 21T \tag{40}$$

In the first two equations (35 and 36), T is 1.96 µs and in Equations 37, 38, 39, and 40, T is 196 ns. In all equations n refers to the matrix size. These equations may be combined into one to obtain an algorithm for computing the total time of execution of the microcoded Gaussian elimination routine. This algorithm is given in Equation 41.

$$t = 46.2nT + T \sum_{m=2}^{n} m(66.3m + 0.9) - 27.2T \tag{41}$$

In this equation, T equals 1.96 µs, and n and m are as previously given. The time to complete execution for a 5 x 5 matrix was calculated to be 7239.176 µs. This represents a savings of approximately 2 to 1; in addition, a memory storage savings of about 1/3 (255 to 176 storage locations) was accomplished. Table 3 shows a comparison of times for each subroutine for n = 5. Figure 16 shows that as n increases, greater savings are realized in the microcoded version since the curves of execution times diverge. As can be seen, those subroutines which were completely microcoded (IROW, ICOL, ZERO, and GETAD) produced considerable savings. If more scratch-pad registers had been available, even greater savings could have been effected, and if the complete Gaussian elimination routine could have been microcoded, still greater savings would have resulted.

TABLE 3

TIMES OF EXECUTION FOR INDIVIDUAL SUBROUTINES, n = 5

                          Time (µs)
Subroutine    Assembly Language    Microcoded
MAX                1562.12          1148.72
ELIM               3797.5           2044.476
IROW                430.22            39.396
ICOL                434.14            53.508
ZERO                205.80            27.44
GETAD                21.56             4.116

[Fig. 16. Time vs. matrix size (Gaussian elimination). Plot; vertical axis ticks from 2000 to 34000, horizontal axis: Matrix Size (n), 4 to 6; curves: Assembly Language, Microcoded.]
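The savings for the four completely microcoded subroutines can be reproduced from the timing formulas (Equations 29, 30, 32, 33 and 37 through 40). The following Python sketch is illustrative only: the names are hypothetical, and the assembly-language instruction time of 1.96 µs is an assumption, chosen because it reproduces the assembly-language entries of Table 3.

```python
T_ASM_US = 1.96      # assumed assembly-language instruction time, microseconds
T_MICRO_US = 0.196   # microinstruction time (196 ns), microseconds

# name: (assembly formula, microcoded formula), each in units of its own T.
FORMULAS = {
    "IROW":  (lambda n: 44 * n - 0.5,   lambda n: 21 + 36 * n),
    "ICOL":  (lambda n: 1.5 + 44 * n,   lambda n: 23 + 50 * n),
    "ZERO":  (lambda n: 2.5 + 20.5 * n, lambda n: 20 + 24 * n),
    "GETAD": (lambda n: 11,             lambda n: 21),
}


def subroutine_times_us(n):
    """Return {name: (assembly_us, microcoded_us)} for matrix size n."""
    return {name: (asm(n) * T_ASM_US, micro(n) * T_MICRO_US)
            for name, (asm, micro) in FORMULAS.items()}


for name, (asm_us, micro_us) in subroutine_times_us(5).items():
    print(f"{name:5s} {asm_us:8.2f} {micro_us:8.3f}  ratio {asm_us / micro_us:5.1f}")
```

For n = 5 the computed times agree with the IROW, ICOL, ZERO and GETAD rows of Table 3, and the ratios range from roughly 5 to 1 (GETAD) to roughly 11 to 1 (IROW).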

Proposed Architecture

As noted in the preceding paragraph, the most critical shortcoming of the HP-2100 for efficiently microcoding the Gaussian elimination algorithm was again the shortage of scratch-pad registers. Considerable time was spent in accessing main memory for temporary storage locations. This could be avoided with the addition of several more scratch-pad registers. Essentially, the architecture proposed and discussed in Chapter III for matrix-vector products would also adequately handle the Gaussian elimination microprogram. The same number of scratch-pad registers (m + 7) would not be necessary, but at least a four-fold increase (to 16) over the HP-2100, as in the HP-21MX, is suggested. It is further proposed that these sixteen scratch-pad registers be in addition to the A, B, F, and Q registers. And, again, as in Chapter III, Figure 14, several of these registers should be R-bus registers for added flexibility. Since considerable counting is performed in implementing the Gaussian algorithm, the additional five-bit counters depicted in Figure 14, page 54, play an important role in efficiently microcoding this algorithm. Five-bit counters are adequate for matrices up to 32 x 32; any matrices larger than this will require larger counters.

The fact that the WCS was not of sufficient size to completely microcode the Gaussian elimination algorithm will be partly offset by the implementation of the proposed architecture. This proposed architecture permits the optimization of the microprogram, which essentially means fewer microinstructions per microcoded function and hence less storage required. Another WCS module could be added to alleviate the problem of insufficient storage, but these modules are quite expensive and the accessing scheme between WCS modules is not conducive to efficient and fast microprogramming. Since the WCS is organized as a 1 x 256 (1 bit by 256 locations) memory, a much more efficient and less expensive way to increase memory size would be to implement the memory as 1 x 1024 or even 1 x 4096, taking advantage of the state of the art in semiconductor memories. As previously noted, the implementation of this proposed architecture would certainly enhance the capabilities of the computer in executing these routines or any other, since flexibility has been increased and not restricted.


CHAPTER V

KALMAN FILTERING

The derivation and results of Kalman filtering techniques are well known and have found wide application (27, 28, 37). This optimum filter is a feedback system which operates by taking a copy of the model of the system being estimated, forming the error between the output of this copy of the model and the measurement on the output of the system, and feeding forward this error amplified by the gain K(t) (see Figure 17). Thus the specification of the Kalman filter is equivalent to the computation of the optimal time-varying gains K(t). The equations specifying K(t) involve matrix and matrix-vector products. This recursive estimator is especially convenient for computation on small on-line digital computers since each new set of measurements on the system being estimated is used to improve the state estimates (39). Other important considerations are that the Kalman filter is uniformly asymptotically stable and the variances converge (28, 39). Thus this optimal filter is quite insensitive to poor initial estimates and computer round-off errors.

Assuming a linear time-invariant system, as is done throughout this chapter, two methods exist for implementing a computer solution of the Kalman filter. The matrix products specifying the gains K(t) can be computed on-line or these gains may be computed off-line independent of the actual measurements. Microcoded implementations of these two methods are discussed in the following sections with special attention given to timing considerations, sampling rates possible and accuracy necessary. An assembly language version and a microcoded version of a simple second-order system are implemented and compared. This procedure indicates limitations of the HP-2100 and suggests an architecture for efficiently computing the Kalman filter.

On-Line Solutions

The notation of Kalman-Bucy (28) will be used to outline the procedure for a microcoded, on-line implementation. The message is a random process x(t) generated by the model

$$\dot{x}(t) = F x(t) + G u(t), \tag{42}$$

where x(t) is an n-vector, the state of the system; u(t) is an m-vector ($m \le n$) representing the inputs to the system; F and G are n x n and n x m matrices, respectively. The observed signal is

$$z(t) = y(t) + v(t) = H x(t) + v(t) \tag{43}$$

where z(t), y(t), and v(t) are p-vectors denoting the measurements, outputs, and the noise corrupting the measurements; H is a p x n matrix; $p \le m$. The functions u(t) and v(t) in (42, 43) are independent random processes (white noise) with identically zero means and covariance matrices represented by

$$\begin{aligned}
\operatorname{Cov}[u(t), u(\tau)] &= Q\,\delta(t - \tau) \\
\operatorname{Cov}[v(t), v(\tau)] &= R\,\delta(t - \tau) \qquad \text{for all } t, \tau \\
\operatorname{Cov}[u(t), v(\tau)] &= 0
\end{aligned} \tag{44}$$

(As has been previously noted, for simplicity it has been assumed that F, G, H, Q, and R are constant, time-invariant matrices.) The optimal estimate $\hat{x}(t/t)$ is generated by a linear dynamical system of the form

$$\begin{aligned}
d\hat{x}(t/t)/dt &= F\hat{x}(t/t) + K(t)\,\tilde{z}(t/t) \\
\tilde{z}(t/t) &= z(t) - H\hat{x}(t/t)
\end{aligned} \tag{45}$$

The optimal gains K(t) are given by Equation 46.

$$K(t) = P(t) H' R^{-1} \tag{46}$$

The matrix P(t) in Equation 46 must be a solution of a Riccati-type differential equation given in Equation 47.

$$dP/dt = FP + PF' - P H' R^{-1} H P + G Q G' \tag{47}$$

It is these last two equations (46 and 47) that must be computed in real time to effect an on-line implementation. The techniques of Chapter III for matrix-vector product microroutines will be extended to obtain solutions to Equations 46 and 47 since they are just matrix products. It is assumed that the inverses ($R^{-1}$) and the transposes (F', H', G') will be calculated beforehand and will not be computed on-line. Computational times for both a second- and fourth-order system will be derived.

A flowchart for the computation of AB, where both A and B are 2 x 2 matrices, is shown in Figure 18. This flowchart is just a modification of the one presented in Chapter III for the computation of an n x 2 matrix times a 2 x 1 vector. The program executed from this flowchart first reads $b_{11}$ and $b_{21}$ into registers F and Q. Then each $a_{ij}$ necessary to compute $a_{11}b_{11} + a_{12}b_{21}$ and $a_{21}b_{11} + a_{22}b_{21}$ is read as required; each sum is stored when calculated. When this is finished, $b_{12}$ and $b_{22}$ are read into F and Q and the process repeated. The time to execute this routine was calculated as 95.06 µs. To compute the Kalman filter gains from Equations 46 and 47, ten such products are necessary, requiring 950.6 µs. In addition, computing Equation 47 requires several matrix adds, multiplication by ΔT (sampling time) and several other numerical operations. Total time to calculate Equations 46 and 47 was approximated at 1.2 ms. The sampling theorem states that a signal with highest frequency component greater than $\omega = \frac{1}{2}\left(\frac{2\pi}{T}\right)$, or about 2.6k rad/sec, could not be handled by this computational procedure (that is, using this program). Practically, a signal with highest frequency component greater than $\omega = \frac{1}{5}$ to $\frac{1}{10}\left(\frac{2\pi}{T}\right)$, or about 1k rad/sec to 500 rad/sec, could not be handled.

[Fig. 18. 2x2 matrix multiplication. Flowchart.]
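The frequency limits quoted in this chapter follow from the computation time per sample. A minimal Python sketch of the arithmetic is given below; the function name is hypothetical, and it assumes that T in the expressions above is the total computation time per sample (1.2 ms for the second-order on-line case).

```python
import math


def frequency_limits(sample_time_s):
    """Return (nyquist, practical_high, practical_low) in rad/sec.

    nyquist        = (1/2)(2*pi/T), the sampling-theorem bound
    practical_high = (1/5)(2*pi/T)
    practical_low  = (1/10)(2*pi/T)
    """
    w_s = 2 * math.pi / sample_time_s
    return w_s / 2, w_s / 5, w_s / 10


nyquist, high, low = frequency_limits(1.2e-3)
print(round(nyquist), round(high), round(low))
```

With T = 1.2 ms this gives about 2618, 1047 and 524 rad/sec, in line with the rounded values of 2.6k, 1k and 500 rad/sec in the text.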

The flowchart for a 4 x 4 matrix times a 4 x 4 matrix is shown in Figure 19 (again just an extension of the Chapter III 4 x 4 matrix times 4 x 1 vector flowchart). The algorithm for computation of execution time is given as

$$t_e = 2T + 7nT + 7n^2T + 59n^3T + n^2T \sum_{d=1}^{n-2} (n - d) + 2n^2T \sum_{d=2}^{n-1} (n - d) \tag{48}$$

for $n \ge 3$, where n is the size of the system and T = 196 ns. The approximate time of execution for n = 4 is 803 µs. Thus to solve for the Kalman filter gains requires ten such products (8.03 ms) plus time to add the various matrix products plus other numerical operations (approximately .77 ms) for a total execution time of about 8.8 ms. Using the criteria of the previous paragraph, a signal with highest frequency component greater than about 71 to 142 rad/sec could not be handled by an on-line implementation of a Kalman filter for a fourth-order system.

[Fig. 19. 4x4 matrix product. Flowchart.]

Now if accuracy can be sacrificed for an increase in speed, the half-word length (8-bit) implementations of Chapter III may also be extended to the computation of the Kalman gains. Not much additional gain in bandwidth (or highest frequency components) can be obtained in a second-order system (about 550 to 1100 rad/sec for worst-case conditions and 600 to 1200 rad/sec for best-case conditions). In the 4 x 4 implementation, a signal with highest frequency components of about 80 to 160 rad/sec for worst-case conditions, or about 85 to 170 rad/sec for best-case conditions, is amenable to on-line Kalman filtering.

Off-Line Implementation

In an off-line implementation, the Kalman filter gains, K(t), are calculated off-line and stored in memory. The purpose of the microprogram is to input the measurement, z(t), and $\hat{y}(t/t)$ (see Figure 17), to compute the difference to form $\tilde{z}(t/t) = z(t) - \hat{y}(t/t)$, and to compute and output the product $K(t)\tilde{z}(t/t)$. It will be assumed that the vectors z(t), $\hat{y}(t/t)$ and $\tilde{z}(t/t)$ are of the same order as the system and the matrix K(t) is square with this same order also. Computational times for both a second- and fourth-order system will be calculated.

A flowchart for the second-order system is shown in Figure 20, where the first three boxes input the vector elements $z_1$, $z_2$, $y_1$, $y_2$ and form the differences $z_1 - y_1$, $z_2 - y_2$. The remainder of the flowchart is the same as that for the matrix-vector product routines developed in Chapter III, except that instead of storing the results in memory, they are output as calculated. It was assumed that input and output could be accomplished with a simple LIA, LIB, OTA or OTB instruction. The execution time for this routine was calculated to be approximately 51 µs. Using this time as the minimum sampling interval, a signal with highest frequency components of 12.5k to 25k rad/sec could be handled with this computational method.

[Fig. 20. Second-order off-line Kalman filter. Flowchart.]

A flowchart for a fourth-order system computation would be much like that shown in Figures 19 and 20 and will not be given here. The algorithm for calculation of execution time is

$$t_e = 6T + 18nT + 59n^2T + nT \sum_{d=1}^{n-2} (n - d) + 2nT \sum_{d=2}^{n-1} (n - d) \tag{49}$$

for $n \ge 3$, T = 196 ns. The time to compute the outputs for a fourth-order system (n = 4) is approximately 209 µs. A signal with highest frequency components of 3k to 6k rad/sec is amenable to this computational method.
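As a check on Equation 49, the execution time can be evaluated directly. The sketch below is illustrative; the function name is hypothetical, and the summation limits follow the reading of the equation used here (d = 1 to n-2 and d = 2 to n-1).

```python
def offline_time_us(n, T_us=0.196):
    """Equation 49 (stated for n >= 3): execution time in microseconds."""
    first = sum(n - d for d in range(1, n - 1))   # d = 1 .. n-2
    second = sum(n - d for d in range(2, n))      # d = 2 .. n-1
    units = 6 + 18 * n + 59 * n ** 2 + n * first + 2 * n * second
    return units * T_us


print(round(offline_time_us(4), 1))
```

For n = 4 the formula evaluates to 1066T, or about 208.9 µs, consistent with the figure of approximately 209 µs quoted above.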

As was pointed out in the discussion of the on-line implementation, no appreciable increase in bandwidth was achieved with the half-word implementations. There is no reason to expect an appreciable increase in bandwidth for the off-line implementation either, unless it is assumed that two half-words (16 bits) can be input simultaneously. This implies that both $z_1$ and $z_2$ are input as one word and similarly $y_1$ and $y_2$ are input together as one word (see Figure 17). The range of frequencies amenable to an off-line, fourth-order system was calculated to be approximately 3.55k to 7.1k rad/sec (as compared to 3k to 6k rad/sec for a full-word fourth-order system). As can be seen, the increase in frequencies for the half-word implementation is not appreciable.


Another possible approach to an off-line implementation takes advantage of the capability to store values of K(t) in the same or another WCS module in the lower eight bits of microinstructions, as shown in the following example,

EXAMPLE:        CR   IOR   A   xxx
            A   CL   IOR   A   yyy

where these two microinstructions combine xxx (8 bits) and yyy (8 bits), with yyy as the most significant and xxx as the least significant 8 bits, respectively. A flowchart of a straight-line microprogram (no looping) for a second-order system is shown in Figure 21. This program was written assuming that the precalculated values of K(t) are stored in the least significant eight bits of certain microinstructions. The time to execute such a program was calculated to be approximately 40 µs. Thus, a signal with highest frequency from 15.5k to 31k rad/sec can be handled in this fashion. In the implementation of a half-word microroutine, it might again be assumed that two components ($z_1$ and $y_1$, for example) are input simultaneously to save time. Very little savings is realized by such an assumption since most of the time saved in inputting will be spent in separating $z_1$ and $y_1$, checking for sign, scaling, etc. Hence, this assumption will not be made and the same program used for the full-word will be used for a half-word implementation. The execution time for this half-word microprogram is reduced to 28 µs, increasing the highest frequency components to 22.5k and 45k rad/sec, respectively. The algorithms for computing these execution times are given in Equations 50 and 51, respectively.

$$t_e = 3T + 24nT + 38n^2T \tag{50}$$

$$t_e = 3T + 24nT + 23n^2T \tag{51}$$

The execution times for a fourth-order system can be computed by letting n = 4 in these equations, yielding $t_e \approx$ 119 µs and 71 µs, respectively. Again calculating the highest frequency components that can be handled by these programs gives 5.75k to 11.5k rad/sec and 8.9k to 17.8k rad/sec, respectively.

[Fig. 21. Straight line, off-line Kalman filter. Flowchart.]
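Equations 50 and 51 can be evaluated for the second-order case with a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, the exponent on n is read as 2, and T is taken to be the 196 ns microinstruction time used in Equation 49.

```python
def straight_line_time_us(n, half_word=False, T_us=0.196):
    """Equations 50 (full word) and 51 (half word), in microseconds.

    T_us = 0.196 is an assumed microinstruction time (196 ns).
    """
    square_coeff = 23 if half_word else 38
    return (3 + 24 * n + square_coeff * n ** 2) * T_us


print(round(straight_line_time_us(2), 1),
      round(straight_line_time_us(2, half_word=True), 1))
```

For n = 2 this yields about 39.8 µs (full word) and 28.0 µs (half word), matching the second-order execution times of approximately 40 µs and 28 µs given in the text.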

As can be seen, the off-line implementation is much faster than the on-line version. This was expected. However, there is a major drawback to this off-line implementation. Each sample requires a new K(t), and with only finite memory space available, this effectively limits the use of this program to a finite time. The on-line implementation may take more time to execute, but it has the advantage that it can be run continuously and is not limited to a finite running time as is the off-line implementation.

Example Second-Order System

The example programmed in this section is due to Sage

(50). The system is described by

$$\frac{dx_1(t)}{dt} = x_2(t), \qquad \frac{dx_2(t)}{dt} = \omega(t) \tag{52}$$

where $\omega(t)$ is Gaussian white noise with zero mean and variance of one (1). The measurement is corrupted by noise as given in Equation 53,

z(t) = x(t) + v(t) (53)

where v(t) is Gaussian white noise with zero mean and

variance of sixteen (16). The estimation equations are:

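$$\dot{p}_{11} = 2p_{12} - \frac{p_{11}^2}{16}, \qquad p_{11}(0) = [\text{illegible}]$$

$$\dot{p}_{12} = p_{22} - \frac{p_{11}p_{12}}{16}, \qquad p_{12}(0) = 0$$

$$\dot{p}_{22} = -\frac{p_{12}^2}{16} + 1, \qquad p_{22}(0) = 0$$

$$\dot{\hat{x}}_1 = \hat{x}_2 + \frac{1}{16}\,p_{11}(t)\,[z(t) - \hat{x}_1(t)], \qquad \hat{x}_1(0) = 0$$

$$\dot{\hat{x}}_2 = \frac{1}{16}\,p_{12}(t)\,[z(t) - \hat{x}_1(t)], \qquad \hat{x}_2(0) = 0$$

$$K(t) = \begin{bmatrix} k_1(t) \\ k_2(t) \end{bmatrix} = \frac{1}{16}\begin{bmatrix} p_{11}(t) \\ p_{12}(t) \end{bmatrix} \tag{54}$$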

The method used to program these equations was the standard difference equation technique. That is, to solve the differential equation $\dot{x}_1 = x_2$ in discrete time on the digital computer, form the equation

$$\frac{x_1(k+1) - x_1(k)}{\Delta T} = x_2(k) \tag{55}$$

and solve for $x_1(k+1)$, yielding

$$x_1(k+1) = x_1(k) + \Delta T\,x_2(k) \tag{56}$$

This technique was applied to all the differential equations, obtaining the following set of equations for computer solution:

$$x_1(k+1) = x_1(k) + \Delta T\,x_2(k)$$

$$x_2(k+1) = x_2(k) + \Delta T\,\omega(k)$$

$$p_{11}(k+1) = p_{11}(k) + 2\Delta T\,p_{12}(k) - \Delta T\,p_{11}^2(k)/16$$

$$p_{12}(k+1) = p_{12}(k) + \Delta T\,p_{22}(k) - \Delta T\,p_{11}(k)\,p_{12}(k)/16$$

$$p_{22}(k+1) = p_{22}(k) - \Delta T\,p_{12}^2(k)/16 + \Delta T$$

$$\hat{x}_1(k+1) = \hat{x}_1(k) + \Delta T\,\hat{x}_2(k) + (\Delta T/16)\,p_{11}(k)\,[z(k) - \hat{x}_1(k)]$$

$$\hat{x}_2(k+1) = \hat{x}_2(k) + (\Delta T/16)\,p_{12}(k)\,[z(k) - \hat{x}_1(k)]$$

$$z(k) = x_1(k) + v(k)$$

$$k_1(k) = p_{11}(k)/16$$

$$k_2(k) = p_{12}(k)/16 \tag{57}$$

The noise disturbances $\omega(t)$ and v(t) were generated off-line and then stored as constant values in the computer. At each sample instant the appropriate values of $\omega(k)$ and v(k) were acquired from the respective tables. All vector elements were treated as integer quantities scaled three octal places, resulting in three octal place accuracy. No checks for overflow or underflow were made. The sample time used to program this simple 1/s system was chosen as 1/8 second. Since $\omega$ = 1 rad/sec for the system, this sampling interval falls within the range 1/5 to 1/10 and is a convenient number to represent in octal. A flowchart of the program is given in Figure 22. (The program is given in Appendix C.) The Noise Fix subroutine is necessary since the noise table values are stored in floating-point format and must be converted to appropriately scaled fixed-point values. The time for one pass through the assembly language routine was approximately 575 µs.
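The difference equations of Equation 57 can be exercised in floating point with a few lines of Python. This is only an illustrative sketch, not the assembly language program of Appendix C: the function name, the zero initial value for p11, and the use of a pseudo-random generator in place of the stored noise tables (unit variance per sample for the process noise, variance sixteen for the measurement noise) are all assumptions.

```python
import random

def simulate(steps=400, dt=1 / 8, p11_0=0.0, seed=1):
    """Iterate Equation 57; returns the list of gain pairs (k1, k2)."""
    rng = random.Random(seed)
    x1 = x2 = 0.0                    # true states
    xh1 = xh2 = 0.0                  # state estimates
    p11, p12, p22 = p11_0, 0.0, 0.0  # p11(0) is an assumed value
    gains = []
    for _ in range(steps):
        w = rng.gauss(0.0, 1.0)      # process noise, variance 1 (assumed per sample)
        v = rng.gauss(0.0, 4.0)      # measurement noise, variance 16
        z = x1 + v                   # measurement
        k1, k2 = p11 / 16, p12 / 16  # Kalman gains
        gains.append((k1, k2))
        resid = z - xh1
        # Every right-hand side uses the values from step k (tuple assignment).
        x1, x2 = x1 + dt * x2, x2 + dt * w
        p11, p12, p22 = (p11 + 2 * dt * p12 - dt * p11 * p11 / 16,
                         p12 + dt * p22 - dt * p11 * p12 / 16,
                         p22 - dt * p12 * p12 / 16 + dt)
        xh1, xh2 = xh1 + dt * xh2 + dt * k1 * resid, xh2 + dt * k2 * resid
    return gains

k1, k2 = simulate()[-1]
print(f"final gains: k1 = {k1:.4f}, k2 = {k2:.4f}")
```

The covariance recursion does not depend on the noise samples, so the gains settle to the steady-state values implied by setting the derivatives in Equation 54 to zero (p12 = 4, p11 = sqrt(128)), that is, k1 near 0.707 and k2 = 0.25.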

Since the entire assembly language program could not be incorporated into microcode, the question is what operations can be microprogrammed to save time. It can be easily observed that multiplication by $\Delta T$ and scaling are performed repeatedly. This operation constitutes multiplication by a positive constant, known a priori, so that a memory reference to obtain this constant and checking for negativity are unnecessary. This concept was used to design

[Figure 22: flowchart with blocks START; INITIALIZE; GET $\omega(k)$; FIX NOISE; CALCULATE $x_1(k+1)$; CALCULATE $x_2(k+1)$; CALCULATE $p_{11}(k+1)$; CALCULATE $p_{12}(k+1)$; CALCULATE $p_{22}(k+1)$; GET v(k); FIX NOISE; CALCULATE z(k); CALCULATE $\hat{x}_1(k+1)$, $\hat{x}_2(k+1)$; CALCULATE K(t); SHIFT AND STORE PAST VALUES; INCREMENT k; test k = FINALVAL, with the NO branch looping back.]

Fig. 22. Kalman filter example

a machine instruction (microprogram) that multiplies by a positive constant and scales. This microprogram executes in 8.428 µs as compared to 15.78 µs to execute the same operations in assembly language. The next operation which is repeatedly executed is division by a positive constant, known a priori. This was incorporated into a microprogram which executed in 8.86 µs as compared to 16.66 µs for a normal machine language divide. The equations to compute $\hat{x}_1(k+1)$ and $\hat{x}_2(k+1)$ were also microprogrammed except for a couple of LDA's and an MPY instruction. The execution time for this microprogram was 83.576 µs as compared to 117.60 µs for the same operations in machine language. One pass through the machine language routine required approximately 575 µs, whereas one pass through the microcoded version required only about 450 µs. These times are summarized for easy reference in Table 4. As can be seen, with just these few operations microprogrammed, a considerable time savings (125 µs) resulted.
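The "multiply by a positive constant and scale" operation can be illustrated on integers scaled three octal places (a factor of 8^3 = 512). The sketch below is a Python illustration of the arithmetic only, not the microprogram itself; the helper names and the choice of rounding when converting to fixed point are assumptions.

```python
SCALE_BITS = 9               # three octal places = 3 digits x 3 bits each
SCALE = 1 << SCALE_BITS      # 8**3 = 512

def to_fixed(value):
    """Convert a real value to an integer scaled three octal places (assumed rounding)."""
    return int(round(value * SCALE))

def to_float(fixed):
    """Convert a scaled integer back to a real value."""
    return fixed / SCALE

DT_FIXED = to_fixed(1 / 8)   # the sample time, 0.1 in octal, stored as 64

def mul_by_const_and_scale(fixed, const_fixed=DT_FIXED):
    """Multiply by a positive scaled constant, then shift to restore the scale."""
    return (fixed * const_fixed) >> SCALE_BITS

x2 = to_fixed(2.5)                   # 1280
step = mul_by_const_and_scale(x2)    # (1280 * 64) >> 9 = 160
print(step, to_float(step))          # 160 0.3125, i.e. 2.5 / 8
```

Because the constant is known in advance and positive, no sign test or memory fetch for the multiplier is needed, which is the saving the text attributes to the microcoded version.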

Proposed Architecture

An architecture for efficiently computing the Kalman filter is more difficult to design than those architectures for efficiently computing matrix-vector products and Gaussian elimination algorithms. One reason for this is that the best architecture for an on-line implementation may not be the best for an off-line implementation. Thus

TABLE 4

COMPARISON OF EXECUTION TIMES

                                              Time (µs)
Operation                                     Assembly Language    Microcoded
Multiply by Constant                          15.78                8.428
Divide by Constant                            16.66                8.86
Compute $\hat{x}_1(k+1)$ and $\hat{x}_2(k+1)$     117.60               83.576
One Complete Pass                             575                  450

the proposed architecture depends on whether an off-line or on-line, half-word or full-word implementation is desired. Another reason for this difficulty is that no severe limitations which seemingly prohibited an efficient implementation arose to suggest an alternate machine architecture. However, since the on-line implementation uses essentially the same matrix-vector product routines of Chapter III, the shortage of scratch-pad registers will limit its application.

The architectures suggested in Chapter III and Chapter IV significantly increase the capabilities of the computer. These added capabilities can also be used to significantly increase the efficiency in computing the Kalman filter. These architectures also add a new dimension of flexibility. Hence they are adaptable to either an on-line or off-line implementation of the Kalman filter. As discussed above, limitations in implementing the Kalman filter suggest no new architectural features other than those described in Chapters III and IV. Thus these architectures are sufficiently general to also increase the efficiency in computing the Kalman filter.

CHAPTER VI

EFFECTS OF FINITE WORD LENGTHS

The major thrust of this chapter is to determine the minimum register length necessary for accurate computations. This has particular application to the development of an effective architecture for microprogram controlled machines, since the shorter the register length that can be used for accurate computations, the less costly it would be to implement specialized problems in microcode (assuming programs of equal length). The procedure used to determine the minimum register length is to analyze how finite register lengths affect the accuracy of the fast Fourier transform algorithm. The works by Bergland (4) and Steiglitz (53) are concise and lucid discussions of both the discrete Fourier transform (DFT) and fast Fourier transform (FFT) algorithms. The reports of Weinstein (63), Oppenheim and Weinstein (41), Welch (64) and Liu (32) are definitive works on the effects of finite register lengths on the accuracy of digital filters and the FFT.

FFT Algorithm

The fast Fourier transform algorithm is directed toward computing the discrete Fourier transform of a finite duration signal f(n). The DFT is defined for f(n) as

$$F(k) = \sum_{n=0}^{N-1} f(n)\,W^{nk}, \qquad W = e^{-j2\pi/N}. \tag{58}$$

The FFT is simply an efficient means for computing the DFT and can be used in place of the continuous Fourier transform only to the extent that the DFT can, but with a substantial reduction in computational time. A flowchart depicting the FFT algorithm for N = 8 = $2^3$ is shown in Figure 23, where the commonly used radix-2, decimation-in-time form is depicted. Some key features of this diagram, common to all standard radix-2 forms, are as follows. The DFT is calculated in $\nu = \log_2 N$ stages. At each stage, the algorithm goes through the complete array of N complex numbers, two at a time, and generates a new N-number array. The $\nu$th array is the desired DFT. The basic calculation producing a pair of numbers in the (m + 1)th array is referred to as a "butterfly" and is given by

$$X_{m+1}(i) = X_m(i) + W X_m(j)$$

$$X_{m+1}(j) = X_m(i) - W X_m(j). \tag{59}$$

In Equation 59, $X_m(i)$ and $X_m(j)$ represent a pair of numbers in the mth array, and W is some appropriate integer power of $W_N = e^{-j2\pi/N}$, that is

$$W = W_N^p = e^{-j2\pi p/N}. \tag{60}$$

[Figure 23: signal flow graph with inputs in bit-reversed order x(0), x(4), x(2), x(6), x(1), x(5), x(3), x(7) and outputs in natural order X(0) through X(7).]

Fig. 23. FFT flow graph, N = 8

There are other forms of the FFT, such as the decimation-in-frequency form, but for the noise analysis and experimental results, this form will be used.
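The relationship between Equation 58 and the butterfly of Equation 59 can be checked numerically. The following is a minimal sketch, assuming a power-of-two length and bit-reversed input ordering as in Figure 23; it is not the program of Appendix D, and the function names are hypothetical.

```python
import cmath

def dft(f):
    """Direct evaluation of Equation 58 with W = exp(-j*2*pi/N)."""
    n_pts = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * ((n * k) % n_pts) / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_radix2(f):
    """Radix-2 decimation-in-time FFT built from the butterfly of Equation 59."""
    n_pts = len(f)
    stages = n_pts.bit_length() - 1          # nu = log2(N)
    if n_pts < 1 or (1 << stages) != n_pts:
        raise ValueError("length must be a power of two")
    x = [complex(f[bit_reverse(i, stages)]) for i in range(n_pts)]
    span = 1
    for _ in range(stages):
        for start in range(0, n_pts, 2 * span):
            for p in range(span):
                w = cmath.exp(-2j * cmath.pi * p / (2 * span))
                i, j = start + p, start + p + span
                t = w * x[j]
                x[i], x[j] = x[i] + t, x[i] - t   # butterfly
        span *= 2
    return x

signal = [complex(n % 3, -n) for n in range(8)]
worst = max(abs(a - b) for a, b in zip(dft(signal), fft_radix2(signal)))
print(f"largest difference between DFT and FFT: {worst:.2e}")
```

The two routines agree to within floating-point rounding, while the FFT needs only $\nu$ passes over the array instead of N multiplications per output point.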

Truncation Noise Effects

A brief review of the work done by Oppenheim and Weinstein (41) on the effects of finite register lengths on the accuracy of the FFT will now be given. Since most of the work analyzed only the consequences of round-off noise, the essential results that change when truncation is considered will be given. In addition, since the experimental method used by this author to determine the effects of finite register lengths is probably unique, the import of this method will also be analyzed.

Whether the arithmetic is performed in fixed point or floating point, the error is represented by independent white noise generators of uniform distribution. In the case of fixed-point arithmetic, errors are only introduced following multiplications (i.e., assuming no overflow in addition), so a noise generator is associated with each multiplication. For the FFT algorithm this means that a noise source feeds into each node of the signal flow graph of Figure 23. These are complex noise sources where the complex variance, $\sigma_B^2$, is defined as the expected squared magnitude of such a noise source and is given as $\sigma_B^2 = 4(2^{-2t}/12)$. By examining Figure 23 for the case N = 8, it is seen that 2N - 1 noise sources propagate to each output node so that the output noise variance $\sigma_E^2$ is given by

$$\sigma_E^2 = (2N - 1)\,\sigma_B^2 \tag{61}$$

or for large N

$$\sigma_E^2 = 2N\sigma_B^2. \tag{62}$$

Here it has been assumed that the input node derives from an A/D converter which also introduces a quantization effect. The errors given in Equations 61 and 62 are for round-off only. If errors due to truncation are considered, $\sigma_B^2$ is given by $\sigma_B^2 = \frac{1}{3}(2^{-2t})$, the mean is no longer zero (as for round-off) but is given by $\mu_B = 2(2^{-t})$, and both of these propagate through the algorithm so that

$$\mu_E = (2N)\,\mu_B$$

and

$$\sigma_E^2 = (2N)\,\sigma_B^2. \tag{63}$$

The essential result, that the output noise variance and mean are proportional to N, is the same for both round-off and truncation. It can be shown that if the input sequence x(n) is bounded by

$$|x(n)| < 1/N \tag{64}$$

overflow can be prevented. Further, assuming x(n) is white with real and imaginary parts each uniformly distributed in $(-1/\sqrt{2}N,\; 1/\sqrt{2}N)$, which gives $\sigma_X^2 = N\sigma_x^2 = 1/3N$, leads to the ratio

$$\frac{\sigma_E^2}{\sigma_X^2} = 6N^2\sigma_B^2. \tag{65}$$

In Equation 65, $\sigma_X^2$ is the variance of the output sequence, which is also white when x(n) is white. If x(n) is not white, the constant of proportionality in Equation 65 changes, but the essential feature that $\sigma_E^2/\sigma_X^2$ is proportional to $N^2$ remains.
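The difference between round-off and truncation noise can be seen with a small numerical experiment. The sketch below assumes a single real quantizer with step $2^{-t}$ (t fractional bits) applied to values uniformly distributed in (-1, 1), with truncation implemented as a floor operation; these modeling choices and the function names are assumptions made for illustration.

```python
import math
import random

def quantize_round(x, t):
    """Round x to the nearest multiple of 2**-t."""
    step = 2.0 ** -t
    return math.floor(x / step + 0.5) * step

def quantize_truncate(x, t):
    """Truncate x downward to a multiple of 2**-t (floor; an assumed convention)."""
    step = 2.0 ** -t
    return math.floor(x / step) * step

def error_stats(quantizer, t, n=100_000, seed=0):
    """Return (mean, variance) of the quantization error over n random inputs."""
    rng = random.Random(seed)
    errors = [quantizer(x, t) - x
              for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / n
    return mean, var

t = 8
step = 2.0 ** -t
for name, quantizer in [("round-off", quantize_round), ("truncation", quantize_truncate)]:
    mean, var = error_stats(quantizer, t)
    print(f"{name}: mean/step = {mean / step:+.3f}, var/(step^2/12) = {var / (step ** 2 / 12):.3f}")
```

Both quantizers show a variance close to $2^{-2t}/12$, but only truncation has a nonzero mean, of magnitude about $2^{-t}/2$ per real quantizer. Four such quantizers per complex multiplication give the factor of four in $\sigma_B^2$ and a mean of magnitude $2(2^{-t})$.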

In floating-point arithmetic, as in fixed-point arithmetic, noise is introduced due to each butterfly computation. Second-order error terms are neglected so that noise sources are introduced after each multiplication and addition. These noise sources are assumed to be white with variances proportional to the variance of the signal at that node. The input signal is assumed to be white to simplify the analysis.

In Figure 24, a typical butterfly computation (only top half) is illustrated showing the noise sources due to multiplication and addition. Also shown are the noise sources ($e_t$'s) that must be included to coincide with the experimental method used to implement truncation. Writing out Equations 59 in terms of their real and imaginary parts (illustrated in Figure 24) yields

$$\mathrm{Re}[X_{m+1}(i)] = \mathrm{Re}[X_m(i)] + \mathrm{Re}[X_m(j)]\,\mathrm{Re}[W] - \mathrm{Im}[X_m(j)]\,\mathrm{Im}[W]$$

$$\mathrm{Im}[X_{m+1}(i)] = \mathrm{Im}[X_m(i)] + \mathrm{Re}[X_m(j)]\,\mathrm{Im}[W] + \mathrm{Im}[X_m(j)]\,\mathrm{Re}[W] \tag{66}$$

In the experimental computation of these values, W is first calculated, scaled to include the appropriate number of bits, and then truncated (integerized). The product $X_m(j)W$ is

[Figure 24: signal flow diagram of the top half of a butterfly, with inputs Re $X_m(i)$, Im $X_m(i)$, Re $X_m(j)$, Im $X_m(j)$, outputs Re $X_{m+1}(i)$ and Im $X_{m+1}(i)$, and the noise sources marked.]

Fig. 24. Noisy butterfly computation (floating-point)

also scaled and truncated, and finally the sum $X_m(i) + WX_m(j)$ is scaled and truncated. In Figure 24, four of the $e_t$ noise sources (including $e_{t_1}$ and $e_{t_2}$) are due to the truncation of W; four more are due to the truncation of $WX_m(j)$; and the remaining two are due to the truncation of $X_{m+1}$.

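The assumption of a white noise input signal implies that

$$\overline{[\mathrm{Re}(X_m)]^2} = \overline{[\mathrm{Im}(X_m)]^2} = \tfrac{1}{2}\,\overline{|X_m|^2} \tag{67}$$

The effect of rounding in floating point is represented such that if [x] denotes rounding of the mantissa in a floating-point number, then

$$[x] = x(1 + \varepsilon) \tag{68}$$

It can be shown that $|\varepsilon| \le 2^{-t}$. Using this information yields the noise source variances

$$\sigma_{\varepsilon_1}^2 = \sigma_{\varepsilon_2}^2 = \sigma_{\varepsilon_5}^2 = \sigma_{\varepsilon_6}^2 = \sigma_{\varepsilon_3}^2 = \sigma_{\varepsilon_7}^2 = \tfrac{1}{2}\,\sigma_\varepsilon^2\,\overline{|X_m|^2} \tag{69}$$

$$\sigma_{\varepsilon_4}^2 = \sigma_{\varepsilon_8}^2 = \sigma_\varepsilon^2\,\overline{|X_m|^2} \tag{70}$$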

The variance of the complex noise source $U_m = u_m + jv_m$ is

$$\overline{|U_m|^2} = 4\sigma_\varepsilon^2\,\overline{|X_m|^2} \tag{71}$$

so that the variance of the noise generated in computing the (m + 1)th array is $4\sigma_\varepsilon^2$ times the variance of the signal in the mth array. Letting the input (zeroth) array variance be $\sigma_x^2$ and $\sigma_{om}^2$ the output noise variance generated in the (m + 1)th array, then

$$\sigma_{om}^2 = 2N\sigma_\varepsilon^2\sigma_x^2. \tag{72}$$

Since the noise generated in each array is assumed to be independent, the total output noise variance $\sigma_E^2$ is

$$\sigma_E^2 = 2\nu N\sigma_\varepsilon^2\sigma_x^2. \tag{73}$$

By noting that the output signal variance is related to the input signal variance by

$$\sigma_X^2 = N\sigma_x^2 \tag{74}$$

the following result is obtained

$$\sigma_E^2/\sigma_X^2 = 2\sigma_\varepsilon^2\nu. \tag{75}$$

To see the implications of Equation 75 in terms of register length requirements, this equation can be expressed in units of bits by

$$\left(\sigma_E^2/\sigma_X^2\sigma_\varepsilon^2\right)_{\text{bits}} = \tfrac{1}{2}\log_2(2\nu). \tag{76}$$

Equation 76 represents the number of bits by which the rms noise-to-signal ratio increases in passing through a floating-point FFT. The validity of Equations 75 and 76 has been experimentally confirmed by Oppenheim and Weinstein and others. Oppenheim and Weinstein have also shown that when the input signal is not white, but is a sine wave, the results of Equation 75 are only in error by approximately fifteen percent.
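Equation 76 is easy to tabulate for common transform sizes. The short sketch below evaluates it for a few power-of-two values of N; the function name is hypothetical.

```python
import math

def noise_growth_bits(nu):
    """Equation 76: growth of the rms noise-to-signal ratio, in bits, over nu stages."""
    return 0.5 * math.log2(2 * nu)

for n_points in (32, 64, 256, 1024):
    nu = int(math.log2(n_points))     # nu = log2(N) for power-of-two N
    print(f"N = {n_points:5d}, nu = {nu:2d}: {noise_growth_bits(nu):.2f} bits")
```

The growth is slow: for N = 256 (eight stages) the formula gives exactly two bits, which suggests that floating-point register length requirements increase only mildly with transform size.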

Now the additional effects of the experimentally introduced quantizers need to be considered also. The variances of the quantization noise sources ($e_t$'s) are assumed equal and in general independent and different from $\sigma_\varepsilon^2$ or the other noise sources introduced by addition and multiplication. Assume each of these noise generators is white, uniformly distributed in $(0,\; 2 \cdot 2^{-t})$, mutually independent, independent of the signal, and with equal means ($\mu_{e_t} = 2^{-t}$) and equal variances ($\sigma_{e_t}^2 = 2^{-2t}/3$). Then it can be seen that the noise variance at the (m + 1)th node due to these noise generators is $10\sigma_{e_t}^2$ more than the variance at the mth node (the mean also increases by $10\mu_{e_t}$). This effect then may be viewed as introducing a signal-independent white noise sequence with variance $\sigma_t^2 = 10\sigma_{e_t}^2$ and mean $\mu_t = 10\mu_{e_t}$ at the (m + 1)th array. As in the case for the fixed-point arithmetic, N - 1 such noise sources will propagate to the output, giving

$$\sigma_o^2 = (N - 1)\,\sigma_t^2 = 10(N - 1)\,\sigma_{e_t}^2$$

$$\mu_o = (N - 1)\,\mu_t = 10(N - 1)\,\mu_{e_t} \tag{77}$$

where $\sigma_o^2$ and $\mu_o$ represent the variance and mean, respectively, at the output. This result is very similar to that obtained for fixed-point errors, as expected. However, the mean has also increased by a factor of N, and this error due to truncation is in addition to the errors incurred through normal floating-point manipulations. If the input signal is not white, the errors predicted by Equation 77 should not vary more than about fifteen percent, as stated above.

Experimental Results

The DFT for the two signals

$$S_1(n) = \sin(n\pi/8)$$

and

$$S_2(n) = \sin(n\pi/8) + 1.5\sin(n\pi/16) + 2\sin(n\pi/32) \tag{78}$$

was computed using the program given in Steiglitz (53, p. 153) and then again using this program modified to include the quantizers. (These two programs are given in Appendix D.) The DFT of these two signals is easily calculated as shown for $S_1(n)$. Note that

$$S_1(n) = \frac{1}{2j}\left[e^{jn\pi/8} - e^{-jn\pi/8}\right] = \frac{1}{2j}e^{jn\pi/8} - \frac{1}{2j}e^{-jn\pi/8} \tag{79}$$

The DFT of the phasor $e^{jn\pi/8}$ is equal to N at the frequency point corresponding to one-eighth the Nyquist frequency, which is (1/8)(N/2) = N/16, and zero elsewhere. Similarly, for the phasor $e^{-jn\pi/8}$, the DFT equals N at the frequency point corresponding to -(1/8)(N/2) = -N/16, which is equivalent to (15/16)N. In the computed example, N = 32, giving the DFT of $S_1(n)$ to be

$$S_1(k) = \begin{cases} -j16 & \text{when } k = 2 \\ j16 & \text{when } k = 30 \\ 0 & \text{elsewhere.} \end{cases} \tag{80}$$

Using the same procedure to compute the DFT for $S_2(n)$ with N = 64 yields

$$S_2(k) = \begin{cases} -j64 & \text{when } k = 1 \\ -j48 & \text{when } k = 2 \\ -j32 & \text{when } k = 4 \\ j32 & \text{when } k = 60 \\ j48 & \text{when } k = 62 \\ j64 & \text{when } k = 63 \\ 0 & \text{elsewhere} \end{cases} \tag{81}$$

These values of the DFT were confirmed when the FFT program was run on the IBM 370. (For the results of all computer runs, see Appendix D.) Since the results of the error analysis for floating-point arithmetic have essentially been confirmed by other researchers, no attempt at corroboration was made. Instead the effects of quantization will be shown.
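The closed-form results of Equations 80 and 81 can be reproduced with a direct DFT. This is a brief check written for illustration, not the Steiglitz program referred to in the text; the helper names and the tolerance used to decide which bins are nonzero are assumptions.

```python
import cmath
import math

def dft(f):
    """Direct DFT with W = exp(-j*2*pi/N), as in Equation 58."""
    n_pts = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * ((n * k) % n_pts) / n_pts)
                for n in range(n_pts))
            for k in range(n_pts)]

def nonzero_bins(f, tol=1e-6):
    """Map each frequency index whose DFT magnitude exceeds tol to its value."""
    return {k: value for k, value in enumerate(dft(f)) if abs(value) > tol}

s1 = [math.sin(n * math.pi / 8) for n in range(32)]
s2 = [math.sin(n * math.pi / 8) + 1.5 * math.sin(n * math.pi / 16)
      + 2 * math.sin(n * math.pi / 32) for n in range(64)]

for name, signal in (("S1", s1), ("S2", s2)):
    for k, value in sorted(nonzero_bins(signal).items()):
        print(f"{name}: k = {k:2d}, DFT = {value.real:+.3f} {value.imag:+.3f}j")
```

The only bins above the tolerance are k = 2 and 30 for the first signal and k = 1, 2, 4, 60, 62, 63 for the second, with purely imaginary values matching the two equations.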

As previously discussed, the consequences of finite register length on the accuracy of the FFT were studied experimentally by scaling certain parameters in the program and then truncating (integerizing). The parameters that were scaled and chopped were the input signals $S_1$ and $S_2$, W, $WX_m(j)$, $X_{m+1}(i)$ and $X_{m+1}(j)$. Three different scale factors, $10^3$, $10^2$, and 10, corresponding to register lengths of approximately 9.97, 6.64, and 3.32 bits, respectively, were used. The calculated values of the DFT and the values computed by the FFT program were differenced, squared, and averaged to form a mean squared error. The results of this analysis are summarized in Table 5. It is evident that the variance (mean-squared error) is proportional to both $2^{-2t}$ and N and that there is a bound on this error even for very short register lengths (3.32 bits). The scale factors are different for each case, but the important result is that the error is bounded and proportional to both $2^{-2t}$ and N, in close agreement with theoretical analysis. Hence, the largest scale factor could be used in a bounding process with the realization that the actual error would be close to twenty percent better than predicted by this bound.
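The scale-and-chop experiment can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the modified Steiglitz program of Appendix D: chopping is done with a floor on the real and imaginary parts separately, the transform length is assumed to be a power of two, and all function names are hypothetical.

```python
import cmath
import math

def chop(z, scale):
    """Scale, integerize (floor), and unscale the real and imaginary parts."""
    return complex(math.floor(z.real * scale) / scale,
                   math.floor(z.imag * scale) / scale)

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_chopped(signal, scale=None):
    """Radix-2 decimation-in-time FFT; chops after each step when scale is given."""
    q = (lambda z: chop(z, scale)) if scale else (lambda z: z)
    n_pts = len(signal)
    stages = n_pts.bit_length() - 1
    x = [q(complex(signal[bit_reverse(i, stages)])) for i in range(n_pts)]
    span = 1
    for _ in range(stages):
        for start in range(0, n_pts, 2 * span):
            for p in range(span):
                w = q(cmath.exp(-2j * cmath.pi * p / (2 * span)))  # chopped W
                i, j = start + p, start + p + span
                t = q(w * x[j])                                    # chopped W * X_m(j)
                x[i], x[j] = q(x[i] + t), q(x[i] - t)              # chopped X_{m+1}
        span *= 2
    return x

def mean_squared_error(signal, scale):
    """Mean squared difference between the chopped and unchopped transforms."""
    exact = fft_chopped(signal)
    approx = fft_chopped(signal, scale)
    return sum(abs(a - e) ** 2 for a, e in zip(approx, exact)) / len(signal)

s1 = [math.sin(n * math.pi / 8) for n in range(32)]
for scale in (10, 100, 1000):
    print(f"scale {scale:5d} ({math.log2(scale):.2f} bits): "
          f"mean squared error {mean_squared_error(s1, scale):.3e}")
```

In this sketch the error drops sharply each time the scale factor is multiplied by ten, which is the qualitative behavior the text reports; the exact values depend on the chopping convention and are not expected to match Table 5.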

TABLE 5

SUMMARY OF FFT QUANTIZATION ERROR

Signal    N      Scale    # of Bits    $\sigma^2$
S_1       32     10       3.32         1.47791
S_1       32     10^2     6.64         2.38319 x 10^[?]
S_1       256    10^2     6.64         1.862 x 10^-1
S_1       32     10^3     9.97         2.51444 x 10^[?]
S_2       64     10       3.32         1.45383 x 10^[?]
S_2       64     10^2     6.64         1.76383 x 10^[?]
S_2       64     10^3     9.97         1.82959 x 10^[?]

([?] marks an exponent that is illegible in the source.)

CHAPTER VII

CONCLUSIONS

The problem of efficiently implementing the techniques of modern control theory and signal processing in a microprogram controlled minicomputer has been solved by the architecture proposed in this dissertation. It has been shown that there are several features of modern minicomputers that prohibit an efficient computation of matrix-vector products, Gaussian elimination, and other similar techniques of modern control theory and signal processing. The manner in which the proposed architecture corrects these shortcomings is discussed in the following paragraphs.

The most severe limitation in modern minicomputers, for example the HP-2100, for effectively computing solutions to matrix-vector products, Gaussian elimination, and other similar problems is an architecture with an insufficient number of scratch-pad registers. The proposed architecture (shown in Figure 14, page 54) corrects this shortcoming by incorporating the hardware necessary to increase the number of general purpose scratch-pad registers. By increasing the number of scratch-pad registers, the Gaussian elimination routine can be more efficiently computed and most of the matrix-vector products discussed in Chapter III can be implemented. However, by also requiring these additional registers to be general purpose, a feature of the proposed architecture, all routines can be more efficiently microcoded.

Another less severe limitation that occurs in minicomputers such as the HP-2100 and 21MX and the Microdata MICRO-1600 is the incapacity to store and retrieve data directly in microstorage. The proposed architecture corrects this shortcoming by incorporating this capability in place of, or in addition to, the already available capacity to store and retrieve data from the least significant eight bits of certain microinstructions. The advantages of the capability to perform byte operations on data stored in microstorage, that is, of accessing either eight, sixteen, or even twenty-four bit data words directly, are readily apparent. This added feature of the proposed architecture certainly enhances the overall capabilities of the computer and would significantly increase the efficiency of routines using half-words (8 bits).

A third limitation, at least of the HP-2100, is the small size (256 locations) of the microstorage. The proposed architecture corrects this shortcoming and enhances the capabilities of the computer by incorporating an expanded microstorage. Even as small main memories limit the capabilities of computers, small microstorages limit the capabilities of microprogrammable computers and prohibit the efficient implementation of many problems (e.g., the Gaussian elimination routine could not be completely microcoded in the microstorage provided in the HP-2100).

It has been demonstrated in this dissertation that microprogrammed versions of routines such as matrix-vector products and Gaussian elimination execute much faster than even machine language versions of the same routines. This performance improvement occurs even though the architecture of the computer was not designed specifically to efficiently compute these routines. A microprogram controlled computer incorporating many general purpose scratch-pad registers, the capability to access data directly in microstorage, and a large microstorage, i.e., the architectural features proposed in this dissertation, will significantly increase the performance of microprogrammable computers beyond that attainable without these features.

Even though the proposed architecture was constrained to use the same bus structure and microinstruction format as already implemented in the HP-2100, this does not restrict the design to any significant degree. Several minicomputers use a similarly structured bus system; for example, the HP-21MX retains the T- and S-buses but eliminates the R-bus. The Microdata MICRO-1600 has A- and B-buses functionally like the T- and S-buses of the 2100 and 21MX. In addition, several of the additional scratch-pad registers of the proposed architecture should be implemented as R-bus registers to increase the efficiency of those routines requiring masking operations (AND, XOR, etc.). This feature adds a degree of flexibility and efficiency not attainable in either the 21MX or MICRO-1600. However, further research into the most efficient bus structure should be conducted.

The proposed architecture calls for an expanded microstorage system. This expansion can be accomplished in either of two ways. Since the present microstorage (WCS) is organized as a 1 bit by 256 location memory, it can be expanded by replacing each memory chip by a more advanced state-of-the-art chip of greater density such as the new 1 bit by 4k chips. (Other microstorages may be similarly expanded.) The second method of expansion is to add more microstorage modules. The problem with the first method is that only eight bits (in the 2100 WCS) are available for addressing locations. Eight bits can only access 256 locations directly. There are schemes for accessing more locations than the bit dimension directly allows, such as two-dimensional and indirect addressing schemes. The limitations and methods of expanding each microstorage module in this manner should be investigated. The problem of expanding microstorage by the second method, at least in the 2100, is twofold. First, each module is quite expensive, and second, the scheme for accessing locations in one module from another is not conducive to efficient microprogramming. The architecture should be designed such that all locations in microstorage, including those on various boards or modules, can be addressed directly as a continuum of locations. This problem was partially solved in the 21MX by providing a microinstruction that allows direct jumps from one module to another, but further research is needed to design a system for even more direct access between modules.

Since microprogramming requires a thorough knowledge of both the hardware and software characteristics of the computer, the performance improvement achieved through microprogramming may be offset by the additional time required to become familiar with these characteristics. However, the additional time required to microcode any specific problem would soon be made up in execution time for dedicated applications. Part of the extra time spent in microcoding is due to the marginal performance of both the micro-debug editor (MDE) and the microassembler. Two features of the microassembler that need to be modified are its inability to read the frontal (blank) portion of a program tape, and its inability to ascertain that no comments have been made (it spaces the entire width of a page on printout even though no comments have been made). The first feature is mainly a nuisance, but the latter consumes considerable time in the assembly process. The micro-debug editor has the feature that on certain inappropriate commands it hangs up in an apparently endless loop. Halting the computer and restarting MDE does not remedy the situation; the complete program tape must be reloaded. It is recommended that both MDE and the microassembler be modified to eliminate these disadvantages and to improve their performance. Another research effort might be directed at developing a program to time optimize user developed microprograms. This time optimizing program would take programs that had already been processed by the microassembler, thus eliminating any programming errors, and examine them for inefficiencies.

Further areas for possible future research are discussed below. The techniques developed in this dissertation should be applied to a system that uses Kalman filtering, and in which it is necessary to interface with analog systems. This would provide the opportunity to actually implement a system and to further investigate input/output characteristics. Another area of investigation should be to develop and optimize FFT programs in microcode. One of the studies now in progress at Texas Tech University is the emulation of a PDP-11. It is recommended that other existing and proposed machines be emulated in microcode to study the architectural features of other microprogram controlled machines. In emulating other machines, the microstorage might not be large enough to completely emulate the specific machine under consideration. In this case, part of the emulator microprograms might reside in main memory and be transferred to microstorage as needed. Another area rich in usefulness is the development of microdiagnostics to determine their effectiveness as compared to standard diagnostic techniques, and to investigate which failed parameters may be microprogrammed around to allow continued computer usage. Finally, an investigation to determine the feasibility of reducing the bit dimension of microstorage through standard control store minimization techniques, or through more two-level encoding, or through a combination of both might be conducted. If it is possible to reduce the bit dimension of microstorage, a considerable cost advantage would result, especially if it was expanded.

Microprogramming can be used as a systematic approach to designing the control section of a computer or as a systematic procedure for investigating the architecture of microprogram controlled machines, as in this dissertation. It is recommended that additional investigations extend the work done in this research in order to fully exploit the capabilities provided by a user microprogrammable machine.


APPENDIX

A. COMPUTER PROGRAMS FOR CHAPTER III

B. PROGRAMS AND FLOWCHARTS FOR CHAPTER IV

C. PROGRAMS FOR CHAPTER V

D. FFT PROGRAMS AND RUNS

APPENDIX A

COMPUTER PROGRAMS FOR CHAPTER III

Assembly language n x 2 by 2 x 1 matrix-vector product routine

        ORG  [illegible]
        STA  N
        STB  M
        CLA,INA
        STA  I
        STA  J
LOOP1   CLB
        STB  SUM
LOOP2   LDA  I
        ADA  M1
        MPY  M
        ADA  J
        ADA  AADD
        STA  TAD
        LDB  XADD
        ADB  J
        LDA  1,I
        MPY  TAD,I
        ASR  [illegible]
        ADA  SUM
        STA  SUM
        LDB  J
        CPB  M
        JMP  *+4
        INB
        STB  J
        JMP  LOOP2
        LDB  SADD
        ADB  I
        STA  1,I
        LDA  I
        CPA  N
        HLT  77B
        INA
        STA  I
        CLA,INA
        STA  J
        JMP  LOOP1
I       NOP
J       NOP
N       NOP
M       NOP
TAD     NOP
M1      OCT  -1
SUM     NOP
XADD    DEF  *
X       BSS  12B
SADD    DEF  *
S       BSS  12B
AADD    DEF  *
A       BSS  144B
        END
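For readers unfamiliar with the assembler, the control flow of the routine above can be sketched in Python. This is a sketch only: it assumes the matrix is stored row by row in a flat array, as the address arithmetic (I-1)*M + J suggests, and it omits the fixed-point scaling that the ASR instruction performs after each multiply. The function and argument names are hypothetical.

```python
def matvec(a_flat, x, n, m):
    """Multiply an n-by-m matrix, stored row by row in a_flat, by the vector x.

    The loop structure follows the assembly routine: I selects the row,
    J the column, and the element offset is (I-1)*M + J with 1-based I and J.
    """
    s = [0] * n
    for i in range(1, n + 1):
        total = 0                      # CLB / STB SUM
        for j in range(1, m + 1):
            offset = (i - 1) * m + j   # 1-based offset into the matrix
            total += a_flat[offset - 1] * x[j - 1]
        s[i - 1] = total               # store the row sum into S(I)
    return s


print(matvec([1, 2, 3, 4], [5, 6], 2, 2))  # [17, 39]
```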

Microcoded 2x2 by 2x1 matrix-vector product routine

[Microassembler listing, 49 lines. The scan preserves the field columns only as separate runs, so the individual microinstructions cannot be realigned. Control statements: $LIST=12, $OUTPUT=, $DEBUG, $ORIGIN= (twice), $END. Labels, in order: MAT, LOOP1, LOOP, STOR, END, MPYA, RET. Function field, in order: JMP, IOR, AND, IOR, IOR, INC, IOR, CFLG, RFE, SFLG, IOR, JSB, ADD, IOR, JMP, CFLG, JMP, INC, IOR, IOR, IOR, IOR, JMP, IOR, DEC, JMP, IOR, INC, INC, IOR, IOR, JMP, IOR, CLO, MPY, IOR, SUB, IOR, SUB, SUB, CLO, ARS, RSB.]

Microcoded 4x4 by 4x1 matrix-vector product routine:

[Microassembler listing, 65 lines. The scan preserves the field columns only as separate runs, so the individual microinstructions cannot be realigned. Legible labels, in order: MAT, LOOP1, LOOP, CHEK, X3, X4, MPYA, RET.]

Microcoded 8x8 by 8x1 matrix-vector product routine:

[Microassembler listing, 98 lines. The scan preserves the field columns only as separate runs, so the individual microinstructions cannot be realigned. Legible labels, in order: MAT, LOOP1, LOOP, CHEK, X2, X3, X4, X5, X6, X7, X8, MPYA, RET.]

Microcoded, 8-bit, 2x2 by 2x1 matrix-vector product routine

[Microassembler listing, 66 lines. The scan preserves the field columns only as separate runs, so the individual microinstructions cannot be realigned. Legible labels, in order: MAT, LOOP1, LOOP, ADD, STOR, END, MPYA, RET.]

Microcoded, 8-bit, 4x4 by 4x1 matrix-vector product routine

[Microassembler listing, 91 lines. The scan preserves the field columns only as separate runs, so the individual microinstructions cannot be realigned. Legible labels, in order: MAT, LOOP1, LOOP, X2, X3, X4, MPYA, RET.]

APPENDIX B

PROGRAMS AND FLOWCHARTS FOR CHAPTER IV

Assembly language Gaussian elimination program:

ASMB,A,B,L,T
        HED  GAUSSIAN ELIMINATION /NO MICROCODE
        ORG  40000B
        STA  N
        ADA  M1
        STA  R
        CLA,INA
        STA  K
LOOP1   JSB  MAX
        LDA  RHO
        STA  AI
        LDA  GAMMA
        STA  AJ
        JSB  GETAD
        LDA  TAD,I
        SZA,RSS
        JMP  STOP
        JSB  IROW
        JSB  ICOL
        JSB  ELIM
        JSB  ZERO
        LDA  K
        CPA  R
        HLT  77B
        INA
        STA  K
        JMP  LOOP1
STOP    LDA  K
        ADA  M1
        STA  R
        HLT  11B
*
* SUBROUTINE MAX: CALCULATES MAX ALFA
*
MAX     NOP
        LDA  K
        STA  I
        STA  J
        STA  AI
        STA  AJ
        JSB  GETAD
        LDA  TAD,I
        SSA
        CMA
        STA  TMAX
        LDA  TAD
        STA  TEMP1
        LDA  I
        STA  RHO
        LDA  J
        STA  GAMMA
LOOP2   LDA  J
        INA
        STA  AJ
        STA  J
LOOP3   JSB  GETAD
        LDA  TAD,I
        SSA,RSS
        CMA

        ADA  TMAX
        SSA,RSS
        JMP  COM1
        LDA  TAD,I
        SSA
        CMA
        STA  TMAX
        LDA  AI
        STA  RHO
        LDA  AJ
        STA  GAMMA
        LDA  TAD
        STA  TEMP1
COM1    LDA  J
        CPA  N
        JMP  *+2
        JMP  LOOP2
        LDA  I
        CPA  N
        JMP  MAX,I
        INA
        STA  I
        STA  AI
        LDA  K
        STA  J
        STA  AJ
        JMP  LOOP3
*
* SUBROUTINE IROW: INTERCHANGES ROWS AS
* DETERMINED BY THE VALUE OF RHO
*
IROW    NOP
        LDA  K
        STA  J
        STA  AI
        STA  AJ
LOOP4   JSB  GETAD
        LDA  TAD
        STA  TEMP2
        LDA  TAD,I
        STA  TEMP1
        LDA  RHO
        STA  AI
        JSB  GETAD
        LDA  TAD,I
        STA  TEMP2,I
        LDA  TEMP1
        STA  TAD,I
        LDA  J
        CPA  N
        JMP  IROW,I
        INA
        STA  J
        STA  AJ
        LDA  K
        STA  AI
        JMP  LOOP4
*
* SUBROUTINE ICOL: INTERCHANGES COLUMNS AS
* DETERMINED BY THE VALUE OF GAMMA

*
ICOL    NOP
        LDA  K
        STA  AJ
        CLA,INA
        STA  I
        STA  AI
LOOP5   JSB  GETAD
        LDA  TAD,I
        STA  TEMP1
        LDA  TAD
        STA  TEMP2
        LDA  GAMMA
        STA  AJ
        JSB  GETAD
        LDA  TAD,I
        STA  TEMP2,I
        LDA  TEMP1
        STA  TAD,I
        LDA  I
        CPA  N
        JMP  ICOL,I
        INA
        STA  I
        STA  AI
        LDA  K
        STA  AJ
        JMP  LOOP5
*
* SUBROUTINE ELIM
*
* ALFA(I,J)=ALFA(I,J)-[ALFA(I,K)*ALFA(K,J)]/ALFA(K,K)
*
ELIM    NOP
        LDA  K
        STA  AI
        STA  AJ
        INA
        STA  I
        STA  J
        JSB  GETAD
        LDA  TAD,I
        STA  TEMP1
LOOP6   LDA  I
        STA  AI
        JSB  GETAD
        LDA  TAD,I
        STA  TEMP2
        LDB  K
        STB  AI
        LDB  J
        STB  AJ
        JSB  GETAD
        LDA  TAD,I
        MPY  TEMP2
        DIV  TEMP1
        CMA
        STA  TEMP2
        LDA  I
        STA  AI

        LDA  J
        STA  AJ
        JSB  GETAD
        LDA  TAD,I
        ADA  TEMP2
        STA  TAD,I
        LDA  I
        CPA  N
        JMP  *+6
        INA
        STA  I
        LDB  K
        STB  AJ
        JMP  LOOP6
        LDA  J
        CPA  N
        JMP  ELIM,I
        INA
        STA  J
        LDA  K
        STA  AJ
        INA
        STA  I
        JMP  LOOP6
*
* SUBROUTINE ZERO: ALFA(I,K) = 0, I = K+1, K+2, ...
*
ZERO    NOP
        LDA  K
        STA  AJ
        INA
        STA  I
        STA  AI
LOOP7   JSB  GETAD
        CLA
        STA  TAD,I
        LDA  I
        CPA  N
        JMP  ZERO,I
        INA
        STA  I
        STA  AI
        JMP  LOOP7
*
* SUBROUTINE GETAD: CALCULATES ADDRESS OF ALFA(AI,AJ)
*
GETAD   NOP
        LDA  AI
        ADA  M1
        MPY  N
        ADA  AJ
        ADA  ADD
        STA  TAD
        JMP  GETAD,I
N       NOP
I       NOP
J       NOP
AI      NOP
AJ      NOP
K       NOP
RHO     NOP
GAMMA   NOP
TAD     NOP
R       NOP
M1      OCT  -1
TMAX    NOP
TEMP1   NOP
TEMP2   NOP
TEMP3   NOP
ONE     OCT  [illegible]
ADD     DEF  *
ALFA    BSS  144
        END

Microcoded Gaussian elimination program

ASMB,A,B,L,T
        HED  GAUSSIAN ELIMINATION W/MICROCODE
        ORG  40000B
        STA  N
        IOR  MAC1
        STA  W1
        STA  W2
        STA  W3
        STA  W8
        STA  W9
        STA  W10
        STA  W11
        LDA  N
        IOR  MAC2
        STA  W4
        LDA  N
        IOR  MAC3
        STA  W5
        LDA  N
        IOR  MAC4
        STA  W6
        CLA,INA
        STA  K
LOOP1   JSB  MAX
        LDA  RHO
        LDB  GAMMA
W1      NOP
        DEF  ADD
        LDA  1,I
        SZA,RSS
        JMP  STOP
W4      NOP
K       NOP
        DEF  ADD
RHO     NOP
W5      NOP
        DEF  K
        DEF  ADD
GAMMA   NOP
        JSB  ELIM
W6      NOP
        DEF  K
        DEF  ADD
        LDA  K
        INA
        CPA  N
        HLT  77B
        STA  K
        JMP  LOOP1
STOP    LDA  K
        ADA  M1
        STA  R
        HLT  11B
*
* SUBROUTINE MAX: CALCULATES MAX ALFA
*
MAX     NOP
        LDA  K
        STA  I

        STA  J
        LDB  K
W2      NOP
        DEF  ADD
        LDA  1,I
        SSA
        CMA
        STA  TMAX
        LDA  1
        STA  TEMP1
        LDA  I
        STA  RHO
        LDA  J
        STA  GAMMA
LOOP2   LDA  I
        INA
        STA  I
LOOP3   LDB  J
W3      NOP
        DEF  ADD
        LDA  1,I
        SSA,RSS
        CMA
        ADA  TMAX
        SSA,RSS
        JMP  COM1
        LDA  1,I
        SSA
        CMA
        STA  TMAX
        LDA  I
        STA  RHO
        LDA  J
        STA  GAMMA
        LDA  1
        STA  TEMP1
COM1    LDA  I
        CPA  N
        JMP  *+2
        JMP  LOOP2
        LDA  J
        CPA  N
        JMP  MAX,I
        INA
        STA  J
        LDA  K
        STA  I
        JMP  LOOP3
*
* SUBROUTINE ELIM
*
* ALFA(I,J) = ALFA(I,J)-[ALFA(I,K)*ALFA(K,J)]/ALFA(K,K)
*
ELIM    NOP
        LDA  K
        INA
        STA  I
        STA  J
        LDA  K
        LDB  K

W8      NOP
        DEF  ADD
        LDA  1,I
        STA  TEMP1
LOOP6   LDA  I
        LDB  K
W9      NOP
        DEF  ADD
        LDA  1,I
        STA  TEMP2
        LDA  K
        LDB  J
W10     NOP
        DEF  ADD
        LDA  1,I
        MPY  TEMP2
        DIV  TEMP1
        CMA
        STA  TEMP2
        LDA  I
        LDB  J
W11     NOP
        DEF  ADD
        LDA  1,I
        ADA  TEMP2
        STA  1,I
        LDA  I
        CPA  N
        JMP  *+4
        INA
        STA  I
        JMP  LOOP6
        LDA  J
        CPA  N
        JMP  ELIM,I
        INA
        STA  J
        LDA  K
        INA
        STA  I
        JMP  LOOP6
N       NOP
I       NOP
J       NOP
MAC1    OCT  105300
MAC2    OCT  105320
MAC3    OCT  105340
MAC4    OCT  105360
R       NOP
M1      OCT  -1
TMAX    NOP
TEMP1   NOP
TEMP2   NOP
TEMP3   NOP
ONE     OCT  [illegible]
ADD     DEF  *
ALFA    BSS  144
        END

Microcode for Gaussian elimination program:

$LIST=12
$OUTPUT=11
$DEBUG
$ORIGIN=400
        JMP  GETAD
        JMP  IROW
        JMP  ICOL
        JMP  ZERO
$ORIGIN=421

[Remainder of the microassembler listing, 144 lines in all. The scan preserves the field columns only as separate runs, so the individual microinstructions cannot be realigned. Legible routine labels, in order: GETAD, IROW, LOOP, END, ICOL, LOOP1, ZERO, LOOP2, ENDZ.]

Flowchart: Subroutine, obtain maximum α_ij
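The search performed by subroutine MAX can be sketched as follows. This is an illustrative Python version, assuming the array is held as a list of rows with 0-based indices (the listings use 1-based I and J); the function name is hypothetical. It returns the row and column that the listings store in RHO and GAMMA.

```python
def find_pivot(alfa, k):
    """Return (rho, gamma), the position of the largest |alfa[i][j]| for i, j >= k."""
    n = len(alfa)
    tmax, rho, gamma = abs(alfa[k][k]), k, k
    for i in range(k, n):
        for j in range(k, n):
            # Replace the running maximum only on a strictly larger magnitude.
            if abs(alfa[i][j]) > tmax:
                tmax, rho, gamma = abs(alfa[i][j]), i, j
    return rho, gamma


print(find_pivot([[1, -9], [3, 4]], 0))  # (0, 1)
```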

Flowchart: Subroutine interchange rows

Flowchart: Subroutine interchange columns

Flowchart: Subroutine elimination, α_ij ← α_ij − α_ik α_kj / α_kk
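One elimination step can be sketched in Python as below. The sketch combines the work of subroutines ELIM and ZERO, uses 0-based indices, and uses floating-point division where the listings use the integer MPY and DIV instructions, so it illustrates the update rule rather than the fixed-point behavior. The function name is hypothetical.

```python
def eliminate(alfa, k):
    """Apply one elimination step about the pivot alfa[k][k], in place."""
    n = len(alfa)
    pivot = alfa[k][k]
    for j in range(k + 1, n):
        for i in range(k + 1, n):
            # alfa(i,j) = alfa(i,j) - alfa(i,k) * alfa(k,j) / alfa(k,k)
            alfa[i][j] -= alfa[i][k] * alfa[k][j] / pivot
    for i in range(k + 1, n):
        alfa[i][k] = 0.0               # subroutine ZERO: clear column k below the pivot


a = [[2.0, 1.0], [4.0, 5.0]]
eliminate(a, 0)
print(a)  # [[2.0, 1.0], [0.0, 3.0]]
```

Column k is cleared only after every other column has been updated, because the update of each alfa(i,j) still needs the original alfa(i,k).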

Flowchart: Subroutine zero

Flowchart: Subroutine get array element address
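The address computation can be written out as a short sketch. It follows the instruction sequence of subroutine GETAD in the assembly listing (LDA AI, ADA M1, MPY N, ADA AJ, ADA ADD) and assumes, as DEF * implies, that ADD holds the address of the word immediately before the array. The Python names and the sample addresses are hypothetical.

```python
def element_address(i, j, n, base):
    """Address of element (i, j), 1-based, of an n-column array stored row by row.

    base is the address of the word immediately before the first element,
    which is what a DEF * placed just ahead of the array yields.
    """
    return base + n * (i - 1) + j


print(element_address(1, 1, 10, 100))  # 101
print(element_address(2, 3, 10, 100))  # 113
```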

APPENDIX C

PROGRAMS FOR CHAPTER V
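As a reading aid for the Kalman filter program in this appendix, one time step of the filter can be sketched in Python. The sketch assumes the forward-difference form given in the listing comments (step size T, measurement noise variance 16, process noise variance 1) and uses floating point in place of the scaled integer arithmetic of the assembly code. The function and argument names are hypothetical.

```python
def kalman_step(state, z, t, r=16.0, q=1.0):
    """Advance (x1hat, x2hat, p11, p12, p22) by one step of size t.

    z is the measurement, r the measurement noise variance and q the
    process noise variance. The Kalman gains are p11/r and p12/r.
    """
    x1, x2, p11, p12, p22 = state
    e = z - x1                                   # innovation Z(K) - X1HAT(K)
    x1_next = x1 + t * x2 + t * p11 * e / r
    x2_next = x2 + t * p12 * e / r
    p11_next = p11 + t * (2 * p12 - p11 * p11 / r)
    p12_next = p12 + t * (p22 - p11 * p12 / r)
    p22_next = p22 + t * (q - p12 * p12 / r)
    return (x1_next, x2_next, p11_next, p12_next, p22_next)


print(kalman_step((0.0, 0.0, 16.0, 0.0, 0.0), z=16.0, t=0.5))
# (8.0, 0.0, 8.0, 0.0, 0.5)
```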

Assembly language Kalman f i l t e r program:

ASMB,A,B,L,T,C
*
* THIS PROGRAM PERFORMS KALMAN FILTERING FOR THE FOLLOWING SECOND ORDER
* SYSTEM
*    X1DOT(T) = X2(T)
*    X2DOT(T) = W(T), WHERE W(T) IS GAUSSIAN WHITE NOISE WITH ZERO MEAN
*    AND VARIANCE OF 1
*
* THE FILTER EQUATIONS ARE
*    P11DOT(T) = 2*P12(T) - (P11(T)**2)/16
*    P12DOT(T) = P22(T) - P11(T)*P12(T)/16
*    P22DOT(T) = -(P12(T)**2)/16 + 1
*
* THE MEASUREMENT IS CORRUPTED BY NOISE BY THE EQUATION
*    Z(T) = X1(T) + V(T), WHERE V(T) IS GAUSSIAN WHITE NOISE WITH ZERO MEAN
*    AND VARIANCE OF 16
*    X1HATDOT(T) = X2HAT(T) + 1/16*P11(T)*[Z(T) - X1HAT(T)]
*    X2HATDOT(T) = 1/16*P12(T)*[Z(T) - X1HAT(T)]
* THE KALMAN GAINS ARE
*    K1(T) = P11(T)/16 AND
*    K2(T) = P12(T)/16
      HED  KALMAN FILTER WITH NO MICROCODE
      ORG  10000
      CLA
      STA  X1K
      STA  X2K
      STA  X1KH
      STA  X2KH
      STA  P12K
      STA  P22K
      STA  CNT        INITIALIZATION
      LDA  K1
      STA  KAL1
      LDA  K2
      STA  KAL2
      LDA  M1000
      STA  COUNT
KALMN LDB  WRAN
      JSB  NFIX
      MPY  TEE
      ASR  11B
      ADA  X2K
      STA  X2K.1      X2(K+1) = X2(K) + T*W(K)
      LDA  X2K
      MPY  TEE
      ASR  11B
      ADA  X1K
      STA  X1K.1      X1(K+1) = X1(K) + T*X2(K)
      LDA  P11K
      MPY  TEE
      DIV  CONST
      CMA
      ADA  ONE
      STA  TEMP1      (1 - T*P11(K)/16)
      MPY  P11K
      ASR  11B
      STA  TEMP2      P11(K)*[1 - T*P11(K)/16]
      LDA  P12K
      MPY  TEE


      ASR  10B
      ADA  TEMP2
      STA  P11K1      P11(K+1) = 2*T*P12(K) + P11(K)*(1 - T*P11(K)/16)
      LDA  TEMP1
      MPY  P12K
      ASR  11B
      STA  TEMP1      P12(K)*[1 - T*P11(K)/16]
      LDA  P22K
      MPY  TEE
      ASR  11B
      ADA  TEMP1
      STA  P12K1      P12(K+1) = T*P22(K) + P12(K)*(1 - T*P11(K)/16)
      LDA  P12K
      MPY  0
      ASR  11B
      MPY  TEE
      DIV  CONST
      CMA
      ADA  TEE
      ADA  P22K
      STA  P22K1      P22(K+1) = P22(K) - T*(P12(K)**2)/16
      LDB  VRAN
      JSB  NFIX
      ADA  X1K
      STA  ZK         Z(K) = X1(K) + V(K)
      LDA  X1KH
      CMA
      ADA  ZK
      STA  TEMP1      Z(K) - X1HAT(K)
      MPY  P11K
      DIV  CONST
      MPY  TEE
      ASR  11B
      STA  TEMP2      P11(K)*[Z(K) - X1HAT(K)]*T/16
      LDA  X2KH
      MPY  TEE
      ASR  11B
      ADA  TEMP2
      ADA  X1KH
      STA  X1KH1      X1HAT(K+1) = X1HAT(K) + T*X2HAT(K) + P11(K)*[Z(K) - X1HAT(K)]*T/16
      LDA  TEMP1
      MPY  P12K
      DIV  CONST
      MPY  TEE
      ASR  11B
      ADA  X2KH
      STA  X2KH1      X2HAT(K+1) = X2HAT(K) + T*P12(K)*[Z(K) - X1HAT(K)]/16
      LDA  P11K
      MPY  ONE
      DIV  CONST
      STA  K1K        K1(K) = P11(K)/16
      STA  KAL1,I
      LDA  P12K
      MPY  ONE
      DIV  CONST
      STA  K2K        K2(K) = P12(K)/16
      STA  KAL2,I
      LDA  KAL1
      ADA  ONE1
      STA  KAL1


      LDA  KAL2
      ADA  ONE1
      STA  KAL2
      ISZ  COUNT
      JMP  *+2
      HLT
      LDA  X1K.1      SHIFTS
      STA  X1K        AND
      LDA  X2K.1      STORES
      STA  X2K        PAST
      LDA  P11K1      VALUES
      STA  P11K       OF
      LDA  P12K1      X2(K+1)
      STA  P12K       X1(K+1)
      LDA  P22K1      P11(K+1)
      STA  P22K       P12(K+1)
      LDA  X1KH1      P22(K+1)
      STA  X1KH       X1HAT(K+1)
      LDA  X2KH1      AND
      STA  X2KH       X2HAT(K+1)
      ISZ  CNT
      ISZ  CNT
      JMP  KALMN
NFIX  NOP
      ADB  CNT
      STB  TEMP1
      ADB  ONE
      LDA  1,I        LSB + EXP OF RTB GOES TO A REG.
      SLA             EXP POSITIVE?
      JMP  *+12       NO
      AND  MASK1      YES, EXTRACT EXPONENT
      ARS
      IOR  SHTL       FORM LEFT SHIFT INSTRUCTION
      STA  LSHT
      LDA  TEMP1,I    MSB OF RTB GOES TO A REG.
      ARS,ARS
      ARS,ARS         SCALING
      ARS,ARS
      CLB
LSHT  NOP             LEFT SHIFT INST., COUNT DETERMINED BY EXP.
      JMP  *+17
      AND  MASK2      EXTRACT NEGATIVE EXPONENT
      ALF,ALF
      ARS,ARS
      ARS,ARS         EXTENDS EXP TO WHOLE WORD
      ARS,ARS
      ARS,ARS
      ARS
      CMA
      IOR  SHTR       FORM RIGHT SHIFT INSTRUCTION
      STA  RSHT
      LDA  TEMP1,I
      ARS,ARS
      ARS,ARS         SCALING
      ARS,ARS
      CLB
RSHT  NOP             RIGHT SHIFT INST., COUNT DETERMINED BY EXP
      JMP  NFIX,I
K1    DEF  KAL11
K2    DEF  KAL22


WRAN  DEF  WRTB
VRAN  DEF  VRTB
KAL1  NOP
KAL2  NOP
X1K   NOP
X2K   NOP
X1K.1 NOP
X2K.1 NOP
X1KH1 NOP
X2KH1 NOP
X1KH  NOP
X2KH  NOP
P11K  OCT  001000
P11K1 NOP
P12K  NOP
P12K1 NOP
P22K  NOP
P22K1 NOP
K1K   NOP
K2K   NOP
TEMP1 NOP
TEMP2 NOP
TEE   OCT  000100
M1000 DEC  -305
CNT   NOP
MASK2 OCT  000376
COUNT NOP
MASK1 OCT  000036
ONE   OCT  001000
CONST OCT  020000
ZK    NOP
ONE1  OCT  000001
SHTL  OCT  100020
SHTR  OCT  101020
KAL11 BSS  1000
KAL22 BSS  1000
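The difference equations named in the listing comments can be sketched in floating point as follows. This is a minimal illustration under stated assumptions, not the fixed-point program itself: the step size T = 1/8 is inferred from TEE = OCT 000100 combined with the nine-bit right shift (ASR 11B), the measurement noise variance 16 comes from the header comments, and the `+ t` term in the P22 update follows the continuous equation and the ADA TEE instruction rather than the listing comment. The function name `kalman_step` and its parameter names are hypothetical.

```python
def kalman_step(x1h, x2h, p11, p12, p22, z, t=0.125, r=16.0):
    """One Euler step of the state-estimate and covariance equations.

    Assumptions: t = 1/8 (inferred from the listing constants) and
    r = 16 (measurement noise variance from the header comments).
    """
    shrink = 1.0 - t * p11 / r            # (1 - T*P11(K)/16)
    innov = z - x1h                       # Z(K) - X1HAT(K)
    k1 = p11 / r                          # K1(K) = P11(K)/16
    k2 = p12 / r                          # K2(K) = P12(K)/16
    x1h_next = x1h + t * x2h + t * k1 * innov
    x2h_next = x2h + t * k2 * innov
    p11_next = 2.0 * t * p12 + p11 * shrink
    p12_next = t * p22 + p12 * shrink
    p22_next = p22 + t - t * p12 * p12 / r
    return x1h_next, x2h_next, p11_next, p12_next, p22_next, k1, k2


# Start as the listing does: P11 = 1.0, all other quantities zero.
state = kalman_step(0.0, 0.0, 1.0, 0.0, 0.0, z=16.0)
print(state)
```

The assembly program carries the same quantities as integers scaled by ONE = OCT 001000, which is why every product is followed by a shift or by a division by CONST.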


Microcoded Kalman filter program:

ASMB,A,B,L,T,C
*
* THIS PROGRAM PERFORMS KALMAN FILTERING FOR THE FOLLOWING SECOND ORDER
* SYSTEM
*    X1DOT(T) = X2(T)
*    X2DOT(T) = W(T), WHERE W(T) IS GAUSSIAN WHITE NOISE WITH ZERO MEAN
*    AND VARIANCE OF 1
*
* THE FILTER EQUATIONS ARE
*    P11DOT(T) = 2*P12(T) - (P11(T)**2)/16
*    P12DOT(T) = P22(T) - P11(T)*P12(T)/16
*    P22DOT(T) = -(P12(T)**2)/16 + 1
*
* THE MEASUREMENT IS CORRUPTED BY NOISE BY THE EQUATION
*    Z(T) = X1(T) + V(T), WHERE V(T) IS GAUSSIAN WHITE NOISE WITH ZERO MEAN
*    AND VARIANCE OF 16
*    X1HATDOT(T) = X2HAT(T) + 1/16*P11(T)*[Z(T) - X1HAT(T)]
*    X2HATDOT(T) = 1/16*P12(T)*[Z(T) - X1HAT(T)]
* THE KALMAN GAINS ARE
*    K1(T) = P11(T)/16 AND
*    K2(T) = P12(T)/16
      HED  KALMAN FILTER WITH MICROCODE
      ORG  10000
      CLA
      STA  X1K
      STA  X2K
      STA  X1KH
      STA  X2KH
      STA  P12K
      STA  P22K
      STA  CNT        INITIALIZATION
      LDA  K1
      STA  KAL1
      LDA  K2
      STA  KAL2
      LDA  M1000
      STA  COUNT
KALMN LDB  WRAN
      JSB  NFIX
      OCT  105000
      ADA  X2K
      STA  X2K.1      X2(K+1) = X2(K) + T*W(K)
      LDA  X2K
      OCT  105000
      ADA  X1K
      STA  X1K.1      X1(K+1) = X1(K) + T*X2(K)
      LDA  P11K
      MPY  TEE
      OCT  105020
      CMA
      ADA  ONE
      STA  TEMP1      (1 - T*P11(K)/16)
      MPY  P11K
      ASR  11B
      STA  TEMP2      P11(K)*[1 - T*P11(K)/16]
      LDA  P12K
      ASL  1
      OCT  105000
      ADA  TEMP2


      STA  P11K1      P11(K+1) = 2*T*P12(K) + P11(K)*(1 - T*P11(K)/16)
      LDA  TEMP1
      MPY  P12K
      ASR  11B
      STA  TEMP1      P12(K)*[1 - T*P11(K)/16]
      LDA  P22K
      OCT  105000
      ADA  TEMP1
      STA  P12K1      P12(K+1) = T*P22(K) + P12(K)*(1 - T*P11(K)/16)
      LDA  P12K
      MPY  0
      ASR  11B
      MPY  TEE
      OCT  105020
      CMA
      ADA  TEE
      ADA  P22K
      STA  P22K1      P22(K+1) = P22(K) - T*(P12(K)**2)/16
      LDB  VRAN
      JSB  NFIX
      ADA  X1K
      STA  ZK         Z(K) = X1(K) + V(K)
      LDA  X1KH
      CMA
      ADA  ZK
      STA  TEMP1      Z(K) - X1HAT(K)
      OCT  105040
      DEF  P11K1
      OCT  105020
      OCT  105000
      OCT  105060
      DEF  X2KH
      OCT  105000
      OCT  105100
      DEF  X1KH
      DEF  X1KH1
      LDA  TEMP1
      OCT  105040
      DEF  P12K
      OCT  105020
      OCT  105000
      OCT  105120
      DEF  X2KH
      DEF  X2KH1
      LDA  P11K
      MPY  ONE
      OCT  105020
      STA  K1K        K1(K) = P11(K)/16
      STA  KAL1,I
      LDA  P12K
      MPY  ONE
      OCT  105020
      STA  K2K        K2(K) = P12(K)/16
      STA  KAL2,I
      LDA  KAL1
      ADA  ONE1
      STA  KAL1
      LDA  KAL2
      ADA  ONE1
      STA  KAL2


      ISZ  COUNT
      JMP  *+2
      HLT
      LDA  X1K.1      SHIFTS
      STA  X1K        AND
      LDA  X2K.1      STORES
      STA  X2K        PAST
      LDA  P11K1      VALUES
      STA  P11K       OF
      LDA  P12K1      X2(K+1)
      STA  P12K       X1(K+1)
      LDA  P22K1      P11(K+1)
      STA  P22K       P12(K+1)
      LDA  X1KH1      P22(K+1)
      STA  X1KH       X1HAT(K+1)
      LDA  X2KH1      AND
      STA  X2KH       X2HAT(K+1)
      ISZ  CNT
      ISZ  CNT
      JMP  KALMN
NFIX  NOP
      ADB  CNT
      STB  TEMP1
      ADB  ONE
      LDA  1,I        LSB + EXP OF RTB GOES TO A REG.
      SLA             EXP POSITIVE?
      JMP  *+12       NO
      AND  MASK1      YES, EXTRACT EXPONENT
      ARS
      IOR  SHTL       FORM LEFT SHIFT INSTRUCTION
      STA  LSHT
      LDA  TEMP1,I    MSB OF RTB GOES TO A REG.
      ARS,ARS
      ARS,ARS         SCALING
      ARS,ARS
      CLB
LSHT  NOP             LEFT SHIFT INST., COUNT DETERMINED BY EXP.
      JMP  *+17
      AND  MASK2      EXTRACT NEGATIVE EXPONENT
      ALF,ALF
      ARS,ARS
      ARS,ARS         EXTENDS EXP TO WHOLE WORD
      ARS,ARS
      ARS,ARS
      ARS
      CMA
      IOR  SHTR       FORM RIGHT SHIFT INSTRUCTION
      STA  RSHT
      LDA  TEMP1,I
      ARS,ARS
      ARS,ARS         SCALING
      ARS,ARS
      CLB
RSHT  NOP             RIGHT SHIFT INST., COUNT DETERMINED BY EXP
      JMP  NFIX,I
K1    DEF  KAL11
K2    DEF  KAL22
WRAN  DEF  WRTB
VRAN  DEF  VRTB
KAL1  NOP


TEE   OCT  000100
KAL2  NOP
X1K   NOP
X2K   NOP
X1K.1 NOP
X2K.1 NOP
X1KH1 NOP
X2KH1 NOP
X1KH  NOP
X2KH  NOP
P11K  OCT  001000
P11K1 NOP
P12K  NOP
P12K1 NOP
P22K  NOP
P22K1 NOP
K1K   NOP
K2K   NOP
TEMP1 NOP
TEMP2 NOP
M1000 DEC  -305
CNT   NOP
MASK2 OCT  000376
COUNT NOP
MASK1 OCT  000036
ONE   OCT  001000
CONST OCT  020000
ZK    NOP
ONE1  OCT  000001
SHTL  OCT  100020
SHTR  OCT  101020
KAL11 BSS  1000
KAL22 BSS  1000


Microcode for Kalman filter program:


SLIST SOUTP

= 12 UT =

ÎJESJG 5CRIGIN=

SORIG MPYT

DiVC

DV3C

DONEC

MPYP

STORA

IN = A

8

B

B

3 A A p

F F F

F 3 3

0

F

B

0

A

p

8

B

A

1 !

400

421

CR

52 S4 52 CR S2

5! 53 S3 CL S2

S2

S! 52 33

3? 52

32 34 S2 5? 54 P

J .M P j;-'? j . .p

JMP JMP JNP

lOR lOR CLO MPY lOR SUB SUS CLO A^S lOR lOR 30 V JMP ÎOR ior< SL3 •\0R 3UB lOR SUB JMP

LGS CLO OIV lOR KOR SJB XOR 50 V lOR lOR SU3 lOR lOR lOR J33 J3B iO.R CLO MPY iOP SU3 ÎOR SU3 iNC I3R

jsa J33

54 32 g

3

B S2

B

F • j í

3! 33

F (T

S2

3 F 33 f

5? 33 j

32

3 B A

34 8 3

B

8 o

53

MPYT DIvC y.PfP STORA ADXl ACX2

!00 CNTR RPT R! CTRI R33 NEG

I ! CNTR RPT R! CTRI

•" O i-3

• N£G Dv'SC

COoT UNC

a?.

M£G

0 ? .N £ C Lî C.-TR r<^T

L\ CT=I T? -1

RSS NEG

r;3 3 .\EG

" ! ^3 5 N£G

J\C

£JP

GEVAD OPG£T

C,\T,Í R P T

.-,; C TP I

.!-Si .NíG

.R33 NEG £0?

GETAD OPGET

M U 1 _ T I P L I £ 3 TH£ A .-,EG. C0NTENT3 OF BA



ADXl

ADX2

GETAD

OPGET

SENO

A

A

A

A

A

S2 P S3

32 P

Sl

PRS P

32 P

S!

RRS P P T

51 T

lOR INC ADO JS3 J53 ADD INC J 3 ? lOR lOR lOR INC JS3 JS8 ADO INC JS8 lOR lOR lOR INC lOR RSB lOR lOR RSB lOR

A P A

P

M

T P

A P

M

T P M 51

M

52

EOf

GETAD OPGET

GETAD C W UNC

EOP

G£TAD CPGET

GETAD C'*> U N C

EOP

RW

RV.


APPENDIX D

FFT PROGRAMS AND RUNS

FFT program, no quantization:

      DIMENSION S(1024)
      COMPLEX F(1024)
      N=32
      DO 1 J=1,N
    1 S(J)=SIN(FLOAT(J-1)*3.141593/8.)
      CALL FTRANS(S,F,N)
      DO 2 J=1,N
      FABS=CABS(F(J))
      JM=J-1
    2 WRITE(6,3) JM,S(J),F(J),FABS
    3 FORMAT(1X,I6,' SIGNAL=',F14.7,' F=',2F14.7,' FABS=',F14.7)
      STOP
      END
      SUBROUTINE FTRANS(S,F,N)
      DIMENSION S(1024)
      COMPLEX F(1024)
      CALL SHUFF(S,F,N)
      LENGTH=2
    1 DO 2 J=1,N,LENGTH
    2 CALL COMBIN(F,J,LENGTH)
      LENGTH=LENGTH+LENGTH
      IF (LENGTH-N) 1,1,3
    3 RETURN
      END
      SUBROUTINE SHUFF(S,F,N)
      DIMENSION S(1024)
      COMPLEX F(1024),CMPLX
      DO 5 IFORT=1,N
      I=IFORT-1
      J=0
      M2=1
    1 M1=M2
      M2=M2+M2
      IF (MOD(I,M2)-M1) 3,2,2
    2 J=J+N/M2
    3 IF (M2-N) 1,4,4
    4 JFORT=J+1
    5 F(IFORT)=CMPLX(S(JFORT),0.)
      RETURN
      END
      SUBROUTINE COMBIN(F,J,N)
      COMPLEX F(1024),EMJT,Z,CEXP
      EMJT=CEXP((0.,-1.)*(6.283185/FLOAT(N)))
      N2=N/2
      DO 1 L=1,N2
      LOC1=L+J-1
      LOC2=LOC1+N2
      Z=EMJT**(L-1)*F(LOC2)
      F(LOC2)=F(LOC1)-Z
    1 F(LOC1)=F(LOC1)+Z
      RETURN
      END
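The structure of the program above (bit-reversed shuffle followed by successive combine passes of doubling length) can be sketched in Python. This is an illustrative transcription under the assumption that the listing reads as a standard decimation-in-time radix-2 FFT; the names `shuffle`, `combine` and `ftrans` mirror SHUFF, COMBIN and FTRANS but use 0-based indexing, and the input length is assumed to be a power of two.

```python
import cmath
import math


def shuffle(s):
    """Copy real samples into a complex list in bit-reversed order (cf. SHUFF)."""
    n = len(s)
    f = [0j] * n
    for i in range(n):
        j = 0
        m2 = 1
        while m2 < n:
            m1 = m2
            m2 += m2
            if i % m2 >= m1:       # this bit of i is set
                j += n // m2       # so set the mirrored bit of j
        f[i] = complex(s[j], 0.0)
    return f


def combine(f, j, n):
    """Merge two half-length transforms starting at offset j (cf. COMBIN)."""
    emjt = cmath.exp(-2j * math.pi / n)
    n2 = n // 2
    for l in range(n2):
        loc1 = j + l
        loc2 = loc1 + n2
        z = emjt ** l * f[loc2]
        f[loc2] = f[loc1] - z
        f[loc1] = f[loc1] + z


def ftrans(s):
    """Radix-2 FFT of a real sequence whose length is a power of two (cf. FTRANS)."""
    n = len(s)
    f = shuffle(s)
    length = 2
    while length <= n:
        for j in range(0, n, length):
            combine(f, j, length)
        length += length
    return f


# Same test signal as the main program: N = 32, sin((J-1)*pi/8).
signal = [math.sin(k * math.pi / 8.0) for k in range(32)]
spectrum = ftrans(signal)
print(abs(spectrum[2]), abs(spectrum[30]))
```

For this input the energy falls in bins 2 and 30 with magnitude N/2 = 16, which agrees with the 15.99... entries in the printed run.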


FFT program with quantization:

      INTEGER S(1024)
      COMPLEX F(1024)
      N=64
      DO 1 J=1,N
    1 S(J)=1000*SIN(FLOAT(J-1)*3.141593/8.)+1500*SIN(FLOAT(J-1)*3.141593
     1/16.)+2000*SIN(FLOAT(J-1)*3.141593/32.)
      CALL FTRANS(S,F,N)
      DO 2 J=1,N
      FABS=CABS(F(J))
      JM=J-1
    2 WRITE(6,3) JM,S(J),F(J),FABS
    3 FORMAT(1X,I6,' SIGNAL=',I14,' F=',2F14.7,' FABS=',F14.7)
      STOP
      END
      SUBROUTINE FTRANS(S,F,N)
      INTEGER S(1024)
      COMPLEX F(1024)
      CALL SHUFF(S,F,N)
      LENGTH=2
    1 DO 2 J=1,N,LENGTH
    2 CALL COMBIN(F,J,LENGTH)
      LENGTH=LENGTH+LENGTH
      IF (LENGTH-N) 1,1,3
    3 RETURN
      END
      SUBROUTINE SHUFF(S,F,N)
      DIMENSION SR(1024)
      INTEGER S(1024)
      COMPLEX F(1024),CMPLX
      DO 5 IFORT=1,N
      I=IFORT-1
      J=0
      M2=1
    1 M1=M2
      M2=M2+M2
      IF (MOD(I,M2)-M1) 3,2,2
    2 J=J+N/M2
    3 IF (M2-N) 1,4,4
    4 JFORT=J+1
      SR(JFORT)=FLOAT(S(JFORT))
      SR(JFORT)=SR(JFORT)/1000.
    5 F(IFORT)=CMPLX(SR(JFORT),0.)
      RETURN
      END
      SUBROUTINE COMBIN(F,J,N)
      COMPLEX F(1024),EMJT,Z,CEXP,Z1
      EMJT=CEXP((0.,-1.)*(6.283185/FLOAT(N)))
      N2=N/2
      DO 1 L=1,N2
      LOC1=L+J-1
      LOC2=LOC1+N2
      Z=1000*EMJT**(L-1)
      CALL INTG(Z)
      Z1=Z*F(LOC2)
      CALL INTG(Z1)
      F(LOC2)=F(LOC1)-Z1/1000.
      F(LOC2)=1000.*F(LOC2)
      CALL INTG(F(LOC2))
      F(LOC2)=F(LOC2)/1000.
      F(LOC1)=F(LOC1)+Z1/1000.


      F(LOC1)=1000.*F(LOC1)
      CALL INTG(F(LOC1))
    1 F(LOC1)=F(LOC1)/1000.
      RETURN
      END
      SUBROUTINE INTG(X)
      DIMENSION X(2)
      X(1)=INT(X(1))
      X(2)=INT(X(2))
      RETURN
      END

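The quantization applied in the COMBIN and INTG routines of the quantized FFT program can be illustrated with a short sketch. It assumes the listing reads as follows: the twiddle factor and every intermediate result are scaled by 1000, truncated toward zero in both real and imaginary parts, and scaled back. The names `SCALE`, `intg` and `combine_quantized` are hypothetical, and indexing is 0-based.

```python
import cmath
import math

SCALE = 1000.0  # scale factor used in the listing above


def intg(x):
    """Truncate real and imaginary parts toward zero (cf. subroutine INTG)."""
    return complex(math.trunc(x.real), math.trunc(x.imag))


def combine_quantized(f, j, n):
    """Butterfly pass with truncation after every arithmetic step."""
    emjt = cmath.exp(-2j * math.pi / n)
    n2 = n // 2
    for l in range(n2):
        loc1 = j + l
        loc2 = loc1 + n2
        z = intg(SCALE * emjt ** l)        # quantized twiddle factor
        z1 = intg(z * f[loc2])             # quantized product
        lower = intg(SCALE * (f[loc1] - z1 / SCALE)) / SCALE
        upper = intg(SCALE * (f[loc1] + z1 / SCALE)) / SCALE
        f[loc2] = lower
        f[loc1] = upper


# A two-point butterfly on values that are exact multiples of 1/1000.
data = [1 + 0j, 1 + 0j]
combine_quantized(data, 0, 2)
print(data)
```

Changing `SCALE` to 10 or 100 gives a coarser quantization, which is presumably how the "scaled by 10" and "scaled by 100" runs were produced; the listing itself shows only the factor 1000.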

DFT of S^, no quantization:

i') S i Z>:" L =

i s: ' . \ ' . ' = 2 S ' .C- '.!•= 3 > r ?..' *. L = 4 S i G . \ i ' . . = 5 S IG.^ •• L -c > IC- ' : • - ' .= 7 i '* ' Al -.3 S : G ^ ^L = Q SvC ' .1.=

i j ' . : : • ' -L = u s i r - ^ i L -• 2 S !G .v . i L = 13 S : G V Í L = 14 S I G ' . - L ^

15 S I G f . - L = 15 S I G ' . i l = 17 S Î G - ' - L ^

13 S I G " . - L =

19 S ; G L - L = 2 J S : ' L i L = : i s ! ' ; ' ; - i L -12 s i c - . - L -13 S : 3 \ : L =

i ^ 3 l ' V :.: -2 5 S I G ' . - . - = 2 6 S I G ' . . L =

Z S I C ' AL = 2 8 S l O " - L -2 9 3 10" • . = 3 0 S I C - :' 31 SIG' .; -

J . O F = • J . 3 : 5 j ' i - 3 4 = -J . ' J T I . ' ^ J S F -

•) . 9 2 3 . 7 j 3 ^--I . J •..• ' -^:) •J JO c —

0 • 'í 2 3 3 7 -/ b F = 0 . í o 7 1 L 7 - F = J . 3 ; £ 3 — i 2 F= 0 . j J u O JOo F =

- 0 . 3 3 2 b ' - 2 L r = - . j . 7 j 7 1,;oc. F = - O . T 2 3 á 7 9 l == - l . J 0 u 0 0 . ) J F = - û . 9 2 3 a 7 - í 3 F = - 0 . 7 o 7 I C ^ t ) F = - J . 3 3 2 n ^ 3 á F=

0 . 3 0 0 0 t ' 0 7 F = 0 . 3 d 2 6 b 3 2 F= 0 . 7 0 7 1 0 6 2 F= u . 0 2 3 3 ^ 4 5 F= l . JOOOOOO F= 0 . 9 2 3 á 7 ^ 3 F^ 0 . 7 0 7 1.j 70 F = 0 . 3 3 2î5.i . :3 F -

- j . 0 0 o' <J L' J 0 .- = - . J . 3 52o^>2 7 F -- 0 . 7 J 7 1 u 7 J f. - 0 . 9 2 3 : ; 7 ) - - = - L. •)vj' "•i^O''J F = - J . -7 2 .) -•! 7 ' •> r -- J . 7 ) 7 1<->rl F = - o . ? b 2 j - 3 2 F=

U . O - J , J 0 0 1 9

J . J O O O G 12 u ' . J J 0 0 . j 4 t í 0 . 0 0 j J C 3 4 •).vV-oo„:-

- 0 . OOO j ' j O t . o . :c joo7á

- L ' . J j J J O l c

0..L 000003 - j . JJ)0OÍ:0

-0. J J00'j20 -J.0000013 J.0)JJ001

-0.0Û0JJÛ9 C.0000 153

-'J. 0 JOOJ J2 U.OJOOU25

-•0.0 )JJJ0 2 -o.OOJ0L09 - L • G^JOO J 0 9

0 . 0 JOJJO L

- • J . j •..' O J Û J •.)

- ' j . 0 G '3' c •:• 2 0 j . 3 0 J . ^ 3 0 3

- J . 0 J O J J 16

- J . J J J ' J U 0 "v

] . 0 J J ^ 0 2 - t j . OoOOO-'^

- j . J ' J . j ' j 2 ~ 7 J . j - ^ J J . O I ^

0 . 0 F A ^ S ^ - O . 0 J 0 C O 2 7 r - . : iS =

- 1 5 . 9 •- ' 9 n S 5 F A S S = 0 . : C 0 C 0 2 5 F A 3 S = C.J.JOOJOO P A í S =

-o .oo o j o i F.ie.. = 0 . J 0 J C 0 S 3 F A = J S ^

- O . j . . . j O ' j i j J : FArf3-^ - C . 0 0 0 0 0 2 1 - c . j i-?.: j o . j

0 . JOJOO 12 C . 0 0 j u 0 j i

- O . O J O O J o / .^Aí^5 C . 0 0 3 0 J C 3 F A 3 3 0 . o O J 0 1 l ' ' t FAbS O . O O J O O O L F-13S

0 . 0 F i r t S - o . j J J G o O l F A 5 S

- A ^ S = F A u . = f= i ••< '

i - 3 -

- 'J . O .J ">."\l J0 29 - . Q 0 0 ; J J J 8

0 . OOJOJ'5 7

'- i !-i b = FA~S = F A c 5 =

- : . 3 0 0 0 0 0 3 F - = S = G. J J j ^ O - ? F.11S-C-JCGO'JOJ F A 3 S = C . 0 0 C C O 2 1 Fi.^.S = C. J : Ô O J 0 2 r * . i '.=

- O . O J O C J - 1 3 F Í 3 N -

. J . J 0 J V . ' J J L F Û O S =

- c . c o : ^ , 3 : o =i-5S = - 0 . J . v C : 0 2 5 F : . : 3 S ^

1 5 . 9 T 9 S 7 9 ' J r i " 5 =

G . 0 0 J C J 2 T r i > 3 S -

o .o :ouo i9 O . 0 C 0 O G 2 9

1 5 . V v9-79 ' j5 . J j-jU J £.^

0.r[:oor}Z-0 - ^ C W . V J O -

O . G ' 3 9 -0 . : .• :j 16 0 . J . -.• , J 2 l 0 . , V. ' -C J ' j 0 . : •.' •. J C 2 3 U . J v J J Kj 4 .0

O . U J J 0 0 5 7 O . C C G C J l 2 0 .\J\)00 19 1 0 . 0 0 0 0 0 . 3 3 O.O:GGO25 0 . J 0 O 0 J O 3 0 .". -. '^ .— •. "1 ,r.

. 0 . J O U J J J

J . J O O J J i . 2

O . O . J O O ' ) 5 7

0 . J U j j J 1 3

0 . JG j u u - : ' J . •- ' j .. O w .L J

J . G J J o J ^ 1 ' j . G'- J u u 11> o . O G ^ j ^ i : j . j ' ' C O v j j n

j . •- w . c : ' " * j . : . j - j j .'G 3

1 5 . - " ' 9 ^ 7 9 •..

0 • .j'.» j 0 ' j í. "•

DFT of S^, no quantization:

0 $.ir,:iM-.

1 S I i . M L ^ 2 S : - ^ " . . L ^

3 S I ' . . \ \ L

4 S I G - i . \ L ^ 5 S I - J V i L ^ í* ^ Í : 3 ^ . I . ; L ^

7 S . V ; - . :

10 s : ' ' ^ \ A L = 1 1 S I G \ : . L =

12 SJ.G.\£.L = 13 S : G - ^ L =

1 4 S I í - . - . ^ L =

15 S Î G \ - 1 L = 16 S I G ^ - i L =

1 7 S I C . ' . i L ' l A s r r , . \ ' i L = 19 S I G . V i L = 2 0 S [ G ' ' U L =

2 i S I G ' : : Í . L =

i22 S i G N i L = '2 3 S i G M i L = 2 4 S I G - \ A L =

2 5 S Î G " > i L = 2 6 SrG. ' .ûL =

2 7 S I C N i L = 2 6 S I G N i L ^

2 9 S r G \ i L = 3 0 S I G . ^ A L =

31 S IGr : - .L = '32 3 Í C " : - ! . = 3 3 S Í G V V _ = 3 4 S I G . \ : . L = 3 5 S I G . V - L =

3 6 5 I í ^ ' . ' A l =

37 S I O . ' ; - L =

3^^ .S IG* ; -L =

' 39 S I ^ . L i L ^ -tO S I G " Í . : L =

, 41 S I G ' a L -- t 2 S r G ' J 4 L = 4 3 S I " , ; - A L = 44 S I . ' J - L = -r5 i I j " ; *. L -4 6 S I G - . \ L = A 7

4 3

4 9 ' 5 0 5 1

52

i ^ ^ 154 3 3

~>o 5 7

3 -i 59

6 0 6 1

6 2 6 J

s :c">i.iL ?. i r \ i L S I f . " - . L <; i r / ; ; : . S I ' •'••-!-

S IG.-'-'iL s : :•*:.'-L 5 i:;.-^L i i : 'L iL > • W - . . U

i iC :-" L ' î : 'i i L

s I Í : \ : L Sl,<3,*i-L S ! • • 1 - L

3 t .W;. lL

3 i : ' - A L

0 . 0 F = • J . •! 71 J 3 2 9 ,= r:

1 . ' j 7 I 3 l 2 ?. F = 2 . 3 j 7 3 c J d r = : . . • ' . 2 6 . ) 2 O O F = 3 • L 1 i •- 7 6 3 .- = 3 . : G 4 G Ô 4 Í = -

3 . 1 2 . ^ - - J 3 •- =

: . ; U 2 : j i = -2 . ^ 3 ^ 5 : 5 ' î = = : . 3 -» l O f 1 J r =

2 . J.3 7 1-^6.3 - =

1 . - : i 3 5 57 - = i . J2-Í f ô 74 F -

1 . ' / V O 3 2 ' J I " =

2 . J J 0 0 0 O J r = 2 . J á') V 1 c 7 F = 2 . 0 9 4 6 5 2 2 r = 2 . ^ 0 4 " ^ . ' 4 1 F = l . 7:í7 09.J ' * «= = l . ^ - t J 5 l C 5 F = J . 9 d - f 2 2 7 o F = 0 . 4 5 7 5 2 5 3 F=

- .9 , >J J 5 / 3 i, .S F -

- j . 5 c 3 j 7 2 5 - = - u . 9 J 1 7 5 4 3 .- = - 1 . 2 2 - 2 ; - 3 7 - = - L . 2 9 5 2 ' ^ 2 9 P = - i . I 7 6 í j 6 - r 4 .- = - 0 . ^ 9 0 9 4 . ^ 9 - = - O . - 7 ; 2 ^ - » - r ^- =

' J . ) JO-').) l o F = J . t 7 9 2 ' : u 2 ' = J . 3 9 0 9 5 . ) ^ F = 1 . l 7 6 6 c 53 = -1 , 2 9 5 2 9 2 9 r= 1 . 2 : . - 2 : 9 o = = J . > n 7 - i 5 3 ;- = J . 3 J b J 7 3 5 - = • J . 0 3 5 7 3 5 9 = =

- 0 . - ^ 7 5 2 7 2 ^= - J . ' 7 . - S t 2 2 r t O r-- l , 4 - r J 5 i l 7 >- = - 1 . 7 3 7 J 9 - Í O F_

- 2 . JJ-t- ' - . -r ío F =

- 2 . . r : ; 4 6 5 9 ^ = =

- 2 . J 3 J 4 2 5 3 F -

- 2 . J ' J 0 G G B 6 F =

- l .-3^)0 32 7 7 F =

- L . 2 3-^92 2 F -

- 1 . 3 2 3 3 5 ^ - 3 F =

- 1 . 9 0 3 4 1 8 ^ F=

- 2 . V-á 7 l ^ :3 •-; F - 2 . 3 4 1 6 5 1 0 F= - 2 . -j 3 - :> 1 - -í ' ' = - 2 . • ) i ^ t 2 l 4 l t = - j . 1 2 ^ : 6 3 - - ^ t^^ - 3 , 2 G 4 0 5 o 7 - = - 3 . 1 1 3 3 7 1 6 - = - 2 . 3 2 6 0 2 - 1 F= - 2 . 3 f. 78 j 57 ' = - 1 . 6 7 1 3 142 ' = - o . 3 ^ 3 5 6 - - ^

•G . 0 0 0 0 1 5 5 0 . ' ; . j J.J3 J l 'j . •) .i'j'j • 'o5

- J . . J U J u l _ ^

• u . . j 'JGO^ L 7 •0 . JLQOihZ

0 . J O ' ) 0 2 15

0 . 1)1J;o5

- o . J J . . j i - . i ! O . O J O u l - - . ^ G . J Û o G i 7ií j . û J G ' j . j o i o . o O O )3 3 s J . 0 / 0 j :- 2 4 >...•) ) o G i 7 2 J . ' ) u O G 4 S S

- 0 . G O u u O i I • o . 0 JJO -JO

ô . OOJJu.-iS 0 . . ) 0 0 ' J l 14

- j . ouOGOO 5 'u . J '.J U j 'J .i -J J . J J O O O á -u . 0 0 0 J 0 u 5

- 'J . GvJO J o 3 4

- j . G 0 0 u 3 9

0.ojoGuo: 0 . J J J O 10 7 j . G ô u u 7 i-» 0 . J J J G Ô 7 Î J . JL 'GlCd- t j.JJOl-Ll

- . / . ) / •)•)'; ->J - J • J\J ô j .,'j-'

- • j . G . )0GJL7 j . : 0 0 J J 7 5 u . GGOOG.^T O . J. 'H 7.) '.S

- . J . J J : V 7 :

j . ) ' : J . ) J 5 O

- ; . J J O J u j - t - j . J 0 0 G ' v 9 9 ->:-..) i J G J L2

0 , j ) j V ' . ) : o ' j . O J J ' ) l 2 9 . JOOÔÍ L -

j . . ' JO J L 2 1

- j . j o 0 ) G 3 i - ' i . " ,1 ) . ) J u 2 4 -']. . ^ j v i o L J'J

O . .y J J O u J:; J . J o O O l -»2 . j . G j 'u í; j . 9 0 , J j O 0 2 .'5

..< . '..iOJ' J l - J

- u . .; u J ^ l - 1 .

- w . . ' . / ' iL i 1 - 2 (3 . ^o J J L L -

- u . u u O J 3 5'J

- O . J u J L6 L9

- 0 , . , ' J ' . J i L 7 - • J . O J . J L 3 5 Í

- : . J U C 2 6 71

0 . C F -\ ft 9 =

- 6 3 . ^ 9 ^ ^ i í 7 F A á ' j -- ^ 7 . 99C)v J 3 ^ ~.\r>::-

0 . 0 G o ú 5 ? . i FAeS = - 3 1 . 9 9 9 ' ; - j 9 5 F.:;^, 5 =

- . G J J C 3 74 F-\-íS = - G . ) 0 O v l 11 r A ô 5 =

- C . ) . l ' -»Oi55 F : O ' ' > =

-o .JuG 'C i ' ^G '^ F A 6 S =

- O . . ; O . J 0 3 7 2 F : . : S =

C . ' j ^ G •.-•3 2 5 r A 2 > = — o , ' j . : j .j <. 3 r Á 5 : =

c. í J 7 = - i o S = 0 . •:• 'G J i.; L 6 5 F A 'i" S =

. • J ) 2 9 C^r, ^ ^ - 0 . - C . > J o ' . ,10- p i rj r ^

- 0 . 0 J 0 ' : . - Ú T 6 F A - S =

- C . J ' ) 0 0 ô 3 t FA:5 5 = - 0 . 0 0 0 0 1 7 ' - ^ F Í ! Í S =

0 . ' )0 ) C 0 9 4 FA J 5 = C . G ú •: G 'G 6 2 F.". .-3 5 =

- G . G G j j J G L F i o 5 = C.GG)00^^=í PX'^S = C . Ú G G 0 L L 5 F A n ' j -

-C . . JG . . .U175 F a 3 S = C.C t G 0 : 6 = A ^ i =

- O . G G J O L 9 á F A - S = - ô . J O J 0 0 9 5 FA 7 S -

'0 . 'J C G G i 1 ^ - A ^ 5 = - C . C G J 0 J ^ 2 F i - ^ S -

C .OOGCL53 F » i - - > -. J J C ' J J 0 5 FíLrS =

0 . j - A -j 5 ^ - ' J . J : G O L 5 3 F A - Í S =

- C . ) G J o 3 G 5 r A í ' i = C.GO'.:C0 5 3 F t - í S ^

- G . G Ô J G 0 3 6 rA;'>S = C. ) J )0 J >6 F * , D S =

- 0 G G G 2 5 9 F A 3 : -

C. : ' j j ' j J 5 0 F A " 5 -0 . J G : 0 1 76 F A 3 5 =

- G . G G J U 0 9 7 F A 3 S = j . ' j •" J •: 0 ? 7 . c b--

\- , - 0 .

0 . 0 .

c. .

- 0 .

-c. G.

c. 0 . 0 .

- ' - .

0 . c •

3 1 .

- : .

) . t ' )Ol 35 ;'") ) . .")9ô

C C J u 5 4 4 )-GJL'o3 2

j G u ' . G 8 6 JG 'Ju . )34

)••>.;<.•• 1 6 • *

0 j u U L 71

C G ) 0 0 ' 5 i G C G.. o 0 6 L : J G L 3 5 ) ' ; ' ) C o 3 9

J 0 0 0 - ^ 0 4

JGuGO l.-:.

C G 0 u : 3 1 GuG()3 50

, i ; 9 9 3 9 0

, .j j ' j i. '^ 5 3

r A-^S F i 6 S -F A 5 S ^

= A i S ^

F A D S ^

- i 5 S =.''. r S F i : ) S

F û :3 S • •~ Å ~*.'.

r A 35

F A i S

" A 5 S

- i b S P A -í ' . F A 5 S

= A3S

FA SS

4 7 . ^ 9 ^ . - ' 9 3 2 FA.-:S-6 3 . 9 9 ' ^ 9 0 t ! ' t FA:>S^

O.GOCO 1 5 3 6 3 . J9--r9 6 9 5 4 7 . ' T - t V ^ G ^ ^

0 . . O O ' J 5 - V

3 1 . 9 9 9 <3 3 9 5 0 . . '&OG5 2 0 0 . : o ' j u i i 4 2

C. j - G e C l o 9

J . OG J ' J 3 - / 5

0 . j i ; j ' G 1 7 ^ 0 . • . ' •JuOo74

0 . \ : . ' 0 3 - i

U . j t . J O l c 7 'J. J ' J O ' . ' I 7*

O . ' j vj O-j 5 Z o 0 . J O J O 0 9 1 0 . 0 0 0 0 1 0 3 0 . C C O O 2 0 0 0 . 0 0 O O L 4 H

U.0^^ J 0 6 2 0 , J0G-. ju3 J O.O.-^OOL^Z O . O O J i i i s

0 . 0 0 0 0 l 7 9 0 . G G 0 Go5 0 . 0 0 O O L 9 á O . J G J J L 4 3

0 . J.^ j 79 2 0 . 0 ' ) 0 ' J Í G 3 0 . J G G L l u O

O.G.. . J I - - 4 0.0 j-J'-' .'-' '. ) O . G U J G 1 5 3

O.OG J 0 3 0*3 O.CGGGO'32 O . G i : O O i : 3 O . O O ' J O l G o 0.GGOO 2 o 9 a . G J u ' . 0 " ^ 5

. G : G C I 7 9

0 . G.. ; .j L 3 3

O.uCOO'ooí 0. J G G O O 3 3

0 ._OU01c;-J 0. J : G G l - 7

0.-GJG5r-0. GGOo'b-t'-t O.OQO^O^t l 0 . }j'j'JÔS3 G.GG;_ 2-4

0. j - *: j i 71 O.GO:'^)IOT O.UCGGO:G 0 .00002 43 ú. G '-j u : - -• 4

0 . J o 0. J '• -f 3 O.Gu )JL53 O.v.OGJi 13 O....UJ i'D L7

3 L - 9 9 -i -> 3 9.)

G .^Go'J 50 6 47. .9-3779 6 3. '"• -• t T j •? 4


DFT of S,

0 S I G ^ : A L =

1 S Í C . ' . A L =

2 S I C N A L =

3 S I G ' i - i L =

4 S I G . \ A L =

5 S I G M A L ^

ô S i r , ^ a L -

7 5 I ^ - ' . A L =

8 S I G M A L =

9 S ! G f i 4 L =

l . Q . . S I G . \ A L = _

1 1 S Î G . ' . A L =

1 2 S Î G ^ A L -

1_3_ .S IG . \ 'AL= .._

1 4 S I G N . i L =

1 5 S Î G N A L =

1 6 S I G . N A L =

1 7 S I G N A L =

1 8 S I G . \ A L =

1 9 _ S Í 0 . * : ' 1 L =

2 0 S I G N A L =

2 1 S I G . N A L =

J 2 _ S I G . N A L f

2 3 ' S I G N A L =

2 4 S I G ' : A L -

2 5 S Î G N A L =

2 6 S I G . * ; i L =

2 7 S I G N A L =

. 2 3 _ S I G \ A L =

2 9 S I G . \ A L =

3 0 S Î G ' S A L -

3 1 S I G N A L =

scaled by 10:

j 3

7

9

1 0

9

7

3 ,T

- 3

r ?

- 9

- 1 0 - 9

- 7

- 3

0

3 7

9

1 0

9

7

j

0

- 7

- 9

- 1 0 - Q

- 7

- 3

, _

- -c =

' -P =

F = P =

- = = =

- = <P-

F =

F =

F =

F =

F = c =

c ^

F = ? = c -

c —

F -c -

P =

c =

F =

F = í- ~

c =

c =

U . O

J , j

G . l O O O o G O

0 . 0

< . . . . V . .

u . o

0 . I u 0 ) 0 0 0 0 . 0

J . 0

0 . 0

- 0 . I G G - G u G .

J . 0 0 . 0

0 . 0

- 0 . LCOOGOO

0 . 0

Q .(0

0 . 0

- 0 . i G O O O u O

u . 0

o . o ' " 0 .'.')

. - G _ . l O p O O O O .

0 . 0

u . û

0 . J

0 . I G O G O O O

0 . 0

- û . O

ú . J

ô . IGOGGOO

J . ')

C . O

• ^ - J - I l . i 9 9 í . - y s S .

0 . :•

0 . 0

0 . 0

? . 1 9 9 9 ' j ; - :

0 . 0

O-'G

0 . 0

- 0 . i G J O O G O

O . ' J 0 . 0

0 . 0

1 . 3 9 9 9 9 9 6

0 . 0

0 . J

O . G

- 1 . 3 9 9 9 9 9 6

O . J

' O . G

0 . ) 0 . IGGGGOC

" ' o . o O . j

0 . 0 - 2 . L 5 C / 9 9 9 S

0 . 'u

0 . 0

0 . J 1 l . ^ ^ . ^ ' ^ ^ i q í .

0 . )

r A B S =

' A ^ í , =

F A â 5 =

'F A 3 5 = r u ^ ,. —

F A - ". -

•- A ••• > =

F A 1- - =

F A - 5 = î- * .i c. -

F A í j 5 =

F AÍ5 S =

F A 3 S =

F A * S =

= A 3 S = '

- A 3 S =

FA í>S =

F A 8 S =

- A - . S =

CA>3S =

F A » i 5 =

= A B S =

r î . . ^ 5 = - A 3 G =

F A 3 S = ^ 4 3 ' ; ^ =

' ^ i ^ s ' - "

F A 3 S =

F A ^ 5 =

- A ^ ^ 5 = r '. ?. 5 3

F t f H S =

T . J

0 - Û

l U 9 0 0 ^ 1 9 2

• j . o

0 , J

" O . J

2 . 2 3 2 2 7 0 5

0 , G

' 0 , 0

0 . 0

o . i - ^ i ^ ^ i :

0 - 0 0 . 0

0 . 0

1 . 4 0 3 5 c > 4

0 . 0

0 . 0

0 . 0 l . 4 : j 3 5 o 6 4

0 . 0

0 . 0

0 . 0 0 . 1 - 1 4 2 1 2

O . J '

0 - ^ 0 . J

' " 2 . 2 3 2 2 7 G 3

0 . 0

U . 9

0 . 9 1 L . 9 0 J - 1 --2

0 . '•<


DFT of S^, scaled by 100:


0 S I G f : A L = l S Î G . \ A L =

2 s : Í ^ \ A L = 3 S i r . \ - L -4 S I G \ A L =

' 5 S : G V A L =

6 S Î G f : : L = 7 S Î G - V i L =

"S S Î G * i L =

9 S Î G . V i L ^ I J 5 Î G ' \ A L =

l 3 Î G N A L = 12 S IG- \AL = 13 S Î G N A L = I ^ ' S Î G N A L * 15 SIG.NAL = 16 S I G N A L = 17 S I & . * ; A L =

18 S I G . \ a L = 19 S Î G f î i L =

'20 ' S I G . \ A L = 2 1 S Î G N A L = 22 S IG. " ;AL = 2 3 S I G \ - L = 24 S t r , - , i i . =

25 S I G \ A L = 2 6 S î G . \ A L =

27 S I G r i i L = 23 5 : G ^ A L = 29 S ÎG.-.AL = 3 0 S Î G ' . A L = 31 S Î G \ i L =

J F =

3 i - = _ . 7 J F = C •> = X

1 . • . ^ = _ . _ 9 2 ? = ' ' "

7 0 = = 3 3 " =

' Q = = - 3 3 5 = -7,-> = =

- 9 2 " = - 1 0 0 F =

- 9 2 = = - 7 0 F =

- 3 3 F = 0 F =

33 F = 7 0 = = 92 F =

1 0 0 F = 9 2 = = 7 0 F =

3.3 - = J = =

- 3 3 « = - 7 0 F = - 9 2 r =

- I J O - = - 9 2 —^ -7. .1 - = - 3 3 .= =

o . 0 0 . 0 j . 0 '-• . u

_ 0 . : ^ • ,. J . 0 2 u 0 0 0 0 G . •''

0 . G O . 0

- j^ JZ J G J I . ' O

J . O U . 'J 0 . J J . O u . :> 0 . 0

o.O G . G 0 . 0

"'J.O 0 . 0

- 0 . 0 2 0 0 0 0 ) 0 . 0 0 . 0 J . J

"• J . 0 2 G G 0 G G 0.0 G.O 0 . G \ • . 'j

_ 0 . 0

G . . j

0 . G - 1 5 . - 3 9 9 9 9 1

C.O __ 0 . 0

V - . ' , '

0 . 2 3 9 9 9 9 9 0 . 0 C .O O . û

- . l 1 9 9 9 9 ^

C. 'u 0 . 0 0 . 0 O . l G G o O O o 0 . 0 j.O

Q.'Q'

- 0 . 1 3 G 0 0 0 G 0 . 0 0 , 0 0 . 0 0 , L 1 9 9 9 9 9

0 . 0 0 . 0 0 . ''

" - G . 2399^^9 =>

0 . J 0 . 0

"'c.^? 1 5 . - 5 9 9 9 9 1

_ . ^ î _ ^

F A 3 S : F A - 5 r Zy-J ^-

F .-s 5 '"v .

F A 5, S 'c \ ^.-

- A Ô : : c _ < _ • - : ,

- A ^ S :

- A = 5 ^ -i-y'i-

- A G •* : -AfvS = AoS^ •-A ; 5

= : i 3 5 P-*5S

"f^ A S S = A3S C i 5 S

F A a S FA2S F A c 5 = i 3 S ^ A i S f^A^ S p',-3 5 F A a s f^:í3 5

P A3S - * ? >

O.G ' . \ . )

1 5 . ^ 5 9 9 ^ 9 L 0 .0 0 . 0

"C". G "' 0 . 2 4 J 8 3 1 9 ? .9

""o.o 0 . 0

__ G . I 2 i 6 5 5 2 O.O '

0 . 0

.-9-r? á'".'i3ooooo O.Û 0 . 0 0 . 0 ' 0 . 13C0ÛG0 0 . 0 oV.j 0 . 0 0 . L 2 1 6 5 5 2 0.0 Û.G

_ 0 . " ' 6 . : - 3 3 3 19 0 . 0

O . G 1 5 . - 5 9 <-» 9 ^ 1

0 . . :•


DFT of S^, s c a l e d by 100 , N = 256


[Printout, rows 0 to 62. Columns: row index, SIGNAL=, F= (real and imaginary parts), FABS=. The SIGNAL column repeats the 16-sample cycle 0, 38, 70, 92, 100, 92, 70, 38, 0, -38, -70, -92, -100, -92, -70, -38. The F and FABS entries cannot be matched to rows in this copy.]

169

[Printout continued, rows 63 to 127. Same columns: row index, SIGNAL=, F= (real and imaginary parts), FABS=. The SIGNAL column continues the same 16-sample cycle. The F and FABS entries cannot be matched to rows in this copy.]

170

[Printout continued, rows 128 to 191. Same columns: row index, SIGNAL=, F= (real and imaginary parts), FABS=. The SIGNAL column continues the same 16-sample cycle. The F and FABS entries cannot be matched to rows in this copy.]

171

[Printout continued, rows 192 to 255. Same columns: row index, SIGNAL=, F= (real and imaginary parts), FABS=. The SIGNAL column continues the same 16-sample cycle. The F and FABS entries cannot be matched to rows in this copy.]

172

DFT of S.., scaled by 1000:

 0 SIGNAL=     0 F=  0.0         0.0         FABS=  0.0
 1 SIGNAL=   382 F=  0.0         0.0         FABS=  0.0
 2 SIGNAL=   707 F=  0.0010000 -15.9459991   FABS= 15.9459991
 3 SIGNAL=   923 F=  0.0         0.0         FABS=  0.0
 4 SIGNAL=  1000 F=  0.0         0.0         FABS=  0.0
 5 SIGNAL=   923 F=  0.0         0.0         FABS=  0.0
 6 SIGNAL=   707 F=  0.0         0.0190000   FABS=  0.0190000
 7 SIGNAL=   382 F=  0.0         0.0         FABS=  0.0
 8 SIGNAL=     0 F=  0.0         0.0         FABS=  0.0
 9 SIGNAL=  -382 F=  0.0         0.0         FABS=  0.0
10 SIGNAL=  -707 F=  0.0        -0.0040000   FABS=  0.0040000
11 SIGNAL=  -923 F=  0.0         0.0         FABS=  0.0
12 SIGNAL= -1000 F=  0.0         0.0         FABS=  0.0
13 SIGNAL=  -923 F=  0.0         0.0         FABS=  0.0
14 SIGNAL=  -707 F= -0.0010000   0.0270000   FABS=  0.0270185
15 SIGNAL=  -382 F=  0.0         0.0         FABS=  0.0
16 SIGNAL=     0 F=  0.0         0.0         FABS=  0.0
17 SIGNAL=   382 F=  0.0         0.0         FABS=  0.0
18 SIGNAL=   707 F= -0.0010000  -0.0270000   FABS=  0.0270185
19 SIGNAL=   923 F=  0.0         0.0         FABS=  0.0
20 SIGNAL=  1000 F=  0.0         0.0         FABS=  0.0
21 SIGNAL=   923 F=  0.0         0.0         FABS=  0.0
22 SIGNAL=   707 F=  0.0         0.0040000   FABS=  0.0040000
23 SIGNAL=   382 F=  0.0         0.0         FABS=  0.0
24 SIGNAL=     0 F=  0.0         0.0         FABS=  0.0
25 SIGNAL=  -382 F=  0.0         0.0         FABS=  0.0
26 SIGNAL=  -707 F=  0.0        -0.0190000   FABS=  0.0190000
27 SIGNAL=  -923 F=  0.0         0.0         FABS=  0.0
28 SIGNAL= -1000 F=  0.0         0.0         FABS=  0.0
29 SIGNAL=  -923 F=  0.0         0.0         FABS=  0.0
30 SIGNAL=  -707 F=  0.0010000  15.9459991   FABS= 15.9459991
31 SIGNAL=  -382 F=  0.0         0.0         FABS=  0.0
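The 32-point listing above can be approximated with a short sketch. It is not the microprogrammed routine that produced the printout. It assumes that the input is a sine with a period of 16 samples, multiplied by the scale factor and truncated to an integer, and that F is an unnormalized DFT divided by the same scale factor. Both assumptions are inferred from the printed SIGNAL values and from the peak magnitude near N/2 = 16. The names `sampled_sine` and `dft` are illustrative only. Floating-point arithmetic is used here, so the small residual terms will not match the printed digits.

```python
import cmath
import math


def sampled_sine(n_points, period, scale):
    """Integer samples of scale*sin(2*pi*n/period), truncated toward zero."""
    return [int(scale * math.sin(2 * math.pi * n / period))
            for n in range(n_points)]


def dft(samples, scale):
    """Direct O(N^2) DFT, divided by `scale` to undo the integer scaling."""
    n_points = len(samples)
    spectrum = []
    for k in range(n_points):
        acc = 0j
        for n, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * n / n_points)
        spectrum.append(acc / scale)
    return spectrum


# Assumed parameters: 32 points, 16-sample period, scale factor 1000.
signal = sampled_sine(n_points=32, period=16, scale=1000)
spectrum = dft(signal, scale=1000)

# Bins whose magnitude stands well above the truncation residue.
peaks = [k for k, f in enumerate(spectrum) if abs(f) > 1.0]

for k in peaks:
    print(k, signal[k], round(spectrum[k].real, 4), round(spectrum[k].imag, 4))
```

Under these assumptions the energy concentrates in bins 2 and 30, with a negative imaginary part at bin 2 and a positive one at bin 30, which is the same pattern as the listing.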

DFT of S^, scaled by 10:

173

[Printout. Columns: row index, SIGNAL=, F= (real and imaginary parts), FABS=. The entries cannot be matched to rows in this copy.]

174

DFT of S^, scaled by 100

[Printout, rows 0 to 63. Columns: row index, SIGNAL=, F= (real and imaginary parts), FABS=. The entries cannot be matched to rows in this copy.]

175

DFT of S^, scaled by 1000

0

l

2

3 u

5 6

7 0 O

9

1 0

1 1

1 2

1 5

i 4

1 5

1 6

1 7

1 3

1 9

ZO

2 1

2 2

2 3

2 4

2 5

2 6

2 7

2 3

2 9

3 0

3 1

3 2

3 3

3 4

3 5

3 6

3 7

3 3

3 9

4 J

4 1

4 2

4 3

4 4

4 5

T 6

Í . 7

4 3

4 9

5 0

5 1

5 2

5 3

5 4

5 5

- • o

5 /

5 3 5 0

6 0

6 1

6 2

6 3

5

5

i

.;

5

S 5

S

S

5 s 5 s s 5 s s s s s s s 5

S

s s s s s s s s

f

5 5 S s s s s S i

S)

5

s, 5

5

5 !

S

5 i

S >

j :

S .

S i

) '•

> '.

5 1

i

5 1

. }

5 I

5 ]

5 í

'. j • ' . - . -

î G.' A L

' G". i L

. '->• 1 - í.

G''. i ' . -1 r- . , . l • J . • - ..

[ Ci'.'-....

I G " , - L

: G \ . L

Î G ' . A L

I G ' . i L T /~ -..

; G ' ' . A L

: G : \ A L

I G N A L

I G . N i L

I G r . A L

Î G M ^ L

L G N A L

Î G ' ' . A L

: G \ Í L I G . ' 4 i L

I G \ A L

I G . N A L

Î G ' I A L

:G>.AL LG' . IAL

i G ' r A L

l ú . N i L

[ G . ^ A L

:G"^AL

IG' ' - iAL

: G * ' Í L

• \j • * • -

[•oy>\'.

L G Í . A .

[ G . \ i L

: G \ A L

[ G ' . A L

G ' i A L

:G' ;AL G \ i L

G N - L

[ C \ i L

G.N' iL

: G : \AL

G''. i L

j " -• i .

[ G ^ - i L

G ' .AG

C';- ' .L

G ^ ' A L

G.' .AL

" % A L

G *-; A L

G . \ A L

G\ . :^L

G. \ '>L

vj '•• - L

G.SAL

G * ; - L

S V A L

G.'>IAL

*" ' . A '

0

í 7 l 1 6 7 1

2 3 3 7

2 o 2 6

3 1 . 3

3Í J-r

2 3 1-r

ZbJ>~f

2 3 - 1

2 G B 7

L =• J ô

l i 2 3

1 3 2 3

1 = 0 0

2 U o U

2 J 3 0

ZQ'i^

2 J U -

1 7 3 7

1-T-rO

9 3 - r

4 5 7

- 3 5

- 5 3 5

- 9 3 1

1 2 2 3

1 2 9 5

1 1 7 6

- 3 9 U

- - 7 9

U

4 7 9

8 9 0

1 1 7 6

1 2 9 5

1 2 2 3 9 3 1

5 8 5

ú5

- ^ » 5 7

- 9 3 4

1 4 - 0

1 7 3 7

2 0 0 4

2 u 9 4

2 0 8 0

2 U U U

1 9 G U

1 3 2 3

l a 2 3

1 9 0 3

2 0 3 7

2 3 4 1

2 6 3 4

2 9 1 - r

3 1 2 2

3 2 J 4

3 1 1 3

2 3 2 6

2 3 : 7

i û 7 1

- 3 7 1

F =

F = r- _

— —

f -

" " F

,- _

" -c —

F =

F =

F =

F =

F =

c =

F =

F =

F =

F = í "

F =

r =

^ =

- = F =

F =

= = F =

= =

F =

F =

= =

" =

= =

= =

- = = =

F =

r- =

F = c -

F =

= = c -

f =

F =

= =

r =

F =

F = r -

c =

F =

F =

F = p =

- ^

í —

r* _

r = r —

F =

F =

U . O L 7 o O O U

O . G L 6 C G G 0 . -j

C . 'G

U . G J G O Û U Û

o.u O . G J I G G O G

o . UL i .)U U

- J . ' j j 3 j v j U ' J

- U . G i t u O O U

O . 0

O . G

[Computer printout: a column of signed decimal values, a column of repeated "FABS =" labels, and a further column of unsigned decimal values. The individual figures are not recoverable from the scan.]
