
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-7, NO. 3, MAY 1981

On the Design of a Microcode Compiler for a Machine-Independent High-Level Language

PERNG-YI RICHARD MA AND TED G. LEWIS

Abstract-A translator system employing a partial compiler, intermediate language, and three-pass code generator is described that produces compact microcode for a class of horizontal microinstruction machines. The techniques described include a system of translation clues, tags, and rules for identifying program variable usage patterns, flow-of-control, and parallelism. The system has been implemented as a HLL compiler, a simulator, and an optimizer for a class of horizontal machines.

Index Terms-Code compaction, code optimization, high-level language compilers, microcode optimization, microprogramming, portability, translator writing systems.

I. INTRODUCTION

THE explosive rate of progress of hardware technology has made the microprogrammable processor characterized by horizontal microcoding extremely attractive for many high-speed and real-time applications, such as signal processors. However, the human microprogrammer currently has little but an assembler and a text editor to help him with code development [5]. The lack of software tools to support microcode generation results in high costs and poor reliability, especially as the volume of microcode increases. The application of software tools to microprogramming will be termed "firmware engineering" in the remainder of this paper.

The most obvious solution is to increase firmware tool capability by encoding the intended application program in a high-level language (HLL), and then to develop a translation system for conversion of the HLL into horizontal microcode [15]. Unfortunately, this approach is complicated and difficult because of the following problems [17], [22]-[24].

1) Machine Variety and Complexity [2], [11], [12]: The object code of the microprogramming system is hardware-oriented and timing-critical horizontal microcode. The microprogrammer has to be familiar with the various complex and intricate features of these machines before he can provide the translation system with this machine-dependent information.

2) System Portability: The time and effort needed to produce the translation system should not be wasted when changing the underlying machine. Furthermore, we should be allowed to implement the firmware of a system in parallel with the hardware. Thus, the firmware engineer need not be aware of the underlying machine. But the variety of microprogrammable machines is in direct opposition to system portability and machine independence.

3) Concurrency Utilization: One benefit of microprogramming is that horizontal microinstruction formats offer added speed of machine operation only if concurrent microoperations can be combined into a single microinstruction [1], [4], [18]-[21]. The concurrency detection rules reported in the literature are usually machine dependent [17], and the optimization algorithms which determine optimal compaction of a sequence of microoperations solve an NP-hard problem [7]. Devising a practical compaction algorithm is still an open research question [5].

Manuscript received May 1, 1980; revised December 29, 1980. This work was supported in part by the National Science Foundation under Grant MCS76-20710 in cooperation with the Department of Computer Science, Oregon State University, Corvallis, OR.
P.-Y. R. Ma is with the TRW Systems Group, Redondo Beach, CA 90278.
T. G. Lewis is with the Department of Computer Science, Oregon State University, Corvallis, OR 97331.

To study these problems we designed and implemented a compiler to produce microcode. This compiler, denoted a microcompiler, is shown in Fig. 1. The application program is encoded in a high-level language (HLL). A machine-independent partial compiler performs the lexical, syntax, and data flow analyses of the HLL and produces a stream of machine-independent intermediate language (IML) statements. The IML statements consist of: 1) a set of instructions in quadruple format which are produced from the HLL and 2) a declaration portion that records all HLL information, e.g., a symbol table obtained from the partial compiler. The IML version of the intended application program and the underlying machine information are input to the machine-dependent translation subsystem. The underlying machine is described by a set of microoperations {Mi | i = 1, ..., n}. Each microoperation is a machine primitive operation represented by <OP, I, O, F, P>, where OP = function, I = input data set, O = output data set, F = microinstruction fields, and P = clock phase. This is called the field description model; it is the collection of microoperations along with a methodology for checking parallelism among the microoperations.

By parallelism we mean that two microoperations can be executed in one control store cycle. It is also possible that two or more microoperations may be invertible. Invertibility means that a particular microoperation can be exchanged with certain preceding microoperations. Invertibility is the source of NP-completeness in microcode optimization, as we see later.



[Fig. 1. General structure of the microcode compiler. The HLL program (specifying the intended application) enters the machine-independent partial compiler (syntax, lexical, and data flow analyses), which emits IML, a machine-independent intermediate step. In the machine-dependent translation system, Pass 1 uses the macro table (which uses the microoperations to define the IML instruction set) to decode the IML into MDIL (machine-dependent code); Pass 2 allocates the IML variables into target registers and links the control flow, producing microoperations; Pass 3 allocates microoperations into microinstructions and performs address assignment, producing the target machine microcode. The field description model defines the target machine microoperations and the microoperation parallelism detection rules.]

The translation system requires three passes over the IML representation of the source code to produce compact microcode. In Pass 1 a macro table is used to translate the IML

into a set of machine-dependent statements (MDIL). For each MDIL statement some operands are bound to machine units; some operands still hold the symbolic variables from IML.

Pass 2 allocates the remaining symbolic operands of MDIL to the target machine's general purpose registers and converts the MDIL statements to the corresponding binary microcode. In general, the number of symbolic variable operands in a given program is greater than the number of machine registers. Thus, each register must be shared by more than one symbolic operand. A register allocation and deallocation scheme swaps operands between the machine's main memory and its general purpose registers. Within the MDIL stream, different symbolic variables may use a register at the same time when there is more than one branch statement leading to the same label statement. A control flow analysis algorithm minimizes the number of swaps between machine memory and registers. After all operands of an MDIL statement have been allocated registers, the field value and timing phase are assigned to each statement. The resulting codes from Pass 2 are called microoperations.

Pass 3 is used to increase system throughput by compacting the sequence of microoperations into the least number of horizontal microinstructions. However, complete optimization of microoperations produced from a portable high-level language is known to be an NP-complete problem [7]. Thus, a linear order algorithm is developed which may not produce optimum compaction, but produces the "best" possible compaction given linear time to scan the stream.

Finally, the microcompiler was tested and studied using a HLL called VMPL and the PDP-11/40E as the target machine. VMPL (virtual machine programming language) is a machine-independent structured programming language used to implement emulators. However, the design of the microcompiler discussed here is not restricted to a particular HLL or a specific microprogrammable machine. Any IML can be designed for a class of problems and freed from a specific hardware operation set. In fact, it is desirable to alter the IML instruction set to better reflect the HLL being translated. The field description model is intended to be a machine-independent format, separate from the compiler. The contents of the model and the macro table are redefined in order to implement the microcompiler for another microprogrammable machine.

II. PARTIAL COMPILER

The partial compiler translates the HLL source program into intermediate form by lexical scanning and syntax analysis, just as any other compiler does; however, the unusual demands placed on the performance of a microprogram mean we must use additional information to produce the "best" possible object microcode. Thus, code optimization is aided by the partial compiler in three general areas:

1) attaching register allocation "clues" to symbolic variables in order to guide the microcode generator into a "least amount of swapping" pattern;
2) attaching flow-of-control tags to the segments of straight-line microcode so that a program flow analysis can be done; this assists both the variable-to-register binding and the microcode compaction algorithms;
3) producing an IML format which provides opportunities for subsequent code optimization.

The first area is satisfied by the partial compiler using a three-character clue attached to each entry in the symbol table:

1) Scope:
   Local variable
   Global variable
   Subprocedure name.
2) Activity:
   Temporary variable
   Permanent variable.
3) Type:
   Simple variable
   Memory
   Stack
   Field
   Flag (condition code)
   Parameter (actual)
   Parameter (formal)
   Procedure
   Concatenated variable
   Constant.
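As a concrete illustration (not part of the original system), the three-character clue can be pictured as a record attached to each symbol table entry. The following Python sketch is hypothetical; the one-character encodings are invented stand-ins for the categories listed above.

from dataclasses import dataclass
from enum import Enum

# Hypothetical one-character encodings for the three clue categories.
class Scope(Enum):
    LOCAL = "L"
    GLOBAL = "G"
    SUBPROCEDURE = "S"

class Activity(Enum):
    TEMPORARY = "T"
    PERMANENT = "P"

class VarType(Enum):
    SIMPLE = "V"
    MEMORY = "M"
    STACK = "K"
    FIELD = "F"
    FLAG = "C"
    PARAM_ACTUAL = "A"
    PARAM_FORMAL = "R"
    PROCEDURE = "X"
    CONCATENATED = "N"
    CONSTANT = "0"

@dataclass
class SymbolEntry:
    name: str
    scope: Scope
    activity: Activity
    vtype: VarType

    @property
    def clue(self) -> str:
        # The three-character clue, e.g., "LTV" for a local temporary simple variable.
        return self.scope.value + self.activity.value + self.vtype.value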



The second area is accommodated by the partial compiler using additional tags or modifiers which aid the code compaction algorithms:

Tags:
   Block code start/stop
   Label of an IF-THEN-ELSE branch
   Label of a GOTO
   Label of a FOR loop
   Label of an EXIT (from a procedure)
   Label of a CASE selector.

Finally, the design of an IML (machine independent intermediate language) is guided by the application, the class of underlying machines, and the overall firmware engineering goals. Malik [13], [14] studied several candidates for a suitable IML.

IML CANDIDATES:
   3-operand format
   2-operand format
   1-operand format (stack)
   Polish notation
   Program tree

He studied these five formats with respect to the number of instructions, instruction "size," stack and register requirements, complexity of interpretation, and Halstead's information-theoretic measure of "level" [25]. The 3-operand format (quadruples) yielded the least execution time estimates, the highest level in Halstead's measure [25], and provided the greatest opportunity for subsequent economization of microcode.

The results of the partial compiler are machine independent, as illustrated by the following example of a VMPL source code segment and its resulting IML object code.

Proc: FETCH;
   dcl global use MEM, IR, PC;
   IR = MEM[PC];
   inc PC;

Proc: DECODE;
   dcl global use IR, OPCD;
   OPCD = OPCODE(IR);
   select (OPCD, 8) from;
      (0, MRI);
      (1, MRI);
      (2, MRI);
      (3, DCA);
      (4, JMS);
      (5, JMP);
      (6, IO);
      (7, OPT);
   endselect;

This is a piece of VMPL source code taken from a program to emulate the PDP-8 instruction set [13], [17]. Procedure FETCH fetches an instruction IR from main memory location PC. The PC is incremented to point to the next instruction in MEM. The DECODE procedure extracts the opcode from the OPCODE field of the instruction stored in IR. Then, if OPCD is 0, 1, or 2, the MRI (memory reference instruction) group is executed by branching through the select clause.

The IML version of this piece of HLL source code is given below along with (added) comments to document the correspondence with the VMPL statements.

FETCH:
   RMOVE  MEM PC +T.001   : read MEM(PC) into a temporary T.001
   MOVE   -T.001 IR       : assign IR = T.001
   INC    PC              : increment PC

DECODE:
   EXTR   OPCODE IR OPCD  : extract field OPCODE from IR and
                          : assign it to OPCD
   SLCT   OPCD C8         : select one of 8 possible branches
          C0 SMRI         : branch to subprocedure MRI
          C1 SMRI
          C2 SMRI
          C3 SDCA         : branch to subprocedure DCA
          C4 SJMS         : branch to subprocedure JMS
          C5 SJMP         : branch to subprocedure JMP
          C6 SIO          : branch to IO
          C7 SOPT         : branch to OPT

The temporary variable T.001 is used as an intermediate step in case the machine cannot read data from memory into a general purpose register directly. This temporary variable may be redundant if the data can be transferred from memory to a register directly. However, code efficiency is not affected, because the redundant variable can be eliminated by the later code optimization phase.

This example also illustrates the system of tags used to aid in code optimization. The + and - tags on temporary variable T.001 indicate that it can be removed because its "life" is brief. Constants are tagged using a C prefix and subprocedures are tagged with an S.

The IML overall structure consists of a program portion and a


delayed information portion. The program portion has a declaration part and a main body. The main body is in single-entry multiple-exit block structure and is used to represent the application program. The declaration part declares all variables used in the main body. The delayed information portion is created to contain information that is specified in the HLL program but cannot be directly supported in the IML main body, due either to infeasibility or to highly machine-dependent features, e.g., a machine-implicit I/O request.

The partial compiler passes this form of the translated HLL program to a three-pass code generator that binds symbolic variables and instructions to the real machine, and compacts the resulting bound instructions into machine microinstructions. This three-pass process is the subject of the following sections of this paper.

III. FIELD DESCRIPTION MODEL TO DESCRIBE THE TARGET MACHINE

The following goals are set up for designing the target machine model used in the microcompiler system:

1) the format of the model is machine independent, so that it easily fits other machines;
2) the model is comprehensive, in that it includes all target machine information needed to translate symbolic IML code into actual target machine binary code;
3) the model provides an easy way to detect possible concurrency between any two operations.

Several other researchers have suggested similar models. For example, Dasgupta [3], [4] describes a target machine in terms of a sequence of microoperations and all the hardware units used by each microoperation. This model is a step toward a general description system, but does not provide all of the information needed for concurrency detection among the microoperations. DeWitt [6], [7] describes a target machine as a set of blocks and a set of configurations. Each block consists of concurrent microoperations and each configuration describes a legal combination of blocks. This model provides a correct method to detect the concurrency of the microoperations, but much effort is needed to identify legal blocks and legal configurations. These two models suggest a modified model, called the field description model, which meets the goals proposed previously.

The field description model (FDM) is a compact representation of the machine's software and hardware capability. It consists of a set of machine microoperations which replace the intermediate language (IML) instructions produced by the partial compiler; see Fig. 1. The FDM is generally divided into two parts [10]: a hardware description and a software description.

A. Hardware Description

The following information about the machine must be known:

1) word size and memory size;
2) arithmetic mode;
3) status registers used to display flag settings, e.g., carry, overflow;
4) storage devices:
   a) data memory, used to store all variables declared in IML;
   b) control memory, used to store the final version of the application program;
   c) general purpose registers (GPR's), used to temporarily hold the IML variables;
   d) working registers, used to perform ALU operations (in most machines, the working registers and the GPR's are the same); and
   e) any other machine units;
5) the method used to determine the next microaddress.

B. Software Description

From the functional behavior viewpoint, a microprogrammable machine consists of a set of microoperations encoded and stored in a control memory. A FDM is simply the collection of these microoperations:

FDM = {Mi | i = 1, ..., n}.

Each Mi, identified by a unique index i, is in turn defined by a five-tuple

Mi = <OP, I, O, F, P>

and each tuple component is expanded by specifying its domain. Each domain enumerates all the legal values which the component can assume. The tuple components are:

OP: designates the primitive operation to be performed.
I:  denotes the resource used as the input of the OP.
O:  denotes the resource used as the output of the OP.
F:  denotes the set of fields which are occupied in the microinstruction format when <OP, I, O> is executing.
P:  denotes the set of timing phases at which the <OP, I, O> can execute.

The following example will illustrate this idea.

Example: From Appendix A, one of the microoperations in the PDP-11/40E [17] is described by

Mi = <ADD, I, O, F, P>

where
1) the domain of I is register B and the set of general purpose registers;
2) the domain of O is register D (see Fig. 2);
3) the domain of P is pulse P2 (see Fig. 3);
4) the domain of F is as follows (the meaning of each field is described in Appendix A):
   Field 1 specifies one register from the set of GPR's,
   Field 13 specifies the next address,
   Field 6 = 9 (specifies the operation ADD),
   Field 2 = 1 (allows Field 1 to be used as a source of general register address),
   Field 5 = 0 (B register -> B mux),
   Field 19 = 1 (allows clocking the ALU into the D register);
   the remaining fields are not used in this microoperation;
5) the domain of OP is the operation ADD.
END EXAMPLE


[Fig. 2. Simplified diagram of the PDP-11/40E CPU.]

[Fig. 3. The PDP-11/40E processor clock: cycle CL1 (140 ns) issues pulse P1; cycle CL2 (200 ns) issues pulse P2; cycle CL3 (300 ns) issues pulse P2 and then pulse P3.]

It will be easier to understand this model if we examine how the five-tuple of each microoperation affects: 1) the OP and I/O resources, 2) the timing phases, and 3) the field tuple.

<OP, I, O>: The set of operations <OPi> in the FDM must be able to express the instruction set of the IML. For each OPi, the I/O resources must be selected so that the execution of <OPi, Ii> will leave the correct result in the output resource <Oi>.

<Timing tuple>: The execution of a microinstruction is controlled by the fixed control store cycle (CS cycle). Within this cycle most machines provide multiple phases (polyphase timing periods) for each microinstruction. In this paper the control cycle is logically broken into several distinct phases, control signals are issued at each phase, and each microoperation is assigned to its associated phases. For example, in the PDP-11/40E machine [8], [9] there are three control store cycles, listed in Fig. 3: 1) cycle 1 generates pulse P1; 2) cycle 2 generates pulse P2; and 3) cycle 3 generates pulse P2 and pulse P3. Cycle 3 is further divided into two phases, phase 1 and phase 2. The control signal pulse P2 is issued during phase 1, and the control signal pulse P3 is issued during phase 2. In the preceding example, phase 1, P2, is assigned to the microoperation "ADD."

<Field tuple>: Each microoperation has fixed fields in the microinstruction format where binary microcode is assigned. The set of fields of each microoperation can be considered a logical operational unit occupying part of a microinstruction for a period of time during its execution. For example, the PDP-11/40E microinstruction format is divided into 22 fields (see Appendix A). In the execution of the microoperation "ADD" (see the preceding example), six fields are needed to define the logical operational unit's activity in processing the data during a certain timing period with reference to the control store cycle.

C. Concurrency (Parallelism) Detection Rules

Some definitions are needed before the concurrency detection rules can be explained. We adopt the notation and schema first proposed by Dasgupta [3], [4] to describe parallel microoperations and sequential microinstructions.

Microoperations Mi and Mj are said to be data independent if Ii ∩ Oj = Ij ∩ Oi = Oi ∩ Oj = ∅; otherwise, there is an I/O conflict between Mi and Mj. Mi is said to precede Mj in sequential order if they are in separate CS cycles and Mi is executed prior to Mj. A field conflict between microoperations occurs if the same field is used by these microoperations in the same CS cycle. But there is a special kind of field tuple which can be shared by more than one microoperation in the same CS cycle, as long as the values assigned to the field by each microoperation are the same. For example, a literal field can be shared by microoperations in the same CS cycle if the field values are the same. Obviously, if the literal field value of one microoperation is different from the others, it will cause a field conflict. Microoperations Mi and Mj are in parallel, denoted Mi // Mj, if they can be executed in the same control store cycle and produce the same output as if executed sequentially in separate control store cycles. Two microoperations Mi and Mi+1 are said to be invertible, denoted by Mi >< Mi+1, if the execution of Mi and Mi+1 yields the same result as the execution of Mi+1 and Mi.

Machine constraints on the microoperations may be different for each machine. Therefore, we seek general rules that work for a class of horizontal machines. This is done by generalizing the structure of an arbitrary machine, describing the machine as a set of microoperations in five-tuple format.

General Rules: We assume every microinstruction is completed within a control store cycle. This cycle is divided into several minor phases and each microoperation is assigned to the corresponding phases. Given two microoperations Mi and Mj, where Mi precedes Mj in sequential order, the timing phases used to execute Mi and Mj are denoted by Pi and Pj, respectively. The general rules are as follows.


BEGIN
  CASE "THE RELATIONSHIP BETWEEN Pi AND Pj" OF:
    WITHIN THE SAME CONTROL STORE CYCLE, Pi IS PRIOR TO Pj:
      IF THERE IS NO FIELD CONFLICT THEN Mi // Mj.
    WITHIN THE SAME CONTROL STORE CYCLE, Pi IS NOT PRIOR TO Pj:
      IF THERE IS NO FIELD CONFLICT AND THE DATA ARE INDEPENDENT
      FROM EACH OTHER THEN Mi // Mj.
  ENDCASE
  IF Mi AND Mi+1 ARE DATA INDEPENDENT, THEN Mi IS INVERTIBLE WITH Mi+1.
END.
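A minimal executable sketch of these general rules, assuming each microoperation uses a single timing phase and ignoring both sharable literal fields and the machine-dependent constraints discussed below; all names and field numbers are illustrative, not taken from the authors' implementation.

from dataclasses import dataclass

@dataclass(frozen=True)
class MicroOp:
    # One FDM microoperation <OP, I, O, F, P>, with P simplified to one phase.
    op: str
    inputs: frozenset     # I: input resource set
    outputs: frozenset    # O: output resource set
    fields: frozenset     # F: microinstruction fields occupied
    phase: int            # P: timing phase within the control store cycle

def data_independent(mi: MicroOp, mj: MicroOp) -> bool:
    # Ii ∩ Oj = Ij ∩ Oi = Oi ∩ Oj = the empty set
    return not (mi.inputs & mj.outputs
                or mj.inputs & mi.outputs
                or mi.outputs & mj.outputs)

def field_conflict(mi: MicroOp, mj: MicroOp) -> bool:
    # Any shared field conflicts (sharable literal fields are ignored here).
    return bool(mi.fields & mj.fields)

def parallel(mi: MicroOp, mj: MicroOp) -> bool:
    # The general rules, with Mi preceding Mj in sequential order.
    if mi.phase < mj.phase:    # Pi is prior to Pj within the CS cycle
        return not field_conflict(mi, mj)
    return not field_conflict(mi, mj) and data_independent(mi, mj)

def invertible(mi: MicroOp, mj: MicroOp) -> bool:
    return data_independent(mi, mj)

# Case 2 of the example that follows: PUSH3 (P3) precedes MOVE3 (P2);
# the field numbers here are illustrative stand-ins for Appendix A values.
m5 = MicroOp("PUSH3", frozenset({"PS"}), frozenset({"TOS"}),
             frozenset({7, 8, 22}), phase=3)
m6 = MicroOp("MOVE3", frozenset({"R3"}), frozenset({"D"}),
             frozenset({1, 2, 6, 19}), phase=2)
assert parallel(m5, m6)    # no field conflict, data independent: M5 // M6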

Example: We use the PDP-11/40E as the target machine and the following cases to illustrate these general rules. Pulses P2 and P3 mentioned here are relative to cycle CL3 (see Fig. 3).

Case 1:
   M1: R2 -> D, P2       : copy R2 to register D in phase P2.
   M2: D -> R3, P3       : copy register D to R3 in phase P3.
   M3: R3 + B -> D, P2   : add R3 and register B into register D in P2.
   M4: D -> R4, P3       : copy register D to R4 in P3.

M2 and M3 are examined to detect parallelism. Check OP MOVE5 and OP ADD in Appendix A: P3 is not prior to P2, and M2 is not data independent from M3. This implies M2 not // M3. (If M2 and M3 were executed in one CS cycle, M3 would execute prior to M2 and give the wrong result.)

Case 2:
   M5: PS -> stack, P3   : copy the contents of the PS register to the stack in P3.
   M6: R3 -> D, P2       : copy R3 to register D in P2.

Check OP PUSH3 and OP MOVE3 in Appendix A: (F5 ∩ F6 = ∅) and (M5 is data independent from M6) imply M5 // M6, which is independent of the timing sequence.

Case 3:
   M7: R3 + B -> D, P2   : add R3 and B into register D in P2.
   M8: D -> R3, P3       : copy register D to R3 in P3.

The pulses used by M7 and M8 are P2 and P3, respectively. F7 ∩ F8 = ∅ implies M7 // M8, which is independent of the I/O conflict.
END EXAMPLE.

Machine Constraints: Some examples from the PDP-11/40E are now used to illustrate the effect of machine dependence on the parallelism detection rules.

Example: In the FDM of the PDP-11/40E, the microoperation FLAG is used to set the machine flags for the previous ALU operation. Microoperation FLAG must be the next one after the ALU operation, and it cannot be moved even if invertibility is possible. Microoperation NOOP, which is used in an N-way branch operation on the PDP-11/40E machine, has its own fixed position. It cannot be moved and/or made parallel with other microoperations even if the general rules indicate parallelism.
END EXAMPLE

The microoperations used for these special purposes lead to restrictions on the parallelism detection rules. Therefore, the users of this model must provide these kinds of machine-dependent rules in addition to the general rules.

IV. PASS 1

Pass 1 maps the IML version of the application program into a machine-dependent intermediate code (MDIL). An MDIL instruction is defined from the field description model with the format <OP, I, O>, which excludes the field and timing tuples from the five-tuple <OP, I, O, F, P> representation.

The microoperations defined in the machine field description model are used to decode the instruction set of the IML. The delayed information portion in IML, which contains the items specified in the HLL that cannot be directly supported by the IML, is emulated by the machine microoperations. All the mappings from IML facilities to the machine microoperations are stored in the macro table. Pass 1 allocates the variables in the IML declaration part into data memory and uses the macro table to decode the IML main body and the delayed information.

Example: The IML instruction for addition is given as

ADD SRC1 SRC2 DEST : SRC1 + SRC2 -> DEST.

This IML statement is bound to a particular machine by means of macro expansion. The PDP-11/40E expansion is

MOVE1 SRC1 B    : move SRC1 to register B
ADD SRC2, B D   : SRC2 + [B] into register D
MOVE5 D DEST    : copy the result to DEST.

Similarly, the IML instruction for subtraction is given as

SUB SRC1 SRC2 DEST : SRC1 - SRC2 -> DEST.

The macro expansion is

MOVE1 SRC2 B    : move SRC2 to register B
SUB SRC1, B D   : SRC1 - [B] into register D
MOVE5 D DEST    : move [D] to the destination.

In the above expansions the machine registers B and D are defined by the PDP-11/40E [8], [9]. The symbolic variables SRC1, SRC2, and DEST are not bound at present.

Suppose the following IML codes are to be decoded by pass 1.

Declaration:
   Global W, Z
   Local X, Y


Main Body:
   ADD W X Y : W + X -> Y
   SUB Y Z W : Y - Z -> W

Pass 1 stores the variables W, X, Y, and Z in main memory, then uses the macro table to expand the IML statements into machine-dependent codes as follows:

MOVE1 W B    : move variable W to register B.
ADD X, B D   : add variable X and [B] and put the result into register D.
MOVE5 D Y    : copy the result to variable Y.
MOVE1 Z B    : move Z to register B.
SUB Y, B D   : subtract [B] from Y and put the result into register D.
MOVE5 D W    : copy register D to W.

The variables W, X, Y, and Z are in main memory. However, the PDP-11/40E cannot use main memory words as working registers of the arithmetic and logic unit (ALU), and it always takes longer to access data from memory. Therefore, these symbolic variables are bound to the general purpose registers in pass 2 in such a way as to minimize the access time to these variables.
END EXAMPLE
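The macro-table mechanism of pass 1 can be sketched as a simple template lookup. The table below holds only the two expansions shown above; the dictionary-and-placeholder representation is an assumption made for illustration, not the authors' data structure.

# Hypothetical macro table: each IML opcode maps to its PDP-11/40E MDIL
# expansion; {0}, {1}, {2} stand for SRC1, SRC2, DEST.
MACRO_TABLE = {
    "ADD": ["MOVE1 {0} B",     # move SRC1 to register B
            "ADD {1}, B D",    # SRC2 + [B] into register D
            "MOVE5 D {2}"],    # copy the result to DEST
    "SUB": ["MOVE1 {1} B",     # move SRC2 to register B
            "SUB {0}, B D",    # SRC1 - [B] into register D
            "MOVE5 D {2}"],    # move [D] to the destination
}

def expand(iml_statement: str) -> list[str]:
    # Decode one quadruple-format IML statement into MDIL statements.
    op, *operands = iml_statement.split()
    return [template.format(*operands) for template in MACRO_TABLE[op]]

# The example above: ADD W X Y, then SUB Y Z W.
mdil = [line for stmt in ("ADD W X Y", "SUB Y Z W") for line in expand(stmt)]
# mdil == ["MOVE1 W B", "ADD X, B D", "MOVE5 D Y",
#          "MOVE1 Z B", "SUB Y, B D", "MOVE5 D W"]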

The output of pass 1 is a collection of machine-dependent code (MDIL) consisting of a set of blocks. The operands in MDIL are either the machine units defined by the FDM or the symbolic variables which will be bound to the machine general purpose registers in pass 2.

V. PASS 2

The purposes of pass 2 are to allocate each symbolic operand to one of the general purpose registers (GPR's) of the actual target machine and to assign the corresponding binary code to each statement of MDIL.

In general, the number of variables can be assumed to be greater than the number of GPR's in the machine. Swapping between a GPR and memory is needed when the set of GPR's is full of variables and some new variable is to be allocated a register. As the number of swaps increases, the efficiency of the executable code decreases. The register allocation scheme is used to allocate the symbolic variables and reduce redundant swapping.

The blocks of MDIL code can be analyzed for the flow of control governed by branch statements and labeled statements. These two statement types divide the blocks into a set of straight line codes (SLC's), which are single-entry single-exit segments.

We define the "state" of a SLC as the assignment of operands to GPR's for the given SLC. Upon entry to the SLC we must define an initial state ISi for SLCi, and the final state FSi as the state of SLCi when register allocation is completed.

A control flow interface problem occurs when one SLC interfaces with another SLC. Refer to Fig. 4(a), in which SLCi, i = 1 to n, forward branch to SLCm.

[Fig. 4. Forward and backward branching. (a) The n final states FS1, ..., FSn determine ISm. (b) The final state FSq is determined from ISp.]

How is the initial state of SLCm determined from the collective final states of SLCi, i = 1 to n? In Fig. 4(b) the backward region from SLCq to SLCp may be a loop. How can we determine the final state of SLCq? These control-flow problems are solved next.

A. Register Allocation Scheme

The general idea of this scheme is to keep the variables in their assigned registers as long as possible. When no register is available for a newly encountered variable, a replacement priority table is applied to determine the least likely used variable, which is then moved out of its register. The newly encountered variable is assigned the freed register.

The replacement priority is determined by the status (active or passive) and the kind (local or global) of each variable. If the contents of a variable held in a register differ from the contents of its main memory location, then the variable is said to be active; otherwise, it is called passive. When an active status variable is to be deallocated, a memory write is needed to swap it back to memory. However, a memory write is not necessary for a passive status variable. In order to reduce redundant swapping, a passive status variable is assigned a higher priority to be replaced. A local variable is assigned a higher priority to be replaced than a global variable, since local variables are available only in the current block. In the


replacement priority table (Table I) the highest priority variable is the first one to be deallocated.

B. Initial State of SLC

The initial state of SLCm, denoted by ISm, is defined as the assignment of symbolic variables to GPR's immediately before entering SLCm. Referring to Fig. 4(a), the initial state of SLCm is determined from the final states of SLCi, i = 1 to n, and is used as the basis for performing the register allocation/deallocation scheme on the current SLCm.

Assume that Rj is one of the machine general purpose registers used to hold a symbolic variable from IML. The variable allocated to Rj in the final state of SLCk is denoted by FSA(k, j) and the associated status of the variable is denoted by FST(k, j), where k = 1 to n. The variable and the status of Rj in the initial state ISm, denoted by ISA(m, j) and IST(m, j), respectively, are determined by the following algorithm.

Begin
  IF all variables held by Rj in the n final states are the same,
     i.e., FSA(k, j) = FSA(k+1, j), k = 1 to n - 1,
  then begin
    ISA(m, j) = FSA(k, j)
    IF all the FST(k, j), k = 1 to n, are passive
      then IST(m, j) = passive
      else IST(m, j) = active
  end
  else begin  (one or more variables differ from the others)
    DO i = 1 to n
    begin
      IF FST(i, j) = active
        then "MEMWRITE Rj FSA(i, j)" is inserted at the end of SLCi.
    end (loop)
    Rj is flagged as a free register
  end (else)
end (algorithm)

The following example will illustrate this idea.

Example (To Determine the Initial State of an SLC): Assume that a machine uses two registers, R1 and R2, to hold the symbolic variables from IML. Referring to Fig. 5, SLC1, SLC2, and SLC3 forward branch to SLC4. The final states FS1, FS2, and FS3 are given as follows.

TABLE I
REPLACEMENT PRIORITY TABLE

Priority (1 = highest)
1   passive status local variable not used in the current block
2   active status local variable not used in the current block
3   passive status global variable not used in the current block
4   active status global variable not used in the current block
5   passive status local variable that will be used in the current block
6   passive status global variable that will be used in the current block
7   active status local variable that will be used in the current block
8   active status global variable that will be used in the current block
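The deallocation choice implied by Table I can be computed directly from the three attributes of a variable. The following sketch is illustrative only; the function names and string encodings are assumptions, and the priority order is taken from the table above.

def replacement_priority(status: str, kind: str, used_in_block: bool) -> int:
    # Table I priority; 1 = highest priority (first to be deallocated).
    order = [
        ("passive", "local", False),   # 1
        ("active", "local", False),    # 2
        ("passive", "global", False),  # 3
        ("active", "global", False),   # 4
        ("passive", "local", True),    # 5
        ("passive", "global", True),   # 6
        ("active", "local", True),     # 7
        ("active", "global", True),    # 8
    ]
    return order.index((status, kind, used_in_block)) + 1

def choose_victim(registers: dict) -> str:
    # Pick the register whose variable is the first to be deallocated.
    # `registers` maps a register name to (status, kind, used_in_block).
    return min(registers, key=lambda r: replacement_priority(*registers[r]))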

Fig. 5. Example forward branch diagram.

Final State of SLC1, FS1:
   Register   Variable   Status
   R1         AB         Active
   R2         CD         Passive

Final State of SLC2, FS2:
   Register   Variable   Status
   R1         AB         Passive
   R2         CD         Active

Final State of SLC3, FS3:
   Register   Variable   Status
   R1         AB         Passive
   R2         EF         Passive

From the above algorithm, the initial state of SLC4, IS4, is as follows:

   Register   Variable   Status
   R1         AB         Active
   R2         --         (free)

The memory write statement MEMWRITE R2 CD : R2 -> MEM(CD) is inserted at the end of SLC2.
END EXAMPLE.
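The merging algorithm of this subsection can be condensed into a few lines. This sketch is illustrative: states are represented as dictionaries from register names to (variable, status) pairs, an assumption made for clarity.

def initial_state(final_states: list[dict]) -> tuple[dict, list[str]]:
    # Merge the final states FS1..FSn of the predecessor SLC's into ISm,
    # returning ISm and the MEMWRITE statements inserted into predecessors.
    ism, writes = {}, []
    for rj in final_states[0]:
        variables = {fs[rj][0] for fs in final_states}
        if len(variables) == 1:    # every predecessor leaves the same variable
            status = ("passive"
                      if all(fs[rj][1] == "passive" for fs in final_states)
                      else "active")
            ism[rj] = (variables.pop(), status)
        else:                      # disagreement: write back active copies, free Rj
            for i, fs in enumerate(final_states, start=1):
                if fs[rj][1] == "active":
                    writes.append(f"MEMWRITE {rj} {fs[rj][0]} at end of SLC{i}")
            ism[rj] = None         # Rj flagged as a free register
    return ism, writes

# The example above: FS1, FS2, FS3 merge into IS4.
fs = [{"R1": ("AB", "active"), "R2": ("CD", "passive")},
      {"R1": ("AB", "passive"), "R2": ("CD", "active")},
      {"R1": ("AB", "passive"), "R2": ("EF", "passive")}]
is4, writes = initial_state(fs)
# is4 == {"R1": ("AB", "active"), "R2": None}
# writes == ["MEMWRITE R2 CD at end of SLC2"]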


C. Final State of the SLC

Refer to Fig. 4(b), showing SLCq backward branching to SLCp. The state immediately before the branch statement must be the same as the starting state of SLCp.

The first problem is to determine which starting state of SLCp should be used to determine the final state FSq. From the last section, ISp is the state right before entering SLCp; the generation of ISp does not involve any register allocation action inside SLCp. The next initial state of SLCp, denoted by NISp, records the operand assigned to Rj, and its position, when Rj is first allocated by the register allocation scheme performed on SLCp. The position of the operand is either source or destination. Some memory accesses are needed in the generation of NISp from ISp. In the worst case a memory write (MEMWRITE) is used to deallocate the active status variable in Rj of ISp, and a memory read (MEMREAD) is used to allocate a source operand to Rj of NISp. If NISp is used to determine FSq, these two statements do not need to be executed when the backward branch occurs, and they can be moved out of the loop region.

Example: To determine NISp from ISp, assume that ISp is

   ISp
   Register   Variable
   R1         AB (global)
   R2         CD (local)

and the first statement of SLCp is

   Label 1: MOVE EF AB : EF -> AB.

Since all registers are full in ISp, the variable in R2, which has the higher replacement priority, has to be deallocated before the variable EF can be loaded. After this statement has been register allocated, the output code and the NISp are

   MEMWRITE R2 CD
   MEMREAD EF R2
   Label 1: MOVE R2 R1

   NISp
   Register   Variable   Position
   R1         AB         destination
   R2         EF         source

END EXAMPLE.

In the backward branch case [Fig. 4(b)], if the variable in Rj of FSq, denoted by FSA(q, j), is different from the variable in Rj of NISp, denoted by NSA(p, j), some memory access statements, described in Table II, are used to force them to be equal. This idea is shown in the next example.

TABLE II
INTERFACING THE FINAL STATE (FSq) WITH THE NEXT INITIAL STATE (NISp) IN THE LOOP REGION*

   Status of FSA(q, j)   Position of NSA(p, j)   Extra statements used to interface FSq with NISp
   Active                Source                  MEMWRITE Rj FSA(q, j); MEMREAD NSA(p, j) Rj
   Active                Destination             MEMWRITE Rj FSA(q, j)
   Passive               Source                  MEMREAD NSA(p, j) Rj
   Passive               Destination             None

* When SLCq backward branches to SLCp, some memory access statements, which depend on the variable status in FSq and the variable position in NISp, are inserted at the end of SLCq to set FSq equal to NISp.

Example: Refer to Fig. 4(b) and the previous example; the final state of SLCq is to be determined. Assume that the last statement of SLCq is BRANCH Label 1, a branch to the first statement of SLCp. The NISp of SLCp from the previous example is used to determine the final state of SLCq. Comparing NISp with BSq, the state of SLCq right before the branch statement, the variables in R1 are different. From Table II a memory write is inserted before the branch statement. That is,

   MEMWRITE R1 GH : R1 -> MEM(GH)
   BRANCH Label 1

and the final state is

   Register   Variable   Status
   R1         GH         Passive
   R2         EF         Active

Note: The MEMREAD statement is not necessary, since variable "AB" in SLCp is used as a destination.
END EXAMPLE.
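Table II amounts to a two-attribute decision. A minimal sketch, with illustrative names (the paper specifies only the inserted statements, not this interface function):

def interface_statements(fsq_var: str, fsq_status: str,
                         nisp_var: str, nisp_position: str, rj: str) -> list[str]:
    # Statements appended to SLCq so that FSq matches NISp for register Rj.
    if fsq_var == nisp_var:
        return []                              # already consistent
    stmts = []
    if fsq_status == "active":                 # write back the departing variable
        stmts.append(f"MEMWRITE {rj} {fsq_var}")
    if nisp_position == "source":              # preload the expected variable
        stmts.append(f"MEMREAD {nisp_var} {rj}")
    return stmts

# The example above: BSq holds GH (active) in R1; NISp expects AB as a destination.
print(interface_statements("GH", "active", "AB", "destination", "R1"))
# -> ['MEMWRITE R1 GH']   (no MEMREAD, since AB is used as a destination)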


Combining this example and the previous example, the output codes are

   MEMWRITE R2 CD      : R2 -> MEM(CD)
   MEMREAD EF R2       : MEM(EF) -> R2
   Label 1: MOVE R2 R1 : R2 -> R1
   ----- end of SLCp and start of SLCq -----
   MEMWRITE R1 GH      : R1 -> MEM(GH)
   BRANCH Label 1

The reader may compare this with the inefficiency that results if ISp is used as the reference state for FSq.
END EXAMPLE.

D. Output of Pass 2

Finally, pass 2 produces as output a collection of five-tuple format microoperations consisting of a set of SLC's. This five-tuple format provides a convenient way to allocate the microoperations to microinstructions, which is done in pass 3.

VI. PASS 3

This pass uses a set of rules to detect the concurrency of microoperations and combines sequences of microoperations into a shorter sequence of concurrent microinstructions, which we abbreviate as MI's.

The MI sequence is optimized if it is impossible to rearrange the sequence of microinstructions in a manner that produces fewer microinstructions. DeWitt [7] proved that this kind of absolute minimal reduction problem is NP-complete. Here, we first show why the optimization problem is NP-complete. Then, by seeking a near-optimal solution rather than the absolute one, we obtain a fast algorithm of complexity proportional to mn, where m is a pragmatically determined constant less than n.

A. Theoretical Constraints on the Optimization Problem

First of all, we examine why the optimization problem is NP-complete. We are given a SLC = {M1, M2, ..., Mk, ..., Mn}, and MIi refers to the ith microinstruction. (Note: Mi is a single microoperation, while MIi is a microinstruction containing several Mi's.) As Mk is allocated, the possible relationships between Mk and MIi are (refer to Table III):

Case 1: Mk not >< MIi, and Mk not // MIi.
Case 2: Mk not >< MIi, and Mk // MIi.
Case 3: Mk >< MIi, and Mk not // MIi.
Case 4: Mk >< MIi, and Mk // MIi.

If Mk is invertible with MIi (Case 3 or 4 of Table III), it may be moved past MIi and the same test applied to MI(i-1). On the other hand, if Mk is not invertible with MIi (Case 1 or 2), it is blocked by this microinstruction [4].

Let us consider the worst case. S = {M1, ..., Mn}; assume every Mk is invertible with every other, but not parallel. M1 is allocated in MI1; the position of Mj is to be determined, 2 <= j <= n. For j = 2, there are 2! possible positions for M2: {M1}, {M2} or {M2}, {M1}. For j = 3, there are 3! possible positions for M3, and so on; for j = n, there are n! possible positions for Mn. In total, there are Σ_{j=2}^{n} j! possible positions in which to allocate these n microoperations.

Clearly, this is a very special case, since if we knew in advance that there is no parallelism among the microoperations, it would not be necessary to check these positions; we would just use n MI's to allocate the n microoperations. The problem is that all the relationships are not known until we check the last microoperation in the SLC. The allocation of a microoperation depends not only on the microoperations ahead of it, but also on the microoperations after it. The best position of a microoperation cannot be decided until every possible combination of microoperations has been checked. We can see that invertibility is why the problem is NP-complete.

On the other hand, the data dependency among microoperations is obvious and limits invertibility considerably. In practice, it is hard for a microoperation to cross too many microoperations ahead of it. A limit on the number of times a microoperation is compared with other microoperations is therefore reasonable.

B. Linear Order Compaction Algorithm

In order to get a practical and efficient algorithm, we impose the following restrictions.

1) The position of microoperation Mk is computed by searching backward over the previous microinstructions leading up to Mk.
2) In each case of Table III we make the following decision:

Case 1: {Mk} -> MI(i+1).
Case 2: {Mk} -> MIi.

In the next two cases we compare Mk m times with the previous microoperations. In other words, Mk can be compared with h MI's, from MIi down to MI(i-h+1), where h is the number of MI's such that Σ_{j=0}^{h-1} |MI(i-j)| is nearest to m. (|MIk| means the number of microoperations in MIk.)

Case 3: If Mk is invertible with all these MI's but parallel with none, then {Mk} -> MI(i+1).
Case 4: Compare Mk with MI(i-j), 0 <= j <= h - 1, until we find the MI nearest to MI1 that can accept Mk.

Thus, we use invertibility and parallelism between microoperations and MI's to get a compaction algorithm (see Appendix B). Now we consider the computational complexity of this algorithm in allocating n microoperations, using the number of comparisons between pairs of microoperations as the measure of complexity. In the algorithm, Mk is limited to m comparisons with the preceding microoperations. If k <= m, at most k comparisons are necessary to optimally place Mk. If k > m, Mk requires a total of m comparisons before (suboptimal) allocation.


TABLE III
POSSIBLE POSITIONS OF MICROOPERATIONS IN THE ALLOCATION PROBLEM

   Case     Relationship                      Possible positions
   Case 1   Mk not >< MIi, Mk not // MIi      MI(i+1): X
   Case 2   Mk not >< MIi, Mk // MIi          MI(i+1): X;  MIi: X
   Case 3   Mk >< MIi, Mk not // MIi          MI(i+1): X;  MI(i-1): Z
   Case 4   Mk >< MIi, Mk // MIi              MI(i+1): X;  MIi: X;  MI(i-1): Z

X: Mk can be in this position.
Z: Check Mk against the MI ahead of the current one and determine which case it belongs to.

Indeed, if this occurs for each microoperation M1, ..., Mn, the total number of comparisons is

T_m = Σ_{j=1}^{m} j + Σ_{j=m+1}^{n} m = m(m + 1)/2 + m(n - m) = nm + m/2 - m^2/2.

Therefore, the algorithm complexity is proportional to n. Now we will pragmatically determine the value of m.
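For illustration (the numbers are not from the paper), take the peephole size m = 2W = 8 conjectured in the next subsection and a straight line code segment of n = 100 microoperations: T_8 = (100)(8) + 8/2 - 64/2 = 772 comparisons, somewhat fewer than 8n = 800.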

C. Determination of m in the Linear Algorithm

The compaction algorithm was applied to the abstract Husson machine [10] and the PDP-11/40E to determine the "best" width of the MI, and the best value of m in the linear algorithm. The width of the MI means the number of microoperations allowed in the MI.

Early results indicate that four microoperations is the limiting width of a MI for a microprogrammable machine [17]. Beyond this number, data dependency among microoperations limits the compaction of microoperations into MI's. This constraint leads to the following conjecture in the determination of the value of m. While we have no way of proving this conjecture, it is in line with the independent work of Mallett [16].

Conjecture: Given a horizontal microprogrammable microinstruction of width W, and a linear compacting algorithm that locally compacts straight line code segments of length n using a peephole of size m, with time complexity mn, then

m = 2W

produces compact code within 10 percent of optimal.

For example, applying our conjecture to Mallett's results [16], W = 4, so m = 8. Thus, 8n comparisons are required for a code segment of length n. In the 10 tests reported by Mallett [16], an average of 5.6n comparisons were performed to compact code to within 3 percent of optimal. Thus, the conjectured value of m = 8 appears to be safely conservative when used to explain Mallett's results [16].
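The backward-searching placement loop of Section VI-B (given in full in Appendix B) can be sketched compactly. The predicate arguments are assumed to test a microoperation against every member of a candidate microinstruction, using the // and >< tests of Section III; this is a sketch under those assumptions, not the authors' implementation.

def compact(slc: list, parallel, invertible, m: int) -> list[list]:
    # Linear-order compaction: place each microoperation by scanning backward
    # over earlier microinstructions, at most m comparisons per microoperation.
    mis = []                               # microinstructions built so far
    for mop in slc:
        comparisons, candidate = 0, None
        j = len(mis) - 1
        while j >= 0 and comparisons < m:
            comparisons += len(mis[j])     # S = S + |MIj|
            if parallel(mop, mis[j]):
                candidate = j              # remember the MI nearest MI1 so far
            if not invertible(mop, mis[j]):
                break                      # blocked: cannot search further back
            j -= 1
        if candidate is not None:
            mis[candidate].append(mop)     # join an existing MI (Cases 2 and 4)
        else:
            mis.append([mop])              # open a new MI (Cases 1 and 3)
    return mis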

VII. CONCLUSION

The techniques described in this paper have been successfully applied to the design and implementation of a HLL compiler for microprogramming [13], [17]. The main goal of efficient compaction of parallel microcode for a horizontal microprogrammable machine has been demonstrated [24]. The system runs as a cross-compiler on a Cyber machine which downloads to a PDP-11/40E. The system is not portable, but produces transportable code. The HLL syntax can be changed by specifying new syntax rules. The object machine is changed by specifying a new FDM and a new set of macros. The system includes a simulator for testing IML code before passes 1, 2, and 3 are completed.

Clearly, the work reported here is experimental and tentative. Many questions remain unanswered, for example:

1) HLL Determination: How to evaluate a microprogramming HLL as to its capabilities: 1) to describe the intended application algorithm, and 2) to be compiled to microcode efficiently?

2) Hardware Mapping: How to make sufficient use of the hardware features to decode the machine-independent intermediate language (IML)? This issue always stands in opposition to the system portability mentioned in the first section.

3) Target Machine Selection: How to select a machine which can produce the minimum object code for the application algorithm coded in the HLL?

4) Machine Description: A machine description language is needed to describe the target machine in high-level terms and to produce the FDM automatically.

The early indications are that register allocation schemes


and microcode compaction as described in this paper and elsewhere are solved problems [1], [4], [6], [7], [16], [18], [20]. However, preliminary experience gained from using this system suggests that allocation and compaction are minor sources of inefficiency when compared with the problems listed above. A casual examination of several tested programs running on the target machine PDP-11/40E indicates that the number of microoperations generated from the hardware mapping is often 2-3 times larger (counting the instruction words) than the number of IML codes. However, it is unlikely that compaction algorithms will be able to improve code by more than 30 percent unless target machines are designed to encourage greater levels of concurrency. In particular, the machine selection is an important factor in minimizing the object code. More experimental studies on a variety of machines are needed to make conclusive statements about coding efficiency.

APPENDIX A
THE PARTIAL FDM OF THE PDP-11/40E

The simplified CPU diagram and the processor clock of the PDP-11/40E [8], [9] are shown in Figs. 2 and 3, respectively. The microinstruction format is divided into 22 fields. Each field is defined as follows.

Field 1 specifies one register from the GPR's.
Field 2 selects the source of Field 1.
Field 3 selects the input to the Bus Address MUX.
Field 4 selects the DMUX.
Field 5 selects the input to the BMUX.
Field 6 selects the ALU function.
Field 7 selects loading and clocking the PS word.
Field 8 sets discrete alteration of data.
Field 9 selects BUS READ or WRITE.
Field 10 allows clocking the BA register.
Field 11 loads a GPR from the DMUX.
Field 12 selects the processor clock length.
Field 13 selects the next microaddress.
Field 14 selects the destination from the DMUX.
Field 15 limits the left mask.
Field 16 limits the right mask.
Field 17 selects the shift count field.
Field 18 sets the constant value.
Field 19 allows clocking the ALU into the D register.
Field 20 allows clocking the DMUX into the B register.
Field 21 turns off the processor clock.
Field 22 enables stack push or pop.

There are 41 microoperations defined in the PDP-11/40E FDM [17], which are used to decode Malik's IML [13]. Some of the microoperations used in this paper are shown in the following table.

   OP      I          O       P    Field values (Fields 1-22)
   ADD     GPR, B     D       P2   *1  1  0  9  *2  1
   SUB     GPR, B     D       P2   *1  1  0  6  8  *2  1
   MOVE3   GPR        D       P2   *1  1  0  *2  1
   MOVE5   D          GPR     P3   *1  1  2  3  *2
   PUSH3   PS         TOS     P3   0  6  3  *2  8  1
   FLAG    C,V,N,Z            P1   3  *2
   NOOP    XUPF       Label   P1   *2
   PUSH                       P1   *2  11

*1: This field value is determined by the selection of the GPR.
*2: This field value is determined by the next address.

APPENDIX B
COMPACTION ALGORITHM

Program: O(n) local compaction algorithm.
Data:
1) SLCp: M1, M2, ..., Mk, ..., Mn is to be processed.
2) When Mk is being allocated into an MI, we assume M1, ..., Mk-1 have already been allocated into MI1, ..., MIj.
3) n is the number of microoperations in SLCp.
4) m is the maximum number of comparisons allowed by the algorithm when a microoperation is being allocated into an MI.
5) k is the current microoperation index.
6) j is the current MI index.
7) KK records the last MI index which is parallel with Mk.
8) JJ records the last MI index.
9) S is a counter for the number of comparisons made while Mk is being allocated.
10) |MI| is the number of microoperations in the MI.
11) Mk // MI means this microoperation is parallel with all microoperations in the MI; the same convention applies to Mk >< MI.
12) >< and // are determined as in Section III.


START:
  SET S = 0, KK = 0, K = K + 1, AND J = JJ.
  FETCH THE NEXT MICROOPERATION, Mk.
  IF ALL MICROOPERATIONS IN SLCp ARE ALREADY ALLOCATED TO MI's
    THEN PROCESS THE NEXT SLC.
A:
  S = S + |MIj|.
  IF Mk // MIj
    THEN BEGIN
      IF Mk >< MIj
        THEN KK = J (RECORD THE MI WHICH IS // WITH Mk), GO TO C
        ELSE ALLOCATE Mk INTO MIj, GO TO START
    END
    ELSE BEGIN
      IF Mk >< MIj
        THEN GO TO C
        ELSE GO TO B
    END.
C:
  IF S > m (THE NUMBER OF COMPARISONS EXCEEDS THE LIMIT)
    THEN GO TO B
    ELSE BEGIN
      J = J - 1
      IF J = 0 THEN GO TO B ELSE GO TO A
    END.
B:
  IF KK = 0 (Mk HAS NEVER BEEN // WITH ANY MI(KK), KK <= JJ)
    THEN ALLOCATE Mk TO MI(JJ+1), JJ = JJ + 1, GO TO START
    ELSE ALLOCATE Mk TO MI(KK), GO TO START.

REFERENCES

[1] T. Agerwala, "Microprogram optimization: A survey," IEEE Trans. Comput., vol. C-25, pp. 962-973, Oct. 1976.
[2] A. K. Agrawala and T. G. Rauscher, Foundations of Microprogramming: Architecture, Software, and Applications. New York: Academic, 1976.
[3] S. Dasgupta, "Parallelism in microprogramming systems," Ph.D. dissertation, Dep. Comput. Sci., Univ. of Alberta, Alta., Canada, Tech. Rep., Aug. 1976.
[4] S. Dasgupta and J. Tartar, "The identification of maximal parallelism in straight line microprograms," IEEE Trans. Comput., vol. C-25, pp. 986-991, Oct. 1976.
[5] S. Davidson and B. D. Shriver, "An overview of firmware engineering," Computer, vol. 11, pp. 21-33, May 1978.
[6] D. J. DeWitt, "A control word model for detecting conflicts between microprograms," in Proc. 8th Annu. Workshop on Microprogramming, pp. 6-13.
[7] -, "A machine independent approach to the production of optimal horizontal microcode," Ph.D. dissertation, Univ. of Michigan, Ann Arbor, 1976.
[8] S. H. Fuller et al., "PDP11/40E microprogramming reference manual," Dep. Comput. Sci., Carnegie-Mellon Univ., Pittsburgh, PA, Jan. 1976.
[9] -, "The PDP11/40E maintenance manual," Dep. Comput. Sci., Carnegie-Mellon Univ., Pittsburgh, PA, June 1977.
[10] P. Ma and T. G. Lewis, "On the design of a machine description model and a compaction algorithm for microcode generation," in Proc. Euro Micro 80 Symp., London, England, Sept. 1980.
[11] H. Katzan, Jr., Microprogramming Primer. New York: McGraw-Hill, 1977.
[12] S. S. Husson, Microprogramming: Principles and Practice. Englewood Cliffs, NJ: Prentice-Hall, 1970.
[13] K. Malik, "Optimizing the design of a high level language for microprogramming," Ph.D. dissertation, Oregon State Univ., Corvallis, OR.
[14] K. Malik and T. G. Lewis, "High level microprogramming language," in Proc. COMPCON 1978, pp. 88-91.
[15] P. W. Mallett and T. G. Lewis, "Considerations for implementing a high level microprogramming language translation system," Computer, vol. 8, pp. 40-52, Aug. 1975.
[16] P. W. Mallett, "Methods of compacting microprograms," Ph.D.

(START)

273

Page 14: On the Design of a Microcode Compiler for a Machine-Independent ...

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-7, NO. 3, MAY 1981

dissertation, Dep. Comput. Sci., Univ. of Southwestern Louisiana,Lafayette, LA, Dec. 1978.

[171 P. Y. Ma, "Optimizing microcode produced from a high levellanguage," Ph.D. dissertation, Dep. Elec. Comput. Eng., OregonState Univ., Corvalis, OR.

[18] M. Tabendeh and C. V. Ramamnoorthy, "Execution time (andmemory) optimization in microprograins," in Proc. 7th Annu.Workship on Microprogramming (preprints supplement), pp.119-127.

[19] M. Tsuchiya and C. V. Ramamoorthy, "A high level language forhorizontal microprograinming," IEEE Trans. Comput., vol. C-23,pp. 791-802, Aug. 1974.

[20] M. Tsuchiya and M. J. Gonzalez, "An approach to optimizationof horizontal microprograms," in Proc. 7th Workshop on Micro-programming, Palo Alto, CA, Sept. 1974.

[21] S. S. Yau, A. C. Schowe, and M. Tsuchiya, "On storage optimiza-tion of horizontal microprograms," in Proc. 7th Annu. Workshopon Microprogramming (preprints), pp. 98-106.

[22] P. Ma and T. G. Lewis, "Design of a machine independent, op-timizing system for emulator development," ACM TOPLAS,vol. 2, Apr. 1980.

[23] T. G. Lewis, K. Malik, and P. Ma, "Firmware engineering usinga high level microprograinming system to implement virtualinstruction set processors," presented at IFIPS Workshop, Linz,Austria, Apr. 1980.

[24] T. G. Lewis, P. Ma, K. Malik, and C. Liu, "On the problem ofportable microprogramming," Dep. Comput. Sci., Oregon StateUniv.. Corvallis, OR, Tech. Rep. TN79-3.

[25] M. H. Halstead, Elements of Software Science. North-Holland:Elsevier, 1977.

Perng-Yi Richard Ma was born in Taiwan, on March 25, 1951. He received the B.S.E.E. degree from Taiwan Maritime College in 1972, and the M.S.E.E. degree in 1976 and the Ph.D. degree in 1978, both in electrical engineering, from Oregon State University, Corvallis.

From 1978 to 1979 he worked in the Space Division at Rockwell. Currently, he is with the TRW Systems Group, Redondo Beach, CA. His present research interests include static and dynamic allocation in distributed computing systems, closed loop design among application tasks, software mapping systems, and network architecture.

Ted G. Lewis received the B.S. degree in mathematics from Oregon State University, Corvallis, in 1966, and the M.S. degree and the Ph.D. degree in computer science from Washington State University, Pullman, in 1970 and 1971, respectively.

He taught at the University of Missouri at Rolla and the University of Southwestern Louisiana, Lafayette, before joining Oregon State University, Corvallis, in 1976. He has been an officer in ACM's SIGMINI, SIGSMALL, and SIGMICRO organizations, and is the author of ten books and 35 papers on topics ranging from data structures to personal computing. His current activities include serving as an Associate Technical Editor of the IEEE Computer Society's COMPUTER magazine, and director of Project FS, which conducts research in microprogramming, distributed operating systems, and software engineering.
