March 1993

WRL Technical Note TN-33

The impact of 32 bit and 64 bit pointers on applications

Jeffrey C. Mogul, Joel Bartlett, Jeremy Dion, Russell Kao, Bob Mayo, Louis Monier, Amitabh Srivastava

Digital Internal Use Only

digital Western Research Laboratory 250 University Avenue Palo Alto, California 94301 USA


The Western Research Laboratory (WRL) is a computer systems research group that was founded by Digital Equipment Corporation in 1982. Our focus is computer science research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products.

There is a second research laboratory located in Palo Alto, the Systems Research Center (SRC). Other Digital research groups are located in Paris (PRL) and in Cambridge, Massachusetts (CRL).

Our research is directed towards mainstream high-performance computer systems. Our prototypes are intended to foreshadow the future computing environments used by many Digital customers. The long-term goal of WRL is to aid and accelerate the development of high-performance uni- and multi-processors. The research projects within WRL will address various aspects of high-performance computing.

We believe that significant advances in computer systems do not come from any single technological advance. Technologies, both hardware and software, do not all advance at the same pace. System design is the art of composing systems which use each level of technology in an appropriate balance. A major advance in overall system performance will require reexamination of all aspects of the system.

We do work in the design, fabrication and packaging of hardware; language processing and scaling issues in system software design; and the exploration of new applications areas that are opening up with the advent of higher performance systems. Researchers at WRL cooperate closely and move freely among the various levels of system design. This allows us to explore a wide range of tradeoffs to meet system goals.

We publish the results of our work in a variety of journals, conferences, research reports, and technical notes. This document is a technical note. We use this form for rapid distribution of technical material. Usually this represents research in progress. Research reports are normally accounts of completed research and may include material from earlier technical notes.

Research reports and technical notes may be ordered from us. You may mail your order to:

Technical Report Distribution
DEC Western Research Laboratory, WRL-2
250 University Avenue
Palo Alto, California 94301 USA

Reports and notes may also be ordered by electronic mail. Use one of the following addresses:

Digital E-net: DECWRL::WRL-TECHREPORTS

Internet: [email protected]

UUCP: decwrl!wrl-techreports

To obtain more details on ordering by electronic mail, send a message to one of these addresses with the word "help" in the Subject line; you will receive detailed instructions.

The impact of 32 bit and 64 bit pointers on applications

Jeffrey C. Mogul, Joel Bartlett, Jeremy Dion, Russell Kao, Bob Mayo, Louis Monier, Amitabh Srivastava

March, 1993

Abstract

32-bit architectures cannot support the largest applications, so the transition to 64-bit systems has commenced. This has led to some debate over which applications require 64-bit support. We have identified, instead, a set of applications that actually suffer from the use of 64-bit pointers. Such applications are best supported by a 64-bit machine that provides 32-bit pointer variables. We suggest that a 64-bit system should therefore provide the option of using either 32-bit or 64-bit pointers, on a program-by-program basis.

This note describes a number of example programs (drawn mostly from advanced CAD software) and programming techniques that would suffer from the unnecessary use of 64-bit pointers. We estimate the performance and constraints on problem size that would result.


Copyright 1993 Digital Equipment Corporation


Table of Contents

1. Introduction
2. Garbage collection and pointer sizes
3. Examples of CAD programs that prefer 32-bit pointers
3.1. Corner stitching
3.2. Switch-level simulation
3.3. Binary Decision Diagrams
3.4. Simulation browsing
3.5. Summary of CAD applications
4. Problems with global variables
5. Other large programs
5.1. Sorting programs
6. Summary
Acknowledgements
References


List of Figures

Figure 1: Garbage-collection overhead as a function of heap size


List of Tables

Table 1: Garbage-collection overhead as a function of heap size
Table 2: Effect of pointer size on garbage-collection cost


1. Introduction

Many people believe that if 32-bit machines were better than 16-bit machines, then surely 64-bit machines are better than 32-bit machines. While this may be true, it leads to the belief that the proper way to exploit a 64-bit machine is to use 64-bit pointers in all applications. This is wrong: for some programs the costs of using 64-bit instead of 32-bit pointers exceed the benefits.

We have written this note in an attempt to explain and quantify this issue. We address only technical issues of application feasibility and performance, and ignore any logistical issues related to source code portability or the complexities of supporting both 32-bit and 64-bit programming environments on a single machine. We also ignore issues related to the use of integers larger than 32 bits (since these are rarely necessary) or 64-bit floating point values (since these are available on all modern CPUs).

One often-made assertion is that CAD programs "need 64 bits." Certainly this is true in many cases (such as FORTRAN programs using huge arrays), but in some cases it is not true. Worse, we know of many CAD programs that need lots of main memory, and that use many pointers; for a given CPU and a given amount of RAM, such programs may be much slower, or even essentially infeasible, using 64-bit pointers. In this note, we present several examples of such programs.

The assertion that CAD programs need large address spaces is based on a belief that such programs reference many millions of data items. We believe that the crucial question for many such programs is not "how much address space can the program reference," but rather "how many items can the program fit into the available main memory," since demand-paging an enormous address space is usually infeasible.

Why do these CAD programs not need 64-bit pointers? Consider CAD used in VLSI design. Current chips, even the largest ones, include millions or tens of millions of circuit elements. They do not yet contain hundreds of millions or billions of circuit elements, nor will they for a number of years. CAD programs tend to use tens or hundreds of bytes to represent each circuit element, so a full 32-bit address space is actually sufficient for even the most complex designs that one might expect over the next few years.

Most current 32-bit computer systems do not support true 32-bit address spaces for applications. The 32-bit virtual address space is often divided up into kernel and user spaces, and perhaps subdivided even further. Existing CPU implementations are often incapable of addressing more than a few hundred megabytes of RAM. And existing platforms are often further bounded in the amount of RAM they can accept. The net effect is that 32-bit computers do not support the larger VLSI CAD problems.

A 64-bit system with support for large virtual and physical address spaces can indeed improve the environment for large VLSI CAD problems. It should be possible to configure such a system with 4GB of main memory, and it should be possible for an operating system using 64-bit pointers to provide a true 32-bit address space for user programs.

Nevertheless, the actual amount of main memory available on any given computer system, even a 64-bit machine, is always bounded. Most cost-effective platforms do not support the use of more than a few hundred megabytes of RAM, and even when a lot of RAM can be attached, it significantly increases the cost of the system. For example, consider the DECstation 3000 Model 400 AXP, with an entry-level price of $15,000. Digital’s current list price for RAM is about $100 per megabyte, so a fully-loaded system with 512MB of RAM (available with 16Mbit chips) would cost about $66,000. More than 75% of the system cost would be for RAM. The most RAM configurable for any announced DECstation is 1GB. In other words, there are monetary and actual limits on the amount of real memory available for cost-effective systems.

On an architecture such as Alpha AXP, which imposes no intrinsic penalty on the use of 32-bit pointers, we believe that for many CAD applications it would be better to use 32-bit pointers than 64-bit pointers. The use of larger pointers, for a given problem and a given platform, means one of two things:

• The problem will require more RAM, at a real cost to the customer, and with no offsetting benefit.

• The problem simply will not fit on even the largest possible configuration, and hence becomes infeasible.

In theory, one can always convert a program using 32-bit pointers into one using 32-bit integer indices into a large array. In practice, this is not without its drawbacks:

• It adds additional instructions and perhaps a memory reference to the "pointer-manipulation" steps. Since these are often the dominant steps in CAD algorithms, much of the performance of the machine is wasted.

• It may still require more memory than a purely 32-bit pointer model would require, if the integer-to-pointer conversion must be done using a mapping array. This increases the amount of RAM that must be purchased, or it might make the problem infeasible.

• A vendor wishing to sell a CAD program on both 32-bit and 64-bit systems is either forced to maintain two copies of the source code, or is forced to accept an unnecessary performance hit on the 32-bit systems.
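The index-based conversion discussed above can be sketched as follows. This is only an illustration under stated assumptions, not code from any WRL tool: the names `Cell`, `CellIndex`, `CellPool`, `kNil`, and `sum_list` are hypothetical, and a linked list stands in for a real CAD structure.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical list cell that links to its successor through a 32-bit
// index into a pool, rather than through a machine pointer.
using CellIndex = std::uint32_t;
constexpr CellIndex kNil = 0xFFFFFFFFu;  // assumed "null" index value

struct Cell {
    std::int32_t value;
    CellIndex next;  // 4 bytes regardless of the machine's pointer size
};

static_assert(sizeof(Cell) == 8, "two 32-bit fields");

struct CellPool {
    std::vector<Cell> cells;  // the "large array" that indices refer to

    // Adds a cell in front of `head` and returns the new cell's index.
    CellIndex push_front(std::int32_t value, CellIndex head) {
        cells.push_back(Cell{value, head});
        return static_cast<CellIndex>(cells.size() - 1);
    }
};

// Every step must turn an index into an address (array base plus scaled
// index) before it can load the cell; a raw pointer would be used directly.
inline std::int64_t sum_list(const CellPool& pool, CellIndex head) {
    std::int64_t total = 0;
    for (CellIndex i = head; i != kNil; i = pool.cells[i].next) {
        total += pool.cells[i].value;
    }
    return total;
}
```

The cell stays at 8 bytes on any machine, but each traversal step pays for the index-to-address conversion, which is the instruction overhead described in the first drawback.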

Unnecessary use of 64-bit pointers imposes two other costs: an increase in the cache-miss rate, and an increase in the TLB-miss rate (see footnote 1). Both of these effects tend to reduce the performance advantage of a given CPU.

Footnote 1: If the operating system were able to use the granularity-hint mechanisms to map huge virtual spaces with a few TLB entries, the TLB miss rate would be much less of a problem for us. This is not yet supported by OSF/1, the only operating system that our applications would run under.

The bottom line is that, for a given 64-bit CPU, it may be more cost-effective to use 32-bit pointers in certain applications. The application might run faster, take less RAM, or support a larger problem before demand-paging leads to thrashing. While a modern 64-bit system may benchmark faster than a contemporary 32-bit system, it may actually perform slower on some real problems.

We can distinguish between three system models:

1. 32-bit-only architecture

2. 64-bit-only architecture

3. 64-bit hardware with optional support for 32-bit pointers

While full 64-bit support is necessary for some applications, the third model provides this support while giving the option of using 32-bit pointers when that is appropriate. Using a 64-bit system with 32-bit pointers is also likely to provide better performance than using a simple 32-bit system, especially if floating-point operations are involved. Thus, the third model is the one that we prefer.

Subsequent sections of this note discuss specific examples and problems in more detail.

2. Garbage collection and pointer sizes

Many of the programs in our CAD system use "garbage collection" instead of explicit deallocation of dynamic storage. Garbage collection makes these programs much simpler to write (one no longer has to worry about forgetting to free a data item, or freeing it too many times, or at the wrong time). It also may improve the performance of code that manipulates complex structures, since it obviates the need to maintain reference counts or to call explicit deallocation routines.

Garbage collection does complicate the performance picture. We use a "mostly-copying" algorithm [1], which requires that a pool of free space be kept available. The total size of the garbage-collected address space, including live storage and the overhead, is called the "heap." When the heap gets too small, the cost of garbage collection becomes excessive. Even with enough headroom to work in, garbage collection does have run-time costs, and these depend somewhat on the size of objects used.

Increasing the size of pointers from 32 bits to 64 bits, and hence the size of any data structure containing pointers, causes several performance problems related to garbage-collection:

1. It will increase the rate at which storage is allocated; that is, the rate at which address space is consumed. This in turn increases the rate at which garbage collection must take place.

2. It will increase the cost of each garbage collection phase, since this cost is proportional to the amount of live storage.

3. It reduces the number of structures that can fit into a given amount of real memory, and thus may cause the garbage collector to run out of headroom.

In the worst case, the first and second effects would each increase run-time linearly with the increase in structure size. That is, if the change in pointer size added 50% to the size of a garbage-collected structure, one would expect the rate of address space consumption to increase by 50%. Assuming that the number of live structures does not change, the amount of data copied during the collection phase would also increase by 50%.

In practice, the actual increase in these run-time costs is somewhat lower. Address space consumption is normally not the only thing that a program does. During the copying phase of garbage collection, doubling the size of a copied object does not necessarily double the cost of copying it, since copies of large objects amortize the fixed costs of copying over more bytes (and may have better locality of reference).

The third effect, related to the loss of headroom available to the garbage collector, is not amenable to such simple analysis. For any given application working on a specific problem, there is some minimum amount of memory that is sufficient to support the garbage collector without excessive overhead. When the available memory is reduced below this point, garbage-collection costs increase dramatically, and may become effectively infinite. Normally, the garbage collector expands its heap as necessary, but the user can limit the expansion to keep it from overflowing real memory.

We measured the garbage-collection overheads for a specific problem on both MIPS/Ultrix, using 32-bit pointers, and Alpha AXP/OSF, using 64-bit pointers. The example we used is the Scheme->C compiler [2] compiling the largest source file in an application called ezd. Scheme->C is a Scheme implementation that achieves high portability by using C as its intermediate language. For this problem, the first "reasonable" heap size using 32-bit pointers is 7MB; using 64-bit pointers, one needs at least 12MB, a 71% increase (see Table 1 and Figure 1).

Using 32-bit pointers

    Appl. time       GC time   heap (MB)   GC/total time
    Never finished   -          5
    74.2             82.1       6          .53
    72.7             13.1       7          .15
    72.9              9.7       8          .12
    73.1              6.2       9          .08
    73.7              5.6      10          .07
    73.2              5.8      11          .07
    74.2              5.5      12          .07
    73.6              5.4      13          .07
    72.4              5.5      14          .07
    72.4              5.2      15          .07
    73.4              5.0      16          .06
    73.0              4.8      17          .06
    73.3              4.9      18          .06

Using 64-bit pointers

    Appl. time       GC time   heap (MB)   GC/total time
    Never finished   -          9
    31.4             117.3     10          .79
    30.4              34.0     11          .53
    29.7               6.7     12          .18
    30.0               4.4     13          .13
    30.5               3.4     14          .10
    30.0               3.3     15          .10
    29.9               2.2     16          .07
    30.4               2.1     17          .06
    30.1               2.1     18          .07

Table 1: Garbage-collection overhead as a function of heap size

[Figure 1 plots the fraction of time spent in GC (vertical axis, 0 to 1.2) against heap size in MB (horizontal axis, 4 to 20), with one curve for MIPS (32 bit) and one for Alpha AXP (64-bit). A reference line is labeled "At or above line, never finishes."]

Figure 1: Garbage-collection overhead as a function of heap size

Note that the garbage collector adds one integer to each garbage-collected object, for its own purposes. While only 32 bits of information are stored in this field, in a 64-bit world the field must be padded to 64 bits, to obey structure alignment rules. Thus even objects that do not contain explicit pointers still incur more garbage-collection costs with 64-bit pointers, especially if they are small. (Other storage allocators may also impose overhead that increases in a 64-bit world. For example, malloc() adds one pointer to every allocated object.)
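The padding of the collector's header field can be illustrated with a small layout sketch. This is not the collector's actual object format: `GcPair`, `Ptr32`, and `Ptr64` are hypothetical names, and the pointer fields are modeled with fixed-width stand-ins (with 8-byte alignment forced on the 64-bit one) so that both layouts can be inspected on any host.

```cpp
#include <cstdint>

// Stand-ins for pointer fields of each size. The 64-bit stand-in is
// explicitly 8-byte aligned, as a 64-bit pointer would be on Alpha AXP.
struct Ptr32 { std::uint32_t bits; };
struct alignas(8) Ptr64 { std::uint64_t bits; };

// Hypothetical garbage-collected object: a 32-bit collector header
// followed by two pointer fields.
template <typename Ptr>
struct GcPair {
    std::uint32_t gc_header;  // only 32 bits of information in either case
    Ptr first;
    Ptr second;
};

// 4 + 4 + 4 bytes: the header costs exactly 4 bytes.
static_assert(sizeof(GcPair<Ptr32>) == 12, "no padding with 32-bit fields");

// 4 + 4 (padding) + 8 + 8 bytes: alignment makes the header cost 8 bytes.
static_assert(sizeof(GcPair<Ptr64>) == 24, "header padded to 64 bits");
```

Under these assumptions the header's share doubles from 4 to 8 bytes even though it carries the same 32 bits of information.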

3. Examples of CAD programs that prefer 32-bit pointers

The Western Research Laboratory (WRL) has for many years done research on the design of high-speed VLSI implementations of RISC processors. Our approach depends on aggressive use of innovative CAD systems, most of which have been written by WRL members. While our CAD approach may not be entirely representative of the traditional VLSI CAD market, we believe that we encounter many of the same problems.

The current design project at WRL is called BIPS-1, a full implementation of the Alpha AXP architecture in BiCMOS. The chip includes integer and floating point functional units, two levels of cache, address translation buffers, and various control logic. The final chip will have approximately 4 million transistors.

In this section, we describe four CAD algorithms crucial to our highly-automated design style:

• Corner-stitching, used in our router. The router is used to produce virtually all wiring in the BIPS-1 chip. Our design style is to run the router, then analyze the result to see if changes in the circuit design or layout would improve performance. This means that the router is run fairly often, and must be as fast as possible. Many years of effort have been put into this router; even so, it takes many hours to route the chip, which limits the turnaround time for our design cycle.

• Switch-level simulation of the chip. We aggressively simulate our chips at many design levels, because an error requiring an additional fabrication run costs lots of money and many months. The speed of the simulator is crucial, since we need to simulate as many CPU cycles as possible in order to have confidence in our designs.

• Binary Decision Diagrams, used for manipulations of logical equations. Rather than doing the chip design at the level of gates, our designers create much of the design using more abstract logical equations. We then generate gate-level schematics automatically; this involves a lot of automated optimization and is CPU-intensive.

• Tree-structured representation of circuits, useful (among other things) for a "browser" that allows a designer to visualize the results of a simulation run.

All of these algorithms are used, in one form or another, by other people in the CAD community, and (for the BIPS-1 chip) we have serious problems fitting some of them into the memory available on our current systems (32-bit DECstation 5000s with 480MB of main memory). In the case of the router and the simulator, we doubt that we could actually make use of current Alpha AXP workstations without support for 32-bit pointers; the designs would simply not fit using 64-bit pointers.

3.1. Corner stitching

During the process of routing the BIPS-1 chip (when all the circuit elements are automatically wired together), we use an algorithm called "corner-stitching." This was originally developed by John Ousterhout [5]. Many other papers have been published since then that refer to the algorithm; it is widely used, at least in the research community.

The basic data unit in the corner-stitching algorithm is called a Tile. A Tile has this structure:

                                   H,Y
                                    ^
                                    |
             ------------------------
             |                      | ---> H,X
             |                      |
             |                      |
             | (x,y)                |
    L,X <--- ------------------------
             |
             v
            L,Y

The (x, y) coordinates of the lower left corner of the Tile are stored, along with four "corner stitches": (L,X), (L,Y), (H,X), and (H,Y). The "stitches" are pointers to neighboring Tiles. For example, the (H,Y) stitch points to the rightmost top neighbor of the Tile. A Tile’s sides are on integral coordinates; the Tile contains its lower and left edges, but not its top or right edges.

Our router is written in C++, and represents a Tile like this:

    class Tile {
        Vec org;          // lower-left corner
        Tile* st[2][2];   // [LH][XY]

    public:
        int color;        // user data
        [... C++ class methods omitted ... ]
    };

In other words, we use 3 integers and 4 pointers for each Tile. The integers represent a position in X,Y space and so will probably never need more than 32 bits.

This is the dominant data structure in the router’s address space. Here are some statistics for how many Tiles are needed to route a large chip, using 32-bit pointers:


• BIPS-0, a 32-bit MIPS processor without floating point or virtual memory support, required about 200MB of address space for the final routing process, of which about 100MB was used for Tiles. That works out to about 3.6 million Tiles.

• BIPS-1, a 64-bit Alpha AXP CPU, has not yet been routed, but it will be about 10 times more complex than BIPS-0. On the other hand, we have been trying to make the router smarter, and optimistically estimate that we will be able to route the chip using 400MB of memory using 32-bit pointers. With 64-bit pointers, this would probably mean that we need more than 600MB of memory.

• Future chips are likely to be even more complex (taking advantage of shrinking VLSI feature sizes) and hence will require even more memory to route.

For some of the data structures used by the router, we are using garbage collection rather than explicit memory allocation and deallocation; this means that we need some extra space in order to run the garbage collection algorithms. Even in a 32-bit world, the router is so memory-starved that we cannot afford to pay this garbage-collection space overhead for the bulk of the data, and so the corner-stitching Tiles are not garbage-collected. Still, the structures that are garbage-collected amount to around 100MB (using 32-bit pointers), which means that we must spend several tens of megabytes of RAM on garbage collector overhead. If the garbage-collected data structures grow because we use 64-bit pointers instead of 32-bit pointers, the amount of this extra space will increase in proportion.

The router algorithms do not have much locality of reference. This means that as soon as the program runs out of RAM (after the operating system has taken its cut) it starts thrashing the backing store and becomes far too slow for any practical purposes. The program also tends to thrash the TLB; our router tries to maintain a little TLB-relative locality but we are not sure how successful this is.

If we stick with our current algorithms, a 32-bit machine gives us more than enough address space. On the other hand, a 64-bit machine almost doubles the amount of RAM we need to buy, and in fact we would not be able to use a DECstation 3000 Model 400 at all (since it doesn’t support more than 512MB of RAM). Since our design style is to run the router once a night, and then adjust the design during the daytime, it would be a real benefit to be able to use a CPU faster than an R3000, but only if the 64-bit pointers do not cause thrashing.

Using 32-bit pointers, a Tile is stored in 28 bytes. Using 64-bit pointers, a Tile requires 48 bytes (see footnote 2). It should be possible to replace the four 64-bit pointers with 32-bit integer indices into a table of pointers. This would not entirely eliminate the storage penalty for large pointers, because each Tile would require an 8-byte entry in the table; the net space used per Tile would be 36 bytes. It would also mean that each reference to a Tile would require an additional array reference (probably 2 extra instructions including one extra memory reference, which is likely to be a cache miss). Since the main loops of the program mostly chase Tile pointers and do a little calculation on each Tile, this could add a significant cost. It might not eliminate the Alpha AXP’s performance advantage, but it would certainly cut into it.

Footnote 2: The 3 integers and 4 pointers together take up 44 bytes, but the compiler rounds up the size of a structure containing 64-bit pointers to a multiple of 8 bytes.
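The three per-Tile sizes quoted above (28, 48, and 36 bytes) can be checked with a layout sketch. It is an approximation of the router's class, not the class itself: `TileLayout`, `Ptr32`, `Ptr64`, and `kIndexedBytesPerTile` are hypothetical names, `Vec` is assumed to hold two 32-bit coordinates, and the stitch fields are modeled with fixed-width stand-ins so that both layouts compile on any host.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for a stitch field of each size; the 64-bit one is forced to
// 8-byte alignment, as a 64-bit pointer would be.
struct Ptr32 { std::uint32_t bits; };
struct alignas(8) Ptr64 { std::uint64_t bits; };

struct Vec { std::int32_t x, y; };  // assumed: two 32-bit coordinates

// Same field order as the Tile class: origin, four stitches, user data.
template <typename Link>
struct TileLayout {
    Vec org;             // lower-left corner
    Link st[2][2];       // [LH][XY] corner stitches
    std::int32_t color;  // user data
};

// 3 integers (12 bytes) + 4 links (16 bytes).
static_assert(sizeof(TileLayout<Ptr32>) == 28, "32-bit stitches");

// 12 + 32 = 44 bytes of fields, rounded up to a multiple of 8.
static_assert(sizeof(TileLayout<Ptr64>) == 48, "64-bit stitches");

// Index-table variant: 32-bit indices inside the Tile, plus one 8-byte
// table entry per Tile to map an index back to a 64-bit address.
constexpr std::size_t kIndexedBytesPerTile =
    sizeof(TileLayout<Ptr32>) + sizeof(Ptr64);
static_assert(kIndexedBytesPerTile == 36, "28-byte Tile + 8-byte entry");
```

Under these assumptions, moving from 32-bit to 64-bit stitches costs 20 extra bytes per Tile (about 71%), while the index-table variant costs 8 extra bytes (about 29%) plus the extra array reference on every Tile access.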


3.2. Switch-level simulation

Bisim is a simulator that WRL has been using to simulate the logical behavior of the BIPS-1 microprocessor. We will eventually use Bisim to simulate the entire chip at the switch level. This entails predicting the logical behavior of the chip by accounting for the behavior of each transistor. We anticipate that BIPS-1 will eventually consist of approximately 4 million transistors and 2 million nodes. Bisim uses most of main memory to store the graph that represents the interconnection of those nodes and transistors.

A fair amount of effort has gone into trying to represent that graph in as compact a form as possible. We believe that we will barely be able to fit the graph into the maximum amount of physical memory of a DECstation 5000, 480 megabytes. This is important because Bisim’s performance is unacceptable if the entire graph doesn’t fit into physical memory.

The graph is represented by two data structures: Nodes and Transistors. A Node contains pointers to all Transistors attached to it, and each Transistor has pointers to all Nodes to which it is attached. The Node data structure consists of 14 words of data and 12 pointers, while the Transistor data structure consists of 7 words of data and 9 pointers. Using 32-bit pointers, 2 million Nodes require 208 million bytes of storage while 4 million Transistors require 256 million bytes of storage. Thus the graph requires 464 million bytes of storage.

52% of that storage is occupied by pointers. If the size of pointers were doubled, the size of the graph would increase to about 704 million bytes. The size of physical memory would have to be increased correspondingly to preserve performance.
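The storage figures for the graph follow from the field counts given above, as this short calculation shows. It is a sketch under two assumptions: a data word is 4 bytes, and no alignment padding is added to either structure. The function and constant names are hypothetical.

```cpp
#include <cstdint>

constexpr std::int64_t kNodes = 2000000;        // from the text
constexpr std::int64_t kTransistors = 4000000;  // from the text
constexpr std::int64_t kWordBytes = 4;          // assumed 32-bit data word

// Size of one record with the given numbers of data words and pointers.
constexpr std::int64_t record_bytes(std::int64_t words, std::int64_t pointers,
                                    std::int64_t pointer_bytes) {
    return words * kWordBytes + pointers * pointer_bytes;
}

// Whole graph: Nodes have 14 words + 12 pointers, Transistors 7 words + 9.
constexpr std::int64_t graph_bytes(std::int64_t pointer_bytes) {
    return kNodes * record_bytes(14, 12, pointer_bytes) +
           kTransistors * record_bytes(7, 9, pointer_bytes);
}

// Bytes of the graph occupied by pointers alone.
constexpr std::int64_t pointer_share_bytes(std::int64_t pointer_bytes) {
    return (kNodes * 12 + kTransistors * 9) * pointer_bytes;
}

static_assert(kNodes * record_bytes(14, 12, 4) == 208000000, "Nodes");
static_assert(kTransistors * record_bytes(7, 9, 4) == 256000000, "Transistors");
static_assert(graph_bytes(4) == 464000000, "graph with 32-bit pointers");
static_assert(pointer_share_bytes(4) == 240000000, "about 52% of the graph");
static_assert(graph_bytes(8) == 704000000, "graph with 64-bit pointers");
```

With 8-byte pointers the unpadded total is 704 million bytes, roughly 52% more than the 32-bit figure; any alignment padding on the 100-byte Transistor record would push it higher still.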

In addition to the graph, Bisim employs other data structures (for example, a node-name symbol table). However, these can reside on a paging device without seriously impacting the simulator’s performance. We expect the size of this auxiliary storage to be about 1/4 the size of the storage needed to represent the graph. Thus the total virtual address space occupied by Bisim can definitely be handled using 32-bit addresses.

For the BIPS-1 design, Bisim will not benefit from 64-bit pointers. Instead, its performance will be hurt by the 50% growth in data size, and the use of 64-bit pointers will make the problem infeasible on any system supporting less than about 750MB of main memory.

3.3. Binary Decision Diagrams

Bdds (Binary Decision Diagrams) are data structures used for logic manipulations in CAD tools. Bdds serve as unique representations for boolean logic functions. They have become a popular structure for many logic manipulation applications.

A single Bdd represents a single boolean logic function. It contains the index of a variable, say X, and two pointers to simpler logic functions represented as Bdds. The two Bdds represent the boolean functions that would result if X were bound to one of the values 0 or 1. Distinguished terminal Bdds are used to represent the logic functions 0 (always false) and 1 (always true). In this manner, any boolean function can be represented with Bdd nodes by simply numbering the variables and building the graph.


Every Bdd is part of a BddContext, which represents a set of variables and an order imposed upon them. Bdds have the property that there is a single unique tree that represents a given logic function under a given variable ordering. To make logic function equality fast, as well as to save storage, we further ensure that we never allocate a new Bdd if we already have one with the same value. We do this by storing all Bdds in a hash table in the BddContext. When we are about to allocate a new Bdd, we first check the hash table to see if we already have one with the right value. This allows us to guarantee that if two logic functions are equal, the pointers to their Bdds will be equal.
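The check-before-allocate scheme described above can be sketched as follows. This is a minimal illustration, not the BIPS-1 code: the member names, the use of `std::unordered_map`, and the choice to hash on the variable index and child pointers are all assumptions made for the example, and the extra fields of the enhanced node are left out.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <tuple>
#include <unordered_map>
#include <vector>

// Basic node: a variable index and the two cofactors.
struct Bdd {
    int var;          // index of the decision variable
    const Bdd* low;   // function when the variable is bound to 0
    const Bdd* high;  // function when the variable is bound to 1
};

class BddContext {
    using Key = std::tuple<int, const Bdd*, const Bdd*>;

    struct KeyHash {
        std::size_t operator()(const Key& k) const {
            std::size_t h = std::hash<int>()(std::get<0>(k));
            h = h * 31 + std::hash<const Bdd*>()(std::get<1>(k));
            h = h * 31 + std::hash<const Bdd*>()(std::get<2>(k));
            return h;
        }
    };

    Bdd zero_;  // distinguished terminal for "always false"
    Bdd one_;   // distinguished terminal for "always true"
    std::unordered_map<Key, const Bdd*, KeyHash> unique_;  // the hash table
    std::vector<std::unique_ptr<Bdd>> nodes_;              // owns the nodes

public:
    BddContext() : zero_{-1, nullptr, nullptr}, one_{-1, nullptr, nullptr} {}

    const Bdd* zero() const { return &zero_; }
    const Bdd* one() const { return &one_; }
    std::size_t size() const { return nodes_.size(); }

    // Returns the existing node with this value if there is one; allocates
    // a new node only when the hash table has no match.
    const Bdd* make(int var, const Bdd* low, const Bdd* high) {
        const Key key(var, low, high);
        auto it = unique_.find(key);
        if (it != unique_.end()) return it->second;
        nodes_.push_back(std::unique_ptr<Bdd>(new Bdd{var, low, high}));
        const Bdd* node = nodes_.back().get();
        unique_.emplace(key, node);
        return node;
    }
};
```

Because `make` never creates a second node with the same contents, two functions built within one context are equal exactly when their pointers are equal. Note that every field of the key, and every entry in the table, is pointer-sized, which is why the table's cost also grows with pointer size.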

Enhancements can be made to the basic Bdd structure and will vary from application to application. In the BIPS-1 CAD system, we add to the basic structure 4 integers and 1 pointer. The 4 integers contain a unique ID for the node (used for hashing), an integer containing a boolean flag that allows us to compress the data structures, and two integers that cache the results of commonly computed functions (the depth of the Bdd subtree and the number of nodes in the subtree). The additional pointer points to the BddContext.

Our enhanced Bdd node structure thus consists of three pointers and 5 integers. Using 32-bit pointers, a Bdd node consumes 32 bytes; using 64-bit pointers, it would take 44 bytes. (All but one of the five integers can be represented using a 16-bit field, so in fact the structures could be 8 bytes smaller in either case. We have been using 32-bit integers, but we have machines with lots of RAM.)

The garbage collection system adds another pointer-sized field to each garbage-collected structure. That increases the node sizes to 36 and 52 bytes, respectively.

We must also consider the storage in the BddContext’s hash table of unique Bdds. There are many ways to build a hash table, but one would expect somewhat more than one pointer per item; in our code, we use about 1.5 pointers per item. Thus the total consumption for a Bdd node using 32-bit pointers is 42 bytes, and 64 bytes using 64-bit pointers, for a net increase of 52%.
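The per-node totals above can be reproduced with a few lines of arithmetic. The sketch below simply adds up the components named in the text (three pointers, five 32-bit integers, one pointer-sized collector field, and 1.5 pointers of hash-table space); the function names are hypothetical and no alignment padding is modeled.

```cpp
// Bytes in the enhanced Bdd node itself: 3 pointers plus 5 32-bit integers.
constexpr int bdd_node_bytes(int pointer_bytes) {
    return 3 * pointer_bytes + 5 * 4;
}

// Node plus the collector's pointer-sized per-object field.
constexpr int bdd_collected_bytes(int pointer_bytes) {
    return bdd_node_bytes(pointer_bytes) + pointer_bytes;
}

// Collected node plus its share of the unique table (1.5 pointers per item).
constexpr int bdd_total_bytes(int pointer_bytes) {
    return bdd_collected_bytes(pointer_bytes) + 3 * pointer_bytes / 2;
}

static_assert(bdd_node_bytes(4) == 32, "node, 32-bit pointers");
static_assert(bdd_node_bytes(8) == 44, "node, 64-bit pointers");
static_assert(bdd_collected_bytes(4) == 36, "with collector field");
static_assert(bdd_collected_bytes(8) == 52, "with collector field");
static_assert(bdd_total_bytes(4) == 42, "with hash-table share");
static_assert(bdd_total_bytes(8) == 64, "with hash-table share");
```

The growth from 42 to 64 bytes is 22/42, or about 52%.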

Two classes of operations on Bdds that normally would take time exponential in the size ofthe Bdd subtree can be converted to sub-linear time operations through the use of caching.These are logical operations (e.g., AND/OR/NOT) and variable-binding operations. The datastructures used to represent these caches are named BddOpEntry and BddSetEntry, respectively.A BddOpEntry takes 3 pointers and 4 integers and would grow by 66% if 64-bit pointers wereused; a BddSetEntry takes 2 pointers and 3 integers and would grow by 50%. In a typical run,the program allocates almost as many BddOpEntry items as basic Bdd nodes, and about twice asmany BddSetEntry items.

In our application, Bdd operations are time-limited, not space limited, in the sense that thereare never more than a few hundreds of megabytes of Bdd-related storage in use at any one time.However, Bdds are created at a furious rate during logic optimizations of a given equation, andare then made obsolete when work ceases on a given logic equation. Because the Bdd code isembedded in the rest of the CAD system, we choose to use the same garbage collection schemethat the rest of the system uses.

The cost of the copying garbage collection we use is roughly proportional to the amount of space retained (see section 2). This is probably best measured as the number of bytes retained, not the number of pointers, since when garbage-collecting a large address space, one will tend to have mostly cache misses and probably a lot of TLB misses. (One doesn’t want to have a lot of page faults; that is, the size of the garbage-collected space is set to fit into available real memory.) So the garbage-collection cost should increase by somewhat more than 50% if we switched from using 32-bit pointers to 64-bit pointers. On a 32-bit machine, the garbage-collection overhead appears to be about 10%, so we would expect it to be about 15% using 64-bit pointers.

In addition, the Bdd manipulations are likely to suffer a higher cache-miss rate and possibly a higher TLB-miss rate, since their working set is more than 50% larger. It is hard to quantify the performance costs here, since they depend on specifics of the memory hierarchy and operating system.

The performance of Bdd operations is considered critical to most logic synthesis CAD systems [3], although the BIPS CAD system does not use Bdds as aggressively as some other systems. To simulate a more aggressive system we ran our normal Bdd calculation 40 times on a MIPS/Ultrix (32-bit) system. In one run, we used the normal Bdd node structure; in the other run, we added three dummy pointers to the node, to simulate the use of 64-bit pointers. We also doubled the size of the hash tables, for the same reason. We did not double the size of the garbage collector’s overhead word (and the size of BddSetEntry items was not rounded up to a 64-bit boundary), so this ‘‘simulation’’ is conservative in that respect.

Pointer size    Total time    GC time    %GC
32 bits         10420         1130       10.8%
64 bits         12008         1627       13.5%

Table 2: Effect of pointer size on garbage-collection cost

The results are shown in table 2. The ratio of garbage collection times (1627/1130 = 1.44) is slightly less than we had predicted, although this may be caused partly by our incomplete simulation of a 64-bit system. We also suspect that the cost of copying the larger structures does not increase linearly, because it improves the cache hit rate slightly. The time spent in the rest of the application (i.e., not doing garbage collection) increased by about 12%, probably because of a higher cache-miss rate. (Note, however, that we had trouble getting repeatable numbers for these trials, so these figures are somewhat fuzzy.)
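The ratios quoted from Table 2 can be rederived directly from its four numbers. The following is a small sketch of that arithmetic; the names `TABLE2`, `gc_share`, and `ratio_64_over_32` are illustrative, and the size-based prediction simply reuses the 42-byte and 64-byte per-node figures given earlier.

```python
# Figures copied from Table 2: (total time, GC time), in the table's own units.
TABLE2 = {32: (10420, 1130), 64: (12008, 1627)}


def gc_share(pointer_bits):
    """Fraction of total run time spent in garbage collection."""
    total, gc = TABLE2[pointer_bits]
    return gc / total


def ratio_64_over_32(select):
    """Ratio of a derived quantity between the 64-bit and 32-bit runs."""
    return select(*TABLE2[64]) / select(*TABLE2[32])


gc_ratio = ratio_64_over_32(lambda total, gc: gc)            # GC time only
rest_ratio = ratio_64_over_32(lambda total, gc: total - gc)  # everything else
predicted_gc_ratio = 64 / 42   # per-node bytes, 64-bit vs 32-bit pointers

print(f"GC share: {gc_share(32):.1%} -> {gc_share(64):.1%}")
print(f"GC time ratio: {gc_ratio:.2f} (size-based prediction {predicted_gc_ratio:.2f})")
print(f"non-GC time increase: {rest_ratio - 1:.0%}")
```

Given the repeatability caveat in the text, the second decimal place of these ratios should not be read as significant.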

The net cost (in CPU time) of the change from 32-bit to 64-bit pointers seems to be nearly proportional to the change in the size of data items. In the worst case, a data structure made up solely of pointers, run time might almost double. Even in the case of Bdd operations, where the increase is likely to be in the range of 10% to 15%, the cost of using 64-bit pointers might be significant for long-running applications.

The increase in CPU time does not seem to be particularly significant in this case. More troubling is the expansion in the amount of heap space. When we used 32-bit pointers, the garbage collector acquired about 150 Mbytes of heap space. When we used 64-bit pointers, the garbage collector allocated about 200 Mbytes. This is a significant expansion in the amount of real memory required to run this problem. We tried running the same problem (using 64-bit pointers) with the heap size limited to 150 Mbytes, and the program simply failed: the garbage collector was unable to allocate sufficient storage. For us, this is not a serious limitation, but for other Bdd-based applications, or for machines with smaller main memories, it could be a real problem.

Since Bdd manipulation is time-limited, it probably does not make sense to replace the pointers with integer indices into an array of pointers. Although this would not reduce the speed of subtree comparisons (these could be done using the integer IDs rather than the pointers), it would reduce the speed of Bdd construction and other manipulations (by adding an extra array reference to each Bdd reference).

3.4. Simulation browsing

The Krono program is used to visually ‘‘browse’’ the results of a simulation of a complex chip. Krono takes as input a description of the circuit, and the log of a simulation run for that circuit. The log contains a history of the states for each node in the circuit. Using Krono, a designer can look at the timing diagrams for various points in the circuit, and see how they relate to each other. Effectively, Krono is an omniscient logic analyzer that can display any element of the circuit at any point during the simulation.

Krono uses lots of memory. There are two major contributors to memory use: the circuit description, which is a tree structure made up of ‘‘Chunks,’’ ‘‘Blocks,’’ and ‘‘Nodes,’’ and the simulation log, which is stored at the leaves of the tree in ‘‘Chunks.’’ (Chunks are a sort of dynamic array, where every element is a pointer.) The size of the tree depends directly on the complexity of the circuit; the amount of information stored at the leaves depends on the length (in time steps) of the simulation being browsed. These structures are all garbage-collected, since during tree construction there are intermediate Chunk structures that must be deallocated.

In a typical problem (we used an early instance of the BIPS-1 chip), the circuit is represented using 60480 Chunks, 15585 Blocks, and 26951 Nodes (these numbers are approximate). Each Block contains seven pointers and three integers, and each Node contains one pointer and three integers. Both structures also require another 32-bit field for use by the garbage collector. The average Chunk requires about 87 bytes, which means that it contains approximately 20 pointers. Thus, to represent this circuit we use 1345646 pointers and 170144 integers. This means that the storage requirements would increase by about 88% if 64-bit pointers were used. Looking at this another way: using 32-bit pointers, we can construct the tree using about 20MB of RAM. Using 64-bit pointers, we would need perhaps 38MB of RAM to avoid problems with garbage collector performance.
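The pointer and integer totals above follow from the structure counts, as the sketch below shows. It rests on assumptions inferred from the totals themselves: each Chunk is counted as 20 pointers, and the per-structure garbage-collector field is counted as a 32-bit integer in both configurations. The variable names are illustrative.

```python
# Structure counts from the text (approximate, per the authors).
chunks, blocks, nodes = 60480, 15585, 26951

# Pointers: about 20 per Chunk, seven per Block, one per Node.
pointers = chunks * 20 + blocks * 7 + nodes * 1

# Integers: three per Block and per Node, plus one 32-bit GC field each
# (assumption: the GC field is counted as an integer, matching the total).
integers = (blocks + nodes) * (3 + 1)

bytes_32 = 4 * pointers + 4 * integers   # 32-bit pointers, 32-bit integers
bytes_64 = 8 * pointers + 4 * integers   # only the pointers double in size
growth = bytes_64 / bytes_32 - 1

print(pointers, integers, f"{growth:.1%}")
```

Because almost nine tenths of the words in the tree are pointers, the growth is close to the worst-case doubling for an all-pointer structure.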

The simulation results, at the leaves of the tree, are also kept in Chunks. In this case, however, the Chunks contain integers rather than pointers, so increasing the pointer size would not significantly increase the amount of storage required.

Compared to the amount of space required to represent the log of a long simulation, the extra 18MB or so of space that would be needed with 64-bit pointers is not enormous. We expect future circuits to be significantly more complex (e.g., to support larger caches, multiple instruction issue, etc.), so the space required to represent the tree could become more of an issue.

3.5. Summary of CAD applications

We have shown how an increase in pointer size will affect four important CAD applications. None of the programs needs 64-bit pointers; all live quite comfortably within a 32-bit address space. All of the programs will be slowed down by an increase in pointer size, to varying extents. For the router and Bisim, use of 64-bit pointers may convert feasible problems into infeasible ones. These two programs most strongly illustrate the point that the size of main memory is a far more critical resource than virtual address space.

4. Problems with global variables

All current 64-bit architectures use 32-bit-wide instructions. Since there is no efficient way to load 64-bit address constants directly from a 32-bit instruction stream, one component of the run-time environment is a global offset table, or GOT table. The GOT table contains an array of the address constants used in the program. An adjacent set of Small Data Section (SDS) tables contains the actual values of scalar global variables. (‘‘Global’’ means with respect to a variable’s lifetime, not necessarily to its scope.)

When a program makes a (non-PC-relative) procedure call, it must first load the procedure’s entry address into a register. To do this, it does an indexed load from the GOT table into a register, and then calls the procedure using an indirect jump via the register. The same method is used to load global pointer variables, character string constants, or the values of global scalar variables. To reduce the cost of GOT-table accesses, one of the general registers is designated as the global pointer or GP, and normally does not need to be loaded before every GOT-table access. (This is an optimization that is not available for shared-library routines, which therefore have to load the GP on entry.) If the GOT table is small enough, the GP register can also be used to access SDS table entries without being reloaded.

The size of the GOT table is limited by the size of the offset in an indexed load instruction. For Alpha AXP, this is 16 bits, which means that the GOT table can span at most 65536 bytes. Use of 64-bit pointers means that the GOT table can contain up to 8192 entries.

8192 entries may seem like a lot, but in fact it is not uncommon for a program to need a lot of GOT-table entries. Among our own applications, we have found some programs that need more than 8192 GOT-table entries. For example, the Scheme->C system, which compiles programs in Scheme (a dialect of LISP) into C, generates many global variables. Some moderately large Scheme programs compile into C programs with more than 8192 globals.

For example, the set of interfaces from Scheme to the X window system (Xlib) is a library that should be shared, but cannot be shared because it overflows the GOT and SDS tables. In order to provide stubs for Scheme to all of Xlib, and Scheme access procedures to Xlib’s C structures, this library requires over 5400 table entries (both procedure addresses and global variables). This leaves fewer than 2800 table slots free for application-specific objects and for other libraries.
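The table-capacity arithmetic in the preceding paragraphs is simple enough to write down. This is a sketch only; `got_entries` is a hypothetical helper, and the 5400 figure is the lower bound quoted in the text for the Scheme-to-Xlib library.

```python
def got_entries(offset_bits=16, entry_bytes=8):
    """Number of table entries reachable through one offset field.

    offset_bits: width of the load instruction's offset field (16 per the text).
    entry_bytes: size of one address constant (8 for 64-bit pointers).
    """
    return (1 << offset_bits) // entry_bytes


capacity = got_entries()       # 65536 bytes / 8 bytes per entry
xlib_entries = 5400            # "over 5400" in the text, so a lower bound
free_slots = capacity - xlib_entries

print(capacity, free_slots)
```

Since the library needs more than 5400 entries, `free_slots` is an upper bound on what remains for the application and other libraries.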

This is not an insurmountable problem. The compilation system can allocate multiple GOT tables, and in fact the current OSF/1 Alpha AXP system does so, for non-shared-library programs. (Support for multiple GOT tables in shared libraries is feasible, but not yet implemented. This means that our large Scheme programs currently cannot use shared libraries.)

Unfortunately, when multiple GOT tables are used, it becomes much harder to optimize out the loads of the GP. In general, this must be done once on entry to each procedure, and this register must be saved and restored across procedure calls, although one could remove these operations for a call between procedures that provably share the same GOT table(s). This adds extra overhead to each procedure call (except in shared-library code, where it must be done anyway).

We have not measured the performance effect of this extra work. It is likely to be most visible for programs that do lots of short procedure calls, but that is typical behavior for the very Scheme programs that use so many globals.

Support for 32-bit pointers, instead of 64-bit pointers, would eliminate the need to keep address constants in the GOT table. 32-bit constants can be loaded using two LDAx instructions (effectively, load-immediates). Not only does this eliminate the restriction that address constants must fit into the GOT table, and hence may eliminate the need for multiple GOT tables, it also gets an address value into a register in just two cycles. Loading an address indirectly through the GOT table, because it requires a memory reference, takes at least two cycles (on a CPU with 1-cycle cache latency) and probably more. (The 21064-AA has a load latency of 3 cycles, but sometimes the intervening cycles can be used for independent instructions.)
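To illustrate how a 32-bit constant can be assembled from two 16-bit immediates, the sketch below splits an address into a high and a low displacement and rebuilds it. It assumes the usual pairing of an LDAH-style instruction (add a sign-extended 16-bit displacement shifted left by 16) with an LDA-style instruction (add a sign-extended 16-bit displacement). The function names are hypothetical, and the model keeps only the low 32 bits of the result, so it says nothing about how the upper half of a 64-bit register ends up.

```python
def sext16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def split_address(addr):
    """Split a 32-bit constant into (high, low) 16-bit displacement fields.

    The high part is adjusted upward when the low part will be
    sign-extended to a negative number.
    """
    low = addr & 0xFFFF
    high = ((addr - sext16(low)) >> 16) & 0xFFFF
    return high, low


def rebuild(high, low):
    """Model the two-instruction sequence, keeping only the low 32 bits."""
    reg = sext16(high) * 65536      # LDAH-style step
    reg += sext16(low)              # LDA-style step
    return reg & 0xFFFFFFFF


for addr in (0x00000000, 0x12345678, 0x12348000, 0x7FFF8000, 0xFFFFFFFF):
    high, low = split_address(addr)
    assert rebuild(high, low) == addr
```

The point of the adjustment in `split_address` is that both displacements are sign-extended, so a low half of 0x8000 or above subtracts from the high part rather than adding to it.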

Even in a 32-bit world, one would still like to use the SDS tables to store scalar variables; this reduces the overhead of accessing such variables. Removing the address constants from this table leaves more room for scalars.

5. Other large programs

We have relatively less experience with other large programs that use lots of pointers. (Most large scientific applications are probably still written in FORTRAN, which has no pointer variables, so these applications are not particularly sensitive to the issue of address size.)

We suspect that large symbolic systems, such as artificial intelligence applications, written in languages such as LISP, Scheme, or Prolog, are likely to have many of the same properties as our CAD programs:

• Large data sizes

• Mostly pointers

• Poor locality of reference

• Use of garbage collection

Hence, to the extent that these applications fit into a 32-bit address space, they will probably be more cost-effective (and probably more feasible) using 32-bit pointers.

We also suspect that large ‘‘in-memory’’ database systems will have similar properties. The line between such systems and AI applications is somewhat blurry, but one might include in this category a large engineering design represented as a relational database. While there will certainly be databases that cannot fit into a 32-bit address space even with 32-bit pointers, for the smaller databases there will be significant cost advantages to using smaller pointers (and hence less RAM).

5.1. Sorting programs

Unnecessarily large pointers would also hurt the performance of pointer-based sorting on large data sets. A sort program spends a large part of its time exchanging the order of records. It can do this by exchanging the records themselves, by exchanging the keys along with pointers to the full records, or by exchanging just the pointers. Pointer sort may be the most efficient technique if the keys are large, especially if the full array of pointers can fit into the CPU board-level data cache. For keys of moderate size, it might be more efficient to use a key sort, because it keeps the keys and the pointers together (and so increases locality). Some phases of a key-based QuickSort can run entirely in a small, on-chip cache [4].

If one is sorting less than a few billion bytes, a sort program has no need for 64-bit pointers. Use of larger pointers than necessary will probably reduce the number of keys (or pointers) that can fit into the CPU’s caches, and so reduces the size of the problem that can be sorted without excessive cache-miss overheads. In an environment that provides only 64-bit pointers, one could implement a pointer-based sort using 32-bit record indices. This requires the execution of several additional instructions each time an index is converted to a pointer, but this is probably less costly than incurring the extra cache misses that would be caused by use of 64-bit pointers. It also means that one must maintain separate versions of the sorting program for 32-bit and 64-bit machines.
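The index-based variant of a pointer sort can be sketched as follows. This is an illustration of the idea only, not a tuned sort: `index_sort` is a hypothetical name, the `array` type code `"I"` is an unsigned integer that is 4 bytes on common platforms (an assumption, since the language only guarantees at least 2), and Python's built-in sort stands in for whatever algorithm a real sort program would use.

```python
from array import array


def index_sort(records, key):
    """Return record indices in sorted order, leaving the records in place.

    The permutation is held in a compact array of unsigned integers
    rather than as full-width references to the records.
    """
    order = array("I", range(len(records)))
    # Each comparison converts an index back to a record, which is the
    # extra work the text refers to.
    return array("I", sorted(order, key=lambda i: key(records[i])))


records = [("pear", 3), ("apple", 1), ("fig", 2)]
order = index_sort(records, key=lambda record: record[0])
print([records[i] for i in order])
```

The records themselves never move; only the small index array is permuted, which is what lets more of the working set stay in cache.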

6. Summary

Most certainly there are applications, even CAD applications, that need 64 bits. Just as certainly, there are many applications that do just fine with 32 bits. Some of these applications will perform significantly better in a 32-bit world, or require significantly less RAM. In a real sense, these applications will achieve better price/performance ratios using a 32-bit model, with everything else held equal.

What concerns us most, in our effort to build the CAD tools for the BIPS-1 project, is that some of our applications simply will not run at all using 64-bit pointers, given the amount of RAM that we can connect to current workstations. If OSF on Alpha AXP did not support an optional 32-bit model, we would be forced to continue to use our MIPS-based Ultrix systems, in spite of their fundamentally poorer performance.

Acknowledgements

Jim Gray and Chris Nyberg contributed to the section on sorting programs. David Wall helped with proofreading and some issues of emphasis.

References

[1] Joel F. Bartlett. Mostly-Copying Garbage Collection Picks Up Generations and C++. Technical Note TN-12, WRL, October, 1989.

[2] Joel F. Bartlett. SCHEME->C: A Portable Scheme-to-C Compiler. Research Report 89/1, WRL, January, 1989.

[3] Karl S. Brace, Richard L. Rudell, Randal E. Bryant. Efficient Implementation of a BDD Package. In Proceedings of the 27th ACM/IEEE Design Automation Conference, pages 40-45. July, 1990.

[4] Harold Lorin. Sorting and Sort Systems. Addison-Wesley, Reading, MA, 1975.

[5] John Ousterhout. Corner Stitching: A Data Structuring Technique for VLSI Layout Tools. IEEE Trans. on Computer-Aided Design CAD-3(1):87-100, January, 1984.
