Encoded pointers - an interesting data-structure for modern SIL's



Volume 10, number 2 INFORMATION PROCESSING LETTERS 18 March 1980

ENCODED POINTERS - AN INTERESTING DATA-STRUCTURE FOR MODERN SIL'S

Bruce ANDERSON Man-Machine Laboratory, Department of Electrical Engineering Science, University of Essex, Colchester CO4 3SQ, England

Received 20 November 1979

Encapsulation, representations, loophole, ring structure

1. Introduction

In this note we describe a low-level data-structure and the issues raised by attempts to implement it in modern systems-programming languages.

2. The structure

A ring data-structure consists of blocks of data linked by two-way pointers, so that the neighbours of a block can be accessed immediately. A single pointer to an element of the ring gives us access to all other elements, by following pointers in either direction. There is clearly a good deal of redundancy in the pointer structure, but there can be good reason for using a ring rather than the simpler circular list. We can reduce the storage needed for pointers by

(a) storing not two pointers to the adjacent blocks, but the XOR of the two, together with

(b) referring to the ring by keeping two pointers to adjacent elements. Since

(A xor B) xor A = B

we now compute, rather than just extract, the required pointers when moving our (two-pointer) reference around the ring in either direction. The BCPL [3] code for such a representation is given in Fig. 1.
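As a rough illustration of the identity above, the following C sketch builds a three-block ring by hand and walks it leftwards using only the XOR field. It is not the paper's code (that is the BCPL of Fig. 1). The names `reb`, `rrb`, `bits`, `ptr` and `walk_left_sum` are hypothetical, and the use of `uintptr_t` assumes a C99 or later compiler on which pointer-to-integer round trips behave conventionally.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C counterparts of the REB and RRB described in the text. */
typedef struct reb {
    int       e;   /* element stored in this block */
    uintptr_t x;   /* XOR of the addresses of the two neighbours */
} reb;

typedef struct rrb {
    reb *l, *r;    /* two adjacent blocks of the ring */
} rrb;

static uintptr_t bits(reb *p) { return (uintptr_t)(void *)p; }
static reb *ptr(uintptr_t w)  { return (reb *)(void *)w; }

/* Step left: (L.x xor R) is L's other neighbour, since (A xor B) xor A = B. */
static void gol(rrb *ref)
{
    reb *old_l = ref->l;
    ref->l = ptr(old_l->x ^ bits(ref->r));
    ref->r = old_l;
}

/* Build the ring a-b-c by hand, walk left once round, and sum the elements. */
static int walk_left_sum(void)
{
    reb a = {1, 0}, b = {2, 0}, c = {3, 0};
    a.x = bits(&c) ^ bits(&b);
    b.x = bits(&a) ^ bits(&c);
    c.x = bits(&b) ^ bits(&a);

    rrb ref = {&a, &b};
    int sum = 0;
    for (int i = 0; i < 3; i++) {
        sum += ref.l->e;
        gol(&ref);
    }
    assert(ref.l == &a && ref.r == &b);   /* back at the starting pair */
    return sum;
}
```

Calling `walk_left_sum()` visits the elements 1, 3, 2 in that order and returns 6, with the reference back at its starting pair.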

This coding requires special care over two points: that meaningless bit-patterns are not created through XORing unrelated objects; and that enough real pointers are maintained to enable the structure to be traversed. For example, if during the REMOVE operation we point only to the two blocks that are to become adjacent after the removal then we can no longer compute their X components because we cannot locate their neighbours. (Of course the temporary pointer we introduce could perhaps be implicit, using BCPL's multiple assignment, but we have chosen to show it explicitly.)

    // A Ring Element Block (REB) has two fields: element of
    // the ring E and xored pointer X
    // A Ring Reference Block (RRB) has two fields, both pointers
    // into the ring, called L and R

    manifest { E: 0; X: 1; L: 0; R: 1; NIL: 0 }

    // move the reference one step left
    let gol(rrb) be
    { let reb = rrb!L;
      rrb!L := (rrb!L)!X xor rrb!R;
      rrb!R := reb
    }

    // move the reference one step right
    let gor(rrb) be
    { let reb = rrb!R;          // the old R becomes the new L
      rrb!R := (rrb!R)!X xor rrb!L;
      rrb!L := reb
    }

    // insert a new block holding val between L and R; it becomes the new L
    let insert(val, rrb) be
    { let reb = newvec(2);
      reb!X := 0; reb!E := val;
      if rrb!L = NIL do
      { rrb!L := reb; rrb!R := reb; reb!X := 0; return }
      reb!X := rrb!L xor rrb!R;
      (rrb!L)!X := ((rrb!L)!X xor rrb!R) xor reb;
      (rrb!R)!X := ((rrb!R)!X xor rrb!L) xor reb;
      rrb!L := reb
    }

    // remove the block at L; reb is the temporary pointer to it
    let remove(rrb) be
    { if rrb!L = NIL do return
      let reb = rrb!L;
      rrb!L := (rrb!L)!X xor rrb!R;
      (rrb!L)!X := ((rrb!L)!X xor reb) xor rrb!R;
      (rrb!R)!X := ((rrb!R)!X xor reb) xor rrb!L
    }

Fig. 1. BCPL code for rings with encoded pointers.
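The insertion and removal steps of Fig. 1, including the temporary pointer held during removal, might be transcribed into C roughly as follows. This is a sketch under assumptions rather than the author's implementation: the names (`insert`, `remove_l`, `digits_leftwards`, `demo_insert_remove`) are hypothetical, `uintptr_t` assumes C99 or later, and two details are additions not present in Fig. 1, namely freeing the removed block and emptying the reference when the last block is removed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct reb { int e; uintptr_t x; } reb;
typedef struct rrb { reb *l, *r; } rrb;

static uintptr_t bits(reb *p) { return (uintptr_t)(void *)p; }
static reb *ptr(uintptr_t w)  { return (reb *)(void *)w; }

static void gol(rrb *ref)
{
    reb *t = ref->l;
    ref->l = ptr(t->x ^ bits(ref->r));
    ref->r = t;
}

/* Insert a new block between L and R; it becomes the new L. */
static int insert(int val, rrb *ref)
{
    reb *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->e = val;
    n->x = 0;
    if (ref->l == NULL) { ref->l = ref->r = n; return 0; }
    n->x = bits(ref->l) ^ bits(ref->r);
    ref->l->x = (ref->l->x ^ bits(ref->r)) ^ bits(n);
    ref->r->x = (ref->r->x ^ bits(ref->l)) ^ bits(n);
    ref->l = n;
    return 0;
}

/* Remove the block at L. */
static void remove_l(rrb *ref)
{
    if (ref->l == NULL) return;
    reb *old = ref->l;                    /* the temporary pointer */
    if (ref->l == ref->r) {               /* addition: last block in the ring */
        free(old);
        ref->l = ref->r = NULL;
        return;
    }
    ref->l = ptr(old->x ^ bits(ref->r));  /* old's other neighbour */
    ref->l->x = (ref->l->x ^ bits(old)) ^ bits(ref->r);
    ref->r->x = (ref->r->x ^ bits(old)) ^ bits(ref->l);
    free(old);                            /* addition: Fig. 1 does not free */
}

/* Read n elements going left, packing them as decimal digits. */
static int digits_leftwards(rrb *ref, int n)
{
    int d = 0;
    for (int i = 0; i < n; i++) { d = d * 10 + ref->l->e; gol(ref); }
    return d;
}

static int demo_insert_remove(void)
{
    rrb ref = {NULL, NULL};
    for (int v = 1; v <= 3; v++)
        if (insert(v, &ref) != 0) return -1;
    int before = digits_leftwards(&ref, 3);   /* elements 3, 2, 1 */
    remove_l(&ref);                           /* removes the block holding 3 */
    int after = digits_leftwards(&ref, 2);    /* elements 2, 1 */
    while (ref.l != NULL) remove_l(&ref);
    return before * 100 + after;
}
```

Without `old`, the two updates to the X fields could not be written, because the removed block's address is exactly the term that has to be XORed out of each neighbour.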

BCPL, and the many other languages of that level, allow bit-patterns to be manipulated as desired. The problem is left firmly in the hands of the programmer, who is assumed to be capable. There are many popular languages, such as Pascal, where the job is essentially impossible, as pointers are closely guarded. The problem is removed from the hands of the programmer, who is assumed to be incapable. Such languages are usually 'augmented' or 'extended' when used for systems programming. However, the modern approach to systems programming languages attempts, at least, to allow the full power of assembly-code programming to be realised, but in a much more structured and controlled manner. In particular, they introduce explicitly the idea of abstract data-structure and the separation of the implementation of such structures from their usage. We should thus see the encoded ring as raising two questions about individual languages, or classes of languages:

(a) does the language allow the ring abstraction to be provided safely, and

(b) does it allow our encoded implementation?

BCPL allows the abstraction to be provided, via procedures such as those already defined, but cannot prevent access to the implementation, that is, it cannot encapsulate it. We also characterised above the Pascal-like languages, which may or may not include encapsulation mechanisms, but do not allow low-level operations. Our discussion then turns to more recent languages: we take Ada [1] and C [2] as examples.

4. Providing the abstraction

This is not the place to make a detailed comparison between languages in their power to present abstractions - see [4] for an excellent discussion of this and related topics. Briefly, however, we require:

- type RING: corresponding to the RRB,
- operation INIT: to initialise the RRB structures,
- operations GOL, GOR, INSERT, REMOVE: as in the implementations,
- operations CONT, SETCONT: to obtain/assign to the current element of the ring.

The operations not implemented above can be coded as shown in Fig. 2. Note that we require objects of type RING to be available to the user, but access to the components of that type must be forbidden.

    let cont(rrb) = (rrb!L)!E
    let setcont(rrb, val) be { (rrb!L)!E := val }
    let init(rrb) be { rrb!L := rrb!R := NIL }

Fig. 2. Further BCPL code for rings.
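One way the required abstraction could look in C is the incomplete-type idiom sketched below: client code sees only the name `ring` and a set of operations, and cannot touch the fields. All names here (`ring_init`, `ring_cont`, `client_demo` and so on) are hypothetical, GOR is omitted for brevity, and `ring_free` is an addition with no counterpart in the list above. In a real program the two halves would sit in a header and a separate source file; they share one block here only so that the sketch is self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- Interface: all that client code is meant to see ---- */
typedef struct ring ring;              /* incomplete type: fields are hidden */
ring *ring_init(void);                 /* INIT: an empty ring, or NULL */
int   ring_insert(ring *r, int val);   /* INSERT: 0 on success, -1 on failure */
void  ring_remove(ring *r);            /* REMOVE the current element */
void  ring_gol(ring *r);               /* GOL */
int   ring_cont(const ring *r);        /* CONT: ring must be non-empty */
void  ring_setcont(ring *r, int val);  /* SETCONT: ring must be non-empty */
void  ring_free(ring *r);              /* addition: release all storage */

/* ---- Client: written before the representation is defined, so any
 *      attempt to use r->l here would be rejected by the compiler ---- */
static int client_demo(void)
{
    ring *r = ring_init();
    if (r == NULL) return -1;
    if (ring_insert(r, 10) != 0 || ring_insert(r, 20) != 0) {
        ring_free(r);
        return -1;
    }
    int first = ring_cont(r);          /* 20, the most recent insertion */
    ring_setcont(r, 25);
    ring_gol(r);
    int second = ring_cont(r);         /* 10 */
    ring_gol(r);
    int third = ring_cont(r);          /* 25, back at the updated block */
    ring_free(r);
    return first * 10000 + second * 100 + third;
}

/* ---- Implementation: normally a separate source file ---- */
typedef struct reb { int e; uintptr_t x; } reb;
struct ring { reb *l, *r; };

static uintptr_t bits(reb *p) { return (uintptr_t)(void *)p; }
static reb *ptr(uintptr_t w)  { return (reb *)(void *)w; }

ring *ring_init(void)
{
    ring *r = malloc(sizeof *r);
    if (r != NULL) r->l = r->r = NULL;
    return r;
}

int ring_insert(ring *r, int val)
{
    reb *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->e = val;
    n->x = 0;
    if (r->l == NULL) { r->l = r->r = n; return 0; }
    n->x = bits(r->l) ^ bits(r->r);
    r->l->x = (r->l->x ^ bits(r->r)) ^ bits(n);
    r->r->x = (r->r->x ^ bits(r->l)) ^ bits(n);
    r->l = n;
    return 0;
}

void ring_remove(ring *r)
{
    if (r->l == NULL) return;
    reb *old = r->l;
    if (r->l == r->r) { free(old); r->l = r->r = NULL; return; }
    r->l = ptr(old->x ^ bits(r->r));
    r->l->x = (r->l->x ^ bits(old)) ^ bits(r->r);
    r->r->x = (r->r->x ^ bits(old)) ^ bits(r->l);
    free(old);
}

void ring_gol(ring *r)
{
    if (r->l == NULL) return;
    reb *t = r->l;
    r->l = ptr(t->x ^ bits(r->r));
    r->r = t;
}

int  ring_cont(const ring *r)       { return r->l->e; }
void ring_setcont(ring *r, int val) { r->l->e = val; }
void ring_free(ring *r)             { while (r->l != NULL) ring_remove(r); free(r); }
```

The price of hiding the fields this way is that clients can hold only pointers to a `ring`, so objects of the type are available to the user only through `ring_init`.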

5. Implementing the abstraction

Several requirements are imposed by the representation in question:

(1) a type of object upon which to perform the XOR operation. Some languages have a WORD data-type and allow logical (i.e. bit-parallel) operations upon them, whereas others insist on an array of bits and the corresponding serial operation;

(2) the ability to find out how long a pointer is, in order to choose the correct length of word or bit-array;

(3) permission either to view pointers as words/arrays (and vice-versa), or a function to produce one representation from the other.

Ada does allow us to implement this scheme in a machine-independent way. The ACCESS_SIZE attribute of the REB type gives us the length of an address in bits. We can then use this to declare the X component of REB's as a packed array of 1-bit objects, and we can define an XOR function on these objects. The library module UNSAFE_PROGRAMMING contains a generic function UNSAFE_CONVERSION which we can instantiate to transfer between the pointer (known as access type) and bit-array representations. The code is outlined in Fig. 3; several points then arise:


    type BIT is new BOOLEAN;
    for BIT use 1;
    type REB;
    BMAX: constant INTEGER := REB'ACCESS_SIZE - 1;
    type PWORD is array (0 .. BMAX) of BIT;
    for PWORD using packing;
    type REB is access record
        E: VAL;
        X: PWORD (others => FALSE);
    end record;
    type RRB is record
        L, R: REB := null;
    end record;

    ...
    restricted(XRING, UNSAFE_PROGRAMMING)
    function PTOPWORD is UNSAFE_PROGRAMMING.UNSAFE_CONVERSION
        (REB, PWORD);

Fig. 3. Partial Ada code for ring abstraction.

(a) is the pointer size determinable before the things it points to are fully defined? The definition of the ADDRESS attribute implies that all pointers are the same length, so this should work. However, on modern machines with large address-spaces and useful addressing-modes one might well want to use short (based) pointers, but Ada does not allow this to be requested, though nothing prevents the compiler from doing it;

(b) will the compiler realise that no computing is needed to turn bit-arrays into pointers, and vice-versa, since we are using the function merely to satisfy type-checking;

(c) the iterative XOR is very slow. In practice it would be more satisfactory to take the traditional escape route, turning pointers into integers and dropping into machine-code to do the XOR.
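To make point (c) concrete, the C sketch below contrasts a serial XOR over an array of bits with the single word-wide operation. It is only an analogy to the Ada scheme, not a rendering of it: `pword` here is an unpacked array with one byte per bit (the Ada PWORD is packed), `PBITS` merely plays the role of ACCESS_SIZE, and every name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits in a pointer-sized word; stands in for ACCESS_SIZE. */
#define PBITS (sizeof(uintptr_t) * CHAR_BIT)

/* One array element per bit (unpacked, unlike the Ada PWORD). */
typedef struct { unsigned char bit[sizeof(uintptr_t) * CHAR_BIT]; } pword;

static pword to_pword(uintptr_t w)
{
    pword p;
    for (size_t i = 0; i < PBITS; i++)
        p.bit[i] = (unsigned char)((w >> i) & 1u);
    return p;
}

static uintptr_t from_pword(pword p)
{
    uintptr_t w = 0;
    for (size_t i = 0; i < PBITS; i++)
        w |= (uintptr_t)p.bit[i] << i;
    return w;
}

/* Serial XOR: one loop iteration per bit of the pointer. */
static pword xor_serial(pword a, pword b)
{
    pword r;
    for (size_t i = 0; i < PBITS; i++)
        r.bit[i] = (unsigned char)(a.bit[i] ^ b.bit[i]);
    return r;
}

/* Bit-parallel XOR: a single operation on the whole word. */
static uintptr_t xor_parallel(uintptr_t a, uintptr_t b) { return a ^ b; }

/* Both routes must agree; only their cost differs. */
static int serial_matches_parallel(uintptr_t a, uintptr_t b)
{
    return from_pword(xor_serial(to_pword(a), to_pword(b))) == xor_parallel(a, b);
}
```

The serial route costs a loop of `PBITS` steps, plus the conversions in each direction, where the parallel route costs one operation. Whether a given compiler would narrow that gap for a packed array is exactly the kind of question raised in (b) and (c).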

In C the cast construction '(UNSIGNED) x' gives us x viewed as an UNSIGNED integer, and the definition states explicitly that no computation is done. Similarly C has unions, which are basically variant records with no discriminant field, so that we can view a component as a pointer or as a bit-pattern. C does have explicit bitwise operations on unsigned integers, including XOR, but there is no way to obtain the length of a pointer from the environment, so complete machine-independence cannot be achieved.
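The two views mentioned here, the cast and the union, can be sketched as follows. The sketch assumes a compiler following C99 or later, which postdates the C of [2]: there `<stdint.h>` may provide `uintptr_t` (an optional type) as an unsigned integer wide enough to hold an object pointer, which goes some way towards the missing pointer-length facility. The function names and `union view` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* View a pointer as an unsigned integer with a cast, and back again. */
static int roundtrip_by_cast(void)
{
    int obj = 42;
    uintptr_t w = (uintptr_t)(void *)&obj;
    int *back = (int *)(void *)w;
    return back == &obj && *back == 42;
}

/* A union with no discriminant: the same storage read as pointer or word. */
union view { void *p; uintptr_t w; };

static int roundtrip_by_union(void)
{
    int obj = 7;
    union view in, out;
    if (sizeof(void *) != sizeof(uintptr_t))
        return 1;                 /* sizes differ here: skip the experiment */
    in.p = &obj;
    out.w = in.w;                 /* copy the bit-pattern as an integer */
    return out.p == (void *)&obj;
}

/* XOR two pointers and recover one of them from the other. */
static int xor_recovers(void)
{
    int a = 1, b = 2;
    uintptr_t x = (uintptr_t)(void *)&a ^ (uintptr_t)(void *)&b;
    return (int *)(void *)(x ^ (uintptr_t)(void *)&a) == &b;
}

/* Storage width of an object pointer in bits, as the compiler reports it. */
static size_t pointer_bits(void) { return sizeof(void *) * CHAR_BIT; }
```

The results of such conversions remain implementation-defined, so even this is portable only across conventional machines.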

6. Discussion

These newer languages don't help greatly in making the implementation a safe one internally; in particular the constraints given above that we must XOR only certain related values, and that we must maintain pairs of pointers to adjacent blocks, cannot usefully be built into further lower-level abstractions. Secondly, the more powerful the language the more complex the implementation becomes, both for the implementer and for the compiler. Thus BCPL has a very simple virtual machine, with simple but powerful operations on it. It is useful for systems programming because it matches current hardware closely enough; machine-code is rarely needed. Ada, on the other hand, presents a complex virtual machine. The more complex this becomes, the harder it is for the programmer to be given control over the mapping of these constructs onto the hardware. Despite the facilities provided for that, which will obviously be useful in many cases, we seem to retain in the case in question a need to exit into machine-code to provide a reasonable implementation of a relatively straightforward, though very machine-oriented, concept. While this escape-hatch should no doubt be available, it seems a pity that it should be needed in Ada for this example when the more primitive BCPL manages well without it. Of course Ada solves some problems that BCPL does not even address, and the languages cannot be compared in every dimension. C's extensive (but often unchecked) type structure places it in between these two languages.

Although we set out to see whether this useful implementation trick could be used in a high-level language, our aim was purely exploratory, so that a negative result need not be a pessimistic one. Despite its neatness, the representation has many disadvantages, apart from its obvious fragility:

(1) it assumes that pointers are conventional, or, rather, that each REB is represented by a unique (and unchanging) bit-pattern. It fails if pointers are relative (or, more realistically, if those in REB's are relative and those in RRB's are absolute);

(2) garbage-collection is made complex since the garbage-collector would have to know about the ring type and its representation. Ada's garbage-collector could fail;

(3) multiple references to the ring must be handled very carefully:

(a) integrity of the ring - though this is safe if operations are not interleaved; and

(b) integrity of RRB's - quite nontrivial.

When insertions are made, the encoded implementation requires that we be able to trace all references to an REB, in order to update the corresponding RRB. Various implementations suggest themselves, such as chaining all RRB's.
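The difficulty with multiple references can be seen in a small C sketch: two references point at the same adjacent pair, an insertion is made through the first, and the second then decodes a bit-pattern that is no longer its neighbour. The sketch only computes the stale value and never dereferences it. All names are hypothetical and `uintptr_t` assumes C99 or later.

```c
#include <assert.h>
#include <stdint.h>

typedef struct reb { int e; uintptr_t x; } reb;
typedef struct rrb { reb *l, *r; } rrb;

static uintptr_t bits(reb *p) { return (uintptr_t)(void *)p; }

/* Returns 1 if a second reference decodes a wrong left neighbour after an
 * insertion has been made through the first reference. */
static int second_reference_goes_stale(void)
{
    reb a = {1, 0}, b = {2, 0}, c = {3, 0}, d = {4, 0};
    a.x = bits(&c) ^ bits(&b);
    b.x = bits(&a) ^ bits(&c);
    c.x = bits(&b) ^ bits(&a);

    rrb r1 = {&a, &b};
    rrb r2 = {&a, &b};               /* a second reference to the same pair */

    /* Before the insertion, r2 decodes a's other neighbour correctly. */
    uintptr_t before = r2.l->x ^ bits(r2.r);

    /* Insert d between a and b through r1, following the steps of Fig. 1. */
    d.x = bits(r1.l) ^ bits(r1.r);
    r1.l->x = (r1.l->x ^ bits(r1.r)) ^ bits(&d);
    r1.r->x = (r1.r->x ^ bits(r1.l)) ^ bits(&d);
    r1.l = &d;

    /* r2 still holds (a, b), which are no longer adjacent, so the value it
     * decodes is c xor d xor b: a meaningless bit-pattern. */
    uintptr_t after = r2.l->x ^ bits(r2.r);

    return before == bits(&c) && after != bits(&c);
}
```

Any scheme for keeping such references valid, such as the chaining suggested above, has to find and repair `r2` at the moment `r1` is used for the insertion.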

7. Conclusion

On examination, the apparent simplicity of the representation is seen to be unrealistic. It still remains true, however, that this relatively simple trick cannot be used straightforwardly in current (projected?) systems-programming languages. Whatever the merits of this particular case, it does point out the difficulty of bringing together very high-level and very low-level concepts. There is a fundamental issue here: should a high-level language allow the description of detailed implementation techniques, especially those of its own implementation? To put it another way: can we view languages both as theories of programming and as tools to extract the most from computer hardware?

References

[1] J.D. Ichbiah, J.G.P. Barnes, J.C. Heliard, B. Krieg-Brückner, O. Roubine and B.A. Wichmann, Reference manual for the Ada language, SIGPLAN Notices 14 (6) (1979).

[2] B.W. Kernighan and D.M. Ritchie, The C Programming Language (Prentice-Hall, Englewood Cliffs, NJ, 1978).

[3] M. Richards, BCPL - a tool for compiler writing and systems programming, Proc. AFIPS SJCC 34 (1969) 557-566.

[4] R. Schwanke, Survey of scope issues in programming languages, Internal report CMU-CS-78-131, Carnegie-Mellon University, Computer Science Department (1978).
