
Comparison of Access Methods for Time-Evolving Data

BETTY SALZBERG

Northeastern University

AND

VASSILIS J. TSOTRAS

University of California, Riverside

This paper compares different indexing techniques proposed for supporting efficient access to temporal data. The comparison is based on a collection of important performance criteria, including the space consumed, update processing, and query time for representative queries. The comparison is based on worst-case analysis, hence no assumptions on data distribution or query frequencies are made. When a number of methods have the same asymptotic worst-case behavior, features in the methods that affect average case behavior are discussed. Additional criteria examined are the pagination of an index, the ability to cluster related data together, and the ability to efficiently separate old from current data (so that larger archival storage media such as write-once optical disks can be used). The purpose of the paper is to identify the difficult problems in accessing temporal data and describe how the different methods aim to solve them. A general lower bound for answering basic temporal queries is also introduced.

Categories and Subject Descriptors: H.2.2 [Database Management]: Physical Design; Access methods; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing—Indexing methods

General Terms: Management, Performance

Additional Key Words and Phrases: Access methods, I/O performance, structures, temporal databases

Betty Salzberg's work was supported by NSF grants IRI-9303403 and IRI-9610001. Vassilis Tsotras' work was performed while the author was with the Department of Computer Science, Polytechnic University, Brooklyn, NY 11201; it was supported by NSF grants IRI-9111271, IRI-9509527, and by the New York State Science and Technology Foundation as part of its Center for Advanced Technology program.

Authors' addresses: B. Salzberg, College of Computer Science, Northeastern University, Boston, MA 02115; email: [email protected]; V. J. Tsotras, Department of Computer Science and Engineering, University of California, Riverside, Riverside, CA 92521; email: [email protected].

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

© 1999 ACM 0360-0300/99/0600-0158 $5.00

1. INTRODUCTION

Conventional database systems capture only a single logical state of the modeled reality (usually the most current). Using transactions, the database evolves from one consistent state to the next, while the previous state is discarded after a transaction commits. As a result, there is no memory with respect to prior states of the data.


Such database systems capture a single snapshot of reality (also called snapshot databases), and are insufficient for those applications that require the support of past, current, or even future data. What is needed is a temporal database system [Snodgrass and Ahn 1986]. The term "temporal database" refers in general to a database system that supports some time domain, and is thus able to manage time-varying data. (Note that this definition excludes user-defined time, which is an uninterpreted time domain directly managed by the user and not by the database system.)

Research in temporal databases has grown immensely in recent years [Tsotras and Kumar 1996]. Various aspects of temporal databases have been examined [Ozsoyoglu and Snodgrass 1995], including temporal data models, query languages, access methods, etc. Prototype efforts appear in Böhlen [1995]. In this paper we provide a comparison of proposed temporal access methods, i.e., indexing techniques for temporal data. We attempt to identify the problems in the area, together with the solutions given by each method.

A taxonomy of time in databases was developed in Snodgrass and Ahn [1995]. Specifically, transaction time and valid time have been proposed. Transaction and valid time are two orthogonal time dimensions. Transaction time is defined as the time when a fact is stored in the database. It is consistent with the transaction serialization order (i.e., it is monotonically increasing), and can be implemented using the commit times of transactions [Salzberg 1994]. Valid time is defined as the time when a fact becomes effective (valid) in reality. Depending on the time dimension(s) supported, there are three kinds of temporal databases: transaction-time, valid-time, and bitemporal [Dyreson et al. 1994].

A transaction-time database records the history of a database activity rather than real-world history. As such, it can "rollback" to one of its previous states. Since previous transaction times cannot be changed (every change is stamped with a new transaction time), there is no way to change the past. This is useful for applications in auditing, billing, etc. A valid-time database maintains the entire temporal behavior of an enterprise as best known now. It stores our current knowledge about the enterprise's past, current, or even future behavior. If errors are discovered in this temporal behavior, they are corrected by modifying the database. When a correction is applied, previous values are not retained. It is thus not possible to view the database as it was before the correction. A bitemporal database combines the features of the other two types. It more accurately represents reality and allows for retroactive as well as postactive changes.

The tuple-versioning temporal model [Lorentzos and Johnson 1988; Navathe and Ahmed 1987] is used in this paper. Under this model, the database is a set of records (tuples) that store the versions of real-life objects. Each such record has a time-invariant key (surrogate) and, in general, a number of time-variant attributes; for simplicity, we assume that each record has exactly one time-varying attribute.

CONTENTS

1. Introduction
2. Problem Specification
3. Items for Comparison
   3.1 Queries
   3.2 Access Method Costs
   3.3 Index Pagination and Data Clustering
   3.4 Migration of Past Data to Another Location
   3.5 Lower Bounds on I/O Complexity
4. Efficient Method Design for Transaction/Bitemporal Data
   4.1 The Transaction Pure-Timeslice Query
   4.2 The Transaction Pure-Key Query
   4.3 The Transaction Range-Timeslice Query
   4.4 Bitemporal Queries
   4.5 Separating Past from Current Data and Use of WORM Disks
5. Method Classification and Comparison
   5.1 Transaction-Time Methods
   5.2 Valid-Time Methods
   5.3 Bitemporal Methods
6. Conclusions


In addition, each record has one or two intervals, depending on which types of time are supported. Each interval is represented by two attributes: start_time and end_time.

Accurate specification of the problem that needs to be solved is critical in the design of any access method. This is particularly important in temporal databases, since problem specification depends dramatically on the time dimension(s) supported. Whether valid and/or transaction times are supported directly affects the way records are created or updated. In the past, this resulted in much confusion in the design of temporal access methods. To exemplify the distinct characteristics of the transaction and valid time dimensions, we use a separate abstraction to describe the central underlying problem for each kind of temporal database.

The query performance of the methods examined is compared in the context of various temporal queries. In order to distinguish among the various kinds of queries, we use a general temporal query classification scheme [Tsotras et al. 1998]. The paper also introduces lower bounds for answering basic temporal queries. Each lower bound assumes a disk-oriented environment and describes the minimal I/O for solving the query if space consumption is kept minimal. We also show access methods that achieve a matching upper bound for a temporal query.

Among the methods discussed, the ones that support transaction time (either in a transaction-time or in a bitemporal environment) assume a linear transaction-time evolution [Ozsoyoglu and Snodgrass 1995]. This implies that a new database state is created by updating only the current database state. Another option is the so-called branching transaction time [Ozsoyoglu and Snodgrass 1995], where evolutions can be created from any past database state. Such branched evolutions form a tree-of-evolution that resembles the version-trees found in versioning environments.

A version-tree is formed as new versions emanate from any previous version (assume that no version merging is allowed). There is however a distinct difference that makes branched evolutions a more difficult problem. In a version-tree, every new version is uniquely identified by a successive version number that can be used to access it directly [Driscoll et al. 1989; Lanka and Mays 1991]. In contrast, branched evolutions use timestamps. These timestamps enable queries on the evolution on a given branch. However, timestamps are not unique. The same time instant can exist as a timestamp in many branches in a tree-of-evolution simply because many updates could have been recorded at that time in various branches. We are aware of only two works that address problems related to indexing branching transaction time, namely Salzberg and Lomet [1995] and Landau et al. [1995]. In the ongoing work of Salzberg and Lomet [1995], version identifiers are replaced by (branch identifier, timestamp) pairs. Both a tree access method and a forest access method are proposed for these branched versions. Landau et al. [1995] provides data structures for (a) locating the relative position of a timestamp on the evolution of a given branch and (b) locating the same timestamp among sibling branches. Clearly, more work is needed in this area.

Other kinds of temporal queries, in particular time-series queries, have recently appeared [Agrawal and Swami 1993; Faloutsos et al. 1994; Jagadish et al. 1995; Seshadri et al. 1996; Motakis and Zaniolo 1997]. A pattern and a time-series (an evolution) are given, and the typical query asks for all times that a similar pattern appeared in the series. The search involves some distance criterion that qualifies when a pattern is similar to the given pattern. The distance criterion guarantees no false dismissals (false alarms are eliminated afterwards). Whole pattern-matching [Agrawal et al. 1993] and submatching [Faloutsos et al. 1994] queries have been examined. Such time-series queries are reciprocal in nature to the temporal queries addressed here (which usually provide a time instant and ask for the pattern at that time), and are not covered in this paper.

The rest of the paper is organized as follows: Section 2 specifies the basic problem underlying each of the three temporal databases. We categorize a method as transaction-time, valid-time, or bitemporal, depending on which time dimension(s) it most efficiently supports. Section 3 presents the items on which our comparison was based, including the lower bounds. Section 4 discusses in more detail the basic characteristics that a good transaction or bitemporal access method should have. The examined methods are presented in Section 5. The majority falls in the transaction-time category, which comprises the bulk of this paper (Section 5.1). Within the transaction-time category, we further classify methods according to what queries they support more efficiently (key-only, time-only, or time-key methods). A table summarizing the worst-case performance characteristics of the transaction-time methods is also included. For completeness, we also cover valid-time and bitemporal methods in Sections 5.2 and 5.3, respectively. We conclude the paper with a discussion on the remaining open problems.

2. PROBLEM SPECIFICATION

The following discussion is influenced by Snodgrass and Ahn [1986], where the differences between valid and transaction times were introduced and illustrated by various examples. Here we attempt to identify the implications for access method design from support of each time dimension.

To visualize a transaction-time database, consider first an initially empty set of objects that evolves over time as follows. Time is assumed to be discrete and is described by a succession of consecutive nonnegative integers. Any change is assumed to occur at a time indicated by one of these integers. A change is the addition or deletion of an object or the value change (modification) of the object's attribute. A real-life example is the evolution of the employees in a company. Each employee has a surrogate (ssn) and a salary attribute. The changes include additions of new employees (as they are hired or rehired), salary changes, or employee deletions (as they retire or leave the company). Since an attribute value change can be represented by the artificial deletion of the object, followed by the simultaneous rebirth of the object with the modified attribute, we may concentrate on object additions or deletions. Such an evolution appears in Figure 1. An object is alive from the time it is added to the set and until (if ever) it is deleted from the set. The state of the evolving set at time t, namely s(t), consists of all the alive objects at t. Note that changes are always applied to the most current state s(t), i.e., past states cannot be changed.

Assume that the history of the above evolution is to be stored in a database.

Figure 1. An example evolution where changes occur in increasing time order. The evolution is depicted at time t10. Lines ending in '.' correspond to objects that have not yet been deleted. At t10, state s(t9) = {a, f, g} is updated by the addition of object e to create state s(t10) = {a, f, g, e}.


Since time is always increasing and the past is unchanged, a transaction-time database can be utilized with the implicit updating assumption: that when an object is added or deleted from the evolving set at time t, a transaction updates the database system about this change at the same time, i.e., this transaction has commit timestamp t.

When a new object is added to the evolving set at time t, a record representing this object is stored in the database accompanied by a transaction-time interval of the form [t, now). Here now is a variable representing the current transaction time, used because at the time the object is born its deletion time is yet unknown. If this object is later deleted at time t', the transaction-time interval of the corresponding record is updated to [t, t'). Thus, an object deletion in the evolving set is represented as a "logical" deletion in the database (the record of the deleted object is still retained in the database, but with a different transaction end_time).
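As an illustration of this bookkeeping, the following minimal Python sketch (our own, not from the paper; the class and method names are hypothetical) stores one record per object version, stamps insertions with [t, now), performs deletions logically by closing the interval, and reconstructs the logical state s(t) by interval containment. An attribute modification would be a logical deletion followed by a rebirth at the same commit time, as described above.

```python
NOW = None  # sentinel standing in for the variable "now"

class TxRecord:
    """One version of an object in a transaction-time database."""
    def __init__(self, key, value, t_start):
        self.key = key
        self.value = value
        self.t_start = t_start   # commit time of the inserting transaction
        self.t_end = NOW         # open-ended interval [t_start, now)

class TxDatabase:
    """Append-only store; a deletion only closes the record's interval."""
    def __init__(self):
        self.records = []

    def insert(self, key, value, t):
        self.records.append(TxRecord(key, value, t))

    def delete(self, key, t):
        for r in self.records:
            if r.key == key and r.t_end is NOW:
                r.t_end = t      # logical deletion: interval becomes [t_start, t)
                return

    def timeslice(self, t):
        """Logical database state s(t): records whose interval contains t."""
        return [(r.key, r.value) for r in self.records
                if r.t_start <= t and (r.t_end is NOW or t < r.t_end)]

# Example: employee 'a' hired at t=1, leaves at t=5.
db = TxDatabase()
db.insert('a', 50000, 1)
db.delete('a', 5)
print(db.timeslice(3))   # [('a', 50000)]
print(db.timeslice(7))   # []
```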

Since a transaction-time database system keeps both current and past data, it is natural to introduce the notion of a logical database state as a function of time. We therefore distinguish between the database system and the logical database state. (This is not required in traditional database systems because there always exists exactly one logical database state—the current one.) The logical database state at time t consists of those records whose transaction time interval contains t. Under the implicit updating assumption, the logical database state is equivalent to the state s(t) of the observed evolving set. Since an object can be reborn, there may be many records (or versions) that are stored in the database system representing the history of the same object. But all these records correspond to disjoint transaction-time intervals in the object's history, and each such record can belong to a single logical database state.

To summarize, an access method for a transaction-time database needs to (a) store its past logical states, (b) support addition/deletion/modification changes on the objects of its current logical state, and (c) efficiently access and query the objects in any of its states.

In general, a fact can be entered in the database at a different time than when it happened in reality. This implies that the transaction-time interval associated with a record is actually related to the process of updating the database (the database activity), and may not accurately represent the period the corresponding object was alive in reality.

A valid-time database has a different abstraction. To visualize it, consider a dynamic collection of interval-objects. We use the term interval-object to emphasize that the object carries a valid-time interval to represent the validity period of some object property. (In contrast, and to emphasize that transaction time represents the database activity rather than reality, we term the objects in the transaction-time abstraction plain-objects.) The allowable changes are the addition/deletion/modification of an interval-object, but the collection's evolution (past states) is not kept. An example of a dynamic collection of interval-objects appears in Figure 2.

As a real-life example, consider the collection of contracts in a company. Each contract has an identity (contract_no), an amount attribute, and an interval representing the contract's duration or validity. Assume that when a correction is applied only the corrected contract is kept.

A valid-time database is suitable for this environment. When an object is added to the collection, it is stored in the database as a record that contains the object's attributes (including its valid-time interval). The time of the record's insertion in the database is not kept. When an object deletion occurs, the corresponding record is physically deleted from the database. If an object attribute is modified, its corresponding record attribute is updated, but the previous attribute value is not retained.


The valid-time database keeps only the latest "snapshot" of the collection of interval-objects. Querying a valid-time database cannot give any information on the past states of the database or how the collection evolved. Note that the database may store records with the same surrogate but with nonintersecting valid-time intervals.

The notion of time is now related to the valid-time axis. Given a valid-time point, interval-objects can be classified as past, future, or current (alive) with respect to this point, depending on whether their valid-time interval is before, after, or contains the given point. Valid-time databases are said to correct errors anywhere in the valid-time domain (past, current, or future) because the record of any interval-object in the collection can be changed independently of its position on the valid-time axis.

An access method for a valid-time database should (a) store the latest collection of interval-objects, (b) support addition/deletion/modification changes to this collection, and (c) efficiently query the interval-objects contained in the collection when the query is asked.

Reality is more accurately represented if both time dimensions are supported. The abstraction of a bitemporal database can be viewed as keeping the evolution (through the support of transaction time) of a dynamic collection of (valid-time) interval-objects. Figure 3 (taken from Kumar et al. [1998]) offers a conceptual view of a bitemporal database. Instead of a single collection of interval-objects, there is a sequence of collections indexed by transaction time. If each interval-object represents a company contract, we can now represent how our knowledge about such contracts evolved. When an interval-object is inserted in the database at transaction time t, a record is created with the object's surrogate (contract_no), attribute (contract amount), and valid-time interval (contract duration), and an initial transaction-time interval [t, now). The transaction-time interval endpoint is changed to another transaction time if this object is updated later. For example, the record for interval-object I2 has transaction-time interval [t2, t4), since it was inserted in the database at transaction time t2 and "deleted" at t4. Such a scenario occurs if at time t4 we realize that a contract was wrongly inserted in the database.
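A bitemporal record therefore carries two intervals. The small sketch below (our own illustration; the contract number and amount are invented) shows the I2 scenario from Figure 3: the record is created at transaction time t2 with its valid-time interval, and the erroneous entry is corrected at t4 by closing the transaction-time interval rather than by physically removing the record.

```python
NOW = None  # open transaction-time endpoint

class BitemporalRecord:
    """One version of a contract: a valid-time interval plus a
    transaction-time interval, as in the conceptual view of Figure 3."""
    def __init__(self, contract_no, amount, valid_from, valid_to, tx_start):
        self.contract_no = contract_no
        self.amount = amount
        self.valid = (valid_from, valid_to)  # when the contract holds in reality
        self.tx_start = tx_start             # when the fact was recorded
        self.tx_end = NOW                    # closed if the record is superseded

# Interval-object I2: recorded at transaction time t2, found to be wrong at t4.
i2 = BitemporalRecord("C-17", 5000, valid_from=10, valid_to=20, tx_start=2)
i2.tx_end = 4   # transaction-time interval becomes [t2, t4); the record stays
```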

A bitemporal access method should (a) store its past logical states, (b) support addition/deletion/modification changes on the interval-objects of its current logical state, and (c) efficiently access and query the interval-objects on any of its states.

Figure 3 is helpful in summarizing the differences among the underlying problems of the various database types. A transaction-time database differs from a bitemporal database in that it maintains the history of an evolving set of plain-objects instead of interval-objects. A valid-time database differs from a bitemporal since it keeps only one collection of interval-objects (the latest).

Figure 2. Two states of a dynamic collection of interval-objects. Only the valid-time intervals of the objects are shown. The new collection (b) is created from the previous collection (a) after deleting object I1 and adding object I2. Only the new (latest) collection is retained.


Each collection C(ti) can be thought of on its own as a separate valid-time database. A transaction-time database differs from a (traditional) snapshot database in that it also keeps its past states instead of only the latest state. Finally, the difference between a valid-time and a snapshot database is that the former keeps interval-objects (and these intervals can be queried).

Most of the methods directly support a single time dimension. We categorize methods that take advantage of increasing time-ordered changes as transaction-time access methods, since this is the main characteristic of transaction time. (The bulk of this paper deals with transaction-time methods.) Few approaches deal with valid-time access methods, and even fewer with the bitemporal methods category (methods that support both time dimensions on the same index).

3. ITEMS FOR COMPARISON

This section elaborates on the items used in comparing the various access methods. We start with the kinds of queries examined and proceed with other criteria.

3.1 Queries

From a query perspective, valid-time and transaction-time databases are simply collections of intervals.

Figures 1, 2(a), and 2(b) differ in how these intervals were created (which is important to the access method's update and space performance) and in their meaning (which is important to the application). Hence, for single-time databases (valid or transaction), queries are of similar form. First, we discuss queries in the transaction-time domain, i.e., interval T below corresponds to a transaction-time interval and "history" is on the transaction-time axis. The queries can be categorized into the following classes:

(I) Given a contiguous interval T, find all objects alive during this interval.

(II) Given a key range and a contiguous time interval T, find the objects with keys in the given range that are alive during interval T.

(III) Given a key range, find the history of the objects in this range.

A special case of class (I) occurs when interval T is reduced to a single transaction time instant t. This query is termed the transaction pure-timeslice. In the company employee example, this query is "find all employees working at the company at time t."

Figure 3. A conceptual view of a bitemporal database. The t-axis (v-axis) corresponds to transaction (valid) times. Only the valid-time interval is shown for each interval-object. At transaction time t1 the database recorded that interval-object I1 is added to collection C(t1). At t5 the valid-time interval of object I1 is modified to a new length.


It is usually the case that an access method that efficiently solves the timeslice query is also efficient for the more general interval query; so we consider the timeslice query as a good representative of class (I) queries.

Similarly for class (II). Special cases include combinations where the key range and/or the transaction-time interval contain a single key and a single time instant, respectively. For simplicity, we consider the representative case where the time interval is reduced to a single transaction time instant; this is the transaction range-timeslice query ("find the employees working at the company at time t and whose ssn belongs in range K").

From class (III), we chose the special case where the key range is reduced to a single key, as in: "find the salary history of employee with ssn k." This is the transaction pure-key query. If employee k ever existed, the answer would be the history of salaries of that employee, else the answer is empty. In some methods, an instance of an employee object must be provided in the query and its previous salary history found (this is because these methods need to include a time predicate in their search). This special pure-key query (termed the pure-key with time predicate) is of the form: "find the salary history of employee k who existed at time t."

The range and pure timeslices and pure key with time predicate correspond to "range" queries in Rivest's categorization [Rivest 1976], since being "alive" corresponds to two range queries, namely start_time ≤ t and end_time > t.

The pure-key without time predicate is an example of the "exact-match" query, as all objects with the given key should be retrieved.

When no key range is specified, query class (I) can be thought of as a special case of class (II), and class (III) is a special case of (II) when no interval is specified (rather, all times in history are of interest).

As some of the proposed methods are better suited for answering queries from a particular class, we discuss all three classes separately. We indicate when an access method, as originally presented, does not address queries from a given class, but feel that such queries could be addressed with a slight modification that does not affect the method's behavior.

Similarly, we can define the valid-time pure-timeslice for valid-time databases ("find all contracts valid at time v"), valid-time range-timeslice ("find all contracts with numbers in range K and which are valid at v"), etc. A bitemporal database enables queries in both time dimensions: "find all contracts that were valid on v = January 1, 1994, as recorded in the database at transaction time t = May 1, 1993." From all contracts in the collection C(t) for t = May 1, 1993, the query retrieves only the contracts that would be valid on Jan. 1, 1994.

The selection of the above query classes is definitely not complete, but contains basic, nontrivial queries. In particular, classes (I) and (II) relate to intersection-based queries, i.e., the answer consists of objects whose interval contains some query time point or, in general, intersects a query interval. Depending on the application, other queries may be of importance. For example, find all objects with intervals before or after a query time point/interval, or all objects with intervals contained in a given interval [Bozkaya and Ozsoyoglu 1995; Nascimento et al. 1996], etc.

A three-entry notation, namely key/valid/transaction [Tsotras et al. 1998], will also be used to distinguish among the various temporal queries. This notation specifies which object attributes are involved in the query and in what way. Each entry is described as a "point," "range," "*," or "-". A "point" for the key entry means that the user has specified a single value to match the object key; a "point" for the valid or transaction entry implies that a single time instant is specified for the valid or transaction-time domain.


A "range" indicates a specified range of object key values for the key entry, or an interval for the valid/transaction entries. A "*" means that any value is accepted in this entry, while "-" means that the entry is not applicable for this query. For example, "*/-/point" denotes the transaction pure-timeslice query, "range/point/-" is the valid range-timeslice query, and "point/-/*" is the transaction pure-key query. In a bitemporal environment, the query "find all the company contracts that were valid on v = January 1, 1994, as recorded in the database during transaction-time interval T: May 1–May 20, 1993" is an example of a "*/point/range" query. As presented, the three-entry notation deals with intersection queries, but can easily be extended through the addition of extra entry descriptions to accommodate before/after and other kinds of temporal queries.
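To keep the examples of this subsection in one place, the sketch below (our own summary, not part of the paper) lists the notation used later in the comparison next to the query it denotes.

```python
# Query classes written in the key/valid/transaction notation of
# Tsotras et al. [1998], paired with the example queries used in the text.
QUERY_NOTATION = {
    "*/-/point":     "transaction pure-timeslice: all objects alive at transaction time t",
    "range/-/point": "transaction range-timeslice: keys in range K alive at time t",
    "point/-/*":     "transaction pure-key: whole history of one key",
    "point/-/range": "transaction pure-key with time predicate",
    "*/point/-":     "valid pure-timeslice: all interval-objects valid at instant v",
    "range/point/-": "valid range-timeslice: keys in range K valid at instant v",
    "*/point/point": "bitemporal timeslice: valid at v as recorded at transaction time t",
    "*/point/range": "valid at v as recorded during transaction interval T",
}
```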

3.2 Access Method Costs

The performance of an access method is characterized by three costs: (1) the storage space to physically store the data records and the structures of the access method, (2) the update processing time (the time to update the method's data structures as a result of a change), and (3) the query time for each of the basic queries.

An access method has two modes of operation: in the Update mode, data is inserted, altered, or deleted, while in the Query mode queries are specified and answered using the access method. For a transaction-time access method, the input for an update consists of a time instant t and all the changes that occurred in the data at that instant. A change is further specified by the unique key of the object it affects and the kind of change (addition, deletion, or attribute modification). The access method's data structure(s) will then be updated to include the new change.

Updates to a bitemporal access method are performed similarly, except that the affected interval-object(s) are specified along with the time of the change. The input to a valid-time access method simply contains the changes and the interval-object(s) affected.

For a transaction or a bitemporal method, the space is a function of n, the total number of changes in the evolution, i.e., n is the summation of insertions, deletions, and modification updates. If there are 1,000 updates to a database with only one record, n is 1,000. If there are 1,000 insertions to an empty database and no deletions or value modifications, n is also 1,000. Similarly, for 1,000 insertions followed by 1,000 deletions, n is 2,000. Note that n corresponds to the minimal information needed for storing the evolution's past. We assume that the total number of transaction instants is also O(n). This is a natural assumption, since every real computer system can process a possibly large but limited number of updates per transaction instant.

In a valid-time method, space is a function of l, the number of interval-objects currently stored in the method, i.e., the size of the collection. For example, in both Figures 2(a) and 2(b), l is seven.

A method's query time is a function of the answer size a. We use a to denote the answer size of a query in general.

Since temporal data can be large (especially in transaction and bitemporal databases), a good solution should use space efficiently. A method with fast update processing can be utilized even with a quickly changing real-world application. In addition, fast query times will greatly facilitate the use of temporal data.

The basic queries that we examine can be considered as special cases of classical problems in computational geometry, for which efficient in-core (main memory) solutions have been provided [Chiang and Tamassia 1992].


It should be mentioned that general computational geometry problems support physical deletions of intervals. Hence, they are more closely related to the valid-time database environment. The valid pure-timeslice query ("*/point/-") is a special case of the dynamic interval management problem. The best in-core bounds for dynamic interval management are given by the priority-search tree data structure in McCreight [1985], yielding O(l) space, O(log l) update processing per change, and O(log l + a) query time (all logarithms are base-2). Here l is the number of intervals in the structure when the query/update is performed. The range-timeslice query is a special case of the orthogonal segment intersection problem, for which a solution using O(l log l) space, O((log l) loglog l) update processing, and O((log l) loglog l + a) query time is provided in Mehlhorn [1984]; another solution that uses a combination of the priority-search tree [McCreight 1985] and the interval tree [Edelsbrunner 1983] yields O(l) space, O(log l) update processing, and O(log^2 l + a) query time.

The problems addressed by transaction or bitemporal methods are related to work on persistent data structures [Driscoll et al. 1989]. In particular, Driscoll et al. [1989] shows how to take an in-core "ephemeral data structure" (meaning that past states are erased when updates are made) and convert it to a "persistent data structure" (where past states are maintained). A "fully persistent" data structure allows updates to all past states. A "partially persistent" data structure allows updates only to the most recent state. Due to the properties of transaction-time evolution, transaction and bitemporal access methods can be thought of as disk extensions of partially persistent data structures.

3.3 Index Pagination and Data Clustering

In a database environment the cost of a computation is not based on how many main memory slots are accessed or how many comparisons are made (as is the case with in-core algorithms), but instead on how many pages are transferred between main and secondary memory.

In our comparison this is very crucial, as the bulk of the data is stored on secondary storage media. So it is natural to use an I/O complexity cost [Kanellakis et al. 1993] that measures the number of disk accesses for updating and answering queries. The need to use I/O complexity for secondary storage structures is also recognized in the "theory of indexability" [Hellerstein et al. 1997]. Index pagination and data clustering are two important aspects when considering the I/O complexity of query time. Index pagination refers to how well the index nodes of a method are paginated. Since the index is used as a means to search for and update data, its pagination greatly affects the performance of the method. For example, a B+-tree is a well-paginated index, since it requires O(log_B r) page accesses for searching or updating r objects, using pages of size B. The reader should be careful with the notation: log_B r is itself an O(log_2 r) function only if B is considered a constant. For an I/O environment, B is another problem variable. Thus, log_B r represents a log_2 B speedup over log_2 r, which, for I/O complexity, is a great improvement. Transferring a page takes about 10 msec on the fastest disk drives; in contrast, comparing two integers in main memory takes about 5 nsec. Accessing pages also uses CPU time. The CPU cost of reading a page from the disk is about 2,000 instructions [Gray and Reuter 1993].
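To make the log_2 B speedup concrete, here is a small back-of-the-envelope calculation (our own illustration; the values n = 10^9 and B = 100 are assumed, not taken from the paper):

```python
import math

n = 10**9   # number of indexed items (assumed for illustration)
B = 100     # index entries per disk page (assumed page capacity)

print(f"log2(n) = {math.log2(n):.1f} page reads")    # ~29.9 (binary search)
print(f"logB(n) = {math.log(n, B):.1f} page reads")  # ~4.5  (B+-tree search)

# At roughly 10 ms per random page read, that is about 300 ms versus 45 ms,
# which is why B is treated as a separate problem variable in I/O analysis.
```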

Data clustering can also substantially improve the performance of an access method. If data records that are "logically" related for a given query can also be stored physically close, then the query is optimized, as fewer pages are accessed.


Consider for example an access method that can cluster data in such a way that answering the transaction pure-timeslice query takes O(log_B n + a/B) page accesses. This method is more I/O efficient than another method that solves the same query in O(log_B n + a) page accesses. Both methods use a well-paginated index (which corresponds to the logarithmic part of the query). However, in the second method, each data record that belongs to the answer set may be stored on a separate page, thus requiring a much larger number of page accesses for solving the query.

Data can be clustered by time dimension only, where data records "alive" for the same time periods are collocated, or by both time and key range, or by key range only. Note that a clustering strategy that optimizes a given class of queries may not work for another query class; for example, a good clustering strategy for pure-key queries stores all the versions of a particular key in the same page; however, this strategy does not work for pure-timeslice queries because the clustering objective is different.

Clustering is in general more difficult to maintain in a valid-time access method because of its dynamic behavior. The answer to a valid-time query depends on the collection of interval-objects currently contained in the access method; this collection changes as valid-time updates are applied. Even though some good clustering may have been achieved for some collection, it may not be as efficient for the next collection produced after a number of valid-time updates. In contrast, in transaction or bitemporal access methods, the past is not changed, so an efficient clustering can be retained more easily, despite updates.

Any method that clusters data (a primary index) and uses, say, O(log_B n + a/B) pages for queries can also be used (less efficiently) as a secondary index by replacing the data records with pointers to pages containing data records, thus using O(log_B n + a) pages for queries.

as primary indexes and methods usedas secondary indexes is one of efficiency,not of algorithmic properties.

We use the term "primary index" to mean that the index controls the physical placement of data only. For example, a primary B+-tree has data in the leaves. A secondary B+-tree has only keys and references to data pages (pointers) in the leaves. Primary indexes need not be on the primary keys of relations. Many of the methods do expect a unique non-time-varying key for each record; we do not attempt to discuss how these methods might be modified to cluster records by nonunique keys.

3.4 Migration of Past Data to Another Location

Methods that support transaction time maintain all their past states, a property that can easily result in excessive amounts of data (even for methods that support transaction time in the most space-efficient way). In comparing such methods, it is natural to introduce two other comparison considerations: (a) whether or not past data can be separated from the current data, so that the smaller collection of current data can be accessed more efficiently, and (b) whether data is appended sequentially to the method and never changed, so that write-once read-many (WORM) devices could be used.

For WORMs, one must burn into the disk an entire page with a checksum (the error rate is high, so a very long error-correcting code must be appended to each page). Thus, once a page is written, it cannot be updated. Note that since WORM devices are themselves random access media, any access method that can use WORM devices can also be used with magnetic disks (only). There are no access methods restricted to the use of WORMs.

3.5 Lower Bounds on I/O Complexity

We first establish a lower bound on the I/O complexity of basic transaction-time queries.


The lower bound is obtained using a comparison-based model in a paginated environment, and applies to the transaction pure-timeslice ("*/-/point"), range-timeslice ("range/-/point"), and pure-key (with time predicate, or "point/-/range") queries. Any method that attempts to solve such a query in linear (O(n/B)) space needs at least Ω(log_B n + a/B) I/Os to solve it.

Since a corresponds to the query answer size, to provide the answer, no method can do better than O(a/B) I/Os; a/B is the minimal number of pages to store this answer. Note that the lower bound discussion assumes that a query may ask for any time instant in the set of possible time instants. That is, all time instants (whether recent or past) have the same probability of being in the query predicate. This fact separates this discussion from cases where queries have special properties (for example, if most queries ask for the most recent times). While this does not affect the answer part (a/B) of the lower bound, it does affect the logarithmic search part (log_B n). We could probably locate the time of interest faster if we knew that this instant is among the most recent times. Under the above equal query probability assumption, we proceed with the justification for the logarithmic part of the lower bound. Since the range-timeslice query is more general than the pure-timeslice query, we first show that the pure-timeslice problem is reduced to the "predecessor" problem, for which a lower bound is then established [Tsotras and Kangelaris 1995]. A similar reduction can be proved for the pure-key query with time predicate.

The predecessor problem is defined as follows: Given an ordered set P of N distinct items, and an item k, find the largest member of set P that is less than or equal to k.

For the reduction of the pure-timeslice problem, assume that set P contains integers t1 < t2 < ... < tN and consider the following real-world evolution: at time t1, a single real-world object with name (oid) t1 is created and lives until just before time t2, i.e., the lifespan of object t1 is [t1, t2). Then, real-world object t2 is born at t2 and lives for the interval [t2, t3), and so on. So at any time instant ti the state of the real-world system is a single object with name ti. Hence the N integers correspond to n = 2N changes in the above evolution. Consequently, finding the whole timeslice at time t reduces to finding the largest element in set P that is less than or equal to t, i.e., the predecessor of t inside P.

We show that in the comparison-based model and in a paginated environment, the predecessor problem needs at least Ω(log_B N) I/Os. The assumption is that each page contains B items, and there is no charge for comparisons within a page. Our argument is based on a decision tree proof. Let the first page be read, and assume that the items read within that page are sorted (sorting inside one page is free of I/Os). By exploring the entire page using comparisons, we can only get B + 1 different answers concerning item k. These correspond to the B + 1 intervals created by the B items. No additional information can be retrieved. Then, a new page is retrieved based on the outcome of the previous comparisons on the first page, i.e., a different page is read for each of the B + 1 outcomes. In order to determine the predecessor of k, the decision tree must have N leaves (since there are N possible predecessors). As a result, the height of the tree must be at least log_{B+1} N, which is Ω(log_B N). Thus any algorithm that solves the paginated version of the predecessor problem in the comparison model needs at least Ω(log_B N) I/Os.
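The counting step of this argument can be written out explicitly (our own restatement of the bound just derived, not an equation from the paper):

```latex
% A decision tree in which every internal node (one page read) has at most
% B+1 distinguishable outcomes reaches at most (B+1)^h leaves at height h.
% Distinguishing the N possible predecessors therefore requires
(B+1)^h \ \ge\ N
\quad\Longrightarrow\quad
h \ \ge\ \log_{B+1} N \ =\ \Omega(\log_B N).
```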

If there were a faster than O(log_B n + a/B) method for the pure-timeslice problem using O(n/B) space, then we would have invented a method that solves the above predecessor problem in less than O(log_B N) I/Os.

Observe that we have shown the lower bound for the query time of methods using linear space, irrespective of update processing. If the elements of set P are given in order, one after the other, O(1) time (amortized) per element is needed in order to create an index on the set that would solve the predecessor problem in O(log_B N) I/Os (more accurately, since no deletions are needed, we only need a fully paginated, multilevel index that increases in one direction). If these elements are given out of order, then O(log_B N) time is needed per insertion (B-tree index). In the transaction pure-timeslice problem ("*/-/point"), time is always increasing, and O(1) time for update processing per change is enough and clearly minimal. Thus we call a method I/O optimal for the transaction pure-timeslice query if it achieves O(n/B) space and O(log_B n + a/B) query time using constant updating.

Similarly, for the transaction range-timeslice problem ("range/-/point"), we call a method I/O optimal if it achieves O(log_B n + a/B) query time, O(n/B) space, and O(log_B m) update processing per change. Here m is the number of alive objects when the update takes place. Logarithmic processing is needed because the range-timeslice problem requires ordering keys by their values. Changes arrive in time order, but out of key order, and there are m alive keys on the latest state from which an update has to choose.

For the transaction pure-key with time predicate, the lower bound for query time is Ω(log_B n + a/B), since the logarithmic part is needed to locate the time predicate in the past and a/B I/Os are required to provide the answer in the output.

The same lower bound holds for bitemporal queries, since they are at least as complex as transaction queries.

For example, consider the "*/point/point" query specified by a valid time v and a transaction time t. If the valid-time interval of each interval-object extends from -∞ to +∞ in the valid-time domain, finding all interval-objects that at transaction time t intersect v reduces to finding all interval-objects in collection C(t) (since all of them would contain the valid instant v). However, this is the "*/-/point" query.

Since from a query perspective a valid-time and a transaction-time database are both collections of intervals, a similar lower bound applies to the corresponding valid-time queries (by replacing n with l, the number of interval-objects in the collection). For example, any algorithm solving the "*/point/-" query in O(l/B) space needs at least Ω(log_B l + a/B) I/Os.

4. EFFICIENT METHOD DESIGN FOR TRANSACTION/BITEMPORAL DATA

Common to all methods that support the transaction time axis is the problem of how to efficiently store large amounts of data. We first consider the transaction pure-timeslice query and show why obvious solutions are not efficient. We also discuss the transaction pure-key and range-timeslice queries. Bitemporal queries follow. The problem of separating past from current data (and the use of WORM disks) is also examined.

4.1 The Transaction Pure-Timeslice Query

There are two straightforward solutions to the transaction pure-timeslice query ("*/-/point") which, in our comparison, serve as two extreme cases; we denote them the "copy" and "log" approaches.

The "copy" approach stores a copy of the transaction database state s(t) (timeslice) for each transaction time at which at least one change occurred. These copies are indexed by time t.


Access to a state s(t) is performed by searching for time t on a multilevel index on the time dimension. Since changes arrive in order, this multilevel index is clearly paginated. The closest time that is less than or equal to t is found with O(log_B n) page accesses. An additional O(a/B) I/O time is needed to output the copy of the state, where a denotes the number of "alive" objects in the accessed database state. The major disadvantage of the "copy" approach is its space and update processing requirements. The space used can in the worst case be proportional to O(n^2/B). This happens if the evolution is mainly composed of "births" of new objects. The database state is thus enlarged continuously. If the size of the database remains relatively constant, due to deletions and insertions balancing out, and if there are p records on average, the space used is O(np/B).

Update processing is O(n/B) per change instant in a growing database and O(p/B) per change instant in a nongrowing database, as a new copy of the database has to be stored at each change instant. The "copy" approach provides a minimal query time. However, since the information stored is much more than the actual changes, the space and update requirements suffer.

A variant on the copy approach stores a list of record ADDRESSES that are "alive" each time at least one change occurs. The total amount of space used is smaller than if the records themselves were stored in each copy. However, the asymptotic space used is still O(n^2/B) for growing databases and O(np/B) for databases whose size does not increase significantly over time. This means most records have O(n) references in the index. n does not have to be very large before the index is several times the size of the record collection. In addition, by changing from a primary to a secondary unclustered structure, O(a), not O(a/B), pages must be accessed to output the copy of the a alive records (after the usual O(log_B n) accesses to find the correct list).

In the remainder of this paper, we will not consider any secondary indexes. In order to make a fair comparison, indexes that are described as secondary by their authors will be treated as if they were primary indexes. Secondary indexes never cluster data in disk pages, and thus always lose out in query time. Recall that by "primary" index we mean only an index that dictates the physical location of records, not an index on "primary key." Secondary indexes can only cluster references to records, not the records themselves.

In an attempt to reduce the quadratic space and linear updating of the "copy" approach, the "log" approach stores only the changes that occur in the database, timestamped by the time instant at which they occurred. Update processing is clearly reduced to O(1) per change, since this history management scheme appends the sequence of inputs in a "log" without any other processing. The space is similarly reduced to the minimal O(n/B). Nevertheless, this straightforward approach will increase the query time to O(n/B), since in order to reconstruct a past state, the whole "log" may have to be searched.
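The two extremes can be sketched in a few lines of Python (our own illustration, not an access method from the literature): CopyStore materializes the full state at every change instant and answers a timeslice with a search on the time index, while LogStore appends timestamped changes and must replay the log to reconstruct a past state.

```python
import bisect
from copy import deepcopy

class CopyStore:
    """'Copy' approach: a snapshot of s(t) is kept for every change instant."""
    def __init__(self):
        self.times, self.snapshots, self.state = [], [], {}

    def apply(self, t, key, value=None, delete=False):
        if delete:
            self.state.pop(key, None)
        else:
            self.state[key] = value
        self.times.append(t)                         # space grows by |state| per instant
        self.snapshots.append(deepcopy(self.state))

    def timeslice(self, t):
        i = bisect.bisect_right(self.times, t) - 1   # closest change instant <= t
        return {} if i < 0 else self.snapshots[i]

class LogStore:
    """'Log' approach: minimal space and O(1) updates, but a timeslice
    replays the whole log up to time t."""
    def __init__(self):
        self.log = []                                # (t, key, value, delete)

    def apply(self, t, key, value=None, delete=False):
        self.log.append((t, key, value, delete))

    def timeslice(self, t):
        state = {}
        for ts, key, value, delete in self.log:
            if ts > t:
                break
            if delete:
                state.pop(key, None)
            else:
                state[key] = value
        return state
```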

Combinations of the two straightforward approaches are possible; for example, a method could keep repeated timeslices of the database state and "logs" of the changes between the stored timeslices. If repeated timeslices are stored after some bounded number of changes, this solution is equivalent to the "copy" approach, since it is equivalent to using different time units (and therefore changing only the constant in the space complexity measure). If the number of changes between repeated timeslices is not bounded, the method is equivalent to the "log" approach, as it corresponds to a series of logs. We use the two extreme cases to characterize the performance of the examined transaction-time methods.


two extremes. However, it is possible tocombine the fast query time of the firstapproach with the space and updaterequirements of the second.

In order for a method to answer the transaction pure-timeslice (“*/-/point”) query efficiently, data must at least be clustered according to its transaction-time behavior. Since this query asks for all records “alive” at a given time, this clustering can only be based on the transaction-time axis, i.e., records that exist at the same time should be clustered together, independently of their key values. We call access methods that cluster by time only (transaction) time-only methods. There are methods that cluster by both time and key; we call them (transaction) time-key methods. They optimize queries that involve both time and key predicates, such as the transaction range-timeslice query (“range/-/point”). Clustering by time only can lead to constant update processing per change; thus a good time-only method can “follow” its input changes “on-line.” In contrast, clustering by time and key needs some logarithmic updating, because changes arrive in time order but not in key order, and each change must be placed appropriately according to the key to which it applies.

4.2 The Transaction Pure-Key Query

The “copy” and “log” solutions could be used for the pure-key query (“point/-/*”), but they are both very inefficient. The “copy” method uses too much space, no matter what query it is used for. In addition, finding a key in a timeslice implies either that one uses linear search or that there is some organization on each timeslice (such as an index on the key). The “log” approach requires running from the beginning of the log to the time of the query, keeping the most recent version of the record with that key. This still takes O(n/B) time.

A better solution is to store the history of each key separately, i.e., cluster data by key only. This creates a (transaction) key-only method. Since at each transaction-time instant there exists at most one “alive” version of a given key, versions of the same key can be linked together. Access to a key’s (transaction-time) history can be implemented by a hashing function (which must be dynamic hashing, as it has to support the addition of new keys) or a balanced multiway search tree (B-tree). Hashing provides constant access (in the expected amortized sense), while the B-tree provides logarithmic access. Note that hashing does not guarantee against pathological worst cases, while the B-tree does. Hashing cannot be used to obtain the history for a range of keys (as in the general class (III) query). After the queried key is identified, its whole history can be retrieved (forward or backward reconstruction using the list of versions).

The list of versions of each key can be further organized in a separate array indexed by transaction time to answer a pure-key query with time predicate (“point/-/range”). Since updates are appended at the end of such an array, a simple paginated multilevel index can be implemented on each array to expedite searching. Then a query of the form “provide the history of key k after (before) time t” is addressed by first finding k (using hashing or the B-tree) and then locating the version of k that is closest to transaction time t using the multilevel index on k’s versions. This takes O(log_B n) time (each array can be O(n/B) large).
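
A minimal sketch of this key-only clustering (our own illustration): a Python dict stands in for the dynamic hashing function, and an append-only, time-ordered list per key stands in for the paginated multilevel index on that key's versions.

    import bisect
    from collections import defaultdict

    class KeyOnlyIndex:
        """Cluster versions by key; each key has an append-only version array."""
        def __init__(self):
            # key -> parallel lists of version timestamps and values
            self.times = defaultdict(list)
            self.values = defaultdict(list)

        def add_version(self, key, t, value):
            # transaction times arrive in increasing order per key
            self.times[key].append(t)
            self.values[key].append(value)

        def version_at(self, key, t):
            # 'point/-/point': latest version of key with timestamp <= t
            i = bisect.bisect_right(self.times[key], t) - 1
            return None if i < 0 else self.values[key][i]

        def history_after(self, key, t):
            # 'point/-/range': all versions of key from time t on
            i = bisect.bisect_left(self.times[key], t)
            return list(zip(self.times[key][i:], self.values[key][i:]))

    idx = KeyOnlyIndex()
    idx.add_version('emp1', 1, 30000)
    idx.add_version('emp1', 5, 32000)
    print(idx.version_at('emp1', 3))     # 30000
    print(idx.history_after('emp1', 2))  # [(5, 32000)]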

The above straightforward data clustering by key is efficient only for class III queries, but not for the other two classes. For example, to answer a “*/-/point” query, each key ever created in the evolution must be searched to see whether it is “alive” at the query transaction time, and it takes logarithmic time to search each key’s version history.


4.3 The Transaction Range-Timeslice Query

If records that are “logically” related for a given query can also be stored physically close to each other, then the query is optimized, as fewer pages are accessed. Therefore, to answer a “range/-/point” query efficiently, it is best to cluster by both transaction time and key within pages. This is very similar to spatial indexing, but it has some special properties.

If the time-key space is partitioned into disjoint rectangles, one for each disk page, and only one copy of each record is kept, long-lived records (records with long transaction-time intervals) have to be collocated with many short-lived ones that cannot all fit on the same page. We cannot partition the space in this way without allowing duplicate records. So we are reduced to either making copies (data duplication), allowing overlap of time-key rectangles (data bounding), or mapping records, represented by key and (transaction) start_time and end_time, to points in a three-dimensional space (data mapping) and using a multidimensional search method.

Time-key spaces do not have the “density” problem of spatial indexes. Density is defined as the largest overlap of spatial objects at a point. There is only one version of each key at a given time, so time-key objects (line segments in time-key space) never overlap. This makes data duplication a more attractive option than it is in spatial indexing, especially if the amount of duplication can be limited, as in Easton [1986], Lomet and Salzberg [1990], Lanka and Mays [1991], Becker et al. [1996], and Varman and Verma [1997].

Data bounding may force single-point queries to use backtracking, since there is not a unique path to a given time-key point. In general, for the data-bounding approach, temporal indexing has worse problems than spatial indexing, because long-lived records are likely to be common. In a data-bounding structure, such a record is stored in a page with a long timespan and some key range. Every timeslice query in that timespan must access that page, even though the long-lived record may be the only one alive at the search time (the other records in the page are alive in another part of the timespan). The R-tree-based methods use data bounding [Stonebraker 1987; Kolovson and Stonebraker 1989; 1991].

The third possibility, data mapping, maps a record to three (or more) coordinates (transaction start_time, end_time, and key(s)) and then uses a multiattribute point index. Here, records with long transaction-time intervals are clustered with other records with long intervals, because their start and end times are close. Records with short transaction-time intervals are clustered with other records with short intervals, if they were alive at nearby times. This is efficient for most queries, as the long-lived records are the answers to many queries. The pages with short-lived records effectively partition the answers to different queries; most such pages are not touched for a given timeslice query. However, there are special problems, because many records may still be current and have growing lifetimes (i.e., transaction-time intervals extending to now). This approach is discussed further at the end of Section 5.1.3.

Naturally, the most efficient methods for the transaction range-timeslice query are the ones that combine the time and key dimensions. In contrast, by using a (transaction) time-only method, the whole timeslice for the given transaction time is first reconstructed and then the records with keys outside the given range are eliminated. This is clearly inefficient, especially if the requested range is a small part of the whole timeslice.

4.4 Bitemporal Queries

An obvious approach is to index bitemporal objects on a single time axis (transaction or valid time) and use a single-time access method. For example, if a transaction access method is utilized, a bitemporal “*/point/point” query is answered in two steps. First, all bitemporal objects existing at transaction time t are found. Then the valid-time interval of each such object is checked to see whether it includes valid time v. This approach is inefficient because very few of the accessed objects may actually satisfy the valid-time predicate.

If both axes are utilized, an obvious approach is an extended combination of the “copy” and “log” solutions. This approach stores copies of the collections C(t) (Figure 3) at given transaction-time instants and a log of the changes between copies. Together with each collection C(t), an access method (for example, an R-tree [Guttman 1984]) that indexes the objects of this C(t) is also stored. Conceptually, it is like storing snapshots of R-trees and the changes between them. While each R-tree enables efficient searching on a stored collection C(t), the approach is clearly inefficient, because either the space or the query time increases dramatically, depending on the frequency of snapshots.

The data bounding and data mapping approaches can also be used in a bitemporal environment. However, the added (valid-time) dimension provides an extra source of inefficiency. For example, the bounding rectangle of a bitemporal object consists of two intervals (Figure 4; taken from Kumar et al. [1998]). A “*/point/point” query is translated into finding all rectangles that include the query point (t_i, v_j). An R-tree [Guttman 1984] could be used to manage these rectangles. However, the special characteristics of transaction time (many rectangles may extend up to now) and the inclusion of the valid-time dimension increase the possibility of extensive overlap, which in turn reduces the R-tree query efficiency [Kumar et al. 1998].

4.5 Separating Past from Current Data and Use of WORM Disks

In transaction or bitemporal databases, it is usually the case that access to current data is more frequent than to past data (in the transaction-time sense). In addition, since the bulk of the data in these databases is due to the historical part, it is advantageous to use a higher-capacity but slower-access medium, such as optical disks, for the past data. First, the method should provide for a natural separation between current and past data. There are two ways to achieve this separation: (a) with the “manual” approach, a process vacuums all records that are “dead” (in the transaction-time sense) when the process is invoked (this vacuuming process can be invoked at any time); (b) with the “automated” approach, such “dead” records are migrated to the optical disk directly by the evolution process (for example, during an update). In the automated approach the total I/O involved is likely to be smaller than in a manual method, since it is piggybacked on I/O that is, in any case, necessary for index maintenance (such as splitting a full node).

Figure 4. The bounding-rectangle approach for bitemporal queries (the key dimension is not shown). The evolution of Figure 3 is depicted as of (transaction) time t > t5. Modification of interval I1 at t5 ends the initial rectangle for I1 and inserts a new rectangle from t5 to now.

Even though write-many read-many optical disks are available, WORM optical disks are still the main choice for storing large amounts of archival data; they are less expensive, have larger capacities, and usually have faster write transfer times. Since the contents of WORM disk blocks cannot be changed after their initial writing (due to the added error-correcting code), data that is to be appended on a WORM disk should not be allowed to change in the future. Since on the transaction axis the past is not changed, past data can be written on the WORM disk.

We emphasize again that methods that can be used on WORM disks are not “WORM methods”; they can also be used on magnetic disks. Thus the question of separating past and current records can be considered regardless of the availability of WORM disks.

5. METHOD CLASSIFICATION AND COMPARISON

This section provides a concise description of the methods we examine. Since it is practically impossible to run simulations of all methods on the same collections of data and queries, our analysis is based on worst-case performance. Various access method proposals provide a performance analysis that may make strong assumptions about the input data (uniform distribution of data points, etc.), and it may very well be that under those constraints the proposed method works quite well. Our purpose, however, is to categorize the methods without any assumptions about the input data or the frequency of queries. Obviously, worst-case analysis may penalize a method for some very unlikely scenarios; to distinguish such scenarios from likely worst cases, we call them pathological worst cases. We also point out some features that may affect average-case behavior without necessarily affecting worst-case behavior.

We first describe transaction-time access methods. These methods are further classified as key-only, time-only, and time-key, based on the way data is clustered. Among the key-only methods, we study reverse chaining, accession lists, time sequence arrays, and C-lists. Among time-only methods, we examine the append-only tree, the time index and its variants (the monotonic B+-tree and the time index+), the differential file approach, the checkpoint index, the archivable time index, the snapshot index, and the windows method. In the time-key category, we present the POSTGRES storage system and the use of composite indexes, the segment R-tree, the write-once B-tree, the time-split B-tree, the persistent B-tree, the multiversion B-tree, the multiversion access structure, and the overlapping B-tree. A comparison table (Table II) is included at the end of the section with a summary of each method’s worst-case performance. We then proceed with the valid-time access methods, where we discuss the metablock tree, the external segment tree, the external interval tree, and the MAP21 methods. The bitemporal category describes the M-IVTT, the bitemporal interval tree, and the bitemporal R-tree.

5.1 Transaction-Time Methods

In this category we include methods that assume that changes arrive in increasing time order, a characteristic of transaction time. This property greatly affects the update processing of a method. If “out of order” changes (a characteristic of valid time) are to be supported, the updating cost becomes much higher (practically prohibitive).

5.1.1 Key-Only Methods. The basic characteristic of transaction key-only approaches is the organization of evolving data by key (surrogate), i.e., all the versions that a given key assumes are “clustered” together, logically or physically. Such an organization makes these methods more efficient for transaction pure-key queries. In addition, the approaches considered here correspond to the earliest solutions proposed for time-evolving data.

Reverse chaining was introduced in Ben-Zvi [1982] and further developed in Lum et al. [1984]. Under this approach, previous versions of a given key are linked together in reverse chronological order. The idea of keeping separate stores for current and past data was also introduced. Current data is assumed to be queried more often, so by separating it from past data, the size of the search structure is decreased and queries for current data become faster.


Each version of a key is represented by a tuple (which includes the key, the attribute value, and a lifespan interval), augmented with a pointer field that points to the previous version (if any) of this key. When a key is first inserted into a relation, its corresponding tuple is put into the current store with its previous-version pointer set to null. When the attribute value of this key is changed, the version existing in the current store is moved to the past store, and the new tuple replaces it in the current store. The previous-version pointer of the new tuple points to the location of the previous version in the past store. Hence a chain of past versions is created out of each current key. Tuples are stored in the past store without necessarily being clustered by key.

Current keys are indexed by a regular B+-tree (the “front” B+-tree). The chain of past versions of a current key is accessed by following previous-version pointers starting from the current key. If a current key is deleted, it is removed from the front B+-tree and inserted into a second B+-tree (the “back” B+-tree), which indexes the latest versions of keys that are not current. The past-version chain of a deleted key is still accessed from its latest version stored in the back B+-tree. If a key is “reborn,” it is reinserted into the front B+-tree; subsequent modifications of this current key create a new chain of past versions. It is thus possible to have two chains of past versions for the same key, one starting from its current version and one from a past version. Hence queries about the past are directed to both B+-trees. If the key is deleted again later, its new chain of past versions is attached to its previous chain by appropriately updating the latest version stored in the back B+-tree.
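
The following sketch (ours, not code from Ben-Zvi [1982] or Lum et al. [1984]) mimics reverse chaining in main memory; dictionaries play the role of the front and back B+-trees, and each version carries a previous-version pointer into the past store. For simplicity, a reborn key is linked immediately to its older chain instead of keeping two separate chains until the next deletion.

    class ReverseChaining:
        """Current versions in a 'front' store; past versions chained backwards."""
        def __init__(self):
            self.front = {}   # key -> current version record
            self.back = {}    # key -> latest non-current version record
            self.past = []    # past store: records appended when superseded

        def _new_version(self, key, value, t):
            return {'key': key, 'value': value, 'start': t, 'end': None, 'prev': None}

        def insert(self, key, value, t):
            rec = self._new_version(key, value, t)
            # simplification: attach a reborn key to its older chain right away
            rec['prev'] = self.back.pop(key, None)
            self.front[key] = rec

        def modify(self, key, value, t):
            old = self.front[key]
            old['end'] = t
            self.past.append(old)                 # move the old version to the past store
            rec = self._new_version(key, value, t)
            rec['prev'] = old                     # previous-version pointer
            self.front[key] = rec

        def delete(self, key, t):
            old = self.front.pop(key)
            old['end'] = t
            self.back[key] = old                  # now reachable via the 'back' store

        def history(self, key):
            # follow previous-version pointers from the current (or latest) version
            rec = self.front.get(key, self.back.get(key))
            while rec is not None:
                yield (rec['start'], rec['end'], rec['value'])
                rec = rec['prev']

    rc = ReverseChaining()
    rc.insert('k', 'v1', 1)
    rc.modify('k', 'v2', 3)
    rc.delete('k', 5)
    print(list(rc.history('k')))   # [(3, 5, 'v2'), (1, 3, 'v1')]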

Clearly, this approach uses O(n/B) space, where n denotes the number of changes and B is the page size. The number of changes corresponds to the number of versions for all keys ever created. When a change occurs (such as a new version of a key or the deletion of a key), the front B+-tree (current store) first has to be searched to locate the current version of the key. If the change is a deletion, the back B+-tree is also searched to locate the latest version of the key, if any. So the update processing of this method is O(log_B n), since the number of different keys can be similar to the number of changes.

To find all previous versions of a given key, the front B+-tree is first searched for the latest version of the key; if the key is in the current store, its pointer provides access to the recent past versions of the key. Since version lists are in reverse chronological order, we have to follow such a list until a version number (transaction timestamp) that is less than or equal to the query timestamp is found. The back B+-tree is then searched for older past versions. If a denotes the number of past versions of a key, the query time is O(log_B n + a), since the versions of a given key could, in the worst case, be stored in different pages. This can be improved if cellular chaining, clustering, or stacking is used [Ahn and Snodgrass 1988]. If each collection of versions for a given key is clustered in a set of pages, but versions of distinct keys are never on the same page, the query time is O(log_B n + a/B), but the space utilization is O(n) pages (not O(n/B)), as the versions of a key may not be enough to justify the use of a full page.

Reverse chaining can be further improved by the introduction of accession lists [Ahn and Snodgrass 1988]. An accession list clusters all the version numbers (timestamps) of a given key together. Each timestamp is associated with a pointer to the accompanying tuple, which is stored in the past store (or to a cluster of tuples). Thus, instead of searching a reverse chain until a given timestamp is reached, we can search an index of the chain’s timestamps. As timestamps are stored in chronological order on an accession list, finding the appropriate version of a given key takes O(log_B n + log_B a). The space and update processing remain as before.

While the above structure can be efficient for a transaction pure-key query, answering pure- or range-timeslice queries is problematic. For example, to answer a “*/-/point” query that is satisfied only by some keys, we have to search the accession lists of all keys ever created.

Another early approach proposed the use of time sequence arrays (TSAs) [Shoshani and Kawagoe 1986]. Conceptually, a TSA is a two-dimensional array with a row for each key ever created; each column represents a time instant. The (x, y) entry stores the value of key x at time y. Both static data (the data set has been fully collected) and dynamic data (the data set is continuously growing, as in a transaction-time environment) are examined. If this structure is implemented as a two-dimensional array, the query time is minimal (just access the appropriate array entry), but update processing and space are prohibitive (O(n) and O(n^2), respectively). We could instead implement each row as an array, keeping only those values where there was a change; this is conceptually the same solution as reverse chaining with accession lists. A solution based on a multidimensional partitioning scheme is proposed in Rotem and Segev [1987], but the underlying assumption is that the whole temporal evolution is known in advance, before the partitioning scheme is implemented.

The theoretically optimal solution for the transaction pure-key query with time predicate is provided by the C-lists of Varman and Verma [1997]. C-lists are similar to accession lists in that they cluster the versions of a given key together. There are two main differences. First, access to each C-list is provided through another method, the multiversion access structure (MVAS, for short; the MVAS is discussed later with the time-key methods) [Varman and Verma 1997]. Second, maintenance is more complicated: splitting/merging C-list pages is guided by the page splitting/merging of the MVAS (for details, see Varman and Verma [1997]). If there are m “alive” keys in the structure, updating takes O(log_B m). The history of key k before time t is found in O(log_B n + a/B) I/Os, which is optimal. C-lists have the advantage that they can be combined with the MVAS structure to create a method that optimally answers both the range-timeslice and the pure-key with time predicate queries. However, the method needs an extra B+-tree, together with double pointers between the C-lists and the MVAS, which adds implementation complexity.

5.1.2 Time-Only Methods. Most time-only methods timestamp changes (additions, deletions, etc.) with the transaction time at which they occurred and append them in some form of “history log.” Since no clustering of data according to keys is made, such methods optimize “*/-/point” or “*/-/range” queries. Because changes arrive in chronological order, ideally a time-only method can provide constant update processing (as the change is simply appended at the end of the “history log”); this advantage is important in applications where changes are frequent and the database has to “follow” these changes in an on-line fashion. For efficient query time, most methods use some index on top of the “history log” that indexes the (transaction) timestamps of the changes. Because of the time-ordered changes, the cost of maintaining this (paginated) index on the transaction-time axis is minimal, amortized O(1) per change.

While organizing data by only its time behavior provides for very fast updating, it is not efficient for answering transaction range-timeslice queries. In order to use time-only methods for such queries, one suggestion is to employ a separate key index, whose leaves point to predefined key “regions” [Elmasri et al. 1993; Gunadhi 1993]. A key region could be a single key or a collection of keys (either a subrange of the key space or a relation). The history of each “region” is organized separately, using an individual time-only access method (such as the time index or the append-only tree). The key index directs a change of a given key to the method that keeps the history of the key’s region. However, after the region is found, the placement of this key in the region’s access method is based on the key’s time behavior only (and no longer on the key itself).

To answer transaction range-timeslice queries, we have to search the history of each region that belongs to the query range. Thus the range-timeslice is constructed by creating the individual timeslices for every region in the query range. If R is the number of regions, the key index adds O(R/B) space and O(log_B R) update processing to the performance of the individual historical access methods. The query time for the combination of the key index and the time-only access methods is O(M f(n_i, t, a_i)), where M is the number of regions that fall in the given query range and f(n_i, t, a_i) is the time needed in each individual region r_i (i = 1, ..., M) to perform a timeslice query for time t (n_i and a_i correspond to the total number of changes in r_i and the number of “alive” objects from r_i at time t, respectively). For example, if the time index [Elmasri et al. 1990] is used as the access method in each individual region, then f(n_i, t, a_i) = O(log_B n_i + a_i/B).

There are three drawbacks to this approach: (1) If the query key range is a small subset of a given region, the whole region’s timeslice is reconstructed, even though most of its objects may not belong to the query range and thus do not contribute to the answer. (2) If the query key range contains many regions, all these regions have to be searched, even if they contribute no “alive” objects at the transaction time of interest t. (3) For every region examined, a logarithmic search, at best, is performed to locate t among the changes recorded in the region. To put this in perspective, imagine replacing a multiattribute spatial search structure with a number of collections of records from predefined key ranges in one attribute, and then organizing each key range by some other attribute.

To answer general pure-key queries of the form “find the salary history of the employee named k,” an index on the key space can be utilized. This index keeps the latest version of each key, while key versions are linked together. Since the key space is separate from the time space, such an index is easily updated. In some methods this index has the form of a B+-tree and is also used for transaction range-timeslice queries (such as the surrogate superindex used in the AP-tree [Gunadhi and Segev 1993] and the archivable time index [Verma and Varman 1994]); in others it has the form of a hashing function, as in the snapshot index [Tsotras and Kangelaris 1995]. A general technique is to link each record to any one copy of the most recent distinct past version of the record. We continue with the presentation of the various time-only methods.

Append-Only Tree. The append-only tree (AP-tree) is a multiway search tree that is a hybrid of an ISAM index and a B+-tree. It was proposed as a method to optimize event-joins [Segev and Gunadhi 1989; Gunadhi and Segev 1993]. Here we examine it as an access method for the query classes of Section 3.1. Each tuple is associated with a (start_time, end_time) interval. The basic method indexes the start_times of tuples. Each leaf node has entries of the form (t, b), where t is a time instant and b is a pointer to a bucket that contains all tuples with start_time greater than the time recorded in the previous entry (if any) and less than or equal to t. Each nonleaf node indexes nodes at the next level (Figure 5).

In the AP-tree, insertions of new tuples arrive in increasing start_time order; on this basis, we consider it a transaction-time method. It is also assumed that the end_times of tuples are known when a tuple is inserted in the access method. In that case the update processing is O(1), since the tuple is inserted (“appended”) at the rightmost leaf of the tree. (This is somewhat similar to the procedure used in most commercial systems for loading a sorted file into a multilevel index [Salzberg 1988], except that insertions are now successive instead of batched.) If end_times are not known at insertion but are updated later (as in a transaction-time environment), the index has to be searched for the record that is updated. If the start_time of the updated record is given in the input, this search is O(log_B n). Otherwise, we could use a hashing function that stores only the alive objects and, for each such object, points to its position in the AP-tree (this is not discussed in the original paper).

To answer a transaction pure-timeslice query for time t, the AP-tree is first searched for the leaf that contains t. All intervals to the “right” of this leaf have start_times larger than t, and thus should not be searched further. However, all intervals to the left of this leaf (i.e., the data file from the beginning until t) have to be checked for “containing” t. Such a search can be as large as O(n/B), since the number of intervals in the tree is proportional to the number of changes in the evolution. Of course, if we assume that the queries are randomly distributed over the entire transaction-time range, half of the leaf nodes, on average, must be searched. The space is O(n/B).
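
Since, for this query, the AP-tree degenerates into a scan of all records with start_time less than or equal to t, a simplified sketch (ours) of the pure-timeslice search over a start_time-ordered record file might look as follows; the sorted Python list stands in for the paginated tree.

    import bisect

    # records sorted by start_time, as the AP-tree keeps them;
    # end_time may be None for still-current records
    records = [
        (1, 4, 'a'),      # (start_time, end_time, key)
        (2, None, 'b'),
        (5, 9, 'c'),
        (7, None, 'd'),
    ]

    def ap_tree_timeslice(records, t):
        """Pure timeslice: prune records with start_time > t, scan the rest."""
        starts = [r[0] for r in records]
        hi = bisect.bisect_right(starts, t)          # logarithmic 'tree' descent
        return [key for (s, e, key) in records[:hi]  # linear scan of everything up to t
                if e is None or e > t]

    print(ap_tree_timeslice(records, 6))  # ['b', 'c'] ('a' ended at 4, 'd' starts at 7)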

For answering transaction pure-key and range-timeslice queries, the nested ST-tree has been proposed [Gunadhi and Segev 1993]. This method uses a separate B+-tree index (called the surrogate superindex) on all the keys (surrogates) ever inserted in the database. A leaf node of such a tree contains entries of the form (key, p1, p2), where p1 is a pointer to an AP-tree (called the time subindex) that organizes the evolution of the particular key, and p2 is a pointer to the latest version of the key. This approach solves the problem of updating intervals by key (just search the surrogate superindex for the key of the interval; this key’s time subindex will then provide the latest version of this interval, i.e., the version to be updated). The ST-tree approach is conceptually equivalent to reverse chaining with an index on each accession list (however, due to its relation to the AP-tree, we include it among the time-only methods).

Update processing is now O(log_B S), where S denotes the total number of keys (surrogates) ever created (S is itself O(n)). Note that there may be key histories with just one record. For the space to remain O(n/B), unused page portions should be shared by other key histories. This implies that the versions of a given key may reside in separate pages. Answering a pure-key query then takes O(log_B S + a) I/Os: the given key is found with a logarithmic search on the surrogate superindex, and its a versions are then accessed, but, at worst, each version may reside in a distinct page. For a transaction range-timeslice query whose range contains K keys (alive or not at t), the query time is O(K log_B n), because each key in the range has to be searched. When the range is the whole key space, i.e., to answer a transaction pure-timeslice query for time t, we have to perform a logarithmic search on the time subindex of each key ever created. This takes time O(S log_B n).

Figure 5. The append-only tree. Leaves include only the start_time fields of intervals. Each leaf points to file pages, with records ordered according to the start_time field. New records are added only at the rightmost leaf of the tree. It is assumed that both endpoints are known for the intervals in this figure.

The basic AP-tree does not separate past from current data, so transferring to a write-once optical disk may be problematic. We could start transferring data to the write-once medium in start_time order, but this could also transfer long-lived tuples that are still current (alive) and may be updated later. The ST-tree does not have this problem because data is clustered by key; the history of each key represents past data that can be transferred to an optical medium.

If the AP-tree is used in a valid-time environment, interval insertions, deletions, or updates may happen anywhere in the valid-time domain. This implies that the index will not be as compact as in the transaction-time domain, where changes arrive in order; instead, it would behave as a B-tree. If, for each update, only the key associated with the updated interval is provided, the whole index may have to be searched. If the start_time of the updated interval is given, a logarithmic search suffices. Since the l valid intervals are sorted by start_time, a “*/point/-” query takes O(l/B) I/Os. For “range/point/-” queries, the ST-tree must be combined with a B-tree as its time subindex. Updates are logarithmic (by traversing the surrogate superindex and the time subindex). A valid range-timeslice query whose range contains K keys takes O(K log_B l) I/Os, since every key in the query range must be searched for being alive at the valid query time.

Time Index. The time index, proposed in Elmasri et al. [1990; 1991], is a B+-tree-based access method on the time axis. In the original paper the method was proposed for storing valid times, but it makes the assumption that changes arrive in increasing time order and that physical deletions rarely occur. Since these are basic characteristics of the transaction-time dimension, we consider the time index to be in the transaction-time category. There is a B+-tree that indexes a linearly-ordered set of time points, where a time point (also referred to as an indexing point in Elmasri et al. [1990]) is either the time instant where a new version is created or the next time instant after a version is deleted. Thus a time point corresponds to the time instant of a change (for deletions it is the next time instant after the deletion). Each entry of a leaf node of the time index is of the form (t, b), where t is a time point and b is a pointer to a bucket. The pointer of a leaf’s first entry points to a bucket that holds all records that are “alive” (i.e., a snapshot) at this time point; the rest of the leaf entries point to buckets that hold incremental changes (Figure 6). As a result, the time index does not need to know the end_time of an object in advance (an advantage over the AP-tree).

The time index was originally proposed as a secondary index, but we shall treat it as a primary index here, in order to make a fair comparison to the other methods, as explained in Section 4.1. This makes the search estimates competitive with the other methods without changing the worst-case asymptotic space and update formulas.


Since in a transaction environment changes occur in increasing time order, new nodes are always added at the rightmost leaf of the index. This can produce a more compact index than the B+-tree in the original paper. The new index is called the monotonic B+-tree [Elmasri et al. 1993] (the monotonic B+-tree insertion algorithm is similar to that of the AP-tree).

To answer a transaction pure-timeslice query for some time t, we have to search the time index for t; this leads to a leaf node that “contains” t. The past state is reconstructed by accessing all the buckets of entries of this leaf node whose timestamps are less than or equal to t. If we assume that the number of changes that can occur at each time instant is bounded (by some constant), the query time of the time index is O(log_B n + a/B). After the appropriate leaf node is found in logarithmic time, the answer a is reconstructed by reading leaf buckets. The update processing and space can be as large as O(n/B) and O(n^2/B), respectively. Therefore, this method is conceptually equivalent to the “copy” approach of Section 4.1 (the only difference is that copies are now made after a constant number of changes).
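
A small sketch (ours) of how a time-index leaf is used to answer the timeslice query: the leaf's first entry carries a full snapshot and the remaining entries carry incremental changes, which are applied up to the query time. The in-memory bucket layout below is, of course, only a stand-in for the disk-based one.

    def timeslice_from_leaf(leaf, t):
        """leaf = (times, buckets); buckets[0] is a full snapshot (a set of keys),
        buckets[i] for i > 0 is a list of ('add'|'del', key) changes at times[i]."""
        times, buckets = leaf
        alive = set(buckets[0])                 # snapshot stored with the first entry
        for time, changes in zip(times[1:], buckets[1:]):
            if time > t:
                break
            for op, key in changes:             # apply incremental changes up to t
                if op == 'add':
                    alive.add(key)
                else:
                    alive.discard(key)
        return alive

    leaf = ([10, 12, 15],
            [{'x', 'y'},                        # full timeslice at time 10
             [('add', 'z')],                    # changes at time 12
             [('del', 'x')]])                   # changes at time 15
    print(timeslice_from_leaf(leaf, 13))        # {'x', 'y', 'z'}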

Answering a transaction range-timeslice query with the time index requires reconstructing the whole timeslice for the time of interest and then selecting only the tuples in the given range. To answer range-timeslice queries more efficiently, the two-level attribute/time index (using predefined key regions) was proposed in Elmasri et al. [1990]. Assuming that there are R predefined key regions (and R is smaller than n), the update processing and space remain O(n/B) and O(n^2/B), respectively, since most of the changes can happen to a single region. Answering a “*/-/point” query means creating the timeslices for all R regions, even if a region does not contribute to the answer. Thus the pure-timeslice query time is proportional to Σ_{i=1}^{R} (log_B n_i + a_i/B), where n_i and a_i correspond to the total number of changes in individual region r_i and the number of “alive” objects from r_i, respectively, at time t. This can be as high as O(R log_B n + a), since each region can contribute a single tuple to the answer. Similarly, for “range/-/point” queries the query time becomes O(M log_B n + a), where M is the number of regions that fall in the given query range (assuming that the query range contains a number of regions; otherwise a whole region timeslice has to be created).

Pure-key queries are not supported, as record versions of the same object are not linked together (for example, to answer a query of the form “find all past versions of a given key,” we may have to search the whole history of the region to which this key belongs).

Figure 6. The time index. Each first leaf entry holds a full timeslice, while the next entries keep incremental changes.

In Elmasri et al. [1993], it is suggested that record versions be moved to optical disk when their end_times change to a time before now. This is under the assumption that the time index is being used as a secondary index and that each record version is located in only one place, so the leaf buckets contain lists of addresses of record versions.

In order to move full pages of data to the optical disk, a buffer is used on the magnetic disk to collect records as their end_times are changed. An optical disk page is reserved for the contents of each buffer page. When a record version is placed in a buffer page, all pointers to it in the time index must be changed to refer to its new page on the optical disk. This can require O(n/B) update processing, as a record version pointer can be contained in O(n/B) leaves of the time index. A method for finding the pointers to particular record versions within the lists of addresses in a leaf’s first entry, in order to update them, is not given.

Index leaf pages can be migrated to the optical disk only when all their pointers are references to record versions on the optical disk or in the magnetic disk buffer used to transfer record versions to optical disk. Since each index leaf page contains the pointers to all record versions alive at the time the index page was created, it is likely that many index pages may not qualify for moving to optical disk because they contain long-lived records.

It is suggested in Elmasri et al. [1993] that long-lived records inhibiting the movement of index pages also be kept in a magnetic buffer and assigned an optical address, so that the index leaf page can be moved. When all the children of an internal index page have been moved to the optical disk, the internal index page can also be moved. However, the number of long-lived record versions can also be O(n). Thus the number of empty optical pages waiting for long-lived object versions to die, and having mirror buffers on magnetic disk, is O(n/B).

In an attempt to overcome the high storage and update requirements, the time index+ [Kourmajian et al. 1994] has been proposed. There are two new structures in the time index+: the SCS and SCI buckets. In the original time index, a timeslice is stored for the first timestamp entry of each leaf node. Since sibling leaf nodes may share much of this timeslice, in the time index+ odd-even pairs of sibling nodes store the common parts of their timeslices in a shared SCS bucket. Even though the SCS technique would in practice save considerable space (about half of what was used before), the asymptotic behavior remains the same as for the original time index.

Common intervals that span a number of leaf nodes are stored together in some parent index node (similar to the segment tree data structure [Bentley 1977]). Each index node in the time index+ is associated with a range, i.e., the range of time instants covered by its subtree. A time interval I is stored in the highest internal node v such that I covers v’s range and does not cover the range of v’s parent. All such intervals are kept in the SCI bucket of an index node.
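
This allocation rule is the classical segment-tree one: an interval is stored in every node whose range it covers while not covering the parent's range, i.e., in at most two nodes per level. The sketch below (ours, over a hypothetical binary hierarchy of time ranges) shows which nodes' SCI buckets would receive a given interval.

    def sci_nodes(node, interval):
        """Return the ranges of the nodes whose SCI bucket stores `interval`,
        inside the binary hierarchy rooted at `node` (both are [lo, hi) pairs)."""
        nlo, nhi = node
        lo, hi = interval
        if hi <= nlo or nhi <= lo:        # disjoint: nothing is stored below here
            return []
        if lo <= nlo and nhi <= hi:       # node range covered by the interval:
            return [node]                 # store it here, not in the children
        mid = (nlo + nhi) // 2            # otherwise recurse into both children
        return sci_nodes((nlo, mid), interval) + sci_nodes((mid, nhi), interval)

    # an interval ends up in O(log n) nodes of the hierarchy
    print(sci_nodes((0, 16), (3, 11)))    # [(3, 4), (4, 8), (8, 10), (10, 11)]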

By keeping the intervals in this way, the quadratic space is dramatically reduced. Observe that an interval may now be stored in at most logarithmically many internal nodes (due to the segment tree property [Mehlhorn 1984]). This implies that the space consumption of the time index+ is reduced to O((n/B) log_B n). The authors mention in the paper that in practice there is no need to associate SCI buckets with more than two levels of index nodes. However, if no SCI buckets are used at the higher levels, the asymptotic behavior remains similar to that of the original time index.

In addition, it is not clear how updates are performed when SCI buckets are used. In order to find the actual SCIs where a given interval is to be stored, both endpoints of the interval should be known. Otherwise, if an interval is initially inserted as (t, now), it has to be found and updated when, at a later time, its right endpoint becomes known. This implies that some search structure is needed in each SCI, which would, of course, affect the update behavior of the whole structure. Finally, the query time bound of the time index+ remains the same as for the original time index.

If the original time index (using the regular B+-tree) is used in a valid-time environment, physical object deletions anywhere in the (valid) time domain should be supported. However, a deleted object must be removed from all the stored (valid) snapshots. If the deleted object has a long valid-time interval, the whole structure may have to be updated, making such deletions very costly. Similarly, objects can be added anywhere in the valid-time domain, implying that all affected stored snapshots have to be updated.

Differential File Approach. While the differential file approach [Jensen et al. 1991; 1992] does not propose the creation of a new index, we discuss it here, since it involves an interesting implementation of a database system based on transaction time. In practice an index can be implemented on top of the differential file approach; here, however, we assume that no such index exists. Changes that occur to a base relation r are stored incrementally and timestamped in the relation’s log; this log is itself considered a special relation, called a backlog. In addition to the attributes of the base relation, each entry of the backlog contains a triplet (time, key, op). Here time corresponds to the (commit) time of the transaction that updated the database about a change that was applied to the base relation tuple with that key (surrogate); op corresponds to the kind of change that was applied to this tuple (addition, deletion, or modification).

As a consequence of the use of timestamps, a base relation is a function of time; thus r(t) is a timeslice of the base relation at time t. A timeslice of a base relation can be stored or computed.

Storing a timeslice can be implemented either as a cache (where pointers to the appropriate backlog entries are used) or as materialized data (where the actual tuples of the timeslice are kept). Using the cache avoids storing the (possibly long) attribute values, but some time is needed to reconstruct the full timeslice (Figure 7).

A timeslice can be fixed (for example, r(t1)) or time-dependent (for example, r(now − t1)). Time-dependent stored base relations have to be updated; this is done either eagerly (changes directly update such relations) or lazily (when the relation is requested, the backlog is used to bring it up to the current state). An eager current timeslice (r(now)) is like a snapshot relation, that is, a collection of all current records.

A time-dependent base relation can also be computed from a previously stored timeslice and the set of changes that occurred in between. These changes correspond to a differential file (so the whole backlog need not be searched). Differential files are also stored as relations.

For answering “*/-/point” queries, this approach can be conceptually equivalent to the “log” or the “copy” method, depending on how often timeslices are stored. Consider, for example, a single base relation r with backlog b_r: if timeslices are infrequent or the distance (number of changes) between timeslices is not fixed, the method is equivalent to the “log” approach, where b_r is the history log. The space is O(n/B) and update processing is constant (amortized) per change, but the reconstruction can also be O(n/B). Conversely, if timeslices are kept at a fixed distance, the method behaves similarly to the “copy” approach.
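
A toy sketch (ours) of the backlog idea: the backlog is a list of timestamped change entries, a stored timeslice is materialized from it, and a later timeslice is computed from the nearest earlier stored one plus the differential entries in between. All names are illustrative.

    class Backlog:
        """Backlog of timestamped changes plus occasional stored timeslices."""
        def __init__(self):
            self.entries = []      # (time, key, op, value) in commit-time order
            self.stored = {}       # time -> materialized timeslice (key -> value)

        def append(self, t, key, op, value=None):
            self.entries.append((t, key, op, value))

        def store_timeslice(self, t):
            self.stored[t] = self.timeslice(t)       # materialize r(t)

        def timeslice(self, t):
            # start from the latest stored timeslice not after t, if any
            base_times = [bt for bt in self.stored if bt <= t]
            base_t = max(base_times) if base_times else None
            state = dict(self.stored[base_t]) if base_t is not None else {}
            # then apply the differential part of the backlog
            for ct, key, op, value in self.entries:
                if ct > t:
                    break
                if base_t is not None and ct <= base_t:
                    continue
                if op == 'del':
                    state.pop(key, None)
                else:                                # 'add' or 'mod'
                    state[key] = value
            return state

    b = Backlog()
    b.append(1, 'x', 'add', 10)
    b.append(2, 'y', 'add', 20)
    b.store_timeslice(2)
    b.append(3, 'x', 'mod', 11)
    print(b.timeslice(3))   # {'x': 11, 'y': 20}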

In order to address “range/-/point” queries, we have to produce the timeslice of the base relation and then check all the tuples of this timeslice for being in the query range. Similarly, if the value of a given key is requested as of some time, the whole relation must first be reconstructed as of that time. The history (previous versions) of a key is not kept explicitly, as versions of a given key are not connected together.

Checkpoint Index. The checkpoint index was originally proposed for the implementation of various temporal operators (temporal joins, parallel temporal joins, snapshot/interval operators, etc.) [Leung and Muntz 1992a; 1992b; 1993]. Here we take the liberty of considering its behavior as if it were used as an access method for transaction-time queries. Timeslices (called checkpoints) are periodically taken of the state of an evolving relation. If the query operator is a join, checkpoints of two relations are taken. Partial relation checkpoints based on some key predicate have also been proposed. For simplicity, we concentrate on checkpointing a single relation.

The checkpoint index assumes that object intervals are ordered by their start_time. This is a property of the transaction-time environment (Figure 1). A stream processor follows the evolution as time proceeds. When a checkpoint is made at some (checkpoint) instant t, the objects alive at t are stored in the checkpoint. A separate structure, called the data stream pointer (DSP), points to the first object born after t. Conceptually, the DSP provides access to an ordered (by interval start_time) list of objects born between checkpoints. The DSP is needed because some of these objects may end before the next checkpoint and thus would not be recorded otherwise. The checkpoint time instants are indexed through a B+-tree-like structure (Figure 8).

The performance of the checkpoint index for pure-timeslice queries depends on how often checkpoints are taken. At one extreme, if very few checkpoints are taken, the space remains linear, O(n/B). Conversely, if checkpoints are kept at a fixed distance, the method behaves similarly to the “copy” approach. In general, the DSP pointer may be “reset” backwards in time to reduce the size of a checkpoint (an optimization issue).
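
A rough approximation (ours) of how a pure-timeslice query would be answered with checkpoints plus the DSP: take the stored checkpoint at the latest checkpoint instant c not after t, filter out the records that have ended by t, and add the records born after c (reached through the DSP) that are alive at t. The class and field names are hypothetical.

    import bisect

    class CheckpointIndex:
        """Periodic checkpoints over a start_time-ordered stream of records."""
        def __init__(self, stream):
            # stream: list of (start_time, end_time, key), ordered by start_time;
            # end_time is None while a record is still current
            self.stream = stream
            self.ckpt_times = []   # checkpoint instants, increasing
            self.ckpts = []        # records alive at each checkpoint instant
            self.dsps = []         # DSP: index of the first record born after the instant

        def checkpoint(self, c):
            alive = [r for r in self.stream
                     if r[0] <= c and (r[1] is None or r[1] > c)]
            self.ckpt_times.append(c)
            self.ckpts.append(alive)
            starts = [r[0] for r in self.stream]
            self.dsps.append(bisect.bisect_right(starts, c))

        def timeslice(self, t):
            i = bisect.bisect_right(self.ckpt_times, t) - 1   # latest checkpoint <= t
            base, dsp = ([], 0) if i < 0 else (self.ckpts[i], self.dsps[i])
            answer = [r for r in base if r[1] is None or r[1] > t]
            # records born after the checkpoint are reached through the DSP
            for s, e, key in self.stream[dsp:]:
                if s > t:
                    break
                if e is None or e > t:
                    answer.append((s, e, key))
            return [key for _, _, key in answer]

    stream = [(1, 4, 'a'), (2, None, 'b'), (5, 9, 'c'), (7, None, 'd')]
    ci = CheckpointIndex(stream)
    ci.checkpoint(3)
    print(ci.timeslice(6))   # ['b', 'c']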

When an object is deleted, its record has to be found in order to update its end_time. The original presentation of the checkpoint index implicitly assumes that the object end_times are known (since the whole evolution, or stream, is known). However, a hashing function on the alive objects can be used to solve this problem (as with the AP-tree). The checkpoint index resembles the differential file and time index approaches, in that all of them keep various timeslices. However, instead of simply storing the changes between timeslices, the checkpoint index keeps DSP pointers to the actual object records. Hence, in the checkpoint index, an update to an interval end_time cannot simply be appended at the end of a log; it has to update the corresponding object’s record.

Figure 7. Differential file approach.

To address range-timeslice queries with the checkpoint index, the timeslice of the base relation is first produced and then all the tuples of this timeslice are checked for being in the query range. The history (previous versions) of a given key is not kept explicitly, because versions of the same key are not connected together.

The checkpoint index could use a transfer policy to an optical medium similar to the one described for the time index.

Archivable Time Index. The archivable time index [Verma and Varman 1994] does not directly index actual transaction time instants; instead, it indexes version numbers. The transaction time instant of the first change takes version number 0, and successive changes are mapped to consecutive version numbers. An interval is represented by the version numbers corresponding to its start and end times. A special structure is needed to transform versions to timestamps and vice versa. In the rest of this discussion we use the terms time instant and version number synonymously.

Let Tc denote the current time. The method partitions records into current and past records. For the current records (those with unknown end_time), a conventional B+-tree structure (the CVAS) is used to index the start_times of their transaction intervals. For past records (records whose end_time is less than or equal to Tc), a more complex structure, the PVAS, is used. Conceptually, the PVAS can be viewed as a logical binary tree of size 2^a (Tc ≤ 2^a). Each node in the tree represents a segment of the transaction-time space. At Tc, only some of the nodes of the tree have been created; new nodes are dynamically added on the right path of the tree as time increases. A node denoted by segment [i, j], where i < j, has span (j − i). The root is denoted [0, 2^a]. The left child of node [i, j] is node [i, (i + j)/2] and its right child is node [(i + j)/2, j]. Hence the span of a node is the sum of the spans of its two children; the span of a leaf node is two. Figure 9 (from Verma and Varman [1994]) shows an example of the PVAS tree at Tc = 55 with a = 6. Node segments appear inside the nodes.

Past records are stored in the nodes of this tree. Each record is stored in exactly one node: the lowest node whose span contains the record’s interval. For example, a record with interval [3,16] is assigned to node [0,16]. The nodes of the binary tree are partitioned into three disjoint sets: passive, active, and future nodes. A node is passive if no more records can ever be stored in that node. It is an active node if it is possible for a record with an interval ending at Tc to be stored there. It is a future node if it can only store records whose intervals end after Tc, i.e., in the future. Initially, all nodes begin as future nodes at Tc = 0, then become active, and finally, as time proceeds, end up as passive nodes. Node [i, j] becomes active at Tc = (i + j)/2 if it is a leaf, or at Tc = (i + j)/2 + 1 otherwise. For example, in Figure 9, for Tc = 55, node [48,64] is a future node. This is because any record with an interval contained in [48,55] will be stored somewhere in its left subtree; the only records that can be stored in [48,64] have intervals ending after time 55, so they are future records. Future nodes need not be kept in the tree before becoming active.

Figure 8. The checkpoint index. The evolution of Figure 1 is assumed.
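
The node that stores a past record can be computed directly from its interval: starting from the root, descend into whichever child segment still contains the whole interval and stop when neither child does. A small sketch (ours) of this rule:

    def pvas_node(interval, a):
        """Lowest node of the PVAS binary tree over [0, 2**a] whose segment
        contains the whole interval (intervals use the tree's version numbers)."""
        lo, hi = 0, 2 ** a
        s, e = interval
        while hi - lo > 2:                 # leaves have span two
            mid = (lo + hi) // 2
            if e <= mid:                   # fits entirely in the left child
                hi = mid
            elif s >= mid:                 # fits entirely in the right child
                lo = mid
            else:                          # straddles the midpoint: store here
                break
        return (lo, hi)

    print(pvas_node((3, 16), 6))   # (0, 16), as in the example above
    print(pvas_node((50, 53), 6))  # (48, 56)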

Each interval assigned to a PVAS node is stored in two lists: one that stores the intervals in increasing start_time order and one that stores them in increasing end_time order. This is similar to the interval tree [Edelsbrunner 1983]. In Verma and Varman [1994], a different structure is used to implement these lists for the active and passive nodes, exploiting the fact that passive nodes do not receive any new intervals after they become passive. In particular, all passive node lists can be stored in two sequential files (the IFILE and the JFILE), a property that provides for good pagination and record clustering. Two dynamic structures, the ITREE (a B-tree structure) and the JLISTS (a collection of lists), are used for the active node lists.

The PVAS logical binary tree and its accompanying structures can be efficiently placed in pages (details in Verma and Varman [1994]), occupying O(n/B) space. Since the structure does not index record keys, an update assumes that the start_time of the updated record is known; updating is then O(log_B n). As with most of the other time-only methods, if updates are provided by record key only, a hashing function can be used to find the start_time of the record before the update proceeds on the PVAS.

Figure 9. The PVAS binary tree. The current logical time is 55.

To answer a transaction pure-timeslice query, both the CVAS and the PVAS are searched. Since the CVAS is ordered on start_times, a logarithmic search provides the current records born before the query time t. Searching the PVAS structure is more complicated. The search follows a single path down the logical binary tree, and the lists of the nodes whose span contains t are searched sequentially. Searching each list provides a clustered answer, but there may be O(log_2 n) binary nodes whose lists are searched. Since every list access may be a separate I/O, the query time becomes O(log_2 n + a/B).

Since no record keys are indexed, the method as presented above cannot efficiently answer pure-key queries. For transaction range-timeslice queries, the whole timeslice has first to be computed. To answer pure-key and range-timeslice queries, Verma and Varman [1994] assume the existence of another index over various key regions, similar to the time index.

Snapshot Index. The snapshot index [Tsotras and Kangelaris 1995] achieves the I/O-optimal solution to the transaction pure-timeslice problem. Conceptually it consists of three data structures: a multilevel index that provides access to the past by time t; a multilinked structure among the leaf pages of the multilevel index that facilitates the creation of the query answer at t; and a hashing function that provides access to records by key, used for update purposes. A real-world object is represented by a record with a time-invariant id (object id), a time-variant (temporal) attribute, and a semiclosed transaction-time interval of the form [start_time, end_time). When a new object is added at time t, a new record is created with interval [t, now] and stored sequentially in a data page. At any given instant there is only one data page that stores (accepts) records, called the acceptor page. When an acceptor page becomes full, a new acceptor page is created. Acceptor pages are added at the end of a linked list (list L) as they are created. Up to this point, the snapshot index resembles a linked “log” of pages that keeps the object records.

There are three main differences from a regular log: the use of the hashing function, the in-place deletion updates, and the notion of page usefulness. The hashing function is used for updating records about their “deletion.” When a new record is created, the hashing function will store the id of this record, together with the address of the acceptor page that stores it. Object deletions are not added at the end of the log. Rather, they are represented by changing the end_time of the corresponding deleted record. This access is facilitated by the hashing function. All records with end_time equal to now are termed “alive”; otherwise they are called “deleted.”

As pointed out in Section 4.1, time-only methods need to order their data by time only, and not by time and key. Since data arrives ordered by time, a dynamic hashing function is enough for accessing a record by key (membership test) when updating it. Of course, hashing cannot guarantee against pathological worst cases (i.e., when a bad hashing function is chosen). In those cases, a B+-tree on the keys can be used instead of hashing, leading to logarithmic worst-case update.

A data page is defined useful for: (i) all time instants for which it was the acceptor page, or (ii) after it ceased being the acceptor page, all time instants for which the page contains at least u·B “alive” records. For all other instants, the page is called nonuseful. The useful period [u.start_time, u.end_time) of a page forms a “usefulness” interval for this page. The u.start_time is the time instant the page became an acceptor page. The usefulness parameter u (0 < u ≤ 1) is a constant that tunes the behavior of the snapshot index. To answer a pure-timeslice query about time t, the snapshot index will only access the pages useful at t (or, equivalently, those pages that have at least u·B records alive at t), plus at most one additional page that was the acceptor page at t. This single page may contain fewer than u·B records from the answer.
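
As an illustration of the usefulness rule, the sketch below checks whether a data page is useful at time t under parameter u. The record and page layout are assumptions made for exposition, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import List

NOW = float("inf")                     # marker for "still alive" / "still the acceptor"

@dataclass
class Record:
    key: int
    start_time: int
    end_time: float = NOW              # semiclosed interval [start_time, end_time)

@dataclass
class DataPage:
    capacity: int                      # B, the number of records a page can hold
    acceptor_from: int                 # time the page became the acceptor page
    acceptor_until: float = NOW        # time it stopped being the acceptor page
    records: List[Record] = field(default_factory=list)

def alive_at(page: DataPage, t: int) -> int:
    """Number of record versions in the page whose interval contains t."""
    return sum(1 for r in page.records if r.start_time <= t < r.end_time)

def is_useful(page: DataPage, t: int, u: float) -> bool:
    """Usefulness rule: the acceptor period, or at least u*B alive records afterwards."""
    if page.acceptor_from <= t < page.acceptor_until:
        return True
    return alive_at(page, t) >= u * page.capacity
```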

When a useful data page becomes nonuseful, its “alive” records are copied to the current acceptor page (this is like a time-split [Easton 1986; Lomet and Salzberg 1989]). In addition, based on its position in the linked list L, a nonuseful data page is removed from L and is logically placed under the previous data page in the list. This creates a multilinked structure that resembles a forest of trees of data pages, and is called the access forest (Figure 10). The root of each tree in the access forest lies in list L. The access forest has the following properties: (a) the u.start_time fields of the data pages in a tree are organized in a preorder fashion; (b) the usefulness interval of a page includes all the corresponding intervals of the pages in its subtree; (c) the usefulness intervals [d_i, e_i] and [d_{i+1}, e_{i+1}] of two consecutive children under the same parent page may have one of two orderings: d_i < e_i < d_{i+1} < e_{i+1} or d_i < d_{i+1} < e_i < e_{i+1}.
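
Continuing the same toy layout, the deletion path could be sketched as follows. This is a rough sketch only: acceptor-page overflow and the exact list-L and access-forest relinking are omitted, and the copy bookkeeping is simplified relative to the published algorithm.

```python
def logical_delete(hash_index, acceptor, key, t, u):
    """Set the record's end_time to t; if its page then drops below the u*B
    threshold, copy the page's remaining alive records to the acceptor page
    (a time-split).  hash_index maps key -> (page, record)."""
    page, rec = hash_index[key]
    rec.end_time = t                                  # in-place "deletion"
    if page is acceptor or alive_at(page, t) >= u * page.capacity:
        return                                        # the page is still useful at t
    for r in page.records:                            # page became nonuseful at t:
        if r.end_time == NOW:                         # copy its alive records forward
            copy = Record(r.key, r.start_time)
            acceptor.records.append(copy)
            hash_index[r.key] = (acceptor, copy)
    # The nonuseful page would now leave list L and be linked under its
    # predecessor in L, which is what builds the access forest used at query time.
```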

Finding the timeslice as of a given time t is reduced to finding the data pages that were useful at time t. This is equivalent to the set-history problem of Tsotras and Gopinath [1990] and Tsotras et al. [1995]. The acceptor page as of t is found through the multilevel index that indexes the u.start_time fields of all the data pages. That is, all data pages are at the leaves of the multilevel index (the linked list and the access forest are implemented among these leaf pages). Since time is increasing, the multilevel index is “packed” and increases only through its right side. After the acceptor data page at t is located, the remaining useful data pages at t are found by traversing the access forest. This traversal can be done very efficiently using the access forest properties [Tsotras and Kangelaris 1995].

Figure 10. A schematic view of the access forest for a given collection of usefulness intervals: (a) the usefulness interval of each data page at time 80; an open interval at time t = 80 represents a data page that is still useful at that time; (b) the access forest at time t = 79 (in this figure now corresponds to time 79). Each page is represented by a tuple <page-id, page-usefulness.period>. Page TP denotes the top of list L, while the current acceptor page is always at the end of L. (c) The access forest at time t = 80. At that time page E became nonuseful (because some record deletion reduced the number of “alive” records in E below the u·B threshold). As a result it is removed from L and placed (together with its subtree) under the previous page in the list, page D. The multilevel index is not shown.

As a result, the snapshot index solves the “*/-/point” query optimally: O(log_B n + a/B) I/Os for query time, O(n/B) space, and O(1) update processing per change (in the expected amortized sense, assuming the use of a dynamic hashing function instead of a B-tree [Dietzfelbinger et al. 1988]). The number of useful pages depends on the choice of parameter u. Larger u means faster query time (fewer accessed pages) at the cost of additional space (which remains linear in n/B): since more copies are made, the answer is packed into a smaller number of useful pages.

Migration to a WORM disk is possible for each data page that becomes nonuseful. Since the parent of a nonuseful page in the access forest may still be a useful page, an optical disk page must be reserved for the parent. Observe, however, that the snapshot index uses a “batched” migration policy that guarantees that the “reserved” space on the optical disk is limited to a controlled small fraction of the number of pages already transferred to the WORM.

Different versions of a given key can be linked together so that pure-key queries of the form “find all versions of a given key” are addressed in O(a) I/Os, where a now represents the number of such versions. Since the key space is separate from the transaction-time space, the hashing function used to access records by key can keep the latest version of a key, if any. Each key version, when updated, can be linked to its previous version; thus each record representing a key contains an extra pointer to the record’s previous version. If, instead of a hashing function, a B-tree is used to access the key space, the bound becomes O(log_B S + a), where S is the number of different keys ever created.
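
A small sketch of this version-chain idea (names and record layout are assumptions; the hash table is taken to hold the latest version of each key, if any):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Version:
    key: str
    start_time: int
    end_time: float
    prev_version: Optional["Version"] = None     # link to the previous version, if any

def all_versions(latest_by_key: Dict[str, Version], key: str) -> List[Version]:
    """Pure-key query "find all versions of key": hash to the latest version and
    follow the backward chain; on disk each hop is at most one I/O, so O(a) I/Os."""
    out, v = [], latest_by_key.get(key)
    while v is not None:
        out.append(v)
        v = v.prev_version
    return out
```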

For answering “range/-/point” queries, the snapshot index has the same problem as the other time-only methods: the whole timeslice must first be computed. In general, this is the trade-off for fast update processing.

Windows Method. Recently, Ramaswamy [1997] provided yet another solution to the “*/-/point” query, i.e., the windows method. This approach has the same performance as the snapshot index. It is a paginated version of a data structure presented in Chazelle [1986], which optimally solved the pure-timeslice query in main memory. Ramaswamy [1994] partitions the time space into contiguous “windows” and associates with each window a list of all intervals that intersect the window’s interval. Windows are indexed by a B-tree structure (similar to the multilevel index of the snapshot index).

To answer a pure-timeslice query, the appropriate window that contains this timeslice is first found, and then the window’s list of intervals is accessed. Note that the “windows” of Ramaswamy [1997] correspond to one or more consecutive pages in the access forest of Tsotras and Kangelaris [1995].

As with the snapshot index, some objects will appear in many windows (when a new window is created it gets copies of the “alive” objects from the previous one), but the space remains O(n/B). The windows method uses the B-tree to also access the objects by key; hence updating is amortized O(log_B n). If all copies of a given object are linked as proposed in the previous section, all versions of a given key can be found in O(log_B n + a) I/Os.
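
A minimal sketch of the window lookup, with the B-tree over window start times replaced by a binary search over a sorted list; names and layout are assumptions, not Ramaswamy's code.

```python
import bisect
from typing import List, Tuple

Interval = Tuple[int, float]                     # (start_time, end_time)

def window_timeslice(window_starts: List[int],
                     window_lists: List[List[Interval]], t: int) -> List[Interval]:
    """Locate the window whose time range contains t, then report the intervals
    in that window's list that are alive at t."""
    i = bisect.bisect_right(window_starts, t) - 1
    if i < 0:
        return []
    return [iv for iv in window_lists[i] if iv[0] <= t < iv[1]]
```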

5.1.3 Time-Key Methods. To answer a transaction range-timeslice query efficiently, it is best to cluster data by both transaction time and key within pages. The “logically” related data for this query are then colocated, thus minimizing the number of pages accessed. Methods in this category are based on some form of a balanced tree whose leaf pages dynamically correspond to regions of the two-dimensional transaction time-key space. While changes still occur in increasing time order, the corresponding keys on which the changes are applied are not in order. Thus, logarithmic update processing per change is needed so that data is placed according to key values in the above time-key space.

An example of a page containing a time-key range is shown in Figure 11. Here, at transaction time instant 5, a new version of the record with key b is created. At time 6, a record with key g is inserted. At time 7, a new version of the record with key c is created. At time 8, both c and f have new versions and record h is deleted. Each line segment, whose start and end times are represented by ticks, represents one record version. Each record version takes up space in the disk page.

There have been two major approaches: methods based on variants of R-trees [Stonebraker 1987; Kolovson and Stonebraker 1989; 1991] and methods based on variants of B+-trees [Easton 1986; Lomet and Salzberg 1989; Lanka and Mays 1991; Manolopoulos and Kapetanakis 1990; Becker et al. 1996; Varman and Verma 1997]. Utilizing R-tree-based methods provides a strong advantage, in that R-trees [Guttman 1984; Sellis et al. 1987; Beckmann et al. 1990; Kamel and Faloutsos 1994] can represent additional dimensions on the same index (in principle such a method could support both time dimensions on the same index). A disadvantage of the R-tree-based methods is that they cannot guarantee good worst-case update and query time performance. However, such worst cases are usually pathological (and do not happen often). In practice, R-trees have shown good average-case performance. Another characteristic of R-tree-based methods is that the end_time of a record’s interval is assumed known when the record is inserted in the method, which is not a property of transaction time.

R-Tree-Based Methods. The POSTGRES database management system [Stonebraker 1987] proposed a novel storage system in which no data is ever overwritten. Rather, updates are turned into insertions. POSTGRES timestamps are timestamps of committed transactions. Thus, the POSTGRES storage system is a transaction-time access method.

The storage manager accommodates past states of the database on a WORM optical disk (archival system), in addition to the current state that is kept on an ordinary magnetic disk. The assumption is that users will access current data more often than past data; thus the faster magnetic disk is more appropriate for recent data. As past data keeps increasing, the magnetic disk will eventually be filled.

Figure 11. Each page stores data from a time-key range.

As data becomes “old,” it migrates to the archival system by means of an asynchronous process called the vacuum cleaner. Each data record has a corresponding interval (Tmin, Tmax), where Tmin and Tmax are the commit times of the transactions that inserted and (logically) deleted this record from the database. When the vacuum cleaner operates, it transfers data whose end time is before some fixed time to the optical disk. The versions of data that reside on an optical disk page have similar end times (Tmax), but may have widely varying start times (Tmin). Thus, pages on the optical disk are as in Figure 12.
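
The vacuuming step can be pictured roughly as follows, under the assumption that records carry their (Tmin, Tmax) commit timestamps; this is an illustrative sketch, not POSTGRES code.

```python
def vacuum(magnetic_store, archive, cutoff):
    """Move every record whose interval ended before `cutoff` to the archive.
    Records are dicts with 'Tmin'/'Tmax' commit times (Tmax is None while the
    record is current); the two stores are plain lists standing in for the media."""
    keep = []
    for rec in magnetic_store:
        if rec["Tmax"] is not None and rec["Tmax"] < cutoff:
            archive.append(rec)          # written sequentially to the WORM device
        else:
            keep.append(rec)             # current or recently deleted: stays on magnetic disk
    magnetic_store[:] = keep
```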

If such a page is accessed for a query about some “early” time t, it may contribute only a single version to the answer; i.e., the answer would not be well clustered among pages.

Since data records can be accessed by queries that may involve both time and key predicates, a two-dimensional R-tree [Guttman 1984] access method has been proposed for archival data. POSTGRES assumes that this R-tree is a secondary access method. Pointers to data records are organized according to their key value in one dimension and to their intervals (lifespans) in the other dimension.

The data are written sequentially to the WORM device by the vacuuming process. It is not possible to insert new records into a data page on a WORM device that already contains data. So it is not possible to have a primary R-tree with leaves on the optical disk without changing the R-tree insertion algorithm. However, we make estimates based on a primary R-tree, in keeping with our policy of Section 4.1.

For current data, POSTGRES does not specify the indexing in use. Whatever it is, queries as of any past time before the most recent vacuum time must access both the current and the historical components of the storage structure. Current records are stored only in the current database, and their start times can be arbitrarily far back in the past.

For archival data, (secondary) indexes spanning the magnetic and optical disks are proposed (combined media or composite indexes). There are two advantages in allowing indexes to span both media: (a) improved search and insert performance as compared to indexes that are completely on the optical medium (such as the write-once balanced tree [Easton 1986] and the allocation tree [Vitter 1985]); and (b) reduced cost per bit of disk storage as compared to indexes entirely contained on magnetic disk. Two combined-media R-tree indexes are proposed in Kolovson and Stonebraker [1989]; they differ in the way index blocks are vacuumed from the magnetic to the archival medium.

In the first approach, the R-tree is rooted on the magnetic disk, and whenever its size on the magnetic disk exceeds some preallocated threshold, the vacuuming process starts moving some of the leaf pages to the archival medium. These pages refer to records that have already been moved to the optical disk. Each such record has Tmax less than some time value. For each leaf page, the maximum Tmax is recorded. The pages with the smallest maximum Tmax refer to data that was transferred longest ago. These are the pages that are transferred. Following the vacuuming of the leaf nodes, the process recursively vacuums all parent nodes that point entirely to children that have already been stored on the archive. The root node, however, is never a candidate for vacuuming.

Figure 12. A page storing data with similar end times.

The second approach (the dual R-tree) maintains two R-trees, both rooted on the magnetic disk. The first is entirely stored on the magnetic disk, while the second is stored on the archival disk, except for its root (in general, except for the upper levels). When the first tree reaches the height of the second tree, the vacuuming process vacuums all the nodes of the first tree, except its root, to the optical disk. References to the blocks below the root of the first tree are inserted in the root of the second tree. Over time, there will continue to be two R-trees, the first completely on the magnetic disk and periodically archived. Searches are performed by descending both R-trees.

In analyzing the use of the R-tree as a temporal index, we speak of records rather than pointers to records. In both approaches, a given record is kept only once; therefore the space is clearly linear in the number of changes (the number of data records in the tree is proportional to n). Since the height of the trees is O(log_B n), each record insertion needs logarithmic time. While, on average, searching an R-tree is also logarithmic, in the (pathological) worst case this search can be O(n/B), since the whole tree may have to be traversed due to the overlapping regions.

Figure 13 shows the general R-tree method, using overlapping rectangles of time-key space.

R-trees are best suited for indexing data that exhibits a high degree of natural clustering in multiple dimensions; then the index can partition the data into rectangles so as to minimize both the coverage and the overlap of the entire set of rectangles (i.e., the rectangles corresponding to leaf pages and internal nodes). Transaction-time databases, however, may consist of data whose attribute values vary independently of their transaction-time intervals, thus exhibiting only one-dimensional clustering. In addition, in an R-tree that stores temporal data, page splits cause a good deal of overlap in the search regions of the nonleaf nodes. It was observed that for data records with nonuniform interval lengths (i.e., a large proportion of “short” intervals and a small proportion of “long” intervals), the overlapping is clearly increased, affecting the query and update performance of the index.

Figure 13. An example of data bounding as used in R-tree-based methods.

Figure 14 shows how long-lived records inhibit the performance of structures that keep only one copy of each record and that keep time-key rectangles. The problem is that a long-lived record determines the length of the time range associated with the page in which it resides. Then, even if only one other key value is present, and there are many changes to the record with the other key value in that time range, overlap is required. For example, in Figure 14, the eleventh record version (shown with a dotted line) belongs to the time-key range of this page but cannot fit, since the page already holds ten record versions. It will instead be placed in a different page, whose rectangle has to overlap this one. The same example also illustrates that the number of leaf pages to be retrieved for a timeslice can be large (O(a)), since only a few records may be “alive” (contain the given time value in their interval) in any one page.

In an attempt to overcome the above problems, the segment R-tree (SR-tree) was proposed [Kolovson and Stonebraker 1991; Kolovson 1993]. The SR-tree combines properties of the R-tree and the segment tree, a binary tree data structure proposed in Bentley [1977] for storing line segments. A segment tree stores the interval endpoints in its leaf nodes; each internal node is associated with a “range” that contains all the endpoints in its subtree. An interval I is stored in the highest internal node v such that I covers v’s range and does not cover the range of v’s parent. Observe that an interval may be stored in logarithmically many internal nodes; thus the space is no longer linear [Mehlhorn 1984].

The SR-tree (Figure 15) is an R-tree in which intervals can be stored in both leaf and nonleaf nodes. An interval I is placed at the highest-level node X of the tree such that I spans at least one of the intervals represented by X’s child nodes. If I does not span X itself but spans at least one of its children and is not fully contained in X, then I is fragmented.
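
The placement rule can be sketched as follows. The node layout and the least-enlargement descent are assumptions, and the fragmentation of remnant portions is only noted in the text above, not implemented here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SRNode:
    lo: int                                        # time range represented by this node
    hi: int
    children: List["SRNode"] = field(default_factory=list)
    intervals: List[Tuple[int, int]] = field(default_factory=list)

def spans(interval: Tuple[int, int], node: SRNode) -> bool:
    """True if the interval completely covers the node's time range."""
    return interval[0] <= node.lo and node.hi <= interval[1]

def place(node: SRNode, interval: Tuple[int, int]) -> SRNode:
    """Store the interval at the highest node having at least one spanned child;
    otherwise descend into the child that needs the least enlargement."""
    if any(spans(interval, c) for c in node.children):
        node.intervals.append(interval)            # spanning portion stays this high
        return node
    if not node.children:                          # reached a leaf: store it there
        node.intervals.append(interval)
        return node
    best = min(node.children,
               key=lambda c: max(c.lo - interval[0], 0) + max(interval[1] - c.hi, 0))
    return place(best, interval)
```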

Using this idea, long intervals will be placed in higher levels of the R-tree; thus the SR-tree tends to decrease the overlapping in leaf nodes (in a regular R-tree, a long interval stored in a leaf node will “elongate” the area of this node, thus exacerbating the overlap problem). One risks having large numbers of spanning records, or fragments of spanning records, stored high up in the tree. This decreases the fan-out of the index, as there is less room for pointers to children. It is suggested to vary the size of the nodes in the tree, making higher-up nodes larger. “Varying the size” of a node means that several pages are used for one node. This adds some page accesses to the search cost.

Figure 14. The effect of long-lived records on overlapping.

Figure 15. The SR-tree.

As with the R-tree, if the record is inserted at a leaf (because it did not span anything), the boundaries of the space covered by the leaf node in which it is placed may be expanded. Expansions may be needed for all nodes on the path to the leaf that contains the new record. This may change the spanning relationships, since records may no longer span children that have been expanded. In this case, such records are reinserted in the tree, possibly being demoted to occupants of nodes they previously spanned. Splitting nodes may also cause changes in spanning relationships, as splits make children smaller; former occupants of a node may be promoted to spanning records in the parent.

As with the segment tree, the space used by the SR-tree is no longer linear. An interval may be stored in more than one nonleaf node (in the spanning and remnant portions of this interval). Due to the use of the segment-tree property, the space can be as much as O((n/B) log_B n). Inserting an interval still takes logarithmic time. However, due to possible promotions, demotions, and fragmentation, insertion is slower than in the R-tree. Even though the segment property tends to reduce the overlapping problem, the (pathological) worst-case performance for the deletion and query times remains the same as for the R-tree organization. The average-case behavior is again logarithmic.

To improve the performance of their structure, the authors also proposed the use of a skeleton SR-tree, which is an SR-tree that prepartitions the entire domain into some number of regions. This prepartitioning is based on an initial assumption about the distribution of the data and the number of intervals to be inserted. The skeleton SR-tree is then populated with data; if the data distribution changes, the structure of the skeleton SR-tree can be changed, too.

An implicit assumption made by all R-tree-based methods is that when an interval is inserted, both its Tmin and Tmax values are known. In practice, however, this is not true for “current” data. One solution is to enter all such intervals as (Tmin, now), where now is a variable representing the current time. A problem with this approach is that a “deletion” update that changes the now value of an interval to Tmax is implemented by a search for the interval, a deletion of the (Tmin, now) interval, and a reinsertion as a (Tmin, Tmax) interval. Since searches have no worst-case guarantee, this approach could be problematic. The deletion of (Tmin, now) is a physical deletion, which implies the physical deletion of all remnant portions of this interval. A better solution is to keep the current records in a separate index (probably a basic R-tree). This avoids the above deletion problem, but the worst-case performance remains as before.

The pure-key query is addressed as a special case of a range time-interval query, where the range is limited to a key and the time interval is the whole time axis. Hence, all pages that contain the key in their range are accessed. However, if this key never existed, the search may go through O(n/B) pages in the (pathological) worst case. If this key has existed, the search will definitely find its appearances, but it may also access pages that do not contain any appearances of this key.

If the SR-tree is used as a valid-time method, then physical deletion of any stored interval should be supported efficiently. As above, the problem with physical deletions emanates from keeping an interval in many remnant segments, all of which have to be found and physically deleted. Actually, the original SR-tree paper [Kolovson and Stonebraker 1991] assumes that physical deletions do not happen often.

Write-Once B-Tree. The write-once B-tree, or WOBT, proposed in Easton [1986], was originally intended for a database that resided entirely on WORMs. However, many variations of this method (the time-split B-tree [Lomet and Salzberg 1989], the persistent B-tree [Lanka and Mays 1991], the multiversion B-tree [Becker et al. 1996], and the multiversion access structure [Varman and Verma 1997]) have been proposed that may use both a WORM and a magnetic disk, or only a magnetic disk. The WOBT itself can be used either on a WORM or on a magnetic disk. The WOBT is a modification of the B+-tree, given the constraints of a WORM.

The WORM characteristics imply that once a page is written, no new data can be entered or updated in the page (since a checksum is burned into the disk). As a result, each new index entry occupies an entire page; for example, if a new index entry takes 12 bytes and a page is 1,024 bytes, 99% of the page is empty. Similarly, each new record version takes an entire page. Tree nodes are collections of pages (for example, a track on a disk). Record versions contain their transaction start times only. A new version with the same key is placed in the same node; its start time is the end time of the previous version. Nodes represent a rectangle in the transaction time-key space. The nodes partition that space: each time-key point is in exactly one node.

When a node fills up, it can be split by (current) transaction time, or split first by current transaction time and then by key. The choice depends on how many records in the node are current versions at the time of the split. The old node is left in place (there is no other choice). The record versions “alive” at the current transaction time are copied to a new node, or to two new nodes if the node is also split by key. There is space for new versions in the new nodes. Deletions of records are handled in the only possible way: a deletion record is written in a current node, and it contains the end time. When the current node is split, the deleted record is not copied. This design enables some clustering of the records in nodes by time and key (after a node split, “alive” records are stored together in a page), but most of the space on most of the optical disk pages is empty (because most new entries occupy whole pages).
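
A sketch of this split decision is shown below; the record layout, the 50% key-split fraction, and the choice of split key are assumptions, and the WORM page layout is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Rec:
    key: str
    start_time: int
    end_time: Optional[int] = None        # None: this version is still alive

@dataclass
class Node:
    capacity: int                         # B, records per node
    records: List[Rec] = field(default_factory=list)

def split_full_node(node: Node, key_split_fraction: float = 0.5) -> List[Node]:
    """Copy the versions alive at the split into one new node (time split), or
    into two new nodes partitioned by key (time-and-key split) when many
    distinct keys are still current.  The old node is left in place."""
    alive = sorted((r for r in node.records if r.end_time is None), key=lambda r: r.key)
    if len({r.key for r in alive}) <= key_split_fraction * node.capacity:
        return [Node(node.capacity, list(alive))]            # time split only
    mid = len(alive) // 2                                     # then also split by key
    return [Node(node.capacity, alive[:mid]), Node(node.capacity, alive[mid:])]
```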

When a root node splits, a new root is created. The addresses of consecutive roots and their creation times are held in a “root log” that has the form of a variable-length append-only array. This array provides access to the appropriate root of the WOBT by time.

If the WOBT is implemented on a magnetic disk, space utilization is immediately improved, as it is not necessary to use an entire page for one entry. Pages can be updated, so they can be used for nodes. Space utilization is O(n/B) and range queries are O(log_B n + a/B), if one disregards record deletion. These bounds are for using the method exactly as described in this paper, except that each node of the tree will be a page on a magnetic disk. In particular, the old node in a split is not moved. Current records are copied to a new node or to two new nodes. Since deletions are simply handled with a deletion record (which is “seen” by the method as another updated value), the search algorithm is not able to avoid searching pages that may be full of “deleted” records. So if deletions are frequent, pages that do not contribute to the answer may be accessed.

Since all the B+-tree-based transaction-time methods search data records by both transaction time and key, or by transaction time only, answering a pure-key query with the WOBT (or the time-split B-tree, persistent B-tree, and multiversion B-tree) requires that a given version (instance) of the key whose previous versions are requested also be provided by the query. That is, a transaction-time predicate should be provided in the pure-key query as, for example, in “find the previous salary history of employee A who was alive at t.”

Different versions of a given key can be linked together so that the pure-key query (with time predicate) is addressed by the WOBT in O(log_B n + a) I/Os. The logarithmic part is spent finding the instance of employee A in version t, and then its previous a versions are accessed using a linked structure. Basically, the WOBT (and the time-split B-tree, persistent B-tree, and multiversion B-tree) can have backward links in each node to the previous historical version. This does not use much space, but for records that do not change over many copies, one needs to go back many pages before getting more information. To achieve the bound above, each record needs to keep the address of the last different version of that record.

If such addresses are kept in records, the address of the last different version of each record is available at the time the data node does a time split. Then these addresses can be copied to the new node with their respective records. A record whose most recent previous version is in the node that is split must add that address. A record that is the first version with that key must have a special symbol to indicate this fact. This simple algorithm can be applied to any method that does time splits.
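
That copying step might look as follows; addresses are modeled as plain node ids and the record layout is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

FIRST = "FIRST"                            # marker: no earlier version of this key exists

@dataclass
class Ver:
    key: str
    start_time: int
    end_time: Optional[int]                # None means "alive"
    prev_address: object = FIRST           # address of the last *different* version

def time_split_with_backlinks(old_node_id, old_records: List[Ver]) -> List[Ver]:
    """Copy each alive record into the new node together with the address of its
    last different version; if that version sits in the node being split, the
    backlink points at the old node itself."""
    copies = []
    for r in old_records:
        if r.end_time is not None:                       # deleted: not copied
            continue
        has_older_here = any(s.key == r.key and s.start_time < r.start_time
                             for s in old_records)
        back = old_node_id if has_older_here else r.prev_address
        copies.append(Ver(r.key, r.start_time, r.end_time, back))
    return copies
```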

Answering the general pure-key query “find the previous salary history of employee A” requires finding out whether A was ever an employee. The WOBT needs to copy “deleted” records when a time split occurs, which implies that the WOBT state carries the latest record for all keys ever created. However, this increases space consumption. Otherwise, if “deleted” records are not copied, all pages including this key in their key space may have to be searched.

The WOBT used on a magnetic disk still makes copies of records where it does not seem necessary. The WOBT always makes a time split before making a key split. This creates one historical page and two current pages where previously there was only one current page. A B-tree split creates two current pages where there was only one; no historical pages are created. It seems like a good idea to be able to make pure key splits as well as time splits or time-and-key splits. This would make the space utilization better.

Time-Split B-Tree. The time-split B-tree (or TSB-tree) [Lomet and Salzberg 1989; 1990; 1993] is a modification of the WOBT that allows pure key splits, keeps the current data on an erasable medium such as a magnetic disk, and migrates the data to another disk (which could be magnetic or optical) when a time split is made. This partitions the data in nodes by transaction time and key (like the WOBT), but is more space efficient. It also separates the current records from most of the historical records. In addition, the TSB-tree does not keep a “root log.” Instead, it creates new roots as B+-trees do, by increasing the height of the tree when the root splits.

When a data page is full and there are fewer than some threshold number of distinct alive keys, the TSB-tree will split the page by transaction time only. This is the same as what the WOBT does, except that now times other than the current time can be chosen. For example, the split time for a data page could be the “time of last update,” after which there were only insertions of records with new keys and no updates creating new versions of already existing records. The new insertions, after the time chosen for the split, need not have copies in the historical node. Time splits in the WOBT and in the TSB-tree are illustrated in Figure 16.

Time splitting, whether by current time or by time of last update, enables an automatic migration of older versions of records to a separate historical database. This is to be contrasted with POSTGRES’ vacuuming, which is “manual” and is invoked as a separate background process that searches through the database for dead records.

It can also be contrasted with methods that reserve optical pages for pages that cannot yet be moved and maintain two addresses (a magnetic page address and an optical page address) for searching for the contents. TSB-tree migration takes place when a time split is made. The current page retains the current contents, and the historical records are written sequentially to the optical disk. The new optical disk address and the time of the split are posted to the parent in the TSB-tree. As with B+-tree node splitting, only the node to be split, the newly allocated node, and the parent are affected (also, though rarely, a full parent may require further splitting). Since the node is full and is obtaining new data, a split must be made anyway, whether or not the new node is on an optical disk. (This migration to an archive can also be used for media recovery, as illustrated in Lomet and Salzberg [1993].)

Time splitting by other than the current transaction time has another advantage. It can be used in distributed databases where the commit time of a transaction is not known until the commit message is sent from the coordinator. In such a database, an updated record of a PREPARED cohort may or may not have a commit time before the time when an overflowing page containing it must be split. Such a page can only be split by a time before the time voted by the cohort as the earliest time it may commit (see Salzberg [1994] and Lomet [1993] for details).

Full data pages with a large number of distinct keys currently “alive” are split by key only in the TSB-tree. The WOBT splits first by time and then by key. As with the WOBT, space usage for the TSB-tree is O(n/B). The constant factor in the asymptotic bound is smaller for the TSB-tree, since it makes fewer copies of records. Key splitting for the WOBT and the TSB-tree is shown in Figure 17.

An extensive average-case analysis using Markov chains, considering various rates of updates versus insertions of records with new keys, can be found in Lomet and Salzberg [1990], showing, at worst, two copies of each record, even under large update rates. The split threshold was kept at 2B/3. (If more than 2B/3 distinct keys are in the page, a pure key split is made.)
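
A sketch of this split policy, reusing the Rec layout from the WOBT sketch above; the 2B/3 figure is the one quoted in the text, while the way the split time and split key are chosen is an illustrative assumption.

```python
def choose_split(page_records, t_now, B, key_split_threshold=2/3):
    """Pure key split when more than 2B/3 distinct keys are alive; otherwise a
    time split, here taken at (an approximation of) the time of last update so
    that later insertions of new keys need not be copied to the historical node."""
    alive_keys = sorted({r.key for r in page_records if r.end_time is None})
    if len(alive_keys) > key_split_threshold * B:
        return ("key", alive_keys[len(alive_keys) // 2])      # split key value
    keys = [r.key for r in page_records]
    updates = [r.start_time for r in page_records if keys.count(r.key) > 1]
    return ("time", max(updates, default=t_now))
```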

Figure 16. Time splitting in the WOBT and TSB-trees.

There is, however, a problem with pure key splits. The decision on a key split is made on the basis of the keys alive at the time the split is made. For example, in Figure 17(b), the key split is taken at time t = 18, when there are six keys alive, separated three per new page. However, this key-range division does not guarantee that the two pages will have enough alive keys for all previous times; at time t = 15, the bottom page has only one key alive.

Suppose we have a database where most of the changes are insertions of records with a new key. As time goes by, in the TSB-tree, only key splits are made. After a while, queries of a past time will become inefficient. Every timeslice query will have to visit every node of the TSB-tree, since they are all current nodes. Queries as of now, or of a recent time, will be efficient, since every node will have many alive records. But queries as of the distant past will be inefficient, since many of the current nodes will not contain records that were “alive” at that distant past time.

In addition, as in the WOBT, the TSB-tree merely posts deletion markers and does not merge sparse nodes. If no merging of current nodes is done, and there are many record deletions, a current node may contain few current records. This could make current search slower than it should be.

Figure 17. Key splitting in the WOBT and TSB-tree.

Thus the worst-case search time for the TSB-tree can be O(n/B) for a transaction (pure or range) timeslice: pages may be accessed that contain no answers to the query. Other modifications of Easton [1986], discussed in the next section, combined with the TSB-tree modifications described above, should solve this problem. Basically, when there are too few distinct keys at any time covered by the time-key rectangle of a node to be split, the node must be split by time and then possibly by key. Node consolidation should also be supported (to deal with pages lacking alive keys due to deletions).

Index nodes in the TSB-tree are treated differently from data nodes. The children of index nodes are rectangles in time-key space. So making a time split or key split of an index node may cause a lower-level node to be referred to by two parents.

Index node splits in the TSB-tree are restricted in ways that guarantee that current nodes (the only ones where insertions and updates occur) have only one parent. This parent is a current index node. Updates need never be made in historical index nodes, which, like historical data nodes, can be placed on WORM devices.

A time split can be done at any time before the start time of the oldest current child. If time splits were allowed at current transaction times for index nodes, lower-level current nodes would have more than one parent.

A key split can be done at any current key boundary. This also assures that lower-level current nodes have only one parent. Index node splitting is illustrated in Figure 18.

Unlike the WOBT (or Lanka and Mays [1991] or Becker et al. [1996]), the TSB-tree can move the contents of a historical node to another location in a separate historical database without updating more than one parent. No node that might be split in the future has more than one parent. If a node does a time split, the new address of the historical data from the old node can be placed in its unique parent and the old address can be used for the new current data. If it does a key split, the new key range for the old page can be posted along with the new key range and address.

As with the WOBT, pure-key queries with time predicates are addressed in O(log_B n + a) I/Os, where a represents the size of the answer.

Figure 18. Index node splitting in the TSB-tree.

Persistent B-Tree. Several methods [Lanka and Mays 1991; Becker et al. 1996; Varman and Verma 1997] were derived from a method by Driscoll et al. [1989] for general main-memory-resident linked data structures. Driscoll et al. [1989] show how to take an “ephemeral data structure” (meaning that past states are erased when updates are made) and convert it to a “persistent data structure” (where past states are maintained). A “fully persistent” data structure allows updates to all past states. A “partially persistent” data structure allows updates only to the most recent state.

Consider the abstraction of a transaction-time database as the “history of an evolving set of plain objects” (Figure 1). Assume that a B+-tree is used to index the initial state of this evolving set. If this B+-tree is made partially persistent, we have constructed an access method that supports transaction range-timeslice queries (“range/-/point”). Conceptually, a range-timeslice query for transaction time t is answered by traversing the B+-tree as it was at t. Partial persistence is nicely suited to transaction time, since only the most recent state is updated. Note that the method used to index the evolving set state affects which queries are addressed. For example, to construct a pure-timeslice method, the evolving set state is represented by a hashing function that is made partially persistent. This is another way to “visualize” the approach taken by the snapshot index.

Note that a fully persistent access structure can be restricted to the partially persistent case, which is the reason for discussing Driscoll et al. [1989] and Lanka and Mays [1991] in this survey.

Lanka and Mays [1991] provide a fully persistent B+-tree. For our purposes, we are only interested in the methods presented in Lanka and Mays [1991] when reduced to partial persistence. Thus we call Lanka and Mays’ [1991] partially persistent method the persistent B-tree. The multiversion B-tree (or MVBT) of Becker et al. [1996] and the MVAS of Varman and Verma [1997] are also partially persistent B+-trees. The persistent B-tree, the MVBT, and the MVAS support node consolidation (that is, a page is consolidated with another page if it becomes sparse in alive keys due to frequent deletions). In comparison, the WOBT and the TSB-tree are partially persistent B+-trees that do not do node consolidation (since they aim at applications where data is mainly updated and infrequently deleted). Node consolidation may result in thrashing (consolidating and splitting the same page continually), which results in more space. The MVBT and the MVAS disallow thrashing, while the persistent B-tree does not.

Driscoll et al. [1989], Lanka and Mays [1991], Becker et al. [1996], and Varman and Verma [1997] speak of version numbers rather than timestamps. One important difference between version numbers for partially persistent data and timestamps is that timestamps, as we have defined them, are transaction-time instants when events (changes) are stored. So timestamps are not consecutive integers, but version numbers can be consecutive integers. This has an effect on search operations, since Driscoll et al. [1989], Lanka and Mays [1991], and Becker et al. [1996] maintain an auxiliary structure called root*, which serves the same purpose as the “root log” of the WOBT.

In Driscoll et al. [1989], root* is an array indexed on version numbers. Each array entry has a pointer to the root of the version in question. If the version numbers are consecutive integers, the search for the root is O(1). If timestamps are used, the search is O(log_B n). In Lanka and Mays [1991] and Becker et al. [1996], root* only obtains entries when a root splits. Although root* is smaller than it would be if it had an entry for each timestamp, the search within root* for the correct root is O(log_B n).
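
With timestamped entries, the root* lookup amounts to a logarithmic search over an append-only array, as in the following sketch (the entry layout is an assumption):

```python
import bisect

def find_root(root_star, t):
    """root_star: list of (timestamp, root_pointer) pairs, appended whenever a
    new root is created, in increasing timestamp order. Return the root that was
    current at time t. With consecutive version numbers this would instead be a
    direct O(1) array lookup."""
    times = [ts for ts, _ in root_star]
    i = bisect.bisect_right(times, t) - 1
    return root_star[i][1] if i >= 0 else None
```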

The use of the root* structure (array) in Lanka and Mays [1991] and Becker et al. [1996] facilitates faster update processing, as the most current version of the B+-tree is separated from most of the previous versions. The most current root can have a separate pointer, yielding O(1) access to that root. (Each root corresponds to a consecutive set of versions.) If the current version has size m, updating is O(log_B m). Methods that do not use the root* structure have O(log_B n) update processing.

Driscoll et al. [1989] explain how to make any ephemeral main-memory linked structure persistent. Two main methods are proposed: the fat node method and the node-copying method. The fat node method keeps all the variations of a node in a variable-sized “fat node.” When an ephemeral structure wants to update the contents of a node, the fat node method simply appends the new values to the old node, with a notation of the version number (timestamp) that makes the update. When an ephemeral structure wants to create a new node, a new fat node is created.

Lanka and Mays [1991] apply the fat node method of Driscoll et al. [1989] to the B+-tree. The fat nodes are collections of B+-tree pages, each corresponding to a set of versions. Versions can share B+-tree pages if the records in them are identical for each member of the sharing set. But versions with only one different data record have distinct B+-tree pages.

Pointers to lower levels of the structure are pointers to fat nodes, not to individual pages within fat nodes. When a record is updated, inserted, or deleted, a new leaf page is created. The new leaf is added to the old fat node. If the new leaf contents overflow, the new leaf is split, with the lower-value keys staying in the old fat node and the higher-value keys going to a new fat node. When a leaf splits, the parent node must be updated to search correctly for the part of the new version that is in the new fat node. When a new page is added to a fat node, the parent need not be updated.

Similarly, when index nodes obtain new values because a fat-node child has split, new pages are allocated to the fat index node. When an index node wants to split, the parent of the index node obtains a new value. When roots split, a new pointer is put in the array root*, which allows access to the correct (fat) root nodes.

Since search within a fat node means fetching all the pages in the fat node until the correct one is found (the one with the correct version number), Lanka and Mays [1991] suggest a version block: an auxiliary structure in each fat node of the persistent B-tree. The version block indicates which page or block in the fat node corresponds to which version number. Figure 19 shows the incremental creation of a version block with its fat-node pages. In Figure 20, an update causes this version block to split. The version block is envisioned as one disk page, but there is no reason it might not become much larger. It may itself have to take the form of a multiway access tree (since new entries are always added at the end of a version block). Search in one version block for one data page could itself be O(log_B n). For example, if all changes to the database were updates of one B+-tree page, the fat node would have n B+-tree pages in it.

Although search is no longer linear within the fat node, the path from the root to a leaf is at least twice as many blocks as it would be for an ephemeral structure. The height of the tree in blocks is at least twice what it would be for an ephemeral B+-tree containing the same data as one of the versions. Update processing is amortized O(log_B m), where m is the size of the current B+-tree being updated. Range-timeslice search is O(log_B n (log_B m + a/B)). After the correct root is found, the tree that was current at the time of interest is searched. This tree has height O(log_B m), and searching each version block in the path is O(log_B n). A similar bound holds for the pure-timeslice query. Space is O(n) (not O(n/B)), since new leaf blocks are created for each update.

To avoid creating new leaf blocks for each update, the “fat field method” is also proposed in Lanka and Mays [1991], where updates can fit in the free space of nonfull pages. In the general full-persistence case, each update must be marked with the version number that created it and with the version numbers of all later versions that delete it. Since we are interested in partial persistence, this corresponds to the start time and the end time of the update. Fat fields for the persistent B-tree are illustrated in Figure 21.

Figure 19. Incremental creation of a fat node in the persistent B-tree.

When a page becomes full, the version creating the overflow copies all information relevant to that version to a new node. The persistent B-tree then creates a fat node and a version block. If the newly copied node is still overflowing, a key split can be made, and then information must be posted to the parent node regarding the key split and the new version. Thus the persistent B-tree of Lanka and Mays [1991] does time splits and time-and-key splits just as in the WOBT. In this variation, space usage is O(n/B), update processing is amortized O(log_B m), and query time (for both the range and pure-timeslice queries) is O(log_B n (log_B m + a/B)). The update and query time characteristics remain asymptotically the same as in the fat-node method, since the fat-field method still uses version blocks.

If a page becomes sparse from too many deletions, a node consolidation algorithm is applied. The version making the last deletion copies all information relevant to that version to a new node. Then a sibling node also has its information relevant to that version copied to the new node. If necessary, the new node is then key-split.

Technically speaking, the possibility of thrashing by continually consolidating and splitting the same node could cause space usage to become O(n), not O(n/B). This could happen by inserting a record in a node, causing it to time-and-key split, then deleting a record from one of the new current nodes, causing a node consolidation, which creates a new full current node, and so forth. A solution for thrashing appears in Snodgrass and Ahn [1985]. Basically, the threshold for node consolidation is made lower than half the threshold for node splitting. Since this is a rather pathological scenario, we continue to assume that space usage for the fat-field variation of the persistent B-tree is O(n/B).

Figure 20. An example of a split on a fat node in the persistent B-tree.


To move historical data to another medium, observe that time splitting by current transaction time, as performed in the persistent B-tree, means that nodes cannot be moved once they are created, unless all the parents (not just the current ones) are updated with the new address of the historical data. Only the TSB-tree solves this problem, by splitting index nodes before the time of the earliest start time of their current children. Thus, in the TSB-tree, when a current node is time-split, the historical data can be moved to another disk. In the TSB-tree, current nodes have only one parent.

Fat nodes are not necessary for partial persistence. This is observed in Driscoll et al. [1989], where “node copying” for partially persistent structures is discussed.

Figure 21. The fat-field method of the persistent B-tree.

The reason fat nodes are not needed is that although alive (current) nodes have many parents, only one of them is current. So when a current node is copied or split, only its current parent has to be updated. The other parents will correctly refer to its contents as of a previous version. The fact that new items may have been added does not affect the correctness of search. Since nodes are always time-split (the most recent versions of items are copied) by current transaction time, no information is erased when a time split is made.

Both approaches of the persistent B-tree use a version block inside each fat node. If the node in question is never key-split (that is, all changes are applied to the same ephemeral B+-tree node), new version-block pages may be created for this node without updating the parent’s version block. Thus, when answering a query, all encountered version blocks have to be searched for time t. In comparison, the MVBT and the MVAS, which we discuss next, use “node copying,” and so have better asymptotic query time (O(log_B n + a/B)).

Multiversion B-Tree and Multiversion Access Structure. The multiversion B-tree of Becker et al. [1996] and the multiversion access structure of Varman and Verma [1997] provide another approach to partially persistent B+-trees. Both structures have the same asymptotic behavior, but the MVAS improves the constant of the MVBT’s space complexity. We first discuss the MVBT and then present its main differences from the MVAS.

The MVBT is similar to the WOBT; however, it efficiently supports deletions (as in Driscoll et al. [1989] and Lanka and Mays [1991]). Supporting deletions efficiently implies the use of node consolidation. In addition, the MVBT uses a form of node copying [Driscoll et al. 1989] and disallows thrashing.

As with the WOBT and the persistent B-tree, it uses a root* structure. When the root does a time split, the sibling becomes a new root. Then a new entry is placed in the variable-length array root*, pointing to the new root. If the root does a time-and-key split, the new tree has one more level. If a child of the root becomes sparse and merges with its only sibling, the newly merged node becomes the root of a new tree.

Figures 21 and 22 illustrate some of the similarities and differences between the persistent B-tree, the MVBT, and the WOBT. To better illustrate the similarities, we picture the WOBT in Figure 22 with end times and start times in each record version. In the original WOBT, the end times of records were calculated from the begin times of the next version of the record with the same key. If no such version was in the node, the end time of the record was known to be after the end time of the node.

In all three methods, if there is no node consolidation, the data nodes are exactly the same. In all three methods, when a node becomes full, a copy is made of all the records “alive” at the time the version makes the update that causes the overflow. If the number of distinct records in the copy is above some threshold, the copy is split into two nodes by key.

The persistent B-tree creates a fat node when a data node is copied. The WOBT and the MVBT do not create fat nodes. Instead, as illustrated in Figure 22, information is posted to the parent of the overflowing data node. A new index entry, or two new index entries, describing the split are created. If there is only one new data node, the key used as the lower limit for the overflowing child is copied to the new index entry. The old child pointer gets the time of the copy as its end time, and the new child pointer gets the split time as its start time. If there are two new children, they both have the same start time, but one has the key of the overflowing child and the other has the key used for the key split.

A difference between the persistent B-tree, the WOBT, and the MVBT, on one hand, and the TSB-tree, on the other, is that the TSB-tree does not have root*. When the only root in the TSB-tree does a time split, a new level is placed on the tree to contain the information about the split. When the root in the MVBT does a time-split, root* obtains a new entry. When the root in the (fat-field) persistent B-tree does a time split, that root fat node obtains a new page and a new entry in the version block. (Only when the root fat node requires a key split or a merge, so that a new root fat node is constructed, does root* obtain a new entry in the persistent B-tree.)

Another difference with the WOBT is that the MVBT and the persistent B-tree use a node consolidation algorithm. When a node is sparse, it is consolidated with a sibling by time-splitting (copying) both the sparse node and its sibling and then combining the two copies, possibly key-splitting if necessary.

In addition, the MVBT disallows thrashing (splitting and consolidating the same node continually) by suggesting that the threshold for splitting be higher than twice the threshold for consolidating. The persistent B-tree does not disallow thrashing. This is not an issue with the WOBT, since it does no node consolidation.

Search in root* for the correct root in the MVBT is O(log_B n). Although the example illustrated in Figure 22 has a small root*, there is no reason why this should always be the case. We only need to imagine a database with one data node whose records are continually updated, causing the root (which is also the data node) to continually time-split. So if the root* becomes too large, a small index has to be created above it.

Figure 22. The multiversion B-tree and the write-once B-tree. (For simplicity of comparison, both the end and start times appear in each record; this is not needed in the original WOBT.)

In the MVBT, the transaction range-timeslice search ("range/-/point") is O(log_B n + a/B), since the search for the root is itself O(log_B n). The MVBT has O(log_B m) amortized update cost (where m now denotes the size of the current B+-tree on which the update is performed), and O(n/B) space usage. Thus the MVBT provides the I/O-optimal solution to the transaction range-timeslice problem. The update cost is amortized due to the updating needed to maintain efficient access to the root* structure.
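To illustrate where the O(log_B n) root-search term comes from, here is a minimal sketch (illustrative names only, not from the papers) that treats root* as a time-ordered array of (start_time, root) entries and locates the root valid at a query time by binary search; when root* itself outgrows a page, this is the search that a small index built above root* supports.

```python
import bisect

def find_root(root_star, t):
    """root_star: list of (start_time, root_page) pairs sorted by start_time.
    Return the root whose version interval contains query time t."""
    starts = [s for s, _ in root_star]
    i = bisect.bisect_right(starts, t) - 1   # last entry starting at or before t
    if i < 0:
        return None                          # t precedes the first version
    return root_star[i][1]

# Example: three roots created at times 0, 17, and 42.
root_star = [(0, "rootA"), (17, "rootB"), (42, "rootC")]
assert find_root(root_star, 30) == "rootB"
```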

The WOBT is not optimal because deletions of records can cause pages to be sparsely populated as of some recent time. Thus, transaction range-timeslice searches may become inefficient. Insertions in the WOBT and in the MVBT are O(log_B m), since a special pointer can be kept to the root of the tree containing all current records (which are O(m)).

The MVBT uses more space than the WOBT, which in turn uses more space than the TSB-tree. In order to guard against sparse nodes and thrashing, the MVBT policies create more replication (the constant in the O(n/B) worst-case space bound of the method is about 10).

Probably the best variation of the WOBT is to use some parameters to decide whether to time-split, time-and-key split, key-split, time-split and merge, or time-split, merge, and key-split. These parameters depend on the minimum number of versions alive at any time in the interval spanned by the page. All of the policies pit disk space usage against query time. A pure key-split creates one new page. A time-and-key split creates two new pages: one new historical page and one new current page. The historical page will have copies of the current records, so more copies are made than when pure key splits are allowed. Node consolidation creates at least two new historical pages. However, once a minimum number of records is guaranteed to be alive for any given version in all pages, range-timeslice queries are O(log_B n + a/B) and space usage is O(n/B). Different splitting policies will affect the total amount of space used and the average number of copies of record versions made.

The multiversion access structure (MVAS) [Varman and Verma 1997] is similar to the MVBT, but it achieves a smaller constant on the space bound by using better policies to handle the cases where key-splits or merges are performed. There are two main differences in the MVAS policies. The first deals with the case when a node becomes sparse after performing a record deletion. Instead of always consuming a new page (as in the MVBT), the MVAS tries to find a sibling with free space where the remaining alive entries of the time-split page can be stored. The conditions under which this step is carried out are described in detail in Varman and Verma [1997]. The second difference deals with the case when the number of entries in a just time-split node is below the prespecified threshold. If a sibling page has enough alive records, the MVBT copies all the sibling's alive records to the sparse time-split page, thus "deleting" the sibling page. Instead, the MVAS copies only as many alive records as needed from the sibling page for the time-split page to avoid violating the threshold. The above two modifications reduce the extent of duplication, hence reducing the overall space. As a result, the MVAS reduces the worst-case storage bound of the MVBT by a factor of 2.

Since the WOBT, TSB-tree, persistent B-tree, MVBT, and MVAS are similar in their approach to solving range-timeslice queries, we summarize their characteristics in Table I. The issues of time-split, key-split, time-and-key split, sparse nodes, thrashing, and history migration are closely related.


The pure-key query is not addressed in the work of Driscoll et al. [1989], Lanka and Mays [1991], and Becker et al. [1996]; however, the technique that keeps with each record the address of any one copy of the most recent distinct previous version of that record can avoid going through all copies of a record. The pure-key query (with time predicate) is then addressed in O(log_B n + a/B) I/Os, just as proposed for the WOBT (where a represents the number of different versions of the given key).
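A minimal sketch of that backward-chaining technique, assuming each stored version carries a pointer to one copy of the most recent distinct earlier version of the same key; the field names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    key: str
    start: int
    salary: float
    prev: Optional["Version"] = None   # one copy of the most recent earlier version

def history_before(v: Version):
    """Walk the chain of previous versions, starting from the version of the
    key found alive at the query time by a timeslice search."""
    while v is not None:
        yield v
        v = v.prev

# Example: three versions of employee A's salary.
v1 = Version("A", start=1, salary=30000.0)
v2 = Version("A", start=5, salary=32000.0, prev=v1)
v3 = Version("A", start=9, salary=35000.0, prev=v2)
assert [x.salary for x in history_before(v3)] == [35000.0, 32000.0, 30000.0]
```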

As discussed in Section 5.1.1, Varman and Verma [1997] solve the pure-key query (with time predicate) in optimal query time O(log_B n + a/B) using C-lists. An advantage of the C-lists, despite their extra complexity in maintenance, is that they can be combined with the main MVAS method.

Exodus and Overlapping B+-Trees. The overlapping B+-tree [Manolopoulos and Kapetanakis 1990; Burton et al. 1985] and the Exodus large storage object [Richardson et al. 1986] are similar. We begin here with a B+-tree. When a new version makes an update in a leaf page, copies are made of the full path from the leaf to the root, changing references as necessary. Each new version has a separate root, and subtrees may be shared (Figure 23).

Space usage is O(n log_B n), since new pages are created for the whole path leading to each data page updated by a new version. Update processing is O(log_B m), where m is the size of the current tree being updated. Timeslice or range-timeslice query time depends on the time needed to find the correct root. If nonconsecutive transaction timestamps of events are used, it is O(log_B n + a/B).
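The path-copying idea can be illustrated on a toy in-memory search tree rather than a paginated B+-tree: an update copies only the nodes on the root-to-leaf search path, and the new root shares every untouched subtree with the previous version. The sketch below is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: int
    value: object
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(root: Optional[Node], key: int, value) -> Node:
    """Return a new root; only nodes on the search path are copied,
    every untouched subtree is shared with the old version."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)   # overwrite in the new version

v1 = insert(insert(insert(None, 20, "a"), 10, "b"), 30, "c")
v2 = insert(v1, 25, "d")           # new version, new root
assert v2.left is v1.left          # the untouched left subtree is shared
```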

Even though pure-key queries of the form "find the previous salary history of employee A who was alive at t" (i.e., with time predicate) are not discussed, they can in principle be addressed in the same way as in the other B+-tree-based methods, by linking data records together.

Multiattribute Indexes. Suppose that the transaction start time, transaction end time, and database key are used as a triplet key for a multiattribute point structure. If this structure clusters records in disk pages by closeness in several attributes, one can obtain efficient transaction pure-timeslice and range-timeslice queries using only one copy of each record.

Table I. Basic Characteristics of WOBT, TSB-Tree, Persistent B-Tree, MVBT, and MVAS

Method | time split (1) | pure key split (2) | time/key split | sparse node merge | prevent thrashing (3) | root* (4) | history migrate (5)
WOBT | yes | no | yes | no | N.A. | yes | no
TSB-Tree | yes | yes | no | no | N.A. | no | yes
Persistent B-Tree | yes | no | yes | yes | no | yes | no
MVBT/MVAS | yes | no | yes | yes | yes | yes | no

1. All methods time-split (copy) data and index nodes. The TSB-tree can time-split by other than the current time.
2. The TSB-tree does pure key splits. The other methods do time-and-key splits. Pure key splits use less total space, but risk poor performance on past-time queries.
3. Thrashing is repeated merging and splitting of the same node. Only the MVBT prevents thrashing, by its choice of splitting and merging thresholds. Prevention of thrashing is not needed when there is no merging.
4. The use of root* enables the current-tree search to be more efficient by keeping a separate pointer to its root. Past-time queries must search within root*, so they are not more efficient than in methods without root*.
5. Only the TSB-tree has only one reference to current nodes, allowing historical data to migrate.


Records with similar values of start time, end time, and key are clustered together in disk pages. Having both a similar start time and a similar end time means that long-lived records will be in the same page as other long-lived records. These records are answers to many timeslice queries. Short-lived records will only be on the same pages if their short lives are close in time. These contain many correct answers to timeslice queries with time values in the short interval that their entries span. Every timeslice query accesses some of the long-lived record pages and a small proportion of the short-lived record pages. Individual timeslice queries do not need to access most of the short-lived record pages, as they do not intersect the timeslice.

There are some subtle problems with this. Suppose a data page is split by start time. In one of the pages resulting from the split, all the record versions whose start time is before the split time are stored. This page has an upper bound on start time, implying that no new record versions can be inserted. All new record versions will have a start time after now, which is certainly after the split time. Further, if there are current records in this page, their end time will continue to rise, so the lengths of the record time spans in this page will be variable.

Some are long and others short. Queries as of the current transaction time may only retrieve a few (or no) records from a page that is limited by an upper bound on start time. This is illustrated in Figure 24. Many such pages may have to be accessed in order to answer a query, each one contributing very little to the answer (i.e., the answer is not clustered well in pages).

Also, when a new version is created, its start time is often far from the start time of its predecessor (the previous version with the same key). So consecutive versions of the same record are unlikely to be on the same page if start-time splits are used.

Now suppose we decide that splitting by start time is a bad idea and we split only by key or by end time. Splitting by end time enables migration of past data to a WORM disk. However, a query of a past transaction time may only retrieve a small number of records if the records are placed only by the requirement of having an end time before some cut-off value, just as in Figure 12.

Current pages (which were split by key) can contain versions whose lifetimes are very long and versions whose lifetimes are very short. This also makes past-time queries inefficient.

All of these subtle problems come from the fact that many records are still current and have growing lifetimes, and all new record versions have increasing start times. Perhaps if we use a point-based multiattribute index for dead versions only, efficient clustering may be possible. Here newly dead record versions can be inserted in a page with an upper limit on start time, because the start times may have been long ago.

Figure 23. The Overlapping tree/Exodus structure.

Figure 24. Storing data with similar start_times.


Items can be clustered in pages by key, nearby start times, and nearby end times. No guarantee can be made that a query as of a given time will hit a minimum number of record versions in a page, however. For example, imagine a page with record versions with very short lifetimes, all of which are close by but none of which overlap.

Although no guarantee of worst-case search time can be made, the advantages of having only one copy of each record and no overlapping of time-key space, so that backtracking is not necessary, may make this approach worthwhile, at least for "dead" versions. Space usage is thus linear (space is O(n/B), if in addition the multiattribute method can guarantee that index and data pages have good space utilization). A method for migrating current data to the WORM and organizing the current data for efficient temporal queries is needed if the multiattribute method is used for past data only.

5.1.4 Summary. The worst-case performance of the transaction-time methods is summarized in Table II. The reader should be cautious when interpreting worst-case performance; the notation sometimes penalizes a method for its performance on a pathological scenario. The footnotes indicate such cases.

5.1.5 Declustering and Bulk Loading. The query bounds presented in Table II assume that queries are processed in a uniprocessor environment. Query performance can be substantially improved if historical data is spread (declustered) across a number of disks that are then accessed in parallel. Temporal data offers an ideal declustering predicate based on time. This idea is explored in Kouramajian et al. [1994] and Muth et al. [1996]. Muth et al. [1996] present a way to decluster TSB pages. The declustering method, termed LoT, attempts to assign logically consecutive leaf (data) pages of a TSB-tree to a number of separate disks. When a new data page is created in a TSB-tree, it is allocated a disk address based on the disk addresses used by its neighboring pages. Various worst-case performance guarantees are derived. Simulation results show a large benefit over random data page allocation. Another declustering approach appears in Kouramajian et al. [1994], where time-based declustering is presented for the time-index. For details we refer to Muth et al. [1996] and Kouramajian et al. [1994].

Most transaction-time access methods take advantage of the time-ordered changes to achieve good update performance. The update processing comparison presented in Table II is in terms of a single update. Faster updating can be achieved if updates are buffered and then applied to the index in bulk. The log-structured history data access method (LHAM) [O'Neil and Weikum 1993] aims to support extremely high update rates. It partitions data into successive components based on transaction-time timestamps. To achieve fast update rates, the most recent data is clustered together by transaction start-time as it arrives. A simple (B+-tree-like) index is used to index such data quickly; however, since it is based on transaction start-time, it is not very efficient for the queries we examine here. As data gets older, it migrates (using an efficient technique termed rolling merge) to another component, where the authors propose using other, more efficient indexes (like the TSB-tree, etc.). Van den Bercken et al. [1997] recently proposed a generic algorithm to quickly create an index for a presumably large data set. This algorithm can be applied to various index structures (the authors have applied it to R-trees and to MVBT trees). Using the basic index structure, buffers are kept at each node of the index. As changes arrive they are buffered at the root and, as this buffer fills, changes are propagated in page units to buffers in lower parts of the index, until the leaves are reached. Using this approach, the total update cost for n changes becomes


Table II. Performance Characteristics of Examined Transaction-Time Methods

Access Method (section) | Total Space | Update per Change | Pure-Key Query | Pure-Timeslice Query | Range-Timeslice Query
AP-Tree (5.1.2) | O(n/B) | O(log_B n) (1) | N/A | O(n/B) | O(n/B)
ST-Tree (5.1.2) | O(n/B) | O(log_B S) (2) | O(log_B S + a) (2) | O(S log_B n) (2) | O(K log_B n) (3)
Time-Index (5.1.2) | O(n^2/B) | O(n/B) | N/A | O(log_B n + a/B) | O(log_B n + s/B) (4)
Two-Level Time Index (5.1.2) | O(n^2/B) | O(n/B) | N/A | O(R log_B n + a) (5) | O(M log_B n + a) (6)
Checkpoint Index (5.1.2) (7) | O(n/B) | O(n/B) | N/A | O(n/B) | O(n/B)
Archivable Time Index (5.1.2) (8) | O(n/B) | O(log_B n) | N/A | O(log_2 n + a/B) | O(log_2 n + s/B) (4)
Snapshot Index (5.1.2) | O(n/B) | O(1) (9) | O(a) (10) | O(log_B n + a/B) | O(log_B n + s/B) (4)
Windows Method (5.1.2) | O(n/B) | O(log_B n) | O(log_B n + a) | O(log_B n + a/B) | O(log_B n + s/B) (4)
R-Trees (5.1.3) | O(n/B) | O(log_B n) | O(n/B) (11) | O(n/B) (11) | O(n/B) (11)
SR-Tree (5.1.3) | O((n/B) log_B n) | O(log_B n) | O(n/B) (11) | O(n/B) (11) | O(n/B) (11)
WOBT (5.1.3) (12) | O(n/B) | O(log_B m) (13) | O(log_B n + a) (14) | O(log_B n + a/B) | O(log_B n + a/B)
TSB-Tree (5.1.3) | O(n/B) | O(log_B n) | O(log_B n + a) (14) | O(n/B) (15) | O(n/B) (15)
Persistent B-tree/Fat Node (5.1.3) | O(n) | O(log_B m) (13) | O(log_B n log_B m + a) (14, 16) | O(log_B n (log_B m + a/B)) (16) | O(log_B n (log_B m + a/B)) (16)
Persistent B-tree/Fat Field (5.1.3) | O(n/B) | O(log_B m) (13) | O(log_B n log_B m + a) (14, 16) | O(log_B n (log_B m + a/B)) (16) | O(log_B n (log_B m + a/B)) (16)
MVBT (5.1.3) | O(n/B) | O(log_B m) (13) | O(log_B n + a) (14) | O(log_B n + a/B) | O(log_B n + a/B)
MVAS (5.1.1 & 5.1.3) | O(n/B) | O(log_B m) (13) | O(log_B n + a/B) (14, 17) | O(log_B n + a/B) | O(log_B n + a/B)
Overlapping B-Tree (5.1.3) | O(n log_B n) | O(log_B m) (13) | O(log_B n + a) (14) | O(log_B n + a/B) | O(log_B n + a/B)

(1) This is the time needed when the end_time of a stored interval is updated; it is assumed that the start_time of the updated interval is given. If intervals can be identified by some key attribute, then a hashing function could find the updated interval in O(1) expected amortized time. In the original paper it was assumed that intervals can only be added, and in increasing start_time order; in that case the update time is O(1).
(2) S denotes the number of different keys (surrogates) ever created in the evolution.
(3) K denotes the number of keys in the query key range (which may or may not be alive at the time of interest).
(4) s denotes the size of the whole timeslice for the time of interest. No separation of the key space into regions is assumed.
(5) R is the number of predefined key regions.
(6) Assuming that a query contains a number of predefined key regions, M denotes the number of regions in the query range.
(7) The performance is under the assumption that the Checkpoint Index creates very few checkpoints and the space remains linear. The update time is O(n/B) since, when the end_time of a stored interval is updated, the interval has to be found. As with the AP-Index, if intervals can be identified by some key attribute, then a hashing function could find the updated interval in O(1). The original paper did not deal with this issue, since it was implicitly assumed that interval endpoints are known at insertion. If checkpoints are frequent, the method behaves as the Time Index.
(8) For the update it is assumed that the start_time of the updated interval is known. Otherwise, if intervals can be identified by some key, a hashing function could be used to find the start_time of the updated interval. For the range-timeslice query we assume no extra structure is used; the original paper proposes using an approach similar to the Two-Level Time Index or the ST-Tree.
(9) In the expected amortized sense, using a hashing function on the object key space. If a B-tree is used instead of hashing, the update becomes O(log_B m), where m is the size of the current state on which the update is performed.
(10) Assuming, as in (9), that a hashing function is used. If a B-tree is used, the query becomes O(log_B S + a), where S is the total number of keys ever created.
(11) This is a pathological worst case, due to the non-guaranteed search on an R-tree-based structure. In most cases the average performance would be O(log_B n + a). Note that all the R-tree-related methods assume that both interval endpoints are known at insertion time.
(12) Here we assume that the WOBT is implemented entirely on a magnetic disk, and that no (or infrequent) deletions occur, i.e., just additions and updates.
(13) In the amortized sense, where m denotes the size of the current tree being updated.
(14) For a pure-key query of the form "find the previous salaries of employee A who existed at time t."
(15) This is a pathological worst case, where only key-splits are performed. If a time-split is performed before a key-split when nodes resulting from a pure key-split would have too few records "alive" at the begin time of the node, then the query takes O(log_B n + a), also assuming infrequent deletions.
(16) m denotes the size of the ephemeral B+-tree at the time of interest.
(17) The pure-key query performance assumes the existence of the C-lists on top of the MVAS structure.


O((n/B) log_c n), where c is the number of pages available in main memory; i.e., it is more efficient than inserting updates one by one (where the total update processing is O(n log_B n)).
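A toy sketch of the buffering idea (not the actual algorithm of Van den Bercken et al. [1997], which propagates changes in page units and handles node splits): each index node keeps a buffer of pending changes, and a full buffer is flushed toward the children so that updates reach the leaves in bulk. Node layout, the capacity B, and all names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List

B = 4  # toy capacity: number of buffered changes a node holds before flushing

@dataclass
class BufNode:
    keys: List[int] = field(default_factory=list)         # separator keys
    children: List["BufNode"] = field(default_factory=list)
    buffer: List[tuple] = field(default_factory=list)      # pending (key, payload) changes
    leaf_data: List[tuple] = field(default_factory=list)   # used only at the leaves

    def insert(self, change):
        self.buffer.append(change)
        if len(self.buffer) >= B:
            self.flush()

    def flush(self):
        if not self.children:                # leaf: apply the buffered changes
            self.leaf_data.extend(self.buffer)
        else:                                # internal node: route changes to child buffers
            for key, payload in self.buffer:
                i = sum(1 for k in self.keys if key >= k)
                self.children[i].insert((key, payload))
        self.buffer.clear()

# Two leaves under one root; keys below 50 go left, the rest go right.
root = BufNode(keys=[50], children=[BufNode(), BufNode()])
for k in range(8):
    root.insert((k * 20, "upd"))
root.flush()
for child in root.children:
    child.flush()
```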

5.1.6 Temporal Hashing. External dynamic hashing has been used in traditional database systems as a fast access method for membership queries. Given a dynamic set D of objects, a membership query asks whether an object with identity z is in the most current D. While such a query can still be answered by a B+-tree, hashing is on average much faster (usually one I/O instead of the usual two to three I/Os for traversing the tree index). Note, however, that the worst-case hashing performance is linear in the size of D, while the B+-tree guarantees logarithmic access. Nevertheless, due to its practicality, hashing is a widely used access method. An interesting question is whether temporal hashing has an efficient solution. In this setting, changes to the dynamic set are timestamped and the membership query has a temporal predicate, as in "find whether the object with identity z was in the set D at time t." Kollios and Tsotras [1998] present an efficient solution to this problem by extending traditional linear hashing [Litwin 1980] to a transaction-time environment.
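To make the temporal membership query concrete, here is a deliberately naive illustration that keeps (key, start, end) membership intervals in a fixed number of hash buckets; it is not the partially persistent linear hashing of Kollios and Tsotras [1998], whose table itself evolves over transaction time, and all names are illustrative.

```python
from collections import defaultdict

NOW = float("inf")

class NaiveTemporalHash:
    """Fixed-size hash table whose buckets store (key, start, end) membership
    intervals; only meant to illustrate the query 'was z in D at time t?'."""

    def __init__(self, nbuckets=16):
        self.buckets = defaultdict(list)
        self.nbuckets = nbuckets

    def insert(self, key, t):
        self.buckets[hash(key) % self.nbuckets].append([key, t, NOW])

    def delete(self, key, t):
        for entry in self.buckets[hash(key) % self.nbuckets]:
            if entry[0] == key and entry[2] == NOW:
                entry[2] = t                     # close the membership interval

    def member_at(self, key, t):
        return any(k == key and s <= t < e
                   for k, s, e in self.buckets[hash(key) % self.nbuckets])

h = NaiveTemporalHash()
h.insert("z", 3)
h.delete("z", 8)
assert h.member_at("z", 5) and not h.member_at("z", 9)
```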

5.2 Valid-Time Methods

According to the valid-time abstraction presented in Section 2, a valid-time database should maintain a dynamic collection of interval-objects. Arge and Vitter [1996] recently presented an I/O-optimal solution for the "*/point/-" query. The solution (the external interval tree) is based on a main-memory data structure, the interval tree [Edelsbrunner 1983], which is made external (disk-resident). Valid timeslices are supported in O(l/B) space, using O(log_B l) update per change (interval addition, deletion, or modification) and O(log_B l + a/B) query time. Here l is the number of interval-objects in the database when the update or query is performed. Even though it is not clear how practical the solution is (various details are not included in the original paper), the result is very interesting. Optimally supporting valid timeslices is a rather difficult problem, because in a valid-time environment updates can dramatically change the clustering of data in pages. Deletions are now physical and insertions can happen anywhere in the valid-time domain. In contrast, in a transaction-time environment objects are inserted in increasing time order and, after their insertion, they can be "logically" deleted but are not removed from the database.

A valid timeslice query ("*/point/-") is actually a special case of a two-dimensional range query. Note that an interval contains a query point v if and only if its start_time is less than or equal to v and its end_time is greater than or equal to v. Let us map an interval I = (x1, y1) to the point (x1, y1) in two-dimensional space. Then an interval contains query v if and only if its corresponding two-dimensional point lies inside the box generated by the lines x = 0, x = v, y = v, and y = infinity (Figure 25). Since an interval's end_time is always greater than or equal to its start_time, all intervals are represented by points above the diagonal x = y. This two-dimensional mapping is used in the priority search tree [McCreight 1985], the data structure that provides the optimal main-memory solution. A number of attempts have been made to externalize this structure [Kanellakis et al. 1993; Icking et al. 1987; Blankenagel and Guting 1990].
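The mapping itself is tiny, as the brute-force check below shows; the point of the priority search tree and the external structures discussed next is to answer exactly this box query without scanning all intervals. The example data are illustrative.

```python
def contains(interval, v):
    """An interval (start, end) contains valid instant v exactly when its
    2-D image (x, y) = (start, end) lies in the box 0 <= x <= v, y >= v."""
    x, y = interval
    return x <= v <= y

intervals = [(1, 4), (2, 9), (5, 7)]
v = 4
stabbed = [i for i in intervals if contains(i, v)]
assert stabbed == [(1, 4), (2, 9)]
```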

Kanellakis et al. [1993] uses the above two-dimensional mapping to address two problems: indexing constraints and indexing classes in an I/O environment. Constraints are represented as intervals that can be added, modified, or deleted. The problem of indexing constraints is then reduced to the dynamic interval management problem, i.e., the "*/point/-" query! For solving the dynamic interval management problem, Kanellakis et al. [1993] introduces a new access method, the metablock tree, which is a B-ary access method that partitions the upper diagonal of the two-dimensional space into metablocks, each of which has B^2 data points (the structure is rather complex; for details see Kanellakis et al. [1993]). Note, however, that the metablock tree is a semidynamic structure, since it can support only interval insertions (no deletions). It uses O(l/B) space, O(log_B l + a/B) query time, and O(log_B l + (log_B l)^2/B) amortized insertion time. The insertion bound is amortized because the maintenance of a metablock's internal organization is rather complex to perform after each insertion; instead, metablock reorganizations are deferred until enough insertions have accumulated. If interval insertions are random, the expected insertion time becomes O(log_B l).

Icking et al. [1987] and Blankenagel and Guting [1990] present two other external implementations of the priority search tree. Both use optimal space (O(l/B)); Icking et al. [1987] has O(log_2 l + a/B) query time for valid timeslices, while Blankenagel and Guting [1990] has O(log_B l + a) query time.

In Ramaswamy and Subramanian [1994], a new technique called path caching is introduced for solving two-dimensional range queries. This technique is used to turn various main-memory data structures, like the interval tree or the priority search tree [McCreight 1985], into external structures. With this approach, the "*/point/-" query is addressed in O(log_B l + a/B) query time and O(log_B l) amortized update time (including insertions and deletions), but in O((l/B) log_2 log_2 B) space.

The above approaches are aimed at good worst-case bounds, but lead to rather complex structures. Another main-memory data structure that solves the "*/point/-" query is the segment tree [Bentley 1977], which, however, uses more than linear space. Blankenagel and Guting [1994] present the external segment tree (EST), which is a paginated version of the segment tree. We first describe the worst-case performance of the EST method. If the endpoints of the valid-time intervals take values from a universe of size V (i.e., there are V possible endpoint values), the EST supports "*/point/-" queries using O((l/B) log_2 V) space, O(log_2 V) update per change, and O(log_2 V + a) query time. Blankenagel and Guting [1994] also present an extended analysis of the expected behavior of the EST under the assumption of a uniformly distributed set of intervals of fixed length. It is shown that the expected behavior is much better: the average height of the EST is, for all practical purposes, small (this affects the logarithmic portion of the performance) and the answer is found by accessing an additional O(a/B) pages.

An advantage of the external segment tree is that the method can be modified to also address queries with key predicates (like the "range/point/-" query). This is performed by embedding B-trees in the EST. The original EST structure guides the search to a subset of intervals that contain the query valid time v, while an embedded B-tree allows searching this subset for whether the query key predicate is also satisfied. For details see Blankenagel and Guting [1994].

Figure 25. An interval is translated into a point in two-dimensional space. Axes x and y represent an interval's starting and ending valid times. Intervals that intersect valid instant v correspond to the points included in the shaded area.

Good average-case performance could also be achieved by using a dynamic multidimensional access method. If only multidimensional points are supported, as in the k-d-B-tree [Robinson 1984] or the hB-tree [Lomet and Salzberg 1990], mapping an (interval, key) pair to a triplet consisting of start_time, end_time, and key, as discussed above, allows the valid intervals to be represented by points in three-dimensional space.

If intervals are represented more naturally, as line segments in a two-dimensional key-time space, the cell-tree [Gunther 1989], the R-tree, or one of its variants, the R*-tree [Beckmann et al. 1990] or the R+-tree [Sellis et al. 1987], could be used. Such solutions should provide good average-case performance, but overlapping still remains a problem, especially if the interval distribution is highly nonuniform (as observed in Kolovson and Stonebraker [1991] for R-trees). If the SR-tree [Kolovson and Stonebraker 1991] is utilized for valid-time databases, overlapping is decreased, but the method may suffer if there are many interval deletions, since all remnants (segments) of a deleted interval have to be found and physically deleted.

Another possibility is to use a two-level method whose top level indexes the key attribute of the interval objects (using a B+-tree), while the second level indexes the intervals that share the same key attribute. An example of such a method is the ST-index [Gunadhi and Segev 1993]. In the ST-index there is a separate AP-tree that indexes the start_times of all valid-time intervals sharing a distinct key attribute value. The problem with this approach is that a "*/point/-" query will have to check all stored intervals to see whether they include the query valid time v.

The time-index [Elmasri et al. 1990] may also be considered for storing valid-time intervals; there are, however, two drawbacks. First, changes can arrive in any order, so leaf entries anywhere in the index may have to merge or split, thus affecting their relevant timeslices. Second, updating may be problematic, as deleting (or adding, or modifying the length of) an interval involves updating all the stored timeslices that this interval overlaps.

Nascimento et al. [1996] offer yet another approach to indexing valid-time databases, the MAP21 structure. A valid-time interval (x, y) is mapped to a point z = x * 10^s + y, where s is the maximum number of digits needed to represent any time point in the valid-time domain. This is enough to map each interval to a separate point. A regular B-tree is then used to index these points. An advantage of this approach is that interval insertions/deletions are easy using the B-tree. However, to answer a valid timeslice query about time v, the point closest to v is found in the B-tree and then a sequential search for all intervals before v is performed. At worst, many intervals that do not intersect v can be found (Nascimento et al. [1996] assume that in practice the maximal interval length is known, which limits how far back the sequential search continues from v).
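A small sketch of the MAP21 mapping and its timeslice search, with an assumed digit count S and a brute-force scan standing in for the B-tree traversal described above; the values are illustrative.

```python
S = 6                                # assumed max number of digits of a valid-time point

def map21(start, end):
    """MAP21 maps interval (start, end) to the single point start * 10**S + end,
    which a conventional B-tree can index."""
    return start * 10**S + end

def unmap(z):
    return divmod(z, 10**S)

# Timeslice query at v: scan entries (a simple filter stands in for the
# B-tree walk described above) and keep the intervals that cover v.
entries = sorted(map21(s, e) for s, e in [(1, 4), (2, 9), (5, 7)])
v = 4
answer = [unmap(z) for z in entries if unmap(z)[0] <= v <= unmap(z)[1]]
assert answer == [(1, 4), (2, 9)]
```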

Further research is needed in this area. An interesting open problem is whether an I/O-optimal solution exists for the "range/point/-" query (valid range timeslices).

5.3 Bitemporal Methods

As mentioned in Section 4.5, one way to address bitemporal queries is to fully store some of the C(t_i) collections of Figure 3, together with the changes between these collections. To enable searching through the intervals of a stored C(t_i), an access method for each stored C(t_i) is also included. These C(t_i)s (and their accompanying methods) can then be indexed by a regular B-tree on t_i, the transaction time. This is the approach taken in the M-IVTT [Nascimento et al. 1996]; the changes between stored methods are called "patches" and each stored C(t_i) is indexed by a MAP21 method [Nascimento et al. 1996].

The M-IVTT approach can be thought of as an extension of the time-index [Elmasri et al. 1990] to a bitemporal environment. Depending on how often the C(t_i)s are indexed, the space/update cost or the query time of the M-IVTT will increase. For example, the space can easily become quadratic if a C(t_i) is indexed after every constant number of changes and each change is the addition of a new interval.

In another approach, the intervals associated with a bitemporal object can be "visualized" as a bounding rectangle, which is then stored in a multidimensional index, such as the R-tree [Guttman 1984] (or one of its variants, like the SR-tree [Kolovson and Stonebraker 1991]). While this approach has the advantage of using a single index to support both time dimensions, the characteristics of transaction time create a serious overlapping problem [Kumar et al. 1998]: all bitemporal objects that have not been "deleted" (in the transaction sense) are represented with a transaction-time endpoint extending to now (Figure 4).

To avoid this overlapping, the use of two R-trees (the 2-R approach) is proposed in Kumar et al. [1998]. When a bitemporal object with valid-time interval I is added to the database at transaction time t, it is inserted into the front R-tree. This tree keeps bitemporal objects whose right transaction endpoint is unknown. If a bitemporal object is later "deleted" at some time t' (t' > t), it is physically deleted from the front R-tree and inserted into the back R-tree as a rectangle of height I and width from t to t'. The back R-tree keeps bitemporal objects with known transaction-time intervals (Figure 26, from Kumar et al. [1998]). At any given time, all bitemporal objects stored in the front R-tree share the property that they are "alive" in the transaction-time sense. The temporal information of every such object is thus represented simply by a vertical (valid-time) interval that "cuts" the transaction axis at the transaction time this object was inserted into the database. Insertions into the front R-tree arrive in increasing transaction-time order, while physical deletions can happen anywhere on the transaction axis.

A "*/point/point" query about (t_i, v_j) is then answered with two searches. The back R-tree is searched for all rectangles that contain the point (t_i, v_j). The front R-tree is searched for all vertical intervals that intersect a horizontal interval H; interval H starts from the beginning of transaction time and extends until point t_i, at height v_j (Figure 26). To support "range/range/range" queries, an additional third dimension for the key ranges is added in both R-trees.
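The query decomposition can be sketched with plain lists standing in for the two R-trees; the tuple layouts and example data are illustrative only.

```python
def query_point_point(front, back, ti, vj):
    """'*/point/point' query in the 2-R scheme. front: (insert_time, (v_lo, v_hi))
    vertical intervals of still-alive objects; back: (t_lo, t_hi, v_lo, v_hi)
    rectangles of objects whose transaction interval is closed."""
    from_back = [r for r in back
                 if r[0] <= ti <= r[1] and r[2] <= vj <= r[3]]
    # horizontal segment H: transaction time in (-inf, ti] at valid height vj
    from_front = [r for r in front
                  if r[0] <= ti and r[1][0] <= vj <= r[1][1]]
    return from_back + from_front

front = [(4, (10, 20))]                 # inserted at t=4, still alive
back  = [(1, 3, 15, 25)]                # lived from t=1 to t=3
assert query_point_point(front, back, ti=2, vj=18) == [(1, 3, 15, 25)]
assert query_point_point(front, back, ti=5, vj=12) == [(4, (10, 20))]
```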

Figure 26. In the 2-R approach, bitemporal data are divided according to whether their right transaction endpoint is known. The scenario of Figure 3 is presented here (i.e., after time t5 has elapsed). The left two-dimensional space is stored in the front R-tree, while the right one is stored in the back R-tree.

The use of two R-trees is reminiscent of the dual-root mixed media R-tree proposed in Kolovson and Stonebraker [1989], a mixed-media index that stores intervals and also consists of two R-trees. There, new intervals are stored in one R-tree and are gradually moved to the second R-tree. There are, however, the following differences: (a) in the dual-root mixed media R-tree, intervals inserted have both their endpoints known in advance (which is not a characteristic of transaction time); (b) both R-trees in Kolovson and Stonebraker [1989] store intervals with the same format; (c) the transfer of data in the dual-root mixed media R-tree is performed in a batched way: when the first R-tree reaches a threshold near its maximum allocated size, a vacuuming process completely vacuums all the nodes of the first R-tree (except its root) and inserts them into the second R-tree. In contrast, transferring a bitemporal object in the 2-R approach is performed whenever this object is deleted in the transaction-time sense. Such a deletion can happen to any currently "alive" object in the front R-tree.

Bitemporal problems can also be addressed by the partial persistence approach; this solution emanates from the abstraction of a bitemporal database as a sequence of history timeslices C(t) (Figure 3) and has two steps. First, a good ephemeral structure is chosen to represent each C(t). This structure must support dynamic addition/deletion of (valid-time) interval-objects. Second, this structure is made partially persistent. The collection of queries supported by the ephemeral structure implies what queries are answered by the bitemporal structure.

The main advantage obtained by "viewing" a bitemporal query as a partial persistence problem is that the valid-time requirements are disassociated from those of transaction time. More specifically, valid-time support is provided by the properties of the ephemeral structure, while transaction-time support is achieved by making this structure partially persistent. Conceptually, this methodology provides fast access to the C(t) of interest, on which the valid-time query is then performed.

The partial persistence methodology is also used in Lanka and Mays [1991], Becker et al. [1996], and Varman and Verma [1997] for the design of transaction-time access methods. For a transaction-time environment, the ephemeral structure must support dynamic addition/deletion of plain objects; hence a B-tree is the obvious choice. For a bitemporal environment, two access methods were proposed: the bitemporal interval tree [Kumar et al. 1995], which is created by making an interval tree [Edelsbrunner 1983] partially persistent (and well paginated), and the bitemporal R-tree [Kumar et al. 1998], created by making an R-tree partially persistent.

The bitemporal interval tree is designed for the "*/point/point" and "*/range/point" queries. Answering such queries implies that the ephemeral data structure should support point-enclosure and interval-intersection queries. In the absence of an external ephemeral method that optimally solves these problems [Kanellakis et al. 1993; Ramaswamy and Subramanian 1994], a main-memory data structure, the interval tree (which optimally solves the in-core versions of the above problems), was used and made partially persistent and well paginated. One constraint of the bitemporal interval tree is that the universe size V of the valid domain must be known in advance. The method computes "*/point/point" and "*/range/point" queries in O(log_B V + log_B n + a) I/Os. The space is O((n + V)/B); the update is amortized O(log_B (m + V)) I/Os per change. Here n denotes the total number of changes, a is the answer size, and m is the number of intervals contained in the current timeslice C(t) when the change is performed.

The bitemporal R-tree does not have the valid-universe constraint. It is a method designed for the more general "range/point/point" and "range/range/point" bitemporal queries. For that purpose, the ephemeral data structure must support range point-enclosure and range interval-intersection queries on interval-objects. Since neither a main-memory nor an external data structure exists with good worst-case performance for this problem, the R*-tree [Beckmann et al. 1990] was used, an access method with good average-case performance for these queries. As a result, the performance of the bitemporal R-tree is bounded by the performance of the ephemeral R*-tree. This is because a method created by the partial-persistence methodology behaves asymptotically as the original ephemeral structure does.

Kumar et al. [1998] contains various experiments comparing the average-case performance of the 2-R methodology, the bitemporal R-tree, and the obvious approach that stores bitemporal objects in a single R-tree (the 1-R approach, as in Figure 4). Due to the limited copying introduced by partial persistence, the bitemporal R-tree uses some extra space (about double the space used by the 1-R and 2-R methods), but it has much better update and query performance. Similarly, the 2-R approach has in general better performance than the 1-R approach.

Recently, Bliujute et al. [1998] examine how to handle valid-time now-relative data (data whose end of valid time is not fixed but tracks the current time) using extensions of R-trees.

It remains an interesting open problem to find theoretically I/O-optimal solutions even for the simplest bitemporal problems, like the "*/point/point" and "*/range/point" queries.

6. CONCLUSIONS

We presented a comparison of different indexing techniques that have been proposed for efficient access to temporal data. While we have also covered valid-time and bitemporal approaches, the bulk of this paper addresses transaction-time methods, because they are in the majority among the published approaches. Since it is practically impossible to run simulations of all methods under the same input patterns, our comparison was based on the worst-case performance of the examined methods. The items being compared include space requirements, update characteristics, and query performance. Query performance is measured against three basic transaction-time queries: the pure-key, pure-timeslice, and range-timeslice queries, or, using the three-entry notation, the "point/-/*", the "*/-/point", and the "range/-/point" queries, respectively. In addition, we addressed problems like index pagination, data clustering, and the ability of a method to efficiently migrate data to another medium (such as a WORM device). We also introduced a general lower bound for such queries. A method that achieves the lower bound for a particular query is termed I/O-optimal for that query. The worst-case performance of each transaction-time method is summarized in Table II. The reader should be cautious when interpreting worst-case performance: the notation will sometimes penalize a method for its performance on a pathological scenario; we indicate such cases. While Table II provides a good feeling for the asymptotic behavior of the examined methods, the choice of the appropriate method for a particular application also depends on the application characteristics. In addition, issues such as data clustering, index pagination, migration of data to optical disks, etc., may be more or less important, according to the application. While I/O-optimal (and practical) solutions exist for many transaction-time queries, this is not the case for the valid and bitemporal domains. An I/O-optimal solution exists for the valid-timeslice query, but it is mainly of theoretical importance; more work is needed in this area. All examined transaction-time methods support "linear" transaction time. The support of branching transaction time is another promising area of research.

ACKNOWLEDGMENTS

The idea of doing this survey was proposed to the authors by R. Snodgrass. We would like to thank the anonymous referees for many insightful comments that improved the presentation of this paper. The performance and the description of the methods are based on our understanding of the related papers, hence any error is entirely our own. The second author would also like to thank J. P. Schmidt for many helpful discussions on lower bounds in a paginated environment.

REFERENCES

AGRAWAL, R., FALOUTSOS, C., AND SWAMI, A. 1993. Efficient similarity search in sequence databases. In Proceedings of FODO.

AHN, I. AND SNODGRASS, R. 1988. Partitioned storage for temporal databases. Inf. Syst. 13, 4 (May 1988), 369–391.

ARGE, L. AND VITTER, J. 1996. Optimal dynamic interval management in external memory. In Proceedings of the 37th IEEE Symposium on Foundations of Computer Science (FOCS). IEEE Computer Society Press, Los Alamitos, CA.

BECKER, B., GSCHWIND, S., OHLER, T., SEEGER, B., AND WIDMAYER, P. 1996. An asymptotically optimal multiversion B-tree. VLDB J. 5, 4, 264–275.

BECKMANN, N., KRIEGEL, H.-P., SCHNEIDER, R., AND SEEGER, B. 1990. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD '90, Atlantic City, NJ, May 23–25, 1990), H. Garcia-Molina, Ed. ACM Press, New York, NY, 322–331.

BENTLEY, J. 1977. Algorithms for Klee's rectangle problems. Computer Science Department, Carnegie Mellon University, Pittsburgh, PA.

BEN-ZVI, J. 1982. The time relational model. Ph.D. Dissertation. University of California at Los Angeles, Los Angeles, CA.

BLANKENAGEL, G. AND GUTING, R. 1990. XP-trees, external priority search trees. Tech. Rep., Fern Universitat Hagen, Informatik-Bericht No. 92.

BLANKENAGEL, G. AND GUTING, R. 1994. External segment trees. Algorithmica 12, 6, 498–532.

BLIUJUTE, R., JENSEN, C. S., SALTENIS, S., AND SLIVINSKAS, G. 1998. R-tree based indexing of now-relative bitemporal data. In Proceedings of the Conference on Very Large Data Bases.

BÖHLEN, M. H. 1995. Temporal database system implementations. SIGMOD Rec. 24, 4 (Dec.), 53–60.

BOZKAYA, T. AND ÖZSOYOGLU, M. 1995. Indexing transaction-time databases. Tech. Rep. CES-95-19. Case Western Reserve University, Cleveland, OH.

BURTON, F., HUNTBACH, M., AND KOLLIAS, J. 1985. Multiple generation text files using overlapping tree structures. Comput. J. 28, 414–416.

CHAZELLE, B. 1986. Filtering search: A new approach to query answering. SIAM J. Comput. 15, 3, 703–724.

CHIANG, Y. AND TAMASSIA, R. 1992. Dynamic algorithms in computational geometry. Proc. IEEE 80, 9, 362–381.

DIETZFELBINGER, M., KARLIN, A., MEHLHORN, K., MEYER, F., ROHNHERT, H., AND TARJAN, R. 1988. Dynamic perfect hashing: Upper and lower bounds. In Proceedings of the 29th IEEE Conference on Foundations of Computer Science. 524–531.

DRISCOLL, J. R., SARNAK, N., SLEATOR, D. D., AND TARJAN, R. E. 1989. Making data structures persistent. J. Comput. Syst. Sci. 38, 1 (Feb. 1989), 86–124.

DYRESON, C., GRANDI, F., KÄFER, W., KLINE, N., LORENTZOS, N., MITSOPOULOS, Y., MONTANARI, A., NONEN, D., PERESSI, E., PERNICI, B., RODDICK, J. F., SARDA, N. L., SCALAS, M. R., SEGEV, A., SNODGRASS, R. T., SOO, M. D., TANSEL, A., TIBERIO, P., WIEDERHOLD, G., AND JENSEN, C. S., Eds. 1994. A consensus glossary of temporal database concepts. SIGMOD Rec. 23, 1 (Mar. 1994), 52–64.

EASTON, M. C. 1986. Key-sequence data sets on indelible storage. IBM J. Res. Dev. 30, 3 (May 1986), 230–241.

EDELSBRUNNER, H. 1983. A new approach to rectangle intersections, Parts I & II. Int. J. Comput. Math. 13, 209–229.

ELMASRI, R., KIM, Y., AND WUU, G. 1991. Efficient implementation techniques for the time index. In Proceedings of the Seventh International Conference on Data Engineering (Kobe, Japan). IEEE Computer Society Press, Los Alamitos, CA, 102–111.

ELMASRI, R., WUU, G., AND KIM, Y. 1990. The time index: An access structure for temporal data. In Proceedings of the 16th VLDB Conference on Very Large Data Bases (VLDB, Brisbane, Australia). VLDB Endowment, Berkeley, CA, 1–12.

ELMASRI, R., WUU, G., AND KOURAMAJIAN, V. 1993. The time index and the monotonic B+-tree. In Temporal Databases: Theory, Design, and Implementation, A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass, Eds. Benjamin/Cummings, Redwood City, CA, 433–456.

FALOUTSOS, C., RANGANATHAN, M., AND MANOLOPOULOS, Y. 1994. Fast subsequence matching in time-series databases. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (SIGMOD '94, Minneapolis, MN, May 24–27, 1994), R. T. Snodgrass and M. Winslett, Eds. ACM Press, New York, NY, 419–429.

GRAY, J. AND REUTER, A. 1993. Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA.

GUNADHI, H. AND SEGEV, A. 1993. Efficient indexing methods for temporal relations. IEEE Trans. Knowl. Data Eng. 5, 3 (June), 496–509.

GUNTHER, O. 1989. The design of the cell-tree: An object-oriented index structure for geometric databases. In Proceedings of the Fifth IEEE International Conference on Data Engineering (Los Angeles, CA, Feb. 1989). 598–605.

GUTTMAN, A. 1984. R-trees: A dynamic index structure for spatial searching. In Proceedings of the ACM SIGMOD Conference on Management of Data. ACM Press, New York, NY, 47–57.

HELLERSTEIN, J. M., KOUTSOUPIAS, E., AND PAPADIMITRIOU, C. H. 1997. On the analysis of indexing schemes. In Proceedings of the 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '97, Tucson, AZ, May 12–14, 1997), A. Mendelzon and Z. M. Özsoyoglu, Eds. ACM Press, New York, NY, 249–256.

ICKING, CH., KLEIN, R., AND OTTMANN, TH. 1988. Priority search trees in secondary memory (extended abstract). In Proceedings of the International Workshop WG '87 on Graph-Theoretic Concepts in Computer Science (Kloster Banz/Staffelstein, Germany, June 29–July 1, 1987), H. Göttler and H.-J. Schneider, Eds. LNCS 314. Springer-Verlag, New York, NY, 84–93.

JAGADISH, H. V., MENDELZON, A. O., AND MILO, T. 1995. Similarity-based queries. In Proceedings of the 14th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '95, San Jose, California, May 22–25, 1995), M. Yannakakis, Ed. ACM Press, New York, NY, 36–45.

JENSEN, C. S., MARK, L., AND ROUSSOPOULOS, N. 1991. Incremental implementation model for relational databases with transaction time. IEEE Trans. Knowl. Data Eng. 3, 4, 461–473.

JENSEN, C. S., MARK, L., ROUSSOPOULOS, N., AND SELLIS, T. 1992. Using differential techniques to efficiently support transaction time. VLDB J. 2, 1, 75–111.

KAMEL, I. AND FALOUTSOS, C. 1994. Hilbert R-tree: An improved R-tree using fractals. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94, Santiago, Chile, Sept.). VLDB Endowment, Berkeley, CA, 500–509.

KANELLAKIS, P. C., RAMASWAMY, S., VENGROFF, D. E., AND VITTER, J. S. 1993. Indexing for data models with constraints and classes (extended abstract). In Proceedings of the Twelfth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS, Washington, DC, May 25–28), C. Beeri, Ed. ACM Press, New York, NY, 233–243.

KOLLIOS, G. AND TSOTRAS, V. J. 1998. Hashing methods for temporal data. TimeCenter TR-24. Aalborg Univ., Aalborg, Denmark. http://www.cs.auc.dk/general/DBS/tdb/TimeCenter/publications.html

KOLOVSON, C. 1993. Indexing techniques for historical databases. In Temporal Databases: Theory, Design, and Implementation, A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass, Eds. Benjamin/Cummings, Redwood City, CA, 418–432.

KOLOVSON, C. AND STONEBRAKER, M. 1989. Indexing techniques for historical databases. In Proceedings of the Fifth IEEE International Conference on Data Engineering (Los Angeles, CA, Feb. 1989). 127–137.

KOLOVSON, C. AND STONEBRAKER, M. 1991. Segment indexes: Dynamic indexing techniques for multi-dimensional interval data. In Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data (SIGMOD '91, Denver, CO, May 29–31, 1991), J. Clifford and R. King, Eds. ACM Press, New York, NY, 138–147.

KOURAMAJIAN, V., ELMASRI, R., AND CHAUDHRY, A. 1994. Declustering techniques for parallelizing temporal access structures. In Proceedings of the 10th IEEE Conference on Data Engineering. 232–242.

KOURAMAJIAN, V., KAMEL, I., KOURAMAJIAN, V., ELMASRI, R., AND WAHEED, S. 1994. The time index+: An incremental access structure for temporal databases. In Proceedings of the 3rd International Conference on Information and Knowledge Management (CIKM '94, Gaithersburg, Maryland, Nov. 29–Dec. 2, 1994), N. R. Adam, B. K. Bhargava, and Y. Yesha, Eds. ACM Press, New York, NY, 296–303.

KUMAR, A., TSOTRAS, V. J., AND FALOUTSOS, C. 1995. Access methods for bitemporal databases. In Proceedings of the International Workshop on Recent Advances in Temporal Databases (Zurich, Switzerland, Sept.), S. Clifford and A. Tuzhilin, Eds. Springer-Verlag, New York, NY, 235–254.

KUMAR, A., TSOTRAS, V. J., AND FALOUTSOS, C. 1998. Designing access methods for bitemporal databases. IEEE Trans. Knowl. Data Eng. 10, 1 (Jan./Feb.).

LANDAU, G. M., SCHMIDT, J. P., AND TSOTRAS, V. J. 1995. On historical queries along multiple lines of time evolution. VLDB J. 4, 4, 703–726.

LANKA, S. AND MAYS, E. 1991. Fully persistent B+-trees. In Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data (SIGMOD '91, Denver, CO, May 29–31, 1991), J. Clifford and R. King, Eds. ACM Press, New York, NY, 426–435.


LEUNG, T. Y. C. AND MUNTZ, R. R. 1992. Gene-ralized data stream indexing and temporalquery processing. In Proceedings of the Sec-ond International Workshop on Research Is-sues in Data Engineering: Transactions andQuery Processing.

LEUNG, T. Y. C. AND MUNTZ, R. R. 1992. Tem-poral query processing and optimization inmultiprocessor database machines. In Pro-ceedings of the 18th International Conferenceon Very Large Data Bases (Vancouver, B.C.,Aug.). VLDB Endowment, Berkeley, CA,383–394.

LEUNG, T. Y. C. AND MUNTZ, R. R. 1993. Streamprocessing: Temporal query processing andoptimization. In Temporal Databases: The-ory, Design, and Implementation, A. Tansel,J. Clifford, S. Gadia, S. Jajodia, A. Segev, andR. Snodgrass, Eds. Benjamin/Cummings,Redwood City, CA, 329–355.

LITWIN, W. 1980. Linear hashing: A new toolfor file and table addressing. In Proceedingsof the 6th International Conference on VeryLarge Data Bases (Montreal, Ont. Canada,Oct. 1–3). ACM Press, New York, NY, 212–223.

LOMET, D. 1993. Using timestamping to optimize commit. In Proceedings of the Second International Conference on Parallel and Distributed Systems (Dec.). 48–55.

LOMET, D. AND SALZBERG, B. 1989. Access methods for multiversion data. In Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data (SIGMOD ’89, Portland, OR, June 1989), J. Clifford, B. Lindsay, and D. Maier, Eds. ACM Press, New York, NY, 315–324.

LOMET, D. AND SALZBERG, B. 1990. The performance of a multiversion access method. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD ’90, Atlantic City, NJ, May 23–25, 1990), H. Garcia-Molina, Ed. ACM Press, New York, NY, 353–363.

LOMET, D. AND SALZBERG, B. 1990. The hB-tree: A multiattribute indexing method with good guaranteed performance. ACM Trans. Database Syst. 15, 4 (Dec. 1990), 625–658.

LOMET, D. AND SALZBERG, B. 1993. Transaction-time databases. In Temporal Databases: Theory, Design, and Implementation, A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass, Eds. Benjamin/Cummings, Redwood City, CA.

LOMET, D. AND SALZBERG, B. 1993. Exploiting a history database for backup. In Proceedings of the 19th International Conference on Very Large Data Bases (VLDB ’93, Dublin, Ireland, Aug.). Morgan Kaufmann Publishers Inc., San Francisco, CA, 380–390.

LORENTZOS, N. A. AND JOHNSON, R. G. 1988. Extending relational algebra to manipulate temporal data. Inf. Syst. 13, 3 (Oct. 1988), 289–296.

LUM, V., DADAM, P., ERBE, R., GUENAUER, J., PISTOR, P., WALCH, G., WERNER, H., AND WOODFILL, J. 1984. Designing DBMS support for the temporal database. In Proceedings of the ACM SIGMOD Conference on Management of Data. ACM Press, New York, NY, 115–130.

MANOLOPOULOS, Y. AND KAPETANAKIS, G. 1990. Overlapping B+-trees for temporal data. In Proceedings of the Fifth Conference on JCIT (JCIT, Jerusalem, Oct. 22–25). 491–498.

MCCREIGHT, E. M. 1985. Priority search trees. SIAM J. Comput. 14, 2, 257–276.

MEHLHORN, K. 1984. Data Structures and Algorithms 3: Multi-dimensional Searching and Computational Geometry. EATCS Monographs on Theoretical Computer Science. Springer-Verlag, New York, NY.

MOTAKIS, I. AND ZANIOLO, C. 1997. Temporal aggregation in active database rules. In Proceedings of the International ACM Conference on Management of Data (SIGMOD ’97, May). ACM, New York, NY, 440–451.

MUTH, P., KRAISS, A., AND WEIKUM, G. 1996. LoT: A dynamic declustering of TSB-tree nodes for parallel access to temporal data. In Proceedings of the Conference on EDBT. 553–572.

NASCIMENTO, M., DUNHAM, M. H., AND ELMASRI, R. 1996. M-IVTT: A practical index for bitemporal databases. In Proceedings of the Conference on DEXA (DEXA ’96, Zurich, Switzerland).

NASCIMENTO, M., DUNHAM, M. H., AND KOURAMAJIAN, V. 1996. A multiple tree mapping-based approach for range indexing. J. Brazilian Comput. Soc. 2, 3 (Apr.).

NAVATHE, S. B. AND AHMED, R. 1989. A temporal relational model and a query language. Inf. Sci. 49, 1–3 (Oct./Nov./Dec. 1989), 147–175.

O’NEIL, P. AND WEIKUM, G. 1993. A log-structured history data access method (LHAM). In Proceedings of the Workshop on High Performance Transaction Systems (Asilomar, CA).

ÖZSOYOGLU, G. AND SNODGRASS, R. 1995. Temporal and real-time databases: A survey. IEEE Trans. Knowl. Data Eng. 7, 4 (Aug.), 513–532.

RAMASWAMY, S. 1997. Efficient indexing for constraint and temporal databases. In Proceedings of the 6th International Conference on Database Theory (ICDT ’97, Delphi, Greece, Jan. 9–10). Springer-Verlag, Berlin, Germany.

RAMASWAMY, S. AND SUBRAMANIAN, S. 1994. Path caching (extended abstract): A technique for optimal external searching. In Proceedings of the 13th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS ’94, Minneapolis, MN, May 24–26, 1994), V. Vianu, Ed. ACM Press, New York, NY, 25–35.

RICHARDSON, J., CAREY, M., DEWITT, D., AND SHEKITA, E. 1986. Object and file management in the EXODUS extensible database system. In Proceedings of the 12th International Conference on Very Large Data Bases (Kyoto, Japan, Aug.). VLDB Endowment, Berkeley, CA, 91–100.

RIVEST, R. 1976. Partial-match retrieval algorithms. SIAM J. Comput. 5, 1 (Mar.), 19–50.

ROBINSON, J. 1984. The K-D-B-tree: A search structure for large multidimensional dynamic indexes. In Proceedings of the ACM SIGMOD Conference on Management of Data. ACM Press, New York, NY, 10–18.

ROTEM, D. AND SEGEV, A. 1987. Physical organization of temporal data. In Proceedings of the Third IEEE International Conference on Data Engineering. IEEE Computer Society Press, Los Alamitos, CA, 547–553.

SALZBERG, B. 1988. File Structures: An Analytic Approach. Prentice-Hall, Inc., Upper Saddle River, NJ.

SALZBERG, B. 1994. Timestamping after commit. In Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems (PDIS, Austin, TX, Sept.). 160–167.

SALZBERG, B. AND LOMET, D. 1995. Branched and Temporal Index Structures. Tech. Rep. NU-CCS-95-17, Northeastern Univ., Boston, MA.

SEGEV, A. AND GUNADHI, H. 1989. Event-join optimization in temporal relational databases. In Proceedings of the 15th International Conference on Very Large Data Bases (VLDB ’89, Amsterdam, The Netherlands, Aug. 22–25), R. P. van de Riet, Ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, 205–215.

SELLIS, T., ROUSSOPOULOS, N., AND FALOUTSOS, C. 1987. The R+-tree: A dynamic index for multi-dimensional objects. In Proceedings of the 13th Conference on Very Large Data Bases (Brighton, England, Sept. 1987). VLDB Endowment, Berkeley, CA.

SESHADRI, P., LIVNY, M., AND RAMAKRISHNAN, R. 1996. The design and implementation of a sequence database system. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB ’96, Mumbai, India, Sept.). 99–110.

SHOSHANI, A. AND KAWAGOE, K. 1986. Temporal data management. In Proceedings of the 12th International Conference on Very Large Data Bases (Kyoto, Japan, Aug.). VLDB Endowment, Berkeley, CA, 79–88.

SNODGRASS, R. T. AND AHN, I. 1985. A taxonomy of time in databases. In Proceedings of the ACM SIGMOD Conference on Management of Data. ACM Press, New York, NY, 236–246.

SNODGRASS, R. T. AND AHN, I. 1986. Temporal databases. IEEE Comput. 19, 9 (Sept. 1986), 35–41.

STONEBRAKER, M. 1987. The design of the Postgres storage system. In Proceedings of the 13th Conference on Very Large Data Bases (Brighton, England, Sept. 1987). VLDB Endowment, Berkeley, CA, 289–300.

TSOTRAS, V. J. AND GOPINATH, B. 1990. Efficient algorithms for managing the history of evolving databases. In Proceedings of the Third International Conference on Database Theory (ICDT ’90, Paris, France, Dec.), S. Abiteboul and P. C. Kanellakis, Eds. Lecture Notes in Computer Science, vol. 470. Springer-Verlag, New York, NY, 141–174.

TSOTRAS, V. J., GOPINATH, B., AND HART, G. W. 1995. Efficient management of time-evolving databases. IEEE Trans. Knowl. Data Eng. 7, 4 (Aug.), 591–608.

TSOTRAS, V. J., JENSEN, C. S., AND SNODGRASS, R. T. 1998. An extensible notation for spatiotemporal index queries. SIGMOD Rec. 27, 1, 47–53.

TSOTRAS, V. J. AND KANGELARIS, N. 1995. The snapshot index: An I/O-optimal access method for timeslice queries. Inf. Syst. 20, 3 (May 1995), 237–260.

TSOTRAS, V. J. AND KUMAR, A. 1996. Temporal database bibliography update. SIGMOD Rec. 25, 1 (Mar.), 41–51.

VAN DEN BERCKEN, J., SEEGER, B., AND WIDMAYER, P. 1997. A generic approach to bulk loading multidimensional index structures. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB ’97, Athens, Greece, Aug.). 406–415.

VARMAN, P. AND VERMA, R. 1997. An efficient multiversion access structure. IEEE Trans. Knowl. Data Eng. 9, 3 (May/June), 391–409.

VERMA, R. AND VARMAN, P. 1994. Efficient archivable time index: A dynamic indexing scheme for temporal data. In Proceedings of the International Conference on Computer Systems and Education. 59–72.

VITTER, J. S. 1985. An efficient I/O interface for optical disks. ACM Trans. Database Syst. 10, 2 (June 1985), 129–162.

Received: December 1994; revised: May 1998; accepted: June 1998
