Concurrency Distributed Databases


  • 7/28/2019 Concurrency Distributed Databases


Concurrency Control in Distributed Database Systems
PHILIP A. BERNSTEIN AND NATHAN GOODMAN
Computer Corporation of America, Cambridge, Massachusetts 02139

In this paper we survey, consolidate, and present the state of the art in distributed database concurrency control. The heart of our analysis is a decomposition of the concurrency control problem into two major subproblems: read-write and write-write synchronization. We describe a series of synchronization techniques for solving each subproblem and show how to combine these techniques into algorithms for solving the entire concurrency control problem. Such algorithms are called "concurrency control methods." We describe 48 principal methods, including all practical algorithms that have appeared in the literature plus several new ones. We concentrate on the structure and correctness of concurrency control algorithms. Issues of performance are given only secondary treatment.

Keywords and Phrases: concurrency control, deadlock, distributed database management systems, locking, serializability, synchronization, timestamp ordering, timestamps, two-phase commit, two-phase locking

CR Categories: 4.33, 4.35

INTRODUCTION

The Concurrency Control Problem

Concurrency control is the activity of coordinating concurrent accesses to a database in a multiuser database management system (DBMS). Concurrency control permits users to access a database in a multiprogrammed fashion while preserving the illusion that each user is executing alone on a dedicated system. The main technical difficulty in attaining this goal is to prevent database updates performed by one user from interfering with database retrievals and updates performed by another. The concurrency control problem is exacerbated in a distributed DBMS (DDBMS) because (1) users may access data stored in many different computers in a distributed system, and (2) a concurrency control mechanism at one computer cannot instantaneously know about interactions at other computers.

Concurrency control has been actively investigated for the past several years, and the problem for nondistributed DBMSs is well understood. A broad mathematical theory has been developed to analyze the problem, and one approach, called two-phase locking, has been accepted as a standard solution. Current research on nondistributed concurrency control is focused on evolutionary improvements to two-phase locking, detailed performance analysis and optimization, and extensions to the mathematical theory.

Distributed concurrency control, by contrast, is in a state of extreme turbulence. More than 20 concurrency control algorithms have been proposed for DDBMSs, and several have been, or are being, implemented. These algorithms are usually complex, hard to understand, and difficult to prove correct (indeed, many are incorrect). Because they are described in different terminologies and make different assumptions about the underlying DDBMS environment, it is difficult to compare the many proposed algorithms, even in qualitative terms. Naturally each author proclaims his or her approach as best, but there is little compelling evidence to support the claims.

To survey the state of the art, we introduce a standard terminology for describing DDBMS concurrency control algorithms and a standard model for the DDBMS environment. For analysis purposes we decompose the concurrency control problem into two major subproblems, called read-write and write-write synchronization. Every concurrency control algorithm must include a subalgorithm to solve each subproblem. The first step toward understanding a concurrency control algorithm is to isolate the subalgorithm employed for each subproblem.

After studying the large number of proposed algorithms, we find that they are compositions of only a few subalgorithms. In fact, the subalgorithms used by all practical DDBMS concurrency control algorithms are variations of just two basic techniques: two-phase locking and timestamp ordering; thus the state of the art is far more coherent than a review of the literature would seem to indicate.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1981 ACM 0010-4892/81/0600-0185 $00.75. Computing Surveys, Vol. 13, No. 2, June 1981.

CONTENTS

INTRODUCTION
    The Concurrency Control Problem
    Examples of Concurrency Control Anomalies
    Comparison to Mutual Exclusion Problems
1. TRANSACTION-PROCESSING MODEL
    1.1 Preliminary Definitions and DDBMS Architecture
    1.2 Centralized Transaction-Processing Model
    1.3 Distributed Transaction-Processing Model
2. DECOMPOSITION OF THE CONCURRENCY CONTROL PROBLEM
    2.1 Serializability
    2.2 A Paradigm for Concurrency Control
3. SYNCHRONIZATION TECHNIQUES BASED ON TWO-PHASE LOCKING
    3.1 Basic 2PL Implementation
    3.2 Primary Copy 2PL
    3.3 Voting 2PL
    3.4 Centralized 2PL
    3.5 Deadlock Detection and Prevention
4. SYNCHRONIZATION TECHNIQUES BASED ON TIMESTAMP ORDERING
    4.1 Basic T/O Implementation
    4.2 The Thomas Write Rule
    4.3 Multiversion T/O
    4.4 Conservative T/O
    4.5 Timestamp Management
5. INTEGRATED CONCURRENCY CONTROL METHODS
    5.1 Pure 2PL Methods
    5.2 Pure T/O Methods
    5.3 Mixed 2PL and T/O Methods
6. CONCLUSION
APPENDIX. OTHER CONCURRENCY CONTROL METHODS
    A1. Certifiers
    A2. Thomas' Majority Consensus Algorithm
    A3. Ellis' Ring Algorithm
ACKNOWLEDGMENT
REFERENCES

Examples of Concurrency Control Anomalies

The goal of concurrency control is to prevent interference among users who are simultaneously accessing a database. Let us illustrate the problem by presenting two "canonical" examples of interuser interference. Both are examples of an on-line electronic funds transfer system accessed via remote automated teller machines (ATMs). In response to customer requests, ATMs retrieve data from a database, perform computations, and store results back into the database.

Anomaly 1: Lost Updates. Suppose two customers simultaneously try to deposit money into the same account. In the absence of concurrency control, these two activities could interfere (see Figure 1). The two ATMs handling the two customers could read the account balance at approximately the same time, compute new balances in parallel, and then store the new balances back into the database. The net effect is incorrect: although two customers deposited money, the database only reflects one activity; the other deposit is lost by the system.

[Figure 1. Lost update anomaly. Two executions each READ the account balance, add a deposit, and WRITE the result back to the database.]

Anomaly 2: Inconsistent Retrievals. Suppose two customers simultaneously execute the following transactions.

Customer 1: Move $1,000,000 from Acme Corporation's savings account to its checking account.
Customer 2: Print Acme Corporation's total balance in savings and checking.
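Both anomalies can be replayed on a toy in-memory database to pin down where the interference happens. The Python sketch below is illustrative only: the dictionary standing in for the database, the function names, and the amounts are assumptions, and each interleaving is written out step by step with no concurrency control at all, which is exactly the situation the two anomalies describe. The second function follows the Anomaly 2 interleaving that the next paragraph walks through.

```python
"""Replay the lost-update and inconsistent-retrieval interleavings.

Illustrative sketch: a dict stands in for the database, and each step of
each transaction is written out by hand in the interleaved order, with no
concurrency control of any kind.
"""


def lost_update(balance: int, deposit_1: int, deposit_2: int) -> int:
    """Anomaly 1: two ATMs deposit into the same account concurrently."""
    db = {"balance": balance}
    seen_by_atm_1 = db["balance"]  # both ATMs READ at about the same time
    seen_by_atm_2 = db["balance"]
    db["balance"] = seen_by_atm_1 + deposit_1  # ATM 1 WRITEs its result
    db["balance"] = seen_by_atm_2 + deposit_2  # ATM 2 overwrites it
    return db["balance"]


def inconsistent_retrieval(savings: int, checking: int,
                           amount: int) -> tuple[int, dict[str, int]]:
    """Anomaly 2: a transfer interleaved with a read-only total."""
    db = {"savings": savings, "checking": checking}
    db["savings"] = db["savings"] - amount    # Customer 1 debits savings
    printed = db["savings"] + db["checking"]  # Customer 2 reads both balances
    db["checking"] = db["checking"] + amount  # Customer 1 credits checking
    return printed, db


if __name__ == "__main__":
    print(lost_update(100, 50, 30))  # 130; either serial order gives 180
    total, final_db = inconsistent_retrieval(2_000_000, 500_000, 1_000_000)
    print(total, sum(final_db.values()))  # 1500000 2500000
```

In the second run the database ends with the correct total, yet the printed total is short by the transferred amount.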

In the absence of concurrency control these two transactions could interfere (see Figure 2). The first transaction might read the savings account balance, subtract $1,000,000, and store the result back in the database. Then the second transaction might read the savings and checking account balances and print the total. Then the first transaction might finish the funds transfer by reading the checking account balance, adding $1,000,000, and finally storing the result in the database. Unlike Anomaly 1, the final values placed into the database by this execution are correct. Still, the execution is incorrect because the balance printed by Customer 2 is $1,000,000 short.

[Figure 2. Inconsistent retrieval anomaly. Execution of T1: READ savings balance, subtract $1,000,000, WRITE result; READ checking balance, add $1,000,000, WRITE result. Execution of T2: READ savings balance, READ checking balance, print sum.]

These two examples do not exhaust all possible ways in which concurrent users can interfere. However, these examples are typical of the concurrency control problems that arise in DBMSs.

Comparison to Mutual Exclusion Problems

The problem of database concurrency control is similar in some respects to that of mutual exclusion in operating systems. The latter problem is concerned with coordinating access by concurrent processes to system resources such as memory, I/O devices, and CPU. Many solution techniques have been developed, including locks, semaphores, monitors, and serializers [BRIN73, DIJK71, HEWI74, HOAR74].

The concurrency control and mutual exclusion problems are similar in that both are concerned with controlling concurrent access to shared resources. However, control schemes that work for one do not necessarily work for the other, as illustrated by the following example. Suppose processes P1 and P2 require access to resources R1 and R2 at different points in their execution. In an operating system, the following interleaved execution of these processes is perfectly acceptable: P1 uses R1, P2 uses R1, P2 uses R2, P1 uses R2. In a database, however, this execution is not always acceptable. Assume, for example, that P2 transfers funds by debiting one account (R1), then crediting another (R2). If P1 checks both balances, it will see R1 after it has been debited, but see R2 before it has been credited. Other differences between concurrency control and mutual exclusion are discussed in CHAM74.

1. TRANSACTION-PROCESSING MODEL

To understand how a concurrency control algorithm operates, one must understand how the algorithm fits into an overall DDBMS. In this section we present a simple model of a DDBMS, emphasizing how the DDBMS processes user interactions. Later we explain how concurrency control algorithms operate in the context of this model.

1.1 Preliminary Definitions and DDBMS Architecture

A distributed database management system (DDBMS) is a collection of sites interconnected by a network [DEPP76, ROTH77]. Each site is a computer running one or both of the following software modules: a transaction manager (TM) or a data manager (DM). TMs supervise interactions between users and the DDBMS while DMs manage the actual database. A network is a computer-to-computer communication system. The network is assumed to be perfectly reliable: if site A sends a message to site B, site B is guaranteed to receive the message without error. In addition, we assume that between any pair of sites the network delivers messages in the order they were sent.

From a user's perspective, a database consists of a collection of logical data items, denoted X, Y, Z. We leave the granularity of logical data items unspecified; in practice, they may be files, records, etc. A logical database state is an assignment of values to the logical data items composing a database. Each logical data item may be stored at any DM in the system or redundantly at several DMs. A stored copy of a logical data item is called a stored data item. (When no confusion is possible, we use the term data item for stored data item.) The stored copies of logical data item X are denoted x1, ..., xm. We typically use x to denote an arbitrary stored data item. A stored database state is an assignment of values to the stored data items in a database.

Users interact with the DDBMS by executing transactions. Transactions may be on-line queries expressed in a self-contained query language, or application programs written in a general-purpose programming language. The concurrency control algorithms we study pay no attention to the computations performed by transactions. Instead, these algorithms make all of their decisions on the basis of the data items a transaction reads and writes, and so details of the form of transactions are unimportant in our analysis. However, we do assume that transactions represent complete and correct computations; each transaction, if executed alone on an initially consistent database, would terminate, produce correct results, and leave the database consistent.

The logical readset (correspondingly, writeset) of a transaction is the set of logical data items the transaction reads (or writes). Similarly, stored readsets and stored writesets are the stored data items that a transaction reads and writes.

The correctness of a concurrency control algorithm is defined relative to users' expectations regarding transaction execution. There are two correctness criteria: (1) users expect that each transaction submitted to the system will eventually be executed; (2) users expect the computation performed by each transaction to be the same whether it executes alone in a dedicated system or in parallel with other transactions in a multiprogrammed system. Realizing this expectation is the principal issue in concurrency control.

A DDBMS contains four components (see Figure 3): transactions, TMs, DMs, and data. Transactions communicate with TMs, TMs communicate with DMs, and DMs manage the data. (TMs do not communicate with other TMs, nor do DMs communicate with other DMs.)

[Figure 3. DDBMS system architecture.]

TMs supervise transactions. Each transaction executed in the DDBMS is supervised by a single TM, meaning that the transaction issues all of its database operations to that TM. Any distributed computation that is needed to execute the transaction is managed by the TM.

Four operations are defined at the transaction-TM interface. READ(X) returns the value of X (a logical data item) in the current logical database state. WRITE(X, new-value) creates a new logical database state in which X has the specified new value. Since transactions are assumed to represent complete computations, we use BEGIN and END operations to bracket transaction executions.

DMs manage the stored database, functioning as backend database processors. In response to commands from transactions, TMs issue commands to DMs specifying stored data items to be read or written. The details of the TM-DM interface constitute the core of our transaction-processing model and are discussed in Sections 1.2 and 1.3. Section 1.2 describes the TM-DM interaction in a centralized database environment, and Section 1.3 extends the discussion to a distributed database setting.

1.2 Centralized Transaction-Processing Model

A centralized DBMS consists of one TM and one DM executing at one site. A transaction T accesses the DBMS by issuing BEGIN, READ, WRITE, and END operations, which are processed as follows.

BEGIN: The TM initializes for T a private workspace that functions as a temporary buffer for values read from and written into the database.

READ(X): The TM looks for a copy of X in T's private workspace. If the copy exists, its value is returned to T. Otherwise the TM issues dm-read(x) to the DM to retrieve a copy of X from the database, gives the retrieved value to T, and puts it into T's private workspace.

WRITE(X, new-value): The TM again checks the private workspace for a copy of X. If it finds one, the value is updated to new-value; otherwise a copy of X with the new value is created in the workspace. The new value of X is not stored in the database at this time.

END: The TM issues dm-write(x) for each logical data item X updated by T. Each dm-write(x) requests that the DM update the value of X in the stored database to the value of X in T's local workspace. When all dm-writes are processed, T is finished executing, and its private workspace is discarded.
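The four centralized operations can be sketched in Python as follows. This is an illustrative model under stated assumptions rather than a DBMS component: `DataManager`, `TransactionManager`, and their method names are hypothetical, data item values are plain integers, and one Python dict per transaction plays the role of the private workspace. Reads are buffered in the workspace, writes touch only the workspace, and only END issues dm-writes.

```python
class DataManager:
    """The single DM of the centralized model; it owns the stored database."""

    def __init__(self, stored: dict[str, int]) -> None:
        self.stored = dict(stored)
        self.dm_reads = 0  # counts dm-read calls, to show workspace buffering

    def dm_read(self, item: str) -> int:
        self.dm_reads += 1
        return self.stored[item]

    def dm_write(self, item: str, value: int) -> None:
        self.stored[item] = value


class TransactionManager:
    """Processes BEGIN, READ, WRITE, and END using private workspaces."""

    def __init__(self, dm: DataManager) -> None:
        self.dm = dm
        self.workspace: dict[str, dict[str, int]] = {}  # per-transaction buffer
        self.updated: dict[str, set[str]] = {}          # items each T has written

    def begin(self, t: str) -> None:
        self.workspace[t] = {}
        self.updated[t] = set()

    def read(self, t: str, x: str) -> int:
        ws = self.workspace[t]
        if x not in ws:  # not yet buffered: fetch it with a dm-read
            ws[x] = self.dm.dm_read(x)
        return ws[x]

    def write(self, t: str, x: str, new_value: int) -> None:
        # Update (or create) the copy in the workspace only; the stored
        # database is untouched until END.
        self.workspace[t][x] = new_value
        self.updated[t].add(x)

    def end(self, t: str) -> None:
        ws = self.workspace.pop(t)  # the workspace is discarded afterwards
        for x in sorted(self.updated.pop(t)):  # one dm-write per updated item
            self.dm.dm_write(x, ws[x])


if __name__ == "__main__":
    dm = DataManager({"X": 1, "Y": 2})
    tm = TransactionManager(dm)
    tm.begin("T")
    tm.write("T", "Y", tm.read("T", "X") + 10)
    print(dm.stored)  # {'X': 1, 'Y': 2}: nothing installed before END
    tm.end("T")
    print(dm.stored)  # {'X': 1, 'Y': 11}
```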

The DBMS may restart T any time before a dm-write has been processed. The effect of restarting T is to obliterate its private workspace and to reexecute T from the beginning. As we will see, many concurrency control algorithms use transaction restarts as a tactic for attaining correct executions. However, once a single dm-write has been processed, T cannot be restarted; each dm-write permanently installs an update into the database, and we cannot permit the database to reflect partial effects of transactions.
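The restart rule amounts to a simple guard. In this minimal Python sketch, `TransactionState`, `RestartNotAllowed`, and the counter of processed dm-writes are hypothetical names chosen for illustration; the model itself only says that a restart obliterates the private workspace and is legal until the first dm-write has been processed.

```python
class RestartNotAllowed(Exception):
    """Raised when a restart would expose partial effects of T."""


class TransactionState:
    """Tracks T's private workspace and whether T may still be restarted."""

    def __init__(self) -> None:
        self.workspace: dict[str, int] = {}
        self.dm_writes_processed = 0
        self.restarts = 0

    def record_dm_write(self) -> None:
        # Each dm-write permanently installs an update in the database.
        self.dm_writes_processed += 1

    def restart(self) -> None:
        if self.dm_writes_processed > 0:
            raise RestartNotAllowed("a dm-write of T has already been processed")
        # Obliterate the workspace; the caller then re-executes T from the start.
        self.workspace.clear()
        self.restarts += 1


if __name__ == "__main__":
    t = TransactionState()
    t.workspace["X"] = 5
    t.restart()  # legal: no dm-write has been processed yet
    t.record_dm_write()
    try:
        t.restart()
    except RestartNotAllowed as err:
        print("refused:", err)
```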

A DBMS can avoid such partial results by having the property of atomic commitment, which requires that either all of a transaction's dm-writes are processed or none are. The "standard" implementation of atomic commitment is a procedure called two-phase commit [LAMP76, GRAY78].¹ Suppose T is updating data items X and Y. When T issues its END, the first phase of two-phase commit begins, during which the TM issues prewrite commands for X and Y. These commands instruct the DM to copy the values of X and Y from T's private workspace onto secure storage. If the DBMS fails during the first phase, no harm is done, since none of T's updates have yet been applied to the stored database. During the second phase, the TM issues dm-write commands for X and Y, which instruct the DM to copy the values of X and Y into the stored database. If the DBMS fails during the second phase, the database may contain incorrect information, but since the values of X and Y are stored on secure storage, this inconsistency can be rectified when the system recovers: the recovery procedure reads the values of X and Y from secure storage and resumes the commitment activity.

We emphasize that this is a mathematical model of transaction processing, an approximation to the way DBMSs actually function. While the implementation details of atomic commitment are important in designing a DBMS, they are not central to an understanding of concurrency control. To explain concurrency control algorithms we need a model of transaction execution in which atomic commitment is visible, but not dominant.

1.3 Distributed Transaction-Processing Model

Our model of transaction processing in a distributed environment differs from that in a centralized one in two areas: handling private workspaces and implementing two-phase commit.
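The centralized two-phase commit procedure of Section 1.2, on which the distributed version builds, can be sketched as below. This is a toy single-DM simulation under explicit assumptions: the `secure` dict stands in for secure storage, the `committing` flag is a hypothetical stand-in for however a real system durably records that the second phase has begun, and `Crash` simply simulates a failure at a chosen point. It shows why a failure in the first phase is harmless and why a failure in the second phase can be repaired from secure storage.

```python
from dataclasses import dataclass, field
from typing import Optional


class Crash(Exception):
    """Simulates the DBMS failing at a chosen point."""


@dataclass
class DataManager:
    stored: dict[str, int]                                # the stored database
    secure: dict[str, int] = field(default_factory=dict)  # secure storage
    committing: bool = False  # assumption: start of phase 2 is durably recorded

    def prewrite(self, item: str, value: int) -> None:
        self.secure[item] = value  # phase 1: copy onto secure storage only

    def dm_write(self, item: str) -> None:
        self.stored[item] = self.secure[item]  # phase 2: install the update

    def recover(self) -> None:
        if self.committing:
            # Failed during phase 2: resume the commitment from secure storage.
            for item in self.secure:
                self.dm_write(item)
        # Failed during phase 1: nothing was installed, so just discard.
        self.secure.clear()
        self.committing = False


def two_phase_commit(dm: DataManager, workspace: dict[str, int],
                     crash_in_phase_1: bool = False,
                     crash_after_writes: Optional[int] = None) -> None:
    for item, value in workspace.items():  # phase 1: prewrites
        dm.prewrite(item, value)
        if crash_in_phase_1:
            raise Crash("failure during phase 1")
    dm.committing = True
    for n, item in enumerate(workspace):  # phase 2: dm-writes
        if n == crash_after_writes:
            raise Crash(f"failure after {n} dm-writes")
        dm.dm_write(item)
    dm.secure.clear()
    dm.committing = False


if __name__ == "__main__":
    dm = DataManager(stored={"X": 1, "Y": 2})
    try:
        two_phase_commit(dm, {"X": 10, "Y": 20}, crash_after_writes=1)
    except Crash:
        print("after crash:", dm.stored)  # X installed, Y not yet
    dm.recover()
    print("after recovery:", dm.stored)
```

The same two loops apply when prewrites and dm-writes go to several DMs; what a DDBMS adds is a way to finish a partially completed second phase when the TM itself fails.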

¹ The term "two-phase commit" is commonly used to denote the distributed version of this procedure. However, since the centralized and distributed versions are identical in structure, we use "two-phase commit" to describe both.


In a centralized DBMS we assumed that (1) private workspaces were part of the TM, and (2) data could freely move between a transaction and its workspace, and between a workspace and the DM. These assumptions are not appropriate in a DDBMS because TMs and DMs may run at different sites and the movement of data between a TM and a DM can be expensive. To reduce this cost, many DDBMSs employ query optimization procedures which regulate (and, it is hoped, reduce) the flow of data between sites. For example, in SDD-1 the private workspace for transaction T is distributed across all sites at which T accesses data [BERN81]. The details of how T reads and writes data in these workspaces are a query optimization problem and have no direct effect on concurrency control.

The problem of atomic commitment is aggravated in a DDBMS by the possibility of one site failing while the rest of the system continues to operate. Suppose T is updating x, y, z stored at DMx, DMy, DMz, and suppose T's TM fails after issuing dm-write(x), but before issuing the dm-writes for y and z. At this point the database is incorrect. In a centralized DBMS this phenomenon is not harmful because no transaction can access the database until the TM recovers from the failure. However, in a DDBMS, other TMs remain operational and can access the incorrect database.

To avoid this problem, prewrite commands must be modified slightly. In addition to specifying data items to be copied onto secure storage, prewrites also specify which other DMs are involved in the commitment activity. Then if the TM fails during the second phase of two-phase commit, the DMs whose dm-writes were not issued can recognize the situation and consult the other DMs involved in the commitment. If any DM received a dm-write, the remaining ones act as if they had also received the command. The details of this procedure are complex and appear in HAMM80.

As in a centralized DBMS, a transaction T accesses the system by issuing BEGIN, READ, WRITE, and END operations. In a DDBMS these are processed as follows.

BEGIN: The TM creates a private workspace for T. We leave the location and organization of this workspace unspecified.

READ(X): The TM checks T's private workspace to see if a copy of X is present. If so, that copy's value is made available to T. Otherwise the TM selects some stored copy of X, say xi, and issues dm-read(xi) to the DM at which xi is stored. The DM responds by retrieving the stored value of xi from the database, placing it in the private workspace. The TM returns this value to T.

WRITE(X, new-value): The value of X in T's private workspace is updated to new-value, assuming the workspace contains a copy of X. Otherwise, a copy of X with the new value is created in the workspace.

END: Two-phase commit begins. For each X updated by T, and for each stored copy xi of X, the TM issues a prewrite(xi) to the DM that stores xi. The DM responds by copying the value of X from T's private workspace onto secure storage internal to the DM. After all prewrites are processed, the TM issues dm-writes for all copies of all logical data items updated by T. A DM responds to dm-write(xi) by copying the value of xi from secure storage into the stored database. After all dm-writes are installed, T's execution is finished.

2. DECOMPOSITION OF THE CONCURRENCY CONTROL PROBLEM

In this section we review concurrency control theory with two objectives: to define "correct executions" in precise terms, and to decompose the concurrency control problem into more tractable subproblems.

2.1 Serializability

Let E denote an execution of transactions T1, ..., Tn. E is a serial execution if no transactions execute concurrently in E; that is, each transaction is executed to completion before the next one begins. Every serial execution is defined to be correct, because the properties of transactions (see Section 1.1) imply that a serial execution terminates properly and preserves database consistency. An execution is serializable if it is computationally equivalent to a serial execution, that is, if it produces the same output and has the same effect on the database as some serial execution. Since serial executions are correct and every serializable execution is equivalent to a serial one, every serializable execution is also correct. The goal of database concurrency control is to ensure that all executions are serializable.

The only operations that access the stored database are dm-read and dm-write. Hence it is sufficient to model an execution of transactions by the execution of dm-reads and dm-writes at the various DMs of the DDBMS. In this spirit we formally model an execution of transactions by a set of logs, each of which indicates the order in which dm-reads and dm-writes are processed at one DM (see Figure 4). An execution is serial if there is a total order of transactions such that if Ti precedes Tj in the total order, then all of Ti's operations precede all of Tj's operations in every log where both appear (see Figure 5). Intuitively, this says that transactions execute serially and in the same order at all DMs.

Transactions:
    T1: BEGIN; READ(X); WRITE(Y); END
    T2: BEGIN; READ(Y); WRITE(Z); END
    T3: BEGIN; READ(Z); WRITE(X); END

One possible execution of T1, T2, and T3 is represented by the following logs. (Note: ri[x] denotes the operation dm-read(x) issued by Ti; wi[x] denotes a dm-write(x) issued by Ti.)

    Log for DM A: r1[x1] w1[y1] r2[y1] w3[x1]
    Log for DM B: w1[y2] w2[z2]
    Log for DM C: w2[z3] r3[z3]

Figure 4. Modeling executions as logs.

The execution modeled in Figure 4 is serial. Each log is itself serial; that is, there is no interleaving of operations from different transactions. At DM A, T1 precedes T2 precedes T3; at DM B, T1 precedes T2; and at DM C, T2 precedes T3. Therefore, T1, T2, T3 is a total order satisfying the definition of serial.

The following execution is not serial. The logs themselves are not serial.
    DM A: r1[x1] r2[y1] w3[x1] w1[y1]
    DM B: w2[z2] w1[y2]
    DM C: w2[z3] r3[z3]

The following execution is also not serial. Although each log is serial, there is no total order consistent with all logs.
    DM A: r1[x1] w1[y1] r2[y1] w3[x1]
    DM B: w2[z2] w1[y2]
    DM C: w2[z3] r3[z3]

Figure 5. Serial and nonserial logs.

Two operations conflict if they operate on the same data item and one of the operations is a dm-write. The order in which operations execute is computationally significant if and only if the operations conflict. To illustrate the notion of conflict, consider a data item x and transactions Ti and Tj. If Ti issues dm-read(x) and Tj issues dm-write(x), the value read by Ti will (in general) differ depending on whether the dm-read precedes or follows the dm-write. Similarly, if both transactions issue dm-write(x) operations, the final value of x depends on which dm-write happens last. Those conflict situations are called read-write (rw) conflicts and write-write (ww) conflicts, respectively.

The notion of conflict helps characterize the equivalence of executions. Two executions are computationally equivalent if (1) each dm-read operation reads data item values that were produced by the same dm-writes in both executions; and (2) the final dm-write on each data item is the same in both executions [PAPA77, PAPA79]. Condition (1) ensures that each transaction reads the same input in both executions (and therefore performs the same computation). Combined with (2), it ensures that both executions leave the database in the same final state.

From this we can characterize serializable executions precisely.

Theorem 1 [PAPA77, PAPA79, STEA76]
Let $T = \{T_1, \ldots, T_n\}$ be a set of transactions and let E be an execution of these transactions modeled by logs $\{L_1, \ldots, L_m\}$. E is serializable if there exists a total ordering of T such that for each pair of conflicting operations $O_i$ and $O_j$ from distinct transactions $T_i$ and $T_j$ (respectively), $O_i$ precedes $O_j$ in any log $L_1, \ldots, L_m$ if and only if $T_i$ precedes $T_j$ in the total ordering.
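The condition of Theorem 1 can be checked mechanically for a proposed total order. The Python sketch below is an illustration, not an algorithm from the survey: it parses logs written in the ri[x]/wi[x] notation of Figure 4 (the regular expression and the helper names are assumptions of this sketch) and tests whether every pair of conflicting operations in each log appears in the same relative order as their transactions do in the proposed ordering. Only pairs within a single log need checking, because conflicting operations touch the same stored data item and therefore run at the same DM.

```python
import re
from itertools import combinations

Op = tuple[str, str, str]  # (transaction, "r" or "w", stored data item)


def parse_log(text: str) -> list[Op]:
    """Turn 'r1[x1] w1[y1]' into [('T1', 'r', 'x1'), ('T1', 'w', 'y1')]."""
    return [(f"T{t}", kind, item)
            for kind, t, item in re.findall(r"([rw])(\d+)\[(\w+)\]", text)]


def conflict(a: Op, b: Op) -> bool:
    """Same data item, distinct transactions, and at least one dm-write."""
    return a[2] == b[2] and a[0] != b[0] and "w" in (a[1], b[1])


def satisfies_theorem_1(logs: list[list[Op]], order: list[str]) -> bool:
    position = {t: i for i, t in enumerate(order)}
    for log in logs:
        # combinations() yields pairs in log order: `first` precedes `second`.
        for first, second in combinations(log, 2):
            if conflict(first, second) and position[first[0]] > position[second[0]]:
                return False
    return True


FIGURE_4 = [parse_log("r1[x1] w1[y1] r2[y1] w3[x1]"),  # DM A
            parse_log("w1[y2] w2[z2]"),                # DM B
            parse_log("w2[z3] r3[z3]")]                # DM C

if __name__ == "__main__":
    print(satisfies_theorem_1(FIGURE_4, ["T1", "T2", "T3"]))  # True
    print(satisfies_theorem_1(FIGURE_4, ["T3", "T2", "T1"]))  # False
```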

    The total order hypothesized in Theorem 1 is called a serialization order. If the transactions had executed serially in the serialization order, the computation performed by the transactions would have been identical to the computation represented by E.

    To attain serializability, the DDBMS must guarantee that all executions satisfy the condition of Theorem 1, namely, that conflicting dm-reads and dm-writes be processed in certain relative orders. Concurrency control is the activity of controlling the relative order of conflicting operations; an algorithm to perform such control is called a synchronization technique. To be correct, a DDBMS must incorporate synchronization techniques that guarantee the conditions of Theorem 1.
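The claim that E performs the same computation as a serial execution in the serialization order can be tested directly against conditions (1) and (2) of computational equivalence. This is a sketch under a hypothetical log model of (transaction, kind, item) tuples, further assuming that each physical copy is stored in exactly one log and that a transaction reads or writes a given copy at most once; `summary`, `serial_logs`, and `equivalent` are illustrative names.

```python
Op = tuple[str, str, str]  # (transaction, kind, item); kind "r" or "w"


def summary(logs: list[list[Op]]):
    """Condition (1): which transaction's dm-write each dm-read reads (None = initial value).
    Condition (2): the transaction whose dm-write is final on each data item."""
    reads_from = set()
    final_writer = {}
    for log in logs:
        last_writer = {}  # item -> transaction of the latest dm-write so far in this log
        for txn, kind, item in log:
            if kind == "r":
                reads_from.add((txn, item, last_writer.get(item)))
            else:
                last_writer[item] = txn
        final_writer.update(last_writer)
    return reads_from, final_writer


def serial_logs(logs: list[list[Op]], order: list[str]) -> list[list[Op]]:
    """Rewrite each log so that transactions execute one at a time in `order`."""
    pos = {txn: i for i, txn in enumerate(order)}
    # sorted() is stable, so each transaction keeps its own operation order.
    return [sorted(log, key=lambda op: pos[op[0]]) for log in logs]


def equivalent(e1: list[list[Op]], e2: list[list[Op]]) -> bool:
    return summary(e1) == summary(e2)


logs = [
    [("T1", "r", "x1"), ("T2", "w", "x1")],  # DM A
    [("T1", "w", "y2"), ("T2", "r", "y2")],  # DM B
]
print(equivalent(logs, serial_logs(logs, ["T1", "T2"])))  # True
print(equivalent(logs, serial_logs(logs, ["T2", "T1"])))  # False
```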

    2.2 A Paradigm for Concurrency Control
    In Theorem 1, rw and ww conflicts are treated together under the general notion of conflict. However, we can decompose the concept of serializability by distinguishing these two types of conflict. Let E be an execution modeled by a set of logs. We define several binary relations on transactions in E, denoted by $\rightarrow$ with various subscripts. For each pair of transactions, $T_i$ and $T_j$:
    (1) $T_i \rightarrow_{rw} T_j$ if in some log of E, $T_i$ reads some data item into which $T_j$ subsequently writes;
    (2) $T_i \rightarrow_{wr} T_j$ if in some log of E, $T_i$ writes into some data item that $T_j$ subsequently reads;
    (3) $T_i \rightarrow_{ww} T_j$ if in some log of E, $T_i$ writes into some data item into which $T_j$ subsequently writes;
    (4) $T_i \rightarrow_{rwr} T_j$ if $T_i \rightarrow_{rw} T_j$ or $T_i \rightarrow_{wr} T_j$;
    (5) $T_i \rightarrow T_j$ if $T_i \rightarrow_{rwr} T_j$ or $T_i \rightarrow_{ww} T_j$.

    Intuitively, $\rightarrow$ (with any subscript) means "in any serialization must precede." For example, $T_i \rightarrow_{rw} T_j$ means "$T_i$ in any serialization must precede $T_j$." This interpretation follows from Theorem 1: if $T_i$ reads x before $T_j$ writes into x, then the hypothetical serialization in Theorem 1 must have $T_i$ preceding $T_j$.

    Every conflict between operations in E is represented by a $\rightarrow$ relationship. Therefore, we can restate Theorem 1 in terms of $\rightarrow$. According to Theorem 1, E is serializable if there is a total order of transactions that is consistent with $\rightarrow$. This latter condition holds if and only if $\rightarrow$ is acyclic. (A relation, $\rightarrow$, is acyclic if there is no sequence $T_1 \rightarrow T_2, T_2 \rightarrow T_3, \ldots, T_{n-1} \rightarrow T_n$ such that $T_1 = T_n$.) Let us decompose $\rightarrow$ into its components, $\rightarrow_{rwr}$ and $\rightarrow_{ww}$, and restate the theorem using them.

    Theorem 2 [BERN80a]
    Let $\rightarrow_{rwr}$ and $\rightarrow_{ww}$ be associated with execution E. E is serializable if (a) $\rightarrow_{rwr}$ and $\rightarrow_{ww}$ are acyclic, and (b) there is a total ordering of the transactions consistent with all $\rightarrow_{rwr}$ and all $\rightarrow_{ww}$ relationships.

    Theorem 2 is an immediate consequence of Theorem 1. (Indeed, part (b) of Theorem 2 is essentially a restatement of the earlier theorem.) However, this way of characterizing serializability suggests a way of decomposing the problem into simpler parts. Theorem 2 implies that rw and ww conflicts can be synchronized independently except insofar as there must be a total ordering of the transactions consistent with both types of conflicts. This suggests that we can use one technique to guarantee an acyclic $\rightarrow_{rwr}$ relation (which amounts to read-write synchronization) and a different technique to guarantee an acyclic $\rightarrow_{ww}$ relation (write-write synchronization). However, in addition to both $\rightarrow_{rwr}$ and $\rightarrow_{ww}$ being acyclic, there must also be one serial order consistent with all $\rightarrow$ relations. This serial order is the cement that binds together the rw and ww synchronization techniques.

    Decomposing serializability into rw and ww synchronization is the cornerstone of our paradigm for concurrency control. It will be important hereafter to distinguish algorithms that attain either rw or ww synchronization from algorithms that solve the entire distributed concurrency control problem. We use the term synchronization technique for the former type of algorithm, and concurrency control method for the latter.
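To make the decomposition concrete, the sketch below derives the $\rightarrow_{rw}$, $\rightarrow_{wr}$, and $\rightarrow_{ww}$ edges of definitions (1) to (3) from a set of logs and tests acyclicity with a topological sort (Kahn's algorithm, a standard choice that the paper does not prescribe), which also yields a serialization order when one exists. It assumes a hypothetical log model of (transaction, kind, item) tuples. The example shows $\rightarrow_{rwr}$ and $\rightarrow_{ww}$ that are each acyclic while their union is cyclic, so no single serial order is consistent with both.

```python
from collections import defaultdict, deque

Op = tuple[str, str, str]  # (transaction, kind, item); kind "r" or "w"
Edge = tuple[str, str]


def relations(logs: list[list[Op]]) -> dict[str, set[Edge]]:
    """Edges of ->rw, ->wr and ->ww following definitions (1)-(3)."""
    rel = {"rw": set(), "wr": set(), "ww": set()}
    for log in logs:
        for i, (ti, ki, xi) in enumerate(log):
            for tj, kj, xj in log[i + 1:]:  # operations later in the same log
                if ti != tj and xi == xj and "w" in (ki, kj):
                    rel[ki + kj].add((ti, tj))
    return rel


def serialization_order(edges: set[Edge], txns: set[str]):
    """Kahn's topological sort; returns None if the relation has a cycle."""
    succ = defaultdict(set)
    indegree = {t: 0 for t in txns}
    for a, b in edges:
        succ[a].add(b)
        indegree[b] += 1
    ready = deque(sorted(t for t in txns if indegree[t] == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for u in sorted(succ[t]):
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return order if len(order) == len(txns) else None


logs = [
    [("T1", "r", "x1"), ("T2", "w", "x1")],  # DM A: T1 ->rw T2
    [("T2", "w", "y2"), ("T1", "w", "y2")],  # DM B: T2 ->ww T1
]
rel = relations(logs)
rwr = rel["rw"] | rel["wr"]
txns = {op[0] for log in logs for op in log}
print(serialization_order(rwr, txns))              # ['T1', 'T2']
print(serialization_order(rel["ww"], txns))        # ['T2', 'T1']
print(serialization_order(rwr | rel["ww"], txns))  # None
```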

    3. SYNCHRONIZATION TECHNIQUES BASED ON TWO-PHASE LOCKING
    Two-phase locking (2PL) synchronizes reads and writes by explicitly detecting and preventing conflicts between concurrent operations. Before reading data item x, a transaction must "own" a readlock on x. Before writing into x, it must "own" a writelock on x. The ownership of locks is governed by two rules: (1) different transactions cannot simultaneously own conflicting locks; and (2) once a transaction surrenders ownership of a lock, it may never obtain additional locks.

    The definition of conflicting lock depends on the type of synchronization being performed: for rw synchronization two locks conflict if (a) both are locks on the same data item, and (b) one is a readlock and the other is a writelock; for ww synchronization two locks conflict if (a) both are locks on the same data item, and (b) both are writelocks.

    The second lock ownership rule causes every transaction to obtain locks in a two-phase manner. During the growing phase the transaction obtains locks without releasing any locks. By releasing a lock the transaction enters the shrinking phase. During this phase the transaction releases locks, and, by rule 2, is prohibited from obtaining additional locks. When the transaction terminates (or aborts), all remaining locks are automatically released.

    A common variation is to require that transactions obtain all locks before beginning their main execution. This variation is called predeclaration. Some systems also require that transactions hold all locks until termination.

    Two-phase locking is a correct synchronization technique, meaning that 2PL attains an acyclic $\rightarrow_{rwr}$ ($\rightarrow_{ww}$) relation when used for rw (ww) synchronization [BERN79b, ESWA76, PAPA79]. The serialization order attained by 2PL is determined by the order in which transactions obtain locks. The point at the end of the growing phase, when a transaction owns all the locks it ever will own, is called the locked point of the transaction [BERN79b]. Let E be an execution in which 2PL is used for rw (ww) synchronization. The $\rightarrow_{rwr}$ ($\rightarrow_{ww}$) relation induced by E is identical to the relation induced by a serial execution E' in which every transaction executes at its locked point. Thus the locked points of E determine a serialization order for E.

    3.1 Basic 2PL Implementation
    An implementation of 2PL amounts to building a 2PL scheduler, a software module that receives lock requests and lock releases and processes them according to the 2PL specification.

    The basic way to implement 2PL in a distributed database is to distribute the schedulers along with the database, placing the scheduler for data item x at the DM where x is stored. In this implementation readlocks may be implicitly requested by dm-reads and writelocks may be implicitly requested by prewrites. If the requested lock cannot be granted, the operation is placed on a waiting queue for the desired data item. (This can produce a deadlock, as discussed in Section 3.5.) Writelocks are implicitly released by dm-writes. However, to release readlocks, special lock-release operations are required. These lock releases may be transmitted in parallel with the dm-writes, since the dm-writes signal the start of the shrinking phase. When a lock is released, the operations on the waiting queue of that data item are processed in first-in/first-out (FIFO) order.

    Notice that this implementation "automatically" handles redundant data correctly. Suppose logical data item X has copies $x_1, \ldots, x_m$.
If basic 2PL is used for rw synchronization, a transaction may read any copy and need only obtain a readlock on the copy of X it actually reads. However, if a transaction updates X, then it must update all copies of X, and so must obtain writelocks on all copies of X (whether basic 2PL is used for rw or ww synchronization).

    3.2 Primary Copy 2PL
    Primary copy 2PL is a 2PL technique that pays attention to data redundancy [STON79]. One copy of each logical data item is designated the primary copy; before accessing any copy of the logical data item, the appropriate lock must be obtained on the primary copy.

    For readlocks this technique requires more communication than basic 2PL. Suppose $x_1$ is the primary copy of logical data item X, and suppose transaction T wishes to read some other copy, $x_i$, of X. To read $x_i$, T must communicate with two DMs, the DM where $x_1$ is stored (so T can lock $x_1$) and the DM where $x_i$ is stored. By contrast, under basic 2PL, T would only communicate with $x_i$'s DM. For writelocks, however, primary copy 2PL does not incur extra communication. Suppose T wishes to update X. Under basic 2PL, T would issue prewrites to all copies of X (thereby requesting writelocks on these data items) and then issue dm-writes to all copies.
    Under primary copy 2PL the same operations would be required, but only the prewrite($x_1$) would request a writelock. That is, prewrites would be sent for $x_1, \ldots, x_m$, but the prewrites for $x_2, \ldots, x_m$ would not implicitly request writelocks.

    3.3 Voting 2PL
    Voting 2PL (or majority consensus 2PL) is another 2PL implementation that exploits data redundancy. Voting 2PL is derived from the majority consensus technique of Thomas [THOM79] and is only suitable for ww synchronization.

    To understand voting, we must examine it in the context of two-phase commit. Suppose transaction T wants to write into X. Its TM sends prewrites to each DM holding a copy of X. For the voting protocol, the DM always responds immediately. It acknowledges receipt of the prewrite and says "lock set" or "lock blocked." (In the basic implementation it would not acknowledge at all until the lock is set.) After the TM
    receives acknowledgments from the DMs, it counts the number of "lock set" responses: if the number constitutes a majority, then the TM behaves as if all locks were set. Otherwise, it waits for "lock set" operations from DMs that originally said "lock blocked." Deadlocks aside (see Section 3.5), it will eventually receive enough "lock set" operations to proceed.

    Since only one transaction can hold a majority of locks on X at a time, only one transaction writing into X can be in its second commit phase at any time. All copies of X thereby have the same sequence of writes applied to them. A transaction's locked point occurs when it has obtained a majority of its writelocks on each data item in its write set. When updating many data items, a transaction must obtain a majority of locks on every data item before it issues any dm-writes.

    In principle, voting 2PL could be adapted for rw synchronization.
Before reading any copy of X, a transaction requests readlocks on all copies of X; when a majority of locks are set, the transaction may read any copy. This technique works but is overly strong: correctness only requires that a single copy of X be locked (namely, the copy that is read), yet this technique requests locks on all copies. For this reason we deem voting 2PL to be inappropriate for rw synchronization.

    3.4 Centralized 2PL
    Instead of distributing the 2PL schedulers, one can centralize the scheduler at a single site [ALSB76a, GARC79a]. Before accessing data at any site, appropriate locks must be obtained from the central 2PL scheduler. So, for example, to perform dm-read(x) where x is not stored at the central site, the TM must first request a readlock on x from the central site, wait for the central site to acknowledge that the lock has been set, then send dm-read(x) to the DM that holds x. (To save some communication, one can have the TM send both the lock request and dm-read(x) to the central site and let the central site directly forward dm-read(x) to x's DM; the DM then responds to the TM when dm-read(x) has been processed.) Like primary copy 2PL, this approach tends to require more communication than basic 2PL, since dm-reads and prewrites usually cannot implicitly request locks.

    3.5 Deadlock Detection and Prevention
    The preceding implementations of 2PL force transactions to wait for unavailable locks. If this waiting is uncontrolled, deadlocks can arise (see Figure 6).

    Transactions:
      $T_1$: BEGIN; READ(X); WRITE(Y); END
      $T_2$: BEGIN; READ(Y); WRITE(Z); END
      $T_3$: BEGIN; READ(Z); WRITE(X); END
    Suppose transactions execute concurrently, with each transaction issuing its READ before any transaction issues its END. This partial execution could be represented by the following logs:
      DM A: $r_1[x_1]$
      DM B: $r_2[y_2]$
      DM C: $r_3[z_3]$
    At this point, $T_1$ has a readlock on $x_1$, $T_2$ has a readlock on $y_2$, and $T_3$ has a readlock on $z_3$. Before proceeding, all transactions must obtain writelocks: $T_1$ requires writelocks on $y_1$ and $y_2$; $T_2$ requires writelocks on $z_2$ and $z_3$; $T_3$ requires a writelock on $x_1$. But $T_1$ cannot get a writelock on $y_2$ until $T_2$ releases its readlock; $T_2$ cannot get a writelock on $z_3$ until $T_3$ releases its readlock; and $T_3$ cannot get a writelock on $x_1$ until $T_1$ releases its readlock. This is a deadlock.
    Figure 6. Deadlock.

    Deadlock situations can be characterized by waits-for graphs [HOLT72, KING74], directed graphs that indicate which transactions are waiting for which other transactions. Nodes of the graph represent transactions, and edges represent the "waiting-for" relationship: an edge is drawn from transaction $T_i$ to transaction $T_j$ if $T_i$ is waiting for a lock currently owned by $T_j$. There is a deadlock in the system if and only if the waits-for graph contains a cycle (see Figure 7).
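The sketch below models a single 2PL scheduler with a FIFO waiting queue per data item, in the spirit of the basic implementation of Section 3.1, and derives its waits-for graph. It is a hypothetical illustration: for brevity it treats two locks as conflicting unless both are readlocks (so one scheduler performs rw and ww synchronization together), grants each item's requests strictly in queue order, and, as in the definition above, draws edges only to current lock owners. Replaying the requests of Figure 6 through one scheduler (as a centralized 2PL scheduler would see them) reproduces the cycle of Figure 7.

```python
from collections import defaultdict, deque


class LockTable:
    """Hypothetical 2PL scheduler state: lock owners and FIFO waiting queues."""

    def __init__(self):
        self.owners = defaultdict(dict)  # item -> {txn: "r" or "w"}
        self.queue = defaultdict(deque)  # item -> deque of (txn, mode)

    def _grantable(self, item, txn, mode):
        # Locks conflict unless both are readlocks; a txn never conflicts with itself.
        others = [m for t, m in self.owners[item].items() if t != txn]
        return all(mode == "r" and m == "r" for m in others)

    def _grant(self, item, txn, mode):
        held = self.owners[item].get(txn)
        self.owners[item][txn] = "w" if "w" in (mode, held) else "r"

    def request(self, txn, item, mode) -> bool:
        """Grant the lock now, or append the request to the item's waiting queue."""
        if not self.queue[item] and self._grantable(item, txn, mode):
            self._grant(item, txn, mode)
            return True
        self.queue[item].append((txn, mode))
        return False

    def release(self, txn, item):
        """Release txn's lock on item, then grant waiting requests in FIFO order."""
        self.owners[item].pop(txn, None)
        waiting = self.queue[item]
        while waiting and self._grantable(item, *waiting[0]):
            self._grant(item, *waiting.popleft())

    def waits_for(self):
        """Edges (Ti, Tj): Ti waits for a lock that conflicts with one Tj owns."""
        edges = set()
        for item, waiting in self.queue.items():
            for txn, mode in waiting:
                for owner, held in self.owners[item].items():
                    if owner != txn and "w" in (mode, held):
                        edges.add((txn, owner))
        return edges


def has_cycle(edges) -> bool:
    """Depth-first search for a cycle in a waits-for graph."""
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
    state = {}  # node -> "open" while on the DFS path, "done" once finished

    def visit(node) -> bool:
        state[node] = "open"
        for nxt in succ[node]:
            if state.get(nxt) == "open" or (nxt not in state and visit(nxt)):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(succ))


lt = LockTable()
for txn, item in [("T1", "x1"), ("T2", "y2"), ("T3", "z3")]:
    lt.request(txn, item, "r")  # each transaction's READ (the logs of Figure 6)
for txn, item in [("T1", "y1"), ("T1", "y2"), ("T2", "z2"), ("T2", "z3"), ("T3", "x1")]:
    lt.request(txn, item, "w")  # writelocks needed before the WRITEs
print(sorted(lt.waits_for()))     # [('T1', 'T2'), ('T2', 'T3'), ('T3', 'T1')]
print(has_cycle(lt.waits_for()))  # True
```

Queuing a request whenever the item's queue is nonempty keeps grants first-in/first-out, at the cost of sometimes delaying a request that could have been granted immediately.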

    Figure 7. Waits-for graph for Figure 6: $T_1 \rightarrow T_2$ ($T_1$ must wait for $T_2$ to release its readlock on $y_2$); $T_2 \rightarrow T_3$ ($T_2$ must wait for $T_3$ to release its readlock on $z_3$); $T_3 \rightarrow T_1$ ($T_3$ must wait for $T_1$ to release its readlock on $x_1$).

    Two general techniques are available for deadlock resolution: deadlock prevention and deadlock detection.

    3.5.1 Deadlock Prevention
    Deadlock prevention is a "cautious" scheme in which a transaction is restarted when the system is "afraid" that deadlock might occur. To implement deadlock prevention, 2PL schedulers are modified as follows. When a lock request is denied, the scheduler tests the requesting transaction (say $T_i$) and the transaction that currently owns the lock (say $T_j$). If $T_i$ and $T_j$ pass the test, $T_i$ is permitted to wait for $T_j$ as usual. Otherwise, one of the two is aborted. If $T_i$ is restarted, the deadlock prevention algorithm is called nonpreemptive; if $T_j$ is restarted, the algorithm is called preemptive.

    The test applied by the scheduler must guarantee that if $T_i$ waits for $T_j$, then deadlock cannot result. One simple approach is never to let $T_i$ wait for $T_j$. This trivially prevents deadlock but forces many restarts. A better approach is to assign priorities to transactions and to test priorities to decide whether $T_i$ can wait for $T_j$. For example, we could let $T_i$ wait for $T_j$ if $T_i$ has lower priority than $T_j$ (if $T_i$ and $T_j$ have equal priorities, $T_i$ cannot wait for $T_j$, or vice versa). This test prevents deadlock because, for every edge $(T_i, T_j)$ in the waits-for graph, $T_i$ has lower priority than $T_j$, so priorities strictly increase along every path. Since a cycle is a path from a node to itself and since $T_i$ cannot have lower priority than itself, no cycle can exist.

    One problem with the preceding approach is that cyclic restart is possible: some unfortunate transaction could be continually restarted without ever finishing. To avoid this problem, Rosenkrantz et al. [ROSE78] propose using "timestamps" as priorities. Intuitively, a transaction's timestamp is the time at which it begins executing, so old transactions have higher priority than young ones.

    The technique of ROSE78 requires that each transaction be assigned a unique timestamp by its TM.
    When a transaction begins, the TM reads the local clock time and appends a unique TM identifier to the low-order bits [THOM79]. The resulting number is the desired timestamp. The TM also agrees not to assign another timestamp until the next clock tick. Thus timestamps assigned by different TMs differ in their low-order bits (since different TMs have different identifiers), while timestamps assigned by the same TM differ in their high-order bits (since the TM does not use the same clock time twice). Hence timestamps are unique throughout the system. Note that this algorithm does not require clocks at different sites to be precisely synchronized.
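A TM-side generator following this scheme might look like the sketch below. The 8-bit identifier field, the use of `time.time_ns` as the local clock, and the class and method names are assumptions for illustration; busy-waiting for a new clock value is the simplest reading of "agrees not to assign another timestamp until the next clock tick."

```python
import time
from typing import Callable


class TimestampGenerator:
    """Hypothetical TM timestamp source: the local clock reading occupies the
    high-order bits and the TM's unique identifier the low-order bits."""

    def __init__(self, tm_id: int, id_bits: int = 8,
                 clock: Callable[[], int] = time.time_ns):
        if not 0 <= tm_id < (1 << id_bits):
            raise ValueError("tm_id does not fit in the identifier field")
        self.tm_id = tm_id
        self.id_bits = id_bits
        self.clock = clock
        self.last_tick = None

    def new_timestamp(self) -> int:
        tick = self.clock()
        # Never reuse a clock value: wait until the clock has advanced.
        while self.last_tick is not None and tick <= self.last_tick:
            tick = self.clock()
        self.last_tick = tick
        return (tick << self.id_bits) | self.tm_id


# Fake clocks make the arithmetic visible: 5 << 8 = 1280, 6 << 8 = 1536.
readings = iter([5, 5, 6])
tm3 = TimestampGenerator(tm_id=3, clock=lambda: next(readings))
tm4 = TimestampGenerator(tm_id=4, clock=lambda: 5)
print(tm3.new_timestamp(), tm3.new_timestamp(), tm4.new_timestamp())  # 1283 1539 1284
```

The two TMs read the same clock value 5 yet produce 1283 and 1284, which differ only in the identifier bits. A wall clock that steps backwards would make the loop spin until it catches up, which is tolerable for a sketch.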

    Two timestamp-based deadlock prevention schemes are proposed in ROSE78. Wait-Die is the nonpreemptive technique. Suppose transaction $T_i$ tries to wait for $T_j$. If $T_i$ has lower priority than $T_j$ (i.e., $T_i$ is younger than $T_j$), then $T_i$ is permitted to wait. Otherwise, it is aborted ("dies") and forced to restart. It is important that $T_i$ not be assigned a new timestamp when it restarts. Wound-Wait is the preemptive counterpart to Wait-Die. If $T_i$ has higher priority than $T_j$, then $T_i$ waits; otherwise $T_j$ is aborted.

    Both Wait-Die and Wound-Wait avoid cyclic restart. However, in Wound-Wait an old transaction may be restarted many times, while in Wait-Die old transactions never restart. It is suggested in ROSE78 that Wound-Wait induces fewer restarts in total.

    Care must be exercised in using preemptive deadlock prevention with two-phase commit: a transaction must not be aborted once the second phase of two-phase commit has begun. If a preemptive technique wishes to abort $T_j$, it checks with $T_j$'s TM and cancels the abort if $T_j$ has entered the second phase. No deadlock can result because if $T_j$ is in the second phase, it cannot be waiting for any transactions.

    Preordering of resources is a deadlock avoidance technique that avoids restarts altogether. This technique requires predeclaration of locks (each transaction obtains all its locks before execution). Data items are numbered and each transaction requests locks one at a time in numeric order. The priority of a transaction is the number of the highest numbered lock it owns. Since a transaction can only wait for transactions with higher priority, no deadlocks can occur. In addition to requiring predeclaration, a principal disadvantage of this technique is that it forces locks to be obtained sequentially, which tends to increase response time.
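Two of the tests described in this section are simple enough to state as code. The sketch below, with hypothetical function names, shows the generic priority test (a transaction may wait only for one of strictly higher priority) and preordering of resources, in which predeclared locks are requested in item-number order and a transaction's priority is the number of its highest-numbered owned lock. The item numbering applied to the transactions of Figure 6 is an assumption made for the example.

```python
def may_wait(priority_i: int, priority_j: int) -> bool:
    """Generic prevention test: Ti may wait for Tj only if Ti's priority is
    strictly lower. Priorities then increase along every waits-for edge, so
    the waits-for graph cannot contain a cycle."""
    return priority_i < priority_j


def request_sequence(lock_set: set[str], number: dict[str, int]) -> list[str]:
    """Preordering: request the predeclared locks one at a time in numeric order."""
    return sorted(lock_set, key=lambda item: number[item])


def preorder_priority(owned: set[str], number: dict[str, int]) -> int:
    """Priority under preordering: number of the highest-numbered lock owned."""
    return max((number[item] for item in owned), default=0)


# Hypothetical numbering of the copies used by the transactions of Figure 6.
number = {"x1": 1, "y1": 2, "y2": 3, "z2": 4, "z3": 5}
print(request_sequence({"x1", "y1", "y2"}, number))  # ['x1', 'y1', 'y2']
print(request_sequence({"z3", "x1"}, number))        # ['x1', 'z3']

# T1 owns x1 and y1 and asks for y2, which T2 owns: a waiter owns only
# lower-numbered items, so it always has lower priority than the owner.
print(may_wait(preorder_priority({"x1", "y1"}, number),
               preorder_priority({"y2"}, number)))  # True
```

With this numbering $T_3$ must request $x_1$ before $z_3$, so it can no longer hold $z_3$ while waiting for $T_1$, and the cycle of Figure 7 cannot form.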


    Consider the execution illustrated in Figures 6 and 7. Locks are requested at DMs in the following order:
      DM A: readlock $x_1$ for $T_1$; writelock $y_1$ for $T_1$; *writelock $x_1$ for $T_3$
      DM B: readlock $y_2$ for $T_2$; writelock $z_2$ for $T_2$; *writelock $y_2$ for $T_1$
      DM C: readlock $z_3$ for $T_3$; *writelock $z_3$ for $T_2$
    None of the "starred" locks can be granted and the system is in deadlock. However, the waits-for graphs at each DM are acyclic.
    Figure 8. Multisite deadlock.

    3.5.2 Deadlock Detection
    In deadlock detection, transactions wait for each other in an uncontrolled manner and are only aborted if a deadlock actually occurs. Deadlocks are detected by explicitly constructing the waits-for graph and searching it for cycles. (Cycles in a graph can be found efficiently using, for example, Algorithm 5.2 in AHO75.) If a cycle is found, one transaction on the cycle, called the victim, is aborted, thereby breaking the deadlock. To minimize the cost of restarting the victim, victim selection is usually based on the amount of resources used by each transaction on the cycle.

    The principal difficulty in implementing deadlock detection in a distributed database is constructing the waits-for graph efficiently. Each 2PL scheduler can easily construct the waits-for graph based on the waits-for relationships local to that scheduler. However, these local waits-for graphs are not sufficient to characterize all deadlocks in the distributed system (see Figure 8). Instead, local waits-for graphs must be combined into a more "global" waits-for graph. (Centralized 2PL does not have this problem, since there is only one scheduler.) We describe two techniques for constructing global waits-for graphs: centralized and hierarchical deadlock detection.

    In the centralized approach, one site is designated the deadlock detector for the distributed system [GRAY78, STON79]. Periodically (e.g., every few minutes) each scheduler sends its local waits-for graph to the deadlock detector. The deadlock detector combines the local graphs into a system-wide waits-for graph by constructing the union of the local graphs.

    In the hierarchical approach, the database sites are organized into a hierarchy (or tree), with a deadlock detector at each node of the hierarchy [MENA79]. For example, one might group sites by region, then by country, then by continent. Deadlocks that are local to a single site are detected at that site; deadlocks involving two or more sites of the same region are detected by the regional deadlock detector; and so on.

    Although centralized and hierarchical deadlock detection differ in detail, both involve periodic transmission of local waits-for information to one or more deadlock detector sites. The periodic nature of the process introduces two problems. First, a deadlock may exist for several minutes without being detected, causing response-time degradation. The solution, executing the deadlock detector more frequently, increases the cost of deadlock detection. Second, a transaction T may be restarted for reasons other than concurrency control (e.g., its site crashed). Until T's restart propagates to the deadlock detector, the deadlock detector can find a cycle in the waits-for graph that includes T. Such a cycle is called a phantom deadlock. When the deadlock detector discovers a phantom deadlock, it may unnecessarily restart a transaction other than T. Special precautions are also needed to avoid unnecessary restarts for deadlocks in voting 2PL.²

    ² Suppose logical data item X has copies $x_1$, $x_2$, and $x_3$, and suppose using voting 2PL $T_i$ owns writelocks on $x_1$ and $x_2$ but $T_i$'s lock request for $x_3$ is blocked by $T_j$.


    A major cost of deadlock detection is the restarting of partially executed transactions. Predeclaration can be used to reduce this cost. By obtaining a transaction's locks before it executes, the system will only restart transactions that have not yet executed. Thus little work is wasted by the restart.
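A centralized detector of the kind described above can be sketched as follows: it takes the union of the local waits-for graphs, searches the result for a cycle, and picks as victim the transaction on the cycle that has used the fewest resources. The per-DM edges are read off Figure 8 (each starred request waits for the owner of the conflicting readlock); the resource counts and the function names are hypothetical.

```python
from collections import defaultdict

Edge = tuple[str, str]  # (waiting transaction, lock owner)


def union_graph(local_graphs: dict[str, set[Edge]]) -> set[Edge]:
    """The system-wide waits-for graph is the union of the local graphs."""
    edges: set[Edge] = set()
    for graph in local_graphs.values():
        edges |= graph
    return edges


def find_cycle(edges: set[Edge]):
    """Return one cycle as a list of transactions, or None if the graph is acyclic."""
    succ = defaultdict(list)
    for a, b in sorted(edges):
        succ[a].append(b)
    done = set()  # nodes already known not to lie on any cycle

    def dfs(node, path):
        if node in path:
            return path[path.index(node):]
        if node in done:
            return None
        path.append(node)
        for nxt in succ[node]:
            cycle = dfs(nxt, path)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for start in sorted(succ):
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None


def choose_victim(cycle: list[str], resources_used: dict[str, int]) -> str:
    """Abort the transaction on the cycle that is cheapest to restart."""
    return min(cycle, key=lambda txn: resources_used[txn])


local = {  # local waits-for graphs implied by Figure 8
    "DM A": {("T3", "T1")},
    "DM B": {("T1", "T2")},
    "DM C": {("T2", "T3")},
}
print([find_cycle(g) for g in local.values()])  # [None, None, None]
cycle = find_cycle(union_graph(local))
print(cycle)  # ['T1', 'T2', 'T3']
print(choose_victim(cycle, {"T1": 5, "T2": 2, "T3": 7}))  # T2
```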

    4. SYNCHRONIZATION TECHNIQUES BASED ON TIMESTAMP ORDERING

    Timestamp ordering (T/O) is a technique whereby a serialization order is selected a priori and transaction execution is forced to obey this order. Each transaction is assigned a unique timestamp by its TM. The TM attaches the timestamp to all dm-reads and dm-writes issued on behalf of the transaction, and DMs are required to process conflicting operations in timestamp order. The timestamp of operation O is denoted ts(O).

    The definition of conflicting operations depends on the type of synchronization being performed and is analogous to conflicting locks. For rw synchronization, two operations conflict if (a) both operate on the same data item, and (b) one is a dm-read and the other is a dm-write. For ww synchronization, two operations conflict if (a) both operate on the same data item, and (b) both are dm-writes.

    It is easy to prove that T/O attains an acyclic $\rightarrow_{rwr}$ ($\rightarrow_{ww}$) relation when used for rw (ww) synchronization. Since each DM processes conflicting operations in timestamp order, each edge of the $\rightarrow_{rwr}$ ($\rightarrow_{ww}$) relation is in timestamp order. Consequently, all paths in the relation are in timestamp order and, since all transactions have unique timestamps, no cycles are possible.
    In addition, the timestamp order is a valid serialization order.

    ² (continued) Insofar as $x_3$'s scheduler is concerned, $T_i$ is waiting for $T_j$. However, since $T_i$ has a majority of the copies locked, $T_i$ can proceed without waiting for $T_j$. This fact should be incorporated into the deadlock resolution scheme to avoid unnecessary restarts.

    4.1 Basic T/O Implementation
    An implementation of T/O amounts to building a T/O scheduler, a software module that receives dm-reads and dm-writes and outputs these operations according to the T/O specification [SHAP77a, SHAP77b]. In practice, prewrites must also be processed through the T/O scheduler for two-phase commit to operate properly. As was the case with 2PL, the basic T/O implementation distributes the schedulers along with the database [BERN80a].

    If we ignore two-phase commit, the basic T/O scheduler is quite simple. At each DM, and for each data item x stored at the DM, the scheduler records the largest timestamp of any dm-read(x) or dm-write(x) that has been processed. These are denoted R-ts(x) and W-ts(x), respectively. For rw synchronization, scheduler S operates as follows. Consider a dm-read(x) with timestamp TS. If TS < W-ts(x), S rejects the dm-read and aborts the issuing transaction. Otherwise S outputs the dm-read and sets R-ts(x) to max(R-ts(x), TS).
For a dm-write(x) with timestamp TS, S rejects the dm-write if TS < R-ts(x); otherwise it outputs the dm-write and sets W-ts(x) to max(W-ts(x), TS). For ww synchronization, S rejects a dm-write(x) with timestamp TS if TS < W-ts(x); otherwise it outputs the dm-write and sets W-ts(x) to TS.

When a transaction is aborted, its TM assigns it a new and larger timestamp and restarts it. Restart issues are discussed further below.

Two-phase commit is incorporated by timestamping prewrites and accepting or rejecting prewrites instead of dm-writes. Once a scheduler accepts a prewrite, it must guarantee to accept the corresponding dm-write no matter when the dm-write arrives. For rw (or ww) synchronization, once S accepts a prewrite(x) with timestamp TS it must not output any dm-read(x) (or dm-write(x)) with timestamp greater than TS until the dm-write(x) is output. The effect is similar to setting a write lock on x for the duration of two-phase commit.

To implement the above rules, S buffers dm-reads, dm-writes, and prewrites. Let min-R-ts(x) be the minimum timestamp of any buffered dm-read(x), and define min-W-ts(x) and min-P-ts(x) analogously. Rw synchronization is accomplished as follows:
1. Let R be a dm-read(x). If ts(R) < W-ts(x), R is rejected. Else if ts(R) > min-P-ts(x), R is buffered. Else R is output.

Computing Surveys, Vol. 13, No. 2, June 1981


2. Let P be a prewrite(x). If ts(P) < R-ts(x), P is rejected. Else P is buffered.
3. Let W be a dm-write(x). W is never rejected. If ts(W) > min-R-ts(x), W is buffered. (If W were output it would cause a buffered dm-read(x) to be rejected.) Else W is output.
4. When W is output, the corresponding prewrite is debuffered. If this causes min-P-ts(x) to increase, the buffered dm-reads are retested to see if any of them can be output. If this causes min-R-ts(x) to increase, the buffered dm-writes are also retested, and so forth. This process is diagrammed in Figure 9.

Figure 9. Buffer emptying for basic T/O rw synchronization. (Let R = dm-read(x) and W = dm-write(x). R is ready if it precedes the earliest prewrite request, i.e., if ts(R) < min-P-ts(x). W is ready if it precedes the earliest dm-read request, i.e., if ts(W) < min-R-ts(x). When a dm-write(x) arrives: buffer it; output all ready W's and debuffer their prewrites, which may increase min-P-ts(x) and make some R's ready; then output all ready R's, which may increase min-R-ts(x) and make some W's ready.)
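The four rw rules above can be sketched in Python for a single data item x. This is an illustration under stated assumptions, not the authors' code. The class name `BasicTORW` is ours, and so are three further choices: timestamps are plain numbers, min-P-ts(x) and min-R-ts(x) are treated as +∞ when nothing is buffered, and every dm-write is required to arrive after its own prewrite was accepted. Rule 1 is coded as written, so a dm-read is output when ts(R) ≤ min-P-ts(x). Figure 9 states readiness with a strict inequality, and the two differ only when timestamps coincide.

```python
import math


class BasicTORW:
    """Basic T/O rw synchronization for one data item x, with prewrite buffering.

    Illustrative sketch: timestamps are plain numbers, and each dm-write is
    assumed to arrive only after its own prewrite has been accepted.
    """

    def __init__(self):
        self.r_ts = -math.inf   # R-ts(x): largest timestamp of any output dm-read(x)
        self.w_ts = -math.inf   # W-ts(x): largest timestamp of any output dm-write(x)
        self.reads = []         # buffered dm-read timestamps
        self.prewrites = []     # accepted (buffered) prewrite timestamps
        self.writes = []        # buffered dm-write timestamps
        self.log = []           # (op, ts, outcome) in decision order

    def min_p_ts(self):
        return min(self.prewrites, default=math.inf)  # +inf if none buffered (assumption)

    def min_r_ts(self):
        return min(self.reads, default=math.inf)

    def read(self, ts):
        """Rule 1: reject if ts < W-ts(x); buffer if ts > min-P-ts(x); else output."""
        if ts < self.w_ts:
            self.log.append(("R", ts, "rejected"))
        elif ts > self.min_p_ts():
            self.reads.append(ts)
            self.log.append(("R", ts, "buffered"))
        else:
            self._output_read(ts)

    def prewrite(self, ts):
        """Rule 2: reject if ts < R-ts(x); else buffer."""
        if ts < self.r_ts:
            self.log.append(("P", ts, "rejected"))
        else:
            self.prewrites.append(ts)
            self.log.append(("P", ts, "buffered"))

    def write(self, ts):
        """Rule 3: never rejected; buffer if ts > min-R-ts(x); else output."""
        if ts > self.min_r_ts():
            self.writes.append(ts)
            self.log.append(("W", ts, "buffered"))
        else:
            self._output_write(ts)
            self._retest()

    def _output_read(self, ts):
        self.r_ts = max(self.r_ts, ts)
        self.log.append(("R", ts, "output"))

    def _output_write(self, ts):
        self.w_ts = max(self.w_ts, ts)
        self.prewrites.remove(ts)  # rule 4: debuffer the corresponding prewrite
        self.log.append(("W", ts, "output"))

    def _retest(self):
        """Rule 4 / Figure 9: keep releasing ready operations until nothing changes."""
        progress = True
        while progress:
            progress = False
            for ts in sorted(self.reads):
                if ts <= self.min_p_ts():
                    self.reads.remove(ts)
                    self._output_read(ts)
                    progress = True
            for ts in sorted(self.writes):
                if ts <= self.min_r_ts():
                    self.writes.remove(ts)
                    self._output_write(ts)
                    progress = True


x = BasicTORW()
x.prewrite(3)
x.prewrite(8)
x.read(5)      # buffered: it would read past the pending prewrite 3
x.write(8)     # buffered: outputting it would force the buffered dm-read 5 to be rejected
x.write(3)     # output; rule 4 then releases dm-read 5 and dm-write 8
x.read(4)      # rejected: W-ts(x) is now 8
x.prewrite(4)  # rejected: R-ts(x) is now 5
```

In this scenario, the dm-write with timestamp 3 starts the cascade of rule 4. Debuffering its prewrite raises min-P-ts(x) from 3 to 8, which releases the buffered dm-read 5. With the read buffer empty, min-R-ts(x) becomes +∞, so the buffered dm-write 8 is output as well.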

Ww synchronization is accomplished as follows:
1. Let P be a prewrite(x). If ts(P) < W-ts(x), P is rejected; else P is buffered.
2. Let W be a dm-write(x). W is never rejected. If ts(W) > min-P-ts(x), W is buffered; else W is output.
3. When W is output, the corresponding prewrite is debuffered. If this causes min-P-ts(x) to increase, the buffered dm-writes are retested to see if any can now be output. See Figure 10.

As with 2PL, a common variation is to require that transactions predeclare their readsets and writesets, issuing all dm-reads and prewrites before beginning their main execution.3 If all operations are accepted, the transaction is guaranteed to execute without danger of restart. Another variation is to delay the processing of operations to wait for operations with smaller timestamps. The extreme version of this heuristic is conservative T/O, described in Section 4.4.

3 These prewrites are nonstandard relative to the definition in Section 1.4. Since new values for the data items in the writeset are not yet known, these prewrites do not instruct DMs to store values on secure storage; instead, prewrite(x) merely "warns" the DM to expect a dm-write(x) in the near future. However, synchronization algorithms process these prewrites exactly as they process "standard" ones.

Figure 10. Buffer emptying for basic T/O ww synchronization. (When a dm-write(x) arrives: buffer it; then output all ready W's and debuffer their prewrites, which may increase min-P-ts(x) and make some W's ready.)

4.2 The Thomas Write Rule
For ww synchronization the basic T/O scheduler can be optimized using an observation of [THOM79]. Let W be a dm-write(x), and suppose ts(W) < W-ts(x). Instead of rejecting W we can simply ignore it. We call this the Thomas Write Rule (TWR). Intuitively, TWR applies to a dm-write that tries to place obsolete information into the database. The rule guarantees that the effect of applying a set of dm-writes to x is identical to what would have happened had the dm-writes been applied in timestamp order.

If TWR is used, there is no need to incorporate two-phase commit into the ww synchronization algorithm; the ww scheduler always accepts prewrites and never buffers dm-writes.

4.3 Multiversion T/O
For rw synchronization the basic T/O scheduler can be improved using multiversion data items [REED78]. For each data item x there is a set of R-ts's and a set of (W-ts, value) pairs, called versions. The R-ts's of x record the timestamps of all executed dm-read(x) operations, and the versions record the timestamps and values of all executed dm-write(x) operations. (In practice one cannot store R-ts's and versions forever; techniques for deleting old versions and timestamps are described in Sections 4.5 and 5.2.2.)

Multiversion T/O accomplishes rw synchronization as follows (ignoring two-phase commit). Let R be a dm-read(x). R is processed by reading the version of x with the largest timestamp less than ts(R) and adding ts(R) to x's set of R-ts's; see Figure 11a. R is never rejected. Let W be a dm-write(x), and let interval(W) be the interval from ts(W) to the smallest W-ts(x) > ts(W);4 see Figure 11b. If any R-ts(x) lies in interval(W), W is rejected; otherwise W is output and creates a new version of x with timestamp ts(W).

4 Interval(W) = (ts(W), ∞) if no W-ts(x) > ts(W) exists.

Figure 11. Multiversion reading and writing.
(a) Represent the versions of a data item x on a "time line": values V1, V2, V3, ..., Vn-1, Vn with W-timestamps 5, 10, 20, ..., 92, 100. To process a dm-read(x) with timestamp 95, find the biggest W-timestamp less than 95; in this case 92. That is the version you read, so the value read by the dm-read is Vn-1.
(b) Represent the R-timestamps of x similarly: 5, 7, 15, ..., 92, 95. Let W be a dm-write(x) with timestamp 93; interval(W) = (93, 100). To process W we create a new version V of x with that timestamp, between Vn-1 (W-timestamp 92) and Vn (W-timestamp 100). However, this new version "invalidates" the dm-read of part (a), because if the dm-read had arrived after the dm-write, it would have read value V instead of Vn-1. Therefore, we must reject the dm-write.

To prove the correctness of multiversion T/O, we must show that every execution is equivalent to a serial execution in timestamp order [BERN80b]. Let R be a dm-read(x) that is processed "out of order"; that is, suppose R is executed after a dm-write(x) whose timestamp exceeds ts(R). Since R ignores all versions with timestamps greater than ts(R), the value read by R is identical to the value it would have read had it been processed in timestamp order. Now let W be a dm-write(x) that is processed "out of order"; that is, suppose it is executed after a dm-read(x) whose timestamp exceeds ts(W). Since W was not rejected, there exists a version of x with timestamp TS such that ts(W) < TS [...] ts(P). Rw synchronization is performed as follows:
1. Let R be a dm-read(x). R is never rejected. If ts(R) lies in interval(prewrite(x)) for some buffered prewrite(x), then R is buffered. Else R is output.
2. Let P be a prewrite(x). If some R-ts(x) lies in interval(P), P is rejected. Else P is buffered.
3. Let W be a dm-write(x). W is always output immediately.
4. When W is output, its prewrite is debuffered, and the buffered dm-reads are retested to see if they can now be output. See Figure 12.

Two-phase commit is not an issue for ww synchronization, since dm-writes are never rejected for ww synchronization.


Figure 12. Buffer emptying for multiversion T/O. (Let R = dm-read(x). R is ready if ts(R) ∉ interval(P), where P is any buffered prewrite(x). When a dm-write arrives: output it and debuffer its prewrite; then output all ready R's.)
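A short Python sketch lets the multiversion rules (ignoring two-phase commit) be checked against the time line of Figure 11. The class `MultiversionTO` is hypothetical. It stores the versions of x as sorted (W-ts, value) pairs and the R-ts's as a list, and it assumes that some version older than every dm-read exists. The text does not say whether the right end of interval(W) is open or closed. The sketch treats it as closed. Under the read rule, a dm-read with ts(R) equal to the next W-ts reads the version just below that W-ts, so a new version inside the interval would change what that read saw.

```python
import bisect
import math


class MultiversionTO:
    """Multiversion T/O rw synchronization for one data item x, ignoring two-phase commit."""

    def __init__(self, versions):
        self.versions = sorted(versions)  # (W-ts, value) pairs in ascending W-ts order
        self.r_ts = []                    # x's set of R-ts's (timestamps of executed dm-reads)

    def read(self, ts):
        """Never rejected: return the version with the largest W-ts less than ts."""
        w_list = [w for w, _ in self.versions]
        i = bisect.bisect_left(w_list, ts)  # versions[:i] are exactly those with W-ts < ts
        if i == 0:
            raise LookupError(f"no version of x older than {ts}")
        self.r_ts.append(ts)
        return self.versions[i - 1][1]

    def interval(self, ts):
        """interval(W): from ts(W) to the smallest W-ts(x) > ts(W), or to infinity (footnote 4)."""
        later = [w for w, _ in self.versions if w > ts]
        return ts, min(later, default=math.inf)

    def write(self, ts, value):
        """Reject if some R-ts(x) lies in interval(W); otherwise create a new version."""
        lo, hi = self.interval(ts)
        if any(lo < r <= hi for r in self.r_ts):  # closed right end: our assumption
            return False
        bisect.insort(self.versions, (ts, value))
        return True


x = MultiversionTO([(5, "V1"), (10, "V2"), (20, "V3"), (92, "Vn-1"), (100, "Vn")])
seen = x.read(95)                # Figure 11a: the largest W-ts below 95 is 92
accepted_93 = x.write(93, "V")   # Figure 11b: R-ts 95 lies in (93, 100], so rejected
accepted_96 = x.write(96, "V'")  # no R-ts in (96, 100], so a new version is created
```

Rejecting the dm-write with timestamp 93 is exactly the case of Figure 11b. The dm-write with timestamp 96 is accepted because no R-ts falls in (96, 100]; the only one recorded so far is 95.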

4.4 Conservative T/O
Conservative timestamp ordering is a technique for eliminating restarts during T/O scheduling [BERN80a]. When a scheduler receives an operation O that might cause a future restart, the scheduler delays O until it is sure that no future restarts are possible.

Conservative T/O requires that each scheduler receive dm-reads (or dm-writes) from each TM in timestamp order. For example, if scheduler Sj receives dm-read(x) followed by dm-read(y) from TMi, then ts(dm-read(x)) ≤ ts(dm-read(y)). Since the network is assumed to be a FIFO channel, this timestamp ordering is accomplished by requiring that TMi send dm-reads (or dm-writes) to Sj in timestamp order.5

Conservative T/O buffers dm-reads and dm-writes as part of its normal operation. When a scheduler buffers an operation, it remembers the TM that sent it. Let min-R-ts(TMi) be the minimum timestamp of any buffered dm-read from TMi, with min-R-ts(TMi) = -∞ if no such dm-read is buffered. Define min-W-ts(TMi) analogously.

5 This can be implemented by requiring that TMs process transactions serially. Alternatively, we can require that transactions issue all dm-reads before beginning their main execution, and all dm-writes after terminating their main execution. Then transactions can execute concurrently, although they must terminate in timestamp order.

Conservative T/O performs rw synchronization as follows:
1. Let R be a dm-read(x). If ts(R) > min-W-ts(TM) for any TM in the system, R is buffered. Else R is output.
2. Let W be a dm-write(x). If ts(W) > min-R-ts(TM) for any TM, W is buffered. Else W is output.
3. When R or W is output or buffered, this may increase min-R-ts(TM) or min-W-ts(TM); buffered operations are retested to see if they can now be output.

The effect is that R is output if and only if (a) the scheduler has a buffered dm-write from every TM, and (b) ts(R) < minimum timestamp of any buffered dm-write. Similarly, W is output if and only if (a) there is a buffered dm-read from every TM, and (b) ts(W) < minimum timestamp of any buffered dm-read. Thus R (or W) is output if and only if the scheduler has received every dm-write (or dm-read) with smaller timestamp that it will ever receive.

Ww synchronization is accomplished as follows:
1. Let W be a dm-write(x). If ts(W) > min-W-ts(TM) for any TM in the system, W is buffered; else it is output.
2. When W is buffered or output, this may increase min-W-ts(TM); buffered dm-writes are retested accordingly.
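A minimal Python sketch of the conservative rw rules at one scheduler follows. The class `ConservativeRW` and its `receive` method are hypothetical. Each TM's dm-reads and dm-writes sit in FIFO queues. This relies on the requirement that TMs send them in timestamp order, so the head of a queue carries min-R-ts(TM) or min-W-ts(TM), and an empty queue stands for -∞. Data-item names are omitted because these rules order every dm-read against every dm-write. The tests follow rules 1 and 2 literally: an operation waits only if its timestamp exceeds some min-ts of the other kind.

```python
import math
from collections import deque


class ConservativeRW:
    """Conservative T/O rw synchronization at one scheduler (illustrative sketch)."""

    def __init__(self, tms):
        # One FIFO buffer of timestamps per TM and per operation kind ("r" or "w").
        self.buffers = {"r": {tm: deque() for tm in tms}, "w": {tm: deque() for tm in tms}}
        self.output = []  # (kind, tm, ts) in the order operations are output

    def min_ts(self, kind, tm):
        """min-R-ts(TM) or min-W-ts(TM); -inf when nothing of that kind is buffered."""
        q = self.buffers[kind][tm]
        return q[0] if q else -math.inf

    def ready(self, kind, ts):
        # Rules 1 and 2: buffer if ts exceeds the other kind's min-ts for ANY TM,
        # i.e. output only if ts <= that min-ts for EVERY TM.
        other = "w" if kind == "r" else "r"
        return all(ts <= self.min_ts(other, tm) for tm in self.buffers[other])

    def receive(self, kind, tm, ts):
        self.buffers[kind][tm].append(ts)
        self.retest()

    def retest(self):
        # Rule 3: outputs change min-ts values, so keep retesting until nothing moves.
        # Only queue heads need checking, since later entries have larger timestamps.
        progress = True
        while progress:
            progress = False
            for kind in ("r", "w"):
                for tm, q in self.buffers[kind].items():
                    while q and self.ready(kind, q[0]):
                        self.output.append((kind, tm, q.popleft()))
                        progress = True


s = ConservativeRW(["A", "B"])
for kind, tm, ts in [("w", "A", 2), ("r", "B", 1), ("r", "A", 4), ("w", "B", 3), ("r", "B", 5)]:
    s.receive(kind, tm, ts)
```

Only timestamps 1, 2 and 3 get through. Once the two dm-writes are output, neither TM has a dm-write buffered, so min-W-ts is -∞ for both and the dm-reads 4 and 5 stay buffered. This is a small instance of the first problem discussed next.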

The effect is that the scheduler waits until it has a buffered dm-write from every TM and then outputs the dm-write with smallest timestamp.

Two-phase commit need not be tightly integrated into conservative T/O because dm-writes are never rejected. Although prewrites must be issued for all data items updated, these operations are not processed by the conservative T/O schedulers.

The above implementation of conservative T/O suffers from three major problems: (1) If some TM never sends an operation to some scheduler, the scheduler will "get stuck" and stop outputting. (2) To avoid the first problem, every TM must communicate regularly with every scheduler; this is infeasible in large networks. (3) The implementation is overly conservative; the ww algorithm, for instance, processes all dm-writes in timestamp order, not merely conflicting ones. These problems are addressed below.

Figure 13. Transaction classes.
A class is defined by a readset and a writeset. For example,
C1: readset = {x1}, writeset = {y1, y2}
C2: readset = {x1, y2}, writeset = {y1, y2, z2, z3}
C3: readset = {y2, z3}, writeset = {x1, z2, z3}
A transaction is a member of a class if its readset is a subset of the class readset and its writeset is a subset of the class writeset. For example,
T1: readset = {x1}, writeset = {y1, y2}
T2: readset = {y2}, writeset = {z2, z3}
T3: readset = {z3}, writeset = {x1}
T1 is a member of C1 and C2. T2 is a member of C2 and C3. T3 is a member of C3.

Null Operations. To solve the first problem, TMs are required to periodically send timestamped null operations to each scheduler in the absence of "real" traffic. A null operation is a dm-read or dm-write whose sole purpose is to convey timestamp information and thereby unblock "real" dm-reads and prewrites. An impatient scheduler can prompt a TM for a null operation by sending a "request message." For example, for rw synchronization suppose scheduler S wants to process a dm-read with timestamp TS, but does not have a buffered dm-write from TMi. S can send a message to TMi requesting a null-dm-write with timestamp greater than TS.

A variation is to use null operations with very large (perhaps infinite) timestamps. For example, if TMi rarely needs to issue dm-reads to S, TMi can send S a null-dm-read with infinite timestamp signifying that TMi does not intend to communicate with S until further notice.
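The effect of null operations can be sketched with the conservative ww rules, which are simpler than the rw case but block in the same way. In the hypothetical `ConservativeWW` class below, a null-dm-write is buffered like a real dm-write but discarded instead of applied when it becomes the smallest. A null-dm-write with timestamp ∞ stays at the head of its TM's queue, so that TM never holds the scheduler back. The sketch also lets a later operation from the same TM cancel that TM's ∞ null. This is our reading of "until further notice", not something the text specifies.

```python
import math
from collections import deque

INF = math.inf


class ConservativeWW:
    """Conservative T/O ww synchronization with null dm-writes (illustrative sketch)."""

    def __init__(self, tms):
        # Per-TM FIFO of (timestamp, is_null). TMs send dm-writes in timestamp order,
        # so the head of each queue carries that TM's min-W-ts.
        self.buf = {tm: deque() for tm in tms}
        self.applied = []  # real dm-writes output, as (tm, ts)

    def min_w_ts(self, tm):
        q = self.buf[tm]
        return q[0][0] if q else -INF  # -inf: nothing buffered from this TM

    def receive(self, tm, ts, null=False):
        q = self.buf[tm]
        if q and q[-1] == (INF, True):
            q.pop()  # assumption: new traffic cancels an earlier "until further notice"
        q.append((ts, null))
        self.drain()

    def drain(self):
        while True:
            tm = min(self.buf, key=self.min_w_ts)  # TM holding the smallest min-W-ts
            q = self.buf[tm]
            if not q or q[0][0] == INF:
                return  # some TM is silent (stuck), or every TM has sent an infinite null
            ts, null = q.popleft()
            if not null:
                self.applied.append((tm, ts))  # null operations only convey timestamps


s = ConservativeWW(["A", "B", "C"])
s.receive("C", INF, null=True)  # C: no dm-writes until further notice
s.receive("A", 3)
s.receive("A", 8)
s.receive("B", 5)               # every TM has now spoken: A3 and B5 go, then B is empty
stalled = list(s.applied)
s.receive("B", 10, null=True)   # B's null-dm-write lets A8 proceed
```

In this example, TM C opts out with an infinite null. B's null-dm-write with timestamp 10 plays the role of a reply to a request message: it carries no data but lets the dm-write with timestamp 8 from A proceed.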

Transaction Classes. Transaction classes [BERN78a, BERN80d] is a technique for reducing communication in conservative T/O and for supporting a less conservative scheduling policy. As in predeclaration, assume that every transaction's readset and writeset are known in advance. A class is defined by a readset and a writeset (see Figure 13). Transaction T is a member of class C if readset(T) is a subset of readset(C) and writeset(T) is a subset of writeset(C). (Classes need not be disjoint.)

Class definitions are not expected to change frequently during normal operation of the system. Changing a class definition is akin to changing the database schema and requires mechanisms beyond the scope of this paper. We assume that class definitions are stored in static tables that are available to any site requiring them.

Classes are associated with TMs. Every transaction that executes at a TM must be a member of a class associated with the TM. If a transaction is submitted to a TM that has no class containing it, the transaction is forwarded to another TM that does. We assume that every class is associated with exactly one TM, and vice versa. The class associated with TMi is denoted Ci. To execute transactions that are members of class C at two TMs, we define another class C' with the same definition as C and associate C with one TM and C' with the other. To execute transactions that are members of two classes at one site, we multiprogram two TMs at that site.

Classes are exploited by conservative T/O schedulers as follows. Consider rw synchronization and suppose scheduler S wants to output a dm-read(x). Instead of waiting for dm-writes with smaller timestamps from all TMs, S need only wait for dm-writes from those TMs whose class writeset contains x. Similarly, to process a dm-write(x), S need only wait for dm-reads with smaller timestamp from those TMs whose class readset contains x. Thus communication requirements are decreased, and the level of concurrency