Advanced Transaction Management Chapter 13. © Jim Gray, Andreas Reuter Transaction Processing -...


Page 1: Advanced Transaction Management Chapter 13.  © Jim Gray, Andreas Reuter Transaction Processing - Concepts and Techniques WICS August 2 - 6, 1999 2.

Advanced Transaction Management

Time   Aug. 2               Aug. 3              Aug. 4                   Aug. 5               Aug. 6
9:00   Intro & terminology  TP mons & ORBs      Logging & res. Mgr.      Files & Buffer Mgr.  Structured files
11:00  Reliability          Locking theory      Res. Mgr. & Trans. Mgr.  COM+                 Access paths
13:30  Fault tolerance      Locking techniques  CICS & TP & Internet     CORBA/EJB + TP       Groupware
15:30  Transaction models   Queueing            Advanced Trans. Mgr.     Replication          Performance & TPC
18:00  Reception            Workflow            Cyberbricks              Party                FREE

Chapter 13


Outline

Mixing heterogeneous TMs

High-Availability Commit & Transfer of Commit

Optimizing Commit

Disaster Protection via Data/Application Replication


Mixing Transaction Managers

Four standards:
  LU 6.2 ~ APPC ~ CPIC ~ CICS: de facto TP standard
  X/Open + OSI/TP: the de jure TP standard
  OTS: the CORBA standard
  TIP: de facto interoperability standard

Almost everyone interoperates with LU 6.2.

LU 6.2 has evolved to have presumed abort, not reuse aborted trids, and other fixes.

LU 6.2 is "open": two-phase commit, documented interface, reconnection / resolve is documented.

Internally, everyone uses private protocols with many tricks.


Mixing "OLD" Transaction Managers

Many old TP monitors are not open: they do not expose 2PC (prepare() and commit()), so they insist on being the root commit coordinator.

All will become X/Open-compliant eventually and thus be open TP monitors.

If stuck with a "closed" TM, can still get atomicity if:
1. Only one closed TM is involved.
2. The TM is direct, not queued.


Mixing with a Closed Transaction Manager

All "open" TMs and RMs prepared, closed TM does "RUMP"

deferred_update(int id, complex_type list_of_updates)      /* rump logic */
{ long count;
  Begin_Work();                                            /* start a new transaction */
  select count(*) into :count from done where id = :id;    /* test if work was done */
  if (count == 0) {                                        /* if not done */
      do list_of_updates;                                  /* then do the list of updates */
      insert into done values (:id);                       /* flag transaction done */
  }
  Commit_Work();                                           /* commit update and flag */
  acknowledge;                                             /* reply success to caller in both cases */
}

Status_Transaction(TRID trid)
{ long ans;
  select count(*) into :ans from done where id = :trid;    /* 1 if the work was done, else 0 */
  return ans;
}

[Figure: transaction gateway to a closed transaction manager, with a done table.
 Gateway side: do transaction; while not acknowledged: send trid + data, wait.
 Closed TM side: if not duplicate: do transaction, insert trid in done table; commit; acknowledge.]
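The rump logic can be tried out with a small Python sketch. It is not the slide's code: it assumes an in-memory SQLite database, and the `account` table, the `trid` column name, and the helper names are made up for illustration. The point it shows is that a resend of the same trid is acknowledged without redoing the work.

```python
import sqlite3

def deferred_update(conn, trid, updates):
    """Rump logic sketch: apply `updates` at most once per trid, always acknowledge."""
    with conn:  # one local transaction: commits on success, rolls back on error
        done = conn.execute(
            "SELECT COUNT(*) FROM done WHERE trid = ?", (trid,)).fetchone()[0]
        if done == 0:                         # work not done yet
            for sql, params in updates:       # do the list of updates
                conn.execute(sql, params)
            conn.execute("INSERT INTO done (trid) VALUES (?)", (trid,))  # flag it done
    return True                               # acknowledge in both cases

def status_transaction(conn, trid):
    """True if the work for trid has been committed."""
    row = conn.execute("SELECT COUNT(*) FROM done WHERE trid = ?", (trid,)).fetchone()
    return row[0] > 0

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE done (trid INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

updates = [("UPDATE account SET balance = balance - 30 WHERE id = ?", (1,))]
deferred_update(conn, 42, updates)
deferred_update(conn, 42, updates)   # gateway resend: detected as a duplicate
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
```

Because the updates and the done-table insert commit together, the gateway can keep resending until it sees the acknowledgement.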


Mixing Open Transaction Managers

Gateway translates between external and internal TRID.

Gateway translates between external and internal protocols.

Gateway participates in transaction resolution (is a TM in both worlds).

[Figure: "our" transaction manager talks the local protocol to the transaction gateway; the gateway talks through the OSI protocol stack to "foreign" transaction managers. The gateway keeps a trid map table (his trid, our trid).]
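A trid map table can be sketched in Python as a pair of dictionaries. This is only an illustration: the class name, method names, and trid formats are assumptions, and a real gateway would keep the map durable so it survives restart.

```python
import itertools

class TridGateway:
    """Hypothetical gateway mapping foreign ("his") trids to local ("our") trids."""

    def __init__(self):
        self._next_local = itertools.count(1)
        self._to_local = {}     # his trid -> our trid
        self._to_foreign = {}   # our trid -> his trid

    def import_trid(self, foreign_trid):
        """Return the local trid for an incoming foreign trid, allocating one if new."""
        local = self._to_local.get(foreign_trid)
        if local is None:
            local = next(self._next_local)
            self._to_local[foreign_trid] = local
            self._to_foreign[local] = foreign_trid
        return local

    def export_trid(self, local_trid):
        """Translate a local trid back when talking to the foreign TM."""
        return self._to_foreign[local_trid]

gw = TridGateway()
first = gw.import_trid("LU62:abc")
again = gw.import_trid("LU62:abc")   # same foreign trid through the same gateway
other = gw.import_trid("OSI:9")
```

Reusing the existing entry on a repeated import keeps one foreign transaction mapped to one local transaction, at least when it re-enters through the same gateway.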


Mixing Open Transaction Managers

Multiple entry problem: TRID enters the system twice, along two different paths. It "works", but looks like two separate transactions. The commit dependency is external to the system.

Fancy option problem: the external/internal TM has an option the other does not. The gateway fakes (or turns off) optimizations/options not supported by one side or the other.


Outline

Mixing heterogeneous TMs

High-Availability Commit & Transfer of Commit

Optimizing Commit

Disaster Protection via Data/Application Replication


Non-Blocking Commit

The problem: what if the coordinator fails?

Solutions:
1. Wait.
2. Appoint a new coordinator.

Appointment can be thought of as a process pair (n-plex).

Works great in a cluster (no communications failures).

[Figure: commit message flow among Primary, Backup (together a process pair), and Participants. Labels, in the order shown:
 Prepare (+ list of participants and sessions), ack
 Prepare, Prepared
 Commit, ack
 Commit, Committed
 Write Commit Log Record (Log)
 Complete, ack
 Write "Complete" Log Record]
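A minimal Python sketch of the process-pair idea follows. It assumes the primary checkpoints each phase to the backup before acting on it, so that the backup can finish the protocol after a primary failure. All class names, function names, and the `crash_after` switch are hypothetical, and participants always vote yes to keep the sketch short.

```python
class Participant:
    def __init__(self):
        self.state = "active"
    def prepare(self):
        self.state = "prepared"
        return True                      # vote yes in this sketch
    def commit(self):
        self.state = "committed"
    def abort(self):
        self.state = "aborted"

class BackupCoordinator:
    """Remembers the last phase the primary checkpointed."""
    def __init__(self):
        self.phase = None
        self.participants = []
    def checkpoint(self, phase, participants):
        self.phase = phase
        self.participants = list(participants)
    def take_over(self):
        if self.phase == "commit":       # decision was made: finish the commit
            for p in self.participants:
                p.commit()
            return "committed"
        if self.phase == "prepare":      # no decision checkpointed: abort is safe
            for p in self.participants:
                p.abort()
            return "aborted"
        return self.phase or "nothing to do"

def primary_commit(participants, backup, crash_after=None):
    backup.checkpoint("prepare", participants)   # Prepare + participant list, ack
    votes = [p.prepare() for p in participants]
    if crash_after == "prepare":
        return "crashed"
    if not all(votes):
        for p in participants:
            p.abort()
        backup.checkpoint("complete", participants)
        return "aborted"
    backup.checkpoint("commit", participants)    # decision reaches backup first
    if crash_after == "commit-checkpoint":
        return "crashed"
    for p in participants:
        p.commit()
    backup.checkpoint("complete", participants)
    return "committed"

ps = [Participant(), Participant()]
backup = BackupCoordinator()
first_result = primary_commit(ps, backup, crash_after="commit-checkpoint")
takeover_result = backup.take_over()
```

The ordering matters: because the commit decision is checkpointed before any participant hears it, a backup that only knows about "prepare" can abort without contradicting the primary.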


Non-Blocking Commit in a WAN: 3-Phase or Heuristic or Operator Command

A wide area net can partition, so process pairs cannot reliably decide to take over.

Solution(s):

1. Three-phase protocol: broadcast the participant list and decision as part of phase 1.5; let a majority of participants decide if the coordinator fails.

2. Heuristic decisions: default to commit/abort. Announce Heuristic Mismatch at reconnect if wrong guess.

3. Human decision: announce Operator Mismatch at reconnect if wrong guess.
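The mismatch check at reconnect is simple enough to sketch in a few lines of Python. The function name and the string values are assumptions; the slide only names the two announcements.

```python
def reconnect(local_outcome, coordinator_outcome, decided_by="heuristic"):
    """Compare a locally forced outcome with the coordinator's real decision.

    decided_by is "heuristic" (a configured default) or "operator" (a human command).
    """
    if local_outcome == coordinator_outcome:
        return "consistent"
    if decided_by == "heuristic":
        return "Heuristic Mismatch"
    return "Operator Mismatch"

# A participant cut off by a partition defaults to abort; the coordinator had committed.
report = reconnect("abort", "commit", decided_by="heuristic")
```

The guess frees locks during the partition; the price is that a wrong guess can only be announced, not undone, by the protocol.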


Transfer of Commit

What if a participant
  is more secure than the coordinator?
  is more reliable than the coordinator?
  is faster than the coordinator?

Then transfer commit authority to that participant.

[Figure (two panels): Gas Pump, LA Bank, Visa, SF Bank.]


Transfer of Commit

Is also an optimization:
  saves messages if done as part of commit
  called nested commit protocol, or last resource manager optimization
  2 messages vs 5 messages (plus one lazy msg)

[Figure (two timelines): "No Transfer of Commit" and "Transfer of Commit". Labels: Begin, work request, Dequeue, doit, Enqueue, Commit_Work(), Prepare, Phase 2 Commit, Commit, complete. In the transfer case the first message is "work request + You are Root!".]


Transfer of Commit: More Complex Case

More complex if the root has more than one branch:

Need to set up new sessions among "trusted" nodes

root sends new root name to all participants at phase 1

[Figure: nodes in Libya, the US, and Deutschland.]


Outline

Mixing heterogeneous TMs

High-Availability Commit & Transfer of Commit

Optimizing Commit

Disaster Protection via Data/Application Replication


Optimizing Commit

Can optimize:
  Delay: milliseconds/commit
  Message cost: number, size, urgency of messages
  IO cost: number, size, or urgency of IO
  CPU cost: cycles used
  Throughput: maximum commit rate


Commit: the General Case

Prepare():
  1 RPC or message pair per RM and one per non-root TM
  1 forced IO per RM (prepare record)
  1 forced IO per TM (commit record)

Commit(): The same.

Summary of 2PC cost:
  IO: 2(RM+TM)
  RPCs: 2(RM+(TM-1))
  Messages: 4(RM+(TM-1)) (equivalent to RPCs)
  Delay: 2 IO ~ 50ms ~ 10K ins.
         4 msg ~ 20ms ~ 50K ins.
         50ms*(RM+TM) + 20ms*(RM+TM-1)

These are the error-free counts (i.e. the minimum values)
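The summary formulas can be evaluated directly. The short Python sketch below just transcribes them; the function name and parameter names are made up, and the 50 ms and 20 ms constants are the slide's rough figures.

```python
def two_phase_commit_cost(rm, tm, io_pair_ms=50, msg_quad_ms=20):
    """Error-free 2PC cost for `rm` resource managers and `tm` transaction managers."""
    return {
        "ios": 2 * (rm + tm),                  # forced IOs
        "rpcs": 2 * (rm + (tm - 1)),
        "messages": 4 * (rm + (tm - 1)),       # each RPC is a message pair
        "delay_ms": io_pair_ms * (rm + tm) + msg_quad_ms * (rm + tm - 1),
    }

cost = two_phase_commit_cost(rm=2, tm=1)
```

For one TM and two RMs this gives 6 forced IOs, 4 RPCs, 8 messages, and about 190 ms of delay if nothing runs in parallel.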


Commit: Simple Optimizations

Presumed abort saves a TM IO (implicit in protocol above)

Do phase 1 and phase 2 in parallel (saves delay)

Common log (saves RM log forces)

IO: 2(TM)

Messages: 4(RM+TM-1) (equivalent to RPCs)

Delay: 2*IO*TM + 4*M*(RM+TM-1)

~50ms*TM+40ms*(RM+TM-1)

Use Local RPC (10x faster)

~ 50ms*TM + RM + 40ms*(TM-1)

Use WADS for low IO latency (3ms vs 25ms)

~ 6ms*TM + RM + 40ms*(TM-1)

Simple case of 1 TM 2 RM:

~ 8ms delay for a commit.
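The three delay estimates on this slide can be checked with a few lines of Python. One assumption is needed: the bare "RM" term is read as 1 ms per RM, which is the reading that reproduces the 8 ms figure for 1 TM and 2 RMs. The function and stage names are hypothetical.

```python
def commit_delay_ms(tm, rm, stage):
    """Approximate commit delay after each optimization stage on the slide."""
    if stage == "common log":     # ~50ms*TM + 40ms*(RM+TM-1)
        return 50 * tm + 40 * (rm + tm - 1)
    if stage == "local rpc":      # ~50ms*TM + RM + 40ms*(TM-1), RM read as 1 ms each
        return 50 * tm + 1 * rm + 40 * (tm - 1)
    if stage == "wads":           # ~6ms*TM + RM + 40ms*(TM-1)
        return 6 * tm + 1 * rm + 40 * (tm - 1)
    raise ValueError(f"unknown stage: {stage}")

delays = [commit_delay_ms(1, 2, s) for s in ("common log", "local rpc", "wads")]
```

With 1 TM and 2 RMs the estimate drops from 130 ms to 52 ms to 8 ms, so for the single-node case the forced IO dominates until WADS is used.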


Group Commit Optimization

Amortizes IO and messages across several transactions

Adds delay

If N transactions in a group:

IO, Message cost per transaction is ~ 1/N

Small extra delay if one slow step in original path.

As system heats up (commit rate rises to 25 tps), start to install group commit with a 30ms threshold (at 100 tps: 3.3 trans/group).
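Group commit can be sketched as a timer-based batcher. The model below is an assumption, not the slide's: a group opens at the first arrival and is flushed with one log force `threshold_ms` later, so the exact group size depends on how the timer is modeled. The function name is hypothetical.

```python
def group_commits(arrival_ms, threshold_ms=30):
    """Batch commit requests: a group opens at its first arrival and closes
    threshold_ms later; each group costs one log force."""
    groups, current, deadline = [], [], None
    for t in sorted(arrival_ms):
        if current and t >= deadline:   # timer expired: flush the open group
            groups.append(current)
            current = []
        if not current:                 # first arrival starts a new timer
            deadline = t + threshold_ms
        current.append(t)
    if current:
        groups.append(current)
    return groups

arrivals = list(range(0, 120, 10))      # one commit every 10 ms (100 tps)
groups = group_commits(arrivals)
forces_per_transaction = len(groups) / len(arrivals)
```

In this model 12 commits share 4 log forces, which is the 1/N amortization; the cost is that a commit may wait up to the threshold before its group is written.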


Simple Commit Optimizations

Read-only: just get phase 1 call to release locks.
  Note: may violate ACID; should release read locks at phase 2 if any locks acquired during phase 1.
  Saves messages (phase 2) and IO (no RM IO).
  True read-only transaction must prepare at phase 1, unlock at phase 2.

Unjoin: RM does no work at commit/abort.

Lazy: user-requested group commit. Piggybacks on others; no extra IO or messages.


Transaction Commit Trees

[Figure: four commit tree shapes built from TM and RM nodes, each labeled with the applicable optimizations.
 One node: shared log, LRPC
 Deep: transfer commit
 Bush: parallel, transfer
 General case: parallel, transfer]


Transfer of COMMIT: Linear COMMIT

Parent and other sub-trees prepare, then transfer commit authority to remaining child.

Last in chain becomes commit coordinator.

More delay, fewer messages.

For N=2: same delay, 3 vs 4 messages.

Always use it.

[Figure: commit trees of TM and RM nodes illustrating linear commit.]


Outline

Mixing heterogeneous TMs

High-Availability Commit & Transfer of Commit

Optimizing Commit

Disaster Protection via Data/Application Replication


Disaster Recovery at a Remote Site

Replicate data, applications, and network connections at 2 (or more) sites.

Symmetric design: either site can process transactions.

Asymmetric design: one site is master of each data item.
  Allows: caching, batching of updates at backup.

So far, asymmetric design is most popular.

To get symmetry, have each node master 1/2 of the db/net.


Sample Physical LOG RECORD

Basic idea of asymmetric design:

send log from primary to backup

backup applies log to its copy

backup is in constant media recovery

backup processes/sessions/data ready to take over

[Figure: four system pair configurations.
 System pair (basic idea): client session to the primary; the primary sends its log to the backup.
 Symmetric: two system pairs.
 Hub: central site backs up several primaries.
 Vault: backup stores log and archive dumps.]


Sample Physical LOG RECORD

Need some way to decide failure.

Easy in a cluster

Hard in a WAN (partition possible)

Solutions: Extra wires

Wires on demand (dialup)

Human (operator)

Quorum device.

Kind of log?

Logical log is best

loose coupling (allows backup to be a different TM/RM)

failure independence (different from physiological log)


Takeover Logic

/* initialization */

Tell primary I'm here

Setup all RMs and application processes

Open all initial sessions to clients.

/* the main backup loop */

While (not primary) {redo log}

/* Takeover */

redo rest of log

resend most recent message on each session

abort any incomplete transactions

/* Become Primary */

tell application processes to start accepting requests.
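The redo loop and takeover steps can be sketched in Python. This is a simplified illustration: the log record layout, class name, and method names are assumptions, session resend is left out, and the undo logic relies on the primary having serialized conflicting updates with locks.

```python
_MISSING = object()   # marks "key did not exist before this update"

class BackupSite:
    """Hypothetical backup: applies the primary's log, then takes over."""

    def __init__(self):
        self.db = {}
        self.in_flight = {}        # trid -> [(key, old value)] for undo
        self.is_primary = False

    def redo(self, record):
        """Main backup loop body: apply one shipped log record."""
        if record[0] == "update":
            _, trid, key, value = record
            old = self.db.get(key, _MISSING)
            self.in_flight.setdefault(trid, []).append((key, old))
            self.db[key] = value
        elif record[0] == "commit":
            self.in_flight.pop(record[1], None)

    def take_over(self, remaining_log=()):
        for record in remaining_log:                 # redo rest of log
            self.redo(record)
        for trid in list(self.in_flight):            # abort incomplete transactions
            for key, old in reversed(self.in_flight.pop(trid)):
                if old is _MISSING:
                    self.db.pop(key, None)
                else:
                    self.db[key] = old
        self.is_primary = True                       # start accepting requests

backup = BackupSite()
for rec in [("update", "T1", "x", 1), ("commit", "T1"),
            ("update", "T2", "x", 2), ("update", "T2", "y", 5)]:
    backup.redo(rec)
backup.take_over()
```

After takeover only T1's committed update survives; T2 never logged a commit, so its changes are undone before the site starts serving requests.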


Session Takeover

Just like process pairs.

Session sequence numbers eliminate duplicates.

So, get at-least-once delivery: resend msg at takeover.

[Figure (two configurations): clients reach the primary and backup through network switches, or through a front-end switch. Networks: OSI, SNA, TCP/IP, X.25, etc.]
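Duplicate elimination by sequence number can be shown with a tiny Python sketch. The class and method names are hypothetical, and messages are assumed to arrive in order within a session.

```python
class Session:
    """Receiver side of a session: drops messages it has already seen."""

    def __init__(self):
        self.last_seen = 0
        self.delivered = []

    def receive(self, seq, payload):
        if seq <= self.last_seen:     # resend from takeover: already delivered
            return False
        self.last_seen = seq
        self.delivered.append(payload)
        return True

client = Session()
client.receive(1, "reply A")
client.receive(2, "reply B")
was_new = client.receive(2, "reply B")   # new primary resends its most recent message
```

Resending at takeover guarantees the message arrives at least once; the sequence number check on the receiver makes the extra copy harmless.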


Catch-up After Failure

Failed node at restart executes normal restart, then enters backup logic.

If both fail, outside observer must say who is best; backup has to match its log to new primary.

Design issue: are nodes bit-for-bit identical?

If so, backup must “trim” log to match primary.


How Safe?

1-SAFE: no extra delay, risks lost transactions

2-SAFE: extra delay (if backup is up), single fault tolerant, high availability

VERY-SAFE: extra delay, no lost transactions, low availability

[Figure: grid of client / primary / backup commit exchanges.
 Columns: Both Up; Primary Up, Backup Down.
 Rows: 1-Safe, 2-Safe, Very Safe.
 Labels: commit, commit ok, out of service.]
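The three safety levels differ in what the primary does at commit depending on whether the backup is reachable. The Python sketch below encodes one reading of the slide; the function name and return shape are assumptions, and placing "out of service" in the Very Safe, backup-down case is inferred from the "low availability" remark rather than read off the figure.

```python
def commit_policy(mode, backup_up):
    """Return (outcome, waits_for_backup) for a commit at the primary."""
    if mode == "1-safe":
        return ("committed", False)          # never waits; backup may lag
    if mode == "2-safe":
        return ("committed", backup_up)      # waits only if the backup is up
    if mode == "very-safe":
        if backup_up:
            return ("committed", True)
        return ("out of service", False)     # refuses to commit alone
    raise ValueError(f"unknown mode: {mode}")

table = {(m, up): commit_policy(m, up)
         for m in ("1-safe", "2-safe", "very-safe")
         for up in (True, False)}
```

Read across the table: 1-safe trades possible lost transactions for no delay, 2-safe pays the delay only while it buys protection, and very-safe gives up availability to never lose a commit.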


System Pairs vs Replicated Data

System pairs replicate:
  the application DB
  application processes
  sessions

Data replicators only replicate data.

Other aspects left as an exercise for the application designer.


System Pair Benefits

Tolerates faults:
  Hardware
  Environment
  Operations
  Heisenbugs

Can replace software/hardware online.

Can move backup to new building or...

Allows design diversity: backup can be completely different.

Step 1: Both systems are running version V1.
Step 2: Backup is cold-loaded as version V2.
Step 3: SWITCH to Backup.
Step 4: Backup is cold-loaded as version V2.

[Figure: primary/backup version labels for the four steps: Primary V1 and Backup V1; Primary V1 and Backup V2; Backup V1 and Primary V2; Backup V2 and Primary V2.]


Outline

Mixing heterogeneous TMs

High-Availability Commit & Transfer of Commit

Optimizing Commit

Disaster Protection via Data/Application Replication