Database Replication in Tashkent


Page 1: Database Replication in Tashkent

Database Replication in Tashkent

CSEP 545 Transaction Processing
Sameh Elnikety

Page 2: Database Replication in Tashkent

Replication for Performance
• Expensive
• Limited scalability

Page 3: Database Replication in Tashkent

DB Replication is Challenging
• Single database system
  – Large, persistent state
  – Transactions
  – Complex software
• Replication challenges
  – Maintain consistency
  – Middleware replication

Page 4: Database Replication in Tashkent

Background

[Diagram: Replica 1 is a standalone DBMS]

Page 5: Database Replication in Tashkent

Background

[Diagram: a load balancer in front of Replicas 1, 2, and 3]

Page 6: Database Replication in Tashkent

Read Tx

[Diagram: the load balancer routes read transaction T to one replica]

A read tx does not change the DB state, so it runs at a single replica.

Page 7: Database Replication in Tashkent

Update Tx 1/2

[Diagram: update transaction T executes at one replica and produces a writeset (ws)]

An update tx changes the DB state.

Page 8: Database Replication in Tashkent

Update Tx 1/2

[Diagram: T's writeset (ws) is propagated to all replicas]

An update tx changes the DB state, so T is applied (or committed) everywhere.

Example: T1: { set x = 1 }
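To make the writeset mechanism concrete, here is a minimal toy sketch (my illustration, not the Tashkent implementation): the executing replica captures an update tx's effects as data items, and every other replica applies those items directly instead of re-executing the SQL. `WriteSet` and `Replica` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class WriteSet:
    """Effects of one update tx, captured at the executing replica."""
    tx_id: int
    items: list  # (key, new_value) pairs

@dataclass
class Replica:
    """Toy replica: a dict stands in for the real DBMS state."""
    state: dict = field(default_factory=dict)

    def apply(self, ws: WriteSet) -> None:
        # Apply the writeset directly; no SQL re-execution needed.
        for key, value in ws.items:
            self.state[key] = value

# T1 = { set x = 1 } executes at one replica; its writeset is then
# applied (or committed) at every replica.
replicas = [Replica() for _ in range(3)]
ws1 = WriteSet(tx_id=1, items=[("x", 1)])
for r in replicas:
    r.apply(ws1)
assert all(r.state["x"] == 1 for r in replicas)
```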

Page 9: Database Replication in Tashkent

Update Tx 2/2: Ordering

[Diagram: two update transactions execute at different replicas; each produces a writeset to propagate]

Page 10: Database Replication in Tashkent

Update Tx 2/2: Ordering

[Diagram: both writesets are propagated to every replica]

Example: T1: { set x = 1 }, T2: { set x = 7 }

Commit updates in the same order at every replica; otherwise replicas diverge (here, x must end up 7 everywhere).
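Continuing the toy model above, a two-line illustration of why a total order is needed: if two replicas apply the same writesets in different orders, their states diverge.

```python
# Same writesets, different application orders -> divergent replicas.
r1, r2 = Replica(), Replica()
ws1 = WriteSet(tx_id=1, items=[("x", 1)])
ws2 = WriteSet(tx_id=2, items=[("x", 7)])

r1.apply(ws1); r1.apply(ws2)   # order T1, T2: x == 7
r2.apply(ws2); r2.apply(ws1)   # order T2, T1: x == 1
assert r1.state["x"] == 7 and r2.state["x"] == 1   # divergence
```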

Page 11: Database Replication in Tashkent

Sub-linear Scalability Wall

[Diagram: a fourth replica is added, but every replica must still apply every other replica's writesets, so throughput scales sub-linearly]

Page 12: Database Replication in Tashkent

This Talk
• General scaling techniques
  – Address fundamental bottlenecks
  – Synergistic, implemented in middleware
  – Evaluated experimentally

Page 13: Database Replication in Tashkent

Super-linear Scalability

[Chart: throughput (TPS) by configuration. Speedup over a single DBMS: Single 1x, Base 7x, United 12x, MALB 25x, UF 37x]

Page 14: Database Replication in Tashkent

Big Picture: Let's Oversimplify

[Diagram: a standalone DBMS splits its capacity between reading (R) and updating + logging (U)]

Page 15: Database Replication in Tashkent

Big Picture: Let's Oversimplify

[Diagram: with N replicas the system serves N.R + N.U in total, but each traditional replica performs R + U plus (N-1).ws to apply the other replicas' writesets]

Page 16: Database Replication in Tashkent

Big Picture: Let's Oversimplify

[Diagram: an optimized replica shrinks every term, performing R* + U* + (N-1).ws* instead of R + U + (N-1).ws]

Page 17: Database Replication in Tashkent

Big Picture: Let's Oversimplify

[Diagram: each technique shrinks one term: Uniting Ordering & Durability reduces U to U*, MALB reduces R to R*, and Update Filtering reduces (N-1).ws to (N-1).ws*]
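Spelled out as a back-of-the-envelope capacity model (my formalization of the slide's arithmetic, not a formula from the paper): each replica has fixed capacity C, and applying the other replicas' writesets consumes (N-1).ws of it.

```latex
% Standalone server: all capacity is useful work.
C = R + U
% Traditional replica i of N: part of C goes to remote writesets.
C = R_i + U_i + (N-1)\,ws
% Useful system throughput: sub-linear in N, eventually decreasing.
T(N) \propto N \bigl( C - (N-1)\,ws \bigr)
```

Uniting ordering and durability shrinks U, MALB shrinks R, and update filtering shrinks ws itself, which is how the combination climbs past the sub-linear wall.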

Page 18: Database Replication in Tashkent

Key Points
1. Commit updates in order
   – Traditional: serial synchronous disk writes
   – New: unite ordering and durability
2. Load balancing
   – Traditional: optimize for equal load → memory contention
   – MALB: optimize for in-memory execution
3. Update propagation
   – Traditional: propagate updates everywhere
   – Update filtering: propagate only to where needed

Page 19: Database Replication in Tashkent

Roadmap

[Diagram: 1. Ordering (commit updates in order), then 2. Load balancing and 3. Update propagation]

Page 20: Database Replication in Tashkent

Key Idea
• Traditionally: commit ordering and durability are separated
• Key idea: unite commit ordering and durability

Page 21: Database Replication in Tashkent

All Replicas Must Agree
• All replicas agree on
  – which update txs commit
  – their commit order
• Total order
  – Determined by the middleware
  – Followed by each replica

[Diagram: Tx A and Tx B are made durable at each of the three replicas]

Page 22: Database Replication in Tashkent

Order Outside DBMS

[Diagram: the middleware orders Tx A before Tx B outside the DBMS; each replica provides durability]

Page 23: Database Replication in Tashkent

Order Outside DBMS

[Diagram: the total order A, B is delivered to all three replicas; each replica must commit A before B]

Page 24: Database Replication in Tashkent

Enforce External Commit Order

[Diagram: at a replica, a proxy runs Tx A and Tx B as Task A and Task B against the DBMS's SQL interface; the external order is A before B]

Page 25: Database Replication in Tashkent

Enforce External Commit Order

[Diagram: left on its own, the DBMS may complete B's commit before A's, violating the external order]

Page 26: Database Replication in Tashkent

Enforce External Commit Order

[Diagram: same setup]

Cannot commit A & B concurrently!

Page 27: Database Replication in Tashkent

Enforce Order = Serial Commit

[Diagram: the proxy submits commit A and holds back B]

Page 28: Database Replication in Tashkent

Enforce Order = Serial Commit

[Diagram: only after A's commit completes does the proxy submit commit B; commits are serial]
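As a minimal sketch of what the proxy must do here (illustrative names, assuming one worker thread per transaction; the real middleware is more involved): hold each transaction's commit until every earlier transaction in the external total order has committed.

```python
import threading

class CommitSequencer:
    """Release commits strictly in an externally assigned total order."""

    def __init__(self) -> None:
        self._next = 0                   # sequence number allowed to commit next
        self._cv = threading.Condition()

    def commit_in_order(self, seq: int, do_commit) -> None:
        # Block tx `seq` until all earlier txs have committed...
        with self._cv:
            while seq != self._next:
                self._cv.wait()
        do_commit()                      # ...then commit, strictly serially.
        with self._cv:
            self._next += 1
            self._cv.notify_all()
```

With durability inside the DBMS, each `do_commit` includes a synchronous disk write, so the chain advances one disk write at a time; the next slides show why that is slow.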

Page 29: Database Replication in Tashkent

Commit Serialization is Slow

[Diagram: timeline at one replica. The proxy sends Commit A; the DBMS performs A's durability (a synchronous disk write) and acks; only then Commit B, then Commit C. The CPU sits idle during each disk write]

Page 30: Database Replication in Tashkent

Commit Serialization is Slow

[Diagram: same serial commit timeline]

Problem: durability & ordering are separated → serial disk writes

Page 31: Database Replication in Tashkent

Unite D. & O. in Middleware

[Diagram: durability is turned OFF at the DBMS; the middleware performs durability itself, together with ordering, and acks A, B, C]

Page 32: Database Replication in Tashkent

Unite D. & O. in Middleware

[Diagram: same setup]

Solution: move durability to the middleware. Durability & ordering united in middleware → group commit

Page 33: Database Replication in Tashkent

Implementation: Uniting D & O in MW
• Middleware logs tx effects
  – Durability of update txs guaranteed in middleware
  – Turn durability off at the database
• Middleware performs durability & ordering
  – United → group commit → fast
• Database commits update txs serially
  – Commit = quick main-memory operation
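A minimal sketch of the group-commit side (again illustrative, assuming pickleable writesets and a dedicated flusher thread; not the Tashkent code): the middleware appends pending writesets to its own log and makes a whole batch durable with a single fsync. Each transaction's DB commit, released in order via something like the `CommitSequencer` above, then remains a fast in-memory operation because the database runs with durability off.

```python
import os, pickle, threading

class GroupCommitLog:
    """Middleware durability: one fsync makes a whole batch of txs durable."""

    def __init__(self, path: str) -> None:
        self._f = open(path, "ab")
        self._lock = threading.Lock()
        self._pending = []               # [(seq, writeset, done_event), ...]

    def append(self, seq: int, writeset) -> threading.Event:
        """Queue a tx's effects; caller waits on the event before acking."""
        done = threading.Event()
        with self._lock:
            self._pending.append((seq, writeset, done))
        return done

    def flush(self) -> None:
        """Run by a flusher thread: write and fsync the batch as a group."""
        with self._lock:
            batch, self._pending = self._pending, []
        if not batch:
            return
        for seq, ws, _ in batch:
            self._f.write(pickle.dumps((seq, ws)))
        self._f.flush()
        os.fsync(self._f.fileno())       # one disk write covers the batch
        for _, _, done in batch:
            done.set()                   # every tx in the batch is now durable
```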

Page 34: Database Replication in Tashkent

Uniting Improves Throughput
• Metric: throughput
• Workload: TPC-W Ordering (50% updates)
• System
  – Linux cluster
  – PostgreSQL
  – 16 replicas
  – Serializable execution

[Chart: throughput (TPS): Single 1x, Base 7x, United 12x]

Page 35: Database Replication in Tashkent

Roadmap

[Diagram: 1. Ordering done; next, 2. Load balancing and 3. Update propagation]

Page 36: Database Replication in Tashkent

Key Idea

[Diagram: a load balancer spreads equal load over Replica 1 and Replica 2, each with memory and disk]

Page 37: Database Replication in Tashkent

Key Idea

[Diagram: same two replicas]

Traditional: equal load on replicas.
MALB (Memory-Aware Load Balancing): optimize for in-memory execution.

Page 38: Database Replication in Tashkent

How Does MALB Work?

[Diagram: the database has tables 1, 2, 3; tx type A accesses tables 1 and 2; tx type B accesses tables 2 and 3; a replica's memory holds only two of the three tables]

Page 39: Database Replication in Tashkent

Read Data From Disk

[Diagram: a least-loaded balancer sends the mixed stream A, B, A, B to both replicas, so each replica touches tables 1, 2, and 3, which do not all fit in memory]

Page 40: Database Replication in Tashkent

Read Data From Disk

[Diagram: same setup; both replicas are slow because they keep reading data from disk]

Page 41: Database Replication in Tashkent

Data Fits in Memory

[Diagram: MALB splits the stream A, B, A, B, sending A, A, A, A to Replica 1 and B, B, B, B to Replica 2]

Page 42: Database Replication in Tashkent

Data Fits in Memory

[Diagram: Replica 1 caches tables 1 and 2; Replica 2 caches tables 2 and 3; both run fast, entirely in memory]

Open questions: where does the memory info come from? What about many tx types and replicas?

Page 43: Database Replication in Tashkent

Estimate Tx Memory Needs
• Exploit the tx execution plan
  – Which tables & indices are accessed
  – Their access pattern: linear scan vs. direct access
• Metadata from the database
  – Sizes of tables and indices
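A sketch of the estimation idea (my illustration; the sizes and the 5% hot-fraction constant for direct access are made-up assumptions): a linear scan needs the whole table or index resident, while direct access touches only a small hot fraction.

```python
# Hypothetical metadata from the database: object -> size in MB.
SIZES_MB = {"tbl1": 300, "tbl2": 200, "tbl3": 400, "idx3": 40}

def tx_memory_needs(plan: list[tuple[str, str]]) -> float:
    """Estimate a tx type's working set from its execution plan.

    plan: (object, access_pattern) pairs, pattern in {"scan", "direct"}.
    """
    total = 0.0
    for obj, pattern in plan:
        if pattern == "scan":
            total += SIZES_MB[obj]           # linear scan: whole object is hot
        else:
            total += 0.05 * SIZES_MB[obj]    # direct access: assume ~5% is hot
    return total

# Tx A scans tables 1 and 2; tx B probes table 3 through index idx3.
print(tx_memory_needs([("tbl1", "scan"), ("tbl2", "scan")]))    # 500.0
print(tx_memory_needs([("idx3", "scan"), ("tbl3", "direct")]))  # 60.0
```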

Page 44: Database Replication in Tashkent

Grouping Transactions
• Objective: construct tx groups that fit together in memory
• Bin packing
  – Item: a tx type's memory needs
  – Bin: the memory of a replica
  – Heuristic: Best Fit Decreasing
• Allocate replicas to tx groups
  – Adjust for group loads
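Here is a compact sketch of the Best Fit Decreasing heuristic named on the slide (the numbers are invented for illustration): items are tx types' memory needs, bins are replicas' memory, and each item goes into the fullest bin that still fits.

```python
def best_fit_decreasing(needs: dict[str, float], bin_mb: float) -> list[list[str]]:
    """Pack tx types into groups whose memory needs fit one replica's memory."""
    groups: list[list[str]] = []
    free: list[float] = []                   # remaining memory per group
    # Consider the largest items first (the "decreasing" part)...
    for tx in sorted(needs, key=needs.get, reverse=True):
        # ...and place each into the fullest group that still fits ("best fit").
        fitting = [i for i in range(len(groups)) if free[i] >= needs[tx]]
        if fitting:
            best = min(fitting, key=lambda i: free[i])
            groups[best].append(tx)
            free[best] -= needs[tx]
        else:                                # no group fits: open a new bin
            groups.append([tx])
            free.append(bin_mb - needs[tx])
    return groups

# Six tx types, replicas with 1000 MB of memory (made-up numbers):
needs = {"A": 900, "B": 500, "C": 450, "D": 400, "E": 300, "F": 200}
print(best_fit_decreasing(needs, 1000.0))    # [['A'], ['B', 'C'], ['D', 'E', 'F']]
```

With these illustrative numbers the output mirrors the grouping on the next slides: {A}, {B, C}, {D, E, F}, one replica per group.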

Page 45: Database Replication in Tashkent

MALB in Action

[Diagram: a workload with tx types A, B, C, D, E, F arrives at MALB]

Page 46: Database Replication in Tashkent

MALB in Action

[Diagram: MALB estimates the memory needs of A, B, C, D, E, F]

Page 47: Database Replication in Tashkent

MALB in Action

[Diagram: MALB packs the tx types into groups: {A}, {B, C}, {D, E, F}]

Page 48: Database Replication in Tashkent

MALB in Action

[Diagram: each group gets its own replica: one serves A, one serves B and C, one serves D, E, F]

Page 49: Database Replication in Tashkent

MALB Summary
• Objective: optimize for in-memory execution
• Method
  – Estimate tx memory needs
  – Construct tx groups
  – Allocate replicas to tx groups

Page 50: Database Replication in Tashkent

Experimental Evaluation
• Implementation
  – No change in consistency
  – Still middleware
• Compare
  – United: efficient baseline system
  – MALB: exploits working-set information
• Same environment
  – Linux cluster running PostgreSQL
  – Workload: TPC-W Ordering (50% update txs)

Page 51: Database Replication in Tashkent

MALB Doubles Throughput

[Chart: TPC-W Ordering, 16 replicas. Throughput (TPS): Single 1x, Base 7x, United 12x, MALB 25x; MALB beats United by 105%]

Page 52: Database Replication in Tashkent

MALB Doubles Throughput

[Charts: the same throughput bars, plus read I/O (normalized) for United vs. MALB]

Page 53: Database Replication in Tashkent

Big Gains with MALB

[Table: MALB's throughput gain as memory size and database size vary (MemSize from Big to Small, top to bottom; DBSize from Small to Big, left to right):
4%  0%   29%
48% 105% 45%
182% 75% 12%]

Page 54: Database Replication in Tashkent

Big Gains with MALB

[Table: same numbers, annotated. When the data fits in memory the workload runs from memory and MALB gains little; when it far exceeds memory everything runs from disk and MALB also gains little; the big gains lie in between]

Page 55: Database Replication in Tashkent

Roadmap

[Diagram: 1. Ordering and 2. Load balancing done; next, 3. Update propagation]

Page 56: Database Replication in Tashkent

Key Idea
• Traditional: propagate updates everywhere
• Update filtering: propagate updates to where they are needed
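A minimal sketch of the filtering decision (illustrative names; the real system derives this from MALB's grouping and handles far more detail): the middleware records which tables each replica's tx group uses and forwards a writeset only to the replicas whose table set intersects the updated tables.

```python
# Tables used by each replica's tx group (from the MALB allocation).
REPLICA_TABLES = {
    "replica1": {"tbl1", "tbl2"},    # serves tx group A
    "replica2": {"tbl2", "tbl3"},    # serves tx group B
}

def filter_targets(updated_tables: set[str]) -> list[str]:
    """Return only the replicas whose tx groups use an updated table."""
    return [name for name, used in REPLICA_TABLES.items()
            if used & updated_tables]

print(filter_targets({"tbl1"}))   # ['replica1']             -- filtered out at replica2
print(filter_targets({"tbl2"}))   # ['replica1', 'replica2'] -- needed by both
```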

Page 57: Database Replication in Tashkent

Update Filtering Example

[Diagram: MALB+UF; tx A accesses tables 1 and 2, tx B tables 2 and 3; the mixed stream A, B, A, B arrives]

Page 58: Database Replication in Tashkent

Update Filtering Example

[Diagram: group A goes to Replica 1, which caches tables 1 and 2; group B goes to Replica 2, which caches tables 2 and 3]

Page 59: Database Replication in Tashkent

Update Filtering Example

[Diagram: an update to table 1 arrives]

Page 60: Database Replication in Tashkent

Update Filtering Example

[Diagram: the update to table 1 is applied at Replica 1 only]

Page 61: Database Replication in Tashkent

Update Filtering Example

[Diagram: Replica 2 never applies the update, so its copy of table 1 becomes stale]

Page 62: Database Replication in Tashkent

Update Filtering Example

[Diagram: an update to table 3 arrives]

Page 63: Database Replication in Tashkent

Update Filtering Example

[Diagram: the update to table 3 is applied at Replica 2 only]

Page 64: Database Replication in Tashkent

Update Filtering Example

[Diagram: each replica applies only the updates to tables its tx group uses]

Page 65: Database Replication in Tashkent

Update Filtering in Action

[Diagram: several replicas, each caching a different subset of tables]

Page 66: Database Replication in Tashkent

Update Filtering in Action

[Diagram: an update to the red table is propagated only to the replicas that use the red table]

Page 67: Database Replication in Tashkent

Update Filtering in Action

[Diagram: an update to the green table is propagated only to the replicas that use the green table]

Page 68: Database Replication in Tashkent

Update Filtering in Action

[Diagram: continued; each update reaches only the replicas that need it]

Page 69: Database Replication in Tashkent

Update Filtering in Action

[Diagram: continued]

Page 70: Database Replication in Tashkent

MALB+UF Triples Throughput

[Chart: TPC-W Ordering, 16 replicas. Throughput (TPS): Single 1x, Base 7x, United 12x, MALB 25x, UF 37x; UF beats MALB by 49%]

Page 71: Database Replication in Tashkent

MALB+UF Triples Throughput

[Charts: the same throughput bars, plus propagated updates: 15 under MALB vs. 7 under UF]

Page 72: Database Replication in Tashkent

Filtering Opportunities

[Chart: throughput ratio MALB+UF over MALB: 1.49 under the Ordering mix (50% updates), 1.02 under the Browsing mix (5% updates). Filtering pays off when the workload has many updates to filter]

Page 73: Database Replication in Tashkent

Conclusions
1. Commit updates in order
   – Traditional: serial synchronous disk writes
   – New: unite ordering and durability
2. Load balancing
   – Traditional: optimize for equal load → memory contention
   – MALB: optimize for in-memory execution
3. Update propagation
   – Traditional: propagate updates everywhere
   – Update filtering: propagate only to where needed