Google Megastore & Google Spanner


Page 1: Google Megastore & Google Spanner - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/... · 2019-05-13 · databases. Can’t it just be easy and scalable and reliable?



The Problem

I have a great app idea, but I don’t want to learn much about databases. Can’t it just be easy and scalable and reliable?


Because that’s impossible!

[Diagram labels: Consistency, Availability, Scalability; NoSQL, RDBMS]


Google Megastore

Highly Scalable
Rapid Development
Low Latency
Consistent
Highly Available


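BigTable, Google’s distributed KV store

[Figure: one BigTable row with the columns “contents:”, “anchor:cnnsi.com” and “anchor:my.look.ca”. The cells hold “<html>...”, “CNN” and “CNN.com”, each version tagged with a timestamp (t3, t4, t5, t8, t9).]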
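
The data model in the figure can be sketched as a sparse map from (row, column, timestamp) to a value. This is a minimal in-memory illustration, not BigTable's API: the class name `VersionedTable`, the row key `com.cnn.www`, and the pairing of timestamps with cells are all assumptions made for the example.

```python
from collections import defaultdict


class VersionedTable:
    """Toy sparse map: (row, column) -> {timestamp: value}."""

    def __init__(self):
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        # Each write adds a new version instead of overwriting the old one.
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get((row, column), {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        if not eligible:
            return None
        return versions[max(eligible)]


table = VersionedTable()
# Hypothetical row key and timestamps, chosen only to resemble the figure.
table.put("com.cnn.www", "contents:", 3, "<html>...v1")
table.put("com.cnn.www", "contents:", 5, "<html>...v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com")
```

Reading without a timestamp returns the latest version of a cell; passing a timestamp reads the cell as of that time.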


[Figure: Megastore layered on BigTable and GFS. The same entity groups (2, 14, 37...; 58, 90, 102...; 700, 706...) appear three times, under the labels “Entity Groups” and “Datacenters”.]


Paxos Basics - Read and Write


Read, Phase 1 - Reading replica polls other replicas


Read, Phase 2 - Available replicas respond with their values
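
The two read phases can be sketched as a poll followed by a majority check. This is a simplified illustration under assumptions the slides do not state: each replica carries a version number, the reader needs answers from a majority, and the highest version wins. The names `Replica` and `quorum_read` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    name: str
    version: int
    value: str
    available: bool = True

    def respond(self) -> Optional[Tuple[int, str]]:
        # Phase 2: only available replicas answer the poll.
        return (self.version, self.value) if self.available else None


def quorum_read(replicas: List[Replica]) -> str:
    # Phase 1: the reading replica polls every replica (itself included).
    responses = [r.respond() for r in replicas]
    answers = [resp for resp in responses if resp is not None]
    # Assumption: a read is only trusted if a majority answered.
    if len(answers) <= len(replicas) // 2:
        raise RuntimeError("no majority of replicas responded")
    # Assumption: the answer with the highest version is the current value.
    return max(answers, key=lambda answer: answer[0])[1]
```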

[Figure: eight replicas, each labeled 0]

Write, Prepare - Writing replica asks other replicas to conduct a vote

[Figure: eight replicas, each labeled 0]

Write, Promise - Available replicas promise to ignore lower proposals

[Figure: eight replicas, each labeled “0, 3”]

Write, Accept - Writing replica proposes its value


Write, Commit - Available replicas accept the value
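
The four write steps can be sketched as single-value Paxos, using the slides' step names in the comments. This is a minimal in-process illustration, not Megastore's implementation: the `Acceptor` class, the `write` function, and the use of direct method calls in place of network messages are assumptions.

```python
class Acceptor:
    """One replica's voting state for a single value."""

    def __init__(self, available=True):
        self.available = available
        self.promised = -1      # highest proposal number promised so far
        self.accepted = None    # (proposal number, value) or None

    def prepare(self, n):
        # Promise: agree to ignore proposals numbered below n.
        if not self.available or n <= self.promised:
            return None
        self.promised = n
        return ("promise", self.accepted)

    def accept(self, n, value):
        # Commit: accept the value unless a higher proposal was promised.
        if not self.available or n < self.promised:
            return False
        self.promised = n
        self.accepted = (n, value)
        return True


def write(acceptors, n, value):
    """Run one proposal; return the chosen value, or None if it failed."""
    majority = len(acceptors) // 2 + 1
    # Prepare: ask every replica to vote on proposal n.
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # If some replica already accepted a value, the proposer must reuse it.
    earlier = [acc for _, acc in promises if acc is not None]
    if earlier:
        value = max(earlier, key=lambda acc: acc[0])[1]
    # Accept: propose the value to every replica.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

With five acceptors of which one is unavailable, `write(acceptors, 0, 3)` still succeeds because four promises are a majority. A later `write(acceptors, 1, 7)` returns 3 rather than 7, since the earlier value was already accepted.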

Paxos Basics - Write Conflicts

[Figure: eight replicas, three labeled 0 and five labeled 1]

Prepare - 2 writing replicas want to make proposals

[Figure: eight replicas, three labeled 0 and five labeled 1]

Promise

More prepares - Lower-numbered proposals get rejected / replaced

[Figure: eight labels reading 1 alongside four labels reading 0]

More Promises

[Figure: eight replicas, each labeled 1]


Accept - Only 1 replica gets the majority of promises required

[Figure: eight replicas, each labeled “1, 3”]


Commit
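
The conflict walkthrough can be replayed step by step. The sketch below assumes eight replicas and reads the figure labels as a proposal number followed by a value (so “1, 3” means proposal 1 carrying value 3); both readings are assumptions, as are the function names and the exact message interleaving.

```python
def prepare(acceptor, n):
    # A replica promises only if n is higher than anything it promised before.
    if n > acceptor["promised"]:
        acceptor["promised"] = n
        return True
    return False


def accept(acceptor, n, value):
    # A replica accepts unless it has promised a higher proposal.
    if n >= acceptor["promised"]:
        acceptor["promised"] = n
        acceptor["accepted"] = (n, value)
        return True
    return False


def run_conflict():
    acceptors = [{"promised": -1, "accepted": None} for _ in range(8)]
    majority = len(acceptors) // 2 + 1  # 5 of 8

    # Prepare / Promise: writer A (proposal 0) reaches three replicas,
    # writer B (proposal 1) reaches the other five.
    promises_a = sum(prepare(a, 0) for a in acceptors[:3])
    promises_b = sum(prepare(a, 1) for a in acceptors[3:])

    # More prepares: A's lower-numbered proposal is rejected by B's replicas,
    # while B's higher-numbered proposal replaces A's on the first three.
    promises_a += sum(prepare(a, 0) for a in acceptors[3:])
    promises_b += sum(prepare(a, 1) for a in acceptors[:3])

    # Accept / Commit: only the writer holding a majority may propose a value.
    acks_b = 0
    if promises_b >= majority:
        acks_b = sum(accept(a, 1, 3) for a in acceptors)

    return {
        "promises_a": promises_a,
        "promises_b": promises_b,
        "a_has_majority": promises_a >= majority,
        "committed": acks_b >= majority,
        "accepted": [a["accepted"] for a in acceptors],
    }
```

Writer A ends with three promises, short of the five it needs, so only writer B proceeds to the accept step.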


How could Paxos be made more efficient?


Local Reads via a “coordinator”


Local replica up-to-date


Local replica out-of-date


Local replica out-of-date - normal Paxos read


Local replica out-of-date - update replica and coordinator
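
The coordinator cases above can be sketched as a read function with a fast path and a slow path. This is an illustrative model only: replicas are plain dictionaries mapping an entity group to a (version, value) pair, an unavailable replica is represented as `None`, and the names `Coordinator` and `read` are hypothetical.

```python
class Coordinator:
    """Tracks the entity groups for which the local replica is up to date."""

    def __init__(self):
        self.up_to_date = set()


def read(group, coordinator, local, remotes):
    """Return (value, path), where path is "local" or "quorum"."""
    # Local replica up-to-date: answer without contacting other replicas.
    if group in coordinator.up_to_date:
        return local[group][1], "local"

    # Local replica out-of-date: fall back to a normal majority read.
    stores = [local, *remotes]
    answers = [s[group] for s in stores if s is not None and group in s]
    if len(answers) <= len(stores) // 2:
        raise RuntimeError("no majority available")
    newest = max(answers, key=lambda answer: answer[0])

    # Update the local replica and tell the coordinator it has caught up.
    local[group] = newest
    coordinator.up_to_date.add(group)
    return newest[1], "quorum"
```

The first read of a stale group takes the quorum path and repairs the local copy; the next read of the same group is served locally. A complete design would also need to remove a group from `up_to_date` whenever the local replica misses a write, which this sketch leaves out.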

[Figure: a replica labeled b; eight replicas, each labeled “0, 3, b”]

Faster Writes: Accept & Prepare in 1 communication

Writing replica requests to be proposal 0

[Figure: labels b and 8]

Immediate Accept & Prepare for next round

[Figure: a replica labeled b; eight replicas, each labeled “0, 8, c”]
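
The fast-write idea can be sketched as follows, on the assumption that the labels “0, 8, c” stand for proposal number 0, value 8, and c as the leader for the next round. The leader hands proposal 0 to one writer, who can then skip the prepare step; any other writer falls back to the ordinary two-step path. The `Leader` class, the `write` function, and the fixed fallback proposal number 1 are simplifications for illustration.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0       # proposal 0 is implicitly promised in advance
        self.accepted = None    # (proposal number, (value, next leader))

    def prepare(self, n):
        if n <= self.promised:
            return None
        self.promised = n
        return ("promise", self.accepted)

    def accept(self, n, entry):
        if n < self.promised:
            return False
        self.promised = n
        self.accepted = (n, entry)
        return True


class Leader:
    """Hands out proposal 0 to the first writer that asks for it."""

    def __init__(self):
        self.zero_owner = None

    def grant_proposal_zero(self, writer):
        if self.zero_owner is None:
            self.zero_owner = writer
        return self.zero_owner == writer


def write(acceptors, leader, writer, value, next_leader):
    """Return (rounds of replica communication, chosen entry or None)."""
    majority = len(acceptors) // 2 + 1
    entry = (value, next_leader)   # the next round's leader rides along
    rounds, n = 0, 0
    if not leader.grant_proposal_zero(writer):
        # Slow path: a separate prepare round with a higher proposal number.
        n = 1  # simplification: real writers would pick distinct numbers
        rounds += 1
        promises = [p for p in (a.prepare(n) for a in acceptors) if p]
        if len(promises) < majority:
            return rounds, None
        earlier = [acc for _, acc in promises if acc is not None]
        if earlier:
            entry = max(earlier, key=lambda acc: acc[0])[1]
    rounds += 1
    acks = sum(a.accept(n, entry) for a in acceptors)
    return rounds, (entry if acks >= majority else None)
```

The writer that wins proposal 0 finishes in one round. A competing writer needs two rounds and ends up confirming the value that was already accepted.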


Megastore Performance Analysis


Performance Analysis

Our experimental setup is that we’ve used Megastore for 100+ applications for several years and it works!


What are some drawbacks to this solution?


Google Megastore

Highly Scalable
Rapid Development
Low Latency
Consistent
Highly Available


Google Spanner

Highly Scalable
Rapid Development
Low Latency
Consistent
Highly Available


What time is it?

TT.now() → TTinterval: a lower bound and an upper bound around the absolute time

[Figure: timeline from 2016-02-10, 4:35:30 to 2016-02-10, 4:35:33 marking the lower bound, the absolute time, and the upper bound]
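
The interval idea can be sketched with an ordinary clock and a fixed uncertainty. This is a toy model, not Spanner's TrueTime: the `epsilon` parameter, the class names, and the injectable `clock` are assumptions. The `commit_wait` helper illustrates the commit-wait rule from Spanner's published design (hold a timestamp until it is definitely in the past), which the slide itself does not show.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # lower bound on the absolute time
    latest: float    # upper bound on the absolute time


class TrueTime:
    def __init__(self, epsilon, clock=time.time):
        self.epsilon = epsilon  # assumed clock uncertainty, in seconds
        self.clock = clock

    def now(self):
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        """True once t has definitely passed on every clock."""
        return self.now().earliest > t


def commit_wait(tt, sleep=time.sleep):
    """Pick a timestamp, then wait until it is guaranteed to be in the past."""
    s = tt.now().latest
    while not tt.after(s):
        sleep(tt.epsilon / 10)
    return s
```

With this model the wait lasts roughly twice `epsilon`, so a tighter clock bound translates directly into shorter waits.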

Improved Paxos: Serialization with Locking

[Figure: eight replicas, each labeled timestamp_0]


Improved Paxos: Long Leader Leases

[Figure: replicas labeled ts_0, followed by replicas labeled ts_1]


Spanner Performance Analysis


Setup


Next steps?


Supporting more complex SQL queries with an underlying key-value structure

[Figure: a key-value table (key, value) next to a relational table (ID, name, age, address, ...)]
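
One way to picture a relational row on top of key-value pairs is to give every column of every row its own key. The sketch below is a hypothetical encoding for illustration only: the `table/ID/column` key layout and the function names are assumptions, and a filtered query is answered with a prefix scan plus an in-memory filter.

```python
def row_to_pairs(table, row):
    """Flatten one relational row into key-value pairs keyed by table/ID/column."""
    pk = row["ID"]
    return {f"{table}/{pk}/{col}": val for col, val in row.items() if col != "ID"}


def select(store, table, where=lambda row: True):
    """Rebuild rows from a prefix scan, then apply a WHERE-style filter."""
    rows = {}
    prefix = f"{table}/"
    for key in sorted(store):  # sorted keys keep each row's columns adjacent
        if not key.startswith(prefix):
            continue
        _, pk, col = key.split("/", 2)
        rows.setdefault(pk, {"ID": pk})[col] = store[key]
    return [row for row in rows.values() if where(row)]


store = {}
store.update(row_to_pairs("users", {"ID": "1", "name": "Ada", "age": 41, "address": "London"}))
store.update(row_to_pairs("users", {"ID": "2", "name": "Bob", "age": 29, "address": "Boston"}))
```

A query such as `select(store, "users", where=lambda r: r["age"] > 30)` then behaves like `SELECT * FROM users WHERE age > 30`, at the cost of scanning every key under the table prefix.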


The Future

[Diagram labels: SQL, NoSQL, NewSQL]