![Page 1: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/1.jpg)
Data-Intensive Distributed Computing
Part 7: Mutable State (2/2)
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details.
CS 451/651 (Fall 2018)
Jimmy Lin
David R. Cheriton School of Computer Science
University of Waterloo
November 13, 2018
These slides are available at http://lintool.github.io/bigdata-2018f/
![Page 2: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/2.jpg)
The Fundamental Problem
We want to keep track of mutable state in a scalable manner
MapReduce won’t do!
Assumptions:
State organized in terms of logical records
State unlikely to fit on a single machine, must be distributed
![Page 3: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/3.jpg)
Motivating Scenarios
Money shouldn’t be created or destroyed:
Alice transfers $100 to Bob and $50 to Carol
The total amount of money after the transfer should be the same
Phantom shopping cart:
Bob removes an item from his shopping cart…
Item still remains in the shopping cart
Bob refreshes the page a couple of times… item finally gone
![Page 4: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/4.jpg)
Motivating Scenarios
People you don’t want seeing your pictures:
Alice removes mom from list of people who can view photos
Alice posts embarrassing pictures from Spring Break
Can mom see Alice’s photo?
Why am I still getting messages?
Bob unsubscribes from mailing list and receives confirmation
Message sent to mailing list right after unsubscribe
Does Bob receive the message?
![Page 5: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/5.jpg)
Three Core Ideas
Partitioning (sharding): to increase scalability and to decrease latency
Caching: to reduce latency
Replication: to increase robustness (availability) and to increase throughput
Why do these scenarios happen?
Need distributed transactions!
Need replica coherence protocol!
Need cache coherence protocol!
![Page 6: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/6.jpg)
Source: Wikipedia (Cake)
![Page 7: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/7.jpg)
Moral of the story: there’s no free lunch!
Source: www.phdcomics.com/comics/archive.php?comicid=1475
(Everything is a tradeoff)
![Page 8: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/8.jpg)
Three Core Ideas
Partitioning (sharding): to increase scalability and to decrease latency
Caching: to reduce latency
Replication: to increase robustness (availability) and to increase throughput
Why do these scenarios happen?
Need distributed transactions!
Need replica coherence protocol!
Need cache coherence protocol!
![Page 9: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/9.jpg)
Relational Databases
… to the rescue!
Source: images.wikia.com/batman/images/b/b1/Bat_Signal.jpg
![Page 10: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/10.jpg)
How do RDBMSes do it?
Partition tables to keep transactions on a single machine
Example: partition by user
Transactions on a single machine: (relatively) easy!
What about transactions that require multiple machines?
Example: transactions involving multiple users
Solution: Two-Phase Commit
![Page 11: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/11.jpg)
2PC: Sketch
Coordinator: Okay everyone, PREPARE!
Subordinates: YES, YES, YES
Coordinator: Good. COMMIT!
Subordinates: ACK!, ACK!, ACK!
Coordinator: DONE!
![Page 12: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/12.jpg)
2PC: Sketch
Coordinator: Okay everyone, PREPARE!
Subordinates: YES, YES, NO
Coordinator: ABORT!
![Page 13: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/13.jpg)
2PC: Sketch
Coordinator: Okay everyone, PREPARE!
Subordinates: YES, YES, YES
Coordinator: Good. COMMIT!
Subordinates: ACK!, ACK! (only two of the three subordinates reply)
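The three sketches above can be condensed into a small single-process simulation. This is a minimal sketch, not a production protocol: the `Subordinate` class, its `log` list (standing in for a write-ahead log), and the `two_phase_commit` function are hypothetical names, and there is no networking, timeout, or crash recovery.

```python
from dataclasses import dataclass, field


@dataclass
class Subordinate:
    name: str
    will_vote_yes: bool = True
    log: list = field(default_factory=list)  # stand-in for a write-ahead log
    state: str = "INIT"

    def prepare(self) -> bool:
        # Phase 1: record the vote before replying to the coordinator.
        self.state = "PREPARED" if self.will_vote_yes else "ABORTED"
        self.log.append(self.state)
        return self.will_vote_yes

    def commit(self) -> str:
        self.state = "COMMITTED"
        self.log.append(self.state)
        return "ACK"

    def abort(self) -> str:
        self.state = "ABORTED"
        self.log.append(self.state)
        return "ACK"


def two_phase_commit(subordinates) -> str:
    # Phase 1: ask every subordinate to prepare and collect all votes.
    votes = [s.prepare() for s in subordinates]
    if all(votes):
        # Phase 2: unanimous YES, so tell everyone to commit.
        acks = [s.commit() for s in subordinates]
        decision = "COMMIT"
    else:
        # A single NO vote aborts the whole transaction.
        acks = [s.abort() for s in subordinates]
        decision = "ABORT"
    # In this in-process simulation every subordinate always acknowledges.
    assert all(a == "ACK" for a in acks)
    return decision
```

In a real deployment the coordinator would block while waiting for the missing ACK in the third sketch; the simulation deliberately leaves that failure case out.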
![Page 14: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/14.jpg)
2PC: Assumptions and Limitations
Assumptions:
Persistent storage and write-ahead log at every node
WAL is never permanently lost
Limitations:
It’s blocking and slow
What if the coordinator dies?
![Page 15: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/15.jpg)
Three Core Ideas
Partitioning (sharding): to increase scalability and to decrease latency
Caching: to reduce latency
Replication: to increase robustness (availability) and to increase throughput
Why do these scenarios happen?
Need distributed transactions!
Need replica coherence protocol!
Need cache coherence protocol!
![Page 16: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/16.jpg)
Replication possibilities
Update sent to a master:
Replication is synchronous
Replication is asynchronous
Combination of both
Okay, but if the master fails?
Update sent to an arbitrary replica:
Replication is synchronous(?)
Replication is asynchronous
Combination of both
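The difference between synchronous and asynchronous replication through a master can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: `Master`, `Replica`, and `ship_pending` are hypothetical names, replicas never fail, and "shipping" an update is just a method call.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Master:
    def __init__(self, replicas, synchronous: bool):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = deque()  # updates not yet shipped (async mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Acknowledge only after every replica has applied the update.
            for r in self.replicas:
                r.apply(key, value)
        else:
            # Acknowledge immediately; replicas catch up later.
            self.pending.append((key, value))
        return "OK"

    def ship_pending(self):
        """Drain the backlog, as a background replication process would."""
        while self.pending:
            key, value = self.pending.popleft()
            for r in self.replicas:
                r.apply(key, value)
```

With `synchronous=False`, a read from a replica between `write` and `ship_pending` returns the old value, which is the window in which the phantom shopping cart scenario can occur.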
![Page 17: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/17.jpg)
Distributed Consensus
More general problem: addresses replication and partitioning
“Hi everyone, let’s change the value of x.”
“Hi everyone, let’s execute a transaction t.”
Time
… Paxos
![Page 18: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/18.jpg)
Replication possibilities
Update sent to a master:
Replication is synchronous
Replication is asynchronous
Combination of both
Okay, but if the master fails?
Update sent to an arbitrary replica:
Replication is synchronous(?)
Replication is asynchronous
Combination of both
Guaranteed consistency with a consensus protocol
A buggy mess
“Eventual Consistency”
![Page 19: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/19.jpg)
CAP “Theorem” (Brewer, 2000)
Consistency
Availability
Partition tolerance
… pick two
![Page 20: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/20.jpg)
CAP Tradeoffs
CA = consistency + availability
E.g., parallel databases that use 2PC
AP = availability + tolerance to partitions
E.g., DNS, web caching
![Page 21: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/21.jpg)
Wait a sec, that doesn’t sound right!
Source: Abadi (2012) Consistency Tradeoffs in Modern Distributed Database System Design. IEEE Computer, 45(2):37-42
Is this helpful?
CAP is not really even a “theorem” because its definitions are vague
More precise formulation came a few years later
![Page 22: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/22.jpg)
Abadi Says…
CAP says, in the presence of P, choose A or C
But you’d want to make this tradeoff even when there is no P
Fundamental tradeoff is between consistency and latency
Not available = (very) long latency
CP makes no sense!
![Page 23: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/23.jpg)
Move over, CAP
PACELC (“pass-elk”)
PAC: If there’s a partition, do we choose A or C?
ELC: Otherwise, do we choose Latency or Consistency?
![Page 24: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/24.jpg)
At the end of the day…
Guaranteed consistency with a consensus protocol
A buggy mess
“Eventual Consistency”
Sounds reasonable in theory…
What about in practice?
![Page 25: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/25.jpg)
Moral of the story: there’s no free lunch!
Source: www.phdcomics.com/comics/archive.php?comicid=1475
(Everything is a tradeoff)
![Page 26: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/26.jpg)
h = 0, h = 2^n – 1
Machine fails: What happens?
Solution: Replication
N = 3, replicate +1, –1
Covered!
Covered!
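The ring replication idea above (N = 3, replicate to the +1 and –1 neighbours) can be sketched as follows. This is an illustrative sketch only: the `Ring` class, the 32-bit ring size, and the use of SHA-256 as the hash function are assumptions, not details from the slides.

```python
import bisect
import hashlib

RING_BITS = 32  # assumed ring size: positions run from 0 to 2**RING_BITS - 1


def ring_hash(value: str) -> int:
    # Stable hash (Python's built-in hash() is salted per process).
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (2 ** RING_BITS)


class Ring:
    def __init__(self, machines):
        # Place each machine at a point on the ring, sorted by position.
        self.points = sorted((ring_hash(m), m) for m in machines)
        self.positions = [p for p, _ in self.points]

    def owner_index(self, key: str) -> int:
        # First machine at or after the key's position, wrapping around.
        i = bisect.bisect_left(self.positions, ring_hash(key))
        return i % len(self.points)

    def replicas(self, key: str):
        """Owner plus its +1 and -1 ring neighbours (N = 3)."""
        i = self.owner_index(key)
        n = len(self.points)
        return [self.points[(i + d) % n][1] for d in (0, 1, -1)]

    def live_replicas(self, key: str, failed):
        # Machines that can still serve the key after some have failed.
        return [m for m in self.replicas(key) if m not in failed]
```

If the owner of a key fails, `live_replicas` still returns its two neighbours, which is what the "Covered!" labels on the slide indicate.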
![Page 27: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/27.jpg)
Image Source: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
HBase
![Page 28: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/28.jpg)
Three Core Ideas
Partitioning (sharding): to increase scalability and to decrease latency
Caching: to reduce latency
Replication: to increase robustness (availability) and to increase throughput
Why do these scenarios happen?
Need distributed transactions!
Need replica coherence protocol!
Need cache coherence protocol!
![Page 29: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/29.jpg)
Source: www.facebook.com/note.php?note_id=23844338919
Facebook Architecture
MySQL
memcached
Read path:
Look in memcached
Look in MySQL
Populate in memcached
Write path:
Write in MySQL
Remove in memcached
Subsequent read:
Look in MySQL
Populate in memcached
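The read and write paths above follow what is often called the cache-aside pattern. A minimal sketch, with plain dictionaries standing in for MySQL and memcached (the `CachedStore` class and its `db_reads` counter are hypothetical and exist only for illustration):

```python
class CachedStore:
    def __init__(self):
        self.db = {}      # stands in for MySQL
        self.cache = {}   # stands in for memcached
        self.db_reads = 0

    def read(self, key):
        # Read path: look in the cache first.
        if key in self.cache:
            return self.cache[key]
        # Miss: look in the database, then populate the cache.
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key, value):
        # Write path: write the database, then remove the cached entry.
        self.db[key] = value
        self.cache.pop(key, None)
```

Removing the entry on write, rather than updating it, means the next read repopulates the cache from the database, matching the "subsequent read" line on the slide.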
![Page 30: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/30.jpg)
1. User updates first name from “Jason” to “Monkey”.
2. Write “Monkey” in master DB in CA, delete memcached entry in CA and VA.
3. Someone goes to profile in Virginia, read VA replica DB, get “Jason”.
4. Update VA memcache with first name as “Jason”.
5. Replication catches up. “Jason” stuck in memcached until another write!
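The five steps above can be replayed in a toy model to make the stale entry visible. This is a sketch under simplifying assumptions: `DataCenter` is a hypothetical class, replication is simulated by assigning to the Virginia database by hand, and the step numbers in the comments refer to the list above.

```python
class DataCenter:
    def __init__(self, name):
        self.name = name
        self.db = {}
        self.cache = {}

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.db[key]  # populate from the local DB
        return self.cache[key]


ca, va = DataCenter("CA"), DataCenter("VA")
for dc in (ca, va):
    dc.db["first_name"] = "Jason"

# Steps 1-2: write to the master DB in CA, delete the cache entry in both.
ca.db["first_name"] = "Monkey"
ca.cache.pop("first_name", None)
va.cache.pop("first_name", None)

# Steps 3-4: a read in VA hits the lagging replica and caches the old value.
seen_in_va = va.read("first_name")

# Step 5: replication catches up, but nothing invalidates VA's cache.
va.db["first_name"] = "Monkey"
after_catch_up = va.read("first_name")
```

At the end, `va.db` holds "Monkey" while `after_catch_up` is still "Jason": the database is consistent but the cache is not.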
Source: www.facebook.com/note.php?note_id=23844338919
MySQL
memcached
California
MySQL
memcached
Virginia
Replication lag
Facebook Architecture: Multi-DC
![Page 31: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/31.jpg)
Source: www.facebook.com/note.php?note_id=23844338919
= stream of SQL statements
Solution: Piggyback on replication stream, tweak SQL
REPLACE INTO profile (`first_name`) VALUES ('Monkey')
WHERE `user_id`='jsobel' MEMCACHE_DIRTY 'jsobel:first_name'
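The effect of piggybacking on the replication stream can be sketched as follows: each replicated change carries the cache keys it makes dirty, and the replica drops those keys when it applies the change. This is not Facebook's implementation; the dictionary-based `update` record and the `apply_replicated_update` function are hypothetical stand-ins for the modified SQL statement and the replica's apply step.

```python
def apply_replicated_update(db, cache, update):
    """Apply one replicated change, then drop the cache keys it marks dirty."""
    db[update["row"]] = update["value"]
    for key in update["dirty_keys"]:
        cache.pop(key, None)


# Virginia state after the race: DB not yet updated, cache holds the old value.
va_db = {"jsobel:first_name": "Jason"}
va_cache = {"jsobel:first_name": "Jason"}

# One entry of the replication stream, carrying its dirty cache key.
stream = [
    {
        "row": "jsobel:first_name",
        "value": "Monkey",
        "dirty_keys": ["jsobel:first_name"],
    }
]
for update in stream:
    apply_replicated_update(va_db, va_cache, update)
```

Because the invalidation travels with the data, the cache entry is removed only once the local database already has the new value, so the next read repopulates the cache correctly.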
Facebook Architecture: Multi-DC
MySQL
memcached
California
MySQL
memcached
Virginia
Replication
![Page 32: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/32.jpg)
Three Core Ideas
Partitioning (sharding): to increase scalability and to decrease latency
Caching: to reduce latency
Replication: to increase robustness (availability) and to increase throughput
Why do these scenarios happen?
Need distributed transactions!
Need replica coherence protocol!
Need cache coherence protocol!
![Page 33: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/33.jpg)
Source: Google
Now imagine multiple datacenters…
What’s different?
![Page 34: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/34.jpg)
tl;dr -
Implement a global consensus protocol for every transaction
Guarantee consistency, but slow
Eventual consistency
Who knows?
Single row transactions
Easy to implement, obvious limitations
![Page 35: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/35.jpg)
tl;dr -
Implement a global consensus protocol for every transaction
Guarantee consistency, but slow
Eventual consistency
Who knows?
Can we cheat a bit?
Single row transactions
Easy to implement, obvious limitations
![Page 36: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/36.jpg)
tl;dr -
Implement a consensus protocol per partition for every transaction (✗ global)
Guarantee consistency, and fast! (✗ but slow)
Entity groups
Groups of entities that share affinity
Example: user + user’s photos + user’s posts, etc.
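One way to picture entity groups is as a key prefix that decides placement, so that a user and that user's photos and posts always land in the same partition. The sketch below is illustrative only: the tuple key layout, `NUM_PARTITIONS`, and the function names are assumptions, not Megastore's API.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical number of partitions


def entity_group_of(entity_key: tuple) -> str:
    """The first component of a key names its entity group (here, the user)."""
    return entity_key[0]


def partition_of(entity_key: tuple) -> int:
    # Placement depends only on the entity group, not on the full key.
    group = entity_group_of(entity_key)
    digest = hashlib.sha256(group.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def is_single_group(entity_keys) -> bool:
    """True if a transaction over these keys stays inside one entity group."""
    return len({entity_group_of(k) for k in entity_keys}) == 1
```

A transaction for which `is_single_group` is true needs consensus only within one partition; anything spanning groups falls back to the slower cross-group mechanisms.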
![Page 37: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/37.jpg)
Figure 1: Scalable Replication
Figure 2: Operations Across Entity Groups
replicated via Paxos). Operations across entity groups could rely on expensive two-phase commits, but typically leverage Megastore’s efficient asynchronous messaging. A transaction in a sending entity group places one or more messages in a queue; transactions in receiving entity groups atomically consume those messages and apply ensuing mutations.
Note that we use asynchronous messaging between logically distant entity groups, not physically distant replicas. All network traffic between datacenters is from replicated operations, which are synchronous and consistent.
Indexes local to an entity group obey ACID semantics; those across entity groups have looser consistency. See Figure 2 for the various operations on and between entity groups.
2.2.2 Selecting Entity Group Boundaries
The entity group defines the a priori grouping of data for fast operations. Boundaries that are too fine-grained force excessive cross-group operations, but placing too much unrelated data in a single group serializes unrelated writes, which degrades throughput.
The following examples show ways applications can work within these constraints:
Email: Each email account forms a natural entity group. Operations within an account are transactional and consistent: a user who sends or labels a message is guaranteed to observe the change despite possible failover to another replica. External mail routers handle communication between accounts.
Blogs: A blogging application would be modeled with multiple classes of entity groups. Each user has a profile, which is naturally its own entity group. However, blogs are collaborative and have no single permanent owner. We create a second class of entity groups to hold the posts and metadata for each blog. A third class keys off the unique name claimed by each blog. The application relies on asynchronous messaging when a single user operation affects both blogs and profiles. For a lower-traffic operation like creating a new blog and claiming its unique name, two-phase commit is more convenient and performs adequately.
Maps: Geographic data has no natural granularity of any consistent or convenient size. A mapping application can create entity groups by dividing the globe into non-overlapping patches. For mutations that span patches, the application uses two-phase commit to make them atomic. Patches must be large enough that two-phase transactions are uncommon, but small enough that each patch requires only a small write throughput. Unlike the previous examples, the number of entity groups does not grow with increased usage, so enough patches must be created initially for sufficient aggregate throughput at later scale.
Nearly all applications built on Megastore have found natural ways to draw entity group boundaries.
2.2.3 Physical Layout
We use Google’s Bigtable [15] for scalable fault-tolerant storage within a single datacenter, allowing us to support arbitrary read and write throughput by spreading operations across multiple rows.
We minimize latency and maximize throughput by letting applications control the placement of data: through the selection of Bigtable instances and specification of locality within an instance.
To minimize latency, applications try to keep data near users and replicas near each other. They assign each entity group to the region or continent from which it is accessed most. Within that region they assign a triplet or quintuplet of replicas to datacenters with isolated failure domains.
For low latency, cache efficiency, and throughput, the data for an entity group are held in contiguous ranges of Bigtable rows. Our schema language lets applications control the placement of hierarchical data, storing data that is accessed together in nearby rows or denormalized into the same row.
3. A TOUR OF MEGASTORE
Megastore maps this architecture onto a feature set carefully chosen to encourage rapid development of scalable applications. This section motivates the tradeoffs and describes the developer-facing features that result.
3.1 API Design Philosophy
ACID transactions simplify reasoning about correctness, but it is equally important to be able to reason about performance. Megastore emphasizes cost-transparent APIs with runtime costs that match application developers’ intuitions.
Normalized relational schemas rely on joins at query time to service user operations. This is not the right model for Megastore applications for several reasons:
• High-volume interactive workloads benefit more from predictable performance than from an expressive query language.
Source: Baker et al., CIDR 2011
Google’s Megastore
![Page 38: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/38.jpg)
But what if that’s not enough?
![Page 39: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/39.jpg)
Preserving commit order: example schema
Source: Lloyd, 2012
![Page 40: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/40.jpg)
Preserving commit order
Source: Lloyd, 2012
![Page 41: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/41.jpg)
Snapshot MapReduce and queries
Initial state
T1@ts1 INSERT INTO ads VALUES (2, "elkhound puppies")
T2@ts2 INSERT INTO impressions VALUES (US, 2PM, 2)
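The point of the two timestamped transactions above is that a snapshot taken at any single timestamp never shows the impression without the ad it refers to, provided ts1 < ts2. A minimal multi-version sketch (the `VersionedTable` class, the numeric timestamps, and `dangling_impressions` are hypothetical, chosen only to illustrate the idea):

```python
class VersionedTable:
    def __init__(self):
        self.rows = []  # list of (commit_timestamp, row)

    def insert(self, ts, row):
        self.rows.append((ts, row))

    def snapshot(self, ts):
        """Rows committed at or before ts."""
        return [row for commit_ts, row in self.rows if commit_ts <= ts]


ads, impressions = VersionedTable(), VersionedTable()
TS1, TS2 = 10, 20  # assumed commit timestamps with TS1 < TS2

ads.insert(TS1, (2, "elkhound puppies"))   # T1
impressions.insert(TS2, ("US", "2PM", 2))  # T2 references ad id 2


def dangling_impressions(ts):
    """Impressions visible at ts whose ad is not visible at ts."""
    ad_ids = {ad_id for ad_id, _ in ads.snapshot(ts)}
    return [imp for imp in impressions.snapshot(ts) if imp[2] not in ad_ids]
```

Reading both tables at the same timestamp yields an empty `dangling_impressions` result for every choice of timestamp, which is what makes snapshot MapReduce jobs and queries safe.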
Source: Lloyd, 2012
![Page 42: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/42.jpg)
Source: Lloyd, 2012
Google’s Spanner
Features:
Full ACID transactions across multiple datacenters, across continents!
External consistency (= linearizability): system preserves happens-before relationship among transactions
How?
Given write transactions A and B, if A happens-before B, then timestamp(A) < timestamp(B)
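One way to obtain this timestamp ordering, described in the Spanner paper as commit wait, is to pick a timestamp at the upper edge of the clock's uncertainty interval and then wait until that timestamp has definitely passed. The sketch below is a simulation under loud assumptions: `tt_now`, `Interval`, and the `EPSILON` bound are hypothetical stand-ins for TrueTime, and the local monotonic clock plays the role of true time.

```python
import time
from dataclasses import dataclass

EPSILON = 0.005  # assumed clock uncertainty bound, in seconds


@dataclass
class Interval:
    earliest: float
    latest: float


def tt_now() -> Interval:
    """Hypothetical TrueTime-style clock: true time lies within the interval."""
    t = time.monotonic()
    return Interval(t - EPSILON, t + EPSILON)


def commit(apply_write):
    # Choose a timestamp no earlier than the true current time.
    s = tt_now().latest
    # Commit wait: hold the result until s has definitely passed.
    while tt_now().earliest <= s:
        time.sleep(EPSILON / 5)
    apply_write(s)
    return s
```

If transaction B starts only after A's `commit` has returned, B's chosen timestamp is necessarily larger than A's, at the cost of waiting roughly twice the uncertainty bound on every write.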
![Page 43: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/43.jpg)
TrueTime → write timestamps
Source: Lloyd, 2012
![Page 44: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/44.jpg)
Why this works
Source: Lloyd, 2012
![Page 45: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/45.jpg)
TrueTime
Source: Lloyd, 2012
![Page 46: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/46.jpg)
Source: The Matrix
What’s the catch?
![Page 47: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/47.jpg)
Three Core Ideas
Partitioning (sharding): to increase scalability and to decrease latency
Caching: to reduce latency
Replication: to increase robustness (availability) and to increase throughput
Need distributed transactions!
Need replica coherence protocol!
Need cache coherence protocol!
![Page 48: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/48.jpg)
Source: Wikipedia (Cake)
![Page 49: Data-Intensive Distributed Computing - GitHub Pages · Replication possibilities Update sent to a master Replication is synchronous Replication is asynchronous Combination of both](https://reader034.fdocuments.in/reader034/viewer/2022042323/5f0db5647e708231d43bb174/html5/thumbnails/49.jpg)
Moral of the story: there’s no free lunch!
Source: www.phdcomics.com/comics/archive.php?comicid=1475
(Everything is a tradeoff)