Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC...
![Page 1: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/1.jpg)
Probabilistically Consistent
Indranil Gupta (Indy)
Department of Computer Science, UIUC
indy@illinois.edu
FuDiCo 2015
DPRG: http://dprg.cs.uiuc.edu
![Page 2: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/2.jpg)
Joint Work With
• Muntasir Rahman (Graduating PhD Student)
• Luke Leslie, Lewis Tseng
• Mayank Pundir (MS, now at Facebook)
• Work funded by Air Force Research Labs/AFOSR, National Science Foundation, Google, Yahoo!, and Microsoft
![Page 3: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/3.jpg)
Hard Choices in Extensible Distributed Systems
• Users in extensible distributed systems desire
  • Timeliness and Correctness Guarantees
• But these are at odds with…
  • Unpredictability
  • Network Delays and Failures
• Research community and industry often tend to translate this into hard choices in systems design
• Examples
  1. CAP Theorem: choice between consistency and availability (or latency)
    • Either relational databases or eventually consistent NoSQL stores
    • (Maybe a convergence now?)
  2. Always get 100% answers in computation engines (batch or stream)
    • Use checkpointing
![Page 4: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/4.jpg)
Hard Choices… Can in fact be Probabilistic Choices!
• Many of these are in fact probabilistic choices
  • One of the earliest examples: pbcast/Bimodal Multicast
• Examples
  1. CAP Theorem:
    • We derive a probabilistic CAP theorem that defines an achievable boundary between consistency and latency in any database system
    • We use this to incorporate probabilistic consistency and latency SLAs into Cassandra and Riak
  2. Always get 100% answers in computation engines (batch or stream)
    • In many systems, checkpointing results in 8-31x higher execution time!
    • We show that in systems like distributed graph processing systems
      • We can avoid checkpointing altogether
      • Instead, have a reactive approach: upon failure, reactively scrounge state (naturally replicated)
      • And achieve very high accuracy (95-99%)
![Page 5: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/5.jpg)
Key-value/NoSQL Storage Systems
• Key-value/NoSQL stores: $3.4B sector by 2018
• Distributed storage in the cloud
  • Netflix: video position (Cassandra)
  • Amazon: shopping cart (DynamoDB)
  • And many others
• Necessary API operations: get(key) and put(key, value)
  • And some extended operations, e.g., “CQL” in Cassandra key-value store
![Page 6: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/6.jpg)
Key-value/NoSQL Storage: Fast and Fresh
• Cloud clients expect both
  • Latency: Low latency for all operations (reads/writes)
    • 500ms latency increase at Google.com costs 20% drop in revenue
    • each extra ms -> $4 M revenue loss
    • Long latency -> User Cognitive Drift
  • Consistency: read returns value of one of latest writes
    • Freshness of data means accurate tracking and higher user satisfaction
    • Most KV stores only offer weak consistency (Eventual consistency)
    • Eventual consistency = if writes stop, all replicas converge, eventually
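The definition of eventual consistency above can be illustrated with a toy convergence step. This is a minimal sketch, assuming each replica is a dict mapping a key to a (timestamp, value) pair and that conflicts are resolved by last-writer-wins; the function name `anti_entropy` and the merge rule are illustrative assumptions, not a description of how Cassandra or Riak reconcile replicas.

```python
def anti_entropy(replicas):
    """Merge replicas so each holds the highest-timestamp value per key.

    Assumption: last-writer-wins on an integer timestamp (hypothetical rule).
    """
    merged = {}
    for rep in replicas:
        for key, (ts, val) in rep.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, val)
    # Once writes have stopped, one full merge makes all replicas identical.
    for rep in replicas:
        rep.clear()
        rep.update(merged)
    return merged


r1 = {"cart": (1, ["book"])}
r2 = {"cart": (2, ["book", "pen"])}
r3 = {}
anti_entropy([r1, r2, r3])
print(r1 == r2 == r3)  # True
print(r1["cart"])      # (2, ['book', 'pen'])
```

Before the merge, a read served by r1 or r3 would be stale; the guarantee only says the replicas agree eventually, with no bound on when.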
![Page 7: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/7.jpg)
Hard vs. Soft Partitions
• CAP Theorem looks at hard partitions
• However, soft partitions may happen inside a data-center
  • Periods of elevated message delays
  • Periods of elevated loss rates
• Soft partitions are more frequent
[Diagram: hard partition between Data-center 1 (America) and Data-center 2 (Europe); inside a data-center, congestion at switches (ToR, CoreSw) => soft partition]
![Page 8: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/8.jpg)
Our work: From Impossibility to Possibility
• C -> Probabilistic C (Consistency)
• A -> Probabilistic A (Latency)
• P -> Probabilistic P (Partition Model)
• A probabilistic CAP theorem
• A system that validates how close we are to the achievable envelope
• (Goal is not: another consistency model, or NoSQL vs New/Yes SQL)
![Page 9: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/9.jpg)
Probabilistic CAP
[Figure: timeline showing writes W(1), W(2) and read R(1), with freshness interval tc]
• Probabilistic Consistency (pic, tc)
  • A read is tc-fresh if it returns the value of a write that starts at most tc time before the read
  • pic is the likelihood that a read is NOT tc-fresh
• Probabilistic Latency (pua, ta)
  • pua is the likelihood that a read DOES NOT return an answer within ta time units
• Probabilistic Partition (α, tp)
  • α is the likelihood that a random path (client -> server -> client) has message delay exceeding tp time units
• PCAP Theorem: Impossible to achieve both Probabilistic Consistency and Latency under Probabilistic Partitions if:
  tc + ta < tp and pua + pic < α
• Bad network -> High (α, tp)
• To get better consistency -> lower (pic, tc)
• To get better latency -> lower (pua, ta)
• Special case: Original CAP has α = 1 and tp = ∞
Full proof in our arXiv paper: http://arxiv.org/abs/1509.02464
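The impossibility condition of the PCAP theorem can be checked mechanically. A minimal sketch, assuming all times use the same unit; the function name `pcap_impossible` and the sample numbers are illustrative and not taken from the talk or the paper.

```python
def pcap_impossible(tc, ta, tp, pic, pua, alpha):
    """Return True if the PCAP theorem rules out the requested targets.

    (pic, tc): consistency target, (pua, ta): latency target,
    (alpha, tp): partition model. All times share one unit (e.g. ms).
    """
    return (tc + ta < tp) and (pua + pic < alpha)


# Hypothetical network: 30% of client -> server -> client paths exceed 50 ms.
print(pcap_impossible(tc=10, ta=20, tp=50, pic=0.1, pua=0.1, alpha=0.3))  # True
print(pcap_impossible(tc=40, ta=20, tp=50, pic=0.1, pua=0.1, alpha=0.3))  # False

# Original CAP as the special case alpha = 1, tp = infinity.
print(pcap_impossible(tc=10, ta=20, tp=float("inf"),
                      pic=0.0, pua=0.0, alpha=1.0))  # True
```

A False result only means this theorem does not rule the targets out; it does not by itself show that a given system achieves them.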
![Page 10: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/10.jpg)
Towards Probabilistic SLAs
• Latency SLA: Similar to latency SLAs already existing in industry.
  • Meet a desired probability that client receives operation’s result within the timeout
  • Maximize freshness probability within given freshness interval
• Example: Amazon shopping cart
  • Doesn’t want to lose customers due to high latency
  • Only 10% operations can take longer than 300ms
    • SLA: (pua, ta) = (0.1, 300ms)
  • Minimize staleness (don’t want customers to lose items)
    • Minimize: pic (Given: tc)
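A latency SLA of the form (pua, ta) can be checked against measured operation latencies. The sketch below assumes latencies are collected in milliseconds; the helper names `empirical_pua` and `meets_latency_sla` and the sample data are hypothetical.

```python
def empirical_pua(latencies_ms, ta_ms):
    """Fraction of operations that did not complete within ta_ms."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    late = sum(1 for x in latencies_ms if x > ta_ms)
    return late / len(latencies_ms)


def meets_latency_sla(latencies_ms, pua_target, ta_ms):
    """True if the observed miss fraction is within the SLA's pua."""
    return empirical_pua(latencies_ms, ta_ms) <= pua_target


# Hypothetical samples: 1 of 10 operations exceeds 300 ms.
samples = [120, 80, 95, 310, 150, 200, 60, 299, 250, 100]
print(empirical_pua(samples, 300))           # 0.1
print(meets_latency_sla(samples, 0.1, 300))  # True
```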
![Page 11: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/11.jpg)
Towards Probabilistic SLAs (2)
• Consistency SLA: Goal is to
  • Meet a desired freshness probability (given freshness interval)
  • Maximize probability that client receives operation’s result within the timeout
• Example: Google search application/Twitter search
  • Wants users to receive “recent” data as search results
  • Only 10% results can be more than 5 min stale
    • SLA: (pic, tc) = (0.1, 5 min)
  • Minimize response time (fast response to query)
    • Minimize: pua (Given: ta)
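A consistency SLA of the form (pic, tc) can be estimated from a trace of reads. The sketch below takes the tc-fresh definition literally: each trace entry records when the read started and when the write whose value it returned started. The trace format, the helper names, and the data are assumptions made for illustration.

```python
def is_tc_fresh(read_start, returned_write_start, tc):
    """A read is tc-fresh if the returned write started at most tc before it."""
    return read_start - returned_write_start <= tc


def empirical_pic(reads, tc):
    """Fraction of reads that are NOT tc-fresh.

    reads: list of (read_start, returned_write_start) pairs, same time unit as tc.
    """
    if not reads:
        raise ValueError("need at least one read")
    stale = sum(1 for r, w in reads if not is_tc_fresh(r, w, tc))
    return stale / len(reads)


# Hypothetical trace in seconds; tc = 300 s (5 min). Only the second read is stale.
trace = [(1000, 900), (1200, 800), (1500, 1490), (2000, 1750), (2400, 2399),
         (3000, 2950), (3300, 3100), (3600, 3590), (4000, 3800), (4500, 4400)]
print(empirical_pic(trace, 300))  # 0.1
```

A write that starts after the read starts yields a negative difference and counts as fresh, which matches the PCAP metric allowing a write and a read to overlap.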
![Page 12: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/12.jpg)
Meeting these SLAs: PCAP Systems
[Diagram: the PCAP system applies adaptive control to the control knobs of a KV-store (Cassandra, Riak) so that it satisfies the PCAP SLA]

| Increased Knob | Latency | Consistency |
|---|---|---|
| Read Delay | Degrades | Improves |
| Read Repair Rate | Unaffected | Improves |
| Consistency Level | Degrades | Improves |

Continuously adapt control knobs to always satisfy PCAP SLA
System assumptions:
• Client sends query to coordinator server, which then forwards to replicas (answers take the reverse path)
• There exist background mechanisms to bring stale replicas up to date
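To make the knob-tuning idea concrete, here is a minimal sketch of a search over the read-delay knob for a consistency SLA. It is not the controller used in the PCAP system: the additive step rule, the function names, and the toy model of how pic responds to read delay are all assumptions for illustration. It relies only on the direction stated above, namely that increasing read delay improves consistency and degrades latency.

```python
def tune_read_delay(measure_pic, pic_target, step_ms=1.0, max_delay_ms=50.0):
    """Return the smallest read delay (on a step grid) whose measured pic
    meets the target, so that latency is degraded as little as possible."""
    delay_ms = 0.0
    while delay_ms <= max_delay_ms:
        if measure_pic(delay_ms) <= pic_target:
            return delay_ms
        delay_ms += step_ms  # trade some latency for better consistency
    raise RuntimeError("SLA not reachable within max_delay_ms")


def toy_pic(delay_ms):
    """Hypothetical stand-in for a live measurement of pic on the store."""
    return max(0.0, 0.5 - 0.05 * delay_ms)


print(tune_read_delay(toy_pic, pic_target=0.135))  # 8.0
```

A real deployment would rerun such a loop continuously, because network conditions and therefore the measured pic change over time.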
![Page 13: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/13.jpg)
Meeting Consistency SLA for PCAP Cassandra (pic=0.135)
Consistency always below target SLA
Setup
• 9 server Emulab cluster: each server has 4 Xeon + 12 GB RAM
• 100 Mbps Ethernet
• YCSB workload (144 client threads)
• Network delay: Log-normal distribution [Benson 2010]
[Figure panels: mean latency = 3 ms | 4 ms | 5 ms]
![Page 14: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/14.jpg)
Meeting Consistency SLA for PCAP Cassandra (pic=0.135)
Optimal envelopes under different Network conditions (based on PCAP theorems)
PCAP system satisfies SLA and is close to the optimal envelope
![Page 15: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/15.jpg)
Geo-Distributed PCAP
N(20, sqrt(2)) | N(22, sqrt(2.2))
Latency SLA met before and after jump
Consistency degrades after delay jump
Fast convergence initially, and after delay jump
Reduced oscillation, compared to multiplicative controller
PCAP multiplicative controller
![Page 16: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/16.jpg)
Related Work
• Pileus/Tuba [Doug Terry et al]
  • Utility-based SLAs
  • Focus on wide-area
  • Can be used underneath our PCAP system (instead of our SLAs)
• Consistency Metrics: PBS [Peter Bailis et al]
  • Considers write end time (we consider write start time)
  • May not be able to define consistency for some read-write pairs (PCAP accommodates all combinations)
  • Can use it in PCAP system
• Approximate answers: Hadoop [ApproxHadoop], Querying [BlinkDB], Bimodal multicast
![Page 17: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/17.jpg)
PCAP Summary
• CAP Theorem motivated NoSQL Revolution
• But apps need freshness + fast responses
  • Under soft partition
• We proposed
  • Probabilistic models for C, A, P
  • Probabilistic CAP theorem – generalizes classical CAP
  • PCAP system satisfies Latency/Consistency SLAs
  • Integrated into Apache Cassandra and Riak KV stores
• Riak has expressed interest in incorporating these into their mainline code
![Page 18: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/18.jpg)
Distributed Graph Processing and Checkpointing
• Checkpointing: Proactively save state to persistent storage
  • If there’s a failure, recover 100% cost
  • Used by:
    • PowerGraph [Gonzalez et al. OSDI 2012]
    • Giraph [Apache Giraph]
    • Distributed GraphLab [Low et al. VLDB 2012]
    • Hama [Seo et al. CloudCom 2010]
![Page 19: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/19.jpg)
Checkpointing Bad
| Graph Dataset | Vertex Count | Edge Count |
|---|---|---|
| CA-Road | 1.96 M | 2.77 M |
| Twitter | 41.65 M | 1.47 B |
| UK Web | 105.9 M | 3.74 B |

8 – 31x Increased Per-Iteration Execution Time
![Page 20: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/20.jpg)
Users Already Don’t (Use or Like) Checkpointing
• “While we could turn on checkpointing to handle some of these failures, in practice we choose to disable checkpointing.” [Ching et al. (Giraph @ Facebook) VLDB 2015]
• “Existing graph systems only support checkpoint-based fault tolerance, which most users leave disabled due to performance overhead.” [Gonzalez et al. (GraphX) OSDI 2014]
• “The choice of interval must balance the cost of constructing the checkpoint with the computation lost since the last checkpoint in the event of a failure.” [Low et al. (GraphLab) VLDB 2012]
• “Better performance can be obtained by balancing fault tolerance costs against that of a job restart.” [Low et al. (GraphLab) VLDB 2012]
![Page 21: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/21.jpg)
Our Approach: Zorro
• No checkpointing. Common case is fast.
• When failure occurs, opportunistically scrounge state (from surviving servers) and continue computation
• Natural replication in distributed processing systems
  • A vertex’s data is present at its neighbor vertices
  • Each vertex assigned to one server, and its neighbors likely on other servers
• We get very high accuracy (95%+)
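The natural-replication argument can be sketched on a toy graph. The rule used here, that a lost vertex is recoverable when at least one of its neighbors lives on a surviving server, is a simplification of the statement that a vertex's data is present at its neighbor vertices; the function name, the data layout, and the example graph are hypothetical and do not describe Zorro's actual recovery protocol.

```python
def recoverable_vertices(neighbors, placement, failed_servers):
    """Return (lost, recoverable) vertex sets after the given servers fail.

    neighbors: dict vertex -> list of neighbor vertices
    placement: dict vertex -> server id (each vertex lives on one server)
    """
    lost = {v for v, s in placement.items() if s in failed_servers}
    recoverable = {
        v for v in lost
        if any(placement[u] not in failed_servers for u in neighbors[v])
    }
    return lost, recoverable


# Toy graph: a 6-vertex ring plus vertex 6 attached only to vertex 0.
neighbors = {0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4],
             4: [3, 5], 5: [4, 0], 6: [0]}
placement = {0: 0, 1: 1, 2: 2, 3: 0, 4: 1, 5: 2, 6: 0}

lost, rec = recoverable_vertices(neighbors, placement, failed_servers={0})
print(sorted(lost))  # [0, 3, 6]
print(sorted(rec))   # [0, 3]
```

Vertex 6 is not recoverable in this toy example because its only neighbor was on the failed server, which is the kind of loss that shows up as a small inaccuracy in the final result.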
![Page 22: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/22.jpg)
Natural Replication => Can Retrieve a Lot of State
87 – 95% Graph State is Recoverable Even After Half the Servers Fail
PowerGraph | LFGraph
92 – 95% | 87 – 91%
![Page 23: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/23.jpg)
Natural Replication => Low Inaccuracy
PowerGraph | LFGraph
2% | 3%
![Page 24: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/24.jpg)
Natural Replication => Low Inaccuracy

| Algorithm | PowerGraph | LFGraph |
|---|---|---|
| PageRank | 2 % | 3 % |
| Single-Source Shortest Paths | 0.0025 % | 0.06 % |
| Connected Components | 1.6 % | 2.15 % |
| K-Core | 0.0054 % | 1.4 % |
| Graph Coloring* | 5.02 % | NA |
| Group-Source Shortest Paths* | 0.84 % | NA |
| Triangle Count* | 0 % | NA |
| Approximate Diameter* | 0 % | NA |
![Page 25: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/25.jpg)
Takeaways
• Impossibility theorems and 100% correct answers are great
• But they entail
  • Inflexibility in design (NoSQL or SQL)
  • High overhead (Checkpointing)
• Important to explore
  • Probabilistic tradeoffs and Achievable envelopes
  • Leads to more flexibility in design
• Other applicable areas: stream processing, machine learning
DPRG: http://dprg.cs.uiuc.edu
![Page 26: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/26.jpg)
Plug: MOOC on “Cloud Computing Concepts”
• Free course, On Coursera
• Ran Feb-Apr 2015
• 120K+ students
• Next run: Spring 2016
• Covered distributed systems and algorithms used in cloud computing
• Free and Open to everyone
• https://www.coursera.org/course/cloudcomputing
• Or do a search on Google for “Cloud Computing Course” (click on first link)
![Page 27: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/27.jpg)
Backup Slides
![Page 28: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/28.jpg)
PCAP Consistency Metric Is More Generic Than PBS
PCAP
[Figure: timeline with W(1), W(2), R(1) and interval tc]
• A read is tc-fresh if it returns the value of a write that starts at most tc time before the read starts
• W(1) and R(1) can overlap
PBS
[Figure: timeline with W(1), W(2), R(1) and interval tc]
• A read is tc-fresh if it returns the value of a write that starts at most tc time before the read ends
• W(1) and R(1) cannot overlap
![Page 29: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/29.jpg)
GeoPCAP: 2 Key Techniques
(1) Prob Composition Rules
[Diagram: a client read with an SLA arrives at the local DC; the per-DC models (Prob C1, L1), (Prob C2, L2), (Prob C3, L3) and the Prob WAN Model are combined into a composed model (Prob CC, LC), which is compared against the SLA]
• Given client C or L SLA:
  • QUICKEST: at least one DC satisfies SLA
  • ALL: each DC satisfies SLA
(2) Tune Geo-delay (Δ) using PID Control
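For readers unfamiliar with PID control, the following is a generic textbook discrete PID controller, not GeoPCAP's implementation. The class name, the gains, and the idea of feeding it the gap between a measured metric and its SLA target are assumptions for illustration.

```python
class PID:
    """Generic discrete PID controller (textbook form)."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt=1.0):
        """Return the control output for the current error sample."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no history yet on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains; error = measured metric minus SLA target.
pid = PID(kp=2.0, ki=0.5, kd=1.0)
print(pid.update(1.0))  # 2.5
print(pid.update(0.5))  # 1.25
```

In a geo-distributed setting the output would be used as the adjustment to the geo-delay knob in each control period; the integral and derivative terms are what typically dampen oscillation compared with a purely proportional or multiplicative rule.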
![Page 30: Probabilistically Consistent Indranil Gupta (Indy) Department of Computer Science, UIUC indy@illinois.edu FuDiCo 2015 DPRG: ://dprg.cs.uiuc.edu.](https://reader031.fdocuments.in/reader031/viewer/2022032703/56649f585503460f94c7d031/html5/thumbnails/30.jpg)
CAP Theorem -> NoSQL Revolution
• Conjectured: [Brewer 00]
• Proved: [Gilbert Lynch 02]
• Kicked off NoSQL revolution
• Abadi’s PACELC
  • If P, choose A or C
  • Else, choose L (latency) or C
[Diagram: CAP triangle with corners Consistency, Partition-tolerance, and Availability (Latency)]
RDBMSs (non-replicated)
Cassandra, RIAK, Dynamo, Voldemort
HBase, HyperTable, BigTable, Spanner