Declarative Distributed Programming with Dedalus and Bloom Peter Alvaro, Neil Conway UC Berkeley.
Declarative Distributed Programming with Dedalus and Bloom
Peter Alvaro, Neil Conway
UC Berkeley
This Talk
1. Background
– BOOM Analytics
2. Theory
– Dedalus
– CALM
3. Practice
– Bloom
– Lattices
Berkeley Orders of Magnitude
Vision: Can we build small programs for large distributed systems?
Approach:
• Language system design
• System language design
Initial Language: Overlog
• Data-centric programming
– Uniform representation of system state
• High-level, declarative query language
– Distributed variant of Datalog
• Express systems as queries
BOOM Analytics
Goal: “Big Data” stack
– API-compliant
– Competitive performance
System: [EuroSys’10]
– Distributed file system
• HDFS compatible
– Hadoop job scheduler
What Worked Well
• Concise, declarative implementation
– 10-20x more concise than Java (LOCs)
– Similar performance (within 10-20%)
• Separation of policy and mechanism
• Ease of evolution
1. High availability (failover + Paxos)
2. Scalability (hash partitioned FS master)
3. Monitoring as an aspect
What Worked Poorly
Unclear semantics
– “Correct” semantics defined by interpreter behavior
In particular,
1. change (e.g., state update)
2. uncertainty (e.g., async communication)
Temporal Ambiguity
Goal:
• Increment a counter upon “request” message
• Send response message with value of counter
counter("hostname",0).
counter(To,X+1) :- counter(To,X), req(To,_).
response(@From,X) :- counter(@To,X), req(@To,From).
When is counter incremented?
What does response contain?
Implicit Communication
Implicit communication was the wrong abstraction for systems programming.
– Hard to reason about partial failure
Example: we never used distributed joins in the file system!
path(@S,D) :- link(@S,Z), path(@Z,D).
Received Wisdom
We argue that objects that interact in a distributed system need to be dealt with in ways that are intrinsically different from objects that interact in a single address space. These differences are required because distributed systems require that the programmer be aware of latency, have a different model of memory access, and take into account issues of concurrency and partial failure.
Jim Waldo et al., A Note on Distributed Computing (1994)
Dedalus (it’s about time)
Explicitly represent logical time as an attribute of all knowledge
“Time is a device that was invented to keep everything from happening at once.”
(Unattributed)
Dedalus: Syntax
Datalog + temporal modifiers
1. Instantaneous (deduction)
2. Deferred (sequencing)
• True at successor time
3. Asynchronous (communication)
• True at nondeterministic future time
Dedalus: Syntax
(1) Deductive rule: (Plain Datalog)
p(A,B,S) :- q(A,B,T), T=S.
(2) Inductive rule: (Constraint across “next” timestep)
p(A,B,S) :- q(A,B,T), S=T+1.
(3) Async rule: (Constraint across arbitrary timesteps)
p(A,B,S) :- q(A,B,T), time(S), choose((A,B,T), (S)).
Logical time
Syntax Sugar
(1) Deductive rule: (Plain Datalog)
p(A,B) :- q(A,B).
(2) Inductive rule: (Constraint across “next” timestep)
p(A,B)@next :- q(A,B).
(3) Async rule: (Constraint across arbitrary timesteps)
p(A,B)@async :- q(A,B).
State Update
p(A, B)@next :- p(A, B), notin p_del(A, B).
Example Trace:
p(1, 2)@101;
p(1, 3)@102;
p_del(1, 3)@300;
[Figure: timeline table with columns p(1, 2), p(1, 3), p_del(1, 3) and rows for times 101, 102, ..., 300, 301]
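The persistence rule and the example trace can be replayed in a few lines of plain Ruby. This is only a sketch: `run_persistence` and its hash-based inputs are hypothetical helpers for illustration, not part of Dedalus or any Dedalus runtime.

```ruby
require 'set'

# Hypothetical helper (not part of Dedalus): replay the persistence rule
#   p(A, B)@next :- p(A, B), notin p_del(A, B).
# over integer timesteps. `inserts` and `deletes` map a timestep to the
# p / p_del facts that arrive at that timestep.
def run_persistence(inserts, deletes, first, last)
  history = {}
  carried = Set.new                 # facts derived for this step by the @next rule
  (first..last).each do |t|
    p_now = carried | inserts.fetch(t, [])
    history[t] = p_now
    # A fact survives to t + 1 only if no matching p_del holds at t.
    carried = p_now - deletes.fetch(t, [])
  end
  history
end

inserts = { 101 => [[1, 2]], 102 => [[1, 3]] }
deletes = { 300 => [[1, 3]] }
history = run_persistence(inserts, deletes, 101, 301)

[101, 102, 300, 301].each do |t|
  puts "#{t}: #{history[t].to_a.inspect}"
end
```

In this replay p(1, 3) still holds at time 300, the step at which p_del(1, 3) arrives, and is absent from 301 onward, while p(1, 2) persists throughout.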
Logic and time
Key relationships:
• Atomicity
• Mutual exclusion
• Sequentiality
Overlog: Relationships among facts
Dedalus: Also, relationships between states
Change and Asynchrony
Overlog
counter("hostname",0).
counter(To,X+1) :- counter(To,X), req(To,_).
response(@From,X) :- counter(@To,X), req(@To,From).
Dedalus
counter("hostname",0).
counter(To,X+1)@next :- counter(To,X), req(To,_).
counter(To,X)@next :- counter(To,X), notin req(To,_).
response(@From,X)@async :- counter(@To,X), req(@To,From).
• @next on the counter rules: increment is deferred
• @async on the response rule: pre-increment value sent, with non-deterministic delivery time
Dedalus: Semantics
Goal: declarative semantics
– Reason about a program’s meaning rather than its executions
Approach: model theory
Minimal Models
A negation-free (monotonic) Datalog program has a unique minimal model
Model: no “missing” facts
Minimal: no “extra” facts
Unique: program has a single meaning
Stable Models
The consequences of async rules hold at a nondeterministic future time
– Captured by the choice construct
• Greco and Zaniolo (1998)
– Each choice leads to a distinct model
Intuition: A stable model is an execution trace
Traces and models
counter(To, X+1)@next :- counter(To, X), request(To, _).
counter(To, X)@next :- counter(To, X), notin request(To, _).
response(@From, X)@async :- counter(@To, X), request(@To, From).
response(From, X)@next :- response(From, X).
Persistence rules lead to infinitely large models
Async rules lead to infinitely many models
An Execution
counter(To, X+1)@next :- counter(To, X), request(To, _).
counter(To, X)@next :- counter(To, X), notin request(To, _).
response(@From, X)@async :- counter(@To, X), request(@To, From).
response(From, X)@next :- response(From, X).
counter("node1", 0)@0.
request("node1", "node2")@0.
A Stable Model
counter("node1", 0)@0.
request("node1", "node2")@0.
counter("node1", 1)@1.
counter("node1", 1)@2.
[…]
response("node2", 0)@100.
counter("node1", 1)@101.
counter("node1", 1)@102.
response("node2", 0)@101.
response("node2", 0)@102.
[…]
A stable model for choice = 100
counter(To, X+1)@next :- counter(To, X), request(To, _).
counter(To, X)@next :- counter(To, X), notin request(To, _).
response(@From, X)@async :- counter(@To, X), request(@To, From).
response(From, X)@next :- response(From, X).
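To make "each choice leads to a distinct model" concrete, the sketch below hand-simulates this four-rule program for one request at time 0. The function `stable_model_prefix`, the finite `horizon`, and the unquoted node names are assumptions made for illustration; it is not a Dedalus evaluator and assumes `choice` is at least 1.

```ruby
# Hypothetical hand-simulation of the counter program for a single request at
# time 0. `choice` is the timestep picked by the @async rule for delivery;
# `horizon` truncates the (otherwise infinite) trace.
def stable_model_prefix(choice, horizon)
  facts = []
  counter = 0
  sent_value = nil                  # value captured when the @async rule fires
  (0..horizon).each do |t|
    request_now = (t == 0)          # request(node1, node2)@0 is the only request
    facts << "counter(node1, #{counter})@#{t}"
    facts << "request(node1, node2)@#{t}" if request_now
    sent_value = counter if request_now            # pre-increment value is sent
    facts << "response(node2, #{sent_value})@#{t}" if sent_value && t >= choice
    counter += 1 if request_now                    # increment is visible at t + 1
  end
  facts
end

model = stable_model_prefix(100, 102)
puts model.first(3)
puts model.last(4)
```

Calling the function with a different `choice` yields a different fact set, which is the sense in which async rules produce many stable models for one input.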
Ultimate Models
A stable model characterizes an execution
– Many of these models are not “interestingly” different
Wanted: a characterization of outcomes
– An ultimate model contains exactly those facts that are “eventually always true”
Traces and models
counter(To,X+1)@next :- counter(To,X), request(To,_).
counter(To,X)@next :- counter(To,X), notin request(To,_).
response(@From,X)@async :- counter(@To,X), request(@To,From).
response(From,X)@next :- response(From,X).
counter("node1", 0)@0.
request("node1", "node2")@0.
counter("node1", 1)@1.
counter("node1", 1)@2.
[…]
response("node2", 0)@100.
counter("node1", 1)@101.
response("node2", 0)@101.
counter("node1", 1)@102.
response("node2", 0)@102.
[…]
Traces and models
counter(To,X+1)@next :- counter(To,X), request(To,_).
counter(To,X)@next :- counter(To,X), notin request(To,_).
response(@From,X)@async :- counter(@To,X), request(@To,From).
response(From,X)@next :- response(From,X).
Ultimate Model:
counter("node1", 1).
response("node2", 0).
Confluence
This program has a unique ultimate model
– In fact, all negation-free Dedalus programs have a unique ultimate model [DL2’12]
We call such programs confluent: same program outcome, regardless of network non-determinism
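A bounded illustration of this claim for the counter example: whatever delivery time the async rule picks, the facts that hold in a late window of the trace are the same. The helpers `facts_at` and `ultimate_model`, and the finite window standing in for "eventually always true", are assumptions of this sketch rather than the formal definition.

```ruby
require 'set'

# Hypothetical: the untimed facts that hold at timestep t in the counter
# example when the @async rule delivers the response at timestep `choice`.
def facts_at(t, choice)
  facts = Set.new
  facts << [:counter, "node1", t.zero? ? 0 : 1]
  facts << [:request, "node1", "node2"] if t.zero?
  facts << [:response, "node2", 0] if t >= choice
  facts
end

# Facts that hold at every timestep in [from, horizon]: a finite stand-in
# for "eventually always true".
def ultimate_model(choice, from, horizon)
  (from..horizon).map { |t| facts_at(t, choice) }.reduce(:&)
end

models = [1, 7, 100].map { |choice| ultimate_model(choice, 150, 200) }
puts models.first.to_a.inspect
puts "same outcome for every choice: #{models.uniq.size == 1}"
```

The window must start after the largest delivery time considered; the true ultimate model quantifies over the whole infinite trace.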
The Bloom Programming Language
Lessons from Dedalus
1. Clear program semantics is essential
2. Avoid implicit communication
3. Confluence seems promising
Lessons From Building Systems
1. Syntax matters!
– Datalog syntax is cryptic and foreign
2. Adopt, don’t reinvent
– DSL > standalone language
– Use host language’s type system (E. Meijer)
3. Modularity is important
– Scoping
– Encapsulation
Bloom Operational Model
Bloom Rule Syntax
<collection> <merge op> <expr>
Merge operators:
<=  now               (local computation)
<+  next              (state update)
<-  delete (at next)  (state update)
<~  async             (asynchronous message passing)
Collection types:
table    persistent state
scratch  transient state
channel  network transient
Expressions:
map, flat_map
reduce, group
join, outerjoin
empty?, include?
Example: Quorum Vote
QUORUM_SIZE = 5
RESULT_ADDR = "example.org"
# Ruby class definition
class QuorumVote
  include Bud
  # Bloom state
  state do
    # Communication interfaces
    channel :vote_chn, [:@addr, :voter_id]
    channel :result_chn, [:@addr]
    # Coordinator state
    table :votes, [:voter_id]
    scratch :cnt, [] => [:cnt]
  end
  # Bloom logic
  bloom do
    # Accumulate votes
    votes <= vote_chn {|v| [v.voter_id]}
    # Count votes
    cnt <= votes.group(nil, count(:voter_id))
    # Send message when quorum reached (asynchronous messaging)
    result_chn <~ cnt {|c| [RESULT_ADDR] if c.cnt >= QUORUM_SIZE}
  end
end
Question: How does confluence relate to practical problems of distributed consistency?
Common Technique:
Replicate state at multiple sites, for:
• Fault tolerance
• Reduced latency
• Read throughput
Problem: Different replicas might observe events in different orders
… and then reach different conclusions!
Alternative #1: Enforce consistent event order at all nodes (“Strong Consistency”)
Problems:
• Availability
• CAP Theorem
• Latency
Alternative #2: Achieve correct results for any network order (“Weak Consistency”)
Concerns: Writing order-independent programs is hard!
Challenge: How can we make it easier to write order-independent programs?
Order-Independent Programs
Alternative #1:
– Start with a conventional language
– Reason about when order can be relaxed
• This is hard! Especially for large programs.
Taking Order For Granted
Data: (Ordered) array of bytes
Compute: (Ordered) sequence of instructions
Writing order-sensitive programs is too easy!
Order-Independent Programs
Alternative #1:
– Start with a conventional language
– Reason about when order can be relaxed
• This is hard! Especially for large programs.
Alternative #2:
– Start with an order-independent language
– Add order explicitly, only when necessary
– “Disorderly Programming”
(Leading) Question: So, where might we find a nice order-independent programming language?
Recall: All monotone Dedalus programs are confluent.
Monotonic Logic
• As input set grows, output set does not shrink
• Order independent
• e.g., map, filter, join, union, intersection
Non-Monotonic Logic
• New inputs might invalidate previous outputs
• Order sensitive
• e.g., aggregation, negation
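The contrast can be checked on tiny sets in plain Ruby. The function names `evens` and `unblocked` are made up for this sketch; they stand in for a monotone filter and a non-monotone negation, respectively.

```ruby
require 'set'

# Monotone: a filter. The output for a prefix of the input is contained in
# the output for the full input, so results can be emitted as data arrives.
def evens(seen)
  seen.select(&:even?).to_set
end

# Non-monotone: negation ("in candidates and NOT in blocked").
def unblocked(candidates, blocked)
  candidates - blocked
end

early = evens(Set[2, 3])
late  = evens(Set[2, 3, 4])
puts "filter output only grows: #{early.subset?(late)}"

before = unblocked(Set[1, 2], Set[])
after  = unblocked(Set[1, 2], Set[2])   # a late-arriving `blocked` fact
puts "negation output only grows: #{before.subset?(after)}"
```

The filter never has to retract an answer, whereas the negation's earlier answer `{1, 2}` is invalidated once the fact `2` shows up in `blocked`.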
Consistency As Logical Monotonicity
CALM Analysis [CIDR’11]
1. Monotone programs are deterministic (confluent) [Ameloot’11, Marczak’12]
2. Simple syntactic test for monotonicity
Result: Whole-program static analysis for eventual convergence
Case Study
Scenario
Questions
1. Will cart replicas eventually converge?
– “Eventual Consistency”
2. What will client observe on checkout?
– Goal: checkout reflects all session actions
3. To achieve #1 and #2, how much additional coordination is required?
Design #1: Mutable State
Add(item x, count c):
  if kvs[x] exists:
    old = kvs[x]
    kvs.delete(x)
  else
    old = 0
  kvs[x] = old + c
Remove(item x, count c):
  if kvs[x] exists:
    old = kvs[x]
    kvs.delete(x)
    if old > c
      kvs[x] = old - c
Non-monotonic!
CALM Analysis
Conclusion: Every operation might require coordination!
Non-monotonic!
Subtle Bug
Add(item x, count c):
  if kvs[x] exists:
    old = kvs[x]
    kvs.delete(x)
  else
    old = 0
  kvs[x] = old + c
Remove(item x, count c):
  if kvs[x] exists:
    old = kvs[x]
    kvs.delete(x)
    if old > c
      kvs[x] = old - c
What if remove before add?
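The question can be answered by running the pseudocode. Below is a direct Ruby rendering over a `Hash` standing in for `kvs`; the `replay` helper and the sample operations are assumptions added for the demonstration.

```ruby
# Design #1 rendered as plain Ruby; a Hash stands in for kvs.
def add(kvs, x, c)
  old = kvs.key?(x) ? kvs.delete(x) : 0
  kvs[x] = old + c
end

def remove(kvs, x, c)
  return unless kvs.key?(x)         # removing an absent item is silently dropped
  old = kvs.delete(x)
  kvs[x] = old - c if old > c
end

# Hypothetical helper: apply a list of [op, item, count] to a fresh store.
def replay(ops)
  kvs = {}
  ops.each { |op, x, c| op == :add ? add(kvs, x, c) : remove(kvs, x, c) }
  kvs
end

in_order  = replay([[:add, "book", 2], [:remove, "book", 1]])
reordered = replay([[:remove, "book", 1], [:add, "book", 2]])
puts in_order.inspect
puts reordered.inspect
```

When the remove is delivered first it finds no entry and has no effect, so the reordered replica ends with two books while the in-order replica ends with one.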
Design #2: “Disorderly”
Add(item x, count c):
Append x,c to add_log
Remove(item x, count c):
Append x,c to del_log
Checkout():
Group add_log by item ID; sum counts.
Group del_log by item ID; sum counts.
For each item, subtract deletions from additions.
Non-monotonic!
CALM Analysis
Conclusion: Replication is safe; might need to coordinate on checkout
Monotonic
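A minimal Ruby sketch of the disorderly design, assuming each operation carries a unique `op_id` so that the logs can be modelled as grow-only sets and redelivery is harmless. The `CartReplica` class and the `op_id` field are illustrative assumptions, not taken from the slides.

```ruby
require 'set'

# Hypothetical replica: each log is a grow-only set of [op_id, item, count].
class CartReplica
  attr_reader :add_log, :del_log

  def initialize
    @add_log = Set.new
    @del_log = Set.new
  end

  # Monotone: appending only ever grows a log.
  def deliver(kind, op_id, item, count)
    (kind == :add ? @add_log : @del_log) << [op_id, item, count]
  end

  # Non-monotone: only meaningful once both logs are known to be complete.
  def checkout
    totals = Hash.new(0)
    @add_log.each { |_, item, c| totals[item] += c }
    @del_log.each { |_, item, c| totals[item] -= c }
    totals.select { |_, c| c > 0 }
  end
end

ops = [[:add, 1, "book", 2], [:remove, 2, "book", 1], [:add, 3, "pen", 1]]
a = CartReplica.new
b = CartReplica.new
ops.each { |op| a.deliver(*op) }
(ops.reverse + [ops.first]).each { |op| b.deliver(*op) }  # reordered, one duplicate

puts a.checkout.inspect
puts b.checkout.inspect
```

Both replicas hold identical logs after all deliveries, so any checkout taken after that point agrees; deciding when the logs are complete is the step that may still need coordination.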
Takeaways
• Major difference in coordination cost!
– Coordinate once per operation vs. coordinate once per checkout
• Disorderly accumulation when possible
– Monotone growth => confluent
• “Disorderly”: common design in practice!
– e.g., Amazon Dynamo
Generalizing Monotonicity
• Monotone logic: growing sets over time
– Partial order: set containment
• In practice, other kinds of growth:
– Version numbers, timestamps
– “In-progress” => committed/aborted
– Directories, sequences, …
Example: Quorum Vote
QUORUM_SIZE = 5
RESULT_ADDR = "example.org"
class QuorumVote
  include Bud
  state do
    channel :vote_chn, [:@addr, :voter_id]
    channel :result_chn, [:@addr]
    table :votes, [:voter_id]
    scratch :cnt, [] => [:cnt]
  end
  bloom do
    votes <= vote_chn {|v| [v.voter_id]}
    # Not (set-wise) monotonic!
    cnt <= votes.group(nil, count(:voter_id))
    result_chn <~ cnt {|c| [RESULT_ADDR] if c.cnt >= QUORUM_SIZE}
  end
end
Challenge: Extend monotone logic to allow other kinds of “growth”
$\langle S, \sqcup, \bot \rangle$ is a bounded join semilattice iff:
– $S$ is a set
– $\sqcup$ is a binary operator (“least upper bound”)
• Induces a partial order on $S$: $x \le_S y$ if $x \sqcup y = y$
• Associative, Commutative, and Idempotent
– “ACID 2.0”
• Informally, LUB is “merge function” for $S$
– $\bot$ is the “least” element in $S$
• $\forall x \in S: \bot \sqcup x = x$
[Figure: three lattices growing over time: Set ($\sqcup$ = Union), Increasing Int ($\sqcup$ = Max), Boolean ($\sqcup$ = Or)]
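The three example lattices and the ACI laws can be spot-checked in plain Ruby. Everything named here (`LATTICES`, `SAMPLES`, `aci?`) is a hypothetical construction for the check, and using negative infinity as the bottom of the max lattice is an assumption of the sketch.

```ruby
require 'set'

# Each lattice is described by its bottom element and its merge (least upper bound).
LATTICES = {
  set:  { bottom: Set.new,          merge: ->(a, b) { a | b } },
  max:  { bottom: -Float::INFINITY, merge: ->(a, b) { [a, b].max } },
  bool: { bottom: false,            merge: ->(a, b) { a || b } }
}

SAMPLES = {
  set:  [Set[1], Set[2, 3], Set[1, 3]],
  max:  [1, 5, 3],
  bool: [false, true, false]
}

# Check commutativity, associativity, idempotence and the bottom law on samples.
def aci?(merge, bottom, xs)
  xs.product(xs, xs).all? do |a, b, c|
    merge.(a, b) == merge.(b, a) &&
      merge.(merge.(a, b), c) == merge.(a, merge.(b, c)) &&
      merge.(a, a) == a &&
      merge.(bottom, a) == a
  end
end

LATTICES.each do |name, lat|
  merged = SAMPLES[name].permutation.map { |xs| xs.reduce(lat[:bottom], &lat[:merge]) }
  puts "#{name}: ACI=#{aci?(lat[:merge], lat[:bottom], SAMPLES[name])} merged=#{merged.uniq.inspect}"
end
```

Because each merge is associative, commutative, and idempotent, folding the samples in any order (or with repeats) produces a single merged value per lattice.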
$f : S \to T$ is a monotone function iff: $\forall a, b \in S : a \le_S b \Rightarrow f(a) \le_T f(b)$
[Figure: the same three lattices over time, Set ($\sqcup$ = Union), Increasing Int ($\sqcup$ = Max), Boolean ($\sqcup$ = Or), connected by monotone functions: size() maps set to increase-int, and >= 3 maps increase-int to boolean]
Quorum Vote with Lattices
QUORUM_SIZE = 5
RESULT_ADDR = "example.org"
class QuorumVote
  include Bud
  # Program state
  state do
    channel :vote_chn, [:@addr, :voter_id]
    channel :result_chn, [:@addr]
    # Lattice state declarations
    lset :votes
    lmax :vote_cnt
    lbool :got_quorum
  end
  # Program logic
  bloom do
    # Accumulate votes into set: merge new votes together with stored votes (set LUB)
    votes <= vote_chn {|v| v.voter_id}
    # Monotone function: set to max; merge using lmax LUB
    vote_cnt <= votes.size
    # Monotone function: max to bool
    got_quorum <= vote_cnt.gt_eq(QUORUM_SIZE)
    # Threshold test on bool (monotone)
    result_chn <~ got_quorum.when_true { [RESULT_ADDR] }
  end
end
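The set, max, bool pipeline can be mimicked without Bud to see why delivery order and duplicates do not matter. `QuorumState` and the threshold `QUORUM = 3` are stand-ins invented for this sketch; they do not use or reproduce Bud's lattice API.

```ruby
require 'set'

QUORUM = 3   # hypothetical threshold for the sketch (the slide uses QUORUM_SIZE = 5)

# Plain-Ruby stand-in for the lset -> lmax -> lbool pipeline.
class QuorumState
  attr_reader :votes, :vote_cnt, :got_quorum

  def initialize
    @votes = Set.new        # set lattice: merge is union
    @vote_cnt = 0           # max lattice: merge is max
    @got_quorum = false     # bool lattice: merge is or
  end

  def receive(voter_id)
    @votes |= [voter_id]                               # duplicates are absorbed
    @vote_cnt = [@vote_cnt, @votes.size].max           # never decreases
    @got_quorum ||= (@vote_cnt >= QUORUM)              # once true, stays true
  end
end

orders = [%w[a b c a], %w[c a a b]]
finals = orders.map do |order|
  state = QuorumState.new
  order.each { |voter| state.receive(voter) }
  [state.vote_cnt, state.got_quorum]
end
puts finals.inspect
```

Each stage only moves upward in its lattice, so both delivery orders end in the same state and the quorum decision, once reached, is never retracted.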
Conclusions
• Interplay between language and system design
• Key question: what should be explicit?
– Initial answer: asynchrony, state update
– Refined answer: order
• Disorderly programming for disorderly networks
Thank You!
Queries welcome.
gem install bud
http://www.bloom-lang.net
Collaborators:
Emily Andrews
Peter Bailis
William Marczak
David Maier
Tyson Condie
Joseph M. Hellerstein
Rusty Sears
Sriram Srinivasan
Extra slides
Ongoing Work
1. Lattices
– Concurrent editing
– Distributed garbage collection
2. Confluence and concurrency control
– Support for “controlled non-determinism”
– Program analysis for serializability?
3. Safe composition of monotone and non-monotone code
Overlog
“Our intellectual powers are rather geared to master static relations and […] our powers to visualize processes evolving in time are relatively poorly developed. For that reason we should do (as wise programmers aware of our limitations) our utmost to shorten the conceptual gap between the static program and the dynamic process, to make the correspondence between the program (spread out in text space) and the process (spread out in time) as trivial as possible.”
Edsger Dijkstra
(Themes)
• Disorderly / order-light programming
• (Understanding, simplifying) the relation between program syntax and outcomes
• Determinism (in asynchronous executions) as a correctness criterion
• Coordination – theoretical basis and mechanisms
– What programs require coordination?
– How can we coordinate them efficiently?
Traces and models
counter(X+1)@next :- counter(X), request(_, _).
counter(X)@next :- counter(X), notin request(_, _).
response(@From, X)@async :- counter(X), request(To, From).
response(From, X)@next :- response(From, X).
counter(0)@0.
request("node1", "node2")@0.
Traces and models -- 0
counter(0+1)@1 :- counter(0)@0, request("node1", "node2")@0.
counter(X)@next :- counter(X), notin request(_, _).
response("node2", 0)@100 :- counter(0)@0, request("node1", "node2")@0.
response(From, X)@next :- response(From, X).
counter(0)@0.
request("node1", "node2")@0.
counter(1)@1.
[…]
response("node2", 0)@100.
Traces and models -- 1
counter(X+1)@next :- counter(X), request(To, From).
counter(1)@?+1 :- counter(1)@?, notin request(_, _)@?.
response(From, X)@async :- counter(X), request(To, From).
response("node2", 0)@101 :- response("node2", 0)@100.
counter(0)@0.
request("node1", "node2")@0.
counter(1)@1.
counter(1)@2.
[…]
Traces and models -- 100
counter(X+1)@next :- counter(X), request(To, From).
counter(1)@101 :- counter(1)@100, notin request(_, _)@100.
response(From, X)@async :- counter(X), request(To, From).
response("node2", 0)@101 :- response("node2", 0)@100.
counter(0)@0.
request("node1", "node2")@0.
counter(1)@1.
counter(1)@2.
[…]
response("node2", 0)@100.
counter(1)@101.
response("node2", 0)@101.
Traces and models – 101+
counter(X+1)@next :- counter(X), request(To, From).
counter(1)@102 :- counter(1)@101, notin request(_, _)@101.
response(From, X)@async :- counter(X), request(To, From).
response("node2", 0)@102 :- response("node2", 0)@101.
counter(0)@0.
request("node1", "node2")@0.
counter(1)@1.
counter(1)@2.
[…]
response("node2", 0)@100.
counter(1)@101.
counter(1)@102.
response("node2", 0)@101.
response("node2", 0)@102.
[…]
A stable model for choice = 100
Traces and models
counter(X+1)@next :- counter(X), request(_, _).
counter(X)@next :- counter(X), notin request(_, _).
response(@From, X)@async :- counter(X), request(To, From).
response(From, X)@next :- response(From, X).
counter(0)@0.
request("node1", "node2")@0.
Stable models:
{counter(0)@0, counter(1)@1, counter(1)@2, […]
response("node2", 0)@k, response("node2", 0)@k+1, […]
}
Studying confluence in Dedalus
q(#L, X)@async <- e(X), replica(L).
p(X) <- q(_, X).
p(X)@next <- p(X).
Alice: e(1). e(2). replica(Bob). replica(Carol).
Bob receives q(Bob, 1)@1, q(Bob, 2)@2. Result: p(1), p(2)
Carol receives q(Carol, 2)@1, q(Carol, 1)@2. Result: p(1), p(2)
UUM (unique ultimate model)
Studying confluence in Dedalus
q(#L, X)@async <- e(X), replica(L).
r(#L, X)@async <- f(X), replica(L).
p(X) <- q(_, X), r(_, X).
p(X)@next <- p(X).
Alice: e(1). f(1). replica(Bob). replica(Carol).
Bob receives q(Bob, 1)@1, r(Bob, 1)@2. Result: { }
Carol receives q(Carol, 1)@1, r(Carol, 1)@1. Result: p(1)
Multiple ultimate models
Studying confluence in Dedalus
q(#L, X)@async <- e(X), replica(L).
r(#L, X)@async <- f(X), replica(L).
p(X) <- q(_, X), r(_, X).
p(X)@next <- p(X).
q(L, X)@next <- q(L, X).
Alice: e(1). f(1). replica(Bob). replica(Carol).
Bob receives q(Bob, 1)@1, r(Bob, 1)@2. Result: p(1)
Carol receives r(Carol, 1)@1, q(Carol, 1)@2. Result: { }
Multiple ultimate models
Studying confluence in Dedalus
q(#L, X)@async <- e(X), replica(L).
r(#L, X)@async <- f(X), replica(L).
p(X) <- q(_, X), r(_, X).
p(X)@next <- p(X).
q(L, X)@next <- q(L, X).
r(L, X)@next <- r(L, X).
Alice: e(1). f(1). replica(Bob). replica(Carol).
Bob receives q(Bob, 1)@1, r(Bob, 1)@2. Result: p(1)
Carol receives r(Carol, 1)@1, q(Carol, 1)@2. Result: p(1)
UUM (unique ultimate model)
Studying confluence in Dedalus
q(#L, X)@async <- e(X), replica(L).
r(#L, X)@async <- f(X), replica(L).
p(X) <- q(_, X), NOT r(_, X).
p(X)@next <- p(X).
q(L, X)@next <- q(L, X).
r(L, X)@next <- r(L, X).
Alice: e(1). f(1). replica(Bob). replica(Carol).
Bob receives q(Bob, 1)@1, r(Bob, 1)@2. Result: p(1)
Carol receives r(Carol, 1)@1, q(Carol, 1)@2. Result: { }
Multiple ultimate models
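The replica outcomes in these scenarios can be reproduced with a small per-replica evaluator. This is a sketch under simplifying assumptions: `ultimate_p`, its keyword arguments, and the finite `horizon` are hypothetical, and only the single value X = 1 with the "q then r" and "r then q" delivery orders is modelled.

```ruby
require 'set'

# Hypothetical single-replica evaluator for the programs on these slides.
# arrivals: { timestep => [[:q, 1], [:r, 1], ...] }
# persist:  relations carried forward by an @next persistence rule
# rule:     :join     for p(X) <- q(_, X), r(_, X)
#           :negation for p(X) <- q(_, X), NOT r(_, X)
def ultimate_p(arrivals, persist:, rule:, horizon: 5)
  state = { q: Set.new, r: Set.new }
  p_facts = Set.new                         # p(X)@next <- p(X): p always persists
  (1..horizon).each do |t|
    state.each_key { |rel| state[rel] = Set.new unless persist.include?(rel) }
    arrivals.fetch(t, []).each { |rel, x| state[rel] << x }
    p_facts |= rule == :join ? state[:q] & state[:r] : state[:q] - state[:r]
  end
  p_facts
end

bob   = { 1 => [[:q, 1]], 2 => [[:r, 1]] }   # q arrives before r
carol = { 1 => [[:r, 1]], 2 => [[:q, 1]] }   # r arrives before q

[[[:q], :join], [[:q, :r], :join], [[:q, :r], :negation]].each do |persist, rule|
  b = ultimate_p(bob, persist: persist, rule: rule)
  c = ultimate_p(carol, persist: persist, rule: rule)
  puts "persist=#{persist.inspect} rule=#{rule}: Bob=#{b.to_a} Carol=#{c.to_a} agree=#{b == c}"
end
```

Only the monotone variant with both inputs persisted gives Bob and Carol the same result; dropping the persistence of r or introducing negation makes the outcome depend on arrival order.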
CALM – Consistency as logical monotonicity
• Logically monotonic => confluent
• Consequence: a (conservative) static analysis for eventual consistency
• Practical implications:
– Language support for weakly-consistent, coordination-free distributed systems!
Does CALM help?
• Is the monotonic subset of Dedalus sufficiently expressive / convenient to implement distributed systems?
Coordination
• CALM’s complement:
– Nonmonotonic => order-sensitive
– Ensuring deterministic outcomes may require controlling order.
• We could constrain the order of
– Data
• E.g., via ordered delivery
– Computation
• E.g., via evaluation barriers
Coordination mechanisms
Approach 1: Deliver the q() and r() tuples in the same total order to all replicas.
Alice: e(1). f(1). replica(Bob). replica(Carol).
Bob receives r(Bob, 1)@1, q(Bob, 1)@2. Result: { }
Carol receives r(Bob, 1)@1, q(Bob, 1)@2. Result: { }
Coordination mechanisms
p(X) <- q(_, X), NOT r(_, X).
Approach 2: Do not evaluate “NOT r(X)” until its contents are completely determined.
Alice: e(1). f(1). replica(Bob). replica(Carol).
Bob receives q(Bob, 1)@1, r(Bob, 1)@2. Result: { }
Carol receives r(Bob, 1)@1, q(Bob, 1)@2. Result: { }
Ordered delivery vs. stratification(Differences)
• Stratified evaluation
– Unique outcome across all executions
– Finite inputs
– Communication between producers and consumers
• Ordered delivery
– Different outcomes in different runs
– No restriction on inputs
– Multiple producers and consumers => need distributed consensus
Ordered delivery vs. stratification(Similarities)
• Stratified evaluation
– Control order of evaluation at a coarse grain
• Table by table
– Order is given by program syntax
• Ordered delivery
– Fine-grained order of evaluation
• Row by row
– Order is nondeterministically chosen by an oracle (e.g., Paxos)
– Analogy:
• Assign a stratum to each tuple.
• Ensure that all replicas see the same stratum assignments