Relaxed Consistency Models
Outline
Lazy Release Consistency
TreadMarks DSM system
Review: what makes a good consistency model?
Model is a contract between memory system and programmer
◦ Programmer follows some rules about reads and writes
◦ Model provides guarantees
Model embodies a tradeoff
◦ Intuitive for programmer vs. can be implemented efficiently
TreadMarks high-level goals
Better DSM performance
Run existing parallel (and “correct”) code.
What specific problems with IVY does TreadMarks want to fix?
False sharing: two machines use different variables on the same page, and at least one of them writes
◦ IVY makes the page bounce back and forth between the machines
◦ However, that is unnecessary when the two processes (threads) are working on different variables.
IVY sends whole pages: better to send only the written bytes
Goal 1: Reducing the data to be sent
Goal: don’t send the whole page, just the written bytes.
On M1 write fault:
◦ tell other hosts to invalidate but keep a hidden copy.
◦ M1 itself also keeps a hidden copy.
On M2 fault:
◦ M2 asks M1 for recent modifications.
◦ M1 “diffs” its current page against its hidden copy.
◦ M1 sends the differences to M2.
◦ M2 applies the diffs to its hidden copy to produce the up-to-date version.
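The hidden-copy-and-diff steps above can be sketched in a few lines of Python. This is only an illustration, not TreadMarks code: the names make_diff and apply_diff, the 16-byte page size, and the run-length diff encoding are all assumptions made for the example.

```python
def make_diff(twin, current):
    """Return a list of (offset, new_bytes) runs where current differs from twin."""
    diff = []
    i = 0
    while i < len(current):
        if current[i] != twin[i]:
            start = i
            # Extend the run while bytes keep differing from the hidden copy.
            while i < len(current) and current[i] != twin[i]:
                i += 1
            diff.append((start, bytes(current[start:i])))
        else:
            i += 1
    return diff


def apply_diff(page, diff):
    """Overwrite only the byte runs named in the diff."""
    for offset, data in diff:
        page[offset:offset + len(data)] = data


PAGE_SIZE = 16  # tiny page for illustration only

# M1 takes a write fault: keep a hidden copy, then write to the page.
m1_page = bytearray(PAGE_SIZE)
m1_twin = bytes(m1_page)
m1_page[3] = 7
m1_page[4] = 8
m1_page[10] = 9

# M2 faults later: M1 diffs its page against the hidden copy and sends the diff.
diff = make_diff(m1_twin, m1_page)

# M2 applies the diff to its own hidden copy of the old page.
m2_page = bytearray(m1_twin)
apply_diff(m2_page, diff)

print(diff)                 # only 3 bytes travel, not the whole page
print(m2_page == m1_page)
```

Only the changed runs cross the network; the receiver rebuilds the current page from its stale copy plus the diff.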
Goal 2: allow multiple readers+writers
To cope with false sharing
◦ no invalidation when a machine writes
◦ no r/w -> r/o demotion when a machine reads
◦ so, there will be multiple “different” copies of a page! Which should a reader look at?
Diffs help here: can merge writes to same page
But, when to send the diffs?
◦ No invalidations, no page faults, so what triggers sending diffs?
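To see why diffs make multiple writers workable, here is a small Python sketch in which two machines write different variables on the same page and their diffs are merged. The offsets X and Y, the 8-byte page, and the helper names are hypothetical, chosen only for the illustration.

```python
def make_diff(twin, current):
    """Map offset -> new byte for every byte that changed relative to the twin."""
    return {i: new for i, (old, new) in enumerate(zip(twin, current)) if old != new}


def apply_diff(page, diff):
    """Apply a diff in place; bytes not named in the diff are left untouched."""
    for offset, value in diff.items():
        page[offset] = value


original = bytes(8)   # the page as both machines first saw it
X, Y = 0, 4           # hypothetical offsets of x and y within the page

m0 = bytearray(original)
m0[X] = 5             # M0 writes x
m1 = bytearray(original)
m1[Y] = 6             # M1 writes y (false sharing: same page, different variable)

diff0 = make_diff(original, m0)
diff1 = make_diff(original, m1)

# A third copy of the page picks up both sets of writes, in either order.
merged = bytearray(original)
apply_diff(merged, diff1)
apply_diff(merged, diff0)

# The diffs touch disjoint offsets, so there is nothing to reconcile.
conflicts = set(diff0) & set(diff1)
print(merged[X], merged[Y], conflicts)
```

If the two diffs did overlap, the writers would have been racing on the same variable, and merging alone could not decide which value wins.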
Release Consistency
Think about how you write multi-threaded code. To access shared data, you first acquire a lock, then access the data, and finally release the lock. This is considered the “correct” programming practice.
In a distributed environment, imagine we have a lock server. Each process must get a lock from the lock server before accessing the shared resources.
Thus, we can send out write diffs on release to all copies of pages written.
This is a new consistency model!
Release Consistency Model
M0 won’t see M1’s writes until M1 releases a lock
so machines can temporarily disagree on memory contents
If programs always follow the locking rules:
◦ Locks force order -> no stale reads, like sequential consistency
But, if you do not follow this guideline (don’t lock)
◦ reads can return stale data
◦ concurrent writes to the same variable -> trouble (data race)
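A toy Python model of (eager) release consistency may help here. It is a sketch under simplifying assumptions: the Machine class is hypothetical, a dictionary stands in for a page, and "sending diffs" is just a dictionary update on every other machine at release time.

```python
class Machine:
    """One node holding its own copy of a shared page (a dict stands in for the page)."""

    def __init__(self, name, cluster):
        self.name = name
        self.memory = {"x": 0, "y": 0}
        self.pending = {}          # writes made since the last release
        self.cluster = cluster
        cluster.append(self)

    def write(self, var, value):
        # Writes are purely local until the lock is released.
        self.memory[var] = value
        self.pending[var] = value

    def read(self, var):
        return self.memory[var]

    def release(self):
        # Eager RC: push the write diff to every other copy at release.
        for other in self.cluster:
            if other is not self:
                other.memory.update(self.pending)
        self.pending = {}


cluster = []
m0 = Machine("M0", cluster)
m1 = Machine("M1", cluster)

m1.write("x", 1)
before_release = m0.read("x")   # M0 still sees the old value
m1.release()
after_release = m0.read("x")    # now M0 sees M1's write

print(before_release, after_release)
```

Between the write and the release, the two machines disagree about x, which is exactly the temporary disagreement the model permits.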
Benefit?
multiple machines can have copies of a page, even when one or more of them write
◦ no bouncing of pages due to false sharing
◦ read copies can co-exist with writers
◦ relies on write diffs; otherwise can’t reconcile concurrent writes to the same page
Lazy Release Consistency Model
Do we really need to update the pages at the moment a lock is released? Suppose you never use a variable that some other process in the system updates. Then you do not need to be notified of updates to that variable.
Only fetch write diffs on acquire of a lock and only fetch from previous holder of that lock. Thus nothing happens at time of write or release.
This is called the Lazy Release Consistency Model (LRC), and it is another new consistency model!
LRC hides some writes that RC reveals.
Benefit?
◦ if you don’t acquire lock on object, you don’t have to fetch updates to it
◦ if you use just some variables on a page, no need to fetch writes to others
◦ less network traffic
Conventional DSM Implementation
Sequential vs Release Consistency
Sequential consistency
◦ Every write is broadcast
◦ More message passing
Release consistency
◦ Writes are broadcast only at synchronization points
◦ More memory overhead
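Read-Write False Sharing
[Figure: process timelines. Operations shown: w(x); r(y) r(y) r(x); w(x) w(x)]
Read-Write False Sharing
[Figure: process timelines with a synchronization point. Operations shown: w(x) w(x); r(y) r(y) r(x); synch]
Write-Write False Sharing
[Figure: process timelines with a synchronization point. Operations shown: w(x); w(y) w(y) r(x); synch; w(x) w(x)]
Multiple-Writer False Sharing
[Figure: process timelines with a synchronization point. Operations shown: w(x); w(y) w(y) r(x); synch; w(x) w(x)]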
Example 1 (false sharing)
x and y are on the same page. (a: acquire, r: release)
M0: a1 for (…) x++ r1
M1: a2 for (…) y++ r2 a1 print x, y r1
What does IVY do?
What does TreadMarks do?
◦ M0 and M1 both get a cached writeable copy of the page
◦ when they release, each computes a diff against the original page
◦ M1’s a1 causes it to pull write diffs from the last holder of lock 1, so M1 updates x in its page.
Example 2 (LRC)
x and y are on the same page
M0: a1 x=1 r1
M1: a2 y=1 r2
M2: a1 print x r1
What does IVY do?
What does TreadMarks do?
◦ M2 only asks the previous holder of lock 1 for write diffs
◦ M2 does not see M1’s modification to y, even though it is on the same page
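Example 2 can be replayed with a toy Python model of lazy release consistency. This is a rough sketch, not the TreadMarks protocol: the Machine and LockManager classes are hypothetical, dictionaries stand in for pages and diffs, and vector timestamps are left out, so the model cannot order conflicting writes.

```python
class Machine:
    """One node with a local copy of the page holding x and y."""

    def __init__(self, name):
        self.name = name
        self.memory = {"x": 0, "y": 0}
        self.known_writes = {}   # writes this machine made or has fetched

    def write(self, var, value):
        self.memory[var] = value
        self.known_writes[var] = value


class LockManager:
    """Tracks only who last released each lock (assumed helper, not a real API)."""

    def __init__(self):
        self.last_holder = {}

    def acquire(self, lock_id, machine):
        prev = self.last_holder.get(lock_id)
        if prev is not None and prev is not machine:
            # Lazy RC: fetch write diffs only now, and only from the previous holder.
            machine.memory.update(prev.known_writes)
            machine.known_writes.update(prev.known_writes)

    def release(self, lock_id, machine):
        # Nothing is sent at release; just remember who held the lock.
        self.last_holder[lock_id] = machine


locks = LockManager()
m0, m1, m2 = Machine("M0"), Machine("M1"), Machine("M2")

# M0: a1 x=1 r1
locks.acquire(1, m0); m0.write("x", 1); locks.release(1, m0)
# M1: a2 y=1 r2
locks.acquire(2, m1); m1.write("y", 1); locks.release(2, m1)
# M2: a1 print x r1
locks.acquire(1, m2)
print(m2.memory["x"], m2.memory["y"])
locks.release(1, m2)
```

M2 sees x=1 because it acquired lock 1 after M0, but its copy of y stays stale because it never acquired lock 2.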
Discussion
Q: is LRC a win over IVY if each variable is on a separate page? (No)
Q: why is LRC a reasonably intuitive model for programmers?
It is the same as sequential consistency if programmers always lock and unlock around accesses to shared data (i.e., follow the rules defined by LRC),
but non-locking code does not work, e.g. v=f(); done=1;
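For comparison, a lock-disciplined version of the v=f(); done=1 pattern might look like the Python sketch below. Python threads on one machine already share memory, so this only illustrates the locking discipline LRC expects, not DSM behaviour; f, producer, and consumer are placeholder names.

```python
import threading
import time

lock = threading.Lock()
shared = {"v": None, "done": 0}


def f():
    return 42   # placeholder computation


def producer():
    with lock:                 # acquire ... release around the writes
        shared["v"] = f()
        shared["done"] = 1


def consumer(result):
    while True:
        with lock:             # each poll acquires the lock before reading
            if shared["done"]:
                result.append(shared["v"])
                return
        time.sleep(0.001)      # back off outside the lock


result = []
reader = threading.Thread(target=consumer, args=(result,))
writer = threading.Thread(target=producer)
reader.start()
writer.start()
writer.join()
reader.join()
print(result)
```

Under LRC, each acquire in the consumer is the point where the writer's updates to v and done would be fetched; polling done without the lock gives no such guarantee.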
Example 3 (motivate vector timestamps)
M0: a1 x=1 r1
M1: a1 a2 y=x r2 r1
M2: a2 print x, y r2
What’s the “right” answer?
◦ we need to define what LRC guarantees
◦ answer: when you acquire a lock, you see all writes by the previous holder and all writes the previous holder saw
What does TreadMarks do for Example 3?
◦ M2 and M1 need to decide what M2 needs and doesn’t already have
◦ TreadMarks uses “vector timestamps”
◦ each machine numbers its releases (i.e. write diffs)
◦ M1 tells M2: at release, I had seen M0’s writes through #20, and so on:
◦ 0:20
◦ 1:25
◦ 2:19
◦ 3:36
◦ ……
◦ this is a “vector timestamp”
◦ M2 remembers a vector timestamp of the writes it has seen
◦ M2 compares it with M1’s VT to see what writes it needs from other machines.
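The comparison step can be sketched in Python as follows. M1's vector uses the four entries listed on the slide; M2's vector and the function names missing_intervals and merge are made up for the illustration.

```python
def missing_intervals(acquirer_vt, releaser_vt):
    """For each machine, the (first, last) release numbers the acquirer still lacks."""
    needed = {}
    for machine, seen_by_releaser in releaser_vt.items():
        seen_by_acquirer = acquirer_vt.get(machine, 0)
        if seen_by_releaser > seen_by_acquirer:
            needed[machine] = (seen_by_acquirer + 1, seen_by_releaser)
    return needed


def merge(acquirer_vt, releaser_vt):
    """After fetching, the acquirer has seen the element-wise maximum."""
    machines = set(acquirer_vt) | set(releaser_vt)
    return {m: max(acquirer_vt.get(m, 0), releaser_vt.get(m, 0)) for m in machines}


m1_vt = {0: 20, 1: 25, 2: 19, 3: 36}   # what M1 had seen at release (from the slide)
m2_vt = {0: 18, 1: 25, 2: 21, 3: 30}   # hypothetical: what M2 has seen so far

needed = missing_intervals(m2_vt, m1_vt)
print(needed)            # M2 lacks M0's writes 19..20 and machine 3's writes 31..36
print(merge(m2_vt, m1_vt))
```

M2 already knows more than M1 about machine 2 (21 vs. 19), so nothing is requested for that entry.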
Discussions
VTs order writes to the same variable by different machines:
◦ M0: a1 x=1 r1 a2 y=9 r2
◦ M1: a1 x=2 r1
◦ M2: a1 a2 z = x + y r2 r1
◦ M2 is going to hear “x=1” from M0, and “x=2” from M1.
◦ How does M2 know what to do?
Could the VTs for two values of the same variable not be ordered?
M0: a1 x=1 r1
M1: a2 x=2 r2
M2: a1 a2 print x r2 r1
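Whether two writes are ordered can be checked by comparing their vector timestamps component by component. The Python sketch below uses hypothetical three-entry vectors (one slot per machine) and assumes, for the ordered case, that M1 acquired lock 1 after M0 released it.

```python
def happened_before(a, b):
    """True if VT a is <= b in every component and the two differ."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither timestamp dominates the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Writes under the same lock: M1 acquired lock 1 after M0 released it,
# so M1's interval has already seen M0's first release.
x1_same_lock = (1, 0, 0)   # M0: a1 x=1 r1
x2_same_lock = (1, 1, 0)   # M1: a1 x=2 r1
print(happened_before(x1_same_lock, x2_same_lock))   # ordered: x=2 is newer

# Writes under different locks: neither machine has seen the other's release.
x1_lock1 = (1, 0, 0)       # M0: a1 x=1 r1
x2_lock2 = (0, 1, 0)       # M1: a2 x=2 r2
print(concurrent(x1_lock1, x2_lock2))                # unordered: a data race on x
```

In the unordered case the timestamps give M2 no basis for choosing between x=1 and x=2, which is why writes to the same variable must be protected by the same lock.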
Programmer rules / system guarantees?
Programmer must lock around all writes to shared variables to order writes to the same variable; otherwise “latest value” is not well defined
to read latest value, must lock
if no lock for read, guaranteed to see values that contributed to the variables you did lock
Example of when LRC might work too hard
M0: a2 z=99 r2 a1 x=1 r1
M1: a1 y=x r1
TreadMarks will send z to M1 because it comes before x=1 in VT order.
◦ Assuming x and z are on the same page.
◦ Even if on different pages, M1 must invalidate z’s page.
But M1 doesn’t use z
How could a system understand that z isn’t needed?
◦ Require locking of all data you read, which relaxes the causal part of the LRC model
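The difference between the causal rule and a lock-only rule can be made concrete with a small Python sketch. The interval records and the two selection functions are hypothetical; they only model which of M0's writes M1 would be told about when it acquires lock 1.

```python
# M0's history: a2 z=99 r2, then a1 x=1 r1. Each release closes one numbered interval.
m0_intervals = [
    {"id": 1, "lock": 2, "writes": {"z": 99}},
    {"id": 2, "lock": 1, "writes": {"x": 1}},
]


def causal_notices(intervals, seen_up_to, lock_id):
    """LRC rule: everything up to the last release of this lock, whatever lock guarded it."""
    last = max(iv["id"] for iv in intervals if iv["lock"] == lock_id)
    return [iv for iv in intervals if seen_up_to < iv["id"] <= last]


def lock_only_notices(intervals, seen_up_to, lock_id):
    """Relaxed rule: only intervals that were guarded by the lock being acquired."""
    return [iv for iv in intervals if iv["lock"] == lock_id and iv["id"] > seen_up_to]


def written_vars(notices):
    return [var for iv in notices for var in iv["writes"]]


# M1: a1 y=x r1, having seen none of M0's intervals yet.
causal = written_vars(causal_notices(m0_intervals, 0, lock_id=1))
lock_only = written_vars(lock_only_notices(m0_intervals, 0, lock_id=1))
print(causal)      # z comes along even though M1 never reads it
print(lock_only)   # only x, but now every read must be covered by a lock
```

The relaxed rule saves the traffic for z, at the price of a stricter contract: data read without holding its lock may be arbitrarily stale.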
Q: could TreadMarks work without using VM page protection?
It uses VM to
◦ detect writes, to avoid making hidden copies (for diffs) if not needed
◦ detect reads to pages, to know whether to fetch a diff
Neither is really crucial, so TreadMarks doesn’t depend on VM as much as IVY does
IVY used VM faults to decide what data has to be moved and when
TM uses acquire()/release() and diffs for that purpose
TreadMarks Implementation
Looks a lot like pthreads
Implicit message passing
Implicit process creation
Only standard Unix system calls
◦ Message passing
◦ Memory management
TreadMarks Code
Eager vs. Lazy RC
Eager RC
◦ Sends messages at release of a lock or at barriers
◦ Broadcasts messages to all nodes
Lazy RC
◦ Sends messages when locks are acquired
◦ Message goes only to the required node
Eager vs. Lazy RC
Memory Consistency
Done by creating diffs
Eager RC creates diffs at barriers
Lazy RC creates diffs at the first use of a page
Twin Creation
Diff Organization
Vector Timestamps
[Figure: timelines of three processes with vector timestamps. p1: w(x) rel; p2: acq w(y) rel; p3: acq r(x) r(y). Timestamps shown: 000, 000, 000, 100, 110]
Diff chain in Proc 4
Garbage Collection
Used to merge all diffs – recover memory
Occurs only at barriers
All nodes that have a page must have all diffs of that page.
DSM successful?
clusters of cooperating machines are hugely successful
DSM not so much
◦ main justification is transparency for existing threaded code
◦ that's not interesting for new apps
◦ and transparency makes it hard to get high performance
MapReduce or message-passing or shared storage more common than DSM
Thank You! Any Questions?