TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems
Transcript of TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems
TreadMarks: Distributed Shared Memory on
Standard Workstations and Operating Systems
Presented by: Blair Fort
Oct. 28, 2004
Overview
Introduction and Motivation
Implementation
Experiments and Results
Conclusions
My two cents
Introduction
TreadMarks is a Distributed Shared Memory (DSM) system
Runs on Unix workstations over an ATM or Ethernet network
Cluster Configuration
Distributed Shared Memory
Motivation
No widely available DSM system
Eliminates problems of other systems:
Bad portability
Bad performance
False sharing
Goals
Ease of Use
Portability
Good Performance
Also show that it works for real programs
Overview
Introduction and Motivation
Implementation
Experiments and Results
Conclusions
My two cents
Ease of Use
Looks a lot like pthreads
Implicit message passing
Implicit process creation
Portability
Only standard Unix system calls:
Message passing
Memory management
Performance
False sharing
Excessive message passing
Conventional DSM Implementation
Sequential vs. Release Consistency
Sequential consistency: every write is broadcast, causing more message passing
Release consistency: writes are broadcast only at synchronization points, at the cost of more memory overhead
Read-Write False Sharing
[Timeline diagrams: one process writes x (w(x)) while others on the same page read y and then x (r(y), r(y), r(x)); the second diagram adds a synchronization point (synch).]
Write-Write False Sharing
[Timeline diagram: one process writes x (w(x)) while others on the same page write y (w(y)) and read x (r(x)), with a synchronization point (synch).]
Multiple-Writer False Sharing
[Timeline diagram: the same w(x)/w(y)/r(x) pattern with a synchronization point, handled by the multiple-writer protocol.]
Eager vs. Lazy RC
Eager RC: sends messages at lock release or at barriers; broadcasts messages to all nodes
Lazy RC: sends messages when locks are acquired; the message goes only to the required node
Eager vs. Lazy RC
Memory Consistency
Done by creating diffs
Eager RC creates diffs at barriers
Lazy RC creates diffs at the first use of a page
Twin Creation
Diff Organization
Vector Timestamps
[Timeline diagram: p1 performs w(x) then rel; p2 performs acq, w(y), rel; p3 performs acq, then r(x) and r(y). Each process's vector timestamp starts at 000; the intervals carry timestamps 100 and 110.]
[Diagram: diff chain in Proc 4]
Garbage Collection
Used to merge all diffs and recover memory
Occurs only at barriers
All nodes that have a copy of a page must have all diffs of that page
Overview
Introduction and Motivation
Implementation
Experiments and Results
Conclusions
My two cents
Testing Platform
8 DECstation-5000/240’s running Ultrix V4.3
Networks: 100 Mbps ATM and 10 Mbps Ethernet
Testing Programs
Modified Water from SPLASH
Jacobi
TSP
QuickSort
ILINK
Unix Overhead
TreadMarks Overhead
Network Comparison - Water
Lazy vs Eager RC
Message Rate
Data Rate
Diff Creation Rate
Overview
Introduction and Motivation
Implementation
Experiments and Results
Conclusions
My two cents
Conclusions
Automated Distributed Shared Memory system works for real programs!
LRC improves performance over ERC in most cases
Overview
Introduction and Motivation
Implementation
Experiments and Results
Conclusions
My two cents
My Thoughts
Good design that promotes re-use
Would like to see a comparison against hand-coded message passing
Why not a partial merging of diffs?
Comments/Questions