Exploiting Distributed Software Transactional Memory
Christos Kotselidis, Research Fellow, Advanced Processor Technologies Group, The University of Manchester
Outline
• Transactional Memory
• Distributed Transactional Memory
• DiSTM
  • Architecture
  • Protocols
• Evaluation
• Conclusions
Need for Concurrent Programming
• Multicores are mainstream, which brings new software challenges:
  • Exploiting parallelism
  • Managing concurrency
• Locks make safe access to shared data challenging; the problem is explicit synchronization:
  • The programmer manages shared accesses
  • Correctness: race conditions, deadlocks, …
  • Performance/complexity: lock granularity (coarse vs. fine grain)
What is Transactional Memory? (1/2)
• A new concurrent programming model that aims to:
  • Simplify programming compared to fine-grain locks
  • Provide similar or better performance than fine-grain locks
• Database transactions adapted for memory accesses
• Growing research area:
  • 50+ TM implementations in the last decade
  • Software, hardware, and hybrid platforms (STM, HTM, HyTM)
  • Intel Haswell RTM, IBM Blue Gene/Q
  • Akka, PGAS languages, etc.
What is Transactional Memory? (2/2)
• Instead of acquiring locks, execute code optimistically
• Resolve detected conflicts
• Commit and publish the changes
• Atomicity, Consistency, Isolation (ACI)

Programming with locks:
synchronized (this) {
    …
    x++;
}

Programming with transactions:
atomic {
    …
    x++;
}
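The two snippets above can be approximated in plain Java, which has no `atomic` block. The sketch below is an illustrative assumption, not DiSTM's mechanism: the optimistic counter reads a snapshot, computes privately, and publishes only if no other thread committed in between (using `AtomicInteger.compareAndSet` as a single-word stand-in for commit-time conflict detection), otherwise it discards its work and retries. A real STM generalizes this check to arbitrary read and write sets.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

class LockVsOptimistic {
    // Lock-based: the programmer chooses a monitor and holds it for the whole update.
    static final class LockedCounter {
        private int x;

        synchronized void increment() { x++; }

        synchronized int get() { return x; }
    }

    // Optimistic: compute speculatively, then publish only if nothing changed meanwhile.
    static final class OptimisticCounter {
        private final AtomicInteger x = new AtomicInteger();

        void increment() {
            while (true) {
                int snapshot = x.get();                   // speculative read
                int updated = snapshot + 1;               // private computation
                if (x.compareAndSet(snapshot, updated)) {
                    return;                               // no conflict: commit
                }
                // Another thread committed first: discard the work and retry.
            }
        }

        int get() { return x.get(); }
    }

    // Starts `threads` threads that each call `increment` `perThread` times, then reads the result.
    static int run(int threads, int perThread, Runnable increment, IntSupplier read) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) increment.run();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return read.getAsInt();
    }

    public static void main(String[] args) {
        LockedCounter locked = new LockedCounter();
        OptimisticCounter optimistic = new OptimisticCounter();
        System.out.println(run(4, 10_000, locked::increment, locked::get));         // 40000
        System.out.println(run(4, 10_000, optimistic::increment, optimistic::get)); // 40000
    }
}
```

Both versions end with the same count; the difference is that the optimistic one never blocks, paying instead with retries when threads collide.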
TM research
• Most TM systems target shared-memory architectures
  • Hardware, software, hybrid
• Concerning distributed computing:
  • Partitioned Global Address Space (PGAS) languages (X10, Fortress) provide the atomic construct but currently have no underlying distributed TM system
  • Distributed JVMs for enterprise applications (Terracotta) use locks for synchronization
  • Transactions have started being used in distributed systems (Sinfonia)
STM on CMPs
Thread 1: atomic { x = a; x++; }   (X: restart)
Thread 2: atomic { a++; }
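One way to read the diagram: Thread 2 commits `a++` after Thread 1 has read `a` but before Thread 1 commits, so Thread 1 fails validation and restarts. The sketch below replays that interleaving deterministically with a toy STM. The `TVar` and `Tx` names, per-variable version counters, and commit-time read-set validation are illustrative assumptions rather than any particular STM's implementation, and commits are not thread-safe because the replay runs on a single thread.

```java
import java.util.HashMap;
import java.util.Map;

class ConflictReplay {
    // A transactional variable: committed value plus a version bumped on every commit.
    static final class TVar {
        int value;
        long version;

        TVar(int value) { this.value = value; }
    }

    static final class Tx {
        private final Map<TVar, Long> readSet = new HashMap<>();     // versions observed
        private final Map<TVar, Integer> writeSet = new HashMap<>(); // buffered writes

        int read(TVar v) {
            if (writeSet.containsKey(v)) return writeSet.get(v);     // read own write
            readSet.putIfAbsent(v, v.version);
            return v.value;
        }

        void write(TVar v, int value) { writeSet.put(v, value); }

        // Commit-time validation: abort if anything read was committed by someone else since.
        boolean commit() {
            for (Map.Entry<TVar, Long> e : readSet.entrySet()) {
                if (e.getKey().version != e.getValue()) return false;
            }
            for (Map.Entry<TVar, Integer> e : writeSet.entrySet()) {
                e.getKey().value = e.getValue();
                e.getKey().version++;
            }
            return true;
        }
    }

    // Replays the slide's interleaving; returns how many attempts Thread 1 needed.
    static int replay(TVar a, TVar x) {
        boolean thread2Pending = true;
        for (int attempt = 1; ; attempt++) {
            Tx t1 = new Tx();
            t1.write(x, t1.read(a));          // x = a;
            t1.write(x, t1.read(x) + 1);      // x++;
            if (thread2Pending) {             // Thread 2: atomic { a++; } commits first
                Tx t2 = new Tx();
                t2.write(a, t2.read(a) + 1);
                t2.commit();
                thread2Pending = false;
            }
            if (t1.commit()) return attempt;  // otherwise Thread 1 restarts
        }
    }

    public static void main(String[] args) {
        TVar a = new TVar(5);
        TVar x = new TVar(0);
        int attempts = replay(a, x);
        System.out.println(attempts + " " + a.value + " " + x.value); // 2 6 7
    }
}
```

The second attempt reads the committed `a = 6`, so `x` ends at 7, exactly as if the two atomic blocks had run one after the other.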
STM on Clusters
DiSTM - Architecture (1/4)
Distributed Software Transactional Memory (DiSTM)
• Clustered JVMs behaving as a Single System Image
• Modular and pluggable architecture
• JVM middleware coordinating transactional execution
• ProActive framework (RMI) for distributed communication
• @distatomic-annotated interfaces denote transactional objects
DiSTM - Architecture (2/4)
• Automatic class rewriting (BCEL) injects the transactional protocol into the objects
• Four distributed transactional coherence protocols:
  • TCC, Single Lease, Multiple Leases, Anaconda
• Library of distributed atomic collection classes:
  • Arrays, singleton objects, HashMaps, linked lists
DiSTM - Architecture (3/4)
@atomic
public interface AtomicInteger {
    public int getValue();
    public void setValue(int value);
}
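DiSTM injects the protocol by rewriting classes with BCEL. As a lighter stand-in for that idea, the sketch below swaps in `java.lang.reflect.Proxy` instead of bytecode rewriting: calls to the getters and setters of an `@atomic` interface are intercepted and routed through a per-transaction write buffer. The `atomic` annotation is declared locally so the example compiles; `Tx`, `open`, and `commit` are hypothetical names, the store is keyed by property name only (one object per store), and commit performs no validation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Local stand-in for the slide's marker annotation, so the sketch compiles.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface atomic {}

@atomic
interface AtomicInteger {
    int getValue();
    void setValue(int value);
}

class TransactionalProxyDemo {
    // Per-transaction write buffer: property name -> pending value.
    static final class Tx {
        final Map<String, Object> writes = new HashMap<>();
    }

    // Returns an implementation of `iface` whose getters/setters go through `tx`.
    static <T> T open(Class<T> iface, Map<String, Object> store, Tx tx) {
        if (!iface.isAnnotationPresent(atomic.class)) {
            throw new IllegalArgumentException(iface.getName() + " is not @atomic");
        }
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface},
            (self, method, methodArgs) -> {
                String name = method.getName();
                if (name.startsWith("get") && method.getParameterCount() == 0) {
                    String field = name.substring(3);
                    // Read-your-own-writes first, then the committed value (0 if absent).
                    return tx.writes.containsKey(field) ? tx.writes.get(field) : store.getOrDefault(field, 0);
                }
                if (name.startsWith("set") && method.getParameterCount() == 1) {
                    tx.writes.put(name.substring(3), methodArgs[0]); // invisible until commit
                    return null;
                }
                throw new UnsupportedOperationException(name);
            });
        return iface.cast(proxy);
    }

    // Publishes buffered writes (no conflict detection in this sketch).
    static void commit(Map<String, Object> store, Tx tx) {
        store.putAll(tx.writes);
        tx.writes.clear();
    }

    public static void main(String[] args) {
        Map<String, Object> store = new HashMap<>();
        Tx tx = new Tx();
        AtomicInteger counter = open(AtomicInteger.class, store, tx);
        counter.setValue(counter.getValue() + 1);
        System.out.println(store.containsKey("Value") + " " + counter.getValue()); // false 1
        commit(store, tx);
        System.out.println(store.get("Value")); // 1
    }
}
```

The caller only sees the plain interface; the buffering and publication happen behind it, which is the property the class-rewriting approach provides without reflection overhead.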
DiSTM - Architecture (4/4)
• DiSTM’s single instance overview
[Figure: DiSTM single-instance overview: threads, transactional engine, transactional coherence protocols, remote communication layer, memory heap.]
DiSTM – Protocols
DiSTM supports two modes of operation:
• Centralized mode: data and coherence handled by the master node
  • Three protocols in centralized mode (TCC, Single Lease, Multiple Leases)
  • Two-stage validation (eager localValidation(), lazy remoteValidation())
• Decentralized mode: fully decentralized operation with data distribution
  • Data are partitioned amongst the nodes
  • Anaconda decentralized protocol
  • Unified validation procedure (lazy localValidation(), lazy remoteValidation())
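The two-stage validation of the centralized mode can be sketched as follows, under stated assumptions: `localValidation()` is modeled as an eager write-write check against other transactions on the same worker (the requester aborts), and `remoteValidation()` as a lazy, commit-time comparison of read versions against the master's. `Master`, `WorkerNode`, and `LocalTx` are hypothetical names, and the master is an in-process map rather than a remote node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TwoStageValidation {
    // Stand-in for the master node's consistent view: object id -> committed version.
    static final class Master {
        final Map<String, Long> versions = new HashMap<>();

        long version(String id) { return versions.getOrDefault(id, 0L); }
    }

    static final class LocalTx {
        final Map<String, Long> readVersions = new HashMap<>();
        final Set<String> writeSet = new HashSet<>();
        boolean aborted;
    }

    static final class WorkerNode {
        final Master master;
        final List<LocalTx> running = new ArrayList<>();

        WorkerNode(Master master) { this.master = master; }

        LocalTx begin() {
            LocalTx tx = new LocalTx();
            running.add(tx);
            return tx;
        }

        void read(LocalTx tx, String id) { tx.readVersions.putIfAbsent(id, master.version(id)); }

        // Stage 1, eager localValidation(): on every write, check the other transactions
        // running on this node; a local write-write conflict aborts the requester at once.
        boolean write(LocalTx tx, String id) {
            for (LocalTx other : running) {
                if (other != tx && !other.aborted && other.writeSet.contains(id)) {
                    tx.aborted = true;
                    return false;
                }
            }
            tx.writeSet.add(id);
            return true;
        }

        // Stage 2, lazy remoteValidation(): only at commit, compare read versions with the master.
        boolean commit(LocalTx tx) {
            running.remove(tx);
            if (tx.aborted) return false;
            for (Map.Entry<String, Long> e : tx.readVersions.entrySet()) {
                if (master.version(e.getKey()) != e.getValue()) return false;
            }
            for (String id : tx.writeSet) master.versions.merge(id, 1L, Long::sum);
            return true;
        }
    }

    public static void main(String[] args) {
        Master master = new Master();
        WorkerNode node = new WorkerNode(master);
        LocalTx t1 = node.begin();
        LocalTx t2 = node.begin();
        node.read(t1, "x");
        System.out.println(node.write(t1, "x") + " " + node.write(t2, "x")); // true false
        master.versions.put("x", 7L);   // simulate another node committing x
        System.out.println(node.commit(t1));                                 // false
    }
}
```

Splitting the checks this way keeps cheap, frequent local checks off the network and defers the expensive round trip to the master until commit.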
DiSTM - Centralized Protocols (1/5)
TCC, Single Lease, Multiple Leases
• Data consistency:
  • The master node keeps a guaranteed-consistent view of the data
  • Worker nodes keep a cached working dataset
  • When a transaction commits, the master node eagerly forces the worker nodes to update their working datasets
• The commit stage is serialized, which blocks potentially parallel commits.
DiSTM - TCC (2/5)
[Figure: TCC protocol. The master node holds the global data; each worker node has a transactional engine, the coherence protocol, a memory heap, and a remote communication layer. Commit steps: 1) remoteValidate(), 2) true/false, 3) update global data, 4) update cached data.]
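A single-JVM simulation of the TCC message flow in the figure (hypothetical class names; messages are plain method calls, and the run is single-threaded). The master's `remoteValidate` is `synchronized`, reflecting the serialized commit stage; on success it updates the global data and eagerly pushes the new values to every worker's cache, dooming local transactions that read a now-stale object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TccSketch {
    static final class Master {
        final Map<String, Integer> global = new HashMap<>();  // guaranteed-consistent data
        final Map<String, Long> versions = new HashMap<>();
        final List<Worker> workers = new ArrayList<>();

        // 1) remoteValidate(): synchronized, so commits are processed one at a time.
        synchronized boolean remoteValidate(Map<String, Long> readSet, Map<String, Integer> writeSet) {
            for (Map.Entry<String, Long> e : readSet.entrySet()) {
                if (versions.getOrDefault(e.getKey(), 0L).longValue() != e.getValue()) {
                    return false;                                  // 2) false: caller aborts
                }
            }
            for (Map.Entry<String, Integer> e : writeSet.entrySet()) {
                global.put(e.getKey(), e.getValue());              // 3) update global data
                versions.merge(e.getKey(), 1L, Long::sum);
            }
            for (Worker w : workers) w.refresh(writeSet, versions); // 4) update cached data, eagerly
            return true;                                            // 2) true
        }
    }

    static final class Worker {
        final Master master;
        final Map<String, Integer> cache = new HashMap<>();
        final Map<String, Long> cachedVersions = new HashMap<>();
        final List<Tx> running = new ArrayList<>();

        Worker(Master master) {
            this.master = master;
            master.workers.add(this);
        }

        Tx begin() {
            Tx t = new Tx(this);
            running.add(t);
            return t;
        }

        // Pushed by the master: refresh cached copies and doom local transactions that read them.
        void refresh(Map<String, Integer> writes, Map<String, Long> versions) {
            for (Map.Entry<String, Integer> e : writes.entrySet()) {
                cache.put(e.getKey(), e.getValue());
                cachedVersions.put(e.getKey(), versions.get(e.getKey()));
                for (Tx t : running) {
                    if (t.readSet.containsKey(e.getKey())) t.doomed = true;
                }
            }
        }
    }

    static final class Tx {
        final Worker node;
        final Map<String, Long> readSet = new HashMap<>();
        final Map<String, Integer> writeSet = new HashMap<>();
        boolean doomed;

        Tx(Worker node) { this.node = node; }

        int read(String id) {
            if (writeSet.containsKey(id)) return writeSet.get(id);
            readSet.putIfAbsent(id, node.cachedVersions.getOrDefault(id, 0L));
            return node.cache.getOrDefault(id, 0);
        }

        void write(String id, int value) { writeSet.put(id, value); }

        boolean commit() {
            node.running.remove(this);
            return !doomed && node.master.remoteValidate(readSet, writeSet);
        }
    }

    public static void main(String[] args) {
        Master master = new Master();
        Worker w1 = new Worker(master);
        Worker w2 = new Worker(master);
        Tx t1 = w1.begin();
        Tx t2 = w2.begin();
        t1.write("x", t1.read("x") + 1);
        t2.write("x", t2.read("x") + 10);
        System.out.println(t1.commit() + " " + t2.commit() + " " + master.global.get("x")); // true false 1
    }
}
```

With two workers racing on `x`, the first commit wins, and the eager cache update dooms the other transaction before it even contacts the master.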
DiSTM – Single lease (3/5)
[Figure: Single Lease protocol between the master node (global data) and the worker nodes. Step labels: 1) acquire lease; 2) update, assign lease; 3) true/false; 4) update, release lease.]
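A minimal sketch of the single-lease idea, assuming one lease for the whole cluster: a worker executes speculatively, acquires the lease, validates its read (the true/false outcome), updates the global data if valid, and releases the lease. Threads stand in for worker nodes, and `Master`, `Snapshot`, `acquireLease`, and `commitAndRelease` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class SingleLeaseSketch {
    // Value and version read together, so a speculative read is internally consistent.
    static final class Snapshot {
        final int value;
        final long version;

        Snapshot(int value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    static final class Master {
        private String holder;                                   // node holding the only lease, or null
        private final Map<String, Integer> values = new HashMap<>();
        private final Map<String, Long> versions = new HashMap<>();

        synchronized Snapshot read(String id) {
            return new Snapshot(values.getOrDefault(id, 0), versions.getOrDefault(id, 0L));
        }

        // Acquire lease: block until no other node is in its commit phase.
        synchronized void acquireLease(String node) throws InterruptedException {
            while (holder != null) wait();
            holder = node;
        }

        // Validate (true/false); if valid, update the global data. The lease is always released.
        synchronized boolean commitAndRelease(String node, String id, long seenVersion, int newValue) {
            if (!node.equals(holder)) throw new IllegalStateException(node + " does not hold the lease");
            try {
                if (versions.getOrDefault(id, 0L) != seenVersion) return false;
                values.put(id, newValue);
                versions.merge(id, 1L, Long::sum);
                return true;
            } finally {
                holder = null;
                notifyAll();
            }
        }
    }

    // Each node speculatively computes x + 1, then commits under the lease, retrying on failure.
    static int runWorkers(int nodes, int commitsPerNode) {
        Master master = new Master();
        Thread[] workers = new Thread[nodes];
        for (int i = 0; i < nodes; i++) {
            String node = "worker-" + i;
            workers[i] = new Thread(() -> {
                try {
                    for (int c = 0; c < commitsPerNode; c++) {
                        boolean committed = false;
                        while (!committed) {
                            Snapshot s = master.read("x");   // speculative execution
                            master.acquireLease(node);
                            committed = master.commitAndRelease(node, "x", s.version, s.value + 1);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return master.read("x").value;
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 1_000)); // 4000
    }
}
```

Because only the lease holder can commit and every commit is validated, no increment is lost, but commits from different nodes can never overlap.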
DiSTM – Multiple leases (4/5)
[Figure: Multiple Leases protocol between the master node (global data) and the worker nodes. Step labels: 1) acquire lease, validate; 2) reacquire lease on abort; 3) true/false; 4) update, release lease.]
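The multiple-leases variant can be sketched under an explicit assumption about its policy: the master grants several leases at once, validating each request against the current holders and refusing it when their write sets overlap, so non-conflicting transactions commit in parallel while a refused transaction aborts and re-acquires a lease on restart. The disjoint-write-set test and all names are illustrative; the figure does not spell out the master's exact check.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MultipleLeasesSketch {
    static final class Master {
        // Active leases: worker node -> objects its committing transaction will write.
        private final Map<String, Set<String>> leases = new HashMap<>();

        // Acquire lease + validate: grant only if no other holder writes an overlapping object.
        synchronized boolean acquireLease(String node, Set<String> writeSet) {
            for (Map.Entry<String, Set<String>> e : leases.entrySet()) {
                if (!e.getKey().equals(node) && !Collections.disjoint(e.getValue(), writeSet)) {
                    return false;   // false: the transaction aborts and re-acquires on restart
                }
            }
            leases.put(node, new HashSet<>(writeSet));
            return true;            // true: may update the global data, then release
        }

        // Called after the global data has been updated.
        synchronized void releaseLease(String node) { leases.remove(node); }

        synchronized int activeLeases() { return leases.size(); }
    }

    public static void main(String[] args) {
        Master m = new Master();
        System.out.println(m.acquireLease("w1", Set.of("a", "b"))); // true
        System.out.println(m.acquireLease("w2", Set.of("c")));      // true: disjoint, commits in parallel
        System.out.println(m.acquireLease("w3", Set.of("b")));      // false: overlaps with w1
        m.releaseLease("w1");
        System.out.println(m.acquireLease("w3", Set.of("b")));      // true once w1 has released
    }
}
```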
DiSTM – Anaconda protocol (1/3)
• Fully decentralized, three-stage protocol
• Object caching and replication
• Enables parallel commit of transactions
• Library of distributed atomic collection classes:
  • Arrays, singleton objects, HashMaps, linked lists
DiSTM – Anaconda protocol (2/3)
Three-stage protocol:
1. Lock acquisition: acquire the locks of the objects
2. Validation: validate against concurrently running transactions
3. Update objects: update the objects with their new values and release the locks
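The three stages can be sketched in a single JVM as follows, assuming per-object `ReentrantLock`s, write locks taken in sorted id order with `tryLock` (abort instead of wait), and version counters for validation. In DiSTM the objects are cached, replicated, and spread over nodes; here a shared map stands in for that, and `TObject`, `Tx`, and `runCounter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class AnacondaCommitSketch {
    static final class TObject {
        final ReentrantLock lock = new ReentrantLock();
        volatile int value;
        volatile long version;   // bumped by every committed update
    }

    // Object id -> object (stand-in for objects partitioned across nodes).
    private final Map<String, TObject> heap = new ConcurrentHashMap<>();

    TObject object(String id) { return heap.computeIfAbsent(id, k -> new TObject()); }

    Tx begin() { return new Tx(); }

    final class Tx {
        private final Map<String, Long> reads = new HashMap<>();
        private final Map<String, Integer> writes = new TreeMap<>(); // sorted ids: fixed lock order

        int read(String id) {
            if (writes.containsKey(id)) return writes.get(id);
            TObject o = object(id);
            reads.putIfAbsent(id, o.version); // version before value (commit writes value first)
            return o.value;
        }

        void write(String id, int value) { writes.put(id, value); }

        boolean commit() {
            List<TObject> locked = new ArrayList<>();
            try {
                // Stage 1, lock acquisition: lock every object written; abort rather than wait.
                for (String id : writes.keySet()) {
                    TObject o = object(id);
                    if (!o.lock.tryLock()) return false;
                    locked.add(o);
                }
                // Stage 2, validation: no object read may be mid-commit elsewhere or have changed.
                for (Map.Entry<String, Long> e : reads.entrySet()) {
                    TObject o = object(e.getKey());
                    if (o.lock.isLocked() && !o.lock.isHeldByCurrentThread()) return false;
                    if (o.version != e.getValue()) return false;
                }
                // Stage 3, update objects; the locks are released in the finally block.
                for (Map.Entry<String, Integer> e : writes.entrySet()) {
                    TObject o = object(e.getKey());
                    o.value = e.getValue();
                    o.version++;
                }
                return true;
            } finally {
                for (TObject o : locked) o.lock.unlock();
            }
        }
    }

    // Increments counter "x" from several threads, retrying each transaction until it commits.
    int runCounter(int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    while (true) {
                        Tx tx = begin();
                        tx.write("x", tx.read("x") + 1);
                        if (tx.commit()) break;
                        Thread.onSpinWait();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return object("x").value;
    }

    public static void main(String[] args) {
        System.out.println(new AnacondaCommitSketch().runCounter(4, 10_000)); // 40000
    }
}
```

Because locking is per object rather than global, transactions that write disjoint objects can pass all three stages in parallel, while a transaction that read a stale version fails stage 2 and is retried.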
DiSTM – Anaconda protocol (3/3)
Evaluation
Benchmarks:
• Lee-TM (classic PCB routing benchmark)
• KMeans (clustering algorithm)
• GLife-TM (Conway’s Game of Life cellular automaton)
Hardware:
• 4 nodes × 8 dual-core Opterons, openSUSE, Sun HotSpot 1.6, Gigabit Ethernet
Experimental setup:
• Each node uses 1 to 8 threads (× 4 nodes: min = 4, max = 32 threads)
• We start with one thread per node and increment by one
• Comparative evaluation of the protocols
• Evaluation against the industrial-strength Terracotta clustering JVM
LeeTM
LeeTM
[Figure: LeeTM, aborts per commit (y-axis, 0 to 1.0) for the TCC, SL, ML, and ANA protocols at 4, 8, 12, 16, 20, 24, 28, and 32 threads.]
KMeans
KMeans-Low
[Figure: KMeans-Low, aborts per commit (y-axis, 0 to 30) for the TCC, SL, ML, and ANA protocols at 4, 8, 12, 16, 20, 24, 28, and 32 threads.]
GLife
Categorization
[Table: workload categorization; columns Low / Medium / High, rows Small / Large. Entries include Anaconda (GLife) in the Small row; Anaconda (LeeTM) and Anaconda with a LeeTM variant in the Large row; and Terracotta (Vacation), Terracotta (Genome), and Terracotta (KMeans).]
Bottlenecks – Future Work
• Network optimizations
  • Immediate Services
  • Java Fast Sockets
• Garbage collection
  • Tuning
  • Distributed GC
• Transactional protocols
  • Multi-versioning (D2STM)
• Integration with enterprise servers
  • Real-time workloads
Conclusions
• JVM clustering with software TM
• Study of distributed TM protocols:
  • Centralized: TCC, SL, ML
  • Decentralized: Anaconda
• Performance is influenced by:
  • Transaction abort rate
  • Computational intensity of the applications
• The winning protocol differs depending on the workload
• Evaluation against a state-of-the-art commercial lock-based solution
Further Contributions
• Intel: hardware/software CPU codesign
Further Contributions
Features
• CISC -> VLIW
• Dynamic binary translation and optimizations
  • Load hoisting, code versioning
• Targeting better power/performance
• Real-time path profiling and optimizations
• Aggressive speculation and failure recovery
Further Contributions
Oracle
• Truffle/Graal ("One VM to rule them all")
  • Abstract syntax tree interpreter + dynamic compilation on top of the HotSpot VM
• Multi-language VM
  • JavaScript, Python, Ruby, R, etc.
• Compiler/garbage collection optimizations
  • Write barrier elision, compressed pointers
Research Opportunities within the APT Group
History
• World’s first stored-program computer (the Baby)
• Invention of virtual memory (Atlas)
• Manchester Dataflow Computer
• 2008: 1st place in the RAE
Advanced Processor Technologies Group
• Led by ICL Professor Steve Furber
  • Designer of the ARM processors (BBC Micro, Acorn)
• Diverse research agenda:
  • SpiNNaker (the group is one of the few academic institutions fabricating chips)
  • Computer architecture
  • Systems
  • Compilers and managed runtimes
Advanced Processor Technologies Group
• Major spinoffs:
  • ICL Goldrush database server
  • Amulet processors (low power)
  • Transitive (Rosetta software, acquired by IBM)
  • Silistix (network-on-chip)
• Career opportunities:
  • ARM, Oracle, Intel, Google, Imagination Technologies, etc.
Advanced Processor Technologies Group
• Current projects:
  • SpiNNaker: a universal spiking neural network architecture
  • Teraflux: research in many-core software and hardware following the data-driven task model
  • AXLE: big data analytics acceleration
  • AnyScale Apps
Further info at: http://apt.cs.man.ac.uk/
Advanced Processor Technologies Group
• New initiatives:
  • Pamela: A Panoramic Approach to the Many-CorE LAndscape. We focus on hardware/software codesign for heterogeneous many-core systems for computer vision and data centers, with emphasis on novel virtualization techniques.
  • DOM: Delaying and Overcoming Microprocessor Errors. We focus on hardware/software codesign using virtualization (managed runtime environments) for delaying and detecting microprocessor errors.
Advanced Processor Technologies Group
• Funding:
  • Industrial funding positions by ARM (3 years): deadline ASAP
  • Centre for Doctoral Training (CDT) positions (4 years): deadline as early as possible
http://www.cs.manchester.ac.uk/phd/programmes/cdt/
http://www.mdc.manchester.ac.uk/funding/pdsaward/
http://www.cs.manchester.ac.uk/study/postgraduate-research/programmes/phd/funding/school-studentships/
http://www.cs.manchester.ac.uk/phd/
Contacts: Mikel Lujan, Christos Kotselidis