McRT-Malloc: A Scalable Non-Blocking
Transaction Aware Memory Allocator
Ali Adl-Tabatabai
Ben Hertzberg
Rick Hudson
Bratin Saha
2
Goals of McRT-Malloc
Scalable
• Performance linear in the number of processors, then flat as you add more SW threads
• Preemption safety
• Implies a lock-free approach to all structures
Allows other scalable McRT algorithms to use malloc and remain scalable
Transactional memory awareness
• Avoid memory blowup within a transaction
• Avoid freeing bits needed to validate other transactions
• Enable object-level conflict detection in STM
Best of class
3
Block Data Structure
Heap divided into aligned 16K blocks
• 18 significant bits
Block
• Owned by a single thread during allocation
• Blocks segregated into bins according to object size
• Metadata header
– Free lists
– Bump pointer
– Next/previous block
– Object size and usage info
• No per-object headers
• Free blocks on a non-blocking LIFO queue
– 46 bits for update timestamp
[Figure: a 16K block spanning 0xABCD0000 to 0xABCD4000, with labels for the metadata header, the address 0xABCD0040, and an object pointer into the block.]
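The address arithmetic behind this layout can be sketched as follows. This is a minimal illustration, not the McRT code: it assumes 32-bit addresses (so a 16K-aligned block address has 18 significant bits), and it assumes the 18 block bits and the 46-bit timestamp share one 64-bit word, which the slide suggests but does not state. All names (`block_base`, `pack_head`, `head_block`, `head_stamp`) are hypothetical.

```c
#include <stdint.h>

#define BLOCK_SIZE  0x4000u   /* 16K blocks, as on the slide */
#define BLOCK_SHIFT 14        /* log2(BLOCK_SIZE) */
#define STAMP_BITS  46        /* timestamp width from the slide */
#define STAMP_MASK  ((UINT64_C(1) << STAMP_BITS) - 1)

/* Round an object address down to the start of its block, where the
 * metadata header lives. No per-object header is needed for this. */
static inline uint32_t block_base(uint32_t obj_addr) {
    return obj_addr & ~(uint32_t)(BLOCK_SIZE - 1);
}

/* Assumed packing: 18-bit block index in the high bits, 46-bit
 * timestamp in the low bits, so one 64-bit CAS can update both. */
static inline uint64_t pack_head(uint32_t block_addr, uint64_t stamp) {
    uint64_t index = block_addr >> BLOCK_SHIFT;   /* 18 significant bits */
    return (index << STAMP_BITS) | (stamp & STAMP_MASK);
}

/* Recover the block address from a packed queue head. */
static inline uint32_t head_block(uint64_t head) {
    return (uint32_t)(head >> STAMP_BITS) << BLOCK_SHIFT;
}

/* Recover the timestamp from a packed queue head. */
static inline uint64_t head_stamp(uint64_t head) {
    return head & STAMP_MASK;
}
```

Bumping the timestamp on every update is a common way to keep a CAS on a LIFO head from succeeding against a stale value (the ABA problem).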
4
Object Allocation and Freeing
A thread owns the block it allocates in
Trick - Free uses two linked free lists per block
• Private free list for the block owner avoids atomic instructions
• Public free list for other threads uses atomic instructions and a non-blocking algorithm
Trick - A fresh block uses a frontier pointer to avoid free list initialization
• Then allocates from the private free list
• Privatizes the entire public list as needed with an atomic xchg
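The two-list scheme can be sketched with C11 atomics. This is an illustrative reconstruction under stated assumptions, not the McRT source: the struct layout, the function names, and the exact order of fallbacks (frontier, then private list, then privatizing the public list) are choices made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct free_node { struct free_node *next; } free_node;

typedef struct block {
    char *frontier;                    /* next never-used byte (bump pointer) */
    char *limit;                       /* end of the block's object area */
    size_t obj_size;                   /* one object size per block */
    free_node *private_free;           /* touched only by the owning thread */
    _Atomic(free_node *) public_free;  /* other threads push freed objects here */
} block;

void block_init(block *b, void *area, size_t area_size, size_t obj_size) {
    assert(obj_size >= sizeof(free_node)); /* freed objects hold the link */
    b->frontier = area;
    b->limit = (char *)area + area_size;
    b->obj_size = obj_size;
    b->private_free = NULL;
    atomic_init(&b->public_free, (free_node *)NULL);
}

/* Owner-only allocation: no atomics unless the public list is needed. */
void *block_alloc(block *b) {
    if ((size_t)(b->limit - b->frontier) >= b->obj_size) {
        void *p = b->frontier;         /* fresh block: just bump the frontier */
        b->frontier += b->obj_size;
        return p;
    }
    if (b->private_free == NULL)       /* privatize the whole public list */
        b->private_free = atomic_exchange(&b->public_free, (free_node *)NULL);
    free_node *n = b->private_free;
    if (n != NULL)
        b->private_free = n->next;
    return n;                          /* NULL means the block is full */
}

/* Free by the owning thread: plain stores only. */
void block_free_owner(block *b, void *p) {
    free_node *n = p;
    n->next = b->private_free;
    b->private_free = n;
}

/* Free by any other thread: non-blocking push with compare-and-swap. */
void block_free_remote(block *b, void *p) {
    free_node *n = p;
    free_node *head = atomic_load(&b->public_free);
    do {
        n->next = head;                /* head is refreshed when the CAS fails */
    } while (!atomic_compare_exchange_weak(&b->public_free, &head, n));
}
```

Because remote threads only push and the owner only takes the entire list in one exchange, this particular list never pops a single node concurrently, which sidesteps the ABA problem without a timestamp.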
5
McRT-Malloc: A Transaction Aware Memory Allocator
Three problems
1. Speculative memory allocation and de-allocation inside transactions can cause space blowup
2. Transactional conflict detection and frees
3. Object-based conflict detection in C/C++
Garbage collection also solves these issues
6
Allocation with STM
Speculatively allocate or free inside transaction
• Valid at commit - rolled back on abort
Balanced – both malloc and free within transaction
• Memory is transaction-local and must be reused to prevent memory blowup
transaction {
    for (i=0; i<big_number; i++) {
        foo = malloc(size);
        …
        free(foo);   /* balanced: allocated and freed in the same transaction */
    }
}
7
Solution
Use sequence numbers to track allocation relationships
• Sequence counter per-thread (thread-local)
• Every transaction (even nested) takes a new (incremented) sequence number upon start
• Every allocation in the transaction is tagged with its sequence number
The relationship of an object being freed in a given transaction is determined by sequence number:
• seq(object) < seq(transaction) → speculative free
• seq(object) == seq(transaction) → balanced free
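The sequence-number test above can be sketched in a few lines. This is a hypothetical illustration: `txn_begin_seq`, `classify_free`, and the enum names are invented here, and how the tag is stored with each allocation is left out. The slide defines only the `<` and `==` cases; treating any other case as speculative is a conservative assumption of this sketch.

```c
#include <stdint.h>

typedef enum { FREE_SPECULATIVE, FREE_BALANCED } free_kind;

/* Per-thread sequence counter, as on the slide. */
static _Thread_local uint64_t seq_counter = 0;

/* Every transaction, nested or not, takes a fresh number at start. */
uint64_t txn_begin_seq(void) {
    return ++seq_counter;
}

/* obj_seq is the tag stored when the object was allocated;
 * txn_seq is the number of the transaction performing the free. */
free_kind classify_free(uint64_t obj_seq, uint64_t txn_seq) {
    if (obj_seq == txn_seq)
        return FREE_BALANCED;     /* allocated here: memory can be reused now */
    return FREE_SPECULATIVE;      /* older object: free must wait for commit */
}
```

A balanced free is what lets the loop on the previous slide run in constant space: each `free(foo)` returns memory that the next `malloc(size)` in the same transaction can reuse.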
8
Monitors != Transactions
• STM uses bits in the object to validate at commit
• Pessimistic: monitors (locks) allow only one thread inside a critical section
• Optimistic: transactions allow multiple threads inside a critical section
• This causes problems when freeing an object
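To make "bits in the object" concrete, here is a minimal single-threaded sketch of optimistic read validation. It assumes a per-object version word that writers bump at commit; the slides do not give the actual McRT-STM layout, so the types and function names are hypothetical, and a real STM would use atomic operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define READ_SET_CAP 8

typedef struct { uint64_t version; int key; } object;       /* assumed layout */
typedef struct { const object *obj; uint64_t seen; } read_entry;
typedef struct { read_entry entries[READ_SET_CAP]; size_t count; } read_set;

/* Transactional read: remember which version of the object was seen. */
int txn_read_key(read_set *rs, const object *o) {
    assert(rs->count < READ_SET_CAP);
    rs->entries[rs->count].obj = o;
    rs->entries[rs->count].seen = o->version;
    rs->count++;
    return o->key;
}

/* A committing writer publishes the new value and bumps the version. */
void commit_write(object *o, int key) {
    o->key = key;
    o->version++;
}

/* Commit-time validation: every object read must still have the
 * version that was recorded; otherwise the transaction aborts. */
bool txn_validate(const read_set *rs) {
    for (size_t i = 0; i < rs->count; i++)
        if (rs->entries[i].obj->version != rs->entries[i].seen)
            return false;
    return true;
}
```

Validation dereferences every object in the read set, so it only works while that memory still holds a version word. That is the dependency the following slides show being broken by an early `free`.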
9
nodeDelete(int key) {
    ptr = head of list;
    transaction {
        while( ptr->next->key != key ) {
            ptr = ptr->next;
        } /* end while */
        temp = ptr->next;
        ptr->next = ptr->next->next;
    } /* validate & end transaction */
    free(temp); /* Anyone using? */
}
Thread 1: deleting node 2
Thread 2: deleting node 3
11
At this point you have a read/read (non-)conflict
12
Now we have a read/write conflict
Thread 1 commits and thread 2 will abort
13
STM version information needed for validation is destroyed along with object 2
14
Thread 2 wakes up
15
The bits thread 2 is relying on to detect the conflict and resolve it by aborting are now garbage
16
Solution
Delay the actual free and reuse until the system is in a consistent state
A global epoch (timestamp) is maintained and incremented periodically
Each thread locally remembers the global epoch of the last time it entered or exited a top-level transaction
• Set as part of TransactionBegin and TransactionAbort/Commit
Each free is noted, together with the current global epoch, in a thread-local buffer
When the buffer fills, each thread’s epoch is queried
All frees recorded before the minimum epoch are freed “for real”
Cost is O(number of frees), not O(number of memory accesses)
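The epoch scheme can be sketched as a single-threaded simulation. The names, the fixed-size arrays, and the error return are assumptions made for this illustration; a real implementation would keep one buffer per thread and read the epochs atomically.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_THREADS 4
#define BUF_CAP 8

static uint64_t global_epoch = 1;           /* incremented periodically */
static uint64_t thread_epoch[MAX_THREADS];  /* epoch at last txn begin/end */

typedef struct { void *ptr; uint64_t epoch; } deferred_free_t;
typedef struct { deferred_free_t items[BUF_CAP]; size_t count; } free_buffer;

void advance_epoch(void) { global_epoch++; }

/* Called from TransactionBegin and TransactionAbort/Commit. */
void note_epoch(int tid) { thread_epoch[tid] = global_epoch; }

/* Really free every entry recorded before the minimum thread epoch:
 * no thread can still be validating against those objects.
 * Returns the number of objects released. */
size_t reclaim(free_buffer *fb, int nthreads) {
    uint64_t min = thread_epoch[0];
    for (int t = 1; t < nthreads; t++)
        if (thread_epoch[t] < min)
            min = thread_epoch[t];
    size_t kept = 0, released = 0;
    for (size_t i = 0; i < fb->count; i++) {
        if (fb->items[i].epoch < min) {
            free(fb->items[i].ptr);
            released++;
        } else {
            fb->items[kept++] = fb->items[i];  /* too recent: keep waiting */
        }
    }
    fb->count = kept;
    return released;
}

/* Record a free instead of performing it. Returns 0 on success, or -1
 * if the buffer is full and nothing could be reclaimed yet. */
int deferred_free(free_buffer *fb, void *p, int nthreads) {
    if (fb->count == BUF_CAP && reclaim(fb, nthreads) == 0)
        return -1;
    fb->items[fb->count].ptr = p;
    fb->items[fb->count].epoch = global_epoch;
    fb->count++;
    return 0;
}
```

The work is done per free and per buffer flush, which is why the cost scales with the number of frees rather than with the number of memory accesses inside transactions.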
17
McRT-Malloc Beats Hoard
The Machias benchmark mimics the producer-consumer pattern with a minimal workload
(Normalized so that the X axis indicates linear scaling)
[Figure: McRT Malloc vs. Hoard. X axis: SW threads (1 to 128). Y axis: HW-thread-normalized time, ticks at 1, 10, 100, 1000 (lower is better). Series: Hoard and McRT Malloc, each at 100%, 50%, 25%, 12.5%, and 0% sharing.]
18
McRT STM Malloc Running Machias
[Figure: McRT STM Malloc. X axis: SW threads (1 to 128). Y axis: HW-thread-normalized time, ticks at 1, 10, 100, 1000 (lower is better). Series: 100%, 50%, 25%, 12.5%, and 0% sharing.]
19
McRT STM vs. McRT Malloc Running Machias
[Figure: McRT STM Malloc vs. McRT Malloc. X axis: SW threads (1 to 128). Y axis: HW-thread-normalized time, ticks at 1, 10, 100, 1000 (lower is better). Series: Transactional, Non-Transactional.]
20
McRT STM vs. McRT Malloc Memory Usage Running Machias
[Figure: Memory Usage for Machias. X axis: Machias scenario (1 to 39). Y axis: MBytes used (0 to 40). Series: McRT-STM-Malloc, McRT Malloc.]
21
Conclusion
Best-of-class scalable malloc implementation
Non-blocking, so other McRT algorithms can use malloc and remain non-blocking
Solved memory blowup within a transaction
Solved the premature-free problem for STM with optimistic concurrency
Enabled object-granularity conflict detection in C
22
Questions