Transactional Coherence and Consistency
Presenter: Muhammad Mohsin Butt (g201103010)
COE-502 Paper Presentation 2
Outline
1. Introduction
2. Current Hardware
3. TCC in Hardware
4. TCC in Software
5. Performance Evaluation
6. Conclusion
• Transactional Coherence and Consistency (TCC) provides a lock-free transactional model that simplifies parallel hardware and software.
• Transactions, defined by the programmer, are the basic unit of parallel work.
• Memory coherence, communication, and memory consistency are implicit within a transaction.
Introduction
• Provide the illusion of a single shared memory to all processors.
• The problem is divided into parallel tasks that work on shared data in shared memory.
• Complex cache coherence protocols are required.
• Memory consistency models are also required to ensure program correctness.
• Locks are used to prevent data races and provide sequential access.
• Too much locking overhead can degrade performance.
Current Hardware
TCC in Hardware
• Processors execute speculative transactions in a continuous cycle.
• A transaction is a sequence of instructions, marked by software, that is guaranteed to execute and complete atomically.
• Provides an "All Transactions, All the Time" model that simplifies parallel hardware and software.
TCC in Hardware
• While a transaction executes, its writes are collected in a local buffer.
• After completing the transaction, the hardware arbitrates system-wide for permission to commit it.
• After acquiring permission, the node broadcasts all of the transaction's writes as a single packet.
• Transmission as a single packet reduces the number of interprocessor messages and arbitrations.
• Other processors snoop on these write packets to detect dependence violations.
TCC in Hardware
• TCC simplifies cache design.
• Processors hold data in unmodified or speculatively modified form.
• During snooping, a line is invalidated if the commit packet contains its address only.
• A line is updated if the commit packet contains both address and data.
• Protection against data dependencies:
• If a processor has read from any address in the commit packet, its transaction is re-executed.
TCC in Hardware
• Current CMPs need features that provide speculative buffering of memory references and commit arbitration control.
• A mechanism is required for gathering all modified cache lines from each transaction into a single packet:
• A write buffer completely separate from the cache.
• An address buffer containing a list of tags for lines holding data to be committed.
TCC in Hardware
• Read bits
• Set on a speculative read during a transaction.
• The current transaction is violated and restarted if the snoop protocol sees a commit packet containing the address of a location whose read bit is set.
• Modified bits
• Set to 1 by stores during a transaction.
• On a violation, lines with the modified bit set to 1 are invalidated.
TCC in Software
• Programming with TCC is a three-step process:
• Divide the program into transactions.
• Specify the transaction order.
• Can be relaxed if not required.
• Tune performance.
• TCC provides feedback on where in the program violations occur frequently.
Loop Based Parallelization
• Consider a histogram calculation over 1000 integer percentages.
/* input */
int *in = load_data();
int i, buckets[101] = { 0 };  /* 101 bins: 0..100 percent */
for (i = 0; i < 1000; i++) {
    buckets[in[i]]++;
}
/* output */
print_buckets(buckets);
Loop Based Parallelization
• Can be parallelized by changing the loop to:
t_for (i = 0; i < 1000; i++)
• Each loop body becomes a separate transaction.
• When two parallel iterations try to update the same histogram bucket, the TCC hardware causes the later transaction to violate, forcing it to re-execute.
• A conventional shared-memory model would require locks to protect the histogram bins.
• Can be further optimized using t_for_unordered(), since the iterations need not commit in order.
Fork Based Parallelization
• t_fork() forces the parent transaction to commit and creates two completely new transactions:
• One continues executing the remaining code.
• The other starts executing the function passed as a parameter, e.g.:
/* Initial setup */
int PC = INITIAL_PC;
int opcode = i_fetch(PC);
while (opcode != END_CODE) {
    t_fork(execute, &opcode, 1, 1, 1);
    increment_PC(opcode, &PC);
    opcode = i_fetch(PC);
}
Explicit transaction commit ordering
• Provides partial ordering.
• Done by assigning two parameters to each transaction:
• A sequence number and a phase number.
• Transactions with the same sequence number commit in an order defined by the programmer.
• Transactions with different sequence numbers are independent.
• Ordering among transactions with the same sequence number is achieved through the phase number.
• The transaction with the lowest phase number commits first.
Performance Evaluation
Performance Evaluation
• Maximize parallelism.
• Create as many parallel transactions as possible.
• Minimize violations.
• Keep transactions small to reduce the amount of work lost on a violation.
• Minimize transaction overhead.
• Do not make transactions too small.
• Avoid buffer overflow.
• It can result in excessive serialization.
Performance Evaluation
• Base case.
• Simple parallelization without any optimization.
• Unordered.
• Finding loops that can be unordered.
• Reduction.
• Finding areas that can exploit reduction operations.
• Privatization.
• Privatizing, per transaction, the variables that cause violations.
• Using t_commit().
• Breaks large transactions into small ones that still execute on the same processor. Reduces the work lost on violations and prevents buffer overflow.
• Loop adjustments.
• Using the various loop-adjustment optimizations provided by the compiler.
Performance Evaluation
• Privatization and t_commit() improve performance.
• Inner loops had too many violations; applying loop_adjust to the outer loop improved results.
Performance Evaluation
• CMP performance is close to ideal TCC for a small number of processors.
Conclusions
• Bandwidth limitation is still a problem for scaling TCC to more processors.
• No support for nested parallel loops.
• Dynamic optimization techniques are still required to automate performance tuning on TCC.