Multithreading with Boost Thread and Intel TBB

Multithreading with Boost and TBB. Patrick Charrier, TU Darmstadt.


Page 1: Multithreading with Boost Thread and Intel TBB

Multithreading with Boost and TBB

Patrick Charrier, TU Darmstadt

Page 2: Multithreading with Boost Thread and Intel TBB

Multithreading in Computer Animation


Page 3: Multithreading with Boost Thread and Intel TBB

Data-parallel programming: OpenMP, OpenCL

Task-parallel programming: Boost.Atomic, Boost.Thread, Boost.Lockfree, Intel Threading Building Blocks (TBB)

Page 4: Multithreading with Boost Thread and Intel TBB

Why Boost.Thread?

Platform-independent abstraction over pthreads and WinAPI

Modern C++ API

C++03 support

Syntax compatible with C++11 std::thread

Page 5: Multithreading with Boost Thread and Intel TBB

Boost.Thread Overview

Threads

Mutexes

Locks

Condition variables

Futures

Extensions

Page 6: Multithreading with Boost Thread and Intel TBB

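Boost.Thread "Hello World"

Notes: boost::thread::interrupt() requests interruption of the thread; the request takes effect at the thread's next interruption point. boost::scoped_thread<CallableThread> would join automatically at the end of its scope.

#include <boost/thread.hpp>
#include <cstddef>
#include <iostream>

void f() {
    for (std::size_t i = 0; i < 100; ++i)
        std::cout << "Hello number " << i << std::endl;
}

int main() {
    boost::thread thread(f);  // start f on a new thread
    thread.join();            // wait until f has finished
    return 0;
}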

Page 7: Multithreading with Boost Thread and Intel TBB

Boost.Thread this_thread

Sleeping, exit functions, interruption points

namespace boost {
namespace this_thread {
    thread::id get_id() noexcept;

    template <class Clock, class Duration>
    void sleep_until(const chrono::time_point<Clock, Duration>& abs_time);

    template <class Rep, class Period>
    void sleep_for(const chrono::duration<Rep, Period>& rel_time);

    template <typename Callable>
    void at_thread_exit(Callable func);      // EXTENSION

    void interruption_point();               // EXTENSION
    bool interruption_requested() noexcept;  // EXTENSION
    bool interruption_enabled() noexcept;    // EXTENSION
    class disable_interruption;              // EXTENSION
    class restore_interruption;              // EXTENSION

    // ...
}
}
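A minimal sketch of how get_id() and sleep_for() might be used. It is written against the C++11 standard library equivalents (std::this_thread, std::chrono) so that it compiles without Boost; since the slides describe the Boost.Thread syntax as compatible with C++11, replacing std:: with boost:: should give the Boost version. The helper names sleep_and_measure_ms and worker_has_own_id are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Sleeps on the calling thread and returns how long the sleep actually
// took, in whole milliseconds.
long long sleep_and_measure_ms(int ms) {
    const auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Returns true if a newly started thread reports a different id than the caller.
bool worker_has_own_id() {
    const std::thread::id caller = std::this_thread::get_id();
    std::thread::id worker_id;
    std::thread worker([&worker_id] { worker_id = std::this_thread::get_id(); });
    worker.join();  // join() synchronizes, so reading worker_id afterwards is safe
    return worker_id != caller;
}
```

sleep_for() only promises a lower bound, so the measured time may be longer than requested.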

Page 8: Multithreading with Boost Thread and Intel TBB

Boost.Thread and Joe’s Bank Account

BankAccount JoesAccount;

void bankAgent() {
    for (int i = 10; i > 0; --i) {
        //...
        JoesAccount.Deposit(500);
        //...
    }
}

void Joe() {
    for (int i = 10; i > 0; --i) {
        //...
        JoesAccount.Withdraw(100);
        int myPocket = JoesAccount.GetBalance();
        std::cout << myPocket << std::endl;
        //...
    }
}

int main() {
    //...
    boost::thread thread1(bankAgent); // start concurrent execution of bankAgent
    boost::thread thread2(Joe);       // start concurrent execution of Joe
    thread1.join();
    thread2.join();
    return 0;
}

Page 9: Multithreading with Boost Thread and Intel TBB

Boost.Thread and Joe’s Bank Account

Joe will have problems with his balance: both threads read and modify balance_ without any synchronization, which is a data race.

class BankAccount {
    int balance_;
public:
    void Deposit(int amount) { balance_ += amount; }
    void Withdraw(int amount) { balance_ -= amount; }
    int GetBalance() {
        int b = balance_;
        return b;
    }
};

Page 10: Multithreading with Boost Thread and Intel TBB

Boost.Thread, Joe’s Bank Account and Internal Locks on Mutexes

class BankAccount {
    boost::mutex mtx_;
    int balance_;
public:
    void Deposit(int amount) {
        mtx_.lock();
        balance_ += amount;
        mtx_.unlock();
    }
    void Withdraw(int amount) {
        mtx_.lock();
        balance_ -= amount;
        mtx_.unlock();
    }
    int GetBalance() {
        mtx_.lock();
        int b = balance_;
        mtx_.unlock();
        return b;
    }
};

Page 11: Multithreading with Boost Thread and Intel TBB

Boost.Thread, Joe’s Bank Account and Internal Locks through Lock Guards

Lock guards simplify locking: the mutex is released automatically when the guard goes out of scope, even if an exception is thrown.

class BankAccount {
    boost::mutex mtx_; // explicit mutex declaration
    int balance_;
public:
    void Deposit(int amount) {
        boost::lock_guard<boost::mutex> guard(mtx_);
        balance_ += amount;
    }
    void Withdraw(int amount) {
        boost::lock_guard<boost::mutex> guard(mtx_);
        balance_ -= amount;
    }
    int GetBalance() {
        boost::lock_guard<boost::mutex> guard(mtx_);
        return balance_;
    }
};
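To check that the guarded account behaves as intended, the sketch below runs the bank agent and Joe concurrently and returns the final balance. It assumes the C++11 equivalents std::mutex and std::lock_guard in place of the Boost types so that it builds without Boost; the function run_joes_account and the explicit zero initialization of balance_ are additions for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

class BankAccount {
    std::mutex mtx_;
    int balance_;
public:
    BankAccount() : balance_(0) {}
    void Deposit(int amount) {
        std::lock_guard<std::mutex> guard(mtx_);  // unlocks when guard is destroyed
        balance_ += amount;
    }
    void Withdraw(int amount) {
        std::lock_guard<std::mutex> guard(mtx_);
        balance_ -= amount;
    }
    int GetBalance() {
        std::lock_guard<std::mutex> guard(mtx_);
        return balance_;
    }
};

// Ten deposits of 500 and ten withdrawals of 100, executed concurrently.
int run_joes_account() {
    BankAccount account;
    std::thread agent([&account] { for (int i = 0; i < 10; ++i) account.Deposit(500); });
    std::thread joe([&account] { for (int i = 0; i < 10; ++i) account.Withdraw(100); });
    agent.join();
    joe.join();
    return account.GetBalance();
}
```

Whatever the interleaving, the final balance is 10 * 500 - 10 * 100 = 4000, because every update happens under the mutex.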

Page 12: Multithreading with Boost Thread and Intel TBB

Boost.Thread Mutex Types

Mutex Type             Description
mutex                  ‘just’ a mutex
timed_mutex            mutex with timeout
recursive_mutex        may be recursively locked from same thread
recursive_timed_mutex  recursive lock with timeout
shared_mutex           ‘fair’ reader/writer
upgrade_mutex          reader/writer
null_mutex             ‘convenience’ mutex

Page 13: Multithreading with Boost Thread and Intel TBB

Boost.Atomic as an alternative to Mutexes

Atomics guarantee atomicity for additions, subtractions, etc.

The memory order of each operation can be specified.

boost::atomic<int> balance(0);

// thread1:
balance.fetch_add(500, boost::memory_order_release);

// thread2:
balance.fetch_sub(100, boost::memory_order_release);
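The two fragments above can be combined into a complete program along the following lines. This is a sketch using std::atomic, the C++11 counterpart of boost::atomic; the function name run_atomic_balance and the loop counts are assumptions. It uses relaxed ordering because only the atomicity of the counter is needed here, and join() already orders the final read after both threads.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Two threads update the same balance without any mutex.
int run_atomic_balance() {
    std::atomic<int> balance(0);
    std::thread t1([&balance] {
        for (int i = 0; i < 1000; ++i)
            balance.fetch_add(500, std::memory_order_relaxed);
    });
    std::thread t2([&balance] {
        for (int i = 0; i < 1000; ++i)
            balance.fetch_sub(100, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return balance.load();
}
```

No update is lost, so the result is always 1000 * 500 - 1000 * 100 = 400000.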

Page 14: Multithreading with Boost Thread and Intel TBB

Boost.Thread and External Locks

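Joe is charged 2 Euros per credit card withdrawal.

The whole transaction should be “atomic”. Why is his account suddenly locked?

void ATMWithdrawal(BankAccount& acct, int sum) {
    boost::lock_guard<boost::mutex> guard(acct.mtx_);
    // Withdraw() locks mtx_ again while guard still holds it. A boost::mutex
    // must not be locked twice by the same thread; in practice the thread deadlocks.
    acct.Withdraw(sum);
    acct.Withdraw(2);
}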
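One possible way out, which is not the approach the following slides take, is a recursive mutex from the table of mutex types: the same thread may then lock it again inside Withdraw(). The sketch below assumes std::recursive_mutex as a stand-in for boost::recursive_mutex and makes the mutex a public member purely for illustration; demo_atm_withdrawal is a hypothetical helper.

```cpp
#include <cassert>
#include <mutex>

class BankAccount {
public:
    std::recursive_mutex mtx;  // public only for this illustration
    BankAccount() : balance_(0) {}
    void Deposit(int amount) {
        std::lock_guard<std::recursive_mutex> guard(mtx);
        balance_ += amount;
    }
    void Withdraw(int amount) {
        std::lock_guard<std::recursive_mutex> guard(mtx);
        balance_ -= amount;
    }
    int GetBalance() {
        std::lock_guard<std::recursive_mutex> guard(mtx);
        return balance_;
    }
private:
    int balance_;
};

// Withdrawal plus the 2 Euro fee as one transaction under a single outer lock.
void ATMWithdrawal(BankAccount& acct, int sum) {
    std::lock_guard<std::recursive_mutex> guard(acct.mtx);
    acct.Withdraw(sum);  // re-locks mtx on the same thread, which is allowed here
    acct.Withdraw(2);
}

int demo_atm_withdrawal() {
    BankAccount acct;
    acct.Deposit(500);
    ATMWithdrawal(acct, 100);
    return acct.GetBalance();
}
```

Recursive mutexes hide the locking structure rather than making it explicit, which is one reason to prefer an external-locking design.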

Page 15: Multithreading with Boost Thread and Intel TBB

Boost.Thread - Lock Types

Lock Type               Description
strict_lock<Lockable>   non-copyable lock
unique_lock<Lockable>   lock with many features (try, timeout, deferred)
upgrade_lock<Lockable>  upgrade ownership, can be upgraded to exclusive ownership
reverse_lock<Lockable>  unlock on construction, lock on destruction
spinlock                busy waiting, no sleep (Boost.Atomic)

Page 16: Multithreading with Boost Thread and Intel TBB

Boost.Thread - Locks as Permits

Observation: Whenever an account is modified, a lock must be acquired.

Rephrase: Whenever an account is modified, it must be permitted.

The helper class externally_locked<T,Lockable> treats locks as permits.

Page 17: Multithreading with Boost Thread and Intel TBB

Boost.Thread - externally_locked<T,Lockable>

class BankAccount {
    int balance_;
public:
    void Deposit(int amount) { balance_ += amount; }
    void Withdraw(int amount) { balance_ -= amount; }
};

class AccountManager : public basic_lockable_adapter<thread_mutex> {
public:
    typedef basic_lockable_adapter<thread_mutex> lockable_base_type;
    AccountManager() : checkingAcct_(*this), savingsAcct_(*this) {}
    inline void Checking2Savings(int amount);
    inline void AMoreComplicatedChecking2Savings(int amount);
private:
    externally_locked<BankAccount, AccountManager> checkingAcct_;
    externally_locked<BankAccount, AccountManager> savingsAcct_;
};

Page 18: Multithreading with Boost Thread and Intel TBB

Boost.Thread - externally_locked<T,Lockable>

externally_locked<T,Lockable> makes it possible to protect arbitrary objects of type T.

Locking needs a strict_lock<Lockable>!

void AccountManager::Checking2Savings(int amount) {
    strict_lock<AccountManager> guard(*this);

    // get() hands out the account only in exchange for the lock (the permit).
    checkingAcct_.get(guard).Withdraw(amount);
    savingsAcct_.get(guard).Deposit(amount);
}

Page 19: Multithreading with Boost Thread and Intel TBB

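Boost.Thread - Lock Options

Specify in the lock constructor:

x_lock(Lockable& m, boost::adopt_lock_t)
x_lock(Lockable& m, boost::defer_lock_t)
x_lock(Lockable& m, boost::try_to_lock_t)

Adopt lock: The calling thread has already locked the mutex; the lock object takes over ownership without locking again. (Without an option, the constructor locks immediately; that is the default.)

Defer lock: The mutex is not locked on construction; it must be locked later manually.

Try lock: The lock is only acquired when the mutex is currently unlocked.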
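The three options can be tried out with a short sketch. It assumes std::unique_lock with std::defer_lock, std::try_to_lock and std::adopt_lock, the C++11 counterparts of the Boost tags; the three function names are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Defer: the lock object is constructed without locking and locked later.
bool deferred_lock_demo() {
    std::mutex m;
    std::unique_lock<std::mutex> lk(m, std::defer_lock);
    const bool owned_before = lk.owns_lock();  // not locked yet
    lk.lock();                                 // manual lock
    return !owned_before && lk.owns_lock();
}

// Try: a second thread attempts to take a mutex that the caller holds.
bool try_lock_fails_while_held() {
    std::mutex m;
    bool acquired = true;
    std::lock_guard<std::mutex> holder(m);     // caller owns m
    std::thread t([&] {
        std::unique_lock<std::mutex> lk(m, std::try_to_lock);  // does not block
        acquired = lk.owns_lock();
    });
    t.join();
    return !acquired;
}

// Adopt: the mutex is already locked; the lock object only takes ownership.
bool adopt_lock_demo() {
    std::mutex m;
    m.lock();
    std::unique_lock<std::mutex> lk(m, std::adopt_lock);  // no second lock()
    return lk.owns_lock();                     // lk unlocks m on destruction
}
```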

Page 20: Multithreading with Boost Thread and Intel TBB

Boost.Thread and another deadlock

#include <boost/thread.hpp>

boost::mutex mut;
bool data_ready;

void process_data();

void wait_for_data_to_process() {
    boost::unique_lock<boost::mutex> lock(mut);
    // Spins while holding mut, so the other thread can never lock mut
    // to set data_ready.
    while (!data_ready) {
    }
    process_data();
}

void retrieve_data();
void prepare_data();

void prepare_data_for_processing() {
    retrieve_data();
    prepare_data();
    {
        boost::lock_guard<boost::mutex> lock(mut);
        data_ready = true;
    }
}

Page 21: Multithreading with Boost Thread and Intel TBB

Boost.Thread and Condition Variables

Goal: Wait until data is ready.

The condition variable unlocks the mutex on wait().

The thread sleeps while waiting.

#include <boost/thread.hpp>

boost::condition_variable cond;
boost::mutex mut;
bool data_ready;

void process_data();

void wait_for_data_to_process() {
    boost::unique_lock<boost::mutex> lock(mut);
    while (!data_ready) {
        cond.wait(lock);
    }
    process_data();
}

Page 22: Multithreading with Boost Thread and Intel TBB

Boost.Thread and Condition Variables

Condition variables are more than atomics! Atomics do not support waking up threads.

notify_one() wakes one waiting thread. Note: notify_all() wakes all waiting threads.

void retrieve_data();
void prepare_data();

void prepare_data_for_processing() {
    retrieve_data();
    prepare_data();
    {
        boost::lock_guard<boost::mutex> lock(mut);
        data_ready = true;
    }
    cond.notify_one();
}
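Putting the waiting side and the notifying side together gives a complete hand-off. The sketch below is one way this might look, assuming std::condition_variable and std::mutex in place of the Boost types; the Handoff struct, the payload member and run_handoff are hypothetical names, and the retrieve/prepare steps are reduced to storing a value.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

struct Handoff {
    std::mutex mut;
    std::condition_variable cond;
    bool data_ready = false;
    int payload = 0;

    void prepare_data_for_processing() {
        {
            std::lock_guard<std::mutex> lock(mut);
            payload = 42;        // stands in for retrieve_data()/prepare_data()
            data_ready = true;
        }
        cond.notify_one();       // wake the waiting thread
    }

    int wait_for_data_to_process() {
        std::unique_lock<std::mutex> lock(mut);
        while (!data_ready) {    // loop guards against spurious wake-ups
            cond.wait(lock);     // releases mut while sleeping, re-locks on wake-up
        }
        return payload;
    }
};

int run_handoff() {
    Handoff h;
    int result = 0;
    std::thread consumer([&] { result = h.wait_for_data_to_process(); });
    std::thread producer([&] { h.prepare_data_for_processing(); });
    consumer.join();
    producer.join();
    return result;
}
```

The hand-off works in either start order: if the producer runs first, the consumer finds data_ready already set and never waits.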

Page 23: Multithreading with Boost Thread and Intel TBB

Boost.Thread and Futures

Suppose we want to run a computation asynchronously.

“The computation” is implemented in a single functor (function or function object).

That functor returns a single object of an arbitrary type (int, float, double, …).

At some point in the future we require this value.

Page 24: Multithreading with Boost Thread and Intel TBB

Boost.Thread and Packaged Tasks

#include <boost/thread.hpp>
#include <boost/thread/future.hpp>
#include <cassert>
#include <iostream>

int calculate_the_answer_to_life_the_universe_and_everything() {
    return 42;
}

int main() {
    boost::packaged_task<int> pt(calculate_the_answer_to_life_the_universe_and_everything);
    boost::unique_future<int> fi = pt.get_future();

    boost::thread task(boost::move(pt)); // launch task on a thread

    fi.wait(); // wait for it to finish

    assert(fi.is_ready());
    assert(fi.has_value());
    assert(!fi.has_exception());
    assert(fi.get_state() == boost::future_state::ready);
    assert(fi.get() == 42);

    std::cout << fi.get() << std::endl;
    std::cin.get();

    task.join(); // the task has finished, so this returns immediately
    return 0;
}

Page 25: Multithreading with Boost Thread and Intel TBB

Boost.Thread and Promises

Promises are like packaged tasks, except there is no functor that needs to return.

A value is simply set at an unknown point in the future from somewhere else.

#include <boost/thread/future.hpp>
#include <cassert>

int main() {
    boost::promise<int> pi;
    boost::unique_future<int> fi;
    fi = pi.get_future();

    pi.set_value(42);

    assert(fi.is_ready());
    assert(fi.has_value());
    assert(!fi.has_exception());
    assert(fi.get_state() == boost::future_state::ready);
    assert(fi.get() == 42);

    return 0;
}
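In the slide the value is set on the same thread that reads it. The "from somewhere else" case might look like the sketch below, where a second thread fulfils the promise. It assumes std::promise and std::future, with std::future playing the role of boost::unique_future; promise_from_other_thread is a hypothetical name.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

int promise_from_other_thread() {
    std::promise<int> pi;
    std::future<int> fi = pi.get_future();

    // The promise is move-only, so it is moved into the setting thread.
    std::thread setter([](std::promise<int> p) { p.set_value(42); }, std::move(pi));

    const int value = fi.get();  // blocks until set_value() has been called
    setter.join();
    return value;
}
```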

Page 26: Multithreading with Boost Thread and Intel TBB

Boost.Thread and Futures

The Future is now!

If we can immediately return the result, this is much faster.

Note: boost::async() creates futures.

boost::unique_future<int> compute(int x) {
    if (x == 0) return boost::make_ready_future(0);
    if (x < 0) return boost::make_ready_future<int>(std::logic_error("Error"));
    boost::unique_future<int> f1 = boost::async(calculate_the_answer_to_life_the_universe_and_everything);
    return f1;
}

Page 27: Multithreading with Boost Thread and Intel TBB

Boost.Thread and Futures

Futures can be piped!

Not yet, but in the future (Boost 1.56)?

using namespace boost;

std::string interpret_the_answer_to_life_the_universe_and_everything(unique_future<int> fi) {
    if (42 == fi.get())
        return "I do not understand.";
    else
        return "That is even more confusing.";
}

int main() {
    unique_future<int> f1 = boost::async(calculate_the_answer_to_life_the_universe_and_everything);
    unique_future<std::string> f2 = f1.then(interpret_the_answer_to_life_the_universe_and_everything);

    std::cout << f2.get() << std::endl;
    return 0;
}

Page 28: Multithreading with Boost Thread and Intel TBB

Boost.Thread - Advanced Topics

Locking multiple mutexes at once (boost::lock)

Thread groups

One-time initialization

Barriers

Synchronized values

Other mutex types

Other lock types
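As an illustration of the first topic, locking several mutexes at once avoids the deadlock that arises when two threads take the same two mutexes in opposite order. The sketch assumes std::lock, the C++11 counterpart of boost::lock; the Account struct, transfer and run_opposing_transfers are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

struct Account {
    std::mutex mtx;
    int balance;
    explicit Account(int initial) : balance(initial) {}
};

void transfer(Account& from, Account& to, int amount) {
    std::lock(from.mtx, to.mtx);  // acquires both without risking deadlock
    // The guards adopt the already-held mutexes and release them on scope exit.
    std::lock_guard<std::mutex> g1(from.mtx, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.mtx, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions; returns the combined balance.
int run_opposing_transfers() {
    Account a(1000), b(1000);
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer(b, a, 2); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

Locking from.mtx and then to.mtx with two plain lock_guards could deadlock here, because t1 and t2 would acquire the mutexes in opposite order.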

Page 29: Multithreading with Boost Thread and Intel TBB

Boost.Lockfree

Provides a number of lock-free data structures:

Queue

Single Producer Single Consumer Queue (SPSC)

Stack

Not just concurrent, but really lock-free!

Page 30: Multithreading with Boost Thread and Intel TBB

Boost.Lockfree Queue Example

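// producer_count, consumer_count and iterations are assumed to be declared elsewhere.
// The queue needs an initial capacity unless a capacity<> option is given.
boost::lockfree::queue<int> queue(128);

void producer(void) {
    for (int i = 0; i != iterations; ++i) {
        int value = ++producer_count;
        while (!queue.push(value))
            ; // retry until the push succeeds
    }
}

boost::atomic<bool> done(false);

void consumer(void) {
    int value;
    while (!done) {
        while (queue.pop(value))
            ++consumer_count;
    }

    // drain what is left after the producers have finished
    while (queue.pop(value))
        ++consumer_count;
}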

Page 31: Multithreading with Boost Thread and Intel TBB

Boost.Lockfree SPSC Queue Example

// producer_count, consumer_count and iterations are assumed to be declared elsewhere.
boost::lockfree::spsc_queue<int, boost::lockfree::capacity<1024> > spsc_queue;

void producer(void) {
    for (int i = 0; i != iterations; ++i) {
        int value = ++producer_count;
        while (!spsc_queue.push(value))
            ; // retry while the queue is full
    }
}

boost::atomic<bool> done(false);

void consumer(void) {
    int value;
    while (!done) {
        while (spsc_queue.pop(value))
            ++consumer_count;
    }

    // drain what is left after the producer has finished
    while (spsc_queue.pop(value))
        ++consumer_count;
}

Page 32: Multithreading with Boost Thread and Intel TBB

Boost.Lockfree Stack Example

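// producer_count, consumer_count and iterations are assumed to be declared elsewhere.
boost::lockfree::stack<int> stack(128);

void producer(void) {
    for (int i = 0; i != iterations; ++i) {
        int value = ++producer_count;
        while (!stack.push(value))
            ; // retry until the push succeeds
    }
}

boost::atomic<bool> done(false);

void consumer(void) {
    int value;
    while (!done) {
        while (stack.pop(value))
            ++consumer_count;
    }

    // drain what is left after the producers have finished
    while (stack.pop(value))
        ++consumer_count;
}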

Page 33: Multithreading with Boost Thread and Intel TBB

Intel Threading Building Blocks

Not as low-level as Boost.Thread.

Has more data structures than Boost.Lockfree.

Not strictly task-parallel, but also data-parallel.

But more flexible than OpenMP or OpenCL.

A specialist library for “difficult cases”.

Page 34: Multithreading with Boost Thread and Intel TBB

Intel TBB – parallel_do

The number of elements/iterations is not known in advance - cook until done.

New elements can be inserted dynamically and concurrently, even by the functor itself.

class ApplyFoo {
public:
    void operator()( Item& item ) const {
        Foo(item);
    }
};

void ParallelApplyFooToList( const std::list<Item>& list ) {
    tbb::parallel_do( list.begin(), list.end(), ApplyFoo() );
}

Page 35: Multithreading with Boost Thread and Intel TBB

Intel TBB – pipeline

Parallel assembly lines

Multiple sequential steps for each element

void RunPipeline( int ntoken, FILE* input_file, FILE* output_file ) {
    tbb::filter_t<void,TextSlice*> f1( tbb::filter::serial_in_order, MyInputFunc(input_file) );
    tbb::filter_t<TextSlice*,TextSlice*> f2( tbb::filter::parallel, MyTransformFunc() );
    tbb::filter_t<TextSlice*,void> f3( tbb::filter::serial_in_order, MyOutputFunc(output_file) );
    tbb::filter_t<void,void> f = f1 & f2 & f3;
    tbb::parallel_pipeline(ntoken,f);
}

Page 36: Multithreading with Boost Thread and Intel TBB

Intel TBB – Concurrent Containers

concurrent_hash_map<T>

concurrent_vector<T>

concurrent_queue<T>

// A concurrent hash table that maps strings to ints.
typedef concurrent_hash_map<string,int,MyHashCompare> StringTable;

void CountOccurrences() {
    // Construct empty table.
    StringTable table;

    // Put occurrences into the table
    parallel_for( blocked_range<string*>( Data, Data+N, 1000 ), Tally(table) );

    // Display the occurrences
    for( StringTable::iterator i=table.begin(); i!=table.end(); ++i )
        printf("%s %d\n",i->first.c_str(),i->second);
}

Page 37: Multithreading with Boost Thread and Intel TBB

Intel TBB – Mutual Exclusion

Much like Boost.Thread mutexes

Mutex             Scalable      Fair          Recursive  Long Wait
mutex             OS dependent  OS dependent  no         blocks
recursive_mutex   OS dependent  OS dependent  yes        blocks
spin_mutex        no            no            no         yields
queuing_mutex     ✓             ✓             no         yields
spin_rw_mutex     no            no            no         yields
queuing_rw_mutex  ✓             ✓             no         yields
null_mutex        moot          ✓             ✓          never
null_rw_mutex     moot          ✓             ✓          never

Page 38: Multithreading with Boost Thread and Intel TBB

Intel TBB – Atomic Operations

Much like Boost.Atomic

atomic<int> x(1);

int old = x.fetch_and_add<release>(-1);

Memory order: acquire
Description: Operations after the atomic operation never move over it.
Default for: read

Memory order: release
Description: Operations before the atomic operation never move over it.
Default for: write

Memory order: sequentially consistent
Description: Operations on either side never move over the atomic operation and the sequentially consistent atomic operations have a global order.
Default for: fetch_and_store, fetch_and_add, compare_and_swap
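How acquire and release work as a pair can be seen when one thread publishes ordinary data through an atomic flag. The sketch below uses std::atomic rather than tbb::atomic, on the assumption that the memory orders described above carry the same meaning there; publish_and_read is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int publish_and_read() {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready(false);
    int seen = 0;

    std::thread writer([&] {
        payload = 42;
        // release: the write to payload cannot move below this store
        ready.store(true, std::memory_order_release);
    });
    std::thread reader([&] {
        // acquire: the read of payload cannot move above this load
        while (!ready.load(std::memory_order_acquire))
            std::this_thread::yield();
        seen = payload;
    });

    writer.join();
    reader.join();
    return seen;
}
```

Once the reader observes ready == true, the release/acquire pair guarantees that it also observes payload == 42.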

Page 39: Multithreading with Boost Thread and Intel TBB

Intel TBB – Cache Aligned Allocation

Cache efficiency is very important with today’s hardware.

It is often more important than the number of instructions of an algorithm.

Cache lines require memory alignment.

Cache Aligned Allocator Example:

std::vector<int,cache_aligned_allocator<int> > v;

Page 40: Multithreading with Boost Thread and Intel TBB

Intel TBB – The Task Scheduler

Reasons for task-based programming:

Matching parallelism to available resources

Faster task startup and shutdown

More efficient evaluation order

Improved load balancing

Higher-level thinking

But: More potential for errors!

Page 41: Multithreading with Boost Thread and Intel TBB

Intel TBB – The Task Scheduler

Task Example

class FibTask: public task {
public:
    const long n;
    long* const sum;
    FibTask( long n_, long* sum_ ) : n(n_), sum(sum_) {}
    task* execute() { // Overrides virtual function task::execute
        if( n<CutOff ) {
            *sum = SerialFib(n);
        } else {
            long x, y;
            FibTask& a = *new( allocate_child() ) FibTask(n-1,&x);
            FibTask& b = *new( allocate_child() ) FibTask(n-2,&y);
            // Set ref_count to "two children plus one for the wait".
            set_ref_count(3);
            // Start b running.
            spawn( b );
            // Start a running and wait for all children (a and b).
            spawn_and_wait_for_all(a);
            // Do the sum
            *sum = x+y;
        }
        return NULL;
    }
};
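The same fork-join shape (split above a cut-off, compute serially below it, then combine) can be sketched without TBB. The version below swaps in std::async from the C++11 standard library for the TBB task scheduler, so it illustrates only the structure and not TBB's work stealing; each asynchronous call here may start a full thread. The cut-off value of 10 is an assumption.

```cpp
#include <cassert>
#include <future>

const long CutOff = 10;  // assumed value; below it the serial version is used

long SerialFib(long n) {
    return n < 2 ? n : SerialFib(n - 1) + SerialFib(n - 2);
}

long ParallelFib(long n) {
    if (n < CutOff) {
        return SerialFib(n);
    }
    // Fork: compute fib(n-1) asynchronously ...
    std::future<long> x = std::async(std::launch::async, ParallelFib, n - 1);
    // ... while this thread computes fib(n-2).
    const long y = ParallelFib(n - 2);
    // Join: wait for the child and do the sum.
    return x.get() + y;
}
```

The cut-off matters even more in this variant than in the TBB one, since creating a thread costs far more than spawning a TBB task.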

Page 42: Multithreading with Boost Thread and Intel TBB

Intel TBB – Design Patterns

The hidden strength of TBB lies in the many parallel design patterns it supports and documents:

Agglomeration

Element-wise

Odd-even communication

Wavefront

Compare and Swap Loop

More …

Page 43: Multithreading with Boost Thread and Intel TBB

QUESTIONS?

Page 44: Multithreading with Boost Thread and Intel TBB

Thank you very much! [Speaker]

I look forward to your feedback!

Page 45: Multithreading with Boost Thread and Intel TBB

Boost.Thread barrier

#include <boost/atomic.hpp>
#include <boost/thread.hpp>
#include <boost/thread/scoped_thread.hpp>
#include <boost/thread/barrier.hpp>
#include <boost/bind.hpp>
#include <iostream>

boost::atomic<int> current(0);
boost::mutex io_mutex;

void thread_fun(boost::barrier& cur_barier) {
    current.fetch_add(1, boost::memory_order_relaxed);
    cur_barier.wait(); // blocks until all three threads have arrived

    boost::lock_guard<boost::mutex> locker(io_mutex);
    std::cout << current << std::endl;
}

int main() {
    boost::barrier bar(3);
    boost::scoped_thread<> thr1(boost::thread(boost::bind(&thread_fun, boost::ref(bar))));
    boost::scoped_thread<> thr2(boost::thread(boost::bind(&thread_fun, boost::ref(bar))));
    boost::scoped_thread<> thr3(boost::thread(boost::bind(&thread_fun, boost::ref(bar))));

    std::cin.get();
    return 0;
}