Highly concurrent yet natural programming

mefyl
quentin.hocquet@infinit.io

Version 1.2

Infinit & me

• Quentin "mefyl" Hocquet
• Epita CSI (LRDE) 2008.
• Ex Gostai
• Into language theory
• Joined Infinit early, two years ago.

Infinit

• Founded by Julien "mycure" Quintard, Epita SRS 2007
• Based on his thesis at Cambridge
• Decentralized filesystem in a byzantine environment
• Frontend: file transfer application based on the technology.
• Strong technical culture

Concurrent and parallel programming

Know the difference

Parallel programming
Aims at running two tasks simultaneously. It is a matter of performance.

Concurrent programming
Aims at running two tasks without one blocking the other. It is a matter of behavior.

[Figure: timelines of Task 1 and Task 2 run sequentially, concurrently, and in parallel.]

                          Sequential   Concurrent   Parallel
CPU usage                 N            N            N
Execution time            Long         Short        Shorter
Need to run in parallel   No           No           Yes

Some real life examples

You are the CPU. You want to:

• Watch a film on TV.
• Peel potatoes.

[Figure: timelines for the sequential, concurrent and parallel cases, with blocks labeled TV, Commercials and Peeling.]

You are the CPU. You want to:

• Do the laundry.
• Do the dishes.

[Figure: timelines for the sequential, concurrent and parallel cases, with Load and Unload blocks for each of the two tasks.]

Some programming examples

Video encoding: encode a 2GB raw file to mp4.

• CPU bound.
• File chunks can be encoded separately and then merged later.

[Figure: timelines for the parallel, sequential and concurrent cases, with blocks labeled "Encode first half" and "Encode second half".]

Parallelism is a plus, concurrency doesn't apply.
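The "encode chunks separately, merge later" idea can be sketched with standard C++ futures. This is only an illustration: encode_chunk is a hypothetical stand-in that upper-cases text instead of encoding video, and encode_parallel is a name made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string>

// Hypothetical stand-in for a real encoder: it only upper-cases ASCII
// letters so the result is easy to check.
std::string encode_chunk(std::string chunk)
{
  for (char& c : chunk)
    if (c >= 'a' && c <= 'z')
      c = static_cast<char>(c - 'a' + 'A');
  return chunk;
}

// Encode both halves on separate threads, then merge them in order.
std::string encode_parallel(std::string const& raw)
{
  std::size_t const mid = raw.size() / 2;
  auto first = std::async(std::launch::async, encode_chunk, raw.substr(0, mid));
  auto second = std::async(std::launch::async, encode_chunk, raw.substr(mid));
  std::string head = first.get();
  std::string tail = second.get();
  return head + tail;
}
```

On a multi-core machine the two halves can run at the same time; the merged result is the same as a sequential encode, which is why parallelism here is purely a performance matter.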

An IRC server: handle up to 50k IRC users chatting.

• IO bound.
• A huge number of clients that must be handled concurrently and are mostly waiting.

[Figure: timelines for the concurrent and parallel cases.]

Concurrency is needed, parallelism is superfluous.

Know the difference

Parallelism

• Is never needed for correctness.
• Is about performance, not correct behavior.
• Is about exploiting multi-core and multi-CPU architectures.

Concurrent programming

• Can be needed for correctness.
• Is about correct behavior, sometimes about performance too.
• Is about multiple threads staying responsive while running concurrently.

Know the difference

Parallelism

A good video encoding app:

• Encodes 4 times faster on a 4-core CPU. That's parallelism.
• Has a responsive GUI while encoding. That's concurrency.

Who's best?

If you are parallel, you are concurrent. So why bother?

• Being parallel is much, much more difficult. That's time, money and programmer misery.
• You can't be efficiently parallel past your hardware limit. Those are system calls, captain.

Threads, callbacks

So, how do you write an echo server?

The sequential echo server

    TCPServer server;
    server.listen(4242);
    while (true)
    {
      TCPSocket client = server.accept();
      try
      {
        while (true)
        {
          std::string line = client.read_until("\n");
          client.send(line);
        }
      }
      catch (ConnectionClosed const&)
      {}
    }

With the per-client loop factored out into serve_client, the body of the accept loop becomes:

    TCPSocket client = server.accept();
    serve_client(client);

• Dead simple: you got it instantly. It's natural.
• But wrong: we handle only one client at a time.
• We need ... concurrency!

The parallel echo server

    TCPServer server;
    server.listen(4242);
    std::vector<std::thread> threads;
    while (true)
    {
      TCPSocket client = server.accept();
      // A std::thread starts running as soon as it is constructed.
      // The socket is moved into the lambda: capturing the loop-local
      // client by reference would leave the thread a dangling reference.
      threads.emplace_back(
        [client = std::move(client)]() mutable { serve_client(client); });
    }

• Almost as simple and still natural.
• To add the concurrency property, we just added a concurrency construct to the existing code.

But parallelism is too much

• Not scalable: you can't run 50k threads.
• Induces unwanted complexity: race conditions.

    int line_count = 0;
    while (true)
    {
      TCPSocket client = server.accept();
      std::thread client_thread(
        [&line_count, client = std::move(client)]() mutable
        {
          while (true)
          {
            std::string line = client.read_until("\n");
            client.send(line);
            ++line_count; // Race: several threads increment it unsynchronized.
          }
        });
      // ...
    }
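To make the race on the shared counter concrete, here is a minimal sketch of one way to fix it with system threads, using std::atomic. The function name count_lines and its parameters are made up for this example; each thread merely simulates a client sending lines.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each thread plays the role of one client sending `lines` lines.
int count_lines(int clients, int lines)
{
  std::atomic<int> line_count{0}; // A plain int here would be a data race.
  std::vector<std::thread> threads;
  for (int c = 0; c < clients; ++c)
    threads.emplace_back([&line_count, lines] {
      for (int l = 0; l < lines; ++l)
        ++line_count; // Atomic read-modify-write.
    });
  for (auto& t : threads)
    t.join();
  return line_count.load();
}
```

The fix works, but it is exactly the kind of extra care that every piece of shared state needs once real parallelism is involved.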

We need concurrency without threads

We need to accept, read from and write to sockets without threads, and therefore without blocking.

• Use select to monitor all sockets at once.
• Register actions to be done when something is ready.
• Wake up only when something needs to be performed.

This is abstracted with the reactor design pattern:

• libevent
• Boost ASIO
• Python Twisted
• ...
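The three steps above (monitor with select, register actions, wake up only when needed) can be sketched in a few lines on a POSIX system. MiniReactor, on_readable, run_once and demo_reactor are names invented for this illustration; this is not how libevent or Boost ASIO are implemented, and pipes stand in for sockets.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <map>
#include <string>
#include <sys/select.h>
#include <unistd.h>
#include <utility>

// Hypothetical minimal reactor: maps file descriptors to read callbacks.
class MiniReactor
{
public:
  void on_readable(int fd, std::function<void (int)> callback)
  {
    this->_callbacks[fd] = std::move(callback);
  }

  // Block until at least one registered descriptor is readable, then run
  // the callbacks of the ready ones. Returns how many callbacks ran.
  int run_once()
  {
    if (this->_callbacks.empty())
      return 0;
    fd_set readable;
    FD_ZERO(&readable);
    int max_fd = -1;
    for (auto const& entry : this->_callbacks)
    {
      FD_SET(entry.first, &readable);
      if (entry.first > max_fd)
        max_fd = entry.first;
    }
    if (select(max_fd + 1, &readable, nullptr, nullptr, nullptr) <= 0)
      return 0;
    int ran = 0;
    for (auto const& entry : this->_callbacks)
      if (FD_ISSET(entry.first, &readable))
      {
        entry.second(entry.first);
        ++ran;
      }
    return ran;
  }

private:
  std::map<int, std::function<void (int)>> _callbacks;
};

// Usage: watch two pipes, make only one of them readable.
std::string demo_reactor()
{
  int a[2], b[2];
  if (pipe(a) != 0 || pipe(b) != 0)
    return "";
  std::string received;
  MiniReactor reactor;
  auto reader = [&received] (int fd)
  {
    char buffer[64];
    ssize_t n = read(fd, buffer, sizeof buffer);
    if (n > 0)
      received.append(buffer, static_cast<std::size_t>(n));
  };
  reactor.on_readable(a[0], reader);
  reactor.on_readable(b[0], reader);
  ssize_t written = write(b[1], "ping\n", 5);
  int ran = written == 5 ? reactor.run_once() : 0;
  for (int fd : {a[0], a[1], b[0], b[1]})
    close(fd);
  return ran == 1 ? received : "";
}
```

Only the callback of the pipe that actually has data runs; the idle one costs nothing, which is what lets this approach scale to many mostly-waiting clients.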

The callback-based echo server

    Reactor reactor;
    TCPServer server(reactor);
    server.accept(&handle_connection);
    reactor.run();

    void handle_connection(TCPSocket& client)
    {
      client.read_until("\n", &handle_read);
    }

    void handle_read(TCPSocket& c, std::string const& l, Error e)
    {
      if (!e)
        c.send(l, &handle_sent);
    }

    void handle_sent(TCPSocket& client, Error error)
    {
      if (!error)
        client.read_until("\n", &handle_read);
    }

How do we feel now?

• This one scales to thousands of clients.
• Yet to add the concurrency property, we had to completely change the way we think.
• A bit more verbose and complex, but nothing too bad ... right?

Counting lines with threads

    int lines_count = 0;
    try
    {
      while (true)
      {
        std::string line = client.read_until("\n");
        ++lines_count;
        client.send(line);
      }
    }
    catch (ConnectionClosed const&)
    {
      std::cerr << "Client sent " << lines_count << " lines\n";
    }

Counting lines with callbacks

    using namespace std::placeholders;

    void handle_connection(TCPSocket& client)
    {
      // The counter must live on the heap and be passed around by hand.
      int* count = new int(0);
      client.read_until("\n", std::bind(&handle_read, _1, _2, _3, count));
    }

    void handle_read(TCPSocket& c, std::string const& l, Error e, int* count)
    {
      if (e)
      {
        std::cerr << *count << std::endl;
        delete count;
      }
      else
      {
        ++*count;
        c.send(l, std::bind(&handle_sent, _1, _2, count));
      }
    }

    void handle_sent(TCPSocket& client, Error error, int* count)
    {
      if (error)
      {
        std::cerr << *count << std::endl;
        delete count;
      }
      else
        client.read_until("\n", std::bind(&handle_read, _1, _2, _3, count));
    }

Callback-based programming considered harmful

• Code is structured with callbacks.
• Asynchronous operations break the flow arbitrarily.
• You lose all syntactic scoping constructs (local variables, closures, exceptions, ...).
• This is not natural. Damn, this is pretty much as bad as GOTO.

Are we screwed?

Threads

• Respect your beloved semantics and expressiveness.
• Don't scale and introduce race conditions.

Callbacks

• Scale.
• Ruin your semantics. Painful to write, close to impossible to maintain.

I lied when I said: we need concurrency without threads.

We need concurrency without system threads.

Coroutines

Also known as:

• green threads
• userland threads
• fibers
• contexts
• ...

Coroutines

• Separate execution contexts like system threads.
• Userland: no need to ask the kernel.
• Non-parallel.
• Cooperative instead of preemptive: they yield to each other.

By building on top of that, we have:

• Scalability: no system thread involved.
• No arbitrary race conditions: no parallelism.
• A stack, a context: the code is natural.

Coroutines-based scheduler

• Make a scheduler that holds coroutines.
• Embed a reactor in there.
• Write a neat Socket class. When asked to read, it:

◦ Unschedules the calling coroutine.
◦ Asks the reactor to read.
◦ Passes a callback that reschedules the coroutine.
◦ Yields control back.

Coroutines-based echo server

    TCPServer server;
    server.listen(4242);
    std::vector<Thread> threads;
    int lines_count = 0;
    while (true)
    {
      TCPSocket client = server.accept();
      // lines_count is shared by reference; the socket is moved in.
      Thread t([&lines_count, client = std::move(client)]() mutable {
        try
        {
          while (true)
          {
            client.send(client.read_until("\n"));
            ++lines_count; // No race: coroutines never run in parallel.
          }
        }
        catch (ConnectionClosed const&) {}
      });
      threads.push_back(std::move(t));
    }

What we built at Infinit: the reactor.

• Coroutine scheduler: simple round robin
• Sleeping, waiting
• Timers
• Synchronization
• Mutexes, semaphores
• TCP networking
• SSL
• UPnP
• HTTP client (Curl based)

Coroutine scheduling

    reactor::Scheduler sched;
    reactor::Thread t1(sched,
      [&]
      {
        print("Hello 1");
        reactor::yield();
        print("Bye 1");
      });
    reactor::Thread t2(sched,
      [&]
      {
        print("Hello 2");
        reactor::yield();
        print("Bye 2");
      });
    sched.run();

Output:

    Hello 1
    Hello 2
    Bye 1
    Bye 2
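For the curious, a round-robin coroutine scheduler that produces the same interleaving can be sketched with the POSIX ucontext functions. This is not Infinit's reactor: MiniScheduler, spawn and demo_scheduler are hypothetical names, the stack size is an arbitrary choice, and exceptions escaping a coroutine body are not handled.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <ucontext.h>
#include <utility>
#include <vector>

// Hypothetical miniature scheduler, for illustration only.
class MiniScheduler
{
public:
  void spawn(std::function<void ()> body)
  {
    auto co = std::make_unique<Coroutine>();
    co->body = std::move(body);
    co->stack.resize(64 * 1024);
    getcontext(&co->context);
    co->context.uc_stack.ss_sp = co->stack.data();
    co->context.uc_stack.ss_size = co->stack.size();
    co->context.uc_link = &this->_main; // Resumed when the body returns.
    makecontext(&co->context, &MiniScheduler::trampoline, 0);
    this->_ready.push_back(std::move(co));
  }

  // Give control back to the scheduler; resumed on the next round.
  void yield()
  {
    swapcontext(&this->_current->context, &this->_main);
  }

  void run()
  {
    current_scheduler = this;
    while (!this->_ready.empty())
    {
      std::unique_ptr<Coroutine> co = std::move(this->_ready.front());
      this->_ready.pop_front();
      this->_current = co.get();
      swapcontext(&this->_main, &co->context);
      if (!co->done)
        this->_ready.push_back(std::move(co)); // Round robin.
    }
  }

private:
  struct Coroutine
  {
    std::function<void ()> body;
    std::vector<char> stack;
    ucontext_t context;
    bool done = false;
  };

  // Entry point of every coroutine: run the body, then mark it finished.
  static void trampoline()
  {
    Coroutine* self = current_scheduler->_current;
    self->body();
    self->done = true;
  }

  static MiniScheduler* current_scheduler;
  std::deque<std::unique_ptr<Coroutine>> _ready;
  Coroutine* _current = nullptr;
  ucontext_t _main;
};

MiniScheduler* MiniScheduler::current_scheduler = nullptr;

// Usage: two coroutines that each yield once.
std::vector<std::string> demo_scheduler()
{
  std::vector<std::string> log;
  MiniScheduler sched;
  sched.spawn([&] { log.push_back("Hello 1"); sched.yield(); log.push_back("Bye 1"); });
  sched.spawn([&] { log.push_back("Hello 2"); sched.yield(); log.push_back("Bye 2"); });
  sched.run();
  return log;
}
```

Everything happens on a single system thread: a context switch is just a swap of saved registers and stacks, with no kernel scheduling involved.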

Sleeping and waiting

    reactor::Thread t1(sched,
      [&]
      {
        print("Hello 1");
        reactor::sleep(500_ms);
        print("Bye 1");
      });
    reactor::Thread t2(sched,
      [&]
      {
        print("Hello 2");
        reactor::yield();
        print("World 2");
        reactor::yield();
        print("Bye 2");
      });

Output ("Bye 1" comes last, once the sleep is over):

    Hello 1
    Hello 2
    World 2
    Bye 2
    Bye 1

If the second coroutine waits for the first one instead:

      [&]
      {
        print("Hello 2");
        reactor::yield();
        print("World 2");
        reactor::wait(t1); // Wait
        print("Bye 2");
      }

Output:

    Hello 1
    Hello 2
    World 2
    Bye 1
    Bye 2

Synchronization: signals

    reactor::Signal task_available;
    std::vector<Task> tasks;

    reactor::Thread handler([&] {
      while (true)
      {
        if (!tasks.empty()) // 1
        {
          std::vector<Task> mytasks = std::move(tasks);
          for (auto& task: mytasks)
            ; // Handle task
        }
        else
          reactor::wait(task_available); // 4
      }
    });

    tasks.push_back(...); // 2
    task_available.signal(); // 3

With preemptive threads, the interleaving 1, 2, 3, 4 would lose the wake-up. With coroutines, nothing else can run between 1 and 4 because the handler does not yield in between.

Synchronization: channels

    reactor::Channel<Task> tasks;

    reactor::Thread handler([&] {
      while (true)
      {
        Task t = tasks.get();
        // Handle task
      }
    });

    tasks.put(...);
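For comparison, a channel with the same put/get interface can be sketched on top of system threads with a mutex and a condition variable. MiniChannel and demo_channel are hypothetical names; reactor::Channel would suspend the calling coroutine instead of blocking a thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <initializer_list>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical channel for system threads, for illustration only.
template <typename T>
class MiniChannel
{
public:
  void put(T value)
  {
    {
      std::lock_guard<std::mutex> lock(this->_mutex);
      this->_queue.push(std::move(value));
    }
    this->_available.notify_one();
  }

  // Block until a value is available, then return it.
  T get()
  {
    std::unique_lock<std::mutex> lock(this->_mutex);
    this->_available.wait(lock, [this] { return !this->_queue.empty(); });
    T value = std::move(this->_queue.front());
    this->_queue.pop();
    return value;
  }

private:
  std::mutex _mutex;
  std::condition_variable _available;
  std::queue<T> _queue;
};

// Usage: a handler thread consumes three tasks produced by the caller.
std::vector<int> demo_channel()
{
  MiniChannel<int> tasks;
  std::vector<int> handled;
  std::thread handler([&] {
    for (int i = 0; i < 3; ++i)
      handled.push_back(tasks.get());
  });
  for (int task : {1, 2, 3})
    tasks.put(task);
  handler.join();
  return handled;
}
```

The channel hides the "check, then wait" dance of the signal version: the consumer simply asks for the next task.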

Mutexes

But you said no race conditions! You lied again!

    reactor::Thread t([&] {
      while (true)
      {
        for (auto& socket: sockets)
          socket.send("YO"); // send() can yield in the middle of the iteration.
      }
    });

    sockets.push_back(...);

With a mutex, locked and unlocked by hand:

    reactor::Mutex mutex;
    reactor::Thread t([&] {
      while (true)
      {
        reactor::wait(mutex);
        for (auto& socket: sockets)
          socket.send("YO");
        mutex.unlock();
      }
    });

    {
      reactor::wait(mutex);
      sockets.push_back(...);
      mutex.unlock();
    }

With a scoped lock:

    reactor::Mutex mutex;
    reactor::Thread t([&] {
      while (true)
      {
        reactor::Lock lock(mutex);
        for (auto& socket: sockets)
          socket.send("YO");
      }
    });

    {
      reactor::Lock lock(mutex);
      sockets.push_back(...);
    }
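Why the scoped lock is preferable to unlocking by hand: the destructor releases the mutex on every exit path, including exceptions. A minimal sketch, assuming a toy FlagMutex that only records its state (both FlagMutex and ScopedLock are hypothetical and unrelated to how reactor::Lock is actually written):

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical mutex that only records its state, to keep the example
// deterministic. A real one would suspend the caller while locked.
struct FlagMutex
{
  bool locked = false;
  void lock() { locked = true; }
  void unlock() { locked = false; }
};

// Scoped lock: acquires in the constructor, releases in the destructor.
class ScopedLock
{
public:
  explicit ScopedLock(FlagMutex& mutex) : _mutex(mutex) { _mutex.lock(); }
  ~ScopedLock() { _mutex.unlock(); }
  ScopedLock(ScopedLock const&) = delete;
  ScopedLock& operator=(ScopedLock const&) = delete;

private:
  FlagMutex& _mutex;
};

// Returns whether the mutex is still locked after the guarded section threw.
bool locked_after_throw()
{
  FlagMutex mutex;
  try
  {
    ScopedLock lock(mutex);
    throw std::runtime_error("send failed");
  }
  catch (std::runtime_error const&)
  {}
  return mutex.locked;
}
```

With the hand-written wait/unlock pair, an exception thrown by send would skip the unlock and leave the mutex locked forever.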

Networking: TCP

We saw a good deal of TCP networking:

    try
    {
      reactor::TCPSocket socket("battle.net", 4242, 10_sec);
      // ...
    }
    catch (reactor::network::ResolutionFailure const&)
    {
      // ...
    }
    catch (reactor::network::Timeout const&)
    {
      // ...
    }

    void serve(TCPSocket& client)
    {
      try
      {
        std::string auth = client.read_until("\n", 10_sec);
        if (!check_auth(auth))
          // Impossible with callbacks
          throw InvalidCredentials();
        while (true) { ... }
      }
      catch (reactor::network::Timeout const&)
      {}
    }

Networking: SSL

Transparent client handshaking:

    reactor::network::SSLSocket socket("localhost", 4242);
    socket.write(...);

Transparent server handshaking:

    reactor::network::SSLServer server(certificate, key);
    server.listen(4242);
    while (true)
    {
      auto socket = server.accept();
      reactor::Thread([&] { ... });
    }

First version of accept: the handshake is done inline.

    SSLSocket SSLServer::accept()
    {
      auto socket = this->_tcp_server.accept();
      // SSL handshake
      return socket;
    }

Second version: a dedicated coroutine does the handshakes and hands sockets over through a channel.

    reactor::Channel<SSLSocket> _sockets;

    void SSLServer::_handshake_thread()
    {
      while (true)
      {
        auto socket = this->_tcp_server.accept();
        // SSL handshake
        this->_sockets.put(socket);
      }
    }

    SSLSocket SSLServer::accept()
    {
      return this->_sockets.get();
    }

Third version: one coroutine per handshake.

    void SSLServer::_handshake_thread()
    {
      while (true)
      {
        auto socket = this->_tcp_server.accept();
        // The socket is moved into the coroutine rather than captured by
        // reference, since the local variable dies at the end of the loop body.
        reactor::Thread t([this, socket = std::move(socket)]() mutable
        {
          // SSL handshake
          this->_sockets.put(std::move(socket));
        });
      }
    }

    std::string google = reactor::http::get("google.com");

    reactor::http::Request r("kissmetrics.com/api",
                             reactor::http::Method::PUT,
                             "application/json",
                             5_sec);
    r.write("{ \"event\": \"login\" }");
    reactor::wait(r);

• Chunking
• Cookies
• Custom headers
• Upload/download progress
• ... pretty much anything Curl supports (i.e., everything)

HTTP streaming

    // The whole body is downloaded into a string, then parsed.
    std::string content = reactor::http::get("my-api.infinit.io/transactions");
    auto json = json::parse(content);

    reactor::http::Request r("my-api.production.infinit.io/transactions");
    assert(r.status() == reactor::http::Status::OK);
    // JSON is parsed on the fly.
    auto json = json::parse(r);

    reactor::http::Request r("youtube.com/upload", reactor::http::Method::PUT);
    std::ifstream input("~/A new hope - BrRIP.mp4");
    std::copy(input, r); // Pseudocode: stream the file into the request.

Better concurrency: futures, ...

Version 1:

    std::string transaction_id = reactor::http::put("my-api.production.infinit.io/transactions");
    // Ask the user files to share.
    reactor::http::post("my-api.infinit.io/transaction/", file_list);
    std::string s3_token = reactor::http::get("s3.aws.amazon.com/get_token?key=...");
    // Upload files to S3

Version 2:

    reactor::http::Request transaction("my-api.production.infinit.io/transactions");
    reactor::http::Request s3("s3.aws.amazon.com/get_token?key=...");
    // Ask the user files to share.
    auto transaction_id = transaction.content();
    reactor::http::Request list("my-api.infinit.io/transaction/", file_list);
    auto s3_token = s3.content();
    // Upload files to S3

[Figure: timelines of Version 1 and Version 2, with blocks labeled "Wait meta", "Ask files" and "Wait AWS".]
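The same "start early, collect late" pattern can be sketched with standard C++ futures. Note the differences: std::async uses system threads rather than coroutines, and fake_request is a hypothetical stand-in that sleeps instead of doing HTTP.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for an HTTP request: waits a little, then answers.
std::string fake_request(std::string url)
{
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  return "response from " + url;
}

// Start both requests first, collect the answers only when needed.
std::pair<std::string, std::string> fetch_both()
{
  auto transaction =
    std::async(std::launch::async, fake_request, std::string("meta"));
  auto s3 = std::async(std::launch::async, fake_request, std::string("s3"));
  // ... ask the user which files to share while both requests are in flight ...
  std::string transaction_id = transaction.get();
  std::string s3_token = s3.get();
  return {transaction_id, s3_token};
}
```

Both waits overlap with each other and with whatever is done in between, so the total time approaches the slowest request instead of the sum of all of them.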

How does it perform for us?

• Notification server does perform:

◦ 10k clients per instance
◦ 0.01 load average
◦ 1G resident memory
◦ Cheap single-core 2.5 GHz (EC2)

• Life is so much better:

◦ Code is easy and pleasant to write and read
◦ Everything is maintainable
◦ Send metrics on login without slowdown? No biggie.
◦ Try connecting to several interfaces and keep the first to respond? No biggie.

Questions?
