Ruby Concurrency

Post on 15-Jul-2015

or

Concurrency Hell

How I stopped worrying about it

Egor Hamaliy

Agenda

1. Concurrency and Parallelism

2. General Concepts

3. Models

• Actors

• Mutexes/Locks

• STM

• CSP

• Futures/Promises

• …

• Coroutines

• Evented IO

• Process calculi

• Petri nets

• ...

GENERAL CONCEPTS

Concurrency/Parallelism

OS mechanism

Scheduling

Communication

Part #1

Concurrency vs Parallelism

What's the difference?

Not all programmers agree on the meaning of the terms 'parallelism' and 'concurrency'. They may define them in different ways or not distinguish them at all.

Rob Pike

Concurrency is about dealing with lots of things at once.

Parallelism is about doing lots of things at once.

http://blog.golang.org/concurrency-is-not-parallelism
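A rough illustration of the distinction, assuming MRI (CRuby): only one thread executes Ruby code at a time there, but a blocking call such as sleep lets other threads proceed. The four waits below overlap (concurrency) even though no Ruby code runs in parallel. The variable names are made up for this sketch.

```ruby
# Four threads that each wait 0.1s. The waits overlap, so the total
# is far less than the 0.4s a serial run would need.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = 4.times.map { Thread.new { sleep 0.1 } }
threads.each(&:join)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
serial_estimate = 4 * 0.1   # what running the waits one after another would cost
```

This shows "dealing with lots of things at once"; doing CPU-bound Ruby work at once would need processes or a runtime without a global lock.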

Execution

How are things executed?

• Process

• Thread

• Green Thread

OS Primitives

Scheduling

How are things scheduled?

Preemptive

Cooperative
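The two scheduling styles can be sketched in Ruby as follows. Fibers are scheduled cooperatively (each one runs until it yields), while threads are scheduled preemptively by the VM/OS. The make_task helper and the round-robin loop are hypothetical, written only for this illustration.

```ruby
log = []

# make_task is a hypothetical helper, not a library function.
make_task = lambda do |name, steps|
  Fiber.new do
    steps.times do |i|
      log << "#{name}#{i}"
      Fiber.yield            # voluntarily hand control back to the scheduler
    end
    :done                    # final value returned by the last resume
  end
end

tasks = [make_task.call("a", 2), make_task.call("b", 2)]

# Cooperative: a tiny round-robin scheduler resumes each fiber in turn
# and drops the ones that have finished.
until tasks.empty?
  tasks.reject! { |fiber| fiber.resume == :done }
end

# Preemptive: the program does not decide when each thread runs;
# it can only wait for the results.
threads = 2.times.map { |n| Thread.new { n * 10 } }
thread_results = threads.map(&:value)
```

With fibers the interleaving is fully determined by where Fiber.yield is called; with threads it is not under the program's control.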

Communication

How do the executing things not trip over each other?

Communication is always HARD

Models

Threads/Mutexes

Transactional Memory

Processes & IPC

CSP

Evented/Coroutines

Actors

Part #2

Model                | Execution         | Scheduling  | Communication                  | Concurrent/Parallel | Implementation
Mutexes              | Threads           | Preemptive  | Shared memory (locks)          | C/P                 | Mutex
Transactional Memory | Threads           | Preemptive  | Shared memory (commit/abort)   | C/P                 | Clojure STM
Processes & IPC      | Processes         | Preemptive  | Shared memory/Message passing  | C/P                 | Resque/Forking
CSP                  | Threads/Processes | Preemptive  | Message passing (channels)     | C/P                 | Golang
Actors               | Threads           | Preemptive  | Message passing (mailboxes)    | C/P                 | Erlang/Celluloid
Futures & Promises   | Threads           | Cooperative | Message passing (itself)       | C/P                 | Oz/Celluloid
Coroutines           | 1 process/thread  | Cooperative | Message passing                | C                   | Fibers
Evented              | 1 process/thread  | Cooperative | Shared memory                  | C                   | Eventmachine

Atomicity problems

Mutex

Pros

• No need to worry about scheduling (preemptive)

• Commonly used

• Wide language support

Cons

• Scheduling overhead (context switching)

• Synchronization/locking issues
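A minimal sketch of the mutex model using Ruby's built-in Mutex. The counter and thread counts are arbitrary values chosen for the example.

```ruby
counter = 0
lock = Mutex.new

threads = 4.times.map do
  Thread.new do
    1_000.times do
      # "counter += 1" is a read, an increment and a write; the lock
      # makes the three steps atomic with respect to other threads.
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
```

Every thread has to pass through the same lock, which is where the synchronization cost listed under Cons comes from.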

Transactional Memory

http://en.wikipedia.org/wiki/Software_transactional_memory

"After completing an entire transaction, [a thread] verifies that other threads have not made changes to memory that it accessed in the past ... [The operation in which the changes are validated and,] if validation is successful, made permanent, is called a commit."

"[A transaction] may also abort at any time."

Thread1:

atomic {
  - read variable
  - increment variable
  - write variable
}

Thread2:

atomic {
  - read variable
  - increment variable
  # going to write, but Thread1 has already written the variable...
  # notices Thread1 changed the data, so ROLLS BACK
  - write variable (not committed; the transaction is retried)
}

“Don't wait on a lock, just check when we're ready to commit”
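The check-at-commit idea can be sketched in plain Ruby. Ruby has no STM in its standard library, so VersionedRef below is a hypothetical class written for this example; it is not a real STM and still uses a short internal lock for the validate-and-commit step. It only illustrates the validate, commit, or roll back and retry cycle.

```ruby
# VersionedRef is a hypothetical illustration, not a library class.
class VersionedRef
  attr_reader :retries

  def initialize(value)
    @value = value
    @version = 0
    @retries = 0
    @commit_lock = Mutex.new   # held only briefly, never during the work itself
  end

  def value
    @commit_lock.synchronize { @value }
  end

  # Runs the block against a snapshot. Commits only if no other thread
  # committed in the meantime; otherwise discards the result and retries.
  def atomically
    loop do
      snapshot, seen = @commit_lock.synchronize { [@value, @version] }
      new_value = yield(snapshot)            # the "transaction" body
      committed = @commit_lock.synchronize do
        if @version == seen                  # validation
          @value = new_value                 # commit
          @version += 1
          true
        else
          @retries += 1                      # roll back: drop new_value
          false
        end
      end
      return new_value if committed
    end
  end
end

ref = VersionedRef.new(0)
threads = 4.times.map do
  Thread.new { 500.times { ref.atomically { |v| v + 1 } } }
end
threads.each(&:join)
```

Because the block may run more than once, it must be free of side effects that cannot be undone, which matches the limitation usually cited against transactional memory.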

Pros

• Increased concurrency

• No thread needs to wait for access to a resource

• Smaller scope that needs synchronizing - modifying disjoint parts of a data structure

STM's benefits

http://www.haskell.org/haskellwiki/Software_transactional_memory

Cons

• Aborting transactions

• Places limitations on the behavior of transactions - they cannot perform any operation that cannot be undone, including most I/O.

Methods of IPC

• Pipes

• Shared memory

• Message queues

IPC

How do we handle atomicity? Don't share memory.

How to communicate?
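One way to answer that question is a pipe between a parent and a forked child. This is a sketch assuming a Unix-like system where fork is available; the message contents are invented for the example.

```ruby
reader, writer = IO.pipe

pid = fork do
  reader.close
  # The child works on its own copy of memory; the pipe is the only
  # way to get a result back to the parent.
  result = { pid: Process.pid, sum: (1..10).reduce(:+) }
  writer.write(Marshal.dump(result))
  writer.close
end

writer.close                          # the parent keeps only the read end
message = Marshal.load(reader.read)   # blocks until the child closes its end
reader.close
Process.wait(pid)
```

The data is serialized and copied across the process boundary, so nothing is shared and nothing needs locking.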

Pros

• Can't corrupt data when data is not shared.

• No locking.

• Easier to scale horizontally (adding nodes).

Cons

• Can't communicate over shared memory

• Slower to spawn a new process

• More memory overhead.

• Scaling horizontally is expensive.

Communicating Sequential Processes

CSP

Pros

• Uses message passing and channels heavily, as an alternative to locks

Cons

• Handling very big messages or a large number of messages; unbounded buffers

• Messaging is essentially a copy of shared data
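Ruby has no built-in CSP channels, but a SizedQueue can stand in for one in a sketch. Capacity 1 only approximates the synchronous rendezvous of classic CSP, and the producer and consumer here are plain threads.

```ruby
channel = SizedQueue.new(1)   # bounded: a full channel blocks the sender

producer = Thread.new do
  3.times { |i| channel.push(i * i) }   # blocks while the channel is full
  channel.close                         # signals "no more messages"
end

received = []
consumer = Thread.new do
  # pop returns nil once the queue is closed and empty
  while (msg = channel.pop)
    received << msg
  end
end

[producer, consumer].each(&:join)
```

The bound is what keeps a fast producer from filling memory, which is the unbounded-buffer concern listed under Cons.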

Actors

Atomicity? Conflict? Every actor has its own address space.

Don't share memory. Communicate via mailboxes.

Comparison with CSP

• CSP processes are anonymous, while actors have identities.

• Message-passing in actor systems is fundamentally asynchronous (CSP traditionally uses synchronous messaging: "rendezvous")

• CSP uses explicit channels for message passing, whereas actor systems transmit messages to named destination actors.
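The actor side of this comparison can be sketched with a thread and a mailbox queue. CounterActor and its tell/stop methods are hypothetical names for this example and are not Celluloid's API.

```ruby
# CounterActor is a hypothetical illustration, not a library class.
class CounterActor
  def initialize
    @mailbox = Queue.new
    @count = 0                      # private state, touched only by the actor's thread
    @thread = Thread.new { run }
  end

  # Asynchronous send: enqueue the message and return immediately.
  def tell(message, reply_to = nil)
    @mailbox << [message, reply_to]
    self
  end

  def stop
    tell(:stop)
    @thread.join
  end

  private

  # The actor processes one message at a time, in arrival order.
  def run
    loop do
      message, reply_to = @mailbox.pop
      case message
      when :increment then @count += 1
      when :get       then reply_to << @count
      when :stop      then break
      end
    end
  end
end

actor = CounterActor.new
5.times { actor.tell(:increment) }

reply = Queue.new                   # the caller's own mailbox for the answer
actor.tell(:get, reply)
total = reply.pop
actor.stop
```

Messages are addressed to the actor object itself rather than to a channel, and the sender never waits for the receiver unless it explicitly asks for a reply.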

Pros

• Uses message passing (mailboxes) heavily, as an alternative to locks

• No shared state (avoid locks, easier to scale)

• Easier to maintain the code

Cons

• Does not fit as well when shared state is required

• Handling very big messages, or a lot of messages

• Messaging is essentially a copy of shared data

Fibers

Fibers are coroutines:

Cooperative! They hand execution rights to one another, saving local state.
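A small sketch of a fiber handing control back and forth while keeping its local state, using only the core Fiber class. The Fibonacci generator is just a convenient example.

```ruby
fib = Fiber.new do
  a, b = 0, 1
  loop do
    Fiber.yield a        # pause here; a and b survive until the next resume
    a, b = b, a + b
  end
end

# Each resume runs the fiber up to its next Fiber.yield.
first_six = 6.times.map { fib.resume }
```

The caller decides when the fiber runs, and the fiber decides when it stops, so no locks are involved.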

A Curious Course on Coroutines and Concurrency:

David Beazley (https://twitter.com/dabeaz) writing an operating system with only coroutines.

http://dabeaz.com/coroutines/

No Threads, Evented style, just cooperative scheduling of coroutines...

Possible use cases:

http://stackoverflow.com/questions/303760/what-are-use-cases-for-a-coroutine

Pros

• Expressive state: state-based computations are much easier to understand and implement

• No need for locks (cooperative scheduling)

• Scales vertically (add more cpu)

Cons

• Single thread: harder to parallelize/scale horizontally (use more cores, add more nodes)

• Constrained to have all the components work together symbiotically

Eventmachine

Examples

• C10k problem

• Eventmachine in ruby

• Twisted in python

• Redis's event loop

• Apache vs Nginx

• Node.js vs the world

“Evented servers are really good for very light requests, but if you have a long-running request, it falls down on its face”

Technically valid, but in practice not necessarily true.

Reactor:

• wait for event (Reactor job)

• dispatch "Ready-to-Read" event to user handler (Reactor job)

• read data (user handler job)

• process data (user handler job)

Proactor:

• wait for event (Proactor job)

• read data (now Proactor job)

• dispatch "Read-Completed" event to user handler (Proactor job)

• process data (user handler job)
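The Reactor steps above can be sketched with IO.select from Ruby's core library. MiniReactor is a hypothetical class for this example and is not EventMachine's API; a pipe stands in for a network socket.

```ruby
# MiniReactor is a hypothetical illustration, not a library class.
class MiniReactor
  def initialize
    @handlers = {}
  end

  def register(io, &handler)
    @handlers[io] = handler
  end

  def deregister(io)
    @handlers.delete(io)
  end

  def run
    until @handlers.empty?
      ready, = IO.select(@handlers.keys)           # wait for event (reactor job)
      ready.each { |io| @handlers[io].call(io) }   # dispatch "ready-to-read" (reactor job)
    end
  end
end

reader, writer = IO.pipe
writer.write("hello")
writer.close

chunks = []
reactor = MiniReactor.new
reactor.register(reader) do |io|
  begin
    chunks << io.read_nonblock(1024)   # read and process data (user handler job)
  rescue EOFError
    reactor.deregister(io)             # nothing more to read: stop watching
    io.close
  end
end

reactor.run
```

In a Proactor the loop itself would perform the read and hand the completed data to the handler; here the handler does the reading, which is what makes this a Reactor.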

Pros

• Avoids polling; well suited to IO-bound (rather than CPU-bound) work

• Expanding your horizons (very different paradigms)

• Scales well vs spawning many threads

Cons

• If you block the event loop, everything stalls

• Program flow is "spaghetti"-ish

• Callback Hell

• Hard to debug, you lose "the stack"

Sidenote: C10M

http://highscalability.com/blog/2013/5/13/the-secret-to-10-million-concurrent-connections-the-kernel-i.html

http://c10m.robertgraham.com/p/manifesto.html

http://c10m.robertgraham.com/2013/02/wimpy-cores-and-scale.html

Conclusion

USE THE BEST TOOL FOR THE JOB

Questions?