Concurrent Programming with Ruby and Tuple Spaces
Luc Castera, Founder / messagepub.com
The Free Lunch is Over:
A Fundamental Turn Toward Concurrency in Software
Source: http://www.gotw.ca/publications/concurrency-ddj.htm
The major processor manufacturers and architectures have run out of room with most of their traditional approaches to boosting CPU performance. Instead of driving clock speeds and straight-line instruction throughput ever higher, they are instead turning en masse to hyperthreading and multicore architectures. [...] And that puts us at a fundamental turning point in software development, at least for the next few years... (Herb Sutter, March 2005)
Outline
1. The problem with Ruby Threads
2. Multiple Ruby Processes
3. Inter-process Communication with TupleSpaces
PART 1
The Problem With Threads
A closer look at the Ruby threading model
3 Types of Threading Models:
1 : N
1 : 1
M : N
(Diagram: kernel threads vs. user-space threads)
1 : N Green Threads
One kernel thread for N user threads
aka lightweight threads
(Diagram: the user threads take turns on the single kernel thread in 10 ms time slices)
Used by Ruby 1.8
Pros and Cons
Pros: Thread creation, execution, and cleanup are cheap
Lots of threads can be created
Cons: Not truly parallel: the kernel scheduler doesn't know about the threads, so it can't schedule them across CPUs or take advantage of SMP
A blocking I/O operation can block all green threads. Example: C extensions
Example: the mysql gem (solution: NeverBlock, mysqlplus)
1 : 1 Native Threads
1 kernel thread for each user thread
Pros and Cons
Pros: Threads can execute on different CPUs (truly parallel)
Threads do not block each other
Cons: Setup overhead
Low limit on the number of threads
Linux kernel bug with lots of threads
Used by Ruby 1.9
I lied.
Global Interpreter Lock
A Global Interpreter Lock (GIL) is a mutual exclusion lock held by a programming language interpreter thread to avoid sharing code that is not thread-safe with other threads. There is always one GIL for one interpreter process.
Usage of a Global Interpreter Lock in a language effectively limits the concurrency of a single interpreter process with multiple threads: there is little or no increase in speed when running the process on a multiprocessor machine.
Source: Wikipedia
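A small experiment makes the GIL visible. The sketch below (not from the talk; the Fibonacci workload and the sizes are arbitrary) runs the same CPU-bound job four times sequentially, then in four threads. On MRI you should expect the two timings to come out roughly equal, because only one thread runs Ruby code at a time; exact numbers depend on the machine and interpreter.

```ruby
# CPU-bound work only: naive recursive Fibonacci, no I/O.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Returns [block_result, elapsed_seconds], measured with a monotonic clock.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

JOBS = 4
N = 24

sequential, seq_time = timed { Array.new(JOBS) { fib(N) } }

threaded, thr_time = timed do
  threads = Array.new(JOBS) { Thread.new { fib(N) } }
  threads.map(&:value) # Thread#value waits for the thread and returns its result
end

puts format('sequential: %.2fs, threaded: %.2fs', seq_time, thr_time)
```

On JRuby, which has no GIL, the threaded run can spread across several cores.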
A person (male or female) who intentionally or unintentionally stops the progress of two others getting their game on.
Concurrency is a myth in Ruby
Ilya Grigorik
Ilya Grigorik:
The implications of the GIL are surprising at first, but it turns out the solution to this problem is not all that complex: instead of thinking in threads, think how you could split the workload between different processes. Not only will you bypass an entire class of problems associated with concurrent programming (it's hard!), but you are also much more likely to end up with a horizontally scalable architecture for your application. Here are the steps:
1. Partition the work, or decompose your application
2. Add a communications / work queue (Starling, Beanstalkd, RabbitMQ)
3. Fork, or run multiple instances of your application
Not surprisingly, many Ruby applications have already adopted this strategy: a typical Rails deployment is powered by a cluster of app servers (Mongrel, Ebb, Thin), and alternative strategies like EventMachine and Revactor (equivalents of Twisted in Python) are gaining ground as a simple way to defer and parallelize your network IO without introducing threads into your application.
Unless you are using JRuby.
A note on Fibers
Ruby 1.9 introduces fibers.
Fibers are green threads, but scheduling must be done by the programmer and not the VM.
Faster and cheaper than native threads.
Implemented for Ruby 1.8 by Aman Gupta.
Learn more:
http://tinyurl.com/rubyfibers
http://all-thing.net/fibers
http://all-thing.net/fibers-via-continuations
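A minimal sketch of what "scheduling done by the programmer" means, using Ruby's core Fiber class: each fiber runs until it calls Fiber.yield, and the loop at the bottom decides who runs next. The worker helper and the round-robin loop are illustrative, not from the talk.

```ruby
# Builds a fiber that logs a few steps, yielding control after each one.
def worker(name, steps, log)
  Fiber.new do
    steps.times do |i|
      log << "#{name}#{i}"
      Fiber.yield # hand control back to whoever called #resume
    end
    :done # value returned by the final #resume
  end
end

log = []
fibers = [worker('a', 3, log), worker('b', 2, log)]

# Hand-written round-robin scheduler: the VM never switches fibers by itself.
fibers = fibers.reject { |f| f.resume == :done } until fibers.empty?

p log # => ["a0", "b0", "a1", "b1", "a2"]
```

Because switching only happens at Fiber.yield, there is no preemption to guard against; the flip side is that a fiber that never yields starves all the others.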
M : N Hybrid Model
M kernel threads for N user threads
best of both worlds
Pros and Cons
Pros: Takes advantage of multiple CPUs
Not all threads are blocked by blocking system calls
Cheap creation, execution, and cleanup
Cons: The userland and kernel schedulers need to work with each other
Green threads doing blocking I/O operations will block all other green threads sharing the same kernel thread
Difficult to write, maintain, and debug code
Writing multi-threaded code is really, really hard. And it is hard because of shared memory. (Jim Weirich)
The Other Problem with Threads
http://rubyconf2008.confreaks.com/what-all-rubyist-should-know-about-threads.html
Multi-threaded code is hard + Concurrency is a myth = FAIL!
Stop thinking in threads
Design your application to use multiple processes
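The partition / queue / fork steps from Ilya Grigorik's quote can be sketched with nothing but core Ruby: split the input, fork one child per slice, and collect each child's answer over a pipe. This is a hedged illustration, not code from the talk: the sum-of-squares workload is made up, and pipes stand in for the work queue (Starling, Beanstalkd, RabbitMQ) a real deployment would use. fork is unavailable on Windows and JRuby.

```ruby
# Step 1: partition the work (a made-up workload: sum of squares of 1..1000).
slices = (1..1_000).each_slice(250).to_a

# Steps 2 and 3: fork one worker process per slice; each child sends its
# partial result back to the parent through its own pipe.
readers = slices.map do |slice|
  reader, writer = IO.pipe
  fork do
    reader.close
    writer.write(Marshal.dump(slice.sum { |n| n * n }))
    writer.close
    exit!(0) # leave without running the parent's at_exit handlers
  end
  writer.close # the parent only reads
  reader
end

partials = readers.map do |reader|
  value = Marshal.load(reader.read) # read blocks until the child closes its end
  reader.close
  value
end
Process.waitall # reap the children

total = partials.sum
puts total # => 333833500
```

Each child works on its own copy of the interpreter's memory, so nothing is shared and no locks are needed; the price is the per-process memory and startup overhead.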
PART 2
Multiple Ruby Processes
Pros and Cons
Pros: No shared memory
Takes advantage of multiple CPUs (performance)
A blocking system call in one process does not block the others
Scalability
Fault tolerance
Cons: Process creation, execution, and cleanup are expensive
Uses a lot of memory (a Ruby VM is loaded for every process)
Need a way for processes to communicate!
Latency
Starting/Stopping
Fault-Tolerance
Monitoring
but we will focus on...
How do the processes communicate?
Options
DRb
Sockets
Queues: RabbitMQ, ActiveMQ
Key-value databases: Redis, Tokyo Cabinet, Memcached
Relational Databases
XMPP
TupleSpaces
Examples
Rails + Mongrel/Thin
Cluster of application servers (Mongrel, Thin...)
Communication between processes is done via the database.
Nanite
A self-assembling fabric of Ruby daemons
http://github.com/ezmobius/nanite
Uses RabbitMQ/AMQP for IPC
Revactor
Uses the actor model
Actors are kinda like threads, with messaging baked in.
Each actor has a mailbox.
It's like coding Erlang in Ruby.
Messages are passed between actors using TCP sockets.
Good documentation
http://revactor.org/
"Erlang provides a sledgehammer for the problems of concurrent programming. But sometimes you don't need a sledgehammer... just a flyswatter will do." (Tony Arcieri)
Discontinued in favor of Reia
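To make the actor idea concrete, here is a hypothetical, in-process sketch built on a Thread with a Queue as its mailbox. It is not Revactor's API and it does no networking; it only shows the "one mailbox, one message at a time" model. The Actor class and the message format are made up for illustration.

```ruby
# Hypothetical minimal actor: one thread that owns a mailbox and
# processes messages strictly one at a time.
class Actor
  def initialize(&handler)
    @mailbox = Queue.new
    @thread = Thread.new do
      loop do
        message = @mailbox.pop # blocks until a message arrives
        break if message == :stop
        handler.call(message)
      end
    end
  end

  # Asynchronous send, in the spirit of Erlang's `Pid ! Message`.
  def <<(message)
    @mailbox << message
    self
  end

  def stop
    @mailbox << :stop
    @thread.join
  end
end

replies = Queue.new
adder = Actor.new do |(op, a, b, reply_to)|
  reply_to << [:result, a + b] if op == :add
end

adder << [:add, 1, 2, replies] << [:add, 60, 5, replies]
first = replies.pop
second = replies.pop
adder.stop
p [first, second] # => [[:result, 3], [:result, 65]]
```

The actor's state is only touched by its own thread, so callers never lock anything; they just send messages and wait for replies.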
Journeta
Journeta is a dirt simple library for peer discovery and message passing between Ruby applications on a LAN
Uses UDP Sockets for IPC
Uses the fucked up Ruby socket API from their RDOC
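For a feel of UDP-based message passing, here is a loopback-only sketch using Ruby's standard socket library. It is not Journeta's API and does no peer discovery; it just sends one datagram to a port on 127.0.0.1 and reads it back. Marshal is used to serialize a Ruby object, which is only safe between trusted peers.

```ruby
require 'socket'

# Receiver: bind to a port chosen by the OS on the loopback interface.
receiver = UDPSocket.new
receiver.bind('127.0.0.1', 0)
port = receiver.addr[1] # addr => ["AF_INET", port, host, ip]

# Sender: fire-and-forget datagram; UDP gives no delivery or ordering guarantee.
sender = UDPSocket.new
sender.send(Marshal.dump([:hello, 'from peer A']), 0, '127.0.0.1', port)

data, _sender_addr = receiver.recvfrom(1024)
message = Marshal.load(data)
p message # => [:hello, "from peer A"]

sender.close
receiver.close
```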
Demo(?)
If time permits, show demo.
PART 3
TupleSpaces
Interprocess Communication with TupleSpaces
A tuple space provides a repository of tuples that can be accessed concurrently.
[:add, 1, 2]
[:result, 79]
[:add, 60, 5]
[:token]
[:search, linda]
[:where_is, :waldo]
[:subtract, 10, 2]
[:save, 7864]
The Blackboard Metaphor
Processes post tuples on a shared board and find them by pattern rather than by address. A template matches tuples of the same size whose fields are equal, with nil acting as a wildcard:
[:add, nil, nil] matches [:add, 1, 2] and [:add, 60, 5]
[nil] matches [:token], the only one-element tuple
[:where_is, :waldo] matches only [:where_is, :waldo]
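The blackboard lookups map directly onto Rinda templates. A minimal in-process sketch (no DRb server needed), assuming the linda in [:search, linda] is meant to be a string:

```ruby
require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new # in-process space; no DRb needed for this sketch

[[:add, 1, 2], [:result, 79], [:add, 60, 5], [:token], [:search, 'linda'],
 [:where_is, :waldo], [:subtract, 10, 2], [:save, 7864]].each { |t| ts.write(t) }

adds   = ts.read_all([:add, nil, nil])     # nil is a wildcard: both :add tuples
tokens = ts.read_all([nil])                # sizes must match: only [:token]
waldo  = ts.read_all([:where_is, :waldo])  # exact match
# Fields are also compared with ===, so classes and ranges work as patterns.
big    = ts.read_all([:add, 50..100, Integer])

p adds, tokens, waldo, big
```

read_all is used here only so every match is visible at once; read and take return a single matching tuple.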
About Tuple Spaces
First implementation was Linda.
Linda was developed by David Gelernter and Nicholas Carriero at Yale University.
Implementations exist for most languages.
The Ruby implementation is Rinda.
Rinda is a built-in library, so no need to install.
5 Basic Operations
read: Reads a tuple, but does not remove it. Blocking by default, but takes an additional timeout argument.
read_all: Returns all tuples matching the template. Does not remove the found tuples.
write: Adds a tuple. Takes an optional expiration argument (how long the tuple stays in the space).
take: Atomic read + delete. Blocking by default, but takes an additional timeout argument.
notify: Registers for notifications of events: write, take, delete
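The five operations, exercised against an in-process Rinda::TupleSpace. A sketch assuming Ruby's standard rinda library; the tuples are arbitrary:

```ruby
require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new

# notify: register before writing; each event arrives as [event_name, tuple].
observer = ts.notify('write', [:add, nil, nil])

ts.write([:add, 1, 2])                # write: add a tuple
event = observer.pop                  # the queued 'write' event

seen  = ts.read([:add, nil, nil])     # read: returns a copy, tuple stays
all   = ts.read_all([:add, nil, nil]) # read_all: every match, nothing removed
taken = ts.take([:add, nil, nil])     # take: atomic read + delete

# read and take block by default; a timeout of 0 gives up right away.
begin
  ts.take([:add, nil, nil], 0)
  still_there = true
rescue Rinda::RequestExpiredError
  still_there = false
end

# write's optional second argument is a lifetime: the tuple expires afterwards.
ts.write([:ephemeral], 0.1)
sleep 0.2
expired = ts.read_all([:ephemeral])   # expired tuples are no longer matched

p event, seen, all, taken, still_there, expired
```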
Key Features
Spaces are shared: the space handles the details of concurrent access
Spaces are persistent: if an agent process dies, its data is still in the space
However, if the space process dies, the data is lost (?)
Spaces are associative: lookups are associative rather than by memory location or identifier
Spaces are transactionally secure: operations are atomic
Spaces allow us to exchange executable content
A Rinda tuple can be an array or a hash
( But let's stick with the array, I like that better! )
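For completeness, a hash tuple looks like this. A sketch based on Rinda's rules as I understand them: keys must be Strings, and a template with the same keys is matched key by key, with nil still acting as a wildcard. The keys used here are hypothetical.

```ruby
require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new

# Hash tuples: keys must be Strings.
ts.write({ 'op' => 'add', 'a' => 1, 'b' => 2 })

# The template uses the same keys; nil values are wildcards.
job = ts.take({ 'op' => 'add', 'a' => nil, 'b' => nil })
p job # the original hash, now removed from the space
```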
Start a Tuple Space on port 1234
Clients/Agents
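The slides' server and client code isn't in the transcript. A common Rinda setup over dRuby looks roughly like this; it is a sketch, not the talk's code. For brevity the server and the client share one script. Normally they run as separate processes, and the client calls DRb.start_service (with no URI) before using the space, so the space can call back into it during take.

```ruby
require 'rinda/tuplespace' # also loads drb/drb and Rinda::TupleSpaceProxy

SPACE_URI = 'druby://localhost:1234' # port 1234, as on the slide

# Server process: publish a TupleSpace over dRuby.
DRb.start_service(SPACE_URI, Rinda::TupleSpace.new)
# A standalone server would now block with: DRb.thread.join

# Client/agent process: wrap the remote space in a proxy.
ts = Rinda::TupleSpaceProxy.new(DRbObject.new_with_uri(SPACE_URI))

ts.write([:add, 1, 2])                      # a requester posts work
_, a, b = ts.take([:add, Integer, Integer]) # a worker claims it atomically
ts.write([:result, a + b])                  # and posts the answer
result = ts.read([:result, nil])
p result # => [:result, 3]

DRb.stop_service
```

Because take removes the tuple atomically, several workers can wait on the same template and each job is still handled exactly once.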
DEMO
Rinda
RingServer
This is also a TupleSpace
SPOF (single point of failure)
Rinda is not persistent...
If it crashes while you have tuples in the space, you lose them all.
Only Ruby
Introducing Blackboard
TupleSpace implementation on top of Redis
Persistent
Redis is a really fast key-value database. Like memcached, but the data is not volatile.
Same API: plug & play
For now, only supports: take, read, and write
http://github.com/dambalah/blackboard
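Blackboard's own code isn't shown in the transcript. As a rough, hypothetical illustration of layering tuple-space operations over a key-value store, the sketch below uses a plain Ruby Hash in place of Redis and stores tuples as serialized strings. KVTupleSpace and its bucketing scheme are made up; read and take are non-blocking here (they return nil when nothing matches), and a Mutex keeps find-and-remove atomic, which a real Redis-backed version would have to achieve some other way.

```ruby
# Hypothetical sketch: tuple-space write/read/take over a key-value store.
# A Hash stands in for Redis; each value is a list of Marshal-dumped tuples.
class KVTupleSpace
  def initialize(store = {})
    @store = store
    @lock = Mutex.new
  end

  def write(tuple)
    @lock.synchronize { (@store[key_for(tuple)] ||= []) << Marshal.dump(tuple) }
    self
  end

  # Returns a matching tuple without removing it, or nil.
  def read(template)
    @lock.synchronize do
      list, index = locate(template)
      index && Marshal.load(list[index])
    end
  end

  # Removes and returns a matching tuple, or nil.
  def take(template)
    @lock.synchronize do
      list, index = locate(template)
      index && Marshal.load(list.delete_at(index))
    end
  end

  private

  # Bucket tuples by size, since a template only matches tuples of its own size.
  def key_for(tuple)
    "tuples:#{tuple.size}"
  end

  def locate(template)
    list = @store.fetch(key_for(template), [])
    index = list.index { |raw| matches?(template, Marshal.load(raw)) }
    [list, index]
  end

  # Rinda-style matching: nil is a wildcard, otherwise == or ===.
  def matches?(template, tuple)
    template.zip(tuple).all? { |want, got| want.nil? || want == got || want === got }
  end
end

space = KVTupleSpace.new
space.write([:add, 1, 2]).write([:add, 60, 5])
job  = space.take([:add, nil, nil])        # oldest match first in this sketch
left = space.read([:add, Integer, Integer])
none = space.take([:subtract, nil, nil])
p job, left, none
```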
Server
Just start the redis-server:
$ redis-server
Client/Agents
DEMO
Blackboard
Blackboard Benchmarks
Blackboard: Future
Move from Redis to a custom Erlang-based blackboard implementation.
I would like the Erlang implementation to be easy to use from other programming languages as well.
So it's really two projects:
A blackboard in Erlang
A Ruby library to talk to the Erlang blackboard
Thank you!
Questions? Feedback?
www.speakerrate.com
Resources / References
Part 1: Threading Models
http://timetobleed.com/threading-models-so-many-different-ways-to-get-stuff-done/
http://envycasts.com/products/scaling-ruby
http://www.infoq.com/news/2007/05/ruby-threading-futures
http://thebogles.com/blog/2006/11/ruby-threading/
http://spec.ruby-doc.org/wiki/Ruby_Threading
http://www.bitwiese.de/2007/09/on-processes-and-threads.html
http://www.igvita.com/2008/11/13/concurrency-is-a-myth-in-ruby/
http://bartoszmilewski.wordpress.com/2008/08/24/threads-dont-scale-processes-do/
http://en.wikipedia.org/wiki/Global_Interpreter_Lock
http://www.gotw.ca/publications/concurrency-ddj.htm
http://tinyurl.com/rubyfibers
Part 2: Multiple Processes
http://github.com/ezmobius/nanite
http://erlang.org/
http://www.rabbitmq.com/
http://code.google.com/p/redis/
http://revactor.org/
http://journeta.rubyforge.org/
http://home.mindspring.com/~eric_rollins/ParallelRuby.html
Part 3: TupleSpaces
http://c2.com/cgi/wiki?TupleSpace
http://en.wikipedia.org/wiki/Tuplespace
http://www.julianbrowne.com/article/viewer/space-based-architecture-example
http://www.rubyagent.com/
http://segment7.net/projects/ruby/drb/
http://segment7.net/projects/ruby/drb/rinda/ringserver.html
JavaSpaces Principles, Patterns, and Practice (Freeman, Hupfer, et al.)
http://www.ruby-doc.org/stdlib/libdoc/rinda/rdoc/index.html
Things I wish I had time to spend on
MPI and Ruby-MPI: http://github.com/abedra/mpi-ruby/tree/master
Ruby forkoff: http://tinyurl.com/forkoff
Benchmark results: Rinda vs. Blackboard
Operation (count)   Rinda      Blackboard
Write (1000)        0.042749   0.253068
Take (500)          0.082744   15.844250
Read (500)          0.020098   20.098478