Posted on 10-May-2015
Life in a Queue
Tareque Hossain, Education Technology
What is a Message Queue?
• Message Queues are:
  o Communication buffers
  o Between independent sender & receiver processes
  o Asynchronous
    • Time of sending is not necessarily the same as time of receiving
• In the context of Web Applications:
  o Sender: Web Application Servers
  o Receiver: Background worker processes
  o Queue items: Tasks that the web server doesn’t have time/resources to do
[Diagram: Inside a Message Queue. Several web app servers and worker servers connected through a Message Queue Broker, which holds queues Q1 and Q2 (tasks T1 to T7) and has an Enqueue Manager and a Dequeue Manager.]
How does it work?
• Say a web application server has a task it doesn’t have time to do
• It puts the task in the message queue
• Other web servers can access the same queue(s) and put tasks there
• Queues are FIFO (First In First Out)
• Workers are greedy and they all watch the queues for tasks
• Workers asynchronously pick up the first available task on the queue when they are ready
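The flow above can be sketched in a single process with Python’s standard library. This is an illustration only: it assumes threads stand in for web servers and worker servers and a `queue.Queue` stands in for the broker, whereas a real deployment uses a separate broker process such as RabbitMQ.

```python
import queue
import threading

# Thread-safe FIFO buffer standing in for the message queue broker.
task_queue = queue.Queue()
results = []
results_lock = threading.Lock()


def web_server(server_id, n_tasks):
    """Sender: enqueue tasks and return without waiting for them to run."""
    for i in range(n_tasks):
        task_queue.put(f"server{server_id}-task{i}")


def worker():
    """Receiver: greedily take the first available task whenever ready."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: no more work for this worker
            break
        with results_lock:
            results.append(task.upper())  # stand-in for real processing


workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

# Several web servers share the same queue.
senders = [threading.Thread(target=web_server, args=(s, 4)) for s in range(3)]
for s in senders:
    s.start()
for s in senders:
    s.join()

# One sentinel per worker, queued behind all real tasks (FIFO order).
for _ in workers:
    task_queue.put(None)
for w in workers:
    w.join()

print(len(results))  # 12
```

Because the sentinels are queued only after every sender has finished, FIFO order guarantees all twelve real tasks are handed out before any worker stops.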
Do I need Message Queues?
• Message Queues are useful in certain situations
• General guidelines:
  o Do your web applications take more than a few seconds to generate a response?
  o Are you using a lot of cron jobs to process data in the background?
  o Do you wish you could distribute the processing of the data generated by your application among many servers?
Wait, I’ve heard of asynchronous before!
• Yes. AJAX is an asynchronous communication method between client & server
• Some of the response time issues can be solved:
  o With AJAX responses that progressively enhance the initial response
  o But only if the AJAX responses also complete within a reasonable amount of time
• You need Message Queues when:
  o Long processing times can’t be avoided in generating responses
  o You want application data to be continuously processed in the background and readily available when requested
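One way to make data "readily available when requested" is to enqueue the work, answer immediately with a job ID, and let the client poll (for instance via AJAX) until a worker has stored the result. The sketch below assumes a thread stands in for a worker server, a `queue.Queue` for the broker and a dict for the result store; `handle_request` and `check_status` are hypothetical names.

```python
import queue
import threading
import time
import uuid

jobs = queue.Queue()       # stand-in for the broker
result_store = {}          # stand-in for a result store, e.g. a DB table
store_lock = threading.Lock()


def handle_request(n):
    """Hypothetical request handler: enqueue the work and answer immediately."""
    job_id = str(uuid.uuid4())
    with store_lock:
        result_store[job_id] = {"status": "PENDING", "value": None}
    jobs.put((job_id, n))
    return job_id  # the response goes out without waiting for the work


def check_status(job_id):
    """Hypothetical endpoint a follow-up AJAX poll might call."""
    with store_lock:
        return dict(result_store[job_id])


def background_worker():
    while True:
        item = jobs.get()
        if item is None:  # sentinel: shut down
            break
        job_id, n = item
        value = sum(i * i for i in range(n))  # stand-in for slow work
        with store_lock:
            result_store[job_id] = {"status": "SUCCESS", "value": value}


t = threading.Thread(target=background_worker)
t.start()
job = handle_request(1000)
while check_status(job)["status"] != "SUCCESS":
    time.sleep(0.01)  # a browser would poll via AJAX instead
print(check_status(job)["value"])  # 332833500
jobs.put(None)
t.join()
```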
MQ Tasks: Processing User Uploads
• Resize uploaded images to generate different resolutions, avatars, and gallery snapshots
• Reformat videos to match your player requirements
• YouTube, Facebook, and SlideShare are good examples
MQ Tasks: Generate Reports
• Generating reports from large amounts of data
  o Reports that contain graphical charts
  o Multiple reports that cross-reference each other
MQ Tasks: 3rd Party Integrations
• Bulk processing of 3rd party service requests
  o Refunding hundreds of transactions using PayPal
  o Any kind of data synchronization
  o Aggregation of RSS/other feeds
[Diagram: Social Network Feed Aggregator]
MQ Tasks: Cron Jobs
• Any cron job that is not time sensitive
  o The asynchronous behavior of a message queue doesn’t guarantee execution of tasks on the dot
  o Jobs in cron that should be done as soon as resources become available are good candidates
Message Queue Solution Stack
[Diagram: Web Application Server (Task Management Subsystem over a Message Queue Protocol Library) ⇄ Message Queue Broker ⇄ Queue Worker (Task Management Subsystem over a Message Queue Protocol Library)]
Protocol/Broker Choices
• AMQP (Advanced Message Queuing Protocol)
  o Brokers: RabbitMQ, Apache Qpid, Apache ActiveMQ, OpenAMQ, StormMQ
• JMS (Java Message Service)
  o Brokers: Apache Qpid, Apache ActiveMQ, OpenJMS, Open Message Queue
• STOMP (Streaming Text Orientated Messaging Protocol)
  o Brokers: Apache ActiveMQ, STOMPServer, CoilMQ
OMG That’s too much!
• Yeah. I agree.
• Read the detailed research notes on the Second Life developer wiki:
  o http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes
• Let’s simplify. How do we choose?
  o How good is the exception handling and recovery?
  o Is maintenance relatively low?
  o How easy is deployment?
  o Are the queues persistent?
  o How is the community support?
  o What language is it written in? How compatible is that with our current systems?
  o How detailed is the documentation?
Choice of PBS Education
• We chose AMQP & RabbitMQ
• Why?
  o We don’t expect message volumes as high as 1M or more at a time
  o RabbitMQ is free to use
  o The documentation is decent
  o There is decent clustering support, even though we never needed clustering
  o We didn’t want to lose queues or messages upon broker crash/restart
  o We develop applications using Python/Django, and setting up an AMQP backend using Celery/Kombu was easy
Message Queue Solution Stack
[Diagram: Web Application Server (Celery over PyAMQPlib/Kombu) ⇄ RabbitMQ ⇄ Queue Worker (Celery over PyAMQPlib/Kombu)]
Celery? Kombu? Yummy.
• Django made web development using Python a piece of cake
• Celery & Kombu make using message queues in your Django/Python applications a piece of cake
• Kombu
  o AMQP-based messaging framework for Python, powered by PyAMQPlib
  o Provides the fundamentals for creating queues, configuring the broker, and sending/receiving messages
• Celery
  o Distributed task queue management application
Celery Backends
• Celery is very, very powerful
• You can use Celery to emulate a message queue broker by using a database as the broker backend
  o Involves polling & is less efficient than AMQP
  o Use it for local development
• Bundled broker backends:
  o amqplib, pika, redis, beanstalk, sqlalchemy, django, mongodb, couchdb
• The broker backend is different from the task result store backend
  o The result store is used by Celery to store the results of a task, or its errors if it failed
A Problem with a View
• What is wrong with this view?

from django.shortcuts import render_to_response
from django.template import RequestContext

def create_report(request):
    # ... Code for extracting parameters from request ...
    # ... Code for generating report from lots of data ...
    return render_to_response('profiles/index.html', {
        'report': report,
    }, context_instance=RequestContext(request))
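• The report is generated inside the request/response cycle, so the user waits until all the data has been processed, and a slow enough report can make the request time out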
Let’s Write a Celery Task
• Writing Celery tasks is no more difficult than this:

import celery

@celery.task()
def generate_report(*args, **kwargs):
    # ... Code for generating report ...
    report.save()
Let’s Write a Celery Task II
• If you want to customize your tasks, inherit from the base Task class:

from celery.task.base import Task

class GenerateReport(Task):
    def __init__(self, *args, **kwargs):
        # ... Custom init code ...
        super(GenerateReport, self).__init__(*args, **kwargs)

    def run(self, *args, **kwargs):
        # ... Code for generating report ...
        report.save()
Issuing a task
• After writing a task, we issue it from within a request like this:

from django.contrib import messages
from django.http import HttpResponseRedirect
from django.urls import reverse  # django.core.urlresolvers in older Django

def create_report(request):
    # ... Code for extracting parameters from request ...
    generate_report.delay(**params)  # or GenerateReport.delay(**params)
    messages.success(request, 'You will receive an email when report generation is complete.')
    return HttpResponseRedirect(reverse('reports_index'))
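To make the `@task` / `.delay()` pattern concrete without a broker, here is a toy imitation in plain Python. It is not Celery’s implementation: the `task` decorator, the in-memory `broker` queue and the `run_worker` loop are hypothetical stand-ins. Celery additionally serializes the call, publishes it to a real broker and returns an `AsyncResult`.

```python
import queue
from functools import wraps

broker = queue.Queue()  # hypothetical in-memory stand-in for RabbitMQ
registry = {}           # task name -> function, as a worker would know it


def task(func):
    """Toy decorator: registers func and gives it a .delay() method."""
    registry[func.__name__] = func

    @wraps(func)
    def call_now(*args, **kwargs):
        return func(*args, **kwargs)  # calling directly still runs inline

    def delay(*args, **kwargs):
        # Only the task name and its arguments travel through the queue.
        broker.put((func.__name__, args, kwargs))

    call_now.delay = delay
    return call_now


def run_worker():
    """Drain the queue, executing each message by looking up its task name."""
    done = []
    while not broker.empty():
        name, args, kwargs = broker.get()
        done.append(registry[name](*args, **kwargs))
    return done


@task
def generate_report(user_id, year):
    return f"report for user {user_id}, {year}"


generate_report.delay(user_id=7, year=2011)  # returns immediately
print(run_worker())  # ['report for user 7, 2011']
```

The key design point mirrored here is that the view and the worker share only a task name and plain arguments, which is why the worker can live on another machine.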
What happens when you issue tasks?
[Diagram: on the application server, the request handler passes the task to Celery, which sends it to a queue on the broker; Celery on each of several workers picks tasks up from that queue.]
Understanding Queue Routing
• Brokers contain multiple virtual hosts
• Each virtual host contains multiple exchanges
• Messages are sent to exchanges
  o Exchanges are hubs that connect to a set of queues
• An exchange routes messages to one or more queues
[Diagram: messages enter an Exchange inside a VHost and are routed to Queues]
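The vhost/exchange/queue relationship can be modeled in a few lines. This toy covers only a direct exchange (exact routing-key match) and is not how a broker is implemented; the class names are hypothetical, and the exchange name, queue name and binding key are borrowed from the image_tasks entry of the example routing config in this deck.

```python
from collections import defaultdict


class DirectExchange:
    """Toy direct exchange: delivers a message to every queue bound
    with exactly the message's routing key."""

    def __init__(self, name):
        self.name = name
        self.bindings = defaultdict(list)  # binding key -> queue names

    def bind(self, queue_name, binding_key):
        self.bindings[binding_key].append(queue_name)

    def route(self, routing_key):
        return list(self.bindings.get(routing_key, []))


class VirtualHost:
    """Toy virtual host: a namespace holding exchanges and queues."""

    def __init__(self, name):
        self.name = name
        self.exchanges = {}
        self.queues = defaultdict(list)  # queue name -> messages

    def declare_exchange(self, exchange):
        self.exchanges[exchange.name] = exchange

    def publish(self, exchange_name, routing_key, body):
        targets = self.exchanges[exchange_name].route(routing_key)
        for q in targets:
            self.queues[q].append(body)
        return targets  # an empty list means the message was dropped


vhost = VirtualHost("/")
media = DirectExchange("mediatasks")
vhost.declare_exchange(media)
media.bind("image_tasks", "image.compress")

print(vhost.publish("mediatasks", "image.compress", "resize avatar.png"))  # ['image_tasks']
print(vhost.publish("mediatasks", "video.compress", "encode clip.mp4"))    # []
```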
Understanding Queue Routing
• In Celery configurations:
  o binding_key binds a task namespace to a queue
  o exchange defines the name of an exchange
  o routing_key defines which queue a message should be directed to under a certain exchange
  o exchange_type = 'direct' routes on exact routing keys
  o exchange_type = 'topic' routes on namespaced & wildcard routing keys
    • * (matches a single word)
    • # (matches zero or more words)
Example Celery Config for Routing

CELERY_DEFAULT_QUEUE = "default"
CELERY_QUEUES = {
    "feed_tasks": {
        "binding_key": "feed.#",
    },
    "regular_tasks": {
        "binding_key": "task.#",
    },
    "image_tasks": {
        "binding_key": "image.compress",
        "exchange": "mediatasks",
        "exchange_type": "direct",
    },
}
CELERY_DEFAULT_EXCHANGE = "tasks"
CELERY_DEFAULT_EXCHANGE_TYPE = "topic"
CELERY_DEFAULT_ROUTING_KEY = "task.default"
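The wildcard rules can be illustrated with a small matcher. In a real deployment RabbitMQ performs this matching inside the broker; the function below is only a sketch of the AMQP topic rules described above (`*` is exactly one word, `#` is zero or more words), applied to the binding keys on the default "tasks" topic exchange from the config above. The routing keys in the examples are hypothetical.

```python
def topic_matches(binding_key, routing_key):
    """AMQP-style topic matching: words are separated by '.',
    '*' matches exactly one word and '#' matches zero or more words."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p, w):
        if p == len(pattern):
            return w == len(words)
        if pattern[p] == "#":
            # '#' may swallow zero words, or one word and keep going.
            return match(p + 1, w) or (w < len(words) and match(p, w + 1))
        if w == len(words):
            return False
        if pattern[p] in ("*", words[w]):
            return match(p + 1, w + 1)
        return False

    return match(0, 0)


# Binding keys on the "tasks" topic exchange in the example config.
TOPIC_BINDINGS = {"feed_tasks": "feed.#", "regular_tasks": "task.#"}


def route_topic(routing_key):
    return [q for q, key in TOPIC_BINDINGS.items() if topic_matches(key, routing_key)]


print(route_topic("feed.facebook.refresh"))  # ['feed_tasks']
print(route_topic("task.default"))           # ['regular_tasks']
print(route_topic("image.compress"))         # []
```

"image.compress" matches nothing on the topic exchange, which is consistent with the config sending it through the separate "mediatasks" direct exchange instead.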
Quick Tips

# Set expiration for a task (in seconds)
mytask.apply_async(args=[10, 10], expires=60)

# Route a task
mytask.apply_async(
    args=[filename],
    routing_key="video.compress",
)
# Or define task mapping in the CELERY_ROUTES setting

# Revoke a task using the task instance
result = mytask.apply_async(args=[2, 2], countdown=120)
result.revoke()

# Or save the task ID (result.task_id) somewhere
from celery.task.control import revoke
revoke(task_id)
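Both tips boil down to a worker deciding, just before running a message, whether it should still run. The toy below models that check with a revoked-ID set and an expiry deadline. It is a simplified sketch, not Celery’s code: Celery broadcasts revocations to its workers and tracks expiry itself, while `ToyWorker` and the message dict layout here are hypothetical. A fixed clock keeps the example reproducible.

```python
import time


class ToyWorker:
    """Toy model of honouring expires/revoke: before running a message,
    discard it if its ID was revoked or its deadline has passed."""

    def __init__(self):
        self.revoked = set()

    def revoke(self, task_id):
        self.revoked.add(task_id)

    def handle(self, message, now=None):
        now = time.time() if now is None else now
        if message["id"] in self.revoked:
            return "revoked"
        expires = message.get("expires")
        if expires is not None and now > expires:
            return "expired"
        return message["func"](*message["args"])


worker = ToyWorker()
t0 = 1000.0  # fixed clock for a reproducible example
msg = {"id": "a1", "func": pow, "args": [10, 10], "expires": t0 + 60}
print(worker.handle(msg, now=t0 + 5))   # 10000000000
print(worker.handle(msg, now=t0 + 61))  # expired
worker.revoke("a1")
print(worker.handle(msg, now=t0 + 5))   # revoked
```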
Quick Tips
• Execute a task as a blocking call (it runs locally, in the current process) using:
  generate_report.apply(kwargs=params, **options)
• Avoid issuing tasks inside an asynchronous task that waits on its children’s data (blocking)
  o Write reusable pieces of code that can be called as functions instead of as tasks
  o If necessary, use the callback + subtask feature of Celery
• Ignore results if you don’t need them
  o If your asynchronous task doesn’t return anything:
  @celery.task(ignore_result=True)
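The callback idea can be sketched without Celery: instead of a parent task issuing a child and blocking until the child’s data arrives, the child’s result is handed to a follow-up step that is queued in turn. This toy uses a plain `queue.Queue` and hypothetical `enqueue`/`run_all` helpers; it is not Celery’s subtask API. In current Celery versions the comparable tools are task signatures, `chain` and the `link` option.

```python
import queue

work = queue.Queue()  # toy in-memory stand-in for the broker
summaries = []        # where the hypothetical final step stores its output


def enqueue(func, *args, callback=None):
    """Queue func(*args); if callback is given, queue callback(result) afterwards."""
    work.put((func, args, callback))


def run_all():
    # Every message runs to completion; no task ever blocks waiting on another.
    while not work.empty():
        func, args, callback = work.get()
        result = func(*args)
        if callback is not None:
            enqueue(callback, result)  # hand the child's result to the next step


def fetch_rows(n):
    """Hypothetical child step: produce some data."""
    return list(range(n))


def summarize(rows):
    """Hypothetical follow-up step: consume the child's output."""
    summaries.append(sum(rows))


enqueue(fetch_rows, 5, callback=summarize)
run_all()
print(summaries)  # [10]
```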
Good to know
• Do check whether your task parameters are serializable
  o WSGI request objects are not serializable
  o Don’t pass the request as a parameter to your task
• Don’t pass unnecessary data in task parameters
  o They have to be stored until the task is complete
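A cheap safeguard is to try serializing the parameters before issuing the task, so a mistake such as passing the request fails in the view rather than somewhere between the broker and the worker. The sketch assumes JSON, which is Celery’s default serializer in recent versions (older versions defaulted to pickle); `check_serializable` and `FakeRequest` are hypothetical.

```python
import json


def check_serializable(**params):
    """Raise TypeError early if task params can't be JSON-encoded.
    JSON is an assumption: check against the serializer your setup uses."""
    json.dumps(params)
    return params


class FakeRequest:
    """Hypothetical stand-in for a WSGI/Django request object."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.environ = {"wsgi.input": object()}


request = FakeRequest(user_id=42)

# Good: pass only the small, plain values the task needs.
params = check_serializable(user_id=request.user_id, year=2011)

# Bad: passing the request itself fails before anything is queued.
try:
    check_serializable(request=request)
except TypeError as exc:
    print("not serializable:", exc)
```

Passing an ID and re-reading the record inside the task also keeps the stored message small, which addresses the second tip.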
Good to know
• Avoid starvation of tasks by using multiple queues
  o If really long video reformatting tasks are processed in the same queue as relatively quick thumbnail generation tasks, the latter may starve
  o Only available when using the AMQP broker backend
• Use celerybeat for time-sensitive repeated tasks
  o It can replace time-sensitive cron jobs related to your web application
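The starvation effect is easy to quantify with a small simulation. Assumptions: two hypothetical 60-second video jobs and three 1-second thumbnail jobs are all queued at once, each worker takes the next task in FIFO order as soon as it is free, and Celery details such as prefetching are ignored.

```python
import heapq


def finish_times(tasks, n_workers):
    """FIFO queue served by n_workers greedy workers, all tasks queued at t=0.
    Returns {task_name: completion_time}."""
    free_at = [0] * n_workers  # when each worker next becomes free
    heapq.heapify(free_at)
    done = {}
    for name, duration in tasks:
        start = heapq.heappop(free_at)  # earliest available worker
        done[name] = start + duration
        heapq.heappush(free_at, done[name])
    return done


videos = [("video1", 60), ("video2", 60)]
thumbs = [("thumb1", 1), ("thumb2", 1), ("thumb3", 1)]

# One shared queue, two workers: thumbnails wait behind the videos.
shared = finish_times(videos + thumbs, n_workers=2)
# Two queues, one dedicated worker each.
separate = {**finish_times(videos, 1), **finish_times(thumbs, 1)}

print([shared[t] for t, _ in thumbs])    # [61, 61, 62]
print([separate[t] for t, _ in thumbs])  # [1, 2, 3]
```

With separate queues the thumbnails finish within 3 seconds instead of after a minute; the trade-off is that video2 now finishes at 120 instead of 60, so how many workers each queue gets remains a tuning decision.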
Q & A
• Slides available at:
  o http://www.slideshare.net/tarequeh
• Extensive guides & documentation available at:
  o http://ask.github.com/celery/