Celery

Quick introduction to Celery

Celery

Òscar Vilaplana

February 28 2012

@grimborg
dev@oscarvilaplana.cat

Outline

- self.__dict__
- Use task queues
- Celery and RabbitMQ
- Getting started with RabbitMQ
- Getting started with Celery
- Periodic tasks
- Examples

self.__dict__

{'name': 'Òscar Vilaplana',
 'origin': 'Catalonia',
 'company': 'Paylogic',
 'tags': ['developer', 'architect', 'geek'],
 'email': 'dev@oscarvilaplana.cat'}

Proposal

- Take a slow task.
- Decouple it from your system.
- Call it asynchronously.

Separate projects

Separate projects allow us to:

- Divide your system into sections,
  - e.g. frontend, backend, mailing, report generator...
- Tackle them individually,
- Conquer them and declare them Done:
  - Clean code
  - Clean interface
  - Unit tested
  - Maintainable

(but this is not only for Celery tasks)

Coupled Tasks

In some cases, it may not be possible to decouple some tasks. Then we can:

- Have some workers in your system's network,
  - with access to the code of your system,
  - with access to the system's database.
- They handle messages from certain queues, e.g. internal.#

Candidates

Processes that:

- Need a lot of memory.
- Are slow.
- Depend on external systems.
- Need a limited amount of data to work (easy to decouple).
- Need to be scalable.

Examples:

- Render complex reports.
- Import big files.
- Send e-mails.

Example: sending complex emails

Create an independent project: yourappmail

- A generator of complex e-mails.
  - It needs the templates, images...
  - It doesn't need access to your system's database.
- Deploy it on servers of our own, or on Amazon servers.
  - We can add/remove them as we need them.
- On startup:
  - Join the RabbitMQ cluster.
  - Start celeryd.
- Normal operation: 1 server is enough.
- On high load: start as many servers as needed (tps_peak / tps_server).

yourappmail

A decoupled email generator:

- Has a clean API.
- Decoupled from your system's db: it needs to receive all information:
  - Customer information
  - Custom data
  - Contents of the email
- Can be deployed to as many servers as we need.
- Scalable.

Not for everything

- Task queues are not a magic wand to make things faster.
- They can be used as such (like a cache), but that hides the real problem.

Celery

- Asynchronous distributed task queue.
- Based on distributed message passing.
- Mostly for real-time queuing.
- Can do scheduling too.
- REST: you can query status and results via URLs.
- Written in Python.
- Celery: Message Brokers and Result Storage.

Celery's tasks

- Tasks can be async or sync.
- Low latency.
- Rate limiting.
- Retries.
- Each task has a UUID: you can ask for the result back if you know the task UUID.
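A minimal sketch of that: rebuild a result handle from a stored UUID (task_uuid is a placeholder for a UUID you saved earlier):

from celery.result import AsyncResult

result = AsyncResult(task_uuid)
if result.ready():
    print result.result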

RabbitMQ

- Messaging system.
- Protocol: AMQP, an open standard for messaging middleware.
- Written in Erlang.
- Easy to cluster!

Getting started with RabbitMQ

Install the packages from the RabbitMQ website:

- RabbitMQ Server
- Management Plugin (nice HTML interface)
  - rabbitmq-plugins enable rabbitmq_management
- Go to http://localhost:55672/cli/ and download the cli.
- HTML interface at http://localhost:55672/

Set up a cluster

rabbit1$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1 ...
[{nodes,[{disc,[rabbit@rabbit1]}]},{running_nodes,[rabbit@rabbit1]}]
...done.

rabbit2$ rabbitmqctl stop_app
Stopping node rabbit@rabbit2 ...done.

rabbit2$ rabbitmqctl reset
Resetting node rabbit@rabbit2 ...done.

rabbit2$ rabbitmqctl cluster rabbit@rabbit1
Clustering node rabbit@rabbit2 with [rabbit@rabbit1] ...done.

rabbit2$ rabbitmqctl start_app
Starting node rabbit@rabbit2 ...done.

Notes

- Automatic configuration
  - Use a .config file to describe the cluster.
- Change the type of the node
  - RAM node
  - Disk node

Install Celery

- Just pip install celery

Define a task

Example tasks.py

from celery.task import task

@task
def add(x, y):
    print "I received the task to add {} and {}".format(x, y)
    return x + y

Configure username, vhost, permissions

$ rabbitmqctl add_user myuser mypassword
$ rabbitmqctl add_vhost myvhost
$ rabbitmqctl set_permissions -p myvhost myuser ".*" ".*" ".*"

Configuration file

Write celeryconfig.py

BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "myuser"
BROKER_PASSWORD = "mypassword"
BROKER_VHOST = "myvhost"
CELERY_RESULT_BACKEND = "amqp"
CELERY_IMPORTS = ("tasks", )

Launch daemon

celeryd -I tasks # import the tasks module

Schedule tasks

from tasks import add

# Schedule the task
result = add.delay(1, 2)

value = result.get() # value == 3

Schedule tasks by name

Sometimes the tasks module is not available on the clients

from celery.execute import send_task

# Schedule the task by its name, without importing the tasks module
result = send_task("tasks.add", [1, 2])

value = result.get()  # value == 3
print value

Schedule the tasks better: apply_async

task.apply_async has more options:

- countdown=n: the task will run at least n seconds in the future.
- eta=datetime: the task will not run earlier than datetime.
- expires=n or expires=datetime: the task will be revoked in n seconds or at datetime.
  - It will be marked as REVOKED.
  - result.get will raise a TaskRevokedError.
- serializer:
  - pickle: default, unless CELERY_TASK_SERIALIZER says otherwise.
  - Alternatives: json, yaml, msgpack.
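For example (a minimal sketch, reusing the add task from before):

from datetime import datetime, timedelta

from tasks import add

add.apply_async(args=(2, 2), countdown=10)   # at least 10s from now
add.apply_async(args=(2, 2),
                eta=datetime.now() + timedelta(minutes=5))
add.apply_async(args=(2, 2), expires=60)     # revoked if not run in 60s
add.apply_async(args=(2, 2), serializer="json")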

Result

A result has some useful operations:

- successful: True if the task succeeded.
- ready: True if the result is ready.
- revoke: cancel the task.
- result: if the task has been executed, this contains the result; if it raised an exception, it contains the exception instance.
- state: one of PENDING, STARTED, RETRY, FAILURE, SUCCESS.
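A small sketch of inspecting a result (again with the add task):

from tasks import add

result = add.delay(1, 2)
print result.state           # e.g. PENDING, then SUCCESS
if result.ready() and result.successful():
    print result.result      # 3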

TaskSet

Run several tasks at once. The result keeps the order.

from celery.task.sets import TaskSet
from tasks import add

job = TaskSet(tasks=[
    add.subtask((4, 4)),
    add.subtask((8, 8)),
    add.subtask((16, 16)),
    add.subtask((32, 32)),
])
result = job.apply_async()
result.ready()          # True -- all subtasks completed
result.successful()     # True -- all subtasks successful
values = result.join()  # [8, 16, 32, 64]
print values

TaskSetResult

The TaskSetResult has some interesting properties:

- successful: if all of the subtasks finished successfully (no Exception).
- failed: if any of the subtasks failed.
- waiting: if any of the subtasks is not ready yet.
- ready: if all of the subtasks are ready.
- completed_count: number of completed subtasks.
- revoke: revoke all subtasks.
- iterate: iterate over the return values of the subtasks once they finish (sorted by finish order).
- join: gather the results of the subtasks and return them in a list (sorted by the order in which they were called).
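For example, with the result from the previous slide (a sketch):

for value in result.iterate():   # yields values in finish order
    print value
print result.completed_count()   # how many subtasks are done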

Retrying tasks

If the task fails, you can retry it by calling retry()

@task
def send_twitter_status(oauth, tweet):
    try:
        twitter = Twitter(oauth)
        twitter.update_status(tweet)
    except (Twitter.FailWhaleError, Twitter.LoginError), exc:
        send_twitter_status.retry(exc=exc)

To limit the number of retries, set task.max_retries.
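This can also go in the decorator (a sketch; default_retry_delay is the wait between retries):

@task(max_retries=3, default_retry_delay=60)
def send_twitter_status(oauth, tweet):
    ...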

Routing

apply_async accepts a routing_key parameter, and RabbitMQ queues can be bound to routing-key patterns:

pdf: ticket.#
import_files: import.#
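As configuration this could look like the following sketch (old-style CELERY_QUEUES dict with a topic exchange; the exchange name is an assumption):

CELERY_QUEUES = {
    "pdf": {
        "exchange": "tasks",           # assumed exchange name
        "exchange_type": "topic",
        "binding_key": "ticket.#",
    },
    "import_files": {
        "exchange": "tasks",
        "exchange_type": "topic",
        "binding_key": "import.#",
    },
}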

- Schedule the task to the appropriate queue:

import_vouchers.apply_async(args=[filename],
                            routing_key="import.vouchers")

generate_ticket.apply_async(args=barcodes,
                            routing_key="ticket.generate")

celerybeat

from celery.schedules import crontab

CELERYBEAT_SCHEDULE = {
    # Executes every Monday morning at 7:30 A.M.
    "every-monday-morning": {
        "task": "tasks.add",
        "schedule": crontab(hour=7, minute=30, day_of_week=1),
        "args": (16, 16),
    },
}

There can be only one celerybeat running.

- But we can have two machines that check on each other.
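To start the scheduler, run celerybeat standalone or embed it in a worker (remember: only one beat):

celerybeat    # standalone scheduler process
celeryd -B    # or embedded in a worker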

Import a big file

tasks.py

from celery.task import task

@task
def import_bigfile(server, filename):
    with create_temp_file() as tmp:
        fetch_bigfile(tmp, server, filename)
        do_import(tmp)      # the actual import step (helper name for illustration)
        report_result(...)  # e.g. send confirmation e-mail

Import big file: Admin interface, server-side

import tasks

def import_bigfile(filename):
    result = tasks.import_bigfile.delay(filename)
    return result.task_id

class ImportBigfile(View):
    def post_ajax(request):
        filename = request.get('big_file')
        task_id = import_bigfile(filename)
        return task_id

Import big file: Admin interface, client-side

- Post the file asynchronously.
- Get the task_id back.
- Put up some "working..." message.
- Periodically ask Celery if the task is ready and change "working..." into "done!".
- No need to call Paylogic code: just ask Celery directly (see the sketch after this list).
- Improvements:
  - Send the username to the task.
  - Have the task call back the Admin interface when it's done.
  - The Backoffice can send an e-mail to the user when the task is done.
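A sketch of the poll endpoint such a client could hit (the view class and its methods are hypothetical, mirroring the server-side slide):

from celery.result import AsyncResult

class ImportBigfileStatus(View):
    def post_ajax(request):
        result = AsyncResult(request.get('task_id'))
        if result.ready():
            return 'done!'
        return 'working...'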

Do a time-consuming task.

from tasks import do_difficult_thing

... stuff ...

# I have all data necessary to do the difficult thing
difficult_result = do_difficult_thing.delay(some, values)

# I don't need the result just yet, I can keep myself busy
... stuff ...

# Now I really need the result
difficult_value = difficult_result.get()
