Gearman - Northeast PHP 2012

A Job Server to Scale. By Mike Willbanks, Sr. Web Architect Manager, NOOK Developer. Northeast PHP, August 12, 2012.

Page 1: Gearman - Northeast PHP 2012

A Job Server to Scale

By Mike Willbanks

Sr. Web Architect Manager

NOOK Developer

Northeast PHP August 12, 2012

Page 2: Gearman - Northeast PHP 2012

Housekeeping…

• Talk

  ▪ Slides will be online later!

• Me

  ▪ Sr. Web Architect Manager at NOOK Developer

  ▪ Former MNPHP Organizer

  ▪ Open Source Contributor (Zend Framework and various others)

  ▪ Where you can find me:

    • Twitter: mwillbanks

    • G+: Mike Willbanks

    • IRC (freenode): mwillbanks

    • Blog: http://blog.digitalstruct.com

    • GitHub: https://github.com/mwillbanks

Page 3: Gearman - Northeast PHP 2012

Agenda

• What is Gearman

  ▪ A general introduction

• Main Concepts

  ▪ Looking overall at how Gearman works for you.

• Quick Start

  ▪ Make it go do something.

• Digging in

  ▪ A detailed look into Gearman.

• PHP Integration

  ▪ How you should work with it in PHP, including use cases and samples.

• Questions

  ▪ Although you can bring them up at any time!

Page 4: Gearman - Northeast PHP 2012

What is Gearman?

• Official Statement

• What it means

• Visual understanding

• Platforms

Page 5: Gearman - Northeast PHP 2012

Official Statement

“Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages.”

Page 6: Gearman - Northeast PHP 2012

What it Means

• Gearman consists of a daemon, client and worker

  ▪ At the core, they are simply small programs.

• The daemon handles the negotiation of work

  ▪ Between workers and clients

• The worker does the work

• The client requests work to be done
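The three roles can be sketched with a toy, in-process model. This is only an illustration of who does what: it does not speak the Gearman protocol, and the class name ToyJobServer and the function name "wc" are made up for this sketch.

```python
from collections import deque


class ToyJobServer:
    """Stands in for the daemon: it only queues work per function name."""

    def __init__(self):
        self.queues = {}  # function name -> queue of pending workloads

    def submit(self, function, workload):
        """Called by a client: request that some work be done."""
        self.queues.setdefault(function, deque()).append(workload)

    def grab(self, function):
        """Called by a worker: take the next workload, or None if idle."""
        queue = self.queues.get(function)
        return queue.popleft() if queue else None


def count_lines(workload):
    """The worker's single task: count the lines in the workload."""
    return str(len(workload.splitlines()))


server = ToyJobServer()
server.submit("wc", "first\nsecond\nthird\n")  # client side
job = server.grab("wc")                         # worker side
result = count_lines(job)
print(result)
```

The client and the worker never talk to each other directly; both only know the function name and the job server.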

Page 7: Gearman - Northeast PHP 2012

In Pictures

Page 8: Gearman - Northeast PHP 2012

Platforms

• OS

  ▪ Linux

  ▪ Windows (Cygwin)

• API implementations available

  ▪ PHP

  ▪ Perl

  ▪ Java

  ▪ Ruby

  ▪ Python

Page 9: Gearman - Northeast PHP 2012

Main Concepts

• Client -> Daemon -> Worker communication

• Distributed Model

Page 10: Gearman - Northeast PHP 2012

Client -> Daemon -> Worker communication

Page 11: Gearman - Northeast PHP 2012

Distributed Model

Page 12: Gearman - Northeast PHP 2012

Quick Start

• Installation

• Simple Bash Example

• PHP Related (sorry, I’m all about the PHP)

Page 13: Gearman - Northeast PHP 2012

Installation

• Head to gearman.org

• Click Download

• Click on the Launchpad download

• Download the source archive

• Unpack the archive

• ./configure && make && make install

• Bam! You’re off!

  ▪ For more advanced configuration see ./configure --help

• Starting

  ▪ gearmand -d

Page 14: Gearman - Northeast PHP 2012

Gearmand Usage

• gearmand

  ▪ -d Run as background daemon

  ▪ -u [user] Run as user

  ▪ -L [host] Listen on host/IP

  ▪ -p [port] Listen on port

  ▪ -t [threads] Number of threads to use

  ▪ -v[vv] Verbosity

Page 15: Gearman - Northeast PHP 2012

Simple Bash Example

• Starting the Daemon

  ▪ gearmand -d

• Worker – command line style

  ▪ nohup gearman -w -f wc -- wc -l &

    • Run the worker in the background.

• Client – command line style

  ▪ gearman -f wc < /etc/passwd

    • Outputs the number of lines.

Page 16: Gearman - Northeast PHP 2012

Gearman Client Command Line Usage

• gearman

  ▪ -w Worker mode

  ▪ -f [function] Function name to use

  ▪ -h [host] Job server host

  ▪ -p [port] Job server port

  ▪ -t [timeout] Timeout in milliseconds

  ▪ -H Help, listing the full options for both clients and workers.

Page 17: Gearman - Northeast PHP 2012

Digging In

• Persistence

• Workers

• Monitoring

Page 18: Gearman - Northeast PHP 2012

Persistence

• Gearman by default is an in-memory queue

  ▪ Leaving this as the default is ideal; however, it does not work in all environments, because queued jobs are lost when gearmand stops.

• Persistent Queues

  ▪ libdrizzle

  ▪ libsqlite3

  ▪ libmemcached

  ▪ Postgres

  ▪ TokyoCabinet

  ▪ MySQL

  ▪ Redis

Page 19: Gearman - Northeast PHP 2012

Getting Up and Running with Persistence

• Persistent queues require specific configuration when compiling gearmand.

• Additionally, arguments must be passed to the gearman daemon so it can talk to the specific persistence layer.

• Each persistence layer is actually built as a plugin to gearmand

  ▪ http://bazaar.launchpad.net/~tangent-org/gearmand/trunk/files/head:/libgearman-server/plugins/queue/

Page 20: Gearman - Northeast PHP 2012

Configuration Options

Page 21: Gearman - Northeast PHP 2012

Clients

• Clients send work to the gearmand server

  ▪ This is called the workload; it can be anything that can become a string.

  ▪ Utilize an open format; it will make life easier when you use multiple programming languages, are debugging, or the like.

    • XML, JSON, etc.

    • Yes, you can serialize objects if you want to.

      – I recommend against this.
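A minimal sketch of the open-format advice, using JSON as the workload encoding. The helper names and the job fields ("source", "width") are invented for this example; any client library would simply be handed the resulting string.

```python
import json


def encode_workload(task):
    """Client side: turn a plain dict into a JSON string workload."""
    return json.dumps(task, sort_keys=True)


def decode_workload(workload):
    """Worker side: turn the JSON string back into a dict."""
    return json.loads(workload)


# Hypothetical job description; the keys are invented for this example.
workload = encode_workload({"source": "/tmp/upload.jpg", "width": 200})
task = decode_workload(workload)
print(workload)
```

Because the workload is plain JSON, a worker written in another language can read it, and it stays readable when you inspect a job while debugging.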

Page 22: Gearman - Northeast PHP 2012

Workers

• Workers are the dudes in the factory doing all the work

• Generally they will run as a daemon in the background

• Workers register a function that they perform

  ▪ They should ONLY be doing a single task.

  ▪ This makes them far easier to manage.

• The worker does the work and “can” return results

  ▪ If you are doing the work asynchronously, you generally do not return the result.

  ▪ For synchronous work, you will return the result.

Page 23: Gearman - Northeast PHP 2012

Workers – special notes

• Utilizing the Database

  ▪ If you keep a database connection

    • Must have the ability to reconnect to the database.

    • Watch for connection timeouts

• Handling Memory Leaks

  ▪ Watch the amount of memory and detect leaks, then kill the worker.

• Request-Based Languages

  ▪ PHP, for instance, sometimes slows down after hundreds of executions; kill it off if you know this will happen.
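The reconnect requirement might look roughly like the sketch below. It uses SQLite from the standard library purely so the example runs anywhere; the class name ReconnectingDb is made up, and a real worker would catch its own driver's "connection lost" error rather than every database error.

```python
import sqlite3


class ReconnectingDb:
    """Wraps a connection factory and retries once after a failure."""

    def __init__(self, connect):
        self._connect = connect
        self._conn = connect()
        self.reconnects = 0

    def query(self, sql, params=()):
        try:
            return self._conn.execute(sql, params).fetchall()
        except sqlite3.Error:
            # Assumption: the failure means the connection went away.
            # Open a fresh connection and retry the statement once.
            self._conn = self._connect()
            self.reconnects += 1
            return self._conn.execute(sql, params).fetchall()


db = ReconnectingDb(lambda: sqlite3.connect(":memory:"))
db._conn.close()             # simulate the server dropping an idle connection
rows = db.query("SELECT 1")  # transparently reconnects and retries
print(rows, db.reconnects)
```

A long-running worker can sit idle for hours between jobs, which is exactly when a server-side connection timeout is likely to have fired.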

Page 24: Gearman - Northeast PHP 2012

Keeping the Daemon Running

• Workers sometimes have issues and die, or you need to boot them back up after a restart

  ▪ Utilizing a service to watch your workers and ensure they are always running is a GOOD thing.

• Supervisord

  ▪ Can watch processes and restart them if they die or get killed

  ▪ Can manage multiple processes of the same program

  ▪ Can start and stop your workers.

  ▪ Running: supervisord -c myconfig.conf

• When running workers, BE SURE to handle termination signals such as SIGTERM; SIGKILL cannot be caught.
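One way to handle a termination signal is to finish the current job and only then leave the loop. The sketch below assumes a Unix-like system; the class name GracefulWorker is invented, and the job list stands in for whatever a real worker would pull from the job server.

```python
import os
import signal


class GracefulWorker:
    """Finishes the job in progress, then stops once a signal arrives."""

    def __init__(self):
        self.stop_requested = False
        signal.signal(signal.SIGTERM, self._request_stop)
        signal.signal(signal.SIGINT, self._request_stop)

    def _request_stop(self, signum, frame):
        # Only set a flag here; the work loop decides when it is safe to exit.
        self.stop_requested = True

    def run(self, jobs, handler):
        done = 0
        for job in jobs:
            if self.stop_requested:
                break
            handler(job)
            done += 1
        return done


worker = GracefulWorker()


def handle(job):
    if job == "b":
        # Simulate a process manager asking this worker to stop mid-job.
        os.kill(os.getpid(), signal.SIGTERM)


processed = worker.run(["a", "b", "c"], handle)
print(processed, worker.stop_requested)
```

The job that was running when the signal arrived still completes; the remaining job stays untouched for another worker to pick up.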

Page 25: Gearman - Northeast PHP 2012

Supervisord Example: Add Program
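The configuration shown on this slide is not preserved in the transcript. As a stand-in, the sketch below generates a program section using settings that supervisord documents (command, process_name, numprocs, autostart, autorestart, stopsignal). The program name and the worker path are hypothetical.

```python
import configparser
import io


def supervisor_program(name, command, numprocs=1):
    """Render a [program:NAME] section for a supervisord config file."""
    # interpolation=None keeps the %(...)s placeholders for supervisord itself.
    config = configparser.ConfigParser(interpolation=None)
    config[f"program:{name}"] = {
        "command": command,
        "process_name": "%(program_name)s_%(process_num)02d",
        "numprocs": str(numprocs),
        "autostart": "true",
        "autorestart": "true",
        "stopsignal": "TERM",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical worker script and path, for illustration only.
section = supervisor_program("resize_worker", "php /path/to/worker.php", numprocs=4)
print(section)
```

With numprocs above 1, supervisord expects process_name to include the process number, which is why the placeholder is part of the sketch.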

Page 26: Gearman - Northeast PHP 2012

Monitoring

• Gearman Status

  ▪ telnet to port 4730

  ▪ Write “STATUS”

    • Gives you the registered functions, number of workers and items in the queue.

• Gearman Monitor – PHP Project

  ▪ Basic monitoring; but it works, and it is open source so you can improve it!

  ▪ https://github.com/yugene/Gearman-Monitor
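The status check can also be scripted instead of typed into telnet. In the sketch below, the response is assumed to follow the admin text protocol: one tab-separated line per function (name, jobs in queue, jobs running, available workers) and a final line containing a single dot. The function names in the sample are made up, and fetch_status is defined but not called because it needs a running gearmand.

```python
import socket


def fetch_status(host="127.0.0.1", port=4730, timeout=5.0):
    """Send the status command to a job server and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"status\n")
        data = b""
        while not (data == b".\n" or data.endswith(b"\n.\n")):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode()


def parse_status(response):
    """Turn the tab-separated status reply into a dict keyed by function."""
    stats = {}
    for line in response.splitlines():
        if line == ".":
            break  # a lone dot terminates the listing
        if not line:
            continue
        name, queued, running, workers = line.split("\t")
        stats[name] = {
            "queued": int(queued),
            "running": int(running),
            "workers": int(workers),
        }
    return stats


# Made-up sample reply; real function names depend on your workers.
sample = "wc\t3\t1\t2\nresize_image\t0\t0\t4\n.\n"
stats = parse_status(sample)
print(stats["wc"])
```

A cron job or monitoring check could call fetch_status, parse the reply, and alert when a function has queued jobs but no workers.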

Page 27: Gearman - Northeast PHP 2012

PHP Integration

• Usage (PEAR / PECL)

• Frameworks / Integration

• Handling Conditions

• Use Cases

Page 28: Gearman - Northeast PHP 2012

Usage

• Two Options

  ▪ Net_Gearman (PEAR)

    • Implemented through sockets with PHP.

    • https://pear.php.net/package/Net_Gearman/

  ▪ Gearman Extension (PECL)

    • Implemented through the C API from libgearman

    • http://pecl.php.net/package/gearman

Page 29: Gearman - Northeast PHP 2012

Frameworks and Integration

• GearmanManager (framework agnostic)

  ▪ https://github.com/brianlmoon/GearmanManager/

• Zend Framework 1: Zend_Gearman

  ▪ https://github.com/mwillbanks/Zend_Gearman

• Zend Framework 2: mwGearman

  ▪ https://github.com/mwillbanks/mwGearman

• Drupal

  ▪ http://drupal.org/node/783294

Page 30: Gearman - Northeast PHP 2012

Conditions

• Watch for Memory Utilization

  ▪ Check peak usage, then kill and restart the worker

• Don’t execute too many times

  ▪ PHP is not great at unlimited loops

• Keep your memory free

  ▪ Garbage collect when you can!

• Databases

  ▪ Implement a callback to ensure that you do not time out; otherwise implement a reconnection.
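The first two conditions can be combined into one work loop that exits on purpose and relies on the process manager to start a fresh worker. This is a sketch only: run_until_limit is an invented name, and the memory probe is passed in as a function so the example does not depend on any platform-specific way of reading memory usage.

```python
def run_until_limit(jobs, handler, max_jobs, max_memory, memory_probe):
    """Process jobs until a job-count or memory limit is reached.

    Returns (jobs_done, reason) so the caller can log why the worker exits.
    """
    done = 0
    for job in jobs:
        handler(job)
        done += 1
        if done >= max_jobs:
            return done, "job limit"
        if memory_probe() > max_memory:
            return done, "memory limit"
    return done, "queue empty"


# Fake memory readings (say, in MB) standing in for a real measurement.
readings = iter([10, 20, 90])
done, reason = run_until_limit(
    jobs=["a", "b", "c", "d"],
    handler=lambda job: None,
    max_jobs=100,
    max_memory=50,
    memory_probe=lambda: next(readings),
)
print(done, reason)
```

Checking the limits between jobs, rather than during one, means the worker never abandons work halfway through.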

Page 31: Gearman - Northeast PHP 2012

Images

• If you resize images on your web server:

  ▪ Web servers should serve, not process images.

  ▪ Images require a lot of memory AND processing power

    • They are best processed on their own!

• Processing in the Background

  ▪ Generally will require a change to your workflow and checking the status with XHR to see if the job has been completed.

    • This allows you to process them as you have resources available.

    • Have enough workers to process them “quickly enough”

• Or just do it synchronously

Page 32: Gearman - Northeast PHP 2012

Image Processing Example

Page 33: Gearman - Northeast PHP 2012

Image Processing Example

Page 34: Gearman - Northeast PHP 2012

Email

• Sending email, generating templates and processing variables all take time; that time is better spent getting the user to the next page.

• The feedback from sending the mail doesn’t really make a difference to the user, so it is a great candidate for the background.

Page 35: Gearman - Northeast PHP 2012

Email Example

Page 36: Gearman - Northeast PHP 2012

Email Example

Page 37: Gearman - Northeast PHP 2012

Log Analysis / Aggregation

• Get all of your logs to a single place

• Process the logs to produce analytical data

• Impression / Click Tracking

• Why run introspection over the log file itself?

  ▪ Near real-time analysis is possible!

Page 38: Gearman - Northeast PHP 2012

Log Analysis / Aggregation

Page 39: Gearman - Northeast PHP 2012

Log Analysis / Aggregation

Page 40: Gearman - Northeast PHP 2012

Executable Processes

• You need to run an executable process…

• This process takes a given name and tells you how many processes with that name are running on your worker machine.

  ▪ Purely for example purposes; however, you might want to run SaaS against a CMS or something to that degree.
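The example code from the following slides is not preserved in this transcript. As a rough stand-in, the sketch below counts processes by name by reading /proc, which assumes a Linux worker machine; both function names are invented, and the workload is assumed to be just the process name.

```python
import glob


def count_processes(name):
    """Count running processes whose command name equals `name`.

    Reads /proc/<pid>/comm, so it only works on Linux; the kernel
    truncates that name to 15 characters.
    """
    count = 0
    for path in glob.glob("/proc/[0-9]*/comm"):
        try:
            with open(path) as handle:
                if handle.read().strip() == name:
                    count += 1
        except OSError:
            continue  # the process exited while we were looking
    return count


def process_count_worker(workload):
    """Worker function: the workload is the process name to look for."""
    return str(count_processes(workload.strip()))


print(process_count_worker("gearmand"))
```

The worker returns the count as a string because a Gearman result, like a workload, is ultimately just a string.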

Page 41: Gearman - Northeast PHP 2012

Executable Process Example

Page 42: Gearman - Northeast PHP 2012

Executable Process Example

Page 43: Gearman - Northeast PHP 2012

Questions?

These slides will be posted to SlideShare & SpeakerDeck.

  ▪ SlideShare: http://www.slideshare.net/mwillbanks

  ▪ SpeakerDeck: http://speakerdeck.com/u/mwillbanks

  ▪ Twitter: mwillbanks

  ▪ G+: Mike Willbanks

  ▪ IRC (freenode): mwillbanks

  ▪ Blog: http://blog.digitalstruct.com

  ▪ GitHub: https://github.com/mwillbanks