Distribute the workload, PHP Barcelona 2011

Post on 16-Apr-2017

Distribute the workload

Helgi Þormar Þorbjörnsson

PHP Barcelona, 29th of October 2011

Saturday, 29 October 11

Who am I?

Co-founded Orchestra.io

Developer at PEAR

From Iceland

@h on Twitter

Helgi

Why Distribute?

Budget

Efficiency

Perception

Efficiency

10 small servers > 1 big

Budget

Spend wisely

Commodity servers

Cloud Computing (EC2)

Perception

Defer intensive processes

Give instant feedback

Users keep on browsing

Ant Colonies

Teamwork

When faced with a problem, they solve it as one.

Architect for Distribution

Characteristics

Decoupling

Elasticity

High Availability

Concurrency

Decoupling

[Diagram: Application connected to DB, API, Cache and FE]

[Diagram: Application with DB, API, Cache and FE as separate components, scaled out with additional Cache and API instances]

Elasticity

Cloud Computing

Load Balancing

HAProxy

Nginx

My Favourite
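HAProxy and Nginx both provide round-robin balancing with health checks, so in practice you configure this rather than write it. Below is a minimal Python sketch of the idea only. The class and backend names are hypothetical, and this is not how either proxy is implemented. The balancer hands each request to the next healthy server in turn:

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in turn, skipping any marked unhealthy.

    Hypothetical toy model of round-robin balancing; not HAProxy/Nginx code.
    """

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        # A failed health check takes the backend out of rotation.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        self.healthy.add(backend)

    def next_backend(self):
        # Walk at most one full lap of the ring looking for a healthy node.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


# Hypothetical app servers behind the balancer.
lb = RoundRobinBalancer(["app1:80", "app2:80", "app3:80"])
lb.mark_down("app2:80")
print([lb.next_backend() for _ in range(4)])
```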

Monitoring

When do I need more servers?

Needs to be around from the start!

Keep records

Spot trends

Different types

Hardware Performance

Software Performance

Availability

Resourcing

Applications

New Relic

CloudKick

ScoutApp

Nagios

Cacti

Circonus

Automation

Plug into your monitoring

Bringing together Monitoring and Elastic behaviour into one beautiful whole!

Add some intelligence to add / remove servers as needed, based on current information.
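Below is a minimal sketch of such a rule. It assumes average CPU utilisation is the signal the monitoring system reports. The thresholds, the bounds and the function name are illustrative assumptions, not recommendations. The hard minimum and maximum are the guard rail that stops the automation from running away:

```python
def desired_server_count(current, avg_cpu, *, scale_up_at=0.75,
                         scale_down_at=0.25, minimum=2, maximum=20):
    """Return the target server count given the current count and average CPU (0..1).

    Hypothetical rule: all thresholds and bounds here are assumptions.
    """
    if avg_cpu >= scale_up_at:
        target = current + 1
    elif avg_cpu <= scale_down_at:
        target = current - 1
    else:
        target = current
    # Clamp so automation can never shrink below or grow beyond fixed limits.
    return max(minimum, min(maximum, target))


print(desired_server_count(4, 0.90))  # 5: busy, add a server
print(desired_server_count(2, 0.10))  # 2: idle, but already at the minimum
```

The gap between the two thresholds gives some hysteresis, so load hovering around one value does not make the cluster flap between sizes.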

Just make sure it doesn’t turn into...

Skynet!!

High Availability

Get a highly available and resilient setup by following a few of these recommendations

Remember, even Google has outages

What to avoid

Local Sessions

Store sessions in DB / Memcache

Solution
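Below is a minimal Python sketch of why this works, assuming the store is reachable from every web server. `SharedSessionStore` is an in-process dict standing in for a DB table or memcached. `WebServer` is a hypothetical app server that keeps no session state of its own. Because of that, the next request can land on any server:

```python
import uuid


class SharedSessionStore:
    """Stand-in for a networked session store (DB table or memcached).

    Here it is only a dict; in production every web server would talk
    to the same external service.
    """

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, values):
        self._data[session_id] = dict(values)


class WebServer:
    """Hypothetical app server: all session state lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        session_id = uuid.uuid4().hex
        self.store.save(session_id, {"user": user})
        return session_id

    def whoami(self, session_id):
        return self.store.load(session_id).get("user")


store = SharedSessionStore()
web1, web2 = WebServer("web1", store), WebServer("web2", store)
sid = web1.login("helgi")
print(web2.whoami(sid))  # helgi: a different server still sees the session
```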

Local Memory

Networked Memcache

Solution
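Memcache client libraries commonly spread keys across a pool of servers by hashing the key. Consistent hashing keeps most keys on the same server when one is added or removed. The ring below is a minimal sketch of that technique, not the algorithm of any particular client. The server names, the MD5-based hash and the 100 virtual nodes per server are assumptions:

```python
import bisect
import hashlib


def _hash(value):
    # First 8 bytes of MD5 as an integer: stable across processes,
    # unlike Python's built-in hash() for strings.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping cache keys to (hypothetical) memcached servers."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual nodes per server, smooths the spread
        self._ring = []  # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def server_for(self, key):
        if not self._ring:
            raise RuntimeError("no servers in ring")
        # First virtual node clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
print(ring.server_for("session:abc"))
```

Removing a server only moves the keys that lived on it. With a naive `hash(key) % n`, most keys would move.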

Local Files

Local Uploads

Writing to /tmp

Store on S3 or a networked FS

Solution

Serve up static files from CDNs

Solution

Servers can vanish at any given time

Internal APIs

[Diagram: Application → Internal Storage API → S3, GFS, FS]
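The internal API is the seam that lets you swap backends. Below is a minimal Python sketch assuming a two-method `put`/`get` interface; the interface and all class and function names are hypothetical. The application writes through `Storage` and never learns whether the bytes land in memory or on a networked filesystem mount. With a further backend, they could land on S3:

```python
import tempfile
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    """Hypothetical internal storage API that application code targets."""

    def put(self, name: str, data: bytes) -> None: ...
    def get(self, name: str) -> bytes: ...


class MemoryStorage:
    """In-process backend, handy for tests."""

    def __init__(self):
        self._blobs = {}

    def put(self, name, data):
        self._blobs[name] = bytes(data)

    def get(self, name):
        return self._blobs[name]


class DirectoryStorage:
    """Backend for a directory, e.g. the mount point of a networked FS."""

    def __init__(self, root):
        self.root = Path(root)

    def put(self, name, data):
        path = self.root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, name):
        return (self.root / name).read_bytes()


def save_avatar(storage: Storage, user_id: int, image: bytes) -> str:
    """Application code: knows nothing about where the bytes end up."""
    name = f"avatars/{user_id}.png"
    storage.put(name, image)
    return name


with tempfile.TemporaryDirectory() as tmp:
    for backend in (MemoryStorage(), DirectoryStorage(tmp)):
        stored = save_avatar(backend, 42, b"fake-png-bytes")
        print(type(backend).__name__, backend.get(stored))
```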

[Diagram: Application → Internal DB API → MySQL, Mongo, Cache]
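Below is a minimal Python sketch of such an internal DB API, assuming a cache-aside pattern. `UserRepository`, `FakeDB` and `DictCache` are hypothetical stand-ins for a MySQL or Mongo query layer and a memcache client. Callers ask for a user and never see SQL or cache keys:

```python
class DictCache:
    """Hypothetical stand-in for a memcache client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class FakeDB:
    """Hypothetical stand-in for MySQL/Mongo; counts queries it serves."""

    def __init__(self, rows):
        self.rows = rows
        self.queries = 0

    def fetch_user(self, user_id):
        self.queries += 1
        return self.rows.get(user_id)


class UserRepository:
    """Internal DB API: the only place that knows about the DB and cache."""

    def __init__(self, db, cache):
        self.db = db
        self.cache = cache

    def get_user(self, user_id):
        key = f"user:{user_id}"
        user = self.cache.get(key)
        if user is None:                       # cache miss
            user = self.db.fetch_user(user_id)  # fall through to the database
            self.cache.set(key, user)
        return user


db = FakeDB({1: {"name": "Helgi"}})
repo = UserRepository(db, DictCache())
repo.get_user(1)
repo.get_user(1)
print(db.queries)  # 1: the second call was served from the cache
```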

SOA

Service Oriented Architecture

Sort of :-)

Eventually Consistent

CAP Theorem

Consistency

Availability

Partition Tolerance

Consistency

All nodes see the same data at the same time

Availability

Node failures do not prevent survivors from continuing to operate

Partition Tolerance

The system continues to operate despite arbitrary message loss

Consistency

Availability

Partition Tolerance
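When a partition forces a choice, an eventually consistent system keeps accepting writes and reconciles later. The Python sketch below uses last-writer-wins, one simple conflict strategy, and assumes roughly synchronised timestamps. The `Replica` class is hypothetical, and real systems such as CouchDB use richer conflict handling:

```python
class Replica:
    """Hypothetical key-value replica using last-writer-wins.

    Each value carries a (timestamp, replica_name) version so ties are
    broken the same way on every replica.
    """

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, timestamp):
        self._apply(key, (timestamp, self.name), value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def _apply(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def merge_from(self, other):
        """Anti-entropy: pull every entry from another replica."""
        for key, (version, value) in other.data.items():
            self._apply(key, version, value)


# Partitioned: both sides keep accepting writes (availability).
eu, us = Replica("eu"), Replica("us")
eu.write("plan", "basic", timestamp=1)
us.write("plan", "premium", timestamp=2)
print(eu.read("plan"), us.read("plan"))  # basic premium: they disagree

# Partition heals: replicas exchange state and converge.
eu.merge_from(us)
us.merge_from(eu)
print(eu.read("plan"), us.read("plan"))  # premium premium
```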

Queue Systems

Good for

Image Processing

Distributed Logs

Data Mining

Mass Emails

Intensive transformation

Search
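All of these share a common pattern: the web request only enqueues a job and returns, and background workers do the slow part. Below is a minimal Python sketch of that pattern. An in-process `queue.Queue` stands in for a real queue server such as Gearman or RabbitMQ, and `fake_thumbnail` and `handle_upload` are hypothetical:

```python
import queue
import threading

jobs = queue.Queue()  # stand-in for a queue server
results = []
results_lock = threading.Lock()


def fake_thumbnail(image_id):
    # Hypothetical placeholder for slow work (image processing, mass email, ...).
    return f"thumb-{image_id}"


def worker():
    while True:
        image_id = jobs.get()
        if image_id is None:  # sentinel: shut this worker down
            jobs.task_done()
            break
        thumb = fake_thumbnail(image_id)
        with results_lock:
            results.append(thumb)
        jobs.task_done()


def handle_upload(image_id):
    """What the web request does: enqueue and return instantly."""
    jobs.put(image_id)
    return "accepted"


workers = [threading.Thread(target=worker) for _ in range(4)]
for t in workers:
    t.start()
for i in range(10):
    handle_upload(i)
jobs.join()          # wait until every queued job is processed
for _ in workers:
    jobs.put(None)   # one sentinel per worker
for t in workers:
    t.join()
print(sorted(results))
```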

Common Tools

Gearman

Hadoop

ZeroMQ

RabbitMQ

And many others!

New York Times

4TB of TIFF files

Needed to get 11 million PDF versions

Used Hadoop and EC2

100 machines took 24 hours

Map/Reduce

Map

Master gets a problem to solve

Breaks into multiple sub-problems

Distributed to multiple workers

A worker can repeat these steps on its own sub-problem

Answer passed back to Master

Reduce

Takes in answers from the map workers

Combines them to produce the final answer

There can be multiple reducers
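Word count is the usual first Map/Reduce example. In this minimal Python sketch, threads in a single process stand in for the worker machines Hadoop would use. The function names are hypothetical, and the shuffle step (grouping mapped pairs by key before reducing) is made explicit:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map worker: turn one sub-problem (a chunk of text) into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped):
    """Group intermediate pairs by key so each reducer sees one word."""
    groups = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_phase(item):
    """Reduce worker: combine all counts for one word."""
    word, counts = item
    return word, sum(counts)


def word_count(documents, workers=4):
    """Master: split the problem, farm out map and reduce, combine the answers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, documents))
        reduced = pool.map(reduce_phase, shuffle(mapped).items())
        return dict(reduced)


print(word_count(["the cat sat", "The dog", "a cat"]))
```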

Process petabytes of data in a few hours on a commodity server farm

CouchDB

CouchDB

Highly Concurrent

Schema-free, document-based

RESTful API

Map/Reduce Views

Easy Replication

Gearman

[Diagram: the Gearman stack, top to bottom]

Your Client Code

Gearman Client API (C, PHP, Perl, MySQL UDF, ...)

Gearman Job Server (gearmand)

Gearman Worker API (C, PHP, Perl, Python, ...)

Your Worker Code
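The key idea is that clients ask for work by function name, and the job server routes it to any registered worker. The Python sketch below is a toy, single-process stand-in for gearmand, with no network, persistence or background jobs. `JobServer`, `register` and `submit` are hypothetical names, not the Net_Gearman API:

```python
import queue
import threading


class JobServer:
    """Toy in-process stand-in for gearmand (hypothetical, for illustration)."""

    def __init__(self):
        self._queues = {}  # function name -> queue of (payload, result box)

    def register(self, name, fn, workers=2):
        """Start worker threads that can handle jobs for `name`."""
        q = self._queues.setdefault(name, queue.Queue())
        for _ in range(workers):
            threading.Thread(target=self._work, args=(q, fn), daemon=True).start()

    @staticmethod
    def _work(q, fn):
        while True:
            payload, box = q.get()
            try:
                box["result"] = fn(payload)
            except Exception as exc:  # report failures back to the client
                box["error"] = exc
            finally:
                box["done"].set()

    def submit(self, name, payload):
        """Like a foreground job: block until some worker returns a result.

        Raises KeyError if no worker ever registered `name`.
        """
        box = {"done": threading.Event()}
        self._queues[name].put((payload, box))
        box["done"].wait()
        if "error" in box:
            raise box["error"]
        return box["result"]


server = JobServer()
server.register("reverse", lambda text: text[::-1])
print(server.submit("reverse", "gearman"))  # namraeg
```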

pear.php.net/net_gearman

A Story!

Financial Software

3000+ Clients

Each one has 5 external data sources

Each data source is a web service

Ran every 6 hours, every day

[Diagram: Cron → Gearman → Job 1, split into tasks 1 to 5 against the five Web Services → Processing]
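The fan-out in the diagram can be sketched in Python with a thread pool standing in for the Gearman workers. The source names, `fetch` and `process` are hypothetical placeholders for the real web-service calls and processing step:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical names for one client's five external web services.
DATA_SOURCES = ["bank", "broker", "tax", "payroll", "fx"]


def fetch(client_id, source):
    # Placeholder for a slow web-service call; the "value" is dummy data.
    return {"client": client_id, "source": source, "value": len(source)}


def process(client_id, responses):
    # Placeholder for the processing step once all five answers are in.
    return client_id, sum(r["value"] for r in responses)


def refresh_client(pool, client_id):
    """One job: call all data sources in parallel, then combine the results."""
    futures = [pool.submit(fetch, client_id, s) for s in DATA_SOURCES]
    return process(client_id, [f.result() for f in futures])


with ThreadPoolExecutor(max_workers=20) as pool:
    totals = dict(refresh_client(pool, c) for c in range(3))
print(totals)
```

Each client job then waits roughly as long as its slowest web service rather than the sum of all five.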

But! That wasn’t enough

Job kicked off on login

Supervisord

j.mp/supervisord

Another Story!

CloudSplit

Near Real Time Cloud Analytics

Clients install logging agent locally

syslogd

Public API

Multiple Persistent Gearman Servers

Internal DB API

[Diagram: Agent (syslogd) → API (load balanced) → two persistent Gearman servers → three Workers → Internal API (load balanced) → CouchDB]

CouchDB Setup

Write vs Read

Writes

Multi-master setup

Replicated

Deals with writes only

Reads

Multi-master setup

Replicated from write cluster

Slaves handle website requests
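Below is a minimal Python sketch of the routing this split implies, assuming the application picks a node per request by HTTP method. The node URLs are hypothetical (5984 is CouchDB's default port), and `CouchRouter` is not part of any CouchDB client:

```python
from itertools import cycle


class CouchRouter:
    """Hypothetical router: writes go to the write cluster, reads to replicas."""

    def __init__(self, write_nodes, read_nodes):
        self._writers = cycle(write_nodes)
        self._readers = cycle(read_nodes)

    def node_for(self, method):
        # Safe methods are served by the read replicas, round robin.
        if method.upper() in ("GET", "HEAD"):
            return next(self._readers)
        # Anything that may change data (PUT/POST/DELETE) hits a writer.
        return next(self._writers)


router = CouchRouter(
    ["http://write-1:5984", "http://write-2:5984"],
    ["http://read-1:5984", "http://read-2:5984", "http://read-3:5984"],
)
print(router.node_for("PUT"), router.node_for("GET"))
```

Reads from replicas can briefly lag behind writes, which is the eventual consistency trade-off from earlier. Some CouchDB reads, such as POSTing a list of keys to a view, use POST. A real router would therefore need finer rules than the method alone.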

Heavy Map/Reduce usage for data

Questions?

@h

helgi@orchestra.io

Joind.in: http://joind.in/4326
