Post on 16-Apr-2017
Distribute the workload
Helgi Þormar Þorbjörnsson
PHP Barcelona, 29th of October 2011
Saturday, 29 October 11
Who am I?
Co-founded Orchestra.io
Developer at PEAR
From Iceland
@h on Twitter
Helgi
Why Distribute?
Budget
Efficiency
Perception
Efficiency
10 small servers > 1 big
Budget
Spend wisely
Commodity servers
Cloud Computing (EC2)
Perception
Defer intensive processes
Give instant feedback
Users keep on browsing
Ant Colonies
Teamwork
When faced with a problem, they solve it as one.
Architect for Distribution
Characteristics
Decoupling
Elasticity
High Availability
Concurrency
Decoupling
Application
DB API
Cache FE
Application
DB API
Cache FE
Cache
API
API
Elasticity
Cloud Computing
Load Balancing
HA Proxy
Nginx
My Favourite
Monitoring
When do I need more servers?
Needs to be around from the start!
Keep records
Spot trends
Different types
Hardware Performance
Software Performance
Availability
Resourcing
Applications
New Relic
CloudKick
ScoutApp
Nagios
Cacti
Circonus
Automation
Plug into your monitoring
Bringing together Monitoring and Elastic behaviour into one
beautiful whole!
Add some intelligence to add / remove servers as needed based
on current information.
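A minimal sketch of that kind of rule, assuming monitoring reports average CPU load and the current server count. The field names, thresholds and limits are all hypothetical, not taken from the talk or from any monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """A snapshot from monitoring (hypothetical fields)."""
    avg_cpu_percent: float
    server_count: int


def desired_server_count(metrics, scale_up_at=75.0, scale_down_at=25.0,
                         min_servers=2, max_servers=20):
    """Return how many servers we want, given the latest snapshot."""
    count = metrics.server_count
    if metrics.avg_cpu_percent > scale_up_at:
        count += 1      # busy: add one server
    elif metrics.avg_cpu_percent < scale_down_at:
        count -= 1      # idle: remove one server
    # Hard limits keep the automation from growing or shrinking without bound.
    return max(min_servers, min(max_servers, count))


print(desired_server_count(Metrics(avg_cpu_percent=90.0, server_count=4)))
print(desired_server_count(Metrics(avg_cpu_percent=10.0, server_count=2)))
```

Changing the count by one server per check, with a floor and a ceiling, is one simple way to keep the feedback loop predictable.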
Just make sure it doesn’t turn into...
Skynet!!
High Availability
Get a highly available and resilient setup by following a few
of these recommendations
Remember, even Google has outages
What to avoid
Local Sessions
Store sessions in DB / Memcache
Solution
Local Memory
Networked Memcache
Solution
Local Files
Local Uploads
Writing to /tmp
Store on S3 or a networked FS
Solution
Serve up static files from CDNs
Solution
Servers can vanish at any given time
Internal APIs
Application
S3 GFS FS
Internal Storage API
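One way to picture the diagram is a single interface the application codes against, with interchangeable backends behind it. This is a sketch only: the class and function names are hypothetical, and the in-memory class merely stands in for a remote backend such as S3 or GFS.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Storage(ABC):
    """The internal storage API the application talks to."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class LocalDiskStorage(Storage):
    """Plain filesystem backend (the 'FS' box)."""

    def __init__(self, root):
        self.root = root

    def put(self, key, data):
        with open(os.path.join(self.root, key), "wb") as fh:
            fh.write(data)

    def get(self, key):
        with open(os.path.join(self.root, key), "rb") as fh:
            return fh.read()


class InMemoryStorage(Storage):
    """Stand-in for a remote backend; keeps everything in a dict."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def save_upload(storage: Storage, name: str, data: bytes) -> None:
    """Application code: it never learns which backend is in use."""
    storage.put(name, data)


with tempfile.TemporaryDirectory() as root:
    for backend in (LocalDiskStorage(root), InMemoryStorage()):
        save_upload(backend, "avatar.png", b"fake image bytes")
        print(type(backend).__name__, len(backend.get("avatar.png")))
```

Because `save_upload` only depends on `Storage`, swapping the backend does not touch application code.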
Application
MySQL Mongo Cache
Internal DB API
SOA
Service Oriented Architecture
Sort of :-)
Eventually Consistent
CAP Theorem
Consistency
Availability
Partition Tolerance
Consistency
All nodes see the same data at the same time
Availability
Node failures do not prevent survivors from continuing to
operate
Partition Tolerance
The system continues to operate despite arbitrary message loss
Consistency
Availability
Partition Tolerance
Queue Systems
Good for
Image Processing
Distributed Logs
Data Mining
Mass Emails
Intensive transformation
Search
Common Tools
Gearman
Hadoop
ZeroMQ
RabbitMQ
And many others!
New York Times
4TB of TIFF files
Needed to get 11 million PDF versions
Used Hadoop and EC2
100 machines took 24 hours
Map/Reduce
Map
Master gets a problem to solve
Breaks into multiple sub-problems
Distributed to multiple workers
A worker can take the same steps
Answer passed back to Master
Reduce
Takes in answers from the map workers
Combines together to get an answer
There can be multiple reducers
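The Map and Reduce steps can be sketched with a word count. This is a single-machine illustration that uses a thread pool in place of real worker machines; the function names and the sample input are made up for the example, and it uses a single reducer.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_problem(lines, parts):
    """Master: break the input into sub-problems."""
    return [lines[i::parts] for i in range(parts)]


def map_worker(chunk):
    """Map: each worker counts the words in its own chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce: combine the partial answers into one answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, workers=3):
    chunks = split_problem(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_worker, chunks))
    return reduce_counts(partials)


lines = ["ants work together", "ants solve problems", "together as one"]
print(word_count(lines))
```

The map workers never need to see each other's chunks, which is what lets the same shape spread across many machines.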
Process petabytes of data in a few hours on a commodity server farm
CouchDB
CouchDB
Highly Concurrent
Schema free, document based
RESTful API
Map/Reduce Views
Easy Replication
Gearman
Your Client Code
Gearman Client API (C, PHP, Perl, MySQL UDF, ...)
Gearman Job Server (gearmand)
Gearman Worker API (C, PHP, Perl, Python, ...)
Your Worker Code
Your App Gearman
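The client / job server / worker split can be imitated in one process to show the shape of it. This sketch does not use Gearman or its API: a standard-library queue stands in for the job server, threads stand in for worker processes, and `resize_image` is a hypothetical task.

```python
import queue
import threading


def resize_image(name):
    """Hypothetical slow task; here it only builds a file name."""
    return "thumb_" + name


def worker(jobs, results):
    """Worker code: pull jobs until a None sentinel arrives."""
    while True:
        name = jobs.get()
        if name is None:
            break
        results.put(resize_image(name))


def run_jobs(names, worker_count=2):
    jobs = queue.Queue()      # stand-in for the job server
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for name in names:        # client code: submit and move on
        jobs.put(name)
    for _ in threads:         # one shutdown sentinel per worker
        jobs.put(None)
    for thread in threads:
        thread.join()
    return sorted(results.get() for _ in names)


print(run_jobs(["a.png", "b.png", "c.png"]))
```

With a real job server the client and the workers would live on different machines, and more workers could be added without changing the client.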
pear.php.net/net_gearman
A Story!
Financial Software
3000+ Clients
Each one has 5 external data sources
Each data source is a web service
Ran every 6 hours every day
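A quick back-of-the-envelope calculation shows the size of that workload. The client, source and interval figures come from the slides, treating "3000+" as a lower bound; the 2 seconds per web service call and the 10 workers are assumptions made only for illustration.

```python
clients = 3000              # "3000+" on the slide, so a lower bound
sources_per_client = 5
run_interval_hours = 6

calls_per_run = clients * sources_per_client
runs_per_day = 24 // run_interval_hours
calls_per_day = calls_per_run * runs_per_day

seconds_per_call = 2        # ASSUMPTION, not from the talk
workers = 10                # ASSUMPTION, not from the talk
sequential_hours = calls_per_run * seconds_per_call / 3600
parallel_minutes = calls_per_run * seconds_per_call / workers / 60

print(calls_per_run, runs_per_day, calls_per_day)
print(round(sequential_hours, 1), round(parallel_minutes, 1))
```

Under those assumed timings, one process working through the calls in sequence would not finish inside the 6 hour window, while a small pool of workers would.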
Cron
Gearman
Jobs 1, 2, 3, 4, 5
Web Services 1, 2, 3, 4, 5
Processing
But! That wasn’t enough
Job kicked off on login
Supervisord
Another Story!
CloudSplit
Near Real Time Cloud Analytics
Clients install logging agent locally
syslogd
Public API
Multiple Persistent Gearman Servers
Internal DB API
Agent syslogd
API
Gearman
Gearman
CouchDB
Worker
Worker
Worker
Internal API
Load Balanced
Load Balanced
Persistent
CouchDB Setup
Write vs Read
Writes
Multi Master setup
Replicated
Deals with writes only
Reads
Multi Master setup
Replicated from write cluster
Slaves handle website requests
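The write/read split could be wired up with a small router in front of the two clusters. This is a sketch under stated assumptions: the node names are hypothetical, requests are classified by HTTP method only, and round-robin is a choice made here rather than something the slides specify.

```python
import itertools


class ClusterRouter:
    """Send reads to the read cluster and everything else to the write cluster."""

    READ_METHODS = ("GET", "HEAD")

    def __init__(self, write_nodes, read_nodes):
        # cycle() gives simple round-robin over each cluster.
        self._writes = itertools.cycle(write_nodes)
        self._reads = itertools.cycle(read_nodes)

    def node_for(self, method):
        if method.upper() in self.READ_METHODS:
            return next(self._reads)
        return next(self._writes)


router = ClusterRouter(
    write_nodes=["write-1", "write-2"],         # hypothetical names
    read_nodes=["read-1", "read-2", "read-3"],  # hypothetical names
)
for method in ("GET", "GET", "PUT", "GET", "POST"):
    print(method, router.node_for(method))
```

Classifying by method is a simplification; a real setup would need to decide how to treat read queries that are sent as POST requests.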
Heavy Map/Reduce usage for data
Questions?
@h
helgi@orchestra.io
Joind.in: http://joind.in/4326