Scalable Drupal infrastructure

Presented 2009-05-30 by David Strauss Designing, Scoping, and Configuring Scalable Drupal Infrastructure

description

A guide to planning, deploying, and scaling big websites using Drupal. For more Four Kitchens presentations, please visit http://fourkitchens.com/presentations

Transcript of Scalable Drupal infrastructure

Page 1: Scalable Drupal infrastructure

Presented 2009-05-30 by David Strauss

Designing, Scoping, and Configuring Scalable Drupal Infrastructure

Page 2: Scalable Drupal infrastructure

Understanding Load Distribution

Page 3: Scalable Drupal infrastructure

Predicting peak traffic

Traffic over the day can be highly irregular. To plan for peak loads, design as if all traffic were as heavy as the peak hour of load in a typical month, and then plan for some growth.
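The rule above can be sketched as a small calculation. This is only an illustration: the function name, the sample hit counts, and the 1.5 growth factor are assumptions, not figures from the presentation.

```python
def peak_requests_per_second(hourly_hits, growth_factor=1.5):
    """Return a design request rate from a month of hourly hit counts.

    hourly_hits: hits recorded in each hour of a typical month.
    growth_factor: hypothetical headroom multiplier for expected growth.
    """
    if not hourly_hits:
        raise ValueError("need at least one hourly sample")
    # Design as if every hour were as busy as the busiest hour.
    peak_hour_hits = max(hourly_hits)
    peak_rate = peak_hour_hits / 3600  # hits per second in the peak hour
    return peak_rate * growth_factor


# Hypothetical hourly samples; a real month would have about 720 entries.
sample = [12_000, 18_000, 90_000, 360_000, 45_000]
print(peak_requests_per_second(sample))  # 150.0
```

Averaging over the whole day would hide the peak hour, which is why the sketch takes the maximum rather than the mean.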

Page 4: Scalable Drupal infrastructure

Analyzing hit distribution

[Figure: breakdown of hits by category. Labels: Anonymous, Authenticated, Dynamic Pages, Static Content, Human, Web Crawler, No Special Treatment, “Pay Wall” Bypass. Percentages shown: 3%, 10%, 40%, 50%, 30%, 100%, 70%, 20%, 7%.]

Page 5: Scalable Drupal infrastructure

Throughput vs. Delivery Methods

                                  Green         Yellow                 Red
                                  (Static)      (Dynamic, Cacheable)   (Dynamic)
Content Delivery Network          ●●●●●●●●●●    ✖                      ✖
Reverse Proxy Cache               ●●●●●●●       ●●●●●●●                ✖
Drupal + Page Cache + memcached   ●●● (1)       ●●●                    ✖
Drupal + Page Cache               ●●● (1)       ●●                     ✖
Drupal                            ●●● (1)       ●                      ●

More dots = more throughput (scale: 10 req/s to 1000 req/s).

(1) Delivered by Apache without Drupal.

(2) Some actually can do this.

Page 6: Scalable Drupal infrastructure

Objective

Deliver hits using the fastest, most scalable method available.

Page 7: Scalable Drupal infrastructure

Layering: Less Traffic at Each Step

[Diagram labels: Traffic; DNS Round Robin; CDN; Your Datacenter: Load Balancer, Reverse Proxy Cache, Application Server, Database]

Page 8: Scalable Drupal infrastructure

Offload from the master database

[Diagram labels: Application Server; Search; Memory Cache; Slave Database; Master Database]

Your master database is the single greatest limitation on scalability.

Page 9: Scalable Drupal infrastructure

Tools to use

‣ Apache Solr for search. (Acquia offers hosting of this now.)

‣ Squid or Varnish for reverse proxy caching.

‣ Any third-party service for CDN.

Page 10: Scalable Drupal infrastructure

Do the math

‣ All non-CDN traffic travels through your load balancers and reverse proxy caches. Even traffic passed through to application servers must run through the initial layers.

[Diagram labels: Traffic; Load Balancer; Reverse Proxy Cache; Application Server]

What hit rate is each layer getting? How many servers share the load?
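One way to work through those two questions is to walk the layers in order and track how much traffic survives each one. The sketch below is illustrative: the function name, the 75% cache hit rate, and the server counts are assumptions, not numbers from the slides.

```python
def layer_load(incoming_rps, layers):
    """Compute total and per-server load for each layer.

    layers: ordered list of (name, hit_rate, server_count) tuples, where
    hit_rate is the fraction of requests the layer answers itself.
    Returns a list of (name, total_rps, rps_per_server).
    """
    results = []
    rps = incoming_rps
    for name, hit_rate, servers in layers:
        # Every request reaching this layer costs it work, hit or miss.
        results.append((name, rps, rps / servers))
        # Only misses are passed down to the next layer.
        rps = rps * (1.0 - hit_rate)
    return results


# Hypothetical cluster handling 2000 non-CDN requests per second.
cluster = [
    ("Load Balancer", 0.0, 2),         # passes everything through
    ("Reverse Proxy Cache", 0.75, 2),  # assumed 75% hit rate
    ("Application Server", 0.0, 4),
]
for name, total, per_server in layer_load(2000, cluster):
    print(f"{name}: {total:.0f} req/s total, {per_server:.0f} req/s per server")
```

Note that the reverse proxy caches see the full 2000 req/s even though they pass only a quarter of it onward, which is the point of the bullet above.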

Page 11: Scalable Drupal infrastructure

Get a management/monitoring box

(maybe two or three, and have them specialized or redundant)

[Diagram labels: Management; Load Balancer; Reverse Proxy Cache; Application Server; Database]

Page 12: Scalable Drupal infrastructure

Planning + Scoping

Page 13: Scalable Drupal infrastructure

Infrastructure goals

‣ Redundancy

‣ Scalability

‣ Performance

‣ Manageability

Page 14: Scalable Drupal infrastructure

Redundancy

‣ When one server fails, the website should be able to recover without taking too long.

‣ This requires N+1, putting a floor on system requirements.

‣ How long can your site be down?

‣ Automatic versus manual failover

Page 15: Scalable Drupal infrastructure

Performance

‣ Find the “sweet spot” for hardware. This is the best price/performance point.

‣ Avoid overspending on any type of component

‣ Yet, avoid creating bottlenecks

‣ Swapping memory to disk is very dangerous

Page 16: Scalable Drupal infrastructure

Relative importance

                      Processors/Cores   Memory   Disk Speed
Reverse Proxy Cache   ●                  ●●●      ●●
Web Server            ●●●●●              ●●       ●
Database Server       ●●                 ●●●●     ●●●●
Monitoring            ●                  ●        ●

Page 17: Scalable Drupal infrastructure

Reverse proxy caches

‣ Squid makes poor use of multiple cores. Focus on getting the highest per-core performance. The best per-core performance is often on dual-core processors with high clock rates and lots of cache.

‣ Varnish is much more multithreaded.

‣ 4-8 GB memory, total

‣ Expect 1000 requests per second, per Squid

‣ 64-bit operating system if more than 2 GB RAM

Page 18: Scalable Drupal infrastructure

Web servers

‣ Apache 2.2 + mod_php + memcached

‣ Many processors + many cores is best

‣ 25 Apache threads per core

‣ 50 MB memory per thread, system-wide

‣ 1 GB memory for system

‣ 1 GB memory for memcached

‣ Configure MaxClients in Apache to maximum system-wide thread count

‣ Expect 1 request per thread, per second
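The web server figures above can be combined into a rough sizing calculation. This is a sketch under one assumption not stated on the slide: that the usable thread count is whichever is lower, the per-core limit or the memory limit. The function name and the example hardware are hypothetical.

```python
def web_server_capacity(cores, ram_mb, threads_per_core=25, mb_per_thread=50,
                        system_mb=1024, memcached_mb=1024):
    """Estimate MaxClients and throughput for one Apache + mod_php box."""
    # CPU-bound limit: 25 Apache threads per core.
    cpu_limit = cores * threads_per_core
    # Memory-bound limit: what is left after the system and memcached,
    # divided by 50 MB per thread.
    mem_limit = (ram_mb - system_mb - memcached_mb) // mb_per_thread
    # Assumption: take the stricter limit so the box never swaps.
    max_clients = max(0, min(cpu_limit, mem_limit))
    # Expect 1 request per thread, per second.
    return {"max_clients": max_clients, "expected_rps": max_clients}


print(web_server_capacity(cores=8, ram_mb=16384))
# {'max_clients': 200, 'expected_rps': 200}
print(web_server_capacity(cores=8, ram_mb=8192))
# {'max_clients': 122, 'expected_rps': 122}
```

In the second example the box is memory-bound: eight cores could run 200 threads, but 8 GB leaves room for only 122, and going past that risks swapping.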

Page 19: Scalable Drupal infrastructure

Database servers

‣ MySQL 5.0 cannot use more than eight cores effectively but gets good gains from at least quad-core processors.

‣ Plan on each Apache thread needing one connection, and add another 50.

‣ Each MySQL connection needs around 6 MB.

‣ MySQL with InnoDB needs a buffer pool large enough to cache all indexes. Start by giving the pool most remaining database server memory and working from there.

‣ 64-bit operating system if more than 2 GB RAM
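The connection and memory guidance above can be turned into a starting-point calculation. This is a sketch only: the function name is hypothetical, and reserving 1 GB for the operating system on the database server is an assumption borrowed from the web server slide.

```python
def database_memory_plan(total_apache_threads, ram_mb, extra_connections=50,
                         mb_per_connection=6, system_mb=1024):
    """Suggest a connection limit and a starting InnoDB buffer pool size."""
    # One connection per Apache thread across the cluster, plus another 50.
    max_connections = total_apache_threads + extra_connections
    # Each MySQL connection needs around 6 MB.
    connection_mb = max_connections * mb_per_connection
    # Start by giving the buffer pool most of the remaining memory.
    buffer_pool_mb = ram_mb - system_mb - connection_mb
    if buffer_pool_mb <= 0:
        raise ValueError("not enough memory for this many connections")
    return {
        "max_connections": max_connections,
        "connection_mb": connection_mb,
        "buffer_pool_mb": buffer_pool_mb,
    }


# Hypothetical: two web servers with 200 threads each, 16 GB database box.
print(database_memory_plan(total_apache_threads=400, ram_mb=16384))
# {'max_connections': 450, 'connection_mb': 2700, 'buffer_pool_mb': 12660}
```

The result is a first guess to tune from, in line with the slide's advice to start large and work from there.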

Page 20: Scalable Drupal infrastructure

Monitoring server

‣ Very low hardware requirements

‣ Choose hardware that is inexpensive but essentially similar to the rest of the cluster to reduce management overhead

‣ Reliability and fast failover are typically low priorities for monitoring services

Page 21: Scalable Drupal infrastructure

Assembling the numbers

‣ Start with an architecture providing redundancy.

‣ Two servers, each running the whole stack

‣ Increase the number of proxy caches based on anonymous and search engine traffic.

‣ Increase the number of web servers based on authenticated traffic.

‣ Databases are harder to predict, but large sites should run them on at least two separate boxes with replication.
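A minimal sketch of the counting step, assuming the N+1 rule from the redundancy slide and the two-server starting point above. The traffic figures and the 200 req/s per web server are hypothetical inputs; 1000 req/s per Squid is the estimate given earlier.

```python
import math


def servers_needed(peak_rps, rps_per_server, spares=1, minimum=2):
    """Servers required for a layer: enough for peak load, plus N+1 spare."""
    needed_for_load = math.ceil(peak_rps / rps_per_server)
    # Never drop below the redundant two-server baseline.
    return max(minimum, needed_for_load + spares)


# Hypothetical peak traffic split.
anonymous_and_crawler_rps = 2500  # served mostly by the proxy caches
authenticated_rps = 450           # served by the web servers

proxy_caches = servers_needed(anonymous_and_crawler_rps, rps_per_server=1000)
web_servers = servers_needed(authenticated_rps, rps_per_server=200)
print(proxy_caches, web_servers)  # 4 4
```

Database servers are left out on purpose, since the slide notes they are harder to predict from traffic alone.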

Page 22: Scalable Drupal infrastructure

Pressflow

Make Drupal sites scale by upgrading core with a compatible, powerful replacement.

Page 23: Scalable Drupal infrastructure

Common large-site issues

‣ Drupal core requires patching to effectively support the advanced scalability techniques discussed here.

‣ Patches often conflict and have to be reapplied with each Drupal upgrade.

‣ The original patches are often unmaintained.

‣ Sites stagnate, running old, insecure versions of Drupal core because updating is too difficult.

Page 24: Scalable Drupal infrastructure

What is Pressflow?

‣ Pressflow is a derivative of Drupal core that integrates the most popular performance and scalability enhancements.

‣ Pressflow is completely compatible with existing Drupal 5 and 6 modules, both standard and custom.

‣ Pressflow installs as a drop-in replacement for standard Drupal.

‣ Pressflow is free as long as the matching version of Drupal is also supported by the community.

Page 25: Scalable Drupal infrastructure

What are the enhancements?

‣ Reverse proxy support

‣ Database replication support

‣ Lower database and session management load

‣ More efficient queries

‣ Testing and optimization by Four Kitchens with standard high-performance software and hardware configuration

‣ Industry-leading scalability support by Four Kitchens and Tag1 Consulting

Page 26: Scalable Drupal infrastructure

Four Kitchens + Tag1

‣ Provide the development, support, scalability, and performance services behind Pressflow

‣ Comprise most members of the Drupal.org infrastructure team

‣ Have the most experience scaling Drupal sites of all sizes and all types

Page 27: Scalable Drupal infrastructure

Ready to scale?

‣ Learn more about Pressflow:

‣ Pick up pamphlets in the lobby

‣ Request Pressflow releases at fourkitchens.com

‣ Get the help you need to make it happen:

‣ Talk to me (David) or Todd here at DrupalCamp

‣ Email [email protected]

Page 28: Scalable Drupal infrastructure

Managing the Cluster

Page 29: Scalable Drupal infrastructure

The problem

[Diagram: Software and Configuration distributed to five Application Servers]

Objectives:

‣ Fast, atomic deployment and rollback

‣ Minimize single points of failure and contention

‣ Restart services

‣ Integrate with version control systems

Page 30: Scalable Drupal infrastructure

Manual updates and deployment

[Diagram: five Application Servers, each updated by its own Human]

Why not: slow deployment, non-atomic/difficult rollbacks

Page 31: Scalable Drupal infrastructure

Shared storage

[Diagram: five Application Servers sharing NFS]

Why not: single point of contention and failure

Page 32: Scalable Drupal infrastructure

rsync

[Diagram: five Application Servers synchronized with rsync]

Why not: non-atomic, does not manage services

Page 33: Scalable Drupal infrastructure

Capistrano

[Diagram: five Application Servers deployed with Capistrano]

Capistrano provides near-atomic deployment, service restarts, automated rollback, test automation, and version control integration (tagged releases).
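To illustrate what "near-atomic" deployment and rollback can look like on one server, here is a sketch of the symlink-swap idea behind Capistrano's conventional layout (a releases directory plus a current symlink). It is not Capistrano's code; the function name and release names are hypothetical, and it assumes a POSIX filesystem where renaming over a symlink is atomic.

```python
import os
import tempfile


def activate_release(deploy_root, release_name):
    """Point <deploy_root>/current at releases/<release_name> in one step."""
    target = os.path.join("releases", release_name)
    if not os.path.isdir(os.path.join(deploy_root, target)):
        raise FileNotFoundError(target)
    current = os.path.join(deploy_root, "current")
    tmp_link = os.path.join(deploy_root, "current.tmp")
    # Clear any leftover temporary link from an interrupted deploy.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Build the new link beside the old one, then rename it into place.
    # The rename replaces the old link in a single filesystem operation,
    # so the web server sees either the old release or the new one.
    os.symlink(target, tmp_link)
    os.replace(tmp_link, current)


# Demo in a throwaway directory: deploy r1, deploy r2, then roll back.
with tempfile.TemporaryDirectory() as root:
    for name in ("r1", "r2"):
        os.makedirs(os.path.join(root, "releases", name))
    activate_release(root, "r1")
    activate_release(root, "r2")
    activate_release(root, "r1")  # rollback is the same operation
    print(os.readlink(os.path.join(root, "current")))
```

Rollback is cheap because the previous release stays on disk; only the link moves. Service restarts and running the swap across every server are the parts a deployment tool adds on top.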

Page 34: Scalable Drupal infrastructure

Multistage deployment

[Diagram: Development, Integration, Staging, and five Application Servers, each stage deployed with Capistrano]

Deployments can be staged.

cap staging deploy

cap production deploy

Page 35: Scalable Drupal infrastructure

But your application isn’t the only thing to manage.

Page 36: Scalable Drupal infrastructure

Beneath the application

[Diagram: cluster-level configuration spanning five Application Servers, the Reverse Proxy Cache, and the Database]

cfengine and bcfg2 are popular cluster-level system configuration tools.

Cluster management applies to package management, updates, and software configuration.

Page 37: Scalable Drupal infrastructure

System configuration management

‣ Deploys and updates packages, cluster-wide or selectively.

‣ Manages arbitrary text configuration files

‣ Analyzes inconsistent configurations (and converges them)

‣ Manages device classes (app. servers, database servers, etc.)

‣ Allows confident configuration testing on a staging server.
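The "analyze and converge" idea in the list above can be sketched as a comparison between a desired state and what a server actually has. This is not cfengine or bcfg2 syntax; the function name, package names, and versions are hypothetical.

```python
def plan_convergence(desired, actual):
    """List the actions needed to bring one server to its desired state.

    desired, actual: dicts mapping package name to version string.
    Returns (action, package, version) tuples in package-name order.
    """
    actions = []
    for name, version in sorted(desired.items()):
        if name not in actual:
            actions.append(("install", name, version))
        elif actual[name] != version:
            actions.append(("update", name, version))
    return actions


# Hypothetical desired state, keyed by device class.
DESIRED_BY_CLASS = {
    "app": {"httpd": "2.2", "php": "5.2", "memcached": "1.2"},
    "db": {"mysql-server": "5.0"},
}

# A hypothetical application server that has drifted.
drifted_app_server = {"httpd": "2.0", "memcached": "1.2"}
print(plan_convergence(DESIRED_BY_CLASS["app"], drifted_app_server))
# [('update', 'httpd', '2.2'), ('install', 'php', '5.2')]
```

Running the same comparison against a staging server first is one way to test a configuration change before it reaches production.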

Page 38: Scalable Drupal infrastructure

All on the management box

[Diagram: the Management box hosts Development, Integration, Staging, Deployment Tools, and Monitoring]

Page 39: Scalable Drupal infrastructure

Monitoring

Page 40: Scalable Drupal infrastructure

Types of monitoring

Failure

Capacity/Load

Analyzing Downtime

Viewing Failover

Troubleshooting

Notification

Analyzing Trends

Predicting Load

Checking Results of Configuration and Software Changes

Page 41: Scalable Drupal infrastructure

Everyone needs both.

Page 42: Scalable Drupal infrastructure

What to use

Failure/Uptime   Capacity/Load
Nagios           Cacti
Hyperic          Munin

Page 43: Scalable Drupal infrastructure

Nagios

‣ Highly recommended.

‣ Used by Four Kitchens and Tag1 Consulting for client work, Drupal.org, Wikipedia, etc.

‣ Easy to install on CentOS 5 using EPEL packages.

‣ Easy to install nrpe agents to monitor diverse services.

‣ Can notify administrators on failure.

‣ We use this on Drupal.org

Page 44: Scalable Drupal infrastructure

Hyperic

‣ I haven’t used this much, but it’s fairly popular.

‣ More difficult to set up than Nagios.

Page 45: Scalable Drupal infrastructure

Cacti

‣ Highly annoying to set up.

‣ One instance generally collects all statistics. (No “agents” on the systems being monitored.)

‣ Provides flexible graphs that can be customized on demand.

‣ Optimized database for perpetual statistics collection.

‣ We use this on Drupal.org and for client sites.

Page 46: Scalable Drupal infrastructure

Munin

‣ Fairly easy to set up.

‣ One instance generally collects all statistics. (No “agents” on the systems being monitored.)

‣ Provides static graphs that cannot be customized.

Page 47: Scalable Drupal infrastructure

Cluster Problems

Page 48: Scalable Drupal infrastructure

Cache/session coherency

‣ Systems that run properly on single boxes may lose coherency when run on a networked cluster.

‣ Some caches, like APC’s object cache, have no ability to handle network-level coherency. (APC’s opcode cache is safe to use on clusters.)

‣ memcached, if misconfigured, can hash values inconsistently across the cluster, resulting in different servers using different memcached instances for the same keys.

‣ Session coherency can be helped with load balancer affinity.
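The memcached point can be illustrated with a simplified client that picks a server by hashing the key and taking the result modulo the server count. This is not the exact algorithm of any particular memcached client, and the host names are hypothetical; it only shows why every web server needs an identical server list in an identical order.

```python
import hashlib


def pick_server(key, servers):
    """Choose a memcached server for a key using a simple modulo hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


# Two web servers configured with the same instances in a different order.
web1_servers = ["cache1:11211", "cache2:11211"]
web2_servers = ["cache2:11211", "cache1:11211"]

for key in ("session:abc", "page:/node/1", "user:42"):
    a = pick_server(key, web1_servers)
    b = pick_server(key, web2_servers)
    print(f"{key}: web1 uses {a}, web2 uses {b}")
```

With the lists reversed, both web servers compute the same index but read it from different lists, so they store and look up the same key on different instances. A value written by one is a cache miss, or a stale copy, for the other.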

Page 49: Scalable Drupal infrastructure

Cache regeneration races

‣ Downside to network cache coherency: synched expiration

‣ Hard to solve

[Diagram: timeline from Old Cached Item to New Cached Item; in the gap after Expiration, all servers are regenerating the item.]

Page 50: Scalable Drupal infrastructure

Broken replication

‣ MySQL slave servers get out of synch, fall further behind

‣ No means of automated recovery

‣ Only solvable with good monitoring and recovery procedures

‣ Can automate removal from use, but requires cluster management tools
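As a sketch of the monitoring side, the check below decides which slaves are safe to read from, given the fields reported by MySQL's SHOW SLAVE STATUS (Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master). How the status is collected and how a slave is actually removed from use are left out; the function names, host names, and the 30-second lag threshold are assumptions.

```python
def slave_is_usable(status, max_lag_seconds=30):
    """Return True if a slave is replicating and not too far behind.

    status: dict of SHOW SLAVE STATUS fields for one server.
    """
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    # MySQL reports NULL (None here) when the lag cannot be determined.
    lag = status.get("Seconds_Behind_Master")
    return lag is not None and lag <= max_lag_seconds


def usable_slaves(statuses, max_lag_seconds=30):
    """Filter a {host: status} mapping down to the hosts safe to read from."""
    return [host for host, status in statuses.items()
            if slave_is_usable(status, max_lag_seconds)]


# Hypothetical snapshot of three slaves.
snapshot = {
    "db-slave1": {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
                  "Seconds_Behind_Master": 2},
    "db-slave2": {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes",
                  "Seconds_Behind_Master": 900},
    "db-slave3": {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "No",
                  "Seconds_Behind_Master": None},
}
print(usable_slaves(snapshot))  # ['db-slave1']
```

A check like this only takes a broken slave out of rotation. Repairing it still needs the manual recovery procedures noted above.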

Page 51: Scalable Drupal infrastructure

Server failure

‣ Load balancers can remove broken or overloaded reverse proxy caches.

‣ Reverse proxy caches like Varnish can automatically use only functional application servers.

‣ Cluster management tools like heartbeat2 can manage service IPs on MySQL servers to automate failover.

‣ Conclusion: Each layer intelligently monitors and uses the servers beneath it.

Page 52: Scalable Drupal infrastructure

All content in this presentation, except where noted otherwise, is Creative Commons Attribution-ShareAlike 3.0 licensed and copyright 2009 Four Kitchen Studios, LLC.