Drupal Backend Performance and Scalability

45
Drupal Backend Performance and Scalability Ashok and Christefano August 7, 2010

Transcript of Drupal Backend Performance and Scalability

Page 1: Drupal Backend Performance and Scalability

Drupal Backend Performance and Scalability

Ashok and ChristefanoAugust 7, 2010

Page 2: Drupal Backend Performance and Scalability

About me (BTMash)

Systems programmer. Work at California Institute of the Arts. Working with Drupal since 2006 (4.6.x) Primarily help with patches and

upgrades for contrib modules on d.o. Abuse Favorites Flag Content Fivestar Nice Menus Simplenews Userpoints Userpoints_badges User_badge And more!

Strong interest in server optimization

Page 3: Drupal Backend Performance and Scalability

About me (Christefano)

LADrupal Manager Camp Organizer Drupal Core Contributor Person of the Year D.O and G.D.O webmaster teams.

Page 4: Drupal Backend Performance and Scalability

Resources

Khalid Baheyeldin http://2bits.com

Drupal.org infrastructure group

Peter Zaitsev http://www.mysqlperformanceblog.com/about/

Page 5: Drupal Backend Performance and Scalability

Goals• Define your objectives and goals first

• Do you want faster response to the end user per page?

• Do you want to handle more page views?

• Do you want to minimize downtime?

• Each is different, though they may be related

• Sometimes, there are some ‘low hanging fruit’ that are easy to see, that provide noticeable improvement with little effort.

• Gets hard to achieve more performance (more efforts, low return)• More infrastructure (splitting servers, multiple web/db servers, etc)

• Patching Drupal

• Revisions to Architecture (CCK, Views)

Page 6: Drupal Backend Performance and Scalability

Diagnosis• Proper diagnosis is essential before proposing and implementing a

solution or you are running blind.

• Must be based on proper data.

• Analysis of data collected.

• Can lead towards a few possible paths of optimization.

Page 7: Drupal Backend Performance and Scalability

Points of optimization1. Introduction

2. Tools to measure and diagnose issues

3. Speed Optimizations

Page 8: Drupal Backend Performance and Scalability

On to the Server…• Much more complex

• Measurement and monitoring tools

• Hardware

• The LAMP Stack• Linux, Apache, MySQL, PHP

• Drupal• Database queries

• Modules

• Caching

Page 9: Drupal Backend Performance and Scalability

Validation• Avoid the ‘wild goose chase’

• Validate the results on a test server

• Replicate the site on a development server (or local machine with the current data)• Backup and Migrate module help with that

• Recreate the site

• Gather a time difference ratio between the test and production server (production server is likely to be much faster)• Give a rough idea on any improvements made.

• Measure again and see if relative times remain the same.

Page 10: Drupal Backend Performance and Scalability

The LAMP Stack• Currently most commonly used stack for hosting Drupal and

similar applications• Linux

• Apache

• MySQL

• PHP

• Most of this part of the presentation applies to *BSD as well.

Page 11: Drupal Backend Performance and Scalability

Hardware• Physical server matters

• Dedicated

• VPS

• In the Cloud (Yes, it does work!)

• Multiple cores are the norm• 8 > 4 > 2 > 1

• Lots of RAM (caching the file system and database as much as possible)

• Multiple disks

• Not really applicable to shared hosting

Page 12: Drupal Backend Performance and Scalability

Multiple Servers• One database server, multiple web servers

• Can use DNS round robin for load balancing

• Or proper load balancers

• Can use a reverse proxy (much like Drupal.org)

• Do it only if you have the budget• Complexity is expensive (running cost, maintenance time)

• Tuning a system can avoid (or delay) a split

Page 13: Drupal Backend Performance and Scalability

Testing tools• ab/ab2 (Apache Benchmark)

• Ab –c 50 –n 10000

• Do 50 concurrent requests for up to 10000 requests

• Average response time per second

• How many requests handled per second

• Can test with authenticated sessions

• Siege• http://www.joedog.org/index/siege-home

• Similar in functionality to ab

• Can test POST functionality

• Cannot natively use on windows

• JMeter• http://jakarta.apache.org/jmeter/

• Desktop application written in java (can natively use on windows)

Page 14: Drupal Backend Performance and Scalability

Hardware Monitoring tools• Top

• Classic Unix/Linux program

• Real time monitoring (What is the system doing right now??)

• Load average

• CPU utilization

• Memory usage

• List of processes, sorted, with CPU memory

• Can change order of sorting, time interval, grouping, along with many other things.

• Htop• http://htop.sourceforge.net/

• Similar to top but for multiple cores

• Faster

• Colours, styles

Page 15: Drupal Backend Performance and Scalability

Hardware Monitoring tools• atop

• http://www.atcomputing.nl/Tools/atop/

• Different format and information

• Shows network statistics

• Runs a collection daemon in the background

• Vmstat• Report memory statistics

• Can display in increments (vmstat 5 means it will show snapshots every 5 seconds)

• netstat• Shows active network connections

• netstat –anp

• netstat –anp | grep EST

Page 16: Drupal Backend Performance and Scalability

Graphical Monitoring tools• Cacti

• http://www.cacti.net/

• Available as a package on Ubuntu, Debian (any other *nix / *bsd?)

• Easy to understand graphs

• Displays history over day, week, month, year

• Graphs available to display statistics for CPU, memory, network, Apache, MySQL

• Many others written by others available online (can find one for memcached, as an example)

• Munin• http://munin.projects.linpro.no/

• Very similar to Cacti

• Can also create your own monitoring scripts

• Nagios• Haven’t used it yet

• Supposed to be very powerful (alerts by email, sms, etc)

• Drupal module which integrates with tool

Page 17: Drupal Backend Performance and Scalability

Linux / BSD• Use a proven stable distribution (Debian Stable, Ubuntu

LTS, RHEL, CentOS)

• Use recent versions

• Use whatever your distribution your staff has expertise with (most important!)

• Try to avoid bloat• Don’t install PostGreSQL if you are only using MySQL, no desktop

server, java, etc

• Balance compiling own programs versus using packages from distro

• Compiling provides full control (and may let you get more out of) on the program.• Can be a pain to upgrade.

• Using packages only let you use whatever the latest version may be (APC was not available for RHEL until middle of last year)• Upgrades are relatively painless.

Page 18: Drupal Backend Performance and Scalability

Apache• Most popular, supported, and feature rich

• Fairly stable

• Can also be enabled with too many unnecessary modules

• Disable unnecessary modules (mod_proxy, cgi_module, etc may not be necessary on server if not hosting sites which take advantage of it)

• Smaller process = more users can access site

• ‘apachectl –M’ displays all modules currently enabled on site.

• Monitoring Tool – apachetop• http://www.webta.org/projects/apachetop/

• Reads and analyses Apache’s access log

• Show all / recent hits• Requests / s, KB / s, KB / request

• Good to detect crawlers

• Run using ‘apachetop –f /path/to/site/access.log’

Page 19: Drupal Backend Performance and Scalability

Apache Optimizations• MaxClients (prevent swapping / thrashing)

• Too low – cannot server enough clients

• Too high – you run out of memory, and you start swapping (server dies and cannot server ANY clients)

• MaxRequestsPerChild• Tune it to terminate the process faster and free up memory.

• KeepAlive• Keep it on.

• mod_gzip/deflate• Compress css, js, etc

Page 20: Drupal Backend Performance and Scalability

Alternatives• lighttpd

• http://www.lighttpd.net/

• Popular with Ruby on Rails

• Lighter than Apache (can go as low as 1MB)

• Known to have leaks

• Nginx• http://nginx.net/

• More stable than lighty

• Easier to setup than lighty

• Drupalcampla is running on it…

• Both run PHP as only as FastCGI

• Separate processes

Page 21: Drupal Backend Performance and Scalability

Alternatives (cont’d)• Varnish

• HTTP accelerator

• Set it up as a reverse proxy to send the call to apache if it cannot (or should not) serve something itself

• Serve anonymous page requests and static files

• Not particularly difficult to set up.

• Does require some level of tuning.

• Used on Drupal.org

• Will be used on ladrupal.org

• Squid• Similar to Varnish

• Precursor to Varnish (so go with the above)

Page 22: Drupal Backend Performance and Scalability

MySQL• Most popular database for Drupal

• Not necessarily the best database from technology standpoint (default ending lacks ACID, transactions, concurrency) but still does a good job.

• Easy to set up

• Various pluggable engines (InnoDB, Archive, etc)

Page 23: Drupal Backend Performance and Scalability

MySQL Monitoring• mtop / mytop

• Like top, but for MySQL

• Real time monitoring (no history)

• Shows slow queries and locks

• If you have neither –• SHOW FULL PROCESS LIST – mysqladmin processlist

• Mysqlreport• http://hackmysql.com/mysqlreport

• Reports on server – no recommendations (documentation on site explains a lot, however)

• MySQL DB Tuner• http://wiki.mysqltuner.com/MySQLTuner

• Another one at http://www.day32.com/MySQL/

• Provides fairly decent recommendations (and warnings) on MySQL settings

Page 24: Drupal Backend Performance and Scalability

MySQL Monitoring• Slow Query Log

• Can be enabled in your my.cnf file

• Lists queries more than N seconds

• Can also list queries which do not have indexes (if you choose)

• Very useful to identify potential bottlenecks• Helped me identify that the COUNT(*) query used through the search module

was cause for site search crawl

• Removed count with a base query which cut the search query and render to less than half the time

• Best way to interpret all of it• mysql_slow_log_parser_script

(http://www.mysqlperformanceblog.com/files/utils/mysql_slow_log_parser)

Page 25: Drupal Backend Performance and Scalability

MySQL Engines• MyISAM

• Fast reads

• Less overhead

• Poor concurrency (table level locking)

• InnoDB• Transactional

• Slower in some cases (SELECT COUNT(…))

• Lots of settings to analyze and change

• Better concurrency (good for tables that are constantly hit, such as sessions, watchdog, etc)

• New Engines in development• Falcon (promising ACID)

• SolidDB

• Concerns over the many new variants of MySQL• MariaDB, Percona (forked InnoDB), Google Patchsets, Mysql 5.5

Page 26: Drupal Backend Performance and Scalability

MySQL Tuning• Query Cache, Key Buffer

• VERY Important!

• Also important• Table Cache (caches your tables)

• sort_buffer_size, read_buffer_size

• tmp_table_size (required to be the same or greater than query cache)

• InnoDB• Innodb_buffer_pool_size (most important parameter)

• Set this value to be 50 – 80% of total memory you wish to allocate towards MySQL

• Loads of settings!

• http://www.mysqlperformanceblog.com/ is a great resource to learn lots about tuning MySQL

Page 27: Drupal Backend Performance and Scalability

MySQL Replication• Used on Drupal.org

• INSERT/DELETE/UPDATE queries go to master

• SELECT queries go to slave(s)

• Provide noticeable improvement• Usage on http://www.zimmertwins.com dramatically sped up the site

• D.O has also seen noticeable improvements

• Patch for 6.x seen at http://drupal.org/node/147160

• Beware of complexity (connection between master/slave goes wrong, can bring down site)

• Did extensive tuning of Zimmer Twins DB server• Noticeable improvement despite lack of querying slave server

• Removed need for slave server.

Page 28: Drupal Backend Performance and Scalability

PHP• Use a recent, stable release

• Drupal 7 will require 5.2.x, as do a few current 6.x contributed modules

• Install an Op-code cacher / accelerator• eAccelerator

• APC

• Xcache

• Zend optimizer (commercial)

• Very useful in bringing down memory usage for a site

Page 29: Drupal Backend Performance and Scalability

Op-code Caching• Benefits

• Lowered memory usage

• Significant decrease in CPU utilization

• Can be biggest impact on a site

• Usage on http://www.zimmertwins.com lowered memory usage per process from 20 MB down to less than 4 MB

• Usage on http://calarts.edu lowered memory usage per process down from 35 MB down to 7 MB

• Cached page usage was even lower (2 MB or less)

• 2bits.com has written comparisons between the 3 cachers at http://tinyurl.com/5uo874

• Drawbacks• May crash

• May require restarts after updating code

• Using the capistrano script by the TxT folk (who use xcache) may be useful.

Page 30: Drupal Backend Performance and Scalability

Op-code Caching• Op-code caching will not work in all circumstances

• Network connections

• Sorting arrays

• Database queries

• All of the above

• Some culprit modules• Tagadelic

• Node access modules

• Admin menu

• Forum

• Tracker

Page 31: Drupal Backend Performance and Scalability

Running PHP• mod_php

• Standard php module used by Apache

• No state retained between requests

• Few issues

• Well tested and supported

• May not need to use anything else

• Can be as low as 10 – 12 MB

• Can easily be over 100 MB (depending on drupal modules installed, apache modules enabled on server, etc)

• mod_cgi• Old way of running php

• Don’t use it now (inefficient, each request forks process)

• Fast CGI• Much faster than cgi

• Can be used with nginx, lighttpd, apache

• Apache + fcgi tested to be more stable with lower memory usage than mod_php

Page 32: Drupal Backend Performance and Scalability

Drupal• Mainly database intensive

• Can be a CPU hog

• Can be memory intensive

• Some modules are known to be slow

• Site may not be affected by bottleneck.

• Maintenance tips• Disable unnecessary modules

• Make sure cron runs regularly

Page 33: Drupal Backend Performance and Scalability

Drupal Tools• Devel

• http://drupal.org/project/devel

• Total page execution

• Query execution time

• Query log

• Memory utilization

• Can be combined with stress testing

• Log pages which use most queries

• Log pages which take most time to generate

• Trace

• http://drupal.org/project/trace

• Use for debugging

• Traces output, hook invocations, warnings

• Can filter by query type

Page 34: Drupal Backend Performance and Scalability

Module calls over network• Does your module do interesting things over the network?

• Email users (og)

• Call web widgets / APIs (digg, youtube, flickr)

• Cache data as much as possible (from web widgets/apis)

• Use helper modules to aid with reducing bog• job_queue module

• Queue Mail module

• SMTP mail module

• Weather

Page 35: Drupal Backend Performance and Scalability

Drupal page caching• Page Caching

• Primarily for anonymous users

• Doesn’t necessarily affect authenticated users

• May expire too soon

• Set minimum cache expiry time

• Aggressive cache is not compatible with all modules (statistics from core) but can provide better performance.

• Other types of caching are not necessarily turned off• Filter

• Menu

• Blocks (6.x)

• Variables

• Forms

Page 36: Drupal Backend Performance and Scalability

Useful Drupal caching modules• Boost

• http://drupal.org/project/boost

• Creates HTML for pages and stores it in files

• Requires changes to .htaccess files and symlinks (admin section of module now shows what to write into htaccess file)

• Usable on shared hosts as well as VPS and dedicated servers

• Enhances ability to handle traffic spikes (from anonymous users)

• Can also use module to display site while in maintenance mode (only uncachedpages and authenticated users will get maintenance mode page)

• Development Seed has case study on using boost for Anti Poverty event site that notes performance gain by nearly 1000 times (http://tinyurl.com/l7ktfy)

• Cache Router

• FS Fastpath

• Memcached

• Authenticated User Cache

Page 37: Drupal Backend Performance and Scalability

Pluggable caching• Use $conf variable in settings.php

• $conf[‘cache_inc’] = ‘./path/to/caching_mechanism.inc’;

• Allows you to have a custom caching module

• Can even use this to completely disable caching (DO NOT USE ON A PRODUCTION SITE)

Page 38: Drupal Backend Performance and Scalability

Memcached• Distributed object caching in memory

• Written by danga for livejournal

• Lives in memory

• Can span multiple servers (

• Drupal install uses pluggable caching mechanism

• Should be seamless for Drupal 6.x• Drupal 5.x requires a patch + schema changes

• Can lower db queries in ½

• Has own requirements (apache needs to restarted, more instances = more complexity)

Page 39: Drupal Backend Performance and Scalability

Changing the search mechanism• Drupal Core Search is good…?

• But can quickly get bad with more and more content

• DB Tuning can only go so far

• With nearly 1M nodes, search on zimmertwins.ca would take over 30 seconds.

• Try another engine!

• ApacheSOLR• Very fast

• Pain to configure

• Requires server on which Java can be installed

• Available as a drupal module

• Acquia offers service to use their setup (wholly takes load off your server)

• LuceneAPI• Not as fast as ApacheSolr, but easy to install and can work with that before

moving on to Solr

• Google CSE module might also be a good fit

Page 40: Drupal Backend Performance and Scalability

Other options• Using an optimized distribution

• Pressflow• http://fourkitchens.com/pressflow-makes-drupal-scale

• Supports database replication

• Only supports MySQL

• Cleanly supports reverse proxies such as Varnish

• Optimized for PHP5

• Mercury• Build based on Pressflow

• Is an Amazon Machine Image to be used as instance of Amazon EC2

• Install image contains installation of Solr, Varnish, Memcached

• Tuned Apache settings

• Performance is very promising.

Page 41: Drupal Backend Performance and Scalability

Other options (cont’d)• Patching Drupal

• Other name for it is ‘hack core’

• Not recommended (unless you know what you are doing)

• Sometimes it is necessary

• Hack core from a safe distance if possible• Copy core module into sites/site_name_directory/modules and apply changes

• Use pluggable caching to have your own version of caching

• Create schema changes in a separate module (so you can remove them cleanly in the future)

• Create own module and override original mechanism if process is called through form builder (change the submission or validation to point to yours)

• Have an easy way to track changes you have made

Page 42: Drupal Backend Performance and Scalability

Other options (cont’d)• Case 1:

• User login on zimmertwins.ca was painfully slow (5+ seconds per user)

• Gist of problem: DB not using index on username due to lower()

• Original issue at http://drupal.org/node/83738 (now ongoing in a new issue)

• ‘Bug’ has been around since 2006

• Solution: Modified patch on d.o for site

• Copied user module into sites/all/modules to use override code

• user login time now less than 0.1 seconds

• As a note, d.o has also been patched (login has required matching case for past few months)

• Case 2:• Comments do not have index on user id: problem with abuse module

• Solution: Created index on user id as an update

• Added update to module that requires this schema change (my own module)

• Loading for comment by user no longer an issue.

Page 43: Drupal Backend Performance and Scalability

Module Developers• Take advantage of caching

• Use memory wisely• Unset the variable if you do not have a need for it down the line)

• Save variable to memory for future use so processing is not done multiple times

• Take Advantage of AHAH functionality• Fewer queries

• Not reloading the page

• Saving bandwidth

• Learn to use jQuery (falls in line with above)

Page 44: Drupal Backend Performance and Scalability

(Possibly) Related Camp Presentations

• Drupal Performance: Flash Storage as a Game Changer• Christoph Weber

• Using Flash Storage to power your database for one.

• Install Drupal 7 with Ubuntu 9.10 on Amazon EC2• HarnishGoradia (thatlaguy)

• Building on SOLR Search• John Fiala

Page 45: Drupal Backend Performance and Scalability

Thank you!

Questions? Answers? Tips from you all? Discuss!