FOWA Scaling The Lamp Stack Workshop

Scaling the LAMP Stack Future of Web Apps October 5, 2007

description

Slides from the workshop "Scaling the LAMP Stack" at the Future of Web Apps on October 5, 2007

Transcript of FOWA Scaling The Lamp Stack Workshop

Page 1: FOWA Scaling The Lamp Stack Workshop

Scaling the LAMP Stack

Future of Web Apps

October 5, 2007

Page 2: FOWA Scaling The Lamp Stack Workshop

Introductions

Page 3: FOWA Scaling The Lamp Stack Workshop

Specific Problems, Challenges and Issues

Page 4: FOWA Scaling The Lamp Stack Workshop

About this workshop

• This is a broad topic

• Theory and application

• Real-world focus

• Interactive (please!)

Page 5: FOWA Scaling The Lamp Stack Workshop

About web apps and scaling

Some different ways of looking at the problem…

Page 6: FOWA Scaling The Lamp Stack Workshop

Things to think about

• Multi-server: locking and concurrency

• Running many: keep in mind what’s expensive, sloppy or risky

• Code quality

• The law of truly large numbers

Page 7: FOWA Scaling The Lamp Stack Workshop

Elements of Scaling

• Split up different tasks

• Use more hardware (intelligently)

• Partition

• Replicate

• Cache

• Optimize (code and hardware)

• Identify and fix weaknesses

• Manage

Page 8: FOWA Scaling The Lamp Stack Workshop

Tools and Components

• Apache + PHP

• MySQL

• File System (local)

• Networked File System

• Load Balancers

• memcached

Page 9: FOWA Scaling The Lamp Stack Workshop

Contemplating Scaling

• Understand what your app does (and how much)

• Identify the bottlenecks

• Solve near-term problems

• Design well, but don’t over-design

Page 10: FOWA Scaling The Lamp Stack Workshop

Web apps do lots of things

Different operations have different scaling issues.

Page 11: FOWA Scaling The Lamp Stack Workshop

What does your app do?

List the high level elements of what your application does. Separate out different functions that will have different scaling issues.

Page 12: FOWA Scaling The Lamp Stack Workshop

Common things that web apps do

• Manage connections/protocols

• Deliver static content

• Manage sessions

• Manage user data

• Render dynamic pages

• Access external APIs

• Process media

Page 13: FOWA Scaling The Lamp Stack Workshop

Update the list of things your app does

• Add anything you missed

• Note which items you do in quantity

Page 14: FOWA Scaling The Lamp Stack Workshop

Easy vs. Difficult Scaling

What happens when you add hardware?

• Does it work?

• Does more hardware = more performance?

Page 15: FOWA Scaling The Lamp Stack Workshop

Things that break when you scale

• State that isn’t properly shared (especially sessions)

• Updates/refreshes (caching and replication issues)

Page 16: FOWA Scaling The Lamp Stack Workshop

Things that don’t improve when you add more servers

• Unpartitioned databases

• Anything that locks/blocks

• Inefficient code, especially big queries

Page 17: FOWA Scaling The Lamp Stack Workshop

Scaling Each Element

• (do easy separations first)

Page 18: FOWA Scaling The Lamp Stack Workshop

Managing Connections/Protocols

• No problem putting on multiple servers

• Apache is good

  o Not too far away out of the box

  o Moderately tunable

• Linux tuning

  o TCP stack (tune to handle unusual networking needs)

Page 19: FOWA Scaling The Lamp Stack Workshop

Key Apache Configuration Issues

• MaxClients (and ServerLimit, ThreadLimit and ThreadsPerChild)

• Avoid using PHP (or other) handler unnecessarily

• Use the worker MPM

• Maybe MaxRequestsPerChild
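How the directives on this slide constrain each other can be sketched with a little arithmetic. This is a rough sizing aid, not something from the workshop: under the worker MPM, MaxClients cannot exceed ServerLimit × ThreadsPerChild and is normally a multiple of ThreadsPerChild. The memory-per-thread figure is an assumption you would have to measure on your own servers, and the function names are hypothetical. Python is used here purely for illustration.

```python
def worker_mpm_ceiling(server_limit: int, threads_per_child: int) -> int:
    """Hard upper bound on MaxClients under the worker MPM."""
    return server_limit * threads_per_child


def memory_bound_clients(ram_for_apache_mb: float, mb_per_thread: float) -> int:
    """Rule-of-thumb cap: concurrent workers that fit in RAM without swapping."""
    return int(ram_for_apache_mb // mb_per_thread)


def suggest_max_clients(server_limit: int, threads_per_child: int,
                        ram_for_apache_mb: float, mb_per_thread: float) -> int:
    """Pick the smaller of the two limits, rounded down to whole child processes."""
    usable = min(worker_mpm_ceiling(server_limit, threads_per_child),
                 memory_bound_clients(ram_for_apache_mb, mb_per_thread))
    return (usable // threads_per_child) * threads_per_child


if __name__ == "__main__":
    # Hypothetical box: 2 GB set aside for Apache, about 8 MB per worker thread.
    print(suggest_max_clients(16, 25, 2048, 8))
```

With these made-up numbers memory, not ServerLimit, is the binding limit, which is the usual situation when a heavy handler is loaded into every worker.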

Page 20: FOWA Scaling The Lamp Stack Workshop

Delivering Static Content

• Don’t process it unnecessarily

  o Either cache or use no Apache handlers

  o Caching can let you treat semi-static content as static

• Using multiple servers complicates updates, but is otherwise easy

Page 21: FOWA Scaling The Lamp Stack Workshop

General Discussion: Multi-server, state and sessions

Rethinking state for multi-server environments

• What is state?

• Short-term state (sessions)

• Long-term state (application data)

• Managing state is usually the hardest part of scaling

Page 22: FOWA Scaling The Lamp Stack Workshop

What happens with state

• Written (created/destroyed/changed)

• Read

• Stored

Page 23: FOWA Scaling The Lamp Stack Workshop

Requirements for managing state

• Depend on what it is and how it is used

• Perfect coherence

• Performance of different operations

Page 24: FOWA Scaling The Lamp Stack Workshop

Ways of scaling state

• Replication: make more copies

• Partitioning: split up the work

• Caching

Should make different choices for different state/data elements

Page 25: FOWA Scaling The Lamp Stack Workshop

About Load Balancers

• What load balancers do

  o Spread load

  o Detect server failures

  o Stickiness/persistence

  o Acceleration (especially SSL)

• Fancy features (including good stickiness) are expensive

Page 26: FOWA Scaling The Lamp Stack Workshop

Why sticky sessions are not usually good in practice

• Servers fail

• Corner cases exist

Page 27: FOWA Scaling The Lamp Stack Workshop

Managing Sessions

Page 28: FOWA Scaling The Lamp Stack Workshop

Where session data can be stored

• Browser cookies

• Web server temporary files (not scalable)

• App server state

• Database

• Cache

Page 29: FOWA Scaling The Lamp Stack Workshop

PHP session management

• Default (files) method is not multi-server friendly, and thus not scalable (unless sticky)

• Can implement a different back-end easily

Page 30: FOWA Scaling The Lamp Stack Workshop

Designing a session back-end

• Requirements

• Data storage options

  o Cookies only (re-auth, let the browser take care of the logout – but less secure)

  o Full-featured involves a combination of cookies and database and cache

(discussion of session details)
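One way the "full-featured" option could fit together is sketched below: the cookie carries only a signed session ID, the database holds the authoritative copy, and the cache serves most reads. This is a minimal illustration in Python rather than a PHP session handler. `SECRET_KEY`, `SessionStore` and the plain dicts standing in for the cache and the database are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = b"change-me"  # hypothetical; must be shared by every web server


def _mac(session_id: str) -> str:
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def sign(session_id: str) -> str:
    """Build the cookie value: the session ID plus a signature."""
    return f"{session_id}.{_mac(session_id)}"


def verify(cookie_value: str):
    """Return the session ID if the signature matches, else None."""
    session_id, _, mac = cookie_value.rpartition(".")
    if hmac.compare_digest(mac.encode(), _mac(session_id).encode()):
        return session_id
    return None


class SessionStore:
    """Sessions live in the database; the cache absorbs most reads."""

    def __init__(self, cache: dict, database: dict):
        self.cache = cache        # stand-in for a shared cache
        self.database = database  # stand-in for a sessions table

    def create(self, data: dict) -> str:
        session_id = secrets.token_hex(16)
        self.database[session_id] = dict(data)
        self.cache[session_id] = dict(data)
        return sign(session_id)

    def load(self, cookie_value: str):
        session_id = verify(cookie_value)
        if session_id is None:
            return None  # tampered or malformed cookie
        data = self.cache.get(session_id)
        if data is None:  # cache miss: fall back to the database
            data = self.database.get(session_id)
            if data is not None:
                self.cache[session_id] = data
        return data
```

Because no session data lives on an individual web server, any server behind the load balancer can handle any request, and losing a cache node costs only a database read.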

Page 31: FOWA Scaling The Lamp Stack Workshop

Managing small user data

• Databases are more efficient, flexible and sharable than small files

• Frequently-read data should be cached

Page 32: FOWA Scaling The Lamp Stack Workshop

Managing large user data

• NFS has flaws but is almost inevitable

• Locking is usually not important, but can be

• Performance degradation can be sudden

Page 33: FOWA Scaling The Lamp Stack Workshop

About NFS

• NFS is usually transparent to your app

• NFS is easy to implement and gives you multiple-write access

• NFS locking is not to be trusted

• The Linux NFS client is slow for writes and can do bad things under stress

Page 34: FOWA Scaling The Lamp Stack Workshop

User data and locking

• Names based on hashes often mean no locking is needed

• Databases do locking better than file systems do

• Locking requires housekeeping
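The first point can be sketched as follows: if a file's name is derived from a hash of its content, two writers can only collide when they are writing identical bytes, so no lock is needed. This is a minimal sketch assuming a local POSIX file system, where a rename within one directory is atomic; behaviour over NFS may differ. `store_blob` is a hypothetical helper name.

```python
import hashlib
import os
import tempfile


def store_blob(root: str, data: bytes) -> str:
    """Write data under a name derived from its content; return the final path."""
    digest = hashlib.sha256(data).hexdigest()
    final_path = os.path.join(root, digest)
    if os.path.exists(final_path):
        return final_path  # identical content is already stored
    # Write to a private temporary file, then rename it into place.
    # Readers see either no file or the complete file, never a partial one.
    fd, tmp_path = tempfile.mkstemp(dir=root)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_path, final_path)
    return final_path
```

The housekeeping cost moves elsewhere: something still has to track which names are in use and delete the ones that are not.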

Page 35: FOWA Scaling The Lamp Stack Workshop

Disk Storage Hardware

• Disk performance can degrade suddenly

• If the ratio of access to storage is low, then even slow disk is usually fine

• Think about seek times and spindles
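A back-of-the-envelope calculation shows why spindle count matters more than capacity for random access. The sketch below uses the common approximation that one random I/O costs an average seek plus half a rotation. The drive figures in the example are illustrative assumptions, and the model ignores caches, RAID overhead and queuing.

```python
import math


def random_iops_per_disk(avg_seek_ms: float, rpm: int) -> float:
    """Approximate random I/Os per second for one spindle."""
    rotational_latency_ms = 0.5 * 60_000 / rpm  # half a revolution on average
    return 1000 / (avg_seek_ms + rotational_latency_ms)


def spindles_needed(target_iops: float, avg_seek_ms: float, rpm: int) -> int:
    """Minimum number of disks to sustain target_iops of random access."""
    return math.ceil(target_iops / random_iops_per_disk(avg_seek_ms, rpm))


if __name__ == "__main__":
    # Hypothetical drives: 7,200 RPM with 8.5 ms seek, 15,000 RPM with 3.5 ms seek.
    for seek_ms, rpm in ((8.5, 7200), (3.5, 15000)):
        print(rpm, round(random_iops_per_disk(seek_ms, rpm)),
              spindles_needed(1000, seek_ms, rpm))
```

Under this model a single disk delivers on the order of a hundred random operations per second, which is why a seek-bound workload can degrade suddenly once demand crosses that line.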

Page 36: FOWA Scaling The Lamp Stack Workshop

Rendering dynamic pages

• Depends heavily on application specifics (query, search, process, etc.)

• Watch out for:

  o Onerous queries (create and watch slow query log)

  o Locking of resources and/or incoherence if state changes

  o Heavy CPU and memory usage

• Cache both elements and complete pages

Page 37: FOWA Scaling The Lamp Stack Workshop

Processing media

• CPU intensive

• May be memory intensive

• Might be spiky

• Might need its own server pool

Page 38: FOWA Scaling The Lamp Stack Workshop

Hardware

• Start simple

• Observe performance and respond accordingly

• Get lots of memory

Page 39: FOWA Scaling The Lamp Stack Workshop

Hardware-driven behaviors

• Sudden degradation because demand exceeds supply (usually relieved unhappily)

• Get behind due to a spike, and recover

• Not enough resources for normal optimization

Page 40: FOWA Scaling The Lamp Stack Workshop

Specific hardware issues

• Not enough memory

  o Severe: paging/swapping

  o Mild: poor automatic caching; slowness due to fragmentation

• Disk seek (very common)

• CPU (but might really be memory)

• Disk throughput (rare for web apps)

Page 41: FOWA Scaling The Lamp Stack Workshop

Hardware decisions

• SCSI/SAS vs. SATA

• Resource ratios

• Combining vs. splitting functions

• Big vs. little boxes

Page 42: FOWA Scaling The Lamp Stack Workshop

Techniques

• Caching

• Partitioning

• Replication

• Data management middleware

• Queuing

Page 43: FOWA Scaling The Lamp Stack Workshop

Caching

• Turn expensive operations into cheap ones

• Reduce:

  o Database reads

  o Object and page calculation/rendering operations

• Cache objects and subobjects

• Add memory

Page 44: FOWA Scaling The Lamp Stack Workshop

Apache Caching

• Can be done with zero application modifications

• Complete pages/HTTP requests only

• Must use Apache 2.2

• Cache is not shared between servers

Page 45: FOWA Scaling The Lamp Stack Workshop

memcached

• Extremely useful

• Distributed caching system

• Requires new thinking and new coding

• Straightforward API
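The "new thinking" is mostly the read-through pattern: check the cache, fall back to the database on a miss, and invalidate on write. The sketch below illustrates that pattern in Python. `FakeCache` is an in-process stand-in with get/set/delete operations, not a real memcached client, and `get_user`, `update_user` and the `user:<id>` key scheme are hypothetical.

```python
import time


class FakeCache:
    """In-process stand-in for a cache client (get/set/delete with expiry)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at <= time.monotonic():
            del self._items[key]  # expired
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.monotonic() + ttl_seconds)

    def delete(self, key):
        self._items.pop(key, None)


def get_user(cache, load_from_db, user_id, ttl_seconds=300):
    """Read through the cache; only a miss reaches the database."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = load_from_db(user_id)
        if user is not None:
            cache.set(key, user, ttl_seconds)
    return user


def update_user(cache, save_to_db, user_id, fields):
    """Write to the database first, then drop the stale cache entry."""
    save_to_db(user_id, fields)
    cache.delete(f"user:{user_id}")
```

Deleting on write rather than updating the cached value keeps the database as the single source of truth, and the next read repopulates the entry.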

Page 46: FOWA Scaling The Lamp Stack Workshop

memcached URLs

• Home: http://www.danga.com/memcached/

• Intro: http://www.majordojo.com/2007/03/memcached-howto.php

• PHP documentation: http://www.php.net/memcache

Page 47: FOWA Scaling The Lamp Stack Workshop

Partitioning

• Mostly for data management

• Split load on to separate servers/pools

• Partition algorithm/mechanism must be lightweight

• Partition algorithm must anticipate the future

Page 48: FOWA Scaling The Lamp Stack Workshop

File Storage Partitioning

• Index/database gives the most flexibility

• Hash-based is simplest
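A hash-based layout can be as small as the sketch below: the hash of the file's key picks both the storage server and a couple of subdirectory levels, so no lookup table is needed and no single directory grows very large. The server names, the two-level fan-out and the `partition_path` helper are assumptions for illustration.

```python
import hashlib


def partition_path(file_key: str, servers: list[str], levels: int = 2) -> str:
    """Map a file key to 'server/xx/yy/key' using only a hash of the key."""
    digest = hashlib.md5(file_key.encode()).hexdigest()
    # The first 8 hex digits choose the server.
    server = servers[int(digest[:8], 16) % len(servers)]
    # The following digits choose nested subdirectories (256 entries per level).
    subdirs = [digest[8 + 2 * i: 10 + 2 * i] for i in range(levels)]
    return "/".join([server, *subdirs, file_key])


if __name__ == "__main__":
    pools = ["files01", "files02", "files03"]  # hypothetical storage hosts
    print(partition_path("avatar-12345.jpg", pools))
```

The trade-off is flexibility: changing the number of servers remaps most keys, which is where an index or database that records each file's location earns its extra cost.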

Page 49: FOWA Scaling The Lamp Stack Workshop

Database partitioning

• You will need to do this, but perhaps later than you think

• Index vs. hash-based
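The two approaches can be contrasted in a few lines. Hash-based partitioning computes the shard from the key; index-based partitioning looks the shard up, which costs a read but lets individual users be moved. Both pieces below are sketches with hypothetical names, and a real index would live in a small, heavily cached table rather than a dict.

```python
import zlib


def shard_by_hash(user_id: int, shard_count: int) -> int:
    """Hash-based: no lookup needed, but changing shard_count moves most users."""
    return zlib.crc32(str(user_id).encode()) % shard_count


class ShardIndex:
    """Index-based: an explicit user-to-shard map that can be changed per user."""

    def __init__(self, default_shard: int):
        self.default_shard = default_shard
        self.assignments = {}  # stand-in for a lookup table

    def shard_for(self, user_id: int) -> int:
        return self.assignments.get(user_id, self.default_shard)

    def move(self, user_id: int, new_shard: int) -> None:
        """Record a new home for one user (after copying their data)."""
        self.assignments[user_id] = new_shard
```

An index makes it possible to rebalance one heavy user at a time, which is one way to satisfy the earlier point that a partition scheme must anticipate the future.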

Page 50: FOWA Scaling The Lamp Stack Workshop

Replication

• Used where data is read far more than written

• Consider caching first

• Also used for failure recovery

Page 51: FOWA Scaling The Lamp Stack Workshop

Types of Replication

• Replication: sync vs. async

  o Synchronous is not usually scalable

  o Asynchronous only works with certain kinds of data and use cases, because of coherence issues

Page 52: FOWA Scaling The Lamp Stack Workshop

Database Replication

• Simple but finicky

• Asynchronous (but not by much)

• Allows big queries and backups to be moved to separate servers
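In application code, replication usually shows up as a routing decision: writes go to the master, reads that can tolerate slightly stale data go to a replica. The sketch below shows that decision only. `ReplicatedPool` is a hypothetical name, and the "connections" are plain strings standing in for real database handles.

```python
import itertools


class ReplicatedPool:
    """Route writes to the master and tolerant reads to replicas, round-robin."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def for_write(self):
        return self.master

    def for_read(self, needs_fresh_data: bool = False):
        # Replication is asynchronous, so a read that must see the latest
        # write (for example, right after the user saved a form) uses the master.
        if needs_fresh_data or self._replicas is None:
            return self.master
        return next(self._replicas)


if __name__ == "__main__":
    pool = ReplicatedPool("db-master", ["db-replica1", "db-replica2"])
    print(pool.for_write(), pool.for_read(), pool.for_read(needs_fresh_data=True))
```

The same routing lets reporting queries and backups be pinned to a dedicated replica so they never compete with interactive traffic.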

Page 53: FOWA Scaling The Lamp Stack Workshop

File System Replication

• Slow and very asynchronous

• Mostly for disaster recovery

Page 54: FOWA Scaling The Lamp Stack Workshop

Data Management Middleware

• Mostly for databases

• Can handle partitioning and replication, and do it well

• Big investment in coding to the API

• Sometimes easier to add functionality to app

Page 55: FOWA Scaling The Lamp Stack Workshop

Queuing

• Save work for later

• Useful for less urgent operations, especially messaging

• Can be used to wait for a pause, or to separate hardware
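The shape of the idea is sketched below: the request handler enqueues a job and returns immediately, and a worker drains the queue at its own pace. This version uses Python's in-process `queue` module purely for illustration. Across multiple servers the queue would need to be something shared, such as a database table or a message broker, which is outside this sketch. `start_worker` is a hypothetical helper.

```python
import queue
import threading


def start_worker(jobs: queue.Queue, handle) -> threading.Thread:
    """Start a background thread that processes jobs until it receives None."""

    def run():
        while True:
            job = jobs.get()
            if job is None:  # sentinel: shut down
                jobs.task_done()
                break
            try:
                handle(job)
            finally:
                jobs.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


if __name__ == "__main__":
    jobs = queue.Queue()
    start_worker(jobs, lambda job: print("sending", job))
    jobs.put({"to": "user@example.com", "template": "welcome"})  # returns at once
    jobs.put(None)
    jobs.join()  # wait for the worker to finish the backlog
```

Because the worker is decoupled from the request path, it can run on separate hardware or be throttled during traffic spikes.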

Page 56: FOWA Scaling The Lamp Stack Workshop

Dealing with lots of hardware (operations)

• Automation

• Process

Page 57: FOWA Scaling The Lamp Stack Workshop

Imaging/Provisioning

• Be consistent

• Use your distro’s automation (Kickstart, AutoYaST, etc.)

• Use boring, meaningful hostnames

• Make re-imaging easy

Page 58: FOWA Scaling The Lamp Stack Workshop

Deployment Systems

• Content and code replication

• Coherence/atomic updates

• Managing pieces and processes

• Simple scripts are fine

• Create audit trail

• Include back-out

• Think 3AM

• Do it!
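One common way to get an atomic update with a built-in back-out is to unpack each release into its own directory and switch a `current` symlink. This technique is not named in the slides, so treat the sketch below as one possible "simple script". It assumes a POSIX system, where renaming a new symlink over the old one is atomic, and the directory layout and `activate_release` helper are hypothetical.

```python
import os
from typing import Optional


def activate_release(releases_dir: str, release_name: str,
                     current_link: str) -> Optional[str]:
    """Point current_link at a release; return the previous target for back-out."""
    target = os.path.join(releases_dir, release_name)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    previous = os.readlink(current_link) if os.path.islink(current_link) else None
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # leftover from an interrupted deploy
    os.symlink(target, tmp_link)
    # Rename the new link over the old one: requests see either the old
    # release or the new one, never a half-copied tree.
    os.replace(tmp_link, current_link)
    return previous
```

Logging the returned previous target gives a minimal audit trail, and backing out at 3AM is the same call with the old release name.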

Page 59: FOWA Scaling The Lamp Stack Workshop

Monitoring systems

• A pain, but a lifesaver

• Start with built-in basics

• Add custom checks, especially end-to-end and communication between pieces

• Eliminate false alarms (ongoing)

• Nagios, usually

Page 60: FOWA Scaling The Lamp Stack Workshop

Coping with hardware failure

• Have extra servers/capacity

• Load balancers handle stateless layers

• Replication prepares you to handle data layers manually

• Use middleware or app-level multiple writes to get true data layer redundancy

Page 61: FOWA Scaling The Lamp Stack Workshop

Change management

• Part automation, part process

• Use version control on everything

• Stage changes with realistic data

• Know how to back out

• Consult the right people (internal and/or external)

Page 62: FOWA Scaling The Lamp Stack Workshop

Efficiency

• Access the smallest amount (DB, FS, etc.)

• Don’t do complex stuff when simple will suffice

Page 63: FOWA Scaling The Lamp Stack Workshop

Using the database efficiently

• Keep it simple

• Know what queries you do

• Index every query key

• Cache to reduce demand

• Check slow query log

• Replicate if you need big queries

Page 64: FOWA Scaling The Lamp Stack Workshop

The messy real world

Page 65: FOWA Scaling The Lamp Stack Workshop

Security and abuse

• Mostly same issues, just magnified

• You will be a target

• Spam (coming and going)

• Abuse of file storage

Page 66: FOWA Scaling The Lamp Stack Workshop

Corner Cases

• Murphy’s law enforcement

• Watch out for how different user activities relate

• Lock data, not functions

• Housekeeping

Page 67: FOWA Scaling The Lamp Stack Workshop

Performance and tuning

• Observation and responsiveness are more important than pre-optimizing

• Redesign as needed

• Collect the data to be able to analyze (both resource utilization and end-user performance)

Page 68: FOWA Scaling The Lamp Stack Workshop

Miscellaneous Warnings

Page 69: FOWA Scaling The Lamp Stack Workshop

Files and directories

• Most default file system configurations get really slow with lots of files in one directory

• Numerical limits on files and subdirectories

• Some programs don’t like files over 2GB

Page 70: FOWA Scaling The Lamp Stack Workshop

AJAX

• Sequential round trips

• Make preloading invisible

• UI that waits for too many things

Page 71: FOWA Scaling The Lamp Stack Workshop

Other topics

• Multiple sites

• CDNs

Page 72: FOWA Scaling The Lamp Stack Workshop

Scaling the LAMP Stack

Future of Web Apps

October, 2007

Daniel Lieberman