Software at Scale
New York City College of Technology – Computer Systems Technology Colloquium
Transcript of Software at Scale
Why is scale important?
[Chart: usage (0–80,000) and difficulty plotted by month, Jan–Dec]
“Do things that don’t scale!”
Missed opportunity
Permanent scaling need
But scale if it’s on the way.
A tale of two startups (“Or how I spent 2013…”)
Clipless
• Built to scale.
• v1 developed in 3 months.
• PR blast to TechCrunch, AndroidPolice, etc. led to 1700% month-over-month growth.
• Handling over 10,000 QPS.
• Acquired 3 months from launch.
Shark Tank Startup
• Scaling ignored.
• v1 developed in 3 months.
• Reran on Shark Tank; service and website went down almost immediately.
• Still slow (but steady) growth.
What was different?
Clipless (Tomcat, 1-3 Digital Ocean VMs)
• Load-balanced, replicated servers and DBs.
• Well-written RESTful API; any server could answer any query.
• Multithreaded backend.
• Batched, asynchronous DB operations.
• Caching by locality and time.
• Queued network operations.
S.T. Startup (Ruby on Rails, Heroku)
• No load balancing.
• Replicated DB via Heroku Postgres.
• Not truly REST; backends kept state.
• Single-threaded backend (one request blocked the entire Heroku dyno).
• Direct, blocking DB access.
• DB caching via ActiveRecord.
Potential Bottlenecks
• Client resources
  • CPU
  • Memory
  • I/O
• Server resources
• Database resources
  • Open connections
  • Running queries
• Network resources
  • Bandwidth
  • Connections / open sockets
  • Availability (esp. on Wi-Fi / mobile networks)
Potential Bottlenecks
• Client resources
  • CPU
  • Memory
  • I/O
• Server resources
• Database resources
• Network resources
  • Bandwidth
  • Connections / open sockets
  • Availability (esp. on Wi-Fi / mobile networks)

Client-side strategies:
• Profile your algorithms
• Crunch less data
• Reuse more old work
• Offload some processing to the server
Potential Bottlenecks
• Client resources
• Server resources
  • CPU
  • Memory
  • I/O
• Database resources
• Network resources
  • Bandwidth
  • Connections / open sockets
  • Availability (esp. on Wi-Fi / mobile networks)

Server-side strategies:
• Profile your algorithms
• Crunch less data
• Reuse more old work (across users)
• Divide and conquer (“shard”)
• Spin up and balance more servers
Potential Bottlenecks
• Client resources
• Server resources
• Database resources
  • Open connections
  • Running queries
• Network resources
  • Bandwidth
  • Connections / open sockets
  • Availability (esp. on Wi-Fi / mobile networks)

Database strategies:
• Optimize your queries
• Connection pooling
• Add a second-level cache
• Reuse more old work (across users)
• Divide and conquer (“shard”)
• Batch DB requests
• Spin up and replicate more DBs
Potential Bottlenecks
• Client resources
• Server resources
• Database resources
• Network resources
  • Bandwidth
  • Connections / open sockets
  • Availability (esp. on Wi-Fi / mobile networks)

Network strategies:
• Add a local cache
• Send diffs
• Compress responses (CPU tradeoff)
• Connection pooling
• Batch network requests
Profiling
Purpose: find the “hotspots” in your program.
Things you care about:
• “CPU time” – time spent processing your program’s instructions.
• “Memory” – RAM being used to store your program’s data.
• “Wall time” – overall time spent waiting for the program.
Methods:
• Basic: “Stopwatch”
• Advanced: Profiler (e.g. jprof, JProfiler, hprof, NetBeans, Visual Studio)
(Diagnosing the problem)
Stopwatch
• Easy: just time methods.
Matlab:
function [result] = do_something_expensive(data)
tic
…
toc
end
• In Java, use Guava’s Stopwatch class (start() and stop() methods).
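The stopwatch idea can be sketched with the JDK alone; Guava’s Stopwatch wraps the same mechanism behind start()/stop() and a nicer API. The method name timeMillis and the dummy workload below are illustrative, not from the talk:

```java
import java.util.concurrent.TimeUnit;

// Minimal "stopwatch" timing using only the JDK.
public class Timing {
    // Times a Runnable and returns the elapsed wall time in milliseconds.
    public static long timeMillis(Runnable work) {
        long start = System.nanoTime();  // monotonic clock: right for intervals
        work.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long ms = timeMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;  // the "expensive" work
        });
        System.out.println("took " + ms + " ms");
    }
}
```

System.nanoTime() is monotonic, unlike System.currentTimeMillis(), so it doesn’t jump when the system clock is adjusted mid-measurement.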
Profiler
Strategies
Caching and Reuse
• Trades off CPU for space.
• Look for repetition of input (including subproblems).
• Compute a key from the input.
• Associate the result with the key.
• Important: algorithm must be a deterministic mapping from input to output.
• Important: if you change what the algorithm depends on, update the cache key.
“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton
[Diagram: a record (Name: Alice, Job: Developer, Salary: 100,000) mapped to the cache entry <Alice, [email protected]>]
Computing a Cache Key
• Hashing is a good strategy.
• Objects.hash (JDK 7) / Objects.hashCode (Guava)
• Beware: hashes can collide – sanity-check results!
• Searching:
  • Hash the data.
  • Query the cache for the hash key.
  • If found, return the associated value.
  • If not, query the live service and store the result in the cache.
[Diagram: <Alice, [email protected]> hashed to the key 0xAF724…]
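The search flow above can be sketched in a few lines. The "live service" here is a stand-in Function (in practice a DB or network call), and the example.com addresses are made up for illustration:

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Compute a key from the input, check the cache, fall back to the live
// service on a miss and remember the answer.
public class EmailCache {
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> liveService;  // stand-in for a DB/network call

    public EmailCache(Function<String, String> liveService) {
        this.liveService = liveService;
    }

    public String emailFor(String name) {
        int key = Objects.hash(name);  // hash as the cache key -- beware collisions!
        // computeIfAbsent: return the cached value, or query the live
        // service and store the result under the key.
        return cache.computeIfAbsent(key, k -> liveService.apply(name));
    }
}
```

Note the slide’s warning applies: keying by hash alone means two colliding inputs would share an entry. Keying the map by the input itself (Map<String, String>) and letting the map hash internally avoids that.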
Concurrency
[Diagram: a sequential program runs its work items one after another (a lot of time); a concurrent program runs them side by side (less time).]
Race Conditions
Problem: Two threads can simultaneously write to the same variables.
If you ran this code in two threads:
if (x < 1) { x++; }
Then x would usually end up at 1.
But sometimes it would be 2!
• Race conditions such as that one are among the hardest bugs to find and fix.
• Three ways to manage this:
• Immutability
• Local state
• Synchronization
• Race conditions only happen when you write to shared, mutable state.
Immutability
• General tip: try to minimize the number of states your program can end up in.
• Concurrency
• REST
• (And your programs will just have less state, so you’ll produce fewer bugs)
• Declare variables final where possible, set them in the constructor, and don’t write setters unless you must:
// String is an immutable type - can’t change it at runtime.
// foo is an immutable variable - can’t reassign it.
private final String foo;
public Bar(String foo) {
this.foo = Preconditions.checkNotNull(foo);
}
Local State
• Sometimes you need to modify state.
• But you can still avoid locking if it’s only visible to you:
• Two threads can write copies of same data.
• Optionally, can be merged back in single thread afterwards.
• (This is how MapReduce works)
Java inner classes help tremendously with this!
// Every time you run sendToNetwork, you’ll use a new channel. No shared state!
void sendToNetwork() {
final Channel channel = new HttpChannel(context);
channel.connect();
Thread foo = new Thread() {
@Override
public void run() {
channel.send("I am the jabberwocky");
}
};
foo.start();
}
Synchronization
• If you do need to write shared state, you need to synchronize access to it.
• Last resort: slows your program and deadlock-prone.
final Object lock = new Object();
synchronized (lock) {
if (x < 1) { x++; }
}
Now x is always 1! No interruption possible between read and write.
• More advanced: read/write locks (ReentrantReadWriteLock…)
• Also check out Java “Atomic” classes and “concurrent” collections:
  • AtomicBoolean, AtomicInteger, …
  • ConcurrentHashMap, …
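As a sketch, the racy if (x < 1) { x++; } from earlier can be made lock-free with AtomicInteger. compareAndSet performs the read-check-write as a single atomic operation, so two threads can never both observe x == 0 and both increment:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Lock-free version of "if (x < 1) { x++; }" -- no synchronized block needed.
public class AtomicDemo {
    private static final AtomicInteger x = new AtomicInteger(0);

    // Succeeds for exactly one caller: the read and the write cannot be
    // interleaved with another thread's.
    static void bumpToOne() {
        x.compareAndSet(0, 1);
    }

    static int value() {
        return x.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(AtomicDemo::bumpToOne);
        Thread b = new Thread(AtomicDemo::bumpToOne);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(value());  // always 1, never 2
    }
}
```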
Futures
• Threads compute asynchronously.
• Caller wants some way of knowing the result when it’s ready.
• Future: handle to a result that may or may not be available yet.
  • future.get(): waits for a result and returns it, with optional timeout.
• Futures allow for asynchronous calls to immediately return, and for the program to wait for the results when it’s convenient.
• Also see Guava’s ListenableFuture.
The usual pattern:
ExecutorService pool = Executors.newFixedThreadPool(4);
Callable<String> action = new Callable<String>() {
@Override
public String call() throws NetworkException {
return askTheNetworkForMyString();
}
};
Future<String> result = pool.submit(action);
String myString = result.get(); // Waits until the result is available. Throws if an exception was thrown inside the Callable.
REST
• Scalable client / server architecture.
• Raw sockets are complicated, so REST usually rides on HTTP.
• Each HTTP request hits an “endpoint”, which does one thing.
e.g. GET http://api.clipless.co/json/deals/near/Times_Square
• Principles:
• Server does not store state (see immutability)
• Responses can be cached (see caching)
• Client doesn’t care if server is final endpoint or proxy.
• State usually ends up in DB, server communicates with client using tokens.
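A toy stateless endpoint can be sketched with the JDK’s built-in com.sun.net.httpserver (this is not the actual Clipless stack, and the JSON shape is made up). The key RESTful property: the response is a pure function of the request, so any replica could answer it:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class DealsEndpoint {
    // Pure function of the request path: no server-side session to consult.
    static String respond(String path) {
        String place = path.substring(path.lastIndexOf('/') + 1);
        return "{\"near\": \"" + place + "\", \"deals\": []}";
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {  // default: just demo the pure handler
            System.out.println(respond("/json/deals/near/Times_Square"));
            return;
        }
        // Pass any argument to actually serve the endpoint on port 8080.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/json/deals/near/", exchange -> {
            byte[] body = respond(exchange.getRequestURI().getPath())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}
```

Because the handler keeps no state, a load balancer (or a CDN cache) can sit in front of any number of these replicas without the client noticing.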
Clipless Architecture
[Diagram: clients speak Protobuf over HTTP at 10,000 reqs/second to Apache (mod_proxy_balancer), which balances across Tomcat servers backed by MySQL and content-addressable caches.]
Static Content
• Static content (e.g. HTML, images) is highly cacheable.
• Easiest way to cache: use a CDN
  • Akamai, S3, CloudFlare, CloudFront, MaxCDN, …
• Cache key:
  • Some HTTP headers (incl. the Cache-Control header)
  • Page requested
  • Last-modified (e.g. from a “HEAD” to your server)
• Added bonus: most CDNs are “closer” to your users than your server.
• Compressing content reduces bandwidth:
  • Browsers usually support gzip decompression.
  • Apache, nginx: gzip compression plugins
  • JavaScript / CSS: minification
  • Images: Google PageSpeed service / CloudFlare
  • Program data: Protocol Buffers, Thrift
• Why use your bandwidth when you can use someone else’s?
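In practice the Apache/nginx modules above do the compression for you; this sketch just makes the bandwidth-for-CPU tradeoff concrete by gzipping a response body by hand:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class Gzip {
    // Compresses a byte[] with gzip; costs CPU, saves bytes on the wire.
    public static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(input);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive markup (like most HTML) compresses extremely well.
        byte[] page = "<html>".repeat(1000).getBytes(StandardCharsets.UTF_8);
        System.out.println(page.length + " -> " + compress(page).length + " bytes");
    }
}
```

The win depends entirely on the payload: text and markup shrink dramatically, while already-compressed images or video barely shrink at all but still pay the CPU cost.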
Sharding
[Diagram: requests for users A–L (Alice, Bob) go to one server; requests for users M–Z (Mallory) go to another.]
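The routing decision itself is tiny. The diagram shows range sharding (A–L on one server, M–Z on another); a common alternative, sketched here with an illustrative shardFor function, is hash sharding, which spreads users evenly without hand-tuned letter ranges:

```java
public class Sharder {
    // Consistent routing: the same user always lands on the same shard.
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int shardFor(String user, int numShards) {
        return Math.floorMod(user.hashCode(), numShards);
    }

    public static void main(String[] args) {
        for (String user : new String[] {"Alice", "Bob", "Mallory"}) {
            System.out.println(user + " -> shard " + shardFor(user, 2));
        }
    }
}
```

Range sharding keeps lexically adjacent keys together (handy for range scans) but can develop hot spots; hash sharding balances load but scatters related keys. Either way, determinism is what matters: every server must agree on where a given user lives.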
Batching Network Requests
The Operation Queue / Proactor Pattern
[Diagram: producers push work onto a thread-safe queue and immediately receive a ListenableFuture<Result>; a worker thread pool drains the queue; a NetworkListener suspends the queue when the network goes down (onDown: queue.suspend()) and resumes it when it comes back (onUp: queue.resume()).]
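The pattern in the diagram can be sketched with standard java.util.concurrent pieces (using CompletableFuture where the slides used Guava’s ListenableFuture; a real NetworkListener would call pause()/resume(), here left as plain methods):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Producers enqueue operations and get a Future back immediately; a worker
// pool drains the queue; the queue can be paused while the network is down.
public class OperationQueue {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers;
    private volatile boolean paused = false;

    public OperationQueue(int workerCount) {
        workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.submit(this::drain);
        }
    }

    private void drain() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                if (paused) { Thread.sleep(50); continue; }  // crude suspend
                Runnable op = queue.poll(100, TimeUnit.MILLISECONDS);
                if (op != null) op.run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The producer's call returns immediately; it blocks on the Future only
    // when it actually needs the result.
    public <T> Future<T> submit(Callable<T> op) {
        CompletableFuture<T> result = new CompletableFuture<>();
        queue.add(() -> {
            try { result.complete(op.call()); }
            catch (Exception e) { result.completeExceptionally(e); }
        });
        return result;
    }

    public void pause()  { paused = true;  }  // e.g. NetworkListener.onDown
    public void resume() { paused = false; }  // e.g. NetworkListener.onUp
    public void shutdown() { workers.shutdownNow(); }
}
```

Pausing the queue while offline is what makes this a batching mechanism: operations pile up and are flushed together when connectivity returns, instead of failing one by one.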
How to Test
• Mock large amounts of data, measure performance
  • Can be automated so you never encounter performance regressions
• Network stress tests
  • ab
  • blitz
  • loader.io
• ulimits
• Packet sniffers
• Round-trip-time services, e.g. New Relic
General Principles
• Scale when you anticipate the need.
• Scale eagerly when you don’t need to go far out of the way.
  • CDNs and gzip compression are good examples.
• Or when retrofitting will be painful.
  • RESTful architecture from the beginning: much easier than tacking it on later!
  • But caching is usually easy to add later.
• Focus on the big improvements:
  • 80/20 rule
  • Profile and knock out the biggest CPU / memory hogs first.
• Practice and internalize to reduce scaling costs!
  • Concurrency is much easier with mastery.
  • Caching seems much easier with mastery, often isn’t.
  • Internalize immutability and you’ll just write better code.
Thanks!
Good luck, and always bring mangosteens to acquisition talks.