Software at Scale
New York City College of Technology – Computer Systems Technology Colloquium
Transcript of Software at Scale
Why is scale important?
[Chart: usage (0–80,000) and difficulty plotted by month, Jan–Dec]
“Do things that don’t scale!”
Missed opportunity
Permanent scaling need
But scale if it’s on the way.
A tale of two startups (“Or how I spent 2013…”)
Clipless
• Built to scale.
• v1 developed in 3 months.
• PR blast to TechCrunch, AndroidPolice, etc. led to 1700% month-over-month growth.
• Handling over 10,000 QPS.
• Acquired 3 months from launch.
Shark Tank Startup
• Scaling ignored.
• v1 developed in 3 months.
• Reran on Shark Tank; service and website went down almost immediately.
• Still slow (but steady) growth.
What was different?
Clipless (Tomcat, 1-3 Digital Ocean VMs)
• Load-balanced, replicated servers and DBs.
• Well-written RESTful API; any server could answer any query.
• Multithreaded backend.
• Batched, asynchronous DB operations.
• Caching by locality and time.
• Queued network operations.
S.T. Startup (Ruby on Rails, Heroku)
• No load balancing.
• Replicated DB via Heroku Postgres.
• Not truly REST; backends kept state.
• Single-threaded backend (one request blocked the entire Heroku dyno).
• Direct, blocking DB access.
• DB caching via ActiveRecord.
Potential Bottlenecks
• Client resources
  • CPU
  • Memory
  • I/O
• Server resources
• Database resources
  • Open connections
  • Running queries
• Network resources
  • Bandwidth
  • Connections / open sockets
  • Availability (esp. on Wi-Fi / mobile networks)
Potential Bottlenecks
• Client resources
  • CPU
  • Memory
  • I/O
• Server resources
• Database resources
• Network resources
  • Bandwidth
  • Connections / open sockets
  • Availability (esp. on Wi-Fi / mobile networks)

Client-side strategies:
• Profile your algorithms
• Crunch less data
• Reuse more old work
• Offload some processing to the server
Potential Bottlenecks
• Client resources
• Server resources
  • CPU
  • Memory
  • I/O
• Database resources
• Network resources
  • Bandwidth
  • Connections / open sockets
  • Availability (esp. on Wi-Fi / mobile networks)

Server-side strategies:
• Profile your algorithms
• Crunch less data
• Reuse more old work (across users)
• Divide and conquer (“shard”)
• Spin up and balance more servers
Potential Bottlenecks
• Client resources
• Server resources
• Database resources
  • Open connections
  • Running queries
• Network resources
  • Bandwidth
  • Connections / open sockets
  • Availability (esp. on Wi-Fi / mobile networks)

Database strategies:
• Optimize your queries
• Connection pooling
• Add a second-level cache
• Reuse more old work (across users)
• Divide and conquer (“shard”)
• Batch DB requests
• Spin up and replicate more DBs
Potential Bottlenecks
• Client resources
• Server resources
• Database resources
• Network resources
  • Bandwidth
  • Connections / open sockets
  • Availability (esp. on Wi-Fi / mobile networks)

Network strategies:
• Add a local cache
• Send diffs
• Compress responses (CPU tradeoff)
• Connection pooling
• Batch network requests
Profiling
Purpose: find the “hotspots” in your program.
Things you care about:
• “CPU time” – time spent processing your program’s instructions.
• “Memory” – RAM being used to store your program’s data.
• “Wall time” – overall time spent waiting for the program.
Methods:
• Basic: “Stopwatch”
• Advanced: Profiler (e.g. jprof, JProfiler, hprof, NetBeans, Visual Studio)
(Diagnosing the problem)
Stopwatch
• Easy: just time methods.
Matlab:
function [result] = do_something_expensive(data)
tic
…
toc
end
• In Java, use Guava’s Stopwatch class (start() and stop() methods).
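The stopwatch idea can be sketched with the JDK alone; Guava’s Stopwatch wraps the same mechanism behind start()/stop() and a nicer API. The method name timeMillis and the dummy workload below are illustrative, not from the talk:

```java
import java.util.concurrent.TimeUnit;

// Minimal "stopwatch" timing using only the JDK.
public class Timing {
    // Times a Runnable and returns the elapsed wall time in milliseconds.
    public static long timeMillis(Runnable work) {
        long start = System.nanoTime();  // monotonic clock: right for intervals
        work.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        long ms = timeMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;  // the "expensive" work
        });
        System.out.println("took " + ms + " ms");
    }
}
```

System.nanoTime() is monotonic, unlike System.currentTimeMillis(), so it doesn’t jump when the system clock is adjusted mid-measurement.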
Profiler
Strategies
Caching and Reuse
• Trades off CPU for space.
• Look for repetition of input (including subproblems).
• Compute a key from the input.
• Associate the result with the key.
• Important: algorithm must be a deterministic mapping from input to output.
• Important: if you change what the algorithm depends on, update the cache key.
“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton
[Diagram: a record (Name: Alice, Job: Developer, Salary: 100,000) mapped to the cache entry <Alice, [email protected]>]
Computing a Cache Key
• Hashing is a good strategy.
• Objects.hash (JDK 7) / Objects.hashCode (Guava)
• Beware: hashes can collide – sanity-check results!
• Searching:
  • Hash the data.
  • Query the cache for the hash key.
  • If found, return the associated value.
  • If not, query the live service and store the result in the cache.
[Diagram: <Alice, [email protected]> hashed to the key 0xAF724…]
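The search flow above can be sketched in a few lines. The "live service" here is a stand-in Function (in practice a DB or network call), and the example.com addresses are made up for illustration:

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Compute a key from the input, check the cache, fall back to the live
// service on a miss and remember the answer.
public class EmailCache {
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> liveService;  // stand-in for a DB/network call

    public EmailCache(Function<String, String> liveService) {
        this.liveService = liveService;
    }

    public String emailFor(String name) {
        int key = Objects.hash(name);  // hash as the cache key -- beware collisions!
        // computeIfAbsent: return the cached value, or query the live
        // service and store the result under the key.
        return cache.computeIfAbsent(key, k -> liveService.apply(name));
    }
}
```

Note the slide’s warning applies: keying by hash alone means two colliding inputs would share an entry. Keying the map by the input itself (Map<String, String>) and letting the map hash internally avoids that.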
Concurrency
[Diagram: a sequential program runs its work items one after another (a lot of time); a concurrent program runs them side by side (less time).]
Race Conditions
Problem: Two threads can simultaneously write to the same variables.
If you ran this code in two threads:
if (x < 1) { x++; }
Then x would usually end up at 1.
But sometimes it would be 2!
• Race conditions such as that one are among the hardest bugs to find and fix.
• Three ways to manage this:
• Immutability
• Local state
• Synchronization
• Race conditions only happen when you write to shared, mutable state.
Immutability
• General tip: try to minimize the number of states your program can end up in.
• Concurrency
• REST
• (And your programs will just have less state, so you’ll produce fewer bugs)
• Declare variables final where possible, set them in the constructor, and don’t write setters unless you must:
// String is an immutable type - can’t change it at runtime.
// foo is an immutable variable - can’t reassign it.
private final String foo;
public Bar(String foo) {
this.foo = Preconditions.checkNotNull(foo);
}
Local State
• Sometimes you need to modify state.
• But you can still avoid locking if it’s only visible to you:
• Two threads can write copies of same data.
• Optionally, can be merged back in single thread afterwards.
• (This is how MapReduce works)
Java inner classes help tremendously with this!
// Every time you run sendToNetwork, you’ll use a new channel. No shared state!
void sendToNetwork() {
final Channel channel = new HttpChannel(context);
channel.connect();
Thread foo = new Thread() {
@Override
public void run() {
channel.send("I am the jabberwocky");
}
};
foo.start();
}
Synchronization
• If you do need to write shared state, you need to synchronize access to it.
• Last resort: slows your program and deadlock-prone.
final Object lock = new Object();
synchronized (lock) {
if (x < 1) { x++; }
}
Now x is always 1! No interruption possible between read and write.
• More advanced: read/write locks (ReentrantReadWriteLock…)
• Also check out Java “Atomic” classes and “concurrent” collections:
  • AtomicBoolean, AtomicInteger, …
  • ConcurrentHashMap, …
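As a sketch, the racy if (x < 1) { x++; } from earlier can be made lock-free with AtomicInteger. compareAndSet performs the read-check-write as a single atomic operation, so two threads can never both observe x == 0 and both increment:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Lock-free version of "if (x < 1) { x++; }" -- no synchronized block needed.
public class AtomicDemo {
    private static final AtomicInteger x = new AtomicInteger(0);

    // Succeeds for exactly one caller: the read and the write cannot be
    // interleaved with another thread's.
    static void bumpToOne() {
        x.compareAndSet(0, 1);
    }

    static int value() {
        return x.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(AtomicDemo::bumpToOne);
        Thread b = new Thread(AtomicDemo::bumpToOne);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(value());  // always 1, never 2
    }
}
```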
Futures
• Threads compute asynchronously.
• Caller wants some way of knowing the result when it’s ready.
• Future: handle to a result that may or may not be available yet.
  • future.get(): waits for a result and returns it, with optional timeout.
• Futures allow for asynchronous calls to immediately return, and for the program to wait for the results when it’s convenient.
• Also see Guava’s ListenableFuture.
The usual pattern:
ExecutorService pool = Executors.newFixedThreadPool(4);
Callable<String> action = new Callable<String>() {
@Override
public String call() throws NetworkException {
return askTheNetworkForMyString();
}
};
Future<String> result = pool.submit(action);
String myString = result.get(); // Waits until the result is available. Throws if an exception was thrown inside the Callable.
REST
• Scalable client / server architecture.
• Raw sockets are complicated, so REST usually rides on HTTP.
• Each HTTP request hits an “endpoint”, which does one thing.
e.g. GET http://api.clipless.co/json/deals/near/Times_Square
• Principles:
• Server does not store state (see immutability)
• Responses can be cached (see caching)
• Client doesn’t care if server is final endpoint or proxy.
• State usually ends up in DB, server communicates with client using tokens.
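A toy stateless endpoint can be sketched with the JDK’s built-in com.sun.net.httpserver (this is not the actual Clipless stack, and the JSON shape is made up). The key RESTful property: the response is a pure function of the request, so any replica could answer it:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class DealsEndpoint {
    // Pure function of the request path: no server-side session to consult.
    static String respond(String path) {
        String place = path.substring(path.lastIndexOf('/') + 1);
        return "{\"near\": \"" + place + "\", \"deals\": []}";
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {  // default: just demo the pure handler
            System.out.println(respond("/json/deals/near/Times_Square"));
            return;
        }
        // Pass any argument to actually serve the endpoint on port 8080.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/json/deals/near/", exchange -> {
            byte[] body = respond(exchange.getRequestURI().getPath())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}
```

Because the handler keeps no state, a load balancer (or a CDN cache) can sit in front of any number of these replicas without the client noticing.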
Clipless Architecture
[Diagram: clients speak Protobuf over HTTP at 10,000 reqs/second to Apache (mod_proxy_balancer), which balances across Tomcat servers backed by MySQL and content-addressable caches.]
Static Content
• Static content (e.g. HTML, images) is highly cacheable.
• Easiest way to cache: use a CDN
  • Akamai, S3, CloudFlare, CloudFront, MaxCDN, …
• Cache key:
  • Some HTTP headers (incl. the Cache-Control header)
  • Page requested
  • Last-modified (e.g. from a “HEAD” to your server)
• Added bonus: most CDNs are “closer” to your users than your server.
• Compressing content reduces bandwidth:
  • Browsers usually support gzip decompression.
  • Apache, nginx: gzip compression plugins
  • JavaScript / CSS: minification
  • Images: Google PageSpeed service / CloudFlare
  • Program data: Protocol Buffers, Thrift
• Why use your bandwidth when you can use someone else’s?
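In practice the Apache/nginx modules above do the compression for you; this sketch just makes the bandwidth-for-CPU tradeoff concrete by gzipping a response body by hand:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class Gzip {
    // Compresses a byte[] with gzip; costs CPU, saves bytes on the wire.
    public static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(input);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive markup (like most HTML) compresses extremely well.
        byte[] page = "<html>".repeat(1000).getBytes(StandardCharsets.UTF_8);
        System.out.println(page.length + " -> " + compress(page).length + " bytes");
    }
}
```

The win depends entirely on the payload: text and markup shrink dramatically, while already-compressed images or video barely shrink at all but still pay the CPU cost.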
Sharding
[Diagram: requests for users A–L (Alice, Bob) go to one server; requests for users M–Z (Mallory) go to another.]
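The routing decision itself is tiny. The diagram shows range sharding (A–L on one server, M–Z on another); a common alternative, sketched here with an illustrative shardFor function, is hash sharding, which spreads users evenly without hand-tuned letter ranges:

```java
public class Sharder {
    // Consistent routing: the same user always lands on the same shard.
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int shardFor(String user, int numShards) {
        return Math.floorMod(user.hashCode(), numShards);
    }

    public static void main(String[] args) {
        for (String user : new String[] {"Alice", "Bob", "Mallory"}) {
            System.out.println(user + " -> shard " + shardFor(user, 2));
        }
    }
}
```

Range sharding keeps lexically adjacent keys together (handy for range scans) but can develop hot spots; hash sharding balances load but scatters related keys. Either way, determinism is what matters: every server must agree on where a given user lives.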
Batching Network Requests
The Operation Queue / Proactor Pattern
[Diagram: producers push work onto a thread-safe queue and immediately receive a ListenableFuture<Result>; a worker thread pool drains the queue; a NetworkListener suspends the queue when the network goes down (onDown: queue.suspend()) and resumes it when it comes back (onUp: queue.resume()).]
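The pattern in the diagram can be sketched with standard java.util.concurrent pieces (using CompletableFuture where the slides used Guava’s ListenableFuture; a real NetworkListener would call pause()/resume(), here left as plain methods):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Producers enqueue operations and get a Future back immediately; a worker
// pool drains the queue; the queue can be paused while the network is down.
public class OperationQueue {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers;
    private volatile boolean paused = false;

    public OperationQueue(int workerCount) {
        workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.submit(this::drain);
        }
    }

    private void drain() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                if (paused) { Thread.sleep(50); continue; }  // crude suspend
                Runnable op = queue.poll(100, TimeUnit.MILLISECONDS);
                if (op != null) op.run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // The producer's call returns immediately; it blocks on the Future only
    // when it actually needs the result.
    public <T> Future<T> submit(Callable<T> op) {
        CompletableFuture<T> result = new CompletableFuture<>();
        queue.add(() -> {
            try { result.complete(op.call()); }
            catch (Exception e) { result.completeExceptionally(e); }
        });
        return result;
    }

    public void pause()  { paused = true;  }  // e.g. NetworkListener.onDown
    public void resume() { paused = false; }  // e.g. NetworkListener.onUp
    public void shutdown() { workers.shutdownNow(); }
}
```

Pausing the queue while offline is what makes this a batching mechanism: operations pile up and are flushed together when connectivity returns, instead of failing one by one.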
How to Test
• Mock large amounts of data, measure performance
  • Can be automated so you never encounter performance regressions
• Network stress tests
  • ab
  • blitz
  • loader.io
• ulimits
• Packet sniffers
• Round-trip-time services, e.g. New Relic
General Principles
• Scale when you anticipate the need.
• Scale eagerly when you don’t need to go far out of the way.
  • CDNs and gzip compression are good examples.
• Or when retrofitting will be painful.
  • RESTful architecture from the beginning: much easier than tacking it on later!
  • But caching is usually easy to add later.
• Focus on the big improvements:
  • 80/20 rule
  • Profile and knock out the biggest CPU / memory hogs first.
• Practice and internalize to reduce scaling costs!
  • Concurrency is much easier with mastery.
  • Caching seems much easier with mastery, often isn’t.
  • Internalize immutability and you’ll just write better code.
Thanks!
Good luck, and always bring mangosteens to acquisition talks.