Scalability, Performance & Caching

Noah Mendelsohn
Tufts University
Email: [email protected]
Web: http://www.cs.tufts.edu/~noah

COMP 150-IDS: Internet Scale Distributed Systems (Fall 2019)

Copyright 2012, 2015, 2018 & 2019 – Noah Mendelsohn


© 2010 Noah Mendelsohn

Goals

Explore some general principles of performance, scalability and caching

Explore key issues relating to performance and scalability of the Web


Performance Concepts and Terminology


Performance, Scalability, Availability, Reliability

Performance
– Get a lot done quickly
– Preferably at low cost

Scalability
– Low barriers to growth
– A scalable system isn’t necessarily fast…
– …but it can grow without slowing down

Availability
– Always there when you need it

Reliability
– Never does the wrong thing, never loses or corrupts data


Throughput vs. Response Time vs. Latency

Throughput: the aggregate rate at which a system does work

Response time: the time to get a response to a request

Latency: time spent waiting (e.g. for disk or network)

We can improve throughput by:
– Minimizing work done per request
– Doing enough work at once to keep all hardware resources busy…
– …and when some work is delayed (latency) finding other work to do
– Using parallelism, often to work on multiple requests independently

We can improve response time by:
– Minimizing total work and delay (latency) on the critical path to a response
– Applying parallel resources to an individual response…including streaming
– Precomputing response values


Know How Fast Things Are


Typical “speeds n feeds”

CPU (e.g. Intel Core i7):
– A few billion instructions/second per core
– Memory: 20 GB/sec (20 bytes/instruction executed)

Long-distance network
– Latency (ping time): 10-100 ms
– Bandwidth: 5-100 Mb/sec

Local area network (Gbit Ethernet)
– Latency: 50-100 µsec (note: microseconds)
– Bandwidth: 1 Gb/sec (about 100 MB/sec)

Hard disk
– Rotational delay: 5 ms
– Seek time: 5-10 ms
– Bandwidth from magnetic media: 1 Gbit/sec

SSD
– Setup time: 100 µsec
– Bandwidth: 2 Gbit/sec (typical)

Note: SSD wins big on latency, some on bandwidth
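A rough way to build intuition from these numbers is to ask how many instructions a core could have executed while waiting on each device. The sketch below is only illustrative arithmetic: it assumes 2 billion instructions/second as a stand-in for "a few billion", and the helper names are invented for this example.

```python
# Assumption: "a few billion instructions/second" is taken as 2e9 for this sketch.
INSTRUCTIONS_PER_SEC = 2e9

# Latencies taken from the figures on the slide, in seconds.
LATENCIES_SEC = {
    "LAN latency (50 usec)": 50e-6,
    "SSD setup (100 usec)": 100e-6,
    "Disk rotational delay (5 ms)": 5e-3,
    "Disk seek (10 ms)": 10e-3,
    "Long-distance ping (100 ms)": 100e-3,
}


def instructions_lost(latency_sec, ips=INSTRUCTIONS_PER_SEC):
    """Instructions one core could have executed while waiting latency_sec."""
    return round(latency_sec * ips)


def transfer_time_sec(num_bytes, bits_per_sec):
    """Time to move num_bytes over a link of the given bandwidth (latency ignored)."""
    return num_bytes * 8 / bits_per_sec


if __name__ == "__main__":
    for name, latency in LATENCIES_SEC.items():
        print(f"{name}: about {instructions_lost(latency):,} instructions")
```

Under that assumption a single 5 ms rotational delay costs on the order of ten million instruction times, which is why the later slides spend so much effort hiding latency.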


Making Systems Faster: Single Thread Speed


Sharing the CPU

Multiple programs running at once

[Diagram: PowerPt and Browser running on the OPERATING SYSTEM, sharing the CPU and MAIN MEMORY]


What affects speed of a single program?

[Diagram: Browser Code and System Library Code running on the OPERATING SYSTEM, using the CPU and MAIN MEMORY]

Browser code
– How well is code written?
– In what language?
– Compiler optimization?

System library code
– How efficient are system libraries (including malloc, sqrt)?

Operating system
– How efficient is the OS (including file I/O, networking stack)?


What affects speed of a single program?

[Diagram: Browser Code and System Library Code (e.g. sqrt) running on the OPERATING SYSTEM, using the CPU + GPU and MAIN MEMORY]

– How powerful is the GPU? How much memory does it have?
– A GPU is intended to speed graphics operations with a CPU-like core optimized for parallel work and data streaming
– How well do the application and associated libraries use the GPU?

Note that GPUs are also useful for general parallel computation


What affects speed of a single program?

[Diagram: Browser Code and System Library Code (e.g. sqrt) running on the OPERATING SYSTEM, using the CPU and MAIN MEMORY]

– Network connection performance
– Speed/capacity of storage devices


Making Systems Faster: Hiding Latency


Hard disks are slow

[Diagram: disk platter with one sector highlighted]


Handling disk data the slow way

The Slow Way
• Read a block
• Compute on block
• Read another block
• Compute on other block
• Rinse and repeat

The computer waits milliseconds while reading the disk: at a few billion instructions per second, that is millions of instruction times!


Faster way: overlap to hide latency

The Faster Way
• Read a block
• Start reading another block
• Compute on 1st block
• Start reading 3rd block
• Compute on 2nd block
• Rinse and repeat

Buffering: we’re reading ahead…computing while reading!
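The read-ahead steps above can be sketched in a few lines. This is a minimal illustration, not how any particular OS or disk driver does it: `read_block` and `compute` are hypothetical callables supplied by the caller, and a single background thread stands in for the disk working while the CPU computes.

```python
from concurrent.futures import ThreadPoolExecutor


def process_with_read_ahead(read_block, compute, num_blocks):
    """Overlap reading block i+1 with computing on block i.

    read_block(i) and compute(block) are hypothetical callables; a single
    background thread plays the role of the disk.
    """
    results = []
    if num_blocks == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as reader:
        pending = reader.submit(read_block, 0)           # read a block
        for i in range(num_blocks):
            block = pending.result()                     # wait until block i has arrived
            if i + 1 < num_blocks:
                pending = reader.submit(read_block, i + 1)  # start reading the next block
            results.append(compute(block))               # compute while that read proceeds
    return results
```

If reading and computing take about the same time per block, total time approaches half of the slow way, because the wait for the disk is hidden behind useful work.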


Making Systems Faster: Bottlenecks and Parallelism


Parallelism and pipelining

Example task: adjust contrast and sharpness of an image


Parallelism


Multiple computers each take a piece of the image
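As a small sketch of this idea, the code below splits an image into strips and hands each strip to a separate worker. Everything here is an assumption made for illustration: the image is a list of rows of pixel values, `adjust_contrast` is a made-up per-pixel operation, and threads in one process stand in for the "multiple computers" on the slide.

```python
from concurrent.futures import ThreadPoolExecutor


def adjust_contrast(rows, factor=2):
    """Hypothetical per-pixel operation: scale each value and clamp to 255."""
    return [[min(255, p * factor) for p in row] for row in rows]


def split(rows, pieces):
    """Split the image (a list of rows) into contiguous strips of near-equal size."""
    size = max(1, -(-len(rows) // pieces))  # ceiling division, at least one row
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def parallel_adjust(image, workers=4):
    """Each worker takes one strip; results are stitched back in order."""
    strips = split(image, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = list(pool.map(adjust_contrast, strips))  # map preserves strip order
    return [row for strip in done for row in strip]
```

This works because each strip can be processed without looking at any other strip; operations that need neighbouring pixels across a strip boundary would require some overlap or communication between workers.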


Pipelining

Compute brightness range → Adjust brightness → Sharpen
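A pipeline of this shape can be sketched with one thread per stage and a queue between neighbouring stages, so that while one image is being sharpened the next is already having its brightness adjusted. The three stage functions are simplified stand-ins on flat lists of pixel values, invented for this example rather than taken from the course material.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Run one pipeline stage: apply func to each item until DONE arrives."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)
            return
        outbox.put(func(item))


def brightness_range(img):
    """Stage 1: find the darkest and brightest pixel."""
    return (img, min(img), max(img))


def adjust_brightness(args):
    """Stage 2: stretch pixel values to cover 0..255."""
    img, lo, hi = args
    span = (hi - lo) or 1
    return [(p - lo) * 255 // span for p in img]


def sharpen(img):
    """Stage 3: crude 1-D sharpening (illustrative only), clamped to 0..255."""
    out = []
    for i, p in enumerate(img):
        left = img[max(i - 1, 0)]
        right = img[min(i + 1, len(img) - 1)]
        out.append(max(0, min(255, 2 * p - (left + right) // 2)))
    return out


def run_pipeline(items, funcs):
    """Feed items through funcs, each running concurrently in its own thread."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)
    results = []
    while (out := queues[-1].get()) is not DONE:
        results.append(out)
    for t in threads:
        t.join()
    return results
```

Pipelining improves throughput (images finished per second) but not the response time for any single image, which still passes through every stage in turn.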


Amdahl’s claim: parallel processing won’t scale

1967: Major controversy …will parallel computers work?

“Demonstration is made of the continued validity of the single processor approach and of the weaknesses of the multiple processor approach in terms of application to real problems and their attendant irregularities.”
-- Gene Amdahl*

* Gene Amdahl, Validity of the single processor approach to achieving large scale computing capabilities, AFIPS spring joint computer conference, 1967 http://www-inst.eecs.berkeley.edu/~n252/paper/Amdahl.pdf


Amdahl: why no parallel scaling?

“The first characteristic of interest is the fraction of the computational load which is associated with data management housekeeping. This fraction […might eventually be reduced to 20%...]. The nature of this overhead appears to be sequential so that it is unlikely to be amenable to parallel processing techniques. Overhead alone would then place an upper limit on throughput of five to seven times the sequential processing rate.”
-- Gene Amdahl (ibid.)

In short: even if the part you’re optimizing went to zero time, the speedup would be only 5x.

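$$\text{Speedup} = \frac{1}{r_s + r_p / n}$$

where $r_s$ and $r_p$ are the sequential and parallel fractions of the computation ($r_s + r_p = 1$) and $n$ is the number of processors.

As $r_p / n \to 0$, $\text{Speedup} \to 1 / r_s$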
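To see how quickly the limit bites, the formula can be evaluated directly. The sketch below simply computes the expression above; the function name and the choice of 20% sequential work (taken from the quote) are the only assumptions.

```python
def amdahl_speedup(rs, n):
    """Speedup = 1 / (rs + rp / n), where rp = 1 - rs and n is the processor count."""
    if not 0.0 <= rs <= 1.0:
        raise ValueError("rs must be a fraction between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    rp = 1.0 - rs
    return 1.0 / (rs + rp / n)


if __name__ == "__main__":
    # With 20% sequential work, the speedup creeps toward 1 / 0.2 = 5 but never reaches it.
    for n in (1, 2, 8, 64, 1024):
        print(f"n = {n:5d}  speedup = {amdahl_speedup(0.2, n):.2f}")
```

Going from 64 to 1024 processors buys very little here, because the sequential 20% already dominates the running time.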


So…why does parallelism work after all?

Amdahl missed that as we got more parallelism, we would work on bigger problems!

• Simulations with more data points
• Indexing all the pages on the World Wide Web
• Serving search queries from all users of the Web
• Running word processors “in the cloud” for millions of users
• Etc.


Web Performance and Scaling


Web Performance & Scalability Goals

Overall Web Goals:
– Extraordinary scalability, with good performance
– Therefore…very high aggregate throughput (think of all the accesses being made this second)
– Economical to deploy (modest cost/user)
– Be a good citizen on the Internet

Web servers:
– Decent performance, high throughput and scalability

Web clients (browsers):
– Low latency (quick response for users)
– Reasonable burden on PC
– Minimize memory and CPU on small devices


What we’ve already studied about Web scalability…

Web builds on scalable, high-performance Internet infrastructure:
– IP
– DNS
– TCP

Decentralized administration & deployment
– The only thing resembling a global, central Web server is the DNS root
– URI generation

Stateless protocols
– Relatively easy to add servers


Web server scaling

[Diagram: Browser ↔ Web server (application logic) ↔ Data store (reservation records)]


Stateless HTTP protocol helps scalability

[Diagram: several Browsers and several Web servers (application logic) in front of a single Data store]
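One way to see why statelessness makes it easy to add servers: if every request carries everything needed to process it, and all lasting state lives in the shared data store, then any server can handle any request. The sketch below is a toy model with invented names (`DataStore`, `WebServer`, `dispatch`) and a trivial round-robin dispatcher; it is not a real HTTP stack.

```python
import itertools


class DataStore:
    """Shared store for reservation records (hypothetical stand-in for a database)."""

    def __init__(self):
        self.reservations = {}


class WebServer:
    """Keeps no per-client state between requests; everything lives in the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, request):
        # Each request is self-describing: method, user, and (for PUT) the flight.
        user = request["user"]
        if request["method"] == "PUT":
            self.store.reservations[user] = request["flight"]
        return {"served_by": self.name,
                "flight": self.store.reservations.get(user)}


store = DataStore()
servers = [WebServer(f"server-{i}", store) for i in range(3)]
round_robin = itertools.cycle(servers)  # trivially simple load balancer


def dispatch(request):
    """Send the request to whichever server is next; no session affinity needed."""
    return next(round_robin).handle(request)
```

Here a PUT handled by one server is visible to a GET handled by another, so adding a fourth server is just one more entry in the list. The shared data store remains the part that is harder to scale.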


Caching


There are only two hard things in Computer Science: cache invalidation and naming things.

-- Phil Karlton


Why does caching work at all?

Locality:
– In many computer systems, a small fraction of data gets most of the accesses
– In other systems, a slowly changing set of data is accessed repeatedly

History: use of memory by typical programs
– Denning’s Working Set Theory*
– Early demonstration of locality in program access to memory
– Justified paged virtual memory with LRU replacement algorithm
– Also indirectly explains why CPU caches work

But…not all data-intensive programs follow the theory:
– Video processing!
– Many simulations
– Hennessy and Patterson: running vector (think MMX/SIMD) data through the CPU cache was a big mistake in IBM mainframe vector implementations

* Peter J. Denning, The Working Set Model for Program Behavior, 1968 http://denninginstitute.com/pjd/PUBS/WSModel_1968.pdf

Also 2008 overview on locality from Denning: http://denninginstitute.com/pjd/PUBS/ENC/locality08.pdf
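The effect of locality on an LRU-managed cache can be demonstrated with a small simulation. This is a sketch of the general LRU idea using Python's `OrderedDict`, not a model of any particular virtual-memory system or CPU cache; the class and function names are made up for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        """Return the cached value for key, calling load(key) on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
        return value


def hit_rate(accesses, capacity):
    """Fraction of accesses served from an LRU cache of the given capacity."""
    cache = LRUCache(capacity)
    for key in accesses:
        cache.get(key, load=lambda k: k)
    return cache.hits / len(accesses)
```

With a skewed pattern such as `[1, 2, 1, 1, 2, 3, 1]` a two-entry cache already serves some accesses from the cache, while a streaming pattern such as `list(range(10)) * 2` gets no hits at all with the same cache. That is the video-processing and vector case from the slide: the data streams through and is evicted before it is reused.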


Why is caching hard?


Things change

Telling everyone when things change adds overhead

So, we’re tempted to cheat… …caches out of sync with reality


CPU Caching – Simple System

[Diagram: CPU ↔ CACHE ↔ Memory. A read request goes from the CPU to the cache and on to memory; the read data returns from memory through the cache to the CPU.]


CPU Caching – Simple System

[Diagram: CPU ↔ CACHE ↔ Memory. A repeated read request is answered with read data straight from the cache.]

Life is good: no traffic to slow memory


CPU Caching – Store Through Writing

[Diagram: CPU ↔ CACHE ↔ Memory. The write request and its data pass through the cache to memory.]

Everything is up-to-date… …but every write waits for slow memory!


CPU Caching – Store In Writing

[Diagram: CPU ↔ CACHE ↔ Memory. The write request and its data stop at the cache.]

The write is fast, but memory is out of date!


CPU Caching – Store In Writing

[Diagram: CPU ↔ CACHE ↔ Memory. The written data is held only in the cache.]

If we try to write data from memory to disk, the wrong data will go out!
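The difference between the two write policies can be made concrete with a toy model. The classes below are invented for illustration and ignore everything a real cache has to deal with (line sizes, eviction, reads); they only count how often slow memory is written and show when memory is out of date.

```python
class Memory:
    """Slow main memory; counts how many writes reach it."""

    def __init__(self):
        self.cells = {}
        self.writes = 0

    def write(self, addr, value):
        self.cells[addr] = value
        self.writes += 1


class StoreThroughCache:
    """Every write updates the cache and memory: always consistent, always slow."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory.write(addr, value)  # the CPU waits for this


class StoreInCache:
    """Writes stay in the cache until flushed: fast, but memory can be stale."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)            # remember that memory is out of date

    def flush(self):
        """Write dirty lines back, e.g. before a device reads memory directly."""
        for addr in sorted(self.dirty):
            self.memory.write(addr, self.lines[addr])
        self.dirty.clear()
```

Three writes to the same address cost three memory writes under store-through but only one (at flush time) under store-in. The price is that anything reading memory before the flush, such as a disk transfer, sees old data.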


Cache invalidation is hard!

[Diagram: CPU ↔ CACHE ↔ Memory, with written data held in the cache]

We can start to see why cache invalidation is hard!


Multi-core CPU caching

[Diagram: two CPUs and three CACHE boxes in front of Memory, linked by a Coherence Protocol; one CPU issues a write request]


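Multi-core CPU caching

[Diagram: two CPUs and three CACHE boxes in front of Memory, linked by a Coherence Protocol; one CPU issues a write request and the other a read request, and copies of the Data appear in several places]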


Multi-core CPU caching

[Diagram: two CPUs and three CACHE boxes in front of Memory, linked by a Coherence Protocol; one CPU issues a write request, a Disk issues a disk read request, and copies of the Data appear in several places]

A read from disk must flush all caches


Consistency vs. performance

Caching involves difficult tradeoffs

Coherence is the enemy of performance!
– This proves true over and over in lots of systems
– There’s a ton of research on weak-consistency models…

Weak consistency: let things get out of sync sometimes
– Programming: compilers and libraries can hide or even exploit weak consistency

Yet another example of leaky abstractions!


What about Web Caching?

Note: update rate on Web is mostly low – makes things easier!


Browsers have caches

[Diagram: Browser (e.g. Firefox; usually includes a cache!) ↔ Web Server (e.g. Apache)]

Browser cache prevents repeated requests for the same representations…even different pages share images, stylesheets, etc.


Web Reservation System

[Diagram: Browser or Phone App (iPhone or Android reservation application) → HTTP → Proxy Cache (optional! e.g. Squid) → HTTP → Web Server (application logic: flight reservation logic) → RPC? ODBC? Proprietary? → Data Store (reservation records)]

Many commercial applications work this way


HTTP Caches Help Web to Scale

[Diagram: three Browsers, three Web Proxy Caches, and three Web servers (application logic) in front of one Data store]


Web Caching Details


HTTP Headers for Caching

Cache-Control: max-age=<seconds>
– Server indicates how long response is good

Heuristics:
– If no explicit times, cache can guess

Caches check with server when content has expired
– Sends ordinary GET w/validator headers
– Validators: modified time (1 sec resolution); ETag (opaque code)
– Server returns “304 Not Modified” or new content

Cache-Control: override default caching rules
– E.g. client forces check for fresh copy
– E.g. client explicitly allows stale data (for performance, availability)

Caches inform downstream clients/proxies of response age

PUT/POST/DELETE clear caches, but…
– No guarantee that updates go through same proxies as all reads!
– Don’t mark as cacheable things you expect to update through parallel proxies!
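The freshness and revalidation rules above can be sketched as a small simulation. This is a toy model with invented classes (`OriginServer`, `HttpCache`), not a real HTTP implementation: responses are plain dictionaries, time is passed in as a number of seconds, and only `max-age` and ETag validation are modelled.

```python
class OriginServer:
    """Hypothetical origin with one resource; the ETag changes when the content does."""

    def __init__(self, body, max_age):
        self.body = body
        self.max_age = max_age
        self.version = 1
        self.full_responses = 0   # how many times the whole body was sent

    def update(self, body):
        self.body = body
        self.version += 1

    def get(self, if_none_match=None):
        etag = f'"v{self.version}"'
        if if_none_match == etag:
            # Validator matches: tell the cache its copy is still good.
            return {"status": 304, "etag": etag, "max_age": self.max_age}
        self.full_responses += 1
        return {"status": 200, "etag": etag, "max_age": self.max_age,
                "body": self.body}


class HttpCache:
    """Caches one response; revalidates with the ETag once max-age has passed."""

    def __init__(self, origin):
        self.origin = origin
        self.entry = None  # dict with body, etag, expires

    def get(self, now):
        entry = self.entry
        if entry and now < entry["expires"]:
            return entry["body"]                       # fresh: no traffic to origin
        resp = self.origin.get(entry["etag"] if entry else None)  # conditional GET
        if resp["status"] == 304:
            entry["expires"] = now + resp["max_age"]   # revalidated: reuse the body
            return entry["body"]
        self.entry = {"body": resp["body"], "etag": resp["etag"],
                      "expires": now + resp["max_age"]}
        return resp["body"]
```

Note that if the origin is updated while the cached copy is still within `max-age`, the cache keeps serving the old body until it expires. That is the "caches out of sync with reality" trade-off the server accepts when it chooses a `max-age`.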


Summary


We have studied some key principles and techniques relating to performance and scalability
– Hardware performance
– Single program issues (code quality, compiler, etc.)
– Hiding latency
– Parallelism and Amdahl’s law
– Buffering and caching
– Stateless protocols, etc.

The Web is a highly scalable system w/mostly OK performance

Web approaches to scalability
– Built on scalable Internet infrastructure
– Few single points of control (DNS root changes slowly and is available in parallel)
– Administrative scalability: no central Web site registry

Web performance and scalability
– Very high parallelism (browsers, servers all run in parallel)
– Stateless protocols support scale out
– Caching