Transcript of "Servers: Concurrency and Performance," Jeff Chase, Duke University (53 pages).

Page 1:

Servers: Concurrency and Performance

Jeff Chase, Duke University

Page 2:

HTTP Server

• HTTP Server
  – Creates a socket (socket)
  – Binds to an address
  – Listens to set up the accept backlog
  – Can call accept to block waiting for connections
  – (Can call select to check for data on multiple sockets)

• Handle request
  – GET /index.html HTTP/1.0\n
    <optional body, multiple lines>\n\n
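The steps above can be sketched in runnable form. The deck's pseudocode is C-like; this is an illustrative Python stand-in with a deliberately minimal handler (the thread and the response body are invented for the demo):

```python
import socket
import threading

def serve_once(ports, ready):
    # socket() -> bind() -> listen() -> accept(), as in the steps above.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))           # 0 = let the OS pick a free port
    srv.listen(5)                         # set up the accept backlog
    ports.append(srv.getsockname()[1])
    ready.set()
    conn, _ = srv.accept()                # blocks until a client connects
    request = conn.recv(4096).decode()    # "GET /index.html HTTP/1.0\r\n..."
    path = request.split()[1]             # second token is the requested path
    conn.sendall(f"HTTP/1.0 200 OK\r\n\r\nyou asked for {path}\n".encode())
    conn.close()
    srv.close()

ports, ready = [], threading.Event()
t = threading.Thread(target=serve_once, args=(ports, ready))
t.start()
ready.wait()                              # wait for the server to publish its port
cli = socket.create_connection(("127.0.0.1", ports[0]))
cli.sendall(b"GET /index.html HTTP/1.0\r\n\r\n")
reply = b""
while True:                               # read until the server closes (EOF)
    chunk = cli.recv(4096)
    if not chunk:
        break
    reply += chunk
reply = reply.decode()
cli.close()
t.join()
print(reply.splitlines()[0])              # -> HTTP/1.0 200 OK
```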

Page 3:

Inside your server

packet queues

listen queue

accept queue

Server application (Apache, Tomcat/Java, etc.)

Measures: offered load, response time, throughput, utilization

Page 4:

Example: Video On Demand

Client() {
  fd = connect("server");
  write(fd, "video.mpg");
  while (!eof(fd)) {
    read(fd, buf);
    display(buf);
  }
}

Server() {
  while (1) {
    cfd = accept();
    read(cfd, name);
    fd = open(name);
    while (!eof(fd)) {
      read(fd, block);
      write(cfd, block);
    }
    close(cfd);
    close(fd);
  }
}

[MIT/Morris]

How many clients can the server support? Suppose, say, 200 kbit/s video on a 100 Mbit/s network link?

Page 5:

Performance “analysis”

• Server capacity:
  – Network (100 Mbit/s)
  – Disk (20 Mbyte/s)

• Obtained performance: one client stream
• Server is limited by software structure
• If a video is 200 kbit/s, the server should be able to support more than one client.

[MIT/Morris]

500?
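A back-of-envelope check of the 500 figure, using the capacities on this slide (treating the earlier "200 kb/s" as 200 kbit/s):

```python
# Back-of-envelope capacity check for the video-on-demand example above.
link_bps   = 100 * 10**6     # 100 Mbit/s network link
disk_bps   = 20 * 8 * 10**6  # 20 Mbyte/s disk = 160 Mbit/s
stream_bps = 200 * 10**3     # one 200 kbit/s video stream

net_limit  = link_bps // stream_bps   # streams the network link can carry
disk_limit = disk_bps // stream_bps   # streams the disk can feed
print(net_limit, disk_limit)          # -> 500 800
```

So the hardware bottleneck is the network at about 500 concurrent streams; the single-client result comes from software structure, not hardware.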

Page 6:

WebServer Flow

TCP socket space

TCP socket state (server host 128.36.232.5; another host 128.36.230.2):

state: listening; address: {*.6789, *.*}; completed connection queue; sendbuf; recvbuf

state: listening; address: {*.25, *.*}; completed connection queue; sendbuf; recvbuf

state: established; address: {128.36.232.5:6789, 198.69.10.10:1500}; sendbuf; recvbuf

connSocket = accept()

Create ServerSocket

read request from connSocket

read local file

write file to connSocket

close connSocket

Discussion: what does each step do and how long does it take?

Page 7:

Web Server Processing Steps

Accept Client Connection
Read HTTP Request Header
Find File
Send HTTP Response Header
Read File / Send Data

Accepting a connection or reading a request may block waiting on the network; reading the file may block waiting on disk I/O.

Want to be able to process requests concurrently.

Page 8:

Process States and Transitions

States: running (user), running (kernel), ready, blocked.

Transitions: Run (ready → running), Sleep (running → blocked), Wakeup (blocked → ready), Yield (running → ready), and trap/return plus interrupt/exception between user-mode and kernel-mode running.

Page 9:

Server Blocking

• accept() when no connect requests are waiting on the listen queue
  – What if the server has multiple ports to listen on?
    • E.g., 80 for HTTP, 443 for HTTPS
• open/read/write on server files
• read() on a socket, if the client is sending too slowly
• write() on a socket, if the client is receiving too slowly
  – Yup, TCP has flow control, like pipes

What if the server blocks while serving one client, and another client has work to do?
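The flow-control point can be shown directly. A sketch using a connected socket pair (a stand-in for a TCP connection) whose receiver never reads: the buffers fill and the sender's writes can no longer proceed:

```python
import socket

# TCP-style flow control in action: if the receiver never reads, the sender's
# write() eventually cannot make progress, just like a full pipe.
a, b = socket.socketpair()      # connected pair; b never calls recv()
a.setblocking(False)            # so a full buffer raises instead of blocking
sent = 0
try:
    while True:
        sent += a.send(b"x" * 65536)
except BlockingIOError:
    pass                        # send buffer (plus b's receive buffer) is full
print(sent > 0)                 # -> True
a.close()
b.close()
```

With a blocking socket, that last send() would simply block the whole server thread, which is exactly the problem the slide raises.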

Page 10:

Under the Hood

CPU

I/O device

I/O request / I/O completion

start (arrival rate λ)

exit (throughput λ until some center saturates)

Page 11:

Concurrency and Pipelining

Before: CPU → DISK → NET, one request's block at a time (each stage idle while the others work).

After: CPU, DISK, and NET stages overlapped, pipelining blocks of multiple requests.

Page 12:

Better single-server performance

• Goal: run at the server's hardware speed
  – Disk or network should be the bottleneck

• Method:
  – Pipeline blocks of each request
  – Multiplex requests from multiple clients

• Two implementation approaches:
  – Multithreaded server
  – Asynchronous I/O

[MIT/Morris]

Page 13:

Concurrent threads or processes

• Using multiple threads/processes
  – so that only the flow processing a particular request is blocked
  – Java: extends Thread or implements Runnable interface

Example: a Multi-threaded WebServer, which creates a thread for each request

Page 14:

Multiple Process Architecture

• Advantages
  – Simple programming while addressing the blocking issue

• Disadvantages
  – Many processes; large context-switch overheads
  – Consumes much memory
  – Optimizations involving sharing information among processes (e.g., caching) are harder

Process 1 … Process N (separate address spaces), each running the full pipeline: Accept Conn → Read Request → Find File → Send Header → Read File / Send Data

Page 15:

Using Threads

• Advantages
  – Lower context-switch overheads
  – Shared address space simplifies optimizations (e.g., caches)

• Disadvantages
  – Need kernel-level threads (why?)
  – Some extra memory needed to support multiple stacks
  – Need thread-safe programs, synchronization

Thread 1 … Thread N (shared address space), each running the full pipeline: Accept Conn → Read Request → Find File → Send Header → Read File / Send Data

Page 16:

Multithreaded server

server() {
  while (1) {
    cfd = accept();
    read(cfd, name);
    fd = open(name);
    while (!eof(fd)) {
      read(fd, block);
      write(cfd, block);
    }
    close(cfd);
    close(fd);
  }
}

for (i = 0; i < 10; i++)
  threadfork(server);

• When waiting for I/O, thread scheduler runs another thread

• What about references to shared data?

• Synchronization

[MIT/Morris]
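A rough Python analogue of the pattern above, with a queue of request names standing in for accept(), and a lock guarding the shared data the slide asks about (the names `requests`, `stats`, and `server` are invented for the sketch):

```python
import queue
import threading

# Fork N server threads, each looping over "accepted" requests.
requests = queue.Queue()
stats = {"served": 0}          # shared data across threads
lock = threading.Lock()        # ...so it needs synchronization

def server():
    while True:
        name = requests.get()      # stands in for accept() + read(cfd, name)
        if name is None:
            return                 # shutdown sentinel
        _ = name.upper()           # stands in for the read/write copy loop
        with lock:                 # protect the shared counter
            stats["served"] += 1
        requests.task_done()

threads = [threading.Thread(target=server) for _ in range(10)]
for t in threads:
    t.start()
for i in range(100):
    requests.put(f"video{i}.mpg")
requests.join()                    # wait until every request is handled
for t in threads:
    requests.put(None)             # one sentinel per thread
for t in threads:
    t.join()
print(stats["served"])             # -> 100
```

When one thread blocks in `requests.get()` (the analogue of blocking I/O), the others keep serving, which is the whole point of the multithreaded structure.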

Page 17:

Event-Driven Programming

• One execution stream: no CPU concurrency.
• Register interest in events (callbacks).
• Event loop waits for events, invokes handlers.
• No preemption of event handlers.
• Handlers generally short-lived.

Event Loop

Event Handlers

[Ousterhout 1995]

Page 18:

Single Process Event Driven (SPED)

• Single-threaded
• Asynchronous (non-blocking) I/O
• Advantages
  – Single address space
  – No synchronization
• Disadvantages
  – In practice, disk reads still block

Event Dispatcher driving all stages: Accept Conn → Read Request → Find File → Send Header → Read File / Send Data

Page 19:

Asynchronous Multi-Process Event Driven (AMPED)

• Like SPED, but uses helper processes/threads for disk I/O
• Uses IPC to communicate with the helper processes
• Advantages
  – Shared address space for most web server functions
  – Concurrency for disk I/O
• Disadvantages
  – IPC between the main thread and the helper threads

Event Dispatcher driving the stages (Accept Conn → Read Request → Find File → Send Header → Read File / Send Data), with Helper 1 … Helper N handling disk I/O

This hybrid model is used by the “Flash” web server.

Page 20:

Event-Based Concurrent Servers Using I/O Multiplexing

• Maintain a pool of connected descriptors.
• Repeat the following forever:
  – Use the Unix select function to block until:
    • (a) a new connection request arrives on the listening descriptor, or
    • (b) new data arrives on an existing connected descriptor.
  – If (a), add the new connection to the pool of connections.
  – If (b), read any available data from the connection.
    • Close the connection on EOF and remove it from the pool.

[CMU 15-213]
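The loop described above, as a minimal Python sketch (assumed details: two scripted clients, tiny messages, and a run that stops once both connections have closed):

```python
import select
import socket

# One listening descriptor plus a pool of connected descriptors,
# all multiplexed by select() in a single thread.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(16)
port = listener.getsockname()[1]

# Two clients connect, each sends one message, then half-closes so the
# server eventually sees EOF on that connection.
clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(2)]
for i, c in enumerate(clients):
    c.sendall(b"hello %d" % i)
    c.shutdown(socket.SHUT_WR)

pool = [listener]          # the pool of descriptors select() watches
bufs = {}                  # per-connection bytes received so far
received = []              # completed messages
while len(received) < 2:
    readable, _, _ = select.select(pool, [], [])
    for s in readable:
        if s is listener:                  # (a) new connection request
            conn, _ = s.accept()
            pool.append(conn)
            bufs[conn] = b""
        else:
            data = s.recv(4096)            # (b) data on a connection
            if data:
                bufs[s] += data
            else:                          # EOF: close, remove from pool
                received.append(bufs.pop(s))
                pool.remove(s)
                s.close()

listener.close()
for c in clients:
    c.close()
print(sorted(received))    # -> [b'hello 0', b'hello 1']
```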

Page 21:

Select

• If a server has many open sockets, how does it know when one of them is ready for I/O?

int select(int n, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

• Issues with scalability: alternative event interfaces have been offered.
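One reason select() scales poorly is that it scans all n descriptors on every call; alternatives such as epoll (Linux) and kqueue (BSD) report only the ready ones. As an illustrative sketch, Python's selectors module picks the best mechanism available on the platform:

```python
import selectors
import socket

# DefaultSelector wraps epoll/kqueue/etc. where available and reports
# only the descriptors that are actually ready.
sel = selectors.DefaultSelector()
a, b = socket.socketpair()
sel.register(a, selectors.EVENT_READ, data="conn-a")  # attach app state
b.sendall(b"ping")                 # make side a readable
events = sel.select(timeout=1)     # returns only the ready descriptors
key, mask = events[0]
msg = a.recv(4)
print(key.data, msg)               # -> conn-a b'ping'
sel.close()
a.close()
b.close()
```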

Page 22:

Asynchronous I/O

struct callback {
  bool (*is_ready)();
  void (*cb)(void *);
  void *arg;
};

main() {
  while (1) {
    for (c = each callback) {
      if (c->is_ready())
        c->cb(c->arg);
    }
  }
}

• Code is structured as a collection of handlers
• Handlers are nonblocking
• Create new handlers for blocking operations
• When the operation completes, call the handler

[MIT/Morris]
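The same structure rendered in Python, with (is_ready, handler, arg) tuples standing in for struct callback; all the names here are invented for the sketch. The second handler is registered first but only becomes ready after the first fires, standing in for a pending I/O operation completing:

```python
log = []
callbacks = []                 # the collection of registered callbacks
state = {"first_done": False}  # shared state a pending "operation" flips

def add_callback(is_ready, handler, arg):
    callbacks.append((is_ready, handler, arg))

def first_ready():
    return True                # ready immediately

def first_handler(arg):
    log.append(("first", arg))
    state["first_done"] = True # "completes the I/O" the second waits on

def second_ready():
    return state["first_done"]

def second_handler(arg):
    log.append(("second", arg))

add_callback(second_ready, second_handler, "B")   # registered first...
add_callback(first_ready, first_handler, "A")     # ...but not ready first

# The event loop: poll every callback, fire the ready ones, until none remain.
while callbacks:
    for c in list(callbacks):
        is_ready, handler, arg = c
        if is_ready():
            callbacks.remove(c)
            handler(arg)
print(log)    # -> [('first', 'A'), ('second', 'B')]
```

Handlers run to completion with no preemption, so ordering comes entirely from readiness, not from registration order.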

Page 23:

Asynchronous server

init() {
  on_accept(accept_cb);
}
accept_cb(cfd) {
  on_readable(cfd, name_cb);
}
on_readable(fd, fn) {
  c = new callback(test_readable, fn, fd);
  add c to callback list;
}
name_cb(cfd) {
  read(cfd, name);
  fd = open(name);
  on_readable(fd, read_cb);
}
read_cb(cfd, fd) {
  read(fd, block);
  on_writable(cfd, write_cb);
}
write_cb(cfd, fd) {
  write(cfd, block);
  on_readable(fd, read_cb);
}

[MIT/Morris]

Page 24:

Multithreaded vs. Async

Multithreaded:
• Hard to program
  – Locking code
  – Need to know what blocks
• Coordination explicit
• State stored on the thread's stack
  – Memory allocation implicit
• Context switch may be expensive
• Multiprocessors

Async:
• Hard to program
  – Callback code
  – Need to know what blocks
• Coordination implicit
• State passed around explicitly
  – Memory allocation explicit
• Lightweight context switch
• Uniprocessors

[MIT/Morris]

Page 25:

Coordination example

• Threaded server:
  – Thread for the network interface
  – Interrupt wakes up the network thread
  – Protected (locks and condition variables) buffer shared between server threads and the network thread

• Asynchronous I/O:
  – Poll for packets
    • How often to poll?
  – Or, an interrupt generates an event
    • Be careful: disable interrupts when manipulating the callback queue.

[MIT/Morris]

Page 26:

Threads!

One View

Page 27:

Should You Abandon Threads?

• No: important for high-end servers (e.g., databases).

• But, avoid threads wherever possible:
  – Use events, not threads, for GUIs, distributed systems, low-end servers.
  – Only use threads where true CPU concurrency is needed.
  – Where threads are needed, isolate their usage in a threaded application kernel: keep most of the code single-threaded.

Threaded Kernel

Event-Driven Handlers

[Ousterhout 1995]

Page 28:

Another view

• Events obscure control flow
  – For programmers and tools

Threads:

thread_main(int sock) {
  struct session s;
  accept_conn(sock, &s);
  read_request(&s);
  pin_cache(&s);
  write_response(&s);
  unpin(&s);
}

pin_cache(struct session *s) {
  pin(s);
  if (!in_cache(s))
    read_file(s);
}

Events:

AcceptHandler(event e) {
  struct session *s = new_session(e);
  RequestHandler.enqueue(s);
}
RequestHandler(struct session *s) {
  ...; CacheHandler.enqueue(s);
}
CacheHandler(struct session *s) {
  pin(s);
  if (!in_cache(s)) ReadFileHandler.enqueue(s);
  else ResponseHandler.enqueue(s);
}
...
ExitHandler(struct session *s) {
  ...; unpin(s); free_session(s);
}

Stages: Accept Conn → Read Request → Pin Cache → Read File → Write Response → Exit (Web Server)

[von Behren]

Page 29:

Control Flow

Stages: Accept Conn → Read Request → Pin Cache → Read File → Write Response → Exit (Web Server)

Threads:

thread_main(int sock) {
  struct session s;
  accept_conn(sock, &s);
  read_request(&s);
  pin_cache(&s);
  write_response(&s);
  unpin(&s);
}

pin_cache(struct session *s) {
  pin(s);
  if (!in_cache(s))
    read_file(s);
}

Events:

AcceptHandler(event e) {
  struct session *s = new_session(e);
  RequestHandler.enqueue(s);
}
RequestHandler(struct session *s) {
  ...; CacheHandler.enqueue(s);
}
CacheHandler(struct session *s) {
  pin(s);
  if (!in_cache(s)) ReadFileHandler.enqueue(s);
  else ResponseHandler.enqueue(s);
}
...
ExitHandler(struct session *s) {
  ...; unpin(s); free_session(s);
}

• Events obscure control flow
  – For programmers and tools

[von Behren]

Page 30:

Exceptions

• Exceptions complicate control flow
  – Harder to understand program flow
  – Cause bugs in cleanup code

Stages: Accept Conn → Read Request → Pin Cache → Read File → Write Response → Exit (Web Server)

Threads:

thread_main(int sock) {
  struct session s;
  accept_conn(sock, &s);
  if (!read_request(&s))
    return;
  pin_cache(&s);
  write_response(&s);
  unpin(&s);
}

pin_cache(struct session *s) {
  pin(s);
  if (!in_cache(s))
    read_file(s);
}

Events:

AcceptHandler(event e) {
  struct session *s = new_session(e);
  RequestHandler.enqueue(s);
}
RequestHandler(struct session *s) {
  ...; if (error) return; CacheHandler.enqueue(s);
}
CacheHandler(struct session *s) {
  pin(s);
  if (!in_cache(s)) ReadFileHandler.enqueue(s);
  else ResponseHandler.enqueue(s);
}
...
ExitHandler(struct session *s) {
  ...; unpin(s); free_session(s);
}

[von Behren]

Page 31:

State Management

Threads:

thread_main(int sock) {
  struct session s;
  accept_conn(sock, &s);
  if (!read_request(&s))
    return;
  pin_cache(&s);
  write_response(&s);
  unpin(&s);
}

pin_cache(struct session *s) {
  pin(s);
  if (!in_cache(s))
    read_file(s);
}

Events:

AcceptHandler(event e) {
  struct session *s = new_session(e);
  RequestHandler.enqueue(s);
}
RequestHandler(struct session *s) {
  ...; if (error) return; CacheHandler.enqueue(s);
}
CacheHandler(struct session *s) {
  pin(s);
  if (!in_cache(s)) ReadFileHandler.enqueue(s);
  else ResponseHandler.enqueue(s);
}
...
ExitHandler(struct session *s) {
  ...; unpin(s); free_session(s);
}

Stages: Accept Conn → Read Request → Pin Cache → Read File → Write Response → Exit (Web Server)

• Events require manual state management
• Hard to know when to free

– Use GC or risk bugs

[von Behren]

Page 32:

Thread 1 … Thread N, each running the full pipeline: Accept Conn → Read Request → Find File → Send Header → Read File / Send Data

Page 33:

Internet Growth and Scale

The Internet

How to handle all those client requests raining on your server?

Page 34:

Servers Under Stress

[Figure: performance vs. load (concurrent requests, or arrival rate). Ideal: performance rises with load. Peak: some resource at max. Overload: some resource thrashing.]

[von Behren]

Page 35:

Response Time

Components:
• Wire time (request) +
• Queuing time +
• Service demand +
• Wire time (response)

Depends on:
• Cost/length of request
• Load conditions at server

[Figure: latency vs. offered load]

Page 36:

Queuing Theory for Busy People

• Big assumptions:
  – Queue is first-come-first-served (FIFO, FCFS).
  – Request arrivals are independent (Poisson arrivals).
  – Requests have independent service demands.
  – i.e., arrival interval and service demand are exponentially distributed (denoted "M").

M/M/1 Service Center: a request stream at arrival rate λ (the offered load) waits in the queue, then is processed with mean service demand D.

Page 37:

Utilization

• What is the probability that the center is busy?
  – Answer: some number between 0 and 1.
• What percentage of the time is the center busy?
  – Answer: some number between 0 and 100.
• These are interchangeable: called utilization U.
• If the center is not saturated, i.e., it completes all its requests in some bounded time, then:
  – U = λD (arrivals/T * service demand)
  – the "Utilization Law"
• The probability that the service center is idle is 1 − U.
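A quick worked example of the Utilization Law (the numbers are invented for illustration):

```python
# Utilization Law, U = lambda * D: requests arriving at 100/s, each needing
# 5 ms of service, keep the center busy half the time.
arrival_rate = 100.0      # lambda, requests per second
service_demand = 0.005    # D, seconds of service per request
U = arrival_rate * service_demand
print(round(U, 3), round(1 - U, 3))   # -> 0.5 0.5  (busy half, idle half)
```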

Page 38:

Little's Law

• For an unsaturated queue in steady state, mean response time R and mean queue length N are governed by:

Little's Law: N = λR

• Why? Suppose a task T is in the system for R time units. During that time:
  – λR new tasks arrive.
  – N tasks depart (all the tasks that were ahead of T).
• But in steady state, flow in balances flow out, so these counts are equal: N = λR.
  – Note: this also means that throughput X = λ.
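A quick worked example of Little's Law (illustrative numbers):

```python
# Little's Law, N = lambda * R: at 100 requests/s with a 50 ms mean
# response time, an average of 5 requests are in the system at once --
# a quick way to size the queue's storage.
arrival_rate = 100.0   # lambda, requests per second
R = 0.050              # mean response time, seconds
N = arrival_rate * R
print(round(N))        # -> 5
```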

Page 39:

Inverse Idle Time "Law"

[Figure: R grows without bound as U approaches 1 (100%).]

The service center saturates as 1/λ approaches D: small increases in λ cause large increases in the expected response time R.

Little's Law gives response time R = D/(1 − U).

Intuitively, each task T's response time is R = D + DN (its own demand plus the demand of the N tasks ahead of it).
Substituting λR for N: R = D + DλR.
Substituting U for λD: R = D + UR.
So R − UR = D, i.e., R(1 − U) = D, i.e., R = D/(1 − U).

Page 40:

Why Little’s Law Is Important

1. Intuitive understanding of FCFS queue behavior.
   • Compute response time from demand parameters (λ, D).
   • Compute N: how much storage is needed for the queue.

2. Notion of a saturated service center.
   – Response times rise rapidly with load and are unbounded.
   • At 50% utilization, a 10% increase in load increases R by about 10%.
   • At 90% utilization, a 10% increase in load increases R by about 10x.

3. Basis for predicting performance of queuing networks.
   • Cheap and easy "back of napkin" estimates of system performance based on observed behavior and proposed changes, e.g., capacity planning, "what if" questions.
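The two utilization claims can be checked numerically with R = D/(1 − U):

```python
# Response time R = D / (1 - U); take D = 1 so R is in units of D.
def R(U, D=1.0):
    return D / (1.0 - U)

# At 50% utilization, a 10% load increase takes U from 0.50 to 0.55:
r_low = R(0.55) / R(0.50)     # ~1.11: response time rises about 10%
# At 90% utilization, a 10% load increase takes U from 0.90 to 0.99:
r_high = R(0.99) / R(0.90)    # response time rises 10x
print(round(r_low, 2), round(r_high, 1))   # -> 1.11 10.0
```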

Page 41:

What does this tell us about server behavior at saturation?

Page 42:

Under the Hood

CPU

I/O device

I/O request / I/O completion

start (arrival rate λ)

exit (throughput λ until some center saturates)

Page 43:

Common Bottlenecks

• No more file descriptors
• Sockets stuck in TIME_WAIT
• High memory use (swapping)
• CPU overload
• Interrupt (IRQ) overload

[Aaron Bannert]

Page 44:

Scaling Server Sites: Clustering

Clients connect through a smart switch, using virtual IP addresses (VIPs), to a server array. The switch can operate at L4 (TCP) or L7 (HTTP, SSL, etc.).

Goals: server load balancing, failure detection, access control filtering, priorities/QoS, request locality, transparent caching.

What to switch/filter on?
• L3: source IP and/or VIP
• L4: (TCP) ports, etc.
• L7: URLs and/or cookies
• L7: SSL session IDs

Page 45:

Scaling Services: Replication

Distribute service load across multiple sites (Site A, Site B) reachable over the Internet.

How to select a server site for each client or request? Is it scalable?

Page 46:

Extra Slides

(Any new information on the following slides will not be tested.)

Page 47:

Event-Based Concurrent Servers Using I/O Multiplexing

• Maintain a pool of connected descriptors.
• Repeat the following forever:
  – Use the Unix select function to block until:
    • (a) a new connection request arrives on the listening descriptor, or
    • (b) new data arrives on an existing connected descriptor.
  – If (a), add the new connection to the pool of connections.
  – If (b), read any available data from the connection.
    • Close the connection on EOF and remove it from the pool.

[CMU 15-213]

Page 48:

Problems of Multi-Thread Server

• High resource usage, context-switch overhead, contended locks
• Too many threads → throughput meltdown, response-time explosion
• Solution: bound the total number of threads
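The proposed fix, sketched with Python's bounded thread pool (the `handle` function is a trivial stand-in for serving one request):

```python
from concurrent.futures import ThreadPoolExecutor

# Bound the total number of threads with a pool instead of forking one
# thread per request: requests queue up, but thread count stays fixed.
def handle(request_id):
    return request_id * 2            # stands in for serving one request

with ThreadPoolExecutor(max_workers=8) as pool:   # never more than 8 threads
    results = list(pool.map(handle, range(100)))
print(results[:3], len(results))     # -> [0, 2, 4] 100
```

100 requests are served by at most 8 threads, so memory and context-switch costs stay bounded even under load spikes.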

Page 49:

Event-Driven Programming

• Event-driven programming, also called asynchronous I/O
• Uses finite state machines (FSMs) to monitor the progress of requests
• Yields efficient and scalable concurrency
• Many examples: the Click router, the Flash web server, TP monitors, etc.

• Java: asynchronous I/O
  – for an example see: http://www.cafeaulait.org/books/jnp3/examples/12/
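For a flavor of the same style in code (not the Java example the slide links; names invented for the sketch), a minimal run with Python's asyncio, where each request is a coroutine whose progress the event loop tracks, much like an FSM per request:

```python
import asyncio

async def fetch(name, delay):
    await asyncio.sleep(delay)       # stands in for waiting on I/O
    return f"{name} done"

async def main():
    # Both requests wait concurrently: total time ~ max(delay), not the sum.
    return await asyncio.gather(fetch("a", 0.01), fetch("b", 0.01))

results = asyncio.run(main())
print(results)                       # -> ['a done', 'b done']
```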

Page 50:

Traditional Processes

• Expensive and "heavyweight"
• One system call per process
• Fork overhead
• Coordination

Page 51:

Events

• Need async I/O
• Need select
• Wasn't originally available
• Not standardized
• Immature
• But efficient
• Code is distributed all through the program
• Harder to debug and understand

Page 52:

Threads

• Separate interface and implementation
• Pthreads interface
• Implementation is user-level or kernel ("native")
• If user-level, needs async I/O
• But hides the abstraction behind the thread interface

Page 53:

Reference

Valeria Cardellini, Emiliano Casalicchio, Michele Colajanni, and Philip S. Yu, "The State of the Art in Locally Distributed Web-server Systems."