
1

Combining Events and Threads for Scalable Network Services

Peng Li and Steve Zdancewic

University of Pennsylvania

PLDI 2007, San Diego

2

Overview

A Haskell framework for massively concurrent network applications: servers, P2P systems, load generators

Massive concurrency ::= 1,000 threads? (easy)

| 10,000 threads? (common)

| 100,000 threads? (challenging)

| 1,000,000 threads? (20 years later?)

| 10,000,000 threads? (in 15 minutes)

How to write such programs?

The very first decision to make: the programming model. Shall we use threads or events?

Haskell: a lazy, purely functional programming language (http://www.haskell.org)

3

Threads vs. Events

The multithreaded model:
• One thread ↔ one client
• Synchronous I/O
• Scheduling: OS/runtime libs

int send_data(int fd1, int fd2) {
    while (!EOF(fd1)) {
        size = read_chunk(fd1, buf, count);  /* read from the source descriptor */
        write_chunk(fd2, buf, size);         /* write to the destination descriptor */
    }
    …

The event-driven model:
• One thread ↔ 10000 clients
• Asynchronous I/O
• Scheduling: programmer

while (1) {
    nfds = epoll_wait(kdpfd, events, MAXEVT, -1);  /* block until events arrive */
    for (n = 0; n < nfds; ++n)
        handle_event(events[n]);
    …

Threads and events compared:

Expressiveness and abstraction (for programming each client)
• Threads: synchronous I/O + intuitive control-flow primitives
• Events: finite state machines / continuation-passing style (CPS) programming

Flexibility and control (for resource scheduling)
• Threads: baked into OS/runtime, difficult to customize
• Events: programmer has complete control, tailored to each application’s needs

“Why threads are a bad idea (for most purposes)” [USENIX ATC 1999]

“Why events are a bad idea (for high-concurrency servers)” [HotOS 2003]

4

Can we get the best of both worlds?

The bridge between threads and events? (some kind of “continuation” support)

Resource scheduling: events
• Written as part of the application
• Tailored to application’s needs

Programming with each client: threads
• Synchronous I/O
• Intuitive control-flow primitives

One application program

5

Roads to lightweight, application-level concurrency

Direct language support for continuations:
• Good if you have them

Source-to-source CPS translations:
• Requires hacking on compiler/runtime
• Often not very elegant

Other solutions? (no language support) (no compiler/runtime hacks)

6

The poor man’s concurrency monad

“A poor man’s concurrency monad” by Koen Claessen, JFP 1999 (Functional Pearl)

The thread interface: the CPS monad

The event interface: a lazy, tree-like data structure called “trace”

server_loop s = do {
    sock <- sock_accept s;
    sys_fork (client_loop sock);
    server_loop s;
}

client_loop sock = do {
    sock_send sock buf;
    sock_close sock;
}

sock_send sock buf = do {
    ...
    n <- sys_nbio (write_nb ...);
    ...;
    sys_epoll_wait sock EPOLL_READ;
    ...
    foo;
    ...
    n <- sys_nbio (write_nb ...);
    ...
}

scheduler = do {
    ...
    trace <- fetch_thread;
    execute trace;
    ...
}

execute trace = case trace of
    SYS_NBIO c -> do { cont <- c; execute cont; }
    SYS_FORK t1 t2 -> ...

[Figure: the multithreaded code (thread abstraction) is mapped through the CPS monad to a trace (internal representation), which the scheduler code consumes (event abstraction). Trace nodes shown: SYS_NBIO(accept), SYS_EPOLL_WAIT(s), SYS_FORK, SYS_NBIO(write_nb), SYS_EPOLL_WAIT(sock).]
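The pieces on this slide (a CPS monad for the thread side, a lazy trace for the event side, and a scheduler that walks the trace) can be sketched in a few dozen lines. This is a minimal illustration, not the framework's actual code: `SYS_NBIO`, `SYS_FORK`, `sys_nbio` and `sys_fork` follow the names on the slide, while `SYS_YIELD`, `SYS_RET`, `sys_yield`, `buildTrace`, `runScheduler`, `worker` and `demo` are hypothetical names introduced here, and a plain list stands in for a real ready queue.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- The trace: the scheduler's view of a thread, one node per "system call".
data Trace
  = SYS_NBIO (IO Trace)   -- run a nonblocking I/O action, which yields the rest
  | SYS_FORK Trace Trace  -- child trace and parent trace
  | SYS_YIELD Trace       -- hypothetical: give other threads a turn
  | SYS_RET               -- hypothetical: the thread has finished

-- The CPS monad: a computation is a function from its continuation to a trace.
newtype M a = M { runM :: (a -> Trace) -> Trace }

instance Functor M where
  fmap f (M m) = M (\k -> m (k . f))

instance Applicative M where
  pure x = M (\k -> k x)
  M mf <*> M mx = M (\k -> mf (\f -> mx (k . f)))

instance Monad M where
  M m >>= f = M (\k -> m (\x -> runM (f x) k))

-- Convert a thread to its (lazily built) trace; the final continuation ends it.
buildTrace :: M a -> Trace
buildTrace (M m) = m (const SYS_RET)

-- The "system calls" simply construct trace nodes.
sys_nbio :: IO a -> M a
sys_nbio io = M (\k -> SYS_NBIO (fmap k io))

sys_fork :: M () -> M ()
sys_fork child = M (\k -> SYS_FORK (buildTrace child) (k ()))

sys_yield :: M ()
sys_yield = M (\k -> SYS_YIELD (k ()))

-- A round-robin scheduler over a list used as a ready queue.
runScheduler :: [Trace] -> IO ()
runScheduler [] = return ()
runScheduler (t : ts) = case t of
  SYS_NBIO io           -> do { next <- io; runScheduler (next : ts) }
  SYS_FORK child parent -> runScheduler (ts ++ [child, parent])
  SYS_YIELD next        -> runScheduler (ts ++ [next])
  SYS_RET               -> runScheduler ts

-- A thread written in ordinary monadic style: log a step, then yield.
worker :: IORef [String] -> String -> Int -> M ()
worker logRef name n = mapM_ step [1 .. n]
  where
    step i = do
      sys_nbio (modifyIORef logRef (++ [name ++ show i]))
      sys_yield

-- Fork one worker, run another in the parent, and return the log.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  let mainThread = do
        sys_fork (worker logRef "a" 2)
        worker logRef "b" 2
  runScheduler [buildTrace mainThread]
  readIORef logRef
```

Loading this in GHCi and evaluating `demo` shows the two workers' log entries interleaved, even though each worker is written as straight-line monadic code. The scheduling policy lives entirely in `runScheduler`, which is ordinary application code and can be swapped out.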

7

Questions on the poor man’s approach

Does it work for high-performance network services? (using a pure, lazy, functional language?)

How does the design scale up to real systems? Symmetric multiprocessing? Synchronization? I/O?

How cheap is it? How much does a poor man’s thread cost?

How poor is it? Does it offer acceptable performance?

8

Our experiment

A high-performance Haskell framework for massively concurrent network services

Supported features:
• Linux Asynchronous IO (AIO)
• epoll() and nonblocking IO
• OS thread pools
• SMP support
• Thread synchronization primitives

Applications developed:
• IO benchmarks on FIFO pipes / disk head scheduling
• A simple web server for static files
• HTTP load generator
• Prototype of an application-level TCP stack

We used the Glasgow Haskell Compiler (GHC)

9

Multithreaded code example

[Code figure not recoverable. Annotations on the code:]
• Exception handling
• Nested function calls
• Conditional branches
• Synchronous call to I/O lib
• Recursion

10

Event-driven code example

[Code figure not recoverable. Annotations on the code:]
• A wrapper function to the C library call using the Haskell Foreign Function Interface (FFI)
• Put events in queues for processing in other OS threads
• An event loop running in a separate OS thread
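The queue-plus-event-loop structure described in these annotations can be sketched as follows. This is an illustration under stated assumptions, not the code from the slide: the `Event` type, `eventLoop` and `demoLoop` are hypothetical names, an in-memory `Chan` stands in for the events that the real system would obtain from `epoll_wait` through an FFI wrapper, and `forkIO` is used instead of `forkOS` so the sketch also runs without GHC's threaded runtime.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- Hypothetical event type: a descriptor became readable, or the loop should stop.
data Event = Readable Int | Shutdown

-- The event loop: take one event at a time and put the result on the ready
-- queue, where other threads can pick it up.
eventLoop :: Chan Event -> Chan String -> MVar () -> IO ()
eventLoop events ready done = loop
  where
    loop = do
      ev <- readChan events
      case ev of
        Readable fd -> do
          writeChan ready ("fd " ++ show fd ++ " ready")
          loop
        Shutdown -> putMVar done ()  -- signal that the loop has exited

-- Start the loop in its own thread, feed it events, and drain the ready queue.
demoLoop :: [Int] -> IO [String]
demoLoop fds = do
  events <- newChan
  ready <- newChan
  done <- newEmptyMVar
  _ <- forkIO (eventLoop events ready done)
  mapM_ (writeChan events . Readable) fds
  writeChan events Shutdown
  takeMVar done                        -- wait for the loop to finish
  mapM (const (readChan ready)) fds    -- one ready-queue entry per event
```

In the real system the loop would block in the operating system's event interface rather than on a `Chan`, which is why each such loop is given its own OS thread.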

11

A complete event-driven I/O subsystem

[Figure: architecture of the event-driven I/O subsystem. Recoverable labels:]
• worker_main (several instances): OS thread pool for CPS thread execution; one “virtual processor” event loop for each CPU; fetches threads from ready_queue
• ready_queue: system call completion, thread ready to run
• worker_epoll: epoll interface; register event handler; event notification
• worker_aio: AIO interface; submit AIO request; AIO completion
• worker_blio (several instances): OS thread pool for blocking I/O, fed through blio_queue (SYS_BLIO)
• Each event loop runs in a separate OS thread
• Haskell Foreign Function Interface (FFI)
• Other labels: fork, context switch, “with event handler”

12

Modular and customizable I/O system (add a TCP stack if you like)

[Figure: the same architecture as on the previous slide (worker_main, worker_epoll, worker_aio, worker_blio, ready_queue, blio_queue), extended with a TCP stack. Added labels: TCP stack states, TCP user requests, worker_tcp_input, worker_tcp_timer, request completion, blocking.]

Code needed to add the TCP stack:
• Define / interpret TCP syscalls (22 lines)
• Event loop for incoming packets (7 lines)
• Event loop for timers (9 lines)

13

How cheap is a poor man’s thread?

Minimal memory consumption: 48 bytes

• Each thread just loops and does nothing
• Actual size determined by thread-local states
• Even an Ethernet packet can be >1,000 bytes…
• Pay as you go: only pay for things needed

In contrast:
• A Linux POSIX thread’s stack is 2MB by default
• The state-of-the-art user-level thread system (Capriccio) uses at least a few KB for each thread

Observation: the poor man’s thread is extremely memory-efficient (challenging most event-driven systems)
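To put the per-thread figures above in perspective, the totals for a given thread count are simple multiplication. The sketch below is back-of-the-envelope arithmetic only, assuming “2MB” means 2 MiB and “32KB” (the limited NPTL stack used in the scalability test) means 32 KiB, and counting just the minimal per-thread cost; all names in it are hypothetical.

```haskell
-- Back-of-the-envelope memory totals for n threads.
-- Assumptions (not from the slides): "2MB" = 2 MiB, "32KB" = 32 KiB,
-- and only the minimal per-thread cost is counted.

kib, mib :: Integer
kib = 1024
mib = 1024 * kib

-- Total bytes needed for n threads at a fixed per-thread cost.
totalBytes :: Integer -> Integer -> Integer
totalBytes n perThread = n * perThread

-- Per-thread costs in bytes.
poorMansThread, nptlDefaultStack, nptlSmallStack :: Integer
poorMansThread   = 48        -- minimal poor man's thread
nptlDefaultStack = 2 * mib   -- default POSIX thread stack on Linux
nptlSmallStack   = 32 * kib  -- stack limit used in the scalability test

-- Render a byte count in whole MiB (rounded down).
inMiB :: Integer -> String
inMiB b = show (b `div` mib) ++ " MiB"

-- One line per thread system for a given thread count.
report :: Integer -> [String]
report n =
  [ "poor man's threads:   " ++ inMiB (totalBytes n poorMansThread)
  , "NPTL, 32 KiB stacks:  " ++ inMiB (totalBytes n nptlSmallStack)
  , "NPTL, default stacks: " ++ inMiB (totalBytes n nptlDefaultStack)
  ]
```

For ten million threads, `mapM_ putStrLn (report 10000000)` prints the three totals. The minimal poor man's threads come to 480,000,000 bytes, under half a gigabyte, while default-size stacks would reserve roughly 19 TiB of address space.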


14

I/O scalability test

Comparison against the Linux Native POSIX Thread Library (NPTL):
• Highly optimized OS thread implementation
• Each NPTL thread’s stack limited to 32KB

Mini-benchmarks used:
• Disk head scheduling (all threads running)
• FIFO pipe scalability with idle threads (128 threads running)

15

A simple web server

16

How poor is the poor man’s monad?

Not too shabby:
• Benchmarks show comparable (if not higher) performance to existing, optimized systems

An elegant design is more important than a 10% performance improvement

Added benefit: type safety for many dangerous things
• Continuations, thread queues, schedulers, asynchronous I/O

17

Related Work

We are motivated by two projects:

Twisted: the Python event-driven framework for scalable internet applications
- The programmer must write code in CPS

Capriccio: a high-performance user-level thread system for network servers
- Requires C compiler hacks
- Difficult to customize (e.g. adding SMP support)

Continuation-based concurrency: [Wand 80], [Shivers 97], …

Other languages and programming models: CML, Erlang, …

18

Conclusion

Haskell and the Poor Man’s Concurrency Monad are a promising solution for high-performance, massively concurrent networking applications:
• Get the best of both threads and events!
• This poor man’s approach is actually very cheap, and not so poor!

http://www.cis.upenn.edu/~lipeng/homepage/unify.html
