Capriccio: Scalable Threads for Internet Service

Introduction

- Internet services have ever-increasing scalability demands
- Current hardware is meeting these demands
  - Software has lagged behind
- Recent approaches are event-based
  - Pipeline stages of events

Drawbacks of Events

- Event systems hide the control flow
  - Difficult to understand and debug
  - Eventually evolved into call-and-return event pairs
    - Programmers need to match related events
    - Need to save/restore state
- Capriccio: instead of adopting the event-based model, fix the thread-based model
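
To make the contrast concrete, here is a minimal sketch (not taken from Capriccio) of the same two-step request written in both styles. All names (`fake_read`, `EventStyleHandler`, `thread_style_handler`) are hypothetical, and `fake_read` merely stands in for an I/O operation that would complete later.

```python
# Hypothetical illustration: one request reads a name, then a greeting.

def fake_read(key):
    """Stand-in for an I/O operation (assumption: returns immediately)."""
    data = {"name": "world", "greeting": "hello"}
    return data[key]

class EventStyleHandler:
    """Event style: each step is a separate callback, and state that must
    survive between the paired events is saved and restored by hand."""
    def __init__(self):
        self.name = None     # manually saved state
        self.result = None

    def on_name_ready(self, name):
        self.name = name     # save state for the matching event

    def on_greeting_ready(self, greeting):
        self.result = f"{greeting}, {self.name}"   # restore saved state

def thread_style_handler(read):
    """Thread style: ordinary call-and-return control flow; the local
    variable `name` simply lives on the thread's stack."""
    name = read("name")
    greeting = read("greeting")
    return f"{greeting}, {name}"

handler = EventStyleHandler()
handler.on_name_ready(fake_read("name"))
handler.on_greeting_ready(fake_read("greeting"))
event_result = handler.result
thread_result = thread_style_handler(fake_read)
```

Both versions compute the same string; the difference is that the event version splits one logical operation across two callbacks that the programmer must keep matched.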

Goals of Capriccio

- Support for existing thread API
  - Few changes to existing applications
- Scalability to thousands of threads
  - One thread per execution
- Flexibility to address application-specific needs

[Figure: performance vs. ease of programming, with points labeled Threads, Events, and Ideal]

Thread Design Principles

- Kernel-level threads are for true concurrency
- User-level threads provide a clean programming model with useful invariants and semantics
- Decouple user-level threads from kernel-level threads
  - More portable

Capriccio

- Thread package
  - All thread operations are O(1)
- Linked stacks
  - Address the problem of stack allocation for large numbers of threads
  - Combination of compile-time and run-time analysis
- Resource-aware scheduler

Thread Design and Scalability

- POSIX API
  - Backward compatible

User-Level Threads

Advantages:
- Performance
- Flexibility

Disadvantages:
- Complex preemption
- Bad interaction with the kernel scheduler

Flexibility

- Decoupling user and kernel threads allows faster innovation
- Can use new kernel thread features without changing application code
- Scheduler tailored for applications
- Lightweight

Performance

Reduce the overhead of thread synchronization

No kernel crossing for preemptive threading

More efficient memory management at user level

Disadvantages

- Need to replace blocking calls with nonblocking ones to hold the CPU
  - Translation overhead
- Problems with multiple processors
  - Synchronization becomes more expensive

Context Switches

Built on top of Edgar Toernig’s coroutine library

Fast context switches when threads voluntarily yield
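
The idea of switching only at voluntary yields can be sketched with Python generators standing in for coroutines. This is an illustration of cooperative switching in general, not the API of the coroutine library named above; `worker` and `run_round_robin` are hypothetical names.

```python
from collections import deque

def worker(name, steps, log):
    """A user-level 'thread': it runs until it voluntarily yields."""
    for i in range(steps):
        log.append((name, i))
        yield  # voluntary yield: control returns to the scheduler

def run_round_robin(threads):
    """Resume each runnable thread in turn until all have finished."""
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # "context switch" into the thread
            ready.append(t)  # it yielded, so it is still runnable
        except StopIteration:
            pass             # the thread finished

log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
```

Because a switch happens only at `yield`, the scheduler never has to interrupt a thread or save state it was not told about, which is what keeps this kind of switch cheap.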

I/O

- Capriccio intercepts blocking I/O calls
- Uses epoll for asynchronous I/O
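
A rough sketch of the pattern, under stated assumptions: generators play the role of user-level threads, a thread yields a `WaitRead` request where it would otherwise block, and the scheduler parks it until the descriptor is readable. Python's standard `selectors.DefaultSelector` is used for readiness notification (it is backed by epoll on Linux). `WaitRead`, `reader`, `writer`, and `run` are hypothetical names, and this is not how Capriccio itself is implemented.

```python
import selectors
import socket
from collections import deque

class WaitRead:
    """Yielded by a thread in place of a blocking read on `sock`."""
    def __init__(self, sock):
        self.sock = sock

def reader(sock, out):
    yield WaitRead(sock)         # where a blocking recv() would have been
    out.append(sock.recv(1024))  # data is ready, so this returns at once

def writer(sock, payload):
    sock.sendall(payload)
    yield                        # plain voluntary yield

def run(threads):
    sel = selectors.DefaultSelector()
    ready = deque(threads)
    waiting = 0
    while ready or waiting:
        if not ready:
            # Nothing runnable: wait until some registered socket is readable.
            for key, _ in sel.select():
                sel.unregister(key.fileobj)
                ready.append(key.data)   # wake the parked thread
                waiting -= 1
            continue
        t = ready.popleft()
        try:
            request = next(t)
        except StopIteration:
            continue
        if isinstance(request, WaitRead):
            sel.register(request.sock, selectors.EVENT_READ, data=t)
            waiting += 1
        else:
            ready.append(t)
    sel.close()

a, b = socket.socketpair()
received = []
run([reader(a, received), writer(b, b"ping")])
a.close()
b.close()
```

The thread code reads as if it blocked, while the scheduler underneath behaves like an event loop.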

Scheduling

Very much like an event-driven application

Events are hidden from programmers

Synchronization

- Supports cooperative threading on single-CPU machines
  - Requires only Boolean checks
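
Why a Boolean check suffices can be sketched as follows, assuming cooperative threads on one CPU that are switched out only at explicit yields. `CoopMutex`, `worker`, and `run` are hypothetical names for illustration, not Capriccio's implementation.

```python
from collections import deque

class CoopMutex:
    """Mutex for cooperative threads on one CPU: just a Boolean flag.
    No atomic instruction is needed, because no other thread can run
    between the check and the update."""
    def __init__(self):
        self.locked = False

    def try_lock(self):
        if self.locked:
            return False
        self.locked = True
        return True

    def unlock(self):
        self.locked = False

def worker(name, mutex, log):
    while not mutex.try_lock():
        yield                 # lock is held: yield and retry later
    log.append(f"{name} enter")
    yield                     # yield while still holding the lock
    log.append(f"{name} exit")
    mutex.unlock()

def run(threads):
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)
            ready.append(t)
        except StopIteration:
            pass

log = []
m = CoopMutex()
run([worker("a", m, log), worker("b", m, log)])
```

Thread `b` is scheduled while `a` holds the lock, sees the flag set, and yields, so the two critical sections never interleave.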

Threading Microbenchmarks

- SMP, two 2.4 GHz Xeon processors
- 1 GB memory
- Two 10K RPM SCSI Ultra II hard drives
- Linux 2.5.70
- Compared Capriccio, LinuxThreads, and Native POSIX Threads for Linux (NPTL)

Latencies of Thread Primitives

                          Capriccio   LinuxThreads   NPTL
Thread creation             21.5         21.5        17.7
Thread context switch        0.24         0.71        0.65
Uncontended mutex lock       0.04         0.14        0.15

Thread Scalability

- Producer-consumer microbenchmark
- LinuxThreads begins to degrade after 20 threads
- NPTL degrades after 100
- Capriccio scales to 32K producers and consumers (64K threads total)

I/O Performance

- Network performance
  - Token passing among pipes
  - Simulates the effect of slow client links
  - 10% overhead compared to epoll
  - Twice as fast as both LinuxThreads and NPTL with more than 1000 threads
- Disk I/O comparable to kernel threads

Linked Stack Management

- LinuxThreads allocates 2 MB per stack
- 1 GB of VM holds only 500 threads

[Figure: fixed stacks]
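
The arithmetic behind the 500-thread figure, as a quick check. It assumes binary units and that nothing else occupies the address space, so the slide's number reads as a rounded value. The 64K thread count in the last line is borrowed from the scalability benchmark purely to show the scale of the problem.

```python
MB = 2**20
GB = 2**30

stack_size = 2 * MB       # fixed stack per thread
address_space = 1 * GB    # virtual memory available for stacks

# How many fixed-size stacks fit in the address space.
max_threads = address_space // stack_size

# Conversely, the per-thread stack budget if 64K threads shared 1 GB.
budget_per_thread = address_space // (64 * 1024)
```

Here `max_threads` is 512 and `budget_per_thread` is 16 KB, far below a 2 MB fixed stack.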

Linked Stack Management

- But most threads consume only a few KB of stack space at a given time
- Dynamic stack allocation can significantly reduce the amount of VM required

[Figure: linked stack]

Compiler Analysis and Linked Stacks

- Whole-program analysis
  - Based on the call graph
  - Problematic for recursion
  - Static estimation may be too conservative

Compiler Analysis and Linked Stacks

- Grow and shrink the stack size on demand
- Insert checkpoints to determine whether we need to allocate more before the next checkpoint
- Results in noncontiguous stacks
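
A toy model of the checkpoint behavior, offered as a sketch under simplifying assumptions rather than Capriccio's actual mechanism: the stack is a list of fixed-size chunks, each checkpoint is given a bound on the stack needed before the next checkpoint, and a chunk linked at a checkpoint is unlinked when that call returns. `LinkedStack`, `enter`, and `leave` are hypothetical names, and the 4096-byte chunk size is arbitrary.

```python
class LinkedStack:
    """Stack made of separately allocated chunks instead of one region."""
    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = [0]   # bytes in use per linked chunk; last is current
        self.active = []    # (linked_new_chunk, bound) per checkpointed call

    def enter(self, bound):
        """Checkpoint: `bound` limits stack use until the next checkpoint.
        Returns True if a new chunk had to be linked."""
        if bound > self.chunk_size:
            raise ValueError("bound exceeds chunk size in this sketch")
        linked = self.chunks[-1] + bound > self.chunk_size
        if linked:
            self.chunks.append(0)        # link a fresh chunk
        self.chunks[-1] += bound
        self.active.append((linked, bound))
        return linked

    def leave(self):
        """Return from the call: release its space, unlink its chunk."""
        linked, bound = self.active.pop()
        self.chunks[-1] -= bound
        if linked:
            self.chunks.pop()            # chunk is empty again: unlink it

stack = LinkedStack(chunk_size=4096)
first = stack.enter(3000)    # fits in the initial chunk
second = stack.enter(2000)   # does not fit: a second chunk is linked
chunks_at_peak = list(stack.chunks)
stack.leave()
stack.leave()
```

At the peak the stack occupies two chunks that need not be adjacent in memory, which is the noncontiguity the slide refers to.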

Placing Checkpoints

One checkpoint in every cycle in the call graph

Bound the size between checkpoints with the deepest call path
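
The two rules can be sketched on a small call graph. This is a simplified illustration, not the paper's exact algorithm: a depth-first search marks the edges that close a cycle (each gets a checkpoint), and the stack needed between checkpoints is then bounded by the deepest remaining call path. The graph, the frame sizes, and the function names are all hypothetical.

```python
def find_back_edges(graph, root):
    """Return call edges (caller, callee) that close a cycle in a DFS."""
    back_edges = set()
    state = {}  # node -> "active" while on the DFS path, then "done"

    def visit(node):
        state[node] = "active"
        for callee in graph.get(node, []):
            if state.get(callee) == "active":
                back_edges.add((node, callee))   # this edge closes a cycle
            elif callee not in state:
                visit(callee)
        state[node] = "done"

    visit(root)
    return back_edges

def deepest_path(graph, frame_size, root, checkpoints):
    """Largest total frame size on any call path from `root` that does not
    cross a checkpointed edge (the graph is acyclic without those edges)."""
    def depth(node):
        best = 0
        for callee in graph.get(node, []):
            if (node, callee) not in checkpoints:
                best = max(best, depth(callee))
        return frame_size[node] + best
    return depth(root)

# Hypothetical program: handle() is recursive.
call_graph = {
    "main": ["handle", "log"],
    "handle": ["parse", "handle"],
    "parse": [],
    "log": [],
}
frame_size = {"main": 64, "handle": 128, "parse": 256, "log": 32}

checkpoints = find_back_edges(call_graph, "main")
bound = deepest_path(call_graph, frame_size, "main", checkpoints)
```

With the recursive edge checkpointed, the deepest checkpoint-free path is main, handle, parse, so 448 bytes is enough until the next checkpoint runs.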

Dealing with Special Cases

- Function pointers
  - Don’t know what procedure to call at compile time
  - Can find a potential set of procedures

Dealing with Special Cases

- External functions
  - Allow programmers to annotate external library functions with trusted stack bounds
  - Allow larger stack chunks to be linked for external functions

Tuning the Algorithm

- Stack space can be wasted
  - Internal and external fragmentation
- Tradeoffs
  - Number of stack linkings
  - External fragmentation

Memory Benefits

- Tuning can be application-specific
- No preallocation of large stacks
  - Reduced memory requirement for running a large number of threads
- Better paging behavior
  - Stacks are LIFO

Case Study: Apache 2.0.44

- Maximum stack allocation chunk: 2 KB
- Apache under SPECweb99
  - Overall slowdown is about 3%
  - Dynamic allocation: 0.1%
  - Link to large chunks for external functions: 0.5%
  - Stack removal: 10%

Resource-Aware Scheduling

- Advantages of event-based scheduling
  - Tailored for applications
  - With event handlers
- Events provide two important pieces of information for scheduling
  - Whether a process is close to completion
  - Whether a system is overloaded

Resource-Aware Scheduling

- Thread-based
  - View applications as a sequence of stages, separated by blocking calls
- Analogous to event-based scheduler

Blocking Graph

Node: A location in the program that blocked

Edge: between two nodes if they were consecutive blocking points

Generated at runtime
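
A minimal sketch of building such a graph at runtime, assuming each blocking point can be identified by a label. Plain strings such as "accept" and "read" are used here as stand-ins for program locations; `BlockingGraph` and `record_block` are hypothetical names.

```python
from collections import defaultdict

class BlockingGraph:
    def __init__(self):
        self.edges = defaultdict(int)  # (from_node, to_node) -> times seen
        self.last = {}                 # thread id -> its last blocking point

    def record_block(self, thread_id, location):
        """Called whenever a thread blocks at `location`."""
        prev = self.last.get(thread_id)
        if prev is not None:
            # Consecutive blocking points of one thread form an edge.
            self.edges[(prev, location)] += 1
        self.last[thread_id] = location

graph = BlockingGraph()
trace = [(1, "accept"), (1, "read"), (2, "accept"),
         (1, "write"), (2, "read"), (1, "read")]
for thread_id, location in trace:
    graph.record_block(thread_id, location)
```

Both threads contribute to the accept-to-read edge, so the graph summarizes the behavior of the application rather than of any single thread.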

Resource-Aware Scheduling

1. Keep track of resource utilization
2. Annotate each node with the resources used on its outgoing edges
3. Dynamically prioritize nodes
   - Prefer nodes that release resources
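
One way the three steps could fit together is sketched below. The scoring rule (expected resource change weighted by how close that resource is to its limit) is an assumption made for illustration and is not Capriccio's actual policy; the node names, numbers, and the `prioritize` function are hypothetical.

```python
def prioritize(nodes, usage, limits):
    """Order blocking-graph nodes for scheduling, best first.

    nodes:  node -> {resource: expected change if a thread there runs}
    usage:  resource -> current utilization
    limits: resource -> estimated capacity
    """
    def score(node):
        total = 0.0
        for resource, delta in nodes[node].items():
            pressure = usage[resource] / limits[resource]  # 0.0 .. 1.0
            total += delta * pressure  # releasing a scarce resource scores low
        return total
    return sorted(nodes, key=score)    # lowest score is scheduled first

# Step 2: nodes annotated with the resources used on their outgoing edges.
nodes = {
    "accept": {"fds": +1},
    "read": {"memory": +4},
    "close": {"fds": -1, "memory": -4},
}
# Step 1: tracked utilization; file descriptors are close to their limit.
usage = {"fds": 900, "memory": 100}
limits = {"fds": 1000, "memory": 1000}

# Step 3: dynamic prioritization.
order = prioritize(nodes, usage, limits)
```

With file descriptors nearly exhausted, threads about to close connections run first and threads about to accept new ones run last.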

Resources

- CPU
- Memory (malloc)
- File descriptors (open, close)

Pitfalls

- Tricky to determine the maximum capacity of a resource
  - Thrashing depends on the workload
    - A disk can handle more requests when they are sequential rather than random
  - Resources interact
    - VM vs. disk
- Applications may manage memory themselves

Yield Profiling

- User threads are problematic if a thread fails to yield
- Such threads are easy to detect, since their running times are orders of magnitude larger
- Yield profiling identifies places where programs fail to yield sufficiently often
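
A sketch of the idea, assuming generator-based threads that yield a label at each yield point. A fake clock is injected so the example is deterministic; real profiling would read a timer instead. `FakeClock`, `profile_yields`, `handler`, and the tick values are hypothetical.

```python
class FakeClock:
    """Deterministic stand-in for a timer (assumption for this sketch)."""
    def __init__(self):
        self.now = 0

    def __call__(self):
        return self.now

    def advance(self, ticks):
        self.now += ticks

def profile_yields(thread, clock, threshold):
    """Run a thread, timing each stretch between consecutive yields.
    Returns (label, elapsed) for stretches longer than `threshold`."""
    slow = []
    start = clock()
    for label in thread:      # resumes the thread up to its next yield
        elapsed = clock() - start
        if elapsed > threshold:
            slow.append((label, elapsed))
        start = clock()
    return slow

clock = FakeClock()

def handler():
    clock.advance(1)      # short stretch of work
    yield "parse"
    clock.advance(500)    # long stretch with no yield inside it
    yield "compress"
    clock.advance(2)
    yield "send"

slow = profile_yields(handler(), clock, threshold=100)
```

The stretch ending at "compress" stands out by orders of magnitude, pointing at code that should yield more often.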

Web Server Performance

- 4x500 MHz Pentium server
- 2 GB memory
- Intel e1000 Gigabit Ethernet card
- Linux 2.4.20
- Workload: requests for 3.2 GB of static file data

Web Server Performance

- Request frequencies match those of SPECweb99
- Each client repeatedly connects to the server and issues a series of five requests, separated by 20 ms pauses
- Apache’s performance improved by 15% with Capriccio

Resource-Aware Admission Control

- Consumer-producer applications
  - Producer loops, adding memory and randomly touching pages
  - Consumer loops, removing memory from the pool and freeing it
- Fast producer may run out of virtual address space

Resource-Aware Admission Control

Touching pages too quickly will cause thrashing

Capriccio can quickly detect the overload conditions and limit the number of producers

Programming Models for High Concurrency

- Events
  - Application-specific optimization
- Threads
  - Efficient thread runtimes

User-Level Threads

- Capriccio is unique
  - Blocking graph
  - Resource-aware scheduling
  - Targets a large number of blocking threads
  - POSIX compliant

Application-Specific Optimization

Most approaches require programmers to tailor their application to manage resources

Nonstandard APIs, less portable

Stack Management

No garbage collection

Future Work

- Multi-CPU machines
- Profiling tools for system tuning