Providing Multi-tenant Services with FPGAs: Case Study on a Key-Value Store
Zsolt István*, Gustavo Alonso, Ankit Singla
Systems Group, Computer Science Dept., ETH Zürich
* Now at IMDEA Software Institute, Madrid
FPGAs in the Cloud
• Wider adoption of FPGAs (e.g., Amazon F1, Microsoft Catapult, …)
• Many promising use-cases but often single-tenant designs
• Clouds built on sharing and multi-tenancy
❑ High utilization
❑ Flexible provisioning
❑ Load isolation and QoS guarantees
Providing multi-tenancy with FPGAs
Virtualization
• General purpose (PR)
• Few tenants
• Trades off functionality
• Coarse-grained resource alloc.
• Tenants “bring” applications
Multi-tenant applications
• Domain-specific
• Many tenants
• Trades off performance (?)
• Fine-grained resource alloc.
• Provider “brings” application
Multi-tenant application as a service
Key-value store
• Widely deployed in the cloud and datacenters
• Different tradeoffs but similar interfaces, e.g.:
  • Memcached – caching, no replication, latency-optimized, main-memory
  • Amazon S3 – BLOB store, replicated, BW-optimized, needs large capacity
Building a multi-tenant KVS (Multes)
• Area well studied in related work
  • Several pipelined designs, all saturate network link
  • Caribou: Interfaces and functionality similar to SW [VLDB17]
  • FPGA can provide replication for fault-tolerance [NSDI16]
• Requirements for multi-tenancy:
  • Performance isolation
  • Data isolation
  • Flexibility in resource allocation (focus on network bandwidth)
  • Efficient use of resources regardless of number of tenants
[VLDB17] Z. István, D. Sidler, G. Alonso: Caribou: Intelligent Distributed Storage.
[NSDI16] Z. István, D. Sidler, G. Alonso, M. Vukolic: Consensus in a Box: Inexpensive Coordination in Hardware.
Designing for multi-tenancy
• Caribou is composed of four modules
  • Requests can take various routes
  • Some traffic is inter-node
  • Hard to reason about load interactions!
• Multes: Reorganized pipeline to ensure all requests take same path (1)
  • Hash table implements parts of the replication log features (multi-version)
  • More coupling between modules (op-codes)
[Figure: Multes (single pipeline): Traffic Shapers, Network Stack (TCP), Replication, Multi-version Hash Table + Allocator, Value Access + Processing, Memory]
[Figure: Caribou: Network Stack (TCP), Replication + Log Manager, Hash Table + Allocator, Value Access + Processing, Memory; replication messages and client messages]
Token buckets
• Commonly used in networking scenarios
  • Max. number of tokens (D), adding C tokens every T cycles
  • Limits data rate, burst size
• Buffer space on the FPGA?
  • Queue commands before data movement
• Token buckets can be configured with no overhead at runtime (2)
  • Per-tenant allocations controlled by software
[Figure: Traffic Shaper: input packets/commands → Extract tenant ID → per-tenant Token Buckets → Round-robin → output packets/commands. Configuration: per-tenant limits (D, C, T). Request/command: metadata (encodes the “real cost” of the request) and body.]
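The token-bucket shaping described on this slide can be sketched in software as follows. This is a minimal illustration, not the Multes hardware design: the class and method names are hypothetical, the "cost" of a request is an abstract integer, and the rule that at most one command leaves per cycle is an assumption made to keep the round-robin step simple.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Holds at most D tokens; C tokens are added every T cycles."""
    D: int
    C: int
    T: int
    tokens: int = 0
    queue: deque = field(default_factory=deque)  # costs of queued commands

    def __post_init__(self):
        self.tokens = self.D  # start full, so a burst of up to D can pass

    def tick(self, cycle: int) -> None:
        # Refill on every T-th cycle, never exceeding the depth D.
        if cycle % self.T == 0:
            self.tokens = min(self.D, self.tokens + self.C)

    def try_release(self):
        # Release the head command only if the bucket covers its cost.
        if self.queue and self.queue[0] <= self.tokens:
            cost = self.queue.popleft()
            self.tokens -= cost
            return cost
        return None


class TrafficShaper:
    """One bucket per tenant, drained in round-robin order (hypothetical API)."""

    def __init__(self, limits):
        # limits: tenant id -> (D, C, T)
        self.buckets = {t: TokenBucket(*dct) for t, dct in limits.items()}
        self.order = list(self.buckets)
        self.next = 0  # index of the tenant served first in the next step

    def configure(self, tenant, D, C, T):
        # Runtime reconfiguration: only the parameters change.
        b = self.buckets[tenant]
        b.D, b.C, b.T = D, C, T
        b.tokens = min(b.tokens, D)

    def submit(self, tenant, cost):
        # Queue the command (its cost) before any data movement happens.
        self.buckets[tenant].queue.append(cost)

    def step(self, cycle):
        for b in self.buckets.values():
            b.tick(cycle)
        # Assumption: at most one command is released per cycle.
        n = len(self.order)
        for i in range(n):
            tenant = self.order[(self.next + i) % n]
            cost = self.buckets[tenant].try_release()
            if cost is not None:
                self.next = (self.next + i + 1) % n
                return tenant, cost
        return None
```

With two tenants limited to (D, C, T) = (4, 1, 2), a tenant that submits two commands of cost 4 gets the first one out immediately (the bucket starts full) and has to wait for the bucket to refill before the second one is released, while the other tenant's command is not delayed by it.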
Replicated KVS
• Caribou implements inter-FPGA replication (leader-based algorithm)
[Figure: FPGA nodes organized into per-tenant replicated groups (Tenant 1, Tenant 2, Tenant 3), with nodes acting as Leader or Replica]
Having multiple roles
• Control state machine at heart of replication protocol
  • Data and control handled separately
• Multiple copies not an option
  • Complex logic + plumbing
• SM extended to store state for each tenant – can context switch on each packet (3)
  • Not all states need tenant context
  • Latency inside SM not on critical path
  • Now in registers, but could use BRAMs to store state
[Figure: Replication controller (atomic broadcast protocol): input message → controller → output command, with separate Tenant 1, Tenant 2 and Tenant 3 state (list of peers, role in protocol, outstanding proposals, etc.); annotation: encodes key, data op., socket numbers, etc.]
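The idea of one shared state machine that switches tenant context on each packet can be sketched as follows. This is a toy leader-based exchange written for illustration, not the atomic broadcast protocol used in Caribou or Multes; all names (`TenantContext`, `ReplicationController`, the message kinds) are hypothetical. The point it shows is that the control logic exists once, while the per-tenant state (peers, role, outstanding proposals) is looked up by tenant ID for every message.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Role(Enum):
    LEADER = auto()
    REPLICA = auto()


@dataclass
class TenantContext:
    """Per-tenant protocol state; everything else in the controller is shared."""
    role: Role
    peers: list
    next_seq: int = 0
    outstanding: dict = field(default_factory=dict)  # seq -> acks received


class ReplicationController:
    def __init__(self, contexts):
        self.contexts = contexts  # tenant id -> TenantContext

    def handle(self, tenant, kind, sender=None, seq=None):
        # "Context switch": select this tenant's state for this one message.
        ctx = self.contexts[tenant]
        if kind == "write" and ctx.role is Role.LEADER:
            seq = ctx.next_seq
            ctx.next_seq += 1
            ctx.outstanding[seq] = 0
            return [("propose", peer, seq) for peer in ctx.peers]
        if kind == "propose" and ctx.role is Role.REPLICA:
            return [("ack", sender, seq)]
        if kind == "ack" and ctx.role is Role.LEADER and seq in ctx.outstanding:
            ctx.outstanding[seq] += 1
            group_size = len(ctx.peers) + 1
            # The leader counts itself; commit once a majority has the write.
            if 2 * (ctx.outstanding[seq] + 1) > group_size:
                del ctx.outstanding[seq]
                return [("commit", peer, seq) for peer in ctx.peers]
        return []
```

In this sketch the same node can be leader for one tenant and replica for another: `handle(1, "write")` produces proposals using tenant 1's peer list, while `handle(2, "propose", sender="n1", seq=7)` answers with an acknowledgement using tenant 2's role, without either call touching the other tenant's state.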
Evaluation of Multes
• Multiple Xilinx VC709s connected to a 10Gbps switch
• 9 load generating machines, Go-based benchmarking tool
• Tenants connect to different TCP port numbers (e.g. 2880, 2881, …)
✓ Multes offers flexible multi-tenancy while efficiently using the network link
[Figure: Evaluation setup: clients of Tenant 1 and Tenant 2, network, Multes, memory/storage, replication protocol]
No performance loss due to multi-tenancy
• Read-only throughput on a single node
Load isolation
• Replicated write latency of Tenant0 (group = 3)
• Additional tenants using their full read bandwidth (1/8 of 10Gbps)
[Figure: Replicated write latency [us] (without client overhead)]
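To make the "1/8 of 10Gbps" allocation concrete, the following sketch relates token-bucket parameters (D, C, T) to a data rate and a burst size. The slides state neither the clock frequency nor how many bytes one token represents, so the 156.25 MHz clock and the 8 bytes per token used here are assumptions chosen only so that one token per cycle corresponds to 10 Gbps; the function names are hypothetical.

```python
def shaped_rate_gbps(C, T, clock_hz, bytes_per_token):
    """Sustained rate when C tokens are added every T cycles."""
    bits_per_second = C * clock_hz * bytes_per_token * 8 / T
    return bits_per_second / 1e9


def max_burst_bytes(D, bytes_per_token):
    """Largest burst a full bucket of depth D lets through at once."""
    return D * bytes_per_token


CLOCK_HZ = 156_250_000   # assumed clock, not stated in the slides
BYTES_PER_TOKEN = 8      # assumed token granularity, not stated in the slides

full_link = shaped_rate_gbps(C=1, T=1, clock_hz=CLOCK_HZ, bytes_per_token=BYTES_PER_TOKEN)
one_eighth = shaped_rate_gbps(C=1, T=8, clock_hz=CLOCK_HZ, bytes_per_token=BYTES_PER_TOKEN)
```

Under these assumptions, adding one token every 8 cycles limits a tenant to one eighth of the link, and the depth D independently bounds how much data can be sent back-to-back after an idle period.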
Resource Usage: Small cost for sharing
[Figure: % of VC709 resources (Logic, BRAM) vs. no. of max. supported tenants in Multes (1 to 16), with Caribou and 2x Caribou shown for reference; label: Multes T=2]
The FPGA part on the VC709 is XC7VX690T-2FFG1761C
Thoughts on the future
Platform-as-a-service
• Customize KVS with tenant-defined processing for different “flavors”
• Combining multi-tenant application with small PR regions
  • Simple streaming interfaces – can use HLS, OpenCL, etc.
• Misbehaving PR region does not impact others
[Figure: Multes pipeline: Traffic Shapers, Network Stack (TCP), Replication, Multi-version Hash Table + Allocator, Value Access + Processing, Memory]
Conclusion
Multes: multi-tenant KVS service that doesn’t sacrifice performance
Project on Github: https://github.com/fpgasystems/caribou
Relied on three techniques:
1) Single-pipeline architecture and traffic shapers → no load interaction
2) Runtime-parameterization of control modules → flexible allocations
3) “Contexts” in controlling state machines → no overhead when switching between tenants
→ Applicable to many network-facing applications on FPGAs