Cloud on steroids Accelerating your cloud via cyborg
![Page 1: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/1.jpg)
2018 Lenovo Internal. All rights reserved.
Cloud on steroids
Accelerating your cloud via cyborg
Jinghua Gao, Zhenghao Wang (Staff Researcher, Lenovo Research), 2018-05-23
OpenStack Vancouver Summit, May 2018
![Page 2: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/2.jpg)
Agenda
01 Necessity of Acceleration Management
02 Cyborg Introduction
03 Lenovo’s Contribution to Cyborg
04 Demo
05 Summary
![Page 3: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/3.jpg)
1. Necessity of Acceleration Management
![Page 4: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/4.jpg)
Prevalence of Accelerations
Usage scenarios: AI, NFV, Blockchain, Genetic Sequencing, Big Data, …
VM/App layer
• Network Acceleration
1. Virtual Networking Offloading
2. Dynamic Optimization of Packet Flow Routing
3. Load Balancing and NAT
4. Open vSwitch, HTTPS Offloading
…
• Storage Acceleration
1. NVMe over Fabric Enabled Acceleration
2. High Performance Persistent Memory
…
• Compute Acceleration
1. vBRAS, HQoS, Multicast Offloading
2. vRAN, Cipher/Decipher Offloading
3. SBC, Media Codec Offloading
4. TensorFlow, Model Training Acceleration
5. Cryptocurrency Mining Acceleration
6. Next Generation Firewall (NGFW) Acceleration
…
Infrastructure layer (provides the accelerators)
• Hardware Accelerators: ASIC, GPU, FPGA
• Software Accelerators: DPDK/SPDK
![Page 5: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/5.jpg)
Challenges
• Difficult to standardize various acceleration technologies
– Software accelerators: DPDK, SPDK.
– Multi-vendor hardware accelerators with different architectures, such as GPU, ASIC, FPGA.
• Complex
– Different contexts and usage scenarios.
– Different forms: virtualized, time-shared, pass-through, etc.
• Expensive
– Non-trivial management effort.
– High price of hardware.
Cyborg Project
A unified acceleration management framework is needed to enable acceleration as a service.
![Page 6: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/6.jpg)
2. Cyborg Introduction
![Page 7: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/7.jpg)
• General management framework
– Software accelerators: DPDK/SPDK, PMEM, XDP/eBPF, ...
– Hardware accelerators: FPGA, GPU, QAT, NVMe SSD, CCIX-based caches, …
• Lifecycle management of accelerators
– Discovery, Program, Attach, Detach, Remove
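One way to picture the lifecycle listed above is as a small state machine. The sketch below is illustrative only: the `State` values, the `TRANSITIONS` table, and the `Accelerator` class are assumptions made for this example and are not Cyborg's data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    AVAILABLE = "available"   # discovered and free
    ATTACHED = "attached"     # bound to an instance
    REMOVED = "removed"       # no longer managed


# action -> (state required before, state after); hypothetical rules.
TRANSITIONS = {
    "program": (State.AVAILABLE, State.AVAILABLE),
    "attach": (State.AVAILABLE, State.ATTACHED),
    "detach": (State.ATTACHED, State.AVAILABLE),
    "remove": (State.AVAILABLE, State.REMOVED),
}


@dataclass
class Accelerator:
    name: str
    state: State = State.AVAILABLE   # "Discovery" creates the record
    function: Optional[str] = None   # set by "program", e.g. an FPGA function

    def apply(self, action: str, function: Optional[str] = None) -> None:
        """Perform one lifecycle action, rejecting it if the state is wrong."""
        required, result = TRANSITIONS[action]
        if self.state is not required:
            raise ValueError(f"cannot {action} while {self.state.value}")
        if action == "program":
            self.function = function
        self.state = result


acc = Accelerator("fpga0")
acc.apply("program", function="crypto")
acc.apply("attach")
acc.apply("detach")
acc.apply("remove")
print(acc.state.value)
```

Keeping the rules in one table makes it easy to see, for instance, that an attached accelerator must be detached before it can be removed.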
Timeline and Definition
• Feb 2016: Nomad repo established
• Apr 2016: First BOF session at Austin
• Oct 2016: First design session in Barcelona
• Feb 2017: Pike PTG, renamed to Cyborg
• Sep 2017: Queens PTG, becomes an official project
• Feb 2018: Queens Release (API-DB, Conductor-Agent, Generic Driver)
• Sep 2018: Rocky Release (os-acc, Xilinx FPGA driver, Python client)
![Page 8: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/8.jpg)
Architecture
controller-node: cyborg-api, cyborg-conductor, cyborg-db
compute-node: cyborg-agent, with drivers
– fpga-driver: vendor-a-fpga-driver, vendor-b-fpga-driver
– gpu-driver: vendor-c-gpu-driver
– spdk-driver, …
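The agent-plus-drivers layout can be sketched as a plugin interface. This is a minimal illustration under stated assumptions: `AcceleratorDriver`, the two fake drivers, `Agent.collect`, and the PCI addresses are all hypothetical names and values, not Cyborg's actual classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class AcceleratorDriver(ABC):
    """Hypothetical driver interface; not Cyborg's actual base class."""

    @abstractmethod
    def discover(self) -> List[Dict[str, str]]:
        """Return one dict per accelerator found on this host."""


class FakeFpgaDriver(AcceleratorDriver):
    def discover(self) -> List[Dict[str, str]]:
        return [{"type": "fpga", "vendor": "vendor-a", "pci_address": "0000:3b:00.0"}]


class FakeGpuDriver(AcceleratorDriver):
    def discover(self) -> List[Dict[str, str]]:
        return [{"type": "gpu", "vendor": "vendor-c", "pci_address": "0000:af:00.0"}]


class Agent:
    """Runs on a compute node and aggregates what its drivers report."""

    def __init__(self, drivers: List[AcceleratorDriver]):
        self.drivers = drivers

    def collect(self) -> List[Dict[str, str]]:
        found: List[Dict[str, str]] = []
        for driver in self.drivers:
            found.extend(driver.discover())
        return found


agent = Agent([FakeFpgaDriver(), FakeGpuDriver()])
inventory = agent.collect()
print([acc["type"] for acc in inventory])
```

With this shape, supporting a new vendor means adding one driver class; the agent and everything above it stay unchanged.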
![Page 9: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/9.jpg)
Interaction with Other Projects
Two main use case groups:
• Attached to the VM where the workload demands acceleration.
• Used by infrastructure, and then utilized via the appropriate service.
Other projects: Nova; Nova & Glance
Accelerator examples: FPGA (Intel & Xilinx); GPU, QAT…; DPDK/SPDK
![Page 12: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/12.jpg)
Interaction with Nova
• Work with Nova through three steps:
1. Representation at discovery
2. Instance placement/scheduling
3. Attaching accelerators to instances
Upstream workflow (diagram):
controller: nova-api, nova-conductor, nova-scheduler, nova-placement-api, cyborg-api, cyborg-conductor, cyborg-db
compute: nova-compute, hypervisor, cyborg-agent (Driver A, Driver B, Driver C), accelerators
Step labels: update (1), filter/weigher (2), os-acc (3)
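The filter/weigher step can be illustrated with two small functions: one drops hosts that cannot satisfy the request, the other ranks the survivors. This is a sketch under assumptions, not Nova's filter or weigher classes; the host names, the per-host dictionaries, and the "most free accelerators first" policy are invented for the example.

```python
from typing import Dict, List

# Hypothetical per-host view of free accelerators, as a scheduler might see it.
HostState = Dict[str, int]


def filter_hosts(hosts: Dict[str, HostState], acc_type: str, count: int) -> List[str]:
    """Keep only hosts with at least `count` free accelerators of `acc_type`."""
    return [name for name, free in hosts.items() if free.get(acc_type, 0) >= count]


def weigh_hosts(hosts: Dict[str, HostState], candidates: List[str], acc_type: str) -> List[str]:
    """Order candidates so the host with the most free accelerators comes first."""
    return sorted(candidates, key=lambda name: hosts[name][acc_type], reverse=True)


hosts = {
    "compute-1": {"gpu": 0, "fpga": 2},
    "compute-2": {"gpu": 1},
    "compute-3": {"gpu": 4},
}
candidates = filter_hosts(hosts, "gpu", 1)
ranked = weigh_hosts(hosts, candidates, "gpu")
print(ranked)
```

Filtering first means the weigher only ever sees hosts that actually report the requested accelerator type.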
![Page 13: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/13.jpg)
3. Lenovo’s Contribution to Cyborg
![Page 14: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/14.jpg)
Real World Requirements
Usage scenarios: AI, NFV, Blockchain, Big Data
Accelerators: GPU, FPGA, NVMe SSD, Netronome SmartNIC, Cavium SmartNIC, Intel QAT
Stack (diagram): OpenStack (Nova, Neutron), Cyborg (API, Conductor, Agent, Driver ...), Hypervisor, DPDK
NFV: VNF (vRAN, vBRAS, SBC…) / Infrastructure (NGFW, OVS…)
• High performance: 10~100 Gbps and up
• High reliability: uptime of 99.999%
• Low latency: usually less than 100 ms
![Page 15: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/15.jpg)
Lenovo’s Efforts on Cyborg
• Integrate with Nova.
– Provide an acceleration solution without nova-placement.
– Provide the accelerator at VM boot time or via a separate attach/detach action.
• Extend drivers.
– Use the upstream FPGA driver.
– Add GPU, Netronome drivers, etc.
Motivation:
• Some production deployments predate the Newton release and do not have nova-placement.
• To use accelerators dynamically.
• To accelerate different workloads.
![Page 16: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/16.jpg)
Cyborg Use Case: GPU 1/2
Boot Time Attachment
Resource updating at discovery
– Periodically update to cyborg-db.
Instance scheduling
1. Create VMs with specific image properties.
2. Schedule using acc_filter.
3. Cyborg returns the list of compute nodes.
Attaching accelerators to instances
1. Call Cyborg to claim the required GPU resource.
2. Define the XML with the GPU pci_address.
3. Run the VM. If it fails, call Cyborg to release the allocated GPU resource.
Diagram:
controller: nova-api, nova-conductor, nova-scheduler (acc_filter), cyborg-api, cyborg-conductor, cyborg-db
compute: nova-compute, Hypervisor, cyborg-agent (Driver A, Driver B, Driver C), Accelerators
Labels: periodically retrieve, image_properties, claim resources
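The claim, run, and release-on-failure sequence in the attachment steps is a rollback pattern. The sketch below shows that pattern only; `FakeCyborg`, `boot_with_gpu`, and `failing_start` are hypothetical stand-ins written for this example, and no real Cyborg or Nova API is called.

```python
class FakeCyborg:
    """Stand-in for the Cyborg API; tracks which GPUs are claimed."""

    def __init__(self, pci_addresses):
        self.free = list(pci_addresses)
        self.claimed = {}

    def claim(self, instance_id):
        if not self.free:
            raise RuntimeError("no free GPU")
        pci = self.free.pop(0)
        self.claimed[instance_id] = pci
        return pci

    def release(self, instance_id):
        self.free.append(self.claimed.pop(instance_id))


def boot_with_gpu(cyborg, instance_id, start_vm):
    """Claim a GPU, start the VM, and release the claim if the start fails."""
    pci = cyborg.claim(instance_id)
    try:
        start_vm(instance_id, pci)
    except Exception:
        cyborg.release(instance_id)  # roll back so the GPU is not leaked
        raise
    return pci


def failing_start(instance_id, pci):
    raise RuntimeError("hypervisor refused to start the VM")


cyborg = FakeCyborg(["0000:3b:00.0"])
try:
    boot_with_gpu(cyborg, "vm-1", failing_start)
except RuntimeError:
    pass
print(cyborg.free)
```

Re-raising after the release keeps the original failure visible to the caller while still returning the GPU to the free pool.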
![Page 17: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/17.jpg)
Cyborg Use Case: GPU 2/2
Run-time Attachment (Hot-plug)
Command:
nova accelerator-attach instance_id --type GPU
Difference from boot-time attachment:
1. Query nova-db to get the instance location.
2. Call Cyborg to get the accelerator list.
3. Add a new XML file and attach it to the VM.
Diagram:
controller: nova-api, cyborg-api, cyborg-conductor, cyborg-db
compute: nova-compute, Hypervisor, cyborg-agent (Driver A, Driver B, Driver C), Accelerators
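The slides do not show the XML itself. Assuming the hypervisor is driven through libvirt, a PCI pass-through device is described by a `<hostdev>` element, and the sketch below builds one from a PCI address. The function name `hostdev_xml` and the sample address are assumptions for illustration. With libvirt-python, such a fragment could be passed to `virDomain.attachDeviceFlags` for a hot-plug; that call is not made here.

```python
import re
import xml.etree.ElementTree as ET


def hostdev_xml(pci_address: str) -> str:
    """Build a libvirt <hostdev> element for a PCI device such as a GPU."""
    match = re.fullmatch(
        r"([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])",
        pci_address,
    )
    if match is None:
        raise ValueError(f"not a PCI address: {pci_address!r}")
    domain, bus, slot, function = match.groups()

    # managed="yes" asks libvirt to detach the device from the host driver itself.
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{domain}",
        bus=f"0x{bus}",
        slot=f"0x{slot}",
        function=f"0x{function}",
    )
    return ET.tostring(hostdev, encoding="unicode")


print(hostdev_xml("0000:3b:00.0"))
```

Validating the address before building the element keeps a malformed value from reaching the hypervisor.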
![Page 18: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/18.jpg)
Cyborg Use Case: FPGA
Request-time Programming
1. Use image properties to define the accelerator type & FPGA function.
2. Use the existing Glance table for FPGA bitstreams.
Difference from the GPU attachment workflow:
1. nova-compute calls Cyborg and periodically checks the status of the bitstream programming.
2. Cyborg gets the bitstream from Glance, then programs it to the FPGA.
3. Change the “type” of the FPGA PF/VF.
The reason to change the type of the PF/VF is that the resource to be attached may differ at the hypervisor level. E.g., if the FPGA PF/VF is programmed with a given NIC bitstream, then Cyborg should change the type from fpga to smartnic.
Diagram:
controller: glance, nova-api, nova-conductor, nova-scheduler, cyborg-api, cyborg-conductor, cyborg-db (type)
compute: nova-compute, Hypervisor, cyborg-agent (Driver A, Driver B, Driver C), Accelerators
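The "program, poll the status, then change the type" sequence can be sketched as a polling loop. Everything here is a hypothetical stand-in: `FakeFpga` simulates a device whose programming finishes after a few status checks, `program_and_wait` plays the role of the caller, and fetching the bitstream from Glance is left out.

```python
import time


class FakeFpga:
    """Simulated FPGA PF/VF whose programming finishes after a few polls."""

    def __init__(self, polls_needed):
        self.type = "fpga"
        self._polls_left = polls_needed
        self._pending_type = None

    def start_programming(self, new_type):
        # A real flow would load a bitstream here; this fake only records
        # the type the device should report once programming is done.
        self._pending_type = new_type

    def poll(self):
        if self._polls_left > 0:
            self._polls_left -= 1
            return "programming"
        self.type = self._pending_type
        return "done"


def program_and_wait(fpga, new_type, interval=0.0, max_polls=10):
    """Start programming, poll until done, and return the resulting type."""
    fpga.start_programming(new_type)
    for _ in range(max_polls):
        if fpga.poll() == "done":
            return fpga.type
        time.sleep(interval)
    raise TimeoutError("bitstream programming did not finish")


fpga = FakeFpga(polls_needed=2)
print(program_and_wait(fpga, "smartnic"))
```

Bounding the loop with `max_polls` means a stuck programming job surfaces as an error instead of blocking the VM boot indefinitely.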
![Page 19: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/19.jpg)
4. Demo
VM provisioning with GPU pass-through
![Page 20: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/20.jpg)
Environment
• Lenovo ThinkCloud OpenStack 4.2
– 3 nodes: 1 controller node and 2 compute nodes.
– One compute node with an NVIDIA GPU.
• Demo: VM provisioning with GPU pass-through
Topology (diagram): Internet, Switch, ThinkCloud OpenStack 4.2 with node-4 (controller), node-5 (compute), node-6 (compute); one compute node hosts the GPU.
![Page 21: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/21.jpg)
5. Summary
![Page 22: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/22.jpg)
Summary
• Achievements
– Use Cyborg to manage different accelerators in Lenovo products.
– Integrate with Nova to form a standard workflow for creating VMs with GPU/FPGA… pass-through.
• Future Work
– Support sharing accelerator hardware among VMs.
- Cyborg driver support for discovering and storing shared accelerators.
– Application plugin mechanism for cyborg-api, etc.
![Page 23: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/23.jpg)
Q&A
• Jinghua Gao
– Email: [email protected]
– Twitter: @Miss_Coco_Gao
– IRC: coco
– Network acceleration & datacenter traffic analysis
• Zhenghao Wang
– Email: [email protected]
– IRC: wangzhh
– OpenStack Zun & Cyborg contributor
– Cloud computing researcher at Lenovo
![Page 24: Cloud on steroids Accelerating your cloud via cyborg](https://reader034.fdocuments.in/reader034/viewer/2022042604/6262d030ba762e71014c858a/html5/thumbnails/24.jpg)