OpenStack managed Cloud Foundry service Market · Openstack community contributor, experienced in...



SOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION

ACCELA ZHAO, LAYNE PENG


WHO ARE THOSE GUYS …

Accela Zhao, Technologist at EMC OCTO, active OpenStack community contributor, experienced in cloud scheduling and container technologies.

Layne Peng, Principal Technologist at EMC OCTO, experienced cloud architect, one of the earliest contributors to Cloud Foundry in China, holder of 9 patents and a book author.

Mail: [email protected]

Mail: [email protected] Twitter: @layne_peng


WHAT IS RESOURCE UTILIZATION?

This is what we buy

This is what we use

A gap of $$$ wasted


ENERGY AND RESOURCE UTILIZATION

Energy-related costs are 42% of the total (including buying new machines)

An idle server still consumes as much as 70% of the energy it uses at full speed

Low resource utilization is energy inefficient: wasted energy, wasted money

Real-world resource utilization is usually low: around 20% or less
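
The two figures on this slide (idle draw at 70% of peak, utilization around 20%) can be combined into a rough estimate. The sketch below assumes a linear power model, which is an assumption and not something the slides state: power rises in a straight line from the idle draw to the peak draw as utilization goes from 0 to 1. The function names are illustrative.

```python
def relative_power(utilization, idle_fraction=0.7):
    """Power draw as a fraction of peak, under an assumed linear model."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return idle_fraction + (1.0 - idle_fraction) * utilization


def energy_per_unit_work(utilization, idle_fraction=0.7):
    """Energy spent per unit of useful work, relative to a fully used server."""
    if utilization == 0:
        raise ValueError("no useful work is done at zero utilization")
    return relative_power(utilization, idle_fraction) / utilization


if __name__ == "__main__":
    # 20% utilization with idle draw at 70% of peak (figures from the slide)
    print(round(relative_power(0.2), 2))
    print(round(energy_per_unit_work(0.2), 2))
```

Under these assumptions a server at 20% utilization draws about 76% of its peak power, so each unit of work costs roughly 3.8 times the energy it would cost on a fully utilized server.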


A CLOSER LOOK AT CLOUD

The key advantage - cloud consolidation

Fewer machines, more apps. Energy-efficient and saves money.

Improved resource utilization


RESOURCE UTILIZATION ON CLOUD

• Scheduling - choose the best resource placement when the app starts
– Examples: Green Cloud, Paragon, and the schedulers in OpenStack, Kubernetes, Mesos, …

• Migration - continuously optimize the resource placement while the app is running
– Examples: OpenStack Watcher, VMware DRS

• Soft Container - elastic; dynamically adjusts resource constraints in response to co-located apps
– Related: Google Heracles


RESOURCE UTILIZATION ON CLOUD

• Scheduler: manages resource utilization at app kick-off

• Migration: manages resource utilization across hosts while the app is running

• Soft Container: manages resource utilization at fine granularity inside a host


RESOURCE UTILIZATION ON CLOUD

A battle between putting more apps on each host and guaranteeing app SLAs

The key problem: resource interference


THE KEY PROBLEM: RESOURCE INTERFERENCE

• What is resource interference?
– Apps co-located on one host share resources like CPU, cache, memory, …
– They interfere with each other, resulting in poor performance compared to running standalone
– Resource interference makes SLAs unenforceable

• Related readings
– Google Heracles: an analysis of resource interference
– Paragon: resource interference-aware scheduling
– Bubble-up: a way to measure resource interference


RESOURCE INTERFERENCE: HOW DOES IT LOOK?

MySQL running standalone vs. co-located with a CPU- and disk-hungry task


RESOURCE INTERFERENCE: HOW TO MEASURE?

• Bubble-up
– The setup
  • Run the app co-located with resource benchmarks; each benchmark stresses one type of resource.
– Resource interference the app tolerates
  • Slowly increase the resource benchmark stress until the app fails its SLA.
  • The critical point shows how much resource interference the app can tolerate.
– Resource interference the app causes
  • Run the app at the load its SLA requires.
  • The stress it causes on each type of resource is the app’s caused resource interference.

• Where to use it?
– Better resource utilization management
– Scheduling, Migration, Soft Container, …
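
The "tolerated interference" measurement can be sketched as a simple ramp. This is a minimal illustration, not the Bubble-up implementation: `measure_latency` is a hypothetical callback that would run the app next to one benchmark at a given stress level and return the app latency, and the stress scale of 0 to 100 is an assumption.

```python
def find_tolerated_stress(measure_latency, sla_latency, max_stress=100, step=5):
    """Ramp benchmark stress until the app misses its SLA.

    measure_latency(stress) is a hypothetical callback: it runs the app next
    to a benchmark at the given stress level and returns the app latency.
    Returns the highest stress level at which the SLA still held, or None if
    the SLA is missed even at stress level 0.
    """
    tolerated = None
    for stress in range(0, max_stress + 1, step):
        if measure_latency(stress) > sla_latency:
            break  # critical point reached: the app fails its SLA here
        tolerated = stress
    return tolerated


if __name__ == "__main__":
    # Simulated app (made-up numbers): 10 ms baseline plus 0.5 ms per stress unit.
    def simulated(stress):
        return 10.0 + 0.5 * stress

    print(find_tolerated_stress(simulated, sla_latency=30.0))
```

Repeating the ramp once per benchmark (CPU, disk, and so on) gives one tolerance value per resource type.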


RESOURCE INTERFERENCE: HOW TO MEASURE?

MySQL running standalone vs. co-located with CPU stress vs. co-located with disk stress. In my case, MySQL is much more sensitive to CPU interference.


INTRODUCING SOFT CONTAINER

• Motivations
– Increase resource utilization by co-locating more apps
  • E.g. business services are critical but may not use all resources on the host. Add low-priority Hadoop batch tasks to fill what is left.
– Respond to the dynamic nature of time-varying workloads
  • E.g. a business service may become more idle at lunch time; Hadoop tasks can then expand their resource bubble and use the leftover.
– Guarantee the SLA of critical apps
  • E.g. when the business service suddenly requires more resources for processing, Hadoop tasks shrink instantly to give resources back.

• Challenges
– Resource control and isolation of interference
– Respond to dynamic workload change


INTRODUCING SOFT CONTAINER

• What does “Soft” mean?
– A container's resources vary based upon its neighbors and SLAs. (The container becomes elastic)
– “Expanding” (bubbling up) resources when idle resources exist
– Shrinking resources on a specific container when another critical app demands more resources

Figure: container resource bubble (axes: Time, Resource)


THE FEEDBACK CONTROL LOOP

Controller

Watcher Limiter

Containers

Soft Container


RESOURCES TO LIMIT

• CPU
– Core
– Time Quota
– …

• Memory
– Size
– Bandwidth
– …

• Disk I/O
– IOPS
– Throughput
– …


RESOURCES TO LIMIT - MISSING

• CPU
– Core
– Time Quota
– …

• Cache
– LLC
– …

• Memory
– Size
– Bandwidth*
– …

• GPU
– …

• Device*
– …

• Network
– Ulimit
– Bandwidth
– …

• Disk I/O
– IOPS
– Throughput
– …

Kernel 3.6: most of the support can be found in the community…


ISOLATING THE RESOURCES - NAMESPACE

/proc/<pid>/ns:
lrwxrwxrwx 1 root root 0 Jun 21 18:38 ipc -> ipc:[4026532509]
lrwxrwxrwx 1 root root 0 Jun 21 18:38 mnt -> mnt:[4026532507]
lrwxrwxrwx 1 root root 0 Jun 16 18:24 net -> net:[4026532512]
lrwxrwxrwx 1 root root 0 Jun 21 18:38 pid -> pid:[4026532510]
lrwxrwxrwx 1 root root 0 Jun 21 18:38 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Jun 21 18:38 uts -> uts:[4026532508]

• clone(): creates a new process and attaches it to new namespaces
• unshare(): creates new namespaces and moves the calling process into them
• setns(): attaches the calling process to an existing namespace

• security namespace
• security keys namespace
• device namespace
• time namespace

We are still waiting…
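
The /proc/<pid>/ns listing above can be read programmatically. The sketch below is a minimal illustration for Linux only; the helper names are made up for this example. Two processes are in the same namespace when their links carry the same inode number. Reading another process's links may require sufficient privileges.

```python
import os
import re

# A namespace link target looks like "pid:[4026532510]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a link target such as 'pid:[4026532510]' into (type, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError("unexpected namespace link: %r" % target)
    return match.group(1), int(match.group(2))


def list_namespaces(pid="self"):
    """Return {entry name: inode} for the entries under /proc/<pid>/ns."""
    ns_dir = "/proc/%s/ns" % pid
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result


def same_namespace(pid_a, pid_b, name):
    """True when both processes point at the same namespace inode."""
    return list_namespaces(pid_a)[name] == list_namespaces(pid_b)[name]


if __name__ == "__main__":
    for name, inode in list_namespaces().items():
        print(name, inode)
```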


LIMITING THE RESOURCE - CGROUP

Task, Control Group & Hierarchy

Subsystem – Control options
• blkio
• cpu
• cpuacct
• cpuset
• devices
• freezer
• memory
• net_cls
• net_prio
• ns

Usage

Create a cgroup in a subsystem, then change the limit…

# echo 524288000 > /sys/fs/cgroup/memory/foo/memory.limit_in_bytes
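
The same write can be done from a program. The sketch below assumes the cgroup v1 layout shown on the slide (cgroup v2 uses a different file, `memory.max`) and assumes the cgroup directory already exists; the function names are illustrative. On a real host this needs root.

```python
import os


def mib(count):
    """Convert MiB to bytes."""
    return count * 1024 * 1024


def set_memory_limit(cgroup_dir, limit_bytes):
    """Write a memory limit into a cgroup v1 memory cgroup directory.

    cgroup_dir is for example /sys/fs/cgroup/memory/foo and must already
    exist (on a real system, creating that directory creates the cgroup).
    """
    if limit_bytes <= 0:
        raise ValueError("limit must be positive")
    path = os.path.join(cgroup_dir, "memory.limit_in_bytes")
    with open(path, "w") as handle:
        handle.write(str(limit_bytes))
    return path


if __name__ == "__main__":
    # 500 MiB is the 524288000 bytes used on the slide.
    print(mib(500))
    set_memory_limit("/sys/fs/cgroup/memory/foo", mib(500))
```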


MISSING - NETWORK

Isolation does not mean resource control

Suppose two containers on one machine share a total of 100Gbps of bandwidth

Figure: two bandwidth bars labeled 10 and 80, out of 100Gbps


Figure: the same link with a bar labeled 95, out of 100Gbps

If the GREEN container consumes the majority of the bandwidth, it may have a negative impact on the BLUE one… How can we avoid this?
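
One common way to cap a greedy sender is a token bucket, the idea behind the `tbf` (Token Bucket Filter) qdisc in tc. The sketch below is a generic illustration of that technique, not what the talk implements; the class name and the byte-based units are assumptions.

```python
class TokenBucket:
    """Minimal token bucket: tokens are bytes the sender may still transmit."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = float(rate_bytes_per_s)
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0

    def allow(self, size_bytes, now):
        """Return True if a packet of size_bytes may be sent at time `now` (seconds)."""
        elapsed = max(0.0, now - self.last)
        # Refill at the configured rate, never above the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
    for t in (0.0, 0.5, 1.5):
        print(t, bucket.allow(1500, now=t))
```

With one bucket per container, the GREEN container can burst briefly but cannot hold more than its configured rate, which leaves the rest of the link to the BLUE one.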


MISSING - NETWORK

Community attempts: based on Traffic Control (tc)

A nightmare for PaaS providers…



MISSING - GPU

Nvidia’s efforts:

a. GPUs exposed as separate normal devices in /dev

Ref: https://github.com/NVIDIA/nvidia-docker/wiki/GPU-isolation

b. devices cgroup:
• Allow/Deny/List
• Access
i. R (read) ii. W (write) iii. M (mknod)

Usable, but insufficient…
1. Launch multiple jobs in parallel, each one using a subset of the available GPUs;
2. What about sharing a GPU between jobs with proper isolation? Can we share a GPU like we can a CPU?


MISSING - CACHE

Intel’s efforts:

Cache Monitoring Technology (CMT)
• Lets an OS or VMM assign a software-defined ID to each application or VM scheduled to run on a core. This ID is called the Resource Monitoring ID (RMID).
• Monitors cache occupancy on a per-RMID basis
• Lets an OS or VMM read LLC occupancy for a given RMID at any time.

Cache Allocation Technology (CAT)
• The ability to enumerate the CAT capability and the associated LLC allocation support via CPUID.
• Interfaces for the OS/hypervisor to group applications into classes of service (CLOS) and indicate the amount of last-level cache available to each CLOS. These interfaces are based on MSRs (Model-Specific Registers).

Code and Data Prioritization (CDP)
• Extension to CAT
• A new CPUID feature flag is added within the CAT sub-leaves at CPUID.0x10.[ResID=1]:ECX[bit 2] to indicate support


MISSING – MEMORY BANDWIDTH

Monitor

Memory Bandwidth Monitoring (MBM)
• Mechanisms in hardware to monitor cache occupancy and bandwidth statistics, as applicable to a given product generation, on a per-software-ID basis.
• Mechanisms for the OS or hypervisor to read back the collected metrics, such as L3 occupancy or memory bandwidth, for a given software ID at any point during runtime.

Control

Ref: Memory Bandwidth Management for Efficient Performance Isolation in Multi-core Platform: http://pertsserver.cs.uiuc.edu/~mcaccamo/papers/private/IEEE_TC_journal_submitted_C.pdf
Code: https://github.com/heechul/memguard



WATCH THE WORKLOAD CHANGE

• Latencies
– App request latency
– Disk IO await
– Network response time

• Queue length
– CPU load average
– Disk request queue size
– Network queue length

• Utilization
– CPU util rate
– Disk util rate
– Network util rate

• Bandwidth
– DRAM bandwidth
– CPU bandwidth
– Disk bandwidth

• Request count
– App request count
– Disk IOPS / req/s
– Network IOPS / req/s

• Granularity
– Global level
– Per container level
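
As one example of a global-level metric from the list above, CPU utilization on Linux can be derived from two samples of the aggregate `cpu` line in /proc/stat. This is a minimal sketch with illustrative function names; counting iowait as idle time is a choice made here, not something the slides prescribe.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into (busy, total) ticks."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    # guest and guest_nice are already included in user and nice, so skip them.
    total = sum(values[:8])
    return total - idle, total


def cpu_utilization(before_line, after_line):
    """Fraction of non-idle CPU time between two samples of the 'cpu' line."""
    busy0, total0 = parse_cpu_line(before_line)
    busy1, total1 = parse_cpu_line(after_line)
    if total1 == total0:
        return 0.0
    return (busy1 - busy0) / (total1 - total0)


if __name__ == "__main__":
    import time

    def read_cpu_line():
        with open("/proc/stat") as handle:
            return handle.readline()

    first = read_cpu_line()
    time.sleep(1)
    print(cpu_utilization(first, read_cpu_line()))
```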


THE FEEDBACK CONTROL LOOP

Controller

Watcher Limiter

Containers

Soft Container

Immediate response

How to immediately resize the containers?


HOW DO WE LOOK AT RESIZE?

a. Create a new container;
b. Live migrate the contents to the new container:
1. Transfer the existing data to the new container;
2. Transfer the instant data to the new container.
c. Stop the old container
d. Start the new container
e. Route the traffic to the new container


IN THE CONTAINER’S WORLD…

9527 /usr/sbin/httpd

Control Groups (cgroup):
• CPU time: 20
• System memory: 1G
• Disk bandwidth: 2000
• Network bandwidth: 100Mbps

Control Groups (cgroup):
• CPU time: 70
• System memory: 5G
• Disk bandwidth: 8000
• Network bandwidth: 1Gbps

a. Attach the process to a new cgroup, or change the values of its current cgroup

b. Done!


We need to take a fresh look at resource management from the container’s perspective.
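
Both options for resizing a container in place can be sketched in a few lines. This is a minimal illustration against the cgroup v1 filesystem; the slides do not say which control file backs each setting, so `cpu.shares` and `memory.limit_in_bytes` in the usage comment are assumptions chosen for illustration, as are the function names and the `/foo` path.

```python
import os


def move_to_cgroup(pid, cgroup_dir):
    """Attach a process to another cgroup by writing its PID to cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as handle:
        handle.write(str(pid))


def resize_in_place(cgroup_dir, settings):
    """Rewrite control files of an existing cgroup.

    settings maps a control file name to its new value,
    for example {"memory.limit_in_bytes": 5 * 1024 ** 3}.
    """
    for filename, value in settings.items():
        if os.sep in filename:
            raise ValueError("control file names must not contain path separators")
        with open(os.path.join(cgroup_dir, filename), "w") as handle:
            handle.write(str(value))


if __name__ == "__main__":
    # Needs root and existing cgroups; 9527 is the httpd PID from the slide.
    resize_in_place("/sys/fs/cgroup/memory/foo", {"memory.limit_in_bytes": 5 * 1024 ** 3})
```

Either way the process keeps running; nothing is copied, stopped, or rerouted.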


SOFT CONTAINER: IMPLEMENTATION

Controller
• Algorithm “expand”
• Algorithm “pin_idle”
• Algorithm plugin N

Watcher
• CPU plugin
• Disk plugin
• Watcher plugin N

Limiter
• RunC plugin
• Docker plugin
• Limiter plugin N

Metrics Store
• CPU statistics
• Disk …
• More …

Container Repo
• RunC plugin
• Docker plugin
• Container type N

Containers (auto discovery)


SOFT CONTAINER: CURRENT STATUS

• Early version

• Supports RunC and Docker containers

• A few effective controller algorithms

• Extensible with more plugins

Completely runnable!


Demo Time :-)


BENCHMARK RESULTS: BEFORE

If uncontrolled, the MySQL workload suffers severe interference from the co-located low-priority task


BENCHMARK RESULTS: BEFORE

CPU utilization is far from saturation while the workload varies over time (although in my case, disk IO is highly utilized)


BENCHMARK RESULTS: SOFT CONTAINER

With Soft Container (green line), latency impact is controlled. (We can improve the algorithm to cope better with peak workload)


BENCHMARK RESULTS: SOFT CONTAINER

Soft Container helps improve CPU utilization by co-locating new tasks with MySQL


BENCHMARK RESULTS: SOFT CONTAINER

CPU utilization looks close to saturation, after adding in iowait time


HOW DOES SOFT CONTAINER DO THIS?

• Soft Container monitors app resource needs and overall resource utilization in real time

• Soft Container issues resource controls in real time, to guard app SLAs and balance resource utilization
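
The monitor-then-control cycle can be sketched as a small feedback loop. The policy below is a hypothetical stand-in, not the talk's "expand" or "pin_idle" algorithm: it shrinks the low-priority container's limit multiplicatively when the critical app misses its SLA and grows it additively otherwise. The `watch` and `limit` callbacks stand for the Watcher and Limiter roles, and all numbers are made up.

```python
def control_step(critical_latency, sla_latency, batch_limit,
                 min_limit=5, max_limit=100, grow=5, shrink_factor=0.5):
    """One controller decision for the low-priority container's limit.

    Hypothetical policy: shrink fast when the critical app misses its SLA,
    expand slowly while the SLA holds.
    """
    if critical_latency > sla_latency:
        new_limit = batch_limit * shrink_factor
    else:
        new_limit = batch_limit + grow
    return max(min_limit, min(max_limit, new_limit))


def run_loop(watch, limit, sla_latency, start_limit, steps):
    """Watcher -> Controller -> Limiter, repeated `steps` times.

    watch(current_limit) returns the critical app latency (Watcher role);
    limit(new_limit) applies the new constraint (Limiter role).
    """
    current = start_limit
    history = []
    for _ in range(steps):
        latency = watch(current)
        current = control_step(latency, sla_latency, current)
        limit(current)
        history.append(current)
    return history


if __name__ == "__main__":
    # Simulated host: critical app latency grows with the batch container's limit.
    applied = []
    print(run_loop(lambda l: 10 + 0.5 * l, applied.append, 35, 40, 5))
```

In the simulated run the limit creeps up while the SLA holds, is halved once the latency crosses the SLA, and then starts growing again, which is the "floating bubble" behaviour in miniature.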


BENCHMARK RESULTS: SOFT CONTAINER

How the resource bubble floats under the control of Soft Container. (The vibration thresholds are made very sensitive to workload change)


Q&A