OpenStack managed Cloud Foundry service Market · Openstack community contributor, experienced in...

SOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION

ACCELA ZHAO, LAYNE PENG

WHO ARE THOSE GUYS …

Accela Zhao, Technologist at EMC OCTO, active OpenStack community contributor, experienced in cloud scheduling and container technologies.

Mail: accela.zhao@emc.com

Layne Peng, Principal Technologist at EMC OCTO, experienced cloud architect, one of the earliest contributors to Cloud Foundry in China, holder of 9 patents and a book author.

Mail: layne.peng@emc.com Twitter: @layne_peng

WHAT IS RESOURCE UTILIZATION?

This is what we buy

This is what we use

A gap of $$$ wasted

ENERGY AND RESOURCE UTILIZATION

Energy-related costs are 42% of the total (including buying new machines)

An idle server consumes as much as 70% of the energy it uses at full speed

Low resource utilization is energy-inefficient: wasted energy, wasted money

Real world resource utilization is usually low: around 20% or less

A CLOSER LOOK AT CLOUD

The key advantage - cloud consolidation

Fewer machines, more apps. Energy-efficient, and it saves money.

Improved resource utilization

• Scheduling - choose the best resource placement when the app starts

– Examples: Green Cloud, Paragon, and the schedulers in OpenStack, Kubernetes, Mesos, …

• Migration - continuously optimize the resource placement while the app is running

– Examples: OpenStack Watcher, VMware DRS

• Soft Container - elastic; dynamically adjusts resource constraints in response to co-located apps

– Related: Google Heracles

RESOURCE UTILIZATION ON CLOUD

• Scheduler - manages resource utilization at app kick-off

• Migration - manages resource utilization across hosts while the app is running

• Soft Container - manages resource utilization at fine granularity inside the host

A battle of putting more apps in each host vs. guaranteed app SLA

The key problem: resource interference

THE KEY PROBLEM: RESOURCE INTERFERENCE

• What is resource interference?

– Apps co-located in one host share resources like CPU, cache, memory, …

– They interfere with each other, resulting in poor performance compared to running standalone

– Resource interference makes SLAs unenforceable

• Related readings

– Google Heracles: an analysis of resource interference

– Paragon: resource interference-aware scheduling

– Bubble-up: to measure resource interference

RESOURCE INTERFERENCE: HOW DOES IT LOOK?

MySQL standalone running vs co-located with a CPU & disk hungry task

RESOURCE INTERFERENCE: HOW TO MEASURE?

• Bubble-up

– The setup

  • Run the app co-located with resource benchmarks; each benchmark stresses one type of resource.

– Resource interference the app tolerates

  • Slowly increase the benchmark stress until the app fails its SLA.

  • The critical point shows how much resource interference the app can tolerate.

– Resource interference the app causes

  • Run the app at the load its SLA requires.

  • The stress it puts on each type of resource is the interference the app causes.

• Where to use it?

– Better resource utilization management

– Scheduling, Migration, Soft Container, …

MySQL running standalone vs. co-located with CPU stress vs. disk stress. In my case, MySQL is much more sensitive to CPU interference.
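The "tolerated interference" measurement can be sketched as a simple ramp. This is only an illustration, not the Bubble-up tooling itself: `measure_latency_ms` is a hypothetical hook that would run the app next to one stress benchmark and return the observed latency, and the step size and the toy latency model are assumptions.

```python
from typing import Callable, Optional


def tolerated_interference(
    measure_latency_ms: Callable[[int], float],
    sla_latency_ms: float,
    max_stress: int = 100,
    step: int = 5,
) -> Optional[int]:
    """Return the highest stress level at which the app still meets its SLA.

    measure_latency_ms(stress) is a hypothetical hook: it should run the app
    co-located with a benchmark stressing ONE resource at the given level.
    Returns None if the app misses its SLA even with no added stress.
    """
    tolerated = None
    for stress in range(0, max_stress + 1, step):
        if measure_latency_ms(stress) > sla_latency_ms:
            break  # critical point: the app fails its SLA at this level
        tolerated = stress
    return tolerated


# Toy stand-in for a real measurement: latency grows linearly with stress.
def fake_latency(stress: int) -> float:
    return 10.0 + 0.5 * stress


print(tolerated_interference(fake_latency, sla_latency_ms=30.0))
```

Running the same ramp once per resource type (CPU, disk, cache, …) gives one tolerance value per resource, which is the kind of profile a scheduler or Soft Container could consume.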

INTRODUCING SOFT CONTAINER

• Motivations

– Increase resource utilization by co-locating more apps

  • E.g. business services are critical but may not use all resources on the host. Add low-priority Hadoop batch tasks to fill what is left.

– Respond to the dynamic nature of time-varying workloads

  • E.g. the business service may become more idle at lunch time; Hadoop tasks can then expand their resource bubble and utilize the leftover.

– Guarantee the SLA of critical apps

  • E.g. when the business service suddenly requires more resources for processing, Hadoop tasks shrink instantly to give resources back.

• Challenges

– Resource control and isolation of interference

– Respond to dynamic workload change

INTRODUCING SOFT CONTAINER

• What does “Soft” mean?

– A container's resources vary based on its neighbors and SLAs. (The container becomes elastic)

– “Expanding” (bubbling up) resources when idle resources exist

– Shrinking the resources of a specific container when another critical app demands more resources

[Figure labels: Container resource bubble, Resource]

THE FEEDBACK CONTROL LOOP

[Diagram: Soft Container feedback loop - Controller, Watcher, Limiter, Containers]
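One iteration of such a loop might look like the sketch below. It is a minimal illustration under stated assumptions, not the project's controller: the `Container` fields, the single CPU quota knob, and the step, floor and ceiling values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    critical: bool
    cpu_quota: int  # hypothetical knob: percent of host CPU the container may use


def control_step(containers, critical_latency_ms, sla_latency_ms,
                 step=10, floor=5, ceiling=100):
    """One watch -> decide -> limit iteration (all thresholds are assumptions).

    critical_latency_ms is what a watcher would report for the critical app.
    """
    for c in containers:
        if c.critical:
            continue  # critical apps are never throttled in this sketch
        if critical_latency_ms > sla_latency_ms:
            c.cpu_quota = max(floor, c.cpu_quota - step)    # shrink the bubble
        else:
            c.cpu_quota = min(ceiling, c.cpu_quota + step)  # expand the bubble
    return containers


pods = [Container("mysql", True, 100), Container("hadoop", False, 50)]
control_step(pods, critical_latency_ms=120.0, sla_latency_ms=100.0)
print(pods[1].cpu_quota)  # SLA violated, batch task shrinks
control_step(pods, critical_latency_ms=60.0, sla_latency_ms=100.0)
print(pods[1].cpu_quota)  # SLA met, batch task expands again
```

In a real deployment the quota would be pushed to the container by a limiter after each step, and the loop would repeat on a short interval.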

RESOURCES TO LIMIT

• CPU – Core

– Time Quota

– …

• Memory – Size

– Bandwidth

– …

• Disk I/O – IOPS

– Throughput

– …

RESOURCES TO LIMIT - MISSING

• CPU – Core

– Time Quota

– …

• Cache – LLC

– …

• Memory – Size

– Bandwidth*

– …

• GPU – …

• Device* – …

• Network – Ulimit

– Bandwidth

– …

• Disk I/O – IOPS

– Throughput

– …

As of kernel 3.6, support for most of these can be found in the community…

ISOLATING THE RESOURCES - NAMESPACE

/proc/<pid>/ns:

lrwxrwxrwx 1 root root 0 Jun 21 18:38 ipc -> ipc:[4026532509]
lrwxrwxrwx 1 root root 0 Jun 21 18:38 mnt -> mnt:[4026532507]
lrwxrwxrwx 1 root root 0 Jun 16 18:24 net -> net:[4026532512]
lrwxrwxrwx 1 root root 0 Jun 21 18:38 pid -> pid:[4026532510]
lrwxrwxrwx 1 root root 0 Jun 21 18:38 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Jun 21 18:38 uts -> uts:[4026532508]

• clone(): creates a new process and attaches it to a new namespace

• unshare(): creates a new namespace and attaches the existing (calling) process to it

• setns(): attaches a process to an existing namespace

• security namespace • security keys namespace • device namespace • time namespace

We are still waiting…
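The `/proc/<pid>/ns` listing can also be read programmatically. The sketch below is a small helper written for illustration (the function names are not from any library); it uses only `os.listdir` and `os.readlink`, and returns an empty result on systems without `/proc`.

```python
import os


def list_namespaces(pid="self"):
    """Map namespace name -> identifier, e.g. {'net': 'net:[4026532512]'}."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result  # not Linux, unknown pid, or no permission
    for name in sorted(names):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry vanished or is not readable
    return result


def same_namespace(pid_a, pid_b, kind):
    """True when both processes share the given namespace kind (e.g. 'net')."""
    a = list_namespaces(pid_a).get(kind)
    b = list_namespaces(pid_b).get(kind)
    return a is not None and a == b


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(name, "->", ident)
```

Two processes in the same container report identical identifiers for each namespace kind; a process on the host and one inside a container typically differ.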

LIMIT THE RESOURCE - CGROUP

Task, Control Group & Hierarchy

Subsystems (control options):

• blkio • cpu • cpuacct • cpuset • devices • freezer • memory • net_cls • net_prio • ns

Create a cgroup, then change its limit…

# echo 524288000 > /sys/fs/cgroup/memory/foo/memory.limit_in_bytes
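The same change can be made from code. This is a sketch assuming the cgroup v1 memory hierarchy is mounted at `/sys/fs/cgroup/memory` as in the command above; the helper names are illustrative, and the `root` parameter exists only so the functions can be tried against a scratch directory without root privileges.

```python
import os


def set_memory_limit(cgroup_name, limit_bytes, root="/sys/fs/cgroup/memory"):
    """Create the cgroup directory if needed and write its memory limit."""
    path = os.path.join(root, cgroup_name)
    os.makedirs(path, exist_ok=True)  # on cgroupfs, mkdir creates the cgroup
    limit_file = os.path.join(path, "memory.limit_in_bytes")
    with open(limit_file, "w") as f:
        f.write(str(limit_bytes))
    return limit_file


def add_task(cgroup_name, pid, root="/sys/fs/cgroup/memory"):
    """Move a process into the cgroup by writing its pid to the tasks file."""
    with open(os.path.join(root, cgroup_name, "tasks"), "w") as f:
        f.write(str(pid))


if __name__ == "__main__":
    import tempfile
    scratch = tempfile.mkdtemp()  # stand-in for the real cgroup mount
    print(set_memory_limit("foo", 500 * 1024 * 1024, root=scratch))
```

500 * 1024 * 1024 is the 524288000 bytes written by the `echo` above.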

MISSING - NETWORK

Isolation does not mean resource control

Suppose two containers share a machine with 100 Gbps of total bandwidth

100Gbps

If the GREEN container consumes the majority of the bandwidth, it may have a negative impact on the BLUE one… How can we prevent this?

MISSING - NETWORK

Community attempts: based on Traffic Control (tc)

A nightmare for PaaS providers…
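To give a feel for the tc-based approach, the sketch below only builds a `tc` command line that caps egress with a token bucket filter (`tbf`); it does not run it. The interface name and the rate, burst and latency values are assumptions chosen for illustration, not values from the talk.

```python
def tbf_command(device, rate, burst, latency="70ms"):
    """Build a `tc` command that caps egress on `device` with a token bucket.

    The returned list can be passed to subprocess.run() on a host where
    `tc` is installed and the caller has the required privileges.
    """
    return ["tc", "qdisc", "add", "dev", device, "root",
            "tbf", "rate", rate, "burst", burst, "latency", latency]


# "veth-green" is a hypothetical host-side interface of the GREEN container.
print(" ".join(tbf_command("veth-green", rate="100mbit", burst="128kb")))
```

One such rule per container interface is easy to write by hand; keeping many of them correct as containers come and go is where the operational pain starts.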

MISSING - GPU

Nvidia’s efforts:

a. GPUs are exposed as separate normal devices in /dev

Ref: https://github.com/NVIDIA/nvidia-docker/wiki/GPU-isolation

b. devices cgroup:

• Allow / Deny / List

• Access: R (read), W (write), M (mknod)

Usable, but insufficient…

1. Launch multiple jobs in parallel, each one using a subset of the available GPUs;

2. How about sharing a GPU between jobs with proper isolation? Can we share a GPU like we can a CPU?
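A devices cgroup rule is a short string of the form `type major:minor access`, written to the cgroup's `devices.allow` or `devices.deny` file. The helper below is a sketch that only formats and validates such a rule; the major number 195 used in the example is an assumption about how the NVIDIA driver commonly registers its character devices, so check `ls -l /dev/nvidia*` on the actual host.

```python
def device_rule(dev_type, major, minor, access):
    """Format one devices-cgroup rule, e.g. 'c 195:0 rw'."""
    if dev_type not in ("a", "b", "c"):
        raise ValueError("type must be 'a' (all), 'b' (block) or 'c' (char)")
    if not access or set(access) - set("rwm"):
        raise ValueError("access must be a non-empty combination of r, w, m")
    return f"{dev_type} {major}:{minor} {access}"


# Assumed device numbers: allow read/write on a single GPU device node.
print(device_rule("c", 195, 0, "rw"))
# Wildcard minor: every device under the same (assumed) major number.
print(device_rule("c", 195, "*", "rwm"))
```

This grants or denies a whole device to a container, which is exactly the limitation noted above: it selects GPUs, it does not divide one.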

MISSING - CACHE

Intel’s efforts:

Cache Monitoring Technology (CMT)

• For an OS or VMM to indicate a software-defined ID for each of the applications or VMs that are scheduled to run on a core. This ID is called the Resource Monitoring ID (RMID).

• To monitor cache occupancy on a per-RMID basis.

• For an OS or VMM to read LLC occupancy for a given RMID at any time.

Cache Allocation Technology (CAT)

• The ability to enumerate the CAT capability and the associated LLC allocation support via CPUID.

• Interfaces for the OS/hypervisor to group applications into classes of service (CLOS) and indicate the amount of last-level cache available to each CLOS. These interfaces are based on MSRs (Model-Specific Registers).

Code and Data Prioritization (CDP)

• Extension to CAT

• A new CPUID feature flag is added within the CAT sub-leaves at CPUID.0x10.[ResID=1]:ECx[bit 2] to indicate support
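As background not spelled out on the slide: with CAT, the amount of cache given to a class of service is expressed as a capacity bitmask whose set bits are contiguous. The sketch below computes such masks for a hypothetical 20-way LLC; the way count and the split between classes are assumptions for illustration.

```python
def capacity_bitmask(total_ways, ways, offset=0):
    """Return a contiguous bitmask selecting `ways` cache ways from `offset`."""
    if ways < 1 or offset < 0 or offset + ways > total_ways:
        raise ValueError("requested ways do not fit in the cache")
    return ((1 << ways) - 1) << offset


# Hypothetical 20-way LLC: 16 ways for the critical class, 4 for batch tasks.
critical = capacity_bitmask(20, 16, offset=4)
batch = capacity_bitmask(20, 4, offset=0)
print(hex(critical), hex(batch))
print("overlap:", bool(critical & batch))
```

Non-overlapping masks keep a batch task from evicting the critical app's cache lines; shifting the boundary between the two masks is how a controller would resize each class's share.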

MISSING – MEMORY BANDWIDTH

Monitor

Memory Bandwidth Monitoring (MBM)

• Mechanisms in hardware to monitor cache occupancy and bandwidth statistics, as applicable to a given product generation, on a per-software-ID basis.

• Mechanisms for the OS or hypervisor to read back the collected metrics, such as L3 occupancy or memory bandwidth, for a given software ID at any point during runtime.

Control

Ref: Memory Bandwidth Management for Efficient Performance Isolation in Multi-core Platforms: http://pertsserver.cs.uiuc.edu/~mcaccamo/papers/private/IEEE_TC_journal_submitted_C.pdf

Code: https://github.com/heechul/memguard

WATCH THE WORKLOAD CHANGE

• Latencies
– App request latency
– Disk I/O await
– Network response time

• Queue length
– CPU load average
– Disk request queue size
– Network queue length

• Utilization
– CPU util rate
– Disk util rate
– Network util rate

• Bandwidth
– DRAM bandwidth
– CPU bandwidth
– Disk bandwidth

• Request count
– App request count
– Disk IOPS / req/s
– Network IOPS / req/s

• Granularity
– Global level
– Per container level
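As one concrete example of a watched metric, global CPU utilization can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. This is a sketch, not the project's watcher: counting iowait as idle time is a choice made here, and a watcher that wants "busy including iowait" would leave it out of the idle sum.

```python
def cpu_utilization(prev, curr):
    """Utilization in [0, 1] between two samples of the aggregate cpu line.

    Each sample lists the jiffy counters from /proc/stat in order:
    user, nice, system, idle, iowait, irq, softirq, steal, ...
    Only the first 8 fields are summed, since guest time is already
    included in user and nice.
    """
    delta = [c - p for p, c in zip(prev[:8], curr[:8])]
    total = sum(delta)
    if total <= 0:
        return 0.0
    idle = delta[3] + (delta[4] if len(delta) > 4 else 0)  # idle + iowait
    return (total - idle) / total


def read_cpu_sample(path="/proc/stat"):
    """Read the first line (`cpu ...`) of /proc/stat as a list of integers."""
    with open(path) as f:
        return [int(x) for x in f.readline().split()[1:]]


# Two made-up samples taken some interval apart.
before = [100, 0, 50, 800, 50, 0, 0, 0]
after = [150, 0, 70, 1500, 80, 0, 0, 0]
print(cpu_utilization(before, after))
```

Per-container figures would come from the container's own cgroup accounting rather than from the global `/proc/stat` counters.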

[Diagram: Soft Container feedback loop - Controller, Watcher, Limiter, Containers]

Immediate response

How to immediately resize the containers?

HOW DO WE LOOK AT RESIZE?

a. Create a new container;

b. Live migrate the contents to the new container:

1. Transfer the existing data to the new container;

2. Transfer the live (instant) data to the new container.

c. Stop the old container

d. Start the new container

e. Route the traffic to the new container

IN THE CONTAINER’S WORLD…

Process: 9527 /usr/sbin/httpd

Control Groups (cgroup), before: • CPU time: 20 • System memory: 1G • Disk bandwidth: 2000 • Network bandwidth: 100 Mb/s

Control Groups (cgroup), after: • CPU time: 70 • System memory: 5G • Disk bandwidth: 8000 • Network bandwidth: 1 Gb/s

a. Move the process to a new cgroup, or change the values of its current cgroup

b. Done!

We need to take a fresh look at resource management from the container’s perspective.
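Resizing in place then reduces to computing new cgroup values and rewriting a few files. The sketch below assumes cgroup v1 file names (`cpu.cfs_period_us`, `cpu.cfs_quota_us`, `memory.limit_in_bytes`) and interprets a figure such as "CPU time: 70" as a percentage of one core; that interpretation is an assumption, since the slide gives no unit.

```python
def cfs_quota_us(cpu_percent, period_us=100000):
    """Translate a CPU share in percent of one core into a CFS quota."""
    if cpu_percent <= 0:
        raise ValueError("cpu_percent must be positive")
    return int(period_us * cpu_percent / 100)


def resize_plan(cpu_percent, memory_bytes, period_us=100000):
    """Return the cgroup v1 files to rewrite and their new values."""
    return {
        "cpu/cpu.cfs_period_us": str(period_us),
        "cpu/cpu.cfs_quota_us": str(cfs_quota_us(cpu_percent, period_us)),
        "memory/memory.limit_in_bytes": str(memory_bytes),
    }


# Grow the container from 20% CPU / 1G to 70% CPU / 5G without restarting it.
for path, value in resize_plan(70, 5 * 1024 ** 3).items():
    print(path, "=", value)
```

No process is stopped, copied or rerouted: the running httpd simply sees looser limits once the values are written.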

SOFT CONTAINER: IMPLEMENTATION

• Controller: algorithm "expand", algorithm "pin_idle", algorithm plugin N

• Watcher: CPU plugin, Disk plugin, watcher plugin N

• Limiter: RunC plugin, Docker plugin, limiter plugin N

• Metrics Store: CPU statistics, Disk …, More …

• Container Repo: RunC plugin, Docker plugin, container type N

• Containers (auto discovery)
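A plugin layout like this could be expressed with small abstract interfaces and a registry. Everything in the sketch below is hypothetical: the class names, method signatures, metric keys and the toy "expand" policy are illustrations, not the project's actual code.

```python
from abc import ABC, abstractmethod


class Watcher(ABC):
    @abstractmethod
    def sample(self, container_id: str) -> dict:
        """Return the latest metrics for one container."""


class Limiter(ABC):
    @abstractmethod
    def apply(self, container_id: str, limits: dict) -> None:
        """Push new resource limits to one container."""


class Algorithm(ABC):
    @abstractmethod
    def decide(self, metrics: dict) -> dict:
        """Turn watched metrics into new limits."""


REGISTRY = {"watcher": {}, "limiter": {}, "algorithm": {}}


def register(kind, name):
    """Class decorator that records a plugin under REGISTRY[kind][name]."""
    def wrap(cls):
        REGISTRY[kind][name] = cls
        return cls
    return wrap


@register("algorithm", "expand")
class Expand(Algorithm):
    """Toy policy: grant 10 more CPU points while the host has idle CPU."""

    def decide(self, metrics):
        grow = 10 if metrics.get("host_cpu_idle", 0) > 0.2 else 0
        return {"cpu_quota": metrics.get("cpu_quota", 0) + grow}


algo = REGISTRY["algorithm"]["expand"]()
print(algo.decide({"host_cpu_idle": 0.5, "cpu_quota": 20}))
```

With this shape, supporting another container runtime or another metric means registering one more class rather than touching the controller.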

SOFT CONTAINER: CURRENT STATUS

• Early version

• Supports RunC and Docker containers

• A few effective controller algorithms

• Extensible with more plugins

Completely runnable!

Demo Time :-)

BENCHMARK RESULTS: BEFORE

If uncontrolled, the MySQL workload suffers severe interference from the co-located low-priority task

BENCHMARK RESULTS: BEFORE

CPU utilization is far from saturation while the workload varies over time (although in my case, disk I/O is highly utilized)

BENCHMARK RESULTS: SOFT CONTAINER

With Soft Container (green line), latency impact is controlled. (We can improve the algorithm to cope better with peak workload)

Soft Container helps improve CPU utilization by co-locating new tasks with MySQL

CPU utilization looks close to saturation, after adding in iowait time

HOW DOES SOFT CONTAINER DO THIS?

• Soft Container monitors app resource needs and overall resource utilization in real time

• Soft Container issues resource controls in real time, to guard app SLA and balance resource utilization

How the resource bubble floats under the control of Soft Container. (The vibration thresholds are made very sensitive to workload change)
