The Next Step of OpenStack Evolution for NFV Deployments

The Next Step of OpenStack Evolution for NFV Deployments, by Dirk Kutscher (NEC) and Chris Wright (Red Hat)

Transcript of The Next Step of OpenStack Evolution for NFV Deployments

Page 1: The Next Step of OpenStack Evolution for NFV Deployments

The Next Step of OpenStack Evolution for NFV Deployments

Dirk Kutscher, NEC
Chris Wright, Red Hat

Page 2: The Next Step of OpenStack Evolution for NFV Deployments

Intro

▌Dirk: Chief Researcher for Networking @ NEC Laboratories Europe; SDN architect; IRTF Information-Centric Networking RG; OPNFV TSC

▌Chris: Chief Technologist @ Red Hat; Linux developer; cloud, KVM, network virtualization; OpenDaylight & OPNFV Board

Page 3: The Next Step of OpenStack Evolution for NFV Deployments

NEC – Communications and IT Solutions

▌Cloud Infrastructure

▌Telecom networks and services

▌World's first commercial LTE deployment

▌World's first commercial vEPC deployment

▌Linux-based (and generally OSS-based) product range

Page 4: The Next Step of OpenStack Evolution for NFV Deployments

NEC NFV Platform

Page 5: The Next Step of OpenStack Evolution for NFV Deployments

NEC NFV Solutions History

Linux-based ATCA systems

First generation of virtualized systems with a proprietary resource manager

OpenStack-based VIM and orchestration systems

OPNFV-based solutions

Page 6: The Next Step of OpenStack Evolution for NFV Deployments

Relevant Upstream Projects

Linux Kernel

KVM

Open vSwitch

DPDK

libvirt

OpenStack (Neutron, Nova)

Page 7: The Next Step of OpenStack Evolution for NFV Deployments

Red Hat Upstream Leadership

❖ Red Hat has a nearly 20-year history in open source; we have the experience and resources to:
➢ Support production customers globally
➢ Drive new features
➢ Influence the strategy and direction of projects
➢ Enable partner collaboration

❖ Wide-ranging participation, in contrast with most others, who are more narrowly focused

❖ All of these efforts allow us to create an enterprise-grade distribution with the ecosystem, lifecycle, and support that customers expect from Red Hat

Page 8: The Next Step of OpenStack Evolution for NFV Deployments

Red Hat Product Mapping

Upstream projects: Linux Kernel, KVM, Open vSwitch, DPDK, libvirt, OpenStack (Neutron, Nova)

Mapping to Red Hat products, across compute, network, storage, and management:
 Compute: RHEL w/ KVM; virt stack (QEMU + libvirt)
 Network: OVS + DPDK; SDN controller
 Storage: Ceph
 Management: CloudForms
 OpenStack: RHEL OSP

Page 9: The Next Step of OpenStack Evolution for NFV Deployments

Working together inside OPNFV on Requirements

[Diagram: the OPNFV scope, from Virtual Network Functions and Orchestration and Management at the top, through NFV/Platform Requirements, down to Upstream/Partner Projects (compute, storage, network), supported by Continuous Build and Integration and Continuous Deployment and Testing]

NFV/platform requirements projects: Doctor, Promise
Testing projects: FuncTest, QTip
Build, integration, and deployment: Octopus/CI, Bootstrap/GetStarted, Pharos compliant lab …
Upstream/partner projects: OVS, KVM, OpenStack, OpenDaylight

Page 10: The Next Step of OpenStack Evolution for NFV Deployments

Requirements from a Telecommunications Perspective

▌General objectives for NFV-based networks
1. Automation (deployment, life-cycle management, elasticity)
2. Flexibility: adding/removing new features fast
3. Cost efficiency (consolidation of functions onto fewer physical boxes)

Specific requirements:

▌Availability and fault management: faults can happen; detect root causes reliably, react quickly, and minimize downtime

▌Performance: balance virtualization with optimal resource usage

▌Multi-domain operation: extend NFV domains across DC boundaries

Page 11: The Next Step of OpenStack Evolution for NFV Deployments

Work Items for OpenStack

Work items and examples:

1. Detecting and notifying about hardware failures: report HW failures to the guest layer to initiate application failover
2. Collecting information and configuring VM allocation: correlate vCPUs with pCPUs/NICs for pinning configuration
3. Multi-domain orchestration: multiple NFV domains interworking across DC networks
4. OpenStack (controller node) availability: support controller-node failover and isolate controller-node failures from VM operation
5. Physical server scale-out: automatic PM setup, including installation of agent software
6. Live system upgrade: update mechanism minimizing the impact on others
7. VM connectivity: VLAN tagging; mapping a dedicated physical NIC to each virtual NW
8. VM control commands: VM shutdown and reboot control from outside

Page 12: The Next Step of OpenStack Evolution for NFV Deployments

WI-O01: Detecting and Notifying about Hardware Failures

▌Infrastructure failures
 Can and will always happen
 Want to avoid impact on (critical) service availability

▌ATCA approach
 Standby components
 Intensive monitoring, with monitoring and management blades per box
 Integration into carriers' network management infrastructure

▌NFV and cloud approach
 Have to maintain service availability levels
 Want to find an appropriate telemetry and reaction approach ...
 ... without losing the benefits of virtualization and automation

Page 13: The Next Step of OpenStack Evolution for NFV Deployments

WI-O01: Detecting and Notifying about Hardware Failures

▌Physical machine failures
 Device failures: CPU, memory, disks (IDE, SCSI, SAS), IPMB bus, fan, chipset, etc.
 Device warnings: temperature anomaly, abnormal voltage, etc.
 System errors: kernel, file system, block device, boot, etc.
 State problems (notifications): NIC link, M-state, etc.

▌Chassis failures
 EM card errors and warnings, switch module failures, etc.

▌Storage failures
 Controller, physical disk, logical disk, power, fan, battery, monitoring bus, bus between shared disks, etc.

▌LAN redundancy errors
 Problems reported in health checks: LAN port errors, communication errors, etc.

Page 14: The Next Step of OpenStack Evolution for NFV Deployments

WI-O01: Detecting and Notifying about Hardware Failures

[Figure: ETSI NFV reference architecture. OSS/BSS and the NFV Management and Orchestration stack (Orchestrator, VNF Manager(s), Virtualised Infrastructure Manager(s), plus service, VNF, and infrastructure descriptions) sit beside the NFVI (virtual computing, storage, and network over computing, storage, and network hardware via the virtualisation layer), which hosts VNF 1-3 with EMS 1-3. Main reference points: Or-Vi, Or-Vnfm, Vi-Vnfm, Os-Ma, Se-Ma, Ve-Vnfm, Nf-Vi, Vn-Nf, Vl-Ha. Annotations: Option 1 asks how HV hosts notify or relay H/W failures to the guest OS and what the appropriate notification interface is, with recovery actions executed in the VNF (e.g. switch-over); Option 2 notifies the H/W failure to the management side, which executes recovery actions (e.g. deactivate the VNFC, recreate the VM).]

Report HW failures to the app (VNF instance) to initiate application failover:
1. Report HW failures directly from the hypervisor to VMs
2. Detect HW failures and report them to an orchestrator like Heat
3. Use existing monitoring solutions, e.g., Zabbix, Nagios

Page 15: The Next Step of OpenStack Evolution for NFV Deployments

WI-O01: Detecting and Notifying about Hardware Failures

▌Option 1: reporting from HV to VM (see the sketch after this list)
 Relay error notifications
 • Relay a NIC error by setting tap devices down (done by the Neutron L2 plugin agent)
 • Emulate errors as Machine Check Exceptions (MCE)
 Use qemu-guest-agent to send commands from the HV to the VM(s)
 • Requires extra packages to be added to the guest OS

▌Option 2: reporting to an orchestrator
 Detect H/W failures (e.g. abnormal CPU temperature) with Ceilometer agent(s) and report them to an orchestrator like Heat

▌Option 3: use existing monitoring solutions such as Zabbix or Nagios
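As a concrete illustration of Option 1's host-to-guest channel, here is a minimal Python sketch using the QEMU guest agent through libvirt. It only checks agent liveness with guest-ping; an actual failure relay would follow with an application-specific command, and the domain name is hypothetical. This is a sketch, not the implementation discussed in the slides.

```python
# Minimal sketch of Option 1's host-to-guest channel via the QEMU guest
# agent (requires libvirt-python on the host, qemu-guest-agent in the VM).
import json

import libvirt        # libvirt Python bindings
import libvirt_qemu   # QEMU-specific extension providing qemuAgentCommand()

def ping_guest_agent(domain_name, timeout=5):
    """Verify the in-guest agent is reachable before relaying a failure."""
    conn = libvirt.open("qemu:///system")
    try:
        dom = conn.lookupByName(domain_name)
        reply = libvirt_qemu.qemuAgentCommand(
            dom, json.dumps({"execute": "guest-ping"}), timeout, 0)
        return json.loads(reply)  # {"return": {}} on success
    finally:
        conn.close()

if __name__ == "__main__":
    print(ping_guest_agent("vnf-instance-1"))  # hypothetical domain name
```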

Page 16: The Next Step of OpenStack Evolution for NFV Deployments

WI-O01: Detecting and Notifying about Hardware Failures

▌Status as of Kilo / April 2015, for Option 2
 • Ceilometer performance improvements
  – Database data TTL (Juno)
    https://blueprints.launchpad.net/ceilometer/+spec/db-ttl
    https://review.openstack.org/#/c/30635/
  – Support for time-to-live on the event database
    https://blueprints.launchpad.net/ceilometer/+spec/event-database-ttl
    https://review.openstack.org/#/c/153943/
    https://review.openstack.org/#/c/146367/
  – Time-series database (Gnocchi)
    https://wiki.openstack.org/wiki/Gnocchi
 • The OPNFV Doctor project identified requirements
 • Russell Bryant's blog post
   http://blog.russellbryant.net/2015/03/10/the-different-facets-of-openstack-ha/
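To illustrate what those TTL blueprints enable operationally, here is a hypothetical ceilometer.conf excerpt. The option names are as introduced around Juno/Kilo; verify them against your release before use.

```ini
# Hypothetical ceilometer.conf excerpt: bound database growth by expiring
# old data instead of retaining it forever (values are in seconds).
[database]
metering_time_to_live = 86400   # drop metering samples older than one day
event_time_to_live = 86400      # drop events older than one day
```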

Page 17: The Next Step of OpenStack Evolution for NFV Deployments

WI-O02: Collecting Information and Configuring VM Allocation

▌Performance requirements for virtualized carrier networks

[Diagram: a compute node with two sockets (NUMA nodes 0 and 1); each socket holds two physical cores (core IDs 0 and 1) with two hyper-threads each (CPUs #0 through #7) and locally attached memory]

Page 18: The Next Step of OpenStack Evolution for NFV Deployments

WI-O02: Collecting Information and Configuring VM Allocation

Requirements (requirement A is sketched in code below):

A) Collect information about H/W resources

B) Configure VM allocation (e.g. specify a pCPU as a scheduler hint)

C) Allocate physical resources to specific VMs
 1. CPU pinning
 2. RAM allocation
 3. NIC: map a dedicated physical NIC to each virtualized network

[Diagram: the same two-socket NUMA compute node as on the previous page]
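A minimal sketch of requirement A: collecting the host's NUMA/CPU topology through libvirt's capabilities XML. The parsing is deliberately simplistic and purely illustrative.

```python
# Sketch for requirement A: read the host NUMA/CPU topology from libvirt's
# capabilities XML (simplified parsing, for illustration only).
import xml.etree.ElementTree as ET

import libvirt

conn = libvirt.open("qemu:///system")
caps = ET.fromstring(conn.getCapabilities())

# Each <cell> under <host><topology><cells> describes one NUMA node.
for cell in caps.findall(".//topology/cells/cell"):
    cpus = [cpu.get("id") for cpu in cell.findall("cpus/cpu")]
    memory_kib = cell.findtext("memory")
    print(f"NUMA node {cell.get('id')}: CPUs {cpus}, memory {memory_kib} KiB")
conn.close()
```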

Page 19: The Next Step of OpenStack Evolution for NFV Deployments

WI-O02: Collecting Information and Configuring VM Allocation

Compute Resource Management

[Diagram: CPU architecture of the compute node, the same two-socket NUMA layout, annotated with the assignments below]

CPU resource management schema:

CPU  Node  Core ID  Status
0    0     0        VM0-vCPU0
1    0     1
2    1     0        VM0-vCPU1
3    1     1
4    0     0        VM1-vCPU0
5    0     1
6    1     0        disabled (reserved for host OS)
7    1     1

Memory resource management schema:

Node  Huge page size  Total pages  Available pages
0     2M              80           40
1     1G              2            0
1     2M              40           40
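The two schemas read naturally as lookup tables. A small illustrative Python model follows; the field names are ours, not Nova's actual data model.

```python
# Illustrative in-memory model of the two schemas above: which pCPU/thread
# is assigned to which vCPU, and how many huge pages remain per NUMA node.
from dataclasses import dataclass

@dataclass
class PcpuSlot:
    cpu: int        # host thread ID (CPU #0..#7 above)
    node: int       # NUMA node
    core_id: int    # physical core within the node
    status: str     # e.g. "VM0-vCPU0", "free", "reserved-host"

@dataclass
class HugepagePool:
    node: int
    page_size: str  # "2M" or "1G"
    total: int
    available: int

cpu_table = [
    PcpuSlot(0, 0, 0, "VM0-vCPU0"), PcpuSlot(1, 0, 1, "free"),
    PcpuSlot(2, 1, 0, "VM0-vCPU1"), PcpuSlot(3, 1, 1, "free"),
    PcpuSlot(4, 0, 0, "VM1-vCPU0"), PcpuSlot(5, 0, 1, "free"),
    PcpuSlot(6, 1, 0, "reserved-host"), PcpuSlot(7, 1, 1, "free"),
]

mem_table = [
    HugepagePool(0, "2M", 80, 40),
    HugepagePool(1, "1G", 2, 0),
    HugepagePool(1, "2M", 40, 40),
]

def free_cpus_on_node(node):
    """pCPUs still available for pinning on the given NUMA node."""
    return [s.cpu for s in cpu_table if s.node == node and s.status == "free"]

print(free_cpus_on_node(1))  # -> [3, 7]
```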

Page 20: The Next Step of OpenStack Evolution for NFV Deployments

WI-O02: Collecting Information and Configuring VM Allocation

1. User sets a "resource control level" for each VM:

Level  CPU pinning  Avoid crossing NUMA node  Avoid sharing physical core
0      disabled     disabled                  disabled
1      enabled      disabled                  disabled
2      enabled      enabled                   disabled
3      enabled      disabled                  enabled
4      enabled      enabled                   enabled

[Diagram: Compute Resource Allocation examples on a two-NUMA-node host. With "avoid crossing NUMA node" (levels 2 and 4), a VM's vCPUs and memory stay within a single NUMA node; without it (levels 1 and 3), a VM may span both nodes. With "avoid sharing physical core" (levels 3 and 4), a VM's vCPUs get exclusive physical cores and the sibling threads are blocked, so VM2's vCPUs cannot share a CPU with another VM's vCPUs; without it (levels 1 and 2), vCPUs of different VMs may share a physical core. Legend: virtual CPU, available CPU, assigned CPU, blocked CPU.]

2. The orchestrator allocates compute resources to the VM according to its resource control level (a sketch follows below).
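A compact, purely illustrative rendering of the level table as placement constraints an orchestrator could enforce:

```python
# Illustrative mapping from "resource control level" to the placement
# constraints in the table above (level 0 = no control, 4 = strictest).
LEVELS = {
    0: dict(cpu_pinning=False, single_numa_node=False, exclusive_core=False),
    1: dict(cpu_pinning=True,  single_numa_node=False, exclusive_core=False),
    2: dict(cpu_pinning=True,  single_numa_node=True,  exclusive_core=False),
    3: dict(cpu_pinning=True,  single_numa_node=False, exclusive_core=True),
    4: dict(cpu_pinning=True,  single_numa_node=True,  exclusive_core=True),
}

def constraints_for(vm_level):
    """Return the constraints the orchestrator would enforce for a VM."""
    return LEVELS[vm_level]

# A level-4 VM must be pinned, stay on one NUMA node, and must not share
# physical cores with other VMs' vCPUs.
print(constraints_for(4))
```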

Page 21: The Next Step of OpenStack Evolution for NFV Deployments

WI-O02: Collecting Information and Configuring VM Allocation

Upstream development status:

▌Virt driver guest vCPU topology configuration (implemented in Juno)
 [BP] https://blueprints.launchpad.net/nova/+spec/virt-driver-vcpu-topology
 Gives users the ability to control the vCPU topology through flavor and image metadata.

▌Virt driver guest NUMA node placement & topology (implemented in Kilo)
 [BP] https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
 Enhances the libvirt driver to do intelligent NUMA node placement for guests.

▌Virt driver pinning guest vCPUs to host pCPUs (implemented in Kilo)
 [BP] https://blueprints.launchpad.net/nova/+spec/virt-driver-cpu-pinning
 Users can specify preferred and maximum counts of sockets, cores, and threads.

▌Virt driver large page allocation for guest RAM (implemented in Kilo)
 [BP] https://blueprints.launchpad.net/nova/+spec/virt-driver-large-pages

▌I/O (PCIe) based NUMA scheduling (implemented in Kilo)
 [BP] https://blueprints.launchpad.net/nova/+spec/input-output-based-numa-scheduling
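These blueprints surface as Nova flavor extra specs. A hedged sketch using python-novaclient follows; the spec keys are the real Juno/Kilo ones, while the flavor name, credentials, and endpoint are placeholders.

```python
# Sketch: exposing the Juno/Kilo NUMA features above via flavor extra specs.
from novaclient import client

nova = client.Client("2", "admin", "secret", "admin",
                     "http://controller:5000/v2.0")  # illustrative endpoint

flavor = nova.flavors.create(name="nfv.medium", ram=4096, vcpus=4, disk=20)
flavor.set_keys({
    "hw:cpu_policy": "dedicated",  # pin guest vCPUs to host pCPUs (Kilo)
    "hw:numa_nodes": "1",          # confine the guest to one NUMA node (Kilo)
    "hw:mem_page_size": "large",   # back guest RAM with huge pages (Kilo)
    "hw:cpu_sockets": "1",         # guest vCPU topology (Juno)
    "hw:cpu_cores": "4",
})
```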

Page 22: The Next Step of OpenStack Evolution for NFV Deployments

OPNFV Doctor: Fault management use case

[Diagram: consumers C1, C2, and C3 above the Virtualized Infrastructure Manager (VIM), e.g. OpenStack, which keeps a resource map (server-to-VM mapping: server S1 hosts VM-1 and VM-2, S2 hosts VM-7, S3 hosts VM-4; ownership: VM-1 and VM-7 belong to C1, VM-2 to C2, VM-4 to C3) over a resource pool of hypervisors on hardware servers S1 through S3]

Fault-management flow, over the OpenStack northbound interface:
1. Fault monitoring: hardware faults, hypervisor faults, host OS faults
2. Inform the consumer? If yes, find the owner of the affected VMs in the database
3. Fault notification (VM ID, fault ID) to the consumer
4. The consumer switches to the SBY (standby) configuration
5. Instruction (VM ID) back to the VIM
6. Execute the instruction, e.g. migrate the VM

• The VIM alone cannot detect certain NFVI faults; detecting those faults and notifying the Consumer is necessary to ensure the proper functioning of EPC VNFs like the MME and S/P-GW (a consumer-side sketch follows below)
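Steps 3 through 5 suggest a simple consumer-side notification endpoint. A self-contained Python sketch follows; the payload fields and port are assumptions, not a standardized Doctor API.

```python
# Illustrative consumer-side handler for the fault-management flow above
# (payload fields and endpoint are assumptions, not a standard API).
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class FaultHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Step 3: fault notification from the VIM (VM ID, fault ID).
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        vm_id, fault_id = body["vm_id"], body["fault_id"]
        switch_to_standby(vm_id)  # step 4: application-level failover
        instruct_vim(vm_id)       # step 5: e.g. ask the VIM to recreate the VM
        self.send_response(204)
        self.end_headers()

def switch_to_standby(vm_id):
    print(f"ACT/SBY switch-over away from {vm_id}")

def instruct_vim(vm_id):
    print(f"instructing VIM to recover {vm_id}")

if __name__ == "__main__":
    HTTPServer(("", 8080), FaultHandler).serve_forever()
```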

Page 23: The Next Step of OpenStack Evolution for NFV Deployments

OPNFV Doctor: Maintenance use case

[Diagram: the same consumers, VIM with resource map, and resource pool as in the fault-management use case]

Maintenance flow, over the OpenStack northbound interface:
1. Maintenance request (server S3) from the administrator
2. Which VMs are affected? Find the consumer owning the VM(s) in the database
3. Maintenance notification (VM ID) to the consumer
4. The consumer switches to the SBY (standby) configuration
5. Instruction (VM ID) back to the VIM
6. Execute the instruction, e.g. migrate the VM

• The VIM needs to receive maintenance instructions from the Consumer, i.e. the operator/administrator of the VNF (an admin-side sketch follows below)
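On the VIM/administrator side, steps 1 and 6 could look like the following python-novaclient sketch: find the VMs on the server under maintenance and live-migrate them away. Host name, credentials, and endpoint are placeholders.

```python
# Illustrative admin-side maintenance sketch: drain server S3 by
# live-migrating its VMs before taking it down for maintenance.
from novaclient import client

nova = client.Client("2", "admin", "secret", "admin",
                     "http://controller:5000/v2.0")  # illustrative endpoint

# Steps 1-2: find all VMs currently hosted on the server under maintenance.
vms = nova.servers.list(search_opts={"host": "S3", "all_tenants": 1})

# Step 6: move each VM; the scheduler picks the target host (host=None).
for vm in vms:
    vm.live_migrate(host=None, block_migration=False, disk_over_commit=False)
```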

Page 25: The Next Step of OpenStack Evolution for NFV Deployments

Summary and Outlook

▌Open-source NFV infrastructure is vital for agile development of robust, high-performance, and open solutions

▌The NFV platform spans multiple open-source projects

▌Red Hat and NEC: an upstream-first approach

▌The OpenStack Telco WG develops and analyzes use cases within the OpenStack community

▌OPNFV: implementing the ETSI NFV framework and developing new requirements with an upstream-first approach

Page 26: The Next Step of OpenStack Evolution for NFV Deployments

Thank you