PaaS Evolution of Huawei IT System with Kubernetes
Zefeng Wang (Kevin), Huawei; Jianlin Wu, Huawei
Agenda
1. Overview
2. Practices
3. Outcome & Benefits
4. Issues & lessons learned
5. Looking Forward
6. Q&A
Overview: Huawei IT introduction
2,000+ business services
8 DCs in use, across continents
Hundreds of thousands of VMs
170k+ users
Thousands of tech stacks
[Map: DC locations — Russia, South Africa, UK, Middle East, South America, China East, China South, Mexico]
Overview: Challenges
01. Traditional IT model can't respond agilely to business upgrades: too many manual approval processes, no DevOps.
02. Large-scale deployment is hard to O&M: VM scale grows fast and is expensive to maintain.
03. Microservice transformation makes instances multiply fast: each service adds 3x more instances than before, and IaaS can't scale fast enough.
04. Heavyweight virtualization yields low resource utilization: an app instance often takes a whole VM, with too much reserved resource.
05. User experience of global services is not good enough: no global distributed routing, no guaranteed response latency.
Overview: Roadmap
• Transformation from traditional, monolithic, SOA applications (WAS) to cloud native, moving from IaaS to PaaS; over 50% of production is already done.
• Distributed/microservice architecture makes instances multiply.
• Lightweight containerized Tomcat applications have become the major choice for IT business services.
[Chart: Containerization (Dept P), 2016–2020 — VM (WAS) instance counts decline year over year while Docker (Tomcat/MSA) container counts grow rapidly, as WAS packages are replaced by Tomcat packages.]
Monolithic → SOA → Distributed → Containerized microservices
Overview: Architecture
Application-oriented “All Cloud” architecture for Enterprise IT
[Architecture diagram: application-oriented "All Cloud" stack]
- Huawei IT apps (Ideal, Delivery, Financial, …) spanning Dev, CI, CD, Test, and Run
- Dev & CI: Web IDE, micro-service framework, building, code repository, static analysis, pipeline, SCM
- Test: testing IDE, interface testing, production testing
- Run: API mgmt., container services, middleware services, resource config center, logging & tracing, monitoring & alerts
- PaaS Core (3+1): application scheduling & resource management, micro-service dev & governance, application development pipeline, cloud-native middleware services
- Operations & integration: service integration, integration gateway, traffic mgmt., accounting mgmt., security mgmt., capacity mgmt., health mgmt., redundancy, config mgmt., self-servicing, ops console
- Deployment: Kubernetes clusters in active/active DCs (DG, NJ, SZ; active/active within ~1 km) plus an active/passive disaster-recovery DC
Overview: Architecture
[Diagram: DC-UK, DC-NJ, DC-DG, DC-SZ managed from one portal]
Portal
Cloud Controller (cloud services, message queue)
Cluster Controller / Federation per DC
- Unified topology abstraction
- Declarative workload distribution
- Asynchronous communication
- Modular architecture
- Unified O&M
Manages 4,000+ nodes (VM/PM) and 20,000+ containers.
Practice: Unified Topology Abstraction
Hierarchy: Region → Availability Zone → Data Center → Cluster → Node, plus Cell and ENV alias groupings.
Simplifies management: unified visualization, scheduling, monitoring, and tracing.
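As a rough illustration only (the deck does not show Huawei's actual mechanism), such a topology can be surfaced to the scheduler by labeling nodes. The failure-domain keys below were the upstream convention of this era; the dc and cell keys are hypothetical:

```yaml
# Sketch: surfacing the Region > AZ > DC hierarchy as node labels so that
# scheduling, monitoring, and tracing can filter by topology.
apiVersion: v1
kind: Node
metadata:
  name: node-0001
  labels:
    failure-domain.beta.kubernetes.io/region: cn-south   # Region
    failure-domain.beta.kubernetes.io/zone: az-1         # Availability Zone
    huawei.com/dc: dc-sz                                 # Data Center (hypothetical key)
    huawei.com/cell: cell-07                             # Cell (hypothetical key)
```

A pod can then pin itself to a DC or cell with a matching nodeSelector.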
Practice: Multi-tenancy & Security
Dedicated nodes:
Each node is dedicated to one project/namespace; requests from a node are limited to its corresponding namespace. Applications of different projects are isolated because they run on different sets of nodes (see the sketch after this slide).
[Diagram: k8s master + ELB; node1/node2 bound to namespace A run app1–app3, node3/node4 bound to namespace B run app4–app6]

Shared nodes (k8s default):
No node is bound to any namespace; pods of different namespaces can run on the same node.
[Diagram: shared node5/node6 run app7/app8 side by side]

Security:
- Token-based user-request authN/authZ: log in via Portal/CLI and fetch a token (KeyStone); requests carry the token; the ingress controller performs PKI-based token validation via the Cert Manager.
- PK/cert-based authN/authZ between components, with role and namespace info embedded into the certs.
- Secrets encrypted in etcd.
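The deck does not say how node-to-namespace binding is implemented; on stock Kubernetes, a similar dedicated-nodes effect can be sketched with a taint plus a matching toleration and nodeSelector (label key, namespace, and image are hypothetical):

```yaml
# Sketch: dedicate node1/node2 to project A, then schedule its pods there.
#   kubectl label nodes node1 node2 dedicated=project-a
#   kubectl taint nodes node1 node2 dedicated=project-a:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: app1
  namespace: project-a              # hypothetical namespace
spec:
  nodeSelector:
    dedicated: project-a            # land only on project A's nodes
  tolerations:
  - key: dedicated
    operator: Equal
    value: project-a
    effect: NoSchedule              # tolerate the dedication taint
  containers:
  - name: app1
    image: example/app1:latest      # hypothetical image
```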
Practice: Active-active deployments
[Diagram: ELB with health checks in front of cluster A and cluster B; each cluster has its own k8s master and nodes running replicas of app1–app3; a workload dispatcher sits above both clusters]
• Two-level scheduling: 1) scheduling across clusters, 2) pod scheduling inside each cluster.
• The dispatcher splits the workload and calls the k8s API to create it in each cluster.
• Users can create applications that span clusters.
Workload dispatcher: Kubernetes Federation!
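The deck does not show the dispatcher's API; for reference, upstream Federation v1 of this era distributed a workload across member clusters via a preferences annotation on a federated ReplicaSet. A minimal sketch, assuming member clusters registered as cluster-a and cluster-b (names and image hypothetical):

```yaml
# Sketch: a ReplicaSet submitted to the Federation v1 API server; the
# preferences annotation controls how replicas are split across clusters.
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
  name: app1
  annotations:
    federation.kubernetes.io/replica-set-preferences: |
      {
        "rebalance": true,
        "clusters": {
          "cluster-a": {"weight": 1},
          "cluster-b": {"weight": 1}
        }
      }
spec:
  replicas: 4                        # split 2/2 across the two clusters
  template:
    metadata:
      labels:
        app: app1
    spec:
      containers:
      - name: app1
        image: example/app1:latest   # hypothetical image
```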
Practice: Process & Container in pods
[Diagram: API server + etcd manage both "pods with containers" and "pods with processes"; bootstrap pulls from the image & software repository]
Cloud-native (containerized) applications:
- Run as pods with containers.
Legacy (non-containerized) applications:
- Run as special pods with processes: an extended runtime reflects processes, running packages directly on nodes.
- Useful when applications are hard to containerize but should still move to PaaS.
Result: hybrid application scheduling and resource management.
Practice: Automated Cluster Deployment
1. Kubernetes on Kubernetes: easy to update the tenant control plane.
2. Agent management with xLet.
3. Automatic machine take-over (VM/PM → node).
4. Integrated with the IaaS CPI for dynamic VM provisioning.
[Diagram: a management master (API server, etcd, controllers, scheduler) hosts etcd and API servers for tenant control planes; a machine controller and the xLet agent install nodes (VM/PM) from the software/image repository]
Practice: Integration with Custom Load Balancer
[Diagram: Kubernetes + LB ingress controller, backed by Redis]
ELB killer features:
• Enterprise-grade support for the WS-AT transaction protocol for WebSphere applications
• Online dynamic route configuration with zero downtime
• Sticky sessions, health checks, …
• Multi-DC failover
• Policy-based traffic control for canary releases
Supports 260+ domains; access traffic: 100 million requests/day.
Practice: Network
Network-sensitive and IP-based-only applications (sketched below):
- Docker engine in bridge network mode; container ports mapped to host ports.
- Host ports are auto-provisioned to the pod when it starts on a node; host IP and host port are injected as environment variables.
- Pros: low overhead, good performance.
- Cons: no service registration/discovery, no multi-tenancy.

Cloud-native applications:
- kube-proxy (IPVS, iptables) and Services.
- iCAN network.

[Diagram: two nodes (eth0 192.168.1.11 and 192.168.1.12), each with a docker0 bridge at 172.16.93.x/24 hosting two pods; traffic inside a node crosses the bridge, while traffic across nodes and to the Internet traverses the physical/IaaS network]
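A rough sketch of this pattern on plain Kubernetes: the automatic host-port provisioning is Huawei's own, but the port mapping and host-IP injection can be approximated with hostPort plus the downward API (image, port numbers, and variable names are hypothetical):

```yaml
# Sketch: bridge-style port mapping and host-IP injection for an
# IP-based legacy app. Here the host port is fixed for illustration;
# in the real system it is provisioned dynamically.
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  containers:
  - name: app
    image: example/legacy-app:latest   # hypothetical image
    ports:
    - containerPort: 8080
      hostPort: 18080                  # container port mapped to a host port
    env:
    - name: HOST_IP                    # host IP injected as an env var
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
    - name: HOST_PORT
      value: "18080"                   # static here; dynamic in the real system
```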
Practice: Storage
[Diagram: on a k8s node, one pod mounts hostPath volumes (log, heapdump, other) while another mounts driver-backed volumes (config, data, other) from CephFS, FusionStorage, …, via NFS/Ceph/Fuxi drivers and Cinder]
HostPath:
- For logs and heap dumps.
- Cleaned up with GC scripts.
Fuxi (https://github.com/openstack/fuxi):
- Huawei's open-source storage solution, integrating with Cinder / FusionStorage.
- Used for PV cases.
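A minimal sketch of the hostPath side, with hypothetical paths and image; as noted above, a GC script on the node has to reclaim the space:

```yaml
# Sketch: node-local hostPath volume for logs/heap dumps; the directory
# is cleaned up by GC scripts on the node, not by Kubernetes.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-local-logs
spec:
  containers:
  - name: app
    image: example/app:latest              # hypothetical image
    volumeMounts:
    - name: logs
      mountPath: /var/log/app              # app writes logs/heapdumps here
  volumes:
  - name: logs
    hostPath:
      path: /data/logs/app-with-local-logs # node path, reclaimed by GC scripts
```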
Practice: Scaling & Upgrade
Auto scaling:
[Diagram: Kubernetes (API server, etcd) works with the IaaS CPI and the PaaS ELB to add nodes on demand]
Upgrade:
- Canary release: the ingress shifts part of the traffic from V1 instances to V2 instances.
- Rolling update: instances are replaced batch by batch (1, 2, 3, …), V1 → V2.
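The canary path above runs through the custom ELB's policy routing, but the rolling-update half maps directly onto a stock Deployment strategy. A minimal sketch with illustrative names and sizes:

```yaml
# Sketch: rolling update driven by a Deployment strategy; bumping the
# image from v1 to v2 replaces pods batch by batch.
apiVersion: apps/v1beta1                 # Deployment API of the k8s 1.6/1.7 era
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1                  # take down at most one old pod at a time
      maxSurge: 1                        # allow one extra pod during the update
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example/web:v2            # hypothetical image; v1 -> v2 triggers the update
        readinessProbe:                  # gate traffic until the new pod is ready
          httpGet:
            path: /healthz
            port: 8080
```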
Practice: Monitoring
[Diagram: ICAgent and Prometheus feed Kafka; data lands in OpenTSDB/HBase/ES; a policy engine drives big-data analysis, alerts, and event visualization]
- Metrics collection: Prometheus exporters, ICAgent.
- Storage: OpenTSDB/HBase/ES, across multiple data centers.
- Intelligent analysis: real-time analysis and big-data analytics; intelligent alerting and auto-scaling.
- Visualization: errors, warning percentage, resource pressure, etc.; Heapster.
Supports online monitoring of tens of thousands of containers.
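The ICAgent → Kafka → OpenTSDB path is Huawei-internal, but the Prometheus-exporter half of the collection can be sketched with a standard pod-discovery scrape config (the job name and annotation convention are assumptions, not from the deck):

```yaml
# prometheus.yml fragment: discover pods through the Kubernetes API and
# scrape only those annotated prometheus.io/scrape=true.
scrape_configs:
- job_name: kubernetes-pods              # hypothetical job name
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
```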
Outcome & Benefits
01. 50% fewer VMs than predicted, 3x resource utilization, large maintenance savings: applications are containerized on VMs/PMs instead of running directly and exclusively on VMs.
02. Infrastructure-agnostic application scheduling and resource management: shields applications from the cloud provider, application runtime, network, and storage.
03. 10x dev efficiency and faster resource approval (days → seconds), enabling DevOps: it used to take days to get a VM; now it takes seconds to get a container.
04. Automated O&M with 90% less manual work: e.g., 50% fewer O&M engineers in Dept P.
05. Improved application availability with the canary release solution: advanced traffic control and load balancing.
Issues & lessons learned
- Various node templates are hard to maintain: 1) use a unified standard node template; 2) unify configuration management.
- Limited scale of Services: 1) work around it with hostPort mapping; 2) use IPVS instead of iptables.
- Don't forget log rotation: docker logging driver --log-driver json-file --log-opt max-size=10m --log-opt max-file=3
- Too many open files in containers: docker daemon --default-ulimit nofile=20480:40960 --default-ulimit nproc=10240:20480
- Devicemapper "no space left": use direct-lvm instead of a loop device; it supports online scaling. See https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#how-the-devicemapper-storage-driver-works
- Buggy application containers can OOM a node: set a default memory limit for containers (LimitRange), as sketched below.
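A minimal sketch of that LimitRange fix, with illustrative sizes and a hypothetical namespace:

```yaml
# Sketch: default memory limits for every container in a namespace, so
# one buggy app cannot OOM the whole node.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-mem
  namespace: project-a          # hypothetical namespace
spec:
  limits:
  - type: Container
    default:
      memory: 512Mi             # limit applied when a container sets none
    defaultRequest:
      memory: 256Mi             # request applied when a container sets none
```

A container that exceeds its limit is then OOM-killed individually instead of destabilizing the node.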
Looking Forward
1. Federated Jobs
2. Horizontal Workload Scaling across Federation
3. Federated StatefulSets
4. Resource Quota Federation
5. IPVS-based in-cluster service load balancing
6. Multi-network plan
7. Automatic rollback of failed deployments
8. In-place rolling update
9. …
Q & A
Thank you!