PaaS Evolution of Huawei IT System with Kubernetes
Zefeng Wang (Kevin), Huawei; Jianlin Wu, Huawei
Agenda
1. Overview
2. Practices
3. Outcome & Benefits
4. Issues & lessons learned
5. Looking Forward
6. Q&A
Overview: Huawei IT introduction
2,000+ business services
8 DCs in use, across continents
Hundreds of thousands of VMs
170k+ users
Thousands of tech stacks
[Map: DC locations — Russia, South Africa, UK, Middle East, South America, China East, China South, Mexico]
Overview: Challenges
01. Traditional IT model can't respond agilely to business upgrades: too many manual approval processes, no DevOps.
02. Large-scale deployment is hard to O&M: VM scale grows fast and is expensive to maintain.
03. Microservice transformation makes instances multiply fast: each service adds 3x more instances than before, and IaaS can't scale fast enough.
04. Heavyweight virtualization yields low resource utilization: an app instance often takes a whole VM, with too much reserved resource.
05. User experience of global services is not good enough: no global distributed routing, no guaranteed response latency.
Overview: Roadmap
• Transformation from traditional, monolithic, SOA applications (WAS) to cloud native, moving from IaaS to PaaS; over 50% of production is already done.
• Distributed/microservice architecture makes instances multiply.
• Lightweight containerized Tomcat applications have become the major choice for IT business services.
[Chart: Containerization (Dept P), 2016–2020 — VM (WAS) instance counts decline year over year while Docker (Tomcat/MSA) container counts grow rapidly, as WAS packages are replaced by Tomcat packages.]
Monolithic → SOA → Distributed → Containerized microservices
Overview: Architecture
Application-oriented “All Cloud” architecture for Enterprise IT
[Architecture diagram: application-oriented "All Cloud" stack]
- Huawei IT apps (Ideal, Delivery, Financial, …) spanning Dev, CI, CD, Test, and Run
- Dev & CI: Web IDE, micro-service framework, building, code repository, static analysis, pipeline, SCM
- Test: testing IDE, interface testing, production testing
- Run: API mgmt., container services, middleware services, resource config center, logging & tracing, monitoring & alerts
- PaaS Core (3+1): application scheduling & resource management, micro-service dev & governance, application development pipeline, cloud-native middleware services
- Operations & integration: service integration, integration gateway, traffic mgmt., accounting mgmt., security mgmt., capacity mgmt., health mgmt., redundancy, config mgmt., self-servicing, ops console
- Deployment: Kubernetes clusters in active/active DCs (DG, NJ, SZ; active/active within ~1 km) plus an active/passive disaster-recovery DC
Overview: Architecture
[Diagram: DC-UK, DC-NJ, DC-DG, DC-SZ managed from one portal]
Portal
Cloud Controller (cloud services, message queue)
Cluster Controller / Federation per DC
- Unified topology abstraction
- Declarative workload distribution
- Asynchronous communication
- Modular architecture
- Unified O&M
Manages 4,000+ nodes (VM/PM) and 20,000+ containers.
Practice: Unified Topology Abstraction
Hierarchy: Region → Availability Zone → Data Center → Cluster → Node, plus Cell and ENV alias groupings.
Simplifies management: unified visualization, scheduling, monitoring, and tracing.
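As a rough illustration only (the deck does not show Huawei's actual mechanism), such a topology can be surfaced to the scheduler by labeling nodes. The failure-domain keys below were the upstream convention of this era; the dc and cell keys are hypothetical:

```yaml
# Sketch: surfacing the Region > AZ > DC hierarchy as node labels so that
# scheduling, monitoring, and tracing can filter by topology.
apiVersion: v1
kind: Node
metadata:
  name: node-0001
  labels:
    failure-domain.beta.kubernetes.io/region: cn-south   # Region
    failure-domain.beta.kubernetes.io/zone: az-1         # Availability Zone
    huawei.com/dc: dc-sz                                 # Data Center (hypothetical key)
    huawei.com/cell: cell-07                             # Cell (hypothetical key)
```

A pod can then pin itself to a DC or cell with a matching nodeSelector.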
Practice: Multi-tenancy & Security
Dedicated nodes:
Each node is dedicated to one project/namespace; requests from a node are limited to its corresponding namespace. Applications of different projects are isolated because they run on different sets of nodes (see the sketch after this slide).
[Diagram: k8s master + ELB; node1/node2 bound to namespace A run app1–app3, node3/node4 bound to namespace B run app4–app6]

Shared nodes (k8s default):
No node is bound to any namespace; pods of different namespaces can run on the same node.
[Diagram: shared node5/node6 run app7/app8 side by side]

Security:
- Token-based user-request authN/authZ: log in via Portal/CLI and fetch a token (KeyStone); requests carry the token; the ingress controller performs PKI-based token validation via the Cert Manager.
- PK/cert-based authN/authZ between components, with role and namespace info embedded into the certs.
- Secrets encrypted in etcd.
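The deck does not say how node-to-namespace binding is implemented; on stock Kubernetes, a similar dedicated-nodes effect can be sketched with a taint plus a matching toleration and nodeSelector (label key, namespace, and image are hypothetical):

```yaml
# Sketch: dedicate node1/node2 to project A, then schedule its pods there.
#   kubectl label nodes node1 node2 dedicated=project-a
#   kubectl taint nodes node1 node2 dedicated=project-a:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: app1
  namespace: project-a              # hypothetical namespace
spec:
  nodeSelector:
    dedicated: project-a            # land only on project A's nodes
  tolerations:
  - key: dedicated
    operator: Equal
    value: project-a
    effect: NoSchedule              # tolerate the dedication taint
  containers:
  - name: app1
    image: example/app1:latest      # hypothetical image
```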
Practice: Active-active deployments
[Diagram: ELB with health checks in front of cluster A and cluster B; each cluster has its own k8s master and nodes running replicas of app1–app3; a workload dispatcher sits above both clusters]
• Two-level scheduling: 1) scheduling across clusters, 2) pod scheduling inside each cluster.
• The dispatcher splits the workload and calls the k8s API to create it in each cluster.
• Users can create applications that span clusters.
Workload dispatcher: Kubernetes Federation!
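The deck does not show the dispatcher's API; for reference, upstream Federation v1 of this era distributed a workload across member clusters via a preferences annotation on a federated ReplicaSet. A minimal sketch, assuming member clusters registered as cluster-a and cluster-b (names and image hypothetical):

```yaml
# Sketch: a ReplicaSet submitted to the Federation v1 API server; the
# preferences annotation controls how replicas are split across clusters.
apiVersion: extensions/v1beta1
kind: ReplicaSet
metadata:
  name: app1
  annotations:
    federation.kubernetes.io/replica-set-preferences: |
      {
        "rebalance": true,
        "clusters": {
          "cluster-a": {"weight": 1},
          "cluster-b": {"weight": 1}
        }
      }
spec:
  replicas: 4                        # split 2/2 across the two clusters
  template:
    metadata:
      labels:
        app: app1
    spec:
      containers:
      - name: app1
        image: example/app1:latest   # hypothetical image
```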
Practice: Process & Container in pods
[Diagram: API server + etcd manage both "pods with containers" and "pods with processes"; bootstrap pulls from the image & software repository]
Cloud-native (containerized) applications:
- Run as pods with containers.
Legacy (non-containerized) applications:
- Run as special pods with processes: an extended runtime reflects processes, running packages directly on nodes.
- Useful when applications are hard to containerize but should still move to PaaS.
Result: hybrid application scheduling and resource management.
Practice: Automated Cluster Deployment
1. Kubernetes on Kubernetes: easy to update the tenant control plane.
2. Agent management with xLet.
3. Automatic machine take-over (VM/PM → node).
4. Integrated with the IaaS CPI for dynamic VM provisioning.
[Diagram: a management master (API server, etcd, controllers, scheduler) hosts etcd and API servers for tenant control planes; a machine controller and the xLet agent install nodes (VM/PM) from the software/image repository]
Practice: Integration with Custom Load Balancer
[Diagram: Kubernetes + LB ingress controller, backed by Redis]
ELB killer features:
• Enterprise-grade support for the WS-AT transaction protocol for WebSphere applications
• Online dynamic route configuration with zero downtime
• Sticky sessions, health checks, …
• Multi-DC failover
• Policy-based traffic control for canary releases
Supports 260+ domains; access traffic: 100 million requests/day.
Practice: Network
Network-sensitive and IP-based-only applications (sketched below):
- Docker engine in bridge network mode; container ports mapped to host ports.
- Host ports are auto-provisioned to the pod when it starts on a node; host IP and host port are injected as environment variables.
- Pros: low overhead, good performance.
- Cons: no service registration/discovery, no multi-tenancy.

Cloud-native applications:
- kube-proxy (IPVS, iptables) and Services.
- iCAN network.

[Diagram: two nodes (eth0 192.168.1.11 and 192.168.1.12), each with a docker0 bridge at 172.16.93.x/24 hosting two pods; traffic inside a node crosses the bridge, while traffic across nodes and to the Internet traverses the physical/IaaS network]
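A rough sketch of this pattern on plain Kubernetes: the automatic host-port provisioning is Huawei's own, but the port mapping and host-IP injection can be approximated with hostPort plus the downward API (image, port numbers, and variable names are hypothetical):

```yaml
# Sketch: bridge-style port mapping and host-IP injection for an
# IP-based legacy app. Here the host port is fixed for illustration;
# in the real system it is provisioned dynamically.
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  containers:
  - name: app
    image: example/legacy-app:latest   # hypothetical image
    ports:
    - containerPort: 8080
      hostPort: 18080                  # container port mapped to a host port
    env:
    - name: HOST_IP                    # host IP injected as an env var
      valueFrom:
        fieldRef:
          fieldPath: status.hostIP
    - name: HOST_PORT
      value: "18080"                   # static here; dynamic in the real system
```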
Practice: Storage
[Diagram: on a k8s node, one pod mounts hostPath volumes (log, heapdump, other) while another mounts driver-backed volumes (config, data, other) from CephFS, FusionStorage, …, via NFS/Ceph/Fuxi drivers and Cinder]
HostPath:
- For logs and heap dumps.
- Cleaned up with GC scripts.
Fuxi (https://github.com/openstack/fuxi):
- Huawei's open-source storage solution, integrating with Cinder / FusionStorage.
- Used for PV cases.
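A minimal sketch of the hostPath side, with hypothetical paths and image; as noted above, a GC script on the node has to reclaim the space:

```yaml
# Sketch: node-local hostPath volume for logs/heap dumps; the directory
# is cleaned up by GC scripts on the node, not by Kubernetes.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-local-logs
spec:
  containers:
  - name: app
    image: example/app:latest              # hypothetical image
    volumeMounts:
    - name: logs
      mountPath: /var/log/app              # app writes logs/heapdumps here
  volumes:
  - name: logs
    hostPath:
      path: /data/logs/app-with-local-logs # node path, reclaimed by GC scripts
```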
Practice: Scaling & Upgrade
Auto scaling:
[Diagram: Kubernetes (API server, etcd) works with the IaaS CPI and the PaaS ELB to add nodes on demand]
Upgrade:
- Canary release: the ingress shifts part of the traffic from V1 instances to V2 instances.
- Rolling update: instances are replaced batch by batch (1, 2, 3, …), V1 → V2.
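The canary path above runs through the custom ELB's policy routing, but the rolling-update half maps directly onto a stock Deployment strategy. A minimal sketch with illustrative names and sizes:

```yaml
# Sketch: rolling update driven by a Deployment strategy; bumping the
# image from v1 to v2 replaces pods batch by batch.
apiVersion: apps/v1beta1                 # Deployment API of the k8s 1.6/1.7 era
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1                  # take down at most one old pod at a time
      maxSurge: 1                        # allow one extra pod during the update
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example/web:v2            # hypothetical image; v1 -> v2 triggers the update
        readinessProbe:                  # gate traffic until the new pod is ready
          httpGet:
            path: /healthz
            port: 8080
```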
Practice: Monitoring
[Diagram: ICAgent and Prometheus feed Kafka; data lands in OpenTSDB/HBase/ES; a policy engine drives big-data analysis, alerts, and event visualization]
- Metrics collection: Prometheus exporters, ICAgent.
- Storage: OpenTSDB/HBase/ES, across multiple data centers.
- Intelligent analysis: real-time analysis and big-data analytics; intelligent alerting and auto-scaling.
- Visualization: errors, warning percentage, resource pressure, etc.; Heapster.
Supports online monitoring of tens of thousands of containers.
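The ICAgent → Kafka → OpenTSDB path is Huawei-internal, but the Prometheus-exporter half of the collection can be sketched with a standard pod-discovery scrape config (the job name and annotation convention are assumptions, not from the deck):

```yaml
# prometheus.yml fragment: discover pods through the Kubernetes API and
# scrape only those annotated prometheus.io/scrape=true.
scrape_configs:
- job_name: kubernetes-pods              # hypothetical job name
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  - source_labels: [__meta_kubernetes_pod_name]
    target_label: pod
```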
Outcome & Benefits
01. 50% fewer VMs than predicted, 3x resource utilization, large maintenance savings: applications are containerized on VMs/PMs instead of running directly and exclusively on VMs.
02. Infrastructure-agnostic application scheduling and resource management: shields applications from the cloud provider, application runtime, network, and storage.
03. 10x dev efficiency and faster resource approval (days → seconds), enabling DevOps: it used to take days to get a VM; now it takes seconds to get a container.
04. Automated O&M with 90% less manual work: e.g., 50% fewer O&M engineers in Dept P.
05. Improved application availability with the canary release solution: advanced traffic control and load balancing.
Issues & lessons learned
- Various node templates are hard to maintain: 1) use a unified standard node template; 2) unify configuration management.
- Limited scale of Services: 1) work around it with hostPort mapping; 2) use IPVS instead of iptables.
- Don't forget log rotation: docker logging driver --log-driver json-file --log-opt max-size=10m --log-opt max-file=3
- Too many open files in containers: docker daemon --default-ulimit nofile=20480:40960 --default-ulimit nproc=10240:20480
- Devicemapper "no space left": use direct-lvm instead of a loop device; it supports online scaling. See https://docs.docker.com/engine/userguide/storagedriver/device-mapper-driver/#how-the-devicemapper-storage-driver-works
- Buggy application containers can OOM a node: set a default memory limit for containers (LimitRange), as sketched below.
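A minimal sketch of that LimitRange fix, with illustrative sizes and a hypothetical namespace:

```yaml
# Sketch: default memory limits for every container in a namespace, so
# one buggy app cannot OOM the whole node.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-mem
  namespace: project-a          # hypothetical namespace
spec:
  limits:
  - type: Container
    default:
      memory: 512Mi             # limit applied when a container sets none
    defaultRequest:
      memory: 256Mi             # request applied when a container sets none
```

A container that exceeds its limit is then OOM-killed individually instead of destabilizing the node.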
Looking Forward
1. Federated Jobs
2. Horizontal Workload Scaling across Federation
3. Federated StatefulSets
4. Resource Quota Federation
5. IPVS-based in-cluster service load balancing
6. Multi-network plan
7. Automatic rollback of failed deployments
8. In-place rolling update
9. …
Q & A
Thank you!