Continuous availability: from the shift paradigm to ... · 1/17/2018 · Simulation – continous...
Transcript of Continuous availability: from the shift paradigm to ... · 1/17/2018 · Simulation – continous...
Continuous availability: from the shift paradigm
to unmanned operation.
Pietro Tiberi
17 January 2018 – TIPS Contact Group
2
Agenda
Continuous availability: from the shift paradigm to unmanned operation
1
Introduction
2
Continuous
Availability
3
Results
4
Conclusions and perspective
3
Introduction TIPS Non functional requirements - Reliability / Availability
(RPO=0)
(RTO=15 minutes)
Transactions Lost
Downtime
99.9%
Continuous availability: from the shift paradigm to unmanned operation
4
Introduction Datacenter Operations
Continuous availability: from the shift paradigm to unmanned operation
Human based
(on shifts) Unmanned
5
CONTINUOUS OPERATION
Continuous availability: from the shift paradigm to unmanned operation
6
Continuous Availability From high availability to continuous availability
Continuous availability: from the shift paradigm to unmanned operation
o Redundancy
o Fault Tolerance
o Clustering
o Active Active configuration
o Proactive
monitoring
o Continuous
delivery
o Automatic
remediation
o Dynamic capacity
management
7
Continuous Availability Proactive Monitoring
Continuous availability: from the shift paradigm to unmanned operation
o Infrastructure monitoring
o Application monitoring o Detect events
before failures
o Trigger automatic
actions
o Analyze the event
8
Continuous Availability IT Automation
Continuous availability: from the shift paradigm to unmanned operation
9
Continuous Availability From Agile to Devops
Continuous availability: from the shift paradigm to unmanned operation
10
Continuous Availability DevOps - Everything as Code
Continuous availability: from the shift paradigm to unmanned operation
Code
Virtual Infrastructure
11
Continuous Availability Dynamic Capacity Management
Continuous availability: from the shift paradigm to unmanned operation
o Consumption
trend analysis
o Resource utilization
rate optimization o What if scenarios
o Predict future
requirements and
trends
12 Continuous availability: from the shift paradigm to unmanned operation
13
Test Plant Architecture
Continuous availability: from the shift paradigm to unmanned operation
Message Layer
Database Layer
User A User B
Message Router
Message Processor
Message Router
Kafka Broker
Aerospike Database
write
store store
write
write read
put
get
get
put
Application Layer
14
Results Test Architecture
Specific tests to verify the relevant
domain functions.
Common simulation layer to
reproduce real operational
environment.
executed on
Continuous availability: from the shift paradigm to unmanned operation
15
Results Simulation – continous delivery (1)
Normal traffic condition (500 msg/s), timeout = 10.000 ms
Kafka cluster rolling update
0 messages lost
0 timeout expired
Continuous availability: from the shift paradigm to unmanned operation
SIMUL.APP.01 : message latency (1 sec average)
16
Results
Continuous availability: from the shift paradigm to unmanned operation 07 November 2017 – CMG Impact 2017
SIMUL.APP.02 : message latency (1 sec average)
Simulation – continous delivery (2)
Heavy traffic condition (2000 msg/s), timeout = 10.000 ms
Kafka cluster rolling update
0 messages lost
some timeout expired
17
Results Simulation – proactive monitoring
Continuous availability: from the shift paradigm to unmanned operation
Normal traffic condition (500 msg/s)
average E2E processing time = 45 ms
High vCPU load added to Message Processor nodes.
T0-T1 below threshold
T2-T3 exceed threshold
18
Conclusions and perspective
Phased
Approach Bi-modal
Data Center
Tool
Continuous availability: from the shift paradigm to unmanned operation
Continuous availability: from the shift paradigm
to unmanned operation.
Pietro Tiberi ([email protected])
Thanks for your attention