Monitoring Redefined - Austrian Testing Board

Transcript of Monitoring Redefined - Austrian Testing Board

Page 1: Monitoring Redefined - Austrian Testing Board

Klaus Enzenhofer

Director Technology Strategy

Monitoring Redefined

klaus-enzenhofer

@kenzenhofer

Page 2: Monitoring Redefined - Austrian Testing Board
Page 3: Monitoring Redefined - Austrian Testing Board
Page 4: Monitoring Redefined - Austrian Testing Board
Page 5: Monitoring Redefined - Austrian Testing Board

confidential

Page 6: Monitoring Redefined - Austrian Testing Board
Page 7: Monitoring Redefined - Austrian Testing Board

The 4 Core KPIs of Monitoring

Page 8: Monitoring Redefined - Austrian Testing Board

#1: Business KPI

Page 9: Monitoring Redefined - Austrian Testing Board
Page 10: Monitoring Redefined - Austrian Testing Board

Watch your business success!

Business KPI

Page 11: Monitoring Redefined - Austrian Testing Board

What's next?

Page 12: Monitoring Redefined - Austrian Testing Board
Page 13: Monitoring Redefined - Austrian Testing Board

#2: Availability KPI

Page 14: Monitoring Redefined - Austrian Testing Board
Page 15: Monitoring Redefined - Austrian Testing Board

Ad on air

Page 16: Monitoring Redefined - Austrian Testing Board

SLA (00:00 to 23:59)

Payment Service, Transaction Service, Login Service, Balance Service, User Service

98%, 100%, 99%, 95%, 99%

Customer View: ?
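
One way to approach the "?" for the customer view is to multiply the five service figures from this slide. The sketch below assumes the customer needs all five services at once and that outages are independent; both are assumptions, not statements from the talk.

```python
from math import prod

# SLA figures shown on the slide, expressed as fractions.
service_slas = [0.98, 1.00, 0.99, 0.95, 0.99]


def composite_availability(slas):
    """Availability of a flow that needs every service up at the same time.

    Assumes independent failures, which real systems rarely satisfy exactly.
    """
    return prod(slas)


customer_view = composite_availability(service_slas)
print(f"Customer view: {customer_view:.1%}")  # prints "Customer view: 91.2%"
```

Under these assumptions the customer sees noticeably less availability than any single service reports.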

Page 17: Monitoring Redefined - Austrian Testing Board

Watch your Availability!

Page 18: Monitoring Redefined - Austrian Testing Board

What's next?

Page 19: Monitoring Redefined - Austrian Testing Board
Page 20: Monitoring Redefined - Austrian Testing Board

Page 21: Monitoring Redefined - Austrian Testing Board

Page 22: Monitoring Redefined - Austrian Testing Board

IE6/IE7

NO reload button

Page 23: Monitoring Redefined - Austrian Testing Board

Example of error:

What you see here is the CUSS status 309.

The last customer interaction happens approximately 20 minutes before the 309 status.

The fix: improved the code to prevent this freeze situation.

Page 24: Monitoring Redefined - Austrian Testing Board

CK – Business KPI Dashboard

Page 25: Monitoring Redefined - Austrian Testing Board

Watch your Errors!

Page 26: Monitoring Redefined - Austrian Testing Board

What's next?

Page 27: Monitoring Redefined - Austrian Testing Board

4.5 sec 15 sec

Why?

Page 28: Monitoring Redefined - Austrian Testing Board

Sanity Check

4.5 sec / 15 sec

Same Page

Server Side

Network: Local WLAN / Local WLAN

Browser Check: Chrome 49 / Chrome Mobile 33

Only difference is Browser & Device

Page 29: Monitoring Redefined - Austrian Testing Board

Why did they look at the performance on the mobile device?

Page 30: Monitoring Redefined - Austrian Testing Board

Change in their compensation plan!

Page 31: Monitoring Redefined - Austrian Testing Board

Contract SLA: Average Response Time < 3 sec

User

on Desktop + Mobile

Page 32: Monitoring Redefined - Austrian Testing Board

Good idea?!
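
A small sketch of why an average-based SLA can mislead. The sample is hypothetical: nine fast requests and one slow one, using the 15 sec figure from the earlier slide as the slow value. Only the 3 sec limit comes from the contract SLA above.

```python
from statistics import mean

# Hypothetical sample in seconds: nine fast requests and one slow mobile request.
response_times = [1.0] * 9 + [15.0]

SLA_LIMIT = 3.0  # "Average Response Time < 3 sec" from the slide

average = mean(response_times)
worst = max(response_times)
# Share of requests that individually miss the limit.
slow_share = sum(t >= SLA_LIMIT for t in response_times) / len(response_times)

print(f"average={average:.1f}s worst={worst:.1f}s slow users={slow_share:.0%}")
# prints "average=2.4s worst=15.0s slow users=10%"
```

The average passes the contract while one in ten users in this sample waits 15 seconds.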

Page 33: Monitoring Redefined - Austrian Testing Board
Page 34: Monitoring Redefined - Austrian Testing Board

Let's take a look at the timings!

Navigation Start: 0 ms

Domain Lookup End: 269 ms

Connect End: 330 ms

Response Start: 517 ms

Response End: 518 ms

Dom Loading: 519 ms

Dom Interactive: 519 ms

DomContentLoaded Event End: 520 ms

Dom Complete: 520 ms
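
The marks above can be turned into phase durations by subtracting neighbouring marks. This is a rough sketch: the phase names are my own labels, and using "Connect End minus Domain Lookup End" as connect time is an approximation because the slide does not show the start marks.

```python
# Marks from the slide, in milliseconds since navigation start.
marks = {
    "navigation_start": 0,
    "domain_lookup_end": 269,
    "connect_end": 330,
    "response_start": 517,
    "response_end": 518,
    "dom_complete": 520,
}


def phase_durations(m):
    """Split the page load into consecutive phases (illustrative labels)."""
    return {
        "until_dns_done": m["domain_lookup_end"] - m["navigation_start"],
        "connect": m["connect_end"] - m["domain_lookup_end"],
        "wait_for_first_byte": m["response_start"] - m["connect_end"],
        "download": m["response_end"] - m["response_start"],
        "dom_processing": m["dom_complete"] - m["response_end"],
    }


phases = phase_durations(marks)
for name, ms in phases.items():
    print(f"{name}: {ms} ms")
```

Almost all of the 520 ms is spent on the network before the first byte arrives; download and DOM processing together take 3 ms.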

Page 35: Monitoring Redefined - Austrian Testing Board

0.5 sec 0.5 sec

Developer

Page 36: Monitoring Redefined - Austrian Testing Board

User

Page 37: Monitoring Redefined - Austrian Testing Board
Page 38: Monitoring Redefined - Austrian Testing Board

285 Resources for an initial Page Load

151 CSS and 121 JavaScript files

Page 39: Monitoring Redefined - Austrian Testing Board

~200 Resources had larger Header than Body
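
A check like the one behind this finding can be sketched as a simple comparison of header and body size per resource. The URLs and byte counts below are hypothetical and only illustrate the comparison; they are not measurements from the site in the talk.

```python
def header_heavy(resources):
    """Return the URLs of resources whose headers outweigh their bodies."""
    return [r["url"] for r in resources if r["header_bytes"] > r["body_bytes"]]


# Hypothetical sizes in bytes.
resources = [
    {"url": "/css/button.css", "header_bytes": 650, "body_bytes": 120},
    {"url": "/js/app.js", "header_bytes": 700, "body_bytes": 48000},
    {"url": "/css/icon.css", "header_bytes": 640, "body_bytes": 90},
]

flagged = header_heavy(resources)
print(f"{len(flagged)} of {len(resources)} resources carry more header than body")
```

Resources flagged this way are candidates for bundling, since each separate request pays the header cost again.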

Page 40: Monitoring Redefined - Austrian Testing Board

The CDN bill exploded!

Page 41: Monitoring Redefined - Austrian Testing Board

https://whatdoesmysitecost.com

Page 42: Monitoring Redefined - Austrian Testing Board
Page 43: Monitoring Redefined - Austrian Testing Board

http://cdn.shopify.com/s/files/1/1462/9702/articles/26_cangoroo_1024x1024.jpg?v=1473016235

Page 44: Monitoring Redefined - Austrian Testing Board

Back Home

Page 45: Monitoring Redefined - Austrian Testing Board

Back Home

Page 46: Monitoring Redefined - Austrian Testing Board

HTTP Archive – Transfer Size Trend

http://httparchive.org/trends.php

Average size ~2,500 KB at 1.6 € per 100 KB

40 € to get started!!!!
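
The 40 € figure follows directly from the two numbers on this slide. A minimal worked calculation (the function name is mine):

```python
def page_cost_eur(page_kb, eur_per_100_kb):
    """Cost of transferring one page at a per-100-KB roaming rate."""
    return page_kb / 100 * eur_per_100_kb


cost = page_cost_eur(2500, 1.6)
print(f"{cost:.0f} €")  # prints "40 €"
```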

Page 47: Monitoring Redefined - Austrian Testing Board

#4: Performance KPI

Page 48: Monitoring Redefined - Austrian Testing Board

Monitoring needs to cover:

Business Results

Availability

Errors

Performance

Page 49: Monitoring Redefined - Austrian Testing Board
Page 50: Monitoring Redefined - Austrian Testing Board
Page 51: Monitoring Redefined - Austrian Testing Board

Monitoring used to be about looking at dashboards …

Page 52: Monitoring Redefined - Austrian Testing Board

Process Memory (GB)

CPU Graphs (%)

Page 53: Monitoring Redefined - Austrian Testing Board

… and about analyzing logs & exceptions …

Page 54: Monitoring Redefined - Austrian Testing Board

Top Exceptions

Top Logs

Page 55: Monitoring Redefined - Austrian Testing Board

But the apps and services we build have transformed to something more dynamic…

Page 56: Monitoring Redefined - Austrian Testing Board

Develop

Ship

Deploy

Run

Scale

Compute

nodejs mongo db netty cassandra redis

ansible jenkins puppet chef

docker cloudfoundry rh openshift rh atomic rocket

core os rancher kvm busybox

mesos marathon kubernetes swarm

Amazon azure openstack mesosphere calico weave

eureka/hystrix

A whole new technology stack & polyglot development

Amazon DynamoDB

AWS Lambda

AWS CodeDeploy

Amazon EC2 Container Services

Amazon EC2

AWS Elastic Beanstalk

Amazon API Gateway

Page 57: Monitoring Redefined - Austrian Testing Board

Granularity

Page 58: Monitoring Redefined - Austrian Testing Board

Granularity

Doc Processor Doc Transformer Doc Signer

Doc Encryption

Doc Shipment

Document Encryption is carved out as a separate service. Running it as a separate service may not be the best option.

Documents

Page 59: Monitoring Redefined - Austrian Testing Board

Tight Coupling

Page 60: Monitoring Redefined - Austrian Testing Board

Tightly coupled. Really Distributed?

Page 61: Monitoring Redefined - Austrian Testing Board

Inefficient Service Flow (drawing parallels to Web Performance Optimization)

Page 62: Monitoring Redefined - Austrian Testing Board

SFPO (Service Flow & Performance Optimization) has to teach us how to optimize (micro)service dependencies through Service Flows

Page 63: Monitoring Redefined - Austrian Testing Board

Especially useful to identify: inefficient 3rd party services, recursive call chains, N+1 Query Patterns, loading too much data, no data caching, … -> sounds very familiar from WPO

Page 64: Monitoring Redefined - Austrian Testing Board

Classical cascading effect of recursive service calls!
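
Patterns such as N+1 calls can be spotted by counting how often one service calls another within a single request. The sketch below assumes a trace is available as a list of (caller, callee) pairs; the service names, the trace and the threshold are all hypothetical.

```python
from collections import Counter


def find_repeated_calls(calls, threshold=5):
    """Return (caller, callee) pairs seen at least `threshold` times in one request."""
    counts = Counter(calls)
    return {pair: n for pair, n in counts.items() if n >= threshold}


# Hypothetical trace of one request: the order service asks the product
# service once per order line instead of once for the whole order.
trace = [("frontend", "order-service")] + [("order-service", "product-service")] * 20

suspects = find_repeated_calls(trace)
for (caller, callee), n in suspects.items():
    print(f"{caller} -> {callee}: {n} calls in one request")
```

A high count for a single pair is only a hint; whether it is a real N+1 pattern still needs a look at what each call fetches.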

Page 65: Monitoring Redefined - Austrian Testing Board

THIS IS WHY monitoring had to transform as well

Page 66: Monitoring Redefined - Austrian Testing Board

2011

2 major releases/year

6 months major/minor release

customers deploy & operate on-prem

2016

26 major releases/year

500 prod deployments/day

sprint releases (continuous-delivery)

1h: Code -> Prod

self-service online sales, SaaS & Managed

Page 67: Monitoring Redefined - Austrian Testing Board

Monitoring as Pipeline & Platform Feature

Dev Perf/Test Ops Biz

Faster Innovation with Quality Gates

Faster Acting on Feedback

Unit Perf

Cont. Perf

New Deploy

New Capability

CI CD Remove/Promote

Triage/Optimize

Update Tests

Innovate/Design

$$$

Lower Costs

Happy Users

Page 68: Monitoring Redefined - Austrian Testing Board

Role of Dynatrace DevOps Team

acting as Engineers & Product Managers

Dynatrace Managed/SaaS

Orchestration Layer

Dynatrace Pipeline Visualization

Deployment Timeline

Log Overview using Dynatrace Log API

JIRA Integrations

Page 69: Monitoring Redefined - Austrian Testing Board

Shift-Left Continuous Performance with Dynatrace

“Performance Signature” for Build Nov 16

“Performance Signature” for Build Nov 17
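
Comparing two builds' performance signatures can act as a quality gate in the pipeline. The sketch below is a generic illustration, not Dynatrace's format or API: the metric names, the values and the 10% tolerance are hypothetical.

```python
def quality_gate(baseline, candidate, tolerance=0.10):
    """Return the metrics where the candidate build regressed beyond the tolerance.

    A metric missing from the candidate counts as a violation.
    """
    violations = []
    for metric, base_value in baseline.items():
        limit = base_value * (1 + tolerance)
        if candidate.get(metric, float("inf")) > limit:
            violations.append(metric)
    return violations


# Hypothetical signatures for two consecutive builds.
build_nov_16 = {"response_time_ms": 200, "db_calls": 10}
build_nov_17 = {"response_time_ms": 210, "db_calls": 25}

failed = quality_gate(build_nov_16, build_nov_17)
print("gate passed" if not failed else f"gate failed on: {', '.join(failed)}")
```

Here the response time stays within tolerance, but the jump in database calls stops the build before it reaches production.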

Page 70: Monitoring Redefined - Austrian Testing Board

Learnings when scaling DevOps Pipelines

Feature Team A

Feature Team B

Feature Team X

Improve “Efficiency”

Cloud Ops

Ensure “Operational Service”

PM/Biz

Improve “Business”

Page 71: Monitoring Redefined - Austrian Testing Board

Dynatrace Transformation by the numbers

26 Releases / Year

500 Deployments / Day

31000 Unit & Int Tests / hour

60h UI Tests per Build

More Quality

~120 Code commits / day

340 Stories per sprint

More Agile

93% Production bugs found by Dev

More Stability

450 Global EC2 Instances

99.998% Global Availability
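
To put 99.998% availability into perspective, it can be converted into allowed downtime. A minimal calculation, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability_percent):
    """Downtime per year that a given availability percentage still allows."""
    return (100 - availability_percent) / 100 * MINUTES_PER_YEAR


downtime = downtime_minutes_per_year(99.998)
print(f"{downtime:.1f} minutes of downtime per year")  # prints "10.5 minutes of downtime per year"
```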

Page 72: Monitoring Redefined - Austrian Testing Board

High Performers vs Low Performers: Speed Gap Closing but Quality Gap Increasing

https://puppet.com/resources/whitepaper/2017-state-devops-report/

Page 73: Monitoring Redefined - Austrian Testing Board

BizDevOps Adoption Challenges

Technical Complexity: DevOps promotes choice, “the best stack for your problem”

Bad Data & Code Quality: DevOps today is mainly driven by Biz, “faster to market” but not “quality to market”

Data & Department Silos: DevOps promotes small & agile, “2 Pizza Teams”, “Services”, “Containers”

IDG Research: April 2017 - http://www.computerwoche.de/a/digitale-kundenbeziehung-keine-halben-sachen,3330524,2

https://www.dynatrace.com/blog/devops-adoption-challenges-from-around-the-world/

Page 74: Monitoring Redefined - Austrian Testing Board

The reason why: Different Perspective from Biz and DevOps

Page 75: Monitoring Redefined - Austrian Testing Board
Page 76: Monitoring Redefined - Austrian Testing Board

Marketing Analysts

Executives

Search Engine Optimization

Security team

Business Analytics

Fraud Detection

UX-Designer

App Owner

CxO

Customer Success Team

Page 77: Monitoring Redefined - Austrian Testing Board
Page 78: Monitoring Redefined - Austrian Testing Board

Biz View: Airline – Platinum Member Traveling

Book Flight

Check-In

Stop in Lounge

Inflights

Max Platinum

Going on a Trip

Page 79: Monitoring Redefined - Austrian Testing Board

Team Individual Pipe Cycle Time Monitoring

Geo Service Team Weekly

Product Service Team Every Sprint

Book Service Team Daily

Auth Service Team On-Demand

Mobile App Team Monthly

Dev View: Airline – Platinum Member Traveling

Page 80: Monitoring Redefined - Austrian Testing Board

Team Individual Pipe

Payment Service A

Check in Service B

Passport Service C

Baggage Service D

Check in Service X

ISSUE! Max Platinum Can Not Check In!

Page 81: Monitoring Redefined - Austrian Testing Board

Silo #4

Silo #3

Silo #2

Silo #1

Are we making MONEY with Max?

Which digital touchpoints is MAX using?

Digital Touchpoints:

Mobile App

Desktop Web

Kiosk App

PoS-System

Voice Interfaces (Alexa,...)

Rich Client App

System Availability Errors Performance Business Result

System Availability Errors Performance Business Result

System Availability Errors Performance Business Result

System Availability Errors Performance Business Result

Silo #6 Silo #7 Silo #8

Page 82: Monitoring Redefined - Austrian Testing Board

Page 83: Monitoring Redefined - Austrian Testing Board

Page 84: Monitoring Redefined - Austrian Testing Board

Page 85: Monitoring Redefined - Austrian Testing Board

So what should we do now?

Page 86: Monitoring Redefined - Austrian Testing Board

Have a BIG vision

Page 87: Monitoring Redefined - Austrian Testing Board

We need to answer the same questions for ALL touchpoints

Digital Touchpoints:

Mobile App

Desktop Web

Kiosk App

PoS-System

Voice Interfaces (Alexa,...)

Rich Client App

System Availability Errors Performance Business Result

System Availability Errors Performance Business Result

System Availability Errors Performance Business Result

System Availability Errors Performance Business Result
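
Answering the same questions for all touchpoints amounts to a coverage check: every touchpoint should report all four KPIs. The touchpoints and KPIs in the sketch come from the slides; the `reported` data and the function name are hypothetical.

```python
KPIS = ("Business Result", "Availability", "Errors", "Performance")
TOUCHPOINTS = (
    "Mobile App", "Desktop Web", "Kiosk App",
    "PoS-System", "Voice Interfaces", "Rich Client App",
)


def coverage_gaps(reported):
    """Map each touchpoint to the KPIs it does not report yet."""
    gaps = {}
    for touchpoint in TOUCHPOINTS:
        have = reported.get(touchpoint, set())
        missing = [kpi for kpi in KPIS if kpi not in have]
        if missing:
            gaps[touchpoint] = missing
    return gaps


# Hypothetical state: only the mobile app is fully monitored.
reported = {
    "Mobile App": set(KPIS),
    "Desktop Web": {"Availability", "Performance"},
}

gaps = coverage_gaps(reported)
for touchpoint, missing in gaps.items():
    print(f"{touchpoint}: missing {', '.join(missing)}")
```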

Page 88: Monitoring Redefined - Austrian Testing Board

Digital Touchpoints:

Mobile App

Desktop Web

Kiosk App

PoS System

Voice Interfaces (Alexa,...)

Rich Client App

Smart Watch

ATM

Car Entertainment System

TV

Locations:

Vienna, Austria

Store Salzburg

Check-in Terminal A FRA

Construction Site ABC, India

Device:

Mobile

Mobile Browser

Page 89: Monitoring Redefined - Austrian Testing Board

Ops

Dev

Biz

Collaboration based on Consistent Data

Page 90: Monitoring Redefined - Austrian Testing Board

Act tomorrow locally!

Establish a quality gate beyond functional health

Introduce monitoring early in the pipeline

Chart your money making step/action

Take a look at the 4 Key KPIs and check them

Make the KPIs available to others

Start with a minimal DevOps

Check your monitoring solution's future readiness

No Monitoring in place? – Check out Dynatrace

Page 91: Monitoring Redefined - Austrian Testing Board

Klaus Enzenhofer

Director Technology Strategy

Monitoring redefined

klaus-enzenhofer

@kenzenhofer