Measure() or die()

Post on 14-Apr-2017

By Arik Lerner, Team Lead Automation & Performance/Resilience

Measure() OR Die();

Measure or Die

- 3.5 years in Liveperson

- 2 years - Reporting Platform

- 1.5 years Team Lead Automation & Performance/Resilience

- Interests: Private pilot on Cessna 172

Bio

➔ How we monitor with e2e testing

➔ E2E Products & Personas

➔ The Awakening of the End2End Data

➔ Architecture & Life cycle

Meetup Agenda

About Liveperson

Liveperson transforms the connection between brands and consumers.

3BN Visits/month

200BN API calls/month

2 PB data a year

1.5M concurrent visits

Our Scale

Our Engineering

~200 people RnD

Constant innovation

Multiple Technologies

Fast release cycle

We Monitor Liveperson Services

By e2e tests that simulate real business scenarios

➔ Indicates real business problems

➔ Service availability through the consumer's eyes

➔ Alerts that require immediate action

➔ Insight on our business services

➔ Agent login: the agent enters the system

➔ Visitor enters the site

➔ Visitor initiates a chat

➔ Agent chat

E2E Scenario Example

E2E customers' expectations

➔ Stability == TRUST

➔ Investigatable

➔ Service Coverage

➔ Scale

E2E

E2E Dashboard Statistics

Real Time Dashboard

Kibana - HAR statistics & Aggregation

E2E Personas

Production specialist

PMO

Management

This is Yossi.
When Yossi gets up in the morning, Yossi looks at the E2E RT dashboard.
Yossi recognizes a failure.
Yossi enters the E2E debug center tools.
Yossi is smart!
Be like Yossi.

Production Specialist User Story

PMO User Story

This is Michal.
Before any software deployment, Michal checks the dashboard.
When the dashboard failure rate is below 3%, Michal has a GO for deployment.
Michal is smart!
Be like Michal.
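The GO rule in the PMO story can be sketched as a small gate. This is a minimal illustration, not the team's actual tooling: the function names and the strict "below 3%" comparison are assumptions drawn only from the wording of the slide.

```python
def failure_rate(failed: int, total: int) -> float:
    """Return failed runs as a percentage of all runs."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * failed / total


def deployment_decision(failed: int, total: int, threshold_pct: float = 3.0) -> str:
    """Return "GO" only when the failure rate is strictly below the threshold.

    The 3% default comes from the slide; treating exactly 3% as NO-GO
    is an assumption made for this sketch.
    """
    return "GO" if failure_rate(failed, total) < threshold_pct else "NO-GO"


if __name__ == "__main__":
    # 5 failures out of 200 runs is 2.5%, which is below the 3% threshold.
    print(deployment_decision(5, 200))
```

Keeping the threshold as a parameter makes it easy to apply a different bar per data center or per business flow.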

Management story

This is Eli.
When Eli gets up in the morning, Eli looks at the dashboard statistics.
Eli can see the health and availability of each data center.
Eli is smart!
Be like Eli.

➔ Total failure rate.

◆ Filter for each Data Center

◆ Filter each business flow

KPIs

➔ Trend to understand service stability

Widgets

What KPIs do I need to measure?

➔ Total chats failure rate.

➔ Total missing engagements

➔ Total login failures

➔ Average login response time.

KPIs

➔ Failure cause break down

➔ Client location root cause

➔ Test scenario failures

Widgets

What KPIs do I need to measure?
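The four KPIs listed above (chat failure rate, missing engagements, login failures, average login response time) could be derived from per-run records roughly as follows. The record fields (`flow`, `passed`, `failure_cause`, `duration_ms`) and the `missing_engagement` cause label are hypothetical names chosen for this sketch; the talk does not show the real schema.

```python
from statistics import mean
from typing import Any


def compute_kpis(runs: list[dict[str, Any]]) -> dict[str, float]:
    """Compute the slide's four KPIs from a list of test-run records."""
    chats = [r for r in runs if r["flow"] == "chat"]
    logins = [r for r in runs if r["flow"] == "login"]
    failed_chats = [r for r in chats if not r["passed"]]
    # Only successful logins contribute to the response-time average.
    login_times = [r["duration_ms"] for r in logins if r["passed"]]
    return {
        "chat_failure_rate_pct": 100.0 * len(failed_chats) / len(chats) if chats else 0.0,
        "missing_engagements": sum(
            1 for r in runs if r.get("failure_cause") == "missing_engagement"
        ),
        "login_failures": sum(1 for r in logins if not r["passed"]),
        "avg_login_response_ms": mean(login_times) if login_times else 0.0,
    }


if __name__ == "__main__":
    sample = [
        {"flow": "chat", "passed": True, "duration_ms": 2000},
        {"flow": "chat", "passed": False, "failure_cause": "missing_engagement",
         "duration_ms": 9000},
        {"flow": "login", "passed": True, "duration_ms": 400},
        {"flow": "login", "passed": True, "duration_ms": 600},
        {"flow": "login", "passed": False, "failure_cause": "timeout",
         "duration_ms": 30000},
    ]
    print(compute_kpis(sample))
```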

The Awakening of the End2End Data

Start collecting the data!

➔ Get build failures/success

➔ Get failure cause

➔ Business flows

➔ Test duration

➔ Client location

➔ Data Center location

➔ Account

@Test

Raw Data Output

The HTTP Archive format, or HAR, is a JSON-formatted archive file format for logging a web browser's interaction with a site. The common extension for these files is .har.

The specification for the HTTP Archive (HAR) format defines an archival format for HTTP transactions that can be used by a web browser to export detailed performance data about the web pages it loads. The specification for this format is produced by the Web Performance Working Group [1] of the World Wide Web Consortium (W3C). The specification is in draft form and is a work in progress.

HAR (HTTP Archive)

➔ Logging web browser traffic
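Since a HAR file is plain JSON, it can be inspected with nothing more than a JSON parser. The sketch below reads a tiny hand-written sample laid out as in HAR 1.2 (`log` → `entries`, each with `request`, `response` and a total `time` in milliseconds). The sample URLs and values are made up for illustration.

```python
import json

# Hand-written sample in the HAR 1.2 layout; not real captured traffic.
SAMPLE_HAR = """
{
  "log": {
    "version": "1.2",
    "creator": {"name": "example-proxy", "version": "0.0"},
    "entries": [
      {"startedDateTime": "2017-04-14T08:00:00.000Z", "time": 120,
       "request": {"method": "GET", "url": "https://example.com/login"},
       "response": {"status": 200}},
      {"startedDateTime": "2017-04-14T08:00:01.000Z", "time": 950,
       "request": {"method": "POST", "url": "https://example.com/chat"},
       "response": {"status": 503}}
    ]
  }
}
"""


def summarize_har(har_text: str) -> list[dict]:
    """Return one row per HTTP transaction: URL, status code, total time."""
    entries = json.loads(har_text)["log"]["entries"]
    return [
        {
            "url": e["request"]["url"],
            "status": e["response"]["status"],
            "time_ms": e["time"],
        }
        for e in entries
    ]


def slowest_entry(har_text: str) -> dict:
    """Return the transaction with the largest total time."""
    return max(summarize_har(har_text), key=lambda row: row["time_ms"])


if __name__ == "__main__":
    for row in summarize_har(SAMPLE_HAR):
        print(row)
```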

HAR proxy diagram

Proxy on port XXX

Selenium WebDriver

HAR

www.Liveperson.com

Request passes through proxy

Based on BrowserMob embedded proxy server

Code snippet - adding proxy into Selenium

• N scenarios

• Running from M locations

• Running against X Data Centers

• Yields HAR data

Question: how do we investigate the data for an entire farm, location, or scenario?

Answer: aggregation.

Pop quiz:
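In the talk the aggregation is done by Elasticsearch and Kibana. As a plain illustration of the idea, the sketch below groups run records by one dimension and computes a failure rate per group. The keys `DataCenter`, `ClientLocation` and `Testname` follow the talk's metadata; the `passed` flag is an assumed field.

```python
from collections import defaultdict


def aggregate_failure_rate(runs: list[dict], key: str) -> dict[str, float]:
    """Group runs by `key` and return the failure rate (in %) of each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [failed, total]
    for run in runs:
        bucket = counts[run[key]]
        if not run["passed"]:
            bucket[0] += 1
        bucket[1] += 1
    return {group: 100.0 * failed / total for group, (failed, total) in counts.items()}


if __name__ == "__main__":
    runs = [
        {"Testname": "ChatFlow", "ClientLocation": "US", "DataCenter": "UK", "passed": False},
        {"Testname": "ChatFlow", "ClientLocation": "US", "DataCenter": "UK", "passed": True},
        {"Testname": "ChatFlow", "ClientLocation": "UK", "DataCenter": "US", "passed": True},
    ]
    print(aggregate_failure_rate(runs, "DataCenter"))
    print(aggregate_failure_rate(runs, "ClientLocation"))
```

The same function answers the per-farm, per-location and per-scenario questions by changing only the `key` argument.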

Start by collecting the data!

@Test

Raw Data Output

{
  "metaData": {
    "Testname": "ChatFlow",
    "Account": "qa12345",
    "ClientLocation": "US",
    "DataCenter": "UK"
  }
}
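One way to attach this metadata to a HAR capture before shipping it is to wrap both in a single JSON envelope. The sketch below is an assumption about the shape of such a message, not the talk's actual code: the `har` key and both function names are hypothetical, and `send` is a stand-in callable so the example runs without a Kafka client. Only the `metaData` fields and the `e2e` topic name come from the slides.

```python
import json
from typing import Callable


def build_message(testname: str, account: str, client_location: str,
                  data_center: str, har: dict) -> bytes:
    """Wrap test metadata and a HAR document into one UTF-8 JSON message."""
    envelope = {
        "metaData": {
            "Testname": testname,
            "Account": account,
            "ClientLocation": client_location,
            "DataCenter": data_center,
        },
        "har": har,  # assumed key name, not shown in the talk
    }
    return json.dumps(envelope).encode("utf-8")


def publish(message: bytes, send: Callable[[str, bytes], None],
            topic: str = "e2e") -> None:
    """Hand the message to a producer-like callable (a stand-in for Kafka)."""
    send(topic, message)


if __name__ == "__main__":
    sent = []
    msg = build_message("ChatFlow", "qa12345", "US", "UK", {"log": {"entries": []}})
    publish(msg, lambda topic, payload: sent.append((topic, payload)))
    print(sent[0][0], json.loads(sent[0][1])["metaData"])
```

In a real pipeline the `send` argument would be backed by an actual Kafka producer; the envelope itself stays the same.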

Pipeline diagram (components as labeled on the slide):

➔ @Test runs on Jenkins slaves output HAR files and metadata

➔ HAR Processor: reads the output files, gets the JSON, sends the data

➔ Kafka (topic e2e)

➔ Logstash + Elasticsearch

➔ Kibana Dashboard

Code snippet send message into Kafka

Our benefits

➔ Data retention: 30 days

➔ Ability to query and aggregate over the data for investigation

➔ Ability to build dashboards

➔ Access to the data through Elasticsearch APIs

ELK & HAR Downsides

➔ Complicated queries over Kibana

➔ ELK setup & maintenance

➔ On a response timeout, the HAR shows an enormous number (this needs to be handled in code)
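The timeout downside can be handled before the data reaches the dashboards. The talk does not say how this was done, so the following is only one possible approach: flag entries whose total time is negative or above a plausibility ceiling and blank out the value so it cannot distort averages. The 120-second ceiling and the `timedOut` field are assumptions made for this sketch.

```python
from typing import Optional

MAX_PLAUSIBLE_MS = 120_000  # hypothetical ceiling; tune to your own timeouts


def clean_entry_times(entries: list[dict],
                      max_ms: Optional[float] = MAX_PLAUSIBLE_MS) -> list[dict]:
    """Return copies of HAR entries with implausible `time` values nulled out.

    Each returned entry gains a boolean `timedOut` flag so that the
    timeout is still visible as an event even though the number is dropped.
    """
    cleaned = []
    for entry in entries:
        value = entry["time"]
        timed_out = value < 0 or (max_ms is not None and value > max_ms)
        cleaned.append({**entry, "time": None if timed_out else value,
                        "timedOut": timed_out})
    return cleaned


if __name__ == "__main__":
    entries = [
        {"request": {"url": "https://example.com/login"}, "time": 350},
        {"request": {"url": "https://example.com/chat"}, "time": 9_999_999_999},
    ]
    for e in clean_entry_times(entries):
        print(e["request"]["url"], e["time"], e["timedOut"])
```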

What other E2E outputs do we have?

@Test

More Output

➔ BDD reports

➔ Video

➔ Logs

➔ Browser console logs

Code snippet

BDD - Behaviour Driven Development

Architecture diagram (components as labeled on the slide):

➔ Jenkins Master and Jenkins Master DR

➔ Jenkins slaves in DC-1, DC-2 ... DC-N, running @Test

➔ MySQL DB and Kafka + ELK

➔ Data: HAR data, e2e data, production metrics

➔ Kibana service, E2E Reports, RT Dashboard

➔ Graphite, Zabbix, Grafana

E2E Test Lifecycle

DEV, QA, Staging, Production

E2E @ Scale

➔ 1.5M HTTP traffic records per day

➔ 200K runs per day

➔ 60 Jenkins slave machines

➔ 28 scenarios

➔ 6 client locations

➔ 6 Regions

What to take home?

➔ Monitor your Data Centers from the consumer's point of view

➔ Collect data

➔ Give the data business meaning.

THANK YOU!

We are hiring

YouTube.com/LivePersonDev

Twitter.com/LivePersonDev

Facebook.com/LivePersonDev

Slideshare.net/LivePersonDev