Performance testing in scope of migration to cloud by Serghei Radov

Serghei Radov

Current position: Senior Performance Engineer at Lohika

Contacts: [email protected]
GitHub: github.com/grinslife
Skype: serghei.radov

AGENDA

● Cloud computing principles
● Challenges
● Performance testing as part of the migration process
● What toolset could be used?
● How to avoid common pitfalls?
● Does the "90 percentile" really work?
● What will be the cost of the performance testing toolset?

Cloud computing principles

● Multi-tenancy
● Statistical multiplexing
● Horizontal scalability
● Data partitioning
● Consistent hashing
● Eventual consistency

Multi-tenancy

Statistical multiplexing
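
The idea behind statistical multiplexing can be sketched with a small calculation: when tenants peak at different times, a shared pool needs less capacity than the sum of each tenant's individual peak. The demand figures and the `capacity_needed` helper below are hypothetical and only illustrate the principle.

```python
def capacity_needed(tenant_demand):
    """Compare dedicated vs. shared capacity for per-tenant demand series."""
    # Dedicated: every tenant is provisioned for its own peak.
    dedicated = sum(max(series) for series in tenant_demand.values())
    # Shared: the pool is provisioned for the peak of the combined demand.
    combined = [sum(values) for values in zip(*tenant_demand.values())]
    shared = max(combined)
    return dedicated, shared


# Hypothetical demand per time slot (requests per minute) for three tenants.
demand = {
    "tenant_a": [100, 400, 150, 100],
    "tenant_b": [300, 100, 100, 150],
    "tenant_c": [100, 150, 350, 100],
}

dedicated, shared = capacity_needed(demand)
print(f"dedicated={dedicated} RPM, shared={shared} RPM")
```

With these numbers the shared pool needs noticeably less capacity because the three peaks fall into different time slots.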

Horizontal scalability

Data partitioning
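
One common way to partition data across nodes is the consistent hashing principle listed earlier. The following is a minimal sketch, assuming MD5 as the hash function and 100 virtual points per node; the `HashRing` class and the node names are illustrative, not taken from the talk.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring at `replicas` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # A key belongs to the first ring point clockwise from its hash.
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]


keys = [f"user-{i}" for i in range(1000)]
ring3 = HashRing(["node-a", "node-b", "node-c"])
ring4 = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if ring3.node_for(k) != ring4.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

When a node is added, the only keys that change owner are the ones the new node takes over; the rest of the partitioning stays put, which is what makes horizontal scaling cheap.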

Eventual consistency

Cloud performance challenges

● Over provisioning
● Under provisioning
● ELB network traffic issues
● Availability and Reliability

Over provisioning

Under provisioning

Solution for effective provisioning

Predictive auto-scaling

Scale up early, scale down slowly

Use time as a proxy

Machine learning
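
The "scale up early, scale down slowly" rule can be sketched as an asymmetric policy: react to a single hot sample by adding capacity in bulk, but remove capacity one instance at a time and only after a sustained calm period. The thresholds, the step size and the `next_instance_count` function are assumptions made for illustration; this is not how Scryer or any specific engine is implemented.

```python
def next_instance_count(current, cpu_history, up_at=60.0, down_at=30.0,
                        calm_samples=6, min_count=1, max_count=20):
    """Return the instance count for the next interval.

    cpu_history holds recent average CPU percentages, oldest first.
    """
    latest = cpu_history[-1]
    if latest >= up_at:
        # Scale up early: one hot sample is enough, and capacity grows by 50%.
        return min(max_count, current + max(1, current // 2))
    recent = cpu_history[-calm_samples:]
    if len(recent) == calm_samples and all(v <= down_at for v in recent):
        # Scale down slowly: only after a full calm window, one instance at a time.
        return max(min_count, current - 1)
    return current


print(next_instance_count(4, [20, 65]))   # hot sample: grow
print(next_instance_count(4, [20] * 6))   # sustained calm: shrink by one
print(next_instance_count(4, [20] * 5))   # not calm for long enough: hold
```

A time-based or machine-learning predictor would replace the `latest >= up_at` test with a forecast of the next interval, while keeping the same asymmetry.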

Netflix’s Predictive Scaling Engine

Predictive auto-scaling engines and tools

Scryer

Elastisys

AppDynamics

VMTurbo

Rancher

Some tips

Multi-cloud or hybrid cloud

Multiple Availability Zones

Zone independence

Deploy to multiple regions

Employ solid backup and recovery strategies

➢ Define acceptance criteria

➢ Select tools for monitoring and testing

➢ Discuss capacity planning responsibilities

➢ Workload Characterization

➢ Test tools for testing

➢ Run tests, analyze, scale, re-run <- cycle

➢ Report to stakeholders

Define performance test SLA

- Statefulness
- Response time
- Time-out

Exceptions that can be included in the SLA:

- Failure
- Network issues
- Denial of service
- Scheduled maintenance

New Relic Response times

NRQL - New Relic Query Language

SELECT uniqueCount(session) FROM PageView SINCE 1 week ago

SELECT uniqueCount(session) FROM PageView SINCE 1 week ago COMPARE WITH 1 week ago

SELECT count(*) FROM PageView SINCE 1 day ago COMPARE WITH 1 day ago TIMESERIES AUTO

SELECT uniqueCount(uuid) FROM MobileSession FACET osVersion SINCE 7 days ago

Gathering response times

Additional response time metrics

All of these response times are reported as components of the app response time:

- Database response times

- Memcached response time

- WebExternal

- Ruby

- GC calls

New Relic provides an advanced ability to trace response times across systems using NRQL.

Transactions throughput

- DC (data center) and cloud resources are not directly comparable because their hardware configurations differ.

- The transaction throughput in the cloud should match the current production level in the DC, or exceed it, so that current users are served without added latency.

Finding peaks

Target peak load will be 1.14K RPM

Lowest point will be 430 RPM

(Chart extracted from New Relic instead of DataDog, for presentation only)

Scenario per server

- Ramp up to 430 RPM, then slowly to 700 RPM over 4 hours

- Run test for 6 hours

- Ramp up to 1.14K RPM

- Run test for 11 hours

- Ramp down slowly
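
The scenario above can be expressed as a piecewise-linear load profile that a load tool or a throughput shaper could follow. This is a sketch under stated assumptions: the profile starts directly at 430 RPM, and the 1-hour ramp to peak and the 2-hour ramp-down are hypothetical durations, since the slide does not give them. `target_rpm` and `PROFILE` are illustrative names.

```python
def target_rpm(hour, breakpoints):
    """Linearly interpolate the target RPM at a given hour of the test."""
    if hour <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, r0), (t1, r1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= hour <= t1:
            return r0 + (r1 - r0) * (hour - t0) / (t1 - t0)
    return breakpoints[-1][1]


# (hour since start, requests per minute)
PROFILE = [
    (0, 430),    # start at the lowest production point
    (4, 700),    # slow ramp over 4 hours
    (10, 700),   # hold for 6 hours
    (11, 1140),  # ramp to peak (1-hour duration is an assumption)
    (22, 1140),  # hold peak for 11 hours
    (24, 0),     # ramp down (2-hour duration is an assumption)
]

for hour in (0, 2, 8, 10.5, 15, 23):
    print(hour, target_rpm(hour, PROFILE))
```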

Hardware acceptance level

- App server CPU usage
  - should not go above 60% during peak 150% load
  - threshold of 80%
- Memory usage (avg 60%, threshold 80%)
- Network throughput (should correspond to DC levels)
- Auto-scaling groups set to false (initial criteria)

All these metric values depend on production usage, budget and target VM provisioning size.
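
A check of these criteria against collected samples might look like the sketch below. It assumes one reading of the slide: 60% is a limit on the average and 80% is a hard threshold that no sample may exceed, for both CPU and memory. The 150% load is computed from the 1.14K RPM peak; `LIMITS` and `check_acceptance` are hypothetical names.

```python
from statistics import mean

# (average limit, hard threshold) in percent; interpretation is an assumption.
LIMITS = {"cpu": (60.0, 80.0), "memory": (60.0, 80.0)}

PEAK_RPM = 1140
STRESS_RPM = PEAK_RPM * 150 // 100  # 150% of the production peak


def check_acceptance(samples):
    """Return a list of violations; an empty list means the run passes."""
    violations = []
    for metric, (avg_limit, hard_limit) in LIMITS.items():
        values = samples[metric]
        if mean(values) > avg_limit:
            violations.append(
                f"{metric}: average {mean(values):.1f}% exceeds {avg_limit:.0f}%")
        if max(values) > hard_limit:
            violations.append(
                f"{metric}: peak {max(values):.1f}% exceeds {hard_limit:.0f}%")
    return violations


run = {"cpu": [50, 85, 58], "memory": [40, 45, 50]}
print(STRESS_RPM)
print(check_acceptance(run))
```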

CPU usage per 1 server (DataDog)

Monitoring targets

- Response times
- Resource utilisation at SUT
- Resource utilisation at test tool
- Exceptions
- Workload behaviour

Load Test tool (flood.io)

Response times

Resource usage

to catch exceptions

Tracking workload in real-time

Select proper EC2 type for an App

- General Purpose
- Compute Optimized
- Memory Optimized
- GPU
- Storage Optimized
- Dense-storage Instances

Select proper EC2 type for an App

Model        vCPU   Mem (GiB)   Storage    Dedicated EBS Bandwidth (Mbps)
c4.large     2      3.75        EBS-Only   500
c4.xlarge    4      7.5         EBS-Only   750
c4.2xlarge   8      15          EBS-Only   1,000
c4.4xlarge   16     30          EBS-Only   2,000
c4.8xlarge   36     60          EBS-Only   4,000
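
As a starting point for sizing, the table above can be turned into a simple lookup that returns the smallest c4 model meeting a CPU and memory requirement. This is only a sketch: the `smallest_fit` helper is hypothetical, and real sizing should be confirmed by the load test results rather than by specification alone.

```python
# (model, vCPU, memory in GiB, dedicated EBS bandwidth in Mbps), smallest first
C4_MODELS = [
    ("c4.large", 2, 3.75, 500),
    ("c4.xlarge", 4, 7.5, 750),
    ("c4.2xlarge", 8, 15, 1000),
    ("c4.4xlarge", 16, 30, 2000),
    ("c4.8xlarge", 36, 60, 4000),
]


def smallest_fit(min_vcpu, min_mem_gib):
    """Return the smallest c4 model satisfying both requirements, or None."""
    for model, vcpu, mem, _ebs in C4_MODELS:
        if vcpu >= min_vcpu and mem >= min_mem_gib:
            return model
    return None


print(smallest_fit(4, 8))
```

Here a requirement of 4 vCPUs and 8 GiB skips `c4.xlarge`, which has only 7.5 GiB, and lands on `c4.2xlarge`.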

Workload Characterization

- Catch traffic patterns

- Resource utilisation

- Distribution of response times

- Distribution of response sizes

- Characterization of user behaviour

- Analyse input data

- Use performance analysis toolkit

Traffic patterns

“Keep workload as real as possible.”

Resource utilisation

Characterize user behaviour

Investigate user actions with the help of:

- New Relic Browser (session + funnel functions)

- Universal Analytics with User behaviour path

- Mixpanel.com (needs code injection)

- Server logs at NGINX (HTTP requests, REST calls)

- Sumo Logic (Apache access logs)

- Server app logs (HP ALM has QC sense)

- DB activity logs (applied solution)

Hard Way

Write analytical tools that will:

- Parse access / ELB logs
- Unite requests into scripts by timestamp and IP
- Reduce the number of unique scripts
- Restore high-level user actions
- Workload distribution
- Write load test scripts
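
The log-analysis steps of the "hard way" can be sketched as follows: group already-parsed log entries by client IP, split each client's requests into sessions on idle gaps, and count how often each request sequence occurs to get a workload distribution. The 30-minute gap, the tuple layout of the entries and both function names are assumptions; parsing the raw access or ELB log lines is left out.

```python
from collections import Counter, defaultdict


def sessions_from_log(entries, gap_seconds=1800):
    """Group (timestamp, ip, path) entries into per-IP sessions split by idle gaps."""
    by_ip = defaultdict(list)
    for ts, ip, path in sorted(entries):
        by_ip[ip].append((ts, path))
    sessions = []
    for hits in by_ip.values():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, path) in zip(hits, hits[1:]):
            if ts - prev_ts > gap_seconds:
                # Idle gap exceeded: close the session and start a new one.
                sessions.append(tuple(current))
                current = []
            current.append(path)
        sessions.append(tuple(current))
    return sessions


def workload_distribution(sessions):
    """Share of each distinct request sequence ("script") among all sessions."""
    counts = Counter(sessions)
    total = sum(counts.values())
    return {script: n / total for script, n in counts.items()}


# Hypothetical parsed entries: (unix timestamp, client IP, request path)
entries = [
    (0, "10.0.0.1", "/login"), (10, "10.0.0.1", "/search"),
    (5, "10.0.0.2", "/login"), (20, "10.0.0.2", "/search"),
    (5000, "10.0.0.1", "/login"), (5010, "10.0.0.1", "/cart"),
]
distribution = workload_distribution(sessions_from_log(entries))
for script, share in distribution.items():
    print(" -> ".join(script), round(share, 2))
```

Identical sequences collapse into one script, which reduces the number of unique scripts, and the shares give the mix to configure in the load test.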

Open source load tools - 54 found

JMeter, Gatling, Locust, Grinder, Tsung

Distributed JMeter testing

Load Tool as a Service Providers

● BlazeMeter - (JMeter)
● Visual Studio Team Services - (JMeter)
● Flood IO - (JMeter, Gatling, Ruby DSL)
● RedLine13 - (JMeter, Gatling, Ruby DSL)
● OctoPerf - (JMeter)

Create a Grid (Docker containers)

Flood.io Grids (JMeter in Docker on EC2)

Create a Flood (upload jmx & data)

Load Test tool (flood.io)

General test result

Amazon approval is needed for large tests

Flood.io results split by transactions

Reports

● Goals & achievements (e.g. 150% of daily RPM is reached)

● Side effects found (DB connection limit reached due to quick ramp-up)

● Exceptions caught during testing (e.g. ELB lost connections)

● Run-time notes and fixes made by DevOps (EC2 changes during the test iterations)

● Observations (CPU usage was the critical resource during RPM increase)

● Recommendations (EC2 - add more VMs, add more DB shards)

Pitfalls during performance testing

Pitfall 1: 90th percentile matches production
Pitfall 2: Extrapolation on horizontal scale
Pitfall 3: Use a small amount of hard-coded data
Pitfall 4: Focus on a single use case
Pitfall 5: Run tests from one location

Does the "90 percentile" really work ?
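
One way to see why a matching 90th percentile is not enough: two runs can share the same p90 while their slowest 10% of requests differ by an order of magnitude. The sketch below uses a nearest-rank percentile and hypothetical response times; it is an illustration of the pitfall, not a reconstruction of the charts shown on the slides.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, 100 requests each.
production = [200] * 90 + [250] * 10
test_run = [200] * 90 + [5000] * 10

for name, data in (("production", production), ("test run", test_run)):
    print(name, "p90 =", percentile(data, 90),
          "p99 =", percentile(data, 99), "max =", max(data))
```

Both series report a p90 of 200 ms, yet one in ten requests in the test run takes 5 seconds. Comparing p99, the maximum, or the full distribution exposes the difference.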

What will be the cost of the performance testing toolset?

Cloud JMeter provider     Type            Users   Monthly   Nodes/Hours   AWS cost
BlazeMeter                pro             3K      499       100           167.50$
Flood.io (shared nodes)   pay as you go   15K+    499       100           167.50$
SOASTA                    pay as you go   10K     22500     undefined     0

Questions and Answers

Thank You!