Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalability
Transcript of Docker/DevOps Meetup: Metrics-Driven Continuous Performance and Scalability
1 @Dynatrace
Application Quality Metrics for your Pipeline (and why Docker is not the solution to all of your problems)
Andreas (Andi) Grabner - @grabnerandi
Metrics-Driven DevOps
700 deployments / year
10 + deployments / day
50 – 60 deployments / day
Every 11.6 seconds
Example #1: Online Casino
282! Objects on that page
9.68MB Page Size
8.8s Page Load Time
Most objects are images delivered from your main
domain
Very long Connect time (1.8s) to your CDN
879! SQL Queries
8! Missing CSS & JS Files
340! Calls to GetItemById
Example #2: Lawyer Website based on SharePoint
11s! To load Landing Page
• Waterfall Agile: 3 years
• 220 Apps - 1 deployment per month
“EVERYONE can do Continuous Delivery”
“Every manual tester does AUTOMATION”
“WE DON’T LOG BUGS – WE FIX THEM!”
Measures Built-In, Visible to Everyone
Promote your Wins, Educate your Peers
Challenges
Deploy Faster!!
Fail Faster!?
It’s not about blind automation of pushing more bad code through a shiny pipeline
Metrics based
Decisions!
Time of Deployment
Availability dropped to 0%
Bad Deployment based on Resource Consumption
With increasing load: Which LAYER doesn’t SCALE?
Usage by Channel? Errors on Devices?
App with Regular Load supported by
10 Containers
Twice the Load but 48 (=4.8x!) Containers! App doesn’t scale!!
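The container numbers above can be turned into a single scaling metric. The sketch below is only an illustration: the class and method names are hypothetical, and the only inputs taken from the slide are 10 containers at regular load and 48 containers at twice the load.

```java
// Illustrative sketch; class and method names are hypothetical.
final class ScalingCheck {

    /**
     * Container growth divided by load growth.
     * 1.0 means linear scaling; larger values mean the app needs
     * disproportionately more containers per unit of load.
     */
    static double scalingFactor(int baseContainers, int scaledContainers, double loadMultiplier) {
        double containerMultiplier = (double) scaledContainers / baseContainers;
        return containerMultiplier / loadMultiplier;
    }

    public static void main(String[] args) {
        // Numbers from the slide: 10 containers at regular load, 48 at twice the load.
        double factor = scalingFactor(10, 48, 2.0);
        System.out.println("Scaling factor: " + factor);
        System.out.println(factor > 1.0
                ? "App does not scale linearly"
                : "App scales linearly or better");
    }
}
```

A factor well above 1.0, as here, is a hint that some layer behind the containers is the real bottleneck.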
Technical Debt!
80%
$60B
Insufficient Focus on Quality
The “War Room”
Facebook – December 2012
20%
80%
I learning from
others
4 use cases WHY did it happen? HOW to avoid it! METRICS to guide you.
#1 : Not every Architect
makes good decisions
• Symptoms
  • HTML takes between 60 and 120s to render
  • High GC Time
• Developer Assumptions
  • Bad GC Tuning
  • Probably bad Database Performance as rendering was simple
• Result: 2 Years of Finger pointing between Dev and DBA
Project: Online Room Reservation System
Developers built own monitoring
void roomreservationReport(int officeId) {
    long startTime = System.currentTimeMillis();
    // Times the whole data load, not the individual SQL statements behind it
    Object data = loadDataForOffice(officeId);
    long dataLoadTime = System.currentTimeMillis() - startTime;
    generateReport(data, officeId);
}
Result: Avg. Data Load Time: 45s!
DB Tool says: Avg. SQL Query: <1ms!
#1: Loading too much data
24889! Calls to the Database API!
High CPU and High Memory Usage to keep all data in Memory
#2: On individual connections
12444! individual connections
Classical N+1 Query Problem
Individual SQL really <1ms
#3: Putting all data in temp Hashtable
Lots of time spent in Hashtable.get
Called from their Entity Objects
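The N+1 pattern named above is easy to reproduce without a real database. The following is a minimal sketch that assumes an in-memory stand-in (`FakeDb`, a hypothetical class, not part of the original project) that counts round trips. It shows why each query can take under 1ms while the whole load is still slow: the number of calls grows with the number of rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch; FakeDb and all method names are hypothetical.
final class NPlusOneDemo {

    /** In-memory stand-in for a database that counts round trips. */
    static final class FakeDb {
        private final Map<Integer, String> rooms = new HashMap<>();
        int queryCount = 0;

        FakeDb(int roomCount) {
            for (int id = 1; id <= roomCount; id++) {
                rooms.put(id, "Room " + id);
            }
        }

        List<Integer> findRoomIds() { queryCount++; return new ArrayList<>(rooms.keySet()); }
        String findRoomById(int id) { queryCount++; return rooms.get(id); }
        List<String> findAllRooms() { queryCount++; return new ArrayList<>(rooms.values()); }
    }

    /** N+1: one query for the ids, then one query per id. */
    static int loadOneByOne(FakeDb db) {
        List<String> loaded = new ArrayList<>();
        for (int id : db.findRoomIds()) {
            loaded.add(db.findRoomById(id));
        }
        return loaded.size();
    }

    /** Single round trip that returns the same rows. */
    static int loadInOneQuery(FakeDb db) {
        return db.findAllRooms().size();
    }

    public static void main(String[] args) {
        FakeDb slow = new FakeDb(100);
        loadOneByOne(slow);
        FakeDb fast = new FakeDb(100);
        loadInOneQuery(fast);
        System.out.println("One by one: " + slow.queryCount + " queries");
        System.out.println("Batched:    " + fast.queryCount + " queries");
    }
}
```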
Lessons Learned – Don’t Assume …
• … you know what the code you inherited is doing!!
• … you are not making mistakes like this
• Explore the Right Tools
  • Built-In Database Analysis Tools
  • “Logging” options of Frameworks such as Hibernate, …
  • JMX, Perf Counters, … of your Application Servers
  • Performance Tracing Tools: Dynatrace, Ruxit, NewRelic, AppDynamics, Your Profiler of Choice …
Key Metrics
# of SQL Calls
# of same SQL Execs (1+N)
# of Connections
Rows/Data Transferred
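Two of these key metrics, the number of SQL calls and the number of executions of the same SQL, can be derived from any list of executed statements, for example one captured through a framework's SQL logging. The sketch below assumes such a list is already available as strings; the class name, method names and sample statements are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch; names and sample statements are hypothetical.
final class SqlCallMetrics {

    /** Counts how often each distinct statement text was executed. */
    static Map<String, Integer> executionsPerStatement(List<String> executedSql) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sql : executedSql) {
            counts.merge(sql, 1, Integer::sum);
        }
        return counts;
    }

    /** Highest execution count of any single statement (the "same SQL" metric). */
    static int maxSameSqlExecutions(List<String> executedSql) {
        int max = 0;
        for (int count : executionsPerStatement(executedSql).values()) {
            max = Math.max(max, count);
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> executed = Arrays.asList(
                "SELECT * FROM office WHERE id = ?",
                "SELECT * FROM room WHERE id = ?",
                "SELECT * FROM room WHERE id = ?",
                "SELECT * FROM room WHERE id = ?");
        System.out.println("# of SQL calls:      " + executed.size());
        System.out.println("# of same SQL execs: " + maxSameSqlExecutions(executed));
    }
}
```

A high "same SQL" count relative to the total is the typical signature of the 1+N problem.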
#2: There is no easy "Migration" to Micro(Services)
26.7s Execution Time
33! Calls to the same Web Service
171! SQL Queries through LINQ by this Web Service – request similar data for each call
Architecture Violation: Direct access to DB from frontend logic
Key Metrics
# Service Calls, # Containers
# of Threads, Sync and Wait
# SQL executions
# of SAME SQLs
Payload (kB) of Service Calls
#3: Don't ASSUME you know the environment
Distance calculation issues
480km biking in 1 hour!
Solution: Unit Test in Live App reports Geo
Calc Problems
Finding: Only happens on certain
Android versions
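A check of the kind described above ("Unit Test in Live App reports Geo Calc Problems") could look roughly like the sketch below. It is not the app's actual code: the haversine formula is a standard great-circle distance, and the 80 km/h plausibility limit and all names are assumptions made for illustration.

```java
// Illustrative sketch; names and the speed limit are assumptions.
final class GeoPlausibility {

    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance between two coordinates (haversine formula). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** True if the average speed stays at or below the given limit. */
    static boolean isPlausibleSpeed(double distanceKm, double hours, double maxKmh) {
        return hours > 0 && distanceKm / hours <= maxKmh;
    }

    public static void main(String[] args) {
        double maxBikeKmh = 80.0; // assumed upper bound for a bike ride
        // The case from the slide: 480km of biking reported for 1 hour.
        if (!isPlausibleSpeed(480.0, 1.0, maxBikeKmh)) {
            System.out.println("Functional error: implausible distance reported");
        }
    }
}
```

Counting how often such a check fails, per device and OS version, gives the "# of functional errors" metric and would surface a problem that only occurs on certain Android versions.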
3rd party issues
Impact of bad 3rd party calls
Key Metrics
# of functional errors
# and Status of 3rd party calls
Payload of Calls
12 000 000 $
#4: Thinking Big?
Then Start Small!
Availability dropped to 0%
Load Spike resulted in Unavailability
Ad on air
Alternative: “GoDaddy goes DevOps”
Response time improved 4x
1h before SuperBowl KickOff
1h after Game ended
Key Metrics
# Domains
Total Size of Content
What have we learned so far?
1. # Resources
2. Size of Resources
3. Page Size
4. # Functional Errors
5. 3rd Party calls
6. # SQL Executions
7. # of SAME SQLs
Metric Based
Decisions Are Cool
We want to get from here …
To here!
Use these application metrics as additional Quality Gates
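A quality gate of this kind can be as simple as comparing measured metrics against agreed limits and failing the build on any violation. The sketch below assumes the metrics arrive as name/value pairs; the metric names and the limits are invented for illustration, and 879 is the SQL query count quoted earlier in the deck.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch; metric names and limits are hypothetical.
final class QualityGate {

    /** Returns one message per metric that exceeds its limit. */
    static List<String> violations(Map<String, Integer> measured, Map<String, Integer> limits) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> limit : limits.entrySet()) {
            Integer value = measured.get(limit.getKey());
            if (value != null && value > limit.getValue()) {
                result.add(limit.getKey() + " = " + value + " exceeds limit " + limit.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> limits = new LinkedHashMap<>();
        limits.put("sqlExecutions", 50);   // assumed limit
        limits.put("functionalErrors", 0); // assumed limit

        Map<String, Integer> measured = new LinkedHashMap<>();
        measured.put("sqlExecutions", 879);
        measured.put("functionalErrors", 0);

        List<String> failed = violations(measured, limits);
        failed.forEach(System.out::println);
        System.out.println(failed.isEmpty() ? "Gate passed" : "Gate FAILED");
    }
}
```

In a pipeline, a non-empty list would translate into a non-zero exit code so that the build stops before the code is promoted.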
Quality Metrics in your pipeline
What you currently measure
# Test Failures
Overall Duration
What you should measure
Execution Time per test
# calls to API
# executed SQL statements
# Web Service Calls
# JMS Messages
# Objects Allocated
# Exceptions
# Log Messages
# HTTP 4xx/5xx
Request/Response Size
Page Load/Rendering Time
…
Extend your Continuous Integration
Test & Monitoring Framework Results | Architectural Data
Build #    Test Case      Status  |  # SQL   # Excep   CPU
Build 17   testPurchase   OK      |  12      0         120ms
Build 17   testSearch     OK      |  3       1         68ms
Build 18   testPurchase   FAILED  |  12      5         60ms
Build 18   testSearch     OK      |  3       1         68ms
Build 19   testPurchase   OK      |  75      0         230ms
Build 19   testSearch     OK      |  3       1         68ms
Build 20   testPurchase   OK      |  12      0         120ms
Build 20   testSearch     OK      |  3       1         68ms
We identified a regression
Problem solved
Exceptions probably reason for failed tests
Problem fixed but now we have an architectural regression
Now we have the functional and architectural confidence
Let’s look behind the scenes
#1: Analyzing every Unit & Integration test
#2: Metrics for each test
#3: Detecting regression based on measure
Unit/Integration Tests are auto baselined! Regressions auto-detected!
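The idea of baselining a per-test metric and flagging deviations can be sketched in a few lines. This is not how Dynatrace computes its baselines; it is a deliberately simple stand-in that uses the average of earlier builds plus a tolerance. The class name, method names and the 20% tolerance are assumptions, while the # SQL values 12 and 75 for testPurchase come from the build results above.

```java
// Illustrative sketch; a plain average is assumed as the baseline.
final class BaselineRegression {

    /** Average of the metric over earlier builds. */
    static double baseline(int[] previousValues) {
        if (previousValues.length == 0) {
            throw new IllegalArgumentException("need at least one earlier build");
        }
        double sum = 0;
        for (int value : previousValues) {
            sum += value;
        }
        return sum / previousValues.length;
    }

    /** True if the current value exceeds the baseline by more than the tolerance. */
    static boolean isRegression(int[] previousValues, int current, double tolerance) {
        return current > baseline(previousValues) * (1.0 + tolerance);
    }

    public static void main(String[] args) {
        int[] sqlCountsSoFar = {12, 12}; // testPurchase, # SQL in the two earlier builds
        double tolerance = 0.2;          // assumed: allow 20% above baseline

        System.out.println("75 SQL statements a regression? "
                + isRegression(sqlCountsSoFar, 75, tolerance));
        System.out.println("12 SQL statements a regression? "
                + isRegression(sqlCountsSoFar, 12, tolerance));
    }
}
```

The point of the metric is visible here: the test with 75 SQL statements still passes functionally, so only the architectural measure reveals the regression.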
Build-by-Build Quality View
Build Quality Overview in Dynatrace or Jenkins
Build Quality Overview in Dynatrace & your CI server
Production Data: Real User & Application Monitoring
Recap!
#1: Pick your App Metrics
# of Service Calls
Bytes Sent & Received
# of Worker Threads
# of SQL Calls, # of Same SQLs
# of DB Connections
#2: Figure out how to monitor them
http://bit.ly/dtpersonal
#3: Automate it into your Pipeline
#4: Also do it in Production
Better Software,
Faster!!
Draw better Unicorns
Questions and/or Demo
Slides: slideshare.net/grabnerandi
Get Tools: bit.ly/dtpersonal
YouTube Tutorials: bit.ly/dttutorials
Contact Me: [email protected]
Follow Me: @grabnerandi
Read More: blog.dynatrace.com
Andreas Grabner
Dynatrace Developer Advocate
@grabnerandi
http://blog.dynatrace.com