London Web PerfUG: Performance-Focused DevOps (Feb 2014)
BEST PRACTICES ON PERFORMANCE-FOCUSED DEVOPS
Andreas Grabner, Technology Strategist @ Compuware/dynaTrace
Popularity of Topics on Google
Who is doing it? How many successful deployments can they do?
300 Deployments / Year
50-60 Deployments / Day
10+ Deployments / Day
Every 11.6 seconds
More on Amazon's Story
75% fewer outages since 2006
90% fewer outage minutes
~0.001% of deployments cause a problem
Instantaneous automatic rollback
Deploying every 11.6s
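A back-of-the-envelope calculation puts the Amazon numbers in perspective. This is a sketch under an assumption the slide does not state: deployments are spread evenly around the clock, every day of the year. The ~0.001% problem rate is converted to a fraction before use.

```python
# Back-of-the-envelope numbers for "a deployment every 11.6 seconds".
# Assumption (not from the slides): deployments are spread evenly over
# 24 hours a day, 365 days a year.
SECONDS_PER_DAY = 24 * 60 * 60


def deployments_per_day(interval_s: float) -> float:
    """Number of deployments per day at a fixed interval."""
    return SECONDS_PER_DAY / interval_s


def expected_problem_deployments(deployments: float, failure_rate: float) -> float:
    """Expected number of deployments that cause a problem."""
    return deployments * failure_rate


per_day = deployments_per_day(11.6)
per_year = per_day * 365
# ~0.001% expressed as a fraction is 0.00001
problems_per_year = expected_problem_deployments(per_year, 0.001 / 100)

print(f"{per_day:,.0f} deployments/day")
print(f"{per_year:,.0f} deployments/year")
print(f"~{problems_per_year:.0f} problematic deployments/year")
```

Under these assumptions that is roughly 7,400 deployments a day, about 2.7 million a year, and only a few dozen problematic deployments a year.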
Testing is Important – and gives Confidence
But are we ready for the “real” world?
Measure Performance during the game
Ball Possession: 40 : 60
Fouls: 0 : 0
Score: 0 : 0
Minute 1 - 5
Measure Performance during the game
Minute 6 - 35
Ball Possession: 80 : 20
Fouls: 2 : 12
Score: 0 : 0
Deep Dive Analysis
Options “To Fix” the situation
Not always a happy ending
Minute 90
Ball Possession: 80 : 20
Fouls: 4 : 25
Score: 3 : 0
FRUSTRATED FANS!!
How does that relate to software?
From Deploy to …
Timeline: Deploy → Promotion/Event → Problems → Ops Playbook → War Room
The “War Room” – back then
“Houston, we have a problem” (NASA Mission Control Center, Apollo 13, 1970)
The “War Room” – NOW
Facebook – December 2012
Problem: Attitudes like this don’t help either
Image taken from https://www.scriptrock.com/blog/devops-whats-hype-about/
Shopzilla CIO (in 2010): “… when they get in the war room - the developers and ops teams describe the problem as the enemy, not each other”
Problem: Very “expensive” to work on these issues
~80% of problems are caused by ~20% of patterns
YES, we know this
80% Dev Time in Bug Fixing
$60B Defect Costs
BUT
Let's start on the frontend: the 80/20 rule from Steve
The impact of Architectural Decisions
Metrics: # Visitors, # Requests / User
Bloated Mobile Web Sites (Shopping Sites Dec 2013)
Huge JavaScript, HTML and CSS files
Large resources take longer to load – especially on slow connections
Total Size: 2.5MB
Metric: Total Page Size
Bloated Mobile Web Sites: Super Bowl Ad Landing Page
434 resources in total on that page: 230 JPEGs, 75 PNGs, 50 GIFs, …
Total size of ~20MB
Good First Impression and OnLoad Time – but very long Total Load Time
Metrics: Load Time, # Resources, …
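Total page size and number of resources are easy to compute once you have the list of resources a page loads, for example from a HAR export. The sketch below assumes the records have already been extracted into a hypothetical `Resource` type; the page contents and the budget values are made up for illustration, not recommendations from the talk.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical record for one downloaded resource."""
    url: str
    mime_type: str
    size_bytes: int


def page_metrics(resources):
    """Total Page Size and # Resources, overall and per MIME type."""
    return {
        "total_size_bytes": sum(r.size_bytes for r in resources),
        "resource_count": len(resources),
        "count_by_type": dict(Counter(r.mime_type for r in resources)),
    }


def budget_violations(metrics, max_bytes, max_resources):
    """Human-readable budget violations (empty list = within budget)."""
    problems = []
    if metrics["total_size_bytes"] > max_bytes:
        problems.append(f"page size {metrics['total_size_bytes']} B > {max_bytes} B")
    if metrics["resource_count"] > max_resources:
        problems.append(f"{metrics['resource_count']} resources > {max_resources}")
    return problems


# Hypothetical page: URLs and sizes are made up for illustration
page = [
    Resource("https://m.example.com/", "text/html", 40_000),
    Resource("https://m.example.com/app.js", "application/javascript", 900_000),
    Resource("https://m.example.com/hero.jpg", "image/jpeg", 1_600_000),
]
m = page_metrics(page)
print(m)
print(budget_violations(m, max_bytes=1_000_000, max_resources=50))
```

Running such a check on every build turns “the page got bloated” into a failing build rather than a surprise in production.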
Mobile Web Site: JUST redirects to the regular page
Most CSS and JS files are redirected to the www domain
This is a lot of time “wasted”, especially on high-latency mobile connections
Metric: # HTTP 3xx, 4xx, 5xx
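Redirects like these show up as 3xx responses whose `Location` header points at another host, and each one costs at least one extra round trip. A minimal sketch, assuming each request is available as a `(url, status, location)` tuple; the `m.example.com` and `www.example.com` hosts are placeholders.

```python
from collections import Counter
from urllib.parse import urlsplit


def redirect_report(entries):
    """entries: (url, status, location) tuples; location may be None.
    Returns the number of 3xx responses and a Counter of cross-host
    (source_host, target_host) redirect pairs."""
    redirects = 0
    cross_host = Counter()
    for url, status, location in entries:
        if 300 <= status < 400:
            redirects += 1
            src = urlsplit(url).hostname
            dst = urlsplit(location).hostname if location else None
            # A relative Location header (no host) stays on the same host
            if dst is not None and dst != src:
                cross_host[(src, dst)] += 1
    return redirects, cross_host


# Hypothetical mobile page whose assets bounce to the www domain
entries = [
    ("https://m.example.com/", 200, None),
    ("https://m.example.com/site.css", 301, "https://www.example.com/site.css"),
    ("https://m.example.com/app.js", 301, "https://www.example.com/app.js"),
    ("https://www.example.com/site.css", 200, None),
    ("https://www.example.com/app.js", 200, None),
]
count, pairs = redirect_report(entries)
print(count, dict(pairs))
```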
Individual images are very small
But downloading so many resources in parallel takes time due to browser connection limits
52 individual country flags!
Metric: # Images
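The effect of browser connection limits can be approximated with simple arithmetic: with at most k requests in flight per host, n resources need at least ⌈n/k⌉ sequential rounds, and each round costs at least one round trip. The sketch below assumes 6 connections per host and a 200 ms mobile round-trip time; both numbers are assumptions for illustration, and the model deliberately ignores transfer time, DNS, TLS and connection setup.

```python
import math


def download_rounds(resource_count: int, connections_per_host: int) -> int:
    """Minimum number of sequential request 'rounds' when the browser keeps
    at most `connections_per_host` requests in flight to one host."""
    return math.ceil(resource_count / connections_per_host)


def min_round_trip_cost_ms(resource_count: int, connections_per_host: int,
                           rtt_ms: float) -> float:
    """Lower bound on time spent waiting for round trips.
    Assumption: one RTT per round; transfer time, DNS, TLS and
    connection setup are ignored."""
    return download_rounds(resource_count, connections_per_host) * rtt_ms


# Assumed values for illustration: 6 connections per host, 200 ms mobile RTT
flags = 52
print(download_rounds(flags, 6), "rounds")
print(min_round_trip_cost_ms(flags, 6, 200), "ms")
```

For the 52 flags this gives 9 rounds, i.e. at least 1.8 s spent just waiting on round trips under these assumptions, even though each image is tiny.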
Misconfigured CDN!
46! HTTP 403 Requests for images on the landing page
Probably a configuration problem with their Amazon AWS CDN
Lots of time “wasted” due to roundtrips that just result in a 403
Metric: # HTTP 4xx
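A misconfigured CDN stands out as soon as status codes are grouped by host. A minimal sketch, assuming each request is available as a `(url, status)` tuple; the hostnames are placeholders.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def status_classes_by_host(entries):
    """Map host -> Counter of status classes ('2xx', '3xx', '4xx', '5xx')."""
    result = defaultdict(Counter)
    for url, status in entries:
        host = urlsplit(url).hostname
        result[host][f"{status // 100}xx"] += 1
    return result


def hosts_with_client_errors(entries, threshold=1):
    """Hosts that returned at least `threshold` 4xx responses, sorted."""
    return sorted(
        host
        for host, classes in status_classes_by_host(entries).items()
        if classes["4xx"] >= threshold  # Counter returns 0 for missing keys
    )


# Hypothetical landing page where the CDN rejects image requests
entries = [
    ("https://www.example.com/", 200),
    ("https://cdn.example.com/flag-de.png", 403),
    ("https://cdn.example.com/flag-fr.png", 403),
    ("https://cdn.example.com/logo.png", 200),
]
print(dict(status_classes_by_host(entries)))
print(hosts_with_client_errors(entries))
```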
Slow JavaScript
1.1s on my IE 10 to execute magicSpanLinks()
The each loop calls this block of JavaScript for every span node
759 span nodes are processed by the anonymous function
It adds the dynamically generated link and removes the old span
We can see all the DOM modifications and how they sum up in execution time
Metric: Time Spent in JS Execution
Browser-specific Problems
• “I would love to spend money with you – but I can’t get through the checkout process”
• Looked at the data from this user to identify the error
HTTP 404 at OrderItemAdd
Metric: HTTP xxx per Browser
Root Cause: A cookie bug in Safari
Invalid Cookie Exception
Impacts ALL Safari Users
• ALL Safari users who bought a product with special characters in its name were impacted
Metric: # Exceptions per User, # Exceptions per Browser
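Problems that hit only one browser disappear in an overall error rate; they only become visible when the metric is broken down per browser. A minimal sketch, assuming each request has already been tagged with a browser name (user-agent parsing is left out) and its HTTP status; the sample data and the 10% threshold are hypothetical.

```python
from collections import defaultdict


def error_rate_by_browser(requests):
    """requests: iterable of (browser, http_status) tuples.
    Returns browser -> share of requests that ended in a 4xx or 5xx."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for browser, status in requests:
        totals[browser] += 1
        if status >= 400:
            errors[browser] += 1
    return {b: errors[b] / totals[b] for b in totals}


def browsers_above(rates, max_rate):
    """Browsers whose error rate exceeds `max_rate`, sorted by name."""
    return sorted(b for b, rate in rates.items() if rate > max_rate)


# Hypothetical sample: the failing OrderItemAdd call only hits Safari
requests = [
    ("Safari", 404), ("Safari", 200),
    ("Chrome", 200), ("Chrome", 200),
    ("Firefox", 200),
]
rates = error_rate_by_browser(requests)
print(rates)
print(browsers_above(rates, max_rate=0.1))
```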
But then we have to focus on the backend
Deployment Mistakes lead to internal Exceptions
Metric: # of Exception Objects Created
Deployment Mistakes lead to high logging overhead
Metric: Time Spent in critical Methods
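“Time Spent in critical Methods” can be captured with a small timing wrapper. This is a minimal Python sketch of the idea, not the dynaTrace instrumentation mechanism; `format_order` is a hypothetical stand-in for a critical method.

```python
import functools
import time
from collections import defaultdict

# Accumulated wall-clock seconds and call counts per decorated function
TIME_SPENT = defaultdict(float)
CALLS = defaultdict(int)


def timed(func):
    """Record how much time is spent in `func` and how often it is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are counted too
            TIME_SPENT[func.__qualname__] += time.perf_counter() - start
            CALLS[func.__qualname__] += 1
    return wrapper


@timed
def format_order(order_id):  # hypothetical critical method
    return f"order-{order_id}"


for i in range(1000):
    format_order(i)
print(CALLS["format_order"], f"{TIME_SPENT['format_order'] * 1000:.2f} ms")
```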
Production Deployment leads to Log SYNC Issues
Metrics: Time Spent in Sync & Logging, # of Log Messages
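The “# of Log Messages” metric can be collected by attaching an extra handler that only counts records. A minimal sketch using Python's standard `logging` module; the logger name is hypothetical. In this module each handler's `handle()` wraps `emit()` in a per-handler lock, so a slow handler shared by many threads serializes them, which is one way logging turns into synchronization overhead.

```python
import logging
from collections import Counter


class CountingHandler(logging.Handler):
    """Counts log records per level; attach it next to the real handlers
    to get the '# of Log Messages' metric."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def emit(self, record):
        self.counts[record.levelname] += 1


logger = logging.getLogger("shop.checkout")  # hypothetical logger name
logger.setLevel(logging.INFO)
counter = CountingHandler()
logger.addHandler(counter)

for item in range(3):
    logger.info("added item %s", item)
    # DEBUG is disabled for this logger, so this record is dropped before
    # any handler runs; passing args separately also skips string formatting
    logger.debug("cart state: %s", item)

print(counter.counts)
```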
Top Problem Patterns: Resource Pools
Metric: Connection Pool Usage, Acquisition Time
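Connection pool usage and acquisition time are both visible at the point where a caller borrows a connection. A minimal sketch of a generic fixed-size pool built on `queue.Queue`; `TimedPool` is a hypothetical class, not any specific driver's pool API, and `object()` stands in for a real database connection.

```python
import queue
import threading
import time


class TimedPool:
    """Fixed-size resource pool that records how long callers wait to
    acquire a resource and how many resources are in use."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())
        self.size = size
        self._lock = threading.Lock()
        self.in_use = 0
        self.max_in_use = 0
        self.wait_times = []  # acquisition time in seconds, per successful acquire

    def acquire(self, timeout=None):
        start = time.perf_counter()
        # Blocks while the pool is exhausted; raises queue.Empty on timeout
        resource = self._idle.get(timeout=timeout)
        waited = time.perf_counter() - start
        with self._lock:
            self.wait_times.append(waited)
            self.in_use += 1
            self.max_in_use = max(self.max_in_use, self.in_use)
        return resource

    def release(self, resource):
        with self._lock:
            self.in_use -= 1
        self._idle.put(resource)


pool = TimedPool(factory=object, size=2)
c1 = pool.acquire()
c2 = pool.acquire()
print("usage:", pool.in_use / pool.size, "max acquisition wait:", max(pool.wait_times))
pool.release(c1)
pool.release(c2)
```

A pool that sits at 100% usage with growing acquisition times is the pattern to alert on: requests are queuing for connections rather than doing work.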
Long running SQL with Production Data
Metric: Time Spent in SQL Execution
N+1 Query Problem
Metrics: # SQL Executions / Request, # of “same” SQL Executions
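The N+1 pattern is easy to reproduce and to measure by counting statements per request. A minimal, self-contained sketch using the standard `sqlite3` module with an in-memory database; the schema and the `CountingConnection` wrapper are hypothetical, chosen only to show the two metrics.

```python
import sqlite3
from collections import Counter


class CountingConnection:
    """Wraps a sqlite3 connection and counts executions per SQL text."""

    def __init__(self, conn):
        self.conn = conn
        self.statements = Counter()

    def execute(self, sql, params=()):
        self.statements[sql] += 1
        return self.conn.execute(sql, params)


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY);
        CREATE TABLE items (order_id INTEGER, name TEXT);
        INSERT INTO orders (id) VALUES (1), (2), (3);
        INSERT INTO items (order_id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    """)
    return CountingConnection(conn)  # setup statements are not counted


def items_n_plus_1(db):
    """1 query for the orders + 1 query per order: the N+1 pattern."""
    result = {}
    for (order_id,) in db.execute("SELECT id FROM orders ORDER BY id").fetchall():
        rows = db.execute("SELECT name FROM items WHERE order_id = ?", (order_id,))
        result[order_id] = [name for (name,) in rows]
    return result


def items_joined(db):
    """The same data with a single JOIN query."""
    result = {}
    rows = db.execute(
        "SELECT o.id, i.name FROM orders o "
        "LEFT JOIN items i ON i.order_id = o.id ORDER BY o.id"
    )
    for order_id, name in rows:
        names = result.setdefault(order_id, [])
        if name is not None:
            names.append(name)
    return result


slow, fast = make_db(), make_db()
slow_result, fast_result = items_n_plus_1(slow), items_joined(fast)
print("N+1:", sum(slow.statements.values()), "executions,",
      max(slow.statements.values()), "of the same SQL")
print("JOIN:", sum(fast.statements.values()), "execution")
```

With three orders the N+1 version needs four executions, three of them the same statement; with production data the count grows with every row.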
Memory Leaks under Production Load
Still crashes
Problem fixed! Fixed version deployed
Metric: Heap Size, Object Churn Rate
List of Metrics we just saw
• Architectural Metrics
– Active Worker Threads
– Time Spent in Tier/Class
– # of Exception Objects Created
– Time Spent in critical Methods
– Time Spent in Sync & Logging
– # of Log Messages
• Database Metrics
– Connection Pool Usage, Acquisition Time
– Time Spent in SQL Execution
– # SQL Executions / Request
– # of “same” SQL Executions
• Memory Metrics
– Heap Size, Object Churn Rate
– # Important Objects on Heap
• Web & Mobile Metrics
– Size and # of Resources
– # of JS Files
– Total Page Size
– # HTTP 3xx, 4xx, 5xx
– Time Spent in JS Execution
– # Crashes
– # Errors
How about this idea?
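Build #  | Test Case    | Status | # SQL | # Excep | CPU
Build 17 | testPurchase | OK     | 12    | 0       | 120ms
Build 17 | testSearch   | OK     | 3     | 1       | 68ms
Build 18 | testPurchase | FAILED | 12    | 5       | 60ms
Build 18 | testSearch   | OK     | 3     | 1       | 68ms
Build 19 | testPurchase | OK     | 75    | 0       | 230ms
Build 19 | testSearch   | OK     | 3     | 1       | 68ms
Build 20 | testPurchase | OK     | 12    | 0       | 120ms
Build 20 | testSearch   | OK     | 3     | 1       | 68ms
(Build #, Test Case and Status are the test framework results; # SQL, # Excep and CPU are the architectural data.)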
We identified a regression
Problem solved
Let's look behind the scenes
Exceptions are probably the reason for the failed tests
Problem fixed but now we have an architectural regression
Now we have the functional and architectural confidence
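The build-by-build comparison of test status and architectural data can be automated. A minimal sketch using the numbers from the slide, assuming each build's rows pair testPurchase and testSearch in order for builds 17 to 20. The 20% tolerance and the rule “compare against the last good build” are hypothetical choices, not something the talk prescribes.

```python
from typing import NamedTuple


class Result(NamedTuple):
    build: int
    test: str
    status: str
    sql: int         # number of SQL executions
    exceptions: int  # number of exceptions
    cpu_ms: int      # CPU time in milliseconds


RESULTS = [
    Result(17, "testPurchase", "OK", 12, 0, 120),
    Result(17, "testSearch", "OK", 3, 1, 68),
    Result(18, "testPurchase", "FAILED", 12, 5, 60),
    Result(18, "testSearch", "OK", 3, 1, 68),
    Result(19, "testPurchase", "OK", 75, 0, 230),
    Result(19, "testSearch", "OK", 3, 1, 68),
    Result(20, "testPurchase", "OK", 12, 0, 120),
    Result(20, "testSearch", "OK", 3, 1, 68),
]

METRICS = ("sql", "exceptions", "cpu_ms")
TOLERANCE = 0.2  # hypothetical: allow 20% growth before flagging


def find_regressions(results, tolerance=TOLERANCE):
    """Compare each build with the last 'good' build (all tests OK and no
    metric regressions). Returns {build: [finding, ...]}."""
    by_build = {}
    for r in results:
        by_build.setdefault(r.build, {})[r.test] = r
    baseline = None
    findings = {}
    for build in sorted(by_build):
        tests = by_build[build]
        issues = []
        for name, r in tests.items():
            if r.status != "OK":
                issues.append(f"{name}: {r.status}")
            base = baseline.get(name) if baseline else None
            if base is None:
                continue
            for m in METRICS:
                old, new = getattr(base, m), getattr(r, m)
                if new > old * (1 + tolerance):
                    issues.append(f"{name}: {m} {old} -> {new}")
        findings[build] = issues
        if not issues:
            baseline = tests  # this build becomes the new reference
    return findings


for build, issues in find_regressions(RESULTS).items():
    print(build, "OK" if not issues else "; ".join(issues))
```

Build 18 is flagged for the failed test and the new exceptions; build 19 passes functionally but is flagged for the jump in SQL executions and CPU, which is exactly the architectural regression a pure pass/fail report would miss.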
How? Performance Focus in Test Automation
Cross Impact of KPIs
Analyzing All Unit / Performance Tests
Analyze Perf Metrics
Identify Regressions
How? Performance Focus in Test Automation
Embed your Architectural Results in Jenkins
Getting control over your weekend again …
Enjoy a beer with friends?
Instead of pizza and soda with your colleagues?
IF WE DO ALL THAT
80% Dev Time for Bug Fixing
$60B Costs by Defects
Recommended Book
https://itrevolution.wufoo.com/forms/phoenix-project-ebook-offer/
FREE Products & More Info
• dynaTrace Enterprise: Full end-to-end visibility in your Java, .NET and PHP apps
– Sign up for a 15 Days Free Trial on http://compuwareapm.com
• dynaTrace AJAX Edition: Browser diagnostics for IE + FF
– Download @ http://ajax.dynatrace.com
• Our Blog: http://apmblog.compuware.com
© 2011 Compuware Corporation — All Rights Reserved
Simply Smarter