STP 2014 - Let's Learn from the Top Performance Mistakes in 2013
Andreas Grabner
LET’S LEARN FROM THE TOP PERF MISTAKES
@grabnerandi http://apmblog.compuware.com
What to do with the fastest car …
… if it fails to reach the finish line
What to do with millions of $$ for
building a web site …
Performance, Scalability & Architecture
#1: Architectural Decisions
#1: “We want more Web 2.0”
#1: Load Test Prior to Change
#1: Load Test After Change
Metrics: # Visitors, # Requests / User
Business: Do we need all these bells and whistles?
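The #1 lesson is to compare load tests before and after an architectural change using # Visitors and # Requests / User. A minimal Python sketch of that comparison, assuming visitor and request totals are already available from each run; the `LoadTestResult` type, the `compare_runs` helper and the 10% threshold are hypothetical, not from the talk:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadTestResult:
    """Totals from one load-test run (hypothetical structure)."""
    visitors: int
    requests: int

    @property
    def requests_per_user(self) -> float:
        return self.requests / self.visitors


def compare_runs(before: LoadTestResult, after: LoadTestResult,
                 max_increase: float = 0.10) -> tuple[float, bool]:
    """Hypothetical helper: return the change factor of requests/user and
    whether it stays within the allowed increase (10% is an arbitrary default)."""
    factor = after.requests_per_user / before.requests_per_user
    return factor, factor <= 1 + max_increase


if __name__ == "__main__":
    # Hypothetical numbers: same visitor count, far more requests per user.
    factor, ok = compare_runs(LoadTestResult(1_000, 20_000),
                              LoadTestResult(1_000, 50_000))
    print(f"requests/user changed by x{factor:.1f}, acceptable: {ok}")
```

Normalizing per user keeps the comparison meaningful even when the two load tests simulate a different number of visitors.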
#2: Disconnected Teams
#2: “Teamwork” between Dev and Ops
SEV1 Problem in Production
Need access to log files
Where are they? Can’t get them
Need to increase log level
Can’t do! Can’t change config files in prod!
#2: Solution: Implement a Custom “On Demand” Remote Logger
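The idea behind an "on demand" logger is to raise verbosity at runtime without touching config files in production. A minimal sketch using Python's standard `logging` module as a stand-in for the Java logging stack in the talk; `set_level_on_demand` is a hypothetical hook, and how an operator triggers it remotely (an admin endpoint, for example) is left out. It only changes the level inside one process; where the extra output ends up is a separate question.

```python
import logging

# Accepted level names mapped to the standard logging constants.
_LEVELS = {
    "CRITICAL": logging.CRITICAL,
    "ERROR": logging.ERROR,
    "WARNING": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}


def set_level_on_demand(logger_name: str, level_name: str) -> int:
    """Hypothetical hook: change one logger's level at runtime instead of
    editing config files in production. Returns the numeric level set."""
    try:
        level = _LEVELS[level_name.upper()]
    except KeyError:
        raise ValueError(f"unknown log level: {level_name!r}") from None
    logging.getLogger(logger_name).setLevel(level)
    return level


if __name__ == "__main__":
    logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
    log = logging.getLogger("orders")
    log.setLevel(logging.WARNING)
    log.debug("not shown: level is WARNING")
    set_level_on_demand("orders", "debug")  # e.g. during a SEV1 investigation
    log.debug("shown: level raised on demand")
```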
#2: Implementation and Rollout
Implemented Custom Logger
Worked well in Load Testing
#2: What happened?
~1 million lock exceptions in 30 minutes
#2: Root Cause: A special WebSphere Setting!
Log Service provides a synchronized log file across ALL JVMs
Metrics: # Log Messages, # Exceptions
Share: Same Server Settings
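Tracking # Log Messages and # Exceptions per test run makes a flood like the lock exceptions above visible as a number. A minimal sketch, again using Python's `logging` as a stand-in; the `CountingHandler` class is hypothetical. It counts records per level and how many carried exception info, so a load test can report them next to response times. It would not reveal the WebSphere setting itself, which is why the talk also asks for the same server settings in test and production.

```python
import logging
from collections import Counter


class CountingHandler(logging.Handler):
    """Hypothetical handler: counts log records per level and the records
    that carry exception info."""

    def __init__(self) -> None:
        super().__init__(level=logging.NOTSET)
        self.by_level: Counter[str] = Counter()
        self.exceptions = 0

    def emit(self, record: logging.LogRecord) -> None:
        self.by_level[record.levelname] += 1
        if record.exc_info:  # set by logger.exception(...) or exc_info=True
            self.exceptions += 1

    @property
    def total(self) -> int:
        return sum(self.by_level.values())


if __name__ == "__main__":
    log = logging.getLogger("checkout")
    log.setLevel(logging.DEBUG)
    log.propagate = False  # keep the demo output out of the root logger
    counter = CountingHandler()
    log.addHandler(counter)
    log.info("order placed")
    try:
        1 / 0
    except ZeroDivisionError:
        log.exception("order failed")
    print(counter.total, dict(counter.by_level), counter.exceptions)
```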
#3: Implementation Flaws
#3: Business Impact requires Action!
#3: Solution: Cache to the RESCUE!!
#3: Implementation and Rollout
Implemented InMemory Cache
Worked well in Load Testing
#3: Result: Out of Memory Crashes!!
Still crashes
Problem fixed! Fixed version deployed
Metrics: Heap Size, # Objects Allocated, # Objects in Cache, Cache Hit Ratio
Test: With realistic Data
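The slides don't say how the out-of-memory problem was fixed. One common way to keep an in-memory cache from growing without limit is to bound it, evict the least recently used entries, and track the metrics listed above (# Objects in Cache, Cache Hit Ratio). A minimal sketch with hypothetical names; for plain function results, Python's `functools.lru_cache(maxsize=...)` offers the same bounding plus `cache_info()` statistics.

```python
from collections import OrderedDict
from typing import Any, Callable


class BoundedCache:
    """Hypothetical LRU cache with a hard entry limit and hit/miss counters."""

    def __init__(self, max_entries: int) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self.max_entries = max_entries
        self._data: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Any, load: Callable[[Any], Any]) -> Any:
        """Return the cached value, or load and cache it, evicting the
        least recently used entry once the limit is exceeded."""
        if key in self._data:
            self.hits += 1
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        self.misses += 1
        value = load(key)
        self._data[key] = value
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the least recently used
        return value

    def __len__(self) -> int:  # "# Objects in Cache"
        return len(self._data)

    def __contains__(self, key: object) -> bool:
        return key in self._data

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Testing it with realistic data matters: with too small a limit the hit ratio collapses, with too large a limit the heap problem returns.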
#4: Push without a Plan
#4: Mobile Landing Page of Super Bowl Ad
434 resources in total on that page: 230 JPEGs, 75 PNGs, 50 GIFs, …
Total size of ~ 20MB
#4: m.store.com redirects to www.store.com
ALL CSS and JS files are redirected to the www domain
This is a lot of time “wasted”, especially on high-latency mobile connections
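To get a feel for the cost: each redirected CSS or JS file needs at least one extra request/response round trip before the real file can be fetched. A back-of-the-envelope sketch, assuming exactly one extra round trip per redirect and redirects spread evenly over a fixed number of parallel connections (6 is an assumed per-host browser limit), and ignoring the DNS lookup and connection setup the switch to the www domain may add; the function name and numbers are hypothetical:

```python
import math


def redirect_overhead_ms(redirects: int, rtt_ms: float,
                         parallel_connections: int = 6) -> float:
    """Hypothetical rough estimate of extra latency caused by redirects:
    one round trip per redirect, with up to `parallel_connections`
    redirects in flight at the same time."""
    if parallel_connections < 1:
        raise ValueError("need at least one connection")
    waves = math.ceil(redirects / parallel_connections)
    return waves * rtt_ms


if __name__ == "__main__":
    # e.g. 28 redirects at an assumed 300 ms mobile round-trip time
    print(redirect_overhead_ms(28, 300))
```

With these assumptions, 28 redirects cost about 1.5 s at a 300 ms round trip but only about 150 ms at 30 ms, which is why the effect hurts most on high-latency mobile networks.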
#4: Critical Pages not Optimized!
Browse, Search and Product Info perform well
Critical pages such as Shopping Cart are very slow …
… because they don’t follow best practices: 87 Requests, 28 Redirects, …
Metrics: Load Time, # Resources (Images, …), # HTTP 3xx, 4xx, 5xx
Dev: Build for Mobile
Test: Test on Mobile
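The page-level metrics above can be computed from any list of requests a browser or proxy recorded. A minimal sketch, assuming each request is available as a `(url, http_status, size_bytes)` tuple (for example after exporting from browser developer tools; the export format and the `page_metrics` name are assumptions) and treating .jpg/.jpeg/.png/.gif as images:

```python
from collections import Counter
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")


def page_metrics(requests):
    """Hypothetical summary of one page load: resource count, total bytes,
    images, and the number of 3xx/4xx/5xx responses."""
    status_classes = Counter()
    total_bytes = 0
    images = 0
    count = 0
    for url, status, size_bytes in requests:
        count += 1
        total_bytes += size_bytes
        status_classes[f"{status // 100}xx"] += 1
        # Ignore query strings like "?v=1" when checking the file extension.
        if urlparse(url).path.lower().endswith(IMAGE_EXTENSIONS):
            images += 1
    return {
        "resources": count,
        "bytes": total_bytes,
        "images": images,
        "3xx": status_classes["3xx"],
        "4xx": status_classes["4xx"],
        "5xx": status_classes["5xx"],
    }
```

Running the same function on desktop and mobile recordings of a page gives directly comparable numbers.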
#5: “Blindly” (Re)use Existing Components
#5: Requirement: We need a report
#5: Using Hibernate results in 4k+ SQL Statements to display 3 items!
Hibernate Executes 4k+ Statements
Individual execution VERY FAST, but the total SUM takes 6s
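4,000+ statements adding up to 6 s means each one averaged roughly 1.5 ms or less, fast enough to look harmless in isolation. The slides don't say why Hibernate issued so many; a frequent cause with ORMs is the "N+1 selects" pattern, where related data is lazily loaded with one query per row. The sketch below illustrates that pattern under this assumption using Python's built-in `sqlite3` rather than Hibernate: it counts executed statements with `Connection.set_trace_callback` and compares per-row loading with a single JOIN. Schema and function names are hypothetical.

```python
import sqlite3


def make_db(n_items: int, details_per_item: int = 3) -> sqlite3.Connection:
    """In-memory database with items and their detail rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE detail (id INTEGER PRIMARY KEY, item_id INTEGER, info TEXT)")
    for i in range(n_items):
        conn.execute("INSERT INTO item (id, name) VALUES (?, ?)", (i, f"item{i}"))
        conn.executemany(
            "INSERT INTO detail (item_id, info) VALUES (?, ?)",
            [(i, f"detail{j}") for j in range(details_per_item)],
        )
    conn.commit()
    return conn


def load_n_plus_one(conn):
    """1 query for the items, then 1 more query per item: N + 1 statements."""
    result = {}
    items = conn.execute("SELECT id, name FROM item ORDER BY id").fetchall()
    for item_id, name in items:
        rows = conn.execute(
            "SELECT info FROM detail WHERE item_id = ? ORDER BY id", (item_id,)
        ).fetchall()
        result[name] = [info for (info,) in rows]
    return result


def load_with_join(conn):
    """The same data with a single JOIN statement."""
    result = {}
    rows = conn.execute(
        "SELECT i.name, d.info FROM item i JOIN detail d ON d.item_id = i.id "
        "ORDER BY i.id, d.id"
    ).fetchall()
    for name, info in rows:
        result.setdefault(name, []).append(info)
    return result


def count_statements(conn, load):
    """Run load(conn) and count the SQL statements SQLite reports executing."""
    executed = []
    conn.set_trace_callback(executed.append)
    try:
        result = load(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(executed)


if __name__ == "__main__":
    conn = make_db(100)
    _, n_plus_one = count_statements(conn, load_n_plus_one)
    _, joined = count_statements(conn, load_with_join)
    print(f"per-row loading: {n_plus_one} statements, JOIN: {joined}")
```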
#5: Requirement: We need a fancy UI
#5: Using Telerik Controls Results in 9s for Data-Binding of UI Controls
#1: Slow Stored Procedure: depending on the request, execution time of this SP varies between 1 and 7.5s
#2: 240! Similar SQL Statements: most of these 240 statements are not prepared and differ only in things like column names
Metrics: # Total SQLs, # SQLs / Web Request, # Same SQLs / Request, Transferred Rows
Test: With realistic Data
Dev: “Learn” Frameworks
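The database metrics above need SQL executions attributed to the web request that caused them, which an APM or tracing tool typically provides. Assuming that data is available as `(request_id, sql_text)` pairs (a hypothetical format), a minimal sketch of # Total SQLs, # SQLs per request and # Same SQLs per request, where "same SQLs" is taken to mean repeated executions of identical statement text. Statements that differ only in column names, like the 240 above, count as different texts here.

```python
from collections import Counter, defaultdict


def sql_metrics(executions):
    """Hypothetical helper. executions: iterable of (request_id, sql_text).

    Returns the total number of SQL executions and, per request, the number
    of executions, distinct statements, and repeated ("same") executions.
    """
    per_request = defaultdict(Counter)
    for request_id, sql_text in executions:
        per_request[request_id][sql_text] += 1

    report = {}
    for request_id, statements in per_request.items():
        executed = sum(statements.values())
        report[request_id] = {
            "sqls": executed,
            "distinct": len(statements),
            "same_sqls": executed - len(statements),
        }
    total = sum(r["sqls"] for r in report.values())
    return total, report
```

Using bind parameters (`?` placeholders) for values keeps statements that differ only in literals textually identical, so they can be prepared once and show up as repeats rather than as many distinct statements.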
$12,000,000
#6: No “Agile” Deployment
Ad on air
Availability dropped to 0%
#6: Load Spike resulted in Unavailability
#6: Alternative: “GoDaddy goes DevOps”
Response time improved 4x
1h before SuperBowl KickOff
1h after Game ended
#6: Behind the Scenes
Metrics: Availability, Page Size, # Objects, # Hosts, # Connections
DevOps: “Feature” Switches
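The DevOps takeaway is "Feature" switches: being able to turn expensive page features off when a load spike is coming, which directly reduces page size and # objects. A minimal sketch, assuming a simple in-process flag store; the `FeatureSwitches` class, the component names and their sizes are all hypothetical and not taken from the GoDaddy case:

```python
from dataclasses import dataclass, field

# Hypothetical page components: (name, size in KB, feature flag or None)
COMPONENTS = [
    ("header", 40, None),
    ("product_list", 120, None),
    ("video_banner", 900, "rich_media"),
    ("social_widgets", 250, "social"),
]


@dataclass
class FeatureSwitches:
    """Hypothetical in-memory feature flags; unknown flags default to off."""
    flags: dict = field(default_factory=dict)

    def enabled(self, name: str) -> bool:
        return self.flags.get(name, False)

    def set(self, name: str, on: bool) -> None:
        self.flags[name] = on


def render_page(switches: FeatureSwitches):
    """Return the included component names and the page size in KB."""
    included = [
        (name, size_kb)
        for name, size_kb, flag in COMPONENTS
        if flag is None or switches.enabled(flag)
    ]
    return [name for name, _ in included], sum(size for _, size in included)


if __name__ == "__main__":
    switches = FeatureSwitches({"rich_media": True, "social": True})
    print(render_page(switches))
    # Before the ad airs: switch off the heavy features.
    switches.set("rich_media", False)
    switches.set("social", False)
    print(render_page(switches))
```

In production the flags would live somewhere operators can change without a deployment; the in-memory dict only illustrates the decision point in the rendering code.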
What have we learned today?
NOT EVERY ARCHITECT
MAKES GOOD DECISIONS
UNDERSTAND THE TECHNOLOGY
WE ARE WORKING WITH
# of Requests / User
# of Log Messages
# of Exceptions
# Objects Allocated
# Objects In Cache
Cache Hit Ratio
# of Images
# of SQLs
# SQLs per Request
Availability
# HTTP 3xx, 4xx
Page Size
A final thought …
How about this idea?
Test Framework Results            | Architectural Data
Build #  Test Case     Status     | # SQL  # Excep  CPU
17       testPurchase  OK         | 12     0        120ms
17       testSearch    OK         | 3      1        68ms
18       testPurchase  FAILED     | 12     5        60ms
18       testSearch    OK         | 3      1        68ms
19       testPurchase  OK         | 75     0        230ms
19       testSearch    OK         | 3      1        68ms
20       testPurchase  OK         | 12     0        120ms
20       testSearch    OK         | 3      1        68ms
We identified a regression
Problem solved
Let’s look behind the scenes
Exceptions are probably the reason for the failed test
Problem fixed, but now we have an architectural regression
Now we have the functional and architectural confidence
How? Performance Focus in Test Automation
Cross Impact of KPIs
Analyzing All Unit / Performance Tests
Analyze Perf Metrics
Identify Regressions
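Combining test results with architectural metrics can be sketched in a few lines: for every test case, compare each build with the previous one and flag failures as well as metrics that grew by more than a tolerance. The run data below is the build 17 to 20 example from the slides; the `BuildRun` type, `find_regressions` and the 20% tolerance are hypothetical choices, and a real setup would pull these numbers from the test framework and an APM tool instead of a hard-coded list.

```python
from collections import defaultdict
from typing import NamedTuple


class BuildRun(NamedTuple):
    build: int
    test: str
    status: str
    sql: int          # number of SQL statements executed
    exceptions: int   # number of exceptions thrown
    cpu_ms: int       # CPU time in milliseconds


RUNS = [
    BuildRun(17, "testPurchase", "OK", 12, 0, 120),
    BuildRun(17, "testSearch", "OK", 3, 1, 68),
    BuildRun(18, "testPurchase", "FAILED", 12, 5, 60),
    BuildRun(18, "testSearch", "OK", 3, 1, 68),
    BuildRun(19, "testPurchase", "OK", 75, 0, 230),
    BuildRun(19, "testSearch", "OK", 3, 1, 68),
    BuildRun(20, "testPurchase", "OK", 12, 0, 120),
    BuildRun(20, "testSearch", "OK", 3, 1, 68),
]

METRICS = ("sql", "exceptions", "cpu_ms")


def find_regressions(runs, tolerance=0.2):
    """Hypothetical check: compare each run with the same test case in the
    previous build and return (build, test, what, before, after) tuples."""
    history = defaultdict(list)
    for run in sorted(runs, key=lambda r: r.build):
        history[run.test].append(run)

    regressions = []
    for test, test_runs in history.items():
        for prev, cur in zip(test_runs, test_runs[1:]):
            if cur.status != "OK":  # functional regression
                regressions.append((cur.build, test, "status", prev.status, cur.status))
            for metric in METRICS:  # architectural regressions
                before, after = getattr(prev, metric), getattr(cur, metric)
                if after > before * (1 + tolerance):
                    regressions.append((cur.build, test, metric, before, after))
    return regressions


if __name__ == "__main__":
    for regression in find_regressions(RUNS):
        print(regression)
```

Here build 18 is flagged for the failed test and the jump in exceptions, and build 19, although green, for the SQL count going from 12 to 75 and CPU from 60ms to 230ms: the architectural regression the slides describe.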
More Info
• My Blog: http://apmblog.compuware.com
• Tweet about it: @grabnerandi
• dynaTrace Enterprise: Full End-to-End Visibility in your Java, .NET, PHP Apps
  – Sign up for a 15-Day Free Trial on http://compuwareapm.com
• dynaTrace AJAX Edition: Browser Diagnostics for IE + FF
  – Download @ http://ajax.dynatrace.com
THANK YOU @grabnerandi