Posted on 30-Aug-2020
Failing Gracefully As A Feature
Lorne Kligerman, Director of Product, Gremlin
@lklig
Be down in 10!
T-Ho 2017
Hey team… bit of a spill but I’m fine.
We Expect Technology To Just Work™
Technical Issues Likely Cost Retailers Billions 12.01.16
Macy’s, Lowe’s hit by Black Friday technical glitches 11.27.17
Retail outages online leave shoppers frustrated on Black Friday 11.23.18
People.com
Black Friday Failures
Wells Fargo accidentally foreclosed hundreds of homeowners 8.7.18
Customers report difficulty accessing Chase Bank mobile and online 2.16.19
Citibank Website down, not working 2.28.19
Investopedia
Breaking Banks
Computer Problems Blamed For Flight Delays 4.1.19
Major US Airlines hit by delays after glitch at vendor 4.1.19
Pilots of doomed Boeing 737 MAX fought the plane’s software and lost 4.4.19
Airline Incidents
Technology is fragile. When it breaks, we shouldn’t notice.
Plan ahead to keep your
users happy
FAILURE
GRACEFUL DEGRADATION
Why Are Failures So Common?
Legacy Systems
Lack of Testing
Failure
UI
End to end
Integration
Unit
With Scale Comes Complexity
What Can We Do About It?
Design For Failure
Designing For Failure
Key User Stories & Features
Edge Cases From Unexpected User
Behaviour
Dependency Failures
Loading Screens Are Not Graceful
Inject Failure
By Breaking Things On Purpose
Inject failure one service at a time.
Maintain critical functionality.
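A minimal sketch of what "one service at a time" could look like in a test harness. The dependency names, the `render_page` function, and the choice of which dependencies count as critical are all hypothetical, chosen only for illustration; a real experiment would target real services rather than in-process stand-ins.

```python
from typing import Callable, Dict

Dependency = Callable[[], str]


def make_dependencies() -> Dict[str, Dependency]:
    # Hypothetical healthy dependencies for an imaginary page.
    return {
        "auth": lambda: "user-42",
        "content": lambda: "article body",
        "recommendations": lambda: "you may also like...",
    }


def failing() -> str:
    # Stand-in for a broken service.
    raise ConnectionError("injected failure")


def render_page(deps: Dict[str, Dependency]) -> Dict[str, str]:
    # Assumption: auth and content are critical, recommendations are not.
    user = deps["auth"]()
    body = deps["content"]()
    try:
        extras = deps["recommendations"]()
    except ConnectionError:
        extras = ""  # non-critical feature falls off quietly
    return {"user": user, "body": body, "extras": extras}


def run_experiments(critical_check: Callable[[Dict[str, str]], bool]) -> Dict[str, bool]:
    """Break exactly one dependency per run and record whether the
    critical functionality survived."""
    results: Dict[str, bool] = {}
    for name in make_dependencies():
        deps = make_dependencies()  # fresh, healthy set for every run
        deps[name] = failing        # inject failure into one service only
        try:
            results[name] = critical_check(render_page(deps))
        except Exception:
            results[name] = False
    return results


if __name__ == "__main__":
    print(run_experiments(lambda page: bool(page["body"])))
```

In this sketch the report separates the dependencies the page can live without from the ones it cannot, which is the information needed to decide where fallbacks are worth building.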
Common Failure Modes
THAT DEGRADE THE USER EXPERIENCE
Errors: HTTP 400, 401, 402, 500, 503
Blackhole
Latency
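The three failure modes listed here can be imitated in-process with a small wrapper. This is an illustrative sketch only: `with_fault`, `InjectedHTTPError`, and the parameter names are hypothetical, and the blackhole case is approximated by waiting out a client-side timeout, since a blackholed request never receives a response.

```python
import time
from typing import Callable


class InjectedHTTPError(Exception):
    """Hypothetical error type carrying the injected status code."""

    def __init__(self, status: int):
        super().__init__(f"injected HTTP {status}")
        self.status = status


def with_fault(
    call: Callable[[], str],
    mode: str,
    *,
    status: int = 503,
    delay_s: float = 0.05,
    timeout_s: float = 0.01,
) -> Callable[[], str]:
    """Wrap `call` so that it misbehaves according to `mode`."""

    def wrapped() -> str:
        if mode == "none":
            return call()
        if mode == "error":
            # The dependency answers, but with an error status.
            raise InjectedHTTPError(status)
        if mode == "blackhole":
            # No answer ever arrives; the caller only sees its own timeout.
            time.sleep(timeout_s)
            raise TimeoutError("no response (injected blackhole)")
        if mode == "latency":
            # The dependency answers correctly, just late.
            time.sleep(delay_s)
            return call()
        raise ValueError(f"unknown fault mode: {mode}")

    return wrapped


if __name__ == "__main__":
    slow = with_fault(lambda: "ok", "latency", delay_s=0.2)
    start = time.monotonic()
    print(slow(), f"after {time.monotonic() - start:.2f}s")
```

Each mode asks a different question of the caller: does it handle an error response, does it time out instead of hanging, and does the experience stay usable when the answer is merely slow.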
Degrade Gracefully
Graceful Degradation
● Provide the best possible experience
● All but the most critical functionality can fall off
● Don’t give up on your users; hold state as long as possible
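One way to read "hold state as long as possible" is to keep the last good response and serve it when the dependency fails. The sketch below assumes that interpretation; `DegradingClient` and its interface are hypothetical and not taken from the talk.

```python
from typing import Callable, Dict, Optional, Tuple


class DegradingClient:
    """Serves fresh data when it can and the last good value when it cannot."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch
        self._last_good: Dict[str, str] = {}

    def get(self, key: str, default: Optional[str] = None) -> Tuple[Optional[str], bool]:
        """Return (value, degraded). `degraded` is True when the value is
        held state or the default rather than a fresh response."""
        try:
            value = self._fetch(key)
        except Exception:
            # Dependency failed: fall back to held state, then to the default.
            return self._last_good.get(key, default), True
        self._last_good[key] = value
        return value, False


if __name__ == "__main__":
    healthy = {"up": True}

    def fetch(key: str) -> str:
        if not healthy["up"]:
            raise ConnectionError("dependency down")
        return f"fresh:{key}"

    client = DegradingClient(fetch)
    print(client.get("profile"))   # fresh value
    healthy["up"] = False
    print(client.get("profile"))   # held state, flagged as degraded
```

Returning the `degraded` flag alongside the value lets the UI decide whether to hide a feature, show slightly stale data, or tell the user something is temporarily unavailable, instead of showing an error or an endless loading screen.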
When one dependency fails, users are often affected
Storage
Auth
User Data
Content
Cache
Feature 1
Feature 2
Implemented As Designed
Added Latency
Blocked Video Link
Blocked jQuery Request
Delight Your Users
Graceful Degradation Done Right
Positive Business Impact
Product Launch
Delight users with new features
Success Metrics
Quantitative goals of the launch
Product Landing
Were the goals achieved? Why or why not? What got in the way?
Maintain release velocity
Deliver a positive user experience
Engineers spend less time in war rooms
Plan Experiments Early
RELIABILITY THROUGH CHAOS ENGINEERING
Design for Failure
Identify the most critical end-user functionality.
Inject Failure
Inject failure into your system to be sure your user experience isn’t impacted.
Degrade Gracefully
Plan for non-critical functionality not to get in the way.
Delight Your Users
Your product metrics will show behaviour, no matter the condition.
Graceful Degradation As a Feature
USE LORNE FOR 20% OFF
gremlin.com/lorne
Q&A