Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Flight Checks: Quality Assurance for Releases that Prevent Disasters from Escaping into the Wild Brie Hoblin @bhoblin QA Engineer Sage Logik, LLC

Transcript of Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Page 1: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Flight Checks:

Quality Assurance for Releases that Prevent Disasters from Escaping into the Wild

Brie Hoblin
@bhoblin
QA Engineer
Sage Logik, LLC

Page 2: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Imagine…

You work at a start-up, and a client is breathing down your neck… (or your boss’s neck)

Page 3: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Saying “this next release is absolutely the most important release ever”

Page 4: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

“…And by the way can you just add these 3 little things by Thursday?” (It is Tuesday.)

Page 5: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

…And then your project manager says “SURE!”

Page 6: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

…And then all the developers’ heads explode.

Page 7: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

And as the QA Engineer, you say

Page 8: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Maybe that’s a familiar situation…

Page 9: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Imagine…

Or maybe you work at a big company…

Page 10: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

And there are some really big new features that have been in testing for close to 2 weeks…

Page 11: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

And on the last day of testing you find there’s a really big bug that impacts crucial functionality…

Page 12: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Now what?

Page 13: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Today we’re going to talk about 2 things:

Page 14: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Making Deployment Decisions…

1. How to make decisions (or at least communicate recommendations to the people who do make them) about whether to release with bugs or delay, and whether to roll back or hotfix in production when bugs do make it out into the wild

Release / delay?

Rollback / hotfix?

Page 15: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Disaster Avoidance/Prevention

2. How to reduce the likelihood that those high-pressure moments will happen or, in other words,

How to prevent disasters

Page 16: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

But first let’s back up for a minute

Page 17: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

How do we know a disaster when we see one?

Page 18: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

What exactly constitutes a ‘disaster’?

Page 19: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

First let’s look at some well-known disasters recognized by history:

Page 20: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Mars Climate Orbiter

Page 21: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

• Designed to gather data on Mars’ climate & atmosphere

• & to be the communications relay for Mars Polar Lander

Page 22: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

But…

It approached the planet at the wrong angle…

Page 23: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

…And then it was lost, most likely breaking up in the atmosphere.

Page 24: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Why?

Page 25: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Because a piece of ground-based computer software produced output in pound-force seconds (lbf·s) instead of the SI unit, newton-seconds (N·s), specified in the contract between NASA and Lockheed.

Page 26: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Wrong units!!!
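To make the size of that mix-up concrete, here is a minimal sketch (not the actual flight or ground software). It assumes a hypothetical `Impulse` type that always stores values in newton-seconds and forces the caller to say which unit a raw number is in. The conversion factor is the standard definition of the pound-force; every name in the block is illustrative.

```python
from dataclasses import dataclass

# 1 lbf = 0.45359237 kg * 9.80665 m/s^2 = 4.4482216152605 N (exact by definition)
LBF_TO_NEWTON = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical value type: impulse is always stored in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Convert at the boundary so the rest of the code only sees SI units.
        return cls(value * LBF_TO_NEWTON)

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)


raw_value = 10.0  # a number produced in lbf*s by one team
correct = Impulse.from_pound_force_seconds(raw_value)
misread = Impulse.from_newton_seconds(raw_value)  # consumer assumes N*s

# How far off the consumer is when the unit is silently assumed
error_factor = correct.newton_seconds / misread.newton_seconds
print(f"Misreading lbf*s as N*s understates impulse by a factor of {error_factor:.2f}")
```

An integration test that passes a known value across the boundary between the two pieces of software is the kind of check that would surface a factor this large.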

Page 27: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Or what about the Prius software bug in 2014?

Page 28: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

• Various warning lights would light up

• Car would enter ‘safe mode’

• Some cars stopped suddenly while being driven

Page 29: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Why?

Page 30: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Because a few software settings caused higher thermal stress in certain transistors, causing them to become damaged or deformed.

Page 31: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Spaghetti Code!

Lack of integration testing!

Page 32: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

“But we work on websites not cars or spacecraft!”

Page 33: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Amazon 1p price glitch:

Page 34: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Why?

Page 35: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

• Caused by a glitch in 3rd party software provided by RepricerExpress

• Repriced thousands of products, from mattresses to PlayStation 4s, to just 1p

• Small retailers lost tens of thousands of dollars overnight and faced bankruptcy

• Neither Amazon nor RepricerExpress offered any compensation to sellers

Page 36: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

3rd party software fail!

Page 37: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Whenever we have a client paying us to make them software, there’s important stuff on the line.

Page 38: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

So what made those bugs disasters?

Page 39: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

A disaster is (for the purposes of this presentation) a software bug that harms:

o The client’s bottom line

o The users’ bottom line

o The client’s faith in you

o The users’ trust in the website / app

o Users’ ability to use the website / app

OR

o Your reputation

To an untenable degree.

Page 40: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Caveats…

Or causes maimings or deaths (Therac-25; Toyota Camry, Barbara Schwarz)

…Otherwise it probably has to affect a significant number of users (if 2% of users lose faith in your website, that may not be a ‘disaster’)

Page 41: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Additionally…

• How will the bug impact internal resources as you make reparations / fixes?

• Can the user recover by taking a reasonable action?

• How captive is your audience?

• Have you exposed sensitive data?

Page 42: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Can you recognize a disaster?

Page 43: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Scenario 1: You work on a website that sells outdoor gear. A few hours after your latest release, you discover a JavaScript error that decreases the quantity of the last item in the cart when the user scrolls down the page with the mouse to click “Purchase.” Orders end up with incorrect totals or missing items. The following confirmation page does display the incorrect total, but the user may not notice it.

Page 44: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Disaster?

Maybe

Page 45: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Impacts users’ trust in the website to an untenable degree… (“I didn’t get my medium blue softshell jacket before I left for vacation!”)

Unless the company makes quick reparations.

So NOT a disaster unless handled poorly by the company,

OR if the cost is too great in terms of internal resources needed to make reparations, resulting in damage to the company’s bottom line. Was the bug released during a time of high traffic? Are there a lot of orders to fix?

Page 46: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Scenario 2: You work on a website similar to jumponit and as you navigate from the homepage to a specific deal and then back to the homepage, the website loses track of your geolocation and changes your ‘current location’ to a random city. There is no way for the user to reselect their location. This bug is not discovered until after being released and is reported by a user in a large metropolitan area almost 24 hours after the release.

Page 47: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Disaster

Page 48: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

• If verified, this impacts a large group of users, maybe all users

• There is no way for users to recover

• Website is completely unusable to anyone who doesn’t happen to be randomly traveling to wherever the website decided they were located

• Impacts users’ trust in the website, BUT users will probably come back because there are good deals

• More important is the impact on the bottom line from lost sales during the period the bug was live

Page 49: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Scenario 3: The client calls and screams at your project manager because they can’t log in as a full admin and view important financial reports. They’ve been trying to reach your company for hours because there is a big finance review meeting in 10 minutes…

Page 50: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Disaster (or at least pretty close to it)

Page 51: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

At that point you’ve probably significantly dinged your client’s trust in you.

Page 52: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

When things look like disasters but aren’t…

Page 53: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Maybe not ideal, but…

• Client reports terrible bug while smoke testing a new release…turns out they haven’t cleared their cache

• Company admin reports terrible bug while impersonating user…that does not affect actual users

• Homepage of website initially loads fine, but then goes blank and reports no results found for your area after about 20 seconds…but it’s ok because most users click on something and go to another page in the first 5 seconds

Page 54: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

When things don’t look like disasters but are…

Page 55: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Looks fine, really isn’t!

• Lack of integration testing, so everyone knows it’s fine within their part of the app/website but no one has tested the whole thing

• Third party software changes that go unnoticed

• Subtle calculation errors when checking out, especially in fees, taxes, or percentages paid out to vendors

• Works in your neck of the woods but nowhere else (geolocation issues specific to other geolocations, time zone issues specific to other time zones)
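As one illustration of how a subtle calculation error can hide in fees or vendor payouts, here is a minimal sketch. It assumes a hypothetical `vendor_payout` function and a made-up 7.5% fee; the point is only that binary floating point cannot represent most decimal fractions exactly, while Python's standard `decimal` module can, with an explicit rounding rule.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def vendor_payout(order_total: Decimal, fee_rate: Decimal) -> Decimal:
    """Hypothetical payout: order total minus a platform fee rounded to the cent."""
    fee = (order_total * fee_rate).quantize(CENT, rounding=ROUND_HALF_UP)
    return order_total - fee


# Floats drift: 0.1 + 0.2 is not exactly 0.3 in binary floating point.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.10") + Decimal("0.20")

payout = vendor_payout(Decimal("19.99"), Decimal("0.075"))

print(float_sum == 0.3)                 # False
print(decimal_sum == Decimal("0.30"))   # True
print(payout)                           # 18.49
```

A test suite that checks totals against hand-calculated values for a few awkward prices is a cheap way to catch this class of bug before release.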

Page 56: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Looks fine, really isn’t!

• Everyone tests in Chrome but a significant % of users have an issue in another browser

• Everyone tests web and completely forgets mobile

• Subtle calculation errors leading to falsely positive results in business reports

Page 57: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Let’s carry this all one step further…

Page 58: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Deploy or Delay?

• When possible do a little of both. Three hours of testing is better than none.

• Start-ups are a culture of higher risk, so be prepared to take more risks and test less than you would in a more established company. Lean towards deploying.

• Really focus whatever time for testing you have on core functionality and on the devices used the most to access your website/app

• Devote some sliver of time to testing admin functionality

• If core functionality is compromised, DELAY

• Consider the quality of what is currently out there. If the latest release is an improvement, release it even with the bugs

Page 59: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Deploy or Delay?

Scenario 1: You’re testing an app that is map-based and it doesn’t load smoothly. You can’t zoom in or out effectively, and it takes forever for parts of the map to load. You feel frustrated trying to use the app, so the users will too. It’s slow and stutters on both iPhones and Androids. You can find the information you’re looking for, it just takes a long time and a lot of patience. The current version of the app crashes every 30 seconds and you cannot accomplish any basic tasks.

Page 60: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Deploy or Delay?

Scenario 2: The website you’re testing allows users to barter for services with each other, review each other publicly, and post items for sale. It is currently very stable and usable but is only available in your local city. This next release is to expand the website into 3 additional cities. You discover in the eleventh hour of testing that new users cannot successfully sign up because something broke in the most recent round of fixes. Additionally, existing users can no longer review each other.

Page 61: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Rollback?

• Have a rollback plan in place. Decide ahead of time what will trigger a rollback.

• Is core functionality being impacted?

• Will a fix (either coding it or testing it) take a significant amount of time?

• How easy is it to roll back? Was everything snapshotted at the same time? Will info be lost in the meantime?

• How recent is your database backup?

• Are there database migrations that are either difficult or require more time to roll back?

• How long will it take to roll back vs. hotfixing?

• The risk of rollback failing is equal to the risk of failure to deploy
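One way to "decide ahead of time what will trigger a rollback" is to write the triggers down as a small, reviewable rule. The sketch below is only an illustration of that idea, not a rule given in this talk: the `ReleaseIncident` fields, the `recommend` function, and the specific conditions are all assumptions that a team would replace with its own agreed triggers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseIncident:
    """Facts gathered after a bad release (all field names are hypothetical)."""
    core_functionality_broken: bool
    estimated_fix_hours: float
    estimated_rollback_hours: float
    backup_is_usable: bool
    has_hard_to_reverse_migrations: bool


def recommend(incident: ReleaseIncident) -> str:
    """Return 'rollback' or 'hotfix' using triggers agreed before the release."""
    # Assumed rule: a rollback is only safe with a good backup and easy migrations.
    rollback_is_safe = (
        incident.backup_is_usable and not incident.has_hard_to_reverse_migrations
    )
    rollback_is_faster = (
        incident.estimated_rollback_hours < incident.estimated_fix_hours
    )
    if incident.core_functionality_broken and rollback_is_safe and rollback_is_faster:
        return "rollback"
    return "hotfix"


broken_checkout = ReleaseIncident(True, 8.0, 1.0, True, False)
bad_backup = ReleaseIncident(True, 48.0, 1.0, False, False)
print(recommend(broken_checkout))  # rollback
print(recommend(bad_backup))       # hotfix
```

The value is less in the code than in the conversation it forces: the team has to agree on the inputs and the thresholds while nobody is under pressure.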

Page 62: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Scenario 1: After the latest release you discover during smoke testing that you cannot successfully purchase anything on the website. The lead dev determines it will take them a full day to fix things. You know the team has been able to snapshot everything for a stable save point across different components of the website.

Page 63: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Scenario 2: After the latest software release, you discover that 3 important variables that are used to generate financial reports concerning overall revenue are being calculated at inflated values. All 3 are consistently inflated by 12.5%. The developers need a few days to fix things. Rolling back would be difficult because it turns out there is a problem with your database backup.

Page 64: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Hotfix?

• Is it a small change?

• Does it involve any dependencies?

• Is it impossible to roll back?

• Is core functionality being compromised?

Page 65: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Scenario 1: You’re smoke testing a release and notice that the company name is spelled wrong on the homepage, in the header.

Page 66: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Scenario 2: Your company just did a major release that adds a new feature to your website where users can upload photos and automatically create a slideshow for their listings in your directory. Unfortunately, none of the existing photos on the website are loading, and users cannot upload photos either. And even though everything worked in your test environment, it turns out that users cannot create a new listing in the directory either.

Page 67: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Preventing Disasters

Page 68: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Recipe for Disaster:

• Culture of ego, devs who assume they can pull it off rather than asking thorough questions

• Lack of communication about requirements

• Project manager who always says yes to the client

• Culture that does not adequately weigh the risks of moving too fast against the value of testing and doing things right

Page 69: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Recipe for Disaster:

• Testing has been outsourced overseas

• Team has no diversity

• Lots of third party dependencies

• Lack of time for testing

• When devs are expected to do most of the testing

• When devs are not aware of widely accepted design standards

• Party culture that is cavalier about business needs of client

Page 70: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Preventing Disasters:

• When in doubt, ask questions!

• Involve QA in the design process

• Foster interdependence vs competition

• Be thorough about requirements gathering and then firm with client about not changing them for that release

• Take the potential for disaster seriously. You are not immune.

• If testing is going to be outsourced, step up communication. Preferably, keep testing in-house.

Page 71: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Preventing Disasters:

• Value diversity within your team. If your team lacks diversity, seek other individuals to offer quick feedback.

• Be diligent about integration testing around third party software. (And be clear about how they will handle bugs in their software if it leads to loss of revenue.)

• QA time should be roughly equal to dev time. Really.

• Even the best devs mostly test that things work. QA will actually try to break things. Don’t leave all the testing to devs.

• Encourage devs to be aware of existing design standards.

• Work hard, then play hard. The client’s needs are the main focus.

Page 72: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Thank you!

Brie Hoblin
@bhoblin
[email protected]
Sage Logik, LLC

Page 73: Flight checks -QA for Releases that Prevent Disasters from Escaping into the Wild

Resources

http://www.computerworld.com/article/2515483/enterprise-applications/epic-failures--11-infamous-software-bugs.html

http://www.reuters.com/article/us-toyota-recall-idUSBREA1B1B920140212

http://www.safetyresearch.net/blog/articles/toyota-unintended-acceleration-and-big-bowl-%E2%80%9Cspaghetti%E2%80%9D-code

http://www.computerworlduk.com/it-vendors/amazon-1p-price-glitch-who-should-pay-up-3591160/