NICTA Copyright 2012 From imagination to impact
The Quality Attribute of Upgradability
Len Bass with
Hiroshi Wada, Ingo Weber, Liming Zhu, Ross Jeffery
About NICTA
National ICT Australia
• Federal and state funded research company established in 2002
• Largest ICT research resource in Australia
• National impact is an important success metric
• ~700 staff/students working in 5 labs across major capital cities
• 7 university partners
• Providing R&D services and knowledge transfer to the Australian (and global) ICT industry
NICTA technology is in over 1 billion mobile phones
Consider the following sequence.
• You have prepared an upgrade to an existing large enterprise system.
  – You have coded it.
  – You have tested it.
  – It is ready!
• Alternatively, the IT department (or you) obtains a package from a third party (a vendor or an open-source project) that has already been coded and tested.
• What happens then?
  – About 10% of the time, the upgrade will fail.
This is the upgradability problem
• How do we make upgrading a system less problematic?
• Talk outline
  – Characteristics of the upgrade problem
  – FMEA analysis
    • Possible causes of failure
    • Failure prevention, detection, and recovery
  – Relation to existing product and process quality work
Upgrades to enterprise systems are very common
Upgrade frequency of some common systems:

Application            Average release interval
Facebook (platform)    < 7 days
Google Docs            < 50 days
MediaWiki              21 days
Joomla                 30 days

This frequency suggests that getting upgrades right is important.
Unfortunately, Upgrades Fail Often
• 4.6 to 10 component failures per month were observed in three large-scale Internet services, mostly during regular maintenance.
• A survey of system administrators reports average and maximum upgrade failure rates of 8.6% and 50%, respectively.
• Some claim that user-visible failures caused by upgrades outweigh user-visible failures caused by software errors.
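To see why these rates matter at the release frequencies listed earlier, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every upgrade fails independently with the survey's average rate of 8.6%; real upgrade failures are unlikely to be independent, so treat the numbers as rough.

```python
# Back-of-the-envelope sketch. Assumption: each upgrade fails independently
# with the same probability, which real deployments do not guarantee.


def expected_failures(upgrades_per_year: int, failure_rate: float) -> float:
    """Expected number of failed upgrades over a year."""
    return upgrades_per_year * failure_rate


def prob_at_least_one_failure(upgrades_per_year: int, failure_rate: float) -> float:
    """Probability that at least one upgrade in the year fails."""
    return 1.0 - (1.0 - failure_rate) ** upgrades_per_year


if __name__ == "__main__":
    rate = 0.086  # average failure rate reported by the survey of administrators
    # 52 releases/year approximates a weekly cadence (Facebook releases in < 7 days);
    # 12 releases/year approximates Joomla's 30-day interval.
    for label, releases in [("weekly", 52), ("monthly", 12)]:
        print(
            f"{label}: {expected_failures(releases, rate):.1f} expected failed upgrades, "
            f"P(at least one) = {prob_at_least_one_failure(releases, rate):.2f}"
        )
```

Under these assumptions a weekly cadence gives roughly 4.5 failed upgrades a year and about a 99% chance of at least one.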
Why is this?
• Installation is complicated.
  – Installation guides for SAS 9.3 Intelligence, IBM i, and Oracle 11g for Linux are ~250 pages each.
  – The Apache description of addresses and ports (one of 16 such descriptions) covers the following elements:
    • Choosing and specifying the ports the server listens on
    • IPv4 and IPv6
    • Protocols
    • Virtual hosts
  – The number of configuration options that must be set can be large:
    • Hadoop has 206 options
    • HBase has 64
  – Many dependencies are not visible until execution.
This Provides a Research Agenda
• Indeed, the surprise is not that upgrades fail 8.6% of the time but that they succeed 91.4% of the time.
• This is a rich area for research.
What kind of problem is this – product?
• ISO 25010 provides
  – a quality in use model composed of five characteristics (some of which are further subdivided into subcharacteristics) that relate to the outcome of interaction when a product is used in a particular context of use.
  – That is, is upgradability a quality of the system being upgraded?
• The answer is yes.
What kind of problem is this – process?
• ITIL (Information Technology Infrastructure Library) Change Management aims to ensure that standardised methods and procedures are used for efficient handling of all changes.
• SPICE (ISO 15504) process assessment provides the means of characterizing the current practice within an organizational unit in terms of the capability of the selected processes.
• Is upgradability a quality of the process used to manage information technology?
• The answer is yes.
Upgradability is a hybrid quality problem
• A hybrid quality problem is one in which improvement involves both product and process, and in which the product has process awareness.
• Many product-centered conferences
  – Dependability
  – Security
  – …
• Some process-centered conferences
  – Software Process Improvement
  – SPICE
  – SPEG
  – …
Hybrid quality improvement is not well served by the academic community
• Hybrid quality improvement, as we shall see, involves close interaction between the product, the process, and the tools that support the process.
• Venues that should emphasize this interaction include
  – Profes (Product Focused Software Development and Process Improvement)
  – ASQ (Conference on Quality and Improvement)
• Yet an examination of the CFPs and proceedings of these conferences shows that process activities and product characteristics are treated separately.
• We will present the results of an FMEA (Failure Mode and Effects Analysis) style analysis for upgradability and then return to the hybrid quality issue.
FMEA
• Failure Mode and Effects Analysis is an inductive technique for analyzing potential failure modes.
• FMEA involves describing
  – potential failure modes
  – the severity and likelihood of these failures.
• We will focus on the first portion, generating the potential failure modes together with potential prevention, detection, and recovery for these failures.
• That is, we are performing an FMEA-style analysis, not an FMEA per se.
Scenario for Upgradability
• We are concerned with the following scenario:
  – Version N+1 of an enterprise system is available for deployment.
    • Version N+1 can be deployed by developers.
    • Version N+1 can be deployed by the Information Technology department (the release manager, if there is one).
  – Version N+1 has been completely coded and tested by its developers.
• Measures can include:
  – Downtime
  – Resources (hardware or personnel) required to perform the upgrade
  – Number of failed attempts to install the upgrade
Fundamental goals during upgrade
• The literature identifies four fundamental goals while an upgrade is occurring:
  – Efficiently manage resources.
  – Completely and correctly specify configurations.
  – Manage multiple versions to avoid problems with version mismatch.
  – Maintain consistency of persistent data.
• Failures are caused by the violation of one of these fundamental goals.
  – Our FMEA analysis looks at potential causes of violations of these goals.
Activities during an upgrade of a system
• Make the upgrade available.
• Prepare the environment: ensure that sufficient resources are available for installation and that the software the upgrade assumes is present.
• Configuration
• Deployment
• Activation
Organization of the next portion of the presentation
• For each activity:
  – Potential fault (a fault is a failure in waiting)
  – Prevention of the fault
  – Detection of the fault
  – Correction of the fault
• Research opportunity:
  – Blank cell
  – Cell with only partial coverage
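This organization maps naturally onto a small data structure. The sketch below is hypothetical (the class and function names are ours, not from the talk): each row records a fault for an activity, and any prevention, detection, or recovery column with no entries is flagged as a blank cell, i.e. a research opportunity. Cells with only partial coverage need human judgement and are not detected automatically. The column placement of the sample entries is likewise an illustrative assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FaultRow:
    """One row of an FMEA-style table for an upgrade activity (hypothetical structure)."""

    activity: str
    fault: str
    prevention: List[str] = field(default_factory=list)
    detection: List[str] = field(default_factory=list)
    recovery: List[str] = field(default_factory=list)

    def blank_cells(self) -> List[str]:
        """Names of the columns that list no technique at all."""
        columns = {
            "prevention": self.prevention,
            "detection": self.detection,
            "recovery": self.recovery,
        }
        return [name for name, entries in columns.items() if not entries]


def research_opportunities(table: List[FaultRow]) -> List[Tuple[str, str, str]]:
    """(activity, fault, column) for every blank cell in the table."""
    return [(row.activity, row.fault, col) for row in table for col in row.blank_cells()]


if __name__ == "__main__":
    # Entries come from the talk's tables; which column each sits in is assumed here.
    table = [
        FaultRow("Deployment", "Operator error", recovery=["Undo mechanism"]),
        FaultRow(
            "Activation",
            "Discovered hidden dependency",
            detection=["Monitoring"],
            recovery=["Recovery block"],
        ),
    ]
    for activity, fault, column in research_opportunities(table):
        print(f"{activity} / {fault}: no {column} technique")
```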
Make Upgrade available
Fault possibility, with its prevention, detection, and recovery entries:
• Element omitted or included incorrectly in the installed software
  – Manifest
  – Bill of lading
  – Recreate distribution
• System corrupted during movement
  – Hash code, checksum
  – Retransmit
• Source of distribution is an untrusted site
  – Digital signature
• Forgotten or misplaced credentials
  – Separate secret
  – Independent channel for new credentials
• Credential verifier unavailable
  – Codify acceptable credentials in distribution
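For the manifest and corruption-in-transit rows, detection can be sketched with Python's standard `hashlib`. This is a minimal sketch under assumptions: the manifest is a hypothetical mapping from relative path to SHA-256 digest shipped alongside the distribution, and `build_manifest` / `verify_distribution` are illustrative names. A manifest also exposes omitted or unexpected elements; a hash alone does not establish that the source is trusted, which is what the digital-signature entry addresses.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """SHA-256 hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Hypothetical manifest: relative POSIX path -> SHA-256 digest, for every file under root."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_distribution(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Compare an unpacked distribution with its manifest; an empty list means it matches."""
    problems = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")  # element omitted
        elif sha256_of(path) != expected:
            problems.append(f"corrupted: {rel}")  # changed during movement
    present = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    for rel in sorted(present - set(manifest)):
        problems.append(f"unexpected: {rel}")  # element included incorrectly
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "app.cfg").write_text("port=8080\n")
        manifest = build_manifest(root)
        print(verify_distribution(root, manifest))  # []
        (root / "app.cfg").write_text("port=8081\n")
        print(verify_distribution(root, manifest))  # ['corrupted: app.cfg']
```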
Prepare environment
Fault possibility, with its prevention, detection, and recovery entries:
• Incorrect versions of support libraries
  – Include version number in specification
  – Utilize services to announce incompatibilities
  – Encode hash of APIs
• Multiple versions of support libraries simultaneously required
  – Include version number in name
  – Libraries expose version numbers
  – Linkers version aware
• Insufficient resources
  – Rolling upgrade
• Schema modification on database
  – Convert data to new schema prior to upgrade
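The "include version number in specification" entry can be illustrated with a small requirements check. A minimal sketch, assuming plain dotted numeric versions and a hypothetical requirements format of library -> (minimum inclusive, maximum exclusive); library names such as `libfoo` are made up. It also flags the second row: when two components need non-overlapping ranges of the same library, both versions would have to be present simultaneously. Production code would more likely rely on a full version parser such as `packaging.version`.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, ...]
Requirement = Tuple[str, str]  # (minimum inclusive, maximum exclusive)


def parse_version(text: str) -> Version:
    """Parse a dotted numeric version; trailing zeros are dropped so "1.2.0" == "1.2"."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def check_requirements(installed: Dict[str, str], required: Dict[str, Requirement]) -> List[str]:
    """Report libraries that are absent or outside the required version range."""
    problems = []
    for lib, (low, high) in required.items():
        if lib not in installed:
            problems.append(f"{lib}: not installed")
            continue
        version = parse_version(installed[lib])
        if not parse_version(low) <= version < parse_version(high):
            problems.append(f"{lib}: installed {installed[lib]}, need >={low},<{high}")
    return problems


def find_conflicts(components: Dict[str, Dict[str, Requirement]]) -> List[str]:
    """Libraries whose ranges, intersected across all components, leave no usable version."""
    bounds: Dict[str, Tuple[Version, Optional[Version]]] = {}
    for reqs in components.values():
        for lib, (low, high) in reqs.items():
            cur_low, cur_high = bounds.get(lib, ((0,), None))
            new_high = parse_version(high) if cur_high is None else min(cur_high, parse_version(high))
            bounds[lib] = (max(cur_low, parse_version(low)), new_high)
    return sorted(lib for lib, (low, high) in bounds.items() if not low < high)
```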
Configuration
Fault possibility, with its prevention, detection, and recovery entries:
• Missing parameter
  – Parameter database
  – Parameter built into tool
  – Static analysis of code
• Incorrectly specified parameter
  – Abstract specification
  – Check syntax
  – Validate against a specification
• Inconsistent parameters
  – Constraint checker
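Several configuration remedies (check syntax, validate against a specification, constraint checker) can be combined into one validation pass. The sketch below is illustrative only: the parameter names, the `SPEC` table, and the single cross-parameter constraint are hypothetical and are not taken from Hadoop, HBase, or any real product.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical specification: parameter -> (type used to parse it, per-value check).
SPEC: Dict[str, Tuple[type, Callable[[Any], bool]]] = {
    "server.port": (int, lambda v: 1 <= v <= 65535),
    "cache.size_mb": (int, lambda v: v > 0),
    "cache.max_entry_mb": (int, lambda v: v > 0),
    "log.level": (str, lambda v: v in {"DEBUG", "INFO", "WARN", "ERROR"}),
}

# Hypothetical cross-parameter constraints: (description, predicate over parsed values).
CONSTRAINTS: List[Tuple[str, Callable[[Dict[str, Any]], bool]]] = [
    ("cache.max_entry_mb must not exceed cache.size_mb",
     lambda c: c["cache.max_entry_mb"] <= c["cache.size_mb"]),
]


def validate(raw: Dict[str, str]) -> List[str]:
    """Return configuration problems; an empty list means the configuration passed."""
    problems: List[str] = []
    parsed: Dict[str, Any] = {}
    for name, (typ, is_valid) in SPEC.items():
        if name not in raw:
            problems.append(f"missing parameter: {name}")
            continue
        try:
            value = typ(raw[name])  # syntax check
        except ValueError:
            problems.append(f"syntax error: {name}={raw[name]!r} is not {typ.__name__}")
            continue
        if not is_valid(value):  # validate against the specification
            problems.append(f"invalid value: {name}={value!r}")
            continue
        parsed[name] = value
    for name in sorted(raw.keys() - SPEC.keys()):  # often a misspelled parameter
        problems.append(f"unknown parameter: {name}")
    # Constraints are only meaningful once every individual value is valid.
    if not problems:
        for description, holds in CONSTRAINTS:
            if not holds(parsed):
                problems.append(f"inconsistent parameters: {description}")
    return problems
```

Reporting unknown names alongside missing ones is a design choice: a misspelled key usually shows up as both.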
Deployment
Fault possibility, with its prevention, detection, and recovery entries:
• Insufficient resources
  – Pre-allocate during preparation
  – Rolling upgrade
• Inconsistent hardware
  – Verify during preparation
• Operator error
  – Undo mechanism
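Rolling upgrade, listed against insufficient resources, upgrades a few servers at a time so the rest keep serving. A minimal sketch, assuming hypothetical `upgrade` and `healthy` callbacks supplied by the deployment tool; on the first unhealthy batch it reverts everything upgraded so far. While it runs, versions N and N+1 coexist, which is exactly the situation behind the mixed version race condition discussed later.

```python
from typing import Callable, Dict, List


def rolling_upgrade(
    servers: Dict[str, str],
    new_version: str,
    batch_size: int,
    upgrade: Callable[[str, str], None],
    healthy: Callable[[str], bool],
) -> bool:
    """Upgrade servers batch by batch, updating `servers` (name -> version) in place.

    At most `batch_size` servers are being changed at any moment, so the others
    keep serving. If any server in a batch is unhealthy afterwards, every server
    upgraded so far is reverted to its previous version and False is returned.
    """
    previous = dict(servers)
    names = sorted(servers)
    upgraded: List[str] = []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        for name in batch:
            upgrade(name, new_version)
            servers[name] = new_version
            upgraded.append(name)
        if not all(healthy(name) for name in batch):
            for name in reversed(upgraded):
                upgrade(name, previous[name])
                servers[name] = previous[name]
            return False
    return True


if __name__ == "__main__":
    fleet = {"web1": "N", "web2": "N", "web3": "N", "web4": "N"}
    ok = rolling_upgrade(fleet, "N+1", 2,
                         upgrade=lambda name, version: print(f"{name} -> {version}"),
                         healthy=lambda name: True)
    print(ok, fleet)
```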
Activation
Fault possibility, with its prevention, detection, and recovery entries:
• Discovered hidden dependency
  – Monitoring
  – Recovery block
• Multiple simultaneous versions
  – Separation
  – Dynamic Software Update
  – Automatic translation of data when old schema is used
  – Version-aware code and data
  – Version-aware load balancer
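A recovery block, listed against hidden dependencies discovered at activation, runs a primary routine and falls back to alternates when the primary raises or fails an acceptance test. This is a simplified sketch that assumes side-effect-free alternates (the classic technique also checkpoints and restores state before each alternate, which is omitted here); the module name `hypothetical_new_renderer` stands for a dependency that turns out to be missing.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


class RecoveryBlockFailed(Exception):
    """Raised when every alternate fails; carries the individual errors."""


def recovery_block(alternates: Sequence[Callable[[], T]], acceptable: Callable[[T], bool]) -> T:
    """Try each alternate in order and return the first result that passes `acceptable`."""
    errors: List[Exception] = []
    for alternate in alternates:
        try:
            result = alternate()
        except Exception as exc:  # e.g. ImportError from a missing hidden dependency
            errors.append(exc)
            continue
        if acceptable(result):
            return result
        errors.append(ValueError(f"{alternate!r} failed the acceptance test"))
    raise RecoveryBlockFailed(errors)


def render_report_new() -> str:
    # Version N+1 code path with a dependency nobody declared (hypothetical module).
    import hypothetical_new_renderer  # raises ModuleNotFoundError if absent

    return hypothetical_new_renderer.render()


def render_report_old() -> str:
    # Version N code path kept as the alternate.
    return "report rendered by the old code path"


if __name__ == "__main__":
    print(recovery_block([render_report_new, render_report_old],
                         acceptable=lambda r: isinstance(r, str) and r != ""))
```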
Our activities in this space so far (green cells)
• Mixed version race condition solution
• Operator undo
What is the “mixed version race condition”?
• A common practice when pushing an upgrade to a large number of servers is to upgrade one (or several) servers at a time.
• This means that version N+1 (the new version) will be available on some servers while version N (the old version) is still running on others.
• Suppose version N+1 has functionality not available in version N.
Now consider the following sequence
1. A client (browser) issues a request that the load balancer routes to an instance of version N+1.
2. Version N+1 sends back to the client JavaScript that assumes the new functionality.
3. The client sends an AJAX request that uses the new functionality, and the load balancer routes it to an instance of version N.
4. An error occurs because version N does not have the new functionality.
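The four-step sequence can be reproduced with a toy simulation. Everything here is hypothetical: servers are in-process objects, the "JavaScript" is a string naming the feature the page will call back, and the load balancer is a plain round-robin that ignores versions.

```python
from itertools import cycle
from typing import List, Tuple


class Server:
    def __init__(self, name: str, version: str, features: List[str]) -> None:
        self.name = name
        self.version = version
        self.features = features

    def handle(self, request: str) -> str:
        if request == "page":
            # The page's embedded script calls this version's newest feature.
            return f"page:calls={self.features[-1]}"
        if request in self.features:
            return f"ok:{request}"
        return f"ERROR: {self.name} (version {self.version}) has no {request!r}"


class RoundRobinBalancer:
    """Version-unaware balancer: sends each request to the next server in turn."""

    def __init__(self, servers: List[Server]) -> None:
        self._next = cycle(servers)

    def route(self, request: str) -> str:
        return next(self._next).handle(request)


def simulate() -> Tuple[str, str]:
    # Mid rolling upgrade: one server already runs N+1, one still runs N.
    new = Server("s1", "N+1", ["search", "autocomplete"])
    old = Server("s2", "N", ["search"])
    balancer = RoundRobinBalancer([new, old])
    page = balancer.route("page")  # steps 1-2: served by version N+1
    feature = page.split("=", 1)[1]
    callback = balancer.route(feature)  # step 3: AJAX call lands on version N
    return page, callback  # step 4: the callback is an error


if __name__ == "__main__":
    for line in simulate():
        print(line)
```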
Mixed Version Race Condition
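[Figure: sequence diagram between a client (browser) and servers during a rolling upgrade: the initial request is answered by the new version with an HTTP reply containing embedded JavaScript; the resulting AJAX callback reaches the old version, producing an ERROR.]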
What does the solution involve?
1. Label communication between instances and the client with version information
2. Modify load balancer so that messages are routed to an appropriate version
3. Modify load balancer so that messages are balanced across all child instances.
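The three solution steps can be sketched as a version-aware balancer. This is an illustrative design, not the authors' implementation: requests carry an optional version label (step 1), labelled requests only go to instances of that version (step 2), and among the candidates the balancer picks the instance that has been sent the fewest requests so far, so load stays spread across all instances (step 3). A balancer that has not yet learned about a version cannot route its requests, which is why keeping distributed balancers in sync matters.

```python
from typing import Dict, Optional


class VersionAwareBalancer:
    """Routes version-labelled requests only to instances running that version."""

    def __init__(self, instances: Dict[str, str]) -> None:
        self.instances = dict(instances)  # instance name -> version it runs
        self.sent = {name: 0 for name in instances}  # requests routed so far

    def add_instance(self, name: str, version: str) -> None:
        """Called as the rolling upgrade brings an instance up on a (possibly new) version."""
        self.instances[name] = version
        self.sent.setdefault(name, 0)

    def route(self, version: Optional[str] = None) -> str:
        if version is None:
            candidates = list(self.instances)  # unlabelled first request: any instance
        else:
            candidates = [n for n, v in self.instances.items() if v == version]
            if not candidates:
                # This balancer has not yet heard of the version (the out-of-sync case).
                raise LookupError(f"no known instance runs version {version}")
        chosen = min(candidates, key=lambda n: (self.sent[n], n))  # fewest requests, then name
        self.sent[chosen] += 1
        return chosen


if __name__ == "__main__":
    lb = VersionAwareBalancer({"s1": "N", "s2": "N", "s3": "N+1"})
    first = lb.route()  # the reply would be labelled with this instance's version
    label = lb.instances[first]
    print(first, label, lb.route(label))  # the AJAX follow-up stays on the same version
```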
Why is this a hard problem?
• Large installations have multiple distributed load balancers that must be kept in sync: some load balancers may know about the new version while others do not.
• It is not enough to put the version number in the message.
  – Suppose the second request goes to a load balancer that does not yet know about version N+1.
• Messages must be kept balanced so that all servers handle roughly the same number of requests.
[Figure: load balancers in front of groups of servers; one routes /service to /service/vN+1 and /service/vN across four servers, another routes /service only to /service/vN across two servers.]
Operator undo
• After performing an operation in AWS, an operator may want to return to the original state, i.e., undo the operation.
• This is not always straightforward:
  – Attaching a volume while the instance is running is no problem, but detaching it might be.
  – Creating or changing auto-scaling rules affects the number of running instances.
    • Additional instances cannot simply be terminated, as the rule would create new ones!
  – Deleted, terminated, or released resources are gone!
Undo for System Operators
[Figure: the administrator issues begin-transaction, several do operations, and then rollback; the model is extended with commit and pseudo-delete.]
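The begin-transaction / do / rollback model with commit and pseudo-delete can be illustrated on a toy in-memory resource store. This is a hypothetical sketch, not an AWS API: deleting inside a transaction only hides the resource (pseudo-delete) so that rollback can bring it back, and the deletion becomes final at commit. It deliberately ignores the harder cases listed above, such as auto-scaling rules that recreate instances.

```python
from typing import Dict, List


class UndoableResources:
    """Toy resource store with begin-transaction / rollback / commit and pseudo-delete."""

    def __init__(self) -> None:
        self.live: Dict[str, str] = {}  # resource id -> description
        self._hidden: Dict[str, str] = {}  # pseudo-deleted during the transaction
        self._created: List[str] = []  # created during the transaction
        self._in_transaction = False

    def begin_transaction(self) -> None:
        self._hidden.clear()
        self._created.clear()
        self._in_transaction = True

    def create(self, rid: str, description: str) -> None:
        if rid in self.live:
            raise ValueError(f"{rid} already exists")
        self.live[rid] = description
        if self._in_transaction:
            self._created.append(rid)

    def delete(self, rid: str) -> None:
        description = self.live.pop(rid)
        if not self._in_transaction:
            return  # outside a transaction the resource is simply gone
        if rid in self._created:
            self._created.remove(rid)  # did not exist before the transaction
        else:
            self._hidden[rid] = description  # pseudo-delete: keep it restorable

    def rollback(self) -> None:
        for rid in reversed(self._created):
            self.live.pop(rid, None)
        self.live.update(self._hidden)
        self._end()

    def commit(self) -> None:
        self._end()  # pseudo-deleted resources are now released for good

    def _end(self) -> None:
        self._hidden.clear()
        self._created.clear()
        self._in_transaction = False
```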
Approach
[Figure: the administrator issues begin-transaction, several do operations, and rollback; the undo system senses cloud resource states at these points, identifies the initial state and the goal state, and then plans a set of actions, generates code, and executes it.]
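The final step plans a set of actions that takes the resources from their current state back to the goal state. The talk describes planning followed by code generation; the sketch below swaps in a much simpler technique, a state diff with a fixed ordering, and uses hypothetical resource kinds. The ordering encodes the auto-scaling pitfall noted earlier: the rule is removed before surplus instances are terminated, since otherwise the rule would start replacements. Resources that were deleted outright cannot be restored unless they were only pseudo-deleted.

```python
from typing import Dict, List, Tuple

State = Dict[str, str]  # resource id -> kind (hypothetical kinds below)
Action = Tuple[str, str]  # (operation, resource id)

# Lower numbers are removed first: an auto-scaling rule must go before the
# instances it manages, and attachments before the volumes they reference.
REMOVAL_ORDER = {"autoscaling_rule": 0, "volume_attachment": 1, "instance": 2, "volume": 3}


def plan_undo(goal: State, current: State) -> List[Action]:
    """Actions that turn `current` (sensed at rollback) back into `goal` (sensed at begin)."""
    surplus = [rid for rid in current if rid not in goal]
    missing = [rid for rid in goal if rid not in current]
    plan: List[Action] = [
        ("remove", rid)
        for rid in sorted(surplus, key=lambda r: (REMOVAL_ORDER.get(current[r], 99), r))
    ]
    # Only possible if the resource was pseudo-deleted rather than truly released.
    plan += [("restore", rid) for rid in sorted(missing)]
    return plan


if __name__ == "__main__":
    goal = {"i-1": "instance", "vol-1": "volume"}
    current = {"i-1": "instance", "i-2": "instance", "i-3": "instance", "rule-1": "autoscaling_rule"}
    for action in plan_undo(goal, current):
        print(*action)
```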
Upgradability as a process and product quality
• The architecture of the system being upgraded can affect the installation process.
  – Suppose the system checks version information from the libraries it depends on. Then the process must describe what to do if an error condition occurs.
• The upgrade process can affect the architecture of the product.
  – Suppose the process is supported by a tool that checks the health of the installation of version N+1. Then the system must make visible the information this tool uses.
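The second bullet can be made concrete: if an upgrade tool checks the health of version N+1, the system has to publish what the tool reads. A minimal sketch with a hypothetical JSON report format and hypothetical check names; nothing here is a standard endpoint or schema.

```python
import json
from typing import Dict


def health_report(version: str, dependencies: Dict[str, str], checks: Dict[str, bool]) -> str:
    """What an upgraded instance could expose to the upgrade tool (hypothetical format)."""
    return json.dumps(
        {"version": version, "dependencies": dependencies, "checks": checks},
        sort_keys=True,
    )


def tool_verdict(report_json: str, expected_version: str) -> str:
    """The upgrade tool's side: decide whether the new installation is healthy."""
    report = json.loads(report_json)
    if report["version"] != expected_version:
        return f"wrong version: {report['version']}"
    failed = sorted(name for name, ok in report["checks"].items() if not ok)
    if failed:
        return "unhealthy: " + ", ".join(failed)
    return "ok"


if __name__ == "__main__":
    report = health_report("N+1", {"libfoo": "2.4.1"}, {"database": True, "cache": False})
    print(tool_verdict(report, "N+1"))
```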
Summary
• Upgrade is an important problem.
  – Upgrade failures affect user satisfaction.
  – Upgrade failures happen frequently.
• Upgrade involves the interaction of product and process quality issues.
  – Communities focus on improving the quality of either the process or the product, not on joint process/product quality.
• Multiple opportunities for research exist.
Q&A
Research study opportunities in dependable cloud computing:
• Software Architecture
• Data Management
• Performance Engineering
• Autonomic Computing
To find out more, send your CV and undergraduate details to [email protected]
Thank You!