The Forecasting and Impact of the Loss of Critical Human ... Manuscript.pdf · The Forecasting and...
Transcript of The Forecasting and Impact of the Loss of Critical Human ... Manuscript.pdf · The Forecasting and...
Accepted for Publication: IEEE Trans. on Engineering Management
The Forecasting and Impact of the Loss of Critical Human Skills Necessary for Supporting Legacy Systems
Peter Sandborn and Varun J. Prabhakar
CALCE Electronic Products and Systems Center Department of Mechanical Engineering
University of Maryland College Park, MD 20742 USA
Index Terms
Workforce planning, workforce management, obsolescence, DMSMS, life-cycle cost,
organizational forgetting, legacy systems
Abstract
The loss of critical human skills that are either non-replenishable or take very long periods of
time to reconstitute, impacts the support of legacy systems ranging from infrastructure, military
and aerospace to IT. Many legacy systems must be supported for long periods of time because
they are prohibitively expensive to replace. Loss of critical human skills is a problem for legacy
system support organizations as they try to understand and mitigate the effects of an aging
workforce with highly specialized, low-demand skill sets. The existing research focuses on
workers that have skills that are obsolete and therefore need to be retrained to remain
employable; alternatively this paper addresses the system support impacts due to the lack of
workers with the required skill set. This paper develops a model for forecasting the loss of
critical human skills and the impact of that loss on the future cost of system support. The model
can be used to support business cases for system replacement. A detailed case study of a legacy
control system from the chemical manufacturing industry is provided and managerial insights
associated with the support of the system drawn.
Accepted for Publication: IEEE Trans. on Engineering Management
Managerial Relevance Statement
The model developed in this paper allows managers to provide quantitative answers to the
following questions: 1) when will the impact of the loss of critical human skills be felt, i.e., is
there a point in the future where the system starts to become more expensive to support; 2) “how
bad is bad?”, i.e., when will the cost of supporting the system become prohibitive; and 3) is there
an alternative to replacing the system?, i.e., could a change in hiring or training practices (if
possible) resolve the problem, and if so, what hiring rates are needed? The case study provided
in the paper demonstrates the loss of younger workers (seeking what they perceive to be more
lucrative opportunities) and clearly supports a business case for the timing of future system
replacement.
Accepted for Publication: IEEE Trans. on Engineering Management
I. INTRODUCTION
There are many types of mission, safety and infrastructure critical systems that have very
long field support lives of 20-30 years or more. These systems are often referred to as legacy
systems because they are based on, or are composed of, out-of-date (old) processes,
methodologies, technologies, parts, and/or application software.1 Due to mismatches between
the fast rate of technology advancement and the massive expense associated with the
replacement of these mission, safety and infrastructure critical systems, legacy system
dependence is likely to be a reality forever and many of today’s new systems are guaranteed to
become tomorrow’s legacy systems. Examples of legacy systems include military systems,
industrial and transportation control systems, and communications infrastructure. Technology
and part obsolescence are common problems faced by these systems, e.g., the inability to procure
spare parts to maintain the systems through the system’s support life. This problem is commonly
referred to as Diminishing Manufacturing Sources and Material Shortages (DMSMS) and is
particularly challenging for electronic part management, [1]. Obsolescence, however, is not
solely a hardware problem; it also impacts software [2] and the human skills necessary to
maintain legacy systems. Human skills obsolescence is a growing problem for organizations that
must support legacy systems. These organizations need to understand, forecast and mitigate the
effects of an aging work-force with highly specialized (and in many cases effectively
irreplaceable) skill sets.
Product manufacturing and support organizations require a dependable pool of skills so that
they can perform their businesses efficiently and without interruption. However, the pool of
available people and their associated skills is always in flux, requiring careful management and
1 In software engineering, legacy systems definitions are usually confined to software and architecture, this paper uses a broader definition that includes hardware and processes as well.
Accepted for Publication: IEEE Trans. on Engineering Management
forecasting. Workforce planning is commonly defined as getting the right number of people
with the right competencies in the right jobs at the right time. There are numerous issues that
arise associated with mismatches between the skills possessed by the workforce and the skills
needed by employers, which we broadly classify into the following three categories: skills
obsolescence, skill shortage, and critical skills loss.
Skills obsolescence (also referred to as human capital obsolescence) is the situation in which
workers (for whatever reason) do not have the skills that they need, which includes workers that
have skills that are obsolete and need to be retrained. De Grip and Van Loo [3] define five types
of skills obsolescence: the wear of skills due to aging or illness that may be related to working
conditions; the atrophy of skills due to insufficient use (skills decay); job-specific obsolescence
due to technological and organizational change, including skills that are simply no longer needed
(skill depreciation); sector-specific obsolescence due to shifts in employment; and firm-specific
skills obsolescence due to displacement. A significant body of literature is devoted to the
analysis of skills obsolescence, e.g., [4] and the references contained therein. Skill shortage
refers to insufficient current skill competences, e.g., [5]. Skills shortage expresses the need to
identify, train and retain the workforce to fill current and expected future skill gaps. Skills
shortage also includes an organization’s failure to protect core skill competencies in economic
downturns [6].
The third category is critical skills loss, which is the topic of this paper. Critical skills loss
describes the loss of skills that are either non-replenishable or take very long periods of time
(many years) to reconstitute. Critical skills loss is a special case of “organizational forgetting”.
Organizational forgetting describes the situation where organizations lose knowledge gained
through learning-by-doing due to labor turnover, periods of inactivity, and/or failure to
Accepted for Publication: IEEE Trans. on Engineering Management
institutionalize tacit knowledge [7]. Most existing analyses of organizational forgetting
implicitly assume that the situation is relatively short term and seek to forecast the recovery
period and the associated disruption in productivity and schedule, e.g., [8-10]. Critical skills loss
(the subject of this paper) represents a permanent (non-recoverable) and involuntary form of
organizational forgetting. Critical skills loss is usually the result of long-term (20+ years)
attrition where skilled workers retire and there are an insufficient number of younger workers to
take their place.2 Critical skills loss is not the result of inactivity, poor planning, or a lack of
foresight by an organization. It is simply the inevitable outcome of the organization’s
dependence on specialized skills for which there is relatively little, but non-zero, demand (which
is also the nature of DMSMS-type obsolescence problems for hardware and software, see [1]).
System support and management challenges created by the loss of necessary human skills have
been reported in a number of industries including healthcare [11], nuclear power [12], aerospace
[13], and other enterprises [14]. For example, Goodridge and McGee [15] and Hilson [16]
discuss the IT industry and the shortage of mainframe application programmers experienced in
legacy applications – the appropriate skills are no longer taught and younger workers are not
interested in learning them. The loss of critical skills problem is particularly troublesome for
organizations that must support legacy systems for long periods of time, e.g., 20-30 years or
longer. For many defense systems, the problems are potentially devastating: “Even a 1-year
delay in funding for CVN-76 [aircraft carrier] will result in the loss of critical skills which will
take up to 5 years to reconstitute through new hires and training. A longer delay could cause a
permanent loss in the skills necessary to maintain our carrier force.” [17].
2 In the case of legacy system support, 5 (or in some cases much longer) years of on-the-job experience by an engineer may be necessary to become competent.
Accepted for Publication: IEEE Trans. on Engineering Management
As the human capital capable of supporting a system shrinks, eventually the time required to
support the system will increase – the objective of this paper is to understand when the increase
will occur and its magnitude. Increases in support time increase business interrupt time for
manufacturing causing a loss of revenue. Increases in support time in the service, transportation
and defense industries can decrease system availability, which can lead to loss of revenue,
compromised safety, and possibly result in the loss of life (e.g., air support for Marines
compromised by lack of helicopters due to repair delays). Due to prohibitive costs, legacy
systems are often not replaced until the consequences of their age (and possible lack of support)
result in catastrophic failures, or a viable replacement business case can be constructed and sold
to their customers.3 In order to construct business cases, management must be able to provide
quantitative answers to the following questions: 1) when will the impact of human obsolescence
be felt, i.e., is there a point in time where the system starts to become more expensive to support;
2) “how bad is bad?”, i.e., when will the cost of supporting the system exceed my profit margin
(or budget)? 3) “is there an alternative to replacing the system?” could a change in hiring or
training practices (if possible) resolve the problem? The model described in this paper provides
a forecast of the loss of human skills as a function of time and the associated impact of that loss
on the system support cost. These outcomes can be used as direct support for system
replacement business case development.
A. Existing Work
The existing literature suggests that some of the possible causes for critical skills loss
include: declines in education and training (e.g., university educated engineers are no longer
3 For example, “customers” may be the company owners (e.g., manufacturing system replacement), the tax-paying public (e.g., civil infrastructure replacement), or governments (e.g., military asset replacement).
Accepted for Publication: IEEE Trans. on Engineering Management
trained in the programming languages used in many legacy systems, e.g., [18]); younger workers
that are discouraged from entering particular workforces that they perceive to be in decline (e.g.,
nuclear power [12]) or not cutting-edge [19,20]; younger workers leaving legacy system
sustainment jobs to pursue what they perceive to be more lucrative and exciting opportunities
(this observation is partially supported by Fig. 2 in Section III, which shows the distribution of
exit age for a legacy control system); shrinking “feeder” occupations (e.g., the U.S. Navy feeding
skilled workers to the nuclear power industry [12]); older workers being protective of what they
know (not passing along their knowledge in order to protect their jobs, e.g., [21]); and
differences in social and cultural influences between younger and older workers regarding the
perceptions of their jobs [15].
While there are many qualitative statements of the problem of critical skills loss and non-
replenishable knowledge loss [11-17], to the authors’ knowledge there is no quantitative
forecasting of its rate or impact. The majority of existing research addresses skills obsolescence,
which is the opposite of the problem addressed in this paper wherein workers’ skills are no
longer useful or become outdated as a result of automation and the rapid growth in technology,
e.g., [3,4,22-25]. These works address ways to mitigate skill “decay” over time for a given
workforce. The existing work on critical skill loss focuses on knowledge preservation – the
realization that non-replenishable knowledge is being lost and the need to capture it, [26,27].
There is also work on retirement wave planning [28], but this work is focused on head count not
skill content.
The analytical techniques presented by Bohlander and Snell [29] modeled a situation similar
to critical skills loss, however, they do not include the attrition or the costs associated with the
varying availability of the workers. Bordoloi [30] developed a model for workers that are at
Accepted for Publication: IEEE Trans. on Engineering Management
different skill levels entering and exiting a company; additionally, the rate at which the employer
gains and loses employees is also taken into account in their planning model. However,
Bordoloi [30] stops short of estimating the experience in the skills pool as a function of time and
thereby determining the impact that critical skills loss has on the support of systems. Huang et
al. [31] developed a planning model using a computer simulation where the goal was to
determine an ideal hiring rate based on the different levels of skill that workers had. Although
this model uses workforce simulation and calculates an ideal hiring rate, this model does not
account for the costs incurred by the unavailability of workers in their determination of an ideal
hiring rate. Holt [32] points out that the traditional understanding of a workforce has been in
terms of the “physical sum of people employed,” which is the basis for most common workforce
planning models. A more useful approach is to observe workers in an organization as assets with
varying skill levels. Holt defines human capital investment as the “total amount and quality of
talent, knowledge, expertise, and training that these workers possess.” Holt minimizes cost while
maintaining balanced levels of human capital. The model is based on categorizing workers by
“classes” wherein all workers in a class share similar attributes. However, the Holt model lacks
aging and age-related effects of individual workers over time.
The maintenance workforce planning literature, e.g., [33-36], focus on resource allocation
and scheduling optimization. The goals of a maintenance policy are maximizing plant
availability and minimizing cost via the timely presence of maintenance workers. Koochaki et
al. [33] points out that “maintenance workers are usually highly skilled and therefore difficult to
recruit” and that “the efficient and effective use of a scarce maintenance workforce is very
important”. Koochaki et al. [33] considers the impact of maintenance resource constraints (i.e.,
limited maintenance workers) on the grouping of maintenance activities and comparing
Accepted for Publication: IEEE Trans. on Engineering Management
condition-based maintenance (CBM) and age-based replacement. Ahire et al. [36], minimizes
the makespan (the total length of the schedule) for a sets of preventive maintenance tasks when
there are workforce availability constraints. Other papers in this area have treated the
influence of CBM on workforce planning and maintenance scheduling, e.g., [33] and references
contained therein, however, these papers focus on the determining the optimum size of the
maintenance workforce. None of the existing work treating the maintenance workforce
addresses the problem of loss of irreplaceable maintenance resources.
The existing literature makes no attempt to study or verify the postulated causes of critical
skills obsolescence, or quantify its impacts on an organization. The objective of this paper is to
develop a model that uses existing historical workforce data (that organizations have) to forecast
the future impacts of that loss and to relate the impacts to management decision. This paper is
concerned with the dynamics of the cumulative skills pool (“head content”) and its relation to the
cost of sustaining a system as a function of time. The questions that need to be addressed are
“what will today’s skills pool look like after ‘x’ years?” and “what impact will the future skills
pool have on the ability to continue to support the system?” The model described in this paper
offers a projection into the future and a way to quantify cumulative experience versus the influx
of new inexperienced workers and worker departures. This model enables management to not
only project future costs, but also to make quantitative business cases for system replacement.
Section II describes the dynamic human skills model developed in this paper. This is
followed in Section III by a numerical case study of a legacy control system from the chemical
manufacturing industry. The case study includes the analysis of the life-cycle cost impacts of
critical skills loss and the managerial insights derived from the analysis.
Accepted for Publication: IEEE Trans. on Engineering Management
II. A MODEL FOR ESTIMATING THE HUMAN SKILLS POOL OVER TIME
The motivation for the model developed in this section is to forecast the future change in
“experience” relative to today’s skills pool. The model assumes that sufficient experience to
support the system exists today, and determines how long it takes for the experience to decrease
to a point where there is an impact on the cost and how large that impact will be.
The model presented in this section estimates the number of skilled employees (“head count”
or pool size) and the cumulative experience in the skills pool (“head content”) in order to
determine the resources that will be available to maintain a legacy system or family of systems in
the future. The cumulative experience in the skills pool impacts the time to perform the
maintenance activities to sustain a system. The time to perform maintenance to support a system
is a significant cost driver for availability-centric systems. Availability is the ability of a service
or a system to be functional when it is requested for use or operation and is determined by the
combination of reliability and maintainability. For many systems, the cost of unavailability is
very large, e.g., high-volume manufacturing, airlines, medical devices, etc.
The assumptions and parameters used to build the model developed in this section were
driven by the actual data available for the numerical case study (see Section III). In several cases
the simplest possible assumption was made because there was no data to support more complex
assumptions – these assumptions are addressed as they appear in the model. The model
developed in this section has four primary inputs: the distribution of current ages (fC), the
distribution of hiring ages (fH), the distribution of exit ages (fL) and the hiring rate (H). The data
from which the distributions can be constructed can be readily obtained from human resource
records kept by organizations. Using these inputs, we wish to determine the quantity of people
Accepted for Publication: IEEE Trans. on Engineering Management
of a specific age in the pool as a function of time.4 Once the quantity of people of each age at a
point in time are computed, the model converts the head count to cumulative experience at that
point in time using a parametric relationship derived from actual data. The experience can then
be used to calculate the time to resolve a maintenance event and that time (along with the
quantity of maintenance events to be resolved) can be used to calculate the cost.
A. The Pool Size and Experience Calculation
In order to assess pool size and experience, we have to project (i.e., forward calculate into the
future) the experience of the people in the pool based on when they were hired while accounting
for age related loss.
The level of experience within the skills pool varies over time and can be determined from
the following: 1) the new hires added to the skills pool; 2) the attrition rate or loss of skilled
employees due to job changes, promotions, retirement, etc.; and 3) accounting for the varying
skill levels of the people in the pool and how those skill levels improve as people remain in the
pool.
Initially, we assume that the organization’s policy dictates the hiring rate, H, resulting in a
pool with varying skills that can be modeled over time. Since generally the data (see Section III
for example) is only available on an annual basis, we utilize discrete one year time steps, i,
throughout the analyses. Let H be the new hires per year as a fraction of the pool size at the start
of the analysis period (i = 0). Let fC be the probability mass function (PMF) of age for the current
skills pool, fH be the PMF of age for new hires, and fL be the PMF of age for people exiting the
4 Both stationary and non-stationary models are considered herein, i.e., stationary models assume that the distributions do not change over time and non-stationary models allow them to change over time governed by drift and volatility parameters.
Accepted for Publication: IEEE Trans. on Engineering Management
pool (attrition). Then, Ni(a), the net frequency of people in the pool of age a during year i, is
given by,
)(1)()1()( 1 afaHfaNaN LHii (1)
where, i is the number of years from the start of the analysis period. The first term in the
brackets is the current pool, the second term in the brackets is the new hires, and the multiplier
models the retention rate. Note, (1) assumes that the hiring rate, H, is the same for all ages. As a
boundary condition, we require that at i = 0, N0(a) = fc(a) corresponding to the current
distribution of worker age in the skills pool. Ni(a) is a fraction that is measured relative to the
starting (i = 0) pool; for example, the number of people of age a in year 0 would be the product
of N0(a) and the total initial (i = 0) head count. Therefore, the cumulative net frequency of people
in the skills pool, NNET, in year i is the sum of Ni(a) over all the ages given by,
r
yaiNET aNiN )()( (2)
where, r is retirement age and y is the age of the youngest employee in the skills pool. We use
the term “net frequency” because NNET(i) can be greater than 1, which can occur if the number of
new hires exceeds the number of people exiting the pool. The age of the youngest worker, y,
corresponds to the age of the youngest worker at the start of the analysis period (from fC) or the
youngest worker that is hired during the analysis period (from fH). This model assumes that a
mandatory retirement age, r, exists. Early retirement along with other reasons for the loss of
skilled employees are captured in the PMF for attrition, fL. Conversely, the PMF of retention, fR
for the skills pool can be estimated annually by, fR(a) = 1 – fL(a).
Estimating the size of the pool (head count) over time is necessary but not sufficient to
capture the ability to support a system into the future because not all workers in the pool have an
equivalent level of experience and therefore an equivalent level of “value” to the support of the
Accepted for Publication: IEEE Trans. on Engineering Management
system. The following analysis estimates cumulative experience in order to track the true value
of the pool of skilled workers where “experience” is defined as the length of time that a worker
has spent in a particular position. The cumulative experience of the pool in year i, Ei, can be
calculated using,
r
yaiEEi aNIaRE )( (3)
where, RE and IE are parameters used to map age to effective experience measured in years (RE
and IE are determined using a curve fit to actual data, e.g., see Section III.B). Note, “experience”
has units of time, however, Ei is only used in the model to determine the change in cumulative
experience from the initial condition.
The cumulative experience of the pool in year i as a fraction of the current pool experience,
E0, can be calculated as,
0E
EE i
ci (4)
Using the cumulative experience, the time to perform maintenance in year i is given by,
ic
mm
i E
TT 0 (5)
where, mT0 is the time to perform a maintenance activity with a skills pool having E0 experience
at time i = 0. The time to perform maintenance modeled in (5) increases as experience decreases
due to several factors: 1) less experienced workers may require more time to perform the task
(learning curve effects), and/or 2) the number of workers who can perform the required
maintenance activity is smaller, so they may not (for example) be available at every site, i.e.,
they may have to travel from a different location.
Accepted for Publication: IEEE Trans. on Engineering Management
All of the development so far assumes a fixed hiring rate that may or may not be sufficient to
maintain the current level of experience. The model can be used to estimate the hiring rate
required to maintain the current level of experience over the system’s support life. The required
hiring rate (assuming hiring is possible) in year i, Hi, can be solved for from (1) as,
)(
1)1(
)(1
)(1 af
aNaf
aNH
Hi
L
ii (6)
subject to the following constraint which ensures that the cumulative experience is constant at
E0,
10
E
EE i
ci (7)
The simplest assumption for solving (1) or (6) is that we have a “stationary process”. A
stationary process is a stochastic process whose joint probability distribution does not change
when shifted in time or space (time is the relevant parameter for this model). For a stationary
process, the mean and variance do not change over time. In reality, hiring age (fH) and exit age
(fL) are not stationary processes. To model the problem as a non-stationary process, we model
the change in the mean of the distribution using geometric Brownian motion,5
tttt dWXdtXdX (8)
where Xt is the value of the quantity being simulated at time t (the mean of the hiring age or exit
age distributions in our case), μ is a drift component, σ is a variance component (also called
shock or volatility), and Wt is a Brownian motion (also known as a Weiner or continuous-time
stochastic process). Wt is modeled as dt in our case. No shape change in the distributions is
modeled (because no case study information could be obtained to support it).
5 Continuous-time models (such as Brownian motion) are commonly used to model continuous degradation processes for asset management of complex engineering systems [37].
Accepted for Publication: IEEE Trans. on Engineering Management
B. The Cost Impact of Human Skills Obsolescence
The most immediate impact of the loss of critical human skills for the support of legacy
systems is in the ability to perform system support activities in a timely manner. In general,
corrective maintenance costs consist of: spare parts, labor, downtime, overhead,
consumables/handling, and equipment/facilities [38]. When a maintenance event occurs for a
system, the cost of performing the necessary maintenance action is modeled using,
bi
c
m
bli
c
mp
imi C
E
TfC
E
TCC
i
j
i
00 (9)
where the term in brackets is the maintenance time from (5), jbf is the fraction of the
maintenance events of severity level j that result in a business interrupt, piC is the cost of
replacement parts (if replacements are needed) in the ith year, liC is the cost of labor (per unit
time) in the ith year (assumed to be burdened, i.e., includes overhead), and biC is the cost of
business interrupt (per unit time) in the ith year. All equipment and facilities necessary for
performing maintenance are assumed to be subsumed in the overhead, and consumables and
handling are assumed to be negligible. liC , b
iC and piC are discounted to a base year using the
weighted average cost of capital. piC may include holding (inventory) costs if the part is obsolete
in year i and was procured in a lifetime or bridge buy at an earlier time.6 The failure modes of
6 A lifetime buy refers to purchasing a sufficient number of parts to sustain a system through its entire support life (until its end of support date) at the point in time when the part is discontinued (i.e., the point in time when the part is no longer procurable from its original manufacturer), while a bridge buy only procures enough parts to sustain the system to a redesign that will replace the part [39]. Lifetime and bridge buys are a common obsolescence mitigation approach used for managing DMSMS of electronic parts where the ability to procure parts after discontinuance may be limited or introduce unacceptable risks of procuring counterfeit parts. If parts are purchased at a lifetime or bridge buy, the cost of the replacement part (this is a per part cost) in year i is given by,
Accepted for Publication: IEEE Trans. on Engineering Management
the system are assigned “severity levels” (j) that are correlated to business interrupt fractions jbf
that can range from 0 to 1 (0 = no business interrupt, 1 = all the maintenance time is business
interrupt). Time-to-failure distributions for each severity level are constructed from the time-to-
failure distributions of the relevant failure mechanisms and these are sampled to create
maintenance events using a discrete event simulation (see Section III).
The default model for maintenance time given in (5), which is embedded in (9), assumes that
only the pool experience impacts the time to perform maintenance. The problem is that an
organization may have sufficient experience in the pool, but there is a minimum head count that
must be staffed per system (a plant, for example, would be a single installation of a system under
analysis). If head count drops too low, maintenance time may be impacted even if sufficient
experience exists. For such a case, the average number of people per plant is given by,
N
MfP origs (10)
where fs is the pool size relative to today, Morig is the original number of people (corresponding
to a fully staffed organization), and N is the number of plants. Rearranging (10) to solve for the
threshold (minimum) pool size gives,
orig
s M
NPf
threshold
min (11)
When thresholdss ff the maintenance time is penalized using,
i
pp
i
ni
nkkn
pp
i w
CCni
w
I
w
CCni
)1( :For ;
)1()1( : For
1
where pC is the purchase price of the part, w is the weighted average cost of capital, n is the year of the
lifetime buy or bridge buy, and I is the holding cost per part per year. Note, both i and n are measured relative to the base year for money.
Accepted for Publication: IEEE Trans. on Engineering Management
s
s
c
mm
i f
f
E
TT threshold
i
0 (12)
and (9) becomes,
li
bib
mi
pi
mi CCfTCC (13)
Equation (13) is used to compute the cost of maintenance events in the ith year of system
support.
III. NUMERICAL CASE STUDY
This section presents a case study that implements the model developed in Section II with
data from a chemical product manufacturing company. The system considered is a legacy
control system (developed in the 1970s) with over 2000 instances (plants) installed and currently
supported worldwide. The control system has two identical controllers designed so that one can
take over for the other upon failure (or stoppage for maintenance). The system consists of
software and obsolete electronics hardware, which is supported via existing lifetime buys and
aftermarket procurements. The software is machine code and represents the most serious critical
skills loss problem. The model developed in Section II requires three primary distribution
inputs, which are determined from actual data: the current age distribution (fC), the distribution of
hiring age (fH) and the distribution of exit age (fL). However, the two distribution inputs that are
available from the actual data are the hiring age (fH) and a current age distribution (fC) shown in
Fig. 1. Notice that the current age distribution (Fig. 1b) has a mode of approximately 55 years of
age, which, unfortunately is very close to the early retirement age from the company and
demonstrates the issue that this paper focuses on.
Accepted for Publication: IEEE Trans. on Engineering Management
A. Synthesizing an Exit Age (fL) Distribution
The exit age distribution (fL) for this case study is not known and needs to be synthesized in a
way that is consistent with the known hiring age and current age distributions shown in Fig.1.
A simulator was created that starts with the distribution in Fig. 1a and uses a manually
defined exit age distribution to generate the current age distribution in Fig. 1b. The simulator
assumes that the hiring age and exit age distributions remain constant over time (a “stationary
process”) and that the company is propagated long enough to reach a steady state.7 The exit age
distribution was adjusted until a result that approximately matches the known current age
distribution (Fig. 1b) is obtained. We used 5 year increments in the simulator (assuming that this
allows perturbations in year-to-year hiring and exiting to be averaged out). The resulting
synthesized exit age distribution is shown in Fig. 2.
7 Note, the assumption that the hiring age and exit age distribution models remain constant over time is NOT an assumption of the general model developed in Section II. This assumption is specific to this stationary process case study. Note, in Section III.D the non-stationary process version of the case study lifts this assumption.
0
10
20
30
40
50
60
15 20 25 30 35 40 45 50 55 60 65
Count
Hiring Age into MOD
0
10
20
30
40
50
60
70
15 20 25 30 35 40 45 50 55 60 65 70
Count
Current Age in MOD
(a) (b)
Fig. 1. Actual data: (a) hiring age distribution (fH), (b) current age distribution (fC).
Accepted for Publication: IEEE Trans. on Engineering Management
The distribution in Fig. 2 is a “bathtub curve”.8 It shows that a large portion of the workers
exit early – most of these are workers who are changing jobs within the company. For the
company modeled in this case study, it has been difficult to retain new people to support the
legacy system as younger employees relocate to other job opportunities that they perceive as
having better long-term career prospects. Between the ages of 45 and 60 almost no workers
leave, and above 60 the workers are retiring.
B. Mapping Pool Size to Experience
The distributions shown in Figs. 1 and 2 only capture the number of workers (pool size), not
their relative experience. To get from pool size to experience one more input is needed: the
mapping from age to experience within the company. To generate the parameters for the
mapping function in (3), both the years of experience and the years of service to the company
were considered (Fig. 3). The following weighting was adjusted to maximize the correlation
coefficient between the two:
8 The bathtub curve is commonly used in reliability engineering to describe a particular form of the hazard function comprised on infant mortality, random failures, and wear-out. In our case the hazard rate (instantaneous failure rate in time) is analogous to the exit rate (in age).
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
15 20 25 30 35 40 45 50 55 60 65 70
Fraction of all people exiting the pool
during this bin
Synthesized Exit Age from MOD
Fig. 2. Synthesized exit age distribution (fL).
Accepted for Publication: IEEE Trans. on Engineering Management
Weighted Experience = 0.32(Years of Experience) + 0.68(Years in the Company) (14)
This weighting does not imply that years in the company is twice as important as experience -
experience is embedded within the years in the company.
At some point, workers cannot gain any more experience. Depending on the role of the
particular workers, experience was capped at between 5 and 15 years, i.e., the EE IaR term
in (3) is limited to a maximum value. Ideally, if enough data was available, we could have
synthesized separate hiring, loss and current age distributions for each role, however, the number
of usable records available did not allow this.
The best-fit line shown in Fig. 3 gives the values of the parameters in (3) as RE = 0.8446 and
IE = -21.068 for this case study.
C. Stationary Process Case Study Results
This section presents a case study that implements the model developed in Section II with
data from a chemical product manufacturing company. The fC, fH and fL distributions are shown
in Figs. 1 and 2. The analysis in this section assumes that all the input distributions are
y = 0.8546x ‐ 21.068R² = 0.708
0
5
10
15
20
25
30
35
40
0 10 20 30 40 50 60 70
Experience
Age
Average of Years of Service and Years of MOD Exp
67% confidence intervals
Fig. 3. Weighted experience to worker age correlation. R2 = 0.708.
Accepted for Publication: IEEE Trans. on Engineering Management
stationary (unchanging) for the entire analysis period. The mandatory retirement age, r, is
assumed to be 65 years at which workers are required to leave the skills pool, and the youngest
worker is y = 22.
Figure 4a shows NNET, the net pool size (number of workers) over time as a fraction of the
pool size in 2010. Fig. 4b shows the relative experience (relative to 2010), and Fig. 4c shows the
average age of the workers in the pool. These results assume no hiring, H = 0. The results
indicate that although a 10% drop in head count occurs in the first 6 years, the experience
remains approximately constant (existing workers are gaining enough experience to offset the
drop in head count). After 2016, the experience drops quickly as the most experienced workers
leave and are not replenished.
Accepted for Publication: IEEE Trans. on Engineering Management
If the lost skills are replenishable, we now wish to estimate the future hiring rate, Hi, required
to preserve the current level of experience, E0, in the skills pool. The analysis identifies a
solution for the system of equations (the objective function in (6) is subject to the constraint in
(7)) to determine the annual hiring rate, Hi, needed to replenish the loss in cumulative experience
0
0.2
0.4
0.6
0.8
1
1.2
2010 2015 2020 2025 2030 2035 2040 2045 2050 2055
Pool Size Relative to Today
Year
0
10
20
30
40
50
60
70
2010 2015 2020 2025 2030 2035 2040 2045 2050 2055
Average Age
Year
0
0.2
0.4
0.6
0.8
1
1.2
2010 2015 2020 2025 2030 2035 2040 2045 2050 2055
Relative Experience
Year
(a)
(b)
(c)
Year
Year
Year
Fig. 4. (a) Pool size, (b) relative experience, and (c) average age. No hiring (H = 0) assumed.
Accepted for Publication: IEEE Trans. on Engineering Management
as a result of attrition and retirement. The estimation of the required hiring rate can be performed
at any point during the analysis or as frequently as needed to replenish the skills pool and
maintain the desired experience level. For this example, the operation is repeated periodically at
1 year time steps until the end of the analysis period (end of the system’s support life). Figure 5
shows results for hiring rate, Hi, as a function of the number of years from the start of the
analysis.
Figure 5 shows that no hiring is initially required (note, hiring is not allowed to drop below
0, which would reflect a layoff situation). Starting in 2017 a hiring rate of over 6% is required
for 9 years and settles to 2-5% thereafter. Note, when H is greater than zero in (1), the hiring
rate is assumed to be applied to the entire hiring age distribution, fH. It is important to note that
the required hiring rate in Fig. 5 takes into account both the time required to learn the skills
necessary to support the system and the exit age distribution found in Fig. 2.
The impact of the results developed in this example appears in Fig. 6, which shows the
annual cost of supporting the legacy control system from the chemical product manufacturing
company through year 2040 (over 2000 instances of the system require support in this case).
The cost model used is a stochastic discrete event simulator that samples time-to-failure
0.00%
1.00%
2.00%
3.00%
4.00%
5.00%
6.00%
7.00%
8.00%
9.00%
10.00%
2010 2015 2020 2025 2030 2035 2040 2045 2050 2055Required Hiring Rate to Maintain Pool Size
or Experience (%)
Year
Fig. 5. Required hiring rate, Hi, (if hiring is possible) to maintain pool experience.
Accepted for Publication: IEEE Trans. on Engineering Management
distributions for the components of the control system to obtain maintenance events (event dates
and components that need replacement). Subsystem-specific (and severity category specific9)
failure distributions are sampled to obtain failure dates for the system. At each maintenance
event, maintenance resources are drawn and a cost is estimated using (13). The majority of the
maintenance events do not result in business interrupt time because they only impact one of the
two parallel control systems and jbf = 0, however, a small fraction (the most severe events)
result in dual control system failures. The risk of dual failures and the resulting business
interrupt is captured by the differing severity categories. The specific data associated with the
system count, the subsystem/severity category reliabilities, and the cost of business interrupt
time is proprietary to the customer and not included here.
9 Severity categories indicate both the level of maintenance required (which translates into the maintenance resources needed) and the amount of business interrupt (if any) associated with the maintenance event.
$0
$2,000,000
$4,000,000
$6,000,000
$8,000,000
$10,000,000
$12,000,000
$14,000,000
$16,000,000
2010 2015 2020 2025 2030 2035 2040
Support Cost
Year
$2M
Human obsolescence not included
Human obsolescence included in cost model
Fig. 6. Annual cost to support a legacy system with and without the inclusion of skills obsolescence (H = 0). For this figure w = 0 (no cost of money), therefore, the results are actual dollars.
Accepted for Publication: IEEE Trans. on Engineering Management
For the case study, thresholdsf was determined to be 0.54, or once the number of people in the
pool drops below 54% of the number that are in the pool today (in 2010), extra maintenance time
penalty (modeled by (12)) will occur.
Two results are shown in Fig. 6. The results show that there is little effect of skill
obsolescence prior to 2025, after 2030 the impact of the loss of skills becomes significant. Note,
in year 2028 the availability of spares (hardware) also becomes a problem resulting in the cost
step shown between years 2028 and 2030. The lowest curve in Fig. 6 assumes that there is no
human obsolescence (icE = 1 for all i in (9)). Even in this case, the annual cost increases due to
part obsolescence that results in lifetime buys of parts that tie up significant capital in pre-
purchased parts and long-term holding costs is evident. The highest cost curve shown in Fig. 6
assumes that no replenishment of lost skills is possible (H = 0, which is close to reality for this
case study).
D. Non-Stationary Process Case Study Results
All the results presented so far are for the stationary process, i.e., the hiring age and exit age
distributions assumed to be the same at all times in the future. Figure 7 is for the same case as
Fig. 4b except that a non-stationary model with the following parameters: μ = 0.005, σ = 0.01
(hiring age), and μ = -0.005, σ = 0.01 (exit age) has been used. 100 samples are shown in Fig. 7
and the stationary result is indicated. It is evident that the stationary result represents a better-
than-average case. A histogram of the experience in 2021 relative to 2010 is shown as an inset
to demonstrate the utility of the non-stationary solution.
Accepted for Publication: IEEE Trans. on Engineering Management
E. Managerial Insights
Figure 6 is only an example result but it serves to demonstrate how the model developed in
this paper can be used to draw important managerial insights. The two questions posed in the
introduction to this paper were: (a) when will the company start to see the effects of the loss of
critical skills, and (b) how bad will it be? Based on Fig. 6, one would conclude that for the
example considered in the case study the impact of human obsolescence will not be noticed until
2030 and the cost of supporting the system as a result of impending human skills obsolescence
will increase to over $13M/year by 2040. What could management do with this information?
This paper offers a means to estimate the theoretical hiring strategy for replenishing a skills pool
based on experience. The result shown in Fig. 5 emphasizes a potential deficiency in the
planning and allocation of human skills to address future sustainment needs - unfortunately, in
the case study described in this paper, changing the hiring profile to that shown in Fig. 5 is
impractical (practically speaking, the hiring into the sustainment groups is close to zero). An
alternate strategy would be to manage sustainment needs through planned “phase-out” of
Fig. 7. Non-stationary model. Relative experience. No hiring (H = 0) assumed.
Accepted for Publication: IEEE Trans. on Engineering Management
existing systems. The result in Fig. 6 tells management that a replacement system needs to be
developed and deployed to the plants before 2030; this information can be used as part of a
business case made to higher levels of management to support long-term strategic planning. It
should be noted (and as demonstrated in Fig. 7) that the stationary solution in Fig. 6 is a better-
than-average case. Also, as discussed in the next section, while in the case study “break-fix”
may be adequately supported until 2030, the personnel performing break-fix tasks provide
additional value that will be lost long before 2030.
In addition to the application-specific insights, Fig. 2 in the case study supports the
supposition that younger workers leave legacy system sustainment jobs for other jobs in the
company - exit age peaks at 35 years and nobody exits the company between 45 and 55 years.
IV. DISCUSSION AND CONCLUSIONS
The general workforce planning problem is ensuring that you have the right number of
people, with the right skills sets, in the right jobs, at the right time. To address this problem in
cases where the workforce is non-replenishable, a model for the loss of critical skills necessary to
support legacy systems has been developed. The model estimates the number of skilled
employees (pool size) and the combined (cumulative) experience of the skills pool in order to
determine the resources that will be available to maintain a legacy system or family of systems.
The cumulative experience of the skills pool impacts the time (and thereby the cost) to perform
the necessary maintenance activities to sustain a system. Because of the prohibitive cost of
replacement, legacy systems are often not replaced until their age results in catastrophic failures
or support costs that are untenable. The model presented in this paper can be used to support
Accepted for Publication: IEEE Trans. on Engineering Management
business cases for system replacement by forecasting when issues will arise and how expensive
they will be.
Several simplifying assumptions have been made in the model described in this paper. In our
solution, we have assumed that everyone entering the pool has no experience – depending on the
application this may not be the case; some workers could enter the pool with relevant experience
gained from other positions. Our solution also assumes that the only way workers gain
experience is through years on the job. There may be other methods that can accelerate the rate
at which workers become more experienced, e.g., knowledge bases created to capture the
experience of older workers [26,27]. A discrete time analysis is presented in this paper because
the available input data only exists on an annual basis, however, a continuous time solution could
also be developed.
There are also indirect consequences of a shrinking skills pool that are difficult to quantify in
terms of money. The people who are maintaining systems (especially if they are engineers) are
most likely performing other tasks as well. Besides “break-fix,” their tasks may include
preventative maintenance, upgrade projects and knowledge transfer. As resources decrease, we
assume that all tasks except break-fix tasks would decrease; however, even if sufficient resources
are available for break-fix tasks, an acceleration in critical skills loss can occur as a result of
dwindling resources. Reducing non-break-fix tasks can help by increasing or improving: future
maintenance efficiency, system reliability that could decrease future maintenance requirements,
system performance, and retention of skilled people. There are many other factors that could
modify the case study results presented. These include geography (location in the world),
gender, the type of product produced, etc. These effects can certainly be analyzed with the
model if sufficient data existed to distinguish groups of workers and places of employment.
Accepted for Publication: IEEE Trans. on Engineering Management
The model presented in this paper has only been used to assess support (maintenance and
business interrupt) costs, but it could also be used to assess system availability penalties (beyond
business interrupts) that may be assessed by customers for systems subject to availability
contracts such as performance-based logistics contracts [40].
Finally, the initial use case presented in this paper should be followed-up with a larger
population that potentially enlists the cooperation of more than one company possibly via the
solicitation of support from relevant trade organizations whose members share a common
interest in this topic.
REFERENCES
[1] P. Sandborn, “Trapped on technology’s trailing edge,” IEEE Spectrum, vol. 45, no. 4, pp. 42-58, 2008.
[2] P. Sandborn, “Software obsolescence - Complicating the part and technology obsolescence management problem,” IEEE Transactions on Components and Packaging Technologies, vol. 30, no. 4, pp. 886-888, 2007.
[3] A. De Grip and J. Van Loo, “The economics of skills obsolescence: A review,” Research in Labor Economics, vol. 21, ed. A. De Grip, J. Van Loo, and K. Mayhew, Elsevier, pp. 1-26, 2002.
[4] A. De Grip, J. Van Loo, and K. Mayhew, The Econonics of Skills Obsolescence: Theoretical Innovations and Emperical Applications, Research in Labor Economics, vol. 21, Elsevier, 2002.
[5] F. Green, S. Machin, and D. Wilkinson, “The meaning and determinants of skills shortages,” Oxford Bulletin on Economics and Statistics, vol. 60, no. 1, pp. 165-187, May 1998.
[6] K. Melymuka, “Don’t just stand there – Get ready,” Computerworld, vol. 36, no. 26, pp. 48-49, 2002.
[7] D. Besanko, U. Doraszelski, Y. Kryukov, and M. Satterthwaite, “Learning-by-doing, organizational forgetting and industry dynamics,” Econometrica, vol. 78, no. 2, pp. 453-508, March 2010.
[8] C. D. Bailey, “Forgetting and the learning curve: A laboratory study,” Management Science, vol. 35, no. 3, pp. 340-352, March 1989.
[9] L. Argote, S. Beckman, D. and Epple, “The persistence and transfer of learning in an industrial setting,” Management Science, vol. 36, no. 2, pp. 140-154, 1990.
Accepted for Publication: IEEE Trans. on Engineering Management
[10] L. Argote, Organizational Learning: Creating, Retaining and Transferring Knowledge, Springer, New York, 2013.
[11] J. Waldman, “Measuring retention rather than turnover: A different and complementary HR calculus,” Human Resource Planning, vol. 27, no. 3, pp. 6-9, January 2004.
[12] “Nuclear workforce planning: Preparing for the long haul,” ScottMadden Perspectives, Spring 2008. Available at: http://www.scottmadden.com/insight/233/Nuclear-Workforce-Planning.html
[13] “Testimony of Elliot Pulham President & CEO, The Space Foundation, Before the Commission on the Future of the U.S. Aerospace Industry” May 14, 2002, Available at: http://www.spaceref.com/news/viewpr.html?pid=8335
[14] M. Leibold and S. Voelpel, Managing the Aging Workforce: Challenges and Solutions, John Wiley & Sons, New York, NY, 2006.
[15] E. Goodridge and M. K. McGee, “IT’s generation gap,” Informationweek.com, 2002.
[16] G. Hilson, “Generation gap is big iron’s only threat,” Computing Canada, vol., 27, no. 7, pp. 1 and 8, March 23, 2001.
[17] Congressional Record Volume 140, Number 65 (Monday, May 23, 1994). Available at: http://www.gpo.gov/fdsys/pkg/CREC-1994-05-23/html/CREC-1994-05-23-pt1-PgH68.htm
[18] S. Shead, “Universities won’t teach ‘uncool’ Cobol anymore – but should they?” ZDNet, March 7, 2013. Available at: http://www.zdnet.com/universities-wont-teach-uncool-cobol-anymore-but-should-they-7000012250/
[19] J. D. Ahrens, N. Prywes and E. Lock, “Software process reengineering: Towards a new generation of CASE technology,” J. Systems Software, vol. 20, pp. 71-84, 1995.
[20] W. S. Adolph, “Cash cow in the tar pit: Reengineering a legacy system,” IEEE Software, vol. 13, no. 3, pp. 41-47, 1996.
[21] D. M. Andolšek, “Knowledge hoarding or sharing,” Proceedings of the Management, Knowledge and Learning International Conference, Celje Slovenia, 2011.
[22] S. S. Dubin, “Obsolescence of lifelong education,” American Psychologist, vol. 27, pp. 486-498, 1972.
[23] J. Allen and A. De Grip, Skill Obsolescence Lifelong Learning and Labour Market Participation, Research Centre for Education and the Labour Market, Maastrict: Maastricht University, 2004.
[24] W. Arthur, W. Bennett Jr., P. L. Stanush, and T. L. McNelly, “Factors that influence skill decay and retention: A quantitiatve review and analysis,” Human Performance, vol. 11, no. 1, pp. 57-101, 1998.
[25] J. D. Hagman, Effects of Training Schedule and Equipment Variety on Retention and Transfer of Maintenance Skill, Alexandria, VA: U.S. Army Research Institute for Behavioral and Social Sciences, Research Report 1309, 1980.
[26] C. Joe and P. Yoong, “Harnessing the knowledge assets of older workers: A work in progress report,” Decision Support in an Uncertain and Complex World: The IFIP TC8/WG88.3 International Conference, pp. 392-401, 2004.
Accepted for Publication: IEEE Trans. on Engineering Management
[27] D. E. Hailey and C. E. Hailey, “The nature of process preservation: Tools for capturing and preserving professional skills.” Available at: http://imrl.usu.edu/publications/ProcessP.pdf
[28] B. Friel, “Reality check,” Government Executive, May 15, 2002.
[29] G. W. Bohlander and S. A. Snell, Managing Human Resources, 15th edition, Mason, Ohio: South-Western CENEGAGE Learning, 2010.
[30] S. K. Bordoloi, “Flexibility, adaptability and efficiency in dynamic manufacturing systems,” Journal of Production and Operations Management, vol. 8, no. 2, pp. 133-150, 1999.
[31] H.-C. Huang, L.-H. Lee, H. Song, and B. T. Eck, “SimMan - A simulation model for workforce capacity planning,” Computers and Operations Research, vol. 36, no. 8, pp. 2490-2497, 2009.
[32] B. A. Holt, Development of an Optimal Replenishment Policy for Human Capital Inventory, Ph.D. Dissertation from the University of Tennessee Knoxville, 2011.
[33] J. Koochaki, J. A. C. Bokhorst, H. Wortmann, and W. Klingenberg, “The influence of condition-based maintenance on workforce planning and maintenance scheduling,” International Journal of Production Research, vol. 51, no. 8, pp. 2339-2351, 2013.
[34] S. Martorell, M. Villamizar, S. Carlos, and A. Sanchez, “Maintenance modeling and optimization integrating human and material resources,” Reliability Engineering and System Safety, vol. 95, no. 12, pp. 1293-1239, December 2010.
[35] D. Ait-Kaki, J.-B. Menye, and H. Kane, “Resources assignment model in maintenance activities scheduling,” International Journal of Production Research, vol. 49, no. 22, pp. 6677-6689, November 2011.
[36] S. Ahire, G. Greenwood, A. Gupta, and M. Terwillinger, “Workforce-constrained preventative maintenance scheduling using evolution strategies,” Decision Sciences, vol. 31, no. 4, pp. 833-859, 2000.
[37] N. Gorjian, L. Ma, M. Mittinty, P. Yarlagadda, and Y. Sun. “A review on degradation models in reliability analysis,” Proceedings 4th World Congress on Engineering Asset Management, Athens, Greece, 2009.
[38] W. J. Fabrycky and B. S. Blanchard, Life-Cycle Cost and Economic Analysis, Prentice Hall, Upper Saddle River, NJ, 1991.
[39] D. Feng, P. Singh, and P. Sandborn, “Optimizing lifetime buys to minimize lifecycle cost," Proceedings of the 2007 Aging Aircraft Conference, Palm Springs, CA, April 2007.
[40] R. Beanum, “Performance-Based logistics and contractor support methods,” Proceedings of the IEEE Systems Readiness Technology Conference (AUTOTESTCON), 2006.