Mortality forecasting methods:
The Lee-Carter model vs. the Brass model. Estimation on Norwegian data.
Xuan Ngoc Thi Tran
Master of Philosophy in Economics Department of Economics
University of Oslo
May 2019
Acknowledgements
First and foremost, I would like to express my deepest gratitude to my supervisor, Nico Keilman.
Your generous guidance, insightful feedback and endless enthusiasm really paved the way for
this thesis.
A very special dedication to my parents, Suong Van Tran and Gai Thi Truong, whose long
journey, hard work and resilience made this journey possible.
I would also like to thank all my friends at SV who have been there since the very beginning, and
especially Sigri, who helped me to the finish line.
And finally, to CBB. Without you I would not have thrived in the rush.
Any remaining errors or shortcomings are entirely my own.
Abstract
The aim of this thesis is to compare two mortality forecasting models, the Lee-Carter model, and
the Brass model. The analysis in this thesis makes use of Statistics Norway’s population projection
report, which uses the Lee-Carter model in their forecasts, and the life table data for the Norwegian
population collected from SSB. In an attempt to find a simpler substitute to the Lee-Carter model,
I try to answer the three following questions: Will the Brass model result in a good fit to the
Norwegian data? Will the Brass model provide a simpler yet adequate substitute for mortality
forecasting using Norwegian data? How do the results for future mortality obtained by the Brass
model differ from those based on the Lee-Carter model?
Comparing the two referred models, the Brass model did indeed return different results from those
reported by SSB. The forecasted life expectancy values found in this thesis were consistently higher
than the Lee-Carter values. In 2060 I forecast a life expectancy of 91.02 years for women and 90.67
for men. This differed from the Lee-Carter results by 2.27 and 0.72 years, respectively. Both models
found the future life expectancies of men and women to intersect too early. This is due to a steep
increase in male life expectancy during the past few decades. SSB made an arbitrary adjustment in
female life expectancy projections to correct for this. My analysis shows a stronger argument for
adjusting male life expectancy, which suits the forecasted values of the Brass model better.
The analysis found both advantages and limitations using the Brass model. The regression returned
a good fit to Norwegian data. A logit transformation allows the relational Brass model to be
expressed linearly. This gives two parameters that permit a simple and intuitive interpretation of
the data. However, the logit transformation also made the estimation of prediction intervals
more complex. A simulation had to be performed, returning a wider prediction interval than
what was reported by SSB. It is premature to conclude whether the Brass model is better than
the Lee-Carter model, but a more comprehensive analysis is recommended.
Table of contents
1 Introduction and background
2 Data description
2.1 The survivorship function and life expectancy
2.2 Period Selection
3 The Model
3.1 The Brass model (1971)
3.1.1 The logit transformation
3.1.2 The standard life table
3.1.3 The parameters: 𝜶𝒕 and 𝜷𝒕
4 Estimation
4.1 Fitting the model
4.2 The fitted model
4.3 Results: alpha and beta
4.4 Expanding the base period
5 Forecasting models for the parameters
5.1 The Random walk
5.2 The unit root test
5.3 Forecasting results
5.3.1 Alpha
5.3.2 Beta
5.4 Life expectancy
5.5 The prediction interval (PI)
5.5.1 Finding the variance
5.5.2 Simulation
6 Summary and conclusion
7 References
List of figures and tables
Figures
Figure 2.1: Rectangularization
Figure 2.2: History of life expectancy in Norway (Source: FHI, 2018)
Figure 3.1: The logits of the standard life table, 1990 and 2017
Figure 4.1: Line graphs of the logits of $l_x$ for 1990 and 2017 against the standard
Figure 4.2: Estimated parameters for each year between 1990 and 2017. (a) $\alpha_t$ and (b) $\beta_t$
Figure 4.3: Fitted parameters $base_{1950}$ vs. $base_{1990}$. (a) $\alpha_t$, women; (b) $\alpha_t$, men; (c) $\beta_t$, women and (d) $\beta_t$, men
Figure 5.1: Forecasts with 95% prediction intervals
Figure 5.2: Forecasted life expectancy at birth for men and women, 10-year intervals between 2020-2060
Figure 5.3: SSB’s forecasted $e_0$ with 80% PI (Source: Statistics Norway, 2018)
Figure 5.4: Forecasted $e_0$ with 80% PI
Tables
Table 1.1: Comparability with SSB
Table 2.1: An excerpt of the survivorship function life table
Table 3.1: Interpretation of $\alpha_t$ and $\beta_t$ (Source: Rowland, 2003)
Table 4.1: Fitted values of $\alpha_t$ and $\beta_t$
Table 5.1: Life expectancy at birth ($e_0$) for men and women. Brass compared to SSB
Table 5.2: Variance of $\alpha_t$ in 2040 and 2060 for men and women
Table 5.3: Variance of forecasted $\beta_t$ for men and women
Table 5.4: Variance of forecasted $\varepsilon_t$ for men and women
Table 5.5: The values of $\alpha_t$, $\beta_t$ and $\varepsilon_t$ used in the simulation
1 Introduction and background
Statistics Norway (“Statistisk sentralbyrå”, abbreviated as SSB) publishes official population
projections for Norway at regular intervals. These projections include assumptions about the
future course of mortality. For more than a decade, SSB has used the so-called Lee-Carter model
for analysing the historical development of mortality and extrapolating it into the future. This
model, introduced by Ronald D. Lee and Lawrence Carter in 1992, starts from a table
of empirical age-specific death rates, one for men and one for women (SSB, 2018).
When the death rate in year $t$ for a population aged $x$ is written as $m_{x,t}$, the model is
$\ln(m_{x,t}) = a_x + b_x k_t + \varepsilon_{x,t}$ (1)
Parameters $a_x$, $b_x$ and $k_t$ are to be estimated from the data, while $\varepsilon_{x,t}$ is an error term. The
estimates of $a_x$, for all ages $x$, can be interpreted as the general age schedule of mortality. The
parameter $k_t$ reflects year-on-year changes in mortality. These changes are not the same for every
age. The parameter $b_x$ reflects how they differ across the ages. Once the parameters have been
estimated, predictions of mortality consist of extrapolating the time series of $k_t$-values. Parameters
$a_x$ and $b_x$ are kept constant.
The Lee-Carter model, as expressed in (1), was first introduced in 1992 for US data. Given an
unexpectedly good fit to data of other western countries, it has since become one of the most widely
used forecasting models for mortality. However, it has some limitations:
1) Estimating the parameters of the model is complicated because the right-hand side of
expression (1) does not have any independent variables, only parameters. One
solution is to use singular value decomposition of $\ln(m_{x,t}) - \hat{a}_x$. Parameters $a_x$ are
estimated as averages of $\ln(m_{x,t})$ across time (Lee and Carter, 1992).
2) Assuming a constant value of the parameters $b_x$ may lead to strong distortions in the age
pattern of predicted mortality.
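The estimation procedure described in limitation 1) can be sketched as follows. This is a minimal illustration under stated assumptions, not SSB's implementation: the function name `fit_lee_carter` and the toy input are hypothetical, the leading singular vectors are obtained by plain power iteration instead of a full singular value decomposition, and the usual normalisation (the age effects sum to one, the time index sums to zero) is assumed.

```python
import math

def fit_lee_carter(log_m, iterations=100):
    """Fit ln m[x][t] = a[x] + b[x] * k[t] to a table of log death rates.

    log_m[x][t] holds the log death rate for age index x and year index t.
    Returns (a, b, k), normalised so that sum(b) == 1 and sum(k) == 0.
    """
    n_ages, n_years = len(log_m), len(log_m[0])

    # a[x]: average of the log death rate across time
    a = [sum(row) / n_years for row in log_m]
    centred = [[log_m[x][t] - a[x] for t in range(n_years)] for x in range(n_ages)]

    # Power iteration towards the leading left singular vector of `centred`
    u = [1.0] * n_ages
    for _ in range(iterations):
        v = [sum(centred[x][t] * u[x] for x in range(n_ages)) for t in range(n_years)]
        u = [sum(centred[x][t] * v[t] for t in range(n_years)) for x in range(n_ages)]
        norm = math.sqrt(sum(value * value for value in u))
        u = [value / norm for value in u]

    # Rescale so that the age effects sum to one
    total = sum(u)
    b = [value / total for value in u]
    k = [total * sum(u[x] * centred[x][t] for x in range(n_ages)) for t in range(n_years)]
    return a, b, k

# Hypothetical toy table generated exactly from known parameters
true_a = [-5.0, -4.0, -3.0]
true_b = [0.5, 0.3, 0.2]
true_k = [2.0, 0.0, -2.0]
log_rates = [[true_a[x] + true_b[x] * true_k[t] for t in range(3)] for x in range(3)]
a_hat, b_hat, k_hat = fit_lee_carter(log_rates)
```

Even in this reduced form the fit needs an iterative matrix routine, which illustrates why the estimation is considered complicated compared with a linear regression.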
This paper attempts to overcome the complexity of the Lee-Carter model by applying the Brass
model. The Brass model was introduced in 1971 and with it, Brass created a new method to
generate life tables. It is one example of a so-called relational model life table, where a life table is
computed based on the relation to a “standard” life table (Rowland, 2003). The structure of the
model and the estimation of its parameters are considerably less complicated than for the Lee-Carter
model. Thus, a natural question is whether the Brass model is suitable for forecasting
Norwegian mortality. Brass himself was optimistic about using his own method in forecasting
(Brass, 1971). In their paper, Lee and Carter also refer to the Brass model as a feasible option for
mortality forecasting (Lee and Carter, 1992). The application of the Brass method in mortality
forecasts has been explored in a number of studies by researchers including Keyfitz (1991) and
Himes et al. (1994), but none with the use of Norwegian mortality data.
Therefore, the questions that I will attempt to answer in this thesis will be as follows:
1. Will the Brass model result in a good fit to the Norwegian data?
2. Will the Brass model provide a simpler yet adequate substitute for mortality forecasting
using Norwegian data?
3. How do the results for future mortality obtained by the Brass model differ from those based
on the Lee-Carter method?
To maintain comparability, I will throughout this thesis use SSB’s approach as a guideline for many
of the adjustments that I make in this analysis. The appropriate time period to use will also be
considered, and therefore all available data were collected. Calculations were done using the statistical
software packages Stata and R. A summary of the comparability with SSB’s analysis can be found in the
table below, with further justifications explained throughout the text.
                      SSB                     This thesis
Age group             0-119                   0-110
Base period           1990-2017               1990-2017
Forecasting horizon   2018-2100 (82 years)    2018-2060 (43 years)
Table 1.1: Comparability with SSB
2 Data description
The Norwegian life table data used in this paper was collected from two sources: Statistics Norway
(SSB) and the Human Mortality Database (HMD). Public access to SSB data only dates back to
1966 (SSB: Statistikkbanken, 2018). To get the entire timeline, the dataset from SSB (1966-2017)
was combined with HMD-data from 1846 to 1965.
In the SSB report for Norway’s 2018 population projections, the period 1990-2017 was used as the
base to calculate future mortality. It is assumed that the dataset I have collected reflects the one
used in the SSB report. To maintain comparability I will therefore mainly use the data provided by
SSB. It must be noted that the ages in the available life tables range from 0 to 106. I have extended
the range to age 110 by assuming a death probability of 0.5 for each year from 107 to 110. In SSB’s
report the maximum age is set to 119. However, the main purpose of this paper is
ultimately to find the life expectancy at birth. Given the small and variable number of survivors at
these ages, along with the calculation methods used in this thesis, the exclusion of the age
group 110-119 is not expected to have a significant effect on the results.
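The extension of the life table to age 110 can be sketched as below. This is an illustration only: the function name `extend_life_table` and the input values are hypothetical, and the sketch assumes that the death probability of 0.5 also governs the step from age 106 to age 107, which the text does not state explicitly.

```python
def extend_life_table(lx, last_age=110, death_prob=0.5):
    """Extend a survivorship column to `last_age`.

    lx[x] is the number of survivors at exact age x, for ages 0..len(lx)-1.
    Each added age keeps a share (1 - death_prob) of the previous survivors.
    """
    extended = list(lx)
    while len(extended) - 1 < last_age:
        extended.append(extended[-1] * (1.0 - death_prob))
    return extended

# Hypothetical column for ages 0..106 that ends with 8 survivors at age 106
observed = [100000.0] * 106 + [8.0]
full = extend_life_table(observed)
```

With so few survivors left at age 106, the added ages contribute very little to any quantity summed over the whole column.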
In this paper we only use the survivorship function (“$l_x$-column”) of period life tables. This is a
cumulative probability function, where the $l_x$-column describes the number of survivors at exact
age $x$ per 100,000 births based on a given set of age-specific mortality rates. Here, I use one-year
age groups. The survivorship function shows the distribution for each age, starting at birth and
ending at age 110. The annual data is available by sex, and for reasons that will be further explained
later on, I will analyse the data for men and women separately. An excerpt of the dataset is
illustrated in table 2.1.
Table 2.1: An excerpt of the survivorship function life table
2.1 The survivorship function and life expectancy
Figure 2.1 plots the survivorship function for the Norwegian population based on age-specific death
rates from the years 1846, 1950, 1990 and 2017. The illustration shows a clear pattern in the
Norwegian survival distribution: over the years, deaths have become increasingly concentrated at
older ages. Deaths were more evenly distributed across ages in 1846, when roughly 59% of
women and 54% of men survived until age 50. In 2017, the corresponding numbers had increased
to 98% and 97%, respectively. Improved living conditions, a better health system and the prevention of
diseases in the 150-year period have led to a drastic decline in infant and child mortality, which has
now reached an all-time low (FHI, 2018). An increasing proportion survive till maturity or older
ages and it is expected that this development will continue in the future (SSB, 2018). This tendency,
where the graph shifts toward the upper right corner making it look like a rectangle, is called
rectangularization (Rowland, 2003).
Figure 2.1: Rectangularization
Life expectancy ($e_x$) is a measure of the average time a population group is expected to live based
on a given set of age-specific death rates. In the past, war, poverty and poor health were main
factors affecting the age at which people died. Technological improvements and the introduction
of penicillin in 1925 led to a great reduction of infectious diseases and deaths (Meslé and Vallin,
2011).
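To make the link between the survivorship function and life expectancy at birth concrete, the sketch below uses a common textbook approximation. It is not necessarily the calculation used in this thesis or by SSB: it assumes that deaths are spread evenly within each one-year age interval and that nobody survives beyond the last age in the table. The function name and the toy column are hypothetical.

```python
def life_expectancy_at_birth(lx):
    """Approximate e0 from survivors lx[x] at exact ages x = 0, 1, 2, ...

    Person-years lived in each age interval are approximated by the average
    of the survivors at the start and at the end of the interval.
    """
    closed = list(lx) + [0.0]  # assumption: no survivors after the last age
    person_years = sum((closed[x] + closed[x + 1]) / 2.0 for x in range(len(lx)))
    return person_years / lx[0]

# Hypothetical three-age table: half die in the first year, the rest in the second
toy_e0 = life_expectancy_at_birth([100000.0, 50000.0, 0.0])
```

In this toy table 75,000 person-years are lived in the first year and 25,000 in the second, so the function returns exactly 1.0.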
Figure 2.2 shows the life expectancy at birth for males and females in Norway during the period
1846-2016. As illustrated, the curves follow different paths. The life expectancy for women has,
for any given point in time, exceeded the life expectancy for men. Since 1925, life expectancy for
the female population has steadily increased, while the trajectory for the male population has been
less consistent. The gender gap in life expectancy was at its greatest in the time frame 1950-1990,
peaking around the 1980s. From 1990 onwards, men’s life expectancy has risen more steeply
than women’s, which has led to a dramatic decrease in the life expectancy gender
gap.
Interestingly, a gender gap in life expectancy may be rooted in numerous factors related to biology
and psychology, ultimately leading to behaviours and lifestyle choices that affect life expectancy
(Oksuzyan et al., 2006). For the peak in the life expectancy gender gap during 1950-1990, the
most prominent explanatory factor has been tobacco use. The Norwegian Institute of Public Health
(FHI) reports that the proportion of men smoking peaked in the 1960s, followed by a steep decline.
For women, on the other hand, the smoking proportions were constant during the same time frame
and did not start to decline until the 1990s. A greater change in men’s habits has now led to a drastic
reduction in the gap. SSB reports that, from 2017 onwards, smoking is expected to be less
important for the gender gap (SSB, 2018). This is further supported by a recent
Swedish study which pointed out that smoking plays a minor role for the current gender gap in
the Swedish population (Sundberg et al., 2018). Given these differences between the sexes, the
forecasting of life expectancy for men and women will be performed separately.
Figure 2.2: History of life expectancy in Norway
Source: FHI, 2018
2.2 Period Selection
In mortality forecasting it is important to consider what past period should be used to extrapolate
future mortality (Keyfitz, 1991). As I have pointed out previously, the Norwegian life expectancy
patterns have varied greatly with time, especially for the male population. Choosing the appropriate
dataset as the base must therefore be done with careful consideration. SSB has noted this in its
reports, as its projections dating from before 2016 consistently turned out to be underestimates. In
newer reports, the base period has been changed to 1990-2017, which has been found to reduce the
issue of underestimation (SSB, 2018). Choosing the same base period will not only maintain
comparability between the studies, but I also find this to be a reasonable time frame to use given
the consistency in life expectancy trajectories for both sexes in the period, as can be seen in figure
2.2.
The SSB report uses prediction values for each year leading to 2100. Given the relatively short base
period (n = 28 years), a forecasting horizon of that length is subject to large inaccuracies. As
SSB uses 2060 as one of its main reference years, I will restrict my forecasting horizon to this
year. This gives a shorter forecasting horizon of 43 years (m = 43).
3 The Model
When William Brass introduced his method in 1971, he revolutionized how life tables for
populations with low-quality or incomplete datasets were constructed. His mathematical relational
method allowed life tables to be constructed independently of historical data (Preston et al., 2004).
The World Health Organization (WHO) presents the Brass logit system as one of the main model
life tables, along with commonly used life tables like the UN model life table and the Coale-Demeny
model (WHO, 2000). In this section I will lay out the foundations of the Brass model,
followed by a description of the application to mortality forecasting on Norwegian data.
3.1 The Brass model (1971)
When Brass developed his relational method, he noticed that the relational curve between two life
tables becomes approximately a straight line after logit transformation. This discovery allowed the
relationship between two survivorship functions to be expressed linearly:
$y_{x,t} = \alpha_t + \beta_t y_{x,s} + \varepsilon_t$ (2)
The $y$ on the left-hand side represents the logit transformed survivorship function for a given year,
$t$, and a given age, $x$. Similarly the $y$ on the right-hand side has also been logit transformed, but
this one represents a single reference point, which in the literature is referred to as
the standard life table. This is denoted by the subscript $s$ (Newell, 1988). Parameters $\alpha_t$ and $\beta_t$ are to be
estimated from the data, while $\varepsilon_t$ is an error term. Conventionally, by using available data one can
therefore make reasonable assumptions about $\alpha_t$ and $\beta_t$, and together with the standard life table,
compute the life table for a given year.
3.1.1 The logit transformation
Following the model, the first key step consists of transforming the $l_x$-values by logit
transformation:
1. Divide $l_x$ by 100 000 to express it as a proportion, $p_x$.
2. Logit transform $p_x$ by using the following expression:
$\mathrm{logit}(p_x) = 0.5 \ln\left(\frac{1 - p_x}{p_x}\right)$ (3)
Here we follow Brass’ original approach. Alternatively we could have defined $\mathrm{logit}(p_x)$ as
$\ln(p_x/(1 - p_x))$.
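The two steps above can be sketched in a few lines. The function names and the survivor counts are hypothetical. Note that the transformation is undefined when the proportion equals exactly 0 or 1, so in this sketch age 0 (where all 100 000 births are still alive) is left out; how such ages were handled in the thesis is not stated.

```python
import math

def brass_logit(p):
    """Brass logit of a survival proportion p, defined for 0 < p < 1."""
    return 0.5 * math.log((1.0 - p) / p)

def inverse_brass_logit(y):
    """Recover the survival proportion from its Brass logit."""
    return 1.0 / (1.0 + math.exp(2.0 * y))

# Hypothetical survivors per 100 000 births at exact ages 0, 1, 2, 3 and 4
lx = [100000, 99700, 99650, 98000, 60000]

# Step 1: express as a proportion; step 2: logit transform (age 0 skipped)
logits = [brass_logit(value / 100000) for value in lx[1:]]
```

With this definition, proportions above one half give negative logits, so lower mortality corresponds to more negative values.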
3.1.2 The standard life table
In relevant literature it has been stated that “…Any standard can be chosen, but it is sensible to
choose one that shows some sort of average pattern.” (Newell, 1988, p. 155). Brass has defined a
general standard life table, but he recommended using a life table that better reflects the population
given that the data is available (Brass, 1971).
Given the complete datasets available for the Norwegian population, I found the former to be the
most sensible approach, that is, using an average of the observed life tables. The standard life table can
therefore be found by the following expression:
$l_{x,s} = \frac{1}{n} \sum_{t=t_0}^{T} l_{x,t}$ (4)
where $t_0$ is the start of the base period, $T$ is the end and $n = T - t_0 + 1$ is the number of years in
the given period.
As stated in the period selection section, the base period is 1990-2017, which gives the following
calculation of the standard life table:
$l_{x,s} = \frac{1}{28} \sum_{t=1990}^{2017} l_{x,t}$ (5)
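As an illustration of the averaging in expressions (4) and (5), the sketch below computes a standard life table from a set of annual survivorship columns. The function name, the dictionary layout and the toy numbers are assumptions made for the example.

```python
def standard_life_table(lx_by_year, first_year, last_year):
    """Average the survivorship column over the base period, age by age.

    lx_by_year maps a calendar year to a list of survivors by exact age.
    """
    years = range(first_year, last_year + 1)
    n = len(years)  # number of years in the base period
    n_ages = len(lx_by_year[first_year])
    return [sum(lx_by_year[t][x] for t in years) / n for x in range(n_ages)]

# Hypothetical three-year base period with three ages
toy_tables = {
    1990: [100000.0, 99000.0, 98000.0],
    1991: [100000.0, 99200.0, 98300.0],
    1992: [100000.0, 99400.0, 98600.0],
}
standard = standard_life_table(toy_tables, 1990, 1992)
```

For the base period 1990-2017 the same function would be called with `first_year=1990` and `last_year=2017`, giving a divisor of 28.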
The values of the logit transformed life table are illustrated in figure 3.1. As expected, the standard
falls between the curves for 1990 and 2017. Using the standard life table to find the life expectancy at
birth would return values close to those observed around 2003.
Figure 3.1: The logits of the standard life table, 1990 and 2017.
3.1.3 The parameters: 𝜶𝒕 and 𝜷𝒕
Given the standard, the Brass model summarizes age-specific mortality for a fixed year $t$ in two
parameters, $\alpha_t$ and $\beta_t$. The estimate of $\alpha_t$ can be interpreted as the level of mortality in year $t$. The
second parameter, $\beta_t$, denotes the relationship between childhood and adult mortality. It must be
noted that both interpretations must be seen relative to the standard. A computed life table using
$\alpha_t = 0$ and $\beta_t = 1$ returns a life table identical to the standard life table.
In relevant literature, Brass’ general standard life table has been used to find appropriate ranges
for $\alpha_t$ and $\beta_t$. Reasonable values are illustrated in table 3.1, where the range for $\alpha_t$ is set between
-1.5 and 0.8 and for $\beta_t$ between 0.6 and 1.4. Note that the general standard life table returns a life
expectancy close to that of the mid-1940s. As my standard life table is close to 2003, the ranges in
my analysis could differ.
Given the interpretations of $\alpha_t$ and $\beta_t$, it is possible to form expectations about the regression results.
For both men and women in the Norwegian population, life expectancy has steadily increased
over the past few decades. SSB has reported that this number will continue to increase in the future
(SSB, 2018). As time passes, $\alpha_t$ is therefore expected to become more and more negative.
During the past decade, life expectancy has reached an all-time high and correspondingly, child
mortality has reached an all-time low (FHI, 2018). If either of these values improves in the
future, the marginal change is expected to be quite small. It is therefore reasonable to assume that
the child-adult mortality relationship will remain quite steady, with a $\beta_t$ close to one.
Parameter    Value   Interpretation
$\alpha_t$   -1.5    Higher life expectancy
             0.0     Life expectancy the same as the standard
             0.8     Low life expectancy
$\beta_t$    0.6     High infant and child mortality, low adult mortality
             1.0     The relationship between adult and child mortality the same as the standard
             1.4     Low infant and child mortality, high adult mortality
Table 3.1: Interpretation of $\alpha_t$ and $\beta_t$ (Source: Rowland, 2003)
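The role of the two parameters can be illustrated by generating a life table from a standard. The sketch below applies the linear relation to the logits of a standard and transforms back to survival proportions. The function name and the toy standard are hypothetical, and the error term is set to zero.

```python
import math

def relational_survivorship(standard_p, alpha, beta):
    """Survival proportions implied by alpha and beta, relative to a standard."""
    result = []
    for p_s in standard_p:
        y_s = 0.5 * math.log((1.0 - p_s) / p_s)  # Brass logit of the standard
        y = alpha + beta * y_s                   # linear relation, no error term
        result.append(1.0 / (1.0 + math.exp(2.0 * y)))  # back to a proportion
    return result

# Hypothetical standard survival proportions at four increasing ages
toy_standard = [0.99, 0.95, 0.60, 0.10]
same = relational_survivorship(toy_standard, 0.0, 1.0)
lower_mortality = relational_survivorship(toy_standard, -0.5, 1.0)
```

Setting the level parameter to zero and the slope to one reproduces the standard, while a negative level parameter raises survival at every age, in line with the table above.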
4 Estimation
4.1 Fitting the model
The linearity of model (2) allows the parameters $\alpha_t$ and $\beta_t$ to be estimated by an ordinary least
squares (OLS) regression. In his paper, Brass expressed a concern about applying OLS to fit the
estimates. Using the United Nations mortality schedule he found irregularities in the youngest and
oldest age groups. In particular, when comparing the fitted values against the observed values, he
noted great discrepancies at age one and in the oldest age groups, where the model did not fit the data well.
With the possibility of errors in reporting, Brass suggested weighted least squares (WLS) as an
alternative method. He reasoned that putting less weight on these groups in the regression could
solve the issue. However, he concluded that the simplicity of OLS outweighs the
“arbitrary and laborious” application of the other, ultimately preferring OLS (Brass, 1971).
Stewart (2004) performed an evaluation of the statistical methods used in the Brass relational
model, using the West model level 22 stable life table as the underlying mortality distribution.
Testing five different methods, which were based on a variation of OLS, WLS and maximum
likelihood estimation (MLE), he concluded that the MLE produced the most efficient estimates.
There is an additional concern. The survivorship function $l_x$ is not observed, but is constructed
from estimated age-specific death rates. An appropriate approach would be to use data on death
counts and numbers of persons alive, both broken down by age, use a Poisson model to estimate
the death rates and their standard errors, and include these standard errors in the estimation
procedure for $\alpha_t$ and $\beta_t$. However, I consider this method too complicated for the purpose of this thesis.
In the introduction I listed the level of “complexity” as one of the limitations of the Lee-Carter
model. Although WLS and MLE have been suggested, the logit transformation of the data
makes the application of these methods quite difficult. It is also possible that the Norwegian
population data fit the model better than the data Brass examined. Thus, staying true to Brass and the
underlying idea of this thesis, I will proceed using the simple method of OLS.
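For a single year, the OLS fit reduces to a simple regression of that year's logits on the logits of the standard. A minimal sketch using the closed-form least squares solution is given below. The function name and the toy logits are hypothetical, and the thesis itself used Stata and R for this step.

```python
def fit_brass_ols(y_standard, y_year):
    """Regress one year's logits on the standard logits.

    Returns (alpha, beta, r_squared) for y_year = alpha + beta * y_standard.
    """
    n = len(y_standard)
    mean_s = sum(y_standard) / n
    mean_y = sum(y_year) / n
    sxx = sum((s - mean_s) ** 2 for s in y_standard)
    sxy = sum((s - mean_s) * (y - mean_y) for s, y in zip(y_standard, y_year))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_s
    ss_res = sum((y - alpha - beta * s) ** 2 for s, y in zip(y_standard, y_year))
    ss_tot = sum((y - mean_y) ** 2 for y in y_year)
    return alpha, beta, 1.0 - ss_res / ss_tot

# Hypothetical logits: the year lies exactly on a line relative to the standard
standard_logits = [-2.0, -1.0, 0.0, 1.0]
year_logits = [-0.2 + 0.9 * s for s in standard_logits]
alpha_hat, beta_hat, r_squared = fit_brass_ols(standard_logits, year_logits)
```

Repeating the fit for every year in the base period gives one pair of parameter estimates per year, that is, the time series used later in the forecast.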
4.2 The fitted model
An important assumption regarding the error term is that it has zero expectation and a constant
variance. A violation of this assumption could lead to inefficient estimates and, at the very
worst, bias (Carter Hill et al., 2008). When I checked for heteroscedasticity in the residuals,
the patterns were found to be slightly irregular. Therefore, to account for
heteroscedasticity, the OLS regression was performed with robust standard errors.
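To show what robust standard errors involve, the sketch below computes White's heteroscedasticity-consistent (HC0) standard errors for a simple regression with an intercept. This is an assumption about the general technique, not a reproduction of the software output used in the thesis; statistical packages often apply an additional small-sample scaling. The function name and the data are hypothetical.

```python
import math

def robust_standard_errors(x, y):
    """OLS fit of y = alpha + beta * x with White (HC0) standard errors.

    Returns (alpha, beta, se_alpha, se_beta).
    """
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx

    # (X'X)^-1 for the design matrix with columns [1, x]
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    alpha = inv[0][0] * sy + inv[0][1] * sxy
    beta = inv[1][0] * sy + inv[1][1] * sxy

    # Middle part of the sandwich: sum of squared residuals times x_i x_i'
    squared = [(yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y)]
    m00 = sum(squared)
    m01 = sum(e2 * xi for e2, xi in zip(squared, x))
    m11 = sum(e2 * xi * xi for e2, xi in zip(squared, x))

    # Diagonal elements of inv * middle * inv
    var_alpha = (inv[0][0] * (m00 * inv[0][0] + m01 * inv[1][0])
                 + inv[0][1] * (m01 * inv[0][0] + m11 * inv[1][0]))
    var_beta = (inv[1][0] * (m00 * inv[0][1] + m01 * inv[1][1])
                + inv[1][1] * (m01 * inv[0][1] + m11 * inv[1][1]))
    return alpha, beta, math.sqrt(var_alpha), math.sqrt(var_beta)

# Hypothetical data with a symmetric pattern that is easy to verify by hand
fit = robust_standard_errors([-1.0, 0.0, 1.0], [0.0, 1.0, 0.0])
```

The point estimates are the ordinary OLS estimates. Only the standard errors change, which is why the fit itself is unaffected by this choice.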
Regardless of the use of robust standard errors, I found the model to return a good fit to the
data. For both men and women, all the reported $R^2$-values were greater than 0.99. The estimated
values of $\alpha_t$ and $\beta_t$ with their corresponding standard errors (SE) and $R^2$-values are listed in table
4.1. This table gives the time series dataset for $\alpha_t$ and $\beta_t$, which will be used in the mortality
forecast analysis.
Table 4.1: Fitted values of $\alpha_t$ and $\beta_t$
In figure 4.1, the empirical survivorship function (in logit form) $y_t$ is plotted against the fitted
values. Fitting the model by OLS on Norwegian data, I did not find the issues Brass referred
to, except for minor tendencies in the oldest age group in 2017 for men, as illustrated by the green
curve in figure 4.1 (b).
4.3 Results: alpha and beta
As the parameters $\alpha_t$ and $\beta_t$ lay the foundation for the mortality forecast, it is important to have a
closer look at their behaviour. Figure 4.2 shows the estimated $\alpha_t$ and $\beta_t$ parameters for each year
between 1990 and 2017. The green curves represent women, while the blue represent men. A fitted
dashed line has been added to each of the parameters to give a better illustration of the slopes.
Both parameters seem to behave as expected. $\alpha_t$ for both men and women follows a
downward slope, with men following a steeper trajectory. This reflects that men in the last
few decades have had a greater improvement in life expectancy compared to women, as noted in
section 2.1.
The parameter $\beta_t$ for both sexes fluctuates around a value close to one, indicating little change
in the relationship between child and adult mortality in the past two decades. While the $\beta_t$ for women
follows a close to constant path, the men’s path tilts slightly downwards. Following a similar reasoning
as for $\alpha_t$, a great reduction in the mortality rate for men has naturally had a greater impact on the
child-adult mortality relationship.
Figure 4.1: Line graphs of the logits of $l_x$ for 1990 and 2017 against the standard
4.4 Expanding the base period
In section 2.2 I set the base period to be 1990-2017 (“$base_{1990}$”). To further emphasize the
importance of period selection and to see the implications of choosing a different one, I performed
an analysis using a longer time span. I did this by expanding the base period, starting in 1950
(“$base_{1950}$”) instead of 1990. The year 1950 was chosen somewhat arbitrarily, but looking back at
figure 2.2 this period marks the start of modern life in Norway. Living conditions improved remarkably and war,
hunger and pandemics were no longer determining factors in life expectancy. This makes it a more
realistic option than using the whole timeline since 1846.
I recomputed the standard using expression (6) and re-estimated $\alpha_t$ and $\beta_t$ for every year in the
68-year period. The results are illustrated in figure 4.3.
$l_{x,s} = \frac{1}{68} \sum_{t=1950}^{2017} l_{x,t}$ (6)
Figure 4.3 shows the results using $base_{1950}$ (blue) compared to using $base_{1990}$ (green). $\alpha_t$ for
women is illustrated in (a) and for men in (b). Similarly for $\beta_t$, (c) represents women and (d) men.
The dashed lines have been added to illustrate the slopes of the curves. Given the different base
period there is a natural shift in all the curves.
With the exception of $\alpha_t$ for women, the change in the base period returned remarkably different
results. In figure 4.3 (b) we can see that using $base_{1990}$ returned a steeper slope than $base_{1950}$.
This reflects how men during the past few decades have had a remarkable increase in life
expectancy compared to previous years. The female child-adult mortality relationship follows a
more intuitive path, as seen in figure 4.3 (c): with remarkable decreases in infant mortality in
1950-1990, it is natural that the curve is steeper for $base_{1950}$ than for $base_{1990}$.
Figure 4.2: Estimated parameters for each year between 1990 and 2017. (a) $\alpha_t$ and (b) $\beta_t$.
The flat trend for men using $base_{1950}$ in figure 4.3 (d) is a little misleading, as it makes it seem that
the relationship has been constant since the 1950s. This is due to the high volatility in male
mortality combined with a decreasing infant mortality.
This analysis further confirms the importance of period selection. Men’s trajectory is probably
more extreme than it will be in the future, but overall it seems that the development of the past few
decades reflects the future better. In what follows, I will restrict myself to parameter estimates for
1990-2017 (see Table 4.1).
Figure 4.3: Fitted parameters, $base_{1950}$ vs. $base_{1990}$.
(a) $\alpha_t$, women; (b) $\alpha_t$, men; (c) $\beta_t$, women and (d) $\beta_t$, men.
5 Forecasting models for the parameters
Now that the parameters have been estimated and evaluated, forecasting mortality consists
of extrapolating the time series of $\alpha_t$ and $\beta_t$ values. To produce appropriate forecasting
models for the parameters, I applied basic time series theory to the data.
5.1 The Random walk
As illustrated in figure 4.2, both $\alpha_t$ and $\beta_t$ for men and women fluctuate around a given trend.
These fluctuations are seemingly random, which points to the random walk model as a suitable
model. The random walk is one of the fundamental models in time series modelling and is defined
by the following expression:
$$y_t = y_{t-1} + c + \varepsilon_t \qquad (7)$$
where $y_t$ represents the current value of the variable of interest, $y_{t-1}$ the value in the previous
period, $c$ is a constant which denotes a drift and $\varepsilon_t$ is a random error, $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$.
$\sigma_\varepsilon^2$ and $c$ are the parameters to be estimated. When $c$ is equal to zero, the random walk model is
without drift. As this model only incorporates one lagged value of $y_t$, it is a special case of the
autoregressive model of order 1 (AR(1)) in which the coefficient on $y_{t-1}$ equals one.
Iain Currie produced a technical note on the estimation of a random walk with drift, which I follow
here (Currie, 2010). The first step in estimating the random walk with drift is to compute the first
difference:
$$z_t = y_t - y_{t-1} \qquad (8)$$
which gives:
$$z_t = c + \varepsilon_t \qquad (9)$$
The parameter $c$ can then be estimated as the average increment by the following equation:
$$\hat{c} = \bar{z} = \frac{1}{n-1} \sum_{t=2}^{n} z_t \qquad (10)$$
where $n = 28$.
The variance $\sigma_\varepsilon^2$ is estimated as the sample variance of $z_t$:
$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{t=2}^{n} (z_t - \bar{z})^2 \qquad (11)$$
The variance and standard error of $\hat{c}$ are as follows:
$$Var(\hat{c}) = Var(\bar{z}) = \frac{\hat{\sigma}^2}{n-1} \qquad (12)$$
which gives:
$$SE(\hat{c}) = \frac{\hat{\sigma}}{\sqrt{n-1}} \qquad (13)$$
The 95% prediction interval for $y$ predicted in the year $(n + m)$ can be found using the following
expression:
$$\hat{y}_{n+m} \pm t \cdot SE(\hat{y}_{n+m}) \qquad (14)$$
$$SE(\hat{y}_{n+m}) = m \cdot SE(\hat{c}) \qquad (15)$$
where $m$ denotes the number of years forecasted ahead. For example, with a forecast until 2060, $m = 43$.
As $n = 28$, $t = 2.04841$.
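Equations (8) to (15) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the thesis: the function names and the toy series are hypothetical, the point forecast is taken to be the last observation plus $m$ times the estimated drift, and the $t$-value is passed in by hand (2.04841 is the value quoted in the text).

```python
import math


def fit_random_walk_with_drift(y):
    """Return (c_hat, sigma2_hat, se_c) following equations (8)-(13)."""
    n = len(y)
    z = [y[t] - y[t - 1] for t in range(1, n)]                 # (8): n - 1 first differences
    c_hat = sum(z) / (n - 1)                                   # (10): average increment
    sigma2_hat = sum((v - c_hat) ** 2 for v in z) / (n - 2)    # (11): sample variance of z
    se_c = math.sqrt(sigma2_hat / (n - 1))                     # (12)-(13)
    return c_hat, sigma2_hat, se_c


def forecast_interval(y_last, c_hat, se_c, m, t_value):
    """Point forecast m steps ahead with the interval of equations (14)-(15)."""
    point = y_last + m * c_hat
    half_width = t_value * m * se_c
    return point - half_width, point, point + half_width


# Toy series (illustrative numbers only, not the estimated alpha values).
toy_series = [0.0, -0.1, -0.1, -0.3, -0.4]
c_hat, sigma2_hat, se_c = fit_random_walk_with_drift(toy_series)
lower, point, upper = forecast_interval(toy_series[-1], c_hat, se_c, m=10, t_value=2.04841)
```

Because the first differences telescope, the estimated drift equals the total change over the period divided by $n - 1$, so it depends only on the first and last observation.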
5.2 The unit root test
Now, let’s go back to equation (7): $y_t = y_{t-1} + c + \varepsilon_t$.
This equation implies a second parameter, $c_0$:
$$y_t = c_0 y_{t-1} + c + \varepsilon_t \qquad (16)$$
where $c_0$ is equal to one. This is called a unit root, and it has an important feature: by
verifying the value of $c_0$ one can identify the proper autoregressive model.
$|c_0| < 1$ denotes stationarity and $c_0 = 1$ denotes nonstationarity. The random walk model is a
nonstationary AR(1). To confirm the random walk as the proper model for my data I need to check
that the condition $c_0 = 1$ is fulfilled (Carter Hill et al., 2008).
To verify that my estimated parameters follow a random walk, I therefore test whether $c_0 = 1$.
In this thesis I use the Augmented Dickey-Fuller (ADF) test, which tests the null hypothesis of
$c_0 = 1$ against the alternative $c_0 < 1$. The test can be performed on models with or without a drift.
5.3 Forecasting results
5.3.1 Alpha
For both men and women, $\alpha_t$ follows a downward trend, which indicates a drift. I therefore run the
following time series model:
$$\alpha_t = c_0 \alpha_{t-1} + c + \varepsilon_t \qquad (17)$$
To verify that this time series model suits the data, I run an ADF test. I fail to reject the null
hypothesis that $c_0 = 1$ at all significance levels, both for women and for
men. I therefore conclude that the time series are nonstationary and that the random walk model with
drift fits the data well for the $\alpha_t$ parameters. The preferred models, with standard errors in
parentheses, are as follows:
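Women:
$$\hat{\alpha}_t = \alpha_{t-1} \underset{(0.0058)}{-\,0.0142} + \varepsilon_t, \qquad \hat{\sigma}_\varepsilon^2 = 0.0009 \qquad (18)$$
Men:
$$\hat{\alpha}_t = \alpha_{t-1} \underset{(0.0070)}{-\,0.0213} + \varepsilon_t, \qquad \hat{\sigma}_\varepsilon^2 = 0.0013 \qquad (19)$$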
The constant terms, -0.0142 and -0.0213, indicate the average annual change in $\alpha_t$ for women and
men respectively. Over the 43-year forecasting horizon, $\alpha_t$ is therefore forecasted to change by 43
times the drift, a total change of -0.6106 for women and -0.9159 for men. This corresponds to an
$\alpha_t$ value of -0.8089 for women and -1.2084 for men in 2060.
Figure 5.1 (a) and (b) plot the past values of $\alpha_t$ for women and men along with the forecasts
based on the time series model and the associated 95% prediction intervals. It can be seen that the
forecasts for men follow a steeper trend than the forecasts for women, which naturally follows the
trend of the original data for 1990-2017.
Figure 5.1: Forecasts with 95% prediction intervals
5.3.2 Beta
Figure 4.2 (b) illustrates that $\beta_t$ for both men and women fluctuates around a value close to 1. As
$\beta_t$ for men appears to follow a (slightly) downward trend, while it follows a flatter trend
for women, I test the following two time series models on both sexes:
With drift:
$$\beta_t = c_0 \beta_{t-1} + c + \varepsilon_t \qquad (20)$$
Without a drift:
$$\beta_t = c_0 \beta_{t-1} + \varepsilon_t \qquad (21)$$
Performing the ADF test with a drift for both sexes, the null of $c_0 = 1$ was rejected at
all significance levels for women. For men it could not be rejected at the 1% and 5% significance
levels. Repeating the process with the model without a drift, the null could not be rejected at
the 1% and 5% levels for women, and it could not be rejected at any significance level for men. I
believe that the test for men returned mixed results due to the slight downward trend, which
is very small in absolute terms. My conclusion is that the indication of a unit root
was stronger for both sexes using the model without a drift. The random walk model without a drift
is given by the following expression:
$$\beta_t = \beta_{t-1} + \varepsilon_t \qquad (22)$$
A random walk without a drift can be interpreted as a constant value with random fluctuations
given by the error term, $\varepsilon_t$. As $\beta_t$ for both men and women randomly fluctuates around a constant
value close to one, I perform a hypothesis test to see whether the parameter can be expressed as
such a constant. The test of $H_0: \beta_t = 1$ against an alternative returned mixed results.
For both sexes, only 20% of the tested values did not reject the null.
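Due to the mixed results I instead set $\beta_t$ to be the average of its historical values, given by:
$$\hat{k} = \frac{1}{n} \sum_{t=1990}^{2017} \hat{\beta}_t \qquad (23)$$
where $n$ is the number of years in 1990-2017. Assuming independence between the $\hat{\beta}_t$ for the
subsequent years, the variance can be found by the following:
$$var(\hat{k}) = var\left(\frac{1}{n} \sum_{t=1990}^{2017} \hat{\beta}_t\right) = \frac{1}{n^2} \sum_{t=1990}^{2017} \left(SE(\hat{\beta}_t)\right)^2 \qquad (24)$$
My preferred model for forecasting $\beta_t$ is as follows:
$$\hat{\beta}_t = \hat{k} + \varepsilon_t$$
which gives the following preferred models, with standard errors in parentheses:
Women:
$$\hat{\beta}_t = \underset{(0.0007)}{1.0097} + \varepsilon_t, \qquad \hat{\sigma}_\varepsilon^2 = 0.0002 \qquad (25)$$
Men:
$$\hat{\beta}_t = \underset{(0.0012)}{1.0174} + \varepsilon_t, \qquad \hat{\sigma}_\varepsilon^2 = 0.0005 \qquad (26)$$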
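Equations (23) and (24) amount to a plain average and, under the independence assumption, a sum of squared standard errors divided by $n^2$. A minimal sketch in Python, with a hypothetical function name and toy inputs that are not the estimated values from the thesis:

```python
import math


def mean_with_se(estimates, standard_errors):
    """Average of yearly estimates (23) and the standard error of that average (24)."""
    n = len(estimates)
    k_hat = sum(estimates) / n
    # Independence between years is assumed, so the variances simply add up.
    var_k = sum(se ** 2 for se in standard_errors) / n ** 2
    return k_hat, math.sqrt(var_k)


# Toy inputs: four yearly beta estimates, each with the same standard error.
toy_betas = [1.02, 1.00, 1.01, 0.99]
toy_ses = [0.004, 0.004, 0.004, 0.004]
k_hat, se_k = mean_with_se(toy_betas, toy_ses)
```

With equal yearly standard errors the result reduces to the familiar rule that the standard error of a mean is the individual standard error divided by $\sqrt{n}$, here 0.004 / 2.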
The $\beta_t$ model for women assumes a constant value equal to 1.0097. As the value is very close to
1, I assume that the future development of the child-adult mortality relationship will stay close to the
standard. For men, I also assume a close to constant development of the child-adult mortality
relationship in the future, but the estimate, 1.0174, lies slightly further from the standard
than the estimate for women.
5.4 Life expectancy
With the forecasted parameters in place, life expectancy values at birth can be computed.
Life expectancy can be found using the following expression:
$$e_x = 0.5 \cdot p_x + \sum_{t=x+1}^{n} p_t \qquad (27)$$
Here $n = 106$, while $p_x = \frac{l_x}{100000}$ as stated in section 3.1.1.
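For $x = 0$, expression (27) can be sketched as below. The function name and the toy life table are assumptions for illustration; the toy table describes a hypothetical population in which everybody survives to age 79 and dies before age 80.

```python
def life_expectancy_at_birth(survivors, radix=100000):
    """Expression (27) for x = 0: half of p_0 plus the sum of p_1, ..., p_n."""
    p = [l_x / radix for l_x in survivors]   # p_x = l_x / 100000
    return 0.5 * p[0] + sum(p[1:])


# Toy life table for ages 0..106: l_x = 100000 up to age 79, zero afterwards.
toy_survivors = [100000] * 80 + [0] * 27
e0 = life_expectancy_at_birth(toy_survivors)
```

The toy table gives 79.5 years, which is the midpoint of the year of age in which all deaths occur and shows the role of the factor 0.5 in the formula.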
With the fitted values at 10-year intervals from 1990 to 2010 and the forecasted values from 2020
onwards, I found the life expectancy values at birth presented in table 5.1. 2017 has been
included as a reference point.
As expected, the forecasts using the Brass model differ from those presented in the SSB
report. In table 5.1 we can see that the model returns life expectancies below SSB in 1990 and 2000
for women, while from 2010 onwards they are higher. For men, the values lie below SSB until 2010
and above thereafter. Counting only the Brass values that lie above SSB, the difference for women
varies between a minimum of 0.25 years in 2010 and a maximum of 0.72 years in 2060, and it generally
increases over time. For men this difference starts at 0.37 years in 2017 and ends at 2.27 years in 2060.
Table 5.1: Life expectancy at birth ($e_0$) for men and women. Brass compared to SSB
        Brass             SSB
year    Women    Men      Women    Men
1990    79.55    72.82    79.81    73.44
2000    81.07    75.67    81.38    75.96
2010    83.40    78.74    83.15    78.85
2017    84.67    81.28    84.28    80.91
2020    85.18    82.08    84.70    81.60
2030    86.81    84.58    86.40    83.60
2040    88.31    86.83    87.80    85.40
2050    89.72    88.85    89.10    87.00
2060    91.02    90.67    90.30    88.40
Source: SSB.no, 2018
By 2060 the Brass model forecasts a life expectancy at birth of 91.02 for women and 90.67 for
men. The corresponding values from the SSB report are 90.30 and 88.40, which differ by 0.72 and
2.27 years respectively.
In the SSB report, the life expectancy of women was arbitrarily adjusted in 2060 in order to avoid a
situation in which the life expectancy of men would exceed that of women. They did this by
increasing the life expectancy of women in 2060 by 0.8 years. Without the adjustment, the life
expectancies of men and women would intersect by 2100, which according to SSB is too early on the
basis of demographic trend analysis of Norwegian data. I experience the same issue in my analysis.
In the forecast, life expectancy for men has a large marginal increase in each 10-year interval
between 2020 and 2060. By 2060 the difference between Brass and SSB for men is about 3 times as large
as the difference for women in the same year. The determining factor behind this issue is the same
for both SSB and my analysis: due to the large increase in life expectancy for men over the past few
decades, a mechanical forecast using this trend could lead to an overestimation, as the same trend is
not expected to continue at the same rate in the future. The Brass values presented here are
unadjusted, indicating an intersection between the sexes in the early 2060s. This clearly indicates
that the Brass model results in a larger overestimation than the method used by SSB.
Figure 5.2: Forecasted life expectancy at birth for men and women, 10-year intervals between 2020-2060
To avoid too early an intersection of the paths, SSB listed two possible solutions: they could either
make an upward adjustment for women or a downward adjustment for men. They
did the former and justified this choice by their consistent underestimation of life expectancy
in the past. This was despite the correction they had already made by choosing a shorter base period,
as mentioned in section 2.2. This fact, along with the apparent “over-forecast” of male life
expectancy, makes a stronger argument for a downward adjustment in the path for men, which
certainly corresponds well with the results that I have presented.
5.5 The prediction interval (PI)
Including a prediction interval for the life expectancy values would provide a more complete
depiction of the results. As illustrated in figure 5.3, SSB reports results using an 80% prediction
interval, which corresponds to an 80% probability that the true life expectancy value lies within
this interval. Here, I will do the same.
Figure 5.3: SSB’s forecasted $e_0$ with 80% PI
Source: Statistics Norway, 2018
Figure 5.4: Forecasted $e_0$ with 80% PI
5.5.1 Finding the variance
The variance is a necessary measure in computing the prediction interval. But due to the logit
transformation of the original data, the variance of the $y_{x,t}$ in model (2) cannot be linearly
translated into a variance of the $e_x$ in the same year. An attempt to derive the variance analytically
proved to be a tedious and complex task. Thus, an alternative method was used to produce an estimated
prediction interval: simulation.
5.5.2 Simulation
The idea behind the simulation is that a prediction interval for $e_x$ can be found by simulating the
distribution of $e_x$. This can be done by using the variances of $\alpha_t$, $\beta_t$ and $\varepsilon_t$ in model (2).
Random draws are generated from a normal distribution for each of these parameters, using the
standard deviation derived from its variance. Combining these draws yields a data set consisting of
$n$ sets of $y_{x,t}$ values. By transforming each of these into $e_x$, the data set forms a histogram
(simulated distribution) of $e_x$ in year $t$, from which the prediction interval can be found. A complete
depiction would require a prediction interval for every $t$ from 2018 to 2060. But given that the manual
method only returns values for one year at a time, I restrict the simulation to the years 2040 and 2060
for both sexes.
The first step in the simulation is to find the variance of each of the parameters in 2040 and 2060.
a) Variance of the forecasted $\alpha_t$
Let’s have a second look at equation (17):
$$\alpha_t = \alpha_{t-1} + c + e_t$$
For 2060 this gives:
$$\alpha_{2060} = \alpha_{2017} + 43 \cdot \hat{c} + \sum_{t=2018}^{2060} e_t \qquad (28)$$
The variance of $\alpha_{2060}$ is therefore:
$$var(\alpha_{2060}) = var(\alpha_{2017}) + 43^2 \cdot var(\hat{c}) + 43 \cdot var(e_t) \qquad (29)$$
Similarly for 2040, the variance can be found by the following expression:
$$var(\alpha_{2040}) = var(\alpha_{2017}) + 23^2 \cdot var(\hat{c}) + 23 \cdot var(e_t) \qquad (30)$$
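The arithmetic of equations (29) and (30) can be sketched as follows. The inputs are the rounded drift standard errors and error variances reported with equations (18) and (19). Since $var(\alpha_{2017})$ is not reported in this section, it is set to zero here, which is an assumption of the sketch; the function and variable names are likewise hypothetical.

```python
import math


def forecast_variance(m, se_drift, sigma2_error, var_start=0.0):
    """Equations (29)-(30): variance of alpha forecasted m years ahead."""
    return var_start + m ** 2 * se_drift ** 2 + m * sigma2_error


# (standard error of the drift, error variance) as reported in (18) and (19).
inputs = {"women": (0.0058, 0.0009), "men": (0.0070, 0.0013)}

# Standard deviation of the forecasted alpha by sex and target year.
sd = {
    (sex, year): math.sqrt(forecast_variance(year - 2017, se_c, s2))
    for sex, (se_c, s2) in inputs.items()
    for year in (2040, 2060)
}
```

With these rounded inputs the resulting standard deviations come out close to the values used in the simulation (table 5.5), for example about 0.318 for women and 0.383 for men in 2060. The drift term grows with $m^2$ while the error term grows with $m$, so uncertainty about the drift dominates at long horizons.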
The results (rounded to 4 decimals):
Table 5.2: Variance of $\alpha_t$ in 2040 and 2060 for men and women
         $var(\alpha_{2060})$    $var(\alpha_{2040})$
Men      0.5636                  0.03848
Women    0.1010                  0.0559
b) Variance of the forecasted $\beta_t$
$$\beta_t = \hat{k} + e_t \qquad (31)$$
$$\beta_{2060} = \hat{k} + e_t \qquad (32)$$
$$var(\beta_t) = var(\hat{k}) + var(e_t) \qquad (33)$$
where $var(\hat{k})$ is defined in equation (24).
The results:
Table 5.3: Variance of forecasted $\beta_t$ for men and women
         $var(\beta_t)$
Men      0.0005
Women    0.0002
c) Variance of the forecasted $\varepsilon_t$
As mentioned in section 4.2, the variance of the model should be constant, as heteroscedasticity
could lead to inefficiency in the estimates. Therefore, ideally $var(\varepsilon_t)$ should be the same for every
year $t$. When $\alpha_t$ and $\beta_t$ were estimated for every year from 1990 to 2017, the variance for each
estimation was small, but with slightly different values. Therefore I set the general variance of the
error term to be the average of the estimations:
$$\frac{\sum_{t=1990}^{2017} var(\varepsilon_t)}{\text{no. of years}} \qquad (34)$$
The results:
Table 5.4: Variance of forecasted $\varepsilon_t$ for men and women
         $var(\varepsilon_t)$
Men      0.0029
Women    0.0015
The statistical program R can generate random draws from a normal distribution given a set of
inputs: mean, standard deviation (SD) and number of observations (n). Here I set $n = 1000$ for
each generation. Since I want to find the prediction interval for the forecasted values, these
are set as the mean. The inputs I used are summarized in table 5.5.
The random extraction generated 1000 data points for each of the parameters. These were
combined using equation (2), which resulted in 1000 sets of $y_{x,t}$ values for men and 1000 for women,
a total of 2000 tables. In the end, computing the distribution of the life expectancy at birth ($e_0$) in
2040 and 2060 resulted in the curves illustrated in figure 5.4. For women, the estimated 80% prediction
interval had a lower bound of 85.47 and an upper bound of 90.79 in 2040. For men, the lower bound
was 83.58 and the upper bound 90.02. In 2060, the corresponding bounds were 87.04 and 94.12 for
women and 86.36 and 94.01 for men. There is a large overlap
between the PIs of the two sexes. Men’s PI was wider due to a greater dispersion in the data. In 2060
the computation returned medians close to the forecasted values of 91.02 for women and 90.67 for
men. The mean values were 90.77 and 90.36, respectively.
The simulation performed here returned a much wider prediction interval than that reported by
SSB. In 2060 SSB’s forecast lies within a prediction interval of roughly ±2 years. The prediction
interval here is almost double the size, at roughly ±4 years. The reason for this difference is
unclear, because the way SSB constructed its 80% prediction interval is not well documented
(SSB, 2018).
Table 5.5: The values of $\alpha_t$, $\beta_t$ and $\varepsilon_t$ used in the simulation
parameter            mean        SD
Men
$\alpha_{2060}$      -1.2084     0.3822
$\alpha_{2040}$      -0.78258    0.2363
$\beta_t$             1.0174     0.0226
$\varepsilon_t$       0          0.0539
Women
$\alpha_{2060}$      -0.8089     0.3177
$\alpha_{2040}$      -0.52424    0.1964
$\beta_t$             1.0097     0.0138
$\varepsilon_t$       0          0.0386
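The simulation procedure described in section 5.5.2 can be sketched as follows. This is an illustration in Python rather than the R procedure used for the thesis, and it rests on several assumptions that are not taken from the text: the standard is a hypothetical Gompertz survival curve rather than the Norwegian standard, the logit is taken in the Brass convention $Y = 0.5 \ln((1 - l)/l)$, $l_0$ is fixed at 1, and one error draw is shared by all ages within a simulated table. The numbers it produces are therefore not comparable to those reported above.

```python
import math
import random
import statistics

AGES = range(1, 107)  # ages 1..106; l_0 is fixed at 1 and handled separately


def gompertz_survival(age, a=2e-5, b=0.105):
    """Hypothetical standard survival curve (NOT the Norwegian standard)."""
    return math.exp(-(a / b) * (math.exp(b * age) - 1.0))


def logit(l):
    """Assumed Brass logit convention: Y = 0.5 * ln((1 - l) / l)."""
    return 0.5 * math.log((1.0 - l) / l)


STANDARD_LOGITS = [logit(gompertz_survival(x)) for x in AGES]


def life_expectancy(alpha, beta, eps):
    """Map one parameter draw to e_0 via the linear logit model and expression (27)."""
    total = 0.5  # 0.5 * p_0 with p_0 = 1
    for y_s in STANDARD_LOGITS:
        y = alpha + beta * y_s + eps
        total += 1.0 / (1.0 + math.exp(2.0 * y))  # inverse logit gives p_x
    return total


def simulate_interval(alpha, beta, sd_alpha, sd_beta, sd_eps, draws=1000, seed=0):
    """Return (10th percentile, median, 90th percentile) of simulated e_0."""
    rng = random.Random(seed)
    e0 = [
        life_expectancy(
            rng.gauss(alpha, sd_alpha), rng.gauss(beta, sd_beta), rng.gauss(0.0, sd_eps)
        )
        for _ in range(draws)
    ]
    deciles = statistics.quantiles(e0, n=10)  # nine cut points: 10%, ..., 90%
    return deciles[0], statistics.median(e0), deciles[-1]


# Means and SDs for women in 2060 as listed in table 5.5.
low, mid, high = simulate_interval(-0.8089, 1.0097, 0.3177, 0.0138, 0.0386)
```

The 80% interval is read off as the range between the 10th and 90th percentiles of the simulated $e_0$ values. Because the mapping from the parameters to $e_0$ is nonlinear, the simulated distribution need not be symmetric around the point forecast.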
6 Summary and conclusion
Applying the Brass model to Norwegian population data, I found a satisfactory model to use in
mortality forecasting. It was shown that the model fits the data well, overcoming some of the issues
previously raised by Brass. The parameters are easy and intuitive to interpret given the linearity of
the model. As opposed to the Lee-Carter model, where $k_t$ is the only parameter to be forecasted,
the Brass model requires two parameters, $\alpha_t$ and $\beta_t$. This yields a more flexible model.
Not surprisingly, the model returned results that differed from what was reported by SSB. In the
end, the forecasted Brass values resulted in a higher life expectancy at birth in 2060 for both men
and women. Most notably, the great improvement in male life expectancy over the past few decades
led to an aggressive projection of the future trajectory. The same issue was found in SSB’s
analysis, and an arbitrary adjustment had been performed in order to depict a more realistic forecast.
Given their justification, a similar line of argument would make a stronger case for using the Brass
model.
One of the criticisms of the Lee-Carter model was its complexity. The Brass model also had a few
limitations. In constructing the prediction interval, a simulation had to be conducted, as it was
difficult to find the variance of the life expectancy. Overall, the forecasted values that I presented
were greater than those reported by SSB. This could indicate a general overestimation in using the
Brass model, or, alternatively, an underestimation of $e_0$ by SSB.
One possible improvement of my approach is to re-estimate the model by including time series
expressions for $\alpha_t$ and $\beta_t$ directly in expression (2). In that case, the only parameters to be
estimated are $c$ and $k$, as well as the variance of $\varepsilon_t$. Also, one has to select an appropriate starting
value $\alpha_{1970}$. Time constraints did not allow me to implement this approach.
In this thesis, the simple approaches were preferred over the more complex. In the process I may
therefore have left out other suitable methods that could produce more precise results. Given the
many factors that need to be considered in forecasting, among them the appropriate base period,
forecasting models and so on, it is premature to conclude whether the Brass model is better
than the Lee-Carter model. Hopefully, a more comprehensive analysis could prove in favour of Brass.
7 References
Brass, W. (1971). On the scale of mortality. Biological Aspects of Demography, William Brass (ed.). New York: Barnes & Noble Inc.
Carter Hill, R., Griffiths, W. E. & Lim, G. C. (2008). Principles of Econometrics. USA: John Wiley & Sons, Inc. 3rd edition.
Currie, I. (2010). Volatility v Trend Risk: A technical note on estimating and forecasting with random walk with drift. In: Longevitas (online document). [updated (21.04.2019); read (01.05.2019)]. Available from <https://www.longevitas.co.uk/site/informationmatrix/volatilityv.trendrisk.html>
Folkehelseinstituttet (FHI) (2018). Life expectancy in Norway. In: Public Health Report - Health status in Norway (online document). Oslo: Norwegian Institute of Public Health [updated (04.10.2018); read (01.04.2019)]. Available from <https://www.fhi.no/en/op/hin/population/life-expectancy/>
Himes, C. L., Preston, S. H. & Condran, G. (1994). A Relational Model of Mortality at Older Ages in Low Mortality Countries. Population Studies, Vol. 48(2), pp. 269-291.
Keyfitz, N. (1991). Experiments in the Projection of Mortality. Canadian Studies in Population, Vol. 18(2), pp. 1-17.
Lee, R. D. & Carter, L. R. (1992). Modeling and forecasting U. S. mortality. Journal of the American Statistical Association, 87(419), pp. 659-671.
Newell, C. (1988). Methods and models in demography. Great Britain: Belhaven Press.
Meslé, F. & Vallin, J. (2011). Historical trends in mortality. In: R.G. Rogers & E.G. Crimmins (eds). International handbook of adult mortality. New York: Springer, pp. 9–47.
World Health Organisation (WHO) (2000). WHO System of Model Life Tables. GPE Discussion Paper Series. No. 8, World Health Organisation.
Preston, S. H., Heuveline, P. & Guillot, M. (2001). Demography: measuring and modeling population processes.
Oksuzyan, A., Juel, K., Vaupel, J. W. & Christensen, K. (2008). Men: good health and high mortality. Sex differences in health and aging. Aging Clinical and Experimental Research, Vol. 20(2), pp. 91-102.
Rowland, D. T. (2003). Demographic methods and concepts. New York: Oxford University Press Inc.
Statistics Norway (2018). Norway’s 2018 population projections: Main results, methods and assumptions. Report no. 2018/22, Statistics Norway.
Statistics Norway (2018). Statistikkbanken: Døde, dødelighetstabeller. [updated (28.06.2018); read (01.04.2019)]. Available from: <https://www.ssb.no/statbank/table/07902/>
Stewart, Q. T. (2004). Brass’ Relational Model: A Statistical Analysis. Mathematical Population Studies, Vol. 11(1), pp. 51-72.
Sundberg, L., Agahi, N., Fritzell, J. & Fors, S. (2018). Why is the gender gap in life expectancy decreasing? The impact of age- and cause-specific mortality in Sweden 1997-2014. International Journal of Public Health, Vol. 63(6), pp. 673-681.