
Mortality forecasting methods:

The Lee-Carter model vs. the Brass model. Estimation on Norwegian data.

Xuan Ngoc Thi Tran

Master of Philosophy in Economics

Department of Economics

University of Oslo

May 2019

Acknowledgements

First and foremost, I would like to express my deepest gratitude to my supervisor, Nico Keilman.

Your generous guidance, insightful feedback and endless enthusiasm really paved the way for

this thesis.

A very special dedication to my parents, Suong Van Tran and Gai Thi Truong, whose long

journey, hard work and resilience made this journey possible.

I would also like to thank all my friends at SV who have been there since the very beginning, and

especially Sigri, who helped me to the finish line.

And finally, to CBB. Without you I would not have thrived in the rush.

Any remaining errors or shortcomings are entirely my own.

Abstract

The aim of this thesis is to compare two mortality forecasting models: the Lee-Carter model and the Brass model. The analysis makes use of Statistics Norway’s population projection report, which uses the Lee-Carter model in its forecasts, and of life table data for the Norwegian population collected from SSB. In an attempt to find a simpler substitute for the Lee-Carter model, I try to answer the following three questions: Will the Brass model result in a good fit to the Norwegian data? Will the Brass model be a simpler yet adequate substitute for mortality forecasting using Norwegian data? How do the results for future mortality obtained by the Brass model differ from those based on the Lee-Carter model?

Comparing the two models, the Brass model did indeed return different results from those reported by SSB. The forecasted life expectancy values found in this thesis were consistently higher than the Lee-Carter values. For 2060 I forecast a life expectancy of 91.02 years for women and 90.67 years for men. These differ from the Lee-Carter results by 2.27 and 0.72 years, respectively. Both models found the future life expectancies of men and women to intersect too early. This is due to a steep increase in male life expectancy during the past few decades. SSB made an arbitrary adjustment in female life expectancy projections to correct for this. My analysis shows a stronger argument for adjusting male life expectancy, which suits the forecasted values of the Brass model better.

The analysis found both advantages and limitations in using the Brass model. The regression returned a good fit to Norwegian data. A logit transformation allows the relational Brass model to be expressed linearly. This gives two parameters that permit a simple and intuitive interpretation of the data. However, the logit transformation also made the estimation of prediction intervals more complex. A simulation had to be performed, which returned a wider prediction interval than the one reported by SSB. It is premature to conclude whether the Brass model is better than the Lee-Carter model, but a more comprehensive analysis is recommended.

Table of contents

1 Introduction and background
2 Data description
  2.1 The survivorship function and life expectancy
  2.2 Period Selection
3 The Model
  3.1 The Brass model (1971)
    3.1.1 The logit transformation
    3.1.2 The standard life table
    3.1.3 The parameters: 𝜶𝒕 and 𝜷𝒕
4 Estimation
  4.1 Fitting the model
  4.2 The fitted model
  4.3 Results: alpha and beta
  4.4 Expanding the base period
5 Forecasting models for the parameters
  5.1 The Random walk
  5.2 The unit root test
  5.3 Forecasting results
    5.3.1 Alpha
    5.3.2 Beta
  5.4 Life expectancy
  5.5 The prediction interval (PI)
    5.5.1 Finding the variance
    5.5.2 Simulation
6 Summary and conclusion
7 References

List of figures and tables

Figures

Figure 2.1: Rectangularization
Figure 2.2: History of life expectancy in Norway. Source: FHI, 2018
Figure 3.1: The logits of the standard life table, 1990 and 2017
Figure 4.1: Line graphs of the logits of 𝑙𝑥 for 1990 and 2017 against the standard
Figure 4.2: Estimated parameters for each year between 1990 and 2017. (a) 𝛼𝑡 and (b) 𝛽𝑡
Figure 4.3: Fitted parameters 𝑏𝑎𝑠𝑒1950 vs. 𝑏𝑎𝑠𝑒1990. (a) 𝛼𝑡, women; (b) 𝛼𝑡, men; (c) 𝛽𝑡, women and (d) 𝛽𝑡, men
Figure 5.1: Forecasts with 95% prediction intervals
Figure 5.2: Forecasted life expectancy at birth for men and women, 10-year intervals between 2020-2060
Figure 5.3: SSB’s forecasted e0 with 80% PI. Source: Statistics Norway, 2018
Figure 5.4: Forecasted e0 with 80% PI

Tables

Table 1.1: Comparability with SSB
Table 2.1: An excerpt of the survivorship function life table
Table 3.1: Interpretation of 𝛼𝑡 and 𝛽𝑡. Source: Rowland, 2003
Table 4.1: Fitted values of 𝛼𝑡 and 𝛽𝑡
Table 5.1: Life expectancy at birth (𝑒0) for men and women. Brass compared to SSB
Table 5.2: Variance of 𝛼𝑡 in 2040 and 2060 for men and women
Table 5.3: Variance of forecasted 𝛽𝑡 for men and women
Table 5.4: Variance of forecasted 𝜀𝑡 for men and women
Table 5.5: The values of 𝛼𝑡, 𝛽𝑡 and 𝜀𝑡 used in the simulation


1 Introduction and background

Statistics Norway (“Statistisk sentralbyrå”, abbreviated as SSB) publishes official population projections for Norway at regular intervals. Part of these projections consists of assumptions on the future course of mortality. For more than a decade, SSB has used the so-called Lee-Carter model for analysing the historical development of mortality and extrapolating it into the future. This model, originally constructed by Ronald D. Lee and Lawrence Carter in 1992, starts from a table of empirical age-specific death rates, one for men and one for women (SSB, 2018).

When the death rate in year $t$ for a population aged $x$ is written as $m_{x,t}$, the model is

$$\ln(m_{x,t}) = a_x + b_x k_t + \varepsilon_{x,t} \qquad (1)$$

Parameters $a_x$, $b_x$ and $k_t$ are to be estimated from the data, while $\varepsilon_{x,t}$ is an error term. The estimates of $a_x$, for all ages $x$, can be interpreted as the general age schedule of mortality. The parameter $k_t$ reflects year-on-year changes in mortality. These changes are not the same for every age. The parameter $b_x$ reflects how they differ across the ages. Once the parameters have been estimated, predictions of mortality consist of extrapolating the time series of $k_t$-values. Parameters $a_x$ and $b_x$ are kept constant.

The Lee-Carter model, as expressed in (1), was first introduced in 1992 for US data. Given an

unexpectedly good fit to data of other western countries, it has since become one of the most widely

used forecasting models for mortality. However, it has some limitations:

1) Estimating the parameters of the model is complicated because the right-hand side of expression (1) does not have any independent variables; there are only parameters. One solution is to use singular value decomposition of $\ln(m_{x,t}) - \hat{a}_x$. Parameters $a_x$ are estimated as averages of $\ln(m_{x,t})$ across time (Lee and Carter, 1992).

2) Assuming a constant value of the parameters $b_x$ may lead to strong distortions in the age pattern of predicted mortality.
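The estimation step described in limitation 1) can be sketched in a few lines. This is an illustration only, not SSB's or Lee and Carter's implementation: it estimates $a_x$ as time averages and then finds the leading singular vectors of the centred matrix by power iteration, which stands in for a library SVD routine. The function name `fit_lee_carter`, the starting vector and the toy numbers are assumptions made for the example; the normalisation $\sum_x b_x = 1$ (which here implies $\sum_t k_t = 0$) follows the customary convention.

```python
import math

def fit_lee_carter(log_m, n_iter=100):
    """Fit ln(m[x][t]) = a[x] + b[x] * k[t] by a rank-1 approximation.

    log_m: list of rows (one per age x), each a list over years t.
    Power iteration is used here in place of a library SVD (assumption).
    """
    n_ages, n_years = len(log_m), len(log_m[0])
    # a_x: average of ln(m_{x,t}) across time
    a = [sum(row) / n_years for row in log_m]
    # centred matrix z[x][t] = ln(m_{x,t}) - a_x
    z = [[log_m[x][t] - a[x] for t in range(n_years)] for x in range(n_ages)]

    # start from a linear trend; a constant vector would be mapped to zero
    # because every row of z sums to zero
    v = [t - (n_years - 1) / 2 for t in range(n_years)]
    for _ in range(n_iter):
        u = [sum(z[x][t] * v[t] for t in range(n_years)) for x in range(n_ages)]
        norm_u = math.sqrt(sum(val * val for val in u))  # zero if no time pattern
        u = [val / norm_u for val in u]
        v = [sum(z[x][t] * u[x] for x in range(n_ages)) for t in range(n_years)]

    # rescale so that the b_x sum to one; k_t absorbs the scale
    s = sum(u)
    b = [val / s for val in u]
    k = [val * s for val in v]
    return a, b, k

# Toy input (made-up numbers): 3 ages x 4 years built from known parameters.
a_true = [-5.0, -4.0, -3.0]
b_true = [0.2, 0.3, 0.5]
k_true = [3.0, 1.0, -1.0, -3.0]
log_m = [[a + b * k for k in k_true] for a, b in zip(a_true, b_true)]
a_hat, b_hat, k_hat = fit_lee_carter(log_m)
```

Because the toy matrix is exactly of rank one after centring, the sketch recovers the parameters it was built from; real death rates would leave a residual.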


This paper attempts to overcome the complexity of the Lee-Carter model by applying the Brass model. With this model, introduced in 1971, Brass created a new method to generate life tables. It is one example of a so-called relational model life table, where a life table is computed based on its relation to a “standard” life table (Rowland, 2003). The structure of the model and the estimation of its parameters are considerably less complicated than for the Lee-Carter model. Thus, a natural question is whether the Brass model is suitable for forecasting Norwegian mortality. Brass himself was optimistic about using his method in forecasting (Brass, 1971). In their paper, Lee and Carter also refer to the Brass model as a feasible option for mortality forecasting (Lee and Carter, 1992). The application of the Brass method in mortality forecasts has been explored in a number of studies by researchers including Keyfitz (1991) and Himes et al. (1994), but none with the use of Norwegian mortality data.

Therefore, the questions that I will attempt to answer in this thesis will be as follows:

1. Will the Brass model result in a good fit to the Norwegian data?

2. Will the Brass model be a simpler yet adequate substitute for mortality forecasting using Norwegian data?

3. How do the results for future mortality obtained by the Brass model differ from those based

on the Lee-Carter method?

To maintain comparability, I will throughout this thesis use SSB’s approach as a guideline for many of the adjustments that I make in this analysis. The appropriate time period to use will also be considered, so all available data were collected. Calculations were done using the statistical software packages Stata and R. A summary of the comparability with SSB’s analysis can be found in the table below, with further justifications explained throughout the text.

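|                     | SSB                  | This thesis          |
|---------------------|----------------------|----------------------|
| Age group           | 0-119                | 0-110                |
| Base period         | 1990-2017            | 1990-2017            |
| Forecasting horizon | 2018-2100 (82 years) | 2018-2060 (43 years) |

Table 1.1: Comparability with SSB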


2 Data description

The Norwegian life table data used in this paper was collected from two sources: Statistics Norway

(SSB) and the Human Mortality Database (HMD). Public access to SSB data only dates back to

1966 (SSB: Statistikkbanken, 2018). To get the entire timeline, the dataset from SSB (1966-2017)

was combined with HMD-data from 1846 to 1965.

In the SSB report for Norway’s 2018 population projections, the period 1990-2017 was used as the

base to calculate future mortality. It is assumed that the dataset I have collected reflects the one

used in the SSB report. To maintain comparability I will therefore mainly use the data provided by

SSB. It must be noted that the ages in the available life tables range from 0 to 106. I have extended the range to age 110 by assuming a death probability of 0.5 for each year from 107 to 110. In SSB’s report the maximum age is set to 119. The main purpose of this paper, however, is to find the life expectancy at birth. Given the small and variable number of survivors in these age groups, along with the calculation methods used in this thesis, the exclusion of the age group 110-119 is not expected to have a significant effect on the results.
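The extension to age 110 can be illustrated with a minimal sketch. It assumes that each added age keeps half of the survivors of the previous age, which is one possible reading of the 0.5 death probability and not necessarily the exact calculation used here. The function name `extend_survivors` and the toy column are hypothetical.

```python
def extend_survivors(lx, last_age=110, q_old=0.5):
    """Extend a survivorship column (list index = exact age) up to last_age.

    Assumption: every added age retains a share (1 - q_old) of the
    survivors at the previous age.
    """
    extended = list(lx)
    while len(extended) <= last_age:
        extended.append(extended[-1] * (1.0 - q_old))
    return extended

# Toy column for ages 0..106 (made-up values), ending with 16 survivors.
toy_lx = [100_000.0] * 106 + [16.0]
toy_extended = extend_survivors(toy_lx)
```

With so few survivors left at age 106, the added ages contribute very little to a life expectancy computed at birth, which is the point made in the paragraph above.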

In this paper we only use the survivorship function (the “$l_x$-column”) of period life tables. The $l_x$-column describes the number of survivors at exact age $x$ per 100,000 births, based on a given set of age-specific mortality rates; divided by 100,000, it gives the cumulative probability of surviving from birth to age $x$. Here, I use one-year age groups. The survivorship function shows the distribution for each age, starting at birth and ending at age 110. The annual data is available by sex and, for reasons that will be explained later on, I will analyse the data for men and women separately. An excerpt of the dataset is illustrated in table 2.1.


Table 2.1: An excerpt of the survivorship function life table

2.1 The survivorship function and life expectancy

Figure 2.1 plots the survivorship function for the Norwegian population based on age-specific death rates from the years 1846, 1950, 1990 and 2017. The illustration shows a clear pattern in the Norwegian survival distribution: over the years, deaths have become increasingly concentrated at older ages. Survival was more evenly distributed between ages in 1846, when roughly 59% of women and 54% of men survived until age 50. In 2017, the corresponding numbers had increased to 98% and 97%, respectively. Improved living conditions, a better health system and the prevention of diseases over this period have led to a drastic decline in infant and child mortality, which has currently reached an all-time low (FHI, 2018). An increasing proportion survives to maturity or older ages, and it is expected that this development will continue in the future (SSB, 2018). This tendency, where the graph shifts toward the upper right corner so that it looks like a rectangle, is called rectangularization (Rowland, 2003).

Figure 2.1: Rectangularization

Life expectancy ($e_x$) is a measure of the average time a population group is expected to live based on a given set of age-specific death rates. In the past, war, poverty and poor health were main

factors affecting the age at which people died. Technological improvements and the discovery of penicillin in 1928 led to a great reduction of infectious diseases and deaths (Meslé and Vallin, 2011).

Figure 2.2 shows the life expectancy at birth for males and females in Norway during the period 1846-2016. As illustrated, the curves follow different paths. The life expectancy for women has, at any given point in time, exceeded the life expectancy for men. Since 1925, life expectancy for the female population has steadily increased, while the trajectory for the male population has been less consistent. The gender gap in life expectancy was at its greatest in the period 1950-1990, peaking around the 1980s. From 1990 onwards, men’s life expectancy has risen more steeply than women’s, which has led to a dramatic decrease in the life expectancy gender gap.

A gender gap in life expectancy may be rooted in numerous factors related to biology and psychology, ultimately leading to behaviours and lifestyle choices that affect life expectancy (Oksuzyan et al., 2006). For the peak in the life expectancy gender gap during 1950-1990, the most prominent explanatory factor has been tobacco use. The Norwegian Institute of Public Health (FHI) reports that the proportion of men smoking peaked in the 1960s, followed by a steep decline. For women, on the other hand, the smoking proportions were constant during the same time frame and did not start to decline until the 1990s. A greater change in men’s habits has now led to a drastic reduction in the gap. SSB expects smoking to be less important for the gender gap from 2017 onwards (SSB, 2018). This is further supported by a recent Swedish study, which pointed out that smoking plays a minor role in the current gender gap in the Swedish population (Sundberg et al., 2018). Given the differing factors for the two sexes, the forecasting of life expectancy for men and women will be performed separately.


Figure 2.2: History of life expectancy in Norway

Source: FHI, 2018

2.2 Period Selection

In mortality forecasting it is important to consider what past period should be used to extrapolate future mortality (Keyfitz, 1991). As I have pointed out previously, the Norwegian life expectancy patterns have varied greatly with time, especially for the male population. Choosing the appropriate dataset as the base must therefore be done with careful consideration. SSB has noted this in its reports, as its projections dating from before 2016 were consistent underestimates. In newer reports, the base period has been changed to 1990-2017, which has been found to reduce the issue of underestimation (SSB, 2018). Choosing the same base period not only maintains comparability between the studies; I also find it a reasonable time frame given the consistency in life expectancy trajectories for both sexes in the period, as can be seen in figure 2.2.


The SSB report uses prediction values for each year up to 2100. Given the relatively short base period (n = 28 years), forecasts over such a long horizon are subject to large inaccuracies. As SSB uses 2060 as one of its main reference years, I will restrict my forecasting horizon to this year. This gives a shorter forecasting horizon of 43 years (m = 43).


3 The Model

When William Brass introduced his method in 1971, he revolutionized how life tables for populations with low-quality or incomplete datasets were constructed. His mathematical relational method allowed life tables to be constructed independently of historical data (Preston et al., 2004). The World Health Organization (WHO) presents the Brass logit system as one of the main model life tables, along with commonly used life tables like the UN model life table and the Coale-Demeny model (WHO, 2000). In this section I will lay down the foundation of the Brass model, followed by a description of its application to mortality forecasting on Norwegian data.

3.1 The Brass model (1971)

When Brass developed his relational method, he noticed that the relation between two life tables becomes approximately a straight line after logit transformation. This discovery allowed the relationship between two survivorship functions to be expressed linearly:

$$y_{x,t} = \alpha_t + \beta_t y_{x,s} + \varepsilon_t \qquad (2)$$

The $y$ on the left-hand side represents the logit-transformed survivorship function for a given year, $t$, and a given age, $x$. Similarly, the $y$ on the right-hand side has also been logit transformed, but this one represents a single reference point, which the literature commonly refers to as the standard life table. This is denoted by the subscript $s$ (Newell, 1988). Parameters $\alpha_t$ and $\beta_t$ are to be estimated from the data, while $\varepsilon_t$ is an error term. Conventionally, one can therefore use available data to make reasonable assumptions about $\alpha_t$ and $\beta_t$ and, together with the standard life table, compute the life table for a given year.

3.1.1 The logit transformation

Following the model, the first key step consists of transforming the $l_x$-values by logit transformation:

1. Divide $l_x$ by 100 000 to express it as a proportion, $p_x$.

2. Logit transform $p_x$ by using the following expression:

$$\mathrm{logit}(p_x) = 0.5 \ln\left(\frac{1 - p_x}{p_x}\right) \qquad (3)$$

Here we follow Brass’ original approach. Alternatively we could have defined $\mathrm{logit}(p_x)$ as $\ln(p_x/(1 - p_x))$.
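The two steps above can be sketched as a pair of small functions. The names `brass_logit` and `inverse_brass_logit` are hypothetical, and the inverse is included only because it follows directly from expression (3). Note that the transformation is undefined when the proportion equals 0 or 1 (for instance at age 0, where all 100 000 are alive), so such ages would need separate handling; how the thesis treats them is not stated in this section.

```python
import math

RADIX = 100_000  # number of births the l_x column is scaled to

def brass_logit(lx, radix=RADIX):
    """Step 1 and 2: proportion p_x = l_x / radix, then 0.5 * ln((1 - p) / p)."""
    p = lx / radix
    return 0.5 * math.log((1.0 - p) / p)  # fails for p == 0 or p == 1

def inverse_brass_logit(y, radix=RADIX):
    """Solve expression (3) for l_x: p = 1 / (1 + exp(2 * y))."""
    return radix / (1.0 + math.exp(2.0 * y))

y_98 = brass_logit(98_000)            # negative: more than half survive
lx_back = inverse_brass_logit(y_98)   # recovers roughly 98 000
```

With this sign convention a lower logit means higher survival, which is why a more negative level parameter corresponds to a higher life expectancy.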

3.1.2 The standard life table

The literature states that “…Any standard can be chosen, but it is sensible to choose one that shows some sort of average pattern.” (Newell, 1988, p. 155). Brass defined a general standard life table, but he recommended using a life table that better reflects the population, provided that the data is available (Brass, 1971).

Given the complete datasets available for the Norwegian population, I found the most sensible approach to be the one suggested by Newell: using the average. The standard life table can therefore be found by the following expression:

$$l_{x,s} = \frac{\sum_{t=t_0}^{T} l_{x,t}}{n} \qquad (4)$$

where $t_0$ is the start of the base period, $T$ is the end and $n = T - t_0 + 1$ is the number of years in the given period.

As stated in the period selection section, the base period is 1990-2017, which gives the following calculation of the standard life table:

$$l_{x,s} = \frac{\sum_{t=1990}^{2017} l_{x,t}}{28} \qquad (5)$$

The values of the logit-transformed life table are illustrated in figure 3.1. As expected, the standard falls between the curves for 1990 and 2017. Using the standard life table to find the life expectancy at birth would return values close to those of mid-2003.
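The averaging in expressions (4) and (5) is an age-by-age mean over the years of the base period. A minimal sketch, assuming the life tables are held in a dictionary keyed by year (a layout chosen only for this example; the function name and the toy values are hypothetical):

```python
def standard_life_table(lx_by_year, first_year=1990, last_year=2017):
    """Average l_{x,t} over the base period for every age x."""
    years = range(first_year, last_year + 1)
    n = len(years)  # number of years in the base period
    n_ages = len(lx_by_year[first_year])
    return [sum(lx_by_year[t][x] for t in years) / n for x in range(n_ages)]

# Toy example with two years and two ages (made-up values).
toy_tables = {1990: [100_000.0, 90_000.0], 1991: [100_000.0, 94_000.0]}
toy_standard = standard_life_table(toy_tables, 1990, 1991)
```

For the actual base period the defaults give 28 years, matching the divisor in expression (5).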

Figure 3.1: The logits of the standard life table, 1990 and 2017.

3.1.3 The parameters: 𝜶𝒕 and 𝜷𝒕

Given the standard, the Brass model summarizes age-specific mortality for a fixed year $t$ in two parameters, $\alpha_t$ and $\beta_t$. The estimate of $\alpha_t$ can be interpreted as the level of mortality in year $t$. The second parameter, $\beta_t$, denotes the relationship between childhood and adult mortality. It must be noted that both interpretations must be seen relative to the standard. A computed life table using $\alpha_t = 0$ and $\beta_t = 1$ returns a life table identical to the standard life table.

In relevant literature, Brass’ general standard life table has been used to find appropriate ranges for $\alpha_t$ and $\beta_t$. Reasonable values are illustrated in table 3.1, where the range for $\alpha_t$ is set between -1.5 and 0.8 and for $\beta_t$ between 0.6 and 1.4. Note that the general standard life table returns a life expectancy close to that of the mid-1940s. As my standard life table is close to mid-2003, the ranges in my analysis could differ.

Given the interpretations of $\alpha_t$ and $\beta_t$, it is possible to form expectations about the regression results. For both men and women in the Norwegian population, life expectancy has steadily increased over the past few decades. SSB has reported that this number will continue to increase in the future (SSB, 2018). As time passes, $\alpha_t$ is therefore expected to become more and more negative.

During the past decade, life expectancy has reached an all-time high and correspondingly, child mortality has reached an all-time low (FHI, 2018). If either of these values improves in the future, the marginal change is expected to be quite small. It is therefore reasonable to assume that the child-adult mortality relationship will remain quite steady, with a $\beta_t$ close to one.

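| Parameter  | Value | Interpretation                                                              |
|------------|-------|-----------------------------------------------------------------------------|
| $\alpha_t$ | -1.5  | Higher life expectancy                                                      |
| $\alpha_t$ | 0.0   | Life expectancy the same as the standard                                    |
| $\alpha_t$ | 0.8   | Low life expectancy                                                         |
| $\beta_t$  | 0.6   | High infant and child mortality, low adult mortality                        |
| $\beta_t$  | 1.0   | The relationship between adult and child mortality the same as the standard |
| $\beta_t$  | 1.4   | Low infant and child mortality, high adult mortality                        |

Table 3.1: Interpretation of $\alpha_t$ and $\beta_t$. Source: Rowland, 2003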
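To illustrate how the two parameters act on the standard, the sketch below applies model (2) without the error term and then inverts the logit of expression (3). The function name `brass_life_table`, the toy standard and the choice to leave ages with a proportion of exactly 0 or 1 unchanged are assumptions made for the example.

```python
import math

def brass_life_table(alpha, beta, lx_standard, radix=100_000):
    """Build an l_x column from a standard via y = alpha + beta * y_standard."""
    out = []
    for ls in lx_standard:
        p = ls / radix
        if p <= 0.0 or p >= 1.0:
            out.append(ls)  # logit undefined here; keep the standard value
            continue
        y_std = 0.5 * math.log((1.0 - p) / p)
        y = alpha + beta * y_std
        out.append(radix / (1.0 + math.exp(2.0 * y)))
    return out

# Toy standard (made-up values) for five ages.
toy_std = [100_000.0, 99_000.0, 90_000.0, 50_000.0, 1_000.0]
same = brass_life_table(0.0, 1.0, toy_std)    # reproduces the standard
better = brass_life_table(-0.2, 1.0, toy_std) # more survivors at every age
```

A negative level parameter raises survival at every age where the logit is defined, in line with the first rows of table 3.1.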


4 Estimation

4.1 Fitting the model

The linearity of model (2) allows the parameters $\alpha_t$ and $\beta_t$ to be estimated by an ordinary least squares (OLS) regression. In his paper, Brass expressed a concern about applying OLS to fit the estimates. Using the United Nations mortality schedule he found irregularities in the youngest and oldest age groups. In particular, when comparing the fitted values against the observed values, he noted great discrepancies at age one and in the oldest age groups; they did not fit the data well. With the possibility of errors in reporting, Brass suggested weighted least squares (WLS) as an alternative method. He reasoned that putting less weight on these groups in the regression could solve the issue. However, he concluded that the simplicity of OLS outweighed the “arbitrary and laborious” application of the other, ultimately preferring OLS (Brass, 1971).

Stewart (2004) performed an evaluation of the statistical methods used in the Brass relational

model, using the West model level 22 stable life table as the underlying mortality distribution.

Testing five different methods, which were based on a variation of OLS, WLS and maximum

likelihood estimation (MLE), he concluded that the MLE produced the most efficient estimates.

There is an additional concern. The survivorship function $l_x$ is not observed, but is constructed from estimated age-specific death rates. An appropriate approach would be to use data on death counts and numbers of persons alive, both broken down by age, use a Poisson model to estimate the death rates and their standard errors, and include these standard errors in the estimation procedure for $\alpha_t$ and $\beta_t$. However, I found this method too complicated as well.

In the introduction I listed the level of “complexity” as one of the limitations of the Lee-Carter model. Although WLS or MLE have been suggested, the logit transformation of the data makes the application of these methods quite difficult. It is also possible that the Norwegian population data fit the model better. Thus, staying true to Brass and the underlying idea of this thesis, I will proceed using the simple method of OLS.


4.2 The fitted model

An important assumption regarding the error term is that it has zero expectation and a constant variance. A violation of this assumption could lead to inefficient estimates and, at worst, bias (Carter Hill et al., 2008). When I checked for heteroscedasticity in the residuals, the patterns were found to be slightly irregular. Therefore, to account for the presence of heteroscedasticity, the OLS regression was performed with robust standard errors.

Regardless of the robust standard errors, I found the model to return a good fit to the data. For both men and women, all the reported $R^2$-values were greater than 0.99. The estimated values of $\alpha_t$ and $\beta_t$ with their corresponding standard errors (SE) and $R^2$-values are listed in table 4.1. This table gives the time series dataset for $\alpha_t$ and $\beta_t$, which will be used in the mortality forecast analysis.
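The regression for a single year can be sketched as follows. This is a minimal illustration, not the Stata or R code behind table 4.1: it regresses the logits of one year on the logits of the standard and reports the two estimates, $R^2$ and a heteroscedasticity-robust standard error for the slope. The robust standard error uses the uncorrected White (HC0) form, which is an assumption; a statistics package may apply a small-sample correction and report a slightly different value. The function name and the toy logits are hypothetical.

```python
import math

def fit_brass_ols(y_std, y_obs):
    """OLS fit of y_obs = alpha + beta * y_std + error, one year at a time."""
    n = len(y_std)
    mean_s = sum(y_std) / n
    mean_o = sum(y_obs) / n
    sxx = sum((s - mean_s) ** 2 for s in y_std)
    sxy = sum((s - mean_s) * (o - mean_o) for s, o in zip(y_std, y_obs))
    beta = sxy / sxx
    alpha = mean_o - beta * mean_s
    resid = [o - (alpha + beta * s) for s, o in zip(y_std, y_obs)]
    ss_res = sum(e * e for e in resid)
    ss_tot = sum((o - mean_o) ** 2 for o in y_obs)
    r2 = 1.0 - ss_res / ss_tot
    # White (HC0) robust standard error of the slope (assumed variant)
    se_beta = math.sqrt(
        sum(((s - mean_s) * e) ** 2 for s, e in zip(y_std, resid))
    ) / sxx
    return alpha, beta, r2, se_beta

# Toy logits (made-up values): standard versus one observed year.
toy_y_std = [-2.0, -1.0, 0.0, 1.0, 2.0]
toy_y_obs = [-2.1, -1.2, -0.35, 0.6, 1.5]
alpha_hat, beta_hat, r2, se_beta = fit_brass_ols(toy_y_std, toy_y_obs)
```

Repeating the fit for every year of the base period would produce the kind of time series of estimates that table 4.1 reports.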

Table 4.1: Fitted values of $\alpha_t$ and $\beta_t$

In figure 4.1, the empirical survivorship function (in logit form) $y_t$ is plotted against the fitted values. Fitting the model by OLS on Norwegian data, I did not find the issues Brass referred to, except for minor tendencies in the oldest age group in 2017 for men, as illustrated by the green curve in figure 4.1 (b).

4.3 Results: alpha and beta

As the parameters $\alpha_t$ and $\beta_t$ lay the foundation for the mortality forecast, it is important to have a closer look at their behaviour. Figure 4.2 shows the estimated $\alpha_t$ and $\beta_t$ parameters for each year between 1990 and 2017. The green curves represent women, while the blue represent men. A fitted dashed line has been added to each of the parameters to give a better illustration of the slopes.

Both parameters seem to behave as I previously expected. $\alpha_t$ for both men and women follows a downward slope, with men following a steeper trajectory. This reflects that men in the last few decades have had a greater improvement in life expectancy compared to women, as noted in section 2.1.

The parameter $\beta_t$ for both sexes fluctuates around a value close to one, indicating little change in the relationship between child and adult mortality in the past two decades. While $\beta_t$ for women follows a close to constant path, the men’s path tilts slightly downwards. Following a similar reasoning as for $\alpha_t$, a great reduction in the mortality rate for men has naturally had a greater impact on the child-adult mortality relationship.

Figure 4.1: Line graphs of the logits of $l_x$ for 1990 and 2017 against the standard

4.4 Expanding the base period

In section 2.2 I set the base period to be 1990-2017 (“$\mathit{base}_{1990}$”). To further emphasize the importance of period selection and to see the implications of choosing a different one, I performed an analysis using a longer time span. I did this by expanding the base period, starting in 1950 (“$\mathit{base}_{1950}$”) instead of 1990. The year 1950 was arbitrarily chosen, but looking back at figure 2.2, this period marks the start of modern life in Norway. Living conditions improved remarkably and war, hunger and pandemics were no longer determining factors in life expectancy. This makes it a more realistic option than using the whole timeline since 1846.

I recomputed the standard using expression (6) and re-estimated $\alpha_t$ and $\beta_t$ for every year in the 68-year period. The results are illustrated in figure 4.3.

$$l_{x,s} = \frac{\sum_{t=1950}^{2017} l_{x,t}}{68} \qquad (6)$$

Figure 4.3 shows the results using $\mathit{base}_{1950}$ (blue) compared to using $\mathit{base}_{1990}$ (green). $\alpha_t$ for women is illustrated in (a) and for men in (b). Similarly for $\beta_t$, (c) represents women and (d) men. The dashed lines have been added to illustrate the slopes of the curves. Given the different base period there is a natural shift in all the curves.

With the exception of $\alpha_t$ for women, the change in the base period returned remarkably different results. In figure 4.3 (b) we can see that using $\mathit{base}_{1990}$ returned a steeper slope than $\mathit{base}_{1950}$. This reflects how men during the past few decades have had a remarkable increase in life expectancy compared to previous years. The female child-adult mortality relationship follows a more intuitive path, as seen in figure 4.3 (c): with remarkable decreases in infant mortality in 1950-1990, it is natural that the curve is steeper for $\mathit{base}_{1950}$ than for $\mathit{base}_{1990}$.

Figure 4.2: Estimated parameters for each year between 1990 and 2017. (a) $\alpha_t$ and (b) $\beta_t$.

The flat trend for men using $Base_{1950}$ in figure 4.3 (d) is a little misleading, as it gives the impression that the relationship has been constant since the 1950s. This is due to the high volatility in male mortality combined with a decreasing infant mortality.

This analysis further confirms the importance of period selection. Men’s trajectory is probably more extreme than it will be in the future, but overall it seems that the development over the past few decades reflects the future better. In what follows, I will restrict myself to parameter estimates for 1990-2017 (see Table 4.1).

Figure 4.3: Fitted parameters $Base_{1950}$ vs. $Base_{1990}$. (a) $\alpha_t$, women; (b) $\alpha_t$, men; (c) $\beta_t$, women and (d) $\beta_t$, men.

5 Forecasting models for the parameters

Now that the parameters have been estimated and evaluated, forecasting mortality consists of extrapolating the time series of $\alpha_t$- and $\beta_t$-values. To produce appropriate forecasting models for the parameters, I applied basic time series theory to the data.

5.1 The Random walk

As illustrated in figure 4.2, both $\alpha_t$ and $\beta_t$ for men and women fluctuate around a given trend. These fluctuations are seemingly random, which points to the random walk model as a suitable model. The random walk is one of the fundamental models in time series modelling and is defined by the following expression:

$y_t = y_{t-1} + c + \varepsilon_t$ (7)

where $y_t$ represents the current value of the variable of interest, $y_{t-1}$ the value in the previous period, $c$ is a constant which denotes a drift and $\varepsilon_t$ is a random error $\sim N(0, \sigma_\varepsilon^2)$. $\sigma_\varepsilon^2$ and $c$ are the parameters to be estimated. When $c$ is equal to zero, the random walk model is without drift. As this model only incorporates one lagged value of $y_t$, it is a special case of an autoregressive model of order 1 (AR(1)), with the coefficient on the lag equal to one.

Iain Currie produced a technical note on estimating the random walk with drift, which I follow here (Currie, 2010). The first step in estimating the random walk with drift is to compute the first difference:

$z_t = y_t - y_{t-1}$ (8)

which gives:

$z_t = c + \varepsilon_t$ (9)

The parameter $c$ can then be estimated as the average increment by the following equation:

$\hat{c} = \bar{z} = \frac{1}{n-1} \sum_{t=2}^{n} z_t$ (10)

where $n = 28$.

The variance $\sigma_\varepsilon^2$ is estimated as the sample variance of $z_t$:

$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{t=2}^{n} (z_t - \bar{z})^2$ (11)

The variance and standard error of $\hat{c}$ are as follows:

$Var(\hat{c}) = Var(\bar{z}) = \frac{\hat{\sigma}^2}{n-1}$ (12)

which gives:

$SE(\hat{c}) = \frac{\hat{\sigma}}{\sqrt{n-1}}$ (13)

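The 95% prediction interval for $y$ predicted in the year $(n + m)$ can be found using the following expression:

$\hat{y}_{n+m} \pm t \cdot SE(\hat{y}_{n+m})$ (14)

$SE(\hat{y}_{n+m}) = m \cdot SE(\hat{c})$ (15)

where $m$ denotes the number of years forecasted ahead. For example, with a forecast until 2060, $m = 43$. With $n = 28$, I use $t = 2.04841$, the 97.5th percentile of the t-distribution with 28 degrees of freedom.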
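The estimation steps in equations (8) to (15) can be sketched in a few lines of code. The sketch below is in Python rather than R, the function names are hypothetical, and the short input series is made up for illustration; it is not the estimated parameter series from this thesis.

```python
import math

def fit_random_walk_with_drift(y):
    """Estimate the drift, the innovation variance and SE(drift) from y_1..y_n."""
    n = len(y)
    z = [y[t] - y[t - 1] for t in range(1, n)]                  # eq. (8): n - 1 differences
    c_hat = sum(z) / (n - 1)                                    # eq. (10)
    sigma2_hat = sum((zt - c_hat) ** 2 for zt in z) / (n - 2)   # eq. (11)
    se_c = math.sqrt(sigma2_hat / (n - 1))                      # eqs. (12)-(13)
    return c_hat, sigma2_hat, se_c

def forecast_with_interval(y, m, t_crit):
    """Point forecast m steps ahead with the interval of eqs. (14)-(15)."""
    c_hat, _, se_c = fit_random_walk_with_drift(y)
    point = y[-1] + m * c_hat
    half_width = t_crit * m * se_c
    return point, point - half_width, point + half_width

# Made-up series of five observations (illustration only).
toy_series = [0.0, -0.1, -0.1, -0.3, -0.4]
c_hat, sigma2_hat, se_c = fit_random_walk_with_drift(toy_series)
point, lower, upper = forecast_with_interval(toy_series, m=10, t_crit=2.04841)
```

Note that, as in equation (15), the interval width here reflects only the uncertainty in the estimated drift.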

5.2 The unit root test

Now, let’s go back to equation (7):

$y_t = y_{t-1} + c + \varepsilon_t$

This equation implies a second parameter, $c_0$:

$y_t = c_0 y_{t-1} + c + \varepsilon_t$ (16)

where $c_0$ is equal to one. This implication is called a unit root and has an important feature: by verifying the value of $c_0$ one can identify the proper autoregressive model.

$|c_0| < 1$ denotes stationarity and $c_0 = 1$ denotes nonstationarity. The random walk model is a nonstationary AR(1). To confirm the random walk as the proper model for my data I need to check that the condition $c_0 = 1$ is fulfilled (Carter Hill et al., 2008).

To verify that my estimated parameters follow a random walk, I therefore test whether $c_0 = 1$. In this thesis I use the Augmented Dickey-Fuller test (ADF), which tests the null hypothesis of $c_0 = 1$ against the alternative $c_0 < 1$. The test can be performed on models with or without a drift.
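To illustrate the idea behind the test, the sketch below computes the plain Dickey-Fuller statistic for the model with a drift. It is a simplification and not the augmented test used in this thesis: it includes no lagged differences, the function name is hypothetical, and the example series is simulated. The regression is $\Delta y_t = c + \gamma y_{t-1} + e_t$ with $\gamma = c_0 - 1$, so the null of a unit root corresponds to $\gamma = 0$.

```python
import math
import random

def dickey_fuller_stat(y):
    """Return (gamma_hat, t-ratio) from regressing dy_t on a constant and y_{t-1}."""
    x = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    x_bar = sum(x) / n
    dy_bar = sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    gamma_hat = sxy / sxx
    c_hat = dy_bar - gamma_hat * x_bar
    rss = sum((di - c_hat - gamma_hat * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma_hat, gamma_hat / se_gamma

# Simulated stationary AR(1) with coefficient 0.2 (illustration only).
rng = random.Random(0)
stationary = [0.0]
for _ in range(300):
    stationary.append(0.2 * stationary[-1] + rng.gauss(0.0, 1.0))
gamma_hat, tau = dickey_fuller_stat(stationary)
```

The t-ratio must be compared with Dickey-Fuller critical values rather than the usual t-distribution, since the statistic does not follow a t-distribution under the null. A strongly negative value, as for the simulated stationary series, leads to rejection of the unit root.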

5.3 Forecasting results

5.3.1 Alpha

For both men and women, $\alpha_t$ follows a downward trend, which indicates a drift. I therefore run the following time series model:

$\alpha_t = c_0 \alpha_{t-1} + c + \varepsilon_t$ (17)

To verify that this time series model suits the data, I run an ADF test. I fail to reject the null hypothesis that $c_0 = 1$ at all conventional significance levels, both for women and for men. I therefore treat the time series as nonstationary and take the random walk model with drift to fit the data well for the $\alpha_t$-parameters. The preferred models, with the standard errors of the drift in parentheses, are as follows:

Women: $\hat{\alpha}_t = \alpha_{t-1} - 0.0142 + \varepsilon_t$, $\hat{\sigma}_\varepsilon^2 = 0.0009$ (18)
(0.0058)

Men: $\hat{\alpha}_t = \alpha_{t-1} - 0.0213 + \varepsilon_t$, $\hat{\sigma}_\varepsilon^2 = 0.0013$ (19)
(0.0070)

The constant terms, -0.0142 and -0.0213, indicate the average annual change in $\alpha_t$ for women and men respectively. Over the 43-year forecasting horizon, $\alpha_t$ is therefore forecasted to change by 43 times the annual drift, a total change of -0.6106 for women and -0.9159 for men. This corresponds to an $\alpha_t$ value of -0.8089 for women and -1.2084 for men in 2060.

Figure 5.1 (a) and (b) plot the past values of alpha for women and men along with the forecasts based on the time series model and the associated 95% prediction intervals. It can be seen that the forecasts for men follow a steeper trend than the forecasts for women, which naturally follows the trend of the original data for 1990-2017.
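The arithmetic behind these point forecasts can be written out as follows. The 2017 starting values are not quoted in the text; the values shown are the ones implied by the reported 2060 forecasts and the rounded drift estimates, so they should be read as approximate.

```latex
\begin{align*}
\alpha_{2060} &= \alpha_{2017} + 43\,\hat{c} \\
\text{Women:}\quad 43 \times (-0.0142) &= -0.6106
  \;\Rightarrow\; \alpha_{2017} \approx -0.8089 + 0.6106 = -0.1983 \\
\text{Men:}\quad 43 \times (-0.0213) &= -0.9159
  \;\Rightarrow\; \alpha_{2017} \approx -1.2084 + 0.9159 = -0.2925
\end{align*}
```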

Figure 5.1: Forecasts with 95% prediction intervals

5.3.2 Beta

Figure 4.2 (b) illustrates that $\beta_t$ for both men and women fluctuates around a value close to 1. As $\beta_t$ for men appears to follow a slightly downward trend, while it follows a flatter trend for women, I test the following two time series models on both sexes:

With a drift:

$\beta_t = c_0 \beta_{t-1} + c + \varepsilon_t$ (20)

Without a drift:

$\beta_t = c_0 \beta_{t-1} + \varepsilon_t$ (21)

Performing the ADF test with a drift for both sexes, I rejected the null of $c_0 = 1$ at all significance levels for women. For men it could not be rejected at the 1% and 5% significance levels. Repeating the process with the model without a drift, the null could not be rejected at the 1% and 5% levels for women, and it could not be rejected at any significance level for men. I believe that the test for men returned mixed results due to the slight downward trend, which is very small in absolute terms. My conclusion is that the indication of a unit root was stronger for both sexes using the model without a drift. The random walk model without a drift is given by the following expression:

$\beta_t = \beta_{t-1} + \varepsilon_t$ (22)

A random walk without a drift can be interpreted as a constant value with random fluctuations given by the error term, $\varepsilon_t$. As $\beta_t$ for both men and women fluctuates randomly around a constant value close to one, I perform a hypothesis test to see whether the parameter can be expressed as such a constant. The test of $H_0: \beta_t = 1$ against an alternative returned mixed results. For both sexes, the null was not rejected for only 20% of the tested values.

Due to the mixed results I instead set $\beta_t$ to be the average of its historical values, given by:

$\hat{k} = \frac{1}{n} \sum_{t=1990}^{2017} \hat{\beta}_t$ (23)

where $n$ is the number of years from 1990 to 2017. Assuming independence between the $\hat{\beta}_t$ for the subsequent years, the variance can be found by the following:

$var(\hat{k}) = var\left(\frac{1}{n} \sum_{t=1990}^{2017} \hat{\beta}_t\right) = \frac{1}{n^2} \sum_{t=1990}^{2017} \left(SE(\hat{\beta}_t)\right)^2$ (24)
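Equations (23) and (24) amount to averaging the yearly estimates and, under the independence assumption, summing their squared standard errors and dividing by $n^2$. A minimal sketch in Python follows; the function name and the four input values are hypothetical and are not the estimates from Table 4.1.

```python
import math

def mean_with_variance(estimates, std_errors):
    """Average of the yearly estimates (eq. 23) and its variance (eq. 24)."""
    n = len(estimates)
    k_hat = sum(estimates) / n
    var_k = sum(se ** 2 for se in std_errors) / n ** 2
    return k_hat, var_k

# Made-up yearly estimates and standard errors (illustration only).
toy_betas = [1.00, 1.02, 1.01, 1.03]
toy_ses = [0.002, 0.002, 0.002, 0.002]
k_hat, var_k = mean_with_variance(toy_betas, toy_ses)
se_k = math.sqrt(var_k)
```

With equal standard errors the result reduces to the familiar $SE / \sqrt{n}$, here $0.002 / \sqrt{4} = 0.001$.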

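My preferred model for forecasting $\beta_t$ is as follows:

$\hat{\beta}_t = \hat{k} + \varepsilon_t$

which gives the following preferred models, with standard errors in parentheses:

Women: $\hat{\beta}_t = 1.0097 + \varepsilon_t$, $\hat{\sigma}_\varepsilon^2 = 0.0002$ (25)
(0.0007)

Men: $\hat{\beta}_t = 1.0174 + \varepsilon_t$, $\hat{\sigma}_\varepsilon^2 = 0.0005$ (26)
(0.0012)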

The $\beta_t$-model for women assumes a constant value equal to 1.0097. As the value is very close to 1, I assume that the future development of the child-adult mortality relationship will stay close to the standard. For men, I also assume a close to constant development of the child-adult mortality relationship in the future, but the estimate, 1.0174, lies slightly further from the standard than for women.

5.4 Life expectancy

With the forecasted parameters in place, life expectancy at birth can be computed. Life expectancy can be found using the following expression:

$e_x = 0.5 \cdot p_x + \sum_{t=x+1}^{n} p_t$ (27)

Here $n = 106$, while $p_x = \frac{l_x}{100000}$ as stated in section 3.1.1.
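For $x = 0$, expression (27) reduces to half a year plus the sum of the survival proportions at ages 1 and above. The sketch below shows this in Python; the function name is hypothetical and the five-age survivor column is a made-up toy example, not a Norwegian life table.

```python
def life_expectancy_at_birth(lx, radix=100000):
    """e_0 = 0.5 * p_0 + sum of p_t for t >= 1, with p_t = l_t / radix (expression (27))."""
    p = [l / radix for l in lx]
    return 0.5 * p[0] + sum(p[1:])

# Made-up survivor column for ages 0..4 (illustration only).
toy_lx = [100000, 90000, 50000, 10000, 0]
e0 = life_expectancy_at_birth(toy_lx)
```

In the toy example the proportions are 1, 0.9, 0.5, 0.1 and 0, so $e_0 = 0.5 + 1.5 = 2$ years.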

With the fitted values at 10-year intervals from 1990 to 2010 and the forecasted values from 2020 onwards, I found the life expectancy values at birth presented in table 5.1. 2017 has been included as a reference point.

Table 5.1: Life expectancy at birth ($e_0$) for men and women. Brass compared to SSB

        Brass            SSB
Year    Women   Men      Women   Men
1990    79.55   72.82    79.81   73.44
2000    81.07   75.67    81.38   75.96
2010    83.40   78.74    83.15   78.85
2017    84.67   81.28    84.28   80.91
2020    85.18   82.08    84.70   81.60
2030    86.81   84.58    86.40   83.60
2040    88.31   86.83    87.80   85.40
2050    89.72   88.85    89.10   87.00
2060    91.02   90.67    90.30   88.40
Source: SSB.no, 2018

As expected, the forecasts using the Brass model differ from those presented in the SSB report. In table 5.1 we can see that the model returns life expectancies below SSB in 1990 and 2000 for women, while they are higher from 2010 onwards. For men, the values lie below SSB until 2010 and above thereafter. Counting only the Brass values that lie above SSB, the difference for women varies between a minimum of 0.25 years in 2010 and a maximum of 0.72 years in 2060, and it generally increases over time. For men this difference starts at 0.37 years in 2017 and ends at 2.27 years in 2060.

By 2060 the Brass model forecasts a life expectancy at birth of 91.02 for women and 90.67 for men. The corresponding values from the SSB report are 90.30 and 88.40, which differ by 0.72 and 2.27 years respectively.
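The differences discussed above can be recomputed directly from Table 5.1. The short sketch below does this in Python; the variable and function names are hypothetical, while the numbers are copied from the table.

```python
# Values copied from Table 5.1: (women, men) life expectancy at birth.
BRASS = {1990: (79.55, 72.82), 2000: (81.07, 75.67), 2010: (83.40, 78.74),
         2017: (84.67, 81.28), 2020: (85.18, 82.08), 2030: (86.81, 84.58),
         2040: (88.31, 86.83), 2050: (89.72, 88.85), 2060: (91.02, 90.67)}
SSB = {1990: (79.81, 73.44), 2000: (81.38, 75.96), 2010: (83.15, 78.85),
       2017: (84.28, 80.91), 2020: (84.70, 81.60), 2030: (86.40, 83.60),
       2040: (87.80, 85.40), 2050: (89.10, 87.00), 2060: (90.30, 88.40)}

def gaps(brass, ssb):
    """Brass minus SSB per year, rounded to two decimals: {year: (women, men)}."""
    return {year: (round(brass[year][0] - ssb[year][0], 2),
                   round(brass[year][1] - ssb[year][1], 2))
            for year in sorted(brass)}

GAPS = gaps(BRASS, SSB)
for year, (women, men) in GAPS.items():
    print(f"{year}: women {women:+.2f}, men {men:+.2f}")
```

A positive value means that the Brass figure lies above the SSB figure for that year and sex.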

In the SSB report, the life expectancy of women was arbitrarily adjusted in 2060 in order to avoid a situation in which the life expectancy of men would exceed that of women. They did this by increasing the life expectancy of women in 2060 by 0.8 years. Without the adjustment the life expectancies of men and women would intersect by 2100, which according to SSB is too early on the basis of demographic trend analysis of Norwegian data. I experience the same issue in my analysis. In the forecast, life expectancy for men has a large marginal increase over each 10-year interval between 2020 and 2060. By 2060 the difference between Brass and SSB for men is about 3 times as large as the difference for women in the same year. The determining factor behind this issue is the same for both SSB and my analysis: due to the large increase in life expectancy for men over the past few decades, a mechanical forecast using this trend could lead to an overestimation, as the same trend is not expected to continue at the same rate in the future. The Brass values presented here are unadjusted and indicate an intersection between the sexes in the early 2060s, which suggests that the Brass model results in a larger overestimation than the method used by SSB.

Figure 5.2: Forecasted life expectancy at birth for men and women, 10-year intervals between 2020-2060

To avoid a too early intersection of the paths, SSB listed two possible solutions: they could either make an upward adjustment for women or a downward adjustment for men. They did the former and justified this choice by the consistent underestimation in their past forecasts. This was despite the correction they had already made by choosing a shorter base period, as mentioned in section 2.2. This fact, along with the “over-forecast” of male life expectancy, makes a stronger argument for a downward adjustment in the path for men, which corresponds well with the results that I have presented.

5.5 The prediction interval (PI)

Including a prediction interval for the life expectancy values would provide a more complete depiction of the results. As illustrated in figure 5.3, SSB reports results using an 80% prediction interval, which corresponds to an 80% probability that the true life expectancy value lies within this interval. Here, I will do the same.

Figure 5.3: SSB’s forecasted $e_0$ with 80% PI
Source: Statistics Norway, 2018

Figure 5.4: Forecasted $e_0$ with 80% PI

5.5.1 Finding the variance

The variance is a necessary measure in computing the prediction interval. But due to the logit transformation of the original data, the variance of the $y_{x,t}$ in model (2) cannot be linearly translated to a variance of the $e_x$ in the same year. An attempt to derive the variance analytically proved to be a tedious and complex task. Thus, I used an alternative method to produce an estimated prediction interval: simulation.

5.5.2 Simulation

The idea behind the simulation is that a prediction interval for $e_x$ can be found by simulating the distribution of $e_x$. This can be done by using the variance of $\alpha_t$, $\beta_t$ and $\varepsilon_t$ in model (2). Random draws are generated from a normal distribution for each of these parameters, using the standard deviation derived from their respective variances. Combining these draws forms a data set of $n$ simulated $y_{x,t}$-values. Then, by transforming each of these values into $e_x$, the data set forms a histogram (simulated distribution) of $e_x$ in year $t$ from which the prediction interval can be found. A complete depiction would require a prediction interval for every $t$ from 2018 to 2060. But given that the manual method only returns values for one given year at a time, I restrict the simulation to the years 2040 and 2060 for both sexes.

The first step in the simulation is to find the variance of each of the parameters in 2040 and 2060.

a) Variance of the forecasted $\alpha_t$

Let’s have a second look at equation (17), with $c_0 = 1$:

$\alpha_t = \alpha_{t-1} + c + e_t$

For 2060 this gives:

$\alpha_{2060} = \alpha_{2017} + 43 \cdot \hat{c} + \sum_{t=2018}^{2060} e_t$ (28)

The variance of $\alpha_{2060}$ is therefore:

$var(\alpha_{2060}) = var(\alpha_{2017}) + 43^2 \cdot var(\hat{c}) + 43 \cdot var(e_t)$ (29)

Similarly for 2040, the variance can be found by the following expression:

$var(\alpha_{2040}) = var(\alpha_{2017}) + 23^2 \cdot var(\hat{c}) + 23 \cdot var(e_t)$ (30)
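Equations (29) and (30) can be evaluated with a few lines of code. The sketch below, in Python, uses only the rounded drift standard errors and innovation variances quoted in equations (18) and (19). The value of $var(\alpha_{2017})$ is not quoted in the text and is set to zero here purely as an assumption, so the output is a rough approximation and need not coincide with the figures computed from unrounded inputs. The function name is hypothetical.

```python
import math

def forecast_variance(m, se_drift, sigma2, var_last=0.0):
    """Variance of alpha m years after 2017, following equations (29)-(30)."""
    return var_last + m ** 2 * se_drift ** 2 + m * sigma2

# Rounded inputs from equations (18) and (19): (SE of drift, innovation variance).
# var(alpha_2017) is assumed to be zero because it is not quoted in the text.
INPUTS = {"women": (0.0058, 0.0009), "men": (0.0070, 0.0013)}

for sex, (se_drift, sigma2) in INPUTS.items():
    for year in (2040, 2060):
        v = forecast_variance(year - 2017, se_drift, sigma2)
        print(f"{sex} {year}: variance {v:.4f}, SD {math.sqrt(v):.4f}")
```

The first term grows with the square of the horizon and the second only linearly, so the uncertainty in the drift dominates for long forecasts.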

The results (rounded to 4 decimals):

Table 5.2: Variance of $\alpha_t$ in 2040 and 2060 for men and women

         $var(\alpha_{2060})$   $var(\alpha_{2040})$
Men      0.5636                 0.03848
Women    0.1010                 0.0559

b) Variance of the forecasted $\beta_t$

$\beta_t = \hat{k} + e_t$ (31)

$\beta_{2060} = \hat{k} + e_t$ (32)

$var(\beta_t) = var(\hat{k}) + var(e_t)$ (33)

The $var(\hat{k})$ is defined in equation (24).

The results:

Table 5.3: Variance of forecasted $\beta_t$ for men and women

         $var(\beta_t)$
Men      0.0005
Women    0.0002

c) Variance of the forecasted $\varepsilon_t$

As mentioned in section 4.2 the variance of the model should be constant, as heteroscedasticity could lead to inefficiency in the estimates. Therefore, ideally $var(\varepsilon_t)$ should be the same for every year $t$. When $\alpha_t$ and $\beta_t$ were estimated for every year from 1990 to 2017, the variance for each estimation was small, but with slightly different values. Therefore I set the general variance of the error term to be the average of the estimations:

$\frac{\sum_{t=1990}^{2017} var(\varepsilon_t)}{\text{no. of years}}$ (34)

The results:

Table 5.4: Variance of forecasted $\varepsilon_t$ for men and women

         $var(\varepsilon_t)$
Men      0.0029
Women    0.0015

The statistical program R allows you to generate random draws from a normal distribution from a set of inputs: mean, standard deviation (SD) and number of observations (n). Here I set $n = 1000$ for each generation. Given that I want to find the prediction interval for the forecasted values, these are set as the mean. The inputs I used are summarized in table 5.5.

The random draws generated 1000 data points for each of the parameters. These were combined using equation (2), which resulted in 1000 sets of $y_{x,t}$-values for men and 1000 for women, a total of 2000 tables. In the end, computing the distribution of the life expectancy at birth ($e_0$) in 2040 and 2060 resulted in the curves illustrated in figure 5.4. For women the estimated 80% prediction interval has a lower bound of 85.47 and an upper bound of 90.79 in 2040. For men, the lower bound is 83.58 and the upper bound 90.02. In 2060, the corresponding bounds are 87.04 and 94.12 for women and 86.36 and 94.01 for men. There is a large overlap between the PIs of the two sexes. Men’s PI is wider due to a greater dispersion in the data. In 2060 the simulation returned medians close to the forecasted values of 91.02 for women and 90.67 for men. The mean values were 90.77 and 90.36, respectively.
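The simulation procedure can be sketched as follows. This is an illustration in Python rather than the R code used for the thesis, and it rests on several assumptions that are not taken from the text: the standard is replaced by a hypothetical Gompertz-type survival curve, the logit is taken to be the classical Brass definition $0.5 \ln((1 - l_x)/l_x)$, model (2) is taken to be $y_{x,t} = \alpha_t + \beta_t y_{x,s} + \varepsilon_t$, and one error draw is applied to each simulated life table. The resulting numbers will therefore not reproduce the intervals reported above.

```python
import math
import random

AGES = range(0, 107)  # ages 0..106, matching n = 106 in expression (27)

def hypothetical_standard():
    """Gompertz-type survival curve, a stand-in for the real standard (assumption)."""
    a, b = 2e-5, 0.1
    return [math.exp(-(a / b) * (math.exp(b * x) - 1.0)) for x in AGES]

def brass_logit(l):
    """Classical Brass logit of a survival proportion (assumed definition)."""
    return 0.5 * math.log((1.0 - l) / l)

def e0_from_parameters(alpha, beta, eps, std_logits):
    """Life expectancy at birth implied by one draw of alpha, beta and eps."""
    total = 0.5  # 0.5 * p_0, where p_0 = 1 by definition
    for y_s in std_logits:
        y = alpha + beta * y_s + eps
        total += 1.0 / (1.0 + math.exp(2.0 * y))  # inverse logit gives p_x
    return total

def simulate_e0(alpha_mean, alpha_sd, beta_mean, beta_sd, eps_sd, draws=1000, seed=1):
    """Return the 10th percentile, median and 90th percentile of simulated e_0."""
    rng = random.Random(seed)
    std_logits = [brass_logit(l) for l in hypothetical_standard()[1:]]
    sims = sorted(
        e0_from_parameters(rng.gauss(alpha_mean, alpha_sd),
                           rng.gauss(beta_mean, beta_sd),
                           rng.gauss(0.0, eps_sd), std_logits)
        for _ in range(draws))
    return sims[int(0.10 * draws)], sims[draws // 2], sims[int(0.90 * draws)]

# Inputs for women in 2060 as listed in table 5.5.
lower, median, upper = simulate_e0(-0.8089, 0.3177, 1.0097, 0.0138, 0.0386)
```

The 80% prediction interval is read off as the 10th and 90th percentiles of the sorted simulated values, which avoids having to derive the variance of $e_0$ analytically.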

The simulation performed here returned a much wider prediction interval than what is reported by SSB. In 2060 SSB’s prediction lies within a prediction interval of roughly ±2 years. The prediction interval here is almost double the size, at roughly ±4 years. The reason for this difference is unclear, because the way SSB constructed its 80% prediction interval is not well documented (SSB, 2018).

Table 5.5: The values of $\alpha_t$, $\beta_t$ and $\varepsilon_t$ used in the simulation

Parameter            Mean        SD
Men
$\alpha_{2060}$      -1.2084     0.3822
$\alpha_{2040}$      -0.78258    0.2363
$\beta_t$            1.0174      0.0226
$\varepsilon_t$      0           0.0539
Women
$\alpha_{2060}$      -0.8089     0.3177
$\alpha_{2040}$      -0.52424    0.1964
$\beta_t$            1.0097      0.0138
$\varepsilon_t$      0           0.0386

6 Summary and conclusion

Applying the Brass model to Norwegian population data, I found a satisfactory model to use in mortality forecasting. It was shown that the model fits the data well, overcoming some of the issues previously raised by Brass. The parameters are easy and intuitive to interpret given the linearity in the model. As opposed to the Lee-Carter model, where $k_t$ is the only parameter to be forecasted, the Brass model requires two parameters, $\alpha_t$ and $\beta_t$. This yields a more flexible model.

Not surprisingly, the model returned results that differed from what was reported by SSB. In the end, the forecasted Brass values resulted in a higher life expectancy at birth in 2060 for both men and women. Most notably, the great improvement in male life expectancy over the past few decades led to an aggressive projection of the future trajectory. The same issue was found in SSB’s analysis, and an arbitrary adjustment had been performed in order to depict a more realistic forecast. Given their justification, a similar line of argument would make a stronger case for using the Brass model.

One of the criticisms of the Lee-Carter model was its complexity. The Brass model also had a few limitations. In constructing the prediction interval, a simulation had to be conducted as it was difficult to find the variance of the life expectancy. Overall, the forecasted life expectancies that I presented were higher than those reported by SSB. This could indicate a general overestimation in using the Brass model, or, alternatively, an underestimation of $e_0$ by SSB.

One possible improvement of my approach is to re-estimate the model by including time-series expressions for $\alpha_t$ and $\beta_t$ directly in expression (2). In that case, the only parameters to be estimated are $c$ and $k$, as well as the variance of $\varepsilon_t$. Also, one has to select an appropriate starting value $\alpha_{1970}$. Time constraints did not allow me to implement this approach.

In this thesis, the simple approaches were preferred over the more complex. In the process I may therefore have left out other suitable methods that could produce more precise results. Given the many factors that need to be considered in forecasting, among them the appropriate base period, forecasting models and so on, it is premature to conclude whether the Brass model is better than the Lee-Carter model. Hopefully, a more comprehensive analysis could prove in favour of Brass.

7 References

Brass, W. (1971). On the scale of mortality. In: W. Brass (ed.). Biological Aspects of Demography. New York: Barnes & Noble Inc.

Carter Hill, R., Griffiths, W. E. & Lim, G. C. (2008). Principles of Econometrics. 3rd edition. USA: John Wiley & Sons, Inc.

Currie, I. (2010). Volatility v Trend Risk: A technical note on estimating and forecasting with random walk with drift. In: Longevitas (online document). [updated (21.04.2019); read (01.05.2019)]. Available from <https://www.longevitas.co.uk/site/informationmatrix/volatilityv.trendrisk.html>

Folkehelseinstituttet (FHI) (2018). Life expectancy in Norway. In: Public Health Report - Health status in Norway (online document). Oslo: Norwegian Institute of Public Health [updated (04.10.2018); read (01.04.2019)]. Available from <https://www.fhi.no/en/op/hin/population/life-expectancy/>

Himes, C. L., Preston, S. H. & Condran, G. (1994). A Relational Model of Mortality at Older Ages in Low Mortality Countries. Population Studies, Vol. 48(2), pp. 269-291.

Keyfitz, N. (1991). Experiments in the Projection of Mortality. Canadian Studies in Population, Vol. 18(2), pp. 1-17.

Lee, R. D. & Carter, L. R. (1992). Modeling and forecasting U. S. mortality. Journal of the American Statistical Association, Vol. 87(419), pp. 659-671.

Newell, C. (1988). Methods and models in demography. Great Britain: Belhaven Press.

Meslé, F. & Vallin, J. (2011). Historical trends in mortality. In: R. G. Rogers & E. G. Crimmins (eds). International handbook of adult mortality. New York: Springer, pp. 9-47.

World Health Organisation (WHO) (2000). WHO System of Model Life Tables. GPE Discussion Paper Series, No. 8. World Health Organisation.

Preston, S. H., Heuveline, P. & Guillot, M. (2001). Demography: measuring and modeling population processes.

Oksuzyan, A., Juel, K., Vaupel, J. W. & Christensen, K. (2008). Men: good health and high mortality. Sex differences in health and aging. Aging Clinical and Experimental Research, Vol. 20(2), pp. 91-102.

Rowland, D. T. (2003). Demographic methods and concepts. New York: Oxford University Press Inc.

Statistics Norway (2018). Norway’s 2018 population projections: Main results, methods and assumptions. Report no. 2018/22, Statistics Norway.

Statistics Norway (2018). Statistikkbanken: Døde, dødelighetstabeller. [updated (28.06.2018); read (01.04.2019)]. Available from: <https://www.ssb.no/statbank/table/07902/>

Stewart, Q. T. (2004). Brass’ Relational Model: A Statistical Analysis. Mathematical Population Studies, Vol. 11(1), pp. 51-72.

Sundberg, L., Agahi, N., Fritzell, J. & Fors, S. (2018). Why is the gender gap in life expectancy decreasing? The impact of age- and cause-specific mortality in Sweden 1997-2014. International Journal of Public Health, Vol. 63(6), pp. 673-681.
