USE OF GENERALIZED LINEAR MODEL IN FORECASTING OF AIR PASSENGERS CONVEYANCES FROM EU COUNTRIES
Catherine Zhukovskaya
Faculty of Transport and Mechanical Engineering, Riga Technical University
The 8th Tartu Conference on Multivariate Statistics
Outline
1. Introduction
2. Informative base
3. Used models for analyzing and forecasting of the air passengers' conveyances
4. Elaboration of linear models
5. Elaboration of generalized linear models
6. Conclusion
7. References
1. Introduction
Most of the literature devoted to forecasting transport flows contains only simple forecasting models based on time series methods [Hünt (2003)] or on linear regression with a small number of explanatory variables [Butkevičius, Vyskupaitis (2005), Šliupas (2006)].
Two approaches to forecasting air passenger conveyances from EU countries were considered in this investigation: the classical method of linear regression and the generalized linear model (GLM).
The aim of this investigation is to illustrate the advantage of the GLM compared with simple linear regression models.
Verification of the models and estimation of the unknown parameters are included as well.
All calculations were done with Statistica 6.0 and with software developed in MathCad 12.
2. Informative base
The forecasted variable was the number of air passengers carried, expressed in millions of passengers.
Factors
t1 - total population of the country (TP), millions of inhabitants;
t2 - area of the country (AREA), thousands of km²;
t3 - density of the country population (PD), number of inhabitants per km²;
t4 - monthly labour costs (MLC), thousands of euros;
t5 - gross domestic product (GDP) per capita in Purchasing Power Standards (PPS) (GDP_PPS);
t6 - gross domestic product (GDP), billions of euros;
t7 - comparative price level (CPL);
t8 - inflation rate (IR);
t9 - unemployment rate (UR);
t10 - labour productivity per hour worked (LPHW).
The following 25 EU countries were selected: Belgium, Czech Republic, Denmark, Germany, Estonia, Greece, Spain, France, Ireland, Italy, Cyprus, Latvia, Lithuania, Luxembourg, Hungary, Malta, Netherlands, Austria, Poland, Portugal, Slovenia, Slovakia, Finland, Sweden and United Kingdom.
The period considered was from 1996 to 2005.
All data for this investigation were obtained from the electronic database of the Statistical Office of the European Communities (EUROSTAT), http://epp.eurostat.ec.europa.eu
The final number of observations was 161:
data for the period from 1996 to 2004 were used for estimation and forecasting (140 observations);
data for 2005 were used to check the quality of forecasting, the so-called cross-validation (CV) (21 observations).
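The split into estimation and cross-validation observations can be sketched in Python as follows. This is only an illustration: the record layout, the function name `split_by_year` and all numbers are assumptions made for the example, not data from the presentation.

```python
# Hypothetical records: (country, year, passengers in millions). Values are made up.
records = [
    ("A", 2003, 10.0), ("A", 2004, 11.0), ("A", 2005, 12.0),
    ("B", 2004, 3.0), ("B", 2005, 3.5),
]

def split_by_year(rows, cv_year=2005):
    """Keep years before cv_year for estimation and cv_year itself for the CV check."""
    estimation = [r for r in rows if r[1] < cv_year]
    validation = [r for r in rows if r[1] == cv_year]
    return estimation, validation

estimation, validation = split_by_year(records)
print(len(estimation), len(validation))
```

Holding out the last year, rather than a random subset, means the check measures how well a model forecasts a year it has not seen.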
3. Used models for analyzing and forecasting of the air passengers' conveyances
Main notions
The main object of consideration was air passenger conveyances from EU countries.
Each observation consisted of the data for a particular country in a particular year.
All the considered models were group models [Andronov (1983)].
Classification of regression models according to their mathematical form: linear regression models; generalized linear regression models (GLM).
The linear regression model [Hardle (2004)]:
$$E\big(Y^{(k)}(x)\big) = x^T \beta, \qquad (1)$$
where:
$Y^{(k)}$ is the dependent variable for the k-th considered model;
$x = (x_1, x_2, \dots, x_d)^T$ is the d-dimensional vector of explanatory variables;
$\beta = (\beta_0, \beta_1, \beta_2, \dots, \beta_d)^T$ is the coefficient vector that has to be estimated from observations of $Y^{(k)}$ and $x$.
The generalized linear regression model:
$$E\big(Y^{(k)}(x)\big) = G\{x^T \beta\}, \qquad (2)$$
where $G(\cdot)$ is a known function of a one-dimensional variable.
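The two model forms differ only in the function applied to the linear combination of the explanatory variables. A minimal Python sketch, assuming NumPy and an explicit constant term for the intercept; the function names and the sample numbers are illustrative and do not come from the presentation:

```python
import numpy as np

def linear_predictor(x, beta):
    """Return beta_0 + beta_1*x_1 + ... + beta_d*x_d for each row of x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    design = np.column_stack([np.ones(len(x)), x])  # prepend the constant term
    return design @ np.asarray(beta, dtype=float)

def linear_model_mean(x, beta):
    """Linear regression: the mean is the linear predictor itself."""
    return linear_predictor(x, beta)

def glm_mean(x, beta, G):
    """Generalized linear model: the mean is a known function G of the predictor."""
    return G(linear_predictor(x, beta))

# Illustrative values only (two explanatory variables, one observation).
beta = [0.5, 1.0, -2.0]
x = [[2.0, 0.25]]
print(linear_model_mean(x, beta))   # mean under the linear model
print(glm_mean(x, beta, np.exp))    # mean under a GLM with G = exp
```

With G taken as the identity function the GLM reduces to the linear model, which is why the linear model can be seen as a special case.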
4. Elaboration of linear models
The basic criteria for choosing the best model:
1. Multiple coefficient of determination ($R^2$);
2. Fisher criterion (F);
3. Sum of the squares of the residuals (SSRes);
4. Sum of the squares of the residuals for the cross-validation (CV SSRes).
All statistical hypotheses were tested at the significance level $\alpha = 0.05$.
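The four criteria can be computed from an ordinary least squares fit as in the following Python sketch. It assumes NumPy, uses the usual overall F statistic for a regression with an intercept, and runs on a tiny made-up data set; the function names are hypothetical and the presentation itself used Statistica and MathCad.

```python
import numpy as np

def fit_linear_model(X, y):
    """Ordinary least squares with an intercept; returns beta, R^2, F and SSRes."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    f_stat = (r2 / d) / ((1.0 - r2) / (n - d - 1))  # d and n - d - 1 degrees of freedom
    return beta, r2, f_stat, ss_res

def cv_ss_res(beta, X_cv, y_cv):
    """Sum of squared residuals on held-out observations, with beta kept fixed."""
    X_cv = np.asarray(X_cv, dtype=float)
    design = np.column_stack([np.ones(len(X_cv)), X_cv])
    residuals = np.asarray(y_cv, dtype=float) - design @ beta
    return float(residuals @ residuals)

# Made-up data: four estimation observations and one held-out observation.
X_est, y_est = [[0.0], [1.0], [2.0], [3.0]], [0.1, 0.9, 2.1, 2.9]
beta, r2, f_stat, ss_res = fit_linear_model(X_est, y_est)
cv = cv_ss_res(beta, [[4.0]], [4.0])
print(r2, f_stat, ss_res, cv)
```

With n = 140 estimation observations and d = 10 regressors this gives n - d - 1 = 129 residual degrees of freedom.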
MODEL #1
$$Y^{(1)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_{10} x_{10},$$
where $Y^{(1)}$ is the total number of air passengers carried and $x_j = t_j$ for $j = 1, 2, \dots, 10$.
Results for the MODEL #1
$$\hat{E}\big(Y^{(1)}(x)\big) = 14 - 0.77x_1 + 0.16x_2 + 185.8x_3 - 2.44x_4 + 0.53x_5 + 0.07x_6 + 0.05x_7 + 0.32x_8 - 1.2x_9 - 1.03x_{10}$$
$R^2$ = 0.831; Fisher criterion F = 63.49.
Table 1
Variable    Factor     b         t(129)    p-level
Intercept              14.00     0.84      0.405
x1          TP         -0.77     -1.56     0.121
x2          AREA       0.16      5.60      0.000
x3          PD         185.80    4.67      0.000
x4          MLC        -2.44     -0.44     0.660
x5          GDP_PPS    0.53      1.68      0.096
x6          GDP        0.07      3.81      0.000
x7          CPL        0.05      0.37      0.710
x8          IR         0.32      0.29      0.771
x9          UR         -1.20     -1.59     0.114
x10         LPHW       -1.03     -3.75     0.000
MODEL #2
$$Y^{(2)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5,$$
where $Y^{(2)} = Y^{(1)}$; $x_1 = t_2$, $x_2 = t_3$, $x_3 = t_6$, $x_4 = t_{10}$, $x_5 = t_{11}$.
New factor
t11 (ON) = 0 if the considered country is an old member of the EU; 1 if the considered country is a new one.
Results for the MODEL #2
$$\hat{E}\big(Y^{(2)}(x)\big) = 13.56 + 0.09x_1 + 134.01x_2 + 0.05x_3 - 0.68x_4 + 29.36x_5.$$
$R^2$ = 0.829; Fisher criterion F = 129.85.
Table 2
Variable    Factor    b         t(134)    p-level
Intercept             13.56     2.45      0.016
x1          AREA      0.09      4.45      0.000
x2          PD        134.01    4.32      0.000
x3          GDP       0.05      10.34     0.000
x4          LPHW      -0.68     -5.12     0.000
x5          ON        29.36     4.21      0.000
MODEL #3
$$Y^{(3)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5,$$
where $Y^{(3)} = Y^{(1)}$; $x_1 = t_3$, $x_2 = t_6$, $x_3 = t_{10}$, $x_4 = t_1^2$, $x_5 = \sqrt{t_2}$.
Modifications of factors
Squares, square roots and ratios of the original factors were introduced as additional regressors, for example $t_1^2$, $\sqrt{t_1}$, $\sqrt{t_2}$, $t_2/t_1$, $\sqrt{t_2}/t_1$ and $t_6/t_1$.
Results for the MODEL #3
$$\hat{E}\big(Y^{(3)}(x)\big) = -6.34 + 113.26x_1 + 0.14x_2 - 0.52x_3 - 0.03x_4 + 3.03x_5$$
$R^2$ = 0.867; Fisher criterion F = 174.08.
Table 3
Variable    Factor        b         t(134)    p-level
Intercept                 -6.34     -1.05     0.296
x1          PD            113.26    4.00      0.000
x2          GDP           0.14      10.66     0.000
x3          LPHW          -0.52     -5.80     0.000
x4          sq(TP)        -0.03     -7.56     0.000
x5          sqrt(AREA)    3.03      5.74      0.000
Analysis of observed and predicted values for the MODEL #3
Figure 1. Plot of observed and predicted values (series: Observed, Predicted).
Figure 2. Plot of observed and predicted values for the CV (series: CVObserved, CVPredicted).
MODEL #4
$$Y^{(4)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_9 x_9,$$
where $Y^{(4)} = Y^{(1)}/t_1$ is the ratio between the total number of air passengers carried and the number of inhabitants of the country; the regressors $x_1, \dots, x_9$ are the factors and modified factors named in the Factor column of Table 4.
Results for the MODEL #4
$$\hat{E}\big(Y^{(4)}(x)\big) = 0.56 + 2.33x_1 - 1.04x_2 - 0.02x_3 + 0.001x_4 + 1.76x_5 - 0.0004x_6 + 0.04x_7 + 0.17x_8.$$
$R^2$ = 0.760; Fisher criterion F = 45.81.
Table 4
Variable    Factor           b        t(131)    p-level
Intercept                    -5.67    -6.25     0.000
x1          AREA             -0.02    -6.73     0.000
x2          PD               10.37    6.19      0.000
x3          MLC              -0.73    -4.19     0.000
x4          ON               0.83     8.30      0.000
x5          sqrt(TP)         -1.02    -7.32     0.000
x6          sqrt(AREA)       1.06     7.10      0.000
x7          AREA/TP          -0.12    -6.98     0.000
x8          sqrt(AREA)/TP    0.94     5.84      0.000
x9          GDP/TP           0.15     6.28      0.000
MODEL #5
$$Y^{(5)} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_8 x_8,$$
where $Y^{(5)} = Y^{(4)}$; $x_1 = t_4$, $x_2 = t_5$, $x_3 = t_8$, $x_4 = t_9$, $x_5 = t_{10}$, $x_6 = t_{11}$, $x_7 = t_{12}$, $x_8 = t_{16}$ (the ratio GDP/TP).
New factor
t12 (HL) = 0 if the value $y/t_1$ for the considered country is small (less than 2); 1 if the value $y/t_1$ is larger than 2.
Results for the MODEL #5
$$\hat{E}\big(Y^{(5)}(x)\big) = 0.99 - 0.46x_1 - 0.02x_2 - 0.02x_3 - 0.02x_4 + 0.01x_5 + 1.27x_6 + 1.15x_7 + 0.07x_8$$
$R^2$ = 0.864; Fisher criterion F = 104.174.
Table 5
Variable    Factor     b        t(131)    p-level
Intercept              0.99     3.93      0.000
x1          MLC        -0.46    -3.41     0.001
x2          GDP_PPS    -0.02    -3.81     0.000
x3          IR         -0.02    -1.33     0.187
x4          UR         -0.02    -1.90     0.056
x5          LPHW       0.01     3.72      0.000
x6          ON         1.27     9.21      0.000
x7          HL         1.15     15.30     0.000
x8          GDP/TP     0.07     3.41      0.001
Summary results for the linear regression models
Table 6
Model    R^2      R1    F         R2    SSRes     R3    CV SSRes    R4    Sum R    Total R
#1       0.831    3     63.49     4     52 651    5     114 885     5     17       5
#2       0.829    4     129.85    2     53 344    5     109 723     4     15       3
#3       0.867    1     174.10    1     41 599    2     49 450      1     5        1
#4       0.760    5     45.81     5     35 064    3     57 310      3     16       4
#5       0.864    2     104.20    3     12 775    1     51 448      2     8        2
R1 to R4 are the ranks of the models by the criterion in the preceding column (1 is the best), Sum R is the sum of these ranks and Total R is the resulting overall rank.
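The Sum R and Total R columns follow mechanically from the four rank columns. A short Python sketch that takes the ranks R1 to R4 exactly as they appear in Table 6 and reproduces both columns (the variable names are illustrative; ties in the rank sum are not handled because none occur here):

```python
# Ranks R1..R4 copied from Table 6 (criteria: R^2, F, SSRes, CV SSRes).
ranks = {
    "#1": [3, 4, 5, 5],
    "#2": [4, 2, 5, 4],
    "#3": [1, 1, 2, 1],
    "#4": [5, 5, 3, 3],
    "#5": [2, 3, 1, 2],
}

# Sum R: total of the four ranks for each model.
rank_sums = {model: sum(r) for model, r in ranks.items()}

# Total R: position of each model when sorted by Sum R (smaller is better).
ordered = sorted(rank_sums, key=rank_sums.get)
total_rank = {model: pos for pos, model in enumerate(ordered, start=1)}

for model in ranks:
    print(model, rank_sums[model], total_rank[model])
```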
Analysis of observed and predicted values for the MODEL #5
Figure 3. Plot of recalculated observed and predicted values (series: RObserved, RPredicted).
Figure 4. Plot of recalculated observed and predicted values for the CV (series: RObserved, RCVPredicted).
5. Elaboration of generalized linear models
For further investigation the best linear regression model (Model #5) was chosen.
Two different GLMs were considered. In both of them the regressand $Y^{(GLM)} = Y^{(5)} / t_1$ and the set of regressors are the same as for Model #5.
GLM1
$$E\big(Y_i^{GLM1}(x_i)\big) = h_i \, \frac{\exp\big(\sum_j \beta_j x_{i,j}\big)}{1 + \exp\big(\sum_j \beta_j x_{i,j}\big)}, \qquad (3)$$
GLM2
$$E\big(Y_i^{GLM2}(x_i)\big) = h_i \, \frac{1}{a + \exp\big(\sum_j \beta_j x_{i,j}\big)}, \qquad (4)$$
where $h_i$ is the total population, $x_i$ is the column vector of the independent variables, $i$ is the observation number, $i = 1, 2, \dots, n$, and $a$ is an additional parameter (a constant).
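The two mean functions can be written directly in Python. The sketch below assumes NumPy and a design matrix X whose first column is the constant term; the function names and the numbers are illustrative only.

```python
import numpy as np

def glm1_mean(h, X, beta):
    """GLM1: h_i * exp(eta_i) / (1 + exp(eta_i)), with eta_i = sum_j beta_j * x_ij."""
    eta = X @ beta
    return h * np.exp(eta) / (1.0 + np.exp(eta))

def glm2_mean(h, X, beta, a):
    """GLM2: h_i / (a + exp(eta_i)), with the extra constant a."""
    eta = X @ beta
    return h / (a + np.exp(eta))

# Illustrative values: one observation, population h = 10, linear predictor eta = 0.
h = np.array([10.0])
X = np.array([[1.0, 0.0]])
beta = np.array([0.0, 1.0])
print(glm1_mean(h, X, beta))          # 10 * 1/2
print(glm2_mean(h, X, beta, a=3.0))   # 10 / (3 + 1)
```

In GLM1 the fitted mean always lies between 0 and h_i, while in GLM2 it lies between 0 and h_i / a, so the constant a controls the upper bound of the fitted ratio.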
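For estimation of the unknown parameter vector $\beta$ we used the least squares criterion
$$R_0(\hat{\beta}) = \min_{\beta} \sum_{i=1}^{n} \big(Y_i - \hat{Y}_i\big)^2, \qquad (5)$$
where $Y_i$ and $\hat{Y}_i$ are the observed and calculated values of $Y$.
1. Linearization
LM1
$$\ln \frac{Y^*}{1 - Y^*} = \sum_j \beta_j x_{i,j}, \qquad (6)$$
LM2
$$\ln \Big( \frac{1}{Y^*} - a \Big) = \sum_j \beta_j x_{i,j}, \qquad (7)$$
where $Y^* = Y / h$.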
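The linearization step amounts to transforming Y* = Y/h so that the right-hand side becomes linear in the coefficients, and then fitting by ordinary least squares. A minimal Python sketch on synthetic, noise-free data, assuming NumPy; the function names `fit_lm1` and `fit_lm2` and all numbers are made up for the illustration:

```python
import numpy as np

def fit_lm1(X, y_star):
    """Least squares fit of ln(Y*/(1 - Y*)) on the columns of X (needs 0 < Y* < 1)."""
    z = np.log(y_star / (1.0 - y_star))
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta

def fit_lm2(X, y_star, a):
    """Least squares fit of ln(1/Y* - a) on the columns of X (needs 1/Y* > a)."""
    z = np.log(1.0 / y_star - a)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta

# Synthetic data: the first column of X is the constant term.
X = np.column_stack([np.ones(5), np.linspace(-1.0, 1.0, 5)])
beta_true = np.array([0.2, -0.7])
eta = X @ beta_true

y_star_1 = np.exp(eta) / (1.0 + np.exp(eta))   # logistic form, h = 1
y_star_2 = 1.0 / (2.0 + np.exp(eta))           # reciprocal form, h = 1, a = 2

beta_lm1 = fit_lm1(X, y_star_1)
beta_lm2 = fit_lm2(X, y_star_2, a=2.0)
print(beta_lm1, beta_lm2)
```

On noise-free data both fits return the generating coefficients. With real data the fit minimizes squared errors on the transformed scale rather than on the scale of Y, which is one reason a linearized fit can perform poorly on the original criterion.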
The models LM1 and LM2 give the following estimates for E(Y) (the signs of the terms in the exponents are not recoverable from this transcript and are shown as $\pm$):
$$\hat{E}\big(Y^{LM1}(x)\big) = h \, \frac{e^{\hat{\eta}_1(x)}}{1 + e^{\hat{\eta}_1(x)}}, \quad \hat{\eta}_1(x) = \pm 13.78 \pm 0.001x_1 \pm 6.68x_2 \pm 0.02x_3 \pm 0.7x_4 \pm 48.8x_5 \pm 0.44x_6 \pm 0.29x_7 \pm 7.81x_8 \pm 0.64x_9,$$
$$\hat{E}\big(Y^{LM2}(x)\big) = h \, \frac{1}{0.3 + e^{\hat{\eta}_2(x)}}, \quad \hat{\eta}_2(x) = \pm 11.65 \pm 1.63x_1 \pm 1.7x_2 \pm 0.04x_3 \pm 0.81x_4 \pm 17.96x_5 \pm 1.67x_6 \pm 0.2x_7 \pm 0.41x_8 \pm 0.11x_9.$$
Table 7. The values of SSRes and CV SSRes for the Model #5 and LM
         SSRes                            CV SSRes
         Model #5    LM1       LM2        Model #5    LM1        LM2
R0/n     12 775      27 447    21 834     51 448      676 576    229 554
Linearization gives poor results: both SSRes and CV SSRes are larger for LM1 and LM2 than for Model #5. To improve the results, a two-stage estimation procedure was developed.
The first stage is the linearization considered above. In the second stage, calibration, the estimates obtained are refined using the well-known gradient method.
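2. Calibration
Gradients for the least squares criterion, with $\eta_i = \sum_j \beta_j x_{i,j}$:
GLM1
$$\frac{\partial R(\beta)}{\partial \beta_j} = -2 \sum_{i=1}^{n} \Big( Y_i - h_i \frac{\exp(\eta_i)}{1 + \exp(\eta_i)} \Big) \frac{h_i \exp(\eta_i)\, x_{i,j}}{\big(1 + \exp(\eta_i)\big)^2}, \qquad (8)$$
GLM2
$$\frac{\partial R(\beta)}{\partial \beta_j} = 2 \sum_{i=1}^{n} \Big( Y_i - h_i \frac{1}{a + \exp(\eta_i)} \Big) \frac{h_i \exp(\eta_i)\, x_{i,j}}{\big(a + \exp(\eta_i)\big)^2}. \qquad (9)$$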
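A calibration step of this kind can be sketched in Python as plain gradient descent on the sum of squared residuals for the logistic form GLM1. The gradient in the code is obtained by differentiating that sum with respect to each coefficient. The fixed step size, the number of iterations, the starting point and the synthetic data are all assumptions made for the illustration; the presentation does not state which variant of the gradient method was used.

```python
import numpy as np

def glm1_mean(h, X, beta):
    eta = X @ beta
    return h * np.exp(eta) / (1.0 + np.exp(eta))

def ss_res(beta, h, X, y):
    """Sum of squared residuals for GLM1."""
    r = y - glm1_mean(h, X, beta)
    return float(r @ r)

def grad_glm1(beta, h, X, y):
    """Gradient of ss_res: -2 * sum_i resid_i * h_i * e^eta_i * x_ij / (1 + e^eta_i)^2."""
    e = np.exp(X @ beta)
    resid = y - h * e / (1.0 + e)
    weight = h * e / (1.0 + e) ** 2
    return -2.0 * X.T @ (resid * weight)

def calibrate(beta_start, h, X, y, step=0.1, n_iter=200):
    """Refine beta with fixed-step gradient descent (assumed variant)."""
    beta = np.array(beta_start, dtype=float)
    for _ in range(n_iter):
        beta = beta - step * grad_glm1(beta, h, X, y)
    return beta

# Synthetic, noise-free data; in the two-stage procedure beta_start would
# come from the linearization stage instead of being zero.
X = np.column_stack([np.ones(5), np.linspace(-1.0, 1.0, 5)])
h = np.ones(5)
y = glm1_mean(h, X, np.array([0.2, -0.7]))
beta_start = np.zeros(2)
beta_hat = calibrate(beta_start, h, X, y)
print(ss_res(beta_start, h, X, y), ss_res(beta_hat, h, X, y))
```

Starting the descent from the linearized estimates rather than from an arbitrary point is what makes the procedure two-stage: the first stage supplies a reasonable starting value and the second stage minimizes the criterion on the original scale of Y.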
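GLM1 and GLM2 have the following estimates for E(Y) (the signs of the terms in the exponents are not recoverable from this transcript and are shown as $\pm$):
$$\hat{E}\big(Y^{GLM1}(x)\big) = h \, \frac{e^{\hat{\eta}_1(x)}}{1 + e^{\hat{\eta}_1(x)}}, \quad \hat{\eta}_1(x) = \pm 7.05 \pm 1.05x_1 \pm 1.22x_2 \pm 0.02x_3 \pm 0.76x_4 \pm 5.77x_5 \pm 1.26x_6 \pm 0.11x_7 \pm 0.68x_8 \pm 0.15x_9,$$
$$\hat{E}\big(Y^{GLM2}(x)\big) = h \, \frac{1}{6.3 + e^{\hat{\eta}_2(x)}}, \quad \hat{\eta}_2(x) = \pm 7.26 \pm 1.09x_1 \pm 0.78x_2 \pm 0.02x_3 \pm 0.82x_4 \pm 7.81x_5 \pm 1.12x_6 \pm 0.1x_7 \pm 0.13x_8 \pm 0.06x_9.$$
Table 8
         CV SSRes
         Model #5    GLM1      GLM2
R0/n     51 447      47 807    34 567
For GLM2 the optimum value of $R_0$ was sought not only over the values of $\beta$ but over the parameter $a$ as well.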
Analysis of observed and predicted values for the GLM
Figure 5. Plot of observed and predicted values (series: Robserved, GLM1, GLM2).
Figure 6. Plot of observed and predicted values for the CV (series: CV Observed, CV GLM1, CV GLM2).
Dependence of the values of SSRes and CV SSRes on the value of the parameter $a$ for GLM2
Figure 7. The values of SSRes and CV SSRes as a function of the parameter $a$ for GLM2 (horizontal axis: $a$ from 1 to 10; vertical axis: 0 to 80 000).
The optimal value of SSRes was obtained when $a = 2$. The best value of CV SSRes was obtained when $a = 6$.
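A search over the extra constant of GLM2 can be organized as a simple grid: for each candidate value, estimate the coefficients and record the sum of squared residuals. The Python sketch below is a simplification under stated assumptions: it estimates the coefficients by the linearized fit only (no gradient calibration), uses h = 1, and runs on synthetic noise-free data generated with a = 3, so it shows the mechanics rather than the presentation's results.

```python
import numpy as np

def fit_lm2(X, y_star, a):
    """Least squares fit of ln(1/Y* - a) on the columns of X (needs 1/Y* > a)."""
    z = np.log(1.0 / y_star - a)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta

def ss_res_glm2(beta, a, X, y_star):
    """Sum of squared residuals of the fitted values 1 / (a + exp(X beta))."""
    fitted = 1.0 / (a + np.exp(X @ beta))
    return float(((y_star - fitted) ** 2).sum())

# Synthetic data generated with a = 3 (illustrative only).
X = np.column_stack([np.ones(6), np.linspace(1.0, 3.0, 6)])
y_star = 1.0 / (3.0 + np.exp(X @ np.array([0.5, 1.0])))

results = {}
for a in range(1, 7):                       # candidate values of the constant
    beta = fit_lm2(X, y_star, a)
    results[a] = ss_res_glm2(beta, a, X, y_star)

best_a = min(results, key=results.get)
print(best_a, results[best_a])
```

Running the same loop with the residuals computed on held-out observations gives the CV SSRes curve, and the two curves need not have their minimum at the same value of the constant.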
6. Conclusion
Linear and generalized linear regression models for forecasting air passenger conveyances from EU countries were considered. These models contain a large number of explanatory factors and their combinations.
For estimating the unknown parameters of the linear regression models we used the standard procedures. For estimating the unknown parameters of the GLM a special two-stage procedure was developed.
The cross-validation approach was taken as the main procedure for checking the adequacy of all considered models and for choosing the best model for forecasting.
The advantage of applying the GLM has been shown: CV SSRes is 47 807 for GLM1 and 34 567 for GLM2 against 51 447 for Model #5 (Table 8).
7. References
1. Andronov A.M. et al. Forecasting of air passengers conveyances on the transport. // Transport, Moscow, 1983. (In Russian).
2. Butkevičius J., Vyskupaitis A. Development of passenger transportation by Lithuanian sea transport. // In Proceedings of International Conference RelStat'04, Transport and Telecommunication, Vol. 6, N 2, 2005.
3. Hardle W., Muller M., Sperlich S., Werwatz A. Nonparametric and Semiparametric Models. Springer, Berlin, 2004.
4. Hünt U. Forecasting of railway freight volume: approach of Estonian railway to arise efficiency. // In TRANSPORT – 2003, Vol. XXVIII, No 6, pp. 255-258.
5. Šliupas T. Annual average daily traffic forecasting using different techniques. // In TRANSPORT – 2006, Vol. XXI, No 1, pp. 38-43.
6. EUROSTAT YEARBOOK 2005. The statistical guide to Europe. Data 1993–2004. EU, EuroSTAT, 2005. URL: http://epp.eurostat.ec.europa.eu
THANK YOU FOR YOUR ATTENTION