Research Article

Prediction of Train Arrival Delay Using Hybrid ELM-PSO Approach

Xu Bao,1 Yanqiu Li,2 Jianmin Li,2 Rui Shi,2 and Xin Ding2

1 College of Traffic Engineering, Huaiyin Institute of Technology, Jiangsu 223001, China
2 School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China

Correspondence should be addressed to Xu Bao; [email protected]

Received 10 May 2021; Accepted 4 June 2021; Published 14 June 2021

Academic Editor: Linchao Li

Copyright © 2021 Xu Bao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this study, a hybrid method combining extreme learning machine (ELM) and particle swarm optimization (PSO) is proposed to forecast train arrival delays that can be used for later delay management and timetable optimization. First, nine characteristics (e.g., buffer time, the train number, and station code) associated with train arrival delays are chosen and analyzed using an extra trees classifier. Next, an ELM with one hidden layer is developed to predict train arrival delays by taking these characteristics as input features. Furthermore, the PSO algorithm is chosen over Bayesian optimization and the genetic algorithm to optimize the hyperparameter of the ELM, solving the arduous problem of manual tuning. Finally, a case study confirms the advantage of the proposed model. Contrasted with four baseline models (k-nearest neighbor, categorical boosting, Lasso, and gradient boosting decision tree) across different metrics, the proposed model is demonstrated to be proficient and achieves the highest prediction accuracy. In addition, a detailed analysis of the prediction error shows that our model possesses good robustness and correctness.

1. Introduction

With the rapid development of society and the continuous improvement of people's quality of life, people have put forward higher requirements for the reliability and punctuality of high-speed railway transportation [1]. However, trains will inevitably be disturbed by a large number of random factors during operation, which leads to train delays. For one thing, train delays change the structure of the train diagram, increase the cost of railway operation and the difficulty of making reasonable use of transportation resources, and have a great negative impact on the reliability and punctuality of high-speed railway operation. For another, they increase the travel time of passengers, affect their travel plans, and bring serious inconvenience to passengers [2]. Therefore, accurate forecasting of train delays is of great significance for high-speed train operation organization, transportation service quality improvement, and operation safety [3].

Traditional models are a classical approach to train delay prediction, including probability distribution models [4, 5], regression models, event-driven methods, and graph theory-based approaches. For the probability distribution model, Higgins and Kozan proposed an exponential distribution model, applied to a three-way, two-block station train delay propagation signal system, to estimate delays of trains caused by train operational accidents [4]. Through the assessment of the linear relationship between several independent features and dependent features [2], regression models were widely employed to predict train delays, dwell times, and running times [6, 7]. However, the main drawback of regression models is that the ability of a linear analytic model relies heavily on internal mathematical assumptions. They are good at capturing linear relationships between features and dealing with low-dimensional data, but not with data that are far from simply linear, such as train operation data [8].
Hindawi, Journal of Advanced Transportation, Volume 2021, Article ID 7763126, 15 pages. https://doi.org/10.1155/2021/7763126

For event-driven and graph theory-based methods, Kecman and Goverde [9] used a timed event graph with dynamic weights to predict train running times.


Milinković et al. [10] used a fuzzy Petri net to predict train delays; the model considers the characteristics of the hierarchical structure and fuzzy reasoning to simulate train operation and predicts train delays in different delay scenarios. Huang et al. [8] used graph theory to calculate the degree and propagation range of train delays under specific conditions. Although traditional models appeared early in the study of delay prediction, they generally suffer from poor generalization performance and are only suitable for specific scenarios.

Recently, the application of machine learning methods to predict train delays has attracted wide attention from researchers, making up for the shortcomings of traditional methods [2]. The purpose of Peters et al. was to utilize the historical travel time between stations to predict the train arrival time more precisely [11]. The moving-average algorithm of historical travel time and the KNN of last arrival time algorithm were employed and evaluated. Some researchers have devoted themselves to ANNs to predict train delays [12–14]. The aim of [14] was to propose preeminent ANNs to predict the train delay of the Iranian railway with three different models, including standard real number, binary coding, and binary set encoding inputs. Nevertheless, the prediction accuracy of ANNs cannot meet the needs of actual delay management, and their parameter adjustment is complex. Marković et al. [2] proposed a support vector regression model for the delay problem of passenger trains, which captured the relationship between the arrival delay and a variety of changing external factors, and compared it with artificial neural networks. The results indicated that the support vector regression method outperformed the ANNs. Another type of network has been proposed in recent years: a Bayesian network model for predicting the propagation of train delays was presented by [15]. In view of the complexity and dependence, three different BN schemes for train delay prediction were proposed, namely heuristic hill-climbing, primitive linear, and hybrid structures [16]. The results turned out to be quite satisfying. Recently, it has become popular to combine several models to capture the various characteristics of train operation data to predict train delays. One study developed a train delay prediction model that combines convolutional neural networks, long short-term memory networks, and fully-connected neural network architectures to solve this issue [17].

To improve on the backpropagation algorithm and simplify the setting of learning parameters of general machine learning models, the ELM algorithm was proposed by Bin Huang et al. [18]. ELM has the advantages of small computation, good generalization, and fast convergence. On account of these advantages, ELM has been frequently applied to regression problems in the real world [19–22]. Accordingly, a recent study that combined a shallow ELM and a deep ELM tuned via the threshold-out technique was employed to predict train delays, taking weather data into account [23].

Parameter adjustment is another critical factor in guaranteeing the good performance of machine learning models [24, 25]. Although the well-known random search algorithm can achieve the purpose of optimization, it generates all solutions randomly, without considering the previous solutions. An adjusting-parameter model called PSO has become one of the most widely used parameter adjustment methods because of its ability to address intractable problems in the real world. Only the optimal particle of PSO transmits information to the next particle in the iterative evolution process. As a consequence, the search speed of PSO is faster than random search and grid search [26]. The experiment in [27] proved the advantage of PSO: by comparing the performance of PSO with a random search algorithm on an optimal control problem, [27] found that PSO was capable of locating a better solution than random search with the same number of fitness function calculations.

Therefore, to the best of the authors' knowledge, we are the first to propose PSO to optimize the hyperparameter of the ELM to forecast train arrival delays.

The contributions this paper makes are as follows:

(1) The main features affecting train delay prediction are evaluated by the extra trees classifier. Then the proposed model is constructed based on these features, which possess spatiotemporal characteristics (train delays at each station). In this way, the interpretability of the proposed model is improved.

(2) The proposed model is applied to the arrival delay prediction of trains on an HSR line, which suggests a brand-new perspective on the train delay prediction problem. In addition to overcoming the drawbacks of the backpropagation algorithm, ELM-PSO also solves the arduous problem of manually tuning the hidden neurons of the ELM, doing so better than random search and Bayesian optimization in both accuracy and efficiency.

(3) We perform experiments on a section of the Wuhan-Guangzhou (W-G) HSR line. The proposed model is not only compared with two other parameter-tuning models but also contrasted with four prediction models from different perspectives. Our model turns out to have an extraordinary ability to manage large-scale data accurately.

The remainder of this paper is organized as follows. In Section 2, the train delay problem and the selection of characteristic features are described. The hybrid ELM-PSO approach is introduced in detail in Section 3. The data description and experimental settings are presented in Section 4. The performance analysis is discussed in Section 5. Finally, conclusions are presented in Section 6.

2. Description of the Train Delay Problem

The train delay problem is visualized in Figure 1 to assist in comprehending this abstract problem. Train delay covers two cases: train arrival delay and train departure delay. For a station s_n, t_A(s_n) represents the time at which a train is scheduled to arrive at station s_n, and likewise t_D(s_n) denotes the time at which the train is scheduled to depart from station s_n. Naturally, the train will have its own actual timetable due to changing external factors, which is


expressed as t'_A(s_n) and t'_D(s_n), respectively. The difference between the actual and scheduled arrival times at station s_n, t'_A(s_n) − t_A(s_n), is referred to as the train arrival delay. The same goes for the train departure delay, t'_D(s_n) − t_D(s_n). This is the primitive description of the train delay problem.

This paper focuses only on train arrival delay prediction. We suppose there is a target train T_k, which is currently at station s_n at time t_A(s_n). Our purpose is to predict the arrival delay (t'_A(s_{n+1}) − t_A(s_{n+1})) of the target train T_k at its following station s_{n+1} under all conditions, based on the information of train T_k at stations s_{n−1}, s_n, and s_{n+1}, which is made up of the following nine features:

(1) The station code (X1)
(2) The train number (X2), which indicates the number of the train
(3) The length between the present station and the next station (D(s_{n+1}) − D(s_n)) (X3)
(4) The scheduled running time between the present station and the previous station (t_A(s_n) − t_A(s_{n−1})) (X4)
(5) The actual running time between the present station and the previous station (t'_A(s_n) − t'_A(s_{n−1})) (X5)
(6) The scheduled running time between the present station and the next station (t_A(s_{n+1}) − t_A(s_n)) (X6)
(7) The actual running time between the present station and the next station (t'_A(s_{n+1}) − t'_A(s_n)) (X7)
(8) The buffer time, which is the difference between X6 and the actual minimum running time of all trains between the present station and the next station, X6 − min{T_1(t'_A(s_{n+1}) − t'_A(s_n)), ..., T_k(t'_A(s_{n+1}) − t'_A(s_n))} (X8)
(9) The arrival delay time at the present station (t'_A(s_n) − t_A(s_n)) (X9)

Y represents the arrival delay time of train T_k at the next station s_{n+1}, (t'_A(s_{n+1}) − t_A(s_{n+1})).

There are multiple potentially interdependent features (e.g., the train number, the length between two adjacent stations) that are closely related to train delay prediction. Based on the collected data and the experience of dispatchers, we ultimately select nine features that may influence train delays.

We apply the extra trees classifier to analyze the correlation between all features and train delays. The results are exhibited in Figure 2. As shown in the figure, the deeper the red, the higher the importance. There is no doubt that X9 has the highest importance with respect to Y. The actual and scheduled running times between the present station and the next station also contribute largely to the accurate prediction of Y. Moreover, the buffer time, an important factor affecting the length of the train recovery time, is also comparatively prominent in the delay prediction process. Taking the buffer time into account allows us to obtain more realistic prediction results.
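As a rough illustration, this screening step can be reproduced with scikit-learn's ExtraTreesClassifier. This is only a sketch on synthetic stand-in data; the paper's real N × 9 feature matrix and label encoding are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the N x 9 feature matrix; the label marks whether
# the train is delayed at the next station. X9 (index 8) and X6 (index 5)
# are made the dominant drivers, loosely mimicking Figure 2.
X = rng.normal(size=(1000, 9))
y = (X[:, 8] + 0.4 * X[:, 5] + 0.1 * rng.normal(size=1000) > 0).astype(int)

clf = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
for name, imp in zip([f"X{i}" for i in range(1, 10)], clf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

The `feature_importances_` vector sums to one, so the printed values can be read directly as relative importance shares, as in Figure 2.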

The train arrival delay prediction problem in this paper is therefore transformed into the following expression:

Y = f(X1, X2, X3, X4, X5, X6, X7, X8, X9),  (1)

where Xi is the information of train T_k running through stations s_{n−1}, s_n, and s_{n+1}, Y is the arrival delay time at the following station s_{n+1} of train T_k, and f(x) is the prediction process.
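A minimal sketch of how the nine features and the label Y could be assembled from a timetable log; the column names, times, and minimum running time below are illustrative assumptions, not values from the paper's database.

```python
import pandas as pd

# Hypothetical slice of a train-operation log for one train (times are in
# seconds since midnight; column names are illustrative, not the paper's).
log = pd.DataFrame({
    "station_code": [366, 367, 368],           # s_{n-1}, s_n, s_{n+1}
    "km_mark":      [1850.0, 1905.0, 1950.0],  # cumulative distance D
    "sched_arr":    [44280, 45960, 46740],     # t_A
    "actual_arr":   [44340, 46080, 46830],     # t'_A
})

prev, cur, nxt = log.iloc[0], log.iloc[1], log.iloc[2]
min_run_next = 700  # assumed min. actual running time s_n -> s_{n+1} over all trains

features = {
    "X1": cur["station_code"],                      # station code
    # X2 (the train number) would come from the train identifier column.
    "X3": nxt["km_mark"] - cur["km_mark"],          # distance to next station
    "X4": cur["sched_arr"] - prev["sched_arr"],     # scheduled run, previous leg
    "X5": cur["actual_arr"] - prev["actual_arr"],   # actual run, previous leg
    "X6": nxt["sched_arr"] - cur["sched_arr"],      # scheduled run, next leg
    "X7": nxt["actual_arr"] - cur["actual_arr"],    # actual run, next leg
    "X9": cur["actual_arr"] - cur["sched_arr"],     # arrival delay at s_n
}
features["X8"] = features["X6"] - min_run_next      # buffer time
label_Y = nxt["actual_arr"] - nxt["sched_arr"]      # arrival delay at s_{n+1}
print(features, label_Y)
```

Repeating this over every (train, station) event in the records yields the N × 9 matrix and label vector used for training.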

3. Methodology

This paper proposes a hybrid model of ELM and PSO for train delay prediction. ELM is widely used in regression problems because of its small computation, good generalization performance, and fast convergence speed [19–21]. The PSO algorithm is a stochastic, parallel optimization algorithm with the advantages of fast convergence and simplicity [25, 28]. Therefore, we aim to combine the advantages of the ELM and PSO algorithms to improve behavioral knowledge in the delay prediction domain. For the principles of ELM and PSO, one can refer to Li et al. [29], Perceptron et al. [30], and Zhang et al. [31]. The running process of the proposed hybrid method is as follows.

Figure 1: Conversion from the train itinerary to mathematical notation (nominal vs. actual timetable for stations s_{n−1}, s_n, s_{n+1}, showing dwell times, a breakdown, and the resulting arrival delay time).


Step 1: data preprocessing. First, the nine features mentioned in Section 2 are assembled into an N × 9 matrix, where N represents the total number of events in the train operation records. Second, abnormal delays are removed (trains may be canceled due to emergencies) to reduce their interference with predictions. Third, missing data are filled in according to the adjacent data around the missing entries.

Step 2: initializing the parameters and population. Parameters such as the maximal iteration number, population size, and the speed and position of the first particles are initialized. Each particle l has its own position A_l(ite) and speed V_l(ite). The position of each particle in the population is equivalent to the number of neurons in the hidden layer of the ELM. Therefore, each particle has merely one dimension:

A_l(ite) = h_l(ite), l = 1, ..., k,  (2)

where h_l(ite) represents the number of hidden-layer neurons of the ELM in the ite-th iteration.

Step 3: the ELM (with the sigmoid function as the hidden-layer activation function) is used. The processed feature set X and the positions of the particles (the numbers of hidden-layer neurons) generated by PSO are input into the ELM. Consequently, the ELM outputs the weight matrix under the current number of hidden-layer neurons. The function calculating the fitness of the particles is as follows:

p_l(ite) = fitness(A_l(ite), X) = sqrt( (1/N) Σ_{n=1}^{N} (y_n − f_n(x))^2 ),
A_l(ite) = h_l(ite), l = 1, ..., k,  (3)

where N is the number of samples, y_n is the actual output value on the test set, and f_n(x) is the predicted output value on the test set.

Step 4: calculate the fitness of each particle and compare the values to update the current best fitness and its particle location.

Step 5: start the iteration. PSO updates the positions and velocities of all particles and then repeats Step 4. If the maximum number of iterations is exceeded, the process ends.

Step 6: output the results. We obtain the output values on the test set as well as the optimal number of hidden-layer neurons.

The specific flowchart is shown in Figure 3.
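Steps 1–6 can be condensed into a minimal NumPy sketch on synthetic data. The swarm size, iteration budget, and search range below are deliberately scaled down from the paper's settings (20 particles, 20 iterations, range [1, 2000]), and the inertia/acceleration coefficients are common textbook values assumed here, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit_predict(X_tr, y_tr, X_te, n_hidden, seed=0):
    """Single-hidden-layer ELM: random input weights, sigmoid activation,
    output weights solved in closed form via the pseudo-inverse."""
    r = np.random.default_rng(seed)
    W = r.normal(size=(X_tr.shape[1], n_hidden))
    b = r.normal(size=n_hidden)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    beta = np.linalg.pinv(sigmoid(X_tr @ W + b)) @ y_tr   # output weights
    return sigmoid(X_te @ W + b) @ beta

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

# Toy regression data standing in for the N x 9 feature matrix.
X = rng.normal(size=(400, 9))
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=400)
X_tr, X_te, y_tr, y_te = X[:320], X[320:], y[:320], y[320:]

# PSO over one dimension: the number of hidden neurons (Step 2).
n_particles, n_iter, lo, hi = 5, 10, 1, 200
pos = rng.uniform(lo, hi, n_particles)
vel = np.zeros(n_particles)
fit = lambda p: rmse(y_te, elm_fit_predict(X_tr, y_tr, X_te, max(1, int(p))))
pbest, pbest_f = pos.copy(), np.array([fit(p) for p in pos])   # Step 4
g = pbest[pbest_f.argmin()]                                    # global best

for _ in range(n_iter):                                        # Step 5
    r1, r2 = rng.uniform(size=n_particles), rng.uniform(size=n_particles)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
    pos = np.clip(pos + vel, lo, hi)
    f = np.array([fit(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    g = pbest[pbest_f.argmin()]

# Step 6: best hidden-layer size and its test RMSE.
print("best hidden neurons:", int(g), "test RMSE:", round(pbest_f.min(), 4))
```

The closed-form least-squares solve for the output weights is what lets the fitness function in equation (3) be evaluated cheaply for every particle at every iteration.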

4. Application to a Case Study

4.1. Dataset Description. The data employed to verify ELM-PSO are obtained from the dispatching office of a railway bureau. The 15 stations covered in the study form a 1096 km section from CBN to GZS on the double-track W-G HSR line. More than 400,000 data points are used in this study, with a time span from October 2018 to April 2019. The train original operation data and the route map of the targeted 15 stations on the W-G HSR line are shown in Table 1 and Figure 4.

Analysis of the delay ratio of each station reveals not only the condition of each station but also an increasing emphasis on the indispensability of train arrival delay prediction, which contributes to improving the ability of each station to cope with, and even inhibit, increases in train arrival delays. Trains with an arrival delay greater than 4 minutes are considered delayed trains. What is intuitively presented in Figure 5 is that the delay ratios of all the stations are basically

Figure 2: Bar chart of the correlations between the nine input features and output Y (X9 is the most important input by a wide margin, followed by X6).


Figure 3: The flowchart of the ELM-PSO method (input data → data preprocessing → initializing the parameters and population → calculating the fitness of each particle and updating the current best fitness → updating the positions and velocities of all particles → ite = ite + 1 until the maximum number of iterations is reached → output the predicted delay).

Table 1: Train operation data format in the database.

Station | Station code | Date | Actual arrival | Actual departure | Train | Scheduled arrival | Scheduled departure
GZS | 369 | 2018/7/27 | 12:04 | 12:04 | G100 | 12:05 | 12:05
GZN | 368 | 2018/7/27 | 12:19 | 12:19 | G100 | 12:19 | 12:19
QY | 367 | 2018/7/27 | 12:26 | 12:26 | G100 | 12:26 | 12:26
YDW | 366 | 2018/7/27 | 12:37 | 12:37 | G100 | 12:38 | 12:38

Figure 4: Map of the W-G HSR line (stations from north to south: Wuchang (WC), Xianningbei (XNN), Chibibei (CBN), Yueyangdong (YYE), Miluodong (MLE), Changshanan (CSS), Zhuzhouxi (ZZW), Hengshanxi (HSW), Hengyangdong (HYE), Leiyangxi (LYW), Chenzhouxi (CZW), Lechangdong (LCE), Shaoguan (SG), Yingdexi (YDW), Qingyuan (QY), Guangzhoubei (GZN), Qingsheng (QS), Guangzhounan (GZS)).


not optimistic. At the same time, the delay ratios of the two targeted stations, CZW and GZN, are particularly dreadful, with arrival delay ratios of about 0.12. Our goal is to minimize the arrival delay ratio by predicting the arrival delay at each station.

4.2. Experimental Settings

4.2.1. Baseline Models. In order to benchmark the performance of our proposed method, k-nearest neighbor (KNN), categorical boosting (CB), gradient boosting decision tree (GBDT), and Lasso are used as baseline models. We take 20% of the dataset as the test set and the rest as the training set. The experiments run in Python in an environment with an Intel® Core i5-6200U processor (2.13 GHz) and 8 GB RAM. Briefly, an overview and the hyperparameter settings of each model are as follows.

(1) KNN: the KNN algorithm is extensively applied in differing applications, largely owing to its simplicity, comprehensibility, and relatively promising performance [32].

① n_neighbors = 15
② weights = uniform
③ leaf_size = 30
④ p = 2

(2) CB: CB is a machine learning model based on the gradient boosting decision tree (GBDT) [33, 34]. CB is an outstanding technique, especially for datasets with heterogeneous features, noisy data, and complex dependencies.

① depth = 3
② learning_rate = 0.1
③ loss_function = RMSE

(3) GBDT: GBDT has been employed on numerous problems [35]; it provides many nonlinear transformations, has strong expressive ability, and does not need complex feature engineering and feature transformation.

① n_estimators = 30
② loss = ls
③ learning_rate = 0.1

(4) Lasso: Lasso is a prevailing technique capable of simultaneously performing regularization and feature filtering. Furthermore, data can be analyzed from multiple dimensions by Lasso [36].

① alpha = 30
② max_iter = 1000
③ selection = cyclic

4.2.2. Evaluation Metrics. Root mean squared error (RMSE), mean absolute error (MAE), and R-squared are selected to assess the models. The definitions of the error metrics are shown in equations (4), (5), and (6):

MAE = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i|,  (4)

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)^2 ),  (5)

R-squared = 1 − Σ_{i=1}^{N} (y_i − ŷ_i)^2 / Σ_{i=1}^{N} (y_i − ȳ)^2,  (6)

where y_i is an observed value, ŷ_i is a predicted value, ȳ is the average of the y_i, and N represents the sample size.
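Equations (4)–(6) translate directly into code; a small self-contained sketch (the toy vectors are illustrative):

```python
import numpy as np

def mae(y, yhat):
    # Equation (4): mean absolute error.
    return float(np.mean(np.abs(y - yhat)))

def rmse(y, yhat):
    # Equation (5): root mean squared error.
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def r_squared(y, yhat):
    # Equation (6): 1 - residual sum of squares / total sum of squares.
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y    = np.array([1.0, 2.0, 3.0, 4.0])
yhat = np.array([1.1, 1.9, 3.2, 3.8])
print(mae(y, yhat), rmse(y, yhat), r_squared(y, yhat))
```

Note that lower is better for MAE and RMSE, while R-squared approaches 1 for a good fit, which is how Tables 3–5 should be read.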

4.2.3. Hyperparameter Tuning Models. We compare PSO with two other hyperparameter tuning models to ascertain the most satisfactory one. The overview and hyperparameter settings of each model are as follows.

Figure 5: Arrival delay ratio for each station (CBN 0.047, YYE 0.059, MLE 0.062, CSS 0.052, ZZW 0.11, HSW 0.116, HYE 0.102, LYW 0.113, CZW 0.118, LCE 0.112, SG 0.121, YDW 0.031, QY 0.121, GZN 0.119, GZS 0.113).


(1) PSO: to locate the optimal hyperparameter of the ELM, the parameter settings of the PSO algorithm are as follows. PSO has 20 particles at each iteration and runs for 20 iterations altogether, which is equivalent to the 400 iterations of Bayesian optimization.

① Number of particles = 20
② Fitness function: RMSE on the test set
③ Search dimension = 1
④ Particle search range: [1, 2000]
⑤ Maximum number of iterations = 20

(2) BO (Bayesian optimization): BO calculates the posterior probability distribution of the first n points through a surrogate function and obtains the objective function of each hyperparameter at each value point.

① Objective function: RMSE on the test set
② Surrogate function: Gaussian process regression
③ Acquisition function: UCB (upper confidence bound)
④ Hyperparameter search range: [1, 2000]
⑤ Maximum number of iterations = 400

(3) GA (genetic algorithm): traditional iterative models easily fall into the trap of local minima, which makes further iteration impossible. GA overcomes this "dead loop" phenomenon and is a global optimization algorithm [37].

① Objective function: RMSE on the test set
② Hyperparameter search range: [1, 2000]
③ Generations = 20
④ Population size = 20
⑤ Maximum number of iterations = 400

5. Performance Analysis

5.1. PSO Optimization Result Comparison. The process of PSO tuning the hyperparameter is shown in Figure 6. The fitness value reaches its minimum after five iterations. The best fitness value on the test set is 1.0387, obtained when the ELM has 1462 hidden neurons; the structure of the network is then optimal.

The search range [1, 2000] for the hyperparameter is determined by manually trying several values in the range [1, 10000]. When the hyperparameter value is greater than 2000, the fitness tends to be stable, while the time consumption multiplies acutely. Ultimately, weighing time consumption against precision, we decide to limit the search range to [1, 2000].

The computational cost is shown in Table 2; the results are the optimal results of each model over several runs. We gain two observations from this table. First, the optimal particle number of ELM-PSO always converges to 1462 neurons; the only difference is the number of iterations at the best RMSE. Second, compared with ELM-BO and ELM-GA, ELM-PSO is the ideal model, taking the shortest time to locate the optimal fitness on the test set.

5.2. Model Accuracy Comparison. In this section, the performance comparison between ELM-PSO and the baseline models is performed.

First, we compare the overall performance of the five models. The evaluation metrics are R-squared, the MAE, and the RMSE. The corresponding results on the test set and training set are summarized in Tables 3 and 4, respectively. The ELM-PSO model performs best among the five models on not only the training set (R-squared = 0.9973, MAE = 0.3377, RMSE = 0.8247) but also the test set (R-squared = 0.9955, MAE = 0.3490, RMSE = 1.0387). Although the running time of our model has no obvious advantage over the other models on the test set, it is within a tolerable range. We also notice that there are models that perform well on the training set but not on the test set, which reveals the paramount importance of sufficient generalization ability in prediction problems.

Then, by separating the delay duration into three bins (i.e., [0–1200 s], >1200 s, and all delayed trains (trains with an arrival delay greater than 240 seconds)), we attempt to measure the capability of the benchmark models and our model to seize the features of train delays of varying degrees on the test set. As is distinctly shown in Table 5, the proposed model outperforms the other benchmark models in each time horizon and on each evaluation metric, achieving, for example, an RMSE of 0.5201 in the first bin. This finding is taken as evidence that our model can constantly adjust itself to capture the characteristics of varying degrees of train delays to enhance the prediction accuracy. To further assess the performance of our model, comprehensive analyses are discussed in a later section.

5.3. Further Analysis. On the basis of the previous section, we evaluate the performance of the ELM-PSO model from other angles, including the prediction errors for each station, the prediction correctness, and the robustness.

First and foremost, the errors of the ELM-PSO model for the predicted arrival delays are calculated at the station level on the test set. Viewing the overall situation in Figure 7, we notice that the prediction errors are low. The MAE and R-squared both remain stable at each station, and the RMSEs for the different stations are mostly less than 90 s. However, great fluctuations occur at the YYE and QY stations. The reasons for this phenomenon are that the two stations are close to transfer stations and that the buffer times of YYE and QY are both small. The prediction accuracy at these stations tends to be slightly hindered by these factors.

In addition, to put forward more detailed results, we describe the correctness of the absolute residual between the predicted values and the actual values for each station over three intervals (i.e., <30 s, 30 s–60 s, and 60 s–90 s) (Figure 8). In the <30 s interval, the correctness of


each station exceeds 75%. In brief, the overall results confirm the impressive prediction correctness of the proposed model.

At last, we investigate the robustness of our model to data size. In detail, we further train and test our model using 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90% of the total data, respectively, as the test set and compare the results with the baseline models. The data sizes used in the experiments are shown in Figure 9. The performance of our model on both the training and test sets is more outstanding than that of the others. As we can see, the RMSEs of our model stay quite stable across data of different sizes, while the RMSEs of the other models are higher and fluctuate. These figures show that the proposed model achieves the best predictive RMSE, MAE, and R-squared for all trains, which demonstrates the robustness of our model to different data sizes.
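The robustness experiment can be sketched as a loop over test-set fractions; here Lasso on synthetic data stands in for the actual models and the W-G dataset, and the alpha value is an assumption for the toy data, not the paper's setting.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 9))
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=600)

# Re-train while holding out 10%..90% of the data as the test set and watch
# how stable the test RMSE stays across data sizes.
errors = []
for frac in [0.1, 0.3, 0.5, 0.7, 0.9]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=frac,
                                              random_state=0)
    m = Lasso(alpha=0.01, max_iter=1000).fit(X_tr, y_tr)
    err = float(np.sqrt(np.mean((m.predict(X_te) - y_te) ** 2)))
    errors.append(err)
    print(f"test fraction {frac:.0%}: RMSE {err:.4f}")
```

A model that is robust to data size, as claimed for ELM-PSO, would show a nearly flat RMSE curve over these fractions.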

5.4. Statistical Tests. In this section, the Friedman test (FT) and the Wilcoxon signed rank test (WSRT) are used to verify the advantages of our proposed method over the other methods [35, 38]. The results of the FT and WSRT are shown in Table 6. The FT is a nonparametric statistical tool that determines differences by ranking

Figure 6: Convergence discriminant graph of the PSO optimization (fitness value versus number of iterations).

Table 2: The performance of each hyperparameter-tuning model.

Model     Maximum number of iterations   Number of iterations at best RMSE   RMSE     Neurons   Time (s)
ELM-PSO   20                             5                                   1.0387   1462      90000
ELM-BO    400                            120                                 1.0708   1279      129600
ELM-GA    20                             6                                   1.0668   1494      108000

Table 3: Prediction errors on each model's test set for the W-G HSR line.

Model     RMSE     MAE      R-squared   Time (s)
ELM-PSO   1.0387   0.3490   0.9955      856
CB        1.6808   1.0464   0.9883      9
GBDT      1.9976   1.2031   0.9835      18
Lasso     1.9852   0.9240   0.9847      9
KNN       1.6488   0.5025   0.9887      27

Table 4: Prediction errors on each model's training set for the W-G HSR line.

Model     RMSE     MAE      R-squared
ELM-PSO   0.8247   0.3377   0.9973
CB        1.6368   1.0407   0.9896
GBDT      1.9495   1.2107   0.9852
Lasso     1.6046   0.9199   0.9893
KNN       1.4681   0.4558   0.9916


the performance of each method. It can be seen from the table that the proposed method has a better ranking than CB, GBDT, Lasso, and KNN at the 5% significance level; that is, its efficiency is better. In addition, the results of the WSRT show that the p-value is less than 0.05 (5% significance level), which rejects the null hypothesis. It means that there is a statistical difference between the proposed method and the other methods. That is, the performance of the proposed method is better than that of the other methods.
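The two tests above can be run with SciPy. In this hedged sketch, the per-sample absolute residuals of the five models are synthetic placeholders (the gamma-distributed `errors` arrays are assumptions, not the paper's data): the Friedman test checks for an overall difference among the five models, and pairwise Wilcoxon signed-rank tests compare ELM-PSO against each baseline.

```python
# Friedman test across the five models' paired error samples, followed by
# pairwise Wilcoxon signed-rank tests of ELM-PSO against each baseline.
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(1)
n = 200
errors = {
    "ELM-PSO": rng.gamma(2.0, 0.15, n),   # smallest errors (placeholder)
    "CB":      rng.gamma(2.0, 0.50, n),
    "GBDT":    rng.gamma(2.0, 0.60, n),
    "Lasso":   rng.gamma(2.0, 0.50, n),
    "KNN":     rng.gamma(2.0, 0.30, n),
}

stat, p_friedman = friedmanchisquare(*errors.values())
print(f"Friedman chi2={stat:.2f}, p={p_friedman:.4g}")

pairwise_p = {}
for name in ["CB", "GBDT", "Lasso", "KNN"]:
    w_stat, p = wilcoxon(errors["ELM-PSO"], errors[name])
    pairwise_p[name] = p
    print(f"ELM-PSO vs {name}: Wilcoxon p={p:.2e}")
```

A p-value below 0.05 in each pairwise test, as in Table 6, rejects the null hypothesis that the two error distributions are equivalent.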

Table 5: Model performance comparison on the test set for the five models for different delay bins.

Delay bin (seconds)        Model     RMSE     MAE      R-squared
[0, 1200]                  ELM-PSO   0.5201   0.3006   0.9655
                           CB        1.2628   0.9223   0.7967
                           GBDT      1.3968   0.9886   0.7513
                           Lasso     1.0957   0.7903   0.8469
                           KNN       1.0195   0.5278   0.8675
>1200                      ELM-PSO   5.6009   2.8249   0.9924
                           CB        6.2335   3.1986   0.9906
                           GBDT      8.0733   4.8589   0.9843
                           Lasso     6.5023   3.1169   0.9898
                           KNN       9.0247   4.8611   0.9804
All delayed trains (>240)  ELM-PSO   3.3116   1.3457   0.9951
                           CB        4.0469   2.2680   0.9927
                           GBDT      5.1561   3.1498   0.9881
                           Lasso     3.9251   1.7418   0.9931
                           KNN       5.5878   2.8828   0.9860

Figure 7: Prediction errors in terms of the RMSE, MAE, and R-squared for different stations.


Figure 8: Prediction correctness for each station on the test set (intervals: <0.5 min, 0.5 min–1 min, and 1 min–1.5 min).



6. Conclusion

In this paper, a hybrid ELM-PSO method is proposed to predict train delays. The ELM can overcome the shortcomings of the backpropagation training algorithm, and the advantage of PSO is its excellent ability in searching for the best hyperparameter. Four benchmark models, the CB, KNN, GBDT, and Lasso models, are selected for comparison with the proposed model. These models were run on the same data collected from China Railways. ELM-PSO tends to have a better performance and generalization ability (R-squared = 0.9955, MAE = 0.3490, RMSE = 1.0387) than the other models on the test set. Our work can not only provide sufficient time and auxiliary decisions for the dispatcher to make a reasonable optimization and adjustment plan but also has practical significance for improving the quality of railway service and helping passengers estimate their travel time.

The dataset used in this paper contains train delays under all types of scenarios. Therefore, in the future, we will

Figure 9: MAE, RMSE, and R-squared values on the training and test sets with different data sizes (LASSO1 represents the performance on the training set; LASSO2 represents the performance on the test set; the other models follow the same convention).

Table 6: Friedman ranking test and WSRT results.

Models         Mean rank   FT p value   Model 1 vs. model 2   WSRT Z-score   WSRT p value
ELM-PSO (M1)   1.50        ≤0.001       —                     —              —
CB (M2)        3.55                     M1–M2                 −25.9200       ≤0.001
GBDT (M3)      4.08                     M1–M3                 −26.1312       ≤0.001
Lasso (M4)     3.53                     M1–M4                 −25.7456       ≤0.001
KNN (M5)       2.34                     M1–M5                 −17.4373       ≤0.001


consider dividing all the data into certain types of delay scenarios according to particular rules and implementing currently prevalent models to train on and predict each scenario to achieve a higher accuracy. Finally, in terms of the input features, all the information on the features in this paper can be obtained from train timetables. In the future, other types of features, such as infrastructure and weather features and obstructions on other HSR lines, will be taken into account.

Data Availability

The data used to support the findings of this study were supplied by China Railway Guangzhou Bureau Group Co., Ltd. under license and so cannot be made freely available. Access to these data should be considered by the corresponding author upon request, with the permission of China Railway Guangzhou Bureau Group Co., Ltd.

Conflicts of Interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the submitted paper.

Authors' Contributions

Xu Bao contributed to conceptualization, prepared the original draft, was responsible for software, and visualized the study. Yanqiu Li prepared the original draft, was responsible for software, and visualized the study. Jianmin Li contributed to methodology and reviewed and edited the manuscript. Rui Shi contributed to supervision and data curation. Xin Ding contributed to data curation.

Acknowledgments

This work was financially supported by the Fundamental Research Funds for the Central Universities of China (2019JBM077) and the Open Fund for the Jiangsu Key Laboratory of Traffic and Transportation Security (Huaiyin Institute of Technology).

References

[1] J. L. Espinosa-Aranda and R. García-Ródenas, "A demand-based weighted train delay approach for rescheduling railway networks in real time," Journal of Rail Transport Planning & Management, vol. 3, no. 1-2, pp. 1–13, 2013.

[2] N. Marković, S. Milinković, K. S. Tikhonov, and P. Schonfeld, "Analyzing passenger train arrival delays with support vector regression," Transportation Research Part C: Emerging Technologies, vol. 56, pp. 251–262, 2015.

[3] C. Wen, W. Mou, P. Huang, and Z. Li, "A predictive model of train delays on a railway line," Journal of Forecasting, vol. 39, no. 3, pp. 470–488, 2019.

[4] A. Higgins and E. Kozan, "Modeling train delays in urban networks," Transportation Science, vol. 32, no. 4, pp. 346–357, 1998.

[5] J. Yuan and I. A. Hansen, "Optimizing capacity utilization of stations by estimating knock-on train delays," Transportation Research Part B: Methodological, vol. 41, no. 2, pp. 202–217, 2007.

[6] P. B. L. Wiggenraad, Alighting and Boarding Times of Passengers at Dutch Railway Stations, TRAIL Research School, Delft, Netherlands, 2001.

[7] M. F. Gorman, "Statistical estimation of railroad congestion delay," Transportation Research Part E: Logistics and Transportation Review, vol. 45, no. 3, pp. 446–456, 2009.

[8] P. Huang, C. Wen, and L. Fu, "Modeling train operation as sequences: a study of delay prediction with operation and weather data," Transportation Research Part E: Logistics and Transportation Review, vol. 141, 2020.

[9] P. Kecman and R. M. P. Goverde, "Online data-driven adaptive prediction of train event times," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 465–474, 2015.

[10] S. Milinković, M. Marković, S. Vesković et al., "A fuzzy Petri net model to estimate train delays," Simulation Modelling Practice & Theory, vol. 33, pp. 144–157, 2013.

[11] J. Peters, B. Emig, M. Jung, and S. Schmidt, "Prediction of delays in public transportation using neural networks," in Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, vol. 2, pp. 92–97, Vienna, Austria, November 2005.

[12] C. Jiang, P. Huang, J. Lessan, L. Fu, and C. Wen, "Forecasting primary delay recovery of high-speed railway using multiple linear regression, supporting vector machine, artificial neural network, and random forest regression," Canadian Journal of Civil Engineering, vol. 46, no. 5, pp. 353–363, 2019.

[13] J. Hu and B. Noche, "Application of artificial neuron network in analysis of railway delays," Open Journal of Social Sciences, vol. 4, no. 11, pp. 59–68, 2016.

[14] M. Yaghini, M. M. Khoshraftar, and M. Seyedabadi, "Railway passenger train delay prediction via neural network model," Journal of Advanced Transportation, vol. 47, no. 3, pp. 355–368, 2013.

[15] F. Corman and P. Kecman, "Stochastic prediction of train delays in real-time using Bayesian networks," Transportation Research Part C: Emerging Technologies, vol. 95, pp. 599–615, 2018.

[16] J. Lessan, L. Fu, and C. Wen, "A hybrid Bayesian network model for predicting delays in train operations," Computers & Industrial Engineering, vol. 127, pp. 1214–1222, 2019.

[17] P. Huang, C. Wen, L. Fu, Q. Peng, and Y. Tang, "A deep learning approach for multi-attribute data: a study of train delay prediction in railway systems," Information Sciences, vol. 516, pp. 234–253, 2020.

[18] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[19] Y. Chen and W. Wu, "Mapping mineral prospectivity using an extreme learning machine regression," Ore Geology Reviews, vol. 80, pp. 200–213, 2017.

[20] Y. Miche, A. Sorjamaa, P. Bas et al., "OP-ELM: optimally pruned extreme learning machine," IEEE Transactions on Neural Networks, vol. 21, no. 1, pp. 158–162, 2010.

[21] Y. Yang, Y. Wang, and X. Yuan, "Bidirectional extreme learning machine for regression problem and its learning effectiveness," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1498–1505, 2012.

[22] F. Han, H.-F. Yao, and Q.-H. Ling, "An improved evolutionary extreme learning machine based on particle swarm optimization," Neurocomputing, vol. 116, pp. 87–93, 2013.

[23] L. Oneto, E. Fumeo, G. Clerico et al., "Dynamic delay predictions for large-scale railway networks: deep and shallow extreme learning machines tuned via thresholdout," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2754–2767, 2017.

[24] Y. Li and Z. Yang, "Application of EOS-ELM with binary Jaya-based feature selection to real-time transient stability assessment using PMU data," IEEE Access, vol. 5, pp. 23092–23101, 2017.

[25] Y. Zhang, T. Li, G. Na, G. Li, and Y. Li, "Optimized extreme learning machine for power system transient stability prediction using synchrophasors," Mathematical Problems in Engineering, vol. 2015, Article ID 529724, 8 pages, 2015.

[26] Y. Wang, H. Zhang, and G. Zhang, "cPSO-CNN: an efficient PSO-based algorithm for fine-tuning hyper-parameters of convolutional neural networks," Swarm and Evolutionary Computation, vol. 49, pp. 114–123, 2019.

[27] S. V. Konstantinov, A. I. Diveev, G. I. Balandina, and A. A. Baryshnikov, "Comparative research of random search algorithms and evolutionary algorithms for the optimal control problem of the mobile robot," Procedia Computer Science, vol. 150, pp. 462–470, 2019.

[28] D. Wang, D. Tan, and L. Liu, "Particle swarm optimization algorithm: an overview," Soft Computing, vol. 22, no. 2, pp. 387–408, 2017.

[29] Y. Li, X. Xu, J. Li, and R. Shi, "A delay prediction model for high-speed railway: an extreme learning machine tuned via particle swarm optimization," in Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, September 2020.

[30] J. Tang, C. Deng, and G.-B. Huang, "Extreme learning machine for multilayer perceptron," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 809–821, 2016.

[31] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu, "A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, no. 2, pp. 1026–1037, 2007.

[32] Z. Deng, X. Zhu, D. Cheng, M. Zong, and S. Zhang, "Efficient kNN classification algorithm for big data," Neurocomputing, vol. 195, pp. 143–148, 2016.

[33] Y. Zhang, Z. Zhao, and J. Zheng, "CatBoost: a new approach for estimating daily reference crop evapotranspiration in arid and semi-arid regions of Northern China," Journal of Hydrology, vol. 588, Article ID 125087, 2020.

[34] G. Huang, L. Wu, X. Ma et al., "Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions," Journal of Hydrology, vol. 574, pp. 1029–1041, 2019.

[35] J. Yang, C. Zhao, H. Yu, and H. Chen, "Use GBDT to predict the stock market," Procedia Computer Science, vol. 174, pp. 161–171, 2020.

[36] R. Tibshirani, "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 3, pp. 273–282, 2011.

[37] M. Wang, L. Wang, X. Xu, Y. Qin, and L. Qin, "Genetic algorithm-based particle swarm optimization approach to reschedule high-speed railway timetables: a case study in China," Journal of Advanced Transportation, vol. 2019, Article ID 6090742, 12 pages, 2019.

[38] R. Shi, X. Xu, J. Li et al., "Prediction and analysis of train arrival delay based on XGBoost and Bayesian optimization," Applied Soft Computing, vol. 109, 2021.


Milinković et al. [10] used a fuzzy Petri net to predict train delays; the model considers the characteristics of the hierarchical structure and fuzzy reasoning to simulate train operation and predicts train delays in different delay scenarios. Huang et al. [8] used graph theory to calculate the degree and propagation range of train delays under specific conditions. Although traditional models started early in the study of delay prediction, they generally have the limitation of poor generalization performance and are only suitable for specific scenarios.

Recently, the application of machine learning methods to predict train delays has been widely studied, which makes up for the shortcomings of traditional methods [2]. The purpose of Peters et al. was to utilize the historical travel time between stations to predict the train arrival time more precisely [11]. The moving-average algorithm of historical travel time and the KNNs of last-arrival-time algorithm were employed and evaluated. Some researchers are devoted to ANNs to predict train delays [12–14]. The aim of [14] was to propose preeminent ANNs to predict the train delays of the Iranian railway with three different models, including standard real number, binary coding, and binary set encoding inputs. Nevertheless, the prediction accuracy of ANNs cannot meet the needs of actual delay management, and the parameter adjustment is complex. Marković et al. [2] proposed a support vector regression model for the train delay problem of passenger trains, which captured the relationship between the arrival delay and a variety of changing external factors, and compared it with artificial neural networks. The results indicated that the support vector regression method outperformed the ANNs. Another neural network has been proposed in recent years: a Bayesian network model for predicting the propagation of train delays was presented by [15]. In view of the complexity and dependence, three different BN schemes for train delay prediction were proposed, namely, heuristic hill-climbing, primitive linear, and hybrid structures [16]. The results turned out to be quite satisfying. Recently, it has become popular to combine several models to capture various characteristics of train operation data to predict train delays. A study developed a train delay prediction model which combines convolutional neural networks, long short-term memory networks, and fully connected neural network architectures to solve this issue [17].

To improve the backpropagation algorithm and simplify the setting of learning parameters of general machine learning models, the ELM algorithm was proposed by Huang et al. [18]. The ELM has the advantages of small computation, good generalization, and fast convergence. On account of these advantages, the ELM has been frequently applied to regression problems in the real world [19–22]. Therefore, a new study that combined a shallow ELM and a deep ELM tuned via the thresholdout technique was employed to predict train delays, taking weather data into account [23].

Parameter adjustment is another critical factor to guarantee the good performance of machine learning models [24, 25]. Although the well-known random search algorithm can achieve the purpose of optimization, it generates all the solutions randomly without considering the previous solutions. An adjusting-parameter model called PSO has become one of the widely used parameter adjustment methods because of its ability to address intractable matters in the real world. Only the optimal particle of PSO transmits its information to the next generation of particles in the iterative evolution process. As a consequence, the searching speed of PSO is faster than random search and grid search [26]. The experiment in [27] proved the advantage of PSO: by comparing the performance of PSO with a random search algorithm for the optimal control problem, [27] found that PSO was capable of locating a better solution than the random search algorithm with the same number of fitness function calculations.

Therefore, based on the literature reviewed above, we propose using PSO to optimize the hyperparameter of the ELM to forecast train arrival delays.

The contributions this paper makes are as follows:

(1) The main features affecting train delay prediction are evaluated by the extra trees classifier. Then, the proposed model is constructed based on these features, which possess spatiotemporal characteristics (train delays at each station). In this way, the interpretability of the proposed model is improved.

(2) The proposed model is applied to the arrival delay prediction of trains on an HSR line, which suggests a brand-new perspective for the train delay prediction problem. In addition to solving the drawbacks of the backpropagation algorithm, the advantage of ELM-PSO is that it also solves the arduous problem of manually tuning the hidden neurons of the ELM, doing so better than random search and Bayesian optimization in both accuracy and efficiency.

(3) We perform experiments on a section of the Wuhan-Guangzhou (W-G) HSR line. The proposed model is not only compared to two other parameter-tuning models but also contrasted with four prediction models from different perspectives. Our model turns out to have an extraordinary ability to manage large-scale data accurately.

The remainder of this paper is organized as follows: in Section 2, the train delay problem and the selection of characteristic features are described. The hybrid ELM-PSO approach is introduced in detail in Section 3. The data description and experimental settings are presented in Section 4. The performance analysis is discussed in Section 5. Finally, conclusions are presented in Section 6.

2. Description of the Train Delay Problem

The train delay problem is visualized in Figure 1 to assist in comprehending this abstract problem. A train delay comprises two parts: the train arrival delay and the train departure delay. For a station s_n, t^A_{s_n} represents the time that a train is scheduled to arrive at station s_n, and the same goes for t^D_{s_n}, which denotes the time that a train is scheduled to depart from station s_n. Certainly, the train will have its own actual timetable due to changing external factors, which are


expressed as t′^A_{s_n} and t′^D_{s_n}, respectively. The difference between the actual and scheduled arrival times at station s_n, t′^A_{s_n} − t^A_{s_n}, is referred to as the train arrival delay. The same goes for the train departure delay, t′^D_{s_n} − t^D_{s_n}. This is the primitive description of the train delay problem.

This paper focuses only on train arrival delay prediction. We suppose that there is a target train T_k, which is at the present station s_n at time t^A_{s_n}. Our purpose is to predict the arrival delay (t′^A_{s_{n+1}} − t^A_{s_{n+1}}) of the targeted train T_k at its following station s_{n+1} for all conditions according to the information of train T_k at stations s_{n−1}, s_n, and s_{n+1}, which is made up of the following nine features:

(1) The station code (X1).
(2) The train number (X2), which indicates the number of the train.
(3) The length between the present station and the next station (D_{s_{n+1}} − D_{s_n}) (X3).
(4) The scheduled running time between the present station and the previous station (t^A_{s_n} − t^A_{s_{n−1}}) (X4).
(5) The actual running time between the present station and the previous station (t′^A_{s_n} − t′^A_{s_{n−1}}) (X5).
(6) The scheduled running time between the present station and the next station (t^A_{s_{n+1}} − t^A_{s_n}) (X6).
(7) The actual running time between the present station and the next station (t′^A_{s_{n+1}} − t′^A_{s_n}) (X7).
(8) The buffer time, which indicates the difference between X6 and the actual minimum running time of all trains between the present station and the next station, X6 − min{T_1(t′^A_{s_{n+1}} − t′^A_{s_n}), …, T_k(t′^A_{s_{n+1}} − t′^A_{s_n})} (X8).
(9) The arrival delay time at the present station (t′^A_{s_n} − t^A_{s_n}) (X9).

Y represents the arrival delay time at the next station s_{n+1} of train T_k (t′^A_{s_{n+1}} − t^A_{s_{n+1}}).

There are multiple potentially interdependent features (e.g., the train number and the length between two adjacent stations) that are closely related to train delay prediction. Based on the collected data and the experience of dispatchers, we ultimately select nine features that may influence train delays.

We apply the extra trees classifier to analyze the correlation between all features and train delays. The results are exhibited in Figure 2. As shown in the figure, the deeper the red, the higher the importance. There is no doubt that X9 has the highest importance with respect to Y. The actual and scheduled running times between the present station and the next station also contribute largely to the accurate prediction of Y. Moreover, the buffer time, which is an important factor affecting the length of the train recovery time, is also comparatively prominent in the delay prediction process. Taking the buffer time into account allows us to obtain more realistic prediction results.
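The importance analysis above can be sketched with scikit-learn's extra trees ensemble, which exposes a `feature_importances_` attribute. This is a hedged illustration on synthetic data, not the paper's pipeline: since the target here is continuous, the regressor variant stands in for the classifier, and the coefficients making X9 dominant are assumptions chosen to mimic Figure 2.

```python
# Extra-trees feature importance for the nine inputs on synthetic data.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 9))
# Make X9 (the current arrival delay) dominate Y, as Figure 2 reports.
y = 3.0 * X[:, 8] + 0.5 * X[:, 5] + 0.1 * rng.normal(size=1000)

model = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)
for name, imp in zip([f"X{i}" for i in range(1, 10)], model.feature_importances_):
    print(f"{name}: {imp:.4f}")
```

The importances are normalized to sum to one, so they can be read directly as relative contributions, as in the bar chart of Figure 2.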

The train arrival delay prediction problem in this paper is transformed into the following expression:

Y = f(X1, X2, X3, X4, X5, X6, X7, X8, X9),   (1)

where Xi is the information of train T_k running through stations s_{n−1}, s_n, and s_{n+1}; Y is the arrival delay time at the following station s_{n+1} of train T_k; and f(·) is the prediction process.
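The feature definitions above can be computed from timetable records with pandas shifts within each train's station sequence. This is a hedged sketch: the column names (`sched_arr`, `act_arr`, `dist_km`) and the toy records are assumptions, times are seconds since midnight for simplicity, and X8 (which needs the per-segment minimum over all trains) is omitted.

```python
# Illustrative construction of features X1, X3-X7, X9 and the target Y from a
# per-train station sequence; X2 is the train number and X8 (buffer time,
# a cross-train minimum per segment) is omitted for brevity.
import pandas as pd

df = pd.DataFrame({
    "train":     ["G100"] * 4,
    "station":   [369, 368, 367, 366],
    "dist_km":   [0, 30, 70, 120],
    "sched_arr": [43500, 44340, 44760, 45480],   # scheduled arrival (s of day)
    "act_arr":   [43440, 44340, 44760, 45420],   # actual arrival (s of day)
})

g = df.groupby("train")
df["X1"] = df["station"]                              # station code
df["X3"] = g["dist_km"].shift(-1) - df["dist_km"]     # distance to next station
df["X4"] = df["sched_arr"] - g["sched_arr"].shift(1)  # scheduled run, prev -> present
df["X5"] = df["act_arr"] - g["act_arr"].shift(1)      # actual run, prev -> present
df["X6"] = g["sched_arr"].shift(-1) - df["sched_arr"] # scheduled run, present -> next
df["X7"] = g["act_arr"].shift(-1) - df["act_arr"]     # actual run, present -> next
df["X9"] = df["act_arr"] - df["sched_arr"]            # arrival delay at present
df["Y"]  = g["act_arr"].shift(-1) - g["sched_arr"].shift(-1)  # delay at next station
print(df[["X3", "X4", "X6", "X9", "Y"]])
```

Grouping by train before shifting keeps the previous/next-station differences from leaking across different trains' records.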

3. Methodology

This paper proposes a hybrid model of ELM and PSO for train delay prediction. The ELM is widely used in regression problems because of its advantages of small computation, good generalization performance, and fast convergence speed [19–21]. The PSO algorithm is a random and parallel optimization algorithm, which has the advantages of fast convergence speed and simplicity [25, 28]. Therefore, we aim to combine the advantages of the ELM and the PSO algorithm to improve the behavioral knowledge in the delay prediction domain. For the principles of ELM and PSO, one can refer to Li et al. [29], Tang et al. [30], and Zhang et al. [31]. The running process of the proposed hybrid method is as follows.

Figure 1: Conversion from the train itinerary to mathematical notation (nominal versus actual timetable, dwell time, and arrival delay time).


Step 1: data preprocessing. First, the 9 features mentioned in Section 2 are generated as an N × 9 matrix, where N represents the total number of events according to the train operation records. Second, abnormal delays are removed (trains may be canceled due to some emergencies) to reduce their interference with predictions. Third, the missing data are filled in according to the adjacent data around the missing ones.

Step 2: initializing the parameters and population. Parameters such as the maximal iteration number, the population size, and the speed and position of the first particle are initialized. Each particle l has its own position A_l(ite) and speed V_l(ite). The position of each particle in the population is equivalent to the number of neurons in the hidden layer of the ELM. Therefore, each particle has merely one dimension:

A_l(ite) = h_l(ite), l = 1, …, k,   (2)

where h_l(ite) represents the number of hidden-layer neurons of the ELM in the ite-th iteration.

Step 3: the ELM (with the sigmoid function as the hidden-layer activation function) is used. The processed feature set X and the positions of the particles (the numbers of hidden-layer neurons) generated by PSO are input into the ELM. Consequently, the ELM can output the weight matrix under the current number of hidden-layer neurons. The function calculating the fitness of the particles is as follows:

p_l(ite) = fitness(A_l(ite), X) = sqrt( (1/N) · Σ_{n=1}^{N} (y_n − f_n(x))² ),  A_l(ite) = h_l(ite), l = 1, …, k,   (3)

where N is the number of samples, y_n is the actual output value on the test set, and f_n(x) is the predicted output value on the test set.

Step 4: calculate the fitness of each particle and compare to update the current best fitness and its particle location.

Step 5: start the iteration. PSO updates the positions and velocities of all particles and then repeats Step 4. If the maximum number of iterations is exceeded, the process ends.

Step 6: output the results. We can obtain the output values on the test set as well as the optimal number of hidden-layer neurons.

The specific flowchart is shown in Figure 3.
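The steps above can be sketched as follows. This is a minimal illustration of the hybrid method, assuming a one-hidden-layer ELM with a sigmoid activation (output weights solved in closed form via the pseudo-inverse) and a one-dimensional PSO over the hidden-neuron count; the toy data, the reduced search range [1, 200], and the inertia/acceleration coefficients are assumptions, not the paper's settings.

```python
# Minimal ELM-PSO sketch: PSO searches the number of hidden neurons that
# minimizes the test-set RMSE of an ELM, as in equation (3).
import numpy as np

rng = np.random.default_rng(0)

def elm_fit_predict(X_tr, y_tr, X_te, n_hidden, seed=0):
    r = np.random.default_rng(seed)
    W = r.normal(size=(X_tr.shape[1], n_hidden))   # random input weights (fixed)
    b = r.normal(size=n_hidden)                    # random biases (fixed)
    sig = lambda Z: 1.0 / (1.0 + np.exp(-Z))
    H = sig(X_tr @ W + b)
    beta = np.linalg.pinv(H) @ y_tr                # output weights, closed form
    return sig(X_te @ W + b) @ beta

def fitness(n_hidden, X_tr, y_tr, X_te, y_te):
    pred = elm_fit_predict(X_tr, y_tr, X_te, int(n_hidden))
    return np.sqrt(np.mean((y_te - pred) ** 2))    # RMSE on the test set

# Toy data standing in for the timetable features.
X = rng.normal(size=(300, 9)); y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)
X_tr, X_te, y_tr, y_te = X[:240], X[240:], y[:240], y[240:]

# PSO over the hidden-neuron count (search range [1, 200] here for speed).
n_particles, n_iter, lo, hi = 10, 15, 1, 200
pos = rng.uniform(lo, hi, n_particles); vel = np.zeros(n_particles)
pbest = pos.copy()
pbest_fit = np.array([fitness(p, X_tr, y_tr, X_te, y_te) for p in pos])
gbest = pbest[np.argmin(pbest_fit)]
for _ in range(n_iter):
    r1, r2 = rng.random(n_particles), rng.random(n_particles)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    fit = np.array([fitness(p, X_tr, y_tr, X_te, y_te) for p in pos])
    improved = fit < pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[np.argmin(pbest_fit)]

print(f"best hidden neurons: {int(gbest)}, test RMSE: {pbest_fit.min():.4f}")
```

Because the ELM's output weights have a closed-form solution, each fitness evaluation is a single pseudo-inverse rather than an iterative training run, which is what makes the PSO loop affordable.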

4. Application to a Case Study

4.1. Dataset Description. The data employed to verify the ELM-PSO are obtained from the dispatching office of a railway bureau. The 15 stations applied in the study cover a section, 1096 km in length, from CBN to GZS on the double-track W-G HSR line. There are more than 400,000 data points used in this study, with a time span from October 2018 to April 2019. The train original operation data and the route map of the targeted 15 stations on the W-G HSR line are shown in Table 1 and Figure 4.

Analysis of the delay ratio of each station reveals not only the condition of each station but also an increasing emphasis on the indispensability of train arrival delay prediction, which contributes to improving the ability of each station to cope with and even inhibit the increase in train arrival delays. Trains with an arrival delay greater than 4 minutes are considered delayed trains. What is intuitively presented in Figure 5 is that the delay ratios of all the stations are basically

Figure 2: Bar chart of the correlations between the nine input features and the output Y.


Figure 3: The flowchart of the ELM-PSO method.

Table 1: Train operation data format in the database.

Station   Station code   Date        Actual arrival   Actual departure   Train   Scheduled arrival   Scheduled departure
GZS       369            2018/7/27   12:04            12:04              G100    12:05               12:05
GZN       368            2018/7/27   12:19            12:19              G100    12:19               12:19
QY        367            2018/7/27   12:26            12:26              G100    12:26               12:26
YDW       366            2018/7/27   12:37            12:37              G100    12:38               12:38
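Records in the Table 1 format can be turned into arrival delays by differencing the actual and scheduled arrival times. A hedged sketch (the `HHMM` string format mirrors the table; field names are assumptions):

```python
# Parse "1204"-style HH:MM times and compute the arrival delay in minutes
# (negative = early arrival).
from datetime import datetime

def delay_minutes(actual: str, scheduled: str) -> int:
    fmt = "%H%M"
    a, s = datetime.strptime(actual, fmt), datetime.strptime(scheduled, fmt)
    return int((a - s).total_seconds() // 60)

# G100 arrived at GZS at 12:04 against a scheduled 12:05 -> one minute early.
print(delay_minutes("1204", "1205"))  # -1
print(delay_minutes("1219", "1219"))  # 0
```

In practice a date column would be attached before differencing so that services crossing midnight do not produce spurious negative delays.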

Figure 4: Map of the W-G HSR line. Stations: Wuchang (WC), Xianningbei (XNN), Chibibei (CBN), Yueyangdong (YYE), Miluodong (MLE), Changshanan (CSS), Zhuzhouxi (ZZW), Hengshanxi (HSW), Hengyangdong (HYE), Leiyangxi (LYW), Chenzhouxi (CZW), Lechangdong (LCE), Shaoguan (SG), Yingdexi (YDW), Qingyuan (QY), Guangzhoubei (GZN), Guangzhounan (GZS), and Qingsheng (QS).


not optimistic. At the same time, the delay ratios of the two targeted stations, CZW and GZN, are particularly dreadful, with arrival delay ratios of 0.12. Our goal is to minimize the arrival delay ratio by predicting the arrival delay at each station.

4.2. Experimental Settings

4.2.1. Baseline Models. In order to compare the performance of our proposed method, the k-nearest neighbor (KNN), categorical boosting (CB), gradient boosting decision tree (GBDT), and Lasso models are used as baselines. We take 20% of the dataset as the test set and the rest as the training set. The experiments run in Python in an environment with an Intel® Core i5-6200U processor (2.3 GHz) and 8 GB RAM. Briefly, an overview and the hyperparameter settings of each model are as follows.

(1) KNN: the KNN algorithm is extensively applied in differing applications, massively owing to its simplicity, comprehensibility, and relatively promising performance [32].

① N_neighbors = 15
② Weights = uniform
③ Leaf_size = 30
④ P = 2

(2) CB: CB is a machine learning model based on the gradient boosting decision tree (GBDT) [33, 34]. CB is an outstanding technique, especially for datasets with heterogeneous features, noisy data, and complex dependencies.

① Depth = 3
② Learning_rate = 0.1
③ Loss_function = RMSE

(3) GBDT: GBDT has been employed for numerous problems [35]; it performs many nonlinear transformations, has strong expressive ability, and does not need complex feature engineering and feature transformation.

① N_estimators = 30
② Loss = ls
③ Learning_rate = 0.1

(4) Lasso: Lasso is a prevailing technique capable of simultaneously performing regularization and feature filtering. Furthermore, data can be analyzed from multiple dimensions by Lasso [36].

① Alpha = 30
② Max_iter = 1000
③ Selection = cyclic

4.2.2. Evaluation Metrics. The root mean squared error (RMSE), mean absolute error (MAE), and R-squared are selected to assess the models. The definitions of the error metrics are shown in equations (4), (5), and (6):

MAE = (1/N) · Σ_{i=1}^{N} |y_i − ŷ_i|,   (4)

RMSE = sqrt( (1/N) · Σ_{i=1}^{N} (y_i − ŷ_i)² ),   (5)

R-squared = 1 − Σ_{i=1}^{N} (y_i − ŷ_i)² / Σ_{i=1}^{N} (y_i − ȳ)²,   (6)

where y_i is an observed value, ŷ_i is a predicted value, ȳ is the average of the y_i, and N represents the sample size.
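Equations (4)–(6) translate directly into NumPy; the toy vectors at the bottom are only illustrative inputs.

```python
# Direct transcription of equations (4)-(6).
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)            # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)       # total sum of squares
    return 1.0 - ss_res / ss_tot

y     = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(mae(y, y_hat), rmse(y, y_hat), r_squared(y, y_hat))
```

Lower MAE and RMSE and an R-squared closer to 1 indicate a better fit, which is how Tables 3–5 should be read.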

4.2.3. Hyperparameter-Tuning Models. We compare PSO with two other hyperparameter-tuning models to ascertain the most satisfactory one. The overview and hyperparameter settings of each model are as follows.

Figure 5: Arrival delay ratio for each station.


(1) PSO: to locate the optimal hyperparameter of the ELM, the parameter settings of the PSO algorithm are as follows. PSO has 20 particles at each iteration, and there are altogether 20 iterations, which is equivalent to 400 iterations of Bayesian optimization.

① Number of particles = 20
② Fitness function = RMSE on the test set
③ Search dimension = 1
④ Particle search range = [1, 2000]
⑤ Maximum number of iterations = 20

(2) BO (Bayesian optimization): BO calculates the posterior probability distribution of the first n points through a surrogate function and obtains the value of the objective function at each hyperparameter value point.

① Objective function = RMSE on the test set
② Surrogate function = Gaussian process regression
③ Acquisition function = UCB (upper confidence bound)
④ Hyperparameter search range = [1, 2000]
⑤ Maximum number of iterations = 400

(3) GA (genetic algorithm) the traditional iterativemodel is easy to fall into the trap of local minimawhich makes the iteration impossible to continueGA overcomes the phenomenon of ldquodead looprdquo andis a global optimization algorithm [37]

① Objective function RMSE on test set② Hyperparameter search range [1 2000]③ Generations 20④ Population size 20⑤ Maximum number of iterations 400
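The PSO configuration above reduces to a one-dimensional search over an integer neuron count. The sketch below is illustrative only: the `fitness` function stands in for "train an ELM with this many hidden neurons and return its test-set RMSE" (its minimum is placed at 1462 purely for illustration), and the inertia and acceleration weights are assumed values that the paper does not report.

```python
import random

def fitness(neurons):
    # Stand-in for training an ELM and evaluating test-set RMSE;
    # minimum placed at 1462 neurons for illustration only.
    return 1.0387 + ((neurons - 1462) / 1000.0) ** 2

def pso(n_particles=20, n_iters=20, lo=1, hi=2000, seed=0):
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]                                  # personal best positions
    pbest_f = [fitness(round(p)) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g], pbest_f[g]           # global best
    w, c1, c2 = 0.7, 1.5, 1.5                       # assumed PSO coefficients
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (w * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lo), hi)  # clamp to search range
            f = fitness(round(pos[i]))
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i], f
    return round(gbest), gbest_f

best_neurons, best_rmse = pso()
print(best_neurons, best_rmse)
```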

5. Performance Analysis

5.1. PSO Optimization Result Comparison. The process of PSO tuning the hyperparameter is shown in Figure 6. The fitness value reaches its minimum after five iterations. The best fitness value on the test set is 1.0387, obtained when the ELM has 1,462 hidden neurons; the structure of the network is correspondingly optimal.

The search range [1–2000] for the hyperparameter was determined by manually trying several values in the range [1–10000]. When the hyperparameter value is greater than 2000, the fitness tends to be stable, while the time consumption multiplies acutely. Ultimately, weighing time consumption against precision, we decided to limit the search range to [1–2000].

The computational cost is shown in Table 2; the results are the best obtained from several runs of each model. We gain two observations from this table. First, the optimal neuron number found by ELM-PSO (the particle position) always converges to 1,462; the only difference is the number of iterations at the best RMSE. Second, compared with ELM-BO and ELM-GA, ELM-PSO is the ideal model, taking the shortest time to locate the optimal fitness on the test set.

5.2. Model Accuracy Comparison. In this section, the performance of ELM-PSO is compared with that of the baseline models.

First, we compare the overall performance of the five models. The evaluation metrics are R-squared, the MAE, and the RMSE. The corresponding results on the test set and the training set are summarized in Tables 3 and 4, respectively. The ELM-PSO model performs best among the five models on not only the training set (R-squared = 0.9973, MAE = 0.3377, RMSE = 0.8247) but also the test set (R-squared = 0.9955, MAE = 0.3490, RMSE = 1.0387). Although the running time of our model has no obvious advantage over the other models on the test set, it is within the tolerable range. We also notice that some models perform well on the training set but not on the test set, which reveals the paramount importance of sufficient generalization ability in this prediction problem.

Then, by separating the delay duration into three bins (i.e., [0–1200 s], >1200 s, and all delayed trains (trains with an arrival delay greater than 240 seconds)), we attempt to measure the capability of the benchmark models and our model to capture the features of train delays of varying degrees on the test set. As is distinctly shown in Table 5, the proposed model outperforms the other benchmark models in each time horizon and on each evaluation metric, achieving, for example, an RMSE of 0.5201 in the first bin. This finding is taken as evidence that our model can constantly adjust itself to capture the characteristics of varying degrees of train delays and thus enhance the prediction accuracy. To further assess the performance of our model, comprehensive analyses are discussed in a later section.
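The per-bin comparison amounts to masking the test set by delay duration before computing each metric; a small sketch with toy delay values in seconds (illustrative numbers, not the paper's data):

```python
import numpy as np

def rmse(y, y_hat):
    # root mean squared error over the selected subset
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# toy test-set arrival delays (seconds) and predictions
y = np.array([100.0, 600.0, 1500.0, 300.0, 2000.0])
y_hat = np.array([120.0, 580.0, 1450.0, 330.0, 2100.0])

# the three bins used in Table 5
bins = {
    "[0, 1200]": (y >= 0) & (y <= 1200),
    ">1200": y > 1200,
    "all delayed (>240)": y > 240,
}
for name, mask in bins.items():
    print(name, rmse(y[mask], y_hat[mask]))
```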

5.3. Further Analysis. Building on the previous section, we evaluate the performance of the ELM-PSO model from other angles, including the prediction errors for each station, the prediction correctness, and the robustness.

First and foremost, the errors of the ELM-PSO model for the predicted arrival delays are calculated at the station level on the test set. Viewing the overall situation in Figure 7, we notice that the prediction errors are low. The MAE and R-squared both remain stable at each station, and the RMSEs for different stations are mostly less than 90 s. However, large fluctuations occur at the YYE and QY stations. The reason for this phenomenon is that these two stations are close to transfer stations and the buffer times of YYE and QY are both small; the prediction accuracy at these stations tends to be slightly hindered by these factors.

In addition, to put forward more detailed results, we describe the correctness of the absolute residual between the predicted values and the actual values for each station over three intervals (i.e., <30 s, 30–60 s, and 60–90 s) (Figure 8). In the <30 s interval, the correctness of


each station exceeds 75%. In brief, the overall results confirm the impressive prediction correctness of the proposed model.

At last, we investigate the robustness of our model to data size. In detail, we further train and test our model using 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%, respectively, of the total data as the test set and compare the results with the baseline models. The data sizes used in the experiments are shown in Figure 9. The performance of our model on both the training and test sets is more outstanding than that of the others. As we can see, the RMSEs of our model stay quite stable for data of different sizes, while the RMSEs of the other models are higher and fluctuate. These figures show that, for all trains, the proposed model has the smallest predictive RMSE and MAE and the largest R-squared, which demonstrates the robustness of our model to different data sizes.

5.4. Statistical Tests. In this section, the Friedman test (FT) and the Wilcoxon signed rank test (WSRT) are used to verify the advantages of the proposed method over the other methods [35, 38]. The results of the FT and WSRT are shown in Table 6. The FT is a nonparametric statistical tool that determines differences by ranking

Figure 6: Convergence graph of the PSO optimization (fitness, ranging from about 1.03 to 1.13, versus iterations 1–20).

Table 2: The performance of each hyperparameter tuning model.

Model   | Maximum number of iterations | Iterations at best RMSE | RMSE   | Neurons | Time (s)
ELM-PSO | 20                           | 5                       | 1.0387 | 1462    | 900.00
ELM-BO  | 400                          | 120                     | 1.0708 | 1279    | 1296.00
ELM-GA  | 20                           | 6                       | 1.0668 | 1494    | 1080.00

Table 3: Prediction errors of each model on the test set for the W-G HSR line.

Model   | RMSE   | MAE    | R-squared | Time (s)
ELM-PSO | 1.0387 | 0.3490 | 0.9955    | 856
CB      | 1.6808 | 1.0464 | 0.9883    | 9
GBDT    | 1.9976 | 1.2031 | 0.9835    | 18
Lasso   | 1.9852 | 0.9240 | 0.9847    | 9
KNN     | 1.6488 | 0.5025 | 0.9887    | 27

Table 4: Prediction errors of each model on the training set for the W-G HSR line.

Model   | RMSE   | MAE    | R-squared
ELM-PSO | 0.8247 | 0.3377 | 0.9973
CB      | 1.6368 | 1.0407 | 0.9896
GBDT    | 1.9495 | 1.2107 | 0.9852
Lasso   | 1.6046 | 0.9199 | 0.9893
KNN     | 1.4681 | 0.4558 | 0.9916


the performance of each method. It can be seen from the table that the proposed method ranks better than CB, GBDT, Lasso, and KNN at the 5% significance level; that is, its efficiency is better. In addition, the results of the WSRT show that the p-value is less than 0.05 (5% significance level), which rejects the null hypothesis. This means that there is a statistically significant difference between the proposed method and the other methods; that is, the performance of the proposed method is better than that of the other methods.
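Both tests are available in SciPy. The sketch below runs them on synthetic per-sample absolute errors; in the paper, the inputs would be the five models' errors on the same test samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
# synthetic absolute errors for five models; "M1" is made
# systematically smaller, mimicking a better model
errors = {
    "M1": rng.gamma(2.0, 0.2, n),
    "M2": rng.gamma(2.0, 0.5, n),
    "M3": rng.gamma(2.0, 0.6, n),
    "M4": rng.gamma(2.0, 0.5, n),
    "M5": rng.gamma(2.0, 0.3, n),
}

# Friedman test over the five related samples
stat, p_ft = stats.friedmanchisquare(*errors.values())

# Wilcoxon signed-rank test, M1 versus each competitor
for m in ["M2", "M3", "M4", "M5"]:
    w, p = stats.wilcoxon(errors["M1"], errors[m])
    print(m, p)
print("Friedman p =", p_ft)
```

With clearly separated error distributions such as these, both tests reject the null hypothesis, matching the pattern reported in Table 6.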

Table 5: Model performance comparison on the test set for the five models for different delay bins.

Delay bin (seconds)       | Model   | RMSE   | MAE    | R-squared
[0, 1200]                 | ELM-PSO | 0.5201 | 0.3006 | 0.9655
                          | CB      | 1.2628 | 0.9223 | 0.7967
                          | GBDT    | 1.3968 | 0.9886 | 0.7513
                          | Lasso   | 1.0957 | 0.7903 | 0.8469
                          | KNN     | 1.0195 | 0.5278 | 0.8675
>1200                     | ELM-PSO | 5.6009 | 2.8249 | 0.9924
                          | CB      | 6.2335 | 3.1986 | 0.9906
                          | GBDT    | 8.0733 | 4.8589 | 0.9843
                          | Lasso   | 6.5023 | 3.1169 | 0.9898
                          | KNN     | 9.0247 | 4.8611 | 0.9804
All delayed trains (>240) | ELM-PSO | 3.3116 | 1.3457 | 0.9951
                          | CB      | 4.0469 | 2.2680 | 0.9927
                          | GBDT    | 5.1561 | 3.1498 | 0.9881
                          | Lasso   | 3.9251 | 1.7418 | 0.9931
                          | KNN     | 5.5878 | 2.8828 | 0.9860

Figure 7: Prediction errors in terms of the RMSE, MAE, and R-squared for different stations.


Figure 8: Prediction correctness for each station on the test set, for absolute-residual intervals of <0.5 min, 0.5–1 min, and 1–1.5 min. The <0.5 min correctness per station (CBN through GZS) reads 87, 80, 77, 78, 79, 82, 83, 82, 86, 84, 88, 83, 87, 85, and 85 (%).


(a) RMSE versus the percentage of the data used as the test set (10%–90%) for LASSO, GBDT, CAT, KNN, and ELM on the training and test sets.


(b) MAE versus the percentage of the data used as the test set (10%–90%) for LASSO, GBDT, CAT, KNN, and ELM on the training and test sets.


6. Conclusion

In this paper, a hybrid ELM-PSO method is proposed to predict train delays. The ELM overcomes the shortcomings of the backpropagation training algorithm, and the advantage of PSO is its excellent ability to search for the best hyperparameter. Four benchmark models, CB, KNN, GBDT, and Lasso, are selected for comparison with the proposed model. These models were run on the same data collected from China Railways. ELM-PSO tends to have better performance and generalization ability (R-squared = 0.9955, MAE = 0.3490, RMSE = 1.0387) than the other models on the test set. Our work can not only provide sufficient time and auxiliary decision support for dispatchers to make reasonable optimization and adjustment plans but also has practical significance for improving the quality of railway service and helping passengers estimate their travel time.

The dataset used in this paper contains train delays under all types of scenarios. Therefore, in the future, we will

(c) R-squared versus the percentage of the data used as the test set (10%–90%).

Figure 9: MAE, RMSE, and R-squared values on the training and test sets with different data sizes (LASSO1 represents the performance on the training set, LASSO2 represents the performance on the test set, and similarly for the other models).

Table 6: Friedman ranking test and WSRT results.

Models       | Mean rank | FT p value | Model 1 vs. model 2 | WSRT Z-score | WSRT p value
ELM-PSO (M1) | 1.50      | ≤0.001     | -                   | -            | -
CB (M2)      | 3.55      |            | M1-M2               | −25.9200     | ≤0.001
GBDT (M3)    | 4.08      |            | M1-M3               | −26.1312     | ≤0.001
Lasso (M4)   | 3.53      |            | M1-M4               | −25.7456     | ≤0.001
KNN (M5)     | 2.34      |            | M1-M5               | −17.4373     | ≤0.001


consider dividing all the data into certain types of delay scenarios according to particular rules and implementing currently prevalent models to train on and predict each scenario to achieve higher accuracy. Finally, in terms of the input features, all the feature information in this paper can be obtained from train timetables. In the future, other types of features, such as infrastructure, weather, and obstructions on other HSR lines, will be taken into account.

Data Availability

The data used to support the findings of this study were supplied by China Railway Guangzhou Bureau Group Co., Ltd. under license and so cannot be made freely available. Access to these data should be considered by the corresponding author upon request, with the permission of China Railway Guangzhou Bureau Group Co., Ltd.

Conflicts of Interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the submitted paper.

Authors' Contributions

Xu Bao contributed to conceptualization, prepared the original draft, was responsible for software, and visualized the study. Yanqiu Li prepared the original draft, was responsible for software, and visualized the study. Jianmin Li contributed to methodology and reviewed and edited the manuscript. Rui Shi contributed to supervision and data curation. Xin Ding contributed to data curation.

Acknowledgments

This work was financially supported by the Fundamental Research Funds for the Central Universities of China (2019JBM077) and the Open Fund for the Jiangsu Key Laboratory of Traffic and Transportation Security (Huaiyin Institute of Technology).

References

[1] J. L. Espinosa-Aranda and R. García-Ródenas, "A demand-based weighted train delay approach for rescheduling railway networks in real time," Journal of Rail Transport Planning & Management, vol. 3, no. 1-2, pp. 1–13, 2013.
[2] N. Marković, S. Milinković, K. S. Tikhonov, and P. Schonfeld, "Analyzing passenger train arrival delays with support vector regression," Transportation Research Part C: Emerging Technologies, vol. 56, pp. 251–262, 2015.
[3] C. Wen, W. Mou, P. Huang, and Z. Li, "A predictive model of train delays on a railway line," Journal of Forecasting, vol. 39, no. 3, pp. 470–488, 2019.
[4] A. Higgins and E. Kozan, "Modeling train delays in urban networks," Transportation Science, vol. 32, no. 4, pp. 346–357, 1998.
[5] J. Yuan and I. A. Hansen, "Optimizing capacity utilization of stations by estimating knock-on train delays," Transportation Research Part B: Methodological, vol. 41, no. 2, pp. 202–217, 2007.
[6] P. B. L. Wiggenraad, Alighting and Boarding Times of Passengers at Dutch Railway Stations, TRAIL Research School, Delft, Netherlands, 2001.
[7] M. F. Gorman, "Statistical estimation of railroad congestion delay," Transportation Research Part E: Logistics and Transportation Review, vol. 45, no. 3, pp. 446–456, 2009.
[8] P. Huang, C. Wen, and L. Fu, "Modeling train operation as sequences: a study of delay prediction with operation and weather data," Transportation Research Part E: Logistics and Transportation Review, vol. 141, 2020.
[9] P. Kecman and R. M. P. Goverde, "Online data-driven adaptive prediction of train event times," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 465–474, 2015.
[10] S. Milinković, M. Marković, S. Vesković et al., "A fuzzy Petri net model to estimate train delays," Simulation Modelling Practice and Theory, vol. 33, pp. 144–157, 2013.
[11] J. Peters, B. Emig, M. Jung, and S. Schmidt, "Prediction of delays in public transportation using neural networks," in Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, vol. 2, pp. 92–97, Vienna, Austria, November 2005.
[12] C. Jiang, P. Huang, J. Lessan, L. Fu, and C. Wen, "Forecasting primary delay recovery of high-speed railway using multiple linear regression, supporting vector machine, artificial neural network, and random forest regression," Canadian Journal of Civil Engineering, vol. 46, no. 5, pp. 353–363, 2019.
[13] J. Hu and B. Noche, "Application of artificial neuron network in analysis of railway delays," Open Journal of Social Sciences, vol. 4, no. 11, pp. 59–68, 2016.
[14] M. Yaghini, M. M. Khoshraftar, and M. Seyedabadi, "Railway passenger train delay prediction via neural network model," Journal of Advanced Transportation, vol. 47, no. 3, pp. 355–368, 2013.
[15] F. Corman and P. Kecman, "Stochastic prediction of train delays in real-time using Bayesian networks," Transportation Research Part C: Emerging Technologies, vol. 95, pp. 599–615, 2018.
[16] J. Lessan, L. Fu, and C. Wen, "A hybrid Bayesian network model for predicting delays in train operations," Computers & Industrial Engineering, vol. 127, pp. 1214–1222, 2019.
[17] P. Huang, C. Wen, L. Fu, Q. Peng, and Y. Tang, "A deep learning approach for multi-attribute data: a study of train delay prediction in railway systems," Information Sciences, vol. 516, pp. 234–253, 2020.
[18] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.
[19] Y. Chen and W. Wu, "Mapping mineral prospectivity using an extreme learning machine regression," Ore Geology Reviews, vol. 80, pp. 200–213, 2017.
[20] Y. Miche, A. Sorjamaa, P. Bas et al., "OP-ELM: optimally pruned extreme learning machine," IEEE Transactions on Neural Networks, vol. 21, no. 1, pp. 158–162, 2010.
[21] Y. Yang, Y. Wang, and X. Yuan, "Bidirectional extreme learning machine for regression problem and its learning effectiveness," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1498–1505, 2012.
[22] F. Han, H.-F. Yao, and Q.-H. Ling, "An improved evolutionary extreme learning machine based on particle swarm optimization," Neurocomputing, vol. 116, pp. 87–93, 2013.
[23] L. Oneto, E. Fumeo, G. Clerico et al., "Dynamic delay predictions for large-scale railway networks: deep and shallow extreme learning machines tuned via thresholdout," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2754–2767, 2017.
[24] Y. Li and Z. Yang, "Application of EOS-ELM with binary Jaya-based feature selection to real-time transient stability assessment using PMU data," IEEE Access, vol. 5, pp. 23092–23101, 2017.
[25] Y. Zhang, T. Li, G. Na, G. Li, and Y. Li, "Optimized extreme learning machine for power system transient stability prediction using synchrophasors," Mathematical Problems in Engineering, vol. 2015, Article ID 529724, 8 pages, 2015.
[26] Y. Wang, H. Zhang, and G. Zhang, "cPSO-CNN: an efficient PSO-based algorithm for fine-tuning hyper-parameters of convolutional neural networks," Swarm and Evolutionary Computation, vol. 49, pp. 114–123, 2019.
[27] S. V. Konstantinov, A. I. Diveev, G. I. Balandina, and A. A. Baryshnikov, "Comparative research of random search algorithms and evolutionary algorithms for the optimal control problem of the mobile robot," Procedia Computer Science, vol. 150, pp. 462–470, 2019.
[28] D. Wang, D. Tan, and L. Liu, "Particle swarm optimization algorithm: an overview," Soft Computing, vol. 22, no. 2, pp. 387–408, 2017.
[29] Y. Li, X. Xu, J. Li, and R. Shi, "A delay prediction model for high-speed railway: an extreme learning machine tuned via particle swarm optimization," in Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, September 2020.
[30] J. Tang, C. Deng, and G.-B. Huang, "Extreme learning machine for multilayer perceptron," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 809–821, 2016.
[31] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu, "A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, no. 2, pp. 1026–1037, 2007.
[32] Z. Deng, X. Zhu, D. Cheng, M. Zong, and S. Zhang, "Efficient kNN classification algorithm for big data," Neurocomputing, vol. 195, pp. 143–148, 2016.
[33] Y. Zhang, Z. Zhao, and J. Zheng, "CatBoost: a new approach for estimating daily reference crop evapotranspiration in arid and semi-arid regions of Northern China," Journal of Hydrology, vol. 588, Article ID 125087, 2020.
[34] G. Huang, L. Wu, X. Ma et al., "Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions," Journal of Hydrology, vol. 574, pp. 1029–1041, 2019.
[35] J. Yang, C. Zhao, H. Yu, and H. Chen, "Use GBDT to predict the stock market," Procedia Computer Science, vol. 174, pp. 161–171, 2020.
[36] R. Tibshirani, "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 3, pp. 273–282, 2011.
[37] M. Wang, L. Wang, X. Xu, Y. Qin, and L. Qin, "Genetic algorithm-based particle swarm optimization approach to reschedule high-speed railway timetables: a case study in China," Journal of Advanced Transportation, vol. 2019, Article ID 6090742, 12 pages, 2019.
[38] R. Shi, X. Xu, J. Li et al., "Prediction and analysis of train arrival delay based on XGBoost and Bayesian optimization," Applied Soft Computing, vol. 109, 2021.



expressed as $t'_{A,s_n}$ and $t'_{D,s_n}$, respectively. The difference between the actual and scheduled arrival times at station $s_n$, $t'_{A,s_n} - t_{A,s_n}$, is referred to as the train arrival delay. The same goes for the train departure delay, $t'_{D,s_n} - t_{D,s_n}$. This is the primitive description of the train delay problem.

This paper focuses only on train arrival delay prediction. We suppose that there is a target train $T_k$, which is at the present station $s_n$ at time $t_{A,s_n}$. Our purpose is to predict the arrival delay $(t'_{A,s_{n+1}} - t_{A,s_{n+1}})$ of the targeted train $T_k$ at its following station $s_{n+1}$ under all conditions, according to the information of train $T_k$ at stations $s_{n-1}$, $s_n$, and $s_{n+1}$, which is made up of the following nine features:

(1) The station code ($X_1$).
(2) The train number ($X_2$), which indicates the number of the train.
(3) The length between the present station and the next station, $D_{s_{n+1}} - D_{s_n}$ ($X_3$).
(4) The scheduled running time between the present station and the previous station, $t_{A,s_n} - t_{A,s_{n-1}}$ ($X_4$).
(5) The actual running time between the present station and the previous station, $t'_{A,s_n} - t'_{A,s_{n-1}}$ ($X_5$).
(6) The scheduled running time between the present station and the next station, $t_{A,s_{n+1}} - t_{A,s_n}$ ($X_6$).
(7) The actual running time between the present station and the next station, $t'_{A,s_{n+1}} - t'_{A,s_n}$ ($X_7$).
(8) The buffer time, which indicates the difference between $X_6$ and the actual minimum running time of all trains between the present station and the next station, $X_6 - \min\{T_1(t'_{A,s_{n+1}} - t'_{A,s_n}), \ldots, T_k(t'_{A,s_{n+1}} - t'_{A,s_n})\}$ ($X_8$).
(9) The arrival delay time at the present station, $t'_{A,s_n} - t_{A,s_n}$ ($X_9$).

$Y$ represents the arrival delay time of train $T_k$ at the next station $s_{n+1}$, $t'_{A,s_{n+1}} - t_{A,s_{n+1}}$.
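The feature construction above can be sketched directly from timetable records. In the sketch below, the record field names (`code`, `train`, `dist`, `arr`, `arr_act`) are illustrative, not the authors' schema, and times are in seconds:

```python
def build_features(prev, cur, nxt, min_actual_run_to_next):
    """prev/cur/nxt: dicts for one train at stations s_{n-1}, s_n, s_{n+1},
    with scheduled ('arr') and actual ('arr_act') arrival times, a station
    'code', a 'train' number, and 'dist' (km from the line origin)."""
    x = {
        "X1": cur["code"],                        # station code
        "X2": cur["train"],                       # train number
        "X3": nxt["dist"] - cur["dist"],          # length to the next station
        "X4": cur["arr"] - prev["arr"],           # scheduled run from previous
        "X5": cur["arr_act"] - prev["arr_act"],   # actual run from previous
        "X6": nxt["arr"] - cur["arr"],            # scheduled run to next
        "X7": nxt["arr_act"] - cur["arr_act"],    # actual run to next
        "X9": cur["arr_act"] - cur["arr"],        # delay at the present station
    }
    # X8: buffer time = scheduled run to next minus the minimum actual
    # run to next observed over all trains on this section
    x["X8"] = x["X6"] - min_actual_run_to_next
    y = nxt["arr_act"] - nxt["arr"]               # label: delay at next station
    return x, y

# illustrative records (times in seconds from a reference point)
prev = {"code": 366, "train": "G100", "dist": 0.0, "arr": 0, "arr_act": 60}
cur = {"code": 367, "train": "G100", "dist": 40.0, "arr": 600, "arr_act": 700}
nxt = {"code": 368, "train": "G100", "dist": 75.0, "arr": 1200, "arr_act": 1260}
x, y = build_features(prev, cur, nxt, min_actual_run_to_next=520)
print(x, y)
```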

There are multiple potentially interdependent features (e.g., the train number and the length between two adjacent stations) that are closely related to train delay prediction. Based on the collected data and the experience of dispatchers, we ultimately select nine features that may influence train delays.

We apply the extra trees classifier to analyze the correlation between all features and train delays. The results are exhibited in Figure 2. As shown in the figure, the deeper the red, the higher the importance. There is no doubt that X9 has the highest importance with respect to Y. The actual and scheduled running times between the present station and the next station also contribute largely to the accurate prediction of Y. Moreover, the buffer time, an important factor affecting the length of the train recovery time, is also comparatively prominent in the delay prediction process. Taking the buffer time into account allows us to obtain more realistic prediction results.
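Importances of the kind shown in Figure 2 can be obtained from scikit-learn's extra-trees ensemble. Since the delay target is continuous, the sketch below uses the regressor variant on synthetic data; the paper applies the extra trees classifier to the real timetable features:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 9))               # stand-ins for X1..X9
# make the 9th column (X9, the current delay) dominate the target
y = 3.0 * X[:, 8] + 0.5 * X[:, 5] + rng.normal(scale=0.1, size=500)

model = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)
for name, imp in zip([f"X{i}" for i in range(1, 10)], model.feature_importances_):
    print(name, round(float(imp), 4))
```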

The train arrival delay prediction problem in this paper is thus transformed into the following expression:

$$Y = f(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8, X_9), \qquad (1)$$

where $X_i$ is the information of train $T_k$ running through stations $s_{n-1}$, $s_n$, and $s_{n+1}$; $Y$ is the arrival delay time of train $T_k$ at the following station $s_{n+1}$; and $f(\cdot)$ is the prediction process.

3. Methodology

This paper proposes a hybrid model of ELM and PSO for train delay prediction. ELM is widely used in regression problems because of its small computational cost, good generalization performance, and fast convergence [19–21]. The PSO algorithm is a stochastic, parallel optimization algorithm with fast convergence and a simple structure [25, 28]. Therefore, we combine the advantages of the ELM and PSO algorithms to improve performance in the delay prediction domain. For the principles of ELM and PSO, one can refer to Li et al. [29], Tang et al. [30], and Zhang et al. [31]. The running process of the proposed hybrid method is as follows.

Figure 1: Conversion from the train itinerary to mathematical notation, showing the nominal and actual timetables at stations $s_{n-1}$, $s_n$, and $s_{n+1}$, with scheduled arrival/departure times $t_{A,s}$ and $t_{D,s}$, actual times $t'_{A,s}$ and $t'_{D,s}$, dwell times, a breakdown, and the resulting arrival delay time.


Step 1: data preprocessing. First, the nine features mentioned in Section 2 are generated as an $N \times 9$ matrix, where $N$ represents the total number of events according to the train operation records. Second, abnormal delays are removed (trains may be canceled due to emergencies) to reduce their interference with predictions. Third, missing data are filled in according to the adjacent data around the missing ones.

Step 2: initializing the parameters and population. Parameters such as the maximal iteration number, the population size, and the speed and position of the first particle are initialized. Each particle $l$ has its own position $A_l(ite)$ and speed $V_l(ite)$. The position of each particle in the population is equivalent to the number of neurons in the hidden layer of the ELM; therefore, each particle has merely one dimension:

$$A_l(ite) = h_l(ite), \quad l = 1, \ldots, k, \qquad (2)$$

where $h_l(ite)$ represents the number of hidden-layer neurons of the ELM in the $ite$-th iteration.

Step 3: the ELM (hidden-layer activation function: sigmoid) is used. The processed feature set $X$ and the positions of the particles (the numbers of hidden-layer neurons) generated by PSO are input into the ELM. Consequently, the ELM outputs the weight matrix under the current number of hidden-layer neurons. The fitness of a particle is calculated as

$$p_l(ite) = \mathrm{fitness}(A_l(ite), X) = \sqrt{\frac{\sum_{n=1}^{N} (y_n - f_n(x))^2}{N}}, \quad A_l(ite) = h_l(ite), \; l = 1, \ldots, k, \qquad (3)$$

where $N$ is the number of samples, $y_n$ is the actual output value on the test set, and $f_n(x)$ is the predicted output value on the test set.

Step 4: calculate the fitness of each particle and compare to update the current best fitness and its particle location.

Step 5: start the iteration. PSO updates the positions and velocities of all particles and then repeats Step 4. The process ends once the maximum number of iterations is exceeded.

Step 6: output the results. We obtain the output value on the test set as well as the optimal number of hidden-layer neurons.
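Steps 3 and 6 rest on the standard ELM solution: random, fixed input weights, a sigmoid hidden layer, and output weights obtained by Moore-Penrose pseudo-inverse. A minimal sketch on a toy regression task, not the authors' implementation:

```python
import numpy as np

class ELMRegressor:
    """Single-hidden-layer ELM with sigmoid activation (Huang et al. [18])."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # sigmoid hidden layer; input weights/biases stay fixed after fit
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        n_features = X.shape[1]
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # output weights via pseudo-inverse: beta = H^+ y
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# usage: fit a small ELM on a smooth toy target
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 9))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1]
model = ELMRegressor(n_hidden=50).fit(X[:240], y[:240])
pred = model.predict(X[240:])
test_rmse = float(np.sqrt(np.mean((pred - y[240:]) ** 2)))
print(test_rmse)
```

In the hybrid method, `n_hidden` is exactly the quantity the PSO particles search over.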

The specific flowchart is shown in Figure 3.

4. Application to a Case Study

4.1. Dataset Description. The data employed to verify the ELM-PSO are obtained from the dispatching office of a railway bureau. The 15 stations applied in the study cover a section of 1096 km, from CBN to GZS, on the double-track W-G HSR line. More than 400,000 data points are used in this study, with a time span from October 2018 to April 2019. The original train operation data and the route map of the targeted 15 stations on the W-G HSR line are shown in Table 1 and Figure 4.

Analysis of the delay ratio of each station reveals not only the condition of each station but also the indispensability of train arrival delay prediction, which contributes to improving the ability of each station to cope with, and even inhibit the growth of, train arrival delays. Trains with an arrival delay greater than 4 minutes are considered delayed trains. What is intuitively presented in Figure 5 is that the delay ratios at all the stations are basically

Figure 2: Bar chart of the correlations between the nine input features ($X_1$–$X_9$) and output $Y$; $X_9$ shows by far the highest importance (0.5924).


Figure 3: The flowchart of the ELM-PSO method: input data, data preprocessing, initializing the parameters and population, calculating the fitness of each particle and updating the current best fitness, then updating the positions and velocities of all particles (ite = ite + 1) until the maximum number of iterations is reached, and finally outputting the predicted delay.

Table 1: Train operation data format in the database.

Station | Station code | Date      | Actual arrival | Actual departure | Train | Scheduled arrival | Scheduled departure
GZS     | 369          | 2018/7/27 | 12:04          | 12:04            | G100  | 12:05             | 12:05
GZN     | 368          | 2018/7/27 | 12:19          | 12:19            | G100  | 12:19             | 12:19
QY      | 367          | 2018/7/27 | 12:26          | 12:26            | G100  | 12:26             | 12:26
YDW     | 366          | 2018/7/27 | 12:37          | 12:37            | G100  | 12:38             | 12:38

Figure 4: Map of the W-G HSR line (Wuhan-Guangzhou high-speed railway line), with stations Wuchang (WC), Xianningbei (XNN), Chibibei (CBN), Yueyangdong (YYE), Miluodong (MLE), Changshanan (CSS), Zhuzhouxi (ZZW), Hengshanxi (HSW), Hengyangdong (HYE), Leiyangxi (LYW), Chenzhouxi (CZW), Lechangdong (LCE), Shaoguan (SG), Yingdexi (YDW), Qingyuan (QY), Guangzhoubei (GZN), Qingsheng (QS), and Guangzhounan (GZS).


not optimistic. At the same time, the delay ratios of the two targeted stations, CZW and GZN, are particularly poor, with arrival delay ratios of about 0.12. Our goal is to minimize the arrival delay ratio by predicting the arrival delay at each station.

4.2. Experimental Settings

4.2.1. Baseline Models. To compare the performance of the proposed method, the k-nearest neighbor (KNN), categorical boosting (CB), gradient boosting decision tree (GBDT), and Lasso models are used as baselines. We take 20% of the dataset as the test set and the rest as the training set. The experiment runs in Python on an Intel Core i5-6200U processor (2.13 GHz) with 8 GB RAM. Briefly, an overview and the hyperparameter settings of each model are as follows.

(1) KNN: the KNN algorithm is extensively applied in differing applications, largely owing to its simplicity, comprehensibility, and relatively promising performance [32].

① N_neighbors: 15
② Weights: uniform
③ Leaf_size: 30
④ P: 2

(2) CB: CB is a machine learning model based on the gradient boosting decision tree (GBDT) [33, 34]. CB is an outstanding technique, especially for datasets with heterogeneous features, noisy data, and complex dependencies.

① Depth: 3
② Learning_rate: 0.1
③ Loss_function: RMSE

(3) GBDT: GBDT has been employed for numerous problems [35]; it offers many nonlinear transformations and strong expressive ability and does not need complex feature engineering or feature transformation.

① N_estimators: 30
② Loss: ls
③ Learning_rate: 0.1

(4) Lasso: Lasso is a prevailing technique capable of simultaneously performing regularization and feature filtering. Furthermore, data can be analyzed from multiple dimensions by Lasso [36].

① Alpha: 30
② Max_iter: 1000
③ Selection: cyclic
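Three of the four baselines map directly onto scikit-learn estimators with the settings listed above; CB comes from the separate catboost package and is omitted from this sketch, and the GBDT loss "ls" corresponds to scikit-learn's squared-error default. The data below are a toy stand-in for the nine timetable features:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# toy stand-in data; the paper uses the nine timetable features
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 9))
y = X @ rng.normal(size=9) + rng.normal(scale=0.1, size=400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    # hyperparameters as listed in Section 4.2.1
    "KNN": KNeighborsRegressor(n_neighbors=15, weights="uniform",
                               leaf_size=30, p=2),
    "GBDT": GradientBoostingRegressor(n_estimators=30, learning_rate=0.1),
    "Lasso": Lasso(alpha=30, max_iter=1000, selection="cyclic"),
}
results = {}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    pred = m.predict(X_te)
    results[name] = float(np.sqrt(np.mean((pred - y_te) ** 2)))
print(results)
```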

422 Evaluation Metrics Root mean squared error(RMSE) mean absolute error (MAE) and R-squared areselected to assess the models e definitions of the errormetrics are shown in equation (4) equation (5) andequation (6)

MAE 1113936

Ni1 yi minus 1113954yi

11138681113868111386811138681113868111386811138681113868

N (4)

RMSE

1113936Ni1 yi minus 1113954yi( 1113857

2

N

1113971

(5)

R minus squared 1 minus1113936

Ni1 yi minus 1113954yi( 1113857

2

1113936Ni1 yi minus yi( 1113857

2 (6)

where yi is an observed value 1113954yi is a predicted value yi is theaverage value of yi and N represents the sample size

423 Hyperparameter Tuning Models We compare PSOwith the other two hyperparameter tuning models to as-certain the most satisfying one e overview and hyper-parameter settings of each model are as follows

CBN YYE MLE CSS ZZW HSW HYE LYW CZW LCE SG YDW QY GZN GZS

0047

0059 0062

0052

011

0116

0102

0113 0118 0112 0121

0031

0121 0119 0113

Arr

ival

del

ay ra

tio

Station

Figure 5 Arrival delay ratio for each station

6 Journal of Advanced Transportation

(1) PSO to locate the optimal hyperparameter of theELM the parameter settings of the PSO algorithmare as follows PSO has 20 particles at each iterationand there are altogether 20 iterations which isequivalent to 400 iterations of Bayesianoptimization

① Number of particles 20② Fitness function RMSE on test set③ Search dimension 1④ Particle search range [1 2000]⑤ Maximum number of iterations 20

(2) BO (Bayesian optimization) BO calculates theposterior probability distribution of the first n pointsthrough a substitution function and obtains theobjective function of each hyperparameter at eachvalue point

① Objective function RMSE on test set② Substitution function Gaussian process

regression③ Acquisition functionUCB (upper confidence

bound)④ Hyperparameter search range [1 2000]⑤ Maximum number of iterations 400

(3) GA (genetic algorithm) the traditional iterativemodel is easy to fall into the trap of local minimawhich makes the iteration impossible to continueGA overcomes the phenomenon of ldquodead looprdquo andis a global optimization algorithm [37]

① Objective function RMSE on test set② Hyperparameter search range [1 2000]③ Generations 20④ Population size 20⑤ Maximum number of iterations 400

5. Performance Analysis

5.1. PSO Optimization Result Comparison. The process of PSO tuning the hyperparameter is shown in Figure 6. The fitness value achieves its minimum after five iterations. The best fitness value on the test set is 1.0387, reached when the ELM has 1462 hidden neurons; the structure of the network is correspondingly optimal.

The search range [1–2000] of the hyperparameter is determined by manually trying several values in the range [1–10000]. When the hyperparameter value is greater than 2000, the fitness tends to be stable, while the time consumption multiplies acutely. Ultimately, weighing time consumption against precision, we decided to limit the search range to [1–2000].

The computational cost is shown in Table 2; the results are the optimal results of each model over several runs. We gain two observations from this table. First, the optimal neuron number found by ELM-PSO always focuses on 1462; the only difference is the number of iterations at best RMSE. Second, compared with ELM-BO and ELM-GA, ELM-PSO is the ideal model that takes the shortest time to locate the optimal fitness on the test set.

5.2. Model Accuracy Comparison. In this section, the performance comparison between ELM-PSO and the baseline models is performed.

First, we compare the overall performance of the five models. The evaluation metrics are R-squared, the MAE, and the RMSE. The corresponding results on the test set and training set are summarized in Tables 3 and 4, respectively. The ELM-PSO model performs optimally among the five models on not only the training set (R-squared = 0.9973, MAE = 0.3377, RMSE = 0.8247) but also the test set (R-squared = 0.9955, MAE = 0.3490, RMSE = 1.0387). Although the running time of our model has no obvious advantage over the other models on the test set, it is within the tolerable range. Also, we notice that there are models that perform well on the training set but not on the test set, which reveals the paramount importance of sufficient generalization ability in the prediction problem.

Then, by separating the delay duration into three bins (i.e., [0–1200 s], >1200 s, and all delayed trains (trains with arrival delay greater than 240 seconds)), we attempt to measure the capability of the benchmark models and our model to seize the features of train delays to varying degrees on the test set. As is distinctly shown in Table 5, the proposed model outperforms the other benchmark models in each time horizon and on each evaluation metric, achieving, for example, an RMSE of 0.5201 in the first bin. This finding is taken as evidence that our model can constantly adjust itself to capture the characteristics of varying degrees of train delays and thereby enhance the prediction accuracy. To further assess the performance of our model, comprehensive analyses are discussed in a later section.
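Per-bin evaluation of the kind reported in Table 5 can be sketched as follows (a minimal illustration with toy arrays; delays are in seconds and the function name is ours):

```python
import numpy as np

def binned_rmse(y_true, y_pred):
    # Delay bins from the paper: [0, 1200] s, > 1200 s, and all delayed
    # trains (arrival delay > 240 s); RMSE is reported per bin.
    bins = {
        "[0, 1200]": (y_true >= 0) & (y_true <= 1200),
        ">1200": y_true > 1200,
        "all delayed (>240)": y_true > 240,
    }
    return {name: float(np.sqrt(np.mean((y_true[m] - y_pred[m]) ** 2)))
            for name, m in bins.items() if m.any()}

y_true = np.array([100.0, 600.0, 1500.0, 300.0])
y_pred = np.array([120.0, 580.0, 1450.0, 330.0])
results = binned_rmse(y_true, y_pred)
print(results)
```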

5.3. Further Analysis. On the basis of the previous section, we evaluate the performance of the ELM-PSO model from other angles, including the prediction errors for each station, the prediction correctness, and the robustness.

First and foremost, the errors of the ELM-PSO model for the predicted arrival delays are calculated at the station level on the test set. Viewing the overall situation in Figure 7, we notice that the prediction errors are low. The MAE and R-squared both remain stable at each station, and the RMSEs for different stations are mostly less than 90 s. However, great fluctuations occur at the YYE and QY stations. The reasons for this phenomenon are that the two stations are close to transfer stations and that the buffer times of YYE and QY are both small. The prediction accuracy at these stations tends to be slightly hindered by these factors.

In addition, to put forward more detailed results, we describe the correctness of the absolute residual between the predicted values and the actual values for each station over three intervals (i.e., <30 s, 30 s–60 s, and 60 s–90 s) (Figure 8). In the <30 s interval, the correctness of


each station exceeds 75%. In brief, the overall results confirm the impressive prediction correctness of the proposed model.
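The correctness measure, i.e., the share of predictions whose absolute residual falls in a given interval, can be sketched as (toy arrays; delays in seconds):

```python
import numpy as np

def correctness(y_true, y_pred, lo, hi):
    # Share of predictions whose absolute residual lies in [lo, hi) seconds,
    # mirroring the paper's <30 s, 30-60 s, and 60-90 s intervals.
    r = np.abs(y_true - y_pred)
    return float(np.mean((r >= lo) & (r < hi)))

y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 240.0, 295.0, 475.0])
for lo, hi in [(0, 30), (30, 60), (60, 90)]:
    print((lo, hi), correctness(y_true, y_pred, lo, hi))
```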

At last, we investigate the robustness of our model to data size. In detail, we further train and test our model using 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%, respectively, of the total data as the test set and compare the results with the baseline models. The data sizes used in the experiments are shown in Figure 9. The performance of our model on both the training and test sets is more outstanding than that of the others. As we can see, the RMSEs of our model stay quite stable across data of different sizes, while the RMSEs of the other models are higher and fluctuate. These figures show that the proposed model has the smallest predictive RMSE and MAE and the largest R-squared for all trains, which demonstrates the robustness of our model to different data sizes.

5.4. Statistical Tests. In this section, the Friedman test (FT) and the Wilcoxon signed rank test (WSRT) are used to verify the advantages of our proposed method over the other methods [35, 38]. The results of the FT and WSRT are shown in Table 6. The FT is a nonparametric statistical tool, which determines the difference by ranking

Figure 6: Convergence discriminant graph of PSO optimization (fitness on the test set versus iterations 1–20; the fitness axis spans roughly 1.03–1.13).

Table 2: The performance of each hyperparameter setting model.

Model    | Maximum number of iterations | Iterations at best RMSE | RMSE   | Neurons | Time (s)
ELM-PSO  | 20                           | 5                       | 1.0387 | 1462    | 90000
ELM-BO   | 400                          | 120                     | 1.0708 | 1279    | 129600
ELM-GA   | 20                           | 6                       | 1.0668 | 1494    | 108000

Table 3: Prediction errors on each model's test set for the W-G HSR line.

Model   | RMSE   | MAE    | R-squared | Time (s)
ELM-PSO | 1.0387 | 0.3490 | 0.9955    | 856
CB      | 1.6808 | 1.0464 | 0.9883    | 9
GBDT    | 1.9976 | 1.2031 | 0.9835    | 18
Lasso   | 1.9852 | 0.9240 | 0.9847    | 9
KNN     | 1.6488 | 0.5025 | 0.9887    | 27

Table 4: Prediction errors on each model's training set for the W-G HSR line.

Model   | RMSE   | MAE    | R-squared
ELM-PSO | 0.8247 | 0.3377 | 0.9973
CB      | 1.6368 | 1.0407 | 0.9896
GBDT    | 1.9495 | 1.2107 | 0.9852
Lasso   | 1.6046 | 0.9199 | 0.9893
KNN     | 1.4681 | 0.4558 | 0.9916


the performance of each method. It can be seen from the table that the proposed method ranks better than CB, GBDT, Lasso, and KNN at the 5% significance level; that is, its efficiency is better. In addition, the results of the WSRT show that the p-value is less than 0.05 (5% significance level), which rejects the null hypothesis. This means that there is a statistical difference between the proposed method and the other methods; that is, the performance of the proposed method is better than that of the other methods.
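Both tests are available in SciPy; below is a hedged sketch on synthetic per-sample absolute errors (the variable names and error distributions are illustrative, not the paper's data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical per-sample absolute errors of three models on the same test
# set; model_a is constructed to be consistently more accurate.
model_a = np.abs(rng.normal(0.0, 0.5, 200))
model_b = model_a + np.abs(rng.normal(0.5, 0.2, 200))
model_c = model_a + np.abs(rng.normal(1.0, 0.3, 200))

# Friedman test: ranks the models per sample; a small p rejects "all equal".
stat, p_friedman = stats.friedmanchisquare(model_a, model_b, model_c)

# Wilcoxon signed-rank test: pairwise comparison on matched error samples.
w_stat, p_wilcoxon = stats.wilcoxon(model_a, model_b)

print(p_friedman < 0.05, p_wilcoxon < 0.05)
```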

Table 5: Model performance comparison on the test set for the five models for different delay bins.

Delay bin (seconds)       | Model   | RMSE   | MAE    | R-squared
[0, 1200]                 | ELM-PSO | 0.5201 | 0.3006 | 0.9655
                          | CB      | 1.2628 | 0.9223 | 0.7967
                          | GBDT    | 1.3968 | 0.9886 | 0.7513
                          | Lasso   | 1.0957 | 0.7903 | 0.8469
                          | KNN     | 1.0195 | 0.5278 | 0.8675
>1200                     | ELM-PSO | 5.6009 | 2.8249 | 0.9924
                          | CB      | 6.2335 | 3.1986 | 0.9906
                          | GBDT    | 8.0733 | 4.8589 | 0.9843
                          | Lasso   | 6.5023 | 3.1169 | 0.9898
                          | KNN     | 9.0247 | 4.8611 | 0.9804
All delayed trains (>240) | ELM-PSO | 3.3116 | 1.3457 | 0.9951
                          | CB      | 4.0469 | 2.2680 | 0.9927
                          | GBDT    | 5.1561 | 3.1498 | 0.9881
                          | Lasso   | 3.9251 | 1.7418 | 0.9931
                          | KNN     | 5.5878 | 2.8828 | 0.9860

Figure 7: Prediction errors in terms of the RMSE, MAE, and R-squared for different stations.


Figure 8: Prediction correctness for each station on the test set (stacked shares of absolute residuals <0.5 min, 0.5–1 min, and 1–1.5 min; the <0.5 min shares by station: CBN 87%, YYE 80%, MLE 77%, CSS 78%, ZZW 79%, HSW 82%, HYE 83%, LYW 82%, CZW 86%, LCE 84%, SG 88%, YDW 83%, QY 87%, GZN 85%, GZS 85%).


[Figure 9(a): RMSE versus the percentage of the data used as the test set (10%–90%) for all models on training and test sets.]


[Figure 9(b): MAE versus the percentage of the data used as the test set (10%–90%).]


6. Conclusion

In this paper, a hybrid ELM-PSO method is proposed to predict train delays. The ELM can overcome the shortcomings of the backpropagation training algorithm, and the advantage of PSO is its excellent ability in searching for the best hyperparameter. Four benchmark models, the CB, KNN, GBDT, and Lasso models, are selected for comparison with the proposed model. These models were run on the same data collected from China Railways. ELM-PSO tends to have a

better performance and generalization ability (R-squared = 0.9955, MAE = 0.3490, RMSE = 1.0387) than the other models on the test set. Our work can not only provide sufficient time and auxiliary decisions for the dispatcher to make a reasonable optimization and adjustment plan but also has practical significance for improving the quality of railway service and helping passengers estimate their travel time.

The dataset used in this paper contains train delays under all types of scenarios. Therefore, in the future, we will

[Figure 9(c): R-squared versus the percentage of the data used as the test set (10%–90%).]

Figure 9: MAE, RMSE, and R-squared values on training and test sets with different data sizes (LASSO1 represents the performance on the training set; LASSO2 represents the performance on the test set; the other models are labeled analogously).

Table 6: Friedman ranking test and WSRT results.

Models       | Mean rank | FT p value | Model 1 vs. model 2 | WSRT Z-score | WSRT p value
ELM-PSO (M1) | 1.50      | ≤0.001     | —                   | —            | —
CB (M2)      | 3.55      |            | M1–M2               | −25.9200     | ≤0.001
GBDT (M3)    | 4.08      |            | M1–M3               | −26.1312     | ≤0.001
Lasso (M4)   | 3.53      |            | M1–M4               | −25.7456     | ≤0.001
KNN (M5)     | 2.34      |            | M1–M5               | −17.4373     | ≤0.001


consider dividing all the data into certain types of delay scenarios according to particular rules and implementing currently prevalent models to train and predict each scenario to achieve higher accuracy. Finally, in terms of the input features, all the information on the features in this paper can be obtained from train timetables. In the future, other types of features, such as infrastructure and weather features and obstructions on other HSR lines, will be taken into account.

Data Availability

The data used to support the findings of this study were supplied by China Railway Guangzhou Bureau Group Co., Ltd. under license and so cannot be made freely available. Access to these data will be considered by the corresponding author upon request, with the permission of China Railway Guangzhou Bureau Group Co., Ltd.

Conflicts of Interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the submitted paper.

Authors' Contributions

Xu Bao contributed to conceptualization, prepared the original draft, was responsible for software, and visualized the study. Yanqiu Li prepared the original draft, was responsible for software, and visualized the study. Jianmin Li contributed to methodology and reviewed and edited the manuscript. Rui Shi contributed to supervision and data curation. Xin Ding contributed to data curation.

Acknowledgments

This work was financially supported by the Fundamental Research Funds for the Central Universities of China (2019JBM077) and the Open Fund for the Jiangsu Key Laboratory of Traffic and Transportation Security (Huaiyin Institute of Technology).

References

[1] J. L. Espinosa-Aranda and R. García-Ródenas, "A demand-based weighted train delay approach for rescheduling railway networks in real time," Journal of Rail Transport Planning & Management, vol. 3, no. 1-2, pp. 1–13, 2013.

[2] N. Marković, S. Milinković, K. S. Tikhonov, and P. Schonfeld, "Analyzing passenger train arrival delays with support vector regression," Transportation Research Part C: Emerging Technologies, vol. 56, pp. 251–262, 2015.

[3] C. Wen, W. Mou, P. Huang, and Z. Li, "A predictive model of train delays on a railway line," Journal of Forecasting, vol. 39, no. 3, pp. 470–488, 2019.

[4] A. Higgins and E. Kozan, "Modeling train delays in urban networks," Transportation Science, vol. 32, no. 4, pp. 346–357, 1998.

[5] J. Yuan and I. A. Hansen, "Optimizing capacity utilization of stations by estimating knock-on train delays," Transportation Research Part B: Methodological, vol. 41, no. 2, pp. 202–217, 2007.

[6] P. B. L. Wiggenraad, Alighting and Boarding Times of Passengers at Dutch Railway Stations, TRAIL Research School, Delft, Netherlands, 2001.

[7] M. F. Gorman, "Statistical estimation of railroad congestion delay," Transportation Research Part E: Logistics and Transportation Review, vol. 45, no. 3, pp. 446–456, 2009.

[8] P. Huang, C. Wen, and L. Fu, "Modeling train operation as sequences: a study of delay prediction with operation and weather data," Transportation Research Part E: Logistics and Transportation Review, vol. 141, 2020.

[9] P. Kecman and R. M. P. Goverde, "Online data-driven adaptive prediction of train event times," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 465–474, 2015.

[10] S. Milinković, M. Marković, S. Vesković et al., "A fuzzy Petri net model to estimate train delays," Simulation Modelling Practice & Theory, vol. 33, pp. 144–157, 2013.

[11] J. Peters, B. Emig, M. Jung, and S. Schmidt, "Prediction of delays in public transportation using neural networks," in Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, vol. 2, pp. 92–97, Vienna, Austria, November 2005.

[12] C. Jiang, P. Huang, J. Lessan, L. Fu, and C. Wen, "Forecasting primary delay recovery of high-speed railway using multiple linear regression, supporting vector machine, artificial neural network, and random forest regression," Canadian Journal of Civil Engineering, vol. 46, no. 5, pp. 353–363, 2019.

[13] J. Hu and B. Noche, "Application of artificial neuron network in analysis of railway delays," Open Journal of Social Sciences, vol. 4, no. 11, pp. 59–68, 2016.

[14] M. Yaghini, M. M. Khoshraftar, and M. Seyedabadi, "Railway passenger train delay prediction via neural network model," Journal of Advanced Transportation, vol. 47, no. 3, pp. 355–368, 2013.

[15] F. Corman and P. Kecman, "Stochastic prediction of train delays in real-time using Bayesian networks," Transportation Research Part C: Emerging Technologies, vol. 95, pp. 599–615, 2018.

[16] J. Lessan, L. Fu, and C. Wen, "A hybrid Bayesian network model for predicting delays in train operations," Computers & Industrial Engineering, vol. 127, pp. 1214–1222, 2019.

[17] P. Huang, C. Wen, L. Fu, Q. Peng, and Y. Tang, "A deep learning approach for multi-attribute data: a study of train delay prediction in railway systems," Information Sciences, vol. 516, pp. 234–253, 2020.

[18] G.-B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[19] Y. Chen and W. Wu, "Mapping mineral prospectivity using an extreme learning machine regression," Ore Geology Reviews, vol. 80, pp. 200–213, 2017.

[20] Y. Miche, A. Sorjamaa, P. Bas et al., "OP-ELM: optimally pruned extreme learning machine," IEEE Transactions on Neural Networks, vol. 21, no. 1, pp. 158–162, 2010.

[21] Y. Yang, Y. Wang, and X. Yuan, "Bidirectional extreme learning machine for regression problem and its learning effectiveness," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1498–1505, 2012.

[22] F. Han, H.-F. Yao, and Q.-H. Ling, "An improved evolutionary extreme learning machine based on particle swarm optimization," Neurocomputing, vol. 116, pp. 87–93, 2013.

[23] L. Oneto, E. Fumeo, G. Clerico et al., "Dynamic delay predictions for large-scale railway networks: deep and shallow extreme learning machines tuned via thresholdout," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2754–2767, 2017.

[24] Y. Li and Z. Yang, "Application of EOS-ELM with binary Jaya-based feature selection to real-time transient stability assessment using PMU data," IEEE Access, vol. 5, pp. 23092–23101, 2017.

[25] Y. Zhang, T. Li, G. Na, G. Li, and Y. Li, "Optimized extreme learning machine for power system transient stability prediction using synchrophasors," Mathematical Problems in Engineering, vol. 2015, Article ID 529724, 8 pages, 2015.

[26] Y. Wang, H. Zhang, and G. Zhang, "cPSO-CNN: an efficient PSO-based algorithm for fine-tuning hyper-parameters of convolutional neural networks," Swarm and Evolutionary Computation, vol. 49, pp. 114–123, 2019.

[27] S. V. Konstantinov, A. I. Diveev, G. I. Balandina, and A. A. Baryshnikov, "Comparative research of random search algorithms and evolutionary algorithms for the optimal control problem of the mobile robot," Procedia Computer Science, vol. 150, pp. 462–470, 2019.

[28] D. Wang, D. Tan, and L. Liu, "Particle swarm optimization algorithm: an overview," Soft Computing, vol. 22, no. 2, pp. 387–408, 2017.

[29] Y. Li, X. Xu, J. Li, and R. Shi, "A delay prediction model for high-speed railway: an extreme learning machine tuned via particle swarm optimization," in Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, September 2020.

[30] J. Tang et al., "Extreme learning machine for multilayer perceptron," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 809–821, 2016.

[31] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu, "A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, no. 2, pp. 1026–1037, 2007.

[32] Z. Deng, X. Zhu, D. Cheng, M. Zong, and S. Zhang, "Efficient kNN classification algorithm for big data," Neurocomputing, vol. 195, pp. 143–148, 2016.

[33] Y. Zhang, Z. Zhao, and J. Zheng, "CatBoost: a new approach for estimating daily reference crop evapotranspiration in arid and semi-arid regions of Northern China," Journal of Hydrology, vol. 588, Article ID 125087, 2020.

[34] G. Huang, L. Wu, X. Ma et al., "Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions," Journal of Hydrology, vol. 574, pp. 1029–1041, 2019.

[35] J. Yang, C. Zhao, H. Yu, and H. Chen, "Use GBDT to predict the stock market," Procedia Computer Science, vol. 174, pp. 161–171, 2020.

[36] R. Tibshirani, "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 3, pp. 273–282, 2011.

[37] M. Wang, L. Wang, X. Xu, Y. Qin, and L. Qin, "Genetic algorithm-based particle swarm optimization approach to reschedule high-speed railway timetables: a case study in China," Journal of Advanced Transportation, vol. 2019, Article ID 6090742, 12 pages, 2019.

[38] R. Shi, X. Xu, J. Li et al., "Prediction and analysis of train arrival delay based on XGBoost and Bayesian optimization," Applied Soft Computing, vol. 109, 2021.



Step 1: data preprocessing. First, the 9 features mentioned in Section 2 are generated as an N × 9 matrix, where N represents the total number of events according to the train operation records. Second, abnormal delays are removed (trains may be canceled due to emergencies) to reduce their interference with predictions. Third, the missing data are filled in according to the adjacent data around the missing ones.

Step 2: initializing the parameters and population. Parameters such as the maximal iteration number, the population size, and the speed and position of the first particle are initialized. Each particle l has its own position A_l(ite) and speed V_l(ite). The position of each particle in the population is equivalent to the number of neurons in the hidden layer of the ELM; therefore, each particle has merely one dimension:

A_l(\mathrm{ite}) = h_l(\mathrm{ite}), \quad l = 1, \ldots, k, \quad (2)

where h_l(ite) represents the number of hidden layer neurons of the ELM in the ite-th iteration.

Step 3: the ELM (with the sigmoid function as the hidden layer activation function) is used. The processed feature set X and the positions of the particles (the numbers of hidden layer neurons) generated by PSO are input into the ELM. Consequently, the ELM can output the weight matrix under the current number of hidden layer neurons. The function calculating the fitness of the particles is as follows:

p_l(\mathrm{ite}) = \mathrm{fitness}\left( A_l(\mathrm{ite}), X \right) = \sqrt{\frac{\sum_{n=1}^{N} \left( y_n - f_n(x) \right)^2}{N}}, \quad A_l(\mathrm{ite}) = h_l(\mathrm{ite}), \; l = 1, \ldots, k, \quad (3)

where N is the number of samples, y_n is the actual output value on the test set, and f_n(x) is the predicted output value on the test set.

Step 4: calculate the fitness of each particle and compare to update the current best fitness and its particle location.

Step 5: start the iteration. PSO updates the positions and velocities of all particles and then repeats Step 4. The process ends once the maximum number of iterations is exceeded.

Step 6: output the results. We can obtain the output value on the test set as well as the optimal number of hidden layer neurons.

The specific flowchart is shown in Figure 3.
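Step 3 is the computational core of the method: for a candidate neuron count h, train an ELM once (random input weights, closed-form output weights via the Moore-Penrose pseudoinverse) and return the test RMSE as the particle's fitness p_l(ite). A minimal sketch on toy data (the data, dimensions, and function name are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fitness(h, X_train, y_train, X_test, y_test):
    """Train a sigmoid ELM with h hidden neurons and return the test RMSE,
    which serves as the particle's fitness in equation (3)."""
    d = X_train.shape[1]
    W = rng.normal(size=(d, h))          # random input weights (not trained)
    b = rng.normal(size=h)               # random hidden biases
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    H = sigmoid(X_train @ W + b)         # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y_train   # output weights via pseudoinverse
    y_hat = sigmoid(X_test @ W + b) @ beta
    return float(np.sqrt(np.mean((y_test - y_hat) ** 2)))

# Toy data standing in for the N x 9 feature matrix of Step 1.
X = rng.normal(size=(300, 9))
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=300)
fit = elm_fitness(50, X[:240], y[:240], X[240:], y[240:])
print(fit)
```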

4. Application to a Case Study

4.1. Dataset Description. The data employed to verify ELM-PSO are obtained from the dispatching office of a railway bureau. The 15 stations applied in the study cover a 1096 km section from CBN to GZS on the double-track W-G HSR line. More than 400,000 data points are used in this study, with a time span from October 2018 to April 2019. The original train operation data and the route map of the targeted 15 stations on the W-G HSR line are shown in Table 1 and Figure 4.

Analysis of the delay ratio of each station reveals not only the condition of each station but also an increasing emphasis on the indispensability of train arrival delay prediction, which contributes to improving the ability of each station to cope with, and even inhibit, the increase in train arrival delays. Trains with an arrival delay greater than 4 minutes are considered delayed trains. What Figure 5 intuitively presents is that the delay ratios of all the stations are basically

Figure 2: Bar chart of the correlations between the nine input features and the output Y (feature importances: X1 0.5924, X2 0.0151, X3 0.015, X4 0.236, X5 0.0355, X6 0.0165, X7 0.0132, X8 0.0193, X9 0.0569).


Figure 3: The flowchart of the ELM-PSO method (input data → data preprocessing → initializing the parameters and population → calculating the fitness of each particle and updating the current best fitness → updating the positions and velocities of all particles, with ite = ite + 1 until the maximum number of iterations is reached → output predicted delay).

Table 1: Train operation data format in the database.

Station | Station code | Date      | Actual arrival | Actual departure | Train | Scheduled arrival | Scheduled departure
GZS     | 369          | 2018/7/27 | 12:04          | 12:04            | G100  | 12:05             | 12:05
GZN     | 368          | 2018/7/27 | 12:19          | 12:19            | G100  | 12:19             | 12:19
QY      | 367          | 2018/7/27 | 12:26          | 12:26            | G100  | 12:26             | 12:26
YDW     | 366          | 2018/7/27 | 12:37          | 12:37            | G100  | 12:38             | 12:38

Figure 4: Map of the W-G HSR line (stations along the line: Wuchang (WC), Xianningbei (XNN), Chibibei (CBN), Yueyangdong (YYE), Miluodong (MLE), Changshanan (CSS), Zhuzhouxi (ZZW), Hengshanxi (HSW), Hengyangdong (HYE), Leiyangxi (LYW), Chenzhouxi (CZW), Lechangdong (LCE), Shaoguan (SG), Yingdexi (YDW), Qingyuan (QY), Guangzhoubei (GZN), Guangzhounan (GZS), Qingsheng (QS)).


not optimistic. At the same time, the delay ratios of the two targeted stations, CZW and GZN, are particularly dreadful, with arrival delay ratios around 0.12. Our goal is to minimize the arrival delay ratio by predicting the arrival delay at each station.
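With the delayed-train definition above (arrival delay greater than 4 minutes), the per-station delay ratio is straightforward to compute; a small sketch with hypothetical delays (the function name and data are ours):

```python
import numpy as np

def arrival_delay_ratio(delays_s, threshold_s=240):
    # A train counts as delayed when its arrival delay exceeds
    # 4 minutes (240 s), as in the paper.
    delays_s = np.asarray(delays_s)
    return float(np.mean(delays_s > threshold_s))

# Hypothetical arrival delays (seconds) recorded at one station
delays = [0, 0, 60, 300, 1500, 30, 0, 250]
print(arrival_delay_ratio(delays))
```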

4.2. Experimental Settings

4.2.1. Baseline Models. In order to compare the performance of our proposed method, the k-nearest neighbor (KNN), categorical boosting (CB), gradient boosting decision tree (GBDT), and Lasso models are used as baselines. We take 20% of the dataset as the test set and the rest as the training set. The experiments run in Python in an environment with an Intel® Core i5-6200U processor (2.13 GHz) and 8 GB RAM. Briefly, an overview description and the hyperparameter settings of each model are as follows.

(1) KNN: the KNN algorithm is extensively applied in differing applications, massively owing to its simplicity, comprehensibility, and relatively promising performance [32].

① N_neighbors: 15
② Weights: uniform
③ Leaf_size: 30
④ P: 2

(2) CB: CB is a machine learning model based on the gradient boosting decision tree (GBDT) [33, 34]. CB is an outstanding technique, especially for datasets with heterogeneous features, noisy data, and complex dependencies.

① Depth: 3
② Learning_rate: 0.1
③ Loss_function: RMSE

(3) GBDT: GBDT has been employed on numerous problems [35]; it applies many nonlinear transformations, has strong expressive ability, and does not need complex feature engineering or feature transformation.

① N_estimators: 30
② Loss: ls
③ Learning_rate: 0.1

(4) Lasso: Lasso is a prevailing technique capable of simultaneously performing regularization and feature filtering. Furthermore, data can be analyzed from multiple dimensions by Lasso [36].

① Alpha: 30
② Max_iter: 1000
③ Selection: cyclic

422 Evaluation Metrics Root mean squared error(RMSE) mean absolute error (MAE) and R-squared areselected to assess the models e definitions of the errormetrics are shown in equation (4) equation (5) andequation (6)

MAE 1113936

Ni1 yi minus 1113954yi

11138681113868111386811138681113868111386811138681113868

N (4)

RMSE

1113936Ni1 yi minus 1113954yi( 1113857

2

N

1113971

(5)

R minus squared 1 minus1113936

Ni1 yi minus 1113954yi( 1113857

2

1113936Ni1 yi minus yi( 1113857

2 (6)

where yi is an observed value 1113954yi is a predicted value yi is theaverage value of yi and N represents the sample size

423 Hyperparameter Tuning Models We compare PSOwith the other two hyperparameter tuning models to as-certain the most satisfying one e overview and hyper-parameter settings of each model are as follows

CBN YYE MLE CSS ZZW HSW HYE LYW CZW LCE SG YDW QY GZN GZS

0047

0059 0062

0052

011

0116

0102

0113 0118 0112 0121

0031

0121 0119 0113

Arr

ival

del

ay ra

tio

Station

Figure 5 Arrival delay ratio for each station

6 Journal of Advanced Transportation

(1) PSO to locate the optimal hyperparameter of theELM the parameter settings of the PSO algorithmare as follows PSO has 20 particles at each iterationand there are altogether 20 iterations which isequivalent to 400 iterations of Bayesianoptimization

① Number of particles 20② Fitness function RMSE on test set③ Search dimension 1④ Particle search range [1 2000]⑤ Maximum number of iterations 20

(2) BO (Bayesian optimization) BO calculates theposterior probability distribution of the first n pointsthrough a substitution function and obtains theobjective function of each hyperparameter at eachvalue point

① Objective function RMSE on test set② Substitution function Gaussian process

regression③ Acquisition functionUCB (upper confidence

bound)④ Hyperparameter search range [1 2000]⑤ Maximum number of iterations 400

(3) GA (genetic algorithm) the traditional iterativemodel is easy to fall into the trap of local minimawhich makes the iteration impossible to continueGA overcomes the phenomenon of ldquodead looprdquo andis a global optimization algorithm [37]

① Objective function RMSE on test set② Hyperparameter search range [1 2000]③ Generations 20④ Population size 20⑤ Maximum number of iterations 400

5 Performance Analysis

51 PSO Optimization Result Comparison e process ofPSO tuning the hyperparameter is shown in Figure 6 efitness value achieves minimum after five iterationse bestfitness value is 10387 on test set when there are 1462neurons of the ELMe structure of the network is optimalcorrespondingly

e search range [1ndash2000] of hyperparameter is deter-mined by manually trying several values in the range of[1ndash10000] When the hyperparameter value is greater than2000 the fitness tends to be stable Also the time con-sumption is multiplied acutely Ultimately we decide tolimit the search range to [1ndash2000] weighing time con-sumption and precision

e computational cost is shown in Table 2 and theresults are the optimal results of each model running severaltimes We gain two observations from this table First theoptimal particle number of ELM-PSO always focuses on1462 the only difference is the number of iterations at bestRMSE Second compared with ELM-BO and ELM-GA

ELM-PSO is the ideal model that takes the shortest time tolocate the optimal fitness on the test set

52 Model Accuracy Comparison In this section the per-formance comparison between ELM-PSO and baselinemodels is performed

First we compare the overall performance of the fivemodelse evaluation metrics are R-squared the MAE andthe RMSEe corresponding results on test set and trainingset are summarized in Tables 3 and 4 respectivelye ELM-PSOmodel performs optimally among the five models in notonly the training set (R-squared 09973 MAE 03377RMSE 08247) but also the test set (R-squared 09955MAE 03490 RMSE 10387) Although the running timeof our model has no obvious advantage compared with othermodels on test set it is within the tolerable range Also wenotice that there are models that perform well in the trainingset but are not good in the test set which reveals theparamountcy of enough generalization ability of models inthe prediction problem

en by separating the delay duration into three bins(ie [0ndash1200 s] gt1200 s and all delayed trains (trains witharrival delay greater than 240 seconds)) we attempt tomeasure the capability of the benchmark models and ourmodel to seize the features of train delays to varying degreeson test set As is distinctly shown in Table 5 the proposedmodel outperforms the other benchmark models in eachtime horizon and each evaluation metric achieving forexample an RMSE of 05201 in the first bin is finding istaken as evidence that our model can constantly adjust itselfto capture the characteristics of varying degrees of traindelays to enhance the prediction accuracy To further assessthe performance of our model the comprehensive analysesare discussed in a later section

53 Further Analysis On the basis of the previous sectionwe will evaluate the performance of the ELM-PSO modelfrom other angles including the prediction errors for eachstation precisely the prediction correctness and therobustness

First and foremost the errors of the ELM-PSOmodel forthe predicted arrival delays are calculated at the station levelon test set Viewing the overall situation in Figure 7 we havenoticed that the prediction errors are low e MAE and R-squared both remain stable at each station And the RMSEsfor different stations are mostly less than 90s However greatfluctuations occur at the YYE and QY stations Reasonsresulting in such phenomena are that the two stations areclose to the transfer stations and the buffer times of YYE andQY are both small e prediction accuracy at these stationstends to be slightly hindered by these factors

In addition to put forward more detailed and embeddedresults we describe the correctness of the absolute residualbetween the predicted values and the actual values for eachstation from three intervals (ie lt30 s 30 sndash60 s and60 sndash90 s) (Figure 8) In lt30 s interval the correctness of

Journal of Advanced Transportation 7

each stations exceeds 75 In brief the overall resultsconfirm the impressive prediction correctness of the pro-posed model

At last we investigate the robustness of our model todata size In detail we further train and test our model using10 20 30 40 50 60 70 80 and 90 re-spectively of the total data as test set and compare theresults with the baseline models e data sizes used in theexperiments are shown in Figure 9 e performance of ourmodel on both training and test set is more outstanding thanothers As we can see the RMSEs of our model stay prettystable using data with different sizes while the RMSEs of

other models are higher and fluctuating ese figures showthat the proposed model has the smallest predictive RMSEMAE and R-squared for all trains which demonstrates therobustness of our model to different data sizes

Figure 6: Convergence graph of the PSO optimization (fitness value versus iteration number over 20 iterations).

Table 2. Performance of each hyperparameter tuning model.

Model     Maximum iterations   Iterations at best RMSE   RMSE     Neurons   Time (s)
ELM-PSO   20                   5                         1.0387   1462      90,000
ELM-BO    400                  120                       1.0708   1279      129,600
ELM-GA    20                   6                         1.0668   1494      108,000

Table 3. Prediction errors on each model's test set for the W-G HSR line.

Model     RMSE     MAE      R-squared   Time (s)
ELM-PSO   1.0387   0.3490   0.9955      856
CB        1.6808   1.0464   0.9883      9
GBDT      1.9976   1.2031   0.9835      18
Lasso     1.9852   0.9240   0.9847      9
KNN       1.6488   0.5025   0.9887      27

Table 4. Prediction errors on each model's training set for the W-G HSR line.

Model     RMSE     MAE      R-squared
ELM-PSO   0.8247   0.3377   0.9973
CB        1.6368   1.0407   0.9896
GBDT      1.9495   1.2107   0.9852
Lasso     1.6046   0.9199   0.9893
KNN       1.4681   0.4558   0.9916

5.4. Statistical Tests. In this section, the Friedman test (FT) and the Wilcoxon signed-rank test (WSRT) are used to verify the advantages of our proposed method compared with other methods [35, 38]. The results of the FT and WSRT are shown in Table 6. The FT is a nonparametric statistical tool which determines the difference by ranking the performance of each method. It can be seen from the table that the proposed method has a better ranking than CB, GBDT, Lasso, and KNN at the 5% significance level; that is, its efficiency is better. In addition, the results of the WSRT show that the p values are less than 0.05 (5% significance level), which rejects the null hypothesis. It means that there is a statistical difference between the proposed method and the other methods; that is, the performance of the proposed method is better than that of the other methods.
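Both tests are available in SciPy, which lets this comparison be sketched directly; the per-sample error arrays below are synthetic stand-ins constructed so that the first model dominates (all distributions and sample sizes are illustrative assumptions):

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Synthetic per-sample absolute errors; ELM-PSO is built to dominate
rng = np.random.default_rng(1)
n = 50
elm_pso = rng.gamma(2.0, 0.15, size=n)
baselines = {
    "CB": elm_pso + rng.gamma(2.0, 0.30, size=n),
    "GBDT": elm_pso + rng.gamma(2.0, 0.40, size=n),
    "Lasso": elm_pso + rng.gamma(2.0, 0.30, size=n),
    "KNN": elm_pso + rng.gamma(2.0, 0.10, size=n),
}

# Friedman test: do the five paired error samples share one distribution?
fr_stat, fr_p = friedmanchisquare(elm_pso, *baselines.values())
print(f"Friedman: chi2={fr_stat:.2f}, p={fr_p:.2e}")

# Pairwise Wilcoxon signed-rank tests against each baseline
wsrt_p = {}
for name, errs in baselines.items():
    _, wsrt_p[name] = wilcoxon(elm_pso, errs)
    print(f"ELM-PSO vs {name}: p={wsrt_p[name]:.2e}")
```

Rejecting the null at p < 0.05 in both tests mirrors the conclusion drawn from Table 6.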

Table 5. Model performance comparison on the test set for the five models for different delay bins.

Delay bin (seconds)   Model     RMSE     MAE      R-squared
[0, 1200]             ELM-PSO   0.5201   0.3006   0.9655
                      CB        1.2628   0.9223   0.7967
                      GBDT      1.3968   0.9886   0.7513
                      Lasso     1.0957   0.7903   0.8469
                      KNN       1.0195   0.5278   0.8675
>1200                 ELM-PSO   5.6009   2.8249   0.9924
                      CB        6.2335   3.1986   0.9906
                      GBDT      8.0733   4.8589   0.9843
                      Lasso     6.5023   3.1169   0.9898
                      KNN       9.0247   4.8611   0.9804
All delayed (>240)    ELM-PSO   3.3116   1.3457   0.9951
                      CB        4.0469   2.2680   0.9927
                      GBDT      5.1561   3.1498   0.9881
                      Lasso     3.9251   1.7418   0.9931
                      KNN       5.5878   2.8828   0.9860

Figure 7: Prediction errors in terms of the RMSE, MAE, and R-squared for different stations (CBN, YYE, MLE, CSS, ZZW, HSW, HYE, LYW, CZW, LCE, SG, YDW, QY, and GZN).


Figure 8: Prediction correctness for each station on the test set, by absolute-residual interval (<0.5 min, 0.5 min–1 min, and 1 min–1.5 min). The share of residuals below 0.5 min lies between 77% and 88% across the fifteen stations.


Figure 9(a): RMSE on the training and test sets versus the percentage of the total data used as the test set (10%–90%).

Figure 9(b): MAE on the training and test sets versus the percentage of the total data used as the test set (10%–90%).

Figure 9: RMSE (a), MAE (b), and R-squared (c) values on the training and test sets with different data sizes (LASSO1 represents the performance on the training set and LASSO2 the performance on the test set; likewise for GBDT, CAT, KNN, and ELM).

Table 6. Friedman ranking test and WSRT results.

Models         Mean rank   FT p value   Model 1 vs. Model 2   WSRT Z-score   WSRT p value
ELM-PSO (M1)   1.50        <=0.001      -                     -              -
CB (M2)        3.55                     M1-M2                 -25.9200       <=0.001
GBDT (M3)      4.08                     M1-M3                 -26.1312       <=0.001
Lasso (M4)     3.53                     M1-M4                 -25.7456       <=0.001
KNN (M5)       2.34                     M1-M5                 -17.4373       <=0.001

6. Conclusion

In this paper, a hybrid ELM-PSO method is proposed to predict train delays. The ELM can overcome the shortcomings of the backpropagation training algorithm, and the advantage of PSO is its excellent ability in searching for the best hyperparameter. Four benchmark models, the CB, KNN, GBDT, and Lasso models, are selected for comparison with the proposed model. These models were run on the same data collected from China Railways. ELM-PSO tends to have a better performance and generalization ability (R-squared = 0.9955, MAE = 0.3490, RMSE = 1.0387) than the other models on the test set. Our work can not only provide sufficient time and auxiliary decisions for the dispatcher to make a reasonable optimization and adjustment plan but also has practical significance for improving the quality of railway service and helping passengers estimate their travel time.

The dataset used in this paper contains train delays under all types of scenarios. Therefore, in the future, we will consider dividing all the data into certain types of delay scenarios according to particular rules and implementing currently prevalent models to train and predict each scenario to achieve a higher accuracy. Finally, in terms of the input features, all the information on the features in this paper can be obtained from train timetables. In the future, other types of features, such as infrastructure and weather features and obstructions on other HSR lines, will be taken into account.
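For reference, the kind of single-hidden-layer ELM that PSO tunes here can be sketched in a few lines: the hidden weights are random and fixed, and only the output weights are solved in closed form via the Moore-Penrose pseudoinverse, which is why the number of hidden neurons (the quantity PSO searches) is the key hyperparameter. The data, the tanh activation, and the sizes below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

class ELM:
    """Minimal single-hidden-layer extreme learning machine for regression.
    Hidden weights are random and fixed; output weights are fitted by least
    squares, so no backpropagation is needed."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Moore-Penrose pseudoinverse gives the least-squares output weights
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Synthetic stand-in for nine input features; PSO would search n_hidden
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 9))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
model = ELM(n_hidden=50).fit(X, y)
rmse = np.sqrt(np.mean((model.predict(X) - y) ** 2))
print(f"training RMSE: {rmse:.3f}")
```

In the paper's setup, PSO evaluates candidate values of n_hidden by the validation RMSE of the fitted ELM and keeps the best particle.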

Data Availability

The data used to support the findings of this study were supplied by China Railway Guangzhou Bureau Group Co., Ltd. under license and so cannot be made freely available. Access to these data should be considered by the corresponding author upon request, with the permission of China Railway Guangzhou Bureau Group Co., Ltd.

Conflicts of Interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the paper they submitted.

Authors' Contributions

Xu Bao contributed to conceptualization, prepared the original draft, was responsible for software, and visualized the study. Yanqiu Li prepared the original draft, was responsible for software, and visualized the study. Jianmin Li contributed to methodology and reviewed and edited the manuscript. Rui Shi contributed to supervision and data curation. Xin Ding contributed to data curation.

Acknowledgments

This work was financially supported by the Fundamental Research Funds for the Central Universities of China (2019JBM077) and the Open Fund for the Jiangsu Key Laboratory of Traffic and Transportation Security (Huaiyin Institute of Technology).

References

[1] J. L. Espinosa-Aranda and R. García-Ródenas, "A demand-based weighted train delay approach for rescheduling railway networks in real time," Journal of Rail Transport Planning & Management, vol. 3, no. 1-2, pp. 1–13, 2013.

[2] N. Marković, S. Milinković, K. S. Tikhonov, and P. Schonfeld, "Analyzing passenger train arrival delays with support vector regression," Transportation Research Part C: Emerging Technologies, vol. 56, pp. 251–262, 2015.

[3] C. Wen, W. Mou, P. Huang, and Z. Li, "A predictive model of train delays on a railway line," Journal of Forecasting, vol. 39, no. 3, pp. 470–488, 2019.

[4] A. Higgins and E. Kozan, "Modeling train delays in urban networks," Transportation Science, vol. 32, no. 4, pp. 346–357, 1998.

[5] J. Yuan and I. A. Hansen, "Optimizing capacity utilization of stations by estimating knock-on train delays," Transportation Research Part B: Methodological, vol. 41, no. 2, pp. 202–217, 2007.

[6] P. B. L. Wiggenraad, Alighting and Boarding Times of Passengers at Dutch Railway Stations, TRAIL Research School, Delft, Netherlands, 2001.

[7] M. F. Gorman, "Statistical estimation of railroad congestion delay," Transportation Research Part E: Logistics and Transportation Review, vol. 45, no. 3, pp. 446–456, 2009.

[8] P. Huang, C. Wen, and L. Fu, "Modeling train operation as sequences: a study of delay prediction with operation and weather data," Transportation Research Part E: Logistics and Transportation Review, vol. 141, 2020.

[9] P. Kecman and R. M. P. Goverde, "Online data-driven adaptive prediction of train event times," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 465–474, 2015.

[10] S. Milinković, M. Marković, S. Vesković et al., "A fuzzy Petri net model to estimate train delays," Simulation Modelling Practice and Theory, vol. 33, pp. 144–157, 2013.

[11] J. Peters, B. Emig, M. Jung, and S. Schmidt, "Prediction of delays in public transportation using neural networks," in Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, vol. 2, pp. 92–97, Vienna, Austria, November 2005.

[12] C. Jiang, P. Huang, J. Lessan, L. Fu, and C. Wen, "Forecasting primary delay recovery of high-speed railway using multiple linear regression, supporting vector machine, artificial neural network, and random forest regression," Canadian Journal of Civil Engineering, vol. 46, no. 5, pp. 353–363, 2019.

[13] J. Hu and B. Noche, "Application of artificial neuron network in analysis of railway delays," Open Journal of Social Sciences, vol. 4, no. 11, pp. 59–68, 2016.

[14] M. Yaghini, M. M. Khoshraftar, and M. Seyedabadi, "Railway passenger train delay prediction via neural network model," Journal of Advanced Transportation, vol. 47, no. 3, pp. 355–368, 2013.

[15] F. Corman and P. Kecman, "Stochastic prediction of train delays in real-time using Bayesian networks," Transportation Research Part C: Emerging Technologies, vol. 95, pp. 599–615, 2018.

[16] J. Lessan, L. Fu, and C. Wen, "A hybrid Bayesian network model for predicting delays in train operations," Computers & Industrial Engineering, vol. 127, pp. 1214–1222, 2019.

[17] P. Huang, C. Wen, L. Fu, Q. Peng, and Y. Tang, "A deep learning approach for multi-attribute data: a study of train delay prediction in railway systems," Information Sciences, vol. 516, pp. 234–253, 2020.

[18] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[19] Y. Chen and W. Wu, "Mapping mineral prospectivity using an extreme learning machine regression," Ore Geology Reviews, vol. 80, pp. 200–213, 2017.

[20] Y. Miche, A. Sorjamaa, P. Bas et al., "OP-ELM: optimally pruned extreme learning machine," IEEE Transactions on Neural Networks, vol. 21, no. 1, pp. 158–162, 2010.

[21] Y. Yang, Y. Wang, and X. Yuan, "Bidirectional extreme learning machine for regression problem and its learning effectiveness," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1498–1505, 2012.

[22] F. Han, H.-F. Yao, and Q.-H. Ling, "An improved evolutionary extreme learning machine based on particle swarm optimization," Neurocomputing, vol. 116, pp. 87–93, 2013.

[23] L. Oneto, E. Fumeo, G. Clerico et al., "Dynamic delay predictions for large-scale railway networks: deep and shallow extreme learning machines tuned via thresholdout," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2754–2767, 2017.

[24] Y. Li and Z. Yang, "Application of EOS-ELM with binary Jaya-based feature selection to real-time transient stability assessment using PMU data," IEEE Access, vol. 5, pp. 23092–23101, 2017.

[25] Y. Zhang, T. Li, G. Na, G. Li, and Y. Li, "Optimized extreme learning machine for power system transient stability prediction using synchrophasors," Mathematical Problems in Engineering, vol. 2015, Article ID 529724, 8 pages, 2015.

[26] Y. Wang, H. Zhang, and G. Zhang, "cPSO-CNN: an efficient PSO-based algorithm for fine-tuning hyper-parameters of convolutional neural networks," Swarm and Evolutionary Computation, vol. 49, pp. 114–123, 2019.

[27] S. V. Konstantinov, A. I. Diveev, G. I. Balandina, and A. A. Baryshnikov, "Comparative research of random search algorithms and evolutionary algorithms for the optimal control problem of the mobile robot," Procedia Computer Science, vol. 150, pp. 462–470, 2019.

[28] D. Wang, D. Tan, and L. Liu, "Particle swarm optimization algorithm: an overview," Soft Computing, vol. 22, no. 2, pp. 387–408, 2017.

[29] Y. Li, X. Xu, J. Li, and R. Shi, "A delay prediction model for high-speed railway: an extreme learning machine tuned via particle swarm optimization," in Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, September 2020.

[30] J. Tang, C. Deng, and G.-B. Huang, "Extreme learning machine for multilayer perceptron," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 809–821, 2016.

[31] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu, "A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, no. 2, pp. 1026–1037, 2007.

[32] Z. Deng, X. Zhu, D. Cheng, M. Zong, and S. Zhang, "Efficient kNN classification algorithm for big data," Neurocomputing, vol. 195, pp. 143–148, 2016.

[33] Y. Zhang, Z. Zhao, and J. Zheng, "CatBoost: a new approach for estimating daily reference crop evapotranspiration in arid and semi-arid regions of Northern China," Journal of Hydrology, vol. 588, Article ID 125087, 2020.

[34] G. Huang, L. Wu, X. Ma et al., "Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions," Journal of Hydrology, vol. 574, pp. 1029–1041, 2019.

[35] J. Yang, C. Zhao, H. Yu, and H. Chen, "Use GBDT to predict the stock market," Procedia Computer Science, vol. 174, pp. 161–171, 2020.

[36] R. Tibshirani, "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 3, pp. 273–282, 2011.

[37] M. Wang, L. Wang, X. Xu, Y. Qin, and L. Qin, "Genetic algorithm-based particle swarm optimization approach to reschedule high-speed railway timetables: a case study in China," Journal of Advanced Transportation, vol. 2019, Article ID 6090742, 12 pages, 2019.

[38] R. Shi, X. Xu, J. Li et al., "Prediction and analysis of train arrival delay based on XGBoost and Bayesian optimization," Applied Soft Computing, vol. 109, 2021.


53 Further Analysis On the basis of the previous sectionwe will evaluate the performance of the ELM-PSO modelfrom other angles including the prediction errors for eachstation precisely the prediction correctness and therobustness

First and foremost the errors of the ELM-PSOmodel forthe predicted arrival delays are calculated at the station levelon test set Viewing the overall situation in Figure 7 we havenoticed that the prediction errors are low e MAE and R-squared both remain stable at each station And the RMSEsfor different stations are mostly less than 90s However greatfluctuations occur at the YYE and QY stations Reasonsresulting in such phenomena are that the two stations areclose to the transfer stations and the buffer times of YYE andQY are both small e prediction accuracy at these stationstends to be slightly hindered by these factors

In addition to put forward more detailed and embeddedresults we describe the correctness of the absolute residualbetween the predicted values and the actual values for eachstation from three intervals (ie lt30 s 30 sndash60 s and60 sndash90 s) (Figure 8) In lt30 s interval the correctness of

Journal of Advanced Transportation 7

each stations exceeds 75 In brief the overall resultsconfirm the impressive prediction correctness of the pro-posed model

At last we investigate the robustness of our model todata size In detail we further train and test our model using10 20 30 40 50 60 70 80 and 90 re-spectively of the total data as test set and compare theresults with the baseline models e data sizes used in theexperiments are shown in Figure 9 e performance of ourmodel on both training and test set is more outstanding thanothers As we can see the RMSEs of our model stay prettystable using data with different sizes while the RMSEs of

other models are higher and fluctuating ese figures showthat the proposed model has the smallest predictive RMSEMAE and R-squared for all trains which demonstrates therobustness of our model to different data sizes

54 Statistical Tests In this section the Friedman test (FT)and Wilcoxon signed rank test (WSRT) are used to verifythe advantages of our proposed method compared withother methods [35 38] e results FT and WSRT areshown in Table 6 FT algorithm is a nonparametric sta-tistical tool which determines the difference by ranking

1031 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

104

105

106

107

108

109

11

111

112

113

Fitn

ess

Iterations

Figure 6 Convergence discriminant graph of PSO optimization

Table 2 e performance of each hyperparameter setting model

Model Maximum number of iterations e number of iterations at best RMSE RMSE Neurons Time (s)ELM-PSO 20 5 10387 1462 90000ELM-BO 400 120 10708 1279 129600ELM-GA 20 6 10668 1494 108000

Table 3 Prediction errors on each modelrsquos test set for the W-G HSR line

Model RMSE MAE R-squared Time (s)ELM-PSO 10387 03490 09955 856CB 16808 10464 09883 9GBDT 19976 12031 09835 18Lasso 19852 09240 09847 9KNN 16488 05025 09887 27

Table 4 Prediction errors on each modelrsquos training set for the W-G HSR line

Model RMSE MAE R-squaredELM-PSO 08247 03377 09973CB 16368 10407 09896GBDT 19495 12107 09852Lasso 16046 09199 09893KNN 14681 04558 09916

8 Journal of Advanced Transportation

the performance of each method It can be seen from thetable that the proposed method has better ranking thanCB GBDT Lasso and KNN at 5 significance level thatis the efficiency is better In addition the results of WSRTshowed that the p-value was less than 005 (5

significance level) which rejected the null hypothesis Itmeans that there is a statistical difference between theproposed method and other methods at is the per-formance of the proposed method is better than that ofother methods

Table 5 Model performance comparison on test set for the five models for different delay bins

Delay bin (seconds) Model RMSE MAE R-squared

[0 1200]

ELM-PSO 05201 03006 09655CB 12628 09223 07967

GBDT 13968 09886 07513Lasso 10957 07903 08469KNN 10195 05278 08675

gt1200

ELM-PSO 56009 28249 09924CB 62335 31986 09906

GBDT 80733 48589 09843Lasso 65023 31169 09898KNN 90247 48611 09804

All delayed trains (gt240)

ELM-PSO 33116 13457 09951CB 40469 22680 09927

GBDT 51561 31498 09881Lasso 39251 17418 09931KNN 55878 28828 09860

0010203040506070809

1111213141516171819

2

CBN YYE MLE CSS ZZW HSW HYE LYW CZW

Pred

ictio

n er

ror

Station LCE SG YDW QY GZN

RMSEMAER-squared

Figure 7 Prediction errors in terms of the RMSE MAE and R-squared for different stations

Journal of Advanced Transportation 9

87

80

77

78

79

82

83

82

86

84

88

83

87

85

85

0 20 40 60 80 100

CBN

YYE

MLE

CSS

ZZW

HSW

HYE

LYW

CZW

LCE

SG

YDW

QY

GZN

GZS

Prediction correctness

Stat

ion

9

14

16

15

15

12

12

14

10

13

8

13

10

10

11

lt 05 min05 min-1 min1 min-15 min

Figure 8 Prediction correctness for each station on test set

10 Journal of Advanced Transportation

0

05

1

15

2

25

3

35

4

RMSE

Percentage of the data used as test set ()

LASSO1LASSO2GBDT1GBDT2CAT1

10 20 30 40 50 60 70 80 90

CAT2KNN1KNN2ELM1ELM2

(a)

Figure 9 Continued

Journal of Advanced Transportation 11

0

02

04

06

08

1

12

14

16

18

2

MA

E

Percentage of the data used as test set ()

LASSO1LASSO2GBDT1GBDT2CAT1

10 20 30 40 50 60 70 80 90

CAT2KNN1KNN2ELM1ELM2

(b)

Figure 9 Continued

12 Journal of Advanced Transportation

6 Conclusion

In this paper a hybrid ELM-PSO method is proposed topredict train delays e ELM can overcome the short-comings of backpropagation training algorithm and theadvantage of PSO is its excellent ability in searching the besthyperparameter Four benchmark models CB KNNGBDT and Lasso models are selected to compare withproposed model ese models were run on the same datacollected from China Railways ELM-PSO tends to have a

better performance and generalization ability (R-squared 09955 MAE 03490 RMSE 10387) than theother models on the test set Our work can not only providesufficient time and auxiliary decision for the dispatcher tomake reasonable optimization and adjustment plan but alsohave practical significance for improving the quality ofrailway service and helping passengers estimate their traveltime

e dataset used in this paper contains train delays underall types of scenarios erefore in the future we will

09

092

094

096

098

1

102

R-sq

uare

d

Percentage of the data used as test set ()

LASSO1LASSO2GBDT1GBDT2CAT1

10 20 30 40 50 60 70 80 90

CAT2KNN1KNN2ELM1ELM2

(c)

Figure 9 MAE RMSE and R-squared values on training and test sets with different data sizes (LASSO1 represents the performance ontraining set LASSO2 represents the performance)

Table 6 Friedman ranking test and WSRT results

Models Mean rank FT p value Model 1 vs model 2 WSRT Z-score WSRT p valueELM-PSO (M1) 150

le0001

mdash mdash mdashCB (M2) 355 M1-M2 minus259200 le0001GBDT (M3) 408 M1-M3 minus261312 le0001Lasso (M4) 353 M1-M4 minus257456 le0001KNN (M5) 234 M1-M5 minus174373 le0001

Journal of Advanced Transportation 13

consider dividing all the data into certain types of delayscenarios according to particular rules and implementingcurrently prevalent models to train and predict each scenarioto achieve a higher accuracy Finally in terms of the inputfeatures all the information of the features in this paper canbe obtained from train timetables In the future other typesof features such as the infrastructure weather features andother HSR lines obstruction will be taken into account

Data Availability

e data used to support the findings of this study weresupplied by China Railway Guangzhou Bureau Group CoLtd under license and so cannot be made freely availableAccess to these data should be considered by the corre-sponding author upon request with permission of ChinaRailway Guangzhou Bureau Group Co Ltd

Conflicts of Interest


not optimistic. At the same time, the delay ratios of the two targeted stations, CZW and GZN, are particularly poor, with arrival delay ratios of 0.12. Our goal is to minimize the arrival delay ratio by predicting the arrival delay at each station.

4.2. Experimental Settings

4.2.1. Baseline Models. In order to compare the performance of our proposed method, the k-nearest neighbor (KNN), categorical boosting (CB), gradient boosting decision tree (GBDT), and Lasso models are used as baselines. We take 20% of the dataset as the test set and the rest as the training set. The experiment runs in Python in an environment with an Intel® Core i5-6200U processor (2.13 GHz) and 8 GB RAM. Briefly, an overview and the hyperparameter settings of each model are as follows:

(1) KNN: the KNN algorithm is extensively applied in a wide range of applications, largely owing to its simplicity, comprehensibility, and relatively promising performance [32].

① N_neighbors = 15
② Weights = uniform
③ Leaf_size = 30
④ P = 2

(2) CB: CB is a machine learning model based on the gradient boosting decision tree (GBDT) [33, 34]. CB is an outstanding technique, especially for datasets with heterogeneous features, noisy data, and complex dependencies.

① Depth = 3
② Learning_rate = 0.1
③ Loss_function = RMSE

(3) GBDT: GBDT has been employed for numerous problems [35]; it applies many nonlinear transformations, has strong expressive ability, and does not require complex feature engineering or feature transformation.

① N_estimators = 30
② Loss = ls
③ Learning_rate = 0.1

(4) Lasso: Lasso is a prevailing technique capable of simultaneously performing regularization and feature selection. Furthermore, data can be analyzed from multiple dimensions by Lasso [36].

① Alpha = 30
② Max_iter = 1000
③ Selection = cyclic

4.2.2. Evaluation Metrics. Root mean squared error (RMSE), mean absolute error (MAE), and R-squared are selected to assess the models. The definitions of the error metrics are given in equations (4)–(6):

MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|,  (4)

RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2},  (5)

R\text{-squared} = 1 - \frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2},  (6)

where y_i is an observed value, \hat{y}_i is a predicted value, \bar{y} is the average of the y_i, and N represents the sample size.
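The three metrics can be written directly from equations (4)–(6); the following minimal NumPy sketch (function names ours) mirrors them:

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error, equation (4)."""
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    """Root mean squared error, equation (5)."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r_squared(y, y_hat):
    """R-squared, equation (6): 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```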

4.2.3. Hyperparameter Tuning Models. We compare PSO with two other hyperparameter tuning models to ascertain the most satisfactory one. The overview and settings of each model are as follows:

Figure 5: Arrival delay ratio for each station (x-axis: station, CBN through GZS; y-axis: arrival delay ratio, ranging from 0.031 to 0.121).

6 Journal of Advanced Transportation

(1) PSO: to locate the optimal hyperparameter of the ELM, the parameter settings of the PSO algorithm are as follows. PSO has 20 particles at each iteration, and there are altogether 20 iterations, which is equivalent to 400 iterations of Bayesian optimization.

① Number of particles = 20
② Fitness function = RMSE on test set
③ Search dimension = 1
④ Particle search range = [1, 2000]
⑤ Maximum number of iterations = 20

(2) BO (Bayesian optimization): BO calculates the posterior probability distribution of the first n points through a substitution function and obtains the objective function value of each hyperparameter at each candidate point.

① Objective function = RMSE on test set
② Substitution function = Gaussian process regression
③ Acquisition function = UCB (upper confidence bound)
④ Hyperparameter search range = [1, 2000]
⑤ Maximum number of iterations = 400

(3) GA (genetic algorithm): traditional iterative models easily fall into the trap of local minima, which makes further iteration impossible. GA overcomes this "dead loop" phenomenon and is a global optimization algorithm [37].

① Objective function = RMSE on test set
② Hyperparameter search range = [1, 2000]
③ Generations = 20
④ Population size = 20
⑤ Maximum number of iterations = 400
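To make the ELM-PSO loop concrete, here is a simplified, self-contained sketch: a single-hidden-layer ELM (random input weights, least-squares output weights) whose hidden-neuron count is searched by a one-dimensional PSO minimizing test RMSE. The inertia and acceleration coefficients (0.7, 1.5, 1.5) and the small search budget are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(42)

def elm_fit_predict(X_tr, y_tr, X_te, n_hidden):
    """Single-hidden-layer ELM: random input weights/biases, analytic output weights."""
    W = rng.normal(size=(X_tr.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X_tr @ W + b)          # hidden-layer activations
    beta = np.linalg.pinv(H) @ y_tr    # least-squares output weights
    return np.tanh(X_te @ W + b) @ beta

def fitness(n_hidden, data):
    """RMSE on the test split, the PSO fitness used in the paper."""
    X_tr, y_tr, X_te, y_te = data
    pred = elm_fit_predict(X_tr, y_tr, X_te, max(1, int(n_hidden)))
    return float(np.sqrt(np.mean((y_te - pred) ** 2)))

def pso_search(data, n_particles=10, iters=10, lo=1, hi=200):
    """1-D PSO over the hidden-neuron count (coefficients are common defaults)."""
    pos = rng.uniform(lo, hi, n_particles)
    vel = np.zeros(n_particles)
    pbest = pos.copy()
    pbest_f = np.array([fitness(p, data) for p in pos])
    gbest = pbest[np.argmin(pbest_f)]
    for _ in range(iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        f = np.array([fitness(p, data) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)]
    return int(gbest), float(pbest_f.min())
```

Because the ELM redraws its random input weights on each call, the fitness is noisy; the paper's larger particle count and range [1, 2000] play the same role at scale.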

5. Performance Analysis

5.1. PSO Optimization Result Comparison. The process of PSO tuning the hyperparameter is shown in Figure 6. The fitness value reaches its minimum after five iterations. The best fitness value on the test set is 1.0387, obtained when the ELM has 1462 hidden neurons; the structure of the network is correspondingly optimal.

The search range [1, 2000] for the hyperparameter is determined by manually trying several values in the range [1, 10000]. When the hyperparameter value is greater than 2000, the fitness tends to be stable, while the time consumption increases sharply. Ultimately, we limit the search range to [1, 2000], weighing time consumption against precision.

The computational cost is shown in Table 2, where the results are the best obtained from several runs of each model. We gain two observations from this table. First, the optimal hidden-neuron number found by ELM-PSO is always 1462; the only difference is the number of iterations at the best RMSE. Second, compared with ELM-BO and ELM-GA, ELM-PSO is the ideal model, taking the shortest time to locate the optimal fitness on the test set.

5.2. Model Accuracy Comparison. In this section, the performance comparison between ELM-PSO and the baseline models is carried out.

First, we compare the overall performance of the five models. The evaluation metrics are R-squared, the MAE, and the RMSE. The corresponding results on the test set and training set are summarized in Tables 3 and 4, respectively. The ELM-PSO model performs best among the five models on both the training set (R-squared = 0.9973, MAE = 0.3377, RMSE = 0.8247) and the test set (R-squared = 0.9955, MAE = 0.3490, RMSE = 1.0387). Although the running time of our model has no obvious advantage over the other models on the test set, it is within a tolerable range. We also notice that some models perform well on the training set but poorly on the test set, which underscores the importance of sufficient generalization ability in this prediction problem.

Then, by separating the delay duration into three bins (i.e., [0, 1200 s], >1200 s, and all delayed trains (trains with arrival delay greater than 240 seconds)), we attempt to measure the capability of the benchmark models and our model to capture the features of train delays of varying degrees on the test set. As is distinctly shown in Table 5, the proposed model outperforms the other benchmark models in each time horizon and on each evaluation metric, achieving, for example, an RMSE of 0.5201 in the first bin. This finding is taken as evidence that our model can constantly adjust itself to capture the characteristics of varying degrees of train delays and thus enhance prediction accuracy. To further assess the performance of our model, comprehensive analyses are discussed in a later section.
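The per-bin comparison in Table 5 amounts to masking the test samples by their observed delay and computing each metric per mask; a small sketch (bin edges from the text, helper name ours):

```python
import numpy as np

def rmse_per_bin(y_true, y_pred):
    """RMSE of predictions within each delay bin used in Table 5 (seconds)."""
    masks = {
        "[0, 1200]": (y_true >= 0) & (y_true <= 1200),
        ">1200": y_true > 1200,
        "all delayed (>240)": y_true > 240,
    }
    return {name: float(np.sqrt(np.mean((y_true[m] - y_pred[m]) ** 2)))
            for name, m in masks.items() if m.any()}
```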

5.3. Further Analysis. On the basis of the previous section, we evaluate the performance of the ELM-PSO model from other angles, including the prediction errors for each station, the prediction correctness, and the robustness.

First and foremost, the errors of the ELM-PSO model for the predicted arrival delays are calculated at the station level on the test set. Viewing the overall situation in Figure 7, we notice that the prediction errors are low. The MAE and R-squared both remain stable at each station, and the RMSEs for different stations are mostly less than 90 s. However, large fluctuations occur at the YYE and QY stations. The reasons for this phenomenon are that the two stations are close to transfer stations and that the buffer times of YYE and QY are both small. The prediction accuracy at these stations tends to be slightly hindered by these factors.

In addition, to present more detailed results, we describe the correctness of the absolute residual between the predicted and actual values for each station over three intervals (i.e., <30 s, 30 s–60 s, and 60 s–90 s) (Figure 8). In the <30 s interval, the correctness of each station exceeds 75%. In brief, the overall results confirm the impressive prediction correctness of the proposed model.

At last, we investigate the robustness of our model to data size. In detail, we further train and test our model using 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90% of the total data as the test set, respectively, and compare the results with the baseline models. The data sizes used in the experiments are shown in Figure 9. The performance of our model on both the training and test sets is better than that of the others. As we can see, the RMSEs of our model stay fairly stable across data of different sizes, while the RMSEs of the other models are higher and fluctuate. These figures show that the proposed model has the smallest predictive RMSE and MAE and the largest R-squared for all trains, which demonstrates its robustness to different data sizes.

Figure 6: Convergence discriminant graph of the PSO optimization (fitness value over iterations 1–20).

Table 2: The performance of each hyperparameter tuning model.

Model | Maximum number of iterations | Iterations at best RMSE | RMSE | Neurons | Time (s)
ELM-PSO | 20 | 5 | 1.0387 | 1462 | 900.00
ELM-BO | 400 | 120 | 1.0708 | 1279 | 1296.00
ELM-GA | 20 | 6 | 1.0668 | 1494 | 1080.00

Table 3: Prediction errors on each model's test set for the W-G HSR line.

Model | RMSE | MAE | R-squared | Time (s)
ELM-PSO | 1.0387 | 0.3490 | 0.9955 | 856
CB | 1.6808 | 1.0464 | 0.9883 | 9
GBDT | 1.9976 | 1.2031 | 0.9835 | 18
Lasso | 1.9852 | 0.9240 | 0.9847 | 9
KNN | 1.6488 | 0.5025 | 0.9887 | 27

Table 4: Prediction errors on each model's training set for the W-G HSR line.

Model | RMSE | MAE | R-squared
ELM-PSO | 0.8247 | 0.3377 | 0.9973
CB | 1.6368 | 1.0407 | 0.9896
GBDT | 1.9495 | 1.2107 | 0.9852
Lasso | 1.6046 | 0.9199 | 0.9893
KNN | 1.4681 | 0.4558 | 0.9916

5.4. Statistical Tests. In this section, the Friedman test (FT) and the Wilcoxon signed-rank test (WSRT) are used to verify the advantages of our proposed method over the other methods [35, 38]. The results of the FT and WSRT are shown in Table 6. The FT is a nonparametric statistical tool that determines differences by ranking the performance of each method. It can be seen from the table that the proposed method achieves a better ranking than CB, GBDT, Lasso, and KNN at the 5% significance level; that is, its efficiency is higher. In addition, the WSRT p-values are less than 0.05 (5% significance level), rejecting the null hypothesis. This means that there is a statistically significant difference between the proposed method and the other methods; that is, the proposed method performs better.
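Both tests are available in SciPy; the sketch below runs them on placeholder per-sample error arrays (the synthetic gamma draws stand in for the real per-model errors, which are not public):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100  # hypothetical number of paired test observations

# Placeholder absolute-error samples for the five models.
errors = {
    "ELM-PSO": rng.gamma(1.0, 0.35, n),
    "CB":      rng.gamma(2.0, 0.50, n),
    "GBDT":    rng.gamma(2.5, 0.50, n),
    "Lasso":   rng.gamma(2.0, 0.45, n),
    "KNN":     rng.gamma(1.5, 0.40, n),
}

# Friedman test: ranks the five models across the n paired observations.
ft_stat, ft_p = stats.friedmanchisquare(*errors.values())

# Pairwise Wilcoxon signed-rank tests: ELM-PSO against each baseline.
wsrt_p = {name: stats.wilcoxon(errors["ELM-PSO"], e).pvalue
          for name, e in errors.items() if name != "ELM-PSO"}
```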

Table 5: Model performance comparison on the test set for the five models for different delay bins.

Delay bin (seconds) | Model | RMSE | MAE | R-squared
[0, 1200] | ELM-PSO | 0.5201 | 0.3006 | 0.9655
[0, 1200] | CB | 1.2628 | 0.9223 | 0.7967
[0, 1200] | GBDT | 1.3968 | 0.9886 | 0.7513
[0, 1200] | Lasso | 1.0957 | 0.7903 | 0.8469
[0, 1200] | KNN | 1.0195 | 0.5278 | 0.8675
>1200 | ELM-PSO | 5.6009 | 2.8249 | 0.9924
>1200 | CB | 6.2335 | 3.1986 | 0.9906
>1200 | GBDT | 8.0733 | 4.8589 | 0.9843
>1200 | Lasso | 6.5023 | 3.1169 | 0.9898
>1200 | KNN | 9.0247 | 4.8611 | 0.9804
All delayed trains (>240) | ELM-PSO | 3.3116 | 1.3457 | 0.9951
All delayed trains (>240) | CB | 4.0469 | 2.2680 | 0.9927
All delayed trains (>240) | GBDT | 5.1561 | 3.1498 | 0.9881
All delayed trains (>240) | Lasso | 3.9251 | 1.7418 | 0.9931
All delayed trains (>240) | KNN | 5.5878 | 2.8828 | 0.9860

Figure 7: Prediction errors in terms of the RMSE, MAE, and R-squared for different stations (CBN through GZN).


Figure 8: Prediction correctness for each station on the test set, split into <0.5 min, 0.5 min–1 min, and 1 min–1.5 min residual intervals; correctness in the <0.5 min interval ranges from 77% to 88% across stations.



Figure 9: MAE, RMSE, and R-squared values on the training and test sets with different data sizes (for each model, the suffix 1 denotes performance on the training set and the suffix 2 performance on the test set, e.g., LASSO1 and LASSO2).

Table 6: Friedman ranking test and WSRT results (FT p value: ≤0.001).

Models | Mean rank | Model 1 vs. model 2 | WSRT Z-score | WSRT p value
ELM-PSO (M1) | 1.50 | — | — | —
CB (M2) | 3.55 | M1 vs. M2 | −25.9200 | ≤0.001
GBDT (M3) | 4.08 | M1 vs. M3 | −26.1312 | ≤0.001
Lasso (M4) | 3.53 | M1 vs. M4 | −25.7456 | ≤0.001
KNN (M5) | 2.34 | M1 vs. M5 | −17.4373 | ≤0.001

6. Conclusion

In this paper, a hybrid ELM-PSO method is proposed to predict train delays. The ELM can overcome the shortcomings of the backpropagation training algorithm, and the advantage of PSO is its excellent ability to search for the best hyperparameter. Four benchmark models, the CB, KNN, GBDT, and Lasso models, are selected for comparison with the proposed model. These models were run on the same data, collected from China Railways. ELM-PSO tends to have better performance and generalization ability (R-squared = 0.9955, MAE = 0.3490, RMSE = 1.0387) than the other models on the test set. Our work can not only provide sufficient time and auxiliary decision support for dispatchers to make reasonable optimization and adjustment plans but also has practical significance for improving the quality of railway service and helping passengers estimate their travel time.

The dataset used in this paper contains train delays under all types of scenarios. Therefore, in the future, we will consider dividing the data into particular types of delay scenarios according to specific rules and applying currently prevalent models to train on and predict each scenario, to achieve higher accuracy. Finally, in terms of the input features, all the feature information in this paper can be obtained from train timetables. In the future, other types of features, such as infrastructure, weather, and obstructions on other HSR lines, will be taken into account.

Data Availability

The data used to support the findings of this study were supplied by China Railway Guangzhou Bureau Group Co., Ltd. under license and so cannot be made freely available. Access to these data should be considered by the corresponding author upon request, with permission of China Railway Guangzhou Bureau Group Co., Ltd.

Conflicts of Interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the submitted paper.

Authors' Contributions

Xu Bao contributed to conceptualization, prepared the original draft, was responsible for software, and visualized the study. Yanqiu Li prepared the original draft, was responsible for software, and visualized the study. Jianmin Li contributed to methodology and reviewed and edited the manuscript. Rui Shi contributed to supervision and data curation. Xin Ding contributed to data curation.

Acknowledgments

This work was financially supported by the Fundamental Research Funds for the Central Universities of China (2019JBM077) and the Open Fund for the Jiangsu Key Laboratory of Traffic and Transportation Security (Huaiyin Institute of Technology).

References

[1] J. L. Espinosa-Aranda and R. García-Ródenas, "A demand-based weighted train delay approach for rescheduling railway networks in real time," Journal of Rail Transport Planning & Management, vol. 3, no. 1-2, pp. 1–13, 2013.

[2] N. Marković, S. Milinković, K. S. Tikhonov, and P. Schonfeld, "Analyzing passenger train arrival delays with support vector regression," Transportation Research Part C: Emerging Technologies, vol. 56, pp. 251–262, 2015.

[3] C. Wen, W. Mou, P. Huang, and Z. Li, "A predictive model of train delays on a railway line," Journal of Forecasting, vol. 39, no. 3, pp. 470–488, 2019.

[4] A. Higgins and E. Kozan, "Modeling train delays in urban networks," Transportation Science, vol. 32, no. 4, pp. 346–357, 1998.

[5] J. Yuan and I. A. Hansen, "Optimizing capacity utilization of stations by estimating knock-on train delays," Transportation Research Part B: Methodological, vol. 41, no. 2, pp. 202–217, 2007.

[6] P. B. L. Wiggenraad, Alighting and Boarding Times of Passengers at Dutch Railway Stations, TRAIL Research School, Delft, The Netherlands, 2001.

[7] M. F. Gorman, "Statistical estimation of railroad congestion delay," Transportation Research Part E: Logistics and Transportation Review, vol. 45, no. 3, pp. 446–456, 2009.

[8] P. Huang, C. Wen, and L. Fu, "Modeling train operation as sequences: a study of delay prediction with operation and weather data," Transportation Research Part E: Logistics and Transportation Review, vol. 141, 2020.

[9] P. Kecman and R. M. P. Goverde, "Online data-driven adaptive prediction of train event times," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 465–474, 2015.

[10] S. Milinković, M. Marković, S. Vesković et al., "A fuzzy Petri net model to estimate train delays," Simulation Modelling Practice & Theory, vol. 33, pp. 144–157, 2013.

[11] J. Peters, B. Emig, M. Jung, and S. Schmidt, "Prediction of delays in public transportation using neural networks," in Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, vol. 2, pp. 92–97, Vienna, Austria, November 2005.

[12] C. Jiang, P. Huang, J. Lessan, L. Fu, and C. Wen, "Forecasting primary delay recovery of high-speed railway using multiple linear regression, supporting vector machine, artificial neural network, and random forest regression," Canadian Journal of Civil Engineering, vol. 46, no. 5, pp. 353–363, 2019.

[13] J. Hu and B. Noche, "Application of artificial neuron network in analysis of railway delays," Open Journal of Social Sciences, vol. 4, no. 11, pp. 59–68, 2016.

[14] M. Yaghini, M. M. Khoshraftar, and M. Seyedabadi, "Railway passenger train delay prediction via neural network model," Journal of Advanced Transportation, vol. 47, no. 3, pp. 355–368, 2013.

[15] F. Corman and P. Kecman, "Stochastic prediction of train delays in real-time using Bayesian networks," Transportation Research Part C: Emerging Technologies, vol. 95, pp. 599–615, 2018.

[16] J. Lessan, L. Fu, and C. Wen, "A hybrid Bayesian network model for predicting delays in train operations," Computers & Industrial Engineering, vol. 127, pp. 1214–1222, 2019.

[17] P. Huang, C. Wen, L. Fu, Q. Peng, and Y. Tang, "A deep learning approach for multi-attribute data: a study of train delay prediction in railway systems," Information Sciences, vol. 516, pp. 234–253, 2020.

[18] G.-B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[19] Y. Chen and W. Wu, "Mapping mineral prospectivity using an extreme learning machine regression," Ore Geology Reviews, vol. 80, pp. 200–213, 2017.

[20] Y. Miche, A. Sorjamaa, P. Bas et al., "OP-ELM: optimally pruned extreme learning machine," IEEE Transactions on Neural Networks, vol. 21, no. 1, pp. 158–162, 2010.

[21] Y. Yang, Y. Wang, and X. Yuan, "Bidirectional extreme learning machine for regression problem and its learning effectiveness," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1498–1505, 2012.

[22] F. Han, H.-F. Yao, and Q.-H. Ling, "An improved evolutionary extreme learning machine based on particle swarm optimization," Neurocomputing, vol. 116, pp. 87–93, 2013.

[23] L. Oneto, E. Fumeo, G. Clerico et al., "Dynamic delay predictions for large-scale railway networks: deep and shallow extreme learning machines tuned via thresholdout," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2754–2767, 2017.

[24] Y. Li and Z. Yang, "Application of EOS-ELM with binary Jaya-based feature selection to real-time transient stability assessment using PMU data," IEEE Access, vol. 5, pp. 23092–23101, 2017.

[25] Y. Zhang, T. Li, G. Na, G. Li, and Y. Li, "Optimized extreme learning machine for power system transient stability prediction using synchrophasors," Mathematical Problems in Engineering, vol. 2015, Article ID 529724, 8 pages, 2015.

[26] Y. Wang, H. Zhang, and G. Zhang, "cPSO-CNN: an efficient PSO-based algorithm for fine-tuning hyper-parameters of convolutional neural networks," Swarm and Evolutionary Computation, vol. 49, pp. 114–123, 2019.

[27] S. V. Konstantinov, A. I. Diveev, G. I. Balandina, and A. A. Baryshnikov, "Comparative research of random search algorithms and evolutionary algorithms for the optimal control problem of the mobile robot," Procedia Computer Science, vol. 150, pp. 462–470, 2019.

[28] D. Wang, D. Tan, and L. Liu, "Particle swarm optimization algorithm: an overview," Soft Computing, vol. 22, no. 2, pp. 387–408, 2017.

[29] Y. Li, X. Xu, J. Li, and R. Shi, "A delay prediction model for high-speed railway: an extreme learning machine tuned via particle swarm optimization," in Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, September 2020.

[30] J. Tang, C. Deng, and G.-B. Huang, "Extreme learning machine for multilayer perceptron," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 809–821, 2016.

[31] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu, "A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, no. 2, pp. 1026–1037, 2007.

[32] Z. Deng, X. Zhu, D. Cheng, M. Zong, and S. Zhang, "Efficient kNN classification algorithm for big data," Neurocomputing, vol. 195, pp. 143–148, 2016.

[33] Y. Zhang, Z. Zhao, and J. Zheng, "CatBoost: a new approach for estimating daily reference crop evapotranspiration in arid and semi-arid regions of Northern China," Journal of Hydrology, vol. 588, Article ID 125087, 2020.

[34] G. Huang, L. Wu, X. Ma et al., "Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions," Journal of Hydrology, vol. 574, pp. 1029–1041, 2019.

[35] J. Yang, C. Zhao, H. Yu, and H. Chen, "Use GBDT to predict the stock market," Procedia Computer Science, vol. 174, pp. 161–171, 2020.

[36] R. Tibshirani, "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 3, pp. 273–282, 2011.

[37] M. Wang, L. Wang, X. Xu, Y. Qin, and L. Qin, "Genetic algorithm-based particle swarm optimization approach to reschedule high-speed railway timetables: a case study in China," Journal of Advanced Transportation, vol. 2019, Article ID 6090742, 12 pages, 2019.

[38] R. Shi, X. Xu, J. Li et al., "Prediction and analysis of train arrival delay based on XGBoost and Bayesian optimization," Applied Soft Computing, vol. 109, 2021.

Journal of Advanced Transportation 15

Page 7: PredictionofTrainArrivalDelayUsingHybrid ELM-PSOApproach

(1) PSO to locate the optimal hyperparameter of theELM the parameter settings of the PSO algorithmare as follows PSO has 20 particles at each iterationand there are altogether 20 iterations which isequivalent to 400 iterations of Bayesianoptimization

① Number of particles 20② Fitness function RMSE on test set③ Search dimension 1④ Particle search range [1 2000]⑤ Maximum number of iterations 20

(2) BO (Bayesian optimization) BO calculates theposterior probability distribution of the first n pointsthrough a substitution function and obtains theobjective function of each hyperparameter at eachvalue point

① Objective function RMSE on test set② Substitution function Gaussian process

regression③ Acquisition functionUCB (upper confidence

bound)④ Hyperparameter search range [1 2000]⑤ Maximum number of iterations 400

(3) GA (genetic algorithm) the traditional iterativemodel is easy to fall into the trap of local minimawhich makes the iteration impossible to continueGA overcomes the phenomenon of ldquodead looprdquo andis a global optimization algorithm [37]

① Objective function RMSE on test set② Hyperparameter search range [1 2000]③ Generations 20④ Population size 20⑤ Maximum number of iterations 400

5 Performance Analysis

51 PSO Optimization Result Comparison e process ofPSO tuning the hyperparameter is shown in Figure 6 efitness value achieves minimum after five iterationse bestfitness value is 10387 on test set when there are 1462neurons of the ELMe structure of the network is optimalcorrespondingly

e search range [1ndash2000] of hyperparameter is deter-mined by manually trying several values in the range of[1ndash10000] When the hyperparameter value is greater than2000 the fitness tends to be stable Also the time con-sumption is multiplied acutely Ultimately we decide tolimit the search range to [1ndash2000] weighing time con-sumption and precision

e computational cost is shown in Table 2 and theresults are the optimal results of each model running severaltimes We gain two observations from this table First theoptimal particle number of ELM-PSO always focuses on1462 the only difference is the number of iterations at bestRMSE Second compared with ELM-BO and ELM-GA

ELM-PSO is the ideal model that takes the shortest time tolocate the optimal fitness on the test set

52 Model Accuracy Comparison In this section the per-formance comparison between ELM-PSO and baselinemodels is performed

First, we compare the overall performance of the five models. The evaluation metrics are R-squared, the MAE, and the RMSE. The corresponding results on the test set and the training set are summarized in Tables 3 and 4, respectively. The ELM-PSO model performs best among the five models on not only the training set (R-squared 0.9973, MAE 0.3377, RMSE 0.8247) but also the test set (R-squared 0.9955, MAE 0.3490, RMSE 1.0387). Although the running time of our model has no obvious advantage over the other models on the test set, it is within a tolerable range. We also notice that some models perform well on the training set but not on the test set, which underlines the importance of sufficient generalization ability in this prediction problem.
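The three metrics compared above can be computed directly; the delay values below are hypothetical examples, not the paper's data.

```python
import numpy as np

def rmse(y, yhat):
    # Root mean squared error.
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    # Mean absolute error.
    return float(np.mean(np.abs(y - yhat)))

def r2(y, yhat):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = float(np.sum((y - yhat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    return 1.0 - ss_res / ss_tot

# Hypothetical arrival delays (minutes) and predictions.
y    = np.array([0.0, 2.0, 5.0, 21.0, 8.0])
yhat = np.array([0.5, 1.5, 5.5, 20.0, 8.5])
```

A model that fits the training set far better than the test set on these metrics is overfitting, which is the generalization concern raised above.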

Then, by separating the delay duration into three bins (i.e., [0–1200 s], >1200 s, and all delayed trains, i.e., trains with an arrival delay greater than 240 seconds), we measure the capability of the benchmark models and our model to capture the features of train delays of varying degrees on the test set. As Table 5 distinctly shows, the proposed model outperforms the other benchmark models in each delay bin and on each evaluation metric, achieving, for example, an RMSE of 0.5201 in the first bin. This finding is taken as evidence that our model can constantly adjust itself to capture the characteristics of varying degrees of train delays and thus enhance the prediction accuracy. To further assess the performance of our model, comprehensive analyses are discussed in the next section.
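The per-bin evaluation described above amounts to masking the test set by actual delay duration and scoring each subset separately; a small sketch with hypothetical delays in seconds:

```python
import numpy as np

def rmse(y, yhat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))

# Hypothetical actual delays (seconds) and model predictions.
actual = np.array([100, 600, 1100, 1500, 3000, 250, 400])
pred   = np.array([130, 560, 1180, 1420, 3110, 260, 390])

# Boolean masks matching the three bins in Table 5.
bins = {
    "[0, 1200]":         (actual >= 0) & (actual <= 1200),
    ">1200":             actual > 1200,
    "all delayed >240":  actual > 240,
}
scores = {name: rmse(actual[m], pred[m]) for name, m in bins.items()}
```

Note the bins overlap by design: the third bin pools every delayed train so that overall and regime-specific behavior can both be reported.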

5.3. Further Analysis. On the basis of the previous section, we evaluate the performance of the ELM-PSO model from other angles, including the prediction errors for each station, the prediction correctness, and the robustness.

First and foremost, the errors of the ELM-PSO model for the predicted arrival delays are calculated at the station level on the test set. Viewing the overall situation in Figure 7, we notice that the prediction errors are low: the MAE and R-squared both remain stable at each station, and the RMSEs for different stations are mostly less than 90 s. However, larger fluctuations occur at the YYE and QY stations. The reasons are that these two stations are close to transfer stations and that the buffer times of YYE and QY are both small; the prediction accuracy at these stations therefore tends to be slightly hindered.
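Station-level errors of this kind are a groupby over the prediction records; the stations and residuals below are hypothetical examples, not the paper's data.

```python
import pandas as pd

# Hypothetical per-record actual/predicted delays (seconds) at a few stations.
df = pd.DataFrame({
    "station":   ["YYE", "YYE", "QY", "QY", "CSS", "CSS"],
    "actual":    [300.0, 900.0, 120.0, 600.0, 60.0, 480.0],
    "predicted": [380.0, 820.0, 210.0, 540.0, 75.0, 470.0],
})

# Per-station RMSE: mean of squared residuals within each station, then sqrt.
df["sq_err"] = (df["actual"] - df["predicted"]) ** 2
per_station_rmse = df.groupby("station")["sq_err"].mean().pow(0.5)
```

Stations whose RMSE stands out (as YYE and QY do in Figure 7) can then be inspected for structural causes such as short buffer times.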

In addition, to put forward more detailed results, we describe the correctness of the absolute residual between the predicted and the actual values for each station over three intervals (i.e., <30 s, 30 s–60 s, and 60 s–90 s) (Figure 8). In the <30 s interval, the correctness of

Journal of Advanced Transportation 7

each station exceeds 75%. In brief, the overall results confirm the impressive prediction correctness of the proposed model.
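The "correctness" shares per interval reduce to the fraction of absolute residuals falling in each band; a sketch with hypothetical residuals (seconds) at one station:

```python
import numpy as np

# Hypothetical absolute residuals |predicted - actual| in seconds.
abs_res = np.array([5, 12, 28, 35, 50, 61, 88, 20, 8, 44])

# Share of predictions in each absolute-residual band.
lt30   = float(np.mean(abs_res < 30))
b30_60 = float(np.mean((abs_res >= 30) & (abs_res < 60)))
b60_90 = float(np.mean((abs_res >= 60) & (abs_res < 90)))
```

Repeating this per station yields the stacked shares plotted in Figure 8, where the <30 s share exceeds 75% everywhere.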

Finally, we investigate the robustness of our model to data size. In detail, we further train and test our model using 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%, respectively, of the total data as the test set and compare the results with the baseline models. The data sizes used in the experiments are shown in Figure 9. The performance of our model on both the training and test sets is better than that of the others. As we can see, the RMSEs of our model stay quite stable across data of different sizes, while the RMSEs of the other models are higher and fluctuate. These figures show that the proposed model has the smallest predictive RMSE and MAE and the largest R-squared for all trains, which demonstrates the robustness of our model to different data sizes.
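The robustness experiment is a loop over test-set fractions. The sketch below uses synthetic data and a Lasso baseline purely for illustration; the feature count of nine matches the paper's inputs, but everything else is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data: nine input features, linear target plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 9))
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=500)

rmses = {}
for frac in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=frac, random_state=0)
    model = Lasso(alpha=0.01).fit(X_tr, y_tr)
    rmses[frac] = float(np.sqrt(mean_squared_error(y_te, model.predict(X_te))))
```

A model whose `rmses` values stay flat across fractions is robust to data size in the sense used above; a model whose errors swing with the split is not.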

5.4. Statistical Tests. In this section, the Friedman test (FT) and the Wilcoxon signed-rank test (WSRT) are used to verify the advantages of the proposed method over the other methods [35, 38]. The results of the FT and WSRT are shown in Table 6. The FT is a nonparametric statistical tool that determines differences by ranking

Figure 6: Convergence graph of the PSO optimization (fitness value, ranging from about 1.03 to 1.13, versus iterations 1–20).

Table 2: The performance of each hyperparameter-setting model.

Model     Max. iterations   Iterations at best RMSE   RMSE     Neurons   Time (s)
ELM-PSO   20                5                         1.0387   1462      900.00
ELM-BO    400               120                       1.0708   1279      1296.00
ELM-GA    20                6                         1.0668   1494      1080.00

Table 3: Prediction errors on each model's test set for the W-G HSR line.

Model     RMSE     MAE      R-squared   Time (s)
ELM-PSO   1.0387   0.3490   0.9955      8.56
CB        1.6808   1.0464   0.9883      9
GBDT      1.9976   1.2031   0.9835      18
Lasso     1.9852   0.9240   0.9847      9
KNN       1.6488   0.5025   0.9887      27

Table 4: Prediction errors on each model's training set for the W-G HSR line.

Model     RMSE     MAE      R-squared
ELM-PSO   0.8247   0.3377   0.9973
CB        1.6368   1.0407   0.9896
GBDT      1.9495   1.2107   0.9852
Lasso     1.6046   0.9199   0.9893
KNN       1.4681   0.4558   0.9916


the performance of each method. The table shows that the proposed method ranks better than CB, GBDT, Lasso, and KNN at the 5% significance level; that is, its efficiency is better. In addition, the WSRT results show p-values less than 0.05 (5% significance level), rejecting the null hypothesis. This means there is a statistically significant difference between the proposed method and each of the other methods; that is, the performance of the proposed method is better.
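Both tests are available in SciPy; the sketch below applies them to hypothetical per-fold RMSEs of three models (fabricated for illustration, not the paper's results), with the "proposed" model constructed to have uniformly lower errors.

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold RMSEs of three models across 10 folds.
rng = np.random.default_rng(3)
m1 = 1.0 + 0.05 * rng.random(10)   # "proposed" model: consistently lowest
m2 = 1.6 + 0.05 * rng.random(10)
m3 = 1.9 + 0.05 * rng.random(10)

# Friedman test: are the models' rank distributions different overall?
fr_stat, fr_p = stats.friedmanchisquare(m1, m2, m3)

# Wilcoxon signed-rank test: pairwise comparison of proposed vs one baseline.
w_stat, w_p = stats.wilcoxon(m1, m2)
```

Because `m1` beats `m2` and `m3` on every fold, both p-values fall below 0.05, mirroring the pattern reported in Table 6.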

Table 5: Model performance comparison on the test set for the five models for different delay bins.

Delay bin (seconds)          Model     RMSE     MAE      R-squared
[0, 1200]                    ELM-PSO   0.5201   0.3006   0.9655
                             CB        1.2628   0.9223   0.7967
                             GBDT      1.3968   0.9886   0.7513
                             Lasso     1.0957   0.7903   0.8469
                             KNN       1.0195   0.5278   0.8675
>1200                        ELM-PSO   5.6009   2.8249   0.9924
                             CB        6.2335   3.1986   0.9906
                             GBDT      8.0733   4.8589   0.9843
                             Lasso     6.5023   3.1169   0.9898
                             KNN       9.0247   4.8611   0.9804
All delayed trains (>240)    ELM-PSO   3.3116   1.3457   0.9951
                             CB        4.0469   2.2680   0.9927
                             GBDT      5.1561   3.1498   0.9881
                             Lasso     3.9251   1.7418   0.9931
                             KNN       5.5878   2.8828   0.9860

Figure 7: Prediction errors in terms of the RMSE, MAE, and R-squared for the different stations (CBN, YYE, MLE, CSS, ZZW, HSW, HYE, LYW, CZW, LCE, SG, YDW, QY, GZN).


Figure 8: Prediction correctness for each station on the test set, split into <0.5 min, 0.5 min–1 min, and 1 min–1.5 min absolute-residual intervals; the <0.5 min share ranges from 77% to 88% across the 15 stations (CBN through GZS).



6. Conclusion

In this paper, a hybrid ELM-PSO method is proposed to predict train delays. The ELM overcomes the shortcomings of the backpropagation training algorithm, and the advantage of PSO is its excellent ability to search for the best hyperparameter. Four benchmark models, CB, KNN, GBDT, and Lasso, are selected for comparison with the proposed model. These models were run on the same data collected from China Railways. ELM-PSO tends to have better performance and generalization ability (R-squared 0.9955, MAE 0.3490, RMSE 1.0387) than the other models on the test set. Our work can not only provide sufficient time and decision support for the dispatcher to make reasonable optimization and adjustment plans but also has practical significance for improving the quality of railway service and helping passengers estimate their travel time.

The dataset used in this paper contains train delays under all types of scenarios. Therefore, in the future, we will

Figure 9: RMSE, MAE, and R-squared values on the training and test sets with different data sizes (for each model, curve 1, e.g., LASSO1, represents the performance on the training set, and curve 2, e.g., LASSO2, represents the performance on the test set).

Table 6: Friedman ranking test and WSRT results.

Models         Mean rank   FT p value   Model 1 vs model 2   WSRT Z-score   WSRT p value
ELM-PSO (M1)   1.50        ≤0.001       —                    —              —
CB (M2)        3.55                     M1 vs M2             −25.9200       ≤0.001
GBDT (M3)      4.08                     M1 vs M3             −26.1312       ≤0.001
Lasso (M4)     3.53                     M1 vs M4             −25.7456       ≤0.001
KNN (M5)       2.34                     M1 vs M5             −17.4373       ≤0.001


consider dividing all the data into certain types of delay scenarios according to particular rules and applying currently prevalent models to train on and predict each scenario, to achieve higher accuracy. Finally, in terms of the input features, all the feature information used in this paper can be obtained from train timetables. In the future, other types of features, such as infrastructure, weather features, and obstructions on other HSR lines, will be taken into account.

Data Availability

The data used to support the findings of this study were supplied by China Railway Guangzhou Bureau Group Co., Ltd. under license and so cannot be made freely available. Access to these data should be considered by the corresponding author upon request, with permission of China Railway Guangzhou Bureau Group Co., Ltd.

Conflicts of Interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the submitted paper.

Authors' Contributions

Xu Bao contributed to conceptualization, prepared the original draft, was responsible for software, and visualized the study. Yanqiu Li prepared the original draft, was responsible for software, and visualized the study. Jianmin Li contributed to methodology and reviewed and edited the manuscript. Rui Shi contributed to supervision and data curation. Xin Ding contributed to data curation.

Acknowledgments

This work was financially supported by the Fundamental Research Funds for the Central Universities of China (2019JBM077) and the Open Fund for the Jiangsu Key Laboratory of Traffic and Transportation Security (Huaiyin Institute of Technology).

References

[1] J. L. Espinosa-Aranda and R. García-Ródenas, "A demand-based weighted train delay approach for rescheduling railway networks in real time," Journal of Rail Transport Planning & Management, vol. 3, no. 1-2, pp. 1–13, 2013.

[2] N. Marković, S. Milinković, K. S. Tikhonov, and P. Schonfeld, "Analyzing passenger train arrival delays with support vector regression," Transportation Research Part C: Emerging Technologies, vol. 56, pp. 251–262, 2015.

[3] C. Wen, W. Mou, P. Huang, and Z. Li, "A predictive model of train delays on a railway line," Journal of Forecasting, vol. 39, no. 3, pp. 470–488, 2019.

[4] A. Higgins and E. Kozan, "Modeling train delays in urban networks," Transportation Science, vol. 32, no. 4, pp. 346–357, 1998.

[5] J. Yuan and I. A. Hansen, "Optimizing capacity utilization of stations by estimating knock-on train delays," Transportation Research Part B: Methodological, vol. 41, no. 2, pp. 202–217, 2007.

[6] P. B. L. Wiggenraad, Alighting and Boarding Times of Passengers at Dutch Railway Stations, TRAIL Research School, Delft, Netherlands, 2001.

[7] M. F. Gorman, "Statistical estimation of railroad congestion delay," Transportation Research Part E: Logistics and Transportation Review, vol. 45, no. 3, pp. 446–456, 2009.

[8] P. Huang, C. Wen, and L. Fu, "Modeling train operation as sequences: a study of delay prediction with operation and weather data," Transportation Research Part E: Logistics and Transportation Review, vol. 141, 2020.

[9] P. Kecman and R. M. P. Goverde, "Online data-driven adaptive prediction of train event times," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 465–474, 2015.

[10] S. Milinković, M. Marković, S. Vesković et al., "A fuzzy Petri net model to estimate train delays," Simulation Modelling Practice & Theory, vol. 33, pp. 144–157, 2013.

[11] J. Peters, B. Emig, M. Jung, and S. Schmidt, "Prediction of delays in public transportation using neural networks," in Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, vol. 2, pp. 92–97, Vienna, Austria, November 2005.

[12] C. Jiang, P. Huang, J. Lessan, L. Fu, and C. Wen, "Forecasting primary delay recovery of high-speed railway using multiple linear regression, supporting vector machine, artificial neural network, and random forest regression," Canadian Journal of Civil Engineering, vol. 46, no. 5, pp. 353–363, 2019.

[13] J. Hu and B. Noche, "Application of artificial neuron network in analysis of railway delays," Open Journal of Social Sciences, vol. 4, no. 11, pp. 59–68, 2016.

[14] M. Yaghini, M. M. Khoshraftar, and M. Seyedabadi, "Railway passenger train delay prediction via neural network model," Journal of Advanced Transportation, vol. 47, no. 3, pp. 355–368, 2013.

[15] F. Corman and P. Kecman, "Stochastic prediction of train delays in real-time using Bayesian networks," Transportation Research Part C: Emerging Technologies, vol. 95, pp. 599–615, 2018.

[16] J. Lessan, L. Fu, and C. Wen, "A hybrid Bayesian network model for predicting delays in train operations," Computers & Industrial Engineering, vol. 127, pp. 1214–1222, 2019.

[17] P. Huang, C. Wen, L. Fu, Q. Peng, and Y. Tang, "A deep learning approach for multi-attribute data: a study of train delay prediction in railway systems," Information Sciences, vol. 516, pp. 234–253, 2020.

[18] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[19] Y. Chen and W. Wu, "Mapping mineral prospectivity using an extreme learning machine regression," Ore Geology Reviews, vol. 80, pp. 200–213, 2017.

[20] Y. Miche, A. Sorjamaa, P. Bas et al., "OP-ELM: optimally pruned extreme learning machine," IEEE Transactions on Neural Networks, vol. 21, no. 1, pp. 158–162, 2010.

[21] Y. Yang, Y. Wang, and X. Yuan, "Bidirectional extreme learning machine for regression problem and its learning effectiveness," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1498–1505, 2012.

[22] F. Han, H.-F. Yao, and Q.-H. Ling, "An improved evolutionary extreme learning machine based on particle swarm optimization," Neurocomputing, vol. 116, pp. 87–93, 2013.

[23] L. Oneto, E. Fumeo, G. Clerico et al., "Dynamic delay predictions for large-scale railway networks: deep and shallow extreme learning machines tuned via thresholdout," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2754–2767, 2017.

[24] Y. Li and Z. Yang, "Application of EOS-ELM with binary Jaya-based feature selection to real-time transient stability assessment using PMU data," IEEE Access, vol. 5, pp. 23092–23101, 2017.

[25] Y. Zhang, T. Li, G. Na, G. Li, and Y. Li, "Optimized extreme learning machine for power system transient stability prediction using synchrophasors," Mathematical Problems in Engineering, vol. 2015, Article ID 529724, 8 pages, 2015.

[26] Y. Wang, H. Zhang, and G. Zhang, "cPSO-CNN: an efficient PSO-based algorithm for fine-tuning hyper-parameters of convolutional neural networks," Swarm and Evolutionary Computation, vol. 49, pp. 114–123, 2019.

[27] S. V. Konstantinov, A. I. Diveev, G. I. Balandina, and A. A. Baryshnikov, "Comparative research of random search algorithms and evolutionary algorithms for the optimal control problem of the mobile robot," Procedia Computer Science, vol. 150, pp. 462–470, 2019.

[28] D. Wang, D. Tan, and L. Liu, "Particle swarm optimization algorithm: an overview," Soft Computing, vol. 22, no. 2, pp. 387–408, 2017.

[29] Y. Li, X. Xu, J. Li, and R. Shi, "A delay prediction model for high-speed railway: an extreme learning machine tuned via particle swarm optimization," in Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, September 2020.

[30] J. Tang, C. Deng, and G.-B. Huang, "Extreme learning machine for multilayer perceptron," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 809–821, 2016.

[31] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu, "A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, no. 2, pp. 1026–1037, 2007.

[32] Z. Deng, X. Zhu, D. Cheng, M. Zong, and S. Zhang, "Efficient kNN classification algorithm for big data," Neurocomputing, vol. 195, pp. 143–148, 2016.

[33] Y. Zhang, Z. Zhao, and J. Zheng, "CatBoost: a new approach for estimating daily reference crop evapotranspiration in arid and semi-arid regions of Northern China," Journal of Hydrology, vol. 588, Article ID 125087, 2020.

[34] G. Huang, L. Wu, X. Ma et al., "Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions," Journal of Hydrology, vol. 574, pp. 1029–1041, 2019.

[35] J. Yang, C. Zhao, H. Yu, and H. Chen, "Use GBDT to predict the stock market," Procedia Computer Science, vol. 174, pp. 161–171, 2020.

[36] R. Tibshirani, "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 3, pp. 273–282, 2011.

[37] M. Wang, L. Wang, X. Xu, Y. Qin, and L. Qin, "Genetic algorithm-based particle swarm optimization approach to reschedule high-speed railway timetables: a case study in China," Journal of Advanced Transportation, vol. 2019, Article ID 6090742, 12 pages, 2019.

[38] R. Shi, X. Xu, J. Li et al., "Prediction and analysis of train arrival delay based on XGBoost and Bayesian optimization," Applied Soft Computing, vol. 109, 2021.

Journal of Advanced Transportation 15

Page 8: PredictionofTrainArrivalDelayUsingHybrid ELM-PSOApproach

each stations exceeds 75 In brief the overall resultsconfirm the impressive prediction correctness of the pro-posed model

At last we investigate the robustness of our model todata size In detail we further train and test our model using10 20 30 40 50 60 70 80 and 90 re-spectively of the total data as test set and compare theresults with the baseline models e data sizes used in theexperiments are shown in Figure 9 e performance of ourmodel on both training and test set is more outstanding thanothers As we can see the RMSEs of our model stay prettystable using data with different sizes while the RMSEs of

other models are higher and fluctuating ese figures showthat the proposed model has the smallest predictive RMSEMAE and R-squared for all trains which demonstrates therobustness of our model to different data sizes

54 Statistical Tests In this section the Friedman test (FT)and Wilcoxon signed rank test (WSRT) are used to verifythe advantages of our proposed method compared withother methods [35 38] e results FT and WSRT areshown in Table 6 FT algorithm is a nonparametric sta-tistical tool which determines the difference by ranking

1031 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

104

105

106

107

108

109

11

111

112

113

Fitn

ess

Iterations

Figure 6 Convergence discriminant graph of PSO optimization

Table 2 e performance of each hyperparameter setting model

Model Maximum number of iterations e number of iterations at best RMSE RMSE Neurons Time (s)ELM-PSO 20 5 10387 1462 90000ELM-BO 400 120 10708 1279 129600ELM-GA 20 6 10668 1494 108000

Table 3 Prediction errors on each modelrsquos test set for the W-G HSR line

Model RMSE MAE R-squared Time (s)ELM-PSO 10387 03490 09955 856CB 16808 10464 09883 9GBDT 19976 12031 09835 18Lasso 19852 09240 09847 9KNN 16488 05025 09887 27

Table 4 Prediction errors on each modelrsquos training set for the W-G HSR line

Model RMSE MAE R-squaredELM-PSO 08247 03377 09973CB 16368 10407 09896GBDT 19495 12107 09852Lasso 16046 09199 09893KNN 14681 04558 09916

8 Journal of Advanced Transportation

the performance of each method It can be seen from thetable that the proposed method has better ranking thanCB GBDT Lasso and KNN at 5 significance level thatis the efficiency is better In addition the results of WSRTshowed that the p-value was less than 005 (5

significance level) which rejected the null hypothesis Itmeans that there is a statistical difference between theproposed method and other methods at is the per-formance of the proposed method is better than that ofother methods

Table 5 Model performance comparison on test set for the five models for different delay bins

Delay bin (seconds) Model RMSE MAE R-squared

[0 1200]

ELM-PSO 05201 03006 09655CB 12628 09223 07967

GBDT 13968 09886 07513Lasso 10957 07903 08469KNN 10195 05278 08675

gt1200

ELM-PSO 56009 28249 09924CB 62335 31986 09906

GBDT 80733 48589 09843Lasso 65023 31169 09898KNN 90247 48611 09804

All delayed trains (gt240)

ELM-PSO 33116 13457 09951CB 40469 22680 09927

GBDT 51561 31498 09881Lasso 39251 17418 09931KNN 55878 28828 09860

0010203040506070809

1111213141516171819

2

CBN YYE MLE CSS ZZW HSW HYE LYW CZW

Pred

ictio

n er

ror

Station LCE SG YDW QY GZN

RMSEMAER-squared

Figure 7 Prediction errors in terms of the RMSE MAE and R-squared for different stations

Journal of Advanced Transportation 9

87

80

77

78

79

82

83

82

86

84

88

83

87

85

85

0 20 40 60 80 100

CBN

YYE

MLE

CSS

ZZW

HSW

HYE

LYW

CZW

LCE

SG

YDW

QY

GZN

GZS

Prediction correctness

Stat

ion

9

14

16

15

15

12

12

14

10

13

8

13

10

10

11

lt 05 min05 min-1 min1 min-15 min

Figure 8 Prediction correctness for each station on test set

10 Journal of Advanced Transportation

0

05

1

15

2

25

3

35

4

RMSE

Percentage of the data used as test set ()

LASSO1LASSO2GBDT1GBDT2CAT1

10 20 30 40 50 60 70 80 90

CAT2KNN1KNN2ELM1ELM2

(a)

Figure 9 Continued

Journal of Advanced Transportation 11

0

02

04

06

08

1

12

14

16

18

2

MA

E

Percentage of the data used as test set ()

LASSO1LASSO2GBDT1GBDT2CAT1

10 20 30 40 50 60 70 80 90

CAT2KNN1KNN2ELM1ELM2

(b)

Figure 9 Continued

12 Journal of Advanced Transportation

6 Conclusion

In this paper a hybrid ELM-PSO method is proposed topredict train delays e ELM can overcome the short-comings of backpropagation training algorithm and theadvantage of PSO is its excellent ability in searching the besthyperparameter Four benchmark models CB KNNGBDT and Lasso models are selected to compare withproposed model ese models were run on the same datacollected from China Railways ELM-PSO tends to have a

better performance and generalization ability (R-squared 09955 MAE 03490 RMSE 10387) than theother models on the test set Our work can not only providesufficient time and auxiliary decision for the dispatcher tomake reasonable optimization and adjustment plan but alsohave practical significance for improving the quality ofrailway service and helping passengers estimate their traveltime

e dataset used in this paper contains train delays underall types of scenarios erefore in the future we will

09

092

094

096

098

1

102

R-sq

uare

d

Percentage of the data used as test set ()

LASSO1LASSO2GBDT1GBDT2CAT1

10 20 30 40 50 60 70 80 90

CAT2KNN1KNN2ELM1ELM2

(c)

Figure 9 MAE RMSE and R-squared values on training and test sets with different data sizes (LASSO1 represents the performance ontraining set LASSO2 represents the performance)

Table 6 Friedman ranking test and WSRT results

Models Mean rank FT p value Model 1 vs model 2 WSRT Z-score WSRT p valueELM-PSO (M1) 150

le0001

mdash mdash mdashCB (M2) 355 M1-M2 minus259200 le0001GBDT (M3) 408 M1-M3 minus261312 le0001Lasso (M4) 353 M1-M4 minus257456 le0001KNN (M5) 234 M1-M5 minus174373 le0001

Journal of Advanced Transportation 13

consider dividing all the data into certain types of delayscenarios according to particular rules and implementingcurrently prevalent models to train and predict each scenarioto achieve a higher accuracy Finally in terms of the inputfeatures all the information of the features in this paper canbe obtained from train timetables In the future other typesof features such as the infrastructure weather features andother HSR lines obstruction will be taken into account

Data Availability

e data used to support the findings of this study weresupplied by China Railway Guangzhou Bureau Group CoLtd under license and so cannot be made freely availableAccess to these data should be considered by the corre-sponding author upon request with permission of ChinaRailway Guangzhou Bureau Group Co Ltd

Conflicts of Interest

e authors declare that they do not have any commercial orassociative interest that represents conflicts of interest inconnection with the paper they submitted

Authorsrsquo Contributions

Xu Bao contributed to conceptualization prepared theoriginal draft was responsible for software and visualizedthe study Yanqiu Li prepared the original draft was re-sponsible for software and visualized the study Jianmin Licontributed to methodology and reviewed and edited themanuscript Rui Shi contributed to supervision and datacuration Xin Ding contributed to data curation

Acknowledgments

is work was financially supported by the FundamentalResearch Funds for the Central Universities of China(2019JBM077) and the Open Fund for Jiangsu Key Labo-ratory of Traffic and Transportation Security (Huaiyin In-stitute of Technology)

References

[1] J L Espinosa-Aranda and R Garcıa-Rodenas ldquoA demand-based weighted train delay approach for rescheduling railwaynetworks in real timerdquo Journal of Rail Transport Planning ampManagement vol 3 no 1-2 pp 1ndash13 2013

[2] N Markovic S Milinkovic K S Tikhonov and P SchonfeldldquoAnalyzing passenger train arrival delays with support vectorregressionrdquo Transportation Research Part C Emerging Tech-nologies vol 56 pp 251ndash262 2015

[3] C Wen W Mou P Huang and Z Li ldquoA predictive model oftrain delays on a railway linerdquo Journal of Forecasting vol 39no 3 pp 470ndash488 2019

[4] A Higgins and E Kozan ldquoModeling train delays in urbannetworksrdquo Transportation Science vol 32 no 4 pp 346ndash3571998

[5] J Yuan and I A Hansen ldquoOptimizing capacity utilization ofstations by estimating knock-on train delaysrdquo Transportation

Research Part B Methodological vol 41 no 2 pp 202ndash2172007

[6] P B L Wiggenraad Alighting and Boarding Times of Pas-sengers at Dutch Railway Stations TRAIL Research SchoolDelft Netherland 2001

[7] M F Gorman ldquoStatistical estimation of railroad congestiondelayrdquo Transportation Research Part E Logistics and Trans-portation Review vol 45 no 3 pp 446ndash456 2009

[8] P Huang C Wen and L Fu ldquoModeling train operation assequences a study of delay prediction with operation andweather datardquo Transportation Research Part E Logistics andTransportation Review vol 141 2020

[9] P Kecman and R M P Goverde ldquoOnline data-drivenadaptive prediction of train event timesrdquo IEEE Transactionson Intelligent Transportation Systems vol 16 no 1pp 465ndash474 2015

[10] S Milinkovic M Markovic S Veskovic et al ldquoA fuzzy Petrinet model to estimate train delaysrdquo Simulation ModellingPractice amp 7eory vol 33 pp 144ndash157 2013

[11] J Peters B Emig M Jung and S Schmidt ldquoPrediction ofdelays in public transportation using neural networksrdquo inProceedings of the International Conference on ComputationalIntelligence for Modelling Control and Automation and In-ternational Conference on Intelligent Agents Web Technologiesand Internet Commerce vol 2 pp 92ndash97 Vienna AustriaNovember 2005

[12] C Jiang P Huang J Lessan L Fu and C Wen ldquoForecastingprimary delay recovery of high-speed railway using multiplelinear regression supporting vector machine artificial neuralnetwork and random forest regressionrdquo Canadian Journal ofCivil Engineering vol 46 no 5 pp 353ndash363 2019

[13] J Hu and B Noche ldquoApplication of artificial neuron networkin analysis of railway delaysrdquo Open Journal of Social Sciencesvol 4 no 11 pp 59ndash68 2016

[14] M Yaghini M M Khoshraftar and M Seyedabadi ldquoRailwaypassenger train delay prediction via neural network modelrdquoJournal of Advanced Transportation vol 47 no 3 pp 355ndash368 2013

[15] F Corman and P Kecman ldquoStochastic prediction of traindelays in real-time using Bayesian networksrdquo TransportationResearch Part C Emerging Technologies vol 95 pp 599ndash6152018

[16] J Lessan L Fu and C Wen ldquoA hybrid Bayesian networkmodel for predicting delays in train operationsrdquo Computers ampIndustrial Engineering vol 127 pp 1214ndash1222 2019

[17] P Huang C Wen L Fu Q Peng and Y Tang ldquoA deeplearning approach for multi-attribute data a study of traindelay prediction in railway systemsrdquo Information Sciencesvol 516 pp 234ndash253 2020

[18] G Bin Huang Q Y Zhu and C K Siew ldquoExtreme learningmachine theory and applicationsrdquo Neurocomputing vol 70no 1ndash3 pp 489ndash501 2006

[19] Y Chen and W Wu ldquoMapping mineral prospectivity usingan extreme learning machine regressionrdquo Ore Geology Re-views vol 80 pp 200ndash213 2017

[20] Y YoanMiche A Sorjamaa P Bas et al ldquoOP-ELM optimallypruned extreme learning machinerdquo IEEE Transactions onNeural Networks vol 21 no 1 pp 158ndash162 2010

[21] Y Yimin Yang Y Yaonan Wang and X Xiaofang YuanldquoBidirectional extreme learning machine for regressionproblem and its learning effectivenessrdquo IEEE Transactions onNeural Networks and Learning Systems vol 23 no 9pp 1498ndash1505 2012

14 Journal of Advanced Transportation

[22] F Han H-F Yao and Q-H Ling ldquoAn improved evolu-tionary extreme learning machine based on particle swarmoptimizationrdquo Neurocomputing vol 116 pp 87ndash93 2013

[23] L Oneto E Fumeo G Clerico et al ldquoDynamic delay pre-dictions for large-scale railway networks deep and shallowextreme learning machines tuned via thresholdoutrdquo IEEETransactions on Systems Man and Cybernetics Systemsvol 47 no 10 pp 2754ndash2767 2017


Journal of Advanced Transportation 15


the performance of each method. It can be seen from the table that the proposed method achieves a better ranking than CB, GBDT, Lasso, and KNN at the 5% significance level; that is, its efficiency is higher. In addition, the results of the WSRT show that the p value was less than 0.05 (the 5% significance level), which rejects the null hypothesis. This means that there is a statistically significant difference between the proposed method and the other methods, i.e., the performance of the proposed method is better than that of the others.
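Both tests can be reproduced with standard SciPy routines. The sketch below is illustrative only: the five error arrays are synthetic stand-ins for the per-sample absolute errors of ELM-PSO, CB, GBDT, Lasso, and KNN on a common test set, not the paper's data.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(0)
# Synthetic per-sample absolute errors for five models on the same
# 200 test samples (stand-ins for ELM-PSO and the four baselines).
elm = rng.gamma(2.0, 0.15, size=200)                      # smallest errors
others = [rng.gamma(2.0, s, size=200) for s in (0.45, 0.50, 0.40, 0.30)]

# Friedman test: do the five paired error samples share one distribution?
stat, p = friedmanchisquare(elm, *others)
print(f"Friedman p value: {p:.3g}")                       # small p => models differ

# Wilcoxon signed-rank test (WSRT): ELM-PSO vs. each baseline.
for err in others:
    w_stat, w_p = wilcoxon(elm, err)
    print(f"WSRT p value: {w_p:.3g}")                     # p < 0.05 => significant
```

The same pattern (one omnibus Friedman test followed by pairwise WSRTs against the proposed model) matches the comparison reported in Table 6.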

Table 5: Model performance comparison on the test set for the five models for different delay bins.

Delay bin (seconds)          Model      RMSE     MAE      R-squared
[0, 1200]                    ELM-PSO    0.5201   0.3006   0.9655
                             CB         1.2628   0.9223   0.7967
                             GBDT       1.3968   0.9886   0.7513
                             Lasso      1.0957   0.7903   0.8469
                             KNN        1.0195   0.5278   0.8675
>1200                        ELM-PSO    5.6009   2.8249   0.9924
                             CB         6.2335   3.1986   0.9906
                             GBDT       8.0733   4.8589   0.9843
                             Lasso      6.5023   3.1169   0.9898
                             KNN        9.0247   4.8611   0.9804
All delayed trains (>240)    ELM-PSO    3.3116   1.3457   0.9951
                             CB         4.0469   2.2680   0.9927
                             GBDT       5.1561   3.1498   0.9881
                             Lasso      3.9251   1.7418   0.9931
                             KNN        5.5878   2.8828   0.9860
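The three metrics in Table 5 follow their standard definitions and can be computed directly; a minimal NumPy sketch on illustrative delay values (not the paper's data):

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root-mean-square error: penalizes large delay misses more heavily.
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    # Mean absolute error: average size of the delay miss.
    return float(np.mean(np.abs(y_true - y_pred)))

def r_squared(y_true, y_pred):
    # Coefficient of determination: 1 minus residual over total variance.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Illustrative arrival delays (true vs. predicted).
y_true = np.array([4.0, 6.5, 21.0, 3.2, 9.8])
y_pred = np.array([4.3, 6.0, 20.1, 3.5, 10.4])
print(rmse(y_true, y_pred), mae(y_true, y_pred), r_squared(y_true, y_pred))
```

Lower RMSE and MAE and higher R-squared indicate a better fit, which is how the five models are ranked in the table.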

[Figure: grouped bars of RMSE, MAE, and R-squared (prediction error, y-axis 0–2) for each station: CBN, YYE, MLE, CSS, ZZW, HSW, HYE, LYW, CZW, LCE, SG, YDW, QY, and GZN.]

Figure 7: Prediction errors in terms of the RMSE, MAE, and R-squared for different stations.


[Figure: horizontal bars of prediction correctness (%, x-axis 0–100) per station (CBN, YYE, MLE, CSS, ZZW, HSW, HYE, LYW, CZW, LCE, SG, YDW, QY, GZN, GZS), broken down by error band: <0.5 min, 0.5 min–1 min, and 1 min–1.5 min.]

Figure 8: Prediction correctness for each station on the test set.
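Per-station "correctness" of this kind is simply the share of predictions whose absolute error falls under each threshold. A minimal sketch (thresholds in minutes; the error values are illustrative, not from the paper):

```python
import numpy as np

def correctness_shares(abs_err_min, thresholds=(0.5, 1.0, 1.5)):
    """Fraction of predictions with absolute error (minutes) below each threshold."""
    abs_err_min = np.asarray(abs_err_min, dtype=float)
    return {t: float(np.mean(abs_err_min < t)) for t in thresholds}

# Illustrative per-train absolute prediction errors at one station.
errors = [0.2, 0.4, 0.7, 0.9, 1.2, 2.0]
print(correctness_shares(errors))
```

Applied per station, this yields the stacked correctness bands plotted in Figure 8.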


[Figure 9(a): RMSE, and (b): MAE, versus the percentage of data used as the test set (10–90%) for LASSO, GBDT, CAT, KNN, and ELM, on the training set (suffix 1) and test set (suffix 2).]

[Figure 9(c): R-squared versus the percentage of data used as the test set (10–90%) for the same ten curves.]

Figure 9: MAE, RMSE, and R-squared values on the training and test sets with different data sizes (LASSO1 represents the performance on the training set, LASSO2 the performance on the test set, and likewise for the other models).

Table 6: Friedman ranking test and WSRT results.

Models          Mean rank   FT p value   Model 1 vs. model 2   WSRT Z-score   WSRT p value
ELM-PSO (M1)    1.50        ≤0.001       —                     —              —
CB (M2)         3.55                     M1-M2                 −25.9200       ≤0.001
GBDT (M3)       4.08                     M1-M3                 −26.1312       ≤0.001
Lasso (M4)      3.53                     M1-M4                 −25.7456       ≤0.001
KNN (M5)        2.34                     M1-M5                 −17.4373       ≤0.001

6. Conclusion

In this paper, a hybrid ELM-PSO method is proposed to predict train delays. The ELM overcomes the shortcomings of the backpropagation training algorithm, and the advantage of PSO is its excellent ability to search for the best hyperparameters. Four benchmark models, CB, KNN, GBDT, and Lasso, are selected for comparison with the proposed model. These models were run on the same data collected from China Railways. ELM-PSO shows better performance and generalization ability (R-squared = 0.9955, MAE = 0.3490, RMSE = 1.0387) than the other models on the test set. Our work not only provides dispatchers with sufficient time and auxiliary decision support to make reasonable optimization and adjustment plans but also has practical significance for improving the quality of railway service and helping passengers estimate their travel time.

The dataset used in this paper contains train delays under all types of scenarios. Therefore, in the future, we will consider dividing the data into particular types of delay scenarios according to specific rules and applying currently prevalent models to train on and predict each scenario to achieve higher accuracy. Finally, in terms of the input features, all the feature information used in this paper can be obtained from train timetables. In the future, other types of features, such as infrastructure, weather, and obstructions on other HSR lines, will be taken into account.
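The core ELM step summarized above — random, untrained hidden-layer weights plus an output layer solved by least squares — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the PSO loop that tunes the hidden-layer hyperparameter is omitted, and all names and data are illustrative.

```python
import numpy as np

class ELMRegressor:
    """Single-hidden-layer ELM: random input weights, least-squares output weights."""

    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer activations with a tanh nonlinearity.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Input weights and biases are drawn once and never trained,
        # which is what removes the need for backpropagation.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights solve a linear least-squares problem (pseudoinverse).
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Illustrative use: learn y = 2*x0 - x1 from random samples.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(400, 2))
y = 2 * X[:, 0] - X[:, 1]
model = ELMRegressor(n_hidden=64).fit(X[:300], y[:300])
err = float(np.abs(model.predict(X[300:]) - y[300:]).max())
print(f"max test error: {err:.4f}")
```

In the paper's pipeline, PSO would wrap a loop like this, evaluating candidate hyperparameter values (e.g., the number of hidden neurons) by validation error instead of tuning them by hand.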

Data Availability

The data used to support the findings of this study were supplied by China Railway Guangzhou Bureau Group Co., Ltd. under license and so cannot be made freely available. Access to these data will be considered by the corresponding author upon request, with the permission of China Railway Guangzhou Bureau Group Co., Ltd.

Conflicts of Interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the submitted paper.

Authors' Contributions

Xu Bao contributed to conceptualization, prepared the original draft, was responsible for software, and visualized the study. Yanqiu Li prepared the original draft, was responsible for software, and visualized the study. Jianmin Li contributed to methodology and reviewed and edited the manuscript. Rui Shi contributed to supervision and data curation. Xin Ding contributed to data curation.

Acknowledgments

This work was financially supported by the Fundamental Research Funds for the Central Universities of China (2019JBM077) and the Open Fund of the Jiangsu Key Laboratory of Traffic and Transportation Security (Huaiyin Institute of Technology).

References

[1] J. L. Espinosa-Aranda and R. García-Ródenas, "A demand-based weighted train delay approach for rescheduling railway networks in real time," Journal of Rail Transport Planning & Management, vol. 3, no. 1-2, pp. 1–13, 2013.

[2] N. Markovic, S. Milinkovic, K. S. Tikhonov, and P. Schonfeld, "Analyzing passenger train arrival delays with support vector regression," Transportation Research Part C: Emerging Technologies, vol. 56, pp. 251–262, 2015.

[3] C. Wen, W. Mou, P. Huang, and Z. Li, "A predictive model of train delays on a railway line," Journal of Forecasting, vol. 39, no. 3, pp. 470–488, 2019.

[4] A. Higgins and E. Kozan, "Modeling train delays in urban networks," Transportation Science, vol. 32, no. 4, pp. 346–357, 1998.

[5] J. Yuan and I. A. Hansen, "Optimizing capacity utilization of stations by estimating knock-on train delays," Transportation Research Part B: Methodological, vol. 41, no. 2, pp. 202–217, 2007.

[6] P. B. L. Wiggenraad, Alighting and Boarding Times of Passengers at Dutch Railway Stations, TRAIL Research School, Delft, The Netherlands, 2001.

[7] M. F. Gorman, "Statistical estimation of railroad congestion delay," Transportation Research Part E: Logistics and Transportation Review, vol. 45, no. 3, pp. 446–456, 2009.

[8] P. Huang, C. Wen, and L. Fu, "Modeling train operation as sequences: a study of delay prediction with operation and weather data," Transportation Research Part E: Logistics and Transportation Review, vol. 141, 2020.

[9] P. Kecman and R. M. P. Goverde, "Online data-driven adaptive prediction of train event times," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 465–474, 2015.

[10] S. Milinkovic, M. Markovic, S. Veskovic, et al., "A fuzzy Petri net model to estimate train delays," Simulation Modelling Practice and Theory, vol. 33, pp. 144–157, 2013.

[11] J. Peters, B. Emig, M. Jung, and S. Schmidt, "Prediction of delays in public transportation using neural networks," in Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, vol. 2, pp. 92–97, Vienna, Austria, November 2005.

[12] C. Jiang, P. Huang, J. Lessan, L. Fu, and C. Wen, "Forecasting primary delay recovery of high-speed railway using multiple linear regression, supporting vector machine, artificial neural network, and random forest regression," Canadian Journal of Civil Engineering, vol. 46, no. 5, pp. 353–363, 2019.

[13] J. Hu and B. Noche, "Application of artificial neuron network in analysis of railway delays," Open Journal of Social Sciences, vol. 4, no. 11, pp. 59–68, 2016.

[14] M. Yaghini, M. M. Khoshraftar, and M. Seyedabadi, "Railway passenger train delay prediction via neural network model," Journal of Advanced Transportation, vol. 47, no. 3, pp. 355–368, 2013.

[15] F. Corman and P. Kecman, "Stochastic prediction of train delays in real-time using Bayesian networks," Transportation Research Part C: Emerging Technologies, vol. 95, pp. 599–615, 2018.

[16] J. Lessan, L. Fu, and C. Wen, "A hybrid Bayesian network model for predicting delays in train operations," Computers & Industrial Engineering, vol. 127, pp. 1214–1222, 2019.

[17] P. Huang, C. Wen, L. Fu, Q. Peng, and Y. Tang, "A deep learning approach for multi-attribute data: a study of train delay prediction in railway systems," Information Sciences, vol. 516, pp. 234–253, 2020.

[18] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[19] Y. Chen and W. Wu, "Mapping mineral prospectivity using an extreme learning machine regression," Ore Geology Reviews, vol. 80, pp. 200–213, 2017.

[20] Y. Miche, A. Sorjamaa, P. Bas, et al., "OP-ELM: optimally pruned extreme learning machine," IEEE Transactions on Neural Networks, vol. 21, no. 1, pp. 158–162, 2010.

[21] Y. Yang, Y. Wang, and X. Yuan, "Bidirectional extreme learning machine for regression problem and its learning effectiveness," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1498–1505, 2012.

[22] F. Han, H.-F. Yao, and Q.-H. Ling, "An improved evolutionary extreme learning machine based on particle swarm optimization," Neurocomputing, vol. 116, pp. 87–93, 2013.

[23] L. Oneto, E. Fumeo, G. Clerico, et al., "Dynamic delay predictions for large-scale railway networks: deep and shallow extreme learning machines tuned via thresholdout," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2754–2767, 2017.

[24] Y. Li and Z. Yang, "Application of EOS-ELM with binary Jaya-based feature selection to real-time transient stability assessment using PMU data," IEEE Access, vol. 5, pp. 23092–23101, 2017.

[25] Y. Zhang, T. Li, G. Na, G. Li, and Y. Li, "Optimized extreme learning machine for power system transient stability prediction using synchrophasors," Mathematical Problems in Engineering, vol. 2015, Article ID 529724, 8 pages, 2015.

[26] Y. Wang, H. Zhang, and G. Zhang, "cPSO-CNN: an efficient PSO-based algorithm for fine-tuning hyper-parameters of convolutional neural networks," Swarm and Evolutionary Computation, vol. 49, pp. 114–123, 2019.

[27] S. V. Konstantinov, A. I. Diveev, G. I. Balandina, and A. A. Baryshnikov, "Comparative research of random search algorithms and evolutionary algorithms for the optimal control problem of the mobile robot," Procedia Computer Science, vol. 150, pp. 462–470, 2019.

[28] D. Wang, D. Tan, and L. Liu, "Particle swarm optimization algorithm: an overview," Soft Computing, vol. 22, no. 2, pp. 387–408, 2017.

[29] Y. Li, X. Xu, J. Li, and R. Shi, "A delay prediction model for high-speed railway: an extreme learning machine tuned via particle swarm optimization," in Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, September 2020.

[30] J. Tang, C. Deng, and G.-B. Huang, "Extreme learning machine for multilayer perceptron," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 809–821, 2016.

[31] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu, "A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, no. 2, pp. 1026–1037, 2007.

[32] Z. Deng, X. Zhu, D. Cheng, M. Zong, and S. Zhang, "Efficient kNN classification algorithm for big data," Neurocomputing, vol. 195, pp. 143–148, 2016.

[33] Y. Zhang, Z. Zhao, and J. Zheng, "CatBoost: a new approach for estimating daily reference crop evapotranspiration in arid and semi-arid regions of Northern China," Journal of Hydrology, vol. 588, Article ID 125087, 2020.

[34] G. Huang, L. Wu, X. Ma, et al., "Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions," Journal of Hydrology, vol. 574, pp. 1029–1041, 2019.

[35] J. Yang, C. Zhao, H. Yu, and H. Chen, "Use GBDT to predict the stock market," Procedia Computer Science, vol. 174, pp. 161–171, 2020.

[36] R. Tibshirani, "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 3, pp. 273–282, 2011.

[37] M. Wang, L. Wang, X. Xu, Y. Qin, and L. Qin, "Genetic algorithm-based particle swarm optimization approach to reschedule high-speed railway timetables: a case study in China," Journal of Advanced Transportation, vol. 2019, Article ID 6090742, 12 pages, 2019.

[38] R. Shi, X. Xu, J. Li, et al., "Prediction and analysis of train arrival delay based on XGBoost and Bayesian optimization," Applied Soft Computing, vol. 109, 2021.

[23] L Oneto E Fumeo G Clerico et al ldquoDynamic delay pre-dictions for large-scale railway networks deep and shallowextreme learning machines tuned via thresholdoutrdquo IEEETransactions on Systems Man and Cybernetics Systemsvol 47 no 10 pp 2754ndash2767 2017

[24] Y Li and Z Yang ldquoApplication of EOS-ELM with binaryjaya-based feature selection to real-time transient stabilityassessment using PMU datardquo IEEE Access vol 5pp 23092ndash23101 2017

[25] Y Zhang T Li G Na G Li and Y Li ldquoOptimized extremelearning machine for power system transient stability pre-diction using synchrophasorsrdquo Mathematical Problems inEngineering vol 2015 Article ID 529724 8 pages 2015

[26] Y Wang H Zhang and G Zhang ldquocPSO-CNN an efficientPSO-based algorithm for fine-tuning hyper-parameters ofconvolutional neural networksrdquo Swarm and EvolutionaryComputation vol 49 pp 114ndash123 2019

[27] S V Konstantinov A I Diveev G I Balandina andA A Baryshnikov ldquoComparative research of random searchalgorithms and evolutionary algorithms for the optimalcontrol problem of the mobile robotrdquo Procedia ComputerScience vol 150 pp 462ndash470 2019

[28] D Wang D Tan and L Liu ldquoParticle swarm optimizationalgorithm an overviewrdquo Soft Computing vol 22 no 2pp 387ndash408 2017

[29] Y Li X Xu J Li and R Shi ldquoA delay prediction model forhigh-speed railway an extreme learning machine tuned viaparticle swarm optimizationrdquo in Proceedings of the 2020 IEEE23rd International Conference on Intelligent TransportationSystems (ITSC) Rhodes Greece September 2020

[30] M Perceptron J Tang and S Member ldquoExtreme learningmachine for multilayer perceptronrdquo IEEE Transactions onNeural Networks and Learning Systems vol 27 no 4pp 809ndash821 2016

[31] J-R Zhang J Zhang T-M Lok and M R Lyu ldquoA hybridparticle swarm optimization-back-propagation algorithm forfeed forward neural network trainingrdquo Applied Mathematicsand Computation vol 185 no 2 pp 1026ndash1037 2007

[32] Z Deng X Zhu D Cheng M Zong and S Zhang ldquoEfficientkNN classification algorithm for big datardquo Neurocomputingvol 195 pp 143ndash148 2016

[33] Y Zhang Z Zhao and J Zheng ldquoCatBoost a new approachfor estimating daily reference crop evapotranspiration in aridand semi-arid regions of Northern Chinardquo Journal of Hy-drology vol 588 Article ID 125087 2020

[34] G Huang L Wu X Ma et al ldquoEvaluation of CatBoostmethod for prediction of reference evapotranspiration inhumid regionsrdquo Journal of Hydrology vol 574 pp 1029ndash10412019

[35] J Yang C Zhao H Yu and H Chen ldquoUse GBDT to predictthe stock marketrdquo Procedia Computer Science vol 174pp 161ndash171 2020

[36] R Tibshirani ldquoRegression shrinkage and selection via thelasso a retrospectiverdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 73 no 3 pp 273ndash2822011

[37] M Wang L Wang X Xu Y Qin and L Qin ldquoGeneticalgorithm-based particle swarm optimization approach toreschedule high-speed railway timetables a case study inChinardquo Journal of Advanced Transportation vol 2019 ArticleID 6090742 12 pages 2019

[38] R Shi X Xu J Li et al ldquoPrediction and analysis of trainarrival delay based on XGBoost and Bayesian optimizationrdquoApplied Soft Computing vol 109 2021

Journal of Advanced Transportation 15

Page 12: PredictionofTrainArrivalDelayUsingHybrid ELM-PSOApproach

[Figure 9(b): MAE versus the percentage of the data used as the test set (10%-90%) for the LASSO, GBDT, CAT, KNN, and ELM models, each on the training set (1) and test set (2); y-axis from 0 to 2.]

6. Conclusion

In this paper, a hybrid ELM-PSO method is proposed to predict train delays. The ELM overcomes the shortcomings of the backpropagation training algorithm, and the advantage of PSO is its excellent ability to search for the best hyperparameter. Four benchmark models (CB, KNN, GBDT, and Lasso) are selected for comparison with the proposed model. These models were run on the same data collected from China Railways. ELM-PSO shows better performance and generalization ability (R-squared = 0.9955, MAE = 0.3490, RMSE = 1.0387) than the other models on the test set. Our work not only provides sufficient time and auxiliary decision support for the dispatcher to make reasonable optimization and adjustment plans but also has practical significance for improving the quality of railway service and helping passengers estimate their travel time.
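As a rough illustration of the approach summarized above, the following sketch trains a single-hidden-layer ELM (random input weights, closed-form output weights via the pseudo-inverse) and lets a small PSO swarm tune the hidden-layer size. The data, feature count, and swarm settings are illustrative assumptions, not the authors' actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden, seed=0):
    """Single-hidden-layer ELM: random input weights and biases,
    output weights solved in closed form via the pseudo-inverse."""
    r = np.random.default_rng(seed)
    # Small input weights keep tanh in its near-linear range on this toy data.
    W = r.normal(scale=0.1, size=(X.shape[1], n_hidden))
    b = r.normal(scale=0.1, size=n_hidden)
    H = np.tanh(X @ W + b)              # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y        # Moore-Penrose least-squares solution
    return W, b, beta

def elm_predict(X, model):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy stand-in for the delay data: 9 features, noisy linear target.
X = rng.normal(size=(400, 9))
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=400)
X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]

# PSO over one hyperparameter (hidden-layer size): each particle keeps a
# position, velocity, and personal best, and is pulled toward the swarm best.
n_particles, w_in, c1, c2 = 8, 0.7, 1.5, 1.5
pos = rng.uniform(10, 200, size=n_particles)
vel = np.zeros(n_particles)
pbest, pbest_err = pos.copy(), np.full(n_particles, np.inf)
gbest, gbest_err = pos[0], np.inf
for _ in range(15):
    for i in range(n_particles):
        model = elm_fit(X_tr, y_tr, int(round(pos[i])))
        err = mae(y_te, elm_predict(X_te, model))
        if err < pbest_err[i]:
            pbest_err[i], pbest[i] = err, pos[i]
        if err < gbest_err:
            gbest_err, gbest = err, pos[i]
    r1, r2 = rng.uniform(size=n_particles), rng.uniform(size=n_particles)
    vel = w_in * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 10, 200)

print(f"best hidden-layer size: {int(round(gbest))}, test MAE: {gbest_err:.4f}")
```

In practice the swarm can search several hyperparameters at once (e.g., hidden size together with an L2 regularization strength) by giving each particle a multi-dimensional position.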

The dataset used in this paper contains train delays under all types of scenarios. Therefore, in the future, we will

[Figure 9(c): R-squared versus the percentage of the data used as the test set (10%-90%) for the LASSO, GBDT, CAT, KNN, and ELM models, each on the training set (1) and test set (2); y-axis from 0.90 to 1.02.]

Figure 9: MAE, RMSE, and R-squared values on the training and test sets with different data sizes (LASSO1 represents the performance on the training set; LASSO2 represents the performance on the test set, and similarly for the other models).
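The robustness experiment behind Figure 9 (retrain and evaluate while the test-set share grows from 10% to 90%) amounts to a simple loop. The sketch below uses a closed-form ridge regression as a stand-in for any of the compared models, on synthetic data; it is not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 9))                      # 9 delay-related features
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=500)

def fit_ridge(X, y, lam=1e-3):
    # Closed-form ridge regression as a placeholder model.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

results = {}
for pct in range(10, 100, 10):                     # test-set share: 10% .. 90%
    n_test = int(len(X) * pct / 100)
    X_te, y_te = X[:n_test], y[:n_test]
    X_tr, y_tr = X[n_test:], y[n_test:]
    w = fit_ridge(X_tr, y_tr)
    mae_tr = float(np.mean(np.abs(y_tr - X_tr @ w)))   # curve "MODEL1" in Fig. 9
    mae_te = float(np.mean(np.abs(y_te - X_te @ w)))   # curve "MODEL2" in Fig. 9
    results[pct] = (mae_tr, mae_te)

for pct, (tr, te) in sorted(results.items()):
    print(f"{pct:2d}% test: train MAE {tr:.3f}, test MAE {te:.3f}")
```

A model that keeps the two curves close together across all splits, as ELM-PSO does in Figure 9, is the one with good generalization.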

Table 6: Friedman ranking test and WSRT results.

Models         Mean rank   FT p value   Model 1 vs. model 2   WSRT Z-score   WSRT p value
ELM-PSO (M1)   1.50        ≤0.001       —                     —              —
CB (M2)        3.55                     M1-M2                 −25.9200       ≤0.001
GBDT (M3)      4.08                     M1-M3                 −26.1312       ≤0.001
Lasso (M4)     3.53                     M1-M4                 −25.7456       ≤0.001
KNN (M5)       2.34                     M1-M5                 −17.4373       ≤0.001
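Comparisons of the kind reported in Table 6 can be sketched with scipy's implementations of the Friedman test and the Wilcoxon signed-rank test (WSRT). The per-sample absolute errors below are synthetic stand-ins (ELM-PSO deliberately drawn with the smallest errors), not the paper's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 200  # number of test samples compared across models

# Synthetic per-sample absolute errors for the five models.
errors = {
    "ELM-PSO": rng.gamma(2.0, 0.2, n),
    "CB":      rng.gamma(2.0, 0.5, n),
    "GBDT":    rng.gamma(2.0, 0.6, n),
    "Lasso":   rng.gamma(2.0, 0.5, n),
    "KNN":     rng.gamma(2.0, 0.3, n),
}

# Friedman test: do the models' error distributions differ overall?
ft_stat, ft_p = stats.friedmanchisquare(*errors.values())
print(f"Friedman p = {ft_p:.2e}")

# Mean rank per model (rank 1 = smallest error on a given sample).
ranks = stats.rankdata(np.column_stack(list(errors.values())), axis=1)
for name, r in zip(errors, ranks.mean(axis=0)):
    print(f"{name}: mean rank {r:.2f}")

# Pairwise Wilcoxon signed-rank tests of each baseline against ELM-PSO.
base = errors["ELM-PSO"]
for name in ["CB", "GBDT", "Lasso", "KNN"]:
    w_stat, w_p = stats.wilcoxon(base, errors[name])
    print(f"ELM-PSO vs {name}: WSRT p = {w_p:.2e}")
```

The Friedman test establishes that at least one model differs; the pairwise WSRT then confirms that ELM-PSO's per-sample errors are significantly smaller than each baseline's, mirroring the structure of Table 6.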


consider dividing all the data into certain types of delay scenarios according to particular rules and implementing currently prevalent models to train and predict each scenario to achieve higher accuracy. Finally, in terms of the input features, all the information about the features in this paper can be obtained from train timetables. In the future, other types of features, such as infrastructure, weather features, and obstructions on other HSR lines, will be taken into account.

Data Availability

The data used to support the findings of this study were supplied by China Railway Guangzhou Bureau Group Co., Ltd. under license and so cannot be made freely available. Access to these data should be considered by the corresponding author upon request, with permission of China Railway Guangzhou Bureau Group Co., Ltd.

Conflicts of Interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the paper they submitted.

Authors' Contributions

Xu Bao contributed to conceptualization, prepared the original draft, was responsible for software, and visualized the study. Yanqiu Li prepared the original draft, was responsible for software, and visualized the study. Jianmin Li contributed to methodology and reviewed and edited the manuscript. Rui Shi contributed to supervision and data curation. Xin Ding contributed to data curation.

Acknowledgments

This work was financially supported by the Fundamental Research Funds for the Central Universities of China (2019JBM077) and the Open Fund for the Jiangsu Key Laboratory of Traffic and Transportation Security (Huaiyin Institute of Technology).

References

[1] J. L. Espinosa-Aranda and R. García-Ródenas, "A demand-based weighted train delay approach for rescheduling railway networks in real time," Journal of Rail Transport Planning & Management, vol. 3, no. 1-2, pp. 1–13, 2013.

[2] N. Marković, S. Milinković, K. S. Tikhonov, and P. Schonfeld, "Analyzing passenger train arrival delays with support vector regression," Transportation Research Part C: Emerging Technologies, vol. 56, pp. 251–262, 2015.

[3] C. Wen, W. Mou, P. Huang, and Z. Li, "A predictive model of train delays on a railway line," Journal of Forecasting, vol. 39, no. 3, pp. 470–488, 2019.

[4] A. Higgins and E. Kozan, "Modeling train delays in urban networks," Transportation Science, vol. 32, no. 4, pp. 346–357, 1998.

[5] J. Yuan and I. A. Hansen, "Optimizing capacity utilization of stations by estimating knock-on train delays," Transportation Research Part B: Methodological, vol. 41, no. 2, pp. 202–217, 2007.

[6] P. B. L. Wiggenraad, Alighting and Boarding Times of Passengers at Dutch Railway Stations, TRAIL Research School, Delft, the Netherlands, 2001.

[7] M. F. Gorman, "Statistical estimation of railroad congestion delay," Transportation Research Part E: Logistics and Transportation Review, vol. 45, no. 3, pp. 446–456, 2009.

[8] P. Huang, C. Wen, and L. Fu, "Modeling train operation as sequences: a study of delay prediction with operation and weather data," Transportation Research Part E: Logistics and Transportation Review, vol. 141, 2020.

[9] P. Kecman and R. M. P. Goverde, "Online data-driven adaptive prediction of train event times," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 1, pp. 465–474, 2015.

[10] S. Milinković, M. Marković, S. Vesković et al., "A fuzzy Petri net model to estimate train delays," Simulation Modelling Practice and Theory, vol. 33, pp. 144–157, 2013.

[11] J. Peters, B. Emig, M. Jung, and S. Schmidt, "Prediction of delays in public transportation using neural networks," in Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, vol. 2, pp. 92–97, Vienna, Austria, November 2005.

[12] C. Jiang, P. Huang, J. Lessan, L. Fu, and C. Wen, "Forecasting primary delay recovery of high-speed railway using multiple linear regression, supporting vector machine, artificial neural network, and random forest regression," Canadian Journal of Civil Engineering, vol. 46, no. 5, pp. 353–363, 2019.

[13] J. Hu and B. Noche, "Application of artificial neuron network in analysis of railway delays," Open Journal of Social Sciences, vol. 4, no. 11, pp. 59–68, 2016.

[14] M. Yaghini, M. M. Khoshraftar, and M. Seyedabadi, "Railway passenger train delay prediction via neural network model," Journal of Advanced Transportation, vol. 47, no. 3, pp. 355–368, 2013.

[15] F. Corman and P. Kecman, "Stochastic prediction of train delays in real-time using Bayesian networks," Transportation Research Part C: Emerging Technologies, vol. 95, pp. 599–615, 2018.

[16] J. Lessan, L. Fu, and C. Wen, "A hybrid Bayesian network model for predicting delays in train operations," Computers & Industrial Engineering, vol. 127, pp. 1214–1222, 2019.

[17] P. Huang, C. Wen, L. Fu, Q. Peng, and Y. Tang, "A deep learning approach for multi-attribute data: a study of train delay prediction in railway systems," Information Sciences, vol. 516, pp. 234–253, 2020.

[18] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[19] Y. Chen and W. Wu, "Mapping mineral prospectivity using an extreme learning machine regression," Ore Geology Reviews, vol. 80, pp. 200–213, 2017.

[20] Y. Miche, A. Sorjamaa, P. Bas et al., "OP-ELM: optimally pruned extreme learning machine," IEEE Transactions on Neural Networks, vol. 21, no. 1, pp. 158–162, 2010.

[21] Y. Yang, Y. Wang, and X. Yuan, "Bidirectional extreme learning machine for regression problem and its learning effectiveness," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1498–1505, 2012.

[22] F. Han, H.-F. Yao, and Q.-H. Ling, "An improved evolutionary extreme learning machine based on particle swarm optimization," Neurocomputing, vol. 116, pp. 87–93, 2013.

[23] L. Oneto, E. Fumeo, G. Clerico et al., "Dynamic delay predictions for large-scale railway networks: deep and shallow extreme learning machines tuned via thresholdout," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2754–2767, 2017.

[24] Y. Li and Z. Yang, "Application of EOS-ELM with binary Jaya-based feature selection to real-time transient stability assessment using PMU data," IEEE Access, vol. 5, pp. 23092–23101, 2017.

[25] Y. Zhang, T. Li, G. Na, G. Li, and Y. Li, "Optimized extreme learning machine for power system transient stability prediction using synchrophasors," Mathematical Problems in Engineering, vol. 2015, Article ID 529724, 8 pages, 2015.

[26] Y. Wang, H. Zhang, and G. Zhang, "cPSO-CNN: an efficient PSO-based algorithm for fine-tuning hyper-parameters of convolutional neural networks," Swarm and Evolutionary Computation, vol. 49, pp. 114–123, 2019.

[27] S. V. Konstantinov, A. I. Diveev, G. I. Balandina, and A. A. Baryshnikov, "Comparative research of random search algorithms and evolutionary algorithms for the optimal control problem of the mobile robot," Procedia Computer Science, vol. 150, pp. 462–470, 2019.

[28] D. Wang, D. Tan, and L. Liu, "Particle swarm optimization algorithm: an overview," Soft Computing, vol. 22, no. 2, pp. 387–408, 2017.

[29] Y. Li, X. Xu, J. Li, and R. Shi, "A delay prediction model for high-speed railway: an extreme learning machine tuned via particle swarm optimization," in Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, September 2020.

[30] J. Tang, C. Deng, and G.-B. Huang, "Extreme learning machine for multilayer perceptron," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 809–821, 2016.

[31] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu, "A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, no. 2, pp. 1026–1037, 2007.

[32] Z. Deng, X. Zhu, D. Cheng, M. Zong, and S. Zhang, "Efficient kNN classification algorithm for big data," Neurocomputing, vol. 195, pp. 143–148, 2016.

[33] Y. Zhang, Z. Zhao, and J. Zheng, "CatBoost: a new approach for estimating daily reference crop evapotranspiration in arid and semi-arid regions of Northern China," Journal of Hydrology, vol. 588, Article ID 125087, 2020.

[34] G. Huang, L. Wu, X. Ma et al., "Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions," Journal of Hydrology, vol. 574, pp. 1029–1041, 2019.

[35] J. Yang, C. Zhao, H. Yu, and H. Chen, "Use GBDT to predict the stock market," Procedia Computer Science, vol. 174, pp. 161–171, 2020.

[36] R. Tibshirani, "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 3, pp. 273–282, 2011.

[37] M. Wang, L. Wang, X. Xu, Y. Qin, and L. Qin, "Genetic algorithm-based particle swarm optimization approach to reschedule high-speed railway timetables: a case study in China," Journal of Advanced Transportation, vol. 2019, Article ID 6090742, 12 pages, 2019.

[38] R. Shi, X. Xu, J. Li et al., "Prediction and analysis of train arrival delay based on XGBoost and Bayesian optimization," Applied Soft Computing, vol. 109, 2021.

Journal of Advanced Transportation 15

Page 13: PredictionofTrainArrivalDelayUsingHybrid ELM-PSOApproach

6 Conclusion

In this paper a hybrid ELM-PSO method is proposed topredict train delays e ELM can overcome the short-comings of backpropagation training algorithm and theadvantage of PSO is its excellent ability in searching the besthyperparameter Four benchmark models CB KNNGBDT and Lasso models are selected to compare withproposed model ese models were run on the same datacollected from China Railways ELM-PSO tends to have a

better performance and generalization ability (R-squared 09955 MAE 03490 RMSE 10387) than theother models on the test set Our work can not only providesufficient time and auxiliary decision for the dispatcher tomake reasonable optimization and adjustment plan but alsohave practical significance for improving the quality ofrailway service and helping passengers estimate their traveltime

e dataset used in this paper contains train delays underall types of scenarios erefore in the future we will

09

092

094

096

098

1

102

R-sq

uare

d

Percentage of the data used as test set ()

LASSO1LASSO2GBDT1GBDT2CAT1

10 20 30 40 50 60 70 80 90

CAT2KNN1KNN2ELM1ELM2

(c)

Figure 9 MAE RMSE and R-squared values on training and test sets with different data sizes (LASSO1 represents the performance ontraining set LASSO2 represents the performance)

Table 6 Friedman ranking test and WSRT results

Models Mean rank FT p value Model 1 vs model 2 WSRT Z-score WSRT p valueELM-PSO (M1) 150

le0001

mdash mdash mdashCB (M2) 355 M1-M2 minus259200 le0001GBDT (M3) 408 M1-M3 minus261312 le0001Lasso (M4) 353 M1-M4 minus257456 le0001KNN (M5) 234 M1-M5 minus174373 le0001

Journal of Advanced Transportation 13

consider dividing all the data into certain types of delayscenarios according to particular rules and implementingcurrently prevalent models to train and predict each scenarioto achieve a higher accuracy Finally in terms of the inputfeatures all the information of the features in this paper canbe obtained from train timetables In the future other typesof features such as the infrastructure weather features andother HSR lines obstruction will be taken into account

Data Availability

e data used to support the findings of this study weresupplied by China Railway Guangzhou Bureau Group CoLtd under license and so cannot be made freely availableAccess to these data should be considered by the corre-sponding author upon request with permission of ChinaRailway Guangzhou Bureau Group Co Ltd

Conflicts of Interest

e authors declare that they do not have any commercial orassociative interest that represents conflicts of interest inconnection with the paper they submitted

Authorsrsquo Contributions

Xu Bao contributed to conceptualization prepared theoriginal draft was responsible for software and visualizedthe study Yanqiu Li prepared the original draft was re-sponsible for software and visualized the study Jianmin Licontributed to methodology and reviewed and edited themanuscript Rui Shi contributed to supervision and datacuration Xin Ding contributed to data curation

Acknowledgments

is work was financially supported by the FundamentalResearch Funds for the Central Universities of China(2019JBM077) and the Open Fund for Jiangsu Key Labo-ratory of Traffic and Transportation Security (Huaiyin In-stitute of Technology)

References

[1] J L Espinosa-Aranda and R Garcıa-Rodenas ldquoA demand-based weighted train delay approach for rescheduling railwaynetworks in real timerdquo Journal of Rail Transport Planning ampManagement vol 3 no 1-2 pp 1ndash13 2013

[2] N Markovic S Milinkovic K S Tikhonov and P SchonfeldldquoAnalyzing passenger train arrival delays with support vectorregressionrdquo Transportation Research Part C Emerging Tech-nologies vol 56 pp 251ndash262 2015

[3] C Wen W Mou P Huang and Z Li ldquoA predictive model oftrain delays on a railway linerdquo Journal of Forecasting vol 39no 3 pp 470ndash488 2019

[4] A Higgins and E Kozan ldquoModeling train delays in urbannetworksrdquo Transportation Science vol 32 no 4 pp 346ndash3571998

[5] J Yuan and I A Hansen ldquoOptimizing capacity utilization ofstations by estimating knock-on train delaysrdquo Transportation

Research Part B Methodological vol 41 no 2 pp 202ndash2172007

[6] P B L Wiggenraad Alighting and Boarding Times of Pas-sengers at Dutch Railway Stations TRAIL Research SchoolDelft Netherland 2001

[7] M F Gorman ldquoStatistical estimation of railroad congestiondelayrdquo Transportation Research Part E Logistics and Trans-portation Review vol 45 no 3 pp 446ndash456 2009

[8] P Huang C Wen and L Fu ldquoModeling train operation assequences a study of delay prediction with operation andweather datardquo Transportation Research Part E Logistics andTransportation Review vol 141 2020

[9] P Kecman and R M P Goverde ldquoOnline data-drivenadaptive prediction of train event timesrdquo IEEE Transactionson Intelligent Transportation Systems vol 16 no 1pp 465ndash474 2015

[10] S Milinkovic M Markovic S Veskovic et al ldquoA fuzzy Petrinet model to estimate train delaysrdquo Simulation ModellingPractice amp 7eory vol 33 pp 144ndash157 2013

[11] J Peters B Emig M Jung and S Schmidt ldquoPrediction ofdelays in public transportation using neural networksrdquo inProceedings of the International Conference on ComputationalIntelligence for Modelling Control and Automation and In-ternational Conference on Intelligent Agents Web Technologiesand Internet Commerce vol 2 pp 92ndash97 Vienna AustriaNovember 2005

[12] C Jiang P Huang J Lessan L Fu and C Wen ldquoForecastingprimary delay recovery of high-speed railway using multiplelinear regression supporting vector machine artificial neuralnetwork and random forest regressionrdquo Canadian Journal ofCivil Engineering vol 46 no 5 pp 353ndash363 2019

[13] J Hu and B Noche ldquoApplication of artificial neuron networkin analysis of railway delaysrdquo Open Journal of Social Sciencesvol 4 no 11 pp 59ndash68 2016

[14] M Yaghini M M Khoshraftar and M Seyedabadi ldquoRailwaypassenger train delay prediction via neural network modelrdquoJournal of Advanced Transportation vol 47 no 3 pp 355ndash368 2013

[15] F Corman and P Kecman ldquoStochastic prediction of traindelays in real-time using Bayesian networksrdquo TransportationResearch Part C Emerging Technologies vol 95 pp 599ndash6152018

[16] J Lessan L Fu and C Wen ldquoA hybrid Bayesian networkmodel for predicting delays in train operationsrdquo Computers ampIndustrial Engineering vol 127 pp 1214ndash1222 2019

[17] P Huang C Wen L Fu Q Peng and Y Tang ldquoA deeplearning approach for multi-attribute data a study of traindelay prediction in railway systemsrdquo Information Sciencesvol 516 pp 234ndash253 2020

[18] G Bin Huang Q Y Zhu and C K Siew ldquoExtreme learningmachine theory and applicationsrdquo Neurocomputing vol 70no 1ndash3 pp 489ndash501 2006

[19] Y Chen and W Wu ldquoMapping mineral prospectivity usingan extreme learning machine regressionrdquo Ore Geology Re-views vol 80 pp 200ndash213 2017

[20] Y YoanMiche A Sorjamaa P Bas et al ldquoOP-ELM optimallypruned extreme learning machinerdquo IEEE Transactions onNeural Networks vol 21 no 1 pp 158ndash162 2010

[21] Y Yimin Yang Y Yaonan Wang and X Xiaofang YuanldquoBidirectional extreme learning machine for regressionproblem and its learning effectivenessrdquo IEEE Transactions onNeural Networks and Learning Systems vol 23 no 9pp 1498ndash1505 2012

14 Journal of Advanced Transportation

[22] F Han H-F Yao and Q-H Ling ldquoAn improved evolu-tionary extreme learning machine based on particle swarmoptimizationrdquo Neurocomputing vol 116 pp 87ndash93 2013

[23] L Oneto E Fumeo G Clerico et al ldquoDynamic delay pre-dictions for large-scale railway networks deep and shallowextreme learning machines tuned via thresholdoutrdquo IEEETransactions on Systems Man and Cybernetics Systemsvol 47 no 10 pp 2754ndash2767 2017

[24] Y Li and Z Yang ldquoApplication of EOS-ELM with binaryjaya-based feature selection to real-time transient stabilityassessment using PMU datardquo IEEE Access vol 5pp 23092ndash23101 2017

[25] Y Zhang T Li G Na G Li and Y Li ldquoOptimized extremelearning machine for power system transient stability pre-diction using synchrophasorsrdquo Mathematical Problems inEngineering vol 2015 Article ID 529724 8 pages 2015

[26] Y Wang H Zhang and G Zhang ldquocPSO-CNN an efficientPSO-based algorithm for fine-tuning hyper-parameters ofconvolutional neural networksrdquo Swarm and EvolutionaryComputation vol 49 pp 114ndash123 2019

[27] S V Konstantinov A I Diveev G I Balandina andA A Baryshnikov ldquoComparative research of random searchalgorithms and evolutionary algorithms for the optimalcontrol problem of the mobile robotrdquo Procedia ComputerScience vol 150 pp 462ndash470 2019

[28] D Wang D Tan and L Liu ldquoParticle swarm optimizationalgorithm an overviewrdquo Soft Computing vol 22 no 2pp 387ndash408 2017

[29] Y Li X Xu J Li and R Shi ldquoA delay prediction model forhigh-speed railway an extreme learning machine tuned viaparticle swarm optimizationrdquo in Proceedings of the 2020 IEEE23rd International Conference on Intelligent TransportationSystems (ITSC) Rhodes Greece September 2020

[30] M Perceptron J Tang and S Member ldquoExtreme learningmachine for multilayer perceptronrdquo IEEE Transactions onNeural Networks and Learning Systems vol 27 no 4pp 809ndash821 2016

[31] J-R Zhang J Zhang T-M Lok and M R Lyu ldquoA hybridparticle swarm optimization-back-propagation algorithm forfeed forward neural network trainingrdquo Applied Mathematicsand Computation vol 185 no 2 pp 1026ndash1037 2007

[32] Z Deng X Zhu D Cheng M Zong and S Zhang ldquoEfficientkNN classification algorithm for big datardquo Neurocomputingvol 195 pp 143ndash148 2016

[33] Y Zhang Z Zhao and J Zheng ldquoCatBoost a new approachfor estimating daily reference crop evapotranspiration in aridand semi-arid regions of Northern Chinardquo Journal of Hy-drology vol 588 Article ID 125087 2020

[34] G Huang L Wu X Ma et al ldquoEvaluation of CatBoostmethod for prediction of reference evapotranspiration inhumid regionsrdquo Journal of Hydrology vol 574 pp 1029ndash10412019

[35] J Yang C Zhao H Yu and H Chen ldquoUse GBDT to predictthe stock marketrdquo Procedia Computer Science vol 174pp 161ndash171 2020

[36] R Tibshirani ldquoRegression shrinkage and selection via thelasso a retrospectiverdquo Journal of the Royal Statistical SocietySeries B (Statistical Methodology) vol 73 no 3 pp 273ndash2822011

[37] M Wang L Wang X Xu Y Qin and L Qin ldquoGeneticalgorithm-based particle swarm optimization approach toreschedule high-speed railway timetables a case study inChinardquo Journal of Advanced Transportation vol 2019 ArticleID 6090742 12 pages 2019

[38] R Shi X Xu J Li et al ldquoPrediction and analysis of trainarrival delay based on XGBoost and Bayesian optimizationrdquoApplied Soft Computing vol 109 2021

Journal of Advanced Transportation 15

Page 14: PredictionofTrainArrivalDelayUsingHybrid ELM-PSOApproach

consider dividing all the data into certain types of delayscenarios according to particular rules and implementingcurrently prevalent models to train and predict each scenarioto achieve a higher accuracy Finally in terms of the inputfeatures all the information of the features in this paper canbe obtained from train timetables In the future other typesof features such as the infrastructure weather features andother HSR lines obstruction will be taken into account

Data Availability

e data used to support the findings of this study weresupplied by China Railway Guangzhou Bureau Group CoLtd under license and so cannot be made freely availableAccess to these data should be considered by the corre-sponding author upon request with permission of ChinaRailway Guangzhou Bureau Group Co Ltd

Conflicts of Interest

e authors declare that they do not have any commercial orassociative interest that represents conflicts of interest inconnection with the paper they submitted

Authorsrsquo Contributions

Xu Bao contributed to conceptualization prepared theoriginal draft was responsible for software and visualizedthe study Yanqiu Li prepared the original draft was re-sponsible for software and visualized the study Jianmin Licontributed to methodology and reviewed and edited themanuscript Rui Shi contributed to supervision and datacuration Xin Ding contributed to data curation

Acknowledgments

is work was financially supported by the FundamentalResearch Funds for the Central Universities of China(2019JBM077) and the Open Fund for Jiangsu Key Labo-ratory of Traffic and Transportation Security (Huaiyin In-stitute of Technology)

References

[1] J L Espinosa-Aranda and R Garcıa-Rodenas ldquoA demand-based weighted train delay approach for rescheduling railwaynetworks in real timerdquo Journal of Rail Transport Planning ampManagement vol 3 no 1-2 pp 1ndash13 2013

[2] N Markovic S Milinkovic K S Tikhonov and P SchonfeldldquoAnalyzing passenger train arrival delays with support vectorregressionrdquo Transportation Research Part C Emerging Tech-nologies vol 56 pp 251ndash262 2015

[3] C Wen W Mou P Huang and Z Li ldquoA predictive model oftrain delays on a railway linerdquo Journal of Forecasting vol 39no 3 pp 470ndash488 2019

[4] A Higgins and E Kozan ldquoModeling train delays in urbannetworksrdquo Transportation Science vol 32 no 4 pp 346ndash3571998

[5] J Yuan and I A Hansen ldquoOptimizing capacity utilization ofstations by estimating knock-on train delaysrdquo Transportation

Research Part B Methodological vol 41 no 2 pp 202ndash2172007

[6] P B L Wiggenraad Alighting and Boarding Times of Pas-sengers at Dutch Railway Stations TRAIL Research SchoolDelft Netherland 2001

[7] M F Gorman ldquoStatistical estimation of railroad congestiondelayrdquo Transportation Research Part E Logistics and Trans-portation Review vol 45 no 3 pp 446ndash456 2009

[8] P Huang C Wen and L Fu ldquoModeling train operation assequences a study of delay prediction with operation andweather datardquo Transportation Research Part E Logistics andTransportation Review vol 141 2020

[9] P Kecman and R M P Goverde ldquoOnline data-drivenadaptive prediction of train event timesrdquo IEEE Transactionson Intelligent Transportation Systems vol 16 no 1pp 465ndash474 2015

[10] S Milinkovic M Markovic S Veskovic et al ldquoA fuzzy Petrinet model to estimate train delaysrdquo Simulation ModellingPractice amp 7eory vol 33 pp 144ndash157 2013

[11] J Peters B Emig M Jung and S Schmidt ldquoPrediction ofdelays in public transportation using neural networksrdquo inProceedings of the International Conference on ComputationalIntelligence for Modelling Control and Automation and In-ternational Conference on Intelligent Agents Web Technologiesand Internet Commerce vol 2 pp 92ndash97 Vienna AustriaNovember 2005

[12] C Jiang P Huang J Lessan L Fu and C Wen ldquoForecastingprimary delay recovery of high-speed railway using multiplelinear regression supporting vector machine artificial neuralnetwork and random forest regressionrdquo Canadian Journal ofCivil Engineering vol 46 no 5 pp 353ndash363 2019

[13] J Hu and B Noche ldquoApplication of artificial neuron networkin analysis of railway delaysrdquo Open Journal of Social Sciencesvol 4 no 11 pp 59ndash68 2016

[14] M Yaghini M M Khoshraftar and M Seyedabadi ldquoRailwaypassenger train delay prediction via neural network modelrdquoJournal of Advanced Transportation vol 47 no 3 pp 355ndash368 2013

[15] F Corman and P Kecman ldquoStochastic prediction of traindelays in real-time using Bayesian networksrdquo TransportationResearch Part C Emerging Technologies vol 95 pp 599ndash6152018

[16] J. Lessan, L. Fu, and C. Wen, "A hybrid Bayesian network model for predicting delays in train operations," Computers & Industrial Engineering, vol. 127, pp. 1214–1222, 2019.

[17] P. Huang, C. Wen, L. Fu, Q. Peng, and Y. Tang, "A deep learning approach for multi-attribute data: a study of train delay prediction in railway systems," Information Sciences, vol. 516, pp. 234–253, 2020.

[18] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[19] Y. Chen and W. Wu, "Mapping mineral prospectivity using an extreme learning machine regression," Ore Geology Reviews, vol. 80, pp. 200–213, 2017.

[20] Y. Miche, A. Sorjamaa, P. Bas et al., "OP-ELM: optimally pruned extreme learning machine," IEEE Transactions on Neural Networks, vol. 21, no. 1, pp. 158–162, 2010.

[21] Y. Yang, Y. Wang, and X. Yuan, "Bidirectional extreme learning machine for regression problem and its learning effectiveness," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1498–1505, 2012.

14 Journal of Advanced Transportation

[22] F. Han, H.-F. Yao, and Q.-H. Ling, "An improved evolutionary extreme learning machine based on particle swarm optimization," Neurocomputing, vol. 116, pp. 87–93, 2013.

[23] L. Oneto, E. Fumeo, G. Clerico et al., "Dynamic delay predictions for large-scale railway networks: deep and shallow extreme learning machines tuned via thresholdout," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 47, no. 10, pp. 2754–2767, 2017.

[24] Y. Li and Z. Yang, "Application of EOS-ELM with binary Jaya-based feature selection to real-time transient stability assessment using PMU data," IEEE Access, vol. 5, pp. 23092–23101, 2017.

[25] Y. Zhang, T. Li, G. Na, G. Li, and Y. Li, "Optimized extreme learning machine for power system transient stability prediction using synchrophasors," Mathematical Problems in Engineering, vol. 2015, Article ID 529724, 8 pages, 2015.

[26] Y. Wang, H. Zhang, and G. Zhang, "cPSO-CNN: an efficient PSO-based algorithm for fine-tuning hyper-parameters of convolutional neural networks," Swarm and Evolutionary Computation, vol. 49, pp. 114–123, 2019.

[27] S. V. Konstantinov, A. I. Diveev, G. I. Balandina, and A. A. Baryshnikov, "Comparative research of random search algorithms and evolutionary algorithms for the optimal control problem of the mobile robot," Procedia Computer Science, vol. 150, pp. 462–470, 2019.

[28] D. Wang, D. Tan, and L. Liu, "Particle swarm optimization algorithm: an overview," Soft Computing, vol. 22, no. 2, pp. 387–408, 2017.

[29] Y. Li, X. Xu, J. Li, and R. Shi, "A delay prediction model for high-speed railway: an extreme learning machine tuned via particle swarm optimization," in Proceedings of the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Rhodes, Greece, September 2020.

[30] J. Tang, C. Deng, and G.-B. Huang, "Extreme learning machine for multilayer perceptron," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 809–821, 2016.

[31] J.-R. Zhang, J. Zhang, T.-M. Lok, and M. R. Lyu, "A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training," Applied Mathematics and Computation, vol. 185, no. 2, pp. 1026–1037, 2007.

[32] Z. Deng, X. Zhu, D. Cheng, M. Zong, and S. Zhang, "Efficient kNN classification algorithm for big data," Neurocomputing, vol. 195, pp. 143–148, 2016.

[33] Y. Zhang, Z. Zhao, and J. Zheng, "CatBoost: a new approach for estimating daily reference crop evapotranspiration in arid and semi-arid regions of Northern China," Journal of Hydrology, vol. 588, Article ID 125087, 2020.

[34] G. Huang, L. Wu, X. Ma et al., "Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions," Journal of Hydrology, vol. 574, pp. 1029–1041, 2019.

[35] J. Yang, C. Zhao, H. Yu, and H. Chen, "Use GBDT to predict the stock market," Procedia Computer Science, vol. 174, pp. 161–171, 2020.

[36] R. Tibshirani, "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 73, no. 3, pp. 273–282, 2011.

[37] M. Wang, L. Wang, X. Xu, Y. Qin, and L. Qin, "Genetic algorithm-based particle swarm optimization approach to reschedule high-speed railway timetables: a case study in China," Journal of Advanced Transportation, vol. 2019, Article ID 6090742, 12 pages, 2019.

[38] R. Shi, X. Xu, J. Li et al., "Prediction and analysis of train arrival delay based on XGBoost and Bayesian optimization," Applied Soft Computing, vol. 109, 2021.