A learning framework based on weighted knowledge transfer ...

A learning framework based on weighted knowledge transferfor holiday load forecasting

Pan ZENG1, Chang SHENG1, Min JIN1

Abstract Since the variation pattern of load during holi-

days is different than that of non-holidays, forecasting

holiday load is a challenging task. With a focus on this

problem, we propose a learning framework based on

weighted knowledge transfer for daily peak load forecast-

ing during holidays. First, we select source cities which can

provide extra hidden knowledge to improve the forecast

accuracy of the load of the target city. Then, all the

instances which are from source cities and the target city

will be weighted and trained by the improved weighted

transfer learning algorithm which is based on the TrA-

daBoost algorithm and can decrease negative transfer. We

evaluate our method with the classical support vector

machine method and a method based on knowledge

transfer on a real data set, which includes eleven cities

from Guangdong province to illustrate the performance of

the method. To solve the problem of limited historical

holiday load data, we transfer the data from nearby cities

based on the fact that nearby cities in Guangdong province

have a similar economic development level and similar

load variation pattern. The results of comparative experi-

ments show that the forecasting framework proposed by

this paper outperforms these methods in terms of mean

absolute percent error and mean absolute scaled error.

Keywords Load forecasting, Holiday effect, Sparse data,

Weighted transfer learning

1 Introduction

Short-term power load forecasting (STLF) is one of the

key technologies to achieve the observability and control-

lability of a strong smart grid. The ever-increasing supply

of distributed power, the use of renewable energy, and the

establishment of new grid systems have raised an urgent

need for accurate and effective load forecasting. During the

past few decades, several STLF methods, including tradi-

tional and artificial intelligence methods, have been pro-

posed [1, 2]. Traditional techniques, including linear

regression, auto regressive moving average (ARMA) [3],

grey model [4], Kalman filter-based methods, etc., use a

mathematical model to map the features to the load con-

sumption so as to make predictions. However, the rela-

tionship between fluctuation of load and exogenous factors

is complex and nonlinear, which makes it extremely dif-

ficult for us to build a precise model. In recent years,

artificial intelligence methods, such as fuzzy regression

models [5], ensemble forecasting method [6], artificial

neural network (ANN) [7, 8] and support vector machines

(SVMs) [9, 10], have been widely used to forecast elec-

tricity loads. ANN is based on multilayered perceptrons

and has a good performance on time series forecasting.

Several different types of ANNs, including radial basis

function (RBF) neural networks [11], back propagation

(BP) neural network and fuzzy neural networks [12] are

also used in load forecasting. However, the computational

CrossCheck date: 28 May 2018

Received: 20 October 2017 /Accepted: 28 May 2018 / Published

online: 4 September 2018

� The Author(s) 2018

& Min JIN

[email protected]

Pan ZENG

[email protected]

Chang SHENG

[email protected]

1 College of Computer Science and Electronic Engineering,

Hunan University, Changsha, China

123

J. Mod. Power Syst. Clean Energy (2019) 7(2):329–339

https://doi.org/10.1007/s40565-018-0435-z

http://crossmark.crossref.org/dialog/?doi=10.1007/s40565-018-0435-z&domain=pdf

http://crossmark.crossref.org/dialog/?doi=10.1007/s40565-018-0435-z&domain=pdf

https://doi.org/10.1007/s40565-018-0435-z

speed of ANN is not fast enough, and its solution is

sometimes easily trapped into a local optimum. The SVM,

which was proposed by Vapnik, is a novel powerful

machine learning method based on statistical learning

theory. It can effectively avoid the overfitting problem in

ANN. Since the SVM problem is a convex quadratic pro-

gramming problem, it can always obtain a unique and

globally optimal solution. SVM was originally designed for

classification problems and has achieved good perfor-

mances. With the introduction of Vatnik’s insensitive loss

function, a modified SVM, which is called support vector

regression (SVR) was proposed. Recently, SVR has been

applied to various applications, such as battery remaining

useful life estimation [13], natural gas demand prediction

[14], etc., and has shown excellent performance. SVR has

also been widely used in load forecasting.

Lin [15], the champion of the EUNITE Competition 2001,

has confirmed the superiority of the SVR. However, the

forecasting model proposed by Lin does not perform well for

holidays. This is primarily because holiday trends are dif-

ferent from those of non-holidays and the amount of data on

holidays is sparse. Ehab E. Elattar [16] presented an locally

weighted support vector machine regression (LWSVR)

model to predict load based on a time series. Hu [17] pro-

posed a model selection method for SVR in short-term load

forecasting. Che [18] proposed a kernel function selection

method for SVR in the short-term load forecasting problem.

Hong [19] developed a method that used the 24 solar terms

calendar to categorize days of a year for load forecasting

instead of the Gregorian calendar. However, these methods

failed to pay attention to the load forecasting for holidays and

the performance for holidays is yet to be improved. The loads

of holidays are usually less predictable than those of non-

holidays. On the one hand, the load variation pattern of hol-

idays is quite different from that of non-holidays, because of

the great change in human activity. On the other hand, the

historical load data of holidays is always limited. This is

called the problem of sparse data. Therefore, in order to

increase the load forecasting accuracy of holidays, both the

problems mentioned above should be well considered and

solved. In the Global Energy Forecasting Competition

(GEFCom2012) [20], some competitors had an emphasis on

holiday load forecasting. Reference [21] treated 6 special

holidays as weekend days: Memorial Day, Labour Day,

Thanksgiving Day and the day after, New Year’s Day,

Independence Day and Christmas Day and improved the

forecasting accuracy. However, since other public holidays

are not treated as weekend days and the dates of holidays are

different in other areas, this method should be modified when

used in such areas. Reference [22] modeled the holiday effect

by using a factor variable, which set the day before holiday as

a non-zero value and holiday as a zero value. While the

variable on other days was set as another different value. This

method divided days into three categories: holidays, the days

before holidays and other days. This improved the load

forecasting accuracy. Reference [23] analyzed the difficulties

in holiday load forecasting and proposed a series of approa-

ches to deal with public holidays and weekends, and further

drew a conclusion that the most promising approach is to add

public holiday dummies to the model but set the weekday

dummies at the holidays to zero. With this approach the load

forecasting accuracy during holidays is greatly increased.

These works are significant for research on holiday load

forecasting and are using different approaches to solve the

first problem of holiday load forecasting. However, they

failed to consider the second problem, namely, the problem of

sparse data. Reference [24] reveals that the availability of

historical holiday load data has an impact on the choice of

holiday modeling technique, and that limited historical data

leads to the limitation of the choice of the holiday modeling

methods. That is to say, in order to improve the forecasting

accuracy on holidays, just marking the holidays by using

different variables is not sufficient. More attention should be

paid to solve the problem of sparse data.

Transfer learning could solve the problem of sparse data

by using the knowledge learned from some source tasks

and finally improving the learning process of the target

task. Transfer learning has been successfully used in many

fields, such as natural language processing, face recogni-

tion, load forecasting etc., and even supports cross-domain

knowledge transfer. In load forecasting, the target task is to

forecast the load of the target city, and source tasks are that

of source cities, i.e. cities that can provide hidden knowl-

edge to improve the forecast accuracy of the load of the

target city. Zhang [25] proposed a method based on transfer

learning for load forecasting. With the use of extra

knowledge transferred from other cities, the overall fore-

cast accuracy has been improved. However, transferring all

load data from source cities results in two issues. First, a

large amount of noisy data could be imported, which has a

negative impact on the forecast result. Second, since the

influence of the data from the target city and source cites is

different, we can’t treat these two kinds of data equally.

In this paper, a learning framework based on weighted

knowledge transfer for holiday load forecasting is pro-

posed. First, a new holiday feature is introduced to indicate

the turning point of holiday loads, thus solving the first

problem of holiday load forecasting. Moreover, by using

weighted transferring historical holiday data from source

cities to the target city, the sparse holiday data sets can be

appropriately enriched and the load forecasting accuracy of

holidays can be improved without influencing the load

forecasting performance of non-holidays. Lastly, a fore-

casting framework named HWT-SVR (weighted transfer

learning of holidays), which can give different weights to

holiday load data from different sources, is proposed.

330 Pan ZENG et al.

123

Different weights reflect the degree of relevant influence of

the different training samples from the target city and

source cities on the holiday load data from the target city

and embody the essential attributes of different load data.

In this way, we can get a more accurate forecasting model.

Furthermore, with the use of the improved TrAdaBoost

algorithm we proposed, we solve the negative transfer

phenomenon that occurs in certain cities. Figure 1 shows

the flowchart of the proposed forecasting framework.

The rest of the paper is organized as follows. Section 2

analyzes data preprocessing in detail, as well as feature

selection and data set selection. Section 3 proposes the

method based on weighted knowledge transfer of holidays.

Section 4 describes the whole procedure of the proposed

forecasting framework in this paper. Section 5 illustrates

the forecast performance of this framework with some case

studies. Section 6 concludes the paper.

2 Data processing and selection

This section primarily covers four aspects: motivation of

the study, load data preprocessing, feature selection and

data set selection. After these steps, the model can more

accurately capture the load variation.

2.1 Study motivation

The variation of load on holidays is dramatically dif-

ferent than on non-holidays. Taking China as an example,

the overall electric load during the Spring Festival

represents the valley of the year because of the shutdown of

large-scale productive activities. Therefore, the accuracy of

load forecasting on holidays is always lower than on non-

holidays, and it’s more difficult to predict holiday load

especially when there are fewer historical load data on

holidays than on non-holidays. Moreover, the traditional

holidays in China are celebrated by lunar calendar, so they

are not fixed to date, and this makes it more challenging to

predict the load on holidays. Improving the forecasting

accuracy on holidays will greatly improve overall fore-

casting accuracy. In China, there are four important and

official holidays: New Year’s Day, Spring Festival, May

Day and National Day, as shown in Fig. 2. We classified

these four holidays into four types. The Spring Festival is

celebrated according to lunar calendar, which makes it

different from the other three holidays. The New Year’s

Day holiday only lasts three days and we classified it into

another type. The May Day holiday is in summer and lasts

seven days, and the National Day is in autumn. Apparently,

the load during holidays is lower than on non-holidays.

2.2 Data preprocessing

Noise and missing data can affect the accuracy of a

forecast. In this work, Pauta criteria [26] have been used to

analyze the load data. Pauta criteria are used to detect

exceptional data. Assume xi is a series of data, a data point

satisfying (1) will be treated as exceptional data.

xi � �xj j[ 3r ð1Þ

where �x represents the mean value of xi and r represents

the standard error.

Data preprocessing

Feature selection

Basic feature selection Create holiday feature

Data set selection

Weighted knowledge transfer of holidays

Source cites selection

Weight reallocation

Improved TrAdaBoost algorithm for negative transfer cities

Output result

Fig. 1 Flowchart of the proposed forecasting framework Fig. 2 Holiday load data

A learning framework based on weighted knowledge transfer for holiday load forecasting 331

123

This study used the method of linear interpolation [4] to

correct exceptional data points. For example, Fig. 3 shows

the raw data of daily load consumption. It can be seen that

there are some exceptional data and missing data. The

actual daily peak load of one city after being processed is

plotted in Fig. 4.

After dealing with exceptional data, all load values will

be scaled by (2), where y’ is the normalized load, y is the

actual load, u is the average of the actual load, and r is the

standard deviation of the actual load.

y0 ¼ ðy� uÞ=r ð2Þ

2.3 Feature selection

Input variable selection is an extremely important step

in load forecasting, which directly affects forecasting per-

formance [27, 28]. In this paper, we select features pri-

marily according to the analysis of load variation trends.

2.3.1 Basic feature selection

To make use of the periodicity of the load trend, the past

load values up to 7 days are selected as a part of the basic

features. In order to let the algorithm capture the regulation

of the load trend more accurately, we use seven Boolean

variables as the weekday and holiday indicator. For

example, we set the second Boolean variable to true for

Tuesday and all first six variables to false for Sunday.

2.3.2 Create holiday feature

The load value of a holiday is difficult to forecast accu-

rately, not only because the data of holiday is too sparse

compared with other days, but also the variation of the trends

of a holiday is large. Figure 2 shows all four types of holiday.

It is obvious that there is a turning point in each holiday.

Since the pattern of the trend of load on holiday is fixed

relatively during each year, the turning point of each holiday

is fixed too. To make the algorithm perform better at these

turning points, we add another Boolean variable which is

used to indicate the turning point of the holiday.

Finally, the total number of features is 15 in our work, as

described in (3).

W1;W2; . . .;W6;H1;H2; L1; L2; . . .; L7ð Þ ð3Þ

2.4 Data set selection

The summer load is significantly higher than that inwinter,

as shown in Fig. 4. To reduce the interference of different

categories of data, the load can be forecasted with the data set

of the forecasting day that corresponds to the season. Many

methods exist to select the data set. The k-means clustering

algorithm is the most frequently used in machine learning.

The algorithmcan automatically cluster a data set based on the

distance of the data point to the objective function. The peak

daily load is first clustered into two categories by the algo-

rithm using the Euclidean distance function, and then, each

monthwill be grouped into the corresponding season based on

the number of days. The clustering results are shown in Fig. 5,

where class 1 is winter and class 2 is summer. It is clear that

January to April and November to December represent clus-

ters for winter, whereas the other months are the summer

clusters. To ensure the clustering result is closer to the situa-

tion of forecasting, we can take the load data within the last

two years before the predicted day.

3 Weighted knowledge transfer of holidays

This section describes in detail the source cities’ selec-

tion, weights’ distribution based on HWT-SVR and the

improved TrAdaBoost algorithm to solve the negative

transfer problem.

In transfer learning, the input and output data observa-

tions of the source tasks and target task are connected by a

hidden variable that means there is a latent or uncertain

Fig. 3 Daily load data before processing

Fig. 4 Peak load of one city after being processed

332 Pan ZENG et al.

123

relation between input and output variables, as shown in

Fig. 6. Therefore, additional knowledge and information

can be transferred from the source tasks to the target task,

which improves forecasting performance. In load fore-

casting, the amount of holiday data is sparse, and the trends

of the holidays and non-holidays are completely different;

thus, it is difficult for the model to accurately forecast

holidays. To solve this problem, the holiday data of rele-

vant source cities are transferred to the target city.

However, there may be potential interference regarding

the transfer of holiday data to the target city, as shown by

the dashed lines of Fig. 6. The negative transfer may come

from data of unrelated cities and related cities, where the

distribution of certain load points is different from that of

the target city. Therefore, selecting appropriate source

cities and data of source cities are two key factors.

3.1 Source city selection

Source city selection plays a vital role in transfer

learning. If the selection is inappropriate, it may lead to

negative transfer.

For a set of candidate cities T ¼ Tkjk ¼ 1; 2; . . .;mf g, ifthe ith city Ti is the target city, the goal of the source city

selection step is to select source cities from the remaining

m - 1 cities. Thus, the number of possible combinations is

2m-1. Exhaustive search on such a large number of candidate

combinations is impossible. In order to overcome this

problem, source cities are selected in the following way.

First, the similarity between target city and each can-

didate source city, i.e. the similarity between the load trend

of the target city and the load trend of each candidate

source city needs to be established. We use these similar-

ities to represent source city selection priority. For a set of

candidate source cities T1; T2; . . .; Tmf g, let

xðjÞt ; y

ðjÞt

� �; t ¼ 1; 2; . . .; n

n obe the original training data

set of city Tj, where xðjÞt is the attribute, i.e. the feature part

of the instance of jth candidate source city on time point t;

yðjÞt is the label part of the instance. Thus, we can use the

similarity between y(i) and y(j) to represent the similarity

between city Ti and Tj, here y(i) can be represented as (4).

yðiÞ ¼ yðiÞ1 ; y

ðiÞ2 ; . . .; y

ðiÞt

� �ð4Þ

The incidence degree between each point is defined in (5),

where yk0 � yki�� is the absolute difference between the source

city load yki and the target city load yk0, mini mink y

k0 � yki

�� isthe two-level minimum difference, which means the

minimum difference among all points (k = 1,2,…,m) and

all of the source cities yt (t = 1,2,…,n), maxi maxk yk0 � yki

�� is the two-level maximum difference, which has a similar

meaning to that of the two-level minimum difference, and qis the resolution coefficient, whose value ranges from 0 to 1

and is normally taken as 0.5.

niðkÞ ¼mini mink y

k0 � yki

�� þ qmaxi maxk yk0 � yki

�� yk0 � yki�� þ qmaxi maxk y

k0 � yki

�� ð5Þ

Integrating the incidence degree of each point, the

incidence degree between two vectors is obtained as

follows:

ri ¼1

N

XNk¼1

niðkÞ ð6Þ

Second, we need to determine the number of source

cities. Too many source cities not only result in negative

transfer, which reduces accuracy, but also increase the

computational time. Too few cities will reduce the

forecasting accuracy because there is not enough

supplementary information. To take into account the

forecast performance and computational efficiency, an

optimization algorithm is required to determine the number

of source cities to be used, which can be described by

objective function (7), where m and n are the number of

source cities to be selected and the number of candidate

source cities, respectively, and MAPE(m,n) is the mean

absolute percentage error of HWT-SVR based on the m

Input oftarget task

Input ofsource task 1

Input ofsource task m

Negative transferHidden variables

Output of target task

Positive transfer

Fig. 6 Target task, source task and negative transfer in transfer

learning problems

Fig. 5 Result of the k-means algorithm


123

source cities; k is a tuning parameter that controls the

trade-off between the number of source cities and

MAPE(m,n), where the first item minimizes the number

of source cities selected to reduce the computational

overhead. The second item requires that the number of

possible source cities is as large as possible to optimize

forecasting performance. The objective function is

minimized by these two items such that the most

appropriate number of source cities is determined. A

previous paper [29] specifically discusses the principle of

the optimization algorithm which is used to determine the

number of source cities to be transferred.

minm f ðmÞ ¼ km

nþ 1� kð Þ �MAPE m; nð Þ ð7Þ

In our research, the case study is based on 11 cities from

Guangdong province. Guangzhou is the provincial capital of

Guangdong province. Guangzhou and Shenzhen are the most

two economically developed cities in Guangdong province.

Since the electric load is affected by economic development,

the load variation pattern of Guangzhou and Shenzhen are

similar. For Dongguan and Foshan, in terms of economic

development, they are several years behind Guangzhou and

Shenzhen. Geographically, these 4 cities are not far from each

other. Therefore, the load variation pattern of Dongguan and

Foshan are similar to the load variation pattern of Guangzhou

and Shenzhen a few years ago. As for Zhaoqing andMeizhou,

their economies are not as developed as Guangzhou and

Shenzhen, and their load variation pattern tends to be similar.

Therefore, in practice, we could transfer knowledge learnt

from source cities to target city.

3.2 Weights reallocation

Weighted support vector regression (WSVR) [16] is an

improved support vector regression algorithm, which not

only has the advantages of SVR such as the ability to reach

the global optimal solution and good performance on a small

amount of data, but also has the ability to weight each

training instance. Thus, it can give more attention to the

instances that have higher weight. The parameters ofWSVR

can be obtained by solving a quadratic programming prob-

lem with linear equality and inequality constraints.

Our aim is to forecast the target city, and the distribution

of data is different between target city and each source city.

To decrease negative transfer, we need to ensure the weight

of instances of the target city is higher than the weight of

instances of source cities. The weight of each instance of the

target city, that is, the reference city, will be set to 1, and the

weight of each instance of source cities is computed using the

Pearson correlation coefficient as described in (8), where X

and Y are the feature part of the instance of source city and

target city, respectively; f is the dimension of the feature; and

�X and �Y are the average of each feature vector, respectively.

The value ranges from 0 to 1; the larger the value, the more

similar the features are and vice versa.

wi ¼

Pfk¼1

X � �Xð Þ Y � �Yð ÞffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiPfk¼1

Xi � �Xð Þ2s ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

Pfk¼1

Yi � �Yð Þ2s ð8Þ

3.3 Dealing with negative transfer

While useful information is transferred from source

cities to target city, some irrelevant information will also

be transferred to the target city, i.e. the negative transfer.

This is not only because of the difference of trend, but also

because of the difference of the load data distribution.

TrAdaBoost [30] is an algorithm based on AdaBoost that

can be used to reduce the influence of negative transfer. It

will decrease the weights of those instances whose distri-

butions are different from the data distribution of the target

city. The algorithm process is given in Algorithm 1.

Algorithm 1 TrAdaBoost algorithmInput: the two labeled data sets Td, Ts and the unlabeled data set S, a base learning algorithm Learner, and the maximum number of iterations N. Initialize the initial weight vector, 1 1 1 1

1 2( , , , )n mw w w += …wFor t = 1,2, ,N do

1. Set t = 1,2, ,N2. Call Learner, providing it with the combined

training set T with the distribution pt over T and the unlabeled data set S.

3. Calculate the error rate of εt on Ts

1

1

| ( ) ( ) |n m ti t i i

t n mti ni

i n

w h x c x

wε

+

+= +

= +

−= ∑∑

4. Set1/ (1 2 ln( / ) )

/ (1 )t t t

n Nββ ε ε

= += −

5. Update the new weight vector| ( ) ( )|

1| ( ) ( )|

source task

target task

t i i

t i i

h x c xtiit

i h x c xtii

w xw

w x

β

β

−+

− −

⎧ ∈⎪= ⎨∈⎪⎩

end forOutput: the weights of training data.

TrAdaBoost algorithm is designed for classification

problems, whereas load forecasting is a regression prob-

lem. In practical application, the error rate in the 2nd step

of TrAdaBoost should be less than or equal to 0.5 to ensure

that it will gradually decrease as the number of iterations

334 Pan ZENG et al.

123

increases. For classification problems, the result of ht(xi)

must be either 0 (negative sample) or 1 (positive sample).

Thus, the aforementioned condition is easy to satisfy. In

order to allow this algorithm to fit the regression problem,

we change the equation in the 2nd step of the algorithm to

(9), where Pt(xi) is the predicted load value; r(xi) is the real

load value; and d is an adjusted parameter that is less than

1. The actual meaning is that if the percentage predicted

error of the sample is less than d, it is considered to be

correct; otherwise, it is an error. By setting the threshold,

the algorithm can effectively reduce negative transfer. The

value should be obtained by experiment.

et ¼Xnþm

i¼nþ1

wtiq xið ÞPnþm

i¼nþ1

wti

ð9Þ

q xið Þ ¼ 0 ifPt xið Þ � r xið Þj j

r xið Þ \d

1 otherwise

8<: ð10Þ

4 Proposed forecasting framework

The specific steps of the forecasting framework based on

HWT-SVR are described as follows.

Step 1: Correct abnormal data and use (2) to scale the

corrected data.

Step 2: Extract the label part of instances of data and

construct vectors using (4) for all cities.

Step 3: Use (6) to calculate similarities between the

target city and candidate source cities and rank

the candidate source cities according to the

similarities we figured out.

Step 4: Use optimization function (7) to determine the

number of source cities selected; then, select the

appropriate cities as the transfer cities according

to the sorting order from Step 3.

Step 5: Create datasets according to (3), remove all non-

holiday instances in source cities, and calculate

the weights of each training instance according to

(8).

Step 6: Use the k-means algorithm to partition the data

set and train the HWT-SVR model with the

weights of Step 5 and the corresponding data set.

Step 7: Predict the load of the target city using the HWT-

SVR model in Step 6.

Step 8: Readjust the weights of the data set of the

negative transfer cities with the improved

TrAdaBoost algorithm; then, train the HWT-

SVM model, and predict the load.

5 Prediction case

For the case study, input variables and forecasting

model were presented in the above section. To illustrate the

feasibility of the proposed method, we made three case

studies with the daily peak load data of 11 cities from

Guangdong province from 2005 to 2007. The data of 2005

and 2006 are set to be the training set and the data of 2007

are set to be the test set. For each month, the month before

the predicting month in the same season is set to be the

validation set. For example, when predicting the load of

January of 2007, since January is in winter as we analyzed

before, winter month from 2005 to 2006 are set to be the

training set except for December 2006, which is the vali-

dation set. The value of k and d are respectively 0.1 and

0.25, obtained by experiment.

5.1 Some common mistakes

To assess the performance of the forecasting model, two

accuracy measures, including MAPE and MASE (mean

absolute scaled error), were used in this study. MAPE is a

measure of the accuracy of the forecasting model, where

the smaller the value is, the more accurate the forecasting

is. The MASE is a scaled error that is scaled by a naive

forecast value, which is less than one if the forecast is

better than the naive method, and the smaller the MASE

value is, the better the forecasting performance is. The

definitions of these metrics are shown as follows:

MAPE ¼ 1

n

Xni¼1

Li � L̂i

Li

�� 100 ð11Þ

MASE ¼

Pni¼1

Li � L̂i��

nn�1

Pni¼2

Li � Li�1j jð12Þ

where Li and L̂i are the actual and forecasted load demand

value of the ith sample, respectively; n is the forecast horizon.

5.2 Case 1

In this test case, we forecast the load of Guangzhou City.

We compared our proposed method, i.e. HWT-SVR with

three other methods. The first method, which is denoted as

SVR, is proposed by Lin and helped his team win the

EUNITE Competition 2001. The second method is T-SVR

which is based on the method of transfer learning proposed

byZhang. The thirdmethod, denoted asHF-SVR, is based on

the SVR model with the use of the holiday feature we pro-

posed in Sect. 2. The forecasting results of thesemethods for


123

Guangzhou City in January are shown in Fig. 7. We can see

that the holiday feature we proposed improves the forecast-

ing accuracy of the holidays. Table 1 also presents the

MAPE and MASE. It is clear that the HF-SVR model does

not fully capture the holiday characteristics. Taking into

account the fact that the holiday data are too sparse, we

transfer the holiday data of source cities. The HWT-SVR

result is represented by the red curve. We can see that the

HWT-SVR model performs better than the other models;

also, the MAPE and MASE are better than those of the

others. On the other hand, from the T-SVR result, which is

represented by the yellow curve, it is not difficult to see that

the performance of the forecasting result for non-holidays

decreased, caused by transferring the non-holiday data.

Figures 8, 9 and 10 and Table 2 show the results of load

forecasting in February, May and October, corresponding to

Spring Festival, May Day and National Day. As shown in

Table 2, our method gains the best performance in May and

October, and the accuracy in February is slightly lower than

that of SVR. Figure 11 shows the actual and predicted load

and the percentage error of SVR, T-SVR and HWT-SVR.

We can see from the figure that the percentage errors of SVR

and T-SVR are not stable and fluctuate dramatically during

holidays. The percentages of HWT-SVR are relatively

steady and the fluctuation around holidays is not as obvious

as that of SVR and T-SVR (Fig. 12).

5.3 Case 2

In this test case, we compared HWT-SVR with three

other methods mentioned in the previous test case on other

cities. The result is shown in Table 3. From the MAPE of

each city, it can be seen that the proposed forecasting

model exhibited an improved forecasting performance for

most cities. However, the MAPE was more improved by

the HF-SVR method for Meizhou City and less improved

for the HWT-SVR model, which was even lower than that

of the SVR model. The reason is that negative transfer

occurred for the city. In the next case, we will present the

result of the method of the improved negative transfer. The

numbers of source cities for each target city are also listed

in the last column of Table 3 after applying the optimiza-

tion algorithms. The number of transferred source cities inFig. 7 Forecasted and actual peak daily load in January

Table 1 MAPE and MASE of January

Method MAPE (%) MASE

SVR 3.50 0.74

T-SVR 3.19 0.67

HF-SVR 3.11 0.66

HWT-SVR 2.14 0.46

Note: Bold indicates the best performace of different methods

Fig. 8 Forecasted and actual peak daily load in February

Fig. 9 Forecasted and actual peak daily load in May

336 Pan ZENG et al.

123

each target city is not greater than 5; if an excessive

number of source cities is selected, it will cause negative

transfer. In addition, regarding a special case, the number

of source cities transferred is only one for Foshan City.

When additional cities were transferred, the MAPE and

MASE decreased. The load distribution of the city is sig-

nificantly different from that of the others. Therefore, we

can conclude that if the load distribution is clearly differ-

ent, it is a negative result of transfer learning. However,

this situation is rare, and thus, we can apply the method.

5.4 Case 3

In this test case, we tested THWT-SVR, which combi-

nes HWT-SVR and the proposed improved TrAdaBoost

algorithm on Meizhou city to evaluate its ability to dealwith negative transfer. The results are shown in Table 4.

As observed from the MAPE and MASE of Table 4, the

Fig. 10 Forecasted and actual peak daily load in October

Fig. 11 Result of 2007

Table 2 MAPE and MASE of October

Month Method MAPE (%) MASE

February SVR 4.71 0.75

T-SVR 5.20 0.90

HF-SVR 4.94 0.78

HWT-SVR 4.72 0.76

May SVR 5.26 0.78

T-SVR 5.09 0.72

HF-SVR 5.03 0.70

HWT-SVR 4.36 0.65

October SVR 3.48 0.73

T-SVR 3.00 0.63

HF-SVR 3.15 0.66

HWT-SVR 2.33 0.49


Fig. 12 MAPE results of January of each task city with the

comparison methods


123

algorithm can reasonably allocate the weights of the data

set to reduce negative transfer.

6 Conclusion

This paper proposes an STLF approach based on

weighted transfer learning of holidays. First, this approach

abstracts a new holiday feature to indicate holidays. Sec-

ondly, the holiday data of source cities are transferred to

the target city to appropriately enrich sparse holiday data

and further aid in forecasting. According to the reliability

of different data sources, the training data of the target city

and source cities are allocated different weights to enhance

forecasting performance. In addition to overcome negative

transfer issues of a target city, the improved TrAdaBoost

algorithm adjusts the initial weights based on the Pearson

correlation coefficient to solve the problem.

We compared our method with SVR and T-SVR on

more than a dozen of cities to illustrate its feasibility. The

case studies show that the performance metrics were

improved by the method proposed in this paper. From the

comparative experimental analysis, the improvement is due

to the new holiday feature, transfer learning of the holidays

and the improved TrAdaBoost algorithm. The proposed

method is suitable for a power system that has the load data

of source cities, particularly for those where the accuracy

of holiday forecasting is low. In future work, we will

consider more features to better increase the performance

of forecasting, such as weather data, economic condition,

climate, etc.

Acknowledgements This work was supported by the National Nat-

ural Science Foundation of China (No. 61773157), and in part by the

National Scientific and Technological Achievement Transformation

Project of China (No. 201255).

Open Access This article is distributed under the terms of the

Creative Commons Attribution 4.0 International License (http://

creativecommons.org/licenses/by/4.0/), which permits unrestricted

use, distribution, and reproduction in any medium, provided you give

appropriate credit to the original author(s) and the source, provide a

link to the Creative Commons license, and indicate if changes were

made.

References

[1] Hong T (2010) Short term electric load forecasting. https://

search.proquest.com/docview/852985857. Accessed Jan 2010

[2] Hong T, Fan S (2016) Probabilistic electric load forecasting: a

tutorial review. Int J Forecast 32(3):914–938

[3] Li YY, Han D, Yan Z (2018) Long-term system load forecasting

based on data-driven linear clustering method. J Mod Power

Syst Clean Energy 6(2):306–316

[4] Jin M, Zhou X, Zhang ZM et al (2012) Short-term power load

forecasting using grey correlation contest modeling. Expert Syst

Appl 39(1):773–779

[5] Song KB, Baek YS, Hong DH et al (2005) Short-term load

forecasting for the holidays using fuzzy linear regression

method. IEEE Trans Power Syst 20(1):96–101

[6] Zhou M, Jin M (2017) Holographic ensemble forecasting

method for short-term power load. IEEE Trans Smart Grid.

https://doi.org/10.1109/TSG.2017.2743015

[7] Rana M, Koprinska I (2016) Forecasting electricity load with

advanced wavelet neural networks. Neurocomputing

182:118–132

[8] Khotanzad A, Afkhami-Rohani R, Maratukulam D (1998)

ANNSTLF-artificial neural network short-term load forecaster-

generation three. IEEE Trans Power Syst 13(4):1413–1422

[9] Ceperic E, Ceperic V, Baric A (2013) A strategy for short-term

load forecasting by support vector regression machines. IEEE

Trans Power Syst 28(4):4356–4364

[10] Zeng P, Jin M (2018) Peak load forecasting based on multi-

source data and day-to-day topological network. IET Gener

Transm Dis 12(6):1374–1381

[11] Ko CN, Lee CM (2013) Short-term load forecasting using SVR

(support vector regression)-based radial basis function neural

network with dual extended Kalman filter. Energy 49:413–422

[12] Lou CW, Bong MC (2015) A novel random fuzzy neural net-

works for tackling uncertainties of electric load forecasting. Int J

Electr Power 73:34–44

[13] Patil MA, Tagade P, Hariharan KS et al (2015) A novel mul-

tistage support vector machine based approach for Li ion battery

remaining useful life estimation. Appl Energy 159:285–297

[14] Zhu L, Li MS, Wu QH et al (2015) Short-term natural gas

demand prediction based on support vector regression with false

neighbours filtered. Energy 80:428–436

Table 3 MAPE and MASE of October

Target city SVR T-SVR HF-SVR HET-SVR NSC

Dongguan 5.31 5.22 5.07 3.72 3

Foshan 4.85 4.78 4.53 4.53 3

Guangzhou 3.50 3.48 3.11 2.14 5

Huizhou 6.22 5.43 6.01 3.94 4

Jiangmen 3.90 3.80 3.28 2.25 2

Meizhou 3.16 3.75 2.89 3.41 2

Shantou 2.95 2.92 2.70 2.59 1

Shenzhen 3.93 3.89 3.78 3.55 5

Zhaoqing 5.21 5.20 4.43 4.97 4

Zhongshan 5.31 4.27 4.86 2.64 4

Zhuhai 3.79 3.67 3.26 3.24 3

Table 4 MAPE and MASE with the improved TrAdaBoost

algorithm

Method MAPE (%) MASE

SVR 3.16 0.82

HWT-SVR 3.41 0.88

THWT-SVR 2.88 0.74


338 Pan ZENG et al.

123

http://creativecommons.org/licenses/by/4.0/

http://creativecommons.org/licenses/by/4.0/

https://search.proquest.com/docview/852985857

https://search.proquest.com/docview/852985857

https://doi.org/10.1109/TSG.2017.2743015

[15] Chen BJ, Chang MW, Lin CJ (2004) Load forecasting using

support vector machines: a study on EUNITE competition 2001.

IEEE Trans Power Syst 19(4):1821–1830

[16] Elattar EE, Goulermas J, Wu QH (2010) Electric load fore-

casting based on locally weighted support vector regression.

IEEE Trans Syst Man Cybern C 40(4):438–447

[17] Hu ZY, Bao YK, Xiong T (2014) Comprehensive learning

particle swarm optimization based memetic algorithm for model

selection in short-term load forecasting using support vector

regression. Appl Soft Comput 25:15–25

[18] Che JX, Wang JZ (2014) Short-term load forecasting using a

kernel-based support vector regression combination model.

Appl Energy 132:602–609

[19] Xie JR, Hong T (2018) Load forecasting using 24 solar terms.

J Mod Power Syst Clean Energy 6(2):208–214

[20] Hong T, Pinson P, Fan S (2014) Global energy forecasting

competition 2012. Int J Forecast 30(2):357–363

[21] Charlton N, Singleton C (2014) A refined parametric model for

short term load forecasting. Int J Forecast 30(2):364–368

[22] Ben Taieb S, Hyndman RJ (2014) A gradient boosting approach

to the Kaggle load forecasting competition. Int J Forecast

30(2):382–394

[23] Ziel F (2018) Modeling public holidays in load forecasting: a

German case study. J Mod Power Syst Clean Energy

6(2):191–207

[24] Xie J, Chien A (2016) Holiday demand forecasting in the

electric utility industry. In: Proceedings of the 2016 SAS global

forum, Las Vegas, USA, 18–21 April 2016, 13 pp

[25] Zhang YL, Luo GM (2015) Short term power load prediction

with knowledge transfer. Inf Syst 53:161–169

[26] Zhao H, Min F, Zhu W (2013) Test-cost-sensitive attribute

reduction of data with normal distribution measurement errors.

Math Probl Eng 3:4928–4942

[27] Hu ZY, Bao YK, Xiong T et al (2015) Hybrid filter-wrapper

feature selection for short-term load forecasting. Eng Appl Artif

Intell 40:17–27

[28] Koprinska I, Rana M, Agelidis VG (2015) Correlation and

instance based feature selection for electricity load forecasting.

Knowl Based Syst 82:29–40

[29] Che JX, Wang JZ, Tang YJ (2012) Optimal training subset in a

support vector regression electric load forecasting model. Appl

Soft Comput 12(5):1523–1531

[30] Dai W, Yang Q, Xue GR et al (2007) Boosting for transfer

learning. In: Proceedings of the 24th international conference on

machine learning, Corvallis, USA, 20–24 June 2007,

pp 193–200

Pan ZENG received the B.E. degree in building electricity and

intelligence from Qingdao University of Technology, Qingdao,

China, in 2012, and the M.S. degree in software engineering from

Hunan University, Changsha, in 2016. He is currently working toward

the Ph.D. degree in the College of Computer Science and Electronic

Engineering, Hunan University, Changsha, China. His research

interests include machine learning, artificial intelligence, load fore-

casting, and their engineering applications.

Chang SHENG received the M.S. degree in software engineering

from Hunan University, Changsha, China, in 2016. His research

interests include machine learning, load forecasting, and their

engineering applications.

Min JIN received the B.E. degree in automation and the M.S. degree

in control theory and engineering from Central South University of

Technology, Changsha, China, in 1995 and 1997, respectively, and

the Ph.D. degree in control theory and engineering from Central

South University, Changsha, in 2000. From August 2000 to August

2004, she was a Senior Engineer with Dongfang Electronics

Information Industry Company, Yantai, China. From December

2010 to January 2012, she was a Visiting Scholar with the Department

of Electrical and Computer Science, Georgia Institute of Technology,

Atlanta, GA, USA. Since November 2004, she has been with the

College of Computer Science and Electronic Engineering, Hunan

University, Changsha, China, where she is currently a professor. Her

research interests include artificial intelligence, big data and industry

4.0, load forecasting, and their engineering applications.


123

A learning framework based on weighted knowledge transfer ...

Documents

Transcript of A learning framework based on weighted knowledge transfer ...