This project has received funding from the European
Union’s Horizon 2020 research and innovation
programme under grant agreement No 768619
D4.4 Predictive Energy Production
and Demand Algorithms
The RESPOND Consortium 2020
Integrated Demand REsponse
SOlution Towards Energy
POsitive NeighbourhooDs
WP 4 – ICT enabled cooperative Demand
Response model
T4.4: ENERGY PRODUCTION AND DEMAND FORECASTING
Ref. Ares(2020)1858036 - 31/03/2020
WP4: ICT enabled cooperative Demand Response model
D.4.4 Predictive Energy Production and Demand Algorithms
2 | 48
PROJECT ACRONYM RESPOND
DOCUMENT D4.4 Predictive Energy Production and Demand Algorithms
TYPE (DISTRIBUTION LEVEL) ☐ Public
☑ Confidential
☐ Restricted
DELIVERY DUE DATE 31/03/2020
DATE OF DELIVERY 31/03/2020
STATUS AND VERSION 1.0
DELIVERABLE RESPONSIBLE TEK
AUTHOR (S) Iker Esnaola (TEK)
Francisco Javier Diez (TEK)
Meritxell Gomez (TEK)
Dea Pujic (IMP)
Marko Jelic (IMP)
Nikola Tomasevic (IMP)
OFFICIAL REVIEWER(S) Carlos Lopez (FEN)
DOCUMENT HISTORY
ISSUE DATE CONTENT AND CHANGES
V0.1 14/02/2020 Table of content
V0.2 01/03/2020 Contributions from TEK
V0.3 10/03/2020 Contributions from IMP
V0.4 13/03/2020 Unofficial review from TEK
V1.0 30/03/2020 Official review from FEN
EXECUTIVE SUMMARY
The potential of Demand Response (DR) programmes is particularly promising in the residential sector. To
support the implementation of optimal DR programmes, RESPOND aims to develop services that allow the
estimation of produced and consumed energy in dwellings and neighbourhoods within the pilot sites.
Although most energy forecasting approaches are data-driven because of their high performance,
physics-based models also exist. This deliverable follows both approaches, choosing the most suitable one
for each case.
The Energy Production Forecasting service develops models to estimate the production of the RES generation
systems available at the RESPOND pilot sites, that is, PV panels (in Aarhus and the Aran Islands) and Solar
Thermal Collectors (STCs, in Madrid). Random Forest models were used for the PV panels in Aarhus, whilst
Neural Networks were chosen for the STCs. For the Aran Islands, a physical model was employed, as it
requires only parameters that can commonly be found in PV cell datasheets.
Likewise, the Energy Demand Forecasting service has been developed to accurately forecast short-term
electricity demand. This service also covers the estimation of domestic hot water (DHW) consumption in
the Spanish neighbourhood. However, the DHW consumption case is rather complicated and is yet to be
adequately solved. The demand models are based on k-Nearest Neighbours (kNN) algorithms, chosen for
their high performance.
All the developed models have been deployed and their execution is currently automated.
TABLE OF CONTENTS
1. Introduction 9
1.1 Aims and objectives 9
1.2 Relation to other project activities 9
1.3 Deliverable Structure 9
2. Data Mining for Energy Forecasting 10
2.1 The Knowledge Discovery in Databases 10
2.2 Training, Validating and Testing Predictive Models 12
2.3 Putting Models into Production 13
3. Energy Production Forecasting 15
3.1 SoA review 15
3.2 Data availability 16
3.3 Methodology 17
3.4 Results and discussion 18
3.5 Service Deployment 23
4. Energy Demand Forecasting 25
4.1 SoA review 25
4.2 Data Availability 26
4.3 Methodology 29
4.4 Service Deployment 39
5. Conclusions 41
Annex 1 42
LIST OF FIGURES
Figure 1: An overview of the steps that compose the KDD Process. 11
Figure 2: A visualization of train, validation and test dataset splits
(Source: https://tarangshah.com/blog/2017-12-03/train-validation-and-test-sets/) 13
Figure 3: Increase of renewable production with years 15
Figure 4: Dependency between the PV production and solar intensity 16
Figure 5: Example of GUI for PV production in Aarhus 17
Figure 6: Part of Madrid data point list relevant for measuring STC production 17
Figure 7: Madrid topology 18
Figure 8: Example of Aarhus PV forecaster performance 19
Figure 9: Example of Aran Island PV forecaster performance. 21
Figure 10: Example of Madrid STC production forecaster performance 23
Figure 11: Example of part of the weather forecasted data stored in MySQL 23
Figure 12: Example of production forecast values MySQL table data 23
Figure 13: ARIMA's Electric Consumption prediction of a period of 10 days in March for Madrid_02. 30
Figure 14: ARIMA's Electric Consumption prediction of a period of 24 hours for Madrid_02 31
Figure 15: Linear Regression's Electric Consumption prediction for Madrid_02. 33
Figure 16: Electricity Consumption prediction obtained with an SVR model. 34
Figure 17: Real vs forecasted energy consumption of a predictive model with a good performance 35
Figure 18: Real vs forecasted energy consumption of a predictive model with a bad performance 35
Figure 19: Residuals of a predictive model with a good performance. 35
Figure 20: Residuals of a predictive model with a bad performance. 36
Figure 21: Forecasted vs actual electric consumption in Madrid (neighbourhood level) 36
Figure 22: Electric Consumption prediction of a period of 10 days in March for Madrid_02. 37
Figure 23: Electric Consumption prediction of a period of 10 days in March for Aarhus_11. 37
Figure 24: Initial predictions obtained for Madrid Neighbourhood DHW consumption. 38
Figure 25: Predictions obtained for Madrid Neighbourhood DHW consumption with predictive model
trained with data up to November 2019. 38
Figure 26: Predictions obtained for Madrid Neighbourhood DHW consumption with predictive model
trained with data up to March 2020. 39
Figure 27: Deployment of the Energy Demand Forecasting Services. 40
LIST OF TABLES
Table 1: Comparison between different ML approaches for PV production forecasting. 18
Table 2: Constants. 21
Table 3: Static parameters. 21
Table 4: Comparison between different ML approaches for STC production forecasting 22
Table 5: Demand data availability for Madrid. 27
Table 6: Demand data availability for the Aran Islands. 28
Table 7: Demand data availability for Aarhus. 28
Table 8: Comparison between ARIMA and SARIMA models. 30
Table 9: DHW Consumption Prediction Confusion Matrix. 39
ABBREVIATIONS AND ACRONYMS
DR Demand Response
MAE Mean Absolute Error
ML Machine Learning
PV Photovoltaic
RES Renewable Energy Source
STC Solar Thermal Collector
SVR Support Vector Regression
1. INTRODUCTION
1.1 AIMS AND OBJECTIVES
Buildings’ energy consumption has increased dramatically over the last decade due to several factors,
including population growth, the increase in time spent indoors and the increased demand for building
functions and indoor environmental quality [1]. In fact, buildings account for more than 35% of global
energy use and nearly 40% of energy-related CO2 emissions [2]. However, significant energy savings can be
achieved in buildings if they are properly operated.
The residential sector is characterized by many end consumers with relatively low individual energy
demand, but with very high aggregate demand when considered in terms of home clusters, districts and
residential communities. For example, in 2016 the residential sector represented 25.4% of final energy
consumption and 17.4% of gross inland energy consumption in the EU [3]. In this sector, space heating
and water heating are the major end uses, followed by appliances, cooking and lighting [4]. Therefore,
the potential of DR programmes is particularly promising for this sector.
Being able to accurately predict the amount of energy to be produced over a period of time, and knowing
in advance when demand peaks will occur, can contribute to better management of the mismatch between
production and demand, thus allowing the most suitable DR programmes to be suggested to end users. This
is precisely the aim of RESPOND’s Task 4.4: the development of services that allow the estimation of
produced and consumed energy in dwellings and neighbourhoods within the RESPOND pilot sites.
1.2 RELATION TO OTHER PROJECT ACTIVITIES
The main interactions between Task 4.4 and the rest of the RESPOND project activities are listed below:
• WP2: T4.4 builds on the data collected by the central IoT platform designed in T2.1, the early
deployment described in T2.4, and the actual platform deployment in T2.5.
• WP4: T4.4 supports the optimized control within T4.5.
• WP5: T4.4 results are leveraged by the RESPOND mobile app developed in T5.4.
• WP6: T4.4 results will be validated in T6.2 with the methods and criteria defined in T6.1.
1.3 DELIVERABLE STRUCTURE
The rest of the deliverable is structured as follows. Section 2 introduces the energy forecasting topic. Section 3 focuses on the development of Energy Production Forecasting services, while Section 4 focuses on the development of Energy Demand Forecasting services. Finally, conclusions of this task are collected in Section 5.
2. DATA MINING FOR ENERGY FORECASTING
Energy forecasting is crucial for planning optimal energy consumption, and numerous forecasting
approaches have therefore been proposed in the literature.
Autoregressive Integrated Moving Average (ARIMA) models are among the most general classes of models for
forecasting a time series. They aim to describe the autocorrelations in the data, forecasting future
values of a series from its own past values and past forecast errors.
The common conclusion in the literature is that machine learning (ML) models achieve the highest
performance. Therefore, various ML approaches are presented in this section, starting with neural
networks (NNs).
Neural Networks are models generally used for modelling complex dependences between inputs and
outputs. They are capable of extracting relevant features, even ones that have not been identified by
experts, and have therefore been used in many fields, such as image processing, speech recognition and
spell checking. Unfortunately, most of those features are not explainable, which is the biggest drawback
of most ML approaches, including k-Nearest Neighbours (kNN) and support vector regression (SVR).
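The executive summary notes that the demand models are based on kNN algorithms. The sketch below is a
minimal kNN regressor in plain Python, not the RESPOND implementation: it predicts the target of a query
point as the average target of its k closest training samples under Euclidean distance. The feature
layout in the usage note (hour of day, outdoor temperature) is a hypothetical example.

```python
import math


def knn_regress(train_X, train_y, query, k=3):
    """Predict the target of `query` as the mean target of its k nearest
    training samples (Euclidean distance). Inputs should be on comparable scales."""
    if not 0 < k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank training samples by distance to the query point
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    return sum(target for _, target in ranked[:k]) / k
```

For instance, with training rows such as `(hour_of_day, outdoor_temp)` and hourly consumption as the
target, `knn_regress(X, y, (18, 5.0), k=5)` averages the five most similar past hours. Features should
be scaled to comparable ranges first; otherwise the feature with the largest numeric range dominates the
distance.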
On the other hand, the more explainable techniques that have been used include linear regression,
regression trees and random forests. Linear regression models the output as a linear function of the
inputs, giving larger weights to the more important inputs (when the inputs are on comparable scales). It
could be suitable, for example, for photovoltaic production modelling, given the high correlation between
solar irradiance and produced energy. Regression trees partition the space of input variables into
regions and assign an output estimate to each region. A random forest estimate is obtained by averaging
the outputs of many regression trees modelling the same function, each trained on a different random
(bootstrap) sample of the data.
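To show how averaging trees yields a random forest estimate, the sketch below builds a deliberately
simplified forest: every tree is a depth-1 regression tree (a stump) trained on a bootstrap sample of the
data and restricted to a random subset of the input features. Real random forests grow much deeper trees
and re-sample features at every split. The stump depth and all names are simplifying assumptions for
illustration, not the Random Forest configuration used for the Aarhus PV forecaster.

```python
import random


def fit_stump(X, y, features):
    """Depth-1 regression tree: the (feature, threshold) split with the lowest squared error."""
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - mean_l) ** 2 for t in left) + sum((t - mean_r) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, mean_l, mean_r)
    if best is None:  # no split possible (all feature values equal): predict the mean
        mean = sum(y) / len(y)
        return (features[0], float("inf"), mean, mean)
    return best[1:]  # (feature, threshold, left mean, right mean)


def predict_stump(stump, row):
    f, thr, mean_l, mean_r = stump
    return mean_l if row[f] <= thr else mean_r


def fit_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Bagged stumps: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]            # rows drawn with replacement
        feats = rng.sample(range(p), min(max_features, p))    # random subset of features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Random forest estimate: the average of the individual tree outputs."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)
```

Each stump alone can only output two values, but averaging many stumps fitted on different bootstrap
samples smooths the estimate and reduces its variance, which is the core idea behind the random forest.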
Finally, all of the approaches presented above have in common that their parameters are determined
from data using supervised learning techniques. In other words, a large amount of real-world data is used
to reach the optimal set of model parameters. Therefore, the quality of those data strongly influences the
model’s performance and is one of the most relevant aspects when ML approaches are considered. However,
raw data commonly contain numerous errors, which are inevitable due to communication problems, sensor
faults, harsh weather conditions, etc. With all of this in mind, data preprocessing is essential to exploit
the full potential of the proposed ML approaches.
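A minimal preprocessing sketch for a metered time series is shown below. It assumes that readings outside
a physically plausible range (for example, negative PV production) are errors. It marks them, along with
any missing or NaN values, as gaps, and then fills only short gaps by linear interpolation, leaving longer
outages as missing. The bounds and the `max_gap` default are illustrative assumptions rather than values
used in RESPOND.

```python
import math


def _is_valid(value, lower, upper):
    """A reading is valid if present, not NaN and inside [lower, upper]."""
    if value is None or math.isnan(value):
        return False
    return (lower is None or value >= lower) and (upper is None or value <= upper)


def clean_series(values, lower=0.0, upper=None, max_gap=3):
    """Replace implausible readings with None, then linearly interpolate gaps of
    at most `max_gap` consecutive samples; longer gaps and edge gaps stay None."""
    clean = [v if _is_valid(v, lower, upper) else None for v in values]
    n, i = len(clean), 0
    while i < n:
        if clean[i] is not None:
            i += 1
            continue
        j = i
        while j < n and clean[j] is None:  # find the end of this gap
            j += 1
        gap = j - i
        if i > 0 and j < n and gap <= max_gap:  # only short gaps with data on both sides
            a, b = clean[i - 1], clean[j]
            for k in range(i, j):
                clean[k] = a + (b - a) * (k - i + 1) / (gap + 1)
        i = j
    return clean
```

Leaving long gaps unfilled is a deliberate choice: interpolating over hours of missing PV data would
invent a production profile the model would then learn from.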
2.1 THE KNOWLEDGE DISCOVERY IN DATABASES
The KDD (Knowledge Discovery in Databases) is a process leading to the extraction of useful knowledge
from raw data [5]. This process is composed of the following five steps: Data Selection, Preprocessing,
Transformation, Data Mining and Interpretation. It is an interactive and iterative process rather than a
strict workflow. It involves numerous loops and many decisions made between any two of the mentioned
steps. The necessity of having such a flexible process arises from the wide range of methods and
parameter selections that can be applied in each step. An overview of the flow of KDD process steps is
illustrated in Figure 1.
Figure 1: An overview of the steps that compose the KDD Process.
Next, each KDD step is explained.
• Data Selection. It consists of selecting the datasets and the subset of variables or data
samples on which knowledge discovery is going to be performed. With the advent of new
paradigms such as IoT (Internet of Things) or LD (Linked Data), data analysts may get lost in
today's chaotic information universe. In fact, much of the available data may be redundant,
which hinders knowledge extraction and makes it more time- and resource-consuming.
Therefore, to ease the subsequent KDD phases, data analysts need to apply their domain
knowledge to select the sets of data and variables used in the analysis.
• Preprocessing. Different methods are applied to ensure data quality and to prepare the
data for subsequent analysis. Nowadays, datasets are prone to noise, outliers, missing
values and inconsistencies due to their typically large size and their likely origin in
multiple heterogeneous sources. Not only do these data quality issues compromise the
performance of knowledge extraction algorithms, but they may also have a negative impact
on decision-making processes.
• Transformation. The data are transformed into a form that data mining algorithms can work
with and that improves their performance. This phase comprises different tasks, two of which
are particularly relevant: feature generation and feature selection. These two tasks are
related and often applied in sequence, because it is useful to post-process the set of
created features and discard those that have little value.
• Data Mining. The data analysis or discovery algorithm that best matches the data analyst's
goals is applied to search for hidden patterns in the data. The data analyst's role in this
phase consists of selecting a suitable algorithm and fine-tuning it with appropriate
parameters. Furthermore, as each algorithm's performance may vary depending on the input
data, the data analysts’ expertise, and at times even their intuition, play a role in this phase.
• Interpretation. It is the final phase where the results, patterns and models derived are used
to support decision-making processes. This phase also relies on the data analysts’
knowledge in the domain at hand, and even for a domain-expert, this task may end up
being challenging in certain scenarios.
2.2 TRAINING, VALIDATING AND TESTING PREDICTIVE MODELS
Data-driven predictive models depend heavily not only on the quality but also on the amount of
available data. Moreover, to ensure adequate performance of the developed predictive model, the
available data need to be split for training, validation and testing purposes.
• Training Dataset: The sample of data used to fit the model. The developed model sees and
learns from this data.
• Validation Dataset: The sample of data used to provide an unbiased evaluation of a model
fit on the training dataset while tuning model hyperparameters. Hence the model
occasionally sees this data, but it never learns from it.
• Test Dataset: The sample of data used to provide an unbiased evaluation of a final model
fit on the training dataset. The Test dataset provides the gold standard used to evaluate
the model. It is only used once a model is completely trained (using the train and validation
datasets).
Finding an adequate ratio for splitting the available data into training, validation and test sets
depends mainly on two factors: the total number of samples in the available data and the actual model
being trained. Some models need substantial data to train on, so in those cases larger training sets are
needed. Models with few hyperparameters may be easier to validate and fine-tune, so in these cases the
validation set may be smaller. Conversely, models with more hyperparameters may need a large validation
set. Finally, for models with no hyperparameters, or with hyperparameters that cannot be easily tuned, a
validation set may not be necessary. Overall, as with many other aspects of Machine Learning, the
train-validation-test split ratio (shown in Figure 2) is specific to the use case.
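For time-series forecasting, one common practice, assumed here and not stated in the deliverable, is to
split the data chronologically rather than randomly. This keeps the validation and test sets in the
future relative to the training data, so no future information leaks into training. The 70/15/15 default
ratios below are illustrative only.

```python
def chronological_split(samples, train=0.70, val=0.15):
    """Split time-ordered samples into train/validation/test sets without shuffling.
    Whatever remains after the train and validation shares becomes the test set."""
    if not (0 < train and 0 <= val and train + val < 1):
        raise ValueError("ratios must satisfy 0 < train, 0 <= val and train + val < 1")
    n = len(samples)
    n_train = round(n * train)
    n_val = round(n * val)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```

Setting `val=0` covers the case mentioned above of a model with no tunable hyperparameters, where a
validation set may not be needed.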
Figure 2: A visualization of train, validation and test dataset splits (Source: https://tarangshah.com/blog/2017-12-03/train-validation-and-test-sets/)
2.3 PUTTING MODELS INTO PRODUCTION
The deployment of predictive models is the process of making them available in production
environments, where they can provide predictions to other software systems. Only once these models are
deployed to production do they start adding value, which makes deployment a crucial step. However,
deploying machine learning models involves considerable complexity.
There are two main ways to obtain predictions from predictive models put into production: online (or
real-time) predictions and batch predictions. Several factors need to be considered when choosing
between them.
Load implications
Choosing a real-time prediction approach requires managing peak load. Depending on the approach taken
and how the prediction is going to be used, a real-time approach might require a machine with extra
computing power available to provide each prediction within a given Service Level Agreement (SLA). In a
batch approach, by contrast, the computation of predictions can be spread out throughout the day based on
the available capacity.
Infrastructure Implications
Selecting a real-time approach implies a much higher operational responsibility. The system needs to be
monitored, alerts need to be generated when there are issues, and failover needs to be planned for. For
batch prediction, however, the operational burden is much lower: some monitoring is needed and alerting
is desirable, but far less effort is required to track arising issues.
Cost Implications
Real-time predictions also have cost implications. The need for more computing power, without the
ability to spread the load throughout the day, can force the purchase of more computing capacity than
would otherwise be needed, or the payment of spot-price increases. Depending on the approach and
requirements, there may also be extra costs due to the need for more powerful compute capacity to meet
SLAs. Additionally, real-time predictions may require a larger infrastructure footprint. One potential
exception is when predictions are chosen to be computed within the app itself: in that specific scenario,
the cost may end up lower than for a batch approach.
Evaluation Implications
Evaluating prediction performance in real time is more challenging than for batch predictions, and
debugging real-time prediction models is significantly more complex. It requires a log collection
mechanism that records each prediction together with the features that produced it, for later evaluation.
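The sketch below illustrates a batch prediction job that also addresses the evaluation concern above.
Each of the 24 hourly day-ahead predictions is stored together with the features that produced it, so it
can later be compared with the measured values. SQLite from the Python standard library stands in for the
production database, and the table layout, the `model` callable and the feature names are hypothetical
assumptions.

```python
import json
import sqlite3
from datetime import datetime, timedelta


def run_batch_forecast(conn, model, features_by_hour, day_start):
    """Score one day-ahead batch and log every prediction with its input features."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS forecast_log ("
        " target_time TEXT PRIMARY KEY,"   # hour being forecast (ISO 8601)
        " predicted REAL,"                 # model output
        " features TEXT,"                  # JSON copy of the inputs, for later debugging
        " created_at TEXT)"                # when the batch ran
    )
    created_at = datetime.now().isoformat(timespec="seconds")
    rows = [
        ((day_start + timedelta(hours=h)).isoformat(), model(feats), json.dumps(feats), created_at)
        for h, feats in enumerate(features_by_hour)
    ]
    # INSERT OR REPLACE makes a rerun of the same batch overwrite, not duplicate, its rows
    conn.executemany("INSERT OR REPLACE INTO forecast_log VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Because the job runs once per day, it can be scheduled in a low-load window, which is the main
load and cost advantage of the batch approach described above.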
3. ENERGY PRODUCTION FORECASTING
In the 20th century, electrical energy was produced mainly from fossil fuels. However, this raised
environmental concerns, primarily about greenhouse gas emissions, global warming and climate change.
Therefore, renewable energy sources (RES), such as photovoltaic (PV) panels, solar thermal collectors
(STCs) and wind turbines (WTs), have more recently been incorporated into energy production to decrease
the use of fossil fuels, as shown in Figure 3 from [6]. Nonetheless, renewable production depends highly
on weather conditions, so this change has contributed significantly to destabilizing the grid. To improve
the stability and quality of grid systems, consumption and production had to be planned ahead, which in
turn required the development of energy production forecasters, the main focus of this section.
3.1 SOA REVIEW
Before explaining the developed models, this subsection gives a brief summary of the state of the art.
As stated in the literature, PV forecasting approaches can be divided into three groups, namely physical,
statistical and hybrid models [7,8], according to how production is estimated from the required inputs.
What all of these methodologies have in common are the inputs, as they all model the dependence of
renewable production on weather conditions. Physical approaches were the first to be presented; they
consist of a set of mathematical equations and physical laws that model the renewable system. Even though
they have been replaced by novel data-driven approaches in the field of PV forecasting, these models are
practically the only ones presented in the literature for STC production [9]. For PV forecasting, physical
models are usually outperformed by the data-driven approaches found in recent SoA papers. Nonetheless,
because their estimation is based on mathematical modelling of the system, their main advantage is that
they do not need any historical data, so in use cases where historical data are inaccessible, they are the
only applicable option. However, applying these methodologies requires numerous physical parameters. This
is a significant drawback, since these characteristics are usually hard to obtain. On the other hand,
data-driven models, both statistical time-series models (AR, ARMA, ARIMA, ARMAX, NARMAX, etc.) and machine
learning models (neural networks, support vector machines, random forests, kNNs, etc.), require large
amounts of historical data, but are capable of much more precise modelling, which significantly improves
performance. Additionally, they do not require any physical parameters. Finally, hybrid approaches combine
the benefits of the previously explained models to further improve on them. As part of this task, several
models were developed, both data-driven and physical, and their results and comparison are given later in
this deliverable.
Figure 3: Increase of renewable production with years
3.2 DATA AVAILABILITY
As renewable production forecasting is mainly motivated by the need to plan and reschedule production
and consumption, this task was designed to provide the inputs required for the planning and optimization
carried out in Tasks 4.2 and 4.3. Therefore, the horizon and time resolution of the forecasted output
correspond to those defined in D4.2 and D4.3: day-ahead forecasting and optimization with hourly
resolution.
Within the RESPOND project, production forecasters were to be developed for three pilots: Aarhus
(Denmark), the Aran Islands (Ireland) and Madrid (Spain). PV panels are installed in Aarhus and the Aran
Islands, while STCs are installed in Madrid. As already explained, current state-of-the-art solutions for
day-ahead PV production forecasting are mostly based on data-driven techniques, so a brief analysis of the
necessary data is given here.
As the production of renewable energy sources depends highly on weather conditions, forecasted weather
parameters had to be provided with a horizon and time resolution matching those of the forecaster.
Additionally, if data-driven models were to be considered, historical weather data had to be provided as
well. Since the correlation between PV and STC production and solar radiation is extremely high (Figure 4
from [10]), a weather service providing radiation information was needed. The weather forecasting service
that fulfilled all of these requirements, and that has been used in this task, is WeatherBit¹. The
relevant data obtained through the weather service have been stored in the MySQL database of the RESPOND
platform.
1 https://www.weatherbit.io/
Figure 4: Dependency between the PV production and solar intensity
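Matching the weather forecast to the forecaster's day-ahead horizon and hourly resolution amounts to
selecting the 24 hourly records that cover the next day. The sketch below assumes the forecast has
already been parsed into records keyed by a naive `datetime` timestamp. The record layout and field names
(`timestamp`, `solar_rad`) are hypothetical and do not reproduce the WeatherBit response schema; time
zones and daylight-saving transitions are ignored for simplicity.

```python
from datetime import datetime, timedelta


def day_ahead_inputs(forecast_records, now):
    """Return the 24 hourly forecast records covering the calendar day after `now`,
    in time order; fail loudly if any hour is missing."""
    start = datetime(now.year, now.month, now.day) + timedelta(days=1)  # next midnight
    by_time = {rec["timestamp"]: rec for rec in forecast_records}
    hours = [start + timedelta(hours=h) for h in range(24)]
    missing = [t for t in hours if t not in by_time]
    if missing:
        raise ValueError(f"weather forecast lacks {len(missing)} of the 24 required hours")
    return [by_time[t] for t in hours]
```

Raising an error on missing hours, instead of silently forecasting with gaps, keeps an incomplete weather
download from producing a misleading production forecast.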
3.3 METHODOLOGY
As described above, current state-of-the-art solutions focus primarily on Machine Learning (ML)
approaches. However, their use depends on data availability. For the Aarhus pilot, production data for
all the neighbouring pilot buildings within the RESPOND project are available online² for the previous
3 years, as shown in Figure 5. Therefore, all the data relevant for training ML models were available, and
ML models were chosen for PV production forecasting in Denmark.
Unlike the Aarhus pilot case, where several pilot buildings share a single PV plant, the Aran Islands pilot consists of geographically separated houses, some of which have their own PV production. However, production measurements in InfluxDB were available for only 2 of the 6 houses with PV. Since the panels differ from one house to another, the available data could not adequately represent the missing data, so it was decided to employ a physical modelling approach for the Aran Islands pilot case. Additionally, this created an opportunity to benchmark and compare different techniques in similar scenarios.
Finally, for the Madrid pilot, various sensors for temperature and heat metering were deployed in the boiler room, as shown in Figure 6, with the measurement with ID “TEK-0000001-009” corresponding to the STC production input of the optimizer. Namely, as explained in D4.2 and shown in Figure 7, the topology of the hot water system is modelled through the Energy Hub, which takes the forecast of this measurement as an input. Taking all of the above into consideration, it was decided to use an ML approach, as enough data were available. An additional argument for this choice is that the application of different ML techniques to STC production forecasting is still largely unexplored.
2 https://evishine.dk/ALBOA
Figure 6: Part of Madrid data point list relevant for measuring STC production
Figure 5: Example of GUI for PV production in Aarhus
3.4 RESULTS AND DISCUSSION
Aarhus pilot case
As previously described, a machine learning approach has been deployed for the Danish pilot. In order to achieve the highest possible performance, several ML methods were tested: support vector regression (SVR), linear regression (LR), neural networks (NN), k-nearest neighbours (kNN) and random forest (RF). For all of them, the input parameters were: relative humidity, wind speed, pressure, dew point, UV, wind direction, temperature, cloud coverage and global horizontal irradiance (GHI). The output of each model was a single value representing the production at the timestamp of the input weather parameters. In other words, for the day-ahead hourly production forecast, 24 arrays of forecasted weather parameters are passed to the model to estimate 24 outputs.
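The per-hour inference scheme described above can be sketched as follows. This is a minimal illustration, not the project's code: the function name `day_ahead_forecast`, the feature key names and their order, and the clipping of negative outputs to zero are assumptions made for the example; `model` stands for any trained single-output regressor (SVR, NN, kNN, etc.).

```python
from typing import Callable, Dict, List, Sequence

# Inputs listed for the Aarhus PV models; key names and order are assumptions.
FEATURES = [
    "relative_humidity", "wind_speed", "pressure", "dew_point", "uv",
    "wind_direction", "temperature", "cloud_coverage", "ghi",
]


def day_ahead_forecast(
    model: Callable[[Sequence[float]], float],
    hourly_weather: List[Dict[str, float]],
) -> List[float]:
    """Call a single-output model once per forecasted hour (24 calls per day)."""
    if len(hourly_weather) != 24:
        raise ValueError("expected 24 hourly weather records")
    forecast = []
    for record in hourly_weather:
        x = [record[name] for name in FEATURES]  # one input array per hour
        # Clip to zero (assumption): PV production cannot be negative.
        forecast.append(max(0.0, model(x)))
    return forecast


# Usage with a toy stand-in model proportional to GHI (illustrative only).
weather = [{name: 0.0 for name in FEATURES} for _ in range(24)]
weather[12]["ghi"] = 500.0
print(day_ahead_forecast(lambda x: 0.01 * x[-1], weather)[12])
```

Because each hour is predicted independently from its own weather vector, the same trained model serves any forecast horizon for which weather forecasts exist.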
Table 1: Comparison between different ML approaches for PV production forecasting.

| Model | MAE [%] (Aarhus) |
|---|---|
| SVR - RBF | 8.99 |
| SVR - linear | 8.87 |
| SVR - sigmoid | 8.79 |
| Linear regression | 9.63 |
| Neural network | 8.5 |
| KNN | 8.75 |
| KNN weighted | 8.45 |
| Random forest | 8.61 |

Figure 7: Madrid topology
After the inputs and outputs were normalized, the models were developed and trained in Python. For all approaches, the optimal set of hyperparameters was obtained using grid search. For linear regression, for example, two hyperparameters were considered: the polynomial degree n of the inputs and the regularization factor alpha. For each combination of parameters, the Mean Squared Error (MSE) and the Mean Absolute Error (MAE) were computed on the training, validation and test sets over 5 independent training runs, and the optimal parameters were selected based on the mean validation performance.
Finally, once the optimal set of parameters had been established for each method, the methods were compared against each other, as shown in Table 1. Comparing MAEs, the weighted kNN (8.45%) and the neural network (8.5%) achieve the best performance, and the neural network was chosen as the final model for the Aarhus day-ahead hourly production forecaster. An example of this model's forecast for one day is given in Figure 8; together with the MAE of 8.5%, it indicates that the model is adequate for deployment on the RESPOND platform.
Figure 8: Example of Aarhus PV forecaster performance
Aran Island pilot case
In the Aran Islands pilot site, due to the lack of data, a physical model had to be employed for production forecasting for the 6 houses with PV: h1, h2, h3, h4, h5 and h12. The model presented in [11] was selected because the physical data it requires are commonly given in PV panel data sheets, which makes it applicable in practice. All constants and static parameters relevant for the model are listed in Table 2 and Table 3, in basic SI units unless otherwise noted. All static parameters were requested from the pilot coordinator; where data were not available, the most common values were adopted (e.g. the optimal angle was adopted for $\beta$). Apart from the static parameters, the model requires 5 dynamic parameters related to time and weather: GHI, temperature, day number of the year, cloud coverage and current time. The model equations are given next.
The final estimate of the PV power production $P_{PV}$ is given as

$$P_{PV} = Y_{PV}\, f_{PV}\left(\frac{G_t}{G_{t,STC}}\right)\left[1 + a_p\left(T_c - T_{c,STC}\right)\right]$$

where the estimated cell temperature $T_c$, with $T_a$ the ambient temperature, is given as

$$T_c = \frac{T_a + \left(T_{c,NOCT} - T_{a,NOCT}\right)\dfrac{G_t}{G_{t,NOCT}}\left[1 - \dfrac{\eta_{mp,STC}\left(1 - a_p T_{c,STC}\right)}{\tau\alpha}\right]}{1 + \left(T_{c,NOCT} - T_{a,NOCT}\right)\dfrac{G_t}{G_{t,NOCT}}\,\dfrac{a_p\,\eta_{mp,STC}}{\tau\alpha}}$$

while $\eta_{mp,STC}$, the maximum power point efficiency under standard test conditions, is given as

$$\eta_{mp,STC} = \frac{Y_{PV}}{A_{PV}\,G_{t,STC}}$$
and the solar radiation incident on the PV array $G_t$ as

$$G_t = \left(G_b + G_d A_i\right) R_b + G_d\left(1 - A_i\right)\left(\frac{1 + \cos\beta}{2}\right)\left[1 + f\sin^3\!\left(\frac{\beta}{2}\right)\right] + G\rho_g\left(\frac{1 - \cos\beta}{2}\right)$$

The ratio $R_b$ of the beam radiation on the tilted surface to the beam radiation on the horizontal surface is given as

$$R_b = \frac{\cos\vartheta}{\cos\vartheta_z}$$

where $\vartheta$ is the angle of incidence and $\vartheta_z$ the zenith angle (both in °), with

$$\cos\vartheta = \sin\delta\sin\phi\cos\beta - \sin\delta\cos\phi\sin\beta\cos\gamma + \cos\delta\cos\phi\cos\beta\cos\omega + \cos\delta\sin\phi\sin\beta\cos\gamma\cos\omega + \cos\delta\sin\beta\sin\gamma\sin\omega$$

$$\cos\vartheta_z = \cos\phi\cos\delta\cos\omega + \sin\phi\sin\delta$$

Here $\omega$ is the average hour angle, i.e. the arithmetic mean of the hour angles $\omega_1$ and $\omega_2$ at the beginning and ending timestamps $t_{c1}$ and $t_{c2}$ of the time step:

$$\omega = \frac{\omega_1 + \omega_2}{2}, \qquad \omega_i = 15\,(t_{si} - 12), \quad i = 1, 2$$

where $t_{s1}$ and $t_{s2}$ are the beginning and ending timestamps in solar time (in hours):

$$t_{si} = t_{ci} + \frac{\lambda}{15} - Z_c + E$$

$R_b$ is limited to the interval $[R_{b,MIN}, R_{b,MAX}]$, where $R_{b,MIN} = -1$ and $R_{b,MAX} = 1$ were determined experimentally. The solar declination $\delta$, the factor $B$ depending on the Earth's position with respect to the sun, and the equation of time $E$ (in hours) are given as

$$\delta = 23.45\sin\left(\frac{360\,(284 + n)}{365}\right)$$

$$B = \frac{360\,(n - 1)}{365}$$

$$E = 3.82\left(0.000075 + 0.001868\cos B - 0.032077\sin B - 0.014615\cos 2B - 0.04089\sin 2B\right)$$

where $n$ is the day number of the year. Additionally, the diffuse $G_d$, beam $G_b$, extraterrestrial horizontal $G_o$ and extraterrestrial normal $G_{on}$ radiation are calculated as follows:

$$G_d = \begin{cases} G\,(1 - 0.09\,k_t), & k_t \le 0.22 \\ G\,(0.9511 - 0.1604\,k_t + 4.388\,k_t^2 - 16.638\,k_t^3 + 12.336\,k_t^4), & 0.22 < k_t \le 0.8 \\ 0.165\,G, & k_t > 0.8 \end{cases}$$

$$G_b = G - G_d$$

$$G_o = \frac{12}{\pi}\,G_{on}\left[\cos\phi\cos\delta\left(\sin\omega_2 - \sin\omega_1\right) + \frac{\pi\left(\omega_2 - \omega_1\right)}{180}\sin\phi\sin\delta\right]$$

$$G_{on} = G_{SC}\left(1 + 0.033\cos\frac{360\,n}{365}\right)$$

where

$$G = \left(\mathit{offset} + (1 - \mathit{offset})(1 - \mathit{cloud\ coverage})\right)\cdot \mathit{ghi}$$

and

$$k_t = \frac{G}{G_o}$$

Finally, the horizon brightening factor $f$ and the anisotropy index $A_i$ are given as

$$f = \sqrt{\frac{G_b}{G}}, \qquad A_i = \frac{G_b}{G_o}$$
Table 2: Constants.

| Label | Description | Value | Unit |
|---|---|---|---|
| $G_{SC}$ | Solar constant | 1367 | W/m² |
| $\rho_g$ | Ground reflectance | 0.2 | - |
| $T_{a,NOCT}$ | Ambient temperature at which the nominal operating cell temperature is defined | 20 | °C |
| $G_{t,NOCT}$ | Solar radiation at which the nominal operating cell temperature is defined | 800 | W/m² |
| $G_{t,STC}$ | Incident radiation at standard test conditions | 1000 | W/m² |
| $T_{c,STC}$ | Cell temperature at standard test conditions | 25 | °C |
| $f_{PV}$ | Derating factor | 0.8 | - |
| $\tau\alpha$ | Product of the solar transmittance and solar absorptance | 0.9 | - |
| $\mathit{offset}$ | Parameter for the GHI correction according to cloud coverage | 0.2 | - |

Table 3: Static parameters.

| Label | Description | Unit | H1 | H2 | H3 | H4 | H5 | H12 |
|---|---|---|---|---|---|---|---|---|
| $\lambda$ | Longitude | ° | -9.686 | -9.662 | -9.687 | -9.685 | -9.685 | -9.663 |
| $\phi$ | Latitude | ° | 53.131 | 53.101 | 53.129 | 53.129 | 53.129 | 53.124 |
| $Z_c$ | Time zone offset | h | 1 | 1 | 1 | 1 | 1 | 1 |
| $\beta$ | Slope of the PV surface | ° | 32 | 32 | 32 | 32 | 32 | 32 |
| $\gamma$ | Azimuth of the PV surface | ° | 0 | 0 | 0 | 0 | 0 | 0 |
| $Y_{PV}$ | Rated capacity of the PV array | W | 2000 | 4000 | 2000 | 2000 | 2000 | 2000 |
| $a_p$ | Temperature coefficient | 1/°C | -0.004 | -0.004 | -0.004 | -0.004 | -0.004 | -0.004 |
| $A_{PV}$ | Surface area of the PV array | m² | 13.04 | 26.08 | 13.04 | 13.04 | 13.06 | 13.04 |
| $T_{c,NOCT}$ | Nominal operating cell temperature | °C | 45.3 | 45.3 | 45.3 | 45.3 | 45.3 | 45.3 |
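For illustration, the model equations can be chained into a single function. This is a sketch under stated assumptions rather than the deployed RESPOND code: the names `House` and `pv_power` are hypothetical, cloud coverage is taken as a fraction in [0, 1], angles are in degrees, the hour angle uses 15° per hour of solar time, and the early return of 0 when there is no irradiance or the sun is below the horizon, as well as the clipping of negative power to zero, are guards added for the sketch. Constants follow Table 2 and house parameters follow Table 3.

```python
import math
from dataclasses import dataclass

# Table 2 constants (W/m2, degC or dimensionless)
G_SC, RHO_G = 1367.0, 0.2
T_A_NOCT, G_T_NOCT = 20.0, 800.0
G_T_STC, T_C_STC = 1000.0, 25.0
F_PV, TAU_ALPHA, OFFSET = 0.8, 0.9, 0.2
RB_MIN, RB_MAX = -1.0, 1.0


@dataclass
class House:
    """Static parameters of one house (Table 3); angles in degrees."""
    lon: float
    lat: float
    zc: float        # time zone offset [h]
    beta: float      # slope
    gamma: float     # azimuth
    y_pv: float      # rated capacity [W]
    a_p: float       # temperature coefficient [1/degC]
    a_pv: float      # PV array area [m2]
    t_c_noct: float  # nominal operating cell temperature [degC]


def sind(x: float) -> float:
    return math.sin(math.radians(x))


def cosd(x: float) -> float:
    return math.cos(math.radians(x))


def pv_power(house: House, ghi: float, cloud: float, t_amb: float,
             n: int, t_c1: float, t_c2: float) -> float:
    """Estimated PV power [W] for the civil-time step [t_c1, t_c2] (hours) of day n."""
    g = (OFFSET + (1 - OFFSET) * (1 - cloud)) * ghi  # cloud fraction in [0, 1] (assumption)
    if g <= 0:
        return 0.0  # no irradiance
    delta = 23.45 * sind(360 * (284 + n) / 365)
    b = 360 * (n - 1) / 365
    e = 3.82 * (0.000075 + 0.001868 * cosd(b) - 0.032077 * sind(b)
                - 0.014615 * cosd(2 * b) - 0.04089 * sind(2 * b))
    # Hour angles [deg] from solar time t_s = t_c + lon/15 - Z_c + E
    w1, w2 = (15 * (t + house.lon / 15 - house.zc + e - 12) for t in (t_c1, t_c2))
    w = (w1 + w2) / 2
    phi, beta, gamma = house.lat, house.beta, house.gamma
    cos_th = (sind(delta) * sind(phi) * cosd(beta)
              - sind(delta) * cosd(phi) * sind(beta) * cosd(gamma)
              + cosd(delta) * cosd(phi) * cosd(beta) * cosd(w)
              + cosd(delta) * sind(phi) * sind(beta) * cosd(gamma) * cosd(w)
              + cosd(delta) * sind(beta) * sind(gamma) * sind(w))
    cos_thz = cosd(phi) * cosd(delta) * cosd(w) + sind(phi) * sind(delta)
    g_on = G_SC * (1 + 0.033 * cosd(360 * n / 365))
    g_o = 12 / math.pi * g_on * (cosd(phi) * cosd(delta) * (sind(w2) - sind(w1))
                                 + math.pi * (w2 - w1) / 180 * sind(phi) * sind(delta))
    if cos_thz <= 0 or g_o <= 0:
        return 0.0  # sun below the horizon (guard added for this sketch)
    rb = min(max(cos_th / cos_thz, RB_MIN), RB_MAX)
    kt = g / g_o
    if kt <= 0.22:
        g_d = g * (1 - 0.09 * kt)
    elif kt <= 0.8:
        g_d = g * (0.9511 - 0.1604 * kt + 4.388 * kt ** 2
                   - 16.638 * kt ** 3 + 12.336 * kt ** 4)
    else:
        g_d = 0.165 * g
    g_b = max(g - g_d, 0.0)
    f, a_i = math.sqrt(g_b / g), g_b / g_o
    g_t = ((g_b + g_d * a_i) * rb
           + g_d * (1 - a_i) * (1 + cosd(beta)) / 2 * (1 + f * sind(beta / 2) ** 3)
           + g * RHO_G * (1 - cosd(beta)) / 2)
    eta = house.y_pv / (house.a_pv * G_T_STC)
    k = (house.t_c_noct - T_A_NOCT) * g_t / G_T_NOCT
    t_c = ((t_amb + k * (1 - eta * (1 - house.a_p * T_C_STC) / TAU_ALPHA))
           / (1 + k * house.a_p * eta / TAU_ALPHA))
    p = house.y_pv * F_PV * g_t / G_T_STC * (1 + house.a_p * (t_c - T_C_STC))
    return max(p, 0.0)  # clip negative estimates (assumption)
```

For example, house H3 for a one-hour step at midday in late June could be evaluated as `pv_power(House(-9.687, 53.129, 1, 32, 0, 2000, -0.004, 13.04, 45.3), 600.0, 0.2, 15.0, 172, 12.0, 13.0)`; a day-ahead forecaster would call it once per hour of the horizon with the forecasted GHI, cloud coverage and temperature.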
Finally, an example of one-day production forecasting performance for house 3 is given in Figure 9. The estimate deviates significantly from the real production compared with the Aarhus model, which was expected, since data-driven models usually achieve higher performance. However, as an ML approach was not applicable in the Aran use case, the MAE of 21% achieved on house 3 can be considered acceptable.
Figure 9: Example of Aran Island PV forecaster performance.
Finally, it should be pointed out that, in the case of Aran, the production forecast is calculated separately for each house with renewable energy sources. Apart from the fact that the panels differ between households, and more importantly, the optimization uses different energy hubs because of differences in topology between the houses (e.g. having electrical storage or not, or having different load profiles because of an electric vehicle), which makes separate production and demand forecasts necessary for these hubs. More details are given in the integration subsection.
Madrid pilot case
Similarly to the Aarhus case, the Madrid production forecasting model was ML-based and involved benchmarking various approaches (SVR, LR, NNs, kNNs and RF), with hyperparameters optimized using grid search (shown in Annex 1). The output of these models was an estimate of renewable production at the same time as the corresponding input weather parameters, which included relative humidity, wind speed, pressure, dew point, UV, wind direction, temperature, cloud coverage, global horizontal irradiance (GHI), diffuse horizontal irradiance (DHI) and direct normal irradiance (DNI). However, the measurements of this output stored in InfluxDB are obtained using a sensor that generates a pulse after each 1 kWh of energy, so the measurements have a very coarse resolution and are highly imprecise.
Therefore, to compensate for the lack of precision in the measurements with a more precise model, previous STC production values were added as inputs. The correlation between STC production at time $t$ and $t + \Delta t$ increases as $\Delta t$ decreases, which motivated the inclusion of 5 additional inputs corresponding to the production in the 5 previous hours. Apart from this change, the approach for developing the STC forecasting model was the same as for Aarhus: hyperparameters were optimized using grid search, and the final comparison between the models is given in Table 4, which shows that the random forest algorithm, with an MAE of 6.2%, is best suited for STC production forecasting. An example of the model estimate for one day is given in Figure 10, corroborating the conclusion that this model is adequate for deployment on the RESPOND platform.
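The lagged-input construction for the Madrid STC model can be sketched as below. The function name `build_rows` and the data layout (hourly lists aligned by index) are assumptions for illustration. The text does not specify how the lagged inputs are filled for forecast hours whose previous production has not been measured yet (one common option is to feed back the model's own predictions recursively), so the sketch only covers building training rows from history.

```python
from typing import List, Sequence, Tuple

N_LAGS = 5  # production in the 5 previous hours


def build_rows(weather: Sequence[Sequence[float]],
               production: Sequence[float]) -> Tuple[List[List[float]], List[float]]:
    """Append the previous N_LAGS hourly STC productions to each weather vector.

    weather[t] holds the weather inputs at hour t and production[t] the measured
    STC production at hour t. Hours without a full lag window are skipped.
    """
    if len(weather) != len(production):
        raise ValueError("weather and production must be aligned hour by hour")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(N_LAGS, len(production)):
        lags = [production[t - k] for k in range(1, N_LAGS + 1)]  # t-1, ..., t-5
        X.append(list(weather[t]) + lags)
        y.append(production[t])
    return X, y
```

The resulting rows can be fed to any of the benchmarked regressors unchanged, since the lags are simply extra input columns.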
| Model | MAE [%] (Madrid) |
|---|---|
| SVR - RBF | 8.52 |
| SVR - linear | 9.17 |
| SVR - sigmoid | 9.01 |
| Linear regression | 8.60 |
| Neural network | 7.63 |
| KNN | 7.90 |
| KNN weighted | 7.85 |
| Random forest | 6.2 |

Table 4: Comparison between different ML approaches for STC production forecasting
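The hyperparameter selection described for the Aarhus and Madrid models (a grid of candidate values, 5 independent training runs per combination, selection by mean validation error) can be sketched with the standard library only. The helper names `grid_search` and `fit_knn`, the toy kNN regressor and the data are illustrative assumptions; in practice a library implementation of each model family would be plugged in as `fit`, and the repeated runs only matter for stochastic learners such as neural networks or random forests.

```python
import itertools
import random
import statistics
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
Predictor = Callable[[Vector], float]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return statistics.fmean(abs(a - b) for a, b in zip(y_true, y_pred))


def grid_search(fit, grid: Dict[str, list], train, val, n_runs: int = 5, seed: int = 0):
    """Return (best_params, best_mean_val_mae) over the Cartesian product of `grid`.

    fit(params, X, y, rng) must return a predictor; each combination is trained
    n_runs times and scored by its mean validation MAE.
    """
    rng = random.Random(seed)
    names = sorted(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for _ in range(n_runs):
            predict = fit(params, train[0], train[1], rng)
            scores.append(mae(val[1], [predict(x) for x in val[0]]))
        score = statistics.fmean(scores)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def fit_knn(params, X, y, rng) -> Predictor:
    """Toy k-nearest-neighbours regressor (deterministic, so rng is unused)."""
    k = params["k"]

    def sq_dist(i: int, x: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(X[i], x))

    def predict(x: Vector) -> float:
        nearest = sorted(range(len(X)), key=lambda i: sq_dist(i, x))[:k]
        return statistics.fmean(y[i] for i in nearest)

    return predict


X_train = [[float(i)] for i in range(10)]
y_train = [2.0 * i for i in range(10)]
val = ([[2.5], [6.5]], [5.0, 13.0])
print(grid_search(fit_knn, {"k": [1, 2, 5]}, (X_train, y_train), val))
```

The best combination would then be retrained and evaluated once on the held-out test set before the model families are compared, as in Table 1 and Table 4.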
3.5 SERVICE DEPLOYMENT
As the last step in the development of the production forecast services, integration and deployment were carried out. A forecaster service was developed on top of the models. It was designed to obtain all the necessary inputs, perform the calculation, and store the outputs, which are then used by other parts of the RESPOND platform (e.g. the optimizer).
For input data collection, the relevant parameters are obtained from the MySQL database in which the WeatherBit data are stored, as shown in Figure 11. Apart from the weather data, the previous STC production (used by the Madrid model) is obtained from InfluxDB as a dynamic input. For all pilots, input data are obtained with a horizon of 1 day and hourly resolution, matching the output. The output is an array of 24 values stored in the MySQL database. Depending on the pilot, outputs correspond either to the neighbourhood level (Aarhus, Madrid) or to the house level (Aran), as predefined by the optimizer's requirements. It should be pointed out that, for all pilot sites, both neighbourhood-level and household-level values can be calculated from the stored values, either by proportionally downscaling or by summing up the estimates. Examples of stored values in ‘production_forecast_values’ are presented in Figure 12, which shows that each forecasted production value ‘value’ for the corresponding time interval between ‘timestamp_start’ and ‘timestamp_end’ is labelled with the ‘load_type_id’ (electrical load, thermal load, dhw) and the ‘location_id’ (Aarhus, Aran, Madrid and individual households). Finally, this service is deployed on the server using Apache OpenWhisk, and its execution is orchestrated by the master scheduler, which controls the order of services in the control loop.
Figure 11 - Example of part of the weather forecasted data stored in MySQL
Figure 12 - Example of production forecast values MySQL table data
Figure 10 - Example of Madrid STC production forecaster performance
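Shaping the 24-value output into rows of `production_forecast_values`, together with the summing and proportional downscaling mentioned above, can be sketched as follows. The function names, the numeric `load_type_id`/`location_id` values and the downscaling share are hypothetical; only the column names come from the text, and the actual MySQL writes and OpenWhisk action wiring are omitted.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Sequence


def to_rows(forecast: Sequence[float], day_start: datetime,
            load_type_id: int, location_id: int) -> List[Dict]:
    """Turn a 24-value hourly forecast into rows shaped like production_forecast_values."""
    if len(forecast) != 24:
        raise ValueError("expected 24 hourly values")
    rows = []
    for hour, value in enumerate(forecast):
        start = day_start + timedelta(hours=hour)
        rows.append({
            "value": value,
            "timestamp_start": start,
            "timestamp_end": start + timedelta(hours=1),
            "load_type_id": load_type_id,
            "location_id": location_id,
        })
    return rows


def sum_houses(house_forecasts: Sequence[Sequence[float]]) -> List[float]:
    """Neighbourhood forecast as the hour-by-hour sum of house forecasts."""
    return [sum(hour_values) for hour_values in zip(*house_forecasts)]


def downscale(neighbourhood: Sequence[float], share: float) -> List[float]:
    """House forecast as a proportional share of a neighbourhood forecast."""
    return [share * v for v in neighbourhood]
```

Keeping the aggregation helpers separate from row construction lets the same stored values serve both the house-level (Aran) and neighbourhood-level (Aarhus, Madrid) views.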
4. ENERGY DEMAND FORECASTING
The ability to accurately forecast short-term electricity demand can help power system operators and market participants make sustainable electricity planning decisions and secure the electricity supply for consumers [12]. Residential electricity consumption is expected to be less regular than that of commercial buildings. Indeed, electricity usage at the individual household level shows high variance, since it depends on users' lifestyle, occupancy behaviour, building characteristics and calendar information [13,14].
4.1 SOA REVIEW
There is extensive research on energy demand forecasting. One study investigates electricity consumption forecasting for fifteen anonymous individual households using a Support Vector Regression (SVR) modelling approach, applied to both daily and hourly data granularity [15]. In this experiment, the households' occupancy, dwelling properties and socioeconomic status were unknown. Therefore, aggregating hourly consumption to daily consumption was an effective way to mitigate the impact of randomness in the hourly behaviour of family members.
Another study [16] presents a compressive load forecasting approach incorporating both temporal and spatial information, under the assumption that there usually exists an intrinsic low-dimensional structure governing the data recorded from a collection of residential houses, and that exploiting this structure in load forecasting can improve forecasting performance. The proposed method, called nonuniform CST-LF, is inspired by Compressive Sensing (CS) and structured-sparse recovery algorithms; it was tested against various benchmark models using real, high-quality data, showing that it improves short-term electric demand forecasting.
A further study showed how calendar effects, forecasting granularity and the length of the training set affect the accuracy of day-ahead load forecasts for residential customers [17]. Regression trees, neural networks and support vector regression were tested, and regression trees obtained the best results. When historical load profiles with daily and weekly seasonality are combined with weather data, explicit calendar effects have very low predictive power. In the setting studied in the article, forecast errors could be reduced by using a coarser forecast granularity. It was also found that one year of historical data is enough to develop a load forecast model for residential customers, as further increasing the training dataset brings only a marginal benefit.
However, energy consumption prediction is not limited to electricity. The forecasting of DHW (Domestic Hot Water) consumption has also proven to be of interest, as it has the potential to reduce the energy consumption of hot water systems. In this regard, one study proposed a recurrent neural network trained with the measured DHW consumption of a 40-unit residential building in Quebec City (Canada) to predict future consumption [18]. It was found that the water consumption profile of the building changed from day to day throughout the year and had an important noise component. The predictive model developed in this work paired a recurrent neural network, predicting the filtered domestic hot water demand, with a random forest predicting the noise signal.
All this evidence reinforces the need to develop a service that accurately forecasts energy consumption at the RESPOND pilot sites, with a view to optimising it.
4.2 DATA AVAILABILITY
Prior to the development of RESPOND’s Energy Demand Forecasting service, the available data sources were analysed.
Time series data were collected in InfluxDB and, in order to ease their analysis, a Java-based application was developed using the influxdb-java client library. This application (influxdbClient.jar) can be executed on any system with access to the database, and it allows queries to be executed and their results saved in different file formats.
Furthermore, the database connection and the query parameters of this application can be configured by the user. Regarding connection settings, apart from the endpoint parameters, secure communication (SSL) and authorized access can be configured. Regarding query settings, the connect, write and read timeouts and the maximum number of records expected to be retrieved from the database can be configured. The InfluxDB settings are configured as follows:
influxdb.ip=
influxdb.port=
influxdb.enablessl=
keystore.path=
keystore.passwd=
influxdb.database=
influxdb.user=
influxdb.password=
influxdb.connectTimeout=
influxdb.writeTimeout=
influxdb.readTimeout=
influxdb.maxRecords=
Finally, to export InfluxDB query results, the application is executed with the InfluxDB query and the name of the file where the results will be saved as arguments. For example:
sudo java -jar influxdbClient.jar "InfluxDBquery" JSONFile
This application was used to export the available data in JSON format. In order to evaluate the quality of these data, the following indicators have been assessed:
• Completeness. It refers to the degree of presence of attributes in the data set, that is, the percentage of data available. Three metrics related to data loss are calculated: Completeness gives the number of lost data points, while the other two indicators are the complements of the percentage of lost observations and of lost variables, respectively.
• Time uniqueness. When data are received from sensors, repetitions in the values of the time variable are quantified. This metric gives the percentage of unique dates.
• Precision. It measures how representative the data are, that is, how correct the available data are for each variable. Given an upper and a lower limit, the percentage of values received outside these limits is calculated; this metric is called Range. Note that outliers are detected in this process. Furthermore, for the attributes of interest, the deviation from the average values is measured and the data dispersion is quantified by means of three different metrics: Consistency, Typicity and Moderation.
• Timeliness. It checks the punctuality of data by calculating the uniformity of the time variable. In time series, data are usually expected to be received at uniform time intervals; this indicator calculates the percentage of gaps between observations that exceed the expected interval.
• Format. It refers to the percentage of data received in a format different from the one expected for the information to be consistent.
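A simplified version of some of these indicators can be sketched as follows. The original evaluation was done in an R script that is not reproduced here; this Python sketch, its function name and its exact definitions are assumptions. Each indicator is expressed as a quality score where 100% is best (for Range and Timeliness this is the complement of the share of out-of-range values and of late observations), and Consistency, Typicity and Moderation are not covered because their definitions are not given.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Sequence, Tuple

Record = Tuple[datetime, Optional[object]]


def _pct(count: int, total: int) -> float:
    return 100.0 * count / total if total else 100.0


def quality_indicators(records: Sequence[Record], lower: float, upper: float,
                       expected_step: timedelta) -> Dict[str, float]:
    """Quality scores (100% = best) for a sensor time series of (timestamp, value)."""
    values = [v for _, v in records]
    present = [v for v in values if v is not None]
    numeric = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    times = sorted({t for t, _ in records})
    gaps = [b - a for a, b in zip(times, times[1:])]
    return {
        # Share of records whose value is present.
        "completeness": _pct(len(present), len(values)),
        # Share of unique timestamps.
        "time_uniqueness": _pct(len(times), len(records)),
        # Share of numeric values inside [lower, upper]; the rest are outliers.
        "range": _pct(sum(lower <= v <= upper for v in numeric), len(numeric)),
        # Share of gaps no longer than the expected sampling interval.
        "timeliness": _pct(sum(g <= expected_step for g in gaps), len(gaps)),
        # Share of present values with the expected (numeric) format.
        "format": _pct(len(numeric), len(present)),
    }
```

Note that a sensor that stops sending both value and timestamp leaves no record at all, so it lowers only the timeliness score, not completeness.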
These indicators were evaluated by means of an R script. The results show that the Completeness, Format and Time uniqueness indicators reach 100% quality in all cases. Precision fluctuates between 95% and 100% due to small data variability. We do not consider these values to be alarming, since the Range indicator always reaches 100%. It is worth mentioning that missing data are not identified by the Completeness indicator: when a sensor fails for whatever reason, it stops sending data, including the time value. Such sensor failures therefore appear as gaps in the time variable, which is reflected in the low values of the Timeliness indicator.
The following tables summarize the data available for RESPOND’s three pilot sites. For each house, the first and last dates of the registered measurements are shown, together with the percentage of lost data. Houses with more than 30% of missing values were considered to have insufficient data to make acceptable predictions. This 30% limit was chosen because it is a significant share of the data, comparable to the fraction normally held out for testing when developing Machine Learning models. This analysis was last performed on 06/03/2020.
| House | % Missing values | Initial date | Final date |
|---|---|---|---|
| Madrid_00 | 0.85% | 2019-07-06 | 2020-02-16 |
| Madrid_01 | 0.85% | 2019-07-06 | 2020-02-16 |
| Madrid_02 | 1.48% | 2019-07-06 | 2020-02-16 |
| Madrid_03 | 1.48% | 2019-01-01 | 2020-03-05 |
| Madrid_04 | 1.48% | 2019-01-01 | 2020-03-05 |
| Madrid_05 | 0.85% | 2019-07-06 | 2020-02-16 |
| Madrid_06 | 2.41% | 2019-06-20 | 2020-03-05 |
| Madrid_07 | 2.41% | 2019-06-20 | 2020-03-05 |
| Madrid_10 | 0.85% | 2019-07-06 | 2020-02-16 |
| Madrid_12 | 1.48% | 2019-01-01 | 2020-03-05 |
| Madrid_13 | 1.48% | 2019-01-01 | 2020-03-05 |

Table 5: Demand data availability for Madrid.
| House | % Missing values | Initial date | Final date |
|---|---|---|---|
| Aran_01 | 6.30% | 2019-01-01 | 2019-07-20 |
| Aran_02 | 8.54% | 2020-02-06 | 2020-03-05 |
| Aran_03 | 1.93% | 2019-04-02 | 2020-03-05 |
| Aran_04 | 24.25% | 2019-05-02 | 2020-03-05 |
| Aran_05 | 2.08% | 2019-04-26 | 2020-03-05 |
| Aran_06 | 12.75% | 2019-09-04 | 2020-03-05 |
| Aran_08 | 6.30% | 2019-08-06 | 2020-03-05 |
| Aran_10 | 3.97% | 2019-09-04 | 2020-03-05 |
| Aran_12 | 40.48% | 2019-11-05 | 2020-03-05 |

Table 6: Demand data availability for the Aran Islands.

| House | % Missing values | Initial date | Final date |
|---|---|---|---|
| Aarhus_01 | 49.28% | 2019-03-27 | 2019-03-05 |
| Aarhus_02 | 36.17% | 2019-04-03 | 2020-03-05 |
| Aarhus_03 | 16.33% | 2019-03-26 | 2020-03-05 |
| Aarhus_04 | 62.29% | 2019-03-27 | 2020-03-05 |
| Aarhus_05 | 13.50% | 2019-03-25 | 2020-03-05 |
| Aarhus_06 | 3.59% | 2019-03-26 | 2020-03-05 |
| Aarhus_07 | 22.15% | 2019-04-07 | 2020-03-05 |
| Aarhus_08 | 2.17% | 2019-03-26 | 2020-03-05 |
| Aarhus_09 | 4.74% | 2019-03-29 | 2020-03-05 |
| Aarhus_10 | 58.41% | 2019-03-26 | 2020-03-05 |
| Aarhus_11 | 2.12% | 2019-03-26 | 2020-03-05 |
| Aarhus_12 | 1.28% | 2019-03-28 | 2019-07-20 |
| Aarhus_13 | 21.96% | 2019-03-14 | 2020-03-05 |
| Aarhus_14 | 3.53% | 2019-03-26 | 2020-03-05 |
| Aarhus_15 | 32.72% | 2019-03-26 | 2020-03-05 |
| Aarhus_16 | 57.75% | 2019-03-28 | 2020-03-05 |
| Aarhus_17 | 21.88% | 2019-04-03 | 2020-03-05 |
| Aarhus_18 | 45.07% | 2019-03-27 | 2020-03-05 |
| Aarhus_19 | 99.40% | 2019-03-26 | 2020-02-13 |
| Aarhus_20 | 30.25% | 2019-04-03 | 2020-03-05 |

Table 7: Demand data availability for Aarhus.

Looking at the data availability results in Table 5, Table 6 and Table 7, it can be concluded that the pilot sites differ in data availability. The Spanish pilot site has the lowest data loss percentage in its participant houses, losing less than 3% of data in the worst case. For four of the participant houses, historical data cover a period of more than one year. In the Aran Islands, data loss is rather heterogeneous among the participants, ranging from less than 2% in the best case to more than 40% in the worst case. It is worth mentioning that only two of the nine participant houses have more than 20% data loss. Finally, the Aarhus pilot site is the most affected by data loss, which can be attributed to the deployment problems explained in RESPOND’s periodic reports. Only six of the twenty participant houses have a data loss lower than 5%, while six other houses have more than 40% of their data missing. In total, nine houses have lost at least one quarter of their measurements.
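Applying the 30% threshold to the percentages reported in Table 6 and Table 7 can be checked with a few lines; the dictionaries simply transcribe the tables, and the helper name `insufficient` is illustrative.

```python
from typing import Dict, List

MISSING_THRESHOLD = 30.0  # % of missing values above which a house is excluded

aran = {"Aran_01": 6.30, "Aran_02": 8.54, "Aran_03": 1.93, "Aran_04": 24.25,
        "Aran_05": 2.08, "Aran_06": 12.75, "Aran_08": 6.30, "Aran_10": 3.97,
        "Aran_12": 40.48}
aarhus = {"Aarhus_01": 49.28, "Aarhus_02": 36.17, "Aarhus_03": 16.33,
          "Aarhus_04": 62.29, "Aarhus_05": 13.50, "Aarhus_06": 3.59,
          "Aarhus_07": 22.15, "Aarhus_08": 2.17, "Aarhus_09": 4.74,
          "Aarhus_10": 58.41, "Aarhus_11": 2.12, "Aarhus_12": 1.28,
          "Aarhus_13": 21.96, "Aarhus_14": 3.53, "Aarhus_15": 32.72,
          "Aarhus_16": 57.75, "Aarhus_17": 21.88, "Aarhus_18": 45.07,
          "Aarhus_19": 99.40, "Aarhus_20": 30.25}


def insufficient(missing: Dict[str, float],
                 threshold: float = MISSING_THRESHOLD) -> List[str]:
    """Houses whose share of missing values exceeds the threshold."""
    return sorted(house for house, pct in missing.items() if pct > threshold)


print(insufficient(aran))
print(len(insufficient(aarhus)))
```

For these tables the set of houses above 30% coincides with the nine Aarhus houses that lost at least one quarter of their measurements, plus Aran_12.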
4.3 METHODOLOGY
Traditionally, energy demand forecasting has been addressed with data-driven algorithms because of their high performance. Therefore, RESPOND’s Energy Demand Forecasting Service has targeted these algorithms, with a view to achieving the best possible performance.
Electric Energy Forecasting
First, we decided that the explanatory input variables of the predictive models would be extracted from the time variable. On the one hand, this choice keeps the models simple and allows the results to be explained. On the other hand, since the time variable is continuous, missing data can be imputed in case of sensor failure.
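A minimal sketch of deriving the explanatory inputs from the timestamp alone, and of listing the timestamps for which a trained model could impute missing readings, is given below. The function names are hypothetical. The text only mentions minutes, hours, days and months, so whether "day" means day of month or day of week is an assumption (both are included), and the year is deliberately left out.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def time_features(ts: datetime) -> List[int]:
    """Calendar inputs derived only from the timestamp (year excluded on purpose)."""
    # Day of month and day of week are both included (assumption).
    return [ts.minute, ts.hour, ts.day, ts.weekday(), ts.month]


def missing_timestamps(observed: Iterable[datetime], start: datetime,
                       end: datetime, step: timedelta) -> List[datetime]:
    """Expected timestamps in [start, end) with no measurement.

    Their features can still be computed, so a trained model can impute them.
    """
    seen = set(observed)
    gaps, t = [], start
    while t < end:
        if t not in seen:
            gaps.append(t)
        t += step
    return gaps
```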
Before creating the models, we identified some outlier values, i.e. values that greatly exceed the typical electrical consumption values. After observing the behaviour of the consumption data for the different houses, we concluded that a common pattern would lack precision. In the end, we decided to remove values greater than 3000 kWh. These values are considered meaningless and were possibly caused by a failure in the data collection method.
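The outlier rule above amounts to a simple threshold filter. A sketch that also reports how many readings were dropped is shown below; the function name and the (timestamp, value) layout are illustrative assumptions.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")
OUTLIER_LIMIT_KWH = 3000.0  # readings above this value are discarded


def drop_outliers(readings: Sequence[Tuple[T, float]],
                  limit: float = OUTLIER_LIMIT_KWH) -> Tuple[List[Tuple[T, float]], int]:
    """Remove readings strictly greater than `limit`; return (kept, n_removed)."""
    kept = [(t, v) for t, v in readings if v <= limit]
    return kept, len(readings) - len(kept)
```

Removed readings leave gaps in the series, which can then be filled in the same way as sensor failures.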
In the search for the best predictive model, we started with Autoregressive Integrated Moving Average (ARIMA(p,d,q)) models. These models are fitted to time series data to predict future points when the data show evidence of non-stationarity: the time series can be made stationary by differencing it d times. Once the series was stationary, we used the classic exploratory methods to choose the orders p and q, based on a comparison of the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). An autoregressive model of order p, AR(p), forecasts the variable of interest using a linear combination of p past values of the variable. AR(p) can be written as

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$

where $c$ and $\phi_i$, $i = 1, \dots, p$ are the regression coefficients, estimated by maximum likelihood estimation (MLE), $y_{t-i}$, $i = 1, \dots, p$ are the p lagged values of $y_t$ used as predictors, and $\varepsilon_t$ is white noise. A moving average model of order q, MA(q), specifies that the variable of interest depends on the lagged values of a stochastic term, according to the equation:

$$y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \dots + \theta_q\varepsilon_{t-q}$$

where $\mu$ is the mean of the series, $\theta_i$, $i = 1, \dots, q$ are the parameters estimated by MLE, and $\varepsilon_t, \dots, \varepsilon_{t-q}$ are the current and the q previous values of the white noise.
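To make the two building blocks concrete, the sketch below differences a series and produces recursive multi-step forecasts from given ARMA coefficients, setting future noise terms to their expected value of zero. It is a plain-Python illustration only: coefficient estimation by MLE and order selection by AIC/BIC (done in R in this work) are not reproduced, and the function names and coefficient values are assumptions. It also shows a known property of such models: as the horizon grows, the forecast of a stationary AR process reverts to its mean $c / (1 - \sum_i \phi_i)$, so the information carried by recent observations fades.

```python
from typing import List, Sequence


def difference(y: Sequence[float], d: int = 1) -> List[float]:
    """Apply first differencing d times (the 'I' in ARIMA)."""
    y = list(y)
    for _ in range(d):
        y = [b - a for a, b in zip(y, y[1:])]
    return y


def arma_forecast(y: Sequence[float], resid: Sequence[float], c: float,
                  phi: Sequence[float], theta: Sequence[float],
                  steps: int) -> List[float]:
    """Recursive forecast of y_t = c + sum(phi_i y_{t-i}) + sum(theta_j eps_{t-j}) + eps_t."""
    hist, eps = list(y), list(resid)
    out = []
    for _ in range(steps):
        ar = sum(p * hist[-i] for i, p in enumerate(phi, start=1))
        ma = sum(t * eps[-j] for j, t in enumerate(theta, start=1))
        nxt = c + ar + ma
        out.append(nxt)
        hist.append(nxt)
        eps.append(0.0)  # future shocks are unknown: use their expectation
    return out


def undifference(last_level: float, diffs: Sequence[float]) -> List[float]:
    """Invert one order of differencing (d = 1) starting from the last observed level."""
    levels, level = [], last_level
    for delta in diffs:
        level += delta
        levels.append(level)
    return levels


# Illustrative coefficients: AR(1) with phi = 0.5 and MA(1) with theta = 0.2.
print(arma_forecast([2.0, 3.0, 4.0], [0.5], 1.0, [0.5], [0.2], steps=3))
```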
Due to the large amount of data, this search for the optimal p and q was neither simple nor satisfactory. The models were implemented using the statistical software R, which provides multiple functions for time series analysis. Specifically, we used a method that finds the best Seasonal Autoregressive Integrated Moving Average (SARIMA) model. SARIMA models extend an ARIMA(p, d, q) model with a seasonal ARIMA(P, D, Q) component applied at the seasonal period. Table 8 compares the poor results obtained by both models on the data of the second Madrid house (Madrid_02).
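| Model | p | d | q | P | D | Q | Seasonal period | RMSE |
|---|---|---|---|---|---|---|---|---|
| ARIMA | 2 | 1 | 4 | - | - | - | - | 1540.07 |
| SARIMA | 3 | 1 | 4 | 0 | 0 | 2 | 24 | 1294.81 |

Table 8: Comparison between ARIMA and SARIMA models.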
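The AIC/BIC comparison used to choose the model orders can be sketched with the standard definitions $\text{AIC} = 2k - 2\ln\hat{L}$ and $\text{BIC} = k\ln n - 2\ln\hat{L}$. Here, $k$ is the number of estimated parameters and $n$ is the number of observations. The log-likelihoods below are made up for illustration and are not the fits behind Table 8. The helper `best_order` is hypothetical.

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_lik


def best_order(fits, n, criterion="aic"):
    """Return the (p, d, q) order with the lowest criterion value.

    `fits` maps an order tuple to (maximised log-likelihood, number of
    estimated parameters k).
    """
    def score(order):
        log_lik, k = fits[order]
        return aic(log_lik, k) if criterion == "aic" else bic(log_lik, k, n)
    return min(fits, key=score)


# Hypothetical fitted models: (log-likelihood, k). Not taken from Table 8.
fits = {
    (1, 1, 1): (-1000.0, 3),
    (2, 1, 4): (-990.0, 7),
    (3, 1, 4): (-988.0, 8),
}
print(best_order(fits, n=720, criterion="aic"))
print(best_order(fits, n=720, criterion="bic"))
```

With these toy values, AIC favours the larger (3, 1, 4) model. BIC, whose penalty grows with $\ln n$, favours (1, 1, 1). This illustrates how the two criteria can disagree.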
The problem is the same in both cases: the further ahead the forecast, the less accurate the estimate. Figure 13 shows the SARIMA prediction for a 10-day period in March for the second house in Madrid. Figure 14 shows the estimated values over 24 hours, where the inaccuracy of the estimate can be observed.
Figure 13: ARIMA's Electric Consumption prediction of a period of 10 days in March for Madrid_02.
Figure 14: ARIMA's Electric Consumption prediction of a period of 24 hours for Madrid_02
Next, we used Machine Learning techniques to develop the energy demand forecasting predictive models. One of the most important steps was deciding which input variables to use. The home-related data were captured over periods that never exceeded one year. For that reason, information about minutes, hours, days and months is included in the input variables, but the year is not.
Having chosen these training variables, we note that this type of data is inherently cyclical. To include these variables in the model, we used a sinusoidal transformation into two dimensions. Each variable yields two new features, its sine and cosine transformations, computed according to its periodicity. This method is not applied to the day variable, because each month has a different number of days. Finding a method to generalize the periodicity of this variable would complicate the problem, and we do not believe that it would provide relevant information.
For example, the transformation applied to the minute variable is shown below.
$$f_{min}: [0, 60) \to [-1, 1] \times [-1, 1], \quad x \mapsto \left( \cos\left(\frac{2\pi x}{60}\right), \sin\left(\frac{2\pi x}{60}\right) \right)$$
The domain of the minute variable contains sixty natural values (0 to 59). The Euclidean distance between two consecutive elements of this set is always 1:
$$d(x, x+1) = 1, \quad \forall x \in \{0, 1, \dots, 58\}$$
Since minute 0 follows minute 59, the Euclidean distance between these two points should also be 1. In the raw encoding, however, it is not:
$$d(59, 0) = 59$$
With this transformation, all pairs of consecutive points, including 59 and 0, are separated by the same Euclidean distance in the two-dimensional space.
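The cyclical encoding and the distance argument above can be checked with a short sketch. It uses the standard library only. The names `cyclical_encode` and `distance` are hypothetical helpers, not taken from the project code.

```python
import math


def cyclical_encode(value, period):
    """Map a periodic value onto the unit circle as (cos, sin)."""
    angle = 2 * math.pi * value / period
    return (math.cos(angle), math.sin(angle))


def distance(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Raw encoding: minutes 59 and 0 are 59 units apart.
print(abs(59 - 0))

# Cyclical encoding: consecutive minutes, including 59 -> 0, are all
# separated by the same chord length 2*sin(pi/60).
d_0_1 = distance(cyclical_encode(0, 60), cyclical_encode(1, 60))
d_59_0 = distance(cyclical_encode(59, 60), cyclical_encode(0, 60))
print(round(d_0_1, 4), round(d_59_0, 4))
```

The same function can be applied to hours (period 24) and months (period 12). Consistent with the text, the day-of-month variable would be left untransformed.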
Additionally, other categorical variables are derived from the time variable: the season of the year and the day of the week. We observed that electricity consumption behaviour differs between working days and holidays. We therefore created a new dichotomous variable indicating holidays at each location.
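A possible way to derive these calendar variables is sketched below. The season boundaries (meteorological seasons for the northern hemisphere), the holiday set and the function name are assumptions for illustration. The text does not specify how seasons or holidays were defined.

```python
from datetime import date

# Assumption: meteorological seasons for the northern hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def calendar_features(day, holidays):
    """Return season, weekday (0 = Monday) and a 0/1 working-day flag."""
    weekday = day.weekday()
    working = int(weekday < 5 and day not in holidays)
    return {"season": SEASON_BY_MONTH[day.month],
            "weekday": weekday,
            "workingday": working}


# Hypothetical holiday calendar for one pilot site.
holidays = {date(2020, 1, 1), date(2020, 1, 6)}
print(calendar_features(date(2020, 1, 6), holidays))  # a Monday, but a holiday
print(calendar_features(date(2020, 1, 7), holidays))  # an ordinary Tuesday
```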
The first supervised machine learning algorithm that we tested was Linear Regression, which fits a linear relationship between the independent variables and the dependent variable. Figure 15 shows the estimated values of electric consumption using the best linear model.
[Figure: original minute variable (linear) and cyclical transformation of the minute variable (sine vs. cosine)]
day sin(month) cos(month) sin(hour) cos(hour) Season Weekday Workingday RMSE
x x x x x x x x 540.22
x x x x x x x 541.48
x x x x x 558.07
Figure 15: Linear Regression's Electric Consumption prediction for Madrid_02.
Although the RMSE was significantly lower than the one obtained with ARIMA, the coefficient of determination ($R^2$) was below 0.3 for all fitted models.
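The definitions show how a lower RMSE can coexist with a low $R^2$. RMSE is an absolute error in the units of the target, whereas $R^2 = 1 - SS_{res}/SS_{tot}$ compares the model against simply predicting the mean. The following minimal sketch uses toy values, not project data:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [200.0, 400.0, 900.0, 300.0]
predicted = [350.0, 420.0, 500.0, 330.0]
print(round(rmse(actual, predicted), 2), round(r_squared(actual, predicted), 3))
```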
Another supervised learning algorithm that we tested was Support Vector Regression (SVR). SVR applies the same principles as SVM to regression, so it can be used with continuous target values. We used the caret package available in R, which contains functions to train machine learning models, with cross-validation as the resampling method. This procedure selects the best combination of hyperparameters for the model being trained.
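caret's internals are not reproduced here. The sketch below only illustrates the idea of choosing a hyperparameter by k-fold cross-validation. To stay dependency-free, it tunes the penalty `lam` of a one-feature ridge regression, which is a hypothetical stand-in for the SVR hyperparameters. It also uses contiguous (unshuffled) folds. All function names are illustrative.

```python
def kfold_splits(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def fit_ridge_1d(xs, ys, lam):
    """Ridge regression with one feature and no intercept: w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k):
    """Average held-out mean squared error over k folds for a given lam."""
    fold_errors = []
    for train_idx, test_idx in kfold_splits(len(xs), k):
        w = fit_ridge_1d([xs[i] for i in train_idx], [ys[i] for i in train_idx], lam)
        squared = [(ys[i] - w * xs[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / len(fold_errors)


def grid_search(xs, ys, grid, k=5):
    """Return the grid value with the lowest cross-validated MSE."""
    return min(grid, key=lambda lam: cv_mse(xs, ys, lam, k))


xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x for x in xs]  # noiseless toy data with true slope 2
print(grid_search(xs, ys, [0.0, 10.0, 100.0, 1000.0]))
```

On noiseless data the unpenalised fit wins. On noisy data, a non-zero penalty can give a lower held-out error, and cross-validation is what detects this.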
In some cases, the SVR predictions of electricity consumption were negative, which makes no physical sense, since electricity consumption can never be less than 0 kWh. Although the RMSE obtained was lower than with linear regression, the method was rejected because this problem could not be controlled. An example of this situation is shown in Figure 16.
day sin(month) cos(month) sin(hour) cos(hour) Season Weekday Workingday RMSE
x x x x x x x x 571.92
x x x x x x x 471.92
Figure 16: Electricity Consumption prediction obtained with an SVR model.
Finally, we used the K-nearest neighbors (KNN) algorithm, a supervised machine learning algorithm that can also be used for regression problems. In this case, we again used the caret package, and the value chosen for the parameter k was 5 for all houses.
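The mechanics of kNN regression with k = 5 are sketched below in plain Python. This is an illustration under assumptions, not the caret model. The only feature is the cyclical hour encoding, the consumption values are toy numbers, and the function names are hypothetical.

```python
import math


def encode_hour(hour):
    """Cyclical (cos, sin) encoding of the hour of day."""
    angle = 2 * math.pi * hour / 24
    return (math.cos(angle), math.sin(angle))


def knn_predict(train_X, train_y, query, k=5):
    """Average the targets of the k training points closest to `query` (Euclidean)."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


train_X = [encode_hour(h) for h in range(24)]
train_y = [50.0 + 10.0 * h for h in range(24)]  # toy hourly consumption values

print(knn_predict(train_X, train_y, encode_hour(12)))  # mean of hours 10 to 14
print(knn_predict(train_X, train_y, encode_hour(0)))   # mean of hours 22, 23, 0, 1, 2
```

The query at hour 0 averages hours 22, 23, 0, 1 and 2. This shows how the sine/cosine encoding lets neighbours wrap around midnight.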
The performance of the forecasters varies with data availability. The predictive models are therefore retrained periodically, since their performance is expected to improve as more historical data becomes available.
The predictive models developed in October 2019 can be classified into three categories. First, models with good accuracy, which are able to predict daily schedules and routines with an MAE below 90 kW (see Figure 17). Second, less accurate models with errors of different magnitudes and MAEs over 125 kW. Third, models with poor accuracy (see Figure 18) and MAEs over 250 kW. The latter two types of models do not adjust well to the dweller's routine at certain time intervals, as there is not enough data to learn from. More historical data is required to train and adjust these models, and this is expected to be achieved during the last six months of T4.4.
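The MAE and the grouping thresholds quoted above can be expressed as follows. The thresholds come from the text. The text does not assign models with an MAE between 90 and 125 to any category, so this sketch labels them "unassigned", which is an assumption. The function names are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error, in the units of the consumption data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def accuracy_group(mae_value):
    """Group a model by its MAE using the thresholds quoted in the text."""
    if mae_value < 90:
        return "good"
    if mae_value > 250:
        return "bad"
    if mae_value > 125:
        return "less accurate"
    return "unassigned"  # 90 to 125: not covered by the thresholds in the text


error = mae([300.0, 500.0, 250.0, 800.0], [320.0, 450.0, 260.0, 700.0])
print(error, accuracy_group(error))
```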
Figure 17 shows a predictive model with good accuracy for the electric consumption of a dwelling in Madrid: the red dots (predicted consumption) are rather close to the blue dots (real consumption). Figure 18, by contrast, compares the predictions of a model with worse performance against the real consumption of another house in Madrid. These predictions are clearly less accurate than the ones shown in Figure 17.
Figure 17: Real vs forecasted energy consumption of a predictive model with a good performance
Figure 18: Real vs forecasted energy consumption of a predictive model with a bad performance
Figure 19 and Figure 20 show the residuals of a predictive model with enough data, and a predictive model
developed with scarce data respectively. It can be seen that the quality of the former model is better than
the latter, as most differences between observed and predicted values are closer to 0.
Figure 19: Residuals of a predictive model with a good performance.
Figure 20: Residuals of a predictive model with a bad performance.
The neighbourhood consumption is also forecasted, leveraging the already developed predictive models. As can be seen in Figure 21, these predictive models have good accuracy, which is expected to improve as more data becomes available.
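The text does not state how the dwelling-level forecasts are combined. For illustration only, the sketch below assumes that the neighbourhood forecast for each hour is the sum of the dwelling forecasts. Dwelling ids other than Madrid_02, all values, and the function name are hypothetical.

```python
def neighbourhood_forecast(dwelling_forecasts):
    """Sum hourly forecasts of several dwellings (assumed aggregation rule).

    `dwelling_forecasts` maps a dwelling id to a list of hourly values;
    all lists must cover the same hours.
    """
    series = list(dwelling_forecasts.values())
    if len({len(s) for s in series}) != 1:
        raise ValueError("all dwellings must cover the same horizon")
    return [sum(hour_values) for hour_values in zip(*series)]


forecasts = {"Madrid_01": [120.0, 90.0, 300.0],
             "Madrid_02": [200.0, 150.0, 410.0]}
print(neighbourhood_forecast(forecasts))
```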
Figure 21: Forecasted vs actual electric consumption in Madrid (neighbourhood level)
When more historical data became available, the predictive models were retrained. Figure 22 and Figure 23 show the electricity consumption estimates for a 10-day period in March produced by predictive models trained in March 2020. These estimates are more accurate than the predictions made in October 2019.
Figure 22: Electric Consumption prediction of a period of 10 days in March for Madrid_02.
Figure 23: Electric Consumption prediction of a period of 10 days in March for Aarhus_11.
DHW Forecasting
An initial study of the performance of the DHW consumption estimates was carried out using the kNN method. DHW consumption is measured in m³, and the water meter does not provide decimals. Since this unit is too large for the hourly consumption of an apartment building, the values of the variable to be predicted are the natural numbers 0, 1 and 2. A classification algorithm was therefore considered more adequate, and the consumption variable was introduced as a factor variable taking three values. Figure 24 shows the classification returned by the algorithm.
Figure 24: Initial predictions obtained for Madrid Neighbourhood DHW consumption.
As can be seen, all values are classified as 1. To avoid this, it was decided to treat the values as numerical rather than as factors. After obtaining the prediction, the decimal values were rounded manually: values below 0.9 are assigned 0, values between 0.9 and 1.1 are assigned 1, and values above 1.1 are assigned 2. Finally, the values are converted to a factor. Figure 25 shows the results obtained with a predictive model trained with data until November 2019.
Figure 25: Predictions obtained for Madrid Neighbourhood DHW consumption with predictive model trained with data up to November 2019.
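The manual rounding rule can be written as a small function. The text does not say whether the boundary values 0.9 and 1.1 map to class 1. This sketch assumes they do, and the function name is hypothetical.

```python
def to_dhw_class(value):
    """Round a numeric kNN prediction to a DHW class using the text's thresholds.

    Assumption: the boundary values 0.9 and 1.1 are assigned to class 1.
    """
    if value < 0.9:
        return 0
    if value <= 1.1:
        return 1
    return 2


predictions = [0.4, 0.95, 1.1, 1.6]
print([to_dhw_class(v) for v in predictions])  # [0, 1, 1, 2]
```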
Figure 26 shows the results obtained with a predictive model developed with data until March 2020, and the confusion matrix is shown in Table 9.
Figure 26: Predictions obtained for Madrid Neighbourhood DHW consumption with predictive model trained with data up to March 2020.
| Prediction (rows) / Reference (columns) | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 4 | 18 | 4 |
| 1 | 12 | 72 | 39 |
| 2 | 1 | 19 | 16 |

Table 9: DHW Consumption Prediction Confusion Matrix.
The values on the diagonal of the table are the correctly classified ones. Accuracy is a measure of classification quality, obtained by dividing the number of correctly classified values by the total number of predicted values. In this case,
$$\text{Accuracy} = \frac{92}{185} = 0.4973,$$
that is, 49.73% of the estimated values are equal to the real values. This result is low and shows the poor accuracy of the predictive model. However, the incorrectly classified values are generally not estimated at the opposite extreme: only 1 value whose true class is 0 is classified as 2, and only 4 values whose true class is 2 are classified as 0. Other methods will be tested to improve this result.
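The accuracy and the two "opposite extreme" counts can be recomputed directly from Table 9. In the matrix, rows are predicted classes and columns are reference classes.

```python
# Confusion matrix from Table 9: rows = predicted class, columns = reference class.
confusion = [
    [4, 18, 4],
    [12, 72, 39],
    [1, 19, 16],
]

correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total
print(correct, total, round(accuracy, 4))  # 92 185 0.4973

# Errors at the opposite extreme of the scale.
pred2_ref0 = confusion[2][0]  # reference 0, predicted 2
pred0_ref2 = confusion[0][2]  # reference 2, predicted 0
print(pred2_ref0, pred0_ref2)  # 1 4
```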
4.4 SERVICE DEPLOYMENT
A data-driven predictive model has been developed to forecast the electric consumption of each dwelling, as well as the DHW consumption at neighbourhood level, in the Madrid pilot site. These predictive models were developed in R and exported as *.rds files.
The execution of the predictive models to produce the predictions for the upcoming 24 hours was automated using periodic tasks run by a crontab daemon. These tasks remotely execute the models deployed in an R server. Each task admits several parameters indicating which model is to be executed, the forecast period and the other input parameters the model needs to generate an output. This mechanism allows the execution of multiple instances of the same model with different input
parameters or different models with the same parameters. This generic execution module is deployed as a web service in a Tomcat server. The web service interface (REST API) allows the execution of the models to be managed remotely and dynamically, by adding, modifying or deleting tasks and the corresponding predictions.
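The task mechanism can be pictured with a small sketch. The actual task service is a Java web service in Tomcat. The Python below only models the bookkeeping behind adding, modifying and deleting tasks, and every class, field and file name in it is hypothetical. Scheduling and the calls to RServe are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ForecastTask:
    """Hypothetical description of one scheduled forecasting task."""
    model_file: str                             # exported model, e.g. a *.rds file
    horizon_hours: int = 24                     # period of time to forecast
    params: dict = field(default_factory=dict)  # other inputs needed by the model


class TaskRegistry:
    """In-memory stand-in for the add/modify/delete operations exposed over REST."""

    def __init__(self):
        self._tasks = {}

    def add(self, task_id, task):
        if task_id in self._tasks:
            raise KeyError(f"task {task_id!r} already exists")
        self._tasks[task_id] = task

    def modify(self, task_id, **changes):
        task = self._tasks[task_id]
        for name, value in changes.items():
            if not hasattr(task, name):
                raise AttributeError(f"unknown task field {name!r}")
            setattr(task, name, value)

    def delete(self, task_id):
        del self._tasks[task_id]

    def list_tasks(self):
        return dict(self._tasks)


# The same model registered twice with different input parameters.
registry = TaskRegistry()
registry.add("madrid02_24h", ForecastTask("madrid_02.rds", params={"dwelling": "Madrid_02"}))
registry.add("madrid02_48h", ForecastTask("madrid_02.rds", horizon_hours=48,
                                          params={"dwelling": "Madrid_02"}))
registry.modify("madrid02_48h", horizon_hours=12)
registry.delete("madrid02_24h")
print(registry.list_tasks())
```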
The tasks were scheduled to run daily. When executed, each task connects to RServe, runs the model to perform a prediction and retrieves the results. These results are stored in a MySQL database for later use by the optimization service or by the different visualization tools, such as the mobile app or the desktop dashboard. As shown in Figure 27, two of the components (RServe and Tomcat) were deployed using Docker containers.
Figure 27: Deployment of the Energy Demand Forecasting Services.
The Energy Demand Forecasting service is open to extension: new predictions can be added and existing ones modified. If a new prediction is needed, data analysts must develop a predictive model and generate the corresponding *.rds file. This file is then copied into RServe, and a new task must be added to the task service, configuring the task schedule and the input parameters. A typical problem of forecasting services is the need to adjust the behaviour of the model to the latest trends, which implies retraining the predictive model with the latest available historical data. With this mechanism, the retraining can be done offline by the analyst, and it is only necessary to replace the old R model (.rds file) with the new one.
[Figure 27 diagram labels: RServe, Apache Tomcat, MySQL, Task Service (Java), .rds models, Docker]
5. CONCLUSIONS
In 2016 the residential sector represented 25.4% of final energy consumption and 17.4% of gross inland energy consumption in the EU. The potential of Demand Side Management activities and DR programmes is therefore particularly promising for this sector. Accurate predictions of the energy to be produced over a period of time, together with advance knowledge of when demand peaks will occur, can contribute to better management of the mismatch between production and demand. This in turn allows the most suitable DR programmes to be suggested to end-users. Furthermore, consumption and production must be planned ahead to improve the stability and quality of grid systems. As part of Task 4.4, RESPOND aims to develop services that estimate the energy produced and consumed in dwellings and neighbourhoods within the pilot sites.
Numerous approaches to energy forecasting have recently been proposed in the literature. Although most of them are data-driven because of their high performance, physics-based models also exist. Furthermore, the KDD process, which extracts useful knowledge from raw data, is at the core of these predictive models. This deliverable describes the development of predictive models for energy forecasting services.
The Energy Production Forecasting service focuses on the development of models to estimate renewable energy production. These models target the RES generation systems available at the RESPOND pilot sites: PV panels (in Aarhus and the Aran Islands) and Solar Thermal Collectors (in Madrid). Various Machine Learning approaches were considered and tested using Python for the Aarhus and Madrid pilot sites. In all cases, optimal hyperparameters were chosen using grid search, and the MAE was used as the performance indicator. For forecasting the energy produced by PV panels, Random Forest models achieved the best performance, whilst Neural Networks were chosen for the STC. For the Aran Islands, a physical model was employed, as it requires only parameters that are commonly found in PV cell data sheets.
Likewise, the Energy Demand Forecasting service is needed to accurately forecast short-term electricity demand. This service has also covered the estimation of DHW consumption in the Spanish neighbourhood. However, the DHW consumption case is rather complicated due to the aforementioned difficulties, and it has yet to be solved adequately. As with the Energy Production Forecasting models, different ML algorithms were tested. The best results were obtained with kNN algorithms, after generating the necessary input variables from the raw data (e.g. the sine and cosine of the hour).
All the developed models have been deployed following a well-defined plan, with a view to moving from a testing to a production environment. They are all designed to retrieve the necessary inputs, execute their estimations and store the results in an automated way.
ANNEX 1
This Annex shows the benchmarking results for the Madrid pilot site’s production forecasting model based
on Machine Learning.
MSE MAE MSE MAE MSE MAE MSE MAE MSE MAE
n = 1 train 0.015604 0.077307 0.011866 0.071406 0.019039 0.094445 0.019159 0.092862 0.018788 0.085482
alpha = 0.0001
val 0.017681 0.084605 0.027365 0.087863 0.020549 0.096541 0.01852 0.094201 0.014967 0.083557
test 0.014385 0.077405 11.60716 0.396226 0.028892 0.103801 0.012492 0.077787 6.456884 0.329657
n = 1 train 0.016153 0.081352 0.015488 0.082038 0.018103 0.085819 0.016712 0.079699 0.01968 0.09203
alpha = 0.001
val 0.058684 0.106863 0.015584 0.083892 0.019242 0.092149 9.92237 0.397753 0.019073 0.096172
test 0.919666 0.177747 0.771831 0.166682 0.052011 0.095048 0.032856 0.080721 0.059233 0.114241
n = 1 train 0.018123 0.090033 0.021513 0.098689 0.019124 0.095226 0.0242 0.100317 0.016998 0.084057
alpha = 0.01 val 0.035563 0.099466 0.010709 0.074461 0.020864 0.101133 0.017806 0.09763 0.057648 0.123404
test 0.437981 0.149553 0.017974 0.095148 0.015575 0.08541 0.014523 0.086034 0.019351 0.088121
n = 1 train 0.01692 0.084723 0.019069 0.085512 0.019851 0.092154 0.018472 0.092493 0.018647 0.088056
alpha = 0.1 val 0.022594 0.101805 0.016687 0.084805 0.022353 0.091452 0.077364 0.107857 0.024052 0.101085
test 0.025846 0.102788 0.020192 0.096981 0.119338 0.125656 0.036669 0.109487 0.023387 0.10182
n = 1 train 0.023403 0.094512 0.020927 0.088634 0.022221 0.093445 0.02121 0.087403 0.017827 0.079256
alpha = 1.0 val 0.021463 0.089171 0.018326 0.083212 0.01356 0.072988 0.01834 0.080193 0.020109 0.086545
test 0.020912 0.08773 0.0224 0.090054 0.021636 0.088713 0.019025 0.084933 0.023306 0.097146
n = 1 train 0.03075 0.120032 0.030608 0.115127 0.034604 0.124481 0.028322 0.115598 0.028443 0.111767
alpha = 10.0 val 0.028408 0.114851 0.121721 0.136753 0.028582 0.117125 0.029281 0.121573 0.031952 0.113387
test 0.029413 0.118319 0.020277 0.089448 0.047586 0.141532 0.034518 0.120342 0.030724 0.118158
n = 1 train 0.057917 0.176015 0.055876 0.178485 0.057861 0.177748 0.062579 0.184571 0.05601 0.175951
alpha = 100.0
val 0.048689 0.164342 0.04847 0.155859 0.040056 0.15767 0.042915 0.161319 0.062508 0.1801
test 0.060069 0.179784 0.056776 0.175048 0.066791 0.192928 0.075922 0.198024 0.046311 0.166136
n = 1 train 0.057971 0.177584 0.063981 0.183494 0.054805 0.169174 0.068137 0.195043 0.082505 0.221369
alpha = 1000.0
val 0.071432 0.199682 0.071316 0.198563 0.079843 0.193329 0.040529 0.166422 0.057975 0.194866
test 0.073934 0.199132 0.058619 0.176699 0.072944 0.189047 0.085304 0.215487 0.067726 0.193628
n = 2 train 0.014728 0.077051 0.014477 0.075603 0.011307 0.064982 0.014875 0.075016 0.016286 0.077226
alpha = 0.0001
val 0.038758 0.094639 0.193498 0.120891 0.020567 0.083094 1.519067 0.208796 0.035789 0.094547
test 0.033382 0.093365 0.012988 0.073036 0.057436 0.103396 0.734483 0.16895 0.029053 0.099904
n = 2 train 0.015758 0.076718 0.012026 0.06467 0.015877 0.075033 0.014561 0.074509 0.017477 0.08086
alpha = 0.001
val 0.015237 0.071581 0.908221 0.168812 0.015757 0.081493 0.028879 0.085538 0.025711 0.105326
test 0.081943 0.107886 6477.874 10.10355 0.049024 0.09881 0.078543 0.117813 0.032679 0.10372
n = 2 train 0.015769 0.073446 0.015372 0.069704 0.015113 0.072818 0.015146 0.073802 0.019149 0.084355
alpha = 0.01 val 0.018303 0.075674 0.030786 0.099222 0.031997 0.110947 0.013514 0.069619 0.039039 0.091434
test 0.016075 0.077992 0.011977 0.067987 0.031151 0.09289 0.029368 0.094891 0.018144 0.083738
n = 2 train 0.020137 0.086962 0.020114 0.088364 0.013428 0.070797 0.017229 0.08161 0.021972 0.095486
alpha = 0.1 val 0.011702 0.067005 0.020395 0.093463 12.36888 0.418027 0.068525 0.091061 0.026103 0.101155
test 0.018059 0.087829 0.019202 0.094073 17.99166 0.600563 0.062305 0.098877 0.014353 0.074587
n = 2 train 0.023241 0.092227 0.023196 0.089923 0.015858 0.077653 0.015882 0.072632 0.020534 0.086139
alpha = 1.0 val 0.023714 0.091385 0.023812 0.089338 0.034385 0.110552 142.4085 1.580592 0.028453 0.10147
test 0.026377 0.103599 0.015324 0.079695 0.016777 0.078964 69.04901 0.814191 0.018051 0.086549
n = 2 train 0.029684 0.1185 0.024663 0.104488 0.030168 0.120471 0.029042 0.116263 0.027432 0.111235
alpha = 10.0 val 0.028717 0.118212 0.052246 0.104358 0.035308 0.116038 0.029117 0.117065 0.026951 0.117635
test 0.034638 0.12703 0.049093 0.117155 0.041769 0.136878 0.024037 0.110271 0.02503 0.106055
n = 2 train 0.057553 0.17866 0.046664 0.161381 0.051134 0.166191 0.055601 0.173524 0.06129 0.181362
alpha = 100.0
val 0.053899 0.175967 255.692 2.467469 0.063941 0.18647 0.048454 0.167391 0.05335 0.173927
test 0.050465 0.167912 0.045308 0.158149 0.042331 0.15798 0.059416 0.176707 0.065907 0.189585
n = 2 train 0.084683 0.220517 0.059773 0.177133 0.075334 0.209385 0.069448 0.197707 0.063886 0.186567
alpha = 1000.0
val 0.062058 0.199555 0.057405 0.179813 0.084677 0.208472 0.056278 0.185964 0.066825 0.18722
test 0.059189 0.195877 0.083678 0.208357 0.071181 0.196271 0.067512 0.196884 0.065541 0.192095
n = 3 train 0.012853 0.068769 0.010312 0.057147 0.011731 0.064107 0.012884 0.067972 0.010676 0.059813
alpha = 0.0001
val 0.05114 0.092453 4843857 275.1617 0.069796 0.093791 0.127365 0.112451 0.022697 0.084824
test 0.030245 0.084275 2175092 129.9162 0.010753 0.057572 0.562719 0.125954 0.032949 0.082636
n = 3 train 0.011149 0.060065 0.014238 0.070737 0.0096 0.055558 0.012528 0.065564 0.013828 0.06857
alpha = 0.001
val 0.039424 0.087392 0.020503 0.077204 42834683 580.7974 0.082825 0.095195 0.024618 0.081059
test 211.176 1.878393 0.045695 0.085411 42274300 572.706 0.086539 0.130524 0.026874 0.09089
n = 3 train 0.01226 0.064443 0.018656 0.081283 0.014806 0.069765 0.015931 0.071716 0.014408 0.072143
alpha = 0.01 val 0.023923 0.09305 0.017465 0.080368 0.025098 0.08863 0.024934 0.09781 0.039203 0.104552
test 0.025839 0.095503 0.031276 0.090754 0.024552 0.09774 0.031219 0.100813 0.013913 0.066506
n = 3 train 0.017571 0.07911 0.013956 0.071843 0.015059 0.075635 0.016442 0.079731 0.018199 0.083721
alpha = 0.1 val 0.029211 0.093638 0.02319 0.091684 0.024962 0.091407 0.022969 0.083869 0.017249 0.07951
test 0.015547 0.082261 0.023402 0.093102 0.017263 0.078878 0.014827 0.073931 0.016571 0.08492
n = 3 train 0.019528 0.085383 0.019657 0.084146 0.021212 0.088956 0.014855 0.073546 0.015753 0.075744
alpha = 1.0 val 0.018259 0.080456 0.017425 0.085082 0.012327 0.074108 0.218056 0.133403 0.025574 0.090497
test 0.020657 0.090132 0.020805 0.094042 0.022401 0.093175 1723.608 5.257491 0.026009 0.094757
n = 3 train 0.025972 0.105891 0.028767 0.118024 0.030056 0.119828 0.031099 0.118052 0.030738 0.116421
alpha = 10.0 val 0.026279 0.10694 0.025127 0.114503 0.029959 0.115594 0.038454 0.135325 0.028378 0.111585
test 0.019804 0.08399 0.027563 0.109835 0.029769 0.117457 0.033353 0.117268 0.031266 0.120131
n = 3 train 0.058149 0.169497 0.048701 0.165837 0.059455 0.177352 0.053515 0.166356 0.054294 0.176048
alpha = 100.0
val 0.068989 0.188172 0.060493 0.176343 0.061297 0.17861 0.064678 0.171871 0.03691 0.149947
test 0.065867 0.186432 0.036586 0.146231 0.063105 0.185239 0.119099 0.244062 0.148751 0.194353
n = 3 train 0.063484 0.187622 0.066588 0.192699 0.071355 0.200093 0.072422 0.203657 0.073394 0.204734
alpha = 1000.0
val 0.056176 0.179571 0.075822 0.202822 0.035294 0.157687 0.085019 0.207361 0.057523 0.180654
test 0.0784 0.19899 0.049647 0.170553 0.082029 0.216929 0.077855 0.20473 0.127743 0.241321
n = 4 train 0.011418 0.063862 0.011446 0.06444 0.009028 0.051963 0.010095 0.058059 0.010984 0.060484
alpha = 0.0001
val 0.110646 0.098555 0.083231 0.084176 0.051947 0.08665 0.087437 0.116062 4.851071 0.294152
test 0.089089 0.092912 0.451649 0.151383 0.391805 0.152709 0.038439 0.107783 3.318369 0.312264
n = 4 train 0.013388 0.064549 0.014268 0.066371 0.012394 0.061218 0.012925 0.062738 0.008141 0.051575
alpha = 0.001
val 0.078296 0.081943 0.441684 0.1359 1.155316 0.205619 0.035628 0.083228 6460798 225.1489
test 0.032086 0.088383 0.022798 0.078513 0.014529 0.065617 0.020784 0.071913 6477845 224.197
n = 4 train 0.016479 0.074809 0.009472 0.05905 0.015198 0.068253 0.013806 0.069892 0.012788 0.062047
alpha = 0.01 val 0.037774 0.110671 233709.1 47.83419 0.023806 0.087169 0.024816 0.098211 0.02774 0.096203
test 0.01713 0.07528 2.13E+11 58424.65 0.019014 0.086193 0.024673 0.081174 0.04649 0.085126
n = 4 train 0.022691 0.092498 0.0124 0.070094 0.019127 0.0898 0.021711 0.092949 0.016867 0.076973
alpha = 0.1 val 0.012225 0.075387 1.6E+08 1119.619 0.0226 0.090141 0.019172 0.088591 0.019793 0.086111
test 0.020044 0.093031 3.21E+08 2231.977 0.019536 0.083035 0.023125 0.0945 0.02768 0.096921
n = 4 train 0.019737 0.087791 0.017436 0.081283 0.017681 0.080424 0.019344 0.083217 0.022222 0.090614
alpha = 1.0 val 0.017524 0.072503 0.016692 0.076532 0.024788 0.089143 0.016842 0.076549 0.020412 0.09425
test 0.020547 0.08333 1271.582 5.418682 0.023468 0.094498 0.034331 0.108018 0.022003 0.086014
n = 4 train 0.024154 0.102926 0.0303 0.120188 0.029631 0.117045 0.028941 0.115619 0.028434 0.113709
alpha = 10.0 val 7.515737 0.475666 0.0236 0.108158 0.026633 0.115686 0.023279 0.108729 0.025996 0.104159
test 0.028043 0.107559 0.021042 0.097219 0.018943 0.09676 0.027274 0.114011 0.025429 0.110044
n = 4 train 0.06367 0.187317 0.049277 0.159735 0.054006 0.169338 0.051691 0.164068 0.061475 0.186641
alpha = 100.0
val 0.086587 0.20719 0.049756 0.164125 0.04733 0.16224 0.064245 0.179995 1961.754 4.141597
test 0.068798 0.187332 0.065577 0.182515 0.051315 0.170723 0.053543 0.166289 1968.294 4.072734
n = 4 train 0.075732 0.205781 0.060416 0.181125 0.076662 0.209624 0.06445 0.187228 0.067958 0.196053
alpha = 1000.0
val 0.093328 0.224861 0.091799 0.219291 0.057026 0.184972 0.068394 0.190095 0.048297 0.166949
test 0.06148 0.190328 0.098415 0.2152 0.079617 0.208535 0.062922 0.189197 0.072311 0.199814
n = 5 train 0.014738 0.068739 0.012312 0.063747 0.011333 0.059891 0.010943 0.0577 0.007952 0.046478
alpha = 0.0001
val 0.117752 0.098965 0.74919 0.177114 0.055447 0.091188 0.054331 0.082093 0.031183 0.083543
test 1.11701 0.174295 0.02188 0.075481 1.182298 0.1967 3.624193 0.295085 3898607 299.1411
n = 5 train 0.011785 0.062231 0.012475 0.066023 0.013094 0.06679 0.008994 0.051265 0.015191 0.06865
alpha = 0.001 val 15445.07 15.67267 0.022173 0.085989 0.043852 0.100594 0.559677 0.173687 0.028335 0.083564
test 0.023088 0.077373 0.039256 0.092053 0.018418 0.077099 5.28E+09 9620.684 0.01981 0.086757
n = 5 train 0.014977 0.072008 0.014407 0.072575 0.013896 0.069461 0.019862 0.08504 0.014425 0.069129
alpha = 0.01 val 189.0897 1.774678 0.018694 0.077841 0.029265 0.086294 0.03463 0.097197 0.016937 0.082529
test 7.9183 0.348368 0.01909 0.092738 0.014368 0.070167 0.01849 0.08203 0.037869 0.100108
n = 5 train 0.014898 0.078158 0.018041 0.079706 0.018484 0.082555 0.020135 0.08891 0.019432 0.090051
alpha = 0.1 val 0.02043 0.081274 0.021353 0.089109 0.017707 0.077925 0.012242 0.06976 0.015578 0.079284
test 0.028556 0.103326 0.028115 0.101376 0.020678 0.086157 0.017463 0.085539 0.015007 0.07645
n = 5 train 0.021273 0.08945 0.023323 0.095119 0.01717 0.07619 0.020045 0.08478 0.026555 0.100627
alpha = 1.0 val 0.020197 0.08971 0.022123 0.083571 0.025296 0.09229 0.018475 0.084123 0.017897 0.086163
test 0.024584 0.092887 0.025819 0.100484 0.017935 0.085162 0.017611 0.084882 0.021931 0.092893
n = 5 train 0.026772 0.10959 0.026 0.109159 0.03125 0.116555 0.024879 0.106579 0.030661 0.117127
alpha = 10.0 val 0.025942 0.099843 0.025083 0.10549 827635.6 80.6446 0.024384 0.102148 0.030651 0.120308
test 0.022345 0.098964 0.029402 0.110514 827767.1 80.38624 0.034334 0.116864 0.025282 0.111504
n = 5 train 0.051445 0.168786 0.045897 0.157817 0.051856 0.167544 0.049741 0.163085 0.045574 0.151348
alpha = 100.0 val 0.049231 0.162911 930.0085 2.862775 0.052947 0.166112 0.074996 0.194954 78.01957 1.266441
test 0.067221 0.185472 2437.234 6.293032 0.06325 0.18068 0.041307 0.152867 34.17256 0.696948
n = 5 train 0.070112 0.19572 0.068903 0.190224 0.070646 0.205056 0.06338 0.188982 0.06625 0.19059
alpha = 1000.0 val 0.06432 0.190414 0.085314 0.21191 0.059819 0.176827 0.048986 0.165468 0.071034 0.202015
test 0.045338 0.168759 0.063928 0.192411 0.092667 0.222073 0.084436 0.212722 0.056736 0.178952
n = 6 train 0.011126 0.060991 0.014274 0.068216 0.00982 0.055993 0.008923 0.052912 0.013802 0.069041
alpha = 0.0001 val 0.0178 0.079376 0.04099 0.08606 3.651646 0.305319 0.039011 0.076156 0.022725 0.07986
test 0.68584 0.136622 6.773182 0.361393 0.026785 0.087962 0.948731 0.229538 0.561786 0.134032
n = 6 train 0.01183 0.062111 0.011481 0.060487 0.011833 0.061141 0.013353 0.063427 0.008481 0.050556
alpha = 0.001 val 0.043859 0.103746 34979.07 23.51521 0.024383 0.077785 0.050186 0.090673 1.55E+11 35027.31
test 0.016609 0.07247 0.019926 0.089675 0.039758 0.098815 0.055275 0.096958 1.55E+11 34715.07
n = 6 train 0.019362 0.087428 0.013551 0.068685 0.017024 0.079305 0.014105 0.069016 0.014513 0.069707
alpha = 0.01 val 0.017647 0.075421 0.04683 0.099036 0.012347 0.069604 0.103855 0.108822 0.023274 0.088962
test 0.0459 0.096467 0.018369 0.075734 0.024005 0.092684 1805.368 5.361991 0.023459 0.093225
n = 6 train 0.016744 0.081191 0.017691 0.0831 0.020386 0.088581 0.018429 0.084722 0.012505 0.064371
alpha = 0.1 val 0.01954 0.094201 0.017086 0.076794 0.012777 0.066012 3096.751 5.011297 5607.02 6.711211
test 0.020749 0.088845 0.019006 0.088198 0.015769 0.078559 3013.917 4.910588 5536.854 6.659998
n = 6 train 0.018037 0.079071 0.019907 0.088897 0.02162 0.092011 0.017523 0.081126 0.022249 0.091206
alpha = 1.0 val 0.018428 0.087384 0.021778 0.088144 0.031287 0.108192 9.21E+10 26819.36 0.015165 0.072403
test 0.027445 0.100244 0.019981 0.085649 0.024872 0.09794 1.91E+11 54403.28 0.021349 0.088959
n = 6 train 0.027123 0.111539 0.025641 0.105909 0.026595 0.109379 0.026169 0.107388 0.026102 0.1022
alpha = 10.0 val 0.031227 0.119803 0.029473 0.114165 0.023019 0.0965 0.029692 0.120703 0.033426 0.118668
test 0.028239 0.110188 0.031058 0.110033 0.037256 0.131595 0.031141 0.114818 0.029376 0.112211
n = 6 train 0.050794 0.166453 0.060148 0.177994 0.052125 0.171638 0.061166 0.188352 0.057851 0.176533
alpha = 100.0 val 0.062454 0.182276 0.060689 0.180774 0.052156 0.169597 0.040089 0.157784 0.074394 0.195952
test 0.04814 0.164845 0.05851 0.176955 0.060201 0.181926 0.049647 0.171865 0.0667 0.18545
n = 6 train 0.075092 0.205366 0.068469 0.194513 0.059653 0.179116 0.076625 0.207724 0.06071 0.179714
alpha = 1000.0 val 0.071898 0.207647 151.1709 1.286638 0.066725 0.189774 0.308376 0.248046 0.074627 0.202684
test 0.065874 0.199929 157.1688 1.283495 0.075903 0.199592 0.278549 0.214486 0.097159 0.221041
n = 7 train 0.010749 0.058839 0.011419 0.06088 0.015132 0.071702 0.01306 0.065735 0.01049 0.058113
alpha = 0.0001 val 0.028521 0.079695 0.166377 0.101929 0.017527 0.073203 0.721494 0.199142 0.423545 0.159125
test 0.027315 0.081156 0.159773 0.137411 0.262904 0.119848 0.210422 0.126822 0.037862 0.094739
n = 7 train 0.012526 0.062602 0.015342 0.071769 0.012705 0.063121 0.011827 0.062205 0.0145 0.070842
alpha = 0.001 val 0.025265 0.080667 0.027148 0.084999 0.018394 0.080301 0.025622 0.096158 0.135312 0.121799
test 0.026493 0.086414 0.390133 0.134968 0.026735 0.088921 0.092463 0.096304 0.012391 0.058566
n = 7 train 0.017949 0.081525 0.017245 0.08161 0.015678 0.075631 0.012719 0.067773 0.014448 0.07086
alpha = 0.01 val 0.028054 0.086731 0.014179 0.07018 0.022522 0.085828 0.027675 0.093749 5.72E+12 211343.8
test 0.029 0.099946 0.031039 0.089627 0.027534 0.09201 0.091285 0.113806 5.65E+12 212415
n = 7 train 0.01561 0.075193 0.018445 0.082444 0.015134 0.072775 0.018321 0.084158 0.012942 0.070096
alpha = 0.1 val 0.022241 0.087604 0.023372 0.085434 2082987 130.2058 0.019641 0.085848 1.28E+18 1.42E+08
test 0.020846 0.082755 0.020401 0.085897 1.1E+12 130648 0.018009 0.082755 3578391 171.4596
n = 7 train 0.023999 0.095434 0.018764 0.08141 0.02136 0.089114 0.024119 0.095406 0.02017 0.085465
alpha = 1.0 val 0.023198 0.089591 0.023225 0.097206 0.019586 0.084045 0.01355 0.071197 0.016161 0.079094
test 0.023458 0.098776 0.020841 0.081252 0.022316 0.092507 0.014596 0.077687 0.024975 0.097208
n = 7 train 0.033343 0.126892 0.029253 0.115064 0.0312 0.117827 0.0262 0.108463 0.032914 0.121996
alpha = 10.0 val 0.031656 0.125361 0.028266 0.117386 0.036001 0.127538 0.023792 0.106726 0.027658 0.120991
test 0.03109 0.115821 0.024261 0.106753 0.024323 0.109131 0.032477 0.115733 0.035073 0.127273
n = 7 train 0.051785 0.16669 0.049318 0.16261 0.060001 0.181963 0.049626 0.159155 0.057555 0.177815
alpha = 100.0 val 0.063649 0.177076 5.982728 0.489701 0.044285 0.16042 0.05673 0.174137 0.044808 0.156522
test 0.047206 0.162407 0.054478 0.171775 0.046184 0.16142 0.061017 0.177544 0.048476 0.164297
n = 7 train 0.055065 0.176643 0.062936 0.181414 0.063053 0.187492 0.072537 0.202079 0.071422 0.204206
alpha = 1000.0 val 0.074099 0.195726 0.073129 0.204946 2.73E+09 6911.944 2.22419 0.32012 0.067765 0.191719
test 0.084173 0.207194 1724.585 6.380832 2.945306 0.335084 2.297418 0.359539 0.079253 0.205518
n = 8 train 0.010397 0.056033 0.010113 0.052401 0.010942 0.057502 0.012714 0.063997 0.012735 0.06329
alpha = 0.0001 val 1.524263 0.225304 0.109166 0.089839 0.017698 0.073623 3.757129 0.237081 0.153219 0.12182
test 0.461029 0.131691 0.14588 0.1236 0.037231 0.094525 0.18442 0.101424 0.488 0.18458
n = 8 train 0.010883 0.057796 0.013864 0.067495 0.012621 0.063036 0.01409 0.06751 0.011949 0.060368
alpha = 0.001 val 0.029136 0.094976 0.107303 0.120941 0.022338 0.085119 0.036957 0.083333 0.406399 0.133629
test 0.929395 0.173486 0.028483 0.0753 0.061401 0.103314 0.027729 0.083762 0.032147 0.0904
n = 8 train 0.016677 0.07723 0.013467 0.066934 0.017482 0.079698 0.015709 0.076244 0.01608 0.078532
alpha = 0.01 val 63120338 993.259 0.035146 0.105941 0.044989 0.087664 2.35E+09 4280.962 0.027098 0.093139
test 0.060145 0.087643 0.134283 0.11967 0.018641 0.080802 3.53E+09 7299.06 0.076807 0.113363
n = 8 train 0.017494 0.082515 0.014845 0.073746 0.021261 0.091348 0.016858 0.079014 0.013993 0.067853
alpha = 0.1 val 0.021219 0.093433 372319.2 71.4042 0.023427 0.096335 0.025399 0.101736 7804229 348.5521
test 0.019574 0.083347 56639.08 21.08045 0.02105 0.086349 0.031305 0.105501 3107981 155.3758
n = 8 train 0.02064 0.081014 0.021534 0.088781 0.02005 0.085588 0.019754 0.084074 0.017458 0.076913
alpha = 1.0 val 0.018765 0.08662 2.79E+08 1475.266 0.024451 0.094251 0.023341 0.098949 0.025534 0.096132
test 0.032267 0.107623 2.68E+08 1440.428 0.017563 0.078115 0.019851 0.084559 0.021773 0.096084
n = 8 train 0.029557 0.118009 0.028543 0.113815 0.030106 0.113798 0.029184 0.112573 0.027249 0.112699
alpha = 10.0 val 0.024905 0.109119 0.043229 0.135357 0.027932 0.110589 0.026179 0.111588 0.03729 0.124039
test 0.027518 0.112641 0.022212 0.107277 0.031654 0.122449 0.026377 0.115792 0.019052 0.097286
n = 8 train 0.05058 0.164944 0.05556 0.172133 0.060467 0.18161 0.053908 0.169366 0.059213 0.183095
alpha = 100.0 val 0.062231 0.183147 0.091223 0.216382 0.061272 0.185828 0.05957 0.178322 0.039309 0.160299
test 0.05261 0.167706 0.052347 0.164788 0.070269 0.197425 0.042717 0.158111 0.080524 0.198308
n = 8 train 0.073642 0.202237 0.065454 0.183037 0.063962 0.185698 0.07599 0.203016 0.071087 0.19456
alpha = 1000.0 val 0.051617 0.178667 0.068983 0.190812 0.089923 0.216465 3.927166 0.385992 0.072653 0.19727
test 0.058266 0.191746 0.087156 0.217635 0.08373 0.198633 3.940423 0.370255 0.08674 0.211365
n = 9 train 0.014946 0.06937 0.012175 0.060427 0.01088 0.061175 0.012665 0.061952 0.010195 0.056877
alpha = 0.0001 val 0.106884 0.098405 0.101691 0.127666 0.061348 0.098394 0.190068 0.105469 0.055084 0.104641
test 0.728727 0.170569 0.262514 0.125584 0.315191 0.159311 0.356836 0.153115 0.022646 0.075654
n = 9 train 0.012998 0.068151 0.01198 0.062919 0.014194 0.072325 0.015892 0.075954 0.013433 0.064401
alpha = 0.001 val 0.455013 0.16174 0.078079 0.11744 0.27075 0.141264 0.137592 0.099273 0.058807 0.09608
test 0.080894 0.093003 0.036223 0.096685 0.027821 0.09113 0.09541 0.106901 0.033713 0.083317
n = 9 train 0.01522 0.072385 0.01358 0.069702 0.018648 0.08118 0.014494 0.070909 0.015748 0.077357
alpha = 0.01 val 0.021803 0.087341 0.025085 0.094096 0.023285 0.086045 0.032052 0.085754 0.015866 0.077182
test 0.090766 0.097009 0.022487 0.085945 0.030537 0.099202 0.027309 0.097063 0.025703 0.097104
n = 9 train 0.021576 0.089086 0.013729 0.071039 0.018325 0.08145 0.022035 0.095312 0.018478 0.081868
alpha = 0.1 val 0.030003 0.100235 0.023995 0.092562 0.026632 0.099749 0.018946 0.087228 0.019003 0.083434
test 0.02862 0.089745 0.029922 0.101217 0.032863 0.10671 0.014275 0.076949 0.020413 0.091038
n = 9 train 0.018121 0.084413 0.020509 0.087653 0.021267 0.084078 0.021379 0.086792 0.022746 0.092503
alpha = 1.0 val 2.36E+10 14275.42 0.02467 0.097303 0.020696 0.090207 1.72E+10 11745.59 0.018954 0.07666
test 2.28E+10 13296.43 0.017909 0.079696 0.018628 0.084776 2.12E+17 65919633 0.015569 0.078677
n = 9 train 0.027182 0.112708 0.03101 0.120364 0.029257 0.112726 0.028248 0.113878 0.02536 0.107209
alpha = 10.0 val 0.029755 0.114105 0.023732 0.103456 0.026832 0.114268 0.032881 0.11968 0.036308 0.117753
test 0.026148 0.10823 0.025041 0.113541 0.023495 0.103027 0.017814 0.098367 0.025957 0.108069
n = 9 train 0.051402 0.168199 0.05158 0.169355 0.05172 0.175401 0.055115 0.176543 0.057452 0.180286
alpha = 100.0 val 6.83E+08 2563.929 1.43E+14 1058290 0.059799 0.178887 3.63E+10 23838.08 1.88E+08 1363.189
test 6.78E+08 2292.086 4.42E+17 82859950 0.05779 0.178441 2.24E+10 13171.98 1.78E+08 1174.863
n = 9 train 0.06319 0.184597 0.075651 0.2063 0.069995 0.200417 0.055433 0.16743 0.063644 0.187161
alpha = 1000.0 val 0.055491 0.175636 0.071921 0.193008 0.071216 0.197009 0.085269 0.212155 0.066784 0.197219
test 0.071378 0.198235 0.06092 0.191866 0.049062 0.173333 0.065013 0.181954 0.067569 0.189894
n = 10 train 0.015841 0.071885 0.010363 0.057149 0.011905 0.061156 0.011177 0.059743 0.010658 0.057841
alpha = 0.0001 val 0.212763 0.115264 1.309118 0.207853 2.673601 0.268988 0.174967 0.143714 2.885116 0.288172
test 1.13194 0.203071 0.08916 0.106506 0.020254 0.07115 0.02861 0.082645 0.179602 0.109063
n = 10 train 0.012443 0.065184 0.012612 0.063399 0.013126 0.067395 0.015412 0.071014 0.013063 0.064179
alpha = 0.001 val 0.022002 0.082607 0.056825 0.092752 0.074383 0.091808 0.860144 0.185729 0.085512 0.111972
test 0.110592 0.113537 0.122826 0.104185 0.01992 0.07758 0.235734 0.123818 0.1972 0.113749
n = 10 train 0.020196 0.084173 0.014959 0.074741 0.012246 0.065196 0.01932 0.082704 0.013961 0.070639
alpha = 0.01 val 0.020366 0.080377 0.021243 0.08034 7.01E+10 26715.22 0.026992 0.095196 0.078345 0.132204
test 0.083721 0.128667 0.101527 0.124841 2.03E+09 3967.591 0.02962 0.087434 1.41E+09 3310.999
n = 10 train 0.017348 0.079596 0.020947 0.089738 0.017217 0.080167 0.018634 0.082284 0.02127 0.09324
alpha = 0.1 val 0.020863 0.088337 0.015854 0.078259 1.99E+09 3941.994 0.030761 0.095965 0.019531 0.086822
test 0.02325 0.092945 0.094677 0.114108 0.019443 0.08501 0.032264 0.107205 0.01871 0.08157
n = 10 train 0.019026 0.079847 0.019973 0.083813 0.019956 0.086916 0.019946 0.080994 0.021985 0.086563
alpha = 1.0 val 0.047158 0.12477 0.020536 0.088086 0.022056 0.082202 0.028593 0.101043 0.027086 0.096144
test 0.024105 0.088725 1.03E+09 2823.259 6.58E+08 2258.785 8.63E+08 2585.916 0.021432 0.091195
n = 10 train 0.029758 0.115585 0.029203 0.115985 0.031338 0.119071 0.028993 0.11228 0.032186 0.119464
alpha = 10.0 val 0.02105 0.106601 0.023044 0.104 0.035399 0.1295 0.028686 0.115765 0.031461 0.119017
test 0.030242 0.124689 0.033298 0.121518 0.022486 0.11002 0.041338 0.132218 0.034269 0.115712
n = 10 train 0.049398 0.162865 0.058701 0.18114 0.05608 0.172344 0.047166 0.154939 0.055018 0.176167
alpha = 100.0 val 0.057391 0.176662 0.056534 0.178086 9.81E+21 8.75E+09 0.057154 0.172014 0.096479 0.189442
test 0.063222 0.178187 0.064331 0.180708 0.069199 0.190883 0.065177 0.180319 2.05E+27 3.98E+12
n = 10 train 0.071145 0.199554 0.073934 0.201463 0.065234 0.191843 0.069928 0.191679 0.060581 0.181691
alpha = 1000.0 val 0.076052 0.204504 6347694 222.8693 2.75E+22 1.47E+10 0.063953 0.175413 7.46E+08 2414.072
test 4403017 184.9439 0.077051 0.202072 6519396 224.9915 0.087514 0.220131 0.068117 0.196761
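Several cells in the grid above are orders of magnitude larger than their neighbours (for example 1.28E+18 in the n = 7, alpha = 0.1 validation row), which suggests that individual runs occasionally diverge. When settings are compared, a mean over runs is dominated by such outliers, while a median is not. The Python sketch below illustrates this with three validation rows copied from the table. Because the column headers are not part of this excerpt, it assumes that the ten numbers in each row are five runs reported as pairs of error values. It also treats any first value above 1.0 as a diverged run. Both the pairing and the threshold are illustrative assumptions, and the helper names (`robust_score`, `diverged_runs`, `best_setting`) are hypothetical; this is not the selection procedure used in the deliverable.

```python
from statistics import median

# ASSUMPTION: each table row holds five runs, each reported as a pair
# (error_1, error_2). The real column headers are not shown in this excerpt.
Row = list[float]


def first_metric(row: Row) -> list[float]:
    """Return the first value of each (error_1, error_2) pair."""
    return row[0::2]


def robust_score(row: Row) -> float:
    """Median across runs, so a single diverged run cannot dominate."""
    return median(first_metric(row))


def diverged_runs(row: Row, threshold: float = 1.0) -> int:
    """Count runs whose first metric exceeds an assumed divergence threshold."""
    return sum(1 for v in first_metric(row) if v > threshold)


def best_setting(val_rows: dict[tuple[int, float], Row]) -> tuple[int, float]:
    """Pick the (n, alpha) key with the lowest median validation score."""
    return min(val_rows, key=lambda key: robust_score(val_rows[key]))


# Validation rows copied from the table above (n = 7).
val_rows = {
    (7, 0.1): [0.022241, 0.087604, 0.023372, 0.085434, 2082987, 130.2058,
               0.019641, 0.085848, 1.28e18, 1.42e08],
    (7, 1.0): [0.023198, 0.089591, 0.023225, 0.097206, 0.019586, 0.084045,
               0.01355, 0.071197, 0.016161, 0.079094],
    (7, 10.0): [0.031656, 0.125361, 0.028266, 0.117386, 0.036001, 0.127538,
                0.023792, 0.106726, 0.027658, 0.120991],
}

for key, row in val_rows.items():
    print(key, "median:", robust_score(row), "diverged runs:", diverged_runs(row))
print("best:", best_setting(val_rows))  # best: (7, 1.0)
```

With these three rows the median selects alpha = 1.0 for n = 7. A plain mean of the alpha = 0.1 row would instead be driven almost entirely by its two diverged runs.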
REFERENCES
1 Cao, X., Dai, X., & Liu, J. (2016). Building energy-consumption status worldwide and the state-of-the-art technologies for zero-energy buildings during the past decade. Energy and Buildings, 128, 198-213.
2 Abergel, T., Dean, B., & Dulac, J. (2017). Global Status Report 2017: Towards a zero-emission, efficient, and resilient buildings and construction sector. United Nations Environment Programme, 48.
3 https://ec.europa.eu/eurostat/statistics-explained/index.php/Energy_consumption_in_households
4 International Energy Agency (2013). Transition to Sustainable Buildings: Strategies and Opportunities to 2050.
5 Fayyad, U. M., Piatetsky-Shapiro, G., & Smyth, P. (1996, August). Knowledge Discovery and Data Mining: Towards a Unifying Framework. In KDD (Vol. 96, pp. 82-88).
6 Smil, V. (2017). Energy Transitions: Global and National Perspectives; & BP Statistical Review of World Energy. OurWorldInData.org/fossil-fuels, CC BY-SA.
7 Antonanzas, J., Osorio, N., Escobar, R., Urraca, R., Martinez-de-Pison, F. J., & Antonanzas-Torres, F. (2016). Review of photovoltaic power forecasting. Solar Energy, 136, 78-111.
8 Das, U. K., Tey, K. S., Seyedmahmoudian, M., Mekhilef, S., Idris, M. Y. I., Van Deventer, W., ... & Stojcevski, A. (2018). Forecasting of photovoltaic power generation and model optimization: A review. Renewable and Sustainable Energy Reviews, 81, 912-928.
9 Bacher, P., Madsen, H., & Perers, B. (2011). Short-term solar collector power forecasting. In Proceedings of ISES Solar World Conference.
10 Chumpolrat, K., Sangsuwan, V., Udomdachanut, N., Kittisontirak, S., Songtrai, S., Chinnavornrungsee, P., ... & Sriprapha, K. (2014). Effect of ambient temperature on performance of grid-connected inverter installed in Thailand. International Journal of Photoenergy, 2014.
11 Duffie, J. A., Beckman, W. A., & Blair, N. (2020). Solar Engineering of Thermal Processes, Photovoltaics and Wind. John Wiley & Sons.
12 Son, H., & Kim, C. (2017). Short-term forecasting of electricity demand for the residential sector using weather and social variables. Resources, Conservation and Recycling, 123, 200-207.
13 Tascikaraoglu, A., Boynuegri, A. R., & Uzunoglu, M. (2014). A demand side management strategy based on forecasting of residential renewable sources: A smart home system in Turkey. Energy and Buildings, 80, 309-320.
14 Lusis, P., Khalilpour, K. R., Andrew, L., & Liebman, A. (2017). Short-term residential load forecasting: Impact of calendar effects and forecast granularity. Applied Energy, 205, 654-669.
15 Zhang, X. M., Grolinger, K., & Capretz, M. A. Forecasting Residential Energy Consumption Using Support Vector Regressions.
16 Tascikaraoglu, A., & Sanandaji, B. M. (2016). Short-term residential electric load forecasting: A compressive spatio-temporal approach. Energy and Buildings, 111, 380-392.
17 Lusis, P., Khalilpour, K. R., Andrew, L., & Liebman, A. (2017). Short-term residential load forecasting: Impact of calendar effects and forecast granularity. Applied Energy, 205, 654-669.
18 Maltais, L. G., & Gosselin, L. (2019). Predicting Domestic Hot Water Demand Using Machine Learning for Predictive Control Purposes. In Multidisciplinary Digital Publishing Institute Proceedings (Vol. 23, No. 1, p. 6).