Corrected ISA Paper
Transcript of Corrected ISA Paper
8/8/2019 Corrected ISA Paper
INTRODUCTION
Fuel cells convert chemical energy to electric energy. Recent global economic and political conditions
raise the desirability of PEM fuel cells as an alternative residential power source due to their reliance on
readily available fuels such as natural gas, propane or bottled hydrogen. Unfortunately, commercial viability remains elusive, due mainly to the prohibitively short system life. While the industry is hard at
work characterizing and mitigating the causes of system failure, comparatively little attention has been
paid to the possibility of extending system life through more efficient operation. Predictive models of residential power usage are the key element of the control systems that will drive efficient operation. To
maximize efficiency these models must adapt to seasonality and the changing habits of users.
PEM fuel cells create electric power using a three-stage process. In the first stage, fuel is converted to pure hydrogen by the reformer. Auto-thermal reforming is a commonly used process in which the fuel
is broken down into hydrogen and carbon monoxide through the introduction of steam in the presence of
a catalyst.
FIGURE 1 FUEL CELL BLOCK DIAGRAM
The power-producing element of the fuel cell system is the stack. The stack consists of a number of
individual fuel cells stacked up in series. Each cell contains a membrane electrode assembly (MEA)
between two conducting plates. The MEA consists of a fuel electrode (anode) and an oxidant electrode (cathode) separated by an ion-conducting membrane. When hydrogen gas is introduced into the system
on the anode side, the catalyst surface of the membrane splits hydrogen gas molecules into protons and
electrons. The protons pass through the membrane to react with oxygen on the cathode side (forming
water). The electrons, which cannot pass through the membrane, must travel around it, creating the source of DC electricity. The power inverter converts the DC power to AC. Additionally, the fuel cell
has a cooling system and a water management system. Fuel cells cannot follow power transitions instantaneously, so batteries are used to handle surges.
INTELLIGENT CONTROL APPROACH
In the absence of usage profiles, the fuel cell must operate continuously and the batteries must be kept at
a fairly high state of charge to provide power quality comparable to the grid. To avoid brownouts, a
minimum amount of power must be generated at all times. The excess energy produced can provide
heat if necessary, and the power can be returned to the grid in some locations. Clearly, however, this
prolonged operation adds unnecessary run hours to the stack and accelerates end of life.
Reliable information about homeowner power usage habits would provide the opportunity to idle the
system or even shut it down in some cases. Additionally, the batteries could be held at a lower state of charge and recharged at times compatible with low-power operation. All of this would result in reduced
system run hours, less wear and prolonged system life. This is the driving force behind the development
of predictive models of residential power use for intelligent, adaptive control systems.
The general approach is to collect power usage information for a period of time on board the fuel cell,
and use the on-board model to predict a usage profile for the next day. Periods of low power use would
be identified, and if they are sufficiently long the system can be shut down or idled. Periods of high usage could be anticipated by increasing the battery state of charge.
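The scanning step described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the power threshold, the 15-minute interval count, and the function name are all assumptions made for the example.

```python
# Hypothetical sketch: scan a predicted next-day load profile (96 quarter-hour
# values, in kW) for windows where the fuel cell could be idled or shut down.
# The threshold (low_kw) and minimum duration (min_intervals) are illustrative
# assumptions, not values from the paper.

def find_idle_windows(profile, low_kw=0.5, min_intervals=8):
    """Return (start, end) index pairs of runs where predicted load stays
    below low_kw for at least min_intervals consecutive 15-minute steps."""
    windows, start = [], None
    for i, load in enumerate(profile):
        if load < low_kw:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_intervals:
                windows.append((start, i))
            start = None
    if start is not None and len(profile) - start >= min_intervals:
        windows.append((start, len(profile)))
    return windows

# Example: a quiet night (midnight to 6 am) followed by daytime activity.
profile = [0.2] * 24 + [1.5] * 72
print(find_idle_windows(profile))
```

A control algorithm would pair each returned window with an idle or shutdown decision, and raise the battery state of charge ahead of the high-usage periods between windows.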
PREDICTIVE MODELING OF RESIDENTIAL POWER USAGE
LOAD FORECASTING
Load forecasting is a method used by electric power providers to predict how much electricity a
specified group will demand at a given time. The current state of the art in load forecasting studies the
activities of both residential and commercial communities, as well as weather factors, on which to base
its predictions. The factors most interesting to forecasters, in addition to the actual load, are time of
day, day of the week (weekend versus midweek), holiday versus non-holiday, ambient temperature, dew point,
wind speed, and cloud cover, among other weather-related variables. In the stationary fuel cell
application, however, the model must be built based only on the activity of the single home that the
system supplies. Moreover, the physical location of the system will not always be outside the home,
where it could provide ambient temperature or other weather-related information. Therefore, in this
application, any possible model can have only previous load values and calendar time as inputs.
RESIDENTIAL POWER USAGE AS A STOCHASTIC PROCESS
The manner in which electric power is consumed in a home is unquestionably a random process. Even
small homes have refrigerator compressors, furnaces or air conditioners that may require power at any
time during the day, depending on weather, the number of occupants currently in the home and a number
of other factors. Figure 2 shows the power usage of a home over the course of a week, and it is clear that
while there is randomness in the process, both the mean and, to a lesser extent, the variance change in
time (the process is not homogeneous).
FIG. 2 RESIDENTIAL ELECTRIC POWER USAGE PROFILE
Initial attempts to model these data were based on the assumption that residential power usage is a
Markov process that is continuous in time but discrete in state. The load data were discretized by placing
them in bins. Matrix quantities and load transition probabilities can be calculated as follows:

P(xⱼ, tⱼ | xᵢ, tᵢ) = [e^(Λ·Δt)]ᵢⱼ   (1)

where the left-hand side is the matrix of transition probabilities from load i to load j in time Δt = tⱼ − tᵢ,
and Λ = the matrix of transition probability rates for all i and j.
This model was sufficient for system engineering analysis and design, as it was fairly straightforward to
apply Monte Carlo techniques and create simulated profiles. However, estimating the required matrices
for a useful number of load increments became computationally difficult even for one home. More
difficult still was the evaluation of the matrix exponential (matrix power series).
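The empirical side of this discrete-state model can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the bin edges, the toy load series, and the function name are assumptions for the example.

```python
import numpy as np

# Illustrative sketch (not the paper's code): estimate a discrete-state
# transition probability matrix from a binned load series, as in the
# continuous-time, discrete-state Markov model described above.

def transition_matrix(states, n_bins):
    """Count transitions between consecutive binned load states and
    normalize each row into probabilities."""
    counts = np.zeros((n_bins, n_bins))
    for i, j in zip(states[:-1], states[1:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0   # avoid dividing by zero for unvisited states
    return counts / row_sums

# Bin a toy load series (kW) into 3 equal-width states and estimate P.
loads = np.array([0.3, 0.4, 2.1, 2.2, 0.5, 2.3, 4.8, 4.9, 2.0])
states = np.digitize(loads, bins=[1.7, 3.4])   # 0: low, 1: mid, 2: high
P = transition_matrix(states, n_bins=3)
print(P.round(2))
```

Monte Carlo profiles follow by repeatedly sampling the next state from the row of P for the current state; the matrix exponential the text mentions (for a rate matrix) is available as `scipy.linalg.expm`, and its cost is one reason a fine load discretization became impractical.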
A more careful look at residential power usage suggests that it is more accurately modeled as a
continuous-time, continuous-state Markov random walk. In other words, it is fair to say that for all
practical purposes load usage is a continuous variable, even in a small home, given the variable load
requirements of blowers, compressors and pumps, among other things. Fortunately, continuous-time,
continuous-state Markov processes are governed by the laws of Geometric Brownian Motion (GBM),
fairly well characterized in the literature in a large number of stock market applications.
GBM may be described by the following stochastic differential equation, with dL(t) representing the
change in residential power usage in time:

dL(t) = μ·L(t)·dt + σ·L(t)·dZ   (2)

Where: L(t) = power load at time t
μ = drift
σ = volatility
dZ = Wiener increment = N(0,1)·√dt
N(0,1) = standard normal distribution
Dividing through by L(t) and applying Itô's Lemma [Ito, 1944] with F(L(t), t) = ln(L(t)):

dF = (μ − σ²/2)·dt + σ·dZ   (3)
This stochastic differential equation has an explicit solution that lends itself well to Monte Carlo
simulation:
L(t) = L₀ · exp[ (μ − σ²/2)·t + σ·√t·N(0,1) ]   (4)
With historical data, the literature [1] recommends a simple method for estimating the parameters
(μ − σ²/2) and σ, calculated from the data:

ν(tᵢ) = [ln(L(tᵢ)) − ln(L(tᵢ₋₁))] / Δt   (5)

Where: ν = the instantaneous drift

The parameter (μ − σ²/2) is then estimated by the mean of the ν data, and σ is simply the standard
deviation of all ν. Later, GBM will be applied to the analysis of residential usage, and a method will be
demonstrated for modeling the volatility and drift parameters as a function of time (inputs).
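Equations (4) and (5) translate into a short estimation-and-simulation recipe. The sketch below is an assumption-laden illustration, not the paper's code: the sampling interval, the toy data, and the √dt scaling convention for σ are all choices made for the example.

```python
import numpy as np

# Sketch of the estimation/simulation recipe of equations (4)-(5), assuming a
# fixed sampling interval dt (hours). The sqrt(dt) scaling of sigma is a
# standard convention, adopted here as an assumption.

def estimate_gbm_params(loads, dt):
    """Estimate (mu - sigma^2/2) and sigma from log returns of a load series."""
    log_returns = np.diff(np.log(loads))
    nu_bar = log_returns.mean() / dt                # mean of the nu data, eq. (5)
    sigma = log_returns.std(ddof=1) / np.sqrt(dt)   # volatility estimate
    return nu_bar, sigma

def simulate_gbm(l0, nu_bar, sigma, dt, n_steps, rng):
    """Monte Carlo path stepped via the explicit solution, eq. (4)."""
    z = rng.standard_normal(n_steps)                # N(0,1) draws
    increments = nu_bar * dt + sigma * np.sqrt(dt) * z
    return l0 * np.exp(np.cumsum(increments))

rng = np.random.default_rng(0)
loads = np.array([1.0, 1.1, 0.9, 1.2, 1.0, 1.3])    # toy load history (kW)
nu_bar, sigma = estimate_gbm_params(loads, dt=0.25)
path = simulate_gbm(loads[-1], nu_bar, sigma, dt=0.25, n_steps=96, rng=rng)
print(len(path))
```

Because the solution is an exponential of the accumulated increments, simulated loads stay strictly positive, which matches the physics of power draw.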
RESIDENTIAL POWER USAGE AND ARTIFICIAL NEURAL NETWORKS
A background literature search found that the most common method of electric power
load forecasting is by means of artificial neural networks (ANNs). ANNs have some major advantages
over other forecasting tools: they can model with high accuracy a data set that is nonlinear and interactive
by learning the general patterns associating the input(s) with the expected output(s). ANNs are
usually composed of three layers: an input layer, a middle or hidden layer (although there could be
several), and an output layer. Each input element to the model is connected to each neuron contained in
the hidden layer. In turn, the hidden layer is then connected to the output neuron(s). This type of
network, in which all signals flow in one direction from inputs to outputs, is called a feedforward
network. It is through these interconnections that ANNs can achieve high accuracy when modeling
nonlinear functions.
The output of the neural network is a matrix equation. Lying at each connection between neurons is a
weight, a value that assesses the strength of the input relative to the output. The products of the weights
and inputs are passed through an activation function, whose output serves as the input to the hidden
layer neurons. These values are then combined and passed through another activation function at the
output neuron to produce the final output value.
FIG. 4 BASIC ANN ARCHITECTURE
NETWORK TRAINING
In order to determine the weights at each neuron, the ANN needs to be trained. Training is the process
by which the weights and biases are optimized to minimize the overall error of the network. To train the
network, the training set, which is comprised of input values paired with their respective target values, is
passed through the network. There are certain precautions to take prior to training, primarily related to
proper generalization; the idea behind training is for the network to learn the general relationship
between the input and output values. Therefore, a large data set that is representative of the sample
space needs to be used. For example, the ideal load data used in training would represent all load usage
characteristics of the home. If too small a data set is used during training, one not representative of the
entire sample space, the network will not learn the general pattern. It will then perform poorly during
simulation or use on board the system.
Prior to training, the weight and bias values at all nodes are randomly initialized. The inputs are passed
through the network to produce an output, which is then compared to a target value. Depending upon
the error between the output and the target, the network weights and biases are adjusted. This process
continues until the weights and biases produce a minimum performance error.
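The loop just described (random initialization, forward pass, error against the target, weight adjustment) can be sketched for a toy one-hidden-layer network. This is a minimal gradient-descent illustration, not the toolbox algorithms the paper used; the architecture, learning rate, and synthetic target are all assumptions.

```python
import numpy as np

# Toy sketch of the training loop described above: random weight
# initialization, forward pass, error against targets, gradient-based
# weight/bias updates. All hyperparameters are illustrative only.

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))          # synthetic inputs
y = (X[:, 0] * X[:, 1]).reshape(-1, 1)         # nonlinear synthetic target

n_hidden, lr = 7, 0.1
W1 = rng.standard_normal((2, n_hidden)) * 0.5  # random initialization
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, 1)) * 0.5
b2 = np.zeros(1)

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))            # log-sigmoid activation

for epoch in range(5000):
    h = logsig(X @ W1 + b1)                    # forward pass: hidden layer
    out = h @ W2 + b2                          # linear output neuron
    err = out - y                              # compare output to target
    grad_out = 2.0 * err / len(X)              # gradient of mean squared error
    grad_h = (grad_out @ W2.T) * h * (1.0 - h) # backpropagate to hidden layer
    W2 -= lr * h.T @ grad_out                  # adjust weights and biases
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * X.T @ grad_h
    b1 -= lr * grad_h.sum(axis=0)

mse = float(np.mean((logsig(X @ W1 + b1) @ W2 + b2 - y) ** 2))
print(round(mse, 4))
```

In practice the toolbox algorithms discussed later (Levenberg-Marquardt, quasi-Newton) replace this plain gradient step with curvature-aware updates, but the initialize/forward/compare/adjust cycle is the same.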
PREDICTION INTERVALS FOR NEURAL NETWORKS
Point predictions with neural networks are subject to the same type of uncertainty questions as
regression or any other modeling tool. It is therefore desirable to characterize the uncertainty of the
prediction with some type of prediction interval. The width of the interval would be an integral part of
the intelligent control algorithm. Unfortunately, unlike regression, standard methods for prediction
interval estimation are not readily available for neural networks and are still the subject of debate. An
added complication in our application is the fact that residential power usage is a stochastic process and
both the mean value and the variance change in time.
The total variability of neural network predictions, like all model predictions, can be thought of as having
a model uncertainty component Sm² and a noise component Sν²(x). The general approach is to
estimate the model uncertainty by characterizing the change in network performance with respect to
changes in the network weights. The noise component can be estimated using a separate network that
models the variance as a function of the inputs.
Bishop [10] has proposed an estimate of the model uncertainty that makes use of the Jacobian and
Hessian matrices, calculated as part of the backpropagation training algorithms used in this work. The
Jacobian, J, is the matrix of first derivatives of the network errors with respect to the weights and
biases. The Hessian, H, is the matrix of second derivatives. The inverse of the Hessian is regarded as an
unbiased estimate of the variance/covariance matrix with respect to the network weights and biases.
The performance gradient is first estimated:

g = Jᵀ·E   (6)

Where: g = gradient of the error function
E = network error (difference between the actual and predicted load values)

With equation 6, the model uncertainty Sm² can now be estimated:

Sm² = gᵀ·H⁻¹·g   (7)
From a separate neural network (or an additional layer) with an exponential activation function, the noise
component Sν²(x) is estimated as a function of the input vector. Hwang and Ding [8] suggest the
following prediction interval:

ŷ(xₙ₊₁) ± t₁₋α/₂ · √( Sν²(xₙ₊₁) + Sm² )   (8)

Where: ŷ = the predicted response to the input set xₙ₊₁
t = the Student's t distribution, with degrees of freedom determined by n, d and k
n = number of training points
d = the number of input variables
k = the total number of estimated weights and biases
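Equations (6) through (8) chain together as a short numerical recipe. The sketch below uses synthetic stand-ins for the Jacobian, errors, noise variance, and point prediction (a real application would take these from a trained network), and approximates the t critical value with a constant; all of those are labeled assumptions.

```python
import numpy as np

# Illustrative sketch of equations (6)-(8): given a network's Jacobian J of
# errors w.r.t. its k weights/biases, estimate the model-uncertainty term and
# form a prediction interval. J, E, S_nu2 and y_hat are synthetic stand-ins,
# not outputs of a trained network.

rng = np.random.default_rng(2)
n, k = 50, 6                        # training points, weights + biases
J = rng.standard_normal((n, k))     # Jacobian of errors w.r.t. parameters
E = rng.standard_normal(n) * 0.2    # network errors on the training set

g = J.T @ E                         # eq. (6): performance gradient
H = J.T @ J + 1e-6 * np.eye(k)      # Gauss-Newton Hessian approximation
S_m2 = g @ np.linalg.solve(H, g)    # eq. (7): model uncertainty g' H^-1 g

S_nu2 = 0.05                        # noise variance at the new input (assumed)
y_hat = 1.2                         # network point prediction (assumed)
t_crit = 2.0                        # approx. 95% Student-t value (assumption)

half_width = t_crit * np.sqrt(S_nu2 + S_m2)       # eq. (8) half-width
lower, upper = y_hat - half_width, y_hat + half_width
print(lower, upper)
```

Since H here is symmetric positive definite, Sm² is guaranteed non-negative, so the interval half-width is always real.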
RESIDENTIAL ELECTRIC POWER LOAD DATA AND NETWORK INPUTS
The data were instantaneous power usage readings from multiple homes, taken at 15-minute intervals over a
calendar year. The data were of fairly high quality but did contain some zero values that generally
occurred in sequences of 3 to 20. Banks and Carson [4] recommend a Monte Carlo approach for
handling sequential missing data in a time series. The missing sequences were filled using sequences of
normal random numbers based on the mean and standard deviation of the previous five values.

The inputs to the network in this application are a combination of calendar times and previous load
values. As part of data preprocessing, the correlation matrix was calculated, and only input values that
were correlated less than 0.5 were retained.
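The gap-filling step can be sketched directly from the description above. This is an assumed reconstruction, not the authors' code: the clamping of draws at zero and the function name are choices made for the example.

```python
import numpy as np

# Sketch of the gap-filling step described above: replace a run of zero
# (missing) readings with normal random draws parameterized by the mean and
# standard deviation of the five preceding values. Clamping draws at zero
# (loads cannot be negative) is an added assumption.

def fill_missing(loads, window=5, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    filled = loads.astype(float).copy()
    for i in range(window, len(filled)):
        if filled[i] == 0.0:                       # zero marks a missing value
            prev = filled[i - window:i]            # previous five values
            filled[i] = max(rng.normal(prev.mean(), prev.std()), 0.0)
    return filled

loads = np.array([1.0, 1.2, 1.1, 0.9, 1.0, 0.0, 0.0, 0.0, 1.1, 1.0])
print(fill_missing(loads))
```

Because each filled value joins the history used for later gaps, a long run of zeros is filled with draws that drift with the local statistics rather than one frozen estimate.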
TABLE 1 ROBUST OPTIMIZATION EXPERIMENT
NETWORK OPTIMIZATION RESULTS
The results graphed in figure 5 below show the effects of the number of nodes, epochs, training periods,
and the type of training algorithm used on the signal-to-noise ratio and mean R2 value. The most robust
network architecture will be the one that maximizes the signal-to-noise ratio across all control factors.

Clearly, the number of nodes required for the most robust network would be either seven or eleven.
This supports the idea that a small number of neurons in the hidden layer is important for generalization
of the function. Too many neurons can result in overfitting, which explains why the network with 19
nodes had the lowest signal-to-noise ratio of the simulation set. When overfitting occurs, the network
has memorized the relationship between the input and output of the training set instead of learning it. A
network that has good performance during training but poor performance during simulation has
memorized and modeled the noise components of the training set. When inputs are passed through such
a network, the results will possess a high amount of error. The precautions to take prior to training are
therefore to verify that the data set is large enough, as well as representative of behavior. Conversely, if
there aren't enough neurons, the network is not flexible enough to generalize the function. This may be
why eleven nodes slightly outperformed seven nodes.

Likewise, too many epochs allowed the network to begin memorizing the relationship between input and
output, rather than only learning it. The training period that went back nine weeks performed the best,
since there was more data for the network to learn from. This longer data period contained more
examples of the time-load relationships and patterns that would reappear in the simulation set.
FIG. 5 SNR AND MEAN R2 FOR NETWORK ARCHITECTURE OPTIMIZATION
The training algorithms tested also had some effect on the signal-to-noise ratio. The trainbfg algorithm
slightly outperformed the trainlm algorithm [9]. Both algorithms are iterative, meaning they continue
training until the error function reaches a minimum. If the error begins to increase, training ends. One
problem with this approach is that if the algorithm has reached a local minimum, the error would have to
increase in order for training to continue toward the global minimum. The Levenberg-Marquardt
(trainlm) and quasi-Newton (trainbfg) algorithms do not allow this, so the search can be caught at a
local minimum. The same applies to saddle points, or very flat areas on the error surface. This is why
the performance of the training algorithms is subject to the initial choice of weights and biases [10].
Performance also depends, however, on the speed with which an algorithm converges. Since the L-M
algorithm moves faster (takes steeper steps) toward convergence, it would be more likely to get stuck at
a saddle point or a local minimum than the quasi-Newton algorithm. This may account for the improved
performance of trainbfg.
The log-sigmoid activation function outperformed the hyperbolic tangent function, if only slightly. One
reason may be that the log-sigmoid activation function does not allow a negative output value. No load
value should be negative, but since the load values used were in kilowatts, there could be times when the
house is drawing low power on the order of watts. By restricting the output to only positive values,
negative network outputs do not have to be corrected by the values of the weights and biases; instead
they are suppressed by the activation functions. This also applies to the linear and positive-linear
activation functions: the final output value does not have to be corrected by the weights and biases prior
to being passed through the activation functions.
CONFIRMATION EXPERIMENTS
It was concluded from the network optimization experiment that a three-layer feedforward network
should be used, with seven nodes in the hidden layer, a logsig activation function at the hidden layer,
and a poslin transfer function at the output neuron. The data set used for training should go back nine
weeks, and the network training algorithm should be the quasi-Newton backpropagation algorithm.
Finally, there should be 150 epochs of training.
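The selected architecture can be written out as a minimal forward pass. The weights below are random placeholders, not trained values, and the input dimension is an assumption; the point is only the layer shapes and the logsig/poslin activation pairing.

```python
import numpy as np

# Minimal forward-pass sketch of the selected architecture: seven hidden nodes
# with a log-sigmoid (logsig) activation and a positive-linear (poslin,
# ReLU-like) output neuron. Weights are random placeholders, not trained
# values; n_inputs is an assumption for illustration.

rng = np.random.default_rng(3)
n_inputs, n_hidden = 4, 7
W1, b1 = rng.standard_normal((n_inputs, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.standard_normal((n_hidden, 1)), np.zeros(1)

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))

def poslin(z):
    return np.maximum(z, 0.0)   # clips negative outputs to zero

def predict(x):
    return poslin(logsig(x @ W1 + b1) @ W2 + b2)

x = rng.standard_normal((1, n_inputs))
print(predict(x))
```

The poslin output guarantees non-negative load predictions, matching the reasoning given earlier for preferring activations that cannot produce negative values.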
FIG. 7 ANN RESULTS FOR ONE HOUR AHEAD PREDICTION (SIMULATION)
Figure 7 displays the results of load predictions for one day from a home in the West during the month
of May and another home in the Southeast in the summer. Prediction intervals were calculated as
described above. Table 2 displays the training and simulation R2 values for four homes.
TABLE 2 NETWORK RESULTS: R2 VALUE OF ACTUAL VS PREDICTED

                 Southeast home   West home   Southeast home   West home
                 (Winter)         (Spring)    (Summer)         (Fall)
Training R2      0.64             0.94        0.78             0.796
Simulation R2    0.78             0.94        0.749            0.96
The optimized network architecture has very good prediction accuracy over a range of homes and
seasons.
STOCHASTIC MODELING OF RESIDENTIAL LOAD DATA:
It has been demonstrated that residential power usage can be modeled and predicted with optimized
neural networks to a surprising level of accuracy and reproducibility. This suggests that although the
data appear noisy at first glance, the apparent randomness is probably due to multiple repeating and
interacting patterns in the data, not recognizable to the human eye. To illustrate this further, a GBM
simulation of the load data is examined.
Figure 8 below compares a GBM-simulated profile with actual load data for a day. The simulation was
conducted using equation 4. The drift and volatility parameters are constants, estimated from the
previous day's data.
FIG. 8 LOAD PROFILE WITH STATIC GBM SIMULATION
It should be no surprise that this static GBM simulation does a poor job of predicting residential load
usage. By definition the drift and volatility parameters are constants. Even with accurate estimates, the
model cannot accommodate anything more than linear changes in drift and volatility. The question is
raised, however, as to what extent the prediction accuracy of a GBM simulation could be improved if
both the drift and volatility parameters could be modeled as functions of time. This approach has been
characterized by [1] and [2], using empirical, autoregressive methods to model the drift and volatility.
With neural network modeling tools readily available, it seems prudent to use them for drift and
volatility modeling and to assess the prediction accuracy of this dynamic GBM simulation against the
results outlined above. The parameter ν was calculated for the load data (equation 5) of a particular
home, along with a five-point moving drift and volatility.

Volatilities are variances and as such are χ²-distributed. The actual distribution of the volatilities was
sufficiently close to exponential that a log transformation rendered it nearly normal. The same
feedforward backpropagation training was used to model both the moving drift and the moving log of
the volatilities. Equation 4 was used with the drift and volatility parameters replaced by the networks.
The simulation R values for the modeled drift and volatility, when compared to actual data, were 0.98
and 0.74 respectively.
FIG. 9 LOAD PROFILE WITH DYNAMIC GBM SIMULATION
In figure 9 it is evident that patterns begin to emerge in the predictions (the R value of these predictions
is about 0.5, for reference). The scale and magnitudes of the predictions are not yet close to the actual
data; however, the accuracy is considerably improved over the static drift and volatility predictions
above. Though more effort could be undertaken to optimize these predictions, it hardly seems necessary
given the prediction accuracy and generalization of the neural network models of the raw data.
CONCLUSIONS:
It was demonstrated in this paper that residential power usage profiles can be accurately modeled
using both neural networks and Geometric Brownian Motion with dynamic models for the drift and
volatility parameters. This alone is a significant step toward the predictive, adaptive control systems that
may one day be used to extend fuel cell system life. The neural network approach was far superior to the
GBM, even when neural networks themselves were used to model the drift and volatility parameters as
functions of time.
An effective method of optimizing network architecture parameters was demonstrated, based on
Taguchi robust design methodologies. The results of the analysis were predictable and grounded in
neural network theory.
Going forward, it will be our task to identify methods for determining optimal starting weights for the
network to avoid convergence to local minima. Adaptive training will also be studied, as well as
alternative network types (radial basis, sequential, Elman).
REFERENCES

1. Hull, J. & White, A., "The Pricing of Options on Assets with Stochastic Volatilities," Journal of
Finance, 42(2), 1987, pp. 281-300.
2. Levy, G., "An Introduction to GARCH Models in Finance," Financial Engineering News, 2001,
22.
3. Ingleby, M. & Onyango, S., "Robust Estimation of Historical Volatility by a Hough Transform,"
Master's Thesis, University of Huddersfield, 2002.
4. Banks, J. & Carson, J., "Random Variate Generation," Discrete-Event System Simulation,
Prentice Hall, Englewood Cliffs, New Jersey, 1984.
5. Taguchi, G., Chowdhury, S. & Taguchi, S., "Robust Engineering Process Formula," Robust
Engineering, McGraw-Hill, New York, New York, 2000.
6. Klebaner, F., "Brownian Motion Calculus," Introduction to Stochastic Calculus with
Applications, Imperial College Press, London, England, 2001.
7. Papadopoulos, G., Edwards, P.J. & Murray, A.F., "Confidence Estimation for Neural Networks: A
Practical Comparison," Department of Electronics and Electrical Engineering, University of
Edinburgh.
8. Hwang, J.T.G. & Ding, A.A., "Prediction Intervals for Artificial Neural Networks," Journal of the
American Statistical Association, 92(438), 1997, pp. 748-757.
9. Demuth, H. & Beale, M., "Backpropagation," Neural Network Toolbox for Use with MATLAB,
The MathWorks, 2001.
10. Bishop, C.M., "Parameter Optimization Algorithms," Neural Networks for Pattern Recognition,
Clarendon Press, Oxford, 1995.
11. Hagan, M.T., Demuth, H.B. & Beale, M.H., "Performance Optimization," Neural Network Design,
PWS Publishing Company, Boston, MA, 1996.
12. Lee, K.Y., Cha, Y.T. & Park, J.H., "Short-Term Load Forecasting Using an Artificial Neural
Network," IEEE Transactions on Power Systems, 7(1), 1992, pp. 124-132.