Corrected ISA Paper
Transcript of Corrected ISA Paper
8/8/2019 Corrected ISA Paper
INTRODUCTION
Fuel cells convert chemical energy to electric energy. Recent global economic and political conditions
raise the desirability of PEM fuel cells as an alternative residential power source due to their reliance on
readily available fuels such as natural gas, propane or bottled hydrogen. Unfortunately, commercial viability remains elusive, due mainly to the prohibitively short system life. While the industry is hard at
work characterizing and mitigating the causes of system failure, comparatively little attention has been
paid to the possibility of extending system life through more efficient operation. Predictive models of residential power usage are the key element of the control systems that will drive efficient operation. To
maximize efficiency these models must adapt to seasonality and the changing habits of users.
PEM fuel cells create electric power using a three-stage process. In the first stage, fuel is converted to pure hydrogen by the reformer. Auto-thermal reforming is a commonly used process in which the fuel
is broken down into hydrogen and carbon monoxide through the introduction of steam in the presence of
a catalyst.
FIGURE 1 FUEL CELL BLOCK DIAGRAM
The power-producing element of the fuel cell system is the stack. The stack consists of a number of
individual fuel cells stacked up in series. Each cell contains a membrane electrode assembly (MEA)
between two conducting plates. The MEA consists of a fuel electrode (anode) and an oxidant electrode (cathode) separated by an ion-conducting membrane. When hydrogen gas is introduced into the system
on the anode side, the catalyst surface of the membrane splits hydrogen gas molecules into protons and
electrons. The protons pass through the membrane to react with oxygen on the cathode side (forming
water). The electrons, which cannot pass through the membrane, must travel around it, creating the source of DC electricity. The power inverter converts the DC power to AC. Additionally, the fuel cell
has a cooling system and a water management system. Fuel cells cannot follow power transitions instantaneously, so batteries are used to handle surges.
INTELLIGENT CONTROL APPROACH
In the absence of usage profiles, the fuel cell must operate continuously and the batteries must be kept at
a fairly high state of charge to provide power quality comparable to the grid. To avoid brownouts, a
minimum amount of power must be generated at all times. The excess energy produced can provide
heat if necessary, and the power can be returned to the grid in some locations. Clearly, however, this
prolonged operation adds unnecessary run hours to the stack and accelerates end of life.
Reliable information about homeowner power usage habits would provide the opportunity to idle the
system or even shut it down in some cases. Additionally, the batteries could be held at a lower state of charge and recharged at times compatible with low-power operation. All of this would result in reduced
system run hours, less wear and prolonged system life. This is the driving force behind the development
of predictive models of residential power use for intelligent, adaptive control systems.
The general approach is to collect power usage information for a period of time on board the fuel cell,
and use the on-board model to predict a usage profile for the next day. Periods of low power use would
be identified, and if they are sufficiently long the system can be shut down or idled. Periods of high usage could be anticipated by increasing the battery state of charge.
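The scanning step described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the power threshold, the 15-minute interval count, and the function name are all assumptions made for the example.

```python
# Hypothetical sketch: scan a predicted next-day load profile (96 quarter-hour
# values, in kW) for windows where the fuel cell could be idled or shut down.
# The threshold (low_kw) and minimum duration (min_intervals) are illustrative
# assumptions, not values from the paper.

def find_idle_windows(profile, low_kw=0.5, min_intervals=8):
    """Return (start, end) index pairs of runs where predicted load stays
    below low_kw for at least min_intervals consecutive 15-minute steps."""
    windows, start = [], None
    for i, load in enumerate(profile):
        if load < low_kw:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_intervals:
                windows.append((start, i))
            start = None
    if start is not None and len(profile) - start >= min_intervals:
        windows.append((start, len(profile)))
    return windows

# Example: a quiet night (midnight to 6 am) followed by daytime activity.
profile = [0.2] * 24 + [1.5] * 72
print(find_idle_windows(profile))
```

A control algorithm would pair each returned window with an idle or shutdown decision, and raise the battery state of charge ahead of the high-usage periods between windows.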
PREDICTIVE MODELING OF RESIDENTIAL POWER USAGE
LOAD FORECASTING
Load forecasting is a method used by electric power providers to predict how much electricity a
specified group will demand at a given time. The current state of the art in load forecasting studies the
activities of both residential and commercial communities, as well as weather factors, on which to base
its predictions. The factors most interesting to forecasters, in addition to the actual load, are time of
day, day of the week (weekend versus midweek), holiday versus non-holiday, ambient temperature, dew point,
wind speed, and cloud cover, among other weather-related variables. In the stationary fuel cell
application, however, the model must be built based only on the activity of the single home that the
system supplies. Moreover, the physical location of the system will not always be outside the home,
where it could provide ambient temperature or other weather-related information. Therefore, in this
application, any possible model can have only previous load values and calendar time as inputs.
RESIDENTIAL POWER USAGE AS A STOCHASTIC PROCESS
The manner in which electric power is consumed in a home is unquestionably a random process. Even
small homes have refrigerator compressors, furnaces or air conditioners that may require power at any
time during the day, depending on weather, the number of occupants currently in the home and a number
of other factors. Figure 2 shows the power usage of a home over the course of a week, and it is clear that
while there is randomness in the process, both the mean and, to a lesser extent, the variance change in
time (the process is not homogeneous).
FIG. 2 RESIDENTIAL ELECTRIC POWER USAGE PROFILE
Initial attempts to model these data were based on the assumption that residential power usage is a
Markov process that is continuous in time but discrete in state. The load data were discretized by placing
them in bins. Matrix quantities and load transition probabilities can be calculated as follows:

P(xⱼ, tⱼ | xᵢ, tᵢ) = [e^(Λ·Δt)]ᵢⱼ   (1)

where the left-hand side is the matrix of transition probabilities from load i to load j in time Δt = tⱼ − tᵢ,
and Λ = the matrix of transition probability rates for all i and j.
This model was sufficient for system engineering analysis and design, as it was fairly straightforward to
apply Monte Carlo techniques and create simulated profiles. However, estimating the required matrices
for a useful number of load increments became computationally difficult even for one home. More
difficult still was the evaluation of the matrix exponential (matrix power series).
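The empirical side of this discrete-state model can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the bin edges, the toy load series, and the function name are assumptions for the example.

```python
import numpy as np

# Illustrative sketch (not the paper's code): estimate a discrete-state
# transition probability matrix from a binned load series, as in the
# continuous-time, discrete-state Markov model described above.

def transition_matrix(states, n_bins):
    """Count transitions between consecutive binned load states and
    normalize each row into probabilities."""
    counts = np.zeros((n_bins, n_bins))
    for i, j in zip(states[:-1], states[1:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0   # avoid dividing by zero for unvisited states
    return counts / row_sums

# Bin a toy load series (kW) into 3 equal-width states and estimate P.
loads = np.array([0.3, 0.4, 2.1, 2.2, 0.5, 2.3, 4.8, 4.9, 2.0])
states = np.digitize(loads, bins=[1.7, 3.4])   # 0: low, 1: mid, 2: high
P = transition_matrix(states, n_bins=3)
print(P.round(2))
```

Monte Carlo profiles follow by repeatedly sampling the next state from the row of P for the current state; the matrix exponential the text mentions (for a rate matrix) is available as `scipy.linalg.expm`, and its cost is one reason a fine load discretization became impractical.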
A more careful look at residential power usage suggests that it is more accurately modeled as a
continuous-time, continuous-state Markov random walk. In other words, it is fair to say that for all
practical purposes load usage is a continuous variable, even in a small home, given the variable load
requirements of blowers, compressors and pumps, among other things. Fortunately, continuous-time,
continuous-state Markov processes are governed by the laws of Geometric Brownian Motion (GBM),
fairly well characterized in the literature in a large number of stock market applications.
GBM may be described by the following stochastic differential equation, with dL(t) representing the
change in residential power usage in time:

dL(t) = μ·L(t)·dt + σ·L(t)·dZ   (2)

Where: L(t) = power load at time t
μ = drift
σ = volatility
dZ = Wiener increment = N(0,1)·√dt
N(0,1) = standard normal distribution
Dividing through by L(t) and applying Itô's Lemma [Ito, 1944] with F(L(t), t) = ln(L(t)):

dF = (μ − σ²/2)·dt + σ·dZ   (3)
This stochastic differential equation has an explicit solution that lends itself well to Monte Carlo
simulation:
L(t) = L₀ · exp[ (μ − σ²/2)·t + σ·√t·N(0,1) ]   (4)
With historical data, the literature [1] recommends a simple method for estimating the parameters
(μ − σ²/2) and σ, calculated from the data:

ν(tᵢ) = [ln(L(tᵢ)) − ln(L(tᵢ₋₁))] / Δt   (5)

Where: ν = the instantaneous drift

The parameter (μ − σ²/2) is then estimated by the mean of the ν data, and σ is simply the standard
deviation of all ν. Later, GBM will be applied to the analysis of residential usage, and a method will be
demonstrated for modeling the volatility and drift parameters as a function of time (inputs).
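Equations (4) and (5) translate into a short estimation-and-simulation recipe. The sketch below is an assumption-laden illustration, not the paper's code: the sampling interval, the toy data, and the √dt scaling convention for σ are all choices made for the example.

```python
import numpy as np

# Sketch of the estimation/simulation recipe of equations (4)-(5), assuming a
# fixed sampling interval dt (hours). The sqrt(dt) scaling of sigma is a
# standard convention, adopted here as an assumption.

def estimate_gbm_params(loads, dt):
    """Estimate (mu - sigma^2/2) and sigma from log returns of a load series."""
    log_returns = np.diff(np.log(loads))
    nu_bar = log_returns.mean() / dt                # mean of the nu data, eq. (5)
    sigma = log_returns.std(ddof=1) / np.sqrt(dt)   # volatility estimate
    return nu_bar, sigma

def simulate_gbm(l0, nu_bar, sigma, dt, n_steps, rng):
    """Monte Carlo path stepped via the explicit solution, eq. (4)."""
    z = rng.standard_normal(n_steps)                # N(0,1) draws
    increments = nu_bar * dt + sigma * np.sqrt(dt) * z
    return l0 * np.exp(np.cumsum(increments))

rng = np.random.default_rng(0)
loads = np.array([1.0, 1.1, 0.9, 1.2, 1.0, 1.3])    # toy load history (kW)
nu_bar, sigma = estimate_gbm_params(loads, dt=0.25)
path = simulate_gbm(loads[-1], nu_bar, sigma, dt=0.25, n_steps=96, rng=rng)
print(len(path))
```

Because the solution is an exponential of the accumulated increments, simulated loads stay strictly positive, which matches the physics of power draw.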
RESIDENTIAL POWER USAGE AND ARTIFICIAL NEURAL NETWORKS
A background literature search found that the most common method of electric power
load forecasting is by means of artificial neural networks (ANNs). ANNs have some major advantages
over other forecasting tools: they can model with high accuracy a data set that is nonlinear and interactive
by learning the general patterns associating the input(s) with the expected output(s). ANNs are
usually composed of three layers: an input layer, a middle or hidden layer (although there could be
several), and an output layer. Each input element to the model is connected to each neuron contained in
the hidden layer. In turn, the hidden layer is then connected to the output neuron(s). This type of
network, in which all signals flow in one direction from inputs to outputs, is called a feedforward
network. It is through these interconnections that ANNs can achieve high accuracy when modeling
nonlinear functions.
The output of the neural network is a matrix equation. Lying at each connection between neurons is a
weight, a value that assesses the strength of the input relative to the output. The products of the weights
and inputs are passed through an activation function, whose output serves as the input to the hidden
layer neurons. These values are then combined and passed through another activation function at the
output neuron to produce the final output value.
FIG. 4 BASIC ANN ARCHITECTURE
NETWORK TRAINING
In order to determine the weights at each neuron, the ANN needs to be trained. Training is the process
by which the weights and biases are optimized to minimize the overall error of the network. To train the
network, the training set, which is comprised of input values paired with their respective target values, is
passed through the network. There are certain precautions to take prior to training, primarily related to
proper generalization; the idea behind training is for the network to learn the general relationship
between the input and output values. Therefore, a large data set that is representative of the sample
space needs to be used. For example, the ideal load data used in training would represent all load usage
characteristics of the home. If too small a data set is used during training, one not representative of the
entire sample space, the network will not learn the general pattern. It will then perform poorly during
simulation or use on board the system.
Prior to training, the weight and bias values at all nodes are randomly initialized. The inputs are passed
through the network to produce an output, which is then compared to a target value. Depending upon
the error between the output and the target, the network weights and biases are adjusted. This process
continues until the weights and biases produce a minimum performance error.
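The loop just described (random initialization, forward pass, error against the target, weight adjustment) can be sketched for a toy one-hidden-layer network. This is a minimal gradient-descent illustration, not the toolbox algorithms the paper used; the architecture, learning rate, and synthetic target are all assumptions.

```python
import numpy as np

# Toy sketch of the training loop described above: random weight
# initialization, forward pass, error against targets, gradient-based
# weight/bias updates. All hyperparameters are illustrative only.

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))          # synthetic inputs
y = (X[:, 0] * X[:, 1]).reshape(-1, 1)         # nonlinear synthetic target

n_hidden, lr = 7, 0.1
W1 = rng.standard_normal((2, n_hidden)) * 0.5  # random initialization
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, 1)) * 0.5
b2 = np.zeros(1)

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))            # log-sigmoid activation

for epoch in range(5000):
    h = logsig(X @ W1 + b1)                    # forward pass: hidden layer
    out = h @ W2 + b2                          # linear output neuron
    err = out - y                              # compare output to target
    grad_out = 2.0 * err / len(X)              # gradient of mean squared error
    grad_h = (grad_out @ W2.T) * h * (1.0 - h) # backpropagate to hidden layer
    W2 -= lr * h.T @ grad_out                  # adjust weights and biases
    b2 -= lr * grad_out.sum(axis=0)
    W1 -= lr * X.T @ grad_h
    b1 -= lr * grad_h.sum(axis=0)

mse = float(np.mean((logsig(X @ W1 + b1) @ W2 + b2 - y) ** 2))
print(round(mse, 4))
```

In practice the toolbox algorithms discussed later (Levenberg-Marquardt, quasi-Newton) replace this plain gradient step with curvature-aware updates, but the initialize/forward/compare/adjust cycle is the same.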
PREDICTION INTERVALS FOR NEURAL NETWORKS
Point predictions with neural networks are subject to the same type of uncertainty questions as
regression or any other modeling tool. It is therefore desirable to characterize the uncertainty of the
prediction with some type of prediction interval. The width of the interval would be an integral part of
the intelligent control algorithm. Unfortunately, unlike regression, standard methods for prediction
interval estimation are not readily available for neural networks and are still the subject of debate. An
added complication in our application is the fact that residential power usage is a stochastic process and
both the mean value and the variance change in time.
The total variability of neural network predictions, like all model predictions, can be thought of as having
a model uncertainty component Sm² and a noise component Sν²(x). The general approach is to
estimate the model uncertainty by characterizing the change in network performance with respect to
changes in the network weights. The noise component can be estimated using a separate network that
models the variance as a function of the inputs.
Bishop [10] has proposed an estimate of the model uncertainty that makes use of the Jacobian and
Hessian matrices, calculated as part of the backpropagation training algorithms used in this work. The
Jacobian, J, is the matrix of first derivatives of the network errors with respect to the weights and
biases. The Hessian, H, is the matrix of second derivatives. The inverse of the Hessian is regarded as an
unbiased estimate of the variance/covariance matrix with respect to the network weights and biases.
The performance gradient is first estimated:

g = Jᵀ·E   (6)

Where: g = gradient of the error function
E = network error (difference between the actual and predicted load values)

With equation 6, the model uncertainty Sm² can now be estimated:

Sm² = gᵀ·H⁻¹·g   (7)
From a separate neural network (or an additional layer) with an exponential activation function, the noise
component Sν²(x) is estimated as a function of the input vector. Hwang and Ding [8] suggest the
following prediction interval:

ŷ(xₙ₊₁) ± t₁₋α/₂ · √( Sν²(xₙ₊₁) + Sm² )   (8)

Where: ŷ = the predicted response to the input set xₙ₊₁
t = the Student's t distribution, with degrees of freedom determined by n, d and k
n = number of training points
d = the number of input variables
k = the total number of estimated weights and biases
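Equations (6) through (8) chain together as a short numerical recipe. The sketch below uses synthetic stand-ins for the Jacobian, errors, noise variance, and point prediction (a real application would take these from a trained network), and approximates the t critical value with a constant; all of those are labeled assumptions.

```python
import numpy as np

# Illustrative sketch of equations (6)-(8): given a network's Jacobian J of
# errors w.r.t. its k weights/biases, estimate the model-uncertainty term and
# form a prediction interval. J, E, S_nu2 and y_hat are synthetic stand-ins,
# not outputs of a trained network.

rng = np.random.default_rng(2)
n, k = 50, 6                        # training points, weights + biases
J = rng.standard_normal((n, k))     # Jacobian of errors w.r.t. parameters
E = rng.standard_normal(n) * 0.2    # network errors on the training set

g = J.T @ E                         # eq. (6): performance gradient
H = J.T @ J + 1e-6 * np.eye(k)      # Gauss-Newton Hessian approximation
S_m2 = g @ np.linalg.solve(H, g)    # eq. (7): model uncertainty g' H^-1 g

S_nu2 = 0.05                        # noise variance at the new input (assumed)
y_hat = 1.2                         # network point prediction (assumed)
t_crit = 2.0                        # approx. 95% Student-t value (assumption)

half_width = t_crit * np.sqrt(S_nu2 + S_m2)       # eq. (8) half-width
lower, upper = y_hat - half_width, y_hat + half_width
print(lower, upper)
```

Since H here is symmetric positive definite, Sm² is guaranteed non-negative, so the interval half-width is always real.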
RESIDENTIAL ELECTRIC POWER LOAD DATA AND NETWORK INPUTS
The data were instantaneous power usage readings from multiple homes, taken at 15-minute intervals over a
calendar year. The data were of fairly high quality but did contain some zero values that generally
occurred in sequences of 3 to 20. Banks and Carson [4] recommend a Monte Carlo approach for
handling sequential missing data in a time series. The missing sequences were filled using sequences of
normal random numbers based on the mean and standard deviation of the previous five values.

The inputs to the network in this application are a combination of calendar times and previous load
values. As part of data preprocessing, the correlation matrix was calculated, and only input values that
were correlated less than 0.5 were retained.
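The gap-filling step can be sketched directly from the description above. This is an assumed reconstruction, not the authors' code: the clamping of draws at zero and the function name are choices made for the example.

```python
import numpy as np

# Sketch of the gap-filling step described above: replace a run of zero
# (missing) readings with normal random draws parameterized by the mean and
# standard deviation of the five preceding values. Clamping draws at zero
# (loads cannot be negative) is an added assumption.

def fill_missing(loads, window=5, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    filled = loads.astype(float).copy()
    for i in range(window, len(filled)):
        if filled[i] == 0.0:                       # zero marks a missing value
            prev = filled[i - window:i]            # previous five values
            filled[i] = max(rng.normal(prev.mean(), prev.std()), 0.0)
    return filled

loads = np.array([1.0, 1.2, 1.1, 0.9, 1.0, 0.0, 0.0, 0.0, 1.1, 1.0])
print(fill_missing(loads))
```

Because each filled value joins the history used for later gaps, a long run of zeros is filled with draws that drift with the local statistics rather than one frozen estimate.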
TABLE 1 ROBUST OPTIMIZATION EXPERIMENT
NETWORK OPTIMIZATION RESULTS
The results graphed in figure 5 below show the effects of the number of nodes, epochs, training periods,
and the type of training algorithm used on the signal-to-noise ratio and mean R2 value. The most robust
network architecture will be the one that maximizes the signal-to-noise ratio across all control factors.

Clearly, the number of nodes required for the most robust network would be either seven or eleven.
This supports the idea that a small number of neurons in the hidden layer is important for generalization
of the function. Too many neurons can result in overfitting, which explains why the network with 19
nodes had the lowest signal-to-noise ratio of the simulation set. When overfitting occurs, the network
has memorized the relationship between the input and output of the training set instead of learning it. A
network that has good performance during training but poor performance during simulation has
memorized and modeled the noise components of the training set. When inputs are passed through such
a network, the results will possess a high amount of error. The precautions to take prior to training are
therefore to verify that the data set is large enough, as well as representative of behavior. Conversely, if
there aren't enough neurons, the network is not flexible enough to generalize the function. This may be
why eleven nodes slightly outperformed seven nodes.

Likewise, too many epochs allowed the network to begin memorizing the relationship between input and
output, rather than only learning it. The training period that went back nine weeks performed the best,
since there was more data for the network to learn from. This longer data period contained more
examples of the time-load relationships and patterns that would reappear in the simulation set.
FIG. 5 SNR AND MEAN R2 FOR NETWORK ARCHITECTURE OPTIMIZATION
The training algorithms tested also had some effect on the signal-to-noise ratio. The trainbfg algorithm
slightly outperformed the trainlm algorithm [9]. Both algorithms are iterative, meaning they continue
training until the error function reaches a minimum. If the error begins to increase, training ends. One
problem with this approach is that if the algorithm has reached a local minimum, the error would have to
increase in order for training to continue toward the global minimum. The Levenberg-Marquardt
(trainlm) and quasi-Newton (trainbfg) algorithms do not allow this, so the search can be caught at a
local minimum. The same applies to saddle points, or very flat areas on the error surface. This is why
the performance of the training algorithms is subject to the initial choice of weights and biases [10].
Performance also depends, however, on the speed with which an algorithm converges. Since the L-M
algorithm moves faster (takes steeper steps) toward convergence, it would be more likely to get stuck at
a saddle point or a local minimum than the quasi-Newton algorithm. This may account for the improved
performance of trainbfg.
The log-sigmoid activation function outperformed the hyperbolic tangent function, if only slightly. One
reason may be that the log-sigmoid activation function does not allow a negative output value. No load
value should be negative, but since the load values used were in kilowatts, there could be times when the
house is drawing low power on the order of watts. By restricting the output to only positive values,
negative network outputs do not have to be corrected by the values of the weights and biases; instead
they are suppressed by the activation functions. This also applies to the linear and positive-linear
activation functions: the final output value does not have to be corrected by the weights and biases prior
to being passed through the activation functions.
CONFIRMATION EXPERIMENTS
It was concluded from the network optimization experiment that a three-layer feedforward network
should be used, with seven nodes in the hidden layer, a logsig activation function at the hidden layer,
and a poslin transfer function at the output neuron. The data set used for training should go back nine
weeks, and the network training algorithm should be the quasi-Newton backpropagation algorithm.
Finally, there should be 150 epochs of training.
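The selected architecture can be written out as a minimal forward pass. The weights below are random placeholders, not trained values, and the input dimension is an assumption; the point is only the layer shapes and the logsig/poslin activation pairing.

```python
import numpy as np

# Minimal forward-pass sketch of the selected architecture: seven hidden nodes
# with a log-sigmoid (logsig) activation and a positive-linear (poslin,
# ReLU-like) output neuron. Weights are random placeholders, not trained
# values; n_inputs is an assumption for illustration.

rng = np.random.default_rng(3)
n_inputs, n_hidden = 4, 7
W1, b1 = rng.standard_normal((n_inputs, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.standard_normal((n_hidden, 1)), np.zeros(1)

def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))

def poslin(z):
    return np.maximum(z, 0.0)   # clips negative outputs to zero

def predict(x):
    return poslin(logsig(x @ W1 + b1) @ W2 + b2)

x = rng.standard_normal((1, n_inputs))
print(predict(x))
```

The poslin output guarantees non-negative load predictions, matching the reasoning given earlier for preferring activations that cannot produce negative values.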
FIG. 7 ANN RESULTS FOR ONE HOUR AHEAD PREDICTION (SIMULATION)
Figure 7 displays the results of load predictions for one day from a home in the West during the month
of May and another home in the Southeast in the summer. Prediction intervals were calculated as
described above. Table 2 displays the training and simulation R2 values for four homes.
TABLE 2 NETWORK RESULTS: R2 VALUE OF ACTUAL VS PREDICTED

                 Southeast home   West home   Southeast home   West home
                 (Winter)         (Spring)    (Summer)         (Fall)
Training R2      0.64             0.94        0.78             0.796
Simulation R2    0.78             0.94        0.749            0.96
The optimized network architecture has very good prediction accuracy over a range of homes and
seasons.
STOCHASTIC MODELING OF RESIDENTIAL LOAD DATA:
It has been demonstrated that residential power usage can be modeled and predicted with optimized
neural networks to a surprising level of accuracy and reproducibility. This suggests that although the
data appear noisy at first glance, the apparent randomness is probably due to multiple repeating and
interacting patterns in the data, not recognizable to the human eye. To illustrate this further, a GBM
simulation of the load data is examined.
Figure 8 below compares a GBM-simulated profile with actual load data for a day. The simulation was
conducted using equation 4. The drift and volatility parameters are constants, estimated from the
previous day's data.
FIG. 8 LOAD PROFILE WITH STATIC GBM SIMULATION
It should be no surprise that this static GBM simulation does a poor job of predicting residential load
usage. By definition the drift and volatility parameters are constants. Even with accurate estimates, the
model cannot accommodate anything more than linear changes in drift and volatility. The question is
raised, however, as to what extent the prediction accuracy of a GBM simulation could be improved if
both the drift and volatility parameters could be modeled as functions of time. This approach has been
characterized by [1] and [2], using empirical, autoregressive methods to model the drift and volatility.
With neural network modeling tools readily available, it seems prudent to use them for drift and
volatility modeling and to assess the prediction accuracy of this dynamic GBM simulation against the
results outlined above. The parameter ν was calculated for the load data (equation 5) of a particular
home, along with a five-point moving drift and volatility.

Volatilities are variances and as such are χ²-distributed. The actual distribution of the volatilities was
sufficiently close to exponential that a log transformation rendered it nearly normal. The same
feedforward backpropagation training was used to model both the moving drift and the moving log of
the volatilities. Equation 4 was used with the drift and volatility parameters replaced by the networks.
The simulation R values for the modeled drift and volatility, when compared to actual data, were 0.98
and 0.74 respectively.
FIG. 9 LOAD PROFILE WITH DYNAMIC GBM SIMULATION
In figure 9 it is evident that patterns begin to emerge in the predictions (the R value of these predictions
is about 0.5, for reference). The scale and magnitudes of the predictions are not yet close to the actual
data; however, the accuracy is considerably improved over the static drift and volatility predictions
above. Though more effort could be undertaken to optimize these predictions, it hardly seems necessary
given the prediction accuracy and generalization of the neural network models of the raw data.
CONCLUSIONS:
It was demonstrated in this paper that residential power usage profiles can be accurately modeled
using both neural networks and Geometric Brownian Motion with dynamic models for the drift and
volatility parameters. This alone is a significant step toward the predictive, adaptive control systems that
may one day be used to extend fuel cell system life. The neural network approach was far superior to the
GBM, even when neural networks themselves were used to model the drift and volatility parameters as
functions of time.
An effective method of optimizing network architecture parameters was demonstrated, based on
Taguchi robust design methodologies. The results of the analysis were predictable and grounded in
neural network theory.
Going forward, it will be our task to identify methods for determining optimal starting weights for the
network to avoid convergence to local minima. Adaptive training will also be studied, as well as
alternative network types (radial basis, sequential, Elman).
REFERENCES

1. Hull, J. & White, A., "The Pricing of Options on Assets with Stochastic Volatilities," Journal of
Finance, 42(2), 1987, pp. 281-300.
2. Levy, G., "An Introduction to GARCH Models in Finance," Financial Engineering News, 2001,
22.
3. Ingleby, M. & Onyango, S., "Robust Estimation of Historical Volatility by a Hough Transform,"
Master's Thesis, University of Huddersfield, 2002.
4. Banks, J. & Carson, J., "Random Variate Generation," Discrete-Event System Simulation,
Prentice Hall, Englewood Cliffs, New Jersey, 1984.
5. Taguchi, G., Chowdhury, S. & Taguchi, S., "Robust Engineering Process Formula," Robust
Engineering, McGraw-Hill, New York, New York, 2000.
6. Klebaner, F., "Brownian Motion Calculus," Introduction to Stochastic Calculus with
Applications, Imperial College Press, London, England, 2001.
7. Papadopoulos, G., Edwards, P.J. & Murray, A.F., "Confidence Estimation for Neural Networks: A
Practical Comparison," Department of Electronics and Electrical Engineering, University of
Edinburgh.
8. Hwang, J.T.G. & Ding, A.A., "Prediction Intervals for Artificial Neural Networks," Journal of the
American Statistical Association, 92(438), 1997, pp. 748-757.
9. Demuth, H. & Beale, M., "Backpropagation," Neural Network Toolbox for Use with MATLAB,
The MathWorks, 2001.
10. Bishop, C.M., "Parameter Optimization Algorithms," Neural Networks for Pattern Recognition,
Clarendon Press, Oxford, 1995.
11. Hagan, M.T., Demuth, H.B. & Beale, M.H., "Performance Optimization," Neural Network Design,
PWS Publishing Company, Boston, MA, 1996.
12. Lee, K.Y., Cha, Y.T. & Park, J.H., "Short-Term Load Forecasting Using an Artificial Neural
Network," IEEE Transactions on Power Systems, 7(1), 1992, pp. 124-132.