TRAINING AND SOURCE CODE GENERATION FOR ARTIFICIAL
NEURAL NETWORKS
BY
BRANDON WINRICH
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
IN
COMPUTER SCIENCE
UNIVERSITY OF RHODE ISLAND
2015
MASTER OF SCIENCE THESIS
OF
BRANDON WINRICH
APPROVED:
Thesis Committee:
Major Professor
DEAN OF THE GRADUATE SCHOOL
UNIVERSITY OF RHODE ISLAND
2015
ABSTRACT
The ideas and technology behind artificial neural networks have advanced con-
siderably since their introduction in 1943 by Warren McCulloch and Walter Pitts.
However, the complexity of large networks means that it may not be computation-
ally feasible to retrain a network during the execution of another program, or to
store a network in such a form that it can be traversed node by node. The purpose
of this project is to design and implement a program that would train an artificial
neural network and export source code for it so that the network may be used in
other projects.
After discussing some of the history of neural networks, I explain the
mathematical principles behind them. Two related training algorithms are discussed:
backpropagation and RPROP. I also go into detail about some of the more useful
activation functions.
The actual training portion of the project was not self-implemented. Instead,
a third-party library was used: Encog, developed by Heaton Research.
After analyzing how Encog stores the weights of the network, and how the network
is trained, I discuss how I used several of the more important classes. There are
also details of the slight modifications I needed to make to one of the classes in
the library.
The actual implementation of the project consists of five classes, all of which
are discussed in the fourth chapter. The program has two inputs by the user (a
config file and a training data set), and returns two outputs (a training error report
and the source code).
The paper concludes with discussions about additional features that may be
implemented in the future. Finally, an example is given, demonstrating that the
program works as intended.
TABLE OF CONTENTS
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
TABLE OF CONTENTS . . . . . . . . . . . . . . . . . . . . . . . . . . iii
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
CHAPTER
1 Background Information . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Predecessors to ANNs . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Perceptron . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 What is an Artificial Neural Network? . . . . . . . . . . . . . . 3
1.3 Justification for Study . . . . . . . . . . . . . . . . . . . . . . . 5
List of References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Mathematical Elements of Networks . . . . . . . . . . . . . . . 7
2.1 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Backpropagation . . . . . . . . . . . . . . . . . . . . . . 7
2.1.2 Resilient Propagation . . . . . . . . . . . . . . . . . . . . 9
2.2 Activation Functions . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Linear . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Sigmoid . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.3 Hyperbolic Tangent . . . . . . . . . . . . . . . . . . . . . 12
2.2.4 Elliott . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
List of References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 Java library Encog . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2 How Encog Stores Weights . . . . . . . . . . . . . . . . . . . . . 16
3.3 Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.4 Some Individual Classes . . . . . . . . . . . . . . . . . . . . . . 20
3.4.1 TrainingSetUtil . . . . . . . . . . . . . . . . . . . . . . . 20
3.4.2 BasicNetwork . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4.3 FlatNetwork . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.4.4 BasicLayer . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4.5 BasicMLDataSet . . . . . . . . . . . . . . . . . . . . . . 25
3.4.6 BasicMLDataPair . . . . . . . . . . . . . . . . . . . . . . 26
3.4.7 ActivationFunction . . . . . . . . . . . . . . . . . . . . . 26
4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2 Assumptions/Design Choices . . . . . . . . . . . . . . . . . . . . 28
4.3 Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3.1 Config File . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3.2 Training Data Set . . . . . . . . . . . . . . . . . . . . . . 32
4.4 Outputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.4.1 Training Error Report . . . . . . . . . . . . . . . . . . . 33
4.4.2 Source Code . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.5 Individual classes . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.5.1 NeuralGenerator . . . . . . . . . . . . . . . . . . . . . . 35
4.5.2 LayerInfo . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.5.3 OutputWriter . . . . . . . . . . . . . . . . . . . . . . . . 37
4.5.4 OutputWriterTxt . . . . . . . . . . . . . . . . . . . . . . 38
4.5.5 OutputWriterJava . . . . . . . . . . . . . . . . . . . . . . 39
List of References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5 Future work and conclusions . . . . . . . . . . . . . . . . . . . . 42
5.1 Categorical classification . . . . . . . . . . . . . . . . . . . . . . 42
5.2 Additional output formats . . . . . . . . . . . . . . . . . . . . . 42
5.3 Non-command line inputs . . . . . . . . . . . . . . . . . . . . . 43
5.4 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
List of References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
APPENDIX
Source Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
A.1 NeuralGenerator.java . . . . . . . . . . . . . . . . . . . . . . . . 49
A.2 LayerInfo.java . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
A.3 OutputWriter.java . . . . . . . . . . . . . . . . . . . . . . . . . . 67
A.4 OutputWriterTxt.java . . . . . . . . . . . . . . . . . . . . . . . . 71
A.5 OutputWriterJava.java . . . . . . . . . . . . . . . . . . . . . . . 77
A.6 TrainingSetUtil.java (modified) . . . . . . . . . . . . . . . . . . 84
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
LIST OF TABLES
Table Page
1 Average number of required epochs . . . . . . . . . . . . . . . . 11
2 Elliott vs TANH . . . . . . . . . . . . . . . . . . . . . . . . . . 15
LIST OF FIGURES
Figure Page
1 McCulloch-Pitts model of a neuron . . . . . . . . . . . . . . . . 1
2 An example of a neural network . . . . . . . . . . . . . . . . . . 4
3 An example of a neural network with bias nodes . . . . . . . . . 5
4 The two sides of a computing unit[1] . . . . . . . . . . . . . . . 7
5 Extended network for the computation of the error function[1] . 8
6 Result of the feed-forward step[1] . . . . . . . . . . . . . . . . . 9
7 Backpropagation path up to output unit j[1] . . . . . . . . . . . 9
8 Linear activation function . . . . . . . . . . . . . . . . . . . . . 11
9 Sigmoid activation function . . . . . . . . . . . . . . . . . . . . 12
10 Hyperbolic tangent activation function . . . . . . . . . . . . . . 13
11 Comparison between Elliott (solid) and sigmoid (dotted) activation functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
12 Comparison between Symmetric Elliott (solid) and hyperbolic tangent (dotted) activation functions . . . . . . . . . . . . . . . 14
13 Neural network with labeled weight indexes . . . . . . . . . . . 16
14 BasicNetwork.getWeight() . . . . . . . . . . . . . . . . . . . . . 17
15 Class hierarchy for training . . . . . . . . . . . . . . . . . . . . 18
16 Comparison between modified and original code . . . . . . . . . 22
17 Sample from first output file (.txt) . . . . . . . . . . . . . . . . 33
18 Sample from first output file (.csv) . . . . . . . . . . . . . . . . 33
19 Sample second output file (.txt) . . . . . . . . . . . . . . . . . . 39
20 Sample second output file (.java) . . . . . . . . . . . . . . . . . 41
21 test.csv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
22 output1.csv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
23 graph of output1.csv . . . . . . . . . . . . . . . . . . . . . . . . 45
24 Results from NeuralGenerator.java . . . . . . . . . . . . . . . . 46
25 TestModule.java . . . . . . . . . . . . . . . . . . . . . . . . . . 46
26 Results from output2.java . . . . . . . . . . . . . . . . . . . . . 46
27 Sample config file . . . . . . . . . . . . . . . . . . . . . . . . . . 47
28 output2.java . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
CHAPTER 1
Background Information
1.1 Predecessors to ANNs
The history of most neural network research can be traced back to the efforts
of Warren McCulloch and Walter Pitts. In their 1943 paper ‘A Logical Calculus
of the Ideas Immanent in Nervous Activity’[1], McCulloch and Pitts introduced the
foundation of a neuron, a single piece of the nervous system, which would respond
once a certain threshold had been reached. This model of a neuron is still used
today.
Figure 1: McCulloch-Pitts model of a neuron
The input values to a neuron are all given individual weights. These weighted
values are summed together and fed into a threshold function: every value greater
than 0 returns a value of 1, and all other values return 0.
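The weighted sum and threshold described above can be sketched in a few lines of Java. This is only an illustration: the class name ThresholdNeuron and the method fire() are hypothetical, and the example weights (which treat the last input as a constant 1 so that the neuron behaves like a logical AND) are an assumption, not values from the original model.

```java
class ThresholdNeuron {
    /** Returns 1 if the weighted sum of the inputs is greater than 0, otherwise 0. */
    static int fire(double[] inputs, double[] weights) {
        double sum = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i]; // each input is scaled by its own weight
        }
        return sum > 0 ? 1 : 0;            // threshold function
    }

    public static void main(String[] args) {
        // Hypothetical weights: the third input is held at 1 and acts as an offset.
        double[] weights = {1.0, 1.0, -1.5};
        System.out.println(fire(new double[] {1, 1, 1}, weights));
        System.out.println(fire(new double[] {1, 0, 1}, weights));
    }
}
```

With these assumed weights the neuron responds only when both of the first two inputs are 1, since that is the only case in which the weighted sum is positive.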
In 1949, neuropsychologist Donald Hebb published his book ‘The Organization
of Behavior’. Hebb postulated that “When an axon of cell A is near enough to
excite a cell B and repeatedly or persistently takes part in firing it, some growth
process or metabolic change takes place in one or both cells such that A’s efficiency,
as one of the cells firing B, is increased.” [2]. Hebbian learning influenced research
in the field of machine learning, especially in the area of unsupervised learning.
In 1958, Frank Rosenblatt developed the Perceptron. More information on
this will be presented in the following subsection.
In 1959, Bernard Widrow and Marcian Hoff developed a working model they
called ADALINE (ADAptive LINEar), as well as a more advanced version known
as MADALINE (Multiple ADAptive LINEar)[3]. These models were some of the
first to be applied to real world problems (such as eliminating echoes on phone
lines), and may still be in use today.[4]
Breakthroughs in neural network research declined starting in 1969, when
Marvin Minsky and Seymour Papert published their book ‘Perceptrons: an Intro-
duction to Computational Geometry’. In this book, Minsky and Papert claimed
Rosenblatt’s perceptron wasn’t as promising as it was originally believed to be.
For example, it was unable to correctly classify an XOR function. While this book
did introduce some new ideas about neural networks, it also contributed to what
was known as ‘the dark age of connectionism’ or an AI winter, as there was a lack
of major research for over a decade.
Interest in artificial networks declined, and the focus of the community
switched to other models such as support vector machines. [5]
1.1.1 Perceptron
In his 1958 paper, Frank Rosenblatt considered 3 questions[6]:
1. How is information about the physical world sensed, or detected, by the
biological system?
2. In what form is information stored, or remembered?
3. How does information contained in storage, or in memory, influence recogni-
tion and behavior?
The first question wasn’t addressed as he believed it “is in the province of
sensory physiology, and is the only one for which appreciable understanding has
been achieved.” The other two questions became the basis of his concept of a
perceptron (which he compared to the retina in an eye).
A perceptron functions as a single neuron, accepting weighted inputs and an
unweighted bias, the output of which is passed to a transfer function. This step
function evaluates to 1 if the value is positive, and either 0 or -1 if the value is
negative (the exact value may vary depending on the model). After iterations with
a learning algorithm, the perceptron calculates a decision surface to classify a data
set into two categories.
The perceptron learning algorithm is as follows[7]:
1. Initialize the weights and threshold to small random numbers.
2. Present a pattern vector $(x_1, x_2, \ldots, x_n)^T$ and evaluate the output of the
neuron.
3. Update the weights according to $w_j(t+1) = w_j(t) + \eta (d - y) x_j$, where d is
the desired output, y is the actual output of the neuron, t is the iteration
number, and η (0.0 < η < 1.0) is the gain (step size).
Steps two and three are repeated until the data set has been properly classified.
Unfortunately, due to the nature of the perceptron, it will only work with data
that is linearly separable.
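The learning algorithm above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the class and method names are hypothetical, the bias is updated as if it were a weight on a constant input of 1, and the weights start at zero instead of small random numbers so that the run is repeatable (both are assumptions made for this sketch).

```java
class PerceptronSketch {
    /** Step transfer function: 1 for a positive weighted sum, otherwise 0. */
    static int output(double[] w, double bias, double[] x) {
        double sum = bias;
        for (int j = 0; j < x.length; j++) {
            sum += w[j] * x[j];
        }
        return sum > 0 ? 1 : 0;
    }

    /**
     * Repeats steps two and three until an epoch produces no errors or
     * maxEpochs is reached. Returns the weights followed by the bias.
     */
    static double[] train(double[][] xs, int[] ds, double eta, int maxEpochs) {
        int n = xs[0].length;
        double[] w = new double[n]; // assumption: zeros instead of small random values
        double bias = 0.0;
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            int errors = 0;
            for (int p = 0; p < xs.length; p++) {
                int y = output(w, bias, xs[p]);
                int err = ds[p] - y;                 // (d - y)
                if (err != 0) {
                    errors++;
                    for (int j = 0; j < n; j++) {
                        w[j] += eta * err * xs[p][j]; // w_j(t+1) = w_j(t) + eta (d - y) x_j
                    }
                    bias += eta * err;               // bias treated as a weight on input 1
                }
            }
            if (errors == 0) {
                break; // every pattern is classified correctly
            }
        }
        double[] result = java.util.Arrays.copyOf(w, n + 1);
        result[n] = bias;
        return result;
    }
}
```

Training this sketch on the four patterns of a logical AND (a linearly separable problem) stops once all four are classified correctly. On XOR the loop would simply run until maxEpochs, which is the limitation noted above.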
1.2 What is an Artificial Neural Network?
An artificial neural network (sometimes referred to as an ANN, or just a neural
network) is a machine learning model inspired by biological neural networks (such
as the central nervous system).
Neural networks fall under the supervised learning paradigm. In supervised
learning, the network is presented with pairs of data, input and output. The goal
is to be able to map the input to the output, training in a way that minimizes the
error between the actual output and the desired output. More information may
be found in section 2.1.
Each piece of the network is known as a neuron. Neural networks still use the
McCulloch-Pitts model of a neuron (see figure 1). The inputs into a node are the
values from the previous layer (or the input values, if the layer in question is the
input layer). Each value is multiplied by the associated weight, and then those
products are summed together ($\sum_i w_i x_i$). Rather than being fed into a threshold
function, an activation function is used. This allows for a wider range of potential
output values, instead of just 0 or 1. The output value of the neuron can be used
as the input into the next layer, or as the output for the network.
Neural networks consist of at least three layers: an input layer, at least one
hidden layer, and an output layer:
Figure 2: An example of a neural network
It is also possible to have bias nodes. These nodes hold a constant value (often
+1), and act only as an input to another neuron (they do not have any inputs or
activation functions associated with themselves).
Figure 3: An example of a neural network with bias nodes
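The computation performed by one layer of such a network, including a bias node, can be sketched as below. The names are hypothetical, the sigmoid is used only as an example activation function, and storing the bias weight as the last entry of each row is an assumption made for this illustration.

```java
class FeedForwardSketch {
    /** Example activation function; any differentiable function could be substituted. */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /**
     * Computes the outputs of one layer. weights[j] holds the weights into
     * neuron j; the final entry of each row is the weight from the bias node,
     * which always outputs the constant +1.
     */
    static double[] layer(double[] inputs, double[][] weights) {
        double[] out = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double sum = weights[j][inputs.length];  // bias node contributes 1 * weight
            for (int i = 0; i < inputs.length; i++) {
                sum += weights[j][i] * inputs[i];    // weighted sum of the inputs
            }
            out[j] = sigmoid(sum);                   // activation instead of a threshold
        }
        return out;
    }
}
```

Calling layer() once per layer, and passing each result as the inputs of the next call, evaluates the whole network from the input layer to the output layer.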
1.3 Justification for Study
My main interest in machine learning pertains to the realm of video games.
Artificial intelligence is an important aspect of every game (except those which
are exclusively multiplayer, with no computer-controlled agents). AI-related
processes reportedly use 5-60% of the CPU, and this number has been known to climb
as high as 100% for turn-based strategy games[8]. While some modern games utilize
machine learning, most of this is done before the game is published (rather than
the training occurring during runtime). According to Charles and McGlinchey,
“online learning means that the AI learns (or continues to learn) whilst the end
product is being used, and the AI in games is able to adapt to the style of play
of the user. Online learning is a much more difficult prospect because it is a real-
time process and many of the commonly used algorithms for learning are therefore
not suitable.”[9]
The project that I have completed focuses on generating the source code for an
artificial neural network, which is directly applicable to the field of gaming. With
the actual training occurring during the development phase, it makes sense to have
a program that can create the network, separate from the rest of the project. The
source code that it outputs then allows the network to be used within the context
of a game. The other benefit of such a program is that it allows the neural network
to be used without having to maintain the structure of the network. Reducing the
results of the network down to mathematical formulas results in faster computation
times than having to walk through the nodes of a network (as stored in multiple
classes or data structures). The results of this project have been tested in a Quake
II environment.
List of References
[1] W. McCulloch and W. Pitts, “A logical calculus of the ideas immanent in nervous activity,” The bulletin of mathematical biophysics, vol. 5, no. 4, pp. 115–133, 1943. [Online]. Available: http://dx.doi.org/10.1007/BF02478259
[2] D. Hebb, “The organization of behavior; a neuropsychological theory.” 1949.
[3] B. Widrow, M. E. Hoff, et al., “Adaptive switching circuits.” 1960.
[4] C. Clabaugh, D. Myszewski, and J. Pang. “History: The 1940’s to the 1970’s.” [Online]. Available: https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history1.html
[5] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995. [Online]. Available: http://dx.doi.org/10.1007/BF00994018
[6] F. Rosenblatt, “The perceptron: a probabilistic model for information storage and organization in the brain.” Psychological review, vol. 65, no. 6, p. 386, 1958.
[7] A. K. Jain, J. Mao, and K. Mohiuddin, “Artificial neural networks: A tutorial,” 1996.
[8] L. Galway, D. Charles, and M. Black, “Machine learning in digital games: a survey,” Artificial Intelligence Review, vol. 29, no. 2, pp. 123–161, 2008.
[9] D. Charles and S. Mcglinchey, “The past, present and future of artificial neural networks in digital games,” Proceedings of the 5th international conference on computer games: artificial intelligence, design and education, pp. 163–169, 2004.
CHAPTER 2
Mathematical Elements of Networks
2.1 Training
2.1.1 Backpropagation
The concept of the backpropagation algorithm was first developed by Paul
Werbos in his 1974 PhD thesis, ‘Beyond Regression: New Tools for Prediction and
Analysis in the Behavioral Sciences’.
Much of the math in this section comes from Raúl Rojas’ book ‘Neural
Networks - A Systematic Introduction’[1].
The backpropagation algorithm works by using the method of gradient descent.
In order to do this, it needs to use an activation function which is differentiable.
This is a change from the perceptron, which used a step function. One of the more
popular activation functions is the sigmoid function. This and other alternatives
will be explored in section 2.2.
In order to make the calculations easier, each node is considered in two sep-
arate parts. Rojas calls this a B-diagram (or backpropagation diagram). As seen
in figure 4, the right side calculates the output from the activation function, while
the left side computes the derivative.
Figure 4: The two sides of a computing unit[1]
Rather than calculating the error function separately, the neural network is
extended with an additional layer used to calculate the error internally (as seen
in figure 5). The equation for the error function is
$E = \frac{1}{2} \sum_{i=1}^{p} \|o_i - t_i\|^2$, where $o_i$
is the output value from node i, and $t_i$ is the target value. Keeping in mind the
separation of the nodes as previously mentioned, the derivative calculated in the
left portion will be $(o_i - t_i)$.
Figure 5: Extended network for the computation of the error function[1]
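As a worked step, and treating $o_i$ and $t_i$ as scalars for a single output node (an assumption made here for simplicity), the derivative stored in the left side of the error node follows directly from that node's term of the error function:

```latex
\[
E_i = \frac{1}{2}\,(o_i - t_i)^2
\quad\Longrightarrow\quad
\frac{\partial E_i}{\partial o_i}
  = \frac{1}{2} \cdot 2\,(o_i - t_i)
  = (o_i - t_i)
\]
```

The factor of one half in the error function cancels the exponent when differentiating, which is why the derivative reduces to the plain difference between the output and the target.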
The backpropagation algorithm consists of four steps:
1. Feed-forward computation
2. Backpropagation to output layer
3. Backpropagation to hidden layer(s)
4. Weight updating
In the first step, the algorithm is processed in a straightforward manner, with
the output from one node being used as the input to the next node, as seen in
figure 6.
Generally speaking, backpropagation retraces through the network in reverse.
Since the network is being run backwards, we evaluate using the left side of the
node (the derivative). Instead of outputs being used as the inputs to the next
node, outputs from a node are multiplied by the output of previous nodes.
Figure 6: Result of the feed-forward step[1]
Figure 7: Backpropagation path up to output unit j[1]
We extended the network to calculate the error function, so for the output
layer we use that derivative as an input, as seen in figure 7.
Backpropagation for the hidden layer(s) acts in the same way, using the values
from the output layer as its input.
The final step is weight updating. The formula for updating the weight $w_{ij}$
(the weight between node i and node j) is $\Delta w_{ij} = -\gamma o_i \delta_j$, where γ is the learning
rate, $o_i$ is the output from node i, and $\delta_j$ is the error from node j.
A possible variation is the inclusion of a momentum variable η. This can help
make the learning process more stable: $\Delta w_{ij}(t) = -\gamma o_i \delta_j + \eta \Delta w_{ij}(t-1)$
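The four steps can be followed on the smallest possible example. The sketch below assumes a hypothetical network with one input, one hidden node, and one output node, no bias nodes, and sigmoid activations throughout; the class name, the starting weights, and the parameter names are all assumptions made for this illustration and do not come from Encog or from Rojas.

```java
class BackpropSketch {
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Weights of a 1-1-1 network: input -> hidden (w1) and hidden -> output (w2).
    double w1 = 0.5;
    double w2 = -0.5;
    // Previous weight changes, kept for the momentum term.
    double prevDw1 = 0.0;
    double prevDw2 = 0.0;

    /**
     * Runs one training step for a single (input, target) pair.
     * gamma is the learning rate and eta is the momentum.
     * Returns the error E measured before the weights were changed.
     */
    double step(double x, double t, double gamma, double eta) {
        // 1. Feed-forward computation
        double h = sigmoid(w1 * x);
        double o = sigmoid(w2 * h);
        double error = 0.5 * (o - t) * (o - t);

        // 2. Backpropagation to the output layer:
        //    sigmoid derivative o(1 - o) times the error derivative (o - t)
        double deltaOut = o * (1.0 - o) * (o - t);

        // 3. Backpropagation to the hidden layer
        double deltaHidden = h * (1.0 - h) * w2 * deltaOut;

        // 4. Weight updating, with momentum
        double dw2 = -gamma * h * deltaOut + eta * prevDw2;
        double dw1 = -gamma * x * deltaHidden + eta * prevDw1;
        w2 += dw2;
        w1 += dw1;
        prevDw2 = dw2;
        prevDw1 = dw1;
        return error;
    }
}
```

Calling step() repeatedly with the same input and target should drive the returned error downward. Passing 0 for eta disables the momentum term and gives the plain update rule.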
2.1.2 Resilient Propagation
A promising alternative to backpropagation is resilient propagation (often
referred to as RPROP), originally proposed by Martin Riedmiller and Heinrich
Braun in 1992. Instead of updating the weights based on how large the partial
derivative of the error function is, the weights are updated based on whether the
partial derivative is positive or negative.
First, the update value for each weight is adjusted based on whether the partial
derivative has changed sign. If such a change has occurred, that means the last
update was too large, and the algorithm has passed over a local minimum. To counter
this, the update value will be decreased. If the sign stays the same, then the
update value is increased.
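\[
\Delta_{ij}^{(t)} =
\begin{cases}
\eta^{+} \cdot \Delta_{ij}^{(t-1)}, & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t)} > 0 \\
\eta^{-} \cdot \Delta_{ij}^{(t-1)}, & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t)} < 0 \\
\Delta_{ij}^{(t-1)}, & \text{else}
\end{cases}
\]
where $0 < \eta^{-} < 1 < \eta^{+}$.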
Typically, η+ is assigned a value of 1.2, and η− is assigned a value of 0.5.
Once the update value is determined, the sign of the current partial derivative
is considered. In order to bring the error closer to 0, the weight is decreased if the
partial derivative is positive, and increased if it is negative.
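\[
\Delta w_{ij}^{(t)} =
\begin{cases}
-\Delta_{ij}^{(t)}, & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t)} > 0 \\
+\Delta_{ij}^{(t)}, & \text{if } \frac{\partial E}{\partial w_{ij}}^{(t)} < 0 \\
0, & \text{else}
\end{cases}
\]
At the end of each epoch, all of the weights are updated:
\[
w_{ij}^{(t+1)} = w_{ij}^{(t)} + \Delta w_{ij}^{(t)}
\]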
The exception to this rule is if the partial derivative has changed signs, then
the previous weight change is reversed. According to Riedmiller and Braun, “due to
that ‘backtracking’ weight-step, the derivative is supposed to change its sign once
again in the following step. In order to avoid a double punishment of the
update-value, there should be no adaptation of the update-value in the succeeding step.”
\[
\Delta w_{ij}^{(t)} = -\Delta w_{ij}^{(t-1)}, \quad \text{if } \frac{\partial E}{\partial w_{ij}}^{(t-1)} \cdot \frac{\partial E}{\partial w_{ij}}^{(t)} < 0
\]
In most cases, the update value is limited to a specific range, with an upper
limit of $\Delta_{max} = 50.0$ and a lower limit of $\Delta_{min} = 10^{-6}$.
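The update rules in this section can be collected into a sketch for a single weight. The class name RpropSketch, its field names, and the initial update value of 0.1 are assumptions made for this illustration; this is not Encog's ResilientPropagation class. Following the quoted remark about avoiding a double punishment, the stored derivative is set to zero after a backtracking step so that the update value is not adapted in the succeeding step.

```java
class RpropSketch {
    static final double ETA_PLUS = 1.2;
    static final double ETA_MINUS = 0.5;
    static final double DELTA_MAX = 50.0;
    static final double DELTA_MIN = 1e-6;

    double weight;
    double updateValue = 0.1;      // assumed initial update value
    double prevGradient = 0.0;     // partial derivative from the previous epoch
    double prevWeightChange = 0.0; // weight change applied in the previous epoch

    RpropSketch(double initialWeight) {
        this.weight = initialWeight;
    }

    /** Applies one update, given the current partial derivative of the error. */
    void update(double gradient) {
        double signChange = prevGradient * gradient;
        double weightChange;
        if (signChange > 0) {
            // Same sign as before: grow the update value, then step against the gradient.
            updateValue = Math.min(updateValue * ETA_PLUS, DELTA_MAX);
            weightChange = -Math.signum(gradient) * updateValue;
            prevGradient = gradient;
        } else if (signChange < 0) {
            // Sign changed: shrink the update value and reverse the previous step.
            updateValue = Math.max(updateValue * ETA_MINUS, DELTA_MIN);
            weightChange = -prevWeightChange;
            prevGradient = 0.0; // suppress adaptation in the succeeding step
        } else {
            // No information from the previous step: keep the update value as it is.
            weightChange = -Math.signum(gradient) * updateValue;
            prevGradient = gradient;
        }
        weight += weightChange;
        prevWeightChange = weightChange;
    }
}
```

Only the sign of the derivative influences the direction of the step, while its magnitude is ignored. For example, repeatedly passing the derivative of a simple quadratic error such as (w - 3)^2 moves the weight towards 3 with steps that grow until the minimum is overshot and then shrink.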
Riedmiller and Braun tested RPROP against several other popular
algorithms: backpropagation (BP), SuperSAB (SSAB), and Quickprop (QP)[2]:
Problem        10-5-10   12-2-12   9 Men’s Morris   Figure Rec.
BP (best)      121       >15000    98               151
SSAB (best)    55        534       34               41
QP (best)      21        405       34               28
RPROP (std)    30        367       30               29
RPROP (best)   19        322       23               28
Table 1: Average number of required epochs
2.2 Activation Functions
2.2.1 Linear
One of the simpler activation functions is the linear function:
\[ f(x) = x \]
Figure 8: Linear activation function
This activation is very simple, and isn’t used very often. The input is directly
transferred to the output without being modified at all. Therefore, the output
range is R. The derivative of this activation function is f ′(x) = 1.
A variation on this is the ramp activation function. This function has an
upper and lower threshold, where all values below the lower threshold are assigned a
certain value and all values above the upper threshold are assigned a different value
(0 and 1 are common). The result is something similar to the step function used
in the perceptron, but with a linear portion in the middle instead of a disjuncture.
2.2.2 Sigmoid
One of the more common activation functions is the sigmoid function. A
sigmoid function maintains a shape similar to the step function used in perceptrons
(with horizontal asymptotes at 0 and 1). However, the smooth curve of the sigmoid
means that it is a differentiable function, so it can be used in backpropagation
(which requires an activation function to have a derivative).
\[ f(x) = \frac{1}{1 + e^{-x}} \]
Figure 9: Sigmoid activation function
The output range of this activation function is 0 to 1. The derivative of this
activation function is $f'(x) = f(x) \cdot (1 - f(x))$.
2.2.3 Hyperbolic Tangent
The hyperbolic tangent function has a similar shape to the sigmoid function.
However, its lower horizontal asymptote is at -1 instead of 0. This may be more
useful with some data sets, where use of a sigmoid activation function does not
produce any negative numbers.
\[ f(x) = \tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1} \]
Figure 10: Hyperbolic tangent activation function
The output range of this activation function is -1 to 1. The derivative of this
activation function is f ′(x) = 1− f(x) ∗ f(x).
2.2.4 Elliott
Elliott activation functions were originally proposed by David L. Elliott in
1993 as more computationally efficient alternatives to the sigmoid and hyperbolic
tangent activation functions.[3]
Encog provides two such activation functions: Elliott and Symmetric Elliott.
In all of the cases below, s is the slope, which has a default value of 1 (although
this can be changed).
The Elliott activation function serves as an alternative to the sigmoid activa-
tion function:
\[ f(x) = \frac{0.5 (x \cdot s)}{1 + |x \cdot s|} + 0.5 \]
Just as the sigmoid activation function, this produces an output range of 0 to
1. The derivative of this activation function is $f'(x) = \frac{s}{2 (1 + |x \cdot s|)^2}$.
Figure 11: Comparison between Elliott (solid) and sigmoid (dotted) activation functions
The Symmetric Elliott activation function serves as an alternative to the
hyperbolic tangent activation function:
\[ f(x) = \frac{x \cdot s}{1 + |x \cdot s|} \]
Figure 12: Comparison between Symmetric Elliott (solid) and hyperbolic tangent (dotted) activation functions
Just as the hyperbolic tangent activation function, this produces an output
range of -1 to 1. The derivative of this activation function is $f'(x) = \frac{s}{(1 + |x \cdot s|)^2}$.
Heaton Research (the company that makes the Encog library) provided some
interesting statistics on the efficiency of this activation function[4]:
Activation Function   Total Training Time   Avg Iterations Needed
TANH                  6,168ms               474
ElliottSymmetric      2,928ms               557
Table 2: Elliott vs TANH
While the Symmetric Elliott required more iterations of training in order to
reach the desired error, the time it took for each training iteration was much less
than the hyperbolic tangent, resulting in the network being trained in roughly
half the time. Although computational power has increased considerably since
David L. Elliott first proposed these activation functions (earlier versions of Encog
approximated the value of the hyperbolic tangent because it was faster than Java’s
built-in TANH function), they can still be useful for training large networks and/or
data sets.
According to the javadoc comments for the two classes, these activation func-
tions approach their horizontal asymptotes more slowly than their traditional coun-
terparts, so they “might be more suitable to classification tasks than predictions
tasks”.
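The activation functions from this section, together with their derivatives, can be written directly from the formulas above. The class and method names below are hypothetical and are not Encog's ActivationFunction classes. As in the text, s is the slope of the Elliott functions.

```java
class ActivationSketch {
    /** Sigmoid: output range 0 to 1. */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    static double sigmoidDerivative(double x) {
        double f = sigmoid(x);
        return f * (1.0 - f);
    }

    /** Hyperbolic tangent: output range -1 to 1. */
    static double tanh(double x) {
        return Math.tanh(x);
    }

    static double tanhDerivative(double x) {
        double f = Math.tanh(x);
        return 1.0 - f * f;
    }

    /** Elliott: output range 0 to 1, computed without an exponential. */
    static double elliott(double x, double s) {
        return 0.5 * (x * s) / (1.0 + Math.abs(x * s)) + 0.5;
    }

    static double elliottDerivative(double x, double s) {
        double d = 1.0 + Math.abs(x * s);
        return s / (2.0 * d * d);
    }

    /** Symmetric Elliott: output range -1 to 1. */
    static double elliottSymmetric(double x, double s) {
        return (x * s) / (1.0 + Math.abs(x * s));
    }

    static double elliottSymmetricDerivative(double x, double s) {
        double d = 1.0 + Math.abs(x * s);
        return s / (d * d);
    }
}
```

The Elliott variants need only an absolute value, an addition, and a division, which is consistent with the shorter time per training iteration reported in Table 2.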
List of References
[1] R. Rojas, “The backpropagation algorithm,” in Neural Networks. Springer, 1996, pp. 149–182.
[2] M. Riedmiller and H. Braun, “A direct adaptive method for faster backpropagation learning: The rprop algorithm,” in Neural Networks, 1993., IEEE International Conference on. IEEE, 1993, pp. 586–591.
[3] D. L. Elliott, “A better activation function for artificial neural networks,” 1993.
[4] J. Heaton. “Elliott activation function.” September 2011. [Online]. Available: http://www.heatonresearch.com/wiki/Elliott_Activation_Function
CHAPTER 3
Java library Encog
3.1 Overview
Encog is an external machine learning library. Originally released in 2008,
Encog is developed by Heaton Research (run by Jeff Heaton). The current Java
version is 3.3 (released on October 12, 2014). Encog is released under an Apache
2.0 license.
3.2 How Encog Stores Weights
At its simplest, Encog stores the weights for a neural network in an
array of doubles inside a FlatNetwork object.
Figure 13: Neural network with labeled weight indexes
As seen in Figure 13, the order of the weights is determined by a combination
of the reversed order of the layers and the regular order of the nodes (with the
biases being the last node in a layer, if applicable). For example, the network in
Figure 13 consists of an input layer of 2 nodes and a bias, a hidden layer of 3 nodes
and a bias, and an output layer of 1 node. The first 4 weights in the array are
the weights going from the hidden layer to the output layer (weights[0] connects
h1n0 to o0, weights[1] connects h1n1 to o0...). The next 3 weights connect the
input layer to the first hidden node (weights[4] connects i0 to h1n0, weights[5]
connects i1 to h1n0...). This continues in this fashion until the final weight in the
array, weights[12], which connects the input bias node to the last regular hidden
node (i2 to h1n2).
To access all of the weights at once, the BasicNetwork class provides a
dumpWeights() method. It may also be useful to use the weightIndex array from
the FlatNetwork, which indicates where in the weights array each layer starts.
Alternately, the BasicNetwork class has a getWeight() method, which allows a
user to access the weight from one specific node to another. This is the method
that I utilized in my implementation:
/**
 * Get the weight between the two layers.
 * @param fromLayer The from layer.
 * @param fromNeuron The from neuron.
 * @param toNeuron The to neuron.
 * @return The weight value.
 */
public double getWeight(final int fromLayer,
        final int fromNeuron,
        final int toNeuron) {
    this.structure.requireFlat();
    validateNeuron(fromLayer, fromNeuron);
    validateNeuron(fromLayer + 1, toNeuron);
    final int fromLayerNumber = getLayerCount() - fromLayer - 1;
    final int toLayerNumber = fromLayerNumber - 1;

    if (toLayerNumber < 0) {
        throw new NeuralNetworkError(
                "The specified layer is not connected to another layer: "
                + fromLayer);
    }

    final int weightBaseIndex
            = this.structure.getFlat().getWeightIndex()[toLayerNumber];
    final int count
            = this.structure.getFlat().getLayerCounts()[fromLayerNumber];
    final int weightIndex = weightBaseIndex + fromNeuron
            + (toNeuron * count);

    return this.structure.getFlat().getWeights()[weightIndex];
}
Figure 14: BasicNetwork.getWeight()
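The index arithmetic in getWeight() can be checked against the network in Figure 13 without using Encog at all. In the sketch below, the two arrays are assumptions reconstructed from the description in section 3.2 (layers in reversed order, bias nodes included in the counts); they are not values read from a FlatNetwork object, and the class name is hypothetical.

```java
class WeightIndexSketch {
    // Reversed layer order for the network in Figure 13: output, hidden, input.
    // Counts include the bias node: 1 output, 3 hidden + bias, 2 inputs + bias.
    static final int[] LAYER_COUNTS = {1, 4, 3};
    // Assumed start of each block of weights: hidden-to-output, then input-to-hidden.
    static final int[] WEIGHT_INDEX = {0, 4};

    /** Mirrors the index calculation performed in getWeight(). */
    static int weightIndex(int fromLayer, int fromNeuron, int toNeuron) {
        int fromLayerNumber = LAYER_COUNTS.length - fromLayer - 1;
        int toLayerNumber = fromLayerNumber - 1;
        if (toLayerNumber < 0) {
            throw new IllegalArgumentException(
                    "The specified layer is not connected to another layer: " + fromLayer);
        }
        int weightBaseIndex = WEIGHT_INDEX[toLayerNumber];
        int count = LAYER_COUNTS[fromLayerNumber];
        return weightBaseIndex + fromNeuron + (toNeuron * count);
    }

    public static void main(String[] args) {
        // Layer 0 is the input layer; neuron 2 of that layer is its bias node.
        System.out.println(weightIndex(1, 0, 0)); // h1n0 -> o0
        System.out.println(weightIndex(0, 0, 0)); // i0 -> h1n0
        System.out.println(weightIndex(0, 2, 2)); // i2 -> h1n2
    }
}
```

Under these assumptions the three calls in main() correspond to weights[0], weights[4], and weights[12], matching the ordering described for Figure 13.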
3.3 Training
Encog has several different ways to train networks. For the purpose of this
project, we will focus on propagation training.
MLTrain
  BasicTraining
    Propagation
      Backpropagation
      ResilientPropagation
Figure 15: Class hierarchy for training
As seen in figure 15, training in Encog utilizes several different classes.
Each method of training that is utilized has its own class (Backpropagation and
ResilientPropagation), with most of the work being done in the parent class
Propagation. There are other forms of training available, so Propagation extends
the BasicTraining class, and all forms of training must implement the MLTrain
interface.
Most of the training is done through the Propagation.iteration() method,
which calls several helper methods. There are two different versions of this method:
a default version and a version that accepts the number of iterations as a parameter.
In order to do a single iteration, the default form of the method calls the alternate
version and passes 1 as a parameter.
The first method to be invoked is BasicTraining.preIteration(). This method
increments a counter called iteration, which keeps track of the current iteration.
It also calls upon the preIteration() method for any strategies that may be in use.
Strategies are additional methods of training that may be used to enhance
the performance of a training algorithm. The ResilientPropagation class doesn’t
use any, but the Backpropagation class allows for the use of two strategies:
SmartLearningRate and SmartMomentum. These strategies will be used to attempt
to calculate the learning rate and momentum if they have not been specified upon
creation of the network. However, since both of these variables are assigned val-
ues by the user (with a default momentum of 0 if the use of that variable is not
desired), training strategies are not used in the implementation of this project.
The next method to be invoked is Propagation.rollIteration(). However, the
use of this method is superfluous. While the BasicTraining class has a variable
which keeps track of the current iteration, the Propagation class has its own copy
of that same variable (rather than inheriting the value from its parent class).
The rollIteration() method increments this duplicate variable. Unfortunately,
where the variable in the BasicTraining class is utilized in accessor and mutator
methods, the same variable in the Propagation class is not used anywhere outside
of the rollIteration() method.
Following this is the Propagation.processPureBatch() method (large data
sets may want to make use of the processBatches() method, which uses a por-
tion of the training set rather than the entire thing). This in turn calls upon
Propagation.calculateGradients() and Propagation.learn().
Propagation.calculateGradients() iterates through the network and calculates
the gradient at each portion of the network (for more information, see section 2.1.1).
This is done through the GradientWorker class. The advantage of this is that it
allows for multithreaded calculations. Different portions of the network that don’t
rely on each other (for example, nodes in the same layer do not have any weights
connecting them) can be calculated in parallel using an array of GradientWorkers.
This project only uses single threaded calculations, so the array has a size of 1.
Propagation.learn() uses the gradients to update the weights for the network.
Different algorithms update weights in different ways (see sections 2.1.1 and 2.1.2
for more information), so this is an abstract method, with each child class having
its own implementation.
The last method to be used is BasicTraining.postIteration(). This method
calls upon the postIteration() method for any strategies if applicable. The
ResilientPropagation class has its own postIteration() method, which stores the
error in the lastError variable, because RPROP uses this to check for sign changes
in the error during subsequent iterations.
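The order of calls described in this section can be summarized as a template method. The sketch below is a simplified stand-in, not Encog's code: the class names are hypothetical, strategies and batching are omitted, and the concrete subclass performs plain gradient descent on a toy quadratic error instead of backpropagation or RPROP.

```java
/** Hypothetical sketch of the iteration flow; not Encog's actual classes. */
abstract class PropagationSketch {
    private int iteration;          // counter kept by the base training class
    protected double[] weights;
    protected double[] gradients;

    PropagationSketch(double[] initialWeights) {
        this.weights = initialWeights.clone();
        this.gradients = new double[initialWeights.length];
    }

    /** Default form: a single iteration, delegating to the counted version. */
    void iteration() {
        iteration(1);
    }

    void iteration(int count) {
        for (int i = 0; i < count; i++) {
            preIteration();
            calculateGradients();
            learn();
            postIteration();
        }
    }

    void preIteration() { iteration++; }

    void postIteration() { }

    int getIteration() { return iteration; }

    /** Fills the gradients array for the current weights. */
    abstract void calculateGradients();

    /** Updates the weights from the gradients; differs per algorithm. */
    abstract void learn();
}

/** Plain gradient descent on the toy error E(w) = sum of (w_k - 3)^2. */
final class GradientDescentSketch extends PropagationSketch {
    private final double learningRate;

    GradientDescentSketch(double[] initialWeights, double learningRate) {
        super(initialWeights);
        this.learningRate = learningRate;
    }

    @Override
    void calculateGradients() {
        for (int k = 0; k < weights.length; k++) {
            gradients[k] = 2.0 * (weights[k] - 3.0);
        }
    }

    @Override
    void learn() {
        for (int k = 0; k < weights.length; k++) {
            weights[k] -= learningRate * gradients[k];
        }
    }
}
```

Keeping learn() abstract is what lets each child class supply its own weight update while the surrounding call order stays in one place.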
3.4 Some Individual Classes
The following sections will go into detail about how I used some of the classes
from the Encog library. It wasn’t feasible to describe all of the classes used in the
program, but these seven were the most important.
3.4.1 TrainingSetUtil
TrainingSetUtil (org.encog.util.simple.TrainingSetUtil) is the only class that
I modified.
The main method that I used was loadCSVTOMemory(). This method takes a
CSV file and loads that into a ReadCSV object. Then, that object is converted into
something that I could use for training: an MLDataSet. I encountered two problems
when importing CSV files: incomplete data entries gave undesired results during
training, and there was no way to preserve the column headers.
It is not uncommon to have data sets with entries that don’t have values
for all columns (especially when dealing with data obtained through real world
observations). These values can lead to undesired results if used for training, so I
wanted to discard those entries in their entirety. Thankfully, attempting to load
empty data values throws a CSVError exception (Error:Unparseable number), so I
was able to surround that part of the code with a try-catch statement. Inside the
catch portion, I decided not to print out the stack trace because that information
wasn’t very useful. However, I did increment a counter I had created called ignored,
which would then be printed to the console at the conclusion of the importing
process.
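The same skip-and-count idea can be shown without Encog. In this sketch, the class name, the parseRows() helper, and the use of NumberFormatException in place of CSVError are all assumptions; it only illustrates discarding a whole row when any of its values cannot be parsed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: skip rows that contain unparseable numbers. */
final class CsvRowFilterSketch {
    /** Number of rows discarded by the most recent call to parseRows(). */
    static int ignored;

    static List<double[]> parseRows(List<String> lines, int columns) {
        List<double[]> rows = new ArrayList<>();
        ignored = 0;
        for (String line : lines) {
            // The -1 limit keeps trailing empty fields, so "1,2," has three fields.
            String[] fields = line.split(",", -1);
            try {
                if (fields.length < columns) {
                    throw new NumberFormatException("missing column");
                }
                double[] row = new double[columns];
                for (int i = 0; i < columns; i++) {
                    row[i] = Double.parseDouble(fields[i].trim());
                }
                rows.add(row);
            } catch (NumberFormatException e) {
                ignored++;  // discard the whole row rather than a single value
            }
        }
        return rows;
    }
}
```

Counting the discarded rows, rather than printing a stack trace for each one, keeps the console output short while still telling the user how much data was lost.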
For the column headers, I needed to create a new data member:
private static List<String> columnNames = new ArrayList<String>();
The information from the .CSV file is loaded into a ReadCSV object. If the .CSV
file has headers (as specified through a boolean), these are stored in an ArrayList
of Strings, which can be accessed through a getColumnNames() method in that class.
However, there is no way to access that ReadCSV object after the initial importing
process is completed. Thus, I needed to add some additional functionality to
the TrainingSetUtil class.
Inside the loadCSVTOMemory() method, I added a simple statement to store the
headers in the data member that I had defined above:
if(headers){
    columnNames = csv.getColumnNames();
}
After that, it was just a matter of creating a standard accessor method (similar
to the one in the ReadCSV class):
/**
 * @return the columnNames
 */
public static List<String> getColumnNames() {
    return columnNames;
}
Back in the main part of the program, I wrote another method to change
the ArrayList into a standard array because I am more comfortable accessing
information in that format.
Modified Code
if(headers){
    columnNames = csv.getColumnNames();
}

int ignored = 0;

while (csv.next()) {
    MLData input = null;
    MLData ideal = null;
    int index = 0;
    try{
        input = new BasicMLData(inputSize);
        for (int i = 0; i < inputSize; i++) {
            double d = csv.getDouble(index++);
            input.setData(i, d);
        }

        if (idealSize > 0) {
            ideal = new BasicMLData(idealSize);
            for (int i = 0; i < idealSize; i++) {
                double d = csv.getDouble(index++);
                ideal.setData(i, d);
            }
        }

        MLDataPair pair = new BasicMLDataPair(input, ideal);
        result.add(pair);
    }catch (CSVError e){
        ignored++;
        //e.printStackTrace();
    }
}
System.out.println("Rows ignored: " + ignored);
return result;

Original Code
while (csv.next()) {
    MLData input = null;
    MLData ideal = null;
    int index = 0;

    input = new BasicMLData(inputSize);
    for (int i = 0; i < inputSize; i++) {
        double d = csv.getDouble(index++);
        input.setData(i, d);
    }

    if (idealSize > 0) {
        ideal = new BasicMLData(idealSize);
        for (int i = 0; i < idealSize; i++) {
            double d = csv.getDouble(index++);
            ideal.setData(i, d);
        }
    }

    MLDataPair pair = new BasicMLDataPair(input, ideal);
    result.add(pair);
}

return result;
Figure 16: Comparison between modified and original code
Figure 16 shows the differences between the original loadCSVTOMemory()
method and the modified version. The complete code for this class may be found
in Appendix A.6.
3.4.2 BasicNetwork
BasicNetwork (org.encog.neural.networks.BasicNetwork) serves as the main
source of interaction between my implementation and the network itself. How-
ever, this doesn’t necessarily mean that this class does most of the work. Much
of the information is stored in related classes (for example, once the format of the
network is set up, the majority of information about the network is stored in a
FlatNetwork object).
Before a network can be used, its structure must be defined. For this purpose,
the BasicNetwork class uses this data member:
/**
 * Holds the structure of the network. This keeps the network from
 * having to constantly lookup layers and synapses.
 */
private final NeuralStructure structure;
To set up this structure, each layer must be added through the use of the
addLayer() method. Each layer passed through the parameters will be added to
an ArrayList of Layer objects. The first layer added will be considered the input
layer.
Once all of the layers are added, the network must be finalized by invoking
structure.finalizeStructure(). Finalizing a neural structure eliminates the in-
termediate representation of the layers, temporarily storing that information in
FlatLayer objects, and then creating the FlatNetwork object which will be used in
the remaining network operations.
Once the network is finalized, the reset() method is invoked, which assigns
random starting values to the weights.
The actual network training is done through the Propagation class (an ab-
stract class which serves as a parent for classes such as Backpropagation and
ResilientPropagation). The BasicNetwork object is passed as a parameter, as well
as the training data set and any other necessary variables (such as the learning
rate and momentum if applicable).
Once the network is fully trained, its effectiveness can be measured by use of
the compute() method. This is used to compare each ideal output value with the
output value the network produces when given the same input.
3.4.3 FlatNetwork
The FlatNetwork class (org.encog.neural.flat.FlatNetwork) is a more computa-
tionally efficient form of a neural network, designed to store everything in single
arrays instead of keeping track of everything in multiple layers. Layers are main-
tained through the use of index arrays, which indicate where each layer starts in
the main arrays. According to the javadoc comments, “this is meant to be a very
highly efficient feedforward, or simple recurrent, neural network. It uses a mini-
mum of objects and is designed with one principal in mind-- SPEED. Readability,
code reuse, object oriented programming are all secondary in consideration”.
In concept, FlatNetwork objects act similarly to BasicNetwork objects from the standpoint of the user,
for they share many of the same methods. However, most of the calculations (such
as training) are actually done in this class (the BasicNetwork class invokes methods
from here). The speed increase comes from the use of single-dimensional arrays of
doubles and ints, which have a faster information access time than using accessor
and mutator methods with multiple classes.
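To make the idea of index arrays concrete, the following sketch stores every weight of a small feedforward network in one array and uses a second array to record where each layer's incoming weights begin. It is a simplified illustration, not Encog's FlatNetwork: the class and field names are hypothetical, bias nodes are left out, the layers are ordered input first for readability, and a sigmoid activation is assumed on every non-input layer.

```java
/** Hypothetical sketch of a network stored in flat arrays; not Encog's FlatNetwork. */
final class FlatNetworkSketch {
    private final int[] layerCounts;  // neurons per layer, input layer first (no bias)
    private final int[] weightStart;  // where each layer's incoming weights begin
    private final double[] weights;   // every weight of the network in one array

    FlatNetworkSketch(int[] layerCounts, double[] weights) {
        this.layerCounts = layerCounts.clone();
        this.weights = weights.clone();
        this.weightStart = new int[layerCounts.length];
        int next = 0;
        for (int layer = 1; layer < layerCounts.length; layer++) {
            weightStart[layer] = next;
            next += layerCounts[layer] * layerCounts[layer - 1];
        }
        if (next != weights.length) {
            throw new IllegalArgumentException("Expected " + next + " weights");
        }
    }

    /** Feeds the input forward, applying a sigmoid on every non-input layer. */
    double[] compute(double[] input) {
        double[] current = input.clone();
        for (int layer = 1; layer < layerCounts.length; layer++) {
            int fromCount = layerCounts[layer - 1];
            double[] nextValues = new double[layerCounts[layer]];
            for (int to = 0; to < nextValues.length; to++) {
                double sum = 0.0;
                int base = weightStart[layer] + to * fromCount;
                for (int from = 0; from < fromCount; from++) {
                    sum += weights[base + from] * current[from];
                }
                nextValues[to] = 1.0 / (1.0 + Math.exp(-sum));
            }
            current = nextValues;
        }
        return current;
    }
}
```

The inner loops touch only primitive arrays, which is the property the FlatNetwork design relies on for speed.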
3.4.4 BasicLayer
The BasicLayer class (org.encog.neural.networks.layers.BasicLayer) is an im-
plementation of the Layer interface. Its job is to store information about the
specific layer it is assigned (input, hidden, or output) during the creation of the
network. Once the network has been finalized, specific layers are no longer used.
The class has two constructors: one which has user defined parameters (acti-
vation function, bias, and neuron count), and one which just receives the number
of neurons in the layer. If the second constructor is used, the default option is to
create a layer which has a bias and uses a hyperbolic tangent activation function.
Hidden layers utilize all three variables when being initialized. Input layers
do not have activation functions. Bias nodes are stored in the layer prior to where
they will have an impact (a bias node which affects the nodes in the hidden layer
will be declared as part of the input layer), so output layers should not have a bias.
Each layer also has a data member which indicates which network the layer
is a part of.
3.4.5 BasicMLDataSet
The BasicMLDataSet class (org.encog.ml.data.basic.BasicMLDataSet) isn’t a
very complicated class, but it is very important. It implements the more general
MLDataSet interface, and its main purpose is to maintain an ArrayList
of BasicMLDataPair objects. This is what the training data set will be stored in.
The class contains several constructors, able to create an object by accepting
multidimensional double arrays, an MLDataSet object, or an MLDataPair ArrayList.
The rest of the class contains several add methods, as well as methods to retrieve
data entries or information about the data set (such as its size).
3.4.6 BasicMLDataPair
The BasicMLDataPair class (org.encog.ml.data.basic.BasicMLDataPair) is an
implementation of the MLDataPair interface. Its purpose is to hold the information
of a single data entry. Each BasicMLDataPair contains two MLData objects, arrays
of doubles designed to store the input data and the ideal data respectively. Both
values are necessary for supervised learning, but only the input value is required
for unsupervised learning (the ideal value should be left null).
3.4.7 ActivationFunction
ActivationFunction (org.encog.engine.network.activation.ActivationFunction)
is an interface that serves as the parent type for any activation function that would
be used with a neural network. The library comes with sixteen activation functions
already implemented, but users are free to implement their own as long as they
include all of the methods in the interface.
The two most important methods are as follows:
void activationFunction(double[] d, int start, int size);
This method is the main math portion of the activation function. The input
values are stored in the double array d, with the range of values specified by the
variables start and size. After some mathematical calculations, the output value
from the activation function is stored in the same double array. For example, from
the ActivationSigmoid class:
x[i] = 1.0 / (1.0 + BoundMath.exp(-1 * x[i]));
The ActivationLinear class actually leaves this method blank. The linear
activation function has outputs identical to its inputs, so there is no need to do
anything with the array of doubles.
double derivativeFunction(double b, double a);
This method calculates the derivative of the activation function at a cer-
tain point. Not all activation functions have derivatives (there is another method
called hasDerivative(), which will return true if the specific activation function
has a derivative and false otherwise). However, there must be a derivative for an
activation function to be used with backpropagation.
The method receives two doubles as parameters. The first double, b, is the
original input number (in the activationFunction method, this number would have
been in the d array). The second double, a, is the original output value. This is
the value the activation function produces if it is given b as an input. Depending
on the equation for each specific activation function, the derivative will be cal-
culated with whichever value is more computationally efficient. For example, the
ActivationSigmoid class uses the output value:
return a * (1.0 - a);
To contrast, the ActivationElliott class uses the input value:
double s = params[0];
double d = (1.0+Math.abs(b*s));
return (s*1.0)/(d*d);
As of v3.3, all activation functions in the Encog library have derivatives with
the exception of ActivationCompetitive. Attempting to use this activation function
in a backpropagation network will throw an EncogError exception (“Can’t use the
competitive activation function where a derivative is required”).
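The two methods can be tried in isolation. The sketch below follows the sigmoid formulas quoted in this section, but it is not the Encog class: the class name is hypothetical, and Math.exp is used where the library calls BoundMath.exp.

```java
/** Hypothetical sketch of the two methods for a sigmoid activation. */
final class SigmoidSketch {
    /** Replaces x[start] .. x[start + size - 1] with their sigmoid values, in place. */
    static void activationFunction(double[] x, int start, int size) {
        for (int i = start; i < start + size; i++) {
            x[i] = 1.0 / (1.0 + Math.exp(-1 * x[i]));
        }
    }

    /** Derivative at input b, computed from the already known output a. */
    static double derivativeFunction(double b, double a) {
        return a * (1.0 - a);
    }
}
```

For an input of 0 the activation yields 0.5, and the derivative computed from that output is 0.5 * (1.0 - 0.5) = 0.25. Only the output a is needed, so the exponential does not have to be evaluated a second time.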
CHAPTER 4
Implementation
4.1 Overview
The purpose of this program is to train an artificial neural network and export
source code for it. This will allow the results of the network to be used in other
projects without needing to store it in a data structure.
All information is controlled through user input via a config file and a training
data set. The program will output two files: a training error report, and the code
for the network. The exact format of these outputs will be designated by the user.
4.2 Assumptions/Design Choices
Early in the design process, I decided that I was going to use an external third
party library to handle the actual training of the neural network. The purpose of
this project was more focused on the source code generation for a neural network,
rather than the training itself. Doing the actual implementation for the network
training would add additional development time to this project. In addition, unless
it were to be made the main focus of the project, a personal implementation would
not be as effective as a third party alternative, as the designers of said software
have spent years optimizing the code. More information about the java library
Encog may be found in the previous chapter.
The only other major design decision was the restriction of only numerical
information for the training data set. The program is only designed to be used
with numbers for all data entries. Using strings will result in rows being ignored
when the .csv file is imported. For more information on this decision, see section
5.1.
The program also assumes that all inputs from the user are valid. As of now,
there are very few debugging tools built into the program, so invalid data will
result in the program not running.
4.3 Inputs
The program requires two inputs from the user: a config file containing all
of the information required by the neural network, and a .csv file containing the
training data set.
4.3.1 Config File
The only command line argument is the file path for a config file. This file
can have any name, but it must have a .txt file extension. The config file contains
the following information:
• The complete file path for the training data set. This file will be described
in detail in the next subsection.
• A boolean for whether or not the training data set file has a header row (true
for yes, false for no).
• The number of input variables (how many columns in the training data set
are independent variables).
• The number of output variables (how many columns in the training data set
are dependent variables).
• The number of hidden layers the artificial neural network will be constructed
with. There is no theoretical upper limit on the number of hidden layers this
program can accommodate, although studies have shown that almost any
problem can be solved with the use of at most two hidden layers. [1]
• Attributes for each hidden layer:
– An integer for the type of activation function:
0. Sigmoid
1. Hyperbolic Tangent
2. Linear
3. Elliott
4. Gaussian
5. Logarithmic
6. Ramp
7. Sine
8. Step
9. Bipolar
10. Bipolar Sigmoid
11. Clipped Linear
12. Elliott Symmetric
13. Steepened Sigmoid
– A boolean for if the layer has a bias node or not.
– An integer for the number of normal neurons in the layer.
• Attributes for the input layer (only bias information is needed).
• Attributes for the output layer (bias and activation function are needed).
• The file type for the first output file (the training error):
0. text (.txt)
1. just numbers (.csv)
• The name of the first output file (not including the file extension; the program
will add that information internally).
• The file type for the second output file (the code for the artificial neural
network):
0. equation format(.txt)
1. java file (standalone)
2. java file (integrated)
• The name of the second output file (not including the file extension; the
program will add that information internally).
• The desired training error. The network will continue to train until the error
is less than or equal to this number.
• The maximum number of epochs. If the desired training error has not yet
been reached, the network will stop training after this many iterations.
• An integer for the network type:
0. ResilientPropagation (see section 2.1.2)
1. Backpropagation (see section 2.1.1)
• The learning rate. This is only applicable for backpropagation networks.
• The momentum. The program will not use momentum if this value is set to
0. This is only applicable for backpropagation networks.
Comments can be made by beginning a line with a percent symbol (%). The
methods related to importing the config file will ignore any such lines.
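A minimal sketch of the comment-skipping behavior is shown below. The class name and the two-value config fragment are hypothetical. Only lines beginning with a percent symbol are skipped; how blank lines or indented comments are treated is left open here, because the description above does not say.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch of comment skipping while reading a config file. */
final class ConfigReaderSketch {
    /** Returns the next line that does not begin with '%', or null at end of input. */
    static String nextValidLine(BufferedReader reader) {
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith("%")) {
                    return line;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed two-value fragment; a real config file holds many more entries.
        String config = "% has a header row\ntrue\n% number of input variables\n4\n";
        BufferedReader reader = new BufferedReader(new StringReader(config));
        boolean hasHeaders = Boolean.parseBoolean(nextValidLine(reader));
        int inputs = Integer.parseInt(nextValidLine(reader));
        System.out.println(hasHeaders + " " + inputs);
    }
}
```

Run as written, the program prints "true 4", since both comment lines are skipped before each value is read.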
Rather than prompting the user for this information within the program, using
a file allows all of the required information to be stored in one place. This also
makes multiple uses of the program easier, because the user is able to change the
value of a single variable without going through the tedious process of re-inputting
all of the other data as well.
4.3.2 Training Data Set
The other primary input is the training data set. As mentioned in the previous
subsection, the file path for this file is given as part of the config file rather than
as a command line argument.
The training data set must conform to the following specifications:
• It must be in comma-separated values format (a .csv file).
• Headers are optional. If they are included, the code that the program exports
will use the column names as identifiers for variables.
• If possible, do not include any rows with blank entries in them. These rows
will be discarded when the .csv file is imported, and therefore not used for
training purposes.
• The .csv file shall be organized so that the independent variables (input) are
on the left, while the dependent variables (output) are on the right.
• All of the data entries must be numerical. At this time the program does
not support categorical classification.
Currently, the program uses the same data set for both training and testing.
4.4 Outputs
The program has two separate output files: one file containing the training
error report, and one file containing the code of the neural network.
4.4.1 Training Error Report
The first output file contains information about the training error. This is
the overall error (how far the actual results are from the ideal results) after each
iteration of training.
The exact format of this file can be specified by the user in the config file.
Currently, there are two possible formats.
If the user selects option 0, the output will be in a .txt file:
Figure 17: Sample from first output file (.txt)
If the user selects option 1, the output will be in a .csv file. This will have a
header row, and can be loaded into other programs for analysis (such as graphing):
Figure 18: Sample from first output file (.csv)
4.4.2 Source Code
The second output file contains the source code for the trained neural network.
Regardless of what file format this output is in, there will be two main sections to
it: variable declaration, and network calculation.
The variable declaration section is a list of all the variables that will be used
in the network calculation section, as well as any default values (such as 1.0 for
biases). I decided upon the following naming conventions for variables:
• i - Input layer (starts at 0)
• h - Hidden layer (starts at 1)
• o - Output layer (starts at 0)
• n - Number of the node within a layer (starts at 0)
• f - Indicates which node from the previous layer the link originates (starts
at 0)
• t - Total (the sum of all f nodes leading to a specified node), before being
fed to the activation function.
• Lack of an f or a t indicates that this value is the output from an activation
function (or a bias node).
If there are headers present in the input file, these will be included as input
and output variable names.
The network calculation section is where the specific weight values are utilized
in order to write equations that map the specified input values to output values.
This allows the function of the trained network to be maintained without needing
to store the information in a data structure.
The exact format of this file can be specified by the user in the config file.
Currently, there are two possible file formats (.txt and .java), selected through three options.
If the user selects option 0, the output will be in a .txt file. Variable decla-
rations will just consist of names, and the network calculation section will just be
mathematical formulas.
If the user selects option 1 or 2, the output will be in a .java file. Variables will
all be declared as doubles (original input and final output variables will be public,
and all others will be private). The network calculation section will be inside a
method. Everything will also be inside of a class (which shares the same name as
the file itself, as specified by the user in the config file).
4.5 Individual classes
The program itself (not counting the modified Encog library) currently consists
of five classes. This number may grow in the future if more output source
code types are implemented.
4.5.1 NeuralGenerator
The NeuralGenerator class is the largest class in the program. Most of the
work happens here.
The variable declarations are mostly self explanatory, so they will not be
discussed here. The comments for each variable can be viewed in Appendix A.1.
After the initial setup, the first thing the program does is import data from the
config file, through the validateConfig() method. This method goes through the
config file line by line (through the use of a helper method, nextValidLine(), which
ignores any lines that are commented out, as designated by a ‘%’ at the beginning
of a line). All information from the config file is stored into data members so it
can be accessed by other methods, and is then printed out to the console.
The initializeOutput1() method is called, which creates the first output file.
This file will contain the training error report. For more information, see section
4.4.1.
The next method to be invoked is createNetwork(). This method creates a
BasicNetwork, and populates it with an input layer, hidden layers, and an output
layer. The information for each layer (activation function, bias, and number of
nodes) is specified by LayerInfo objects, which in turn are defined by the informa-
tion in the config file. Once all of the layers are added, the network is finalized,
and the weights are reset to random values.
Next, the training data set is created from the .csv file. If there are headers,
these are stored in an ArrayList (the information is then stored in a String array,
because I prefer working with that format).
Then, the network is trained. The two current options for training utilize
either the Backpropagation class or the ResilientPropagation class (for more infor-
mation, see sections 2.1.1 and 2.1.2 respectively). After each iteration of train-
ing, the training error is calculated, and written to a file through the writeOne()
method. This helper method also prints the information to the console. Training
will continue until the training error is below the desired amount, or until the
maximum number of epochs has been reached.
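The stopping rule can be sketched independently of the training algorithm. In the code below, the class name and the DoubleSupplier standing in for one training iteration are assumptions, and the console line stands in for the writeOne() helper. The comparison uses "less than or equal to", following the config file description in section 4.3.1.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical sketch of the stopping rule; not the thesis code. */
final class TrainingLoopSketch {
    /**
     * Calls trainOneEpoch until the error it returns is at or below
     * desiredError, or until maxEpochs iterations have run.
     * Returns the number of epochs actually performed.
     */
    static int train(DoubleSupplier trainOneEpoch, double desiredError, int maxEpochs) {
        int epoch = 0;
        double error = Double.MAX_VALUE;
        while (epoch < maxEpochs && error > desiredError) {
            error = trainOneEpoch.getAsDouble();
            epoch++;
            System.out.println(epoch + "," + error);  // stands in for writeOne()
        }
        return epoch;
    }
}
```

Checking the epoch limit alongside the error guarantees termination even when the desired error is never reached.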
Once the network is trained, the first file is closed. The program prints the
results of the network to the console, comparing the actual output for each input to
its ideal value.
Finally, the initializeOuput2() method is invoked. This method creates the
code output file (see section 4.4.2), and stores the necessary values in variables
through accessor methods in the OutputWriter class. The program flow
then proceeds to the writeFile() method for the desired OutputWriter child class,
and then the program terminates.
4.5.2 LayerInfo
LayerInfo is a small class created to store the information needed to create a
layer in an artificial neural network. I had originally planned on using a struct,
but java does not support those, so I decided to make a separate class to hold the
same information.
The class has 3 main variables:
• private int activationFunction - An integer for the type of activation func-
tion for that layer.
• private boolean isBiased - A boolean for whether or not the layer has a bias
node.
• private int neurons - An integer for the number of normal (non-bias) neu-
rons in the layer.
All of these variables are set through parameters passed to the constructor.
There should not be a need to change these values once they have been set, so
there are no mutator methods. Each variable has an accessor method so that its
value can be used by the rest of the program.
The only other method is the toString() method. This method is used for
returning the information from the layer in an easy-to-read format, so that it can
be printed. While not essential to the flow of the program, it may be useful for
the user to see this information displayed in the console (especially for debugging
purposes).
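A class of this shape might look like the sketch below. The three field names come from the list above, while the class name, the accessor names, and the toString() format are assumptions; the actual code is in the appendix.

```java
/** Hypothetical reconstruction of an immutable layer description. */
final class LayerInfoSketch {
    private final int activationFunction;  // integer code from the config file
    private final boolean isBiased;        // whether the layer has a bias node
    private final int neurons;             // number of normal (non-bias) neurons

    LayerInfoSketch(int activationFunction, boolean isBiased, int neurons) {
        this.activationFunction = activationFunction;
        this.isBiased = isBiased;
        this.neurons = neurons;
    }

    int getActivationFunction() { return activationFunction; }

    boolean isBiased() { return isBiased; }

    int getNeurons() { return neurons; }

    @Override
    public String toString() {
        return "activation function: " + activationFunction
            + ", bias: " + isBiased + ", neurons: " + neurons;
    }
}
```

Declaring the fields final enforces in the compiler what the absence of mutator methods expresses by convention.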
4.5.3 OutputWriter
The OutputWriter class serves as a parent class for other OutputWriters. This
class holds all of the shared methods required to create a file and output the
code/formula for a trained artificial neural network.
Child classes must implement three methods: createFile(), writeFile(), and
parseActivationFunction().
The createFile() method creates the file used for output. While the majority
of the code in this method is the same in all child classes, I found that it was easier
to have each child class add its own file extension to the file name (.txt or .java).
The writeFile() method is rather lengthy. This is where the actual program
writes the code/formula for the neural network to a file. While similar in terms of
basic structure, the actual details of this will vary with each child class.
The parseActivationFunction() parses the equation of the activation function
and returns it in String form. A series of if-else statements allowed for 14 of
the 16 currently implemented activation functions to be used (Softmax is rather
complicated and would require the code to be reworked, and Competitive is non-
differentiable so I didn’t see a need to include it).
4.5.4 OutputWriterTxt
The OutputWriterTxt class is a child of the OutputWriter class. This class will
be used if the user selects option 0 for the second output file.
The createFile() method creates the second output file, using the filename
as specified in the config file and appending a .txt file extension.
The variable declarations section gives the names of all the variables to be
used in the network calculation section, in the following order:
• Header names for input (if applicable).
• Input layer.
• Hidden layer(s).
• Output layer.
• Header names for output (if applicable).
If there are any bias nodes present, they are assigned the value of the bias as
defined in the network (the default is 1.0, but this value is customizable).
Following the variable declaration is the network calculation section. If there
are any variables defined by header names, the values of these are stored into the
predefined variables such as i0. After that, calculation begins starting with the
first hidden layer. The f values are calculated first, consisting of the associated
node in the previous layer multiplied by the weight of the connection (as defined
by the trained network). All f values for a specific node are added together, with
the resulting sum being stored in a t value. This t value is then fed through the
activation function (the text of which comes from the parseActivationFunction()
method), and that is stored in the final value for that node. This continues through
all of the hidden layers and the output layer. Finally, if applicable, the values of
the final outputs are stored in the variables defined by output header names.
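The f, t, and final-value steps for a single node can be sketched as a small text generator. Everything specific in this example is an assumption: the class name, the way variable names such as h1n0f0 are spelled, and the plain-text form of the sigmoid. The exact strings the program writes are shown in Figure 19 and the appendix, not here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of emitting the f, t, and output lines for one node. */
final class EquationWriterSketch {
    /**
     * @param node     assumed name of the node being computed, e.g. "h1n0"
     * @param previous names of the nodes (and bias) in the previous layer
     * @param weights  trained weight of each incoming connection
     */
    static List<String> nodeLines(String node, String[] previous, double[] weights) {
        List<String> lines = new ArrayList<>();
        StringBuilder sum = new StringBuilder();
        for (int f = 0; f < previous.length; f++) {
            // One f value per incoming link: previous node times its weight.
            String fName = node + "f" + f;
            lines.add(fName + " = " + previous[f] + " * " + weights[f]);
            if (f > 0) {
                sum.append(" + ");
            }
            sum.append(fName);
        }
        // The t value is the sum of all f values for this node.
        lines.add(node + "t = " + sum);
        // Sigmoid written as plain text, as an equation format would need.
        lines.add(node + " = 1 / (1 + e^(-" + node + "t))");
        return lines;
    }
}
```

Because each weight is written into the text as a literal, the emitted equations carry the trained network with them and need no network data structure at run time.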
The parseActivationFunction() method parses the equation of the activa-
tion function and returns it in String form. All information with regards to
the exact mathematical formulas for each activation function came from the
activationFunction() method of the associated class.
Figure 19: Sample second output file (.txt)
4.5.5 OutputWriterJava
The OutputWriterJava class is a child of the OutputWriter class. This class will
be used if the user selects option 1 or 2 for the second output file.
The createFile() method creates the second output file, using the filename
as specified in the config file and appending a .java file extension.
The format of the .java file was inspired by the output of the program Tiberius.
One of the original intents of this program was to be used in a course that currently
uses Tiberius, so it made sense to model the output file in a way that it would be
compatible.
The first thing written to the file is an import statement for java.lang.Math,
followed by the declaration of the class (with the same name as the file).
The variable declarations section declares all of the variables to be used in
the network calculation section, in the same order as specified in the previous
subsection. All methods are static so that they can be accessed from the main
method without creating a specific object of this class type, so all variables are
declared as static doubles. Most variables are private, but the input and output
variables (as well as any variable names defined by headers in the training data set)
are declared as public so that they can be accessed by the user in other classes. If
any bias nodes are present, they are assigned the value of the bias as defined
in the network (the default is 1.0, but this value is customizable). Bias nodes are
also always declared as private, even if they are in the input layer.
If the user has chosen to make a standalone Java file, there will be two
additional methods: main() and initData(). The main() method calls the other
two methods (initData() and calcNet()) and then prints the output values to the
console. The initData() method provides default values for the input variables
(using the header names if applicable). The default values are currently set to 1,
although these can be modified by the user. If the user has chosen to make an
integrated Java file, neither of these two methods will be present.
The calcNet() method contains the network calculation section of the code.
The code generation of this section is almost identical to the equivalent in the
OutputWriterTxt class, except that every line ends with a semicolon.
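To make the layout described in this subsection concrete, the following is a hand-written sketch of what a standalone file could look like for a tiny network with two inputs, one hidden node, and one output. It is not actual program output: the class name, every variable name other than i0, and all of the weights are hypothetical, a sigmoid is assumed for both layers, and a single bias variable is shared between the layers to keep the example short.

```java
import java.lang.Math; // redundant in Java, but mirrors the import described above

// Hand-written illustration of the standalone layout; not real program output.
class SketchNetwork {
    // Input and output variables are public so other classes can set and read them.
    public static double i0, i1;
    public static double o0;
    // Bias nodes and intermediate values are private.
    private static double b0 = 1.0;
    private static double h0;

    // Provides default input values (1, as described above).
    public static void initData() {
        i0 = 1;
        i1 = 1;
    }

    // Network calculation: weighted sums followed by the activation function.
    public static void calcNet() {
        double t0 = i0 * 0.25 + i1 * 0.25 + b0 * -0.5; // hypothetical weights
        h0 = 1.0 / (1.0 + Math.exp(-t0));
        double t1 = h0 * 2.0 + b0 * -1.0;              // hypothetical weights
        o0 = 1.0 / (1.0 + Math.exp(-t1));
    }

    // Present only in the standalone variant.
    public static void main(String[] args) {
        initData();
        calcNet();
        System.out.println("o0 = " + o0);
    }
}
```

In the integrated variant, main() and initData() would be absent, and another class would assign i0 and i1 directly before calling calcNet().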
The parseActivationFunction() method parses the equation of the activation
function and returns it in String form. Some mathematical expressions are
substituted with their Java equivalents (for example, |x| becomes Math.abs(x), and e^x
becomes Math.exp(x)).
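One way such a substitution could be performed is with regular expressions, sketched below. This is an assumption made for illustration; the project's own parseActivationFunction() may build its strings differently (for example, by hard-coding one string per activation class). The class and method names are hypothetical, and the patterns only handle un-nested bars and simple exponents.

```java
// Hypothetical helper: rewrites a few math notations into Java expressions.
class ActivationTextToJava {

    static String toJava(String formula) {
        // |expr| -> Math.abs(expr); assumes the bars are not nested
        String result = formula.replaceAll("\\|([^|]+)\\|", "Math.abs($1)");
        // e^x or e^-x -> Math.exp(x) or Math.exp(-x); assumes a simple exponent
        result = result.replaceAll("\\be\\^(-?\\w+)", "Math.exp($1)");
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toJava("x/(1+|x|)"));   // x/(1+Math.abs(x))
        System.out.println(toJava("1/(1+e^-x)"));  // 1/(1+Math.exp(-x))
    }
}
```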
Figure 20: Sample second output file (.java)
List of References
[1] J. Heaton. “The number of hidden layers.” September 2008. [Online]. Available: http://www.heatonresearch.com/node/707
CHAPTER 5
Future work and conclusions
5.1 Categorical classification
As of right now, the program only works with data sets with entirely numerical
entries. This means that any data sets with categorical entries will need to be
changed into a numerical representation before they can be used. For example,
with the iris data set, instead of species of setosa, versicolor, and virginica, it would
use numbers such as 1, 2, and 3.
My original concept was to start out with numerical classification first, because
it is easier to work with, and then expand to include categorical if I had time.
However, I discovered late in the implementation period that in order to be able to
use categorical data sets, I would have to completely change how the network itself
was implemented. Within Encog, categorical classification uses a different set of
classes than numerical classification.
As of the writing of this paper, I do not know if I can use those classes to
work with numerical data sets, or if I would have to make a separate main class
for the different types.
5.2 Additional output formats
Originally, this project was going to be written in C++ because of the
applicability of that language to the gaming industry (where I want to work) [1].
However, it was changed to Java because of the portability of that language (there
is no need to recompile the code for different systems, because it always runs within
the Java Virtual Machine).
Currently, the second output file from the program can be in either basic
equational format (in a .txt file), or in Java code. Given more time, I would
have preferred to also allow for C++ code to be exported. Given the nature of
the program, it would also be feasible to add other target languages.
5.3 Non-command line inputs
Currently, the only input into the program is through a config file, which
contains all of the necessary data that the program needs to run. While this can
be easier for multiple runs (because the user does not need to repeatedly input the
information each time), I recognize that it can be hard to set up the config file for
the first time. Some users may also prefer to enter the information on a step by
step basis.
The basic implementation of an alternate input method is not very compli-
cated. If there are no arguments passed to the program when it is run, a new
method would be called instead of the validateConfig() method. This method
would assign values for all of the necessary variables through a series of input
statements utilizing a scanner.
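A minimal sketch of such a method is shown below, assuming a small holder class for the first few settings. The class InteractiveConfig and its prompt() method are hypothetical and cover only four of the values that validateConfig() reads; the field names are borrowed from NeuralGenerator.

```java
import java.util.Scanner;

// Hypothetical interactive alternative to reading the config file.
class InteractiveConfig {
    String filePath;
    boolean hasHeaders;
    int numOfInput;
    int numOfOutput;

    // Reads the first few settings from the given Scanner, one per line.
    static InteractiveConfig prompt(Scanner in) {
        InteractiveConfig c = new InteractiveConfig();
        System.out.print("Path to training data (.csv): ");
        c.filePath = in.nextLine().trim();
        System.out.print("Does the file have headers (true/false)? ");
        c.hasHeaders = Boolean.parseBoolean(in.nextLine().trim());
        System.out.print("Number of inputs: ");
        c.numOfInput = Integer.parseInt(in.nextLine().trim());
        System.out.print("Number of outputs: ");
        c.numOfOutput = Integer.parseInt(in.nextLine().trim());
        return c;
    }
}
```

When args.length is 0, the program could call InteractiveConfig.prompt(new Scanner(System.in)) instead of validateConfig(). Passing the Scanner in as a parameter also lets the method be exercised with a Scanner built over a fixed String.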
Related to this additional input method would be improved config file debug-
ging. Currently the program assumes that all of the data the user has inputted
is valid. There is no checking in that method to see if a number is within a valid
range (for example, a value of 17 for the activation function type). These numbers
are checked in other places of the code, but it would be more useful for the user
to have the numbers validated the first time they are encountered. If there is a
major problem (for example, the program is expecting a number, and the user has
a string of text instead), a generic exception will be thrown, and the stack trace
will be printed to the console.
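The kind of early validation described here could look like the following sketch. The class and method names are hypothetical and not part of the current program; the idea is simply to check each value as it is read and to name the offending setting in the error message.

```java
// Hypothetical checks that could run as each config value is read.
class ConfigChecks {

    // Returns the value unchanged if it lies in [min, max]; otherwise throws
    // an exception whose message names the offending setting.
    static int requireInRange(String name, int value, int min, int max) {
        if (value < min || value > max) {
            throw new IllegalArgumentException(name + " must be between " + min
                    + " and " + max + ", but was " + value);
        }
        return value;
    }

    // Parses an integer setting, reporting which setting was malformed.
    static int parseInt(String name, String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(name
                    + " must be a whole number, but was \"" + text + "\"", e);
        }
    }
}
```

For example, an activation function type of 17 could be rejected with requireInRange("activation function type", 17, 0, 13), where 13 is the highest case handled by createNetwork() in Appendix A.1.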
Although these implementations are not very difficult, they were omitted
because of time constraints.
5.4 Normalization
Depending on the data set being used, normalization can be an important
feature in machine learning. For example, a network trained on a data set in which
one variable has values much larger than the others may not converge as well
as one trained on a normalized data set. Encog supports several different types of
normalization, but I would most likely be using range normalization due to the
ability to normalize the data to a specific range (for example, -1 to 1 when using a
hyperbolic tangent activation function, or 0 to 1 when using a sigmoid activation
function).
\[ f(x) = \frac{(x - d_L)(n_H - n_L)}{d_H - d_L} + n_L \]   [2]

In this equation, d_L is the minimum value (low) in the data set, d_H is the
maximum value (high) in the data set, n_L is the desired normalized minimum, and
n_H is the desired normalized maximum.
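The formula translates directly into code. The sketch below is a plain Java version written for illustration; it does not use Encog's own normalization classes, and the class and method names are hypothetical.

```java
// Range normalization, following the formula above.
class RangeNormalizer {

    // dL, dH: minimum and maximum of the data; nL, nH: desired normalized range
    static double normalize(double x, double dL, double dH, double nL, double nH) {
        if (dH == dL) {
            throw new IllegalArgumentException("dH and dL must differ");
        }
        return ((x - dL) * (nH - nL)) / (dH - dL) + nL;
    }

    public static void main(String[] args) {
        // A value halfway through the data range maps to the middle of [-1, 1].
        System.out.println(normalize(5.0, 0.0, 10.0, -1.0, 1.0)); // 0.0
        // The data maximum maps to the normalized maximum.
        System.out.println(normalize(10.0, 0.0, 10.0, 0.0, 1.0)); // 1.0
    }
}
```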
5.5 Conclusions
Overall, the program works. The best way to illustrate this is to walk through
an example.
In this example, the config file shown in figure 27 was used. This creates a
network with a single hidden layer of 2 neurons, a sigmoid activation
function, and RPROP as the training algorithm. The XOR function was used as a
simple data set, as seen in figure 21.
Figure 21: test.csv
Due to the speed of RPROP, this network was able to train to an error of 0.01
within 47 iterations. A .csv format was chosen for the first output file, which can
be seen in figure 22.
Figure 22: output1.csv
CSV files can easily be plotted, as seen in figure 23. The spike at epoch 29
most likely shows the algorithm correcting itself after stepping
past a local minimum, which RPROP detects as a sign change in the gradient.
Figure 23: graph of output1.csv (x-axis: Epoch, 0 to 50; y-axis: Error, 0 to 0.3)
The second output is a Java file, as seen in figure 28. The integrated option
was chosen, so there is no main method.
The program also printed the results of testing the network, as seen in fig-
ure 24.
Figure 24: Results from NeuralGenerator.java
To check that the program generated correct source code,
I wrote the short program seen in figure 25. This program tests the
code for the network by using all four input pairs of the original training data set and
printing the results.
package test;

public class TestModule {

    private static int[][] tests = {{0,0},{1,0},{0,1},{1,1}};

    public static void main(String[] args) {

        for (int i = 0; i < 4; i++){
            System.out.print("\nTest #" + i + ": (" +tests[i][0]
                    + "," + tests[i][1] + ") \t\t Results: ");
            output2.x1 = tests[i][0];
            output2.x2 = tests[i][1];
            output2.calcNet();
            System.out.print(output2.xor);
        }
    }
}
Figure 25: TestModule.java
When this program is run, it produces the output seen in figure 26.
Figure 26: Results from output2.java
By comparing the results from figures 24 and 26, it is evident that the
generated source code produces the same results as the trained network. Therefore,
the network has been successfully maintained without the use of additional data
structures.
There are always improvements that can be made (for example, those listed in
sections 5.1-5.4). However, considering the initial scope of the project, the program
that has been created can be considered successful.
Figure 27: Sample config file
List of References
[1] K. Stuart. “How to get into the games industry – an insiders’ guide.” March 2014. [Online]. Available: http://www.theguardian.com/technology/2014/mar/20/how-to-get-into-the-games-industry-an-insiders-guide
[2] J. Heaton. “Range normalization.” September 2011. [Online]. Available: http://www.heatonresearch.com/wiki/Range_Normalization
Figure 28: output2.java
APPENDIX
Source Code
A.1 NeuralGenerator.java
package skynet;
import org.encog.Encog;
import org.encog.engine.network.activation.*;
import org.encog.ml.data.MLData;
import org.encog.ml.data.MLDataPair;
import org.encog.ml.data.MLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.Propagation;
import org.encog.neural.networks.training.propagation.back.Backpropagation;
import org.encog.neural.networks.training.propagation.resilient.*;
import org.encog.util.csv.CSVFormat;
import org.encog.util.simple.TrainingSetUtil;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;

/**
 * The main class for the program. The purpose of this program is to train
 * an Artificial Neural Network and output source code for it, so that it
 * can be used in other projects.
 * @author bwinrich
 *
 */
public class NeuralGenerator {

    /**
     * A BufferedWriter for our error output
     */
    private BufferedWriter bw1 = null;

    /**
     * An OutputWriter for our code output
     */
    private OutputWriter myOutput;

    /**
     * The number of neurons in each layer (including bias neurons)
     */
    private int[] numberOfTotalNeurons;

    /**
     * The number of neurons in each layer (excluding bias neurons)
     */
    private int[] numberOfNormalNeurons;

    /**
     * The location of the .csv file for the training data set
     */
    private String filePath = null;

    /**
     * Does the data set have headers?
     */
    private boolean hasHeaders = false;

    /**
     * The number of input nodes
     */
    private int numOfInput = 0;

    /**
     * The number of output nodes
     */
    private int numOfOutput = 0;

    /**
     * The number of hidden layers in the network
     */
    private int numOfHiddenLayers = 0;

    /**
     * An array holding the information for each layer (activation function,
     * bias, number of neurons)
     */
    private LayerInfo[] allMyLayers = null;

    /**
     * The network will train until the error is this value or lower
     */
    private double desiredTrainingError = 0.01;

    /**
     * The maximum number of epochs the network will train
     */
    private int numOfEpochs = 0;

    /**
     * The learning rate for the network (backpropagation only)
     */
    private double learningRate = 0;

    /**
     * The momentum for the network (backpropagation only)
     */
    private double momentum = 0;

    /**
     * The type of file the error output will be (0: .txt, 1: .csv)
     */
    private int output1Type = 0;

    /**
     * The type of file the code output will be (0: .txt, 1/2: .java)
     */
    private int output2Type = 0;

    /**
     * The name of the error output file
     */
    private String output1Name = null;

    /**
     * The name of the code output file
     */
    private String output2Name = null;

    /**
     * The data structure for the ANN
     */
    private BasicNetwork network = null;

    /**
     * Array to hold the column names (only if the data set has headers)
     */
    private String[] columnNames;

    /**
     * The type of network the ANN will be (0: Resilient propagation,
     * 1: backpropagation)
     */
    private int networkType;

    /**
     * The training data set
     */
    private MLDataSet trainingSet;


    /**
     * The main method.
     * @param args Should contain the location of the config file
     */
    @SuppressWarnings("unused")
    public static void main(final String args[]) {
        if (args.length == 0){
            System.out.println("Error: No file");
        }else{
            String configFilePath = args[0];
            NeuralGenerator myThesis = new NeuralGenerator(configFilePath);
        }
    }

    /**
     * Constructor for the class.
     * @param configFilepath The location of the config file
     */
    public NeuralGenerator(String configFilepath){
        newMain(configFilepath);
    }

    /**
     * The driver method, which will call all other necessary methods
     * required for the execution of the program
     * @param configFilePath The location of the config file
     */
    private void newMain(String configFilePath){

        //Import the config file, and the necessary information
        validateConfig(configFilePath);

        //Create the first output file
        System.out.println("Initializing first output file...");
        initializeOutput1();

        // create a neural network
        System.out.println("Creating network...");
        createNetwork();

        //Import data set
        System.out.println("Importing csv file...");

        trainingSet = TrainingSetUtil.loadCSVTOMemory(CSVFormat.ENGLISH,
                filePath, hasHeaders, numOfInput, numOfOutput);

        //Just because I prefer working with arrays instead of arrayLists
        if(hasHeaders){
            List<String> columns = TrainingSetUtil.getColumnNames();
            int width = columns.size();
            columnNames = new String[width];
            for (int i = 0; i < width; i++ ){
                columnNames[i] = columns.get(i);
            }
        }

        // train the neural network
        train();

        // Close the first file after we're done with it
        try{
            if(bw1!=null)
                bw1.close();
        }catch(Exception ex){
            System.out.println("Error in closing the BufferedWriter"+ex);
        }

        // test the neural network
        System.out.println("");
        System.out.println("Neural Network Results:");
        for(MLDataPair pair: trainingSet ) {
            final MLData output = network.compute(pair.getInput());

            System.out.println(pair.getInput().getData(0) + ","
                    + pair.getInput().getData(1) + ", actual=" + output.getData(0)
                    + ",ideal=" + pair.getIdeal().getData(0));
        }

        //Some additional numbers that we need
        int layers = network.getLayerCount();

        numberOfTotalNeurons = new int [layers];
        numberOfNormalNeurons = new int [layers];

        for (int i = 0; i<layers; i++)
        {
            numberOfTotalNeurons[i] = network.getLayerTotalNeuronCount(i);
            numberOfNormalNeurons[i] = network.getLayerNeuronCount(i);
        }

        System.out.println("\n");

        //Initialize the OutputWriter
        System.out.println("Initializing Second Output File...");
        initializeOutput2();

        System.out.println("Writing to file...");

        myOutput.writeFile();

        System.out.println("Done.");

        Encog.getInstance().shutdown();
    }


    /**
     * This method handles the training of the Artificial Neural Network
     */
    private void train() {
        Propagation train = null;

        //Different networks will be created based on the type listed in the
        //config file
        switch(networkType){
        case 0:
            train = new ResilientPropagation(network, trainingSet);
            break;
        case 1:
            train = new Backpropagation(network, trainingSet, learningRate,
                    momentum);
            break;
        default:
            break;
        }

        int epoch = 1;

        System.out.println("");

        System.out.println("Training...");

        System.out.println("");

        //Training the network
        do {
            train.iteration();

            //We write the error to the first output file
            writeOne(epoch, train.getError());

            epoch++;
        } while((train.getError() > desiredTrainingError)
                && (epoch < numOfEpochs));
        //Training will continue until the error is not above the desired
        //error, or until the maximum number of epochs has been reached
        train.finishTraining();
    }

    /**
     * Helper method for creating the ANN and adding layers to it
     */
    private void createNetwork() {
        network = new BasicNetwork();

        for (LayerInfo myLayer:allMyLayers)
        {
            switch (myLayer.getActivationFunction()){
            case -1: //The input layer doesn't have an activation function
                network.addLayer(new BasicLayer(null, myLayer.isBiased(),
                        myLayer.getNeurons()));
                break;
            case 0:
                network.addLayer(new BasicLayer(new ActivationSigmoid(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 1:
                network.addLayer(new BasicLayer(new ActivationTANH(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 2:
                network.addLayer(new BasicLayer(new ActivationLinear(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 3:
                network.addLayer(new BasicLayer(new ActivationElliott(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 4:
                network.addLayer(new BasicLayer(new ActivationGaussian(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 5:
                network.addLayer(new BasicLayer(new ActivationLOG(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 6:
                network.addLayer(new BasicLayer(new ActivationRamp(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 7:
                network.addLayer(new BasicLayer(new ActivationSIN(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 8:
                network.addLayer(new BasicLayer(new ActivationStep(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 9:
                network.addLayer(new BasicLayer(new ActivationBiPolar(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 10:
                network.addLayer(new BasicLayer(
                        new ActivationBipolarSteepenedSigmoid(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 11:
                network.addLayer(new BasicLayer(new ActivationClippedLinear(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 12:
                network.addLayer(new BasicLayer(new ActivationElliottSymmetric(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            case 13:
                network.addLayer(new BasicLayer(new ActivationSteepenedSigmoid(),
                        myLayer.isBiased(), myLayer.getNeurons()));
                break;
            default:
                //Unimplemented activation function: Softmax (complicated)
                //Unimplemented activation function: Competitive
                //(non-differentiable)
                System.out.println("Error: This activation function is "
                        + "either invalid or not yet implemented");
                break;
            }
        }

        network.getStructure().finalizeStructure();

        network.reset();
    }


    /**
     * This method creates the error output file
     */
    private void initializeOutput1() {
        String output1NameFull = null;

        //File type is specified in the config file
        switch(output1Type){
        case 0:
            output1NameFull = output1Name + ".txt";
            break;
        case 1:
            output1NameFull = output1Name + ".csv";
            break;
        default:
            //More cases can be added at a later point in time
            System.out.println("Invalid output 1 type");
        }

        try{
            File file1 = new File(output1NameFull);

            if (!file1.exists()) {
                file1.createNewFile();
            }

            FileWriter fw1 = new FileWriter(file1);
            bw1 = new BufferedWriter(fw1);

            //Header line for a .csv file
            if (output1Type == 1){
                bw1.write("Epoch,Error");
                bw1.newLine();
            }
        }catch (IOException e){
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    /**
     * This method creates the code output file
     */
    private void initializeOutput2(){

        //File type is specified in the config file
        switch(output2Type){
        case 0:
            myOutput = new OutputWriterTxt();
            break;
        case 1:
            myOutput = new OutputWriterJava(true);
            break;
        case 2:
            myOutput = new OutputWriterJava(false);
            break;
        default:
            //More cases can be added if additional classes are designed
            System.out.println("Invalid output 2 type");
            break;
        }

        //Creating the file
        myOutput.createFile(output2Name);

        //Passing all of the necessary network information to the OutputWriter
        myOutput.setNetwork(network);
        myOutput.setInputCount(numOfInput);
        myOutput.setOutputCount(numOfOutput);
        myOutput.setLayers(numOfHiddenLayers+2);
        myOutput.setNumberOfTotalNeurons(numberOfTotalNeurons);
        myOutput.setNumberOfNormalNeurons(numberOfNormalNeurons);
        myOutput.setHasHeaders(hasHeaders);
        myOutput.setColumnNames(columnNames);
        myOutput.initializeOtherVariables();
    }

    /**
     * Helper method for writing to the error output file
     * @param epoch Number of times the network has been trained
     * @param error Training error for that epoch
     */
    private void writeOne(int epoch, double error) {

        String temp = null;

        //Format depends on file type
        switch(output1Type){
        case 0:
            temp = "Epoch #" + epoch + " Error:" + error;
            break;
        case 1:
            temp = "" + epoch + "," + error;
            break;
        default:
            temp = "Invalid output 1 type";
            break;
        }

        //Output the error to the console before writing it to the file
        System.out.println(temp);
        try {
            bw1.write(temp);
            bw1.newLine();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    /**
     * Helper method for retrieving lines from the config file. Comments are
     * not considered valid lines.
     * @param d The BufferedReader for the config file
     * @return The next valid line from the config file
     * @throws IOException
     */
    private String nextValidLine(BufferedReader d) throws IOException
    {
        String validLine = null;
        boolean isValid = false;

        if (d.ready()){
            do{
                String str = d.readLine();

                if (str.length() != 0){
                    //Eliminate extra space
                    str = str.trim();

                    //Comments start with %, and are not considered valid
                    if (str.charAt(0) != '%'){

                        validLine = str;
                        isValid = true;
                    }
                }
            }while (!isValid && d.ready());
        }
        return validLine;
    }

    /**
     * A lengthy method for validating the config file. All information from
     * the config file is stored into data members so it can be accessed by
     * other methods.
     * @param configFilepath The location of the config file
     */
    public void validateConfig(String configFilepath)
    {
        try{
            File myFile = new File(configFilepath);
            FileInputStream fis = null;

            BufferedReader d = null;

            fis = new FileInputStream(myFile);

            d = new BufferedReader(new InputStreamReader(fis));

            //First, we store the file path of the .csv file
            if (d.ready()){
                filePath = nextValidLine(d);
            }

            //Next, we store if the csv file has headers or not
            if (d.ready()){
                hasHeaders = Boolean.parseBoolean(nextValidLine(d));
            }

            //Next, we store the number of input parameters
            if (d.ready()){
                numOfInput = Integer.valueOf(nextValidLine(d));
            }

            //Next, we store the number of output parameters
            if (d.ready()){
                numOfOutput = Integer.valueOf(nextValidLine(d));
            }

            //Next, we store the number of hidden layers
            if (d.ready()){
                numOfHiddenLayers = Integer.valueOf(nextValidLine(d));
            }

            //Next, we store the information for our hidden layers
            allMyLayers = new LayerInfo[numOfHiddenLayers+2];

            String layer = null;
            int activationFunction;
            boolean isBiased;
            int neurons;

            for (int i = 1; i < numOfHiddenLayers+1; i++){
                if (d.ready()){
                    layer = nextValidLine(d);
                    layer = layer.trim().toLowerCase();
                    layer = layer.substring(1,layer.length()-1);
                    String[] layers = layer.split(",");

                    for (String l:layers){
                        l = l.trim();
                    }

                    activationFunction = Integer.valueOf(layers[0].trim());
                    isBiased = Boolean.parseBoolean(layers[1].trim());
                    neurons = Integer.valueOf(layers[2].trim());

                    allMyLayers[i] =
                            new LayerInfo(activationFunction, isBiased, neurons);
                }
            }

            //Next, we store the information for the input layer
            if (d.ready()){
                layer = nextValidLine(d);
                layer = layer.trim().toLowerCase();
                layer = layer.substring(1,layer.length()-1);

                isBiased = Boolean.parseBoolean(layer.trim());

                allMyLayers[0] = new LayerInfo(-1, isBiased, numOfInput);
            }

            //Finally, we store the information for the output layer
            if (d.ready()){
                layer = nextValidLine(d);
                layer = layer.trim().toLowerCase();
                layer = layer.substring(1,layer.length()-1);

                String[] layers = layer.split(",");

                activationFunction = Integer.valueOf(layers[0].trim());

                allMyLayers[numOfHiddenLayers+1] =
                        new LayerInfo(activationFunction, false, numOfOutput);
            }

            //store the information about the output 1 file type
            if (d.ready()){
                output1Type = Integer.valueOf(nextValidLine(d));
            }

            //store the information about the output 1 name
            if (d.ready()){
                output1Name = nextValidLine(d);
            }

            //store the information about the output 2 file type
            if (d.ready()){
                output2Type = Integer.valueOf(nextValidLine(d));
            }

            //store the information about the output 2 name
            if (d.ready()){
                output2Name = nextValidLine(d);
            }

            //Store the information for the desired training error
            if (d.ready()){
                desiredTrainingError = Double.valueOf(nextValidLine(d));
            }

            //Store the information for the maximum number of epochs
            if (d.ready()){
                numOfEpochs = Integer.valueOf(nextValidLine(d));
            }

            //Store the information for the desired network type
            if (d.ready()){
                networkType = Integer.valueOf(nextValidLine(d));
            }

            //We need additional variables if we are using Backpropagation
            if (networkType == 1){
                //Store the information for the learning rate
                if (d.ready()){
                    learningRate = Double.valueOf(nextValidLine(d));
                }

                //Store the information for the momentum
                if (d.ready()){
                    momentum = Double.valueOf(nextValidLine(d));
                }
            }

            //TODO: reorder this
            //output the information from the config file
            System.out.println("config file validated:");
            System.out.println("\tfilePath = " + filePath);
            System.out.println("\thasHeaders = " + hasHeaders);
            System.out.println("\tnumOfInput = " + numOfInput);
            System.out.println("\tnumOfOutput = " + numOfOutput);
            System.out.println("\tnumOfHiddenLayers = " + numOfHiddenLayers);
            for (LayerInfo l: allMyLayers){
                System.out.println("\t" + l.toString());
            }
            System.out.println("\tdesiredTrainingError = "
                    + desiredTrainingError);
            System.out.println("\tnumOfEpochs = " + numOfEpochs);
            System.out.println("\tnetworkType = " + networkType);
            if (networkType == 1){
                System.out.println("\tlearningRate = " + learningRate);
                System.out.println("\tmomentum = " + momentum);
            }
            System.out.println("\toutput1Type = " + output1Type);
            System.out.println("\toutput1Name = " + output1Name);
            System.out.println("\toutput2Type = " + output2Type);
            System.out.println("\toutput2Name = " + output2Name);

        }
        catch (Exception e){
            //TODO: create more detailed error messages, to see where the error
            //occurred
            System.out.println("Invalid config file");
            e.printStackTrace();
        }
        System.out.println("");
    }
}
A.2 LayerInfo.java
package skynet;
/**
 * A simple class, designed to hold the information required to create a
 * layer in the neural network
 * @author bwinrich
 */

class LayerInfo{

    /**
     * An integer for the type of activation function (see comments in
     * config file for details)
     */
    private int activationFunction;

    /**
     * A boolean for if the layer has a bias node or not
     */
    private boolean isBiased;

    /**
     * An integer for the number of normal neurons in the layer
     */
    private int neurons;

    /**
     * A constructor with parameters. We have no need for a default
     * constructor
     * @param activationFunction type of activation function
     * @param isBiased is there a bias node
     * @param neurons number of normal neurons
     */
    public LayerInfo(int activationFunction, boolean isBiased, int neurons){
        this.activationFunction = activationFunction;
        this.isBiased = isBiased;
        this.neurons = neurons;
    }

    /**
     * Accessor method for activationFunction
     * @return the activationFunction
     */
    public int getActivationFunction() {
        return activationFunction;
    }

    /**
     * Accessor method for isBiased
     * @return the isBiased
     */
    public boolean isBiased() {
        return isBiased;
    }

    /**
     * Accessor method for neurons
     * @return the neurons
     */
    public int getNeurons() {
        return neurons;
    }

    /** A method used for returning the information for the layer in an
     * easy-to-read format, so that it can be printed.
     * @see java.lang.Object#toString()
     */
    @Override
    public String toString(){

        String activation = null;

        switch(activationFunction){
        case -1:
            activation = "n/a";
            break;
        case 0:
            activation = "Sigmoid";
            break;
        case 1:
            activation = "Hyperbolic Tangent";
            break;
        case 2:
            activation = "Linear";
            break;
        case 3:
            activation = "Elliott";
            break;
        case 4:
            activation = "Gaussian";
            break;
        case 5:
            activation = "Logarithmic";
            break;
        case 6:
            activation = "Ramp";
            break;
        case 7:
            activation = "Sine";
            break;
        case 8:
            activation = "Step";
            break;
        case 9:
            activation = "BiPolar";
            break;
        case 10:
            activation = "Bipolar Sigmoid";
            break;
        case 11:
            activation = "Clipped Linear";
            break;
        case 12:
            activation = "Competitive";
            break;
        case 13:
            activation = "Elliott Symmetric";
            break;
        case 14:
            activation = "Softmax";
            break;
        case 15:
            activation = "Steepened Sigmoid";
            break;
        default:
            activation = "Invalid";
            break;
        }

        return ("Layer: (" + activation + ", " + isBiased + ", " + neurons
                + ")");
    }
}
A.3 OutputWriter.java
package skynet;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import org.encog.engine.network.activation.*;
import org.encog.neural.flat.FlatNetwork;
import org.encog.neural.networks.BasicNetwork;

/**
 * A parent class for other OutputWriters. This class holds all of the
 * shared methods required to create a file and output the code/formula for
 * a trained Artificial Neural Network.
 * @author bwinrich
 */
public abstract class OutputWriter {

    /**
     * The file to write to
     */
    protected File file2;

    /**
     * A BufferedWriter for file writing
     */
    protected BufferedWriter bw2 = null;

    /**
     * The name of the file
     */
    protected String outputName;

    /**
     * Does the data set have headers?
     */
    protected boolean hasHeaders;

    /**
     * Array to hold the column names (only if the data set has headers)
     */
    protected String[] columnNames;

    /**
     * The data structure for the ANN
     */
    protected BasicNetwork network;

    /**
     * The number of input nodes
     */
    protected int inputCount;

    /**
     * The number of output nodes
     */
    protected int outputCount;

    /**
     * The number of neurons in each layer (including bias neurons)
     */
    protected int[] numberOfTotalNeurons;

    /**
     * The number of neurons in each layer (excluding bias neurons)
     */
    protected int[] numberOfNormalNeurons;

    /**
     * The value of the biases of each layer, if applicable
     */
    protected double[] biases;

    /**
     * The flattened version of the ANN
     */
    protected FlatNetwork myFlat;

    /**
     * The number of layers in the ANN
     */
    protected int layers;


    /**
     * Default constructor
     */
    public OutputWriter(){}

    /**
     * Mutator method for hasHeaders
     * @param hasHeaders the hasHeaders to set
     */
    public void setHasHeaders(boolean hasHeaders) {
        this.hasHeaders = hasHeaders;
    }

    /**
     * Mutator method for columnNames
     * @param columnNames the columnNames to set
     */
    public void setColumnNames(String[] columnNames) {
        this.columnNames = columnNames;
    }

    /**
     * Mutator method for network
     * @param network the network to set
     */
    public void setNetwork(BasicNetwork network) {
        this.network = network;
    }

    /**
     * Mutator method for inputCount
     * @param inputCount the inputCount to set
     */
    public void setInputCount(int inputCount) {
        this.inputCount = inputCount;
    }

    /**
     * Mutator method for outputCount
     * @param outputCount the outputCount to set
     */
    public void setOutputCount(int outputCount) {
        this.outputCount = outputCount;
126 }127
128 /**129 * Mutator method for numberOfTotalNeurons130 * @param numberOfTotalNeurons the numberOfTotalNeurons to set131 */132 public void setNumberOfTotalNeurons(int[] numberOfTotalNeurons) {133 this.numberOfTotalNeurons = numberOfTotalNeurons;134 }135
136 /**137 * Mutator method for numberOfNormalNeurons138 * @param numberOfNormalNeurons the numberOfNormalNeurons to set139 */140 public void setNumberOfNormalNeurons(int[] numberOfNormalNeurons) {141 this.numberOfNormalNeurons = numberOfNormalNeurons;142 }143
144 /**145 * Mutator method for biases146 * @param biases the biases to set147 */148 private void setBiases(double[] biases) {149 this.biases = biases;150 }151
152 /**153 * Mutator method for myFlat154 * @param myFlat the myFlat to set155 */156 private void setMyFlat(FlatNetwork myFlat) {157 this.myFlat = myFlat;158 }159
160 /**161 * Mutator method for layers162 * @param layers the layers to set163 */164 public void setLayers(int layers) {165 this.layers = layers;166 }167
    /**
     * Some variables can be initialized using information already passed to
     * the class.
     */
    public void initializeOtherVariables(){
        setMyFlat(network.getStructure().getFlat());
        setBiases(myFlat.getBiasActivation());
    }

    /**
     * Creates the file used for output.
     * @param output2Name the name of the output file, without its extension
     */
    public abstract void createFile(String output2Name);

    /**
     * Writes to the output file. Each String passed as a parameter is
     * written on its own line
     * @param stuff The line to be written to the file
     */
    protected void writeTwo(String stuff){
        try{
            bw2.write(stuff);
            bw2.newLine();
        }catch (IOException e){
            //Report the failure; the line is not written
            e.printStackTrace();
        }
    }

    /**
     * Parses the equation of the activation function and returns it in
     * String form
     * @param af The activation function to parse
     * @param varName The variable passed to the activation function
     * @param targetVarName The variable the result of the activation
     * function will be stored in
     * @return The parsed form of the activation function in String form
     */
    protected abstract String parseActivationFunction(ActivationFunction af,
        String varName, String targetVarName);

    /**
     * A lengthy method for writing the code/formula for the neural network
     * to a file. Each child class will have its own implementation.
     */
    public abstract void writeFile();
}
A.4 OutputWriterTxt.java
package skynet;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import org.encog.engine.network.activation.*;

/**
 * A child class of OutputWriter, used for creating .txt files
 * @author bwinrich
 */
public class OutputWriterTxt extends OutputWriter{

    /**
     * Default Constructor
     */
    public OutputWriterTxt(){}

    /* (non-Javadoc)
     * @see OutputWriter#writeFile()
     */
    @Override
    public void writeFile() {

        writeTwo("//Variable declarations");

        //Variables from headers of csv file, if applicable
        if(hasHeaders){
            writeTwo("//Header Names");
            for (String s: columnNames){
                writeTwo(s);
            }
        }

        //variables - input layer
        writeTwo("//Input layer");
        for (int i = 0; i < inputCount; i++)
        {
            writeTwo("i" + i);
        }
        for (int i = inputCount; i < numberOfTotalNeurons[0]; i++)
        {
            writeTwo("i" + i + " = " + biases[biases.length-1]);
        }

        //variables - hidden layers
        writeTwo("//Hidden layer(s)");
        for (int i = 1; i < layers-1; i++){
            writeTwo("//Hidden Layer " + i);
            for (int j = 0; j < numberOfNormalNeurons[i]; j++){
                for (int k = 0; k < numberOfTotalNeurons[i-1]; k++){
                    writeTwo("h" + i + "n" + j + "f" + k);
                }
                writeTwo("h" + i + "n" + j + "t");
                writeTwo("h" + i + "n" + j);
            }
            for (int j = numberOfNormalNeurons[i]; j < numberOfTotalNeurons[i];
                    j++){
                writeTwo("h" + i + "n" + j + " = " + biases[biases.length-i-1]);
            }
        }

        //variables - output layer
        writeTwo("//Output layer");
        for (int i = 0; i < outputCount; i++){
            for (int j = 0; j < numberOfTotalNeurons[layers-2]; j++){
                writeTwo("o" + i + "f" + j);
            }
            writeTwo("o" + i + "t");
            writeTwo("o" + i);
        }

        writeTwo("");

        double weight;
        String sum = "";

        //If the data set has headers, set the input variables to the
        //header variables
        if(hasHeaders){
            for (int i = 0; i < inputCount; i++){
                writeTwo("i" + i + " = " + columnNames[i]);
            }
        }

        //Hidden layers calculation
        for (int i = 1; i < layers-1; i++){
            for (int j = 0; j < numberOfNormalNeurons[i]; j++){
                writeTwo("");

                sum = "";

                for (int k = 0; k < numberOfTotalNeurons[i-1]; k++){
                    weight = network.getWeight(i-1,k,j);
                    if (i == 1)
                    {
                        writeTwo("h" + i + "n" + j + "f" + k + " = i" + k + " * "
                            + weight);
                    }else{
                        writeTwo("h" + i + "n" + j + "f" + k + " = h" + (i-1) + "n"
                            + k + " * " + weight);
                    }

                    if (k == 0){
                        sum = "h" + i + "n" + j + "f" + k;
                    }else{
                        sum += " + h" + i + "n" + j + "f" + k;
                    }
                }
                writeTwo("h" + i + "n" + j + "t = " + sum);

                //The text formulas carry no trailing semicolon, so they are
                //written unchanged
                String af = parseActivationFunction(network.getActivation(i),
                    "h" + i + "n" + j + "t", "h" + i + "n" + j);
                writeTwo(af);
            }
        }

        //Output layer calculation
        writeTwo("");

        sum = "";

        for (int i = 0; i < outputCount; i++){
            for (int j = 0; j < numberOfTotalNeurons[layers-2]; j++){
                weight = network.getWeight(layers-2,j,i);
                writeTwo("o" + i + "f" + j + " = h" + (layers-2) + "n" + j
                    + " * " + weight);

                if (j == 0){
                    sum = "o" + i + "f" + j;
                }else{
                    sum += " + o" + i + "f" + j;
                }
            }
            writeTwo("o" + i + "t = " + sum);

            String af = parseActivationFunction(network.getActivation(layers-1),
                "o" + i + "t", "o" + i);
            writeTwo(af);
        }

        //If the data set has headers, copy the output variables to the
        //corresponding header variables
        if (hasHeaders){
            writeTwo("");

            for (int i = 0; i < outputCount; i++){
                writeTwo(columnNames[i + inputCount] + " = o" + i);
            }
        }

        writeTwo("");

        try{
            if(bw2 != null)
                bw2.close();
        }catch(Exception ex){
            System.out.println("Error in closing the BufferedWriter" + ex);
        }
    }

    /* (non-Javadoc)
     * @see OutputWriter#createFile(java.lang.String)
     */
    @Override
    public void createFile(String output2Name){

        outputName = output2Name;

        try{
            file2 = new File(output2Name + ".txt");
            if (!file2.exists()) {
                file2.createNewFile();
            }

            FileWriter fw2 = new FileWriter(file2);
            bw2 = new BufferedWriter(fw2);
        }catch (IOException e){
            //Report the failure; bw2 stays null if the file cannot be opened
            e.printStackTrace();
        }
    }

    @Override
    protected String parseActivationFunction(ActivationFunction af,
            String varName, String targetVarName){
        String text = null;

        if (af instanceof ActivationSigmoid){
            text = targetVarName + " = 1.0 / (1.0 + e^(-1 * " + varName + "))";
        }else if (af instanceof ActivationTANH){
            text = targetVarName + " = tanh(" + varName + ")";
        }else if (af instanceof ActivationLinear){
            text = targetVarName + " = " + varName;
        }else if (af instanceof ActivationElliott){
            double s = af.getParams()[0];
            text = targetVarName + " = ((" + varName + " * " + s
                + ") / 2) / (1 + |" + varName + " * " + s + "|) + 0.5";
        }else if (af instanceof ActivationGaussian){
            text = targetVarName + " = e^(-(2.5*" + varName + ")^2)";
        }else if (af instanceof ActivationLOG){
            text = "if(" + varName + " >= 0){\n\t" + targetVarName
                + " = log(1 + " + varName + ")\n}else{\n\t" + targetVarName
                + " = -log(1 - " + varName + ")\n}";
        }else if (af instanceof ActivationRamp){
            double paramRampHighThreshold =
                ((ActivationRamp)(af)).getThresholdHigh();
            double paramRampLowThreshold =
                ((ActivationRamp)(af)).getThresholdLow();
            double paramRampHigh = ((ActivationRamp)(af)).getHigh();
            double paramRampLow = ((ActivationRamp)(af)).getLow();
            double slope = (paramRampHighThreshold-paramRampLowThreshold)
                / (paramRampHigh-paramRampLow);

            //The final "\n}" closes the else branch of the emitted text
            text = "if(" + varName + " < " + paramRampLowThreshold + ") {\n\t"
                + targetVarName + " = " + paramRampLow + "\n} else if ("
                + varName + " > " + paramRampHighThreshold + ") {\n\t"
                + targetVarName + " = " + paramRampHigh + "\n} else {\n\t"
                + targetVarName + " = (" + slope + " * " + varName + ")\n}";
        }else if (af instanceof ActivationSIN){
            text = targetVarName + " = sin(2.0*" + varName + ")";
        }else if (af instanceof ActivationStep){
            double paramStepCenter = ((ActivationStep)(af)).getCenter();
            double paramStepLow = ((ActivationStep)(af)).getLow();
            double paramStepHigh = ((ActivationStep)(af)).getHigh();

            text = "if (" + varName + ">= " + paramStepCenter + ") {\n\t"
                + targetVarName + " = " + paramStepHigh + "\n} else {\n\t"
                + targetVarName + " = " + paramStepLow + "\n}";
        }else if (af instanceof ActivationBiPolar){
            text = "if(" + varName + " > 0) {\n\t" + targetVarName
                + " = 1\n} else {\n\t" + targetVarName + " = -1\n}";
        }else if (af instanceof ActivationBipolarSteepenedSigmoid){
            text = targetVarName + " = (2.0 / (1.0 + e^(-4.9 * " + varName
                + "))) - 1.0";
        }else if (af instanceof ActivationClippedLinear){
            text = "if(" + varName + " < -1.0) {\n\t" + targetVarName
                + " = -1.0\n} else if (" + varName + " > 1.0) {\n\t"
                + targetVarName + " = 1.0\n} else {\n\t" + targetVarName
                + " = " + varName + "\n}";
        }else if (af instanceof ActivationElliottSymmetric){
            double s = af.getParams()[0];
            text = targetVarName + " = (" + varName + "*" + s + ") / (1 + |"
                + varName + "*" + s + "|)";
        }else if (af instanceof ActivationSteepenedSigmoid){
            text = targetVarName + " = 1.0 / (1.0 + e^(-4.9 * " + varName
                + "))";
        }else{
            //Unimplemented activation function: Softmax (complicated)
            //Unimplemented activation function: Competitive (complicated,
            //non-differentiable)
            //in Encog 3.3 there aren't any other activation functions, so
            //unless someone implements their own we shouldn't get to this point
            text = "Error: unknown activation function";
        }
        return text;
    }
}
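The variable naming scheme used by writeFile() above (h<layer>n<neuron>f<input> for each weighted input, a trailing t for the sum, and the bare name for the activated value) can be illustrated without Encog. The following is only a minimal sketch: the class FormulaNamingSketch, the method neuronLines, the fixed tanh activation, and the weights are hypothetical and are not part of the thesis code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the thesis code). */
class FormulaNamingSketch {

    /**
     * Builds the formula lines for neuron j of hidden layer i. Each source
     * variable is multiplied by its weight, the products are summed into
     * the "t" variable, and tanh is applied as an assumed activation.
     */
    static List<String> neuronLines(int i, int j, String[] sources,
            double[] weights) {
        List<String> lines = new ArrayList<>();
        String name = "h" + i + "n" + j;
        StringBuilder sum = new StringBuilder();
        for (int k = 0; k < sources.length; k++) {
            lines.add(name + "f" + k + " = " + sources[k] + " * " + weights[k]);
            if (k > 0) {
                sum.append(" + ");
            }
            sum.append(name).append("f").append(k);
        }
        lines.add(name + "t = " + sum);
        lines.add(name + " = tanh(" + name + "t)");
        return lines;
    }

    public static void main(String[] args) {
        // Made-up weights for two inputs plus one bias input
        String[] sources = {"i0", "i1", "i2"};
        double[] weights = {0.5, -0.25, 1.0};
        for (String line : neuronLines(1, 0, sources, weights)) {
            System.out.println(line);
        }
    }
}
```

For the made-up values in main, the first line printed is h1n0f0 = i0 * 0.5 and the last is h1n0 = tanh(h1n0t).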
A.5 OutputWriterJava.java
package skynet;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import org.encog.engine.network.activation.*;

/**
 * A child class of OutputWriter, used for creating .java files.
 * @author bwinrich
 */
public class OutputWriterJava extends OutputWriter{

    private boolean standalone;

    /**
     * Default constructor
     */
    public OutputWriterJava(){}

    /**
     * Constructor with parameter
     * @param standalone true if the generated class should include a main
     * method
     */
    public OutputWriterJava(boolean standalone){
        this.standalone = standalone;
    }

    /* (non-Javadoc)
     * @see OutputWriter#writeFile()
     */
    @Override
    public void writeFile() {

        writeTwo("import java.lang.Math;");

        writeTwo("");

        writeTwo("public class " + outputName);
        writeTwo("{");

        writeTwo("\t//Variable declarations");

        //Variables from headers of csv file, if applicable
        if(hasHeaders){
            writeTwo("\t//Header Names");
            for (String s: columnNames){
                writeTwo("\tpublic static double " + s + ";");
            }
        }

        //variables - input layer
        writeTwo("\t//Input layer");
        for (int i = 0; i < inputCount; i++)
        {
            writeTwo("\tpublic static double i" + i + ";");
        }
        for (int i = inputCount; i < numberOfTotalNeurons[0]; i++)
        {
            writeTwo("\tprivate static double i" + i + " = "
                + biases[biases.length-1] + ";");
        }

        //variables - hidden layers
        writeTwo("\t//Hidden layer(s)");
        for (int i = 1; i < layers-1; i++){
            writeTwo("\t//Hidden Layer " + i);
            for (int j = 0; j < numberOfNormalNeurons[i]; j++){
                for (int k = 0; k < numberOfTotalNeurons[i-1]; k++){
                    writeTwo("\tprivate static double h" + i + "n" + j + "f" + k
                        + ";");
                }
                writeTwo("\tprivate static double h" + i + "n" + j + "t;");
                writeTwo("\tprivate static double h" + i + "n" + j + ";");
            }
            for (int j = numberOfNormalNeurons[i]; j < numberOfTotalNeurons[i];
                    j++){
                writeTwo("\tprivate static double h" + i + "n" + j + " = "
                    + biases[biases.length-i-1] + ";");
            }
        }

        //variables - output layer
        writeTwo("\t//Output layer");
        for (int i = 0; i < outputCount; i++){
            for (int j = 0; j < numberOfTotalNeurons[layers-2]; j++){
                writeTwo("\tprivate static double o" + i + "f" + j + ";");
            }
            writeTwo("\tprivate static double o" + i + "t;");
            writeTwo("\tpublic static double o" + i + ";");
        }

        writeTwo("");

        //standalone files will have a main method
        if(standalone){

            //TODO: customize this (from Tiberius)
            writeTwo("\tpublic static void main(String[] args)");
            writeTwo("\t{");
            writeTwo("\t\tinitData();");
            writeTwo("\t\tcalcNet();");
            for (int i = 0; i < outputCount; i++){
                writeTwo("\t\tSystem.out.println(o" + i + ");");
            }
            writeTwo("\t}");
            writeTwo("");

            //TODO: customize this (from Tiberius)
            writeTwo("\tpublic static void initData()");
            writeTwo("\t{");
            writeTwo("\t\t//data is set here");
            for (int i = 0; i < inputCount; i++){
                if(hasHeaders){
                    writeTwo("\t\t" + columnNames[i] + " = 1;");
                }else{
                    writeTwo("\t\ti" + i + " = 1;");
                }
            }
            writeTwo("\t}");
            writeTwo("");
        }

        double weight;
        String sum = "";

        writeTwo("\tpublic static void calcNet()");
        writeTwo("\t{");

        //If the data set has headers, set the input variables to the
        //header variables
        if (hasHeaders){
            for (int i = 0; i < inputCount; i++){
                writeTwo("\t\ti" + i + " = " + columnNames[i] + ";");
            }
        }

        //Hidden layers calculation
        for (int i = 1; i < layers-1; i++){
            for (int j = 0; j < numberOfNormalNeurons[i]; j++){
                writeTwo("");

                sum = "";

                for (int k = 0; k < numberOfTotalNeurons[i-1]; k++){
                    weight = network.getWeight(i-1,k,j);
                    if (i == 1)
                    {
                        writeTwo("\t\th" + i + "n" + j + "f" + k + " = i" + k
                            + " * " + weight + ";");
                    }else{
                        writeTwo("\t\th" + i + "n" + j + "f" + k + " = h" + (i-1)
                            + "n" + k + " * " + weight + ";");
                    }

                    if (k == 0){
                        sum = "h" + i + "n" + j + "f" + k;
                    }else{
                        sum += " + h" + i + "n" + j + "f" + k;
                    }
                }

                writeTwo("\t\th" + i + "n" + j + "t = " + sum + ";");

                String af = parseActivationFunction(network.getActivation(i),
                    "h" + i + "n" + j + "t", "h" + i + "n" + j);
                writeTwo("\t\t" + af);
            }
        }

        //Output layer calculation
        writeTwo("");

        sum = "";

        for (int i = 0; i < outputCount; i++){
            for (int j = 0; j < numberOfTotalNeurons[layers-2]; j++){
                weight = network.getWeight(layers-2,j,i);
                writeTwo("\t\to" + i + "f" + j + " = h" + (layers-2) + "n" + j
                    + " * " + weight + ";");

                if (j == 0){
                    sum = "o" + i + "f" + j;
                }else{
                    sum += " + o" + i + "f" + j;
                }
            }
            writeTwo("\t\to" + i + "t = " + sum + ";");

            String af = parseActivationFunction(network.getActivation(layers-1),
                "o" + i + "t", "o" + i);
            writeTwo("\t\t" + af);
        }

        //If the data set has headers, copy the output variables to the
        //corresponding header variables
        if (hasHeaders){
            writeTwo("");
            for (int i = 0; i < outputCount; i++){
                writeTwo("\t\t" + columnNames[i + inputCount] + " = o" + i
                    + ";");
            }
        }

        writeTwo("\t}");

        writeTwo("");
        writeTwo("}");

        try{
            if(bw2 != null)
                bw2.close();
        }catch(Exception ex){
            System.out.println("Error in closing the BufferedWriter" + ex);
        }
    }

    /* (non-Javadoc)
     * @see OutputWriter#createFile(java.lang.String)
     */
    @Override
    public void createFile(String output2Name){

        outputName = output2Name;

        try{
            file2 = new File(output2Name + ".java");
            if (!file2.exists()) {
                file2.createNewFile();
            }

            FileWriter fw2 = new FileWriter(file2);
            bw2 = new BufferedWriter(fw2);
        }catch (IOException e){
            //Report the failure; bw2 stays null if the file cannot be opened
            e.printStackTrace();
        }
    }

    @Override
    protected String parseActivationFunction(ActivationFunction af,
            String varName, String targetVarName){
        String text = null;

        if (af instanceof ActivationSigmoid){
            text = targetVarName + " = 1.0 / (1.0 + Math.exp(-1 * " + varName
                + "));";
        }else if (af instanceof ActivationTANH){
            text = targetVarName + " = Math.tanh(" + varName + ");";
        }else if (af instanceof ActivationLinear){
            text = targetVarName + " = " + varName + ";";
        }else if (af instanceof ActivationElliott){
            double s = af.getParams()[0];
            text = targetVarName + " = ((" + varName + " * " + s
                + ") / 2) / (1 + Math.abs(" + varName + " * " + s
                + ")) + 0.5;";
        }else if (af instanceof ActivationGaussian){
            text = targetVarName + " = Math.exp(-Math.pow(2.5*" + varName
                + ",2.0));";
        }else if (af instanceof ActivationLOG){
            text = "if(" + varName + " >= 0){\n\t" + targetVarName
                + " = Math.log(1 + " + varName + ");\n}else{\n\t"
                + targetVarName + " = -Math.log(1 - " + varName + ");\n}";
        }else if (af instanceof ActivationRamp){
            double paramRampHighThreshold =
                ((ActivationRamp)(af)).getThresholdHigh();
            double paramRampLowThreshold =
                ((ActivationRamp)(af)).getThresholdLow();
            double paramRampHigh = ((ActivationRamp)(af)).getHigh();
            double paramRampLow = ((ActivationRamp)(af)).getLow();
            double slope = (paramRampHighThreshold-paramRampLowThreshold)
                / (paramRampHigh-paramRampLow);

            //The final "\n}" closes the else branch so that the generated
            //code compiles
            text = "if(" + varName + " < " + paramRampLowThreshold + ") {\n\t"
                + targetVarName + " = " + paramRampLow + ";\n} else if ("
                + varName + " > " + paramRampHighThreshold + ") {\n\t"
                + targetVarName + " = " + paramRampHigh + ";\n} else {\n\t"
                + targetVarName + " = (" + slope + " * " + varName + ");\n}";
        }else if (af instanceof ActivationSIN){
            text = targetVarName + " = Math.sin(2.0*" + varName + ");";
        }else if (af instanceof ActivationStep){
            double paramStepCenter = ((ActivationStep)(af)).getCenter();
            double paramStepLow = ((ActivationStep)(af)).getLow();
            double paramStepHigh = ((ActivationStep)(af)).getHigh();

            text = "if (" + varName + ">= " + paramStepCenter + ") {\n\t"
                + targetVarName + " = " + paramStepHigh + ";\n} else {\n\t"
                + targetVarName + " = " + paramStepLow + ";\n}";
        }else if (af instanceof ActivationBiPolar){
            text = "if(" + varName + " > 0) {\n\t" + targetVarName
                + " = 1;\n} else {\n\t" + targetVarName + " = -1;\n}";
        }else if (af instanceof ActivationBipolarSteepenedSigmoid){
            text = targetVarName + " = (2.0 / (1.0 + Math.exp(-4.9 * "
                + varName + "))) - 1.0;";
        }else if (af instanceof ActivationClippedLinear){

            text = "if(" + varName + " < -1.0) {\n\t" + targetVarName
                + " = -1.0;\n} else if (" + varName + " > 1.0) {\n\t"
                + targetVarName + " = 1.0;\n} else {\n\t" + targetVarName
                + " = " + varName + ";\n}";
        }else if (af instanceof ActivationElliottSymmetric){
            double s = af.getParams()[0];
            text = targetVarName + " = (" + varName + "*" + s
                + ") / (1 + Math.abs(" + varName + "*" + s + "));";
        }else if (af instanceof ActivationSteepenedSigmoid){
            text = targetVarName + " = 1.0 / (1.0 + Math.exp(-4.9 * " + varName
                + "));";
        }else{
            //Unimplemented activation function: Softmax (complicated)
            //Unimplemented activation function: Competitive (complicated,
            //non-differentiable)
            //in Encog 3.3 there aren't any other activation functions, so
            //unless someone implements their own we shouldn't get to this point
            text = "Error: unknown activation function";
        }
        return text;
    }
}
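To make the output of OutputWriterJava more concrete, the following hand-written sketch shows roughly the shape of a class it could emit for a network with two inputs, one hidden layer of two sigmoid neurons, and one sigmoid output, with a bias neuron in the input and hidden layers. It is an illustration under stated assumptions, not actual program output: the class name GeneratedNetSketch and every weight are invented, and the declarations are condensed onto fewer lines than the writer would produce.

```java
/**
 * Hand-written illustration of a generated network class.
 * All weights below are made-up values, not trained ones.
 */
class GeneratedNetSketch {
    // Input layer (i2 is the bias neuron)
    public static double i0;
    public static double i1;
    private static double i2 = 1.0;
    // Hidden layer 1 (h1n2 is the bias neuron)
    private static double h1n0f0, h1n0f1, h1n0f2, h1n0t, h1n0;
    private static double h1n1f0, h1n1f1, h1n1f2, h1n1t, h1n1;
    private static double h1n2 = 1.0;
    // Output layer
    private static double o0f0, o0f1, o0f2, o0t;
    public static double o0;

    public static void calcNet() {
        // Hidden neuron 0: weighted inputs, sum, sigmoid
        h1n0f0 = i0 * 1.0;
        h1n0f1 = i1 * 1.0;
        h1n0f2 = i2 * 0.0;
        h1n0t = h1n0f0 + h1n0f1 + h1n0f2;
        h1n0 = 1.0 / (1.0 + Math.exp(-1 * h1n0t));

        // Hidden neuron 1
        h1n1f0 = i0 * -1.0;
        h1n1f1 = i1 * -1.0;
        h1n1f2 = i2 * 0.0;
        h1n1t = h1n1f0 + h1n1f1 + h1n1f2;
        h1n1 = 1.0 / (1.0 + Math.exp(-1 * h1n1t));

        // Output neuron 0, fed by both hidden neurons and the hidden bias
        o0f0 = h1n0 * 2.0;
        o0f1 = h1n1 * -2.0;
        o0f2 = h1n2 * 0.5;
        o0t = o0f0 + o0f1 + o0f2;
        o0 = 1.0 / (1.0 + Math.exp(-1 * o0t));
    }

    public static void main(String[] args) {
        // Set the inputs, evaluate the network, and print the output
        i0 = 0.0;
        i1 = 0.0;
        calcNet();
        System.out.println(o0);
    }
}
```

Another program would use such a class by assigning the public input fields, calling calcNet(), and reading the public output field, which is the usage pattern the standalone main method follows.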
A.6 TrainingSetUtil.java (modified)
/*
 * Encog(tm) Core v3.3 - Java Version
 * http://www.heatonresearch.com/encog/
 * https://github.com/encog/encog-java-core

 * Copyright 2008-2014 Heaton Research, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 * For more information on Heaton Research copyrights, licenses
 * and trademarks visit:
 * http://www.heatonresearch.com/copyright
 */
package org.encog.util.simple;

/*
 * modified: added condition so that it will ignore rows with incomplete data
 */

/*
 * Additional modification: added way to retrieve header information
 */

import java.util.ArrayList;
import java.util.List;

import org.encog.ml.data.MLData;
import org.encog.ml.data.MLDataPair;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLData;
import org.encog.ml.data.basic.BasicMLDataPair;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.util.EngineArray;
import org.encog.util.ObjectPair;
import org.encog.util.csv.CSVError;
import org.encog.util.csv.CSVFormat;
import org.encog.util.csv.ReadCSV;

public class TrainingSetUtil {

    private static List<String> columnNames = new ArrayList<String>();

    /**
     * Load a CSV file into a memory dataset.
     * @param format The CSV format to use.
     * @param filename The filename to load.
     * @param headers True if there is a header line.
     * @param inputSize The input size. Input always comes first in a file.
     * @param idealSize The ideal size, 0 for unsupervised.
     * @return A NeuralDataSet that holds the contents of the CSV file.
     */
    public static MLDataSet loadCSVTOMemory(CSVFormat format,
            String filename, boolean headers, int inputSize, int idealSize) {
        MLDataSet result = new BasicMLDataSet();
        ReadCSV csv = new ReadCSV(filename, headers, format);

        if(headers){
            columnNames = csv.getColumnNames();
        }

        int ignored = 0;

        while (csv.next()) {
            MLData input = null;
            MLData ideal = null;
            int index = 0;
            try{
                input = new BasicMLData(inputSize);
                for (int i = 0; i < inputSize; i++) {
                    double d = csv.getDouble(index++);
                    input.setData(i, d);
                }

                if (idealSize > 0) {
                    ideal = new BasicMLData(idealSize);
                    for (int i = 0; i < idealSize; i++) {
                        double d = csv.getDouble(index++);
                        ideal.setData(i, d);
                    }
                }

                MLDataPair pair = new BasicMLDataPair(input, ideal);
                result.add(pair);
            }catch (CSVError e){
                //Rows with incomplete data are counted and skipped
                ignored++;

                //e.printStackTrace();
            }
        }
        System.out.println("Rows ignored: " + ignored);

        return result;
    }

    public static ObjectPair<double[][], double[][]> trainingToArray(
            MLDataSet training) {
        int length = (int)training.getRecordCount();
        double[][] a = new double[length][training.getInputSize()];
        double[][] b = new double[length][training.getIdealSize()];

        int index = 0;
        for (MLDataPair pair : training) {
            EngineArray.arrayCopy(pair.getInputArray(), a[index]);
            EngineArray.arrayCopy(pair.getIdealArray(), b[index]);
            index++;
        }

        return new ObjectPair<double[][], double[][]>(a, b);
    }

    /**
     * @return the columnNames
     */
    public static List<String> getColumnNames() {
        return columnNames;
    }

}
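The modification to loadCSVTOMemory amounts to wrapping the per-row parsing in a try block and counting the rows that fail. The same idea can be sketched in plain Java without Encog. The class SkipIncompleteRowsSketch, its Result holder, and the comma-only splitting are assumptions made for illustration; the actual listing relies on Encog's ReadCSV and CSVError instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-alone illustration (not part of Encog or the thesis code). */
class SkipIncompleteRowsSketch {

    /** Parsed rows plus the number of rows that were skipped. */
    static final class Result {
        final List<double[]> rows = new ArrayList<>();
        int ignored = 0;
    }

    /**
     * Parses comma-separated lines into rows of the given width. A row with
     * too few cells, an empty cell, or a non-numeric cell is counted in
     * Result.ignored and skipped, so it never reaches the data set.
     */
    static Result parse(List<String> lines, int columns) {
        Result result = new Result();
        for (String line : lines) {
            // A limit of -1 keeps trailing empty cells
            String[] cells = line.split(",", -1);
            try {
                if (cells.length < columns) {
                    throw new NumberFormatException("too few columns");
                }
                double[] row = new double[columns];
                for (int i = 0; i < columns; i++) {
                    row[i] = Double.parseDouble(cells[i].trim());
                }
                result.rows.add(row);
            } catch (NumberFormatException e) {
                result.ignored++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = parse(Arrays.asList("1,2,3", "4,,6", "7,8", "9,10,11"), 3);
        System.out.println("Rows kept: " + r.rows.size());
        System.out.println("Rows ignored: " + r.ignored);
    }
}
```

In this example the second row has an empty cell and the third row is too short, so two rows are kept and two are ignored.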
BIBLIOGRAPHY
Charles, D. and Mcglinchey, S., “The past, present and future of artificial neural networks in digital games,” Proceedings of the 5th international conference on computer games: artificial intelligence, design and education, pp. 163–169, 2004.
Clabaugh, C., Myszewski, D., and Pang, J. “History: The 1940’s to the 1970’s.” [Online]. Available: https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history1.html
Cortes, C. and Vapnik, V., “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995. [Online]. Available: http://dx.doi.org/10.1007/BF00994018
Elliott, D. L., “A better activation function for artificial neural networks,” 1993.
Galway, L., Charles, D., and Black, M., “Machine learning in digital games: a survey,” Artificial Intelligence Review, vol. 29, no. 2, pp. 123–161, 2008.
Heaton, J. “The number of hidden layers.” September 2008. [Online]. Available: http://www.heatonresearch.com/node/707
Heaton, J. “Elliott activation function.” September 2011. [Online]. Available: http://www.heatonresearch.com/wiki/Elliott_Activation_Function
Heaton, J. “Range normalization.” September 2011. [Online]. Available: http://www.heatonresearch.com/wiki/Range_Normalization
Hebb, D., “The organization of behavior; a neuropsychological theory.” 1949.
Jain, A. K., Mao, J., and Mohiuddin, K., “Artificial neural networks: A tutorial,” 1996.
McCulloch, W. and Pitts, W., “A logical calculus of the ideas immanent in nervous activity,” The bulletin of mathematical biophysics, vol. 5, no. 4, pp. 115–133, 1943. [Online]. Available: http://dx.doi.org/10.1007/BF02478259
Riedmiller, M. and Braun, H., “A direct adaptive method for faster backpropagation learning: The rprop algorithm,” in Neural Networks, 1993., IEEE International Conference on. IEEE, 1993, pp. 586–591.
Rojas, R., “The backpropagation algorithm,” in Neural Networks. Springer, 1996, pp. 149–182.
Rosenblatt, F., “The perceptron: a probabilistic model for information storage and organization in the brain.” Psychological review, vol. 65, no. 6, p. 386, 1958.
Stuart, K. “How to get into the games industry – an insiders’ guide.” March 2014. [Online]. Available: http://www.theguardian.com/technology/2014/mar/20/how-to-get-into-the-games-industry-an-insiders-guide
Widrow, B., Hoff, M. E., et al., “Adaptive switching circuits.” 1960.