Predicting NFL Game Outcomes
Paul McBride
ECE 539
Professor Yu Hen Hu
Introduction
Football has become an incredibly popular sport in America over the past 20 years or so.
Every week, millions of people tune in to watch the plethora of NFL games available to
them, which has turned the NFL into a multibillion-dollar corporation. This expansion of
the NFL has also coincided with the rise in popularity of sports news shows such as
SportsCenter. On these programs, the analysts voice their opinions on who they think the
winner of each game will be. These picks are largely based on personal opinion and not truly
based on the data. I am looking to remove this human bias and find out what the data
truly says about winning a football game.
NFL football is an incredibly complex game. Sure, there are some stats you can usually count
on to correlate with a win, such as total yards, but that is not always the case. A quarterback
could have a bad game, or a usually solid running back could cough up the ball at a key moment;
you never totally know what is going to happen. These intangibles make predicting outcomes
very challenging. But, because a team's average gameplay emerges over many games, a large
amount of data should be able to smooth out the intangibles.
Due to the NFL’s incredible popularity, a huge amount of game-by-game data exists
online. Actually, the amount of data available is quite overwhelming. The challenging task is to
pick stats that have a big impact on the outcome. I decided to focus on offensive stats. The
NFL is an incredibly offensive-minded league. That is not to say that defense doesn’t have an
impact; it definitely does, but the play of a stellar quarterback has long been known to be key to
winning football games.
I decided to get my data from a typical box score, that is, the team play stats per game. A
standard box score has more than enough data to be effective. Box scores contain offensive
stats such as total yards, interceptions thrown, turnovers, etc., so I had more than enough
useful stats to work with.
Just by taking a quick look at the numbers of a typical box score, one can tell that no
single stat determines who won the game. For example, total yards is usually a good
indicator of who won the game: the winning team more often than not has more total yards than
the losing team. But this just is not always the case. Facts like this lead me to believe
that there does not exist a linear mapping that chooses the winner. And situations like
this are perfect for a neural network, more specifically a back-propagating multilayer
perceptron.
Work Performed
Before any predicting could be performed, I had to decide on the data that I wanted to
work with and to collect said data. As I alluded to in the introduction, the NFL is a very
offensive minded game. Therefore, I wanted to analyze offensive stats to see if an efficient
prediction network could come to fruition. Using my knowledge of football, I decided that the
following stats would be good information for the neural network to learn from:
1. Home-field advantage: more often than not, the home team performs better in front
of its hometown fans.
2. First downs: If a team has a high number of first downs, it means they were able to
move the ball up and down the field.
3. Third-down conversion percentage: If a team has a high third-down conversion
percentage, the offense is able to stay out on the field longer.
4. Rush attempts: This stat tracks how many times the team attempted to run the ball.
5. Rush yards: If the team has a high number of rushing yards, it usually complements
the passing game quite nicely.
6. Pass attempts: This stat tracks how many times the team attempted to throw the
ball.
7. Pass yards: If the team has a high number of pass yards, it usually complements the
rushing game.
8. Total yards: If a team has a large amount of total yards, it means their offense was
working effectively.
9. Pass interceptions: This stat tracks the number of interceptions thrown by the
quarterback. An interception turns the ball over to the other team’s offense.
10. Fumbles lost: This stat tracks the number of fumbles lost by the team. A fumble
picked up by the other team means it is their ball.
11. Turnovers: If a team has a high number of turnovers, it indicates that the offense
was not able to stay out on the field.
12. Sacks: This stat tracks the number of times the quarterback was sacked. A sack
results in a loss of yards.
13. Sack yards lost: This stat tracks the yards lost by the offense due to sacks.
14. Penalty yards: This stat tracks the amount of penalty yards lost by the offense.
Penalties make it difficult for the offense to string together a number of plays to stay
out on the field.
I decided to collect data from the 2012 NFL season to train my network. I was able to
get data from a site that has Excel csv files available for download
(http://www.repole.com/sun4cast/data.html). I chose the 2012 season because the brand of
football played would be very similar to that of the 2013 season. After I downloaded the needed csv
file, I loaded the data into an Excel workbook and then got rid of unneeded statistics. Since
one team goes up against another team, I wanted to make the attributes of my feature vectors
differential statistics. In other words, if the matchup was Team A vs. Team B, I subtracted the
stats of Team B from the stats of Team A. If Team A had won that game, the outcome was 1. If
Team B had won, the outcome was -1. Below is an example of a feature vector.
This gave me 508 feature vectors to train my neural network with. Each feature vector
consisted of 15 attributes and one outcome attribute.
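The differential construction described above can be sketched as follows. This is an illustrative Python snippet, not part of the original Matlab pipeline, and the stat names and values are made-up examples:

```python
# Sketch of building one differential feature vector from two teams'
# box-score stats (hypothetical values, two stats shown for brevity).
def differential_vector(team_a_stats, team_b_stats, a_won):
    """Subtract Team B's stats from Team A's and append the +/-1 outcome."""
    diff = [a - b for a, b in zip(team_a_stats, team_b_stats)]
    outcome = 1 if a_won else -1
    return diff + [outcome]

# Example: [total yards, turnovers] for each team; Team A won the game.
vec = differential_vector([380, 1], [295, 3], a_won=True)
print(vec)  # [85, -2, 1]
```

Stacking one such vector per 2012 game yields the 508-row training matrix.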
Before I constructed a back-propagating multilayer perceptron, I wanted to have
another classifier to compare the results against. I decided on a support vector machine. There are
many configurations of a support vector machine to choose from, so for my baseline I
decided to test three different SVMs with different kernel functions: a linear kernel, a
polynomial kernel of degree 3, and a radial basis function kernel. For each of these I used
box-constraint values of 1 and 1000000. To get a true feel for how these support vector machines
would perform in real life, I used 4-way cross validation to estimate the classification rate and to
generate confusion matrices of the results. I give credit to Professor Hu’s Matlab files for my
training routines; however, the drivers that run the routines are all written by me. Below are the results
I achieved using the support vector machines:
Kernel (box constraint)     Confusion matrix       Classification rate
Linear (C = 1)              [225 29; 28 226]       0.887795276
Linear (C = 1000000)        [224 30; 31 223]       0.87992126
Polynomial (C = 1)          [204 50; 52 202]       0.799212598
Polynomial (C = 1000000)    [205 49; 53 201]       0.799212598
RBF (C = 1)                 [217 37; 36 218]       0.856299213
RBF (C = 1000000)           [212 42; 39 215]       0.840551181
As you can see, the support vector machine using a linear kernel with a box-constraint value of
1 performed the best. I was very pleased with the results, because correctly predicting ~89% of games is
quite the feat.
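The 4-way cross-validation bookkeeping behind these numbers (fold construction plus confusion-matrix tallying) can be sketched like this. This is an illustrative Python version of the driver logic, not the Matlab code itself; the toy classifier and data are stand-ins:

```python
# Sketch of a 4-way cross-validation driver that accumulates a 2x2
# confusion matrix over all four test folds, as described in the report.
def four_way_cv(data, train_and_predict):
    """data: list of (features, label) pairs with label in {+1, -1}.
    Splits into 4 contiguous folds; each fold serves as the test set once.
    Returns (confusion matrix [[TN, FP], [FN, TP]], classification rate)."""
    n = len(data)
    folds = [data[i * n // 4:(i + 1) * n // 4] for i in range(4)]
    TN = FP = FN = TP = 0
    for i in range(4):
        test = folds[i]
        train = [row for j, f in enumerate(folds) if j != i for row in f]
        preds = train_and_predict(train, [x for x, _ in test])
        for (_, label), pred in zip(test, preds):
            if label == 1:
                if pred == 1: TP += 1
                else: FN += 1
            else:
                if pred == 1: FP += 1
                else: TN += 1
    return [[TN, FP], [FN, TP]], (TN + TP) / n

# Toy check with a stand-in "classifier" that predicts the sign of the
# first feature (any train/predict pair, e.g. an SVM, could be plugged in).
data = [([x], 1 if x > 0 else -1) for x in (-4, -3, -2, -1, 1, 2, 3, 4)]
cmat, rate = four_way_cv(data, lambda train, xs: [1 if x[0] > 0 else -1 for x in xs])
print(cmat, rate)  # [[4, 0], [0, 4]] 1.0
```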
Now that I had my baseline classification percentage, it was time to decide which
structures of multilayer perceptron to use. The reason I decided to use a back-propagating
neural network was its great flexibility and the fact that back-propagation provides supervised
learning. Since there are many ways an NFL game can go, supervised learning was an attractive
feature. And since the network is flexible, there are many different parameters that I could adjust
to try to increase the classification rate. Again, the algorithm and code used to train the
network are based on the Matlab files supplied by Professor Hu on the course website, but the
drivers and the data preparation were all written by me. Once again, I used 4-way
cross validation to mimic how the machines would perform in real use.
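The training loop such a network runs (feed-forward pass, back-propagated delta errors, then a weight update with a momentum term) can be sketched in miniature. This is an illustrative Python/NumPy toy on synthetic data, not Professor Hu's Matlab routine; the architecture and constants here are arbitrary choices for the demonstration:

```python
# Minimal back-propagating MLP: one hidden tanh layer, sigmoid output,
# batch gradient updates with a momentum term, on a toy 2-class problem.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Toy separable data: class 1 whenever the feature sum is positive.
X = rng.normal(size=(200, 3))
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

W1 = 0.1 * rng.normal(size=(3, 5)); dW1 = np.zeros_like(W1)
W2 = 0.1 * rng.normal(size=(5, 1)); dW2 = np.zeros_like(W2)
alpha, mom = 0.5, 0.8  # learning rate and momentum constant

for epoch in range(2000):
    # feed-forward phase
    h = np.tanh(X @ W1)
    out = sigmoid(h @ W2)
    err = y - out
    # back-propagation phase: delta errors, output layer then hidden layer
    d2 = err * out * (1.0 - out)     # sigmoid derivative
    d1 = (d2 @ W2.T) * (1.0 - h**2)  # tanh derivative
    # weight update: mean-gradient step plus momentum term
    dW2 = alpha * (h.T @ d2) / len(X) + mom * dW2; W2 += dW2
    dW1 = alpha * (X.T @ d1) / len(X) + mom * dW1; W1 += dW1

acc = float(((out > 0.5) == (y > 0.5)).mean())
print("training accuracy:", acc)
```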
Right off the bat, I did not really know what to expect, so I tried a variety of different
inputs. Below are a few examples of my initial results:
# of neurons in hidden layer = 5 (rows = runs; columns = learning rate alpha):
                 mu = 0                        mu = .8
alpha:    0.01      0.05      0.1       0.01      0.05      0.1
0.848425 0.838583 0.834646 0.846457 0.801181 0.814961
0.832677 0.840551 0.82874 0.840551 0.834646 0.805118
0.834646 0.832677 0.820866 0.850394 0.82874 0.820866
0.840551 0.834646 0.826772 0.824803 0.809055 0.822835
0.826772 0.82874 0.807087 0.854331 0.809055 0.830709
0.82874 0.814961 0.807087 0.852362 0.816929 0.809055
0.824803 0.84252 0.838583 0.836614 0.822835 0.809055
0.826772 0.840551 0.82874 0.838583 0.838583 0.805118
avg 0.832923 0.834154 0.824065 0.843012 0.820128 0.814715
# of neurons in hidden layer = 2 (rows = runs; columns = learning rate alpha):
                 mu = 0                        mu = .8
alpha:    0.01      0.05      0.1       0.01      0.05      0.1
0.84252 0.82874 0.82874 0.858268 0.836614 0.838583
0.824803 0.834646 0.834646 0.848425 0.844488 0.846457
0.830709 0.840551 0.834646 0.832677 0.840551 0.82874
0.836614 0.82874 0.838583 0.84252 0.838583 0.840551
0.816929 0.82874 0.836614 0.832677 0.84252 0.836614
0.814961 0.838583 0.82874 0.846457 0.850394 0.840551
0.832677 0.82874 0.838583 0.860236 0.840551 0.834646
0.824803 0.836614 0.82874 0.836614 0.860236 0.844488
avg 0.828002 0.833169 0.833661 0.844734 0.844242 0.838829
# of neurons in hidden layer = 2, mu = .8 (rows = runs; columns = learning rate alpha):
alpha:    0.006     0.008     0.01      0.012     0.014     0.016
0.846457 0.84252 0.84252 0.856299 0.848425 0.822835
0.864173 0.840551 0.860236 0.846457 0.84252 0.850394
0.848425 0.856299 0.852362 0.848425 0.838583 0.836614
0.82874 0.82874 0.834646 0.836614 0.838583 0.838583
0.82874 0.852362 0.860236 0.838583 0.858268 0.840551
0.852362 0.836614 0.846457 0.834646 0.848425 0.840551
0.846457 0.84252 0.856299 0.838583 0.844488 0.846457
0.84252 0.84252 0.848425 0.84252 0.844488 0.84252
avg 0.844734 0.842766 0.850148 0.842766 0.845472 0.839813
alpha = .01, mu = .8 (rows = runs; columns = # of neurons in hidden layer):
              2         3         4         5         6         7
0.844488 0.834646 0.82874 0.840551 0.822835 0.836614
0.834646 0.850394 0.856299 0.840551 0.870079 0.840551
0.852362 0.844488 0.850394 0.834646 0.830709 0.832677
0.848425 0.856299 0.846457 0.82874 0.830709 0.820866
0.836614 0.834646 0.850394 0.856299 0.848425 0.820866
0.856299 0.834646 0.836614 0.840551 0.830709 0.834646
0.832677 0.824803 0.82874 0.832677 0.850394 0.816929
0.84252 0.840551 0.840551 0.830709 0.838583 0.850394
avg 0.843504 0.840059 0.842274 0.838091 0.840305 0.831693
These initial results were discouraging. The best I seemed able to do was a ~85%
classification rate. While that is actually pretty good, it was not better than the linear support
vector machine that I had previously trained. It was then that I decided to preprocess the data
by running a singular value decomposition. The following code snippet is an example of my data
preprocessing:
[ua,sa,va] = svd(data(:,1:15),0);
m = 11;                           % number of principal vectors kept
u_vec1 = ua(:, 1:m);
vec1 = u_vec1 * u_vec1';          % projector onto the first m left singular vectors
svdData = [vec1*data(:, 1:15) data(:,16:17)];
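For illustration, the same projection can be written in Python/NumPy. This is a sketch, not the code used in the project, and the random matrix stands in for the real 508 x 15 stat matrix:

```python
# Project the feature columns onto the span of the first m left singular
# vectors, producing a rank-m approximation of the data matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 15))  # stand-in for the 508 x 15 stat matrix

m = 11
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = U[:, :m] @ U[:, :m].T      # projector onto the top-m left singular vectors
X_svd = P @ X                  # rank-m data used as the new features
print(np.linalg.matrix_rank(X_svd))  # 11
```

The residual X - X_svd carries exactly the discarded singular values, so its Frobenius norm equals the 2-norm of s[m:].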
The challenge in doing this was deciding how many of the principal vectors to process my data
with (the choice of the variable m in the code snippet). I decided to give a few
configurations a try (# of neurons in hidden layer = 5, mu = 0, alpha = .01; rows = runs;
columns = # of SVD principal vectors):

            5           6          7          8          9          10
    0.842519685  0.850394  0.870079  0.848425  0.852362  0.864173
    0.838582677  0.854331  0.850394  0.858268  0.858268  0.866142
    0.842519685  0.856299  0.854331  0.836614  0.866142  0.875984
    0.848425197  0.858268  0.852362  0.852362  0.856299  0.854331
    0.834645669  0.854331  0.850394  0.84252   0.86811   0.848425
    0.848425197  0.852362  0.858268  0.854331  0.875984  0.854331
    0.840551181  0.850394  0.860236  0.836614  0.858268  0.874016
    0.838582677  0.862205  0.862205  0.852362  0.840551  0.850394
avg 0.841781496  0.854823  0.857283  0.847687  0.859498  0.860974

Right away, I could tell that the results were looking more promising. I was able to predict
about 86% of the games without much tinkering with the input variables. Through a great deal of
analysis, I was able to raise the prediction classification rate to 88% using 11 singular value
decomposition vectors. Below are the two most successful configurations:

Configuration #1 (88.1234% classification rate):
1) # of total layers = 3
2) # of hidden neurons = 5
3) Momentum = .02
4) Learning rate = .007
5) # of SVD principal vectors = 11

Results for Configuration #1 (rows = runs; columns = repeated trials, each using 11 SVD
principal vectors; bottom row = per-trial averages):
0.889764 0.879921 0.885827 0.874016 0.875984 0.874016 0.883858 0.875984 0.877953 0.879921 0.885827 0.899606
0.874016 0.877953 0.885827 0.879921 0.889764 0.895669 0.891732 0.885827 0.88189 0.874016 0.872047 0.864173
0.879921 0.897638 0.887795 0.889764 0.874016 0.88189 0.889764 0.879921 0.874016 0.870079 0.872047 0.885827
0.891732 0.858268 0.874016 0.88189 0.874016 0.879921 0.846457 0.889764 0.875984 0.903543 0.88189 0.889764
0.887795 0.889764 0.877953 0.877953 0.883858 0.883858 0.889764 0.875984 0.877953 0.885827 0.875984 0.875984
0.885827 0.887795 0.874016 0.875984 0.874016 0.88189 0.889764 0.885827 0.889764 0.879921 0.874016 0.88189
0.893701 0.88189 0.874016 0.885827 0.883858 0.891732 0.874016 0.88189 0.879921 0.879921 0.879921 0.88189
0.895669 0.885827 0.879921 0.88189 0.883858 0.86811 0.877953 0.875984 0.883858 0.877953 0.872047 0.877953
avg 0.887303 0.882382 0.879921 0.880906 0.879921 0.882136 0.880413 0.881398 0.880167 0.881398 0.876722 0.882136
overall average: 0.881234

Configuration #2 (88.0249% classification rate):
1) # of total layers = 3
2) # of hidden neurons = 2
3) Momentum = .02
4) Learning rate = .006
5) # of SVD principal vectors = 11

Results for Configuration #2 (rows = runs; columns = repeated trials, each using 11 SVD
principal vectors; bottom row = per-trial averages):
0.893701 0.893701 0.885827 0.887795 0.875984 0.891732 0.874016 0.874016 0.870079 0.879921 0.887795 0.885827
0.88189 0.889764 0.897638 0.874016 0.879921 0.879921 0.860236 0.874016 0.86811 0.879921 0.887795 0.850394
0.874016 0.86811 0.883858 0.879921 0.879921 0.88189 0.885827 0.883858 0.883858 0.879921 0.883858 0.879921
0.879921 0.883858 0.885827 0.889764 0.862205 0.88189 0.856299 0.883858 0.879921 0.877953 0.872047 0.88189
0.885827 0.875984 0.885827 0.86811 0.875984 0.885827 0.88189 0.877953 0.885827 0.875984 0.883858 0.874016
0.893701 0.885827 0.879921 0.870079 0.885827 0.887795 0.883858 0.860236 0.860236 0.879921 0.870079 0.883858
0.877953 0.891732 0.883858 0.879921 0.877953 0.893701 0.877953 0.887795 0.88189 0.877953 0.86811 0.866142
0.887795 0.889764 0.887795 0.874016 0.88189 0.887795 0.887795 0.887795 0.875984 0.891732 0.883858 0.879921
avg 0.88435 0.884843 0.886319 0.877953 0.877461 0.886319 0.875984 0.878691 0.875738 0.880413 0.879675 0.875246
overall average: 0.880249
I was very pleased with the results of preprocessing the data with singular value decomposition.
The classification rate I was able to produce with the back-propagating multilayer perceptron
was on par with the support vector machine!
Now that I had my network of choice, I wanted to utilize it to do some predictions. I
decided on week 15 of the 2013 NFL season. I collected all 32 teams’ relevant statistical
averages and compiled feature vectors from the head-to-head matchups. Since the
classification rate of MLP configuration #1 is a little better, I used it to make my predictions.
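Compiling a matchup's feature vector from season averages can be sketched as below. This is illustrative Python, not the report's Matlab; the home-field flag encoding and the stat values are hypothetical, and the real vectors use the fifteen attributes described earlier:

```python
# Sketch: differential feature vector for one week-15 matchup, built from
# each team's season-average stats (hypothetical numbers, three stats shown).
def matchup_vector(home_avgs, away_avgs):
    """Home-field flag first, then home-minus-away differential stats."""
    return [1] + [h - a for h, a in zip(home_avgs, away_avgs)]

# e.g. season averages of [first downs, total yards, turnovers]:
v = matchup_vector([21.5, 370.0, 1.2], [18.0, 330.0, 2.0])
```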
(The MLP and SVM prediction tables appeared here as color-coded matchup tables; green represents a win.)
The MLP and the SVM predicted the game outcomes very similarly. They differed only on the
outcomes of the Falcons vs. Redskins game and the Cowboys vs. Packers game. This can be
explained by the fact that in those two matchups both teams are very comparable. For the most part,
and applying my own human bias, most of the predictions seem very reasonable. There is only one
game that is very questionable, and that is the Broncos versus the Chargers. The Broncos are
the heavy favorite, and virtually every sports news pundit has picked them to win. But that is
why I chose to do this: I wanted to remove the human bias.
The Results
Week 15 matchups (the MLP and SVM predictions were shown as color-coded tables over these games; green represents a win):
Broncos vs. Chargers
Falcons vs. Redskins
Buccs vs. 49ers
Giants vs. Seahawks
Vikings vs. Eagles
Dolphins vs. Pats
Jags vs. Bills
Colts vs. Texans
Browns vs. Bears
Raiders vs. Chiefs
Panthers vs. Jets
Cowboys vs. Packers
Titans vs. Cards
Rams vs. Saints
Steelers vs. Bengals
Lions vs. Ravens
Looking at the results, it seems as if both predicting machines did very poorly, and it
is true that this week they did. But if we look closer at week 15, we see that there were many
games that are considered upsets. This just shows the unpredictability of the NFL: the
intangibles are very hard to predict.
The upsets:
1) The Vikings over the Eagles: The usually explosive Eagles offense did not perform up
to their usual standard. But even more unexpected was the offensive output of the Vikings.
This game epitomizes the unpredictability of the NFL.
2) The Dolphins over the Patriots: The usually high-powered offense of the Patriots had
an off day. This could be because the play-making ability of the injured tight end Rob
Gronkowski was absent from the field.
3) The Rams over the Saints: Drew Brees and the Saints were let down by the atrocious
effort in the running game.
The rest of the games were just too close to call.
(The SVM Week 15 Results, Actual Game Outcomes, and MLP Week 15 Results appeared here as color-coded tables over the same sixteen matchups.)
Future Attempts
If I were to do this again in the future, I would add defensive stats to the feature vector.
While the NFL is a very offensive oriented league, the defense definitely does play a role in the
outcome of the games. While 88% predictability rate is very high, I believe that adding a
defensive element to the feature vectors could push the classification rate into the 90
percentile.
References
http://www.repole.com/sun4cast/data.html (reliable csv files of NFL stats)
NFL.com, http://www.nfl.com/
ESPN.com, http://espn.go.com/
Matlab Code
1. final_svm_classify.m
This Matlab script trains and tests different configurations of a support vector machine
via 4-way cross validation.
clear all; close all

load TeamStatsDifferentials2012.txt
data = TeamStatsDifferentials2012;

% Scale each attribute to the range [0, 10]
for i = 1:15
    data(:,i) = translate(data(:,i), 0, 10);
end

% Partition the 508 samples into four folds for 4-way cross validation
train = zeros(381,16,4);
test  = zeros(127,16,4);
data1 = data(1:127, 1:16);
data2 = data(128:254, 1:16);
data3 = data(255:381, 1:16);
data4 = data(382:508, 1:16);
train(:,:,1) = [data1; data2; data3];  test(:,:,1) = data4;
train(:,:,2) = [data1; data2; data4];  test(:,:,2) = data3;
train(:,:,3) = [data1; data3; data4];  test(:,:,3) = data2;
train(:,:,4) = [data2; data3; data4];  test(:,:,4) = data1;

% Kernel functions and box-constraint values to test
temp = ['linear    '; 'linear    '; 'polynomial';...
        'polynomial'; 'rbf       '; 'rbf       '];
funct = cellstr(temp);
C = [1, 1000000, 1, 1000000, 1, 1000000];
options = optimset('maxiter',100000);

ConfMatrix = zeros(2,2,6);
ClassRate = zeros(1,6);
for i = 1:6
    TN=0; FN=0; FP=0; TP=0;
    for j = 1:4
        model = svmtrain(train(:,1:15,j), train(:,16,j), 'kernel_function',...
            char(funct(i)), 'options', options, 'boxconstraint',...
            C(i), 'method', 'QP');
        tested = svmclassify(model, test(:,1:15,j));
        % Tally the confusion-matrix counts for this fold
        for k = 1:127
            if test(k,16,j) == 1
                if tested(k,1) == 1, TP = TP+1; else FN = FN+1; end
            else
                if tested(k,1) == 1, FP = FP+1; else TN = TN+1; end
            end
        end
    end
    ConfMatrix(1,1,i) = ConfMatrix(1,1,i) + TN;
    ConfMatrix(2,1,i) = ConfMatrix(2,1,i) + FN;
    ConfMatrix(1,2,i) = ConfMatrix(1,2,i) + FP;
    ConfMatrix(2,2,i) = ConfMatrix(2,2,i) + TP;
    ClassRate(1,i) = (TN+TP)/508;
end
2. final_svm_classify_predict.m
This Matlab script trains a linear SVM and classifies the prediction data.
clear all; close all

load TeamStatsDifferentials2012.txt
data = TeamStatsDifferentials2012;
load PredictionData.txt
testData = PredictionData;

% Scale each attribute to the range [0, 10]
for i = 1:15
    data(:,i) = translate(data(:,i), 0, 10);
    testData(:,i) = translate(testData(:,i), 0, 10);
end

C = [1, 1000000, 1, 1000000, 1, 1000000];
options = optimset('maxiter',100000);

% Train the linear SVM with box constraint C(1) = 1 on the full 2012 data,
% then classify the week 15 prediction data
model = svmtrain(data(:,1:15), data(:,16), 'kernel_function',...
    'linear', 'options', options, 'boxconstraint',...
    C(1), 'method', 'QP');
tested = svmclassify(model, testData(:,1:15));
3. mybp_svm.m
This Matlab script trains and tests back-propagating MLPs via 4-way cross validation.
The network configuration input comes in through a text file.
clear all, close all

load TeamStatsDifferentials2012_mlp.txt
data = TeamStatsDifferentials2012_mlp;

for i=1:15
    data(:,i) = translate(data(:,i), 0, 10);
end

[ua,sa,va] = svd(data(:,1:15),0);

ConfMat = zeros(2,2,1);
ClassRate = zeros(9,3);

file = fopen('mlpinput2.txt', 'r');

for b = 1:10
    % Project the data onto the first m principal vectors
    m = 11;
    u_vec1 = ua(:, 1:m);
    vec1 = u_vec1 * u_vec1';
    svdData = [vec1*data(:, 1:15) data(:,16:17)];

    % Partition into four folds for 4-way cross validation
    data1 = svdData(1:127, :);
    data2 = svdData(128:254, :);
    data3 = svdData(255:381, :);
    data4 = svdData(382:508, :);
    trainData(:,:,1) = [data1;data2;data3];  testData(:,:,1) = data4;
    trainData(:,:,2) = [data1;data2;data4];  testData(:,:,2) = data3;
    trainData(:,:,3) = [data1;data3;data4];  testData(:,:,3) = data2;
    trainData(:,:,4) = [data2;data3;data4];  testData(:,:,4) = data1;

    line = fgetl(file);
    input = str2num(line);

    for c = 1:8
        ConfMat(:,:,b) = 0;
        for a = 1:4
            % configure the MLP network and learning parameters
            mybpconfig_svm(input, trainData(:,:,a), testData(:,:,a));
            load mlpconfig.mat

            % BP iterations begin
            while not_converged==1,  % start a new epoch
                % Randomly select K training samples from the training set.
                [train,ptr,train0]=rsample(train0,K,Kr,ptr);  % train is K by M+N
                z{1}=(train(:,1:M))';   % input sample matrix, M by K
                d=train(:,M+1:MN)';     % corresponding target values, N by K

                % Feed-forward phase, compute sum of square errors
                for l=2:L,  % the l-th layer
                    u{l}=w{l}*[ones(1,K);z{l-1}];  % u{l} is n(l) by K
                    z{l}=actfun(u{l},atype(l));
                end
                error=d-z{L};  % error is N by K
                E(t)=sum(sum(error.*error));

                % Error back-propagation phase, compute delta error
                delta{L}=actfunp(u{L},atype(L)).*error;  % N (=n(L)) by K
                if L>2,
                    for l=L-1:-1:2,
                        delta{l}=(w{l+1}(:,2:n(l)+1))'*delta{l+1}.*actfunp(u{l},atype(l));
                    end
                end

                % Update the weight matrix using gradient, momentum and
                % random perturbation
                for l=2:L,
                    dw{l}=alpha*delta{l}*[ones(1,K);z{l-1}]'+...
                        mom*dw{l}+randn(size(w{l}))*0.005;
                    w{l}=w{l}+dw{l};
                end

                % display the training error
                %bpdisplay;

                % Test convergence to see if the convergence condition is satisfied
                cvgtest;
                t = t + 1;  % increment epoch count
            end  % while loop

            %disp('Final training results:')
            if classreg==0,
                [Cmat,crate]=bptest(wbest,tune,atype);
            elseif classreg==1,
                SS=bptestap(wbest,tune,atype),
            end

            if testys==1,
                %disp('Apply trained MLP network to the testing data. The results are: ');
                if classreg==0,
                    [Cmat,crate,cout]=bptest(wbest,test0,atype,labeled,N);
                    if labeled==1,
                        %disp('Confusion matrix Cmat = '); disp(Cmat);
                        %disp(['classification = ' num2str(crate) '%'])
                    elseif labeled==0,
                        % print out classifier output only if there is no label
                        disp('classifier outputs are: ')
                        disp(cout);
                    end
                elseif classreg==1,
                    SS=bptestap(wbest,test0,atype),
                end
            end
            ConfMat(:,:,b) = ConfMat(:,:,b) + Cmat;
        end
        ClassRate(c,b) = (ConfMat(1,1,b)+ConfMat(2,2,b))/508;
    end
    ClassRate(9,b) = mean(ClassRate(1:8,b));
end
4. mybp_config.m
This Matlab function takes in the input vector to set up the MLP, the training data, and the testing data.
function mybpconfig( input, train, test )

train0 = train;
[Kr,MN]=size(train);
M = MN-2;
N = MN-M;
testys = 1;
test0 = test;
[Kt,MNt]=size(test0);
if MNt == MN, labeled=1; else labeled=0; end

% scale the feature vectors of the training and testing sets to [-5 5]
scalein = input(2);
[tmp,xmin,xmax]=scale([train0(:,(1:M)); test0(:,(1:M))],-5,5);
train0(:,(1:M))=tmp((1:Kr),:);
test0(:,(1:M))=tmp((Kr+1):(Kr+Kt),:);

% network structure: L1 hidden layers plus the output layer
L1 = input(3);
L = L1+1;
n(1) = M;
n(L) = N;
j = 4;
for i=2:L-1,
    n(i)= input(j);
    w{i}=0.001*randn(n(i),n(i-1)+1);  % first column is the bias weight
    dw{i}=zeros(size(w{i}));          % initialize dw
    j = j + 1;
end
w{L}=0.005*randn(n(L),n(L-1)+1);      % first column is the bias weight
dw{L}=zeros(size(w{L}));

% ==============================================================
% choose types of activation function
% default: hidden layers, tanh (type = 2), output layer, sigmoid (type = 1)
% default parameter T = 1 is used.
% ==============================================================
atype=2*ones(L,1); atype(L)=1;  % default
%disp('By default, hidden layers use tanh activation function, output use sigmoidal');
chostype = input(j);
%if isempty(chostype), chostype=0; end
if chostype==1,
    disp('=============================================================');
    disp('activation function type 1: sigmoidal');
    disp('activation function type 2: hyperbolic tangent');
    disp('activation function type 3: linear');
    for l=2:L,
        atype(l)=input(['Layer #' int2str(l) ' activation function type = ']);
    end
end

% ==============================================================
% next load a tuning set file to help determine training errors
% or partition the training file into a training and a tuning file.
% ==============================================================
%disp('Enter 0 (default) if a pattern classification problem, ');
classreg = input(j+1);
%if isempty(classreg), classreg=0; end

%disp('=============================================================');
% msg_1=[...
% 'To estimate training error, choose one of the following:              '
% '1 - Use the entire training set to estimate training error;           '
% '2 - Use a separate fixed tuning data file to estimate training error; '
% '3 - Partition training set dynamically into training and tuning sets; '
% '    (This is for pattern classification problem)                      '];
% disp(msg_1);
%chos=input('Enter your selection (default = 1): ');
chos = input(j+2);
%if isempty(chos), chos=1; end
if chos==1,
    tune=train0;
elseif chos==2,
    dir;
    tune_name=input(' Enter tuning filename in single quote, no file extension: ');
    eval(['load ' tune_name]);
    tune=eval(tune_name);
    eval(['clear ' tune_name]);
    % scale the tuning file feature vectors and output as it is newly loaded.
    if scalein==1,  % scale input to [-5 5]
        [tune(:,1:M),xmin,xmax]=scale(tune(:,1:M),-5,5);
    end
elseif chos==3,
    % partition the training file into a training and tuning set
    % according to a user-specified percentage
    prc=input('Percentage (0 to 100) of training data reserved for tuning: ');
    [tune,train0]=partunef(train0,M,prc);
    [Kr,MN]=size(train0);  % this train0 is only a subset of the original train0,
                           % hence Kr must be updated
end
[Ktune,MN]=size(tune);

% ==============================================================
% scaling the output of training set data
% normally the output will be scaled to [outlow outhigh] = [0.2 0.8]
% for sigmoidal activation function, and [-0.8 0.8] for hyperbolic tangent
% or linear activation function at the output nodes.
% However, the actual output of MLP during testing of tuning file or testing
% file will be handled differently:
% a) Pattern classification problem: since we are concerned only with the
%    maximum among all output, the output of MLP will not be changed even if
%    it ranges only between [outlow outhigh] rather than [0 1]
% b) Approximation (regression) problem: the output of MLP will be scaled
%    back for comparison with target values
% ==============================================================
% disp('=============================================================');
% disp('Output from output nodes for training samples may be scaled to: ')
% disp('[0.2 0.8] for sigmoidal activation function or ');
% disp('[-0.8 0.8] for hyperbolic tangent or linear activation function ');
%scaleout=input('Enter 1 (default) to scale the output: ');
scaleout = input(j+3);
%if isempty(scaleout), scaleout=1; end
if atype(L)==1,
    outlow = 0.2;
elseif atype(L)==2 | atype(L) == 3,
    outlow = -0.8;
end
outhigh=0.8;
if scaleout==1,
    % scale output
    [train0(:,M+1:MN),zmin,zmax]=scale(train0(:,M+1:MN),outlow,outhigh);
    % scale the target value of tuning set
    [tune(:,M+1:MN),zmin,zmax]=scale(tune(:,M+1:MN),outlow,outhigh);
    % scale target value of testing set if available
    if testys==1 & labeled==1  % if testing set specifies output
        [test0(:,M+1:MN),zmin,zmax]=scale(test0(:,M+1:MN),outlow,outhigh);
    end
end

% now, we have a training file and a tuning file

% ==============================================================
% learning parameters
% ==============================================================
%alpha=input('learning rate (between 0 and 1, default = 0.1) alpha = ');
alpha = input(j+4);
%if isempty(alpha), alpha=0.1; end
%mom=input('momentum constant (between 0 and 1, default 0.8) mom = ');
mom = input(j+5);
%if isempty(mom), mom=0.8; end

% ==============================================================
% termination criteria
% A. Terminate when the max. # of epochs to run is reached.
% ==============================================================
%nepoch=input('maximum number of epochs to run, nepoch = ');
nepoch = input(j+6);
%disp(['# training samples = ' int2str(Kr)]);
%K = input(['epoch size (default = ' int2str(min(64,Kr)) ', <= ' int2str(Kr) ') = ']);
K = input(j+7);
%if isempty(K), K=min(64,Kr); end
%disp(['total # of training samples applied = ' int2str(nepoch*K)]);

% ==============================================================
% B. Check the tuning set testing result periodically. If the tuning set
%    testing results are reducing, save the weights. When the tuning set
%    testing results start increasing, stop training, and use the
%    previously saved weights.
% ==============================================================
%disp('=============================================================');
% nck=input(['# of epochs between convergence check (> ' ...
%     int2str(ceil(Kr/K)) '): ']);
nck = input(j+8);
% disp(' ');
% disp('If testing on tuning set meets no improvement for n0');
%maxstall=input('iterations, stop training! Enter n0 = ');
maxstall = input(j+9);
nstall=0;  % initialize # of no-improvement count; when nstall > maxstall, quit
if classreg==0,
    bstrate=0;       % initialize classification rate on tuning set to 0
elseif classreg==1,
    bstss=1;         % initialize tuning set error to maximum
    ssthresh=0.001;  % initialize threshold
end

% ==============================================================
% training status monitoring
% ==============================================================
E=zeros(1,nepoch);  % record training error
ndisp=5;
% disp(' ');
% disp(['the training error is plotted every ' int2str(ndisp) ' iterations']);
% disp('Enter <Return> to use default value. ')
% chos1=input('Enter a positive integer to set to a new value: ');
chos1 = input(j+10);
if isempty(chos1), ndisp=5;
elseif chos1>0, ndisp=chos1;
else ndisp=input('You must enter a positive integer, try again: ');
end

% ==============================================================
% initialization for the bp iterations
% ==============================================================
t = 1;              % initialize epoch counter
ptr=1;              % initialize pointer for re-sampling the training file
not_converged = 1;  % not yet converged

% ==============================================================
% save all variables into a file so that user needs not reenter all of them
% ==============================================================
save mlpconfig.mat

end