Predicting NFL Game Outcomes
Paul McBride
ECE 539
Professor Yu Hen Hu
Introduction
Football has become an incredibly popular sport in America over the past 20 years or so.
Every week, millions of people tune in to watch the plethora of NFL games available to
them, which has turned the NFL into a multibillion-dollar corporation. This expansion of
the NFL has also coincided with the rise in popularity of sports news shows such as
SportsCenter. On these programs, the analysts voice their opinions on who they think the
winner of each game will be. These picks are largely based on personal opinion and not truly
based on the data. I am looking to remove this human bias and find out what the data
truly says about winning a football game.
NFL football is an incredibly complex game. Sure, there are some stats you can usually count
on to correlate with a win, such as total yards, but that is not always the case. A quarterback
could have a bad game, or a usually solid running back could cough up the ball at a key moment;
you never totally know what is going to happen. These intangibles make predicting outcomes
very challenging. But, because a team's average gameplay emerges over many games, a large
amount of data should be able to smooth out the intangibles.
Due to the NFL’s incredible popularity, a huge amount of game-by-game data exists
online. Actually, the amount of data available is quite overwhelming. The challenging task is to
pick stats that have a big impact on the outcome. I decided to focus on offensive stats. The
NFL is an incredibly offensive-minded league. That is not to say that defense doesn’t have an
impact; it definitely does, but the play of a stellar quarterback has long been known to be key to
winning football games.
I decided to get my data from a typical box score, that is, the team play stats per game. A
standard box score has more than enough data to be effective. Box scores contain offensive
stats such as total yards, interceptions thrown, turnovers, etc., so I had more than enough
useful stats to work with.
Just by taking a quick look at the numbers of a typical box score, one can tell that no
single stat determines who won the game. For example, total yards is usually a good
indicator of who won the game: the winning team more often than not has more total yards than
the losing team. But this just is not always the case. Facts like this lead me to believe
that there does not exist a linear mapping that chooses the winner. And situations like
this are perfect for a neural network, more specifically a back-propagating multilayer
perceptron.
Work Performed
Before any predicting could be performed, I had to decide on the data that I wanted to
work with and to collect said data. As I alluded to in the introduction, the NFL is a very
offensive minded game. Therefore, I wanted to analyze offensive stats to see if an efficient
prediction network could come to fruition. Using my knowledge of football, I decided that the
following stats would be good information for the neural network to learn from:
1. Home-field advantage: more often than not, the home team performs better in front
of its hometown fans.
2. First downs: If a team has a high number of first downs, it means they were able to
move the ball up and down the field.
3. Third-down conversion percentage: If a team has a high third-down conversion
percentage, the offense is able to stay out on the field longer.
4. Rush attempts: This stat tracks how many times the team attempted to run the ball.
5. Rush yards: If the team has a high number of rushing yards, it usually complements
the passing game quite nicely.
6. Pass attempts: This stat tracks how many times the team attempted to throw the
ball.
7. Pass yards: If the team has a high number of pass yards, it usually complements the
rushing game.
8. Total yards: If a team has a large amount of total yards, it means their offense was
working effectively.
9. Pass interceptions: This stat tracks the number of interceptions thrown by the
quarterback. An interception turns the ball over to the other team’s offense.
10. Fumbles lost: This stat tracks the number of fumbles lost by the team. A fumble
picked up by the other team means it is their ball.
11. Turnovers: If a team has a high number of turnovers, it indicates that the offense
was not able to stay out on the field.
12. Sacks: This stat tracks the number of times the quarterback was sacked. A sack
results in a loss of yards.
13. Sack yards lost: This stat tracks the yards lost by the offense due to sacks.
14. Penalty yards: This stat tracks the amount of penalty yards lost by the offense.
Penalties make it difficult for the offense to string together a number of plays to stay
out on the field.
I decided to collect data from the 2012 NFL season to train my network. I was able to
get data from a site that has Excel csv files available for download
(http://www.repole.com/sun4cast/data.html). I chose the 2012 season because the brand of
football played would be very similar to that of the 2013 season. After I downloaded the needed csv
file, I loaded the data into an Excel workbook and then got rid of unneeded statistics. Since
one team goes up against another team, I wanted to make the attributes of my feature vectors
differential statistics. In other words, if the matchup was Team A vs. Team B, I subtracted the
stats of Team B from the stats of Team A. If Team A had won that game, the outcome was 1. If
Team B had won, the outcome was -1. Below is an example of a feature vector.
This gave me 508 feature vectors to train my neural network with. Each feature vector
consisted of 15 attributes and one outcome attribute.
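The differential construction described above can be sketched as follows. This is an illustrative Python snippet, not part of the original Matlab pipeline, and the stat names and values are made-up examples:

```python
# Sketch of building one differential feature vector from two teams'
# box-score stats (hypothetical values, two stats shown for brevity).
def differential_vector(team_a_stats, team_b_stats, a_won):
    """Subtract Team B's stats from Team A's and append the +/-1 outcome."""
    diff = [a - b for a, b in zip(team_a_stats, team_b_stats)]
    outcome = 1 if a_won else -1
    return diff + [outcome]

# Example: [total yards, turnovers] for each team; Team A won the game.
vec = differential_vector([380, 1], [295, 3], a_won=True)
print(vec)  # [85, -2, 1]
```

Stacking one such vector per 2012 game yields the 508-row training matrix.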
Before I constructed a back-propagating multilayer perceptron, I wanted to have
another classifier to compare the results against. I decided on a support vector machine. There are
many configurations of a support vector machine to choose from, so for my baseline I
decided to test three different SVMs with different kernel functions: a linear kernel, a
polynomial kernel of degree 3, and a radial basis function kernel. For each of these I used
box-constraint values of 1 and 1000000. To get a true feel for how these support vector machines
would perform in real life, I used 4-way cross validation to estimate the classification rate and to
generate confusion matrices of the results. I give credit to Professor Hu’s Matlab files for my
training routines; however, the drivers that run the routines are all written by me. Below are the results
I achieved using the support vector machines:
Kernel (box constraint)     Confusion matrix       Classification rate
Linear (C = 1)              [225 29; 28 226]       0.887795276
Linear (C = 1000000)        [224 30; 31 223]       0.87992126
Polynomial (C = 1)          [204 50; 52 202]       0.799212598
Polynomial (C = 1000000)    [205 49; 53 201]       0.799212598
RBF (C = 1)                 [217 37; 36 218]       0.856299213
RBF (C = 1000000)           [212 42; 39 215]       0.840551181
As you can see, the support vector machine using a linear kernel with a box-constraint value of
1 performed the best. I was very pleased with the results, because correctly predicting ~89% of games is
quite the feat.
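The 4-way cross-validation bookkeeping behind these numbers (fold construction plus confusion-matrix tallying) can be sketched like this. This is an illustrative Python version of the driver logic, not the Matlab code itself; the toy classifier and data are stand-ins:

```python
# Sketch of a 4-way cross-validation driver that accumulates a 2x2
# confusion matrix over all four test folds, as described in the report.
def four_way_cv(data, train_and_predict):
    """data: list of (features, label) pairs with label in {+1, -1}.
    Splits into 4 contiguous folds; each fold serves as the test set once.
    Returns (confusion matrix [[TN, FP], [FN, TP]], classification rate)."""
    n = len(data)
    folds = [data[i * n // 4:(i + 1) * n // 4] for i in range(4)]
    TN = FP = FN = TP = 0
    for i in range(4):
        test = folds[i]
        train = [row for j, f in enumerate(folds) if j != i for row in f]
        preds = train_and_predict(train, [x for x, _ in test])
        for (_, label), pred in zip(test, preds):
            if label == 1:
                if pred == 1: TP += 1
                else: FN += 1
            else:
                if pred == 1: FP += 1
                else: TN += 1
    return [[TN, FP], [FN, TP]], (TN + TP) / n

# Toy check with a stand-in "classifier" that predicts the sign of the
# first feature (any train/predict pair, e.g. an SVM, could be plugged in).
data = [([x], 1 if x > 0 else -1) for x in (-4, -3, -2, -1, 1, 2, 3, 4)]
cmat, rate = four_way_cv(data, lambda train, xs: [1 if x[0] > 0 else -1 for x in xs])
print(cmat, rate)  # [[4, 0], [0, 4]] 1.0
```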
Now that I had my baseline classification percentage, it was time to decide which
structures of multilayer perceptron to use. The reason I decided to use a back-propagating
neural network was its great flexibility and the fact that back-propagation provides supervised
learning. Since there are many ways an NFL game can go, supervised learning was an attractive
feature. And since the network is flexible, there are many different parameters that I could adjust
to try to increase the classification rate. Again, the algorithm and code used to train the
network are based on the Matlab files supplied by Professor Hu on the course website, but the
drivers and the data preparation were all written by me. Once again, I used 4-way
cross validation to mimic how the machines would perform in real use.
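The training loop such a network runs (feed-forward pass, back-propagated delta errors, then a weight update with a momentum term) can be sketched in miniature. This is an illustrative Python/NumPy toy on synthetic data, not Professor Hu's Matlab routine; the architecture and constants here are arbitrary choices for the demonstration:

```python
# Minimal back-propagating MLP: one hidden tanh layer, sigmoid output,
# batch gradient updates with a momentum term, on a toy 2-class problem.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Toy separable data: class 1 whenever the feature sum is positive.
X = rng.normal(size=(200, 3))
y = (X.sum(axis=1) > 0).astype(float).reshape(-1, 1)

W1 = 0.1 * rng.normal(size=(3, 5)); dW1 = np.zeros_like(W1)
W2 = 0.1 * rng.normal(size=(5, 1)); dW2 = np.zeros_like(W2)
alpha, mom = 0.5, 0.8  # learning rate and momentum constant

for epoch in range(2000):
    # feed-forward phase
    h = np.tanh(X @ W1)
    out = sigmoid(h @ W2)
    err = y - out
    # back-propagation phase: delta errors, output layer then hidden layer
    d2 = err * out * (1.0 - out)     # sigmoid derivative
    d1 = (d2 @ W2.T) * (1.0 - h**2)  # tanh derivative
    # weight update: mean-gradient step plus momentum term
    dW2 = alpha * (h.T @ d2) / len(X) + mom * dW2; W2 += dW2
    dW1 = alpha * (X.T @ d1) / len(X) + mom * dW1; W1 += dW1

acc = float(((out > 0.5) == (y > 0.5)).mean())
print("training accuracy:", acc)
```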
Right off the bat, I did not really know what to expect, so I tried a variety of different
inputs. Below are a few examples of my initial results:
# of neurons in hidden layer = 5 (rows = runs; columns = learning rate alpha):
                 mu = 0                        mu = .8
alpha:    0.01      0.05      0.1       0.01      0.05      0.1
0.848425 0.838583 0.834646 0.846457 0.801181 0.814961
0.832677 0.840551 0.82874 0.840551 0.834646 0.805118
0.834646 0.832677 0.820866 0.850394 0.82874 0.820866
0.840551 0.834646 0.826772 0.824803 0.809055 0.822835
0.826772 0.82874 0.807087 0.854331 0.809055 0.830709
0.82874 0.814961 0.807087 0.852362 0.816929 0.809055
0.824803 0.84252 0.838583 0.836614 0.822835 0.809055
0.826772 0.840551 0.82874 0.838583 0.838583 0.805118
avg 0.832923 0.834154 0.824065 0.843012 0.820128 0.814715
# of neurons in hidden layer = 2 (rows = runs; columns = learning rate alpha):
                 mu = 0                        mu = .8
alpha:    0.01      0.05      0.1       0.01      0.05      0.1
0.84252 0.82874 0.82874 0.858268 0.836614 0.838583
0.824803 0.834646 0.834646 0.848425 0.844488 0.846457
0.830709 0.840551 0.834646 0.832677 0.840551 0.82874
0.836614 0.82874 0.838583 0.84252 0.838583 0.840551
0.816929 0.82874 0.836614 0.832677 0.84252 0.836614
0.814961 0.838583 0.82874 0.846457 0.850394 0.840551
0.832677 0.82874 0.838583 0.860236 0.840551 0.834646
0.824803 0.836614 0.82874 0.836614 0.860236 0.844488
avg 0.828002 0.833169 0.833661 0.844734 0.844242 0.838829
# of neurons in hidden layer = 2, mu = .8 (rows = runs; columns = learning rate alpha):
alpha:    0.006     0.008     0.01      0.012     0.014     0.016
0.846457 0.84252 0.84252 0.856299 0.848425 0.822835
0.864173 0.840551 0.860236 0.846457 0.84252 0.850394
0.848425 0.856299 0.852362 0.848425 0.838583 0.836614
0.82874 0.82874 0.834646 0.836614 0.838583 0.838583
0.82874 0.852362 0.860236 0.838583 0.858268 0.840551
0.852362 0.836614 0.846457 0.834646 0.848425 0.840551
0.846457 0.84252 0.856299 0.838583 0.844488 0.846457
0.84252 0.84252 0.848425 0.84252 0.844488 0.84252
avg 0.844734 0.842766 0.850148 0.842766 0.845472 0.839813
alpha = .01, mu = .8 (rows = runs; columns = # of neurons in hidden layer):
              2         3         4         5         6         7
0.844488 0.834646 0.82874 0.840551 0.822835 0.836614
0.834646 0.850394 0.856299 0.840551 0.870079 0.840551
0.852362 0.844488 0.850394 0.834646 0.830709 0.832677
0.848425 0.856299 0.846457 0.82874 0.830709 0.820866
0.836614 0.834646 0.850394 0.856299 0.848425 0.820866
0.856299 0.834646 0.836614 0.840551 0.830709 0.834646
0.832677 0.824803 0.82874 0.832677 0.850394 0.816929
0.84252 0.840551 0.840551 0.830709 0.838583 0.850394
avg 0.843504 0.840059 0.842274 0.838091 0.840305 0.831693
These initial results were discouraging. The best I seemed able to do was a ~85%
classification rate. While that is actually pretty good, it was not better than the linear support
vector machine that I had previously trained. It was then that I decided to preprocess the data
by running a singular value decomposition. The following code snippet is an example of my data
preprocessing:
[ua,sa,va] = svd(data(:,1:15),0);
m = 11;                           % number of principal vectors kept
u_vec1 = ua(:, 1:m);
vec1 = u_vec1 * u_vec1';          % projector onto the first m left singular vectors
svdData = [vec1*data(:, 1:15) data(:,16:17)];
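For illustration, the same projection can be written in Python/NumPy. This is a sketch, not the code used in the project, and the random matrix stands in for the real 508 x 15 stat matrix:

```python
# Project the feature columns onto the span of the first m left singular
# vectors, producing a rank-m approximation of the data matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 15))  # stand-in for the 508 x 15 stat matrix

m = 11
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = U[:, :m] @ U[:, :m].T      # projector onto the top-m left singular vectors
X_svd = P @ X                  # rank-m data used as the new features
print(np.linalg.matrix_rank(X_svd))  # 11
```

The residual X - X_svd carries exactly the discarded singular values, so its Frobenius norm equals the 2-norm of s[m:].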
The challenge in doing this was deciding how many of the principal vectors to process my data
with (the choice of the variable m in the code snippet). I decided to give a few
configurations a try (# of neurons in hidden layer = 5, mu = 0, alpha = .01; rows = runs;
columns = # of SVD principal vectors):

            5           6          7          8          9          10
    0.842519685  0.850394  0.870079  0.848425  0.852362  0.864173
    0.838582677  0.854331  0.850394  0.858268  0.858268  0.866142
    0.842519685  0.856299  0.854331  0.836614  0.866142  0.875984
    0.848425197  0.858268  0.852362  0.852362  0.856299  0.854331
    0.834645669  0.854331  0.850394  0.84252   0.86811   0.848425
    0.848425197  0.852362  0.858268  0.854331  0.875984  0.854331
    0.840551181  0.850394  0.860236  0.836614  0.858268  0.874016
    0.838582677  0.862205  0.862205  0.852362  0.840551  0.850394
avg 0.841781496  0.854823  0.857283  0.847687  0.859498  0.860974

Right away, I could tell that the results were looking more promising. I was able to predict
about 86% of the games without much tinkering with the input variables. Through a great deal of
analysis, I was able to raise the prediction classification rate to 88% using 11 singular value
decomposition vectors. Below are the two most successful configurations:

Configuration #1 (88.1234% classification rate):
1) # of total layers = 3
2) # of hidden neurons = 5
3) Momentum = .02
4) Learning rate = .007
5) # of SVD principal vectors = 11

Results for Configuration #1 (rows = runs; columns = repeated trials, each using 11 SVD
principal vectors; bottom row = per-trial averages):
0.889764 0.879921 0.885827 0.874016 0.875984 0.874016 0.883858 0.875984 0.877953 0.879921 0.885827 0.899606
0.874016 0.877953 0.885827 0.879921 0.889764 0.895669 0.891732 0.885827 0.88189 0.874016 0.872047 0.864173
0.879921 0.897638 0.887795 0.889764 0.874016 0.88189 0.889764 0.879921 0.874016 0.870079 0.872047 0.885827
0.891732 0.858268 0.874016 0.88189 0.874016 0.879921 0.846457 0.889764 0.875984 0.903543 0.88189 0.889764
0.887795 0.889764 0.877953 0.877953 0.883858 0.883858 0.889764 0.875984 0.877953 0.885827 0.875984 0.875984
0.885827 0.887795 0.874016 0.875984 0.874016 0.88189 0.889764 0.885827 0.889764 0.879921 0.874016 0.88189
0.893701 0.88189 0.874016 0.885827 0.883858 0.891732 0.874016 0.88189 0.879921 0.879921 0.879921 0.88189
0.895669 0.885827 0.879921 0.88189 0.883858 0.86811 0.877953 0.875984 0.883858 0.877953 0.872047 0.877953
avg 0.887303 0.882382 0.879921 0.880906 0.879921 0.882136 0.880413 0.881398 0.880167 0.881398 0.876722 0.882136
overall average: 0.881234

Configuration #2 (88.0249% classification rate):
1) # of total layers = 3
2) # of hidden neurons = 2
3) Momentum = .02
4) Learning rate = .006
5) # of SVD principal vectors = 11

Results for Configuration #2 (rows = runs; columns = repeated trials, each using 11 SVD
principal vectors; bottom row = per-trial averages):
0.893701 0.893701 0.885827 0.887795 0.875984 0.891732 0.874016 0.874016 0.870079 0.879921 0.887795 0.885827
0.88189 0.889764 0.897638 0.874016 0.879921 0.879921 0.860236 0.874016 0.86811 0.879921 0.887795 0.850394
0.874016 0.86811 0.883858 0.879921 0.879921 0.88189 0.885827 0.883858 0.883858 0.879921 0.883858 0.879921
0.879921 0.883858 0.885827 0.889764 0.862205 0.88189 0.856299 0.883858 0.879921 0.877953 0.872047 0.88189
0.885827 0.875984 0.885827 0.86811 0.875984 0.885827 0.88189 0.877953 0.885827 0.875984 0.883858 0.874016
0.893701 0.885827 0.879921 0.870079 0.885827 0.887795 0.883858 0.860236 0.860236 0.879921 0.870079 0.883858
0.877953 0.891732 0.883858 0.879921 0.877953 0.893701 0.877953 0.887795 0.88189 0.877953 0.86811 0.866142
0.887795 0.889764 0.887795 0.874016 0.88189 0.887795 0.887795 0.887795 0.875984 0.891732 0.883858 0.879921
avg 0.88435 0.884843 0.886319 0.877953 0.877461 0.886319 0.875984 0.878691 0.875738 0.880413 0.879675 0.875246
overall average: 0.880249
I was very pleased with the results of preprocessing the data with singular value decomposition.
The classification rate I was able to produce with the back-propagating multilayer perceptron
was on par with the support vector machine!
Now that I had my network of choice, I wanted to utilize it to do some predictions. I
decided on week 15 of the 2013 NFL season. I collected all 32 teams’ relevant statistical
averages and compiled feature vectors from the head-to-head matchups. Since the
classification rate of MLP configuration #1 is a little better, I used it to make my predictions.
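Compiling a matchup's feature vector from season averages can be sketched as below. This is illustrative Python, not the report's Matlab; the home-field flag encoding and the stat values are hypothetical, and the real vectors use the fifteen attributes described earlier:

```python
# Sketch: differential feature vector for one week-15 matchup, built from
# each team's season-average stats (hypothetical numbers, three stats shown).
def matchup_vector(home_avgs, away_avgs):
    """Home-field flag first, then home-minus-away differential stats."""
    return [1] + [h - a for h, a in zip(home_avgs, away_avgs)]

# e.g. season averages of [first downs, total yards, turnovers]:
v = matchup_vector([21.5, 370.0, 1.2], [18.0, 330.0, 2.0])
```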
(The MLP and SVM prediction tables appeared here as color-coded matchup tables; green represents a win.)
The MLP and the SVM predicted the game outcomes very similarly. They differed only on the
outcomes of the Falcons vs. Redskins game and the Cowboys vs. Packers game. This can be
explained by the fact that in those two matchups both teams are very comparable. For the most part,
and applying my own human bias, most of the predictions seem very reasonable. There is only one
game that is very questionable, and that is the Broncos versus the Chargers. The Broncos are
the heavy favorite, and virtually every sports news pundit has picked them to win. But that is
why I chose to do this: I wanted to remove the human bias.
The Results
Week 15 matchups (the MLP and SVM predictions were shown as color-coded tables over these games; green represents a win):
Broncos vs. Chargers
Falcons vs. Redskins
Buccs vs. 49ers
Giants vs. Seahawks
Vikings vs. Eagles
Dolphins vs. Pats
Jags vs. Bills
Colts vs. Texans
Browns vs. Bears
Raiders vs. Chiefs
Panthers vs. Jets
Cowboys vs. Packers
Titans vs. Cards
Rams vs. Saints
Steelers vs. Bengals
Lions vs. Ravens
Looking at the results, it seems as if both predicting machines did very poorly, and it
is true that this week they did. But if we look closer at week 15, we see that there were many
games that are considered upsets. This just shows the unpredictability of the NFL: the
intangibles are very hard to predict.
The upsets:
1) The Vikings over the Eagles: The usually explosive Eagles offense did not perform up
to their usual standard. But even more unexpected was the offensive output of the Vikings.
This game epitomizes the unpredictability of the NFL.
2) The Dolphins over the Patriots: The usually high-powered offense of the Patriots had
an off day. This could be because the play-making ability of the injured tight end Rob
Gronkowski was absent from the field.
3) The Rams over the Saints: Drew Brees and the Saints were let down by the atrocious
effort in the running game.
The rest of the games were just too close to call.
(The SVM Week 15 Results, Actual Game Outcomes, and MLP Week 15 Results appeared here as color-coded tables over the same sixteen matchups.)
Future Attempts
If I were to do this again in the future, I would add defensive stats to the feature vector.
While the NFL is a very offensive oriented league, the defense definitely does play a role in the
outcome of the games. While 88% predictability rate is very high, I believe that adding a
defensive element to the feature vectors could push the classification rate into the 90
percentile.
References
http://www.repole.com/sun4cast/data.html (reliable csv files of NFL stats)
NFL.com, http://www.nfl.com/
ESPN.com, http://espn.go.com/
Matlab Code
1. final_svm_classify.m
This Matlab script trains and tests different configurations of a support vector machine
via 4-way cross validation.
clear all; close all

load TeamStatsDifferentials2012.txt
data = TeamStatsDifferentials2012;

% Scale each attribute to the range [0, 10]
for i = 1:15
    data(:,i) = translate(data(:,i), 0, 10);
end

% Partition the 508 samples into four folds for 4-way cross validation
train = zeros(381,16,4);
test  = zeros(127,16,4);
data1 = data(1:127, 1:16);
data2 = data(128:254, 1:16);
data3 = data(255:381, 1:16);
data4 = data(382:508, 1:16);
train(:,:,1) = [data1; data2; data3];  test(:,:,1) = data4;
train(:,:,2) = [data1; data2; data4];  test(:,:,2) = data3;
train(:,:,3) = [data1; data3; data4];  test(:,:,3) = data2;
train(:,:,4) = [data2; data3; data4];  test(:,:,4) = data1;

% Kernel functions and box-constraint values to test
temp = ['linear    '; 'linear    '; 'polynomial';...
        'polynomial'; 'rbf       '; 'rbf       '];
funct = cellstr(temp);
C = [1, 1000000, 1, 1000000, 1, 1000000];
options = optimset('maxiter',100000);

ConfMatrix = zeros(2,2,6);
ClassRate = zeros(1,6);
for i = 1:6
    TN=0; FN=0; FP=0; TP=0;
    for j = 1:4
        model = svmtrain(train(:,1:15,j), train(:,16,j), 'kernel_function',...
            char(funct(i)), 'options', options, 'boxconstraint',...
            C(i), 'method', 'QP');
        tested = svmclassify(model, test(:,1:15,j));
        % Tally the confusion-matrix counts for this fold
        for k = 1:127
            if test(k,16,j) == 1
                if tested(k,1) == 1, TP = TP+1; else FN = FN+1; end
            else
                if tested(k,1) == 1, FP = FP+1; else TN = TN+1; end
            end
        end
    end
    ConfMatrix(1,1,i) = ConfMatrix(1,1,i) + TN;
    ConfMatrix(2,1,i) = ConfMatrix(2,1,i) + FN;
    ConfMatrix(1,2,i) = ConfMatrix(1,2,i) + FP;
    ConfMatrix(2,2,i) = ConfMatrix(2,2,i) + TP;
    ClassRate(1,i) = (TN+TP)/508;
end
2. final_svm_classify_predict.m
This Matlab script trains a linear SVM and classifies the prediction data.
clear all; close all

load TeamStatsDifferentials2012.txt
data = TeamStatsDifferentials2012;
load PredictionData.txt
testData = PredictionData;

% Scale each attribute to the range [0, 10]
for i = 1:15
    data(:,i) = translate(data(:,i), 0, 10);
    testData(:,i) = translate(testData(:,i), 0, 10);
end

C = [1, 1000000, 1, 1000000, 1, 1000000];
options = optimset('maxiter',100000);

% Train the linear SVM with box constraint C(1) = 1 on the full 2012 data,
% then classify the week 15 prediction data
model = svmtrain(data(:,1:15), data(:,16), 'kernel_function',...
    'linear', 'options', options, 'boxconstraint',...
    C(1), 'method', 'QP');
tested = svmclassify(model, testData(:,1:15));
3. mybp_svm.m
This Matlab script trains and tests back-propagating MLPs via 4-way cross validation.
The network configuration input comes in through a text file.
clear all, close all

load TeamStatsDifferentials2012_mlp.txt
data = TeamStatsDifferentials2012_mlp;

for i=1:15
    data(:,i) = translate(data(:,i), 0, 10);
end

[ua,sa,va] = svd(data(:,1:15),0);

ConfMat = zeros(2,2,1);
ClassRate = zeros(9,3);

file = fopen('mlpinput2.txt', 'r');

for b = 1:10
    % Project the data onto the first m principal vectors
    m = 11;
    u_vec1 = ua(:, 1:m);
    vec1 = u_vec1 * u_vec1';
    svdData = [vec1*data(:, 1:15) data(:,16:17)];

    % Partition into four folds for 4-way cross validation
    data1 = svdData(1:127, :);
    data2 = svdData(128:254, :);
    data3 = svdData(255:381, :);
    data4 = svdData(382:508, :);
    trainData(:,:,1) = [data1;data2;data3];  testData(:,:,1) = data4;
    trainData(:,:,2) = [data1;data2;data4];  testData(:,:,2) = data3;
    trainData(:,:,3) = [data1;data3;data4];  testData(:,:,3) = data2;
    trainData(:,:,4) = [data2;data3;data4];  testData(:,:,4) = data1;

    line = fgetl(file);
    input = str2num(line);

    for c = 1:8
        ConfMat(:,:,b) = 0;
        for a = 1:4
            % configure the MLP network and learning parameters
            mybpconfig_svm(input, trainData(:,:,a), testData(:,:,a));
            load mlpconfig.mat

            % BP iterations begin
            while not_converged==1,  % start a new epoch
                % Randomly select K training samples from the training set.
                [train,ptr,train0]=rsample(train0,K,Kr,ptr);  % train is K by M+N
                z{1}=(train(:,1:M))';   % input sample matrix, M by K
                d=train(:,M+1:MN)';     % corresponding target values, N by K

                % Feed-forward phase, compute sum of square errors
                for l=2:L,  % the l-th layer
                    u{l}=w{l}*[ones(1,K);z{l-1}];  % u{l} is n(l) by K
                    z{l}=actfun(u{l},atype(l));
                end
                error=d-z{L};  % error is N by K
                E(t)=sum(sum(error.*error));

                % Error back-propagation phase, compute delta error
                delta{L}=actfunp(u{L},atype(L)).*error;  % N (=n(L)) by K
                if L>2,
                    for l=L-1:-1:2,
                        delta{l}=(w{l+1}(:,2:n(l)+1))'*delta{l+1}.*actfunp(u{l},atype(l));
                    end
                end

                % Update the weight matrix using gradient, momentum and
                % random perturbation
                for l=2:L,
                    dw{l}=alpha*delta{l}*[ones(1,K);z{l-1}]'+...
                        mom*dw{l}+randn(size(w{l}))*0.005;
                    w{l}=w{l}+dw{l};
                end

                % display the training error
                %bpdisplay;

                % Test convergence to see if the convergence condition is satisfied
                cvgtest;
                t = t + 1;  % increment epoch count
            end  % while loop

            %disp('Final training results:')
            if classreg==0,
                [Cmat,crate]=bptest(wbest,tune,atype);
            elseif classreg==1,
                SS=bptestap(wbest,tune,atype),
            end

            if testys==1,
                %disp('Apply trained MLP network to the testing data. The results are: ');
                if classreg==0,
                    [Cmat,crate,cout]=bptest(wbest,test0,atype,labeled,N);
                    if labeled==1,
                        %disp('Confusion matrix Cmat = '); disp(Cmat);
                        %disp(['classification = ' num2str(crate) '%'])
                    elseif labeled==0,
                        % print out classifier output only if there is no label
                        disp('classifier outputs are: ')
                        disp(cout);
                    end
                elseif classreg==1,
                    SS=bptestap(wbest,test0,atype),
                end
            end
            ConfMat(:,:,b) = ConfMat(:,:,b) + Cmat;
        end
        ClassRate(c,b) = (ConfMat(1,1,b)+ConfMat(2,2,b))/508;
    end
    ClassRate(9,b) = mean(ClassRate(1:8,b));
end
4. mybp_config.m
This Matlab function takes in the input vector to set up the MLP, the training data, and the testing data.
function mybpconfig( input, train, test )

train0 = train;
[Kr,MN]=size(train);
M = MN-2;
N = MN-M;
testys = 1;
test0 = test;
[Kt,MNt]=size(test0);
if MNt == MN, labeled=1; else labeled=0; end

% scale the feature vectors of the training and testing sets to [-5 5]
scalein = input(2);
[tmp,xmin,xmax]=scale([train0(:,(1:M)); test0(:,(1:M))],-5,5);
train0(:,(1:M))=tmp((1:Kr),:);
test0(:,(1:M))=tmp((Kr+1):(Kr+Kt),:);

% network structure: L1 hidden layers plus the output layer
L1 = input(3);
L = L1+1;
n(1) = M;
n(L) = N;
j = 4;
for i=2:L-1,
    n(i)= input(j);
    w{i}=0.001*randn(n(i),n(i-1)+1);  % first column is the bias weight
    dw{i}=zeros(size(w{i}));          % initialize dw
    j = j + 1;
end
w{L}=0.005*randn(n(L),n(L-1)+1);      % first column is the bias weight
dw{L}=zeros(size(w{L}));

% ==============================================================
% choose types of activation function
% default: hidden layers, tanh (type = 2), output layer, sigmoid (type = 1)
% default parameter T = 1 is used.
% ==============================================================
atype=2*ones(L,1); atype(L)=1;  % default
%disp('By default, hidden layers use tanh activation function, output use sigmoidal');
chostype = input(j);
%if isempty(chostype), chostype=0; end
if chostype==1,
    disp('=============================================================');
    disp('activation function type 1: sigmoidal');
    disp('activation function type 2: hyperbolic tangent');
    disp('activation function type 3: linear');
    for l=2:L,
        atype(l)=input(['Layer #' int2str(l) ' activation function type = ']);
    end
end

% ==============================================================
% next load a tuning set file to help determine training errors
% or partition the training file into a training and a tuning file.
% ==============================================================
%disp('Enter 0 (default) if a pattern classification problem, ');
classreg = input(j+1);
%if isempty(classreg), classreg=0; end

%disp('=============================================================');
% msg_1=[...
% 'To estimate training error, choose one of the following:              '
% '1 - Use the entire training set to estimate training error;           '
% '2 - Use a separate fixed tuning data file to estimate training error; '
% '3 - Partition training set dynamically into training and tuning sets; '
% '    (This is for pattern classification problem)                      '];
% disp(msg_1);
%chos=input('Enter your selection (default = 1): ');
chos = input(j+2);
%if isempty(chos), chos=1; end
if chos==1,
    tune=train0;
elseif chos==2,
    dir;
    tune_name=input(' Enter tuning filename in single quote, no file extension: ');
    eval(['load ' tune_name]);
    tune=eval(tune_name);
    eval(['clear ' tune_name]);
    % scale the tuning file feature vectors and output as it is newly loaded.
    if scalein==1,  % scale input to [-5 5]
        [tune(:,1:M),xmin,xmax]=scale(tune(:,1:M),-5,5);
    end
elseif chos==3,
    % partition the training file into a training and tuning set
    % according to a user-specified percentage
    prc=input('Percentage (0 to 100) of training data reserved for tuning: ');
    [tune,train0]=partunef(train0,M,prc);
    [Kr,MN]=size(train0);  % this train0 is only a subset of the original train0,
                           % hence Kr must be updated
end
[Ktune,MN]=size(tune);

% ==============================================================
% scaling the output of training set data
% normally the output will be scaled to [outlow outhigh] = [0.2 0.8]
% for sigmoidal activation function, and [-0.8 0.8] for hyperbolic tangent
% or linear activation function at the output nodes.
% However, the actual output of MLP during testing of tuning file or testing
% file will be handled differently:
% a) Pattern classification problem: since we are concerned only with the
%    maximum among all output, the output of MLP will not be changed even if
%    it ranges only between [outlow outhigh] rather than [0 1]
% b) Approximation (regression) problem: the output of MLP will be scaled
%    back for comparison with target values
% ==============================================================
% disp('=============================================================');
% disp('Output from output nodes for training samples may be scaled to: ')
% disp('[0.2 0.8] for sigmoidal activation function or ');
% disp('[-0.8 0.8] for hyperbolic tangent or linear activation function ');
%scaleout=input('Enter 1 (default) to scale the output: ');
scaleout = input(j+3);
%if isempty(scaleout), scaleout=1; end
if atype(L)==1,
    outlow = 0.2;
elseif atype(L)==2 | atype(L) == 3,
    outlow = -0.8;
end
outhigh=0.8;
if scaleout==1,
    % scale output
    [train0(:,M+1:MN),zmin,zmax]=scale(train0(:,M+1:MN),outlow,outhigh);
    % scale the target value of tuning set
    [tune(:,M+1:MN),zmin,zmax]=scale(tune(:,M+1:MN),outlow,outhigh);
    % scale target value of testing set if available
    if testys==1 & labeled==1  % if testing set specifies output
        [test0(:,M+1:MN),zmin,zmax]=scale(test0(:,M+1:MN),outlow,outhigh);
    end
end

% now, we have a training file and a tuning file

% ==============================================================
% learning parameters
% ==============================================================
%alpha=input('learning rate (between 0 and 1, default = 0.1) alpha = ');
alpha = input(j+4);
%if isempty(alpha), alpha=0.1; end
%mom=input('momentum constant (between 0 and 1, default 0.8) mom = ');
mom = input(j+5);
%if isempty(mom), mom=0.8; end

% ==============================================================
% termination criteria
% A. Terminate when the max. # of epochs to run is reached.
% ==============================================================
%nepoch=input('maximum number of epochs to run, nepoch = ');
nepoch = input(j+6);
%disp(['# training samples = ' int2str(Kr)]);
%K = input(['epoch size (default = ' int2str(min(64,Kr)) ', <= ' int2str(Kr) ') = ']);
K = input(j+7);
%if isempty(K), K=min(64,Kr); end
%disp(['total # of training samples applied = ' int2str(nepoch*K)]);

% ==============================================================
% B. Check the tuning set testing result periodically. If the tuning set
%    testing results are reducing, save the weights. When the tuning set
%    testing results start increasing, stop training, and use the
%    previously saved weights.
% ==============================================================
%disp('=============================================================');
% nck=input(['# of epochs between convergence check (> ' ...
%     int2str(ceil(Kr/K)) '): ']);
nck = input(j+8);
% disp(' ');
% disp('If testing on tuning set meets no improvement for n0');
%maxstall=input('iterations, stop training! Enter n0 = ');
maxstall = input(j+9);
nstall=0;  % initialize # of no-improvement count; when nstall > maxstall, quit
if classreg==0,
    bstrate=0;       % initialize classification rate on tuning set to 0
elseif classreg==1,
    bstss=1;         % initialize tuning set error to maximum
    ssthresh=0.001;  % initialize threshold
end

% ==============================================================
% training status monitoring
% ==============================================================
E=zeros(1,nepoch);  % record training error
ndisp=5;
% disp(' ');
% disp(['the training error is plotted every ' int2str(ndisp) ' iterations']);
% disp('Enter <Return> to use default value. ')
% chos1=input('Enter a positive integer to set to a new value: ');
chos1 = input(j+10);
if isempty(chos1), ndisp=5;
elseif chos1>0, ndisp=chos1;
else ndisp=input('You must enter a positive integer, try again: ');
end

% ==============================================================
% initialization for the bp iterations
% ==============================================================
t = 1;              % initialize epoch counter
ptr=1;              % initialize pointer for re-sampling the training file
not_converged = 1;  % not yet converged

% ==============================================================
% save all variables into a file so that user needs not reenter all of them
% ==============================================================
save mlpconfig.mat

end