8/6/2019 DRA Assignment 4 v2 Edited
Team DR.A - Decision and Risk Analysis Assignment 4
Win in the Weekend Horse Races
Regression Model vs Artificial Neural Network
Methodology
Using basic past statistics from the Hong Kong Jockey Club to predict the top two finishers on Sunday 2010/7/11 (Race 8, THE SHA TIN MILE TROPHY)
(Applying Multiple Regression & Artificial Neural Network)
No.  Name             Trainer       Trainer   Jockey       Jockey    R1  R2  R3  R4  R5  R6  Current  Age  Sex      Total       Draw  Horse Wt.      Wt.  Win   Place
                                    win rate               win rate                          Rating                 stakes            (Declaration)
1    YOUNG ELITE      C Fownes      13%       B Prebble    16.61%    3   5   10  2   1   1   115      5    Gelding  $4,507,625  7     1242           133  5.1   2
2    JACKPOT DELIGHT  C Fownes      13%       C Y Ho       15.52%    2   7   8   7   7   12  112      6    Gelding  $8,083,750  6     1145           130  4.1   2.1
3    YUMMY SPIRITS    J Moore       12%       D Beadman    10.70%    6   1   6   2   2   1   108      5    Gelding  $2,987,000  1     1078           126  5.9   2.3
4    EYSHAL           J Moore       12%       J Lloyd      9.18%     11  3   12  12  9   8   107      6    Gelding  $5,440,500  2     1177           125  14    4.6
5    EXPRESS WIN      C H Yip       6%        C W Wong     2.64%     12  7   8   7   4   11  105      6    Gelding  $5,346,500  3     1175           123  18    4.5
6    CHATER WAY       D E Ferraris  4%        W C Marwing  8.42%     5   13  2   5   9   8   103      4    Gelding  $3,035,750  10    1146           121  9.8   2.9
7    ENRICHED         J Size        16%       D Whyte      16.87%    2   8   1   1   2   5   99       4    Gelding  $2,230,000  8     1166           117  5.4   1.8
8    LEGEND           D J Hall      10%       T H So       7.69%     4   4   1   1   3   2   98       5    Gelding  $3,922,625  5     1157           116  10    2.8
9    PRESTO           J Size        16%       M L Yeung    7.51%     6   9   4   7   7   3   97       6    Gelding  $4,810,750  9     1168           115  20    3
10   BEAUTY FOREVER   K L Man       9%        H W Lai      4.70%     6   6   3   2   5   6   96       5    Gelding  $1,573,750  4     1133           114  17    4
Correlations in SPSS
Correlations
                  win     age     Draw    Rating  HorseWt JkWin   R1      R2      R3      R4      R5      R6      Stakes  TrWin   wt.
win      Pearson  1        .452   -.085   -.572    .041   -.847    .696    .178   -.019    .414    .392    .162   -.082   -.230   -.572
         Sig.              .190    .815    .084    .911    .002    .025    .622    .958    .234    .262    .655    .823    .523    .084
age      Pearson   .452   1       -.419    .237    .122   -.297    .517   -.318    .651    .691    .295    .338    .748    .172    .237
         Sig.      .190            .228    .510    .737    .404    .126    .371    .041    .027    .408    .340    .013    .634    .510
Draw     Pearson  -.085   -.419   1       -.207    .366    .307   -.561    .844   -.466   -.162    .192   -.032   -.064    .098   -.207
         Sig.      .815    .228            .565    .299    .388    .091    .002    .174    .655    .595    .930    .860    .787    .565
Rating   Pearson  -.572    .237   -.207   1        .282    .519   -.054   -.287    .771    .225   -.115    .136    .574    .039   1.000
         Sig.      .084    .510    .565            .431    .124    .881    .421    .009    .533    .753    .709    .083    .915    .000
HorseWt  Pearson   .041    .122    .366    .282   1        .258   -.015    .181    .368    .148   -.103   -.021    .266    .152    .282
         Sig.      .911    .737    .299    .431            .472    .967    .618    .296    .683    .778    .954    .458    .676    .431
JkWin    Pearson  -.847   -.297    .307    .519    .258   1       -.732   -.064    .128   -.241   -.310   -.198    .173    .613    .519
         Sig.      .002    .404    .388    .124    .472            .016    .860    .724    .503    .383    .583    .632    .060    .124
R1       Pearson   .696    .517   -.561   -.054   -.015   -.732   1       -.193    .453    .632    .325    .324    .106   -.409   -.054
         Sig.      .025    .126    .091    .881    .967    .016            .593    .189    .050    .360    .361    .770    .240    .881
R2       Pearson   .178   -.318    .844   -.287    .181   -.064   -.193   1       -.428    .075    .449    .387   -.046   -.296   -.287
         Sig.      .622    .371    .002    .421    .618    .860    .593            .217    .838    .193    .270    .900    .406    .421
R3       Pearson  -.019    .651   -.466    .771    .368    .128    .453   -.428   1        .637    .140    .253    .623    .091    .771
         Sig.      .958    .041    .174    .009    .296    .724    .189    .217            .048    .700    .480    .054    .803    .009
R4       Pearson   .414    .691   -.162    .225    .148   -.241    .632    .075    .637   1        .772    .603    .641   -.059    .225
         Sig.      .234    .027    .655    .533    .683    .503    .050    .838    .048            .009    .065    .046    .871    .533
R5       Pearson   .392    .295    .192   -.115   -.103   -.310    .325    .449    .140    .772   1        .588    .338   -.276   -.115
         Sig.      .262    .408    .595    .753    .778    .383    .360    .193    .700    .009            .074    .339    .440    .753
R6       Pearson   .162    .338   -.032    .136   -.021   -.198    .324    .387    .253    .603    .588   1        .514   -.394    .136
         Sig.      .655    .340    .930    .709    .954    .583    .361    .270    .480    .065    .074            .129    .260    .709
Stakes   Pearson  -.082    .748   -.064    .574    .266    .173    .106   -.046    .623    .641    .338    .514   1        .143    .574
         Sig.      .823    .013    .860    .083    .458    .632    .770    .900    .054    .046    .339    .129            .693    .083
TrWin    Pearson  -.230    .172    .098    .039    .152    .613   -.409   -.296    .091   -.059   -.276   -.394    .143   1        .039
         Sig.      .523    .634    .787    .915    .676    .060    .240    .406    .803    .871    .440    .260    .693            .915
wt.      Pearson  -.572    .237   -.207   1.000    .282    .519   -.054   -.287    .771    .225   -.115    .136    .574    .039   1
         Sig.      .084    .510    .565    .000    .431    .124    .881    .421    .009    .533    .753    .709    .083    .915
Key: Rating = Current Rating; HorseWt = Horse Wt. (Declaration); JkWin = jockey win rate; Stakes = total stakes; TrWin = Trainer win rate; Sig. = Sig. (2-tailed). N = 10 for every pair.
Legend: Significant at the 0.001 level; Significant at the 0.05 level
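As a cross-check on the SPSS output, the sketch below recomputes two of the coefficients from the runner table: win against jockey win rate, and win against R1. The function name pearson is our own, and the three lists are simply the corresponding columns for horses 1 to 10. If the columns were read correctly, the results should agree with the table's -.847 and .696 to rounding.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Columns copied from the runner table, horses 1 to 10.
win = [5.1, 4.1, 5.9, 14, 18, 9.8, 5.4, 10, 20, 17]
jockey_win_rate = [16.61, 15.52, 10.70, 9.18, 2.64, 8.42, 16.87, 7.69, 7.51, 4.70]
r1 = [3, 2, 6, 11, 12, 5, 2, 4, 6, 6]

print(round(pearson(win, jockey_win_rate), 3))  # compare with -.847 in the table
print(round(pearson(win, r1), 3))               # compare with .696 in the table
```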
Regression
Fundamental elements for studying horse race statistics
R1-R6, Age and Draw are set to the Scale measure in SPSS
Run Bivariate Correlation and Regression in SPSS to find the regression constant
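The slide does not show the fitted model, so the following is only a minimal sketch of what a multiple regression on this data could look like outside SPSS. It assumes the win column is the dependent variable and picks jockey win rate and R1 as predictors because they have the strongest correlations with win; that choice, and the helper names fit_ols and r_squared, are ours and not taken from the deck.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [constant, b1, b2, ...] via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # intercept column first
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(rows, y, coef):
    """Share of the variance of y explained by the fitted model."""
    pred = [coef[0] + sum(b * v for b, v in zip(coef[1:], r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Columns copied from the runner table, horses 1 to 10.
win = [5.1, 4.1, 5.9, 14, 18, 9.8, 5.4, 10, 20, 17]
jockey_win_rate = [16.61, 15.52, 10.70, 9.18, 2.64, 8.42, 16.87, 7.69, 7.51, 4.70]
r1 = [3, 2, 6, 11, 12, 5, 2, 4, 6, 6]

predictors = list(zip(jockey_win_rate, r1))
constant, b_jockey, b_r1 = fit_ols(predictors, win)
fit = r_squared(predictors, win, [constant, b_jockey, b_r1])
print(constant, b_jockey, b_r1, fit)
```

With only 10 runners, any such fit is illustrative rather than predictive.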
Limitations
Horse race data is very noisy; many "upsets" occur.
Unpredictable variations
Past statistics may become unreliable or be misinterpreted
The problem of multicollinearity (high correlation between predictor variables) affects
most horse racing variables.
Conclusion: Too many human factors are involved, causing upsets
Horse racing is a pure game of luck after all.
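The multicollinearity point can be seen directly in the runner table: wt. is always Current Rating plus 18, which is why the correlation table reports 1.000 for that pair. The sketch below flags predictor pairs whose absolute correlation reaches a cut-off. The 0.8 threshold and the choice of three columns are assumptions made for illustration only.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Columns copied from the runner table, horses 1 to 10.
predictors = {
    "Current Rating": [115, 112, 108, 107, 105, 103, 99, 98, 97, 96],
    "wt.": [133, 130, 126, 125, 123, 121, 117, 116, 115, 114],
    "jockey win rate": [16.61, 15.52, 10.70, 9.18, 2.64, 8.42, 16.87, 7.69, 7.51, 4.70],
}

THRESHOLD = 0.8  # assumed cut-off for "high" correlation, not from the deck

flagged = [
    (a, b)
    for a, b in combinations(predictors, 2)
    if abs(pearson(predictors[a], predictors[b])) >= THRESHOLD
]
print(flagged)
```

A pair flagged this way carries the same information twice, so only one of the two columns would normally be kept as a regression input.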
Artificial Neural Network with EasyNN (Trial)
Why Artificial Neural Network?
- LEARN and DERIVE meaning from complicated or imprecise data.
- Extract and detect patterns and trends that are too complex to be noticed by humans.
EasyNN Modeling
Assumption:
- Only 2 types of horses in a race: Winner & Losers.
- The winner possesses all the winning factors in that race.
- The 1st runner-up possesses the best factors among the losers.
For each race, we therefore pick only the data of the winner and the 1st runner-up.
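The sampling rule above can be sketched as a small filter. The record layout (horse_no, finish) and the is_winner label are hypothetical names chosen for this example, and the four-horse race is made-up data; the deck does not show how the rows were actually prepared for EasyNN.

```python
def pick_training_rows(race):
    """Keep only the winner and the 1st runner-up of one race and label them."""
    picked = []
    for runner in race:
        if runner["finish"] in (1, 2):
            # Winner -> True, 1st runner-up (best of the losers) -> False.
            picked.append({**runner, "is_winner": runner["finish"] == 1})
    return picked

# Hypothetical finishing positions for a four-horse race (not from the deck).
race = [
    {"horse_no": 1, "finish": 3},
    {"horse_no": 2, "finish": 1},
    {"horse_no": 3, "finish": 4},
    {"horse_no": 4, "finish": 2},
]

rows = pick_training_rows(race)
print(rows)
```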
Limitations:
- Trial version allows only 100 records to be processed.
- Excluding query test data, only 84 data rows are input as examples.
- Only half of the example rows are used for training, so the network built is not reliable.
Data Mining - for the best learning
How does data affect our Neural Network?
- Number of data fields
- Data structure
Data Modeling
- Trial 1 : Input all data fields in raw format.
- Trial 2 : Input all data fields in processed format to reduce the spread of the data range,
e.g. average the winning rates of all horses and classify each horse as either above or
below the average.
- Trial 3 : Remove data fields with low Significance and Importance.
- Trial 4 : Further reduce the number of data fields.
- Trial 5 : Redefine the average values of the factors so that they include only rates from
winning horses.
Trial  Validating Correctness
1      57.5%
2      65%
3      70%
4      65%
5      55%
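The preprocessing described for Trial 2 (replace a widely spread rate by an above/below-average flag) can be sketched as follows. The function name is ours, and the sample values are the ten jockey win rates from the runner table, used here only as convenient input; the deck's own averages were taken over its 84 training rows.

```python
def above_average_flags(values):
    """Return True for each value strictly above the mean of the list."""
    mean = sum(values) / len(values)
    return [v > mean for v in values]

# Jockey win rates (%) of horses 1 to 10 from the runner table.
jockey_win_rate = [16.61, 15.52, 10.70, 9.18, 2.64, 8.42, 16.87, 7.69, 7.51, 4.70]

flags = above_average_flags(jockey_win_rate)
print(flags)
```

Collapsing each rate to one bit narrows the input range the network sees, at the cost of discarding how far above or below the average a horse sits.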
Neural Network Automated Builder
84 records from 47 races are input into the system. Only the winners and the 1st runners-up
of these 47 races are selected.
For each record, we input the following 11 factors:
No. of horses in the race (6-14)
The total distance of the race (1000-2400)
The class of the race (1-5)
The draw of the horse (1-14)
The weight of the horse (107-133)
If the horse has a winning rate of more than 19.87 (true or false)
If the horse wore gears during that race (true or false)
If the trainer has a winning rate of more than 8.48 (true or false)
If the jockey has a winning rate of more than 10.02 (true or false)
The country code of the horse (1-6)
The age of the horse (3-9)
If the odds of the horse in that race are less than 9.98 (true or false)
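One way to picture the input row is as a small encoder that applies the thresholds listed above. The class and field names below are hypothetical, and the sample record is illustrative rather than an actual training row; only the cut-off values 19.87, 8.48, 10.02 and 9.98 come from the slide.

```python
from dataclasses import dataclass

@dataclass
class RunnerRecord:
    # Field names are our own; ranges in comments are the ones given on the slide.
    runners: int             # no. of horses in the race (6-14)
    distance: int            # total distance of the race (1000-2400)
    race_class: int          # class of the race (1-5)
    draw: int                # draw of the horse (1-14)
    weight: int              # weight of the horse (107-133)
    horse_win_rate: float
    wore_gear: bool
    trainer_win_rate: float
    jockey_win_rate: float
    country_code: int        # country code of the horse (1-6)
    age: int                 # age of the horse (3-9)
    odds: float

def encode(rec):
    """Turn one record into the network input row, applying the slide's cut-offs."""
    return [
        rec.runners,
        rec.distance,
        rec.race_class,
        rec.draw,
        rec.weight,
        rec.horse_win_rate > 19.87,
        rec.wore_gear,
        rec.trainer_win_rate > 8.48,
        rec.jockey_win_rate > 10.02,
        rec.country_code,
        rec.age,
        rec.odds < 9.98,
    ]

# Illustrative record only; the values are assumptions, not a row from the data set.
sample = RunnerRecord(10, 1600, 1, 7, 133, 25.0, False, 13.0, 16.61, 1, 5, 5.1)
print(encode(sample))
```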
The Neural Network
randomly chooses and learns from 46 training example rows in 5000 cycles
self-validates with 40 example rows
is manually validated with 6 full sets of past racing data
Exciting Forecasts & Unexpected Result
We forecast the result of Race 8 on 11th July.
Our ANN suggests either horse no.1 or no.7 to be the WINNER!
Results: 6-1-8
Conclusions & Recommendations
The learning of the Neural Network depends heavily on the data structures and input fields.
Inputs should avoid alphabetic values and widely spread, unevenly distributed numeric values.
These introduce exceptional learning noise.
When a race involves more horses, the forecast returns more potential winners. However, it
is always difficult to single out exactly 1 potential winner from the forecast.
Our model could increase forecast accuracy by inputting more learning records,
manually modifying the input weighting in the hidden layers and fine-tuning the
attributes of the learning mechanism of the built-in back propagation algorithm.
Test  Runners  No. of Potential Winners  Forecast
1     14       3                         Success
2     14       5                         Success
3     14       2                         Failed
4     12       2                         Failed
5     10       1                         Success
6     10       2                         Failed
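The six manual tests can be summarised with a few lines of arithmetic. The sketch below computes the overall success rate and the average number of potential winners per field size; the variable names are ours, and with so few tests the figures are indicative only.

```python
from collections import defaultdict

# (runners, potential winners returned, forecast succeeded) for tests 1 to 6.
tests = [
    (14, 3, True),
    (14, 5, True),
    (14, 2, False),
    (12, 2, False),
    (10, 1, True),
    (10, 2, False),
]

success_rate = sum(ok for _, _, ok in tests) / len(tests)

by_field_size = defaultdict(list)
for runners, potential, _ in tests:
    by_field_size[runners].append(potential)
avg_potential = {n: sum(v) / len(v) for n, v in by_field_size.items()}

print(success_rate)
print(avg_potential)
```

The averages are consistent with the observation that larger fields produce more potential winners, although three field sizes are too few to establish a trend.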