DRA Assignment 4 v2 Edited

  • 8/6/2019 DRA Assignment 4 v2 Edited

    Team DR.A, Decision and Risk Analysis Assignment 4

    Win in the Weekend Horse Races

    Regression Model vs Artificial Neural Network

    Methodology

    Using basic past statistics from the Hong Kong Jockey Club to predict the top two finishers on Sunday, 2010/7/11 (Race 8, THE SHA TIN MILE TROPHY)

    (Applying Multiple Regression & Artificial Neural Network)

    No.  Name             Trainer       Trainer   Jockey       Jockey    R1  R2  R3  R4  R5  R6  Current  Age  Sex      Total stakes  Draw  Horse Wt.      Wt.  Win   Place
                                        win rate               win rate                          Rating                                     (Declaration)
    1    YOUNG ELITE      C Fownes      13%       B Prebble    16.61%    3   5   10  2   1   1   115      5    Gelding  $4,507,625    7     1242           133  5.1   2
    2    JACKPOT DELIGHT  C Fownes      13%       C Y Ho       15.52%    2   7   8   7   7   12  112      6    Gelding  $8,083,750    6     1145           130  4.1   2.1
    3    YUMMY SPIRITS    J Moore       12%       D Beadman    10.70%    6   1   6   2   2   1   108      5    Gelding  $2,987,000    1     1078           126  5.9   2.3
    4    EYSHAL           J Moore       12%       J Lloyd      9.18%     11  3   12  12  9   8   107      6    Gelding  $5,440,500    2     1177           125  14    4.6
    5    EXPRESS WIN      C H Yip       6%        C W Wong     2.64%     12  7   8   7   4   11  105      6    Gelding  $5,346,500    3     1175           123  18    4.5
    6    CHATER WAY       D E Ferraris  4%        W C Marwing  8.42%     5   13  2   5   9   8   103      4    Gelding  $3,035,750    10    1146           121  9.8   2.9
    7    ENRICHED         J Size        16%       D Whyte      16.87%    2   8   1   1   2   5   99       4    Gelding  $2,230,000    8     1166           117  5.4   1.8
    8    LEGEND           D J Hall      10%       T H So       7.69%     4   4   1   1   3   2   98       5    Gelding  $3,922,625    5     1157           116  10    2.8
    9    PRESTO           J Size        16%       M L Yeung    7.51%     6   9   4   7   7   3   97       6    Gelding  $4,810,750    9     1168           115  20    3
    10   BEAUTY FOREVER   K L Man       9%        H W Lai      4.70%     6   6   3   2   5   6   96       5    Gelding  $1,573,750    4     1133           114  17    4

    Correlations in SPSS

    Correlations

    Column key: Rating = Current Rating; HorseWt = Horse Wt. (Declaration); JockWR = jockey win rate; Stakes = total stakes; TrnWR = Trainer win rate. Sig. is 2-tailed. N = 10 for every pair.

                                             win     age    Draw  Rating HorseWt  JockWR      R1      R2      R3      R4      R5      R6  Stakes   TrnWR     wt.
    win                     Pearson        1    .452   -.085   -.572    .041   -.847    .696    .178   -.019    .414    .392    .162   -.082   -.230   -.572
                            Sig.                .190    .815    .084    .911    .002    .025    .622    .958    .234    .262    .655    .823    .523    .084
    age                     Pearson     .452       1   -.419    .237    .122   -.297    .517   -.318    .651    .691    .295    .338    .748    .172    .237
                            Sig.        .190            .228    .510    .737    .404    .126    .371    .041    .027    .408    .340    .013    .634    .510
    Draw                    Pearson    -.085   -.419       1   -.207    .366    .307   -.561    .844   -.466   -.162    .192   -.032   -.064    .098   -.207
                            Sig.        .815    .228            .565    .299    .388    .091    .002    .174    .655    .595    .930    .860    .787    .565
    Current Rating          Pearson    -.572    .237   -.207       1    .282    .519   -.054   -.287    .771    .225   -.115    .136    .574    .039   1.000
                            Sig.        .084    .510    .565            .431    .124    .881    .421    .009    .533    .753    .709    .083    .915    .000
    Horse Wt. (Declaration) Pearson     .041    .122    .366    .282       1    .258   -.015    .181    .368    .148   -.103   -.021    .266    .152    .282
                            Sig.        .911    .737    .299    .431            .472    .967    .618    .296    .683    .778    .954    .458    .676    .431
    jockey win rate         Pearson    -.847   -.297    .307    .519    .258       1   -.732   -.064    .128   -.241   -.310   -.198    .173    .613    .519
                            Sig.        .002    .404    .388    .124    .472            .016    .860    .724    .503    .383    .583    .632    .060    .124
    R1                      Pearson     .696    .517   -.561   -.054   -.015   -.732       1   -.193    .453    .632    .325    .324    .106   -.409   -.054
                            Sig.        .025    .126    .091    .881    .967    .016            .593    .189    .050    .360    .361    .770    .240    .881
    R2                      Pearson     .178   -.318    .844   -.287    .181   -.064   -.193       1   -.428    .075    .449    .387   -.046   -.296   -.287
                            Sig.        .622    .371    .002    .421    .618    .860    .593            .217    .838    .193    .270    .900    .406    .421
    R3                      Pearson    -.019    .651   -.466    .771    .368    .128    .453   -.428       1    .637    .140    .253    .623    .091    .771
                            Sig.        .958    .041    .174    .009    .296    .724    .189    .217            .048    .700    .480    .054    .803    .009
    R4                      Pearson     .414    .691   -.162    .225    .148   -.241    .632    .075    .637       1    .772    .603    .641   -.059    .225
                            Sig.        .234    .027    .655    .533    .683    .503    .050    .838    .048            .009    .065    .046    .871    .533
    R5                      Pearson     .392    .295    .192   -.115   -.103   -.310    .325    .449    .140    .772       1    .588    .338   -.276   -.115
                            Sig.        .262    .408    .595    .753    .778    .383    .360    .193    .700    .009            .074    .339    .440    .753
    R6                      Pearson     .162    .338   -.032    .136   -.021   -.198    .324    .387    .253    .603    .588       1    .514   -.394    .136
                            Sig.        .655    .340    .930    .709    .954    .583    .361    .270    .480    .065    .074            .129    .260    .709
    total stakes            Pearson    -.082    .748   -.064    .574    .266    .173    .106   -.046    .623    .641    .338    .514       1    .143    .574
                            Sig.        .823    .013    .860    .083    .458    .632    .770    .900    .054    .046    .339    .129            .693    .083
    Trainer win rate        Pearson    -.230    .172    .098    .039    .152    .613   -.409   -.296    .091   -.059   -.276   -.394    .143       1    .039
                            Sig.        .523    .634    .787    .915    .676    .060    .240    .406    .803    .871    .440    .260    .693            .915
    wt.                     Pearson    -.572    .237   -.207   1.000    .282    .519   -.054   -.287    .771    .225   -.115    .136    .574    .039       1
                            Sig.        .084    .510    .565    .000    .431    .124    .881    .421    .009    .533    .753    .709    .083    .915

    Legend: significant at the 0.001 level; significant at the 0.05 level
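
    As a cross-check on the SPSS output, one cell of the matrix can be recomputed by hand. The sketch below assumes the `win` variable is the win odds column of the Race 8 runner table; the function name `pearson` is ours, not SPSS's.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Values copied from the runner table, horses 1 to 10.
win_odds = [5.1, 4.1, 5.9, 14, 18, 9.8, 5.4, 10, 20, 17]
jockey_win_rate = [16.61, 15.52, 10.70, 9.18, 2.64, 8.42, 16.87, 7.69, 7.51, 4.70]

r = pearson(win_odds, jockey_win_rate)
print(round(r, 3))  # compare with the win / jockey win rate cell of the matrix
```

    A strongly negative value here means that horses ridden by jockeys with higher win rates tend to start at shorter odds.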

    Regression

    Fundamental elements to study horse race statistics

    R1-R6, Age and Draw are set to the Scale measure in SPSS

    Perform Correlation (Bivariate) and Regression in SPSS to find out the constant
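
    The slides do not show the SPSS regression output, so the following is only a sketch of how a constant and coefficients could be obtained by ordinary least squares. The choice of win odds as the dependent variable and of jockey win rate and R1 as predictors is our assumption, picked because they have the strongest correlations with `win`; the helper names `solve` and `fit_ols` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares fit; returns [constant, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Values copied from the runner table, horses 1 to 10.
win_odds = [5.1, 4.1, 5.9, 14, 18, 9.8, 5.4, 10, 20, 17]
jockey_win_rate = [16.61, 15.52, 10.70, 9.18, 2.64, 8.42, 16.87, 7.69, 7.51, 4.70]
r1 = [3, 2, 6, 11, 12, 5, 2, 4, 6, 6]

coef = fit_ols(list(zip(jockey_win_rate, r1)), win_odds)
fitted = [coef[0] + coef[1] * j + coef[2] * r for j, r in zip(jockey_win_rate, r1)]
residuals = [y - f for y, f in zip(win_odds, fitted)]

mean_y = sum(win_odds) / len(win_odds)
r_squared = 1 - sum(e * e for e in residuals) / sum((y - mean_y) ** 2 for y in win_odds)
print("constant:", coef[0], "R^2:", r_squared)
```

    With only 10 runners and correlated predictors, any coefficients from such a fit should be read with caution.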

  • 8/6/2019 DRA Assignment 4 v2 Edited

    5/10

    Team DR.ADecision and Risk Analysis Assignment 4

    Limitations

    Horse race data is very noisy; many "upsets" occur.

    Unpredictable variations

    Past statistics may become unreliable or be misinterpreted

    The problem of multicollinearity (high correlation between predictor variables) affects most horse racing variables.

    Conclusion: too many human factors are involved, causing upsets.

    Horse Racing is a pure game of luck after all.
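
    The multicollinearity point can be seen directly in the Race 8 data: the correlation matrix reports 1.000 between Current Rating and wt. A short check, using values copied from the runner table (variable names are ours), shows why.

```python
# Current Rating and carried weight (wt.) for horses 1 to 10.
current_rating = [115, 112, 108, 107, 105, 103, 99, 98, 97, 96]
carried_weight = [133, 130, 126, 125, 123, 121, 117, 116, 115, 114]

# If every horse has the same offset, one column is an exact shifted
# copy of the other and adds no information to a regression.
offsets = {w - r for r, w in zip(current_rating, carried_weight)}
is_exact_linear_copy = len(offsets) == 1
print(offsets, is_exact_linear_copy)
```

    When two inputs are related this way, only one of them can usefully enter a regression model.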

    Artificial Neural Network with EasyNN (Trial)

    Why Artificial Neural Network?

    - LEARN and DERIVE from complicated or imprecise data.

    - Extract and detect patterns and trends that are too complex to be noticed by humans.

    EasyNN Modeling

    Assumption:

    - Only 2 types of horses in a race: Winner & Losers.

    - Winners possess all the winning factors in that race.

    - 1st Runner-up possesses the best factors among the losers.

    For each race, pick only the data of the winner and the 1st runner-up.

    Limitations:

    - Trial version allows only 100 records to be processed.

    - Excluding query test data, only 84 data rows are input as examples.

    - Only half of the example rows are used for training, so the network built is not reliable.

    Data Mining - for the best learning

    How does data affect our Neural Network?

    - Number of Data fields

    - Data structure

    Data Modeling

    - Trial 1 : Input all data fields in raw format.

    - Trial 2 : Input all data fields in processed format, to reduce the spread of the data range,

    e.g. average the winning rates of all horses and classify each horse as either above or below the average.

    - Trial 3 : Drop data fields with low Significance and Importance.

    - Trial 4 : Further reduce the number of data fields.

    - Trial 5 : Redefine all the average values of the factors, including only rates from winning horses.

    Trial   Validating Correctness
    1       57.5%
    2       65%
    3       70%
    4       65%
    5       55%
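
    The Trial 2 idea of replacing a rate by an above/below-average flag could be sketched as below. For illustration it uses only the ten jockey win rates from the Race 8 runner table, so the average differs from the one the team derived from their 84 training records; the function name is hypothetical.

```python
def above_average_flags(rates):
    """Return (average, flags), where a flag is True if the rate exceeds the average."""
    average = sum(rates) / len(rates)
    return average, [rate > average for rate in rates]


# Jockey win rates (%) for horses 1 to 10 of Race 8.
jockey_win_rates = [16.61, 15.52, 10.70, 9.18, 2.64, 8.42, 16.87, 7.69, 7.51, 4.70]

average, flags = above_average_flags(jockey_win_rates)
print(average, flags)
```

    Collapsing a wide numeric range into a true/false input reduces the spread of values the network has to learn from, at the cost of discarding how far above or below the average each horse sits.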

    Neural Network Automated Builder

    84 records from 47 races are input into the system. Only the winners and the 1st runners-up of these 47 races are selected.

    For each record, we input the following 11 factors:

    No. of horses in the race (6-14)

    The total distance of the race (1000-2400)

    The class of the race (1-5)

    The draw of the horse (1-14)

    The weight of the horse (107-133)

    If the horse has a winning rate of more than 19.87 (true or false)

    If the horse wore gears during that race (true or false)

    If the trainer has a winning rate of more than 8.48 (true or false)

    If the jockey has a winning rate of more than 10.02 (true or false)

    The country code of the horse (1-6)

    The age of the horse (3-9)

    If the odds of the horse in that race are less than 9.98 (true or false)
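
    One way to turn the listed fields into network inputs is sketched below. The thresholds and ranges are the ones quoted on the slide; scaling each ranged field to 0-1 is our assumption (the slides do not say how EasyNN scales inputs), and the `Runner` class, `scale` and `encode` are hypothetical names. In the example, distance, class, horse win rate, gear and country code are placeholders, since the runner table does not give them.

```python
from dataclasses import dataclass

# Averages quoted on the slide, used as true/false cut-offs.
HORSE_WIN_RATE_AVG = 19.87
TRAINER_WIN_RATE_AVG = 8.48
JOCKEY_WIN_RATE_AVG = 10.02
ODDS_AVG = 9.98


@dataclass
class Runner:
    field_size: int
    distance: int
    race_class: int
    draw: int
    weight: int
    horse_win_rate: float
    wore_gear: bool
    trainer_win_rate: float
    jockey_win_rate: float
    country_code: int
    age: int
    odds: float


def scale(value, low, high):
    """Map a value from the range [low, high] onto [0, 1]."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside {low}-{high}")
    return (value - low) / (high - low)


def encode(r):
    """Encode one runner as a list of numbers between 0 and 1."""
    return [
        scale(r.field_size, 6, 14),
        scale(r.distance, 1000, 2400),
        scale(r.race_class, 1, 5),
        scale(r.draw, 1, 14),
        scale(r.weight, 107, 133),
        float(r.horse_win_rate > HORSE_WIN_RATE_AVG),
        float(r.wore_gear),
        float(r.trainer_win_rate > TRAINER_WIN_RATE_AVG),
        float(r.jockey_win_rate > JOCKEY_WIN_RATE_AVG),
        scale(r.country_code, 1, 6),
        scale(r.age, 3, 9),
        float(r.odds < ODDS_AVG),
    ]


# Horse no. 1 from the runner table; placeholder values are marked.
example = Runner(
    field_size=10, distance=1600,          # distance: placeholder
    race_class=1,                          # placeholder
    draw=7, weight=133,
    horse_win_rate=20.0,                   # placeholder
    wore_gear=False,                       # placeholder
    trainer_win_rate=13.0, jockey_win_rate=16.61,
    country_code=1,                        # placeholder
    age=5, odds=5.1,
)
features = encode(example)
print(features)
```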

    The Neural Network

    randomly chooses and learns from 46 training example rows in 5000 cycles

    self-validates with 40 example rows

    is manually validated with 6 full sets of past racing data
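
    EasyNN's internals are not shown in the slides. As a rough illustration of what "learning in 5000 cycles" by back-propagation means, here is a minimal one-hidden-layer network in plain Python. The class `TinyNet`, the toy rows, the hidden-layer size and the learning rate are all hypothetical and are not the team's actual model or data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """One hidden layer of sigmoid units, trained by plain back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight list ends with a bias term.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(ws, x)) + ws[-1])
                  for ws in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                      + self.w_out[-1])
        return hidden, out

    def train_step(self, x, target, rate):
        hidden, out = self.forward(x)
        # Error signal at the output (squared error through a sigmoid).
        delta_out = (out - target) * out * (1.0 - out)
        # Propagate the error back before changing the output weights.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= rate * delta_out * h
        self.w_out[-1] -= rate * delta_out
        for j, ws in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                ws[i] -= rate * delta_hidden[j] * v
            ws[-1] -= rate * delta_hidden[j]


def mean_squared_error(net, rows):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in rows) / len(rows)


# Hypothetical toy rows: two true/false inputs, label 1.0 = "winner".
rows = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

net = TinyNet(n_in=2, n_hidden=3)
error_before = mean_squared_error(net, rows)
for _ in range(5000):  # 5000 learning cycles, as on the slide
    for x, t in rows:
        net.train_step(x, t, rate=0.5)
error_after = mean_squared_error(net, rows)
print(error_before, error_after)
```

    In a real run, the error would also be measured on validating rows that the network never trained on, as the slide describes.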

    Exciting Forecasts & unExpected Result

    We forecast the result of Race 8 on 11th July.

    Our ANN suggests either horse no.1 or no.7 to be the WINNER!

    Results: 6-1-8

    Conclusions & Recommendations

    The learning of Neural Network depends heavily on the data structures and input fields.

    Inputs should avoid alphabetic values and widely spread, unevenly distributed numeric values. These introduce excessive learning noise.

    When a race involves more horses, the forecast returns more potential winners. However, it is always difficult to single out exactly 1 potential winner from the forecast.

    Our model could increase forecast accuracy by inputting more learning records, manually modifying the input weighting in the hidden layers and fine-tuning the parameters of the built-in back-propagation learning algorithm.

    Test   Runners   No. of Potential Winners   Forecasts
    1      14        3                          Success
    2      14        5                          Success
    3      14        2                          Failed
    4      12        2                          Failed
    5      10        1                          Success
    6      10        2                          Failed
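
    The six manual tests can be summarised with a few lines of arithmetic. The sketch below uses only the rows of the table above; the variable names are ours.

```python
from collections import defaultdict

# (runners, number of potential winners, forecast succeeded)
tests = [
    (14, 3, True),
    (14, 5, True),
    (14, 2, False),
    (12, 2, False),
    (10, 1, True),
    (10, 2, False),
]

# Share of tests in which the forecast was a success.
success_rate = sum(ok for _, _, ok in tests) / len(tests)

# Average number of potential winners returned, grouped by field size.
picks_by_runners = defaultdict(list)
for runners, picks, _ in tests:
    picks_by_runners[runners].append(picks)
avg_picks = {runners: sum(p) / len(p) for runners, p in picks_by_runners.items()}

print(success_rate, avg_picks)
```

    The averages rise with field size, which is consistent with the observation that larger races return more potential winners, although six tests are too few to draw firm conclusions.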