Quantile Regression


Page 1: Quantile  Regression

Quantile Regression

Prize Winnings – LPGA 2009/2010 Seasons

www.lpga.com

Kahane, L.H. (2010). “Returns to Skill in Professional Golf: A Quantile Regression Approach,” International Journal of Sport Finance, Vol. 5, pp. 167-170

Cameron, A.C., and P.K. Trivedi (2010). Microeconometrics Using Stata, Revised Edition, STATA Press, College Station, TX.

Page 2: Quantile  Regression

Data Description

• Ladies Professional Golf Association (LPGA) participants during 2009 and 2010 seasons

• Response Variable: Earnings per Event entered ($1000s)

• Predictor Variables:

   – Average Driving Distance

   – Percent of Fairways reached on Drives

   – Percent of Greens reached in Regulation

   – Putts per Hole on Greens reached in Regulation

   – Percent of Sand Saves (2 shots to hole)

Page 3: Quantile  Regression

Quantile Regression

• Linear Regression is used to relate the Conditional Mean to predictors.

• Quantile Regression relates specific quantiles to predictors. Particularly useful with non-normal data.

• Makes use of a different loss function than Ordinary Least Squares

   – Uses linear programming to estimate the coefficients

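Cumulative Distribution Function (CDF): $F(y) = \Pr(Y \le y)$

$q^{th}$ Quantile: $q = F(y_q) \;\Rightarrow\; y_q = F^{-1}(q)$

Loss Function to be minimized for the $q^{th}$ Quantile:

$$Q(\boldsymbol{\beta}_q) = \sum_{i:\, y_i \ge \mathbf{x}_i'\boldsymbol{\beta}_q}^{N} q\,\lvert y_i - \mathbf{x}_i'\boldsymbol{\beta}_q \rvert \;+\; \sum_{i:\, y_i < \mathbf{x}_i'\boldsymbol{\beta}_q}^{N} (1-q)\,\lvert y_i - \mathbf{x}_i'\boldsymbol{\beta}_q \rvert$$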
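The loss function above can be illustrated with a minimal Python sketch. It assumes an intercept-only model, so the fitted value is a single constant, and uses a brute-force search over the observed values rather than linear programming. The function names and the earnings values are illustrative only and are not the LPGA data.

```python
def check_loss(y, fitted, q):
    """Quantile loss: q*|r| when r >= 0, (1-q)*|r| when r < 0, with r = y - fitted."""
    total = 0.0
    for yi, fi in zip(y, fitted):
        r = yi - fi
        total += q * r if r >= 0 else (1 - q) * (-r)
    return total


def best_constant(y, q):
    """Constant fit minimizing the loss; a minimizer always exists among the data points."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: check_loss(y, [c] * len(y), q))


# Illustrative, right-skewed values (hypothetical, not the LPGA data)
earnings = [0.5, 1.2, 3.0, 6.7, 15.5, 34.8, 82.0]

median_fit = best_constant(earnings, 0.50)  # minimizer for q = 0.50
upper_fit = best_constant(earnings, 0.75)   # minimizer for q = 0.75
print(median_fit, upper_fit)
```

With q = 0.5 the two weights are equal, so the loss reduces to half the sum of absolute deviations and the minimizer is the sample median. Raising q penalizes under-prediction more heavily and pushes the fit toward the upper tail.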

Page 4: Quantile  Regression

Summary Data for Earnings/Event

                          earnevent
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .1654375              0
 5%     .5611538              0
10%     1.200667       .1654375       Obs                 289
25%     2.991929       .2545882       Sum of Wgt.         289

50%     6.653733                      Mean           13.44039
                        Largest       Std. Dev.      17.40237
75%     15.45304       81.35504
90%       34.815       81.95658       Variance       302.8425
95%     54.85067       82.81731       Skewness       2.325991
99%     81.95658       99.06261       Kurtosis       8.461489

Note: The data are highly skewed:

• Mean > 2*Median

• Std. Dev. > Mean

Page 5: Quantile  Regression

Plots of Earnings per Event – Showing Skew

Page 6: Quantile  Regression

Multiple Linear Regression

      Source |       SS       df       MS              Number of obs =     289
-------------+------------------------------           F(  5,   283) =   68.95
       Model |  47899.6433     5  9579.92865           Prob > F      =  0.0000
    Residual |  39318.9937   283  138.936374           R-squared     =  0.5492
-------------+------------------------------           Adj R-squared =  0.5412
       Total |   87218.637   288   302.84249           Root MSE      =  11.787

------------------------------------------------------------------------------
   earnevent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       drive |   .0749854    .112027     0.67   0.504    -.1455266    .2954974
     fairway |   .0765432   .1433219     0.53   0.594    -.2055689    .3586554
       green |   1.515417   .2260856     6.70   0.000     1.070394     1.96044
girputtshole |  -155.7758   19.96318    -7.80   0.000    -195.0709   -116.4806
   sandsvpct |   .4146478   .0919212     4.51   0.000     .2337117    .5955839
       _cons |   160.2711   49.32258     3.25   0.001      63.1854    257.3567
------------------------------------------------------------------------------

The model explains approximately 55% of the variation in earnings per event.

Important Factors: Greens in Regulation (+), Putts per Hole (-), Sand Save Percent (+)
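As a quick check, the summary statistics in the regression header follow from the sums of squares and degrees of freedom in the ANOVA table. The sketch below recomputes them from the reported values; the variable names are illustrative.

```python
import math

# Values copied from the ANOVA table in the regression output
ss_model, df_model = 47899.6433, 5
ss_resid, df_resid = 39318.9937, 283

ss_total = ss_model + ss_resid
df_total = df_model + df_resid

ms_model = ss_model / df_model   # mean square for the model
ms_resid = ss_resid / df_resid   # mean square error

r_squared = ss_model / ss_total
adj_r_squared = 1 - ms_resid / (ss_total / df_total)
f_stat = ms_model / ms_resid
root_mse = math.sqrt(ms_resid)

print(round(r_squared, 4), round(adj_r_squared, 4),
      round(f_stat, 2), round(root_mse, 3))
```

The recomputed values agree with the reported R-squared (0.5492), Adj R-squared (0.5412), F (68.95), and Root MSE (11.787).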

Page 7: Quantile  Regression

Influential Observations with respect to the Regression Coefficients (DFBETAS)

      drive              fairway             green           girputtshole         sandsvpct
Obsnum _dfbeta_1    Obsnum _dfbeta_2    Obsnum _dfbeta_3    Obsnum _dfbeta_4    Obsnum _dfbeta_5
    30    0.3509        30    0.3433        30   -0.3225        30    0.6184        30    0.3295
   166   -0.2432       125    0.2899       104   -0.2795       106   -0.2525       104   -0.3027
   249   -0.3423       249   -0.4798       166    0.2661       125   -0.2833       163    0.5885
   268   -0.3883                           238   -0.2572       177    0.2556       211   -0.2842
                                           249   -0.2418       211   -0.4088       268    0.7242
                                           256    0.4677       238   -0.5183
                                           268   -0.2856

Influential Observations: $\lvert DFBETAS_{j(i)} \rvert > \dfrac{2}{\sqrt{n}} = \dfrac{2}{\sqrt{289}} = 0.1176$
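The cutoff and the "twice the rule of thumb" screen can be sketched in a few lines of Python. The three DFBETAS values are the first three drive entries from the slide; the dictionary layout and variable names are illustrative assumptions.

```python
import math

n = 289
threshold = 2 / math.sqrt(n)   # rule-of-thumb cutoff for |DFBETAS|

# First three drive DFBETAS from the slide, keyed by observation number
drive_dfbetas = {30: 0.3509, 166: -0.2432, 249: -0.3423}

# Observations whose |DFBETAS| exceeds twice the rule-of-thumb cutoff
extreme = sorted(obs for obs, d in drive_dfbetas.items()
                 if abs(d) > 2 * threshold)

print(round(threshold, 4), extreme)
```

All three observations exceed 2 × 0.1176 = 0.2353 in absolute value, which is why they are flagged.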

These cases are extremely influential (|DFBETAS| higher than twice the “rule of thumb”).

Golfers 30 (Michelle Ellis, 2009), 104 (Liselotte Neumann, 2009), 125 (Jiyai Shin, 2009), 166 (Paula Creamer, 2010), 211 (Cristie Kerr, 2010), 249 (Angela Park, 2010), and 268 (Jiyai Shin, 2010) appear to have high influence on several regression coefficients.

Page 8: Quantile  Regression

Quantile Regression

Models the regression relation between the predictors and the response variable (Earnings per Event) at various quantiles.

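$$\mathbf{x}_i' = \begin{bmatrix} 1 & \text{drive}_i & \text{fairway}_i & \text{green}_i & \text{putts}_i & \text{sandsav}_i \end{bmatrix}$$

$q^{th}$ Quantile of $Y_i$: $\mathbf{x}_i'\boldsymbol{\beta}_q$

Loss Function to be minimized for the $q^{th}$ Quantile:

$$Q(\boldsymbol{\beta}_q) = \sum_{i:\, y_i \ge \mathbf{x}_i'\boldsymbol{\beta}_q}^{N} q\,\lvert y_i - \mathbf{x}_i'\boldsymbol{\beta}_q \rvert \;+\; \sum_{i:\, y_i < \mathbf{x}_i'\boldsymbol{\beta}_q}^{N} (1-q)\,\lvert y_i - \mathbf{x}_i'\boldsymbol{\beta}_q \rvert$$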

Standard errors of regression coefficients are estimated by bootstrapping 400 samples
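The bootstrap idea can be sketched in Python under a simplifying assumption: the statistic here is a plain sample median (an intercept-only quantile fit), not the full quantile regression. In the regression setting, whole rows (response plus predictors) would be resampled and the model refit on each resample. The function name, seed, and data values are illustrative and are not the LPGA data.

```python
import random
import statistics


def bootstrap_se(data, statistic, reps=400, seed=2010):
    """Standard deviation of a statistic over `reps` resamples drawn with replacement."""
    rng = random.Random(seed)   # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = []
    for _ in range(reps):
        resample = [rng.choice(data) for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


# Illustrative, right-skewed values (hypothetical)
earnings = [0.2, 0.6, 1.2, 3.0, 4.1, 6.7, 9.8, 15.5, 34.8, 54.9, 82.0]

se_median = bootstrap_se(earnings, statistics.median)
print(se_median)
```

The spread of the 400 resampled estimates serves as the standard error, which avoids relying on an analytic variance formula for the quantile estimator.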

Page 9: Quantile  Regression

Quantile Regression Output (STATA)

q25               Coef.   Std. Err.         t     P>|t|    [95% Conf. Interval]
drive           -0.0254      0.0539   -0.4700    0.6380     -0.1314      0.0806
fairway         -0.0073      0.0665   -0.1100    0.9120     -0.1382      0.1235
green            0.8419      0.1492    5.6400    0.0000      0.5483      1.1355
girputtshole   -72.3041      9.3716   -7.7200    0.0000    -90.7510    -53.8571
sandsvpct        0.1215      0.0435    2.8000    0.0060      0.0360      0.2071
_cons           86.0426     18.9736    4.5300    0.0000     48.6952    123.3899

q50               Coef.   Std. Err.         t     P>|t|    [95% Conf. Interval]
drive            0.0528      0.0786    0.6700    0.5020     -0.1018      0.2074
fairway          0.0476      0.0973    0.4900    0.6250     -0.1440      0.2392
green            0.9611      0.1832    5.2500    0.0000      0.6006      1.3217
girputtshole   -95.3871     17.1245   -5.5700    0.0000   -129.0947    -61.6795
sandsvpct        0.1432      0.0777    1.8400    0.0660     -0.0098      0.2962
_cons           99.7847     43.1192    2.3100    0.0210     14.9096    184.6598

q75               Coef.   Std. Err.         t     P>|t|    [95% Conf. Interval]
drive            0.3131      0.1399    2.2400    0.0260      0.0377      0.5886
fairway          0.0857      0.1692    0.5100    0.6130     -0.2475      0.4188
green            1.1226      0.3229    3.4800    0.0010      0.4869      1.7583
girputtshole  -127.4508     37.0845   -3.4400    0.0010   -200.4471    -54.4544
sandsvpct        0.4549      0.1288    3.5300    0.0000      0.2014      0.7084
_cons           76.0492     86.7001    0.8800    0.3810    -94.6097    246.7080

Note:

1) Driving distance is significant only at the 75th percentile

2) The putting effect grows in magnitude at higher quantiles (-72.3, -95.4, -127.5)

3) The greens in regulation effect is fairly similar across quantiles

4) Fairway accuracy is not significant at any quantile

5) Sand saves matter more at the 75th percentile
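The first note can be checked directly from the reported drive coefficients and bootstrap standard errors. The sketch below recomputes the t ratios and the 95% interval at the 75th percentile. The critical value 1.9684 is an assumed approximation to the 0.975 quantile of a t distribution with 283 degrees of freedom, and the variable names are illustrative.

```python
# (coefficient, standard error) for drive at each quantile, from the output above
drive = {"q25": (-0.0254, 0.0539),
         "q50": (0.0528, 0.0786),
         "q75": (0.3131, 0.1399)}

T_CRIT = 1.9684  # assumed approximation to t(0.975, df = 283)

t_ratios = {q: coef / se for q, (coef, se) in drive.items()}
significant = [q for q, t in t_ratios.items() if abs(t) > T_CRIT]

# 95% confidence interval for drive at the 75th percentile
coef75, se75 = drive["q75"]
ci75 = (coef75 - T_CRIT * se75, coef75 + T_CRIT * se75)

print(t_ratios, significant, ci75)
```

Only the q75 ratio exceeds the critical value, and the recomputed interval matches the reported (0.0377, 0.5886) up to rounding of the displayed coefficients.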

Page 10: Quantile  Regression

Tests of Equality of Coefficients Across Quantiles

. test [q25=q50=q75]: drive
 ( 1)  [q25]drive - [q50]drive = 0
 ( 2)  [q25]drive - [q75]drive = 0
       F(  2,   283) =    3.25
            Prob > F =    0.0403

. test [q25=q50=q75]: fairway
 ( 1)  [q25]fairway - [q50]fairway = 0
 ( 2)  [q25]fairway - [q75]fairway = 0
       F(  2,   283) =    0.27
            Prob > F =    0.7603

. test [q25=q50=q75]: green
 ( 1)  [q25]green - [q50]green = 0
 ( 2)  [q25]green - [q75]green = 0
       F(  2,   283) =    0.58
            Prob > F =    0.5600

. test [q25=q50=q75]: girputtshole
 ( 1)  [q25]putts - [q50]putts = 0
 ( 2)  [q25]putts - [q75]putts = 0
       F(  2,   283) =    1.62
            Prob > F =    0.1989

. test [q25=q50=q75]: sandsvpct
 ( 1)  [q25]sandsvpct - [q50]sandsvpct = 0
 ( 2)  [q25]sandsvpct - [q75]sandsvpct = 0
       F(  2,   283) =    5.35
            Prob > F =    0.0053
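One way to read these tests is to compare each p-value with a significance level. The sketch below assumes a 5% level, which the slides do not state, and uses the reported p-values; the variable names are illustrative.

```python
ALPHA = 0.05  # assumed significance level

# Prob > F for the test of equal coefficients across q25, q50 and q75
p_values = {"drive": 0.0403, "fairway": 0.7603, "green": 0.5600,
            "girputtshole": 0.1989, "sandsvpct": 0.0053}

# Predictors whose coefficients differ significantly across the three quantiles
differs = [name for name, p in p_values.items() if p < ALPHA]
print(differs)
```

At that level, only the drive and sandsvpct coefficients differ significantly across the three quantiles.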

Page 11: Quantile  Regression

Plots of Regression Coefficients by Quantile