Regression to the Mean at The Masters Golf …economics-files.pomona.edu › GarySmith › Econ190...

Regression to the Mean at The Masters Golf Tournament A comparative analysis of regression to the mean on the PGA tour and at the Masters Tournament Kevin Masini Pomona College Economics 190


Regression to the Mean at The Masters Golf Tournament

A comparative analysis of regression to the mean on the PGA tour and at the Masters Tournament

Kevin Masini
Pomona College
Economics 190


1. Introduction

Every sport involves elements of luck and skill. Even on the PGA Tour, which is considered the highest level of golf, scores and winners are often determined by a fortuitous bounce onto the green or an unlucky kick into a hazard. Because golf is such a game of inches, there is an imperfect correlation between player performance and skill. This imperfect correlation can be seen in all sports, and is especially evident in the game of golf. This is why we see so many different winners on the PGA Tour and why it is so difficult for players to win multiple tournaments in a given season and even throughout a player’s career. The aforementioned imperfect correlation leads to a phenomenon known as regression to the mean.

1.1 Regression to the Mean

Regression to the mean is the phenomenon where someone who performs toward an extreme one year is likely to perform closer to the mean the following year. Regression to the mean can be seen in many different aspects of life, but is especially noticeable in sports. It was first observed in 1886 when Sir Francis Galton studied the relationship between the heights of parents and their children (Galton, 1886). This inaugural work has led to further research on the phenomenon. A well-known example of regression to the mean is the “sophomore slump”. The sophomore slump is where a player who has a particularly exceptional rookie season shows a decline in their second season. This is very much the definition of regression to the mean. A rookie who had an exceptional season likely outperformed their true ability and will regress towards the mean the following year. Likewise, a player who underperforms in their first season will likely perform better in their second season.

1.2 The Masters

Each season there are nearly 50 PGA Tour events. Of these tournaments there are four major tournaments (majors). The four majors are viewed as the most important tournaments each year. Of the four, The Masters Tournament is the only one played at the same course every year. The Masters was first played in 1934 and typically has a field of eighty to one hundred of the best golfers in the world. Each year The Masters is played at Augusta National, one of the most famous golf courses in the world.

The Masters has been played at Augusta National 73 times, and of those 73, 47 have been won by multiple-time winners. That is, people who have won at least twice account for nearly two-thirds of the victories at Augusta. That means there have been 26 one-time winners at The Masters. Trevor Immelman won the tournament in 2008 as one of his only two wins on the PGA Tour. Furthermore, he has only finished in the top 10 twice in his fifteen appearances at Augusta. This is a rare occurrence at The Masters. Typically, fans see familiar names atop the leaderboard each year. For example, Phil Mickelson has finished in the top 10 at The Masters in fourteen of his twenty-four professional starts, winning three times. To put that into perspective, Mickelson has finished in the top 10 in 58% of his Masters starts compared to 34% of his PGA Tour starts. Similar to Mickelson, many players seem to ‘show up’ at The Masters every year. Whether it be the course, the fact that many players tailor their schedule around the tournament, or some other reason, it seems that certain players show less regression to the mean from year to year at The Masters. It is because of this that I hypothesize that we will see less regression to the mean at The Masters than is seen during the entire PGA Tour season. This goes for both year-to-year and round-to-round performance.

2. Literature Review

Regression to the mean is studied in a number of different areas, with sports being one of the main focuses. When it comes to sports, a player’s performance can be modeled by a combination of luck and skill. Essentially, each athlete has a base skill level and then has different levels of luck on a given day or during a given season. In golf, we see these fluctuations in luck more often than in the typical sport. In Thinking, Fast and Slow (2011), Kahneman offers a simple model of luck and skill, which is as follows:

success = talent + luck

great success = a little more talent + a lot of luck

This simple model offers insight on regression to the mean in golf and how to intuitively understand the fluctuations in players’ scores. Think of the first two rounds of a golf tournament. Say that the average score is par, or a 72. One would expect that a player who shot a 65 has above-average skill, but also experienced above-average luck. This player is likely to be successful on the second day, but probably less successful because they will not be as lucky as they were on the first day (Kahneman, 2011). Kahneman does a good job of describing the theory behind regression to the mean and more specifically luck and skill in golf, but does not offer any data on the subject.
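The intuition in this example can be illustrated with a small simulation. This is only a sketch under assumed numbers: the standard deviations of talent and luck, the field size, and the cutoff of 68 are arbitrary choices made for illustration, not estimates taken from Kahneman or from golf data.

```python
import random
from statistics import mean

PAR = 72.0  # assumed average score, as in the example above

def simulate_two_rounds(n_players=20000, talent_sd=1.0, luck_sd=2.5, seed=0):
    """Return (round1, round2) scores where score = par + talent + luck.

    talent_sd and luck_sd are illustrative assumptions, not fitted values.
    """
    rng = random.Random(seed)
    rounds = []
    for _ in range(n_players):
        talent = rng.gauss(0.0, talent_sd)           # fixed for the player
        r1 = PAR + talent + rng.gauss(0.0, luck_sd)  # fresh luck in round 1
        r2 = PAR + talent + rng.gauss(0.0, luck_sd)  # fresh luck in round 2
        rounds.append((r1, r2))
    return rounds

rounds = simulate_two_rounds()

# Players who went very low in round 1 (lower is better in golf).
low_scorers = [(r1, r2) for r1, r2 in rounds if r1 <= 68.0]
first_round_avg = mean(r1 for r1, _ in low_scorers)
second_round_avg = mean(r2 for _, r2 in low_scorers)

print(f"round 1 average of low scorers: {first_round_avg:.2f}")
print(f"round 2 average of same players: {second_round_avg:.2f}")
```

Under these assumptions the selected players still average better than par in the second round, because they are somewhat more talented than the field, but they give back most of their first-round advantage, because their first-round luck does not carry over.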


Connolly and Rendleman (2008, 2009) use this model of luck and skill, but offer more insight into the direct effect that it has on golfers. They found that the winner of a normal PGA Tour event experiences roughly 2.5 strokes per round of abnormally favorable random variation in scoring. Broadie and Rendleman (2015) went deeper in their analysis of luck and skill at all levels of golf by looking at how players’ performance changed from the first round to the second round of tournaments. They split players into two groups based on their first-round performance, with Group 1 being players in the top half and Group 2 being players in the bottom half. They then looked at how players in each group performed in the second round. They found that Group 1 as a collective performed much worse on the second day while Group 2 showed much improvement. This test showed clear evidence of regression to the mean between the first two rounds of professional golf tournaments. Their analysis also looked at how different skill levels are affected by luck and skill. They found that as the skill level of golfers decreases, from professionals to amateurs to the everyday country club golfer, variation in scores is more likely to be due to skill than to luck. This is known as the paradox of luck and skill.

Schall and Smith (2000) looked at regression to the mean in professional baseball players. Their analysis did not focus on the model of luck and skill, but used a very similar model for player performance. They did a season-by-season analysis of batting averages and earned run averages, standardized each season to have a mean of zero and a standard deviation of 1. They found that there was an imperfect correlation in performance from one year to the next. Because performance is an imperfect measure of ability, players’ batting averages and earned run averages regress towards the mean.


3. Data

This paper utilizes data obtained from the PGA Tour’s ShotLink database. The database has data on the overall results of tournaments as well as shot-by-shot data for every shot hit in competition play. The PGA Tour has hundreds of volunteers at each tournament to help with the collection of the shot-by-shot data. The Tour uses this shot-by-shot data to run analyses on players and tournaments, offering insight into how players perform, individually and as a group, across a number of different skill sets.

For this analysis, the shot-by-shot data is not necessary. This paper utilizes player scores during the first two rounds at The Masters Tournament as well as average first- and second-round scores for players throughout the entire season. Scores from the third and fourth rounds are not used, as they occur after a number of players are “cut” from the tournament. Data was pulled for the ten-year stretch from 2008 until 2017.

4. Methodology

This analysis differs from previous analyses in that it is a comparative analysis between the PGA Tour season and The Masters Tournament. I look to see if there is a significant difference in how players regress to the mean at The Masters compared to throughout the season. Regression to the mean is looked at from year to year as well as from round to round in a given year. A typical professional golf tournament consists of four rounds of tournament play, with poorer-performing players being cut following the second round. This paper focuses on the first two rounds of the tournament in order to include every player in the field for a given tournament. In order to see how players perform from one round to the next, this study uses a test very similar to the one performed by Broadie and Rendleman (2015). The second part of the analysis is to see how players perform across seasons. In order to run this analysis, this paper uses a model similar to that used by Schall and Smith (2000).

4.1 Round-By-Round Analysis

The round-by-round analysis compares how players perform from one round to the next during the PGA Tour season and at The Masters. In each setting, players are assigned to one of two groups after the first round of play. The top half (the players who shot the lowest scores) is placed in Group 1, and the bottom half is placed in Group 2. Then the average second-round score is computed for the same groups.

There are several different factors that go into the grouping of players. Players in the first group may simply be more skilled than those in the second group. Or it could be that the first group just experienced more favorable random variation, also known as “luck”. If only the skill of the player determined the groups, one would expect the players from Group 1 to have a second-round average score roughly the same number of strokes better than Group 2 as they did in the first round. If luck was the only factor in the first round, then one would expect the two groups to have averages that are close to equal in the second round. Finally, if a combination of luck and skill is what determines scores, then one would expect the difference between second-round scores to be smaller than the difference between first-round scores. The differences between groups are then compared between the PGA Tour season and The Masters. This comparison can be quantified by looking at the correlation between differences.
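The grouping test described above can be sketched in a few lines. This is a minimal illustration, not the code used for the paper: the function name and the eight score pairs are made up, ties in first-round score are broken by input order, and with an odd field the extra player falls into Group 2.

```python
from statistics import mean

def group_round_averages(scores):
    """scores: list of (round1, round2) tuples, one per player.

    Splits the field in half on first-round score and returns
    ((g1_round1_avg, g1_round2_avg), (g2_round1_avg, g2_round2_avg)).
    """
    ranked = sorted(scores, key=lambda s: s[0])  # lowest round-1 score first
    half = len(ranked) // 2
    group1, group2 = ranked[:half], ranked[half:]

    def averages(group):
        return mean(s[0] for s in group), mean(s[1] for s in group)

    return averages(group1), averages(group2)

# Hypothetical scores for an eight-player field.
scores = [(68, 71), (69, 70), (70, 72), (71, 70),
          (73, 72), (74, 73), (75, 71), (76, 74)]

(g1_r1, g1_r2), (g2_r1, g2_r2) = group_round_averages(scores)
first_round_gap = g2_r1 - g1_r1    # 74.5 - 69.5 = 5.0
second_round_gap = g2_r2 - g1_r2   # 72.5 - 70.75 = 1.75

print(first_round_gap, second_round_gap)
```

In this made-up field the gap between the groups shrinks from 5.0 strokes to 1.75 strokes without disappearing, which is the pattern expected when both luck and skill contribute to scores.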


4.2 Year-To-Year Analysis

In order to compare player scores from different years, performance can be standardized by taking the difference between a player’s performance in a given year and the mean performance for all players during that year, and then dividing this difference by the standard deviation of performance across all players for the season.
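As a rough illustration of this standardization, the sketch below converts one season of scoring averages into z-scores. The function name and the five scoring averages are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text does not say which is used.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: (value - season mean) / season standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]

# Hypothetical season scoring averages for five players.
season_averages = [70.0, 71.0, 72.0, 73.0, 74.0]
z_scores = standardize(season_averages)

print([round(z, 3) for z in z_scores])
```

After standardization every season has mean zero and standard deviation one, so a player’s z-score in one year can be compared directly with their z-score in another year even if overall scoring conditions changed.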

Following the work of Schall and Smith (2000), a player’s performance for a given year is determined by an expected value (x), which can be thought of as the player’s skill level or true ability. The player’s actual performance then differs from their true ability by a random term (E) that has an expected value of zero and is independent of skill as well as of the random term’s value in other seasons. This gives us the following equation:

𝑌 = 𝑥 + 𝐸

Once players’ scores are standardized, players’ performance can be compared from year to year and between the PGA Tour season and The Masters.
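To see why this model implies regression to the mean, it may help to work through the correlation between two seasons. The derivation below is a sketch that assumes, beyond what is stated above, that the random term has the same variance in both seasons. The subscripts 1 and 2 and the symbols \sigma_x^2, \sigma_E^2, \rho, and Z are notation introduced here for illustration.

```latex
\begin{align*}
Y_1 &= x + E_1, \qquad Y_2 = x + E_2 \\
\operatorname{Cov}(Y_1, Y_2) &= \operatorname{Var}(x) = \sigma_x^2 \\
\operatorname{Var}(Y_1) &= \operatorname{Var}(Y_2) = \sigma_x^2 + \sigma_E^2 \\
\rho &= \frac{\operatorname{Cov}(Y_1, Y_2)}{\sqrt{\operatorname{Var}(Y_1)\,\operatorname{Var}(Y_2)}}
      = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_E^2} \\
\hat{Z}_2 &= \rho \, Z_1
\end{align*}
```

Here Z_1 and Z_2 are the standardized performances in the two seasons and \hat{Z}_2 is the best linear prediction of the second given the first. Whenever \sigma_E^2 > 0, \rho is below 1, so the predicted standardized performance is closer to zero (the mean) than Z_1 was. A larger luck variance gives a smaller \rho and therefore more regression.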

5. Results

Analyses of the past 10 seasons show that regression to the mean at The Masters is not significantly different than it is during the PGA Tour season. If anything, there is more regression to the mean at The Masters than during the season. When looking at the difference between player score and the average score, the R-squared value at The Masters for the 2015 and 2016 seasons is .105. This compares with a value of .185 for the PGA Tour season. One can see that while both values are low, the R-squared for The Masters is noticeably lower than during the PGA Tour season.

When looking from round to round in 2015, the PGA Tour season shows regression to the mean, as expected, with an R-squared value of .131. The Masters showed an even smaller value. The R-squared for The Masters in 2015 is .00034, showing nearly no relationship between the first- and second-round scores of players. This seems to show the paradox of luck and skill, which has been seen in previous works.
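For a simple regression of second-round score on first-round score, R-squared equals the square of the Pearson correlation between the two rounds. The sketch below shows that calculation on five made-up score pairs. The function name and the data are hypothetical and are not drawn from the ShotLink data used in this paper.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical first- and second-round scores for five players.
round1 = [68, 70, 72, 74, 76]
round2 = [71, 70, 73, 72, 74]

r = pearson_r(round1, round2)
r_squared = r ** 2

print(f"r = {r:.2f}, R-squared = {r_squared:.2f}")
```

An R-squared near zero, such as the .00034 reported for The Masters in 2015, means first-round scores explain almost none of the variation in second-round scores, which is what heavy regression to the mean looks like in this measure.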

This lack of correlation between the scores of players between rounds is evident in the round-by-round analysis using two groups. Table 1a below shows that the groups converge towards the mean in the second round. This gives solid evidence confirming the work of Broadie and Rendleman (2015), who argue that a combination of luck and skill is what leads to total performance in professional golf. Furthermore, there was no significant difference between the groups at The Masters and during the regular PGA Tour season. During the PGA Tour season, players in the first group still have a lower score than those in the second group in the second round. This is not true for The Masters. At The Masters we see that the first group has a slightly negative correlation between the first and second rounds. Regression to the mean is so severe that Group 1 scores worse than the second group during the second round at The Masters. This seems to suggest that deviation in scores between groups at The Masters is caused solely by luck.

When comparing the correlation of first- and second-round scores between the different groups, one sees very little correlation for both groups. Maybe the most interesting part is the manner in which correlations fluctuate from year to year, as can be seen in Table 1b. For example, in 2015 Group 1 saw a fairly sizable positive correlation both during The Masters (.24) and during the season (.44), while the group’s correlation was nearly zero for all other seasons. Group 2, on the other hand, showed a positive correlation in 2016 during the season (.28) and a negative correlation of similar size at The Masters (-.22). The fact that the correlations are typically close to zero, and that they fluctuate year by year and group by group, goes to show just how random golf can be.

Looking at the correlation between rounds for the entire field at both The Masters and during the PGA season over the past 10 years further reveals the randomness between rounds. The PGA season is much more consistent than The Masters, with correlations fluctuating between .29 and .51 over the past 10 years. On the other hand, The Masters fluctuates from .08 to .47 over the same years. The PGA season has a higher correlation between rounds in 8 of the 10 seasons, again suggesting less regression to the mean during the season than during The Masters (Figure 1).

I then split players into two groups based on their average score on tour over the past four years. Group 1 consists of the top half of players over the period and Group 2 consists of the bottom half. The point of this was to split players into groups based on their true ability in order to determine if better players regress to the mean less than less-skilled players, with Group 1 being the better players and Group 2 being the less-skilled players. I then looked at how each group performed from the first to the second round at The Masters and during the entire PGA Tour season. I found that the players in Group 1 played the first round of The Masters nearly half a stroke better than the second round over the last three tournaments. This compares to them shooting .15 strokes better in the first round during the entire season over the past three years. On the other side, the second group shot nearly half a stroke better in the second round of The Masters than the first. This compares to scoring slightly better in the second round throughout the PGA Tour season. These larger differences between rounds at The Masters provide further evidence of more regression to the mean at The Masters than during the PGA Tour season.

Table 1a: Round-by-round comparison

While this test did not show any difference in regression to the mean between different skill groups, it did show that the groups performed much differently from round to round. The test shows evidence that the more skilled players on tour play better in the first round than the second round, and vice versa for less skilled players. This could be because the worse players have to play better to make the cut, or it could be caused by some other reason.
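The skill-group comparison can be sketched as follows. This is an illustrative outline only: the dictionary keys (`career_avg`, `round1`, `round2`), the function name, and the four players are hypothetical, and splitting at the median long-run average is one assumed way of forming a top half and a bottom half.

```python
from statistics import mean, median

def round_change_by_skill(players):
    """Average (round2 - round1) for the better and weaker halves of a field.

    players: list of dicts with hypothetical keys 'career_avg', 'round1',
    'round2'. A lower career_avg means a better player; a player exactly at
    the median is placed in the better half.
    """
    cutoff = median(p["career_avg"] for p in players)
    better = [p for p in players if p["career_avg"] <= cutoff]
    weaker = [p for p in players if p["career_avg"] > cutoff]

    def change(group):
        return mean(p["round2"] - p["round1"] for p in group)

    return change(better), change(weaker)

# Hypothetical four-player field.
players = [
    {"career_avg": 69.8, "round1": 70, "round2": 71},
    {"career_avg": 70.2, "round1": 69, "round2": 69},
    {"career_avg": 71.0, "round1": 73, "round2": 72},
    {"career_avg": 71.4, "round1": 74, "round2": 73},
]

better_change, weaker_change = round_change_by_skill(players)
print(better_change, weaker_change)
```

A positive change means the group scored worse in the second round and a negative change means it scored better, so this made-up field reproduces the direction of the pattern described above.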

Table 1b: Round-by-round correlation


6. Conclusion

Analyses show that there is not a significant difference in regression to the mean between The Masters Tournament and the PGA Tour season. This is apparent in both the round-to-round and the year-to-year analyses. It is of note that the number of observations is low because the average golf tournament has fewer than one hundred players.

One thing that is not controlled for in the round-by-round analysis is differing weather conditions. Players typically have one round in the morning and one round in the afternoon during the first two rounds of a tournament. On occasion there is an extreme difference in playing conditions between the morning and afternoon. This change in weather could be a cause of regression to the mean when looking at a single tournament. It is unlikely that this would be a factor when looking at the entire season.

Figure 1: Round-to-round correlation during PGA season and at The Masters from 2008-2017


The fact that at The Masters players from differing groups score practically the same in the second round suggests that scoring at The Masters is based more on luck than during the PGA season. This could be because it is much more difficult to qualify for The Masters than it is for regular events, meaning that the players at The Masters are closer in true ability than they are in a normal tournament.

If players at The Masters show more regression to the mean than during the season, then why is it that players like Phil Mickelson seem to perform better at The Masters? One explanation could be that Mickelson and other players simply match up well with Augusta. It is seen in other tournaments that players play better at certain courses. It could be that Mickelson just so happens to have a game that fits well with one of the most prestigious courses in the world.


7. References

(1) Broadie, Mark, and Richard Rendleman. “Are the Official World Golf Rankings Biased?” http://www.columbia.edu/~mnb2/Broadie/Assets/owgr_20120507_broadie_rendleman.Pdf, 7 May 2012.

(2) Connolly, Robert A., and Richard J. Rendleman, Jr. (2008), “Skill, Luck and Streaky Play on the PGA Tour,” Journal of the American Statistical Association, 103 (March): 74-88.

(3) Connolly, Robert A., and Richard J. Rendleman, Jr. (2012), “What it Takes to Win on the PGA Tour (If Your Name is ‘Tiger’ or If It Isn’t),” Interfaces, November-December, 42(6): 554-576.

(4) Galton, F. (1886), “Regression Towards Mediocrity in Hereditary Stature,” Journal of the Anthropological Institute, 15, 246-263.

(5) Kahneman, Daniel. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2013.

(6) Past Winners, 2018. www.masters.com/en_US/discover/past_winners.html.

(7) PGA. “What Is ShotLink Intelligence.” PGA Tour, 2005, www.pgatour.com/stats/shotlinkintelligence/overview.html.

(8) Schall, Teddy, and Gary Smith (2000), “Do Baseball Players Regress toward the Mean?,” The American Statistician, 54:4, 231-235.


8. Graphs and Figures

Figure 2: PGA round 1 comparison 2016-2017

Figure 3: Masters round 1 comparison 2016-2017


Figure 4: PGA round-to-round 2017

Figure 5: Masters round-to-round 2017


Table 2: PGA Tour groups for round comparison


Table 3: Masters groups for round comparison