
Yucesoy and Barabási EPJ Data Science (2016) 5:17 DOI 10.1140/epjds/s13688-016-0079-z

REGULAR ARTICLE Open Access

Untangling performance from success

Burcu Yucesoy 1 and Albert-László Barabási 1,2,3,4*

*Correspondence: [email protected]
1 Center for Complex Network Research and Department of Physics, Northeastern University, Boston, USA
2 Center for Cancer Systems Biology, Dana Farber Cancer Institute, Boston, USA
Full list of author information is available at the end of the article

Abstract
Fame, popularity and celebrity status, frequently used tokens of success, are often loosely related to, or even divorced from professional performance. This dichotomy is partly rooted in the difficulty to distinguish performance, an individual measure that captures the actions of a performer, from success, a collective measure that captures a community’s reactions to these actions. Yet, finding the relationship between the two measures is essential for all areas that aim to objectively reward excellence, from science to business. Here we quantify the relationship between performance and success by focusing on tennis, an individual sport where the two quantities can be independently measured. We show that a predictive model, relying only on a tennis player’s performance in tournaments, can accurately predict an athlete’s popularity, both during a player’s active years and after retirement. Hence the model establishes a direct link between performance and momentary popularity. The agreement between the performance-driven and observed popularity suggests that in most areas of human achievement exceptional visibility may be rooted in detectable performance measures.

Keywords: success; performance; popularity

© 2016 Yucesoy and Barabási. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

1 Introduction
Performance, representing the totality of objectively measurable achievements in a certain domain of activity, like the publication record of a scientist or the winning record of an athlete or a team, captures the actions of an individual entity [–]. In contrast, success, captured by fame, celebrity, popularity, impact or visibility, is a collective measure, representing a community’s reaction to and acceptance of an individual entity’s performance [, ]. The link between these two measures, while often taken for granted, is actually far from being understood and is often controversial and lopsided. Indeed, even the most profound scientific discovery goes unnoticed if its importance is not acknowledged through discussions, talks and citations by the scientific community. The void between success and performance is well illustrated by the concepts of ‘famesque’, ‘celebutante’ or ‘faminess’, used to label an individual without tangible performance, but ‘known for his well-knowingness’ []. These often prompt us to see fame and success as only loosely related to [–] and often divorced [–] from performance. This dichotomy is illustrated by documented examples of scientists whose popular media visibility significantly exceeds their scientific credentials [], or by countless celebrities, from the Kardashian sisters to athletes with no or only underwhelming accomplishments [–], as well as by high performers like David Beckham or Tiger Woods who are frequently featured in the media for reasons unrelated to their professional achievements [, ]. The source of this dichotomy is that in most areas of human achievement it is difficult to distinguish performance from success []. Indeed, while we can use citations, prizes and other measures to quantify the impact of a scientific discovery, we lack objective performance measures to capture the degree of innovation or talent characterizing a particular paper or a scientist.

Our goal here is to explore in a quantitative manner the relationship between performance, an individual measure, and success, a collective measure capturing the societal acknowledgement of a given level of performance. Previous research, some of it using Google search results as a proxy for fame, has suggested that performance can indeed drive success [–]. Here, by exploring the time-lines of both achievement and its recognition, we uncover how performance affects success over the career of an individual. We do so through sports, an area where performance is accurately recorded in terms of number of wins, place in rankings or career records [–].

Sports is characterized by an equally obsessive focus on popularity and fame, which strongly affects an athlete’s market value [], and previous research shows that fame can be an appropriate proxy for accomplishment []. Yet, in sports too, performance and success often follow different patterns, illustrated by the fact that only a small fraction of the earnings of a professional athlete is tied directly to his/her performance on the field, the vast majority coming from endorsements, determined more by the athlete’s perceived success and popularity. For example, only $. million of Roger Federer’s $. million reported income was from tournament prizes []; the rest came from endorsements tied to the public recognition of the athlete. Yet, Novak Djokovic, who was better ranked than Federer during , received over $. million prize money but only $ million via endorsements, about a third of Federer’s purse. The fact that professional performance does not uniquely determine reward is further illustrated by Anna Kournikova, who in was the second best paid female tennis player, despite never reaching higher than the th place in rankings. Her popularity and consequent financial prowess are often attributed to her photogenic appearance and media-friendly personality, traits outside of her sports performance. This and many other well documented cases of ‘faminess’ raise an important question: What performance factors affect popularity and how do they do so? In other words, can performance explain popularity and fame and if so, to what extent? These are fundamental issues in most areas that aim to fairly reward excellence, from science to education and business.

2 Performance and success in tennis
Professional tennis players accumulate score points based on their winning record during the previous year (Figure 1(A)). By ordering the players based on their score we obtain the official Association of Tennis Professionals (ATP) Rankings, r(t), an accurate measure of a tennis player’s performance relative to other players, having r = 1 for the top player while r is high for low-performing athletes.

We measure each player’s time-resolved visibility through the number of hourly visits to his Wikipedia page [–], a sensitive time-dependent measure of the collective interest in a player (Figure 1(B)). We define a player’s popularity or fame through his cumulative visibility, representing the average number of visits his Wikipedia page acquires during a year (red line in Figure 1(B)). Figure 1(C) indicates that top players can gather a total of visits in a year, while those at the bottom of the rankings collect as few as , documenting popularity differences between players that span orders of magnitude. As Figure 1(D) indicates, a player’s rank (performance) and Wikipedia visits (popularity) are correlated: the lower the rank number (i.e. the better the ranking), the higher the Wikipedia page-views. Yet, we also observe significant scattering: athletes ranked around r = , can gather anywhere from to visits per year, wide differences that support the impression that popularity is often divorced from performance.

Figure 1 Ranking and visibility. (A) The score distribution for the players on the ATP rankings list on December 31, 2012. (B) Page-views W(t) (blue) and the rank (green) for Novak Djokovic, where the time t corresponds to either the beginning of a tournament or every 17 days, a period slightly longer than the average duration of a Grand Slam. The red line indicates the yearly average page-views. The four marked data points correspond to the 2009 semi finals (n(t) = 6) of the US Open (V(t) = 2,000) when Djokovic lost against the top ranked Federer (Δr(t)H(Δr)/r(t) = 1/4), gaining him the highest visibility peak up to that point; the semi finals (n(t) = 4) at the Shanghai Masters (V(t) = 1,000) in 2010, for which H(Δr) = 0; winning (n(t) = 7) the US Open (V(t) = 2,000) against Nadal (Δr(t)H(Δr)/r(t) = 2/3), which led to an explosion in Djokovic’s popularity. His tallest peak was in 2012 when he won the Australian Open (V(t) = 2,000) while he was at the top of the rankings. (C) The distribution of the number of Wikipedia page-views for athletes active and retired during the year 2012. (D) Number of Wikipedia page-views for all active players during the year 2012 shown as a function of their ATP ranking on December 31, 2012. (Also see Fig. S1 for page-view counts of active players and how those relate to their score points).

To address the degree to which performance determines popularity, we reconstructed the number of Wikipedia visits W(t) for all ranked professional male tennis players between and (Figure 1(B)), together with their time-dependent history of achievements on the field (supplementary material S in Additional file 1). The two datasets allow us to identify several performance-related factors that influence a player’s visibility, and eventually his popularity:

(i) Rank r(t): Figure 2(A) shows the measured Wikipedia page-views W(t) vs. a player’s momentary rank r(t), indicating that the number of visits drops rapidly as the rank number of a player increases. It also shows that the variations in W(t) are larger for players with lower performance (i.e. higher rank) (Figure (B)).

(ii) Tournament value V(t): The more points V(t) a tournament offers to the winner, a measure of the tournament’s prestige, the more visibility it confers on its players, win or lose (Figure 2(B)). For example, the early peaks in Djokovic’s visibility correspond to his participation in the US and Australian Opens, two high value tournaments (Figure 1(B)).


Figure 2 The impact of performance on visibility. The Wikipedia page-views W(t) for all players vs. the various measures of individual performance: (A) the rank of a player r(t); (B) the tournament value V(t); (C) the number of matches the player participates in during a tournament n(t); (D) the rivals term Δr(t)H(Δr)/r(t); (E) the number of years active Y(t). The Wikipedia visits, each corresponding to Δt ≈ 17 day bins, are shown in grey while the red dots correspond to binned averages. In (C), up to n(t) = 5 (where most tournaments end) W(t) increases with n(t) (Figure 2(C)). The jump for n(t) = 6 and 7 corresponds to the semi-final and final matches of the most watched tennis tournaments, offering disproportionately more visibility. In (E), after 15 years the visibility drops slightly, indicating that players with very long professional careers no longer benefit from career longevity. (F) The contribution of each performance variable to visibility W(t) based on (2). Here Δr̃(t) ≡ Δr(t)H(Δr)/r(t) and the β-coefficients result from a multivariate regression analysis of the form log y = ∑_i a_i log x_i + c. The dependent variable y is the Wikipedia page-view W(t); x_i represents one of the five independent variables (r(t), V(t), n(t), e^{Δr(t)H(Δr)/r(t)}, Y(t)), and the a_i are the corresponding coefficients. All variables are individually significant (p < 0.001).

(iii) Number of matches n(t): The more matches an athlete plays within a tournament, reflecting his advance within the competition, the more exposure he receives (Figure 2(C)). As the losing player is eliminated, n(t) also determines the number of points the player collects within a tournament. Up to n(t) = 5 (where most tournaments end) we see a steady increase of W(t) with n(t). The jumps at n(t) = 6 and 7 exist only for the semi-final and final matches in the most watched tennis tournaments, which offer disproportionate visibility.

(iv) Rank of the best opponent: A match against a better ranked rival generates additional interest in a player. To capture this effect we measure the relative rank difference for each match, Δr(t)H(Δr)/r(t), where Δr(t) = r(t) − r_BR(t) is the difference between the rank of the considered player and that of his best rival in the tournament; the Heaviside step function H(Δr) is one when the opponent has a better rank and zero otherwise. The increase of the average page-views with Δr(t)H(Δr)/r(t) quantifies the boost in visibility from playing against a better athlete (Figure 2(D)), like the visibility peak of Djokovic when he played against the then #1 Federer in the US Open (Figure 1(B)) (see S for the detailed opponent statistics).

(v) Career length Y(t): The longer the player has been an active professional player, the more Wikipedia visits he collects (Figure 2(E)).
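The five performance variables can be gathered in a small record, with the rival term computed from the two ranks. The following is only a minimal sketch: the `TournamentRecord` layout, its field names and the example ranks are hypothetical illustrations, not the authors' data format.

```python
from dataclasses import dataclass


@dataclass
class TournamentRecord:
    """One player's appearance in one tournament (hypothetical layout)."""
    rank: int               # r(t): ATP rank at the time (1 = best)
    tournament_value: int   # V(t): points awarded to the tournament winner
    matches_played: int     # n(t): matches the player played in the tournament
    best_rival_rank: int    # r_BR(t): rank of the best-ranked opponent faced
    years_active: float     # Y(t): years as an active professional


def rival_term(rank: int, best_rival_rank: int) -> float:
    """Return Δr(t) H(Δr) / r(t), with Δr(t) = r(t) - r_BR(t)."""
    delta_r = rank - best_rival_rank
    # Heaviside step: 1 only when the rival has a better (smaller) rank.
    heaviside = 1.0 if delta_r > 0 else 0.0
    return delta_r * heaviside / rank


# Illustrative ranks only: a rank-10 player meeting a rank-5 rival.
underdog = TournamentRecord(rank=10, tournament_value=1000, matches_played=3,
                            best_rival_rank=5, years_active=4.0)
boost = rival_term(underdog.rank, underdog.best_rival_rank)

# A rank-2 player whose best opponent is ranked 10 gets no boost (H = 0).
no_boost = rival_term(rank=2, best_rival_rank=10)
```

Because the difference is divided by the player's own rank, the term stays between 0 and 1, so its exponential in the model acts as a bounded multiplicative boost.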


Figures 2(A)-(E) document clear correlations between the performance measures (i)-(v) and the momentary visibility of a player. Yet, a linear fit y(t) = Ax(t) + C of each individual performance measure to the observed Wikipedia page-views results in R² < ., except for 1/r(t), for which R² = .. Therefore, no individual performance measure can fully explain visibility, indicating that performance drives popularity through a combination of performance measures (see supplementary materials S in Additional file 1 for variable interdependencies). We therefore explored the predictive power of the sum of these variables with multipliers obtained via an ordinary least squares (OLS) fitting [] process, resulting in R² = ., only slightly better than 1/r(t) alone. We find, however, that a multiplicative process offers much better predictive power, with an OLS fit leading to the formula

$$W_M(t) = \frac{A}{r(t)}\, V(t)\, n(t)\, e^{\Delta r(t) H(\Delta r)/r(t)}\, Y(t) + C, \qquad (1)$$

which yields R² = .. This indicates that the influences of the performance measures (i)-(v) do not add up, but amplify each other in a multiplicative manner, allowing for the emergence of the extreme fluctuations in visibility observed in Figure 1(B) and Figures 2(A)-(E). By taking the logarithm of the multiplicative model and calculating the standardized β-coefficients for each term, we can evaluate how strongly each performance variable influences W(t) in units of standard deviation. Figure 2(F) shows the obtained standardized β-coefficients, indicating that rank is the strongest driving force of visibility, followed by the value (prestige) of the tournaments, the rank of the opponents and the number of matches the player played in a tournament. While career length contributes to a lesser degree, all terms are significant (p < 0.001).
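Standardized β-coefficients of a log-linear regression can be obtained by z-scoring every variable before an ordinary least squares fit. The sketch below illustrates that general technique with NumPy on synthetic data; the variable names, sample size and generating coefficients are assumptions made for the example and are unrelated to the paper's fitted values.

```python
import numpy as np


def standardized_betas(log_x: np.ndarray, log_y: np.ndarray) -> np.ndarray:
    """OLS on z-scored variables: one standardized coefficient per column."""
    zx = (log_x - log_x.mean(axis=0)) / log_x.std(axis=0)
    zy = (log_y - log_y.mean()) / log_y.std()
    # After z-scoring, all means are zero, so no intercept column is needed.
    betas, *_ = np.linalg.lstsq(zx, zy, rcond=None)
    return betas


# Synthetic, purely illustrative data: views fall with rank, rise with value.
rng = np.random.default_rng(0)
log_rank = rng.normal(size=2000)
log_value = rng.normal(size=2000)
log_views = -1.0 * log_rank + 0.5 * log_value + rng.normal(scale=0.1, size=2000)

betas = standardized_betas(np.column_stack([log_rank, log_value]), log_views)
```

Each coefficient then reads as the change in log page-views, in standard deviations, per standard deviation of the corresponding log-variable, which is what makes the terms comparable with one another.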

These results lead to a popularity model (PROMO), which predicts the time dependent visibility of a player W_M(t) using the athlete’s performance as an input,

$$W_M(t) = A\, \frac{Y(t)}{r(t)}\, V(t)\, n(t)\, e^{\Delta r(t) H(\Delta r)/r(t)} + C\, \frac{Y(t)}{r(t)}. \qquad (2)$$

The last term accounts for periods when the player is not playing (between tournaments, periods of injury, etc.). An OLS fitting process of this two-parameter PROMO results in R² = ., offering the best predictive accuracy of the models we tested (an analysis of all sub-models is provided in the supplementary materials S in Additional file 1). Hence (2) represents our main result, linking an athlete’s visibility W(t) to his performance captured by r(t), V(t), n(t), Δr(t)H(Δr)/r(t) and Y(t). While it is possible to improve the accuracy further by implementing additional measures, such as including a multiplier to n(t) for when n(t) ≥ , the increase in accuracy is not sufficient to justify the added complexity.
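Since the PROMO formula is linear in its two parameters A and C, they can be estimated with a standard linear least squares solve. The following is a minimal sketch of that idea, assuming NumPy; the function names and the synthetic inputs are hypothetical, and the values A = 2 and C = 50 are chosen only to check that the fit recovers them.

```python
import numpy as np


def promo_features(r, V, n, rival, Y):
    """Return the two regressors that multiply A and C in the PROMO formula."""
    slow = Y / r                             # Y(t)/r(t), shared by both terms
    active = slow * V * n * np.exp(rival)    # multiplies A; zero when V or n is 0
    return np.column_stack([active, slow])   # second column multiplies C


def fit_promo(features, observed_views):
    """Least-squares estimate of (A, C); the model is linear in both."""
    (A, C), *_ = np.linalg.lstsq(features, observed_views, rcond=None)
    return float(A), float(C)


def r_squared(observed, predicted):
    """Coefficient of determination of a prediction."""
    residual = np.sum((observed - predicted) ** 2)
    total = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / total


# Synthetic inputs (illustrative only); the last row mimics a period
# without a tournament, where V = n = 0 and only the C term remains.
r = np.array([1.0, 5.0, 20.0, 100.0])
V = np.array([2000.0, 1000.0, 250.0, 0.0])
n = np.array([7.0, 4.0, 2.0, 0.0])
rival = np.array([0.0, 0.2, 0.5, 0.0])
Y = np.array([8.0, 5.0, 3.0, 1.0])

X = promo_features(r, V, n, rival, Y)
views = X @ np.array([2.0, 50.0])            # noise-free views with A=2, C=50
A_hat, C_hat = fit_promo(X, views)
fit_quality = r_squared(views, X @ np.array([A_hat, C_hat]))
```

With real page-view data the fit would not be exact, and the R² returned by `r_squared` would be the quantity to compare across candidate models.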

We designed several tests to validate PROMO’s predictive power:

(i) In a prospective study we fit the r.h.s. of (2) to W(t) for and , obtaining the coefficients A = 3.747 and C = 7,929, the same for all athletes (see supplementary materials S in Additional file 1). We then use these parameters to predict the momentary visitations W_M(t) for the subsequent years (Figure 3(A)). We find that the model accurately predicts the bulk of the real data (Pearson correlation coefficient is r = . resulting in an R² of .). The deviation for very low Wikipedia page-views has a simple technical reason: beginning athletes often lack a dedicated Wikipedia page, hence their page-views are counted only indirectly and incompletely on Wikipedia (see supplementary materials S and S in Additional file 1). Once a player gets a Wikipedia page, his visibility approaches the model-predicted values.

Figure 3 Predicting visibility and popularity. (A) Scatter plot of predicted Wikipedia page-views W_M(t) and collected page-views W(t), using the parameters A = 3.747 and C = 7,929 determined from the page-view history for the first 2 years (training data). Error bars indicate prediction percentiles (10% and 90%) in each bin and are green if y = x lies between the two percentiles in that bin and red otherwise. The black circles are the average predicted page-views in that bin. (B) Comparison between Novak Djokovic’s observed Wikipedia page-views W(t) (blue) and his performance-based predicted page-views W_M(t). The model accurately captures the considerable lift in his career in 2011, when his page-views increased by about an order of magnitude. (C) The total predicted Wikipedia page-views for each player, capturing a player’s predicted popularity, compared to the actual total Wikipedia page-views in the whole period considered in our analysis. The symbol colors represent the best rank the player reached during the considered period. The highlighted points correspond to the most significant outliers, whose modified z-value for the logarithmic distance between the prediction and data exceeds 3.5. The corresponding athletes are identified in the inset. (D) Separating the slow modes W_S(t), driven by career length Y(t) and rank r(t) (purple), and fast modes W_F(t), driven by tournament value V(t), number of matches n(t) and the better-rival term e^{Δr(t)H(Δr)/r(t)} (orange).

(ii) Figure 3(B) compares the time dependent visibility W_M(t) to the real visitation W(t) for Novak Djokovic, indicating that the model (2) accurately captures not only the explosion of his overall popularity in 2011 following his exceptional performance on the field, but also the peaks in the visitation patterns. Comparing Figure 3(B) to Figure 1(B) helps clarify the lack of a significant increase in popularity when Djokovic first reached #1 in the rankings: rankings alone cannot explain visibility; we need the full predictive power of PROMO to unveil the underlying dynamics.

(iii) The model (2) allows us to separate the role of the different parameters: rank and career length act as slow modes, driving the popularity of an athlete and capturing his overall fame or celebrity (Figure 3(D)). In contrast, the tournament value V(t), the number of matches played within a tournament n(t), and the rank of the best opponent represent fast modes that drive the momentary visibility, like the timing and the height of the individual visibility peaks (Figure 3(D)). As the slow and fast modes get multiplied, together they can account for the major visibility peaks on a slowly varying background, which in turn determines an athlete’s overall fame.


(iv) To assess our ability to predict a player’s popularity or fame from his performance, we use the total page-views a player received across several years. Figure 3(C) shows a comparison between the predicted and observed popularity of each active player between and , indicating that the observed popularity W(t) closely follows the performance-driven predicted popularity W_M(t). A color-coding by the player’s peak rank reveals that for players who reached top rankings, the accuracy of the prediction is remarkable; scattering is only seen for lower ranked players.

To understand if popularity in tennis can be induced by factors unrelated to performance, we inspected the outliers, athletes whose observed popularity is significantly higher than their performance-based popularity. We find that the outliers highlighted in Figure 3(C) (also listed in Table S) are young players at the bottom of the rankings, who participated in only a few tournaments. An inspection of their careers reveals that their added popularity is also performance driven, rooted in outstanding results in junior or doubles tournaments, performance factors not considered in (2). For example Quinzi, Peliwo, Zverev and Golding reached number one or two in junior rankings, and Broady and Edmund had considerable successes in doubles. Only Marco Djokovic’s visibility could not be explained by his performance. His higher than expected fame is most likely related to the attention (and potential confusion) he earns as the brother of Novak Djokovic, one of the best active tennis players.

(v) Finally, we find that the Wikipedia pages of retired players continue to attract visitors (Figure 1(C)), prompting us to ask if a player’s enduring popularity can be explained by his past performance. Given that for retired players r(t), V(t), n(t) and Δr(t)H(Δr)/r(t) are not recorded, their visibility can be determined only by the second term in (2). By using the median r_med for r(t), reflecting a player’s overall performance during his active career, and Y_T for the number of active years Y(t), we predict an athlete’s popularity during retirement as

$$W_M^{\mathrm{Inact}}(t) = C\, \frac{Y_T}{r_{\mathrm{med}}}, \qquad (3)$$

using the same fitting parameter we had before (C = 7,929). As Figure 4(A) shows, the predicted popularity of inactive players is in excellent agreement with the measured popularity, indicating that past performance is the main source of their enduring fame, at least for several years following the player’s retirement (see supplementary materials S in Additional file 1 for outliers).
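Two of the validation steps above lend themselves to a short sketch: the retirement prediction C·Y_T/r_med, and the flagging of outliers by a modified z-score above 3.5 on the logarithmic distance between prediction and data. The code assumes the common median-absolute-deviation definition of the modified z-score (with the 0.6745 constant), which the text does not spell out; the career ranks and log-distances are invented for illustration.

```python
import numpy as np


def retired_popularity(C, active_years, career_ranks):
    """C * Y_T / r_med, with r_med the median of the ranks held while active."""
    return C * active_years / np.median(career_ranks)


def modified_z_scores(values):
    """Modified z-score built on the median absolute deviation (MAD).

    Assumes the MAD is non-zero, i.e. the values are not mostly identical.
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return 0.6745 * (values - median) / mad


# Hypothetical retired player: 10 active years, median career rank 5.
predicted = retired_popularity(C=7929, active_years=10, career_ranks=[3, 5, 8])

# Invented log10(observed / predicted) distances for six players.
log_distance = np.array([0.0, 0.1, -0.1, 0.05, -0.05, 2.0])
is_outlier = modified_z_scores(log_distance) > 3.5
```

Only points far above the prediction are flagged here, matching the focus on athletes whose observed popularity exceeds their performance-based popularity; a two-sided test would compare the absolute value of the score with the threshold.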

Finally we can apply the tools and insights developed above to explore the emergence of fame. For this we group players into ten categories based on their peak ranking during their career. The change in the average ranking of each group over time (Figures 4(B)-(C)) indicates that players who reach the best ranks distinguish themselves in their first tournaments, i.e. they climb in rank very fast early in their careers. This effect is particularly clear in Figure 4(C), which shows the rate of change in rankings during the first 20 tournaments, indicating that players who eventually reach the top rise much faster early on than the rest. Therefore, top ranks are not reached by a slow improvement in skill; instead, young players come in with a given skill set, some remarkable, others less so, and rapidly reach the vicinity of their skill-determined ranking level, where they fluctuate for most of their career (Figure 4(B)).


Figure 4 The emergence of fame. (A) The total predicted Wikipedia page-views for each retired player compared to the actual total Wikipedia page-views they collected after they stopped participating in tournaments, color-coded based on the best rank the player reached during his active career. (B) The evolution of player rankings vs. the number of tournaments they participated in, shown as averages over categories based on the best ranking an athlete achieved during his career. Dark red captures top 10 players and dark blue those whose best ranking is more than 100. (C) The change in the rankings for the first 20 tournaments for all players plotted with respect to their eventual best rank. (D) The projected daily average page-views of players based on a moving sum of three months preceding each tournament, averaged over categories based on best rank. (E) The projected average page-views of players vs. their average rank at the time of the tournaments for the first 100 tournaments of their career, again averaged over categories based on best rank. (F) The projected average page-views of players vs. their average rank at the time of the tournaments, for the duration of their careers. The first 100 tournaments are highlighted and the black dashed line represents a ⟨r⟩^(−2) fit.

Figure 4(D) shows the projected daily average Wikipedia visitation of players grouped based on their best rank (supplementary materials S in Additional file 1). It allows us to uncover a highly nonlinear relationship between rank and popularity (Figure 4(E)): popularity rises fast immediately following a player’s entry into the professional field, but this growth rate slows down between ranks , and . This is followed by an exploding popularity for the elite players, those that reach ranking and below. In other words, elite players benefit from a disproportionate popularity bonus, not accessible to other ranked players. Overall, we find a robust relationship between a player’s average popularity and rank: at the beginning of a player’s career popularity increases as 1/⟨r⟩^α, with α ≈ . ± ., indicating that in these early stages of an athlete’s career rank is the most important determinant of popularity (Figure 4(F)). After the first tournaments, however, the influence of rank on popularity is less pronounced and average popularity fluctuates in the vicinity of its top value.
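A power-law dependence of the form 1/⟨r⟩^α becomes a straight line in log-log coordinates, so the exponent can be read off as the negative slope of a linear fit. The sketch below shows this generic procedure with NumPy; the rank and page-view arrays are synthetic and constructed with α = 2 purely to check the fit, not taken from the paper's data.

```python
import numpy as np


def fit_power_law_exponent(mean_rank, mean_views):
    """Fit views ∝ 1/<r>^alpha by a straight line in log-log space."""
    slope, _intercept = np.polyfit(np.log(mean_rank), np.log(mean_views), deg=1)
    return -slope  # views fall with rank, so alpha is minus the slope


# Synthetic averages that follow an exact inverse-square law.
mean_rank = np.array([10.0, 50.0, 100.0, 500.0, 1000.0])
mean_views = 1.0e6 / mean_rank ** 2

alpha = fit_power_law_exponent(mean_rank, mean_views)
```

On noisy data an unweighted log-log fit gives every point equal weight regardless of how many tournaments it averages over, so binning choices can shift the estimate and its uncertainty.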

3 Conclusions
While we would like to believe that fame, visibility and popularity are uniquely determined by performance, representing well-deserved recognition for some sustained or singular achievement, a significant body of media research indicates otherwise, suggesting that fame follows patterns of its own, divorced from talent or performance [–]. Here we aimed to quantify the relationship between performance and popularity in an area where these two quantities can be individually measured. We did so by constructing a model to predict a tennis player’s visibility captured by his Wikipedia page-views, a proxy of the athlete’s popularity and fame. Taken together, we find that in tennis a player’s popularity and momentary visibility are uniquely determined by his performance on the court. The agreement is especially good for elite players. This indicates that for athletes exceptional performance offers exceptional visibility, a level that is hard to modulate by exogenous events. For less accomplished players we observe deviations from the performance-predicted popularity, suggesting that in this case visibility can be manipulated by exogenous events or personal attributes outside of the scope of sports performance. It is comforting, however, that for most outliers the extra visibility can be explained by performance factors not considered in our model, like achievements in doubles or junior tournaments. In short, the better the performance of an athlete, the more accurately it determines his popularity, and the lesser the role of exogenous factors. Finally, we find that the fame of retired players is also determined by their past performance, indicating that exceptional past performance can lead to a commensurate lasting legacy.

We expect that our methodology can be readily generalized to other areas where performance and visibility can be independently measured. The generalization to sports like chess, table tennis, golf or car racing should be straightforward as the ranking systems of these sports are similar to tennis. With careful adjustments it may also be appropriate for team sports, allowing us to systematically explore how the performance of a professional athlete is tied to his/her or the team’s collective success.

We would like to believe that in most areas of human achievement fame and visibility are determined by some underlying performance indicators. The scientific challenge, however, is to systematically separate performance from success, a fundamental goal from science to management. The excellent agreement between the performance-based and the observed popularity documented here makes us wonder to what degree faminess is real, suggesting that outstanding fame and popularity may be rooted in performance measures that are perhaps not yet accessible to us.

Additional material

Additional file 1: Supplementary materials. (pdf)

Competing interests
The authors declare that they have no competing interests.

Authors’ contributions
BY and A-LB designed the research. BY analyzed the data and prepared the figures. BY and A-LB prepared the manuscript.

Author details
1 Center for Complex Network Research and Department of Physics, Northeastern University, Boston, USA. 2 Center for Cancer Systems Biology, Dana Farber Cancer Institute, Boston, USA. 3 Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, USA. 4 Center for Network Science, Central European University, Budapest, Hungary.

Acknowledgements
We wish to thank Tamás Hámori for his guidance toward a deeper understanding of the tennis world, Kim Albrect for helpful visualizations and other colleagues at the CCNR, especially those in the success group, for valuable discussions and comments. This research was supported by Air Force Office of Scientific Research (AFOSR) under agreements FA9550-15-1-0077 and FA9550-15-1-0364.

Received: 18 January 2016 Accepted: 20 April 2016


References

1. Lehmann S, Jackson AD, Lautrup BE (2006) Measures for measures. Nature 444(7122):1003-1004
2. Radicchi F, Fortunato S, Castellano C (2008) Universality of citation distributions: toward an objective measure of scientific impact. Proc Natl Acad Sci USA 105(45):17268-17272
3. Grubb HJ (1998) Models for comparing athletic performances. J R Stat Soc, Ser D, Stat 47(3):509-521
4. Radicchi F (2011) Who is the best player ever? A complex network analysis of the history of professional tennis. PLoS ONE 6(2):e17249
5. Radicchi F (2012) Universality, limits and predictability of gold-medal performances at the Olympic Games. PLoS ONE 7(7):e40335
6. Uzzi B (2008) A social network’s changing statistical properties and the quality of human innovation. J Phys A, Math Theor 41(22):224023
7. Ke Q, Ferrara E, Radicchi F, Flammini A (2015) Defining and identifying Sleeping Beauties in science. Proc Natl Acad Sci USA 112(24):7426-7431
8. Boorstin DJ (2012) The image: a guide to pseudo-events in America. Vintage Books, New York
9. Van de Rijt A, Shor E, Ward C, Skiena S (2013) Only 15 minutes? The social stratification of fame in printed media. Am Sociol Rev 78(2):266-289
10. Gabler N (2001) Toward a new definition of celebrity. The Norman Lear Center, USC Annenberg
11. Andrews DL, Jackson SJ (2001) Sport stars: the cultural politics of sporting celebrity. Routledge, London
12. Argetsinger A (2013) Famesque: Amy Argetsinger on celebrities famous for being famous. Washington Post
13. Whannel G (2002) Media sport stars: masculinities and moralities. Routledge, London
14. Turner G (2004) Understanding celebrity. Sage, London
15. Evans J, Hesmondhalgh D (2005) Understanding media: inside celebrity. Open University Press, Milton Keynes
16. Ivaldi A, O’Neill SA (2008) Adolescents’ musical role models: whom do they admire and why? Psychol Music 36(4):395-415
17. Hall N (2014) The Kardashian index: a measure of discrepant social media profile for scientists. Genome Biol 15(7):424
18. McClain AS (2013) The Kardashian phenomenon: news interpretation. Media Report to Women 41(2)
19. Female athletes only famous for their looks. http://www.rantsports.com/clubhouse/2014/04/04/15-hot-female-athletes-who-are-only-famous-for-their-looks/. Accessed 9 Dec 2014
20. Celebrities who are famous for doing nothing. http://www.ajc.com/gallery/entertainment/celebrities-who-are-famous-doing-nothing/gCDry/. Accessed 9 Dec 2014
21. Vincent J, Hill JS, Lee JW (2009) The multiple brand personalities of David Beckham: a case study of the Beckham brand. Sport Mark Q 18(3):173-180
22. Davie WR, King CR, Leonard DJ (2010) A media look at Tiger Woods - two views. J Sports Media 5(2):107-116
23. Murray C (2003) Human accomplishment: the pursuit of excellence in the arts and sciences, 800 BC to 1950. Harper Collins, New York
24. Rosen S (1981) The economics of superstars. Am Econ Rev 71(5):845-858
25. Hamlen WA Jr (1991) Superstardom in popular music: empirical evidence. Rev Econ Stat 73(4):729-733
26. Bagrow JP, Rozenfeld HD, Bollt EM, ben-Avraham D (2004) How famous is a scientist? - Famous to those who know us. Europhys Lett 67(4):511
27. Bagrow JP, ben-Avraham D (2005) On the Google-fame of scientists and other populations. In: Modeling cooperative behavior in the social sciences. AIP conference proceedings, vol 779, pp 81-89
28. Adler M (2006) Stardom and talent. In: Ginsburg VA, Throsby D (eds) Handbook of the economics of art and culture, vol 1. Elsevier, Amsterdam, pp 895-906
29. Simkin M, Roychowdhury V (2011) Von Richthofen, Einstein and the AGA. Significance 8(1):22-26
30. Simkin MV, Roychowdhury VP (2013) A mathematical theory of fame. J Stat Phys 151(1-2):319-328
31. Simkin MV, Roychowdhury VP (2015) Chess players’ fame versus their merit. Appl Econ Lett 22(18):1499-1504
32. Sire C, Redner S (2009) Understanding baseball team standings and streaks. Eur Phys J B 67(3):473-481
33. Merritt S, Clauset A (2014) Scoring dynamics across professional team sports: tempo, balance and predictability. EPJ Data Sci 3(1):1
34. Clauset A, Kogan M, Redner S (2015) Safe leads and lead changes in competitive team sports. Phys Rev E 91:062815
35. Lago-Ballesteros J, Lago-Peñas C (2010) Performance in team sports: identifying the keys to success in soccer. J Human Kinet 25:85-91
36. Petersen AM, Jung W-S, Yang J-S, Stanley HE (2011) Quantitative and empirical demonstration of the Matthew effect in a study of career longevity. Proc Natl Acad Sci USA 108(1):18-23
37. Motegi S, Masuda N (2012) A network-based dynamical ranking system for competitive sports. Sci Rep 2:904
38. Vaz de Melo POS, Almeida VAF, Loureiro AAF, Faloutsos C (2012) Forecasting in the NBA and other team sports: network effects in action. ACM Trans Knowl Discov Data 6(3):13
39. Herm S, Callsen-Bracker H-M, Kreis H (2014) When the crowd evaluates soccer players’ market values: accuracy and evaluation attributes of an online community. Sport Manag Rev 17(4):484-492
40. Yu AZ, Ronen S, Hu K, Lu T, Hidalgo CA (2016) Pantheon 1.0, a manually verified dataset of globally famous biographies. Sci Data 3:150075
41. The world’s 100 highest-paid athletes 2013. http://www.forbes.com/pictures/mli45mmlg/the-worlds-100-highest-paid-athletes-4/. Accessed 1 Dec 2014
42. Spoerri A (2007) What is popular on Wikipedia and why? First Monday 12(4)
43. Mestyán M, Yasseri T, Kertész J (2013) Early prediction of movie box office success based on Wikipedia activity big data. PLoS ONE 8(8):e71226
44. Keegan B, Gergle D, Contractor N (2013) Hot off the Wiki: Structures and dynamics of Wikipedia’s coverage of breaking news events. Am Behav Sci 57(5):595-622
45. Cohen J, Cohen P, West SG, Aiken LS (2003) Applied multiple regression/correlation analysis for the behavioral sciences. Lawrence Erlbaum Associates, Mahwah