Fake It Till You Make It: Reputation, Competition, and Yelp Review Fraud

Michael Luca, Harvard Business School, [email protected]
Georgios Zervas, Boston University, [email protected]

September 17, 2013

Abstract

Consumer reviews are now part of everyday decision-making. Yet the credibility of reviews is fundamentally undermined when business owners commit review fraud, either by leaving positive reviews for themselves or negative reviews for their competitors. In this paper, we investigate the extent and patterns of review fraud on the popular consumer review platform Yelp.com. Because one cannot directly observe which reviews are fake, we focus on reviews that Yelp's algorithmic indicator has identified as fraudulent. Using this proxy, we present four main findings. First, roughly 16 percent of restaurant reviews on Yelp are identified as fraudulent, and these tend to be more extreme (favorable or unfavorable) than other reviews. Second, a restaurant is more likely to commit review fraud when its reputation is weak, i.e., when it has few reviews or has recently received bad reviews. Third, chain restaurants, which benefit less from Yelp, are also less likely to commit review fraud. Fourth, when restaurants face increased competition, they become more likely to leave unfavorable reviews for competitors. Taken in aggregate, these findings highlight the extent of review fraud and suggest that a business's decision to commit review fraud responds to competition and reputation incentives rather than simply the restaurant's ethics.

Part of this work was completed while the author was supported by a Simons Foundation Postdoctoral Fellowship.

1 Introduction

Consumer review websites such as Yelp, TripAdvisor, and Angie's List have become increasingly popular over the past decade, and now exist for nearly any product or service imaginable. Yelp alone contains more than 30 million reviews of restaurants, barbers, mechanics, and other services, and has a market capitalization in excess of four billion dollars. Moreover, there is mounting evidence that these reviews have a direct influence on product sales (see Chevalier and Mayzlin (2006), Luca (2011), Zhu and Zhang (2010)).

As the popularity of these platforms has grown, so have concerns that the credibility of reviews can be undermined by businesses leaving fake reviews for themselves or for their competitors. There is considerable anecdotal evidence that this type of cheating is endemic in the industry. For example, the New York Times recently reported on the case of businesses hiring workers on Mechanical Turk, an Amazon-owned crowdsourcing marketplace, to post fake 5-star Yelp reviews on their behalf for as little as 25 cents per review (see "A Rave, a Pan, or Just a Fake?" by David Segal, May 2011, available at http://www.nytimes.com/2011/05/22/your-money/22haggler.html). In 2004, Amazon.ca unintentionally revealed the identities of anonymous reviewers, briefly unmasking considerable self-reviewing by book authors (see "Amazon reviewers brought to book" by David Smith, February 2004, available at http://www.guardian.co.uk/technology/2004/feb/15/books.booksnews).

In recent years, review fraud has emerged as the preeminent threat to the sustainability of this type of user-generated content. Despite the major challenge that review fraud poses for firms and consumers alike, little is known about the economic incentives behind it. In this paper, we assemble a novel dataset from Yelp, one of the industry leaders, to estimate the incidence of review fraud among restaurants in the Boston metropolitan area, and to understand the conditions under which it is most prevalent. Specifically, we provide evidence on the impact of a restaurant's evolving reputation, and of the competition it faces, on its decision to engage in positive and negative review fraud.

A growing computer science literature has developed data-mining algorithms that leverage observable review characteristics, such as textual features and reviewers' social networks, to identify abnormal reviewing patterns (for example, see Akoglu et al. (2013)). A related strand of the literature has focused on constructing a gold standard for fake reviews that can be used as training input for fake-review classifiers. For example, Ott et al.
(2012) construct such a fake-review corpus by hiring users on Mechanical Turk, an online labor market, to explicitly write fake reviews.

In this paper, we analyze fake reviews from a different perspective, and investigate a business's incentives to engage in review fraud. We analyze these issues using data from Yelp.com, focusing on reviews that have been written for restaurants in the Boston metropolitan area. Empirically, identifying fake reviews is difficult because the econometrician does not directly observe whether a review is fake. As a proxy for fake reviews, we use the results of Yelp's filtering algorithm, which predicts whether a review is genuine or fake. Yelp uses this algorithm to flag fake reviews and to filter them off of the main Yelp page (we have access to all reviews that do not directly violate terms of service, regardless of whether they were filtered). The exact algorithm is not public information, but the results of the algorithm are. With this in hand, we can analyze the patterns of review fraud on Yelp.

Overall, roughly 16% of reviews are identified by Yelp as fake and are subsequently filtered. What does a filtered review look like? We first consider the distribution of star ratings. The data show that filtered reviews tend to be more extreme than published reviews. This observation relates to a broader literature on the distribution of opinion in user-generated content. Li and Hitt (2008) show that the distribution of reviews for many products tends to be bimodal, with reviews tending toward 1 and 5 stars and relatively little mass in the middle. Their argument is that this can be explained through selection if people are more likely to leave a review after an extreme experience. Our results suggest that another factor increasing the proportion of extreme reviews is the prevalence of fake reviews.

Does review fraud respond to economic incentives, or is it driven mainly by a small number of unethical restaurants that are intent on gaming the system regardless of the situation? If review fraud is driven by incentives, then we should see a higher concentration of fraudulent reviews when the incentives are stronger. Theoretically, restaurants with worse or less established reputations have a stronger incentive to game the system. Consistent with this, we find that a restaurant's reputation is a strong predictor of its decision to leave a fake review. We also find that restaurants are more likely to engage in fraud when they have fewer reviews. This result motivates an additional explanation for the often-observed concentration of high ratings early in a business's life cycle. Previous work attributed this concentration to interactions between a consumer's decision to contribute a review and prior reviews by other consumers (see Moe and Schweidel (2012), Godes and Silva (2012)). Our work suggests businesses' increased incentives to manipulate their reviews early on as an additional explanation for this observation. We also find that restaurants that have recently received bad ratings engage in more positive review fraud. By contrast, we find no link between negative review fraud and reputation. To some extent this is to be expected, since these are fake reviews that are presumably left by a business's competitors.

We also find that a restaurant's offline reputation is a determinant of its decision to seek positive fake reviews. In particular, Luca (2011) finds that consumer reviews are less influential for chain restaurants, which already have firmly established reputations built by extensive marketing and branding.
Jin and Leslie (2009) find that organizational form also affects a restaurant's performance in hygiene inspections. Consistent with this, we find that chain restaurants are less likely to leave fake reviews relative to independent restaurants. This contributes to our understanding of the ways in which a business's reputation affects its incentives with respect to review fraud.

In addition to leaving reviews for itself, a restaurant may commit review fraud by leaving a negative review for a competitor. Empirically, we find that increased competition from nearby independent restaurants serving similar types of food is positively associated with negative review fraud. Interestingly, increased competition from nearby restaurants serving different types of food is not a significant predictor of negative review fraud. A potential explanation for this finding is that restaurants tend to compete based on a combination of location and cuisine. This explanation echoes the survey of Auty (1992), in which diners ranked food type and quality highest among a list of restaurant selection criteria, suggesting that restaurants do not compete on the basis of location alone. Our results are also consistent with the analysis of Mayzlin et al. (2012), who find that hotels with independently-owned neighbors are more likely to receive negative fake reviews.

Overall, our findings suggest that positive review fraud is primarily driven by changes in a restaurant's own reputation, while negative review fraud is primarily driven by changing patterns of competition. For platforms looking to curtail gaming, our results provide insight both into the extent of gaming and into the circumstances in which it is more prevalent. Our results also inform our understanding of ethical decision-making by firms, which we show to be a function of economic incentives rather than a function of unethical firms. Finally, our work is closely related to the literature on organizational form, showing that the incentives of independent restaurants are quite different from the incentives of chains.

2 Related work

There is now extensive evidence that consumer reviews have a causal impact on demand in industries ranging from books to restaurants to hotels, among others (Chevalier and Mayzlin (2006), Luca (2011), Ghose et al. (2012)). However, there is considerably less agreement about whether reviews contain trustworthy information that customers should use. For example, Li and Hitt (2008) argue that earlier reviewers tend to be more favorable toward a product relative to later reviewers, making reviews less representative of the typical buyer and hence less reliable. Looking at movie sales and reviews, Dellarocas et al. (2010) provide evidence that consumers are more likely to review niche products, but at the same time are more likely to leave a review if many other reviewers have contributed, suggesting other types of bias that may appear in reviews. In principle, if one knows the structure of any given bias that reviews exhibit, Dai et al. (2012) argue that the review platform can improve the reliability of information by simply adjusting and reweighting reviews to make them more representative.

Perhaps the most direct challenge to the reliability of online information is the possibility of leaving fake reviews. Theoretically, Dellarocas (2006) provides conditions under which reviews can still be informative even if there is gaming. In concurrent but independent work, Anderson and Simester (2013) show that many reviews on an online apparel platform are written by customers who have no purchase record. These reviews tend to be more negative than other reviews. In contrast with our setting, where we look for economic incentives to commit review fraud, their work highlights reviews that are written by people without any clear financial incentive to leave a fake review.
Within computer science, a growing literature has focused on the development of machine learning algorithms to identify review fraud. Commonly, these algorithms rely either on data-mining techniques to identify abnormal reviewing patterns, or on supervised learning methods trained on hand-labelled examples. Some examples are Ott et al. (2012), Feng et al. (2012), Akoglu et al. (2013), Mukherjee et al. (2011, 2012), and Jindal et al. (2010).

To the best of our knowledge, there exists only one other paper that analyzes the economic incentives of review fraud, and it uses a different empirical approach and setting. Mayzlin et al. (2012) exploit an organizational difference between Expedia and TripAdvisor (which are spin-offs of the same parent company with different features; see http://www.iac.com/about/history) to study review fraud by hotels: while anyone can post a review on TripAdvisor, Expedia requires that a guest has paid and stayed before submitting a review. The authors observe that Expedia's verification mechanism increases the cost of posting a fake review. The study finds that independent hotels tend to have a higher proportion of five-star reviews on TripAdvisor relative to Expedia, and that competitors of independent hotels tend to have a higher proportion of one-star reviews on TripAdvisor relative to Expedia. Their argument is that there are many reasons that TripAdvisor reviews may differ from Expedia reviews, and many reasons that independent hotels may receive different reviews from chain hotels. However, they argue that if independent hotels receive favorable treatment relative to chains on TripAdvisor but not on Expedia, then this suggests that these reviews on TripAdvisor are fraudulent. The validity of this measure rests on the assumption that differences in the distributions of ratings across the two sites, and between different types of hotels, are due to review fraud. In our work, we do not rely on this assumption, as we identify our effects entirely within a single review platform. Our work also differs in that we are able to exploit the panel nature of our data, and analyze the within-restaurant role of reputation in the decision to commit review fraud. This allows us to show not only that certain types of restaurants are more likely to commit review fraud, but also that even within a restaurant, economic conditions influence the ultimate decision to commit review fraud.

Finally, we briefly mention the connection between our work, the literature on statistical fraud detection (e.g., see Bolton and Hand (2002)), and the related line of research on statistical models of misclassification in binary data (e.g., see Hausman et al. (1998)). These methods have been applied to uncover various types of fraud, such as fraudulent insurance claims and fraudulent credit card transactions. The key difference of our work is that our end goal is not to identify individual fraudulent reviews; instead, we wish to exploit a noisy signal of review fraud to investigate the incentives behind it.

3 Description of Yelp and Filtered Reviews

3.1 About Yelp

Our analysis investigates reviews from the website Yelp.com, which is a consumer review platform where users can review local businesses such as restaurants, bars, hair salons, and many other services. At the time of this study, Yelp receives approximately 100 million unique visitors per month, and counts over 30 million reviews in its collection. It is the dominant review site for restaurants.
For these reasons, Yelp is a compelling setting in which to investigate review fraud. For a more detailed description of Yelp in general, see Luca (2011).

In this analysis, we focus on restaurant reviews in the metropolitan area of Boston, MA. We include in our analysis every Yelp review that was written from the founding of Yelp in 2004 through 2012, other than the roughly 1% of reviews that violate Yelp's terms of service (for example, reviews that contain offensive or discriminatory language). In total, our dataset contains 316,415 reviews for 3,625 restaurants. Of these reviews, 50,486 (approximately 16%) have been filtered by Yelp. Figure 1a displays quarterly totals of published and filtered reviews on Yelp. Both Yelp's growth in terms of the number of reviews posted on it, and the increasing number of reviews that are filtered, are evident in this figure.

[Figure 1: Reviewing activity for Boston restaurants from Yelp's founding through 2012. (a) Published and filtered review counts by quarter. (b) Percentage of filtered reviews by quarter.]

3.2 Fake and Filtered Reviews

The main challenge in empirically identifying review fraud is that we do not directly observe whether a review is fake. The situation is further complicated by the lack of a single standard for what makes a review fake. The Federal Trade Commission's truth-in-advertising rules (see "Guides Concerning the Use of Endorsements and Testimonials in Advertising", available at http://ftc.gov/os/2009/10/091005revisedendorsementguides.pdf) provide some useful guidelines: reviews must be truthful and substantiated, non-deceptive, and any material connection between the reviewer and the business being reviewed must be disclosed. For example, reviews by the business owner, his or her family members, competitors, reviewers who have been compensated, or a disgruntled ex-employee are considered fake (and, by extension, illegal) unless these connections are disclosed. Not every review can be as unambiguously classified. The case of a business owner nudging consumers by providing them with instructions on how to review his business is in a legal grey area. Most review sites, whose objective is to collect reviews that are as objective as possible, frown upon such interventions and encourage business owners to avoid them.

To work around the limitation of not observing fake reviews, we exploit a unique Yelp feature: Yelp is the only major review site we know of that allows access to filtered reviews, i.e., reviews that Yelp has classified as illegitimate using a combination of algorithmic techniques, simple heuristics, and human expertise. Filtered reviews are not published on Yelp's main listings, and they do not count towards calculating a business's average star rating. Nevertheless, a determined Yelp visitor can see a business's filtered reviews after solving a puzzle known as a CAPTCHA. (A CAPTCHA is a puzzle originally designed to distinguish humans from machines, commonly implemented by asking users to accurately transcribe a piece of text that has been intentionally blurred, a task that is easier for humans than for machines. Yelp uses CAPTCHAs to make access to filtered reviews harder for both humans and machines. For more on CAPTCHAs, see Von Ahn et al. (2003).) Filtered reviews are, of course, only imperfect indicators of fake reviews. Our work contributes to the literature on review fraud by developing a method that uses an imperfect indicator of fake reviews to empirically identify the circumstances under which fraud is prevalent.
This technique translates to other settings where such an imperfect indicator is available, and relies on the following assumption: that the proportion of fake reviews is strictly smaller among the reviews Yelp publishes than among the reviews Yelp filters. We consider this to be a modest assumption whose validity can be qualitatively evaluated. In Section 4, we formalize the assumption, suggest a method of evaluating its validity, and use it to develop our empirical methodology for identifying the incentives of review fraud.

3.3 Characteristics of filtered reviews

To the extent that Yelp is a content aggregator rather than a content creator, there is a direct interest in understanding reviews that Yelp has filtered. While Yelp purposely makes the filtering algorithm hard to reverse engineer, we are able to test for differences in the observed attributes of published and filtered reviews.

Figure 1b displays the proportion of reviews that have been filtered by Yelp over time. The spike at the beginning results from the small sample of reviews posted in the corresponding quarters. After this, there is a clear upward trend in the prevalence of what Yelp considers to be fake reviews. Yelp retroactively filters reviews using the latest version of its detection algorithm; therefore, a Yelp review can be initially filtered but subsequently published (and vice versa). Hence, the increasing trend seems to reflect the growing incentives for businesses to leave fake reviews as Yelp grows in influence, rather than improvements in Yelp's fake-review detection technology.

Should we expect the distribution of ratings for a given restaurant to reflect the unbiased distribution of consumer opinions? The answer to this question is likely no. Empirically, Hu et al. (2006) show that reviews on Amazon are highly dispersed, and in fact often bimodal (roughly 50% of products on Amazon have a bimodal distribution of ratings). Theoretically, Li and Hitt (2008) point to the fact that people choose which products to review, and may be more likely to rate products after having an extremely good or bad experience. This would lead reviews to be more dispersed than actual consumer opinion. This selection of consumers can undermine the quality of information that consumers receive from reviews.

We argue that fake reviews may also contribute to the large dispersion that is often observed in consumer ratings. To see why, consider what a fake review might look like: fake reviews may consist of a business leaving favorable reviews for itself, or unfavorable reviews for its competitors. There is little incentive for a business to leave a mediocre review. Hence, the distribution of fake reviews should tend to be more extreme than that of legitimate reviews. Figure 2a shows the distributions of published and filtered reviews on Yelp. The contrast between the two distributions is consistent with these predictions. Legitimate reviews are unimodal, with a sharp peak at 4 stars. By contrast, the distribution of fake reviews is bimodal, with spikes at 1 star and 5 stars. Hence, in this context, fake reviews appear to exacerbate the dispersion that is often observed in online consumer ratings.

[Figure 2: Characteristics of filtered reviews. (a) Distribution of star ratings by published status. (b) Percentage of filtered reviews by user review count.]

In Figure 2b, we break down individual reviews by the total number of reviews their authors have written, and display the percentage of filtered reviews for each group. The trend we observe suggests that Yelp users who have contributed more reviews are less likely to have their reviews filtered.

We estimate the characteristics of filtered reviews in more detail using the following linear probability model:

Filtered_ij = b_i + x'_ij β + ε_ij,    (1)

where the dependent variable Filtered_ij indicates whether the j-th review of business i was filtered, b_i is a business fixed effect, and x_ij is a vector of review and reviewer characteristics including: star rating, (log of) length in characters, (log of) total number of reviewer reviews, and a dummy for the reviewer having a Yelp profile picture. The estimation results are shown in the first column of Table 1. In line with our observations so far, we find that reviews with extreme ratings are more likely to be filtered: all else equal, 1- and 5-star reviews are roughly 3 percentage points more likely to be filtered than 3-star reviews. We also find that Yelp's review filter is sensitive to the review and reviewer attributes included in our model. For example, longer reviews, or reviews by users with a larger review count, are less likely to be filtered. Beyond establishing some characteristics of Yelp's filter, this analysis also points to the need to control for potential algorithmic biases when using filtered reviews as a proxy for fake reviews. We explain our approach to dealing with this issue in Section 4.

3.4 Filtered Reviews and Advertising on Yelp

Local business advertising constitutes Yelp's major revenue stream. Advertisers are featured on Yelp search results pages in response to relevant consumer queries, and on the Yelp pages of similar, nearby businesses. Furthermore, when a business purchases advertising, Yelp removes competitors' ads from that business's Yelp page. Over the years, Yelp has been the target of repeated complaints alleging that its filter discriminates in favor of advertisers, going in some cases as far as claiming that the filter is nothing other than an extortion mechanism for advertising revenue (in response, see "No, Yelp Doesn't Extort Small Businesses. See For Yourself.", available at http://officialblog.yelp.com/2013/05/no-yelp-doesnt-extort-small-businesses-see-for-yourself.html). Yelp has denied these allegations, and has successfully defended itself in court when lawsuits have been brought against it (for example, see Levitt v. Yelp Inc., and Demetriades v. Yelp Inc.). If such allegations were true, they would raise serious concerns as to the validity of using filtered reviews as a proxy for fake reviews in our analysis.

Using our dataset, we are able to cast further light on this issue. To do so, we exploit the fact that Yelp publicly discloses which businesses are current advertisers (it does not disclose which businesses were advertisers historically). Specifically, we augment Equation 1 by interacting the x_ij variables with a dummy variable indicating whether a business was a Yelp advertiser at the time we obtained our dataset. The results of estimating this model are shown in the second column of Table 1. We find that none of the advertiser interaction effects are statistically significant, while the remaining coefficients are essentially unchanged in comparison to those in Equation 1. This suggests, for example, that neither 1- nor 5-star reviews were significantly more or less likely to be filtered for businesses that were advertising on Yelp at the time we collected our dataset.

This analysis has some clear limitations. First, since we do not observe the complete historic record of which businesses have advertised on Yelp, we can only test for discrimination in favor of (or against) current Yelp advertisers. Second, we can only test for discrimination in the present breakdown between filtered and published reviews. Third, our test obviously possesses no power whatsoever in detecting discrimination unrelated to filtering decisions. Therefore, while our analysis provides some suggestive evidence against the theory that Yelp favors advertisers, we stress that it is neither exhaustive nor conclusive. It is beyond the scope of this paper, and outside the capacity of our dataset, to evaluate all the ways in which Yelp could favor advertisers.
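To make the review-level specification in Equation (1), and the advertiser-interaction variant described in this subsection, concrete, the following is a minimal sketch of how such a linear probability model with business fixed effects might be estimated. It is not the authors' code: the data layout and column names (business_id, filtered, stars, n_chars, reviewer_review_count, has_photo) are hypothetical, and the fixed effect is absorbed by within-business demeaning. An advertiser interaction could be added analogously by multiplying each covariate with an advertiser dummy.

```python
# Minimal sketch (assumed data layout, not the authors' code) of the review-level linear
# probability model in Equation (1): Filtered_ij = b_i + x'_ij beta + e_ij.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def estimate_filter_lpm(reviews: pd.DataFrame):
    """reviews: one row per review with hypothetical columns
    ['business_id', 'filtered', 'stars', 'n_chars', 'reviewer_review_count', 'has_photo']."""
    df = reviews.copy()
    # Star-rating dummies, with 3-star reviews as the omitted category.
    for s in (1, 2, 4, 5):
        df[f"stars_{s}"] = (df["stars"] == s).astype(float)
    df["log_length"] = np.log(df["n_chars"])
    df["log_reviewer_reviews"] = np.log1p(df["reviewer_review_count"])
    xcols = ["stars_1", "stars_2", "stars_4", "stars_5",
             "log_length", "log_reviewer_reviews", "has_photo"]
    # Absorb the business fixed effect b_i with a within (demeaning) transformation.
    cols = ["filtered"] + xcols
    within = df[cols].astype(float) - df.groupby("business_id")[cols].transform("mean")
    # Cluster standard errors by business to allow correlated errors within a business.
    ols = sm.OLS(within["filtered"], within[xcols])
    return ols.fit(cov_type="cluster", cov_kwds={"groups": df["business_id"]})
```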
4 Empirical Strategy

In this section, we introduce our empirical strategy for identifying review fraud on Yelp. Ideally, if we could recognize fake reviews, we would estimate the following regression model:

f_it = x'_it β + b_i + τ_t + ε_it    (i = 1, ..., N; t = 1, ..., T),    (2)

where f_it is the number of fake reviews business i received during period t, x_it is a vector of time-varying covariates measuring a business's economic incentives to engage in review fraud, β is the vector of structural parameters of interest, b_i and τ_t are business and time fixed effects, and ε_it is an error term. The inclusion of business fixed effects allows us to control for unobservable, time-invariant, business-specific incentives for Yelp review fraud. For example, Mayzlin et al. (2012) find that the management structure of hotels in their study is associated with review fraud. To the extent that management structure is time-invariant, business fixed effects allow us to control for this unobservable characteristic. Hence, when looking at incentives to leave fake reviews over time, we include restaurant fixed effects. However, we also run specifications without a restaurant fixed effect so that we can analyze time-invariant characteristics as well. Similarly, the inclusion of time fixed effects allows us to control for unobservable time-varying shocks that are common across businesses.

As is often the case in studies of gaming and corruption (e.g., see Mayzlin et al. (2012), Duggan and Levitt (2002), and references therein), we do not directly observe f_it, and hence we cannot estimate the parameters of this model. To proceed, we assume that Yelp's filter possesses some positive predictive power in distinguishing fake reviews from genuine ones. Is this a credible assumption to make? Yelp appears to espouse the view that it is. While Yelp is secretive about how its review filter works, it states that the filter sometimes affects perfectly legitimate reviews and misses some fake ones, too, but does a good job given the sheer volume of reviews and the difficulty of its task (see "What is the filter?", available at http://www.yelp.com/faq#what_is_the_filter). In addition, we suggest a subjective test to assess the assumption's validity: for any business, one can qualitatively check whether the fraction of suspicious-looking reviews is larger among the reviews Yelp publishes than among the ones it filters.

Formally, we assume that Pr[Filtered | not Fake] = a_0 and Pr[Filtered | Fake] = a_0 + a_1, for constants a_0 ∈ [0, 1] and a_1 ∈ (0, 1], i.e., that the probability a fake review is filtered is strictly greater than the probability a genuine review is filtered. Letting f_itk be a latent indicator of the k-th review of business i at time t being fake, we model the filtering process for a single review as:

Filtered_itk = a_0 (1 − f_itk) + (a_0 + a_1) f_itk + u_itk,

where u_itk is a zero-mean independent error term. We relax this independence assumption later.
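Before aggregating to the business-period level, a small self-contained simulation may help fix ideas about this per-review filtering model. The parameter values below (a_0, a_1, and the share of fake reviews) are illustrative assumptions, not estimates from the paper.

```python
# Illustrative simulation (hypothetical parameter values) of the per-review filtering model:
# a genuine review is filtered with probability a0, a fake one with probability a0 + a1.
import numpy as np

rng = np.random.default_rng(0)
a0, a1 = 0.10, 0.45                          # assumed filtering probabilities, a1 > 0
n_reviews = 200_000
fake = rng.random(n_reviews) < 0.16          # latent fake indicator f_itk (16% fake, assumed)
p_filter = a0 * (1 - fake) + (a0 + a1) * fake
filtered = rng.random(n_reviews) < p_filter  # observed filtered indicator
print("Pr[filtered | genuine] ~", round(filtered[~fake].mean(), 3))   # close to a0
print("Pr[filtered | fake]    ~", round(filtered[fake].mean(), 3))    # close to a0 + a1
```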

Summing over all n_it reviews for business i in period t, we obtain:

Σ_{k=1}^{n_it} Filtered_itk = Σ_{k=1}^{n_it} [ a_0 (1 − f_itk) + (a_0 + a_1) f_itk + u_itk ],

which gives

y_it = a_0 n_it + a_1 f_it + u_it,    (3)

where y_it is the number of filtered reviews business i received in period t and u_it is a composite error term. Substituting Equation 2 into the above yields the following model:

y_it = a_0 n_it + a_1 (x'_it β + b_i + τ_t + ε_it) + u_it.    (4)

It consists of observed quantities, unobserved fixed effects, and an error term. We can estimate this model using a within estimator, which wipes out the fixed effects. However, while we can identify the reduced-form parameters a_1 β, we cannot separately identify the vector of structural parameters of interest, β. Therefore, we can only test for the presence of fraud through the estimates of the reduced-form parameters, a_1 β. Furthermore, since a_1 ≤ 1, these estimates will be lower bounds on the structural parameters, β.
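The lower-bound property can be seen in a short simulation. With hypothetical values of a_0, a_1, and β (none taken from the paper), regressing the filtered count y_it on x_it with business and time fixed effects recovers approximately a_1·β rather than β; this is a sketch of the logic of Equations (2)-(4), not the authors' estimation code.

```python
# Simulation sketch (hypothetical parameters) of Equations (2)-(4): the within estimator
# applied to filtered counts recovers the attenuated coefficient a1*beta, a lower bound on beta.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_firms, n_periods = 300, 24
a0, a1, beta = 0.08, 0.5, 0.6                      # assumed, not estimated from the paper
firm = np.repeat(np.arange(n_firms), n_periods)
period = np.tile(np.arange(n_periods), n_firms)
x = rng.normal(size=firm.size)                     # time-varying incentive to commit fraud
b_i = rng.normal(size=n_firms)[firm]               # business fixed effect b_i
n_it = rng.poisson(5, size=firm.size)              # reviews received in the period
f_it = beta * x + b_i + rng.normal(size=firm.size)             # latent fake-review intensity
y_it = a0 * n_it + a1 * f_it + rng.normal(0.0, 0.1, firm.size)  # observed filtered reviews

df = pd.DataFrame({"y": y_it, "x": x, "n": n_it, "firm": firm, "period": period})
fit = smf.ols("y ~ x + n + C(firm) + C(period)", data=df).fit()
print(fit.params["x"])   # close to a1 * beta = 0.30, not beta = 0.60
```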

4.1 Controlling for biases in Yelp's filter

So far, we have not accounted for possible biases in Yelp's filter related to specific review attributes. But what if u_it is endogenous? For example, the filter may be more likely to filter shorter reviews, regardless of whether they are fake. To some extent, we can control for these biases. Let z_itk be a vector of review attributes. We incorporate filter biases by modeling the error term u_itk as follows:

u_itk = z'_itk γ + ũ_itk,    (5)

where ũ_itk is now an independent error term. This in turn suggests the following regression model:

y_it = a_0 n_it + a_1 (x'_it β + b_i + τ_t + ε_it) + Σ_{k=1}^{n_it} z'_itk γ + ũ_it.    (6)

In z_itk, we include controls for: review length, the number of prior reviews a review's author has written, and whether the reviewer has a profile picture associated with his or her account. As we saw in Section 3, these attributes help explain a large fraction of the variance in filtering. A limitation of our work is that we cannot control for filtering biases in attributes that we do not observe, such as the IP address of a reviewer, or the exact time a review was submitted. If these unobserved attributes are endogenous, our estimation will be biased. Equation 6 constitutes our preferred specification.

5 Review Fraud and Own Reputation

This section discusses the main results, which are at the restaurant-month level and presented in Tables 3 and 4. The particular choice of aggregation granularity is driven by the frequency of reviewing activity: restaurants in the Boston area receive on average approximately 0.1 published 1-star reviews per month, and 0.37 published 5-star reviews per month. Table 2 contains detailed summary statistics of these variables.

We focus on understanding the relationship between a restaurant's reputation and its incentives to leave a fake review, with the overarching hypothesis that restaurants with a more established, or more favorable, reputation have less of an incentive to leave fake reviews. While there isn't a single variable that fully captures a restaurant's reputation, there are several that we feel comfortable analyzing, including its number of recent positive and negative reviews, and an indicator for whether the restaurant is a chain or an independent business. These are empirically well-founded metrics of a restaurant's reputation.

Specifically, we estimate Equation 6, where we include in the vector x_it the following variables: the number of 1-, 2-, 3-, 4-, and 5-star reviews received in period t−1; the log of the total number of reviews the business had received up to and including period t−1; and the age of the business in period t, measured in (fractional) years. To investigate the incentives of positive review fraud, we estimate specifications with the number of filtered 5-star reviews per restaurant-month as the dependent variable. These results are presented in Table 3. Similarly, to investigate negative review fraud, we repeat the analysis with the number of filtered 1-star reviews per restaurant-month as the dependent variable. We present these results in Table 4. Next, we discuss our results in detail.

5.1 Results: worsening reputation drives positive review fraud

Low ratings increase incentives for positive review fraud, and high ratings decrease them. One measure of a restaurant's reputation is its rating. As a restaurant's rating increases, it receives more business (Luca (2011)), and hence it may have less incentive to game the system. Consistent with this hypothesis, in the first column of Table 3 we observe a positive and significant association between the number of published 1- and 2-star reviews a business received in period t−1 and review fraud in the current period. Conversely, we observe a negative, statistically significant association between review fraud in the current period and the occurrence of 4- and 5-star published reviews in the previous period. In other words, a positive change to a restaurant's reputation, whether the result of legitimate or fake reviews, reduces the incentives to engage in review fraud, while a negative change increases them.
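As an illustration of the restaurant-month specification just described, here is a schematic sketch of how the lagged review counts, restaurant fixed effects, and month fixed effects might be assembled and estimated. It assumes a hypothetical panel layout and column names (restaurant_id, month, filtered_5star, pub_1star through pub_5star, cum_reviews, age_years), omits the z_itk review-level controls, and is not the authors' code.

```python
# Schematic sketch (hypothetical columns, not the authors' code) of the restaurant-month
# regression: filtered 5-star reviews on lagged published star counts, log cumulative
# reviews, and restaurant age, with restaurant and month fixed effects.
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS

def fit_positive_fraud_model(panel: pd.DataFrame):
    """panel: one row per restaurant-month with columns ['restaurant_id', 'month',
    'filtered_5star', 'pub_1star', ..., 'pub_5star', 'cum_reviews', 'age_years']."""
    df = panel.sort_values(["restaurant_id", "month"]).copy()
    star_cols = [f"pub_{s}star" for s in range(1, 6)]
    for c in star_cols:                       # published reviews received in period t-1
        df[f"lag_{c}"] = df.groupby("restaurant_id")[c].shift(1)
    # Log of reviews accumulated through period t-1.
    df["log_cum_reviews"] = np.log1p(df.groupby("restaurant_id")["cum_reviews"].shift(1))
    df = df.dropna().set_index(["restaurant_id", "month"])
    exog = df[[f"lag_{c}" for c in star_cols] + ["log_cum_reviews", "age_years"]]
    # Restaurant fixed effects (entity) and month fixed effects (time); cluster by restaurant.
    model = PanelOLS(df["filtered_5star"], exog, entity_effects=True, time_effects=True)
    return model.fit(cov_type="clustered", cluster_entity=True)
```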
Beyond the statistical significance of these results, we are also interested in their substantive economic impact. One way to gauge this is to compare the magnitudes of the estimated coefficients to the average value of the dependent variable. For example, on average, restaurants in our dataset received approximately 0.1 filtered 5-star reviews per month. Meanwhile, the coefficient estimates in the first column of Table 3 suggest that an additional 1-star review published in the previous period is associated with an extra 0.01 filtered 5-star reviews in the current period, i.e., an increase constituting approximately 10% of the observed monthly average. Furthermore, recalling that most likely a_1 < 1 (that is to say, Yelp does not identify every single fake review), this number is a lower bound for the increase in positive review fraud.

To assess the robustness of the relationship between recent reputational shocks and review fraud, we re-estimated the above model including the 6-month leads of the published 1-, 2-, 3-, 4-, and 5-star review counts. We hypothesize that while to some extent restaurants may anticipate good or bad reviews and engage in review fraud in advance, the effect should be much smaller compared to the effect of recently received reviews. Our results, shown in column 2 of Table 3, suggest that this is indeed the case. The coefficients of the 6-month lead variables are near zero, and not statistically significant at conventional levels. The only exception is the coefficient for the 6-month lead of 5-star reviews (p