Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage...

12
Information Systems Research Vol. 23, No. 3, Part 2 of 2, September 2012, pp. 1056–1067 ISSN 1047-7047 (print) ISSN 1526-5536 (online) http://dx.doi.org/10.1287/isre.1110.0405 © 2012 INFORMS Research Note Music Blogging, Online Sampling, and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, [email protected] Jui Ramaprasad McGill University, Montreal, Québec H3A 1G5, Canada, [email protected] O nline social media such as blogs are transforming how consumers make consumption decisions, and the music industry is at the forefront of this revolution. Based on data from a leading music blog aggregator, we analyze the relationship between music blogging and full-track sampling, drawing on theories of online social interaction. Our results suggest that intensity of music sampling is positively associated with the popularity of a blog among previous consumers and that this association is stronger in the tail than in the body of music sales distribution. At the same time, the incremental effect of music popularity on sampling is also stronger in the tail relative to the body. In the last part of the paper, we discuss the implications of our results for music sales and potential long-tailing of music sampling and sales. Put together, our analysis sheds new light on how social media are reshaping music sharing and consumption. Key words : blogs; social interactions; observational learning; word of mouth; long tail; music industry; social media History : Vallabh Sambmurthy, Senior Editor; Siva Viswanathan, Associate Editor. This paper was received on March 27, 2008, and was with the authors 27 months for 4 revisions. Published online in Articles in Advance February 13, 2012. The debut Arcade Fire album, “Funeral,” was released barely a month ago, on Sept. 14, by the indie label Merge, based in North Carolina. Enthusiastic reviews were written, even more enthusiastic blog entries were posted, MP3’s circulated. It used to take months of touring and record-shop hype for an underground band to build a cult, but now it takes only a few weeks. “I’d like to thank the Internet,” Mr. Butler said, and he wasn’t serious, but he also wasn’t wrong. (Sanneh 2004) 1. Introduction Blogs and other social media are changing how con- sumers interact with each other, how they make decisions about consumption, and even how they actually consume products and services. These new media are particularly influential in the music world, as evidenced by the success of the Arcade Fire album Funeral (see the quote above). Traditionally, radio play has served as the primary mechanism for consumers to discover music before deciding whether to buy it or not. More recently, music blogs are emerging as an alternative new media competing with old media for consumer attention. The key difference is that whereas the old media have limited bandwidth and tend to focus on the most popular mainstream music, social media such as music blogs have a substantially higher bandwidth, bringing a far wider cross section of music to the attention of consumers. The purpose of this research is to examine how social interactions enabled by music blogs are shaping music sharing, sampling, and sales. The stark contrast in bandwidth between new and old media is demonstrated in Figure 1, which displays the Lorenz curves for song radio play versus music blog sampling for the data set used in our empirical analysis. 1 It can be seen that the top 1% of songs on radio account for a full 50% of radio play, whereas the top 10% of songs consume more than 90% of radio time; the corresponding numbers for music blog 1 Song radio play data are from Nielsen SoundScan, and music blog sampling data are from The Hype Machine. The Lorenz curves are constructed for the set of data that comprises the entire set of songs posted on The Hype Machine between July 1, 2006, and August 31, 2006 (for the sampling curve) and the set of songs played on the radio at least once during the first week of August (for the radio play curve). 1056

Transcript of Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage...

Page 1: Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, sdewan@uci.edu

Information Systems ResearchVol. 23, No. 3, Part 2 of 2, September 2012, pp. 1056–1067ISSN 1047-7047 (print) � ISSN 1526-5536 (online) http://dx.doi.org/10.1287/isre.1110.0405

© 2012 INFORMS

Research Note

Music Blogging, Online Sampling,and the Long Tail

Sanjeev DewanPaul Merage School of Business, University of California, Irvine, Irvine, California 92697,

[email protected]

Jui RamaprasadMcGill University, Montreal, Québec H3A 1G5, Canada, [email protected]

Online social media such as blogs are transforming how consumers make consumption decisions, and themusic industry is at the forefront of this revolution. Based on data from a leading music blog aggregator, we

analyze the relationship between music blogging and full-track sampling, drawing on theories of online socialinteraction. Our results suggest that intensity of music sampling is positively associated with the popularity ofa blog among previous consumers and that this association is stronger in the tail than in the body of musicsales distribution. At the same time, the incremental effect of music popularity on sampling is also stronger inthe tail relative to the body. In the last part of the paper, we discuss the implications of our results for musicsales and potential long-tailing of music sampling and sales. Put together, our analysis sheds new light on howsocial media are reshaping music sharing and consumption.

Key words : blogs; social interactions; observational learning; word of mouth; long tail; music industry;social media

History : Vallabh Sambmurthy, Senior Editor; Siva Viswanathan, Associate Editor. This paper was received onMarch 27, 2008, and was with the authors 27 months for 4 revisions. Published online in Articles in AdvanceFebruary 13, 2012.

The debut Arcade Fire album, “Funeral,” was releasedbarely a month ago, on Sept. 14, by the indie labelMerge, based in North Carolina. Enthusiastic reviewswere written, even more enthusiastic blog entries wereposted, MP3’s circulated. It used to take months oftouring and record-shop hype for an undergroundband to build a cult, but now it takes only a few weeks.“I’d like to thank the Internet,” Mr. Butler said, and hewasn’t serious, but he also wasn’t wrong.

(Sanneh 2004)

1. IntroductionBlogs and other social media are changing how con-sumers interact with each other, how they makedecisions about consumption, and even how theyactually consume products and services. These newmedia are particularly influential in the music world,as evidenced by the success of the Arcade Fire albumFuneral (see the quote above). Traditionally, radio playhas served as the primary mechanism for consumersto discover music before deciding whether to buy itor not. More recently, music blogs are emerging asan alternative new media competing with old mediafor consumer attention. The key difference is that

whereas the old media have limited bandwidth andtend to focus on the most popular mainstream music,social media such as music blogs have a substantiallyhigher bandwidth, bringing a far wider cross sectionof music to the attention of consumers. The purposeof this research is to examine how social interactionsenabled by music blogs are shaping music sharing,sampling, and sales.

The stark contrast in bandwidth between new andold media is demonstrated in Figure 1, which displaysthe Lorenz curves for song radio play versus musicblog sampling for the data set used in our empiricalanalysis.1 It can be seen that the top 1% of songs onradio account for a full 50% of radio play, whereasthe top 10% of songs consume more than 90% ofradio time; the corresponding numbers for music blog

1 Song radio play data are from Nielsen SoundScan, and music blogsampling data are from The Hype Machine. The Lorenz curves areconstructed for the set of data that comprises the entire set of songsposted on The Hype Machine between July 1, 2006, and August31, 2006 (for the sampling curve) and the set of songs played onthe radio at least once during the first week of August (for theradio play curve).

1056

Page 2: Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, sdewan@uci.edu

Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long TailInformation Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS 1057

Figure 1 Lorenz Curves for Radio Play vs. Music Blog Sampling

1.0

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

00 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

Sampling Radio play

Note. The Gini coefficient, calculated as the proportion of the area between the 45� line and the Lorenz curve to the total area under the 45� line, is equal to0.61 for music blog sampling and 0.93 for radio play.

sampling for the top 1% and top 10% of songs areonly 8% and 40%, respectively.2 As the use of socialmedia expands alongside radio play, a key questionis whether or not the exposure to a larger varietytranslates to a more diverse consumption of music byconsumers.

Apart from their higher bandwidth, new mediain the form of music blogs also enable consumersto immediately sample the music online—a typeof consumption enabled by the “information good”nature of music. In this paper, we define samplingto be the streaming of full-track (as opposed to shortclips) MP3 files posted on the blogs in a browser ormusic player. Despite the ease of music consump-tion through online sampling, the enormous varietyof music available online3 raises the question, how doconsumers decide what music to sample? Any indi-vidual user is unlikely to be familiar with more thana tiny fraction of all those songs available. Given this,how does a user go about deciding which blogs toseek out for music recommendations and what songsto actually listen to? In this regard, with the advent ofsocial media, consumers are increasingly relying onthe opinions and actions of other consumers.

Music blogging and online sharing influences thechoices of other consumers through two broad typesof social interaction: word of mouth (WOM) andobservational learning (OL). As discussed in Chenet al. (2010), WOM effects are meant to capture the

2 The Gini coefficients for Sampling and RadioPlay are 0.53 and 0.93,respectively, also indicating a substantially higher level of inequal-ity in radio play compared with sampling.3 For example, The Hype Machine alone provides links to thou-sands of music blogs and through them to hundreds of thousandsof songs.

impact of consumer opinions (such as product reviews)on other consumers’ choices, whereas OL theoriesdeal with the influence of consumer actions (such asfrequency of purchase of different products). Word-of-mouth effects (see, e.g., Godes and Mayzlin 2004,2009; Dellarocas 2003) are relatively less salient in ourcontext because our data do not capture variation inconsumer opinion—there are single blog posts persong, each of which typically signals a tacitly positiveopinion about the music. Therefore, the thrust of ouranalysis is based on observational theories of learningas they apply to the role of blog/music popularity.

Understanding the role of popularity in consumerchoice is the main goal of observational theories oflearning. In the music blogging context, observa-tional learning occurs when consumers draw infer-ences about music quality and likelihood of liking apiece of music from the past choices of other con-sumers as reflected in various popularity statistics(e.g., number of users who liked different songs ora list of most popular blogs). The initial focus ofthe OL literature was on developing analytical mod-els (e.g., Banerjee 1992, Bikhchandani et al. 1998) toexplain how consumers infer uncertain product qual-ity from the choices of previous consumers. Morerecently, a distinctly empirical stream of the litera-ture has emerged, and a common finding is that therelease of popularity information results in popularproducts becoming even more popular, often leadingto winner-take-all outcomes in product markets (e.g.,Anderson and Holt 1997, Chen et al. 2010, Salganiket al. 2006). In contrast, Tucker and Zhang (2009)demonstrate that in certain settings the impact ofpopularity information might depend on the inher-ent market size of different products, so popular-ity information about a niche product might signal

Page 3: Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, sdewan@uci.edu

Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail1058 Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS

high quality, further driving sales, whereas popular-ity information for a popular product is compara-tively less informative in this regard—an insight thatinforms our analysis as well. Our theoretical discus-sions and empirical predictions draw on OL theo-ries, emphasizing the differential impacts of blog andmusic popularity in mainstream versus niche music.

Our research also contributes to the emerging lit-erature on the online long tail effect (Anderson 2004,2006), where most of the prior work has focused onthe changing distribution of book sales (e.g., Bryn-jolfsson et al. 2006). However, the idea of the long tailcan also be applied to music consumption, as in theanalysis of Bhattacharjee et al. (2007) and Chellappaet al. (2007). Given that music is a digital good thatcan be consumed immediately, we are able to explorehow social media might contribute to the long tail insales through first driving the long tail in sampling.Similar to the focus of these prior studies, we lookat the potentially differential impact of blogging onthe consumption of music in the body versus tail ofmusic sales.

Highlighting our results, we find that blog popular-ity has a strong and positive effect on music samplingand that this effect is significantly stronger in the tailas compared to the body of music sales distribution.We also find evidence that music sampling is posi-tively associated with the inherent popularity of themusic item, and music popularity is also more impor-tant in the tail for driving sampling compared withthe body of the song sales distribution. Finally, wefind that sampling is positively associated with musicsales. These results indicate that consumer blogginghas an important influence on both music samplingand music sales, and the influence is different in thebody versus tail of the sales distribution. These resultshave important implications for consumers and var-ious participants of the music industry, as discussedlater in the paper.

The rest of this paper proceeds as follows. Sec-tion 2 provides the theoretical underpinnings forour analysis and develops our hypotheses. Section 3presents our data and empirical specifications. Sec-tion 4 presents our empirical results, and §5 providessome discussion and concluding remarks.

2. Theory and HypothesesIn this section we develop our hypotheses, basedon the conceptual framework shown in Figure 2.As shown in the figure, our primary interest is onthe association between blog/music popularity andmusic sampling and on how this association is differ-ent for mainstream versus niche music (i.e., the longtail effect). We first look at the association betweenblog/music popularity and sampling (Hypotheses 1Aand 1B) and then at how that association differs

Figure 2 Conceptual Framework

Blog popularity

Music popularity

Music sampling

Niche vs.mainstream

Controls—Genre—Radio play—Indie—Reviews—Time since release

between the body and tail of music sales distribution(Hypotheses 2A and 2B). Our analysis is guidedby the notion that music is inherently an experiencegood, in that its properties cannot be determined byinspection prior to consumption, as opposed to searchgoods, where this assessment can be done ex ante (seeNelson 1970 for the earliest distinction between searchand experience goods). As such, consumers rely onthe opinions and actions of their peers in choosingwhat music to consume. Now, one could argue thatonline sharing is turning all music into a search goodbecause one can sample music prior to purchase. Butthe question relevant to this study is what music tosample in the first place, which is not trivial, giventhe enormous variety of both blogs and music.

As discussed in §1, sampling decisions might beguided by blog or music popularity, and the choicecan be explained by theories of observational learn-ing. The gist of the argument for blog popularityfollows; the arguments for music popularity are anal-ogous. In the presence of uncertainty about blog qual-ity, or source credibility (Kelman 1961), consumershave to balance their own ex ante quality informa-tion with the inferences they draw from the prior con-sumption choices of their peers. Assuming that somefraction of the consumer population is able to dis-cern blog quality, users would infer that a more pop-ular blog is on average of higher quality than a lesspopular blog. In the extreme, when this correspon-dence between popularity and quality is made irre-spective of the users’ own private information, wehave the so-called information cascade phenomenon(e.g., Banerjee 1992, Bikhchandani et al. 1992, Tuckerand Zhang 2009). In general, OL theories predictthat in the presence of quality uncertainty, consumersdraw actionable inferences from popularity informa-tion. This effect applies to inferences about both musicand blog quality, leading to our first hypothesis.

Hypothesis 1A (H1A). Music sampling is positivelyassociated with both blog and music popularity.

We turn now to the dynamics of social influenceand examine how the effect of music/blog popularity

Page 4: Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, sdewan@uci.edu

Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long TailInformation Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS 1059

changes over time. When a song first comes out,there is little word-of-mouth information available,quality uncertainty is high, and consumer choice isdriven by popularity information alone. This settingis conducive to the formation of information cascades,where other individuals’ actions (as summarized bypopularity measures) are likely to be the primarydriver of sampling behavior. Over time, consumers getexposed to other sources of information and word ofmouth, which tends to break up any information cas-cades. In this sense, the longer a song has been out, theinfluence of popularity is correspondingly less impor-tant. That is, over time consumers rely less on observa-tional learning as a way of discovering music or infer-ring its quality—consistent with the substitute infor-mation argument of Chen et al. (2010). We thereforehypothesize that

Hypothesis 1B (H1B). The association between sam-plingandbothblogandmusicpopularity is stronger fornewlyreleased songs compared with previously released songs.

We turn now to the differential effects of blog pop-ularity for mainstream versus niche music. Using thesearch versus experience good classification discussedabove, we note that mainstream music is more like asearch good, whereas niche music is closer in natureto experience goods. This is because consumers mightalready be aware of mainstream music through itsplay on the radio and other mainstream media, butthey are less likely to have had prior exposure toniche music. Blogs catering more to niche music arelikely to have proportionately less following, so therewould be a higher level of quality uncertainty in themind of the average user. Therefore, we predict thatblog popularity is a more informative proxy for thequality of blogs catering to niche music as comparedto those offering mainstream music.

This argument is also supported by the literatureon word-of-mouth effects. Specifically, Bearden andEtzel (1982) and Senecal and Nantel (2004) makea clear case that personal influence is more impor-tant for experience goods. In our context, this meansthat influence associated with popular blogs is moreimportant for niche music sampling compared withmainstream music sampling, leading to the followinghypothesis.

Hypothesis 2A (H2A). The association of samplingwith blog popularity is stronger in the tail compared withthe body of song sales distribution.

Finally, we discuss the differential effects of musicpopularity for mainstream versus niche music. Wewould expect niche music to have greater qual-ity uncertainty in the mind of the user comparedwith mainstream music. Accordingly, we expect thatmusic popularity, at the margin, is a relatively more

important signal of quality for niche music. Also, inthe conceptual framework of King and Balasubrama-nian (1994), “other-based preference formation” (rely-ing on the choices of others for making one’s ownchoices) is more salient for experience goods com-pared with search goods.

On the other hand, it has been argued in the word-of-mouth literature that informational influence is lessimportant for high-preference heterogeneity services(Price et al. 1989): if everybody has different taste,then knowing what another consumer has chosendoes not influence one’s own choice. To the extent thatpreferences for niche music are more heterogeneous,we should expect peer influence to be less importantfor niche music as compared with mainstream music.This effect works in the opposite direction to the OLargument above, and so it is an empirical question asto which effect is dominant. For the sake of hypothe-sis testing, we posit the following.

Hypothesis 2B (H2B). The association of samplingwith music popularity is stronger in the tail compared withthe body of the sales distribution.

3. Data and Empirical Specification3.1. DataWe combined data from three major sources: TheHypeMachine (amusicblogaggregator),Amazon.com,and Nielsen SoundScan. The Hype Machine (THM)4

is one of the leading music blog aggregators, track-ing MP3s that are posted on more than 1,200 musicblogs (at the time we received our data) and post-ing them on THM’s website. When THM posts thesongs on its site, it adds a “listen” link that allowsusers to listen to, but not download, the entire songthat was posted on the corresponding blog post. TheTHM posting also provides basic information aboutthe song, including the track title and artist, as wellas links to the track on iTunes, the correspondingalbum on Amazon, and back to the original blog post.THM has provided us with data for the full set ofsongs that were posted between July 1, 2006, andAugust 31, 2006. For each of these songs, THM hasprovided the total click-throughs on the “listen” link,the “Amazon” link, the “iTunes” link, and the “post”link from the date the song was posted until the datewe received the data, November 14, 2006. THM alsoprovided the basic description of the songs that areposted on the various music blogs (i.e., track title andartist), the date the song was posted on the musicblog and the date it was saved to THM, the Techno-rati score, and number of del.icio.us bookmarks forthe blog.

From the links that THM provided to Amazon.compages, for the album corresponding to each track

4 http://www.hypem.com.

Page 5: Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, sdewan@uci.edu

Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail1060 Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS

posted on THM, we were able to extract the AmazonStandard Identification Number (ASIN) and obtainthe following data as of November 15, 2006: eachalbum’s Amazon sales rank, average value of all thecustomer reviews, number of customer reviews, thevalue of the last 100 reviews (from which we calcu-lated the standard deviation of the customer reviews),release date, and record label data.5 We categorizedeach album by record label based on whether it is“independent” or not, determined by whether thelabel was part of the Recording Industry Associationof America (one of the generally accepted methods ofclassifying the labels), and categorized the albums bythe corresponding genre as listed on allmusic.com.

We were also able to obtain radio play and sales datafor both albums and songs from Nielsen SoundScan.We obtained these data by taking a random sampleof 2,500 song titles from our complete data set, whereNielsen was able to match approximately 1,800 songswith data from its database. Specifically, Nielsen pro-vided us the weekly nationwide “spins,” i.e., the num-ber of times the song had been played on the radioanywhere in the United States. In addition, for all ofthe albums corresponding to the tracks in our dataset, we obtained weekly total album unit sales datafrom Nielsen SoundScan from the date of release ofthe album until April 2007. Nielsen SoundScan salesdata compile both online and off-line album sales, andNielsen is the data source used by Billboard musiccharts. We also obtained song-level sales data for asubset of the tracks in our THM data set. We were toldby Nielsen that song sales is primarily composed ofdigital downloads, and therefore song sales ought tocorrelate well with the online social interactions thatwe study.

Finally, we obtained information from Billboardcharts on artists associated with the songs in ourdata set. Specifically, we extracted data from the“Billboard Top 100 Artists of the Year” for the years2002 through 2006 and the “Billboard Hot 100 All-Time Top Artists,” which is a list of the top-sellingartists since 1958; this information was used to mea-sure artist reputation in our data set.

After combining the data from each of these datasources, we constructed a cross-sectional data set forsubset of songs posted on music blogs between July 1,2006, and August 31, 2006, with total sampling mea-sured on November 14, 2006; total radio play andsong sales measured at the end of the week corre-sponding to November 14, 2006; and the Amazon.comdata measured on November 15, 2006. Our final sam-ple is drawn from 281 unique blogs, comprising songs

5 Recall that the THM click-through data cover the period from thetime the song was posted on THM through November 14, 2006. Wecollected data from Amazon as soon after this date as possible.

from 1,088 unique albums in 24 genres. This samplehas both song-level data (the listen and click-throughdata from THM and the song sales and radio playdata from Nielsen SoundScan) and album-level data(the customer review and rank data from Amazonand album-level sales from Nielsen SoundScan).

The first two columns of Table 1 provide the labelsand descriptions of the variables we use in this study.These definitions are self-descriptive, but a few keyvariables warrant additional explanation. Sampling isthe click-throughs to the “listen” link on THM. Blog-Pop is the total number of del.icio.us bookmarks fora given blog from a given blog, and MusicPop ismeasured by the total number of track-level salesof a given song. RadioPlay is the total number oftimes a song has been played on the radio. Album-Sales are the sales of the album that correspond tothe track that was posted on THM. MusicPop, Radio-Play, and AlbumSales are each cumulative measures,measuring the total of each activity during the timeperiod we are studying (July 1, 2006, until Novem-ber 14, 2006). DaysRel is the number of days sincethe release of the album; DaysPost is the number ofdays since the track was posted on THM. RecentRelis a dummy variable indicating whether the songwas posted within 10 weeks of its release, to con-trol for potential bursts in sampling activity at thetime of album release, where 10 weeks was chosen asan appropriate timeframe based on previous researchindicating that albums remain on the Billboard chartsfor an average of 10 weeks after release (Bhatta-charjee et al. 2007). RevNum, RevVal, and RevStdDevare the number, average valence, and standard devia-tion of customer reviews posted on Amazon.com forthe album corresponding to the track, respectively.ArtistRep is a dummy variable, defined as 1 if thesong’s artist was on the Billboard Top 100 Artists ofthe Year between 2002 and 2006 or if the artist wason the Hot 100 All-Time Top Artists, which are thetop-selling artists since 1958, and 0 otherwise.

To address the research questions we are interestedin, we first need to define the long tail in music. Pre-vious research examining the long tail for book salesdefines the tail as the sales in the offerings outside thecapacity of a typical brick-and-mortar store (Brynjolf-sson et al. 2006, Anderson 2006). Although this num-ber has been established in studies done on books(100,000), a concrete number has not been establishedfor music. It is well known that Walmart, which is oneof the largest retailers of music, carries up to 5,000music albums in its store; thus, we define the “tail”of the music sales distribution as those albums hav-ing an Amazon.com rank greater than 5,000 and the“body” as those albums having an Amazon.com rankof less than 5,000. As a robustness check, we alsoanalyze alternative partitions of the distribution into

Page 6: Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, sdewan@uci.edu

Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long TailInformation Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS 1061

Table 1 Summary Statistics

Variable Description Full sample Bodya Taila

Sampling Total number of click-throughs to the MP3 link 417014 578047 25605446200605 48060075 42650345

BlogPop Blog popularity, represented by the number of del.icio.us bookmarks 55028 50080 5907241160415 4910275 41360845

MusicPop Total unit song sales 81289016 141538045 2106100646011270315 48414790865 4516550325

RadioPlay Total number of times the song was played on the radio 11328097 21327033 3360234915710655 41314140475 4113810375

DaysPost Number of days since the song was posted on THM 104054 104093 1040154170755 4180225 4170275

DaysRel Number of days since the release of the album 11643058 11513095 117720634117670585 4117090675 4118150195

RevVal Average valence of customer reviews on Amazon 4035 4037 4033400435 400405 400465

RevNum Number of customer reviews on Amazon 102026 151043 5303241740385 42090255 41100985

RevStdDev Standard deviation of the last 100 customer reviews on Amazon 0097 1001 0093400325 400275 400375

AlbumSales Album unit sales since MP3 posted on THM 221590096 421559096 2171200148413280505 411159160925 4514230385

SalesRank Amazon sales rank 141784004 11805010 27170404442317150325 4113210935 42810380925

RecentRel Dummy variable = 1 if song posted within 10 weeks of release date; 0018 0020 00160 otherwise 400385 400405 400365

ArtistRep Dummy variable = 1 if artist has “high” reputation; 0 otherwise 0013 0022 0005400345 400425 400215

Indie Dummy variable = 1 if independent label; 0 otherwise 0032 0026 0038400445 400445 400495

Tail Dummy variable = 1 if music is in the tail; 0 otherwise 0050400505

N 1,762 880 882

aThe “body” is defined as albums with Amazon sales rank ≤ 51000, whereas the “tail” consists of albums with Amazon sales rank > 51000. Standarddeviations are in parentheses.

body and tail based on cutoffs of 2,000 and 10,000,respectively.

When we examine the summary statistics inTable 1, we see that the average sales rank in thebody is roughly 1,800 compared to almost 28,000 inthe tail. The distinction becomes even more evidentwhen we see that average song sales (MusicPop) in thetail are much lower (but have considerable variation)than in the body; similarly, the average AlbumSalesare much higher in the body than in the tail. Blog-Pop has approximately the same average value andstandard deviation in the full sample, the body, andthe tail. The average RevNum in the body is muchlarger than the average RevNum in the tail, whereasthe mean of RevVal is high and similar across the twogroups. Whereas 38% of albums in the tail are inde-pendent albums, only 26% of the albums in the bodyare independent. The average of Sampling is lowerin the tail. DaysPost and DaysRel are approximatelythe same across the subsamples. Roughly 22% of the

artists in the body have an established reputation,whereas only 5% of the artists of albums in the tail do.

3.2. Empirical SpecificationWe are primarily interested in understanding the rela-tionship between sampling and music/blog popular-ity. Thus, our dependent variable is Sampling, andthe key explanatory variables are blog popularityBlogPop and music popularity MusicPop. These arelinked together in the following equation, where forany track i,

log4Samplingi5

= �0 +�1 log4BlogPopi5+�2 log4MusicPopi5

+�3 RevVali +�4 log4RevNumi5

+�5 RevStdDevi +�6 log4DaysPosti5

+�7 log4DaysReli5+�8 RecentReli

+�9 Indiei +�10 ArtistRepi +L∑

l=1

�lGenreil + �i1 (1)

Page 7: Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, sdewan@uci.edu

Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail1062 Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS

Tabl

e2

Corr

elat

ion

Mat

rix

Log(Sampling)

Log(BlogPo

p)Lo

g(MusicPo

p)Lo

g(Ra

dioP

lay)

Log(Sales)

Log(SalesR

ank)

RevVal

Log(Re

vNum

)Re

vStdDe

vLo

g(Da

ysPo

st)

Log(Da

ysRe

l)ArtistRep

RecentRe

lIndie

Tail

Log(Sampling)

100

079∗

∗∗

0042

5∗∗∗

0018

9∗∗∗

0044

3∗∗∗

−00

444∗

∗∗

−00

056∗

∗00

333∗

∗∗

0014

6∗∗∗

−00

007

−00

073∗

∗∗

0018

2∗∗∗

0006

4∗∗∗

0010

1∗∗∗−

0037

3∗∗∗

Log(BlogPo

p)1

−00

028

−00

043∗

0000

3−

0000

100

020

−00

046∗

0002

8−

0000

2−

0004

8∗∗

−00

040∗

0004

7∗∗

−00

013

−00

013

Log(MusicPo

p)1

0073

9∗∗∗

0046

5∗∗∗

−00

477∗

∗∗

−00

063∗

∗∗

0034

6∗∗∗

0012

0∗∗∗

0001

200

068∗

∗∗

0026

8∗∗∗

−00

068∗

∗∗

−00

230∗

∗∗−

0037

9∗∗∗

Log(Ra

dioP

lay)

100

329∗

∗∗

−00

317∗

∗∗

−00

068∗

∗∗

0021

2∗∗∗

0009

3∗∗∗

0001

400

071∗

∗∗

0002

4∗∗∗

−00

022

−00

216∗

∗∗−

0025

3∗∗∗

Log(Sales)

1−

0081

4∗∗∗

−00

097

0047

1∗∗∗

0024

8∗∗∗

0009

5∗∗∗

−00

354∗

∗∗

0033

1∗∗∗

0032

5∗∗∗

−00

134∗

∗∗−

0066

5∗∗∗

Log(SalesR

ank)

1−

0002

7−

0056

8∗∗∗

−00

162∗

∗∗

−00

008

0010

1∗∗∗

−00

337∗

∗∗

−00

087∗

∗∗

0018

7∗∗∗

0081

4∗∗∗

RevVal

1−

0009

9∗∗∗

−00

673∗

∗∗

−00

018

0016

4∗∗∗

−00

065∗

∗∗

−00

041∗

0009

3∗∗∗−

0004

4∗

Log(Re

vNum

)1

0033

3∗∗∗

0001

600

381∗

∗∗

0040

8∗∗∗

−00

300∗

∗∗

−00

315∗

∗∗−

0046

0∗∗∗

RevStdDe

v1

0001

0−

0010

5∗∗∗

0013

6∗∗∗

−00

021

−00

070

−00

120∗

∗∗

Log(Da

ysPo

st)

100

069∗

∗∗

0011

4∗∗∗

0000

0−

0004

1∗∗

−00

017

Log(Da

ysRe

l)1

0017

3∗∗∗

−00

676∗

∗∗

−00

322∗

∗∗

0008

5∗∗∗

ArtistRep

1−

0009

9∗∗∗

−00

223∗

∗∗−

0025

5∗∗∗

RecentRe

l1

0007

4∗∗∗−

0005

1∗∗

Indie

100

133∗

∗∗

Tail

1

Note

.Th

ese

resu

ltsar

eba

sed

ona

tota

lof1

,762

obse

rvat

ions

.∗∗∗,∗

∗,a

nd∗De

note

sign

ifica

nce

at1%

,5%

,and

10%

,res

pect

ivel

y.

Table 3 Hausman Specification Test

Efficient under H0 Consistent under H1 Statistic p-Value

OLS 2SLS 94.15 p < 0001

where Genreil, for l = 11 0 0 0 1L1 are dummy variablesso that Genreil = 1 if track i is of genre l and 0 other-wise; the other variables are as described in Table 1.

In the above specification, one might suspect theendogeneity of BlogPop and MusicPop because of miss-ing variables jointly correlated with the dependentvariable Sampling. One example of such a missingvariable might be song quality because sampling andboth popularity measures are likely to be increasingin song quality. Song reviews on sites such as Amazonand iTunes could serve as a useful measure of songquality, but unfortunately, song-level reviews are onlyavailable for a very small fraction of songs in ourdata set. We are able to obtain album-level reviewdata, which should be correlated with song quality,although imperfectly. We include album review datain the specification (RevVal, RevNum, and RevStdDev),but this may not remove endogeneity completely.

We conducted the Hausman specification test tohelp choose the correct estimation method, wherewe compare ordinary least squares (OLS) with two-stage least squares (2SLS) estimation. Table 3 presentsthe results from the Hausman specification test, com-paring OLS with 2SLS, where the latter allows forthe endogeneity of BlogPop and MusicPop (alongwith Sampling), which indicates that 2SLS is pre-ferred to OLS. The endogenous variable BlogPop isinstrumented by the variable Technorati,6 measuredas the number of in-links to a given blog (whichis highly correlated with BlogPop but not correlatedwith Sampling). MusicPop is also endogenous andis instrumented by the variable AlbumSales.7 Theset of instrumental variables also includes all otherexogenous variables in Equation (1). Finally, White’s(1980) test indicated significant heteroskedasticity, sowe use heteroskedasticity-adjusted standard errorsthroughout.

4. Empirical Results4.1. Hypothesis TestsIn this section we present the results of tests for ourhypotheses. Table 4 presents results comparing the

6 A blog with more links to or from other blogs (i.e., having a higherTechnorati score) has a higher blog popularity in the music blo-gosphere, which should be correlated with blog popularity withinTHM but will not necessarily drive sampling within the THM site;this is supported by the correlations mentioned.7 Finding an instrument for MusicPop is difficult. We use AlbumSalesbecause it is correlated to sales of songs within the album; however,albums sales is less likely to drive song-level sampling directly.

Page 8: Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, sdewan@uci.edu

Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long TailInformation Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS 1063

Table 4 Regression Results for Full Sample Based on OLS and2SLS Estimation

OLS 2SLS

Intercept 40002∗∗∗ 102414006875 4008755

Log(BlogPop) 00067∗∗∗ 00070∗∗∗

4000155 4000225Log(MusicPop) 00194∗∗∗ 00551∗∗∗

4000135 4000415RevVal 00020 00039

4000685 4000825Log(RevNum) 00241∗∗∗ 00086∗∗∗

4000225 4000315RevStdDev −00099 −00048

4001045 4001285Log(DaysPost) −00033 00004

4001205 4001455Log(DaysRel) −00128∗∗∗ −00045

4000275 4000345Indie 00014 00215∗∗∗

4000515 4000635RecentRel 00193∗∗ 00296∗∗∗

4000805 4000935ArtistRep 00121 −00084

4000785 4000955

N 1762 1762Adjusted R2 00299 00247

Notes. Variables are as defined in Table 1. The dependent variable isLog(Sampling), and the results correspond to the OLS and 2SLS estimationof Equation (1), with tail set at the Amazon sales rank of 5,000. Log(BlogPop)and Log(MusicPop) are treated as endogenous variables, and the instru-ments are Log(Technorati) and Log(AlbumSales) as well as all other exoge-nous independent variables. Heteroskedasticity-adjusted standard errors arein parentheses.

∗∗∗, ∗∗, and ∗Denote significance at 1%, 5%, and 10%, respectively.

OLS and 2SLS estimation methods for the full sample.We see that the signs and significance of the coeffi-cients are largely consistent. Given the results of theHausman test reported in Table 3, we continue ouranalysis using only 2SLS for our subsample estima-tion. Focusing on the 2SLS column, we see that theestimated coefficients are generally consistent withour prior expectations: BlogPop and MusicPop are bothpositive and significant, providing support for H1A.Looking at the other coefficient estimates, we see thatRevNum is positive and significant, as is RecentRel,confirming that a song that is posted closer to itsalbum release day is sampled more. We also see thatIndie is positive and significant, indicating that songsreleased by independent labels are on average sam-pled more than are songs released by major labels.Interestingly, RevVal, DaysPost, DaysRel, and ArtistRepare not significant.

To examine H1B, Table 5 presents results for a sam-ple split based on a song’s life cycle, partitioningthe data set based on whether the song was postedwithin 13 weeks of the album’s release date (the

so-called “shallow releases” in the music industry) oroutside 13 weeks of the album’s release date (“deepreleases” in the music industry). We see that H1B issupported by the results. Specifically, the MusicPopcoefficient is larger for shallow releases comparedwith deep releases, and the difference is significant(p < 0001). The point estimate of BlogPop is positiveand significant for both subsamples. The coefficientis larger in magnitude for shallow releases comparedwith deep releases, consistent with H1B, but the dif-ference is not significant.

Now we turn to the estimation results of regres-sion Equation (1) for the body and tail subsamples,presented in Table 6. We find support for both H2Aand H2B. That is, both BlogPop and MusicPop havea stronger association with sampling in the tail com-pared with the body (p < 0001). More specifically, themarginal effect of blog popularity is a stronger deter-minant of music sampling for niche music (musicin the tail) than for mainstream music (music in thebody). Similarly, the marginal effect of music popu-larity is a stronger determinant of music sampling

Table 5 Regression Results for Recent vs. Older Album Releases

Posted ≤ 13 weeks Posted > 13 weeksof album release of album release

(shallow releases) (deep releases)

Intercept −30678 20929∗∗∗

4203805 4009455Log(BlogPop) 00089∗ 00060∗∗

4000495 4000255Log(MusicPop) 00926∗∗∗ 00465∗∗∗

4001505 4000395RevVal 00243 −00009

4001795 4000985Log(RevNum) −00265∗∗ 00128∗∗∗

4001275 4000325RevStdDev 00481 −00147

4003085 4001505Log(DaysPost) 10284∗∗∗ −00144

4004785 4001525Log(DaysRel) −00788∗∗∗ −00077∗∗

4003025 4000335Indie −00086 00207∗∗∗

4001675 4000675ArtistRep −00513 −00006

4004195 4000915

N 365 1397Adjusted R2 00256 00243

Notes. Variables are as defined in Table 1. The dependent variable isLog(Sampling), and the results correspond to the 2SLS estimation of Equa-tion (1), with tail set at Amazon sales rank of 5,000. Log(BlogPop) andLog(MusicPop) are treated as endogenous variables, and the instrumentsare Log(Technorati) and Log(AlbumSales) as well as all other exogenousindependent variables. Heteroskedasticity-adjusted standard errors are inparentheses.

∗∗∗, ∗∗, and ∗Denote significance at 1%, 5%, and 10%, respectively.

Page 9: Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, sdewan@uci.edu

Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail1064 Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS

Table 6 Regression Results for Body and Tail Subsamples

Body Tail(Amazon rank ≤ 51000) (Amazon rank > 51000)

Intercept 00408 102714101595 4108375

Log(BlogPop) 00044 00110∗∗

4000295 4000445Log(MusicPop) 00476∗∗∗ 00885∗∗∗

4000435 4001365RevVal 00201 −00168

4001295 4001425Log(RevNum) 00124∗∗∗ 00134∗∗

4000445 4000565RevStdDev 00021 −00268

4002095 4002125Log(DaysPost) 00095 −00109

4001775 4002925Log(DaysRel) −00050 −00129∗∗

4000435 4000655Indie 00288∗∗∗ 00246∗∗

4000815 4001255RecentRel 00440∗∗∗ 00149

4001215 4001895ArtistRep −00076 −00208

4000995 4003085

N 880 882Adjusted R2 00227 00082

Notes. Variables are as defined in Table 1. The dependent variable isLog(Sampling), and the results correspond to the 2SLS estimation of Equa-tion (1), with tail set at the Amazon sales rank of 5,000. Log(BlogPop) andLog(MusicPop) are treated as endogenous variables, and the instruments areLog(Technorati) and Log(AlbumSales) as well as all other exogenous inde-pendent variables. Heteroskedasticity-adjusted standard errors are in paren-theses.

∗∗∗, ∗∗, and ∗Denote significance at 1%, 5%, and 10%, respectively.

for niche music than for mainstream music. Together,these results indicate that the popularity of both musicand blogs is a particularly important signal of qual-ity for niche music, where there is more uncertaintycompared with mainstream music.

4.2. Robustness ChecksWe start by considering alternative partitions of bodyand tail to make sure our comparative results arenot an artifact of the criterion used for the distinc-tion. To do this, we consider two different sets ofpartitions, one based on a threshold of 2,000 for theAmazon rank and the other using a cutoff of 10,000.The results of the 2SLS estimation for these subsam-ples are reported in Table 7. As can be seen, theresults for both sets of subsamples are consistent withH2A and H2B.

One might be concerned that because of theunequal sampling of music blogs, the most popularblogs might be skewing our results. We can test thisissue by eliminating the blogs with extremely highpopularity. We do this by reestimating Equation (1)

after removing the top 10% and top 25% most popu-lar blogs as measured by BlogPop. The results for bothof these reduced samples (not reported for the sake ofbrevity) are largely consistent with our prior findings.

5. Discussion and ConclusionsIn this study, we integrated the research on observa-tional learning and the long tail to understand howconsumers make music consumption decisions, par-ticularly looking at how these decisions are madedifferently for mainstream and niche music. We pro-pose that because of the quality uncertainty associatedwith niche music, observational learning would be astronger driver of consumption of niche music (in thetail) compared with mainstream music (in the body).Indeed, our empirical results establish that both blogpopularity and music popularity have a strongerassociation with sampling in the tail compared withthe body of music sales distribution. We also findthat product life cycle plays a role in the relation-ship between popularity information and consump-tion in that music popularity information becomesless important the longer the music has been on themarket, though we see a tendency for blog popularityto be more influential for older music compared withnewer music.

Given our findings, a natural and interesting ques-tion that follows is how this opportunity for musicdiscovery through social media (not only traditionalmedia, i.e., the radio) ultimately affects music sales.We are not able to provide a definitive analysis of thisimportant question because of limitations in our dataset: we only have music blog sampling data from onesite (The Hype Machine), which is only a fraction ofoverall blog sampling. However, we have found thatsampling on THM and overall blog buzz8 during thistime period are positively and significantly correlated,thus indicating that the THM audience may be rep-resentative of the broader population of online musicconsumers;9 i.e., THM sampling might be a reason-able proxy for overall blog sampling. With this caveatin mind, Table 8 provides the results of a 2SLS regres-sion that relates song sales to radio play and THMsampling (both treated as endogenous), along withseveral control variables. Looking at the full sample,we see that both radio play (traditional media) and

8 Blog buzz is calculated by the number of blogs that blogged abouta given song, as indicated by Google Blog Search, during the sametime period as our sampling data from THM.9 For the songs in our data set, we examined the correlationbetween sampling on THM and overall buzz about the songs inblogs on the Internet and found that the correlation between THMsampling and blog buzz is 0.357 (p < 0001). This suggests that theintensity of sampling on THM is a good proxy for the overall buzzabout the album in the blogosphere.

Page 10: Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, sdewan@uci.edu

Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long TailInformation Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS 1065

Table 7 Robustness to Alternative Partition of Body and Tail

Body Tail Body Tail(Amazon rank ≤ 21000) (Amazon rank > 21000) (Amazon rank ≤ 101000) (Amazon rank > 101000)

Intercept −00972 00354 00292 008124105815 4103475 4100465 4207575

Log(BlogPop) 00051 00079∗∗ 00061∗∗ 00107∗

4000415 4000315 4000265 4000635Log(MusicPop) 00568∗∗∗ 00732∗∗∗ 00497∗∗∗ 10147∗∗∗

4000585 4000865 4000395 4002695RevVal 00197 00063 00191 −00305

4001925 4001065 4001085 4002035Log(RevNum) 00164∗∗ 00098∗∗ 00081∗∗ 00227∗∗∗

4000685 4000425 4000395 4000795RevStdDev 00023 −00003 −00014 −00541∗

4003665 4001605 4001775 4003025Log(DaysPost) 00163 −00002 00102 −00065

4002395 4002105 4001645 4004225Log(DaysRel) −00046 −00095∗∗ −00024 −00229∗∗∗

4000605 4000475 4000385 4000975Indie 00327∗∗∗ 00253∗∗∗ 00287∗∗∗ 00303

4001235 4000915 4000735 4001885RecentRel 00638∗∗∗ 00147 00467∗∗∗ −00127

4001745 4001265 4001095 4002315ArtistRep −00161 00017 −00056 −00219

4001295 4001765 4000985 4004595

N 525 1237 1146 616Adjusted R2 00230 00129 00238 00042

Notes. Variables are as defined in Table 1. The dependent variable is Log(Sampling), and the results correspond to the 2SLS estimation of Equation (1), withtail set at Amazon sales rank of 2,000. Log(BlogPop) and Log(MusicPop) are treated as endogenous variables, and the instruments are Log(Technorati) andLog(AlbumSales) as well as all other exogenous independent variables. Heteroskedasticity-adjusted standard errors are in parentheses.

∗∗∗, ∗∗, and ∗Denote significance at 1%, 5%, and 10%, respectively.

sampling (social media) are associated with incremen-tal song sales. The point estimates for the body/tailsubsamples differ slightly from each other, but the dif-ferences are not statistically significant.

These findings have implications for artists, musicblogs, and music communities such as The HypeMachine. Given that the goals of these entities areto increase consumption of music, particularly musicthat may not be discovered otherwise, the results ofour study provide straightforward recommendationsfor engaging social media and driving consumption.For example, artists that do not have an establishedreputation (and therefore are likely to have musicin the tail of the distribution) will benefit from theendorsement from a popular music blog; this is likelyto increase consumption through sampling. Musicblogs benefit from being popular because consumerswill seek out their recommendations; thus, engagingtheir users in such a way they bookmark and recom-mend their blog is important. Finally, taking THM asan example, the goal of many of these communitiesis about “music discovery.” We have seen that blogpopularity drives sampling of otherwise less knownmusic (in the tail) as well as older music; displayingthe blog popularity information could help increasethe diversity of consumption even more.

Table 8 2SLS Estimation of Song Sales

Body (Amazon Tail (AmazonFull sample rank ≤ 51000) rank > 51000)

Intercept 00415 20058∗∗∗ −003214005715 4009675 4007955

Log(RadioPlay) 00454∗∗∗ 00446∗∗∗ 00445∗∗∗

4000115 4000155 4000175Log(Sampling) 00692∗∗∗ 00712∗∗∗ 00535∗∗∗

4000595 4000775 4000965RevVal 00024 −00200 00180

4000815 4001455 4001075Log(RevNum) 00064∗∗ 00106∗∗ −00049

00031 4000495 4000435RevStdDev −00074 −00401 00182

4001295 4002445 4001515Log(DaysRel) 00114∗∗∗ 00043 00241∗∗∗

4000355 4000485 4000495Indie −00118∗ −00096 −00094

4000685 4000935 4000955RecentRel −00236∗∗∗ −00321∗∗ −00162

4000965 4001305 4001365ArtistRep 00077 00008 00439∗∗

4000855 4000965 4001915

N 1762 880 882Adjusted R2 00639 00661 00515

∗∗∗, ∗∗, and ∗Denote significance at 1%, 5%, and 10%, respectively.

Page 11: Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, sdewan@uci.edu

Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long Tail1066 Information Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS

Figure 3 Lorenz Curves: Music Blog Sampling vs. Song Sales

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

1.0

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0

Sampling Song sales

Note. The Gini coefficient, calculated as the proportion of the area between the 45� line and the Lorenz curve to the total area under the 45� line, is equal to0.53 for music blog sampling and 0.88 for song sales.

Putting together all of our results, we can distill outa few key observations. First, new and fast-growingsocial media in the form of music blogs are expos-ing consumers to a far wider range of music than dotraditional media such as the radio (see Figure 1). Sec-ond, music blog sampling is characterized by moreequality of consumption compared with the distri-bution of song sales, as shown in the Lorenz curvesin Figure 3. That is, consumers are more willing tosample music in the long tail, but they are less will-ing to purchase such songs. Finally, our preliminaryresults (see Table 8) with song sales as the dependentvariable suggest that music blog sampling is associ-ated with higher song sales—both in the body and inthe tail. This reminds us of the empirical analysis ofTucker and Zhang (2007), who show that online popu-larity information at an online retailer simultaneouslyresulted in both the “steep tail” (i.e., popular productsbecome more popular) and “long tail” (i.e., increasedsales of niche music) phenomena. This is plausiblein our context, with our results implying that socialmedia (i.e., music blog sampling) not only expandsales of popular songs but also bring new consumersinto the market by exposing them to niche music,something that is not feasible in the realm of tradi-tional media (i.e., radio play). These results provideevidence that social media are a driver of the longtail in sampling, as they provide access to a largervariety of music and enable consumers to find andconsume this more easily. Given that our results alsoshow preliminary evidence that sampling drives songsales, this suggests that social media could ultimatelylead to a long-tailing of music sales as well. We admitthis latter conclusion is somewhat speculative at this

point, and it would be fruitful for further research toexamine the issue in more concrete terms.

This work does have some limitations, overcom-ing which will provide some directions for futureresearch. First, we have sampling data at a singlepoint in time, so this precludes an analysis of howdemand shifts over time are related to blogging andsampling activity. The lack of data over time also pre-vents an analysis of peer effects in sampling, whereinsampling itself could be driven by prior samplingbehavior by other consumers. Second, as always, find-ing the appropriate instrumental variables for theendogenous variables is challenging; given the datathat we have access to, we have used the best instru-mental variables available and have provided tests todemonstrate the robustness of our results. Third, asdiscussed above, we have data from only one onlinemusic community, which limits the conclusions thatwe can draw on the relationship between bloggingand sales. We also observe only songs that have beenblogged about and do not observe the counterfac-tual; conducting an analysis where songs that both areand are not blogged about are observed could pro-vide additional insights into the role of observationallearning on consumption decisions. Finally, the gener-alizability of our results outside of the music domainis an open question and could serve as a fruitful direc-tion for further research.

AcknowledgmentsThe order of the authors is alphabetical and they contributedequally. The authors thank Anthony Volodkin from TheHype Machine for generously sharing his music blog aggre-gator data and NielsenSoundscan for providing essential

Page 12: Music Blogging, Online Sampling, and the Long Tail · and the Long Tail Sanjeev Dewan Paul Merage School of Business, University of California, Irvine, Irvine, California 92697, sdewan@uci.edu

Dewan and Ramaprasad: Music Blogging, Online Sampling, and the Long TailInformation Systems Research 23(3, Part 2 of 2), pp. 1056–1067, © 2012 INFORMS 1067

music sales data. One of the authors also acknowledges agenerous dissertation fellowship from CalIT2, sponsored bythe Emulex Corporation. Finally, the authors are grateful forhelpful comments and suggestions from the seminar par-ticipants at Conference on Information Systems and Tech-nology (CIST) 2007, Workshop on Information Systems andEconomics (WISE) 2007, and International Symposium ofInformation Systems (ISIS) 2007.

ReferencesAnderson, C. 2004. The long tail. Wired (October) 170–177.Anderson, C. 2006. The Long Tail. Hyperion Books, New York.Anderson, L. R., C. A. Holt. 1997. Information cascades in the lab-

oratory. Amer. Econom. Rev. 87(5) 847–862.Banerjee, A. V. 1992. A simple model of herd behavior. Quart. J.

Econom. 107(3) 797–817.Bearden, W. O., M. J. Etzel. 1982. Reference group influence on prod-

uct and brand purchase decisions. J. Consumer Res. 9(9) 183–194.Bhattacharjee, S., R. D. Gopal, K. Lertwachara, J. R. Marsden,

R. Telang. 2007. The effect of digital sharing technologies onmusic markets: A survival analysis of albums on rankingcharts. Management Sci. 53(9) 1359–1374.

Bikhchandani, S., D. Hirshleifer, I. Welch. 1998. Learning from thebehavior of others: Conformity, fads, and informational cas-cades. J. Econom. Perspect. 12(3) 151–170.

Brynjolfsson, E., Y. Hu, M. D. Smith. 2006. From niches to riches: Theanatomy of the long tail. Sloan Management Rev. 47(4) 67–71.

Chellappa, R. K., B. Konsynski, V. Sambamurthy, S. Shivendu. 2007.An empirical study of the myths and facts of digitization inthe music industry. Presentation 2007 Workshop InformationSystems Economics (WISE), Montreal.

Chen, Y., Q. Wang, J. Xie. 2010. Online social interactions: A naturalexperiment on word of mouth versus observational learning.Working paper, University of Florida, Gainesville.

Dellarocas, C. 2003. The digitization of word of mouth: Promisesand challenges of online feedback mechanisms. ManagementSci. 49(10) 1407–1424.

Godes, D., D. Mayzlin. 2004. Using online conversation tostudy word of mouth communication. Marketing Sci. 23(4)545–560.

Godes, D., D. Mayzlin. 2009. Firm-created word-of-mouth com-munication: Evidence from a field test. Marketing Sci. 28(4)721–739.

Kelman, H. C. 1961. Processes of opinion change. Public OpinionQuart. 25(1) 57–78.

King, M. F., S. K. Balasubramanian. 1994. The effects of expertise,end goal, and product type on adoption of preference forma-tion strategy. J. Acad. Marketing Sci. 22(2) 146–159.

Nelson, P. 1970. Information and consumer behavior. J. PoliticalEconom. 78(2) 311–329.

Price, L. L., L. F. Feick, R. A. Higie. 1989. Preference heterogeneityand coorientation as determinants of perceived informationalinfluence. J. Bus. Res. 19(3) 227–242.

Salganik, M. J., P. S. Dodds, D. J. Watts. 2006. Experimental study ofinequality and unpredictability in an artificial cultural market.Science 311(5762) 854–856.

Sanneh, K. 2004. A draining week in the indie-music spotlight.New York Times (October 18), http://www.nytimes.com/2004/10/18/arts/music/18band.html.

Senecal, S., J. Nantel. 2004. The influence of online product rec-ommendations on consumers’ online choices. J. Retailing 80(2)159–169.

Tucker, C., J. Zhang. 2007. Long tail or steep tail: A field investiga-tion into how popularity information affects the distribution ofcustomer choices. Working paper, MIT Sloan School WorkingPaper 4655-07, Cambridge, MA.

Tucker, C., J. Zhang. 2009. How does popularity information affectchoices? A field experiment. Management Sci. 57(5) 828–842.

White, H. 1980. A heteroskedasticity-consistent covariance matrix-estimator and a direct test for heteroskedasticity. Econometrica48(4) 817–838.