Ch 5 + Anatomy of the Long Tail (Goel et al., WSDM 2010) Padmini Srinivasan Computer Science...

Ch 5 + Anatomy of the Long Tail (Goel et al., WSDM 2010) Padmini Srinivasan Computer Science Department Department of Management Sciences http://cs.uiowa.edu/~psriniva [email protected]


Page 1:

Ch 5 + Anatomy of the Long Tail (Goel et al., WSDM 2010)

Padmini Srinivasan

Computer Science Department Department of Management Sciences

http://cs.uiowa.edu/~psriniva

Page 2:

Compression (Ch 5)

Page 3:

Heaps' law

Page 4:

Zipf’s law

Page 5:

Broder et al. Graph Structure of the Web

Note that the exponent is different. Note also the deviation at the low end of the out-degree.

Probability that a page has in-degree k is proportional to 1/k^2

Actual exponent slightly larger than 2.
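
A minimal sketch of the in-degree power law above, assuming a truncated distribution P(k) proportional to k^(-alpha) on k = 1..k_max. The cutoff `k_max` and the function names are illustrative assumptions, not taken from Broder et al. The code normalizes the distribution and reads the exponent back from the slope of the log-log plot.

```python
import math

def power_law_pmf(alpha, k_max):
    """Return P(k) proportional to k**(-alpha) for k = 1..k_max, normalized to sum to 1."""
    weights = [k ** (-alpha) for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def loglog_slope(pmf):
    """Least-squares slope of log P(k) against log k, with k starting at 1."""
    xs = [math.log(k) for k in range(1, len(pmf) + 1)]
    ys = [math.log(p) for p in pmf]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical parameters: exponent 2 as on the slide, cutoff chosen arbitrarily.
pmf = power_law_pmf(alpha=2.0, k_max=1000)
slope = loglog_slope(pmf)  # the slope is the negative of the exponent
```

On real link data the fit would be noisier, and an exponent slightly above 2 would show up as a slope slightly below -2.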

Page 6:

Infinite-inventory retailers

• Amazon, Netflix, iTunes music store
• Long tail markets
• Items not in brick and mortar stores:
– 30% of Amazon.com sales
– 25% for Netflix
• Success because of long tail markets.
• Two different hypotheses
– Majority prefer popular and minority prefer niche items
– Everyone likes some popular and some niche items
• Different impact on inventory control. If keeping mainstream items:
– Satisfy most people nearly all the time
– Irritate most people at least some of the time
• Knowing which model works/fits/explains behaviour better is important

Page 7:

Infinite-inventory retailers

• Two different hypotheses
– Majority prefer popular and minority prefer niche items
– Everyone likes some popular and some niche items
• Different impact on inventory control. If keeping mainstream items:
– Satisfy most people nearly all the time
– Irritate most people at least some of the time
• Knowing which model works better is important
• Their work supports the second hypothesis.
• Also, availability of tail items may boost sales of ‘head’ items ~ one-stop shopping convenience
• Not just the direct impact on revenue; second-order gains: customer satisfaction.

Page 8:

Datasets examined

Web queries: stemming
URLs: restricted to domains (click search data)
Browsing: Nielsen data (domains)

Data trimming done

Page 9:

Long Tail

• What is it?
– A relatively small number of items accounts for a large number of consumptions: the old 80-20 rule.
– Definition: popularity is the fraction of total consumption fulfilled by an item, e.g. the fraction of checkouts associated with a particular book.
– Popularity of a movie: number of times it was rated / total number of ratings
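
The popularity definition above can be sketched as follows, assuming the data is a list of (user, item) consumption events. The function names and the toy log are hypothetical, for illustration only. `top_k_share` gives the cumulative popularity covered by an inventory of the k most popular items.

```python
from collections import Counter

def popularity(consumptions):
    """Map each item to its share of all consumption events."""
    counts = Counter(item for _user, item in consumptions)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()}

def top_k_share(pop, k):
    """Cumulative popularity of the k most popular items."""
    return sum(sorted(pop.values(), reverse=True)[:k])

# Toy (user, item) log; the data is made up for illustration.
log = [("u1", "A"), ("u2", "A"), ("u3", "A"), ("u1", "B"), ("u2", "C")]
pop = popularity(log)
head_share = top_k_share(pop, 1)  # share of consumption covered by the single top item
```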

Page 10:

Two Long-Tail Graphs
Netflix & Yahoo! Music

Typical inventory: 3,000 (Netflix), 50,000 (Yahoo! Music)
Web search: the top 10 web sites account for over 15% of page views
The top 10,000 web sites still leave 20% of page views unaccounted for.

Page 11:

More Long Tail Graphs

Page 12:

Eccentric Tastes?

• An inventory: the k top-ranked (most popular) items
• Definition: a user is p-percent satisfied if at least p percent of his/her consumption is in the k-ranked set.
• Analysis: what percent of users are p-percent satisfied?
• Netflix (k = 3,000): only 11% of users are 100% satisfied; 63% are 90% satisfied
• Yahoo! Music (k = 50,000): only 5% of users are 100% satisfied; 32% are 90% satisfied
• With brick and mortar, almost none of the users are completely satisfied.
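
The p-percent satisfaction definition above can be sketched as follows, assuming each user is represented by the list of items he/she consumed and the inventory is a set of item ids. The names and the toy data are hypothetical. Integer arithmetic is used for the threshold test to avoid floating-point edge cases at exactly p percent.

```python
def is_satisfied(user_items, inventory, p):
    """True if at least p percent of the user's consumption is in the inventory."""
    hits = sum(1 for item in user_items if item in inventory)
    return len(user_items) > 0 and 100 * hits >= p * len(user_items)

def percent_satisfied(users, inventory, p):
    """Percent of users who are p-percent satisfied by the inventory."""
    satisfied = sum(1 for items in users.values() if is_satisfied(items, inventory, p))
    return 100 * satisfied / len(users)

# Toy data, made up for illustration.
users = {
    "u1": ["A", "A", "B"],        # all consumption inside the inventory
    "u2": ["A", "B", "C", "D"],   # half inside, half outside
    "u3": ["A"] * 9 + ["Z"],      # 90% inside
}
inventory = {"A", "B"}            # stands in for the k top-ranked items

fully = percent_satisfied(users, inventory, 100)
mostly = percent_satisfied(users, inventory, 90)
```

Sweeping k and plotting `percent_satisfied` for p = 90 and p = 100 would give curves of the kind shown on the following slide.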

Page 13:

Eccentric Tastes?
Netflix & Yahoo! Music

Upper: 90% satisfaction; lower: 100% satisfaction

Page 14:

Ratings versus Popularity

• The more obscure an item, the less it is appreciated.
• So the more aware users are of an item, the more it is appreciated?
– Studied with movies and music.
– Relationship between popularity (rank) and rating
• Value of the tail is overemphasized because there is disproportionate dissatisfaction or satisfaction.
– Tail end less dissatisfaction/satisfaction?

Page 15:

Ratings versus Popularity

• Pattern present in the Netflix dataset but not in the music dataset (more obscure songs get even higher ratings).

Page 16:

Ratings versus Popularity

Tail end less dissatisfaction/satisfaction? (users disproportionately dissatisfied with tail end)

85% of Netflix users and 91% of Yahoo! Music users rated an item outside the inventory of physical stores (original 89% & 95% resp.).
So the long tail ends can't be dismissed.
Even typical users have a need for tail end items.

32% of Netflix users and 56% of Yahoo! Music users had at least 10% of their highly rated items in the tail.

Page 17:

Null Hypothesis model

• Random model
– Each user decides how many items to consume (consistent with the empirical data: fix the number of users, the number of items, and the number selected/viewed/clicked/rated by each user).
– Item selection by a user is also random, but constrained to be according to popularity and without replacement.
– What are the limitations of this null model?
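
A minimal sketch of the random model described above, under stated assumptions: each user keeps his/her empirical number of distinct items, and items are drawn one at a time with probability proportional to empirical popularity among the items not yet drawn. This is an illustration of the idea, not the authors' implementation; the function names, the `seed` parameter, and the toy data are hypothetical.

```python
import random

def sample_without_replacement(items, weights, n, rng):
    """Draw n distinct items; each draw is proportional to the remaining weights."""
    pool = list(items)
    w = list(weights)
    chosen = []
    for _ in range(min(n, len(pool))):
        idx = rng.choices(range(len(pool)), weights=w, k=1)[0]
        chosen.append(pool.pop(idx))
        w.pop(idx)
    return chosen

def null_model(users, seed=0):
    """Resample every user's items by popularity, keeping each user's item count."""
    rng = random.Random(seed)
    counts = {}
    for items in users.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    catalog = list(counts)
    weights = [counts[item] for item in catalog]
    return {
        user: sample_without_replacement(catalog, weights, len(set(items)), rng)
        for user, items in users.items()
    }

# Toy data, made up for illustration.
users = {"u1": ["A", "B"], "u2": ["A", "C"], "u3": ["A", "B", "D"]}
simulated = null_model(users, seed=42)
```

Satisfaction curves computed on `simulated` can then be compared with those computed on the real data, which is the comparison the next slide reports.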

Page 18:

Null Model
Netflix & Yahoo! Music

Upper: 90% satisfaction; lower: 100% satisfaction
Null models: users are much harder to satisfy. E.g. only 14% of users in the null model are 90% satisfied, compared to 64% (movies) with k = 3,000.

Page 19:

Implications?

• Though most users consume tail content part of the time:
– A sizeable fraction of users prefer head over tail content to a degree that goes beyond the draw of popularity.
– To compensate, other users draw disproportionately from the tail.

Page 20:

Consumption patterns: Users vs Popularity

Page 21:

Some patterns

• By moving from k = 3,000 to 3,500 movies, cumulative popularity increases by 2 percentage points (87% to 89%), while the share of 90%-satisfied users increases more, by 7 points (63% to 70%).

• Movies that by popularity alone account for only 2% of the demand could potentially grow the overall customer base by 7% by attracting newly satisfied users.

• Searching: moving from 95% to 96% along the tail increases the share of 90%-satisfied users from 80% to 86%.

Page 22:

Individual eccentricity: the median popularity rank of the items a user consumed.
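
The eccentricity measure can be sketched as follows, assuming items are ranked by total consumption with rank 1 as the most popular. The tie-breaking rule (by item name), the function names, and the toy data are assumptions made for illustration.

```python
from collections import Counter
from statistics import median

def popularity_ranks(users):
    """Rank items by total consumption: 1 = most popular (ties broken by item name)."""
    counts = Counter(item for items in users.values() for item in items)
    ordered = sorted(counts, key=lambda item: (-counts[item], item))
    return {item: rank for rank, item in enumerate(ordered, start=1)}

def eccentricity(user_items, ranks):
    """Median popularity rank of the items a user consumed."""
    return median(ranks[item] for item in user_items)

# Toy data, made up for illustration.
users = {"u1": ["A", "B"], "u2": ["A", "C", "D"], "u3": ["A", "B", "D"]}
ranks = popularity_ranks(users)
scores = {user: eccentricity(items, ranks) for user, items in users.items()}
```

A higher score means the user's typical item sits further down the tail.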

Page 23:

More on eccentricity

– Are those who are more ‘engaged’ (i.e., consume more) more eccentric?
• No: the correlation between the two at the individual level is low

– But some observations at the group level

Page 24:

More on eccentricity ~ web pages

Unique URLs

Page 25:

Theoretical Analysis

• Independent model
• Sticky model
– Winner take all.

• Shared inventory approach

Page 26:

Summary

• Nice analysis of the long tail
• Different perspectives combined
– Popularity (cumulative and individual)
– 90%, 100% satisfaction
– Engagement versus ratings
– Use of a null model to make predictions and compare
– Nice graphs
– Long tail helps in capturing user satisfaction and retention