SF BIG ANALYTICS: Pinterest Chief Scientist Prof. Jure Leskovec: Discovering Networks of Products

Jure Leskovec (@jure), Stanford University & Pinterest

Including joint work with J. McAuley, R. Pandey, L. Riedel

Connecting People & Objects

[Diagram labels: Internet, Offsite, Save, Do, On Pinterest]

Pinterest: Discovery Engine

Visual Discovery Engine

Pins: Rich Objects

Boards: Collections

From Pins to the Object Graph

[Diagram labels: Pinners, Boards, Pins, Web Pages, Object Graph, Hyperlink Graph]

30+ Billion Pins categorized by people into more than 750 Million Boards

50% of pins have been created in the last 6 months

How do we uncover relationships between pins?

Object Graph

Can we understand how pins fit together into a giant network?

Object Graph: Products

Pins & product catalogs:
• 10s of millions of products
• 100s of millions of product reviews

How do we build the product graph? Three components:
• Link prediction
• Topic models
• Product hierarchies

Product Graph: Relations

Substitutes: Purchase instead

Complements: Purchase in addition

Product Graph: Description

: cleaner; quieter

: cheaper; high power

: well made, easy to install

: fits perfectly, great value

Product Graph: Overview

Substitute Complement

Product Graph: What does it do?

1. Understand the notions of substitute and complement goods

is substitutable for

complements

2. Generate explanations of why certain products are preferred

People prefer this because: “Good quality, soft, light weight, the colors are beautiful and exactly like the picture!”

3. Recommend baskets of related items

Query: Suggested outfit:

Query: Suggested outfit:

Product Graph: Overview

Building networks of products

Modeling: Can we use product data to model product relationships?

Understanding: Can we explain why people prefer certain products over others?

Problem Setting

Binary prediction task: Given a pair of products, x and y, predict whether they are related (substitute/complementary)

Goal: Build a probabilistic model that encodes p(x is related to y)

Problem Setting

How to learn from data

Train by maximum likelihood:

X Complementary

Not Complementary
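To make "train by maximum likelihood" concrete, here is a minimal sketch. It assumes a logistic model over a hypothetical pair-feature vector (a constant 1.0 plus one similarity score); the names `link_prob`, `log_likelihood`, `fit` and the toy data are illustrative and do not come from the talk.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def link_prob(weights, pair_features):
    # p(x is related to y) under an assumed logistic model
    return sigmoid(sum(w * f for w, f in zip(weights, pair_features)))

def log_likelihood(weights, pairs):
    # Sum of log p over related pairs and log (1 - p) over unrelated pairs
    total = 0.0
    for features, related in pairs:
        p = link_prob(weights, features)
        total += math.log(p) if related else math.log(1.0 - p)
    return total

def fit(pairs, steps=200, lr=0.5):
    # Plain gradient ascent on the average log-likelihood
    weights = [0.0] * len(pairs[0][0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for features, related in pairs:
            err = (1.0 if related else 0.0) - link_prob(weights, features)
            for k, f in enumerate(features):
                grad[k] += err * f
        weights = [w + lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights

# Toy labelled pairs: ([bias, similarity], is_complementary)
pairs = [([1.0, 0.9], True), ([1.0, 0.8], True),
         ([1.0, 0.1], False), ([1.0, 0.2], False)]
weights = fit(pairs)
```

The fitted weights give a higher likelihood than the all-zero starting point, which is all that maximum likelihood asks for; any real system would use a proper optimizer and regularization.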

Attempt 1: Big bags of features

Features of product i: [0,0,0,0,0,0,0,1,0,5,0,0,0, … ,0,1,0,0,0,0,0,1,2]

Features of product j: [0,0,0,1,0,0,0,0,0,0,0,1,0, … ,0,0,0,0,0,0,0,1,0]

aardvark zoetrope

Parameterized probability measure (essentially weighted-nearest-neighbor)
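One possible reading of a "weighted-nearest-neighbor" probability, sketched under assumptions: the probability is a logistic function of a weighted squared distance between the two bag-of-words count vectors. The function `related_prob`, the per-word `weights`, and the `bias` are hypothetical and not taken from the slides.

```python
import math

def related_prob(x_i, x_j, weights, bias):
    # Weighted squared distance between the two count vectors
    dist = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j))
    # Small distance -> probability near sigmoid(bias); large distance -> near 0
    return 1.0 / (1.0 + math.exp(dist - bias))

# Toy six-word vocabulary; real vectors span the whole dictionary
x_i = [0, 1, 0, 5, 0, 2]
x_j = [0, 1, 0, 4, 0, 2]   # almost the same words as x_i
x_k = [3, 0, 1, 0, 2, 0]   # very different words
weights = [0.5] * 6        # one learnable weight per vocabulary word
bias = 2.0

p_near = related_prob(x_i, x_j, weights, bias)
p_far = related_prob(x_i, x_k, weights, bias)
```

With one weight per vocabulary word, the number of parameters grows with the size of the vocabulary.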

• High-dimensional
• Prone to overfitting
• Too fine-grained

Attempt 2: Features from Topics

Use any kind of product-related features: brand, price, reviews, product descriptions, …

Topic models: LDA; Blei & McAuliffe (2007)

Product topics: Shoes, Female, Fashion

Attempt 2: Features from Topics

Features of product i: [0.1, 0.4, 0.2, 0.1, 0.2]

Features of product j: [0.3, 0.1, 0.3, 0.2, 0.1]

Shoes Female

On the right track, but are the topics we are discovering relevant to link prediction?

Attempt 3: Learn “good” topics

Learn to discover topics that explain the graph structure

Link Prediction

Product “topics”

Idea: Learn both simultaneously

Discover topics that “explain” product relations

Conceptually, we want to learn to project products into topic space such that related products are nearby

The SCEPTRE Model

Combining topic models with link prediction

Topic model with topic distribution θ

But the topics should be “good” as features for the link prediction
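One way to write such a combined objective, as a sketch with assumed notation rather than the exact formula from the talk: $\mathcal{E}$ is the set of observed links, $\bar{\mathcal{E}}$ a set of non-links, $F$ a link predictor over topic vectors, $w_{i,n}$ the $n$-th word in the text of product $i$, $z_{i,n}$ its topic assignment, and $\phi$ the per-topic word distributions.

```latex
\mathcal{L} =
  \sum_{(i,j) \in \mathcal{E}} \log F(\theta_i, \theta_j)
+ \sum_{(i,j) \in \bar{\mathcal{E}}} \log\bigl(1 - F(\theta_i, \theta_j)\bigr)
+ \sum_{i} \sum_{n=1}^{N_i} \log\bigl(\theta_{i, z_{i,n}} \, \phi_{z_{i,n}, w_{i,n}}\bigr)
```

The first two terms reward topic vectors that predict the links; the last term is the usual topic-model likelihood of the product text. Maximizing both together pushes the topics toward ones that are useful as link-prediction features.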

The SCEPTRE Model: Details

Topic membership

The SCEPTRE Model

Why do people who view X eventually buy Y?

There is a link between the two products because people use similar words to describe them

But in what direction does the link flow?

Issue 1: Relationships we want to learn are not symmetric

Solution: We solve this issue by learning “relatedness” in addition to “directedness”

Relationships: Explained by product “properties” “baby, pajamas, pants, colorful”

Directedness: Subjective/qualitative language “true size, fits well, items are the same color as on the picture”
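A minimal sketch of how "relatedness" and "directedness" could be combined into an asymmetric link probability. The assumptions here are mine: relatedness uses elementwise products of the two topic vectors (symmetric), directedness uses their differences (antisymmetric), and the two are multiplied. The weight vectors `beta` and `eta` are made-up values; the topic vectors are the ones shown on the earlier slide.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def relatedness(theta_i, theta_j, beta):
    # Symmetric features: swapping i and j leaves them unchanged
    feats = [1.0] + [a * b for a, b in zip(theta_i, theta_j)]
    return sigmoid(sum(w * f for w, f in zip(beta, feats)))

def directedness(theta_i, theta_j, eta):
    # Antisymmetric features: swapping i and j flips their sign
    feats = [1.0] + [b - a for a, b in zip(theta_i, theta_j)]
    return sigmoid(sum(w * f for w, f in zip(eta, feats)))

def link_prob(theta_i, theta_j, beta, eta):
    # Probability of a directed link i -> j
    return relatedness(theta_i, theta_j, beta) * directedness(theta_i, theta_j, eta)

theta_i = [0.1, 0.4, 0.2, 0.1, 0.2]
theta_j = [0.3, 0.1, 0.3, 0.2, 0.1]
beta = [-1.0, 5.0, 5.0, 5.0, 5.0, 5.0]   # hypothetical learned weights
eta = [0.0, 2.0, -2.0, 1.0, 0.0, 0.0]    # hypothetical learned weights

p_ij = link_prob(theta_i, theta_j, beta, eta)
p_ji = link_prob(theta_j, theta_i, beta, eta)
```

With these toy weights the two directions get different probabilities even though the relatedness term is identical for both.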

Learning Multiple Graphs

browsed together

bought together

Issue 2: We want to learn multiple relationships simultaneously

We could fit two independent models, but learning both at once:

1) Gives us more data on which to train the complete model

2) Helps with interpretability, since both relationships are explained in terms of the same topics

Solution: We fix this by learning multiple regressors simultaneously (one for each graph) that operate on a single set of topics

One regressor per graph
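The idea of one regressor per graph over a single set of topics can be sketched as follows. The product names, topic vectors, and weights are invented for illustration, and the pair features (elementwise products of topic weights) are an assumption.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# One shared topic vector per product (hypothetical values)
topics = {
    "shoe": [0.7, 0.2, 0.1],
    "sock": [0.6, 0.3, 0.1],
    "lamp": [0.1, 0.1, 0.8],
}

# One weight vector per graph, all reading the same topics
regressors = {
    "browsed together": [-2.0, 6.0, 6.0, 6.0],
    "bought together": [-3.0, 2.0, 9.0, 2.0],
}

def predict(graph, a, b):
    weights = regressors[graph]
    feats = [1.0] + [x * y for x, y in zip(topics[a], topics[b])]
    return sigmoid(sum(w * f for w, f in zip(weights, feats)))

scores = {graph: predict(graph, "shoe", "sock") for graph in regressors}
```

Because both regressors read the same topic vectors, a topic that matters for "browsed together" can be compared directly with its weight for "bought together".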

SCEPTRE is not tractable

Issue 3: The model has too many parameters

Thousands of topics multiplied by millions of products

Including Hierarchy

Idea: use the category hierarchy to sparsify the model

Solution: Product hierarchy

Associate each node in the category tree with a small number of topics:

Now we can fit models with thousands of topics but only 10-20 are active per product

“Car audio” topics (for example) have probability zero of being selected for this product

Topics at the top of the hierarchy are common to all electronics products, and will contain generic (though electronics-specific) language
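The sparsification described above can be sketched like this: each category node owns a few topic ids, and a product may only use the topics on the path from the root to its own category. The category names, topic ids, and weights below are hypothetical.

```python
# Each node of the category tree owns a small number of topic ids
category_topics = {
    "Electronics": [0, 1],
    "Electronics/Car audio": [2, 3],
    "Electronics/Cameras": [4, 5],
    "Electronics/Cameras/Lenses": [6, 7],
}

def active_topics(category_path):
    # Collect the topics of every node from the root down to the product's category
    parts = category_path.split("/")
    active = []
    for depth in range(1, len(parts) + 1):
        node = "/".join(parts[:depth])
        active.extend(category_topics.get(node, []))
    return active

def sparse_theta(category_path, raw_weights):
    # Normalise over active topics only; every other topic has probability zero
    active = active_topics(category_path)
    if len(raw_weights) != len(active):
        raise ValueError("need exactly one weight per active topic")
    total = sum(raw_weights)
    return {topic: w / total for topic, w in zip(active, raw_weights)}

theta = sparse_theta("Electronics/Cameras/Lenses", [2.0, 1.0, 3.0, 1.0, 2.0, 1.0])
```

A lens product here stores six numbers no matter how many topics the full model has, and the "Car audio" topics (ids 2 and 3) never appear in its distribution.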

Training the model: EM

E-step (topic assignments)

M-step (link prediction)

Other topic/regression parameters (word distribution ϕ and topic assignments z)

Building the Product Graph

Now we can generate the product graph by identifying the most probable links

For every product, rank all other products according to p(x is related to y)

But this is slow! Quadratic number of comparisons!

Solution: Use product hierarchy and a matching engine
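A rough sketch of the hierarchy half of this solution, assuming candidates are restricted to products in the same category so that only small groups are compared. The matching engine is not modelled, and `build_graph`, `toy_score`, and the product ids are hypothetical stand-ins; a real system would choose candidate categories per relation type.

```python
import heapq
from collections import defaultdict

def build_graph(products, score, k=2):
    # Group products by category so only products within a group are compared
    by_category = defaultdict(list)
    for pid, category in products.items():
        by_category[category].append(pid)

    graph = {}
    for pid, category in products.items():
        candidates = (other for other in by_category[category] if other != pid)
        # Keep the k highest-scoring candidates, standing in for p(x is related to y)
        graph[pid] = heapq.nlargest(k, candidates, key=lambda other: score(pid, other))
    return graph

products = {"tv1": "TVs", "tv2": "TVs", "tv3": "TVs",
            "amp1": "Car audio", "amp2": "Car audio"}

def toy_score(x, y):
    # Stand-in for the learned link probability: closer model numbers score higher
    return -abs(int(x[-1]) - int(y[-1]))

graph = build_graph(products, toy_score, k=1)
```

The number of comparisons now depends on the size of each category rather than on the square of the catalog size.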

Experiments

Just for fun, let’s use the Amazon product catalog:

Edge Prediction Accuracy

Ranking Performance

Manual examination shows great performance (false positives are actually very relevant)

Results: Micro-Categories

Explaining user preferences

Explain recommendations by identifying words that “best explain” the link:

• Topic model: we assign a topic to each word
• Logistic regressor uses the words to make predictions
• Identify phrases that maximize the likelihood of the link in order to explain it

Use the “directedness” model to generate explanations as it selects more subjective language (i.e., how do the products differ, and why was one product “preferable” over another).
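A toy sketch of phrase selection, under simplifying assumptions: each word has one assigned topic, each topic has a single regression weight, and a phrase is scored by the average weight of its words as a proxy for how much it raises the link likelihood. The vocabulary, weights, and candidate phrases are invented.

```python
def phrase_score(phrase, word_topic, topic_weight):
    # Average regression weight over the phrase; words without a topic count as zero
    words = phrase.lower().split()
    if not words:
        return 0.0
    scored = [topic_weight[word_topic[w]] for w in words if w in word_topic]
    return sum(scored) / len(words)

def best_explanation(phrases, word_topic, topic_weight):
    # Pick the phrase with the highest proxy score
    return max(phrases, key=lambda p: phrase_score(p, word_topic, topic_weight))

word_topic = {"quieter": 0, "cleaner": 0, "cheaper": 1, "box": 2, "arrived": 2}
topic_weight = {0: 1.5, 1: 0.8, 2: -0.2}   # hypothetical "directedness" weights
phrases = ["cleaner and quieter", "arrived in a box", "much cheaper"]

explanation = best_explanation(phrases, word_topic, topic_weight)
```

Here the selected phrase is the one whose words fall in the most strongly weighted topic, which mirrors the idea of surfacing the language that drives the prediction.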

Example: Product Graph

Pinterest as a graph of objects

Connecting People & Objects

Pinterest Graph - Example

User:
● likes classic art
● just viewed a pin about things to do in SF

[Graph nodes: San Francisco, Tourist Attractions, Food, Sporting Venues, Art Galleries, Artists]

From Pins to the Object Graph

[Diagram labels: Pinners, Boards, Images, Web Pages, Object Graph, Hyperlink Graph]

We are hiring!

J. McAuley, R. Pandey, J. Leskovec. Inferring Networks of Substitutable and Complementary Products. ACM SIGKDD 2015.