
UniversitΓ€t Potsdam Institut fΓΌr Informatik

Lehrstuhl Maschinelles Lernen

Recommendation

Tobias Scheffer


Recommendation Engines

Recommendation of products, music, contacts, …
Based on user features, item features, and past transactions: sales, reviews, clicks, …
User-specific recommendations, no global ranking of items.
Feedback loop: the choice of recommendations influences the available transaction and click data.


Netflix Prize

Data analysis challenge, 2006–2009.
Netflix made rating data available: 500,000 users, 18,000 movies, 100 million ratings.
Challenge: predict ratings that were held back for evaluation; improve by 10% over Netflix's recommendation.
Award: $1 million.


Problem Setting

Users π‘ˆ = {1,… ,π‘š}

Items 𝑋 = {1,… ,π‘šβ€²}

Ratings π‘Œ = {(𝑒1, π‘₯1, 𝑦1)… , (𝑒𝑛, π‘₯𝑛, 𝑦𝑛)}

Rating space 𝑦𝑖 ∈ π‘Œ

E.g., π‘Œ = βˆ’1,+1 , π‘Œ = {⋆,… . ,⋆⋆⋆⋆⋆}

Loss function β„“(𝑦𝑖 , 𝑦𝑗)

E.g., missing a good movie is bad but watching a

terrible movie is worse.

Find rating model: π‘“πœƒ: 𝑒, π‘₯ ↦ 𝑦.


Problem Setting: Matrix Notation

Users π‘ˆ = {1,… ,π‘š}

Items 𝑋 = {1,… ,π‘šβ€²}

Ratings π‘Œ =

𝑦11 𝑦12

𝑦21 𝑦23

𝑦33

Rating space 𝑦𝑖 ∈ Ξ₯

E.g.,Ξ₯ = βˆ’1,+1 ,Ξ₯ = {⋆,… . ,⋆⋆⋆⋆⋆}

Loss function β„“(𝑦𝑖 , 𝑦𝑗)

5

item 1 2 3

user 1 user 2 user 3

Incomplete matrix


Problem Setting

Model f_θ(u, x)
Find the model parameters that minimize the risk:
θ* = argmin_θ ∫∫∫ ℓ( y, f_θ(u, x) ) p(u, x, y) dx du dy
As usual, p(u, x, y) is unknown → minimize the regularized empirical risk:
θ* = argmin_θ Σ_{i=1}^n ℓ( y_i, f_θ(u_i, x_i) ) + λ Ω(θ)


Content-Based Recommendation

Idea: a user may like movies that are similar to other movies they like.

Requirement: attributes of items, e.g.,

Tags,

Genre,

Actors,

Director,

…


Content-Based Recommendation

Feature space for items.
E.g., Φ = (comedy, action, year, dir tarantino, dir cameron)^T
φ(Avatar) = (0, 1, 2009, 0, 1)^T


Content-Based Recommendation

Users π‘ˆ = {1,… ,π‘š}

Items 𝑋 = {1,… ,π‘šβ€²}

Ratings π‘Œ = {(𝑒1, π‘₯1, 𝑦1)… , (𝑒𝑛, π‘₯𝑛, 𝑦𝑛)}

Rating space 𝑦𝑖 ∈ π‘Œ

E.g.,Ξ₯ = βˆ’1,+1 ,Ξ₯ = {⋆,… . ,⋆⋆⋆⋆⋆}

Loss function β„“(𝑦𝑖 , 𝑦𝑗)

E.g., missing a good movie is bad but watching a

terrible movie is worse.

Feature function for items: πœ™: π‘₯ ↦ ℝ𝑑

Find rating model: π‘“πœƒ: 𝑒, π‘₯ ↦ 𝑦.


Independent Learning Problems for Users

Minimize the regularized empirical risk:
θ* = argmin_θ Σ_{i=1}^n ℓ( y_i, f_θ(u_i, x_i) ) + λ Ω(θ)
One model per user: f_{θ_u}: x ↦ Υ
One learning problem per user:
θ_u* = argmin_{θ_u} Σ_{i: u_i = u} ℓ( y_i, f_{θ_u}(x_i) ) + λ Ω(θ_u)


Independent Learning Problems for Users

One learning problem per user:
∀u: θ_u* = argmin_{θ_u} Σ_{i: u_i = u} ℓ( y_i, f_{θ_u}(x_i) ) + λ Ω(θ_u)
Use any model class and learning mechanism; e.g., a linear model f_{θ_u}(x_i) = φ(x_i)^T θ_u with
Logistic loss + ℓ2 regularization: logistic regression
Hinge loss + ℓ2 regularization: SVM
Squared loss + ℓ2 regularization: ridge regression
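A minimal sketch of the per-user approach for the squared-loss case (ridge regression per user), assuming numpy and an item feature map phi as above; function and variable names are illustrative, not from the slides:

import numpy as np

def fit_user_model(user_ratings, phi, lam=1.0):
    # Ridge regression for one user: theta_u = (Phi^T Phi + lam I)^-1 Phi^T y,
    # where Phi stacks the feature vectors phi(x) of the items this user rated.
    Phi = np.array([phi(x) for x, _ in user_ratings])
    y = np.array([r for _, r in user_ratings], dtype=float)
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

def predict_rating(theta_u, phi, x):
    # Linear model f_theta_u(x) = phi(x)^T theta_u.
    return float(phi(x) @ theta_u)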


Independent Learning Problems for Users

Obvious disadvantages of independent problems:
Commonalities of users are not exploited,
A user does not benefit from ratings given by other users,
Poor recommendations for users who gave few ratings.
Rather use a joint prediction model:
Recommendations for each user should benefit from other users' ratings.


Independent Learning Problems

(Figure: parameter vectors θ_1, θ_2 of independent per-user prediction models; the regularizer pulls each of them toward zero.)


Joint Learning Problem

(Figure: the same parameter vectors θ_1, θ_2 of the per-user prediction models; in the joint problem the regularizer pulls them toward a shared mean.)


Joint Learning Problem

Standard ℓ2 regularization follows from the assumption that the model parameters are governed by a normal distribution with mean vector zero.
Instead, assume that there is a non-zero population mean vector.


Joint Learning Problem

(Figure: graphical model of the hierarchical prior. A population mean vector θ̄ with covariance Σ generates the user-specific parameter vectors θ_1, θ_2, …, θ_m; the prior mean of θ̄ is 0.)


Joint Learning Problem

Population mean vector:
θ̄ ~ N(0, (1/λ̄) I)
User-specific parameter vector:
θ_u ~ N(θ̄, (1/λ) I)
Substitution: θ_u = θ̄ + θ'_u; now θ̄ and θ'_u both have mean vector zero.
Negative log-prior = regularizer:
Ω(θ̄ + θ'_u) = λ̄ ‖θ̄‖² + λ ‖θ'_u‖²


Joint Learning Problem

Joint optimization problem:
min_{θ'_1, …, θ'_m, θ̄}  Σ_u [ Σ_{i: u_i = u} ℓ( y_i, f_{θ̄ + θ'_u}(x_i) ) + λ Ω(θ'_u) ] + λ̄ Ω(θ̄)
with θ_u = θ̄ + θ'_u; λ controls the coupling strength, λ̄ Ω(θ̄) is the global regularization.
The parameters θ'_u are independent, θ̄ is shared.
Hence, the θ_u are coupled.
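A minimal sketch of this coupled objective with squared loss; the helper function, its name, and the dict-of-offsets representation are illustrative assumptions, not from the slides. Minimizing it (e.g., by gradient descent) fits the shared θ̄ and the user-specific offsets θ'_u jointly:

import numpy as np

def joint_objective(theta_bar, theta_prime, ratings, phi, lam=1.0, lam_bar=1.0):
    # Regularized empirical risk of the coupled model; ratings is a list of
    # (u, x, y) triples, theta_prime maps each user u to its offset theta'_u.
    risk = 0.0
    for u, x, y in ratings:
        theta_u = theta_bar + theta_prime[u]        # theta_u = theta_bar + theta'_u
        risk += (y - phi(x) @ theta_u) ** 2         # squared loss
    risk += lam * sum(np.sum(t ** 2) for t in theta_prime.values())   # coupling strength
    risk += lam_bar * np.sum(theta_bar ** 2)        # global regularization
    return float(risk)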


Discussion

Each user benefits from other users' ratings.
Does not take into account that users have different tastes.
Two sci-fi fans may have similar preferences, but a horror-movie fan and a romantic-comedy fan do not.
Idea: look at ratings to determine how similar users are.


Collaborative Filtering

Idea: people like items that are liked by people who have similar preferences.
People who give similar ratings to items probably have similar preferences.
This is independent of item features.


Collaborative Filtering

Users π‘ˆ = {1,… ,π‘š}

Items 𝑋 = {1,… ,π‘šβ€²}

Ratings π‘Œ = {(𝑒1, π‘₯1, 𝑦1)… , (𝑒𝑛, π‘₯𝑛, 𝑦𝑛)}

Rating space 𝑦𝑖 ∈ Ξ₯

E.g.,Ξ₯ = βˆ’1,+1 ,Ξ₯ = {⋆,… . ,⋆⋆⋆⋆⋆}

Loss function β„“(𝑦𝑖 , 𝑦𝑗)

Find rating model: π‘“πœƒ: 𝑒, π‘₯ ↦ 𝑦.


Collaborative Filtering by Nearest Neighbor

Define a distance function on users: d(u, u')
Predicted rating:
f_θ(u, x) = (1/k) Σ_{k nearest neighbors u_i of u} y_{u_i,x}
The predicted rating is the average rating of the k nearest neighbors in terms of d(u, u').
No learning involved.
Performance hinges on d(u, u').


Collaborative Filtering by Nearest Neighbor

Define a distance function on users:
d(u, u') = √( (1/m') Σ_{x=1}^{m'} (y_{u',x} − y_{u,x})² )
Euclidean distance between the ratings for all items.
Skip items that have not been rated by both users.


Extensions

Normalize ratings (subtract the user's mean rating, divide by the user's standard deviation).
Weight the influence of neighbors by the inverse of their distance.
Weight the influence of neighbors by the number of jointly rated items.
Distance-weighted prediction:
f_θ(u, x) = [ Σ_{k nearest neighbors u_i of u} (1/d(u, u_i)) · y_{u_i,x} ] / [ Σ_{k nearest neighbors u_i of u} 1/d(u, u_i) ]
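A minimal sketch of this memory-based predictor, assuming numpy and a user-by-item array Y with np.nan for missing ratings; names and the small epsilon against division by zero are illustrative choices. It uses the distance over jointly rated items from the previous slide and the distance-weighted average above:

import numpy as np

def distance(Y, u, v):
    # Euclidean distance between the two users' ratings, averaged over the
    # items rated by both (items not rated by both are skipped).
    both = ~np.isnan(Y[u]) & ~np.isnan(Y[v])
    if not both.any():
        return np.inf
    return np.sqrt(np.mean((Y[u, both] - Y[v, both]) ** 2))

def predict(Y, u, x, k=2, eps=1e-9):
    # Distance-weighted average rating of the k nearest neighbors of u who rated x.
    raters = [v for v in range(Y.shape[0]) if v != u and not np.isnan(Y[v, x])]
    neighbors = sorted(raters, key=lambda v: distance(Y, u, v))[:k]
    weights = np.array([1.0 / (distance(Y, u, v) + eps) for v in neighbors])
    return float(weights @ Y[neighbors, x] / weights.sum())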


Collaborative Filtering: Example

π‘Œ =4 5 45 5 15 3 4

How much would Alice enjoy Zombiland?

25

Matr

ix

Zom

bila

nd

Titanic

Death

Pro

of

Alice

Bob

Carol


Collaborative Filtering: Example

π‘Œ =4 5 45 5 15 3 4

𝑑 𝑒, 𝑒′ =1

π‘šβ€² 𝑦𝑒′,π‘₯ , βˆ’π‘¦π‘’,π‘₯

2π‘šβ€²

π‘₯=1

𝑑 𝐴, 𝐡 =

𝑑 𝐴, 𝐢 =

𝑑 𝐡, 𝐢 =

26

M

Z

T

D

Alice

Bob

Carol


Collaborative Filtering: Example

π‘Œ =4 5 45 5 15 3 4

𝑑 𝑒, 𝑒′ =1

π‘šβ€² 𝑦𝑒′,π‘₯ , βˆ’π‘¦π‘’,π‘₯

2π‘šβ€²

π‘₯=1

𝑑 𝐴, 𝐡 = 2.9

𝑑 𝐴, 𝐢 = 1

𝑑 𝐡, 𝐢 = 1.4

27

M

Z

T

D

Alice

Bob

Carol


Collaborative Filtering: Example

π‘Œ =4 5 45 5 15 3 4

π‘“πœƒ 𝐴, 𝑍 =

1

𝑑(𝐴,𝑒𝑖)𝑦𝑒𝑖,𝑍2 nearest

neighbors 𝑒𝑖 of 𝐴

1

𝑑(𝐴,𝑒𝑖)2 nearest

neighbors 𝑒𝑖 of 𝐴

=

28

M

Z

T

D

Alice

Bob

Carol


Collaborative Filtering: Example

π‘Œ =4 5 45 5 15 3 4

π‘“πœƒ 𝐴, 𝑍 =

1

𝑑(𝐴,𝑒𝑖)𝑦𝑒𝑖,𝑍2 nearest

neighbors 𝑒𝑖 of 𝐴

1

𝑑(𝐴,𝑒𝑖)2 nearest

neighbors 𝑒𝑖 of 𝐴

=1

2.95+

1

13

1

2.9+

1

1

29

M

Z

T

D

Alice

Bob

Carol


Collaborative Filtering: Discussion

k-nearest-neighbor and similar methods are called memory-based approaches.
There are no model parameters; no optimization criterion is being optimized.
Each prediction requires an iteration over all training instances → impractical!
Better: train a model by minimizing an appropriate loss function over a space of model parameters, then use the model to make predictions quickly.


Latent Features

Idea: instead of an ad-hoc definition of the distance between users, learn features that actually represent preferences.
If, for every user u, we had a feature vector ψ_u that describes their preferences,
then we could learn parameters θ_x for item x such that θ_x^T ψ_u quantifies how much u enjoys x.


Latent Features

Or, turned around:
If, for every item x, we had a feature vector φ_x that characterizes its properties,
we could learn parameters θ_u such that θ_u^T φ_x quantifies how much u enjoys x.
In practice, some user attributes ψ_u and item attributes φ_x are usually available, but they are insufficient to understand u's preferences and x's relevant properties.


Latent Features

Idea: construct user attributes ψ_u and item attributes φ_x such that the ratings in the training data can be predicted accurately.
Decision function: f_{Ψ,Φ}(u, x) = ψ_u^T φ_x
The prediction is the product of user preferences and item properties.
Model parameters:
Matrix Ψ of user features ψ_u for all users,
Matrix Φ of item features φ_x for all items.


Latent Features

Optimization criterion:
Ψ*, Φ* = argmin_{Ψ,Φ} Σ_{u,x} ℓ( y_{u,x}, f_{Ψ,Φ}(u, x) ) + λ ( Σ_u ‖ψ_u‖² + Σ_x ‖φ_x‖² )
The feature vectors of all users and all items are regularized.


Latent Features

Both item and user features are the solution of an optimization problem.

Number of features π‘˜ has to be set.

Meaning of the features is not pre-determined.

Sometimes they turn out to be interpretable.


Matrix Factorization

Decision function:
f_{Ψ,Φ}(u, x) = ψ_u^T φ_x
In matrix notation:
Ŷ(Ψ, Φ) = Ψ Φ^T
Matrix elements:
    [ ŷ11 … ŷ1m' ]   [ ψ11 … ψ1k ] [ φ11 … φm'1 ]
    [  ⋮   ⋱  ⋮  ] = [  ⋮   ⋱  ⋮ ] [  ⋮   ⋱  ⋮  ]
    [ ŷm1 … ŷmm' ]   [ ψm1 … ψmk ] [ φ1k … φm'k ]


Matrix Factorization

Decision function in matrix notation:
    [ ŷ11 … ŷ1m' ]   [ ψ11 … ψ1k ] [ φ11 … φm'1 ]
    [  ⋮   ⋱  ⋮  ] = [  ⋮   ⋱  ⋮ ] [  ⋮   ⋱  ⋮  ]
    [ ŷm1 … ŷmm' ]   [ ψm1 … ψmk ] [ φ1k … φm'k ]
In the movie example (rows: Alice, Bob, Carol; columns: M, Z, T, D): the first row of Ψ holds the latent features of Alice, the first column of Φ^T holds the latent features of Matrix, and their product ŷ11 is the predicted rating of Matrix for Alice.


Matrix Factorization

Decision function in matrix notation:
    [ ŷ11 … ŷ1m' ]   [ ψ11 … ψ1k ] [ φ11 … φm'1 ]
    [  ⋮   ⋱  ⋮  ] = [  ⋮   ⋱  ⋮ ] [  ⋮   ⋱  ⋮  ]
    [ ŷm1 … ŷmm' ]   [ ψm1 … ψmk ] [ φ1k … φm'k ]
The rows of Ψ are the latent features of the users (e.g., Alice, Carol); the columns of Φ^T are the latent features of the items (e.g., Matrix, Death Proof).


Matrix Factorization

Optimization criterion:
Ψ*, Φ* = argmin_{Ψ,Φ} Σ_{u,x} ℓ( y_{u,x}, f_{Ψ,Φ}(u, x) ) + λ ( ‖Ψ‖² + ‖Φ‖² )
The criterion is not convex:
For instance, multiplying all feature vectors by −1 gives an equally good solution:
f_{Ψ,Φ}(u, x) = ψ_u^T φ_x = (−ψ_u)^T (−φ_x)
Limiting the number of latent features to k restricts the rank of the matrix Ŷ.


Matrix Factorization

Optimization criterion:
Ψ*, Φ* = argmin_{Ψ,Φ} Σ_{u,x} ℓ( y_{u,x}, f_{Ψ,Φ}(u, x) ) + λ ( ‖Ψ‖² + ‖Φ‖² )
Optimization by
Stochastic gradient descent or
Alternating least squares.


Matrix Factorization by Stochastic Gradient Descent

Iterate through the ratings y_{u,x} in the training sample:
Let ψ_u ← ψ_u − α ∂ℓ( y_{u,x}, f_{Ψ,Φ}(u, x) ) / ∂ψ_u
Let φ_x ← φ_x − α ∂ℓ( y_{u,x}, f_{Ψ,Φ}(u, x) ) / ∂φ_x
until convergence.
Requires a differentiable loss function; e.g., squared loss, …
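A minimal sketch of this procedure for squared loss, assuming numpy; the initialization scale, learning rate α, regularization inside the update, and epoch count are illustrative choices, not prescribed by the slide:

import numpy as np

def mf_sgd(ratings, m, m_prime, k=10, lam=0.1, alpha=0.01, epochs=50, seed=0):
    # Stochastic gradient descent on sum_(u,x) (y_ux - psi_u^T phi_x)^2 + regularization.
    # ratings: list of (u, x, y) with u in 0..m-1 and x in 0..m_prime-1.
    rng = np.random.default_rng(seed)
    Psi = 0.1 * rng.standard_normal((m, k))        # user features psi_u (rows)
    Phi = 0.1 * rng.standard_normal((m_prime, k))  # item features phi_x (rows)
    for _ in range(epochs):
        for i in rng.permutation(len(ratings)):    # visit ratings in random order
            u, x, y = ratings[i]
            err = y - Psi[u] @ Phi[x]              # residual of the current prediction
            psi_u = Psi[u].copy()
            Psi[u] += alpha * (err * Phi[x] - lam * Psi[u])
            Phi[x] += alpha * (err * psi_u - lam * Phi[x])
    return Psi, Phi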


Matrix Factorization by Alternating Least Squares

For squared loss and parallel architectures.
Alternate between two optimization processes:
Keep Φ fixed, optimize ψ_u in parallel for all u.
Keep Ψ fixed, optimize φ_x in parallel for all x.


Matrix Factorization by Alternating Least Squares

For squared loss and parallel architectures.
Alternate between two optimization processes:
Keep Φ fixed, optimize ψ_u in parallel for all u.
Keep Ψ fixed, optimize φ_x in parallel for all x.
Optimization criterion for Ψ:
ψ_u* = argmin_{ψ_u} ‖y_u − ŷ_u‖² + λ ‖ψ_u‖²
     = argmin_{ψ_u} ‖y_u − ψ_u^T Φ^T‖² + λ ‖ψ_u‖²
where the row vector of predicted ratings for user u is ( ŷ_u1 … ŷ_um' ) = ( ψ_u1 … ψ_uk ) Φ^T.


Matrix Factorization by Alternating Least Squares

For squared loss and parallel architectures.
Alternate between two optimization processes:
Keep Φ fixed, optimize ψ_u in parallel for all u.
Keep Ψ fixed, optimize φ_x in parallel for all x.
Optimization criterion for Ψ:
ψ_u* = argmin_{ψ_u} ‖y_u − ψ_u^T Φ^T‖² + λ ‖ψ_u‖²
Closed-form solution:
ψ_u* = ( Φ^T Φ + λ I )^{−1} Φ^T y_u


Matrix Factorization by Alternating Least Squares

For squared loss and parallel architectures.
Alternate between two optimization processes:
Keep Φ fixed, optimize ψ_u in parallel for all u.
Keep Ψ fixed, optimize φ_x in parallel for all x.
Optimization criterion for Ψ:
ψ_u* = argmin_{ψ_u} ‖y_u − ψ_u^T Φ^T‖² + λ ‖ψ_u‖²,   ψ_u* = ( Φ^T Φ + λ I )^{−1} Φ^T y_u
Optimization criterion for Φ:
φ_x* = argmin_{φ_x} ‖y_x − φ_x^T Ψ^T‖² + λ ‖φ_x‖²,   φ_x* = ( Ψ^T Ψ + λ I )^{−1} Ψ^T y_x


Matrix Factorization by Alternating Least Squares

For squared loss and parallel architectures.
Initialize Ψ, Φ randomly.
Repeat until convergence:
Keep Φ fixed, for all u in parallel: ψ_u = ( Φ^T Φ + λ I )^{−1} Φ^T y_u
Keep Ψ fixed, for all x in parallel: φ_x = ( Ψ^T Ψ + λ I )^{−1} Ψ^T y_x


Extensions: Biases

Some users just give optimistic or pessimistic ratings; some items are hyped.
Decision function:
f_{Ψ,Φ,B_u,B_x}(u, x) = b_u + b_x + ψ_u^T φ_x
Optimization criterion:
Ψ*, Φ*, B_u*, B_x* = argmin_{Ψ,Φ,B_u,B_x} Σ_{u,x} ℓ( y_{u,x}, f_{Ψ,Φ,B_u,B_x}(u, x) ) + λ ( ‖Ψ‖² + ‖Φ‖² + ‖B_u‖² + ‖B_x‖² )
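A minimal sketch of the biased decision function (illustrative; in an SGD implementation like the sketch above, the biases b_u and b_x would receive analogous additive update steps):

import numpy as np

def predict_with_biases(b_u, b_x, psi_u, phi_x):
    # Decision function with biases: f(u, x) = b_u + b_x + psi_u^T phi_x.
    return float(b_u + b_x + psi_u @ phi_x)

# e.g., a slightly pessimistic user and a hyped item:
print(predict_with_biases(-0.5, 1.0, np.array([1.0, 0.0]), np.array([0.5, 2.0])))  # 1.0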


Extensions: Explicit Features

Often, explicit user and item features are available.
Concatenate the explicit features with the vectors ψ_u and φ_x; the explicit features are fixed, the latent features are free parameters.


Extensions: Temporal Dynamics

How much a user likes an item depends on the point in time when the rating takes place.
f_{Ψ,Φ,B_u,B_x,t}(u, x) = b_u(t) + b_x(t) + ψ_u(t)^T φ_x


Summary

Purely content-based recommendation: users don't benefit from other users' ratings.
Collaborative filtering by nearest neighbors: fixed definition of the similarity of users. No model parameters, no learning. Has to iterate over the data to make a recommendation.
Latent factor models, matrix factorization: user preferences and item properties are free parameters, optimized to minimize the discrepancy between inferred and actual ratings.
