Focused Matrix Factorization for Audience Selection in Display Advertising BHARGAV KANAGAL, AMR...

Focused Matrix Factorization for

Audience Selection in Display Advertising

BHARGAV KANAGAL, AMR AHMED, SANDEEP PANDEY, VANJA JOSIFOVSKI, LLUIS GARCIA-PUEYO, JEFF YUAN

PRESENTER: I GDE DHARMA NUGRAHA

CHONNAM NATIONAL UNIVERSITY

Outline

Introduction

Problem in Matrix Factorization

Focused Matrix Factorization Model

Model Learning and Inference

Implementation

Experimental Evaluation

Conclusion

Introduction

Audience selection, or audience retrieval, is the problem in display advertising of showing ads to the users who are most likely to show interest in a campaign and respond positively to it.

Users' past feedback on a campaign can be leveraged to construct such a list of users with collaborative filtering techniques such as matrix factorization.

However, user-campaign interaction data is typically extremely sparse, so conventional matrix factorization does not perform well.

Moreover, simply combining user feedback from all campaigns does not solve the problem, because it dilutes the focus on the target campaign under consideration.

Introduction

To resolve these issues, the paper proposes a novel focused matrix factorization (FMF) model, which:

◦ Learns users' preferences towards the specific campaign products, while also exploiting information about related products.

◦ Exploits the product taxonomy to discover related campaigns, and designs models that discriminate between users' interest in campaign products and in non-campaign products.

Introduction

The figure illustrates the different approaches considered in the paper.


For audience selection, each campaign c has a response matrix Xc, where Xc(i,j) = 1 if user i has purchased item j. Given an item p, the audience selection problem requires the algorithm to select the set of users who are most likely to purchase p.

Matrix factorization is one method that can be used to solve the audience retrieval task.

It measures the affinity between a user and an unpurchased item as the dot product of the corresponding factors, and then selects the top-k users with the highest affinity for each item.

In matrix factorization, each user u is represented by a latent factor $v_u$ and each item i by a latent factor $v_i$, and the affinity of user u towards item i follows this model:

$$x_{ui} = \langle v_u, v_i \rangle$$

Here $x_{ui}$ is the affinity of user u to item i, and $\langle v_u, v_i \rangle$ is the dot product of the corresponding user and item factors.
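The dot-product affinity and the top-k selection described above can be sketched in a few lines of Python. This is only an illustration: the function names, the user ids and the 2-dimensional factor values are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence, Set


def affinity(user_factor: Sequence[float], item_factor: Sequence[float]) -> float:
    """Affinity of a user to an item: dot product of their latent factors."""
    return sum(a * b for a, b in zip(user_factor, item_factor))


def top_k_users(
    user_factors: Dict[str, Sequence[float]],
    item_factor: Sequence[float],
    already_purchased: Set[str],
    k: int,
) -> List[str]:
    """Rank users who have not purchased the item by affinity; keep the top k."""
    scored = [
        (affinity(factor, item_factor), user)
        for user, factor in user_factors.items()
        if user not in already_purchased
    ]
    # Highest affinity first; ties are broken by user id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in scored[:k]]


# Hypothetical factors for three users and one item.
user_factors = {"u1": [1.0, 0.0], "u2": [0.5, 0.5], "u3": [0.0, 1.0]}
item_factor = [2.0, 1.0]
audience = top_k_users(user_factors, item_factor, already_purchased={"u1"}, k=1)
print(audience)
```

Users who already bought the item are excluded before ranking, which matches the slide's wording that affinity is measured for unpurchased items.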

Problem in Matrix Factorization

Given this affinity model, learning means estimating the parameters $\Theta$, i.e. the user factors $v_u$ and the item factors $v_i$ that determine the ranking values.

A widely used approach to learning $\Theta$ is Bayesian Personalized Ranking (BPR). BPR is used in MF to learn a ranking function Ri for each item i that ranks the users interested in i higher than the users who are not.

If user u1 has purchased item i and user u2 has not, the ranking should place u1 above u2, which requires $x_{u_1 i} > x_{u_2 i}$, where $x_{ui}$ denotes the affinity of user u to item i. Based on this argument, the likelihood function is given by:

$$p(R \mid \Theta) = \prod_{i} \prod_{u_1 \in L_i} \prod_{u_2 \notin L_i} \sigma(x_{u_1 i} - x_{u_2 i})$$

where $L_i$ is the list of users who have purchased item i.

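Problem in Matrix Factorization

$\sigma$ is the logistic sigmoid function, which approximates the non-smooth, non-differentiable expression $x_{u_1 i} > x_{u_2 i}$, where

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

This method places a Gaussian prior over all the factors in the parameter set $\Theta$ and computes the MAP (maximum a posteriori) estimate of $\Theta$. The posterior over $\Theta$ is given by:

$$p(\Theta \mid R) \propto p(R \mid \Theta)\, p(\Theta)$$

Maximizing this posterior is equivalent to maximizing the following objective:

$$\max_{\Theta} \; \sum_{i} \sum_{u_1 \in L_i} \sum_{u_2 \notin L_i} \ln \sigma(x_{u_1 i} - x_{u_2 i}) \; - \; \lambda \lVert \Theta \rVert^2$$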

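Problem in Matrix Factorization

The first summation term corresponds to the log-likelihood, i.e., $\ln p(R \mid \Theta)$, whereas the second term corresponds to the log of the Gaussian prior, i.e., $\ln p(\Theta)$.

$\lambda$ is a constant, inversely proportional to the variance of the Gaussian prior. $\lVert \Theta \rVert^2$ is given by the following expression:

$$\lVert \Theta \rVert^2 = \sum_{u} \lVert v_u \rVert^2 + \sum_{i} \lVert v_i \rVert^2$$

This is commonly called the regularization term. It is used to prevent overfitting by keeping the learned user factors $v_u$ and item factors $v_i$ small.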

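Problem in Matrix Factorization

The user and item factors are learned with stochastic gradient descent (SGD), using the derivatives of the objective.

For a given training data point (i, u1, u2), the derivatives of the log-likelihood term $L = \ln \sigma(x_{u_1 i} - x_{u_2 i})$ with respect to the item factor $v_i$ and the user factors $v_{u_1}$ and $v_{u_2}$ are shown below:

$$\frac{\partial L}{\partial v_i} = c\,(v_{u_1} - v_{u_2}), \qquad \frac{\partial L}{\partial v_{u_1}} = c\, v_i, \qquad \frac{\partial L}{\partial v_{u_2}} = -c\, v_i$$

where $c$ is equal to $1 - \sigma(x_{u_1 i} - x_{u_2 i})$.

The update equations corresponding to these derivatives are shown below:

$$v_i \leftarrow v_i + \epsilon \left( c\,(v_{u_1} - v_{u_2}) - \lambda v_i \right)$$

$$v_{u_1} \leftarrow v_{u_1} + \epsilon \left( c\, v_i - \lambda v_{u_1} \right)$$

$$v_{u_2} \leftarrow v_{u_2} + \epsilon \left( -c\, v_i - \lambda v_{u_2} \right)$$

where $\epsilon$ is the learning rate, which is set to a small value, and $\lambda$ is the regularization constant, usually chosen via cross-validation (the factor 2 from differentiating the squared norm is absorbed into $\lambda$).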
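One SGD update for a sampled triple (i, u1, u2) can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes the standard BPR gradient of ln σ(x_u1i − x_u2i), an L2 penalty whose factor 2 is folded into `reg`, and hypothetical names (`bpr_sgd_step`, `lr`, `reg`) and toy values.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bpr_sgd_step(
    v_i: Sequence[float],
    v_u1: Sequence[float],
    v_u2: Sequence[float],
    lr: float,
    reg: float,
) -> Tuple[List[float], List[float], List[float]]:
    """One gradient-ascent step for item i, buyer u1 and non-buyer u2."""
    # c = 1 - sigmoid(margin) is the derivative of ln sigmoid(margin).
    c = 1.0 - sigmoid(dot(v_u1, v_i) - dot(v_u2, v_i))
    new_v_i = [w + lr * (c * (a - b) - reg * w) for w, a, b in zip(v_i, v_u1, v_u2)]
    new_v_u1 = [a + lr * (c * w - reg * a) for a, w in zip(v_u1, v_i)]
    new_v_u2 = [b + lr * (-c * w - reg * b) for b, w in zip(v_u2, v_i)]
    return new_v_i, new_v_u1, new_v_u2


# Hypothetical starting factors in which the non-buyer u2 is ranked above u1.
v_i, v_u1, v_u2 = [0.1, 0.2], [0.3, -0.1], [0.2, 0.4]
margin_before = dot(v_u1, v_i) - dot(v_u2, v_i)
v_i, v_u1, v_u2 = bpr_sgd_step(v_i, v_u1, v_u2, lr=0.1, reg=0.01)
margin_after = dot(v_u1, v_i) - dot(v_u2, v_i)
print(margin_before, margin_after)
```

All three updates are computed from the factors as they were before the step, so the order in which they are applied does not matter.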

Matrix factorization works well when the campaign has a rich response matrix, but it is known to struggle in the presence of sparsity, in particular:

◦ Most tail campaigns are small in terms of the number of items and have a “narrow” response matrix.

◦ New campaigns do not have enough historical purchase data.

Focused Matrix Factorization Model

FMF builds on global matrix factorization (GMF). GMF addresses the sparsity problem in matrix factorization by combining the response matrices of all campaigns.

GMF constructs a global response matrix that combines the per-campaign matrices Xc and so includes user-product data from all campaigns.

The global response matrix is comparatively dense and can be factorized under the BPR framework to derive the item and user factors, $v_i$ and $v_u$, respectively.

Then, given any target campaign, these factors can be used to perform audience selection.

GMF resolves the sparsity issue and gives insight into the general buying pattern of the user. However, GMF does not capture campaign-specific user preferences accurately.

Focused Matrix Factorization Model

Focused matrix factorization (FMF) was proposed to address the drawbacks of GMF: it borrows relevant information from other campaigns while still retaining focus on the target campaign.

FMF comprises three focused collaborative filtering models, FMF1, FMF2 and FMF3, with varying degrees of sharing between campaigns.

Notation for FMF:

◦ T denotes the items in the target campaign for which audience selection is performed.

◦ N denotes the items in the campaigns that are not part of the target.

◦ The full set of items is therefore $T \cup N$.

◦ $v_u^S$ denotes the factor of user u that holds the information shared between the target and non-target campaigns.

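Focused Matrix Factorization Model

FMF1

◦ The key idea is that instead of learning one set of user preferences as in GMF, each user is allowed to have two sets of preferences: one set for the target campaign (focus preferences) and the other for the non-target campaigns (non-focus preferences).

◦ Let $v_u^{\text{target}}$ and $v_u^{\text{non-target}}$ denote the factors of user u for the target and non-target campaigns respectively. To allow information transfer between the target and non-target campaigns, these factors are constrained to share a common vector $v_u^S$ and differ only in residual vectors $r_u^T$ and $r_u^N$:

$$v_u^{\text{target}} = v_u^S + r_u^T, \qquad v_u^{\text{non-target}} = v_u^S + r_u^N$$

◦ With $v_i$ denoting the factor of item i, the affinity between user u and item i in the FMF1 model is written as:

$$x_{ui} = \begin{cases} \langle v_u^S + r_u^T,\; v_i \rangle & \text{if } i \in T \\ \langle v_u^S + r_u^N,\; v_i \rangle & \text{if } i \in N \end{cases}$$

◦ If the target and non-target items are completely independent and do not share any information, then $v_u^S$ can be set to 0 and only $r_u^T$ is used to compute user affinity for the target items (i.e. the MF model).

◦ On the other hand, if the target and non-target items are completely alike, then the residual vectors $r_u^T$ and $r_u^N$ can be set to 0 and the shared vector $v_u^S$ is used for the affinity computation (i.e. the GMF model).

◦ Without loss of generality, the FMF1 model can be simplified by setting $r_u^T$ to 0. Writing the remaining non-target residual as $v_u^N$, the final FMF1 model is:

$$x_{ui} = \begin{cases} \langle v_u^S,\; v_i \rangle & \text{if } i \in T \\ \langle v_u^S + v_u^N,\; v_i \rangle & \text{if } i \in N \end{cases}$$

◦ Thus in FMF1, $\Theta = \{v^S, v^N, v^I\}$, where $v^I$ collects the item factors.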


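FMF2

◦ FMF2 takes the similarity between the target and non-target campaigns into account when sharing information.

◦ To allow this, FMF2 introduces a variable $\alpha_j$ for every non-target campaign j from which we want to borrow (let $N_j$ be the set of items in non-target campaign j) to control the degree of sharing.

◦ A large positive value of $\alpha_j$ indicates a large correlation between the target and the non-target campaign, a small value indicates the absence of interaction, and a negative value suggests anti-correlation.

◦ With $v_u^S$ the shared factor of user u, $v_u^N$ the non-target residual factor and $v_i$ the factor of item i, the FMF2 model is:

$$x_{ui} = \begin{cases} \langle v_u^S,\; v_i \rangle & \text{if } i \in T \\ \langle \alpha_j v_u^S + v_u^N,\; v_i \rangle & \text{if } i \in N_j \end{cases}$$

◦ Thus in FMF2, $\Theta = \{v^S, v^N, v^I, \alpha\}$.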
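The affinity computation of FMF1 and FMF2 can be sketched in one small function. The sketch assumes the form suggested by the bullets above: target items use only the shared user factor, while items of non-target campaign j use the shared factor scaled by that campaign's sharing weight plus the non-target residual; FMF1 then corresponds to a weight of 1. The function name and all numbers are hypothetical.

```python
from typing import Sequence


def fmf_affinity(
    v_shared: Sequence[float],
    v_nontarget: Sequence[float],
    v_item: Sequence[float],
    in_target: bool,
    alpha: float = 1.0,
) -> float:
    """Affinity of one user to one item under the assumed FMF1/FMF2 form.

    in_target=True  -> <v_shared, v_item>
    in_target=False -> <alpha * v_shared + v_nontarget, v_item>
    alpha=1.0 reproduces FMF1; a per-campaign alpha gives FMF2.
    """
    if in_target:
        user = list(v_shared)
    else:
        user = [alpha * s + n for s, n in zip(v_shared, v_nontarget)]
    return sum(u * w for u, w in zip(user, v_item))


# Hypothetical 2-dimensional factors for a single user and item.
v_shared, v_nontarget, v_item = [1.0, 2.0], [0.5, 0.0], [2.0, 1.0]

target_score = fmf_affinity(v_shared, v_nontarget, v_item, in_target=True)
fmf1_score = fmf_affinity(v_shared, v_nontarget, v_item, in_target=False)
fmf2_score = fmf_affinity(v_shared, v_nontarget, v_item, in_target=False, alpha=0.5)
print(target_score, fmf1_score, fmf2_score)
```

With `alpha=0.0` the non-target score depends only on the residual factor, which is the "absence of interaction" case described in the bullets.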


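FMF3

◦ The FMF2 model allows borrowing information differently from different campaigns by keeping a separate $\alpha_j$ variable for each non-target campaign j.

◦ This model can be further generalized to have not only a different $\alpha_j$ but also a different residual vector $v_u^{N_j}$ for each non-target campaign j. In other words, each non-target campaign has its own residual vector to capture its specificity.

◦ However, this results in too many user factors, which can be difficult to estimate in practice. To avoid this, each $v_u^{N_j}$ is set to be $(1 - \alpha_j)\, v_u^N$. So the FMF3 model looks like:

$$x_{ui} = \begin{cases} \langle v_u^S,\; v_i \rangle & \text{if } i \in T \\ \langle \alpha_j v_u^S + (1 - \alpha_j)\, v_u^N,\; v_i \rangle & \text{if } i \in N_j \end{cases}$$

◦ Thus in FMF3, $\Theta = \{v^S, v^N, v^I, \alpha\}$.

◦ $\alpha_j$ is restricted to be in the range [0, 1].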

Focused Matrix Factorization Model

Leveraging Item Taxonomy

◦ The performance of these models can be further boosted by leveraging an additional source of information: a taxonomy over the products.

◦ The taxonomy provides the lineage of a product in terms of the parent categories that it belongs to.

◦ The taxonomy is incorporated using a hierarchical additive model over the item factors.

◦ In particular, a factor $w_n$ is introduced for every node n in the taxonomy, and the item factors are defined using the following equation, where D is the depth of the taxonomy:

$$v_i = \sum_{m=0}^{D} w_{p^m(i)}$$

◦ Here $p^m(i)$ denotes the m-th ancestor of item i, with $p^0(i) = i$.
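The hierarchical additive model can be illustrated with a short sketch that builds an item factor by summing the factors of the item node and of every ancestor category. The taxonomy, the node names and the factor values below are hypothetical, and representing the taxonomy as a child-to-parent dictionary is an assumption made for the example.

```python
from typing import Dict, List, Optional, Sequence


def item_factor(
    item: str,
    parent: Dict[str, str],
    node_factors: Dict[str, Sequence[float]],
) -> List[float]:
    """Sum the factors of the item node and all of its taxonomy ancestors."""
    total = [0.0] * len(node_factors[item])
    node: Optional[str] = item
    while node is not None:
        total = [t + w for t, w in zip(total, node_factors[node])]
        node = parent.get(node)  # None once the top-level category is passed
    return total


# Hypothetical lineage: tv_123 -> televisions -> electronics.
parent = {"tv_123": "televisions", "televisions": "electronics"}
node_factors = {
    "electronics": [1.0, 0.0],
    "televisions": [0.5, 0.5],
    "tv_123": [0.25, -0.25],
}
v_item = item_factor("tv_123", parent, node_factors)
print(v_item)
```

Because sibling items share the factors of their common ancestors, an item with few purchases still inherits most of its representation from better-observed categories.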

Model Learning and Inference

The output of the model needs to be a ranking function Ri for each item i in the target campaign T that ranks users by how likely they are to purchase the item.

From the training data, if user u1 has bought item i and another user u2 has not, then the affinities should satisfy $x_{u_1 i} > x_{u_2 i}$.

The BPR objective function essentially enforces this criterion for all items and every pair of users within a given item. With $L_i$ the list of users who have purchased item i, the log-likelihood function for this case is given by:

$$\sum_{i \in T \cup N} \sum_{u_1 \in L_i} \sum_{u_2 \notin L_i} \ln \sigma(x_{u_1 i} - x_{u_2 i})$$

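Model Learning and Inference

The target campaign items should be treated differently from the non-target campaign items: errors on the target campaign are penalized much more than errors on the non-target campaigns, using weights A and B for the respective terms in the summation. The regularized log-likelihood expression is now given by:

$$A \sum_{i \in T} \sum_{u_1 \in L_i} \sum_{u_2 \notin L_i} \ln \sigma(x_{u_1 i} - x_{u_2 i}) \; + \; B \sum_{i \in N} \sum_{u_1 \in L_i} \sum_{u_2 \notin L_i} \ln \sigma(x_{u_1 i} - x_{u_2 i}) \; - \; \sum_{\theta \in \Theta} \lambda_\theta \lVert \theta \rVert^2$$

where the $\lambda$'s denote the regularization constants.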

Model Learning and Inference

To use SGD, a term from the summation is sampled, denoted by (i, u1, u2). Depending on whether the item is from the target campaign (i.e., i ∈ T) or from some non-target campaign j (i.e., i ∈ Nj), the model obtains two sets of gradients, which are shown in the figure.
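Since the gradient figure is not reproduced in this transcript, the two cases can be sketched in code. This is an illustration under stated assumptions, not the authors' formulas: it assumes the FMF2 affinity ⟨v_S, v_i⟩ for target items and ⟨α_j v_S + v_N, v_i⟩ for items of non-target campaign j, weights A and B on the two kinds of terms, and it omits regularization. All function and variable names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Union


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def user_difference(in_target, alpha, s1, n1, s2, n2) -> List[float]:
    """Effective user factor of buyer u1 minus that of non-buyer u2."""
    if in_target:
        return [a - b for a, b in zip(s1, s2)]
    return [alpha * (a - b) + (c - d) for a, b, c, d in zip(s1, s2, n1, n2)]


def margin(in_target, alpha, s1, n1, s2, n2, v_i) -> float:
    """x_{u1 i} - x_{u2 i} under the assumed FMF2 affinity."""
    return dot(user_difference(in_target, alpha, s1, n1, s2, n2), v_i)


def sampled_gradients(
    in_target, alpha, s1, n1, s2, n2, v_i, weight_a, weight_b
) -> Dict[str, Union[float, List[float]]]:
    """Gradients of weight * ln sigmoid(margin) for one sampled (i, u1, u2)."""
    weight = weight_a if in_target else weight_b
    c = weight * (1.0 - sigmoid(margin(in_target, alpha, s1, n1, s2, n2, v_i)))
    share = 1.0 if in_target else alpha
    diff = user_difference(in_target, alpha, s1, n1, s2, n2)
    grads: Dict[str, Union[float, List[float]]] = {
        "v_i": [c * d for d in diff],
        "s_u1": [c * share * w for w in v_i],
        "s_u2": [-c * share * w for w in v_i],
    }
    if not in_target:
        # Residual factors and alpha_j only appear in the non-target case.
        grads["n_u1"] = [c * w for w in v_i]
        grads["n_u2"] = [-c * w for w in v_i]
        grads["alpha"] = c * dot([a - b for a, b in zip(s1, s2)], v_i)
    return grads


# Hypothetical factors for one sampled triple from a non-target campaign.
s1, n1, s2, n2 = [0.3, -0.1], [0.2, 0.1], [0.1, 0.4], [-0.2, 0.3]
v_i, alpha = [0.5, -0.7], 0.6
grads = sampled_gradients(False, alpha, s1, n1, s2, n2, v_i, weight_a=2.0, weight_b=0.5)
print(sorted(grads))
```

A target-campaign sample touches only the shared user factors and the item factor, while a non-target sample also updates the residual factors and the sharing weight of its campaign.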

Implementation

The model is implemented in C++ as a multi-core implementation, using the Boost library for storing the factor matrices.

The global state maintained by the SGD algorithm consists of the 3 factor matrices {vS, vN, vI} and the α vector. A lock is introduced for each row of the factor matrices.

Each training iteration of the SGD algorithm executes 3 steps:

◦ First, sample a 3-tuple (i, u1, u2).

◦ Second, read the appropriate user and item factors and compute the gradients with respect to them.

◦ Third, update the factor matrices based on the gradients.

Using locks over such a small vector can significantly increase the processing time. To alleviate this problem, a caching technique is proposed.

Implementation

Caching technique

◦ In this technique, each thread Ti maintains two values in its cache: the value with which the thread started off, and the locally cached value that it keeps updating. The difference between the two is kept within a given tolerance threshold.

◦ Whenever the difference exceeds the threshold, the thread reconciles its locally cached copy with the global value.

◦ The figure shows an illustration of this implementation.
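The caching idea can be sketched as follows. The reconciliation rule is not reproduced in this transcript, so the rule used here is an assumption: the thread adds its accumulated local change to the global value and then restarts from the merged value. The class name, the dictionary used as global state and the numbers are hypothetical, and the demonstration runs the two "threads" sequentially for determinism.

```python
import threading
from typing import Dict


class CachedParameter:
    """Thread-local cache of one shared value with tolerance-based syncing."""

    def __init__(
        self, shared: Dict[str, float], key: str, lock: threading.Lock, tolerance: float
    ) -> None:
        self.shared, self.key, self.lock = shared, key, lock
        self.tolerance = tolerance
        self.base = shared[key]  # value this thread started off with
        self.local = self.base   # locally cached, freely updated value

    def apply_update(self, delta: float) -> None:
        """Update the local copy; sync only when it drifts past the tolerance."""
        self.local += delta
        if abs(self.local - self.base) > self.tolerance:
            self.reconcile()

    def reconcile(self) -> None:
        # Assumed rule: push the accumulated local change, then adopt the
        # merged global value as the new starting point.
        with self.lock:
            self.shared[self.key] += self.local - self.base
            self.base = self.local = self.shared[self.key]


shared_state = {"alpha": 1.0}
lock = threading.Lock()
t1 = CachedParameter(shared_state, "alpha", lock, tolerance=0.5)
t2 = CachedParameter(shared_state, "alpha", lock, tolerance=0.5)

t1.apply_update(0.25)  # within tolerance: no lock taken, global unchanged
t2.apply_update(0.75)  # exceeds tolerance: merged into the global value
t1.apply_update(0.5)   # t1 has now drifted by 0.75 and merges as well
print(shared_state["alpha"], t1.local, t2.local)
```

With a tolerance of 0 every non-zero update triggers a reconciliation, while a larger tolerance takes the lock less often at the cost of threads working on staler values.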

Experimental Evaluation

Experimental Setup

◦ The evaluation dataset is a log of previous advertising campaigns obtained from a major advertising network.

◦ The dataset contains information about the items corresponding to various advertising campaigns and an anonymized list of users who actually responded to each campaign by purchasing a campaign item.

◦ In addition, the dataset contains a taxonomy over the various items in the campaigns.

◦ It contains 50,000 users and around a million items in the taxonomy.

◦ The taxonomy is 3 levels deep, with around 1500 nodes at the lowest level, 270 at the middle level and 23 top-level categories.

◦ Overall, the dataset contains 23 campaigns.

Experimental Evaluation

Comparison Systems

◦ The experiments compare the proposed models FMF1, FMF2 and FMF3 against the following methods:

1. MF: the basic MF approach.

2. GMF: the basic GMF approach.

3. GMF(t), MF(t): the above models with the taxonomy extension.

Metrics

◦ The metric is the area under the ROC curve (AUC).

◦ AUC is a commonly used metric for testing the quality of rank orderings.

◦ The formula to compute AUC, with rank(u) the position of user u in the ranked list:

$$\mathrm{AUC} = \frac{1}{|B|\,|U \setminus B|} \sum_{u_1 \in B} \sum_{u_2 \in U \setminus B} \mathbb{1}\big(\mathrm{rank}(u_1) < \mathrm{rank}(u_2)\big)$$

◦ Here $\mathbb{1}(\cdot)$ is the indicator function, which is 1 if the condition is satisfied and 0 otherwise. U is the list of users to rank and B is the ground truth (i.e. the set of users that actually bought the item).
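The pairwise reading of AUC given above can be sketched directly: it is the fraction of (buyer, non-buyer) pairs in which the buyer is ranked ahead of the non-buyer. The function name and the toy ranking are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
from typing import Sequence, Set


def auc(ranked_users: Sequence[str], buyers: Set[str]) -> float:
    """Fraction of (buyer, non-buyer) pairs ordered correctly by the ranking."""
    position = {user: rank for rank, user in enumerate(ranked_users)}
    positives = [u for u in ranked_users if u in buyers]
    negatives = [u for u in ranked_users if u not in buyers]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one buyer and one non-buyer")
    correct = sum(
        1 for b in positives for n in negatives if position[b] < position[n]
    )
    return correct / (len(positives) * len(negatives))


# Hypothetical ranking of four users, best first; users a and c bought the item.
score = auc(["a", "b", "c", "d"], buyers={"a", "c"})
print(score)
```

In the toy ranking three of the four (buyer, non-buyer) pairs are in the right order; only the pair (c, b) is inverted.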

Experimental Evaluation

Cross-validation/Parameter Sweep

◦ For each experiment, a parameter sweep was executed on a MapReduce cluster.

◦ The parameters swept over included the regularization constants for U, I and N, and K, the number of factors.

◦ Each parameter setting was evaluated with 4 different initializations, and the best initialization for each configuration was picked in terms of performance on the validation dataset.

◦ The AUC on the test set for a given number of factors is reported for the experiments.

Experimental Evaluation

Experimental Results

◦ In the first experiment, the GMF, MF and FMF2 techniques were compared across different campaigns.

◦ The figure shows the result.


◦ In the second experiment, performance over the individual campaigns was examined.



◦ The third experiment examines the best-performing model across all factor sizes for each campaign.



◦ The fourth experiment examines the influence of the taxonomy on each model.

◦ The figure shows the result.


◦ The last experiment compares the performance of the three FMF models.

◦ For each of the four campaigns, the FMF1, FMF2 and FMF3 models were trained.

◦ As the figure shows, FMF1 and FMF2 perform much better than FMF3.

◦ The reason is that FMF3 is much more constrained than the other two models.


◦ Effect of Campaign Size

◦ The figure shows the result for different campaign sizes.

◦ The performance of the FMF2 model is examined as a function of the target campaign size, i.e., the number of items in the target campaign.

◦ The figure shows that the performance of FMF2 is robust and largely unaffected by the campaign size.


◦ Effect of Intra-campaign relationship (Campaign Homogeneity)

◦ In this experiment, the performance of the FMF2 model was explored as a function of the homogeneity of the target campaign.

◦ The figure shows that the AUC scores increase with the homogeneity of the campaign.


◦ Effect of Inter-campaign relationship (Information Transfer)

◦ This experiment explores the effect of the inter-campaign relationship on information transfer in the FMF2 model.

◦ A fairly homogeneous campaign X was picked and split into two parts, X1 and X2. Another campaign Y was then picked, and two configurations were constructed from X1, X2 and Y. X1 is the target campaign in both. In config 1, X2 is the non-target campaign; in config 2, Y is the non-target campaign.

◦ The figure shows that config 1 has a higher AUC score than config 2, since its non-target campaign X2 is highly similar to X1.


◦ Efficiency

◦ This experiment demonstrates the trade-offs obtained by using the caching technique.

◦ The result is shown in the figure. When the threshold is set to 0, there is complete synchronization. As the threshold is increased, synchronization with the global copy is performed less often, resulting in faster runtime but lower accuracy.

Conclusion

The paper proposes the focused matrix factorization (FMF) model, which appropriately borrows relevant information from other campaigns while still retaining focus on the target campaign.

The experimental results show that the FMF models consistently outperform traditional matrix factorization techniques over all kinds of campaigns.

In addition, the experiments characterize the conditions under which the approach obtains significant improvements.
