
A Case Study of Behavior-driven Conjoint Analysis on Yahoo! Front Page Today Module

    Wei Chu, Seung-Taek Park, Todd Beaupre, Nitin Motgi, Amit Phadke,

    Seinjuti Chakraborty, Joe Zachariah

Yahoo! Inc., 701 First Avenue, Sunnyvale, CA 94089

{chuwei,parkst,tbeaupre,nmotgi,aphadke,seinjuti,joez}@yahoo-inc.com

    ABSTRACT

Conjoint analysis is one of the most popular market research methodologies for assessing how customers with heterogeneous preferences appraise various objective characteristics in products or services, which provides critical inputs for many marketing decisions, e.g. optimal design of new products and target market selection. Nowadays it is practical in e-commerce applications to collect millions of samples quickly. However, such large-scale data sets make traditional conjoint analysis coupled with sophisticated Monte Carlo simulation for parameter estimation computationally prohibitive. In this paper, we report a successful large-scale case study of conjoint analysis on the click-through stream in a real-world application at Yahoo!. We consider identifying users' heterogeneous preferences from millions of click/view events and building predictive models to classify new users into segments with distinct behavior patterns. A scalable conjoint analysis technique, known as tensor segmentation, is developed by utilizing logistic tensor regression in the standard partworth framework. In offline analysis on the samples collected from a random bucket of the Yahoo! Front Page Today Module, we compare tensor segmentation against other segmentation schemes using demographic information, and study user preferences on article content within tensor segments. The knowledge acquired in the segmentation results also provides assistance to editors in content management and user targeting. The usefulness of our approach is further verified by the observations in a bucket test launched in December 2008.

    Categories and Subject Descriptors

H.1.0 [Models and Principles]: General; H.3.3 [Information Search and Retrieval]: Information filtering; H.3.5 [Online Information Services]: Web-based services; H.4 [Information Systems Applications]: Miscellaneous

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD'09, June 28 - July 1, 2009, Paris, France. Copyright 2009 ACM 978-1-60558-495-9/09/06 ...$10.00.

    General Terms

    Algorithms, Experiments, Designs, Performance

    Keywords

Conjoint Analysis, Classification, Clustering, Segmentation, Tensor Product, Logistic Regression

1. INTRODUCTION

Since the advent of conjoint methods in marketing research pioneered by Green and Rao [9], research on theoretical methodologies and pragmatic issues has thrived. Conjoint analysis is one of the most popular marketing research methodologies for assessing users' preferences on various objective characteristics of products or services. Analysis of trade-offs, driven by heterogeneous preferences on benefits derived from product attributes, provides critical input for many marketing decisions, e.g. optimal design of new products, target market selection, and product pricing. It is also an analytical tool for predicting users' plausible reactions to new products or services.

In practice, a set of categorical or quantitative attributes is collected to represent products or services of interest, while a user's preference on a specific attribute is quantified by a utility function (also called a partworth function). While there exist several ways to specify a conjoint model, additive models that linearly sum up individual partworth functions are the most popular choice.
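For concreteness, an additive model scores item $z_j$ for user $i$ as a sum of per-attribute partworths; in generic notation (ours, anticipating the symbols used in Section 4, not a display from the original paper):

$$U_i(z_j) = \sum_{a=1}^{C} \beta_{i,a}\, z_{j,a},$$

where $\beta_{i,a}$ is user $i$'s partworth for the $a$-th attribute and $z_{j,a}$ is the value of that attribute in item $z_j$.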

As a measurement technique for quantifying users' preferences on product attributes (or partworths), conjoint analysis always consists of a series of steps, including stimulus representation, feedback collection and estimation methods. Stimulus representation involves development of stimuli based on a number of salient attributes (hypothetical profiles or choice sets) and presentation of stimuli to appropriate respondents. Based on the nature of users' responses to the stimuli, popular conjoint analysis approaches are either choice-based or ratings-based. Recent developments in estimation methods comprise hierarchical Bayesian (HB) methods [15], polyhedral adaptive estimation [19], Support Vector Machines [2, 7], etc.

New challenges for conjoint analysis techniques arise from emerging personalized services on the Internet. A well-known Web site can easily attract millions of users daily. The Web content is intentionally programmed to cater to users'


needs. As a new kind of stimulus, the content could be a hyperlink with text or an image with a caption. Traditional stimuli, such as questionnaires with itemized answers, are seldom applied on web pages, due to the attention burden on the user side. Millions of responses, such as click or non-click, to content stimuli are collected at much lower cost compared to the feedback solicited in traditional practice, but the implicit feedback without incentive-compatibility constraints is potentially noisy and more difficult to interpret. Although indispensable elements in the traditional settings of conjoint analysis have changed greatly, conjoint analysis is still particularly important in identifying the most appropriate users at the right time, and then optimizing available content to improve user satisfaction and retention.

We summarize three main differences between Web-based conjoint analysis and the traditional one in the following:

The Web content may have various stimuli that potentially contain many psychologically related attributes, rather than predefined attributes of interest as in traditional experimental design. Meanwhile, most users are casual or new visitors who declare part or none of their personal information and interests. Since we have to extract attributes or discover latent features in profiling both content stimuli and users, parameter estimation methods become more challenging than in the traditional situation;

In feedback collection, most respondents haven't experienced strong incentives to expend their cognitive resources on the prominent but unsolicited content. This issue causes a relatively high rate of false negatives (false non-clicks);

The sample size considered in traditional conjoint analysis is usually less than a thousand, whereas it is common in modern e-business applications to observe millions of responses in a short time, e.g. in a few hours. The large-scale data sets make traditional conjoint analysis coupled with sophisticated Monte Carlo simulation for parameter estimation computationally prohibitive.

In this paper, we conduct a case study of conjoint analysis on the click-through stream to understand users' intentions. We construct features to represent the Web content, and collect user information across the Yahoo! network. The partworth function is optimized in a tensor regression framework via gradient descent methods on large-scale samples. In the partworth space, we apply clustering algorithms to identify meaningful segments with distinct behavior patterns. These segments result in significant CTR lift over both the unsegmented baseline and two demographic segmentation methods in offline and online tests on the Yahoo! Front Page Today Module application. Also, by analyzing characteristics of user segments, we obtain interesting insight into users' intention and behavior that could be applied to market campaigns and user targeting. The knowledge could be further utilized to help editors with content management.

The paper is organized as follows: In Section 2, we delineate the scenario under consideration by introducing the Today Module application at Yahoo! Front Page; in Section 3, we review related work in the literature; in Section 4, we present tensor segmentation in detail. We report experimental results in Section 5 and conclude in Section 6.

Figure 1: A snapshot of the default Featured tab in the Today Module on Yahoo! Front Page, delineated by the rectangle. There are four articles displayed at footer positions, indexed by F1, F2, F3 and F4. One of the four articles is highlighted at the story position. By default, the article at F1 is highlighted at the story position.

2. PROBLEM SETTING

In this section, we first describe our problem domain and our motivations for this research work. Then we describe our data set and define some notations.

2.1 Today Module

Today Module is the most prominent panel on Yahoo! Front Page, which is also one of the most popular pages on the Internet; see a snapshot in Figure 1. The default Featured tab in the Today Module highlights one of four high-quality articles selected from a daily-refreshed article pool curated by human editors. As illustrated in Figure 1, there are four articles at footer positions, indexed by F1, F2, F3 and F4 respectively. Each article is represented by a small picture and a title. One of the four articles is highlighted at the story position, which is featured by a large picture, a title and a short summary along with related links. By default, the article at F1 is highlighted at the story position. A user can click on the highlighted article at the story position to read more details if the user is interested in the article. The event is recorded as a story click. If a user is interested in the articles at positions F2-F4, she can highlight the article at the story position by clicking on the footer position.

A pool of articles is maintained by human editors and incarnates the voice of the site, i.e., the desired nature and mix of various content. Fresh stories and breaking news are regularly acquired by the editors to replace out-of-date articles every few hours. The lifetime of articles is short, usually just a few hours, and the popularity of articles, measured by their click-through rate (CTR)¹, changes over time. Yahoo! Front Page serves millions of visitors daily. Each visit generates a view event on the Today Module, though the visitor may not pay any attention to the Today Module. The users of the Today Module, a subset of the whole

¹ CTR of an article is measured by the total number of clicks on the article divided by the total number of views on the article in a certain time interval.


traffic, also generate a large number of story click events by clicking at the story position for more details of the stories they would like to read. Our setting is characterized by the dynamic nature of articles and users. Scalability is also an important requirement in our system design.

One of our goals is to increase user activity, measured by overall CTR, on the Today Module. To draw visitors' attention and increase the number of clicks, we would like to rank available articles according to visitors' interests, and to highlight the most attractive article at the F1 position. In our previous research [1] we developed an Estimated Most Popular (EMP) algorithm, which estimates the CTR of available articles in near real-time by a Kalman filter, and presents the article with the highest estimated CTR at the F1 position. Note that there is no personalized service in that system, i.e. the article shown at F1 is the same for all visitors at a given time. In this work we would like to further boost overall CTR by launching a partially personalized service. User segments determined by conjoint analysis will be served with different content according to segmental interests. Articles with the highest segmental CTR will be served to the corresponding user segments.
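For intuition only, here is a minimal sketch of how a Kalman-style near-real-time CTR tracker for a single article could look. The actual EMP estimator is described in [1]; every name, constant, and modeling choice below is our illustrative assumption, not the paper's.

```python
class CtrTracker:
    """Per-article CTR tracker with a scalar Kalman-style update."""

    def __init__(self, prior_ctr=0.05, prior_var=1e-2, drift_var=1e-6):
        self.ctr = prior_ctr        # current CTR estimate
        self.var = prior_var        # uncertainty of the estimate
        self.drift_var = drift_var  # how fast the true CTR may move per interval

    def update(self, clicks, views):
        if views == 0:
            return self.ctr
        self.var += self.drift_var                    # predict: CTR drifts over time
        obs = clicks / views                          # observed CTR this interval
        obs_var = max(obs * (1 - obs), 1e-4) / views  # binomial observation noise
        gain = self.var / (self.var + obs_var)        # Kalman gain
        self.ctr += gain * (obs - self.ctr)           # correct toward the observation
        self.var *= (1 - gain)
        return self.ctr
```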

In addition to optimizing overall CTR, another goal of this study is to understand users' intention and behavior to some extent for user targeting and market campaigns. Once we identify users who share similar interests in conjoint analysis, predictive models can be built to classify users (including new visitors) into segments. For example, if we find a user segment who like Car & Transportation articles, then we can target this user segment for Autos marketing campaigns. Furthermore, with knowledge of user interests in segments, we can provide assistance to the editors for content management. For example, suppose most users in one segment who like News articles visit us in the morning, while most users in another segment who like TV articles come in the evening. Then editors can target these two segments by simply programming more News articles in the morning and more TV-related articles in the evening.

Note that if we only consider the first goal of maximizing overall CTR, personalized recommender systems at the individual level might be an appropriate approach to pursue. In our recent research [4], we observed that feature-based models yield higher CTR than conjoint models in offline analysis. However, conjoint models significantly outperform the one-size-fits-all EMP approach on the metric of CTR, and also provide actionable management of content and user targeting at the segment level. The flexibility supplied by conjoint models is valuable to portals such as Yahoo!.

2.2 Data Collection

We collected three sets of data: content features, user profiles, and interaction data between users and articles.

Each article is summarized by a set of features, such as topic categories, sub-topics, URL resources, etc. Each visitor is profiled by a set of attributes as well, e.g. age, gender, residential location, Yahoo! property usage, etc. Here we simply selected a set of informative attributes to represent users and articles. Gauch et al. [8] give an extensive review of various profiling techniques.

The interaction data consist of two types of actions, view only or story click, for each pair of a visitor and an article. One visitor can only view a small portion of available articles, and it is also possible that one article is shown to the same visitor more than once. It is difficult to detect whether the visitors have paid enough attention to the article at the story position. Thus, in the collected data a large number of view events are false non-click events. In other words, there is a relatively high rate of false negatives (false non-clicks) in our observations.

There are multiple treatments of users' reactions in modeling the partworth utility, such as the following:

Choice-based responses: We only consider whether an article has been clicked by a visitor, while ignoring repeated views and clicks. In this case, an observed response is simply binary, click or not;

Poisson-based responses: The number of clicks we observed on each article/user pair is considered as a realization from a Poisson distribution;

Metric-based responses: We consider repeated views and clicks and treat the CTR of articles by each user as the target.

In the Today Module setting, Poisson-based and metric-based responses might be vulnerable to the high rate of false-negative observations. Thus we follow choice-based responses only in this work.

2.3 Notations

Let the $i$-th user be indexed by $x_i$, a $D \times 1$ vector of user features, and the $j$-th content item by $z_j$, a $C \times 1$ vector of article features. We denote by $r_{ij}$ the interaction between the user $x_i$ and the item $z_j$, where $r_{ij} \in \{-1, +1\}$ for a view event and a story click event respectively. We only observe interactions on a small subset of all possible user/article pairs, and denote by $\mathcal{O}$ the set of observations $\{r_{ij}\}$.

3. RELATED WORK

In very early studies [21], homogeneous groups of consumers are entailed by a priori segmentation. For example, consumers are assigned to groups on the basis of demographic and socioeconomic variables, and the conjoint models are estimated within each of those groups. Clearly, the criteria in the two steps are not necessarily related: one is the homogeneity of customers in terms of their descriptive variables and the other is the conjoint preference within segments. However, this segmentation strategy is easy to implement and is still widely applied in industry.

Traditionally, conjoint analysis procedures have two stages: 1) estimating a partworth utility function in terms of attributes which represents customers' preferences at the individual level, e.g. via ordinary least squares regression; 2) if segmentation is of interest to marketing, grouping customers into segments where people share similar individual-level partworths, through hierarchical or non-hierarchical clustering algorithms.

Conjoint studies, especially on the partworth function, depend on designs of stimuli (e.g. product profiles or choice sets on questionnaires) and methods of data collection from respondents. One of the challenges in traditional conjoint analysis is to obtain sufficient data from respondents to estimate partworth utilities at the individual level using relatively few questions. The theory of experimental design is adapted for constructing compact profiles to evaluate respondents' opinions effectively. Kuhfeld et al. [14] studied orthogonal


designs for linear models. Huber and Zwerina [11] brought two additional properties, minimal level overlap and utility balance, into choice-based conjoint experiments. Sandor and Wedel [17] developed experimental designs by utilizing prior information. More references were recently reviewed by Rao [16].

Hierarchical Bayesian (HB) methods [15] were developed to exploit partworth information across all respondents in modeling the partworth function. The HB models relate the variation in a subject's metric-based responses and the variation in the subject's partworths over the population as follows:

$$r_{ij} = \beta_i^\top z_j + \epsilon_{ij} \quad \forall i, j, \qquad (1)$$

$$\beta_i = W x_i + \delta_i \quad \text{for } i = 1, \ldots, n, \qquad (2)$$

where $\beta_i$ is a $C$-dimensional partworth vector of the user $i$ and $W$ is a matrix of regression coefficients that relates user attributes to partworths. The error terms $\{\epsilon_{ij}\}$ and $\{\delta_i\}$ in eq(1) and eq(2) are assumed to be mutually independent Gaussian random variables with zero mean and covariance $\{\sigma_j^2\}$ and $\Lambda$ respectively, i.e. $\epsilon_{ij} \sim N(0, \sigma_j^2)$ and $\delta_i \sim N(0, \Lambda)$.² Together with appropriate prior distributions over the remaining variables, the posterior analysis yields the HB estimator of the partworths as a convex combination of an individual-level estimator and a pooled estimator, in which the weights mainly depend on the noise levels $\{\sigma_j^2\}$ and $\Lambda$ in eq(1) and eq(2). The estimation of these noise levels usually involves Monte Carlo simulation, such as the Gibbs sampling or Metropolis-Hastings algorithms [20], which is computationally infeasible for our applications with millions of samples.

Huber and Train [10] compared the estimates obtained from the HB methods with those from classical maximum simulated likelihood methods, and found the averages of the expected partworths to be almost identical for the two methods in some applications on electricity suppliers. In the past decade, integrated conjoint analysis methods have emerged that simultaneously segment the market and estimate segment-level partworths, e.g. the finite mixture conjoint models [6]. The study conducted by [20] shows that the finite mixture conjoint models performed well at the segment level, compared with the HB models.

Toubia et al. [19] developed a fast polyhedral adaptive conjoint analysis method that greatly reduces respondent burden in parameter estimation. They employ interior-point mathematical programming to select salient questions for respondents that narrow the feasible region of the partworth values as fast as possible.

A recently developed technique for parameter estimation is based on ideas from statistical learning and support vector machines. The choice-based data can be translated into a set of inequalities that compare the utilities between two selected items. Evgeniou et al. [7] formulated partworth learning as a convex optimization problem in a regularization framework. Chapelle and Harchaoui [2] proposed two machine learning algorithms that efficiently estimate conjoint models from pairwise preference data.

Jiang and Tuzhilin [12] experimentally demonstrated that both 1-to-1 personalization and segmentation approaches significantly outperform aggregate modeling. Chu and Park [4]

² $N(\mu, \sigma^2)$ denotes a Gaussian distribution with mean $\mu$ and variance $\sigma^2$.

recently proposed a feature-based model for personalized service at the individual level. On the Today Module application, the 1-to-1 personalized model outperforms several segmentation models in offline analysis. However, conjoint models are still indispensable components in our system, because of the valuable insight on user intention and tangible control on both content and user sides at the segment level.

4. TENSOR SEGMENTATION

In this section, we employ logistic tensor regression coupled with efficient gradient-descent methods to estimate the partworth function conjointly on large data sets. In the users' partworth space, we further apply clustering techniques to segment users. Note that we consider the case of millions of users and thousands of articles; the number of observed interactions between user/article pairs could be tens of millions.

4.1 Tensor Indicator

We first define an indicator as a parametric function of the tensor product of both article features $z_j$ and user attributes $x_i$ as follows:

$$s_{ij} = \sum_{a=1}^{C} \sum_{b=1}^{D} x_{i,b}\, w_{ab}\, z_{j,a}, \qquad (3)$$

where $D$ and $C$ are the dimensionality of user and content features respectively, $z_{j,a}$ denotes the $a$-th feature of $z_j$, and $x_{i,b}$ denotes the $b$-th feature of $x_i$. The weight variable $w_{ab}$ is independent of user and content features, and represents the affinity of the two features $x_{i,b}$ and $z_{j,a}$ in interactions. In matrix form, eq(3) can be rewritten as

$$s_{ij} = x_i^\top W z_j,$$

where $W$ denotes a $D \times C$ matrix with entries $\{w_{ab}\}$. The partworths of the user $x_i$ on article attributes are evaluated as $W^\top x_i$, denoted as $\beta_{x_i}$, a vector of the same length as $z_j$.
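Read as code, the indicator is just a bilinear form; a minimal numpy sketch (random stand-in data, with dimensions borrowed from Section 5 for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 1193, 82                          # user / article feature dimensions
W = rng.normal(scale=0.01, size=(D, C))  # coefficient matrix, entries w_ab

x_i = rng.random(D)  # a user feature vector (stand-in for a real profile)
z_j = rng.random(C)  # an article feature vector

s_ij = x_i @ W @ z_j   # eq(3) in matrix form: s_ij = x_i^T W z_j
partworth = W.T @ x_i  # the user's partworths over article attributes
assert partworth.shape == z_j.shape
```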

The tensor product above, also known as a bilinear model, can be regarded as a special case of the Tucker family [5], which has been extensively studied in the literature and applications. For example, Tenenbaum and Freeman [18] developed a bilinear model for separating style and content in images, and recently Chu and Ghahramani [3] derived a probabilistic framework of the Tucker family for modeling structural dependency from partially observed high-dimensional array data.

The tensor indicator is closely related to the traditional HB models as in eq(1) and eq(2). Respondent heterogeneity is assumed either to be randomly distributed or to be constrained by attributes measured at the individual level.

    4.2 Logistic Regression

Conventionally the tensor indicator is related to an observed binary event by a logistic function. In our particular application, we found three additions necessary:

User-specific bias: Users' activity levels are quite different. Some are active clickers, while some might be casual users. We introduce a bias term for each user, denoted as $\alpha_i$ for user $i$;

Article-specific bias: Articles have different popularity. We have a bias term for each article as well, denoted as $\gamma_j$ for article $j$;


Global offset: Since the number of click events is much smaller than that of view events in our observations, the classification problem is heavily imbalanced. Thus we introduce a global offset $\mu$ to take this situation into account.

Based on the considerations above, the logistic function is defined as follows,

$$p(r_{ij} \,|\, \tilde{s}_{ij}) = \frac{1}{1 + \exp(-r_{ij}\,\tilde{s}_{ij} + \mu)},$$

where $\tilde{s}_{ij} = s_{ij} + \alpha_i + \gamma_j$ and $s_{ij}$ is defined as in eq(3).

Together with a standard Gaussian prior on the coefficients $\{w_{ab}\}$, i.e. $w_{ab} \sim N(0, c)$,³ the maximum a posteriori estimate of $\{w_{ab}\}$ is obtained by solving the following optimization problem:

$$\min_{W, \boldsymbol{\alpha}, \boldsymbol{\gamma}} \;\; \frac{1}{2c} \sum_{a=1}^{C} \sum_{b=1}^{D} w_{ab}^2 \; - \; \sum_{ij} \log p(r_{ij} \,|\, \tilde{s}_{ij}),$$

where $\boldsymbol{\alpha}$ denotes $\{\alpha_i\}$ and $\boldsymbol{\gamma}$ denotes $\{\gamma_j\}$. We employ a gradient-descent package to find the optimal solution. The gradient with respect to $w_{ab}$ is given by

$$\frac{1}{c}\, w_{ab} \; - \; \sum_{ij} r_{ij}\, x_{i,b}\, z_{j,a} \left( 1 - p(r_{ij} \,|\, \tilde{s}_{ij}) \right),$$

and the gradients with respect to the bias terms can be derived similarly. The model parameter $c$ is determined by cross validation.
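Under our symbol reconstruction above, the estimation loop fits in a few lines of numpy. A sketch with plain batch gradient descent; the learning rate, epoch count, and fixed offset mu are our illustrative choices, the paper's actual gradient package is not specified, and priors on the bias terms are omitted:

```python
import numpy as np

def train_logistic_tensor(X, Z, pairs, r, c=1.0, lr=0.1, epochs=50, mu=4.0):
    """X: (n_users, D) user features; Z: (n_items, C) article features;
    pairs: (n_obs, 2) integer (user, article) indices;
    r: (n_obs,) labels in {-1, +1}."""
    W = np.zeros((X.shape[1], Z.shape[1]))
    alpha = np.zeros(X.shape[0])   # user-specific biases alpha_i
    gamma = np.zeros(Z.shape[0])   # article-specific biases gamma_j
    u, v = pairs[:, 0], pairs[:, 1]
    for _ in range(epochs):
        s = np.einsum('nd,dc,nc->n', X[u], W, Z[v]) + alpha[u] + gamma[v]
        p = 1.0 / (1.0 + np.exp(-r * s + mu))   # p(r_ij | s~_ij)
        g = r * (1.0 - p)                        # per-event gradient weight
        # gradient w.r.t. W, as displayed above: W/c - sum_ij g * x_i z_j^T
        W -= lr * (W / c - np.einsum('n,nd,nc->dc', g, X[u], Z[v]))
        alpha += lr * np.bincount(u, weights=g, minlength=alpha.size)
        gamma += lr * np.bincount(v, weights=g, minlength=gamma.size)
    return W, alpha, gamma
```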

4.3 Clustering

With the optimal coefficients $W$ in hand, we compute the partworths for each training user by $\beta_{x_i} = W^\top x_i$. The vector $\beta_{x_i}$ represents the user's preferences on article attributes. In the partworth space spanned by $\{\beta_{x_i}\}$, we further apply a clustering technique, e.g. K-means [13], to classify training users having similar preferences into segments. The number of clusters can be determined by validation in offline analysis.

For an existing or new user, we can predict her partworths by $\beta_{x_t} = W^\top x_t$, where $x_t$ is the vector of user features. Then her segment membership can be determined by the shortest distance between the partworth vector and the centroids of the clusters, i.e.

$$\arg\min_k \; \| \beta_{x_t} - o_k \|, \qquad (4)$$

where $\{o_k\}$ denote the centroids obtained in clustering.
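A sketch of the two steps with scikit-learn's K-means; the data here is a random stand-in, whereas in the paper W comes from the regression fit and the cluster count from validation:

```python
import numpy as np
from sklearn.cluster import KMeans  # K-means clustering, as cited [13]

rng = np.random.default_rng(0)
D, C, n_users = 1193, 82, 1000            # illustrative sizes
X = rng.random((n_users, D))              # training-user features (stand-in)
W = rng.normal(scale=0.01, size=(D, C))   # coefficients from the fit above

B = X @ W                                 # rows are partworth vectors beta_xi

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(B)
centroids = kmeans.cluster_centers_       # the centroids o_k of eq(4)

def segment_of(x_t):
    """eq(4): nearest-centroid segment for an existing or new user x_t."""
    beta_t = W.T @ x_t
    return int(np.argmin(np.linalg.norm(centroids - beta_t, axis=1)))
```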

5. EXPERIMENTS

We collected events from a random bucket in July 2008 for training and validation. In the random bucket, articles are randomly selected from a content pool to serve users. An event records a user's action on the article at the F1 position, which is either view or story click, encoded as $-1$ and $+1$ respectively. We also collected events from a random bucket in September 2008 for testing.

Note that a user may view or click on the same article multiple times but at different time stamps. In our training, repeated events were treated as a single event. The distinct events were indexed in triplet format, i.e. (user id, article id, click/view).
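A small sketch of this preprocessing; the paper only says repeated events collapse into one, so letting a click dominate a view for the same pair is our assumption:

```python
def to_triplets(events):
    """events: iterable of (user_id, article_id, r) with r in {-1, +1}.
    Returns distinct (user_id, article_id, r) triplets."""
    best = {}
    for uid, aid, r in events:
        # keep one event per pair; a click (+1) dominates a view (-1)
        best[(uid, aid)] = max(best.get((uid, aid), -1), r)
    return [(uid, aid, r) for (uid, aid), r in best.items()]
```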

³ Appropriate priors can be specified for the bias terms too.

We split the July random bucket data by a time stamp threshold for training and validation. There are 37.806 million click and view events generated by 4.635 million users before the time stamp threshold, and 0.604 million click events after the threshold for validation. The September test data contain about 1.784 million click events.

The features of users and items were selected by support. The support of a feature is the number of samples having that feature. We only selected features with support above a prefixed threshold, e.g. 5% of the population. Each user is represented by a vector of 1192 categorical features, which include:

Demographic information: gender (2 classes) and age discretized into 10 classes;

Geographical features: 191 locations of countries and US states;

Behavioral categories: 989 binary categories that summarize the consumption behavior of a user within Yahoo! properties.

Each article is profiled by a vector of 81 static features, which include:

URL categories: 43 classes inferred from the URL of the article resource;

Editor categories: 38 topics tagged by human editors to summarize the article content.

Categorical features are encoded as binary vectors with non-zero indicators. For example, gender is translated into two binary features, i.e., male is encoded as [0, 1], female is encoded as [1, 0], and unknown is [0, 0]. As the number of non-zero entries in a binary feature vector varies, we further normalized each vector to unit length, i.e., non-zero entries in the normalized vector are replaced by $1/\sqrt{k}$, where $k$ is the number of non-zero entries. For article features, we normalized URL and Editor categories together. For user features, we normalized behavioral categories and the remaining features (age, gender and location) separately, due to the variable length of behavioral categories per user. Following conventional treatment, we also augmented each feature vector with a constant attribute 1. Each content item is finally represented by a feature vector of 82 entries, while each user is represented by a feature vector of 1193 entries.
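A sketch of this encoding and normalization; the block sizes follow the counts above (2 + 10 + 191 = 203 demographic/geo entries, 989 behavioral entries, plus the constant), while the active indices are arbitrary placeholders:

```python
import numpy as np

def encode(active_indices, dim):
    """Binary categorical block normalized to unit length:
    each non-zero entry becomes 1/sqrt(k), k = number of non-zeros."""
    v = np.zeros(dim)
    if active_indices:
        v[active_indices] = 1.0 / np.sqrt(len(active_indices))
    return v

# Demographic/geo block and behavioral block are normalized separately,
# then a constant attribute 1 is appended (1192 + 1 = 1193 entries).
demo_geo = encode([0, 5, 40], 203)      # e.g. a gender, an age class, a location
behavior = encode([10, 250, 700], 989)  # active behavioral categories
user_vec = np.concatenate([demo_geo, behavior, [1.0]])
assert user_vec.shape == (1193,)
```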

5.1 Offline Analysis

For each user in the test set, we computed her membership first as in eq(4), and sorted all available articles in descending order according to their CTR in the test user's segment at the time stamp of the event. On click events, we measured the rank position of the article being clicked by the user. The performance metric we used in offline analysis is the number of clicks at the top four rank positions; a good predictive model should have more clicks at top-ranked positions. We computed the click portion at each of the top four rank positions in the predictive ranking, i.e.

$$\text{click portion} = \frac{\#\text{ of clicks at the position}}{\#\text{ of all clicks}},$$

and the metric lift over the baseline model, defined as

$$\text{lift} = \frac{\text{click portion} - \text{baseline click portion}}{\text{baseline click portion}}.$$
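The two metrics in code form (a sketch; clicked_ranks holds the predicted rank position, 1-based, of the clicked article for each click event):

```python
import numpy as np

def click_portions(clicked_ranks, top=4):
    """Fraction of all clicks landing at each of the top rank positions."""
    ranks = np.asarray(clicked_ranks)
    return np.array([(ranks == pos).mean() for pos in range(1, top + 1)])

def lift(portions, baseline_portions):
    """Relative lift over the baseline model, position by position."""
    p, b = np.asarray(portions), np.asarray(baseline_portions)
    return (p - b) / b
```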

We trained our logistic regression models on the training data with different values of the trade-off parameter $c \in \{0.01, 0.1, 1, 10, 100\}$, and examined their performance (click portion at the top rank position) on the validation data set. We found that $c = 1$ gives the best validation performance.

Figure 2: Click portion at the top rank position with different cluster numbers on the July validation data.

Using the model with $c = 1$, we ran K-means clustering [13] on the partworth vectors of training users, computed as in Section 4.3, to group users with similar preferences into clusters. We varied the number of clusters from 1 to 20, and present the corresponding click portions at the top rank position in Figure 2. Note that the CTR estimation within segments suffers from low-traffic issues when the number of segments is large. We observed the best validation performance at 8 clusters, but the difference compared with 5 clusters is not statistically significant. Thus we selected 5 clusters in our application.

To verify the stability of the clusters we found in the July data, we further tested on the random bucket data collected in September 2008. The EMP approach, described in Section 2.1, was utilized as the baseline model. We also implemented two demographic segmentations for comparison purposes:

Gender: 3 clusters defined by the user's gender, {male, female, unknown};

AgeGender: 11 clusters defined by age and gender combinations plus unknown, e.g. {...49, male; ...49, female; ...; unknown}.

We estimated the article CTR within segments by the same technique [1] used in the EMP approach. A user is served with the most popular article in the segment to which she belongs.

We computed the lifts over the EMP baseline approach and present the results for the top 4 rank positions in Figure 3. All segmentation approaches outperform the baseline model, the EMP unsegmented approach. AgeGender segmentation with 11 clusters works better than Gender segmentation with 3 clusters in our study. Tensor segmentation with 5 clusters consistently gives more lift than Gender and AgeGender at all top 4 positions.

    5.1.1 Segment Analysis

We collected some characteristics of the 5 segments we discovered. On the September data, we identified cluster

Figure 3: A comparison of lift at the top 4 rank positions in the offline analysis on the September data. The baseline model is the EMP unsegmented approach.

Figure 4: Population distribution in the 5 segments (pie chart; c1: 16%, c2: 10%, c3: 22%, c4: 32%, c5: 20%).

membership for all users, and plotted the population distribution over the 5 segments as a pie chart in Figure 4. The largest cluster contains 32% of users, while the smallest contains 10%. We further studied the user composition of the 5 clusters with respect to popular demographic categories, and present the results in Figure 5 as a Hinton graph. We found that:

Cluster c1 consists mostly of female users under age 34;

Cluster c2 consists mostly of male users under age 44;

Cluster c3 consists of female users above age 30;

Cluster c4 consists mainly of male users above age 35;

Cluster c5 is predominantly non-U.S. users.

We also observed that c1 and c2 contain a small portion of users above age 55, and c3 has some young female users as well. Hence, cluster membership is not solely determined by demographic information, though demographic information gives a very strong signal. It is users' behavior (click patterns) that reveals their interest in article topics.

We utilized the centroid of each cluster as a representative to illustrate users' preferences on article topics. The centroids in the space of article topics are presented in Figure 6 as a heatmap. The gray level indicates users' preference, from like (white) to dislike (black). We found the following favorite and unattractive topics by comparing the representatives' scores across segments:


Figure 5: Hinton graph of user composition in the 5 clusters (rows c1 to c5) over 13 demographic categories (columns): Male, Female, Age>64, Age 55~64, Age 45~54, Age 35~44, Age 30~34, Age 25~29, Age 21~24, Age 18~21, Age 13~18, Age ...