Plans on “Latent Topic Model”
High-Level Architecture
[Diagram with boxes: Users, Ads, User Encoding, User Clustering, eCTR / FB Prediction]
Existing Pipeline
• Encoding
  – Auto-encoder for dimension reduction
  – Political affiliation clustering
  – Output: Hive table (user id + low-dim representation)
• eCTR prediction
  – Optional: user clustering stage
Approaches to using the encoding in eCTR prediction
Social Networks
Information on a social network
• Social graph
  – Friendship networks
  – User-ads network ...
• Text
  – News feed
  – Messages
  – Ads text …
• Images
  – Album
  – Random posts
  – Ads figures …
• Demographics
  – Age, occupation …
• Very high-dimensional
• Non-independent
• Insufficient training data (this is true even if we use the whole web)
• Hard to optimize and interpret
eCTR
Essentials of a good user-ads representation
• Distill all local attribute semantics
  – Social roles
  – Topical contents
  – Ideology/sentiment
• Capture relational information
  – Long-range indirect influence
  – Social environments and contexts
• Capture dynamic trends
  – e.g., change in strength of interest
  – New/dying interests
• Discriminative
  – Optimize against a well-defined predictive task rather than vague intermediate goals such as clustering
• Low-dimensional and (perhaps) interpretable
Example:
Proposed Models
…
…
Dynamic tomography
• How to model dynamics in a simplex?
Project an individual/stock in a network into a "tomographic" space
Trajectory of an individual/stock in the "tomographic" space
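One common way to keep a time-varying vector on the simplex is to run a Gaussian random walk in an unconstrained space and map each state through a softmax (a logistic-normal construction). The sketch below illustrates that idea only; the slides do not say which dynamics the proposed model uses, and the names `simulate_trajectory`, `state`, and `step_std` are hypothetical.

```python
import math
import random


def softmax(values):
    """Map an unconstrained real vector to a point on the probability simplex."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_trajectory(num_roles, num_steps, step_std=0.3, seed=0):
    """Random walk in unconstrained space, projected onto the simplex each step.

    Assumption: Gaussian steps with a fixed standard deviation. This is an
    illustrative choice, not the model described in the slides.
    """
    rng = random.Random(seed)
    state = [0.0] * num_roles  # unconstrained latent state
    trajectory = []
    for _ in range(num_steps):
        state = [v + rng.gauss(0.0, step_std) for v in state]
        trajectory.append(softmax(state))
    return trajectory


# Each row is one time step: a mixed-membership vector over 4 roles.
trajectory = simulate_trajectory(num_roles=4, num_steps=6)
for t, memberships in enumerate(trajectory, start=1):
    print(t, [round(p, 3) for p in memberships])
```

Every row is non-negative and sums to one, so the trajectory can be read as a path of role proportions over time.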
Senate Network: role trajectories
Jon Corzine’s seat (#28, Democrat, New Jersey) was taken over by Bob Menendez from t=5 onwards.
Corzine was especially left-wing, so much so that his views did not align with the majority of Democrats (t=1 to 4).
Once Menendez took over, the latent space vector for senator #28 shifted towards role 4, corresponding to the main Democratic voting clique.
Ben Nelson (#75) is a right-wing Democrat (Nebraska), whose views are more
consistent with the Republican party.
Observe that as the 109th Congress proceeds into 2006, Nelson’s latent space
vector includes more of role 3, corresponding to the main Republican
voting clique.
This coincides with Nelson’s re-election as the Senator from Nebraska in late 2006,
during which a high proportion of Republicans voted for him.
Visualization
Algorithm Details
Data
Learning System
Given: a network of users/documents
• Perform the E-step (Gibbs sampling) in parallel and collect the sufficient statistics
• Perform the M-step in parallel
• Repeat until convergence
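The loop above can be sketched on a toy token-level topic model. This is a minimal illustration under several assumptions, not the system described in the slides: threads stand in for distributed workers, the M-step runs serially for brevity, a fixed iteration count replaces a convergence test, and all names (`e_step_shard`, `fit`, `SMOOTHING`, and so on) are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

NUM_TOPICS = 2
VOCAB_SIZE = 4
SMOOTHING = 0.1  # assumed pseudo-count that keeps every probability positive


def normalize(counts):
    """Turn non-negative counts into a smoothed probability vector."""
    total = sum(counts) + SMOOTHING * len(counts)
    return [(c + SMOOTHING) / total for c in counts]


def e_step_shard(shard, doc_topic_probs, topic_word_probs, seed):
    """Sample a topic for each token in one shard; return count statistics."""
    rng = random.Random(seed)
    topic_word = [[0] * VOCAB_SIZE for _ in range(NUM_TOPICS)]
    doc_topic = {}
    for doc_id, words in shard:
        counts = [0] * NUM_TOPICS
        for w in words:
            weights = [doc_topic_probs[doc_id][k] * topic_word_probs[k][w]
                       for k in range(NUM_TOPICS)]
            k = rng.choices(range(NUM_TOPICS), weights=weights)[0]
            counts[k] += 1
            topic_word[k][w] += 1
        doc_topic[doc_id] = counts
    return topic_word, doc_topic


def fit(docs, num_shards=2, num_iters=20, seed=0):
    rng = random.Random(seed)
    doc_topic_probs = {d: normalize([rng.random() for _ in range(NUM_TOPICS)])
                       for d, _ in docs}
    topic_word_probs = [normalize([rng.random() for _ in range(VOCAB_SIZE)])
                        for _ in range(NUM_TOPICS)]
    shards = [docs[i::num_shards] for i in range(num_shards)]
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        for _ in range(num_iters):
            seeds = [rng.randrange(2 ** 31) for _ in shards]
            # E-step: every shard samples assignments against the same parameters.
            results = list(pool.map(e_step_shard, shards,
                                    [doc_topic_probs] * num_shards,
                                    [topic_word_probs] * num_shards, seeds))
            # Aggregate sufficient statistics, then M-step: re-estimate parameters.
            totals = [[0] * VOCAB_SIZE for _ in range(NUM_TOPICS)]
            for topic_word, doc_topic in results:
                for k in range(NUM_TOPICS):
                    for w in range(VOCAB_SIZE):
                        totals[k][w] += topic_word[k][w]
                for doc_id, counts in doc_topic.items():
                    doc_topic_probs[doc_id] = normalize(counts)
            topic_word_probs = [normalize(row) for row in totals]
    return doc_topic_probs, topic_word_probs


# Toy corpus: (doc id, list of word ids).
docs = [(0, [0, 0, 1, 1]), (1, [2, 3, 3, 2]), (2, [0, 1, 0]), (3, [3, 2, 2])]
doc_topic_probs, topic_word_probs = fit(docs)
print(doc_topic_probs)
```

Each worker only returns counts, so the aggregation step is a plain sum, which is what makes the E-step easy to distribute.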
[Diagram: "Single Program", with repeated copies of the parameters α, β, η, μ and the latent variables z]
Project Plans and Milestones
• Scalable implementation of baseline user text model (M1)
• Discriminative M1
• M1 + network model (M2)
• M2 + history + time (M3)
• Parallel work on downstream utility
  – eCTR prediction
  – Visualization
  – User/ads clustering
Resources
• CMU:
  – First intern, Keisuke, will come in mid-Oct, implementing M1
  – Second intern, Qirong Hu, will come in late Dec, implementing M2 and M3
• FB:
  – Rajat Raina
  – Rong Yang
  – System support