
Page 1

AN AUTOMATIC ADVERTISEMENT/TOPIC MODELING AND RECOMMENDING SYSTEM

Yi Hou
Center for Clinical Investigation (CCI)
EECS Department, Case Western Reserve University

Page 2

Review

• Motivation
  • Lack of a systematic and automatic ADs/topic categorizing system --> no place to specify a category
  • Social network platform popularity --> revenue from Facebook advertising shot up 191 percent year-over-year in the first quarter of 2014
• Words matter!

Page 3

Tasks

• 1) Given all the ADs/topics, establish a word network, where two words share an edge iff they co-occur in at least one AD/topic, and the edge weight is the number of times they have occurred together in an AD/topic.
  • Small world, power-law distribution
• 2) Given a word network, build a taxonomy T
  • Modularity-based clustering
  • Top 20 TF-IDF keywords (due to vocabulary issue)
  • Empirical network analysis
• 3) Given a user's current texting information, e.g. the most recent few Tweets/posts (we initialize this value to 10 here), build a ranking model R, where each AD is ranked based on R and the top-10 ADs are returned to the user.

Page 4

Data Source

• Data crawling
  • Twitter streaming APIs
  • Ruby gem "twitterstream"
  • Acquired application-only authentication tokens
  • Set up a listening point recording global Tweets
  • Only selected 5 categories of ADs/topics, by keyword filtering: Car/Dating/Education/Grocery/Hiring
• Manually collected data (experimented on)
  • Only selected 5 categories of ADs/topics: Car/Dating/Education/Grocery/Hiring

Crawled data:

Car    Dating   Education   Grocery   Hiring   Total
1028   0        160         228       45       1461

Manually collected data:

Car    Dating   Education   Grocery   Hiring   Total
50     50       50          50        50       250

Page 5

Method

• Data preprocessing
• Build word network
• Build topic taxonomy
• ADs/topic ranking

Page 6

Data Preprocessing

• Remove stop words (a small code sketch of this step follows below)
  • Such as "is", "are", "when", ...
  • List from the Stanford NLP lab
• Stemming
  • Reducing inflected words to their stem, base, or root form
  • Used the Porter stemmer at http://text-processing.com/demo/stem/
  • e.g. "stemming" --> "stem"
• Result
  • Original: "I like data mining. It is awesome."
  • New: "I like data mine It awesom"
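A minimal C++ sketch of the stop-word removal step, assuming tokens are lowercased and whitespace-separated. The stop-word list here is a toy subset, not the Stanford NLP list, and stemming is left to the external Porter stemmer mentioned above.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Remove stop words from a lowercased, whitespace-tokenized text.
std::vector<std::string> remove_stop_words(const std::string& text) {
    static const std::unordered_set<std::string> stop_words = {
        "is", "are", "when", "the", "a", "an", "it", "of", "to"  // toy subset
    };
    std::vector<std::string> kept;
    std::istringstream iss(text);
    std::string token;
    while (iss >> token) {
        if (stop_words.find(token) == stop_words.end()) {
            kept.push_back(token);  // keep only non-stop-words
        }
    }
    return kept;
}

int main() {
    for (const auto& w : remove_stop_words("i like data mining it is awesome")) {
        std::cout << w << ' ';  // prints: i like data mining awesome
    }
    std::cout << '\n';
}
```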

Page 7

Data Visualization

• In total, 1104 unique words, shown with a word cloud representation.

Page 8

Build Word Network

• Co-occurrence matrix of words (sketched below)
  • The co-occurrence count serves as the similarity measurement for word pairs
  • The co-occurrence matrix serves as our adjacency matrix
  • The co-occurrence count serves as the edge weight
• Coded in C++
• # of nodes: 1104
• # of edges: 18972
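A minimal sketch of building the word co-occurrence network, assuming each AD/topic has already been preprocessed into a list of unique word IDs. The data structures and names are illustrative, not the original C++ implementation.

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (smaller word ID, larger word ID)

// Count, for every word pair, how many times they co-occur in an AD/topic.
std::map<Edge, int> build_cooccurrence(const std::vector<std::vector<int>>& ads) {
    std::map<Edge, int> weight;
    for (const auto& ad : ads) {
        for (size_t i = 0; i < ad.size(); ++i) {
            for (size_t j = i + 1; j < ad.size(); ++j) {
                int u = std::min(ad[i], ad[j]);
                int v = std::max(ad[i], ad[j]);
                ++weight[{u, v}];  // edge weight = co-occurrence count
            }
        }
    }
    return weight;
}

int main() {
    // Two toy "ADs", each a list of word IDs.
    std::vector<std::vector<int>> ads = {{0, 1, 2}, {1, 2, 3}};
    for (const auto& [edge, w] : build_cooccurrence(ads)) {
        std::printf("%d -- %d  weight %d\n", edge.first, edge.second, w);
    }
}
```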

Page 9

Build Topic Taxonomy

• Modularity-based community finding: the algorithm exhaustively searches the graph to maximize the modularity measurement
• Heavily connected components signify the topic models
• Each cluster/topic is described by its top-K highest TF-IDF keywords
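The TF-IDF weighting is not spelled out on the slide; the standard formulation, assuming each cluster's concatenated text is treated as one document, would be:

$$ \mathrm{tf\text{-}idf}(t, c) \;=\; \mathrm{tf}(t, c)\,\log\frac{N}{\mathrm{df}(t)} $$

where tf(t, c) is the frequency of term t in cluster c, df(t) is the number of clusters containing t, and N is the total number of clusters.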

Page 10

Modularity-based Community Finding

• Modularity
  • One measure of the structure of networks or graphs
  • A measure of the goodness of a division of a network into sub-clusters
  • Q represents the measure of goodness
  • C represents the set of clusters
  • e_ij stands for the number of edges between clusters i and j
  • m represents the total number of edges
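The modularity formula itself appears only as an image on the slide; a standard reconstruction consistent with the variables above (an assumption following Newman's formulation, not the slide's exact notation) is:

$$ Q \;=\; \sum_{i \in C} \left[ \frac{e_{ii}}{m} \;-\; \left( \frac{2\,e_{ii} + \sum_{j \neq i} e_{ij}}{2m} \right)^{2} \right] $$

where the first term is the fraction of edges falling inside cluster i and the second is the expected fraction under a random placement with the same degree sequence.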

• Reference:
  1. Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment 2008 (10), P10008.

Page 11

How the Algorithm Works

• Start with every vertex initialized as its own isolated cluster;
• Successively join the pair of clusters with the greatest increase ∆Q in the modularity measurement;
• Stop the procedure when joining any two clusters would result in ∆Q ≤ 0.
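A minimal C++ sketch of this greedy agglomeration, written in the style of Newman's fast algorithm rather than the actual implementation; the cubic-time merge bookkeeping and names are for illustration only.

```cpp
#include <cstdio>
#include <numeric>
#include <utility>
#include <vector>

struct Merge { int a, b; double dq; };

// Greedily merge clusters while the modularity gain dQ stays positive.
std::vector<int> greedy_modularity(int n, const std::vector<std::pair<int, int>>& edges) {
    double m = static_cast<double>(edges.size());
    // e[i][j]: for i != j, half the fraction of edges between clusters i and j;
    // e[i][i]: fraction of edges inside cluster i (Newman's symmetric convention).
    std::vector<std::vector<double>> e(n, std::vector<double>(n, 0.0));
    for (auto [u, v] : edges) {
        e[u][v] += 1.0 / (2.0 * m);
        e[v][u] += 1.0 / (2.0 * m);
    }
    std::vector<double> a(n, 0.0);  // a[i] = sum_j e[i][j] (degree fraction of cluster i)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) a[i] += e[i][j];

    std::vector<int> cluster(n);    // cluster label of each vertex
    std::iota(cluster.begin(), cluster.end(), 0);
    std::vector<bool> alive(n, true);

    while (true) {
        Merge best{-1, -1, 0.0};
        for (int i = 0; i < n; ++i) {
            if (!alive[i]) continue;
            for (int j = i + 1; j < n; ++j) {
                if (!alive[j]) continue;
                double dq = 2.0 * (e[i][j] - a[i] * a[j]);  // gain from merging i and j
                if (dq > best.dq) best = {i, j, dq};
            }
        }
        if (best.a < 0) break;      // no merge improves Q: stop
        int i = best.a, j = best.b;
        double eij = e[i][j], ejj = e[j][j];
        for (int k = 0; k < n; ++k) {               // fold cluster j into cluster i
            if (k == i || k == j) continue;
            e[i][k] += e[j][k];
            e[k][i] += e[k][j];
            e[j][k] = e[k][j] = 0.0;
        }
        e[i][i] += ejj + 2.0 * eij;                 // internal edges of the merged cluster
        e[i][j] = e[j][i] = e[j][j] = 0.0;
        a[i] += a[j];
        a[j] = 0.0;
        alive[j] = false;
        for (int v = 0; v < n; ++v)
            if (cluster[v] == j) cluster[v] = i;
    }
    return cluster;
}

int main() {
    // Toy graph: two triangles joined by one edge; two communities expected.
    std::vector<std::pair<int, int>> edges = {{0,1},{1,2},{0,2},{3,4},{4,5},{3,5},{2,3}};
    std::vector<int> c = greedy_modularity(6, edges);
    for (int v = 0; v < 6; ++v) std::printf("vertex %d -> cluster %d\n", v, c[v]);
}
```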

Page 12

Clustering

• We found 13 clusters
• Visualized: different clusters shown in different colors.

Page 13

Clustering

• We found 13 clusters. Why not 5?
• If we zoom in and look at two of the clusters, yellow and blue, we can see that they actually both belong to grocery.
• So modularity-based clustering actually categorizes words at a finer granularity (it divided grocery into food/electronics/...).

Page 14

Clustering

• Percentage distribution of the 13 clusters:

Page 15

Clustering

• Top 20 TF-IDF keywords in each cluster. Intuitively:
  • Cluster 1: chevy, ford --> cars
  • Cluster 2: date, single --> dating
  • Cluster 3: lunch, friend --> social (new)
  • Cluster 4: hire, join --> hiring
  • ...
• We observed well-defined clusters.
• We observed new categories.

Keywords (top 20 per cluster, stemmed):

Cluster 1:  meet chevi build event open cover 2015 volt hous alway project american everyon truck ford parti convert built boss cruze
Cluster 2:  date singl see start relationship still peopl pic need matchcom think site area tri women profil ladi facebook reason cant
Cluster 3:  can best friend wait contract problem bet friday lunch latest colleg success stop celebr 33 tgi appet kickstart budget match
Cluster 4:  hire join look team help manag us engin come design work social market make media great offic sale softwar summer
Cluster 5:  onlin appli degre now program today learn new take click tech will earn info offer univers educ applic avail find
Cluster 6:  like 2014 nissan big dont star go allnew dream murano receiv follow remain rogu 1st fool reveal forev 416 370
Cluster 7:  get free one just use right fit happi easter http fun zipcar everi galaxi sign 0 fast rate plu low
Cluster 8:  next top car share save toyota end feedback way honda easi two tip php 5 w read video announc invest
Cluster 9:  10 time spring someon delici drive busi win idea give pretti cake egg recip chanc never day prepar cook dish
Cluster 10: love want lyft check commun safeti coupl visit pink stach zoosk thing 1 person account fuzzi motorcycl send execut amaz
Cluster 11: quota s ever memor america everyth onlineonli 3 acceler town name trip hdtv slim 1080 39 rca kid assembl roster
Cluster 12: road bring mt king
Cluster 13: room mango pineappl collect store add walmart put winter weekend long theater preorder soldier ripen faster paper captain bellalif bag

Page 16

Empirical Network Analysis

• Property definitions:
  • Diameter d: the diameter of a network is the largest geodesic distance in the (connected) network.
  • Shortest path l_{u,v}: the shortest path between two nodes u and v in the network.
  • Average shortest path l_network: the average shortest path over every pair of nodes in the network.
  • Power-law distribution: the node degree distribution follows a power law, at least asymptotically (written out after this list).
  • Small-world property: the small-world property holds under two conditions, that is, 1) high clustering coefficient (as compared to the Erdos-Renyi model) and 2) low average shortest path, meaning the typical distance l between two random nodes grows proportionally to the logarithm of the number of nodes N in the network, i.e. l_network ∝ ln(N).
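The power-law form itself is not written on the slide; the standard statement of the degree-distribution claim would be:

$$ P(k) \;\propto\; k^{-\gamma} $$

where P(k) is the fraction of nodes with degree k and γ is the power-law exponent.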

Page 17

Empirical Network Analysis

• Clustering coefficient definitions (standard forms of the formulas are reconstructed below):
  • 1) Global clustering coefficient:
    • N_t: number of triangles formed in the graph
    • N_c: number of connected triples of nodes in the graph
  • 2) Local clustering coefficient (directed graph / undirected graph):
    • n_i: number of direct neighbors of node i
    • n_c: number of direct connections between i's direct neighbors
    • Averaged over all nodes
• Reference: "Social network analysis," Lada Adamic, University of Michigan
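The formulas themselves appear only as images on the slide; the standard definitions matching the variables above are:

$$ C_{\text{global}} = \frac{3\,N_t}{N_c}, \qquad
   C_i^{\text{directed}} = \frac{n_c}{n_i\,(n_i - 1)}, \qquad
   C_i^{\text{undirected}} = \frac{2\,n_c}{n_i\,(n_i - 1)}, \qquad
   C = \frac{1}{N}\sum_{i=1}^{N} C_i $$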

Page 18

Empirical Network Analysis

• In our experiment, we use the local clustering coefficient definition (for undirected graphs); here are the statistics of the experiments (a quick numerical check follows the table below).
• The network satisfies the small-world property!
• Let's recall the small-world property: it holds under two conditions, that is, 1) high clustering coefficient (as compared to the Erdos-Renyi model) and 2) low average shortest path, meaning the typical distance l between two random nodes grows proportionally to the logarithm of the number of nodes N, i.e. l_network ∝ ln(N).

Dataset     Avg Degree   Diameter   Avg Path Length   Modularity   Avg Clustering Coefficient
ADs/Topic   17.185       5          2.757661          0.499        0.816
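As a quick sanity check on the two conditions (a worked comparison using the numbers above, not taken from the slide): the average path length is well below ln(N) for N = 1104 nodes, and the observed clustering coefficient far exceeds what a comparable Erdos-Renyi graph would give,

$$ l_{\text{network}} = 2.76 \;<\; \ln(1104) \approx 7.0, \qquad
   C_{\text{ER}} \approx \frac{\langle k \rangle}{N} = \frac{17.185}{1104} \approx 0.016 \;\ll\; 0.816 . $$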

Page 19

Network Diameter

• Betweenness centrality
• Closeness centrality
• Eccentricity

Reference:
1. Ulrik Brandes, "A Faster Algorithm for Betweenness Centrality," Journal of Mathematical Sociology 25(2):163-177, 2001.
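For reference, the standard definition of betweenness centrality that Brandes' algorithm computes (not written out on the slide):

$$ C_B(v) \;=\; \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} $$

where σ_st is the number of shortest paths from s to t and σ_st(v) is the number of those paths passing through v.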

Page 20

Power Law Distribution

• Degree distribution (in-degree and out-degree plots shown on the slide)

Page 21

Power Law Distribution

• Degree distribution (in-degree and out-degree plots shown on the slide)

Page 22

Power Law Distribution

• The nodes with high degrees satisfy a power-law distribution.
• The nodes with low degrees do not, because of the limited data (only 1104 words in total).

Page 23

Continued work: ranking

• FB ranking: assign weights to each feature.
• But YouTube added randomness to increase recall at the cost of precision.

• Reference:
  1. James Davidson, Benjamin Liebald, Junning Liu, Palash Nandy, Taylor Van Vleet, Ullas Gargi, Sujoy Gupta, Yu He, Mike Lambert, Blake Livingston, Dasarathi Sampath, "The YouTube video recommendation system," RecSys 2010: 293-296.

Page 24

Continued work: ranking

• Our ranking: a combination of FB news ranking and YouTube ranking (see the sketch below):
  • We use cosine similarity to measure which topic cluster the user is most interested in.
  • We generate the top 8 ADs/topics by the FB ranking algorithm.
  • And we add two more ADs/topics at random.
  • This increases the prediction broadness (increases recall) at the cost of precision.
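A minimal C++ sketch of this hybrid ranking, assuming the user's recent posts and each AD have already been turned into TF-IDF vectors over a shared vocabulary; the FB-style feature-weighted ranker is simplified here to plain cosine similarity, so the function names and weights are illustrative only.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

// Cosine similarity between two TF-IDF vectors of equal length.
double cosine(const std::vector<double>& a, const std::vector<double>& b) {
    double dot = 0.0, na = 0.0, nb = 0.0;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (std::sqrt(na) * std::sqrt(nb));
}

// Rank ADs against the user vector: top 8 by similarity plus 2 picked at random
// from the remainder, to broaden recall at the cost of precision.
std::vector<int> recommend(const std::vector<double>& user,
                           const std::vector<std::vector<double>>& ads,
                           std::mt19937& rng) {
    std::vector<int> order(ads.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](int x, int y) {
        return cosine(user, ads[x]) > cosine(user, ads[y]);
    });
    size_t n_top = std::min<size_t>(8, order.size());
    std::vector<int> picks(order.begin(), order.begin() + n_top);
    std::vector<int> rest(order.begin() + n_top, order.end());
    std::shuffle(rest.begin(), rest.end(), rng);
    for (size_t i = 0; i < rest.size() && i < 2; ++i) picks.push_back(rest[i]);
    return picks;
}

int main() {
    std::mt19937 rng(42);
    std::vector<double> user = {1.0, 0.0, 0.5};  // toy TF-IDF vector of recent posts
    std::vector<std::vector<double>> ads = {
        {1.0, 0.1, 0.4}, {0.0, 1.0, 0.0}, {0.9, 0.0, 0.6}, {0.2, 0.8, 0.1}};
    for (int id : recommend(user, ads, rng)) std::printf("AD %d\n", id);
}
```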

Page 25

Limitation and Future Work

• Will evaluate the system on a larger-scale dataset.
• Since we don't have real feedback data, e.g. the performance (CTR) for each AD/topic, we need to generate it based on a Gaussian model (a small sketch follows).
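A minimal sketch of the kind of Gaussian CTR simulation this implies; the mean and standard deviation below are purely illustrative assumptions, not values from the slides.

```cpp
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

// Simulate a click-through rate for each AD/topic from a Gaussian model,
// clamped to [0, 1]. Mean and standard deviation are illustrative only.
std::vector<double> simulate_ctr(size_t n_ads, double mean = 0.02, double stddev = 0.01) {
    std::mt19937 rng(std::random_device{}());
    std::normal_distribution<double> gauss(mean, stddev);
    std::vector<double> ctr(n_ads);
    for (auto& c : ctr) c = std::clamp(gauss(rng), 0.0, 1.0);
    return ctr;
}

int main() {
    for (double c : simulate_ctr(5)) std::printf("simulated CTR: %.4f\n", c);
}
```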

Page 26

Thank you!

Questions?