2013 KDD conference presentation--"Multi-Label Relational Neighbor Classification using Social...

Multi-label Relational Neighbor Classification using Social Context Features Xi Wang and Gita Sukthankar Department of EECS University of Central Florida


These are the slides I presented at the 2013 KDD conference for the paper "Multi-Label Relational Neighbor Classification using Social Context Features".


Page 1: 2013 KDD conference presentation--"Multi-Label Relational Neighbor Classification using Social Context Features"

Multi-label Relational Neighbor Classification using Social Context Features

Xi Wang and Gita Sukthankar

Department of EECS

University of Central Florida


Motivation

The conventional relational classification model focuses on the single-label classification problem.

Real-world relational datasets contain instances associated with multiple labels.

Connections between instances in multi-label networks are driven by various causal reasons.

Example: Scientific collaboration network (figure labels: Machine Learning, Data Mining, Artificial Intelligence)


Problem Formulation

Node classification in multi-relational networks

Input:

Network structure (i.e., connectivity information)

Labels of some actors in the network

Output:

Labels of the other actors


Classification in Networked Data

Homophily: nodes with similar labels are more likely to be connected

Markov assumption: the label of a node depends on the labels of its immediate neighbors in the graph.

Relational models are built based on the labels of neighbors.

Predictions are made using collective inference.


Contribution

A new multi-label iterative relational neighbor classifier (SCRN)

Extract social context features using edge clustering to represent a node’s potential group membership

Use of social features boosts classification performance over benchmarks on several real-world collaborative networked datasets


Relational Neighbor Classifier

The Relational Neighbor (RN) classifier, proposed by Macskassy and Provost (MRDM’03), is a simple relational probabilistic model that makes predictions for a given node based solely on the class labels of its neighbors.

Figure panels: Training Graph, Iteration 1, Iteration 2


Relational Neighbor Classifier

Weighted-vote relational neighbor classifier (wvRN) estimates prediction probability as:

$$P(L_i = c \mid v_i) = \frac{1}{z} \sum_{v_j \in N_i} w(v_i, v_j)\, P(L_j = c \mid N_j)$$

Here $z$ is the usual normalization factor, and $w(v_i, v_j)$ is the weight of the link between node $v_i$ and $v_j$.
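The weighted-vote rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `wvrn_step`, the dictionary layout, and the toy graph are hypothetical, and the normalization factor is assumed to be the sum of the node's link weights.

```python
def wvrn_step(adj, probs, node, classes):
    """One wvRN estimate for `node`: weighted average of neighbors' class probabilities.

    adj:   {node: {neighbor: link_weight}}
    probs: {node: {class: probability}}, current estimates for all nodes
    """
    z = sum(adj[node].values())  # normalization factor (assumed: total link weight)
    if z == 0:
        return dict(probs[node])  # isolated node: keep the current estimate
    return {
        c: sum(w * probs[nbr][c] for nbr, w in adj[node].items()) / z
        for c in classes
    }


# Toy graph (hypothetical): "u" is unlabeled, its neighbors "a" and "b" are labeled.
adj = {
    "u": {"a": 2.0, "b": 1.0},
    "a": {"u": 2.0},
    "b": {"u": 1.0},
}
classes = ["ML", "DM"]
probs = {
    "a": {"ML": 1.0, "DM": 0.0},
    "b": {"ML": 0.0, "DM": 1.0},
    "u": {"ML": 0.5, "DM": 0.5},
}
estimate = wvrn_step(adj, probs, "u", classes)
```

In this toy case the heavier link to "a" pulls the estimate for "u" toward "ML". In collective inference the same step would be repeated over all unlabeled nodes until the estimates stabilize.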


Apply RN in Multi-relational Network

Ground truth

Legend: nodes with both labels (red, green); nodes with green label only; nodes with red label only


Edge-Based Social Feature Extraction

Connections in human networks are mainly affiliation-driven.

Since each connection can often be regarded as principally resulting from one affiliation, links possess a strong correlation with a single affiliation class.

The edge class information is not readily available in most social media datasets, but an unsupervised clustering algorithm can be applied to partition the edges into disjoint sets (KDD’09, CIKM’09).


Cluster edges using K-Means

Scalable edge clustering method proposed by Tang and Liu (CIKM’09).

Each edge is represented in a feature-based format, where each edge is characterized by its adjacent nodes.

K-means clustering is used to separate the edges into groups, and the social feature (SF) vector is constructed based on edge cluster IDs.

Original network

Step 1: Edge representations

Step 2: Construct social features
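The two steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the cited scalable algorithm: an ordinary k-means with cosine similarity stands in for it, centroids are seeded with the first k edges, and each social feature entry is assumed to be a raw count of the node's edges in a cluster. All function names and the toy network are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(edge, centroid):
    """Cosine similarity between an edge (its two endpoints) and a sparse centroid."""
    dot = sum(centroid.get(n, 0.0) for n in edge)
    norm_c = math.sqrt(sum(v * v for v in centroid.values()))
    return dot / (math.sqrt(2.0) * norm_c) if norm_c else 0.0


def cluster_edges(edges, k, iters=20):
    """Plain k-means over edges, each represented by its two adjacent nodes."""
    # Deterministic seeding (an assumption): the first k edges become centroids.
    centroids = [{u: 1.0, v: 1.0} for u, v in edges[:k]]
    assign = None
    for _ in range(iters):
        new_assign = [
            max(range(k), key=lambda c: cosine(e, centroids[c])) for e in edges
        ]
        if new_assign == assign:
            break  # assignments are stable
        assign = new_assign
        for c in range(k):
            members = [e for e, a in zip(edges, assign) if a == c]
            if not members:
                continue  # keep the old centroid for an empty cluster
            counts = Counter(n for e in members for n in e)
            centroids[c] = {n: cnt / len(members) for n, cnt in counts.items()}
    return assign


def social_features(edges, assign, k):
    """Per-node vector counting how many of the node's edges fall in each cluster."""
    sf = defaultdict(lambda: [0] * k)
    for (u, v), c in zip(edges, assign):
        sf[u][c] += 1
        sf[v][c] += 1
    return dict(sf)


# Hypothetical toy network: two triangles that share node 3.
edges = [(1, 2), (4, 5), (1, 3), (2, 3), (3, 4), (3, 5)]
assign = cluster_edges(edges, k=2)
sf = social_features(edges, assign, k=2)
```

Node 3 sits in both triangles, so its social feature vector has mass in both clusters, which is the kind of overlapping group membership the edge-based view is meant to capture.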


Edge-Clustering Visualization

Figure: A subset of DBLP with 95 instances. Edges are clustered into 10 groups, with each shown in a different color.


Proposed Method: SCRN

The initial set of reference features for class c can be defined as the weighted sum of social feature vectors for nodes known to be in class c:

Then node $v_i$’s class propagation probability for class c, conditioned on its social features, is:
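The two equations on this slide did not survive in the transcript. One form consistent with the surrounding text is sketched below as an assumption, not as the slide's exact notation. Here $SF(v_i)$ denotes the social feature vector of node $v_i$, $V_c^K$ the set of nodes known to be in class c, $CV(c)$ the class reference vector, and $\mathrm{sim}$ a similarity function; these symbols, the use of class probabilities as weights, and the normalization are all hypothetical.

```latex
CV(c) = \frac{1}{|V_c^K|} \sum_{v_i \in V_c^K} P(L_i = c)\, SF(v_i)

CP(L_i = c \mid SF(v_i)) = \mathrm{sim}\big(CV(c),\, SF(v_i)\big)
```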


SCRN

SCRN estimates the class-membership probability of node $v_i$ belonging to class c using the following equation:

class propagation probability

similarity between connected nodes(link weight)

class probability of its neighbors
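The equation itself is missing from the transcript. Given the three annotated terms, one plausible form is a normalized sum over neighbors of their product, sketched below. The symbols are assumptions: $N_i$ is the neighborhood of $v_i$, $SF(v_i)$ its social feature vector, $CP$ the class propagation probability, $w$ the link weight, and $z$ a normalization factor.

```latex
P(L_i = c \mid N_i, SF(v_i)) = \frac{1}{z} \sum_{v_j \in N_i}
    CP(L_i = c \mid SF(v_i)) \cdot w(v_i, v_j) \cdot P(L_j = c \mid N_j)
```

Read this way, the class propagation probability acts as a per-class gate: a neighbor's label is propagated to $v_i$ only to the extent that $v_i$'s social features resemble those of the class.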


SCRN Overview

Input: , Max_Iter

Output: for nodes in

1. Construct nodes’ social feature space
2. Initialize the class reference vectors for each class
3. Calculate the class-propagation probability for each test node
4. Repeat until # of iterations > Max_Iter or predictions converge:
   Estimate test node’s class probability
   Update the test node’s class probability in collective inference
   Update the class reference vectors
   Re-calculate each node’s class-propagation probability
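The loop can be sketched in Python as below. This is a simplified illustration, not the released SCRN code. Social features are taken as given input, a plain histogram intersection stands in for the similarity function, test nodes start at the class prior of the labeled nodes, and a synchronous update replaces the collective-inference scheme used in the experiments. Every name here is hypothetical.

```python
def intersection(ref, feat):
    """Histogram intersection of two vectors, scaled by the mass of `feat`."""
    total = sum(feat)
    return sum(min(r, f) for r, f in zip(ref, feat)) / total if total else 0.0


def reference_vectors(sf, probs, nodes, classes):
    """Per class: probability-weighted mean of the social features of `nodes`."""
    dim = len(next(iter(sf.values())))
    refs = {}
    for c in classes:
        weight = sum(probs[n][c] for n in nodes)
        vec = [sum(probs[n][c] * sf[n][d] for n in nodes) for d in range(dim)]
        refs[c] = [x / weight for x in vec] if weight else vec
    return refs


def scrn(adj, sf, known, classes, max_iter=10, tol=1e-6):
    """adj: {node: {nbr: weight}}, sf: {node: [..]}, known: {node: set of labels}."""
    test_nodes = [n for n in adj if n not in known]
    probs = {n: {c: float(c in known[n]) for c in classes} for n in known}
    # Test nodes start at the class prior among labeled nodes (an assumption).
    prior = {c: sum(probs[n][c] for n in known) / len(known) for c in classes}
    probs.update({n: dict(prior) for n in test_nodes})
    # Initial class reference vectors use labeled nodes only.
    refs = reference_vectors(sf, probs, list(known), classes)
    for _ in range(max_iter):
        # Class-propagation probability of each test node for each class.
        cp = {n: {c: intersection(refs[c], sf[n]) for c in classes} for n in test_nodes}
        # Estimate each test node from its neighbors' previous-iteration values.
        new = {}
        for n in test_nodes:
            z = sum(adj[n].values()) or 1.0
            new[n] = {
                c: cp[n][c] * sum(w * probs[m][c] for m, w in adj[n].items()) / z
                for c in classes
            }
        delta = max(
            (abs(new[n][c] - probs[n][c]) for n in test_nodes for c in classes),
            default=0.0,
        )
        probs.update(new)  # synchronous update of all test nodes
        refs = reference_vectors(sf, probs, list(probs), classes)
        if delta < tol:
            break  # predictions have converged
    return {n: probs[n] for n in test_nodes}


# Hypothetical toy input: "u" is the only test node.
adj = {"u": {"a": 1.0, "b": 1.0}, "a": {"u": 1.0}, "b": {"u": 1.0}}
sf = {"a": [2, 0], "b": [0, 2], "u": [2, 0]}
known = {"a": {"ML"}, "b": {"DM"}}
result = scrn(adj, sf, known, ["ML", "DM"])
```

In the toy input, "u" has one neighbor of each class, so a label-only vote would be split evenly. Its social features match only those of "a", so the class-propagation term suppresses "DM" entirely.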


SCRN Visualization

Figure: SCRN on synthetic multi-label network with 1000 nodes and 32 classes (15 iterations).


Datasets

DBLP

We construct a weighted collaboration network for authors who have published at least 2 papers during the 2000 to 2010 time frame.

We selected 15 representative conferences in 6 research areas:

Database: ICDE, VLDB, PODS, EDBT

Data Mining: KDD, ICDM, SDM, PAKDD

Artificial Intelligence: IJCAI, AAAI

Information Retrieval: SIGIR, ECIR

Computer Vision: CVPR

Machine Learning: ICML, ECML


Datasets

IMDb

We extract movies and TV shows released between 2000 and 2010, and those directed by the same director are linked together.

We only retain movies and TV programs with more than 5 links.

Each movie can be assigned to a subset of 27 different candidate movie genres in the database, such as “Drama”, “Comedy”, “Documentary” and “Action”.
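The linking rule for this dataset can be illustrated with a short Python sketch. It assumes one director per title and counts links before filtering, neither of which the slide states; the function name, input layout, and toy titles are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_director_network(titles, min_links=5):
    """titles: {title: director}. Link titles that share a director, then keep
    only the titles with more than `min_links` links."""
    by_director = defaultdict(list)
    for title, director in titles.items():
        by_director[director].append(title)
    adj = defaultdict(set)
    for group in by_director.values():
        for a, b in combinations(group, 2):
            adj[a].add(b)
            adj[b].add(a)
    # Degree is measured on the unfiltered graph (an assumption).
    keep = {t for t, nbrs in adj.items() if len(nbrs) > min_links}
    return {t: adj[t] & keep for t in keep}


# Hypothetical toy data with a lowered threshold so that something survives.
titles = {"t1": "X", "t2": "X", "t3": "X", "t4": "Y"}
net = build_director_network(titles, min_links=1)
```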


Datasets

YouTube

A subset of data (15,000 nodes) drawn from the original YouTube dataset [1] using snowball sampling.

Each user in YouTube can subscribe to different interest groups and add other users as his/her contacts.

Class labels are 47 interest groups.

[1] http://www.public.asu.edu/~ltang9/social_dimension.html
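Snowball sampling is commonly implemented as a breadth-first expansion from a set of seed nodes. The sketch below shows that common variant; the seed choice, stopping rule, and function name are assumptions, since the slide does not describe the exact procedure used.

```python
from collections import deque


def snowball_sample(adj, seeds, max_nodes):
    """Breadth-first expansion from seed nodes until `max_nodes` are collected."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_nodes:
        node = queue.popleft()
        sampled.append(node)
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return sampled


# Hypothetical contact graph.
adj = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2, 5], 5: [4]}
sample = snowball_sample(adj, seeds=[1], max_nodes=3)
```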


Comparative Methods

Edge (EdgeCluster)

wvRN

Prior

Random


Experiment Setting

Size of social feature space: 1000 for DBLP and YouTube; 10000 for IMDb

Class propagation probability is calculated with the Generalized Histogram Intersection Kernel.

Relaxation Labeling is used in the collective inference framework for SCRN and wvRN.

We assume the number of labels for testing nodes is known.
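Two of these settings are easy to illustrate in Python. The generalized histogram intersection kernel is usually written as the sum over dimensions of min(|x_i|^beta, |y_i|^beta). With the label count known, prediction reduces to taking the k highest-scoring classes. The value of beta, any normalization applied afterwards, and the function names are assumptions not given on the slide.

```python
def ghi_kernel(x, y, beta=1.0):
    """Generalized histogram intersection: sum_i min(|x_i|^beta, |y_i|^beta)."""
    return sum(min(abs(a) ** beta, abs(b) ** beta) for a, b in zip(x, y))


def top_k_labels(class_probs, k):
    """Pick the k classes with the highest estimated probability."""
    ranked = sorted(class_probs, key=class_probs.get, reverse=True)
    return set(ranked[:k])


# Hypothetical inputs.
sim = ghi_kernel([2, 0, 1], [1, 3, 1])
labels = top_k_labels({"ML": 0.6, "DM": 0.3, "AI": 0.1}, k=2)
```

With beta = 1 the kernel reduces to the ordinary histogram intersection; smaller values of beta dampen the influence of large feature counts.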


Experiment Setting

We employ the network cross-validation (NCV) method (KAIS’11) to reduce the overlap between test samples.

Classification performance is evaluated based on Micro-F1, Macro-F1 and Hamming Loss.
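The three measures have standard definitions, sketched here in Python for label sets. Micro-F1 pools true positives, false positives, and false negatives over all classes. Macro-F1 averages per-class F1. Hamming loss is the fraction of node-class decisions that are wrong. The function name and the convention of scoring an empty class as 0 are assumptions.

```python
def multilabel_scores(y_true, y_pred, classes):
    """y_true, y_pred: lists of label sets, one per test node."""
    tp = {c: 0 for c in classes}
    fp = {c: 0 for c in classes}
    fn = {c: 0 for c in classes}
    for truth, pred in zip(y_true, y_pred):
        for c in classes:
            if c in pred and c in truth:
                tp[c] += 1
            elif c in pred:
                fp[c] += 1
            elif c in truth:
                fn[c] += 1

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0  # empty class scores 0 (assumption)

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    hamming = (sum(fp.values()) + sum(fn.values())) / (len(y_true) * len(classes))
    return micro, macro, hamming


# Hypothetical predictions for two test nodes.
y_true = [{"A", "B"}, {"C"}]
y_pred = [{"A", "C"}, {"C"}]
micro, macro, hamming = multilabel_scores(y_true, y_pred, ["A", "B", "C"])
```

Macro-F1 weights rare classes as heavily as common ones, so it tends to be lower than Micro-F1 when small classes are predicted poorly.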


Results (Micro-F1)

DBLP

Figure: Micro-F1 accuracy (%) versus training data percentage (%), with the x-axis running from 5 to 30 and the y-axis from 10 to 70. Curves: SCRN, Edge, wvRN, Prior, Random.


Results (Macro-F1)

DBLP

Figure: Macro-F1 accuracy (%) versus training data percentage (%), with the x-axis running from 5 to 30 and the y-axis from 10 to 70. Curves: SCRN, Edge, wvRN, Prior, Random.


Results (Hamming Loss)

DBLP


Results (Hamming Loss)

IMDb


Results (Hamming Loss)

YouTube


Conclusion

Links in multi-relational networks are heterogeneous.

SCRN exploits label homophily while simultaneously leveraging social feature similarity through the introduction of class propagation probabilities.

SCRN significantly boosts classification performance on multi-label collaboration networks.

Our open-source implementation of SCRN is available at: http://code.google.com/p/multilabel-classification-on-social-network/


References

MACSKASSY, S. A., AND PROVOST, F. A simple relational classifier. In Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM) at KDD, 2003, pp. 64–76.

TANG, L., AND LIU, H. Relational learning via latent social dimensions. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2009, pp. 817–826.

TANG, L., AND LIU, H. Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the International Conference on Information and Knowledge Management (CIKM), 2009, pp. 1107–1116.

NEVILLE, J., GALLAGHER, B., ELIASSI-RAD, T., AND WANG, T. Correcting evaluation bias of relational classifiers with network cross validation. Knowledge and Information Systems (KAIS), 2011, pp. 1–25.


Thank you!