INFORMATION-THEORETIC CO-CLUSTERING
Authors / Inderjit S. Dhillon, Subramanyam Mallela and Dharmendra S. Modha
Conference / ACM SIGKDD '03, August 24-27, 2003, Washington
Presenter / Meng-Lun, Wu
OUTLINE
- Introduction
- Problem Formulation
- Co-Clustering Algorithm
- Experimental Results
- Conclusions and Future Work
INTRODUCTION
Clustering is a fundamental tool in unsupervised learning. Most clustering algorithms focus on one-way clustering, i.e., they cluster only one dimension of the data matrix (e.g., documents, but not words).
Before clustering:

doc | Word1 | … | Wordn
50  | 12    | … | 10
52  | 13    | … | 0
53  | 10    | … | 20

After one-way (document) clustering:

doc | Word1 | … | Wordn | Cluster
50  | 12    | … | 10    | Cluster0
52  | 13    | … | 0     | Cluster1
53  | 10    | … | 20    | Cluster0
INTRODUCTION (CONT.)
It is often desirable to co-cluster, i.e., simultaneously cluster both dimensions of the data matrix.
The normalized non-negative contingency table is viewed as a joint probability distribution between two discrete random variables.
The optimal co-clustering is the one that preserves the largest mutual information between the clustered random variables.
INTRODUCTION (CONT.)
Equivalently, the optimal co-clustering minimizes the loss in mutual information.
The mutual information of two random variables measures their mutual dependence. Formally, it is defined as:

I(X;Y) = Σ_{x∈X} Σ_{y∈Y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]
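As a quick illustration (my own sketch, not from the paper), the mutual information defined above can be computed directly from a joint distribution given as a table:

```python
import math

def mutual_information(p):
    """I(X;Y) = sum over x, y of p(x,y) * log(p(x,y) / (p(x) p(y)))."""
    px = [sum(row) for row in p]           # marginal p(x)
    py = [sum(col) for col in zip(*p)]     # marginal p(y)
    return sum(pxy * math.log(pxy / (px[i] * py[j]))
               for i, row in enumerate(p)
               for j, pxy in enumerate(row)
               if pxy > 0)                 # 0 log 0 is taken to be 0

# Independent variables carry no mutual information:
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0
# Perfectly dependent variables: I(X;Y) = log 2 ≈ 0.693
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))
```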
INTRODUCTION (CONT.)
The Kullback-Leibler (K-L) divergence measures the difference between two probability distributions. Given the true distribution p(x,y) and another distribution q(x,y), it is defined as:

D_KL( p(x,y) || q(x,y) ) = Σ_x Σ_y p(x,y) log [ p(x,y) / q(x,y) ]
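The same kind of table-based sketch works for the K-L divergence; the two joint distributions below are made-up examples:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum over x, y of p(x,y) * log(p(x,y) / q(x,y))."""
    return sum(pxy * math.log(pxy / qxy)
               for prow, qrow in zip(p, q)
               for pxy, qxy in zip(prow, qrow)
               if pxy > 0)

p = [[0.4, 0.1], [0.1, 0.4]]
q = [[0.25, 0.25], [0.25, 0.25]]
print(kl_divergence(p, p))       # 0.0 -- the divergence vanishes when the distributions match
print(kl_divergence(p, q) > 0)   # True -- and is strictly positive otherwise
```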
PROBLEM FORMULATION
Let X and Y be discrete random variables taking values in {x1,…,xm} and {y1,…,yn} respectively, and let p(X,Y) denote their joint probability distribution.
Let C_X map the values of X into k row clusters, and C_Y map the values of Y into l column clusters:

C_X : {x1, x2, …, xm} → {x̂1, x̂2, …, x̂k}
C_Y : {y1, y2, …, yn} → {ŷ1, ŷ2, …, ŷl}
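As a sketch of how the two mappings act on the joint distribution: representing C_X and C_Y as lookup lists, the clustered joint p(x̂,ŷ) is obtained by summing p(x,y) over each block. The 4×4 distribution below is a made-up example, not the paper's:

```python
def clustered_joint(p, C_X, C_Y, k, l):
    """p(x_hat, y_hat): sum of p(x,y) over all x in cluster x_hat, y in y_hat."""
    phat = [[0.0] * l for _ in range(k)]
    for i, row in enumerate(p):
        for j, pxy in enumerate(row):
            phat[C_X[i]][C_Y[j]] += pxy
    return phat

# Hypothetical 4x4 joint distribution with an obvious 2x2 block structure.
p = [[0.10, 0.10, 0.02, 0.03],
     [0.10, 0.10, 0.03, 0.02],
     [0.02, 0.03, 0.10, 0.10],
     [0.03, 0.02, 0.10, 0.10]]
C_X = [0, 0, 1, 1]   # x1, x2 -> first row cluster; x3, x4 -> second
C_Y = [0, 0, 1, 1]
print(clustered_joint(p, C_X, C_Y, 2, 2))  # approximately [[0.4, 0.1], [0.1, 0.4]]
```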
PROBLEM FORMULATION (CONT.)
Definition: an optimal co-clustering minimizes the loss in mutual information

I(X;Y) − I(X̂;Ŷ)

subject to constraints on the number of row and column clusters.
For a fixed co-clustering (C_X, C_Y), the loss in mutual information can be written as:

I(X;Y) − I(X̂;Ŷ) = D_KL( p(x,y) || q(x,y) )
PROBLEM FORMULATION (CONT.)
Proof sketch, where q(x,y) = p(x̂,ŷ) p(x|x̂) p(y|ŷ) and the inner sums run over x ∈ x̂ and y ∈ ŷ:

I(X;Y) − I(X̂;Ŷ)
  = Σ_x̂ Σ_ŷ Σ_{x∈x̂} Σ_{y∈ŷ} p(x,y) log [ p(x,y) / (p(x) p(y)) ]
    − Σ_x̂ Σ_ŷ p(x̂,ŷ) log [ p(x̂,ŷ) / (p(x̂) p(ŷ)) ]
  = Σ_x̂ Σ_ŷ Σ_{x∈x̂} Σ_{y∈ŷ} p(x,y) log [ p(x,y) p(x̂) p(ŷ) / (p(x) p(y) p(x̂,ŷ)) ]
  = Σ_x̂ Σ_ŷ Σ_{x∈x̂} Σ_{y∈ŷ} p(x,y) log [ p(x,y) / ( p(x̂,ŷ) p(x|x̂) p(y|ŷ) ) ]
  = Σ_x Σ_y p(x,y) log [ p(x,y) / q(x,y) ]
  = D_KL( p(x,y) || q(x,y) )
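The identity can also be checked numerically. The sketch below uses a made-up 4×4 distribution and a 2×2 block co-clustering; the helper names and the example data are my own, not the authors':

```python
import math

# Hypothetical 4x4 joint distribution and a 2x2 block co-clustering.
p = [[0.10, 0.10, 0.02, 0.03],
     [0.10, 0.10, 0.03, 0.02],
     [0.02, 0.03, 0.10, 0.10],
     [0.03, 0.02, 0.10, 0.10]]
C_X, C_Y = [0, 0, 1, 1], [0, 0, 1, 1]
k = l = 2

px = [sum(row) for row in p]              # p(x)
py = [sum(col) for col in zip(*p)]        # p(y)

phat = [[0.0] * l for _ in range(k)]      # clustered joint p(x_hat, y_hat)
for i in range(4):
    for j in range(4):
        phat[C_X[i]][C_Y[j]] += p[i][j]
pxh = [sum(row) for row in phat]          # p(x_hat)
pyh = [sum(col) for col in zip(*phat)]    # p(y_hat)

def mi(joint, mx, my):
    """Mutual information of a joint table with the given marginals."""
    return sum(v * math.log(v / (mx[i] * my[j]))
               for i, row in enumerate(joint)
               for j, v in enumerate(row) if v > 0)

# q(x,y) = p(x_hat, y_hat) * p(x | x_hat) * p(y | y_hat)
q = [[phat[C_X[i]][C_Y[j]] * (px[i] / pxh[C_X[i]]) * (py[j] / pyh[C_Y[j]])
      for j in range(4)] for i in range(4)]

loss = mi(p, px, py) - mi(phat, pxh, pyh)   # I(X;Y) - I(X_hat;Y_hat)
dkl = sum(p[i][j] * math.log(p[i][j] / q[i][j])
          for i in range(4) for j in range(4) if p[i][j] > 0)
print(abs(loss - dkl) < 1e-12)  # True: the loss in MI equals D_KL(p || q)
```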
PROBLEM FORMULATION (CONT.)
q(X,Y) is a distribution of the form

q(x,y) = p(x̂,ŷ) p(x|x̂) p(y|ŷ)

Suppose p(x) = (0.15, 0.15, 0.15, 0.15, 0.2, 0.2), p(y) = (0.18, 0.18, 0.14, 0.14, 0.18, 0.18), p(x̂) = (0.3, 0.3, 0.4) and p(ŷ) = (0.5, 0.5). Then, for example,

q(x1,y1) = p(x̂1,ŷ1) p(x1|x̂1) p(y1|ŷ1) = 0.3 × (0.15/0.3) × (0.18/0.5) = 0.054
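Plugging the slide's numbers into the factored form gives a one-line check (assuming, as the numbers suggest, p(x̂1,ŷ1) = 0.3, p(x1) = 0.15, p(x̂1) = 0.3, p(y1) = 0.18 and p(ŷ1) = 0.5):

```python
# q(x1,y1) = p(x_hat_1, y_hat_1) * p(x1 | x_hat_1) * p(y1 | y_hat_1)
#          = p(x_hat_1, y_hat_1) * (p(x1)/p(x_hat_1)) * (p(y1)/p(y_hat_1))
q_x1_y1 = 0.3 * (0.15 / 0.3) * (0.18 / 0.5)
print(round(q_x1_y1, 6))  # 0.054
```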
CO-CLUSTERING ALGORITHM
Input: the joint probability distribution p(X,Y), the desired number of row clusters k, and the desired number of column clusters l.
Output: the partition functions C†_X and C†_Y.
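The alternating minimization behind the algorithm can be sketched as follows. The function names, the toy 4×4 distribution, and the deliberately imperfect initial partitions are my own illustration, not the authors' code; the sketch also assumes no cluster becomes empty during the run:

```python
import math

def kl(a, b):
    """D_KL between two discrete distributions given as lists."""
    return sum(u * math.log(u / max(v, 1e-12)) for u, v in zip(a, b) if u > 0)

def itcc(p, k, l, C_X, C_Y, n_iter=10):
    """Sketch of information-theoretic co-clustering: alternately move each
    row x to the cluster x_hat minimizing D(p(Y|x) || q(Y|x_hat)), then each
    column y to the cluster y_hat minimizing D(p(X|y) || q(X|y_hat)).
    Each step can only decrease D(p || q), so the loop reaches a local minimum."""
    m, n = len(p), len(p[0])
    px = [sum(row) for row in p]
    py = [sum(col) for col in zip(*p)]

    def stats():
        phat = [[0.0] * l for _ in range(k)]
        for i in range(m):
            for j in range(n):
                phat[C_X[i]][C_Y[j]] += p[i][j]
        return phat, [sum(r) for r in phat], [sum(c) for c in zip(*phat)]

    for _ in range(n_iter):
        phat, pxh, pyh = stats()
        # Row step: q(y | x_hat) = p(y_hat | x_hat) * p(y | y_hat)
        qy = [[phat[a][C_Y[j]] / pxh[a] * py[j] / pyh[C_Y[j]]
               for j in range(n)] for a in range(k)]
        for i in range(m):
            prow = [p[i][j] / px[i] for j in range(n)]       # p(Y | x)
            C_X[i] = min(range(k), key=lambda a: kl(prow, qy[a]))
        phat, pxh, pyh = stats()
        # Column step: q(x | y_hat) = p(x_hat | y_hat) * p(x | x_hat)
        qx = [[phat[C_X[i]][b] / pyh[b] * px[i] / pxh[C_X[i]]
               for i in range(m)] for b in range(l)]
        for j in range(n):
            pcol = [p[i][j] / py[j] for i in range(m)]       # p(X | y)
            C_Y[j] = min(range(l), key=lambda b: kl(pcol, qx[b]))
    return C_X, C_Y

p = [[0.10, 0.10, 0.02, 0.03],
     [0.10, 0.10, 0.03, 0.02],
     [0.02, 0.03, 0.10, 0.10],
     [0.03, 0.02, 0.10, 0.10]]
# Start from a deliberately wrong row partition; the algorithm recovers the blocks.
print(itcc(p, 2, 2, [0, 0, 0, 1], [0, 0, 1, 1]))  # ([0, 0, 1, 1], [0, 0, 1, 1])
```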
CO-CLUSTERING ALGORITHM (CONT.)
Row re-assignment step: each row x is moved to the row cluster x̂ that minimizes D( p(Y|x) || q(Y|x̂) ).
[Worked-example tables of D(p||q) values for the candidate row clusters x̂1, x̂2, x̂3: 0.041909, 0.041909, 0.05696, 0.05696, 0.0376, 0.049641 and 0.05696, 0.05696, 0.04191, 0.04191, 0.049641, 0.0376.]
CO-CLUSTERING ALGORITHM (CONT.)
Column re-assignment step: each column y is moved to the column cluster ŷ that minimizes D( p(X|y) || q(X|ŷ) ).

D(p||q) for y1,…,y6 against ŷ1: 0.02118, 0.02118, 0.02243, 0.040765, 0.04893, 0.04893
D(p||q) for y1,…,y6 against ŷ2: 0.048138, 0.048138, 0.041942, 0.02295, 0.02052, 0.02052

Columns y1, y2, y3 are closer to ŷ1, and columns y4, y5, y6 to ŷ2.
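The selection behind each such table is just an argmin over the candidate clusters; sketched with the first column's two distances (read off the values above):

```python
# Distances D(p(X|y) || q(X|y_hat)) of column y1 to each column cluster.
distances = {"y_hat_1": 0.02118, "y_hat_2": 0.048138}
new_cluster = min(distances, key=distances.get)
print(new_cluster)  # y_hat_1 -- the column joins the closest cluster
```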
CO-CLUSTERING ALGORITHM (CONT.)
The resulting objective value for the example is D(p||q) = 0.02881.
EXPERIMENTAL RESULTS
The experiments use various subsets of the 20-Newsgroups data set (NG20).
1D-clustering denotes document clustering without any word clustering.
Evaluation measures: micro-averaged precision and micro-averaged recall.
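One common way to compute micro-averaged precision for a clustering is to label each cluster with its majority class and count the overall fraction of correctly classified documents; the majority-vote mapping is an assumption of this sketch, and the assignments and labels below are made up:

```python
from collections import Counter

def micro_precision(clusters, labels):
    """Label each cluster with its majority class, then return the
    fraction of documents whose true label matches that class."""
    correct = 0
    for c in set(clusters):
        members = [labels[i] for i, ci in enumerate(clusters) if ci == c]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(labels)

# Hypothetical assignments for six documents from two newsgroups.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["talk.politics", "talk.politics", "sci.space",
          "sci.space", "sci.space", "sci.space"]
print(micro_precision(clusters, labels))  # 5 of 6 documents -> 0.8333...
```

When every document is assigned to exactly one cluster, micro-averaged recall works out to the same value.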
CONCLUSIONS AND FUTURE WORK
The information-theoretic formulation of co-clustering is guaranteed to reach a local minimum of the objective in a finite number of steps.
The method co-clusters the joint distribution of two discrete random variables.
In this paper, the numbers of row and column clusters are pre-specified. We hope that an information-theoretic regularization procedure may allow the number of clusters to be selected automatically.