A Generalized Maximum Entropy Approach To Bregman Co Clustering
![Page 1: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/1.jpg)
Authors: Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, Srujana Merugu, and Dharmendra S. Modha
Source: KDD ’04, August 22-25, 2004, ACM, pp. 509-514
Presenter: Allen Wu
112/04/09
![Page 2: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/2.jpg)
Introduction
Bregman divergences
Bregman co-clustering
Algorithm
Experiments
Conclusion
![Page 3: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/3.jpg)
Information-theoretic co-clustering (ITCC) treats the data matrix as a joint probability distribution p(X, Y).
We seek a co-clustering of both dimensions such that the loss in mutual information is minimized, given a fixed number of row and column clusters:
$$\min_{\hat{X},\hat{Y}} \; I(X;Y) - I(\hat{X};\hat{Y})$$
![Page 4: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/4.jpg)
The loss in mutual information equals
$$I(X;Y) - I(\hat{X};\hat{Y}) = D_{KL}\big(p(x,y) \,\|\, q(x,y)\big)$$
where
$$q(x,y) = p(\hat{x},\hat{y})\, p(x \mid \hat{x})\, p(y \mid \hat{y}), \quad x \in \hat{x},\; y \in \hat{y}.$$
It can be shown that q(x,y) is a “maximum entropy” approximation to p(x,y).
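The identity between the loss in mutual information and the KL divergence can be checked numerically. The sketch below is only an illustration: the 4×4 joint distribution `p` and the cluster maps `row_cluster` and `col_cluster` are made-up values, not data from the slides.

```python
import math

# Hypothetical joint distribution p(x, y); the numbers are illustrative only.
p = [
    [0.10, 0.10, 0.00, 0.05],
    [0.05, 0.15, 0.05, 0.00],
    [0.00, 0.05, 0.10, 0.10],
    [0.05, 0.00, 0.05, 0.15],
]
row_cluster = [0, 0, 1, 1]  # assumed map x -> x_hat
col_cluster = [0, 0, 1, 1]  # assumed map y -> y_hat


def marginals(joint):
    """Return the row and column marginals of a joint distribution."""
    return [sum(r) for r in joint], [sum(c) for c in zip(*joint)]


def mutual_information(joint):
    """I(X;Y) in nats, skipping zero-probability cells."""
    px, py = marginals(joint)
    return sum(
        v * math.log(v / (px[i] * py[j]))
        for i, r in enumerate(joint)
        for j, v in enumerate(r)
        if v > 0
    )


def cluster_joint(joint, rows, cols):
    """Aggregate p(x, y) into p(x_hat, y_hat)."""
    out = [[0.0] * (max(cols) + 1) for _ in range(max(rows) + 1)]
    for i, r in enumerate(joint):
        for j, v in enumerate(r):
            out[rows[i]][cols[j]] += v
    return out


def approximation(joint, rows, cols):
    """q(x, y) = p(x_hat, y_hat) * p(x | x_hat) * p(y | y_hat)."""
    px, py = marginals(joint)
    pc = cluster_joint(joint, rows, cols)
    pxh, pyh = marginals(pc)
    return [
        [pc[rows[i]][cols[j]] * (px[i] / pxh[rows[i]]) * (py[j] / pyh[cols[j]])
         for j in range(len(py))]
        for i in range(len(px))
    ]


def kl_divergence(a, b):
    """D_KL(a || b) in nats, skipping cells where a is zero."""
    return sum(
        x * math.log(x / y)
        for ra, rb in zip(a, b)
        for x, y in zip(ra, rb)
        if x > 0
    )


loss = mutual_information(p) - mutual_information(
    cluster_joint(p, row_cluster, col_cluster)
)
kl = kl_divergence(p, approximation(p, row_cluster, col_cluster))
print(f"loss in mutual information: {loss:.6f}")
print(f"KL(p || q):                 {kl:.6f}")
```

The two printed values should agree up to floating-point error for any choice of cluster maps, since the identity does not depend on the clustering being a good one.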
![Page 5: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/5.jpg)
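Marginals in the example:
$p(y) = (0.18, 0.18, 0.14, 0.14, 0.18, 0.18)$
$p(x) = (0.15, 0.15, 0.15, 0.15, 0.2, 0.2)$
$p(\hat{y}) = (0.5, 0.5)$
$p(\hat{x}) = (0.3, 0.3, 0.4)$
$$q(x,y) = p(\hat{x},\hat{y})\, p(x \mid \hat{x})\, p(y \mid \hat{y}) = p(\hat{x},\hat{y})\, \frac{p(x)}{p(\hat{x})}\, \frac{p(y)}{p(\hat{y})}$$
$$0.3 \times \frac{0.15}{0.3} \times \frac{0.18}{0.5} = 0.054$$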
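The single-entry arithmetic of the worked example can be reproduced in a few lines. This is a minimal sketch; the function name `q_entry` and its parameter names are illustrative, and the inputs are the numbers read off the slide.

```python
import math


def q_entry(p_cluster, p_x, p_xhat, p_y, p_yhat):
    """q(x, y) = p(x_hat, y_hat) * p(x) / p(x_hat) * p(y) / p(y_hat)."""
    return p_cluster * (p_x / p_xhat) * (p_y / p_yhat)


# Values from the slide: p(x_hat, y_hat) = 0.3, p(x) = 0.15, p(x_hat) = 0.3,
# p(y) = 0.18, p(y_hat) = 0.5.
value = q_entry(p_cluster=0.3, p_x=0.15, p_xhat=0.3, p_y=0.18, p_yhat=0.5)
print(round(value, 3))
```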
![Page 6: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/6.jpg)
[Figure: D(p||q) values]
![Page 7: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/7.jpg)
[Figure: D(p||q) values]
![Page 8: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/8.jpg)
![Page 9: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/9.jpg)
However, the data matrix may contain negative entries, or a distortion measure other than KL-divergence, such as the squared Euclidean distance, may be more appropriate.
This paper addresses the general situation by extending ITCC along three directions:
“Nearness” is now measured by any Bregman divergence.
A larger class of constraints can be specified.
The maximum entropy approach is generalized.
![Page 10: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/10.jpg)
![Page 11: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/11.jpg)
![Page 12: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/12.jpg)
![Page 13: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/13.jpg)
![Page 14: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/14.jpg)
The objective function is
$$\min_{\{\mu_1,\dots,\mu_k\}} \; \sum_{h=1}^{k} \sum_{x \in \mathcal{X}_h} \|x - \mu_h\|^2$$
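Reading the objective above as the usual sum of squared distances from each point to its cluster centroid (an assumption, since the slide formula is damaged), a minimal sketch of evaluating it for a fixed assignment looks like this. The function name, the sample points and the assignment are illustrative, and every cluster is assumed to be non-empty.

```python
def kmeans_objective(points, assignment, k):
    """Return (cost, centroids) for a fixed assignment of points to k clusters.

    For a fixed assignment, the centroid (mean) of each cluster minimizes
    the sum of squared distances, so the cost uses the cluster means.
    Assumes every cluster has at least one point.
    """
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, h in zip(points, assignment):
        counts[h] += 1
        for d in range(dim):
            sums[h][d] += x[d]
    centroids = [[s / counts[h] for s in sums[h]] for h in range(k)]
    cost = sum(
        sum((xd - md) ** 2 for xd, md in zip(x, centroids[h]))
        for x, h in zip(points, assignment)
    )
    return cost, centroids


# Hypothetical data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
cost, centroids = kmeans_objective(points, assignment=[0, 0, 1, 1], k=2)
print(cost, centroids)
```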
![Page 15: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/15.jpg)
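Let $\phi$ be a real-valued, strictly convex function defined on the convex set $S = \mathrm{dom}(\phi) \subseteq \mathbb{R}$, and let $\phi$ be differentiable on $\mathrm{int}(S)$, the interior of $S$.
The Bregman divergence $d_\phi : S \times \mathrm{int}(S) \to [0,\infty)$ is defined as
$$d_\phi(z_1, z_2) = \phi(z_1) - \phi(z_2) - \langle z_1 - z_2, \nabla\phi(z_2) \rangle$$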
![Page 16: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/16.jpg)
![Page 17: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/17.jpg)
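I-Divergence: Given $z \in \mathbb{R}_+$, let $\phi(z) = z \log z$. For $z_1, z_2 \in \mathbb{R}_+$,
$$d_\phi(z_1, z_2) = z_1 \log(z_1 / z_2) - (z_1 - z_2)$$
Squared Euclidean Distance: Given $z \in \mathbb{R}$, let $\phi(z) = z^2$. For $z_1, z_2 \in \mathbb{R}$,
$$d_\phi(z_1, z_2) = (z_1 - z_2)^2$$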
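Both special cases follow from the general definition by plugging in the convex function and its derivative. The sketch below, for scalar arguments only, uses illustrative names (`bregman`, `i_divergence`, `squared_euclidean`) and compares the generic formula against the closed forms.

```python
import math


def bregman(phi, grad_phi, z1, z2):
    """d_phi(z1, z2) = phi(z1) - phi(z2) - (z1 - z2) * phi'(z2), scalar case."""
    return phi(z1) - phi(z2) - (z1 - z2) * grad_phi(z2)


def i_divergence(z1, z2):
    """Bregman divergence for phi(z) = z log z; requires z1, z2 > 0."""
    return bregman(lambda z: z * math.log(z), lambda z: math.log(z) + 1.0, z1, z2)


def squared_euclidean(z1, z2):
    """Bregman divergence for phi(z) = z^2."""
    return bregman(lambda z: z * z, lambda z: 2.0 * z, z1, z2)


z1, z2 = 2.0, 0.5
print(i_divergence(z1, z2), z1 * math.log(z1 / z2) - (z1 - z2))
print(squared_euclidean(z1, z2), (z1 - z2) ** 2)
```

Each printed pair should match up to rounding. Note that the I-divergence is not symmetric in its arguments, unlike the squared Euclidean distance.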
![Page 18: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/18.jpg)
Bregman information is defined as the expected Bregman divergence to the expectation:
$$I_\phi(Z) = E[d_\phi(Z, E[Z])]$$
I-Divergence: Given a real non-negative random variable $Z$, the Bregman information is
$$I_\phi(Z) = E[Z \log(Z / E[Z])]$$
Squared Euclidean Distance: Given any real random variable $Z$, the Bregman information is the variance,
$$I_\phi(Z) = E[(Z - E[Z])^2]$$
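As a small illustration of the definition, the sketch below computes the Bregman information of a discrete random variable and checks it against the two closed forms. The values and probabilities are made up, and the helper names are illustrative.

```python
import math


def bregman_information(values, probs, divergence):
    """I_phi(Z) = E[d_phi(Z, E[Z])] for a discrete random variable Z."""
    mean = sum(p * z for z, p in zip(values, probs))
    return sum(p * divergence(z, mean) for z, p in zip(values, probs))


def i_divergence(z1, z2):
    return z1 * math.log(z1 / z2) - (z1 - z2)


def squared_euclidean(z1, z2):
    return (z1 - z2) ** 2


# Hypothetical distribution of Z.
values = [1.0, 2.0, 4.0]
probs = [0.5, 0.25, 0.25]
mean = sum(p * z for z, p in zip(values, probs))

info_i = bregman_information(values, probs, i_divergence)
info_sq = bregman_information(values, probs, squared_euclidean)

# Closed forms: E[Z log(Z / E[Z])] and the variance of Z.
closed_i = sum(p * z * math.log(z / mean) for z, p in zip(values, probs))
variance = sum(p * (z - mean) ** 2 for z, p in zip(values, probs))
print(info_i, closed_i)
print(info_sq, variance)
```

The linear term $-(Z - E[Z])$ of the I-divergence has zero expectation, which is why it drops out of the closed form.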
![Page 19: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/19.jpg)
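Let $(X, Y) \sim p(X, Y)$ be jointly distributed random variables, with $X$ taking values in $\{x_u\}, [u]_1^m$ and $Y$ taking values in $\{y_v\}, [v]_1^n$.
$p(X, Y)$ can be written in the form of the matrix $Z$:
$$Z = [z_{uv}],\; [u]_1^m,\; [v]_1^n, \qquad z_{uv} = p(x_u, y_v)$$
The quality of the co-clustering can be defined as
$$E[d_\phi(Z, \hat{Z})] = \sum_{u=1}^{m} \sum_{v=1}^{n} w_{uv}\, d_\phi(z_{uv}, \hat{z}_{uv})$$
where $\hat{Z}$ is uniquely determined by the co-clustering $(\rho, \gamma)$ and $w_{uv}$ denotes the probability measure on the matrix entries.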
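The quality measure is a weighted sum of entry-wise divergences between the matrix and its approximation. A minimal sketch follows, assuming an explicit weight matrix `W` (a name introduced here for the measure over entries), squared Euclidean distance as the divergence, and made-up matrices `Z` and `Z_hat`.

```python
def expected_divergence(Z, Z_hat, W, divergence):
    """Sum over all entries of W[u][v] * d(Z[u][v], Z_hat[u][v])."""
    return sum(
        W[u][v] * divergence(Z[u][v], Z_hat[u][v])
        for u in range(len(Z))
        for v in range(len(Z[0]))
    )


def squared_euclidean(z1, z2):
    return (z1 - z2) ** 2


# Hypothetical 2x2 matrix, approximated by its overall mean.
Z = [[1.0, 2.0], [3.0, 4.0]]
Z_hat = [[2.5, 2.5], [2.5, 2.5]]
W = [[0.25, 0.25], [0.25, 0.25]]  # assumed uniform measure over entries

quality = expected_divergence(Z, Z_hat, W, squared_euclidean)
print(quality)
```

A lower value means the approximation is closer to the original matrix under the chosen divergence.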
![Page 20: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/20.jpg)
A co-clustering $(\rho, \gamma)$ involves four random variables $\{U, V, \hat{U}, \hat{V}\}$ corresponding to the various partitionings of the matrix $Z$.
We can obtain different matrix approximations based on the statistics of $Z$ corresponding to the non-trivial combinations of $\{U, V, \hat{U}, \hat{V}\}$:
$$\Gamma = \{\{U, \hat{V}\}, \{\hat{U}, V\}, \{\hat{U}, \hat{V}\}, \{U\}, \{V\}, \{\hat{U}\}, \{\hat{V}\}\}$$
![Page 21: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/21.jpg)
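$\mathcal{P}(\Gamma)$, the power set of $\Gamma$, denotes the class of matrix approximation schemes based on $(\rho, \gamma)$. Four such schemes are
$$\mathcal{C}_1 = \{\{\hat{U}\}, \{\hat{V}\}\}, \qquad \mathcal{C}_2 = \{\{\hat{U}, \hat{V}\}\}$$
$$\mathcal{C}_3 = \{\{\hat{U}, \hat{V}\}, \{U\}, \{V\}\}, \qquad \mathcal{C}_4 = \{\{U, \hat{V}\}, \{\hat{U}, V\}\}$$
The set of approximations $M_A(\rho, \gamma, \mathcal{C})$ consists of all $Z' \in S^{m \times n}$.
The “best” approximation $\hat{Z}$ is
$$\hat{Z} = \arg\min_{Z' \in M_A(\rho, \gamma, \mathcal{C})} E[d_\phi(Z, Z')]$$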
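For the simplest case that uses only co-cluster statistics, the constraint set $\{\{\hat{U}, \hat{V}\}\}$, the approximation is commonly taken to be the co-cluster mean of each block. The sketch below assumes that reading, a uniform measure over entries, and hypothetical cluster assignments `rows` and `cols`; it does not search for the co-clustering itself.

```python
def block_mean_approximation(Z, rows, cols):
    """Replace every entry of Z with the mean of its (row cluster, column cluster) block.

    rows[u] is the row cluster of row u, cols[v] the column cluster of column v.
    Assumes a uniform measure, so each block mean is a plain average.
    """
    k, l = max(rows) + 1, max(cols) + 1
    total = [[0.0] * l for _ in range(k)]
    count = [[0] * l for _ in range(k)]
    for u, row in enumerate(Z):
        for v, z in enumerate(row):
            total[rows[u]][cols[v]] += z
            count[rows[u]][cols[v]] += 1
    return [
        [total[rows[u]][cols[v]] / count[rows[u]][cols[v]] for v in range(len(row))]
        for u, row in enumerate(Z)
    ]


# Hypothetical 3x4 matrix with two row clusters and two column clusters.
Z = [
    [1.0, 1.0, 5.0, 5.0],
    [1.0, 3.0, 5.0, 7.0],
    [9.0, 9.0, 0.0, 0.0],
]
Z_hat = block_mean_approximation(Z, rows=[0, 0, 1], cols=[0, 0, 1, 1])
for row in Z_hat:
    print(row)
```

The richer constraint sets add row and column statistics on top of the block statistics, so their approximations need more than a block average.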
![Page 22: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/22.jpg)
![Page 23: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/23.jpg)
We present brief case studies to demonstrate two salient features:
Dimensionality reduction
Missing value prediction
![Page 24: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/24.jpg)
Clustering interleaved with implicit dimensionality reduction
Superior performance as compared to one-sided clustering
![Page 25: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/25.jpg)
Assign zero measure for missing elements, co-cluster and use reconstructed matrix for prediction
Implicit discovery of correlated sub-matrices
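The missing-value recipe (zero measure for missing entries, then prediction from the reconstructed matrix) can be sketched with weighted co-cluster means. This is an illustration under strong assumptions: the cluster assignments are fixed by hand rather than learned by the co-clustering algorithm, the reconstruction uses plain block means, and the ratings matrix is made up.

```python
def predict_missing(Z, rows, cols):
    """Fill None entries of Z with the weighted mean of their co-cluster.

    Missing entries get weight 0 (zero measure), observed entries weight 1.
    """
    k, l = max(rows) + 1, max(cols) + 1
    total = [[0.0] * l for _ in range(k)]
    weight = [[0.0] * l for _ in range(k)]
    for u, row in enumerate(Z):
        for v, z in enumerate(row):
            if z is not None:  # observed entry, weight 1
                total[rows[u]][cols[v]] += z
                weight[rows[u]][cols[v]] += 1.0
    filled = []
    for u, row in enumerate(Z):
        out = []
        for v, z in enumerate(row):
            g, h = rows[u], cols[v]
            if z is not None:
                out.append(z)
            elif weight[g][h] > 0:
                out.append(total[g][h] / weight[g][h])
            else:
                out.append(None)  # co-cluster has no observed entries
        filled.append(out)
    return filled


# Hypothetical ratings matrix; None marks a missing value.
Z = [
    [5, 4, None, 1],
    [4, 5, 1, 2],
    [1, None, 5, 4],
    [2, 1, 4, 5],
]
filled = predict_missing(Z, rows=[0, 0, 1, 1], cols=[0, 0, 1, 1])
print(filled[0][2], filled[2][1])
```

Both missing entries fall in low-rating blocks, so the predictions come out low even though the rows they belong to also contain high ratings. That is the sense in which correlated sub-matrices drive the prediction.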
![Page 26: A Generalized Maximum Entropy Approach To Bregman Co Clustering](https://reader033.fdocuments.in/reader033/viewer/2022061204/547f2a7eb4af9f24688b456c/html5/thumbnails/26.jpg)
Bregman divergences, such as I-divergence and squared Euclidean distance, serve as the co-clustering loss function.
Approximation models of various complexities are possible depending on the statistics.
The minimum Bregman information principle as a generalization of the maximum entropy principle.