Bhattacharyya clustering with applications to mixture ...

26
Mean Exponential Family Application Bhattacharyya clustering with applications to mixture simplifications ICPR 2010, Istanbul, Turkey Frank Nielsen 1,2 Sylvain Boltz 1 Olivier Schwander 1,3 1 ´ Ecole Polytechnique, France 2 Sony Computer Science Laboratories, Japan 3 ´ ENS Cachan, France August, 24 2010 Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Transcript of Bhattacharyya clustering with applications to mixture ...

Page 1: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

Bhattacharyya clustering with applications tomixture simplificationsICPR 2010, Istanbul, Turkey

Frank Nielsen1,2 Sylvain Boltz1 Olivier Schwander1,3

1Ecole Polytechnique, France

2Sony Computer Science Laboratories, Japan

3ENS Cachan, France

August, 24 2010

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 2: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

MeanDefinitionBurbea-Rao divergencesBurbea-Rao centroid

Exponential FamilyDefinitionBhattacharyya distanceClosed-form formula

ApplicationStatistical mixturesMixture simplification

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 3: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

Introduction

Bhattacharyya distance

I Widely used to compare probability density functions

I Good statistical properties, related to Fisher information

I Measures the overlap between two distributions

Bhattacharyya coefficient

Bc(p, q) =

∫ √p(x)q(x)dx ≤ 1

Bhattacharyya distance

B(p, q) = − log Bc(p, q) ≥ 0

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 4: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

Contributions

Drawbacks

I Few closed-form formulaare known

I Centroid estimation onlyfor univariate Gaussian,without guarantees

Results

I Bhattacharyya betweenexponential families, usingBurbea-Rao divergencecs

I Efficient scheme forcentroid

I Application tosimplification of Gaussianmixtures

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 5: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBurbea-Rao divergencesBurbea-Rao centroid

What is a mean ?

Euclidean geometry

I Given a set of n points {pi},I the center of mass (a.k.a. center of gravity) is

c =1

n

∑i

pi

Unique minimizer of average squared Euclidean distance

c = arg minp

∑i

‖p − pi‖2

Definitions

I By axiomatization

I By optimization

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 6: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBurbea-Rao divergencesBurbea-Rao centroid

Axiomatization

Axioms for a mean function M(x1, x2)

I Reflexivity : M(x , x) = x

I Symmetry : M(x1, x2) = M(x2, x1)

I Continuity : M(·, ·) continuous

I Strict monotonicity : M(x1, x2) < M(x ′1, x2) for x1 < x ′1I Anonymity :

M(M(x11, x12),M(x21, x22)) = M(M(x11, x21),M(x12, x22))

Yields to a unique family

M(x1, x2) = f −1

(f (x1) + f (x2)

2

)with f continuous, strictly monotonous and increasing function

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 7: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBurbea-Rao divergencesBurbea-Rao centroid

Examples and f -representation

Some f -means

I Arithmetic mean : x1+x22 with f (x) = x

I Geometric mean :√

x1x2 with f (x) = log x

I Harmonic mean : 21x1

+ 1x2

with f (x) = 1x

Arithmetic mean on the f -representation

I y = f (x)

I f (x) = 1n

∑i f (xi )

I y = 1n

∑i yi

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 8: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBurbea-Rao divergencesBurbea-Rao centroid

Optimization

Problem

minx

∑i

ωid(x , pi ) = minx

L(x ; ({xi}, {ωi}), d

Entropic mean (Ben-Tal et al., 1989)

I d(p, q) = If (p, q) = pf (qp ) (Csiszar f -divergence)

I f is a strictly convex differentiable function with f (1) = 0 andf ′(1) = 0

Some entropic means

I Arithmetic mean : f (x) = − log x + x − 1

I Geometric mean : f (x) = x log x − x + 1

I Harmonic mean : f (x) = (x − 1)2

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 9: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBurbea-Rao divergencesBurbea-Rao centroid

Bregman means

Bregman divergence

I BF (p, q) = F (p)− F (q) + 〈p − q|∇F (q)〉I F is a strictly convex and differentiable function

Convex problem

I unique minimizer

I c = ∇F−1 (∑

i ωi∇F (xi ))

Since BF is not symmetrical, there is another centroid

I Left-sided one : minx∑

i ωiBF (x , pi )

I Right-sided one : minx∑

i ωiBF (pi , x)

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 10: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBurbea-Rao divergencesBurbea-Rao centroid

Burbea-Rao divergence

Based on Jensen inequality for a convex function F

BRF (p, q) =F (p) + F (q)

2− F (

p + q

2) ≥ 0

Special case : Jensen-Shannon divergence

I JS(p, q) = KL(p, p+q2 ) + KL(q, p+q

2 )

I JS(p, q) = H(p+q2 )− H(p)+H(q)

2 − ≥ 0

I H(x) = −F (x) = −x log x (Shannon entropy)

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 11: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBurbea-Rao divergencesBurbea-Rao centroid

Symmetrizing Bregman divergences

Jeffreys-Bregman divergence

SF (p, q) =1

2(BF (p, q) + BF (q, p))

=1

2〈p − q|∇F (p)−∇F (q)〉

Jensen-Bregman divergence

JF (p, q) =1

2

(BF (p,

p + q

2) + BF (q,

p + q

2)

)=

F (p) + F (q)

2− F

(p + q

2

)= BRF (p, q)

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 12: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBurbea-Rao divergencesBurbea-Rao centroid

Burbea-Rao centroid

Optimization problem

I c = arg minx∑

i ωiBRF (x , pi ) = arg min L(x)

I L(x) ≡ 1

2F (x)︸ ︷︷ ︸

convex

−∑i

ωiF (c + pi

2)︸ ︷︷ ︸

concave

ConCave Convex Procedure (CCCP, NIPS2001)

I iterative scheme

I ∇Lconvex(x (k+1)) = ∇Lconcave(x (k))

I converges to a local minimum

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 13: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBurbea-Rao divergencesBurbea-Rao centroid

ConCave Convex ProcedurePossible decomposition for function with bounded Hessian

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 14: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBurbea-Rao divergencesBurbea-Rao centroid

Iterative algorithm for Burbea-Rao centroids

Initializationx (0) : center of mass (Bregman right-sided centroid), orsymmetrized KL divergence

Iteration

∇F (x (k+1)) =∑i

ωi∇F

(x (t) + pi

2

)

Centroid

x (t+1) = ∇F−1

(∑i

ωi∇F

(x (t) + pi

2

))

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 15: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBhattacharyya distanceClosed-form formula

Exponential family

Definition

p(x ;λ) = pF (x ; θ) = exp (〈t(x)|θ〉 − F (θ) + k(x))

I λ source parameter

I θ natural parameter

I F (θ) log-normalizer

I k(x) carrier measure

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 16: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBhattacharyya distanceClosed-form formula

Example

Poisson distribution

p(x ;λ) =λx

x!exp(−λ)

I t(x) = x

I θ = log λ

I F (θ) = exp(θ)

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 17: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBhattacharyya distanceClosed-form formula

Multivariate normal distribution

Gaussian

p(x ;µ,Σ) =1

2π√

det Σexp

(−(x − µ)tΣ−1(x − µ)

2

)

Exponential family

I θ = (θ1, θ2) =(Σ−1µ, 1

2 Σ−1)

I F (θ) = 14tr(θ−1

1 θ2θT2

)− 1

2 log det θ1 + d2 log π

I t(x) = (x ,−x tx)

I k(x) = 0

Composite vector-matrix inner product

〈θ, θ′〉 = θt1θ′1 + tr(θt2θ

′2)

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 18: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBhattacharyya distanceClosed-form formula

Bhattacharyya distance

Bhattacharyya coefficient

I Amount of overlap between distributions

I Bc(p, q) =∫ √

p(x)q(x)dx

Bhattacharyya distance

I B(p, q) = − log Bc(p, q)

Metrization

I Hellinger-Matusita metric

I H(p, q) =√

1− B(p, q)

I Gives the same Voronoi diagram

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 19: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBhattacharyya distanceClosed-form formula

Closed-form formula

Bc(p, q) =

∫ √p(x)q(x)dx

=

∫exp

(〈t(x),

θp + θq2〉 − F (θp + θq)

2+ k(x)

)dx

= exp

(F

(θp + θq

2

)− F (θp) + F (θq)

2

)> 0

B(p, q) = − log Bc(p, q) = BRF (θp, θq) ≥ 0

Equivalence

I Bhattacharyya between two member of the same EF

I Burbea-Rao between natural parameters using log-normalizer

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 20: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

DefinitionBhattacharyya distanceClosed-form formula

Examples

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 21: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

Statistical mixturesMixture simplification

Gaussian Mixture Models

Mixture

I Pr(X = x) =∑

i ωiPr(X = x |µi ,Σi )

I each Pr(X = x |µi ,Σi ) is a multivariate normal distribution

Soft Clustering

Expectation-Maximization algorithm, equivalent to soft Bregmanclustering

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 22: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

Statistical mixturesMixture simplification

Statistical imageshttp ://www.informationgeometry.org/MEF/

RGBxy representation : 5D point set

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 23: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

Statistical mixturesMixture simplification

Mixture simplification

Initialization

I Mixture of Gaussians, with Bregman soft clustering (≡ EM)

Simplification

I k-means using Bhattacharyya distance and centroids

Different k

I Hierarchical clustering

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 24: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

Statistical mixturesMixture simplification

Hierarchical clustering

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 25: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

Statistical mixturesMixture simplification

Conclusion

Results

I Symmetrizing Bregman yields Burbea-Rao divergences

I Bhattacharyya between exponential families yields Burbea-Rao

I Closed-form formula for Bhattacharyya between EF

I Efficient scheme for BR centroid using CCCP

Applications

I Simplification of Gaussian Mixture Models

I Hierarchical Clustering

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification

Page 26: Bhattacharyya clustering with applications to mixture ...

MeanExponential Family

Application

Statistical mixturesMixture simplification

References

I Statistical exponential families : A digest with flash cards,F.Nielsen and V. Garcia, arXiv 2009

I An optimal Bhattacharyya centroid algorithm for Gaussianclustering with applications in automatic speech recognition,ICASSP 2000.

I The concave-convex procedure, A. Yuille and A. Rangarajan,Neural Computation, vol. 15, no. 4, pp. 915-936, 2003.

I The Burbea-Rao and Bhattacharyya centroids, F. Nielsen, andS. Boltz, arXiv 2010

www.informationgeometry.org

Nielsen, Boltz, Schwander Bhattacharyya clustering and mixture simplification