Machine Learning using Matlab Lecture 10 Clustering part 2


  • Machine Learning using Matlab

    Lecture 10 Clustering part 2

  • Outline

    ● Gaussian Mixture Model (GMM)
    ● Expectation-Maximization (EM)
    ● Mean shift
    ● Mean shift clustering

  • Multivariate Gaussian distribution

    ● The probability density function of the multivariate Gaussian/normal distribution is given by:

    $$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$

    where $\mu$ is the mean vector, $\Sigma$ is the covariance matrix, and $|\Sigma|$ denotes the determinant of the matrix.

    ● Partial derivatives with respect to the mean and covariance (used later for maximum likelihood estimation)

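    For concreteness, a minimal MATLAB sketch of the density formula above; the helper name gausspdf is illustrative, and the Statistics and Machine Learning Toolbox function mvnpdf computes the same quantity.

    % Evaluate the 2-D Gaussian density at a point and cross-check
    % against the toolbox implementation.
    mu = [0; 0]; Sigma = [1 0.5; 0.5 2];
    p = gausspdf([1; -1], mu, Sigma)
    p_check = mvnpdf([1 -1], mu', Sigma)   % requires Statistics Toolbox

    % N(x | mu, Sigma) computed directly from the formula above.
    function p = gausspdf(x, mu, Sigma)
        d = numel(mu);
        diff = x(:) - mu(:);
        p = exp(-0.5 * (diff' / Sigma) * diff) ...
            / sqrt((2*pi)^d * det(Sigma));
    end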

  • GMM - model representation

    ● Let $z_i \in \{1, \dots, k\}$ be the latent variable; $p(z_i = j)$ denotes the probability that $x_i$ is generated from the $j$th Gaussian model ($k$ models in total)
    ● The Gaussian mixture density is given by:

    $$p(x_i) = \sum_{j=1}^{k} p(x_i \mid z_i = j)\, p(z_i = j)$$

    ● Assume $p(z_i = j) = \phi_j$ (the mixture coefficient), where $\sum_{j=1}^{k} \phi_j = 1$. Correspondingly, $p(x_i \mid z_i = j) = \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$.
    ● The Gaussian mixture density is rewritten as:

    $$p(x_i) = \sum_{j=1}^{k} \phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$$
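    As a quick illustration of the rewritten density, a MATLAB fragment (the variable names phi, mu, Sigma are illustrative; mvnpdf is from the Statistics and Machine Learning Toolbox):

    % p(x) = sum_j phi_j * N(x | mu_j, Sigma_j) for one point x (1-by-d).
    % phi: 1-by-k mixture coefficients, mu: k-by-d means,
    % Sigma: d-by-d-by-k covariance matrices.
    p = 0;
    for j = 1:numel(phi)
        p = p + phi(j) * mvnpdf(x, mu(j, :), Sigma(:, :, j));
    end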

  • GMM

    ● The parameters of the GMM are thus $\theta = \{\phi_j, \mu_j, \Sigma_j\}_{j=1}^{k}$
    ● Maximum Likelihood Estimation (MLE) maximizes the following log likelihood:

    $$\ell(\theta) = \sum_{i=1}^{n} \log p(x_i) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} \phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$$

  • GMM

    ● If the parameters maximize the log likelihood, then setting the partial derivatives of $\ell(\theta)$ to zero gives stationarity conditions; for the means, we have:

    $$\sum_{i=1}^{n} \frac{\phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}\; \Sigma_j^{-1}(x_i - \mu_j) = 0$$

  • GMM

    ● Bayes rule: $p(A \mid B)\, p(B) = p(B \mid A)\, p(A)$
    ● Posterior probability (the soft assignment of $x_i$ to component $j$):

    $$w_{ij} = p(z_i = j \mid x_i) = \frac{\phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}$$

    ● The mean and variance are computed as:

    $$\mu_j = \frac{\sum_{i=1}^{n} w_{ij}\, x_i}{\sum_{i=1}^{n} w_{ij}}, \qquad \Sigma_j = \frac{\sum_{i=1}^{n} w_{ij}\, (x_i - \mu_j)(x_i - \mu_j)^{T}}{\sum_{i=1}^{n} w_{ij}}$$

  • GMM

    ● We still have an additional constraint: $\sum_{j=1}^{k} \phi_j = 1$
    ● Introduce a Lagrange multiplier $\beta$:

    $$\mathcal{L}(\phi) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} \phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j) + \beta \left( \sum_{j=1}^{k} \phi_j - 1 \right)$$

    ● Taking the derivative (worked out below), we have:

    $$\phi_j = \frac{1}{n} \sum_{i=1}^{n} w_{ij}$$
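    The derivative step, for completeness:

    $$\frac{\partial \mathcal{L}}{\partial \phi_j} = \sum_{i=1}^{n} \frac{\mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)} + \beta = \frac{1}{\phi_j} \sum_{i=1}^{n} w_{ij} + \beta = 0$$

    so $\phi_j = -\frac{1}{\beta} \sum_{i} w_{ij}$. Summing over $j$ and using $\sum_j \phi_j = 1$ and $\sum_j w_{ij} = 1$ gives $\beta = -n$, hence $\phi_j = \frac{1}{n} \sum_{i=1}^{n} w_{ij}$.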

  • GMM algorithm

    ● Randomly initialize the parameters $\theta = \{\phi_j, \mu_j, \Sigma_j\}$
    ● Repeat until convergence:

    ○ E step. Compute the posterior probability $w_{ij}$:

    $$w_{ij} = \frac{\phi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{l=1}^{k} \phi_l\, \mathcal{N}(x_i \mid \mu_l, \Sigma_l)}$$

    ○ M step. Update the parameters using the current $w_{ij}$:

    $$\phi_j = \frac{1}{n}\sum_{i=1}^{n} w_{ij}, \quad \mu_j = \frac{\sum_{i} w_{ij}\, x_i}{\sum_{i} w_{ij}}, \quad \Sigma_j = \frac{\sum_{i} w_{ij}\, (x_i - \mu_j)(x_i - \mu_j)^{T}}{\sum_{i} w_{ij}}$$
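    A minimal MATLAB sketch of this loop, assuming X is an n-by-d data matrix and k the number of components (fixed iteration count for brevity; a production version would monitor the log likelihood, and mvnpdf requires the Statistics and Machine Learning Toolbox):

    [n, d] = size(X);
    idx = randperm(n, k);
    mu = X(idx, :);                          % random initial means (k-by-d)
    Sigma = repmat(eye(d), 1, 1, k);         % identity initial covariances
    phi = ones(1, k) / k;                    % uniform mixture coefficients
    for iter = 1:100
        % E step: posterior w(i,j) = p(z_i = j | x_i)
        w = zeros(n, k);
        for j = 1:k
            w(:, j) = phi(j) * mvnpdf(X, mu(j, :), Sigma(:, :, j));
        end
        w = w ./ sum(w, 2);                  % normalize rows
        % M step: re-estimate parameters from the soft memberships
        nj = sum(w, 1);                      % effective cluster sizes
        phi = nj / n;
        for j = 1:k
            mu(j, :) = (w(:, j)' * X) / nj(j);
            Xc = X - mu(j, :);
            Sigma(:, :, j) = (Xc' * (Xc .* w(:, j))) / nj(j);
        end
    end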

  • The General EM algorithm

    Given a joint distribution $p(X, Z \mid \theta)$ over observed variables $X$ and latent variables $Z$, governed by parameters $\theta$, the goal is to maximize the likelihood function $p(X \mid \theta)$ with respect to $\theta$.

    ● Choose an initial setting for the parameters $\theta^{\text{old}}$
    ● Repeat until convergence:

    ○ E step: evaluate $p(Z \mid X, \theta^{\text{old}})$ using the current parameters
    ○ M step: evaluate $\theta^{\text{new}} = \arg\max_{\theta} \sum_{Z} p(Z \mid X, \theta^{\text{old}}) \log p(X, Z \mid \theta)$

  • GMM vs. k-means

    GMM:
    ● Randomly initialize the parameters
    ● Repeat until convergence:

    ○ E step. Compute the soft membership, i.e., the posterior probability $w_{ij}$
    ○ M step. Update the parameters using the current soft memberships

    k-means:
    ● Randomly initialize k cluster centroids
    ● Repeat until convergence:

    ○ Assign each data point to its closest centroid (hard membership)
    ○ Update each cluster centroid using its currently assigned points
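    Both sides of the comparison map onto toolbox calls; a quick sketch on shared data X with k clusters (kmeans, fitgmdist, and posterior are from the Statistics and Machine Learning Toolbox):

    hardIdx = kmeans(X, k);          % k-means: one hard label per point
    gm = fitgmdist(X, k);            % GMM fitted by EM
    softW = posterior(gm, X);        % n-by-k soft memberships w_ij
    [~, gmmIdx] = max(softW, [], 2); % harden for a direct comparison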

  • Mean shift

    ● Mean shift is a procedure for locating the maxima of a density function given discrete data sampled from that function (a mode-seeking algorithm)
    ● Algorithm:

    ○ Input: a randomly initialized centroid and a fixed window size
    ○ Repeat until convergence:

    ■ Compute the mean of the points inside the window
    ■ Shift the centroid to the new mean

    ● It is guaranteed to move in the direction of maximum increase in the density
    ● Applications: clustering, tracking, ...

  • Mean shift - model representation

    ● Suppose the current mean is $x$, and let $(x_1, x_2, \dots, x_m)$ be the data points inside the window of size $h$; the mean shift vector is given by:

    $$m_h(x) = \frac{\sum_{i=1}^{m} K\!\left(\frac{x_i - x}{h}\right) x_i}{\sum_{i=1}^{m} K\!\left(\frac{x_i - x}{h}\right)} - x$$

    where $K$ is the kernel function.

    ● Mean shift procedure:

    ○ Compute the mean shift vector
    ○ Shift to the new mean

  • Common kernels

    ● Flat kernel:

    $$K(x) = \begin{cases} 1 & \text{if } \|x\| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

    ● Gaussian kernel:

    $$K(x) = e^{-\|x\|^2 / 2}$$

    ● ...
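    A minimal MATLAB sketch of one mean shift run with the flat kernel (X is n-by-d data; the bandwidth h and the stopping tolerance are illustrative choices):

    h = 1.0;                                 % window size (bandwidth)
    m = X(randi(size(X, 1)), :);             % start from a random data point
    for iter = 1:200
        inWin = sum((X - m).^2, 2) <= h^2;   % flat kernel: points within radius h
        newM = mean(X(inWin, :), 1);         % window mean
        if norm(newM - m) < 1e-6, break; end % converged to a mode
        m = newM;
    end

    With the Gaussian kernel the update becomes a weighted mean over all points, e.g. wK = exp(-sum((X - m).^2, 2) / (2*h^2)); newM = (wK' * X) / sum(wK).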

  • Properties of Mean shift

    ● Adaptive gradient ascent

    ○ Automatic convergence speed: the size of the mean shift vector depends on the gradient itself
    ○ Near maxima, the steps are small and refined

    ● Convergence is guaranteed for infinitesimal steps only

  • Mean shift clustering

    ● Attraction basin: the region for which all trajectories lead to the same mode
    ● Clustering: all data points in the attraction basin of a mode belong to the same cluster

  • Mean shift clustering - Algorithm

    ● Starting from the data points, run the mean shift procedure to find the stationary points of the density function
    ● Prune these points by retaining only the local maxima (a code sketch follows)
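    A sketch of the full clustering, assuming the same X and h as above: run the procedure from every point, then merge trajectories that converge to the same mode (the h/2 merge tolerance is an illustrative choice):

    modes = zeros(size(X));
    for i = 1:size(X, 1)
        m = X(i, :);                         % mean shift from point i
        for iter = 1:200
            inWin = sum((X - m).^2, 2) <= h^2;
            newM = mean(X(inWin, :), 1);
            if norm(newM - m) < 1e-6, break; end
            m = newM;
        end
        modes(i, :) = m;
    end
    % Merge modes closer than h/2: points whose trajectories end at the
    % same mode share a cluster label.
    labels = zeros(size(X, 1), 1);
    centers = zeros(0, size(X, 2));
    for i = 1:size(X, 1)
        c = find(sum((centers - modes(i, :)).^2, 2) < (h / 2)^2, 1);
        if isempty(c)
            centers(end + 1, :) = modes(i, :);   % new mode found
            c = size(centers, 1);
        end
        labels(i) = c;
    end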

  • Image segmentation by mean shift

    ● Find features (color, gradients, texture, etc.)
    ● Initialize windows at individual pixel locations
    ● Perform mean shift for each window until convergence
    ● Merge windows that end up at the same mode
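    A sketch of this pipeline in MATLAB, using color features only for brevity (peppers.png ships with MATLAB; rgb2lab requires the Image Processing Toolbox; the clustering step reuses the sketch above):

    rgb = im2double(imread('peppers.png'));
    lab = rgb2lab(rgb);                   % perceptually uniform color features
    X = reshape(lab, [], 3);              % one n-by-3 data point per pixel
    % ... run the mean shift clustering sketch above on X ...
    seg = reshape(labels, size(rgb, 1), size(rgb, 2));
    imagesc(seg); axis image off          % visualize the segments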

  • Segmentation results

    Results from: Mean Shift: A Robust Approach toward Feature Space Analysis, PAMI 2002.

  • Problem: computational complexity

    Slides from Fei-Fei Li

  • Speedups

    Slides from Fei-Fei Li


  • Summary of mean shift clustering

    ● Pros:

    ○ General, application-independent tool that is simple to implement
    ○ Model-free: doesn't assume any prior shape of the data clusters
    ○ Parameter-free except for the window size h
    ○ Doesn't need a preselected number of clusters
    ○ Robust to outliers

    ● Cons:

    ○ Output depends on the window radius, which is not trivial to choose

    ■ An inappropriate window size can cause modes to be merged, or generate additional "shallow" modes → use an adaptive window size

    ○ Computationally intensive