Scalable Training of Mixture Models via Coresets


Transcript of Scalable Training of Mixture Models via Coresets

Page 1: Scalable Training of Mixture Models via Coresets

Scalable Training of Mixture Models via Coresets

Daniel Feldman

Matthew Faulkner

Andreas Krause

MIT

Page 2: Scalable Training of Mixture Models via Coresets

Fitting Mixtures to Massive Data

Importance Sample

EM: generally expensive. Weighted EM: fast!
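The contrast on this slide rests on running EM over a small weighted set instead of all n points. A minimal sketch of weighted EM for a 1-D Gaussian mixture, assuming each point x[i] stands in for w[i] original points (illustrative code, not the authors' implementation); the only change from ordinary EM is that every sufficient statistic carries the point's weight:

```python
import numpy as np

def weighted_em_1d(x, w, k, iters=100):
    # Weighted EM sketch: coreset point x[i] counts as w[i] original points.
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread-out init
    var = np.full(k, x.var() + 1e-6)
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: r[i, j] proportional to pi_j * N(x_i | mu_j, var_j)
        logp = (np.log(pi)
                - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: each statistic is weighted by w[i] -- the only change
        # relative to ordinary EM on the full data.
        wr = w[:, None] * r
        Nj = wr.sum(axis=0)
        mu = (wr * x[:, None]).sum(axis=0) / Nj
        var = (wr * (x[:, None] - mu) ** 2).sum(axis=0) / Nj + 1e-9
        pi = Nj / Nj.sum()
    return pi, mu, var
```

Running this on a coreset of a few hundred weighted points costs a small fraction of a full-data EM pass per iteration.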

Page 3: Scalable Training of Mixture Models via Coresets

Coresets for Mixture Models

Page 4: Scalable Training of Mixture Models via Coresets

Naïve Uniform Sampling

Page 5: Scalable Training of Mixture Models via Coresets

Naïve Uniform Sampling

Sample a set U of m points uniformly.

The small cluster is missed: high variance.
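The failure mode is easy to quantify: under uniform sampling, a cluster holding a fraction f of the points is entirely absent from a sample of size m with probability (1 - f)^m. A quick check with illustrative numbers:

```python
# Chance that a uniform sample of m points contains no point from a
# cluster holding a fraction f of the data: (1 - f)^m.
f = 0.001   # a small cluster: 0.1% of the data (illustrative numbers)
m = 100     # uniform sample size
p_miss = (1 - f) ** m
print(round(p_miss, 3))   # -> 0.905: the small cluster is almost always missed
```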

Page 6: Scalable Training of Mixture Models via Coresets

Sampling Distribution

Bias sampling towards small clusters.

Page 7: Scalable Training of Mixture Models via Coresets

Importance Weights

Weights: inversely proportional to the sampling distribution.

Page 8: Scalable Training of Mixture Models via Coresets

Creating a Sampling Distribution

Iteratively find representative points.

Page 9: Scalable Training of Mixture Models via Coresets

Creating a Sampling Distribution

Iteratively find representative points:

• Sample a small set uniformly at random

Page 10: Scalable Training of Mixture Models via Coresets

Creating a Sampling Distribution

Iteratively find representative points:

• Sample a small set uniformly at random
• Remove the half of the blue points nearest the samples

Page 16: Scalable Training of Mixture Models via Coresets

Creating a Sampling Distribution

Iteratively find representative points:

• Sample a small set uniformly at random
• Remove the half of the blue points nearest the samples

Small clusters are represented.
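The sample-and-halve loop built up over the last several slides can be sketched as follows (an illustrative reconstruction of the adaptive procedure, not the authors' exact algorithm; `representative_points` and its parameters are names chosen here):

```python
import numpy as np

def representative_points(X, sample_size, rng):
    # Repeatedly take a small uniform sample, then discard the half of the
    # remaining points closest to it; points in small, far-away clusters
    # survive the halving and are eventually sampled or kept.
    X = X.copy()
    B = []
    while len(X) > sample_size:
        idx = rng.choice(len(X), size=sample_size, replace=False)
        S = X[idx]
        B.append(S)
        # distance of every remaining point to its nearest sampled point
        d = np.min(np.linalg.norm(X[:, None, :] - S[None, :, :], axis=2), axis=1)
        X = X[d > np.median(d)]     # drop the closer half
    B.append(X)                     # whatever is left is representative too
    return np.vstack(B)
```

Because each round removes about half the remaining points, the loop runs only O(log n) times, and isolated small clusters are never thrown away before they are represented.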

Page 17: Scalable Training of Mixture Models via Coresets

Creating a Sampling Distribution

Partition the data via a Voronoi diagram centered at the representative points.

Page 18: Scalable Training of Mixture Models via Coresets

Creating a Sampling Distribution

Sampling distribution: points in sparse cells, and points far from the centers, get more mass.

Page 19: Scalable Training of Mixture Models via Coresets

Importance Weights

Sampling distribution: points in sparse cells, and points far from the centers, get more mass.

Weights: inversely proportional to the sampling probability.
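Why weights equal to the inverse sampling probability: they make the weighted sum an unbiased estimate of the full sum, and when the sampling probability is exactly proportional to each point's contribution, the estimate has zero variance. A small illustration with synthetic numbers (the exponential "cost" values are a stand-in, not the paper's sensitivity scores):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
cost = rng.exponential(size=n)    # each point's contribution to the total cost
q = cost / cost.sum()             # sample proportionally to contribution
m = 500
idx = rng.choice(n, size=m, p=q)
w = 1.0 / (m * q[idx])            # importance weight: inverse sampling probability
est = (w * cost[idx]).sum()       # estimates cost.sum(); here exactly, variance 0
```

Each sampled term contributes cost.sum()/m, so the m terms recover the total exactly; with a merely approximate q, the estimate stays unbiased and the variance shrinks as q gets closer to proportional.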

Page 20: Scalable Training of Mixture Models via Coresets

Importance Sample

Page 21: Scalable Training of Mixture Models via Coresets

Coresets via Adaptive Sampling

Page 22: Scalable Training of Mixture Models via Coresets

A General Coreset Framework

Contributions for Mixture Models:

Page 23: Scalable Training of Mixture Models via Coresets

A Geometric Perspective

Gaussian level sets can be expressed purely geometrically:

affine subspace
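The geometric object behind this slide: a Gaussian's level set {x : N(x; mu, Sigma) = c} is an ellipsoid, fully determined by the mean and the eigendecomposition of the covariance. A small numpy check with illustrative values:

```python
import numpy as np

# The level set is the ellipsoid (x - mu)^T Sigma^{-1} (x - mu) = r^2:
# center mu, axes = eigenvectors of Sigma, radii = r * sqrt(eigenvalues).
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
vals, vecs = np.linalg.eigh(Sigma)
r = 1.5
theta = np.linspace(0, 2 * np.pi, 200)
circle = np.stack([np.cos(theta), np.sin(theta)])            # unit circle
ellipse = mu[:, None] + vecs @ (np.sqrt(vals)[:, None] * circle) * r
# every column of `ellipse` has Mahalanobis distance exactly r from mu
```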

Page 24: Scalable Training of Mixture Models via Coresets

Geometric Reduction

Lifts geometric coreset tools to mixture models

Soft-min

Page 25: Scalable Training of Mixture Models via Coresets

Semi-Spherical Gaussian Mixtures

Page 26: Scalable Training of Mixture Models via Coresets

Extensions and Generalizations

Level Sets

Page 27: Scalable Training of Mixture Models via Coresets

Composition of Coresets

Merge [cf. Har-Peled & Mazumdar '04]

Page 28: Scalable Training of Mixture Models via Coresets

Composition of Coresets

Compress

Merge [Har-Peled & Mazumdar '04]

Page 29: Scalable Training of Mixture Models via Coresets

Coresets on Streams

Compress

Merge [Har-Peled & Mazumdar '04]


Page 31: Scalable Training of Mixture Models via Coresets

Coresets on Streams

Compress

Merge [Har-Peled & Mazumdar '04]

Error grows linearly with the number of compressions.

Page 32: Scalable Training of Mixture Models via Coresets

Coresets on Streams

Error grows with the height of the merge tree.
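The fix this slide points at, keeping the merge tree balanced so the error grows only logarithmically, can be sketched with a binary-counter bucket scheme in the spirit of Har-Peled & Mazumdar '04 (an illustrative reconstruction; `compress` stands for any coreset construction, here applied to plain Python lists):

```python
def stream_coresets(chunks, compress):
    # buckets[i] holds a coreset covering 2^i chunks; merging only
    # equal-level coresets keeps the merge tree balanced, so any item
    # passes through O(log n) compressions instead of O(n).
    buckets = []
    for chunk in chunks:
        c = compress(list(chunk))
        i = 0
        while i < len(buckets) and buckets[i] is not None:
            c = compress(buckets[i] + c)   # merge equal-size coresets, recompress
            buckets[i] = None
            i += 1
        if i == len(buckets):
            buckets.append(c)
        else:
            buckets[i] = c
    out = []
    for b in buckets:                      # merge leftover buckets at the end
        if b is not None:
            out = compress(out + b) if out else b
    return out
```

With the identity as `compress`, the scheme simply reassembles the stream; a real instantiation would plug in the adaptive-sampling coreset from the earlier slides.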

Page 33: Scalable Training of Mixture Models via Coresets

Coresets in Parallel

Page 34: Scalable Training of Mixture Models via Coresets

Handwritten Digits

Obtain 100-dimensional features from 28x28-pixel images via PCA. Fit a GMM with k=10 components.

MNIST data: 60,000 training images, 10,000 test images.
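The pipeline on this slide can be sketched with scikit-learn (a plausible reconstruction, not the authors' code; `fit_digit_gmm` and `images` are names chosen here, with `images` any array of 28x28 images):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def fit_digit_gmm(images, k=10, dims=100):
    # Flatten 28x28 images, project to `dims` PCA features,
    # and fit a k-component Gaussian mixture.
    X = images.reshape(len(images), -1).astype(float)
    Z = PCA(n_components=dims).fit_transform(X)
    gmm = GaussianMixture(n_components=k, random_state=0).fit(Z)
    return gmm, gmm.score(Z)   # model and mean log-likelihood per point
```

In the coreset setting, the fit would run on a few hundred weighted coreset points instead of all 60,000 training images.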

Page 35: Scalable Training of Mixture Models via Coresets

Neural Tetrode Recordings

Waveforms of neural activity at four co-located electrodes in a live rat hippocampus. 4 x 38 samples = 152 dimensions.

T. Siapas et al., Caltech

Page 36: Scalable Training of Mixture Models via Coresets

Community Seismic Network

Detect and monitor earthquakes using smartphones, USB sensors, and cloud computing.

CSN Sensors Worldwide

Page 37: Scalable Training of Mixture Models via Coresets

Learning User Acceleration

17-dimensional acceleration feature vectors

(Figure: model fits rated from "Bad" to "Good")

Page 38: Scalable Training of Mixture Models via Coresets

Seismic Anomaly Detection

GMM used for anomaly detection

(Figure: detections rated from "Bad" to "Good")
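The detection rule behind this slide can be sketched as thresholding the log-density of new feature vectors under the fitted model. In this sketch a single Gaussian stands in for the full GMM, and the data and the 1% threshold are illustrative choices, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(1000, 2))   # illustrative "good" feature vectors
spikes = rng.normal(8.0, 1.0, size=(5, 2))      # illustrative anomalous vectors

# Fit a single Gaussian to the normal data (stand-in for the slide's GMM).
mu, cov = normal.mean(axis=0), np.cov(normal.T)
inv, logdet = np.linalg.inv(cov), np.linalg.slogdet(cov)[1]

def logpdf(X):
    d = X - mu
    maha = np.einsum('ij,jk,ik->i', d, inv, d)   # squared Mahalanobis distance
    return -0.5 * (maha + logdet + 2 * np.log(2 * np.pi))

# Flag anything whose log-density falls below the 1% quantile of training scores.
thresh = np.quantile(logpdf(normal), 0.01)
flags = logpdf(spikes) < thresh
```

The quantile threshold fixes the false-positive budget on normal data; anomalous vectors score far below it.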

Page 39: Scalable Training of Mixture Models via Coresets

Conclusions

• Lift geometric coreset tools to the statistical realm: new complexity result for GMM level sets
• Parallel (MapReduce) and streaming implementations
• Strong empirical performance; enables learning on mobile devices
• GMMs admit coresets of size independent of n; extensions to other mixture models