
Principal Component Analysis

Machine Learning

Last Time

• Expectation Maximization in Graphical Models – Baum-Welch

Now

• Unsupervised Dimensionality Reduction

Curse of Dimensionality

• In (nearly) all modeling approaches, more features (dimensions) require (a lot) more data – Typically exponential in the number of features

• This is clearly seen when filling in a probability table.

• Topological arguments are also made – compare the volume of a hypersphere inscribed in a hypercube to the volume of the hypercube itself (see the sketch below).
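A rough numerical illustration of that argument (a minimal Python sketch, not part of the original slides): the fraction of the unit hypercube occupied by its inscribed hypersphere collapses toward zero as the number of dimensions grows, so uniformly distributed points increasingly sit in the "corners".

import math

# Volume of a d-dimensional ball of radius r: pi^(d/2) * r^d / Gamma(d/2 + 1).
# The unit hypercube has volume 1, so for r = 1/2 this is also the fraction
# of the cube occupied by the inscribed ball.
def inscribed_sphere_fraction(d, r=0.5):
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)

for d in (1, 2, 3, 5, 10, 20):
    print(d, inscribed_sphere_fraction(d))
# Roughly: 1.0, 0.785, 0.524, 0.164, 0.0025, 2.5e-8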

Dimensionality Reduction

• We’ve already seen some of this.

• Regularization attempts to reduce the number of effective features used in linear and logistic regression classifiers

Linear Models

• When we regularize, we optimize a function that ignores as many features as possible.

• The “effective” number of dimensions is much smaller than D, as the sketch below illustrates.
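As a concrete illustration of that point – a minimal scikit-learn sketch with made-up data (not from the slides) – L1 regularization drives most coefficients to exactly zero, so the effective dimensionality is far below D.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, D = 100, 50
X = rng.normal(size=(n, D))
# Only the first three of the D features actually influence the target.
y = 2 * X[:, 0] - 3 * X[:, 1] + 1.5 * X[:, 2] + 0.1 * rng.normal(size=n)

model = Lasso(alpha=0.1).fit(X, y)
print(int(np.sum(model.coef_ != 0)), "of", D, "coefficients are non-zero")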

Support Vector Machines

• In exemplar approaches (SVM, k-nn) each data point can be considered to describe a dimension.

• By keeping only the instances that define the maximum margin – and setting α to zero for all the others – SVMs use only a subset of the available dimensions in their decision making (see the sketch below).
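A small sketch of that idea (scikit-learn, synthetic data – assumptions for illustration only): after training, only the support vectors, i.e. the points whose α is non-zero, are retained in the decision function.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
# Training points with alpha = 0 could be discarded without changing the model.
print(len(clf.support_vectors_), "support vectors out of", len(X), "training points")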

Decision Trees

• Decision Trees explicitly select split points based on features that improve Information Gain or Accuracy.

• Features that don’t contribute sufficiently to the classification are never used (see the example tree and the sketch below).

[Figure: example decision tree that splits first on weight < 165 and then on height < 68, with leaves containing 5M, 5F, and 1F / 1M.]
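A minimal sketch along the same lines (scikit-learn, with synthetic weight/height data plus an irrelevant noise feature – all assumed for illustration): the uninformative feature ends up with zero importance because it is never chosen for a split.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
weight = rng.normal(165, 20, size=n)
height = rng.normal(68, 4, size=n)
noise = rng.normal(size=n)                      # carries no signal
X = np.column_stack([weight, height, noise])
y = ((weight > 165) & (height > 68)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(dict(zip(["weight", "height", "noise"],
               tree.feature_importances_.round(3))))
# The noise feature gets (approximately) zero importance.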

Feature Spaces

• Even though a data point is described in terms of D features, this may not be the most compact representation of the feature space.

• Even classifiers that try to use a smaller effective feature space can suffer from the curse-of-dimensionality

• If a feature has some discriminative power, the dimension may remain in the effective set.

1-d data in a 2-d world

[Scatter plot: data points lying along a single line even though they are plotted in two dimensions.]

Dimensions of high variance

Identifying dimensions of variance

• Assumption: directions that show high variance are the appropriate/useful dimensions for representing the feature set.

Aside: Normalization

• Assume 2 features: Percentile GPA and Height in cm.

• Which dimension shows greater variability?

[Scatter plot: percentile GPA on the x-axis (0–1) against height in cm on the y-axis; the height axis spans a much larger numeric range.]

Aside: Normalization

• Assume 2 features: Percentile GPA and Height in cm.

• Which dimension shows greater variability?

[Scatter plot: the same two features plotted with the x-axis spanning 0–30; the apparent variability of each dimension changes with the scale of the axes.]

Aside: Normalization

• Assume 2 features: Percentile GPA and Height in m.

• Which dimension shows greater variability?

[Scatter plot: percentile GPA against height in m, with both axes spanning 0–1; on comparable scales neither dimension dominates the variance.]
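A minimal numpy sketch of the point these three slides are making (the GPA and height values are made up): whichever feature happens to have the larger numeric scale dominates the raw variance, and standardizing (z-scoring) the features removes that artifact before looking for directions of variance.

import numpy as np

rng = np.random.default_rng(0)
gpa = rng.uniform(0.0, 1.0, size=200)         # percentile GPA in [0, 1]
height_cm = rng.normal(170.0, 8.0, size=200)  # height in centimetres
X = np.column_stack([gpa, height_cm])

print("raw variances:         ", X.var(axis=0).round(3))     # height dominates
X_std = (X - X.mean(axis=0)) / X.std(axis=0)                  # z-score each column
print("standardized variances:", X_std.var(axis=0).round(3))  # both 1.0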

Principal Component Analysis

• Principal Component Analysis (PCA) identifies the dimensions of greatest variance of a set of data.

Eigenvectors

• The eigenvectors of a symmetric matrix are orthogonal vectors that define a space, the eigenspace.

• Any data point can be described as a linear combination of eigenvectors.

• Eigenvectors of a square matrix A have the following property: Av = λv.

• The associated λ is the eigenvalue.
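A quick numerical check of that property (numpy, with an arbitrary symmetric matrix chosen for illustration):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # an arbitrary symmetric matrix
eigvals, eigvecs = np.linalg.eigh(A)  # columns of eigvecs are eigenvectors

v = eigvecs[:, 0]
lam = eigvals[0]
print(A @ v)                          # the same vector ...
print(lam * v)                        # ... scaled by the eigenvalue
print(np.allclose(A @ v, lam * v))    # True: A v = lambda v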

PCA

• Write each data point in this new space: x = μ + Σi ci ei, i.e. the mean plus a linear combination of the eigenvectors.

• To do the dimensionality reduction, keep C < D dimensions.

• Each data point is now represented by its vector of coefficients (c1, …, cC).

Identifying Eigenvectors

• PCA is easy once we have the eigenvectors and the mean.

• Identifying the mean is easy.

• Eigenvectors of the covariance matrix represent a set of directions of variance.

• Eigenvalues represent the degree of that variance.

Eigenvectors of the Covariance Matrix

• Eigenvectors are orthonormal.

• In the eigenspace, the Gaussian is diagonal – zero covariance between dimensions.

• All eigenvalues are non-negative.

• Eigenvalues are sorted (largest first).

• Larger eigenvalues correspond to higher variance.

Dimensionality reduction with PCA

• To convert an original data point x to its PCA representation: ci = ei^T (x – μ), for i = 1, …, C.

• To reconstruct a point: x ≈ μ + Σi ci ei.
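Putting the slides together, a minimal numpy sketch of both conversions (the function names and the synthetic 1-d-in-2-d data are assumptions for illustration; the eigenvectors are taken from the sample covariance as described above):

import numpy as np

def pca_fit(X, C):
    # Mean and the C eigenvectors of the sample covariance with largest eigenvalues.
    mu = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigvals)[::-1][:C]
    return mu, eigvecs[:, order]

def pca_encode(x, mu, E):
    return E.T @ (x - mu)             # c_i = e_i^T (x - mu)

def pca_decode(c, mu, E):
    return mu + E @ c                 # x ~ mu + sum_i c_i e_i

# Example: the 1-d data in a 2-d world from the earlier slide.
rng = np.random.default_rng(0)
t = rng.normal(size=300)
X = np.column_stack([t, 3 * t + 0.05 * rng.normal(size=300)])

mu, E = pca_fit(X, C=1)
codes = np.array([pca_encode(x, mu, E) for x in X])
recon = np.array([pca_decode(c, mu, E) for c in codes])
print("mean absolute reconstruction error:", np.abs(X - recon).mean())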

Eigenfaces

[Figure: face images encoded into eigenface coefficients, then decoded.]

Efficiency can be evaluated with absolute or squared error (see the helper sketch below).
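A tiny helper in the same spirit (name and interface assumed, not from the slides) for computing either measure on an encoded-then-decoded batch:

import numpy as np

def reconstruction_error(X, X_hat, squared=False):
    # Mean per-element error between the originals and their reconstructions.
    diff = np.asarray(X) - np.asarray(X_hat)
    return float((diff ** 2).mean()) if squared else float(np.abs(diff).mean())

For instance, reconstruction_error(X, recon) and reconstruction_error(X, recon, squared=True) give the absolute and squared versions for the sketch above.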

Some other (unsupervised) dimensionality reduction techniques

• Kernel PCA
• Distance Preserving Dimension Reduction
• Maximum Variance Unfolding
• Multidimensional Scaling (MDS)
• Isomap

• Next Time – Model Adaptation and Semi-supervised Techniques

• Work on your projects.