Estimating the Accuracy of Spectral Learning for HMMs
Farhana Ferdousi Liza and Marek Grzes
School of Computing, University of Kent, UK
Roadmap
• Motivation
• Background
• Proposed method
• Experiments and results
• Conclusion
• Q&A
Motivation (Why Estimating the Accuracy?)
• When we see unexpected results: is the model incorrect, or is the training data insufficient?
• Supports unsupervised learning and model selection.
Motivation (Why Spectral Learning for HMM?)
[Figure: likelihood as a function of a parameter θ ∈ [0, 1]; several local maxima (LM) are marked alongside the solutions returned by spectral learning (1_SL, 2_SL, 3_SL).]
Background
• Spectral learning
  • Moment-based parameter estimation.
  • Uses the information contained in the eigenvectors of an (item-item similarity) matrix to detect structure.
  • Provides certain (PAC-style) performance guarantees (not purely heuristic).
• Hidden Markov Model
  • Described by three matrices: T, O and 𝜋.
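As a concrete illustration of the three matrices, here is a minimal HMM sketch in NumPy; the sizes and probabilities are our own illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 2, 3                      # hidden states, observation symbols
T = np.array([[0.7, 0.3],        # T[i, j] = P(next state j | current state i)
              [0.4, 0.6]])       # (row-stochastic convention, assumed here)
O = np.array([[0.5, 0.4, 0.1],   # O[i, x] = P(observation x | state i)
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])        # initial state distribution

def sample_sequence(length):
    """Sample one observation sequence of the given length from the HMM."""
    state = rng.choice(m, p=pi)
    obs = []
    for _ in range(length):
        obs.append(int(rng.choice(n, p=O[state])))
        state = rng.choice(m, p=T[state])
    return obs

seq = sample_sequence(5)
print(seq)  # five symbols from {0, 1, 2}
```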
Spectral learning for HMM
• OOM operator for HMM
• Empirical low-order moment calculation
• Transformed operators for HMM, built from the singular value decomposition UΣV* = P̂₂,₁ of the empirical bigram moment matrix
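The pipeline above can be sketched numerically. This is a minimal illustration in the style of Hsu, Kakade and Zhang's spectral algorithm: build the low-order moments P₁, P₂,₁ and P₃,ₓ,₁, take the SVD of P₂,₁, and form the transformed (observable) operators. To keep it short, the moments are computed exactly from a small, well-conditioned HMM rather than estimated from data; all matrices are illustrative assumptions.

```python
import numpy as np

T = np.array([[0.7, 0.4],    # column-stochastic here: T[i, j] = P(s'=i | s=j)
              [0.3, 0.6]])
O = np.array([[0.5, 0.1],    # O[x, j] = P(obs=x | state j)
              [0.4, 0.3],
              [0.1, 0.6]])
pi = np.array([0.6, 0.4])
m, n = T.shape[0], O.shape[0]

# Exact low-order moments (in place of empirical estimates).
P1 = O @ pi                                   # P1[x]       = P(x1 = x)
P21 = O @ T @ np.diag(pi) @ O.T               # P21[x2, x1] = P(x2, x1)
A = [np.diag(O[x]) for x in range(n)]         # observation-conditioned maps
P3x1 = [O @ T @ A[x] @ T @ np.diag(pi) @ O.T for x in range(n)]

# SVD of P_{2,1}; keep the top-m left singular vectors as the basis U.
U, s, Vt = np.linalg.svd(P21)
U = U[:, :m]

# Transformed operators.
b1 = U.T @ P1
binf = np.linalg.pinv(P21.T @ U) @ P1
B = [U.T @ P3x1[x] @ np.linalg.pinv(U.T @ P21) for x in range(n)]

def spectral_prob(seq):
    """P(x_1..x_t) via the operator chain b_inf^T B_{x_t} ... B_{x_1} b_1."""
    v = b1
    for x in seq:
        v = B[x] @ v
    return float(binf @ v)

def forward_prob(seq):
    """Reference probability from the standard forward recursion."""
    v = pi.copy()
    for x in seq:
        v = T @ (O[x] * v)
    return float(v.sum())
```

For a well-conditioned HMM the two routines agree, e.g. `spectral_prob([0, 2, 1])` matches `forward_prob([0, 2, 1])` to numerical precision.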
Observation 1
[Figure: the leading basis vectors of the estimated subspace (FirstVec, SecondVec, and in the 3-D panel also ThirdVec), plotted for different training subsets.]
As expected, the basis vector (representing the subspace) rotates.
Observation 2
[Figure: basis vector angle change difference against training data size; synthetic dataset panels for m = 2, 3, 4 and 8, and real dataset panels for thresholds 0.1, 0.01, 0.001 and 0.0001.]
The difference between the rotating bases gets smaller on larger training subsets.
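This behaviour can be reproduced in a toy simulation (not the paper's exact experiment): perturb a fixed moment matrix with noise that decays like 1/√N, as sampling noise does, and the angle between the estimated and true top-m singular subspaces shrinks. The matrix and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def top_basis(M, m):
    """Top-m left singular vectors of M (the estimated basis)."""
    U, _, _ = np.linalg.svd(M)
    return U[:, :m]

def subspace_angle_deg(U1, U2):
    """Largest principal angle (degrees) between the column spaces of U1, U2."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return float(np.degrees(np.arccos(np.clip(s.min(), -1.0, 1.0))))

m = 2
P_true = np.array([[0.30, 0.12, 0.05],   # a fixed "true" moment matrix
                   [0.12, 0.20, 0.08],
                   [0.05, 0.08, 0.05]])
U_true = top_basis(P_true, m)

angles = []
for N in [100, 10_000, 1_000_000]:
    # Empirical estimate with O(1/sqrt(N)) sampling-style noise.
    P_hat = P_true + rng.standard_normal(P_true.shape) / np.sqrt(N)
    angles.append(subspace_angle_deg(top_basis(P_hat, m), U_true))

print(angles)  # the angle shrinks as N grows
```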
Ill-conditioned HMM
• A matrix is ill-conditioned if its condition number is too large, i.e., its inverse condition number (ICN) is too small.
• An HMM is ambiguous if the ICN of the characteristic matrices of the HMM is too small (close to zero); in such a case, parameter estimation is difficult for any estimation technique.
• The ICN was calculated as the ratio between the smallest and largest singular values of the row-augmented matrix of T and O.
• Example: an ill-conditioned HMM.
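A sketch of the ICN computation as described above. The two example HMMs are our own illustrative assumptions: in the ill-conditioned one the two hidden states behave identically, so the row-augmented matrix loses rank and the ICN collapses to zero.

```python
import numpy as np

def inverse_condition_number(T, O):
    """ICN: smallest over largest singular value of the stacked matrix [T; O]."""
    s = np.linalg.svd(np.vstack([T, O]), compute_uv=False)
    return float(s.min() / s.max())

# A well-conditioned example: states are clearly distinguishable.
T_good = np.array([[0.9, 0.1], [0.2, 0.8]])
O_good = np.array([[0.7, 0.2], [0.3, 0.8]])

# An ill-conditioned example: both states have identical dynamics and emissions.
T_bad = np.array([[0.5, 0.5], [0.5, 0.5]])
O_bad = np.array([[0.5, 0.5], [0.5, 0.5]])

icn_good = inverse_condition_number(T_good, O_good)
icn_bad = inverse_condition_number(T_bad, O_bad)
print(icn_good)  # well away from zero
print(icn_bad)   # (numerically) zero
```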
Proposed Criteria
• Based on our observations, we propose a convergence criterion built on the basis vector angle change difference.
• Our claim rests on the second observation: as the training data increases, the angle change difference shrinks, i.e., the subspace stabilises and convergence can be determined.
• [Hsieh and Olsen] showed that the active subspace never changes in a neighborhood of the global minimum.
• The subset size and the threshold are application specific and can be determined using cross-validation.
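Under our reading of the criterion, the stopping rule can be sketched as follows; the function name, history values and threshold are illustrative assumptions, with the threshold to be chosen per application (e.g., by cross-validation).

```python
import numpy as np

def converged(angle_changes, threshold):
    """angle_changes[k]: basis angle change (degrees) between training
    subsets k and k+1. Convergence is declared when the most recent
    *difference* of consecutive angle changes falls below the threshold."""
    diffs = np.abs(np.diff(angle_changes))
    return diffs.size > 0 and bool(diffs[-1] < threshold)

# Angle changes shrinking as training subsets grow (illustrative numbers):
history = [41.0, 18.0, 6.0, 5.95]
print(converged(history, threshold=0.1))      # True: last difference is 0.05
print(converged(history[:3], threshold=0.1))  # False: last difference is 12.0
```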
Experimental setting
Real Dataset: web-navigation data from msnbc.com
Synthetic Dataset: configuration for the synthetic dataset
Evaluation 1: Error measure for the synthetic dataset (true model is known)
[Figure: normalized L1 error against the threshold (log scaled) for Examples 1, 2, 15 and 4.]
Evaluation 1: Error does not correspond with ill-conditioned HMMs
[Figure: normalized L1 error (on the order of 10⁴ to 10⁵) against the threshold (log scaled) for ill-conditioned Examples 8, 9, 18 and 17.]
Evaluation 2: Recovered parameters and the proposed criterion (well-conditioned HMM)
Evaluation 2: Recovered parameters and the proposed criterion (ill-conditioned HMM)
Threshold = 0.00001
Conclusion
• The angle change difference can be a useful criterion for checking convergence.
• Without a convergence criterion, it would be difficult to know whether the model is incorrect, or the model is correct but more training examples are required.
Future Work
• Problems with spectral learning
  • Cannot incorporate long-term dependencies.
  • For large domains, the SVD can be time-consuming.
  • Simplifying the domain space can be tricky and, in some cases, intractable.
• Q & A
• Thanks
THE UK’S EUROPEAN UNIVERSITY
www.kent.ac.uk