
Hidden Markov Models CS 4495 Computer Vision – A. Bobick

Aaron Bobick School of Interactive Computing

CS 4495 Computer Vision Hidden Markov Models


Administrivia • PS4 – going OK?

• Please share your experiences on Piazza – e.g., if you discovered something subtle about using vl_sift. If you want to talk about what scales worked and why, that's OK too.


Outline • Time Series • Markov Models • Hidden Markov Models • 3 computational problems of HMMs • Applying HMMs in vision – Gesture

Slides “borrowed” from UMd and elsewhere; material from slides by Sebastian Thrun and Yair Weiss.


Audio Spectrum

Audio Spectrum of the Song of the Prothonotary Warbler


Bird Sounds

Chestnut-sided Warbler Prothonotary Warbler


Questions One Could Ask

• What bird is this?  →  Time series classification
• How will the song continue?  →  Time series prediction
• Is this bird sick?  →  Outlier detection
• What phases does this song have?  →  Time series segmentation


Other Sound Samples


Another Time Series Problem

[Figure: stock-price time series for Intel, Cisco, General Electric, and Microsoft]


Questions One Could Ask

• Will the stock go up or down?  →  Time series prediction
• What type of stock is this (e.g., risky)?  →  Time series classification
• Is the behavior abnormal?  →  Outlier detection


Music Analysis


Questions One Could Ask

• Is this Beethoven or Bach?  →  Time series classification
• Can we compose more of that?  →  Time series prediction/generation
• Can we segment the piece into themes?  →  Time series segmentation


For vision: Waving, pointing, controlling?


The Real Question

• How do we model these problems?

• How do we formulate these questions as inference/learning problems?


Outline For Today • Time Series • Markov Models • Hidden Markov Models • 3 computational problems of HMMs • Applying HMMs in vision – Gesture • Summary


Weather: A Markov Model (maybe?)

[State-transition diagram over three states – Sunny, Rainy, Snowy – with transition probabilities: Sunny→Sunny 80%, Sunny→Rainy 15%, Sunny→Snowy 5%; Rainy→Sunny 38%, Rainy→Rainy 60%, Rainy→Snowy 2%; Snowy→Sunny 75%, Snowy→Rainy 5%, Snowy→Snowy 20%]

The probability of moving to a given state depends only on the current state: 1st-order Markovian.


Ingredients of a Markov Model

• States: S = {S_1, S_2, ..., S_N}

• State transition probabilities: a_ij = P(q_{t+1} = S_j | q_t = S_i)

• Initial state distribution: π_i = P[q_1 = S_i]

[Same Sunny/Rainy/Snowy state-transition diagram as above]


Ingredients of Our Markov Model

• States: S = {S_sunny, S_rainy, S_snowy}

• State transition probabilities:

      | .80  .15  .05 |
  A = | .38  .60  .02 |
      | .75  .05  .20 |

• Initial state distribution: π = (.7  .25  .05)

[Same Sunny/Rainy/Snowy state-transition diagram as above]


Probability of a Time Series • Given:

• What is the probability of this series?

  π = (.7  .25  .05)

      | .80  .15  .05 |
  A = | .38  .60  .02 |
      | .75  .05  .20 |

For the series Sunny, Rainy, Rainy, Rainy, Snowy, Snowy:

  P = P(S_sunny) · P(S_rainy|S_sunny) · P(S_rainy|S_rainy) · P(S_rainy|S_rainy) · P(S_snowy|S_rainy) · P(S_snowy|S_snowy)
    = 0.7 · 0.15 · 0.6 · 0.6 · 0.02 · 0.2
    = 0.0001512
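As a quick sanity check, here is a minimal Python/NumPy sketch that reproduces this number (the index convention Sunny = 0, Rainy = 1, Snowy = 2 and the function name are illustrative choices, not from the slides):

```python
import numpy as np

# Transition matrix and initial distribution from the slides
# (rows/columns ordered Sunny = 0, Rainy = 1, Snowy = 2).
A = np.array([[0.80, 0.15, 0.05],
              [0.38, 0.60, 0.02],
              [0.75, 0.05, 0.20]])
pi = np.array([0.70, 0.25, 0.05])

def markov_sequence_prob(states, pi, A):
    """P(q_1, ..., q_T) for a first-order Markov chain."""
    p = pi[states[0]]
    for prev, curr in zip(states[:-1], states[1:]):
        p *= A[prev, curr]
    return p

# Sunny, Rainy, Rainy, Rainy, Snowy, Snowy
print(markov_sequence_prob([0, 1, 1, 1, 2, 2], pi, A))  # -> 0.0001512
```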


Outline For Today • Time Series • Markov Models • Hidden Markov Models • 3 computational problems of HMMs • Applying HMMs in vision – Gesture • Summary


Hidden Markov Models

[Diagram: the same Sunny/Rainy/Snowy transition model, now NOT OBSERVABLE (hidden). Each hidden state emits an OBSERVABLE symbol with its own probabilities – from Sunny: 60% / 30% / 10%, from Rainy: 5% / 30% / 65%, from Snowy: 0% / 50% / 50% (these are the rows of the matrix B on the next slide).]


Probability of a Time Series • Given:

• What is the probability of this series?

  π = (.7  .25  .05)

      | .80  .15  .05 |          | .60  .30  .10 |
  A = | .38  .60  .02 |      B = | .05  .30  .65 |
      | .75  .05  .20 |          | .00  .50  .50 |

  P(O|λ) = Σ_{all Q} P(O|Q) P(Q) = Σ_{q_1,...,q_T} P(o_1, ..., o_T | q_1, ..., q_T) P(q_1, ..., q_T)

  P(O) = P(O_coat, O_coat, O_umbrella, O_umbrella)
       = a sum with one term for every possible hidden-state path: (the emission probabilities along that path) × (the probability of that path)


Specification of an HMM

• N – number of states

• Q = {q_1, q_2, ..., q_T} – sequence of states

• Some form of output symbols
  • Discrete – a finite vocabulary of symbols of size M. One symbol is "emitted" each time a state is visited (or a transition is taken).
  • Continuous – an output density in some feature space associated with each state, where an output is emitted with each visit.

• For a given observation sequence O
  • O = {o_1, o_2, ..., o_T} – o_i is the observed symbol or feature at time i


Specification of an HMM

• A – the state transition probability matrix
  • a_ij = P(q_{t+1} = j | q_t = i)

• B – the observation probability distribution
  • Discrete: b_j(k) = P(o_t = k | q_t = j), 1 ≤ k ≤ M
  • Continuous: b_j(x) = p(o_t = x | q_t = j)

• π – the initial state distribution
  • π(j) = P(q_1 = j)

• The full HMM over a set of states and an output space is thus specified as a triplet: λ = (A, B, π)


What does this have to do with Vision?

• Given some sequence of observations, what “model” generated those?

• Using the previous example: given some observation sequence of clothing, is this Philadelphia, Boston or Newark?

• Notice that for Boston vs. Arizona we would not need the sequence!


Outline For Today • Time Series • Markov Models • Hidden Markov Models • 3 computational problems of HMMs • Applying HMMs in vision – Gesture • Summary


The 3 great problems in HMM modelling

1. Evaluation: Given the model λ = (A, B, π), what is the probability of occurrence of a particular observation sequence O = {o_1, ..., o_T}, i.e. P(O|λ)?

• This is the heart of the classification/recognition problem: I have a trained model for each of a set of classes; which one would most likely generate what I saw?

2. Decoding: Optimal state sequence to produce an observation sequence 𝑂 = {𝑜1, … , 𝑜𝑇}

• Useful in recognition problems – helps give meaning to states – which is not exactly legal but often done anyway.

3. Learning: Determine model λ, given a training set of observations

• Find λ, such that 𝑃(𝑂|𝜆) is maximal


Problem 1: Naïve solution

• State sequence Q = (q_1, ..., q_T)
• Assume independent observations:

  P(O | q, λ) = ∏_{t=1}^{T} p(o_t | q_t, λ) = b_{q_1}(o_1) · b_{q_2}(o_2) ⋯ b_{q_T}(o_T)

NB: Observations are mutually independent, given the hidden states. That is, if I know the states then the previous observations don't help me predict a new observation. The states encode *all* the information. Usually only kind-of true – see CRFs.


Problem 1: Naïve solution

• But we know the probability of any given sequence of states:

  P(q | λ) = π_{q_1} · a_{q_1 q_2} · a_{q_2 q_3} ⋯ a_{q_{T-1} q_T}


Problem 1: Naïve solution

• Given:

  P(O | q, λ) = ∏_{t=1}^{T} p(o_t | q_t, λ) = b_{q_1}(o_1) · b_{q_2}(o_2) ⋯ b_{q_T}(o_T)

  P(q | λ) = π_{q_1} · a_{q_1 q_2} · a_{q_2 q_3} ⋯ a_{q_{T-1} q_T}

• We get:

  P(O | λ) = Σ_q P(O | q, λ) P(q | λ)

NB: The above sum is over all state paths. There are N^T state paths, each 'costing' O(T) calculations, leading to O(T·N^T) time complexity.
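To make the O(T·N^T) cost concrete, here is a brute-force sketch that literally enumerates every state path (a hypothetical discrete-observation setup with B indexed as B[state, symbol]; only feasible for tiny N and T):

```python
import itertools
import numpy as np

def naive_likelihood(obs, pi, A, B):
    """P(O|lambda) by summing over all N**T state paths - O(T * N**T)."""
    N, T = len(pi), len(obs)
    total = 0.0
    for q in itertools.product(range(N), repeat=T):   # every possible state path
        p = pi[q[0]] * B[q[0], obs[0]]
        for t in range(1, T):
            p *= A[q[t - 1], q[t]] * B[q[t], obs[t]]
        total += p
    return total
```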


Problem 1: Efficient solution

• Define the auxiliary forward variable α:

  α_t(i) = P(o_1, ..., o_t, q_t = i | λ)

  α_t(i) is the probability of observing the partial sequence of observables o_1, ..., o_t AND being in state q_t = i at time t.


Problem 1: Efficient solution

• Recursive algorithm:

  • Initialise: α_1(i) = π_i b_i(o_1)

  • Calculate: α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_ij ] · b_j(o_{t+1})
    (partial obs. sequence to t AND state i at t) × (transition to j at t+1) × (sensor);
    the sum is over i, as j can be reached from any preceding state

  • Obtain: P(O|λ) = Σ_{i=1}^{N} α_T(i)
    (the sum over the different ways of getting the observation sequence)

• Complexity is only O(N²T)!!!
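For illustration, a minimal NumPy sketch of this forward pass for a discrete-observation HMM (the names pi, A, B, obs mirror the slides' notation; obs is assumed to be a list of symbol indices, and this is a sketch rather than reference code):

```python
import numpy as np

def forward(obs, pi, A, B):
    """alpha[t, i] = P(o_1..o_t, q_t = i | lambda); returns (alpha, P(O|lambda))."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialise
    for t in range(T - 1):
        # sum over all states i we could have come from, then emit o_{t+1}
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha, alpha[-1].sum()                     # P(O|lambda) = sum_i alpha_T(i)
```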


The Forward Algorithm (1) – trellis diagram

[Trellis: the states S1, S2, S3 repeated at every time step, with the observations O1, O2, O3, O4, ..., OT along the bottom]

  α_t(i) = P(O_1, ..., O_t, q_t = S_i)

  α_{t+1}(j) = P(O_1, ..., O_{t+1}, q_{t+1} = S_j)
             = Σ_{i=1}^{N} P(O_{t+1} | O_1, ..., O_t, q_t = S_i, q_{t+1} = S_j) · P(O_1, ..., O_t, q_t = S_i, q_{t+1} = S_j)
             = Σ_{i=1}^{N} P(O_{t+1} | q_{t+1} = S_j) · P(q_{t+1} = S_j | q_t = S_i) · P(O_1, ..., O_t, q_t = S_i)
             = b_j(O_{t+1}) Σ_{i=1}^{N} a_ij α_t(i)

  α_1(i) = π_i b_i(O_1)


Problem 1: Alternative solution

• Define the auxiliary backward variable β:

  β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, λ)

Backward algorithm:

  β_t(i) – the probability of observing the sequence of observables o_{t+1}, ..., o_T GIVEN state q_t = i at time t, and λ


Problem 1: Alternative solution

• Recursive algorithm:

  • Initialize: β_T(i) = 1

  • Calculate: β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j),   for t = T−1, ..., 1

  • Terminate: p(O|λ) = Σ_{i=1}^{N} π_i b_i(o_1) β_1(i)

• Complexity is O(N²T)
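A matching sketch of the backward pass, under the same illustrative assumptions as the forward sketch above:

```python
import numpy as np

def backward(obs, pi, A, B):
    """beta[t, i] = P(o_{t+1}..o_T | q_t = i, lambda); returns (beta, P(O|lambda))."""
    N, T = len(pi), len(obs)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                    # initialise: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # sum over successor states j: a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta, (pi * B[:, obs[0]] * beta[0]).sum()  # terminate: P(O|lambda)
```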


Forward-Backward

• Optimality criterion: choose the states q_t that are individually most likely at each time t

• The probability of being in state i at time t:

  γ_t(i) = p(q_t = i | O, λ) = α_t(i) β_t(i) / Σ_{i=1}^{N} α_t(i) β_t(i) = α_t(i) β_t(i) / p(O|λ)

  (the numerator α_t(i) β_t(i) = p(o_1, ..., o_T and q_t = i | λ); the denominator = p(O|λ))

• α_t(i) accounts for the partial observation sequence o_1, o_2, ..., o_t
• β_t(i) accounts for the remainder o_{t+1}, o_{t+2}, ..., o_T
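Using the forward and backward sketches above, γ is a one-liner (illustrative only; obs, pi, A, B are the same assumed inputs):

```python
# gamma[t, i] = P(q_t = i | O, lambda), built from the earlier forward/backward sketches
alpha, prob_O = forward(obs, pi, A, B)
beta, _ = backward(obs, pi, A, B)
gamma = alpha * beta / prob_O
best_states = gamma.argmax(axis=1)   # individually most likely state at each time t
```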


Problem 2: Decoding

• Choose the state sequence that maximises the probability of the observation sequence
• Viterbi algorithm – an inductive algorithm that keeps the best state sequence at each instant

[Trellis: the states S1, S2, S3 repeated at every time step, with the observations O1, O2, O3, O4, ..., OT along the bottom]


Problem 2: Decoding

• State sequence to maximize P(O, Q | λ) (equivalently, P(q_1, q_2, ..., q_T | O, λ))

• Viterbi algorithm:

• Define the auxiliary variable δ:

  δ_t(i) = max_{q_1, ..., q_{t-1}} P(q_1, q_2, ..., q_{t-1}, q_t = i, o_1, o_2, ..., o_t | λ)

  δ_t(i) – the probability of the most probable path ending in state q_t = i


Problem 2: Decoding

• Recurrent property:

  δ_{t+1}(j) = max_i ( δ_t(i) a_ij ) · b_j(o_{t+1})

• Algorithm:

  • 1. Initialise: δ_1(i) = π_i b_i(o_1),  1 ≤ i ≤ N;   ψ_1(i) = 0

To get the state sequence, we need to keep track of the argument that maximises this, for each t and j. This is done via the array ψ_t(j).


Problem 2: Decoding

• 2. Recursion:

  δ_t(j) = max_{1≤i≤N} ( δ_{t-1}(i) a_ij ) · b_j(o_t)
  ψ_t(j) = argmax_{1≤i≤N} ( δ_{t-1}(i) a_ij ),    2 ≤ t ≤ T,  1 ≤ j ≤ N

• 3. Terminate:

  P* = max_{1≤i≤N} δ_T(i)
  q_T* = argmax_{1≤i≤N} δ_T(i)

P* gives the state-optimized probability.
Q* is the optimal state sequence (Q* = {q_1*, q_2*, ..., q_T*}).


Problem 2: Decoding

• 4. Backtrack state sequence:

  q_t* = ψ_{t+1}(q_{t+1}*),    t = T−1, T−2, ..., 1

O(N²T) time complexity.

[Trellis diagram as above]
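A compact NumPy sketch of steps 1–4, under the same illustrative discrete-HMM setup as the earlier forward/backward sketches:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for a discrete-observation HMM; returns (Q*, P*)."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                      # 1. initialise
    for t in range(1, T):                             # 2. recursion
        scores = delta[t - 1][:, None] * A            # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()                     # 3. terminate
    for t in range(T - 2, -1, -1):                    # 4. backtrack
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()
```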


Problem 3: Learning

• Train the HMM to encode an observation sequence such that the HMM will identify a similar observation sequence in the future
• Find λ = (A, B, π) maximizing P(O|λ)
• General algorithm:
  • 1. Initialize: λ_0
  • 2. Compute a new model λ, using λ_0 and the observed sequence O
  • 3. Then set λ_0 ← λ
  • 4. Repeat steps 2 and 3 until: log P(O|λ) − log P(O|λ_0) < d


Problem 3: Learning

Step 1 of the Baum-Welch algorithm:

• Let ξ_t(i,j) be the probability of being in state i at time t and in state j at time t+1, given λ and the observation sequence O:

  ξ_t(i,j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O|λ)

           = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / [ Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) ]

  (the numerator = p(O and a transition i→j at time t | λ); the denominator = p(O|λ); so ξ_t(i,j) = p(transition i→j at time t | O, λ))


Problem 3: Learning

Operations required for the computation of the joint event that the system is in state S_i at time t and in state S_j at time t+1.


Problem 3: Learning

• Let γ_t(i) be the probability of being in state i at time t, given O:

  γ_t(i) = Σ_{j=1}^{N} ξ_t(i,j)

• Σ_{t=1}^{T−1} γ_t(i)  – expected no. of transitions from state i
• Σ_{t=1}^{T−1} ξ_t(i,j)  – expected no. of transitions i → j


Problem 3: Learning

Step 2 of the Baum-Welch algorithm:

  π̂_i = γ_1(i)
  – the expected frequency of state i at time t = 1

  â_ij = Σ_t ξ_t(i,j) / Σ_t γ_t(i)
  – the ratio of the expected no. of transitions from state i to j over the expected no. of transitions from state i

  b̂_j(k) = Σ_{t : o_t = k} γ_t(j) / Σ_t γ_t(j)
  – the ratio of the expected no. of times in state j observing symbol k over the expected no. of times in state j
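Putting ξ, γ, and these re-estimation formulas together, one EM iteration over a single observation sequence might look like the following sketch (it reuses the hypothetical forward and backward functions from the earlier sketches; M is the symbol-vocabulary size, and all names are illustrative):

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One Baum-Welch (EM) iteration for a single discrete observation sequence."""
    N, T, M = len(pi), len(obs), B.shape[1]
    alpha, prob_O = forward(obs, pi, A, B)
    beta, _ = backward(obs, pi, A, B)
    gamma = alpha * beta / prob_O                     # gamma_t(i)
    xi = np.zeros((T - 1, N, N))                      # xi_t(i, j)
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A *
                 B[:, obs[t + 1]] * beta[t + 1]) / prob_O
    new_pi = gamma[0]                                 # expected frequency of i at t = 1
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros((N, M))
    for k in range(M):                                # expected emissions of symbol k
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B
```

In practice this step would be repeated until the change in log P(O|λ) falls below the threshold d from the general algorithm above, working with logs or rescaled α, β to avoid underflow.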


Problem 3: Learning

• Baum-Welch algorithm uses the forward and backward algorithms to calculate the auxiliary variables 𝛼,𝛽

• B-W algorithm is a special case of the EM algorithm: • E-step: calculation of ξ and γ • M-step: iterative calculation of π̂, â_ij, b̂_j(k)

• Practical issues: • Can get stuck in local maxima • Numerical problems – logs and scaling


Now HMMs and Vision: Gesture Recognition…


"Gesture recognition"-like activities


Some thoughts about gesture

• There is a conference on Face and Gesture Recognition, so obviously gesture recognition is an important problem…

• Prototype scenario:
  • Subject does several examples of "each gesture"
  • System "learns" (or is trained) to have some sort of model for each
  • At run time, compare the input to the known models and pick one

• New-found life for gesture recognition:


Generic Gesture Recognition using HMMs

Nam, Y., & Wohn, K. (1996, July). Recognition of space-time hand-gestures using hidden Markov model. In ACM symposium on Virtual reality software and technology (pp. 51-58).


Generic gesture recognition using HMMs (1)

Data glove


Generic gesture recognition using HMMs (2)


Generic gesture recognition using HMMs (3)


Generic gesture recognition using HMMs (4)


Generic gesture recognition using HMMs (5)


Wins and Losses of HMMs in Gesture

• Good points about HMMs:
  • A learning paradigm that acquires spatial and temporal models and does some amount of feature selection.
  • Recognition is fast; training is not so fast, but not too bad.

• Not so good points:
  • If you know something about state definitions, it is difficult to incorporate that knowledge.
  • Every gesture is a new class, independent of anything else you’ve learned.
  • → Particularly bad for “parameterized gesture.”


Parameterized Gesture

“I caught a fish this big.”


Parametric HMMs (PAMI, 1999)

• Basic ideas:
  • Make the output probabilities of each state a function of the parameter of interest: b_j(x) becomes b′_j(x, θ).
  • Maintain the same temporal properties: a_ij unchanged.
  • Train with known parameter values to solve for the dependence of b_j on θ.
  • During testing, use EM to find the θ that gives the highest probability. That probability is the confidence in recognition; the best θ is the recovered parameter.

• Issues:
  • How do we represent the dependence on θ?
  • How do we train given θ?
  • How do we test for θ?
  • What are the limitations on the dependence on θ?


Linear PHMM - Representation

• Represent the dependence on θ as a linear movement of the mean of each state's Gaussian output density.
• Need to learn W_j and µ_j for each state j. (ICCV ’98)
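One plausible reading of that idea as code (a sketch only: the function name and the per-state parameters W_j, mu_j, cov_j are illustrative placeholders, not the paper's notation):

```python
import numpy as np
from scipy.stats import multivariate_normal

def phmm_output_prob(x, theta, W_j, mu_j, cov_j):
    """b'_j(x, theta): Gaussian output density whose mean shifts linearly with theta."""
    mean = mu_j + W_j @ theta         # linear movement of the state's mean with theta
    return multivariate_normal.pdf(x, mean=mean, cov=cov_j)
```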


Linear PHMM - training

• Need to derive EM equations for the linear parameters and proceed as normal:


Linear PHMM - testing

• Derive EM equations with respect to θ:
• We are testing by EM (i.e. iteratively):
  • Solve for γ_tk given a guess for θ
  • Solve for θ given a guess for γ_tk


How big was the fish?


Pointing

• Pointing is the prototypical example of a parameterized gesture.

• Assuming two DOF, can parameterize either by (x,y) or by (θ,φ) .

• Under the linear assumption we must choose the parameterization carefully.

• A generalized non-linear map would allow greater freedom. (ICCV 99)


Linear pointing results

• Test for both recognition and recovery:

• If we prune based on legal θ (MAP via a uniform density):


Noise sensitivity

• Compare the ad hoc procedure with PHMM parameter recovery (ignoring “their” recognition problem!!).


HMMs and vision

• HMMs capture sequencing nicely in a probabilistic manner.

• Moderate time to train, fast to test.

• More when we do activity recognition…