
Slide 1

EE3J2 Data Mining

Lecture 14: Introduction to Hidden Markov Models

Martin Russell

Slide 2

Objectives

Limitations of sequence matching

Introduction to hidden Markov models (HMMs)

Slide 3

Sequence retrieval using DP

……

AAGDTDTDTDD

AABBCBDAAAAAAA

BABABABBCCDF

GGGGDDGDGDGDGDTDTD

DGDGDGDGD

AABCDTAABCDTAABCDTAAB

CDCDCDTGGG

GGAACDTGGGGGAAA

…….

…….

Corpus of sequential data

‘query’ sequence Q

…BBCCDDDGDGDGDCDTCDTTDCCC…

Dynamic Programming

Distance calculation: compute ad(S,Q) for each sequence S in the corpus, then select

$\hat{S} = \arg\min_{S} ad(S, Q)$
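To make the retrieval step concrete, here is a minimal Python sketch, assuming ad(S,Q) is the plain Levenshtein distance (unit substitution, insertion, and deletion costs); the function and variable names are illustrative, not from the lecture.

```python
# A minimal sketch of DP-based retrieval, assuming ad(S, Q) is the plain
# Levenshtein distance (unit substitution/insertion/deletion costs).
def ad(s, q):
    """Alignment (edit) distance between sequences s and q, computed by DP."""
    prev = list(range(len(q) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, cq in enumerate(q, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (cs != cq)))   # substitution (0 if match)
        prev = curr
    return prev[-1]

def retrieve(corpus, query):
    """S_hat = argmin over S in the corpus of ad(S, Q)."""
    return min(corpus, key=lambda s: ad(s, query))

corpus = ["AAGDTDTDTDD", "AABBCBDAAAAAAA", "CDCDCDTGGG"]
print(retrieve(corpus, "CDCDTTGG"))   # -> CDCDCDTGGG
```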

Slide 4

Limitations of ‘template matching’

This type of analysis is sometimes referred to as template matching

– The ‘templates’ are the sequences in the corpus

– Each template can be thought of as representing a ‘class’

– The problem is to determine which class best fits the query

– Performance will depend on precisely which template is used to represent the class

Slide 5

Alternative path shapes

The basic units of path considered so far are:

substitution insertion deletion

Others are possible and may have advantages, e.g.:

substitution insertion deletion

Slide 6

Example

Slide 7

Hidden Markov Models (HMMs)

One solution is to replace the individual template sequence with an ‘average’ sequence.

But what is an ‘average’ sequence? One answer is to use a type of statistical model called a Hidden Markov Model.

Slide 8

HMMs

Suppose the following sequences are in the same class:

– ABC, YBBC, ABXC, AZ

Compute alignments:

[Alignment grids: the template ABC aligned against YBBC, ABXC, and AZ, with ABC on the vertical axis of each grid]

Slide 9

Finite State Network Representation

The sequence consists of 3 ‘states’:

– First state is ‘realised’ as A (twice) or Y (once)

– Second state ‘realised’ as B (three times) or X (once)

– Second state can be repeated or deleted

– Third state can be ‘realised’ as C (twice) or Z (once)
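The counts above translate directly into emission probabilities by relative frequency. A small Python sketch (the realisations dictionary just transcribes the bullets above; the names are illustrative):

```python
from collections import Counter

# Emission probabilities by relative frequency, transcribing the counts
# in the bullets above (states 1-3; symbols as realised in the alignments).
realisations = {
    1: ["A", "A", "Y"],        # first state: A twice, Y once
    2: ["B", "B", "B", "X"],   # second state: B three times, X once
    3: ["C", "C", "Z"],        # third state: C twice, Z once
}

emission = {
    state: {sym: n / len(syms) for sym, n in Counter(syms).items()}
    for state, syms in realisations.items()
}
print(emission[1])   # {'A': 0.666..., 'Y': 0.333...}
```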

Slide 10

Network representation

Directed graph representation

Each state is associated with a set of probabilities

– Called the ‘state emission’ probabilities

[Directed graph: three states in sequence. State 1 emits A or Y, with emission probabilities $p_1(A)$, $p_1(Y)$; state 2 emits B or X, with $p_2(B)$, $p_2(X)$; state 3 emits C or Z, with $p_3(C)$, $p_3(Z)$]

Slide 11

Transition probabilities

Transition probabilities control insertions and deletions of symbols

[Transition network: entry state, three emitting states, and exit state, with arcs labelled 1, 0.67, 0.33, 0.5, 0.5 and 1]

$A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0.67 & 0.33 & 0 \\ 0 & 0 & 0.5 & 0.5 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$

$a_{jk} = \text{Prob}(\text{state } k \text{ follows state } j)$

Basic rule for drawing transition networks: connect state j to state k if $a_{jk} > 0$
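A sketch of the matrix above in Python, together with the drawing rule; the 5 × 5 layout (entry state, three emitting states, exit state) follows the reconstruction above and is an assumption about the original figure.

```python
import numpy as np

# The transition matrix as reconstructed above: states 1..5 are the entry
# state, the A/Y state, the B/X state, the C/Z state, and the exit state.
A = np.array([
    [0, 1, 0,    0,    0],   # entry -> A/Y state
    [0, 0, 0.67, 0.33, 0],   # A/Y -> B/X, or skip B/X (deletion)
    [0, 0, 0.5,  0.5,  0],   # B/X -> itself (repetition) or C/Z
    [0, 0, 0,    0,    1],   # C/Z -> exit
    [0, 0, 0,    0,    0],   # exit: no outgoing transitions
])

# Drawing rule: connect state j to state k iff a_jk > 0.
for j in range(A.shape[0]):
    for k in range(A.shape[1]):
        if A[j, k] > 0:
            print(f"arc {j + 1} -> {k + 1}  (probability {A[j, k]})")
```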

Slide 12

Formal Definition

A Hidden Markov Model (HMM) for the symbols 1, 2, …, K consists of:

– A number of states N

– An N × N state transition probability matrix A

– For each state j, a set of probabilities $p_j(1), \dots, p_j(K)$, where $p_j(k)$ is the probability that symbol k occurs in state j
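A direct transcription of this definition into a Python container; the class and field names are illustrative, not from the lecture.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HMM:
    """An HMM per the definition above; field names are illustrative."""
    A: np.ndarray   # N x N matrix, A[j, k] = Prob(state k follows state j)
    p: np.ndarray   # N x K matrix, p[j, k] = Prob(symbol k emitted in state j)

    @property
    def N(self):    # number of states
        return self.A.shape[0]

    @property
    def K(self):    # number of symbols
        return self.p.shape[1]
```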

Slide 13

Alignment paths for HMMs

For HMMs, alignment paths are called state sequences

[Trellis: the sequence Y A B B B X B C aligned against the states A, B, C, one state per row]

$p(\text{YABBBXBC}) = p_2(Y)\, a_{22}\, p_2(A)\, a_{23}\, p_3(B) \cdots a_{34}\, p_4(C)$
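The product above generalises to any state sequence. A minimal sketch, assuming numpy-style arrays for A and p as in the earlier sketches, 0-indexed states and symbols, and ignoring entry/exit transitions for brevity:

```python
# Probability of a sequence jointly with one particular state sequence:
# the product of emission and transition terms, as in the expression above.
# States and symbols are 0-indexed; entry/exit transitions omitted for brevity.
def path_prob(symbols, states, A, p):
    prob = p[states[0], symbols[0]]             # first emission
    for t in range(1, len(symbols)):
        prob *= A[states[t - 1], states[t]]     # transition a_{jk}
        prob *= p[states[t], symbols[t]]        # emission p_k(symbol)
    return prob
```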

Slide 14

State-symbol trellis

[State-symbol trellis: states A, B, C against the symbols Y A B B B X B C]

Rule: connect state j at symbol m with state k at symbol m+1 if $a_{jk} > 0$

Slide 15

More examples

Slide 16

Dynamic Programming

[State-symbol trellis: states A, B, C against the symbols Y A B B B X B C, with the DP recursion evaluated at one trellis cell]

$\alpha_k(m) = \max_j \left\{ \alpha_j(m-1)\, a_{jk} \right\} p_k(s_m)$

Slide 17

Formal Definition

A Hidden Markov Model (HMM) for the symbols 1, 2, …, K consists of:

– A number of states N

– An N × N state transition probability matrix A

– For each state j, a set of probabilities $p_j(1), \dots, p_j(K)$, where $p_j(k)$ is the probability that symbol k occurs in state j

Slide 18

Alignment paths for HMMs

For HMMs, alignment paths are called state sequences

[Trellis: the sequence Y A B B B X B C aligned against the states A, B, C, with the state sequence marked]

$p(\text{YABBBXBC}) = p_2(Y)\, a_{22}\, p_2(A)\, a_{23}\, p_3(B) \cdots a_{34}\, p_4(C)$

Slide 19

The optimal state sequence

Let M be an HMM and s a sequence. The probability on the previous slide depends on the state sequence x and the model, so we write $p(s, x \mid M)$.

By analogy with dynamic programming, the optimal state sequence $\hat{x}$ is the sequence such that:

$\hat{x} = \arg\max_x p(s, x \mid M)$, or $p(s, \hat{x} \mid M) = \max_x p(s, x \mid M)$
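This definition can be evaluated literally by enumerating every state sequence, which is exponential in the length of s; the Viterbi decoding on the following slides computes the same maximum efficiently. A brute-force sketch, reusing path_prob from the earlier sketch:

```python
from itertools import product

# Literal evaluation of the definition: enumerate every state sequence x,
# score p(s, x | M) with path_prob (from the earlier sketch), keep the best.
# Exponential in len(s) -- which is why the Viterbi DP that follows matters.
def best_path_brute_force(symbols, A, p):
    best_prob, best_x = 0.0, None
    for x in product(range(A.shape[0]), repeat=len(symbols)):
        prob = path_prob(symbols, x, A, p)
        if prob > best_prob:
            best_prob, best_x = prob, x
    return best_x, best_prob
```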

Slide 20

Computing the optimal state sequence: the ‘state-symbol’ trellis

[State-symbol trellis: states A, B, C against the symbols Y A B B B X B C]

Rule: connect state j at symbol m with state k at symbol m+1 if $a_{jk} > 0$

Slide 21

More examples

Slide 22

Dynamic Programming, a.k.a. Viterbi Decoding

[State-symbol trellis: states A, B, C against the symbols Y A B B B X B C, with the optimal state sequence traced]

$\alpha_k(m) = \max_j \left\{ \alpha_j(m-1)\, a_{jk} \right\} p_k(s_m), \qquad \hat{p}(s) = p(s, \hat{x} \mid M) = \max_k \alpha_k(T)$

where T is the length of s
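A sketch of Viterbi decoding implementing the recursion above, assuming a uniform initial state distribution and omitting explicit entry/exit states; alpha[k, m] holds the probability of the best partial state sequence ending in state k after the first m+1 symbols.

```python
import numpy as np

# Viterbi decoding for the recursion above. alpha[k, m] is the probability
# of the best partial state sequence that ends in state k after emitting
# the first m+1 symbols; back[k, m] records the best predecessor state.
# Assumes a uniform initial state distribution; entry/exit states omitted.
def viterbi(symbols, A, p):
    N, T = A.shape[0], len(symbols)
    alpha = np.zeros((N, T))
    back = np.zeros((N, T), dtype=int)
    alpha[:, 0] = p[:, symbols[0]] / N
    for m in range(1, T):
        for k in range(N):
            scores = alpha[:, m - 1] * A[:, k]      # alpha_j(m-1) * a_jk
            back[k, m] = int(np.argmax(scores))
            alpha[k, m] = scores[back[k, m]] * p[k, symbols[m]]
    best_final = int(np.argmax(alpha[:, -1]))
    best_prob = float(alpha[best_final, -1])
    x = [best_final]                                # trace the best path back
    for m in range(T - 1, 0, -1):
        x.append(int(back[x[-1], m]))
    return list(reversed(x)), best_prob
```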

Slide 23

Sequence retrieval using HMMs

Corpus of pre-built HMMs

‘query’ sequence Q

…BBCCDDDGDGDGDCDTCDTTDCCC…

Viterbi Decoding

Calculate p(Q|M) for each HMM M in the corpus, then select

$\hat{M} = \arg\max_M p(Q \mid M)$
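The retrieval loop then mirrors the DP version from slide 3, with the alignment distance replaced by a model score; here p(Q|M) is approximated by the best-path (Viterbi) probability, reusing the viterbi and HMM sketches above.

```python
# HMM-based retrieval, mirroring the DP retrieval of slide 3: score the
# query against every model and keep the best. p(Q|M) is approximated by
# the best-path (Viterbi) probability from the sketch above.
def retrieve_hmm(models, query_symbols):
    """M_hat = argmax over models M of p(Q | M)."""
    return max(models, key=lambda m: viterbi(query_symbols, m.A, m.p)[1])
```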

[Slides 24–27: no transcribed content]

Slide 28

HMM Construction

Suppose we have a set of HMMs, each representing a different class (e.g. protein sequence)

Given an unknown sequence s:

– Use Viterbi decoding to compare s with each HMM

– Compute $\hat{p}(s \mid M) = p(s, \hat{x} \mid M)$

But how do we obtain the HMMs in the first place?

Slide 29

HMM training

Given a set of example sequences S, an HMM M can be built such that p(S|M) is locally maximised.

The procedure is as follows:

– Obtain an initial estimate of a suitable model M0

– Apply the ‘Baum-Welch’ algorithm to obtain a new model M1 such that p(S|M1) ≥ p(S|M0)

– Repeat to produce a sequence of HMMs M0, M1, …, Mn with:

p(S|M0) ≤ p(S|M1) ≤ p(S|M2) ≤ … ≤ p(S|Mn)
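A skeleton of this training loop in Python; baum_welch_step and corpus_likelihood are hypothetical helpers standing in for one Baum-Welch re-estimation pass and for the evaluation of p(S|M), neither of which is derived in this lecture.

```python
# Skeleton of the training procedure above. baum_welch_step and
# corpus_likelihood are hypothetical helpers: one Baum-Welch re-estimation
# pass (guaranteed not to decrease p(S|M)) and the evaluation of p(S|M).
def train(sequences, model, max_iters=20, tol=1e-6):
    prev = corpus_likelihood(sequences, model)        # p(S | M_0)
    for _ in range(max_iters):
        model = baum_welch_step(sequences, model)     # M_i -> M_{i+1}
        curr = corpus_likelihood(sequences, model)    # p(S | M_{i+1})
        assert curr >= prev - 1e-12                   # monotonicity guarantee
        if curr - prev < tol:                         # converged (local maximum)
            break
        prev = curr
    return model
```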

Slide 30

Local optimality

[Plot: p(S|M) over the model sequence M0, M1, …, Mn, climbing to a local maximum of p(S|M) rather than the global maximum]

Slide 31

Summary

Hidden Markov Models

Importance of HMMs for sequence matching

Viterbi decoding

HMM training

Slide 32

Summary

Review of template matching

Hidden Markov Models

Dynamic programming for HMMs