Markov Chains - UTKweb.eecs.utk.edu/~cphill25/cs594_spring2016/Markov Chains.pdf · Markov’s...

Markov Chains J. Alfredo Blakeley-Ruiz



The Markov Chain

• Concept developed by Andrey Andreyevich Markov

• A set of states, exactly one of which the system occupies at any moment in time.

• These states are connected by transitions.

• Each transition carries the probability of moving from one state to another in a single discrete time step.

• The next state depends only on the current state, not on any states that came before it. This is the Markov property.
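The Markov property can be made concrete with a short Python sketch. It assumes a hypothetical two-state chain; the state names "A" and "B" and their probabilities are illustrative and do not come from the slides. Each next state is sampled using only the current state's row of transition probabilities.

```python
import random

# Hypothetical two-state chain (names and probabilities are illustrative only).
# TRANSITIONS[s][t] = probability of moving from state s to state t in one step.
TRANSITIONS = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.5, "B": 0.5},
}


def step(state, rng):
    """Sample the next state; only the current state's row is consulted (Markov property)."""
    targets = list(TRANSITIONS[state])
    weights = [TRANSITIONS[state][t] for t in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


def simulate(start, n_steps, seed=0):
    """Return the simulated path [x_0, x_1, ..., x_n_steps]."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(step(path[-1], rng))
    return path


if __name__ == "__main__":
    print(simulate("A", 10))
```

Nothing in `step` looks at earlier entries of `path`. That is exactly the "depends only on the current state" condition.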


Explained Mathematically

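Given a series of states X_1, X_2, …, X_n, the process is a Markov chain as long as

$$P(X_{n+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_{n+1} = x \mid X_n = x_n)$$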


Markov chains can be represented as a weighted digraph: each state is a vertex, and each edge is weighted by the probability of that transition.

http://www.mathcs.emory.edu/~cheung/Courses/558/Syllabus/00/queueing/discrete-Markov.html


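Markov chains can also be represented by a transition matrix. Entry (i, j) is the probability of moving from state i to state j, and each row sums to 1.

from \ to   1     2     3     4
1           0.6   0.2   0.0   0.2
2           0.6   0.4   0.0   0.0
3           0.0   0.3   0.4   0.3
4           0.3   0.0   0.0   0.7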
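The Python sketch below shows both representations from the slides. The matrix is stored as a list of rows, and the weighted digraph is recovered as the list of its non-zero entries. It also pushes a starting distribution forward two time steps. The helper names `is_stochastic`, `edges`, and `step_distribution` are hypothetical.

```python
# Transition matrix from the slide: P[i][j] = probability of moving from state i+1 to state j+1.
P = [
    [0.6, 0.2, 0.0, 0.2],
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.3, 0.4, 0.3],
    [0.3, 0.0, 0.0, 0.7],
]


def is_stochastic(matrix, tol=1e-9):
    """Every row of a transition matrix must be non-negative and sum to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) < tol for row in matrix
    )


def edges(matrix):
    """Weighted digraph view: one edge (from, to, weight) per non-zero entry, 1-based labels."""
    return [
        (i + 1, j + 1, p)
        for i, row in enumerate(matrix)
        for j, p in enumerate(row)
        if p > 0
    ]


def step_distribution(dist, matrix):
    """One time step: new[j] = sum over i of dist[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    print(is_stochastic(P))
    print(edges(P))
    dist = [1.0, 0.0, 0.0, 0.0]  # start in state 1
    for _ in range(2):
        dist = step_distribution(dist, P)
    print(dist)  # distribution over the four states after two steps
```

Column 3 of this matrix is zero except for state 3's own self-loop. So a chain that leaves state 3 can never return to it, and the edge list makes that easy to see.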


Peter the Great

• Lived 1672-1725
• Tsar: 1682-1725
• Ushered Russia into the modern era
  – Changed fashions (beard tax)
  – Modernized the Russian military
  – Moved the capital to St. Petersburg
  – Created a meritocracy
  – Founded the Academy of Sciences in St. Petersburg
• Imported many foreign experts (Euler)
• Later, native Russians began to make their mark
  – Nikolai Lobachevsky: non-Euclidean geometry
  – Pafnuty Chebyshev: Markov's advisor
• Some not-so-good things
  – Huge wars of attrition with the Ottomans and the Swedes
  – Meritocracy only for the nobles and clergy
  – New tax laws turned the serfs into slaves

https://en.wikipedia.org/wiki/Peter_the_Great


Jacob Bernoulli

• Born: Basel, Switzerland
  – 1654-1705
• University of Basel
• Theologian, astronomer, and mathematician
  – Supporter of Leibniz in the calculus controversy
  – Credited with a huge list of mathematical contributions
• Discovery of the constant e
• Bernoulli numbers
• Bernoulli's golden theorem: the law of large numbers


The law of large numbers

• The relative frequency h_nt of an event with probability p = r/t, where t = r + s (r favorable and s unfavorable cases), in nt independent trials converges (in probability) to p as n grows.

• This is also often called the weak law of large numbers.

https://en.wikipedia.org/wiki/Law_of_large_numbers
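Here is a quick simulation of the law of large numbers in Python. It assumes hypothetical counts r = 3 favorable and s = 7 unfavorable cases, so p = r / (r + s) = 0.3. As the number of independent trials grows, the relative frequency settles near p. The function name `relative_frequency` is hypothetical.

```python
import random


def relative_frequency(p, n_trials, seed=0):
    """Fraction of successes in n_trials independent trials that each succeed with probability p."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n_trials) if rng.random() < p)
    return successes / n_trials


if __name__ == "__main__":
    r, s = 3, 7          # hypothetical counts of favorable and unfavorable cases
    p = r / (r + s)      # p = r / t with t = r + s
    for n in (10, 100, 1_000, 10_000, 100_000):
        freq = relative_frequency(p, n)
        print(f"n={n:>7}  frequency={freq:.4f}  |frequency - p|={abs(freq - p):.4f}")
```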


Adolphe Quetelet

• Born: Ghent, French Republic
  – 1796-1874

• Alma mater: University of Ghent
• Brussels Observatory
  – Astronomer
  – Statistician
  – Mathematician
  – Sociologist

• From a sociological perspective, he concluded that while free will was real, the law of large numbers demonstrated that it didn't matter.

https://en.wikipedia.org/wiki/Adolphe_Quetelet


Pafnuty Chebyshev

• Born in Akatovo, Russian Empire

– 1821-1894

• Alma mater: Moscow University

• Professor: St. Petersburg University

  – Students included Andrey Markov and Aleksandr Lyapunov

• Chebyshev inequality

  – Used to prove the weak law of large numbers

https://en.wikipedia.org/wiki/Pafnuty_Chebyshev


Chebyshev inequality

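• Let X_1, …, X_n be independent and identically distributed random variables with mean $\mu$ and standard deviation $\sigma$.

• Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

• $E[\bar{X}] = \mu$

• $\mathrm{Var}(\bar{X}) = \sigma^2 / n$

• Chebyshev's inequality states that for every $\varepsilon > 0$,

$$P(|\bar{X} - \mu| \ge \varepsilon) \le \frac{\mathrm{Var}(\bar{X})}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2}$$

so

$$\lim_{n\to\infty} P(|\bar{X} - \mu| \ge \varepsilon) \le \lim_{n\to\infty} \frac{\sigma^2}{n\varepsilon^2} = 0$$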
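The Python sketch below compares the Chebyshev bound σ²/(nε²) with a Monte Carlo estimate of P(|X̄ − μ| ≥ ε). It assumes Bernoulli(p) variables with hypothetical values p = 0.5 and ε = 0.1. The names `chebyshev_bound` and `empirical_tail` are hypothetical.

```python
import random


def chebyshev_bound(sigma, n, eps):
    """Chebyshev upper bound sigma^2 / (n * eps^2) on P(|sample mean - mu| >= eps)."""
    return sigma ** 2 / (n * eps ** 2)


def empirical_tail(p, n, eps, repeats=1000, seed=0):
    """Monte Carlo estimate of P(|mean of n Bernoulli(p) draws - p| >= eps)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        mean = sum(rng.random() < p for _ in range(n)) / n
        # Small tolerance so that e.g. 0.6 - 0.5 still counts as >= 0.1 in floating point.
        if abs(mean - p) >= eps - 1e-12:
            hits += 1
    return hits / repeats


if __name__ == "__main__":
    p, eps = 0.5, 0.1
    sigma = (p * (1 - p)) ** 0.5  # standard deviation of a single Bernoulli(p) variable
    for n in (10, 100, 1000):
        print(n, round(chebyshev_bound(sigma, n, eps), 4), empirical_tail(p, n, eps))
```

For n = 10 the bound is 2.5, which says nothing about a probability. The bound only becomes informative as n grows, and it is usually loose compared with the simulated frequency.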


Pavel Nekrasov

• Lived: 1853-1923

• University of Moscow

• Monarchist and supporter of the Orthodox Church

• Attempted to use the LLN to prove free will

• Gave a proof of the LLN for independent random variables.

• His 1902 paper on these two topics inspired Markov to invent Markov chains.

http://bit-player.org/wp-content/extras/markov/#/33


Andrey Andreyevich Markov

• Born: Ryazan, Russian Empire
  – 1856-1922

• Alma mater: St. Petersburg University
  – Advisor: Pafnuty Chebyshev

• Professor: St. Petersburg University

• Notable achievements
  – Published more than 120 papers
  – Markov inequality (which some sources call Chebyshev's inequality)
  – Invention of the Markov chain
  – Proving that the law of large numbers can apply to dependent random variables

• Markov had a prickly personality


“I note with astonishment that in the book of A. A. Chuprov, Essays on the Theory of Statistics, on page 195, P. A. Nekrasov, whose work in recent years represents an abuse of mathematics, is mentioned next to Chebyshev.”


The Feud

• Nekrasov and Markov were opposites in every respect

  – Moscow vs. St. Petersburg

  – Monarchist vs. anti-Tsarist

  – Religious vs. secularist

• These differences played as large a role as the mathematics itself in how the two mathematicians came to be perceived


Nekrasov's argument for free will (1902 paper)

• Nekrasov disagreed with Quetelet
• Used Chebyshev's inequality to prove the law of large numbers for specific independent variables
• Showed that independent variables imply the law of large numbers
• Conjectured that independence was necessary for the law of large numbers
• Conjectured that voluntary acts can be treated like independent trials in probability theory
• Stated that the law of large numbers had been shown to hold for social behaviors
• Argued that this was proof of free will


Markov’s counter argument

• Markov did not care about the philosophical arguments and was only interested in the mathematics

• Nekrasov and others assumed that the law of large numbers applied only to independent events

• Markov invented Markov chains and used them to prove that the law of large numbers can also apply to dependent events


Markov’s 1906-1907 paper

Groundwork

• In his first paper on Markov chains, Markov considered two states.

• A simple chain was an infinite series $x_1, x_2, \ldots, x_k, x_{k+1}, \ldots$
  – where k is the current time and $x_k$ is the state at time k.

• For any k, $x_{k+1}$ is independent of $x_1, x_2, \ldots, x_{k-1}$ given that $x_k$ is known.

• The chain was also time-homogeneous: the distribution of $x_{k+1}$ given $x_k$ does not depend on k.


Some Variables

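• $P_{\alpha,\beta} = P(x_{k+1} = \beta \mid x_k = \alpha)$: probability that $x_{k+1} = \beta$, given $x_k = \alpha$

• $P_\beta^{k+1} = \sum_\alpha P_\alpha^{k} P_{\alpha,\beta}$, where $P_\alpha^{k} = P(x_k = \alpha)$: probability of being in state $\beta$ at time k+1

• $a_i = E[x_i]$: expected value of $x_i$, not conditioned on any earlier state

• $A_\gamma^{i} = E[x_{k+i} \mid x_k = \gamma]$: expected value of $x_{k+i}$, given $x_k = \gamma$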
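These definitions can be computed directly, as this Python sketch shows. It assumes a hypothetical two-state chain whose states take the numeric values 0 and 1, with illustrative transition probabilities. `next_distribution` applies the recurrence $P_\beta^{k+1} = \sum_\alpha P_\alpha^{k} P_{\alpha,\beta}$, and `conditional_expectation` computes $A_\gamma^{i}$ by pushing a point mass at γ forward i steps. Both names are hypothetical.

```python
# Hypothetical two-state chain whose states take the numeric values 0 and 1.
VALUES = [0.0, 1.0]
P = [
    [0.7, 0.3],  # P[alpha][beta] = P(x_{k+1} = beta | x_k = alpha)
    [0.4, 0.6],
]


def next_distribution(dist):
    """P_beta^{k+1} = sum over alpha of P_alpha^k * P[alpha][beta]."""
    n = len(P)
    return [sum(dist[a] * P[a][b] for a in range(n)) for b in range(n)]


def conditional_expectation(gamma, i):
    """A_gamma^i = E[x_{k+i} | x_k = gamma], by pushing a point mass forward i steps."""
    dist = [1.0 if s == gamma else 0.0 for s in range(len(P))]
    for _ in range(i):
        dist = next_distribution(dist)
    return sum(v * q for v, q in zip(VALUES, dist))


if __name__ == "__main__":
    for i in (1, 2, 5, 10, 20):
        print(i, conditional_expectation(0, i), conditional_expectation(1, i))
```

Both columns approach the same value, 3/7, whichever state the chain starts in.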


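Markov's Theorem

Theorem: For a chain with a positive transition matrix, all the numbers $a_{k+1}$ and $A_\gamma^{i}$ have the same limit, from which they differ by numbers smaller than $\Delta_i$. At the same time, $\Delta_i < C H^{i}$, where C and H are constants and $0 < H < 1$.

This theorem shows that the influence of the starting state dies out geometrically fast: the expected values converge to a common limit whatever the initial state is. Markov used this to show that the law of large numbers holds not only for independent variables but also for the dependent variables of a Markov chain.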
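The geometric bound can be checked numerically with a small Python sketch. It assumes a hypothetical positive two-state chain (every entry greater than 0) with states valued 0 and 1. The code computes $\Delta_i = \max_\gamma |A_\gamma^{i} - \text{limit}|$ and the ratio between successive values. The names `push`, `mean`, and `deltas` are hypothetical.

```python
# Hypothetical positive two-state chain (every entry > 0); the states take the values 0 and 1.
P = [[0.7, 0.3],
     [0.4, 0.6]]
VALUES = [0.0, 1.0]


def push(dist):
    """Advance a distribution one step: new[b] = sum over a of dist[a] * P[a][b]."""
    return [sum(dist[a] * P[a][b] for a in range(2)) for b in range(2)]


def mean(dist):
    """Expected value of the state under the distribution dist."""
    return sum(v * q for v, q in zip(VALUES, dist))


def deltas(n_steps):
    """Delta_i = max over starting states gamma of |A_gamma^i - limit|, for i = 1..n_steps."""
    # For two states, the stationary probability of state 1 is P[0][1] / (P[0][1] + P[1][0]);
    # since the states are valued 0 and 1, this is also the limiting mean.
    limit = P[0][1] / (P[0][1] + P[1][0])
    dists = [[1.0, 0.0], [0.0, 1.0]]  # point masses: x_k = 0 and x_k = 1
    out = []
    for _ in range(n_steps):
        dists = [push(d) for d in dists]
        out.append(max(abs(mean(d) - limit) for d in dists))
    return out


if __name__ == "__main__":
    ds = deltas(8)
    ratios = [later / earlier for earlier, later in zip(ds, ds[1:])]
    print(ds)
    print(ratios)
```

For a two-state chain, each step shrinks the distance to the limit by exactly 1 − P[0][1] − P[1][0], which is 0.3 here. So $\Delta_i = \tfrac{4}{7}\cdot 0.3^{i}$, consistent with $\Delta_i < C H^{i}$ for H = 0.3 and any C > 4/7.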


“I am concerned only with questions of pure analysis.... I refer to the question of the applicability of probability theory with indifference.”


Markov’s 1913 Paper

Markov’s experiment on Pushkin’s Eugene Onegin

Hayes, B 2013
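Hayes (2013) describes how Markov classified the first 20,000 letters of Pushkin's Eugene Onegin as vowels or consonants. He then counted how often each kind of letter followed the other. The Python sketch below estimates the same kind of 2×2 transition matrix from any string. It assumes English vowels (a, e, i, o, u) because Markov worked with Russian text, and it assumes non-letters are skipped. The function name `vowel_consonant_chain` is hypothetical.

```python
VOWELS = set("aeiou")  # assumption: English vowels; Markov's analysis used Russian letters


def vowel_consonant_chain(text):
    """Estimate a 2x2 transition matrix between 'V' (vowel) and 'C' (consonant) letters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    kinds = ["V" if ch in VOWELS else "C" for ch in letters]
    counts = {("V", "V"): 0, ("V", "C"): 0, ("C", "V"): 0, ("C", "C"): 0}
    for prev, cur in zip(kinds, kinds[1:]):
        counts[(prev, cur)] += 1
    matrix = {}
    for prev in ("V", "C"):
        total = counts[(prev, "V")] + counts[(prev, "C")]
        # Row of transition probabilities out of `prev` (all zeros if `prev` never occurs).
        matrix[prev] = {
            cur: (counts[(prev, cur)] / total if total else 0.0) for cur in ("V", "C")
        }
    return matrix


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"  # hypothetical sample text
    print(vowel_consonant_chain(sample))
```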


Simple Markov chains can be used to build a basic weather prediction model

Hayes, B 2013


Simple Markov Chain in Economics

Recession prediction (ng = normal growth, mr = mild recession, sr = severe recession)

from \ to   ng      mr      sr
ng          0.971   0.029   0
mr          0.145   0.778   0.077
sr          0       0.508   0.492

http://quant-econ.net/jl/finite_markov.html

Social class mobility prediction

from \ to      poor   middle class   rich
poor           0.9    0.1            0
middle class   0.4    0.4            0.2
rich           0.1    0.1            0.8
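A common question to ask of these two economic chains is their long-run behavior, that is, the stationary distribution π satisfying π = πP. The Python sketch below finds it by power iteration starting from the uniform distribution. The function name `stationary` and its tolerance settings are assumptions made for illustration.

```python
def stationary(matrix, tol=1e-12, max_iter=100_000):
    """Stationary distribution pi = pi P by power iteration from the uniform distribution."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist


RECESSION = [  # states: ng, mr, sr
    [0.971, 0.029, 0.0],
    [0.145, 0.778, 0.077],
    [0.0, 0.508, 0.492],
]
MOBILITY = [  # states: poor, middle class, rich
    [0.9, 0.1, 0.0],
    [0.4, 0.4, 0.2],
    [0.1, 0.1, 0.8],
]

if __name__ == "__main__":
    print("recession:", [round(p, 4) for p in stationary(RECESSION)])
    print("mobility:", [round(p, 4) for p in stationary(MOBILITY)])
```

For the mobility matrix, π = πP can be solved by hand. The rich equation gives π_rich = π_middle, and the middle-class equation then gives π_poor = 5·π_middle, so π = (5/7, 1/7, 1/7). This gives a check on the iteration.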


Chemical Kinetics as a Stochastic Process

• a ⇌ b (forward rate constant k1, backward rate constant k2)

  – k1: rate at which a transitions to b

  – k2: rate at which b transitions to a

• Modeled by a linear chain in which each state is a different number of a and b molecules
  – $x_1, x_2, \ldots, x_k, x_{k+1}$
  – If at state $x_k$ there are 50 molecules of a and 0 molecules of b, what will the state be at $x_{k+1}$?


Chemical Kinetics Example

• a0=5

• b0=0

• 20 iterations

• k1= 1

• k2= 1


To emphasize randomness


Chemical Kinetics Example

• a0 = 50

• b0 = 0

• 100 iterations

• k1= 1

• k2= 1


Chemical Kinetics Example

• a0 = 50

• b0 = 0

• 100 iterations

• k1= 5

• k2= 1
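The slides do not say how each iteration of these chemical-kinetics runs was simulated. The Python sketch below assumes that each iteration fires exactly one reaction, chosen with probability proportional to its propensity: k1·a for a → b and k2·b for b → a. This is the jump chain of Gillespie's stochastic simulation algorithm with the waiting times left out. The function name `simulate_isomerization` is hypothetical.

```python
import random


def simulate_isomerization(a0, b0, k1, k2, n_steps, seed=0):
    """Jump chain of a <-> b: each step fires exactly one reaction.

    a -> b is chosen with probability k1*a / (k1*a + k2*b); otherwise b -> a fires.
    Returns the list of (a, b) molecule counts, starting with (a0, b0).
    """
    rng = random.Random(seed)
    a, b = a0, b0
    history = [(a, b)]
    for _ in range(n_steps):
        forward = k1 * a   # propensity of a -> b
        backward = k2 * b  # propensity of b -> a
        total = forward + backward
        if total == 0:
            break  # no molecules left to react
        if rng.random() < forward / total:
            a, b = a - 1, b + 1
        else:
            a, b = a + 1, b - 1
        history.append((a, b))
    return history


if __name__ == "__main__":
    # Parameter sets from the example slides: (a0, b0, iterations, k1, k2).
    for a0, b0, steps, k1, k2 in [(5, 0, 20, 1, 1), (50, 0, 100, 1, 1), (50, 0, 100, 5, 1)]:
        final = simulate_isomerization(a0, b0, k1, k2, steps)[-1]
        print(f"a0={a0}, k1={k1}, k2={k2}: final (a, b) = {final}")
```

Changing `seed` gives a different trajectory for the same parameters. This is the randomness the example plots emphasize.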


Other applications of simple Markov chains

• Genetic drift

• Google page rank algorithm

• Social sciences

• Games

– Snakes and ladders

– Monopoly


Hidden Markov Model

• So far we have discussed observable Markov Models

• Sometimes the underlying stochastic process in a system cannot be observed

• Observations resulting from the process can be used to infer the underlying stochastic process

• Developed by Leonard E. Baum and coworkers


Hidden Markov Model

• X(t) = state at time t

• $X(t) \in \{x_1, \ldots, x_n\}$
  – n = number of unobservable states

• $Y(t) \in \{y_1, \ldots, y_k\}$
  – k = number of possible observations

• $P(X_{1:T}, Y_{1:T}) = P(x_1)\,P(y_1 \mid x_1) \prod_{t=2}^{T} P(x_t \mid x_{t-1})\,P(y_t \mid x_t)$

https://en.wikipedia.org/wiki/Hidden_Markov_model
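The Python sketch below evaluates the joint probability formula term by term. It uses illustrative parameters for a two-state weather HMM with three observable activities. The values are modeled on the Wikipedia example the slides cite, but treat them as assumptions. The names `START`, `TRANS`, `EMIT`, and `joint_probability` are hypothetical.

```python
# Illustrative HMM parameters (assumed; modeled on the Wikipedia example cited on the slides).
START = {"Rainy": 0.6, "Sunny": 0.4}                      # P(x_1)
TRANS = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},           # P(x_t | x_{t-1})
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
EMIT = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},  # P(y_t | x_t)
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}


def joint_probability(states, observations):
    """P(X_{1:T}, Y_{1:T}) = P(x1) P(y1|x1) * prod over t >= 2 of P(x_t|x_{t-1}) P(y_t|x_t)."""
    if len(states) != len(observations) or not states:
        raise ValueError("need equally long, non-empty state and observation sequences")
    prob = START[states[0]] * EMIT[states[0]][observations[0]]
    for t in range(1, len(states)):
        prob *= TRANS[states[t - 1]][states[t]] * EMIT[states[t]][observations[t]]
    return prob


if __name__ == "__main__":
    # 0.6 * 0.1 * 0.3 * 0.3
    print(joint_probability(["Rainy", "Sunny"], ["walk", "shop"]))
```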


Simple Example HMM

• Bob and Alice talk on the phone every day

• Alice cannot see the weather where Bob is.

• Bob only likes to talk about three activities, and which one he does depends on the weather

• The weather's Markov model can be inferred from Alice's observations.

https://en.wikipedia.org/wiki/Hidden_Markov_model
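The slide does not say how Alice infers the weather. One standard choice is the Viterbi algorithm, which finds the single most likely hidden state sequence for a given observation sequence. The Python sketch below uses the same kind of illustrative parameters as above, modeled on the cited Wikipedia example and treated as assumptions. The function name `viterbi` and the parameter names are hypothetical.

```python
# Illustrative HMM parameters (assumed; modeled on the Wikipedia example cited on the slides).
START = {"Rainy": 0.6, "Sunny": 0.4}
TRANS = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
EMIT = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
        "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}


def viterbi(observations):
    """Return (probability, path) of the most likely hidden state sequence."""
    states = list(START)
    # best[s] = (probability, path) of the best path that ends in state s
    best = {s: (START[s] * EMIT[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Pick the predecessor that maximizes (best path probability * transition probability).
            prob, path = max(
                ((best[prev][0] * TRANS[prev][s], best[prev][1]) for prev in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * EMIT[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


if __name__ == "__main__":
    print(viterbi(["walk", "shop", "clean"]))
```

With these parameters, the observations walk, shop, clean decode to Sunny, Rainy, Rainy.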


Left-to-Right HMM

• State transitions have the property $a_{ij} = 0$ for $j < i$

  – No transitions are allowed to states whose indices are lower than the current state

• Transitions can be further restricted so that $a_{ij} = 0$ for $j > i + \Delta$, which forbids large jumps forward.
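Both constraints together make the transition matrix upper-triangular with a band of width Δ. The Python sketch below builds such a matrix. It assumes, for illustration only, that each state spreads its probability uniformly over the states it is allowed to reach. The function name `left_right_transitions` is hypothetical.

```python
def left_right_transitions(n_states, delta):
    """Hypothetical left-to-right matrix: a_ij = 0 for j < i and for j > i + delta.

    Each row spreads its probability uniformly over the allowed targets (an illustrative assumption).
    """
    matrix = []
    for i in range(n_states):
        allowed = [j for j in range(n_states) if i <= j <= i + delta]
        row = [0.0] * n_states
        for j in allowed:
            row[j] = 1.0 / len(allowed)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for row in left_right_transitions(5, 2):
        print(["%.2f" % p for p in row])
```

The last state can only transition to itself, so it is absorbing. This matches the idea that a left-to-right model ends in a final state.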


Single Word Speech Recognition

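• V words to be identified
  – Each word is modeled by a distinct HMM

• k occurrences of each word, spoken by one or more talkers, form the training set.

• $\lambda_v$ = HMM for word v of the vocabulary

• $O = (O_1, O_2, \ldots, O_n)$ = observation sequence of the word to be recognized

• Compute $P(O \mid \lambda_v)$ for $1 \le v \le V$

• $v^* = \arg\max_{1 \le v \le V} P(O \mid \lambda_v)$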

http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
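The slide gives the decision rule but not how $P(O \mid \lambda_v)$ is computed. The Rabiner tutorial linked above uses the forward algorithm for this. The Python sketch below applies it with two hypothetical two-state left-to-right word models, labeled "yes" and "no", over a toy observation alphabet {0, 1}. All model parameters and the names `forward_likelihood`, `MODELS`, and `recognize` are illustrative assumptions.

```python
def forward_likelihood(model, observations):
    """P(O | lambda) via the forward algorithm; model = (pi, A, B) with discrete symbols."""
    pi, A, B = model
    n = len(pi)
    # alpha[i] = P(O_1..O_t, state_t = i | lambda)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs] for j in range(n)]
    return sum(alpha)


# Two hypothetical 2-state left-to-right word models over the symbols {0, 1}.
MODELS = {
    "yes": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    "no": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}


def recognize(observations):
    """v* = argmax over v of P(O | lambda_v)."""
    return max(MODELS, key=lambda word: forward_likelihood(MODELS[word], observations))


if __name__ == "__main__":
    print(recognize([0, 0, 1, 1]), recognize([1, 1, 0, 0]))
```

For observation sequences of realistic length, the α values underflow in floating point. Rabiner's tutorial discusses scaling to handle this.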


Identifying unknown proteins

• Homologous proteins are proteins that share a common ancestry (and likely a similar function).

  – Orthologs: homologous proteins separated by a speciation event

  – Paralogs: homologous proteins separated by a gene duplication event

• The sequences of known homologous proteins can be used to predict the function of unknown homologous proteins.


Identifying unknown proteins

• Global alignment
  – CD-HIT
  – UCLUST

• Local alignment
  – BLAST

• The problem is that sequence does not directly determine function; structure does.

http://drive5.com/usearch/manual/uclust_algo.html


Protein Homology Identification

• Protein domains represent functional subunits in a protein

• A single protein can be made up of one or more domains
  – We can use the domain architecture to predict remote homology


Protein Homology Identification

• Pfam and other domain databases identify domains as regions of highly conserved sequence

• They then collect hundreds of representative sequences and use them to build an HMM.

• This HMM can then be used to determine the domains found in an unknown protein.

• Function can be inferred from domain architecture


Example HMMs


Homework

• Given the transition matrix below:

  – Draw the representative digraph

  – Assuming we are in state 1 at t0, what is the probability that we will be in state 2 at t2?

  – Calculate the probability matrix for t2

  – At which t do the values converge so that the state at t0 no longer matters?

from \ to   1      2       3
1           0.9    0.075   0.025
2           0.15   0.8     0.05
3           0.5    0.25    0.25
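One way to check answers to the exercise is the Python sketch below. It computes P² for the two-step questions and then raises P to higher powers until all rows agree. The names `mat_mul` and `first_converged_step` are hypothetical. The tolerance of 1e-4 is an assumption, because the step at which the values "converge" depends on how many decimal places you require.

```python
HOMEWORK_P = [
    [0.9, 0.075, 0.025],
    [0.15, 0.8, 0.05],
    [0.5, 0.25, 0.25],
]


def mat_mul(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def first_converged_step(p, tol=1e-4, max_t=10_000):
    """Smallest t such that all rows of P^t agree to within tol (start state no longer matters)."""
    power = p
    for t in range(1, max_t + 1):
        if all(
            abs(power[r][c] - power[0][c]) < tol
            for r in range(len(p)) for c in range(len(p))
        ):
            return t, power
        power = mat_mul(power, p)
    raise RuntimeError("rows did not converge within max_t steps")


if __name__ == "__main__":
    p2 = mat_mul(HOMEWORK_P, HOMEWORK_P)  # transition matrix for t2
    print("P(state 2 at t2 | state 1 at t0) =", p2[0][1])
    t, pt = first_converged_step(HOMEWORK_P)
    print("rows agree to within 1e-4 at t =", t)
```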


References

1. Hayes, B. (2013). First Links in the Markov Chain. American Scientist, 101(2), 92.
2. Schneider, I. (2005). Jakob Bernoulli, Ars Conjectandi (1713). In Landmark Writings in Western Mathematics 1640-1940 (pp. 88-104). Amsterdam: Elsevier.
3. Kinchin, A. (1929). Sur la loi des grands nombres. Comptes rendus de l'Académie des Sciences, 189, 477-479.
4. Vucinich, A. (1960). Mathematics in Russian culture. Journal of the History of Ideas, 21(2), 161-179.
5. Seneta, E. (2003). Statistical regularity and free will: L. A. J. Quetelet and P. A. Nekrasov. International Statistical Review, 71, 319-334.
6. Grinstead, C. M., and Snell, J. L. (1997). Introduction to Probability. American Mathematical Society. http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/amsbook.mac.pdf
7. Basharin, G., Langville, A., and Naumov, V. (2004). The life and work of A. A. Markov. Linear Algebra and its Applications, 386, 3-26.
8. Rabiner, L. (1989). A tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2).
9. Baum, L., and Petrie, T. (1966). The Statistical Inference of Probabilistic Functions of Finite State Markov Chains. The Annals of Mathematical Statistics, 37(6), 1554-1563.
10. Finn, R., Bateman, A., Clements, J., et al. (2014). Pfam: the protein families database. Nucleic Acids Research, 42(D1), D222-D230.
11. Edgar, R. C. (2010). Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 26(19), 2460-2461.
12. Li, W., Jaroszewski, L., and Godzik, A. (2001). Clustering of highly homologous sequences to reduce the size of large protein database. Bioinformatics, 17, 282-283.
13. Wheeler, T., Clements, J., and Finn, R. (2014). Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models. BMC Bioinformatics, 15(7).


Questions?