7/31/2019 Probability and Markov Models
1/21
Timothy L. Bailey BIOL3014
1
Probability and Markov Models
Reading
Chapter 1 in the book.
Chapter 3 pages 46-51.
Definition: Random Process
A RANDOM PROCESS is something that has a random outcome:
Roll a die, flip a coin, roll 2 dice
Observe orthologous base pair in 2 seqs
Measure an mRNA level
Weigh a person
Definition: Experiment
In probability theory, an EXPERIMENT is a single observation of a random process.
Definition: Event
An EVENT is a set of possible outcomes of an experiment.
An ELEMENTARY EVENT is whatever you decide it is. For example:
The outcome of 1 roll of a die
The outcomes of n rolls of a die
The residue at position 237 in a protein
The residues at position 237 in a family of proteins
The weight of a person
Elementary events must be non-overlapping!
Compound Events
A COMPOUND EVENT is a set of one or more elementary events.
For example, you might define two compound events in a die-rolling experiment: E = roll less than 3, F = roll greater than or equal to 3.
Then,
E = {1, 2} and F = {3, 4, 5, 6}.
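The die-rolling example can be sketched with Python sets. This is only an illustration: the names `outcomes`, `E` and `F` are chosen here to mirror the slide, and the die is assumed to have six sides.

```python
# Possible outcomes of one roll of a six-sided die (the elementary events).
outcomes = {1, 2, 3, 4, 5, 6}

# Compound events are sets of one or more elementary events.
E = {x for x in outcomes if x < 3}    # roll less than 3
F = {x for x in outcomes if x >= 3}   # roll greater than or equal to 3

print(sorted(E))          # [1, 2]
print(sorted(F))          # [3, 4, 5, 6]
print(E & F == set())     # True: E and F share no outcome
print(E | F == outcomes)  # True: together they cover every outcome
```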
Definition: Sample Space
The SAMPLE SPACE is the set of all ELEMENTARY EVENTS.
So the sample space is the universe of all possible outcomes of the experiment. This is written:
Ω = {E_i}
For example, for rolls of a die, you might have: Ω = {1, 2, 3, 4, 5, 6}
Discrete vs. Continuous Events
The sample space might be INFINITE. For example, the weight of a person can be any real number greater than 0.
Some events are DISCRETE: countable
Base pairs, residues, die rolls
Other events are CONTINUOUS: e.g., real numbers
Weights, alignment scores, mRNA levels
The Axioms of Probability
Let E and F be events. Then the axioms of probability are:
1. Pr(E) ≥ 0
2. Pr(Ω) = 1
3. Pr(E ∪ F) = Pr(E) + Pr(F) if E ∩ F = ∅ (the empty set)
4. Pr(E | F) Pr(F) = Pr(E ∩ F)
[Figure: Venn diagrams of E ∪ F and E ∩ F. Probability is like area in Venn diagrams.]
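As a quick check, the four axioms can be verified numerically for a single die roll. The sketch below assumes a fair six-sided die (an assumption not stated on the slide), and the helper names `pr` and `pr_given` and the event `G` are hypothetical.

```python
from fractions import Fraction

# Sample space for one roll of a die, assumed fair for this illustration.
sample_space = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Probability of an event (a set of outcomes): favourable / total."""
    return Fraction(len(event & sample_space), len(sample_space))

def pr_given(e, f):
    """Conditional probability Pr(E | F) = Pr(E and F) / Pr(F)."""
    return pr(e & f) / pr(f)

E = {1, 2}         # roll less than 3
F = {3, 4, 5, 6}   # roll greater than or equal to 3
G = {2, 3}         # hypothetical event that overlaps both E and F

print(pr(E) >= 0)                           # axiom 1
print(pr(sample_space) == 1)                # axiom 2
print(pr(E | F) == pr(E) + pr(F))           # axiom 3 (E and F are disjoint)
print(pr_given(G, F) * pr(F) == pr(G & F))  # axiom 4
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.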
Notation
Joint probability: Pr(E,F)
The probability of E and F
Conditional probability: Pr(E | F)
The probability of E given F
Conditional Probability and Bayes' Rule
Conditional probability can be defined as:
Pr(E | F) = Pr(E,F) / Pr(F)
Bayes' Rule can be used to reverse the roles of E and F:
Pr(F | E) = Pr(E | F) Pr(F) / Pr(E)
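A small worked example may help here. The sketch below again assumes a fair six-sided die; the events `E` (roll less than 3) and `F` (roll is even) and the helper names are chosen only for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a die, assumed fair

def pr(event):
    """Probability of an event (a set of outcomes) under a fair die."""
    return Fraction(len(event & sample_space), len(sample_space))

def pr_given(e, f):
    """Pr(E | F) = Pr(E,F) / Pr(F)."""
    return pr(e & f) / pr(f)

E = {1, 2}      # roll less than 3
F = {2, 4, 6}   # roll is even (hypothetical second event)

direct = pr_given(F, E)                    # Pr(F | E) from the definition
via_bayes = pr_given(E, F) * pr(F) / pr(E) # Pr(F | E) from Bayes' Rule

print(pr_given(E, F))   # 1/3
print(direct)           # 1/2
print(via_bayes)        # 1/2
```

Both routes give the same value, which is what Bayes' Rule promises.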
Sequence Models
Observed biological sequences (DNA, RNA, protein) can be thought of as the outcomes of random processes.
So, it makes sense to model sequences using probabilistic models.
You can think of a sequence model as a little machine that randomly generates sequences.
A Simple Sequence Model
Imagine a tetrahedral (four-sided) die with the letters A, C, G and T on its sides.
You roll the die 100 times and write down the letters that come up (down, actually).
This is a simple random sequence model.
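The die-rolling machine is easy to simulate. The sketch below assumes a fair die (each letter equally likely), which the slide does not state; the function name `roll_sequence` is hypothetical.

```python
import random

def roll_sequence(length=100, letters="ACGT", rng=None):
    """Roll a four-sided die `length` times and record the letters."""
    rng = rng or random.Random()
    # rng.choice picks each letter with equal probability (fair die assumed).
    return "".join(rng.choice(letters) for _ in range(length))

# A seeded generator makes the example repeatable.
seq = roll_sequence(rng=random.Random(0))
print(len(seq))   # 100
print(seq[:20])   # first 20 letters of the generated sequence
```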
Zero-order Markov Model
The four-sided die model is called a 0-order Markov model.
It can be drawn thus:
[Figure: 0-order Markov sequence model. State M with emission probabilities q_A, q_C, q_G, q_T and transition probability p = 1.]
Complete 0-order Markov Model
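To model the length of the sequences that the model can generate, we need to add start and end states.
[Figure: complete 0-order Markov sequence model. Start state S moves to state M with probability 1. State M emits a letter with probabilities q_A, q_C, q_G, q_T, then returns to M with probability p or moves to end state E with probability 1-p.]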
Generating a Sequence
This Markov model can generate any DNA sequence. Associated with each sequence is a path and a probability.
1. Start in state S: P = 1
2. Move to state M: P = 1 · P
3. Print x: P = q_x · P
4. Move to state M: P = p · P, or to state E: P = (1-p) · P
5. If in state M, go to 3. If in state E, stop.
Sequence: GCAGCT
Path: S, M, M, M, M, M, M, E
P = 1 · q_G · p · q_C · p · q_A · p · q_G · p · q_C · p · q_T · (1-p)
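The five steps above can be sketched in code. The emission probabilities `q` and the value of `p` below are hypothetical numbers chosen only for illustration, and the function names are not from the slides.

```python
import math
import random

def sequence_probability(seq, q, p):
    """Probability that the complete 0-order model generates exactly `seq`."""
    if not seq:
        return 0.0                 # the path S -> M always emits at least one letter
    prob = 1.0                     # steps 1-2: start in S, move to M with probability 1
    for i, x in enumerate(seq):
        prob *= q[x]               # step 3: print x
        if i < len(seq) - 1:
            prob *= p              # step 4: stay in M
    return prob * (1 - p)          # step 4: finally move to E

def generate(q, p, rng):
    """Generate one sequence by following steps 1-5."""
    letters = list(q)
    weights = [q[x] for x in letters]
    out = []
    while True:
        out.append(rng.choices(letters, weights=weights)[0])  # print a letter
        if rng.random() >= p:      # move to E with probability 1-p
            return "".join(out)

# Hypothetical parameters: uniform emissions, p = 0.9.
q = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
p = 0.9

prob = sequence_probability("GCAGCT", q, p)
print(math.isclose(prob, 0.25**6 * 0.9**5 * 0.1))   # True
print(generate(q, p, random.Random(0)))
```

With these parameters the expected sequence length is 1 / (1-p) = 10 letters, since the model leaves M with probability 1-p after each emission.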
Using a 0-order Markov Model
This model can generate any DNA sequence, so it can be used to model DNA.
We used it as the background model when we created scoring matrices for sequence alignment.
It's a pretty dumb model, though.
DNA is not very well modeled by a 0-order Markov model because the probability of seeing, say, a G following a C is usually different from the probability of seeing a G following an A (e.g., in CpG islands).
So we need better models: higher-order Markov models.
Markov Model Order
This simple sequence model is called a 0-order Markov model because the probability distribution of the next letter to be generated doesn't depend on any (zero) of the letters preceding it.
The Markov Property:
Let X = X_1 X_2 ... X_L be a sequence.
In an n-order Markov sequence model, the probability distribution of the next letter depends on the previous n letters generated.
0-order: Pr(X_i | X_1 X_2 ... X_{i-1}) = Pr(X_i)
1-order: Pr(X_i | X_1 X_2 ... X_{i-1}) = Pr(X_i | X_{i-1})
n-order: Pr(X_i | X_1 X_2 ... X_{i-1}) = Pr(X_i | X_{i-1} X_{i-2} ... X_{i-n})
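One way to read the Markov property is as a rule for which preceding letters the model is allowed to look at. The helper below is a hypothetical illustration (the name `context` is not from the slides) and uses 0-based positions, unlike the 1-based X_i above.

```python
def context(seq, i, n):
    """Letters that position i (0-based) may depend on in an n-order model.

    Returns the (at most) n letters immediately before position i,
    in their original left-to-right order.
    """
    return seq[max(0, i - n):i]

seq = "GCAGCT"
i = 4                        # the letter at this position is "C"
print(repr(context(seq, i, 0)))   # '': a 0-order model looks at nothing
print(repr(context(seq, i, 1)))   # 'G': a 1-order model sees one letter back
print(repr(context(seq, i, 2)))   # 'AG': a 2-order model sees two letters back
```

Near the start of the sequence fewer than n letters are available, so the returned context is shorter.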
A 1-order Markov Sequence Model
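In a first-order Markov sequence model, the probability of the next letter depends on what the previous letter generated was.
We can model this by making a state for each letter. Each state always emits the letter it is labeled with. (Not all transitions are shown.)
[Figure: first-order Markov sequence model with start state S, end state E, and one state for each of the letters A, T, G and C. Self-transitions are labeled Pr(A|A), Pr(T|T), Pr(G|G) and Pr(C|C); two of the transitions between states are labeled Pr(T|A) and Pr(C|G).]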
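A first-order model can be sketched as a table of transition probabilities between letter states. All numbers below are hypothetical, chosen only so that each row sums to 1, and the sketch ignores the end state E for simplicity, so it scores a sequence without modeling its length.

```python
import math

# Hypothetical probabilities, for illustration only.
# start[x] is the probability of moving from S to state x.
start = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# trans[prev][nxt] = Pr(nxt | prev); each row sums to 1.
trans = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.3, "C": 0.3, "G": 0.1, "T": 0.3},
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
}

def first_order_probability(seq):
    """Probability of `seq` under the first-order model (end state ignored)."""
    if not seq:
        raise ValueError("sequence must contain at least one letter")
    prob = start[seq[0]]
    # Each later letter depends only on the letter just before it.
    for prev, nxt in zip(seq, seq[1:]):
        prob *= trans[prev][nxt]
    return prob

# With these made-up numbers, G is less likely after C than after A.
print(first_order_probability("CG") < first_order_probability("AG"))   # True
```

A 0-order model cannot express this difference, because it assigns G the same probability regardless of the preceding letter.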
A 2-order Markov Model
To make a second-order Markov sequence model, each state is labelled with two letters. It emits the second letter in its label.
There would have to be sixteen states: AA, AC, AG, AT, CA, CC, CG, CT etc., plus four states for the first letter in the sequence: A, C, G, T
Each state would have transitions only to states whose first letter matches its own second letter.
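The state set and the allowed transitions can be enumerated directly. This is a sketch of the structure only: it assigns no probabilities, leaves out the start and end states, and the names `pair_states` and `successors` are hypothetical.

```python
from itertools import product

letters = "ACGT"

# Four states for the first letter of the sequence.
first_letter_states = list(letters)

# Sixteen two-letter states; each emits the second letter of its label.
pair_states = ["".join(pair) for pair in product(letters, repeat=2)]

def successors(state):
    """Two-letter states reachable from `state`.

    A transition is allowed only when the target's first letter
    equals the last letter of the current state's label.
    """
    return [s for s in pair_states if s[0] == state[-1]]

print(len(pair_states))    # 16
print(successors("CG"))    # ['GA', 'GC', 'GG', 'GT']
print(successors("A"))     # ['AA', 'AC', 'AG', 'AT']
```

Each state therefore has exactly four outgoing transitions to two-letter states, one per possible next letter.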
Part of a 2-order Model
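Each state remembers what the previous letter emitted was in its label.
[Figure: part of a 2-order model showing end state E and states AA, AT, AG and AC, with transitions labeled Pr(A|AA), Pr(T|AA) and Pr(G|AA).]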