CS273a Lecture 11, Aut 08, Batzoglou Multiple Sequence Alignment.

Post on 20-Dec-2015


CS273a Lecture 11, Aut 08, Batzoglou

Multiple Sequence Alignment


Index-based local alignment

Dictionary:
All words of length k (~10)
Alignment initiated between words of alignment score at least T (typically T = k)

Alignment:
Ungapped extensions until score falls below statistical threshold

Output:
All local alignments with score > statistical threshold

[Figure: query words scanned against the indexed DB]
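The seed-and-extend idea on this slide can be sketched in Python. This is a minimal illustration, not the lecture's implementation: the function names, the match/mismatch scores, and the `x_drop` and `min_score` cutoffs are all assumptions standing in for the statistical threshold, and seeds are exact word matches (the T = k case with a match score of 1).

```python
from collections import defaultdict

def build_index(db, k):
    """Dictionary: map every length-k word of db to its start positions."""
    index = defaultdict(list)
    for j in range(len(db) - k + 1):
        index[db[j:j + k]].append(j)
    return index

def extend(query, db, i, j, k, match=1, mismatch=-1, x_drop=3):
    """Ungapped extension of the exact seed query[i:i+k] == db[j:j+k].

    Each direction stops once the running score drops x_drop below the best
    score seen (an assumed stand-in for the statistical threshold).
    Returns (score, query_start, db_start, length).
    """
    def one_side(qi, dj, step):
        best = run = 0
        best_len = n = 0
        while 0 <= qi < len(query) and 0 <= dj < len(db):
            run += match if query[qi] == db[dj] else mismatch
            n += 1
            if run > best:
                best, best_len = run, n
            elif best - run >= x_drop:
                break
            qi += step
            dj += step
        return best, best_len

    right, r_len = one_side(i + k, j + k, +1)
    left, l_len = one_side(i - 1, j - 1, -1)
    return (k * match + left + right, i - l_len, j - l_len, l_len + k + r_len)

def local_alignments(query, db, k=4, min_score=6):
    """Scan the query against the index and keep extensions above min_score."""
    index = build_index(db, k)
    hits = set()  # different seeds on one diagonal extend to the same hit
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], ()):
            hit = extend(query, db, i, j, k)
            if hit[0] > min_score:
                hits.add(hit)
    return sorted(hits)

print(local_alignments("TTACGTACGGAT", "CCACGTACGGCC"))
# → [(8, 2, 2, 8)]
```

In the toy call, the shared word ACGTACGG is found from several overlapping seeds, all of which extend to the same ungapped hit.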

Question: Using an idea from overlap detection, is there a better way to find all local alignments between two genomes?


Local Alignments


After chaining


Chaining local alignments

1. Find local alignments

2. Chain: O(N log N) with L.I.S. (longest increasing subsequence)

3. Restricted DP
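Step 2 can be sketched as a longest increasing subsequence over anchor coordinates. This is a simplified sketch under stated assumptions: each local alignment is reduced to a single point (x, y), one coordinate per genome, and all anchors have equal weight. The name `chain_anchors` is illustrative only.

```python
from bisect import bisect_left

def chain_anchors(anchors):
    """Return a longest chain of (x, y) anchors strictly increasing in x and y.

    Runs in O(N log N): one sort plus one binary search per anchor.
    """
    # Sort by x; break ties by descending y so anchors sharing an x
    # can never both enter a strictly increasing chain in y.
    pts = sorted(anchors, key=lambda p: (p[0], -p[1]))
    tails = []      # tails[m] = smallest final y over chains of length m + 1
    tail_idx = []   # index into pts of the anchor that ends that chain
    prev = [None] * len(pts)
    for n, (_, y) in enumerate(pts):
        m = bisect_left(tails, y)
        if m == len(tails):
            tails.append(y)
            tail_idx.append(n)
        else:
            tails[m] = y
            tail_idx[m] = n
        prev[n] = tail_idx[m - 1] if m > 0 else None
    # Walk back from the end of the longest chain.
    chain = []
    n = tail_idx[-1] if tail_idx else None
    while n is not None:
        chain.append(pts[n])
        n = prev[n]
    return chain[::-1]

print(chain_anchors([(1, 1), (2, 5), (3, 2), (4, 3), (5, 9), (6, 4)]))
# → [(1, 1), (3, 2), (4, 3), (6, 4)]
```

The chained anchors then bound the region in which the restricted DP of step 3 would run.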


Progressive Alignment

• When evolutionary tree is known:
  Align closest first, in the order of the tree
  In each step, align two sequences x, y, or profiles px, py, to generate a new alignment with associated profile presult

• Weighted version:
  Tree edges have weights, proportional to the divergence in that edge
  New profile is a weighted average of two old profiles

[Figure: tree with nodes labeled x, w, y, z]

Example

Profile: (A, C, G, T, -)
px = (0.8, 0.2, 0, 0, 0)
py = (0.6, 0, 0, 0, 0.4)

s(px, py) = 0.8*0.6*s(A, A) + 0.2*0.6*s(C, A) + 0.8*0.4*s(A, -) + 0.2*0.4*s(C, -)

Result: pxy = (0.7, 0.1, 0, 0, 0.2)

s(px, -) = 0.8*1.0*s(A, -) + 0.2*1.0*s(C, -)

Result: px- = (0.4, 0.1, 0, 0, 0.5)
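The example above can be reproduced with a short Python sketch. The pairwise scores in `s` (match 1, mismatch -1, gap -2, gap against gap 0) are assumed values for illustration; the slides do not fix them. The equal weights in `merge_profiles` correspond to the unweighted case.

```python
ALPHABET = "ACGT-"

def s(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Pairwise score for two symbols (assumed values, not from the slides)."""
    if a == "-" and b == "-":
        return 0.0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def profile_score(px, py):
    """Expected pairwise score of two profile columns: sum of fa * fb * s(a, b)."""
    return sum(fa * fb * s(a, b)
               for a, fa in zip(ALPHABET, px)
               for b, fb in zip(ALPHABET, py))

def merge_profiles(px, py, wx=0.5, wy=0.5):
    """New profile column as a weighted average of the two old columns."""
    return tuple(wx * a + wy * b for a, b in zip(px, py))

GAP_COLUMN = (0.0, 0.0, 0.0, 0.0, 1.0)  # a column that is entirely gap

px = (0.8, 0.2, 0.0, 0.0, 0.0)
py = (0.6, 0.0, 0.0, 0.0, 0.4)

print(profile_score(px, py))            # s(px, py) under the assumed scores
print(merge_profiles(px, py))           # pxy, approximately (0.7, 0.1, 0, 0, 0.2)
print(merge_profiles(px, GAP_COLUMN))   # px-, approximately (0.4, 0.1, 0, 0, 0.5)
```

For the weighted version, `wx` and `wy` would be derived from the tree edge weights instead of being fixed at 0.5.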


Threaded Blockset Aligner

Human–Cow

HMR – CD

Restricted Area

Profile Alignment


Reconstructing the Ancestral Mammalian Genome

[Figure: tree with leaves Human: C, Baboon: C, Cat: C, Dog: G; ancestral nodes labeled C, C or G, and C]


Neutral Substitution Rates


Finding Conserved Elements (1)

• Binomial method
  25-bp window in the human genome
  Binomial distribution of k matches in N bases given the neutral probability of substitution
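One way to turn the binomial method into a number is a tail probability: how likely are k or more matches in N bases if the window were evolving neutrally? The sketch below assumes a per-base match probability `p_match` under the neutral model (one minus the neutral probability of substitution); the value 0.7 is a made-up placeholder, not a rate from the lecture.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

N = 25          # window size from the slide
p_match = 0.7   # assumed neutral match probability (placeholder value)

# Probability of seeing at least k matches in the window under neutrality;
# a small value suggests the window is more conserved than expected.
for k in (20, 23, 25):
    print(k, binomial_tail(k, N, p_match))
```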


Finding Conserved Elements (2)

• Parsimony Method
  Count minimum # of mutations explaining each column
  Assign a probability to this parsimony score given neutral model
  Multiply probabilities across 25-bp window of human genome

[Figure: example alignment column on a tree, with bases labeled A and CAAG]
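Counting the minimum number of mutations for one column can be sketched with Fitch's small-parsimony recursion, which is one standard way to do this count; the slides do not say which algorithm is used. The tree topology ((Human, Baboon), (Cat, Dog)) is an assumption for illustration; the column values are the Human/Baboon/Cat/Dog bases from the ancestral reconstruction slide.

```python
def fitch(tree, column):
    """Fitch parsimony for one alignment column.

    tree:   nested 2-tuples with species names (str) at the leaves
    column: dict mapping species name to its base in this column
    Returns (candidate states at this node, minimum number of mutations below it).
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, column)
    right_states, right_cost = fitch(right, column)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra mutation needed.
        return common, left_cost + right_cost
    # Children disagree: keep both options and pay one mutation.
    return left_states | right_states, left_cost + right_cost + 1

TREE = (("Human", "Baboon"), ("Cat", "Dog"))   # assumed topology
column = {"Human": "C", "Baboon": "C", "Cat": "C", "Dog": "G"}

root_states, mutations = fitch(TREE, column)
print(root_states, mutations)
# → {'C'} 1
```

The Cat/Dog ancestor comes out as C or G, and the root as C, with one mutation explaining the column. The per-column count would then be converted to a probability under the neutral model and multiplied across the 25-bp window.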


Finding Conserved Elements


Finding Conserved Elements (3)

GERP


Phylo HMMs

HMM

Phylogenetic Tree Model

Phylo HMM


Finding Conserved Elements (3)


How do the methods agree/disagree?


Statistical Power to Detect Constraint

L

N

C: cutoff # mutations
D: neutral mutation rate
: constraint mutation rate relative to neutral
