ISMB 2007 Review
![Page 1: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/1.jpg)
ISMB 2007 Review
Kyung-Ah Sohn
![Page 2: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/2.jpg)
Bayesian Association of haplotypes and non-genetic factors to regulatory and phenotypic variation in human populations
Jim C. Huang, Anitha Kannan and John Winn
University of Toronto; Microsoft Research, Cambridge
![Page 3: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/3.jpg)
A statistical method for alignment-free comparison of regulatory sequences
Miriam R. Kantorovitz, Gene E. Robinson and Saurabh Sinha
UIUC, USA
![Page 4: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/4.jpg)
Motivation
How do we measure the similarity between two regulatory DNA sequences in an alignment-free manner?
For sequences that do not show any statistically significant alignment, e.g. two sequences that are not orthologous yet are functionally related
For detecting regulatory regions in a new genome that are homologous to known enhancers or promoters, which show a much lower level of alignment than coding sequences
![Page 5: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/5.jpg)
Comparison of k-word frequency distributions
How to compare two $4^k$-dimensional vectors of k-word counts?
1. Euclidean distance
2. Information-theoretic measure such as the KL distance (Kullback-Leibler divergence)
3. Geometric measure such as the cosine of the angle between the count vectors
4. Statistical measure such as the correlation coefficient
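The four candidate measures listed above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function names, the lexicographic ordering of k-words, and the pseudocount used to keep the KL divergence finite are all assumptions made for this example.

```python
from collections import Counter
from itertools import product
from math import log, sqrt


def kword_counts(seq, k, alphabet="ACGT"):
    """Count vector of all len(alphabet)**k k-words, in lexicographic order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(w)] for w in product(alphabet, repeat=k)]


def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kl_divergence(x, y, pseudocount=1.0):
    # The pseudocount is an assumption of this sketch; it avoids log(0).
    px = [a + pseudocount for a in x]
    py = [b + pseudocount for b in y]
    sx, sy = sum(px), sum(py)
    return sum((a / sx) * log((a / sx) / (b / sy)) for a, b in zip(px, py))


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)
```

Note that the Euclidean and KL measures are distances (smaller means more similar), while the cosine and the correlation coefficient are similarities (larger means more similar).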
![Page 6: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/6.jpg)
Contribution of this paper
D2 score: Alignment-free similarity measure defined as the number of k-word matches
D2z score: normalized measure that captures the statistical significance of D2 score
Reduce the time complexity from $O(4^{2k})$ to $O(4^k)$
![Page 7: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/7.jpg)
D2 score
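For $A = A_1 A_2 \ldots A_{n_1}$ and $B = B_1 B_2 \ldots B_{n_2}$:
$$D_2(A,B) = \sum_{(i,j) \in I} Y_{(i,j)}, \qquad I = \{(i,j) \mid 1 \le i \le n_1-k+1,\ 1 \le j \le n_2-k+1\}$$
$Y_{(i,j)}$: indicator variable for a match between the k-words starting at position i in A and at position j in B
$D_2(A,B)$: the number of k-word matches between the two sequences A and B, including overlaps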
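As a minimal sketch of this definition, D2 can be computed by checking every pair of start positions directly. The function name `d2_naive` is hypothetical, and the quadratic loop is meant only to mirror the indicator-variable definition, not to be efficient.

```python
def d2_naive(a, b, k):
    """Count matching k-word pairs over all start positions (i, j)."""
    total = 0
    for i in range(len(a) - k + 1):
        for j in range(len(b) - k + 1):
            # Y(i, j) = 1 when the k-words starting at i in a and j in b match
            if a[i:i + k] == b[j:j + k]:
                total += 1
    return total
```

Overlapping occurrences are counted separately: for example, "AAAA" contains three overlapping copies of "AA" and "AAA" contains two, giving six matching pairs for k = 2.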
![Page 8: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/8.jpg)
D2 score
The inner product of the vectors of word counts in A and B
Let $W = \{w_1, w_2, \ldots, w_{d^k}\}$: the set of all k-words on the alphabet of size d
$N^A_w$: the number of times w appears in the sequence A
$N^A = (N^A_{w_1}, N^A_{w_2}, \ldots, N^A_{w_{d^k}})$
Then
$$D_2(A,B) = \langle N^A, N^B \rangle = \sum_{w \in W} N^A_w N^B_w$$
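The inner-product form suggests a simple way to compute D2 from word counts. The sketch below is an illustration under that reading; the function names are hypothetical, and a `Counter` is used instead of a dense $d^k$-dimensional vector so that only the k-words that actually occur are stored.

```python
from collections import Counter


def kword_counter(seq, k):
    """Map each k-word that occurs in seq to its number of occurrences."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """D2 as the inner product of the k-word count vectors of a and b."""
    counts_a, counts_b = kword_counter(a, k), kword_counter(b, k)
    # Words absent from b contribute 0, so iterating over a's words suffices.
    return sum(n * counts_b[w] for w, n in counts_a.items())
```

Each k-word w that occurs $N^A_w$ times in A and $N^B_w$ times in B accounts for $N^A_w N^B_w$ matching position pairs, which is why this sum equals the number of pairwise k-word matches.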
![Page 9: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/9.jpg)
D2z score
$$D_2^z(A,B) = \frac{D_2(A,B) - E(D_2)}{\sigma(D_2)}$$
where $E(D_2)$ and $\sigma(D_2)$ are the expectation and the standard deviation of $D_2(A,B)$
Approximately standard normal when the lengths of the sequences are large enough
How to compute E(D2) and σ (D2)?
1. IID case
2. Markov model case
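To make the role of E(D2) and σ(D2) concrete, the sketch below standardizes a D2 score against a null distribution. It deliberately swaps in a different technique from the one on the slides: instead of the analytic IID or Markov formulas, the mean and standard deviation are estimated by Monte Carlo shuffling of both sequences. All function names, the number of trials, and the seed are assumptions of this example.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def d2(a, b, k):
    """Number of k-word matches, as an inner product of word counts."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    return sum(n * cb[w] for w, n in ca.items())


def d2z(score, expected, sd):
    """Standardize a D2 score given its expectation and standard deviation."""
    if sd == 0:
        raise ValueError("standard deviation of D2 is zero")
    return (score - expected) / sd


def shuffled_null(a, b, k, trials=200, seed=0):
    """Empirical mean and sd of D2 over letter-shuffled copies of a and b.

    Shuffling preserves letter composition, so this approximates an IID
    background; it is a stand-in for the analytic formulas, not a copy of them.
    """
    rng = random.Random(seed)
    la, lb = list(a), list(b)
    scores = []
    for _ in range(trials):
        rng.shuffle(la)
        rng.shuffle(lb)
        scores.append(d2("".join(la), "".join(lb), k))
    return mean(scores), pstdev(scores)
```

A usage pattern would be `m, s = shuffled_null(A, B, k)` followed by `d2z(d2(A, B, k), m, s)`. The analytic approach on the slides avoids this sampling cost.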
![Page 10: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/10.jpg)
Expectation
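IID model
$$E(Y_{(i,j)}) = \Pr(Y_{(i,j)} = 1) = \prod_{l=0}^{k-1} \Pr(A_{i+l} = B_{j+l}) = \Big( \sum_a f^A_a f^B_a \Big)^k = g_{1,1}^k$$
$$E(D_2) = (n_1-k+1)(n_2-k+1)\, g_{1,1}^k$$
where $f^A_a$: background probability of letter a in the sequence A
$$g_{x,y} = \sum_a (f^A_a)^x (f^B_a)^y$$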
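The IID expectation, $E(D_2) = (n_1-k+1)(n_2-k+1)\, g_{1,1}^k$ with $g_{x,y} = \sum_a (f^A_a)^x (f^B_a)^y$, is easy to evaluate numerically. The sketch below is an illustration only; the function names are hypothetical, and estimating the background probabilities from the observed letter frequencies of each sequence is an assumption of this example.

```python
from collections import Counter


def letter_freqs(seq, alphabet="ACGT"):
    """Estimate background letter probabilities from observed frequencies."""
    counts = Counter(seq)
    return {a: counts[a] / len(seq) for a in alphabet}


def g(freq_a, freq_b, x, y):
    """g_{x,y} = sum over letters of (f_a^A)^x * (f_a^B)^y."""
    return sum(freq_a[a] ** x * freq_b[a] ** y for a in freq_a)


def expected_d2_iid(n1, n2, k, freq_a, freq_b):
    """E(D2) under the IID model for sequences of lengths n1 and n2."""
    return (n1 - k + 1) * (n2 - k + 1) * g(freq_a, freq_b, 1, 1) ** k
```

With uniform letter probabilities, $g_{1,1} = 1/4$, so two sequences of length 101 are expected to share $100 \cdot 100 \cdot (1/4)^2 = 625$ matching 2-word pairs by chance.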
![Page 11: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/11.jpg)
Expectation
Markov Model
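$$E(Y_{(i,j)}) = \Pr(Y_{(i,j)} = 1) = \sum_{|w|=k} \Pr\nolimits^A(w)\, \Pr\nolimits^B(w) = \sum_{|w|=k} \Pr\nolimits^A(w_1)\, \Pr\nolimits^A(w \mid w_1)\, \Pr\nolimits^B(w_1)\, \Pr\nolimits^B(w \mid w_1)$$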
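Under a Markov background, the expected value of a single match indicator is the sum, over all k-words w, of the probability of w under the model for A times its probability under the model for B. The sketch below assumes a first-order Markov chain with a given initial distribution for each sequence; the order of the chain, the function names, and the brute-force enumeration of all $4^k$ words are assumptions made for illustration.

```python
from itertools import product


def word_prob(word, initial, transition):
    """Probability of a word under a first-order Markov chain."""
    p = initial[word[0]]
    for prev, cur in zip(word, word[1:]):
        p *= transition[prev][cur]
    return p


def expected_match_markov(k, init_a, trans_a, init_b, trans_b, alphabet="ACGT"):
    """E(Y) = sum over all k-words w of Pr_A(w) * Pr_B(w)."""
    total = 0.0
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        total += word_prob(w, init_a, trans_a) * word_prob(w, init_b, trans_b)
    return total
```

If the word probabilities do not depend on position (for instance, when the initial distribution is the stationary distribution of the chain), linearity of expectation gives $E(D_2) = (n_1-k+1)(n_2-k+1)\,E(Y)$, as in the IID case.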
![Page 12: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/12.jpg)
Variance
$$Var(D_2) = \sum_{(i,j) \in I} Var(Y_{(i,j)}) + \sum_{(i,j) \neq (s,t)} Cov(Y_{(i,j)}, Y_{(s,t)})$$
![Page 13: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/13.jpg)
Variance – IID case
Case (a): Cov(Y(i,j), Y(s,t))=0
Case (b):
Case (c): …
![Page 14: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/14.jpg)
Variance – Markov Model
Case (a)
![Page 15: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/15.jpg)
Evaluation and Comparison
Evaluate whether functionally and/or evolutionarily related sequence pairs are scored better than unrelated pairs of sequences randomly chosen from the genome
Positive set: a set of CRMs known to regulate expression in the same tissue
Negative set: a set of equally many randomly chosen non-coding sequences
Score each pair of sequences within the positive set and each pair within the negative set, sort all the scores in one combined list, and then count how many of the pairs in the top half of this list are from the positive set
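The evaluation procedure can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: the function name is hypothetical, `score` stands for any pairwise similarity function (such as D2 or D2z), and it is assumed that a higher score means greater similarity.

```python
from itertools import combinations


def top_half_positives(positives, negatives, score):
    """Count how many top-half pairs come from the positive set.

    Returns (number of positive pairs in the top half, size of the top half).
    """
    scored = [(score(a, b), True) for a, b in combinations(positives, 2)]
    scored += [(score(a, b), False) for a, b in combinations(negatives, 2)]
    # Highest scores first; assumes larger score = more similar.
    scored.sort(key=lambda item: item[0], reverse=True)
    top = scored[: len(scored) // 2]
    return sum(1 for _, is_positive in top if is_positive), len(top)
```

Because both sets contain equally many sequences, they contribute equally many pairs, so a score that perfectly separates related from unrelated pairs places only positive pairs in the top half, while a random score places about half.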
![Page 16: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/16.jpg)
Evaluation on functionally related regulatory sequences
![Page 17: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/17.jpg)
Evaluation on orthologous regulatory sequences
![Page 18: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/18.jpg)
Summary
Proposed a new sequence similarity score
![Page 19: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/19.jpg)
Semiparametric functional mapping of quantitative trait loci governing long-term HIV dynamics
Song Wu, Jie Yang and Rongling Wu
Department of Statistics, University of Florida
![Page 20: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/20.jpg)
HIV dynamics
Bi-exponential model for short-term dynamic changes of HIV virion copies in AIDS patients after initiation of HAART
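$$V(t) = P_1 e^{-\lambda_1 t} + P_2 e^{-\lambda_2 t}$$
$V(t)$: plasma viral load at time t
$\lambda_1, \lambda_2$: viral decay rates in the first and second phase
$P_1, P_2$: baseline viral loads when the treatment is initiated
Limitation: the model does not incorporate the characteristics of long-term HIV viral load changes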
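A minimal sketch of the bi-exponential model, $V(t) = P_1 e^{-\lambda_1 t} + P_2 e^{-\lambda_2 t}$, is given below. The function name and the parameter values in the usage comment are hypothetical and chosen only to show the two-phase shape; they are not estimates from the paper.

```python
from math import exp


def viral_load(t, p1, p2, lam1, lam2):
    """Bi-exponential viral load: fast first phase plus slow second phase."""
    return p1 * exp(-lam1 * t) + p2 * exp(-lam2 * t)


# Hypothetical parameters: with lam1 much larger than lam2, the first term
# dominates early on and the second term dominates at later times.
# viral_load(0, 1e5, 1e3, 0.5, 0.03) gives the baseline load p1 + p2.
```

With constant decay rates the load can only decrease over time, which is one way to see why this form cannot describe long-term changes in viral load.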
![Page 21: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/21.jpg)
HIV dynamics
Two phases of viral load decay
The early rapid decay: λ1
The late slow decay, corresponding to the clearance of free and latent viruses
It is not sensible to assume a constant λ2 over a long-term treatment period
$$V(t) = P_1 e^{-\lambda_1 t} + P_2 e^{-\lambda_2(t)\, t}$$
![Page 22: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/22.jpg)
Natural cubic spline
Piecewise third-order polynomial function that passes through a set of control points
Estimate λ2(t) using a cubic spline
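To illustrate what a natural cubic spline for λ2(t) looks like, the sketch below builds an interpolating spline through a set of control points (with zero second derivative at both ends) and plugs it into a viral load function of the form $P_1 e^{-\lambda_1 t} + P_2 e^{-\lambda_2(t) t}$. It is a textbook interpolation routine, not the paper's estimation procedure: the function names are hypothetical, and in practice the control-point values would be estimated from data rather than supplied by hand.

```python
from bisect import bisect_right
from math import exp


def natural_cubic_spline(xs, ys):
    """Return a function interpolating (xs, ys) with a natural cubic spline."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Solve the tridiagonal system for the second-derivative coefficients c.
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def spline(x):
        # Pick the polynomial piece containing x (clamped to the end pieces).
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return spline


def viral_load(t, p1, p2, lam1, lam2_of_t):
    """Viral load with a time-varying second-phase decay rate lam2_of_t(t)."""
    return p1 * exp(-lam1 * t) + p2 * exp(-lam2_of_t(t) * t)
```

A usage pattern would be `lam2 = natural_cubic_spline(knot_times, knot_values)` followed by `viral_load(t, p1, p2, lam1, lam2)`, where `knot_times` and `knot_values` are placeholders for the control points.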
![Page 23: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/23.jpg)
Quantitative genetic model
Marker: alleles M/m with frequencies p/1-p
QTL: alleles A/a with frequencies q/1-q
The marker and the QTL are genetically associated
D: linkage disequilibrium
Four haplotypes of MA, Ma, mA, and ma with frequencies
p11=pq+D, p10=p(1-q)-D,
p01=(1-p)q-D, p00=(1-p)(1-q)+D
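The four haplotype frequencies follow directly from the allele frequencies p and q and the linkage disequilibrium D. The sketch below is a small illustration with a hypothetical function name; the validity check (all frequencies non-negative) is an addition of this example.

```python
def haplotype_frequencies(p, q, d):
    """Frequencies of haplotypes MA, Ma, mA, ma given p, q and LD coefficient d."""
    freqs = {
        "MA": p * q + d,              # p11
        "Ma": p * (1 - q) - d,        # p10
        "mA": (1 - p) * q - d,        # p01
        "ma": (1 - p) * (1 - q) + d,  # p00
    }
    if min(freqs.values()) < 0:
        raise ValueError("D is outside the range allowed by p and q")
    return freqs
```

The D terms cancel in the sum, so the four frequencies always add up to 1, and D = 0 recovers the products of allele frequencies expected when the marker and the QTL are independent.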
![Page 24: ISMB 2007 Review](https://reader035.fdocuments.in/reader035/viewer/2022062221/5681404f550346895dabc264/html5/thumbnails/24.jpg)
Linear model linking genetic and residual effects