Special Topics in Genomics Cis-regulatory Modules and Phylogenetic Footprinting.
The slides for module discovery are provided by Prof. Qing Zhou @ UCLA
Cis-regulatory Modules and Module Discovery
Motif Discovery
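[Figure: a sequence modeled as a mixture of background positions, generated from θ0, and motif sites, generated from a weight matrix Θ = (θ1, θ2, …, θw) whose columns give the probabilities of A, C, G and T at each of the w motif positions.]
Mixture modeling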
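A minimal sketch of how a candidate site can be scored under the two mixture components, motif weight matrix versus background. The matrix values, the uniform background, and the names `MOTIF`, `BACKGROUND`, `log_likelihood_ratio` and `scan` are illustrative assumptions, not taken from the slides.

```python
import math

# Hypothetical 3-column weight matrix: one dict of base probabilities per motif position.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
# Assumed uniform background model (theta_0).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_likelihood_ratio(site, motif=MOTIF, background=BACKGROUND):
    """log [ P(site | motif) / P(site | background) ] for one candidate site."""
    return sum(math.log(col[b] / background[b]) for col, b in zip(motif, site))


def scan(sequence, motif=MOTIF, background=BACKGROUND):
    """Score every window of motif width along the sequence."""
    w = len(motif)
    return [
        log_likelihood_ratio(sequence[i:i + w], motif, background)
        for i in range(len(sequence) - w + 1)
    ]


scores = scan("GGATCGG")
# Index of the window that looks most motif-like under this toy matrix.
best_start = max(range(len(scores)), key=scores.__getitem__)
```

In a full motif sampler these ratios would feed the sampling of site positions rather than a simple maximum; the scan is only meant to make the two-component scoring concrete.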
Difficulties in motif discovery in higher organisms
• Upstream sequences are longer.
• Motifs are less conserved and shorter.
• Background sequence structures are more complicated.
• To solve the problem, utilize more biological knowledge in our model.
1) module structure
2) multiple species conservation
Cis-regulatory module
• Combinatorial control of genes: cis-regulatory modules
CisModule: modeling module structure (Zhou and Wong, PNAS 2004)
• Module structure: consider co-localization of motif sites.
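[Figure: sites of Motif 1, Motif 2 and Motif 3 co-localized within a module.]
Hierarchical mixture modeling
K: # of motifs
[Figure: the sequence S is a mixture of background (B, probability 1 − r) and modules (M, probability r); inside a module, each position is background (θ0, probability q0) or starts a site of motif k (Θk, probability qk), k = 1, …, K.]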
Parameters and missing data
• Missing data problem.
Given: K (# of motifs), l (module length)
Observed data: S (set of sequences)
Missing data: M (indicators for a module start), A (indicators for a motif site start)
Parameters Ψ: θ0 (background model), Θ (weight matrices for motifs), W (motif widths), r (probability of a module start), q (probability of starting a motif site)
Bayesian inference by posterior sampling
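Module-motif detection: given Θ, r, q, and W,
1) Sample modules.
2) Within each module, sample motif sites.
[Figure: sequence regions labeled M=0, M=1, M=0.]
Parameter update: given M and A,
1) Infer Θ from aligned sites.
2) Update r, q and W.
Aligned sites:
TTTGC
TATCC
CTTGC
TTTAC
GTTGC
Counts from the aligned sites:
| | θ1 | θ2 | … | θw |
|---|---|---|---|---|
| A | 0 | 1 | … | 0 |
| C | 1 | 0 | … | 5 |
| G | 1 | 0 | … | 0 |
| T | 3 | 4 | … | 0 |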
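The step "infer Θ from aligned sites" can be sketched as a column-wise count over the five aligned sites on this slide. The pseudocount and the use of a posterior-mean estimate (rather than a draw from the posterior, as a sampler would make) are simplifying assumptions of this sketch, and the function names are illustrative.

```python
ALPHABET = "ACGT"
sites = ["TTTGC", "TATCC", "CTTGC", "TTTAC", "GTTGC"]  # aligned sites from the slide


def count_matrix(sites):
    """Count each base at each motif position."""
    width = len(sites[0])
    counts = {b: [0] * width for b in ALPHABET}
    for s in sites:
        for j, b in enumerate(s):
            counts[b][j] += 1
    return counts


def weight_matrix(counts, pseudocount=1.0):
    """Turn counts into column-wise probabilities (assumed pseudocount of 1 per base)."""
    width = len(counts["A"])
    totals = [sum(counts[b][j] + pseudocount for b in ALPHABET) for j in range(width)]
    return {
        b: [(counts[b][j] + pseudocount) / totals[j] for j in range(width)]
        for b in ALPHABET
    }


counts = count_matrix(sites)
theta = weight_matrix(counts)
```

The first, second and last columns of `counts` reproduce the numbers shown for θ1, θ2 and θw on the slide.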
Module sampling
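• Denote $S = [x_1 x_2 \cdots x_L] = x_{[1,L]}$ and $f_n(\Psi) = P(x_{[1,n]} \mid \Psi)$.
Want to sample from $P(M \mid S, \Psi)$, need to calculate
$$P(S \mid \Psi) = \sum_{M} P(S, M \mid \Psi).$$
Forward summation:
$$f_n(\Psi) = r\, h(n-l+1, n)\, f_{n-l}(\Psi) + (1-r)\, P(x_n \mid \theta_0)\, f_{n-1}(\Psi) \equiv A_n(\Psi) + B_n(\Psi).$$
Module: the window $x_{[n-l+1,n]}$ is a module, with likelihood $h(n-l+1, n)$.
Background: position $n$ is background, with likelihood $P(x_n \mid \theta_0)$.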
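A minimal sketch of the forward summation, assuming the recursion f_n = r·h(n−l+1, n)·f_{n−l} + (1−r)·P(x_n | θ0)·f_{n−1} with f_0 = 1. The background probabilities and module likelihoods are passed in as precomputed 1-indexed lists; the names `forward_sum`, `bg` and `h_end` and the toy numbers are assumptions for illustration. A practical implementation would work in log space or rescale to avoid underflow on long sequences.

```python
def forward_sum(bg, h_end, r, l):
    """Forward summation over one sequence.

    bg[n]    : P(x_n | theta_0), 1-indexed (index 0 unused)
    h_end[n] : h(n-l+1, n), likelihood of the length-l window ending at n as a module
    r        : probability of a module start
    l        : module length
    Returns (f, A) with f[n] = f_n and A[n] = the module term of f_n.
    """
    L = len(bg) - 1
    f = [0.0] * (L + 1)
    A = [0.0] * (L + 1)
    f[0] = 1.0  # empty prefix
    for n in range(1, L + 1):
        B_n = (1 - r) * bg[n] * f[n - 1]      # position n is background
        if n >= l:
            A[n] = r * h_end[n] * f[n - l]    # positions n-l+1..n form a module
        f[n] = A[n] + B_n
    return f, A


# Toy input: a sequence of length 2 and module length 2.
bg = [None, 0.5, 0.5]
h_end = [None, 0.0, 0.3]
f, A = forward_sum(bg, h_end, r=0.1, l=2)
```

With these toy numbers, f_1 = 0.9 × 0.5 = 0.45 and f_2 = 0.1 × 0.3 + 0.9 × 0.5 × 0.45 = 0.2325.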
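• Backward sampling
$$P(M_{n-l+1} = 1 \mid M_{[n+1,L]}) = \frac{A_n(\Psi)}{f_n(\Psi)},$$
where $f_n(\Psi) = P(x_{[1,n]} \mid \Psi)$ and $A_n(\Psi)$ is the module term of the forward summation.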
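Backward sampling can be sketched as a walk from the end of the sequence toward its start, placing a module ending at position n with probability A_n / f_n and otherwise stepping back one position. The sketch assumes the forward quantities are already available as 1-indexed lists `f` and `A` with A[n] = 0 for n < l; the function name and the optional `rng` argument are illustrative.

```python
import random


def backward_sample(f, A, l, rng=random):
    """Sample module start positions (1-indexed) given forward quantities.

    f[n] : forward sum up to position n (f[0] corresponds to the empty prefix)
    A[n] : module term of f[n], zero whenever n < l
    l    : module length
    """
    L = len(f) - 1
    starts = []
    n = L
    while n >= 1:
        if rng.random() < A[n] / f[n]:
            starts.append(n - l + 1)  # a module covers positions n-l+1..n
            n -= l
        else:
            n -= 1                    # position n is background
    return sorted(starts)


# Example call with a seeded generator so repeated runs give the same draw.
draw = backward_sample([1.0, 0.45, 0.2325], [0.0, 0.0, 0.03], l=2, rng=random.Random(0))
```

Repeating the call over many iterations, interleaved with parameter updates, gives the posterior samples of M used for inference.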
How to calculate $h(n-l+1, n)$:
$$h(i, m) = \sum_{k=1}^{K} q_k\, P(x_{[m-w_k+1,\,m]} \mid \Theta_k)\, h(i, m-w_k) + q_0\, P(x_m \mid \theta_0)\, h(i, m-1).$$
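The within-module recursion can be sketched in the same style: each position either is background (weight q_0) or ends a site of motif k (weight q_k). The sketch assumes the boundary condition h(i, i−1) = 1 and that a site must lie entirely inside the module; the function name, the data layout and the toy values are assumptions.

```python
def module_likelihood(x, i, n, motifs, q, bg):
    """h(i, n): likelihood of x[i..n] (1-indexed, inclusive) as a single module.

    motifs : list of weight matrices, each a list of dicts base -> probability
    q      : [q_0, q_1, ..., q_K]; q_0 is the within-module background weight
    bg     : dict base -> probability (theta_0)
    """
    def site_prob(pwm, start):
        # P(x_[start, start+w-1] | Theta_k)
        p = 1.0
        for j, col in enumerate(pwm):
            p *= col[x[start - 1 + j]]
        return p

    h = {i - 1: 1.0}  # assumed boundary condition: empty module prefix
    for m in range(i, n + 1):
        total = q[0] * bg[x[m - 1]] * h[m - 1]           # position m is background
        for k, pwm in enumerate(motifs, start=1):
            w = len(pwm)
            if m - w >= i - 1:                           # site fits inside the module
                total += q[k] * site_prob(pwm, m - w + 1) * h[m - w]
        h[m] = total
    return h[n]


# Toy example: one motif of width 2 that prefers "AC".
motif = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
value = module_likelihood("AC", 1, 2, [motif], q=[0.9, 0.1], bg=uniform)
```

For the toy input, the two-background path contributes (0.9 × 0.25)² = 0.050625 and the single-site path contributes 0.1 × 0.49 = 0.049.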
Posterior inference
• Motif sites: marginal posterior probability of being a motif start position > 0.5.
• Modules: marginal posterior probability of being within a module > 0.5.
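The 0.5 rule can be sketched as averaging 0/1 indicators over the retained posterior draws and keeping positions whose average exceeds the threshold. The draws below are made-up toy data, and the function names are illustrative.

```python
def marginal_posterior(samples):
    """samples: list of equal-length 0/1 indicator lists, one per retained draw."""
    n_draws = len(samples)
    return [sum(col) / n_draws for col in zip(*samples)]


def call_positions(samples, threshold=0.5):
    """Positions (0-indexed) whose marginal posterior probability exceeds the threshold."""
    return [pos for pos, p in enumerate(marginal_posterior(samples)) if p > threshold]


# Toy draws of a motif-start indicator over four positions.
draws = [
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
]
marginals = marginal_posterior(draws)
called = call_positions(draws)
```

The same two functions apply to module calls if the indicator marks "position is within a module" instead of "position starts a motif site".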
Simulation study
• Generate 30 data sets independently, each containing:
1) 20 sequences, each of length 1000;
2) 25 modules, with length 150;
3) each module contains 1 E2F site, 1 YY1 site, and 1 cMyc site.
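| Motif | CisModule: Fail | CisModule: TP | CisModule: FP | Without module structure: Fail | Without module structure: TP | Without module structure: FP |
|---|---|---|---|---|---|---|
| E2F | 0.03 | 17.9 | 7.5 | 0.37 | 17.1 | 11.6 |
| YY1 | 0.07 | 16.0 | 8.7 | 0.20 | 17.1 | 11.0 |
| cMyc | 0 | 15.7 | 9.9 | 0.63 | 13.6 | 12.4 |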
Example: Discovery of tissue-specific modules in Ciona
• The Sidow lab collected 21 genes that are co-expressed during the development of muscle tissue in Ciona.
• Want to find motifs and modules in the upstream sequences (average length = 1330) of these genes.
• Found 3 motifs in 28 modules (4860 bps).
Are they real motifs that determine the gene expression?
Experimental validation
• Positive element: the shortest sufficient and non-overlapping sequence that drives strong expression in muscle: average length of 289 bps.
• 70% of our predicted motif sites are located in the positive elements!
Other tools
• Gibbs Module Sampler (Thompson et al. Genome Res. 2004)
• EMCMODULE (Gupta and Liu, PNAS, 2005)
Phylogenetic Footprinting
Functional elements tend to be conserved across species
For example, exons are conserved due to selection pressure. Introns and intergenic regions are less likely to be conserved.
Phylogenetic footprinting
Miller et al. Annu. Rev. Genomics Hum. Genet. 2004
Incorporating cross-species conservation into motif discovery
• A threshold method (Wasserman et al. Nature Genetics, 2000)
STEP 1: construct cross-species alignment
STEP 2: compute conservation measure from the alignment
STEP 3: non-conserved regions are filtered out
STEP 4: Gibbs motif sampler is applied to conserved regions of the target genome
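The filtering step of the threshold method can be sketched as keeping only stretches of the target sequence whose per-position conservation score reaches a cutoff. The scores, the cutoff of 0.7, the minimum segment length and the function name are hypothetical; the motif sampler itself is not shown.

```python
def conserved_segments(conservation, threshold, min_length=1):
    """Return half-open (start, end) intervals where conservation >= threshold."""
    segments = []
    start = None
    for i, c in enumerate(conservation):
        if c >= threshold:
            if start is None:
                start = i                      # open a new conserved run
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i))    # close the run if long enough
            start = None
    if start is not None and len(conservation) - start >= min_length:
        segments.append((start, len(conservation)))
    return segments


# Toy target sequence with made-up per-position conservation scores.
target = "ACGTACGTAC"
scores = [0.2, 0.9, 0.8, 0.9, 0.1, 0.3, 0.7, 0.9, 0.9, 0.8]
kept = [target[s:e] for s, e in conserved_segments(scores, threshold=0.7, min_length=3)]
```

Only the sequences in `kept` would then be handed to the motif sampler.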
Phylogenetic footprinting & motif discovery
• CompareProspector (Liu Y. et al. Genome Res. 2004)
STEP 1: construct cross-species alignment
STEP 2: compute conservation measure (window percent identity, WPID) from the alignment
STEP 3: multiply the likelihood ratio at a position by the corresponding WPID, thus likelihood landscape is changed to favor conserved sites
STEP 4: apply a Gibbs motif sampler based algorithm
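A rough sketch of steps 2 and 3: a window percent identity computed from a gapped pairwise alignment, then used to reweight per-position likelihood ratios. The centered window, its half-width, and the treatment of gaps as mismatches are assumptions of this sketch and may differ from the definition used by CompareProspector; the function names are illustrative.

```python
def window_percent_identity(target_aln, other_aln, half_window=2):
    """WPID for each target position from two aligned, equal-length strings ('-' = gap)."""
    # A column counts as identical only if both sequences carry the same base.
    match = [int(a == b and a != "-") for a, b in zip(target_aln, other_aln)]
    wpid = []
    for i, a in enumerate(target_aln):
        if a == "-":
            continue  # column has no position in the target genome
        lo = max(0, i - half_window)
        hi = min(len(match), i + half_window + 1)
        wpid.append(sum(match[lo:hi]) / (hi - lo))
    return wpid


def weighted_scores(likelihood_ratios, wpid):
    """Multiply each position's likelihood ratio by its conservation weight."""
    return [lr * c for lr, c in zip(likelihood_ratios, wpid)]


# Toy alignment: the first four columns are identical, the last two differ.
wpid = window_percent_identity("ACGTAC", "ACGTTT", half_window=1)
reweighted = weighted_scores([2.0] * 6, wpid)
```

Positions in poorly conserved windows end up with likelihood ratios near zero, which is the sense in which the likelihood landscape favors conserved sites.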
• Evolutionary model based approach
EMnEM (Moses et al. 2004)
PhyME (Sinha et al. 2004)
PhyloGibbs (Siddharthan et al. 2005)
Tree Sampler (Li and Wong, 2005)
Incorporating cross-species conservation into motif discovery
• PhyloCon (Wang and Stormo, Bioinformatics, 2003)
STEP 1: construct alignment among orthologous sequences;
STEP 2: convert conserved regions into profiles;
STEP 3: use profiles in the first sequence as seeds;
STEP 4: find matches of each seed in the second sequence;
STEP 5: update seeds;
STEP 6: repeat steps 2 and 3 for all sequences.
Phylogenetic footprinting & module discovery
• MultiModule (Zhou and Wong, The Annals of Applied Statistics, 2007)
MultiModule
• Module structure of each sequence is modeled by an HMM.
• Couple HMMs via multiple alignment: Aligned states are coupled and collapsed into one common state.
• Uncoupled states: similar to single species model.
• Coupled states: evolutionary model.
Comparing with other methods
• Three data sets with experimental validation reported previously, which contain 9 known motifs with 152 validated sites.
• CompareProspector (Liu et al. 2004): conservation score
• PhyloCon (Wang and Stormo 2003): progressive alignment of profiles
• EMnEM (Moses et al. 2004): Phylogenetic motif discovery
• CisModule (Zhou and Wong 2004): Single-species module discovery.
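| Method | # known motifs identified | # predicted sites | # overlaps | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|
| CompareProspector | 7 | 75 | 36 | 24 | 48 |
| PhyloCon | 3 | 50 | 26 | 17 | 52 |
| EMnEM | 6 | 130 | 44 | 29 | 34 |
| CisModule | 5 | 110 | 35 | 23 | 32 |
| MultiModule | 8 | 157 | 79 | 52 | 50 |

The site-level columns (# predicted sites, # overlaps, sensitivity, specificity) are for the motifs correctly identified by each method.
# of known sites = 152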
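The percentages in the comparison are consistent with sensitivity = overlaps / known sites and specificity = overlaps / predicted sites, rounded to whole percents. The short check below recomputes them from the reported counts; the dictionary layout and the function name are illustrative.

```python
KNOWN_SITES = 152

# (predicted sites, overlaps with known sites) as reported for each method.
results = {
    "CompareProspector": (75, 36),
    "PhyloCon": (50, 26),
    "EMnEM": (130, 44),
    "CisModule": (110, 35),
    "MultiModule": (157, 79),
}


def sensitivity_specificity(predicted, overlaps, known=KNOWN_SITES):
    """Return (sensitivity %, specificity %) rounded to whole percents."""
    sensitivity = round(100 * overlaps / known)      # share of known sites recovered
    specificity = round(100 * overlaps / predicted)  # share of predictions that are known sites
    return sensitivity, specificity


table = {method: sensitivity_specificity(p, o) for method, (p, o) in results.items()}
```

Note that "specificity" in this sense is the fraction of predicted sites that overlap known sites, which other texts often call positive predictive value.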