![Page 1: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/1.jpg)
Recognition of regulatory signals
Mikhail S. Gelfand
IntegratedGenomics-Moscow
NATO ASI School, October 2001
![Page 2: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/2.jpg)
Why?
• Additional annotation tool (e.g. specificity of transporters and enzymes from large families)
• Important for practice (in addition to metabolic reconstruction)
• Interesting from the evolutionary point of view
![Page 3: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/3.jpg)
Overview
0. Biological introduction
1. Algorithms
• Representation of signals
• Deriving the signal
• Site recognition
2. Comparative genomics
• Phylogenetic footprinting
• Consistency filtering
![Page 4: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/4.jpg)
Some biology
• Transcription (DNA → RNA)
• Splicing (pre-mRNA → mRNA)
• Translation (mRNA → protein)
• Regulation of transcription in prokaryotes
• … and eukaryotes
• Initiation of translation
![Page 5: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/5.jpg)
Transcription and translation in prokaryotes
![Page 6: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/6.jpg)
Initiation of transcription (bacteria)
![Page 7: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/7.jpg)
Translation in prokaryotes
![Page 8: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/8.jpg)
Translation (details)
![Page 9: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/9.jpg)
Splicing (eukaryotes)
![Page 10: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/10.jpg)
Regulation of transcription in prokaryotes
![Page 11: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/11.jpg)
Structure of DNA-binding domain. Example 1
![Page 12: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/12.jpg)
Structure of DNA-binding domain. Example 2
![Page 13: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/13.jpg)
Protein-DNA interactions
![Page 14: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/14.jpg)
Regulation of transcription in eukaryotes
![Page 15: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/15.jpg)
Representation of signals
• Consensus
• Pattern (consensus with degenerate positions)
• Positional weight matrix (PWM, or profile)
• Logical rules
• RNA signals
![Page 16: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/16.jpg)
Consensus
codB CCCACGAAAACGATTGCTTTTT
purE GCCACGCAACCGTTTTCCTTGC
pyrD GTTCGGAAAACGTTTGCGTTTT
purT CACACGCAAACGTTTTCGTTTA
cvpA CCTACGCAAACGTTTTCTTTTT
purC GATACGCAAACGTGTGCGTCTG
purM GTCTCGCAAACGTTTGCTTTCC
purH GTTGCGCAAACGTTTTCGTTAC
purL TCTACGCAAACGGTTTCGTCGG
consensus ACGCAAACGTTTTCGT
![Page 17: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/17.jpg)
Pattern
codB CCCACGAAAACGATTGCTTTTT
purE GCCACGCAACCGTTTTCCTTGC
pyrD GTTCGGAAAACGTTTGCGTTTT
purT CACACGCAAACGTTTTCGTTTA
cvpA CCTACGCAAACGTTTTCTTTTT
purC GATACGCAAACGTGTGCGTCTG
purM GTCTCGCAAACGTTTGCTTTCC
purH GTTGCGCAAACGTTTTCGTTAC
purL TCTACGCAAACGGTTTCGTCGG
consensus ACGCAAACGTTTTCGT
pattern aCGmAAACGtTTkCkT
![Page 18: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/18.jpg)
Frequency matrix
j a C G m A A A C G t T T k C k T
A 6 0 0 2 9 9 8 0 0 1 0 0 0 0 0 0
C 1 8 0 7 0 0 1 9 0 0 0 0 0 9 1 0
G 1 1 9 0 0 0 0 0 9 1 1 0 5 0 5 0
T 1 0 0 0 0 0 0 0 0 7 8 9 4 0 3 9
$W(b,j) = \ln(N(b,j)+0.5) - 0.25 \sum_i \ln(N(i,j)+0.5)$
$I = \sum_j \sum_b f(b,j) \log\frac{f(b,j)}{p(b)}$ (information content)
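The two formulas above can be checked numerically. The following is a minimal sketch, assuming the count matrix exactly as printed on the slide, a uniform background p(b) = 0.25, and base-2 logarithms for the information content (the slide does not state the base); the function names are illustrative only.

```python
import math

BASES = "ACGT"

# Count matrix N(b, j) as printed on the slide (9 sites, 16 positions).
COUNTS = {
    "A": [6, 0, 0, 2, 9, 9, 8, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    "C": [1, 8, 0, 7, 0, 0, 1, 9, 0, 0, 0, 0, 0, 9, 1, 0],
    "G": [1, 1, 9, 0, 0, 0, 0, 0, 9, 1, 1, 0, 5, 0, 5, 0],
    "T": [1, 0, 0, 0, 0, 0, 0, 0, 0, 7, 8, 9, 4, 0, 3, 9],
}


def weights(counts, pseudo=0.5):
    """W(b,j) = ln(N(b,j)+pseudo) minus the mean of ln(N(i,j)+pseudo) over i."""
    table = {b: [] for b in BASES}
    for j in range(len(counts["A"])):
        logs = {b: math.log(counts[b][j] + pseudo) for b in BASES}
        mean = sum(logs.values()) / len(BASES)
        for b in BASES:
            table[b].append(logs[b] - mean)
    return table


def information_content(counts, background=0.25):
    """Sum over positions and bases of f * log2(f / p); zero frequencies add nothing."""
    total = 0.0
    for j in range(len(counts["A"])):
        column_total = sum(counts[b][j] for b in BASES)
        for b in BASES:
            f = counts[b][j] / column_total
            if f > 0:
                total += f * math.log2(f / background)
    return total


W = weights(COUNTS)
print([round(W[b][0], 1) for b in BASES])
print(round(information_content(COUNTS), 2))
```

Rounded to one decimal, the first column of weights comes out as 1.1, -0.4, -0.4, -0.4, which agrees with the first column of the weight table on the PWM slide.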
![Page 19: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/19.jpg)
Sequence logo
![Page 20: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/20.jpg)
Positional weight matrix (PWM)
j a C G m A A A C G t T T k C k T
A 6 0 0 2 9 9 8 0 0 1 0 0 0 0 0 0
C 1 8 0 7 0 0 1 9 0 0 0 0 0 9 1 0
G 1 1 9 0 0 0 0 0 9 1 1 0 5 0 5 0
T 1 0 0 0 0 0 0 0 0 7 8 9 4 0 3 9
A 1.1 –1.0 –0.7 0.5 2.2 2.2 1.9 –0.7 –0.7 –0.1 –1.0 –0.7 –1.1 –0.7 –1.4 –0.7
C –0.4 1.9 –0.7 1.6 –0.7 –0.7 0.1 2.2 –0.7 –1.2 –1.0 –0.7 –1.1 2.2 –0.3 –0.7
G –0.4 0.1 2.2 –1.1 –0.7 –0.7 –1.0 –0.7 2.2 –0.1 0.1 –0.7 1.2 –0.7 1.0 –0.7
T –0.4 –1.0 –0.7 –1.1 –0.7 –0.7 –1.0 –0.7 –0.7 1.5 1.9 2.2 1.0 –0.7 0.6 2.2
![Page 21: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/21.jpg)
• Probabilistic motivation: log-likelihood (up to a linear transformation)
• More probabilistic motivation: z-score (with the suitable base of the logarithm)
• Thermodynamical motivation: free energy (assuming independence of positions, up to a linear transformation)
• Pseudocounts
$W(b,j) = \ln(N(b,j)+0.5) - 0.25 \sum_i \ln(N(i,j)+0.5)$
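The log-likelihood reading can be made explicit with a short derivation. This is a sketch under two assumptions that the slide does not state: positional frequencies are estimated with the same pseudocount, $f(b,j) = (N(b,j)+0.5)/(N+2)$ with $N$ the number of sites, and the background is uniform, $p(b) = 1/4$. The symbol $c_j$ is introduced here for illustration.

```latex
\begin{align*}
\ln\frac{f(b,j)}{p(b)} &= \ln(N(b,j)+0.5) - \ln(N+2) + \ln 4 \\
W(b,j) &= \ln\frac{f(b,j)}{p(b)} + c_j,
\qquad c_j = \ln\frac{N+2}{4} - 0.25\sum_i \ln(N(i,j)+0.5)
\end{align*}
```

Since $c_j$ does not depend on the nucleotide $b$, the sum of the weights over a candidate site equals its log-likelihood ratio plus the constant $\sum_j c_j$, so both scores rank candidate sites identically.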
![Page 22: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/22.jpg)
Logical rules, trees etc.
![Page 23: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/23.jpg)
Compilation of samples
• Initial sample:
– GenBank
– specialized databases
– literature (reviews)
– literature (original papers)
• Correction of GenBank errors
• Checking the literature
– removal of predicted sites
• Removal of duplicates
![Page 24: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/24.jpg)
Re-alignment approaches
• Initial alignment by a biological landmark
– start of transcription for promoters
– start codon for ribosome binding sites
– exon-intron boundary for splicing sites
• Deriving the signal within a sliding window
• Re-alignment
• etc. etc. until convergence
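The loop above can be sketched as follows. This is an illustration only: it assumes a fixed window length `k`, a common starting offset for all sequences (standing in for the biological landmark), the ln(N+0.5) weights used elsewhere in these slides as the derived signal, and a free search over all offsets; the function names and the `max_iter` cap are hypothetical.

```python
import math

BASES = "ACGT"


def build_weights(windows, pseudo=0.5):
    """Derive positional weights from the currently aligned windows."""
    table = []
    for j in range(len(windows[0])):
        logs = {b: math.log(sum(w[j] == b for w in windows) + pseudo) for b in BASES}
        mean = sum(logs.values()) / len(BASES)
        table.append({b: logs[b] - mean for b in BASES})
    return table


def score(word, table):
    return sum(column[base] for base, column in zip(word, table))


def realign(seqs, k, start, max_iter=50):
    """Alternate signal derivation and re-alignment until offsets stop changing."""
    offsets = [start] * len(seqs)
    for _ in range(max_iter):
        table = build_weights([s[o:o + k] for s, o in zip(seqs, offsets)])
        new_offsets = [
            max(range(len(s) - k + 1), key=lambda o: score(s[o:o + k], table))
            for s in seqs
        ]
        if new_offsets == offsets:
            break
        offsets = new_offsets
    return offsets


print(realign(["AAGGAGGTT", "AAGGAGGTT", "AAGGAGGTT"], k=5, start=0))
```

In practice the search would usually be restricted to a sliding window around the landmark rather than the whole sequence, as the slide suggests.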
![Page 25: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/25.jpg)
Gene starts of Bacillus subtilis
dnaN ACATTATCCGTTAGGAGGATAAAAATG
gyrA GTGATACTTCAGGGAGGTTTTTTAATG
serS TCAATAAAAAAAGGAGTGTTTCGCATG
bofA CAAGCGAAGGAGATGAGAAGATTCATG
csfB GCTAACTGTACGGAGGTGGAGAAGATG
xpaC ATAGACACAGGAGTCGATTATCTCATG
metS ACATTCTGATTAGGAGGTTTCAAGATG
gcaD AAAAGGGATATTGGAGGCCAATAAATG
spoVC TATGTGACTAAGGGAGGATTCGCCATG
ftsH GCTTACTGTGGGAGGAGGTAAGGAATG
pabB AAAGAAAATAGAGGAATGATACAAATG
rplJ CAAGAATCTACAGGAGGTGTAACCATG
tufA AAAGCTCTTAAGGAGGATTTTAGAATG
rpsJ TGTAGGCGAAAAGGAGGGAAAATAATG
rpoA CGTTTTGAAGGAGGGTTTTAAGTAATG
rplM AGATCATTTAGGAGGGGAAATTCAATG
![Page 26: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/26.jpg)
dnaN ACATTATCCGTTAGGAGGATAAAAATG
gyrA GTGATACTTCAGGGAGGTTTTTTAATG
serS TCAATAAAAAAAGGAGTGTTTCGCATG
bofA CAAGCGAAGGAGATGAGAAGATTCATG
csfB GCTAACTGTACGGAGGTGGAGAAGATG
xpaC ATAGACACAGGAGTCGATTATCTCATG
metS ACATTCTGATTAGGAGGTTTCAAGATG
gcaD AAAAGGGATATTGGAGGCCAATAAATG
spoVC TATGTGACTAAGGGAGGATTCGCCATG
ftsH GCTTACTGTGGGAGGAGGTAAGGAATG
pabB AAAGAAAATAGAGGAATGATACAAATG
rplJ CAAGAATCTACAGGAGGTGTAACCATG
tufA AAAGCTCTTAAGGAGGATTTTAGAATG
rpsJ TGTAGGCGAAAAGGAGGGAAAATAATG
rpoA CGTTTTGAAGGAGGGTTTTAAGTAATG
rplM AGATCATTTAGGAGGGGAAATTCAATG
cons. aaagtatataagggagggttaataATG
num. 001000000000110110000000111
760666658967228106888659666
![Page 27: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/27.jpg)
dnaN ACATTATCCGTTAGGAGGATAAAAATG
gyrA GTGATACTTCAGGGAGGTTTTTTAATG
serS TCAATAAAAAAAGGAGTGTTTCGCATG
bofA CAAGCGAAGGAGATGAGAAGATTCATG
csfB GCTAACTGTACGGAGGTGGAGAAGATG
xpaC ATAGACACAGGAGTCGATTATCTCATG
metS ACATTCTGATTAGGAGGTTTCAAGATG
gcaD AAAAGGGATATTGGAGGCCAATAAATG
spoVC TATGTGACTAAGGGAGGATTCGCCATG
ftsH GCTTACTGTGGGAGGAGGTAAGGAATG
pabB AAAGAAAATAGAGGAATGATACAAATG
rplJ CAAGAATCTACAGGAGGTGTAACCATG
tufA AAAGCTCTTAAGGAGGATTTTAGAATG
rpsJ TGTAGGCGAAAAGGAGGGAAAATAATG
rpoA CGTTTTGAAGGAGGGTTTTAAGTAATG
rplM AGATCATTTAGGAGGGGAAATTCAATG
cons. tacataaaggaggtttaaaaat
num. 0000000111111000000001
5755779156663678679890
![Page 28: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/28.jpg)
Positional information content before and after re-alignment
![Page 29: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/29.jpg)
Positional nucleotide frequencies after re-alignment (aGGAGG pattern)
![Page 30: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/30.jpg)
Enhancement of a weak signal
![Page 31: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/31.jpg)
Deriving the signal ab initio
• “Discrete” (pattern-driven) approaches: word counting
• “Continuous” (profile-driven) approaches: optimization
![Page 32: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/32.jpg)
Word counting. Short words
• Consider all k-mers
• For each k-mer compute the number of sequences containing this k-mer
– (maybe with some mismatches)
• Select the most frequent k-mer
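The steps above can be sketched directly. This is a minimal illustration, assuming the DNA alphabet ACGT and Hamming distance as the mismatch measure; exhaustive enumeration of all 4^k words is only practical for small k. The function names and toy sequences are made up for the example.

```python
from collections import Counter
from itertools import product


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_supporting_sequences(seqs, k, max_mismatches=0):
    """For every possible k-mer, count the sequences that contain it
    (optionally allowing a few mismatches)."""
    per_sequence = [kmers(s, k) for s in seqs]
    support = Counter()
    for word in map("".join, product("ACGT", repeat=k)):
        for words in per_sequence:
            if any(hamming(word, w) <= max_mismatches for w in words):
                support[word] += 1
    return support


def most_frequent(seqs, k, max_mismatches=0):
    return count_supporting_sequences(seqs, k, max_mismatches).most_common(1)[0]


print(most_frequent(["TTGGAGGAA", "CCAGGAGGC", "GGAGGTTTT"], k=5))
```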
![Page 33: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/33.jpg)
Problem: Complete search is possible only for short words
Assumption: if a long word is over-represented, its subwords are also over-represented
Solution: select a set of over-represented words and combine them into longer words
![Page 34: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/34.jpg)
Word counting. Long words
• Consider some k-mers
• For each k-mer compute the number of sequences containing this k-mer
– (maybe with some mismatches)
• Select the most frequent k-mer
![Page 35: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/35.jpg)
Problem: which k-mers to start with?
1st attempt: those actually occurring in the sample.
But: the correct signal (the consensus word) may not be among them.
![Page 36: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/36.jpg)
2nd attempt: those actually occurring in the sample and some neighborhood.
But:
– again, the correct signal (the consensus word) may not be among them;
– the size of the neighborhood grows exponentially
![Page 37: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/37.jpg)
Graph approach
Each k-mer in each sequence corresponds to a vertex. Two k-mers are linked by an arc, if they differ in at most h positions (h<<k).
Thus we obtain an n-partite graph (n is the number of sequences).
A signal corresponds to a clique (a complete subgraph) – or at least a dense subgraph – with vertices in each part.
![Page 38: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/38.jpg)
A simple algorithm
• Remove vertices that cannot be extended to complete subgraphs – that is, do not have arcs to all parts of the graph
• Remove pairs that cannot be extended …
– that is, do not form triangles with the third vertex in all parts of the graph
• Etc.
(will not work “as is” for dense subgraphs)
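A sketch of the graph construction and of the first pruning step only (vertex removal, repeated until nothing changes). It assumes Hamming distance for the "differ in at most h positions" rule and represents a vertex as a hypothetical `(sequence index, position)` pair; the pair and triangle steps are not implemented here.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def build_graph(seqs, k, h):
    """Vertices are (sequence index, position); arcs join k-mers from
    different sequences that differ in at most h positions."""
    vertices = [(i, p) for i, s in enumerate(seqs) for p in range(len(s) - k + 1)]
    word = {v: seqs[v[0]][v[1]:v[1] + k] for v in vertices}
    adjacency = {v: set() for v in vertices}
    for a in vertices:
        for b in vertices:
            if a[0] < b[0] and hamming(word[a], word[b]) <= h:
                adjacency[a].add(b)
                adjacency[b].add(a)
    return adjacency


def prune(adjacency, n_parts):
    """Repeatedly remove vertices lacking a surviving neighbour in some other part."""
    alive = set(adjacency)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            parts = {u[0] for u in adjacency[v] if u in alive}
            if len(parts) < n_parts - 1:
                alive.discard(v)
                changed = True
    return alive


seqs = ["TTGGAGGAA", "CCAGGAGGC", "GGAGGTTTT"]
print(sorted(prune(build_graph(seqs, k=5, h=0), len(seqs))))
```

The surviving vertices are candidates only: a vertex can have arcs to every part without belonging to a clique, which is why the slide continues with pairs and triangles.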
![Page 39: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/39.jpg)
Optimization. EM algorithms
• Generate an initial set of profiles (e.g. seed with all k-mers)
• For each profile
– find the best (highest scoring) representative in each sequence
– update the profile
• Iterate until convergence
![Page 40: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/40.jpg)
This algorithm converges.
However, it cannot leave the basin of attraction.
Thus, if the initial approximation is bad, it will converge to nonsense.
Solution: stochastic optimization.
![Page 41: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/41.jpg)
Simulated annealing
• Goal: maximize the information content I
$I = \sum_j \sum_b f(b,j) \log\frac{f(b,j)}{p(b)}$
• or any other measure of homogeneity of the sites
![Page 42: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/42.jpg)
Let A be the current signal (set of candidate sites), and let I(A) be the corresponding information content.
Let B be a set of sites obtained by randomly choosing a different site in one sequence, and let I(B) be its information content.
• if I(B) ≥ I(A), B is accepted
• if I(B) < I(A), B is accepted with probability P = exp[(I(B) – I(A)) / T]
The temperature T decreases exponentially, but slowly; the initial temperature is chosen such that almost all changes are accepted.
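The acceptance rule can be sketched as below. Assumptions beyond the slide: a uniform background of 0.25 and base-2 logarithms in I, hypothetical defaults for the initial temperature `t0`, the cooling factor, and the number of steps, and a proposal that may occasionally re-draw the current site. The sketch also remembers the best configuration seen, which the slide does not mention.

```python
import math
import random

BASES = "ACGT"


def information_content(words):
    """I for a set of equal-length candidate sites (uniform background assumed)."""
    total = 0.0
    for j in range(len(words[0])):
        for b in BASES:
            f = sum(w[j] == b for w in words) / len(words)
            if f > 0:
                total += f * math.log2(f / 0.25)
    return total


def anneal(seqs, k, t0=1.0, cooling=0.999, steps=5000, seed=0):
    rng = random.Random(seed)
    pos = [rng.randrange(len(s) - k + 1) for s in seqs]
    current = information_content([s[p:p + k] for s, p in zip(seqs, pos)])
    best, best_pos = current, list(pos)
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(seqs))
        cand = list(pos)
        cand[i] = rng.randrange(len(seqs[i]) - k + 1)  # new site in one sequence
        value = information_content([s[p:p + k] for s, p in zip(seqs, cand)])
        # Accept improvements always, deteriorations with probability exp(dI / T).
        if value >= current or rng.random() < math.exp((value - current) / t):
            pos, current = cand, value
            if current > best:
                best, best_pos = current, list(pos)
        t *= cooling  # exponential, slow decrease of the temperature
    return best_pos, best


print(anneal(["TTGGAGGAA", "CCAGGAGGC", "GGAGGTTTT"], k=5))
```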
![Page 43: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/43.jpg)
Gibbs sampler
Again, A is a signal (set of sites), and I(A) is its information content.
At each step a new site is selected in one sequence with probability
P ~ exp[I(Anew)]
For each candidate site the total time of occupation is computed.
(Note that the signal changes all the time)
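A sketch of the sampling step as worded on this slide, with the selection probability taken proportional to exp(I) of the resulting site set. This follows the slide rather than any particular published Gibbs sampler; the uniform background, the base-2 logarithm, and the function names are assumptions.

```python
import math
import random

BASES = "ACGT"


def information_content(words):
    total = 0.0
    for j in range(len(words[0])):
        for b in BASES:
            f = sum(w[j] == b for w in words) / len(words)
            if f > 0:
                total += f * math.log2(f / 0.25)
    return total


def gibbs(seqs, k, steps=2000, seed=0):
    """Return, for each sequence, how many steps each candidate site was occupied."""
    rng = random.Random(seed)
    pos = [rng.randrange(len(s) - k + 1) for s in seqs]
    occupancy = [[0] * (len(s) - k + 1) for s in seqs]
    for _ in range(steps):
        i = rng.randrange(len(seqs))
        weights = []
        for p in range(len(seqs[i]) - k + 1):
            trial = pos[:i] + [p] + pos[i + 1:]
            words = [s[q:q + k] for s, q in zip(seqs, trial)]
            weights.append(math.exp(information_content(words)))
        # Draw the new site in sequence i with probability proportional to exp(I).
        pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
        for j, p in enumerate(pos):
            occupancy[j][p] += 1
    return occupancy


for row in gibbs(["TTGGAGGAA", "CCAGGAGGC", "GGAGGTTTT"], k=5, steps=500):
    print(row)
```

Sites with the largest occupation counts are the ones the sampler spent most time in, which is the quantity the slide proposes to read off.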
![Page 44: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/44.jpg)
Use of symmetry
• DNA-binding factors and their signals
Co-operative homogeneous
Palindromes
Repeats
Co-operative non-homogeneous
Cassettes
Others
RNA signals
![Page 45: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/45.jpg)
Recognition: PWM/profiles
The simplest technique: positional nucleotide weights are
$W(b,j) = \ln(N(b,j)+0.5) - 0.25 \sum_i \ln(N(i,j)+0.5)$
Score of a candidate site b1…bk is the sum of the corresponding positional nucleotide weights:
$S(b_1 \ldots b_k) = \sum_{j=1}^{k} W(b_j, j)$
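Scoring and scanning with such a profile might look like the sketch below. The training sites, the scanned sequence, and the threshold are toy values chosen for illustration; only the weight and score formulas come from the slide.

```python
import math

BASES = "ACGT"


def weights_from_sites(sites, pseudo=0.5):
    """Positional weights W(b,j) from a set of aligned sites."""
    table = []
    for j in range(len(sites[0])):
        logs = {b: math.log(sum(s[j] == b for s in sites) + pseudo) for b in BASES}
        mean = sum(logs.values()) / len(BASES)
        table.append({b: logs[b] - mean for b in BASES})
    return table


def site_score(word, table):
    """S(b1...bk): sum of the positional weights of the observed nucleotides."""
    return sum(column[base] for base, column in zip(word, table))


def scan(sequence, table, threshold):
    """Return (start, score) for every window scoring at least the threshold."""
    k = len(table)
    hits = []
    for start in range(len(sequence) - k + 1):
        s = site_score(sequence[start:start + k], table)
        if s >= threshold:
            hits.append((start, s))
    return hits


table = weights_from_sites(["GGAGG", "GGAGG", "GGAGG"])
print(scan("TTGGAGGAA", table, threshold=5.0))
```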
![Page 46: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/46.jpg)
Distribution of RBS profile scores on sites (green) and non-sites (red)
![Page 47: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/47.jpg)
Pattern recognition
• Linear discriminant analysis
• Logical rules
• Syntactic analysis
• Context-sensitive grammars
• Perceptron
• Neural networks
![Page 48: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/48.jpg)
Neural networks: architecture
• 4k input neurons (sensors), each responsible for observing a particular nucleotide at a particular position
OR 2k neurons (one discriminates between purines and pyrimidines, the other between AT and GC)
• One or more layers of hidden neurons
• One output neuron
![Page 49: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/49.jpg)
• Each neuron is connected to all neurons of the next layer
• Each connection is ascribed a numerical weight
A neuron
• Sums the signals at incoming connections
• Compares the total with the threshold (or transforms it according to a fixed function)
• If the threshold is passed, excites the outgoing connections (resp. sends the modified value)
![Page 50: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/50.jpg)
Training:
• Sites and non-sites from the training sample are presented one by one.
• The output neuron produces the prediction.
• The connection weights and thresholds are modified if the prediction is incorrect.
Networks differ by architecture, particulars of the signal processing, and the training schedule.
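The training loop can be illustrated with the simplest case: a single output neuron with no hidden layer (a perceptron) on the 4k-sensor encoding. This is a deliberately reduced sketch, not the multilayer networks the slides refer to; the learning rate, the epoch cap, and the two training words are hypothetical.

```python
BASES = "ACGT"


def encode(word):
    """4k binary inputs: one sensor per (position, nucleotide) pair."""
    return [1.0 if ch == b else 0.0 for ch in word for b in BASES]


def predict(weights, bias, x):
    """Sum the weighted inputs and compare the total with the threshold."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0


def train(samples, epochs=100, rate=1.0):
    """Present (word, label) pairs one by one; adjust weights only on errors."""
    n = len(encode(samples[0][0]))
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        errors = 0
        for word, label in samples:
            x = encode(word)
            delta = label - predict(weights, bias, x)
            if delta:
                errors += 1
                weights = [w + rate * delta * v for w, v in zip(weights, x)]
                bias += rate * delta
        if errors == 0:
            break
    return weights, bias


weights, bias = train([("GGAGG", 1), ("TTTTT", 0)])
print(predict(weights, bias, encode("GGAGG")), predict(weights, bias, encode("TTTTT")))
```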
![Page 51: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/51.jpg)
Use of sequence context
• Presence of multiple co-operative sites
– ArgR (E. coli), purine regulator (Pyrococcus)
– XylR+CRP; CytR+CRP (E. coli)
– MEF+MyoD in muscle-specific promoters (mammals)
• Location relative to promoters
– repressors vs. activators
![Page 52: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/52.jpg)
Benchmarking
Difficult, because:
• Different algorithms are optimized for different performance parameters
• Incompatible training sets
• Difficult to construct a homogeneous and unambiguous testing set:
– Unobserved sites
– Competition between closely located sites
– Activation in specific conditions
– Non-specific binding (52 out of 54 candidate HNF-1 binding sites do bind the factor)
![Page 53: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/53.jpg)
Promoters of E. coli
• PWM at a false positive rate of 1 per 2000 bp recognizes:
– 25% of all promoters,
– 60% of constitutive (non-activated) promoters
• PWMs perform as well as neural networks
![Page 54: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/54.jpg)
Eukaryotic promoters
![Page 55: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/55.jpg)
Ribosome binding sites
• Information content of the profile predicts the average reliability of predictions
![Page 56: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/56.jpg)
CRP (E. coli)
[Plot: OV and UN (%, axis 0 to 110) as functions of the score threshold (3.0 to 5.0)]
OV: overprediction (% of false positives among candidate sites)
UN: underprediction (% of lost true sites)
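Both quantities follow directly from the definitions above once profile scores are available for known sites and for other candidates. A small sketch with made-up scores (none of these numbers come from the CRP data):

```python
def ov_un(site_scores, nonsite_scores, threshold):
    """OV: % of false positives among candidates passing the threshold.
    UN: % of true sites that fall below the threshold."""
    true_pos = sum(s >= threshold for s in site_scores)
    false_pos = sum(s >= threshold for s in nonsite_scores)
    candidates = true_pos + false_pos
    ov = 100.0 * false_pos / candidates if candidates else 0.0
    un = 100.0 * (len(site_scores) - true_pos) / len(site_scores)
    return ov, un


# Hypothetical scores, for illustration only.
sites = [3.5, 4.2, 4.8, 5.0]
non_sites = [3.1, 3.3, 3.9, 4.1]
for threshold in (3.0, 4.0, 5.0):
    print(threshold, ov_un(sites, non_sites, threshold))
```

Raising the threshold trades one error for the other: fewer false candidates pass, but more true sites are lost.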
![Page 57: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/57.jpg)
Comparative approach to the analysis of regulation
Making good predictions
with bad rules
![Page 58: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/58.jpg)
Regulation of transcription in prokaryotes
Difficult:
• Small sample size
• Weak signals (or we do not know what features are relevant, maybe the DNA structure)
![Page 59: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/59.jpg)
CRP (E. coli)
[Plot: OV and UN (%, axis 0 to 110) as functions of the score threshold (3.0 to 5.0)]
OV: overprediction (% of false positives among candidate sites)
UN: underprediction (% of lost true sites)
![Page 60: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/60.jpg)
GenBank entry for the E. coli genome
gene complement(120178..121551) /note="b0112" /gene="aroP"
CDS complement(120178..121551) /gene="aroP" /product="aromatic amino acid transport protein"
protein_bind complement(121599..121617) /bound_moiety="TyrR documented site"
protein_bind complement(121622..121640) /bound_moiety="TyrR documented site"
protein_bind complement(121653..121664) /bound_moiety="PutA predicted site"
promoter complement(121683..121711) /note="factor Sigma70; promoter aroP; documented +1 at 121671"
protein_bind complement(121810..121823) /bound_moiety="OxyR predicted site"
protein_bind complement(121813..121835) /bound_moiety="ArgR predicted site"
aroP TyrR TyrR PutA Pr. OxyR ArgR
![Page 61: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/61.jpg)
Many genomes are available =>
comparative approach
Basic assumption
Regulons (sets of co-regulated genes) are conserved
• well …in some cases
• in fact, in many cases
![Page 62: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/62.jpg)
Corollary: The consistency check
• True sites occur upstream of orthologous genes
• False sites are scattered at random
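The consistency check can be sketched as a filter over groups of orthologous genes. The data structures, the `min_genomes` cutoff, and the example entries (genome tags `Ec` and `Hi`, genes `geneX` and `geneY`) are hypothetical and serve only to show the logic.

```python
def consistency_filter(ortholog_groups, candidate_genes, min_genomes=2):
    """Keep ortholog groups whose members have candidate sites upstream
    in at least min_genomes genomes; isolated hits are treated as false."""
    kept = {}
    for group, members in ortholog_groups.items():
        genomes = sorted(
            genome for genome, gene in members.items()
            if gene in candidate_genes.get(genome, set())
        )
        if len(genomes) >= min_genomes:
            kept[group] = genomes
    return kept


# ortholog group -> {genome: gene}
groups = {
    "purL": {"Ec": "purL", "Hi": "purL"},
    "unrelated": {"Ec": "geneX", "Hi": "geneY"},
}
# genome -> genes with a candidate site upstream
candidates = {"Ec": {"purL", "geneX"}, "Hi": {"purL"}}
print(consistency_filter(groups, candidates))
```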
![Page 63: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/63.jpg)
Orthologs
• Orthologous genes:
– diverged by speciation
– retain cellular role
• Paralogous genes:
– diverged by duplication
– retain biochemical function only
![Page 64: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/64.jpg)
Orthology (definition)
• Genomes are shown as black “pipes”
• 1st event: duplication
• 2nd event: speciation
• Genes of the same color are orthologous
• Genes of different color are paralogous
![Page 65: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/65.jpg)
Search for orthologs (fast and dirty)
[Diagram: genes A, B, B" and A', B' in Genome 1 and Genome 2, linked by symmetrical best hits]
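A sketch of the symmetrical best hit rule. It assumes similarity scores are already available in both directions (for example from a pairwise protein comparison); the scores below and the assignment of A, B, B" to Genome 1 and A', B' to Genome 2 are invented for illustration.

```python
def symmetric_best_hits(scores_12, scores_21):
    """Pair gene a (genome 1) with gene b (genome 2) when b is the best hit
    of a and a is the best hit of b."""
    best_12 = {a: max(hits, key=hits.get) for a, hits in scores_12.items() if hits}
    best_21 = {b: max(hits, key=hits.get) for b, hits in scores_21.items() if hits}
    return sorted((a, b) for a, b in best_12.items() if best_21.get(b) == a)


# Hypothetical similarity scores: query gene -> {hit: score}
scores_12 = {
    "A": {"A'": 200, "B'": 50},
    "B": {"B'": 180, "A'": 40},
    'B"': {"B'": 150},
}
scores_21 = {
    "A'": {"A": 200, "B": 40},
    "B'": {"B": 180, 'B"': 150, "A": 50},
}
print(symmetric_best_hits(scores_12, scores_21))
```

Here B" finds B' as its best hit, but B' prefers B, so B" is left without an ortholog, which is the situation the rule is meant to handle.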
![Page 66: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/66.jpg)
The basic procedure
[Diagram: Set of known sites → Profile → Genome 1, Genome 2, …, Genome N]
![Page 67: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/67.jpg)
Accounting for the operon structure
[Diagram: operon arrangement in the «old» genome and the «new» genome; labels: A, BC, D, EF, XD, E, F, X]
![Page 68: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/68.jpg)
Checklist
• Presence of orthologous transcription factors
• Really orthologous (BETs, COGs etc. are not sufficient)
• * Conservation of the DNA-binding domain
• * Conservation of the core pathway
![Page 69: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/69.jpg)
Purine regulons of E. coli and H. influenzae
purR purR guaBA guaBA glyA pyrD pyrD prsA prsA glnB glnB purA purA codBA - codA pyrC - purT - gcvTHP - speAB - - ycfC purB
ycfC purB
purHD glyA
purHDglyA
purL purL cvpApurF
cvpApurF
purMN purMN purKE purKE purC purC yjcD yieG
HI0125
![Page 70: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/70.jpg)
Predicted purine transporters
[Figure: tree of predicted purine transporters. Leaf labels: YgfO, YicE, UAPA_En, UAPC_En, YgfU, 2635740_Bs, 2635741_Bs, YcdG_Ec, UraA_Hi, UraA_Ec, 2895752_Ef, PyrP_Bc, PyrP_Bs, YjcD_Hi, YjcD, YgfQ, YtiP_Bs, 2239289_Bs, YieG, YicO, Y326_Mj, 2314333_Hp, 2689889_Bb, 2689890_Bb, PbuX_Bs. Numbers at branches: 997, 746, 979, 965, 969, 981, 997, 980, 965, 758, 940, 714, 996, 997, 999, 994, 778, 749, 998, 1000]
![Page 71: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/71.jpg)
Changes in the operon structure: more examples
• glnK-amtB loci of methanogenic archaebacteria
M. thermoautotrophicum
NIF amtB glnK NIF amtB glnK
M. jannaschii
NIF glnK amtB
glnK NIF amtB
![Page 72: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/72.jpg)
Tryptophan operons
E. coli: trpE trpD trpC trpB trpA
H. influenzae: ydfG trpB trpA; trpE trpD trpC
![Page 73: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/73.jpg)
Heat shock (HrcA) regulons / CIRCE elements
Bacillus subtilis
CIRCE hrcA grpE dnaK dnaJ
CIRCE groES groEL
Mycobacterium tuberculosis
hrcA dnaJ
dnaK grpE dnaJ
CIRCE groES groEL
CIRCE groEL
Chlamydiae
CIRCE hrcA grpE dnaK
dnaJ
CIRCE groES groEL
groEL
Synechocystis
hrcA
grpE dnaK
dnaJ
CIRCE groES groEL
CIRCE groEL
Mycoplasma
hrcA
grpE
CIRCE dnaK
CIRCE dnaJ
CIRCE groES groEL
CIRCE lon
CIRCE clpB
![Page 74: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/74.jpg)
Closely related genomes: Phylogenetic footprinting
Regulatory sites are more conserved than non-coding regions in general and are often seen as conserved islands in alignments of gene upstream regions.
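The idea of conserved islands can be illustrated with a minimal sketch. It assumes the upstream regions are already aligned, treats a column as conserved only if all rows carry the same non-gap base, and reports runs of at least `min_len` such columns. The function name `conserved_islands` and the threshold of 8 columns are illustrative choices, not part of the original method. The toy rows are the first 18 columns of the purL alignment from this deck.

```python
def conserved_islands(rows, min_len=8):
    """Return half-open (start, end) column ranges where every row has the same non-gap base."""
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    islands, start = [], None
    for col, chars in enumerate(zip(*rows)):
        conserved = chars[0] != "-" and all(c == chars[0] for c in chars)
        if conserved and start is None:
            start = col                      # a conserved run begins here
        elif not conserved and start is not None:
            if col - start >= min_len:       # keep only sufficiently long runs
                islands.append((start, col))
            start = None
    if start is not None and len(rows[0]) - start >= min_len:
        islands.append((start, len(rows[0])))
    return islands


# First 18 alignment columns of the purL upstream region (ST, EC, YP rows).
rows = [
    "AGCGGCATTTTGCGTAAC",
    "AGCGGCATTTTGCGTAAA",
    "AGTGGCATTTTGCGCAAC",
]
print(conserved_islands(rows, min_len=8))  # [(3, 14)]
```

A strict all-rows-identical criterion is the simplest choice; with more genomes one would probably relax it to a per-column identity fraction.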
![Page 75: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/75.jpg)
High conservation
purL
ST AGCGGCATTTTGCGTAACAATGCGCCAGTTGGCAACTT-ATT-CGCAACGATAGCCGCACC--GTATGACAAGAAAAAGC
EC AGCGGCATTTTGCGTAAACCTGCGCCAGATGGCAACTT-ATT-ACAGCCATTGGCGGCACG--CGTTGCTAATTCACGAT
YP AGTGGCATTTTGCGCAACAAAACGCCAGTGTGCAACTTTATTGCGAGCTATTTGCTGAGTCTGCGTTACACACACATAGC
** *********** ** ****** ******* *** * ** * * * *

ST GG-TGATT---------TTATTTCT-------ACGCAAACGGTTTCGTCGGCGCGTCAGATTCTTTATAATGACGGCCGT
EC GG-TGATT---------TTATTTCC-------ACGCAAACGGTTTCGTCAGCGCATCAGATTCTTTATAATGACGCCCGT
YP GGCTGTTTCTGACTGAATTATTAATAATAGATACGCAAACGGTTTCGTCGGCGGCTCAGATTCACTATAATGGCGCGCGT
** ** ** ***** ***************** *** ******** ******* ** ***

ST TTCCCCCC-------------------TTGCGCACACCAAA--------------GCTTAGAAGACGAGAGA--CTTA--
EC TTCCCCCCC------------------TTGGGTACACCGAAA-------------GCTTAGAAGACGAGAGA--CTTA--
YP TTTGCCCTGTTGTTGCGCCAATGAATGTTGCGCCCAATGAAGTGCTGTTCCAGCCGCTTCGAAGACGAGAGAAACTTAGA
** *** *** * ** ** **** ************ ****

ST TGATGGAAATTCTGCGTGGTTCGCCTGCACTGTCTGCATTCCGTATCAATAAACTGCTGGCGCGCTTTCAGGCTGCCAAC
EC TGATGGAAATTCTGCGTGGTTCGCCTGCACTGTCGGCATTCCGAATCAACAAACTGCTGGCACGTTTTCAGGCTGCCAGG
YP TTATGGAAATACTGCGTGGTTCACCCGCTTTGTCGGCTTTTCGTATCACCAAACTGTTGTCCCGTTGCCAGGATGCTCAC
* ******** *********** ** ** **** ** ** ** **** ****** ** * ** * **** ***
![Page 76: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/76.jpg)
Low conservation
yjcD
ST AAA-GCATAAAAAGCGGCAAAGTTCAGTTGAAAAAGCGTTGATGATCGCTGGATAATCGTTTGCTTTTTTTTG---CCAC
EC AAA-GAGAAAAAAGCAGCAAACTTCGGTTGAAAAAGCCGCTATGATCGCCGGATAATCGTTTGCTTTTTTTA----CCAC
YP AAATGTATTAAATGTCGCATTCGGGTGTTGATTAGTCACCACTGATGGCTAGATAATCGTTTGCCTTAAATGACATCTGC
*** * *** * *** ***** * * **** ** ************* ** * * *

ST CC--------GTTTTGT--------ATACGTG----GAGCTAAACGTTTGCTTTTTTGCGGCGCCCCG-G-TTGTCGTAA
EC CC--------GTTTTGT--------ATGCGCG----GAGCTAAACGTTTGCTTTTTTGCGACGCAGCA-AATTGTCGCAA
YP CCTAAACTTCGATTTTTTTTCAGTCATGCGTTCTCCCAGCTAATCGTTTGCTATTTTTCCCCGCTCTATGAGTCAGGGAG
** * *** * ** ** ****** ******** **** * *** * * *

ST ATGTAGC----------ACAAGGA-GATAACGTTGCGCTGTTAGTGGATTACCTCCCACGTATACCGACGAATAATAAAT
EC ACCTGGA----------GCAGGAA-GATAACGTTTCGCTGGCAGGGGATTGTCCGCCACGCATCTTGACGAAAATTAAAC
YP AGTTAGTGAGTTCATCGACAGGAACGGAAACGATTACGTAGAGAAGGGCGCTTGGCTTGGCATGCTATTTTAAAATGA-C
* * * ** * * * **** * * ** * * ** * * * *

ST TCTCAGGGGATGTTTTCT-ATGTCT------ACGCCTTCAGCGCGTACCGGCGGTTCACTCGACGCCTGGTTTAAAATTT
EC TCTCAGGGGATGTTTTCTTATGTCT------ACGCCATCAGCGCGTACCGGCGGTTCACTCGACGCCTGGTTTAAAATTT
YP ACACAGGGGACATCACC--ATGTCTAGCAGCAACCCTCAAGCACAGCCAAAGGGCACGCTTGATGCATTCTTTAAGCTTA
* ******* * * ****** * ** *** * * ** * ** ** ** * ***** **
![Page 77: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/77.jpg)
Degeneration of sites
trpH
ttGtACAagttaactaGTacaa
EC gtcgccgaATGTACTAGAGAACTAGTGCATtagcttat
ST accgcaggATGTACTAGTAAACTAGTTTAAtggattgg
YP gtcgtcggATGTTTTAACTAAATATTTTCAtgagtgat
EH ctcgccgcATGTACTGATGGGTAACCGGCGctgaactg
.**..* ****..*. .. .* . . . .
BA tcactgtatttttttagtatactattaaacttatcctc
![Page 78: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/78.jpg)
Problems and solutions
Unique members of regulons may be lost: use of additional genomes decreases the number of “orphan” regulon members.
Closely related factors may have similar sites: careful analysis of function and analysis of particular sites is usually sufficient to resolve ambiguities.
Too many genomes and regulons: apply preliminary automated screening.
![Page 79: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/79.jpg)
Modification: ubiquitous regulators
• Present in many genomes
• Only core regulon is conserved
• Mode of regulation may vary
• Signals may be slightly different
![Page 80: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/80.jpg)
Arginine repressor ArgR/AhrC
[Figure: gene loci by genome. Labels recovered from the figure, grouped by locus-tag prefix:]
• Mycobacterium tuberculosis: Rv1652, Rv1653, Rv1654, Rv1655, Rv1656, Rv1658, Rv1659, Rv1383, Rv1384
• Bacillus subtilis: argC, argJ, argB, argD, argF, argG, argH, carA, carB, yqiX, yqiY, yqiZ; rocR, rocC, rocA, rocB, rocF, rocD, rocE; yqjN
• Clostridium acetobutylicum: 2787, 2788, 2786, 2785, 414, 1203, 1204, 3089, 3090, 4268, 4266, 4265, 2443, 491, 3533
• Thermotoga maritima: TM1782, TM1783, TM1784, TM1785, TM1097, TM1780, TM1781, TM0558, TM0577, TM0593, TM0592, TM0591, TM0371
• Deinococcus radiodurans: ?, ?, ?, DR1415, DR0080, DR0674, DR0678, DR684, DR0668, DR2610, ?, ?, DR0742
• Escherichia coli: argC, argB, argD, argF, argG, argH, carA, carB, artI, artM, artQ, argR
• Haemophilus influenzae: ?, HI0596, HI0811, HI1727, HI1209; HI1179, H1177, HI1178, HI1180
• Vibrio cholerae: VC2644, VC2643, VC2641, argR, VC2645, VC2642, VC2618, VC2390, VC2389, VC2508, VCA0759, VCA0757, VCA0758, VCA0760, VC2316
• Unplaced labels: artJ, argE, argA, artP, AhrC (twice)
![Page 81: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/81.jpg)
ABC transporters (periplasmic components)
[Figure: phylogenetic tree, scale bar 0.1 changes per site. Leaves: TM1170, CA_3898, HI1080, BS_yckK, DR0564, Cpn0604, DR2278, Cpn0482, HI1179, EC_artJ (arg), EC_artI (arg), EC_argT (arg), EC_hisJ (his), TM0593, BS_glnH (gln), Rv0411c, EC_ybeJ, EC_yhdW, BS_yqiX, EC_glnH (gln), CA_0129, DR2154, DR2610, CA_4268, CA_0491, BS_yxeM, CA_1093, BS_ytmK, BS_ytmJ, EC_fliY (biosynthesis of flagella).]
![Page 82: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/82.jpg)
Modification: horizontal transfer
• Impossible to resolve the orthology relationships: a homologous regulated gene is sufficient for corroboration
• Often regulate large loci (several adjacent operons)
• Signals are mainly conserved
![Page 83: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/83.jpg)
New signals
• Select a group of related genomes
• In each genome select metabolically related genes
• Add possibly co-transcribed genes
• Compare upstream regions for each genome independently
• Construct profiles
• Compare constructed profiles: if similar, then relevant
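The last two steps of the list above (construct profiles, then compare them) can be sketched as follows. This is a minimal illustration, not the procedure used in the talk: it assumes each genome yields a set of aligned, equal-length, uppercase candidate sites, builds a frequency profile with a pseudocount, and compares two profiles by mean per-position Euclidean distance. The names `build_profile`, `information_content`, and `profile_distance`, the pseudocount of 0.5, the uniform background, and the toy sites are all assumptions.

```python
import math

ALPHABET = "ACGT"


def build_profile(sites, pseudocount=0.5):
    """Per-position base frequencies from aligned, equal-length sites."""
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    profile = []
    for column in zip(*sites):
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({b: (column.count(b) + pseudocount) / total for b in ALPHABET})
    return profile


def information_content(profile):
    """Bits per position relative to a uniform background (maximum 2 bits)."""
    return [2.0 + sum(p * math.log2(p) for p in col.values()) for col in profile]


def profile_distance(p, q):
    """Mean per-position Euclidean distance between two same-length profiles."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    per_pos = [math.sqrt(sum((a[b] - c[b]) ** 2 for b in ALPHABET)) for a, c in zip(p, q)]
    return sum(per_pos) / len(per_pos)


# Hypothetical sites from three genomes; A and B share a signal, C does not.
genome_a = build_profile(["TTGACA", "TTGACT", "TTGCCA", "TTGACA"])
genome_b = build_profile(["TTGACA", "TTGACA", "TAGACA", "TTGATA"])
genome_c = build_profile(["GCCGGC", "GCCGGC", "ACCGGT", "GCCGGC"])

print(profile_distance(genome_a, genome_b) < profile_distance(genome_a, genome_c))  # True
```

Profiles built independently in each genome that end up close to each other are unlikely to have arisen by chance, which is the rationale for the "if similar, then relevant" rule.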
![Page 84: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/84.jpg)
The purine regulon of Pyrococcus spp.
• Use functional annotation and COGs to select genes encoding enzymes from the purine pathway: purA, purB, purC, purF, purD, purE, purL-I, purL-II, purT, guaA.
• Construct profiles for each genome. The quality of profiles is weak (< 1 bit/position).
• However, the profiles are almost identical.
• There is no significant similarity of upstream regions (outside sites). Thus the profiles are probably correct.
• Low specificity of profiles, thus >300 candidate genes in each genome.
• Observation: in the upstream regions of all genes from the initial sample, the candidate sites occur twice, separated by a 22 bp spacer.
• The new rule is absolutely specific: only one additional gene in each genome.
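The paired-site rule can be sketched as a filter over candidate site positions. The sketch assumes a weak profile has already produced a list of candidate start coordinates in one upstream region, and that the 22 bp spacer is measured from the end of the first site to the start of the second; the source states neither the site length nor how the spacer is counted, so `site_len` and that convention are assumptions, as is the function name `paired_sites`.

```python
def paired_sites(starts, site_len, spacer=22):
    """Return (first_start, second_start) pairs of candidate sites separated by `spacer` bp.

    `starts` are 0-based start coordinates of candidate sites in one upstream region.
    The spacer is counted from the end of the first site to the start of the second.
    """
    start_set = set(starts)
    offset = site_len + spacer
    return [(s, s + offset) for s in sorted(start_set) if s + offset in start_set]


# Hypothetical candidate positions from a low-specificity profile scan.
candidates = [5, 40, 33, 100]
print(paired_sites(candidates, site_len=6))  # [(5, 33)]
```

Requiring two weak matches at a fixed distance is much more restrictive than a single match, which is consistent with the drop from more than 300 candidates to one additional gene per genome.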
![Page 85: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/85.jpg)
[Figure: the tree from the "Predicted purine transporters" slide (leaves YgfO through PbuX_Bs, branch support values 714 to 1000), with three added labels: PH, PA, PF.]
![Page 86: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/86.jpg)
Sources
• G. Stormo
• J. Fickett
• W. Miller
• I. Dubchak
• Yuh et al. (1998)
• Tronche et al. (1997)
• textbooks
![Page 87: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/87.jpg)
Discussions and collaboration
• Farid Chetouani (Institut Pasteur)
• Eugene Koonin (NCBI)
• Yuri Kozlov (Ajinomoto)
• Leonid Mirny (Harvard - MIT)
• Alexander Mironov (GosNIIGenetika)
• Vasily Lyubetsky (Inst. Probl. Inform. Trans.)
• Andrey Osterman (IntegratedGenomics)
• Danila Perumov (Inst. Nucl. Phys.)
• Pavel Pevzner (UC San Diego)
• Michael Roytberg (Inst. Math. Probl. Biol.)
![Page 88: Recognition of regulatory signals Mikhail S. Gelfand IntegratedGenomics-Moscow NATO ASI School, October 2001.](https://reader036.fdocuments.in/reader036/viewer/2022081513/56649e245503460f94b12281/html5/thumbnails/88.jpg)
Collaborators
• Andrey A. Mironov
• A. B. Rakhmaninova
• Vadim Brodyansky
• Lyudmila Danilova
• Anna Gerasimova
• Alexey Kazakov
• Ekaterina Kotelnikova
• Olga Laikova
• Pavel Novichkov
• Ekaterina Panina
• Elya Permina
• Dmitry Ravcheev
• Dmitry Rodionov
• Natalya Sadovskaya
• Alexey Vitreschak