I SUPPORT OPEN ACCESS
PubMed Central www.pubmedcentral.nih.gov/
Public Library of Science www.publiclibraryofscience.org
Nucleic Acids Research nar.oupjournals.org/
Thanks to the authors and reviewers who support NAR
Please introduce yourselves!
rebase.neb.com/rebase/rebase.html
RM Systems
Type      Functional   Genes
I         94           R S M
II        3701         R M
III       10           res mod
IV ( )    3            R R
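Type II Subtypes
Enzyme    Recognition sequence
EcoRI     GAATTC
BamHI     GGATCC
HphI      GGTGA
BsrDI     GCAATG
AhdI      GAC(N)5GTC
BcgI      CGA(N)6TGC
HpaII     CCGG
[Figure: gene organization of the subtypes. Gene labels as extracted: R M; C; M1 M2; S; RM; V; M R; R; M R C; S; M R; R1 R2 M1 M2]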
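The recognition sequences listed on this slide, including gapped sites such as GAC(N)5GTC, can be turned into search patterns. The following is a minimal sketch, not how REBASE does it: the function names `site_to_regex` and `find_sites` are hypothetical, and only the strand given is scanned, so a non-palindromic site such as GGTGA would also need a scan of the reverse complement.

```python
import re
from typing import List

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

def site_to_regex(site: str) -> str:
    """Convert a site such as 'GAC(N)5GTC' into a regex string."""
    # Expand '(N)5' into 'NNNNN' before translating each base.
    expanded = re.sub(
        r"\(([A-Z])\)(\d+)",
        lambda m: m.group(1) * int(m.group(2)),
        site.upper(),
    )
    return "".join(IUPAC[base] for base in expanded)

def find_sites(site: str, dna: str) -> List[int]:
    """Return 0-based start positions of all matches, overlaps included."""
    # A lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + site_to_regex(site) + ")")
    return [m.start() for m in pattern.finditer(dna.upper())]

if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    print(find_sites("GAATTC", "AAGAATTCGG"))
    print(find_sites("GAC(N)5GTC", "TTGACAAAAAGTCTT"))
```

Sequences found by different enzymes with the same pattern are candidates for isoschizomers, which is why a site-level comparison is a natural first check.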
REBASE Entries
[Figure: entries for Types I, II, III and IV. Legend: R, M, S, Predicted. Values as extracted, in order: 3701, 893, 650, 2652, 94, 830, 65, 714, 82, 753, 10, 199, 12, 232, 3, 354]
Type II Restriction Enzymes and Methylases
Total number of R specificities: 262
Number of sequenced examples: 188
Total number of M specificities: 253
Number of sequenced examples: 193
The Bioinformatics Problems of RM Systems
1. M genes
Easy to find using motifs
2. S and V genes
Easy to find using motifs
3. C genes
Some are easy (C.BamHI, etc.)
Some are difficult
4. R genes
Very difficult unless homologs exist
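To illustrate what "easy to find using motifs" can mean for M genes, here is a toy scan of a protein sequence for short conserved patterns. The three patterns are simplified, textbook-style approximations of methyltransferase motifs (an FxGxG AdoMet-binding stretch, the [DNS]PP[YFW] motif of amino-methyltransferases, and the PCQ motif of m5C methyltransferases). They are assumptions chosen for illustration, not the motif set used for REBASE, and real searches would normally rely on profile-based methods rather than bare regular expressions.

```python
import re
from typing import Dict, List

# Simplified, illustrative patterns (assumptions, not the REBASE motif set).
MOTIFS = {
    "AdoMet-binding FxGxG": re.compile(r"F.G.G"),
    "amino-MTase [DNS]PP[YFW]": re.compile(r"[DNS]PP[YFW]"),
    "m5C-MTase PCQ": re.compile(r"PCQ"),
}

def scan_protein(seq: str) -> Dict[str, List[int]]:
    """Return 0-based start positions of each motif in a protein sequence."""
    seq = seq.upper()
    return {
        name: [m.start() for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }

if __name__ == "__main__":
    # Synthetic sequence, made up for illustration only.
    for name, positions in scan_protein("MKTFAGSGGLLDPPYAA").items():
        print(name, positions)
```

Patterns this short will also match by chance, which is part of the cutoff problem raised later in the talk.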
Sequenced Restriction Enzyme Genes
Recognition Sequence   Family 1
AATT                   1
ACGT                   1 (2)
AGCT                   1
ATAT                   None known
CATG                   2 (68)
CCGG                   1 (1)
CGCG                   2 (2)
CTAG                   3 (1)
GATC                   21 (13)
GCGC                   2
GGCC                   5 (4)
GTAC                   1 (1)
TATA                   None known
TCGA                   6 (1)
TGCA                   1 (4)
TTAA                   1
Families 2, 3 and 4 (row alignment lost; values as extracted, in order): 1 (1); 1 (1); 1; 3 (7); 1; 2 (2); 1; 1; 1; 2 (5); 1; 1; 1
Family 5: 1
Analysis of new M gene hits
1. Is the overall sequence of the M gene similar to a known M gene?
2. Is the variable region (DNA recognition domain) highly similar to a known variable region?
3. Are there genes nearby that are similar to known S, V, C, R or other M genes?
4. Are the flanking genes similar to known non-R genes?
Problems
Methylases
1. What cutoff value will distinguish true positives from spurious hits?
a) How do we avoid just populating the database with more examples of the same?
b) How do we avoid the degeneration of the database by including marginal examples?
2. The HemK group of “apparent” methylases
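The cutoff question in item 1 can be explored by sweeping candidate thresholds over a set of hits whose status is already known and counting what each threshold admits or rejects. This is a minimal sketch under stated assumptions: the function `sweep_cutoffs` is hypothetical, the scores are assumed to be "higher is better" similarity scores, and the example hits are invented, not REBASE data.

```python
from typing import List, Tuple

def sweep_cutoffs(
    hits: List[Tuple[float, bool]], cutoffs: List[float]
) -> List[Tuple[float, int, int, int]]:
    """For each cutoff return (cutoff, true positives, false positives, false negatives).

    `hits` holds (score, is_real_methylase) pairs; a hit is accepted
    when its score is at least the cutoff.
    """
    rows = []
    for cutoff in cutoffs:
        accepted = [is_real for score, is_real in hits if score >= cutoff]
        tp = sum(accepted)
        fp = len(accepted) - tp
        fn = sum(is_real for score, is_real in hits if score < cutoff)
        rows.append((cutoff, tp, fp, fn))
    return rows

if __name__ == "__main__":
    # Invented example hits: (similarity score, known to be a real methylase?)
    example = [(250, True), (120, True), (60, True), (55, False), (40, False)]
    for row in sweep_cutoffs(example, [50, 100]):
        print(row)
```

A strict cutoff keeps marginal entries out of the database but misses divergent true methylases; a lenient one does the opposite, which is the trade-off items a) and b) describe.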
Problems
Restriction enzymes
1. Even “true” matches are often very poor.
2. Good matches are “usually”, but not always, real isoschizomers. How do we distinguish?
3. Can we identify the “real” candidates, in the absence of sequence similarity?
SHOTGUN SEQUENCING
[Figure: R and M genes. Labels as extracted: HindII, HindVP]
1. digestion of λ DNA using McaTI expression lysate only
2. digestion using BssHII (NEB) only
3. double digestion using BssHII and McaTI expression lysate
BssHII and McaTI have no significant sequence similarity!
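The logic of the three digests can be sketched as follows: if two enzymes cut at the same positions, a double digest yields the same fragment pattern as either single digest, whereas different cut sites produce extra, smaller fragments. That reading of the experiment is an interpretation, not something stated on the slide. The function names are hypothetical and the cut positions are invented; only the λ genome length (48,502 bp) is a real value.

```python
from typing import Iterable, List

LAMBDA_LENGTH = 48502  # length of the linear lambda genome in bp

def fragments(cut_positions: Iterable[int], length: int = LAMBDA_LENGTH) -> List[int]:
    """Fragment sizes (largest first) for a linear molecule cut at the given positions.

    Positions are assumed to lie strictly between 0 and `length`.
    """
    cuts = sorted(set(cut_positions))
    edges = [0] + cuts + [length]
    return sorted((b - a for a, b in zip(edges, edges[1:])), reverse=True)

def double_digest(
    cuts_a: Iterable[int], cuts_b: Iterable[int], length: int = LAMBDA_LENGTH
) -> List[int]:
    """Fragment sizes when both enzymes cut the same molecule."""
    return fragments(set(cuts_a) | set(cuts_b), length)

if __name__ == "__main__":
    # Invented cut positions, for illustration only.
    enzyme_a = [5000, 20000, 30000]
    same_sites = [5000, 20000, 30000]
    other_sites = [5000, 25000]
    print(fragments(enzyme_a))
    print(double_digest(enzyme_a, same_sites))   # same pattern as the single digest
    print(double_digest(enzyme_a, other_sites))  # extra fragments appear
```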
Acknowledgements
Janos Posfai Computer Scientist - Sequence Analysis
Tamas Vincze Programmer - Sequence Analysis
Yu Zheng Postdoctoral Fellow - in vitro experiments
Rick Morgan Staff Scientist - Experimental RE discovery
Dana Macelis Programmer - REBASE