I SUPPORT OPEN ACCESS

PubMed Central: www.pubmedcentral.nih.gov/

Public Library of Science: www.publiclibraryofscience.org

Nucleic Acids Research: nar.oupjournals.org/

Thanks to the authors and reviewers who support NAR

Please introduce yourselves!

rebase.neb.com/rebase/rebase.html

RM Systems

Functional type, count, genes:
Type I: 94 (R, S, M)
Type II: 3701 (R, M)
Type III: 10 (res, mod)
Type IV: 3 (R, R)

Type II Subtypes

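EcoRI: GAATTC
BamHI: GGATCC
HphI: GGTGA
BsrDI: GCAATG
AhdI: GAC(N)5GTC
BcgI: CGA(N)6TGC
HpaII: CCGG

[Figure: gene organization of each system, drawn with R, M, C, S and V genes, including a fused RM gene, paired M1/M2 genes and paired R1/R2 genes]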
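Several of these recognition sequences contain runs of unspecified bases, written (N)k. As a sketch of how that notation can be turned into a search pattern, the Python below translates it into a regular expression and reports site positions on one strand. The function names are hypothetical, and only A, C, G, T plus the (N)k shorthand used above are handled; full IUPAC ambiguity codes and the bottom strand of asymmetric sites such as HphI's GGTGA are left out.

```python
import re


def site_to_regex(site: str) -> re.Pattern:
    """Translate a recognition sequence such as 'GAC(N)5GTC' into a compiled regex.

    Only A/C/G/T and the '(N)k' shorthand are handled (an assumption of this sketch).
    """
    # Each '(N)k' becomes exactly k arbitrary bases.
    pattern = re.sub(r"\(N\)(\d+)", lambda m: "[ACGT]{" + m.group(1) + "}", site)
    return re.compile(pattern)


def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of all top-strand matches, overlapping ones included."""
    regex = site_to_regex(site)
    # Wrapping the pattern in a lookahead lets finditer report overlapping sites.
    return [m.start() for m in re.finditer("(?=" + regex.pattern + ")", dna.upper())]


print(find_sites("ttgacaaaaagtctt", "GAC(N)5GTC"))  # [2]
print(find_sites("GAATTCGAATTC", "GAATTC"))          # [0, 6]
```

A regex keeps the site definition readable and mirrors the slide's notation directly; for whole genomes a dedicated motif scanner would be faster.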

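REBASE Entries (R = restriction enzyme, M = methyltransferase, S = specificity subunit; predicted entries in parentheses)
Type I: R 94 (predicted 830); M 65 (predicted 714); S 82 (predicted 753)
Type II: R 3701 (predicted 893); M 650 (predicted 2652)
Type III: R 10 (predicted 199); M 12 (predicted 232)
Type IV: R 3 (predicted 354)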

Type II Restriction Enzymes and Methylases

Total number of R specificities: 262

Number of sequenced examples: 188

Total number of M specificities: 253

Number of sequenced examples: 193

The Bioinformatics Problems of RM Systems

1. M genes

Easy to find using motifs

2. S and V genes

Easy to find using motifs

3. C genes

Some are easy (C.BamHI, etc.)

Some are difficult

4. R genes

Very difficult unless homologs exist
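The slide says M genes are easy to find using motifs but does not name them. As one illustration, assuming the conserved motif IV of amino-methyltransferases, (D/N/S)PP(Y/F/W) as in DPPY or NPPY, a minimal regex scan might look like this. The function names and toy sequences are hypothetical. Real searches would combine several motifs or profile models; a single motif hit is only a lead, and NPPY-containing proteins such as the HemK group show why.

```python
import re

# Motif IV of amino-methyltransferases (N6-adenine and N4-cytosine MTases):
# (D/N/S)-P-P-(Y/F/W), e.g. DPPY or NPPY.
MOTIF_IV = re.compile(r"[DNS]PP[YFW]")


def motif_iv_hits(protein: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched residues) for each motif IV occurrence."""
    return [(m.start(), m.group()) for m in MOTIF_IV.finditer(protein.upper())]


def flag_candidates(proteins: dict[str, str]) -> list[str]:
    """Names of proteins with at least one motif IV hit: leads to check, not calls."""
    return [name for name, seq in proteins.items() if motif_iv_hits(seq)]


demo = {"orf1": "MKLDPPYAANPPW", "orf2": "MSTTGKAAE"}  # toy sequences
print(motif_iv_hits(demo["orf1"]))  # [(3, 'DPPY'), (9, 'NPPW')]
print(flag_candidates(demo))        # ['orf1']
```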

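Sequenced Restriction Enzyme Genes

[Table: sequenced restriction enzyme genes by palindromic tetranucleotide recognition sequence and sequence family (Families 1 to 5); entries are counts, some followed by a second count in parentheses]

Recognition sequence: Family 1
AATT: 1
ACGT: 1 (2)
AGCT: 1
ATAT: none known
CATG: 2 (68)
CCGG: 1 (1)
CGCG: 2 (2)
CTAG: 3 (1)
GATC: 21 (13)
GCGC: 2
GGCC: 5 (4)
GTAC: 1 (1)
TATA: none known
TCGA: 6 (1)
TGCA: 1 (4)
TTAA: 1

Additional entries, Families 2 to 4: 1 (1), 1 (1), 1, 3 (7), 1, 2 (2), 1, 1, 1, 2 (5), 1, 1, 1
Family 5: 1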
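The sixteen recognition sequences in this table are exactly the tetranucleotides that equal their own reverse complement. A palindrome of even length k is fixed by its first k/2 bases, so there are 4^(k/2) of them (16 for k = 4), and none for odd k because the middle base cannot be its own complement. A short sketch enumerating them, with illustrative helper names, reproduces the table's sixteen sequences in the same A<C<G<T order.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def palindromes(k: int) -> list[str]:
    """All length-k DNA words equal to their own reverse complement, in A<C<G<T order."""
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return [w for w in words if w == reverse_complement(w)]


tetramers = palindromes(4)
print(len(tetramers))  # 16
print(tetramers)
```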

Analysis of new M gene hits

1. Is the overall sequence of the M gene similar to a known M gene?

2. Is the variable region (DNA recognition domain) highly similar to a known variable region?

3. Are there genes nearby that are similar to known S, V, C, R or other M genes?

4. Are the flanking genes similar to known non-R genes?
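Question 3 is essentially a genomic-neighborhood test. A minimal sketch follows, assuming each gene on a contig already carries a category label (R, S, V, C or M) from a homology search. The Gene class, the rm_neighbors function and the 2,000 bp window are hypothetical choices for illustration, not REBASE parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gene:
    name: str
    start: int                      # first base on the contig
    end: int                        # last base (inclusive), end >= start
    category: Optional[str] = None  # "R", "S", "V", "C" or "M" from homology; None if unassigned


RM_PARTNERS = {"R", "S", "V", "C", "M"}


def gap_bp(a: Gene, b: Gene) -> int:
    """Number of bases between two genes (0 if they touch or overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)


def rm_neighbors(genes: list[Gene], m_gene: Gene, window_bp: int = 2000) -> list[Gene]:
    """Genes with an RM-type label lying within window_bp of the candidate M gene."""
    return [g for g in genes
            if g is not m_gene
            and g.category in RM_PARTNERS
            and gap_bp(g, m_gene) <= window_bp]


contig = [
    Gene("m1", 1000, 2000, "M"),
    Gene("r1", 2050, 3000, "R"),
    Gene("orfX", 3100, 3500),             # no homology assigned
    Gene("s_far", 10000, 11000, "S"),
]
print([g.name for g in rm_neighbors(contig, contig[0])])  # ['r1']
```

A neighbor labeled R is the interesting case for question 3; an unlabeled neighbor such as orfX is where question 4 (similarity to known non-R genes) would come in.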

Problems

Methylases

1. What cutoff value will distinguish true positives from spurious hits?

a) How do we avoid just populating the database with more examples of the same?

b) How do we avoid the degeneration of the database by including marginal examples?

2. The HemK group of “apparent” methylases
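Problems 1a and 1b can be framed as a score cutoff plus redundancy filtering. The sketch below assumes each hit already carries a search score and an aligned sequence of equal length; the names, the cutoff and the 90% identity threshold are hypothetical. It walks the hits from best to worst score, stops at the cutoff, and keeps a hit only if it is not a near-copy of one already kept.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over alignment columns where neither row has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("rows must come from the same alignment (equal length)")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def nonredundant(hits: list[tuple[str, float, str]], min_score: float,
                 max_identity: float = 90.0) -> list[str]:
    """Greedy filter over (name, score, aligned_sequence) hits, best score first."""
    kept: list[tuple[str, float, str]] = []
    for name, score, aligned in sorted(hits, key=lambda h: h[1], reverse=True):
        if score < min_score:
            break  # sorted, so every remaining hit is marginal too (problem 1b)
        # Keep a hit only if it is not a near-copy of one already kept (problem 1a).
        if all(percent_identity(aligned, k[2]) < max_identity for k in kept):
            kept.append((name, score, aligned))
    return [k[0] for k in kept]


hits = [("a", 50.0, "MKVLA"), ("b", 45.0, "MKVLA"),
        ("c", 40.0, "MRTWQ"), ("d", 5.0, "WWWWW")]
print(nonredundant(hits, min_score=10.0))  # ['a', 'c']
```

Choosing min_score is exactly the open question in problem 1; the sketch only shows where such a value would plug in.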

Problems

Restriction enzymes

1. Even “true” matches are often very poor.

2. Good matches are “usually”, but not always, real isoschizomers. How do we distinguish?

3. Can we identify the “real” candidates in the absence of sequence similarity?

SHOTGUN SEQUENCING

[Figure: shotgun sequencing result showing R and M genes; labels HindII and HindVP]

1. Digestion of λ DNA using McaTI expression lysate only
2. Digestion using BssHII (NEB) only
3. Double digestion using BssHII and McaTI expression lysate

BssHII and McaTI have no significant sequence similarity!
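The three-lane experiment tests whether McaTI cuts at the same sites as BssHII: if it does, the double digest should give the same bands as either single digest. A sketch of that prediction follows, assuming BssHII's site G^CGCGC and, hypothetically, that McaTI recognizes and cuts the same sequence. The function names are illustrative, and the toy sequence stands in for λ DNA, whose sequence is not reproduced here.

```python
def cut_positions(dna: str, site: str, cut_offset: int) -> set[int]:
    """Top-strand cut coordinates: each cut falls cut_offset bases after a site's start."""
    dna = dna.upper()
    cuts = set()
    start = dna.find(site)
    while start != -1:
        cuts.add(start + cut_offset)
        start = dna.find(site, start + 1)  # step by 1 so overlapping sites are found
    return cuts


def fragment_lengths(length: int, cuts: set[int]) -> list[int]:
    """Fragment sizes (largest first) of a linear molecule cut at the given coordinates."""
    bounds = [0] + sorted(c for c in cuts if 0 < c < length) + [length]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


BSSHII = ("GCGCGC", 1)  # G^CGCGC
MCATI = ("GCGCGC", 1)   # hypothetical: assumed identical to BssHII for this sketch

toy = "A" * 4 + "GCGCGC" + "A" * 10 + "GCGCGC" + "A" * 4  # stand-in for lambda DNA
single = fragment_lengths(len(toy), cut_positions(toy, *BSSHII))
double = fragment_lengths(len(toy), cut_positions(toy, *BSSHII) | cut_positions(toy, *MCATI))
print(single, double)  # [16, 9, 5] [16, 9, 5]
```

An enzyme with the same site but a different cut position would add small extra fragments in the double digest, so the comparison can separate true isoschizomers from neoschizomers.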

Acknowledgements

Janos Posfai: Computer Scientist, Sequence Analysis

Tamas Vincze: Programmer, Sequence Analysis

Yu Zheng: Postdoctoral Fellow, in vitro experiments

Rick Morgan: Staff Scientist, experimental RE discovery

Dana Macelis: Programmer, REBASE
