Shmuel Pietrokovski Computational and experimental analysis of … · 2002-01-09 · of proteins...

Genomics, Proteomics and Bioinformatics

Computational and experimental analysis of proteins - sequence to structure/function relations

Department of Molecular Genetics

Shmuel Pietrokovski

Tel. 972 8 934 2747 Fax. 972 8 934 4108 E-mail: [email protected]

Gil Amitai, Olga Belenkiy, Yonathan Caspi, Milana Frenkel, Dana Gerber, Shay Marcus, Einat Sitbon

Our main area of research is the relation between the function and structure of proteins and their sequence. We approach this basic problem by computational sequence analysis and experimental research. Our computational studies focus on the analysis and use of conserved protein regions. The group is developing methods for identifying and using protein motifs. Much of this work involves the Blocks Database of conserved protein motifs and the tools we develop to use and maintain it. We also carry out in-depth computational studies of particular protein families. These studies guide our experimental research on some of these families. For other families we collaborate with experimental research groups. Data and insights from our computational analyses have been used successfully to direct experimental work.

Fig. 1 Rossmann-type folds ligand binding motif. CYRCA sequence analysis identified a set of 47 different blocks, 22 of which are from families with known protein structures. These structures belong to eight different folds. The figure shows a structure superimposition of ligand binding regions from representative structures of the folds. Note that all ligands are bound on one plane and their phosphate/sulfate groups are placed in two regions. Each structure is colored differently, with the protein backbones shown as strands and the ligands shown as ‘sticks’, with phosphates, sulfate and an oxygen to be phosphorylated shown as spheres. See Kunin et al., 2001.


Motifs can be identified by multiple sequence alignment of proteins. Aligned motifs are called blocks. Blocks are useful in many areas of protein research. In particular, they are effective in identifying new family members through database sequence searches, in predicting a protein's function and structure, and in designing PCR primers to amplify genes of unknown family members.
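To make the idea of a block concrete, here is a minimal Python sketch that pulls ungapped column runs out of an existing multiple sequence alignment. It is only an illustration under simplifying assumptions: the toy alignment, the function names `find_blocks` and `extract_block`, and the `min_width` cutoff are all hypothetical, and this is not the procedure used to build the Blocks Database.

```python
from typing import List, Tuple


def find_blocks(alignment: List[str], min_width: int = 4) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of ungapped runs.

    A column counts as ungapped when no aligned sequence has '-' in it.
    Runs shorter than min_width are discarded.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    blocks = []
    start = None
    # Iterate one column past the end so that an open run is always closed.
    for col in range(length + 1):
        ungapped = col < length and all(row[col] != "-" for row in alignment)
        if ungapped and start is None:
            start = col
        elif not ungapped and start is not None:
            if col - start >= min_width:
                blocks.append((start, col))
            start = None
    return blocks


def extract_block(alignment: List[str], span: Tuple[int, int]) -> List[str]:
    """Slice the aligned segment of every sequence for one column range."""
    start, end = span
    return [row[start:end] for row in alignment]


# Toy alignment (invented sequences, for illustration only).
alignment = [
    "MK-GSGKSTLA--RELLG",
    "MRAGSGKTTLAQ-KDLLG",
    "M--GAGKSTLVNAREL-G",
]
spans = find_blocks(alignment, min_width=4)
for span in spans:
    print(span, extract_block(alignment, span))
```

With this toy input only one run is long enough to be kept; the shorter ungapped stretches are dropped by the width cutoff. A real motif finder would also score conservation within the columns rather than just the absence of gaps.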

The latest method our group developed is CYRCA (Cyclical Relations Consistency Analysis). CYRCA processes the results of block-to-block comparisons (done with our LAMA program) and identifies sets of related blocks. Each set typically characterizes a motif of similar function and structure that appears in different contexts, for example ATP-binding sites, helix-turn-helix DNA-binding motifs and the active residues of catalytic triad proteases. Many of these sets cannot be identified by sequence-to-sequence or block-to-sequence comparisons. Our automatic grouping procedure makes it possible to order large amounts of protein sequence data hierarchically: individual sequences are grouped into families, motifs within each family are identified and aligned in blocks, and blocks from different families are grouped in CYRCA sets. This procedure offers detailed functional predictions for whole proteins, specific regions and single residues that can be tested experimentally.
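The notion of consistency among cyclical relations can be illustrated with a small Python sketch. It assumes that each block-to-block comparison is reduced to a single relative offset between two blocks, and it treats three blocks as consistent when the offsets around the triangle agree (A to B plus B to C equals A to C). The `hits` dictionary, the block names and the function names are hypothetical stand-ins; scoring, thresholds and the other details of the published method are not modeled here.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# (block_a, block_b) -> offset of block_b relative to block_a in their alignment
Hits = Dict[Tuple[str, str], int]


def offset(hits: Hits, a: str, b: str) -> Optional[int]:
    """Offset of b relative to a, or None if the pair was never aligned."""
    if (a, b) in hits:
        return hits[(a, b)]
    if (b, a) in hits:
        return -hits[(b, a)]
    return None


def consistent_sets(hits: Hits) -> List[List[str]]:
    """Group blocks that are linked through mutually consistent triangles."""
    blocks = sorted({name for pair in hits for name in pair})
    parent = {b: b for b in blocks}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, c in combinations(blocks, 3):
        ab, bc, ac = offset(hits, a, b), offset(hits, b, c), offset(hits, a, c)
        if ab is None or bc is None or ac is None:
            continue  # the cycle is incomplete, so it cannot be checked
        if ab + bc == ac:
            # The three pairwise alignments agree: merge the blocks into one set.
            parent[find(a)] = find(b)
            parent[find(b)] = find(c)

    groups: Dict[str, List[str]] = {}
    for b in blocks:
        groups.setdefault(find(b), []).append(b)
    # Keep only sets supported by at least one consistent triangle.
    return sorted(sorted(g) for g in groups.values() if len(g) >= 3)


# Hypothetical pairwise hits between five blocks.
hits = {
    ("A", "B"): 2, ("B", "C"): -1, ("A", "C"): 1,  # consistent cycle
    ("C", "D"): 3, ("B", "D"): 5,                  # B-C-D is inconsistent
    ("D", "E"): 0,                                 # isolated pair
}
print(consistent_sets(hits))
```

The design choice worth noting is that a single pairwise hit is never enough to place a block in a set. Only agreement around a closed cycle does, which is how weak similarities that would be ignored individually can still support each other.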

We are studying in detail motifs that are found in different protein families and interact with a single type of protein. For example, PCNA is a protein that forms a ‘sliding clamp’ on DNA. Proteins that bind to PCNA can thus interact processively with DNA without binding specifically to any one region along it. Such proteins include DNA replication proteins, DNA and chromatin maintenance proteins, and cell cycle regulators. PCNA binding sites found in different protein families mainly evolved independently of each other, by convergence. Beyond improving the detection of these important sites and identifying new proteins that bind PCNA, we study the evolution of protein-protein interaction sites. This is relevant to research on protein post-translational modification sites, degradation signals, etc.

The main protein family we currently study is the inteins. Inteins are selfish genetic elements. They code for proteins that catalyze their own excision from host proteins, ligating the host flanks with a peptide bond. This protein-splicing activity is autoproteolytic and does not depend on any host-specific factors. Most inteins also include a homing endonuclease domain that mediates the recombination of the intein gene into alleles lacking the intein element. The protein-splicing and endonuclease domains found in inteins are also present in other proteins. In some, the function of the domain is different, as in the C-terminal part of Hedgehog developmental proteins, which has a protein-splicing-like domain. In others, the function seems to be the same, as in different types of nucleases that have homing endonuclease domains.
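At the sequence level, the outcome of protein splicing described above can be written as a simple string operation: the intein segment is removed and the two host flanks are joined. The Python sketch below shows only this bookkeeping, with an invented precursor sequence and hypothetical coordinates and function name; it says nothing about the chemistry of the reaction.

```python
from typing import Tuple


def protein_splice(precursor: str, intein_start: int, intein_end: int) -> Tuple[str, str]:
    """Return (ligated_host, excised_intein).

    The intein is assumed to occupy precursor[intein_start:intein_end]
    (0-based, end exclusive).
    """
    if not 0 <= intein_start < intein_end <= len(precursor):
        raise ValueError("intein coordinates must lie within the precursor")
    n_flank = precursor[:intein_start]   # host sequence before the intein
    intein = precursor[intein_start:intein_end]
    c_flank = precursor[intein_end:]     # host sequence after the intein
    return n_flank + c_flank, intein


# Invented toy precursor: N-terminal host flank + intein + C-terminal host flank.
precursor = "MKTAYIAK" + "CLSGDTHN" + "SQRELLG"
host, intein = protein_splice(precursor, 8, 16)
print(host)
print(intein)
```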

We are interested in intein function and evolution: understanding how inteins are selected and propagated, and where they originated. Inteins are very diverse in sequence, but all have a protein-splicing activity that is relatively simple to assay. Hence, inteins are convenient for studying protein sequence/function relations. Recently we have shown that inteins with highly atypical active site residues can protein-splice efficiently. We also identified, and demonstrated the activity of, a unique group of inteins that occur in insect viruses. These are the only inteins known to protein-splice naturally in the cytoplasm of multicellular organisms. The inteins we identified protein-spliced in E. coli and in insect cells, and an intein with an endonuclease domain cut intein-less alleles.

Selected Publications

Henikoff, S., E. A. Greene, S. Pietrokovski, P. Bork, T. K. Attwood and L. Hood (1997) Gene families: the taxonomy of protein paralogs and chimeras. Science 278, 609-614.

Pietrokovski, S. (1998) Identification of a virus intein and a possible variation in the protein-splicing reaction. Curr. Biol. 8, R634-R635.

Pietrokovski, S. (1998) Modular organization of inteins and C-terminal autocatalytic domains. Protein Sci. 7, 64-71.

Kowalski, J. C., M. Belfort, M. A. Stapleton, M. Holpert, J. T. Dansereau, S. Pietrokovski, S. M. Baxter and V. Derbyshire (1999) Configuration of the catalytic GIY-YIG domain of intron endonuclease I-TevI: coincidence of computational and molecular findings. Nucleic Acids Res. 27, 2115-2125.

Amitai, G., and S. Pietrokovski (1999) Fine-tuning an engineered intein. Nat. Biotechnol. 17, 854-855.

Kelman, Z., S. Pietrokovski and J. Hurwitz (1999) Isolation and Characterization of a Split B-type DNA Polymerase from the Archaeon Methanobacterium thermoautotrophicum DeltaH. J. Biol. Chem. 274, 28751-28761.

Henikoff, J. G., E. A. Greene, S. Pietrokovski and S. Henikoff (2000) Increased coverage of protein families with the Blocks Database servers. Nucleic Acids Res. 28, 228-230.

Sapir, T., D. Horesh, M. Caspi, R. Atlas, H. A. Burgess, S. Grayer Wolf, F. Francis, J. Chelly, M. Elbaum, S. Pietrokovski and O. Reiner (2000) Doublecortin mutations cluster in evolutionarily conserved functional domains. Hum. Mol. Gen. 9, 703-712.

Pietrokovski, S., and B-Z. Shilo (2001) Identification of new signaling components in the Drosophila genome sequence. Funct. Integ. Genomics 1, 250-255.

Kunin, V., B. Chan, E. Sitbon, G. Lithwick and S. Pietrokovski (2001) Consistency analysis of similarity between multiple alignments: prediction of protein function and fold structure from analysis of local sequence motifs. J. Mol. Biol. 307, 939-949.

Pietrokovski, S. (2001) Intein spread and extinction in evolution. Trends Genet. 17, 465-472.

Acknowledgements

Our work is supported by The Israel Science Foundation, founded by The Israel Academy of Sciences and Humanities, and by the MINERVA Foundation, Germany.

For additional information see: http://bioinfo.weizmann.ac.il/~pietro
