CVOS2015IIAV4q



Development of Second-Generation RNA-Protein Statistical Potentials

Thu Nguyen, Phuc Tran, Takayuki Kimura, Amos Park, Ariana Cesare, Loc Nguyen, Reema Shalan and Brooke Lustig

[email protected]

Abstract

As a first test, we applied our established Lustig-Jernigan (1997) ribonucleotide-amino acid potentials to an updated coarse-grained model calculation involving some 100 million lattice structures for BIV TAR-Tat. The native target Tat peptide was identified within the lowest 0.28% of lattice structures as evaluated by these potentials. However, there appears to remain an opportunity for even better identification of the native target in the TAR-Tat model, and in other coarse-grained examples, by further refining the potentials. As a first step, we took the original 1997 RNA-protein contacts and replaced the RNA variants determined from the original small-RNA databases with BLASTn-defined RNA variants. The ordering of the quartet of potentials for each of five key amino acids (A, R, N, Q, E) remains largely consistent. Moreover, the correlation plot of the BLASTn-derived potentials versus the corresponding older potentials shows significant co-linearity. This suggests that it is worthwhile to explore a much larger ensemble (over 600) of post-1997 RNA-protein complexes, using ribonucleotide-amino acid contact information from X-ray structures together with sufficient complementary RNA variant data. An updated expression for normalizing the potentials, which accounts for random substitutions, shows promise for further refining these potentials.

Introduction

Computational modeling of RNA-protein interactions remains an important endeavor, driven in part by the accelerated identification of novel non-coding RNAs that exhibit a wide variety of functions and interactions. However, purely all-atom approaches that model RNA-protein interactions via molecular dynamics and related methods are often problematic to apply. This is very likely a result of limitations involving RNA flexibility and certain force-field potentials that prove problematic for such complexes. One possible alternative is a hierarchical approach: first efficiently explore configurational space with a coarse-grained representation of backbone placements, as in ROSETTA and in various lattice-model approaches. The lowest-energy set of such coarse-grained models can then be used as scaffolds for all-atom placements, a standard method in modeling protein 3D structure. However, this will likely require improved ribonucleotide-amino acid potentials for coarse-grained structures.

Conclusions

Low-energy coarse-grained structures can be evaluated with existing RNA-protein potentials, realistically allowing hierarchical modeling of low-energy all-atom models.

The correlation between the 1997 data and the BLASTn data shows that the latter approach is plausible.

Although there are initially only 20 BLASTn alignments for three RNA-protein complexes, an expected larger set of more than one hundred such complexes should provide a significantly larger sample.

References

-Gerstein, M. & Altman, R. B. Average Core Structures and Variability Measures for Protein Families: Application to the Immunoglobulins. J. Mol. Biol. 1995, 251, 161-175.
-Hsieh, M., Collins, E. D., Blomquist, T. & Lustig, B. Flexibility of BIV TAR-Tat: Models of peptide binding. J. Biomol. Struct. Dyn. 2002, 20, 243-247.
-Lustig, B., Arora, S. & Jernigan, R. L. RNA base-amino acid interaction strengths derived from structures and sequences. Nucleic Acids Res. 1997, 25, 2562-2565.
-Lustig, B. & Jernigan, R. L. Consistencies of individual DNA base-amino acid interactions in structures and sequences. Nucleic Acids Res. 1995, 23, 4707-4711.
-Perez-Cano, L., Solernou, A., Pons, C. & Fernández-Recio, J. Structural prediction of protein-RNA interaction by computational docking with propensity-based statistical potentials. Pac. Symp. Biocomput. 2010A, 15, 293-301.
-Perez-Cano, L. & Fernández-Recio, J. Optimal Protein-RNA Area, OPRA: A propensity-based method to identify RNA-binding sites on proteins. Proteins. 2010B, 78, 25-35.

Methods and Results

Figure 1. Lattice modeling of BIV TAR-Tat (Hsieh et al., 2002). (Left) The native BIV TAR RNA backbone (PDB 1MNB) and its bound Tat peptide backbone (orange and red ribbons), aligned with the corresponding lattice RNA and peptide (pink and green lines). (Right) Lattice-model energy distributions for non-mutant (red) and glycine-substituted (positions 75, 78, and 75/78) amino acid contacts, computed with the existing Lustig-Jernigan ribonucleotide-amino acid contact potentials.

Acknowledgements: This work used computational resources from the SJSU College of Science Network and Computer Services.

Estimated energy of the native-like structure: -83 kcal/mol
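The abstract's "lowest 0.28%" is a percentile rank: the fraction of lattice structures whose energy is at or below that of the native-like structure. A minimal sketch of that calculation follows. The function name, the choice to count ties against the native, and the toy energies are assumptions for illustration, not the poster's code or data.

```python
from typing import Iterable


def lowest_fraction(native_energy: float, lattice_energies: Iterable[float]) -> float:
    """Fraction of lattice structures with energy at or below the native-like energy.

    A small value means the native-like structure ranks among the lowest-energy
    structures. Ties count against the native (a conservative, assumed choice).
    """
    total = 0
    at_or_below = 0
    # Stream the energies so a very large ensemble never has to sit in memory.
    for energy in lattice_energies:
        total += 1
        if energy <= native_energy:
            at_or_below += 1
    if total == 0:
        raise ValueError("no lattice energies supplied")
    return at_or_below / total


# Hypothetical toy energies (kcal/mol); the native-like structure is one of them.
toy_energies = [-90.0, -83.0, -70.5, -60.0, -55.2, -40.0, -35.0, -20.0, -10.0, 0.0]
print(lowest_fraction(-83.0, toy_energies))  # 0.2
```

Because the energies are consumed as an iterable, the same function can read a generator over a file of ~100 million energies without loading them all at once.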

Figure 2. Contact energies for the common set of amino acids, normalized within each RNA base quartet: values from the Lustig-Jernigan (1997) ribonucleotide-amino acid potentials (red) and from the twenty BLASTn-defined RNA sequences (orange). The order of the potentials generally remains consistent between the two data sets.

Figure 3. Correlation plot of the 1997 potential energies versus the BLASTn-derived potential energies. The slope of the regression line is 1.12, and the correlation coefficient is 0.539 (P < 0.020).

[Figure 2 plot: ln f (y-axis) for bases A, C, G, U within the quartets of amino acids A, R, N, Q, E; series: 1997-data and BLASTn-data]
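Figure 3 summarizes the agreement between the two sets of potentials with a regression slope and a correlation coefficient. The sketch below computes both for paired values. It assumes an ordinary least-squares regression of the BLASTn values (y) on the 1997 values (x); the caption does not state which axis is which, and swapping them changes the slope but not r. The function name and toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def slope_and_pearson_r(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (ordinary least-squares slope of y on x, Pearson correlation r)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    slope = sxy / sxx               # regression of y on x
    r = sxy / math.sqrt(sxx * syy)  # symmetric in x and y
    return slope, r


# Toy check on an exact line y = 2x + 1 (not the poster's data).
print(slope_and_pearson_r([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```

For the poster's comparison, each point would presumably pair one amino acid-base potential from the 1997 set with the same pair from the BLASTn set, which for five amino acids and four bases gives 20 points.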

We used frequencies of contacts between bases and amino acids to derive relative interaction energies from the aminoacyl-tRNA synthetase and U1A spliceosomal protein-RNA complexes (Lustig et al., 1997). This was similar to earlier DNA-protein contact potentials (Lustig & Jernigan, 1995). First, we calculate the logarithm of the frequency of all occurrences of a base of type $j$ interacting specifically with an amino acid of type $I$, so that the interaction energy $e_{Ij}$ has the form

$$e_{Ij} \sim -\ln f_{Ij} \tag{1}$$

where $f_{Ij}$ is the relative frequency, summed over all the sets, with which base type $j$ interacts with all occurrences of residue type $I$. For each amino acid type $I$, the relative interaction energies of the four bases are then normalized as

$$\sum_{j} \ln f_{Ij} = 0 \tag{2}$$

This original normalization simply shifts the values so that the mean over the four bases is zero.
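Equations 1 and 2 can be sketched in a few lines. Because eq. 2 subtracts, for each amino acid $I$, the mean of $\ln f_{Ij}$ over the four bases, any factor that is constant for a given $I$ (such as dividing by the number of occurrences of $I$) cancels, so raw contact counts can stand in for $f_{Ij}$. The dictionary layout, the pseudocount for unobserved pairs, the function names, and the toy counts are assumptions for illustration.

```python
import math
from typing import Dict

BASES = ("A", "C", "G", "U")


def quartet_normalized_log_freqs(
    counts: Dict[str, Dict[str, float]], pseudocount: float = 0.5
) -> Dict[str, Dict[str, float]]:
    """Return ln f_Ij shifted so the four bases sum to zero for each I (eq. 2).

    counts[I][j] is the number of contacts between amino acid I and base j.
    The pseudocount (an assumption) keeps ln() finite for unobserved pairs;
    with pseudocount=0 every count must be positive.
    """
    normalized = {}
    for amino_acid, per_base in counts.items():
        logs = {j: math.log(per_base.get(j, 0) + pseudocount) for j in BASES}
        mean_log = sum(logs.values()) / len(BASES)
        normalized[amino_acid] = {j: logs[j] - mean_log for j in BASES}
    return normalized


def contact_energies(
    counts: Dict[str, Dict[str, float]], pseudocount: float = 0.5
) -> Dict[str, Dict[str, float]]:
    """Relative energies e_Ij ~ -ln f_Ij (eq. 1) after the eq. 2 shift."""
    normalized = quartet_normalized_log_freqs(counts, pseudocount)
    return {aa: {j: -value for j, value in row.items()} for aa, row in normalized.items()}


# Hypothetical toy contact counts for arginine, not data from the poster.
energies = contact_energies({"R": {"A": 10, "C": 5, "G": 20, "U": 5}})
print(min(energies["R"], key=energies["R"].get))  # G
```

The most frequently contacted base receives the lowest (most favorable) relative energy.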

Here we applied BLASTn to query RNA sequences corresponding to the various classes of tRNAs that bind their respective aminoacyl-tRNA synthetases. Note that, unlike the RNA bound by the U1A spliceosomal protein, there is a diversity of tRNA sequences from different organisms.
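The BLASTn-defined variants supply bases that are paired with the amino acids seen in contact in the original structures. A hedged sketch of that counting follows. It assumes contacts are given as (RNA position, amino acid) pairs taken from a structure and that each variant has already been aligned, without gaps that shift numbering, to the structure's RNA; the layout and function name are hypothetical, and real BLASTn alignments with insertions would need an extra position-mapping step.

```python
from collections import Counter
from typing import Iterable, List, Tuple

RNA_BASES = {"A", "C", "G", "U"}


def count_variant_contacts(
    contacts: Iterable[Tuple[int, str]], variants: List[str]
) -> Counter:
    """Count (amino acid, base) pairs over all RNA variants at contact positions.

    contacts: (0-based RNA position, one-letter amino acid) pairs from a structure.
    variants: RNA sequences aligned position-by-position to the structure's RNA.
    """
    contacts = list(contacts)  # iterate once per variant
    counts: Counter = Counter()
    for seq in variants:
        for position, amino_acid in contacts:
            if 0 <= position < len(seq):
                # DNA-alphabet hits (T) are read as U; gaps and ambiguity codes are skipped.
                base = seq[position].upper().replace("T", "U")
                if base in RNA_BASES:
                    counts[(amino_acid, base)] += 1
    return counts


# Hypothetical contacts and aligned variants ('-' marks a gap).
print(count_variant_contacts([(0, "R"), (2, "R"), (3, "Q")], ["GACU", "GAUU", "AA-U"]))
```

The resulting counts can feed directly into the quartet normalization of eq. 2.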

Normalization and Other Strategies

Figure 4. Contact energies calculated with different normalizations. The original normalization (green) uses eq. 2, setting the sum of ln(fraction of contacts) to zero within each quartet, whereas the alternative normalization (red) uses eq. 3.

After normalizing the contact energies within each quartet, we applied an additional normalization to account for the random substitution of amino acids in proteins (Gerstein & Altman, 1995). Here, we modify eq. 1 by including the fraction $F_I$ of all amino acids in our protein sets that are of class $I$, such that

$$e_{Ij} \sim -\ln\left(\frac{F_{Ij}}{F_I}\right) \tag{3}$$

where $F_{Ij}$ simply rescales $f_{Ij}$ so that the total number of residues in each fraction effectively cancels.
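Equation 3, read as a modification of eq. 1 and so carrying the same minus sign, can be sketched as below. The poster does not spell out $F_{Ij}$, so this sketch assumes $F_{Ij}$ is the fraction of all contacts that are between amino acid $I$ and base $j$, and $F_I$ is the fraction of all residues in the protein set that are of type $I$; the function name and toy inputs are hypothetical. Following the text, the quartet centering of eq. 2 is applied first and the $\ln F_I$ term afterward. The order matters: $\ln F_I$ is constant across the four bases of a quartet, so centering after dividing by $F_I$ would cancel it.

```python
import math
from typing import Dict

BASES = ("A", "C", "G", "U")


def substitution_adjusted_energies(
    contact_counts: Dict[str, Dict[str, int]],
    residue_counts: Dict[str, int],
) -> Dict[str, Dict[str, float]]:
    """e_Ij ~ -ln(F_Ij / F_I) (eq. 3), applied after the eq. 2 quartet centering.

    contact_counts[I][j]: contacts between amino acid I and base j (every base > 0).
    residue_counts[I]: occurrences of amino acid I in the protein set.
    """
    total_contacts = sum(sum(row.values()) for row in contact_counts.values())
    total_residues = sum(residue_counts.values())
    energies = {}
    for amino_acid, row in contact_counts.items():
        log_f = {j: math.log(row[j] / total_contacts) for j in BASES}  # ln F_Ij
        mean_log = sum(log_f.values()) / len(BASES)
        log_fraction_i = math.log(residue_counts[amino_acid] / total_residues)  # ln F_I
        # Center within the quartet (eq. 2), then divide by F_I inside the log (eq. 3).
        energies[amino_acid] = {
            j: -((log_f[j] - mean_log) - log_fraction_i) for j in BASES
        }
    return energies


# Hypothetical toy counts, not data from the poster.
toy = substitution_adjusted_energies(
    {"R": {"A": 4, "C": 2, "G": 8, "U": 2}, "Q": {"A": 3, "C": 3, "G": 3, "U": 3}},
    {"R": 25, "Q": 75},
)
print(toy["Q"])
```

Each quartet now sums to $4 \ln F_I$ instead of zero, so rarer amino acids are shifted toward more favorable energies.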

Here we propose a number of strategies:

-Evaluate the normalized potentials in alternative docking simulations, e.g., AutoDock.
-Use solvent-accessibility criteria for counting the residues and nucleotides (Perez-Cano et al., 2010A).
-Explore an alternative normalization for the quartet, i.e., normalize using the fraction of base type $j$ among all nucleotides in the complexes.
-Define subclasses of contacts specific to the major groove versus the minor groove.
-Explore a much larger ensemble of 602 current RNA-protein complexes (Perez-Cano & Fernández-Recio, 2010B).
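The third proposed strategy replaces the quartet mean with the overall base composition. One possible reading, sketched below purely as a hypothetical, is a log-odds form $e_{Ij} \sim -\ln(F_{j|I}/F_j)$, where $F_{j|I}$ is the fraction of amino acid $I$'s contacts made with base $j$ and $F_j$ is the fraction of base $j$ among all nucleotides in the complexes. Neither this formula nor the function names come from the poster.

```python
import math
from collections import Counter
from typing import Dict, Iterable

BASES = ("A", "C", "G", "U")


def base_fractions(rna_sequences: Iterable[str]) -> Dict[str, float]:
    """Fraction F_j of each base among all A/C/G/U nucleotides in the complexes."""
    tally = Counter(ch for seq in rna_sequences for ch in seq.upper() if ch in BASES)
    total = sum(tally.values())
    if total == 0:
        raise ValueError("no A/C/G/U nucleotides found")
    return {j: tally[j] / total for j in BASES}


def composition_normalized_energies(
    contact_counts: Dict[str, Dict[str, int]], fractions: Dict[str, float]
) -> Dict[str, Dict[str, float]]:
    """Hypothetical e_Ij ~ -ln(F_j|I / F_j); every count and fraction must be > 0."""
    energies = {}
    for amino_acid, row in contact_counts.items():
        n_contacts = sum(row[j] for j in BASES)
        energies[amino_acid] = {
            j: -math.log((row[j] / n_contacts) / fractions[j]) for j in BASES
        }
    return energies


# Hypothetical sequences and counts: contacts that simply mirror base
# composition give zero energy for every base.
fr = base_fractions(["ACGU", "aacc"])
print(composition_normalized_energies({"R": {"A": 3, "C": 3, "G": 1, "U": 1}}, fr))
```

Under this reading, a base contacted more often than its overall abundance predicts receives a negative (favorable) energy.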

[Figure 4 plot: ln f (y-axis) for bases A, C, G, U within the quartets of amino acids A, R, N, Q, E, K, T, Y; series: original normalization and alternative normalization]