Structure determination of noncanonical RNA motifs from H ...
Transcript of Structure determination of noncanonical RNA motifs from H ...
1
Structure determination of noncanonical RNA motifs from 1H NMR chemical shift data alone
Parin Sripakdeevong1, Mirko Cevec2, Andrew T. Chang3, Michèle C. Erat4,9, Melanie Ziegeler2, Qin
Zhao5, George E. Fox6, Xiaolian Gao6, Scott D. Kennedy7, Ryszard Kierzek8, Edward P. Nikonowicz3, Harald Schwalbe2, Roland K. O. Sigel9, Douglas H. Turner10, and Rhiju Das1,11,12,*
1Biophysics Program, Stanford University, Stanford, CA 94305, USA 2Center for Biomolecular Magnetic Resonance, Institute for Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe University Frankfurt, Max-von-Laue-Strasse 7, 60438 Frankfurt, Germany 3Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77251, USA 4Department of Biochemistry, University of Oxford, Oxford OX1 3QU, United Kingdom 5Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA 6Department of Biology and Biochemistry, University of Houston, Houston, TX 77004, USA 7Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA 8Institute of Bioorganic Chemistry, Polish Academy of Sciences, 60-714 Poznan, Noskowskiego 12/14, Poland 9Institute of Inorganic Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland 10Department of Chemistry, University of Rochester, Rochester, New York 14627, USA 11Department of Biochemistry, Stanford University, Stanford, CA 94305, USA 12Department of Physics, Stanford University, Stanford, CA 94305, USA * To whom correspondence should be addressed. Phone: (650) 723-5976. Fax: (650) 723-6783. E-mail: [email protected].
2
Abstract
Structured non-coding RNAs play key roles in fundamental processes from gene regulation to viral
pathogenesis, but determining these molecules’ three-dimensional conformation remains time-consuming,
even for the smallest functional motifs. We demonstrate herein that the integration of non-exchangeable 1H chemical shift data with Rosetta de novo modeling can consistently produce high-resolution RNA
structures, without the use of other NMR measurements. On a benchmark set of 21 noncanonical motifs,
including 11 blind targets, CS-ROSETTA-RNA recovered the experimental structure with high accuracy
(0.6 to 2.0 Å all-heavy-atom rmsd) in 16 cases. These successes included blind predictions of a group II
intron internal loop, two tRNA anticodon loops, four UNAC tetraloops, and a highly irregular 5´-GAGU-
3´/3´-UGAG-5´ internal loop that required additional synthesis of samples with modified sequence to
solve by conventional NMR means. Four cases poorly resolved by CS-ROSETTA-RNA involved
dynamics or lack of structure that also complicated traditional structure determination. By using
previously untapped information in 1H chemical shifts, CS-ROSETTA-RNA provides a new route for
enabling, accelerating, and validating structure determination of noncanonical RNA motifs.
3
Introduction
RNA molecules form complex three-dimensional structures that play key roles in a multitude of cellular
processes [see e.g., ref. (1)]. These RNAs are typically composed of canonical helices interconnected by
motifs with intricate, noncanonical structure. With size ranges on the order of a dozen nucleotides or less,
these RNA motifs offer compelling targets for solution NMR approaches (2, 3). Nevertheless, NMR
characterization of RNA motifs does not always produce sufficient constraints to generate confident
atomic-resolution 3D models (4-9). Compared to protein NMR spectroscopy, RNAs present
approximately half the density of protons, give lower dispersion of spin resonances due to the limited
chemical diversity of the four nucleobases, and lack a robust methodology that achieves unambiguous
sequential assignment via through-bond correlation experiments (2, 3). The development of new RNA
structure modeling approaches that extract more information from limited NMR data would accelerate the
structure determination process, permit cross-checks of model validity, and enable the determination of
structures for RNA sequences that require extraordinary efforts to solve by conventional NMR methods.
NMR chemical shifts have long been recognized as an important source of structural information
for functional macromolecules. Backbone chemical shifts are widely used for protein analysis, including
determination of protein secondary structures and backbone torsions (10, 11) and refinement of three-
dimensional models (12-14). More recently, chemical shift data have been leveraged for accurate de novo
protein structure determination (15-17). The development of chemical shift methodologies for RNA could
be similarly fruitful (18). First, non-exchangeable 1H chemical shifts are routinely collected in most NMR
experiments and account for 55% of all RNA chemical shift data deposited in the Biological Magnetic
Resonance Bank (BMRB) (19). Second, algorithms have been developed to ‘back-calculate’ non-
exchangeable 1H chemical shifts from RNA 3D structure (20-23). In particular, the well-calibrated
4
NUCHEMICS program has been used to refine models generated from conventional NMR measurements
(NOE, J-couplings, residual dipolar couplings) (24, 25) and to successfully determine de novo structures
of simple helical forms of nucleic acids (26). Nevertheless, structural information contained in NMR
chemical shift data is still not routinely leveraged for RNA structure determination.
In this work, we demonstrate that assigned 1H chemical shift data alone provide sufficient
information to determine the structures of noncanonical RNA motifs at high resolution, without other
NMR measurements. As with recent CS-ROSETTA approaches developed for proteins (16, 27, 28), the
key innovation has been the integration of chemical shift data with high-resolution de novo structure
prediction. Recent years have seen the development of a physically realistic Rosetta energy function for
RNA (29) and advances in RNA conformational sampling (30, 31) that enable atomic-accuracy recovery
of noncanonical RNA structures in favorable cases. We therefore reasoned that incorporating
experimentally determined 1H NMR chemical shifts into de novo modeling might enable consistent
structure determination of noncanonical RNA motifs. This article presents the resulting protocol,
Chemical-Shift-ROSETTA for RNA (CS-ROSETTA-RNA), and its extensive benchmark on 21 RNA
motifs, including 11 blind targets.
Results
Coupling 1H chemical shifts to Rosetta RNA structure modeling
We illustrate the CS-ROSETTA-RNA method first on a complex test motif that was challenging for prior
Rosetta approaches, a conserved UUAAGU hexaloop from 16S ribosomal RNA involved in RNA/protein
contacts (32, 33) (Fig. 1A). On one hand, application of two previously described RNA structure
prediction methods, Fragment Assembly with Full-Atom Refinement (FARFAR) (29) and Stepwise
5
Assembly (SWA) (31) successfully generated models with atomic-resolution agreement to this
hexaloop’s crystallographic structure (0.52 Å all-heavy-atom rmsd; see Figs. 1A-B). On the other hand,
the Rosetta all-atom energy function incorrectly evaluated the energy of these near-native models
compared to highly non-native models by at least 8 kBT (Fig. 1C). We reasoned that the structural
information contained in chemical shifts could be used to better discriminate the near-native models.
Indeed, the Rosetta near-native model explained several noncanonical 1H chemical shifts observed
experimentally (33), such as a significant upfield chemical shift for the H4´ atom of the hexaloop’s U7
nucleotide due to its apposition against the nucleobase of A5 [3.39 ppm, more than six standard
deviations upfield of the BMRB average (4.38 ± 0.15 ppm); see Fig. 1B]. Overall, the NUCHEMICS
back-calculated chemical shifts for the near-native Rosetta model gave good agreement with the
experimental non-exchangeable 1H chemical shifts as quantified by a chemical shift root-mean-square-
deviation (rmsdshift) of 0.19 ppm (Fig. 1D). Furthermore, for the entire set of Rosetta models, the chemical
shift deviation became progressively larger for models with greater conformational discrepancy from the
crystallographic structure (Fig. 1E). We therefore supplemented the Rosetta energy function with a
chemical shift pseudo-energy term (Eshift) proportional to the sum of squared deviations between
experimental and back-calculated chemical shifts (see Methods). After this pseudo-energy term was
added to the Rosetta all-atom energy function, the near-native model indeed gave the lowest energy
among all Rosetta models (Fig. 1F).
A benchmark for high resolution RNA modeling
To evaluate the generality and accuracy of the CS-ROSETTA-RNA method, we carried out modeling on
a benchmark set containing 21 RNA motifs (see Table 1). First, we applied CS-ROSETTA-RNA to a test
set of 10 noncanonical RNA motifs for which published chemical shift data as well as structural models
6
derived from NMR and, in some cases, crystallography were available (see SI Appendix, Table S1). These
RNA motifs ranged in size from six to thirteen nucleotides and included both hairpins and internal loops.
On average, 6.2 non-exchangeable 1H chemical shifts per nucleotide (out of 7-8 total) were assigned,
including both backbone and base protons (SI Appendix, Table S1). To assess the applicability of CS-
ROSETTA-RNA to RNAs that form heterogeneous ensembles, this test set included 3 motifs that were
known to adopt multiple conformations and/or display structural dynamics in solution: a G:G mismatch
with two alternative conformations (35), a poorly structured apical loop of the HIV-1 TAR RNA (36),
and a tandem UG:UA mismatch that contains no hydrogen bonds (37) (SI Appendix, Fig. S2). In addition
to these cases, we further tested CS-ROSETTA-RNA on 11 blind RNA targets concurrently under
investigation by five independent NMR laboratories. Sequences and assigned chemical shifts for these
targets, but no other information, were made available for chemical-shift-guided modeling. Subsequent
comparison of CS-ROSETTA-RNA models with structures derived from conventional NMR approaches
thus served as blind evaluations for the new method.
Over the entire benchmark of 21 RNA motifs, CS-ROSETTA-RNA returned 16 cases (76%) in
which at least one of the five lowest energy cluster centers achieved better than 2.0 Å rmsd (over all
heavy atoms; see Table 1). The method performed well on both the test set of known structures (8/10
success cases) and the blind targets set (8/11 success cases). Furthermore, 10 of the 21 cases satisfied a
more stringent success criterion: the lowest energy (top-ranked) model was within atomic-accuracy of the
experimental structure (under 1.5 Å all-heavy-atom rmsd). As expected, this number of stringent success
cases was significantly higher than the 2 of 21 cases achieved by modeling without chemical shift data
(see SI Appendix, Table S2).
CS-ROSETTA-RNA success cases included atomic-accuracy models from diverse sources, such
7
as the most conserved internal loop from the signal recognition particle (SRP) RNA (6, 7, 38) (rmsd of
0.81 Å; Fig. 2A); a two-way junction with a 90° turn from the hepatitis C virus IRES (39, 40) (rmsd of
1.48 Å; Fig. 2B); and a 4x4 internal loop from an R2 retrotransposon (41) (rmsd of 1.17 Å; Fig. 2C).
Successful blind predictions included the cuUCCaa anticodon stem-loop of Bacillus subtilis tRNAGly
(rmsd of 1.41 Å; Fig. 2D); a 5´-GU-3´/3´-UAU-5´ internal loop from a group II intron (rmsd of 1.37 Å;
Fig. 2E); and all four UNAC tetraloops (Figs. 2F-G). In each of these cases, the high accuracy of the CS-
ROSETTA-RNA model was reflected not only in low rmsd to the experimental structure but also
accurate recovery of the base pair and base stack geometries as classified in the Leontis-Westhof scheme
(34) (see SI Appendix, Table S3).
High-confidence models and structure cross-validation
Several CS-ROSETTA-RNA predictions gave strong convergence, as defined by a distinct energy
‘funnel’: a single dominant conformation and geometrically similar models achieved better energy than
all other conformations. In six benchmark cases, the lowest energy model gave an energy gap of >3.0 kBT
to the next-lowest energy cluster and, in each of these cases, the model achieved atomic-accuracy (under
1.5 Å rmsd to experimental structure; see SI Appendix, Fig. S3). This energy gap thus appears to be a
hallmark of CS-ROSETTA-RNA accuracy (see also A criterion for confidence prediction subsection of
SI Appendix, Supporting Results). In one apparent exception, the SRP conserved internal loop, a large
energy gap (5.5 kBT) strongly suggested that the CS-ROSETTA-RNA prediction should be accurate, but
the lowest energy CS-ROSETTA-RNA model showed disagreements with the experimental NMR models
(6) (>2.0 Å rmsd; see SI Appendix, Figs. S4A-B). Further analysis revealed that the experimental NMR
models poorly explained the 1H chemical shift data published in the same study (6) (rmsdshift = 0.52 ppm)
8
and poorly agreed with subsequently solved crystallographic structures (7, 38) [rmsd of 2.30 Å to PDB:
1LNT (38)]. In contrast, the CS-ROSETTA-RNA model gave excellent agreement with the chemical shift
data (rmsdshift = 0.18 ppm) and closely matched the crystallographic structures (rmsd of 0.81 Å to PDB:
1LNT; see Fig. 2A and SI Appendix, Figs. S4C-D). The SRP motif case suggests that the CS-ROSETTA-
RNA method can be used to help cross-validate or remodel NMR-derived structures.
Blind modeling of a highly irregular internal loop
Perhaps the most striking CS-ROSETTA-RNA result was the blind prediction of a self-complementary
4x4 internal loop with the sequence 5´-GABrGU-3´/3´-UBrGAG-5´ (BrG, denoting 8-bromoguanosine, was
introduced to stabilize the major of two conformations observed in the RNA). The original NMR study
(42) of this motif provided a descriptive 2D schematic (reproduced in Fig. 3A) but did not produce
sufficient constraints to generate an atomic-detail three-dimensional structure due to ambiguity in NOE
assignments. In contrast, CS-ROSETTA-RNA was able to produce a 3D model (Fig. 3B) with a large
energy gap (28.1 kBT), giving high confidence in the model’s accuracy. Further supporting its accuracy,
the CS-ROSETTA-RNA model recovered information proposed by the NMR study but set aside during
structure modeling, including two G-G trans Watson-Crick/Hoogsteen base pairs, the two-fold rotational
symmetry of the structure, and C2´-endo ribose conformations at six nucleotides (see SI Appendix,
Supporting Results). Finally, chemical shift data strongly supported an unusual feature of the CS-
ROSETTA-RNA model, previously unseen in the database of RNA structures [as assessed by FR3D (43);
see SI Appendix, Supporting Results]: the nucleobases of the two central adenosines were stacked on each
other, formed no base pairs, and formed hydrogen bonds to the opposing strand backbone. To
accommodate these interactions, the adenosines adopted an irregular backbone conformation, involving
9
~180° inversions in their ribose orientation relative to the riboses in the preceding and following
nucleotides. The resulting geometry positioned the adenosines’ H4´, 1H5´, and 2H5´ atoms directly above
the neighboring guanosine nucleobases, leading to strongly predicted upfield shifts that agreed well with
the experimentally measured chemical shifts (rmsdshift = 0.18 ppm; Fig. 3C). Here, the enumerative SWA
method (29, 31), which does not rely on a fragment database of previous structures, was critical for
modeling the conformation of this highly irregular internal loop.
Subsequently, an additional series of NMR experiments with newly synthesized non-self-
complementary duplex allowed for the assignment of previously ambiguous NOEs. The resulting
conventionally determined NMR structure of the 5´-GABrGU-3´/3´-UBrGAG-5´ internal loop further
confirmed the accuracy of the CS-ROSETTA-RNA model at atomic resolution (see Fig. 3D and SI
Appendix, Fig. S5). CS-ROSETTA-RNA high-confidence modeling of the 5´-GABrGU-3´/3´-UBrGAG-5´
motif suggests that the method can be used to facilitate structure determination in cases where NOE
distances and assignments cannot be determined unambiguously due to overlap and/or symmetry (42).
Poor accuracy cases
Despite generally giving high-resolution models, CS-ROSETTA-RNA returned 5 of 21 cases with poorer
than 2.0 Å rmsd accuracy. In one case (uuGCCaa anticodon stem-loop of B. subtilis tRNAGly), either
FARFAR modeling using a different fragment database or SWA modeling alone returned <2.0 Å rmsd
structures among the five lowest energy clusters (SI Appendix, Fig. S6), suggesting that this motif would
be recoverable with different sampling schemes. For the other four failure cases, examination of the
original experimental NMR models revealed potential complications for structure modeling. Each of
these cases presented large deviations in the coordinates of the experimental NMR ensemble (>2.5 Å
10
average pairwise rmsd for HIV-1 TAR apical loop and two HAR1 internal loops) or more localized
fluctuations (tandem UG:UA mismatch; see SI Appendix, Fig. S2). Such structural dynamics in solution
precluded high-resolution agreement between these structures and the CS-ROSETTA-RNA models.
Interestingly, a separate benchmark case involving a heterogeneous ensemble was recovered by CS-
ROSETTA-RNA: models for two populated conformations of the G:G mismatch were attained and
agreed with the experimental NMR ensemble (35) (Figs. 2H and 2I). Thus, overall, our benchmark results
demonstrate the general applicability of CS-ROSETTA-RNA for high resolution RNA structure
determination but highlight the possibility of ambiguity in cases where the RNA conformation is dynamic
or unstructured.
Discussion
Natural RNA sequences form intricate three-dimensional structures involving the interconnection of
canonical helices and small, functional noncanonical motifs. By integrating NMR chemical shift data into
a new generation of RNA de novo modeling algorithms, CS-ROSETTA-RNA enables confident and rapid
determination of noncanonical RNA motif structures in a manner fundamentally distinct from prior
methods, using independent and far less experimental information. While obtaining resonance
assignments is a necessary and time-consuming step of NMR-based RNA characterization, the structural
information contained in chemical shifts are typically left aside during the generation of RNA structural
models. Furthermore, the standard operating procedure (2, 3) of determining NOEs, J-couplings, and, in
some cases, residual dipolar coupling, does not always yield sufficient information to determine an
RNA’s three-dimensional structure by conventional means, as illustrated by the 5´-GABrGU-3´/3´-
UBrGAG-5´ example above (Fig. 3). Here, incorporating assigned 1H chemical shift data alone into
11
Rosetta-based RNA modeling gives high accuracy structures for the majority of cases in a 21-RNA
benchmark, including 8 of 11 blind targets. A large energy gap between the lowest and next lowest
energy cluster center provided a powerful criterion for assessing confidence in the modeling. Since the
models are based on independent data, the CS-ROSETTA-RNA structures can provide strong cross-
validation or point out potential inaccuracies for NMR models generated by conventional means, as in the
SRP RNA motif case (see Fig. 2C and SI Appendix, Fig. S4). Furthermore, in cases like the irregular 5´-
GABrGU-3´/3´-UBrGAG-5´ loop where ambiguities hindered NOE assignments, the CS-ROSETTA-RNA
approach permits the bypass of experiments with additional samples to yield high-confidence, all-atom
models.
The CS-ROSETTA-RNA method can be used immediately for modeling or cross-validating new
RNA motifs. In addition, several extensions for the CS-ROSETTA-RNA approach appear feasible, based
on recent work in chemical-shift-guided protein modeling (16, 27, 28). Integration of chemical shift
information with residual dipolar coupling in de novo structure prediction could permit the rapid
modeling of larger RNAs composed of multiple motifs (39, 44, 45). Furthermore, the development and
calibration of back-calculation algorithms for RNA 13C, 15N, and exchangeable 1H chemical shifts (46,
47) will enable the incorporation of these additional, structural data into CS-ROSETTA-RNA. Finally,
NMR experiments can reveal chemical shifts of slowly exchanging or rarely populated ‘hidden’ states of
nucleic acids with potential functional importance (48-50). Thus, further integration of de novo modeling
and NMR methodologies may allow not just the acceleration of structure determination but eventually
lead to the solution of currently intractable three-dimensional RNA structures.
Methods
12
Generation of Rosetta Models
Two complementary structure-modeling methods, Fragment Assembly with Full-Atom Refinement
(FARFAR) and Stepwise Assembly (SWA), were used in parallel to generate the Rosetta models for each
motif. SWA models were constructed using a series of recursive building steps, as described previously
(31). Each step involved enumerating several million conformations for each nucleotide, and all step-by-
step build-up paths were covered in N2 building steps where N is the number of nucleotides in the motif.
At the final building steps, all models are finely clustered and a maximum of 10,000 low energy SWA
models were retained. The SWA approach is effective at generating models that are highly optimized to
the underlying all-atom energy function, but can produce primarily incorrect models when the assumed
energy function is inaccurate. Therefore, models were also generated by fragment assembly followed by
full-atom refinement (FARFAR) in the Rosetta framework, as described previously (29); the fragment
source was the large ribosomal subunit of H. marismortuii (PDB: 1JJ2). For each motif, 250,000
FARFAR models were generated; these models were then finely clustered and a maximum of 10,000 low
energy FARFAR models were retained. The SWA and FARFAR models were then combined leading to
~10,000-20,000 final Rosetta models for each motif. The total computational costs for the generation of
SWA and FARFAR models in term of modern central processing unit (CPU) are as follow. For SWA
runs, the computational cost range from ~5,000 CPU hours for the smallest motif (6-nt) to ~50,000 CPU
hours for the largest motif (13-nt) investigated in this work. For FARFAR runs, the computational cost
range from ~3,000 CPU hours for the smallest motif (6-nt) to ~8,000 CPU hours for the largest motif (13-
nt). Algorithms and complete documentation are being incorporated into Rosetta release 3.5
(www.rosettacommons.org), freely available for academic use.
To further encourage usage of the CS-ROSETTA-RNA method by the general NMR RNA
community, a web server where users can access and submit CS-ROSETTA-RNA jobs is made available
13
at http://rosie.rosettacommons.org/rna_denovo. Due to computational resource limitations and to ensure
short queue time, the web server runs a slightly modified version of CS-ROSETTA-RNA where the
models are generate using only the FARFAR method and the maximum number of models per job
submission is limited to 50,000.
Incorporation of 1H chemical shifts into structure modeling
Information from the experimental non-exchangeable 1H chemical shifts were incorporated into the
modeling process through the chemical shift pseudo-energy term:
!
Eshift = c " # iexp $# i
calc( )i%
2 [1]
where and are, respectively, the experimental and back-calculated chemical shift in ppm units
(the index i sums over all experimentally assigned non-exchangeable 1H chemical shifts in the RNA
motif), and c is a weighting factor set to 4.0 kBT/ppm-2 based on test runs with different motifs. The
NUCHEMICS program (22) was used to back-calculate non-exchangeable 1H chemical shifts. In the 21
RNA motifs benchmark set, only 3 chemical shift datasets (UUCG tetraloop, Chimp HAR1 GAA loop
and Human HAR1 GAA loop) included stereospecific assignments of the diastereotopic 1H5´ and 2H5´
protons pair. For the remaining 18 chemical shift datasets, the assignment of 1H5´ and 2H5´ was
determined for each model based on which values gave better agreement between the experimental and
back-calculated chemical shifts.
Each Rosetta model was refined and rescored under the hybrid all-atom energy:
Ehybrid = ERosetta + Eshift [2]
where ERosetta is the standard Rosetta all-atom energy function for RNA (29), and Eshift is the chemical
shift pseudo-energy term. Refinement of the models under the Ehybrid all-atom energy function was carried
14
out using continuous minimization in torsional space with the Davidson–Fletcher–Powell algorithm under
the Rosetta framework [For this purpose, the NUCHEMICS algorithm was rewritten inside the Rosetta
codebase (51)]. After refinement, the models were rescored and re-ranked under the Ehybrid all-atom
energy function. Finally, all models were clustered, such that models with pairwise all-heavy-atom rmsd
below 1.5 Å are grouped. The lowest energy member of each cluster was designated as the cluster center
and the five lowest energy cluster centers were designated the CS-ROSETTA-RNA predictions.
15
Acknowledgments
We thank Hashim Al-Hashimi for suggesting the problem and the Rosetta community for sharing code.
We are grateful to G. Varani and J. D. Puglisi for providing experimental 1H chemical shift data (for
HIV-1 TAR apical loop and Hepatitis C Virus IRES IIa, respectively). Calculations were carried out on
the BioX2 cluster (NSF award CNS-0619926) and XSEDE resources (allocation MCB090153). We
acknowledge financial support from a Burroughs Wellcome Career Award at the Scientific Interface (to
RD), a C. V. Starr Asia/Pacific Stanford Graduate Fellowship (to PS), the National Institutes of Health
Grant GM73969 to (EPN), the Swiss National Science Foundation and the European Commission (BIO-
NMR, ERC-Starting Grant, Marie Curie Fellowship) (to MCE and RKOS), and the Deutsche
Forschungsgemeinschaft (Cluster of Excellence) and the European Commission (Bio-NMR, WeNMR,
Marie Curie Fellowship) (to MC, MZ and HS).
16
References
1. Gesteland RF, Cech T, & Atkins JF (2006) The RNA world: the nature of modern RNA suggests a
prebiotic RNA world (Cold Spring Harbor Lab Press, Cold Spring Harbor, NY). 2. Furtig B, Richter C, Wohnert J, & Schwalbe H (2003) NMR spectroscopy of RNA.
Chembiochem 4(10):936-962. 3. Scott LG & Hennig M (2008) RNA structure determination by NMR. Methods Mol Biol 452:29-
61. 4. Peterson RD & Feigon J (1996) Structural change in Rev responsive element RNA of HIV-1 on
binding Rev peptide. J Mol Biol 264(5):863-877. 5. Ippolito JA & Steitz TA (2000) The structure of the HIV-1 RRE high affinity rev binding site at
1.6 A resolution. J Mol Biol 295(4):711-717. 6. Schmitz U, James TL, Lukavsky P, & Walter P (1999) Structure of the most conserved internal
loop in SRP RNA. Nat Struct Biol 6(7):634-638. 7. Jovine L, et al. (2000) Crystal structure of the ffh and EF-G binding sites in the conserved domain
IV of Escherichia coli 4.5S RNA. Structure 8(5):527-540. 8. Tolbert BS, et al. (2010) Major groove width variations in RNA structures determined by NMR
and impact of 13C residual chemical shift anisotropy and 1H-13C residual dipolar coupling on refinement. J Biomol NMR 47(3):205-219.
9. Nabuurs SB, Spronk CAEM, Vuister GW, & Vriend G (2006) Traditional Biomolecular Structure Determination by NMR Spectroscopy Allows for Major Errors. PLoS Comput Biol 2(2):e9.
10. Szilágyi L (1995) Chemical shifts in proteins come of age. Progress in Nuclear Magnetic Resonance Spectroscopy 27(4):325-442.
11. Cornilescu G, Delaglio F, & Bax A (1999) Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J Biomol NMR 13(3):289-302.
12. Kuszewski J, Gronenborn AM, & Clore GM (1995) The impact of direct refinement against proton chemical shifts on protein structure determination by NMR. J Magn Reson B 107(3):293-297.
13. Clore GM & Gronenborn AM (1998) New methods of structure refinement for macromolecular structure determination by NMR. Proc Natl Acad Sci U S A 95(11):5891-5898.
14. Vila JA, et al. (2008) Quantum chemical 13C(alpha) chemical shift calculations for protein NMR structure determination, refinement, and validation. Proc Natl Acad Sci U S A 105(38):14389-14394.
15. Cavalli A, Salvatella X, Dobson CM, & Vendruscolo M (2007) Protein structure determination from NMR chemical shifts. Proc Natl Acad Sci U S A 104(23):9615-9620.
16. Shen Y, et al. (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc Natl Acad Sci U S A 105(12):4685-4690.
17. Wishart DS, et al. (2008) CS23D: a web server for rapid protein structure generation using NMR chemical shifts and sequence data. Nucleic Acids Res 36(Web Server issue):W496-502.
18. Wishart DS & Case DA (2001) Use of chemical shifts in macromolecular structure determination. Methods Enzymol 338:3-34.
19. Ulrich EL, et al. (2008) BioMagResBank. Nucleic Acids Res 36(Database issue):D402-408. 20. Prado FR & Giessner-Prettre C (1981) Parameters for the calculation of the ring current and
atomic magnetic anisotropy contributions to magnetic shielding constants: Nucleic acid bases and
17
intercalating agents. Journal of Molecular Structure: THEOCHEM 76(1):81-92. 21. Dejaegere A, Bryce Richard A, & Case David A (1999) An Empirical Analysis of Proton
Chemical Shifts in Nucleic Acids. Modeling NMR Chemical Shifts, ACS Symposium Series, (American Chemical Society), Vol 732, pp 194-206.
22. Cromsigt JA, Hilbers CW, & Wijmenga SS (2001) Prediction of proton chemical shifts in RNA. Their use in structure refinement and validation. J Biomol NMR 21(1):11-29.
23. Lam SL (2007) DSHIFT: a web server for predicting DNA chemical shifts. Nucleic Acids Res 35(Web Server issue):W713-717.
24. Huthoff H, Girard F, Wijmenga SS, & Berkhout B (2004) Evidence for a base triple in the free HIV-1 TAR RNA. RNA 10(3):412-423.
25. Girard FC, Ottink OM, Ampt KA, Tessari M, & Wijmenga SS (2007) Thermodynamics and NMR studies on Duck, Heron and Human HBV encapsidation signals. Nucleic Acids Res 35(8):2800-2811.
26. van der Werf RM (2011) New Methods for Studying Structure and Dynamics of RNA. Ph.D. Thesis (Radboud University Nijmegen Nijmegen, The Netherlands).
27. Shen Y, Vernon R, Baker D, & Bax A (2009) De novo protein structure generation from incomplete chemical shift assignments. J Biomol NMR 43(2):63-78.
28. Raman S, et al. (2010) NMR structure determination for larger proteins using backbone-only data. Science 327(5968):1014-1018.
29. Das R, Karanicolas J, & Baker D (2010) Atomic accuracy in predicting and designing noncanonical RNA structure. Nat Methods 7(4):291-294.
30. Das R & Baker D (2007) Automated de novo prediction of native-like RNA tertiary structures. Proceedings of the National Academy of Sciences 104(37):14664-14669.
31. Sripakdeevong P, Kladwang W, & Das R (2011) An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling. Proc Natl Acad Sci U S A 108(51):20573-20578.
32. Carter AP, et al. (2000) Functional insights from the structure of the 30S ribosomal subunit and its interactions with antibiotics. Nature 407(6802):340-348.
33. Zhang H, Fountain MA, & Krugh TR (2001) Structural characterization of a six-nucleotide RNA hairpin loop found in Escherichia coli, r(UUAAGU). Biochemistry 40(33):9879-9886.
34. Leontis NB & Westhof E (2001) Geometric nomenclature and classification of RNA base pairs. RNA 7(4):499-512.
35. Burkard ME & Turner DH (2000) NMR structures of r(GCAGGCGUGC)2 and determinants of stability for single guanosine-guanosine base pairs. Biochemistry 39(38):11748-11762.
36. Aboul-ela F, Karn J, & Varani G (1996) Structure of HIV-1 TAR RNA in the absence of ligands reveals a novel conformation of the trinucleotide bulge. Nucleic Acids Res 24(20):3974-3981.
37. Shankar N, et al. (2007) NMR reveals the absence of hydrogen bonding in adjacent UU and AG mismatches in an isolated internal loop from ribosomal RNA. Biochemistry 46(44):12665-12678.
38. Deng J, Xiong Y, Pan B, & Sundaralingam M (2003) Structure of an RNA dodecamer containing a fragment from SRP domain IV of Escherichia coli. Acta Crystallogr D Biol Crystallogr 59(Pt 6):1004-1011.
39. Lukavsky PJ, Kim I, Otto GA, & Puglisi JD (2003) Structure of HCV IRES domain II determined by NMR. Nat Struct Biol 10(12):1033-1038.
40. Zhao Q, Han Q, Kissinger CR, Hermann T, & Thompson PA (2008) Structure of hepatitis C virus IRES subdomain IIa. Acta Crystallogr D Biol Crystallogr 64(Pt 4):436-443.
41. Lerman YV, et al. (2011) NMR structure of a 4 x 4 nucleotide RNA internal loop from an R2
18
retrotransposon: identification of a three purine-purine sheared pair motif and comparison to MC-SYM predictions. RNA 17(9):1664-1677.
42. Hammond NB, Tolbert BS, Kierzek R, Turner DH, & Kennedy SD (2010) RNA internal loops with tandem AG pairs: the structure of the 5'GAGU/3'UGAG loop can be dramatically different from others, including 5'AAGU/3'UGAA. Biochemistry 49(27):5817-5827.
43. Sarver M, Zirbel C, Stombaugh J, Mokdad A, & Leontis N (2008) FR3D: finding local and composite recurrent structural motifs in RNA 3D structures. Journal of Mathematical Biology 56(1):215-252.
44. Davis JH, et al. (2005) RNA helical packing in solution: NMR structure of a 30 kDa GAAA tetraloop-receptor complex. J Mol Biol 351(2):371-382.
45. Wang J, et al. (2009) A method for helical RNA global structure determination in solution using small-angle x-ray scattering and NMR measurements. J Mol Biol 393(3):717-734.
46. Rossi P & Harbison GS (2001) Calculation of 13C chemical shifts in rna nucleosides: structure-13C chemical shift relationships. J Magn Reson 151(1):1-8.
47. Aeschbacher T, Schubert M, & Allain FH (2012) A procedure to validate and correct the 13C chemical shift calibration of RNA datasets. J Biomol NMR 52(2):179-190.
48. Johnson JE, Jr. & Hoogstraten CG (2008) Extensive backbone dynamics in the GCAA RNA tetraloop analyzed using 13C NMR spin relaxation and specific isotope labeling. J Am Chem Soc 130(49):16757-16769.
49. Bothe JR, et al. (2011) Characterizing RNA dynamics at atomic resolution using solution-state NMR spectroscopy. Nat Methods 8(11):919-931.
50. Nikolova EN, et al. (2011) Transient Hoogsteen base pairs in canonical duplex DNA. Nature 470(7335):498-502.
51. Leaver-Fay A, et al. (2011) ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol 487:545-574.
19
Figure Legends Figure 1. The CS-ROSETTA-RNA method illustrated on a conserved UUAAGU hairpin from the
16S rRNA. (A) The crystallographic model of the UUAAGU hairpin (PDB: 1FJG). (B) Rosetta near-
native model with a 0.52 Å all-heavy-atom rmsd to the crystallographic model (rmsd calculated over the
entire loop, excluding the flexible G6 extra-helical bulge). Two-dimensional schematics follow the
Leontis and Westhof nomenclature (34). (C) Plot of the Rosetta energy vs. rmsd to the crystallographic
conformation for all Rosetta models before the inclusion of the chemical shift pseudo-energy term. The
Rosetta all-atom energy function incorrectly assigns numerous non-native models (>5.0 Å rmsd) with
significantly lower energies (~8 kBT) than the near-native models. (D) Comparison of the back-calculated
chemical shifts from the Rosetta near-native model vs. experimental 1H chemical shift values shows
excellent agreement (rmsdshift= 0.19 ppm). (E) Plot of the average rmsdshift of all Rosetta models in 0.5-Å
rmsd bins from the crystallographic structure. On average, the agreement between the experimental and
back-calculated chemical shifts becomes progressively worse for models with greater conformational
discrepancy from the crystallographic structure. (F) Plot of the Rosetta energy vs. rmsd to the
crystallographic conformation for all Rosetta models after the inclusion of the chemical shift pseudo-
energy term. With chemical shift data, the near-native model (G) becomes the lowest energy model
overall (green circle).
Figure 2. Comparison of experimental and CS-ROSETTA-RNA models for diverse RNA motifs.
The Rosetta models (shown in color) are overlaid over the experimental structures (shown in white). (A)
4x4 internal loop from the most conserved domain of the signal recognition particle (SRP) RNA (PDB:
1LNT). (B) Hepatitis C Virus IRES Subdomain IIa (PDB: 2PN4). (C) 4x4 internal loop from an R2
retrotransposon (PDB: 2L8F). (D) Glycine tRNA(UCC) anticodon stem-loop from Bacillus subtilis
(PDB: 2LBL). (E) 5´-GU-3´/3´-UAU-5´ internal loop from a group II intron (PDB: not yet deposited). (F)
UAAC tetraloop (PDB: 4A4R). (G) UCAC tetraloop (PDB: 4A4S). (H) G(anti):G(syn) single mismatch
(~75% populated; PDB: 1F5G). (I) G(syn):G(anti) single mismatch (~25% populated; PDB: 1F5H). The
rmsds between Rosetta models (energy cluster rank) and the experimental structure are (A) 0.81 Å (first),
(B) 1.48 Å (second), (C) 1.17 Å (first), (D) 1.41 (third), (E) 1.37 Å (first), (F) 0.94 Å (first), (G) 1.00 Å
(first), (H) 0.71 Å (first), and (I) 0.63 Å (second). The two-dimensional schematics are annotated based
20
on the experimental structure and follow the Leontis and Westhof nomenclature (34).
Figure 3. CS-ROSETTA-RNA modeling of the 5´-GAGU-3´/3´-UGAG-5´ self-complementary
internal loop. (A) A 2D schematic proposed in the original NMR study (42) and (B) CS-ROSETTA-
RNA model of the 5´-GABrGU-3´/3´-UBrGAG-5´ symmetric duplex. In the CS-ROSETTA-RNA model,
the backbone of A5 and A5* adopts a highly irregular conformation, which leads to a ~180° inversion in
the adenosines’ ribose orientation relative to the riboses in the preceding and following nucleotides (G4,
G6, G4*, G6*; see direction of arrows at the O4* ribose atoms). This inverted ribose conformation
positions the H4´, 1H5´ and 2H5´ ribose protons of A5 and A5* directly above the G6 and G6* guanine
rings and the resulting large ring current effects explain the large upfield shifts of these protons. (C)
Back-calculated chemical shift values give excellent agreement with experimentally determined values
(rmsdshift = 0.18 ppm). The upfield shifted H4´, 1H5´ and 2H5´ ribose protons of A5 and A5* are
highlighted with green circles. (D) Subsequently determined experimental NMR ensemble of the 5´-
GABrGU-3´/3´-UBrGAG-5´ internal loop (PDB: 2LX1) agrees with the CS-ROSETTA-RNA model at
atomic resolution (1.07 Å rmsd between the first member of the NMR ensemble and the CS-ROSETTA-
RNA model).
Table Legends Table 1. The CS-ROSETTA-RNA method benchmarked on 21 RNA motifs.
Figure 1
Crystal (PDB: 1FJG) With Chemical Shift Data (1st-ranked model)
DATA Linear Fit
y = 1.02x ! 0.02 R2 = 0.985_ _ _
rmsdshift= 0.193 ppms
E F D
A C B
A Figure 2
E
B
C
F
H
F
I
G
D
Figure 3 A
C DATA Linear Fit
y = 1.003x ! 0.045 R2 = 0.976__ ___
rmsdshift = 0.183 ppm _ _
D NMR Ensemble
1H5"
2H5"
H4"
B CS-ROSETTA-RNA Model
Table 1. The CS-ROSETTA-RNA method benchmarked on 21 RNA motifs. Motif name PDBa Nnt
b rmsdc,d
(Å) rmsd-top5c,e
(Å)
Known structures Single G:G mismatch 1F5G 6 0.71 0.71 UUCG tetraloop 2KOC 6 0.84 0.84 Tandem GA:AG mismatch 1MIS 8 1.10 1.10 Tandem UG:UA mismatch 2JSE 8 3.02 2.52 16s rRNA UUAAGU loop 1FJG 8 0.52 0.52 HIV-1 TAR apical loop 1ANR 8 5.86 5.86 tRNAi
Met ASL 1SZY 9 3.89 1.35 Conserved SRP internal loop 1LNT 12 0.81 0.81 R2 retrotransposon 4x4 loop 2L8F 12 1.17 1.17 Hepatitis C virus IRES IIa 2PN4 13 3.21 1.48 Blind targets UAAC tetraloop 4A4R 6 0.97 0.97 UCAC tetraloop 4A4S 6 1.00 1.00 UGAC tetraloop 4A4U 6 3.60 1.67 UUAC tetraloop 4A4T 6 1.72 1.72 Chimp HAR1 GAA loop 2LHP 7 2.88 2.88 Human HAR1 GAA loop 2LUB 7 2.26 2.03 GU:UAU internal loop – f 9 1.37 1.37 tRNAGly ASL (cuUCCaa)g 2LBL 9 3.28 1.41 tRNAGly ASL (cuUCCcg)g 2LBK 9 3.42 1.94 tRNAGly ASL (uuGCCaa)g 2LBJ 9 3.08 2.93 Self-symmetric GABrGU loop 2LX1 12 1.14 1.14 rmsd < 1.50 Å ! ! 10/21 13/21 rmsd < 2.00 Å ! ! 11/21 16/21
Additional information and full motif names provided in SI Appendix, Tables S1 and S3. Energy vs. rmsd plots for all 21 benchmark motifs provided in SI Appendix, Figure S1.
a PDB ID of experimental reference structure. b Motif size, the number of nucleotides in the modeled RNA motif. Each motif consists of non-canonical core nucleotides closed by boundary canonical (W.C. or G-U wobble) base pairs. c All-heavy-atom rmsd over all nucleotides, excluding the boundary canonical base pairs after alignment over all nucleotides. Nucleotides found to be extra-helical bulges (both unpaired and unstacked) in the experimental structure were excluded from both the alignment and the rmsd calculation. Bold text indicates rmsd better than 2.0 Å. d All-heavy-atom rmsd of the first-ranked (lowest energy) model to the experimental structure. e Lowest all-heavy-atom rmsd to the experimental structure among the five lowest energy cluster centers. f The experimental structure has not yet been deposited into the PDB database. g The sequence of the 7-nt anticodon loop is given in parentheses with the anticodon triplet in uppercase.
1
Supporting Information for “Structure determination of noncanonical RNA motifs from 1H NMR chemical shift data alone”
Parin Sripakdeevong1, Mirko Cevec2, Andrew T. Chang3, Michèle C. Erat4,9, Melanie Ziegeler2, Qin Zhao5, George E. Fox6, Xiaolian Gao6, Scott D. Kennedy7, Ryszard Kierzek8, Edward P. Nikonowicz3,
Harald Schwalbe2, Roland K. O. Sigel9, Douglas H. Turner10, and Rhiju Das1,11,12,* 1Biophysics Program, Stanford University, Stanford, CA 94305, USA 2Center for Biomolecular Magnetic Resonance, Institute for Organic Chemistry and Chemical Biology, Johann Wolfgang Goethe University Frankfurt, Max-von-Laue-Strasse 7, 60438 Frankfurt, Germany 3Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77251, USA 4Department of Biochemistry, University of Oxford, Oxford OX1 3QU, United Kingdom 5Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA 6Department of Biology and Biochemistry, University of Houston, Houston, TX 77004, USA 7Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA 8Institute of Bioorganic Chemistry, Polish Academy of Sciences, 60-714 Poznan, Noskowskiego 12/14, Poland 9Institute of Inorganic Chemistry, University of Zurich, Winterthurerstrasse 190, CH-8057 Zurich, Switzerland 10Department of Chemistry, University of Rochester, Rochester, New York 14627, USA 11Department of Biochemistry, Stanford University, Stanford, CA 94305, USA 12Department of Physics, Stanford University, Stanford, CA 94305, USA
To whom correspondence should be addressed. Phone: (650) 723-5976. Fax: (650) 723-6783. E-mail: [email protected].
2
This Supporting Information contains the following sections:
Supporting Methods.
Supporting Results.
Figure S1. Energy vs. all-heavy-atom RMSD to the experimental structure.
Figure S2. Four dynamic and/or unstructured motifs from the RNA motif benchmark.
Figure S3. Egap vs. RMSD of the CS-ROSETTA-RNA lowest energy model to the experimental
structure.
Figure S4. Comparisons between CS-ROSETTA, NMR and crystallographic models of the most
conserved internal loop from the signal recognition particle RNA.
Figure S5. Comparisons between the CS-ROSETTA-RNA and NMR models of the 5´-GABrGU-3´/3´-
UBrGAG-5´ self-complementary internal loop.
Figure S6. Modeling results on the uuGCCaa anticodon stem-loop of B. subtilis tRNAGly by three
different models generation methods.
Table S1. Supplemental information on the RNA motifs benchmark
Table S2. Supplemental Rosetta modeling results (before inclusion of chemical shift data)
Table S3. Supplemental Rosetta modeling results (after inclusion of chemical shift data)
3
Supporting Methods Updates to the standard Rosetta all-atom energy function for RNA Minor updates were made to the Rosetta all-atom energy function for RNA (1). The energy unit reported herein is in Rosetta units (RU), which is used internally by the Rosetta program to store evaluated energy values. Comparisons to RNA Watson-Crick helix thermodynamic parameters (2) indicate that 1 Rosetta unit (RU) is approximately equal to 1 kBT (1) (note that the structure modeling results in this study do not depend on the absolute scale of this energy unit). Compared to the Rosetta energy function used in the prior published work (3), two minor updates were made:
a. Glycosidic (!) torsional potential: In the prior implementation of the glycosidic torsional potential, the glycosidic torsion at the syn minima (~69° if north pucker; ~70° if south pucker) were penalized by 2.2 RU relative to conformations at anti minima (199° if north pucker; ~237° if south pucker) based on the lower frequency of syn glycosidic conformation found in the crystallographic structure of the large ribosomal subunit of H. marismortuii (PDB: 1JJ2). Prior to this work, in separate studies on RNA loop modeling (3) and ab initio motif structure prediction (unpublished results), we observed poor recovery of syn guanosines in noncanonical motifs. Hence, we have modified the glycosidic torsional potential to remove the energy penalty of the syn conformation for guanosine nucleotide.
b. Intra-nucleotide energetic terms: For historical reasons, the energetic contributions from the standard Rosetta all-atom energy terms (van der Waals interactions, hydrogen bonds and solvation effects) were omitted for atom-pairs belonging to the same nucleotide (the exception being a very weak van der Waals interactions repulsion term). These energy terms have been reintroduced here for all base-phosphate intra-nucleotide atom-pairs, including those forming potential hydrogen bonds. The energetic contribution from the standard Rosetta energy terms between intra-nucleotide base-ribose and ribose-phosphate atom-pairs are still presently omitted and instead captured through existing torsional potential terms (1).
CS-ROSETTA-RNA modeling for the 4x4 5´-GABrGU-3´/3´-UBrGAG-5´ self-complementary internal loop
Among the 11 blind motifs in the 21-RNA benchmark, the 4x4 5´-GABrGU-3´/3´-UBrGAG-5´ self-complementary internal loop was unique in that an initial NMR study on the motif was already published prior to the CS-ROSETTA-RNA modeling (4). As noted in the main text, this prior study did not produce sufficient constraints to generate an atomic-detail three-dimensional structure, but a descriptive 2D schematic was proposed (Fig. 3A in main text). We therefore took the following approach to the modeling process. First, approximately half of the features proposed by the 2D schematic were directly incorporated as constraints during the modeling process, including:
i). G4, G4*, G6 and G6* were assumed to adopt the syn glycosidic conformation.
ii). G4-G6* and G4*-G6 were assumed to form base pairs (but the exact base-pairing pattern was
4
not specified).
iii). U7 and U7* were assumed to be extra-helical bulges.
In this case, only Stepwise Assembly (3) (SWA) modeling was performed since the Fragment Assembly with Full-Atom Refinement (1) (FARFAR) method did not have a framework to support incorporating the constraints. Additional features proposed in the prior study that were not used as modeling constraints included:
i). The exact base-pairing pattern (trans Watson-Crick/Hoogsteen) of G4-G6* and G4*-G6.
ii). Two-fold rotational symmetry of the structure.
iii.) 2´-endo sugar conformations at G4, G4*, A5, A5*, G6, and G6*.
The CS-ROSETTA-RNA model was able to independently recover these additional features.
The original blind modeling of the 4x4 5´-GABrGU-3´/3´-UBrGAG-5´ self-complementary internal loop was also completed during the early development stage of the CS-ROSETTA-RNA method. At the time of the original modeling, the intra-nucleotide energetic terms update to the Rosetta all-atom energy function for RNA (see above) was not yet implemented. The lowest energy model obtained from this original CS-ROSETTA-RNA modeling (referred to here as the original Rosetta model) thus contained steric clashes between the amino atoms in the base and the phosphate group atoms of both the G4 and G4* nucleotides. Once the intra-nucleotide energetic terms update to the Rosetta all-atom energy function for RNA was implemented, the original Rosetta model was refined using the following protocol: Stepwise Assembly (SWA) modeling was carried out again on the RNA motif, but now focused only on conformations near the original Rosetta model; a filter was imposed at each SWA building step requiring models to be with 3.0 Å rmsd of the original Rosetta model. The refined Rosetta model generated from this procedure closely resembles the original Rosetta model (1.20 Å pairwise all-heavy-atom rmsd) and retained all the base-pairing and base-stacking features. The refined model also removed steric clashes between the amino atoms in the base and the phosphate group atoms of both the G4 and G4* nucleotides, as expected, and now contain an energetically favorable hydrogen bond between the 1H2 amino proton and the OP1 phosphate oxygen at these nucleotides. The results from the original blind modeling are presented in Table 1 of main text and the Supporting Tables while the refined model is presented in Fig. 3 of the main text.
Supporting Results A criterion for confidence prediction
We discovered that the energy gap, between the lowest energy model and the next lowest energy model that is structurally distinct from the former (at least 1.5 Å rmsd separation), could be used as a metric to assess the confidence of the CS-ROSETTA-RNA modeling. A large energy gap indicates the presence of
5
a single dominant lowest energy conformation with much better energy than all others. Conversely, a small energy gap indicates the presence of multiple structurally distinct conformations with similar energies. As a metric to assess the confidence level of future predictions, we defined the Egap:
where Elowest is the Rosetta energy (including chemical shift pseudo-energy) of the lowest energy model and Enext is the Rosetta energy (including chemical shift pseudo-energy) of the next lowest energy model that is at least 1.5 Å all-heavy-atom rmsd from the former. Based on the results from the 21-RNA benchmark, an Egap value greater than 3.0 kBT is a strong indicator that the lowest energy Rosetta model is accurate. For all 6 cases from the 21-motif benchmark that were found to have an Egap value greater than 3.0 kBT, the lowest energy model was within atomic-accuracy of the experimental structure (under 1.5 Å all-heavy-atom rmsd; see Fig. S3 and Table S3). The criterion was applicable to motifs in both the known structure test set and the blind target set. Interestingly, the same criterion was not applicable to the benchmark results prior to the addition of the chemical shift pseudo-energy term (see Table S2). Thus, inclusion of the experimental chemical shift data was critical for both achieving accurate predictions and assessing their confidence.
FR3D search on the 4x4 5´-GABrGU-3´/3´-UBrGAG-5´ self-complementary internal loop In the Rosetta model for the 4x4 5´-GABrGU-3´/3´-UBrGAG-5´ self-complementary internal loop, the backbone of A5 and A5* adopted a highly irregular conformation, which led to a ~180° inversion in the adenosines’ ribose orientation relative to the riboses in the preceding and following nucleotides (G4, G6, G4*, G6*; see main text Fig. 3B). We used the FR3D program (5) to determine whether this conformational arrangement had been previously observed. The FR3D search, which relies on geometric comparisons of the base conformations, was carried out on a non-redundant list of 654 RNA-containing PDB structures with resolution better than 4 Å (downloaded from the FR3D website on June 02, 2012). The atomic coordinates of the G4-A5-G6 tri-nucleotide from the Rosetta model were used as the query conformation and FR3D was used to search for any geometrically similar consecutive tri-nucleotide conformation in the non-redundant PDB database. FR3D found no matching candidate with geometric discrepancy below the 0.50 Å default cutoff value. The candidate with the lowest geometric discrepancy (0.62 Å), the G1297-U1298-A1299 tri-nucleotide of the 16S rRNA of H. marismortuii (PDB: 2AW7), does not display the aforementioned ~180° inverted ribose conformational arrangement. Lastly, a manual search reveals a potentially similar 180° inverted ribose conformation at the G1426-A1427-G1428 tri-nucleotide of the 23S rRNA of E. coli (PDB: 1VS6). However, all bases in this prior structure were in the anti glycosidic configuration, distinct from the syn G configurations exhibited in the 5´-GABrGU-3´/3´-UBrGAG-5´ Rosetta model.
6
References
1. Das R, Karanicolas J, & Baker D (2010) Atomic accuracy in predicting and designing noncanonical RNA structure. Nat Methods 7(4):291-294.
2. Xia T, et al. (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 37(42):14719-14735.
3. Sripakdeevong P, Kladwang W, & Das R (2011) An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling. Proc Natl Acad Sci U S A 108(51):20573-20578.
4. Hammond NB, Tolbert BS, Kierzek R, Turner DH, & Kennedy SD (2010) RNA internal loops with tandem AG pairs: the structure of the 5'GAGU/3'UGAG loop can be dramatically different from others, including 5'AAGU/3'UGAA. Biochemistry 49(27):5817-5827.
5. Sarver M, Zirbel C, Stombaugh J, Mokdad A, & Leontis N (2008) FR3D: finding local and composite recurrent structural motifs in RNA 3D structures. Journal of Mathematical Biology 56(1):215-252.
Figure S1. Energy vs. all-heavy-atom RMSD to the experimental structure. The energy (y-axis) is the sum of the standard Rosetta all-atom energy function for RNA and the chemical shift pseudo-energy term.
All-heavy-atom RMSD (Å) to Experimental Structure
Ene
rgy
Motif name! NMR PDB ID!
Number of models in NMR ensemble!
Average pairwise rmsd (Å) of NMR ensemble!Include closing W.C.
base pairs!Exclude closing W.C.
base pairs!HIV-1 TAR apical loop! 1ANR! 20! 4.07! 4.42!Chimp HAR1 GAA loop! 2LHP! 10! 2.61! 3.33!Human HAR1 GAA loop! 2LUB! 10! 1.98! 2.58!Tandem UG:UA mismatch! 2JSE ! 10! 1.39! 1.77!
Figure S2. Four dynamic and/or unstructured motifs from the RNA motif benchmark. (A) The table reports the average rmsd value calculated between every possible pair of models in the NMR-derived ensemble. Inclusion of the well-structured boundary Watson-Crick base pairs in the rmsd calculation lowers the rmsd value. Four structurally diverse models from the NMR-derived ensemble of the (B) HIV-1 TAR apical loop, (C) chimp HAR1F GAA loop, (D) human HAR1F GAA loop, and (E) tandem UG:UA mismatch. In the UG:UA mismatch case, the structural dynamics appear to be localized to a single gaunosine nucleotide although the motif is known to be thermodynamically destabilizing and contains no base pairs [Biochemistry. 2007 Nov 6;46(44):12665-78].
A
E
PDB: 2JSE | NMR model #1 PDB: 2JSE | NMR model #2 PDB: 2JSE | NMR model #3 PDB: 2JSE | NMR model #5
C
PDB: 2LHP | NMR model #1 PDB: 2LHP | NMR model #5 PDB: 2LHP | NMR model #8 PDB: 2LHP | NMR model #10
D
PDB: 2LUB | NMR model #1 PDB: 2LUB | NMR model #3 PDB: 2LUB | NMR model #6 PDB: 2LUB | NMR model #10
B
PDB: 1ANR | NMR model #1 PDB: 1ANR | NMR model #3 PDB: 1ANR | NMR model #4 PDB: 1ANR | NMR model #6
Figure S3. Egap vs. RMSD of the CS-ROSETTA-RNA lowest energy model to the experimental structure. An energy gap (Egap) value greater than 3.0 kBT (red line) is a strong indicator that the lowest energy Rosetta model will achieve atomic-accuracy. In the 21-RNA benchmark, six motifs have an Egap value greater than 3.0 kBT, and the lowest energy CS-ROSETTA-RNA model for all of these 6 cases were found to be within 1.5 Å rmsd (green line) of the experimental structure.
Figure S4. Comparisons between CS-ROSETTA, NMR and crystallographic models of the most conserved internal loop from the signal recognition particle RNA. The NMR study on this motif produced a tightly defined ensemble, where (A) the mean minimized model and (B) the ensemble member with the best rmsdshift both gave back-calculated chemical shifts for the G4 H1!, G4 H8 and G9 H1! protons that disagreed with the experimental data by at least 1.0 ppm (green circles). Since the experimental 1H chemical shift data and the NMR models originated from the same NMR study, these disagreements could not have been due to differences in experimental conditions (e.g. temperature, salt concentrations and pH). The NMR ensemble also showed disagreements with both (C) the CS-ROSETTA-RNA lowest energy model and (D) the crystallographic model, particularly at the G4-G5 backbone suite (blue dashed circle), the G9-C10 backbone suite, and the C10 nucleobase. .
A NMR mean minimized model (PDB: 28SR)
G9 H1!
DATA Linear Fit
B NMR ensemble model #6 (PDB: 28SP)
C CS-ROSETTA-RNA lowest energy model
D Crystallographic model (PDB: 1LNT)
G4 H8
G4 H2!
G4 H8!
G4 H2!
G4 H8
G4 H2!
G4 H8
G4 H2!
DATA Linear Fit
rmsdshift=0.441 ppm_
rmsdshift =0.504 ppm_
G4 H2!
G4 H8
G9 H1! G4 H2!
G4 H8
rmsdshift=0.180 ppm_
rmsdshift=0.274 ppm_
DATA Linear Fit
DATA Linear Fit
Figure S5. Comparisons between the CS-ROSETTA-RNA and NMR models of the 5´-GABrGU-3´/3´-UBrGAG-5´ self-complementary internal loop. (A-B) The CS-ROSETTA-RNA lowest energy model in cartoon and stick view. (C-D) The first model from the NMR ensemble subsequently solved by the Turner and Kennedy group (PDB: 2LX1) in cartoon and stick view. The pairwise all-heavy-atom rmsd value between the CS-ROSETTA-RNA and the NMR model is 1.07 Å.
A CS-ROSETTA-RNA model (stick view)
C NMR ensemble model #1 (stick view)
B CS-ROSETTA-RNA model (cartoon view)
D NMR ensemble model #1 (cartoon view)
Figure S6. Modeling results on the uuGCCaa anticodon stem-loop of B. subtilis tRNAGly by three different models generation methods. During the original blind modeling, we generated models using (A) SWA and (B) FARFAR with RNA09 fragment database (obtained from http://kinemage.biochem.duke.edu/databases/rnadb.php). CS-ROSETTA-RNA prediction using models from both source (A) and (B) did not return a <2.0 Å rmsd structure (to NMR PDB: 2LBJ) among the five lowest energy clusters. However, modeling with (A) SWA alone does return a <2.0 Å rmsd structure among the five lowest energy clusters (green circle). Lastly, modeling with (C) FARFAR but using the PDB:1JJ2 fragment database also returned a <2.0 Å rmsd structure among the five lowest energy clusters (blue circle).
A Stepwise Assembly (SWA)
B FARFAR with RNA09 fragment database
C FARFAR with PDB:1JJ2 fragment database
Motif nam
e M
otif properties N
on-exchangable 1H chem
ical shift dataa
NM
R structure
b C
rystallographic structureb
Size Strands
Ntotal
Nper
nucleotide Source
PDB
nucleotide-segment
(chain ID)
rmsd
shift (ppm
) PD
B nucleotide-segm
ent(chain ID
) rm
sd shift
(ppm)
Know
n Structures Single G
:G m
ismatch
6 2
39 6.5
BM
RB
Entry 4614 1F5G
3-5 (A
) , 6-8 (B)
0.19 !
! !
UU
CG
tetraloop 6
1 46
7.7 B
MR
B Entry 5705
2KO
C
5-10 (A)
0.24 1F7Y
8-13 (B
) 0.28
Tandem G
A:A
G m
ismatch
8 2
60 7.5
Biochem
istry. 1996 Jul 30;35(30):9677-89 1M
IS 3-6 (A
), 11-14 (B)
0.32 !
! !
Tandem U
G:U
A m
ismatch
8 2
38 4.8
Biochem
istry. 2007 Nov 6;46(44):12665-78
2JSE 4-7 (A
), 16-19 (A)
0.26 !
! !
16s rRN
A U
UA
AG
U loop
8 1
46 5.8
Biochem
istry. 2001 Aug 21;40(33):9879-86
1HS2
3-10 (A)
0.33 1FJG
1089-1096 (A
) 0.25
HIV-1 TA
R apical loop
8 1
56 7.0
Nucleic A
cids Res. 1996 O
ct 15;24(20):3974-81 1A
NR
29-36 (A
) 0.28
! !
! tR
NA
i Met A
SL 9
1 67
7.4 J M
ol Biol. 1997 A
pr 4;267(3):505-19 1SZY
7-15 (A
) 0.30
! !
! C
onserved SRP internal loop
12 2
67 5.6
Nat Struct B
iol. 1999 Jul;6(7):634-8 28SR
5-10 (A
), 19-24 (A)
0.50 1L
NT
4-9 (A
), 16-21 (B)
0.27 R
2 retrotransposon 4x4 loop 12
2 70
5.8 B
MR
B Entry 17406
2L8F 2-7 (A
), 10-15 (A)
0.32 !
! !
Hepatitis C
virus IRES IIa
13 2
93 7.2
Nat Struct B
iol. 2003 Dec;10(12):1033-8
1P5M
9-17 (A), 45-48 (A
) 0.41
2PN4
51-59 (A), 109-112 (B
) 0.20
Blind Targets
UA
AC
tetraloop 6
1 28
4.7 Fox group
4A4R
9-14 (A
) 0.31
! !
! U
CA
C tetraloop
6 1
30 5.0
Fox group 4A
4S 9-14 (A
) 0.25
! !
! U
GA
C tetraloop
6 1
28 4.7
Fox group 4A
4U
9-14 (A)
0.39 !
! !
UU
AC
tetraloop 6
1 28
4.7 Fox group
4A4T
9-14 (A)
0.50 !
! !
Chim
p HA
R1 G
AA
loop 7
2 38
5.4 Schw
albe group 2LH
P 9-11 (A
), 26-29 (A)
0.35 !
! !
Hum
an HA
R1 G
AA
loop 7
2 34
4.9 Schw
albe group 2LU
B
9-11 (A), 26-29 (A
) 0.35
! !
! G
U:U
AU
internal loop 9
2 50
5.6 Sigel group
!c
6-9 (A), 30-34 (A
) 0.44
! !
tRN
AG
ly ASL (cuU
CC
aa) 9
1 65
7.2 N
ikonowicz group
2LBL
5-13 (A)
0.32 !
! !
tRN
AG
ly ASL (cuU
CC
cg) 9
1 59
6.6 N
ikonowicz group
2LBK
5-13 (A
) 0.19
! !
! tR
NA
Gly A
SL (uuGC
Caa)
9 1
69 7.7
Nikonow
icz group 2LB
J 5-13 (A
) 0.37
! !
! Self-sym
metric G
AB
rGU
loop 12
2 90
7.5 K
ennedy and Turner group 2LX
1 3-8 (A
), 14-19 (A)
0.26 !
! !
Average 8.4
1.5 52.7
6.2 !
! !
0.33 !
! 0.25
HIV, hum
an imm
unodeficiency virus; TAR
, trans-activation response; tRN
Ai M
et, initiator methionine tR
NA
; ASL, anticodon stem
-loop; SRP, signal recognition particle;
IRES, internal ribosom
e entry site; HA
R1, hum
an accelerated region 1; tRN
AG
ly, glycine tRN
A; B
rG, 8-brom
oguanosine.
a Ntotal and N
per nucleotide are, respectively, the number of experim
entally assigned non-exchangeable 1H chem
ical shift data (including H1´, H
2´, H3´, H
4´, 1H5´ and 2H
5´ ribose protons, and H
2, H5, H
6 and H8 base protons). The chem
ical shift data were obtained from
the BM
RB
database, from published literature (see citation) and from
N
MR
structure determination studies conducted in parallel w
ith this work (see group nam
e). In two know
n structure cases, the data were not directly available in the
published literature and were graciously provided to us by the authors of the source publication (G
. Varani for the HIV-1 TA
R apical loop and J. D
. Puglisi for the Hepatitis
C V
irus IRES IIa). In B
MR
B Entry 5705, correction w
ere made for the sw
itched chemical shift assignm
ents of 1H5´ and 2H
5´ for nucleotides C8, G
9 and G10.
b The NM
R structures cam
e from the sam
e source as the experimental non-exchangeable 1H
chemical shift data, except for the self-sym
metric G
AB
rGU
loop structure which
came from
a subsequent study by the same group. The first m
odel of the NM
R ensem
ble was used as the experim
ental reference structure (except in the 3 cases below). In 4
cases with know
n structure, we found that the m
otif had also been solved by crystallography. In 3 of the 4 cases, the crystallographic structure provided a better agreement
to the chemical shift data (i.e., low
er rmsd
shift than every mem
ber of the NM
R ensem
ble). In these 3 cases (PDB
ID highlighted in bold text), the crystallographic structure
was used as the experim
ental reference structure. c The experim
ental structure has not yet been deposited into the PDB
database.
Table S1. Supplemental inform
ation on the RN
A m
otifs benchmark
Table S2. Supplemental R
osetta modeling results (before inclusion of chem
ical shift data)
Motif nam
e M
otif properties Low
est energy model (top-1)
Best of five low
est energy cluster centers (top-5)
Size Strands
PDB
a E
gap b rm
sd (Å)
rmsd
shift (ppm
) B
ase-pair recovery
c B
ase-stack recovery
c C
luster rank
rmsd (Å
) rm
sdshift
(ppm)
Base-pair
recoveryc
Base-stack
recoveryc
Know
n Structures Single G
:G m
ismatch
6 2
1F5G
0.33 0.72
0.16 1/1
4/4 1
0.72 0.16
1/1 4/4
UU
CG
tetraloop 6
1 1F7Y
1.54
2.38 0.42
0/0 1/2
1 2.38
0.42 0/0
1/2 Tandem
GA
:AG
mism
atch 8
2 1M
IS 2.35
3.84 0.36
0/2 4/6
2 0.99
0.22 2/2
6/6 Tandem
UG
:UA
mism
atch 8
2 2JSE
0.80 6.36
0.38 0/0
0/3 3
3.43 0.32
0/0 2/3
16s rRN
A U
UA
AG
U loop
8 1
1FJG
0.27 6.63
0.52 0/1
0/3 2
1.99 0.38
1/1 3/3
HIV-1 TA
R apical loop
8 1
1AN
R
0.00 4.86
0.30 0/0
0/0 1
4.86 0.30
0/0 0/0
tRN
Ai M
et ASL
9 1
1SZY
0.38 5.57
0.32 0/0
2/3 2
4.33 0.25
0/0 2/3
Conserved SR
P internal loop 12
2 1LN
T 1.80
1.89 0.40
1/3 9/10
2 0.91
0.26 3/3
10/10 R
2 retrotransposon 4x4 loop 12
2 2L8F
0.06 4.68
0.48 0/4
3/10 5
1.39 0.44
2/4 10/10
Hepatitis C
virus IRES IIa
13 2
2PN4
2.82 6.11
0.35 2/2
5/7 4
5.33 0.35
1/2 5/7
Blind Targets
UA
AC
tetraloop 6
1 4A
4R
0.33 0.85
0.33 0/1
3/3 1
0.85 0.33
0/1 3/3
UC
AC
tetraloop 6
1 4A
4S 0.09
4.66 0.57
0/1 0/4
2 0.72
0.28 0/1
4/4 U
GA
C tetraloop
6 1
4A4U
2.11
5.94 0.49
0/0 1/2
2 1.60
0.38 0/0
2/2 U
UA
C tetraloop
6 1
4A4T
0.44 5.05
0.51 0/0
0/0 3
1.43 0.39
0/0 0/0
Chim
p HA
R1 G
AA
loop 7
2 2LH
P 0.52
5.04 0.50
0/0 1/2
5 3.78
0.38 0/0
2/2 H
uman H
AR
1 GA
A loop
7 2
2LUB
0.90
4.44 0.48
0/1 2/5
4 2.28
0.27 1/1
4/5 G
U:U
AU
internal loop 9
2 !
d 0.16
4.12 0.25
1/2 3/4
1 4.12
0.25 1/2
3/4 tR
NA
Gly A
SL (cuUC
Caa)
9 1
2LBL
0.41 5.56
0.29 0/0
2/5 3
5.47 0.33
0/0 2/5
tRN
AG
ly ASL (cuU
CC
cg) 9
1 2LB
K
0.55 5.34
0.35 1/1
1/5 5
5.00 0.29
1/1 1/5
tRN
AG
ly ASL (uuG
CC
aa) 9
1 2LB
J 1.23
5.02 0.44
1/1 1/3
3 4.11
0.38 1/1
1/3 Self-sym
metric G
AB
rGU
loop 12
2 2LX
1 0.55
5.46 0.53
0/2 2/7
3 1.25
0.29 2/2
7/7 Average
8.4 1.5
! 0.84
4.50 0.40
0.33/1.05 2.10/4.19
2.62 2.71
0.32 0.76/1.05
3.43/4.19 R
MSD
< 1.50 Å
! !
! !
2/21 !
! !
! 8/21
! !
! R
MSD
< 2.00 Å
! !
! !
3/21 !
! !
! 10/21
! !
! E
gap >3.00 kB T
! !
! 0/21
! !
! !
! !
! !
! a PD
B ID
of experimental reference crystallographic or N
MR
structure (see Table S1 for details). b Energy gap (see A criterion for confidence prediction subsection of Supporting R
esults for definition). c N
umber of native base-pairs and native base-stacks correctly recovered by the R
osetta model. B
ase pairs and the base stacks are automatically annotated using the program
M
C-annotate [J M
ol Biol. 2001 M
ay 18;308(5):919-36]. Base-pairing annotation follow
s the Leontis and Westhof nom
enclature [RN
A. 2001 A
pr;7(4):499-512] and recovery entails having the correct edge-to-edge interaction (W
atson-Crick, H
oogsteen, or Sugar-edge) and local strand orientation (cis or trans). Counts of both native base pairs and
correctly recovered base pairs are lowered ow
ing to ambiguities in assignm
ent of bifurcated base pairs, pairs connected by single hydrogen bonds and pairs that are not com
pletely co-planar. Boundary/closing canonical base pairs w
ere not counted. Base-stacks are classified as either upw
ard, downw
ard, outward or inw
ard [RN
A. 2009 O
ct;15(10):1875-85] and recovery entails having the correct base-stacking type. d The experim
ental structure has not yet been deposited into the PDB
database.
Motif nam
e M
otif properties Low
est energy model (top-1)
Best of five low
est energy cluster centers (top-5)
Size Strands
PDB
a E
gap b rm
sd (Å)
rmsd
shift (ppm
) B
ase-pair recovery
c B
ase-stack recovery
c C
luster rank
rmsd (Å
) rm
sdshift
(ppm)
Base-pair
recoveryc
Base-stack
recoveryc
Know
n Structures Single G
:G m
ismatch
6 2
1F5G
0.46 0.71
0.14 1/1
4/4 1
0.71 0.14
1/1 4/4
UU
CG
tetraloop 6
1 1F7Y
3.30
0.84 0.18
0/0 2/2
1 0.84
0.18 0/0
2/2 Tandem
GA
:AG
mism
atch 8
2 1M
IS 7.58
1.10 0.13
2/2 6/6
1 1.10
0.13 2/2
6/6 Tandem
UG
:UA
mism
atch 8
2 2JSE
1.41 3.02
0.12 0/0
2/3 5
2.52 0.15
0/0 3/3
16s rRN
A U
UA
AG
U loop
8 1
1FJG
2.11 0.52
0.19 1/1
3/3 1
0.52 0.19
1/1 3/3
HIV-1 TA
R apical loop
8 1
1AN
R
2.23 5.86
0.18 0/0
0/0 1
5.86 0.18
0/0 0/0
tRN
Ai M
et ASL
9 1
1SZY
0.48 3.89
0.18 0/0
2/3 3
1.35 0.17
0/0 3/3
Conserved SR
P internal loop 12
2 1LN
T 5.51
0.81 0.18
3/3 10/10
1 0.81
0.18 3/3
10/10 R
2 retrotransposon 4x4 loop 12
2 2L8F
4.10 1.17
0.17 3/4
10/10 1
1.17 0.17
3/4 10/10
Hepatitis C
virus IRES IIa
13 2
2PN4
2.08 3.21
0.19 2/2
7/7 2
1.48 0.17
2/2 7/7
Blind Targets
UA
AC
tetraloop 6
1 4A
4R
3.92 0.97
0.24 1/1
3/3 1
0.97 0.24
1/1 3/3
UC
AC
tetraloop 6
1 4A
4S 2.56
1.00 0.20
1/1 4/4
1 1.00
0.20 1/1
4/4 U
GA
C tetraloop
6 1
4A4U
0.74
3.60 0.28
0/0 1/2
2 1.67
0.28 0/0
2/2 U
UA
C tetraloop
6 1
4A4T
0.45 1.72
0.27 0/0
0/0 1
1.72 0.27
0/0 0/0
Chim
p HA
R1 G
AA
loop 7
2 2LH
P 0.73
2.88 0.18
0/0 2/2
3 2.88
0.21 0/0
1/2 H
uman H
AR
1 GA
A loop
7 2
2LUB
1.33
2.26 0.22
1/1 4/5
4 2.03
0.17 0/1
4/5 G
U:U
AU
internal loop 9
2 !
d 1.10
1.37 0.16
2/2 4/4
1 1.37
0.16 2/2
4/4 tR
NA
Gly A
SL (cuUC
Caa)
9 1
2LBL
0.89 3.28
0.19 0/0
5/5 3
1.41 0.20
0/0 5/5
tRN
AG
ly ASL (cuU
CC
cg) 9
1 2LB
K
1.50 3.42
0.17 1/1
4/5 2
1.94 0.16
1/1 5/5
tRN
AG
ly ASL (uuG
CC
aa) 9
1 2LB
J 2.19
3.08 0.16
1/1 3/3
3 2.93
0.17 1/1
3/3 Self-sym
metric G
AB
rGU
loop 12
2 2LX
1 28.13
1.14 0.22
2/2 7/7
1 1.14
0.22 2/2
7/7 Average
8.4 1.5
! 3.39
2.18 0.19
1.00/1.05 3.95/4.19
1.90 1.69
0.19 0.95/1.05
4.10/4.19 R
MSD
< 1.50 Å
! !
! !
10/21 !
! !
! 13/21
! !
! R
MSD
< 2.00 Å
! !
! !
11/21 !
! !
! 16/21
! !
! E
gap >3.00 kB T
! !
! 6/21
! !
! !
! !
! !
! a PD
B ID
of experimental reference crystallographic or N
MR
structure (see Table S1 for details). b Energy gap (see A criterion for confidence prediction subsection of Supporting R
esults for definition). c N
umber of native base-pairs and native base-stacks correctly recovered by the R
osetta model. B
ase pairs and the base stacks are automatically annotated using the program
M
C-annotate [J M
ol Biol. 2001 M
ay 18;308(5):919-36]. Base-pairing annotation follow
s the Leontis and Westhof nom
enclature [RN
A. 2001 A
pr;7(4):499-512] and recovery entails having the correct edge-to-edge interaction (W
atson-Crick, H
oogsteen, or Sugar-edge) and local strand orientation (cis or trans). Counts of both native base pairs and
correctly recovered base pairs are lowered ow
ing to ambiguities in assignm
ent of bifurcated base pairs, pairs connected by single hydrogen bonds and pairs that are not com
pletely co-planar. Boundary/closing canonical base pairs w
ere not counted. Base-stacks are classified as either upw
ard, downw
ard, outward or inw
ard [RN
A. 2009 O
ct;15(10):1875-85] and recovery entails having the correct base-stacking type. d The experim
ental structure has not yet been deposited into the PDB
database.
Table S3. Supplemental R
osetta modeling results (after inclusion of chem
ical shift data)