Proposal for the RNA Alignment Ontology Rob Knight Dept Chem & Biochem CU Boulder.
What do we want to do?
• Represent detailed structural information and other metadata on alignments
• Avoid horizontal and vertical expansion of the alignment
• Explicitly annotate correspondences at the level where they occur
Homology is problematic…
• Fundamental problem: systems that are homologous at one level are not necessarily homologous at other levels
• E.g. bat wings and bird wings: homologous as pentadactyl limbs, but not homologous as wings
• Homology is hierarchical and can partially overlap at any level (e.g. Griffiths 2006)
[Figure: hierarchy of forelimb homology (tetrapod, mammal, rodent, bat, bird and frog forelimbs); from Ridley, “Evolution”, 3rd ed.]
…and correspondence need not be homology at all!
• Example from SELEX: hammerhead ribozymes evolved independently at least three times: in nature, and in Jack Szostak's and Ron Breaker's labs
• However, we still want to be able to align these functionally equivalent sequences even though there is no evolutionary relationship between them
Problem: we have millions of sequence fragments and want to align them (including noncanonical pairs) and assign named regions
Solution
• Use an existing alignment and try to fit the new sequences into it
• This would be improved if we could explicitly annotate helices, noncanonical pairs, etc. on the sequences as a whole
• For display, need to easily show/hide groups of sequences and/or regions of the sequence
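A minimal sketch of the show/hide requirement, assuming the alignment is held as a dict of equal-length gapped rows, named regions as half-open column ranges, and sequence groups as sets of ids. The class `Alignment`, its `view` method, and all region and group names are hypothetical illustrations, not part of the proposal.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Alignment:
    """Hypothetical container: gapped rows plus named column regions and sequence groups."""
    rows: dict[str, str]                                                # seq id -> gapped row
    regions: dict[str, tuple[int, int]] = field(default_factory=dict)  # name -> (start, end), half-open
    groups: dict[str, set[str]] = field(default_factory=dict)          # name -> seq ids

    def view(self, show_groups: list[str], show_regions: list[str]) -> dict[str, str]:
        """Rows in the shown groups, cut down to the shown regions' columns.

        An empty list means "show everything" along that axis.
        """
        if show_groups:
            ids = set().union(*(self.groups[g] for g in show_groups))
        else:
            ids = set(self.rows)
        if show_regions:
            cols = sorted({c for r in show_regions for c in range(*self.regions[r])})
        else:
            width = max((len(row) for row in self.rows.values()), default=0)
            cols = list(range(width))
        return {sid: "".join(row[c] for c in cols)
                for sid, row in self.rows.items() if sid in ids}


aln = Alignment(
    rows={"s1": "GGCAUCC", "s2": "GG-AUCC", "s3": "AGCAUCU"},
    regions={"helix_5p": (0, 2), "loop": (2, 5)},
    groups={"lab": {"s1", "s2"}, "env": {"s3"}},
)
print(aln.view(["lab"], ["loop"]))  # {'s1': 'CAU', 's2': '-AU'}
```

Keeping regions and groups as named metadata, rather than physically splitting the alignment, means hiding or showing them never changes the underlying rows.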
Use case 2: SELEX
• From a large number of unaligned sequences, we want to identify motifs such as the one reported by Majerfeld & Yarus (2005)
How is this currently done?
• Find regions that occur in more sequences than expected by chance
• Group these sequences, centered on the “motif”
• See if the parts of the motif can be related by helices
• See if anything else is reliably found together with the motif
• Repeat for other families and see if there are relationships between them
• Group these families together, then iterate
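The first step, finding regions present in more sequences than chance would predict, can be sketched as a k-mer count. This is only an illustration under an assumed null model (uniform base composition, sequence windows treated as independent) with a simple observed/expected cutoff; `enriched_kmers`, `expected_fraction`, and `min_ratio` are hypothetical names, and real motif finders use more careful statistics.

```python
from __future__ import annotations

from collections import Counter


def expected_fraction(k: int, length: int) -> float:
    """Chance that a random sequence of `length` contains one given k-mer.

    Null model (an assumption): uniform base composition, windows treated as independent.
    """
    windows = max(length - k + 1, 0)
    return 1.0 - (1.0 - 0.25 ** k) ** windows


def enriched_kmers(seqs: list[str], k: int, min_ratio: float = 5.0) -> list[tuple[str, int, float]]:
    """Return (kmer, n_sequences, observed/expected) for k-mers seen in more sequences than chance predicts."""
    counts: Counter = Counter()
    for s in seqs:
        # count each k-mer at most once per sequence
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    expected = sum(expected_fraction(k, len(s)) for s in seqs)
    hits = [(kmer, n, n / expected) for kmer, n in counts.items()
            if expected > 0 and n / expected >= min_ratio]
    return sorted(hits, key=lambda h: -h[1])


seqs = ["AAGGCCUUAG", "CUGGCCUUCA", "UUAGGCCUUG", "GAUCGGCCUU"]  # toy data with a planted motif
print(enriched_kmers(seqs, k=6)[0][:2])  # ('GGCCUU', 4)
```

The top hits would then seed the grouping and helix-search steps listed above.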
So how do we handle all this? A proposal
Entities:
• sequence_region: a thing that defines a set of bases relative to some sequence (i.e. with indices for each base)
• stem: two regions linked by pairs
• unbroken_stem: two regions completely paired
• base: region that consists of a single nucleotide
• base_pair: region that consists of two paired bases
• canonical_base_pair: base pair that is cis-WW (cis Watson-Crick/Watson-Crick)
• terminal_loop: contiguous sequence_region stretching from i to j such that i-1 and j+1 are a base pair
• internal_loop: unpaired region that interrupts one unbroken_stem
• junction: unpaired region that connects two or more stems
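One way to make these entities concrete is to store everything as a sequence_region (a set of base indices on a named sequence) and derive the specialised types from base pairs. A minimal sketch, assuming 0-based indices and pairs stored as (i, j) with i < j; the class names mirror the proposal's terms, but the code is illustrative only. The check that a terminal loop's interior is unpaired is an added assumption not stated in the definition above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SequenceRegion:
    """A set of bases, by 0-based index, on one sequence; may be discontinuous."""
    seq_id: str
    indices: frozenset

    def is_base(self) -> bool:
        return len(self.indices) == 1


@dataclass(frozen=True)
class BasePair:
    seq_id: str
    i: int
    j: int
    family: str = "cWW"  # Leontis-Westhof family label

    def is_canonical(self) -> bool:
        # The proposal defines canonical_base_pair as a cis-WW base pair.
        return self.family == "cWW"

    def as_region(self) -> SequenceRegion:
        return SequenceRegion(self.seq_id, frozenset({self.i, self.j}))


def terminal_loop(seq_id: str, start: int, end: int,
                  pairs: set[tuple[int, int]]) -> Optional[SequenceRegion]:
    """Bases start..end form a terminal_loop if (start-1, end+1) is a base pair.

    Also requires the interior to be unpaired (an added assumption).
    """
    if (start - 1, end + 1) not in pairs:
        return None
    paired = {b for p in pairs for b in p}
    if any(k in paired for k in range(start, end + 1)):
        return None
    return SequenceRegion(seq_id, frozenset(range(start, end + 1)))


hairpin_pairs = {(0, 8), (1, 7), (2, 6)}  # toy hairpin GGGAAACCC
loop = terminal_loop("hp", 3, 5, hairpin_pairs)
print(sorted(loop.indices))  # [3, 4, 5]
```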
So how do we handle all this? A proposal (cont’d)
Relationships:
• correspondence: relation among a set of sequence_regions implying that all share a feature (with metadata about how this was determined)
• homology: correspondence implying a continuous chain of descent preserving the relation
• sequence_similarity: correspondence implying the regions are similar in primary sequence
• two_d_structure_similarity: correspondence implying the regions are similar in 2D structure, i.e. nested canonical base pairs
• secondary_structure_similarity: correspondence implying the regions are similar in secondary structure, i.e. including pseudoknots and noncanonical pairs
• tertiary_structure_similarity: correspondence implying the regions are similar in 3D structure
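The similarity relationships are meant to rest on quantitative measurements; the Definitions later in this proposal give examples such as 95% sequence identity or 50% of base pairs in common. A minimal sketch of two such measures, under stated assumptions: identity is computed only over columns where neither row has a gap (one common convention among several), and shared base pairs are counted relative to the larger structure. Both function names are hypothetical.

```python
from __future__ import annotations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither aligned row has a gap."""
    if len(a) != len(b):
        raise ValueError("rows must come from the same alignment")
    cols = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def shared_pair_fraction(pairs_a: set[tuple[int, int]], pairs_b: set[tuple[int, int]]) -> float:
    """Fraction of base pairs shared by two structures, relative to the larger one.

    Pairs must be in the same (alignment-column) coordinates to be comparable.
    """
    if not pairs_a and not pairs_b:
        return 1.0
    return len(pairs_a & pairs_b) / max(len(pairs_a), len(pairs_b))


print(percent_identity("GG-AUCC", "GGCAUCC"))  # 100.0
print(shared_pair_fraction({(0, 6), (1, 5)}, {(0, 6), (1, 5), (2, 4)}))  # 0.6666666666666666
```

Whichever measure and threshold are used would be recorded as the correspondence's "how determined" metadata.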
So how do we handle all this? A proposal (cont’d)
Relationships:
• pairing: relation asserting that two sequence_regions each contain part of at least one base_pair connecting them
• stem_pairing: pairing that includes several base_pairs (not necessarily contiguous) between two sequence_regions
• unbroken_stem_pairing: stem_pairing in which every base of each sequence_region is paired, in order, with a base of the other sequence_region
• base_pairing: pairing that connects exactly two bases, annotated with its Leontis-Westhof classification
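The Leontis-Westhof classification names each base pair by its glycosidic orientation (cis or trans) and the two interacting edges (Watson-Crick, Hoogsteen, Sugar), giving 12 geometric families. A minimal sketch of that label set plus an unbroken_stem_pairing check, assuming regions are lists of 0-based indices listed 5'->3' and that "in order" means the usual antiparallel arrangement; `normalize_family` and `is_unbroken_stem_pairing` are hypothetical names.

```python
from __future__ import annotations

from itertools import combinations_with_replacement

EDGES = ("W", "H", "S")  # Watson-Crick, Hoogsteen and Sugar edges
# cis/trans x 6 unordered edge combinations = the 12 Leontis-Westhof families
LW_FAMILIES = {o + a + b for o in "ct" for a, b in combinations_with_replacement(EDGES, 2)}


def normalize_family(label: str) -> str:
    """Give each family one spelling, e.g. 'cHW' -> 'cWH'."""
    orient, e1, e2 = label[0], label[1], label[2]
    e1, e2 = sorted((e1, e2), key=EDGES.index)
    return orient + e1 + e2


def is_unbroken_stem_pairing(region_a: list[int], region_b: list[int],
                             pairs: dict[tuple[int, int], str]) -> bool:
    """True if every base of region_a pairs, in order, with a base of region_b.

    Assumes both regions are listed 5'->3' and that "in order" means antiparallel:
    first base of region_a with last base of region_b, and so on inward.
    """
    if len(region_a) != len(region_b):
        return False
    for i, j in zip(region_a, reversed(region_b)):
        family = pairs.get((i, j)) or pairs.get((j, i))
        if family is None or normalize_family(family) not in LW_FAMILIES:
            return False
    return True


stem = {(0, 8): "cWW", (1, 7): "cWW", (2, 6): "tSH"}  # toy annotations
print(is_unbroken_stem_pairing([0, 1, 2], [6, 7, 8], stem))  # True
```

Note that the check accepts noncanonical families as well, since the proposal's stem definitions only require pairs, not canonical pairs.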
More exotic uses for alignment:
• microrna_target: pairing relation in which one member is a miRNA and the other is an mRNA, as defined in the Sequence Ontology (SO)
• same_microrna_target: relation among a set of sequences that each have a microrna_target relation to the same miRNA
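Because same_microrna_target is defined entirely in terms of microrna_target, it can be derived rather than annotated by hand. A minimal sketch, assuming microrna_target assertions arrive as (miRNA id, mRNA id) pairs; the function name and all identifiers are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def same_microrna_target(targets: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Derive same_microrna_target groups from microrna_target assertions.

    `targets` holds (mirna_id, mrna_id) pairs; each returned set is the group of
    mRNAs that share a microrna_target relation to one miRNA.
    """
    groups: dict[str, set[str]] = defaultdict(set)
    for mirna, mrna in targets:
        groups[mirna].add(mrna)
    return dict(groups)


# Hypothetical identifiers, for illustration only.
asserted = [("mirX", "mrna1"), ("mirX", "mrna2"), ("mirY", "mrna2")]
print(same_microrna_target(asserted))
```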
Definitions
• Correspondence: A relation between regions of an RNA alignment, which can occur between molecules or within a molecule. These relations are reflexive, symmetric and transitive.
• Region: Consists of a single RNA nucleotide or a set of RNA nucleotides. Regions can be continuous spans of nucleotides or discontinuous collections of contiguous spans. Single base pairs, terminal loops, junctions, etc. are all examples of regions.
• Homology: A correspondence that implies descent from a common ancestor with evolutionary continuity.
• Similarity: A correspondence that can be defined in terms of a quantitative measurement, typically at some structural level.
• Sequence similarity: A similarity defined at the primary sequence level, e.g. 95% sequence identity.
• Secondary structure similarity: A similarity defined at the secondary structure level, e.g. 50% of base pairs in common.
• 3D structure similarity: A similarity defined at the 3D structure level, e.g. 3 Angstrom RMSD.
• Base pairing: A relation between two RNA nucleotides, defined by base-base hydrogen-bonding interactions.
• Function: The properties of a biological entity for which it is maintained by evolutionary selection.
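Since correspondences are defined as reflexive, symmetric and transitive (an equivalence relation), pairwise correspondence assertions can be merged into equivalence classes. A minimal union-find sketch, assuming regions are identified by hypothetical (sequence id, region name) tuples; `CorrespondenceClasses` and its methods are illustrative names, not part of the proposal.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Hashable


class CorrespondenceClasses:
    """Union-find over regions: each class is the transitive closure of asserted correspondences."""

    def __init__(self) -> None:
        self.parent: dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)  # reflexive: every region corresponds to itself
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def assert_correspondence(self, a: Hashable, b: Hashable) -> None:
        # symmetric: the resulting classes do not depend on argument order
        self.parent[self.find(a)] = self.find(b)

    def corresponds(self, a: Hashable, b: Hashable) -> bool:
        return self.find(a) == self.find(b)

    def classes(self) -> list[set]:
        out = defaultdict(set)
        for x in list(self.parent):
            out[self.find(x)].add(x)
        return list(out.values())


cc = CorrespondenceClasses()
cc.assert_correspondence(("seqA", "helix_1"), ("seqB", "P1"))
cc.assert_correspondence(("seqB", "P1"), ("seqC", "stem_a"))
print(cc.corresponds(("seqA", "helix_1"), ("seqC", "stem_a")))  # True, by transitivity
```

One such structure would presumably be kept per correspondence type (homology, sequence similarity, and so on), so that classes of different kinds are not merged.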
Acknowledgements
RNA Alignment Ontology working group:
• James W. Brown
• Fabrice Jossinet
• Rym Kachouri
• B. Franz Lang
• Neocles Leontis
• Gerhard Steger
• Jesse Stombaugh
• Eric Westhof
Other coauthors:
• Amanda Birmingham
• Paul Griffiths
• Franz Lang
NSF RCN grant #0443508
Knight Lab members:
• Cathy Lozupone
• Micah Hamady
• Chris Lauber
• Jesse Zaneveld
• Jeremy Widmann
• Elizabeth Costello
• Jens Reeder
• Daniel McDonald
• Anh Vu
• Ryan Kennedy
• Julia Goodrich
• Meg Pirrung
• Reece Gesumaria
• Tony Walters
• Bob Larsen
Trp project:
• Irene Majerfeld
• Jana Chocholousova
• Vikas Malaiya
• Matthew Iyer
• Mike Yarus