TISdb: a database for alternative translation initiation ...qian.human.cornell.edu/Files/Nucl. Acids...

6
TISdb: a database for alternative translation initiation in mammalian cells Ji Wan and Shu-Bing Qian* Division of Nutritional Sciences, Cornell University, Ithaca, NY 14853, USA Received August 10, 2013; Revised October 15, 2013; Accepted October 16, 2013 ABSTRACT Proper selection of the translation initiation site (TIS) on mRNAs is crucial for the production of desired protein products. Recent studies using ribosome profiling technology uncovered a surprising variety of potential TIS sites in addition to the annotated start codon. The prevailing alternative translation reshapes the landscape of the proteome in terms of diversity and complexity. To identify the hidden coding potential of the transcriptome in mammalian cells, we developed global translation initiation sequencing (GTI-Seq) that maps genome-wide TIS positions at nearly a single nucleotide resolution. To facilitate studies of alternative translation, we created a database of alternative TIS sites identified from human and mouse cell lines based on multiple GTI-Seq replicates. The TISdb, available at http:// tisdb.human.cornell.edu, includes 6991 TIS sites from 4961 human genes and 9973 TIS sites from 5668 mouse genes. The TISdb website provides a simple browser interface for query of high- confidence TIS sites and their associated open reading frames. The output of search results provides a user-friendly visualization of TIS informa- tion in the context of transcript isoforms. Together, the information in the database provides an easy reference for alternative translation in mammalian cells and will support future investigation of novel translational products. INTRODUCTION In all kingdoms of life, mRNA translation represents the last step of the flow of genetic information and primarily defines the proteome. Translation is a complex process, consisting of initiation, elongation, termination and ribosome recycling (1). Initiation is considered to be the rate-limiting step and determines the overall rate of trans- lation (2). In eukaryotes, the cap-dependent initiation mechanism accounts for the vast majority of cellular mRNA translation. During initiation, the 43 S pre-initi- ation complex (PIC) is recruited to the 5 0 end m 7 G cap structure of mRNA with the aid of many translation ini- tiation factors. It is generally accepted that PIC migrates along the 5 0 untranslated region (5 0 UTR) in an ATP- dependent process known as scanning, until it encounters a start codon, normally the first AUG. Following the start codon recognition, the 60S ribosomal subunit joins to form the 80S ribosome complex and elongation now begins. The scanning model implicates that the features of the 5 0 UTR have major influences on the start codon selection. Interestingly, non-AUG start codons, such as CUG, could also serve as initiators (3,4). In contrast, failed recognition of an initiation codon results in continu- ous scanning of the PIC and initiating at a downstream site, in a process known as leaky scanning (5). In addition to the cap-dependent mechanism, translation could also be initiated in a cap-independent manner. For instance, internal initiation can be mediated by a secondary struc- ture within the 5 0 UTR known as an internal ribosome entry site (IRES) (6,7). This alternative translation initi- ation is believed to be regulated under different growth conditions. However, fundamental principles governing the selection of translation initiation sites (TIS) remain unclear. The functional significance of alternative translation is multifaceted. First, selection of upstream TIS codons leads to generation of upstream open reading frames (uORFs), which directly regulate downstream protein syn- thesis from the main open reading frame (ORF) (8,9). Second, translation via alternative TIS sites produces protein isoforms differing in NH 2 -terminal sequences when these alternative initiators are in the same reading frame (10). Depending on the position of TIS sites relative to the annotated start codon, either the NH 2 -terminal extended or truncated isoforms will be produced. Third, totally different proteins will be generated if the alterna- tive TIS sites are in different reading frames. Therefore, alternative translation reshapes the landscape of the proteome by increasing both diversity and complexity of translational products. The existence of alternative TIS codons clearly indicates that the coding potential of a given genome is much richer * To whom correspondence should be addressed. Tel:+1 607 254 3397; Fax: +1 607 255 6249; Email: [email protected] Nucleic Acids Research, 2013, 1–6 doi:10.1093/nar/gkt1085 ß The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] Nucleic Acids Research Advance Access published November 6, 2013 by guest on December 16, 2013 http://nar.oxfordjournals.org/ Downloaded from

Transcript of TISdb: a database for alternative translation initiation ...qian.human.cornell.edu/Files/Nucl. Acids...

Page 1: TISdb: a database for alternative translation initiation ...qian.human.cornell.edu/Files/Nucl. Acids Res.-2013... · consisting of initiation, elongation, termination and ribosome

TISdb a database for alternative translationinitiation in mammalian cellsJi Wan and Shu-Bing Qian

Division of Nutritional Sciences Cornell University Ithaca NY 14853 USA

Received August 10 2013 Revised October 15 2013 Accepted October 16 2013

ABSTRACT

Proper selection of the translation initiation site (TIS)on mRNAs is crucial for the production of desiredprotein products Recent studies using ribosomeprofiling technology uncovered a surprising varietyof potential TIS sites in addition to the annotatedstart codon The prevailing alternative translationreshapes the landscape of the proteome in termsof diversity and complexity To identify the hiddencoding potential of the transcriptome in mammaliancells we developed global translation initiationsequencing (GTI-Seq) that maps genome-wide TISpositions at nearly a single nucleotide resolutionTo facilitate studies of alternative translation wecreated a database of alternative TIS sites identifiedfrom human and mouse cell lines based on multipleGTI-Seq replicates The TISdb available at httptisdbhumancornelledu includes 6991 TIS sitesfrom 4961 human genes and 9973 TIS sites from5668 mouse genes The TISdb website provides asimple browser interface for query of high-confidence TIS sites and their associated openreading frames The output of search resultsprovides a user-friendly visualization of TIS informa-tion in the context of transcript isoforms Togetherthe information in the database provides an easyreference for alternative translation in mammaliancells and will support future investigation of noveltranslational products

INTRODUCTION

In all kingdoms of life mRNA translation represents thelast step of the flow of genetic information and primarilydefines the proteome Translation is a complex processconsisting of initiation elongation termination andribosome recycling (1) Initiation is considered to be therate-limiting step and determines the overall rate of trans-lation (2) In eukaryotes the cap-dependent initiationmechanism accounts for the vast majority of cellular

mRNA translation During initiation the 43 S pre-initi-ation complex (PIC) is recruited to the 50 end m7G capstructure of mRNA with the aid of many translation ini-tiation factors It is generally accepted that PIC migratesalong the 50 untranslated region (50 UTR) in an ATP-dependent process known as scanning until it encountersa start codon normally the first AUG Following the startcodon recognition the 60S ribosomal subunit joins toform the 80S ribosome complex and elongation nowbegins The scanning model implicates that the featuresof the 50UTR have major influences on the start codonselection Interestingly non-AUG start codons such asCUG could also serve as initiators (34) In contrastfailed recognition of an initiation codon results in continu-ous scanning of the PIC and initiating at a downstreamsite in a process known as leaky scanning (5) In additionto the cap-dependent mechanism translation could alsobe initiated in a cap-independent manner For instanceinternal initiation can be mediated by a secondary struc-ture within the 50UTR known as an internal ribosomeentry site (IRES) (67) This alternative translation initi-ation is believed to be regulated under different growthconditions However fundamental principles governingthe selection of translation initiation sites (TIS) remainunclearThe functional significance of alternative translation is

multifaceted First selection of upstream TIS codonsleads to generation of upstream open reading frames(uORFs) which directly regulate downstream protein syn-thesis from the main open reading frame (ORF) (89)Second translation via alternative TIS sites producesprotein isoforms differing in NH2-terminal sequenceswhen these alternative initiators are in the same readingframe (10) Depending on the position of TIS sites relativeto the annotated start codon either the NH2-terminalextended or truncated isoforms will be produced Thirdtotally different proteins will be generated if the alterna-tive TIS sites are in different reading frames Thereforealternative translation reshapes the landscape of theproteome by increasing both diversity and complexity oftranslational productsThe existence of alternative TIS codons clearly indicates

that the coding potential of a given genome is much richer

To whom correspondence should be addressed Tel +1 607 254 3397 Fax +1 607 255 6249 Email sq38cornelledu

Nucleic Acids Research 2013 1ndash6doi101093nargkt1085

The Author(s) 2013 Published by Oxford University PressThis is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (httpcreativecommonsorglicensesby-nc30) which permits non-commercial re-use distribution and reproduction in any medium provided the original work is properly cited For commercialre-use please contact journalspermissionsoupcom

Nucleic Acids Research Advance Access published November 6 2013 by guest on D

ecember 16 2013

httpnaroxfordjournalsorgD

ownloaded from

than we previously thought Given the physiological im-portance of alternative translation there is an urgent needfor techniques suitable for mapping global TIS positionsEarly attempts applied machine-learning techniques toidentify novel TIS sites on cDNA or genomic sequencesbased on sequence features summarized from the knownstart codons and their flanking sequences (1112)However in silico sequence analysis cannot preciselypredict alternative TIS sites in particular non-AUGcodons Recent development of ribosome profiling tech-niques allows monitoring ribosome dynamics with unpre-cedented resolution at the genome-wide scale (13) Tocapture translation initiation events some variants ofribosome profiling have been developed by applyingdistinct translation inhibitors to freeze initiating ribo-somes Ingolia et al used harringtonine to stall the first80S ribosome complex during initiation (14) Becauseharringtonine cannot completely block the initiating ribo-somes TIS identification relies on a support vectormachine (SVM) learning technique trained with priorTIS information (14) In this instance additional filteringsteps were required to increase the accuracy of identifiedTIS positions The extensive computational processing ofthe sequencing data likely introduces bias in identifyingnon-canonical TIS sites Fritsch et al used puromycin toenrich ribosomes near the start codon because initiatingribosomes are less sensitive to this translation inhibitor(15) The identification of TIS sites once again relies ona machine learning technique based on neural networks(15) In addition it only focuses on a region covering the50UTR and the first 30 nt of the coding region whichcould miss TIS sites downstream of the annotated startcodonTo circumvent the drawbacks mentioned above and

improve the accuracy of TIS identification we developedglobal translation initiation sequencing (GTI-Seq) thatpermits precise TIS identification at the nucleotide reso-lution (16) The rationale of GTI-seq is to use a translationinhibitor lactimidomycin (LTM) that is capable of com-pletely freezing the initiating ribosomes Unlike thecommonly used translation inhibitor cycloheximideLTM preferentially acts on the initiating ribosomesbecause it only binds to the empty E-site of theribosome that is normally occupied by deacylated tRNAduring elongation Indeed ribosome footprints associatedwith LTM were highly enriched at the annotated startcodon The single nucleotide resolution permits frameshifting analysis of ORFs Importantly GTI-Seq uses astraightforward computational approach in mappingglobal TIS sites minimizing possible bias introduced bydata processing From GTI-seq data sets a few novelalternative TIS codons have been experimentallyvalidated The accuracy of TIS identification was furthersupported by evolutionary conservation between humanand mouse genomesA high precision map of global TIS positions will ignite

numerous interests in deciphering physiological functionsof alternative translation To facilitate mechanistic inves-tigation of alternative translation for individual genes wedesigned a comprehensive TIS database (TISdb) built onmultiple high-resolution GTI-seq data sets A web search

interface is provided with certain filtering options tonarrow down the search results The information of TIS-associated ORFs as well as reading frames is presented asan image for easy visualization To our knowledge TISdbis the first public database covering both canonical andalternative TIS positions on a genome-wide scale Thedatabase is a ready resource for researchers looking foralternative translation on individual genes and provides aunique view depicting the richness of genome codingpotential

METHODS

Data generation

Four biological replicates of GTI-seq data from HEK293and one set of GTI-seq data from a MEF cell line wereused to identify putative TIS sites of human and mousegenomes respectively (1617) Using Tophat (18) theGTI-seq data were first mapped to the correspondinggenome and transcriptome downloaded from UCSCgenome browser (hg19 and mm10) (19) From theuniquely mapped reads the 13th nucleotide (12 nt offsetfrom the 50 end) was inferred as the ribosome P-siteposition which corresponds to the start codon recognizedby the initiation Met-tRNA during translation initiationThese uniquely mapped reads were then intersected withthe NCBI Refseq gene annotation to quantify the P-siteread count for each individual mRNA transcript Giventhe fact that many P-sites have a small number of GTI-seqreads and the distribution of the P-site read count is ap-parently Poisson over-dispersed (unequal mean andvariance) we applied the zero-truncated binomial negative(ZTNB) model to determine P-sites with a statistically sig-nificant number of read counts The ZTNB model canhandle non-zero digital values of high-throughputsequencing data and has been applied to cross-linkingimmunoprecipitation sequencing (CLIP-seq) (20) Aglobal ZTBN model was first fit over all the non-emptyP-sites in the entire transcriptome Second for each indi-vidual transcript a local ZTNB model was trained on thenon-zero P-sites of this transcript The P-site satisfyingthe P-value cut-offs based on the parameters estimatedin the global and local ZTBN model was categorized asa putative TIS codon

From human GTI-seq data putative TIS codons dis-covered in at least three biological replicates (out of four)were compiled into the final database The final databasein the current release of TISdb contains 6991 TIS sitesfrom 4961 human genes and 9973 TIS sites from 5668mouse genes Compared with the Refseq gene annotation30 of the total predicted TIS sites match the annotatedTIS (aTIS) codons Approximately 50 of identified TISsites belong to the upstream TIS (uTIS) codons and 20are downstream TIS (dTIS) (Table 1) In terms of codoncomposition gt50 of the identified TIS sites use AUGwhereas 30 use near-cognate codons that only differ atone position from AUG (ie AUC AUA AUU CUGGUG UUG AUG AAG and AGG) It is also worthmentioning that the percentages of the AUG and thenear-cognate codons are underestimated because a small

2 Nucleic Acids Research 2013

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

portion of GTI-seq reads have 1 nt offset to the 13th P-siteposition Therefore lt10 of the human TIS sites and20 of the mouse TIS sites in our database may eitheruse an unconventional TIS codon or represent a falsepositive identification

Web interface

The web interface is developed using Python Django webframework which is served by Apache The TIS data arestored in an SQLite database All the TIS informationincluding the codon sequence and predicted ORF canbe downloaded from a separate lsquoDownloadrsquo page Inaddition the lsquoMain pagersquo and lsquoHelprsquo page containdetails about data generation result interpretation andother relevant information

Database features

The TISdb database provides a user-friendly web interfacefor searching putative TIS sites on individual genes fromeither the human or mouse genome In addition to thegeneral query the database allows users to specify a setof filter criteria to focus on the TIS sites of their interestsThe query returns a summary table and a carousel con-taining images illustrating the positional information ofTIS sites and the associated ORF in the context ofRefseq transcript In addition the reading frame informa-tion is colour coded for easy visualization These compo-nents of query results offer users a comprehensive andstraightforward visualization of TIS codons

TIS query

On the lsquoSearchrsquo page users can perform a general queryfor TIS sites or a more specialized search During generalsearches users can input different types of gene identifiersincluding official gene symbols Refseq transcript IDEnsembl gene ID and Entrez gene ID By pasting a geneidentifier list or uploading a gene identifier file users cansearch up to 50 genes during an individual query Anotherrequired field is the choice of species (either human ormouse) To narrow down the search results users canselect several optional search criteria provided by TISdbto address different aspects of TIS information Thesesearch criteria include codon composition (canonicalATG near-cognate initiation codon or other codons)and the type of TIS (annotated TIS downstream TISand upstream TIS) Given the regulatory role of uORFin gene expression users can select the lsquouORF onlyrsquo tolimit their search to uTIS and the associated uORFs

In addition to the TIS information derived from GTI-seq we also integrated the TIS sites identified by

harringtonine-based approach and puromycin-basedmethods (1415) Because of the inherent differences ofthose techniques variations of TIS resolution and differ-ent cell types used in those studies only 30 of TIS sitesfrom TISdb have matching TIS positions reported inother studies Nonetheless the general agreement of TISidentification between different studies offers independentclues for experimental biologists to investigate alternativetranslation events with high confidence By selectinglsquoSupported by other studiesrsquo users can limit their searchto those TIS sites commonly identified by differentmethods within a 3 nt window

TIS search results

Once the query has matched the records in the databasethe lsquoResultrsquo page will display two components TISsummary table and ORF prediction view

TIS summary tableThe first component of the result section is a TIS summarytable from selected species (Figure 1A) This tablecontains basic TIS information including the speciesgene symbol transcript ID and genome coordinate ofthe TIS To help interpret biological implication of theTIS it also provides relative positional informationcompared with the annotated start codon Given the regu-latory role of uORF any TIS sites associated with uORFare explicitly indicated As discussed above we alsoprovide information about whether a TIS was similarlyidentified in other studies employing different techniquesTo demonstrate the conservation feature of TIS site se-quences we calculated the average phyloP score over awindow of 3 and+4nt around the TIS site accordingto the UCSC 46-way vertebrate alignment for hg19 and60-way vertebrate alignment for mm10 (19) In general apositive average phyloP score reflects lsquoconservationrsquo of aTIS context region whereas a negative value suggests lsquose-lectionrsquo of a TIS context A hyperlink is created for phyloPscore in the summary table for users to retrieve the exactsequence alignment and other relevant information fromthe UCSC genome browser (Figure 1B) The wholesummary table is organized on the basis of transcriptisoforms instead of the gene This is because a numberof TIS sites are only specific to certain transcriptisoforms of the same gene (for example GAPDH) Inthis way we avoid the loss of isoform-specific TIS infor-mation and make ORF prediction meaningful within thecontext of the mRNA transcript Even though this couldcause redundancy of TIS coordinates in the summarytable it provides a more intuitive way for biological inter-pretation of the TIS codon

Table 1 Statistics of translation initiation sites in the TISdb

Species Start codon sequence TIS category Total TIS (gene)

AUG Near cognate Others Annotated Upstream Downstream Others

Human 4090 2132 769 2575 3540 678 198 6991 (4961)Mouse 4400 3144 2429 2729 5099 2051 94 9973 (5668)

Nucleic Acids Research 2013 3

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

ORF prediction viewTo help users investigate the biological functions of theidentified TIS codons in the context of their ORFs acarousel containing illustrative images of each individualtranscript is displayed Each image corresponds to aqueried gene As explained in the previous section allthe transcript isoforms for that particular gene are dis-played The content of each image contains the coordinateof the TIS codon relative to the transcription start sitecodon composition of the identified TIS colour-codedframe-shift information and the predicted stop codon

(TAG TAA and TGA) position associated with thatORF (Figure 2) The caption of the image is presentedas speciesgene Figure 2B is an example of the ORF pre-diction view of human g-actin 1 (ACTG1) This genehas two transcript isoforms NM_001199954 andNM_001614 In the TISdb there are four TIS sites inboth isoforms two uTIS one aTIS and one dTISHowever the biological implications of the two uTIScodons are different for the two transcript isoforms Theupstream CUG uTIS is in the same reading frame as theCDS of NM_001199954 which is expected to result in a

Figure 1 An example of TIS search results for ACTG1 from both human and mouse genomes (A) Summary table of TIS information for ACTG1The coordinates correspond to the first position of the start codon The genome coordinates are consistent with hg19 (human) or mm10 (mouse)(B) Multiple alignment of TIS context A window of 3 and +4nt around an upstream TIS of mouse ACTG1 is displayed The height in thelsquoPlacental Consrsquo represents the degree of conservation for each base in the window The orthologous sequences of 10 vertebrate species are shown formultiple alignment

4 Nucleic Acids Research 2013

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

protein product with an NH2-terminal extension Incontrast the same uTIS in NM_001614 only associateswith a short uORF overlapping with the CDS The down-stream CUG uTIS is in a different reading frame relativeto the annotated CDS and the associated dORFs in bothtranscript isoforms have distinct sequences

FUTURE PROSPECTS

GTI-seq represents a remarkable technological advance-ment in mapping global mRNA translation initiation Ithas the potential to reveal the hidden coding potential ofthe transcriptome Comprehensive cataloguing of globalTIS sites and the associated ORFs is just the beginning inunveiling the principles governing alternative translationThe enormous biological breadth of translational controlhas led to an enhanced appreciation of proteome diversityand complexity Despite the advantages of GTI-Seq onTIS identification some limitations still exist Firstcurrent ribosome profiling approaches lack quantitativecapacity The sequencing reads density associated witheither harringtonine or LTM does not truly reflect therate of translation initiation Therefore the currentTISdb provides the position of alternative TIS codonsrather than the efficiency of alternation translation initi-ation Second the depth of ribosome profiling isinfluenced by the abundance of transcripts TIS informa-tion may not be available for transcripts of lowabundance For genes with relatively high expressionGTI-seq could possibly fail to capture some TIS signalsthat have low initiation efficiency Such scenarios includetranscripts with multiple uTIS sites but no annotated TISsites Third TIS selection is subjected to regulation underdifferent growth conditions Future GTI-seq experimentswill focus on quantitative changes of TIS selection in

response to stressors such as nutrient starvation Fourthtissue-specific TIS selection remains a formidable taskbased on technical limitations of the current ribosomeprofiling protocol We envision that variants ofribosome profiling will be developed in the future tocapture quantitative TIS information from varioustissues across a wider range of organismal species Thecurrent TISdb sets the stage for investigation of alterna-tive translation and provides an important platform forstudying translational control There is little doubt thatintegration of GTI-seq data with other data sets such asCHIP-seq RNA-seq miRNA profiling and proteomicswill present a fresh view of global post-transcriptionaland -translational gene regulation The effort to consoli-date the rich biology with detailed understanding ofthe underlying mechanisms promises an exciting andsurprising future

ACKNOWLEDGEMENTS

We thank members of Qian laboratory for helpful sugges-tion on the website development and critical comments forthe manuscript We also thank David Smith KevinMurphy and Cornell information technologies forproviding the virtual hosting space for the TISdb website

FUNDING

National Institutes of Health [1 DP2 OD006449-011R01AG042400-01A1] Ellison Medical Foundation[AG-NS-0605-09] and DOD Exploration-HypothesisDevelopment Award [W81XWH-11-1-0236 to S-BQ]Funding for open access charge National Institutes ofHealth

Figure 2 The image of predicted ORFs associated with the identified TIS sites is shown for ACTG1 All isoforms of the transcript are displayedFrom left to right the transcript structure consists of the 50 UTR (grey) CDS (green) and 30 UTR (grey) Each ORF is represented as a line betweenan identified TIS and a predicted stop codon (black square) The ORF is color coded according to reading frames (red frame 0 purple frame 1blue frame 2) The TIS codon sequence composition is shown together with the coordinates relative to the transcription start site

Nucleic Acids Research 2013 5

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

Conflict of interest statement None declared

REFERENCES

1 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principlesof its regulation Nat Rev Mol Cell Biol 11 113ndash127

2 AitkenCE and LorschJR (2012) A mechanistic overview oftranslation initiation in eukaryotes Nat Struct Mol Biol 19568ndash576

3 StarckSR JiangV Pavon-EternodM PrasadS McCarthyBPanT and ShastriN (2012) Leucine-tRNA initiates at CUGstart codons for protein synthesis and presentation by MHCclass I Science 336 1719ndash1723

4 SchwabSR ShugartJA HorngT MalarkannanS andShastriN (2004) Unanticipated antigens translation initiation atCUG with leucine PLoS Biol 2 e366

5 KozakM (2002) Pushing the limits of the scanning mechanismfor initiation of translation Gene 299 1ndash34

6 SonenbergN and HinnebuschAG (2009) Regulation oftranslation initiation in eukaryotes mechanisms and biologicaltargets Cell 136 731ndash745

7 MokrejsM VopalenskyV KolenatyO MasekT FeketovaZSekyrovaP SkaloudovaB KrizV and PospisekM (2006)IRESite the database of experimentally verified IRES structuresNucleic Acids Res 34 D125ndashD130

8 MedenbachJ SeilerM and HentzeMW (2011) Translationalcontrol via protein-regulated upstream open reading frames Cell145 902ndash913

9 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

10 HopkinsBD FineB SteinbachN DendyM RappZShawJ PappasK YuJS HodakoskiC MenseS et al (2013)A secreted PTEN phosphatase that enters cells to alter signalingand survival Science 341 399ndash402

11 NadershahiA FahrenkrugSC and EllisLB (2004) Comparisonof computational methods for identifying translation initiationsites in EST data BMC Bioinformatics 5 14

12 SaeysY AbeelT DegroeveS and Van de PeerY (2007)Translation initiation site prediction on a genomic scale beautyin simplicity Bioinformatics 23 i418ndashi423

13 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

14 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

15 FritschC HerrmannA NothnagelM SzafranskiK HuseKSchumannF SchreiberS PlatzerM KrawczakM HampeJet al (2012) Genome-wide search for novel human uORFs andN-terminal protein extensions using ribosomal footprintingGenome Res 22 2208ndash2218

16 LeeS LiuB LeeS HuangSX ShenB and QianSB (2012)Global mapping of translation initiation sites in mammalian cellsat single-nucleotide resolution Proc Natl Acad Sci USA 109E2424ndashE2432

17 LiuB HanY and QianSB (2013) Cotranslational response toproteotoxic stress by elongation pausing of ribosomes Mol Cell49 453ndash463

18 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

19 MeyerLR ZweigAS HinrichsAS KarolchikD KuhnRMWongM SloanCA RosenbloomKR RoeG RheadB et al(2013) The UCSC genome browser database extensions andupdates 2013 Nucleic Acids Res 41 D64ndashD69

20 UrenPJ Bahrami-SamaniE BurnsSC QiaoMKarginovFV HodgesE HannonGJ SanfordJRPenalvaLO and SmithAD (2012) Site identification in high-throughput RNA-protein interaction data Bioinformatics 283013ndash3020

6 Nucleic Acids Research 2013

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

Page 2: TISdb: a database for alternative translation initiation ...qian.human.cornell.edu/Files/Nucl. Acids Res.-2013... · consisting of initiation, elongation, termination and ribosome

than we previously thought Given the physiological im-portance of alternative translation there is an urgent needfor techniques suitable for mapping global TIS positionsEarly attempts applied machine-learning techniques toidentify novel TIS sites on cDNA or genomic sequencesbased on sequence features summarized from the knownstart codons and their flanking sequences (1112)However in silico sequence analysis cannot preciselypredict alternative TIS sites in particular non-AUGcodons Recent development of ribosome profiling tech-niques allows monitoring ribosome dynamics with unpre-cedented resolution at the genome-wide scale (13) Tocapture translation initiation events some variants ofribosome profiling have been developed by applyingdistinct translation inhibitors to freeze initiating ribo-somes Ingolia et al used harringtonine to stall the first80S ribosome complex during initiation (14) Becauseharringtonine cannot completely block the initiating ribo-somes TIS identification relies on a support vectormachine (SVM) learning technique trained with priorTIS information (14) In this instance additional filteringsteps were required to increase the accuracy of identifiedTIS positions The extensive computational processing ofthe sequencing data likely introduces bias in identifyingnon-canonical TIS sites Fritsch et al used puromycin toenrich ribosomes near the start codon because initiatingribosomes are less sensitive to this translation inhibitor(15) The identification of TIS sites once again relies ona machine learning technique based on neural networks(15) In addition it only focuses on a region covering the50UTR and the first 30 nt of the coding region whichcould miss TIS sites downstream of the annotated startcodonTo circumvent the drawbacks mentioned above and

improve the accuracy of TIS identification we developedglobal translation initiation sequencing (GTI-Seq) thatpermits precise TIS identification at the nucleotide reso-lution (16) The rationale of GTI-seq is to use a translationinhibitor lactimidomycin (LTM) that is capable of com-pletely freezing the initiating ribosomes Unlike thecommonly used translation inhibitor cycloheximideLTM preferentially acts on the initiating ribosomesbecause it only binds to the empty E-site of theribosome that is normally occupied by deacylated tRNAduring elongation Indeed ribosome footprints associatedwith LTM were highly enriched at the annotated startcodon The single nucleotide resolution permits frameshifting analysis of ORFs Importantly GTI-Seq uses astraightforward computational approach in mappingglobal TIS sites minimizing possible bias introduced bydata processing From GTI-seq data sets a few novelalternative TIS codons have been experimentallyvalidated The accuracy of TIS identification was furthersupported by evolutionary conservation between humanand mouse genomesA high precision map of global TIS positions will ignite

numerous interests in deciphering physiological functionsof alternative translation To facilitate mechanistic inves-tigation of alternative translation for individual genes wedesigned a comprehensive TIS database (TISdb) built onmultiple high-resolution GTI-seq data sets A web search

interface is provided with certain filtering options tonarrow down the search results The information of TIS-associated ORFs as well as reading frames is presented asan image for easy visualization To our knowledge TISdbis the first public database covering both canonical andalternative TIS positions on a genome-wide scale Thedatabase is a ready resource for researchers looking foralternative translation on individual genes and provides aunique view depicting the richness of genome codingpotential

METHODS

Data generation

Four biological replicates of GTI-seq data from HEK293and one set of GTI-seq data from a MEF cell line wereused to identify putative TIS sites of human and mousegenomes respectively (1617) Using Tophat (18) theGTI-seq data were first mapped to the correspondinggenome and transcriptome downloaded from UCSCgenome browser (hg19 and mm10) (19) From theuniquely mapped reads the 13th nucleotide (12 nt offsetfrom the 50 end) was inferred as the ribosome P-siteposition which corresponds to the start codon recognizedby the initiation Met-tRNA during translation initiationThese uniquely mapped reads were then intersected withthe NCBI Refseq gene annotation to quantify the P-siteread count for each individual mRNA transcript Giventhe fact that many P-sites have a small number of GTI-seqreads and the distribution of the P-site read count is ap-parently Poisson over-dispersed (unequal mean andvariance) we applied the zero-truncated binomial negative(ZTNB) model to determine P-sites with a statistically sig-nificant number of read counts The ZTNB model canhandle non-zero digital values of high-throughputsequencing data and has been applied to cross-linkingimmunoprecipitation sequencing (CLIP-seq) (20) Aglobal ZTBN model was first fit over all the non-emptyP-sites in the entire transcriptome Second for each indi-vidual transcript a local ZTNB model was trained on thenon-zero P-sites of this transcript The P-site satisfyingthe P-value cut-offs based on the parameters estimatedin the global and local ZTBN model was categorized asa putative TIS codon

From human GTI-seq data putative TIS codons dis-covered in at least three biological replicates (out of four)were compiled into the final database The final databasein the current release of TISdb contains 6991 TIS sitesfrom 4961 human genes and 9973 TIS sites from 5668mouse genes Compared with the Refseq gene annotation30 of the total predicted TIS sites match the annotatedTIS (aTIS) codons Approximately 50 of identified TISsites belong to the upstream TIS (uTIS) codons and 20are downstream TIS (dTIS) (Table 1) In terms of codoncomposition gt50 of the identified TIS sites use AUGwhereas 30 use near-cognate codons that only differ atone position from AUG (ie AUC AUA AUU CUGGUG UUG AUG AAG and AGG) It is also worthmentioning that the percentages of the AUG and thenear-cognate codons are underestimated because a small

2 Nucleic Acids Research 2013

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

portion of GTI-seq reads have 1 nt offset to the 13th P-siteposition Therefore lt10 of the human TIS sites and20 of the mouse TIS sites in our database may eitheruse an unconventional TIS codon or represent a falsepositive identification

Web interface

The web interface is developed using Python Django webframework which is served by Apache The TIS data arestored in an SQLite database All the TIS informationincluding the codon sequence and predicted ORF canbe downloaded from a separate lsquoDownloadrsquo page Inaddition the lsquoMain pagersquo and lsquoHelprsquo page containdetails about data generation result interpretation andother relevant information

Database features

The TISdb database provides a user-friendly web interfacefor searching putative TIS sites on individual genes fromeither the human or mouse genome In addition to thegeneral query the database allows users to specify a setof filter criteria to focus on the TIS sites of their interestsThe query returns a summary table and a carousel con-taining images illustrating the positional information ofTIS sites and the associated ORF in the context ofRefseq transcript In addition the reading frame informa-tion is colour coded for easy visualization These compo-nents of query results offer users a comprehensive andstraightforward visualization of TIS codons

TIS query

On the lsquoSearchrsquo page users can perform a general queryfor TIS sites or a more specialized search During generalsearches users can input different types of gene identifiersincluding official gene symbols Refseq transcript IDEnsembl gene ID and Entrez gene ID By pasting a geneidentifier list or uploading a gene identifier file users cansearch up to 50 genes during an individual query Anotherrequired field is the choice of species (either human ormouse) To narrow down the search results users canselect several optional search criteria provided by TISdbto address different aspects of TIS information Thesesearch criteria include codon composition (canonicalATG near-cognate initiation codon or other codons)and the type of TIS (annotated TIS downstream TISand upstream TIS) Given the regulatory role of uORFin gene expression users can select the lsquouORF onlyrsquo tolimit their search to uTIS and the associated uORFs

In addition to the TIS information derived from GTI-seq we also integrated the TIS sites identified by

harringtonine-based approach and puromycin-basedmethods (1415) Because of the inherent differences ofthose techniques variations of TIS resolution and differ-ent cell types used in those studies only 30 of TIS sitesfrom TISdb have matching TIS positions reported inother studies Nonetheless the general agreement of TISidentification between different studies offers independentclues for experimental biologists to investigate alternativetranslation events with high confidence By selectinglsquoSupported by other studiesrsquo users can limit their searchto those TIS sites commonly identified by differentmethods within a 3 nt window

TIS search results

Once the query has matched the records in the databasethe lsquoResultrsquo page will display two components TISsummary table and ORF prediction view

TIS summary tableThe first component of the result section is a TIS summarytable from selected species (Figure 1A) This tablecontains basic TIS information including the speciesgene symbol transcript ID and genome coordinate ofthe TIS To help interpret biological implication of theTIS it also provides relative positional informationcompared with the annotated start codon Given the regu-latory role of uORF any TIS sites associated with uORFare explicitly indicated As discussed above we alsoprovide information about whether a TIS was similarlyidentified in other studies employing different techniquesTo demonstrate the conservation feature of TIS site se-quences we calculated the average phyloP score over awindow of 3 and+4nt around the TIS site accordingto the UCSC 46-way vertebrate alignment for hg19 and60-way vertebrate alignment for mm10 (19) In general apositive average phyloP score reflects lsquoconservationrsquo of aTIS context region whereas a negative value suggests lsquose-lectionrsquo of a TIS context A hyperlink is created for phyloPscore in the summary table for users to retrieve the exactsequence alignment and other relevant information fromthe UCSC genome browser (Figure 1B) The wholesummary table is organized on the basis of transcriptisoforms instead of the gene This is because a numberof TIS sites are only specific to certain transcriptisoforms of the same gene (for example GAPDH) Inthis way we avoid the loss of isoform-specific TIS infor-mation and make ORF prediction meaningful within thecontext of the mRNA transcript Even though this couldcause redundancy of TIS coordinates in the summarytable it provides a more intuitive way for biological inter-pretation of the TIS codon

Table 1 Statistics of translation initiation sites in the TISdb

Species Start codon sequence TIS category Total TIS (gene)

AUG Near cognate Others Annotated Upstream Downstream Others

Human 4090 2132 769 2575 3540 678 198 6991 (4961)Mouse 4400 3144 2429 2729 5099 2051 94 9973 (5668)

Nucleic Acids Research 2013 3

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

ORF prediction viewTo help users investigate the biological functions of theidentified TIS codons in the context of their ORFs acarousel containing illustrative images of each individualtranscript is displayed Each image corresponds to aqueried gene As explained in the previous section allthe transcript isoforms for that particular gene are dis-played The content of each image contains the coordinateof the TIS codon relative to the transcription start sitecodon composition of the identified TIS colour-codedframe-shift information and the predicted stop codon

(TAG TAA and TGA) position associated with thatORF (Figure 2) The caption of the image is presentedas speciesgene Figure 2B is an example of the ORF pre-diction view of human g-actin 1 (ACTG1) This genehas two transcript isoforms NM_001199954 andNM_001614 In the TISdb there are four TIS sites inboth isoforms two uTIS one aTIS and one dTISHowever the biological implications of the two uTIScodons are different for the two transcript isoforms Theupstream CUG uTIS is in the same reading frame as theCDS of NM_001199954 which is expected to result in a

Figure 1 An example of TIS search results for ACTG1 from both human and mouse genomes (A) Summary table of TIS information for ACTG1The coordinates correspond to the first position of the start codon The genome coordinates are consistent with hg19 (human) or mm10 (mouse)(B) Multiple alignment of TIS context A window of 3 and +4nt around an upstream TIS of mouse ACTG1 is displayed The height in thelsquoPlacental Consrsquo represents the degree of conservation for each base in the window The orthologous sequences of 10 vertebrate species are shown formultiple alignment

4 Nucleic Acids Research 2013

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

protein product with an NH2-terminal extension Incontrast the same uTIS in NM_001614 only associateswith a short uORF overlapping with the CDS The down-stream CUG uTIS is in a different reading frame relativeto the annotated CDS and the associated dORFs in bothtranscript isoforms have distinct sequences

FUTURE PROSPECTS

GTI-seq represents a remarkable technological advance-ment in mapping global mRNA translation initiation Ithas the potential to reveal the hidden coding potential ofthe transcriptome Comprehensive cataloguing of globalTIS sites and the associated ORFs is just the beginning inunveiling the principles governing alternative translationThe enormous biological breadth of translational controlhas led to an enhanced appreciation of proteome diversityand complexity Despite the advantages of GTI-Seq onTIS identification some limitations still exist Firstcurrent ribosome profiling approaches lack quantitativecapacity The sequencing reads density associated witheither harringtonine or LTM does not truly reflect therate of translation initiation Therefore the currentTISdb provides the position of alternative TIS codonsrather than the efficiency of alternation translation initi-ation Second the depth of ribosome profiling isinfluenced by the abundance of transcripts TIS informa-tion may not be available for transcripts of lowabundance For genes with relatively high expressionGTI-seq could possibly fail to capture some TIS signalsthat have low initiation efficiency Such scenarios includetranscripts with multiple uTIS sites but no annotated TISsites Third TIS selection is subjected to regulation underdifferent growth conditions Future GTI-seq experimentswill focus on quantitative changes of TIS selection in

response to stressors such as nutrient starvation Fourthtissue-specific TIS selection remains a formidable taskbased on technical limitations of the current ribosomeprofiling protocol We envision that variants ofribosome profiling will be developed in the future tocapture quantitative TIS information from varioustissues across a wider range of organismal species Thecurrent TISdb sets the stage for investigation of alterna-tive translation and provides an important platform forstudying translational control There is little doubt thatintegration of GTI-seq data with other data sets such asCHIP-seq RNA-seq miRNA profiling and proteomicswill present a fresh view of global post-transcriptionaland -translational gene regulation The effort to consoli-date the rich biology with detailed understanding ofthe underlying mechanisms promises an exciting andsurprising future

ACKNOWLEDGEMENTS

We thank members of Qian laboratory for helpful sugges-tion on the website development and critical comments forthe manuscript We also thank David Smith KevinMurphy and Cornell information technologies forproviding the virtual hosting space for the TISdb website

FUNDING

National Institutes of Health [1 DP2 OD006449-011R01AG042400-01A1] Ellison Medical Foundation[AG-NS-0605-09] and DOD Exploration-HypothesisDevelopment Award [W81XWH-11-1-0236 to S-BQ]Funding for open access charge National Institutes ofHealth

Figure 2 The image of predicted ORFs associated with the identified TIS sites is shown for ACTG1 All isoforms of the transcript are displayedFrom left to right the transcript structure consists of the 50 UTR (grey) CDS (green) and 30 UTR (grey) Each ORF is represented as a line betweenan identified TIS and a predicted stop codon (black square) The ORF is color coded according to reading frames (red frame 0 purple frame 1blue frame 2) The TIS codon sequence composition is shown together with the coordinates relative to the transcription start site

Nucleic Acids Research 2013 5

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

Conflict of interest statement None declared

REFERENCES

1 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principlesof its regulation Nat Rev Mol Cell Biol 11 113ndash127

2 AitkenCE and LorschJR (2012) A mechanistic overview oftranslation initiation in eukaryotes Nat Struct Mol Biol 19568ndash576

3 StarckSR JiangV Pavon-EternodM PrasadS McCarthyBPanT and ShastriN (2012) Leucine-tRNA initiates at CUGstart codons for protein synthesis and presentation by MHCclass I Science 336 1719ndash1723

4 SchwabSR ShugartJA HorngT MalarkannanS andShastriN (2004) Unanticipated antigens translation initiation atCUG with leucine PLoS Biol 2 e366

5 KozakM (2002) Pushing the limits of the scanning mechanismfor initiation of translation Gene 299 1ndash34

6 SonenbergN and HinnebuschAG (2009) Regulation oftranslation initiation in eukaryotes mechanisms and biologicaltargets Cell 136 731ndash745

7 MokrejsM VopalenskyV KolenatyO MasekT FeketovaZSekyrovaP SkaloudovaB KrizV and PospisekM (2006)IRESite the database of experimentally verified IRES structuresNucleic Acids Res 34 D125ndashD130

8 MedenbachJ SeilerM and HentzeMW (2011) Translationalcontrol via protein-regulated upstream open reading frames Cell145 902ndash913

9 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

10 HopkinsBD FineB SteinbachN DendyM RappZShawJ PappasK YuJS HodakoskiC MenseS et al (2013)A secreted PTEN phosphatase that enters cells to alter signalingand survival Science 341 399ndash402

11 NadershahiA FahrenkrugSC and EllisLB (2004) Comparisonof computational methods for identifying translation initiationsites in EST data BMC Bioinformatics 5 14

12 SaeysY AbeelT DegroeveS and Van de PeerY (2007)Translation initiation site prediction on a genomic scale beautyin simplicity Bioinformatics 23 i418ndashi423

13 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

14 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

15 FritschC HerrmannA NothnagelM SzafranskiK HuseKSchumannF SchreiberS PlatzerM KrawczakM HampeJet al (2012) Genome-wide search for novel human uORFs andN-terminal protein extensions using ribosomal footprintingGenome Res 22 2208ndash2218

16 LeeS LiuB LeeS HuangSX ShenB and QianSB (2012)Global mapping of translation initiation sites in mammalian cellsat single-nucleotide resolution Proc Natl Acad Sci USA 109E2424ndashE2432

17 LiuB HanY and QianSB (2013) Cotranslational response toproteotoxic stress by elongation pausing of ribosomes Mol Cell49 453ndash463

18 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

19 MeyerLR ZweigAS HinrichsAS KarolchikD KuhnRMWongM SloanCA RosenbloomKR RoeG RheadB et al(2013) The UCSC genome browser database extensions andupdates 2013 Nucleic Acids Res 41 D64ndashD69

20 UrenPJ Bahrami-SamaniE BurnsSC QiaoMKarginovFV HodgesE HannonGJ SanfordJRPenalvaLO and SmithAD (2012) Site identification in high-throughput RNA-protein interaction data Bioinformatics 283013ndash3020

6 Nucleic Acids Research 2013

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

Page 3: TISdb: a database for alternative translation initiation ...qian.human.cornell.edu/Files/Nucl. Acids Res.-2013... · consisting of initiation, elongation, termination and ribosome

portion of GTI-seq reads have 1 nt offset to the 13th P-siteposition Therefore lt10 of the human TIS sites and20 of the mouse TIS sites in our database may eitheruse an unconventional TIS codon or represent a falsepositive identification

Web interface

The web interface is developed using Python Django webframework which is served by Apache The TIS data arestored in an SQLite database All the TIS informationincluding the codon sequence and predicted ORF canbe downloaded from a separate lsquoDownloadrsquo page Inaddition the lsquoMain pagersquo and lsquoHelprsquo page containdetails about data generation result interpretation andother relevant information

Database features

The TISdb database provides a user-friendly web interfacefor searching putative TIS sites on individual genes fromeither the human or mouse genome In addition to thegeneral query the database allows users to specify a setof filter criteria to focus on the TIS sites of their interestsThe query returns a summary table and a carousel con-taining images illustrating the positional information ofTIS sites and the associated ORF in the context ofRefseq transcript In addition the reading frame informa-tion is colour coded for easy visualization These compo-nents of query results offer users a comprehensive andstraightforward visualization of TIS codons

TIS query

On the lsquoSearchrsquo page users can perform a general queryfor TIS sites or a more specialized search During generalsearches users can input different types of gene identifiersincluding official gene symbols Refseq transcript IDEnsembl gene ID and Entrez gene ID By pasting a geneidentifier list or uploading a gene identifier file users cansearch up to 50 genes during an individual query Anotherrequired field is the choice of species (either human ormouse) To narrow down the search results users canselect several optional search criteria provided by TISdbto address different aspects of TIS information Thesesearch criteria include codon composition (canonicalATG near-cognate initiation codon or other codons)and the type of TIS (annotated TIS downstream TISand upstream TIS) Given the regulatory role of uORFin gene expression users can select the lsquouORF onlyrsquo tolimit their search to uTIS and the associated uORFs

In addition to the TIS information derived from GTI-seq we also integrated the TIS sites identified by

harringtonine-based approach and puromycin-basedmethods (1415) Because of the inherent differences ofthose techniques variations of TIS resolution and differ-ent cell types used in those studies only 30 of TIS sitesfrom TISdb have matching TIS positions reported inother studies Nonetheless the general agreement of TISidentification between different studies offers independentclues for experimental biologists to investigate alternativetranslation events with high confidence By selectinglsquoSupported by other studiesrsquo users can limit their searchto those TIS sites commonly identified by differentmethods within a 3 nt window

TIS search results

Once the query has matched the records in the databasethe lsquoResultrsquo page will display two components TISsummary table and ORF prediction view

TIS summary tableThe first component of the result section is a TIS summarytable from selected species (Figure 1A) This tablecontains basic TIS information including the speciesgene symbol transcript ID and genome coordinate ofthe TIS To help interpret biological implication of theTIS it also provides relative positional informationcompared with the annotated start codon Given the regu-latory role of uORF any TIS sites associated with uORFare explicitly indicated As discussed above we alsoprovide information about whether a TIS was similarlyidentified in other studies employing different techniquesTo demonstrate the conservation feature of TIS site se-quences we calculated the average phyloP score over awindow of 3 and+4nt around the TIS site accordingto the UCSC 46-way vertebrate alignment for hg19 and60-way vertebrate alignment for mm10 (19) In general apositive average phyloP score reflects lsquoconservationrsquo of aTIS context region whereas a negative value suggests lsquose-lectionrsquo of a TIS context A hyperlink is created for phyloPscore in the summary table for users to retrieve the exactsequence alignment and other relevant information fromthe UCSC genome browser (Figure 1B) The wholesummary table is organized on the basis of transcriptisoforms instead of the gene This is because a numberof TIS sites are only specific to certain transcriptisoforms of the same gene (for example GAPDH) Inthis way we avoid the loss of isoform-specific TIS infor-mation and make ORF prediction meaningful within thecontext of the mRNA transcript Even though this couldcause redundancy of TIS coordinates in the summarytable it provides a more intuitive way for biological inter-pretation of the TIS codon

Table 1 Statistics of translation initiation sites in the TISdb

Species Start codon sequence TIS category Total TIS (gene)

AUG Near cognate Others Annotated Upstream Downstream Others

Human 4090 2132 769 2575 3540 678 198 6991 (4961)Mouse 4400 3144 2429 2729 5099 2051 94 9973 (5668)

Nucleic Acids Research 2013 3

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

ORF prediction viewTo help users investigate the biological functions of theidentified TIS codons in the context of their ORFs acarousel containing illustrative images of each individualtranscript is displayed Each image corresponds to aqueried gene As explained in the previous section allthe transcript isoforms for that particular gene are dis-played The content of each image contains the coordinateof the TIS codon relative to the transcription start sitecodon composition of the identified TIS colour-codedframe-shift information and the predicted stop codon

(TAG TAA and TGA) position associated with thatORF (Figure 2) The caption of the image is presentedas speciesgene Figure 2B is an example of the ORF pre-diction view of human g-actin 1 (ACTG1) This genehas two transcript isoforms NM_001199954 andNM_001614 In the TISdb there are four TIS sites inboth isoforms two uTIS one aTIS and one dTISHowever the biological implications of the two uTIScodons are different for the two transcript isoforms Theupstream CUG uTIS is in the same reading frame as theCDS of NM_001199954 which is expected to result in a

Figure 1 An example of TIS search results for ACTG1 from both human and mouse genomes (A) Summary table of TIS information for ACTG1The coordinates correspond to the first position of the start codon The genome coordinates are consistent with hg19 (human) or mm10 (mouse)(B) Multiple alignment of TIS context A window of 3 and +4nt around an upstream TIS of mouse ACTG1 is displayed The height in thelsquoPlacental Consrsquo represents the degree of conservation for each base in the window The orthologous sequences of 10 vertebrate species are shown formultiple alignment

4 Nucleic Acids Research 2013

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

protein product with an NH2-terminal extension Incontrast the same uTIS in NM_001614 only associateswith a short uORF overlapping with the CDS The down-stream CUG uTIS is in a different reading frame relativeto the annotated CDS and the associated dORFs in bothtranscript isoforms have distinct sequences

FUTURE PROSPECTS

GTI-seq represents a remarkable technological advance-ment in mapping global mRNA translation initiation Ithas the potential to reveal the hidden coding potential ofthe transcriptome Comprehensive cataloguing of globalTIS sites and the associated ORFs is just the beginning inunveiling the principles governing alternative translationThe enormous biological breadth of translational controlhas led to an enhanced appreciation of proteome diversityand complexity Despite the advantages of GTI-Seq onTIS identification some limitations still exist Firstcurrent ribosome profiling approaches lack quantitativecapacity The sequencing reads density associated witheither harringtonine or LTM does not truly reflect therate of translation initiation Therefore the currentTISdb provides the position of alternative TIS codonsrather than the efficiency of alternation translation initi-ation Second the depth of ribosome profiling isinfluenced by the abundance of transcripts TIS informa-tion may not be available for transcripts of lowabundance For genes with relatively high expressionGTI-seq could possibly fail to capture some TIS signalsthat have low initiation efficiency Such scenarios includetranscripts with multiple uTIS sites but no annotated TISsites Third TIS selection is subjected to regulation underdifferent growth conditions Future GTI-seq experimentswill focus on quantitative changes of TIS selection in

response to stressors such as nutrient starvation Fourthtissue-specific TIS selection remains a formidable taskbased on technical limitations of the current ribosomeprofiling protocol We envision that variants ofribosome profiling will be developed in the future tocapture quantitative TIS information from varioustissues across a wider range of organismal species Thecurrent TISdb sets the stage for investigation of alterna-tive translation and provides an important platform forstudying translational control There is little doubt thatintegration of GTI-seq data with other data sets such asCHIP-seq RNA-seq miRNA profiling and proteomicswill present a fresh view of global post-transcriptionaland -translational gene regulation The effort to consoli-date the rich biology with detailed understanding ofthe underlying mechanisms promises an exciting andsurprising future

ACKNOWLEDGEMENTS

We thank members of Qian laboratory for helpful sugges-tion on the website development and critical comments forthe manuscript We also thank David Smith KevinMurphy and Cornell information technologies forproviding the virtual hosting space for the TISdb website

FUNDING

National Institutes of Health [1 DP2 OD006449-011R01AG042400-01A1] Ellison Medical Foundation[AG-NS-0605-09] and DOD Exploration-HypothesisDevelopment Award [W81XWH-11-1-0236 to S-BQ]Funding for open access charge National Institutes ofHealth

Figure 2 The image of predicted ORFs associated with the identified TIS sites is shown for ACTG1 All isoforms of the transcript are displayedFrom left to right the transcript structure consists of the 50 UTR (grey) CDS (green) and 30 UTR (grey) Each ORF is represented as a line betweenan identified TIS and a predicted stop codon (black square) The ORF is color coded according to reading frames (red frame 0 purple frame 1blue frame 2) The TIS codon sequence composition is shown together with the coordinates relative to the transcription start site

Nucleic Acids Research 2013 5

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

Conflict of interest statement None declared

REFERENCES

1 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principlesof its regulation Nat Rev Mol Cell Biol 11 113ndash127

2 AitkenCE and LorschJR (2012) A mechanistic overview oftranslation initiation in eukaryotes Nat Struct Mol Biol 19568ndash576

3 StarckSR JiangV Pavon-EternodM PrasadS McCarthyBPanT and ShastriN (2012) Leucine-tRNA initiates at CUGstart codons for protein synthesis and presentation by MHCclass I Science 336 1719ndash1723

4 SchwabSR ShugartJA HorngT MalarkannanS andShastriN (2004) Unanticipated antigens translation initiation atCUG with leucine PLoS Biol 2 e366

5 KozakM (2002) Pushing the limits of the scanning mechanismfor initiation of translation Gene 299 1ndash34

6 SonenbergN and HinnebuschAG (2009) Regulation oftranslation initiation in eukaryotes mechanisms and biologicaltargets Cell 136 731ndash745

7 MokrejsM VopalenskyV KolenatyO MasekT FeketovaZSekyrovaP SkaloudovaB KrizV and PospisekM (2006)IRESite the database of experimentally verified IRES structuresNucleic Acids Res 34 D125ndashD130

8 MedenbachJ SeilerM and HentzeMW (2011) Translationalcontrol via protein-regulated upstream open reading frames Cell145 902ndash913

9 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

10 HopkinsBD FineB SteinbachN DendyM RappZShawJ PappasK YuJS HodakoskiC MenseS et al (2013)A secreted PTEN phosphatase that enters cells to alter signalingand survival Science 341 399ndash402

11 NadershahiA FahrenkrugSC and EllisLB (2004) Comparisonof computational methods for identifying translation initiationsites in EST data BMC Bioinformatics 5 14

12 SaeysY AbeelT DegroeveS and Van de PeerY (2007)Translation initiation site prediction on a genomic scale beautyin simplicity Bioinformatics 23 i418ndashi423

13 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

14 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

15 FritschC HerrmannA NothnagelM SzafranskiK HuseKSchumannF SchreiberS PlatzerM KrawczakM HampeJet al (2012) Genome-wide search for novel human uORFs andN-terminal protein extensions using ribosomal footprintingGenome Res 22 2208ndash2218

16 LeeS LiuB LeeS HuangSX ShenB and QianSB (2012)Global mapping of translation initiation sites in mammalian cellsat single-nucleotide resolution Proc Natl Acad Sci USA 109E2424ndashE2432

17 LiuB HanY and QianSB (2013) Cotranslational response toproteotoxic stress by elongation pausing of ribosomes Mol Cell49 453ndash463

18 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

19 MeyerLR ZweigAS HinrichsAS KarolchikD KuhnRMWongM SloanCA RosenbloomKR RoeG RheadB et al(2013) The UCSC genome browser database extensions andupdates 2013 Nucleic Acids Res 41 D64ndashD69

20 UrenPJ Bahrami-SamaniE BurnsSC QiaoMKarginovFV HodgesE HannonGJ SanfordJRPenalvaLO and SmithAD (2012) Site identification in high-throughput RNA-protein interaction data Bioinformatics 283013ndash3020

6 Nucleic Acids Research 2013

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

Page 4: TISdb: a database for alternative translation initiation ...qian.human.cornell.edu/Files/Nucl. Acids Res.-2013... · consisting of initiation, elongation, termination and ribosome

ORF prediction viewTo help users investigate the biological functions of theidentified TIS codons in the context of their ORFs acarousel containing illustrative images of each individualtranscript is displayed Each image corresponds to aqueried gene As explained in the previous section allthe transcript isoforms for that particular gene are dis-played The content of each image contains the coordinateof the TIS codon relative to the transcription start sitecodon composition of the identified TIS colour-codedframe-shift information and the predicted stop codon

(TAG TAA and TGA) position associated with thatORF (Figure 2) The caption of the image is presentedas speciesgene Figure 2B is an example of the ORF pre-diction view of human g-actin 1 (ACTG1) This genehas two transcript isoforms NM_001199954 andNM_001614 In the TISdb there are four TIS sites inboth isoforms two uTIS one aTIS and one dTISHowever the biological implications of the two uTIScodons are different for the two transcript isoforms Theupstream CUG uTIS is in the same reading frame as theCDS of NM_001199954 which is expected to result in a

Figure 1 An example of TIS search results for ACTG1 from both human and mouse genomes (A) Summary table of TIS information for ACTG1The coordinates correspond to the first position of the start codon The genome coordinates are consistent with hg19 (human) or mm10 (mouse)(B) Multiple alignment of TIS context A window of 3 and +4nt around an upstream TIS of mouse ACTG1 is displayed The height in thelsquoPlacental Consrsquo represents the degree of conservation for each base in the window The orthologous sequences of 10 vertebrate species are shown formultiple alignment

4 Nucleic Acids Research 2013

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

protein product with an NH2-terminal extension Incontrast the same uTIS in NM_001614 only associateswith a short uORF overlapping with the CDS The down-stream CUG uTIS is in a different reading frame relativeto the annotated CDS and the associated dORFs in bothtranscript isoforms have distinct sequences

FUTURE PROSPECTS

GTI-seq represents a remarkable technological advance-ment in mapping global mRNA translation initiation Ithas the potential to reveal the hidden coding potential ofthe transcriptome Comprehensive cataloguing of globalTIS sites and the associated ORFs is just the beginning inunveiling the principles governing alternative translationThe enormous biological breadth of translational controlhas led to an enhanced appreciation of proteome diversityand complexity Despite the advantages of GTI-Seq onTIS identification some limitations still exist Firstcurrent ribosome profiling approaches lack quantitativecapacity The sequencing reads density associated witheither harringtonine or LTM does not truly reflect therate of translation initiation Therefore the currentTISdb provides the position of alternative TIS codonsrather than the efficiency of alternation translation initi-ation Second the depth of ribosome profiling isinfluenced by the abundance of transcripts TIS informa-tion may not be available for transcripts of lowabundance For genes with relatively high expressionGTI-seq could possibly fail to capture some TIS signalsthat have low initiation efficiency Such scenarios includetranscripts with multiple uTIS sites but no annotated TISsites Third TIS selection is subjected to regulation underdifferent growth conditions Future GTI-seq experimentswill focus on quantitative changes of TIS selection in

response to stressors such as nutrient starvation Fourthtissue-specific TIS selection remains a formidable taskbased on technical limitations of the current ribosomeprofiling protocol We envision that variants ofribosome profiling will be developed in the future tocapture quantitative TIS information from varioustissues across a wider range of organismal species Thecurrent TISdb sets the stage for investigation of alterna-tive translation and provides an important platform forstudying translational control There is little doubt thatintegration of GTI-seq data with other data sets such asCHIP-seq RNA-seq miRNA profiling and proteomicswill present a fresh view of global post-transcriptionaland -translational gene regulation The effort to consoli-date the rich biology with detailed understanding ofthe underlying mechanisms promises an exciting andsurprising future

ACKNOWLEDGEMENTS

We thank members of Qian laboratory for helpful sugges-tion on the website development and critical comments forthe manuscript We also thank David Smith KevinMurphy and Cornell information technologies forproviding the virtual hosting space for the TISdb website

FUNDING

National Institutes of Health [1 DP2 OD006449-011R01AG042400-01A1] Ellison Medical Foundation[AG-NS-0605-09] and DOD Exploration-HypothesisDevelopment Award [W81XWH-11-1-0236 to S-BQ]Funding for open access charge National Institutes ofHealth

Figure 2 The image of predicted ORFs associated with the identified TIS sites is shown for ACTG1 All isoforms of the transcript are displayedFrom left to right the transcript structure consists of the 50 UTR (grey) CDS (green) and 30 UTR (grey) Each ORF is represented as a line betweenan identified TIS and a predicted stop codon (black square) The ORF is color coded according to reading frames (red frame 0 purple frame 1blue frame 2) The TIS codon sequence composition is shown together with the coordinates relative to the transcription start site

Nucleic Acids Research 2013 5

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

Conflict of interest statement None declared

REFERENCES

1 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principlesof its regulation Nat Rev Mol Cell Biol 11 113ndash127

2 AitkenCE and LorschJR (2012) A mechanistic overview oftranslation initiation in eukaryotes Nat Struct Mol Biol 19568ndash576

3 StarckSR JiangV Pavon-EternodM PrasadS McCarthyBPanT and ShastriN (2012) Leucine-tRNA initiates at CUGstart codons for protein synthesis and presentation by MHCclass I Science 336 1719ndash1723

4 SchwabSR ShugartJA HorngT MalarkannanS andShastriN (2004) Unanticipated antigens translation initiation atCUG with leucine PLoS Biol 2 e366

5 KozakM (2002) Pushing the limits of the scanning mechanismfor initiation of translation Gene 299 1ndash34

6 SonenbergN and HinnebuschAG (2009) Regulation oftranslation initiation in eukaryotes mechanisms and biologicaltargets Cell 136 731ndash745

7 MokrejsM VopalenskyV KolenatyO MasekT FeketovaZSekyrovaP SkaloudovaB KrizV and PospisekM (2006)IRESite the database of experimentally verified IRES structuresNucleic Acids Res 34 D125ndashD130

8 MedenbachJ SeilerM and HentzeMW (2011) Translationalcontrol via protein-regulated upstream open reading frames Cell145 902ndash913

9 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

10 HopkinsBD FineB SteinbachN DendyM RappZShawJ PappasK YuJS HodakoskiC MenseS et al (2013)A secreted PTEN phosphatase that enters cells to alter signalingand survival Science 341 399ndash402

11 NadershahiA FahrenkrugSC and EllisLB (2004) Comparisonof computational methods for identifying translation initiationsites in EST data BMC Bioinformatics 5 14

12 SaeysY AbeelT DegroeveS and Van de PeerY (2007)Translation initiation site prediction on a genomic scale beautyin simplicity Bioinformatics 23 i418ndashi423

13 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

14 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

15 FritschC HerrmannA NothnagelM SzafranskiK HuseKSchumannF SchreiberS PlatzerM KrawczakM HampeJet al (2012) Genome-wide search for novel human uORFs andN-terminal protein extensions using ribosomal footprintingGenome Res 22 2208ndash2218

16 LeeS LiuB LeeS HuangSX ShenB and QianSB (2012)Global mapping of translation initiation sites in mammalian cellsat single-nucleotide resolution Proc Natl Acad Sci USA 109E2424ndashE2432

17 LiuB HanY and QianSB (2013) Cotranslational response toproteotoxic stress by elongation pausing of ribosomes Mol Cell49 453ndash463

18 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

19 MeyerLR ZweigAS HinrichsAS KarolchikD KuhnRMWongM SloanCA RosenbloomKR RoeG RheadB et al(2013) The UCSC genome browser database extensions andupdates 2013 Nucleic Acids Res 41 D64ndashD69

20 UrenPJ Bahrami-SamaniE BurnsSC QiaoMKarginovFV HodgesE HannonGJ SanfordJRPenalvaLO and SmithAD (2012) Site identification in high-throughput RNA-protein interaction data Bioinformatics 283013ndash3020

6 Nucleic Acids Research 2013

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

Page 5: TISdb: a database for alternative translation initiation ...qian.human.cornell.edu/Files/Nucl. Acids Res.-2013... · consisting of initiation, elongation, termination and ribosome

protein product with an NH2-terminal extension Incontrast the same uTIS in NM_001614 only associateswith a short uORF overlapping with the CDS The down-stream CUG uTIS is in a different reading frame relativeto the annotated CDS and the associated dORFs in bothtranscript isoforms have distinct sequences

FUTURE PROSPECTS

GTI-seq represents a remarkable technological advance-ment in mapping global mRNA translation initiation Ithas the potential to reveal the hidden coding potential ofthe transcriptome Comprehensive cataloguing of globalTIS sites and the associated ORFs is just the beginning inunveiling the principles governing alternative translationThe enormous biological breadth of translational controlhas led to an enhanced appreciation of proteome diversityand complexity Despite the advantages of GTI-Seq onTIS identification some limitations still exist Firstcurrent ribosome profiling approaches lack quantitativecapacity The sequencing reads density associated witheither harringtonine or LTM does not truly reflect therate of translation initiation Therefore the currentTISdb provides the position of alternative TIS codonsrather than the efficiency of alternation translation initi-ation Second the depth of ribosome profiling isinfluenced by the abundance of transcripts TIS informa-tion may not be available for transcripts of lowabundance For genes with relatively high expressionGTI-seq could possibly fail to capture some TIS signalsthat have low initiation efficiency Such scenarios includetranscripts with multiple uTIS sites but no annotated TISsites Third TIS selection is subjected to regulation underdifferent growth conditions Future GTI-seq experimentswill focus on quantitative changes of TIS selection in

response to stressors such as nutrient starvation Fourthtissue-specific TIS selection remains a formidable taskbased on technical limitations of the current ribosomeprofiling protocol We envision that variants ofribosome profiling will be developed in the future tocapture quantitative TIS information from varioustissues across a wider range of organismal species Thecurrent TISdb sets the stage for investigation of alterna-tive translation and provides an important platform forstudying translational control There is little doubt thatintegration of GTI-seq data with other data sets such asCHIP-seq RNA-seq miRNA profiling and proteomicswill present a fresh view of global post-transcriptionaland -translational gene regulation The effort to consoli-date the rich biology with detailed understanding ofthe underlying mechanisms promises an exciting andsurprising future

ACKNOWLEDGEMENTS

We thank members of Qian laboratory for helpful sugges-tion on the website development and critical comments forthe manuscript We also thank David Smith KevinMurphy and Cornell information technologies forproviding the virtual hosting space for the TISdb website

FUNDING

National Institutes of Health [1 DP2 OD006449-011R01AG042400-01A1] Ellison Medical Foundation[AG-NS-0605-09] and DOD Exploration-HypothesisDevelopment Award [W81XWH-11-1-0236 to S-BQ]Funding for open access charge National Institutes ofHealth

Figure 2 The image of predicted ORFs associated with the identified TIS sites is shown for ACTG1 All isoforms of the transcript are displayedFrom left to right the transcript structure consists of the 50 UTR (grey) CDS (green) and 30 UTR (grey) Each ORF is represented as a line betweenan identified TIS and a predicted stop codon (black square) The ORF is color coded according to reading frames (red frame 0 purple frame 1blue frame 2) The TIS codon sequence composition is shown together with the coordinates relative to the transcription start site

Nucleic Acids Research 2013 5

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

Conflict of interest statement None declared

REFERENCES

1 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principlesof its regulation Nat Rev Mol Cell Biol 11 113ndash127

2 AitkenCE and LorschJR (2012) A mechanistic overview oftranslation initiation in eukaryotes Nat Struct Mol Biol 19568ndash576

3 StarckSR JiangV Pavon-EternodM PrasadS McCarthyBPanT and ShastriN (2012) Leucine-tRNA initiates at CUGstart codons for protein synthesis and presentation by MHCclass I Science 336 1719ndash1723

4 SchwabSR ShugartJA HorngT MalarkannanS andShastriN (2004) Unanticipated antigens translation initiation atCUG with leucine PLoS Biol 2 e366

5 KozakM (2002) Pushing the limits of the scanning mechanismfor initiation of translation Gene 299 1ndash34

6 SonenbergN and HinnebuschAG (2009) Regulation oftranslation initiation in eukaryotes mechanisms and biologicaltargets Cell 136 731ndash745

7 MokrejsM VopalenskyV KolenatyO MasekT FeketovaZSekyrovaP SkaloudovaB KrizV and PospisekM (2006)IRESite the database of experimentally verified IRES structuresNucleic Acids Res 34 D125ndashD130

8 MedenbachJ SeilerM and HentzeMW (2011) Translationalcontrol via protein-regulated upstream open reading frames Cell145 902ndash913

9 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

10 HopkinsBD FineB SteinbachN DendyM RappZShawJ PappasK YuJS HodakoskiC MenseS et al (2013)A secreted PTEN phosphatase that enters cells to alter signalingand survival Science 341 399ndash402

11 NadershahiA FahrenkrugSC and EllisLB (2004) Comparisonof computational methods for identifying translation initiationsites in EST data BMC Bioinformatics 5 14

12 SaeysY AbeelT DegroeveS and Van de PeerY (2007)Translation initiation site prediction on a genomic scale beautyin simplicity Bioinformatics 23 i418ndashi423

13 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

14 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

15 FritschC HerrmannA NothnagelM SzafranskiK HuseKSchumannF SchreiberS PlatzerM KrawczakM HampeJet al (2012) Genome-wide search for novel human uORFs andN-terminal protein extensions using ribosomal footprintingGenome Res 22 2208ndash2218

16 LeeS LiuB LeeS HuangSX ShenB and QianSB (2012)Global mapping of translation initiation sites in mammalian cellsat single-nucleotide resolution Proc Natl Acad Sci USA 109E2424ndashE2432

17 LiuB HanY and QianSB (2013) Cotranslational response toproteotoxic stress by elongation pausing of ribosomes Mol Cell49 453ndash463

18 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

19 MeyerLR ZweigAS HinrichsAS KarolchikD KuhnRMWongM SloanCA RosenbloomKR RoeG RheadB et al(2013) The UCSC genome browser database extensions andupdates 2013 Nucleic Acids Res 41 D64ndashD69

20 UrenPJ Bahrami-SamaniE BurnsSC QiaoMKarginovFV HodgesE HannonGJ SanfordJRPenalvaLO and SmithAD (2012) Site identification in high-throughput RNA-protein interaction data Bioinformatics 283013ndash3020

6 Nucleic Acids Research 2013

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from

Page 6: TISdb: a database for alternative translation initiation ...qian.human.cornell.edu/Files/Nucl. Acids Res.-2013... · consisting of initiation, elongation, termination and ribosome

Conflict of interest statement None declared

REFERENCES

1 JacksonRJ HellenCU and PestovaTV (2010) Themechanism of eukaryotic translation initiation and principlesof its regulation Nat Rev Mol Cell Biol 11 113ndash127

2 AitkenCE and LorschJR (2012) A mechanistic overview oftranslation initiation in eukaryotes Nat Struct Mol Biol 19568ndash576

3 StarckSR JiangV Pavon-EternodM PrasadS McCarthyBPanT and ShastriN (2012) Leucine-tRNA initiates at CUGstart codons for protein synthesis and presentation by MHCclass I Science 336 1719ndash1723

4 SchwabSR ShugartJA HorngT MalarkannanS andShastriN (2004) Unanticipated antigens translation initiation atCUG with leucine PLoS Biol 2 e366

5 KozakM (2002) Pushing the limits of the scanning mechanismfor initiation of translation Gene 299 1ndash34

6 SonenbergN and HinnebuschAG (2009) Regulation oftranslation initiation in eukaryotes mechanisms and biologicaltargets Cell 136 731ndash745

7 MokrejsM VopalenskyV KolenatyO MasekT FeketovaZSekyrovaP SkaloudovaB KrizV and PospisekM (2006)IRESite the database of experimentally verified IRES structuresNucleic Acids Res 34 D125ndashD130

8 MedenbachJ SeilerM and HentzeMW (2011) Translationalcontrol via protein-regulated upstream open reading frames Cell145 902ndash913

9 VattemKM and WekRC (2004) Reinitiation involvingupstream ORFs regulates ATF4 mRNA translation inmammalian cells Proc Natl Acad Sci USA 101 11269ndash11274

10 HopkinsBD FineB SteinbachN DendyM RappZShawJ PappasK YuJS HodakoskiC MenseS et al (2013)A secreted PTEN phosphatase that enters cells to alter signalingand survival Science 341 399ndash402

11 NadershahiA FahrenkrugSC and EllisLB (2004) Comparisonof computational methods for identifying translation initiationsites in EST data BMC Bioinformatics 5 14

12 SaeysY AbeelT DegroeveS and Van de PeerY (2007)Translation initiation site prediction on a genomic scale beautyin simplicity Bioinformatics 23 i418ndashi423

13 IngoliaNT GhaemmaghamiS NewmanJR and WeissmanJS(2009) Genome-wide analysis in vivo of translation withnucleotide resolution using ribosome profiling Science 324218ndash223

14 IngoliaNT LareauLF and WeissmanJS (2011) Ribosomeprofiling of mouse embryonic stem cells reveals the complexityand dynamics of mammalian proteomes Cell 147 789ndash802

15 FritschC HerrmannA NothnagelM SzafranskiK HuseKSchumannF SchreiberS PlatzerM KrawczakM HampeJet al (2012) Genome-wide search for novel human uORFs andN-terminal protein extensions using ribosomal footprintingGenome Res 22 2208ndash2218

16 LeeS LiuB LeeS HuangSX ShenB and QianSB (2012)Global mapping of translation initiation sites in mammalian cellsat single-nucleotide resolution Proc Natl Acad Sci USA 109E2424ndashE2432

17 LiuB HanY and QianSB (2013) Cotranslational response toproteotoxic stress by elongation pausing of ribosomes Mol Cell49 453ndash463

18 TrapnellC PachterL and SalzbergSL (2009) TopHatdiscovering splice junctions with RNA-Seq Bioinformatics 251105ndash1111

19 MeyerLR ZweigAS HinrichsAS KarolchikD KuhnRMWongM SloanCA RosenbloomKR RoeG RheadB et al(2013) The UCSC genome browser database extensions andupdates 2013 Nucleic Acids Res 41 D64ndashD69

20 UrenPJ Bahrami-SamaniE BurnsSC QiaoMKarginovFV HodgesE HannonGJ SanfordJRPenalvaLO and SmithAD (2012) Site identification in high-throughput RNA-protein interaction data Bioinformatics 283013ndash3020

6 Nucleic Acids Research 2013

by guest on Decem

ber 16 2013httpnaroxfordjournalsorg

Dow

nloaded from